Hypercube Special Index & Fulltext Search
Transbase® Hypercube and Big Data Analysis
Transbase® Hypercube is the index technology for multi-dimensionally structured data and is ideally suited for Big Data analysis, Data Warehouse Systems or OLAP applications as well as a high-end Database System.
It is also possible to include hierarchically constructed dimensions. Hypercube is seamlessly integrated into the Transbase® core, greatly simplifying data definition and updating.
Transbase® Hypercube is particularly suitable for fast simultaneous analysis and evaluation of large Data sets according to several different characteristics.
For example, Transbase® Hypercube has been used by mobile providers for the geographical analysis of radio data, as well as in the Market research for the preparation of the sales figures of music, books and games.
Typical use cases are:
- Big Data Analysis
- Business Analytics & Business Intelligence (BA/BI)
- Business Performance Management
- Internet of Things (IoT)
- Planning, Analysis and Reporting
- Statistical Databases
Transbase® has a variety of properties that are not the same in specialized OLAP solutions. Especially in the above-mentioned applications, however, they are of decisive importance because of the large amounts of data.
The advantages of Transbase® Hypercube are:
- Performance and Scalability
- Backup and Recovery
- strict Transaction Protection
- Privacy by Encryption
Only through the complete and seamless integration of Hypercube technology as a real index technology into the database core is the decisive advantage in all areas of the application lifecycle.
The evaluation efficiency results from the excellent clustering properties of the Hypercube index as well as from the integration of the complex but efficient evaluation algorithms into the database kernel. The integration of the Hypercube algorithms into the normal query optimization process must be mentioned.
The efficiency of application development results from:
- clear description of the hierarchies in the definition of the database schema
- standard-compliant SQL queries without special constructs that obscure the semantics of the queries
- the automatic use of the hypercube index by the query optimizer
- the loss of programmatic effort to maintain
- the auxiliary data support for standard interfaces
For the modeling of databases with Transbase® Hypercube, the graphical database editor TBJ is particularly suitable. It supports the design and analysis of database schemas through a form-oriented presentation and definition of database objects. In particular, hierarchies and MHC schemes are also presented in a clear overview.
With the MHC technology, the Transbase® Hypercube special index combines two indexing technologies (multidimensional clustering and hierarchical clustering) into an efficient and easy-to-use tool. For the realization of hierarchical clustering, Transbase® Hypercube uses its own numerical coding, so-called surrogates.
Transbase® Hypercube incorporates the multidimensional hypercube index and the processes required to carry out efficient manipulation and evaluation of the stored data.
Due to its seamless integration into the database core, this technology can be applied easily, robustly and efficiently. This results in significant advantages in productivity in the creation, maintenance and development as well as in the use of the associated application. Alone because of the excellent performance is it realistically possible, in many cases, to carry out adhoc enquiries at all.
The Hypercube Index unites two indexing technologies to create an efficient and easy to use tool:
Multidimensional Clustering and Hierarchical Clustering
The combination of these techniques is therefore called Multidimensional Hierarchical Clustering (MHC). It supports in a special way the filtering and analysis of multidimensional and hierarchically structured data repositories, whereby linear and hierarchical features can occur in any combination desired.
Multidimensional clustering supports the processing of multidimensional data, i.e. data arranged according to different independent features.
Using the Hypercube Index as a physical data structure, data sets are cleverly organized as multidimensional points in space and stored according to their “spatial” position. This means that data sets which lie spatially close to each other are collected together in a small number of hard disc blocks and can therefore be downloaded from secondary storage using few read operations.
This accelerates particularly those queries which have many hits on multidimensional areas, since with large data repositories the efficiency of the evaluation is almost entirely dependent upon access via secondary storage.
With OLAP applications the individual dimensions are often organised hierarchically. The hierarchies are in the main defined by the possible levels of evaluation.
Hierarchical clustering enables not only the efficient filtering of the elements on the desired hierarchy levels, but also, and most importantly, the efficient grouping and analysis of the data repositories based on these elements.
Transbase Hypercube uses its own numerical coding, so-called surrogates. In principle this is similar to the allocation of post codes or zip codes, which reflect the hierarchy. This coding is then used for the clustered storage of data in the hypercube index, which thereby abridges the individual dimensions for the multi dimensional indexing.
The allocation and, as necessary, the updating of the surrogate is thereby carried out automatically by Transbase Hypercube. The developer is only required to define the hierarchical levels for the dimensions and accordingly the external encoding relationships. By means of the hierarchical coding using the surrogate technique the abridging of hierarchical elements are formed for each associated interval in the surrogate area.
A special algorithm is responsible for the optimal use of the hierarchical information contained in the surrogate: At first the restrictions of the dimensions are evaluated.
- This results in establishing the intervals in the dimension coding, which are evaluated particularly efficiently in the Fact Table via the hypercube index.
- By using the hierarchy coding in the Fact Table a grouping and summarizing of the data (i.e. addition) can even be carried out at this point, without needing to combine the individual data sets with the dimensions.
- The effort required for combining the few remaining groups with the dimensions (the so-called “reverse join”) is, as a result, almost insignificant.
By means of this method of query processing the execution of such queries is greatly accelerated in comparison to traditional database systems.
Transbase® Fulltext search in large collections of texts
Transbase® Fulltext is a special index for full-text indexing. It allows you to search large collections of texts in seconds for defined terms efficiently and completely.
Full SQL integration allows you to combine with other structured search terms. This makes Transbase® full text the ideal basis for all databases that combine textual and structural search. The text representation can be done in single-byte, multi-byte or UNICODE representation.
Fulltext indexes can be created on CHAR or VARCHAR table columns and on BLOB and CLOB columns. The BLOB objects can also contain any mixtures of text and images. Fulltext indexes can be dynamically removed and added.
The advantages of Fulltext indexing are:
Integration in SQL:
Search predicates can therefore be combined as desired with other search predicates. All logical links are usable within fulltext predicates and mixed with standard search predicates.
Word search and pattern matching:
The basic building block of the full-text search is the search for a word that is expressed with the search clause <field> CONTAINS ('word'). The search for a word pattern is expressed by the use of the usual wildcards '%' and '_' in SQL. Logical joins are expressed with AND and OR in the word argument list.
With an additional clause for the index type, Transbase® Fulltext can also cover the neighborhood search. This makes it possible to search for occurrences of word pairs at a given distance. For example, the clause <field> CONTAINS ('word1' 'word2') looks for occurrences of both words that appear immediately after one another in the text. In addition, optionally any minimum and maximum distance can be specified.
Character mapping, stop words and delimiters:
To reduce space requirements, when creating the full-text index, you can specify any list of words that are not indexed. Umlauts and lower case letters can be handled by arbitrary specifiable character mapping. Special characters and word breaks can also be explicitly specified by specifying delimiter characters if required.