Transbase® Hypercube Special Index & Fulltext Search


Big Data Analytics with Transbase® Hypercube

Transbase® Hypercube is the index technology for multi-dimensionally structured data and is ideally suited for Big Data analysis, Data Warehouse Systems or OLAP applications as well as a high-end Database System. It is also possible to include hierarchically constructed dimensions. Hypercube is seamlessly integrated into the Transbase® core, greatly simplifying data definition and updating.

Transbase® Hypercube is particularly suitable for fast simultaneous analysis and evaluation of large Data sets according to several different characteristics. For example, Transbase® Hypercube has been used by mobile providers for the geographical analysis of radio data, as well as in the Market research for the preparation of the sales figures of music, books and games.

Typical use cases are:
  • Big Data Analytics
  • Business Analytics & Business Intelligence (BA/BI)
  • Business Performance Management
  • Internet of Things (IoT)
  • Planning, Analysis and Reporting
  • Statistical Databases

In 2001, Transbase® Hypercube was awarded the European ICT Prize for successful and innovative applications in information technology. Transbase® has a variety of properties that are not the same in specialized OLAP solutions. Especially in the above-mentioned applications, however, they are of decisive importance because of the large amounts of data.

Through the complete and seamless integration of Hypercube technology as a real index technology into the database core there are essential advantages in all areas of the application lifecycle like
  • Performance and Scalability
  • Backup and Recovery
  • strict Transaction Protection
  • Privacy by Encryption

The evaluation efficiency results from the excellent clustering properties of the Hypercube index as well as from the integration of the complex but efficient evaluation algorithms into the database kernel. The integration of the Hypercube algorithms into the normal query optimization process must be mentioned.

For DevOps Transbase® is especially efficient through

  • a clear description of hierarchies within the database schema definition
  • standard-compliant SQL queries without special constructs that obscure the semantics of the queries 
  • the automatic use of the hypercube index by the query optimizer
  • the loss of programmatic effort to maintain
  • the auxiliary data support for standard interfaces

For the modeling of databases with Transbase® Hypercube, the graphical database editor TBJ is particularly suitable. It supports the design and analysis of database schemas through a form-oriented presentation and definition of database objects. In particular, hierarchies and MHC schemes are also presented in a clear overview.

Transbase® MHC (Multidimensional and Hierarchical Clustering)

With the MHC technology, the Transbase® Hypercube special index combines two indexing technologies (multidimensional clustering and hierarchical clustering) into an efficient and easy-to-use tool.

The multidimensional Hypercube index and the procedures required for it are used for efficient manipulation and evaluation of the stored data. Due to its seamless integration into the database core, this technology can be applied easily, robustly and efficiently. This results in significant advantages in productivity in the creation, maintenance and development as well as in the use of the associated application. Alone because of the excellent performance is it realistically possible, in many cases, to carry out adhoc enquiries at all.

Multidimensional Clustering (MC)

Multidimensional clustering supports the processing of multidimensional data, i.e. data arranged according to different independent features.

Using the Hypercube Index as a physical data structure, data sets are cleverly organized as multidimensional points in space and stored according to their “spatial” position. This means that data sets which lie spatially close to each other are collected together in a small number of hard disc blocks and can therefore be downloaded from secondary storage using few read operations.

This accelerates particularly those queries which have many hits on multidimensional areas, since with large data repositories the efficiency of the evaluation is almost entirely dependent upon access via secondary storage.

Hierarchical Clustering (HC)

With OLAP applications the individual dimensions are often organised hierarchically. The hierarchies are in the main defined by the possible levels of evaluation.

Hierarchical clustering enables not only the efficient filtering of the elements on the desired hierarchy levels, but also, and most importantly, the efficient grouping and analysis of the data repositories based on these elements.

Surrogat Technique

For the realization of hierarchical clustering, Transbase® Hypercube uses its own numerical coding, so-called surrogates. In principle, this is similar to the allocation of postcodes, which represent the hierarchy. This coding is then used for the clustered storage of the data in the hypercube index, which entangles the surrogates of the individual dimensions for multidimensional indexing.

The assignment and, if necessary, updating of these surrogates is carried out automatically by Transbase® Hypercube. The developer only defines the hierarchy levels for the dimensions and the corresponding foreign key relationships. The hierarchical coding using surrogate technology means that the restrictions on hierarchical elements result in connected intervals in the surrogate area.

Data Storage:

The automatic maintenance of the hierarchy information (surrogates) in the data stocks means that no special processing of the data is required when the data is transferred, which makes the entire process considerably easier. In addition, the following features contribute to the fact that the data transfer and evaluation can take place in a simple and efficient manner:

  • Stored procedures and user-defined functions enable complex transformations and evaluations during the ETL process.
  • The Transbase® OCI-Reader and the Transbase® JDBC-Reader enable direct access to Oracle or JDBC-compatible databases and thus make the detour via export and import superfluous.
  • Common ETL tools can be connected via standard interfaces such as ODBC, OLEDB, ADO.NET and JDBC.
  • Special algorithms accelerate the insertion of large amounts of data into Transbase® Hypercube.
  • Using analytical functions, complex evaluations can be formulated directly in SQL (e.g. RANK, PARTITION OVER). This relieves the burden on reporting applications while the database system can fully utilize its optimization possibilities.
Data Evaluation:

A special algorithm ensures optimal use of the hierarchy information contained in the surrogates: First, the restrictions of the dimensions are evaluated.

  • This results in establishing the intervals in the dimension coding, which are evaluated particularly efficiently in the fact table via the Hypercube index.
  • The hierarchy coding in the fact table enables data to be grouped and summarized (e.g. summation) without having to combine the individual data records with the dimensions.
  • The effort involved in combining the few remaining groups with the dimensions (the so-called "back join") is therefore almost negligible.

This query processing speeds up the execution of such queries by orders of magnitude compared to conventional database systems.

Learn more about Transbase® Hypercube or an application example for efficient use.

 

Transbase® Fulltext search in large collections of texts

Transbase® Fulltext is a special index for full-text indexing. It allows you to search large collections of texts in seconds for defined terms efficiently and completely. Full SQL integration allows you to combine with other structured search terms. This makes Transbase® full text the ideal basis for all databases that combine textual and structural search. The text representation can be done in single-byte, multi-byte or UNICODE representation.

Fulltext indexes can be created on CHAR or VARCHAR table columns and on BLOB and CLOB columns. The BLOB objects can also contain any mixtures of text and images. Fulltext indexes can be dynamically removed and added.

Advantages of Fulltext indexing are:
  • Integration in SQL: 
    Search predicates can therefore be combined as desired with other search predicates. All logical links are usable within fulltext predicates and mixed with standard search predicates.
  • Word search and pattern matching: 
    The basic building block of the full-text search is the search for a word that is expressed with the search clause <field> CONTAINS ('word'). The search for a word pattern is expressed by the use of the usual wildcards '%' and '_' in SQL. Logical joins are expressed with AND and OR in the word argument list.
  • Neighborhood search: 
    With an additional clause for the index type, Transbase® Fulltext can also cover the neighborhood search. This makes it possible to search for occurrences of word pairs at a given distance. For example, the clause <field> CONTAINS ('word1' 'word2') looks for occurrences of both words that appear immediately after one another in the text. In addition, optionally any minimum and maximum distance can be specified.
  • Character mapping, stop words and delimiters: 
    To reduce space requirements, when creating the full-text index, you can specify any list of words that are not indexed. Umlauts and lower case letters can be handled by arbitrary specifiable character mapping. Special characters and word breaks can also be explicitly specified by specifying delimiter characters if required.

Learn more about Transbase® Fulltext or visit our Info-Center.​​​​​​​