Through our product development and our close scientific cooperation with universities, leading research institutes and knowledge networks, we regularly work on topics that we publish, either as a company or as individual employees of Transaction Software.
For software developers on the customer side, we offer helpful recommendations. You can draw on this practice-oriented collection of knowledge at any time.
Nikos Karayannidis, Aris Tsois, Timos Sellis, Roland Pieringer, Volker Markl, Frank Ramsak, Robert Fenk, Klaus Elhardt, Rudolf Bayer:
Processing Star Queries on Hierarchically-Clustered Fact Tables
Proceedings of the 28th International Conference on Very Large Databases (VLDB), 2002, Hong Kong, China
Star Queries are the most prevalent kind of queries in Data Warehousing, OLAP and Business Intelligence Applications. Thus, there is an imperative need for efficiently processing Star Queries. To this end, a new class of fact table organizations has emerged that exploits path-based surrogate keys in order to hierarchically cluster the fact table data of a star schema [DRSN98, MRB99, KS01]. In the context of these new organizations, star query processing changes radically. In this paper, we present a complete abstract processing plan that captures all the necessary steps in evaluating such queries over hierarchically clustered fact tables. Furthermore, we present optimizations for surrogate key processing and a novel early grouping transformation for grouping on the dimension hierarchies. Our algorithms have already been implemented in a commercial Relational Database Management System (RDBMS), and the experimental evaluation, as well as customer feedback, indicates speed-ups of orders of magnitude for typical star queries in real-world applications.
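The idea of a path-based surrogate key can be illustrated with a minimal sketch. It is not the encoding from the paper: the level names, the bit widths in `LEVEL_BITS` and the helper names `encode_path` and `prefix_range` are assumptions made for illustration. The sketch packs the position of a dimension member on each hierarchy level into one integer, so that all members below a common ancestor fall into one contiguous key range.

```python
# Hypothetical bit widths for three hierarchy levels, root first
# (for example region, nation, city). Chosen only for illustration.
LEVEL_BITS = (3, 5, 8)


def encode_path(path, level_bits=LEVEL_BITS):
    """Pack per-level ordinals (root first) into one integer key."""
    if len(path) != len(level_bits):
        raise ValueError("path length must match the number of levels")
    key = 0
    for ordinal, bits in zip(path, level_bits):
        if not 0 <= ordinal < (1 << bits):
            raise ValueError("ordinal does not fit into its level")
        key = (key << bits) | ordinal
    return key


def prefix_range(prefix, level_bits=LEVEL_BITS):
    """Return the inclusive key range of all members below a path prefix."""
    remaining = sum(level_bits[len(prefix):])
    low = encode_path(prefix, level_bits[:len(prefix)]) << remaining
    high = low | ((1 << remaining) - 1)
    return low, high


key = encode_path((2, 7, 19))
low, high = prefix_range((2, 7))
print(key, low, high)
```

Under these assumptions, a restriction on a higher hierarchy level turns into a single range restriction on the surrogate key, which is what makes clustering by such keys attractive for star queries.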
Proceedings of the 28th International Conference on Very Large Databases (VLDB), 2002, Hong Kong, China
Real-world data usually has a non-uniform distribution, i.e., there are clusters of data, but most of the universe is unpopulated space, the so-called dead space. When indexing such data, it is important to also handle dead space efficiently, i.e., the index should not degenerate with respect to size and performance when dealing with such non-uniformly distributed data.
Robert Fenk, Volker Markl, Rudolf Bayer:
Interval Processing with the UB-Tree
Proceedings of the International Database Engineering & Applications Symposium (IDEAS), 2002, Edmonton, Canada
Advanced data warehouses and web databases have created a demand for processing large sets of time ranges, quality classes, fuzzy data, personalized data and extended objects. Since all of these data types can be mapped to intervals, interval indexing can dramatically speed up these new applications or even be an enabling technology for them.
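One common way to make intervals indexable by a multidimensional structure is to treat each interval as a two-dimensional point (start, end). The following minimal sketch shows that transformation only; it is an assumption that this matches the paper's approach in detail, and the function names and the domain bounds are hypothetical. A closed interval [s, e] intersects a query [qs, qe] exactly when s <= qe and e >= qs, which is a rectangular range query in the (start, end) plane.

```python
def to_point(interval):
    """Map a closed interval (start, end) to a 2D point."""
    start, end = interval
    if start > end:
        raise ValueError("interval start must not exceed its end")
    return (start, end)


def intersection_box(query, domain_min, domain_max):
    """Return the (start range, end range) box of all intersecting intervals."""
    q_start, q_end = query
    return ((domain_min, q_end), (q_start, domain_max))


def intersecting(intervals, query, domain_min, domain_max):
    """Filter intervals with the box test; an index would do this by range query."""
    (s_lo, s_hi), (e_lo, e_hi) = intersection_box(query, domain_min, domain_max)
    hits = []
    for interval in intervals:
        start, end = to_point(interval)
        if s_lo <= start <= s_hi and e_lo <= end <= e_hi:
            hits.append(interval)
    return hits


data = [(1, 3), (2, 8), (5, 6), (9, 12)]
print(intersecting(data, (4, 7), 0, 100))
```

The linear scan here only stands in for the index lookup; the point of the sketch is that the intersection predicate becomes a plain two-dimensional range restriction.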
Frank Ramsak, Volker Markl, Robert Fenk, Rudolf Bayer, Thomas Ruf:
Interactive ROLAP on Large Databases: A Case Study with UB-Trees
Proceedings of the International Database Engineering & Applications Symposium (IDEAS), 2001, Grenoble, France
Online Analytical Processing (OLAP) requires query response times within the range of a few seconds in order to allow for interactive drilling, slicing, or dicing through an OLAP cube. While small OLAP applications use multidimensional database systems, large OLAP applications like the SAP BW rely on relational (ROLAP) databases for efficient data storage and retrieval. ROLAP databases use specialized data models like star or snowflake schemata for data storage and create a large set of indexes or materialized views in order to answer queries efficiently. In our case study, we show the performance benefits of TransBase HyperCube, a commercial RDBMS, whose kernel fully integrates the UB-Tree, a multi-dimensional extension of the B-Tree. With this newly developed access structure, TransBase HyperCube enables interactive OLAP without the need of storing a large set of materialized views or creating a large set of indexes. We compare not only the query performance, but also consider index size and maintenance costs. For the case study we use a 42 million record ROLAP database of GfK, the largest German market research company.
Martin Zirkel, Volker Markl, Rudolf Bayer:
Exploitation of Pre-Sortedness for Sorting in Query Processing: The TempTris-Algorithm for UB-Trees
Proceedings of the International Database Engineering & Applications Symposium (IDEAS), 2001, Grenoble, France
Bulk loading is used to efficiently build a table or access structure if a large data set is available at index creation time, e.g., in the spool process of a data warehouse or in the creation of intermediate results during query processing. In this paper we introduce the TempTris algorithm that creates a multidimensional partitioning from a one-dimensionally sorted stream of tuples. In order to achieve that, TempTris exploits the fact that a one-dimensional order can be used as a partial multidimensional order for the creation of a multidimensional partitioning. In this way, TempTris avoids external sorting for the creation of a multidimensional index. In combination with the Tetris sort algorithm, TempTris can be used to create intermediate query processing results that can be re-used, without external sorting, to generate various sort orders. As an example of this new processing technique we propose an efficient algorithm for computing an aggregation lattice. Thus, TempTris can also be used to speed up the processing of CUBE operators that frequently occur in OLAP applications.
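To make the term aggregation lattice concrete, here is a naive sketch of what a CUBE operator has to compute. It is not the TempTris algorithm: it simply rescans the input once per grouping, which is the cost the paper aims to avoid. The function name `cube` and the sample column names are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (all nodes of the lattice)."""
    result = {}
    # Walk the lattice from the finest grouping down to the grand total.
    for size in range(len(dims), -1, -1):
        for group in combinations(dims, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            result[group] = dict(totals)
    return result


rows = [
    {"region": "EU", "year": 2001, "sales": 10},
    {"region": "EU", "year": 2002, "sales": 5},
    {"region": "US", "year": 2001, "sales": 7},
]
lattice = cube(rows, ("region", "year"), "sales")
print(lattice[("region",)])
```

For n grouping dimensions the lattice has 2^n nodes, so reusing one pass over the data for several sort orders, as the abstract describes, matters quickly as n grows.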
Roland Pieringer, Volker Markl, Frank Ramsak, Rudolf Bayer:
HINTA: A Linearization Algorithm for Physical Clustering of Complex OLAP Hierarchies
Proceedings of the 3rd International Workshop on Design and Management of Data Warehouses (DMDW), 2001, Interlaken, Switzerland
Hierarchies are an important means to categorize data stored in OLAP systems. OLAP queries follow the drill/slice/dice-paradigm and therefore exhibit navigation patterns that follow the hierarchy of a dimension. In real-world applications, hierarchies are often unbalanced and share levels, resulting in complex hierarchy structures. So far, encoding methods for simple structured hierarchies have been introduced to handle hierarchies efficiently for query processing. In this paper we propose the HINTA algorithm to compute the clustering order for complex hierarchies by linearization. The physical clustering of OLAP data computed by HINTA significantly improves the performance of OLAP queries. HINTA enables clustering of complex hierarchies that can share hierarchy levels in several classifications over one dimension.
Volker Markl, Frank Ramsak, Roland Pieringer, Robert Fenk, Klaus Elhardt, Rudolf Bayer:
The Transbase® Hypercube RDBMS: Multidimensional Indexing of Relational Tables
Proceedings of the 17th International Conference on Data Engineering (ICDE), 2001, Heidelberg, Germany
Only a few multidimensional access methods have made their way into commercial relational DBMSs. Even if an RDBMS ships with a multidimensional index, that index is usually an add-on like Oracle SDO, which is not integrated into the SQL interpreter, query processor and query optimizer of the DBMS kernel. Our demonstration shows TransBase HyperCube, a commercial RDBMS whose kernel fully integrates the UB-Tree, a multidimensional extension of the B-Tree. This integration was performed in an ESPRIT project funded by the European Commission. We put the main emphasis of our demonstration on the application of UB-Tree indexes in real-world databases for OLAP. However, we also address general issues of UB-Trees like creation, space requirements, or comparison to other indexing methods.
Robert Fenk, Akihiko Kawakami, Volker Markl, Rudolf Bayer, Shuichi Osaki:
Bulk loading a Data Warehouse built upon a UB-Tree
Proceedings of the International Database Engineering & Applications Symposium (IDEAS), 2000, Yokohama, Japan
This paper considers the issue of bulk loading large data sets for the UB-Tree, a multidimensional index structure. Especially in data warehousing (DW), data mining and OLAP it is necessary to have efficient bulk loading techniques, because loading does not occur continuously, but only from time to time and usually with large data sets. We propose two techniques, one for initial loading, which creates a new UB-Tree, and one for incremental loading, which adds data to an existing UB-Tree. Both techniques try to minimize I/O and CPU cost. Measurements with artificial data and data of a commercial data warehouse demonstrate that our algorithms are efficient and able to handle large data sets. Like the UB-Tree itself, they are easily integrated into an RDBMS.
Keywords: bulk loading, UB-Tree, multidimensional index, data warehousing, data mining, OLAP
When loading a huge amount of data into an indexed database table, it is usually not feasible to use the standard insert operation.
Frank Ramsak, Volker Markl, Robert Fenk, Martin Zirkel, Klaus Elhardt, Rudolf Bayer:
Integrating the UB-Tree into a Database System Kernel
Proceedings of the 26th International Conference on Very Large Databases (VLDB), 2000, Cairo, Egypt
Multidimensional access methods have shown high potential for significant performance improvements in various application domains. However, only a few approaches have made their way into commercial products. In commercial database management systems (DBMSs) the B-Tree is still the prevalent indexing technique. Integrating new indexing methods into existing database kernels is in general a very complex and costly task. Exceptions exist, as our experience of integrating the UB-Tree into TransBase, a commercial DBMS, shows. The UB-Tree is a very promising multidimensional index, which has shown its superiority over traditional access methods in different scenarios, especially in OLAP applications. In this paper we discuss the major issues of a UB-Tree integration. As we will show, the complexity and cost of this task are reduced significantly because the UB-Tree relies on the classical B-Tree. Even though commercial DBMSs provide interfaces for index extensions, we favor the kernel integration because of the tight coupling with the query optimizer, which allows for optimal usage of the UB-Tree in execution plans. Measurements on a real-world data warehouse show that the kernel integration leads to an additional performance improvement compared to our prototype implementation and competing index methods.
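The reason the UB-Tree can rely on the classical B-Tree is that it linearizes multidimensional points with a space-filling Z-curve, so each tuple gets a one-dimensional key. The sketch below shows the bit interleaving behind such a Z-address. The function name `z_address`, the fixed bit width and the order in which dimensions are interleaved are assumptions for illustration, not the TransBase implementation.

```python
def z_address(coords, bits):
    """Interleave the bits of all coordinates, most significant bit first."""
    for value in coords:
        if not 0 <= value < (1 << bits):
            raise ValueError("coordinate does not fit into the given bit width")
    address = 0
    for bit in range(bits - 1, -1, -1):
        # Take one bit from each dimension in turn.
        for value in coords:
            address = (address << 1) | ((value >> bit) & 1)
    return address


points = [(0, 0), (1, 0), (0, 1), (1, 1)]
# Sorting by Z-address yields the one-dimensional order a B-Tree could store.
print(sorted(points, key=lambda p: z_address(p, 1)))
print(z_address((3, 5), 3))
```

Because the result is an ordinary integer key, the existing B-Tree pages, locking and recovery code of a kernel can be reused, which is consistent with the reduced integration cost the abstract reports.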
Volker Markl, Rudolf Bayer:
Processing Relational OLAP Queries with UB-Trees and Multidimensional Hierarchical Clustering
Proceedings of the 2nd Workshop on Design and Management of Data Warehouses (DMDW), 2000, Stockholm, Sweden
Multidimensional access methods like the UB-Tree can be used to accelerate almost any query processing operation if proper query processing algorithms are used. Relational queries or SQL queries consist of restrictions, projections, ordering, grouping and aggregation, and join operations. In the presence of multidimensional restrictions or sorting, multidimensional range query or Tetris algorithms efficiently process these operations. In addition, these algorithms also efficiently support queries that generate some hierarchical restrictions (for instance by following 1:n foreign key relationships). In this paper we investigate the impacts on query processing in RDBMSs when using UB-Trees and multidimensional hierarchical clustering for physical data organization. We illustrate the benefits by performance measurements of queries for a star schema from a real-world application of an SAP business information warehouse.
Martin Zirkel, Volker Markl, Rudolf Bayer:
Efficient Processing of the Cube Operator
Ph.D. Workshop of Extending Database Technology (EDBT), 2000, Konstanz, Germany
This paper presents a part of the doctoral work with the theme: “The impact of sorted reading from UB-trees on relational database systems”.
Frank Ramsak, Volker Markl, Rudolf Bayer:
Physical Data Modeling for Multidimensional Access Methods
Proceedings of the 11th GI Workshop on 'Grundlagen von Datenbanken', 1999, Luisenthal, Germany
Despite the fact that the database community has proposed a vast number of indexing methods over the years, no standard physical data model has been established, as has been achieved on the conceptual and logical level. How to optimize a given data model by using various indexing methods is still the 'trade secret' of the database administrators. Only recently have some approaches tried to make this knowledge available to the normal database user through easy-to-use optimization tools (e.g., the AutoAdmin tool of MS SQL Server 7.0). In addition, physical data modeling has concentrated on one-dimensional access methods, since these were the only ones available in commercial database management systems. As multidimensional access methods (MDAMs) are making their way from the research labs into commercial products, a general physical data model should also take MDAMs into account, especially since MDAMs have a high potential to improve processing in important applications.
Volker Markl, Frank Ramsak, Rudolf Bayer:
Improving OLAP Performance by Multidimensional Hierarchical Clustering
Proceedings of the International Database Engineering & Applications Symposium (IDEAS), 1999, Montreal, Canada
MISTRAL: Processing Relational Queries using a Multidimensional Access Technique
Ph.D. Thesis, TU München, 1999, published by infix Verlag, St. Augustin, DISDBIS 59, ISBN 3-89601-459-5
Volker Markl, Martin Zirkel, Rudolf Bayer:
Processing Operations with Restrictions in Relational Database Management Systems without external Sorting
Proceedings of the 15th International Conference on Data Engineering (ICDE), 1999, Sydney, Australia
Do you have questions? Then contact us. We are happy to help you.