Thursday, August 13, 2009

A SQL Compiler for High-Performance Delta Processing in Main-Memory Databases

DBToaster is a novel SQL compiler that generates database engines for high-performance main-memory processing of streaming data. In a nutshell, DBToaster aggressively compiles aggregate queries to incremental (or delta-) form, enabling stream data to be processed highly efficiently, one tuple at a time, through the entire query, in contrast to today's operator-centric query plan interpreters. We will demonstrate the DBToaster compiler at VLDB 2009 in the "Core DB Technology & System issues" Demo Session on Tuesday, August 25th, and Wednesday, August 26th.

The DBToaster compiler produces database engines in native code to perform incremental view maintenance of continuous queries posed on update streams. Update streams cannot be addressed efficiently by today's systems, and one clear motivating application is that of algorithmic trading on orderbook data, where buy and sell orders on an exchange's orderbooks are updated arbitrarily. Algorithmic trading applications require high-performance processing, have bounded size state in practice, and often involve queries with a processing scope that cannot be addressed by constructs such as windows or punctuations.

The novelty behind DBToaster's query compilation to view maintenance code is twofold, first in its use of a recursive compilation technique to aggressively simplify queries to delta forms, and second, its maintenance of multiple map data structures throughout recursive compilation to support delta processing. To define recursive compilation, we contrast to traditional view maintenance algorithms. Current view maintenance techniques use query plans to compute result deltas from changes to a single base relation. To emphasize, the incremental form of a user-specified query is itself a query, but critically, a simpler query that exploits new query inputs. Typically, the new inputs allow the elimination of joins and simplifications of aggregate computations.

Now, today's view maintenance algorithms stop at this point. DBToaster on the other hand ploughs straight ahead, recursively applying compilation to transform delta forms that are themselves queries, to simpler and simpler queries, by considering combinations of base relation deltas. Our recursive compilation bottoms out at queries that can be represented as very simple procedural code statements. Furthermore, DBToaster maintains each delta form encountered as a map datastructure, essentially a group-by aggregate index derived from applying aggregate distributivity properties together with join-graph decompositions. These maps are incrementally maintained by the procedural code generated by recursively compiling the maps' delta forms. DBToaster internally uses a map algebra to represent and reason about queries and map datastructures, and performs recursive compilation through a set of transformation rules defined in our map algebra.

We invite you to attend our demo at VLDB to see DBToaster in action. We will demonstrate DBToaster and its recursive compilation as applied to executing algorithmic trading strategies on orderbook data, and a data warehouse loading scenario, emulating simultaneous processing of a data integration query and an OLAP query, while incrementally loading the warehouse from an OLTP database. Our demonstration includes a visualization of recursive compilation and the map datastructures as applied to these demo queries, as well as the ability to trace and step through the generated map maintenance code. DBToaster generates extremely high performance native code database engines, and we will demonstrate the performance of these query processors compared to both popular open source database systems, as well as commercial-grade tools. You'll even have the opportunity to try out your own queries on our datasets! Look out for the toaster...

No comments:

Post a Comment