COMPUTER SCIENCE TECHNICAL REPORT ABSTRACTS

CMU-CS-21-106
Computer Science Department
School of Computer Science, Carnegie Mellon University

CMU-CS-21-106

On Building Robustness into Compilationn-Based
Main-Memory Database Query Engines

Prashanth Menon

Ph.D. Thesis

April 2021

CMU-CS-21-106.pdf

Keywords: Code Generation, Query Compilation, Adaptive Query Processing, Vectorized Processing

Relational database management systems (DBMS) are the bedrock upon which modern data processing intensive applications are assembled. Critical to ensuring low-latency queries is the efficiency of the DBMSs query processor. Just-in-time(JIT) query compilation is a popular technique to improve analytical query processing performance. However, a compiled query cannot overcome poor choices made by the DBMSs optimizer. A lousy query plan results in lousy query code. Poor query plans often arise and for many reasons. Although there is a large body of work exploring how a query processor can adapt itself at runtime to compensate fori nadequate plans, these techniques do not work in DBMSs that rely on compiling queries.

This dissertation presents multiple effective, practical, and complementary techniques to build adaptive query processing into compilation-based engines with negligible overhead. First, we propose a method that intelligently blends two otherwise disparate query processing a pproaches (compilation and vectorization) into one engine. This necessary first step allows operators to optimize themselves using a combination of software memory prefetching and single instruction, multiple data (SIMD) vectorization resulting in improved performance. Next, we present a framework that builds upon our previous work to allow the DBMS to modify compiled queries without recompiling the plan or generating code speculatively. This technique enables more extensive groups of operators in a query to coordinate their optimization process. Finally, we present a method that decomposes query plans into fragments that can be compiled and executed independently. This not only reduces compilation overhead but enables the DBMS to learn properties about data processed in an earlier phase of the query to hyper-optimize the code it generates for later phases.

Collectively, the techniques proposed in this dissertation enable any compilation-based DBMS to achieve dynamic runtime robustness without succumbing to any of its overheads.

124 pages

Thesis Committee:
Andrew Pavlo (Co-Chair)
Todd C. Mowry (Co-Chair)
Jonathan Aldrich
Thomas Neumann (Technische Universität München)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu