CMU-CS-24-107
Computer Science Department
School of Computer Science, Carnegie Mellon University
On Embedding Database Management System Logic in
Matthew Butrovich
Ph.D. Thesis
May 2024
Continual improvements in storage and network performance mean that disk I/O and network communication are often no longer the bottlenecks in database management systems (DBMSs). Instead, the overheads of operating system (OS) services (e.g., system calls, thread scheduling, and data movement from kernel-space) limit query processing responsiveness. User-space applications like DBMSs can elide these overheads with a kernel-bypass design. However, extracting benefits from kernel-bypass frameworks is challenging, and their libraries are incompatible with standard deployment and debugging tools. This dissertation presents an alternative implementation strategy for DBMSs called user-bypass: a design that extends OS behavior with DBMS-specific features, including observability, networking, and query execution. Historically, DBMS developers have avoided kernel extensions for safety and security reasons, but recent improvements in OS extensibility present new opportunities. With user-bypass, developers write safe, event-driven programs that push DBMS logic into the kernel and avoid user-space overheads. When a DBMS in user-space invokes these programs, user-bypass behaves like a new OS system call, but without requiring kernel modifications. Alternatively, when an OS thread or interrupt triggers these programs in kernel-space, user-bypass inserts DBMS logic into the kernel stack. In this dissertation, we introduce three systems that apply user-bypass to DBMSs. First, we present an observability framework that uses user-bypass to collect training data for self-driving DBMSs while reducing the number of round trips to kernel-space needed to retrieve performance counters and other system metrics. Next, we present a database proxy that applies user-bypass to support features like connection pooling and workload replication while reducing data copying and user-space thread scheduling.
Lastly, we present an embedded DBMS that provides ACID transactions over multi-versioned data in kernel-space. The techniques in this dissertation demonstrate the benefits of user-bypass across multiple DBMS design disciplines and provide a template for future DBMS and OS co-design.
132 pages
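The abstract does not say how the embedded DBMS decides which version of a tuple a transaction sees. For readers unfamiliar with multi-versioned storage, here is a minimal C sketch of the common timestamp-based visibility rule used by many MVCC systems. The struct layout, field names, and functions (`struct version`, `version_visible`, `read_at`) are hypothetical and not taken from the thesis. The sketch also ignores everything kernel-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel end timestamp for a version that has not been overwritten yet. */
#define TS_INFINITY UINT64_MAX

/* One version of a tuple in a newest-to-oldest chain (hypothetical layout). */
struct version {
    uint64_t begin_ts;            /* commit timestamp of the creating writer */
    uint64_t end_ts;              /* commit timestamp of the replacing writer */
    int64_t value;                /* payload, simplified to a single integer */
    const struct version *older;  /* next-older version, or NULL */
};

/* Generic snapshot visibility rule: a version is visible to a reader with
 * snapshot timestamp read_ts iff begin_ts <= read_ts < end_ts. */
static bool version_visible(const struct version *v, uint64_t read_ts) {
    assert(v->begin_ts < v->end_ts); /* invariant of a well-formed chain */
    return v->begin_ts <= read_ts && read_ts < v->end_ts;
}

/* Walk the chain from newest to oldest and return the first version visible
 * at read_ts, or NULL if the tuple did not exist at that snapshot. */
static const struct version *read_at(const struct version *newest,
                                     uint64_t read_ts) {
    for (const struct version *v = newest; v != NULL; v = v->older) {
        if (version_visible(v, read_ts))
            return v;
    }
    return NULL;
}
```

For example, suppose a tuple was written at timestamp 10 and overwritten at timestamp 20. A reader with snapshot 15 gets the older value, and a reader with snapshot 25 gets the newer one. Walking newest-to-oldest lets recent snapshots stop at the head of the chain. A real system also needs to garbage-collect versions that no active snapshot can still see, which this sketch omits.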
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department