CMU-CS-20-101
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-20-101

Memory-Efficient Search Trees for Database Management Systems

Huanchen Zhang

Ph.D. Thesis

February 2020

CMU-CS-20-101.pdf


Keywords: Ssearch tree, memory-efficiency, database management system, indexing, range filtering, succinct data structure, key compression

The growing cost gap between DRAM and storage together with increasing database sizes means that database management systems (DBMSs) now operate with a lower memory to storage size ratio than before. On the other hand, modern DBMSs rely on in-memory search trees (e.g., indexes and filters) to achieve high throughput and low latency. These search trees, however, consume a large portion of the total memory available to the DBMS. this dissertation seeks to address the challenge of building compact yet fast in-memory search trees to allow more efficient use of memory in data processing systems. We first present techniques to obtain maximum compression on fast read-optimized search trees. We identified sources of memory waste in existing trees and designed new succinct data structures to reduce the memory to the theoretical limit. We then introduce ways to amortize the cost of modifying static data structures with bounded and modest cost in performance and space. Finally, we approach the search tree compression problem from an orthogonal direction by building a fast string compressor that can encode arbitrary input keys while preserving their order. Together, these three pieces form a practical recipe for achieving memory-efficiency in search trees and in DBMSs.

159 pages

Thesis Committee:
David G. Andersen (Chair)
Michael Kaminsky
Andrew Pavlo
Kimberly Keeton (Hewlett-Packard Labs)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science


Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu