Computer Science Department
School of Computer Science, Carnegie Mellon University


Redesigning Database Systems in Light of
CPU Cache Prefetching

Shimin Chen

December 2005

Ph.D. Thesis

Keywords: Cache Prefetching, Database Systems, CPU Cache Performance, Data Locality Optimizations, B+-Trees, Hash Joins

Computer systems have enjoyed an exponential growth in processor speed for the past 20 years, while main memory speed has improved only moderately. Today a cache miss to main memory takes hundreds of processor cycles. Recent studies have demonstrated that on commercial databases, about 50% or more of execution time in memory is often wasted due to cache misses. In light of this problem, a number of recent studies focused on reducing the number of cache misses of database algorithms. In this thesis, we investigate a different approach: reducing the impact of cache misses through a technique called cache prefetching. Since prefetching for sequential array accesses has been well studied, we are interested in studying non-contiguous access patterns found in two classes of database algorithms: the B+-Tree index algorithm and the hash join algorithm. We re-examine their designs with cache prefetching in mind, and combine prefetching and data locality optimizations to achieve good cache performance.

For B+-Trees, we first propose and evaluate a novel main memory index structure, Prefetching B+-Trees, which uses prefetching to accelerate two major access patterns of B+-Tree indices: searches and range scans. We then apply our findings in the development of a novel index structure, Fractal Prefetching B+-Trees, that optimizes index operations both for CPU cache performance and for disk performance in commercial database systems by intelligently embedding cache-optimized trees into disk pages.

For hash joins, we first exploit cache prefetching separately for the I/O partition phase and the join phase of the algorithm. We propose and evaluate two techniques, Group Prefetching and Software-Pipelined Prefetching, that exploit inter-tuple parallelism to overlap cache misses across the processing of multiple tuples. Then we present a novel algorithm, Inspector Joins, that exploits the free information obtained from one pass of the hash join algorithm to improve the performance of a later pass. This new algorithm addresses the memory bandwidth sharing problem in shared-bus multiprocessor systems.

We compare our techniques against state-of-the-art cache-friendly algorithms for B+-Trees and hash joins through both simulation studies and real machine experiments. Our experimental results demonstrate dramatic performance benefits of our cache prefetching enabled techniques.

227 pages

Return to: SCS Technical Report Collection
School of Computer Science homepage

This page maintained by