CMU-CS-00-177
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-00-177

Improving Index Performance through Prefetching

Shimin Chen, Phillip B. Gibbons*, Todd C. Mowry

CMU-CS-00-177.ps
CMU-CS-00-177.pdf


Keywords: Cache performance, prefetching, databases, B-Tree indices


In recognition of the crucial role that cache hierarchies play in database performance, recent studies have revisited core database algorithms and data structures in an effort to reduce the number of cache misses. While these efforts to avoid cache misses are certainly helpful, they are not a complete solution for two reasons. First, a large number of cache misses still remain that cannot be eliminated. Second, because modern processors support prefetching and other mechanisms to potentially overlap cache misses with computation and other misses, it is not the total number of cache misses that dictates performance, but rather the total amount of exposed miss latency. Hence an algorithm that is more amenable to prefetching can potentially outperform an algorithm with fewer cache misses.

In this paper, we propose and evaluate Prefetching B+-Trees (pB+-Trees). Such trees are designed to exploit prefetching to accelerate two important operations on B+-Tree indices: searches and range scans. To accelerate searches, pB+-Trees use prefetching to effectively create wider nodes than the natural data transfer size: e.g., eight vs. one cache lines or disk pages. These wider nodes reduce the height of the B+-Tree, thereby decreasing the number of expensive misses when going from parent to child without significantly increasing the cost of fetching a given node. Our results show that this technique speeds up search, insertion, and deletion times by a factor of 1.2--1.5 for main-memory B+-Trees. In addition, it outperforms and is complementary to "Cache-Sensitive B+-Trees." To accelerate range scans, pB+-Trees provide arrays of pointers to their leaf nodes. These allow the pB+-Trees to prefetch arbitrarily far ahead, even for nonclustered indices, thereby hiding the normally expensive cache misses associated with traversing the leaves within the range. Our results show that this technique yields over a sixfold speedup on range scans of 1000+ keys. Although our experimental evaluation focuses on main memory databases, the techniques that we propose are also applicable to hiding disk latency.

26 pages

*Information Sciences Research Center, Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974


Return to: SCS Technical Report Collection
School of Computer Science homepage

This page maintained by reports@cs.cmu.edu