CMU-CS-97-175
Computer Science Department
School of Computer Science, Carnegie Mellon University




Predicting Data Cache Misses in Non-Numeric Applications Through Correlation Profiling

Todd C. Mowry, Chi-Keung Luk*

September 1997

An abbreviated version of this paper will appear in the
Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, February 1-4, 1998.

CMU-CS-97-175.ps


Keywords: Cache memories, performance of systems (measurement techniques, performance attributes), data structures (graphs, lists, trees), compilers


Software-based latency tolerance techniques offer the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. However, to fully exploit the benefit of these techniques, one must be careful to apply them only to the dynamic references that are likely to suffer cache misses; otherwise the runtime overheads can offset any gains. In this paper, we focus on isolating dynamic miss instances in non-numeric applications, which is a difficult but important problem. Although compilers cannot statically analyze data locality in non-numeric applications, one viable approach is to use profiling information to measure the actual miss behavior. Unfortunately, the state of the art in cache miss profiling (which we call summary profiling) is inadequate for references with intermediate miss ratios: it either misses opportunities to hide latency, or else inserts unnecessary overhead. To overcome this problem, we propose and evaluate a new profiling technique that helps predict which dynamic instances of a static memory reference will hit or miss in the cache: correlation profiling.

Our experimental results demonstrate that roughly half of the 22 non-numeric applications we study can potentially enjoy significant reductions in memory stall time by exploiting at least one of the three forms of correlation profiling we consider: control-flow correlation, self correlation, and global correlation. In addition, our detailed case studies illustrate that self correlation succeeds because a given reference's cache outcomes often contain repeated patterns, and control-flow correlation succeeds because cache outcomes are often call-chain dependent. We also demonstrate that software prefetching can achieve better performance on a modern superscalar processor when directed by correlation profiling rather than summary profiling information.
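
As a rough illustration of the difference between summary profiling and self correlation, the following Python sketch profiles a single static reference. It is not the authors' implementation: the trace encoding (1 = miss, 0 = hit), the function names, the history depth of 3, and the example trace are all assumptions made for illustration.

```python
from collections import defaultdict

def summary_profile(outcomes):
    """Summary profiling: one miss ratio for the whole static reference."""
    return sum(outcomes) / len(outcomes)

def self_correlation_profile(outcomes, depth=3):
    """Self correlation (illustrative): miss ratio of the reference,
    keyed by its own previous `depth` cache outcomes."""
    counts = defaultdict(lambda: [0, 0])  # history -> [misses, total]
    for i in range(depth, len(outcomes)):
        history = tuple(outcomes[i - depth:i])
        counts[history][0] += outcomes[i]
        counts[history][1] += 1
    return {h: misses / total for h, (misses, total) in counts.items()}

# Hypothetical trace: the reference misses on every fourth access.
trace = [1, 0, 0, 0] * 8

overall = summary_profile(trace)             # intermediate miss ratio
by_history = self_correlation_profile(trace)  # per-history miss ratios

for history, ratio in sorted(by_history.items()):
    print(history, ratio)
```

In this hypothetical trace the summary miss ratio is intermediate (0.25), which gives no guidance about which dynamic instances miss, whereas each history pattern maps to a miss ratio of either 0 or 1. A control-flow variant could be sketched the same way by keying the counts on the call chain rather than on past outcomes.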

26 pages

*Department of Computer Science, University of Toronto, Toronto, Ontario, Canada, M5S 3G4

