CMU-CS-24-152
Computer Science Department
School of Computer Science, Carnegie Mellon University



Machine Learning for Flash Caching in Bulk Storage Systems

Daniel Lin-Kit Wong

Ph.D. Thesis

September 2024

CMU-CS-24-152.pdf


Keywords: Flash caching, machine learning for caching, machine learning for systems, bulk storage systems

Flash caches are used to reduce peak backend load for throughput-constrained data center services, reducing the total number of backend servers required. Bulk storage systems are a large-scale example; backed by high-capacity but low-throughput hard disks, they use flash caches to provide a cost-effective storage layer underlying everything from blobstores to data warehouses.

However, flash has limited write endurance, so flash caches must limit their write rate to avoid premature device wear-out. Thus, most flash caches rely on admission policies that filter cache insertions to maximize the workload-reduction value of each flash write.

This dissertation evaluates and demonstrates potential uses of ML in place of traditional heuristic cache management policies for flash caches in bulk storage systems. The most successful elements of my research are embodied in a flash cache system called Baleen, which uses coordinated ML admission and prefetching to reduce peak backend load. After learning painful lessons with early ML policy attempts, I exploit a new cache residency model (episodes) to guide model training. I focus on optimizing an end-to-end metric (Disk-head Time) that measures backend load more accurately than IO miss rate or byte miss rate. Evaluation using 7-day Meta traces from 7 storage clusters shows that Baleen reduces Peak Disk-head Time (and hence backend hard disks required) by 12% over state-of-the-art policies under a fixed flash write rate constraint.
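
To make the Disk-head Time metric concrete, the following is a minimal, hypothetical Python sketch of how disk-head occupancy might be charged per backend fetch; the per-IO positioning overhead and sequential-bandwidth constants are illustrative assumptions, not values from the thesis.

    # Hypothetical sketch of Disk-head Time charged per backend fetch (cache miss).
    # Constants are illustrative assumptions, not measurements from the thesis.
    SEEK_TIME_S = 0.008          # assumed per-IO seek + rotational delay (seconds)
    DISK_BANDWIDTH_BPS = 150e6   # assumed sequential read bandwidth (bytes/second)

    def disk_head_time(bytes_fetched: int) -> float:
        """Seconds of disk-head occupancy consumed by one backend fetch."""
        return SEEK_TIME_S + bytes_fetched / DISK_BANDWIDTH_BPS

    # Unlike IO miss rate or byte miss rate, this weights misses by their cost to
    # the backend: a 4 KB miss is seek-dominated (~8 ms), a 128 MB miss ~0.9 s.
    window_load = sum(disk_head_time(size) for size in [4096, 131072, 128 * 2**20])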

I present a TCO (total cost of ownership) formula that weighs the cost of additional flash writes against reductions in Peak Disk-head Time, both expressed in terms of the flash drives and hard disks needed. Baleen-TCO chooses the optimal flash write rate and reduces estimated TCO by 17%.
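
As a rough illustration of the trade-off such a formula captures, here is a hypothetical sketch (not the thesis's actual cost model): backend cost scales with the hard disks needed to serve Peak Disk-head Time, while flash cost scales with how quickly the chosen write rate consumes a drive's rated endurance. All constants and the linear cost structure below are assumptions for illustration.

    # Hypothetical TCO sketch: hard-disk cost driven by Peak Disk-head Time versus
    # flash cost driven by write rate and endurance. Constants are assumptions.
    HDD_COST = 250.0               # assumed $ per hard disk
    FLASH_COST = 400.0             # assumed $ per flash drive
    FLASH_ENDURANCE_BYTES = 3e15   # assumed lifetime write endurance per flash drive
    LIFETIME_S = 5 * 365 * 86400   # assumed deployment lifetime (seconds)

    def estimated_tco(peak_disk_head_time_per_s: float, flash_write_rate_bps: float) -> float:
        """Hypothetical lifetime cost of the hard disks and flash drives needed."""
        # Each hard disk supplies one second of disk-head time per second.
        hdds_needed = peak_disk_head_time_per_s
        # Enough flash drives so that lifetime writes stay within rated endurance.
        flash_needed = flash_write_rate_bps * LIFETIME_S / FLASH_ENDURANCE_BYTES
        return hdds_needed * HDD_COST + flash_needed * FLASH_COST

    # Admitting more to flash lowers Peak Disk-head Time (fewer hard disks) but
    # raises the flash write rate (more flash drives); Baleen-TCO-style tuning
    # picks the write rate that minimizes this total.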

Workloads change over time, requiring that caches adapt to maintain performance. I present a strategy for peak load reduction that adapts selectivity to load levels. I also evaluate workload drift and its impact on ML policy performance using 30-day Meta traces. Baleen is the result of substantial exploration and experimentation with ML for caching. I present lessons learned from the additional strategies I considered and explain why they saw limited success on our workloads. These include enhancements to ML-based eviction, more complex ML models, and optimized use of DRAM in hybrid caches. I also present lessons from production ML deployments.

Code and traces are available at https://www.pdl.cmu.edu/CILES. These include our 7-day traces, which were the most extensive public collection of traces from a production bulk storage system at the time of writing.

127 pages

Thesis Committee:
Gregory R. Ganger (Chair)
David G. Andersen
Nathan Beckmann
Daniel S. Berger (Microsoft Research / University of Washington)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

