CMU-CS-25-133
Computer Science Department
School of Computer Science, Carnegie Mellon University



Cost-Efficient Storage and Caching in Public Clouds

Hojin Park

Ph.D. Thesis

September 2025

CMU-CS-25-133.pdf


Keywords: Cloud Storage, Public Cloud Resource Provisioning, Cross-Cloud/Cross-Region Data Access, Caching, Cache Prefetching, System Auto-Configuration

As modern data-intensive workloads increasingly migrate to the public cloud, managing the resulting costs has become a pressing challenge, even with the operational simplicity and elasticity that cloud environments offer. Although many cost-optimization efforts have focused on computation, storage-related costs have received comparatively little attention despite accounting for a significant portion of total cloud spending. In particular, two categories dominate storage-related costs in the public cloud: the cost of deploying and operating storage clusters in the cloud, and the cost of accessing data across geographically distributed regions or clouds. Existing optimization techniques developed for on-premises environments cannot address these challenges effectively, because they often overlook the unique characteristics of public clouds, including elastic resource provisioning, diverse cost-performance trade-offs, and the dynamic and distinctive access patterns found in cloud object storage workloads.

This dissertation addresses these challenges by proposing a cost-efficient approach to designing storage and caching systems that are cloud-aware, elastic, and adaptive to workload behavior. It introduces three systems that target key aspects of cloud storage cost optimization. First, Mimir reduces the cost of deploying storage clusters by automatically selecting cost-effective combinations of virtual machines and block storage types, based on profiling workload characteristics and benchmarking available resource options. Second, Macaron reduces cross-region and cross-cloud data access costs by auto-configuring a cache with a tiered storage architecture that leverages low-cost object storage and dynamically resizes the cache as the workload changes. Third, Macaron+ builds on Macaron by introducing a cost-aware prefetching technique that analyzes object-level access patterns to reduce latency in workloads with high cold miss ratios while preserving cost-efficiency. Together, these systems demonstrate that tailoring automated resource selection, adaptive configuration, and predictive techniques to the characteristics of the public cloud can significantly reduce the cost of storing and accessing data.

132 pages

Thesis Committee:
George Amvrosiadis (Co-Chair)
Gregory R. Ganger (Co-Chair)
Jignesh M. Patel
Carlo Curino (Microsoft Research)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Creative Commons: CC-BY (Attribution)

