Computer Science Department
School of Computer Science, Carnegie Mellon University
Energy-efficient Data-intensive Computing
Vijay R. Vasudevan
Large-scale data-intensive computing systems have become a critical foundation for Internet-scale services. Their widespread growth during the past decade has raised datacenter energy demand and created an increasingly large financial burden and scaling challenge: Peak energy requirements today are a significant cost of provisioning and operating datacenters. In this thesis, we propose to reduce the peak energy consumption of datacenters by using a FAWN: A Fast Array of Wimpy Nodes. FAWN is an approach to building datacenter server clusters using low-cost, low-power servers that are individually optimized for energy efficiency rather than raw performance alone. FAWN systems, however, have a different set of resource constraints than traditional systems that can prevent existing software from reaping the improved energy efficiency benefits FAWN systems can provide.
This dissertation describes the principles behind FAWN and the software techniques necessary to unlock its energy efficiency potential. First, we present a deep study into building FAWN-KV, a distributed, log-structured key-value storage system designed for an early FAWN prototype. Second, we present a broader classification and workload analysis showing when FAWN can be more energy-efficient and under what workload conditions a FAWN cluster would perform poorly in comparison to a smaller number of high-speed systems. Last, we describe modern trends that portend a narrowing gap between CPU and I/O capability and highlight the challenges endemic to all future balanced systems. Using FAWN as an early example, we demonstrate that pervasive use of "vector interfaces" throughout distributed storage systems can improve throughput by an order of magnitude and eliminate the redundant work found in many data-intensive workloads.