COMPUTER SCIENCE TECHNICAL REPORT ABSTRACTS

CMU-CS-11-131
Computer Science Department
School of Computer Science, Carnegie Mellon University

Energy-efficient Data-intensive Computing
with a Fast Array of Wimpy Nodes

Vijay R. Vasudevan

October 2011

Ph.D. Thesis

CMU-CS-11-131.pdf

Keywords: Energy Efficiency, Low Power, Cluster Computing, Flash

Large-scale data-intensive computing systems have become a critical foundation for Internet-scale services. Their widespread growth during the past decade has raised datacenter energy demand and created an increasingly large financial burden and scaling challenge: Peak energy requirements today are a significant cost of provisioning and operating datacenters. In this thesis, we propose to reduce the peak energy consumption of datacenters by using a FAWN: A Fast Array of Wimpy Nodes. FAWN is an approach to building datacenter server clusters using low-cost, low-power servers that are individually optimized for energy efficiency rather than raw performance alone. FAWN systems, however, have a different set of resource constraints than traditional systems that can prevent existing software from reaping the improved energy efficiency benefits FAWN systems can provide.

This dissertation describes the principles behind FAWN and the software techniques necessary to unlock its energy efficiency potential. First, we present a deep study into building FAWN-KV, a distributed, log-structured key-value storage system designed for an early FAWN prototype. Second, we present a broader classification and workload analysis showing when FAWN can be more energy-efficient and under what workload conditions a FAWN cluster would perform poorly in comparison to a smaller number of high-speed systems. Last, we describe modern trends that portend a narrowing gap between CPU and I/O capability and highlight the challenges endemic to all future balanced systems. Using FAWN as an early example, we demonstrate that pervasive use of "vector interfaces" throughout distributed storage systems can improve throughput by an order of magnitude and eliminate the redundant work found in many data-intensive workloads.

154 pages
