Computer Science Department
School of Computer Science, Carnegie Mellon University
Rethinking Storage for Discard-Based Search
Lily Mummert*, Steve Schlosser*, Mike Mesnier*, M. Satyanarayanan
The workload characteristics of content-based image retrieval are poorly suited to existing storage architectures. This is particularly true for discard-based search, where images are filtered on demand, potentially at the storage devices themselves (an active disk technique referred to as early discard). In general, discard-based search is a highly concurrent, sequential, and read-only workload, making it ideally suited to JBOD ("just a bunch of disks"), as opposed to the more familiar RAID configuration. Further, as with most image databases, no specific order is imposed on the retrieved images. In the context of discard-based search, such any-order semantics open up a variety of opportunities for the storage system designer (e.g., an I/O coalescing technique called "bandwagon synchronization").
This paper examines the storage workloads of discard-based search and discusses their implications for a new storage system designed specifically for content-based image retrieval. In addition, representative synthetic workloads are used to demonstrate the efficiency of JBOD over RAID and to quantify the benefits of bandwagon synchronization. We show that JBOD achieves 80-90% of an array's bandwidth (depending on the level of I/O concurrency and the average object size), compared to 40-80% for RAID-0.
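The abstract does not spell out how bandwagon synchronization works, so the following is only an illustrative sketch of how any-order semantics can allow I/O coalescing. It assumes that a newly arriving search joins a circular scan already in progress at its current position and wraps around, so that each object read from disk is delivered to every active search. The names `Search` and `shared_scan`, the unit-time read model, and the arrival schedule are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Search:
    name: str
    start: int                                 # scan position at which the search joined
    seen: list = field(default_factory=list)   # object ids in delivery order


def shared_scan(num_objects, arrivals):
    """Simulate one circular scan shared by all concurrent searches.

    `arrivals` maps a non-negative time step to the names of the searches
    that arrive at that step. Each step performs at most one disk read and
    hands the object to every active search. Returns (disk_reads, finished).
    """
    pending = dict(arrivals)
    active, finished = [], []
    pos = reads = step = 0
    while pending or active:
        # New searches join the scan wherever it currently is.
        for name in pending.pop(step, []):
            active.append(Search(name, start=pos))
        if active:
            reads += 1                          # one physical read...
            for s in active:
                s.seen.append(pos)              # ...shared by all active searches
            pos = (pos + 1) % num_objects
            # A search retires once it has seen every object exactly once.
            still_active = []
            for s in active:
                (finished if len(s.seen) == num_objects else still_active).append(s)
            active = still_active
        step += 1
    return reads, finished


if __name__ == "__main__":
    reads, finished = shared_scan(8, {0: ["A"], 3: ["B"]})
    print("disk reads:", reads)
    for s in finished:
        print(s.name, "joined at", s.start, "order:", s.seen)
```

In this toy schedule, search B arrives while A is partway through the scan and sees the objects in a rotated order. This is acceptable only because no retrieval order is imposed. Two independent scans would need 16 reads for 8 objects, whereas the shared scan needs fewer because the overlapping portion is read once.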
*Intel Research Pittsburgh