Computer Science Department
School of Computer Science, Carnegie Mellon University


The OpenDiamond® Platform for
Discard-based Search

M. Satyanarayanan, Rahul Sukthankar*, Adam Goode,
Larry Huston, Lily Mummert*, Adam Wolbach,
Jan Harkes, Richard Gass*, Steve Schlosser*

May 2008


Keywords: Middleware, open source, medical images, image processing, pattern recognition, self-tuning, load balancing, result caching, just-in-time indexing, interactive search, database, anomaly detection, similarity search, pathology, radiology, mammograms, whole-slide image, active disk, smart storage, object storage, coarse-grain parallelism, ImageJ

Interactive exploration of large distributed collections of complex, non-text data such as medical images is a challenging task because of the difficulty of creating useful indexes. To handle such tasks, we introduce a new approach to search called discard-based search. In contrast to classic search strategies that precompute indexes for all anticipated queries, discard-based search is an on-demand strategy that performs content-based computation in response to a specific query. This simple change in strategy turns out to have deep consequences for flexibility and user control, while also enabling easy exploitation of CPU and storage parallelism on servers. This paper presents the design and implementation of the OpenDiamond platform for discard-based search, describes some of the applications that have been built with it, and offers experimental evidence that its workloads exhibit easily exploitable storage parallelism.
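
The core idea of discard-based search, as described in the abstract, can be illustrated with a minimal sketch. This is not the OpenDiamond API: the function name `discard_search`, the dictionary-based object representation, and the two example filters are all hypothetical, chosen only to show query-time filtering with no precomputed index.

```python
from typing import Callable, Iterable, Iterator, Sequence

# Hypothetical types: an object is a dict of attributes, and a filter is any
# predicate computed on demand from the object's content.
Filter = Callable[[dict], bool]


def discard_search(objects: Iterable[dict],
                   filters: Sequence[Filter]) -> Iterator[dict]:
    """Yield objects that pass every filter.

    No index is consulted. Each object is examined at query time, and
    all() stops at the first failing filter, so a rejected object is
    discarded without running the remaining filters.
    """
    for obj in objects:
        if all(f(obj) for f in filters):
            yield obj


# Illustrative data only; these are not results from the paper.
images = [
    {"name": "a", "width": 512, "mean_intensity": 0.82},
    {"name": "b", "width": 128, "mean_intensity": 0.90},
    {"name": "c", "width": 1024, "mean_intensity": 0.30},
]

# The filter chain is supplied with the query, so the user can change it
# between searches without rebuilding anything.
filters = [
    lambda o: o["width"] >= 256,
    lambda o: o["mean_intensity"] > 0.5,
]

results = [o["name"] for o in discard_search(images, filters)]
print(results)
```

Because each object is evaluated independently of the others, a loop of this shape could in principle be partitioned across servers or disks, which is the kind of parallelism the abstract refers to.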

22 pages

*Intel Research Pittsburgh
