Computer Science Department
School of Computer Science, Carnegie Mellon University


Searching Complex Data Without an Index

Mahadev Satyanarayanan, Rahul Sukthankar*, Adam Goode,
Nilton Bila**, Lily Mummert*, Jan Harkes, Adam Wolbach,
Larry Huston, Eyal de Lara**

December 2009


Keywords: Data-intensive computing, non-text search technology, medical image processing, interactive search, computer vision, pattern recognition, distributed systems, ImageJ, MATLAB, parallel processing, human-in-the-loop, Diamond, OpenDiamond

We show how query-specific content-based computation pipelined with human cognition can be used for interactive search when a pre-computed index is not available. More specifically, we use query-specific parallel computation on large collections of complex data spread across multiple Internet servers to shrink a search task down to human scale. The expertise, judgement, and intuition of the user performing the search can then be brought to bear on the specificity and selectivity of the current search. Rather than text or numeric data, our focus is on complex data such as digital photographs and medical images. We describe Diamond, a system that can perform such interactive searches on stored data as well as live Web data. Diamond is able to narrow the focus of a non-indexed search by using structured data sources such as relational databases. It can also leverage domain-specific software tools in search computations. We report on the design and implementation of Diamond, and its use in the health sciences.

24 pages

*Intel Labs Pittsburgh
**University of Toronto
