COMPUTER SCIENCE TECHNICAL REPORT ABSTRACTS

CMU-CS-10-144
Computer Science Department
School of Computer Science, Carnegie Mellon University

QMAS: Querying, Mining and Summarization
of Multi-modal Databases

Robson L. F. Cordeiro*, Fan Guo, Donna S. Haverkamp**,
James H. Horne**, Ellen K. Hughes**, Gunhee Kim,
Agma J. M. Traina*, Caetano Traina Jr.*, Christos Faloutsos

November 2010

This is an extended version of a paper to appear in the
ICDM'10 conference proceedings by the same authors.

Keywords: Attention routing, low-labor labeling, multi-modal databases

Given a large collection of images, very few of which have labels given a priori, how can we automatically assign labels to the remaining majority, and make suggestions for images that may need brand new labels distinct from the existing ones? Popular automatic labeling techniques usually scale superlinearly with the size of the image set, and/or their performance degrades when only a few images bear initial labels. In this paper, we propose QMAS, an efficient solution to the following problems: (i) low-labor labeling (L3): given a collection of images, very few of which are already labeled with keywords, find the most suitable labels for the remaining ones; and (ii) mining and attention routing: with the same input set, output a number of top representative images and top outliers. We present an experimental evaluation on three data sets of proprietary and public satellite images, up to 2.25 GB in size. QMAS scales linearly with the number of images, matching or exceeding the accuracy of its baseline algorithm while running up to 40 times faster. When few initial labels are available, QMAS achieves a significant accuracy margin over the baseline approach. We also illustrate the use of QMAS to recommend representatives and spot outliers. The proposed framework could be generalized to solve similar content-based annotation and mining problems on other multi-modal databases.

25 pages

*University of São Paulo, São Carlos, Brazil
**Science Applications International Corporation, McLean, VA
