Computer Science Department
School of Computer Science, Carnegie Mellon University
QMAS: Querying, Mining and Summarization
Robson L. F. Cordeiro*, Fan Guo, Donna S. Haverkamp**,
This is an extended version of a paper to appear in the
Given a large collection of images, very few of which have labels given a priori, how can we automatically assign labels to the remaining majority, and make suggestions for images that may need brand new labels distinct from the existing ones? Popular automatic labeling techniques usually scale superlinearly with the size of the image set, and/or their performance degrades when only a few images bear initial labels. In this paper, we propose QMAS, an efficient solution to the following problems: (i) low-labor labeling (L3): given a collection of images, very few of which are already labeled with keywords, find the most suitable labels for the remaining ones; and (ii) mining and attention routing: with the same input set, output a number of top representative images and top outliers. We present an experimental evaluation on three data sets of proprietary and public satellite images of up to 2.25 GB in size. QMAS scales linearly with the number of images, obtaining better or equal accuracy while being up to 40 times faster than its baseline algorithm. When only a limited number of initial labels is available, QMAS achieves a significant accuracy margin over the baseline approach. We also illustrate the application of QMAS to recommending representatives and spotting outliers. The proposed framework could be generalized to solve similar content-based annotation and mining problems on other multi-modal databases.
*University of São Paulo, São Carlos, Brazil