CMU-CS-03-193
Computer Science Department
School of Computer Science, Carnegie Mellon University




Learning to Identify TV News Monologues by Style and Context

Cees G.M. Snoek, Alexander G. Hauptmann

October 2003



Keywords: Multimodal analysis, semantic learning, classifier ensembles, broadcast video, news subject monologue, style detectors, context detectors, TRECVID benchmark.


We address the problem of learning semantics from the multimedia data in broadcast video documents. We propose to learn semantic concepts from multimodal sources using style and context detectors in combination with statistical classifier ensembles. As a case study, we present our method for detecting the concept of news subject monologues. This approach achieved the best average precision among 26 submissions in the 2003 video track of the Text Retrieval Conference (TRECVID) benchmark. We ran experiments on individual detector contribution, ensemble size, and ranking mechanism. The combination of detectors proved decisive for the final result, even though some detectors appear useless in isolation. Moreover, a probabilistic ranking combined with a large classifier ensemble improves the results further.
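
The two evaluation ideas named in the abstract, probabilistic ranking from a classifier ensemble and average precision, can be illustrated with a minimal Python sketch. This is not the report's implementation: the function names, the choice of averaging the ensemble members' probabilities, and the toy scores are all assumptions made for illustration, and the average precision here is normalized by the number of relevant items in the supplied list rather than by a benchmark's full ground truth.

```python
from typing import List, Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Non-interpolated average precision of a ranked list.

    Assumption: normalizes by the number of relevant items present in
    the list itself, not by an external ground-truth count.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def rank_by_mean_probability(ensemble_probs: List[List[float]]) -> List[int]:
    """Rank shots by the mean probability over all ensemble members.

    ensemble_probs[k][i] is the (hypothetical) probability that
    classifier k assigns to shot i. Returns shot indices, best first.
    """
    n_classifiers = len(ensemble_probs)
    n_shots = len(ensemble_probs[0])
    means = [
        sum(probs[i] for probs in ensemble_probs) / n_classifiers
        for i in range(n_shots)
    ]
    return sorted(range(n_shots), key=lambda i: means[i], reverse=True)


# Toy data: three classifiers scoring three shots (values are made up).
ensemble = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.8],
    [0.8, 0.1, 0.5],
]
labels = [True, False, True]  # hypothetical ground truth per shot

order = rank_by_mean_probability(ensemble)
ap = average_precision([labels[i] for i in order])
```

Averaging is only one way to fuse ensemble outputs; it is used here because it keeps the scores interpretable as probabilities, which is what lets the final list be ranked rather than merely thresholded.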

23 pages

