Computer Science Department
School of Computer Science, Carnegie Mellon University
Learning to Identify TV News Monologues by Style and Context
Cees G.M. Snoek, Alexander G. Hauptmann
Keywords: Multimodal analysis, semantic learning, classifier
ensembles, broadcast video, news subject monologue, style detectors,
context detectors, TRECVID benchmark.
We focus on the problem of learning semantics from multimedia data associated
with broadcast video documents. In this paper, we propose to learn semantic
concepts from multimodal sources using style and context detectors in
combination with statistical classifier ensembles. As a case study, we
present our method for detecting the concept of news subject monologues.
This approach achieved the best average precision among 26 submissions
to the 2003 video track of the Text Retrieval Conference (TRECVID)
benchmark. We report experiments on the contribution of individual
detectors, the size of the ensemble, and the ranking mechanism. The
results show that the combination of detectors is decisive for the final
result, even though some detectors appear useless in isolation. Moreover,
using a probabilistic ranking together with a large classifier ensemble
improves the results even further.
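The abstract does not spell out which ranking mechanisms were compared. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It contrasts ranking shots by the mean probability of a classifier ensemble with a hypothetical majority-vote count, and it scores both rankings with non-interpolated average precision. The function names, the 0.5 vote threshold, and the toy scores are invented for illustration.

```python
from typing import List, Optional, Sequence


def average_precision(ranked_relevance: Sequence[bool],
                      num_relevant: Optional[int] = None) -> float:
    """Non-interpolated average precision of a ranked list.

    Sums the precision at every rank that holds a relevant item and divides
    by the number of relevant items (by default, those found in the list).
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    total = hits if num_relevant is None else num_relevant
    return precision_sum / total if total else 0.0


def probabilistic_rank(scores: List[List[float]]) -> List[int]:
    """Rank shots by the mean probability over all classifiers.

    scores[m][i] is the probability (assumed calibrated) that classifier m
    assigns to shot i containing the concept.
    """
    n_shots = len(scores[0])
    mean = [sum(clf[i] for clf in scores) / len(scores) for i in range(n_shots)]
    return sorted(range(n_shots), key=lambda i: mean[i], reverse=True)


def vote_rank(scores: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Hypothetical baseline: rank shots by their number of positive votes.

    Tied shots keep their original order because Python's sort is stable.
    """
    n_shots = len(scores[0])
    votes = [sum(clf[i] >= threshold for clf in scores) for i in range(n_shots)]
    return sorted(range(n_shots), key=lambda i: votes[i], reverse=True)


# Toy data (invented): three classifiers score four shots; shots 0 and 2
# are news subject monologues.
labels = [True, False, True, False]
scores = [
    [0.90, 0.51, 0.95, 0.20],
    [0.80, 0.52, 0.95, 0.10],
    [0.70, 0.53, 0.40, 0.30],
]

for name, order in [("vote", vote_rank(scores)),
                    ("probabilistic", probabilistic_rank(scores))]:
    ap = average_precision([labels[i] for i in order])
    print(f"{name:>13}: order={order} AP={ap:.3f}")
```

On this toy data, the vote count ranks the shots as [0, 1, 2, 3] because shot 1 collects three weak votes, which gives an average precision of about 0.833. Averaging probabilities instead ranks them as [0, 2, 1, 3] and reaches 1.0, since a confidently detected monologue is no longer outranked by a shot that barely passes the threshold for every classifier.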