Computer Science Department
School of Computer Science, Carnegie Mellon University
Informedia News-on-Demand:
Using Speech Recognition to Create a Digital Video Library
Howard D. Wactlar, Alexander G. Hauptmann, Michael J. Witbrock*
This work was first presented at the 1996 DARPA Spoken Language Technology
Workshop, Arden House, Harriman, NY, February 1996.
Keywords: Digital libraries, digital video, speech recognition,
image analysis, video segmentation, information retrieval, spoken
document retrieval, speech interfaces, Informedia
In theory, speech recognition technology can make any spoken words in
video or audio media usable for text indexing, search and retrieval.
This article describes the News-on-Demand application created within the
Informedia™ Digital Video Library project and discusses how speech
recognition is used in transcript creation from video, alignment with
closed-captioned transcripts, audio paragraph segmentation and a spoken
query interface. Speech recognition accuracy varies dramatically
depending on the quality and type of data used. Informal information
retrieval tests show that reasonable recall and precision can be
obtained with only moderate speech recognition accuracy.
*Justsystem Pittsburgh Research Center, 4616 Henry Street, Pittsburgh, PA 15213. The work described in this paper was done while M. Witbrock was an employee of Carnegie Mellon University.