Computer Science Department
School of Computer Science, Carnegie Mellon University
CMU-CS-97-152
Large-scale Topic Detection and Language Model Adaptation
Kristie Seymore, Ronald Rosenfeld
June 1997
CMU-CS-97-152.ps
Keywords: Speech recognition, statistical language modeling, topic
detection, topic adaptation, document clustering
The subject matter of any conversation or document can typically be
described as some combination of elemental topics. We have developed
a language model adaptation scheme that takes a piece of text, chooses
the most similar topic clusters from a set of over 5000 elemental topics,
and uses topic-specific language models built from the topic clusters to
rescore N-best lists. We achieve a 15% reduction in perplexity
and a small improvement in word error rate by using this adaptation. We also
investigate the use of a topic tree, where the amount of training data for
a specific topic can be judiciously increased in cases where the elemental
topic cluster has too few word tokens to build a reliably smoothed and
representative language model. Our system is able to fine-tune topic
adaptation by interpolating models chosen from thousands of topics,
allowing for adaptation to unique, previously unseen combinations of
subjects.
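
The adaptation scheme described above (pick the topic clusters most similar to a piece of text, then interpolate their language models) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the report: cosine similarity over raw word counts as the similarity measure, add-one smoothed unigram models in place of full N-gram models, fixed interpolation weights, and a three-cluster toy corpus standing in for the 5000+ elemental topics. All function and variable names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_topics(text_tokens, clusters, k):
    """Return the names of the k clusters most similar to the text."""
    query = Counter(text_tokens)
    ranked = sorted(
        clusters,
        key=lambda name: cosine(query, Counter(clusters[name])),
        reverse=True,
    )
    return ranked[:k]


def unigram_model(tokens, vocab):
    """Add-one smoothed unigram model over a fixed vocabulary (an assumption)."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(models, weights):
    """Linearly interpolate models that share one vocabulary."""
    return {
        w: sum(lam * m[w] for lam, m in zip(weights, models))
        for w in models[0]
    }


def perplexity(model, tokens):
    """Per-token perplexity of a unigram model on a token sequence."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy topic clusters (hypothetical data).
clusters = {
    "finance": "stock market shares bank stock price".split(),
    "sports": "team game score win team coach".split(),
    "weather": "rain storm wind cloud rain sun".split(),
}
vocab = sorted({w for toks in clusters.values() for w in toks})
all_tokens = [w for toks in clusters.values() for w in toks]
general = unigram_model(all_tokens, vocab)

# Text to adapt to; it mixes two subjects.
adaptation_text = "bank stock price game".split()
chosen = top_topics(adaptation_text, clusters, k=2)
topic_models = [unigram_model(clusters[name], vocab) for name in chosen]

# Fixed weights (an assumption): half general model, half split across topics.
weights = [0.5] + [0.5 / len(topic_models)] * len(topic_models)
adapted = interpolate([general] + topic_models, weights)

print(chosen)
print(perplexity(general, adaptation_text), perplexity(adapted, adaptation_text))
```

In this sketch the adapted model assigns higher probability to the words of the mixed-subject text than the general model does, which is the effect the perplexity reduction measures. In practice the interpolation weights would be estimated from data rather than fixed by hand.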
20 pages