CMU-CS-97-173
Computer Science Department
School of Computer Science, Carnegie Mellon University
Lattice Based Language Models
Pierre Dupont*, Ronald Rosenfeld
September 1997
Keywords: Speech recognition, statistical language modeling, lattice
based models, smoothing techniques
This paper introduces lattice based language models, a new language
modeling paradigm. These models construct multi-dimensional
hierarchies of partitions and select the most promising partitions to
generate the estimated distributions. We discuss a specific
two-dimensional lattice and propose two primary features to measure the
usefulness of each node: the training-set history count and the
smoothed entropy of its prediction. Smoothing techniques are reviewed
and a generalization of the conventional backoff strategy to multiple
dimensions is proposed. Preliminary experimental results obtained
on the SWITCHBOARD corpus show a 6.5% perplexity reduction
over a word trigram model.
28 pages
*Department of Mathematics, University Jean Monnet, 23 rue P. Michelon,
42023 Saint-Etienne Cedex, France.