CMU-CS-97-173 Computer Science Department
 School of Computer Science, Carnegie Mellon University
 
    
     
 
Lattice Based Language Models 
Pierre Dupont*, Ronald Rosenfeld 
September 1997   
CMU-CS-97-173.ps
Keywords: Speech recognition, statistical language modeling, lattice
based models, smoothing techniques
 This paper introduces lattice based language models, a new language
modeling paradigm. These models construct multi-dimensional
hierarchies of partitions and select the most promising partitions to
generate the estimated distributions.  We discuss a specific
two-dimensional lattice and propose two primary features to measure the
usefulness of each node: the training-set history count and the
smoothed entropy of its prediction.  Smoothing techniques are reviewed
and a generalization of the conventional backoff strategy to multiple
dimensions is proposed.  Preliminary experimental results on the
SWITCHBOARD corpus show a 6.5% perplexity reduction
over a word trigram model.
 
28 pages 
*Department of Mathematics, University Jean Monnet, 23 rue P. Michelon, 
42023 Saint-Etienne Cedex, France.