CMU-CS-99-108
Computer Science Department
School of Computer Science, Carnegie Mellon University
 
    
     
 
A Gaussian Prior for Smoothing Maximum Entropy Models 
Stanley F. Chen, Ronald Rosenfeld 
February 1999  
CMU-CS-99-108.ps, CMU-CS-99-108.pdf
Keywords: Language models, maximum entropy, smoothing
In certain contexts, maximum entropy (ME) modeling can be viewed as maximum
likelihood training for exponential models and, like other maximum likelihood
methods, is prone to overfitting the training data.
Several smoothing methods for maximum entropy models have been proposed
to address this problem, but previous results do not make it
clear how these smoothing methods compare with smoothing methods for 
other types of related models. In this work, we survey previous work 
in maximum entropy smoothing and compare the performance of several
of these algorithms with conventional techniques for smoothing
n-gram language models.  Because of the mature body of research in
n-gram model smoothing and the close connection between maximum 
entropy and conventional n-gram models, this domain is well-suited 
to gauge the performance of maximum entropy smoothing methods.
Over a large number of data sets, we find that an ME smoothing method 
proposed to us by Lafferty (1997) performs as well as or
better than all other algorithms under consideration.
This general and efficient method involves using a Gaussian prior on the
parameters of the model and selecting maximum a posteriori
instead of maximum likelihood parameter values.  We contrast this
method with previous n-gram smoothing methods to explain
its superior performance.
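
As a rough illustration of the idea of replacing maximum likelihood with maximum a posteriori estimation under a Gaussian prior, the sketch below trains a tiny conditional exponential model by gradient ascent on the log-likelihood minus a quadratic penalty. It is not the report's implementation: the feature set (`features`), the single shared variance `sigma2`, the zero-mean prior, the optimizer, and the toy data are all assumptions made for the example.

```python
import math
from collections import defaultdict


def features(x, y):
    # Hypothetical feature set: one unigram and one bigram indicator per event.
    return [("uni", y), ("bi", x, y)]


def cond_probs(weights, x, vocab):
    # p(y | x) proportional to exp(sum of active feature weights).
    scores = [sum(weights.get(f, 0.0) for f in features(x, y)) for y in vocab]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def map_objective(weights, data, vocab, sigma2):
    # Log-likelihood minus the penalty from a zero-mean Gaussian prior
    # (assumed here to share one variance sigma2 across all weights).
    ll = sum(math.log(cond_probs(weights, x, vocab)[vocab.index(y)])
             for x, y in data)
    penalty = sum(w * w for w in weights.values()) / (2.0 * sigma2)
    return ll - penalty


def train_map(data, vocab, sigma2=1.0, lr=0.1, steps=500):
    # Plain gradient ascent; chosen for brevity, not efficiency.
    weights = defaultdict(float)
    for _ in range(steps):
        grad = defaultdict(float)
        for x, y in data:
            for f in features(x, y):
                grad[f] += 1.0  # observed feature count
            for y2, p in zip(vocab, cond_probs(weights, x, vocab)):
                for f in features(x, y2):
                    grad[f] -= p  # expected feature count under the model
        for f in set(grad) | set(weights):
            # The prior contributes -weight / sigma2 to the gradient.
            weights[f] += lr * (grad[f] - weights[f] / sigma2)
    return dict(weights)


vocab = ["a", "b", "c"]
data = [("a", "b"), ("a", "b"), ("a", "c"), ("b", "a")]

tight = train_map(data, vocab, sigma2=0.5)    # strong smoothing
loose = train_map(data, vocab, sigma2=100.0)  # close to maximum likelihood
```

In this toy setup a smaller `sigma2` pulls the weights toward zero, so the conditional distributions move toward uniform and unseen events keep more probability mass; a very large `sigma2` approaches the unsmoothed maximum likelihood fit.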
 
23 pages 