Computer Science Department
School of Computer Science, Carnegie Mellon University
A Gaussian Prior for Smoothing Maximum Entropy Models
Stanley F. Chen, Ronald Rosenfeld
Keywords: Language models, maximum entropy, smoothing
In certain contexts, maximum entropy (ME) modeling can be viewed as maximum
likelihood training for exponential models and, like other maximum likelihood
methods, is prone to overfitting the training data.
Several smoothing methods for maximum entropy models have been proposed
to address this problem, but previous results do not make
clear how these methods compare with smoothing methods for
other types of related models. In this work, we survey previous work
in maximum entropy smoothing and compare the performance of several
of these algorithms with conventional techniques for smoothing
n-gram language models. Because of the mature body of research in
n-gram model smoothing and the close connection between maximum
entropy and conventional n-gram models, this domain is well suited
for gauging the performance of maximum entropy smoothing methods.
Over a large number of data sets, we find that an ME smoothing method
proposed to us by Lafferty (1997) performs as well as or
better than all other algorithms under consideration.
This general and efficient method involves using a Gaussian prior on the
parameters of the model and selecting maximum a posteriori
instead of maximum likelihood parameter values. We contrast this
method with previous n-gram smoothing methods to explain
its superior performance.
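
As a rough illustration of the idea summarized above, the following minimal sketch
trains a small conditional exponential model by maximizing the log-likelihood
minus a Gaussian penalty on the parameters, i.e. it selects maximum a posteriori
rather than maximum likelihood values. Everything here is an assumption made for
illustration: the names map_objective_and_grad and fit_map, the dense feature
array layout, the single shared variance sigma2, and the use of plain gradient
ascent as the optimizer (chosen only for brevity, not taken from the paper).

```python
import numpy as np

def map_objective_and_grad(lam, feats, labels, sigma2):
    """Penalized log-likelihood and its gradient (illustrative layout).

    feats  : array of shape (N, Y, F); feats[n, y, i] is feature i for
             training context n paired with candidate outcome y
    labels : array of shape (N,); index of the observed outcome
    sigma2 : variance of the zero-mean Gaussian prior (assumed shared)
    """
    scores = feats @ lam                              # (N, Y)
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    log_z = np.log(np.exp(scores).sum(axis=1, keepdims=True))
    log_p = scores - log_z                            # log p(y | x_n)
    rows = np.arange(len(labels))
    loglik = log_p[rows, labels].sum()
    objective = loglik - (lam @ lam) / (2.0 * sigma2)
    observed = feats[rows, labels].sum(axis=0)        # empirical feature counts
    expected = np.einsum("ny,nyf->f", np.exp(log_p), feats)
    grad = observed - expected - lam / sigma2         # prior pulls lam toward 0
    return objective, grad

def fit_map(feats, labels, sigma2=1.0, lr=0.1, steps=500):
    """Plain gradient ascent on the penalized objective (assumed optimizer)."""
    lam = np.zeros(feats.shape[2])
    for _ in range(steps):
        _, grad = map_objective_and_grad(lam, feats, labels, sigma2)
        lam += lr * grad
    return lam

# Toy usage: three outcomes with one indicator feature each; outcome 2 is
# never observed in training.
feats = np.tile(np.eye(3), (4, 1, 1))
labels = np.array([0, 0, 0, 1])
lam = fit_map(feats, labels, sigma2=0.5)
```

In this toy setting the unpenalized likelihood has no finite maximizer, because
the parameter of the unseen outcome can decrease without bound. With the Gaussian
penalty the gradient vanishes at a finite point, where the expected count of each
feature differs from its observed count by lam_i / sigma2; a smaller sigma2 gives
stronger smoothing.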