Computer Science Department
School of Computer Science, Carnegie Mellon University


A Gaussian Prior for Smoothing Maximum Entropy Models

Stanley F. Chen, Ronald Rosenfeld

February 1999

Keywords: Language models, maximum entropy, smoothing

In certain contexts, maximum entropy (ME) modeling can be viewed as maximum likelihood training for exponential models and, like other maximum likelihood methods, is prone to overfitting the training data. Several smoothing methods for maximum entropy models have been proposed to address this problem, but previous results do not make clear how these methods compare with smoothing methods for other types of related models. In this work, we survey previous work in maximum entropy smoothing and compare the performance of several of these algorithms with conventional techniques for smoothing n-gram language models. Because of the mature body of research in n-gram model smoothing and the close connection between maximum entropy and conventional n-gram models, this domain is well suited for gauging the performance of maximum entropy smoothing methods. Over a large number of data sets, we find that an ME smoothing method proposed to us by Lafferty (1997) performs as well as or better than all other algorithms under consideration. This general and efficient method involves using a Gaussian prior on the parameters of the model and selecting maximum a posteriori instead of maximum likelihood parameter values. We contrast this method with previous n-gram smoothing methods to explain its superior performance.
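
The difference between maximum likelihood and maximum a posteriori training can be illustrated with a minimal sketch. It is not the implementation from the report: it assumes a toy unconditional exponential model over three outcomes, one zero-mean Gaussian prior with a shared variance on every weight, and plain gradient ascent as the optimizer. The names fit_maxent, model_probs, and sigma2 are hypothetical and chosen for illustration only.

```python
import math

def model_probs(outcomes, features, lambdas):
    """Exponential model: p(y) is proportional to exp(sum_j lambda_j * f_j(y))."""
    scores = {y: math.exp(sum(l * f(y) for l, f in zip(lambdas, features)))
              for y in outcomes}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

def fit_maxent(outcomes, features, data, sigma2=None, steps=2000, lr=0.5):
    """Gradient ascent on the per-sample (penalized) log-likelihood.

    sigma2=None gives maximum likelihood. A positive sigma2 adds the log of a
    zero-mean Gaussian prior with variance sigma2 on each weight (an assumed
    form of the prior), which yields maximum a posteriori weights.
    """
    n = len(data)
    k = len(features)
    # Empirical feature expectations from the training data.
    observed = [sum(f(y) for y in data) / n for f in features]
    lambdas = [0.0] * k
    for _ in range(steps):
        p = model_probs(outcomes, features, lambdas)
        for j in range(k):
            expected = sum(p[y] * features[j](y) for y in outcomes)
            grad = observed[j] - expected
            if sigma2 is not None:
                # Gradient of -lambda_j^2 / (2 * sigma2), scaled per sample.
                grad -= lambdas[j] / (sigma2 * n)
            lambdas[j] += lr * grad
    return lambdas

# Toy data (hypothetical): outcome "c" is possible but never observed.
outcomes = ["a", "b", "c"]
features = [lambda y: 1.0 if y == "a" else 0.0,
            lambda y: 1.0 if y == "b" else 0.0]
data = ["a", "a", "a", "b"]

w_ml = fit_maxent(outcomes, features, data)               # maximum likelihood
w_map = fit_maxent(outcomes, features, data, sigma2=1.0)  # Gaussian prior

p_ml = model_probs(outcomes, features, w_ml)
p_map = model_probs(outcomes, features, w_map)
print("ML  p(c):", p_ml["c"])
print("MAP p(c):", p_map["c"])
```

In this toy setting the maximum likelihood weights keep growing as training continues, pushing the probability of the unseen outcome toward zero, while the prior keeps the weights small and leaves the unseen outcome a noticeably larger share of probability mass. The variance sigma2 controls how strongly the weights are pulled toward zero; the value 1.0 here is arbitrary.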

23 pages
