Center for Automated Learning and Discovery
School of Computer Science, Carnegie Mellon University


Statistical Models for Frequent Terms in Text

Edoardo M. Airoldi, William W. Cohen, Stephen E. Fienberg

May 2004


Keywords: Bayesian models, multinomial, binomial, Poisson, negative-binomial

In this paper we present statistical models for text that handle frequently occurring words appropriately and that outperform widely used multinomial-based models on a wide range of classification tasks with two or more classes. Our models are based on the Poisson and Negative-Binomial distributions, which retain the simplicity and analytic tractability that make the multinomial attractive.
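To illustrate why the choice of count distribution matters for frequent terms, the sketch below compares the log-probability of a single word's count in one document under a Poisson and under a Negative-Binomial with the same mean. It is not the authors' model. It assumes a common mean/dispersion parameterization (mean `mu`, dispersion `r`, variance `mu + mu**2 / r`), and the values `mu = 0.5` and `r = 0.2` are hypothetical. It uses only the Python standard library.

```python
import math


def poisson_logpmf(x: int, mu: float) -> float:
    """Log-probability of count x under a Poisson with mean mu."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def negbin_logpmf(x: int, mu: float, r: float) -> float:
    """Log-probability of count x under a Negative-Binomial with mean mu and
    dispersion r (assumed parameterization). The variance is mu + mu**2 / r,
    so a small r gives more overdispersion than a Poisson with the same mean."""
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + mu))
            + x * math.log(mu / (r + mu)))


if __name__ == "__main__":
    mu, r = 0.5, 0.2  # hypothetical per-document rate and dispersion
    for x in (0, 1, 5):
        print(x, round(poisson_logpmf(x, mu), 3), round(negbin_logpmf(x, mu, r), 3))
```

With these illustrative values, the Negative-Binomial assigns more probability than the Poisson to both zero and five occurrences and less to exactly one. This is the "bursty" behavior that a single fixed-rate model cannot express for frequent terms.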

12 pages
