Center for Automated Learning and Discovery
School of Computer Science, Carnegie Mellon University
Statistical Models for Frequent Terms in Text
Edoardo M. Airoldi, William W. Cohen, Stephen E. Fienberg
In this paper we present statistical models for text that treat frequently occurring words in a sensible manner and that outperform widely used models based on the multinomial distribution on a wide range of classification tasks with two or more classes. Our models are based on the Poisson and Negative-Binomial distributions, which retain the desirable properties of simplicity and analytic tractability.
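As a rough illustration of the two count distributions named above, the following sketch compares the log-probability each assigns to a term count. It is not the authors' model: the mean/dispersion parameterization of the Negative-Binomial, the function names, and the parameter values `mu = 2.0` and `r = 0.5` are all assumptions chosen for illustration.

```python
import math


def poisson_logpmf(k, mu):
    """Log-probability of observing count k under a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def negbin_logpmf(k, mu, r):
    """Log-probability of count k under a Negative-Binomial.

    Assumed parameterization: mean mu and dispersion r, so that the
    variance is mu + mu**2 / r (larger than the Poisson variance mu).
    """
    return (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )


# Hypothetical values: a term expected twice per document, with heavy overdispersion.
mu, r = 2.0, 0.5
for k in (0, 2, 10, 20):
    print(k, poisson_logpmf(k, mu), negbin_logpmf(k, mu, r))
```

With the same mean, the Negative-Binomial places far more mass on large counts than the Poisson does, which is one way a model can accommodate a word that appears many times in a single document.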