CMU-CS-00-155
Computer Science Department
School of Computer Science, Carnegie Mellon University




Assessing the Calibration of Naive Bayes' Posterior Estimates

Paul N. Bennett

September 2000



Keywords: Naive Bayes, calibration, well-calibrated, reliability, posterior, text classification, Reuters


In this paper, we give evidence that the posterior estimates of Naive Bayes go to zero or one exponentially with document length. While exponential change may be expected as new bits of information are added, adding new words does not always correspond to adding new information. Essentially as a result of the Naive Bayes independence assumption, each word is counted as independent evidence, so the estimates grow too quickly. We investigate one parametric family that attempts to downweight the growth rate. The parameters of this family are estimated using a maximum likelihood scheme, and the results are evaluated.
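
The saturation effect described above can be illustrated with a minimal sketch. It assumes a two-class multinomial Naive Bayes model, in which the posterior log-odds is the prior log-odds plus a sum of per-word log-likelihood ratios, so repeating a word that carries no new information still pushes the estimate toward one. The function name nb_posterior, the toy word probabilities, and the scale parameter are all hypothetical; in particular, scale is a simple damping factor chosen for illustration and is not claimed to be the parametric family studied in the report.

```python
import math


def nb_posterior(words, p_word_pos, p_word_neg, prior_pos=0.5, scale=1.0):
    """Return P(pos | words) under a two-class multinomial Naive Bayes model.

    scale is a hypothetical damping factor applied to the summed evidence
    (scale=1.0 gives plain Naive Bayes). It is an illustrative assumption,
    not the parametric family from the report.
    """
    # Prior log-odds of the positive class.
    log_odds = math.log(prior_pos / (1.0 - prior_pos))
    # Independence assumption: every word occurrence adds its own
    # log-likelihood ratio, even if it repeats earlier information.
    evidence = sum(math.log(p_word_pos[w] / p_word_neg[w]) for w in words)
    log_odds += scale * evidence
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Toy class-conditional word probabilities (made up for illustration).
p_pos = {"wheat": 0.6, "the": 0.4}
p_neg = {"wheat": 0.4, "the": 0.6}

# The same mildly informative word repeated n times.
for n in (1, 5, 20, 80):
    plain = nb_posterior(["wheat"] * n, p_pos, p_neg)
    damped = nb_posterior(["wheat"] * n, p_pos, p_neg, scale=0.1)
    print(n, plain, damped)
```

In this sketch the odds are multiplied by the same factor for every repetition, so the undamped estimate approaches one geometrically in document length, while the damped estimate moves more slowly. How the damping should actually be parameterized and fit is the subject of the report itself.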

8 pages

