Computer Science Department
School of Computer Science, Carnegie Mellon University
Assessing the Calibration of Naive Bayes' Posterior Estimates
Paul N. Bennett
Keywords: Naive Bayes, calibration, well-calibrated, reliability,
posterior, text classification, Reuters
In this paper, we give evidence that the posterior estimates of
Naive Bayes go to zero or one exponentially with document length.
While exponential change may be expected as new bits of information
are added, adding new words does not always correspond to new
information. Because of its independence assumption, Naive Bayes treats
every word as fresh evidence, so its estimates grow too quickly. We
investigate one parametric family that attempts to downweight the
growth rate. The parameters of this
family are estimated using a maximum likelihood scheme, and the
results are evaluated.
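
The growth described above can be illustrated with a minimal sketch. It assumes a two-class multinomial Naive Bayes model with made-up word likelihoods; the vocabulary, the probabilities, and the function names (`nb_log_odds`, `posterior`, `demo`) are hypothetical and are not taken from the paper. Repeating the same short document k times adds no new information, yet the log-odds scale linearly in k, so the posterior approaches one exponentially fast.

```python
import math

# Hypothetical per-class word likelihoods (illustrative values only).
LOG_LIK_POS = {"grain": math.log(0.6), "trade": math.log(0.4)}
LOG_LIK_NEG = {"grain": math.log(0.4), "trade": math.log(0.6)}


def nb_log_odds(doc, log_prior_odds=0.0):
    """Log of P(pos | doc) / P(neg | doc) under the independence assumption.

    Each word contributes its own log-likelihood ratio, so the total is a
    sum over word occurrences and grows with document length.
    """
    return log_prior_odds + sum(LOG_LIK_POS[w] - LOG_LIK_NEG[w] for w in doc)


def posterior(log_odds):
    """Convert log-odds to P(pos | doc) with a numerically stable sigmoid."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def demo(max_copies=5):
    """Return (length, log-odds, posterior) for k copies of one document."""
    base_doc = ["grain", "grain", "trade"]
    rows = []
    for k in range(1, max_copies + 1):
        doc = base_doc * k  # same content repeated: no new information
        lo = nb_log_odds(doc)
        rows.append((len(doc), lo, posterior(lo)))
    return rows


if __name__ == "__main__":
    for length, lo, p in demo():
        print(f"length={length:2d}  log-odds={lo:.3f}  posterior={p:.4f}")
```

In this toy setting the log-odds of the k-fold document are exactly k times those of the original, which is the behavior a downweighting scheme would aim to dampen. The sketch does not implement the parametric family studied in the paper.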