CMU-CS-03-153
Computer Science Department
School of Computer Science, Carnegie Mellon University
Learning Bayesian Network Model Structure from Data
Dimitris Margaritis
May 2003
Ph.D. Thesis
CMU-CS-03-153.ps
CMU-CS-03-153.pdf
Keywords: Bayesian networks, Bayesian network structure learning,
continuous variable independence test, Markov blanket, causal discovery,
DataCube approximation, database count queries
In this thesis I address the important problem of determining
the structure of directed statistical models, using the widely
used class of Bayesian network models as a concrete vehicle for
my ideas. The structure of a Bayesian network represents a set
of conditional independence relations that hold in the domain.
Learning the structure of the Bayesian network model that
represents a domain can reveal insights into its underlying
causal structure. Moreover, the learned model can be used to
predict quantities that are difficult, expensive, or unethical
to measure -- such as the probability of lung cancer -- from
other quantities that are easier to obtain.
The contributions of this thesis include (a) an algorithm for
determining the structure of a Bayesian network model from
statistical independence statements; (b) a statistical
independence test for continuous variables; and (c) a
practical application of structure learning to a decision support
problem, where a model learned from the database -- most
importantly its structure -- is used in lieu of the database to
yield fast approximate answers to count queries, surpassing
in certain respects other state-of-the-art approaches to the
same problem.
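To make the idea behind contribution (c) concrete, here is a minimal sketch of answering a count query from a model instead of from the data. It is an illustration only, not the method developed in the thesis: the chain structure A -> B -> C is assumed rather than learned, the variables are binary, and the names `fit_chain`, `approx_count`, and the toy `rows` table are hypothetical.

```python
from collections import Counter
from itertools import product

VALUES = (0, 1)  # assumption: all three variables are binary


def fit_chain(rows):
    """Estimate P(A), P(B|A), P(C|B) for an ASSUMED chain A -> B -> C."""
    n = len(rows)
    a_counts = Counter(a for a, _, _ in rows)
    b_counts = Counter(b for _, b, _ in rows)
    ab_counts = Counter((a, b) for a, b, _ in rows)
    bc_counts = Counter((b, c) for _, b, c in rows)
    p_a = {a: a_counts[a] / n for a in VALUES}
    # Conditional tables; a conditioning value never seen in the data gets 0.
    p_b_given_a = {
        (b, a): (ab_counts[(a, b)] / a_counts[a] if a_counts[a] else 0.0)
        for a, b in product(VALUES, repeat=2)
    }
    p_c_given_b = {
        (c, b): (bc_counts[(b, c)] / b_counts[b] if b_counts[b] else 0.0)
        for b, c in product(VALUES, repeat=2)
    }
    return n, p_a, p_b_given_a, p_c_given_b


def approx_count(model, a=None, b=None, c=None):
    """Approximate COUNT(*) WHERE A=a AND B=b AND C=c; None means 'any value'."""
    n, p_a, p_b_given_a, p_c_given_b = model
    total = 0.0
    for va, vb, vc in product(VALUES, repeat=3):
        if a is not None and va != a:
            continue
        if b is not None and vb != b:
            continue
        if c is not None and vc != c:
            continue
        # Joint probability factorizes along the assumed chain.
        total += p_a[va] * p_b_given_a[(vb, va)] * p_c_given_b[(vc, vb)]
    return n * total


# Toy table of (A, B, C) records standing in for the database.
rows = [
    (0, 0, 0), (0, 0, 0), (0, 1, 1), (1, 1, 1),
    (1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 1),
]
model = fit_chain(rows)
exact = sum(1 for a, _, c in rows if a == 1 and c == 1)
estimate = approx_count(model, a=1, c=1)
```

Once `model` is fitted, `approx_count` never touches `rows` again: each query costs a sum over the model's small probability tables rather than a scan of the data. Queries that match the assumed factorization (for example, on A alone or on A and B together) are reproduced exactly, while a query such as A=1, C=1 is only approximate, because the chain structure forces C to be independent of A given B.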
126 pages