
CMU-CS-03-153
Computer Science Department
School of Computer Science, Carnegie Mellon University
Learning Bayesian Network Model Structure from Data
Dimitris Margaritis
May 2003
Ph.D. Thesis
Keywords: Bayesian networks, Bayesian network structure learning,
continuous variable independence test, Markov blanket, causal discovery,
DataCube approximation, database count queries
In this thesis I address the important problem of determining
the structure of directed statistical models, with the widely
used class of Bayesian network models as a concrete vehicle for
my ideas. The structure of a Bayesian
network represents a set of conditional independence relations
that hold in the domain. Learning the structure of the Bayesian
network model that represents a domain can reveal insights into
its underlying causal structure. Moreover, the learned model can
be used to predict quantities that are difficult, expensive, or
unethical to measure (for example, the probability of lung
cancer) from other quantities that are easier to obtain.
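The link between structure, conditional independence, and prediction can be illustrated with a minimal Python sketch. It assumes a hypothetical three-node network in which Smoking is the only parent of Cancer and Cancer is the only parent of XRay. The probabilities and all names are invented for illustration. Inference is by brute-force enumeration, which is only practical for tiny networks.

```python
import math

# Hypothetical network: Smoking -> Cancer -> XRay.
# All probabilities below are invented for illustration only.
P_SMOKING = 0.3                                    # P(Smoking=True)
P_CANCER_GIVEN_SMOKING = {True: 0.1, False: 0.01}  # P(Cancer=True | Smoking)
P_XRAY_GIVEN_CANCER = {True: 0.9, False: 0.2}      # P(XRay=positive | Cancer)


def bernoulli(p_true, value):
    """Probability that a binary variable with P(True)=p_true equals `value`."""
    return p_true if value else 1.0 - p_true


def joint(smoking, cancer, xray):
    """Factorization implied by the structure: P(S) * P(C | S) * P(X | C)."""
    return (bernoulli(P_SMOKING, smoking)
            * bernoulli(P_CANCER_GIVEN_SMOKING[smoking], cancer)
            * bernoulli(P_XRAY_GIVEN_CANCER[cancer], xray))


def p_cancer_given(smoking, xray):
    """P(Cancer=True | Smoking, XRay), computed by enumeration over Cancer."""
    numerator = joint(smoking, True, xray)
    return numerator / sum(joint(smoking, c, xray) for c in (True, False))


def p_xray_given(smoking, cancer):
    """P(XRay=positive | Smoking, Cancer), computed from the joint."""
    numerator = joint(smoking, cancer, True)
    return numerator / sum(joint(smoking, cancer, x) for x in (True, False))


if __name__ == "__main__":
    # Predict the hard-to-measure quantity from easy-to-observe ones.
    print(round(p_cancer_given(True, True), 3))   # smoker, positive X-ray: 0.333
    print(round(p_cancer_given(False, True), 3))  # non-smoker, positive X-ray: 0.043
    # The missing Smoking -> XRay edge encodes: XRay independent of Smoking given Cancer.
    for c in (True, False):
        assert math.isclose(p_xray_given(True, c), p_xray_given(False, c))
```

In this toy model, a positive X-ray raises the estimated cancer probability to about 0.333 for a smoker but only about 0.043 for a non-smoker. The final loop checks that once Cancer is known, the X-ray result no longer depends on Smoking. That is the conditional independence expressed by the absence of an edge from Smoking to XRay.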
The contributions of this thesis include (a) an algorithm for
determining the structure of a Bayesian network model from
statistical independence statements; (b) a statistical
independence test for continuous variables; and finally (c) a
practical application of structure learning to a decision support
problem, where a model learned from the database (most
importantly, its structure) is used in lieu of the database to
yield fast approximate answers to count queries, surpassing
in certain aspects other state-of-the-art approaches to the
same problem.
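The general idea behind contribution (c) can be illustrated with a minimal Python sketch. Once a Bayesian network has been fitted to a table, a count query is answered as N times the probability the model assigns to the queried values. That probability comes from the network's factorization, summed over any unconstrained attributes, rather than from scanning the rows. This is an illustration under stated assumptions, not the thesis's method. The toy table, the attributes A, B, C, the structure (A as the sole parent of B and C), and all function names are hypothetical. The structure is given rather than learned, and the parameters are plain maximum-likelihood frequency estimates.

```python
from collections import Counter
from itertools import product

# Hypothetical toy "database" over three binary attributes (A, B, C).
ROWS = [
    (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 0, 0),
    (1, 1, 1), (1, 1, 0), (1, 1, 1), (1, 0, 1),
]
VARS = ("A", "B", "C")
# Assumed structure (given here, not learned): A -> B and A -> C.
PARENTS = {"A": (), "B": ("A",), "C": ("A",)}


def fit_cpts(rows):
    """Maximum-likelihood CPTs: P(var=v | parents=u) = count(u, v) / count(u)."""
    cpts = {}
    for i, var in enumerate(VARS):
        pidx = [VARS.index(p) for p in PARENTS[var]]
        joint = Counter((tuple(r[j] for j in pidx), r[i]) for r in rows)
        marg = Counter(tuple(r[j] for j in pidx) for r in rows)
        cpts[var] = {key: n / marg[key[0]] for key, n in joint.items()}
    return cpts


def prob(assign, cpts):
    """P(full assignment) = product over variables of P(x_i | parents(x_i))."""
    p = 1.0
    for var in VARS:
        u = tuple(assign[q] for q in PARENTS[var])
        p *= cpts[var].get((u, assign[var]), 0.0)  # unseen combination -> 0
    return p


def approx_count(query, cpts, n_rows):
    """Estimate COUNT(*) WHERE query by summing out unconstrained attributes."""
    free = [v for v in VARS if v not in query]
    total = 0.0
    for values in product((0, 1), repeat=len(free)):
        total += prob({**query, **dict(zip(free, values))}, cpts)
    return n_rows * total


def exact_count(query, rows):
    """Exact answer obtained by scanning the table, for comparison."""
    return sum(all(r[VARS.index(v)] == x for v, x in query.items()) for r in rows)


if __name__ == "__main__":
    cpts = fit_cpts(ROWS)
    for query in ({"A": 1, "B": 1, "C": 1}, {"B": 1}):
        print(query, approx_count(query, cpts, len(ROWS)), exact_count(query, ROWS))
```

On this toy table, the model estimates 2.25 rows for A=1, B=1, C=1, where the exact count is 2. For B=1 it gives 4.0, matching the exact count of 4. After fitting, only the small conditional probability tables are consulted, not the rows themselves.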
126 pages
