Computer Science Department
School of Computer Science, Carnegie Mellon University


Learning Bayesian network Model Structure from Data

Dimitris Margaritis

May 2003

Ph.D. Thesis

Keywords: Bayesian networks, Bayesian network structure learning, continuous variable independence test, Markov blanket, causal discovery, DataCube approximation, database count queries

In this thesis I address the important problem of the determination of the structure of directed statistical models, with the widely used class of Bayesian network models as a concrete vehicle of my ideas. The structure of a Bayesian network represents a set of conditional independence relations that hold in the domain. Learning the structure of the Bayesian network model that represents a domain can reveal insights into its underlying causal structure. Moreover, it can also be used for prediction of quantities that are difficult, expensive, or unethical to measure -- such as the probability of lung cancer for example -- based on other quantities that are easier to obtain. The contributions of this thesis include (a) an algorithm for determining the structure of a Bayesian network model from statistical independence statements; (b) a statistical independence test for continuous variables; and finally (c) a practical application of structure learning to a decision support problem, where a model learned from the database -- most importantly its structure -- is used in lieu of the database to yield fast approximate answers to count queries, surpassing in certain aspects other state-of-the-art approaches to the same problem.

126 pages

Return to: SCS Technical Report Collection
School of Computer Science homepage

This page maintained by