CALD TECHNICAL REPORT ABSTRACTS

	CMU-CALD-04-103 Center for Automated Learning and Discovery School of Computer Science, Carnegie Mellon University CMU-CALD-04-103 Tractable Structural Learning of Large Bayesian Networks from Sparse Data Anna Goldenberg, Andrew W. Moore April 2004 CMU-CALD-04-103.pdf Keywords: Bayesian networks/graphical models, statistical learning, Bayes Net structure learning This paper addresses three questions. Is it useful to attempt to learn a Bayesian network structure with hundreds of thousands of nodes? How should such structure search proceed practically? The third question arises out of our approach to the second: how can Frequent Sets (Agrawal et al., 1993), which are extremely popular in the area of descriptive data mining, be turned into a probabilistic model? Large sparse datasets with hundreds of thousands of records and attributes appear in social networks, warehousing, supermarket transactions and web logs. The complexity of structural search made learning of factored probabilistic models on such datasets unfeasible. We propose to use Frequent Sets to significantly speed up the structural search. Unlike previous approaches, we not only cache n-way sufficient statistics, but also exploit their local structure. We also present an empirical evaluation of our algorithm applied to several massive datasets. 20 pages

SCS Technical Report Collection School of Computer Science homepage This page maintained by reports@cs.cmu.edu