CMU-CALD-04-102
Center for Automated Learning and Discovery
School of Computer Science, Carnegie Mellon University




PCX: Markov Blanket Classification for
Large Data Sets with Few Cases

Xue Bai, Clark Glymour, Rema Padman,
Joseph Ramsey, Peter Spirtes, Frank Wimberly

March 2004



Keywords: PCX, PC algorithm, Bayesian Networks, Markov Blanket, Markov Blanket Bayesian Classifier

Data sets with many discrete variables and relatively few cases arise in many domains. Several studies have sought to identify the Markov Blanket (MB) of a target variable by filtering variables with statistical tests of conditional independence and then applying a classifier to the MB predictors. Other studies have applied the PC algorithm or heuristic procedures to estimate a DAG model of the MB and then classified by Bayesian updating. However, the PC output is neither a DAG nor an MB, and these studies do not specify how a DAG representation of the MB is formed. Using a filter from the HITON feature selection procedure, we find a Markov equivalence class using the PC algorithm, provide an explicit algorithm for converting the output to a graphical Markov Blanket, and classify by Bayesian updating. We apply this procedure (PCX) to five empirical data sets from different domains and compare it with results from HITON, which applies several state-of-the-art classifiers. The PCX classifier uses fewer variables than the HITON procedure selects and gives comparable classification accuracy, while also supplying insight into possible causal relations among the variables.
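
To make the central notion concrete: in a DAG, the Markov Blanket of a target consists of its parents, its children, and the other parents of its children. The sketch below reads that set off a fully oriented graph. It is not the report's conversion algorithm, which starts from the PC output (a Markov equivalence class rather than a DAG); the function name, the parent-set representation, and the toy graph are all hypothetical and chosen only for illustration.

```python
def markov_blanket(parents, target):
    """Return the Markov Blanket of `target` in a DAG.

    `parents` maps every node to the set of its parents (an assumed,
    illustrative representation; every node must appear as a key).
    """
    pa = set(parents.get(target, ()))
    # Children are the nodes that list the target among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses = set()
    for child in children:
        spouses |= set(parents[child])
    return (pa | children | spouses) - {target}


# Hypothetical toy DAG: A -> T, T -> C, B -> C, C -> D
toy_dag = {
    "A": set(),
    "B": set(),
    "T": {"A"},
    "C": {"T", "B"},
    "D": {"C"},
}

blanket = markov_blanket(toy_dag, "T")
print(sorted(blanket))  # ['A', 'B', 'C']
```

In this toy graph, D is a descendant of T but lies outside the blanket: given A, B, and C, it carries no further information about T, which is why a classifier restricted to the blanket can use fewer variables.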

15 pages

