MACHINE LEARNING TECHNICAL REPORT ABSTRACTS

CMU-ML-09-104
Machine Learning Department
School of Computer Science, Carnegie Mellon University

Detecting Anomalous Groups in Categorical Datasets

Kaustav Das, Jeff Schneider, Daniel B. Neill

April 2009

CMU-ML-09-104.pdf

Keywords: Pattern detection, anomaly detection, machine learning

We propose a new method for detecting groups of anomalies in categorical datasets. Our approach generalizes the spatial scan statistic, a commonly used method for detecting clusters of increased counts in spatial data. We extend this framework to non-spatial datasets with discrete-valued attributes, where the degree of anomalousness of each record depends on its attribute values, and where the goal is to find self-similar groups of anomalous records. We model the relationship between the attributes using a probabilistic model (e.g., a Bayesian network), define a likelihood ratio statistic in terms of the pseudo-likelihoods for the null and alternative hypotheses, and maximize this statistic over all subsets of records. Since an exhaustive search over all such groups is computationally infeasible, we propose an efficient (but approximate) search heuristic. We show that this algorithm accurately detects anomalous groups in real-world hospital, container shipping, and network connection data.

21 pages
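The general shape of such a scan (fit a null model, score a candidate group by a log-likelihood ratio, and search over groups heuristically) can be illustrated with a small Python sketch. This is not the report's algorithm. It makes several simplifying assumptions: the null model treats attributes as independent (a stand-in for a Bayesian network), it uses plain likelihoods rather than pseudo-likelihoods, the alternative model is simply refit on the candidate group, and the search is a basic greedy expansion from the least likely record. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def fit_marginals(records, domains, alpha=1.0):
    """Laplace-smoothed value frequencies per attribute (independence assumption)."""
    n = len(records)
    model = []
    for j, domain in enumerate(domains):
        counts = Counter(r[j] for r in records)
        denom = n + alpha * len(domain)
        model.append({v: (counts[v] + alpha) / denom for v in domain})
    return model


def log_lik(record, model):
    """Log-likelihood of one record under an independent-attribute model."""
    return sum(math.log(model[j][v]) for j, v in enumerate(record))


def group_score(group, null_model, domains):
    """Log-likelihood ratio: model refit on the group vs. the null model."""
    alt_model = fit_marginals(group, domains)
    return sum(log_lik(r, alt_model) - log_lik(r, null_model) for r in group)


def greedy_search(records, null_model, domains):
    """Grow a group from the least likely record while the score improves."""
    remaining = list(range(len(records)))
    seed = min(remaining, key=lambda i: log_lik(records[i], null_model))
    chosen = [seed]
    remaining.remove(seed)
    best = group_score([records[seed]], null_model, domains)
    while remaining:
        # Score every one-record extension of the current group.
        scored = [
            (group_score([records[i] for i in chosen + [c]], null_model, domains), c)
            for c in remaining
        ]
        cand_score, cand = max(scored, key=lambda t: t[0])
        if cand_score <= best:
            break  # no extension improves the statistic
        chosen.append(cand)
        remaining.remove(cand)
        best = cand_score
    return sorted(chosen), best


# Toy data (hypothetical): values "c" and "z" never occur in the baseline.
domains = [("a", "b", "c"), ("x", "y", "z")]
baseline = [("a", "x"), ("b", "y"), ("a", "y"), ("b", "x")] * 20
test_records = [("a", "x"), ("b", "y"), ("c", "z"), ("c", "z"), ("c", "z"), ("a", "y")]

null_model = fit_marginals(baseline, domains)
group, score = greedy_search(test_records, null_model, domains)
print(group, round(score, 2))
```

On this toy input the search should settle on the three identical records with unseen values, since adding any ordinary record lowers the ratio. A real application would also need a significance test for the resulting score, which this sketch omits.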
