MACHINE LEARNING TECHNICAL REPORT ABSTRACTS

CMU-ML-07-122
Machine Learning Department
School of Computer Science, Carnegie Mellon University

Actively Learning Specific Function Properties with Applications to Statistical Inference
Brent Bryan
December 2007
Ph.D. Thesis (CMU-ML-07-122.pdf)

Keywords: Active learning, statistical methods, astronomy, cosmology

Active learning techniques have previously been shown to be extremely effective for learning a target function over an entire parameter space based on a limited set of observations. However, in many cases, only a specific property of the target function needs to be learned. For instance, when discovering the boundary of a region, such as the set of locations in which wireless network strength is above some operable level, we are interested in learning only the level-set of the target function. While techniques that learn the entire target function over the parameter space can be used to detect specific properties of the target function (e.g., level-sets), methods that learn only the required properties can be significantly more efficient, especially as the dimensionality of the parameter space increases. These active learning algorithms have a natural application in many statistical inference techniques. For example, given a set of data and a physical model of the data, which is a function of several variables, a scientist is often interested in determining the ranges of the variables that are statistically supported by the data. We show that many frequentist statistical inference techniques can be reduced to a level-set detection problem, or a similar search for a property of the target function, and hence benefit from active learning algorithms that target specific properties. Using these active learning algorithms significantly decreases the number of experiments required to accurately detect the boundaries of the desired 1 - α confidence regions.
Moreover, since computing the model of the data given the input parameters may be expensive (either computationally or monetarily), such algorithms can facilitate analyses that were previously infeasible. We demonstrate the use of several statistical inference techniques combined with active learning algorithms on several cosmological data sets. The data sets vary in the dimensionality of the input parameters from two to eight. We show that naive algorithms, such as random sampling or grid-based methods, are computationally infeasible for the higher-dimensional data sets. However, our active learning techniques can efficiently detect the desired 1 - α confidence regions. Moreover, the use of frequentist inference techniques allows us to easily perform additional inquiries, such as hypothetical restrictions on the parameters and joint analyses of all the cosmological data sets, with only a small number of additional experiments.

214 pages
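
The reduction of a confidence region to a level-set can be illustrated with a small one-parameter sketch. This is not the thesis's algorithm, which the abstract does not specify. The data values, the known noise level, the constant-mean model, and every name below are hypothetical, chosen only for illustration. The 95% region is taken to be the set of parameter values whose chi-square statistic exceeds its minimum by no more than the 95% quantile of a chi-square distribution with one degree of freedom. Plain bisection stands in for an active learner: each new evaluation point is chosen from the results of earlier ones, so evaluations concentrate near the boundary rather than covering the whole parameter range.

```python
import math

# Hypothetical toy data (not from the thesis): five measurements with a
# known noise level, modeled by a single constant mean `theta`.
DATA = [4.8, 5.1, 5.3, 4.9, 5.4]
SIGMA = 0.5

# 95% quantile of the chi-square distribution with 1 degree of freedom.
CHI2_1DOF_95 = 3.841458820694124


def chi2(theta):
    """Chi-square misfit between the data and the constant-mean model."""
    return sum(((x - theta) / SIGMA) ** 2 for x in DATA)


# For this model the sample mean minimizes the chi-square statistic.
THETA_HAT = sum(DATA) / len(DATA)
CHI2_MIN = chi2(THETA_HAT)


def delta_chi2(theta):
    """Target function whose level-set at CHI2_1DOF_95 bounds the region."""
    return chi2(theta) - CHI2_MIN


def find_boundary(inside, outside, tol=1e-6):
    """Bisect between a point inside the region and a point outside it.

    Returns the estimated boundary location and the number of
    target-function evaluations used.
    """
    evals = 0
    while abs(outside - inside) > tol:
        mid = 0.5 * (inside + outside)
        evals += 1
        if delta_chi2(mid) <= CHI2_1DOF_95:
            inside = mid   # midpoint is still inside the confidence region
        else:
            outside = mid  # midpoint lies outside the confidence region
    return 0.5 * (inside + outside), evals


# THETA_HAT is inside by construction; points 5 units away are far outside.
lower, n_lo = find_boundary(THETA_HAT, THETA_HAT - 5.0)
upper, n_hi = find_boundary(THETA_HAT, THETA_HAT + 5.0)

print(f"95% interval: [{lower:.4f}, {upper:.4f}]")
print(f"target-function evaluations: {n_lo + n_hi}")
```

In this toy case the region is an interval, so two bisections suffice, and they need far fewer evaluations than a uniform grid of the same resolution over the same range. In higher dimensions the boundary is a surface and bisection no longer applies directly, which is the setting the abstract addresses.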
