Machine Learning Department
School of Computer Science, Carnegie Mellon University


Distribution and Histogram (DisH) Learning

Junier Bárbaro Oliva

July 2018

Ph.D. Thesis


Keywords: Distributions, Sets, Sequences, Nonparametric, Statistics, Machine Learning

Machine learning has made incredible advances in the last couple of decades. Nevertheless, much of this progress has been limited to basic point-estimation tasks. That is, the bulk of attention has been geared toward solving problems that take in a static finite vector and map it to another static finite vector. However, we do not navigate through life in a series of point-estimation problems, mapping x to y. Instead, we find broad patterns and gather a far-sighted understanding of data by considering collections of points like sets, sequences, and distributions. Thus, contrary to what various billionaires, celebrity theoretical physicists, and sci-fi classics would lead you to believe, true machine intelligence is currently fairly out of reach. In order to bridge this gap, this thesis develops algorithms that understand data at an aggregate, holistic level.

This thesis pushes machine learning past the realm of operating over static finite vectors, to start reasoning ubiquitously with complex, dynamic collections like sets and sequences. We develop algorithms that consider distributions as functional covariates/responses, and methods that use distributions as internal representations. We consider distributions since they are a straightforward characterization of many natural phenomena and provide a richer description than simple point data by detailing information at an aggregate level. Our approach may be seen as addressing two sides of the same coin: on one side, we use traditional machine learning algorithms adjusted to directly operate on inputs and outputs that are probability functions (and sample sets); on the other side, we develop better estimators for traditional tasks by making use of and adjusting internal distributions.

We begin by developing algorithms for traditional machine learning tasks for the cases when one's input (and/or possibly output) is not a finite point, but is instead a distribution, or a sample set drawn from a distribution. We develop a scalable nonparametric estimator for regressing a real-valued response given an input that is a distribution, a task we term distribution to real regression (DRR). Furthermore, we extend this work to the case when both the output response and the input covariate are distributions, a task we call distribution to distribution regression (DDR).
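To make the distribution to real regression setting concrete, the following is a minimal sketch rather than the estimator developed in the thesis. It assumes one-dimensional inputs supported on [0, 1], summarizes each sample set by a few empirical cosine-basis coefficients, and predicts with a plain Nadaraya-Watson kernel smoother over those coefficients. The function names, the Beta-distributed toy data, and the choice of response (the mean of each distribution) are all assumptions made for illustration.

```python
import math
import random


def cosine_features(sample, n_basis=5):
    """Empirical cosine-basis coefficients of a sample supported on [0, 1].

    The k-th coefficient is the sample mean of sqrt(2) * cos(pi * k * x),
    an estimate of the density's projection onto the k-th basis function.
    """
    n = len(sample)
    return [
        sum(math.sqrt(2.0) * math.cos(math.pi * k * x) for x in sample) / n
        for k in range(1, n_basis + 1)
    ]


def kernel_smoother(train_feats, train_y, query_feat, bandwidth=0.2):
    """Nadaraya-Watson estimate: Gaussian-weighted average of the responses."""
    num = den = 0.0
    for feat, y in zip(train_feats, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(feat, query_feat))
        weight = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        num += weight * y
        den += weight
    return num / den


# Toy data (hypothetical): each input is a sample set drawn from Beta(a, b),
# and the real-valued response is the mean of that distribution, a / (a + b).
rng = random.Random(0)
train_feats, train_y = [], []
for _ in range(200):
    a, b = rng.uniform(1.0, 5.0), rng.uniform(1.0, 5.0)
    sample = [rng.betavariate(a, b) for _ in range(300)]
    train_feats.append(cosine_features(sample))
    train_y.append(a / (a + b))

# Query: an unseen sample set from Beta(3, 3), whose true mean is 0.5.
query = [rng.betavariate(3.0, 3.0) for _ in range(300)]
prediction = kernel_smoother(train_feats, train_y, cosine_features(query))
print(f"predicted response: {prediction:.3f} (true mean is 0.5)")
```

The regressor never sees the parameters of an input distribution, only a sample set drawn from it, which is the defining feature of the setting. Replacing the scalar responses with coefficient vectors of output distributions would give a comparable sketch of the distribution to distribution case.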

Next, we look to expand the versatility and efficacy of traditional machine learning tasks through novel methods that operate with distributions of features. For example, we show that one may improve the performance of kernel learning tasks by learning a kernel's spectral distribution in a data-driven fashion using Bayesian nonparametric techniques. Moreover, we study how to perform sequential modeling by looking at summary statistics from past points. Lastly, we also develop methods for high-dimensional density estimation that make use of flexible transformations of variables and autoregressive conditionals.
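The link between a kernel and its spectral distribution can be illustrated with random Fourier features: by Bochner's theorem, a shift-invariant kernel is the expectation of cos(w · (x - y)) with w drawn from the kernel's spectral distribution. The sketch below is an illustration under stated assumptions, not the thesis's method. It fixes the spectral distribution to a Gaussian, which corresponds to the RBF kernel, whereas the thesis learns that distribution from data. The function names and test points are hypothetical.

```python
import math
import random


def sample_frequencies(n_freqs, dim, lengthscale, rng):
    """Draw frequencies from N(0, lengthscale**-2 * I), the spectral
    distribution of the RBF kernel exp(-||x - y||^2 / (2 * lengthscale^2))."""
    return [
        [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
        for _ in range(n_freqs)
    ]


def fourier_features(x, freqs):
    """Map x to cos/sin features whose inner products approximate the kernel."""
    scale = 1.0 / math.sqrt(len(freqs))
    feats = []
    for w in freqs:
        proj = sum(wi * xi for wi, xi in zip(w, x))
        feats.extend((scale * math.cos(proj), scale * math.sin(proj)))
    return feats


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)
freqs = sample_frequencies(5000, 2, 1.0, rng)

x, y = [0.3, -0.2], [-0.1, 0.4]
approx = dot(fourier_features(x, freqs), fourier_features(y, freqs))
exact = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)
print(f"random-feature estimate: {approx:.3f}, exact RBF kernel: {exact:.3f}")
```

Swapping `sample_frequencies` for a sampler from a different distribution changes the implied kernel, which is why treating the spectral distribution as the object to be learned amounts to learning the kernel itself.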

131 pages

Thesis Committee:
Barnabás Póczos (Co-Chair)
Jeff Schneider (Co-Chair)
Gregory R. Ganger
Ruslan Salakhutdinov
Le Song (Georgia Institute of Technology)

Roni Rosenfeld, Head, Machine Learning Department
Andrew W. Moore, Dean, School of Computer Science
