Computer Science Department
School of Computer Science, Carnegie Mellon University


Scalable Distribution-to-Distribution Regression

Andrea Klein

July 2014

M.S. Thesis


Keywords: Machine Learning, Cosmology, Dark Matter, N-body Simulations

In this thesis I present a scalable approach to distribution-to-distribution regression on large, multi-dimensional datasets. I first demonstrate the basic algorithm on one-dimensional toy data, then modify it for efficiency and scalability. Key enhancements include parallel computation of the non-parametric estimators and a ball tree that supports efficient nearest-neighbor search in high dimensions. I then explore whether this technique can compute the final states of cosmological N-body simulations. An existing method uses cosmological perturbation theory to rapidly approximate the evolution of simulations; I attempt to learn the unknown function that maps the approximate distributions to the true ones, thereby exploiting the speed of the perturbative approximation while still approaching the accuracy of a true N-body simulation. Finally, I investigate whether the algorithm can be trained on O(1) simulations that have been run both exactly and approximately, so that many more final simulation states can then be generated quickly via regression.
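
The idea of regressing one distribution onto another can be illustrated with a minimal sketch on one-dimensional toy data. This is not the thesis's implementation: it assumes histogram density estimates, an L2 distance between them, and a Gaussian kernel smoother over the training outputs, and it compares the query against every training input by brute force rather than with a ball tree. All function names, the toy map (an input centered at mu is paired with an output centered at 1 - mu), and the parameter values are hypothetical choices made for illustration.

```python
import math
import random

BINS = 20
WIDTH = 1.0 / BINS  # bin width on the unit interval [0, 1]

def histogram_density(samples, bins=BINS):
    """Piecewise-constant density estimate on [0, 1] that integrates to 1."""
    width = 1.0 / bins
    counts = [0] * bins
    for x in samples:
        if 0.0 <= x <= 1.0:  # samples outside the unit interval are skipped
            counts[min(int(x / width), bins - 1)] += 1
    total = sum(counts)
    return [c / (total * width) for c in counts]

def l2_distance(p, q, width=WIDTH):
    """L2 distance between two piecewise-constant densities."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) * width)

def predict_density(train_pairs, query, bandwidth=1.0):
    """Kernel-weighted average of the training output densities.

    Brute force over all training inputs; a tree-based neighbor search
    would restrict this sum to the nearest inputs.
    """
    weights = [math.exp(-(l2_distance(p, query) / bandwidth) ** 2)
               for p, _ in train_pairs]
    total = sum(weights)
    return [sum(w * q[j] for w, (_, q) in zip(weights, train_pairs)) / total
            for j in range(len(query))]

def density_mean(density, width=WIDTH):
    """Mean of a piecewise-constant density, using bin centers."""
    return sum((j + 0.5) * width * d * width for j, d in enumerate(density))

def draw(mu, n=500, sigma=0.05):
    """Toy sample set: n draws from a narrow Gaussian centered at mu."""
    return [random.gauss(mu, sigma) for _ in range(n)]

random.seed(0)

# Toy training set: input centered at mu, output centered at 1 - mu.
train_mus = [0.30 + 0.05 * i for i in range(9)]
train_pairs = [(histogram_density(draw(mu)), histogram_density(draw(1.0 - mu)))
               for mu in train_mus]

# Query with an input distribution that is not in the training set.
query = histogram_density(draw(0.42))
predicted = predict_density(train_pairs, query)
predicted_mean = density_mean(predicted)
print(round(predicted_mean, 3))
```

Because the prediction is a convex combination of normalized histograms, it is itself a normalized density. The bandwidth controls how many training inputs contribute meaningfully to each prediction.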

74 pages
