CMU-ML-18-108
Machine Learning Department
School of Computer Science, Carnegie Mellon University



Representation Learning @ Scale

Manzil Zaheer

July 2018

Ph.D. Thesis



Machine learning techniques are reaching or exceeding human-level performance on tasks involving simple data, such as image classification, translation, and text-to-speech. The success of these algorithms is attributed to highly versatile representations learned from data using deep networks or intricately designed Bayesian models. Representation learning has also provided hints in neuroscience, e.g., toward understanding how humans might categorize objects. Despite these successes, progress has so far been limited to simple data types.

Most real-world data come in all shapes and sizes: not just images or text, but also point clouds, sets, graphs, compressed data, and even heterogeneous combinations thereof. In this thesis, we develop representation learning algorithms for such complex data types by leveraging their structure and establishing new mathematical properties. Representations learned in this fashion were applied in diverse domains and found to be competitive with task-specific state-of-the-art methods.
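As a concrete illustration of leveraging structure, consider set-valued data such as point clouds: a useful representation should not depend on the order in which elements are listed, which suggests permutation-invariant architectures of the form f(X) = rho(sum over x in X of phi(x)). The following is a minimal PyTorch-style sketch of such a model; the layer sizes and the class name are illustrative assumptions, not the thesis's exact architecture.

    # Sketch of a permutation-invariant set model, f(X) = rho(sum_x phi(x)).
    # Layer sizes and the class name are illustrative assumptions.
    import torch
    import torch.nn as nn

    class SetRepresentation(nn.Module):
        def __init__(self, in_dim, hidden_dim, out_dim):
            super().__init__()
            # phi embeds each set element independently of the others.
            self.phi = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                                     nn.Linear(hidden_dim, hidden_dim))
            # rho maps the pooled summary to the final representation.
            self.rho = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                     nn.Linear(hidden_dim, out_dim))

        def forward(self, x):  # x: (batch, set_size, in_dim)
            # Sum-pooling makes the output invariant to element order.
            return self.rho(self.phi(x).sum(dim=1))

    model = SetRepresentation(in_dim=3, hidden_dim=64, out_dim=16)
    points = torch.randn(8, 100, 3)  # e.g., a batch of 100-point point clouds
    shuffled = points[:, torch.randperm(100), :]
    assert torch.allclose(model(points), model(shuffled), atol=1e-5)

Because the pooling step discards element order by construction, the invariance holds for any set size, which is what makes such architectures a natural fit for point clouds and other set-structured inputs.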

In many applications, having a representation is not enough: its interpretability is as crucial as its accuracy. Deep models often yield better accuracy but require a large number of parameters, often out of proportion to the simplicity of the underlying data, which renders them uninterpretable. This is highly undesirable in tasks like user modeling. In this thesis, we show that by leveraging structure, here in the form of domain knowledge encoded as Bayesian components on top of deep models, we can learn sparser representations with discrete components that are more amenable to human interpretation. Our experimental evaluations show that the proposed techniques compare favorably with several state-of-the-art baselines.
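To make the idea concrete, one way to place a discrete, interpretable component on top of a deep sequence model is to let the network choose among a small number of latent topics and emit words through an LDA-like topic-word matrix. The sketch below is in that spirit only; the architecture, names, and sizes are illustrative assumptions, not the thesis's exact construction.

    # Sketch of a deep sequence model emitting through discrete topics.
    # Architecture, names, and sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class TopicRNN(nn.Module):
        def __init__(self, vocab_size, n_topics, hidden):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
            self.to_topic = nn.Linear(hidden, n_topics)  # deep part scores topics
            # Topic-word matrix, inspectable like LDA's topic distributions.
            self.topic_word = nn.Parameter(torch.randn(n_topics, vocab_size))

        def forward(self, tokens):  # tokens: (batch, seq_len) of word ids
            h, _ = self.rnn(self.embed(tokens))
            topic_probs = torch.softmax(self.to_topic(h), dim=-1)      # (B, T, K)
            word_given_topic = torch.softmax(self.topic_word, dim=-1)  # (K, V)
            # Every prediction is a mixture over K discrete topics.
            return topic_probs @ word_given_topic  # (B, T, V) next-word probs

Because each prediction factors through K discrete topics, the small K-by-V topic-word matrix can be inspected the way LDA's topics are, while the deep network handles the sequential dynamics.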

Finally, inferring interpretable representations from large-scale data is desirable but often hindered by a mismatch between computational resources and statistical models. In this thesis, we bridge this gap by again leveraging structure, albeit of a different kind. Our solutions combine modern computational techniques and data structures on one side with modified statistical inference algorithms on the other, the latter exploiting topological properties of the training objective. This yields new ways to parallelize, reduce look-ups, handle variable state-space sizes, and escape saddle points. On latent variable models such as latent Dirichlet allocation (LDA), we observe significant gains in performance.
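As one example of the kind of data structure such speed-ups rest on (a standard building block of fast LDA samplers, though not necessarily the exact construction used in the thesis): Walker's alias method draws from a discrete distribution over K topics in O(1) time after O(K) preprocessing, replacing the linear scan of a naive sampler. A minimal sketch in pure Python, with illustrative names:

    # Walker's alias method: O(K) construction, O(1) draws from a
    # discrete distribution. Names here are illustrative.
    import random

    def build_alias_table(probs):
        k = len(probs)
        scaled = [p * k for p in probs]  # rescale so the average mass is 1
        prob, alias = [0.0] * k, [0] * k
        small = [i for i, s in enumerate(scaled) if s < 1.0]
        large = [i for i, s in enumerate(scaled) if s >= 1.0]
        while small and large:
            s, l = small.pop(), large.pop()
            prob[s], alias[s] = scaled[s], l  # bucket s keeps scaled[s] of its mass
            scaled[l] -= 1.0 - scaled[s]      # donor l gives up the remainder
            (small if scaled[l] < 1.0 else large).append(l)
        for i in small + large:               # leftovers are pure self-buckets
            prob[i] = 1.0
        return prob, alias

    def alias_draw(prob, alias):
        i = random.randrange(len(prob))       # choose a bucket uniformly
        return i if random.random() < prob[i] else alias[i]

    prob, alias = build_alias_table([0.5, 0.3, 0.1, 0.1])
    topic = alias_draw(prob, alias)           # an O(1) draw

Because per-word topic distributions change slowly during Gibbs sampling, such a table can be reused (slightly stale) across many draws and rebuilt only occasionally, which is one way look-ups get reduced in practice.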

To summarize, this thesis advances three major aspects of representation learning, all by leveraging some form of structure: diversity, the ability to handle different types of data; interpretability, being accessible to and understandable by humans; and scalability, the ability to process massive datasets within a reasonable time and budget.

314 pages

Thesis Committee:
Barnabás Póczos (Co-Chair)
Ruslan Salakhutdinov (Co-Chair)
Geoffrey J. Gordon
Andrew McCallum (University of Massachusetts Amherst)
Alexander J. Smola (Amazon)

Roni Rosenfeld, Head, Machine Learning Department
Andrew W. Moore, Dean, School of Computer Science

