CMU-ML-15-102
Machine Learning Department
School of Computer Science, Carnegie Mellon University



Spectral Probabilistic Modeling and Applications to Natural Language Processing

Ankur Parikh

August 2015

Ph.D. Thesis

CMU-ML-15-102.pdf


Keywords: Probabilistic graphical models, spectral methods, kernels, unsupervised parsing, language modeling


Probabilistic modeling with latent variables is a powerful paradigm that has led to key advances in many applications, such as natural language processing, text mining, and computational biology. Unfortunately, while introducing latent variables substantially increases representational power, learning and modeling become considerably more complicated. Most existing solutions largely ignore non-identifiability issues in modeling and formulate learning as a nonconvex optimization problem, where convergence to the global optimum is not guaranteed due to local minima.

In this thesis, we propose to tackle these problems through the lens of linear and multilinear algebra. Viewing latent variable models from this perspective allows us to approach key problems such as structure learning and parameter learning with tools such as matrix/tensor decompositions, inversion, and additive metrics. These tools enable us to develop novel solutions to learning in latent variable models with both theoretical and practical advantages. For example, our spectral parameter learning methods for latent trees and junction trees are provably consistent, free of local optima, and one to two orders of magnitude faster than expectation-maximization (EM) for large sample sizes.
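To make the contrast with EM concrete, below is a minimal sketch, not the thesis's exact algorithm, of spectral parameter learning for a hidden Markov model in the observable-operator style of Hsu, Kakade, and Zhang (2009): a single thin SVD of an empirical bigram moment matrix replaces EM's iterative updates, so there are no local optima. The synthetic HMM, the rank k, and the sample size are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
k, n_obs = 2, 4                                  # hidden states, observation symbols

# Synthetic HMM, used only to generate training triples (x1, x2, x3).
T = np.array([[0.8, 0.3],
              [0.2, 0.7]])                       # T[i, j] = P(h'=i | h=j); columns sum to 1
O = rng.dirichlet(np.ones(n_obs), size=k).T      # O[x, h] = P(x | h); columns sum to 1
pi = np.array([0.6, 0.4])                        # initial state distribution

def sample_triple():
    h1 = rng.choice(k, p=pi)
    h2 = rng.choice(k, p=T[:, h1])
    h3 = rng.choice(k, p=T[:, h2])
    return (rng.choice(n_obs, p=O[:, h1]),
            rng.choice(n_obs, p=O[:, h2]),
            rng.choice(n_obs, p=O[:, h3]))

# Empirical low-order moments: P1[x1], P21[x2, x1], and P3x1[x2][x3, x1].
N = 100_000
P1 = np.zeros(n_obs)
P21 = np.zeros((n_obs, n_obs))
P3x1 = np.zeros((n_obs, n_obs, n_obs))
for _ in range(N):
    x1, x2, x3 = sample_triple()
    P1[x1] += 1
    P21[x2, x1] += 1
    P3x1[x2, x3, x1] += 1
P1 /= N; P21 /= N; P3x1 /= N

# Spectral step: one thin SVD of the bigram matrix, no iterative E/M updates.
U = np.linalg.svd(P21)[0][:, :k]

# Observable operators; no attempt is made to recover T and O themselves.
b1 = U.T @ P1
binf = np.linalg.pinv(P21.T @ U) @ P1
B = [U.T @ P3x1[x] @ np.linalg.pinv(U.T @ P21) for x in range(n_obs)]

def joint_prob(seq):
    """Estimated P(x1, ..., xt) from the recovered observable operators."""
    b = b1
    for x in seq:
        b = B[x] @ b
    return float(binf @ b)

print(joint_prob([0, 1, 2]))                     # estimated P(x1=0, x2=1, x3=2)

The key design point mirrored here is that the parameters are recovered in closed form from observable moments, which is what makes consistency guarantees possible and avoids EM's sensitivity to initialization.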

In addition, we focus on applications in natural language processing, using our insights not only to devise new algorithms, but also to propose new models. Our method for unsupervised parsing is the first algorithm that both has theoretical guarantees and is practical, performing favorably compared to the constituent-context model (CCM) of Klein and Manning. We also develop power low rank ensembles, a framework for language modeling that generalizes existing n-gram smoothing techniques to non-integer n. It consistently outperforms state-of-the-art Kneser-Ney baselines and can be trained on billion-word datasets in a few hours.
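As a loose illustration of the low-rank idea behind power low rank ensembles, the sketch below takes an element-wise power of a bigram count matrix, truncates it to low rank with an SVD, and interpolates the result with the raw counts. The tiny corpus, power p, rank r, and weight lam are illustrative assumptions; this is not the thesis's estimator.

import numpy as np

corpus = "the cat sat on the mat the cat ate the rat".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Bigram count matrix: C[i, j] = count(w_prev = vocab[i], w_next = vocab[j]).
C = np.zeros((V, V))
for a, b in zip(corpus, corpus[1:]):
    C[idx[a], idx[b]] += 1

p, r = 0.5, 2                        # element-wise power and target rank (illustrative)
U, s, Vt = np.linalg.svd(C ** p)     # low-rank structure of the powered counts
C_low = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
C_low = np.clip(C_low, 0.0, None)    # keep pseudo-counts non-negative

lam = 0.7                            # interpolation weight for the raw counts
blend = lam * C + (1 - lam) * C_low

def prob_next(prev):
    """Conditional distribution P(w | prev) from the blended pseudo-counts."""
    row = blend[idx[prev]] + 1e-12   # tiny floor to avoid division by zero
    return row / row.sum()

for w, q in zip(vocab, prob_next("the")):
    print(f"P({w} | the) = {q:.3f}")

The low-rank term spreads probability mass to unseen bigrams, acting like a smoothed model of fractional order between a unigram and a bigram model, which is the intuition behind generalizing n-gram techniques to non-integer n.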

196 pages

Thesis Committee:
Eric Xing (Chair)
Geoffrey Gordon
John Platt (Google)
Noah Smith
Le Song (Georgia Institute of Technology)

Tom M. Mitchell, Head, Machine Learning Department
Andrew W. Moore, Dean, School of Computer Science

