Machine Learning Department
School of Computer Science, Carnegie Mellon University


Learning Large-Scale Conditional Random Fields

Joseph K. Bradley

January 2013

Ph.D. Thesis


Keywords: NA

Conditional Random Fields (CRFs) [Lafferty et al., 2001] can offer computational and statistical advantages over generative models, yet traditional CRF parameter and structure learning methods are often too expensive to scale up to large problems. This thesis develops methods capable of learning CRFs for much larger problems. We do so by decomposing learning problems into smaller, simpler subproblems. These decompositions allow us to trade off sample complexity, computational complexity, and potential for parallelization, and we can often optimize these trade-offs in model- or data-specific ways. The resulting methods are theoretically motivated, are often accompanied by strong guarantees, and are effective and highly scalable in practice.

In the first part of our work, we develop core methods for CRF parameter and structure learning. For parameter learning, we analyze several methods and produce PAC learnability results for certain classes of CRFs. Structured composite likelihood estimation proves particularly successful in both theory and practice, and our results offer guidance for optimizing estimator structure. For structure learning, we develop a maximum-weight spanning tree-based method which outperforms other methods for recovering tree CRFs. In the second part of our work, we take advantage of the growing availability of parallel platforms to speed up regression, a key component of our CRF learning methods. Our Shotgun algorithm for parallel regression can achieve near-linear speedups, and extensive experiments show it to be one of the fastest methods for sparse regression.

147 pages

SCS Technical Report Collection
School of Computer Science