CMU-ML-20-111
Machine Learning Department
School of Computer Science, Carnegie Mellon University



Towards a Unified Framework for Learning and Reasoning

Han Zhao

August 2020

Ph.D. Thesis



Keywords: Artificial Intelligence, Machine Learning, Deep Learning, Information Theory, Learning Theory, Probabilistic Graphical Models, Representation Learning, Algorithmic Fairness, Domain Adaptation, Transfer Learning, Multitask Learning, Sum-Product Networks, Probabilistic Circuits


The success of supervised machine learning in recent years crucially hinges on the availability of large-scale and unbiased data, which is often time-consuming and expensive to collect. Recent advances in deep learning focus on learning rich and invariant representations that have found abundant applications in domain adaptation, multitask learning, algorithmic fairness, and machine translation, to name a few. However, it is not clear what price we have to pay in terms of task utility for such universal representations. On the other hand, learning is only one of the two most fundamental cognitive abilities of an intelligent agent: such an agent needs both the ability to learn from experience and the ability to reason about what has been learned. However, classic symbolic reasoning cannot model the inherent uncertainty that is ubiquitous in real-world observations, and it is not robust to noise. Perhaps more fundamentally, reasoning is computationally intractable in general; as a result, learning, which often takes reasoning as a sub-procedure, is hard as well. Building on fundamental concepts from information theory and theoretical computer science, this thesis aims to understand the inherent tradeoff between utility and invariance in learning representations, and to develop efficient algorithms for learning tractable and exact probabilistic inference machines.

This thesis contains two parts. The first part is devoted to understanding and learning invariant representations. In particular, we will focus on understanding the costs of existing invariant representations by characterizing a fundamental tradeoff between invariance and utility. First, we will use domain adaptation as an example to show, both theoretically and empirically, such a tradeoff in achieving a small joint generalization error. This result also implies an inherent tradeoff between demographic parity, a statistical notion of group fairness, and utility in both classification and regression settings. Going further, we will show that such a general tradeoff also exists in learning with structured data: in particular, we shall derive an impossibility theorem for universal machine translation via language-invariant representations. Second, we will focus on designing learning algorithms that escape this tradeoff and exploit the benefits of invariant representations. We will show how these algorithms can be used to guarantee equalized treatment of individuals across groups, and discuss what additional problem structure is required to permit efficient domain adaptation and machine translation through learning invariant representations.
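
To make the flavor of this tradeoff concrete, the domain-adaptation lower bound takes, up to notation, roughly the following form (a sketch of the result in Part I, not a verbatim statement): for any feature map g inducing representation distributions \mathcal{D}_S^Z and \mathcal{D}_T^Z on the source and target domains, and any hypothesis h on top of g,

    \varepsilon_S(h \circ g) + \varepsilon_T(h \circ g) \;\ge\; \frac{1}{2}\Big(d_{\mathrm{JS}}\big(\mathcal{D}_S^Y, \mathcal{D}_T^Y\big) - d_{\mathrm{JS}}\big(\mathcal{D}_S^Z, \mathcal{D}_T^Z\big)\Big)^2
    \quad \text{whenever} \quad d_{\mathrm{JS}}\big(\mathcal{D}_S^Y, \mathcal{D}_T^Y\big) \ge d_{\mathrm{JS}}\big(\mathcal{D}_S^Z, \mathcal{D}_T^Z\big),

where \varepsilon_S and \varepsilon_T are the source and target errors, \mathcal{D}^Y are the marginal label distributions, and d_{\mathrm{JS}} is the Jensen-Shannon distance. Intuitively, when the label marginals differ across domains, making the representations more invariant (shrinking the second term) pushes the joint error up: invariance is purchased with utility.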

The second part of the thesis is devoted to learning tractable and exact circuits for probabilistic reasoning. It is well known that exact marginal and conditional inference in classic probabilistic graphical models (PGMs), including Bayesian Networks (BNs) and Markov Networks (MNs), is #P-complete. As a result, practitioners usually resort to various approximate inference schemes to ensure computational tractability. Probabilistic circuits, which include Sum-Product Networks (SPNs) as a special case, have been proposed as tractable deep models for exact probabilistic inference. They distinguish themselves from other types of probabilistic graphical models by the fact that inference can be done exactly in time linear in the size of the circuit. This has generated a lot of interest, since inference is not only a powerful tool for reasoning under uncertainty but also a core subroutine in parameter estimation and structure learning. In this part, we will concentrate on both the theoretical and practical sides of learning tractable probabilistic circuits. In particular, we will investigate the representational power of SPNs, as well as their parameter learning procedures in both online and offline settings.
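
To see why inference is linear in circuit size, consider the following minimal Python sketch (illustrative only; the toy structure, class names, and weights are not from the thesis). Evaluating the circuit bottom-up visits each node once, so joint, marginal, and partition-function queries each cost a single pass, and a conditional is the ratio of two passes:

# Minimal SPN evaluation sketch: each node is visited exactly once per query,
# so every query below runs in time linear in the number of nodes.

class Leaf:
    """Indicator leaf for a binary variable; returns 1 if the variable is
    absent from the evidence, i.e., marginalized out."""
    def __init__(self, var, value):
        self.var, self.value = var, value

    def eval(self, evidence):
        if self.var not in evidence:
            return 1.0
        return 1.0 if evidence[self.var] == self.value else 0.0

class Sum:
    """Weighted mixture over children; weights are assumed to sum to 1."""
    def __init__(self, children, weights):
        self.children, self.weights = children, weights

    def eval(self, evidence):
        return sum(w * c.eval(evidence)
                   for c, w in zip(self.children, self.weights))

class Product:
    """Factorization over children defined on disjoint sets of variables."""
    def __init__(self, children):
        self.children = children

    def eval(self, evidence):
        result = 1.0
        for c in self.children:
            result *= c.eval(evidence)
        return result

# A toy SPN over two binary variables X1 and X2.
spn = Sum([Product([Leaf("X1", 1), Leaf("X2", 1)]),
           Product([Leaf("X1", 0), Leaf("X2", 0)])],
          weights=[0.6, 0.4])

print(spn.eval({"X1": 1, "X2": 1}))  # joint P(X1=1, X2=1) -> 0.6
print(spn.eval({"X1": 1}))           # marginal P(X1=1) -> 0.6 (X2 summed out)
print(spn.eval({}))                  # partition function -> 1.0 (normalized)

A conditional such as P(X2=1 | X1=1) is simply spn.eval({"X1": 1, "X2": 1}) / spn.eval({"X1": 1}); both the numerator and the denominator are single linear-time passes, which is exactly the tractability that exact inference in general PGMs lacks.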

227 pages

Thesis Committee:
Geoffrey J. Gordon (Chair)
Ruslan Salakhutdinov
Barnabás Póczos
Tommi S. Jaakkola (Massachusetts Institute of Technology)

Roni Rosenfeld, Head, Machine Learning Department
Martial Hebert, Dean, School of Computer Science

