CMU-ML-20-106
Machine Learning Department
School of Computer Science, Carnegie Mellon University



Data Decompositions for Constrained Visual Learning

Calvin Murdock

April 2020

Ph.D. Thesis


Keywords: Computer vision, representation learning, component analysis, constrained optimization, deep neural networks, sparse approximation


With the increasing prevalence of large datasets of images, machine learning has all but overtaken the field of computer vision. In place of specialized domain knowledge, many problems are now dominated by deep neural networks that are trained end-to-end on collections of labeled examples. But can we trust their predictions in real-world applications? Purely data-driven approaches can be thwarted by high dimensionality, insufficient training data variability, intrinsic problem ambiguity, or adversarial vulnerability. In this thesis, we address two strategies for encouraging more effective generalization: 1) integrating prior knowledge through inference constraints, and 2) theoretically motivated model selection. While both are inherently challenging for feed-forward deep networks, they are prevalent in traditional techniques for data decomposition such as component analysis and sparse coding. Building upon recent connections between deep learning and sparse approximation theory, we develop new methods to bridge this gap between deep and shallow learning.

We first introduce a formulation for data decomposition posed as approximate constraint satisfaction, which can accommodate richer instance-level prior knowledge. We apply this framework in Semantic Component Analysis, a method for weakly-supervised semantic segmentation with constraints that encourage interpretability even in the absence of supervision. From its close relationship to standard component analysis, we also derive Additive Component Analysis for learning nonlinear manifold representations with roughness-penalized additive models.
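The component-analysis foundation these methods build on can be sketched as a minimal reconstruction problem: approximate centered data X with a small number of latent components. This is a generic PCA-via-SVD sketch for illustration, not code from the thesis; all variable names are illustrative.

```python
import numpy as np

# Classical component analysis: approximate data X (n samples x d features)
# with k components, X ~= Z @ W, via truncated SVD (PCA on centered data).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
X -= X.mean(axis=0)  # center the data

k = 5
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = U[:, :k] * s[:k]   # latent coefficients for each sample
W = Vt[:k]             # k principal components (directions)

X_hat = Z @ W          # rank-k reconstruction
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)  # relative error
```

Constrained variants like those in the thesis replace this closed-form solution with iterative inference so that instance-level constraints on Z can be enforced.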

Then, we propose Deep Component Analysis, an expressive model of constrained data decomposition that enforces hierarchical structure through multiple layers of constrained latent variables. While it can again be approximated by feed-forward deep networks, exact inference requires an iterative algorithm for minimizing approximation error subject to constraints. This is implemented using Alternating Direction Neural Networks, recurrent neural networks that can be trained discriminatively with backpropagation. Generalization capacity is improved by replacing nonlinear activation functions with constraints that are enforced by feedback connections. This is demonstrated experimentally through applications to single-image depth prediction with sparse output constraints.
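The contrast between one-shot feed-forward approximation and iterative constrained inference can be illustrated with nonnegative sparse coding, where a single proximal-gradient step is a shifted ReLU layer and further iterations supply the feedback a feed-forward pass truncates. This is a generic ISTA sketch under those assumptions, not the thesis's ADMM-based algorithm; names and parameters are illustrative.

```python
import numpy as np

def ista_nonneg(x, W, lam=0.1, n_iter=100):
    """Approximate inference in a nonnegative sparse coding model:
    minimize 0.5*||x - W z||^2 + lam*sum(z)  subject to z >= 0.
    Each iteration is a gradient step followed by a shifted ReLU;
    one iteration resembles a feed-forward layer, and iterating
    adds the corrective feedback that enforces the constraint."""
    L = np.linalg.norm(W, 2) ** 2  # Lipschitz constant of the gradient
    z = np.zeros(W.shape[1])
    for _ in range(n_iter):
        grad = W.T @ (W @ z - x)
        z = np.maximum(z - (grad + lam) / L, 0.0)  # shifted ReLU projection
    return z
```

Here the nonnegativity constraint is satisfied exactly at every iterate, whereas a single feed-forward pass only approximates the minimizer.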

Finally, we propose a technique for deep model selection motivated by sparse approximation theory. Specifically, we interpret the activations of feed-forward deep networks with rectified linear units as algorithms for approximate inference in structured nonnegative sparse coding models. These models are then compared by their capacities for achieving low mutual coherence, which is theoretically tied to the uniqueness and robustness of sparse representations. This provides a framework for jointly quantifying the contributions of architectural hyperparameters such as depth, width, and skip connections without requiring expensive validation on a specific dataset. Experimentally, we show correlation between a lower bound on mutual coherence and validation error across a variety of common network architectures including DenseNets and ResNets. More broadly, this suggests promising new opportunities for understanding and designing deep learning architectures based on connections to structured data decomposition.
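For a single dictionary, mutual coherence has a simple definition: the largest absolute normalized inner product between distinct atoms. A minimal sketch of that quantity (the thesis works with lower bounds for structured multi-layer models, which this does not reproduce):

```python
import numpy as np

def mutual_coherence(W):
    """Mutual coherence of a dictionary W whose columns are atoms:
    the maximum |<w_i, w_j>| / (||w_i|| ||w_j||) over i != j.
    Lower coherence yields stronger theoretical guarantees on the
    uniqueness and robustness of sparse representations."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)  # unit-norm atoms
    G = np.abs(Wn.T @ Wn)                              # absolute Gram matrix
    np.fill_diagonal(G, 0.0)                           # ignore self-products
    return G.max()
```

An orthonormal dictionary attains coherence 0, while any dictionary containing two parallel atoms attains the maximum value of 1.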

114 pages

Thesis Committee:
Simon Lucey (Chair)
Katerina Fragkiadaki
Deva Ramanan
James Hays (Georgia Institute of Technology)

Roni Rosenfeld, Head, Machine Learning Department
Martial Hebert, Dean, School of Computer Science
