CMU-CS-21-138
Computer Science Department
School of Computer Science, Carnegie Mellon University




Training Deep Networks with Material-Aware Supervision

Tiancheng Zhi

Ph.D. Thesis

September 2021

CMU-CS-21-138.pdf


Keywords: Deep Networks, Supervision Signal, Material

Deep learning is a powerful tool for predicting scene properties from images. Typical supervised methods require large-scale real data with ground truth, which is hard to obtain. This calls for techniques that need little real data with ground truth.

Without annotations, an obvious question is: where does the supervision signal for training deep networks come from? In this thesis, we demonstrate that awareness of materials provides such easy-to-obtain signals. We also present a framework that can be applied to different tasks to exploit material-aware supervision.

We consider four forms of supervision signal in the framework: ground truth and photometric supervision from appearance models, and adversarial and confidence supervision from appearance locations. Specifically, given a task, an approximate appearance model can be built to describe all or part of the scene. With this model, we can render synthetic images for ground truth supervision or optimize the networks using photometric supervision. The scene may also contain spatially varying materials, which provide additional information about appearance locations. Such information can be used to separate special appearances using adversarial supervision, or to fix failure cases using confidence supervision.

We present four applications to demonstrate the effectiveness of the proposed framework. In the first, we introduce an approach for fine-grained recognition of powders on complex backgrounds, as an example of synthetic ground truth supervision from awareness of translucent materials. We build a blending model for synthesizing images of translucent powders on various backgrounds. In the second, we demonstrate a method for recovering human texture and geometry from an RGB-D video, as an example of photometric supervision from a Lambertian material model. In the third, we propose a floor appearance decomposition approach for realistic object insertion, as an example of adversarial supervision for diffuse-specular separation and direct sunlight detection. We obtain coarse locations of specular and sunlight appearances based on layout geometry and the awareness of emissive and transparent materials. Lastly, we present a cross-spectral stereo matching method for road scenes, to show that confidence supervision from non-Lambertian appearance locations helps fix regions of failure.
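
To make the idea of synthesizing translucent powders on backgrounds more concrete, the following is a minimal sketch of standard alpha compositing, in which each output pixel is a convex combination of a foreground (powder) color and a background color. This is an assumption for illustration only: the abstract does not specify the form of the thesis's blending model, and the names `blend_pixel`, `blend_image`, and `alpha_map` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]   # RGB values in [0, 1]
Image = List[List[Pixel]]            # rows of pixels


def blend_pixel(fg: Pixel, bg: Pixel, alpha: float) -> Pixel:
    """Standard alpha compositing: alpha * fg + (1 - alpha) * bg.

    Illustrative only; not necessarily the blending model used in the thesis.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    r, g, b = (alpha * f + (1.0 - alpha) * k for f, k in zip(fg, bg))
    return (r, g, b)


def blend_image(foreground: Image, background: Image,
                alpha_map: List[List[float]]) -> Image:
    """Composite a foreground onto a background with a per-pixel alpha map.

    All three inputs are assumed to have the same height and width.
    """
    return [
        [blend_pixel(f, b, a) for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foreground, background, alpha_map)
    ]
```

Under this assumption, a synthetic training image is obtained by choosing a powder color, a background image, and an alpha map that is nonzero where the powder lies; the alpha map then doubles as a per-pixel label for where the powder is.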

We believe that the framework proposed in this thesis can be applied to more real-world applications, including interior design, medical imaging, and autonomous driving, especially when real data with ground truth are hard to obtain.

147 pages

Thesis Committee:
Srinivasa G. Narasimhan (Co-Chair)
Martial Hebert (Co-Chair)
Matthew P. O'Toole
Sing Bing Kang (Zillow Group)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

