CMU-CS-21-138
Computer Science Department
School of Computer Science, Carnegie Mellon University

Training Deep Networks with Material-Aware Supervision
Tiancheng Zhi
Ph.D. Thesis
September 2021
Deep learning is a powerful tool for predicting scene properties from images. Typical supervised methods require large-scale real data with ground truth, which is hard to obtain, so techniques that need little real ground-truth data are in demand. Without annotations, an obvious question arises: where does the supervision signal for training deep networks come from? In this thesis, we demonstrate that awareness of materials provides such easy-to-obtain signals, and we present a framework that exploits material-aware supervision across different tasks.

We consider four forms of supervision signal in the framework: ground truth and photometric supervision from appearance models, and adversarial and confidence supervision from appearance locations. Specifically, given a task, an approximate appearance model can be built to describe all or part of the scene. With this model, we can render synthetic images for ground truth supervision or optimize the networks using photometric supervision. The scene may also contain spatially varying materials, which provide additional appearance location information. Such information can be used to separate special appearances using adversarial supervision, or to fix failure cases using confidence supervision.

We present four applications to demonstrate the effectiveness of the proposed framework. In the first, we introduce an approach for fine-grained recognition of powders on complex backgrounds, as an example of synthetic ground truth supervision from translucent material awareness; we build a blending model for synthesizing images of translucent powders on various backgrounds. In the second, we demonstrate a method for recovering human texture and geometry from an RGB-D video, as an example of photometric supervision from a Lambertian material model. In the third, we propose a floor appearance decomposition approach for realistic object insertion, as an example of adversarial supervision for diffuse-specular separation and direct sunlight detection; we obtain coarse locations of specular and sunlight appearances based on layout geometry and the awareness of emissive and transparent materials. Lastly, we present a cross-spectral stereo matching method for road scenes, showing that confidence supervision from non-Lambertian appearance locations helps fix regions of failure.

We believe that the method proposed in this thesis can be used in more real-world applications, including interior design, medical imaging, and autonomous driving, especially when real data with ground truth are not easy to obtain.
147 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department