CMU-CS-21-128
Computer Science Department
School of Computer Science, Carnegie Mellon University



Dynamic Model Specialization for Efficient Inference, Training and Supervision

Ravi Teja Mullapudi

Ph.D. Thesis

August 2021

CMU-CS-21-128.pdf


Keywords: Model Specialization, Computer Vision, Machine Learning, Deep Learning

Abstract:
Recent supervised learning approaches focus on designing and building models that generalize to a wide range of scenarios. The key ingredients for building these general models are large-scale datasets that capture a diverse set of scenarios and the computational resources to train large models. This large-scale supervised learning approach has well-known scalability challenges: 1) accurate general models are computationally expensive to train and to run at inference time; 2) collecting and labeling large datasets requires extensive human effort; and 3) datasets need to be repeatedly curated due to shifts in the target distribution. In this thesis, we argue that in many cases creating a set of highly specialized models that span the domain of interest can reduce inference, training, and supervision costs compared to creating a single monolithic model that generalizes across the entire domain. Specifically, we exploit temporal specialization to build efficient video segmentation models, showing that continuously specializing a compact model to the content of a video stream enables accurate and efficient inference. We leverage specialization to visually similar categories to build efficient image classification architectures, showing that by specializing model features to discriminate between visually similar categories, one can improve inference efficiency by computing only the subset of features necessary to classify a specific image. We exploit specialization to individual categories to reduce human labeling effort in building models for rare categories, showing that models specialized for binary classification of individual rare categories reduce the human effort of mining large unlabeled data collections for relevant examples. More broadly, we demonstrate that by dynamically specializing to a moment in time, to an input scene, or to a specific object category, it is possible to train accurate models quickly, reduce inference costs, and significantly reduce the amount of supervision required for training.
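
The temporal-specialization idea summarized above, continuously adapting a compact model to a video stream under occasional supervision from a larger general model, can be sketched roughly as follows. This is a minimal illustrative sketch in PyTorch and not the thesis implementation; the names (specialize_online, student, teacher, video_frames, label_every) are placeholder assumptions introduced here for illustration.

    import torch
    import torch.nn.functional as F

    def specialize_online(student, teacher, video_frames, label_every=8, lr=1e-3):
        # Run the compact student on every frame; every `label_every` frames,
        # query the expensive teacher for pseudo-labels and fine-tune the
        # student on them, keeping it specialized to the current content.
        optimizer = torch.optim.SGD(student.parameters(), lr=lr)
        teacher.eval()
        for t, frame in enumerate(video_frames):
            if t % label_every == 0:
                with torch.no_grad():
                    target = teacher(frame).argmax(dim=1)  # teacher pseudo-labels
                loss = F.cross_entropy(student(frame), target)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            with torch.no_grad():
                prediction = student(frame)                # cheap per-frame inference
            yield prediction

In this sketch the student is only trained when teacher supervision is requested, so the amortized cost per frame stays close to that of the compact model alone.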

112 pages

Thesis Committee:
Kayvon Fatahalian (Co-Chair)
Deva Ramanan (Co-Chair)
David G. Andersen
Ross Girshick (Facebook)
William R. Mark (Google)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

