CMU-CS-21-120
Computer Science Department
School of Computer Science, Carnegie Mellon University




Human-efficient Discovery of Edge-based
Training Data for Visual Machine Learning

Ziqiang Feng

Ph.D. Thesis

August 2021



Keywords: Edge Computing, Cloudlet, Video Analytics, Training Data

Deep learning enables effective computer vision without hand-crafted feature extractors. It has great potential when applied to specialized domains such as ecology, military, and medical science. However, the laborious task of creating labeled training sets of rare targets is a major deterrent to realizing that potential, because a domain expert's time and attention are precious. We address this problem by designing, implementing, and evaluating Eureka, a system for human-efficient discovery of rare phenomena from unlabeled visual data. Eureka's central idea is interactive content-based search of visual data based on early-discard and machine learning. We first demonstrate its effectiveness for curating training sets of rare objects. By analyzing the factors that contribute to human efficiency, we identify and evaluate important system-level optimizations that utilize edge computing and intelligent storage. Lastly, we extend Eureka to the task of discovering temporal events from video data.

133 pages

Thesis Committee:
Mahadev Satyanarayanan (Chair)
Martial Hebert
Roberta Klatzky
Padmanabhan Pillai (Intel Labs)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

