CMU-CS-16-123
Computer Science Department
School of Computer Science, Carnegie Mellon University




Tinkering Under The Hood:
Pictorial Languages with Applications to
Interactive Zero-Shot Learning

Vivek R. Krishnan

August 2016

M.S. Thesis



Keywords: Convolutional neural networks, knowledge transfer, weak supervision, zero-shot learning, visualization, internal semantics, dimensionality reduction, deformable part models, object detection, image classification

We consider the task of visual zero-shot learning, in which a system must learn to recognize concepts omitted from the training set. While most prior work makes use of linguistic cues to do this, we instead use a pictorial language representation of the training set, implicitly learned by a CNN, to generalize to new classes. We first demonstrate the robustness of pictorial language classifiers (PLCs) by applying them in a weakly supervised manner: labeling unlabeled concepts for visual classes present in the training data. Specifically, we show that a PLC built on top of a CNN trained for ImageNet classification can localize humans in Graz-02 and determine the pose of birds in PASCAL-VOC without extra labeled data or additional training. We then apply PLCs in an interactive zero-shot manner, demonstrating that pictorial languages are expressive enough to detect a set of visual classes in MSCOCO that never appear in the ImageNet training set.

30 pages

Thesis Committee:
Deva Ramanan (Co-Chair)
Kayvon Fatahalian (Co-Chair)

Frank Pfenning, Head, Computer Science Department
Andrew W. Moore, Dean, School of Computer Science


