CMU-CS-22-111
Computer Science Department
School of Computer Science, Carnegie Mellon University
The Language of Sketches
Xiaoyu Zhang
M.S. Thesis
May 2022
Creative AI has seen much progress in recent years. Models like DALL-E 2 can generate inspiring art pieces from text descriptions. Instead of synthesizing realistic artworks from language, we approach creativity from a different angle and investigate the composition of semantic parts and visual concepts in sketches. For example, people can draw a circle to represent the moon, a scoop of ice cream, or the face of a cat. Similarly, language descriptors can be composed to create new concepts: people can draw a large round cat face or a narrow oval cat face. To study this reuse of abstract concepts, we construct a dataset of language-annotated sketches. We examined current sketch datasets and found that they lack either language annotations or semantic part annotations. Therefore, we collect a dataset of 11,150 (sketch part, text) pairs for 572 face sketches and 787 angel sketches. To understand the limits of current vision-language models, we fine-tuned CLIP, a model that is pretrained with a contrastive objective on 400 million (image, text) pairs and maps images and text into a joint vision-language embedding space. We observed that (1) CLIP cannot easily generalize to an unseen category on the task of pairing sketches with their descriptions, even though similar shapes and descriptions occurred in training; and (2) after fine-tuning, the average cosine distance increased between pairs of descriptors that annotators used to differentiate two sketches. With the insights gained about how language and sketches interact in the CLIP embedding space, we aim to facilitate research into models that generate sketches part by part to satisfy users' descriptions of the pictures they have in mind.
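The second observation is stated in terms of average cosine distance between descriptor embeddings. As a minimal illustration of that quantity only, the sketch below computes it in plain Python. The function names and the two-dimensional toy vectors are hypothetical stand-ins chosen for this example; they are not the thesis's code, data, or actual CLIP embeddings.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_pair_distance(pairs):
    """Average the cosine distance over (embedding_a, embedding_b) pairs."""
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy embeddings for two descriptor pairs, standing in for
# text embeddings taken before and after fine-tuning (assumed values).
before = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 1.0], [1.0, 0.0])]
after = [([1.0, 0.0], [0.0, 1.0]), ([1.0, 1.0], [1.0, 0.0])]

# A larger mean distance means the paired descriptors sit further apart.
print(mean_pair_distance(before) < mean_pair_distance(after))
```

A larger mean distance after fine-tuning would indicate that descriptors used to tell two sketches apart are, on average, further separated in the embedding space.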
60 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department