CMU-CS-22-147 Computer Science Department School of Computer Science, Carnegie Mellon University
A Self-Supervised Study of Multimodal
Jiaxin Shi
M.S. Thesis
August 2022
Our experience of the world is inherently multimodal. Analyzing human multimodal language is an increasingly popular area of research, often focused on sentiment analysis and emotion recognition, where three main modalities are present: language, acoustic, and visual. Advances in deep learning rely heavily on an abundance of data from which models can learn rich patterns. Because annotating large-scale data is labor-intensive, it is worthwhile to explore what can be achieved with self-supervised learning methods.

In this work, we propose a self-supervised task to study the cross-modal interactions present in multimodal language datasets (with language, acoustic, and visual modalities). The task is to generate a third modality, the target modality, from two source modalities, which lets us study the bimodal interactions between the two sources. In other words, we quantify the information overlap between the source and target modalities while studying which multimodal interactions the task relies on. A secondary advantage of the proposed task is that it can also be used in downstream tasks where one of the modalities is missing. Our approach builds on the intuition that the observed modalities may carry enough information to infer the missing one; for example, people may be able to imagine a speaker's voice when watching a muted video.

In summary, this thesis is a self-supervised study of multimodal interactions in opinionated videos. Our work investigates how much information overlap exists between different modalities, quantifies the amount of cross-modal interaction, and evaluates how much information about a missing modality can be learned from the other available modalities.
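The self-supervised task described in the abstract (generating a target modality from two source modalities) can be illustrated with a minimal sketch. This is not the thesis's model: the synthetic feature vectors, the concatenation-based fusion, the linear predictor trained by gradient descent, and the `overlap_proxy` score (the fraction of target error removed relative to a baseline that ignores the sources) are all hypothetical assumptions chosen only to make the idea concrete.

```python
import random

def predict(W, b, x):
    # Linear map from the fused source vector x to a target-modality vector.
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def mse(preds, targets):
    # Mean squared error over all samples and target dimensions.
    errs = [(p - t) ** 2 for ps, ts in zip(preds, targets) for p, t in zip(ps, ts)]
    return sum(errs) / len(errs)

def train_linear(sources, targets, lr=0.05, epochs=300):
    # Full-batch gradient descent on the squared error between predicted
    # and true target features (a stand-in for a real generative model).
    in_dim, out_dim, n = len(sources[0]), len(targets[0]), len(sources)
    W = [[0.0] * in_dim for _ in range(out_dim)]
    b = [0.0] * out_dim
    for _ in range(epochs):
        gW = [[0.0] * in_dim for _ in range(out_dim)]
        gb = [0.0] * out_dim
        for x, y in zip(sources, targets):
            for j, (yh, yt) in enumerate(zip(predict(W, b, x), y)):
                err = yh - yt
                gb[j] += err
                for i, xi in enumerate(x):
                    gW[j][i] += err * xi
        for j in range(out_dim):
            b[j] -= lr * 2 * gb[j] / n
            for i in range(in_dim):
                W[j][i] -= lr * 2 * gW[j][i] / n
    return W, b

# Hypothetical toy data: 2-d "language" and 2-d "acoustic" features (sources)
# and a 2-d "visual" target that partly depends on them, plus noise.
rng = random.Random(0)
language = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(200)]
acoustic = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(200)]
visual = [[0.8 * l[0] - 0.5 * a[1] + rng.gauss(0, 0.1),
           0.3 * l[1] + 0.6 * a[0] + rng.gauss(0, 0.1)]
          for l, a in zip(language, acoustic)]

# Early fusion by concatenation (an assumption, not the thesis's architecture).
sources = [l + a for l, a in zip(language, acoustic)]
W, b = train_linear(sources, visual)

# Baseline that ignores the sources: always predict the mean target vector.
mean_target = [sum(col) / len(col) for col in zip(*visual)]
baseline_mse = mse([mean_target] * len(visual), visual)
model_mse = mse([predict(W, b, x) for x in sources], visual)

# Hypothetical overlap proxy: fraction of the baseline error explained by the sources.
overlap_proxy = 1 - model_mse / baseline_mse
print(f"baseline MSE={baseline_mse:.3f}  model MSE={model_mse:.3f}  "
      f"overlap proxy={overlap_proxy:.2f}")
```

Because the trained predictor maps the two observed modalities to the third, the same function could in principle impute a missing modality at test time, for instance calling `predict(W, b, l + a)` on a clip whose visual stream is unavailable. A linear model only captures linear overlap between modalities; a nonlinear generator would be needed to probe richer cross-modal interactions.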
39 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department