CMU-CS-20-115
Computer Science Department
School of Computer Science, Carnegie Mellon University




Deep Multi-view Clustering
Using Local Similarity Graphs

Shuli Jiang

M.S. Thesis

May 2020

CMU-CS-20-115.pdf


Keywords: Data mining, unsupervised learning, multi-view clustering, canonical correlation analysis, local similarity graphs, mutual K nearest neighbors, deep autoencoders

Multi-view clustering is the task of clustering data that is described by several, possibly distinct, feature sets at once. In many application domains, multi-view data arises naturally. For example, news articles can be described by both text and pictures, and multimedia segments can be described by their video signals from cameras and audio signals from voice recorders. Multi-view clustering has a wide range of potentially high-impact applications. Yet two aspects remain under-explored in multi-view clustering: the benefit of using graph-based local similarity information to learn better data representations for clustering, and the flexibility to incorporate pairwise constraints, when they are available, to improve clustering performance.

In this thesis, we present Local Similarity Graph based Multi-view Clustering (LSGMC), a new correlation-based multi-view clustering approach. The method leverages local similarity graphs constructed from mutual K nearest neighbors. LSGMC uses these graphs to guide the search for a better data representation by exploring first-order proximity within views and by utilizing complementary information across views. We show empirically that LSGMC can efficiently use information from multiple views to improve clustering accuracy, and that it outperforms state-of-the-art multi-view alternatives on a variety of benchmark and real-world datasets, including image data for handwritten digit recognition, text data for language recognition, and acoustic-articulatory data for speech recognition. We further show that LSGMC is flexible in incorporating pairwise constraints, and thus it can be naturally extended to handle semi-supervised learning problems.
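
As an illustration of the mutual K nearest neighbor idea mentioned above, the following is a minimal sketch of how such a graph could be built for a single view. It is not taken from the thesis: the function name `mutual_knn_graph`, the use of Euclidean distance, and the unweighted boolean adjacency matrix are all assumptions made for this example. Two points are connected only if each one appears among the other's K nearest neighbors.

```python
import numpy as np


def mutual_knn_graph(X, k):
    """Boolean adjacency matrix of a mutual k-NN graph (illustrative sketch).

    Assumptions (not from the thesis): Euclidean distance, unweighted
    edges, and a dense O(n^2) distance computation suitable for small n.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances between all rows of X.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)

    # A point is never counted as its own neighbor.
    np.fill_diagonal(dist, np.inf)

    # Indices of the k nearest neighbors of each point.
    nn_idx = np.argsort(dist, axis=1)[:, :k]

    # Directed k-NN graph: directed[i, j] is True if j is among i's k nearest.
    directed = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), k)
    directed[rows, nn_idx.ravel()] = True

    # Keep an edge only when the neighbor relation holds in both directions.
    return directed & directed.T


if __name__ == "__main__":
    # Four points on a line; with k=1 only points 0 and 1 choose each other.
    points = np.array([[0.0], [1.0], [2.5], [10.0]])
    print(mutual_knn_graph(points, k=1).astype(int))
```

Requiring the relation to hold in both directions makes the graph sparser than an ordinary K nearest neighbor graph, and it tends to leave outliers, such as the last point in the usage example, without edges.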

70 pages

Thesis Committee:
Artur Dubrawski (Chair)
Jeff Schneider

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

