CMU-CS-21-147
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-21-147

Machine Learning: Metrics and Embeddings

Timothy Chu

Ph.D.. Thesis

December 2021

CMU-CS-21-147.pdf


Keywords: Machine learning, Computational Geometry, Clustering, Kernels, Group Theory, Theoretical Foundations

In this thesis, we analyze new theories of clustering, one of the most fundamental tasks in machine learning. We use methods drawing from multiple disciplines, including metric embeddings, spectral algorithms, and group representation theory.

  • We propose a metric that adapts to the shape of data, and show how to quickly compute it. These metrics may be useful for improving k-means clustering methods.
  • We build a spectral partition method with provable theoretical guarantees. This may lead to more theoretically principled spectral clustering methods, as existing methods do not have any such guarantees. Spectral clustering is one of the most popular methods of clustering.
  • We classify all Manhattan distance kernels. Kernel methods are one of the oldest and most established methods of clustering data. This result is a Manhattan distance analog of one of the fundamental results on machine learning kernels.
Each of these contributions answers natural questions in machine learning theory. We develop multidisciplinary tools from disciplines ranging from linear algebra to group theory, and combine these with key ideas from metric embeddings and computational geometry.

228 pages

Thesis Committee:
Gary L. Miller (Chair)
Anupam Gupta
Daniel Slater
Satish Rao (University of California, Berkeley)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science


Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu