CMU-ML-13-107
Machine Learning Department
School of Computer Science, Carnegie Mellon University



CMU-ML-13-107

Approximation Algorithms and New Models
for Clustering and Learning

Pranjal Awasthi

August 2013

Ph.D. Thesis

CMU-ML-13-107.pdf


Keywords: Clustering, PAC learning, Interactive learning


This thesis is divided into two parts. In part one, we study the k-median and the k-means clustering problems. We take a different approach than the traditional worst case analysis models. We show that by looking at certain well motivated stable instances, one can design much better approximation algorithms for these problems. Our algorithms achieve arbitrarily good approximation factors on stable instances, something which is provably hard on worst case instances. We also study a different model for clustering which introduces limited amount of interaction with the user. Such interactive models are very popular in the context of learning algorithms but their effectiveness for clustering is not well understood. We present promising theoretical and experimental results in this direction.

The second part of the thesis studies the design of provably good learning algorithms which work under adversarial noise. One of the fundamental problems in this area is to understand the learnability of the class of disjunctions of Boolean variables. We design a learning algorithm which improves on the guarantees of the previously best known result for this problem. In addition, the techniques used seem fairly general and promising to be applicable to a wider class of problems. We also propose a new model for learning with queries. This model restricts the algorithms ability to only ask certain "local" queries. We motivate the need for the model and show that one can design efficient local query algorithms for a wide class of problems.

177 pages


SCS Technical Report Collection
School of Computer Science