Computer Science Department
School of Computer Science, Carnegie Mellon University
Beyond Worst-Case Analysis in Privacy and Clustering:
This thesis can be viewed as a collection of work in differential privacy and in clustering. In its first part we discuss work aimed at preserving differential privacy in a social network, with respect to either the presence/absence of a single edge , or with respect to changing all edges adjacent to one node . In its second part we discuss multiple clustering problems, focusing on the k-means and k-median problems. We show how to correctly cluster an instance whose optimal k-means solution either satisfies a certain stability condition , or is resilient to small constant metric perturbations , or with cluster centers that satisfy a particular separation condition .
Alternatively, this thesis can be viewed as an investigation of specific nonworst case analysis paradigms. The common theme among of all results composing this thesis is that they all introduce algorithms whose guarantees are meaningful only for a subset of inputs – for inputs that satisfy certain nice properties, or assumptions. These assumptions can be roughly divided into two types, which we refer to as explicit or implicit. Explicit assumptions give a very specific and quantifiable characterization of the input (e.g. clustering instances with the distance between any pair of cluster centers larger than a specific bound). On the other hand, implicit assumptions are harder to characterize. Implicit assumptions pose a certain property that the input should satisfy, due to some compelling "real-life" reasoning (e.g. justifying a particular value of k for a k-means clustering instance), and often give much leeway as to the particular structure of the input. In this thesis, we exhibit multiple examples of assumptions of both kinds, in differential privacy and clustering, and give algorithms that take advantage of these assumptions. In particular, we show how tasks that are provably hard become feasible under suitable assumptions; tasks like providing accurate answers for queries over graph while preserving privacy on the node level, or giving a c-approximation for the k-median objective for c < 1 + e1.