CMU-CS-24-131
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-24-131

On Efficient Sketching Algorithms

Praneeth Kacham

Ph.D. Thesis

May 2024

CMU-CS-24-131.pdf


Keywords: Sketching, Streaming, Distributed Algorithms, Regression, Low Rank Approximation

Sketching refers to a wide variety of techniques to compress large datasets into much smaller forms that can be efficiently processed to answer questions about the original dataset. Over the past few decades, sketching has emerged as a key tool to efficiently handle large datasets in majorly three settings: (i) the Classic setting, in which the dataset is given to us and we want to solve a problem as quickly as possible, (ii) the Streaming setting, in which the underlying dataset is defined by a large stream of updates and we want to compute interesting properties of the dataset using a small amount of space, and (iii) the Distributed setting, in which the dataset of interest is split among multiple servers and we want protocols that use a small amount of communication among servers to solve problems of interest on the underlying dataset.

Each of the above settings presents a different challenge with regard to the measure of efficiency we are interested in. In this thesis, we study sketching algorithms in these three settings for a variety of problems. While the techniques required to obtain our algorithms differ across problems and settings, the underlying idea of data compression to convert the original large dataset into a much smaller form is a key ingredient behind all the results in this thesis.

446 pages

Thesis Committee:
David Woodruff (Chair)
Pravesh K. Kothari
Richard Peng
Rasmus Pagh (University of Copenhagen)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science


Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu