CMU-CS-21-118
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-21-118

Sketching and Sampling Algorithms for High-Dimensional Data

Rajesh Jayaram

Ph.D. Thesis

2021

CMU-CS-21-118.pdf


Keywords: Sketching, Streaming, Sampling, Numerical Linear Algebra

Sketching is a powerful technique which compresses high-dimensional datasets into lower-dimensional sketches, while preserving properties of interest. Sampling is a quintessential form of sketching, where one generates samples from the data to use as the basis for a sketch. Sketching and sampling methods have become ubiquitous across computer science, and are a central component of modern algorithm design.

This thesis studies the design and analysis of sketching and sampling algorithms for a variety of computational tasks. In particular, we make contributions to three closely related areas: Streaming and Distributed Algorithms, Numerical Linear Algebra, and Database Query Evaluation. Our contributions to these areas include:

Streaming and Distributed Algorithms:

  • The first perfect Lp samplers, near-optimal algorithms for Lp estimation for p ∈ (0, 1), improved sketches for the Earth Mover's Distance, and the introduction of the adversarially robust and bounded deletion streaming models.
Numerical Linear Algebra:
  • The first property testing algorithms for matrix positive semi-definiteness, and the first input-sparsity solvers for linear regression on design matrices which are (1) a Kronecker product of several smaller matrices or (2) a database join of several smaller tables.
Database Query Evaluation:
  • A characterization of the classes of conjunctive queries for which approximate counting and uniformly sampling can be accomplished in polynomial time. Along the way, we obtain the first polynomial time algorithm for estimating the number of strings of length n accepted by a non-deterministic finite automaton.

722 pages

Thesis Committee:
David Woodruff (Chair)
Anupam Gupta
Andrej Risteski
Alexandr Andoni (Columbia University)
Jelani Nelson (University of California, Berkeley)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science


Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu