CMU-CS-23-146 Computer Science Department School of Computer Science, Carnegie Mellon University
Robust Mean Estimation Against Oblivious Adversaries Shuchen Li M.S. Thesis December 2023
We give the first algorithm that achieves arbitrarily high accuracy for robust mean estimation on Gaussian and Laplace distribution when the adversary corrupts the samples before the noise is added. For sufficiently small constant α > 0 and accuracy ε > 0, our algorithm takes as input samples Y ⊆ ℝd of size n obtained by i.i.d. samples X ⊆ ℝd of size n from the true distribution with unknown mean μ, while an adversary corrupts an α fraction of the points before the (Gaussian or Laplace) noise is added. When the true distribution is Gaussian with unknown mean μ and covariance I, our algorithm needs sample size n = 2O(d/ε2) . When the true distribution is Laplace with unknown mean μ and covariance I, our algorithm needs sample size n = Ŏ(d2/ε4). Our algorithm runs in Ŏ(nd) time, and outputs an estimation μ̂ that ∥μ̂ - μ∥2 ≤ ε with high probability. Our method is to transform the sample to a Fourier-sparse signal that encodes the true mean μ and apply the sparse Fourier transform to decode μ. In Huber's contamination model, where the adversary corrupts an α fraction of the sample after the noise is added, it is known that there is a Ω(α) lower bound on the error of the estimator. In contrast, our algorithm can estimate the mean arbitrarily closely in this corruption-before-noise setting.
Our algorithm in this new setting has many possible applications. For example, the one that motivated us is max-affine regression. In max-affine regression, the model is the maximum of k linear models, where the noise is added after taking the maximum. If we can extend our algorithm to the list-decodable setting, then we immediately get an algorithm for the max-affine regression 35 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department
| |
Return to:
SCS Technical Report Collection This page maintained by reports@cs.cmu.edu |