CMU-CS-24-137
Computer Science Department
School of Computer Science, Carnegie Mellon University
Classical Improvements to Modern Machine Learning
Shiva Kaul
Ph.D. Thesis, August 2024
Over the past decade, a large rift has grown between classical and modern machine learning. The predictive performance of modern learning is incomparably better, but it is easier to analyze and guarantee safety, efficiency, fairness, and other properties of classical learning. In this dissertation, I investigate when it is possible to restore such desiderata to modern machine learning by prudently and strategically incorporating classical techniques. I form syntheses between classical and modern learning that can be categorized according to two high-level strategies: (1) wrapping, in which reliable performance guarantees are extracted from modern, opaque models via classical analytic techniques, or (2) swapping, in which some components of a modern model are rebuilt from classical primitives in a way that improves overall efficiency, tractability, and/or expressivity. These efforts lead to new developments in multiple areas of machine learning.

The most important contribution in this dissertation pertains to meta-analysis, a structured form of question-answering which serves as the foundation of evidence-based medicine. Classical meta-analytic techniques are based upon randomized, controlled trials, whose causal validity is trusted; by contrast, modern regression models are trained upon large observational databases whose causal validity is untrusted. I show that it is possible to incorporate the untrusted data into meta-analysis without sacrificing validity. This involves basic improvements to full conformal prediction that are of general interest. In a separate, more focused application to healthcare, I generalize classical, handcrafted heart-rate variability statistics so they can be fine-tuned, via supervised learning, as part of a deep neural network. This leads to more accurate, physiologically informed models.

I also present foundational computational primitives that can be used within future machine learning models and algorithms. The first is an algorithm to (approximately) run nonlinear RNNs for T steps in just O(log T) parallel time. A key innovation of this algorithm is replacing nonlinearity across time with nonlinearity along depth, through a provably consistent scheme of local, parallelizable corrections. In this manner, classical linear dynamical systems (also known as state-space models) can be stacked to form fast, nonlinear sequence models. Another new computational primitive is gradient-based optimization over the set of all sequences of orthogonal polynomials. This formulation has connections to many different problems in signal processing and optimization. Finally, I propose fairness criteria that circumvent computational intractability, based upon the geometric notion of margin used throughout learning theory and optimization.

206 pages
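
The abstract names full conformal prediction as the tool for folding untrusted data into meta-analysis without sacrificing validity. The following is a minimal sketch of the standard full conformal procedure wrapped around ridge regression, not the thesis's specific improvements; the nonconformity score (absolute residual), the finite grid of candidate labels, and the hyperparameters are illustrative assumptions.

    # Minimal sketch of full conformal prediction around ridge regression.
    import jax.numpy as jnp
    from jax import vmap

    def ridge_fit_predict(X, y, lam):
        """Closed-form ridge fit on (X, y); returns in-sample predictions."""
        d = X.shape[1]
        w = jnp.linalg.solve(X.T @ X + lam * jnp.eye(d), X.T @ y)
        return X @ w

    def full_conformal_set(X, y, x_new, y_grid, alpha=0.1, lam=1.0):
        """Boolean mask over y_grid marking the level-(1 - alpha) prediction set."""
        X_aug = jnp.vstack([X, x_new[None, :]])

        def p_value(y_cand):
            # Refit on the training data augmented with the candidate label.
            y_aug = jnp.append(y, y_cand)
            scores = jnp.abs(y_aug - ridge_fit_predict(X_aug, y_aug, lam))
            # Rank of the test point's nonconformity score among all n + 1 scores.
            return jnp.mean(scores >= scores[-1])

        p_vals = vmap(p_value)(y_grid)
        return p_vals > alpha

Under exchangeability of the n + 1 points, the returned set contains the true label with probability at least 1 - alpha regardless of how well the ridge model fits; the thesis's improvements to this full (rather than split) procedure are not reproduced here.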
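
The healthcare application generalizes handcrafted heart-rate variability statistics so they can be fine-tuned by supervised learning. The block below is only a hypothetical illustration of that pattern, using RMSSD (root mean square of successive differences of interbeat intervals) re-expressed with a learnable convolution kernel initialized at its classical value; the thesis's actual generalization may differ.

    # Hypothetical sketch: a classical HRV statistic as a differentiable layer.
    import jax
    import jax.numpy as jnp

    def generalized_rmssd(params, rr):
        """rr: (T,) interbeat intervals in seconds.  params['kernel'] starts as the
        classical first-difference filter [-1, 1] and can be fine-tuned."""
        diffs = jnp.convolve(rr, params["kernel"], mode="valid")  # successive differences (up to sign)
        return jnp.sqrt(jnp.mean(diffs ** 2) + 1e-8)

    def loss(params, rr, target):
        # Toy supervised objective: regress a label from the learned statistic.
        pred = params["scale"] * generalized_rmssd(params, rr) + params["bias"]
        return (pred - target) ** 2

    params = {"kernel": jnp.array([-1.0, 1.0]),   # classical RMSSD filter as initialization
              "scale": jnp.array(1.0), "bias": jnp.array(0.0)}
    rr = jnp.array([0.80, 0.82, 0.79, 0.85, 0.81, 0.78])
    grads = jax.grad(loss)(params, rr, 1.0)       # gradients flow into the kernel itself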
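
For the O(log T) parallel-time primitive, the abstract describes stacking classical linear dynamical systems (state-space models) and moving nonlinearity from time to depth via local, parallelizable corrections. The sketch below shows only the classical ingredient: a diagonal linear recurrence h_t = a_t * h_{t-1} + b_t evaluated in O(log T) parallel depth with an associative scan. The correction scheme that makes the stacked model nonlinear is the thesis's contribution and is not reproduced.

    # Sketch: a diagonal linear recurrence run in O(log T) parallel depth.
    import jax
    import jax.numpy as jnp

    def compose(left, right):
        """Compose two affine maps h -> a*h + b; associativity is what allows a scan."""
        a1, b1 = left
        a2, b2 = right
        return a2 * a1, a2 * b1 + b2

    def run_linear_ssm(a, b, h0):
        """a, b: (T, d) per-step coefficients; returns all hidden states, shape (T, d)."""
        A, B = jax.lax.associative_scan(compose, (a, b))   # O(log T) depth
        return A * h0 + B

    T, d = 8, 4
    key = jax.random.PRNGKey(0)
    a = 0.9 * jnp.ones((T, d))                  # per-step decay
    b = jax.random.normal(key, (T, d))          # per-step input drive
    h = run_linear_ssm(a, b, h0=jnp.zeros(d))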
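
The second primitive is gradient-based optimization over the set of all sequences of orthogonal polynomials. One standard way to make that set amenable to gradient descent, shown below as an assumption rather than the thesis's exact formulation, is to parameterize the sequence by its three-term recurrence coefficients: by Favard's theorem, any recurrence p_{k+1}(x) = (x - a_k) p_k(x) - b_k p_{k-1}(x) with b_k > 0 yields polynomials orthogonal with respect to some positive measure. The objective is a placeholder.

    # Sketch: optimize over orthogonal-polynomial sequences via recurrence coefficients.
    import jax
    import jax.numpy as jnp

    def eval_polys(params, x, degree):
        """Evaluate p_0, ..., p_degree at points x via the three-term recurrence."""
        a, log_b = params["a"], params["log_b"]          # log_b keeps b_k > 0
        p_prev, p_curr = jnp.zeros_like(x), jnp.ones_like(x)   # p_{-1}, p_0
        polys = [p_curr]
        for k in range(degree):
            p_next = (x - a[k]) * p_curr - jnp.exp(log_b[k]) * p_prev
            p_prev, p_curr = p_curr, p_next
            polys.append(p_curr)
        return jnp.stack(polys)                          # (degree + 1, len(x))

    def objective(params, x, target):
        # Placeholder loss: make the top-degree polynomial match a target signal.
        return jnp.mean((eval_polys(params, x, degree=4)[-1] - target) ** 2)

    x = jnp.linspace(-1.0, 1.0, 64)
    target = jnp.cos(3 * jnp.pi * x)
    params = {"a": jnp.zeros(4), "log_b": jnp.zeros(4)}
    for _ in range(100):                                 # plain gradient descent
        grads = jax.grad(objective)(params, x, target)
        params = jax.tree_util.tree_map(lambda p, g: p - 0.05 * g, params, grads)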
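
The abstract states only that the proposed fairness criteria use the geometric notion of margin to sidestep the intractability of constraining hard error rates; it does not spell them out. The fragment below is a hypothetical illustration of one natural margin-based quantity, the gap in average signed margin between two groups, which is differentiable in the model's scores; it should not be read as the thesis's actual criteria.

    # Hypothetical illustration of a margin-based fairness quantity.
    import jax.numpy as jnp

    def margin_gap(scores, labels, group):
        """scores: real-valued model outputs; labels in {-1, +1};
        group: boolean mask for the protected group."""
        margins = labels * scores                        # signed margin per example
        return jnp.abs(margins[group].mean() - margins[~group].mean())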
Thesis Committee:

Srinivasan Seshan, Head, Computer Science Department