CMU-CS-23-139
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-23-139

Practical Methods for Automated Algorithm Design
in Machine Learning and Computational Biology

Quang Minh Hoang

Ph.D. Thesis

October 2023

CMU-CS-23-139.pdf


Keywords: Automated Machine Learning, Computational Biology, Combinatorial Optimization, Sequence Sketching, Federated Learning, Meta Learning

Configuration tuning is an essential practice to achieve good performance with many computational methods. However, configuring complex and discrete algorithms often requires significant trial-and-error effort due to a lack of automated solutions. In large-scale systems where computational tasks are numerous and constantly changing in specificity, the repetitive cost of manual tuning becomes a major bottleneck that hinders scalability. Moreover, the absence of a systematic approach to configure deployment settings makes it challenging to replicate the obtained results in different deploying conditions. To address these problems, this thesis focuses on developing new data-driven automated algorithm design (AAD) frameworks in several classical and multi-task settings. Specifically, in the classical configuration tuning setting, we address the problems of kernel selection for Bayesian methods, and minimizer construction for biological sequence sketching. In the multi-task scenario, we address the problems of privacy-preserving neural architecture search for multiple clients, and meta-learning for parameter optimization in a heterogeneous task stream. In all of these problems, the variables to be optimized often have underlying discrete structures such as trees, graphs or permutations. Our contribution is a suite of reformulation techniques that result in efficient and accurate tuning methods for these configuration domains. Finally, we demonstrate the performance of our methods on practical scenarios and show that they have significantly outperformed state-of-the-art benchmarks.

153 pages

Thesis Committee:
Carl Kingsford (Chair)
David P. Woodruff
Maria-Florina Balcan
Risto Miikkulainen (The University of Texas at Austin)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science


Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu