CMU-CS-24-136
Computer Science Department
School of Computer Science, Carnegie Mellon University
On Resource Efficient Transfer Learning
Lucio Mwinmaarong Dery
Ph.D. Thesis
July 2024
Transfer learning is a machine learning (ML) paradigm where performance on a desired end task is improved by exploiting "knowledge" from other tasks. The technique has become a critical workhorse driving many of the advances that push the envelope of machine learning capabilities. The current formula is relatively simple: train a large model on large amounts of data from the transfer task(s), then apply the learned model, either zero-shot or after adaptation, to the desired downstream task(s). This thesis recognizes that these powerful models are not developed in vacuo but rather require non-trivial resources to train and deploy. As such, there is a wide range of salient problems and communities of researchers that the status quo leaves behind.
In the first part of this thesis, we will focus on the training-time problem of data-efficient transfer learning. We will begin by making a case for exploiting advance knowledge of the desired downstream task(s), which is available in many ML settings, to inform different dimensions of transfer learning. We dub this end task aware transfer learning. Next, we will present a set of novel end task aware optimization algorithms that bias the learning trajectory towards data-efficient solutions with strong generalization on the end task. We will close this part by providing an automated approach to constructing and searching over task-relevant transfer objectives when only a limited amount of end task data is available.
In the second part of this thesis, we will develop algorithms for compute- and memory-efficient transfer learning. Our goal will be to deliver a small, efficient, yet performant task-specific model for deployment, seeded from a large, generalist model that has already been pre-trained on a transfer task (or set of tasks). Focusing on structured pruning as the technique for making models smaller, we will investigate pruning under two resource-constrained settings: (1) limited task data, where we will exploit extra transfer tasks to learn pruning structures that, at the same task performance, lead to more compute- and memory-efficient models; and (2) limited memory, where many of the classical pruning techniques break down because they require gradient-based optimization, which can have prohibitive memory overhead. This thesis concludes by presenting avenues for future work on resource efficient transfer learning, both building on our past work and suggesting novel branches of investigation.
157 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department