Computer Science Department
School of Computer Science, Carnegie Mellon University


Active Transfer Learning

Xuezhi Wang

June 2016

Ph.D. Thesis


Keywords: Transfer Learning, Active Learning, Model Shift, Multi-Task Learning, Stability Analysis, Distribution Learning

Transfer learning algorithms are used when one has sufficient training data for one supervised learning task (the source task) but only very limited training data for a second task (the target task) that is similar but not identical to the first. These algorithms use varying assumptions about the similarity between the tasks to carry information from the source task to the target task. A common assumption is that only certain marginal or conditional distributions have changed while everything else remains the same. Moreover, little work on transfer learning has considered the case in which a few labels in the test domain are available. Alternatively, if one has only the target task but can choose a limited amount of additional training data to collect, active learning algorithms are used to select the data that will most improve performance on the target task. The two kinds of algorithms may be combined into active transfer learning, but previous efforts have had to apply the two methods in sequence or use restrictive transfer assumptions.
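To make the idea of "choosing which data to collect" concrete, here is a minimal sketch of one generic active learning criterion, uncertainty sampling with a Gaussian process: among candidate inputs, query the one whose posterior variance is largest. The RBF kernel, the noise level, and the helper names `rbf_kernel` and `select_query` are illustrative assumptions; this is not the selection rule developed in the thesis.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between 1-D input arrays a and b.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def select_query(x_labeled, x_pool, noise=1e-2, length_scale=1.0):
    """Return the index in x_pool with the largest GP posterior variance.

    Hypothetical helper: uncertainty sampling is a generic criterion,
    used here only to illustrate what an active learner does.
    """
    k_ll = rbf_kernel(x_labeled, x_labeled, length_scale) + noise * np.eye(len(x_labeled))
    k_pl = rbf_kernel(x_pool, x_labeled, length_scale)  # shape (n_pool, n_labeled)
    # Diagonal of k_pl K_ll^{-1} k_pl^T, computed without forming the full matrix.
    solved = np.linalg.solve(k_ll, k_pl.T)  # shape (n_labeled, n_pool)
    reduction = np.sum(k_pl * solved.T, axis=1)
    # Prior variance k(x, x) is 1 for the RBF kernel.
    variance = 1.0 - reduction
    return int(np.argmax(variance))
```

For example, with labeled inputs at 0 and 1, a candidate at 3 is preferred over one at 0.5, because the labeled points say little about the function that far away.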

This thesis focuses on active transfer learning under the model shift assumption. We start by proposing two transfer learning algorithms that allow changes in all marginal and conditional distributions but assume the changes are smooth, which makes transfer between the tasks possible. We then propose an active learning algorithm for the second method, yielding a combined active transfer learning algorithm. By analyzing the risk bounds of the proposed transfer learning algorithms, we show that when the conditional distribution changes, we obtain a generalization error bound of $O\left(\frac{1}{\lambda^* \sqrt{n_l}}\right)$ with respect to the labeled target sample size $n_l$, modified by the smoothness of the change across domains ($\lambda^*$). Our analysis also sheds light on the conditions under which transfer learning outperforms no-transfer learning (learning from labeled target data only). Furthermore, we consider a general case in which both the support and the model change across domains. We transform both X (features) and Y (labels) by a parameterized location-scale shift to achieve transfer between tasks.
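The label-side location-scale idea can be illustrated with a minimal sketch, assuming a single scalar input and a fitted source model whose predictions `f_src` are available at the labeled target inputs. The target output is modeled as w(x) * f_src(x) + b(x), where the scale w and location b are taken to be linear in x. This linear form is a hypothetical stand-in for the smooth transformations used in the thesis, and the feature-side (X) transformation is omitted. A ridge penalty pulls the fit toward the identity transform (w = 1, b = 0), so with very few target labels the prediction stays close to the source model. The function names are hypothetical.

```python
import numpy as np


def fit_location_scale(x, f_src, y_tgt, reg=1e-3):
    """Fit y_tgt ~ w(x) * f_src + b(x), with w(x) = t0 + t1*x and b(x) = t2 + t3*x.

    x: labeled target inputs (1-D); f_src: source-model predictions at x;
    y_tgt: the few labeled target outputs. The ridge term shrinks the
    parameters toward the identity transform (w = 1, b = 0), i.e. "no change".
    """
    phi = np.column_stack([f_src, x * f_src, np.ones_like(x), x])
    theta0 = np.array([1.0, 0.0, 0.0, 0.0])  # identity transform
    resid = y_tgt - phi @ theta0
    a = phi.T @ phi + reg * np.eye(4)
    # Closed-form minimizer of ||y - phi theta||^2 + reg * ||theta - theta0||^2.
    return theta0 + np.linalg.solve(a, phi.T @ resid)


def apply_location_scale(theta, x, f_src):
    w = theta[0] + theta[1] * x  # smooth scale
    b = theta[2] + theta[3] * x  # smooth location
    return w * f_src + b
```

With `x = np.linspace(0, 1, 5)`, `f_src = np.sin(x)` and target labels generated as `(1.5 + 0.5 * x) * f_src + 0.2 - 0.1 * x`, the fitted transform reproduces the target labels closely; when the target labels equal `f_src`, the fit returns the identity parameters `[1, 0, 0, 0]`.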

Multi-task learning, on the other hand, attempts to leverage data from multiple domains simultaneously in order to estimate related functions on each domain. As in transfer learning, multi-task problems are also solved by imposing some kind of "smooth" relationship among the tasks. We study how different smoothness assumptions on task relations affect the upper bounds for algorithms proposed for these problems under different settings.
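One common way to encode a "smooth" relationship among tasks is to penalize differences between task-specific parameters. The sketch below assumes a linear model per task and minimizes the sum over tasks of ||y_t - X_t w_t||^2 plus lam times the sum over task pairs of ||w_s - w_t||^2, in closed form. It is a generic graph-regularized multi-task regression chosen for illustration, not one of the specific algorithms analyzed in the thesis, and the name `fit_multitask` is hypothetical.

```python
import numpy as np


def fit_multitask(xs, ys, lam=1.0):
    """Jointly fit linear weights w_t for T tasks.

    Objective: sum_t ||y_t - X_t w_t||^2 + lam * sum_{s<t} ||w_s - w_t||^2.
    The pairwise penalty equals w^T (L kron I) w, where L is the Laplacian
    of the complete graph over tasks.
    """
    t = len(xs)
    d = xs[0].shape[1]
    lap = t * np.eye(t) - np.ones((t, t))  # complete-graph Laplacian
    a = lam * np.kron(lap, np.eye(d))
    rhs = np.zeros(t * d)
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Add each task's normal-equation block on the diagonal.
        a[i * d:(i + 1) * d, i * d:(i + 1) * d] += x.T @ x
        rhs[i * d:(i + 1) * d] = x.T @ y
    return np.linalg.solve(a, rhs).reshape(t, d)
```

Setting `lam=0` recovers independent least-squares fits for each task, while a very large `lam` forces nearly identical weights; intermediate values trade off per-task fit against agreement between tasks.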

Finally, we propose methods to predict the entire distributions P(Y) and P(Y|X) by transfer, while allowing both marginal and conditional distributions to change. Moreover, we extend this framework to multi-source distribution transfer.

We demonstrate the effectiveness of our methods on both synthetic examples and real-world applications, including yield estimation on the grape image dataset, predicting air quality in cities from Weibo posts, predicting whether a robot successfully climbs over an obstacle, examination score prediction for schools, and location prediction for taxis.

117 pages

Thesis Committee:
Jeff Schneider (Chair)
Christos Faloutsos
Geoff Gordon
Jerry Zhu (University of Wisconsin-Madison)

Frank Pfenning, Head, Computer Science Department
Andrew W. Moore, Dean, School of Computer Science
