COMPUTER SCIENCE TECHNICAL REPORT ABSTRACTS

CMU-CS-08-164
Computer Science Department
School of Computer Science, Carnegie Mellon University

CMU-CS-08-164

Techniques for Exploiting Unlabeled Data

Mugizi Robert Rwebangira

October 2008

Ph.D. Thesis

CMU-CS-08-164.pdf

Keywords: Semi-supervised, regression, unlabeled data, similarity

In many machine learning application domains obtaining labeled data is expensive but obtaining unlabeled data is much cheaper. For this reason there has been growing interest in algorithms that are able to take advantage of unlabeled data. In this thesis we develop several methods for taking advantage of unlabeled data in classification and regression tasks.

Specific contributions include:

A method for improving the performance of the graph mincut algorithm of Blum and Chawla [12] by taking randomized mincuts. We give theoretical motivation for this approach and we present empirical results showing that randomized mincut tends to outperform the original graph mincut algorithm, especially when the number of labeled examples is very small.
An algorithm for semi-supervised regression based on manifold regularization using local linear estimators. This is the first extension of local linear regression to the semi-supervised setting. In this thesis we present experimental results on both synthetic and real data and show that this method tends to perform better than methods which only utilize the labeled data.
An investigation of practical techniques for using the Winnow algorithm (which is not directly kernelizable) together with kernel functions and general similarity functions via unlabeled data. We expect such techniques to be particularly useful when we have a large feature space as well as additional similarity measures that we would like to use together with the original features. This method is also suited to situations where the best performing measure of similarity does not satisfy the properties of a kernel. We present some experiments on real and synthetic data to support this approach.

114 pages

Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu