CMU-CS-05-164
Computer Science Department, School of Computer Science, Carnegie Mellon University
Christopher James Langmead July 2005
Keywords: Computational biology, metric learning, classification, regression
The inputs to the algorithm are a set, $U$, of unlabeled points in
$\mathbb{R}^{n}$, a set of pairs of points,
$S = \{(x,y)_{i}\};\ x,y \in U$, that are known to be similar, and a set
of pairs of points, $D = \{(x,y)_{i}\};\ x,y \in U$, that are known to be
dissimilar. The algorithm randomly samples $S$, $D$, and $m$-dimensional
subspaces of $\mathbb{R}^{n}$ and learns a metric
for each subspace. The metric over $\mathbb{R}^{n}$ is a linear
combination of the subspace metrics. The randomization addresses
issues of efficiency and overfitting. Extensions of the algorithm
to learning non-linear metrics via kernels, and as a
pre-processing step for dimensionality reduction are discussed.
The new method is demonstrated on a regression problem
(structure-based chemical shift prediction) and a classification
problem (predicting clinical outcomes for immunomodulatory
strategies for treating severe sepsis).
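
The sampling-and-combination scheme described above can be illustrated with a minimal Python sketch. It is not the report's implementation, and several details are assumptions made only for illustration: subspaces are taken to be random subsets of m coordinates, pairs in S and D are given as row indices into a data matrix X, the per-subspace learner is a stand-in (a regularized inverse scatter of similar-pair differences), and the combination weights are the ratio of mean dissimilar to mean similar squared distance. The names `learn_subspace_metric`, `mean_sq_dist`, and `randomized_metric` are hypothetical.

```python
import numpy as np

def learn_subspace_metric(diffs_sim, reg=1e-3):
    """Stand-in learner (assumption): inverse of the regularized scatter
    matrix of similar-pair differences within one subspace."""
    scatter = diffs_sim.T @ diffs_sim / len(diffs_sim)
    return np.linalg.inv(scatter + reg * np.eye(scatter.shape[0]))

def mean_sq_dist(diffs, M):
    """Mean squared Mahalanobis distance d^T M d over the rows of diffs."""
    return float(np.mean(np.einsum("ij,jk,ik->i", diffs, M, diffs)))

def randomized_metric(X, S, D, m, n_rounds=20, frac=0.5, seed=0):
    """Combine metrics learned on random m-dimensional coordinate subspaces.

    X: (N, n) data matrix; S, D: lists of (i, j) row-index pairs.
    Returns an (n, n) positive semidefinite matrix.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    S, D = np.asarray(S), np.asarray(D)
    M_total, w_total = np.zeros((n, n)), 0.0
    for _ in range(n_rounds):
        # Randomly sample a subspace and subsets of S and D.
        dims = rng.choice(n, size=m, replace=False)
        s_idx = rng.choice(len(S), size=max(1, int(frac * len(S))), replace=False)
        d_idx = rng.choice(len(D), size=max(1, int(frac * len(D))), replace=False)
        ds = X[S[s_idx, 0]][:, dims] - X[S[s_idx, 1]][:, dims]
        dd = X[D[d_idx, 0]][:, dims] - X[D[d_idx, 1]][:, dims]
        # Learn a metric for this subspace, then weight it (assumed heuristic).
        M_k = learn_subspace_metric(ds)
        w_k = mean_sq_dist(dd, M_k) / (mean_sq_dist(ds, M_k) + 1e-12)
        # Embed the m x m metric into the full n x n matrix and accumulate.
        M_total[np.ix_(dims, dims)] += w_k * M_k
        w_total += w_k
    return M_total / max(w_total, 1e-12)
```

Because each subspace metric is positive definite and each weight is non-negative, the combined matrix is positive semidefinite, so it defines a valid (pseudo-)metric over the full space. Each round touches only an m x m matrix, which is where a randomized scheme of this kind would save work relative to learning a single n x n metric directly.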
15 pages