CMU-CS-10-101
Lane Center for Computational Biology
School of Computer Science, Carnegie Mellon University

A Framework for Inferring Protein Location
Shannon Quinn
May 2010
M.S. Thesis
The Waldo framework offers one method for determining subcellular protein location patterns. The framework operates by gathering data from many different protein databases. Waldo builds a model from proteins with observed location patterns, clustering them by their locations under specific conditions. This creates clusters whose labels are effectively consensus location patterns. These consensus patterns serve as starting points for proteins whose location patterns under the conditions of interest are unknown. Under the assumption that similar proteins localize similarly, the unobserved proteins are compared against those that are clustered, identifying a cluster whose constituent members have structure and sequence closest to that of the protein of interest. By associating a known protein of closest match to the unknown protein, the location pattern of the known protein provides a point estimate for the unobserved location pattern. Using the Z-score and percent identity resulting from the homology comparison, a confidence can then be placed on the point estimate of the location pattern.

18 pages
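The inference step described in the abstract can be illustrated with a minimal Python sketch. This is not Waldo's code or API: the function names, the toy sequences, and the locations are all hypothetical. `difflib.SequenceMatcher` stands in for a real homology comparison, and the Z-score is computed against the scores of the other known proteins, which is only one plausible reading of the abstract.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from statistics import mean, pstdev

# Toy data (made up): protein id, sequence, observed location.
KNOWN = [
    ("P1", "MKTAYIAKQR", "nucleus"),
    ("P2", "MKTAYIAKQL", "nucleus"),
    ("P3", "GGSLLPWWDE", "mitochondrion"),
    ("P4", "GGSLLPWFDE", "mitochondrion"),
]


def percent_identity(a, b):
    """Assumed stand-in for a homology score: difflib ratio scaled to 0-100."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def build_clusters(known):
    """Group known proteins by location; each label acts as a consensus pattern."""
    clusters = defaultdict(list)
    for pid, seq, loc in known:
        clusters[loc].append((pid, seq))
    return clusters


def infer_location(query, known):
    """Return the location of the closest known protein plus its match scores."""
    clusters = build_clusters(known)
    scores = [
        (percent_identity(query, seq), pid, loc)
        for loc, members in clusters.items()
        for pid, seq in members
    ]
    best_score, best_pid, best_loc = max(scores)
    values = [score for score, _, _ in scores]
    spread = pstdev(values)
    # Z-score of the best match relative to all comparisons (an assumption).
    z_score = (best_score - mean(values)) / spread if spread > 0 else 0.0
    return {
        "location": best_loc,          # point estimate of the location pattern
        "match": best_pid,             # closest known protein
        "percent_identity": best_score,
        "z_score": z_score,
    }


if __name__ == "__main__":
    print(infer_location("MKTAYIAKQR", KNOWN))
```

The returned `percent_identity` and `z_score` are left separate rather than merged into a single confidence value, because the abstract does not say how the two are combined.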