Lane Center for Computational Biology
School of Computer Science, Carnegie Mellon University
A Framework for Inferring Protein Location
The Waldo framework offers one method for determining subcellular protein location patterns. The framework operates by gathering data from many different protein databases. Waldo builds a model from proteins with observed location patterns, clustering them by their locations under specific conditions. This creates clusters whose labels are effectively consensus location patterns. These consensus patterns serve as starting points for proteins whose location patterns under the conditions of interest are unknown. Under the assumption that similar proteins localize similarly, each unobserved protein is compared against the clustered proteins to identify the cluster whose members are closest to it in structure and sequence. The closest-matching known protein is then associated with the unknown protein, and its location pattern provides a point estimate for the unobserved location pattern. The Z-score and percent identity resulting from the homology comparison are then used to place a confidence on that point estimate.
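The inference steps described above can be illustrated with a minimal Python sketch. It is not Waldo's actual code or API: the names Hit, cluster_by_location, confidence, and infer_location are hypothetical, the homology comparison is assumed to have been run already (its Z-scores and percent identities are passed in), and the confidence formula is a placeholder assumption, since the text does not state how the two quantities are combined.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import exp


@dataclass(frozen=True)
class Hit:
    """One homology comparison between the query and a known protein (hypothetical record)."""
    known_id: str
    z_score: float
    percent_identity: float  # expected range: 0 to 100


def cluster_by_location(observed):
    """Group known proteins by their observed location pattern under one condition.

    `observed` maps protein id -> location label; the result maps
    location label -> set of protein ids, so each cluster label acts as a
    consensus location pattern.
    """
    clusters = defaultdict(set)
    for protein_id, location in observed.items():
        clusters[location].add(protein_id)
    return dict(clusters)


def confidence(hit, z_midpoint=5.0):
    """Illustrative score in [0, 1]. ASSUMPTION: not the framework's real formula.

    A logistic transform of the Z-score is multiplied by the fractional identity.
    """
    z_term = 1.0 / (1.0 + exp(-(hit.z_score - z_midpoint)))
    identity_term = max(0.0, min(hit.percent_identity, 100.0)) / 100.0
    return z_term * identity_term


def infer_location(hits, clusters):
    """Return (location, closest known protein id, confidence), or None if no hit is clustered."""
    # Index each clustered protein by the label of the cluster that contains it.
    label_of = {member: label for label, members in clusters.items() for member in members}
    candidates = [h for h in hits if h.known_id in label_of]
    if not candidates:
        return None
    # Closest match: highest Z-score, ties broken by percent identity.
    best = max(candidates, key=lambda h: (h.z_score, h.percent_identity))
    return label_of[best.known_id], best.known_id, confidence(best)


# Toy data, invented purely for illustration.
observed = {"P1": "nucleus", "P2": "nucleus", "P3": "mitochondrion"}
clusters = cluster_by_location(observed)
hits = [Hit("P1", 3.0, 30.0), Hit("P3", 9.0, 80.0), Hit("PX", 20.0, 99.0)]
result = infer_location(hits, clusters)
print(result)
```

In this toy run the hit against PX is ignored because PX has no observed location, so the point estimate comes from P3, the best-scoring clustered protein. Any monotone combination of Z-score and percent identity could stand in for the placeholder confidence function.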