Computer Science Department
School of Computer Science, Carnegie Mellon University
A Computational Framework for the Analysis
In this thesis I present algorithms for the analysis of microarray expression data from multiple species. These algorithms are used to identify core genes in two biological systems, the cell cycle and the immune response.
With data generated from high-throughput biological experiments, it is now becoming possible to study organisms at the systems level. One of the first questions facing researchers is the identification of the core components of biological subsystems within an organism. This task is made difficult by the high levels of experimental and biological noise associated with these experiments. To address these problems I introduce a new computational framework for combining data from multiple species, both to improve prediction accuracy and to identify important subsets of genes involved in a given system. The computational framework is based on Markov random fields, which allow the integration of microarray and sequence data from multiple species. Applying this framework to study cell cycle regulated genes, I have identified genes representing the core machinery of the cell cycle. These findings are supported by both complementary high-throughput data and motif analysis. In addition, I apply this computational framework to study the immune response in human and mouse. I show that using Gaussian random fields instead of discrete Markov random fields yields better accuracy in predicting immune response genes. Finally, I identify a list of immune response genes that are conserved across cell types and species for further experimental study.
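
To make the Gaussian random field idea concrete, the following is a minimal illustrative sketch and not the model developed in the thesis. It assumes each gene has a continuous prior score derived from its own expression data, and that genes in different species are linked by edges weighted by sequence similarity. The gene names, scores, edge weights, the function `smooth_scores`, and the parameter `lam` are all hypothetical. The sketch minimizes a quadratic energy that keeps each score close to its prior while pulling linked genes toward each other.

```python
def smooth_scores(prior, edges, lam=1.0, n_iter=200):
    """Minimize sum_i (f_i - y_i)^2 + lam * sum_(i,j) w_ij * (f_i - f_j)^2.

    prior: dict mapping gene -> prior score y_i (hypothetical values)
    edges: list of (gene_a, gene_b, weight) similarity links (hypothetical)
    Uses Gauss-Seidel updates; each update is the exact minimizer for one
    gene given the current scores of its neighbors.
    """
    neighbors = {gene: [] for gene in prior}
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    f = dict(prior)
    for _ in range(n_iter):
        for gene in f:
            total_w = sum(w for _, w in neighbors[gene])
            pull = sum(w * f[other] for other, w in neighbors[gene])
            f[gene] = (prior[gene] + lam * pull) / (1.0 + lam * total_w)
    return f


# Hypothetical toy input: one human-mouse ortholog pair and one unlinked gene.
prior = {"human_A": 0.9, "mouse_A": 0.4, "human_B": 0.1}
edges = [("human_A", "mouse_A", 1.0)]
scores = smooth_scores(prior, edges)
```

In this toy input the linked pair moves toward a shared value, so a weak signal in one species is raised by a strong signal in its counterpart, while the unlinked gene keeps its prior score. Because the scores stay continuous rather than being forced into discrete labels, the strength of the evidence in each species is retained during propagation.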