CMU-ML-08-110
Machine Learning Department
School of Computer Science, Carnegie Mellon University




Computational Methods for Analyzing and
Modeling Gene Regulation Dynamics

Jason Ernst

August 2008

Ph.D. Thesis



Keywords: Dynamics, gene expression, gene regulation, microarray, regulatory networks, time course, time series, transcription, transcription factor


Gene regulation is a central biological process whose disruption can lead to many diseases. This process is largely controlled by a dynamic network of transcription factors interacting with specific genes to control their expression. Time series microarray gene expression experiments have become a widely used technique to study the dynamics of this process. This thesis introduces new computational methods designed to better utilize data from these experiments and to integrate this data with static transcription factor-gene interaction data to analyze and model the dynamics of gene regulation. The first method, STEM (Short Time-series Expression Miner), is a clustering algorithm and software specifically designed for short time series expression experiments, which represent the substantial majority of experiments in this domain. The second method, DREM (Dynamic Regulatory Events Miner), integrates transcription factor-gene interactions with time series expression data to model regulatory networks while taking into account their dynamic nature. The method uses an Input-Output Hidden Markov Model to identify bifurcation points in the time series expression data. While the method can be readily applied to some species, the coverage of experimentally determined transcription factor-gene interactions in most species is limited. To address this, we introduce two methods to improve the computational predictions of these interactions. The first of these methods, SEREND (SEmi-supervised REgulatory Network Discoverer), motivated by the species E. coli, is a semi-supervised learning method that uses verified transcription factor-gene interactions, DNA sequence binding motifs, and gene expression data to predict new interactions. We also present a method, motivated by human genomic data, that combines motif information with a probabilistic prior on transcription factor binding at each location in the organism's genome, which it infers based on a diverse set of genomic properties.
We applied these methods to yeast, E. coli, and human cells. Our methods successfully predicted interactions and pathways, many of which have been experimentally validated. Our results indicate that by explicitly addressing the temporal nature of regulatory networks we can obtain accurate models of dynamic interaction networks in the cell.

188 pages

