Computer Science Department
School of Computer Science, Carnegie Mellon University
Identifying the Signaling Cascades and Regulatory
Adaptation to diverse and ever-changing environmental conditions is vital to the survival of all organisms. From single-celled organisms reacting to changes in the chemical makeup of their surroundings to human cells fighting off infection, there are many global similarities across stress responses. In general, sensory proteins detect environmental perturbations and, via signaling cascades, alert specific transcription factors to adjust gene regulation and counteract negative effects of the stress. In this thesis, we present the challenges that arise when trying to understand such responses and propose computational methods for developing end-to-end models of stress response.
One primary goal when modeling the reaction to environmental perturbations is to determine the sensory proteins (sources) and transcription factors (targets) that form the endpoints of the directed signaling pathways. Many previous approaches rely on gene deletions for this task; however, we show that this strategy is unreliable due to widespread redundancy in transcriptional regulatory networks, which can mask the effects of a knockout. Instead, we propose to utilize condition-specific dynamic gene expression data to identify the transcription factors that control the divergence points in groups of gene expression profiles. We then construct a network of undirected physical protein interactions, the backbone of signaling pathways, and search for an optimal orientation of the network that connects the sensory proteins, which are already known in many conditions of interest, and the predicted active transcription factors.
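To make the orientation step concrete, the following is a minimal sketch of one way such a search could be formalized on a toy network. It is an illustration under stated assumptions, not necessarily the formulation used in this thesis: we assume each undirected interaction carries a confidence weight, that a source-to-target path counts only if every edge along it is oriented in the direction of travel, that a path's score is the product of its edge weights, and that paths are limited to a small number of edges. The protein names, weights, the function names `simple_paths` and `orient_network`, and the exhaustive search are all hypothetical; exhaustive enumeration is feasible only for very small networks.

```python
from itertools import product


def simple_paths(adj, source, targets, max_len):
    """Enumerate simple paths from `source` to any node in `targets`
    that use at most `max_len` edges."""
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node in targets and len(path) > 1:
            paths.append(tuple(path))
        if len(path) - 1 == max_len:
            continue  # path already uses the maximum number of edges
        for nxt in adj.get(node, []):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append((nxt, path + [nxt]))
    return paths


def orient_network(edges, sources, targets, max_len=3):
    """Brute-force search for the edge orientation that maximizes the total
    weight of source-to-target paths whose edges all point along the path.

    `edges` maps an undirected pair (u, v) to an assumed confidence weight.
    Returns (set of directed edges, score).
    """
    weight = {frozenset(e): w for e, w in edges.items()}
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    # Collect candidate paths once, together with their weights.
    scored_paths = []
    for s in sources:
        for path in simple_paths(adj, s, targets, max_len):
            steps = list(zip(path, path[1:]))
            w = 1.0
            for a, b in steps:
                w *= weight[frozenset((a, b))]
            scored_paths.append((steps, w))

    edge_list = list(edges)
    best_orientation, best_score = None, -1.0
    # Try both directions for every edge (2^|E| orientations).
    for bits in product((0, 1), repeat=len(edge_list)):
        directed = {(u, v) if bit == 0 else (v, u)
                    for (u, v), bit in zip(edge_list, bits)}
        score = sum(w for steps, w in scored_paths
                    if all(step in directed for step in steps))
        if score > best_score:
            best_orientation, best_score = directed, score
    return best_orientation, best_score


# Toy example: one sensor S, one transcription factor T, two intermediates.
toy_edges = {
    ("S", "A"): 0.9,
    ("A", "T"): 0.8,
    ("S", "B"): 0.5,
    ("B", "T"): 0.5,
    ("A", "B"): 0.7,
}
best, score = orient_network(toy_edges, sources={"S"}, targets={"T"})
print(sorted(best), round(score, 3))
```

In this toy network the search orients the ambiguous A-B interaction as A to B, because the path S, A, B, T (weight 0.315) outweighs S, B, A, T (weight 0.28). Real interaction networks are far too large for exhaustive enumeration, so a practical implementation would need approximation or local search.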
Analysis of yeast signaling pathways reveals that our predicted interaction orientations are generally consistent with known annotations but also contain novel orientations that are biologically valid. Through a detailed analysis of yeast hyperosmotic stress, we demonstrate our method's ability to construct accurate end-to-end models and identify not only the transcription factors that are active in the response, but also when they are active and how they receive messages from upstream sensors. We also discuss the challenges of scaling to human interaction networks and how to overcome them. Comparative analysis of several strains of influenza demonstrates how our models can be used to identify genes with clinical relevance in the immune response to pathogens. Lastly, we explore alternative computational models for stress response that have a global probabilistic interpretation.