CMU-CS-17-124
Computer Science Department
School of Computer Science, Carnegie Mellon University




Inferring Temporal Signaling Pathways and Regulatory Mechanisms from High-Throughput Data

Siddhartha Jain

October 2017

Ph.D. Thesis



Keywords: Genomics, computational biology, signaling networks, network inference, regulatory networks

Cells need to be able to sustain themselves, divide, and adapt to new stimuli. Proteins are key agents in regulating these processes. In all cases, cell behavior is regulated by signaling pathways and by proteins called transcription factors, which control which proteins are manufactured and in what quantities. Whenever a new stimulus arises, it can activate multiple signaling pathways by interacting with proteins on the cell surface (if it is an external stimulus) or with proteins within the cell (for example, if it is a virus). Disruption of signaling pathways can lead to a myriad of diseases, including cancer. Knowing which signaling pathways play a role in which condition is thus key to understanding how cells develop, react to environmental stimuli, and carry out their normal functions.

Recently, there has also been considerable excitement over the role that epigenetics (modifications of DNA structure that do not change the underlying sequence) may play. This excitement has been buoyed by the tremendous amount of epigenetic data now being generated. Epigenetics has been heavily implicated in transcriptional regulation. However, how epigenetic changes are themselves regulated, and how they affect transcriptional regulation, remain open questions.

In this thesis we present a suite of computational techniques focused on modeling the dynamic regulation of biological processes. These methods address the aspects of the problem described above, focusing on the reconstruction of dynamic signaling and regulatory networks. In many cases, the amount of biological data available for a specific condition is very small compared to the number of variables. We present an algorithm that uses multi-task learning to learn signaling networks from many related conditions. There are also very few tools that take temporal dynamics into account when inferring signaling networks; the thesis presents a new algorithm that uses and extends Integer Programming methods to infer such dynamic regulation. Finally, we present a new strategy for integrating epigenetic data with other temporal datasets using deep neural networks, and we use this method to reconstruct dynamic disease progression networks in Idiopathic Pulmonary Fibrosis (IPF).

145 pages

Thesis Committee:
Ziv Bar-Joseph (Chair)
Jaime Carbonell
Eric Xing
Naftali Kaminski (Yale University)

Frank Pfenning, Head, Computer Science Department
Andrew W. Moore, Dean, School of Computer Science



