Lane Center for Computational Biology
School of Computer Science, Carnegie Mellon University


Computational Study of Transcriptional Regulations
- From Sequence to Expression

Shan Zhong

March 2012

Ph.D. Thesis


Keywords: Motif finding, Transcriptional regulatory network, p53, Protein binding microarray, Tissue specificity, EIN3, Ethylene response

Transcription is the process by which RNA molecules are synthesized from a DNA template in cells. Transcription leads to gene expression and is the first step in the flow of genetic information from DNA to the proteins that carry out biological functions. Transcription is tightly regulated, both spatially and temporally and at multiple levels, so that the amount of mRNA produced from each gene is controlled across different cell types and tissues, across developmental stages, and in response to different environmental stimuli. In eukaryotes, transcription is a complex process whose regulation involves both cis-regulatory elements and trans-acting factors. By studying which genes are regulated by which cis-elements and trans-factors, and when and where this happens, we can better understand how organisms develop, how they respond to environmental signals, and the mechanisms behind diseases such as cancer that result, at least in part, from failures of proper transcriptional regulation.

In this thesis, we present a suite of computational methods and analyses that together address three related problems: identifying DNA binding motifs, linking these motifs to the transcription factors (TFs) that bind them and the genes they control, and integrating these motifs and interactions with time series expression data to model dynamic regulatory networks. First, we develop a novel method for finding discriminative DNA motifs, that is, motifs that are over-represented in a set of positive sequences but depleted in a set of negative sequences. Second, we present a new method that combines protein binding microarray data with DNase I hypersensitivity and conservation data to predict tissue-specific transcription factor activities and binding sites. Finally, we extend the DREM framework, previously developed by our group to study dynamic regulatory networks, and use the improved version to analyze a biological dataset of gene responses in Arabidopsis following ethylene treatment. Together, these methods and analyses contribute to the study and understanding of transcriptional regulation.
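To make the notion of a "discriminative" motif concrete, the following is a minimal toy sketch, not the method developed in the thesis. It assumes motifs are plain DNA k-mers (no position weight matrix, forward strand only) and scores each k-mer by the fraction of positive sequences containing it minus the fraction of negative sequences containing it. The function names and the example sequences are hypothetical and chosen only for illustration.

```python
from itertools import product


def kmer_presence_fraction(seqs, kmer):
    """Fraction of sequences containing kmer at least once (forward strand only)."""
    if not seqs:
        return 0.0
    return sum(kmer in s for s in seqs) / len(seqs)


def rank_discriminative_kmers(positives, negatives, k):
    """Score every DNA k-mer by (fraction in positives) - (fraction in negatives).

    A high score means the k-mer is over-represented in the positive set and
    depleted in the negative set. This is a toy criterion, not a motif model.
    """
    scores = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        scores[kmer] = (kmer_presence_fraction(positives, kmer)
                        - kmer_presence_fraction(negatives, kmer))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical example: positives share the core TGACGTCA, negatives do not.
positives = ["TTGACGTCAA", "GGTGACGTCC", "ATGACGTCAT"]
negatives = ["TTTTAAAACC", "GGGGCCCCAA", "ATATATGCGC"]
top = rank_discriminative_kmers(positives, negatives, k=4)
print(top[:4])  # [('ACGT', 1.0), ('CGTC', 1.0), ('GACG', 1.0), ('TGAC', 1.0)]
```

Exhaustive enumeration of all 4^k k-mers is only practical for small k; a realistic discriminative motif finder would search over degenerate or weighted motif models and use a statistically grounded objective rather than this simple difference of fractions.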

180 pages
