Computational Biology Department
School of Computer Science, Carnegie Mellon University
Algorithms for Identifying Compact Regions
Recent genome sequencing experiments allow us to observe regions of DNA that are spatially close to each other in cell nuclei. Analyses of these data have shown that the 3D structure of DNA may be closely linked to genome functions such as long-range regulation of gene expression and DNA replication. Although these experiments enable, for the first time, genome-wide analyses of chromatin structure and its relationship to genome function, the typical approaches used in these analyses either do not incorporate higher-level spatial features of chromatin, are prone to error, or are computationally prohibitive. The algorithms and analysis techniques presented in this dissertation substantially advance our understanding of the relationship between the 3D structure of DNA and genome function on the scale of the whole genome by incorporating higher-level spatial features of chromatin and by controlling for a number of confounding variables that could lead to artifactual conclusions. All of our techniques operate directly on experimentally obtained interaction data rather than on estimated three-dimensional structures, which are prone to error and expensive to compute. Specifically, we designed algorithms based on graph rigidity theory to identify regions of chromatin interactions that are sufficiently constrained for embedding in three dimensions, and we designed additional algorithms to identify subsets of constraints with metrically consistent distances. We also established that locally clustered regions of chromosomes (topological domains) are hierarchically organized, and we provided the first quantification of this organization using an efficient multiscale domain identification method that we designed. Finally, we performed two major genome-wide analyses relating three-dimensional genome structure to gene regulation.
From these analyses, we show that mutations that affect the expression of genes far away on the genome are surprisingly close to those genes in 3D and that they occur preferentially at the boundaries of topological domains. We also analyzed a novel structural feature of DNA that we call 'dense regions'. These regions occupy spatially small volumes of the nucleus but can include genomically distant parts of the genome. We find that the majority of transcription, or active gene expression, occurs within these dense regions even though they cover a significantly smaller portion of the genome. We also show that genes within these regions can change expression in concert in response to a cell signaling event. The algorithms and analysis techniques that we developed have enabled us to perform some of the first rigorous quantifications of the relationship between genome structure and gene regulation, and these techniques can be easily applied and extended for use with future experimental data.
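
As a minimal illustration of what "metrically consistent distances" can mean, the sketch below checks one necessary condition, the triangle inequality, over every triple of loci whose three pairwise distances are all known. It is not the algorithm developed in the dissertation; the function names, the dictionary-based input format, and the example distances are assumptions chosen for illustration only.

```python
from itertools import combinations


def make_distances(entries):
    """Build a symmetric distance lookup from (locus_a, locus_b, distance) tuples.

    Hypothetical helper: keys are frozensets so that (a, b) and (b, a) coincide.
    """
    return {frozenset((a, b)): d for a, b, d in entries}


def violating_triples(dist, tol=1e-9):
    """Return the triples of loci whose distances break the triangle inequality.

    Only triples with all three pairwise distances present are checked;
    triples with a missing distance are skipped, since they impose no test.
    """
    nodes = sorted({n for pair in dist for n in pair})
    bad = []
    for i, j, k in combinations(nodes, 3):
        try:
            sides = (
                dist[frozenset((i, j))],
                dist[frozenset((j, k))],
                dist[frozenset((i, k))],
            )
        except KeyError:
            continue  # at least one pairwise distance is unobserved
        longest = max(sides)
        # The longest side must not exceed the sum of the other two.
        if longest > (sum(sides) - longest) + tol:
            bad.append((i, j, k))
    return bad


# Illustrative (made-up) distances between four loci.
example = make_distances([
    ("a", "b", 1.0),
    ("b", "c", 1.0),
    ("a", "c", 3.0),  # inconsistent: 3.0 > 1.0 + 1.0
    ("c", "d", 2.0),
    ("b", "d", 2.5),
])

print(violating_triples(example))
```

Passing this check does not guarantee that the distances can be embedded in three dimensions; it only flags triples that no metric space could realize.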