CMU-CB-20-102
Ray and Stephanie Lane Computational Biology Department
School of Computer Science, Carnegie Mellon University




Algorithms for the study of chromosomal structure variability

Natalie Sauerwald

September 2020

Ph.D. Thesis

CMU-CB-20-102.pdf


Keywords: Three-dimensional structure, Algorithms, Hi-C, Topologically associating domains, Computational Biology

The last two decades have introduced several experimental methods for studying three-dimensional chromosome structure, opening up a new dimension of genomics. Studies of these new data types have shown great promise in answering open questions in gene regulation, but the experiments are indirect and imperfect measurements of the underlying structure, and therefore require rigorous computational methods. We can now study the 3D relationships between all pairs of chromosome segments across the genome, but many questions remain open: how much this structure varies between cell and tissue types, what predicts structural similarity, how this complex system behaves dynamically, and how the observed substructures should be fully defined. This dissertation presents several approaches to improve our understanding of human genomic spatial architecture. We present a new method to quantify the variability of chromosomal substructures, called topologically associating domains (TADs), between any pair of samples. This algorithm efficiently identifies all regions with statistically significantly similar TAD structures between the two samples. Using this method, we quantify the structural similarity within each chromosome, between chromosomes, and between cell types. We show that cancer cell lines are structurally disrupted at pan-cancer genes, but not globally. We perform extensive data analysis using this method and others to assess the consistency of TADs across a range of biological and technical conditions. This large-scale study of chromosomal structural variability emphasizes the differences in chromosome structure between cell and tissue types, in contrast to the belief that genome structure is highly conserved. We quantify the influence of genetic difference and similarity, as well as technical confounders, on chromosome structural similarity in a systematic study of over 100 samples.
We also apply a biophysics model to predict the dynamics of chromosomes from static data. Our predictions correlate well with several different experimental measures and known substructures. We predict the existence of long-range dynamic couplings involved in gene regulation that have not been found without a dynamic model. Finally, we develop a generalized TAD-finding algorithm that can be guided towards selecting TADs for any desired property. Defining several functions around common evaluation criteria for TADs, we explore the relationships between various biological TAD properties and the computational definitions used to identify TADs. The algorithms and analysis we have developed enable rigorous study of the basic properties of this new dimension of genomics, and can continue to inform the study of TADs as more experimental data become available.

154 pages

Thesis Committee:
Carl Kingsford (Chair)
Jian Ma
Anne-Ruxandra Carvunis (University of Pittsburgh)
William Stafford Noble (University of Washington)

Russell S. Schwartz, Head, Computational Biology Department
Martial Hebert, Dean, School of Computer Science


