Computational Biology Department
School of Computer Science, Carnegie Mellon University
Algorithms to Reconstruct Evolutionary
Salim Akhter Chowdhury
Cancer is one of the major causes of human mortality. Extensive genetic, epigenetic and physiological variations are observed within tumor cells, which complicate the diagnosis and treatment of the disease. Despite the extensive heterogeneity within single tumors, recurring features of their evolutionary processes can be observed by comparing multiple regions or cells of a tumor. Recently, phylogenetic models have begun to see widespread use in cancer research to reconstruct processes of evolution in tumor progression. Mutations that drive the development and progression of solid tumors typically include changes in the number of copies of genes or genomic regions. One particularly useful source of data for studying the likely progression of individual tumors is fluorescence in situ hybridization (FISH), which allows one to count copy numbers of several genes in hundreds of single cells per tumor and is thus especially well suited to characterizing intratumor heterogeneity. This thesis focuses primarily on phylogenetic characterization of single tumors at the cellular level from FISH data. We first develop phylogenetic methods using single gene duplication to infer likely models of tumor progression at the cellular level from FISH copy number data and apply these to a study of FISH data from two cancer types. We next extend our single gene models to include copy number changes at the scale of entire chromosomes and the whole genome. We develop new provably optimal methods for computing an edit distance between the copy number states of two cells given evolution by copy number changes of single probes, all probes on a chromosome, or all probes in the genome.
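To make the notion of an edit distance between copy number states concrete, the following is a minimal brute-force sketch, not the optimal methods developed in the thesis. It assumes details the abstract does not state: single-probe and chromosome events change counts by one, a whole-genome event doubles every count, a probe at zero copies stays at zero, and counts are capped by a hypothetical `max_copies` bound so the search space is finite. The function names and the `chrom` encoding (one chromosome label per probe) are illustrative choices.

```python
from collections import deque


def one_event_neighbors(state, chrom, max_copies):
    """Yield each distinct state reachable from `state` by a single event."""
    # Index groups affected together: each probe alone, then each chromosome.
    groups = [[i] for i in range(len(state))]
    groups += [[i for i, c in enumerate(chrom) if c == k]
               for k in sorted(set(chrom))]
    seen = set()
    for idx in groups:
        for delta in (+1, -1):
            new = list(state)
            for i in idx:
                if new[i] > 0:  # assumption: a probe at zero copies stays at zero
                    new[i] += delta
            new = tuple(new)
            if new != state and max(new) <= max_copies and new not in seen:
                seen.add(new)
                yield new
    # Whole-genome event, modeled here as doubling every count (assumption).
    doubled = tuple(2 * c for c in state)
    if doubled != state and max(doubled) <= max_copies and doubled not in seen:
        yield doubled


def edit_distance(start, target, chrom, max_copies=8):
    """Breadth-first search for the fewest events turning `start` into `target`.

    Returns None when `target` cannot be reached under the assumptions above.
    """
    start, target = tuple(start), tuple(target)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == target:
            return dist[state]
        for nxt in one_event_neighbors(state, chrom, max_copies):
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return None
```

For example, with two probes on different chromosomes, `edit_distance((2, 2), (5, 4), chrom=(0, 1))` finds a two-event path (one doubling followed by one single-probe gain). Exhaustive search of this kind is only practical for a handful of probes and small copy numbers.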
Our two proposed models for inferring phylogenies of single tumors by copy number evolution assume uniform rates of genomic gain and loss across different genomic sites and scales, a substantial oversimplification necessitated by a lack of algorithms and quantitative parameters for fitting more realistic tumor evolution models. We propose a framework for inferring models of tumor progression that allows variable rates for different gain and loss events. Application of the phylogenies inferred by our algorithms to real cervical and breast cancer data identifies key genomic events in disease progression consistent with prior literature. In classification experiments on cervical and tongue cancer datasets, models that allow non-uniform rates predict metastasis of primary cervical cancers and tongue cancer survival more accurately than models with uniform rates.
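One simple way to picture non-uniform rates is to attach a separate non-negative cost to each event type and search for the cheapest sequence of events between two copy number states. The sketch below does this with Dijkstra's algorithm. It is an illustration under stated assumptions, not the inference framework proposed in the thesis: the event names, the `cost` dictionary, the rule that a probe at zero stays at zero, the doubling model of whole-genome events, and the `max_copies` cap are all hypothetical.

```python
import heapq


def weighted_events(state, chrom, max_copies):
    """Yield (event_kind, new_state) for every single event applicable to `state`."""
    groups = [("gene", [i]) for i in range(len(state))]
    groups += [("chromosome", [i for i, c in enumerate(chrom) if c == k])
               for k in sorted(set(chrom))]
    for scale, idx in groups:
        for name, delta in (("gain", 1), ("loss", -1)):
            # Assumption: a probe at zero copies is unaffected by any event.
            new = tuple(c + delta if i in idx and c > 0 else c
                        for i, c in enumerate(state))
            if new != state and max(new) <= max_copies:
                yield f"{scale}_{name}", new
    doubled = tuple(2 * c for c in state)  # whole-genome duplication (assumption)
    if doubled != state and max(doubled) <= max_copies:
        yield "genome_duplication", doubled


def weighted_distance(start, target, chrom, cost, max_copies=8):
    """Dijkstra search for the cheapest event sequence; `cost` maps kind -> cost >= 0."""
    start, target = tuple(start), tuple(target)
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, state = heapq.heappop(heap)
        if state == target:
            return d
        if d > best[state]:
            continue  # stale heap entry; a cheaper path was already found
        for kind, nxt in weighted_events(state, chrom, max_copies):
            nd = d + cost[kind]
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return None


# Hypothetical cost tables: uniform costs versus a costly whole-genome event.
uniform = {"gene_gain": 1.0, "gene_loss": 1.0, "chromosome_gain": 1.0,
           "chromosome_loss": 1.0, "genome_duplication": 1.0}
rare_wgd = dict(uniform, genome_duplication=10.0)
```

Under `uniform`, the cheapest path from `(2, 2)` to `(4, 4)` is a single genome duplication; under `rare_wgd`, four single-probe gains become cheaper. Costs of this kind could, for instance, be taken as negative log rates, but that choice is an assumption of this sketch.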