Computational Biology Department
School of Computer Science, Carnegie Mellon University


Algorithms to Reconstruct Evolutionary
Models of Tumor Progression

Salim Akhter Chowdhury

February 2015

Ph.D. Thesis


Keywords: Tumor Progression, Tumor Evolution, Tumor Phylogenetics, Copy Number Change, Steiner Tree, Maximum Parsimony, Fluorescence in situ Hybridization, Cancer Diagnosis and Prognosis

Cancer is one of the major causes of human mortality. Extensive genetic, epigenetic and physiological variations are observed within tumor cells, which complicate the diagnosis and treatment of the disease. Despite the extensive heterogeneity within single tumors, recurring features of their evolutionary processes are observed by comparing multiple regions or cells of a tumor. Recently, phylogenetic models have begun to see widespread use in cancer research to reconstruct processes of evolution in tumor progression. Mutations that drive development and progression of solid tumors typically include changes in the number of copies of genes or genomic regions. One particularly useful source of data for studying the likely progression of individual tumors is fluorescence in situ hybridization (FISH), which allows one to count copy numbers of several genes in hundreds of single cells per tumor and is thus especially well suited to characterizing intratumor heterogeneity. This thesis focuses primarily on phylogenetic characterization of single tumors at the cellular level from FISH data. We first develop phylogenetic methods using single gene duplication to infer likely models of tumor progression at the cellular level from FISH copy number data and apply these to a study of FISH data from two cancer types. We next extend our single gene models to include copy number changes at the scale of entire chromosomes and the whole genome. We develop new provably optimal methods for computing an edit distance between the copy number states of two cells given evolution by copy number changes of single probes, all probes on a chromosome, or all probes in the genome.
Our two proposed models for inferring phylogenies of single tumors by copy number evolution assume uniform rates of genomic gain and loss across different genomic sites and scales, a substantial oversimplification necessitated by a lack of algorithms and quantitative parameters for fitting more realistic tumor evolution models. We propose a framework for inferring models of tumor progression that includes variable rates for different gain and loss events. Phylogenies inferred by our algorithms from real cervical and breast cancer data identify key genomic events in disease progression consistent with prior literature. In classification experiments on cervical and tongue cancer datasets, models that allow non-uniform rates predict metastasis of primary cervical cancers and tongue cancer survival more accurately than models with uniform rates.
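
As a rough illustration of the edit distance idea described above, the following sketch compares two hypothetical copy number profiles. It is not the thesis's provably optimal method: it is a brute-force breadth-first search over a small bounded state space. The function names, the cap max_copy, the choice of unit gains and losses, the rule that a probe at zero copies stays at zero, and the modeling of the whole-genome event as a doubling are all assumptions made for this example.

```python
from collections import deque


def l1_distance(a, b):
    """Sum of per-probe copy number differences between two profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def bfs_distance(start, goal, chromosomes, max_copy=8):
    """Brute-force minimum number of events turning `start` into `goal`.

    Assumed event types (illustrative only):
      * gain or loss of one copy of a single probe,
      * gain or loss of one copy of every probe on one chromosome,
      * doubling of every probe in the genome.
    `chromosomes` lists the probe indices that share a chromosome.
    Returns None if `goal` is unreachable within the `max_copy` cap.
    """
    start, goal = tuple(start), tuple(goal)
    # One event group per probe, plus one per chromosome.
    groups = [(i,) for i in range(len(start))]
    groups += [tuple(g) for g in chromosomes]

    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return dist[state]
        neighbors = []
        for group in groups:
            for delta in (1, -1):
                nxt = list(state)
                for i in group:
                    # Assumption: a probe with zero copies cannot change.
                    if nxt[i] > 0:
                        nxt[i] += delta
                neighbors.append(tuple(nxt))
        # Assumed whole-genome event: double every copy number.
        neighbors.append(tuple(2 * c for c in state))
        for nxt in neighbors:
            if max(nxt) <= max_copy and nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return None


if __name__ == "__main__":
    diploid = (2, 2, 2, 2)
    chromosomes = [(0, 1), (2, 3)]  # hypothetical probe layout
    print(l1_distance(diploid, (3, 3, 2, 2)))
    print(bfs_distance(diploid, (3, 3, 2, 2), chromosomes))
    print(bfs_distance(diploid, (4, 4, 4, 4), chromosomes))
```

Under these assumptions, a profile that differs from the diploid state on both probes of one chromosome costs two events when only single-probe changes are counted, but one event once chromosome-scale changes are allowed, which is why the larger-scale events can change the inferred distances.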

217 pages

Thesis Committee:
Russell Schwartz (Chair)
Dannie Durand
Carl Kingsford
Adrian Lee
Alejandro A. Schaffer (National Center for Biotechnology Information)

Robert F. Murphy, Head, Computational Biology Department
Andrew W. Moore, Dean, School of Computer Science
