CMU-CB-21-103
Ray and Stephanie Lane Computational Biology Department
School of Computer Science, Carnegie Mellon University




Genome-Driven Personalized Medicine of Cancer
via Machine Learning and Phylogenetic Models

Yifeng Tao

August 2021

Ph.D. Thesis

CMU-CB-21-103.pdf


Keywords: Interpretable Machine Learning, Deep Learning, Attention Mechanism, Cancer Genomics, Tumor Heterogeneity, Tumor Phylogenetics, Evolutionary Methods, Cancer Prognosis, Drug Response

Cancer arises from the accumulation of genomic alterations and develops into heterogeneous cell populations through an evolutionary process. Therefore, the prognoses of cancer patients, such as survival profile, metastasis, and drug response, are encoded in large-volume genomic data. We first investigate reliable inference of cancer phenotypes through well-designed interpretable machine learning models. By leveraging large-scale genomic data and external biomedical knowledge bases, we use deep learning models to accurately infer cancer phenotypes, including transcriptome expression levels, transcription factor activities, and drug resistance. We address model interpretability through techniques such as attention mechanisms, which identify driver mutations and critical biomarkers. Second, we reveal intra- and inter-tumor heterogeneity and the mechanisms of tumor progression via robust deconvolution and phylogenetic algorithms. We formulate the deconvolution of bulk tumor molecular data mathematically as a biologically inspired matrix factorization problem, and propose first a neural network and then an improved hybrid optimizer to solve the problem robustly and accurately. We develop and apply a Minimum Elastic Potential algorithm to reconstruct the evolutionary trajectory from the unmixed clones. Finally, we improve the prognostic prediction of cancer by combining machine learning and evolutionary methods. Clinicians have traditionally focused on pathological features and driver-level genomic profiles to guide treatment. However, it is possible that critical clones, rather than the bulk tumor as a whole, affect the prognoses. We explore this question by integrating evolutionary mutational features, driver-level features, and clinical features to improve the prognostic prediction of cancer. We develop an L0-regularized Cox regression model, and find that the evolutionary features account for roughly 1/3 of all the available features, depending on cancer types and sequencing techniques.

143 pages

Thesis Committee:
Russell Schwartz (Chair)
Jian Ma
Xinghua Lu (University of Pittsburgh)
Adrian V. Lee (University of Pittsburgh)

Russell Schwartz, Head, Computational Biology Department
Martial Hebert, Dean, School of Computer Science


