CMU-CB-21-103 Ray and Stephanie Lane Computational Biology Department School of Computer Science, Carnegie Mellon University
Genome-Driven Personalized Medicine of Cancer Yifeng Tao August 2021 Ph.D. Thesis
Cancer arises from the accumulation of genomic alterations and develops into heterogeneous cell populations through an evolutionary process. The prognoses of cancer patients, such as survival profile, metastasis, and drug response, are therefore encoded in large-volume genomic data. We first investigate reliable phenotype inference for cancer through well-designed, interpretable machine learning models. Leveraging large-scale genomic data and external biomedical knowledge bases, we use deep learning models to accurately infer cancer phenotypes, including transcriptome expression levels, transcription factor activities, and drug resistance. We address model interpretability through techniques such as attention mechanisms, which identify driver mutations and critical biomarkers. Second, we reveal intra- and inter-tumor heterogeneity and the mechanisms of tumor progression via robust deconvolution and phylogenetic algorithms. We formulate the deconvolution of bulk tumor molecular data as a biologically inspired matrix factorization problem, and propose first a neural network and then an improved hybrid optimizer to solve it robustly and accurately. We develop and apply a Minimum Elastic Potential algorithm to reconstruct the evolutionary trajectory of the unmixed clones. Finally, we improve the prognostic prediction of cancer by combining machine learning and evolutionary methods. Clinicians have traditionally relied on pathological features and driver-level genomic profiles to guide treatment. However, specific critical clones, rather than the bulk tumor as a whole, may determine prognosis. We explore this question by integrating evolutionary mutational features, driver-level features, and clinical features to improve the prognostic prediction of cancer.
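The deconvolution step above treats a bulk data matrix B (features x samples) as a mixture of clones: B is approximated by C F, where C (features x clones) holds nonnegative clone profiles and each column of F (clones x samples) is a vector of clone fractions that sums to one. The sketch below is a minimal illustration of that generic formulation only, minimizing ||B - C F||_F^2 subject to C >= 0 and each column of F on the probability simplex. It is not the thesis's neural network or hybrid optimizer: it swaps in plain alternating projected gradient descent, and the names `project_simplex`, `deconvolve`, and `relative_error` are hypothetical.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of a 1-D vector onto the probability simplex."""
    u = np.sort(v)[::-1]                 # sort in descending order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    cond = u - (css - 1.0) / k > 0       # true for the first rho entries
    rho = k[cond][-1]
    theta = (css[cond][-1] - 1.0) / rho  # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)


def deconvolve(B, n_clones, n_iter=500, seed=0):
    """Hypothetical sketch: factor bulk data B (features x samples) as C @ F.

    C (features x clones) is kept nonnegative; each column of F
    (clones x samples) is kept on the simplex, so it reads as the clone
    fractions of one bulk sample.
    """
    rng = np.random.default_rng(seed)
    m, n = B.shape
    C = rng.random((m, n_clones))
    F = np.apply_along_axis(project_simplex, 0, rng.random((n_clones, n)))
    for _ in range(n_iter):
        # Projected gradient step on C with step 1 / (Lipschitz constant).
        step_c = 1.0 / (np.linalg.norm(F @ F.T, 2) + 1e-12)
        C = np.maximum(C - step_c * ((C @ F - B) @ F.T), 0.0)
        # Projected gradient step on F, then project each column onto the simplex.
        step_f = 1.0 / (np.linalg.norm(C.T @ C, 2) + 1e-12)
        F = F - step_f * (C.T @ (C @ F - B))
        F = np.apply_along_axis(project_simplex, 0, F)
    return C, F


def relative_error(B, C, F):
    """Relative Frobenius reconstruction error ||B - C F|| / ||B||."""
    return np.linalg.norm(B - C @ F) / np.linalg.norm(B)


if __name__ == "__main__":
    # Synthetic check: mix 3 random "clones" into 8 bulk samples, then unmix.
    rng = np.random.default_rng(1)
    C_true = rng.random((50, 3))
    F_true = np.apply_along_axis(project_simplex, 0, rng.random((3, 8)))
    B = C_true @ F_true
    C_hat, F_hat = deconvolve(B, n_clones=3)
    print("relative error:", relative_error(B, C_hat, F_hat))
    print("fractions per sample sum to:", F_hat.sum(axis=0))
```

Each block update is a projected gradient step with step size 1/L for that block, so the reconstruction error does not increase from one iteration to the next. Like any matrix factorization, the recovered clones are identifiable only up to permutation, and a purpose-built optimizer would be needed for robustness on real, noisy data.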
We develop an L0-regularized Cox regression model and find that evolutionary features account for roughly one third of all available features, depending on cancer type and sequencing technique.
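An L0-regularized Cox model trades the Cox log partial likelihood against the number of nonzero coefficients, so it selects a sparse set of features. The abstract does not describe the thesis's solver; the sketch below assumes the standard Cox partial likelihood with the Breslow convention for ties, and minimizes the penalized objective -log PL(beta) + lam * ||beta||_0 by swapping in exhaustive best-subset search with a Newton-Raphson fit on each candidate support. That search is only feasible for a handful of features. All function names are hypothetical.

```python
import itertools

import numpy as np


def neg_log_partial_likelihood(beta, X, time, event):
    """Negative Cox log partial likelihood (Breslow convention for ties)."""
    eta = X @ beta
    total = 0.0
    for i in np.flatnonzero(event):
        at_risk = eta[time >= time[i]]          # linear predictors in i's risk set
        top = at_risk.max()                     # log-sum-exp for numerical stability
        total += eta[i] - (top + np.log(np.exp(at_risk - top).sum()))
    return -total


def l0_objective(beta, X, time, event, lam):
    """Penalized objective: -log PL(beta) + lam * (number of nonzero coefficients)."""
    return neg_log_partial_likelihood(beta, X, time, event) + lam * np.count_nonzero(beta)


def fit_cox_newton(X, time, event, n_iter=25, ridge=1e-8):
    """Unpenalized Cox fit by Newton-Raphson on the given columns of X."""
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta
        grad = np.zeros(p)
        info = np.zeros((p, p))                 # observed information matrix
        for i in np.flatnonzero(event):
            risk = time >= time[i]
            w = np.exp(eta[risk] - eta[risk].max())
            w /= w.sum()                        # risk-set weights
            Xr = X[risk]
            xbar = w @ Xr                       # weighted covariate mean over the risk set
            grad += X[i] - xbar
            info += (Xr * w[:, None]).T @ Xr - np.outer(xbar, xbar)
        step = np.linalg.solve(info + ridge * np.eye(p), grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def best_subset_cox(X, time, event, lam, max_size=None):
    """L0-penalized fit by exhaustive search over supports (small p only)."""
    p = X.shape[1]
    max_size = p if max_size is None else max_size
    best_beta = np.zeros(p)                     # start from the empty model
    best_obj = l0_objective(best_beta, X, time, event, lam)
    for size in range(1, max_size + 1):
        for support in itertools.combinations(range(p), size):
            cols = list(support)
            beta = np.zeros(p)
            beta[cols] = fit_cox_newton(X[:, cols], time, event)
            obj = l0_objective(beta, X, time, event, lam)
            if obj < best_obj:
                best_obj, best_beta = obj, beta
    return best_beta, best_obj
```

Because the penalty depends only on the support size, refitting the unpenalized model on each support and adding lam * |support| gives the L0-penalized minimum, up to convergence of the inner Newton fit. Practical L0 solvers replace the exhaustive loop with heuristics or mixed-integer optimization when there are many candidate features.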
143 pages