Machine Learning Department
School of Computer Science, Carnegie Mellon University


Learning Gene Networks Underlying Clinical Phenotypes
Under SNP Perturbations From Genome-Wide Data

Calvin McCarter

May 2019

Ph.D. Thesis


Keywords: Probabilistic graphical models, sparse learning, structure learning, convex optimization, genomics, gene networks, systems biology

Recent technologies are generating an abundance of genome sequence data and molecular and clinical phenotype data, providing an opportunity to understand the genetic architecture and molecular mechanisms underlying diseases. Previous approaches have largely focused on the co-localization of single-nucleotide polymorphisms (SNPs) associated with clinical traits and with expression traits, identified from genome-wide association studies and expression quantitative trait locus (eQTL) mapping, respectively, and have thus provided only limited capabilities for uncovering the molecular mechanisms by which SNPs influence clinical phenotypes. Here we aim to extract rich information on the functional role of trait-perturbing SNPs that goes far beyond this simple co-localization. We introduce a computational framework called PerturbNet for learning the gene network that modulates the influence of SNPs on phenotypes, using SNPs as naturally occurring perturbations of a biological system. PerturbNet uses a probabilistic graphical model to directly model both the cascade of perturbation, from SNPs to the gene network to the phenotype network, and the network at each layer of molecular and clinical phenotypes. PerturbNet learns the entire model by solving a single optimization problem with an extremely fast algorithm that can analyze human genome-wide data within a few hours. In our analysis of asthma data, for a locus that was previously implicated in asthma susceptibility but whose underlying molecular mechanism is poorly understood, PerturbNet revealed the gene network modules that mediate the influence of the SNP on asthma phenotypes. Many genes in these network modules are well supported in the literature as asthma-related.

112 pages

Thesis Committee:
Seyoung Kim (Chair)
Pradeep Ravikumar
Kathryn Roeder (Statistics/Computational Biology)
Dietrich Stephan (NeuBase Therapeutics/previously University of Pittsburgh)

Roni Rosenfeld, Head, Machine Learning Department
Tom M. Mitchell, Interim Dean, School of Computer Science
