Computer Science Department
School of Computer Science, Carnegie Mellon University
Structured Probabilistic Models of Proteins
Proteins are dynamic molecules. They flex in space, adopting many different spatial configurations while performing their function and evolve over time, changing their amino acid composition in response to changing fitness landscapes. The thesis of this dissertation is that this inherent variability of proteins can be modeled by structurally sparse representations. These sparse models can then be used to efficiently reason about the properties of the protein by the means of algorithms that exploit their sparsity.
This dissertation develops the first Probabilistic Graphical Model that models the entire protein across spatial configurations (GOBLIN). By compactly encoding and manipulating a probability distribution over an exponentially large space, and using statistical inference algorithms that exploit structural sparsity, GOBLIN is able to compute experimentally measurable properties of protein interactions quickly and accurately.
We then develop a method of learning generative models of amino acid composition of evolutionarily related protein families (GREMLIN) that captures dependencies between sequential and long-range pairs of positions in the protein. GREMLIN is vastly more accurate than existing statistical models based on Hidden Markov Models; by effectively utilizing a distributed map-reduce framework, it also presents a scalable alternative to these extant approaches.
Building on these two contributions, this dissertation develops a game-theoretic approach to drug design (GAMUT). GAMUT determines the affects of a change in the fitness landscape on the composition of the protein. GAMUT can be used to design drug cocktails that remain effective against natural possible mutant variants of the target. Towards this, GAMUT develops a novel algorithm that bounds properties of the Correlated Equilibria of Graphical Games based on outer relaxations to the marginal polytope.