CMU-ML-08-105
Machine Learning Department
School of Computer Science, Carnegie Mellon University




mStruct: Inference of Population Structure
in Light of both Genetic Admixing and Allele Mutations

Suyash Shringarpure, Eric P. Xing*

May 2008

CMU-ML-08-105.pdf


Keywords: Population structure, graphical models, variational methods, Bayesian models


Traditional methods for analyzing population structure, such as the Structure program, ignore the effect of allele mutations between the ancestral and current alleles of genetic markers. These mutations can dramatically influence the accuracy of structure estimation for current populations, and can reveal additional information about population evolution, such as the divergence time and migration history of admixed populations. We propose mStruct, an admixture of population-specific mixtures of inheritance models, that addresses structure inference and mutation estimation jointly through a hierarchical Bayesian framework, together with a variational algorithm for inference. We validated our method on synthetic data and used it to analyze the HGDP-CEPH cell line panel of microsatellites used in [1] and the HGDP SNP data used in [2]. We present a comparison of the structural maps of world populations estimated by mStruct and Structure, and report potentially interesting mutation patterns in world populations estimated by mStruct.

25 pages

*Eric Xing, addressee for all correspondence, Carnegie Mellon University

