Get access

A novel bayesian graphical model for genome-wide multi-SNP association mapping


Correspondence to: Yu Zhang, Department of Statistics, The Pennsylvania State University, 421A Thomas Building, University Park,PA 16802


Most disease association mapping algorithms are based on hypothesis testing procedures that test one variant at a time. Those methods lose power when the disease mutations are jointly tagged by multiple variants, or when gene-gene interaction exist. Nearby variants are also correlated, for which procedures ignoring the dependence between variants will inevitably produce redundant results. With a large number of variants genotyped in current genome-wide disease association studies, simultaneous multivariant association mapping algorithms are strongly desired. We present a novel Bayesian method for automatic detection of multivariant joint association in genome-wide case-control studies. Our method has improved power and specificity over existing tools. We fit a joint probabilistic model to the entire data and identify disease variants simultaneously. The method dynamically accounts for the strong linkage disequilibrium (LD) between variants. As a result, only the primary disease variants will be identified, with all secondary associations due to LD effects filtered out. Our method better pinpoints the disease variants with improved resolution. The method is also computationally efficient for genome-wide studies. When applied to a real data set of inflammatory bowel disease (IBD) containing 401,473 variants in 4,720 individuals, our method detected all previously reported IBD loci in the same data, and recovered two missed loci. We further detected two novel interchromosome interactions. The first is between STAT3 and PARD6G, and the second is between DLG5 and an intergenic region at 5p14. We further validated the two interactions in an independent study.