Evolutionary-based association analysis using haplotype data



Association studies, both family-based and population-based, can be a powerful means of detecting disease-liability alleles. To increase the information available to the test, various researchers have proposed targeting haplotypes. Because haplotypes are more numerous than the alleles at any individual locus, however, a haplotype-based test requires additional degrees of freedom, which could decrease power. An optimal strategy would focus the test on particular haplotypes or groups of haplotypes, much as is done in cladistic-based association analysis. First suggested by Templeton et al. ([1987] Genetics 117:343–351), such analyses use the evolutionary relationships among haplotypes to produce a limited set of hypothesis tests and to increase the interpretability of these tests. To use more fully the information contained in the sample and in the evolutionary relationships among haplotypes, we propose generalized linear models (GLMs) for the analysis of data from family-based and population-based studies. These models fully account for haplotype phase ambiguity and allow for covariates. The models are implemented in a software package, the Evolutionary-Based Haplotype Analysis Package (EHAP), which also supports several kinds of exploratory data analysis. These exploratory tools, which include error checking, estimation of haplotype frequencies, and cladogram construction, should facilitate the implementation of cladistic-based association analysis with haplotypes. Genet Epidemiol 25:48–58, 2003. © 2003 Wiley-Liss, Inc.
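
To make the notion of haplotype phase ambiguity concrete, the following is a minimal Python sketch of how haplotype frequencies might be estimated from unphased genotypes with a standard expectation-maximization (EM) algorithm. It is not EHAP's implementation, and the abstract does not specify the estimation method used there. The function names `compatible_pairs` and `em_haplotype_frequencies`, the 0/1/2 genotype coding, and the toy data are all assumptions made for illustration; the sketch further assumes biallelic loci, Hardy-Weinberg equilibrium, and no missing genotypes.

```python
from collections import Counter
from itertools import product


def compatible_pairs(genotype):
    """Return all unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple with one entry per biallelic locus, giving the number
    of copies (0, 1, or 2) of the allele coded as 1.
    """
    # Ordered allele assignments (haplotype 1, haplotype 2) at a single locus.
    per_locus = {0: [(0, 0)], 1: [(0, 1), (1, 0)], 2: [(1, 1)]}
    pairs = set()
    for choice in product(*(per_locus[g] for g in genotype)):
        h1 = tuple(a for a, _ in choice)
        h2 = tuple(b for _, b in choice)
        pairs.add(tuple(sorted((h1, h2))))  # drop the ordering of the pair
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg equilibrium."""
    n_loci = len(genotypes[0])
    haplotypes = list(product((0, 1), repeat=n_loci))
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    genotype_counts = Counter(genotypes)
    n_chromosomes = 2 * len(genotypes)

    for _ in range(n_iter):
        expected = dict.fromkeys(haplotypes, 0.0)
        # E-step: split each genotype over its compatible haplotype pairs,
        # weighting each pair by its probability under current frequencies.
        for genotype, n in genotype_counts.items():
            pairs = compatible_pairs(genotype)
            weights = [
                (1 if h1 == h2 else 2) * freq[h1] * freq[h2] for h1, h2 in pairs
            ]
            total_weight = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                expected[h1] += n * w / total_weight
                expected[h2] += n * w / total_weight
        # M-step: new frequencies are the expected haplotype proportions.
        new_freq = {h: c / n_chromosomes for h, c in expected.items()}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq


if __name__ == "__main__":
    # Toy two-locus data (illustrative only): the double heterozygote (1, 1)
    # is the only genotype whose phase is ambiguous.
    toy = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
    print(compatible_pairs((1, 1)))
    print(em_haplotype_frequencies(toy))
```

In the toy data, a double heterozygote could carry either haplotypes 00 and 11 or haplotypes 01 and 10. Because only 00 and 11 are observed in the unambiguous homozygotes, the EM iterations assign nearly all of the ambiguous individuals to the 00/11 configuration. A model that fully accounts for phase ambiguity would carry these pair-level weights into the association analysis rather than committing to a single best-guess phase.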