Power and Sample Size for Testing Associations of Haplotypes with Complex Traits


*Corresponding Author: Daniel J. Schaid, Ph.D., Harwick 775, Section of Biostatistics, Mayo Clinic, 200 First Street, SW, Rochester, MN 55905. Tel: 507-284-0639. Fax: 507-284-9542. E-mail: schaid@mayo.edu


Evaluation of the association of haplotypes with either quantitative traits or disease status is common practice and, in some situations, provides greater power than the evaluation of individual marker loci. The focus on haplotype analyses will increase as more single nucleotide polymorphisms (SNPs) are discovered, whether because of interest in candidate gene regions or because of interest in genome-wide association studies. However, there is little guidance on determining the sample size needed to achieve the desired power for a study, particularly when the linkage phase of the haplotypes is unknown and when a subset of tag-SNP markers is measured. There is a growing wealth of information on the distribution of haplotypes in different populations, and it is not unusual for investigators to measure genetic markers in pilot studies in order to gain knowledge of the distribution of haplotypes in the target population. Starting with this basic information on the distribution of haplotypes, we derive analytic methods to determine sample size or power to test the association of haplotypes with either a quantitative trait or disease status (e.g., a case-control study design), assuming that all subjects are unrelated. Our derivations cover both phase-known and phase-unknown haplotypes, allowing evaluation of the loss of efficiency due to unknown phase. We also extend our methods to the case in which a subset of tag-SNPs is chosen, allowing investigators to explore the impact of tag-SNPs on power. Simulations illustrate that the theoretical power predictions are quite accurate over a broad range of conditions. Our theoretical formulae should provide useful guidance when planning haplotype association studies.