• expectation-maximization;
• haplotype;
• genotype;
• linkage disequilibrium;
• resource implications


Haplotype analysis is essential to studies of the genetic factors underlying human disease, but requires a large sample size of phase-known data. Recently, directly haplotyping individuals was suggested as a means of maximizing the phase-known data from a sample. Haplotyping, however, is much more labor-intensive than indirectly inferring haplotypes from genotypes (genotyping). This study uses simulations to compare the power of each methodology to detect associations between a haplotype and a trait or disease locus under conditions of varying linkage disequilibrium. The relative power of haplotyping over genotyping in association studies increases with decreasing sample size, decreasing linkage disequilibrium, decreasing numbers of marker loci, and decreasing numbers of different haplotypes. In addition, the frequency of the haplotype of interest and the magnitude of its association with the disease affect the power. From a cost-benefit standpoint, genotyping would be favored with large multiplicative risks (relative risk of haplotype >2.5). If case numbers are limiting rather than cost, haplotyping would maximize the information obtained. At small haplotype frequencies (e.g., <0.05), haplotyping is relatively more efficient, but there is little absolute power to detect associations under either methodology. Given the much larger laboratory resources required for direct haplotyping, genotyping would probably be favored under most conditions, but this must be balanced against the unit costs associated with recruitment and phenotyping. In the context of multipurpose, prospective cohort studies (e.g., the UK Biobank study), there may be a general value in establishing a series of directly haplotyped individuals to serve as controls for a number of alternative studies. Genet Epidemiol 26:116–124, 2004. © 2004 Wiley-Liss, Inc.
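The power comparison described above can be illustrated with a minimal Monte Carlo sketch. The code below is not the study's actual simulation (which involved EM-based haplotype inference from unphased genotypes); it is a simplified, hedged example assuming phase-known haplotypes, a single haplotype of interest under a multiplicative risk model, and a 2×2 chi-square test of case-control haplotype counts. The function name and all parameter defaults are illustrative choices, not values from the paper.

```python
import random

def simulate_power(hap_freq=0.2, rel_risk=2.5, n_cases=200, n_controls=200,
                   n_sims=500, alpha_crit=3.841, seed=1):
    """Monte Carlo estimate of the power to detect a haplotype-disease
    association with a 2x2 chi-square test on haplotype counts.

    Illustrative sketch only: assumes phase-known (directly haplotyped)
    data and a multiplicative relative-risk model. `alpha_crit` is the
    chi-square critical value for 1 df at alpha = 0.05.
    """
    random.seed(seed)
    p = hap_freq
    # Haplotype frequency among cases under a multiplicative risk model
    p_case = rel_risk * p / (rel_risk * p + (1 - p))
    hits = 0
    for _ in range(n_sims):
        # Each individual contributes two phase-known haplotypes
        a = sum(random.random() < p_case for _ in range(2 * n_cases))
        b = 2 * n_cases - a               # other haplotypes in cases
        c = sum(random.random() < p for _ in range(2 * n_controls))
        d = 2 * n_controls - c            # other haplotypes in controls
        n = a + b + c + d
        num = n * (a * d - b * c) ** 2
        den = (a + b) * (c + d) * (a + c) * (b + d)
        if den and num / den > alpha_crit:
            hits += 1
    return hits / n_sims
```

Varying `hap_freq`, `rel_risk`, and the sample sizes in such a sketch reproduces the qualitative pattern the abstract reports: power rises with relative risk and sample size, and collapses at small haplotype frequencies (e.g., <0.05) regardless of methodology.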