Two-Phase Designs to Follow-Up Genome-Wide Association Signals With DNA Resequencing Studies


  • Contract grant sponsor: National Institutes of Health (NIH); Contract grant numbers: GM065450; U10-CA-37377; U10-CA-69974; U19 GM6138 (Mayo PGRN); P50 CA116201 (Mayo Clinic Breast Cancer SPORE).

Correspondence to: Daniel J. Schaid, 200 1st St., SW, Rochester, MN 55905.


Genome-wide association studies (GWAS) of complex traits have generated many association signals for single nucleotide polymorphisms (SNPs). To understand the underlying causal genetic variant(s), focused DNA resequencing of targeted genomic regions is commonly used, yet the current cost of resequencing limits the sample sizes of resequencing studies. Information from the large GWAS, such as the SNP genotypes in the targeted genomic region, can be used to guide the choice of samples for resequencing. Because the GWAS tag-SNPs are imperfect surrogates for the underlying causal variants, yet are expected to be correlated with them, a reasonable approach is a two-phase case-control design, with the GWAS serving as the first phase and the resequencing study serving as the second phase. Using stratified sampling based on both tag-SNP genotypes and case-control status, we explore the gains in power of a two-phase design relative to randomly sampling cases and controls for resequencing (i.e., ignoring tag-SNP genotypes). Simulation results show that stratified sampling based on both tag-SNP genotypes and case-control status is unlikely to have lower power than stratified sampling based only on case-control status, and can sometimes have substantially greater power. The gain in power depends on the amount of linkage disequilibrium between the tag-SNP and causal variant alleles, as well as on the effect size of the causal variant. Hence, the two-phase design provides an efficient approach to following up GWAS signals with DNA resequencing.
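To make the sampling scheme concrete, the following minimal Python sketch selects a second-phase (resequencing) sample from first-phase GWAS subjects, stratifying jointly on case-control status and tag-SNP genotype. It is an illustration under stated assumptions, not the authors' procedure: the balanced allocation across strata, the function names `make_toy_cohort` and `stratified_phase_two_sample`, and the toy cohort (1,000 cases, 1,000 controls, tag-SNP allele frequency 0.2, no true association) are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def make_toy_cohort(n_cases=1000, n_controls=1000, maf=0.2, seed=1):
    """Build hypothetical phase-one data: (subject_id, is_case, tag_genotype).

    tag_genotype counts copies of the minor tag-SNP allele (0, 1, or 2).
    Genotypes are drawn independently of case status, purely for illustration.
    """
    rng = random.Random(seed)
    cohort = []
    for i in range(n_cases + n_controls):
        is_case = i < n_cases
        genotype = sum(rng.random() < maf for _ in range(2))
        cohort.append((i, is_case, genotype))
    return cohort


def stratified_phase_two_sample(subjects, n_total, seed=0):
    """Pick subjects for resequencing, stratified by (is_case, tag_genotype).

    Assumption: a balanced allocation that aims for equal numbers per stratum.
    Strata are visited from smallest to largest, so any shortfall in a small
    stratum is redistributed to the strata that remain.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject_id, is_case, genotype in subjects:
        strata[(is_case, genotype)].append(subject_id)

    selected = {}
    remaining = n_total
    keys = sorted(strata, key=lambda k: len(strata[k]))
    for i, key in enumerate(keys):
        target = remaining // (len(keys) - i)   # equal share of what is left
        take = min(target, len(strata[key]))    # cannot exceed stratum size
        selected[key] = rng.sample(strata[key], take)
        remaining -= take
    return selected


cohort = make_toy_cohort()
phase_two = stratified_phase_two_sample(cohort, n_total=300)
for (is_case, genotype), ids in sorted(phase_two.items()):
    label = "case" if is_case else "control"
    print(f"{label:8s} genotype={genotype}: {len(ids)} selected")
```

Compared with random sampling of cases and controls, this allocation oversamples the rare tag-SNP genotype strata. Because selection then depends on the tag-SNP genotype, any downstream analysis would need to account for the stratum-specific sampling fractions (number selected divided by stratum size).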