• gene association studies;
  • two-stage designs;
  • single nucleotide polymorphisms;
  • tag SNPs


We consider two-stage case-control designs for testing associations between single nucleotide polymorphisms (SNPs) and disease, in which a subsample of subjects is used to select a panel of “tagging” SNPs that will be considered in the main study. We propose a pseudolikelihood [Pepe and Flemming, 1991: JASA 86:108–113] that combines the information from both the main study and the substudy to test the association with any polymorphism in the original set. SNP-tagging [Chapman et al., 2003: Hum Hered 56:18–31] and haplotype-tagging [Stram et al., 2003a; Hum Hered 55:27–36] approaches are compared. We show that the cost-efficiency of such a design for estimating the relative risk associated with the causal polymorphism can be considerably better than for a single-stage design, even if the causal polymorphism is not included in the tag-SNP set. We also consider the optimal selection of cases and controls in such designs and the relative efficiency for estimating the location of a causal variant in linkage disequilibrium mapping. Nevertheless, as the cost of high-volume genotyping plummets and haplotype tagging information from the International HapMap project [Gibbs et al., 2003; Nature 426:789–796] rapidly accumulates in public databases, such two-stage designs may soon become unnecessary. Genet. Epidemiol. © 2004 Wiley-Liss, Inc.