Tag SNP selection for association studies


  • Daniel O. Stram

    Corresponding author
    1. Division of Biostatistics and Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California
    • Department of Preventive Medicine, University of Southern California, 1540 Alcazar Street, Suite 220, Los Angeles, CA 90033
    Search for more papers by this author


This report describes current methods for selection of informative single nucleotide polymorphisms (SNPs) using data from a dense network of SNPs that have been genotyped in a relatively small panel of subjects. We discuss the following issues: (1) Optimal selection of SNPs based upon maximizing either the predictability of unmeasured SNPs or the predictability of SNP haplotypes as selection criteria. (2) The dependence of the performance of tag SNP selection methods upon the density of SNP markers genotyped for the purpose of haplotype discovery and tag SNP selection. (3) The likely power of case-control studies to detect the influence upon disease risk of common disease-causing variants in candidate genes in a haplotype-based analysis. We propose a quasi-empirical approach towards evaluating the power of large studies with this calculation based upon the SNP genotype and haplotype frequencies estimated in a haplotype discovery panel. In this calculation, each common SNP in turn is treated as a potential unmeasured causal variant and subjected to a correlation analysis using the remaining SNPs. We use a small portion of the HapMap ENCODE data (488 common SNPs genotyped over approximately a 500 kb region of chromosome 2) as an illustrative example of this approach towards power evaluation. © 2004 Wiley-Liss, Inc.