A flexible genome-wide bootstrap method that accounts for ranking and threshold-selection bias in GWAS interpretation and replication study design

Authors

  • Laura L. Faye,

    1. Dalla Lana School of Public Health, University of Toronto, 6th Floor, Health Sciences Building, 155 College Street, Toronto, ON, Canada M5T 3M7
    2. Samuel Lunenfeld Research Institute of Mount Sinai Hospital, 60 Murray Street, Box #18, Toronto, ON, Canada M5T 3L9
  • Lei Sun,

    1. Dalla Lana School of Public Health, University of Toronto, 6th Floor, Health Sciences Building, 155 College Street, Toronto, ON, Canada M5T 3M7
    2. Department of Statistics, University of Toronto, Sidney Smith Hall, 100 St. George St., Toronto, ON, Canada M5S 3G3
  • Apostolos Dimitromanolakis,

    1. Dalla Lana School of Public Health, University of Toronto, 6th Floor, Health Sciences Building, 155 College Street, Toronto, ON, Canada M5T 3M7
    2. Samuel Lunenfeld Research Institute of Mount Sinai Hospital, 60 Murray Street, Box #18, Toronto, ON, Canada M5T 3L9
  • Shelley B. Bull

    Corresponding author
    1. Dalla Lana School of Public Health, University of Toronto, 6th Floor, Health Sciences Building, 155 College Street, Toronto, ON, Canada M5T 3M7
    2. Samuel Lunenfeld Research Institute of Mount Sinai Hospital, 60 Murray Street, Box #18, Toronto, ON, Canada M5T 3L9

Abstract

The phenomenon known as the winner's curse is a form of selection bias that affects estimates of genetic association. In genome-wide association studies (GWAS), the bias is exacerbated by the use of stringent selection thresholds and by ranking over hundreds of thousands of single nucleotide polymorphisms (SNPs). We develop an improved multi-locus bootstrap point estimate and confidence interval, which account for both ranking- and threshold-selection bias in the presence of genome-wide SNP linkage disequilibrium structure. The bootstrap method adapts easily to various study designs and alternative test statistics, as well as to complex SNP selection criteria. The latter is demonstrated by our application to the Wellcome Trust Case Control Consortium findings, in which the selection criterion was the minimum of the p-values for the additive and genotypic genetic effect models. In contrast, existing likelihood-based bias-reduced estimators account for the selection criterion applied to an SNP as if it were the only one tested; they are therefore computationally simpler, but do not address ranking across SNPs. Our simulation studies show that the bootstrap bias-reduced estimates are usually closer to the true genetic effect than the likelihood estimates and are less variable, with a narrower confidence interval. Replication study sample size requirements computed from the bootstrap bias-reduced estimates are adequate 75–90 per cent of the time, compared with 53–60 per cent of the time for the likelihood method. The bootstrap methods are implemented in a user-friendly package able to provide point and interval estimation for both binary and quantitative phenotypes in large-scale GWAS. Copyright © 2011 John Wiley & Sons, Ltd.
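
To make the idea of a bootstrap correction for selection bias concrete, the following is a minimal sketch in Python and is not the authors' package or algorithm. It assumes a quantitative phenotype, simple per-SNP linear regression, and a selection rule of "top-ranked SNP that also exceeds a z-score threshold"; the function names `snp_effects` and `bootstrap_bias_reduced` and all parameters are hypothetical. In each bootstrap replicate, the SNP is selected in the resampled individuals and its effect is re-estimated in the individuals left out of that resample, so the average gap between the two estimates serves as a rough estimate of the selection bias. The sketch omits confidence intervals, binary phenotypes, and any explicit handling of linkage disequilibrium.

```python
import numpy as np


def snp_effects(genotypes, phenotype):
    """Per-SNP least-squares slope and z-statistic for a quantitative trait."""
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    sxx = (g ** 2).sum(axis=0)  # assumption: no SNP is monomorphic in the sample
    beta = g.T @ y / sxx
    resid = y[:, None] - g * beta
    se = np.sqrt((resid ** 2).sum(axis=0) / (len(y) - 2) / sxx)
    return beta, beta / se


def bootstrap_bias_reduced(genotypes, phenotype, z_threshold, n_boot=200, seed=0):
    """Illustrative bias-reduced estimate for the top-ranked SNP.

    Returns (index of top SNP, naive estimate, corrected estimate, bias).
    The caller is expected to check that the top SNP passed the threshold
    in the original sample; this sketch does not enforce it.
    """
    rng = np.random.default_rng(seed)
    n = len(phenotype)
    beta, z = snp_effects(genotypes, phenotype)
    top = int(np.argmax(np.abs(z)))

    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # individuals drawn with replacement
        oob = np.ones(n, dtype=bool)
        oob[idx] = False                   # individuals never drawn
        b_in, z_in = snp_effects(genotypes[idx], phenotype[idx])
        k = int(np.argmax(np.abs(z_in)))   # ranking selection
        if abs(z_in[k]) < z_threshold:     # threshold selection
            continue
        b_out, _ = snp_effects(genotypes[oob][:, [k]], phenotype[oob])
        # Sign-aligned gap between the selected and the left-out estimate.
        diffs.append(np.sign(b_in[k]) * (b_in[k] - b_out[0]))

    bias = float(np.mean(diffs)) if diffs else 0.0
    naive = float(beta[top])
    corrected = float(naive - np.sign(naive) * bias)
    return top, naive, corrected, bias
```

Calling `bootstrap_bias_reduced(G, y, z_threshold=2.0)` on an individuals-by-SNPs genotype matrix `G` (coded 0/1/2) and a phenotype vector `y` returns the naive and shrunken estimates for the top-ranked SNP. Because the correction is a plain subtraction, this crude version can over-correct when the true effect is near zero.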