Centralizing the non-central chi-square: a new method to correct for population stratification in genetic case-control association studies

Authors

  • Prakash Gorroochurn,

    Corresponding author
    1. Division of Statistical Genetics, Department of Biostatistics, Mailman School of Public Health, Columbia University
    • Department of Biostatistics, Columbia University, R620 (6th floor), 722 W 168th Street, New York, NY 10032
  • Gary A. Heiman,

    1. Department of Epidemiology, Mailman School of Public Health, Columbia University
  • Susan E. Hodge,

    1. Division of Statistical Genetics, Department of Biostatistics, Mailman School of Public Health, Columbia University
    2. Clinical-Genetic Epidemiology Unit, New York State Psychiatric Institute, New York, NY
  • David A. Greenberg

    1. Division of Statistical Genetics, Department of Biostatistics, Mailman School of Public Health, Columbia University
    2. Clinical-Genetic Epidemiology Unit, New York State Psychiatric Institute, New York, NY

Abstract

We present a new method, the δ-centralization (DC) method, to correct for population stratification (PS) in case-control association studies. DC works well even when there is considerable confounding due to PS. PS causes overdispersion in the usual chi-square statistics, which then follow non-central chi-square distributions. Other methods approach the non-centrality indirectly, but we deal with it directly, by estimating the non-centrality parameter τ itself. Specifically: (1) We define a quantity δ, a function of the relevant subpopulation parameters. We show that, for relatively large samples, δ exactly predicts the elevation of the false positive rate due to PS, when there is no true association between marker genotype and disease. (This quantity δ is quite different from Wright's F_ST and can be large even when F_ST is small.) (2) We show how to estimate δ, using a panel of unlinked “neutral” loci. (3) We then show that δ² corresponds to τ, the non-centrality parameter of the chi-square distribution. Thus, we can centralize the chi-square using our estimate of δ; this is the DC method. (4) We demonstrate, via computer simulations, that DC works well with as few as 25–30 unlinked markers, where the markers are chosen to have allele frequencies reasonably close (within ±0.1) to those at the test locus. (5) We compare DC with genomic control and show that, whereas the latter becomes overconservative when there is considerable confounding due to PS (i.e., when δ is large), DC performs well for all values of δ. Genet. Epidemiol. 2006. © 2006 Wiley-Liss, Inc.
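
To make points (1) and (3) concrete, the following minimal Python sketch shows how a given δ would inflate the false positive rate of a chi-square test, and how a test statistic could be referred to a non-central chi-square with non-centrality δ². It rests on assumptions not stated in the abstract: the test has 1 degree of freedom, δ is supplied directly (its estimation from neutral loci is not shown), and the p-value calculation is only one plausible reading of "centralizing", not necessarily the authors' exact procedure. The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def noncentral_chi2_sf_1df(x, delta):
    """Upper tail P(X > x) for X = (Z + delta)^2 with Z standard normal.

    X is a non-central chi-square with 1 df (assumed here) and
    non-centrality parameter delta**2.
    """
    root = x ** 0.5
    # P(|Z + delta| > root) = P(Z > root - delta) + P(Z < -root - delta)
    return (1.0 - _STD_NORMAL.cdf(root - delta)) + _STD_NORMAL.cdf(-root - delta)


def false_positive_rate(delta, alpha=0.05):
    """Actual type I error of a nominal-alpha 1-df chi-square test under PS."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # The usual critical value of the central chi-square is z_crit**2.
    return noncentral_chi2_sf_1df(z_crit ** 2, delta)


def centralized_p_value(chi2_stat, delta_hat):
    """P-value of an observed statistic under the non-central null (assumption)."""
    return noncentral_chi2_sf_1df(chi2_stat, delta_hat)


if __name__ == "__main__":
    for delta in (0.0, 0.5, 1.0, 2.0):
        print(f"delta={delta:.1f}  false positive rate={false_positive_rate(delta):.4f}")
```

With δ = 0 the sketch returns the nominal level, and the rate rises as δ grows, which is the overdispersion the abstract describes. Referring the statistic to the non-central distribution leaves an uninflated statistic's p-value unchanged when the estimate of δ is zero.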
