Genetic background comparison using distance-based regression, with applications in population stratification evaluation and adjustment

Authors

  • Qizhai Li,

    1. Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
    2. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  • Sholom Wacholder,

    1. Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
  • David J. Hunter,

    1. Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
    2. Program in Molecular and Genetic Epidemiology, Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts
  • Robert N. Hoover,

    1. Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
  • Stephen Chanock,

    1. Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
  • Gilles Thomas,

    1. Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
  • Kai Yu

    Corresponding author
    1. Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
    • 6120 Executive Boulevard, EPS 8040, Bethesda, MD 20892

  • This article is a US Government work and, as such, is in the public domain in the United States of America.

Abstract

Population stratification (PS) can inflate the rate of false-positive findings in genome-wide association studies (GWAS). The common approach of adjusting for a fixed number of principal components (PCs) can reduce power when the selected PCs are equally distributed in cases and controls, or when covariates already included in the association analysis, such as self-identified ethnicity or recruitment center, already capture the major axes of genetic heterogeneity. We propose a computationally efficient procedure, PC-Finder, to identify a minimal set of PCs that still permits effective correction for PS. A general pseudo F statistic, derived from a non-parametric multivariate regression model, can be used to assess whether PS exists or has been adequately corrected by a set of selected PCs. Empirical data from two GWAS conducted as part of the Cancer Genetic Markers of Susceptibility (CGEMS) project demonstrate the application of the procedure. Furthermore, simulation studies show the power advantage of the proposed procedure in GWAS over currently used PS correction strategies, particularly when the PCs with substantial genetic variation are distributed similarly in cases and controls and therefore do not induce PS. Genet. Epidemiol. 33:432–441, 2009. © 2009 Wiley-Liss, Inc.
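
The abstract does not spell out the pseudo F statistic, so the following is only a minimal sketch of the standard distance-based regression form (Gower-centered distance matrix, hat-matrix projection, permutation p-value), which may differ in detail from the statistic used by PC-Finder. The function names `pseudo_f` and `permutation_pvalue`, the choice of pairwise genetic distance, and the use of case-control status as the only predictor are assumptions made for illustration.

```python
import numpy as np


def pseudo_f(dist, x):
    """Pseudo F for regressing an n x n distance matrix on predictor(s) x.

    Assumed form: F = [tr(HGH)/(m-1)] / [tr((I-H)G(I-H))/(n-m)], where G is
    the Gower-centered matrix of -0.5 * dist**2 and H is the hat matrix of
    the design (intercept plus x).
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    x = np.asarray(x, dtype=float).reshape(n, -1)
    design = np.column_stack([np.ones(n), x])    # add an intercept column
    m = np.linalg.matrix_rank(design)
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = centering @ (-0.5 * dist ** 2) @ centering
    hat = design @ np.linalg.pinv(design)        # projection onto the design space
    resid = np.eye(n) - hat
    explained = np.trace(hat @ gower @ hat) / (m - 1)
    unexplained = np.trace(resid @ gower @ resid) / (n - m)
    return explained / unexplained


def permutation_pvalue(dist, x, n_perm=999, seed=0):
    """Permutation p-value: shuffle x across subjects, recompute pseudo F."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    observed = pseudo_f(dist, x)
    hits = sum(pseudo_f(dist, rng.permutation(x)) >= observed
               for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)
```

Under these assumptions, `dist` would hold pairwise genetic distances between subjects and `x` the case-control labels; a small p-value would suggest that genetic background differs between cases and controls. When the distance is Euclidean on a single variable, this statistic reduces to the usual ANOVA F.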
