Bias due to two-stage residual-outcome regression analysis in genetic association studies

Authors

  • Serkalem Demissie

    Corresponding author
    1. Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
    • Department of Biostatistics, Boston University School of Public Health BUMC, 801 Massachusetts Avenue, 315, Boston, MA 02118
  • L. Adrienne Cupples

    1. Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts

Abstract

Association studies of risk factors and complex diseases require careful assessment of potential confounding factors. Two-stage regression analysis, sometimes referred to as residual-outcome or adjusted-outcome analysis, has been increasingly used in association studies of single nucleotide polymorphisms (SNPs) and quantitative traits. In this analysis, a residual outcome is first calculated from a regression of the outcome variable on the covariates; the relationship between this adjusted outcome and the SNP is then evaluated by a simple linear regression of the adjusted outcome on the SNP. In this article, we examine the performance of this two-stage analysis compared with multiple linear regression (MLR) analysis. Our findings show that when a SNP and a covariate are correlated, the two-stage approach yields a biased estimate of the genotypic effect and a loss of power. The bias is always toward the null and increases with the squared correlation between the SNP and the covariate ($r^2$). For example, for $r^2 = 0$, 0.1, and 0.5, two-stage analysis results in 0, 10, and 50% attenuation of the SNP effect, respectively. As expected, MLR was always unbiased. Since individual SNPs often show little or no correlation with covariates, a two-stage analysis is expected to perform as well as MLR in many genetic studies; however, it produces considerably different results from MLR and may lead to incorrect conclusions when independent variables are highly correlated. While a useful alternative to MLR when $r^2 = 0$, the two-stage approach has serious limitations. Its use as a simple substitute for MLR should be avoided. Genet. Epidemiol. 35:592-596, 2011. © 2011 Wiley Periodicals, Inc.
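
The attenuation described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes an additive genotype coded 0/1/2, a single normally distributed covariate, and arbitrary illustrative values for the effect sizes, allele frequency, and sample size. The function name `simulate` and all of its parameters are hypothetical. Under these assumptions, the two-stage estimate should land near the true effect multiplied by (1 - r^2), while the MLR estimate should land near the true effect.

```python
import numpy as np


def simulate(r2, beta=1.0, gamma=1.0, maf=0.3, n=200_000, seed=0):
    """Return (two_stage, mlr) estimates of the SNP effect `beta`.

    All parameter values are illustrative assumptions, not taken from the paper.
    r2 is the target squared correlation between the SNP and the covariate.
    """
    rng = np.random.default_rng(seed)

    # Additive genotype (0/1/2) and a covariate whose correlation with the
    # genotype is approximately sqrt(r2).
    g = rng.binomial(2, maf, n).astype(float)
    g_std = (g - g.mean()) / g.std()
    c = np.sqrt(r2) * g_std + np.sqrt(1.0 - r2) * rng.standard_normal(n)

    # Quantitative trait depends on both the genotype and the covariate.
    y = beta * g + gamma * c + rng.standard_normal(n)

    ones = np.ones(n)

    # Stage 1: regress the outcome on the covariate and keep the residuals.
    C = np.column_stack([ones, c])
    resid = y - C @ np.linalg.lstsq(C, y, rcond=None)[0]

    # Stage 2: simple linear regression of the residual outcome on the SNP.
    G = np.column_stack([ones, g])
    two_stage = np.linalg.lstsq(G, resid, rcond=None)[0][1]

    # MLR: regress the outcome on the SNP and the covariate jointly.
    GC = np.column_stack([ones, g, c])
    mlr = np.linalg.lstsq(GC, y, rcond=None)[0][1]

    return two_stage, mlr


if __name__ == "__main__":
    for r2 in (0.0, 0.1, 0.5):
        two_stage, mlr = simulate(r2)
        print(f"r2={r2:.1f}  two-stage={two_stage:.3f}  MLR={mlr:.3f}")
```

The attenuation arises because stage 1 removes from the outcome the part of the SNP effect that is linearly predictable from the covariate, and stage 2 cannot recover it because the SNP itself was never adjusted for the covariate.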
