Accounting for haplotype uncertainty in matched association studies: A comparison of simple and flexible techniques

Authors

  • Peter Kraft,

    Corresponding author
    1. Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts
    2. Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts
    • 665 Huntington Avenue, Building 2, Room 109, Boston, MA 02115
  • David G. Cox,

    1. Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts
  • Randi A. Paynter,

    1. Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts
  • David Hunter,

    1. Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts
    2. Department of Nutrition, Harvard School of Public Health, Boston, Massachusetts
    3. Channing Laboratory, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
  • Immaculata De Vivo

    1. Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts
    2. Channing Laboratory, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts

Abstract

Population-based case-control studies measuring associations between haplotypes of single nucleotide polymorphisms (SNPs) and disease are increasingly popular, in part because haplotypes of a few “tagging” SNPs may serve as surrogates for variation in relatively large sections of the genome. Due to current technological limitations, haplotypes in cases and controls must be inferred from unphased genotypic data. Using individual-specific inferred haplotypes as covariates in standard epidemiologic analyses (e.g., conditional logistic regression) is an attractive analysis strategy, as it allows adjustment for nongenetic covariates, provides omnibus and haplotype-specific tests of association, and can estimate haplotype and haplotype × environment interaction effects. In principle, some adjustment for the uncertainty in inferred haplotypes should be made. Via simulation, we compare the performance (bias and mean squared error of haplotype and haplotype × environment interaction effect estimates) of several analytic strategies using inferred haplotypes in the context of matched case-control data. These strategies include using only the most likely haplotype assignment, the expectation substitution approach described by Stram et al. ([2003b] Hum. Hered. 55:179–190) and others, and an improper version of multiple imputation. For relatively uncomplicated haplotype structures and moderate haplotype relative risks (≤2), all methods performed comparably well (small bias with appropriately sized confidence intervals). For larger relative risks, the most likely haplotype and multiple imputation strategies showed noticeable bias towards the null; the expectation substitution strategy still performed well. When there was more uncertainty in the inferred haplotypes, the most likely and multiple imputation strategies showed even more bias towards the null, while the expectation substitution method had slightly smaller than nominal confidence intervals for larger relative risks (≥5).
An application to progesterone-receptor haplotypes and endometrial cancer further illustrates that the performance of all these methods depends on how well the observed haplotypes “tag” the unobserved causal variant. Genet. Epidemiol. © 2005 Wiley-Liss, Inc.
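
As a rough illustration of how the three strategies differ in the haplotype covariate they pass to the regression model, the following minimal sketch computes a per-subject haplotype count under each one. It is not the authors' implementation: the function names, the posterior probabilities, and the two-SNP haplotype labels are all hypothetical, and the posterior over haplotype pairs is assumed to come from some external phasing step.

```python
import random

def expected_dosage(diplotype_probs, haplotype):
    """Expectation substitution: posterior-weighted copy count of `haplotype`."""
    return sum(p * pair.count(haplotype) for pair, p in diplotype_probs.items())

def most_likely_dosage(diplotype_probs, haplotype):
    """Most likely assignment: copy count in the single highest-probability pair."""
    best_pair = max(diplotype_probs, key=diplotype_probs.get)
    return best_pair.count(haplotype)

def imputed_dosage(diplotype_probs, haplotype, rng):
    """One imputation draw: sample a haplotype pair from the posterior.

    Drawing from a fixed posterior (ignoring uncertainty in haplotype
    frequencies) is what makes this an assumed 'improper' imputation.
    """
    pairs = list(diplotype_probs)
    weights = [diplotype_probs[pair] for pair in pairs]
    drawn = rng.choices(pairs, weights=weights, k=1)[0]
    return drawn.count(haplotype)

# Hypothetical posterior for one subject heterozygous at both of two SNPs:
# the unphased genotype is compatible with two haplotype pairs.
posterior = {("AB", "ab"): 0.7, ("Ab", "aB"): 0.3}

rng = random.Random(0)  # seeded only to make the sketch reproducible
dosage_expected = expected_dosage(posterior, "AB")      # fractional count
dosage_best = most_likely_dosage(posterior, "AB")       # integer count
dosage_draws = [imputed_dosage(posterior, "AB", rng) for _ in range(5)]
```

In this sketch the expectation substitution covariate is fractional and carries the phase uncertainty, while the most likely assignment rounds it away; the imputation draws vary between data sets and would each be analyzed separately before combining estimates.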
