Accounting for haplotype uncertainty in matched association studies: A comparison of simple and flexible techniques
Article first published online: 6 JAN 2005
© 2005 Wiley-Liss, Inc.
Volume 28, Issue 3, pages 261–272, April 2005
How to Cite
Kraft, P., Cox, D. G., Paynter, R. A., Hunter, D. and De Vivo, I. (2005), Accounting for haplotype uncertainty in matched association studies: A comparison of simple and flexible techniques. Genet. Epidemiol., 28: 261–272. doi: 10.1002/gepi.20061
- Issue published online: 9 MAR 2005
- Manuscript Accepted: 11 OCT 2004
- Manuscript Received: 17 FEB 2004
- population-based matched case-control data
- gene-environment interaction
Population-based case-control studies measuring associations between haplotypes of single nucleotide polymorphisms (SNPs) are increasingly popular, in part because haplotypes of a few “tagging” SNPs may serve as surrogates for variation in relatively large sections of the genome. Because of current technological limitations, haplotypes in cases and controls must be inferred from unphased genotypic data. Using individual-specific inferred haplotypes as covariates in standard epidemiologic analyses (e.g., conditional logistic regression) is an attractive analysis strategy: it allows adjustment for nongenetic covariates, provides omnibus and haplotype-specific tests of association, and can estimate haplotype and haplotype × environment interaction effects. In principle, some adjustment should be made for the uncertainty in the inferred haplotypes. Via simulation, we compare the performance (bias and mean squared error of haplotype and haplotype × environment interaction effect estimates) of several analytic strategies that use inferred haplotypes in the context of matched case-control data. These strategies include using only the most likely haplotype assignment, the expectation substitution approach described by Stram et al. ([2003b] Hum. Hered. 55:179–190) and others, and an improper version of multiple imputation. For relatively uncomplicated haplotype structures and moderate haplotype relative risks (≤2), all methods performed comparably well (small bias with appropriately sized confidence intervals). For larger relative risks, the most likely haplotype and multiple imputation strategies showed noticeable bias towards the null; the expectation substitution strategy still performed well. When there was more uncertainty in the inferred haplotypes, the most likely haplotype and multiple imputation strategies showed even more bias towards the null, while the expectation substitution method had slightly smaller than nominal confidence intervals for larger relative risks (≥5).
An application to progesterone-receptor haplotypes and endometrial cancer further illustrates that the performance of all these methods depends on how well the observed haplotypes “tag” the unobserved causal variant. Genet. Epidemiol. © 2005 Wiley-Liss, Inc.
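The three strategies compared above differ only in how each individual's haplotype covariate is built from the posterior distribution over haplotype pairs given the unphased genotype. The following is a minimal sketch of that step, not the authors' implementation. The function names, the haplotype labels, and the posterior probabilities are all hypothetical, and the imputation draw holds the posterior fixed, which is one common sense of "improper"; the abstract does not spell out the exact procedure used in the article.

```python
import random


def count_copies(diplotype, target):
    """Number of copies (0, 1, or 2) of `target` in a haplotype pair."""
    return sum(1 for hap in diplotype if hap == target)


def most_likely_dosage(posterior, target):
    """Copies of `target` in the single most probable haplotype pair."""
    best_pair = max(posterior, key=posterior.get)
    return count_copies(best_pair, target)


def expected_dosage(posterior, target):
    """Expectation substitution: posterior mean number of copies of `target`."""
    return sum(prob * count_copies(pair, target)
               for pair, prob in posterior.items())


def imputed_dosages(posterior, target, n_draws, rng):
    """Sample haplotype pairs from a fixed posterior (assumed 'improper' draw)."""
    pairs = list(posterior)
    weights = [posterior[pair] for pair in pairs]
    draws = rng.choices(pairs, weights=weights, k=n_draws)
    return [count_copies(pair, target) for pair in draws]


# Hypothetical posterior for one individual heterozygous at two SNPs:
# the unphased genotype is compatible with two haplotype pairs.
posterior = {("AB", "ab"): 0.7, ("Ab", "aB"): 0.3}

ml = most_likely_dosage(posterior, "AB")       # uses only the top pair
es = expected_dosage(posterior, "AB")          # weights every compatible pair
mi = imputed_dosages(posterior, "AB", 5, random.Random(0))  # one value per imputed data set
```

In this sketch the most likely assignment treats the individual as carrying exactly one copy of the hypothetical haplotype `AB`, while expectation substitution uses the fractional dosage 0.7. Each of these covariates would then enter the conditional logistic regression; for multiple imputation, the model is fit once per imputed data set and the estimates are combined.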