Maximum likelihood estimation of haplotype effects and haplotype-environment interactions in association studies



The associations between haplotypes and disease phenotypes offer valuable clues about the genetic determinants of complex diseases. Making statistical inferences about these associations is highly challenging because gametic phase is unknown in genotype data. For example, a subject who is heterozygous at two loci may carry either of two different haplotype pairs. We describe a general likelihood-based approach to inferring haplotype-disease associations in studies of unrelated individuals. We consider all possible phenotypes (including disease indicator, quantitative trait, and potentially censored age at onset of disease) and all commonly used study designs (including cross-sectional, case-control, cohort, nested case-control, and case-cohort). The effects of haplotypes on phenotype are characterized by appropriate regression models, which accommodate various genetic mechanisms and gene-environment interactions. We present the likelihood functions for all study designs and disease phenotypes under Hardy-Weinberg disequilibrium. The corresponding maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. We provide simple and efficient numerical algorithms to calculate the maximum likelihood estimators and their variances, and implement these algorithms in a freely available computer program. Extensive simulation studies demonstrate that the proposed methods perform well in realistic situations. An application to the Carolina Breast Cancer Study reveals significant haplotype effects and haplotype-smoking interactions in the development of breast cancer. Genet. Epidemiol. 2005. © 2005 Wiley-Liss, Inc.
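Why unknown phase complicates the likelihood can be shown with a minimal sketch. Each unphased genotype contributes a sum over all haplotype pairs that are compatible with it. The sketch makes several simplifying assumptions that are not part of the approach described above. It uses biallelic SNPs coded as minor-allele counts. It assumes Hardy-Weinberg equilibrium, whereas the abstract works under Hardy-Weinberg disequilibrium. It includes no phenotype, regression model, or covariates. Haplotype frequencies are estimated with the textbook EM algorithm, a swapped-in standard technique rather than the authors' estimator or program. The function names (`compatible_pairs`, `pair_prob`, `log_likelihood`, `em_haplotype_freqs`) are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

# Illustrative sketch only: assumes biallelic SNPs, Hardy-Weinberg equilibrium,
# and no phenotype or covariates. All function names are hypothetical.


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with an unphased genotype.

    `genotype` is a tuple of minor-allele counts (0, 1 or 2) per SNP;
    haplotypes are tuples of 0/1 alleles.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed on both haplotypes (0 -> 0, 2 -> 1).
    base = [g // 2 for g in genotype]
    pairs = set()
    for assign in product((0, 1), repeat=len(het_sites)):
        h, k = list(base), list(base)
        for site, allele in zip(het_sites, assign):
            h[site], k[site] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h), tuple(k)))))
    return sorted(pairs)


def pair_prob(pair, freq):
    """HWE probability of an unordered haplotype pair: pi_h^2 or 2*pi_h*pi_k."""
    h, k = pair
    return freq[h] ** 2 if h == k else 2.0 * freq[h] * freq[k]


def log_likelihood(genotypes, freq):
    """Observed-data log-likelihood: each genotype sums over its compatible pairs."""
    return sum(
        math.log(sum(pair_prob(p, freq) for p in compatible_pairs(g)))
        for g in genotypes
    )


def em_haplotype_freqs(genotypes, max_iter=500, tol=1e-10):
    """Estimate haplotype frequencies by EM, treating phase as missing data."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    n_haps = 2 * len(genotypes)
    for _ in range(max_iter):
        # E-step: expected haplotype counts given the current frequencies.
        counts = defaultdict(float)
        for pairs in pair_lists:
            weights = [pair_prob(pair, freq) for pair in pairs]
            total = sum(weights)
            for (h, k), w in zip(pairs, weights):
                counts[h] += w / total
                counts[k] += w / total
        # M-step: relative frequencies of the expected counts.
        new_freq = {h: counts[h] / n_haps for h in haps}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < tol:
            break
    return freq


if __name__ == "__main__":
    data = [(0, 0)] * 5 + [(2, 2)] * 5 + [(1, 1)] * 4
    est = em_haplotype_freqs(data)
    for hap, p in sorted(est.items()):
        print(hap, round(p, 4))
    print("log-likelihood:", round(log_likelihood(data, est), 4))
```

In the usage example, the four double heterozygotes are phase-ambiguous. They could carry (0,0)/(1,1) or (0,1)/(1,0). Because the phase-known subjects carry only (0,0) and (1,1), EM assigns the ambiguous subjects almost entirely to the first phase. Extending this toward the setting in the abstract would mean multiplying each pair probability by a phenotype model given the haplotypes and covariates, and relaxing the equilibrium assumption. That extension is not shown here.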