Conditional Likelihood Methods for Haplotype-Based Association Analysis Using Matched Case–Control Data


  • Jinbo Chen,

    Corresponding author
    1. Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 6120 Executive Boulevard, Rockville, Maryland 20852, U.S.A.
  • Carmen Rodriguez

    1. Department of Epidemiology and Surveillance Research, American Cancer Society, Atlanta, Georgia 30329-4251, U.S.A.


Summary

Genetic epidemiologists routinely assess disease susceptibility in relation to haplotypes, that is, combinations of alleles on a single chromosome. We study statistical methods for inferring haplotype-related disease risk using single nucleotide polymorphism (SNP) genotype data from matched case–control studies, where controls are individually matched to cases on some selected factors. Assuming a logistic regression model for haplotype–disease association, we propose two conditional likelihood approaches that address the issue that haplotypes cannot be inferred with certainty from SNP genotype data (phase ambiguity). One approach is based on the likelihood of disease status conditioned on the total number of cases, genotypes, and other covariates within each matching stratum. The other is based on the joint likelihood of disease status and genotypes conditioned only on the total number of cases and other covariates. The joint-likelihood approach is generally more efficient, particularly for assessing haplotype–environment interactions. Simulation studies demonstrated that the first approach was more robust to model assumptions on the diplotype distribution conditioned on environmental risk variables and matching factors in the control population. We applied the two methods to analyze a matched case–control study of prostate cancer.
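To make the first (genotype-conditional) approach concrete, the sketch below computes a simplified conditional log-likelihood for 1:M matched sets under phase ambiguity. It relies on a standard identity. If the log-odds of disease given diplotype d in stratum s is α_s + β·Z(d), then the odds given the observed genotype G are exp(α_s) · E[exp(β·Z(d)) | G, D = 0]. The stratum intercept α_s therefore cancels from each matched set's conditional likelihood. This is an illustrative sketch, not the authors' implementation. It assumes, loudly: Hardy–Weinberg equilibrium with known control haplotype frequencies that do not depend on covariates or matching factors (exactly the kind of diplotype-distribution assumption the Summary refers to), one target haplotype coded additively, no environmental covariates, and exactly one case per matched set. All function names are hypothetical.

```python
import itertools
import math


def haplotypes(n_snps):
    """All 2**n_snps haplotypes, each a tuple of 0/1 alleles (biallelic SNPs assumed)."""
    return list(itertools.product((0, 1), repeat=n_snps))


def compatible_diplotypes(genotype, haps):
    """Unordered haplotype pairs (h, k) whose allele counts add up to the genotype."""
    out = []
    for i, h in enumerate(haps):
        for k in haps[i:]:
            if all(a + b == g for a, b, g in zip(h, k, genotype)):
                out.append((h, k))
    return out


def control_diplotype_prob(h, k, freqs):
    """P(diplotype | control) under HWE with known control haplotype frequencies (assumption)."""
    p = freqs[h] * freqs[k]
    return p if h == k else 2.0 * p


def odds_weight(genotype, beta, target, freqs, haps):
    """Genotype-level odds factor E[exp(beta * Z(d)) | G, D=0], Z = copies of target haplotype."""
    num = den = 0.0
    for h, k in compatible_diplotypes(genotype, haps):
        p = control_diplotype_prob(h, k, freqs)
        z = (h == target) + (k == target)  # additive coding: 0, 1 or 2 copies
        num += p * math.exp(beta * z)
        den += p
    return num / den


def conditional_loglik(matched_sets, beta, target, freqs):
    """Sum over 1:M matched sets of log[w(G_case) / sum_j w(G_j)]; the case is listed first."""
    haps = haplotypes(len(target))
    ll = 0.0
    for genotypes in matched_sets:
        w = [odds_weight(g, beta, target, freqs, haps) for g in genotypes]
        ll += math.log(w[0] / sum(w))
    return ll


if __name__ == "__main__":
    # Hypothetical two-SNP example with equal haplotype frequencies in controls.
    freqs = {h: 0.25 for h in haplotypes(2)}
    # One hypothetical 1:2 matched set; genotypes are minor-allele counts per SNP.
    sets = [[(2, 2), (0, 0), (1, 1)]]
    print(conditional_loglik(sets, 0.5, (1, 1), freqs))
```

The double heterozygote (1, 1) is where phase ambiguity matters. It is compatible with both {00, 11} and {01, 10}, so its odds factor is a frequency-weighted average over those diplotypes rather than a single exp(β·Z). In practice β would be estimated by maximizing this function numerically. Relaxing the fixed-frequency assumption is what distinguishes the paper's actual methods from this toy version.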