Measuring and partitioning the high-order linkage disequilibrium by multiple order Markov chains

Authors

  • Yunjung Kim

    1. Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina
    2. Department of Statistics, North Carolina State University, Raleigh, North Carolina
  • Sheng Feng

    1. Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina
    2. Department of Statistics, North Carolina State University, Raleigh, North Carolina
    Current affiliation:
    1. Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC
  • Zhao-Bang Zeng

    Corresponding author
    1. Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina
    2. Department of Statistics, North Carolina State University, Raleigh, North Carolina
    3. Department of Genetics, North Carolina State University, Raleigh, North Carolina
    • Zhao-Bang Zeng, Bioinformatics Research Center, Department of Statistics, North Carolina State University, Raleigh, NC 27695-7566

Abstract

A map of the background levels of disequilibrium between nearby markers can be useful for association mapping studies. To assess the background levels of linkage disequilibrium (LD), multilocus LD measures are more advantageous than pairwise LD measures because the combined analysis of pairwise LD measures is not adequate to detect simultaneous allele associations among multiple markers. Various multilocus LD measures based on haplotypes have been proposed. However, most of these measures provide a single index of association among multiple markers and do not reveal the complex patterns and different levels of LD structure. In this paper, we employ non-homogeneous, multiple order Markov Chain models as a statistical framework to measure and partition the LD among multiple markers into components due to different orders of marker associations. Using a sliding window of multiple markers on phased haplotype data, we compute corresponding likelihoods for different Markov Chain (MC) orders in each window. The log-likelihood difference between the lowest MC order model (MC0) and the highest MC order model in each window is used as a measure of the total LD, or the overall deviation from gametic equilibrium, for the window. Then, we partition the total LD into lower order disequilibria and estimate the effects from two-, three-, and higher order disequilibria. The relationship between different orders of LD and the log-likelihood difference between two MC models of different orders is explored. By applying our method to the phased haplotype data in the ENCODE regions of the HapMap project, we are able to identify high/low multilocus LD regions. Our results reveal that most of the LD in the HapMap data is attributable to the LD between adjacent pairs of markers across the whole region. LD between adjacent pairs of markers appears to be more significant in high multilocus LD regions than in low multilocus LD regions.
We also find that as the multilocus total LD increases, the effects of high-order LD tend to get weaker due to the lack of observed multilocus haplotypes. The overall estimates of first, second, third, and fourth order LD across the ENCODE regions are 64, 23, 9, and 3%, respectively. Genet. Epidemiol. 2008. © 2008 Wiley-Liss, Inc.
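
The windowed likelihood comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that phased haplotypes are given as equal-length strings of allele codes, that each non-homogeneous MC model is fitted by maximum likelihood using empirical, position-specific conditional frequencies, and that the highest order in a window of m markers is m - 1 (the saturated model). The function names `mc_loglik`, `partition_ld`, and `sliding_windows` are hypothetical, and natural logarithms are used.

```python
from collections import Counter
from math import log


def mc_loglik(haplotypes, order):
    """Maximized log-likelihood of a non-homogeneous Markov chain of the
    given order, using position-specific empirical conditional frequencies
    (an assumed estimator, not necessarily the one used in the paper)."""
    n_markers = len(haplotypes[0])
    total = 0.0
    for j in range(n_markers):
        start = max(0, j - order)  # context is at most `order` preceding markers
        context_counts = Counter(h[start:j] for h in haplotypes)
        joint_counts = Counter(h[start:j + 1] for h in haplotypes)
        for key, count in joint_counts.items():
            # count * log P(allele at j | preceding context)
            total += count * log(count / context_counts[key[:-1]])
    return total


def partition_ld(haplotypes):
    """Return (total, components) for one window.

    total      = logL(MC_{m-1}) - logL(MC0)
    components = [logL(MC_k) - logL(MC_{k-1}) for k = 1 .. m-1]
    The components sum to the total by construction."""
    m = len(haplotypes[0])
    ll = [mc_loglik(haplotypes, k) for k in range(m)]
    total = ll[-1] - ll[0]
    components = [ll[k] - ll[k - 1] for k in range(1, m)]
    return total, components


def sliding_windows(haplotypes, width):
    """Yield (start, total, components) for each window of `width` markers."""
    n_markers = len(haplotypes[0])
    for start in range(n_markers - width + 1):
        window = [h[start:start + width] for h in haplotypes]
        total, components = partition_ld(window)
        yield start, total, components


if __name__ == "__main__":
    # Toy data: alleles at markers 1 and 2 are independent, and marker 3
    # is fully determined by the first two jointly but by neither alone.
    toy = ["000", "011", "101", "110"]
    total, components = partition_ld(toy)
    print(total, components)
```

In the toy data, every pair of markers is in equilibrium, so the first-order component is zero and the whole deviation from gametic equilibrium falls in the second-order component. This is the kind of association that a combined analysis of pairwise measures would miss.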
