Fine mapping of disease genes via haplotype clustering

Authors

  • E.R.B. Waldron,

    Corresponding author
    1. Department of Epidemiology and Public Health, Imperial College London, London, United Kingdom
    • Department of Epidemiology and Public Health, Imperial College London, St. Mary's Campus, Norfolk Place, London W2 1PG UK
  • J.C. Whittaker,

    1. Department of Epidemiology and Public Health, Imperial College London, London, United Kingdom
    2. Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom
  • D.J. Balding

    1. Department of Epidemiology and Public Health, Imperial College London, London, United Kingdom

Abstract

We propose an algorithm for analysing SNP-based population association studies, which is a development of that introduced by Molitor et al. [2003: Am J Hum Genet 73:1368–1384]. It uses clustering of haplotypes to overcome the major limitations of many current haplotype-based approaches. We define a between-haplotype score that is simple, yet appears to capture much of the information about evolutionary relatedness of the haplotypes in the vicinity of an (unobserved) putative causal locus. Haplotype clusters can then be defined via a putative ancestral haplotype and a cut-off distance. The number of an individual's two haplotypes that lie within the cluster predicts the individual's genotype at the causal locus. This predicted genotype can then be investigated for association with the phenotype of interest. We implement our approach within a Markov-chain Monte Carlo algorithm that, in effect, searches over locations and ancestral haplotypes to identify large, case-rich clusters. The algorithm successfully fine-maps a causal mutation in a test analysis using real data, and achieves almost 98% accuracy in predicting the genotype at the causal locus. A simulation study indicates that the new algorithm is substantially superior to alternative approaches, and it also allows us to identify situations in which multi-point approaches can substantially improve over single-SNP analyses. Our algorithm runs quickly and there is scope for extension to a wide range of disease models and genomic scales. Genet. Epidemiol. 2006. © 2005 Wiley-Liss, Inc.
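
The cluster-membership step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the between-haplotype score, so the sketch assumes one plausible choice: the number of consecutive matching SNP alleles on either side of the putative locus, converted to a distance by subtracting it from the number of markers. The names `shared_length`, `distance`, and `predicted_genotype`, the 0/1 allele coding, and the example values are all hypothetical and are not taken from the paper. The MCMC search over locations and ancestral haplotypes is not shown.

```python
from typing import Sequence

Haplotype = Sequence[int]  # assumed coding: one 0/1 allele per SNP marker


def shared_length(h1: Haplotype, h2: Haplotype, locus: int) -> int:
    """Count consecutive matching markers around a putative locus.

    The locus is assumed to lie between marker index locus-1 and locus.
    This score is an illustrative assumption, not the paper's definition.
    """
    left = 0
    i = locus - 1
    while i >= 0 and h1[i] == h2[i]:  # extend leftwards while alleles agree
        left += 1
        i -= 1
    right = 0
    j = locus
    while j < len(h1) and h1[j] == h2[j]:  # extend rightwards while alleles agree
        right += 1
        j += 1
    return left + right


def distance(h: Haplotype, ancestral: Haplotype, locus: int) -> int:
    """Assumed distance: markers not covered by the shared segment."""
    return len(ancestral) - shared_length(h, ancestral, locus)


def predicted_genotype(pair: Sequence[Haplotype], ancestral: Haplotype,
                       locus: int, cutoff: int) -> int:
    """Number (0, 1 or 2) of an individual's haplotypes inside the cluster."""
    return sum(distance(h, ancestral, locus) <= cutoff for h in pair)


# Hypothetical example: six SNPs, putative locus between markers 2 and 3.
ancestral = [0, 1, 1, 0, 1, 0]
hap_a = [0, 1, 1, 0, 1, 1]  # shares five markers around the locus
hap_b = [1, 0, 1, 1, 0, 0]  # shares only one marker around the locus
genotype = predicted_genotype([hap_a, hap_b], ancestral, locus=3, cutoff=2)
```

In this sketch the predicted genotype for each individual would then be tested for association with case or control status, as the abstract describes.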
