Haplotype reconstruction for scnp DNA: a consensus vote approach with extensive sequence data from populations of the migratory locust (Locusta migratoria)

Authors

  • ZU-SHI HUANG,

    1. State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China,
    2. Graduate University of the Chinese Academy of Sciences, Beijing 100049, China,
  • YA-JIE JI,

    1. State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China,
  • DE-XING ZHANG

    1. State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China,
    2. Center for Computational and Evolutionary Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China

De-Xing Zhang, Institute of Zoology, Chinese Academy of Sciences, Datun Road, Chaoyang District, Beijing 100101, China. Fax: (+86) 10 64807232; E-mail: dxzhang@ioz.ac.cn

Abstract

Single copy nuclear polymorphic (scnp) DNA is potentially a powerful molecular marker for evolutionary studies of populations. However, a practical obstacle to its employment is the general problem of haplotype determination due to the common occurrence of heterozygosity in diploid organisms. We explore here a ‘consensus vote’ (CV) approach to this question, combining statistical haplotype reconstruction and experimental verification using as an example an indel-free scnp DNA marker from the flanking region of a microsatellite locus of the migratory locust. The raw data comprise 251-bp sequences from 526 locust individuals (1052 chromosomes), with 71 (28.3%) polymorphic nucleotide sites (including seven triallelic sites) and 141 distinct genotypes (with frequencies ranging from 0.2 to 25.5%). Six representative statistical haplotype reconstruction algorithms are employed in our CV approach, including one parsimony method, two expectation–maximization (EM) methods and three Bayesian methods. The phases of 116 ambiguous individuals inferred by this approach are verified by molecular cloning experiments. We demonstrate the effectiveness of the CV approach compared to inferences based on individual statistical algorithms. First, it has the unique power to partition the inferrals into a reliable group and an uncertain group, thereby allowing the identification of the inferrals with greater uncertainty (12.7% of the total sample in this case). This considerably reduces subsequent efforts of experimental verification. Second, this approach is capable of handling genotype data pooled from many geographical populations, thus tolerating heterogeneity of genetic diversity among populations. Third, the performance of the CV approach is not influenced by the number of heterozygous sites in the ambiguous genotypes. Therefore, the CV approach is potentially a reliable strategy for effective haplotype determination of nuclear DNA markers. Our results also show that rare variations and rare inferrals tend to be more vulnerable to inference error, and hence deserve extra surveillance.
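
The partitioning step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact voting rule, so the sketch assumes that an individual falls in the reliable group only when every algorithm infers the same unordered haplotype pair; the threshold `min_agree`, the function names, the algorithm labels and the three-site toy haplotypes are all hypothetical and are not taken from the study.

```python
from collections import Counter


def normalize(pair):
    """Treat a haplotype pair as unordered: (a, b) and (b, a) are the same phase call."""
    return tuple(sorted(pair))


def consensus_vote(calls, min_agree=None):
    """Partition individuals into a reliable and an uncertain group.

    calls: {algorithm_name: {individual_id: (haplotype_1, haplotype_2)}}
    min_agree: number of algorithms that must agree on the same pair.
               ASSUMPTION: defaults to unanimity; the source does not
               specify the rule actually used.
    """
    algorithms = list(calls)
    if min_agree is None:
        min_agree = len(algorithms)

    # Only individuals phased by every algorithm can be voted on.
    individuals = set.intersection(*(set(calls[a]) for a in algorithms))

    reliable, uncertain = {}, {}
    for ind in sorted(individuals):
        votes = Counter(normalize(calls[a][ind]) for a in algorithms)
        pair, count = votes.most_common(1)[0]
        if count >= min_agree:
            reliable[ind] = pair          # consensus phase
        else:
            uncertain[ind] = dict(votes)  # keep all candidates for cloning
    return reliable, uncertain


# Hypothetical toy input: three algorithms, three individuals, 3-bp haplotypes.
calls = {
    "parsimony": {"ind1": ("ACG", "ATG"), "ind2": ("ACG", "TTA"), "ind3": ("ACA", "ATG")},
    "em":        {"ind1": ("ATG", "ACG"), "ind2": ("ACA", "TTG"), "ind3": ("ACA", "ATG")},
    "bayesian":  {"ind1": ("ACG", "ATG"), "ind2": ("ACG", "TTA"), "ind3": ("ATG", "ACA")},
}

reliable, uncertain = consensus_vote(calls)
print("reliable:", reliable)
print("uncertain:", uncertain)
```

Under the unanimity assumption, only the individuals left in `uncertain` would need to go forward to experimental verification. Lowering `min_agree` would turn the rule into a majority vote, a design choice the abstract neither confirms nor rules out.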
