• ambiguous genotype;
  • consensus vote;
  • flanking region of nuclear microsatellite DNA;
  • Locusta migratoria;
  • statistical haplotype reconstruction


Single copy nuclear polymorphic (scnp) DNA is potentially a powerful molecular marker for evolutionary studies of populations. However, a practical obstacle to its employment is the general problem of haplotype determination due to the common occurrence of heterozygosity in diploid organisms. We explore here a ‘consensus vote’ (CV) approach to this question, combining statistical haplotype reconstruction and experimental verification using as an example an indel-free scnp DNA marker from the flanking region of a microsatellite locus of the migratory locust. The raw data comprise 251-bp sequences from 526 locust individuals (1052 chromosomes), with 71 (28.3%) polymorphic nucleotide sites (including seven triallelic sites) and 141 distinct genotypes (with frequencies ranging from 0.2 to 25.5%). Six representative statistical haplotype reconstruction algorithms are employed in our CV approach, including one parsimony method, two expectation–maximization (EM) methods and three Bayesian methods. The phases of 116 ambiguous individuals inferred by this approach are verified by molecular cloning experiments. We demonstrate the effectiveness of the CV approach compared to inferences based on individual statistical algorithms. First, it has the unique power to partition the inferrals into a reliable group and an uncertain group, thereby allowing the identification of the inferrals with greater uncertainty (12.7% of the total sample in this case). This considerably reduces subsequent efforts of experimental verification. Second, this approach is capable of handling genotype data pooled from many geographical populations, thus tolerating heterogeneity of genetic diversity among populations. Third, the performance of the CV approach is not influenced by the number of heterozygous sites in the ambiguous genotypes. Therefore, the CV approach is potentially a reliable strategy for effective haplotype determination of nuclear DNA markers. Our results also show that rare variations and rare inferrals tend to be more vulnerable to inference error, and hence deserve extra surveillance.