Computation vs. cloning: evaluation of two methods for haplotype determination


Ryan J. Harrigan, Center for Tropical Research, Institute of the Environment, University of California, Los Angeles, La Kretz Hall, Suite 300, Box 951496, Los Angeles, CA 90095–1496, USA, Fax: 310-825-5446; E-mail:


Nuclear sequence data, often from multiple loci, are increasingly being employed in analyses of population structure and history, yet there has been relatively little evaluation of methods for accurately and efficiently separating the alleles or haplotypes in heterozygous individuals. We compared the performance of a computational method of haplotype reconstruction and standard cloning methods using a highly variable intron (ornithine decarboxylase, intron 6) in three closely related species of dabbling ducks (genus Anas). Cloned sequences from 32 individuals were compared to results obtained from PHASE 2.1.1. PHASE correctly identified haplotypes in 28 of 30 heterozygous individuals when the underlying model assumed no recombination. Haplotypes of the remaining two individuals were also inferred correctly except for unique polymorphisms, the phase of which was appropriately indicated as uncertain (phase probability = 0.5). For a larger set of 232 individuals, results were essentially identical regardless of the recombination model used and haplotypes for all 30 of the tested heterozygotes were correctly inferred, with the exception of uncertain phase for unique polymorphisms in one individual. In contrast, initial sequences of one clone per sample yielded accurate haplotype determination in only 26 of 30 individuals; polymerase chain reaction (PCR)/cloning errors resulting from misincorporation of individual nucleotides could be recognized and avoided by comparison to direct sequences, but errors due to PCR recombination resulted in incorrect haplotype reconstruction in four individuals. The accuracy of haplotypes reconstructed by PHASE, even when dealing with a relatively small number of samples and numerous variable sites, suggests broad utility of computational approaches for reducing the cost and improving the efficiency of data collection from nuclear sequence loci.
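
The comparison of a single cloned sequence to the direct sequence can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `complement_haplotype`, the toy sequences, and the assumption that heterozygous sites in the direct sequence are written as two-base IUPAC ambiguity codes are all hypothetical choices made for illustration. Given an aligned direct sequence and one clone, the sketch derives the second haplotype by subtraction and flags sites where the clone carries a base absent from the direct sequence, which is how a misincorporation error might be recognized.

```python
# Two-base IUPAC ambiguity codes, assumed here to mark heterozygous sites
# in the direct (diploid) sequence.
IUPAC = {
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def complement_haplotype(direct, clone):
    """Infer the second haplotype from a direct sequence and one clone.

    Returns (second_haplotype, mismatches). `mismatches` lists 0-based
    positions where the clone has a base not present in the direct
    sequence (candidate misincorporation errors); the second haplotype
    is given as "N" at those positions.
    """
    if len(direct) != len(clone):
        raise ValueError("sequences must be aligned to the same length")

    second = []
    mismatches = []
    for pos, (d, c) in enumerate(zip(direct.upper(), clone.upper())):
        alleles = IUPAC.get(d, {d})  # unambiguous base -> single allele
        if c not in alleles:
            # Clone disagrees with the direct sequence at this site.
            mismatches.append(pos)
            second.append("N")
        elif len(alleles) == 2:
            # Heterozygous site: the other haplotype carries the other base.
            (other,) = alleles - {c}
            second.append(other)
        else:
            # Homozygous site: both haplotypes share the base.
            second.append(d)
    return "".join(second), mismatches


# Hypothetical toy data: R = A/G and Y = C/T mark two heterozygous sites.
hap2, errors = complement_haplotype("ACRTYG", "ACATCG")
print(hap2, errors)
```

Note that a check of this kind cannot flag a clone produced by PCR recombination: a chimeric clone agrees with the direct sequence at every site, so it passes the comparison while still implying the wrong pair of haplotypes. This is consistent with the four individuals reported above, in which such errors went undetected when only one clone was sequenced.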