ABGD, Automatic Barcode Gap Discovery for primary species delimitation

Authors

  • N. PUILLANDRE,

    1. UMR 7138, Muséum National d'Histoire Naturelle, Departement Systématique et Evolution, 43, Rue Cuvier, 75231 Paris, France
  • A. LAMBERT,

    1. Laboratoire de Probabilités et Modèles Aléatoires (UMR 7599), UPMC Univ Paris 06, Univ Paris Diderot, CNRS, Paris, France
  • S. BROUILLET,

    1. Systématique, Adaptation et Evolution (UMR 7138), UPMC Univ Paris 06, CNRS, MNHN, IRD, Paris, France
    2. Atelier de Bioinformatique, UPMC Univ Paris 06, Paris, France
  • G. ACHAZ

    1. Systématique, Adaptation et Evolution (UMR 7138), UPMC Univ Paris 06, CNRS, MNHN, IRD, Paris, France
    2. Atelier de Bioinformatique, UPMC Univ Paris 06, Paris, France

Correspondence: Guillaume Achaz, Fax: +33 1 44 27 63 12; E-mail: guillaume.achaz@upmc.fr

Abstract

Within uncharacterized groups, DNA barcodes, short DNA sequences that are present in a wide range of species, can be used to assign organisms to species. We propose an automatic procedure that sorts the sequences into hypothetical species based on the barcode gap, which can be observed whenever the divergence among organisms belonging to the same species is smaller than the divergence among organisms from different species. We use a range of prior intraspecific divergences to infer from the data a model-based one-sided confidence limit for intraspecific divergence. The method, called Automatic Barcode Gap Discovery (ABGD), then detects the barcode gap as the first significant gap beyond this limit and uses it to partition the data. Inference of the limit and gap detection are then recursively applied to the previously obtained groups to get finer partitions until no further partitioning is possible. Using six published data sets of metazoans, we show that ABGD is computationally efficient and performs well for standard prior maximum intraspecific divergences (a few per cent of divergence for the five data sets), except for one data set where fewer than three sequences per species were sampled. We further explore the theoretical limitations of ABGD through simulation of explicit speciation and population genetics scenarios. Our results emphasize in particular the sensitivity of the method to the presence of recent speciation events, via (unrealistically) high rates of speciation or large numbers of species. In conclusion, ABGD is a fast, simple method to split a sequence alignment data set into candidate species, which should be complemented with other evidence in an integrative taxonomic approach.
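
The partitioning idea summarized above can be illustrated with a deliberately simplified sketch. It is not the ABGD implementation: the model-based confidence limit is replaced by a fixed, user-supplied `prior_limit`, a "significant" gap is taken to be any jump of at least `min_gap` between consecutive ranked distances, and groups are formed by single linkage. The function names, both parameters and the toy distances are assumptions made only for illustration.

```python
from itertools import combinations


def first_gap_threshold(distances, prior_limit, min_gap):
    """Return the midpoint of the first gap of width >= min_gap whose upper
    end lies beyond prior_limit, or None if no such gap exists."""
    ranked = sorted(distances)
    for lower, upper in zip(ranked, ranked[1:]):
        if upper > prior_limit and upper - lower >= min_gap:
            return (lower + upper) / 2
    return None


def single_linkage(members, dist, threshold):
    """Group members so that any pair closer than threshold shares a group."""
    parent = {m: m for m in members}

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), d in dist.items():
        if d < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for m in members:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())


def delimit(members, dist, prior_limit, min_gap):
    """Recursively split members until no further gap is found.

    dist maps (i, j) with i < j to a pairwise distance. The same fixed
    prior_limit is reused at every level (an assumption of this sketch).
    """
    members = sorted(members)
    within = {(i, j): dist[(i, j)] for i, j in combinations(members, 2)}
    threshold = first_gap_threshold(within.values(), prior_limit, min_gap)
    if threshold is None:
        return [members]
    groups = single_linkage(members, within, threshold)
    if len(groups) == 1:
        return [members]
    result = []
    for group in groups:
        result.extend(delimit(group, dist, prior_limit, min_gap))
    return sorted(result)


# Toy data (hypothetical): two tight pairs separated by a clear gap.
toy = {(0, 1): 0.01, (2, 3): 0.02, (0, 2): 0.10,
       (0, 3): 0.11, (1, 2): 0.10, (1, 3): 0.12}
print(delimit(range(4), toy, prior_limit=0.03, min_gap=0.05))
```

With the toy distances, the ranked list jumps from 0.02 to 0.10, so the sketch places the threshold in that gap and separates sequences 0 and 1 from sequences 2 and 3. Each resulting pair contains a single distance and therefore no further gap, which ends the recursion.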
