Despite taxonomy’s 250-year history, the past 20 years have borne witness to remarkable advances in technology and techniques, as well as debate. DNA barcoding has generated a substantial proportion of this debate, with its proposition that a single mitochondrial sequence will consistently identify and delimit species, replacing more evidence-rich and time-intensive methods. Although mitochondrial DNA (mtDNA) has since been the focus of voluminous discussion and case studies, little effort has been made to comprehensively evaluate its success in delimiting closely related species. We have conducted the first broadly comparative literature review addressing the efficacy of molecular markers for delimiting such species over a broad taxonomic range. By considering only closely related species, we sought to avoid confusion of success rates with those due to deeply divergent taxa. We also address whether increased population-level or geographic sampling affects delimitation success. Based on the results from 101 studies, we found that all marker groups had approximately equal success rates (∼70%) in delimiting closely related species and that the use of additional loci increased average delimitation success. We also found no relationship between increased sampling of intraspecific variability and delimitation success. Ultimately, our results support a multi-locus integrative approach to species delimitation and taxonomy.
The concept of a universal marker for accurate identification of all life holds a compelling simplicity. If its application were indeed effective at levels close to 100%, it would unquestionably be useful in many biological disciplines. Mitochondrial DNA was a logical choice for such a marker once PCR-based sequencing allowed standardized primer selection (e.g. cyt b: Bartlett & Davidson 1991; COI: Bogdanowicz et al. 1993; Sperling et al. 1994). However, the effectiveness of any single marker for this purpose remains open to debate, and the botanical community has since moved beyond this one-marker system and is in the process of selecting a small number of markers for its molecular identification system (Hollingsworth et al. 2009). Despite the practice with plants, COI is increasingly being used as the primary provisional identifier of animal specimens to the species level (e.g. Ward et al. 2005; Burns et al. 2007) or to simpler entities such as molecular operational taxonomic units, or MOTUs (e.g. Hebert et al. 2004a; Janzen et al. 2005). In some of these cases, there is little or no evaluation of the effectiveness of mtDNA as a diagnostic character by referring to multiple data sources.
Possible discordance between the evolutionary history of a species (i.e. species tree) and the phylogenetic reconstruction provided by a gene (i.e. gene tree) is not a new concept (Nei 1987; Pamilo & Nei 1988). Yet automated methods relying on single genes to identify species are gaining popularity (e.g. Forister et al. 2008 and references within), without comprehensive testing of the efficacy of those genes. In this literature survey, we provide a taxonomically broad comparison among various classes of molecular markers to more comprehensively evaluate their success in delimiting closely related species, a problem that is usually only addressed case-by-case or with only a few species or markers at a time. Although systems such as DNA barcoding are concerned with a broader range of applications—e.g. utilizing a large, user-friendly database to match life stages, identify unknown tissues, etc. (see references above)—here we focus on the delimitation of closely related species, as we believe this task is of particular interest to taxonomy and systematics as sciences (de Carvalho et al. 2008).
To provide a more consistent basis for re-evaluating the reported success of single molecular markers, we conducted a literature survey of studies that have employed multi-locus species delimitation of closely related species across animals and fungi. In many studies using single-locus barcoding, COI or other markers are relied on to separate both deeply divergent taxa (e.g. in different genera) as well as closely related species. However, it is the closely related species distinctions that are often the most problematic, as more distantly related species can usually be more easily identified using classical morphological characters, negating the need for other methods (of course, this is not always true, particularly with immature stages or partial remains). Moreover, when overall success rates are calculated, high success in delimiting different genera can mask low success in delimiting closely related species. To compensate for this limitation in documenting the effectiveness of single marker delimitation and identification, we focused solely on published comparisons that used multiple independent genetic markers to delimit closely related species, typically at the level of recently diverged sister species. We then compared the success rates of molecular marker classes for delimiting species boundaries and tested the hypothesis that increased population-level and/or geographic sampling would uncover more variation and thereby decrease species delimitation success (Moritz & Cicero 2004; Meier 2008). Finally, we addressed whether using more molecular markers increased average species delimitation success and discussed the implications of these results for the future of integrative taxonomy.
Multi-locus literature survey
Detailed explanations of search procedures and subsequent characterization of studies for our literature survey are presented in Appendix S1 (Supporting information). Briefly, we included studies published from 1990 to February 2011 that: (i) dealt with the delimitation of closely related species (generally species in the same genus); (ii) compared at least two closely related but unambiguously distinct species/entities as determined by the authors; (iii) sampled at least five specimens per species; and (iv) used at least two independent molecular genetic markers (DNA- or gene-based molecular markers not inherited as a single genetic block). These studies were then characterized using indices for haplotype fixation and phylogenetic congruence developed in Roe & Sperling (2007) and Roe et al. (2010).
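The four inclusion criteria can be read as a simple screening filter. The following is a minimal sketch only, not the procedure documented in Appendix S1: the `Study` record and its field names are hypothetical, and the publication window is approximated to whole years (the survey actually closed in February 2011).

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical record for one candidate study (all field names are assumptions)."""
    year: int                        # publication year
    closely_related_focus: bool      # criterion (i): delimits closely related species
    n_distinct_species: int          # criterion (ii): unambiguously distinct species/entities
    min_specimens_per_species: int   # criterion (iii): smallest per-species sample
    n_independent_markers: int       # criterion (iv): markers not inherited as one block


def passes_screen(study: Study) -> bool:
    """Return True only if the study meets all four inclusion criteria."""
    return (
        1990 <= study.year <= 2011   # year-level approximation of the survey window
        and study.closely_related_focus
        and study.n_distinct_species >= 2
        and study.min_specimens_per_species >= 5
        and study.n_independent_markers >= 2
    )


accepted = Study(2007, True, 2, 8, 3)
too_few_specimens = Study(2007, True, 2, 3, 3)
single_marker = Study(2005, True, 4, 20, 1)
```

Under these assumptions, `passes_screen(accepted)` is true, while the other two records fail on criteria (iii) and (iv), respectively.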
Genetic markers or loci were characterized as mtDNA, ribosomal DNA (rDNA), autosomal, sex-linked or anonymous, the last category including loci with unknown genomic locations such as microsatellites and amplified fragment length polymorphisms (AFLPs). Only nuclear-encoded rDNA genes were characterized as rDNA here. Taxa were classified as hexapods (various orders), miscellaneous (nonhexapod) invertebrates, fishes, amphibians, reptiles, birds, mammals and fungi. Studies were sorted by clade unless sample size was low, in which case an informal paraphyletic grouping was used (e.g. invertebrates, fishes, reptiles). Plants were not examined, as the botanical community commonly uses multiple markers, and similar comprehensive analyses have already been conducted (e.g. Hollingsworth et al. 2009).
Fixation and congruence indices
Due to the heterogeneous presentation of data in diverse publications, a standardized metric was needed to quantify and compare marker success across studies. We calculated a fixation index (FI) that represented haplotype fixation and a congruence index (CGI) to describe phylogenetic correspondence for each species comparison. This approach was designed to allow standardized comparisons across taxa and studies. Standardizing the method of analysis (for instance, reanalyzing all data with maximum likelihood) would create a more level playing field for these comparisons, although it is unnecessary with some datasets (Rindal & Brower 2011). However, difficulties in acquiring data and the time required for such analysis made this option unviable.
The FI is the proportion of genetic markers whose haplotypes or alleles are reported as fixed or unique to a species (Roe & Sperling 2007). Haplotypes or alleles were classified as fixed (found only in one species) or shared (found in two or more species). We preferentially use the term haplotype to refer to both haplotypes and alleles when the distinction is unnecessary.
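As an illustration of this definition, the sketch below scores each locus as fixed or shared from the haplotypes observed in each species and then takes the proportion of fixed loci. The data layout, locus names and haplotype labels are hypothetical; the survey itself scored loci from the results reported in each publication rather than from raw haplotype lists.

```python
from typing import Dict, Optional, Set


def locus_is_fixed(haplotypes_by_species: Dict[str, Set[str]]) -> bool:
    """A locus is fixed if no haplotype is found in more than one species."""
    seen: Set[str] = set()
    for haplotypes in haplotypes_by_species.values():
        if seen & haplotypes:        # a haplotype already seen in another species
            return False
        seen |= haplotypes
    return True


def fixation_index(loci: Dict[str, Dict[str, Set[str]]]) -> Optional[float]:
    """Proportion of loci scored as fixed; None when no locus could be scored."""
    if not loci:
        return None
    n_fixed = sum(locus_is_fixed(h) for h in loci.values())
    return n_fixed / len(loci)


# Hypothetical two-species comparison with two loci.
example = {
    "COI": {"sp_A": {"h1", "h2"}, "sp_B": {"h3"}},    # fixed: no overlap
    "EF1a": {"sp_A": {"a1"}, "sp_B": {"a1", "a2"}},   # shared: a1 in both species
}
fi = fixation_index(example)
```

Here one of the two loci is fixed, so `fi` is 0.5. A single shared haplotype is enough to score a locus as shared, which is what makes the index conservative.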
The CGI scores the phylogenetic or clustering relationship exhibited by loci and is the proportion of fixed loci that display either reciprocal monophyly or distance-based congruence (clustering) with the species boundaries preferred by the authors of the original studies (a more detailed discussion of the effect of species concepts on these methods is found in Appendix S1, Supporting information). CGI was originally named the clustering index or CI by Roe & Sperling (2007), but we now use CGI to reduce confusion with the widely used consistency index in phylogenetics, confidence interval in statistics and the common use of clustering to denote distance-based analyses. CGI should not be confused with Icong proposed by De Vienne et al. (2007) for testing topological similarity between trees. CGI was scored based on the type of analysis used. For trees derived using explicitly phylogenetic methods (parsimony, maximum likelihood or Bayesian inference), loci were characterized as exhibiting either reciprocal monophyly or paraphyly/polyphyly, relative to the preferred species delimitations of authors. For trees derived using distance-based methods (e.g. neighbour-joining, UPGMA or similar approaches), loci were scored as either congruent or incongruent with the species limits used by the authors of the studies. To avoid inflated proportions of incongruence, loci that had shared haplotypes across species (and therefore cannot form monophyletic groups or congruent clusters compared with independently determined species limits) were classified as ‘NA’. Thus, CGI was based only on the subset of loci that had fixed haplotype differences and was a quantification of relationships among these fixed haplotypes. To summarize taxonomic subsets of the data, weighted means and 95% confidence intervals of FI and CGI were calculated following a binomial distribution (to accommodate the binomial states in the FI and CGI; Zar 1999).
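The scoring rule for CGI, including the exclusion of ‘NA’ loci, can be sketched as follows. The score labels are hypothetical. The confidence interval uses the normal (Wald) approximation to the binomial, which is an assumption made here for illustration; the exact formula and weighting taken from Zar (1999) are not specified in the text.

```python
import math
from typing import List, Optional, Tuple

CONGRUENT = "congruent"      # reciprocal monophyly, or clustering congruent with species limits
INCONGRUENT = "incongruent"  # paraphyly/polyphyly, or noncongruent clustering
NA = "NA"                    # locus with shared haplotypes, excluded from CGI


def congruence_index(scores: List[str]) -> Optional[float]:
    """Proportion of scored (non-NA) loci that are congruent; None if all are NA."""
    scored = [s for s in scores if s != NA]
    if not scored:
        return None
    return sum(s == CONGRUENT for s in scored) / len(scored)


def binomial_ci95(successes: int, trials: int) -> Tuple[float, float]:
    """Wald 95% interval for a binomial proportion, clipped to [0, 1] (assumed method)."""
    p = successes / trials
    half_width = 1.96 * math.sqrt(p * (1.0 - p) / trials)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical comparison: four loci, one of which has shared haplotypes.
cgi = congruence_index([CONGRUENT, INCONGRUENT, NA, CONGRUENT])
low, high = binomial_ci95(50, 100)
```

Because the ‘NA’ locus is dropped from the denominator, `cgi` is two of three scored loci rather than two of four.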
Although FI is the most easily applied measure of successful species delimitation, fixation is difficult to measure with some marker types, such as microsatellites, AFLPs and allozymes. These markers are generally treated and reported as groups of loci in distance-based analysis, preventing the calculation of FI and reducing comparative power between marker groups. Furthermore, FI is likely to increase with the length of DNA sequenced, as longer DNA sequences are more likely to have unique mutations. Although markers analysed using distances can only be characterized as congruent or noncongruent, this still allows calculation of CGI. Both FI and CGI are conservative in calculating success in species delimitation, as just one specimen that displays a shared haplotype or nonmonophyletic relationship for that locus would cause the species to be classified as shared and paraphyletic/polyphyletic, respectively.
Population and geographic sampling adequacy
Our survey addressed sampling adequacy at three levels: genomic, population and geographic. Genomic sampling was assessed by comparing different classes of genetic markers (e.g. mtDNA vs. autosomal). Population-level variation was taken into account by recording the number of specimens examined per species for every study in the literature review. Many studies sampled different numbers of specimens for different loci, and to streamline analysis in these cases, we recorded the minimum number of specimens examined for all loci. To address the adequacy of geographic sampling in our literature survey, we estimated the proportion of the total geographic distribution of a species that was included in each study (see Appendix S1 for details, Supporting information). To assess the total geographic distribution (including known introductions), we preferentially used information provided by the authors of the original studies. However, if that was not sufficient, we obtained this information from related literature. The total size of the species’ distributions was also categorized as: (i) <100 km diameter; (ii) 100–1000 km diameter; (iii) 1000 km to across continent, or 1000 to 5000 km for marine species; or (iv) more than one continent, or >5000 km for marine species. Then the extent of sampling within the total distribution was assessed in terms of 25% increments. Although these estimates are relatively coarse and dependent on the availability of knowledge about the species distributions, they provide a preliminary assessment of whether widespread or widely sampled species are more difficult to delimit using molecular markers.
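The three recording rules in this paragraph (minimum specimens across loci, distribution size class and 25% sampling increments) can be sketched as small helper functions. All function and parameter names are hypothetical, and the handling of values falling exactly on a class boundary (e.g. exactly 1000 km) is an assumption, as the text does not specify it.

```python
from typing import Dict


def min_specimens(specimens_per_locus: Dict[str, int]) -> int:
    """Record the smallest number of specimens examined across all loci."""
    return min(specimens_per_locus.values())


def distribution_class(diameter_km: float, marine: bool = False,
                       spans_multiple_continents: bool = False) -> int:
    """Return the distribution size class (1-4) for categories (i)-(iv)."""
    if diameter_km < 100:
        return 1
    if diameter_km <= 1000:          # boundary handling is an assumption
        return 2
    if marine:
        return 3 if diameter_km <= 5000 else 4
    return 4 if spans_multiple_continents else 3


def sampling_bin(proportion_sampled: float) -> str:
    """Assign the sampled fraction of the total distribution to a 25% increment."""
    if not 0.0 <= proportion_sampled <= 1.0:
        raise ValueError("proportion_sampled must be between 0 and 1")
    for upper, label in ((0.25, "0-25%"), (0.50, "26-50%"), (0.75, "51-75%")):
        if proportion_sampled <= upper:
            return label
    return "76-100%"


# Hypothetical species: 12 specimens for one locus, 7 for another.
n = min_specimens({"COI": 12, "EF1a": 7})
```

In this example `n` is 7, the more conservative of the two per-locus sample sizes.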
Multi-locus power analysis
A direct, although in practice substantially more complex, approach to testing whether increasing the number of loci improves species delimitation is to conduct a multi-locus power analysis (see Roe et al. 2010). This analysis was conducted by constructing neighbour-joining trees for all individual loci and for every combination of two, three and four loci. For each neighbour-joining tree, congruence with the authors’ preferred limits for species was determined as for CGI, and the average proportion of successful species delimitations (i.e. average congruence) was calculated for each number of loci. We conducted a multi-locus power analysis on a subset of studies in our literature review, and details of the methodology are given in Appendix S1 (Supporting information). We also assessed the effect of the number of loci on the proportion of successful delimitation using logistic regression. Logistic regression was conducted in R version 2.14.0 (R Development Core Team 2012) using the MASS library. Post hoc analysis was conducted using Tukey’s honestly significant differences with a Bonferroni adjustment to control for pairwise error rates.
In total we examined 425 studies in detail. Of these, 324 were subsequently rejected, primarily due to low sample size. Missing data, undefined or ambiguous taxa and inappropriate taxonomic focus, such as examinations at the level of genus or within species rather than relationships among closely related species, also contributed to many rejections (Table S2, Supporting information). The 101 accepted studies are summarized in Table S3 (Supporting information) and are presented in detail in Table S4 (Supporting information). Accepted papers examined from 2 to 12 closely related species and used 2 to 27 loci for comparison of these species.
We examined a total of 377 separately used loci across all accepted studies (Fig. S1A, Supporting information). Of these, 241 showed fixed haplotypes or alleles and 108 had shared haplotypes or alleles between species (28 loci could not be classified as fixed or shared due to marker type: see Fixation and congruence indices, above). Reciprocal monophyly or congruence with author-defined species limits was seen in 157 loci; 111 showed either paraphyly, polyphyly or noncongruence; and 109 were classified as ‘NA’ (Fig. S1B, Supporting information).
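The tallies above can be checked with simple arithmetic. The sketch below confirms that each breakdown sums to the 377 loci examined and derives pooled proportions over classifiable loci. These pooled values are an illustration computed here, not the weighted means by marker class reported in the figures.

```python
# Counts as reported in the text.
examined_studies, rejected_studies = 425, 324
total_loci = 377
fixed, shared, unclassified = 241, 108, 28
congruent, incongruent, na = 157, 111, 109

accepted_studies = examined_studies - rejected_studies

# Each breakdown should account for every locus examined.
fixation_total = fixed + shared + unclassified
congruence_total = congruent + incongruent + na

# Pooled proportions over classifiable loci only (not the authors' weighted means).
pooled_fi = fixed / (fixed + shared)
pooled_cgi = congruent / (congruent + incongruent)

print(accepted_studies, fixation_total, congruence_total)
print(f"pooled FI = {pooled_fi:.1%}, pooled CGI = {pooled_cgi:.1%}")
```

The pooled FI of roughly 69% is in line with the approximately 70% success rate summarized in the abstract.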
Fixation and congruence indices
Overall, the five marker classes had similar success rates in delimiting closely related species when all taxonomic groups were combined (Fig. 1A). Autosomal loci had the lowest FI value (66% fixed vs. shared haplotypes), but were surpassed only slightly by mtDNA, rDNA and sex-linked loci (71%, 74% and 74%, respectively). Mean CGI also showed a rather narrow range among marker classes (Fig. 1B), with anonymous loci having the highest CGI values (76%), and autosomal loci the lowest (52%). We also examined variation in the mean FI for different taxonomic groups, as fixation acts as a general measure of delimitation success (Fig. 2). Marker groups with fewer than five loci (for a particular taxonomic group) were omitted, to avoid potential sampling artefacts in mean FI and CGI values (Fig. 1; Table S3A, Supporting information), and as with the combined data, mean FI was highly variable for all marker classes (Fig. 2A). Ribosomal DNA showed the smallest range (50% fixation in fungi vs. 88% in miscellaneous invertebrates), and autosomal markers showed the largest (17% fixation in birds vs. 82% in fungi). Hexapods were the most intensively sampled group of organisms and were further sorted by order where sample size allowed. Generally, frequencies for FI and CGI within Hexapoda were similar to the rates for all taxa combined (Table S3B, Supporting information). One major difference is an elevated frequency of fixed alleles in sex-linked markers (93%; Fig. 2B), associated with increased use of these markers in the Lepidoptera and Diptera (e.g. Roe & Sperling 2007). Sex-linked markers in other groups do not show elevated fixation, although we found few studies using this marker type (one study of miscellaneous invertebrates and several of birds and mammals: Table S3A, Supporting information). Interestingly, apart from fungi, rDNA also exhibited high FI and CGI (Fig. 2; Table S3, Supporting information).
Population and geographic sampling analysis
Accepted studies sampled up to 320 specimens per species, but above our arbitrary cut-off of five, there was a sharp decline in the number of specimens sampled per species (Fig. 3A). No consistent relationship was present between the number of specimens sampled per species and either FI or CGI (Fig. 3B).
Geographically, studies tended to be polarized, sampling either most of the distribution of a species or less than half of it (Fig. 4, right Y-axis). As with the number of specimens sampled per species, no relationship between FI or CGI and extent of geographic sampling was evident (Fig. 4). When marker groups were assessed separately for both the number of specimens sampled per species and geographic sampling, no overarching trends were apparent for either FI or CGI (Fig. S2, Supporting information). When geographic sampling was subdivided by estimated global distribution, an apparent trend is present towards increased FI in species with more geographically extensive ranges (Fig. S3, Supporting information). This subdivision of the data, however, contains substantial variation, and high FI values for species with more extensive geographic ranges are based on low sample sizes.
Using a literature review, we were able to compare species delimitation success for five classes of molecular markers across a wide range of closely related fungal and animal taxa. Three main findings were obtained from these results: (i) Used individually, all marker classes were moderately successful at delimiting closely related species; (ii) increased geographic or population sampling did not significantly affect success in delimiting species; and (iii) these results—particularly those of the multi-locus power analysis—support investigation and use of multiple alternate markers for species delimitation.
Species delimitation success compared among marker classes
All marker classes showed roughly similar success rates in species delimitation when all taxonomic groups were combined (66–76% FI; Fig. 1). Notably, mtDNA does not prove to be significantly better or worse than any other marker group. With an overall success rate of 71%, our results for mtDNA correspond well to several other estimates that were restricted to one taxonomic group (∼70%: Meier et al. 2006; 77%: Elias et al. 2007). By focusing on closely related species, we intentionally distinguished success rates at this taxonomic level from surveys that include deeply divergent taxa. We feel that this focus gives a more accurate measure of delimitation success for the cases that are most in need of molecular markers: closely related species. With these limited success rates, our results emphasize that a single marker cannot consistently be used for unequivocal and universal species delimitation (e.g. Brower 2006; Meier et al. 2006; Elias et al. 2007; Roe et al. 2010), particularly not with confidence levels that would, for instance, hold up in a court of law (Sperling & Roe 2009). Additionally, the variability present between taxonomic groups and marker types (Fig. 2) can be used as a guide for future investigation and development of additional universal markers for species delimitation. For example, sex-linked markers show consistently high success in delimiting closely related species in Diptera and Lepidoptera, a previously detected pattern (Diptera: Coyne & Orr 1989; Lepidoptera: Sperling 1994; Diptera and Lepidoptera: Roe & Sperling 2007).
Our multi-locus power analysis indicated a significantly positive relationship between the number of loci used and species delimitation success, thus supporting previous findings using this approach (Roe et al. 2010). Of course, this methodology is rudimentary, and the concatenation of multiple loci with potentially different effective population sizes and evolutionary dynamics does require phylogenetic discretion. In practice, the addition of more loci is further complicated by associated costs (including both time and money), which can increase quickly and must be weighed on a project-by-project basis. Although simple, however, this analytic approach sheds light on multi-locus species delimitation, and we recommend its continued use.
Intraspecific variation and geographic sampling adequacy
Our assessment of the effects of population-level sampling used a minimum filter of at least five specimens sampled per species as a criterion for selecting studies. This gave us an ample, but not overwhelming, number of studies to work with. Of course, five specimens per species will not capture all real-world variability (DeSalle et al. 2005), particularly in cases with widespread species distributions (Davis & Nixon 1992; Walsh 2000). Some mtDNA barcoding proponents have proposed higher standards (10 specimens per species: Hajibabaei et al. 2005; 12 specimens per species: Matz & Nielsen 2005). Nonetheless, low sample size was still responsible for the highest number of rejected studies after our initial scan of the literature (154 studies: Table S2, Supporting information), and an additional 25 studies would have been rejected with a cut-off of 10 specimens sampled per species. Furthermore, a large number of accepted studies (40 of the 101) sampled <12 specimens per species (Fig. 3A).
Contrary to theoretical expectation, we found no trend supporting the hypothesis that increasing the number of specimens sampled per species (>5) decreases FI or CGI due to increased intraspecific variation—an idea exemplified in empirical studies (e.g. Brower 2006; Meier et al. 2006; Segerer et al. 2011). The expected relationship between sampling and elevated FI or CGI may still hold if four or fewer specimens are sampled per species, but its assessment would be complicated by other factors such as the generally phylogenetic focus of such studies. We are also cautious about concluding that there is no biologically valid relationship between FI or CGI and more extensive population sampling for two main reasons. First, a review methodology relying on the literature introduces the potential for publication bias. Studies with clean, clear results are both easier to write up and easier to shepherd through review, a general phenomenon that is widely recognized (Rosenthal 1979; Csada et al. 1996; Johnson & Dickersin 2007; Lehrer 2010). Second, the occurrence of selective sweeps not only within species, but also introgression between species, is becoming more apparent (see Chan & Levin 2005 and references within). Either, or both, of these issues could confound our assessment of intraspecific variation, and identifying the exact cause is beyond the scope of this study.
Interestingly, several recent DNA barcoding studies have also addressed the issue of geographic sampling, with differing results; while Lukhtanov et al. (2009) found substantially increased intraspecific variability with increased geographic sampling, Hebert et al. (2010) did not. Both of these studies, however, were limited to one taxonomic group and were further distanced from our results by the inclusion of numerous deeply divergent species. As other empirical studies continue to reinforce the importance of capturing interspecific variability for species delimitation (Brower 2006; Meier et al. 2006; Segerer et al. 2011), it is clear that summarizing these effects requires more work.
mtDNA, species delimitation and taxonomy
The third, and we believe most important, issue raised by our results concerns the fundamental nature of species delimitation as a taxonomic approach. Although DNA barcoding is useful in many applications, limitations in methodology and the nature of mitochondrial evolution decrease its applicability for detailed systematic or taxonomic analysis, particularly for closely related species (DeSalle et al. 2005; Will et al. 2005; de Carvalho et al. 2008). COI—or any other molecular marker for that matter—serves only as a rough guide for successfully delimiting species. Although some groups of organisms are well delimited by a single marker, many will not fit into this single-locus conceptual construct. Species may be considered to be hypothetical vessels to hold and characterize variation and, as hypotheses, are either supported or rejected by data (De Queiroz 2007; Padial & de la Riva 2010; Yeates et al. 2011). Despite recent discussions addressing contrasting goals and definitions of DNA barcoding, taxonomy and systematics (e.g. Vogler & Monaghan 2006; DeSalle 2007; Waugh 2007; Brower 2010; Ebach 2011; Stevens et al. 2011), each of these fields is concerned with species as taxonomic hypotheses. By limiting the amount of genomic variation (i.e. using only one marker) or intraspecific variation that is sampled (as discussed previously), we limit the ability to effectively realize patterns and formulate alternative hypotheses concerning species boundaries.
Ultimately, a balance must be struck between the standardization and automation advocated by DNA barcoding, and the systematic and taxonomic view of a species as a hypothesis. Therefore, we argue in favour of standardization of multiple markers within groups of animals (e.g. van Nieukerken et al. 2012), a task that our taxonomically partitioned results can assist, and iterative or integrative approaches to species delimitation and taxonomy (see Yeates et al. 2011). The importance and the added complexity of incorporating multiple lines of evidence in species delimitation are not new concepts (e.g. Wilson & Brown 1953). Reliance on multiple molecular markers may lead to more cases of incongruence, as compared to a ‘barcode species concept’ (Rubinoff 2006), but the aim in this endeavour is the delimitation of evolutionarily significant units rather than self-referential consistency. Furthermore, detection of incongruence leads to greater evolutionary understanding of phenomena such as introgression, population structure and sex-biased gene flow (Funk & Omland 2003; Rubinoff & Holland 2005; Marko & Hart 2011). Power analyses evaluating the need for multiple markers are available for plants (e.g. Hollingsworth et al. 2009; Burgess et al. 2011), and we have attempted to move in this direction for animals and fungi (e.g. Roe et al. 2010); however, there is a clear need for further studies of this kind. Ultimately, by capturing as much natural variation as possible within biologically meaningful species limits, our knowledge of those species units will have more universal applications in the ways that matter to us all.
This is the first taxonomically comprehensive review of the efficacy of different marker groups for the delimitation of closely related species. Through the use of strict screening methods, we have shown that all marker groups have relatively equal success in delineating closely related species and that using more markers increases average delimitation success. Unexpectedly, we found no relationship between population-level or geographic sampling and delimitation success, although this may be an artefact of our review methodology and deserves more rigorous and systematic investigation. Ultimately, we support a hypothesis-based, integrative approach to species delimitation. Divorcing our knowledge of real biological complexity from the operational process of species delimitation would only serve to confine our knowledge of biodiversity and suspend progress in taxonomy, systematics and biology as a whole.
This work was funded by an NSERC Discovery Grant to F.A.H.S. We sincerely thank C.M. Whitehouse for statistical assistance, and L. Bernatchez, A.V.Z. Brower, B.M.T. Brunet, J.J. Dombroskie, L.M. Lumley, B.A. Mori, H.C. Proctor and three anonymous reviewers for review comments on the manuscript.
J.R.D. is a PhD student in the systematics and evolution program at the University of Alberta and his research interests include the interaction between speciation, hybridization and spatial ecology. A.D.R. is an NSERC Visiting Fellow with the Canadian Forest Service and continues her research into species limits, diagnostics and phylogeographic patterns at the population–species interface across a diverse range of organisms. F.A.H.S. is a professor at the University of Alberta and has broad research interests including the evolutionary biology, taxonomy and systematics of insects.
Appendix: Studies included in the literature review