Accounting for uncertainty in species delineation during the analysis of environmental DNA sequence data




1. Defining species boundaries represents a significant challenge in biodiversity studies, especially as these studies increasingly rely on high-throughput DNA sequencing technologies. A promising approach for delineating species in environmental sequence data combines phylogenetics and coalescence theory to estimate species boundaries from distributions of lineage birth rates within multispecies coalescent trees.

2. Existing methods for interpreting these models use hypothetico-deductive reasoning to identify the threshold associated with a mixed speciation-coalescent model that fits the data better than a null model. Here, I describe an alternative approach that ranks models and assigns them weights based on their fit to the data using information criteria, and then uses model averaging to estimate parameters and species probabilities.

3. This approach is applied to data from two independent studies that address (i) patterns of cospeciation in an aphid–bacterial symbiosis and (ii) diversity of bacterial communities associated with the human gut. In both cases, accounting for uncertainty during model selection allowed greater flexibility to detect speciation-coalescent thresholds that vary in time among lineages.

4. The precision of the predicted species boundaries varied among the studies, and the variance-to-mean ratio for richness estimates ranged from 0.023 to 0.079. Sample-based estimates of gut bacteria richness revealed that accounting for uncertainty during species delineation increased the variance in the estimates of population means (by individual from which the samples were taken or by sex of the individuals) by up to 7.5%.

5. In ecological and evolutionary studies, conclusions are highly dependent on the classification system that is adopted; given the uncertainty in species boundaries observed here, ignoring this source of error (as is common practice) likely results in inflated type I error rates. The approach described here represents an objective, theory-based method for predicting species boundaries and explicitly incorporates uncertainty in the classification system into biodiversity estimation, thus allowing researchers to better address the causes and consequences of biodiversity.
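As an illustration of the model-weighting and model-averaging steps summarized in point 2, the following is a minimal sketch rather than the implementation used in the study. It assumes that the information criterion is AIC (the summary says only "information criteria"), that each candidate model supplies a richness estimate and a 0/1 indicator of whether a given node is delimited as a species, and that the variance of interest is the between-model component only. All function names and input values are hypothetical.

```python
import math

def akaike_weights(aic_values):
    """Convert AIC scores to weights that sum to 1 (assumes AIC is the criterion)."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best model.
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

def model_average(weights, estimates):
    """Weighted mean of per-model estimates."""
    return sum(w * e for w, e in zip(weights, estimates))

def between_model_variance(weights, estimates):
    """Weighted variance of estimates across models (ignores within-model variance)."""
    mean = model_average(weights, estimates)
    return sum(w * (e - mean) ** 2 for w, e in zip(weights, estimates))

# Hypothetical inputs: three candidate threshold models.
aic = [100.0, 102.0, 110.0]   # made-up AIC scores
richness = [12, 15, 20]       # made-up species counts implied by each model
node_is_species = [1, 1, 0]   # made-up: does each model delimit this node as a species?

w = akaike_weights(aic)
mean_richness = model_average(w, richness)
vmr = between_model_variance(w, richness) / mean_richness  # variance-to-mean ratio
node_probability = model_average(w, node_is_species)       # summed weight of supporting models

print(f"weights: {[round(x, 3) for x in w]}")
print(f"model-averaged richness: {mean_richness:.2f}")
print(f"variance-to-mean ratio: {vmr:.3f}")
print(f"species probability for node: {node_probability:.3f}")
```

Under these assumptions, the probability that a node represents a species is the summed weight of the models that delimit it as one, so no single threshold has to be chosen before richness is estimated.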