Undersampling and the measurement of beta diversity


Corresponding author. E-mail: jan.beck@unibas.ch


  1. Beta diversity is a conceptual link between diversity at local and regional scales. A variety of methods have been used to quantify this and related phenomena; among them, measures of pairwise (dis)similarity of sites are particularly popular. Undersampling, i.e. failing to record all taxa present at a site, is common in ecological data. Many metrics related to beta diversity must therefore be expected to be biased, yet only a few studies have explicitly investigated how different measures behave under undersampling.
  2. Using an empirical data set of near-complete local inventories of the Lepidoptera of an isolated Pacific island, as well as simulated communities with varying properties, we mimicked different levels of undersampling. We used 14 different approaches to quantify beta diversity, among them dataset-wide multiplicative partitioning (i.e. ‘true beta diversity’) and pairwise site × site dissimilarities. We compared the values obtained from incomplete samples with the true values from the full data to quantify undersampling bias, and we calculated correlations between the dissimilarities of undersampled sites and those of the complete site data.
  3. Almost all tested metrics showed bias and low correlations under moderate to severe undersampling (as well as deteriorating precision, i.e. large chance effects on results). Measures based only on species incidence were very sensitive to undersampling, whereas abundance-based metrics that depend strongly on the distribution of the most common taxa were particularly robust. Simulated data showed that results depend on the abundance distribution, confirming that data sets of high evenness and/or metrics strongly affected by rare species are particularly sensitive to undersampling.
  4. The class of beta measure to use should depend on the research question, as different metrics can lead to quite different conclusions even without undersampling effects. For each class of metric, there is a trade-off between robustness to undersampling and sensitivity to rare species. Consequently, incidence-based metrics carry a particular risk of false conclusions when undersampled data are involved. Developing bias corrections for such metrics would be desirable.
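The abstract does not name its 14 metrics or the exact subsampling protocol, so the following is only a minimal sketch under stated assumptions. It contrasts three stand-in measures: Whittaker's multiplicative beta on species richness (gamma divided by mean alpha, a simple form of dataset-wide multiplicative partitioning), the incidence-based Sørensen dissimilarity, and the abundance-based Morisita-Horn dissimilarity, which is dominated by the most common taxa. Undersampling is assumed to mean drawing a random 10% of individuals per site without replacement. The community, function names, and parameters are hypothetical, and these measures are not necessarily among those used in the study. Because every simulated site shares the same true composition, any deviation from the complete-data values (a beta of 1, a dissimilarity of 0) is a sampling artifact.

```python
import random
from collections import Counter
from statistics import mean, stdev


def present(site):
    """Set of taxa with at least one recorded individual."""
    return {sp for sp, n in site.items() if n > 0}


def whittaker_beta(sites):
    """Multiplicative beta on richness: gamma / mean alpha (1 = identical sites)."""
    gamma = len(set().union(*(present(s) for s in sites)))
    return gamma / mean(len(present(s)) for s in sites)


def sorensen(x, y):
    """Incidence-based Sørensen dissimilarity: 1 - 2*shared / (S_x + S_y)."""
    sx, sy = present(x), present(y)
    return 1.0 - 2.0 * len(sx & sy) / (len(sx) + len(sy))


def morisita_horn(x, y):
    """Abundance-based Morisita-Horn dissimilarity; dominated by the commonest taxa."""
    X, Y = sum(x.values()), sum(y.values())
    dx = sum(n * n for n in x.values()) / X**2
    dy = sum(n * n for n in y.values()) / Y**2
    cross = sum(n * y.get(sp, 0) for sp, n in x.items())
    return 1.0 - 2.0 * cross / ((dx + dy) * X * Y)


def subsample(site, fraction, rng):
    """Assumed undersampling scheme: keep a random fraction of individuals, without replacement."""
    pool = [sp for sp, n in site.items() for _ in range(n)]
    return Counter(rng.sample(pool, round(len(pool) * fraction)))


# Hypothetical community (an assumption for illustration): 5 common taxa with
# 200 individuals each and 30 rare taxa with 2 individuals each. Every simulated
# site shares this exact composition, so the true Whittaker beta is 1 and the
# true pairwise dissimilarity is 0 under both metrics.
community = {f"common{i}": 200 for i in range(5)}
community.update({f"rare{i}": 2 for i in range(30)})

rng = random.Random(42)
beta, sor, mh = [], [], []
for _ in range(200):
    sites = [subsample(community, 0.1, rng) for _ in range(4)]  # 4 sites, 10% each
    beta.append(whittaker_beta(sites) - whittaker_beta([community] * 4))
    a, b = sites[0], sites[1]
    sor.append(sorensen(a, b))  # deviation from the true value of 0
    mh.append(morisita_horn(a, b))

# Bias = mean deviation from the complete-data value; SD reflects (im)precision.
for name, values in [("Whittaker beta", beta), ("Sorensen", sor), ("Morisita-Horn", mh)]:
    print(f"{name}: bias {mean(values):+.3f}, SD {stdev(values):.3f}")
```

Under these assumptions one would expect the two incidence-based quantities to deviate clearly from their complete-data values while Morisita-Horn stays near zero, in line with point 3; the exact numbers depend on the hypothetical community and the random seed. Beta is computed over four sites because, for exactly two sites, Whittaker's beta equals 1 plus the Sørensen dissimilarity and would add no new information.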