Geographical sampling bias in a large distributional database and its effects on species richness–environment models

Authors

  • Wenjing Yang,

    1. State Key Laboratory of Vegetation and Environmental Change, Institute of Botany, Chinese Academy of Sciences, Beijing, China
    2. Biodiversity, Macroecology and Conservation Biogeography Group, Faculty of Forest Sciences and Forest Ecology, University of Göttingen, Göttingen, Germany
    3. Graduate University of Chinese Academy of Sciences, Beijing, China
  • Keping Ma,

    Corresponding author
    • State Key Laboratory of Vegetation and Environmental Change, Institute of Botany, Chinese Academy of Sciences, Beijing, China
  • Holger Kreft

    Corresponding author
    1. Biodiversity, Macroecology and Conservation Biogeography Group, Faculty of Forest Sciences and Forest Ecology, University of Göttingen, Göttingen, Germany
    2. State Key Laboratory of Vegetation and Environmental Change, Institute of Botany, Chinese Academy of Sciences, Beijing, China

Correspondence: Keping Ma, State Key Laboratory of Vegetation and Environmental Change, Institute of Botany, Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, 100093 Beijing, China.

E-mail: kpma@ibcas.ac.cn

Holger Kreft, Biodiversity, Macroecology and Conservation Biogeography Group, Faculty of Forest Sciences and Forest Ecology, University of Göttingen, Büsgenweg 1, 37077 Göttingen, Germany.

E-mail: hkreft@uni-goettingen.de

Abstract

Aim

Recent advances in the availability of species distributional and high-resolution environmental data have facilitated the investigation of species richness–environment relationships. However, even exhaustive distributional databases are prone to geographical sampling bias. We aim to quantify the inventory incompleteness of vascular plant data across 2377 Chinese counties and to test whether inventory incompleteness affects the analysis of richness–environment relationships and spatial predictions of species richness.

Location

China.

Methods

We used the most comprehensive database of Chinese vascular plants, which includes county-level occurrences for 29,012 native species derived from 4,236,768 specimen and literature records. For each county, we computed smoothed species accumulation curves and used the mean slope of the last 10% of the curves as a proxy for inventory incompleteness. We created a series of data subsets with different levels of inventory incompleteness by excluding successively more under-sampled counties from the full data set. We then applied spatial and non-spatial regression models to each of these subsets to investigate relationships between the species richness of subsets and environmental factors, and to predict spatial patterns of vascular plant species richness in China.
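The incompleteness proxy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the number of random orderings, the use of individual records as the sampling unit, and the slope threshold in the usage note are all assumptions made for illustration. The curve is smoothed by averaging over random orderings of a county's records, and the proxy is the mean slope over the last 10% of that curve.

```python
import random


def smoothed_accumulation_curve(records, n_permutations=100, seed=0):
    """Mean number of distinct species after 1..n records.

    `records` is a list of species identifiers, one per record, for a
    single county. The curve is averaged over random record orderings
    (the number of orderings is an assumed value).
    """
    rng = random.Random(seed)
    shuffled = list(records)
    totals = [0.0] * len(shuffled)
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        seen = set()
        for i, species in enumerate(shuffled):
            seen.add(species)
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


def final_slope(curve, fraction=0.10):
    """Mean slope (new species per record) over the last `fraction` of the curve."""
    n = len(curve)
    if n < 2:
        raise ValueError("need at least two points to compute a slope")
    # Number of steps in the final segment, at least 1 and at most n - 1.
    k = min(max(1, round(n * fraction)), n - 1)
    # The mean of the k consecutive step slopes equals rise over run.
    return (curve[-1] - curve[-1 - k]) / k


def subset_by_completeness(slopes, max_slope):
    """Keep counties whose final slope does not exceed `max_slope`."""
    return {county: s for county, s in slopes.items() if s <= max_slope}
```

A slope near 0 means additional records rarely add new species (a well-sampled county), while a slope near 1 means almost every new record is a new species. Calling `subset_by_completeness` with successively lower, hypothetical thresholds would produce nested subsets of increasingly well-sampled counties, analogous to the series of subsets described above.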

Results

Log10-transformed numbers of records and documented species were strongly correlated (correlation coefficient = 0.97). In total, 91% of Chinese counties were identified as under-sampled. After controlling for inventory incompleteness, the overall explanatory power of environmental factors markedly increased, and the strongest predictor of species richness switched from elevational range to annual wet days. Environmental models calibrated with more complete inventories yielded better spatial predictions of species richness.

Main conclusions

Our results indicate that inventory incompleteness strongly affects the explanatory power of environmental factors, the main determinants of species richness obtained from regression analyses, and the reliability of environment-based spatial predictions of species richness. We conclude that even large distributional databases are prone to geographical sampling bias, with far-reaching implications for the perception of and inferences about macroecological patterns.
