Recent advances in the availability of species distributional and high-resolution environmental data have facilitated the investigation of species richness–environment relationships. However, even exhaustive distributional databases are prone to geographical sampling bias. We aim to quantify the inventory incompleteness of vascular plant data across 2377 Chinese counties and to test whether inventory incompleteness affects the analysis of richness–environment relationships and spatial predictions of species richness.
We used the most comprehensive database of Chinese vascular plants, which includes county-level occurrences for 29,012 native species derived from 4,236,768 specimen and literature records. For each county, we computed a smoothed species accumulation curve and used the mean slope of its last 10% as a proxy for inventory incompleteness. We created a series of nested data subsets with decreasing levels of inventory incompleteness by successively excluding the most under-sampled counties from the full data set. We then fitted spatial and non-spatial regression models to each subset to relate species richness to environmental factors and to predict spatial patterns of vascular plant species richness across China.
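The incompleteness metric and the subsetting step described above can be sketched in Python. The abstract does not state how the accumulation curves were smoothed. This sketch assumes smoothing by averaging over all possible record orders, computed analytically with Hurlbert's rarefaction formula and with individual records as sampling units. The function names (`expected_richness`, `tail_slope`, `nested_subsets`), the toy county data and the slope thresholds are all hypothetical, and the regression step is not shown.

```python
import math
from collections import Counter


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via lgamma for stability."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def expected_richness(counts: list[int], k: int) -> float:
    """Expected number of species among k records drawn without replacement.

    Assumption: the 'smoothed' accumulation curve is the rarefaction curve,
    i.e. the species accumulation curve averaged over all record orders.
    """
    n = sum(counts)
    total = 0.0
    for n_i in counts:
        if n - n_i < k:
            # Every subsample of size k must contain at least one record of this species.
            total += 1.0
        else:
            # Probability that all k drawn records belong to other species.
            p_absent = math.exp(log_comb(n - n_i, k) - log_comb(n, k))
            total += 1.0 - p_absent
    return total


def tail_slope(records: list[str], tail: float = 0.10) -> float:
    """Mean slope (new species per extra record) over the last `tail` of the curve.

    Larger values indicate a less complete inventory.
    """
    counts = list(Counter(records).values())
    n = len(records)
    m = max(1, math.ceil(tail * n))
    # The mean of the step-wise increments over the last m records telescopes
    # to the rise in expected richness over that stretch divided by m.
    return (expected_richness(counts, n) - expected_richness(counts, n - m)) / m


def nested_subsets(slopes: dict[str, float], thresholds: list[float]) -> dict[float, list[str]]:
    """For each (hypothetical) threshold, keep counties whose slope is at most that value.

    Lowering the threshold excludes successively more under-sampled counties,
    so the resulting subsets are nested.
    """
    return {
        t: sorted(c for c, s in slopes.items() if s <= t)
        for t in sorted(thresholds, reverse=True)
    }


# Toy example: each list holds the species name of every record in a county (hypothetical data).
county_records = {
    "A": ["sp1", "sp2", "sp1", "sp3", "sp1", "sp2", "sp1", "sp2", "sp1", "sp3"],
    "B": ["sp1", "sp2", "sp3", "sp4", "sp5", "sp6", "sp7", "sp8", "sp9", "sp1"],
}
slopes = {c: tail_slope(r) for c, r in county_records.items()}
subsets = nested_subsets(slopes, thresholds=[1.0, 0.5])
print(slopes)   # county A levels off (slope 0); county B is still rising (slope about 0.8)
print(subsets)
```

When the tail covers only the final record, as in the toy data with 10 records, the slope reduces to the number of singleton species divided by the number of records. County B, with 8 singletons among 10 records, is therefore flagged as strongly under-sampled and drops out of the stricter subset.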
Log10-transformed numbers of records and documented species were strongly correlated (r = 0.97). In total, 91% of Chinese counties were identified as under-sampled. After controlling for inventory incompleteness, the overall explanatory power of environmental factors increased markedly, and the strongest predictor of species richness switched from elevational range to the annual number of wet days. Environmental models calibrated with more complete inventories yielded better spatial predictions of species richness.
Our results indicate that inventory incompleteness strongly affects the explanatory power of environmental factors, the identity of the main determinants of species richness inferred from regression analyses, and the reliability of environment-based spatial predictions of species richness. We conclude that even large distributional databases are prone to geographical sampling bias, with far-reaching implications for how macroecological patterns are perceived and what inferences are drawn from them.