Factors influencing the precision of species richness estimation in Japanese vascular plants

Estimating species richness from a series of samples is an important and widely debated issue in ecology and biodiversity conservation. Numerous tests of respective richness estimators gave insights into the precision, the limitations and the pitfalls of richness forecasting. However, few benchmark tests used almost complete empiric census data obtained at those spatial scales where richness estimation is most useful for conservation management.

Estimators of species richness can be classified into four groups (Chao & Chiu, 2016). First, nonparametric asymptotic estimators are often based on random sampling assuming a Poisson species spatial distribution (Chao, Colwell, Lin, & Gotelli, 2009). Second, parametric non-asymptotic rarefaction estimators extrapolate species richness of standardized samples within a common finite sample size towards larger sample sizes (Colwell, Chao, & Gotelli, 2012). Third, species accumulation curves and species distribution models extrapolate richness data for observed samples or areas towards the focal sample size or area (Scheiner, 2003;Shen & He, 2008;Ugland, Gray, & Ellingsen, 2003). These approaches also include spatially explicit ecological drift (Hubbell, 2001) and maximum entropy models (Harte & Newman, 2014). Fourth, Dewdney (1998) and Ulrich and Ollik (2005) have advocated the use of species-abundance or species-occupancy distributions (below abbreviated as SAD) obtained from finite samples to estimate richness of larger samples.
Tests of richness estimators were mainly based on simulation studies where samples were taken from idealized communities spread over many sample sites (e.g. Chao & Chiu, 2015). These tests provided insights into the behaviour of estimators, returned recommendations for estimator choice (Kunin et al., 2018;Walther & Moore, 2005) and demonstrated the comparable behaviour of different estimators that are based on the same theoretical assumptions (Chao & Chiu, 2015;Gwinn, Allen, Bonvechio, Hoyer, & Beesley, 2016). They also pointed to possible pitfalls. Particularly, Gwinn et al. (2016) demonstrated the sensitivity of all available nonparametric estimators to sample size and the shape of the underlying specie abundance distributions.
These tests, however, have been performed on small communities with low total species richness and their results cannot be extended in a straightforward way to systems with large spatial extent and with a large number of species. It is also not clear whether methods proposed for local scale sampling are also applicable at regional or even global scales (Fattorini, 2013). However, most environmental studies and also conservation planning use sample plots to predict ecological patterns at larger areas or even globally (Chao, Colwell, Gotelli, & Thorn, 2019;Chiarucci, Bacaro, & Scheiner, 2011). An exception is the recent comparative study by Kunin et al. (2018), who used fully censused plant records from Southern England to evaluate whether richness estimators based on extrapolation techniques (species-area relationships and species occupancy distributions) and species distribution modelling (maximum entropy and ecological drift) are able to forecast the true regional plant richness and the respective species accumulation with increasing space. However, all the methods tested by Kunin et al. (2018) required information on either small-scale species accumulation curves or species occupancy distributions that are not available in most ecological and conservation studies. Consequently, these latter studies still rely on traditional parametric and nonparametric estimators based on specific probability distributions (reviewed in Chao & Chiu, 2016). These estimators still have to be tested with large-scale empirical data.
Common richness estimators rely on the assumption that the distribution of abundances and the pattern of species spatial distribution are sufficiently scale invariant to meet the assumptions of the underlying model (Brose, Martinez, & Williams, 2003). For instance, fractal spatial species distributions would generate richness accumulation curves that follow a power function (Šizling & Storch, 2004), thus allowing its use for precise richness estimation.
However, many studies revealed that patterns of species spatial aggregation (e.g. Kunin, 1998;Lennon, Koleff, Greenwood, & Gaston, 2001) and community evenness (e.g. Brose et al., 2003) change with increasing sample area. These changes appeared to be taxon and biome-specific, making it challenging to apply universal correctors for richness estimators.
Without knowledge of the scaling of species spatial distributions, extrapolations of species richness become increasingly uncertain, especially when based on very small sample sizes (Chiarucci et al., 2003). The scaling of the spatial distribution of species and their abundances is highly dependent on environmental factors (Mertes & Jetz, 2018). Consequently, we might expect that the performance of richness estimators is also influenced by these factors. A possible yet largely unexplored way to improve richness estimators would be to use environmental characteristics that determine community composition (defined by species identities) in a predictable way. Often, environmental variables are easier to assess than community patterns. However, respective studies on environmental constraints on richness estimation are largely missing. For instance, Beukema and Dekker (2012) found that species accumulation curves changed in shape along environmental gradients from near-shore to off-shore intertidal areas due to shifts in the relative abundances of rare and common species, allowing reliable richness estimates only for large aggregated samples.
Here, we try to fill this gap in our knowledge using the distributional data (presence-absence) of Japanese plant species as a model system. This dataset contains complete information on the occurrence of each Japanese plant species at the 10 × 10 km grid cell level (Kusumoto, Villalobos, Shiono, & Kubota, 2019). Thus, our data do not rely on samples, but on complete censuses. Using a fully nested sampling of these cells, we link the precision and the relative error of four common richness estimators to geographical position, climate and geographical variability, as well as to the pattern of species spatial turnover and evenness. We ask whether and how (a) geographical position and altitude influence the accuracy of richness estimation, (b) the variability in climate variables, elevation and forest cover affects estimator performance and (c) evenness in occurrence and the spatial species turnover among cells influence estimation accuracy. For each of the four categories of estimators cited above (i.e. nonparametric asymptotic, parametric non-asymptotic rarefaction, species accumulation and SAD-based models), we selected that estimator known to perform best. We test them against our data to infer which estimator is best suited in a given large-scale, empirical situation.

| Study area
The East Asian islands mainly including the Japanese and Ryukyu archipelagos are located off the eastern coast of the Eurasian continent. The mean annual temperature ranges from −5.3 to 24.2°C.
The annual precipitation ranges from 650 to 4,538 mm. Such a wide range of climate form diverse biomes across hemiboreal, temperate and subtropical vegetation.

| Species distribution data
Occurrence records for vascular plant species across Japan were compiled by searching the botanical literature on the flora of Japan (Kubota, Kusumoto, Shiono, & Tanaka, 2017;Kubota, Shiono, & Kusumoto, 2015). Most of the references were based on specimen records, local species checklists, expert species range maps and vegetation census records (phytosociological tables). We then georeferenced occurrences to latitudinal and longitudinal coordinates. Based on the resulting set of species occurrence data, we built a geographical distribution database for 5,614 species at the 10 × 10 km 2 grid cell level (4,852 cells). Therefore, our study is based on almost complete species census data at the grid level minimizing both commission (false presence) and omission (false absence) errors. Species names followed the Japanese Scientific Names Index (Yonekura & Kajita, 2003). In this analysis, exotic species including planted species were excluded from the dataset (Kusumoto et al., 2019). Kubota et al. (2015), Kubota et al. (2017) provided a more detailed description of how this dataset was created.

| Environmental data
For each of the 4,852 Japanese grid cells, we compiled an environmental dataset including temperature and precipitation seasonality (i.e. the difference between the annual maximum and minimum values), forest area (km 2 ), land and forest area, and the variability in forest cover and elevation within each grid cell. Temperature and precipitation data refer to the period between 1971 and 2000, and were extracted from the 1-km gridded data in the Mesh Climate Data 2000 (Okada, Iizumi, Nishimori, & Yokozawa, 2008).
We split the 4,852 grid cells into 46 windows of 100 geographically adjacent cells each (excluding cells that covered more than 50% sea area). We excluded another 252 cells that were either isolated (violating the nested design) or lacked plant records (inflating zero counts). We calculated for each of these windows four estimators described below based on 10, 20, 30, … 90, 99 cells (in a fully nested design) resulting in a total of 460 estimates.
The Chao2 estimator is given by.
where S obs is the observed richness, N the number of sample units, and Q 1 and Q 2 are the numbers of species that occurred exactly in one and in two of the sample units (Chao & Chiu, 2016).

2008) comes from
where K is the total number of sample units (Chao & Chiu, 2016).
Among various species accumulation models, species-area relationships (SARs) are most often applied in ecology and biodiversity conservation. There is a long-standing debate whether and in which situation SARs are better described by the power function or other models, such as the logarithmic function (e.g. Connor & McCoy, 1979;Triantis, Guilhaumon, & Whittaker, 2012). Most work favoured power functions (Rosenzweig, 1995;Dengler et al., 2020). Both the power function and the logarithmic model might be able to provide upper and lower limits of richness, but the logarithmic function frequently underestimates richness when extrapolated to larger areas (Dengler, 2009). The estimate based on the power function speciesarea relationship (Rosenzweig, 1995) is given by where S 0 , the richness at unit area, and z, the slope value, are the parameters inferred from the observed range of area. Here, we extrapolated eq. 4 using its linearized version Accordingly, extrapolations from the logarithmic SAR were based on where both estimates for the intercept S 0 and the slope z came from ordinary least squares regressions for the richness data within the observed range of areas.
Recent meta-analyses (Ulrich, Nakadai, Matthews, & Kubota, 2018;Ulrich, Ollik, & Ugland, 2010) and estimator comparisons (Kunin et al., 2018) have corroborated the view that the vast majority of observed species abundances distributions fall within boundaries set by the log-series (Fisher, Corbet, & Williams, 1943) and the lognormal distribution (Preston, 1948). When applied to a finite area both distributions allow for an assessment of total richness. Therefore, we used the method of Ulrich and Ollik (2005) to obtain an estimate of the upper limit of species richness from the observed distribution of species occurrences (cf. Kunin et al., 2018). Under the assumption that the observed species rank occurrence distribution follows a log-series, the upper limit of expected richness S LS is given by The lower limit of richness comes from the assumption that the

SAD follows a lognormal distribution
where Int is the intercept and slope the slope of the linear regression for the observed log-occurrence-species rank order distribution (SAD).
N obs and N total are the observed number of sample plots used for constructing the SAD and the total number of plots, respectively.
We assessed the relative error (relative bias) and the precision from and where S est and S total are the estimated and total richness, respectively.
We assessed the dependence of relative bias on sampling fraction (the proportion of cells sampled) by the coefficient of correlation r bias between relative bias and ln-transformed sampling fraction.
Below, species coverage refers to the proportion of species sampled with respect to the total (the empirical) number in the focal window.
For each sample, we assessed the total area sampled, the proportions of area sampled (the sampling fraction) and of forest cover, and the averaged values of the aforementioned variability in forest cover, elevation, precipitation and temperature. Additionally, we calculated for each window the degree of species spatial segregation from the C-score (Stone & Roberts, 1990), which is a normalized count of the number of pairwise checkerboard occurrences summed over all species pairs. As raw metrics of co-occurrences are biased by matrix size and fill (Ulrich et al., 2018), we use a null model approach and compared observed scores with those obtained from an equiprobable reshuffling of the occurrences of each species among the cells of each window. This null model retained for each window the observed species composition but randomized the occurrences of each species among the samples. The motivation behind this random assumption was that there is no a priori reason to constrain the occurrences of a focal species towards certain cells within a window of generally similar climatic conditions. We used standardized effect where CS obs is the observed score and CS exp and σ exp are the mean and the standard deviation of the null distribution, respectively) to relate the degree of spatial segregation to the error in richness estimation. High values of CS obs and SES point to significant species spatial turnover among cells. Finally, we calculated for each window and each sample the degree of evenness in the number of occurrences from E = H lnS total ; where H denotes the Shannon diversity and ln S total the log-transformed total number of species in the focal window (sample).
We used a general linear model (GLM) to link the relative bias for each sample size and window to environmental variables, the degree of species spatial segregation and evenness. We note that this nested sampling design caused spatial non-independence of single samples (10 to 99 grid cells) that might affect significance testing. To account for this non-independence, we added the dominant eigenvector (PCA1) of the geographical Euclidean distance matrix of the single cells as a covariate in the GLM (reviewed by Hawkins, 2012).
Additionally, we the log-transformed sampling fraction entered the models. Coverage increased in a nested manner and accounted for the richness increase from the area effect only. Prior to analyses, we calculated pairwise Pearson's correlations between predictor variables. These were at most moderately correlated (r < .40) except for PCA1 and temperature seasonality (r = .88). Excluding PCA1 from the analysis did not change the remaining results qualitatively.

| RE SULTS
Japan shows a strong latitudinal gradient in species richness above 35° north (Figure 1a). The proportion of species detected increased logarithmically with the proportion of area sampled (the sampling fraction, Figure 1b). However, this proportion-area relationship was independent from total richness (Figure 1c).
When more than 40% of area was sampled, 93.1% of the Chao2 and 94.1% of the rarefaction estimates had errors of <10% (Figure 2a,b), with a tendency of richness underestimation. Chao2 and rarefaction behaved very similar in all comparisons (Figure 2, Tables 1 and 2) and significantly underestimated true richness below 40% sampling fraction (r bias = .84). The power function SAR approach (powSAR) tended to overestimate richness at lower sampling fraction (r bias = .81, Figure 2c) and was generally less precise than Chao2 and rarefaction but also less affected by sampling fraction (Figure 2c). Only 53.3% of the powSAR estimates were within the 10% error level. In turn, the logarithmic SAR (logSAR) estimates were always below those of the powSAR and tended to underestimate true richness below 40% sampling fraction (r bias = −0.39, Figure 2c). 76.3% of the logSAR estimates were within the 10% error level without a stronger relative at lower sampling fraction (r bias = .22). The SAD approach also provides upper and lower limits in richness. Indeed, the S LS estimator consistently overestimated observed richness and behaved similarly to the powSAR approach ( Figure 2d). Only 30.9% of the estimates were within the 10% error level (Figure 2d). The S LN provided robust estimates over the whole range of sampling fraction (r bias = .04), with 67.4% of estimates ranging within the 10% error boundaries (Figure 2d). S LN behaved similar to the logSAR approach (Figure 2c,d) and was slightly positively biased at low sampling fraction (r bias = .13).
Because the SAR and the SAD estimators were designed to provide lower and upper limits to richness, average values might reduce the relative bias and consequently the precision of the estimates (Figure 2e, f). This was indeed the case although the SAD approach still tended to overestimate richness irrespective of sampling fraction F I G U R E 1 (a) Above 35°N, Japanese tree species richness strongly declines with increasing latitude. Data from the 46 grid windows only. (b) The species coverage (p S = observed richness/total richness) in dependence on the sampling fraction (p A ). Data from 4,600 10 × 10 km 2 grid cells. The increase in sampling fraction is well described by a logarithmic function: fra = 0.15 ln(p A ) + 1; OLS fit: r 2 = .84). (c) The increase in sampling fraction with increasing area was independent from total richness in each window (r 2 < .01) performed as well as the Chao2 and rarefaction approaches when relaxing the error level to 20% (86.5% and 83.0%, respectively, of estimates within the 20% error level) but were not significantly biased by sampling fraction (Figure 2, Table 1).
We found a weak increase in precision of the studied estimators (except of powSAR with increasing latitude (Table 1), which explained between 3% and 7% of the variance in precision, respectively.
Further, the pattern of tree species co-occurrences, as quantified by the degree of species spatial segregation, did not markedly affect the Chao2, the rarefaction and both SAR estimators (Table 1). Higher species spatial turnover (positive SES CS ) increased the precision of S LS and S LN . Evenness did not affect the precision of the Chao2, the rarefaction and the SAR estimators, while high evenness negatively affected the precision of S LS and S LN (Table 1). In turn, S LS and S LN were least influenced by the proportion of area sampled (Table 1) and provided unbiased estimated even at low cell coverage (Figure 2d).
Relative bias and precision of the four estimators were largely unaffected by environmental variability (Table 2, Figure 3). After accounting for cell coverage and spatial distribution, elevation variability, precipitation and temperature seasonality, the variability in forest cover explained at most 3% of the variance in the Chao2, rarefaction, and the SAR estimators (Table 2). Exception were the S LS and S LN estimators that turned out to be the least biased at intermediate levels of variability in elevation (Table 2, Figure 3d) and forest cover (Table 2, Figure 3l).

| D ISCUSS I ON
Our study focused on two important aspects of species richness forecasting. First, we tested the performance of four types of estimators against empirical large spatial scale data. Second, we assessed how precision and relative bias of these estimators were influenced by environmental variables and community composition. Importantly, our study is the first that uses almost complete census data on species occurrences at a large spatial scale. These data enable a precise analysis of the behaviour of each estimator at different fractions of area covered.
Our first task largely confirmed prior tests based on simulation studies (e.g. Chao & Chiu, 2015) and empirical data at small and regional scale (Herzog et al., 2002;Walther & Moore, 2005). The TA B L E 1 General linear models of richness coverage (= observed richness/ total richness) and precision in richness estimation in relation to longitude, latitude, the standardized effect sizes of species co-occurrences (SES CS ), and the degree of evenness  nonparametric Chao2 estimator and rarefaction performed well only at high sampling fraction (Figure 2), being negatively biased at low sample sizes. At higher sampling fraction (say above 40%), rarefaction was superior to Chao2 because all estimates ranged within the 10% error boundary (Figure 2b) without overestimating richness. At lower coverage, Chao2 and rarefaction performed nearly identical and underestimated true richness (even by 60%). Nevertheless, even at low sampling fraction both estimators performed not worse or even better than the parametric SAR and SAD estimators (Figure 2). This is a promising result obtained from large-scale empirical data meaning that we have very different approaches to diversity forecasting that provide reliable results independent of the available data structure. For instance, in many cases available data contain only species lists or information on total abundances and do not allow for the construction of rarefaction or species accumulation curves, while Chao2 or SAD provides alternatives.
The SAR and SAD estimators were designed to assess lower and  Figure 2). The log-series is a sample distribution (Fisher et al., 1943) and implies fairly constant proportions of species along the occupancy axis. This distribution is predicted from ecological drift (Hubbell, 2001) and a variety of niche-based SAD models (Tokeshi, 1998). In turn, recent meta-analyses on global plant species abundance distributions pointed to a higher proportion of lognormal type SADs with a lot of rare species (Ulrich et al., 2016a;Ulrich et al., 2016, Ulrich et al., 2018. are not dispersal limited (Green & Plotkin, 2007;Hubbell, 2001).
Therefore, our finding of a better fit of regional lognormal abundance distributions corroborates the view that local plant communities at least at higher latitudes are dispersal limited and lognormally With respect to the SAR estimators, we found a consistently better estimate of the logarithmic SAR (Figure 2). The power function SAR constantly tended to overestimate richness, which indicates deviations from the power function species accumulation irrespective of spatial scale (Figure 2). This finding stands in certain contrast to recent analyses of richness patterns that clearly pointed to a prevalence of power function SARs at local (Dengler et al., 2020) and regional scales (Dengler, 2009). However, direct comparisons of SAR model fits do not exclude spatial scale dependent deviations from the model. With respect to extrapolations, even small (and possibly statistically insignificant) deviations might cause imprecise richness estimates. Therefore, observed positive relative biases in richness estimation demonstrate that the accumulation of species richness in natural systems at regional scales occurs slower than predicted by a power function. In this respect, Dengler (2009) and Dengler et al. (2020) reported that the power function SAR reliably predicts the increase in terrestrial vegetation over ten orders of magnitude while logarithmic SAR models proved inappropriate.
However, the data of Dengler et al. (2020) covered local samples of continuous vegetation. Rosindell and Cornell (2007) and Storch (2016) demonstrated that SARs can be triphasic with shallower slopes at regional scales. Consequently, extrapolating across the local-regional border would cause either an over-or an underestimation of richness depending on the proportion of data points from local and regional samples used to construct the SAR. This is the pattern reported here. In this respect, Storch, Keil, and Jetz (2012) found that local to continental SARs collapse into one common power function after area was rescaled by mean species range sizes. Unfortunately, for most datasets (including the present) such data are unavailable. Regarding our second main task, we first asked whether and how geographical and climate variability influences the accuracy of richness estimation. This was indeed the case. Although weak, we found positive latitudinal gradients in the precision of all estimators with the exception of powSAR (Table 1). This trend was apparent even after correcting for sampling fraction. Latitude is only a surrogate variable. We hypothesize that this latitudinal gradient is caused by respective latitudinal changes in community composition. Surprisingly, we found an increase in relatively infrequent species (those that occurred only in one or in two grid cells) at higher latitudes (Figure 4a Consequently, the proportion of rare species is relatively high. We also asked whether the variability in environmental variables affects estimator performance. This was only marginally the case (Table 2, Figure 3). After correcting for covariates (Table 2), only temperature variability appeared to markedly influence estimator performance (Table 2) explaining between 3% and 10% of variability in logSAR and the SAD estimators. Importantly, most robust were Chao2 and rarefaction indicating that the underlying theoretical models are indeed sufficiently independent of environmental covariates. Again, environmental variability might only indirectly influence estimator performance via respective changes in the spatial distribution of species and/or the distribution of abundances. Both distributions are known to change along gradients of environmental variability (Ackerly, Knight, Weiss, Barton, & Starmer, 2002;Pottier et al., 2013). The fact that this did not significantly influence estimator performance is an important and a strong argument in favour of the underlying theoretical assumptions of scale invariant species spatial distribution patterns.
Finally, we asked whether and how evenness in species incidence and the spatial species turnover among grid cells influence estimation accuracy. To our surprise, higher spatial species turnover was positively related only to the performance of the SAD estimators. We expected to see a signal for SAR as the SAR slope is directly linked to the decay in compositional similarity among communities (Soininen, McDonald, & Hillebrand, 2007). Low spatial species turnover might decrease the proportion of infrequent species in dispersal limited communities as predicted by neutral theory (Hubbell, 2001). Low proportions of infrequent species are equivalent to a higher evenness in species incidence. If evenness F I G U R E 4 Proportions of (a) infrequent (species with only single or double incidence among the 100 cell of each window) and (b) frequent (species occurring in all grid cells of each window) in dependence of latitude. Coefficients of determination refer to ordinary least squares linear regressions. Significance of the regression in (a) p(F 1,458 ) < .001 were higher than predicted by the log-series distribution, both SAD estimators would perform weak as species richness is also higher than the predicted upper limit estimated by the S LS estimator (Ulrich & Ollik, 2005). This is the pattern observed here (Table 2).

DATA AVA I L A B I L I T Y S TAT E M E N T
The raw data used in the present publication have been published by Kubota et al. (2015) and Kusumoto et al. (2015).