sPlotOpen – An environmentally balanced, open-access, global dataset of vegetation plots

Abstract
Although many vegetation plots have been recorded, most are not available to the global research community. A recent initiative, called 'sPlot', compiled the first global vegetation plot database, and continues to grow and curate it. The sPlot database, however, is extremely unbalanced spatially and environmentally, and is not open-access. Here, we address both these issues by (a) resampling the vegetation plots using several environmental variables as sampling strata and (b) securing permission from data holders of 105 local-to-regional datasets to openly release data. We thus present sPlotOpen, the largest open-access dataset of vegetation plots ever released. sPlotOpen can be used to explore global diversity at the plant community level, as ground truth data in remote sensing applications, or as a baseline for biodiversity monitoring.
Main types of variable contained: Vegetation plots (n = 95,104) recording cover or abundance of naturally co-occurring vascular plant species within delimited areas. sPlotOpen contains three partially overlapping resampled datasets (c. 50,000 plots each), to be used as replicates in global analyses. Besides geographical location, date, plot size, biome, elevation, slope, aspect, vegetation type, naturalness, coverage of various vegetation layers, and source dataset, plot-level data also include community-weighted means and variances of plant functional traits.
Software format: Three main matrices (.csv), relationally linked.

KEYWORDS
big data, biodiversity, biogeography, database, functional traits, macroecology, vascular plants, vegetation plots

| BACKGROUND & SUMMARY
Biodiversity is facing a global crisis. As many as 1 million species are currently threatened with extinction, the vast majority due to anthropogenic impacts such as land-use and climate change (IPBES, 2019;WWF, 2020). In addition, the rates of biodiversity homogenization and redistribution are accelerating (Fricke & Svenning, 2020;Lenoir et al., 2020;Staude et al., 2020). Biological assemblages are becoming progressively more similar to each other globally, as local and endemic species go extinct and are replaced by more widespread and competitive native or alien species (IPBES, 2019;Staude et al., 2020). Many terrestrial and marine species are also shifting their geographical distribution as a response to climate change. This has profound potential impacts on ecosystems and human health (Bonebrake et al., 2018;Pecl et al., 2017).
Plant communities are no exception to this biodiversity crisis (Cardinale et al., 2011;Lenoir et al., 2008;Staude et al., 2020). This is particularly worrying since terrestrial vegetation accounts for 80% (450 Gt C) of the living biomass on Earth (Bar-On et al., 2018). Given the central role of vegetation in ecosystem productivity, structure, stability and functioning (Cardinale et al., 2011), assessing biodiversity status and trends in plant communities is paramount for other kingdoms of life and human societies alike.
Monitoring trends in plant biodiversity requires adequate data across a range of spatio-temporal scales (Kühl et al., 2020;Pimm, 2021). Large independent collections of plant occurrence data do exist at the global or continental extent via the Botanical Information and Ecology Network (BIEN; Enquist et al., 2016), the Global Inventory of Floras and Traits (GIFT; Weigelt et al., 2020) or the Global Biodiversity Information Facility (GBIF; https://www.gbif.org/). However, these databases suffer from one or several of the following limitations: (a) imbalance towards tree species only; (b) lack of data on how individual plant species co-occur and interact locally to form plant communities; and (c) coarse spatial resolutions (e.g., one-degree grid cells), which preclude intersection with high resolution remote sensing data and the assessment of biodiversity trends at the plant community level (Boakes et al., 2010).
There is a long tradition among botanists and phytosociologists to record the cover or abundance of each plant species that occurs in a vegetation plot (here used as a synonym of 'relevé' or 'quadrat') of a given size (i.e., surface area) at a given time (e.g., Stebler & Schröter, 1892). Compared to presence-only data, vegetation-plot data present many advantages. As all visible plant species are recorded, plots contain information on which plant species do, and do not, co-occur in the same locality at a given moment in time (Chytrý et al., 2016). This is important for testing hypotheses related to biotic interactions among plant species. Vegetation-plot data also provide crucial information on where and when a species was absent, thereby improving predictions from current species distribution models (Phillips et al., 2009). Being spatially explicit, vegetation plots can be resurveyed through time to assess potential changes in plant species composition relative to a baseline (Perring et al., 2018;Staude et al., 2020;Steinbauer et al., 2018). As they normally contain information on the relative cover or abundance of each species, vegetation plots are also more appropriate for detecting biodiversity changes than data representing only the occurrence of individual species (Beck et al., 2018;Jandt et al., 2011).
Globally, however, vegetation-plot data are very fragmented, as they typically stem from a myriad of local research and survey projects (Bruelheide et al., 2019). These are fine-grained data (e.g., 1-10,000 m²) normally covering small spatial extents (e.g., 1-1,000 km²). With their disparate sampling protocols, standards and taxonomic resolutions, aggregating and harmonizing vegetation plot data proves extremely challenging (Bruelheide et al., 2018). It is not surprising, therefore, that these data are rarely used in global-scale research on the biodiversity of plant communities (Franklin et al., 2017;Wiser, 2016).
The sPlot initiative tries to close this data gap. It consolidates numerous local to regional vegetation-plot datasets to create a harmonized and comprehensive global database of georeferenced terrestrial plant species assemblages (Bruelheide et al., 2019).
Established in 2013, sPlot v3.0 currently contains more than 1.9 million vegetation plots, and is fully integrated with the TRY database (Kattge et al., 2020), from which it derives information on plant functional traits. The sPlot database is increasingly being used to study continental-to-global scale vegetation patterns (Cai et al., 2021;Testolin, Carmona, et al., 2021), such as the relative contribution of regional versus local factors to the global patterns of fern richness (Weigand et al., 2020), the mechanisms underlying the spread and abundance of native versus invasive tree species (van der Sande et al., 2020), and worldwide trait-environment relationships in plant communities (Bruelheide et al., 2018).
Yet, most of these data are not open-access. Here, we secured permission from data holders in the sPlot database to openly release a dataset composed of 95,104 vegetation plots. We selected the plots to be released using a replicated environmental stratification, in order to represent the entire environmental space covered by the sPlot database. This maximizes the benefits of releasing these data for a wide range of potential uses. The selected vegetation plots stem from 105 databases and span 114 countries (Figure 1). This resampled dataset (sPlotOpen, hereafter) is composed of: (a) plot-level information, including metadata and basic vegetation structure descriptors; (b) the vascular plant species composition of each vegetation plot, including species cover or abundance information when available; and (c) community-level functional information obtained by intersection with the TRY database (Kattge et al., 2020).
sPlotOpen is specifically designed for global macroecological studies, for example, the exploration of functional diversity patterns of communities with continental-to-global extent. We expect, however, that sPlotOpen might likewise prove useful to answer a range of different questions, related for instance to species co-occurrence patterns, the definition of species pools, the link between regional versus local determinants of species diversity, or the niche overlap between co-occurring species. Yet, data in sPlotOpen should not be considered as representative of the distribution of plant communities worldwide, especially when working at local spatial extents. This should be kept in mind for applications such as species distribution models (SDMs) or joint SDMs, whose results might be affected by the uneven geographical distribution of sPlotOpen's data. We refer the reader to the section 'Usage notes' for additional guidance on critical issues related, for instance, to incompletely sampled vegetation plots, varying plot size, and nested vegetation plots.

| Vegetation plot data sources
We started from the sPlot database v2.1 (created in October 2016), which contains 1,121,244 unique vegetation plots and 23,586,216 species records. Most of the data in sPlot refer to natural and seminatural vegetation, while vegetation shaped by intensive and repeated human interference, such as cropland or ruderal communities, is hardly represented. Data originate from 110 different vegetation-plot datasets of regional, national or continental extent, some of which stem from regional or continental initiatives (see Bruelheide et al., 2019, for more information). For instance: 48 vegetation-plot datasets derive from the European Vegetation Archive (EVA; Chytrý et al., 2016); three major African datasets derive from the Tropical African Vegetation Archive (TAVA); and multiple vegetation datasets in the USA and Australia derive from the VegBank (Peet, Lee, Boyle, et al., 2012;Peet, Lee, Jennings, et al., 2012) and TERN's AEKOS (Chabbi & Loescher, 2017).

| Resampling method
Data in the sPlot database are unevenly distributed across vegetation types and geographical regions (see Bruelheide et al., 2018).
Mid-latitude regions in developed countries (mostly Europe, the USA and Australia) are overrepresented in sPlot, while regions in the tropics and subtropics are underrepresented, which is a typical geographical bias in biodiversity data (see Lenoir et al., 2020;Lenoir & Svenning, 2015 for similar geographical bias in species redistribution). Such a geographical bias usually translates into an environmental bias with temperate climate usually more represented than tropical or Mediterranean climates. Unbalanced sampling effort in the environmental space is of particular concern for comparative macroecological studies (Bruelheide et al., 2018;Lenoir et al., 2010). To reduce this imbalance as much as possible, we performed a stratified resampling approach within the environmental space using several environmental variables available at global extent as sampling strata.

FIGURE 1
Top: global distribution of all vegetation plots contained in sPlotOpen (n = 95,104). Each colour represents a different source dataset (n = 105; different datasets might have the same colour). Bottom: spatial distribution of vegetation plot density for the environmentally balanced dataset selected by the first resampling iteration (n = 49,787). Densities are calculated in hexagonal cells with a spatial resolution of approximately 70,000 km². Map projection is Eckert IV.

First, we removed vegetation plots without geographical coordinates or with a location uncertainty higher than 3 km. We also removed vegetation plots identified by the respective data contributors as having been recorded in wetlands or in anthropogenic vegetation types, since these data were available only for a few geographical regions, mostly in Europe. This resulted in a total of 799,400 out of the initial set of 1,121,244 vegetation plots.
We then ran a global principal component analysis (PCA) on a matrix of all terrestrial grid cells at a spatial resolution of 2.5 arcmin (n = 8,384,404), based on 30 climatic and soil variables. For climate, we used the 19 bioclimatic variables from CHELSA (Climatologies at high resolution for the earth's land surface areas) v1.2 (Karger et al., 2017), as well as two other bioclimatic variables reflecting the growing-season length (growing degree days above 1 °C, GDD1, and above 5 °C, GDD5), which were derived from CHELSA's monthly temperatures as in Synes and Osborne (2011). In addition, we considered an index of aridity and a layer for potential evapotranspiration from the Consortium for Spatial Information (CGIAR-CSI, Trabucco & Zomer, 2010). For soil, we extracted seven variables from the SoilGrids database (Hengl et al., 2017), namely: (a) soil organic carbon content in the fine earth fraction; (b) cation exchange capacity; (c) pH; as well as the fractions of (d) coarse fragments; (e) sand; (f) silt; and (g) clay. The results of this PCA represent the full environmental space of all terrestrial habitats on Earth, irrespective of whether a grid cell hosted vegetation plots or not (Supporting Information Figure S1). We then subdivided the PCA ordination space, represented by the first two principal components (PC1-PC2), which accounted for 47 and 23%, respectively, of the total environmental variation in terrestrial grid cells, into a regular 100 × 100 grid. This PC1-PC2 two-dimensional space was subsequently used to balance our sampling effort across all PC1-PC2 grid cells for which vegetation plots were available. After excluding 42,878 vegetation plots for which no PC1 or PC2 values were available, due to missing data in the bioclimatic or soil variables, we projected the remaining 756,522 vegetation plots onto this PC1-PC2 grid. We finally calculated how many vegetation plots occurred in each PC1-PC2 grid cell (Figure 2).
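The gridding step can be illustrated with a minimal Python sketch. It assumes the PC1 and PC2 scores have already been computed elsewhere; the function names, the `bounds` argument and the toy scores are hypothetical and are not part of the sPlot workflow.

```python
from collections import Counter

def grid_cell(pc1, pc2, bounds, n_bins=100):
    """Return the (column, row) index of a point in a regular n_bins x n_bins grid.

    bounds = (pc1_min, pc1_max, pc2_min, pc2_max) is the extent of the PCA space.
    It is an assumed input here, e.g. taken from the scores of all terrestrial cells.
    """
    pc1_min, pc1_max, pc2_min, pc2_max = bounds
    col = int((pc1 - pc1_min) / (pc1_max - pc1_min) * n_bins)
    row = int((pc2 - pc2_min) / (pc2_max - pc2_min) * n_bins)
    # Points lying exactly on the upper edge fall into the last bin.
    return min(col, n_bins - 1), min(row, n_bins - 1)

def plots_per_cell(plot_scores, bounds, n_bins=100):
    """Count plots per occupied grid cell; plot_scores maps plot ID -> (PC1, PC2)."""
    return Counter(grid_cell(pc1, pc2, bounds, n_bins)
                   for pc1, pc2 in plot_scores.values())

# Toy example with made-up scores (not real sPlot data).
bounds = (-10.0, 10.0, -5.0, 5.0)
scores = {"p1": (-10.0, -5.0), "p2": (-9.95, -4.99),
          "p3": (10.0, 5.0), "p4": (0.0, 0.0)}
counts = plots_per_cell(scores, bounds)
```

Cells whose count exceeds the chosen threshold would then be candidates for resampling, while the others would be kept whole.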
FIGURE 2
Distribution of vegetation plots from sPlotOpen in the global environmental space based on a principal component analysis (PCA) using 30 climate and soil variables. Top: spatial distribution of PCA values across all terrestrial grid cells (n = 8,384,404, spatial grain = 2.5 arcmin). Bottom left: distribution of plots compared to the distribution of all terrestrial 2.5 arc-minute cells (grey background) in the PCA space. Only the plots in the environmentally balanced dataset selected in the first resampling iteration are shown (n = 49,787). The PCA space was divided into a 100 × 100 regular grid. The first and second PCA axes explained 47 and 23% of the total variance, respectively. Bottom right: geographical distribution of the vegetation plots contained in four randomly selected PCA grid cells.

In total, vegetation plots were available for 1,720 out of the 4,125 PC1-PC2 grid cells covered by the 8,384,404 terrestrial grid cells of the geographical space. We then resampled those PC1-PC2 grid cells (n = 858) with more than 50 vegetation plots, which is the median number of plots occurring across occupied grid cells in sPlot. This threshold of 50 vegetation plots represents a compromise between selecting a high number of plots, and keeping the resampled dataset as balanced as possible across the PC1-PC2 environmental space. To select these 50 vegetation plots we used the heterogeneity-constrained random resampling algorithm (Lengyel et al., 2011). This algorithm quantifies the variability in plant species composition among a set of vegetation plots by computing the mean and the variance of the Jaccard's dissimilarity index (Jaccard, 1912) between all possible pairs of vegetation plots.
More precisely, for a given PC1-PC2 grid cell containing more than 50 vegetation plots, we generated 1,000 random selections of 50 vegetation plots and ranked each selection according to the mean (ascending order) and variance (descending order) value of the Jaccard's dissimilarity index.
Ranks from both sortings were summed for each random selection, and the selection with the highest summed rank was considered to provide the most balanced/even representation of vegetation types within the focal grid cell. Where a grid cell contained fewer than 50 plots, we retained all of them. In this way, we reduced the imbalance towards over-sampled climate types while ensuring that the resampled dataset represents the entire environmental gradient covered by the original sPlot database. This approach optimizes the selection of a subset of vegetation plots that encompasses the highest variability in species composition while avoiding peculiar and rare communities, which may represent outliers. As such, our approach maximizes variability over representativeness within each grid cell.
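As an illustration only, the selection within a single grid cell could be sketched in Python as follows. The sketch assumes presence/absence data and follows the stated aim of heterogeneity-constrained random resampling, that is, favouring selections with high mean and low variance of pairwise Jaccard dissimilarity. The function names and the toy plots are hypothetical, and cells with exactly `k` plots are kept whole as a simplifying assumption.

```python
import random
from itertools import combinations
from statistics import mean, pvariance

def jaccard_dissimilarity(a, b):
    """1 - |A and B| / |A or B| for two sets of species names."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def hcr_select(plots, k=50, n_trials=1000, seed=0):
    """Pick k plots from `plots` (dict: plot ID -> set of species).

    Cells with at most k plots are returned whole (assumption of this sketch).
    """
    ids = sorted(plots)
    if len(ids) <= k:
        return ids
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        subset = rng.sample(ids, k)
        d = [jaccard_dissimilarity(plots[a], plots[b])
             for a, b in combinations(subset, 2)]
        trials.append((subset, mean(d), pvariance(d)))
    # Rank by mean in ascending order and by variance in descending order,
    # so a high rank means high mean dissimilarity and low variance.
    by_mean = sorted(range(n_trials), key=lambda i: trials[i][1])
    by_var = sorted(range(n_trials), key=lambda i: -trials[i][2])
    rank_sum = [0] * n_trials
    for rank, i in enumerate(by_mean):
        rank_sum[i] += rank
    for rank, i in enumerate(by_var):
        rank_sum[i] += rank
    best = max(range(n_trials), key=lambda i: rank_sum[i])
    return sorted(trials[best][0])

# Toy cell with six made-up plots, from which three are selected.
toy = {
    "a": {"oak", "beech"}, "b": {"oak", "beech"}, "c": {"pine", "heather"},
    "d": {"pine", "birch"}, "e": {"reed"}, "f": {"oak", "birch"},
}
chosen = hcr_select(toy, k=3, n_trials=200)
```

Fixing the seed makes a single run reproducible; using a different seed for each run would mimic repeating the procedure to obtain several resampled subsets.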
We repeated the whole resampling procedure three times to get three different environmentally balanced, resampled subsets of our vegetation plots. These three resampling iterations can therefore be used as separate replicates, albeit these are not completely independent, as the same plots might have been drawn in two or even three of the three resampling iterations. In addition, those plots located in PC1-PC2 grid cells with fewer than 50 vegetation plots are completely shared by all three iterations.

| Permission to release the data as open access
The resampling procedure resulted in 56,486, 56,501 and 56,494 vegetation plots selected during resampling iterations #1, #2 and #3, respectively, for a total of 107,238 unique vegetation plots. Since the sPlot database is a consortium of independent datasets whose copyright belongs to the data contributors, we used this preliminary potential selection to ask each dataset's custodian (i.e., either the owner of a dataset or its authorized representative in the case of a collective dataset) for permission to release the data of selected vegetation plots.
To mitigate the imbalance due to the exclusion of these confidential plots, we created a 'consensus' dataset. We started from resampling iteration #1, and replaced the 6,699 plots not granted as open access with plots selected in the second and third iterations, for which such permission could be granted ('reserve' plots, hereafter). We imposed the constraint that each candidate vegetation plot in the reserve pool should belong to the same environmental stratum, that is, the same PC1-PC2 grid cell, as the confidential vegetation plot, even though we acknowledge that this procedure does not maximize the variability in plant species composition of the replacement plots. Even after drawing from reserves, there were 3,150 plots that could not be replaced. These were distributed across 279 PC1-PC2 grid cells (16.2% of occupied cells), each cell having on average 11 irreplaceable plots (min. = 1, median = 5, max. = 50).
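The replacement logic can be sketched as follows. This is a hypothetical Python illustration of the constraint that a reserve plot must come from the same PC1-PC2 grid cell as the confidential plot it replaces; the function name, the argument names and the first-come order in which reserves are drawn are assumptions, not the authors' implementation.

```python
def build_consensus(iteration1, reserves, open_access, cell_of):
    """Replace confidential plots of iteration 1 with open reserve plots of the same cell.

    iteration1, reserves: lists of plot IDs; open_access: set of releasable IDs;
    cell_of: dict mapping plot ID -> PC1-PC2 grid cell. All names are hypothetical.
    """
    kept = [p for p in iteration1 if p in open_access]
    used = set(iteration1)
    # Pool of releasable reserve plots not already selected, grouped by grid cell.
    pool = {}
    for p in reserves:
        if p in open_access and p not in used:
            used.add(p)
            pool.setdefault(cell_of[p], []).append(p)
    consensus, unreplaced = list(kept), []
    for p in iteration1:
        if p in open_access:
            continue
        candidates = pool.get(cell_of[p], [])
        if candidates:
            consensus.append(candidates.pop(0))
        else:
            unreplaced.append(p)
    return consensus, unreplaced

# Toy example: plots b, c and g are confidential; only b and c have reserves.
cell_of = {"a": 1, "b": 1, "c": 2, "d": 1, "e": 2, "g": 3}
consensus, unreplaced = build_consensus(
    iteration1=["a", "b", "c", "g"],
    reserves=["d", "a", "e", "d"],
    open_access={"a", "d", "e"},
    cell_of=cell_of,
)
```

In this toy case, plot g has no releasable reserve in its cell and therefore stays unreplaced, which mirrors the irreplaceable plots reported above.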
Because missing values were particularly widespread in the species-trait matrix, we calculated community-weighted means using the gap-filled version of these traits we received from TRY (Kattge et al., 2020). Gap-filling was performed at the level of individual observations and relies on hierarchical Bayesian modelling (R package 'BHPMF' - Fazayeli et al., 2014;Schrodt et al., 2015) in R (R Core Team, 2020). This is a Bayesian machine learning approach, with no a priori assumptions, except for the data being missing completely at random. The algorithm 'learns' from the data, that is, if there was a phylogenetic signal in the data, this was used to fill the gaps but where no such signal was apparent, none was introduced.
After gap-filling, we log-transformed all gap-filled trait values (natural logarithm) and averaged each trait by taxon (i.e., at species or genus level). The gap-filling approach was run only for species having at least one trait observation (n = 21,854). Additional information on the gap-filling procedure is available in Bruelheide et al. (2019).
Community-weighted means (CWM) and variances (CWV) were calculated for every plant functional trait j and every vegetation plot k as follows (Enquist et al., 2015):
$$\mathrm{CWM}_{j,k} = \sum_{i=1}^{n_k} p_{i,k}\, t_{i,j}$$
$$\mathrm{CWV}_{j,k} = \sum_{i=1}^{n_k} p_{i,k}\, \left(t_{i,j} - \mathrm{CWM}_{j,k}\right)^2$$
where $n_k$ is the number of species with trait information in vegetation plot k, $p_{i,k}$ is the relative abundance of species i in vegetation plot k calculated as the species' fraction in cover or abundance of total cover or abundance, and $t_{i,j}$ is the mean value of species i for trait j.
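A minimal Python sketch of this calculation for one trait and one plot is given below. It takes the community-weighted mean as the abundance-weighted average of species trait values, and the community-weighted variance as the abundance-weighted squared deviation from that mean. Rescaling relative abundances over only those species that have trait information is an assumption of this sketch, and the function and variable names are hypothetical.

```python
def cwm_cwv(cover, trait):
    """Community-weighted mean and variance of one trait in one plot.

    cover: dict species -> cover or abundance in the plot;
    trait: dict species -> mean trait value (species without a value are skipped).
    """
    with_trait = {sp: c for sp, c in cover.items() if sp in trait}
    total = sum(with_trait.values())
    if total == 0:
        return None, None
    # Relative abundance p_ik, rescaled over the species that have trait data
    # (an assumption of this sketch).
    p = {sp: c / total for sp, c in with_trait.items()}
    cwm = sum(p[sp] * trait[sp] for sp in p)
    cwv = sum(p[sp] * (trait[sp] - cwm) ** 2 for sp in p)
    return cwm, cwv

# Toy plot: species C has no trait value and is ignored.
cwm, cwv = cwm_cwv({"A": 60, "B": 20, "C": 20}, {"A": 1.0, "B": 3.0})
```

With these toy values the relative abundances are 0.75 and 0.25, giving a mean of 1.5 and a variance of 0.75.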

| DATA RECORDS
sPlotOpen contains 95,104 unique vegetation plots from 105 constitutive datasets (Table 1).
Information on the size (surface area) of the vegetation survey is available for 67,022 plots, and ranges between 0.03 and 40,000 m² (mean = 377 m²; median = 100 m²). Specifically, sPlotOpen contains 12,894 plots with size smaller than 10 m², 25,742 with size 10-100 m², 24,750 plots with size 100-1,000 m² and 3,075 plots with size greater or equal to 1,000 m². Similarly, only for a minority of plots (n = 24,167) is information on the exact group of plants sampled in the field available (e.g., complete vegetation, only trees, only trees > 1 m height, and so on). However, as most data were collected using the phytosociological method, we deem it safe to assume that, unless otherwise specified, plots contain information on all vascular plants. We retained plots with incomplete vegetation, because they were mostly located in the tropics, that is, in areas where vegetation plots are particularly scarce otherwise. The number of vascular plant species per vegetation plot ranges between 1 (i.e., monospecific stands) and 271 species (mean = 20; median = 16).
By capping the number of vegetation plots in overrepresented environmental conditions, the resampling procedure described above strongly reduced the bias in the distribution of vegetation plots within the PC1-PC2 environmental space. Yet, due to the lack or scarcity of data from some geographical regions, like the tropics, there is some remaining imbalance in the spatial distribution of vegetation plots across geographical regions (Figure 1). This is evident when comparing the number of plots across continents. When considering the first resampling iteration only (n = 49,787), Europe is by far the best represented continent, with 15,920 vegetation plots.
The least represented continents are Africa and South America, with 3,709 and 5,498 vegetation plots, respectively. Some residual imbalance also remains when considering biomes (Figure 3). With the exception of the 'Temperate mid-latitudes' biome, which includes 14,100 vegetation plots, all other biomes contain between 1,558 ('Polar and subpolar zone') and 6,245 ('Subtropics with year-round rain') vegetation plots (Figure 3, left).
Despite this residual imbalance, all the Whittaker biomes are covered by sPlotOpen (Figure 3, right), and our resampling algorithm has resulted in a much more balanced dataset than many other global datasets that are available, such as GBIF.
About 40% of the 95,104 vegetation plots in sPlotOpen belong to forests (n = 38,282) and about 48% to non-forest vegetation (n = 45,735), with the remaining 11.7% of plots unassigned (n = 11,087).
When not directly done by data providers, the assignment of plots to forests and non-forests was based on multiple lines of evidence, including the plot-level information on the cover of the tree layer, as well as traits of species composing a plot, such as growth form and height. In short, a plot record was considered as forest if the cover of the tree layer, or alternatively, the sum of the (relative) cover of all tree taxa (scaled by the sum of all cover values, as a percentage), was greater than 25%. It was considered a non-forest record if the sum of relative cover of low-stature, non-tree and non-shrub taxa was greater than 90%. For an extensive explanation of this classification scheme, we refer the reader to Bruelheide et al. (2019). Even though the proportion of forest versus non-forest vegetation plots is relatively well balanced, the geographical distribution of vegetation plots belonging to different vegetation types is likely not balanced in the geographical space, as it depends on the idiosyncrasies of the constitutive datasets composing the sPlot database. For instance, the data from New Zealand only include plots collected in non-forest ecosystems, while data from Chile only refer to forests. We urge potential users to carefully read the section 'Usage notes' below and the description of each individual dataset in GIVD (Dengler et al., 2011), and to contact the custodians of each dataset for further information.
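The two thresholds can be expressed as a short Python sketch. It is a simplification of the scheme described in Bruelheide et al. (2019): the function and argument names are hypothetical, and testing the forest criterion first is an assumption of the sketch.

```python
def classify_plot(tree_cover=None, rel_tree_cover=None, rel_low_cover=None):
    """Return 'forest', 'non-forest' or None (unassigned). Inputs are percentages.

    tree_cover: cover of the tree layer, if recorded;
    rel_tree_cover: summed relative cover of all tree taxa, used as a fallback;
    rel_low_cover: summed relative cover of low-stature, non-tree, non-shrub taxa.
    """
    forest_evidence = tree_cover if tree_cover is not None else rel_tree_cover
    if forest_evidence is not None and forest_evidence > 25:
        return "forest"
    if rel_low_cover is not None and rel_low_cover > 90:
        return "non-forest"
    return None

labels = [
    classify_plot(tree_cover=60),                          # closed canopy
    classify_plot(rel_tree_cover=10, rel_low_cover=95),    # grassland-like plot
    classify_plot(rel_tree_cover=20, rel_low_cover=50),    # neither criterion met
]
```

Plots that meet neither criterion remain unassigned, which corresponds to the unassigned category reported above.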

| DATABASE ORGANIZATION
The environmentally balanced and open-access dataset sPlotOpen is organized into three main matrices, relationally linked through the key column 'PlotObservationID'.
The 'header' matrix contains plot-level information for the 95,104 vegetation plots, including: metadata (e.g., plot ID, data source, sampling date, geographical location, positional accuracy); sampling design information (e.g., the total surface area used during the vegetation survey); and a plot-level description of vegetation structure.
Turboveg is a program specifically designed to store, maintain and export vegetation plot data (https://www.synbiosys.alterra.nl/turboveg; Hennekens & Schaminée, 2001).
Finally, the object 'references' contains all the bibliographic references formatted according to the BibTeX standard. Each reference is tagged with a key corresponding to the fields 'DB_BIBTEXKEY' and 'BIBTEXKEY' in the metadata. We further provide an R function ('sPlotOpen_citation') to create reference lists, based on a selection of plots and/or datasets.
Except for the 'references' file (format .bib), all objects/matrices are provided in tab-delimited .txt files. All objects, including the 'sPlotOpen_citation' function, are also compiled inside a .RData object.
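As a sketch of how the relational link might be used, the following Python example joins a toy 'header' table to a toy species-composition table through 'PlotObservationID'. Only the key column name comes from the text; the other column names ('Country', 'Species', 'Cover') and the inline data are made up for illustration, and real files would be read from disk rather than from strings.

```python
import csv
import io

def read_tsv(text):
    """Parse tab-delimited text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

# Tiny stand-ins for the 'header' and species-composition matrices.
header_txt = "PlotObservationID\tCountry\n1\tChile\n2\tItaly\n"
species_txt = (
    "PlotObservationID\tSpecies\tCover\n"
    "1\tNothofagus pumilio\t80\n"
    "2\tQuercus ilex\t60\n"
    "2\tPistacia lentiscus\t15\n"
)

header = {row["PlotObservationID"]: row for row in read_tsv(header_txt)}
# Attach plot-level fields to every species record via the shared key.
joined = [{**header[rec["PlotObservationID"]], **rec}
          for rec in read_tsv(species_txt)]

# Species richness per plot, counted from the joined records.
richness = {}
for rec in joined:
    plot_id = rec["PlotObservationID"]
    richness[plot_id] = richness.get(plot_id, 0) + 1
```

The same key would allow plot-level functional information to be attached to each plot in the same way.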

| TECHNICAL VALIDATION
The original sPlot database has a nested structure and consists of several individual datasets, each validated and maintained by its respective dataset custodian. In many cases, individual datasets are also collections whose vegetation plots were provided by their respective owners (the person who performed the actual vegetation survey) or by someone who digitized the original data from the scientific published or grey literature. We obviously have no direct control over the individual vegetation plots that we provide here in sPlotOpen. Yet, all these vegetation plots stem from trained professional botanists, or published scientific work, and are accompanied by detailed information on the sampling protocols used, thus ensuring data quality and reliability.
FIGURE 3
Distribution of vegetation plots in the first resampling iteration of sPlotOpen (n = 49,787) in the two-dimensional climatic space represented by mean annual temperature and mean annual precipitation. Left: plots are colour coded based on sBiomes, that is, sPlot's definition of biomes (Bruelheide et al., 2019), which derives from Schultz's (2005) ecozones, modified to include also the alpine biome from Körner et al. (2017). Right: the same plots superimposed onto Whittaker's biomes (Whittaker, 1975), as adapted by Ricklefs (2008) and plotted using the R package 'plotbiomes'.

TABLE 2
Description of the variables contained in the 'header' matrix, together with their range (if numeric) or possible levels (if nominal or binary) and the number of non-empty (i.e., non NA) records.

Before integration into the sPlot database, each dataset was further checked for consistency. If the dataset was in a different format, we converted it to a Turboveg 2 dataset (Hennekens & Schaminée, 2001), checked that it contained the required metadata information, and cross-checked that each plot was located within the geographical scope of its respective dataset.
All individual Turboveg 2 datasets were then integrated into a Turboveg 3 database, and exported to comma-separated files. Finally, we harmonized all the taxonomic names from all datasets, based on sPlot's taxonomic backbone (Purschke, 2017). This backbone matched all the taxonomic names (without nomenclatural authors) from all datasets in sPlot v2.1 and TRY v3.0 (Kattge et al., 2020) to their resolved version based on the Taxonomic Name Resolution Service web application (TNRS version 4.0; Boyle et al., 2013). This allowed us to (a) harmonize all datasets to a common nomenclature and (b) link the sPlot database to the TRY database (Kattge et al., 2020). The final backbone only retained matched taxonomic names at the rank of species or higher. Additional detail on the taxonomic resolution is reported in Bruelheide et al. (2019), while a description of the workflow, including R-code, is available in Purschke (2017).

| USAGE NOTES
The sPlotOpen database can be downloaded from https://doi.org/10.25829/idiv.3474-40-3292. A short vignette introducing the use of sPlotOpen in R can be found in Supporting Information Appendix S1. Users are urged to cite the original sources when using sPlotOpen in addition to the present paper (see Table 1). For two datasets (AF-00-009, AF-CD-001), the identification of taxa at species level is still in progress. Data on lichens and mosses, where available (e.g., dataset NA-GL-001), can be obtained on request from the respective dataset custodian or sPlot coordinator. As most of the constitutive datasets remain under continuous development, sPlotOpen users are encouraged to get in touch with the custodian(s) of the data they are planning to use (the updated list of custodian names is maintained on the sPlot website).
The use of sPlotOpen comes with a number of warnings. First, sPlotOpen was resampled in a way that maximizes the compositional variability of vegetation in different environmental conditions. As such, sPlotOpen should not be considered as representative of the spatial distribution of plant communities, especially when the focus has a local or regional spatial extent. Second, for most regions data were collected opportunistically, and without a randomized sampling design. This might lead to some vegetation types being oversampled in some regions, but undersampled in other regions, which might affect the output of species distribution models, especially at local or regional spatial extents. Third, not all plots were sampled using the same plot size, and some plots, mostly located in tropical regions, only contain data on woody species. This should be accounted for when exploring biodiversity patterns or comparing biodiversity indices (e.g., species richness, beta diversity) across plots or regions.
Finally, a small fraction of plots are nested subsets of larger plots.
Depending on the application, this might or might not represent a problem. Nested plots can be identified using the information in the 'metadata' matrix. The most appropriate way to deal with these issues depends on the problem being analysed. Users are, therefore, invited to carefully consider the limitations above when designing applications relying on sPlotOpen.
The data described here represent the subset of sPlot for which we were able to secure permission for making these data open.
Additional data from sPlot are available under sPlot's Governance and Data Property Rules (https://www.idiv.de/en/splot). Using the full sPlot dataset is also recommended if a stratification is desired that is different from the environmental factors used here, for example by geographical region or plot size.

ACKNOWLEDGMENTS
The authors are grateful to the thousands of vegetation scientists who sampled vegetation plots in the field or digitized them into regional, national or international databases. The authors also ...

Dengler et al. (2011). Biomes refer to Schultz (2005), modified to include also the world mountain regions (Körner et al., 2017). The column ESY refers to the European Nature Information System (EUNIS) Habitat Classification expert system (ESY, Chytrý et al., 2020).

This paper is dedicated to the memory of Dr. Ching-Feng (Woody) Li.