sPlot – A new tool for global vegetation analyses

Abstract

Aims: Vegetation-plot records provide information on the presence and cover or abundance of plants co-occurring in the same community. Vegetation-plot data are spread across research groups, environmental agencies and biodiversity research centers and, thus, are rarely accessible at continental or global scales. Here we present the sPlot database, which collates vegetation plots worldwide to allow for the exploration of global patterns in taxonomic, functional and phylogenetic diversity at the plant community level.

Results: sPlot version 2.1 contains records from 1,121,244 vegetation plots, which comprise 23,586,216 records of plant species and their relative cover or abundance in plots collected worldwide between 1885 and 2015. We complemented the information for each plot by retrieving climate and soil conditions and the biogeographic context (e.g., biomes) from external sources, and by calculating community-weighted means and variances of traits using gap-filled data from the global plant trait database TRY. Moreover, we created a phylogenetic tree for 50,167 out of the 54,519 species identified in the plots. We present the first maps of global patterns of community richness and community-weighted means of key traits.

Conclusions: The availability of vegetation plot data in sPlot offers new avenues for vegetation analysis at the global scale.


| INTRODUCTION
Studying global biodiversity patterns is at the core of macroecological research (Costello, Wilson, & Houlding, 2012; Kreft & Jetz, 2007; Wiens, 2011), since their exploration may provide insights into the ecological and evolutionary processes acting at different spatiotemporal scales (Ricklefs, 2004). The opportunities engendered by the compilation of large collections of biodiversity data into widely accessible global (GBIF, www.gbif.org) or continental databases (e.g., BIEN, www.bien.nceas.ucsb.edu/bien) have recently advanced our understanding of global biodiversity patterns, especially for vertebrates, but also for vascular plants (Butler et al., 2017; Engemann et al., 2016; Lamanna et al., 2014; Swenson et al., 2012). Although this development has led to the formulation of several macroecological theories (Currie et al., 2004; Pärtel, Bennett, & Zobel, 2016), a more mechanistic understanding of how assembly processes shape ecological communities, and consequently global biodiversity patterns, is still missing (Lessard, Belmaker, Myers, Chase, & Rahbek, 2012).
Understanding the links between biodiversity patterns and assembly processes requires fine-grain data on the co-occurrence of species in ecological communities, sampled across continental or global spatial extents (Beck et al., 2012; Wisz et al., 2013). For example, such co-occurrence data have been used to compare changes in vegetation composition over time spans of decades (Jandt, von Wehrden, & Bruelheide, 2011; Perring et al., 2018). Unfortunately, up to now information on fine-grain vegetation data has not been readily available, as most of the continental to global biodiversity datasets have been derived from occurrence data (i.e., presence-only data), and after being aggregated spatially, have a relatively coarse-grain scale (e.g., one-degree grid cells) without information on species co-occurrence at the meaningful scale of local communities (Boakes et al., 2010). In contrast, vegetation-plot data record the cover or abundance of each plant species that occurs in a plot of a given size at the date of the survey, representing the main reservoir of plant community data worldwide (Dengler et al., 2011).
Vegetation-plot data differ in fundamental ways from databases of occurrence records of individual species aggregated at the level of grid cells or regions of hundreds or thousands of square kilometers (Figure 1). First, vegetation plots usually provide information on the relative cover or relative abundance of species, allowing for the testing of central theories of biogeography, such as the abundance-range size relationship (Gaston & Curnutt, 1998) or the relationship between local abundance and niche breadth (Gaston et al., 2000).
Second, they contain information on which plant species co-occur in the same locality, which is a necessary precondition for direct biotic interactions among plant individuals. Third, unrecorded species can be considered truly absent from the aboveground vegetation at this scale because the standardized methodology of taking a vegetation record requires a systematic search for all species in a plot, or at least all species of the dominant functional group. Fourth, many plots are spatially explicit and can be resurveyed through time to assess possible consequences of land use and climate change (Perring et al., 2018; Steinbauer et al., 2018).
Fifth, vegetation plots represent a snapshot of the primary producers of a terrestrial ecosystem, which can be functionally linked to organisms from different trophic groups sampled in the same plots (e.g., multiple-taxa surveys) and related processes and services both below (e.g., decomposition, nutrient cycling) and above ground (e.g., herbivory, pollination) (e.g., Schuldt et al., 2018).

FIGURE 1 Conceptual figure visualizing how functional composition (in this case plant height) differs between calculations based on mean traits for grid cells and community data sampled in vegetation plots. Occurrence data (e.g., from distribution atlases, GBIF, etc.) can be used to calculate mean trait values in grid cells G1-G3. However, community-weighted means (CWMs) of traits differ across local plots (P1-P6), while the mean values of CWMs in the grid cells differ from the unweighted values calculated in the grid cells. This example is simplified by showing few species and few plots. In reality, differences are generally more pronounced.
Recently several projects at the regional to continental scale have demonstrated the potential of using vegetation-plot databases for exploring biodiversity patterns and the underlying assembly processes. Using vegetation data of French grasslands, Borgy et al. (2017) demonstrated that weighting leaf traits by species abundance in local communities is pivotal to capture leaf trait-environment relationships.
Analyzing United States forest assemblages surveyed at the community level, Šímová, Rueda, and Hawkins (2017) were able to relate cold or drought tolerance to leaf traits, dispersal traits and traits related to stem hydraulics. Using plot-based tree inventories of the United States Forest Service, Zhang, Niinemets, Sheffield, and Lichstein (2018) found that shifts in tree functional composition amplify the response of forest biomass to droughts. Based on >15,000 plots from a wide range of habitat types in Denmark, Moeslund et al. (2017) showed that typical plant species that are part of the site-specific species pool but are absent in a community tend to depend on mycorrhiza, are mostly adapted to low light and low nutrient levels, have poor dispersal abilities and are ruderals and stress-intolerant. By collating >40,000 vegetation plots sampled in European beech forests, Jiménez-Alfaro et al. (2018) found that current local community diversity and species pool sizes calculated at different scales were mainly explained by proximity to glacial refugia and current precipitation.
Although large collections of vegetation-plot data are now available from national to continental levels (e.g., Enquist, Condit, Peet, Schildhauer, & Thiers, 2016; Peet, Lee, Jennings, & Faber-Langendoen, 2012; Schaminée, Hennekens, Chytrý, & Rodwell, 2009; Schmidt et al., 2012), they are rarely used in global-scale biodiversity research (Franklin, Serra-Diaz, Syphard, & Regan, 2017; Wiser, 2016). This is unfortunate because vegetation-plot data may reveal important patterns that cannot be captured by grid-based datasets (Table 1). Functional composition patterns, for instance, may differ substantially when considering vegetation-plot data rather than single species occurrences aggregated at the level of coarse-grain grid cells. Using plant height as an illustration reveals that the trait means calculated on all the species occurring in a grid cell may differ strongly from the community-weighted means (CWMs) averaged across local communities (Figure 1). Nevertheless, only the grid-based approach has been used to date in studies of the geographic distribution of trait values (e.g., Swenson et al., 2012, 2017; Wright et al., 2017).
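The contrast between grid-based trait means and plot-based CWMs can be made concrete with a small numerical sketch. The species, heights and cover values below are invented for illustration and are not taken from sPlot; the two function names are likewise hypothetical.

```python
# Hypothetical plant heights (m) for three species; values are illustrative only.
height = {"tree": 20.0, "shrub": 2.0, "grass": 0.5}

# Two hypothetical plots in one grid cell: species -> relative cover (sums to 1).
plots = [
    {"tree": 0.8, "shrub": 0.1, "grass": 0.1},  # closed forest plot
    {"shrub": 0.1, "grass": 0.9},               # open grassland plot
]

def grid_cell_mean(plots, trait):
    """Unweighted mean trait over all species occurring anywhere in the cell."""
    species = set().union(*(p.keys() for p in plots))
    return sum(trait[s] for s in species) / len(species)

def mean_of_cwms(plots, trait):
    """Mean of the cover-weighted plot means (CWMs) within the cell."""
    cwms = [sum(cover * trait[s] for s, cover in p.items()) for p in plots]
    return sum(cwms) / len(cwms)

print(grid_cell_mean(plots, height))
print(mean_of_cwms(plots, height))
```

The two numbers differ because the grid-cell mean treats every species in the cell equally, whereas the plot-based value reflects which species actually dominate each community.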
Here, we present sPlot, a global database for compiling and integrating plant community data. We describe (a) main steps in integrating vegetation-plot data in a repository that provides taxonomic, functional and phylogenetic information on co-occurring plant species and links it to global environmental drivers; (b) principal sources and properties of the data and the procedure for data usage; and (c) expected impacts of the database in future ecological research. To illustrate the potential of sPlot we also show global diversity patterns that can be readily derived from the current content.

| Vegetation-plot data
TABLE 1 Types of information provided by single vegetation plots, vegetation plots aggregated within grid cells (or other geographic units) and single species occurrence records aggregated within grid cells. The three levels are illustrated in Figure 1. Column headings: Information | … at the local level | … at the regional level | … at the regional level

The sPlot consortium currently collates 110 vegetation-plot databases of regional, national or continental extent. Some of the databases have previously been aggregated by and contributed through two (sub-)continental database initiatives (Table 2 and Appendix S1). All data from Europe and nearby regions were contributed via the European Vegetation Archive (EVA), using the SynBioSys taxon database as a standard taxonomic backbone.
Three African databases were contributed via the Tropical African Vegetation Archive (TAVA). In addition, multiple U.S. databases were contributed through the VegBank archive maintained in support of the U.S. National Vegetation Classification (Peet, Lee, Boyle, et al., 2012; Peet, Lee, Jennings, & Faber-Langendoen, 2012). The data from other regions (South America, Asia) were contributed as separate databases.
We stored the vegetation-plot data from the individual databases in the database software TURBOVEG v2 (Hennekens & Schaminée, 2001

| Taxonomic standardization
To combine the species lists of the different databases in sPlot, we constructed a taxonomic backbone. To link co-occurrence information in sPlot with plant traits, we expanded this backbone to integrate plant names used in the TRY database (Kattge et al., 2011

We accepted all names that were matched, or converted from synonyms, with an overall match score of 1. In cases with no exact match (i.e., the overall match score was <1), names were inspected on an individual basis. All names that matched at taxonomic ranks at or lower than species (e.g., subspecies, varieties) were accepted as correct names. The name matching procedure was repeated for the uncertain names (i.e., with match accuracy scores below the threshold value from the first matching run), with a preference for first using the source 'Tropicos' (Missouri Botanical Garden; http://www.tropicos.org/; accessed 19 Dec 2014) because here matching scores were often higher for names of low taxonomic rank. The remaining 9,641 non-matched names were resolved using (a) the additional source 'NCBI' (Federhen, 2010)

One potential shortcoming of our taxonomic backbone is that for most regions it was necessary to standardize taxa using standard sets of taxonomic synonyms. Thus, if a taxonomic name represents multiple taxonomic concepts, such as those created by the splitting and lumping of taxa, or a name has been misapplied in a region, we must trust that this problem has been addressed in our component databases (Franz, Peet, & Weakley, 2004; Jansen & Dengler, 2010).
However, different component databases may have applied different taxonomic concepts for splitting and lumping taxa.
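The first step of the name-matching procedure described above (accept names with an overall match score of 1, set the rest aside for inspection) can be sketched as follows. The tuple layout and the function name `triage_names` are hypothetical stand-ins for the output of a name-resolution service, not the format actually used for sPlot.

```python
def triage_names(matches, accept_score=1.0):
    """Split name-resolution results into accepted names and names to inspect.

    `matches` is an iterable of (submitted_name, matched_name, score) tuples;
    this layout is an assumption made for illustration.
    """
    accepted, to_inspect = {}, []
    for submitted, matched, score in matches:
        if matched is not None and score >= accept_score:
            accepted[submitted] = matched      # exact match or resolved synonym
        else:
            to_inspect.append(submitted)       # needs a second run or manual check
    return accepted, to_inspect

# Invented example records.
example = [
    ("Fagus sylvatica L.", "Fagus sylvatica", 1.0),
    ("Quercus robor", "Quercus robur", 0.92),
    ("Unknown sp.", None, 0.0),
]
accepted, to_inspect = triage_names(example)
print(accepted)
print(to_inspect)
```

Names left in `to_inspect` would then go through the repeated matching runs and individual inspection that the text describes.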

| Physiognomic information
To achieve a classification into forests versus non-forests that is applicable to all plots irrespective of the structural and habitat data provided by the source database, we defined as forest all plot records that had >25% absolute cover of the tree layer, making use of the attribute data of sPlot. This threshold is similar to the classification of Ellenberg and Müller-Dombois (1967), who defined woodland formations with trees covering more than 30%. There were 16,244 tree species in the sPlot database. As tree layer cover was available for only 25% of all plots, we additionally used the information whether the taxa present in a plot were trees (usually defined as being taller than 5 m), using the plant growth form information from TRY (see below). Thus, plots lacking tree cover information were defined as forests if the sum of relative cover of all tree taxa was >25%. Similarly, we defined non-forests by calculating the cover of all taxa that were not defined as trees or shrubs (also taken from the TRY plant growth form information) and that were not taller than 2 m, using the TRY data on mean plant height. In total, 21,888 taxa belonged to this category. We defined all plots as non-forests if the sum of relative cover of these low-stature, non-tree and non-shrub taxa was >90%. As we did not have the growth form and height information for all taxa, a fraction of about 25% of the plots remained unassigned (i.e., neither forest, nor non-forest). In addition, more detailed classifications of plots into physiognomic formations (Table S3.2 in Appendix S3) and naturalness (Table S3.3 in Appendix S3) were derived from various types of plot-level or database-level information provided by the sources and stored in five separate fields (see Table S2.1 in Appendix S2).
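The forest versus non-forest rules above can be written as a short decision function. This is a minimal sketch under stated assumptions: cover values are percentages, growth forms are given as simple strings, and the order in which the rules are checked is our own reading of the text. The function and argument names are hypothetical.

```python
def classify_plot(tree_layer_cover, rel_cover, growth_form, height_m):
    """Assign a plot to 'forest', 'non-forest' or 'unassigned'.

    tree_layer_cover: absolute tree-layer cover in %, or None if not recorded
    rel_cover:        taxon -> relative cover in % within the plot
    growth_form:      taxon -> 'tree', 'shrub', 'herb', ... (missing if unknown)
    height_m:         taxon -> mean plant height in m (missing if unknown)
    """
    if tree_layer_cover is not None:
        # Rule 1: recorded tree-layer cover above 25%.
        if tree_layer_cover > 25:
            return "forest"
    else:
        # Rule 2: no tree-layer record, so sum the relative cover of tree taxa.
        tree_sum = sum(c for t, c in rel_cover.items()
                       if growth_form.get(t) == "tree")
        if tree_sum > 25:
            return "forest"

    # Rule 3: low-stature taxa that are neither trees nor shrubs exceed 90%.
    low_sum = sum(
        c for t, c in rel_cover.items()
        if growth_form.get(t) not in (None, "tree", "shrub")
        and height_m.get(t) is not None
        and height_m[t] <= 2
    )
    if low_sum > 90:
        return "non-forest"

    # Missing growth form or height information leaves the plot unassigned.
    return "unassigned"
```

Taxa without growth form or height information contribute to neither sum, which is why a share of plots ends up unassigned.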

| Phylogenetic information
We developed a workflow to generate a phylogeny of the vascular plant species in sPlot, using the phylogeny of Zanne et al. (2014), updated by Qian and Jin (2016). Species present in sPlot but missing from this phylogeny were added next to a randomly selected congener (see also Maitner et al., 2018). This approach has been demonstrated to introduce less bias into subsequent analyses than adding missing species as polytomies to the respective genera (Davies, Kraft, Salamin, & Wolkovich, 2012).

Note (Table 2). GIVD ID refers to the ID in the Global Index of Vegetation-Plot Databases (http://www.givd.info), which manages the metadata for sPlot and provides updated online descriptions of these databases; * after the GIVD ID indicates that the respective database description is currently not visible on the GIVD website. Datasets contributed in harmonized format from a continental data aggregator ("collective database" according to the sPlot Rules) are listed under its name. Further references, attributions and disclaimers for particular datasets are found in Appendix S1.
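The idea of attaching a missing species next to a randomly selected congener, as described for the phylogeny above, can be illustrated with a small sketch. It only chooses the attachment partner for each missing species; the actual grafting onto a tree object is not shown, and the function name and binomial-name parsing are assumptions made for illustration.

```python
import random

def pick_congeners(tree_tips, missing_species, seed=0):
    """For each missing species, choose a random tip of the same genus.

    Names are assumed to be 'Genus epithet' strings. Returns a dict mapping
    each missing species to the chosen congener, or None if the genus is
    absent from the tree.
    """
    rng = random.Random(seed)  # fixed seed so the placement is reproducible
    by_genus = {}
    for tip in tree_tips:
        by_genus.setdefault(tip.split()[0], []).append(tip)

    placements = {}
    for sp in missing_species:
        congeners = by_genus.get(sp.split()[0])
        placements[sp] = rng.choice(congeners) if congeners else None
    return placements

tips = ["Quercus robur", "Quercus petraea", "Fagus sylvatica"]
print(pick_congeners(tips, ["Quercus ilex", "Abies alba"]))
```

Species whose genus is not represented in the tree receive `None` here and would need separate handling.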

| Associated environmental plot information
To complement the plot data, we harmonized geographical coordinates (in decimal degrees), elevation (m above sea level), aspect (degrees) and slope (degrees) as provided by the contributing databases. All other variables were too sparsely and too inconsistently sampled across databases to be combined in the global set, but were retained in the original data sources and can be retrieved for particular purposes.
We used the geographic coordinates to create a geodatabase in (Dee et al., 2011). While the CHELSA climatological data have an accuracy similar to that of other products for temperature, they are more precise for precipitation patterns (Karger et al., 2017). We also calculated growing degree days for 1°C (GDD1) and 5°C (GDD5), according to Synes and Osborne (2011) and based on CHELSA data, and included the index of aridity and potential evapotranspiration extracted from the CGIAR-CSI website (www.cgiar-csi.org). In addition, we extracted seven soil variables from the SOILGRIDS project

We linked all vegetation plots to two global biome classifications. We used the World Wildlife Fund (WWF) spatial information on terrestrial ecoregions (Olson et al., 2001) to assign plots to one of the 867 ecoregions, 14 biomes and eight biogeographic realms. The WWF approach is based on a bottom-up expert system using various regional biodiversity sources to define ecoregions, which in turn are grouped into realms and biomes (Olson et al., 2001). In addition, we created a shapefile for the ecozones defined by Schultz (2005) to represent major biomes in response to global climatic variation. Since these zones are climatically heterogeneous in mountain regions, we differentiated an additional "alpine" biome for mountain areas above the lower mountain thermal belt, as defined in the classification of world mountain regions by Körner et al. (2017). This resulted in a distinction of 10 major biomes (Figure S4.5 in Appendix S4), whose shapefile is freely available (Appendix S5).

| Trait information
To broaden the potential applications of the global vegetation database in functional contexts, we linked sPlot to TRY. We accessed plant trait data from TRY version 3.0 on August 10, 2016, and included 18 traits that describe the leaf, wood and seed economics spectra (Westoby, 1998; Reich, 2014; (Fazayeli, Banerjee, Kattge, Schrodt, & Reich, 2017; Schrodt et al., 2015),

For every trait j and plot k, we calculated the community-weighted mean (CWM) and the community-weighted variance (CWV) for each of the 18 traits in a plot (Enquist et al., 2015):

$$\mathrm{CWM}_{j,k} = \sum_{i=1}^{n_k} p_{i,k}\, t_{i,j}$$

$$\mathrm{CWV}_{j,k} = \sum_{i=1}^{n_k} p_{i,k}\, \left(t_{i,j} - \mathrm{CWM}_{j,k}\right)^2$$

where $n_k$ is the number of species with trait information in plot k, $p_{i,k}$ is the relative abundance of species i in plot k calculated as the species' fraction in cover or abundance of total cover or abundance, and $t_{i,j}$ is the mean value of species i for trait j. CWMs and CWVs were calculated for 18 traits in 1,117,369 and 1,099,463 plots, respectively, the second being a smaller number as at least two taxa were needed for CWV calculation.
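A minimal sketch of the CWM and CWV calculation for one trait in one plot is given below. It assumes that relative abundances are renormalized over the species that have trait information, which is one possible reading of the definition of the relative abundance above; the function name `cwm_cwv` and the input layout are hypothetical.

```python
def cwm_cwv(abundance, trait):
    """Community-weighted mean and variance of one trait in one plot.

    abundance: species -> cover or abundance in the plot
    trait:     species -> mean trait value (species without data are absent)
    Returns (cwm, cwv); cwv is None when fewer than two species have trait
    data, and both are None when no species has trait data.
    """
    # Keep only species with trait information and positive abundance.
    known = {s: a for s, a in abundance.items() if s in trait and a > 0}
    total = sum(known.values())
    if total == 0:
        return None, None

    # Relative abundance p_i, renormalized over species with trait data
    # (an assumption of this sketch).
    p = {s: a / total for s, a in known.items()}

    cwm = sum(p[s] * trait[s] for s in p)
    if len(p) < 2:
        return cwm, None  # at least two taxa are needed for a variance
    cwv = sum(p[s] * (trait[s] - cwm) ** 2 for s in p)
    return cwm, cwv

print(cwm_cwv({"a": 3, "b": 1}, {"a": 2.0, "b": 6.0}))
```

Returning `None` for the variance of single-species plots mirrors the reason given above for CWVs being available for fewer plots than CWMs.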

| Plot community data
sPlot 2.1 contains 1,121,244 vegetation plots from 160 countries and from all continents (Figure 3). The global coverage is biased towards Europe, North America and Australia, reflecting unequal sampling effort across the globe (Table 1). At the ecoregion level, major gaps occur in the wet tropics of South America and Asia, as well as in subtropical deserts worldwide and in the North American taiga. Although the plots are highly clustered geographically, their coverage in the environmental space is much more representative: the highest concentration of plots is found in environments that are most abundant globally (Figure 2), while they are lacking in the very moist parts of the environmental space, which are also spatially rare, and in the very cold parts, which are sparsely vegetated.
In most cases (98.4%), plot records in sPlot include full species lists of vascular plants, while 1.6% had only woody species above a certain diameter or only the most dominant species recorded.
Terricolous bryophytes and lichens were additionally identified in 14% and 7% of plots, respectively (Table S2 Figure   S2.4 in Appendix S2). When using these size ranges, forest plots tend to be richer in species (Figure 4a). The fact that the gradient in richness found in our plots was at least one order of magnitude stronger than differences that could be expected by the differences in plot size prompted us to produce the first global maps of plot-scale species richness, separately for forests and non-forests

| Phylogenetic information
The phylogenetic tree for sPlot was produced from 53,489 vascular plant names contained in the database, comprising 5,518 genera (Appendix S7). Moderately to highly frequent species in sPlot 2.1 are equally distributed across the phylogeny (corresponding to yellowish to reddish colors for low and high peaks, respectively, in Figure S7.6 in Appendix S7). Coverage of species included in the phylogeny ranges from 89% of species that occur only once in all plots to 100% of species with a frequency >10,000 plots (Figure S7.7 in Appendix S7).

| Functional information
The proportion of species with trait information increases with the species' frequency in plots. Gap-filled trait information is available for 77.2% and 96.2% of the taxa that occurred in more than 100 and 1,000 plots, respectively. Trait coverage is similar across biomes (Figure S8.8 in Appendix S8). Across all biomes, the proportion of species for which gap-filled trait data are available increases with the species' frequency across plots. Compared to gap-filled data, trait coverage for the original trait data is considerably lower, being highest for height, seed mass, leaf area and specific leaf area (SLA, Figure S8.9 in Appendix S8). The

| DATA USAGE
The sPlot database (the vegetation-plot data, including the environmental information for each plot and the species phylogeny) is released in fixed versions to allow reproducibility of results, but also due to the enormous effort needed for data integration and harmonization and for updating the phylogeny. By delivering few fixed versions while keeping older versions available, the sPlot consortium ensures that the same data can be used in parallel projects and that the data underlying a specific study remain accessible in the future, thus allowing re-analysis. Each new version will be matched to the current TRY database.

Data access to sPlot is regulated by the Governance and Data Property Rules (www.idiv.de/sPlot) to ensure a fair balance between the interests of data contributors and data analysts. In brief, the sPlot Rules state that: (a) all contributing vegetation-plot databases become members of the sPlot consortium, represented by their custodian and deputy custodian; (b) vegetation-plot data contributed to sPlot remain the property of the data contributors and can be withdrawn at any time except for approved projects; (c) other scientists (e.g., data managers or participants of the sPlot workshops) with particular responsibilities may also be appointed as personal members to the sPlot consortium; (d) sPlot data can be requested for projects that involve at least one member of the sPlot consortium; (e) whenever a project has been proposed, all sPlot consortium members will be informed and can declare their interest in becoming co-authors of manuscripts resulting from this project and then becoming actively involved in data evaluation and writing; and (f) if the matched gap-filled or original trait data from TRY are also requested for a project, members of the TRY consortium can likewise opt in as co-authors. The sPlot database is, therefore, available according to a 'give-and-receive' system. Moreover, the data are available to any researcher by establishing a collaboration that includes and is supported by at least one sPlot consortium member.
The sPlot consortium is governed by a Steering Committee elected by all consortium members for two-year, renewable terms.
Project proposals can be submitted to the Steering Committee, which ensures that the sPlot Rules are followed and redundant work between overlapping projects is avoided. The lists of databases, sPlot consortium members and the Steering Committee members are updated regularly on the sPlot website, as are the sPlot Rules and the list of approved projects.

| EXPECTED IMPACT AND LIMITATIONS
The main aim of the sPlot database is to catalyze a collaborative network for understanding global diversity patterns of plant communities in space and time. sPlot provides a unique, integrated global repository of data that would otherwise be fragmented in unconnected and structurally inconsistent databases at regional, national or continental levels. Together with the provision of harmonized phylogenetic, functional and environmental information, sPlot allows, for the first time, global analyses of plant community data. Compared to approaches using data aggregated from species occurrences in grid cells, sPlot will significantly advance ecological analyses and future interdisciplinary research in at least four different ways.
1. Using sPlot, one can predict the species that can co-exist in a community and also the frequencies of their co-occurrence
3. sPlot data provide information on the proportion of species in a community (in terms of cover, basal area, frequency). When combined with functional trait information, the relative abundance of species allows calculation of community abundance-weighted mean trait values. Information on the relative contribution of species to a community-aggregated trait value is particularly necessary when traits are used as proxies for vegetation functions and processes, making it possible to test, among other things, the mass ratio hypothesis (Garnier et al., 2004; Grime, 1998) and to assess the roles of divergent traits (Díaz et al., 2007; Kröber et al., 2015).

4. Plant species within plots can be linked to traits that predict interactions with organisms from other trophic groups, both belowground (mycorrhizae, soil decomposers) and above-ground (herbivores and pollinators). This will allow linking vegetation plot information to ecosystem processes and services such as pest control, pollination and nutrient cycling (e.g., de Bello et al., 2010).
Despite the large amount of available data and its potential suit- size. This means that using sPlot data for such studies either requires filtering for plots with identical or at least similar size or accounting for the plot-size effects in the statistical model. In addition, analyses of functional diversity with sPlot data are limited by the absence of trait data for a (small) portion of the species and by the lack of plot-specific trait measures. Furthermore, the non-random and geographically and ecologically very unequal distribution of the plots contained in sPlot calls for stratified resampling to balance records of different environments (e.g., stratified by climate, Figure 2) or physiognomic formations (Figure 4). Users of sPlot need to be aware of these and other limitations and to correct potential biases for their specific research question.
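Stratified resampling of the kind called for above can be sketched as drawing at most a fixed number of plots from each stratum. The stratum labels, the cap per stratum and the function name are illustrative assumptions; a real analysis would define strata from the climate space or physiognomic formations relevant to the research question.

```python
import random
from collections import defaultdict

def stratified_resample(plot_ids, stratum_of, max_per_stratum, seed=42):
    """Draw up to `max_per_stratum` plots at random from every stratum.

    plot_ids:   iterable of plot identifiers
    stratum_of: plot identifier -> stratum label (e.g., a climate bin)
    """
    rng = random.Random(seed)  # fixed seed for a reproducible subsample
    groups = defaultdict(list)
    for pid in plot_ids:
        groups[stratum_of[pid]].append(pid)

    sample = []
    for stratum in sorted(groups):  # sorted for a deterministic order
        members = groups[stratum]
        k = min(max_per_stratum, len(members))
        sample.extend(rng.sample(members, k))  # sampling without replacement
    return sample
```

Capping the number of plots per stratum reduces the weight of densely sampled environments while keeping every plot from sparsely sampled ones.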

ACKNOWLEDGEMENTS
We are grateful to thousands of vegetation scientists who sampled vegetation plots in the field or digitized them into regional, national or international databases. We also appreciate the support of the German Research Foundation for funding sPlot as one of the iDiv (DFG FZT 118) research platforms, and the organization of three workshops through the sDiv calls. We acknowledge this support with naming the database "sPlot", where the "s" refers to the sDiv synthesis workshops. The study was supported by the TRY initiative on plant traits (http://www.try-db.org). For all further acknowledgements see Appendix S10. We thank Meelis Pärtel for his very fast and constructive feedback on an earlier version of this manuscript.

DATA ACCESSIBILITY
The data contained in sPlot (the vegetation-plot data complemented by species phylogeny and environmental information) are available on request, through contacting any of the consortium members for submitting a paper proposal. The proposals should follow the Governance and Data Property Rules of the sPlot Working Group, which are available on the sPlot website (www.idiv.de/sPlot). After acceptance, the respective data will be provided. In addition to the plot data, CWMs and CWVs of 18 plant traits are available for every plot.