Synthesizing tree biodiversity data to understand global patterns and processes of vegetation

Aims: Trees dominate the biomass in many ecosystems and are essential for ecosystem functioning and human well-being. They are also one of the best-studied functional groups of plants, with vast amounts of biodiversity data available in scattered sources. We here aim to illustrate that an efficient integration of these data could produce a more holistic understanding of vegetation. Methods: To assess the extent of potential data integration, we use key databases of plant biodiversity to: (a) obtain a list of tree species and their distributions; (b) identify coverage of and gaps in different aspects of tree biodiversity data; and (c) discuss large-scale patterns of tree biodiversity in relation to vegetation. Results: Our global list of trees included 58,044 species. Taxonomic coverage varies in three key databases, with data on the distribution, functional traits, and molecular sequences for about 84%, 45% and 44% of all tree species, which is > 10% greater than


Journal of Vegetation Science, Keppel et al.

inclusive than alternative definitions (cf. "freestanding plants" in Taseski et al., 2019). Here we combine data from GlobalTreeSearch with tree species and distribution data at finer spatial resolutions for large countries (i.e., USA, Canada, Mexico, Brazil, China and Australia) from the Global Inventory of Flora and Traits (GIFT) database (http://gift.uni-goettingen.de, Weigelt et al., 2020) using only records indicating the species to be native. The integration of tree data from these two databases and taxonomic standardization (see Appendix S2 for details) produced a final data set (the "tree list") that included 58,044 tree species (Appendix S3) and distribution information across 463 geographic regions worldwide (Figure 2).
FIGURE 1 Overview of nine key aspects of biodiversity for macroecological studies and the major global databases (see Appendix S1 for more detail) with information for trees and other taxa. Note that many databases contain information on multiple aspects of biodiversity; here we only link each database to a single aspect

[...] Weigelt et al., 2020). See Appendix S2 for details on how the data were collated

We used this list to query various databases, including spatial occurrences from the Global Biodiversity Information Facility (GBIF, https://www.gbif.org/; GBIF occurrence download https://doi.org/10.15468/dl.77gcvq, accessed from R via rgbif [https://github.com/ropensci/rgbif] on 2021-03-16), publicly available traits from TRY version 5 (https://www.try-db.org/TryWeb/Home.php, Kattge et al., 2020), abundance data in sPlot 3.0 (https://www.idiv.de/de/splot.html, Bruelheide et al., 2019), molecular sequences from GenBank (https://www.ncbi.nlm.nih.gov/genbank/), and conservation assessments from the IUCN Red List version 2020-3 (https://www.iucnredlist.org/; Appendix S2). We evaluated the completeness of publicly available tree biodiversity data by determining the number of species in the "tree list" for which data were available in the selected databases.
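The completeness measure described here (the share of "tree list" species with data in a given database) can be illustrated with a minimal Python sketch. The function name `completeness`, the species names and the toy database contents are hypothetical placeholders chosen for illustration; they are not taken from the study, which accessed GBIF from R via rgbif.

```python
def completeness(tree_list, database_species):
    """Return, per database, the share of tree-list species that have data.

    tree_list: iterable of accepted species names (the "tree list").
    database_species: dict mapping a database name to the species names
    for which that database holds at least one record.
    """
    tree_set = set(tree_list)
    return {
        name: len(tree_set & set(species)) / len(tree_set)
        for name, species in database_species.items()
    }


# Hypothetical toy data, for illustration only.
tree_list = [
    "Allocasuarina decaisneana",
    "Quercus robur",
    "Fagus sylvatica",
    "Ceiba pentandra",
]
database_species = {
    # "Poa annua" is a grass and is ignored because it is not in the tree list.
    "GBIF": ["Quercus robur", "Fagus sylvatica", "Ceiba pentandra", "Poa annua"],
    "TRY": ["Quercus robur", "Fagus sylvatica"],
}

coverage = completeness(tree_list, database_species)
for name, share in coverage.items():
    print(f"{name}: {share:.1%} of tree-list species covered")
```

Matching on exact name strings only works after taxonomic standardization, which is why the sketch assumes that both the tree list and the database names have already been harmonized.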

MACROECOLOGY OF VEGETATION USING EXISTING TREE BIODIVERSITY DATA
Below we discuss the completeness of data currently available in selected key databases for various aspects of tree biodiversity (Figure 1) and evaluate where gaps are present, focusing on global, geographically wide-ranging sources. Key databases are summarized in Appendix S1. Furthermore, we discuss and illustrate how the available tree biodiversity data could be applied to better understand macroecological patterns of vegetation. Where applicable, we also compare data availability for trees with that reported for all plants.

| Distribution and abundance
Providing a spatial dimension for biodiversity, distribution data are central to many applications in ecology, evolution and conservation (Franklin et al., 2013; Keppel et al., 2015; Daru et al., 2017). Several sources of occurrence data are available, and each has its challenges and limitations (Meyer et al., 2016; König et al., 2019). Published floras and checklists are often highly curated sources of information that provide the most complete occurrence records of taxa in a region or country, and an increasing number of these are digitally available in global databases (König et al., 2019). However, these have limited spatial precision, as many species do not occur throughout an entire range or region (Meyer et al., 2016). Herbarium and museum collections are the traditional source of the most precise geo-referenced tree occurrence data and the number of digitized specimens continues to increase (Lavoie, 2013; Soltis, 2017). Increasingly, citizen scientists produce high-quality data through various platforms that, when leveraged with user expertise and advanced artificial intelligence, provide information about the distribution and ecology of plants (Havens & Henderson, 2013; Van Horn et al., 2018).
Despite important gaps in occurrence data (Meyer et al., 2016; Serra-Diaz et al., 2017; Hortal et al., 2015), databases such as GBIF, Botanical Information and Ecology Network (BIEN; http://bien.nceas.ucsb.edu/bien/, Enquist et al., 2016) and GIFT provide a good indication of the global distribution of tree species richness at the country scale. The largest of these databases, GBIF, currently holds >23 million georeferenced records for trees, with about 84.4% of all tree species having at least one record (Figure 3a). Hence, trees have about 11% better representation in this database than plants in general (cf. Cornwell et al., 2019) and information on distribution at the country scale is available for all described tree species in GlobalTreeSearch (Beech et al., 2017).
However, GBIF records frequently contain erroneous taxonomic and spatial information and may not capture the full extent of a species' range (Meyer et al., 2016; Zizka et al., 2020). Furthermore, GBIF has poor coverage for tree species on many islands and in central Asia, the Middle East, and central and northeast Africa (Figure 3a).
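To make the idea of erroneous spatial information concrete, the following minimal Python sketch flags a few obviously implausible coordinates. The record structure (dictionaries with `lat` and `lon` keys), the function name `is_plausible` and the three checks are assumptions made for illustration; they are not the checks applied by GBIF, by the studies cited above, or by any particular cleaning tool.

```python
def is_plausible(record):
    """Apply basic sanity checks to one occurrence record.

    The record is assumed to be a dict with 'lat' and 'lon' in decimal degrees.
    """
    lat, lon = record.get("lat"), record.get("lon")
    # Records without coordinates cannot be mapped.
    if lat is None or lon is None:
        return False
    # Coordinates outside the valid geographic range are data-entry errors.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return False
    # (0, 0) is a common placeholder for missing coordinates.
    if lat == 0 and lon == 0:
        return False
    return True


# Hypothetical records, for illustration only.
records = [
    {"species": "Allocasuarina decaisneana", "lat": -24.5, "lon": 131.0},
    {"species": "Allocasuarina decaisneana", "lat": 0.0, "lon": 0.0},
    {"species": "Allocasuarina decaisneana", "lat": 124.5, "lon": 131.0},
    {"species": "Allocasuarina decaisneana", "lat": None, "lon": 131.0},
]

clean = [r for r in records if is_plausible(r)]
print(f"kept {len(clean)} of {len(records)} records")
```

Filters of this kind only remove gross errors; records that are valid coordinates but fall outside a species' true range, or that carry a wrong name, need taxonomic and expert checks that a sketch like this cannot provide.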
Databases with geo-referenced occurrence records, like GBIF, are particularly valuable for understanding large-scale vegetation patterns. Because a small number of tree species often dominates and defines vegetation, even in tropical rain forests (Keppel et al., 2011; Pitman et al., 2013; ter Steege et al., 2013), these species can be good indicators of the distribution of vegetation. For example, the distribution records for the Australian desert oak, Allocasuarina decaisneana, suggest that the associated desert oak woodland vegetation may be more widespread than currently mapped (Figure 4). Spatial distribution data can also be used to predict the responses of species and vegetation to climatic change, provided relevant environmental data are available at grain sizes fine enough to capture habitat affinities of species (Franklin et al., 2013; Fourcade et al., 2018).
Forest plots and national forest inventories are frequently used primary sources of vegetation data and provide information on the abundance and distribution of tree species. Data on species abundance and distribution, and environmental data, can be integrated to determine how climatic conditions, dispersal barriers and/or biotic interactions may shape spatial patterns of abundance (Dallas et al., 2017; Copenhaver-Parry & Bell, 2018; Steidinger et al., 2019).
Furthermore, abundance data can provide important insights into the dynamics of vegetation, particularly if related to disturbances, either in the form of time series from multiple censuses (Li et al., 2016), long-term historic data on abundance from fossil pollen records (van der Sande et al., 2019), or through comparing biodiversity change across sites with different disturbance intensities in a space-for-time substitution (Rozendaal et al., 2019; Ibanez et al., 2020).
While plot data and relevant databases are becoming increasingly available and comprehensive (e.g., Dengler et al., 2011; Bruelheide et al., 2019), they are spread thinly across a labyrinth of sources with different levels of access and are frequently reported in nonstandard formats (Wiser, 2016). Furthermore, plot databases often have a strong spatial bias. For example, while sPlot (Bruelheide et al., 2019) includes at least one record for 25.6% of all tree species, its coverage is much better in Europe than elsewhere (Figure 3c). Like other databases, such as GFBI (Global Forest Biodiversity Initiative; http://gfbinitiative.com/), sPlot therefore has limited coverage of plot data in highly biodiverse regions, particularly Amazonia, southeast Asia, the Congo Basin and the southwest Pacific Islands, which prevents a thorough understanding of these forests and their species (ter Steege et al., 2015; Tovo et al., 2017). At finer scales there may also be spatial biases, with more plot data believed to be available for locations that are easier to access or have experienced less anthropogenic disturbance (Jobe & White, 2009; Phillips et al., 2002).
Other data inconsistencies and biases further limit our ability to understand spatial patterns of abundance. Plot sizes across forest community data sets vary considerably (Liang et al., 2016; Bruelheide et al., 2019) and this affects estimates of forest structure and dynamics (Wagner et al., 2010). In addition, variability in minimum tree size thresholds, sampled growth forms (e.g., lianas, palms and tree ferns), and taxonomy among forest plots further complicates efforts to find generality in biodiversity patterns (Wiser, 2016; Muscarella et al., 2020).

| Demography and ecological interactions
Demographic information reveals how the environment influences vegetation and populations (Condit et al., 1999) through its relationship with the vital rates (survival, growth, reproduction) of individuals (Caswell, 2001). Generally, models are used to connect vital rates of trees to population dynamics, as vital rates depend on the size of the individual (seedlings have low survivorship and fertility, while large trees have high survivorship and fertility) and individuals contribute unequally to population dynamics (Caswell, 2001; Ramula et al., 2009). However, limited demographic data of trees throughout their life cycle are currently available in the COMPADRE database (https://compadre-db.org/; Salguero-Gómez et al., 2015).
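One common way to connect size-dependent vital rates to population dynamics is a stage-structured matrix population model, in which the asymptotic population growth rate is the dominant eigenvalue of the projection matrix. The Python sketch below illustrates this under stated assumptions: the three stages and all transition values in `A` are hypothetical numbers chosen for illustration, not estimates for any real tree species or entries from COMPADRE.

```python
import numpy as np

# Hypothetical projection matrix for three stages: seedling, sapling, adult.
# Column j gives the per-capita contribution of stage j to each stage next year.
A = np.array([
    [0.10, 0.00, 5.00],  # seedlings remaining seedlings; recruits per adult
    [0.05, 0.70, 0.00],  # seedlings growing into saplings; saplings remaining
    [0.00, 0.10, 0.98],  # saplings growing into adults; adult survival
])


def growth_rate(matrix):
    """Return the asymptotic growth rate (dominant eigenvalue) of the matrix.

    For a non-negative projection matrix the dominant eigenvalue is real,
    so taking the largest real part is sufficient.
    """
    eigenvalues = np.linalg.eigvals(matrix)
    return float(np.max(eigenvalues.real))


lam = growth_rate(A)
status = "growing" if lam > 1 else "declining or stable"
print(f"asymptotic growth rate: {lam:.3f} ({status})")
```

The example reflects the pattern described above: seedlings have low survival and no fertility, adults have high survival and all of the reproduction, so stages contribute very unequally to the growth rate.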

Forest plots with multiple censuses, such as those in the Forest Global Earth Observatory (ForestGEO; https://forestgeo.si.edu/; Anderson-Teixeira et al., 2015) can provide vital information about forest dynamics (Condit et al., 1999; Sullivan et al., 2020) and additional demographic data (e.g., Visser et al., 2016). Limitations of these data include that they are not typically available in an open access format and that individuals are often only measured in forest plots once they reach a minimum size. However, summary statistics can be synthesized from these data to gain an understanding of tree population dynamics. For example, two key trade-offs explained about a third of the variation in demographic traits in the tropical moist forests of Barro Colorado Island, Panama: fast-growing, light-demanding species with high mortality vs slow-growing, shade-tolerant species with lower mortality, and long-lived, tall species with low recruitment vs short-lived, shorter species with high recruitment. The resulting demographic model was able to accurately predict the changes in structure and composition during secondary succession in these forests (Rüger et al., 2020).

FIGURE 3 Completeness of global digitally accessible tree biodiversity data in key databases by region. Maps indicate the proportion of species covered per region with records for: (a) distribution in GBIF (https://www.gbif.org/); (b) at least three traits in TRY version 5 (https://www.try-db.org/); (c) at least one plot in sPlot 3.0 (https://www.idiv.de/de/splot.html); (d) at least one sequence in GenBank (https://www.ncbi.nlm.nih.gov/genbank/); and (e) assessment of conservation status in IUCN Red List version 2020-3 (https://www.iucnredlist.org/). In the central Venn diagram numbers of tree species unique to the various databases and the number of species represented in all databases are reported. Coverage = proportional coverage of tree species in a database with respect to total species richness in an administrative unit. See Appendix S1 for more details on databases and Appendix S2 for methods to produce values used in this figure
Trees interact with a variety of organisms, including other trees of the same or other species, pollinators, seed dispersers and predators, herbivores, parasites and symbionts. While some interactions of trees are relatively easy to quantify, others are laborious (e.g., below-ground functions) and rarely measured. Many of these interactions are vital for maintaining biodiversity, the functioning of ecosystems and for the performance and survival of trees (Neuschulz et al., 2016; Steidinger et al., 2019). However, few databases focus on ecological interactions (see Figure 1 and Appendix S1 for examples), and those that do primarily contain species of agricultural or agroforestry importance or are geographically restricted (e.g., MycoFlor; Hempel et al., 2013). Despite these limitations, the Global Biotic Interactions (GloBI) database (Poelen et al., 2014) is a useful tool for accessing available data sets, particularly as more are added in the future. While pollination and dispersal of trees have been studied extensively using direct and genetic approaches (Bacles et al., 2006; Bennett et al., 2018), only limited information for pollination is found in databases, preventing adequate assessment of phenomena like the global pollination crisis (Bartomeus et al., 2019). However, information on mycorrhizae and nitrogen fixation is more widely available (Appendix S1), likely due to the great importance of these interactions for plant survival and performance (Steidinger et al., 2019).

FIGURE 4 The Australian desert oak, Allocasuarina decaisneana (Casuarinaceae) is restricted to sandy desert environments in central Australia and exemplifies the close relationship between the distribution of tree species and related vegetation types. Adults (a) are often the only trees in sandy deserts, defining the desert oak woodlands major vegetation subgroup (MVS 72; Keith & Pellow, 2015). The species starts as feather-duster-like seedlings (b) that start branching after reaching below-ground water sources and has the largest fruits of the Casuarinaceae (c), suggesting unique functional traits. The occurrence records of the species (Atlas of Living Australia, ALA; https://www.ala.org.au/) suggest a wider distribution for MVS 72 than currently mapped (National Vegetation Information System V5.1, © Australian Government Department of Agriculture, Water and the Environment 2018; d)

Forest plot data can reveal factors driving species co-occurrence, including ecological interactions such as competition and facilitation, which may co-vary with environmental conditions (Lankau et al., 2015; Steidinger et al., 2019). For example, in the Brazilian Atlantic forest, fragmentation was found to result in strong shifts toward tree species with less specialized pollinators and dispersers, which often require large stretches of high-quality forest (Girão et al., 2007; da Silva & Tabarelli, 2000). On a global scale, linking tree composition with associated symbionts revealed strong interactions between climate, microbial symbionts and trees. For example, ectomycorrhizal trees dominate at high latitudes and elevations, whereas arbuscular mycorrhizal trees dominate in aseasonal, warm tropical forests (Steidinger et al., 2019).

| Functional trait and genetic data
Functional traits and genetics are important aspects of tree biodiversity and can provide insights into effects of biodiversity change on species, vegetation and ecosystem functioning (Dayrell et al., 2017; Echeverría-Londoño et al., 2018). Functional ecology has emerged as a dominant paradigm for understanding biophysical constraints on plant form and function, species- and community-level responses to environmental change, and ecosystem functioning in terrestrial ecosystems (Reich, 2014; Díaz et al., 2016; Dayrell et al., 2017; Gross et al., 2017). Molecular data have revolutionized our understanding of evolutionary relationships among species, populations and functional traits (Byrne et al., 2017; Dayrell et al., 2017; Sandel et al., 2019), identified cryptic species, and provided a deeper understanding of biodiversity (Turner et al., 2013; Eiserhardt et al., 2018; Forest et al., 2018; Sandel et al., 2020).
The compilation of open global plant trait databases (Appendix S1) and establishment of common measurement protocols (Pérez-Harguindeguy et al., 2013; Gallagher et al., 2020) have facilitated global syntheses of plant-environment relationships across time and space that provide insight into variation in key ecosystem functions and processes across ecological scales, e.g., leaves, individuals, communities, and ecosystems (Reich, 2014; Díaz et al., 2016; Echeverría-Londoño et al., 2018; van der Sande et al., 2020).
Tree species are relatively well represented in trait databases, with 95.4% of all tree species having data for at least one trait in the TRY database (Kattge et al., 2020; Figure 2), compared to 35.5% of all plant species. However, for only about half of these species (45.4% of all tree species) are data available for at least three traits (i.e., at least one in addition to growth form and woodiness), with the tropics being particularly poorly sampled (Figure 3b).
In addition, trait measurements in databases are typically reported as species' means, making it difficult to assess intra-specific variation when determining trait-based community processes (Violle et al., 2012).
Furthermore, trait databases are biased toward relatively easy-to-measure "effect" traits (i.e., specific leaf area, leaf dry matter content and leaf area) associated with a limited number of above-ground ecosystem functions, principally carbon storage, growth/productivity and nutrient cycling. These traits are not direct measures of plant function, unlike photosynthesis or water-use efficiency (Figure 5), limiting current knowledge of tree function. They also do not capture information about below-ground and reproductive traits, which are associated with other key ecosystem functions (Girão et al., 2007; Ottaviani et al., 2017). However, there are efforts, such as GRooT (Global Root Traits database; https://groot-database.github.io/GRooT/), to address this gap (Klimešová et al., 2018; Guerrero-Ramírez et al., 2021).
Vegetation community data are increasingly being linked to plant functional traits, and this is providing novel insights into global patterns for traits and trait relationships (Bruelheide et al., 2019; van der Sande et al., 2020). For example, climate (temperature variability and water availability) appears to be the key driver of functional diversity at the global scale (Wieczynski et al., 2019), while for communities in similar climatic and soil conditions, non-climatic factors, such as disturbance, fine-scale environmental heterogeneity and biological factors, appear to be more important. Another global-scale study found that invasive tree species are most abundant if they are functionally similar to co-occurring native species but taller, with higher seed production and wood density (van der [...]).

Placement of tree species in a phylogenetic context should be based on DNA sequence data for each species and a well-supported and dated phylogeny that is readily updated when new data or methods become available (Eiserhardt et al., 2018). Many phylogenies, and the sequences they are derived from, are available from the TreeBASE database (https://www.treebase.org/). Although much of the molecular data gathered across the plant tree of life have been deposited into public databases, only 43.9% of all tree species are represented with any data in the GenBank database (https://www.ncbi.nlm.nih.gov/genbank/) and coverage is low in the tropics (Figure 3d). However, this representation is still >10% greater than that for plants in general.
Of those species that have sequences, not all have available data that are useful for phylogenetic analyses, as genes sequenced for some species may not be widely sampled. In fact, only 24% of taxa with any public sequence data can be confidently placed into a large phylogeny (Smith & Brown, 2018). In addition, population-level molecular data and information about intra-specific genetic diversity can be extremely valuable for conservation and understanding evolution (González-Martínez et al., 2006; Byrne et al., 2017). However, while genetic data are available for two or more populations of numerous species (Nason et al., 1997; González-Martínez et al., 2006), a comprehensive database for such information is lacking.
Phylogenetic data can provide important information about the genetic diversity and evolutionary history of vegetation. For example, rates of recent speciation have been found to be highest in less biodiverse communities (Schluter & Pennell, 2017; Igea & Tanentzap, 2020). Furthermore, phylogenetic endemism, an indicator of phylogenetic uniqueness of a community, tends to be higher in regions of relatively high climatic stability, greater geographic isolation and greater topographic complexity. In addition, the factors most strongly related to phylogenetic endemism differ among phytogeographic regions in a manner that can be explained by their climatic histories (Sandel et al., 2020).

| Conservation status and socio-economic role
Trees are more threatened than ever by habitat destruction and degradation, overexploitation for timber and other products, displacement by invasive species, loss of pollinators and seed dispersers, and anthropogenic climate change (Allen et al., 2010; Crowther et al., 2015; Forest et al., 2018). It has been estimated that 15 billion trees are cut down every year (Crowther et al., 2015) and that 2.3 million km² of forest cover, an area more than five times the size of France, were lost from 2000 to 2012 (Hansen et al., 2013). A holistic understanding of landscapes, vegetation, and ecological communities without considering anthropogenic influences is, therefore, generally impossible (Clark, 1996; van der Sande et al., 2019).
The IUCN Red List (https://www.iucnredlist.org/) is the most comprehensive database for global conservation information and contains assessments for 45.3% of all tree species. Of the assessed species (Appendix S4), 9,854 (37.4%) are considered either threatened with extinction (9,792 species; "Critically Endangered", "Endangered" and "Vulnerable" categories) or extinct (62 species; "Extinct" and "Extinct in the Wild" categories). The proportion of species assessed is lowest in the tropics and the Southern Hemisphere (Figure 3e), possibly due to more limited resources for conservation in some regions (e.g., Keppel et al., 2012). Additionally, national lists of threatened species are more relevant for achieving protection than the IUCN Red List for some countries, such as Australia (Schatz, 2009). An additional key resource for tree conservation [...] (e.g., Gillespie et al., 2014). As the Global Tree Assessment, an initiative to assess the IUCN Red List status for all known tree species (Newton et al., 2015), is moving closer to completion, tree data in the IUCN Red List database will become an increasingly representative tool for assessing the conservation status of, and threats to, vegetation.
An understanding of the socio-economic history of landscapes and trees is essential for understanding all aspects of the biodiversity of trees and vegetation (e.g., Clark, 1996; Anisi et al., 2021). Extensive socio-economic data for trees are available but are widely scattered across floristic works and monographs, most with a relatively narrow geographic focus (e.g., Thaman, 1992) or targeted at a specific type of usage (e.g., Van Wyk & Wink, 2017). The Useful Tropical Plants Database (http://tropical.theferns.info/; Appendix S1) is one of the few databases with a socio-economic focus that covers a broad geographic region, and there is an urgent need to digitize and integrate available data.

| TOWARDS A GLOBAL SYNTHESIS OF TREE BIODIVERSITY
A wide range of data on tree biodiversity are, therefore, available and

| Aggregating and imputing existing data
All available data for each dimension of biodiversity (Figure 1) should be readily and freely available. Data integration is probably most urgent for the forest plot data sets that are currently spread across a wide variety of databases. Such an initiative should be inclusive and diverse, reflecting the opinions and needs of data owners and data users alike, and should fairly acknowledge the contributions of data owners (Tenopir et al., 2011; Gallagher et al., 2020).
For example, the ongoing digitization of herbarium specimens is providing critical information to investigate all aspects of tree biodiversity (Soltis, 2017). Libraries are another rich source of information and the increasing power of text mining might help to mobilize data on traits or economic uses of trees (Deans et al., 2012; König et al., 2019). Furthermore, there can be extensive local and traditional knowledge about trees with high relevance for tree biodiversity and conservation (e.g., Thaman, 1992).
Aggregating and integrating existing data would also increase our capacity to impute gaps in our knowledge. For example, functional traits can be strongly correlated (Reich, 2014; Díaz et al., 2016). Such correlations allow the imputation of missing trait values using gap-filling techniques (Schrodt et al., 2015; Echeverría-Londoño et al., 2018; Rüger et al., 2020), especially for traits that are phylogenetically conserved. Similarly, functional traits can predict demographic parameters such as generation time in vascular plants with reasonable accuracy (Salguero-Gómez, 2017). Therefore, there is the potential to use information on tree functional traits, such as wood density and specific leaf area, to estimate the generation times for tree species for which structured population models are unavailable.
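The principle behind imputation from correlated traits can be sketched in a few lines of Python. This is a deliberately simple stand-in, using an ordinary least-squares line between two traits, and is not the gap-filling method of any of the studies cited above. The function name `impute_linear` and the toy trait values are hypothetical.

```python
import numpy as np


def impute_linear(predictor, target):
    """Fill missing (NaN) target values from a line fitted to complete pairs.

    predictor: trait values known for every species.
    target: trait values with NaN where the measurement is missing.
    """
    x = np.asarray(predictor, dtype=float)
    y = np.asarray(target, dtype=float)
    known = ~np.isnan(y)
    # Fit y = slope * x + intercept on species with both traits measured.
    slope, intercept = np.polyfit(x[known], y[known], deg=1)
    filled = y.copy()
    filled[~known] = slope * x[~known] + intercept
    return filled


# Hypothetical values for two correlated traits across four species.
trait_a = [1.0, 2.0, 3.0, 4.0]
trait_b = [2.0, 4.0, np.nan, 8.0]  # third species lacks a measurement

filled_b = impute_linear(trait_a, trait_b)
print(filled_b)
```

A single regression line ignores phylogenetic signal and uncertainty in the imputed values, both of which matter when imputed traits feed into downstream analyses.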
The Open Tree of Life project illustrates the utility of data imputation. It uses publicly available phylogenetic data and taxonomic information to construct more comprehensive phylogenies, extrapolating for taxa that lack available sequence data by assuming monophyly of the recognized genera and/or families (Hinchliff et al., 2015). While there is no obvious taxonomic bias in those species that lack data, tropical species are less likely to be represented by sequences (Smith & Brown, 2018). Although many species have not been sampled well enough to confidently place them in a phylogeny without the use of taxonomic information, more than 90% of tree genera are represented in GenBank.

| Addressing remaining data needs
Glaring data gaps remain and there are limitations to what can be achieved through data integration and imputation (Moles, 2018).
Many gaps clearly require additional data to be collected. For example, the placement of species in the Open Tree of Life phylogeny based on taxonomic information could be improved by specialists evaluating the morphology and genetics of understudied taxa and reporting results in a standardized format (Deans et al., 2012), particularly if focused on genera that currently lack any molecular data.
Closing some gaps, such as the limited data in all aspects of tree biodiversity from the tropics, and our extremely limited knowledge of intraspecific differences, will require considerable investment in human and financial resources. Furthermore, investment in adequate maintenance of biodiversity collections would help prevent irreplaceable loss of preserved specimens and associated data (Escobar, 2018). However, declines in the number of botanists, the reluctance of high-impact journals to publish botanical work, and reluctance to fund such work are starting to limit our ability to examine new or understudied taxa (Crisci et al., 2020), despite many tree species still awaiting description (Cheek et al., 2020; Slik et al., 2015).

Technological advances allow filling some of these gaps faster and cheaper than we could in the past. For example, DNA barcoding and other molecular techniques can be used for the identification of species and populations, with important applications for conservation, ecology, and evolution (Kress et al., 2015).
Furthermore, high-throughput technologies such as near-infrared spectroscopy (NIRS) promise rapid, large-scale measurements of structural and chemical traits of leaves and wood (Ramirez et al., 2015). In addition, recent developments in remote-sensing techniques, such as LiDAR, allow mapping of trees and vegetation types (Schut et al., 2014; Vaughn et al., 2012) at large spatial extents, and faster quantification of forest functions, such as carbon storage (Saatchi et al., 2011) and leaf traits (Asner et al., 2017), especially when combined with statistical modeling (Butler et al., 2017). High-resolution, remotely-sensed imagery from satellites, planes, and unmanned aerial vehicles is making detailed mapping of individual trees increasingly feasible (Baena et al., 2017).
Although these data have limits, e.g., the extent of imagery available and the number of species that can be unambiguously identified, they can play a key role in mapping and detecting change in tree distributions (Vaughn et al., 2012; Baena et al., 2017) and generating geo-referenced occurrence records throughout the entire range of tree species.

| Potential applications
Global, integrated tree biodiversity data have wide-ranging applications [...] TanDEM-X would facilitate remotely-sensed estimates of forest structure and biomass of unprecedented accuracy (Qi et al., 2019), and could be integrated with actual plot-based measurements to provide ground-truthing and improved accuracy. Comparing such a global map of tree height with potential maximum tree height, which could be obtained by combining data on species distributions (e.g., from GBIF or GIFT) and maximum tree height (e.g., from TRY), could provide new insights into the factors limiting tree height when related to topographic, climatic, and geological conditions.
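The comparison proposed here can be sketched as a simple join between regional species lists and species-level maximum heights. In the Python sketch below, the function names, region labels, species labels and all height values are hypothetical placeholders; real inputs would come from sources such as GBIF or GIFT (distributions), TRY (maximum height) and a remotely-sensed height product.

```python
def potential_max_height(region_species, species_max_height):
    """Return, per region, the tallest recorded maximum height among its species.

    Species without a height record are skipped; regions with no usable
    species are omitted from the result.
    """
    result = {}
    for region, species in region_species.items():
        heights = [species_max_height[s] for s in species if s in species_max_height]
        if heights:
            result[region] = max(heights)
    return result


def height_gap(potential, observed):
    """Return potential minus observed height for regions present in both."""
    return {r: potential[r] - observed[r] for r in potential if r in observed}


# Hypothetical inputs, for illustration only (heights in metres).
region_species = {
    "region_1": ["species_a", "species_b"],
    "region_2": ["species_b", "species_c"],
    "region_3": ["species_d"],  # no height record available
}
species_max_height = {"species_a": 60.0, "species_b": 35.0, "species_c": 20.0}
observed_height = {"region_1": 45.0, "region_2": 33.0}

potential = potential_max_height(region_species, species_max_height)
gaps = height_gap(potential, observed_height)
print(gaps)
```

Regions with a large gap would be candidates for asking which topographic, climatic or geological conditions keep trees below the height their species pool could reach; missing trait records lower the potential estimate and should be tracked separately.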
Refugia, i.e., places that provide buffering from landscape-scale trends in climate change, have facilitated the persistence of biodiversity in the past and are considered increasingly important for conservation (Keppel et al., 2015). However, we still know little about the ecological and evolutionary functioning of refugia and an integrated functional traits and molecular approach has been proposed to address this. By integrating data from GBIF, TRY and GenBank, this data gap could be addressed on a global scale by comparing the functional and phylogenetic characteristics inside and outside of refugia. Furthermore, data on the demography and conservation status of tree species could provide insights about the performance of tree populations within refugia.

| CONCLUSIONS
Considerable amounts of data for a global assessment of tree biodiversity are available and have important applications for understanding large-scale patterns and processes of vegetation. Although important data gaps remain, a comprehensive, integrated synthesis of tree biodiversity data is feasible. We show that such an approach, especially when combined with gap-filling approaches, can provide an increasingly complete understanding of biodiversity (species, functional and genetic) patterns, community dynamics, and ecological interactions of vegetation and biodiversity.
Future macroecological studies utilizing biodiversity data would greatly benefit from all accessible data being available in a framework based on database structures and standard terminology that facilitate interoperability. Such a framework could be enhanced by continuous integration of key resources from evolving taxonomies, fair and equitable data-sharing principles, and stored and new data from herbaria, museums, and libraries made available through ongoing data mobilization efforts. These efforts will require the development of interoperable standard terminology for all aspects of biodiversity (Figure 1) and could expand on existing standards. Due to the scale dependence of biodiversity and its environmental drivers (Chase et al., 2018; Ibanez et al., 2018), it is particularly important that data can be aggregated at multiple spatial grains and that the development of appropriate statistical techniques continues (Tovo et al., 2017; McGlinn et al., 2019).
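Aggregation at multiple spatial grains can be illustrated by counting species per grid cell at two cell sizes. The Python sketch below assumes occurrence records given as (species, latitude, longitude) tuples and uses square cells defined in degrees; the function name `richness_by_grain` and the toy records are hypothetical.

```python
import math
from collections import defaultdict


def richness_by_grain(occurrences, grain_deg):
    """Count distinct species per square grid cell of side grain_deg degrees.

    occurrences: iterable of (species, latitude, longitude) tuples.
    Returns a dict mapping (row, column) cell indices to species richness.
    """
    cells = defaultdict(set)
    for species, lat, lon in occurrences:
        cell = (math.floor(lat / grain_deg), math.floor(lon / grain_deg))
        cells[cell].add(species)
    return {cell: len(names) for cell, names in cells.items()}


# Hypothetical occurrence records, for illustration only.
occurrences = [
    ("species_a", 0.2, 0.3),
    ("species_b", 0.7, 0.4),
    ("species_a", 1.5, 1.5),
    ("species_c", 1.6, 0.2),
]

fine = richness_by_grain(occurrences, grain_deg=1.0)
coarse = richness_by_grain(occurrences, grain_deg=2.0)
print("1-degree cells:", fine)
print("2-degree cells:", coarse)
```

The same records yield a maximum richness of two species per cell at the fine grain and three at the coarse grain, which is the scale dependence referred to above. Cells defined in degrees are not equal in area, so an equal-area grid would be preferable for a real analysis.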
An integrated, accessible framework would have broad applications, but outputs need to become more accessible to decision makers and end users. The framework, combined with imputation and modeling, would provide an increasingly comprehensive picture of biodiversity and macroecological patterns. Furthermore, data could be readily investigated to facilitate targeted and efficient data collection to fill gaps and move us closer to a complete picture. However, for these outputs to be efficiently applied in the management and conservation of biodiversity, a platform that makes this up-to-date information readily available to end users is needed. Therefore, the approach used here to comprehensively integrate existing tree data has broad applications for improved understanding and conservation of global biodiversity.

ACKNOWLEDGEMENTS
We thank the BGCI and its members for providing data on the global distributions of trees. We also appreciate the support of sPlot, the Global Vegetation-Plot database, a platform of iDiv, the German