Assessing the reliability of predicted plant trait distributions at the global scale

Abstract
Aim: Predictions of plant traits over space and time are increasingly used to improve our understanding of plant community responses to global environmental change. A necessary step forward is to assess the reliability of global trait predictions. In this study, we predict community mean plant traits at the global scale and present a systematic evaluation of their reliability in terms of the accuracy of the models, ecological realism and various sources of uncertainty.
Location: Global.
Time period: Present.
Major taxa studied: Vascular plants.
Methods: We predicted global distributions of community mean specific leaf area, leaf nitrogen concentration, plant height and wood density with an ensemble modelling approach based on georeferenced, locally measured trait data representative of the plant community. We assessed the predictive performance of the models, the plausibility of predicted trait combinations, the influence of data quality, and the uncertainty across geographical space attributed to spatial extrapolation and diverging model predictions.
Results: Ensemble predictions of community mean plant height, specific leaf area and wood density resulted in ecologically plausible trait–environment relationships and trait–trait combinations. Leaf nitrogen concentration, however, could not be predicted reliably. The ensemble approach was better at predicting community trait means than any of the individual modelling techniques, which varied greatly in predictive performance and led to divergent predictions, mostly in African deserts and the Arctic, where predictions were also extrapolated. High data quality (i.e., including intraspecific variability and a representative species sample) increased model performance by 28%.
Main conclusions: Plant community traits can be predicted reliably at the global scale when using an ensemble approach and high‐quality data for traits that mostly respond to large‐scale environmental factors. We recommend applying ensemble forecasting to account for model uncertainty, using representative trait data, and more routinely assessing the reliability of trait predictions.


| INTRODUCTION
Global trait-based models have proliferated in recent years owing to the increasing availability of plant trait data (Kattge, 2019). Fitting and projecting trait-environment relationships over large spatial scales is becoming increasingly common to study trait trade-offs (e.g., Díaz et al., 2016), to relate traits to environmental gradients (e.g., Moles et al., 2009; Wright et al., 2005) and to describe geographical patterns of traits (e.g., Madani et al., 2018; Yang et al., 2016). Trait-based models can not only increase our understanding of trait-environment relationships, but can also allow us to estimate how plant traits might respond to global environmental change (Bjorkman et al., 2018). This, in turn, is considered particularly useful for predicting the impact of environmental change on vegetation (Webb, Hoeting, Ames, Pyne, & LeRoy Poff, 2010), because trait-based models allow plant fitness to be linked directly to environmental filters, including climate, soil properties and disturbance (Keddy, 1992). Recent attempts to model global plant trait distributions as a function of environmental conditions have yielded different trait patterns (e.g., Butler et al., 2017; Madani et al., 2018; Moreno-Martínez et al., 2018; Van Bodegom, Douma, & Verheijen, 2014).
For example, Butler et al. (2017) and Moreno-Martínez et al. (2018) predicted specific leaf area to be low in western Canada and high in northern Russia and eastern Brazil, whereas Van Bodegom et al. techniques. The first difference translates to the use of global species trait averages, which may cause a mismatch of trait and environmental data. The second and third differences may result in different spatial patterns of uncertainty owing to extrapolations of trait-environment relationships (Thuiller, Brotons, Araujo, & Lavorel, 2004). The fourth difference may produce considerable variation among predictions (Thuiller, Guéguen, Renaud, Karger, & Zimmerman, 2019). Furthermore, the ecological realism of the combination of predicted plant traits needs to be tested against observed trait combinations, which has, to our knowledge, not yet been done at the global scale. Additionally, although all studies reported the variance explained by the trait-based models, the predictability of independent samples (i.e., data not used to train the models) has not yet been assessed thoroughly. A necessary step forward to increase macroecological insights into global trait-environment relationships with potential application in ecological impact or conservation assessments (e.g., Lavorel & Garnier, 2002; Madani et al., 2018) is to perform a thorough assessment of the reliability of global plant trait predictions.
Here, we predict community mean plant traits at the global scale and present a systematic evaluation of the reliability of the predictions in terms of the models' accuracy, ecological realism and various sources of uncertainty. We systematically selected locally measured, representative data focusing on four widely studied plant traits (specific leaf area, leaf nitrogen concentration, height and wood density). These traits reflect the global spectra in plant form and function, are responsive to the abiotic environment and show physical trade-offs with other traits (Table 1; Díaz et al., 2016; Lavorel & Garnier, 2002). For each of the traits, we calculated community mean trait values and predicted global patterns at a 0.5° resolution (c. 55 km × 55 km at the equator) using an ensemble modelling approach based on two regression and two machine learning techniques. Subsequently, we evaluated the predictive performance of the models and assessed their ecological plausibility in terms of the trait-environment relationships and the correlations and combinations of individually predicted community mean trait values (Díaz et al., 2016; Lavorel & Garnier, 2002). Finally, we evaluated the effect of various sources of uncertainty: (a) the effect of data quality in terms of representativeness of the sampled species to the entire plant community and the use of global species trait averages versus local trait measurements; (b) the uncertainty across geographical space attributed to extrapolation of traits outside the applicability domain (i.e., the geographical area with environmental variation covered by the environmental variation of the trait data); and (c) the uncertainty across geographical space owing to discrepancies among the predictions of the four modelling techniques. We build upon this assessment to provide guidelines for the further development, interpretability and usability of global trait-based models.

| Plant functional traits
We selected four plant functional traits: specific leaf area (SLA; in square millimetres per milligram), leaf nitrogen concentration (LNC; in milligrams per gram), height (in metres) and wood density (in milligrams per cubic millimetre). SLA and LNC are both linked to photosynthetic capacity and nutrient investment, where a high SLA and high LNC represent a fast return on investment at the expense of a shorter life span. Plant height is considered to be indicative of the ecological strategy of carbon distribution, indirectly determining growth and reproduction and their initial response to climate change (Moles et al., 2009). Last, wood density is a measure of carbon investment, representing a trade-off between growth (e.g., to overcome light limitation) and strength (e.g., mechanical support and drought tolerance), with higher wood density reflecting slower growth but increased strength at a similar stem diameter (Chave et al., 2009; Larjavaara & Muller-Landau, 2010).
The selected traits are easily measured according to standardized measurement procedures (Perez-Harguindeguy et al., 2013), which facilitates the integration of data from multiple datasets and reduces trait variation caused by seasonal changes (Bloomfield et al., 2018). To balance the number of observations against the consistency of the measurement methods, we included SLA measurements on both sun-leaves and shade-leaves, wood density measurements on both heartwood and sapwood, and plant height measurements on both vegetative and generative plant organs (see also Siefert et al., 2015).

| Data collection and selection procedure
Our main source for plant trait data was the TRY database (Kattge et al., 2011), from which we received 964,464 trait records on 85,437 species from 168 datasets. We also obtained data from the Tundra Trait Team (TTT) database (Bjorkman et al., 2018) and from various other published and unpublished datasets (Supporting Information Appendix S1). A list of data sources is provided in the Appendix. All species names were standardized using The Plant List (2013).
We selected trait observations based on six criteria. First, we included only georeferenced observations in order to enable a meaningful link with environmental covariates. Second, we considered only real measurements of plant traits (i.e., no species-level averages) in order to include intraspecific trait variation, which can contribute substantially to the trait variation within and between communities (Albert et al., 2010; Bloomfield et al., 2018; Siefert et al., 2015). Third, we considered only measurements obtained from natural vegetation, to minimize the influences of local management practices and legacy effects of historical land use (Perring et al., 2017). Fourth, we included observations only from studies that measured all or the most abundant species present in the entire plant community or in the dominant vegetation structure, in order to account for the representativeness of the sampled species for the plant community. This is in line with previous studies on this topic (e.g., Poorter et al., 2017) and in accordance with the biomass ratio hypothesis (Grime, 1998).
Fifth, we included observations only from studies that targeted all life stages and/or size classes and from studies that targeted only adults. Thus, we excluded observations from studies focusing only on early-successional plant communities and studies measuring only seedlings or juveniles in a more established vegetation, in order to reduce confounding effects of ontogeny (e.g., Thomas & Bazzaz, 1999) and succession (e.g., Purschke et al., 2013).

| Data processing
We checked and corrected for possible errors in our database, such as duplicates, coordinate and unit inaccuracies and outliers (see Supporting Information Appendix S2). We then calculated location-specific community means per trait and per study to represent the mean response of the plant community to the local environment (Ackerly & Cornwell, 2007). We used unweighted rather than abundance-weighted community means, mainly because previous studies were not conclusive on the superiority of abundance-weighted means and because > 50% of our data did not include species abundances (for further discussion, see Supporting Information Appendix S3). We averaged the location-specific community means to 0.5° grid cells (Figure S4).
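The aggregation described above can be illustrated with a minimal Python sketch. The record layout `(study, lat, lon, species, value)` and the function names are hypothetical, and averaging measurements per species before taking the unweighted community mean is an assumption; the text only states that unweighted, location-specific community means were computed per study and then averaged per 0.5° grid cell.

```python
import math
from collections import defaultdict
from statistics import mean


def grid_cell(lat, lon, res=0.5):
    """Return the (row, col) index of the res-degree cell containing a point."""
    return (math.floor((lat + 90.0) / res), math.floor((lon + 180.0) / res))


def community_means(records):
    """Unweighted mean trait value per (study, lat, lon) community.

    `records` holds (study, lat, lon, species, value) tuples (hypothetical
    layout). Assumption: measurements are averaged per species first, so
    heavily sampled species do not dominate the community mean.
    """
    per_species = defaultdict(list)
    for study, lat, lon, species, value in records:
        per_species[(study, lat, lon, species)].append(value)
    per_community = defaultdict(list)
    for (study, lat, lon, _species), values in per_species.items():
        per_community[(study, lat, lon)].append(mean(values))
    return {key: mean(vals) for key, vals in per_community.items()}


def grid_means(comm_means, res=0.5):
    """Average all community means that fall in the same grid cell."""
    cells = defaultdict(list)
    for (_study, lat, lon), value in comm_means.items():
        cells[grid_cell(lat, lon, res)].append(value)
    return {cell: mean(vals) for cell, vals in cells.items()}
```

Two communities from different studies that fall in the same 0.5° cell contribute equally to the cell value here, regardless of how many species or individuals each study measured.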

| Environmental data
We considered environmental variables that are expected to affect plant performance (e.g., Kimball, Gremer, Angert, Huxman, & Venable, 2012). This allowed us to check the ecological plausibility of the resulting trait-environment relationships. Based on ecological relevance, we selected bioclimatic variables from CHELSA v.1.2 (Karger et al., 2017).
Given that water availability to plants is not determined by precipitation alone, we calculated the aridity index as the mean annual precipitation divided by the mean annual potential evapotranspiration, which in turn was calculated using the Penman-Monteith model (Zomer, Trabucco, Bossio, & Verchot, 2008). However, given that lower values of the aridity index indicate higher aridity, to avoid confusion this predictor will be referred to as the "humidity index" from now on. Furthermore, we selected soil characteristics from SoilGrids250m (Hengl et al., 2017) and resampled them to a resolution of 0.5° to match the resolution of the plant community mean trait data. We averaged the soil data to a depth of 30 cm, which we considered most relevant for community composition via plant establishment and by influencing plants in later life stages, for example, through the potentially high nutrient availability (e.g., Vitousek & Sanford, 1986).
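As a small illustration of the two derived predictors described above, the following Python sketch computes the humidity index as precipitation divided by potential evapotranspiration and averages a soil property over the top 30 cm. The function names, the handling of missing or zero evapotranspiration, the example depths and the use of the trapezoidal rule for depth averaging are all assumptions; the text does not specify how the depth average was taken.

```python
def humidity_index(annual_precip_mm, annual_pet_mm):
    """Mean annual precipitation divided by mean annual potential
    evapotranspiration. Returns None for missing or non-positive PET
    (assumption: such cells are treated as having no data)."""
    if annual_precip_mm is None or annual_pet_mm is None or annual_pet_mm <= 0:
        return None
    return annual_precip_mm / annual_pet_mm


def depth_weighted_mean(depths_cm, values, max_depth_cm=30.0):
    """Trapezoidal average of a soil property from the surface to max_depth_cm.

    `depths_cm` must be sorted, start at 0 and include max_depth_cm.
    The trapezoidal rule is an illustrative choice, not the confirmed method.
    """
    pairs = [(d, v) for d, v in zip(depths_cm, values) if d <= max_depth_cm]
    if len(pairs) < 2 or pairs[0][0] != 0 or pairs[-1][0] != max_depth_cm:
        raise ValueError("depths must span 0 to max_depth_cm")
    area = sum(
        (d1 - d0) * (v0 + v1) / 2.0
        for (d0, v0), (d1, v1) in zip(pairs, pairs[1:])
    )
    return area / max_depth_cm
```

Higher values of `humidity_index` indicate wetter conditions, which matches the renaming of the aridity index in the text.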
To avoid collinearity, we reduced the number of predictors based

| Model fitting and validation
For each relationship between the plant traits and environmental variables, we formulated expectations on the shape of the response based on existing literature (Table 1). To quantify the relative importance of each predictor in a consistent way across the models, we predicted traits using permuted values for the predictor of concern, correlated those predictions with predictions of the model using the original data and quantified relative variable importance as one minus the Spearman rank correlation coefficient (Thuiller, Lafourcade, Engler, & Araújo, 2009).
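The permutation-based importance measure described above can be sketched in plain Python as follows. The representation of grid cells as dictionaries, the `predict` callable and the fixed random seed are illustrative assumptions, as is the choice to treat constant predictions as uncorrelated; only the definition (one minus the Spearman rank correlation between original and permuted predictions) comes from the text.

```python
import random


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_importance(predict, rows, column, seed=0):
    """One minus the Spearman correlation between predictions on the
    original rows and on rows with `column` randomly permuted."""
    rng = random.Random(seed)
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted = [dict(row, **{column: v}) for row, v in zip(rows, shuffled)]
    original_pred = [predict(row) for row in rows]
    permuted_pred = [predict(row) for row in permuted]
    return 1.0 - spearman(original_pred, permuted_pred)
```

A predictor that the model ignores yields an importance close to 0, because permuting it leaves the ranking of predictions unchanged. Because the measure depends only on predictions, it can be applied in the same way to regression and machine learning models.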

| Testing sensitivity to data quality
To test the influence of data quality on the performance of the trait models, we created three alternative datasets differing in terms of trait values (i.e., including intraspecific trait variation versus using species-specific trait values) and in terms of species representativeness (i.e., a representative sample of species versus the random selection of one species in the plant community; Supporting Information, Figure S7).
For each alternative dataset, we fitted the four different models for each of the four traits, following the procedure described above for our default dataset. We then evaluated each model separately by comparing its predictions with the observed community means of the full dataset (Supporting Information Appendix S7).
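The default and three alternative datasets can be thought of as the four combinations of two switches. The Python sketch below illustrates this under stated assumptions: the record layout `(site, species, value)` and the function names are hypothetical, and species-specific trait values are approximated here by averaging over the records supplied, whereas the study may have derived them from a wider database.

```python
import random
from collections import defaultdict
from statistics import mean


def species_means(records):
    """Species-level average trait value across all sites."""
    by_species = defaultdict(list)
    for _site, species, value in records:
        by_species[species].append(value)
    return {sp: mean(vals) for sp, vals in by_species.items()}


def community_means_variant(records, intraspecific=True, representative=True, seed=0):
    """Community mean per site under one data-quality scenario.

    intraspecific=False replaces local measurements by species averages;
    representative=False keeps a single randomly chosen species per site.
    """
    rng = random.Random(seed)
    global_means = species_means(records)
    by_site = defaultdict(lambda: defaultdict(list))
    for site, species, value in records:
        by_site[site][species].append(value)
    result = {}
    for site, by_species in by_site.items():
        species = sorted(by_species)
        if not representative:
            species = [rng.choice(species)]
        if intraspecific:
            vals = [mean(by_species[sp]) for sp in species]
        else:
            vals = [global_means[sp] for sp in species]
        result[site] = mean(vals)
    return result
```

Calling the function with both switches left at `True` corresponds to the default dataset; the other three combinations correspond to the alternative datasets.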

| Spatial predictions and their assessment
To derive spatial predictions per trait, we used an ensemble forecasting procedure, which averages the predictions of the four algorithms weighted by their cross-validated pseudo-R² values (Marmion, Parviainen, Luoto, Heikkinen, & Thuiller, 2009 exist, but merely that they did not occur in the input data and should therefore be interpreted with caution. To identify the applicability domain, we calculated and mapped the multivariate environmental similarity surface (Elith, Kearney, & Phillips, 2010). This analysis quantifies, per grid cell, the differ-
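The weighted averaging step of the ensemble forecast can be written compactly. This Python sketch assumes that each model supplies one prediction per grid cell and that negative pseudo-R² values are clipped to zero before weighting; the clipping rule, the function name and the data layout are assumptions, not part of the described method.

```python
def ensemble_prediction(predictions, weights):
    """Weighted mean across models for each grid cell.

    predictions: dict mapping model name -> list of per-cell predictions
    weights: dict mapping model name -> cross-validated pseudo-R2
    Assumption: negative weights are clipped to zero.
    """
    models = sorted(predictions)
    clipped = {m: max(weights[m], 0.0) for m in models}
    total = sum(clipped.values())
    if total <= 0:
        raise ValueError("at least one model needs a positive weight")
    n_cells = len(predictions[models[0]])
    return [
        sum(clipped[m] * predictions[m][i] for m in models) / total
        for i in range(n_cells)
    ]
```

For example, with two models weighted 0.25 and 0.75 and cell predictions of 1.0 and 3.0, the ensemble value is 2.5, which lies closer to the better-performing model.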

| Predicted global plant trait variation
The variation explained by the environmental variables varied among traits (Figure 1). Model predictive performance was highest for plant height, followed by wood density, SLA and LNC (Figure 1). The wood density for all woody vegetation in a location was predicted to be particularly low in areas with low PrecSeas (Figure 2g).

| Trait-environment relationships
In our models, community mean SLA was mostly explained by HumInd and Tmin (Figure 3a). As expected, SLA showed a unimodal albeit mostly decreasing response to Tmin, an increase with HumInd and a flat response to PrecSeas. However, we did not find the expected response of SLA to soil CEC and soil pH (Table 1; Figure 4). We also found a flat response of SLA to PrecDryQ (Figure 4). Community mean LNC was mostly explained by Tmin (Figure 3b). We expected LNC to decrease with Tmin, to increase with HumInd and to show a flat response to PrecSeas, but we found a unimodal response of LNC with Tmin around 0 °C, and a decrease of LNC with HumInd and PrecSeas (Table 1; Figure 4).
We also found LNC to increase with PrecDryQ and, as expected, to show a flat response to soil CEC and soil pH (Table 1; Figure 4).
Community mean plant height was mostly explained by PrecDryQ (Figure 3c). As expected, height increased with increasing Tmin and PrecDryQ, whereas in contrast to our original expectation, we found height to decrease with HumInd and to increase with PrecSeas (Table 1; Figure 4). Height showed a flat response to soil CEC and a unimodal response to soil pH. Community mean wood density was mostly explained by Tmin (Figure 3d). As expected, wood density increased with Tmin, decreased with HumInd and decreased overall with soil CEC (Table 1; Figure 4). Against our expectations, wood density showed a flat response to PrecSeas and soil pH (Table 1; Figure 4). Furthermore, we found a flat response of wood density to PrecDryQ (Figure 4).

| Combinations of predicted traits
The  Figure S9).

| Data quality
The predictive accuracy decreased by 11% on average across all traits and models when intraspecific trait variation was excluded (Figure 5; Supporting Information Table S7). When species were sampled randomly (i.e., an unrepresentative sample), the predictive accuracy of the models decreased on average by 19% (Figure 5). The combination of ignoring intraspecific trait variation and using a non-representative species sample amplified the reduction in accuracy to 28% compared with the default models (Figure 5).

| Applicability domain
Despite data paucity in large areas of the world (e.g., India, Asian Russia and Africa; Figure 6), the trait data covered a large part of the global environmental space (Figure 6; Supporting Information Figure S10). However, deserts, tropical islands and some parts of the Arctic were outside the environmental domain covered by the trait data.
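Whether a grid cell lies inside or outside the environmental domain of the trait data can be assessed with the multivariate environmental similarity surface of Elith et al. (2010). The Python sketch below follows the published definition as we understand it: for each variable, similarity is based on the percentage of reference values below the cell's value, and the overall score is the minimum across variables, with negative scores indicating extrapolation. The function names, the dictionary-based data layout and the handling of a constant reference variable are illustrative assumptions.

```python
def mess_variable(p, reference):
    """Similarity of value p to the reference values of one variable."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        # Assumption: a constant reference variable gives full similarity
        # only when the cell value matches it exactly.
        return 100.0 if p == lo else float("-inf")
    f = 100.0 * sum(1 for r in reference if r < p) / len(reference)
    if f == 0:
        return (p - lo) / (hi - lo) * 100.0
    if f <= 50:
        return 2.0 * f
    if f < 100:
        return 2.0 * (100.0 - f)
    return (hi - p) / (hi - lo) * 100.0


def mess(point, reference_table):
    """Minimum per-variable similarity; negative values mean that at least
    one predictor lies outside the range covered by the reference data.

    point: dict variable -> value for one grid cell
    reference_table: dict variable -> list of values at the trait locations
    """
    return min(mess_variable(point[v], reference_table[v]) for v in point)
```

Cells with a negative score would be flagged as extrapolated, in line with the deserts, tropical islands and Arctic regions reported above.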

| Model selection
Predictive performance differed between models per trait, where RF showed the highest predictive performance for SLA, LNC and wood density, and GLM for height (Figure 1). The ensemble predictions were always equal to or better than the individual models (Figure 1). The consistency in the predictions among the four modelling techniques varied geographically (Figure 2b,d,f,h). The coefficient of variation of SLA was mostly low, apart from some areas in Africa. LNC had an overall low coefficient of variation, except for the Sahara. Plant height showed a shifting pattern of high and low coefficients of variation over the globe. Wood density had a low coefficient of variation overall but was predicted with more uncertainty in parts of Africa and most of the Arctic.
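The per-cell coefficient of variation among model predictions can be computed as in the following Python sketch. The use of the sample standard deviation, the NaN returned for a zero mean and the placeholder model names `M3` and `M4` in the usage example are assumptions; the text names only RF and GLM at this point.

```python
from statistics import mean, stdev


def cv_across_models(predictions):
    """Coefficient of variation of model predictions for each grid cell.

    predictions: dict mapping model name -> list of per-cell predictions
    Assumption: sample standard deviation divided by the absolute mean.
    """
    models = sorted(predictions)
    n_cells = len(predictions[models[0]])
    out = []
    for i in range(n_cells):
        vals = [predictions[m][i] for m in models]
        m = mean(vals)
        out.append(stdev(vals) / abs(m) if m != 0 else float("nan"))
    return out


# Usage with placeholder model names for the two unnamed techniques.
example = {
    "GLM": [10.0, 8.0],
    "RF": [10.0, 12.0],
    "M3": [10.0, 8.0],
    "M4": [10.0, 12.0],
}
cv = cv_across_models(example)
```

In the usage example the first cell has identical predictions and therefore a coefficient of variation of zero, whereas the second cell, where models disagree, has a positive value.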

| Global trait patterns
The variance explained by our models predicting global community mean trait values along environmental gradients is comparable with other global or large-scale trait-based studies (Butler et al., 2017;Madani et al., 2018;Van Bodegom et al., 2014;Yang et al., 2016). Whenever trait predictions among different studies disagree, it is difficult to conclude which predictions are more reliable. One option is to validate global trait predictions against regional trait maps. For example, we found wood density to be higher in the east of the Amazon region compared with the north-west and south-west (Figure 2g), as was found by Baker et al. (2004). However, the paucity of regional trait maps makes this validation method impractical. Another option is to consider model predictive performances and the applicability domain to infer reliability of predictions. Unfortunately, the lack of quantification and indication of these in previous studies prevents us from drawing any conclusions about which predictions are more accurate.

| Ecological evaluation of global trait predictions
In general, we found community mean SLA, plant height and wood density to vary more with climatic factors than with soil  Figure S11). Furthermore, all predicted trait combinations and correlations were realistic.
Our results confirmed most of the expected relationships between SLA and the environmental variables (Table 1; Figure 4). However, we found SLA to decrease with soil CEC and to show a flat response to soil pH, whereas we expected SLA to increase with soil fertility because less durable structures are thought to be maintained with higher soil fertility (Table 1; Figure 4). The flat response of SLA with soil pH might be explained by a high nutrient turnover rate in areas of low soil fertility (Vitousek & Sanford, 1986), although it might also be simply that SLA does not respond to changes in soil pH (Firn et al., 2019).
The expected responses of LNC to the environmental variables were not found (Table 1; Figure 4). The partial responses of LNC showed great variation over small environmental ranges, possibly owing to the tendency of machine learning models to over-fit.
Together with the low predictive performance of the models, this might reflect that LNC responds primarily to small-scale environmental variation, whereas our models make predictions at a coarser resolution (55 km × 55 km). Our results thus support the deviating responses of LNC to environmental variables at the species level (Maire et al., 2015;Ordoñez et al., 2009;Reich & Oleksyn, 2004;Reich et al., 1996).
Additionally, LNC might be highly variable in relationship to multiple environmental factors, leading to non-universal adaptations to the predictors in our model (Bloomfield et al., 2018;Reich & Oleksyn, 2004). This variability makes it difficult to interpret LNC responses to large-scale environmental gradients biologically, especially given that they will vary depending on whether intraspecific variation is considered or not (Albert et al., 2010). We conclude that it is not possible to predict LNC distributions reliably at this extent and resolution.
Our results confirmed the expected relationships between average community height and Tmin and PrecDryQ (Table 1; Figure 4).
However, we found mean plant height to decrease with HumInd (Table 1; Figure 4). This might indicate that shorter vegetation (e.g., grasses, herbs and shrubs) is more abundant in more humid environments (i.e., higher annual rainfall and/or lower potential evapotranspiration (Borchert, 1998; Canadell et al., 1996).
Our results confirmed the expected relationships between wood density and Tmin, HumInd and soil CEC (Table 1; Figure 4). Both the increase in wood density with increasing Tmin and the unexpected flat response to PrecSeas might reflect that colder areas with less seasonal precipitation are dominated by soft-wooded gymnosperms, whereas warmer areas with higher seasonal precipitation are dominated by angiosperms with generally denser wood (Swenson & Enquist, 2007), although xylem vulnerability to frost-induced cavitation decreases with wood density (Reich, 2014). We expected a decrease in wood density with soil CEC and soil pH because higher soil fertility is expected to sustain higher growth rates. Although the overall trend confirmed our expectation, the response to pH was very limited, and at the lower end of the CEC gradient, wood density showed a small increase. This might indicate that extreme limitation to accessible soil nutrients cannot sustain high wood density, whereas low-fertility but not nutrient-limited environments promote slow growth, resulting in high wood density (Table 1). Given that no global trait-environment relationship has been described for wood density (Moles, 2018), these patterns should be investigated further.
Literature reports that some relationships between traits and environmental variables may vary with leaf habit, plant growth forms and photosynthetic pathway at the local scale (e.g., Šímová et al., 2018). Additionally, trait-environment relationships are affected by factors other than climate and soil properties, such as land-use type and disturbance (e.g., Chen et al., 2018). However, we expect that these factors are of limited relevance at the biological scale (communities instead of species), spatial resolution and extent considered, where environmental filtering effects are expected to be much less confounded by biotic interactions and fine-grain disturbances (Pearson & Dawson, 2003).

FIGURE 6 Locations of trait observations (black dots) and the environmental coverage of trait observations, that is, the multivariate environmental similarity surface (Elith et al., 2010), where blue represents interpolation and red represents extrapolation. More intense shades indicate greater similarities (blue) or differences (red).

| Data quality
We found that including intraspecific trait variation improved the predictability of traits (Figure 5; Supporting Information Table S7). This improvement in trait predictions at the global scale highlights the importance of considering intraspecific trait variation over and above the conclusion from small-scale studies that intraspecific trait variation contributes greatly to community trait variation (Albert et al., 2010; Bloomfield et al., 2018; Poorter, Castilho, Schietti, Oliveira, & Costa, 2018; Siefert et al., 2015). This indicates that widespread species are likely to show adaptability in their traits, changing them in order to optimize performance for different environments.
Additionally, our results emphasize the need to build community trait models on a representative sample of species (Figure 5; Supporting Information Table S7). Moreover, high-quality data not only improve models statistically, but also theoretically lead to different results. Thus, the inclusion of intraspecific trait variation of species representative of the local vegetation should be preferred when the aim is to predict plant community mean trait values.

| Applicability domain
The strict selection criteria we set greatly reduced the amount of available community trait data (Supporting Information Table S2).
Nevertheless, our dataset covered the major part of the global terrestrial environmental space, indicating a wide applicability domain of our models (Figure 6). However, trait predictions should be interpreted carefully for deserts, the Arctic and tropical islands because of the high variation in predictions between models, and for mountainous areas because of the high environmental variation within a grid cell. Furthermore, predictions for wood density are extrapolated to a larger extent in comparison to other traits. A reason for the fewer community mean data points for wood density is that species mean values are generally considered appropriate because interspecific variation in wood density is larger than the intraspecific variation; therefore, new wood density data are rarely collected (Siefert et al., 2015).

| Model selection
The selection of a specific modelling technique greatly affected the ability to predict traits, and no single "best" model could be identified (Figure 1).

| Reliability of global plant trait predictions
Our results suggest that plant community traits can be predicted reliably at the global scale when using an ensemble approach with high-quality data (i.e., including intraspecific trait variation and a representative species sample). We show that intraspecific trait variation and the representativeness of species considered in a community are important factors to consider, even at the global scale, and that an ensemble forecasting approach helps to deal with and quantify multiple types of uncertainty. Based on these results, we recommend the systematic and careful selection of data and modelling techniques for trait-environment models, and more routine assessment of their reliability based on model predictive performance, applicability domain, model uncertainty and realism of predicted trait combinations. Such systematic presentation of validation results and applicability domain in studies presenting predictions of spatial patterns of community mean traits will enhance our ability to build upon previous modelling attempts and improve our understanding of trait-environment relationships.
Our approach also led to new insights, such as the unexpected increase of community height with the seasonality of precipitation, but the lack of proper model assessment by previous studies limits our ability to draw any objective conclusions on the observed differences in trait responses. We suggest that higher predictive accuracy can be achieved for traits that respond primarily to large-scale environmental factors, such as specific leaf area, whereas predictive accuracy at this extent and resolution would be lower for traits such as LNC that might respond primarily to small-scale environmental