Do plant traits retrieved from a database accurately predict on-site measurements?

Authors

  • Verena Cordlandwehr,

    Corresponding author
    1. Institute of Biology and Environmental Sciences, University of Oldenburg, Oldenburg, Germany
    2. Community and Conservation Ecology, Centre for Ecological and Evolutionary Studies, Centre for Life Sciences, University of Groningen, Groningen, The Netherlands
  • Rebecca L. Meredith,

    1. Ecology, Evolution and Systematics, LMU Munich, Munich, Germany
  • Wim A. Ozinga,

    1. Department of Experimental Plant Ecology, Institute for Water and Wetland Research, Radboud University Nijmegen, Nijmegen, The Netherlands
    2. Centre for Ecosystem Studies, Alterra, Wageningen University and Research, Wageningen, The Netherlands
  • Renée M. Bekker,

    1. Community and Conservation Ecology, Centre for Ecological and Evolutionary Studies, Centre for Life Sciences, University of Groningen, Groningen, The Netherlands
  • Jan M. van Groenendael,

    1. Department of Experimental Plant Ecology, Institute for Water and Wetland Research, Radboud University Nijmegen, Nijmegen, The Netherlands
  • Jan P. Bakker

    1. Community and Conservation Ecology, Centre for Ecological and Evolutionary Studies, Centre for Life Sciences, University of Groningen, Groningen, The Netherlands

Correspondence author. E-mail: v.cordlandwehr@rug.nl

Summary

  1. Trait-based approaches are increasingly used to obtain an insight into the functional aspects of plant communities. Since measuring traits can be time-consuming, large international databases of plant traits are being compiled to share the effort. From these databases, average trait values are often extracted per species by averaging trait values of individuals over multiple populations and habitats. However, the accuracy of such aggregated information from regional databases as a surrogate for on-site measurements has seldom been tested.

  2. For the local species pool (aggregated at the habitat-level) and the plant communities on the plots (aggregated at the community-level), we quantified how accurately trait values for each species measured at the plot (plot scale) and those averaged per species and site (site scale) can be estimated from those retrieved from a North-west-European trait database. We analysed three widely used plant traits, canopy height (CH), leaf dry matter content (LDMC) and specific leaf area (SLA), of species occurring in a wet meadow and a salt marsh.

  3. Database values more accurately predicted traits aggregated at the habitat-level than those aggregated at the community-level. In addition, traits with lower plasticity, such as LDMC, were more accurately predicted by database values. The performance of database values also depended upon the habitat studied; for example, habitat-level SLA values were accurately predicted by database values in the wet meadow but inaccurately predicted in the salt marsh.

  4. Synthesis. This study reveals that the accuracy of traits retrieved from a database depends on the level of aggregation (lower at community-level), the trait (lower in plastic traits) and the habitat type (lower in extreme habitats). For studies focussing on processes mainly acting at the site scale (e.g. trait–environment relationships), traits retrieved from a regional database and filtered according to habitat will probably lead to good results. In contrast, studying processes acting at the plot scale (e.g. niche partitioning) requires the additional effort of measuring traits on-site.

Introduction

The increased availability of trait data on a vast number of species in databases (e.g. Fitter & Peat 1994; Hodgson et al. 1995; Wright et al. 2004; Wheeler, Baas & Rodgers 2007; Kleyer et al. 2008; Klimesova & de Bello 2009; Kattge et al. 2011) has facilitated ecologists in using trait-based approaches on large vegetation data sets (Baker et al. 2004; Wright et al. 2004; Ozinga et al. 2005; ter Steege et al. 2006; Swenson et al. 2007; Schumacher & Roscher 2009; Thompson et al. 2010; Pakeman 2011). A central assumption of applying species averages of trait data retrieved from a database is that the aggregation of trait data by species identities captures the majority of trait variation. This basic tenet, however, has seldom been tested (Kattge et al. 2011). The aggregation of multiple database entries per species implies averaging over multiple populations and habitats, which are spread over varying altitudes and longitudes. However, species traits show intraspecific variability between sites, caused by plastic reactions to differences in environmental conditions (e.g. Garnier et al. 2001; Mokany & Ash 2008), genotypic diversity (e.g. Whitlock, Grime & Burke 2010) or a combination of both (e.g. Grassein, Till-Bottraud & Lavorel 2010; Scheepens, Frei & Stocklin 2010).

The impact of intraspecific trait variability on trait-based assembly rules is currently an active point of discussion (Albert et al. 2010b, 2012; Hulshof & Swenson 2010; Jung et al. 2010; Messier, McGill & Lechowicz 2010). Until now, only a few studies have quantified the accuracy of traits measured within the site as a proxy for local on-site measurements, finding both poor (Baraloto et al. 2010) and good correspondence (Cornwell, Schwilk & Ackerly 2006). To our knowledge, there is no study quantifying the accuracy of common traits retrieved from a database for use as a proxy (but see Vivian, Doherty & Cary 2010 for fire traits).

In the present study, we focus on the possibilities and limitations of using average trait values of plant species retrieved from a North-west-European trait database (LEDA) as a surrogate for on-site measurements. We used three morphological traits: canopy height (CH), leaf dry matter content (LDMC) and specific leaf area (SLA) in two grassland sites representing two habitat types. These traits are commonly used in functional and community ecology (Lavorel et al. 2008). CH is an important morphological factor for the competitive vigour of a species (Gaudet & Keddy 1988; Tilman 1988). LDMC is associated with leaf life span, as more conservative species have a higher mass density of leaf tissue (Weiher et al. 1999; Ryser & Urbas 2000) and is also correlated with community litter decomposition rates (Quested et al. 2007; Laughlin et al. 2010). SLA is tightly correlated with photosynthetic capacity and leaf life span (Ryser & Urbas 2000; Wright et al. 2004; Shipley 2006).

We addressed two questions in this study: (i) Can habitat species pool traits retrieved from a regional database be used as a proxy for habitat species pool traits measured at the site? And (ii) Can plant community traits calculated using species trait values retrieved from a database and those calculated using species trait values measured at the site be used as a proxy for community traits measured at the plot? We analysed our data using two levels of aggregation (habitats and communities) and three spatial scales of trait value origin (measurements at the plot as well as at the site and derived from a regional database) (see Fig. 1). Given the large growth and increasing usage of trait databases, it is important to know with what accuracy these data can be used in studying assembly rules or performing ecosystem research. With our study, we contribute to the discussion on when it is possible to use average trait values of species from regional or global databases as an accurate proxy for on-site trait values of individuals. This would have the advantage of conducting local-level experiments without the often laborious task of measuring the functional traits of individual plants for each sampled community.

Figure 1.

Definition of the aggregation levels of the trait data and the spatial scales of the origin of trait values (applicable for all three traits and both sites). 1: Increasing accuracy of trait data, from averaging trait values per species over multiple habitats (wide range of ecotypic variation within species) to averaging trait values per species within sites (thus within habitats) and within plots (thus within plot conditions); 2: Increasing spatial scale of trait value origin and thus potentially increasing distance to the actual trait values in the field; 3: Two levels of aggregation: Aggregation across the habitat species pool (thus including the filtering effects of site conditions on trait values) versus aggregation across co-occurring species in the local community (thus including the filtering effects of both local environmental conditions and local interactions on trait values).

Materials and methods

Sites

We examined two grassland sites: a wet meadow and a salt marsh. The wet meadow was located in the watershed of the Drentsche Aa in the Netherlands (53°2′ N, 6°39′ E). In total, 23 plots of 2 m × 2 m each were sampled in an area of about 40 ha. These plots were arranged along an elevation gradient perpendicular to the banks of the brook. The salt marsh was located on the Wadden Sea island of Schiermonnikoog in the Netherlands (53°30′ N, 6°10′ E). We sampled the vegetation within 47 plots of 2 m × 2 m each in an area of about 60 ha. These plots covered the gradient from low to high salt-marsh. The salt marsh represents a habitat with extreme abiotic conditions and thus, strong environmental filtering, which results in a small species pool consisting of specialized species. In contrast, the wet meadow represents a mesic habitat hosting a vast pool of species.

Within each plot, above-ground cover and abundance of each species were estimated using the decimal scale of Londo (1976). Vegetation recordings were made during the peak of the growing season in June and July of 2008 (for full species lists, see Table S4a–b in Supporting Information). The nomenclature follows Van der Meijden (1996) and European synonyms can be checked with the SynBioSys species checklist (Schaminée, Hennekens & Ozinga 2007). The 23 wet meadow plots contained a total of 72 species, which represents the species pool of this site, and a range of 10–34 species (mean = 23) in each plot. The species pool for the salt-marsh site consisted of 47 species, recorded across the 27 salt-marsh plots with both leaf trait and canopy height measurements and the additional 20 salt-marsh plots with only canopy height measurements. Species number per plot ranged from 1 to 19 species (mean = 8). For the analysis at the community-level, plots with plot scale trait data for fewer than three species were excluded (wet meadow: n = 23, salt marsh: n = 46 for CH and n = 20 for leaf traits).

Trait measurements

Species trait data were collected from each plot at both sites. Individuals used for measurements were selected at random. All trait measurements were taken following the standard protocols of the LEDA-trait base (Knevel et al. 2003). CH was measured for three individuals of each species present per plot. We measured CH from the base of the plant to the highest photosynthetic tissue (m). For the leaf trait measurements, three leaves from three different individuals (total of nine leaves) were collected from each species present per plot and processed in the laboratory. We sampled all species at the flowering stage to compare all species in the same phenological stage, as is the standard procedure in trait databases.

For the wet meadow, we sampled the trait data in two phases: early flowering species in May and late flowering species in July of 2008. Sampled plant material was put immediately between moist paper sheets in a self-sealing plastic bag and stored in a freezer at −18 °C after sorting and cleaning in the laboratory.

In the salt marsh, trait data were sampled in June 2008. Sampled plant material was stored at 8 °C between moist paper sheets in self-sealing plastic bags and processed within 3 days.

Processing of the leaves involved scanning them with the HP Image & Scanning Program (2009), measuring their fresh weight, drying them in the oven at 70 °C for 24 h, and finally measuring their dry weight. Leaf area (cm²) was determined using the software Lafore (Lehsten 2005). LDMC was calculated as the ratio of dry to fresh leaf weight (mg g⁻¹), and SLA was calculated by dividing the area of the fresh leaf by the dry weight (mm² mg⁻¹) (for trait data, see Table S4a–b).
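The two leaf-trait calculations described above can be illustrated with a short Python sketch. The function name, argument names and the example leaf are hypothetical and are not taken from the study; the only steps assumed are the two ratios given in the text and the conversion from cm² to mm².

```python
def leaf_traits(fresh_mass_g, dry_mass_mg, area_cm2):
    """Return (LDMC in mg g^-1, SLA in mm^2 mg^-1) for one leaf sample.

    Hypothetical helper: LDMC = dry mass / fresh mass,
    SLA = fresh leaf area / dry mass, as described in the text.
    """
    if fresh_mass_g <= 0 or dry_mass_mg <= 0:
        raise ValueError("leaf masses must be positive")
    area_mm2 = area_cm2 * 100.0          # 1 cm^2 = 100 mm^2
    ldmc = dry_mass_mg / fresh_mass_g    # mg dry mass per g fresh mass
    sla = area_mm2 / dry_mass_mg         # mm^2 leaf area per mg dry mass
    return ldmc, sla


# Invented example leaf: 0.25 g fresh, 50 mg dry, 12 cm^2 scanned area.
ldmc, sla = leaf_traits(fresh_mass_g=0.25, dry_mass_mg=50.0, area_cm2=12.0)
print(f"LDMC = {ldmc:.0f} mg/g, SLA = {sla:.1f} mm2/mg")
```

Keeping the unit conversion inside the function avoids mixing the cm² reported by the scanning software with the mm² mg⁻¹ used for SLA.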

Data analysis

We aggregated the trait data at two levels for each site: the habitat-level (i.e. the species pool) and the community-level (i.e. the plant communities in the 2 m × 2 m plots) (Fig. 1). In order to allow for comparisons between scales, we used trait values originating from three different scales for both the species and communities in our sites: (i) plot; (ii) site; (iii) regional database, the LEDA-trait database (Kleyer et al. 2008), but after excluding all measurements ever derived from our two sites (Fig. 1). To increase the sample size of trait measurements taken per species, we augmented the data retrieved from the LEDA database with unpublished measurements taken by the Landscape Ecology group at the University of Oldenburg as well as from recent literature (Kühner 2004; Schadek 2006; Lehsten 2009; Jung et al. 2010; Minden & Kleyer 2011).
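As a rough illustration of the three scales of trait value origin, the following Python sketch averages individual measurements per species and plot (plot scale), per species over the whole site (site scale), and per species over database records after dropping records that originate from the study sites. All records, site labels and function names are invented for the example, and averaging over individuals (rather than over plot means) is an assumption, since the text does not specify this.

```python
from collections import defaultdict
from statistics import mean


def group_mean(records, key):
    """Average the last field of each record within groups defined by key(record)."""
    grouped = defaultdict(list)
    for record in records:
        grouped[key(record)].append(record[-1])
    return {group: mean(values) for group, values in grouped.items()}


# Hypothetical on-site measurements: (plot, species, canopy height in m).
on_site = [
    ("P1", "Holcus lanatus", 0.40), ("P1", "Holcus lanatus", 0.50),
    ("P2", "Holcus lanatus", 0.60), ("P2", "Holcus lanatus", 0.70),
    ("P1", "Ranunculus repens", 0.20), ("P2", "Ranunculus repens", 0.30),
]
# Hypothetical database records: (origin, species, canopy height in m).
database = [
    ("elsewhere A", "Holcus lanatus", 0.75),
    ("elsewhere B", "Holcus lanatus", 0.85),
    ("study site", "Holcus lanatus", 0.55),   # excluded below
]
study_sites = {"study site"}

plot_means = group_mean(on_site, key=lambda r: (r[0], r[1]))   # plot scale
site_means = group_mean(on_site, key=lambda r: r[1])           # site scale
db_means = group_mean(                                          # database scale
    [r for r in database if r[0] not in study_sites],
    key=lambda r: r[1],
)
print(plot_means, site_means, db_means, sep="\n")
```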

Average trait values at the community-level were calculated by weighting species according to their relative percentage cover and averaging over all species of a community. The usage of community weighted means of trait values is very common in ecological studies (Lavorel et al. 2008) and is consistent with the ‘mass ratio hypothesis’, which states that more abundant species have a stronger effect on ecosystem functions (Grime 1998). At the community-level, the standard of ‘sampling at the flowering stage’ leads to an aggregation of species trait data sampled at different times in the growing period. As it is known that the studied traits vary between seasons (Garnier et al. 2001), the resulting composition of trait values probably never existed together at one point in time. Nevertheless, these community averages functionally characterize the community of co-occurring species as well as the habitat.
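A community weighted mean of this kind can be sketched in a few lines of Python. The species labels, cover values and trait values below are invented, and renormalising cover over only those species that have a trait value is an assumption made for the example, not a documented step of the study.

```python
def community_weighted_mean(cover, traits):
    """Cover-weighted mean trait value of one community (plot).

    cover:  dict species -> percentage cover in the plot
    traits: dict species -> trait value (from plot, site or database scale)
    Species lacking a trait value are skipped and the remaining cover is
    renormalised to sum to one (an assumption of this sketch).
    """
    shared = [sp for sp in cover if sp in traits]
    total_cover = sum(cover[sp] for sp in shared)
    if total_cover <= 0:
        raise ValueError("no cover for species with trait data")
    return sum(cover[sp] / total_cover * traits[sp] for sp in shared)


# Invented plot with three species and LDMC values in mg g^-1.
cover = {"species A": 60.0, "species B": 30.0, "species C": 10.0}
ldmc = {"species A": 200.0, "species B": 250.0, "species C": 300.0}
cwm = community_weighted_mean(cover, ldmc)
print(f"CWM LDMC = {cwm:.1f} mg/g")
```

Calling the same function with plot-scale, site-scale or database-scale trait dictionaries yields the three versions of the community mean that are compared in the analysis.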

At both the habitat-level and community-level, we used the Spearman rank correlation coefficient in order to test whether the rank order of means was conserved for each trait between the plot scale, the site scale and the regional database scale. We further used linear regression of log-transformed trait values to test how well traits measured on-site at small scale (dependent variables) can be explained by traits originating from a broader scale (independent variables, e.g., traits retrieved from a database). We quantified the deviations from the isocline, which is the line of exact correspondence. In order to assess differences in trait variability between the scales, we calculated the coefficient of variation for each hypothetical trait distribution at the habitat-level, at the community-level and within each community.
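The three statistics named above can be sketched in plain Python. The analyses in the study were run in R, so this is only an illustrative re-implementation under stated assumptions: natural logarithms for the log-transformation, the sample standard deviation for the coefficient of variation, and deviation from the isocline taken as the difference of log values. None of these choices is specified in the text, and the example data are invented.

```python
import math
from statistics import mean, stdev


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def loglog_fit(broad, local):
    """Least-squares fit log(local) = b0 + b1 * log(broad); returns (b0, b1, R^2)."""
    lx = [math.log(v) for v in broad]
    ly = [math.log(v) for v in local]
    mx, my = mean(lx), mean(ly)
    slope = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)
    return my - slope * mx, slope, pearson(lx, ly) ** 2


def isocline_deviation(broad, local):
    """Log-scale deviation from the 1:1 line (zero means exact correspondence)."""
    return [math.log(b) - math.log(a) for a, b in zip(broad, local)]


def coefficient_of_variation(values):
    return stdev(values) / mean(values)


# Invented species means: database scale (predictor) vs. site scale (response).
database_vals = [150.0, 200.0, 250.0, 300.0]
site_vals = [140.0, 180.0, 260.0, 270.0]
rho = spearman(database_vals, site_vals)
b0, b1, r2 = loglog_fit(database_vals, site_vals)
print(rho, b0, b1, r2, coefficient_of_variation(site_vals))
```

A slope below one with a rank correlation near one would correspond to the situation described in the results, where species ranking is conserved but the spread of on-site values is narrower than the database suggests.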

All analyses were carried out using the statistical software package R (R Development Core Team 2009).

Results

Habitat-level traits

Using habitat-level traits retrieved from a database as predictors for habitat-level traits at the site scale resulted in reasonably accurate models for LDMC, although there was a greater overestimation of LDMC values in the wet meadow than in the salt marsh as shown by the shallower slope than expected (Table 1, Fig. 2). When comparing the LDMC between sites, the actual rank order of the two grassland sites with respect to average LDMC of their species pools (salt marsh: 210 mg g−1, wet meadow: 187 mg g−1) was reversed when traits retrieved from a database were used (wet meadow: 223 mg g−1, salt marsh: 205 mg g−1) as database values overestimate LDMC in the wet meadow. However, the rank order of LDMC values by species was well conserved (Fig. 2) and trait variability of LDMC at the habitat-level corresponded well between the two scales in both grasslands (see Table S1).

Table 1. Linear regression models indicate that using habitat-level trait values from a regional database (data) to predict patterns at the site scale (site) is reasonably accurate for leaf dry matter content (LDMC) and specific leaf area (SLA) of the wet meadow site but less so for canopy height (CH) and specific leaf area (SLA) of the salt-marsh site. Trait data were log-transformed. β0, intercept; βlog([TRAIT]data), coefficient of the independent variable; NS, not significant; . P < 0.05, * P < 0.01, ** P < 0.005, *** P < 0.001

Dependent variable   Coefficient      Estimate           Sig.   R²
Wet meadow
log(CHsite)          β0               −0.249 ± 0.114     .      0.566***
                     βlog(CHdata)      0.792 ± 0.084     ***
log(LDMCsite)        β0                0.912 ± 0.378     .      0.657***
                     βlog(LDMCdata)    0.794 ± 0.071     ***
log(SLAsite)         β0               −0.289 ± 0.334     NS     0.645***
                     βlog(SLAdata)     1.152 ± 0.106     ***
Salt marsh
log(CHsite)          β0               −1.330 ± 0.245     ***    0.484***
                     βlog(CHdata)      0.882 ± 0.136     ***
log(LDMCsite)        β0                0.601 ± 0.600     NS     0.694***
                     βlog(LDMCdata)    0.895 ± 0.117     ***
log(SLAsite)         β0                0.896 ± 0.446     NS     0.322***
                     βlog(SLAdata)     0.625 ± 0.169     **

Figure 2.

Habitat-level traits at site scale plotted against habitat-level traits calculated using trait values retrieved from a database. Only species from the species pool with a frequency of at least three in the sampled plots are shown. Bars indicate the interquartile range (first to third quartile). The isocline (dashed line) indicates a perfect correspondence between trait values originating from the site and database scale. ρ, Spearman rank correlation coefficient; n.s., not significant; . P < 0.05, * P < 0.01, ** P < 0.005, *** P < 0.001.

Habitat-level values of CH and SLA retrieved from a database were fairly good predictors for trait values measured at the site scale for the wet meadow (Table 1), as shown by the rank correlation between the two scales (Fig. 2). However, the variability of habitat-level traits was overestimated in CH values and underestimated in SLA values retrieved from a database (Table S1). Thus, within the species pool species’ CH values are more similar and species’ SLA values are less similar when measured at the site scale, which is supported by the slope coefficients of the linear models being below and above one, respectively (Table 1). Additionally, CH values were systematically lower at the site scale (see intercepts < 0 in Table 1 and species pool averages in Table S1). In the salt marsh, species values for SLA, and similarly those for CH, were poorly predicted by values retrieved from a database (Table 1), resulting in overestimation, and there was poor rank correlation between the two scales (Fig. 2).

Moreover, our results indicate that within-species variability was lower than between-species variability for all three traits and both sites (Table S1).

Community-level traits

For both sites, community-level traits at the site scale were generally better predictors of community-level traits at the plot scale than those retrieved from a database (Table 2, see also Table S2), also showing a tighter rank correlation (Table 3). In the wet meadow, community mean traits at the site scale were good predictors of those at the plot scale, whereas this was only true for CH in the salt marsh (Table 2). In the salt marsh, community means of leaf traits at the site scale only showed a moderate predictive power. This difference between sites was even more pronounced when assigning traits retrieved from a North-west-European database. In the salt marsh, community mean traits retrieved from a database had either no (SLA) or a very limited (LDMC and CH) predictive power (Table 2). Comparison between the slope coefficients of the CH models shows that the distribution of community-level traits at the site scale is compressed, whereas community-level traits retrieved from a database are more scattered than at the plot scale (Table 2). We found the opposite pattern for community-level SLA in the wet meadow. For CH and SLA, this pattern is consistent with the deviation of between-community variability between scales. At the site scale, variability of community-level traits within sites is lower than at the plot scale and thus underestimates the variability, except for SLA in the wet meadow (Table 3).

Table 2. Linear regression models indicate that using community-level trait values measured at the site scale (site) to predict patterns at the plot scale (plot) is more accurate than using values from a regional database (data). Results were also more accurate for the wet meadow than the salt-marsh site. CWM, community trait mean weighted by cover; β0, intercept; β[TRAIT], coefficient of the predictor variable; NS, not significant; . P < 0.05, * P < 0.01, ** P < 0.005, *** P < 0.001

Dependent variable:   Independent variable: CWM at site scale             Independent variable: CWM retrieved from a database
CWM at plot scale     Coefficient  Estimate           Sig.  R²            Coefficient  Estimate           Sig.  R²
Wet meadow
CHplot                β0           −0.256 ± 0.056     ***   0.875***      β0            0.085 ± 0.052     NS    0.674***
                      βCHsite       1.653 ± 0.136     ***                 βCHdata       0.850 ± 0.129     ***
LDMCplot              β0           −2.635 ± 23.476    NS    0.770***      β0           69.907 ± 34.856    NS    0.377**
                      βLDMCsite     1.015 ± 0.121     ***                 βLDMCdata     0.494 ± 0.139     **
SLAplot               β0            5.416 ± 2.156     .     0.819***      β0          −12.979 ± 5.070     .     0.740***
                      βSLAsite      0.779 ± 0.080     ***                 βSLAdata      1.877 ± 0.243     ***
Salt marsh
CHplot                β0           −0.025 ± 0.020     NS    0.768***      β0            0.070 ± 0.034     .     0.254***
                      βCHsite       1.400 ± 0.116     ***                 βCHdata       0.339 ± 0.088     ***
LDMCplot              β0           26.199 ± 63.241    NS    0.456**       β0           83.903 ± 81.563    NS    0.224 .
                      βLDMCsite     0.962 ± 0.248     **                  βLDMCdata     0.727 ± 0.320     .
SLAplot               β0            3.874 ± 2.527     NS    0.415**       β0           15.291 ± 2.607     ***   0.052 NS
                      βSLAsite      0.670 ± 0.188     **                  βSLAdata     −0.165 ± 0.166     NS

Table 3. Spearman rank correlation coefficients (ρ) calculated for community-level traits indicate higher correlation between the site and plot scale than between the plot and database scale (data). Coefficients of variation (Cυ) are also shown. [TRAIT]CWM, community trait mean weighted by cover; n PLOTS, number of plots; NS, not significant; . P < 0.05, * P < 0.01, ** P < 0.005, *** P < 0.001

             n PLOTS   ρ DATA–PLOT   ρ SITE–PLOT   Cυ DATA   Cυ SITE   Cυ PLOT
Wet meadow
CHCWM        23         0.84***      0.94***       0.316     0.177     0.306
LDMCCWM      23         0.67***      0.84***       0.142     0.128     0.147
SLACWM       23         0.77***      0.79***       0.094     0.188     0.164
Salt marsh
CHCWM        46         0.42**       0.85***       0.692     0.642     0.835
LDMCCWM      20         0.53 .       0.72***       0.221     0.239     0.319
SLACWM       20        −0.11 NS      0.69***       0.225     0.181     0.196

Lastly, for all three traits and at both sites, there was a poor correspondence between within-community variability of traits retrieved from a regional database with that of traits measured at the plot (Table S3). Except for leaf traits in the wet meadow, this was also the case for correspondence between variability at the site and plot scale, although to a lesser degree. In the wet meadow, there was clearly overestimation of within-community values of CH variability and underestimation of SLA variability when using trait data retrieved from a database (Fig. 3).

Figure 3.

Deviation of within-community trait variability retrieved from a database and within-community trait variability at the site scale from within-community trait variability at the plot scale, respectively. Deviations are shown as percentages. The dashed line represents perfect correspondence.

Discussion

This study is the first to integrate trait aggregation level and the spatial scale of the origin of traits to test the accuracy of traits retrieved from a database as a proxy for on-site measurements. Our results stress the importance of testing one of the basic tenets of trait-based ecology, i.e. the aggregation of trait data per species captures the majority of trait variation (Kattge et al. 2011), because they show that the accuracy of traits retrieved from a database varies with the level of aggregation, the trait and the habitat type (Fig. 4). The deviation of the site-specific trait range of the species (habitat-level) from the traits retrieved from a database is potentiated at the community-level. This means that values retrieved from a database need to be a good proxy at the habitat-level to give good results at the community-level, but good performance at the community-level cannot be inferred from good performance at the habitat-level alone. For the leaf trait LDMC, showing a relatively low variability within species (Garnier et al. 2001; Roche, Diaz-Burlinson & Gachet 2004; Albert et al. 2010b), database values seem to be better proxies than for CH and SLA, which also follows from the ranking of these traits by standard deviation in the TRY database: CH (0.78), SLA (0.26) and LDMC (0.17) (Kattge et al. 2011). Comparing the two habitats, traits retrieved from a database had generally less predictive power in the more stressful, species-poor salt marsh than in the mesic wet meadow.

Figure 4.

Conclusions on the accuracy of the spatial scales of the origin of trait values for the two aggregation levels. 1: Increasing accuracy of trait data and by that strongly increasing sampling effort. 2: The increasing spatial scale of trait value origin leads to an acceptable deviation to the realised traits for the mesic habitat and to a strong deviation for the extreme habitat. 3: Trait data retrieved from a database give better results on the habitat-level as compared to the community-level (thus trait response to site conditions is better reflected in traits retrieved from a database than trait response to small-scale plot conditions).

Habitat-level traits

Our results show that species traits are really specific to species, as between-species variability was higher than within-species variability, which is consistent with the findings of other studies (Albert et al. 2010a; Hulshof & Swenson 2010; Jung et al. 2010; but see Messier, McGill & Lechowicz 2010). The species-specificity of trait values is the basis for a trait-for-species substitution (Keddy 1992; Shipley 2007). As has already been shown within habitat types (Garnier et al. 2001; Lavorel et al. 2008) and for greenhouse measurements (Mokany & Ash 2008), our results show that the rank order of species according to habitat-level traits at the site scale is conserved by habitat-level traits retrieved from a North-west-European database. Habitat-level SLA in the salt marsh was an exception.

Looking beyond the rank order of species, however, intraspecific trait variability led to a deviation of habitat-level traits retrieved from a database from those measured on-site. For example, the habitat characteristics of the wet meadow, such as the high groundwater table, led to an underestimation of habitat-level SLA and in turn an overestimation of habitat-level LDMC by data retrieved from a database. In the species pool of the wet meadow, grassland generalists occur at the wet end of their realized niche (e.g. Anthoxanthum odoratum, Holcus lanatus and Ranunculus repens) and respond to high water availability with high SLA and low LDMC values (Quetier, Thebault & Lavorel 2007; Poorter et al. 2009; Jung et al. 2010). This is in contrast to individuals sampled in trait databases, which originate from a much wider range of habitats including well-drained grasslands. Similarly, specific environmental conditions can explain the skewed trait distribution of habitat-level SLA values retrieved from a database in the salt marsh, as salinity is negatively correlated with SLA (Minden & Kleyer 2011). Habitat-level SLA in the salt marsh is overestimated by values retrieved from a database due to values contributed by glycophytes, which occur in both saline and fresh water habitats (e.g. Agrostis stolonifera, Trifolium repens and Parapholis strigosa). In databases, most trait records will come from individuals sampled from fresh water habitats, thus not reflecting osmotically stress-induced water deficiency through reduced leaf size (Hafsi et al. 2007). Also, habitat-level CH is overestimated by trait data retrieved from databases, as mowing, natural grazing and regular inundation limit maximum canopy height. Especially the individuals of species not exclusively found in grasslands are shorter in our sites than their representatives in the database (e.g. Phragmites australis and Scirpus sylvaticus).

The results at the habitat-level show that species ranking in a species pool is less sensitive to the scale of trait aggregation than their actual values. Moreover, a well-functioning database option to select species measurements by habitats, as well as a sufficient number of entries per habitat, is indispensable for accurate substitution.

Community-level traits

Community-level traits retrieved from a database performed worse than those measured at the site scale, indicating that within-species variability of traits increases with the geographic range of measurements. Furthermore, our results for the saline grassland demonstrated that community-level traits retrieved from a database cannot be used for habitats with extreme habitat conditions. Both results are consistent with our findings at the habitat-level.

We found a good correspondence of trait averages at the community-level between the site and plot scale. This is consistent with the results of Baraloto et al. (2010) for trees in a tropical forest and Cornwell & Ackerly (2009) for woody species in Californian chaparral. The slightly worse performance of models predicting the averages of leaf traits at the community-level in the salt marsh results from the response of individuals to the strong salinity gradient within this site, as, for example, SLA is negatively correlated with salinity (Minden & Kleyer 2011). When community means are calculated at the site scale or with values retrieved from a database, trait averages are substituted for each species in the aggregation process (trait-for-species substitution). Therefore, intraspecific trait variability is omitted from the calculation of community-level traits, which can give misleading community means. This has already been shown for the site scale (Baraloto et al. 2010; Hulshof & Swenson 2010; Messier, McGill & Lechowicz 2010; Albert et al. 2012) and, as our results show, is potentiated when traits are retrieved from a database. Community-level patterns are skewed by both dominant species and satellite species, as testing the correspondence between database values and on-site measurements for non-weighted community means revealed equally good or even worse results (data not shown). If we accept the assumption that community traits reflect environmental gradients (e.g. Ackerly & Cornwell 2007; Cornwell & Ackerly 2009; Sonnier, Shipley & Navas 2010), a trait-for-species substitution that results in a skewed trait–environment relationship can lead to an underestimation of community trait response to these gradients. In line with this, community-level means vary less at the site scale than at the plot scale, as within-species variability of traits (reflecting environmental gradients) contributes considerably to the overall variance of traits (Albert et al. 2010b; Messier, McGill & Lechowicz 2010).

With respect to within-community trait variability, traits measured at the site scale perform poorly due to the omission of intraspecific variability of traits, confirming the findings of other studies (Baraloto et al. 2010; Albert et al. 2012). Our results show that this effect is amplified when trait values are retrieved from a database.

Conclusions

Firstly, the conclusions on the accuracy of traits retrieved from a database depend on the level of aggregation, the type of trait and the habitat (Fig. 4). We conclude from our results that trait values retrieved from a database could predict on-site values better at the habitat-level (trait data aggregated across the habitat species pool) than at the community-level (trait data aggregated across co-occurring species in the local community). For traits linked to the usage of resources that are highly variable on a small spatial scale, such as CH, LDMC and SLA, which are linked to light capture, traits retrieved from a database generally have to be used with caution. However, dispersal traits for instance may lead to more accurate results. Especially for extreme habitats, we conclude that filtering database entries from comparable habitats will improve the accuracy of the results notably.

Secondly, we conclude that the merit of using trait values retrieved from a database depends on the research questions. For questions regarding trait–environment relationships, traits retrieved from a regional database are valuable provided that database entries originate from comparable habitats. This is also applicable for comparisons between communities, although community-level traits calculated using values retrieved from a database are less accurate. When community-level means of traits are, for example, used to assess ecosystem services, using traits from the site scale will give results closer to the actual trait values. From the poor correspondence of community-level traits with respect to within-community trait variability, we conclude that neither average trait values of species measured at the site scale nor those retrieved from a database can be used to study processes operating at the plot scale, such as niche partitioning and competitive exclusion. For these questions, it is strongly recommended to rigorously sample individual plants at the plot scale to calculate functional traits per species and community.

Acknowledgements

We thank Yzaak de Vries, Jacob Hogendorf and the students of the Community Ecology Course on Schiermonnikoog in 2008 for their help with field work and laboratory work and Michael Kleyer from the Landscape Ecology Group of the C.v.O. University of Oldenburg for supplying unpublished trait data. We further thank Ken Thompson, Hans Cornelissen and two anonymous reviewers for valuable comments on an earlier version of this manuscript. This work has been made possible thanks to the support from the European Science Foundation (ESF) under the EUROCORES Programme EuroDIVERSITY, through contract No. ERAS-CT-2003-980409 of the European Commission, DG Research, FP6. We thank the nature conservation agencies, State Forestry Commission and Natuurmonumenten, for permission to work in the nature reserves, Stroomdallandschap Drentsche Aa and Schiermonnikoog, respectively.
