TRY – a global database of plant traits

Plant traits – the morphological, anatomical, physiological, biochemical and phenological characteristics of plants and their organs – determine how primary producers respond to environmental factors, affect other trophic levels, influence ecosystem processes and services and provide a link from species richness to ecosystem functional diversity. Trait data thus represent the raw material for a wide range of research from evolutionary biology, community and functional ecology to biogeography. Here we present the global database initiative named TRY, which has united a wide range of the plant trait research community worldwide and gained an unprecedented buy-in of trait data: so far 93 trait databases have been contributed. The data repository currently contains almost three million trait entries for 69 000 out of the world's 300 000 plant species, with a focus on 52 groups of traits characterizing the vegetative and regeneration stages of the plant life cycle, including growth, dispersal, establishment and persistence. A first data analysis shows that most plant traits are approximately log-normally distributed, with widely differing ranges of variation across traits. Most trait variation is between species (interspecific), but significant intraspecific variation is also documented, up to 40% of the overall variation. Plant functional types (PFTs), as commonly used in vegetation models, capture a substantial fraction of the observed variation – but for several traits most variation occurs within PFTs, up to 75% of the overall variation. In the context of vegetation models these traits would better be represented by state variables rather than fixed parameter values. 
The improved availability of plant trait data in the unified global database is expected to support a paradigm shift from species to trait-based ecology, offer new opportunities for synthetic plant trait research and enable a more realistic and empirically grounded representation of terrestrial vegetation in Earth system models.


Introduction
Plant traits – morphological, anatomical, biochemical, physiological or phenological features measurable at the individual level (Violle et al., 2007) – reflect the outcome of evolutionary and community assembly processes responding to abiotic and biotic environmental constraints (Valladares et al., 2007). Traits and trait syndromes (consistent associations of plant traits) determine how primary producers respond to environmental factors, affect other trophic levels and influence ecosystem processes and services (Aerts & Chapin, 2000; Grime, 2001, 2006; Lavorel & Garnier, 2002; Díaz et al., 2004; Garnier & Navas, 2011). In addition, they provide a link from species richness to functional diversity in ecosystems (Díaz et al., 2007). A focus on traits and trait syndromes therefore provides a promising basis for a more quantitative and predictive ecology and global change science (McGill et al., 2006). Plant trait data have been used in studies ranging from comparative plant ecology (Grime, 1974; Givnish, 1988; Grime et al., 1997) and functional ecology (Grime, 1977; Reich et al., 1997; Wright et al., 2004) to community ecology (Kraft et al., 2008), trait evolution (Moles et al., 2005a), phylogeny reconstruction (Lens et al., 2007), metabolic scaling theory, palaeobiology (Royer et al., 2007), biogeochemistry (Garnier et al., 2004; Cornwell et al., 2008), disturbance ecology (Wirth, 2005; Paula & Pausas, 2008), plant migration and invasion ecology (Schurr et al., 2005), conservation biology (Ozinga et al., 2009; Römermann et al., 2009) and plant geography (Swenson & Weiser, 2010). Access to trait data for a large number of species allows testing levels of phylogenetic conservatism, a promising principle in ecology and evolutionary biology (Wiens et al., 2010).
Plant trait data have been used for the estimation of parameter values in vegetation models, but only in a few cases based on systematic analyses of trait spectra (White et al., 2000;Kattge et al., 2009;Wirth & Lichstein, 2009;Ziehn et al., 2011). Recently, plant trait data have been used for the validation of a global vegetation model as well (Zaehle & Friend, 2010).
While there have been initiatives to compile datasets at regional scale for a range of traits [e.g. LEDA (Life History Traits of the Northwest European Flora: http://www.leda-traitbase.org), BiolFlor (Trait Database of the German Flora: http://www.ufz.de/biolflor), EcoFlora (The Ecological Flora of the British Isles: www.ecoflora.co.uk), BROT (Plant Trait Database for Mediterranean Basin Species: http://www.uv.es/jgpausas/brot.htm)] or at global scale focusing on a small number of traits [e.g. GlopNet (Global Plant Trait Network: http://www.bio.mq.edu.au/~iwright/glopian.htm), SID (Seed Information Database: data.kew.org/sid/)], a unified initiative to compile data for a large set of relevant plant traits at the global scale was lacking. As a consequence, studies on trait variation have so far either focussed on the local to regional scale while including a range of different traits (e.g. Baraloto et al., 2010), or worked at the global scale but been restricted to individual aspects of plant functioning, e.g. the leaf economic spectrum (Wright et al., 2004), the evolution of seed mass (Moles et al., 2005a, b) or the characterization of the wood economic spectrum. Only a few analyses at the global scale have combined traits from different functional aspects, and then only for a limited number of plant species (e.g. Díaz et al., 2004).
In 2007, the TRY initiative (TRY – not an acronym, rather an expression of sentiment: http://www.try-db.org) started compiling plant trait data from the different aspects of plant functioning on global scale to make the data available in a consistent format through one single portal. Based on a broad acceptance in the plant trait community (so far 93 trait databases have been contributed, Table 1), TRY has accomplished an unprecedented coverage of trait data and is now working towards a communal global repository for plant trait data. The new database initiative is expected to contribute to a more realistic and empirically based representation of plant functional diversity on global scale supporting the assessment and modelling of climate change impacts on biogeochemical fluxes and terrestrial biodiversity (McMahon et al., 2011).
For several traits the data coverage in the TRY database is sufficient to quantify the relative amount of intra- and interspecific variation, as well as variation within and between different functional groups. Thus, the dataset allows us to examine two basic tenets of comparative ecology and vegetation modelling, which, due to lack of data, had not been quantified so far: (1) On the global scale, the aggregation of plant trait data at the species level captures the majority of trait variation. This central assumption of plant comparative ecology implies that, while there is variation within species, this variation is smaller than the differences between species (Garnier et al., 2001; Keddy et al., 2002; Westoby et al., 2002; Shipley, 2007). This is the basic assumption for using average trait values of species to calculate indices of functional diversity (Petchey & Gaston, 2006; de Bello et al., 2010; Schleuter et al., 2010), to identify ecologically important dimensions of trait variation (Westoby, 1998) or to determine the spatial variation of plant traits (Swenson & Weiser, 2010).
(2) On the global scale, basic plant functional classifications capture a sufficiently important fraction of trait variation to represent functional diversity. This assumption is implicit in today's dynamic global vegetation models (DGVMs), used to assess the response of ecosystem processes and composition to CO2 and climate changes. Owing to computational constraints and lack of detailed information, these models have been developed to represent the functional diversity of >300 000 documented plant species on Earth with a small number (5-20) of basic plant functional types (PFTs, e.g. Woodward & Cramer, 1996; Sitch et al., 2003). This approach has been successful so far, but limits are becoming obvious and challenge the use of such models in a prognostic mode, e.g. in the context of Earth system models (McMahon et al., 2011).
This article first introduces the TRY initiative and presents a summary of data coverage with respect to different traits and regions. For a range of traits, we characterize general statistical properties of the trait density distributions, a prerequisite for statistical analyses, and provide mean values and ranges of variation. For 10 traits that are central to leading dimensions of plant strategy, we then quantify trait variation with respect to species and PFT and thus examine the two tenets mentioned above. Finally, we demonstrate how trait variation within PFT is currently represented in the context of global vegetation models.

Types of data compiled
The TRY data compilation focuses on 52 groups of traits characterizing the vegetative and regeneration stages of plant life cycle, including growth, reproduction, dispersal, establishment and persistence (Table 2). These groups of traits were collectively agreed to be the most relevant for plant life-history strategies, vegetation modelling and global change responses on the basis of existing shortlists (Grime et al., 1997; Weiher et al., 1999; Lavorel & Garnier, 2002; Cornelissen et al., 2003b; Díaz et al., 2004; Kleyer et al., 2008) and wide consultation with vegetation modellers and plant ecologists. They include plant traits sensu stricto, but also 'performances' (sensu Violle et al., 2007), such as drought tolerance or phenology. Quantitative traits vary within species as a consequence of genetic variation (among genotypes within a population/species) and phenotypic plasticity. Ancillary information is necessary to understand and quantify this variation. The TRY dataset contains information about the location (e.g. geographical coordinates, soil characteristics), environmental conditions during plant growth (e.g. climate of natural environment or experimental treatment), and information about measurement methods and conditions (e.g. temperature during respiration or photosynthesis measurements). Ancillary data also include primary references.
By preference individual measurements are compiled in the database, like single respiration measurements or the wood density of a specific individual tree. The dataset therefore includes multiple measurements for the same trait, species and site. For some traits, e.g. leaf longevity, such data are only rarely available on single individuals (e.g. Reich et al., 2004), and data are expressed per species per site instead. Different measurements on the same plant (resp. organ) are linked to form observations that are hierarchically nested. The database structure ensures that (1) the direct relationship between traits and ancillary data and between different traits that have been measured on the same plant (resp. organ) is maintained and (2) conditions (e.g. at the stand level) can be associated with the individual measurements (Kattge et al., 2010). The structure is consistent with the Extensible Observation Ontology (OBOE; Madin et al., 2008), which has been proposed as a general basis for the integration of different data streams in ecology. The TRY dataset combines several preexisting databases based on a wide range of primary data sources, which include trait data from plants grown in natural environments and under experimental conditions, obtained by a range of scientists with different methods. Trait variation in the TRY dataset therefore reflects natural and potential variation on the basis of individual measurements at the level of single organs, and variation due to different measurement methods and measurement error (random and bias).

Data treatment in the context of the TRY database
The TRY database has been developed as a Data Warehouse (Fig. 1) to combine data from different sources and make them available for analyses in a consistent format (Kattge et al., 2010). The Data Warehouse provides routines for data extraction, import, cleaning and export. Original species names are complemented by taxonomically accepted names, based on a checklist developed by IPNI (The International Plant Names Index: http://www.ipni.org) and TROPICOS (Missouri Botanical Garden: http://www.tropicos.org), which had been made publicly available on the TaxonScrubber website by the SALVIAS (Synthesis and Analysis of Local Vegetation Inventories Across Sites: http://www.salvias.net) initiative (Boyle, 2006). Trait entries and ancillary data are standardized and errors are corrected after consent from data contributors. Finally, outliers and duplicate trait entries are identified and marked (for method of outlier detection, see Appendix S1). The cleaned and complemented data are moved to the data repository, whence they are released on request.

Selection of data and statistical methods in the context of this analysis
For the analyses in the context of this manuscript, we have chosen traits with sufficient coverage from different aspects of plant functioning. The data were standardized and checked for errors, and duplicates were excluded. Maximum photosynthetic rates and stomatal conductance were filtered for temperature (15-30 °C), light (PAR >500 µmol m⁻² s⁻¹) and atmospheric CO2 concentration during measurements (300-400 ppm); data for respiration were filtered for temperature (15-30 °C). A temperature range for respiration of 15-30 °C will add variability to trait values. Nevertheless, an immediate response of respiration to temperature is balanced by an opposite adaptation of basal respiration rates to long-term temperature changes. More detailed analyses will have to take the short- and long-term impacts of temperature into account. With respect to photosynthetic rates the problem is similar, but less severe. Statistical properties of density distributions of trait data were characterized by skewness and kurtosis on the original scale and after log-transformation. The Jarque-Bera test was applied to assess departure from normality (Bera & Jarque, 1980). Finally, outliers were identified (see supporting information, Appendix S1). The subsequent analyses are based on standardized trait values, excluding outliers and duplicates. PFTs were defined similarly to those used in global vegetation models (e.g. Woodward & Cramer, 1996; Sitch et al., 2003; see Table 5), based on standardized tables for the qualitative traits 'plant growth form' (grass, herb, climber, shrub, tree), 'leaf type' (needle-leaved, broad-leaved), 'leaf phenology type' (deciduous, evergreen), 'photosynthetic pathway' (C3, C4, CAM) and 'woodiness' (woody, nonwoody).
The evaluation of the two tenets of comparative ecology and vegetation modelling focuses on 10 traits that are central to leading dimensions of trait variation or that are physiologically relevant and closely related to parameters used in vegetation modelling (Westoby et al., 2002; Wright et al., 2004): plant height, seed mass, specific leaf area (one-sided leaf area per leaf dry mass, SLA), leaf longevity, leaf nitrogen content per leaf dry mass (Nm) and per leaf area (Na), leaf phosphorus content per leaf dry mass (Pm) and maximum photosynthetic rate per leaf area (Amaxa), per leaf dry mass (Amaxm) and per leaf nitrogen content (AmaxN). The 10 traits were selected for the following reasons: plant height is relevant for vegetation carbon storage capacity; seed mass for plant regeneration strategy; leaf longevity for the trade-off between leaf carbon investment and gain; SLA for the link between light capture (area based) and plant growth (mass based); leaf N and P content for the link between the carbon cycle and the respective nutrient cycle; and photosynthetic rates expressed per leaf area, dry mass and N content for the links of carbon gain to light capture, growth and nutrient cycle. Although we recognize the relevance of traits related to plant-water relations, we did not include traits such as maximum stomatal conductance or leaf water potential in the analyses, because coverage was insufficient for a substantial number of species. For each of the 10 traits, we quantified variation across species and PFTs in three ways:
(1) Differences between mean values of species and PFTs were tested, based on one-way ANOVA.
(2) Variation within species, in terms of standard deviation (SD), was compared with variation between species (same for PFTs).
(3) The fraction of variance explained by species and PFT (R²) was calculated as one minus the residual sum of squares divided by the total sum of squares.
We observed large variation in SD within species if the number of observations per species was small (see funnel plot in Appendix S1). With an increasing number of observations, SD within species approached an average, trait-specific level. To avoid confounding effects due to cases with very few observations per species, only species with at least five trait entries were used in statistical analyses (with the exception of leaf longevity, where two entries per species were taken as the minimum number because species with multiple entries were very rare). The number of measurements per PFT was sufficient in all cases. Statistical analyses were performed in R (R Development Core Team, 2009).
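The variance partitioning described above can be illustrated with a minimal sketch. It is not the authors' R code: the function name `variance_explained`, the `(group, value)` record layout and the use of base-10 logarithms are assumptions made for illustration. It computes R² as one minus the residual sum of squares divided by the total sum of squares, after dropping groups (species or PFTs) with fewer than a minimum number of entries.

```python
import math
from collections import defaultdict


def variance_explained(records, min_entries=5):
    """Fraction of variance (R^2) explained by a grouping factor.

    records: iterable of (group, value) pairs with value > 0, where group is
    e.g. a species name or a PFT label (hypothetical layout).
    Values are log10-transformed before the sums of squares are computed
    (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(math.log10(value))

    # Keep only groups with at least `min_entries` trait entries.
    kept = {g: v for g, v in groups.items() if len(v) >= min_entries}
    all_values = [x for values in kept.values() for x in values]
    if not all_values:
        raise ValueError("no group has enough entries")

    grand_mean = sum(all_values) / len(all_values)
    total_ss = sum((x - grand_mean) ** 2 for x in all_values)
    if total_ss == 0.0:
        raise ValueError("no variation in the data")

    # Residual sum of squares: deviations from each group's own mean.
    residual_ss = 0.0
    for values in kept.values():
        group_mean = sum(values) / len(values)
        residual_ss += sum((x - group_mean) ** 2 for x in values)

    return 1.0 - residual_ss / total_ss
```

With `min_entries=2` the same function would mimic the relaxed threshold used for leaf longevity.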

Data coverage in the TRY database
As of March 31, 2011 the TRY data repository contains 2.88 million trait entries for 69 000 plant species, accompanied by 3.0 million ancillary data entries [not all data from the databases listed in Table 1 and summarized in Table 2 could be used in the subsequent analyses, because some recently contributed datasets were still being checked and cleaned in the data staging area (see Fig. 1)]. About 2.8 million of the trait entries have been measured in natural environment, <100 000 in experimental conditions (e.g. glasshouse, climate or open-top chambers). About 2.3 million trait entries are for quantitative traits, while 0.6 million entries are for qualitative traits (Table 2). Qualitative traits, like plant growth form, are often treated as distinct and invariant within species (even though in some cases they are more variable than studies suggest, e.g. flower colour or dispersal mode), and they are often used as covariates in analyses, as when comparing evergreen vs. deciduous (Wright et al., 2005) or resprouting vs. nonresprouting plants (Pausas et al., 2004). The qualitative traits with the highest species coverage in the TRY dataset are the five traits used for PFT classification and leaf compoundness: woodiness (44 000 species), plant growth form (40 000), leaf compoundness (35 000), leaf type (34 000), photosynthetic pathway (32 000) and leaf phenology type (16 000); followed by N-fixation capacity (11 000) and dispersal syndrome (10 000). Resprouting capacity is noted for 3000 species (Description of qualitative traits: plant dispersal syndrome: dispersed by wind, water, animal; N-fixation capacity: able/not able to fix atmospheric N2; leaf compoundness: simple versus compound; resprouting capacity: able/not able to resprout).
Fig. 1 The TRY process of data sharing. Researcher C contributes plant trait data to TRY (1) and becomes a member of the TRY consortium (2). The data are transferred to the Staging Area, where they are extracted and imported, dimensionally and taxonomically cleaned, checked for consistency against all other similar trait entries and complemented with covariates from external databases [3; Tax, taxonomic databases, IPNI/TROPICOS accessed via TaxonScrubber (Boyle, 2006); Clim, climate databases, e.g. CRU; Geo, geographic databases]. Cleaned and complemented data are transferred to the Data Repository (4). If researcher C wants to retain full ownership, the data are labelled accordingly. Otherwise they obtain the status 'freely available within TRY'. Researcher C can request her/his own data – now cleaned and complemented – at any time (5). If she/he has contributed a minimum amount of data (currently >500 entries), she/he automatically is entitled to request data other than her/his own from TRY. In order to receive data she/he has to submit a short proposal explaining the project rationale and the data requirements to the TRY steering committee (6). Upon acceptance (7) the proposal is published on the Intranet of the TRY website (title on the public domain) and the data management automatically identifies the potential data contributors affected by the request. Researcher C then contacts the contributors who have to grant permission to use the data and to indicate whether they request coauthorship in turn (8). All this is handled via standard e-mails and forms. The permitted data are then provided to researcher C (9), who is entitled to carry out and publish the data analysis (10). To make trait data also available to vegetation modellers – one of the pioneering motivations of the TRY initiative – modellers (e.g. modeller E) are also allowed to directly submit proposals (11) without prior data submission provided the data are to be used for model parameter estimation and evaluation only. We encourage contributors to change the status of their data from 'own' to 'free' (12) as they have successfully contributed to publications. With consent of contributors this part of the database is being made publicly available without restriction. So far look-up tables for several qualitative traits (see Table 2) have been published on the website of the TRY initiative (http://www.try-db.org). Metadata are also provided without restriction (13).
The quantitative traits with the highest species coverage are seed size (27 000 species), plant height (18 000), leaf size (17 000), wood density (12 000), SLA (9000), plant longevity (8000), leaf nitrogen content (7000) and leaf phosphorus content (5000). Leaf photosynthetic capacity is characterized for more than 2000 species. Some of these traits are represented by a substantial number of entries per species, e.g. SLA has on average 10 entries per species, leaf N, P and photosynthetic capacity have about eight resp. five entries per species, with a maximum of 1470 entries for leaf nitrogen per dry mass (Nm) for Pinus sylvestris.
About 40% of the trait entries (1.3 million) are georeferenced, allowing trait entries to be related to ancillary information from external databases such as climate, soil, or biome type. Although latitude and longitude are often recorded with high precision, the accuracy is unknown. The georeferenced entries are associated with 8502 individual measurement sites, with sites in 746 of the 4200 2° × 2° land grid cells of, e.g., a typical climate model (Fig. 2). Europe has the highest density of measurements, and there is good coverage of some other regions, but there are obvious gaps in boreal regions, the tropics, northern and central Africa, parts of South America, southern and western Asia. In tropical South America, the sites fall in relatively few grid cells, but there are high numbers of entries per cell. This is an effect of systematic sampling efforts by long-term projects such as LBA (The Large Scale Biosphere-Atmosphere Experiment in Amazonia: http://www.lba.inpa.gov.br/lba) or RAINFOR (Amazon Forest Inventory Network: http://www.geog.leeds.ac.uk/projects/rainfor). For two individual traits, the spatial coverage is shown in Fig. 3. Here we additionally provide coverage in climate space, identifying biomes for which we lack data (e.g. temperate rainforests). More information about data coverage of individual traits is available on the website of the TRY initiative (http://www.try-db.org).
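How georeferenced sites map onto model grid cells can be sketched as follows. This is an illustration only: the function names, the `(lat, lon, n_entries)` tuple layout and the grid origin at 90°S, 180°W are assumptions, and the sketch does not distinguish land from ocean cells.

```python
import math


def grid_cell(lat, lon, cell_deg=2.0):
    """Return the (row, col) index of the regular grid cell containing a site.

    Assumes a global grid starting at 90°S, 180°W (hypothetical convention).
    Sites exactly on the northern or eastern edge are clamped into the last cell.
    """
    n_rows = int(round(180.0 / cell_deg))
    n_cols = int(round(360.0 / cell_deg))
    row = int(math.floor((lat + 90.0) / cell_deg))
    col = int(math.floor((lon + 180.0) / cell_deg))
    return min(row, n_rows - 1), min(col, n_cols - 1)


def entries_per_cell(sites, cell_deg=2.0):
    """Sum trait entries per occupied grid cell.

    sites: iterable of (lat, lon, n_entries) tuples (hypothetical layout).
    Returns a dict mapping (row, col) to the total number of entries.
    """
    counts = {}
    for lat, lon, n_entries in sites:
        cell = grid_cell(lat, lon, cell_deg)
        counts[cell] = counts.get(cell, 0) + n_entries
    return counts
```

The number of keys in the returned dictionary is the number of occupied cells, and the values show whether entries are concentrated in a few cells, as described for tropical South America.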

General pattern of trait variation: test for normality
For 52 traits, the coverage of database entries was sufficient to quantify general pattern of density distributions in terms of skewness and kurtosis, and to apply the Jarque-Bera test for normality (Table 3) [...] tive skewness. For 49 of the 52 traits, the Jarque-Bera test indicates an improvement of normality by log-transformation of trait values; only for three traits did normality deteriorate (leaf phenolics, tannins and carbon content per dry mass; Table 3). The distribution of leaf phenolics and tannins content per dry mass is in between normal and log-normal: positively skewed on the original scale, negatively skewed on log-scale. Leaf carbon content per dry mass has a theoretical range from 0 to 1000 mg g⁻¹. The mean value, about 476 mg g⁻¹, is in the centre of the theoretical range, and the variation of trait values is small (Table 4). Nevertheless, according to the Jarque-Bera test, also on a logarithmic scale all traits show some degree of deviation from normal distributions (indicated by small P-values, Table 3). Seed mass, for example, is still positively skewed after log-transformation (Table 3). This is due to substantial differences in the number of database entries and seed masses between grasses/herbs, shrubs and trees (Fig. 4a). Maximum plant height in the TRY database has a strong negative kurtosis after log-transformation (Table 3). This is due to a bimodal distribution: one peak for herbs/grass and one for trees (Fig. 4b). The number of height entries for shrubs is comparatively small, which may be due to a small number or abundance of shrub species in situ (i.e. a real pattern) but is more likely due to a relative 'undersampling' of shrubs (i.e. an artefact of data collection). Within the growth forms herbs/grass and shrubs, height distribution is approximately log-normal. For trees the distribution is skewed to low values, because there are mechanical constraints on growing taller than 100 m. The distribution of SLA after log-transformation is negatively skewed with positive kurtosis (Table 3), an imprint of needle-leaved trees and shrubs besides the majority of broadleaved plants (Fig. 4c). The distribution of leaf nitrogen content per dry mass after log-transformation has small skewness, but negative kurtosis (Table 3): the data are less concentrated around the mean than normal (Fig. 4d).
Table 3 notes. Results based on dataset after excluding obvious errors, but before detection of outliers. Skewness, measure of the asymmetry of the density distribution (0 in case of normal distribution; <0, left-tailed distribution; >0, right-tailed distribution); Kurtosis, measure of the 'peakedness' of the density distribution (here presented as excess kurtosis: 0, in case of normal distribution; <0, wider peak around the mean; >0, a more acute peak around the mean); JB test, result of Jarque-Bera test for departure from normality (0 for normal distribution; >0 for deviation from normal distribution); P-value, probability of obtaining a test statistic at least as extreme as the observed, assuming the null hypothesis, here the data are normal distributed, is true (on the original scale, resp. after log-transformation, >0.5 in case of normality accepted at 95% confidence); change of normality, difference between results of Jarque-Bera test on the original scale and after log-transformation of trait data (>0, improvement of normality by log-transformation; <0, deterioration of normality by log-transformation); RMSE, root mean squared error; bold: traits for which we quantified the fraction of variance explained by species and PFT.
In several cases, sample size is sufficient to characterize the distribution at different levels of aggregation, down to the species level. Again we find approximately log-normal distributions (e.g. SLA and Nm for Pinus sylvestris; Fig. 4c and d).
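The normality statistics used in this section can be sketched in a few lines. This is not the authors' implementation (their analyses were run in R). The sketch uses the standard textbook definitions: moment-based skewness S and excess kurtosis K, the Jarque-Bera statistic JB = n/6 · (S² + K²/4), and its asymptotic chi-square P-value with two degrees of freedom. The function names and the use of base-10 logarithms are assumptions.

```python
import math


def jarque_bera(values):
    """Return (skewness, excess_kurtosis, jb_statistic, p_value).

    Uses population (biased) moment estimators. The P-value is the asymptotic
    chi-square survival function with 2 degrees of freedom, which has the
    closed form exp(-JB / 2).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0.0:
        raise ValueError("no variation in the data")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    return skewness, excess_kurtosis, jb, math.exp(-jb / 2.0)


def change_of_normality(values):
    """JB on the original scale minus JB after log10-transformation.

    Positive values indicate that log-transformation brings the data closer
    to a normal distribution; values must be strictly positive.
    """
    jb_original = jarque_bera(values)[2]
    jb_log = jarque_bera([math.log10(v) for v in values])[2]
    return jb_original - jb_log
```

Because JB scales with sample size, even small departures from normality give small P-values for traits with many entries, which is consistent with the observation that all traits deviate to some degree.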

Ranges of trait variation
There are large differences in variation across traits (Table 4). The standard deviation (SD) expressed on a logarithmic scale ranges from 0.03 for leaf carbon content per dry mass (resp. about 8% on the original scale) to 1.08 for seed mass (resp. −95% and +1100% on the original scale). Note two characteristics of SD on the logarithmic scale: (1) it corresponds to an asymmetric distribution on the original scale: small range to low values, large range to high values; (2) it can be compared directly across traits. For more information, see supporting information Appendix S2. Leaf carbon content per dry mass, stem density and leaf density show the lowest variation, followed by the concentration of macronutrients (nitrogen, phosphorus), fluxes and conductance (photosynthesis, stomatal conductance, respiration), the concentration of micronutrients (e.g. aluminium, manganese, sodium), traits related to length (plant height, plant and leaf longevity), and traits related to leaf area. Mass-related traits show the highest variation (seed mass, leaf dry mass, N and P content of the whole leaf, in contrast to concentration per leaf dry mass or per leaf area). The observations reveal a general tendency towards higher variation with increasing trait dimensionality (length < area < mass; for more information, see Appendix S3).
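The asymmetry of a logarithmic SD on the original scale can be made concrete with a short sketch. It assumes base-10 logarithms, an assumption that reproduces the −18% and +22% reported elsewhere in this article for SD = 0.088 (leaf nitrogen content of Pinus sylvestris). The function name is hypothetical.

```python
def log_sd_to_percent(sd_log10):
    """Convert an SD on the log10 scale into percentage deviations.

    Returns (lower, upper): the relative change, in percent of the geometric
    mean, at one SD below and one SD above the mean on the log10 scale.
    """
    lower = (10.0 ** (-sd_log10) - 1.0) * 100.0
    upper = (10.0 ** sd_log10 - 1.0) * 100.0
    return lower, upper


lower, upper = log_sd_to_percent(0.088)
print(round(lower), round(upper))  # prints: -18 22
```

The downward deviation can never exceed 100%, whereas the upward deviation is unbounded, which is why a large logarithmic SD translates into a small range to low values and a large range to high values.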

Tenet 1: Aggregation at the species level represents the major fraction of trait variation
There is substantial intraspecific variation for each of the 10 selected traits (Table 5): for single species the standard deviation is above 0.3 on logarithmic scale, e.g. SD = 0.34 for maximum plant height of Phyllota phyllicoides (−55% and +121% on the original scale), but based on only six observations, and SD = 0.32 in case of Dodonaea viscosa (n = 26). The SD of Nm for Poa pratensis is 0.17 (n = 63), which is almost equal to the range of all data reported for this trait, but this is an exceptional case. The trait and species with the most observations is nitrogen content per dry mass for Pinus sylvestris with 1470 entries (SD = 0.088, −18% and +22%). The variation in this species spans almost half the overall variation observed for this trait (SD = 0.18), covering the overall mean (Fig. 4d). For several trait-species combinations, the number of measurements is high enough for detailed analyses of the variation within species (e.g. on an environmental gradient).
The mean SD at the species level is highest for plant height (0.18) and lowest for leaf longevity (0.03, but few observations per species, Table 5). For all ten traits the mean SD within species is smaller than the SD between species mean values (Table 5).
Fig. 4 Examples of trait frequency distributions for four ecologically relevant traits (Westoby, 1998; Wright et al., 2004). Upper panels: (a) seed mass and (b) plant height for all data and three major plant growth forms (white, all database entries; light grey, herbs/grasses; dark grey, trees; black, shrubs). Rug-plots provide data ranges hidden by overlapping histograms. Lower panels: (c) specific leaf area (SLA) and (d) leaf nitrogen content per dry mass [Nm; white, all database entries excluding outliers (including experimental conditions); light grey, database entries from natural environment (excluding experimental conditions); medium grey, growth form trees; dark grey, PFT needle-leaved evergreen; black, Pinus sylvestris].

Table 5 Variation within and between species and within and between plant functional types (PFT)

Tenet 2: Basic PFTs capture a sufficiently important fraction of trait variation to represent functional diversity
For all 10 traits, the PFT mean values are significantly different between PFTs (Table 5). Four traits show larger variation between PFT mean values than within PFTs (plant height, seed mass, leaf longevity, AmaxN), two traits show similar variation between PFT means and within PFTs (SLA, Amaxm). As a consequence, more than 60% of the observed variance occurs between PFTs for plant height and leaf longevity, and about 40% of the variation occurs between PFTs for seed mass, SLA, Amaxm and AmaxN (Fig. 5). The high fraction of explained variance for these six traits reflects the definition of PFTs based on the closely related qualitative traits: plant growth form, leaf phenology (evergreen/deciduous), leaf type (needle-leaved/broadleaved) and photosynthetic pathway (C3/C4). For these traits, PFTs such as those commonly used in vegetation models capture a considerable fraction of observed variation with relevant internal consistency. However, for certain traits the majority of variation occurs within PFTs: four traits show smaller variation between than within PFTs, causing substantial overlap across PFTs (Nm, Na, Pm, Amaxa). In these cases only about 20-30% of the variance is explained by PFT, and about 70-80% of variation occurs within PFTs.

Representation of trait variation in the context of global vegetation models
To demonstrate how the observed trait variation is represented in global vegetation models, we first compare observed trait ranges of SLA to parameter values for SLA used in 12 global vegetation models; then we compare observed trait ranges of N m with state variables of nitrogen concentration calculated within the dynamic global vegetation model O-CN (Zaehle & Friend, 2010). Some vegetation models separate PFTs along climatic gradients into biomes, for which they assign different parameter values. A rough analysis of SLA along the latitudinal gradient (as a proxy for climate) indicates no major impact on SLA within PFT (Fig. 6), so in the following we analyse SLA data jointly by PFT. However, the range of observed trait values for SLA per PFT is remarkably large, except for the PFT 'needle-leaved deciduous trees' (Figs 6 and 7). The parameter values from most of the 12 models fall in regions of moderately high density of SLA observations, but most are clearly different from the mean, and some parameter values are at the low ends of probabilities, surprisingly far off the mean value of observations.
The range of observed trait values for N m per PFT is also high (Fig. 8), except for the PFT 'needle-leaved evergreen trees'. Modelled state variables are in most cases within the range of frequently observed trait values; model values for the PFT 'needle-leaved evergreen trees' match the observed distribution almost perfectly. Nevertheless, there are considerable differences between modelled and observed distributions: the modelled state variables are approximately normally distributed on the original scale, while the observed trait values are log-normally distributed; the range of modelled values is substantially smaller than the range of observations; and the highest densities are shifted. Apart from possible deficiencies of the O-CN model, the deviation between observed and modelled distributions may be due to inconsistencies between compiled traits and modelled state variables: trait entries in the database are not abundance-weighted with respect to natural occurrence, and they represent the variation of single measurements, while the model produces 'community' measures. The distribution of observed data presented here is therefore likely wider than the abundance-weighted leaf nitrogen content of communities in a given model grid cell.
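The argument that single measurements vary more widely than abundance-weighted community values can be made concrete with a small simulation. This is a sketch under stated assumptions, not an analysis of TRY or O-CN output: the trait pool is drawn from an arbitrary log-normal distribution, each hypothetical community holds 15 entries, and abundances are drawn from an exponential distribution.

```python
import random
import statistics

def community_weighted_mean(values, abundances):
    """Abundance-weighted mean of trait values within one community."""
    total = sum(abundances)
    return sum(v * a for v, a in zip(values, abundances)) / total

rng = random.Random(42)

# Hypothetical pool of single trait measurements (log-normal by assumption).
pool = [rng.lognormvariate(3.0, 0.4) for _ in range(5000)]

# Build hypothetical communities and compute one weighted mean per community.
cwms = []
for _ in range(500):
    members = rng.sample(pool, 15)
    abundances = [rng.expovariate(1.0) for _ in members]
    cwms.append(community_weighted_mean(members, abundances))

sd_single = statistics.stdev(pool)
sd_cwm = statistics.stdev(cwms)
print(f"SD of single entries:  {sd_single:.2f}")
print(f"SD of community means: {sd_cwm:.2f}")
```

Because each community value averages over several entries, its spread across communities is smaller than the spread of the single entries, which is the direction of the mismatch discussed above.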

Discussion
The TRY initiative and the current status of data coverage
The TRY initiative has been developed as a Data Warehouse to integrate different trait databases. Nevertheless, TRY does not aim to replace existing databases, but rather provides a complementary way to access these data consistently with other trait data; it facilitates synergistic use of different trait databases. Compared with a Meta Database approach, which would link a network of separate databases, the integrated database (Data Warehouse) provides the opportunity to standardize traits, add ancillary data, provide accepted species names and to identify outliers and duplicate entries. A disadvantage of the Data Warehouse approach is that some of the databases contributing to TRY are continuously being developed (see Table 2). However, these contributions to TRY are regularly updated.

Fig. 6 Worldwide range in specific leaf area (SLA, mm² mg⁻¹) along a latitudinal gradient for the main plant functional types. Grey, all data; black, data for the plant functional group (PFT) under scrutiny.
The list of traits in the TRY database is not fixed, and it is anticipated that additional types of data will be added to the database in the future. Examples include sap-flow measurements, which are fluxes from which trait values can be calculated, just as photosynthesis measurements can be used to determine parameter values of the Farquhar model (Farquhar et al., 1980), and leaf venation, which has recently been defined in a consistent way and appears to be correlated with other leaf functional traits (Sack & Frole, 2006; Brodribb et al., 2007; Blonder et al., 2011). Ancillary data, contributed with the trait data, may include images. There is also room for expansion of the phylogenetic range of the data incorporated in the database. There is currently little information on nonvascular autotrophic cryptogams in TRY (i.e. bryophytes and lichens), despite their diversity in species, functions and ecosystem effects, and the growing number of trait measurements being made on species within these groups.
Although they represent a limited set of species (5-10%), most probably they include the most abundant (dominant) species. The high number of characterized species opens up the possibility of identifying the evolutionary branch points at which large divergences in trait values occurred. Such analyses will improve our understanding of trait evolution at both temporal and spatial scales. They highlight the importance of including trait data for autotrophs representing very different branches of the Tree of Life (Cornelissen et al., 2007; Lang et al., 2009) in the TRY database.
For some traits, we know that many more data exist, which could potentially be added to the database. Nevertheless, for some traits the lack of data reflects difficulties in data collection: the measurements are difficult or laborious. Root measurements fall into this category. Rooting depth (or more exactly, maximum water extraction depth) is among the most influential plant traits in global vegetation models, yet we have estimates for only about 0.05% of the vascular plant species. Data for other root traits are even scarcer. However, many aboveground traits correlate with belowground traits (see Kerkhoff et al., 2006), so the data in TRY do give some indication about belowground traits. Apart from this, root traits are a focus of current studies (Paula & Pausas, 2011). Anatomical traits also have weak coverage in general. Quantifying anatomy from microscopic cross-sections is slow and painstaking work, and there is currently no consensus on which are the most valuable variables to quantify in leaf sections, apart from standard variables such as tissue thicknesses and cell sizes, which show important correlations with physiological function, growth form and climate (Givnish, 1988; Sack & Frole, 2006; Markesteijn et al., 2007; Dunbar-Co et al., 2009; Hao et al., 2010). An exception is wood anatomy, where TRY contains conduit densities and sizes for many species (about 7000 and 3000 species, respectively). Finally, allometric or architectural relationships that describe relative biomass allocation to leaves, stems, and roots through the ontogeny of individual plants are presently scattered across 72 different traits, each with low coverage. These traits are essential for global vegetation models and this is an area where progress in streamlining data collection is needed.

Fig. 8 [...] (Zaehle & Friend, 2010). n, number of entries in the TRY database (left) and number of grid elements in O-CN with given PFT (right).
Many trait data compiled in the database were not necessarily collected according to similar or standard protocols. Indeed many fields of plant physiology and ecology lack consensus definitions and protocols for key measurements. However, progress is being made both towards a posteriori data consolidation (e.g. Onoda et al., 2011) and towards standardizing trait definitions and measurement protocols, e.g. via a common plant trait Thesaurus (Plant Trait Thesaurus: http://trait_ontology.cefe.cnrs.fr:8080/Thesauform/), and a handbook and website (PrometheusWiki: http://prometheuswiki.publish.csiro.au/tiki-custom_home.php) of standard definitions and protocols (Cornelissen et al., 2003b; Sack et al., 2010).
Information about the abiotic and biotic environment in combination with trait data is essential to allow an assessment of environmental constraints on the variation of plant traits (Fyllas et al., 2009; Meng et al., 2009; Ordoñez et al., 2009; Albert et al., 2010b; Poorter et al., 2010). Some of this information has been compiled in the TRY database. However, the information about soil, climate and vegetation structure at measurement sites is not well structured, because there is no general agreement on what kind of environmental information is most useful to report in addition to trait measurements. A consensus on these issues would greatly improve the usefulness of ancillary environmental information. Geographic references should be a priority for nonexperimental data.
The number of observations or species with data for all traits declines rapidly with an increasing number of traits: fewer species have data for each trait (see Appendix S3). In cases where multivariate analyses rely on completely sampled trait-species matrices, this issue poses a significant constraint on the number of traits and/or species that can be included. Gap filling techniques, e.g. hierarchical Bayesian approaches or filtering techniques (Shan & Banerjee, 2008; Su & Khoshgoftaar, 2009), offer a potential solution. On the other hand, simulation work in phylogenetics has shown that missing data are not by themselves problematic for phylogenetic reconstruction (Wiens, 2003, 2005). Similar work could be performed in trait-based ecology, and the emerging field of ecological informatics (Recknagel, 2006) may help to identify representative trait combinations while taking incomplete information into account (e.g. Mezard, 2007).
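How quickly a complete trait-species matrix shrinks as traits are added can be seen in a minimal sketch. The table below is a made-up illustration (species labels, trait names and values are hypothetical, not TRY entries), and `species_with_all_traits` is a helper introduced only for this example.

```python
def species_with_all_traits(trait_table, traits):
    """Return the species that have an entry for every trait in `traits`."""
    return sorted(
        species
        for species, observed in trait_table.items()
        if all(trait in observed for trait in traits)
    )

# Hypothetical species-by-trait table with gaps.
toy_table = {
    "Species A": {"SLA": 12.1, "Nm": 18.0, "height": 0.4, "seed_mass": 1.2},
    "Species B": {"SLA": 8.3, "Nm": 12.5, "height": 22.0},
    "Species C": {"SLA": 20.5, "Nm": 25.1},
    "Species D": {"SLA": 15.0},
}

trait_order = ["SLA", "Nm", "height", "seed_mass"]
counts = []
for k in range(1, len(trait_order) + 1):
    complete = species_with_all_traits(toy_table, trait_order[:k])
    counts.append(len(complete))
    print(f"{k} trait(s) required: {len(complete)} species with complete data")
```

Each additional required trait can only keep or reduce the set of complete species, which is why analyses that need complete matrices have to trade off the number of traits against the number of species.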

General pattern and ranges of trait distribution
Based on the TRY dataset, we characterized two general patterns of trait density distributions: (1) plant traits are log-normally rather than normally distributed and (2) the range of variation tends to increase with trait-dimensionality. Here the analysis benefited from compiling large numbers of trait entries for several traits from different aspects of plant strategy. Based on the rich sampling, we could quantify simple general rules for trait distributions and still identify deviations in the individual case. The approximately log-normal distributions confirm prior reports for individual traits (e.g. Wright et al., 2004) and are in agreement with general observations in biology (Kerkhoff & Enquist, 2009), although we also observe deviation from log-normal distribution, e.g. as an imprint of plant growth form or leaf type. The approximately log-normal distribution is most probably due to the fact that plant traits often have a lower bound of zero but no upper bound relevant for the data distribution. This log-normal distribution has several implications: (1) On the original scale, relationships are expected to be multiplicative rather than additive (Kerkhoff & Enquist, 2009; see also Appendix S2). (2) Log- or log-log scaled plots are not sophisticated techniques to hide huge variation, but the appropriate presentation of the observed distributions (e.g. Wright et al., 2004). On the original scale, bivariate plots of trait distributions are expected to be heteroscedastic (e.g. Kattge et al., 2009). (3) Trait-related parameters and state variables in vegetation models can be assumed to be log-normally distributed as well, e.g. Figs 7 and 8 (Knorr & Kattge, 2005). For more details, see Appendix S2.
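A simple way to see why the log scale is the appropriate presentation is to compare the skewness of a trait on the original and on the log scale. The sketch below uses simulated values drawn from an assumed log-normal distribution (the parameters are arbitrary, and `sample_skewness` is a helper defined here for illustration); it is not a re-analysis of TRY data.

```python
import math
import random

def sample_skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(1)

# Simulated trait values: bounded below by zero, no effective upper bound.
trait_values = [rng.lognormvariate(0.0, 0.8) for _ in range(5000)]
log_values = [math.log10(v) for v in trait_values]

skew_raw = sample_skewness(trait_values)
skew_log = sample_skewness(log_values)
print(f"skewness on original scale: {skew_raw:.2f}")
print(f"skewness on log10 scale:    {skew_log:.2f}")
```

On the original scale the distribution has a long right tail (strongly positive skewness), whereas after log transformation it is close to symmetric, so additive statistics such as means and SDs are better computed on the log scale.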
For several traits, we quantified ranges of variation: overall variation, intra- and interspecific variation, and variation with respect to different functional groups. Most of the trait data compiled within the TRY database have been measured within natural environments and only a small fraction comes from experiments. Therefore, the impact of experimental growth conditions on observed trait variation is probably small in most cases and the observed trait variation in the TRY database comprises primarily natural variation at the level of single organs, including variation due to different measurement methods and, of course, measurement errors. However, systematic sampling of trait variation at single locations is a relatively new approach (Albert et al., 2010a, b; Baraloto et al., 2010; Hulshof & Swenson, 2010; Jung et al., 2010b; Messier et al., 2010), and it may therefore be shown that trait variability under natural conditions is underestimated in the current dataset.

Tenets revisited
The results presented here are a first step to illuminate two basic tenets of plant comparative ecology and vegetation modelling at a global scale: (1) The aggregation of trait data at the species level represents the major fraction of variation in trait values. At the same time, we have shown surprisingly high intraspecific variation, for some traits responsible for up to 40% of the overall variation (Table 5, Figs 4 and 5). This variation reflects genetic variation (among genotypes within a population/species) and phenotypic plasticity. Through the TRY initiative, a relevant amount of data is available to quantify and understand trait variation beyond aggregation on species level. The analysis presented here is only a first step to disentangle within- and between-species variability. It is expected that in combination with more detailed analyses the TRY database will support a paradigm shift from species to trait-based ecology.
(2) Basic PFTs, such as those commonly used in vegetation models, capture a considerable fraction of observed variation with relevant internal consistency. However, for certain traits the majority of variation occurs within PFTs, responsible for up to 75% of the overall variation (Table 5). This variation reflects the adaptive capacity of vegetation to environmental constraints (Fyllas et al., 2009; Meng et al., 2009; Ordoñez et al., 2009; Albert et al., 2010b; Poorter et al., 2010) and it highlights the need for refined plant functional classifications for Earth system modelling. The current approach to vegetation modelling, using few basic PFTs and one single fixed parameter value per PFT (even if this value equals the global or regional mean), does not account for the rather wide range of observed values for related traits and thus does not account for the adaptive capacity of vegetation. A more empirically based representation of functional diversity is expected to contribute to an improved prediction of biome boundary shifts in a changing environment.
There are new approaches in Earth system modelling to better account for the observed variability: suggesting more detailed PFTs, modelling variability within PFTs or replacing PFTs by continuous trait spectra. In the context of this analysis we focused on a basic set of PFTs. This schema is not immutable and there is not one given functional classification scheme. In fact, PFTs are very much chosen and defined along specific needs and the availability of information. For example, the PFTs used in an individual-based forest simulator (e.g. Chave, 1999) are by necessity very different from those used for DGVMs. The TRY dataset will be as important for allowing the definition of new, more detailed PFTs as for parameterizing the existing ones. Some recent models represent trait ranges as state variables along environmental gradients rather than as fixed parameter values.
The O-CN model (Zaehle & Friend, 2010) is an example of such a new generation of vegetation models, as are the NCIM model (Esser et al., 2011) and, in combination with an optimality approach, the VOM model (Schymanski et al., 2009). Finally, functional diversity may be represented by model ensemble runs with continuous trait spectra and without PFT classification (Kleidon et al., 2009). Compared with current vegetation models, these new approaches will be more flexible with respect to the adaptive capacity of vegetation. The TRY database is expected to contribute to these developments, which will provide a more realistic, empirically grounded representation of plants and ecosystems in Earth system models.

A unified database of plant traits in the context of global biogeography
The analyses presented here are only a first step to introduce the TRY dataset. To better understand, separate, and quantify the different contributions to trait variation observed in TRY, more comprehensive analyses could be carried out, e.g. variance partitioning accounting for phylogeny and disentangling functional and regional influences, or analysis of (co-)variance of plant traits along environmental gradients. An integrative exploration of ecological and biogeographical information in TRY is expected to substantially benefit from progress in the science of machine learning and pattern recognition (Mjolsness & DeCoste, 2001). In principle, we are confronted with a challenge similar to the one that genomics faced after large-scale DNA sequencing techniques had become available. Instead of thousands of sequences, our target is feature extraction and novelty detection in thousands of plant traits and ancillary information. Nonlinear relations among items and the treatment of redundancies in trait space have to be addressed. Nonlinear dimensionality reduction (Lee & Verleysen, 2007) may shed light on the inherent structures of data compiled in TRY. Empirical inference of this kind is expected to stimulate and strengthen hypothesis-driven research (Golub, 2010; Weinberg, 2010) towards a unified ecological assessment of plant traits and their role for the functioning of the terrestrial biosphere.
The representation of trait observations in a spatial or climate context in the TRY database is limited (Figs 2 and 3). This situation can be overcome using complementary data streams: trait information can be spatially expanded with comprehensive compilations of species occurrence data, e.g. from GBIF or herbarium sources. For SLA and leaf nitrogen content we provide an example for combining trait information with species occurrence data from the GBIF database and with climate reconstruction data derived from the CRU database (Fig. 3). Given that the major fraction of variation is between species, the variation of species mean trait values may be used, but with caution, as a proxy for trait variation, as has already been performed in recent studies at regional and continental scales (Swenson & Weiser, 2010). Ollinger et al. (2008) derived regional maps of leaf nitrogen content and maximum photosynthesis from trait information in combination with eddy covariance fluxes and remote sensing data. Based on these approaches and advanced spatial interpolation techniques (Shekhar et al., 2004), a unified global database of plant traits may permit spatial mapping of key plant traits at a global scale (Reich, 2005).
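The idea of spatially expanding trait information with occurrence records can be sketched as follows. The example assumes that occurrences are available as plain (species, latitude, longitude) tuples and species means as a dictionary; it does not use the GBIF or CRU data formats, the function `grid_cell_trait_means` and all values are hypothetical, and each species is counted once per grid cell without abundance weighting.

```python
import math
from collections import defaultdict

def grid_cell_trait_means(occurrences, species_means, cell_size=1.0):
    """Average species mean trait values over the species recorded per grid cell.

    occurrences: iterable of (species, latitude, longitude) tuples.
    species_means: dict mapping species to a mean trait value.
    cell_size: edge length of a grid cell in degrees.
    """
    species_in_cell = defaultdict(set)
    for species, lat, lon in occurrences:
        if species not in species_means:
            continue  # no trait information available for this species
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        species_in_cell[cell].add(species)

    return {
        cell: sum(species_means[s] for s in members) / len(members)
        for cell, members in species_in_cell.items()
    }

# Made-up occurrence records and species mean SLA values.
toy_occurrences = [
    ("sp1", 50.2, 10.5),
    ("sp2", 50.7, 10.1),
    ("sp1", 50.9, 10.9),   # second record of sp1 in the same cell
    ("sp3", -3.4, 20.2),   # species without trait data
]
toy_species_means = {"sp1": 10.0, "sp2": 20.0}

cell_means = grid_cell_trait_means(toy_occurrences, toy_species_means)
print(cell_means)
```

Because the sketch relies on species means only, it ignores intraspecific variation, which is the reason the text recommends using this proxy with caution.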
The relationship between plant traits (organism-level) and ecosystem or land surface functional properties is crucial. Recent studies have built upon the eddy covariance network globally organized as FLUXNET (a network of regional networks coordinating observations from micrometeorological tower sites: http://www.fluxnet.ornl.gov) and inferred site-specific ecosystem-level properties from the covariation of meteorological drivers and ecosystem-atmosphere exchange of CO2 and water (Baldocchi, 2008). These include inherent water-use efficiency (Reichstein et al., 2007; Beer et al., 2009), maximum canopy photosynthetic capacity (Ollinger et al., 2008), radiation use efficiency and light response curve parameters (Lasslop et al., 2010). How species traits relate to these ecosystem-level characteristics has not been investigated, but should be possible via a combined analysis of FLUXNET and TRY data. For example, it is possible to test the hypothesized correlation between SLA, P, and N content of dominant species with radiation use efficiency and inherent water-use efficiency at the ecosystem level (as implicit in Ollinger et al., 2008). Similarly, patterns of spatially interpolated global fields of biosphere-atmosphere exchange (Beer et al., 2010; Jung et al., 2010a) may be related to spatialized plant traits in order to detect a biotic imprint on the global carbon and water cycles. Such increased synthetic understanding of variation in plant traits is expected to support the development of a new generation of vegetation models with a better representation of vegetation structure and functional variation (Violle & Jiang, 2009).

Conclusions and perspectives
The TRY database provides unprecedented coverage of information on plant traits and will be a permanent communal repository of plant trait data. The first analyses presented here confirm two basic tenets of plant comparative ecology and vegetation modelling at global scale: (1) the aggregation of trait data at the species level represents the major fraction of variation and (2) PFTs cover a relevant fraction of trait variation to represent functional diversity in the context of vegetation modelling. Nevertheless, at the same time these results reveal for several traits surprisingly high variation within species, as well as within PFTs, a finding which poses a challenge to large-scale biogeography and vegetation modelling. In combination with improved (geo)-statistical methods and complementary data streams, the TRY database is expected to support a paradigm shift in ecology from being based on species to a focus on traits and trait syndromes. It also offers new opportunities for research in evolutionary biology, biogeography, and ecology. Finally, it allows the detection of the biotic imprint on global carbon and water cycles, and fosters a more realistic, empirically grounded representation of plants and ecosystems in Earth system models.