Functional trait metrics are sensitive to the completeness of the species' trait data?



  1. Functional diversity (FD) is an important concept for studies of both ecosystem processes and community assembly, so it is important to understand the behaviour of common metrics used to express it.
  2. Data from an existing study of the relationship between FD and environmental drivers were used to simulate the impact of a progressive failure to measure the traits of all the species present under three scenarios: intraspecific variation between sites (i) ignored, (ii) assessed or (iii) ignored but with metrics calculated at the sampling unit rather than the site level.
  3. All the FD metrics were highly sensitive to failing to measure the traits of all the species present. Functional dispersion, functional richness and Rao's entropy all generally declined with a reduced proportion of species or cover assessed for traits, whilst functional divergence and evenness increased for some sites and decreased for others. Functional richness was the most sensitive (mean absolute deviation at 70% of species assessed had a range of 11·2–28·2% across scenarios), followed by functional evenness (range 6·4–38·5%), functional divergence (5·2–8·3%), Rao's entropy (1·4–7·0%) and functional dispersion (0·7–3·5%).
  4. It is clear that failing to measure the traits of all species at a site can have a serious impact on the value of any functional trait metric computed and on any conclusions drawn from such data. Future studies of FD need to concentrate on the potential impact of the sampling regime of both traits and species and the scale at which the computations are made on the behaviour of metrics and subsequent robustness of the results.


Functional diversity (FD) is of increasing focus in ecology. Since the first use of the term ‘functional diversity’ in a publication's title in 1990 (Claret 1990), the number of papers published had reached 140 in 2012 (papers containing the phrase ‘functional diversity’ in the topic in the category ‘Ecology’ in Web of Science). Functional diversity has become an important concept as it has contributed to our understanding of community assembly; for example, analysing FD has been shown to be useful in assessing how communities respond to disturbance (Mouillot et al. 2013). It has also been seen as a predictor of ecosystem function (Díaz et al. 2007), where it is assumed that communities with a higher FD deliver more in terms of ecosystem services (Roscher et al. 2012) or may be more resilient to environmental change (Walker, Kinzig & Langridge 1999). Consequently, a range of methods has been proposed to quantify FD (summarised in Mouchet et al. 2010) based on information on trait values or attributes for the species present as proxies to represent functional differences.

Considerable trait information is now held in databases; for instance, LEDA (Kleyer et al. 2008) and TRY (Kattge et al. 2011), amongst others, hold large quantities of plant trait data. However, collecting trait information in specific studies is important if there is evidence that the trait values, or even attributes, vary according to the environmental conditions or local adaptation. For instance, leaf dry matter content and specific leaf area are sensitive to management and soil fertility (Ordoñez et al. 2009; Pakeman 2013). In certain studies, it may be acceptable to treat traits as invariant within the system in question if intraspecific variation is likely to be small. However, where there is substantial intraspecific variation, it may be necessary to measure the traits of a species at multiple sites within a study. A previous paper (Pakeman & Quested 2007) demonstrated the impact of sampling only the more common species in a system on the calculation of community weighted mean trait values; errors in estimating community weighted mean traits were minimal if the commoner species were measured such that the cover of the species measured exceeded 80%. What is not known is how such an approach would impact on the behaviour of FD metrics (de Bello et al. 2013). In other words, can measuring traits on only a subset of the species within a system give an appropriate measure of FD? This is important as collecting trait information is costly, and hence, there is a potential trade-off between the accuracy in quantifying FD within a site and the power gained by quantifying FD across more sites.

Data from an existing multi-site study that investigated patterns of FD along environmental gradients (Pakeman 2011a; Pakeman, Lennon & Brooker 2011) were used to test the hypothesis that unweighted measures of FD such as functional richness (FRic) will be more sensitive to incomplete trait censuses than those that use abundance weighting in their calculation. In addition, the study assessed how quickly accuracy in estimating FD is lost across three different scenarios of trait sampling. (1) The progressive impact of selecting only the commoner species for trait measurement across the whole system – a situation that might be encountered where resources are limited, but it is assumed that intraspecific differences between the sites in a system are unimportant. (2) The progressive impact of selecting only the commoner species within sites for trait measurement – a situation that might be encountered where it is known that intraspecific variation is substantial and effort is made to measure intraspecific variation in traits. (3) An additional third scenario repeated the first but examined the impacts on FD metrics if they were assessed at the quadrat level before averaging values for each site.

Materials and methods

Plant and Trait Data

This study took data from a larger study investigating how land use influences biodiversity and ecosystem functioning. The study system contained a variety of land uses and intensities within a small area (5 × 7 km) on the west coast of Scotland (centred at lat. 57·31 N, long. 5·66 W). Sites shared the same geology and climate. Twenty-four of the sites were used in this study, with land uses covering arable, fallow areas cropped for hay, winter-grazed rough grassland and tall herb communities, unimproved silage fields, unimproved pasture and moorland. This excluded the wetland and woodland sites referred to in previous papers as they shared few species with the other sites, and hence, the differences in species composition may have swamped the analysis described here (Pakeman 2011a,b; Pakeman, Lennon & Brooker 2011). Vegetation surveys were carried out in July 2007. Relative abundance (the proportion of above-ground biomass estimated visually) of all higher plant species, bryophytes and litter was estimated in between four and seven, randomly placed 1 × 1 m quadrats per site. Species richness varied from 11 to 42 per site and totalled 125 species across the study area. Further details have been published in previous papers from this site (Pakeman 2011a,b; Pakeman, Lennon & Brooker 2011).

Plant trait data were assembled from two main sources – BiolFlor (Klotz, Kühn & Durka 2002) and LEDA (Kleyer et al. 2008) – and were restricted to the eight response traits identified for this system that linked management and soil conditions with the vegetation (Pakeman 2011b); namely, bud height (life-form), canopy structure, leaf dry matter content, leafing period, log canopy height, log leaf size, vegetative spread by rhizomes and seed terminal velocity (Table S1). These traits included a mix of quantitative and categorical traits. Where gaps were present (c. 2% of cases), information was supplemented with data from floras and by averaging data from congeners. Canopy structure and bud height were coded to reduce the number of attribute columns in the analysis.

Functional Diversity Calculations

All FD calculations were carried out with the FD software of Laliberté & Shipley (2011). This uses principal co-ordinates analysis (PCoA) to return axes that are then used as ‘traits’ to compute FD. The software calculates a range of multidimensional indices, but for this exercise, these were restricted to functional divergence (FDiv), functional evenness (FEve) and FRic of Villéger, Mason & Mouillot (2008), functional dispersion (FDis) (Laliberté & Legendre 2010) and Rao's quadratic entropy (Q) (Botta-Dukát 2005) – definitions are provided in Table 1. For illustration, the community weighted mean of leaf dry matter content (CWM LDMC) was also computed to allow comparison with the impact of the scenarios on mean trait calculations.

Table 1. Definitions of the functional diversity indices used in this study

| Functional diversity metric | Definition |
| --- | --- |
| Functional dispersion¹ (FDis) | The weighted mean distance in multidimensional trait space of individual species to the centroid of all species. Weights are species relative abundances |
| Functional divergence² (FDiv) | Species deviance from the mean distance to the centre of gravity weighted by relative abundance within multidimensional trait space |
| Functional evenness² (FEve) | The regularity with which species abundances are distributed along the minimum spanning tree which links all the species in the multidimensional functional space |
| Functional richness² (FRic) | The convex hull volume of the individual species in multidimensional trait space |
| Rao's quadratic entropy³ (Q) | Sum of the pairwise distances between species in multidimensional trait space weighted by their relative abundance |
| Community weighted mean (CWM) | The mean trait value of species weighted by the species abundances |

Sources: ¹Laliberté & Legendre (2010), ²Villéger, Mason & Mouillot (2008) and ³Botta-Dukát (2005).
FD, Functional diversity.
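
To make the abundance-weighted definitions in Table 1 concrete, the following is a minimal Python sketch of CWM, FDis and Rao's Q computed directly from species coordinates in trait space. It is an illustration only and not the FD software used in this study: the function names and the three-species example are hypothetical, distances are plain Euclidean, and Rao's Q is taken here as the sum over unordered species pairs (other conventions, such as squared distances or ordered pairs, change the scaling).

```python
import math


def relative(abundances):
    """Convert raw abundances to relative abundances summing to one."""
    total = sum(abundances)
    return [a / total for a in abundances]


def cwm(trait, abundances):
    """Community weighted mean of a single trait."""
    p = relative(abundances)
    return sum(pi * t for pi, t in zip(p, trait))


def fdis(coords, abundances):
    """Abundance-weighted mean distance of species to the weighted centroid."""
    p = relative(abundances)
    dims = len(coords[0])
    centroid = [sum(pi * x[k] for pi, x in zip(p, coords)) for k in range(dims)]
    return sum(pi * math.dist(x, centroid) for pi, x in zip(p, coords))


def rao_q(coords, abundances):
    """Sum of pairwise distances weighted by the product of relative abundances
    (assumption: each unordered pair is counted once)."""
    p = relative(abundances)
    n = len(coords)
    return sum(
        p[i] * p[j] * math.dist(coords[i], coords[j])
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical community: three species in a two-dimensional trait space.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
abundances = [2, 1, 1]          # relative abundances 0.5, 0.25, 0.25
ldmc = [10.0, 20.0, 40.0]       # made-up single-trait values

print(cwm(ldmc, abundances))
print(fdis(coords, abundances))
print(rao_q(coords, abundances))
```

Because every term is multiplied by a relative abundance, a rare species contributes little to any of these three quantities, which is consistent with their lower sensitivity reported in this study.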

Scenarios of Progressive Failure to Measure Species' Traits

Two scenarios were set up to analyse the impact on the FD metrics of not measuring varying proportions of the species. (1) ‘Ordered, system loss’: this assumes that intraspecific trait variation is limited in importance, and hence, the sampling strategy would be to sample the traits of as many common species as possible. Thus, the scenario analyses the impact of progressive removal of the traits of the rarer species from the whole data set. (2) ‘Ordered, site loss’: this assumes that intraspecific variation is significant, so traits for each species are measured across all sites. The scenario analyses the impact of the progressive failure to measure the traits of species from each site. For this data set, the 125 species across the study area represent 607 site by species combinations; a near fivefold increase in effort required compared with measuring traits for the whole species set under Scenario 1. A third scenario (3) repeated Scenario 1, but calculated FD at the quadrat level before averaging values for each site. Thus, this scenario assessed the strategy of assessing FD per unit area, and the use of the sampling itself to reduce the impact of rare species or species with particularly divergent traits (the three scenarios are shown diagrammatically in Fig. 1). A fourth scenario is possible – assessing trait variation at the quadrat level – but this is unlikely to be followed and would give similar results to the other scenarios. All three scenarios were assessed by looking at the responses of the FD metrics to a reduction in the number of species which have had their traits measured. One consequence of reducing the number of species is that this affects the PCoA, as the number of axes kept for the computation of the indices is set to the number of species in the most species-poor site. Removing species can then reduce the number of axes kept and hence alter the calculation of the diversity indices. This can result in step changes in some indices as the number of axes is reduced. The same results expressed in terms of the cover of species are shown in the supplementary material. All metrics were scaled to equal one for the full community, to allow for easier comparison between the responses of the different sites.
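
The ordered-loss procedure can be sketched as follows. This is a simplified Python illustration, not the code used for the simulations: it drops one species at a time from a single site, rarest first, renormalises the remaining abundances (an assumption), and scales the metric to one for the full community. The names `ordered_loss` and `rarity`, and the four-species data, are hypothetical. Passing system-wide abundances as `rarity` mimics the ordering of Scenario 1, whereas the default (site abundances) mimics Scenario 2. CWM is used as the metric only to keep the example short, and the PCoA step is omitted.

```python
def cwm(trait, abundance):
    """Community weighted mean; abundances are renormalised over the species kept."""
    total = sum(abundance.values())
    return sum(abundance[s] * trait[s] for s in abundance) / total


def ordered_loss(trait, abundance, metric, rarity=None):
    """Return (proportion of species retained, metric scaled to the full community)
    as trait data for the rarest species are progressively dropped."""
    rarity = rarity or abundance
    # Commonest species first, so truncating the list removes the rarest.
    order = sorted(abundance, key=lambda s: rarity[s], reverse=True)
    full = metric(trait, abundance)
    curve = []
    for k in range(len(order), 0, -1):
        kept = {s: abundance[s] for s in order[:k]}
        curve.append((k / len(order), metric(trait, kept) / full))
    return curve


# Hypothetical site: one trait, four species, abundances as percentage cover.
trait = {"a": 300.0, "b": 200.0, "c": 400.0, "d": 100.0}
abundance = {"a": 60, "b": 25, "c": 10, "d": 5}

for proportion, scaled in ordered_loss(trait, abundance, cwm):
    print(f"{proportion:.2f} of species measured -> scaled CWM {scaled:.3f}")
```

Any of the FD metrics could be substituted for `cwm`, provided it accepts the same two arguments.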

Figure 1.

Diagrammatic representation of the structure of the data and the simulations. □ represents a quadrat. Scenario 1: Traits measured on the area species pool. The simulation represents site level FD metrics calculated by successively removing trait information for the rarest species across the study area – the impact on the species used in calculating the FD metrics for site n are shown for failure to measure the traits of the rarest 50% of species across the study area. Scenario 2: Traits measured for site species pools. The simulation represents site level FD metrics calculated by successively removing trait information for the rarest species in the site – in this figure the impact on site n of removing the trait information on the rarest 50% of species is shown. Scenario 3: Traits measured on the area species pool. The simulation represents quadrat level FD metrics calculated by successively removing trait information for the rarest species (50% for this illustration) across the study area followed by averaging FD metrics within sites. FD, Functional diversity.

Results

It is clear that not measuring the traits of the rarer species (Scenario 1) had a differential effect on the different FD metrics (Fig. 2). FDis (Fig. 2a) and Q (Fig. 2m) appeared to be resilient across all the communities, showing little change down to 70% of species measured (mean absolute deviation of 0·7 and 1·4%, respectively, Fig. 3a), and many communities appeared to be unaffected down to 30%. The responses of FDiv (Fig. 2d) and FEve (Fig. 2g) were more variable than either FDis or Q, with mean absolute deviations of 8·3 and 13·0%, respectively, at 70% of species measured (Fig. 3a). The behaviour of some communities for these metrics was highly sensitive, with significant deviations from the value of the full community appearing with only a small reduction in the number of species measured. As might be expected because it is correlated with species number, FRic was very sensitive (Fig. 2j) and, on average, declined sharply with reduced effort at measuring species (mean absolute deviation of 20·9% at 70% of species measured, Fig. 3a). The increases seen for some communities are the result of the initial ordination step producing different solutions as the rarer species are removed from the analysis. In contrast, estimating CWM LDMC can be carried out with only 50% of the species with minimal error (Fig. 2p); 0·4% mean absolute deviation at 50% of species measured, 0·1% at 70% (Fig. 3a). As rank:abundance curves are right-skewed within communities (mean skewness 2·90, mean number of species making up 50% of abundance 2·71, 75% 5·54 and 90% 9·29, mean richness 25·29), expressing the changes in FD indices in terms of reducing the cover of species showed that very high levels of coverage of species abundances are necessary (Fig. S1). The behaviour of the FD metrics varied greatly from community to community.

Figure 2.

The impact of removing trait information for the rare species on the computation of the functional diversity metrics under Scenario 1 (a, d, g, j, m, p), rarity assessed from the full species list across all 24 sites (trait values assumed to be constant across all sites), under Scenario 2 (b, e, h, k, n, q), rarity assessed at the individual site level (trait values assumed to vary between sites, i.e. high intraspecific variability) and under Scenario 3 (c, f, i, l, o, r), rarity assessed from the full species list across all 24 sites (trait values assumed to be constant across all sites, metrics calculated per quadrat). Data expressed as a proportion of the metric value at the full species complement for (a, b, c) FDis – functional dispersion, (d, e, f) FDiv – functional divergence, (g, h, i) FEve – functional evenness, (j, k, l) FRic – functional richness, (m, n, o) RaoQ – Rao's quadratic entropy and (p, q, r) LDMC – the community weighted mean of leaf dry matter content. Each line represents one site.

Figure 3.

The mean absolute proportional deviation of the FD metrics from that of the full species data set for (a) Scenario 1 (trait values assumed to be constant across all sites), (b) Scenario 2 (trait values assumed to vary between sites, that is, high intraspecific variability) and (c) Scenario 3 (trait values assumed to be constant across all sites, metrics calculated per quadrat) expressed as a proportion of the full species list at each of the 24 sites; FDis, ····· FDiv, ---- FEve, ―··―··― FRic, ――― Rao's Q, ―·―·― LDMC. FD, Functional diversity.
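
As a hedged illustration of the summary statistic plotted in Fig. 3, the sketch below computes a mean absolute proportional deviation across sites, assuming it is the mean over sites of |reduced / full − 1| expressed as a percentage. The function name and the three site values are hypothetical and are not taken from the study data.

```python
def mean_absolute_deviation(reduced, full):
    """Mean over sites of |reduced / full - 1|, as a percentage.

    `reduced` holds the metric computed with incomplete trait data and
    `full` the metric computed with all species, one value per site.
    """
    deviations = [abs(r / f - 1.0) for r, f in zip(reduced, full)]
    return 100.0 * sum(deviations) / len(deviations)


# Hypothetical metric values for three sites.
full = [2.0, 4.0, 5.0]
reduced = [1.8, 4.4, 5.0]   # one site falls by 10%, one rises by 10%, one is unchanged

print(round(mean_absolute_deviation(reduced, full), 2))
```

Taking absolute values means that increases at some sites and decreases at others do not cancel, which matters for metrics such as FDiv and FEve that moved in both directions.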

Comparison of progressive failure to measure the traits of species under Scenario 2 with that under Scenario 1 revealed that the patterns for the FD metrics were smoother and less erratic, as the removal of species' trait information was ordered within site rather than across sites. Again FDis and Q (Fig. 2b and n) were initially the most stable (mean absolute deviation 3·5 and 7·0%, respectively, at 70%, Fig. 3b), but all metrics showed the substantial impact of failing to measure the traits of even a small proportion of the species present on a site. In general, FDiv and FEve (Fig. 2e and h) increased as more species were not included in the calculations whilst FRic declined (Fig. 2k). FDiv was less sensitive (mean absolute deviation of 6·1% at 70%) than either the similar FEve or FRic (31·5 and 28·2%, respectively, Fig. 3b). As for Scenario 1, LDMC could be estimated well by measuring only 50% of the species present (0·7% mean absolute deviation at 50% and 0·2% at 70% of species measured, Fig. 2q). Expressing the results as reductions in the cover of the measured species had the effect of stretching the left side of the species graphs and condensing the right. Thus, it appeared that for the sites measured here, estimating these FD metrics can be highly sensitive to failing to measure the traits of those species that make up only a small proportion of the cover of a community (Fig. S2).

Contrasting Scenario 3 with Scenario 1 clearly showed that taking a quadrat-based approach is generally more robust (Fig. 2c, f, i, l and o). The metrics provided a similar ranking in sensitivity to Scenario 1 with FRic the most sensitive (mean absolute deviation of 11·2% at 70% of species included, Fig. 3c) followed by FEve (6·4%), FDiv (5·2%), Q (1·9%), FDis (1·1%) and LDMC (0·1%). Expressed as a percentage of the corresponding Scenario 1 deviation, these deviations are smaller for FRic (53·6%), FEve (49·2%) and FDiv (62·7%), larger for Q (135%) and FDis (157%) and similar for LDMC. This is reflected in the behaviour of the individual sites. FRic is highly sensitive, and substantial positive and negative fluctuations can be seen (Fig. 2l). Some of these are in response to the impact of removing species' trait information on the initial multivariate step used in calculating FRic with non-numeric data. Negative and positive fluctuations are also seen for the other FD metrics, although there did appear to be a reduction in spread of behaviour between sites; the widely divergent behaviour of the minority of sites appeared to have been restricted to a degree.
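
The percentages in the comparison above appear to be the Scenario 3 deviations expressed relative to those of Scenario 1. The short Python check below recomputes them from the rounded mean absolute deviations quoted in the text at 70% of species measured; the dictionary names are illustrative only.

```python
# Mean absolute deviations (%) at 70% of species measured, as quoted in the text.
scenario1 = {"FRic": 20.9, "FEve": 13.0, "FDiv": 8.3, "Q": 1.4, "FDis": 0.7}
scenario3 = {"FRic": 11.2, "FEve": 6.4, "FDiv": 5.2, "Q": 1.9, "FDis": 1.1}

# Scenario 3 deviation as a percentage of the Scenario 1 deviation.
relative = {m: round(100 * scenario3[m] / scenario1[m], 1) for m in scenario1}

for metric, value in relative.items():
    direction = "smaller" if value < 100 else "larger"
    print(f"{metric}: {value}% of Scenario 1 ({direction})")
```

The values for FRic, FEve and FDiv match the text to one decimal place. Those for Q and FDis come out at 135·7 and 157·1%, close to the quoted 135 and 157%; the small differences presumably reflect rounding of the inputs.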

Discussion

The data clearly show that the FD measures are highly sensitive to failing to measure the traits of all the species in the community; much more so than is the community weighted mean (Pakeman & Quested 2007). The most sensitive measure was FRic, which is expected as it is unweighted by abundance and correlated with species richness (Villéger, Mason & Mouillot 2008; Mouchet et al. 2010) – hence its sensitivity to the failure to measure the traits of the rarer species. At least some of the rare species in this data set must have extreme trait values that affect the values of the FD indices, as is the case with rare species in other systems (Reader 1998; Cornwell & Ackerly 2010). Species with extreme trait values lie at the vertices of the convex hull and their removal can substantially reduce the remaining volume. FRic was also sensitive in these simulations to the recalculation of the PCoA each time a species' trait information was removed. This accounts for the positive shifts in FD for some sites: the number of axes selected for calculating the indices was reduced as species' trait information was removed from the calculations, and the remaining species ended up occupying a greater volume of trait space under the reformed axes. In addition to the sensitive measure FRic, it is clear that removing a small number of species can have a significant impact on FDiv and FEve for a subset of the communities here. FDiv is dependent on the vertices of the convex hull, so is sensitive to the removal of species with extreme trait values from the calculations, although less so than FRic as the abundance of species is taken into account. If the rare species do have extreme trait values then this can alter the shape of the minimum spanning tree used to calculate FEve. More slowly affected are FDis and Q as they are weighted, like FEve and FDiv, but rely less upon the exact geometry of species distribution in trait space than FDiv and FEve (Villéger, Mason & Mouillot 2008; Mouchet et al. 2010). Computing the FD by quadrat rather than site is helpful for the metrics where outlying species in trait space affect the result; namely, FDiv, FEve and FRic. Conversely, reducing the number of species within each calculation of FD when changing from the site level to the quadrat level has the result of increasing the variability of FDis and Q.
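
The effect of a rare species with extreme trait values on a convex-hull measure can be illustrated with a minimal Python sketch. It works in a two-dimensional trait space for simplicity, so the hull 'volume' is an area, whereas the study used PCoA axes in more dimensions. The hull routine is the standard monotone chain algorithm, and the species names and coordinates are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order (monotone chain algorithm)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area(points):
    """Area of the convex hull via the shoelace formula (unweighted by abundance)."""
    hull = convex_hull(points)
    n = len(hull)
    if n < 3:
        return 0.0
    twice_area = sum(
        hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Hypothetical species positions in a two-trait space.
species = {
    "common_1": (0, 0), "common_2": (2, 0), "common_3": (2, 2), "common_4": (0, 2),
    "interior": (1, 1),       # lies inside the hull
    "rare_extreme": (5, 1),   # rare species at a hull vertex
}

full = hull_area(list(species.values()))
without_rare = hull_area([xy for name, xy in species.items() if name != "rare_extreme"])
without_interior = hull_area([xy for name, xy in species.items() if name != "interior"])

print(full, without_rare / full, without_interior / full)
```

In this made-up case, dropping the single extreme species removes over 40% of the hull area, while dropping the interior species changes nothing; abundance plays no part in either result.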

The initial conclusion to be drawn from this is that studies where only a selection of species have had their traits assessed must have their conclusions seen as tentative (Shipley, Vile & Garnier 2006; Bernard-Verdier et al. 2012; de Bello et al. 2013) and only the conclusions of studies where all species have had their traits measured should be depended on (Mokany & Roxburgh 2010). Arguably, for metrics that behave predictably as the proportion of species with trait data declines (possibly FDiv, FRic and RaoQ), the conclusions may prove robust. However, certain metrics, particularly FDiv and FEve, showed substantially divergent behaviour between sites, such that there is no possibility of judging a priori the impact of measuring traits on only a proportion of the species present.

It is evident from the simulations in this study that there is a clear risk from failing to measure the traits of interest for all the species present. Consequently, there is a real trade-off between effort at recording species distributions and effort at measuring traits. This can partly be resolved by concentrating on a small number of traits for measurement, or by relying mainly on traits that are invariant within a species, that is, life-history traits rather than leaf traits, or by accepting the potential, but likely systematic, error in trait values extracted from databases. The trade-off, however, is exacerbated if there is considerable intraspecific variation in the traits of interest (Albert et al. 2011; Pakeman 2013); the same amount of effort is needed to measure the traits of 20% of the species under Scenario 2 as for all the species under Scenarios 1 and 3. Comparisons within Figs 2 and 3 should be seen in this light. However, what is an acceptable level of error in estimating the ‘true’ value of these metrics must depend on circumstances. What is necessary is that this error is identified, and the simulations shown in this study can give an idea of the likely magnitude and direction of these errors.
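
The effort comparison above follows from the counts given in the methods (125 species across the study area, 607 site by species combinations). A short Python check, assuming that effort is simply proportional to the number of species, or of site by species combinations, that must be measured:

```python
species_total = 125               # species across the study area (Scenarios 1 and 3)
site_species_combinations = 607   # site by species combinations (Scenario 2)

# Relative effort of measuring every species at every site where it occurs.
effort_ratio = site_species_combinations / species_total

# Measurements needed to cover 20% of the site by species combinations.
twenty_percent = site_species_combinations * 20 / 100

print(f"Scenario 2 effort is {effort_ratio:.2f} times that of Scenario 1")
print(f"20% of Scenario 2 is about {twenty_percent:.0f} measurements, against {species_total}")
```

Roughly 121 measurements under Scenario 2 therefore correspond to the 125 needed for complete coverage under Scenarios 1 and 3.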

A logical progression from these simulations is that the sampling of species should attempt to be complete where this is possible, for example, censusing of birds in small woodlands or islands (Ding et al. 2013), as missing rare species from the sampling can have a significant impact on the value of the FD metrics. Where completeness is unlikely to be possible, for example, plants in species-rich grasslands, or for invertebrate and microbial communities under most circumstances, then FD metrics should be computed at the level of the sampling unit rather than the site; this also suggests that many small samples may be better than a few large ones. As FRic is correlated with species richness, then with hindsight, it is clear that this metric should be seen as related to the sampled area in the same way that species richness is (Kent 2012) and expressed the same way. However, the same is true for FDiv and FEve, which are not related to species richness (Villéger, Mason & Mouillot 2008; Mouchet et al. 2010), although they appear to be sensitive to outliers in trait space. This potential sensitivity to the precise method and effort in the sampling of species' abundances may have contributed a fair degree of noise to studies which have attempted to assess the strength of community convergence/divergence or the impact of the environment on these patterns (e.g. Pakeman 2011a; Bernard-Verdier et al. 2012).

For future studies of FD, it is recommended that the sampling regime of communities and of species' traits should be clearly assessed as to their potential impact on the results prior to the study being carried out. Failure to do this risks the validity of conclusions drawn from such a study. Where complete censusing of traits is not possible, then studies should focus on the more robust measures FDis and Q, although this would limit the types of questions addressed by the studies (Mouchet et al. 2010).


I would like to thank Iain Turnbull of the National Trust for Scotland for all his help in arranging access and the many crofters and the grazing clerks of Balmacara, Drumbuie, Duirinish and Plockton for their help in our work. Rob Brooker, Antonia Eastwood, Roger Cummins and Jelle van Rijmenant all helped with the fieldwork and Rob Brooker with collating the trait information. This work was funded by the Scottish Government's Rural and Environment Science and Analytical Services Division.