The intrinsic dimensionality of plant traits and its relevance to community assembly


Summary

  1. Plants are multifaceted organisms that have evolved numerous solutions to the problem of establishing, growing and reproducing with limited resources. The intrinsic dimensionality of plant traits is the minimum number of independent axes of variation that adequately describes the functional variation among plants and is therefore a fundamental quantity in comparative plant ecology. Given the large number of functional traits that are measured on plants, the dimensionality of plant form and function is potentially vast.
  2. A variety of linear and nonlinear methods were used to estimate the intrinsic dimensionality of three large trait data sets. The results of these analyses indicate that while the dimensionality of plant traits is generally larger than we have admitted in the past, the median estimate does not exceed six, even in the most comprehensive data set.
  3. The dimensionality of plant form and function is a blessing, not a curse. The higher the intrinsic dimension of traits in an analysis, the more easily our models can discriminate species in trait space and therefore predict species distributions and abundances. Recent analyses indicate that the ability to predict community composition increases rapidly with additional traits, but reaches a plateau after four to eight traits.
  4. Synthesis. There appears to be a tractable upper limit to the dimensionality of plant traits. To optimize research efficiency for advancing our understanding of trait-based community assembly, ecologists should minimize the number of traits while maximizing the number of dimensions, because including multiple correlated traits does not yield dividends and including more than eight traits leads to diminishing returns. It is recommended to measure traits from multiple organs whenever possible, especially leaf, stem, root and flowering traits, given their consistent performance in explaining community assembly across different ecosystems.

What casts the pall over our victory celebration? It is the curse of dimensionality, a malediction that has plagued the scientist from the earliest days.
Richard E. Bellman (1961)

We have stressed the practical difficulties caused by increases in dimensionality. Now we turn to the theoretical benefits.
David L. Donoho (2000)

Introduction

Quantifying the variation of functional traits among and within species increases our capacity to understand ecosystem processes and community assembly (Lavorel & Garnier 2002; McGill et al. 2006; Westoby & Wright 2006; Suding et al. 2008; Shipley 2010). Ecologists measure dozens of traits, but many are redundant. The intrinsic dimension of a multitrait data set can be informally described as the minimum number of parameters or latent variables needed to describe it (Lee & Verleysen 2007). In other words, the intrinsic dimensionality of plant traits represents the number of independent axes of functional variation among plants and is therefore a fundamental quantity in comparative plant ecology. Ecologists have rarely used dimensionality estimators and have not emphasized the fundamental importance of the intrinsic dimension. Much as psychologists use a five-dimensional personality scheme to predict human behaviour in varying circumstances (McCrae & Costa 2003), ecologists strive to determine the dimensionality of plant traits to predict species responses to environmental conditions (Grime et al. 1997; Westoby et al. 2002; Reich et al. 2003; Wright et al. 2007). It is therefore both timely and appropriate to empirically derive the number of useful dimensions within our growing data sets of plant traits.

Given the vast diversity of life on Earth, ecologists are no strangers to the ‘curse of dimensionality’, which is the general difficulty of navigating high-dimensional spaces (Bellman 1961; Donoho 2000). One of the great challenges in community ecology is understanding the structure of multidimensional species space and mapping multiple environmental variables onto that space (Legendre & Legendre 2012). Trait-based ecology offers an alternative approach for linking environmental gradients to species distributions through the mechanistic link of functional traits (McGill et al. 2006). Ecosystem ecologists see the shift in focus from species to a reduced number of traits as a way of exorcizing the curse of dimensionality, simplifying a once-intractable problem and incorporating mechanism into the relationship between plants and ecosystem function. Eliminating the taxonomic focus on species identity frees us to evaluate the responses of traits to the environment and the effects of traits on ecosystem processes (Lavorel & Garnier 2002; Suding et al. 2008). This shift dramatically reduces the dimensionality and complexity of the problem and vastly increases our ability to generalize across ecosystems and transcend taxonomic and geographic boundaries.

For community ecologists, a trait-based approach is less a radical shift in focus from species to traits and more a sharpened focus on traits to understand and predict the distribution and abundance of species (Shipley 2010; Laughlin et al. 2012). Community ecologists have sought to insert traits into models to explain the distribution of species along environmental gradients (Bernhardt-Römermann et al. 2008; Dray & Legendre 2008; Shipley 2010; Kleyer et al. 2012; Pollock, Morris & Vesk 2012; Jamil et al. 2013), to predict shifts in species distributions in a changing environment (Laughlin et al. 2011; Frenette-Dussault et al. 2013) and to test the theories of environmental filtering and limiting similarity (Kraft, Valencia & Ackerly 2008).

Here, I briefly review the theoretical and expected dimensionality of plant traits, empirically estimate the intrinsic dimensionality of plant traits using three large species-trait data sets and discuss the advantages of incorporating multiple trait dimensions into analyses of community assembly.

Plant strategies and trait dimensions

Plants are multifaceted organisms that have evolved numerous solutions to the problem of establishing, growing and reproducing with limited resources. Since the dawn of ecology, plants have been classified into functional groups (Grime & Pierce 2012). Classifications allow us to generalize functional responses to the environment and to reduce high-dimensional species space. Functional group classification will likely always retain favour and use in many applications (Lavorel et al. 1997; Hooper & Dukes 2004; Craine et al. 2012). However, comparative plant ecologists have increasingly emphasized continuous functional variation among plants and have elucidated multiple dimensions of functional specialization (Grime et al. 1997; Westoby et al. 2002; Diaz et al. 2004; Westoby & Wright 2006).

The vast majority of terrestrial vascular plants exhibit a relatively short list of organs (leaves, stems, roots, seeds, and flowers or cones) and a few key whole-plant properties (e.g. height, life-form) (Fig. 1). Westoby (1998) pragmatically proposed to focus attention on plant organs and whole-plant traits, such as specific leaf area (SLA), maximum height and seed mass to operationalize functional comparisons of plants at a global scale. There is theoretical support for this leaf–height–seed (LHS) plant strategy scheme because these traits influence dispersal, establishment and persistence (Weiher et al. 1999), and there is empirical support because these three traits loaded strongly on independent multitrait axes (Laughlin et al. 2010). The LHS plant strategy scheme has been applied in many ecological contexts (Lavergne, Garnier & Debussche 2003; Golodets, Sternberg & Kigel 2009). Westoby and colleagues later added leaf area (Westoby et al. 2002), and wood density and root traits (Westoby & Wright 2006) as potentially important plant strategy dimensions. Each plant organ yields potentially unique information about how a plant functions within its environment and how plants are sorted along environmental gradients. I will briefly discuss each of these in turn.

Figure 1.

Seven plant organs or whole-plant properties and their functional significance. Known statistical relationships among the circles are illustrated by black arrows, and weaker relationships are shown as grey dashed arrows. The strength of all these relationships among a set of plants determines the intrinsic dimensionality of plant traits.

Leaves are the most conspicuous and best-studied plant organs, and the leaf economics spectrum is the best-known dimension of plant function, describing a trade-off between leaf life span and maximum rate of carbon acquisition (Reich et al. 1999; Wright et al. 2004). Leaf economics traits are remarkably one-dimensional. The majority of traits that we measure on leaves (e.g. specific leaf area, leaf dry matter content, life span, mass-based maximum rate of photosynthesis, dark respiration rates, leaf nitrogen concentration, leaf phosphorus concentration) are strongly correlated among species. A critical implication of this is that these traits are statistically redundant. If leaf economics traits are subjected to a method of data reduction, such as principal components analysis, a single dimension accounts for the vast majority of the variation (Wright et al. 2004). This implies that multiple traits are associated with a single trade-off in function. The remaining variation (i.e. the minor axes) may be the result of genetic drift or may have been shaped by other forces in the adaptive landscape that have not been considered.
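
The statistical redundancy of leaf economics traits can be illustrated with a short Python sketch (using NumPy) that runs a PCA on the correlation matrix of simulated traits. The data are hypothetical. Five traits are generated from a single latent axis plus independent noise, and the loadings are chosen for illustration only, not estimated from Wright et al. (2004).

```python
import numpy as np

def pca_variance_fractions(X):
    """Share of total trait variance carried by each principal component.

    X is an (N species x T traits) array. The PCA is run on the trait
    correlation matrix, i.e. on standardized traits.
    """
    R = np.corrcoef(X, rowvar=False)        # T x T correlation matrix
    eigvals = np.linalg.eigvalsh(R)[::-1]   # eigenvalues, largest first
    return eigvals / eigvals.sum()

# Hypothetical data (not real measurements): one latent leaf-economics axis
# drives five traits, each with a little independent noise. The negative
# loading stands in for a trait such as LDMC that decreases along the axis.
rng = np.random.default_rng(1)
n_species = 200
latent = rng.normal(size=n_species)
loadings = np.array([1.0, -0.9, 0.8, 0.9, 0.85])
traits = latent[:, None] * loadings + 0.3 * rng.normal(size=(n_species, 5))

fractions = pca_variance_fractions(traits)
print(np.round(fractions, 2))  # the first component dominates
```

With this amount of noise, the first component should carry roughly 90% of the variance, which mirrors the one-dimensional structure described above. Increasing the noise term spreads variance onto the minor axes.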

There are other leaf properties, however, that are not strongly correlated with leaf economics traits. For example, minimum water potential has been shown to be largely independent of leaf life span (Ackerly 2004). Leaf surface area (Pierce et al. 2013), hydraulic conductance (Sack et al. 2003) and vein density (Sack & Scoffoni 2013) have also been shown to be independent of leaf economics traits, suggesting that leaf function is multidimensional (Fig. 1).

Plant height at maturity is an important whole-plant trait that influences plant competitiveness for light (Keddy & Shipley 1989) and dominance in a forest canopy, though several trade-offs for achieving a tall canopy exist (Givnish 1995). Maximum height at maturity, the speed at which maximum height is attained, and the length of time a species maintains its maximum height all have costs and benefits (Westoby et al. 2002), which implies that the functional aspects of plant height may be multidimensional (Fig. 1).

Seeds vary in their ability to disperse away from the parent plant, successfully germinate and become established seedlings (Grubb 1977). There is a fundamental trade-off between seed size and total seed output (Westoby et al. 2002), and the large variety of seed sizes and shapes is indicative of the range of regeneration strategies in plants. Recent studies indicate that seed mass reflects a trade-off between stress tolerance and fecundity rather than one between competition and colonization (Muller-Landau 2010; Lönnberg & Eriksson 2013). Seed mass and seed shape influence persistence in the seed bank (Thompson, Band & Hodgson 1993; Moles, Hodson & Webb 2000), and these properties are uncorrelated, suggesting that seed traits are multidimensional (Fig. 1).

Stems provide structural support in the gravity-laden terrestrial environment, transport water, nutrients and sugars, and can be important for defence and storage. Stem density (i.e. specific gravity) is an important property of plant stems that represents a trade-off between the efficiency of hydraulic conductivity and resistance to drought- or freezing-induced cavitation (Hacke et al. 2001; Baas et al. 2004). It also reflects a trade-off between growth rate and survival (Wright et al. 2010). Chave et al. (2009) discuss other aspects of wood density, such as resistance to decay, storage capacity and mechanical strength. Bark thickness is another stem trait, important for defence against fire, pests and pathogens (Paine et al. 2010). Given the multifaceted nature of plant stems, stem traits may be multidimensional (Fig. 1).

Roots are perhaps the most mysterious of plant organs (Ryser 2006), but our understanding of root function is rapidly improving (Eshel & Beeckman 2013). Given the logistical difficulty of measuring root traits in the field on a large pool of species, roots have often been left off core lists of important plant traits (Weiher et al. 1999; Westoby et al. 2002). Root traits such as specific root length or tissue density may represent a trade-off between growth rate and life span and will influence the plant's ability to proliferate fine absorptive roots into nutrient-rich patches. Properties of coarse (large) roots are likely aligned with wood traits (Fortunel, Fine & Baraloto 2012), but evidence is mixed as to whether fine root traits are independent of leaf traits (Craine & Lee 2003; Tjoelker et al. 2005; Laughlin et al. 2010), or whether leaves and roots are functionally coordinated, reflecting a ‘whole-plant economics spectrum’ (Freschet et al. 2010; Pérez-Ramos et al. 2012). Root tissue density and specific root length of fine roots do not appear to be strongly correlated, indicating that root function may be multidimensional (Fig. 1).

Flowering phenology is a key component of plant function. Compared to leaf, stem, root and seed traits, the timing of flowering has not been widely discussed as an important trait for community ecology, despite the fact that onset of flowering was included in the short list of core traits by Weiher et al. (1999). Flowering onset and duration (or, more generally, the timing of pollination) are influenced by environmental conditions and developmental regulation (Mouradov, Cremer & Coupland 2002) and affect plant interactions with pollinating mutualists (Hegland et al. 2009). Flowering phenology appears to be particularly sensitive to global change (Fitter & Fitter 2002).

Life-history traits tend to be categorical whole-plant properties such as the life-form of the species based on perennating bud placement (Raunkiaer 1934), growth form (e.g. herb, graminoid, shrub, tree), the occurrence of vegetative reproduction (e.g. tillering) (Klimesova & Klimes 2007) and the capacity for resprouting (Bellingham & Sparrow 2000). Life history can also include continuous traits, such as life span, which influences population dynamics (Fig. 1). Continuous functional traits can differ among life-forms, so these categories are often the basis of dynamic global vegetation models. Life-form may simply be a combination of continuous traits such as height, stem density and leaf traits, so categorization may be redundant if other continuous plant traits are known.

Given this brief overview of seven groups of traits (Fig. 1) and given the large number of traits that can be measured on plants (Pérez-Harguindeguy et al. 2013), the dimensionality of plant form and function could be a very large number! The most pressing question now is how redundant these traits are. The orthogonality of trait spectra is critical because ecologists should measure traits that yield unique information about plant function (Ackerly 2004; Wright et al. 2007) to maximize our understanding of community assembly and ecosystem processes. What, then, is the intrinsic dimensionality of plant traits?

Count your blessings: the intrinsic dimensionality of large species-trait data sets

The dimensionality of a trait data set is not identical to the dimensionality of plant function: the former is a statistical sample of the latter, which is the ‘population’ of interest. High-dimensional data sets can often be reduced to fewer dimensions without losing much information because they include redundant (i.e. correlated) variables. A variety of analytical methods exist for determining the intrinsic dimensionality D of a set of N objects with T traits (Lee & Verleysen 2007), where 0 < D ≤ T. Data reduction methods, especially principal components analysis (PCA), are frequently used in ecology to explore relationships among traits. The vast majority of studies report details of the eigenanalysis for only two or three dimensions (Grime et al. 1997; Ackerly 2004; Diaz et al. 2004; Laughlin et al. 2010), but some have reported up to four (Craine et al. 2002; Wright et al. 2007).
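
Two eigenvalue-based stopping rules applied in this study, Kaiser's rule and Horn's parallel analysis, can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the code used for Table 1. It assumes PCA on the trait correlation matrix and compares each observed eigenvalue with the 95th percentile of eigenvalues from uncorrelated normal data. Horn's original proposal used the mean of the random eigenvalues, and the percentile is a common later variant. The example data (nine traits built from three independent latent axes) are hypothetical.

```python
import numpy as np

def correlation_eigenvalues(X):
    """Eigenvalues of the trait correlation matrix, largest first."""
    return np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]

def kaiser_rule(X):
    """Kaiser (1960): keep components whose eigenvalue exceeds 1."""
    return int(np.sum(correlation_eigenvalues(X) > 1.0))

def parallel_analysis(X, n_iter=200, quantile=95, seed=0):
    """Parallel analysis in the spirit of Horn (1965).

    Keep leading components whose observed eigenvalue exceeds the chosen
    quantile of eigenvalues obtained from uncorrelated normal data with the
    same number of objects (rows) and traits (columns).
    """
    rng = np.random.default_rng(seed)
    n, t = X.shape
    observed = correlation_eigenvalues(X)
    null_eigs = np.array(
        [correlation_eigenvalues(rng.normal(size=(n, t))) for _ in range(n_iter)]
    )
    threshold = np.percentile(null_eigs, quantile, axis=0)
    k = 0
    while k < t and observed[k] > threshold[k]:   # stop at the first failure
        k += 1
    return k

# Hypothetical example: 9 traits that are noisy measurements of 3 independent axes.
rng = np.random.default_rng(42)
latent = rng.normal(size=(300, 3))
X = np.repeat(latent, 3, axis=1) + 0.5 * rng.normal(size=(300, 9))
print(kaiser_rule(X), parallel_analysis(X))  # both should recover 3 axes
```

On such clean data both rules agree. Kaiser's rule is known to overextract components when many traits are analysed, which is consistent with its comparatively high estimates in Table 1 (e.g. 19 for the Sheffield data set).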

Here, I estimate the intrinsic dimensionality of three large species-trait data sets from three regions: Northern Arizona, USA (Laughlin et al. 2010); California, USA (Ackerly 2004); and Sheffield, UK (Grime et al. 1997). I expanded a previously published data set from Northern Arizona woodlands and forests (Laughlin et al. 2010) to include 16 traits from 201 species. The other two data sets were extracted from the literature and represent some of the most comprehensive lists of traits included in any study to date. The trait data set of California chaparral species included 36 ecophysiological and morphological traits measured on 20 species (Ackerly 2004). The Sheffield trait data set of Grime et al. (1997) included 67 traits on 43 species, including whole-plant, leaf, seed, root and flower traits. See Appendix S1 in the Supporting Information for more details about these data sets.

I used a variety of linear and nonlinear methods (Table 1). While the linear methods are well known to many ecologists, the nonlinear methods are recent, and software implementations are only now becoming widely available. Nonlinear data reduction may be particularly useful if traits are nonlinearly related; these techniques have rarely, if ever, been applied to species-trait matrices. I applied a series of tests to estimate the intrinsic dimensionality of the species-trait data sets: Cattell's scree test (Cattell 1966); the Kaiser rule (Kaiser 1960); Horn's parallel analysis (Horn 1965); optimal coordinates analysis (Ruscio & Roche 2012); two methods based on nearest-neighbour information, namely a non-iterative estimator (Pettis et al. 1979) and a manifold-adaptive estimator (Farahmand, Szepesvári & Audibert 2007); and an unbiased maximum likelihood estimator (Levina & Bickel 2004; MacKay & Ghahramani 2005). I also applied Cattell's scree test to stress values from non-metric multidimensional scaling (NMS) ordinations with relative Euclidean distances, using the ‘metaMDS’ function in the ‘vegan’ library of R (Oksanen et al. 2011). Finally, I applied Cattell's scree test to the residual variance from an isomap analysis, using the ‘Isomap’ function of the ‘RDRToolbox’ library of R (Bartenhagen 2010). Isomap is a nonlinear dimension reduction technique that preserves geodesic distances (i.e. the shortest path through nearest neighbours rather than straight lines) between all sample units (Tenenbaum, De Silva & Langford 2000). Nonlinear methods such as isomap were designed to detect nonlinear structures in the data and unfold them, giving a more accurate mapping of high-dimensional data into lower-dimensional spaces (Lee & Verleysen 2007). These methods generated a range of estimates of the intrinsic dimension for each of the three data sets, but the estimates tended to converge on the median across methods.
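
The nearest-neighbour maximum likelihood estimator rests on a simple idea. In a D-dimensional neighbourhood, the number of neighbours grows roughly as the radius raised to the power D, so the ratios of nearest-neighbour distances carry information about D. A minimal NumPy sketch follows. It assumes Euclidean distances on traits that have already been scaled, and it assumes there are no duplicate trait vectors, because duplicates give zero distances and an infinite logarithm. The sketch computes the Levina & Bickel (2004) local estimate from the k nearest neighbours of each object and pools objects by averaging the inverse local estimates, as MacKay & Ghahramani (2005) suggested. Whether this matches the exact unbiased variant behind Table 1 is an assumption. The default k = 10 is arbitrary, and the example surface is hypothetical.

```python
import numpy as np

def mle_intrinsic_dimension(X, k=10):
    """Nearest-neighbour maximum likelihood estimate of intrinsic dimension.

    Local estimate at each object: m_k(x) = [(1/(k-1)) sum_j log(T_k/T_j)]^-1,
    where T_j is the distance to the j-th nearest neighbour (Levina & Bickel).
    Objects are pooled by averaging 1/m_k(x) and inverting the mean
    (MacKay & Ghahramani). X is (N objects x T traits), already scaled.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    dist = np.sqrt(d2)
    np.fill_diagonal(dist, np.inf)             # an object is not its own neighbour
    T = np.sort(dist, axis=1)[:, :k]           # T_1 <= ... <= T_k for every object
    logs = np.log(T[:, -1:] / T[:, :-1])       # log(T_k / T_j), j = 1, ..., k-1
    inverse_local = logs.sum(axis=1) / (k - 1) # 1 / m_k(x_i)
    return 1.0 / inverse_local.mean()

# Hypothetical data: a curved 2-D surface spread over five "traits" ...
rng = np.random.default_rng(0)
u, v = rng.uniform(size=(2, 500))
surface = np.column_stack([u, v, u ** 2, np.sin(v), u * v])
# ... versus five mutually independent traits.
noise = rng.normal(size=(500, 5))

print(round(mle_intrinsic_dimension(surface), 1),
      round(mle_intrinsic_dimension(noise), 1))
```

The curved surface should yield an estimate close to 2 even though it occupies five traits, whereas the independent traits should give a much higher value. Estimates are sensitive to k and tend to fall below the true dimension when D is large relative to the number of objects.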

Table 1. Estimates of the intrinsic dimensionality of three large trait data sets
| Method | Arizona, USA | California, USA | Sheffield, UK (a) |
|---|---|---|---|
| Cattell's scree test (Cattell 1966) | 3 | 6 | 6 |
| Kaiser's rule (Kaiser 1960) | 4 | 9 | 19 |
| Parallel analysis (Horn 1965) | 4 | 5 | 6 |
| Optimal coordinates (Ruscio & Roche 2012) | 4 | 5 | 6 |
| Non-iterative nearest neighbour (Pettis et al. 1979) | 5 | 5 | 10 |
| Manifold-adaptive nearest neighbour (Farahmand, Szepesvári & Audibert 2007) | 5 | 4 | 7 |
| Unbiased maximum likelihood (Levina & Bickel 2004; MacKay & Ghahramani 2005) | 5 | 5 | 8 |
| NMS (Oksanen et al. 2011) | 4 | 4 | 6 |
| Isomap (Bartenhagen 2010) | 3 | 4 | 6 |
| Median | 4 | 5 | 6 |

(a) Results from the Sheffield data set represent the median among 10 repeated simulations.

Estimates of the intrinsic dimensionality of the Northern Arizona data set ranged between three and five, and the median was four (Table 1). Estimates of the intrinsic dimensionality of the California chaparral data set ranged between four and nine, and the median was five (Table 1). Estimates of the intrinsic dimensionality of the Sheffield data set ranged between 5 and 19, and the median was 6 (Table 1). The isomap method tended to estimate fewer dimensions than the linear methods (Table 1), suggesting that nonlinear trait relationships may cause linear methods to slightly overestimate the true dimensionality. The median estimate of plant trait dimensionality did not exceed six, even in the most comprehensive data set.

In each case, the median number of dimensions estimated here exceeded the number of dimensions that were described and discussed in the original papers (Grime et al. 1997; Ackerly 2004; Laughlin et al. 2010), indicating that trait dimensionality is generally higher than we often admit it to be. Laughlin et al. (2010) emphasized only three dimensions, and Ackerly (2004) emphasized only two dimensions. Grime et al. (1997) discussed three dimensions, but concluded that the two-dimensional CSR triangle could be superimposed on the first and third axes. Grime's parsimonious CSR theory proposed that two independent agents of selection, stress and disturbance, have driven the evolution of a two-dimensional, triangular space of plant strategies (Grime & Pierce 2012). The results presented here indicate that CSR theory may be too simple to account for the multidimensional variation in plant traits. Perhaps this is because of the multidimensional nature of both stresses (e.g. heat, frost, toxicity) and disturbances (e.g. herbivory, fire, storms). The underestimation of the dimensionality of functional traits is also likely exacerbated by the incompleteness of our trait data. Dimensionality may increase if we discover additional important axes of plant function and include harder-to-measure traits.

Higher trait dimensionality enhances our capacity to predict species abundances

Trait dimensionality is a blessing, not a curse. Each independent trait dimension has the potential to be selected by different environmental filters; for example, leaf tissue chemistry may be selected by soil nutrients (Richardson et al. 2005), and wood density may be selected by extreme climatic events (Chave et al. 2009). Additionally, the ability to discriminate among a set of objects tends to improve as the number of variables increases, but it improves faster when those variables are orthogonal. This can be illustrated using simulated data in which two traits measured on five species are either correlated or uncorrelated (Fig. 2). When the traits were correlated, species identity was correctly predicted (i.e. discriminated) in trait space by a semi-parametric discriminant function 64% of the time. When the traits were orthogonal (i.e. when dimensionality increased), species were correctly discriminated 88% of the time (Fig. 2).
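
The discrimination argument can be reproduced in spirit with a small simulation. The Python sketch below is an assumption-laden stand-in, not the analysis behind Fig. 2. It fits one Gaussian per species (quadratic discriminant analysis) rather than the Gaussian finite mixture models of Fraley & Raftery (2003). The species means, the within-species standard deviation (sd = 0.7) and the sample sizes are invented for illustration. In both scenarios the two traits have identical marginal distributions; only their correlation across species differs.

```python
import numpy as np

# Hypothetical species means for five species. Trait 1 is shared by both
# scenarios. In the "correlated" scenario trait 2 repeats trait 1; in the
# "orthogonal" scenario trait 2 is a permutation with zero correlation.
TRAIT1_MEANS = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
CORRELATED_T2 = TRAIT1_MEANS.copy()
ORTHOGONAL_T2 = np.array([3.0, 0.0, 2.0, 4.0, 1.0])

def simulate(trait2_means, n_per_species, sd, rng):
    """Draw individuals around each species mean with isotropic normal noise."""
    means = np.column_stack([TRAIT1_MEANS, trait2_means])
    X = np.vstack([m + sd * rng.normal(size=(n_per_species, 2)) for m in means])
    y = np.repeat(np.arange(len(means)), n_per_species)
    return X, y

def fit_gaussians(X, y):
    """Estimate one Gaussian (mean vector, covariance matrix) per species."""
    return {c: (X[y == c].mean(axis=0), np.cov(X[y == c], rowvar=False))
            for c in np.unique(y)}

def log_density(X, mean, cov):
    """Multivariate normal log-density evaluated at each row of X."""
    diff = X - mean
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + X.shape[1] * np.log(2.0 * np.pi))

def predict(params, X):
    """Assign each individual to the species with the highest density (equal priors)."""
    classes = np.array(sorted(params))
    scores = np.column_stack([log_density(X, *params[c]) for c in classes])
    return classes[np.argmax(scores, axis=1)]

def discrimination_accuracy(trait2_means, n_per_species=100, sd=0.7, seed=0):
    """Fit on one simulated sample and score on an independent one."""
    rng = np.random.default_rng(seed)
    X_train, y_train = simulate(trait2_means, n_per_species, sd, rng)
    X_test, y_test = simulate(trait2_means, n_per_species, sd, rng)
    params = fit_gaussians(X_train, y_train)
    return float(np.mean(predict(params, X_test) == y_test))

print(discrimination_accuracy(CORRELATED_T2), discrimination_accuracy(ORTHOGONAL_T2))
```

The correlated scenario places the five species on a line, so neighbouring species overlap. The orthogonal scenario spreads the same means across the plane and should give a clearly higher accuracy. The exact percentages depend on the invented parameters and will not match the 64% and 88% reported above.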

Figure 2.

The blessing of dimensionality in trait-based ecology. These simulated bivariate trait distributions for five species illustrate that orthogonal trait dimensions allow for more accurate discrimination of species in trait space. Discriminant analysis was based on Gaussian finite mixture modelling (Fraley & Raftery 2003), which is used in the Traitspace model of community assembly for estimating probability density functions for species within high-dimensional trait spaces.

Enhanced discrimination of species in trait space is important in predictive models of community assembly that seek to predict the probability of a species given its distribution of trait values (Laughlin et al. 2012). For example, the trait distributions of species A and B overlap considerably when two redundant traits are measured (Fig. 2a). In this scenario, a trait-based model would consider these species to be more or less functionally redundant. However, it can be seen that species A and B are actually functionally distinct if two independent traits are considered (Fig. 2b). Thus, the ability to discriminate between species A and B is enhanced if the two measured traits reflect independent attributes of plant function. An advantage of high dimensionality in the context of trait-based models of community assembly is that the higher the intrinsic dimension of traits in the analysis, the more easily our models will be able to detect functional differences among species and accurately predict their distribution and abundance.

Recent research consistently shows that the ability to explain and predict community composition increases rapidly with the number of traits included in the model (Fig. 3 and Appendix S1), which substantiates the importance of being able to discriminate species within trait space. Importantly, a combination of different traits was required to obtain good predictions (Fig. 3). The specific combination of traits differed among ecosystems, but traits from multiple organs and whole-plant properties were required to accurately explain and predict community assemblages. In the ponderosa pine forest understorey of Northern Arizona, leaf, root and seed traits were most important (Shipley et al. 2011). In the montane and subalpine forests of Arizona, stem traits, height and flowering phenology were most important (Laughlin et al. 2011). In the fynbos of South Africa, leaf traits, stem traits and flowering phenology were most important (Merow, Latimer & Silander 2011). In the tussock grasslands of New Zealand, leaf traits, root traits, height and life-history traits were most important (Laliberté et al. 2012). In upland rangelands of France, leaf traits, height, seed traits and flowering phenology were most important (Sonnier et al. 2012). In the arid steppe of Morocco, leaf traits, stem traits, flowering phenology and life-history traits were most important (Frenette-Dussault et al. 2013). These results provide strong evidence of the significance of including traits from at least three different organs and whole-plant properties that provide unique information about plant function to maximize our understanding of trait-based community assembly.

Figure 3.

Relationship between the number of traits and the ability to predict and explain variation in community composition (based on the R² of the relationship between observed and predicted relative abundances) using a trait-based model of community assembly in six published studies. Vertical dotted lines indicate where predictive power begins to plateau. Data were obtained from analysis of published data or directly from the authors and are reproduced here with permission.

However, the ability to predict community composition stops increasing substantially after four to eight traits, depending on the ecosystem (Fig. 3), suggesting that including more than eight traits leads to diminishing returns. Ecologists should minimize the number of traits while maximizing the number of dimensions because including multiple correlated traits does not yield dividends. It must be noted that the number of traits needed to explain and predict community assembly is not the same thing as the intrinsic dimensionality of plant traits. Indeed, many of the important traits in some studies were correlated traits (e.g. SLA and LDMC in the arid steppe of Morocco). However, the intriguing general result that has emerged from these studies is that traits from multiple organs and whole-plant properties were needed to explain community assembly. Traits from different plant organs can be correlated (Freschet et al. 2010; Pérez-Ramos et al. 2012), so we cannot expect any core list of plant traits from different organs to be completely independent. However, our understanding of community assembly processes will be maximized when we measure traits that are maximally independent, and our chance of this occurring increases by measuring traits from different plant organs (Fig. 1).

The importance of each organ or whole-plant property can be ranked by counting the number of studies in which it was important and dividing this by the number of studies that included it, across the six studies (Appendix S1). For example, leaf traits were consistently useful because they were important in five of the six (hereafter, ‘5/6’) studies that included them. Based on this ranking, leaf traits, stem traits (3/3), flowering phenology (4/6) and root traits (2/3) were consistently important, being useful in at least two-thirds of the studies that included them. Height (3/6), seed traits (2/5) and life-history traits (2/5) were less consistent, but were still important in at least 40% of the studies that included them. Leaf, height, seed and life-history traits are typically the most commonly measured traits. Given the importance of stem traits, flowering phenology and root traits across the majority of studies assessed here, these should also be included in any core list of plant traits and, where applicable, be incorporated into analyses of community assembly.
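
The ranking in this paragraph is simple arithmetic. The short Python sketch below recomputes it from the counts reported in the text, using exact fractions so that ties (e.g. 4/6 and 2/3) compare as equal. The dictionary and function names are illustrative and are not part of the original analysis.

```python
from fractions import Fraction

# (studies in which the organ was important, studies that included it),
# as reported in the text for the six community assembly studies
ORGAN_COUNTS = {
    "leaf": (5, 6),
    "stem": (3, 3),
    "flowering phenology": (4, 6),
    "root": (2, 3),
    "height": (3, 6),
    "seed": (2, 5),
    "life history": (2, 5),
}

def rank_organs(counts):
    """Return (organ, proportion) pairs sorted from most to least consistent."""
    scores = {organ: Fraction(hits, n) for organ, (hits, n) in counts.items()}
    # sorted() is stable even with reverse=True, so ties keep dictionary order
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

for organ, score in rank_organs(ORGAN_COUNTS):
    print(f"{organ:<20} {float(score):.0%}")
```

The first four entries (stem, leaf, flowering phenology, root) all reach at least two-thirds, and the remaining three fall between 40% and 50%.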

Based on this review, there appears to be a tractable upper limit to the dimensionality of plant traits. To optimize research efficiency for advancing a trait-based ecology, we should measure maximally independent traits from as many different plant organs as is practical. Though much work is still needed to determine the interdependencies among plant organs, our understanding of community assembly will rapidly advance if ecologists seek a whole-organism perspective by measuring leaf traits, stem traits, root traits, flowering phenology, maximum height, seed traits and life-history traits.

Acknowledgements

I thank Chris Lusk, Mike Clearwater, the students and staff in the Plant Biology and Ecology Lab at the University of Waikato, and two anonymous referees for providing helpful comments and suggestions. This research was supported by a Grant (UOW1201) from the Royal Society of New Zealand Marsden Fund.
