A framework to study and predict functional trait syndromes using phylogenetic and environmental data

Traits do not evolve in isolation but often as part of integrated trait syndromes, yet the relative contributions of environmental effects and evolutionary history on traits and their correlations are not easily resolved. In the present study, we develop a methodological framework to elucidate eco‐evolutionary patterns in functional trait syndromes. We do so by separating the amount of variance and covariance related to phylogenetic heritage and environmental variables (environmental phylogenetic conservatism), only phylogenetic heritage (non‐attributed phylogenetic conservatism) and only to environmental variables (evolutionarily labile environmental effects). Variance–covariance structures of trait syndromes are displayed as networks. We then use this framework to guide a newly derived imputation method based on machine learning models that predict trait values for unsampled taxa, considering environmental and phylogenetic information as well as trait covariation. TrEvol is presented as an R package providing a unified set of methodologies to study and predict multivariate trait patterns and improve our capacity to impute trait values. To illustrate its use, we leverage both simulated data and species‐level traits related to hydraulics and the leaf economics spectrum, in relation to an aridity index, demonstrating that most trait correlations can be attributed to environmental phylogenetic conservatism. This conceptual framework can be employed to examine issues ranging from the evolution of trait adaptation at different phylogenetic depths to intraspecific trait variation.


| INTRODUC TI ON
Functional traits are defined as morpho-physio-phenological attributes that impact fitness via their effects on individual performance (Violle et al., 2007).As such, they are likely to undergo adaptive evolution in response to environmental drivers (Ackerly et al., 2000).
The functional significance of any one trait depends on coordination with other traits, creating functional strategies referred to as trait syndromes that contribute to success under different environmental conditions (Reich et al., 2003;Sanchez-Martinez et al., 2020;Wright et al., 2004).A trait syndrome is a combination of functional trait values that occur in a given taxonomic unit (e.g. a species) and that can be shared with others (e.g. a group of species, which may or may not be closely related).Variability in functional syndromes can lead to the conformation along trait modules (also referred to as trait axes or trait spectra), which are sets of traits that covary, potentially in response to environmental conditions, and that may also be associated with evolutionary legacies (Cavender-Bares et al., 2016;Reich et al., 2003).While the relationships among some functional traits (e.g.Mencuccini et al., 2019;Wright et al., 2004), their relationship with environmental variables (e.g.Bhaskar & Ackerly, 2006;Flo et al., 2021) and trait phylogenetic conservatism (e.g.Ackerly, 2009;Flores et al., 2014) have been widely studied, to our knowledge, a unified framework to study trait syndromes from an eco-evolutionary perspective is still lacking.We posit that a phylogenetically explicit framework describing the multivariate structure of traits and their relationship with the environment will improve our capacity to understand the ecological and evolutionary nature of correlations underlying trait syndromes observed in nature.In turn, this framework may be used to predict plant functional trait values for unsampled taxa, which can help, for example, in determining vegetation responses to environmental changes (Anderegg et al., 2021;Choat et al., 2018;Sanchez-Martinez et al., 2023).The proposed predictive framework can be especially useful in the common situation where sparse trait data undermine the ability to make predictions in understudied locations or vegetation types (García-Valdés et al., 2021;Kattge et al., 2020;Trugman et al., 2020).
Phylogenetic conservatism and evolutionary lability appear in the literature as contrasted concepts describing patterns in functional traits that may arise under different evolutionary scenarios (Blomberg et al., 2003;Blomberg & Garland, 2002).Phylogenetic conservatism also referred as phylogenetic inertia or phylogenetic signal can appear under a scenario of lineage-specific stabilising selection, meaning that values of a given trait will be maintained and transmitted to descendants (Crisp & Cook, 2012).This scenario is compatible with a slow evolution of traits and the existence of strong environmental filtering determining the environmental space that species occupy (Losos, 2008).Under this scenario, phylogenetic conservatism in functional traits is expected to be related to environmental variables describing the species ecological niche.This is translated into a pattern of closely related species occupying similar ecological spaces and responding to the environment by means of similar functional strategies.This pattern of phylogenetic conservatism related to the environmental space that defines the ecological niche is referred to as phylogenetic niche conservatism (Crisp et al., 2009;Crisp & Cook, 2012;Losos, 2008).Because the ecological niche of a given taxonomic entity is usually complex and multivariate (Holt, 2009), quantifying its overall effect on functional syndromes is challenging, and here we take the simpler approach of using individual environmental variables to characterise phylogenetic niche conservatism.We refer to this pattern as environmental phylogenetic conservatism for a given trait or trait syndrome in response to a given environmental variable or environmental axis.
Then, to characterise phylogenetic niche conservatism, multiple analyses may be required to define the multivariate niche.
Phylogenetic conservatism in trait variation and trait-trait correlations can also be related to evolutionary constraints that are not associated with measured environmental variables.These potentially non-adaptive processes can lead to a slow and constrained evolution as a result of genetic correlation, linkage disequilibrium, pleiotropy, lack of genetic variability or homogenising gene flow, among others (Ackerly, 2009).This conservatism may also be related to unmeasured environmental variables (e.g.soil effects in a study only considering climate).We refer to this conservatism in functional trait syndromes as the non-attributed phylogenetic conservatism (Figure 1).
Evolutionary lability is a pattern that may appear in a scenario where evolution is less constrained by ancestral values.This scenario is compatible with rapid evolution, which can be adaptive, leading to patterns such as adaptive radiation (Ackerly, 2009;Ackerly et al., 2000).Under this adaptive scenario, we expect traits to be related to environmental variables exerting selective pressures in a non-phylogenetically structured way, with closely related taxa presenting different strategies that allow them to survive under different conditions.We refer to this pattern as the evolutionarily labile environmental effect of a specific environmental variable on trait syndromes.
A better knowledge of functional trait correlations and their environmental and phylogenetic foundations can improve predictive power to infer unmeasured trait values.Previous methodologies allow one to predict missing values by using phylogenetic information (Swenson, 2014) and statistical covariation among traits and environmental variables (Maynard et al., 2022;Poyatos et al., 2018).
However, in some cases, traits and their correlations are highly phylogenetically conserved, such that phylogeny is very informative in predicting missing values, while in other cases, traits present a more (evolutionarily) labile correlation in response to environmental variables.In the latter cases, environmental variables may be better predictors of missing functional trait values.In addition, the organisation of traits in modules (i.e.groups of tightly related traits) indicates that not all traits are equally informative for inferring the values of other traits.We propose an imputation method that implements a data-driven procedure to select which predictors to include in each case, based on trait covariation and relationships with phylogeny and environmental data.
The methodology described here uses phylogenetic mixed models to separate the contribution of environmental phylogenetic conservatism, non-attributed phylogenetic conservatism and evolutionarily labile environmental effects to trait variances and covariances, helping to elucidate their relative importance in shaping patterns in comparative data.Next, this methodology is used to optimise the use of available data in a newly-derived machine learning algorithm that predicts missing values for functional traits, and which we compare with alternative imputation methods.The reliability of this framework is first tested using simulated data, and it is then applied to a real global dataset of hydraulic (Mencuccini et al., 2019;Sanchez-Martinez et al., 2020) and leaf economics spectrum (LES) traits (Wright et al., 2004) for woody angiosperm species.The methodology presented here is implemented in an accompanying R package named TrEvol.A tutorial on how to use the package can be found in Supporting Information and at the following link: https:// github.com/ pablo sanch ezmart/ TrEvol.

| ME THODS
The proposed method consists of two parts.The first part consists of determining the variance-covariance matrix of pairs of traits by elucidating their association with environmental factors and a phylogeny.We represent these results as trait networks to describe the partition of variance-covariance for a given set of traits, differentiating four components of interest: (1) environmental phylogenetic conservatism, (2) non-attributed phylogenetic conservatism, (3) evolutionarily labile environmental effects and (4) residual variance-covariance (not related to the phylogeny, nor to the environmental variable considered).Network metrics can be calculated for each case, which can prove helpful to summarise the multivariate structure of evolutionary patterns in trait syndromes.The second part of the method consists of an imputation algorithm, which uses the previously described variance-covariance structures to make decisions on the use Variance and covariance partitioning into the different components related to non-attributed phylogenetic conservatism, environmental phylogenetic conservatism, evolutionarily labile environmental effects and residual.Note how nonattributed phylogenetic conservatism and environmental phylogenetic conservatism sum to total phylogenetic conservatism and environmental phylogenetic conservatism and evolutionarily labile environmental effect sum to total environmental effect.Examples representing two traits in each case are presented as networks (showing variance results as pie charts in nodes and correlation as edges).At the bottom, examples for different traits showing extreme cases maximising each one of the components are shown as scatterplots, representing two different hypothetical traits in each case and the phylogeny relating different points, showing the phylogenetic group for each terminal taxon value (different shapes for different major lineages) and an environmental variable as the filling colour.Each point is connected to a terminal taxon in the phylogeny by red-dashed lines.We can see how in the non-attributed phylogenetic conservatism scenario, only the phylogeny is related to variance and covariance patterns.In the environmental phylogenetic conservatism scenario, both the phylogeny and the environmental variable are related to the variance and covariance patterns.In the evolutionarily labile environmental effect scenario, only the environmental gradient is related to the variance and covariance patterns.Finally, in the residuals, not the phylogeny nor the environment are related to variances and correlation between traits.
of available data to produce the most accurate predictions (imputations) of missing values.

| Trait variance-covariance partition
We developed a framework to estimate trait variance-covariance related to an environmental variable or axis and to a phylogeny, so that the individual contribution of these elements (i.e.evolutionarily labile environmental effect and non-attributed phylogenetic conservatism, respectively) as well as their combined contribution (environmental phylogenetic conservatism) can be calculated for pairs of traits.
To do so, we used multi-response phylogenetic mixed models implemented in the MCMCglmm R package (Hadfield, 2010a).In these models, the phylogenetic effect is set as a random factor on intercepts.To do so, a relatedness matrix is calculated from the phylogeny and it is then used to calculate the effect of the phylogeny on variances for each one of the two traits of interest and their covariances.The phylogenetic effect is analogous to Pagel's lambda, being equal to 1 when the data reflect a pure Brownian Motion model of evolution (Pagel, 1999).As we are interested in evaluating the phylogenetic conservatism of trait relationships with the environment, the environmental variable of interest is also included as a response variable.Then, when an environmental variable is included with the trait pair, we have a model with three response variables for which the phylogenetic and non-phylogenetic variances and covariances are estimated.We understand the environmental variable at the macroevolutionary level, as a component of the ecological niche that can evolve through time.
Because MCMCglmm uses a Bayesian framework, it needs prior distributions for the fixed and random effects.We used the default prior for the fixed effects, as suggested in previous implementations (https:// www.mpcm-evolu tion.com/ pract ice/ onlin e-pract ical-mater ial-chapt er-11/ chapt er-11-1-simpl e-model -mcmcglmm).These priors follow an inverse-Gamma distribution, which is canonical and is considered a non-informative prior (https:// devil lemer euil.legtux.org/ wp-conte nt/ uploa ds/ 2012/ 12/ tuto_ en.pdf) (Hadfield, 2010a(Hadfield, , 2010b(Hadfield, , 2019)).The random effects priors can be manually modified using the defineModelSpecifications function of the TrEvol package, where users can also modify the number of iterations (default to 1000), the burn-in (default to 100) and thinning (default to 5) (see Supporting Information for a basic tutorial).
For a specified list of traits, the computeVarianceCovariancePartition function selects all pairwise combinations among traits and a single environmental variable (when specified) and builds phylogenetic mixed models including them as response variables (i.e.triresponse models, two traits and one environmental variable).When more than two traits are included, the function iterates and does all possible pairwise combinations among different traits.For each combination of traits, all pairwise complete observations are used to calculate variances and covariances, maximising the use of the data available.For each trait-trait-environment variable combination, the model estimates the amount of phylogenetically dependent (u) and phylogenetically independent (e) variances and covariances.Given two traits (T1 and T2) and a continuous environmental variable (E), the model structure is the following: From this model, the following two variance-covariance matrices are estimated: The elements of these matrices are used to calculate the environmental phylogenetic variance (VAR env,phylo ), the non-attributed phylogenetic variance (VAR phylo ), the labile environmental variance (VAR env ) and the residual variance (VAR res ).Then, these variance components can be aggregated to calculate total phylogenetic variance (VAR total_phylo ), total environmental variance (VAR total_env ) and total variance (VAR total ) for a given trait (e.g.T1), as follows (in bold, aggregation of variance estimates): The same applies to trait 2. Note that all the variance-covariance estimates used to calculate our metrics are derived from the outputs of the multi-response phylogenetic mixed models (specified in matrices u and e).When more than two traits are included, different variances can be estimated, as a model is specified for each pairwise combination of traits.In these cases, the mean of all variances' posterior distributions are reported for each trait.
(1) (T1, T2, E) ∼ u + e. (2) Different variance estimates for the same trait calculated in the current study had a very low variability (lower than the 5% of the mean).However, users can explore this variability by including the traits in the computeVarianceCovariancePartition function in pairs, to generate a single estimate of variance for each trait before including all traits together.We recommend this practice to evaluate variability in variance estimates for a given trait before reporting the main results.
Similarly, elements of the two matrices were used to calculate environmental phylogenetic covariance (COV env,phylo ), non-attributed phylogenetic covariance (COV phylo ), labile environmental covariance (COV env ), and residual covariance (COV res ) between all pairwise combinations of traits.Then, these covariance components can be aggregated to calculate total phylogenetic covariance (COV total_phylo ), total environmental covariance (COV total_env ) and total covariance (COV total ) for a given pair of traits (e.g.T1 and T2), as follows (in bold, aggregation of variance estimates): When no environmental effect is specified, the function calculates total variance and total covariance and phylogenetic variance and phylogenetic covariance, the difference being the total non-phylogenetic variance and covariance.The function allows for representation of variances in absolute or relative terms.When show_relative_variance = TRUE (which is the default), the proportion of explained variance from the total variance is reported.Covariance is reported as correlation in all cases (e.g. using phylogenetic variances and covariances to calculate phylogenetic correlation), calculated as follows: The Inputs needed in the computeVarianceCovariancePartition function are: • traits: character vector containing the names of the trait or traits to be considered.These variables need to be in the dataset.For example c('trait_1', 'trait_2').
• environmental_variable: character vector containing the names of the continuous environmental variable to be considered.These variable need to be in the dataset.For example 'aridity_index'.See supplementary methods for a version of the methodology that allows to include more than one environmental variable at the same time.
• dataset: data frame containing traits, environmental variables and taxon names (e.g.species, genus or other taxon names).
• phylogeny: phylo object including taxa present in the dataset.The function matches internally the dataset with the phylogeny, so it allows missing values in the data frame and some taxa present or absent in the phylogeny.The function will only use complete cases for each pair of traits, so data with incomplete phylogenetic and environmental observations are pruned.
• Terminal_taxa: character identitying which is the column containing the terminal taxa (e.g.'species', which is the default).

| Networks
To represent the multivariate structure, variance-covariance structures can be plotted as networks by the function plotNetworks, which shows traits' variance (i.e.variance related to the environment, phylogeny or their combination) as pie charts and correlation (11) COV T1,T2  env,phylo = |r| mean represent networks with a higher dependence among related traits.At the node level, degree (i.e.number of connections per node) can be also calculated and displayed as node size when the argument degree_as_node_size is set to TRUE, which is the default.
Traits with a high degree value (i.e. higher node size in the visualisation) can be considered hubs.

| Imputation algorithm
The package presented here includes an imputation algorithm performed by the function imputeTraits, which uses the previously described variance-covariance structures to guide the trait imputation process.Inputs needed are: • dataset: a data frame containing traits with missing values and optionally environmental variables for a set of terminal taxa.Note that more than one observation per taxon can be included (i.e. the methodology can include within species variability).
• terminal_taxa: character identifying which is the column containing the terminal taxa (e.g.'species', which is the default).
• phylogeny: a 'phylo' object including all terminal taxa present in the dataset.
• variables_to_impute: character vector indicating the name of the variables to be imputed as they appear in the dataset.
• predictors: character vector indicating the list of environmental factors included in the dataset (without missing values) that may be considered as potential predictors (see below).Phylogenetic coordinates are stored internally so the algorithm does not need to calculate them each time that it is run for a given dataset, which can be time consuming specially for big datasets.
However, it will recalculate them if the argument force_run is set The algorithm uses random forest models to predict missing values by means of the randomForest function of the randomForest R package (Liaw & Wiener, 2002).These models calculate out of bag errors (OOB), which are returned also as a measure of imputation error reported as the normalised root mean square error (NRMSE).
Moreover, the developed algorithm calculates cross validation R 2 values by randomly creating NAs in the traits to be imputed, predicting values, and then comparing predictions with observed values.
The proportion of NAs that are randomly created to calculate cross validation R 2 is controlled by the proportion_NAs argument of the function (between 0 and 1, set to zero by default).When propor-tion_NAs = 0, cross validation R 2 is not calculated.Parallel processing is enabled in the function when the number_clusters is set to an integer number higher than one, which will determine the number of clusters used.By default, the number of clusters used is set to two.
The parallelisation method is based on previous implementations of random forests in R programming (Stekhoven & Bühlmann, 2012).

| Application
To test the methodology, we first applied it to 10 simulated traits, each one with 100 observations (simulated species) with a known variance-covariance structure (Table S1).We also simulated a pure-birth stochastic phylogeny using the pbtree function of the phytools package (Revell, 2012) for the 100 simulated species.Five traits were simulated to present a phylogenetic component in their variance-covariance structure by means of simulate_bm_model function of the R package castor (Louca & Doebeli, 2018), including the simulated phylogeny and the variance-covariance matrix (Table S1) as the diffusion matrix.The remaining five traits were simulated using the same variance-covariance matrix (Table S1) to present a variance-covariance structure not-explicitly related to the phylogeny by using the function rnorm_multi of the R package faux (DeBruine, 2021).S2).Note that 'Env' variables were considered as environmental variables but the procedure to obtain them was not different from the one implemented for the rest of the simulated variables.The expectations for results were summarised a priori (Table S2).Variable simulations were performed by using the simulateDataSet function of the R package presented here, which by default produces the simulated data used in this study.
Next, we applied the methodology to real data on functional traits for angiosperm woody-plant species related to hydraulic function and the leaf economics spectrum (Wright et al., 2004).We used data covering seven leaf traits coming from a previously published dataset (Mencuccini et al., 2019).Functional traits included were specific leaf area (cm 2 g −1 , SLA, number of species = 1183), leaf nitrogen content (mg g −1 , number of species = 930), maximum photosynthetic rate per area as a measure of photosynthetic capacity Partitioned trait networks for simulated data and for functional trait data were obtained.Then, the imputation method was implemented both for simulated and for functional traits, considering environmental effects described in each case as potential predictors as well as the phylogeny.In this empirical data case, a species level phylogeny was built using the V.PhyloMaker R package (Qian & Jin, 2016).The remaining arguments of the function imputeTraits were maintained as default.
To compare results obtained by imputeTraits with existing imputation methodologies, we performed a cross validation procedure by using 80%-20%, 50%-50% and 30%-70% of the data to test and train, respectively, for each one of the methodologies, including our function imputeTraits.These methodologies include mean imputation (i.e.mean for each one of the traits to be taken as imputed values); MICE imputation (van Buuren & Groothuis-Oudshoorn, 2011), which uses the covariation of multiple traits to be imputed as well as other complete environmental variables to predict missing values; and a phylogenetic imputation by means of the phylopars R package, which uses trait phylogenetic variance-covariance to predict missing values (Goolsby et al., 2017).Thus, predictive performance results reported are not the ones reported internally by the impu-teTraits function but calculated outside the algorithm in order to use the same cross validation data for all the imputation methods compared.The cross-validation process was iterated 10 times in each case.We acknowledge that a cross-validation using data with an underlying phylogenetic structure may underestimate prediction error and overestimate cross-validation R 2 (Roberts et al., 2017).However, error and R 2 in this implementation are used to compare the predictive capacity of different methodologies, so absolute values are of less interest and we expect the phylogenetic structure to have similar effects on the R 2 and errors for all the methodologies.

| Variance-covariance partition results for simulated data
Results obtained by applying the described methodology for a set of simulated traits matched those expected given the variance-covariance matrix used to simulate data, detecting the four hypothesised modules (Table S1; Figure 2a).The methodology correctly detected the amount of variance and covariance related to the phylogenetic structure of the data, as is the case for the covariance between Phylo Trait 1 and Phylo Trait 2, between Phylo Trait 3 and Phylo Trait 4 and their variances (edges and pie charts, respectively, in Figures 3 and   4).Moreover, it successfully detected the independence between these two modules of phylogenetically correlated traits.The methodology also successfully separated the variance and covariance that is uniquely attributed to shared evolutionary heritage (nonattributed phylogenetic conservatism) from environmental phylogenetic conservatism associated with Phylo Env (Figures 4b,c and 5b,c).
Finally, the methodology successfully detected the evolutionarily labile environmental effect (i.e.phylogenetically independent effect of the environmental variable) of Non Phylo Env on variances of Non Phylo Trait 1 and Non Phylo Trait 2 traits as well as on their covariance (Figure 4d), while the covariance not related to the phylogenetic structure nor to the given environmental variables were placed within the unexplained or residual component (Figures 3 and   4e).Networks presenting a higher contribution to the covariation pattern had higher edge density (ED); those presenting a strong relationship between connected nodes showed a high maximum absolute correlation (|r| max ) and mean absolute correlation (|r| mean ) and the number of modules related to each one of the components was also successfully detected by the methodology (NM) (Figures 3 and 4).

| Variance-covariance partition results for functional trait data
When applied to the dataset of traits of woody angiosperms species, we found that leaf economics traits and leaf hydraulic traits conformed to two main modules (Figure 5a), that trait variances and covariance were mainly phylogenetically conserved and that this conservatism was partially associated with aridity (Figures 2b   and 5).Contrarily, most of the phylogenetic conservatism in leaf economic traits and their covariance (specific leaf area, nitrogen content and leaf lifespan) cannot be attributed to aridity (Figure 5b).The second module relates hydraulic sufficiency with photosynthetic capacity (higher maximum hydraulic conductivity related to higher photosynthetic capacity, both expressed on a per unit leaf area).
This module showed phylogenetic conservatism, which cannot be attributed to aridity (Figure 5b).
The two modules described were shown to be integrated in response to aridity (Figure 5c).More concretely, specific leaf area was positively related to hydraulic sufficiency and negatively related to photosynthetic capacity, and drought exposure was negatively related to hydraulic sufficiency.Then, the integration between these two modules is mediated by specific leaf area and xylem drought exposure, which show a high degree (number of connections presented by a node) and act as hubs of plant functional trait syndromes in woody angiosperm species.These correlations are related to modules in response to aridity (Figure 5c).The higher ED in the phylogenetic part of trait relationships indicates that most of the trait covariances are phylogenetically conserved.The higher |r| max and |r| mean in these networks indicates that phylogenetically conserved relationships are stronger (Figure 5b,c).

| Imputation results
The predictive performance of the presented imputation algorithm improved when informative imputed traits were considered as shown by the higher cross-validation R 2 and lower cross-validation Leaf trait networks considering aridity as the environmental variable of interest.Trait networks using angiosperm woody plant traits data considering aridity (aridity index) as the environmental variable of interest.Pie charts represent the proportion of the variance for each variable related to each one of the components (non-attributed phylogenetic conservatism, environmental phylogenetic conservatism, evolutionarily labile environmental effect, and residual) and edges represent the covariance related to each one of these components shown as correlations (green-solid lines represent positive correlations and red-dashed line represent negative correlations with a width proportional to the correlation coefficient).Node size is proportional to node degree (number of connections).Network metrics are also shown, in each case.NRMSE of the second and third imputation rounds compared to results obtained using only phylogenetic and environmental data (first round) (Figure S3), both using simulated and leaf functional trait data.The imputeTraits algorithm outperformed mean imputation and the MICE imputation when using all environmental and traits covariation and performed similarly to phylopars imputation using all environmental and traits phylogenetic covariation when traits show phylogenetic structure.The new algorithm outperformed phylopars approach when the covariance among traits was not phylogenetically structured (Figure 6).Thus, considering all simulation scenarios, the new algorithm proposed here was the most effective, both in simulated and leaf functional data.

| A new framework to study functional syndromes evolution
In this study, we developed a new methodological framework to study eco-evolutionary patterns in multiple functional traits.This methodology allows separation of variances and covariances related to phylogenetic heritage from those that are phylogenetically independent, complementing previous methods such as phylogenetically independent contrasts and phylogenetic least squares, which incorporate the phylogenetic structure when analysing trait relationships, identifying correlations that are not associated with a low number of deeply conserved divergences (Blomberg, 2016;Felsenstein, 1988).By using this methodology, evolutionary patterns can be elucidated not only from phylogenetically independent contrast results as done in previous studies (e.g.Ackerly & Reich, 1999;Maherali et al., 2004;Zheng et al., 2019), but also from the phylogenetically conserved component of evolutionary relationships (Sanchez-Martinez et al., 2020).Furthermore, our methodology makes it possible to test whether phylogenetically conserved and phylogenetically independent structures are related to a potential driving variable (here, environmental variable).
Therefore, this method quantifies which part of trait variance and covariance is related only to the phylogeny given one or more variable(s) (non-attributed phylogenetic conservatism), only to an environmental variable (evolutionarily labile environmental effect) and to the phylogenetically conserved environmental effect (environmental phylogenetic conservatism).The latter represents (co) variation that is related to both an environmental variable and a phylogeny.
These three components show patterns reflecting different evolutionary processes which could not be distinguished using previously published frameworks, which were not able to consider multiple traits, their phylogenetic structure and their relationship with the environment all at once.Our approach also partitions networks and calculates network metrics such as edge density (ED), number of modules (NM), maximum absolute correlation (|r| max ) and mean absolute correlation (|r| mean ) for each component, showing how the multivariate trait structure is related to the phylogeny and to the environment.

| Leaf economic and hydraulic conform to two phylogenetically conserved modules integrated in response to aridity
Implementing the new methodological approach we demonstrate that the leaf economics spectrum (Wright et al., 2004) and plant hydraulics are evolutionarily integrated, conforming to two evolutionary modules across species at a global scale.The first module describes the coordination between leaf acquisitiveness (Wright et al., 2004) and the coordination between xylem drought exposure and tolerance (Sanchez-Martinez et al., 2020).The second module describes the coordination between leaf hydraulic sufficiency and photosynthetic capacity (Scoffoni et al., 2016).Our results show how these two modules are partially integrated following a pattern of environmental phylogenetic conservatism related to aridity.This integration is mainly mediated by specific leaf area and drought exposure, which act as functional hubs relating the two modules previously presented in response to aridity.However, some phylogenetic conservatism in individual trait variability, the correlation among leaf economic traits and the correlation between hydraulic sufficiency and photosynthetic capacity were not attributed to a response to macro-evolutionary patterns of aridity.These functional traits and their integration may be related to other environmental components of species niches (e.g.soil properties, successional status) or may be constrained at the phenotypic or genetic level.
From these results, we can better understand the generalities of woody angiosperm adaptations to aridity.Some angiosperm lineages exposed to low soil water availability and high evaporative demand will tend to present a more negative turgor loss point leading to a higher capability to maintain stomata open and carbon uptake under low water potentials (i.e.anisohydric strategy), presenting a higher drought exposure in the xylem as a consequence (Volaire, 2018).
These lineages will tend to present an embolism resistant xylem that will allow maintenance of water transport under low water potentials (Sanchez-Martinez et al., 2020).Drought resistant lineages will tend to have smaller and/or thicker leaves as an adaptation to tolerate water stress (Wright et al., 2017), with lower nitrogen content and higher leaf lifespan, conforming to a conservative strategy of resource acquisition and processing (Reich, 2014;Wright et al., 2004).
Importantly, lineages with a lower specific leaf area will also tend to present a relatively higher photosynthetic capacity (per unit area) in response to aridity levels, suggesting that even if there is a phylogenetically conserved coordination between hydraulic conductivity and carbon uptake, this coordination is not associated with the macro-evolutionary response to aridity.
These results mean in turn that under more arid conditions, lineages with a lower specific leaf area may tend to present a higher photosynthetic capacity per unit leaf area independently of their hydraulic sufficiency, probably as an adaptation to maintain carbon uptake under low water availability.Then, a drought resistant functional strategy may compensate for the lower hydraulic sufficiency and the lower leaf area by increasing its photosynthetic capacity per unit area.This result contradicts lineage-specific results reporting that hydraulic sufficiency is evolutionarily coordinated with photosynthetic capacity in response to aridity (Scoffoni et al., 2016) but supports other evidence showing that photosynthetic capacity may increase under dry conditions (Green et al., 2020;Ramírez-Valiente & Cavender-Bares, 2017).It should be noted, however, that these traits presented the lowest data availability from all the combination of traits used (118 species with both traits), which may limit the ability to extract global The reported results show that leaf economic traits and hydraulic traits are not evolving independently, leading to a pattern of phylogenetic conservatism where closely related species tend to present similar functional strategies.These relationships are partially related to species niches characterisation, specifically to aridity, pointing to a pattern of phylogenetic niche conservatism (Losos, 2008).Based on these results, we expect adaptation in leaf and hydraulic trait syndromes of woody angiosperms to happen in a slow and conserved manner, broadly maintaining their relationships and multivariate structure, with closely related species tending to occupy similar ecological and functional spaces (i.e.having similar trait syndromes).It seems likely that environmental filtering related to variables such as water availability and atmospheric water demand constrains the diversity of trait syndromes that can be present under a given set of conditions and hardwired relationships among traits constrain individual trait variation at the macro-evolutionary scale.Elucidating the directionality of these relationships in future works will help us to better understand which traits are responding to selective pressures directly and which are indirectly related to them as a result of trait integration.

| Using information on trait evolution to perform data imputation
Our statistical framework also enables new approaches to imputation and achieved high predictive performance, especially when trait covariances are considered.The proposed imputation methodology optimises the usage of available data by including phylogenetic, environmental, and additional functional trait data as predictors when they present a significant correlation with the variable to be imputed.This method provides a framework for users that are not familiar with phylogenetic methods, internally testing for the importance of the phylogenetic structure, and including it when informative.Moreover, it allows the consideration of multiple environmental variables, which will only be included when informative.Finally, it allows for the consideration of functional trait data with missing values as predictors by performing imputation and then including them in predictive models when informative.
This methodology outperformed mean and MICE imputations, while performing comparably to phylopars in most cases, and outperforming it for non-phylogenetically conserved traits and observed plant functional traits.However, it slightly underperformed phylopars for phylogenetically conserved traits.This may be associated with the fact that we include phylogenetic data by means of a relatively low number of principal components describing most of the phylogenetic structure, which may lead to relatively lower predictive performance in highly phylogenetically structured traits compared to methods that include the whole phylogeny, such as phylopars.Overall, we show how the presented methodology is a good option, especially when using real data, which has a more complex underlying structure than the data simulated in this study.
The fact that the methodology presented here uses widely available information, such as phylogenetic and environmental data, makes it broadly applicable, particularly given the generalised scarcity of functional data at global scales.Moreover, it facilitates the use of phylogenetic information for ecologists without training in phylogenetics.

| Future directions
The new methodology proposed here estimates the phylogenetic effect on variances and covariances based on an underlying Brownian Motion model of evolution.Then, this methodology does not allow, for instance, to include different adaptive optima leading patterns in the data.In future versions of this method, other evolutionary models may be included, such as the Orstein-Ulhenbeck model of evolution to allow for one or more adaptive optima (Butler & King, 2004).
The proposed methodology also allows further elucidation of patterns at the intraspecific level as well as to use this information to predict missing values.In these cases, the phylogeny is provided at the species level but variability in trait and environmental data can be considered by including more than one observation per species.
The application of the proposed framework including within-species variability in future work will allow better quantification of the extent to which variability within species is phylogenetically structured and related to specific environmental variables.

CO N FLI C T O F I NTE R E S T S TATE M E NT
Authors have no conflict of interests to declare.
function computeVarianceCovariancePartition returns a list containing different elements.The first element includes two results tables, one displaying variance results and the other the covariance results.The second element is the individual model outputs (MCMCglmm models from which the estimates to calculate variances and covariance are derived).The third element is the table of posterior distributions for each variance and covariance component, from which variances and covariances are calculated.These posterior distributions can be used to diagnose models by testing convergence, effective sample sizes and autocorrelation.Tests for the convergence, effective sample sizes and autocorrelation are implemented in TrEvol package by means of the diagnose-Models function.The result tables showing variance and covariance (given as correlation) partitions show the mean estimate for each variance and covariance estimate jointly with the credible intervals and a p-value representing statistical significance.The interpretation of credible intervals is that there is a 95% probability that the true estimate lies within the interval, given the data.Then, there is a statistically significant effect when the 95% credible interval does not contain zero.This result table also reports p-values as an alternative to evaluate statistical significance of estimates.p-values are calculated from the probability of direction, p-direction, a metric that has shown high correspondence to the frequentist p-value (two sided p-value = 2 × (1 − p-direction))(Makowski, Ben-Shachar, Chen, et al., 2019).Probability of direction is calculated using the bayes-testR package(Makowski, Ben-Shachar, & Lüdecke, 2019)  for the distributions obtained for each variance and covariance component reported.

=
COV T1,T2 phylo + COVT1,T2  env,phylo , (16) COVT1,T2  total_env = COV T1,T2 env,phylo + COV T1,T2 env ,(17) COV T1,T2 total = COV T1,T2 u + COV T1,T2 e = COV T1,T2 total_env + COV T1,T2 phylo + COV T1,T2 res .each one of the pairs of traits included.A given network represents trait variances and correlations related to environmental phylogenetic conservatism, non-attributed phylogenetic conservatism, evolutionarily labile environmental effect and residuals.Total correlation can also be plotted.The same function also calculates and displays network metrics when the plot_metrics argument is set to TRUE, which is the default.These measures are the number of modules (NM) showing how many independent groups of correlated traits exist; edge density (ED) describing the proportion of actual connections among nodes out of all possible connections; maximum absolute correlation coefficient (|r| max ) and mean absolute correlation coefficient (|r| mean ).In this framework, NM represents the number of trait modules, high ED represents high coordination among all traits and high |r| max and Phylogenetic information is included to predict by calculating phylogenetic coordinates for each terminal taxon.To calculate phylogenetic coordinates, the algorithm first calculates the phylogenetic distance matrix by means of the cophenetic.phylofunction and then calculates principal coordinates by means of the pcoa function, both from the ape package(Paradis & Schliep, 2019).
to 'TRUE'.Users can also control the number of phylogenetic coordinates to be considered as predictors by using the number_ of_phylo_axis argument (default set to 10).Note that all terminal taxa present in the input dataset need to be also present in the phylogeny.The algorithm included in the imputeTraits function uses trait relationships as well as the phylogenetic structure and, optionally, environmental factors to predict missing values by performing three chained imputation rounds.In the first round, missing values for individual traits are predicted by using complete data on environmental variables and phylogeny as predictors, when they are informative (i.e. when they present statistically significant relationships with trait variance).In the second round, each missing values of individual traits is sequentially predicted using environmental variables, phylogeny and other traits (including values imputed in the first round) as predictors, when they are informative.In the third round, missing values of individual traits are predicted in turn using environmental variables, phylogeny and other traits (imputed in the second round) as predictors when they are informative (FigureS1).Preliminary analyses showed that imputation performance improved in some cases when a third round of imputation was performed.To elucidate which variables are informative and thus included as predictors in each case, the computeVarianceCovariancePartition function is computed internally to calculate phylogenetic and environmental effects for each trait as well as the covariances among the traits to be imputed.Users can also include the results of the computeVari-anceCovariancePartition manually using the variance_results and the correlation_results arguments of the function imputeTraits (see package tutorial).In each imputation round, imputation of missing values for traits is performed for each one of the traits of interest.Each imputation model is iterated several times (controlled by the number_iterations argument, set to 10 by default).From each imputation round, an imputed matrix is obtained by calculating the mean for all iterations for each predicted value.Standard deviations of predicted values and individual model results are also returned.

(
micromol CO 2 m −2 s −1 , A area , number of species = 206), leaf lifespan (months, LL, Number of species = 198), midday leaf xylem tension as a measure of xylem drought exposure (MPa, |P min |, number of species = 636), tension at the turgor loss point as a measure of drought tolerance (MPa, |P tlp |, number of species = 556) and leafspecific maximum hydraulic conductivity as a measure of hydraulicsufficiency (Kg m −1 MPa −1 s −1 , K l , number of species = 878).We considered an aridity index (AI) as an environmental variable related to species niche.The aridity index was obtained from CGIAR(Trabucco & Zomer, 2018) in a previously published analysis(Sanchez-Martinez et al., 2020).Low values of aridity index represent regions with low water availability relative to the atmospheric water demand.All variables were transformed by calculating the natural logarithm of absolute values, as it improved their normality.
The first module represents the correlation between leaf resource uptake and use strategies (i.e.leaf economics) with xylem drought exposure and drought tolerance (quantified as the absolute value of the minimum water potential measured in the xylem, |P min | and the absolute value of the water potential at stomatal turgor loss point, |P tlp |, respectively).This module represents variation in an axis describing acquisitive leaves with low drought exposure-tolerance to leaves with conservative resource uptake and use with high drought exposure and tolerance (specific leaf area is positively related to nitrogen content, negatively related to leaf lifespan and negatively related to drought exposure and tolerance).This module presents phylogenetic conservatism and part of this conservatism is associated with aridity, especially in those correlations involving drought exposure and tolerance (Figure 5c).
Total correlation among variables.(a) Total correlation among simulated variables and (b) total correlation among plant functional traits.Pink nodes represent environmental variables and blue nodes represent traits.aridity in a phylogenetically structured manner, describing a pattern of environmental phylogenetic conservatism associated with aridity in the integration between plant hydraulics and leaf economics.Trait networks supported the predominant pattern of phylogenetic conservatism in functional traits of woody angiosperm species.Networks representing non-attributed phylogenetic conservatism F I G U R E 3 Simulated data networks.Simulated data networks considering a phylogenetically conserved environmental variable (Phylo Env).Pie charts represent the variance and edges represent the covariance related to each one the components (non-attributed phylogenetic conservatism, environmental phylogenetic conservatism, evolutionarily labile environmental effect and residual).Green-solid edges represent positive correlations and red-dashed ones negative correlations with a width proportional to the correlation coefficient.Node size is proportional to node degree (number of connections).Network metrics are also shown in each case.(a) Total correlation; (b) non-attributed phylogenetic conservatism in variances and correlations given Phylo Env; (c) environmental phylogenetic conservatism in variances and correlations associated with Phylo Env, (d) labile environmental effect of Phylo Env in variances and correlations and (e) residual variance and correlation given the phylogeny and Phylo Env.
and environmental phylogenetic conservatism associated with aridity showed higher values of edge density (ED), maximum absolute correlation (|r| max ) and mean absolute correlation (|r| mean ) compared to the evolutionarily labile environmental effect of aridity.The number of modules proved to be useful to detect the number of independent groups of correlated traits, showing the integration of the two F I G U R E 4 Simulated data networks.Simulated data networks considering a phylogenetically independent environmental variable (Non Phylo Env).Pie charts represent the variance and edges represent the covariance related to each one the components (non-attributed phylogenetic conservatism, environmental phylogenetic conservatism, evolutionarily labile environmental effect and residual).Green-solid edges represent positive correlations and red-dashed ones negative correlations with a width proportional to the correlation coefficient.Node size is proportional to node degree (number of connections).Network metrics are also shown in each case.(a) Total correlation; (b) non-attributed phylogenetic conservatism in variances and correlations given Non Phylo Env; (c) environmental phylogenetic conservatism in variances and correlations associated with Non Phylo Env, (d) labile environmental effect of Non Phylo Env in variances and correlations and (e) residual variance and correlation given the phylogeny and Non Phylo Env.
(a) Total correlation; (b) non-attributed phylogenetic conservatism in variances and correlations given aridity; (c) environmental phylogenetic conservatism in variances and correlations associated with aridity, (d) labile environmental effect of aridity in variances and correlations and (e) residual variance and correlation given the phylogeny and aridity.

F I G U R E 6
Predictive performance.Predictive performance of the imputation framework.Results for mean imputation (red), MICE imputation (blue), phylopars imputation (green) and imputeTraits (TrEvol package) imputation (purple) are shown.(a) and (b) show normalised root mean square error (NRMSE) for simulated and leaf traits, respectively.(c) and (d) show R 2 for simulated and leaf traits, respectively.NRMSE and R 2 were calculated from a cross validation procedure using 80% of the data to train models and 20% to test., future work on the coordination of photosynthetic capacity with other functional traits will be useful to further elucidate its adaptive role.
Martinez, Maurizio Mencuccini, Jordi Martínez-Vilalta, David D. Ackerly and Todd E. Dawson designed the study; Pablo Sanchez-Martinez created the R Package, analysed the data and wrote the first draft of the manuscript.All authors contributed substantially to revisions.ACK N OWLED G EM ENTS This research was supported by competitive grants CGL2017-89149-C2-1-R and PID2021-127452NB-I00 funded by MCIN/ AEI/10.13039/501100011033.Pablo Sanchez-Martinez acknowledges an FPU predoctoral fellowship from the Spanish Ministry of Science, Innovation and Universities (grant FPU18/04945).Jordi Martínez-Vilalta benefited from an ICREA Academia award.