Functional diversity metrics can perform well with highly incomplete data sets

Characterising changes in functional diversity at large spatial scales provides insight into the impact of human activity on ecosystem structure and function. However, the approach is often based on trait data sets that are incomplete and unrepresentative, with uncertain impacts on functional diversity estimates. To address this knowledge gap, we simulated random and biased removal of data from three empirical trait data sets: an avian data set (9579 species), a plant data set (2185 species) and a crocodilian data set (25 species). For these data sets, we assessed whether functional diversity metrics were robust to data incompleteness with and without using imputation to fill data gaps. We compared two metrics each calculated with two methods: functional richness (calculated with convex hulls and trait probabilities densities) and functional divergence (calculated with distance‐based Rao and trait probability densities). Without imputation, estimates of functional diversity (richness and divergence) for birds and plants were robust when 20%–70% of species had missing data for four out of 11 and two out of six continuous traits, respectively, depending on the severity of bias and method used. However, when missing traits were imputed, functional diversity metrics consistently remained representative of the true value when 70% of bird species were missing data for four out of 11 traits and when 50% of plant species were missing data for two out of six traits. Trait probability densities and distance‐based Rao were particularly robust to missingness and bias when combined with imputation. Convex hull‐based estimations of functional richness were less reliable. When applied to a smaller data set (crocodilians, 25 species), all functional diversity metrics were much more sensitive to missing data. Expanding global morphometric data sets to represent more taxa and traits, and to quantify intraspecific variation, remains a priority. In the meantime, our results show that widely used methods can successfully quantify large‐scale functional diversity even when data are missing for half of species, provided that missing traits are estimated using imputation. We recommend the use of trait probability densities or distance‐based Rao when working with large incomplete data sets and filling data gaps with imputation.


| INTRODUC TI ON
Trait-based approaches have emerged as an effective tool for addressing pressing questions in ecology and conservation (Petchey & Gaston, 2006) and have been used to monitor ecosystem service delivery (Hevia et al., 2017), understand threats to ecosystem processes (Donoso et al., 2020) and forecast changes in ecosystem functioning (Şekercioǧlu et al., 2004).In some cases, assessments of functional diversity provide greater explanatory power than complementary measures of diversity such as species richness (Cadotte et al., 2011;Wilkes et al., 2020), in part due to the mechanistic link between traits and ecosystem functions (Petchey & Gaston, 2006).
Research into how functional diversity has varied across time and space could yield fruitful insights into ecosystem responses to global change (Green et al., 2022;Petchey & Gaston, 2006).Given the potential and versatility of trait-based approaches, ensuring functional diversity analyses are appropriately applied is of great importance (Blonder et al., 2018).
Traditionally, assessments of functional diversity have been carried out on local community scales (Díaz & Cabido, 2001); only recently has the accumulation of large trait data sets enabled application on continental to global scales (Etard et al., 2020;Migliavacca et al., 2021).Large-scale trait analyses provide unique opportunities that complement those of local analyses.Integration of functional diversity across scales (Carmona et al., 2016), inclusion of species overlooked by local studies (such as migratory species; Somveille et al., 2013) and identification of synergies and differences in the way humans impact natural ecosystems (Toussaint et al., 2021) provide a global perspective on biodiversity loss, needed for international policy development (Santini et al., 2021;Scholes & Biggs, 2005).
Large-scale analyses come with their own set of challenges.
Studying functional diversity at continental to global scales across broad taxonomic groups requires relying on data that are rarely complete (Toussaint et al., 2021), introducing uncertainty into inferences obtained (Richter et al., 2021).For example, while 80% of plant species were represented in the largest plant trait data set (the TRY database) at the time of writing (Kattge et al., 2020), only 17% of species had data for 10 or more traits (Tobias et al., 2022), less than 1% had complete data across the six fundamental traits that were used to characterise the global spectrum of plant form and function (Díaz et al., 2016), and this percentage dropped to 0.1% when fine-root traits (characteristics of roots less than 2 mm of diameter, such as root nitrogen concentration and tissue density, that affect acquisition of soil resources) were also considered (Carmona, Bueno, et al., 2021;Carmona, Tamme, et al., 2021).Etard et al. (2020) found that mean trait completeness across seven ecological traits was only 47% for amphibians and 46% for reptiles.Trait data availability is further restricted in invertebrates, fungi, algae, protozoa and prokaryotes (Gallagher et al., 2019).In some groups, the data availability is better.Etard et al. (2020) found that a mean of 89% of mammals and 84% of birds were represented (across seven ecological traits).
A recent release of a new data set, AVONET, sees 95%-99% of bird species represented (with some variation by trait and taxonomy) for 11 morphological traits (Tobias et al., 2022) with ecological and geographical data also available for most species.However, even when a small proportion of species are missing data, the available information may not be representative of the true diversity (González-Suárez et al., 2012).For most taxonomic groups, data are not missing at random and are biased according to morphological and ecological traits as well as geographical factors such as range size (González-Suárez et al., 2012;Sandel et al., 2015;Tyler et al., 2012).This may affect the accuracy of functional diversity metrics calculated (Sandel et al., 2015), potentially leading to spurious or misleading results (Pincheira- Donoso & Hodgson, 2018) and impeding the application of trait-based methods (Etard et al., 2020).
Methods to estimate functional diversity can now be applied to large-scale trait databases (Blonder et al., 2018;Carmona et al., 2016), but there is uncertainty over how these methods respond to missing data, particularly when data are not missing at random (Petchey & Gaston, 2006).This lack of understanding risks making unfounded assumptions that data are representative (Tyler et al., 2012) and impedes correct application of functional diversity methods (Pakeman, 2014).Where continuous trait data are available, functional diversity is most commonly assessed using three methods: distance-based methods, convex hulls and probabilistic hypervolumes (Mammola et al., 2021).Using these methods, distinct aspects of functional diversity can be described including richness (the total volume of trait space occupied by an assemblage) and divergence (the variance in traits of biological units in an assemblage; Mason et al., 2005).A few studies have quantified sensitivity to missing data at local scales using distance-based methods (Pakeman, 2014;Richter et al., 2021).However, to our knowledge, no one has quantitatively compared methods spanning conceptual frameworks (Mammola et al., 2021).This is urgently needed because densities or distance-based Rao when working with large incomplete data sets and filling data gaps with imputation.

K E Y W O R D S
bias, functional diversity, incomplete, missing, trait data local-scale sensitivity and bias may differ substantially from those that occur at large scales and sensitivity may differ when using multidimensional methods.
The best approach for dealing with missing data is debated (Taugourdeau et al., 2014).One approach involves removing all species with missing data, known as complete case analysis (Nakagawa & Freckleton, 2008).Alternatively, missing data can be imputed.
While, in some instances, complete case analysis can outperform imputation (Johnson et al., 2021), research has also documented the danger of introducing bias into the results, as data are often not missing at random (Nakagawa & Freckleton, 2008).The performance of different imputation methods has been evaluated in several recent studies (Johnson et al., 2021;Penone et al., 2014) that collectively identify most reliable methods, and for trait data sets, agree on the value of using phylogenetic information to improve accuracy (Debastiani et al., 2021).However, whether imputation is needed or valuable when estimating functional diversity remains largely unexplored.
Here, we test how data missingness and bias affect estimation of functional diversity metrics using both complete case analysis and imputation and provide a protocol with recommendations for estimating functional diversity when trait data are missing.Our tests addressed three goals: (1) quantify changes due to missing and biased data in two types of diversity metrics: functional richness and functional divergence, each estimated with two methods; (2) determine whether data imputation methods reduce errors and biases in estimated metrics; and (3) identify the degree of missingness and bias permissible when quantifying large-scale functional diversity.
In contrast with many previous studies that use simulated data sets (Johnson et al., 2021;Kim et al., 2018), here we take advantage of the recently released near-complete bird trait data set, AVONET (Tobias, 2021), near-complete crocodile data set (Griffith et al., 2022) and also analysed a partial plant data set (Díaz et al., 2022).Using nonsimulated data sets offers a more realistic representation of empirical diversity analyses, where outliers and non-normal distributions are likely to be present (Junker et al., 2016).Finally, we provide direct recommendations on how to proceed when using incomplete trait data sets to assist methodological decision-making in functional diversity analyses.

| The complete data sets
Using the AVONET data set of avian morphological traits (Tobias, 2021), we investigated the impact of data missingness and bias in estimates of functional richness and functional divergence.
In this analysis, we followed the BirdTree taxonomy and phylogeny (Jetz et al., 2012).From the 9993 species recognised in BirdTree, empirical data are missing for one or more traits for 410 species, which were removed from the analysis.As this is <5% of species, this is not expected to have a considerable impact on our findings.
All morphological traits were included: beak length (culmen), beak length (nares), beak width, beak depth, tarsus length, wing length, secondary length, tail length, body mass, hand wing index and Kipps distance (for additional details on how traits were measured, see Tobias et al., 2022).All trait values were log 10 -transformed and standardised.While we aimed to use all available data, after initial exploration, we removed the four species of kiwis (Apterygidae) recognised in the BirdTree phylogeny (Jetz et al., 2012) as they were extreme outliers with respect to morphological traits (Figure S1, Supplementary Information Section 1, Matthews et al., 2022).Convex hulls were particularly sensitive to outliers so retaining the kiwi species may have made convex hulls appear particularly sensitive to missing data (Figure S2, Supplementary Information Section 1), and it would have been unclear whether this result was generalisable to other data sets without outliers.For this reason, kiwi species were removed.The final complete data set included data for 9579 bird species.
We also analysed a near-complete crocodilian data set (Griffith et al., 2022) and partially complete plant data set (Díaz et al., 2022) to test the generalisability of our findings.Details of these data sets, their analyses and findings are described in Supplementary Information Section 2.

| Overview
From the complete avian data set, we created 2800 data sets by removing data under all permutations of seven degrees of missingness, four degrees of bias and two ways of dealing with missing data (complete case analysis and imputation), with 50 replicates for each permutation (for each replication, data were removed at random under stipulated missingness and bias conditions).Most functional diversity metrics rely on the assumption that traits are uncorrelated, which is often not the case.Principal component analysis is one way to overcome trait co-correlation and provides uncorrelated axes that maximise variance present in the original data set.Through doing this, principal component analysis provides a means to reduce the number of dimensions in the data, which is essential when using large trait data sets as computation times become untenable with a large number of dimensions.Principal component analysis was carried out after imputation (or removal of species with incomplete data for complete case analysis), and before functional diversity calculation.For every data set, we calculated four metrics: convex hull (functional richness), distanced-based Rao (functional divergence) and trait probability densities (functional richness and divergence).
Figure 1 provides a schematic overview of the methodology.

| Missingness
Missingness refers to the percentage of species from which data were removed.We applied seven degrees of missingness, from 10% to 70%, at 10% intervals.We also explored two scenarios F I G U R E 1 Summary of the method of this analysis.Ovals indicate data, where white ovals indicate data obtained from a published data set (AVONET) and green ovals indicate data that had been processed in some way (for example, data sets with simulated data removal, or principal components calculated from data sets with simulated data removal).Blue ovate rectangles and purple rectangles indicate processing steps of the analysis.Purple rectangles are distinct from blue ovate rectangles as they refer to a process that has been described in more detail in the text.
considering the number of traits missing data for each species.The main text shows results based on removing data from four out of 11 traits for each species with data removal (chosen at random).Supplementary Information Section 3 shows results based on removing values from seven out of 11 traits.

| Bias
We emulated a commonly observed bias in trait data missingness, wherein species with lower body masses are more likely to be missing data (Garamszegi & Møller, 2012;González-Suárez et al., 2012;Tyler et al., 2012).Bias was applied by increasing the probability that data were removed from species with a below median body mass with four scenarios: no bias-50% probability, low bias-60% probability, medium bias-70% and high bias-80%.This means that in a low bias scenario, there was 60% probability that data were removed from species with below median body mass, and 40% probability that data were removed from species with above median body mass.The extent of bias in these scenarios was chosen based on analysis of bias in historic accumulation of data in large-scale bird trait data sets (Table S11, Supplementary Information Section 4).

| Imputation
We tested two ways of dealing with the generated incomplete data sets: (1) removal of species with missing data (complete case analysis) and (2) filling data gaps through imputation.We used missForest imputation, implemented through the missForest (Stekhoven & Bühlmann, 2012), due to its demonstrated accuracy (Hong & Lynn, 2020;May et al., 2023;Penone et al., 2014), and fast computation times.
One way of including phylogenetic data is through eigenvectors (Debastiani et al., 2021).Phylogenetic eigenvectors are calculated by conducting Principal Coordinates Analysis on pairwise phylogenetic distances among species.The first 10 phylogenetic eigenvectors were used as predictor variables in the missForest imputation procedure, as recommended by Penone et al. (2014), alongside functional traits.Phylogenetic distances were calculated from a maximum clade credibility tree from the first 1000 trees provided by Jetz et al. (2012).
We assessed imputation accuracy in each missingness and bias scenario (Figure S12, Supplementary Information Section 5).
Because body mass was expected to have high phylogenetic signal, biased removal was expected to affect imputation accuracy.Clades that were closely related to large species may have been more likely to have complete trait data than clades closely related to small species, which could increase accuracy of phylogenetic imputation for large species.We assessed whether this was the case by testing the impact of biased data removal on tree imbalance (Figures S13 and   S14, Supplementary Information Section 6), and tree imbalance on imputation accuracy (Figure S15, Supplementary Information Section 6).We also examined the impact of using different imputation methods on estimations of functional diversity using a reduced data set (the Tyrannidae family) and compared imputation methods with and without phylogenetic information (Supplementary Information Section 7).

| Principal component analysis
For each data set, we carried out principal component analysis.Variation in bird morphology can be meaningfully explained in a small number of dimensions (Pigot et al., 2020) so we used only the first three principal components that collectively explained 90% of total variance for the complete data set.We repeated the analysis using two, instead of three principal components (Figures S3 and S4, Tables S1 and S2, Supplementary Information Section 2), and found that conclusions were consistent regardless of the number of principal components used.
Previous research has analysed the impact of data set incompleteness on principal component analysis (Dray & Josse, 2015).
Here, we found that carrying out principal component analysis on incomplete data sets did not introduce a substantial component of error (Supplementary Information Section 8).We compared results with and without use of principal component analysis, presented in Supplementary Information Section 2 (Figure S5, Tables S3-S6).

| Functional diversity estimation
For each of the 2801 data sets, we calculated four metrics, two estimating functional richness and two estimating functional divergence (Figure 1, Table 1).There is a wealth of functional diversity metrics available which meant that comparing all their responses to incomplete data was untenable.As such, we chose to compare metrics which are both widely used and span nondimensional and multidimensional frameworks (Mammola et al., 2021).We did not compare functional evenness metrics because current metrics are unreliable (de Bello et al., 2021;Olusoji et al., 2023).Functional richness is a measure of the total variation of traits in a community (also described as the volume of trait space occupied, Mammola et al., 2021).We calculated functional richness using convex hull (a multidimensional, nonprobabilistic method that calculates the multivariate range of species traits in trait space [Cornwell et al., 2006;Villéger et al., 2008]) and trait probability densities (a multidimensional probabilistic method based on probability density distribution of species traits within trait space, Carmona, 2019).Figure 2 shows how convex hull and trait probability densities estimate functional richness for principal component values of the complete avian data set (shown in two dimensions to facilitate visualisation).Functional divergence is a measure of the dispersion of trait space occupation (how far species are from each other, or the centre of trait space [Mouchet et al., 2010]).We calculated functional divergence using Rao, which calculates the mean dissimilarity between individuals in a community (nondimensional, [Botta-Dukát, 2005;de Bello et al., 2016;Rao, 1982]), and trait probability densities, which calculates the mean distance of the distribution of abundances of species from the centre of the portion of trait space occupied by the community (Carmona, 2019).Distance-based methods (Rao) and convex hull are the methods underlying the widely used FD package (Laliberté & Legendre, 2010).We compared distance-based methods and convex hull to a more recent approach using probabilistic hypervolumes, implemented through trait probability densities in the package TPD (Carmona, 2019).An alpha value of 0.95 (the default value describing the proportion of the probability density distribution included in functional richness and functional divergence calculations) was used for TPD calculations.We calculated functional diversity of the bird assemblage as a whole, using presence only data (no information on abundance).We did not consider intraspecific functional diversity or calculate overlap between taxonomic or ecological groups.

| Statistical analysis
To determine the impact of missingness, bias and how missing data were handled (complete case analysis or imputation), we calculated the deviation in functional diversity metrics from the true value calculated from the full data set, for each data set replicate i for each trait method m as, where Rd m is functional diversity of the complete data set estimated using metric m and Rd i m is the functional diversity of replicate data set i estimated using the same metric m.
(1) TA B L E 1 Methods used for quantifying functional diversity.For convex hull and trait probability densities (TPD), functional space is built using the function named in 'R function (functional space)', then functional diversity is calculated using the function named in 'R function (functional diversity)'.with all predictor combinations and their interactions.We evaluated the model using a Type III Analysis of Variance (ANOVA;

Method
Anova function in library car [Fox & Weisberg, 2019]), and significance was assessed with Wald tests.The impact of using different metrics and complete case analysis or imputation on absolute deviation was analysed using estimated marginal means (library emmeans [Lenth, 2022]).Custom contrasts were used to compare absolute deviations with and without imputation, and compare absolute deviations using different metrics (for complete case and with imputation separately).Contrasts were compared across all bias scenarios.p values were adjusted using the Šidák correction.
Permissible levels of missingness were defined as those where the absolute deviation in estimated functional diversity (|Dev|) was <10% for all data set replicates (based on observed values of absolute deviation rather than model output).

| Overview
Functional diversity estimates from incomplete avian data sets deviated from true values (calculated with the complete data set) by between −48% and 31%.Deviations were affected by all tested predictors and their interactions (Table 2, R 2 = 0.82).Significant interactions among predictors demonstrate the complexity of response to incomplete data, but by and large, functional diversity estimates deviated more from the true values at greater levels of missingness and bias (Figure 3).Dimensionality and whether raw traits or principal components were used, generally had a small impact on functional diversity metric response to incomplete data, and was less important than choice of functional diversity metric and how missing data were handled (imputation or complete case analysis, and S4, Supplementary Information Section 2).

| Metric response to incomplete data
When applied to the avian data set, regardless of bias, or the way of handling missing data (imputation or complete case analysis), convex hull had greater absolute deviations (|Dev|) than TPD richness, and distance-based Rao had greater absolute deviations than TPD divergence (Table S15a, Supplementary Information Section 10).
The only exception was that convex hull and TPD richness did not differ in absolute deviation when there was high bias in data removal and complete case analysis was used ( applied on two principal components instead of three, and on raw traits (body mass and hand wing index) instead of principal components (Table S2, Table S5, Table S6, Supplementary Information Section 2).
When applied to the plant data set with imputation, absolute deviation did not differ significantly between TPD richness and convex hull, and between TPD divergence and distance-based Rao (Figure S6, Table S9, Supplementary Information Section 2).Under complete case analysis, TPD richness resulted in lower absolute deviation than convex hull, and TPD divergence resulted in lower absolute deviation than distance-based Rao.For the crocodilian data set, TPD richness resulted in lower absolute deviation than convex hull but there was no significant difference between TPD divergence and distance-based Rao, regardless of whether imputation or complete case analysis was used (Figure S6 and Table S10, Supplementary Information Section 2).
Whether missing data resulted in over-or underestimation of functional diversity was variable, and dependant on the method used and how missing data were handled.When species with missing data were removed, avian functional richness was underestimated with convex hull but overestimated with TPD.When using imputation, functional richness was underestimated with both convex hull and TPD.Distance-based methods consistently overestimated avian functional divergence when species with missing data were removed, whereas TPD divergence was slightly underestimated functional divergence.With imputation, the effect of missing data on estimates of functional divergence was negligible for both distancebased methods and TPD (Figure 3).The direction of the response to missing data when using TPD richness and TPD divergence was also dependant on the data set, as TPD richness underestimated functional richness and TPD divergence overestimated functional divergence when applied to crocodilians (Figure S6, Supplementary Information Section 2, opposite to the effect when applied to the avian data set, Figure 3).Whether missing data resulted in over-or underestimation of TPD richness was dependant on the bias scenario when using the plant data set.
Consistency within replicates (for the same level of missingness and bias) was variable among methods.Deviation was less predictable (more variation among replicates) when using complete case analyses, and when using convex hull to estimate functional richness.
F I G U R E 3 Deviation in (a) avian functional richness and (b) avian functional divergence for incomplete data sets compared with the complete data set given varying levels of missingness (percentage of species with missing data for four out of 11 traits) and bias (greater probability of data missing from small-sized species).Lines represent the mean with shaded areas showing 1 SD from the 50 replicates analysed for each combination.Top panels (in a and b) show results for complete case analyses (only species with all trait data analysed), and bottom panels show results for when missing values were imputed using random forest implemented with the missForest R function.Functional diversity was calculated using four metrics: convex hull (richness), trait probability densities (TPD) richness, distance-based Rao (divergence) and TPD divergence.

| The relative performance of complete case analysis and imputation
Absolute deviations were usually greater and more clearly affected by bias when only species with no missing data were analysed (complete case analysis) highlighting the benefit of imputation (Figure 3, Table S15b, Supplementary Information Section 10).In the case of avian distance-based Rao, the absolute deviation with imputation was on average 5.2 times smaller than in the complete case scenario, and for convex hull, the absolute deviation with imputation was on average 1.9 times smaller than then complete case scenario.As an exception, the lowest absolute deviations when using TPD richness occurred when data were missing at random and complete case analysis was used (Table S15b, Supplementary Information Section 10).
If imputation was used, avian functional divergence from distance-based Rao was never overestimated by more than 6% and functional divergence from TPD was never overestimated by more than 3% (versus 31% overestimation and 5% underestimation in the complete case scenario, respectively).TPD divergence never deviated from the true values by more than 5% regardless of bias, missingness or the way of handling missing data (imputation or complete case analysis).With imputation, functional richness from TPD was never underestimated by more than 10% (versus 19% overestimation in the complete case scenario).Functional richness from convex hull was underestimated by up to 48% without imputation and 24% with imputation.Metrics estimated using distance-based and TPD methods were insensitive to bias when imputation was used.
Imputation reduced absolute deviation relative to complete case analysis when applied to the crocodilian data set (Table S10, Supplementary Information Section 2).For plants, however, the results were more variable, as for TPD richness and TPD divergence, complete case analysis resulted in lower absolute deviation than imputation (Table S9, Supplementary Information Section 2).When applied to the Tyrannidae family, choice of imputation method was not a significant explanatory variable of imputation accuracy, which was dependent on missingness, bias and their interaction (Figure S17 and Table S12, Supplementary Information Section 7).Despite this, the choice imputation method did affect estimates of functional diversity.When applied to the Tyrannidae family, complete-case analysis resulted in greater absolute deviation than imputation for all metrics and imputation methods, aside from TPD richness where missForest (with and without phylogenetic information) resulted in greater absolute deviation than complete case analysis (Figures S18 and S19, Table S14, Supplementary Information Section 7).

| Missingness and bias permissible in trait data sets
When using complete case analysis with the avian data set, the extent of missing data permissible was dependant on the bias scenario and method, with biased data removal resulting in lower missingness permissible for all metrics other than TPD divergence.Functional divergence from TPD was extremely robust to missing and biased data and did not deviate by more than 10% from true values even at 70% missingness, regardless of whether complete case analysis or imputation was used.Convex hull was much less reliable with very limited levels of missing data permissible (Figure 4).was permissible when seven out of 11 (rather than four out of 11) traits were removed except for functional divergence from TPD (Fig- ure S9, Supplementary Information Section 3).
For plants, if imputation was used functional divergence could be reliably estimated with up to 70% missingness regardless of bias scenario or whether distance-based Rao or TPD divergence was used.When calculating plant functional richness with imputation, the maximum missingness permissible was dependant on bias scenario; 70% missingness was permissible at random and low bias, but at medium bias 60% missingness was permissible and at high bias 50% missingness was permissible.For crocodilians, the missingness permissible rarely exceeded 10%, regardless of metric, imputation or bias scenario.
Overall, these results indicate that when using imputation and a large trait data set, TPD richness, TPD divergence and distancebased Rao (divergence) can be robustly estimated even with high data missingness (up to 50% to 70% given high trait coverage).

| DISCUSS ION
Our results indicate that metrics of functional richness and divergence can be estimated with relative accuracy even with incomplete data sets.From the methods we tested, we recommend estimating functional richness with TPD, and while TPD outperformed distance-based Rao, both were reliable for calculating functional divergence.We also recommend using imputation as we found that imputing missing trait data reduced deviations in functional diversity metrics to <10%.This finding is promising for the application of large-scale trait analyses.In global analyses, complete trait information is rare, especially as the number of traits used increases to provide a more comprehensive assessment of functional diversity.
As such many studies rely on imputation, but the impact of this decision is often questioned.Our findings show that imputation can be an effective way for accounting for incomplete data, and in the avian and plant data sets greatly reduced errors in functional diversity estimation compared to removing species with incomplete data (see Table 15b, [Supplementary Information Section 11], and Table S9 [Supplementary Information Section 2]).Our results suggest that given observed levels of missing data for many taxa (Etard et al., 2020;Kattge et al., 2020;Tobias et al., 2022), current methods could be used to calculate functional diversity with reasonable accuracy, even when applied with a smaller number of dimensions and with raw traits instead of principal components (Figures S3-S5, Supplementary Information Section 2).
Despite the robustness of functional diversity metrics (aside from functional richness estimated using convex hull), careful consideration is needed when interpreting functional diversity values, as some unexpected responses to incomplete data were observed.
Functional richness is the volume of trait space occupied, so it is expected to shrink with data removal.While this was observed for convex hull, TPD richness increased with biased removal of species when using complete case analysis on the avian data set.This counter-intuitive response is due to data-defined bandwidth selection during kernel density estimation of trait probability densities (see Supplementary Information Section 11 for detailed explanation) but was not observed when applied to plants or crocodilians.These results emphasise the need for null models and clear consideration of expectations when conducting functional diversity analyses (particularly when estimating functional richness).Outlining expectations and using null models ensures results are interpreted correctly and observed responses are due to the hypothesis being tested rather than an artefact of data availability or metric behaviour.
Our comparison of functional diversity methods revealed sensitivities to missing data that were previously unreported.Convex hulls performed consistently worse than TPD richness, with low tolerance to missing data even when imputation was used.This suggests that when relying on incomplete data sets to quantify functional richness, using TPD (or similar approaches based on probabilistic hypervolumes [Blonder, 2018]) is advisable.This does not preclude value in convex hull approaches.Convex hulls are sensitive to changes that affect extreme trait values, so if a complete data set is available, they can provide an estimation of functional diversity, which is highly responsive to changes occurring in an ecosystem.Convex hull showed increased sensitivity relative to TPD when applied to a crocodilian data set or plant data set when species with incomplete data were removed, but similar sensitivity to incomplete trait data when applied to a plant data set with imputation (Figures S6 and S7, Supplementary Information Section 2).Our findings agree with others that sensitivity to missingness and bias varies among methods (Pakeman, 2014) and attest to the robustness of distance-based metrics such as Rao (Pakeman, 2014;Richter et al., 2021).We also found that both functional divergence metrics, distance-based Rao and TPD, were particularly robust to incomplete data.TPD (divergence and richness) and distance-based methods, if used with accurate imputation, can be applied to large-scale trait data sets with confidence, even at moderate-to-high levels of missingness and bias.
Our findings support the use of functional diversity metrics for analyses on large taxonomic and geographical scales in functional ecology.A good example of how functional diversity metrics can be applied is given by Sayol et al. (2021).Sayol et al. (2021) compiled trait data on extinct, extant and established alien bird species for nine oceanic archipelagos, using imputation to fill data gaps.
They used hypervolumes to quantify functional diversity (Blonder et al., 2018) and revealed how alien species did not replace the functional diversity lost to extinctions in these archipelagos.Using data from Sayol et al. (2021), we show how convex hull and TPD richness delineate functional space in this context, for the historic New Zealand assemblage with extinct birds, and the novel assemblage, with extant alien species, (Figure 5).
Imputation was generally effective for overcoming gaps and biases in incomplete data sets, even at extreme levels of missingness and bias.This is in agreement with previous studies (Johnson et al., 2021;Penone et al., 2014;Swenson, 2014) and highlights the value of imputation when appropriately applied.When applied to birds, imputation was particularly effective at eradicating the signal of bias.This is important given that bias in missing data are prevalent in large-scale data sets (González-Suárez et al., 2012;Sandel et al., 2015;Tyler et al., 2012) and the extent of bias is often unknown (Tyler et al., 2012).Despite this, in some cases (when calculating avian functional richness with TPD with random or low bias in data removal or when calculating plant functional diversity with TPD richness or TPD divergence), imputation could lead to greater errors than complete case analysis (see also Johnson et al., 2021).Therefore, while our results agree with others that random forest models (as implemented by the missForest R function) are an accurate imputation method for trait data (Johnson et al., 2021), care should be taken to ensure use of imputation is appropriate.Our findings regarding the utility of imputation are only applicable to continuous trait imputation, as the efficacy of categorical traits imputation was not explored (although see May et al., 2023), and to large trait data sets on the scale of hundreds or thousands of species rather than tens.The utility of imputation in tackling missing and biased data has been shown to depend on the correlation between traits, and extent of phylogenetic autocorrelation (Clavel et al., 2015).As there is a strong link between form and function in birds (Pigot et al., 2020), phylogenetically informed imputation methods are expected to be particularly reliable for this group.This is supported by the fact that performance of imputation relative to complete case analysis was more variable when applied to plants where phylogenetic correlation of traits was expected to be lower (Table S9, Supplementary Information Section 2).Studies should assess imputation accuracy, through methods such as bootstrapping (Khan et al., 2019) and testing the ability to locate species within trait space (Khan et al., 2019), to ensure imputation is adequately accurate for the desired application.In addition, when data are only available for a small subset of taxa or traits, we recommend reducing the scope of the study to focus on a smaller group with better data availability.
Making an informed decision on how much missing data can be tolerated is one of the challenges of functional diversity analyses.When using imputation and TPD or distance-based methods, we found up to 50% to 70% missingness was permissible under our criterion of functional diversity estimates falling within 10% of true values.However, the maximum missingness permissible was lower when using complete case analysis and was very low with convex hull functional richness.For smaller data sets (such as crocodilians, see Figure S7, Supplementary Information Section 2), when using less-well-defined phylogenies or when many dimensions are needed to quantify trait space, the maximum missingness permissible will likely be reduced.When using TPD richness and imputation, and removing seven out of 11 traits values rather than four, the maximum missingness permissible (where deviation of richness from true values was less than 10%) was reduced to 40% (Figure S9, Supplementary Information Section 3).While this indicates how sensitivity to missing data may change when more data across traits is missing, further research on how functional diversity estimation responds to incomplete data when missingness is uneven between traits is needed.Therefore, 50% missingness can serve as a conservative guideline where application is similar to that reported here, but the spatial scale of the analysis, the size of the taxonomic or ecological group, the correlation between traits, the missingness between traits and the extent of phylogenetic conservatism could affect maximum missingness permissible in other contexts.In addition, more restricted criteria for acceptable deviation from true values (here defined as 10%) may be desirable in some cases leading to lower missingness permissible.Future work is needed to explore how these different factors influence functional diversity estimation when using incomplete data sets.In the meantime, wherever possible the extent of missingness should be quantified and used alongside other factors to inform choice of imputation and functional diversity metric.
Trait-based analyses are increasingly being applied to greater taxonomic and spatial scales (Etard et al., 2020;Johnson et al., 2021), so it is likely that trait-based ecologists will have to deal with missing data for the foreseeable future.As such, we have provided a guide for reducing error when estimating functional diversity from incomplete trait data sets (Figure 6), summarising recommendations made in this study.Our findings show that imputation can permit reasonably accurate measurements of functional diversity even when 50% of species have incomplete data.
Melodic (divergence) de Bello et al. (2016) Convex hull Functional richness Binary hypervolume (multidimensional) BAT::hull.build Cardoso et al. (2015) BAT:: hull.alpha (richness) Cardoso et al. (2015) REND (richness and divergence) Carmona (2019) We then fitted a multiple generalised linear regression using a gamma distribution to explain absolute Dev values (|Dev|) as a function of missingness (continuous variable), bias (ordered factor), missing data handling (binary variable with values: complete-case and imputation) and the functional diversity method (categorical variable).Absolute values of deviation (|Dev|) were used rather than raw values of deviation (Dev) to facilitate interpretation of pairwise comparisons of estimated marginalised means.If raw deviation values (Dev) were used, a positive contrast in estimated marginalised means could indicate reduced underestimation of functional diversity under a particular condition (for example when using one metric over another), or increased overestimation, presenting an ambiguity that would hinder interpretation of results.When using absolute deviation, positive contrasts indicated more severe deviation in functional diversity values from true values of one condition compared with another.The distribution of absolute deviation values was right skewed, and when using a linear model, residuals were non-normally distributed and there was evidence of heteroscedasticity (Figure S20, Supplementary Information Section 9).As such a generalised linear model was fitted using a gamma distribution with an inverse link function.This gave residuals that were closer to a normal distribution (albeit still deviating slightly from normal distribution) and removed heteroscedasticity (Figure S21, Supplementary Information Section 9).The model was fitted using the glm function (family = Gamma(link="inverse")) in base R version 4.1.2(R Core Team, 2021) Maximum missingness permissible (maximum missingness where all functional diversity values fell within +/− 10% of functional diversity of the complete data set) for (a) avian functional richness and (b) avian functional divergence, with four degrees of bias.Functional diversity was calculated using convex hull (richness), trait probability densities (TPD) richness, distance-based Rao (divergence) and TPD divergence.Results are shown for complete case analysis (darker colour bars) and for imputed data sets (lighter colour bars).

F
I G U R E 5 (a) Principal component values of extant, extinct and established alien bird species of New Zealand archipelago, showing how (b) trait probability densities richness and (c) convex hull delineate functional space occupation for the historic assemblage with native extant and extinct species, and the novel assemblage, with native extant and alien extant species.Shown in two dimensions (two principal components) to enable visualisation.Sayol et al. (2021) used three dimensions (three principal components) and hypervolumes (probabilistic hypervolume method, Blonder et al., 2018) to quantify functional richness.Trait data were obtained from the first imputation data set of Sayol et al., 2021 (Data S7).
and distance-based methods (Rao for divergence) resulted in robust quantification of functional diversity.Generally, functional richness estimated using convex hull was sensitive to incomplete data and should be used with caution.Given observed deviations we can conclude that for some taxa and traits, we have enough data to perform accurate large-scale analyses of functional diversity.Overall, the findings presented here are promising for the application of functional diversity methods to large-scale questions in ecology and conservation, and highlight the potential of imputation for overcoming gaps in trait databases.

Figure S5 .
Figure S5.Deviation in avian functional richness (a and c) and functional divergence (b and d) for incomplete datasets calculated from the first two principal components and raw traits (body mass and hand wing index), compared to functional diversity of the complete dataset given varying levels of missingness (percentage of species with missing data for four out of 11 traits) and bias (greater probability of data missing from small-sized species).

Figure S6 .
Figure S6.Deviation in functional richness (a, c and e) and functional divergence (b, d and f) for incomplete datasets compared to the complete dataset for birds (a and b), plants (c and d) and crocodilians (e and f) given varying levels of missingness (percentage of species with missing data) and bias (greater probability of data missing from small-sized species).

Figure S7 .
Figure S7.Maximum missingness permissible (maximum missingness where all of functional diversity values fell within +/− 10% of functional diversity of the complete dataset), for birds (a), plants (b) and crocodilians (c) with four degrees of bias.

Figure S8 .
Figure S8.Deviation in (a) avian functional richness and (b) avian functional divergence for incomplete datasets compared to the complete dataset given varying levels of missingness (percentage of species with missing data) and bias (greater probability of data missing from small-sized species).

Figure S9 .
Figure S9.Maximum missingness permissible (maximum missingness where all avian functional diversity values fell within +/− 10% of functional diversity of the complete dataset) for (a) functional

Figure S10 .
Figure S10.Probability density plot of body mass (log 10 transformed) in three bird trait datasets for the full dataset (blue) and incomplete datasets, for species included (orange) and species missing (green).

Figure S12 .
Figure S12.Imputation normalised root mean square error (expressed as a percentage of trait range, mean shown by line with shaded areas showing 1 SD from 50 replicates) with varying levels of missingness (percentage of bird species with missing data for four out of 11 traits) and bias (greater probability of data missing from small-sized species).

Figure S13 .
Figure S13.Tree imbalance (calculated with the Colless corrected index) with varying bird species missingness (% of species removed from tree) and bias (probability that species removed had a below median body mass).

Figure S14 .
Figure S14.Tree imbalance (calculated with the Colless corrected index) with varying bird species missingness (% of species removed from tree) and bias (probability that species removed had a below median body mass).

Figure S16 .
Figure S16.Tree imbalance calculated with (a) Colless corrected index and (b) average leaf depth against imputation error (nrmse, % of range), coloured by species missingness (% of species removed) for different scenarios of bias (random, low, medium and high).

Figure S17 .
Figure S17.Imputation normalised root mean square error (expressed as a percentage of trait range, mean shown by line with shaded areasshowing 1 SD from 20 replicates) when using difference imputation methods (MICE, missForest without phylogenetic information [no phy], missForest with phylogenetic information [with phy] and Phylopars), with varying levels of missingness (percentage of species with missing data for four out of 11 traits) and bias (random, low, medium and high, where higher bias indicated a greater probability of data missing from small-sized species).

Figure S18 .
Figure S18.Deviation of functional richness from the complete dataset (first three principal components of morphological variation in the Tyrannidae family) (mean shown by line with shaded areasshowing 1 SD from 20 replicates) when using difference imputation methods (MICE, missForest without phylogenetic information [no phy], missForest with phylogenetic information [with phy] and Phylopars), with varying levels of missingness (percentage of species with missing data for four out of 11 traits) and bias (random, low, medium and high, where higher bias indicated a greater probability of data missing from small-sized species).

Figure S19 .
Figure S19.Deviation of functional divergence from the complete dataset (first three principal components of morphological variation in the Tyrannidae family) (mean shown by line with shaded areas showing 1 SD from 20 replicates) when using difference imputation methods (MICE, missForest without phylogenetic information

Figure S20 .
Figure S20.Diagnostic plots of a linear model of absolute deviation in avian functional diversity, with missingness, bias, missing data handling and functional diversity metric (and their two-, threeand four-way interactions) as explanatory variables.a) Histogram of response variables shows right-skewed distribution of absolute deviation (response variable), b) normal Q-Q plot shows non-normal deviation of residuals, c) residuals vs fitted, and d) scale-location, shows heteroscedastic distribution of residuals.

Figure S21 .
Figure S21.Diagnostic plots of a generalized linear model of absolute deviation in avian functional diversity (gamma distribution with an inverse link function), with missingness, bias, missing data handling and functional diversity metric (and their two-, three-and four-way interactions) as explanatory variables.(a) Histogram of response variables shows right-skewed distribution of absolute deviation (response variable), (b) normal Q-Q plot shows residuals that were closer to a normal distribution than when using a gaussian linear model, but still deviated slightly from a normal distribution, and (c) residuals versus fitted, and (d) scale-location, shows homoscedastic distribution of residuals.

Figure S22 .
Figure S22.Conceptual diagram demonstrating how data removal (b) and increase in sample variance can result in an increase in trait range included in the 95th percentile of probability density functions (red line) (relative to the complete dataset, a).Dots hypothetical trait data along one trait axis, blue areas show the trait probability densities estimated for single data points, and the black line shows the convolution of these probability densities, to give the trait probability of the sample.Hypothetical trait data in (a) have the same range as those in (b).

Figure S23 .
Figure S23.Probability distributions of principal component 2 (PC2) in incomplete avian datasets (blue area and grey lines.The black line shows the probability distribution of principal component 2 in the full dataset.Red lines show the 5th and 95th percentile of the full dataset, grey vertical lines show 5th and 95th percentiles of the probability density functions of incomplete datasets.

Figure S24 .
Figure S24.Deviation in avian functional richness of datasets with missing trait data compared to the complete dataset (mean shown by line with shaded areas showing 1 SD from 100 replicates) given varying levels of missingness (proportion of species with missing data) and bias (greater probability of data missing from small-sized species).

Table S15a
Type III Analysis of Variance (ANOVA) table of a generalised linear model explaining absolute deviation (|Dev|) in avian functional diversity based on missingness, bias, functional diversity metric and way of handling missing data (and their interactions).Significance of predictors was tested with a Wald test.Rows are ordered by the χ 2 value, with the strongest predictor (Metric) at the top.All predictors apart from 'Imputation: Bias' and 'Imputation: Missingness: Bias' explained a significant proportion of variance (p value <0.001).
, Supplementary Information Section 10).Convex hull had greater absolute deviations (|Dev|) than TPD richness, and distance-based Rao had greater absolute deviations than TPD divergence even when the analyses were TA B L E 2

Table S1 .
Type III Analysis of Variance (ANOVA) table of a generalized linear model explaining absolute deviation (|Dev|) in avian functional diversity based on missingness, bias and functional diversity metric (and their interactions), way of handling missing data and dimensionality.

Table S2 .
Contrasts of estimated marginalised means of a gamma generalised linear model of absolute deviation of avian functional diversity with missingness, functional diversity metric, way of