Severe limitations of the FEve metric of functional evenness and some alternative metrics

Abstract

The metric of functional evenness FEve is an example of how approaches to conceptualizing and measuring functional variability may go astray. This index has several critical conceptual and practical drawbacks. Different values of the FEve index can be obtained for the same community if species abundances are unequal; this result is highly likely if most of the traits are categorical. Very minor differences in even one pairwise distance can result in very different values of FEve. FEve uses only a fraction of the information contained in the matrix of species distances. Counterintuitively, this can produce very similar FEve scores for communities with substantially different patterns of species dispersion in trait space. FEve is a valid metric only if all species have exactly the same abundances. However, the meaning of FEve in such an instance is unclear, as the purpose of the metric is to measure the variability of abundances in trait space. We recommend not using the FEve metric in studies of functional variability. Given the wide usage of the FEve index over the last decade, the validity of conclusions based on FEve estimates is in question. Instead, we suggest three alternative metrics that combine variability in species distances in trait space with abundance in various ways. More broadly, we recommend that researchers think about which community properties (e.g., trait distances of a focal species to its nearest neighbor or to all other species, or variability of pairwise interactions between species) they want to measure and pick from among the appropriate metrics.

approaches have been developed since the 1990s to measure this key community attribute (many of them are listed in Scheiner, 2019).
Biodiversity, of which functional trait variability is one component, is a complex concept. Scheiner et al. (2017b) pointed out that the three basic types of information (abundance, relatedness, and trait values) each have properties of magnitude and variability.
Together with species identity information (e.g., species richness), Scheiner (2019) defined fourteen basic elements of biodiversity that could be combined in myriad ways to produce many different types of biodiversity metrics. One scheme for describing different facets of functional trait variability was proposed by Villéger et al. (2008), who suggested three separate metrics: functional richness (FRic), functional evenness (FEve), and functional divergence (FDiv), which measure, respectively, the amount of trait space filled by the community, the evenness of species abundances as they are distributed in trait space, and how abundances are spread across trait space. In the classification scheme of Scheiner (2019), these are all composite metrics that, respectively, combine species richness with trait magnitude (FRic), abundance magnitude with trait variability of nearest-neighbor distances (FEve), and abundance magnitude with trait variability of mean distances (FDiv). Because they are composites, and because of the way trait magnitude and variability are measured, these three metrics are among the most complicated of the few commonly used approaches. They are assumed to provide an exhaustive measure of functional variability within a community, although that is clearly not the case given the limited types of information that they encompass. Despite some criticisms of these indices, mainly focused on functional evenness (e.g., Legras & Gaertner, 2018; Ricotta et al., 2014) and richness (e.g., Podani, 2009), their usage has grown continually in recent years, from 134 citations in 2015 to 288 in 2019, with a current total of over 1,500 citations. In this paper, we demonstrate that functional evenness (FEve) has severe limitations in its applicability and interpretation. We concentrate on FEve as an example of how approaches to conceptualizing and measuring functional evenness may go astray.
A community can be characterized by its species and their abundances. Using additional information about those species, relationships among the species can be expressed in terms of pairwise distances that in turn can be used to measure overall community variation. In particular, if each species is described by the same set of T traits (standardized trait values are assumed), a community of S species can be represented by S points in a T-dimensional trait space.
While distances can be estimated with different metrics, relationships are completely predetermined by the species' dispersion in the trait space. Functional trait diversity can be measured in a variety of ways; the differences in trait space among species can be measured using all pairwise distances, the mean distance of a given species from other species, or nearest-neighbor distances (Scheiner, 2019).
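A minimal sketch of how species-by-trait data become a distance matrix and two of the per-species summaries named above (mean distance to the other species and nearest-neighbor distance). It assumes z-score standardization and Euclidean distance, which are only one of many possible choices, since distances can be estimated with different metrics; the trait values are hypothetical.

```python
import numpy as np


def standardize(traits):
    """Z-score each trait column (mean 0, SD 1); one common way to standardize."""
    traits = np.asarray(traits, dtype=float)
    return (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)


def pairwise_distances(z):
    """Euclidean distance between every pair of species (rows of z)."""
    z = np.asarray(z, dtype=float)
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_summaries(d):
    """Mean distance to the other S - 1 species and nearest-neighbor distance."""
    s = d.shape[0]
    off = ~np.eye(s, dtype=bool)                     # mask out the zero diagonal
    mean_dist = (d * off).sum(axis=1) / (s - 1)
    nn_dist = np.where(off, d, np.inf).min(axis=1)
    return mean_dist, nn_dist


# Hypothetical community: 4 species described by 2 continuous traits.
traits = [[1.0, 10.0], [2.0, 12.0], [4.0, 11.0], [8.0, 20.0]]
d = pairwise_distances(standardize(traits))
mean_dist, nn_dist = distance_summaries(d)
print(np.round(d, 3))
print("mean distance:", np.round(mean_dist, 3))
print("nearest-neighbor distance:", np.round(nn_dist, 3))
```

Replacing `pairwise_distances` with another dissimilarity (for example, one suited to categorical traits) leaves the summaries unchanged; only the matrix `d` differs.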
Those distances can then be further weighted by the species abundances to provide a measure of abundance-weighted functional trait variation within this multitrait space. FEve measures functional evenness based on abundance-weighted nearest-neighbor distances, so this metric might be relevant if the primary interactions within a community are among species that are most similar in trait values.
Such interactions occur in many circumstances, but there are also many circumstances in which this is not true, for particular species or for particular types of interactions. Nevertheless, the FEve metric has been widely used to analyze functional variation without consideration of the types of processes and entities involved. We return to this issue in the final section of the paper, when we discuss alternative measures of functional variation. If FEve is to be used as a measure of some property of biodiversity, its conceptual basis needs to be described and justified. In particular, what is the reasoning for combining the edges of a minimum spanning tree (MST) with abundances as a functional characteristic? Which functional characteristic is addressed by this combination? In what sense is it a measure of evenness? In addressing these questions, we uncover two conceptual problems: (a) the possible nonuniqueness of MSTs and (b) the use of FEve as an index of evenness.

| CONCEPTUAL PROBLEMS
Given a particular MST with $S$ nodes (species $s_i$, $i = 1, 2, \dots, S$), FEve is calculated as follows. First, each edge linking species $s_i$ and $s_j$, with functional distance $d_{ij} = \mathrm{dist}(s_i, s_j)$ between them, is weighted by (i.e., divided by) the sum of their abundances ($w_i$ and $w_j$):

$$EW_{ij} = \frac{d_{ij}}{w_i + w_j}. \qquad (1)$$

Second, those weighted edges are normalized by the sum of the $EW_{ij}$ values for the corresponding MST:

$$PEW_{ij} = \frac{EW_{ij}}{\sum_{(k,l) \in \mathrm{MST}} EW_{kl}}, \qquad (2)$$

where $(i, j)$ designates an edge between species $s_i$ and $s_j$. (Because of this normalization, either relative or absolute abundances can be used.) Finally, FEve is calculated as follows:

$$\mathrm{FEve} = \frac{\sum_{(i,j) \in \mathrm{MST}} \min\left(PEW_{ij}, \frac{1}{S-1}\right) - \frac{1}{S-1}}{1 - \frac{1}{S-1}}, \qquad (3)$$

which takes values between 0 and 1 (the denominator is the theoretically possible maximum value of the numerator).
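The three steps can be written as a short function. This is an illustrative sketch, not the reference implementation of Villéger et al. (2008): the MST must be supplied by the caller, because choosing among tied MSTs is exactly where the problems discussed below arise. The value 1.5 for $d_{13}$ is an arbitrary assumption, chosen only to be larger than $d_{12} = d_{23} = 1$.

```python
def feve(edges, d, w):
    """FEve for one given MST, following the three steps described above.

    edges: the S - 1 (i, j) index pairs that form the MST
    d:     symmetric matrix of pairwise distances, d[i][j] = d_ij
    w:     species abundances (relative or absolute)
    """
    s = len(w)
    if len(edges) != s - 1:
        raise ValueError("an MST on S species has exactly S - 1 edges")
    # Step 1: each edge length divided by the summed abundances of its two species
    ew = [d[i][j] / (w[i] + w[j]) for i, j in edges]
    # Step 2: rescale the weighted edges so that they sum to 1
    total = sum(ew)
    pew = [x / total for x in ew]
    # Step 3: truncate each share at the perfectly even share 1/(S - 1)
    even = 1.0 / (s - 1)
    return (sum(min(p, even) for p in pew) - even) / (1.0 - even)


# Community of Figure 2: d12 = d23 = 1 < d13, so the unique MST
# consists of the edges s1-s2 and s2-s3.
d_fig2 = [[0.0, 1.0, 1.5],
          [1.0, 0.0, 1.0],
          [1.5, 1.0, 0.0]]
print(round(feve([(0, 1), (1, 2)], d_fig2, [1, 2, 3]), 6))  # 0.75
```

Multiplying all abundances by a constant (e.g., `[10, 20, 30]`) returns the same value, consistent with the remark that relative or absolute abundances can be used.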

| INFERENCES FROM CONSTRUCTED EXAMPLES
Multiple MSTs can yield multiple, different values of the FEve index for the same community when species abundances are unequal, which severely limits the utility of the metric. The following example demonstrates such a situation. Let the community consist of three equally distant species ($s_1$, $s_2$, and $s_3$) in a given trait space, with abundances $w_1 = 1$, $w_2 = 2$, and $w_3 = 3$, respectively (Figure 1, community network).
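The nonuniqueness can be checked by direct calculation. The sketch below evaluates FEve for each of the three equally short spanning trees of this three-species community, using exact fractions so that the differences cannot be attributed to rounding. It re-implements the FEve formula of Villéger et al. (2008) and assumes $d = 1$ for every pair; any common value gives the same result, because the normalization cancels it.

```python
from fractions import Fraction


def feve(edges, d, w):
    """FEve (Villéger et al., 2008) for one MST, in exact rational arithmetic."""
    s = len(w)
    ew = [Fraction(d[i][j]) / (w[i] + w[j]) for i, j in edges]
    total = sum(ew)
    even = Fraction(1, s - 1)
    return (sum(min(x / total, even) for x in ew) - even) / (1 - even)


# Three equally distant species (d = 1) with abundances 1, 2, and 3.
d = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
w = [1, 2, 3]
msts = {
    "edges s1-s2, s2-s3": [(0, 1), (1, 2)],
    "edges s1-s2, s1-s3": [(0, 1), (0, 2)],
    "edges s1-s3, s2-s3": [(0, 2), (1, 2)],
}
values = {name: feve(edges, d, w) for name, edges in msts.items()}
for name, value in values.items():
    print(f"{name}: FEve = {value} (~{float(value):.3f})")
# prints 3/4, 6/7 and 8/9 (~0.750, ~0.857, ~0.889)
```

The first tree gives 0.75, the value reported for the communities in Figure 2. Perturbing the distances by an arbitrarily small amount makes one of these trees the unique MST, so FEve jumps among 0.750, 0.857, and 0.889 depending only on which distance happens to be largest.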
There are three MSTs with the same minimum total distance ($2d$): MST 1, with one edge connecting $s_1$ and $s_2$, and one edge connecting

Now consider the three MSTs in Figure 1 to have been generated for three different communities, in which the distances between the species are no longer identical but just very, very slightly different, so that each MST is unique for its community (e.g., for MST 1, $d_{12} = d_{23} = 1$ and

However, the meaning of FEve in such an instance is unclear, as the purpose of the metric is to measure the variability of abundances in trait space. A central reason for the problems raised above is that FEve uses only a fraction of the information contained in the matrix of species distances.

ing, hovering, other). To determine the functional distance between species, Jaccard dissimilarity was calculated for each group of binary traits, and then the combined distance between species was determined by an equal-weight averaging of the three group-specific dissimilarities (

FIGURE 2 In these communities, two of the three species are equally distant ($d_{12} = d_{23} = d$), with a distance that is smaller than the third distance ($d_{13}$). If the abundances of the three species are $w_1 = 1$, $w_2 = 2$, and $w_3 = 3$, then FEve = 0.75 in both communities, even though community B seems much more functionally irregular than community A.

FIGURE 3 This community consists of three species in which $d_{23}$ is larger than $d_{13}$ and $d_{12}$. The abundances ($w$) and distances result in values of $EW_{12} = EW_{13} = 1/6$, $PEW_{12} = PEW_{13} = 0.5$, and FEve = 1.

| Bryozoan genotypes
Cristatella mucedo is a diploid freshwater bryozoan. We used data on eight microsatellite loci (

| Wheat fungal pathogen (Puccinia graminis f. sp. tritici) genotypes
The data consisted of eleven virulence phenotypes of P. graminis isolates collected from bread wheat in the Novosibirsk region of Russia. The binary phenotypes (virulence/avirulence) were determined with a set of twenty North American wheat differential lines (Skolotneva et al., 2020). The distance between phenotypes was calculated using simple mismatch dissimilarity; the corresponding matrix of pairwise distances is presented in Table 3. Twenty-four different MSTs can be generated (Table 3).
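Simple mismatch dissimilarity is the proportion of differential lines at which two binary phenotypes differ. A minimal sketch follows; the three five-line profiles are hypothetical (the study used twenty lines). They were chosen so that every pair differs at exactly two lines, showing how binary data readily produce the tied distances that generate multiple MSTs.

```python
def simple_mismatch(a, b):
    """Proportion of differential lines at which two binary profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mismatch_matrix(profiles):
    """Symmetric matrix of simple mismatch dissimilarities."""
    n = len(profiles)
    return [[simple_mismatch(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical virulence (1) / avirulence (0) profiles on five differential lines.
profiles = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
]
m = mismatch_matrix(profiles)
for row in m:
    print(row)
```

Because all three off-diagonal distances equal 0.4, each of the three two-edge trees is an MST, the same situation as the three equally distant species of Figure 1.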

| Wheat fungal pathogen (Puccinia triticina Erikss) genotypes
The data consist of eleven genotypes of single-uredinial isolates of P. triticina (a dikaryotic fungus) collected from durum wheat in Russia using eleven microsatellite markers (Table 3 in Table 4A. Three different MSTs can be generated based on the distance matrix (Table 4B).

| SUMMARY OF FEVE ISSUES
In constructing species networks, it is assumed that trait values are measured without error and that there is no variation within species, two assumptions that we know are false. This issue could be addressed by a procedure that estimates the mean and variability of the relevant estimates over all closely related networks. We do not know of any attempt to study that matter for any diversity

Twenty-four different MSTs are possible with one of these edges. The multiple MSTs resulted in ten, twenty-four, and eighteen different FEve values for the actual abundances, the Y-modification, and the Z-modification, respectively. Variability of FEve estimates is shown in Figure 4.
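One way to picture the procedure suggested above is to add small random errors to the distances, rebuild the MST, and summarize the resulting FEve values. The sketch is hypothetical: the multiplicative Gaussian noise, its size (`rel_sd`), the number of draws, and the use of Prim's algorithm are illustrative assumptions, not a method proposed in this paper. It is applied to the three equally distant species with abundances 1, 2, and 3.

```python
import random
import statistics


def prim_mst(d):
    """Edges of one minimum spanning tree of the complete graph on d (Prim's algorithm)."""
    s = len(d)
    in_tree = {0}
    edges = []
    while len(in_tree) < s:
        # cheapest edge joining the growing tree to a species not yet in it
        i, j = min(((a, b) for a in in_tree for b in range(s) if b not in in_tree),
                   key=lambda e: d[e[0]][e[1]])
        edges.append((i, j))
        in_tree.add(j)
    return edges


def feve(edges, d, w):
    """FEve (Villéger et al., 2008) for one MST."""
    s = len(w)
    ew = [d[i][j] / (w[i] + w[j]) for i, j in edges]
    total = sum(ew)
    even = 1.0 / (s - 1)
    return (sum(min(x / total, even) for x in ew) - even) / (1.0 - even)


def feve_under_noise(d, w, rel_sd=1e-6, draws=200, seed=1):
    """Hypothetical procedure: recompute FEve on many slightly perturbed copies of d."""
    rng = random.Random(seed)
    s = len(d)
    values = []
    for _ in range(draws):
        p = [[0.0] * s for _ in range(s)]
        for i in range(s):
            for j in range(i + 1, s):
                # multiplicative noise keeps distances positive for small rel_sd
                p[i][j] = p[j][i] = d[i][j] * (1.0 + rng.gauss(0.0, rel_sd))
        values.append(feve(prim_mst(p), p, w))
    return values


d = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # three equally distant species
w = [1, 2, 3]
vals = feve_under_noise(d, w)
print(f"mean = {statistics.mean(vals):.3f}, SD = {statistics.stdev(vals):.3f}, "
      f"range = {min(vals):.3f} to {max(vals):.3f}")
```

Because each perturbed copy has a unique MST, the recomputed values cluster at 0.750, 0.857, and 0.889, so the spread reflects which tree is selected rather than any smooth sensitivity to measurement error.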
abundances) into a single assessment of evenness results in a metric that fails to distinguish between distance evenness and abundance evenness (Gregorius, 1990).
This entire paper has been about FEve, but we would be remiss if we did not mention PEve (phylogenetic evenness), which Dehling et al. (2014) defined to be identical to FEve but with nearest-neighbor phylogenetic distances substituted for distances in functional trait space. We discussed the nonuniqueness problem with FEve that occurs when two species have identical nearest-neighbor distances to a third but differ in their abundances. This problem is most likely for categorical traits or for traits based on counts with just a few possible values, and so may not occur that often.
However, this problem is highly likely for phylogenetic data: it will occur any time a pair of sister species is equally distant from a third species and the two sisters differ in their abundances. PEve has been used much less frequently than FEve but should also be abandoned.
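For a handful of species, all MSTs can be enumerated by brute force, which makes the phylogenetic case easy to check. In this sketch the ultrametric distances and the abundances are hypothetical, and PEve is computed as FEve on phylogenetic distances, following the definition attributed to Dehling et al. (2014) above. Enumerating every $(S-1)$-edge subset is practical only for very small $S$.

```python
from fractions import Fraction
from itertools import combinations


def is_spanning_tree(edges, s):
    """True if the S - 1 edges connect all S species without forming a cycle."""
    parent = list(range(s))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False          # this edge would close a cycle
        parent[ri] = rj
    return True


def all_msts(d):
    """Brute-force list of every minimum spanning tree (feasible only for small S)."""
    s = len(d)
    pairs = list(combinations(range(s), 2))
    best, trees = None, []
    for edges in combinations(pairs, s - 1):
        if not is_spanning_tree(edges, s):
            continue
        length = sum(d[i][j] for i, j in edges)
        if best is None or length < best:
            best, trees = length, [list(edges)]
        elif length == best:
            trees.append(list(edges))
    return trees


def feve(edges, d, w):
    """FEve (Villéger et al., 2008) for one MST, in exact rational arithmetic."""
    s = len(w)
    ew = [Fraction(d[i][j]) / (w[i] + w[j]) for i, j in edges]
    total = sum(ew)
    even = Fraction(1, s - 1)
    return (sum(min(x / total, even) for x in ew) - even) / (1 - even)


# Hypothetical ultrametric tree ((A,B),C): sisters A and B are 2 units apart,
# and both are 4 units from C. Hypothetical abundances: A = 1, B = 2, C = 3.
d = [[0, 2, 4],
     [2, 0, 4],
     [4, 4, 0]]
w = [1, 2, 3]
trees = all_msts(d)
print(trees)                                 # [[(0, 1), (0, 2)], [(0, 1), (1, 2)]]
print([str(feve(t, d, w)) for t in trees])   # ['4/5', '10/11']
```

The two sister species are tied in their distance to C, so two MSTs exist and PEve takes two values (0.8 and about 0.909) for the same community.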
As with functional traits, there are alternatives for phylogenetic evenness that can measure the same properties while avoiding the nonuniqueness problem (Scheiner, 2019; Tucker et al., 2017).

| NEXT STEPS
We have shown that FEve has critical conceptual and practical drawbacks; therefore, we recommend not using this index in studies of functional variability. However, it is still possible to measure even-

An alternative approach for combining trait distance and abundance information is the use of the abundance-weighted distance of species $i$ from all of the other $S - 1$ species:

where $N = \sum_{j=1}^{S} n_j$ is the total number of individuals in the assemblage.

FIGURE 4 Variability of FEve estimates for the actual abundances of eleven virulence phenotypes of Puccinia graminis, and for the Y-modification and Z-modification of abundances (see Table 3).
Then, functional diversity can be estimated in terms of Hill numbers as follows:

which is the effective number of distinct species that contribute equally to functional interaction and variability within a community, based on the abundances and weighted distances of every species from all other species (that is, species for which $n_i d_i = n_j d_j$ for all $i \neq j$). Dividing this effective number by the actual number of species, $S$, gives an evenness measure. This metric is a Hill-based generalization of the metrics of Guiasu and Guiasu (2012) and Ricotta et al. (2014). This measure of evenness would be appropriate if a given species interacts with all of the other species in a community in a way that "averages" over all of those interactions (e.g., in a system with diffuse competition).
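The evenness measure based on mean distances can be sketched in code. Because the equations for the abundance-weighted distance and the Hill number are not reproduced here, the sketch assumes $d_i = \sum_j (n_j/N)\, d_{ij}$ and Hill numbers of order $q$ computed on the shares $n_i d_i / \sum_k n_k d_k$. Both forms are assumptions, chosen so that the effective number equals $S$ exactly when $n_i d_i = n_j d_j$ for all $i \neq j$; the function name `hill_functional_evenness` is hypothetical.

```python
import math


def hill_functional_evenness(n, d, q):
    """Effective number of equally contributing species, and that number divided by S.

    ASSUMPTIONS (not taken from the text): d_i = sum_j (n_j / N) * d_ij, and each
    species' contribution is its share n_i * d_i / sum_k n_k * d_k.
    """
    s = len(n)
    big_n = sum(n)
    # abundance-weighted distance of species i from the others (d_ii = 0)
    d_i = [sum(n[j] / big_n * d[i][j] for j in range(s)) for i in range(s)]
    contrib = [n[i] * d_i[i] for i in range(s)]
    total = sum(contrib)
    shares = [c / total for c in contrib if c > 0]
    if math.isclose(q, 1.0):
        # q = 1 is the limit of the Hill number: exp of the Shannon entropy of the shares
        eff = math.exp(-sum(p * math.log(p) for p in shares))
    else:
        eff = sum(p ** q for p in shares) ** (1.0 / (1.0 - q))
    return eff, eff / s


d = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # three equally distant species
for q in (0, 1, 2):
    print(q, hill_functional_evenness([5, 5, 5], d, q), hill_functional_evenness([1, 2, 3], d, q))
```

With equal abundances and equal distances, the function returns an effective number of 3 and an evenness of 1 for every $q$; with abundances 1, 2, and 3, the evenness falls below 1 for $q > 0$ (at $q = 0$ it only counts species with nonzero contributions).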
The evenness metrics given in Equations 5

A23 in Scheiner et al., 2017), so that the number of equally interacting species is determined as follows (Equations 4 and A4, Scheiner et al., 2017):

$${}^{q}D_{ATP} = \frac{1 + \sqrt{1 + 4\,{}^{q}H_{ATP}}}{2}, \qquad (11)$$

The corresponding metric of functional evenness is then as follows:

$${}^{q}E_{ATP} = {}^{q}D_{ATP} / S.$$

This measure of evenness would be appropriate if pairwise interactions are important and those interactions occur with all of the other species in the community (e.g., scramble competition for a spectrum of resources). The metrics presented here (Equations 4-11), as well as FEve itself, assume that all individuals within a species are identical; somewhat different forms are necessary to capture within-species variation.
More general concepts (Gregorius & Kosman, 2017) and a large variety of metrics (Scheiner, 2019) exist for measuring functional variation and can be used as alternatives for FEve. We caution, though, that many of them have not yet been critically evaluated.
The metrics suggested here (Equations 5, 8, and 11) are all based on a concept of diversity of dispersion measured by an effective number of types. Division of this effective number by the actual number of types turns these into metrics of functional evenness. While there is no single best way to measure functional trait evenness or its combination with abundance, there are metrics, such as FEve, that should be avoided.

ACKNOWLEDGMENTS
We thank Carlo Ricotta and an anonymous reviewer for their helpful comments on an earlier version of this manuscript. This manuscript is based on work done by S.M.S. while serving at the U.S. National Science Foundation. The views expressed in this paper do not necessarily reflect those of the National Science Foundation or the United States Government.

CONFLICT OF INTEREST
None declared.

DATA AVAILABILITY STATEMENT
All of the data used in this paper were previously published.