### Introduction

The independent evolution of similar phenotypic traits in multiple organisms, or convergence, has been recognized as a key evolutionary process since Darwin (1859). Convergent evolution is often a consequence of adaptation to a similar niche (although not always, see Stayton 2008) and has therefore been recognized and studied in cases of replicated adaptive radiations such as *Anolis* ecomorphs (Losos 1992, 2009; Beuttell & Losos 1999) and African cichlids (Kocher *et al*. 1993; Muschick, Indermaur & Salzburger 2012). In addition, convergence may be seen when organs have similar uses and converge on a similar form, as in the camera eye which has evolved in both vertebrates and invertebrates. Convergence between organisms for a particular niche can promote speciation by causing divergent selection within a lineage inhabiting two niches (Rosenblum 2006), limit the suite of phenotypic traits that will evolve as adaptations (Martin & Wainwright 2013) and drive distantly related organisms towards the same phenotypic adaptive optima (Mahler *et al*. 2013). Notably, Conway Morris (2003) has argued that convergence of traits towards a limited number of ‘engineering optima’ is a central guiding force in phenotypic evolution. For example, there are only a small number of ways to construct an effective, functioning eye; hence, engineering constraints cause convergence and limit biological diversity in this trait. If correct, Conway Morris's view is profoundly important for our understanding of biological variation. Therefore, an understanding of convergent evolution is important to understanding the generation of biodiversity, constraints on adaptation, and how natural selection optimizes an organism for a particular niche. For the purposes of this study, we use ‘niche’ to refer to an aspect (or aspects) of the biotic and/or abiotic environment of an organism that is of interest for a hypothesis under study.

There have been several approaches and methods developed to identify instances of convergent evolution, and these have enabled a large number of cases to be described and recognized as such. At its simplest, convergence may be identified by carefully cataloguing traits across many species. McGhee's recent text (2011) is an excellent example of this.

More formally, perhaps the most commonly used and simplest method for identifying convergence is ancestral state reconstruction of the (purportedly) convergent trait. For example, this method has provided support for convergent evolution of plumage coloration in *Icterus* orioles (Omland & Lanyon 2000) and in the chemically defended *Pitohui* birds (Dumbacher & Fleischer 2001). In such an analysis, the phenotype is reconstructed in some way over the phylogeny, and independent origins (multiple shifts to the same state) are taken as evidence of convergence.

Muschick, Indermaur and Salzburger (2012) used an alternative approach to test for convergence in cichlid fishes by considering that convergence should result in a pattern of reduced phenotypic differentiation when compared with phylogenetic distance. These authors thus calculated Euclidean distances between species in the morphological traits of interest and plotted them against the phylogenetic distances. They then used simulations to identify instances where phenotypic divergence was significantly lower than expected based on phylogenetic distance. As this method involves a straightforward comparison of phylogenetic and phenotypic distances, Muschick, Indermaur and Salzburger (2012) included both convergence and slower-than-expected divergence within their measure, as the two would produce the same signature.

A third approach, described by Ingram and Mahler (2013), explicitly models trait evolution on a phylogeny to identify convergent evolution. Their ‘SURFACE’ method takes a continuous trait and fits Ornstein–Uhlenbeck models with varying numbers of selective regimes and with shifts at varying points on the tree. Akaike's information criterion is then used to select the best-fitting model. Convergence is identified by the independent adoption of the same selective regime at multiple points on the phylogeny.
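For reference, Akaike's information criterion takes the standard form below. The symbols are notation introduced here rather than taken from the SURFACE description: *k* is the number of estimated parameters and the hatted *L* is the maximized likelihood of the model. The candidate model with the lowest value is preferred.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```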

Each of these methods represents a technique to identify when convergence has occurred. Statistical recognition of convergence is, of course, fundamental. However, once convergence is established, a number of important questions can be explored. For example, we may be interested in whether there are general rules in the way convergence operates. Do some traits show stronger convergence than others? Do different types of traits converge more easily than others (e.g. morphological vs. biochemical traits), and if so, is evolution more predictable for some kinds of traits than for others? Do particular ‘levels’ of convergence (e.g. functional, structural, developmental, genetic) vary in their contribution to adaptive evolution? Why might such differences exist (e.g. what might drive stronger convergence in protein sequences than limb anatomy)? It is perhaps notable that most analyses of convergent evolution have focussed on morphological traits, which limits our knowledge base on how different types of traits may differ in aspects of convergence; however, some exceptions do exist (e.g. Mirceta *et al*. 2013).

To answer such questions, we need a way of quantifying the strength of convergence. When we have a suitable measure of convergent evolution, we can start to test hypotheses about the nature of convergence, rather than simply recognizing it. Specifically, we require a metric that is comparable across many types of traits, incorporates both phylogenetic relatedness and the extent of phenotypic similarity, and is quantitative.

In this study, we describe a simple measure of the strength of convergent evolution, which we call the ‘Wheatsheaf index’. For the purposes of our method and this study, we consider convergence to be the pattern that results from the process of convergent evolution, rather than the process itself. Furthermore, because we use a pattern-based description of convergence, parallelism is indistinguishable from ‘true’ convergence using our method and so comes under the concept of convergence for the purposes of this study. The index was designed to meet the requirements outlined above and with the underlying assumption that we can define a set of species as convergent or have a working hypothesis as to the niche to which the organisms are adapted (or towards which they are adapting).

### The Wheatsheaf index

To calculate the Wheatsheaf index, we take a set of organisms and within it identify a subset that we treat as convergent (the subset of ‘focal’ taxa); the remaining species form the ‘non-focal’ subset. The index measures the similarity of focal species to each other and the isolation in phenotypic space of the focal group from non-focal species, all penalized for phylogenetic relatedness. To understand this in a conceptual way, we can consider convergence to be movement in phenotypic space over a fitness landscape towards an elevated position (such as an adaptive peak) which characterizes a particular environment or niche. The distance between non-focal and focal species represents the distance across such a landscape that focals have had to move to reach the peak, with movement over larger distances representing more evolution and therefore a stronger signature of convergence. In addition, the more tightly clustered the focal species are in this phenotypic space (the more similar they are to each other), the stronger are the selective forces pulling converging species towards the peak, or the narrower the peak itself, which in either case would indicate a more intense pull towards a particular point in phenotypic space.

Both of these aspects seem to be good foundations for a conceptual view of the strength of convergence, providing phylogenetic relatedness is accounted for, as is the case with the Wheatsheaf index. Thus, we consider convergence to be stronger when focal species are more phenotypically similar to each other, and when the focal species are more dissimilar to the non-focal species – in other words, when they have had to evolve further from the baseline of non-focal species to reach the convergent state. We note that some patterns of convergence may leave convergent species still more similar to their close relatives than each other in many phenotypic attributes (Stayton 2006), but we view this as a manifestation of differing strengths of convergence rather than a challenge to our definition. This phenotypic aspect of the index is penalized for close phylogenetic affinities and generates a quantitative measure which can subsequently be used to test hypotheses about the strength of convergence across traits.

Before we can apply the Wheatsheaf index, we require a clade to work with in which some members have been demonstrated to exhibit convergent evolution. In other words, we would use other methods (e.g. ancestral state reconstruction or SURFACE) which identify convergence so that we can start with a supported assumption that there is convergence in our group of interest. We then need to assign (*a priori*) species within that group as either ‘focal’ or ‘non-focal’ species. This is often related to a working hypothesis on the niche the organisms are expected to be converging on such that focals are those species occupying that niche (expected to show convergent adaptations) and non-focals are those species not occupying that niche. To give two examples, we might be interested in measuring convergence in body form for burrowing in lizards; in this case, burrowing species would be assigned to the focal group. Or we might look at convergence in salinity tolerance for brackish habitats, in which case species inhabiting estuaries and other such environments would form the focal group. Alternatively, we could consider the species already identified as convergent as the focal group, which would allow us to measure how strong the convergence is in selected phenotypes of these taxa, regardless of any adaptive reason for it.

Other information required for the Wheatsheaf index is a phylogeny for the clade of interest and trait information. How we choose traits will depend on the purposes of the study. If we are interested in whether a particular set of traits are important for a given niche, then the selection of traits should be hypothesis-driven such that traits are chosen so that they may be convergent for that niche. This approach has the benefit that specific adaptive hypotheses of convergence for a given niche are examined. If, on the other hand, we are interested in an unguided investigation of which traits might be convergent for a given niche (if we have no working hypothesis with which to make *a priori* predictions), then we could use a large number of traits spanning the range of those we can measure, run the index on all of them and therefore obtain estimates of which ones are most convergent. However, an important stipulation is that the traits must be (semi)quantitative (e.g. continuous, count or ordinal data; see 'Discussion' for further details).

Calculation of the Wheatsheaf index requires the data (both phylogenetic and phenotypic) to be represented in pairwise distance matrices. For the phylogeny, a matrix of proportion shared distances between species is used, such that the total tree height is scaled to one and distances are given as the proportion of the tree shared between two species. In other words, larger values represent more closely related species. For phenotypic traits (which are first standardized for variance by dividing by the standard error of the trait across species), a matrix of Euclidean distances between species is used, which enables any number of traits to be incorporated; here, larger distances represent species that are more dissimilar in the included traits. This allows us to look at single traits individually or at grouped traits, as appropriate for the hypothesis being tested. For example, we could obtain one distance matrix for a set of morphological traits and a second for a set of physiological traits. Again, the selection of traits to include in the study as a whole and in a given distance matrix will be driven by the hypothesis in question.
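The two input matrices described above can be sketched in plain Python. This is an illustration only, not the authors' implementation: the nested-tuple tree format, the function names `shared_proportions` and `phenotypic_distances`, and the toy data are all assumptions made for the example.

```python
import math
from itertools import combinations
from statistics import stdev


def shared_proportions(tree):
    """Proportion of total tree height shared by each pair of tips.

    Hypothetical tree format: a tip is (branch_length, "name") and an
    internal node is (branch_length, [child_nodes]).
    """
    shared, tip_depths = {}, {}

    def walk(node, depth_above):
        length, content = node
        depth = depth_above + length  # distance from the root to this node
        if isinstance(content, str):
            tip_depths[content] = depth
            return [content]
        groups = [walk(child, depth) for child in content]
        # Tips in different child subtrees share the path from the root to here.
        for left, right in combinations(groups, 2):
            for x in left:
                for y in right:
                    shared[frozenset((x, y))] = depth
        return [tip for group in groups for tip in group]

    walk(tree, 0.0)
    height = max(tip_depths.values())  # scale total tree height to one
    return {pair: value / height for pair, value in shared.items()}


def phenotypic_distances(traits):
    """Euclidean distances after dividing each trait by its standard error."""
    species = list(traits)
    n_traits = len(traits[species[0]])
    scaled = {sp: [] for sp in species}
    for k in range(n_traits):
        column = [traits[sp][k] for sp in species]
        std_error = stdev(column) / math.sqrt(len(column))
        for sp in species:
            scaled[sp].append(traits[sp][k] / std_error)
    return {frozenset((a, b)): math.dist(scaled[a], scaled[b])
            for a, b in combinations(species, 2)}


# Toy ultrametric tree: A and B split halfway up; C splits at the root.
toy_tree = (0.0, [(1.0, [(1.0, "A"), (1.0, "B")]), (2.0, "C")])
toy_traits = {"A": [1.0, 10.0], "B": [2.0, 10.5], "C": [3.0, 12.0]}

p = shared_proportions(toy_tree)
d = phenotypic_distances(toy_traits)
```

In this toy case the sister pair A and B shares half of the tree, whereas either of them shares nothing with C. Passing a different subset of trait columns to `phenotypic_distances` gives a separate matrix for each group of traits.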

To calculate the Wheatsheaf index, we first obtain a corrected (for phylogenetic relatedness) phenotypic distance matrix as follows:

- (eqn 1)

where *d*_{ij} is the phenotypic (Euclidean) distance between species *i* and *j*, *p*_{ij} is the shared proportional distance between species *i* and *j* obtained from the phylogeny, and the quantity defined by eqn 1 is therefore the phenotypic distance between species *i* and *j* corrected for phylogeny (hereafter the corrected distance). Note that *p*_{ij} is transformed by adding a small (and arbitrary) value and logging; this is so that *p*_{ij} scales approximately linearly with the corrected distance. If a pair of species are closely related, and therefore *p*_{ij} is close to 1, then the corrected distance will be much larger than *d*_{ij}. As species become more distantly related, *p*_{ij} will decrease and the corrected distance will become progressively smaller and approach *d*_{ij}. This is an intuitive way of correcting for phylogeny as more weight (i.e. a smaller distance) is assigned to more distantly related taxa being similar, therefore penalizing the phenotypic similarity of closely related species. As *p*_{ij} and the corrected distance are approximately linearly related in the equation, this is in effect assuming that the phenotype diverges in proportion to time (phylogenetic history). Note that as we consider convergence to be a pattern in this paper, no model is fitted and so no parameterization is conducted, and thus, eqn 1 should be robust to the particular evolutionary model that best fits the trait data, providing that we can expect more phenotypic divergence when species pairs are more distantly related. Nevertheless, it might be possible to extend this method in the future to incorporate specific evolutionary models in the penalizing term, should this become necessary.

Using the corrected phenotypic distances (the pairwise matrix of corrected distances between each pair of species), we can now calculate the Wheatsheaf index (*w*) as follows:

- (eqn 2)

As the calculation of *w* is not amenable to multiple, independent sampling (it uses information from the entire sample – all species in the clade), 95% confidence intervals are generated by jackknifing the data set and using the resulting distribution of values to calculate the intervals.
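A leave-one-out jackknife of this kind can be sketched as follows. The text does not say how the intervals are derived from the jackknife distribution, so the percentile rule used here is an assumption. The `statistic` argument is a hypothetical stand-in for a function that recomputes *w* on the reduced data set; the example uses a plain mean only to keep it self-contained.

```python
import math
from statistics import mean


def jackknife_values(items, statistic):
    """Recompute `statistic` with each item (e.g. species) left out in turn."""
    return [statistic(items[:i] + items[i + 1:]) for i in range(len(items))]


def percentile_interval(values, level=0.95):
    """Assumed rule: read the interval off the sorted jackknife values."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    low = math.floor(tail * (len(ordered) - 1))
    high = math.ceil((1.0 - tail) * (len(ordered) - 1))
    return ordered[low], ordered[high]


# Stand-in data and statistic; a real analysis would pass the species
# records and a function returning w for the reduced set.
replicates = jackknife_values([1.0, 2.0, 3.0, 4.0], mean)
interval = percentile_interval(replicates)
```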

Because the topology of the tree may constrain the possible values of *w*, we used a bootstrapping approach to resample the tips of the tree along with their trait values and thus obtain a distribution of possible *w* indices given the phylogeny and the trait values for each species. Using this distribution and the calculated value of *w*, we can generate a ‘*P*-value’ by taking the proportion of bootstrap samples that are greater than or equal to the value of *w* calculated from the original data set (see Fig. 2). We stress that this *P*-value is *not* a test for the presence of convergent evolution; as described earlier, we begin an analysis with the Wheatsheaf index with the knowledge that convergence has occurred in our clade of interest. Rather, it represents a test of whether convergence is significantly stronger than we would expect compared to a random distribution of trait values across the specified tree. A further advantage of this is that comparisons of the *P*-values provide a measure of convergence that accounts for the given tree structure and so, in effect, standardizes for this. In other words, we can potentially use the *P*-value to compare the strength of convergence across trees, which is not possible using our value of *w* alone. However, we would add that as *P*-values are bounded between zero and one, comparisons using this part of the method may be limited in extreme cases by floor and ceiling effects.
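The final step, turning a bootstrap distribution into a *P*-value, is simple enough to show directly. The sketch below assumes the bootstrap values of *w* have already been computed elsewhere; the function name and the numbers are illustrative only.

```python
def bootstrap_p_value(observed_w, bootstrap_ws):
    """Proportion of bootstrap w values greater than or equal to the observed w."""
    at_least = sum(1 for value in bootstrap_ws if value >= observed_w)
    return at_least / len(bootstrap_ws)


# Illustrative numbers only: two of the four resampled values reach the observed w.
p_value = bootstrap_p_value(2.0, [1.0, 2.0, 3.0, 0.5])
```

Because the result is a proportion of a finite number of replicates, its resolution is limited by the number of bootstrap samples drawn.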

### Discussion

An important question in evolutionary biology is whether convergence can be quantified. To begin to examine this question, we have described a new method (the Wheatsheaf index) for measuring the strength of convergent evolution. The index provides a simple quantification of convergence and has a number of desirable qualities: it is comparable across traits, intuitive to interpret and phylogenetically informed.

Because the index is based on relative rather than absolute phenotypic distances (particularly as the traits are standardized to account for the degree of variation), it is comparable across a wide variety of traits. It therefore provides a useful measure which can be compared directly between, for example, behavioural, morphological and molecular traits, or between functional and developmental traits, for species within the same overall set. This provides a high level of flexibility in how the method can be used and opens up a range of questions which can now be explicitly tested. Because *w* increases as convergence becomes stronger, it has an intuitive interpretation.

Although the interpretation of a particular value is made more difficult by the possible influence of topological constraints, the *P*-value incorporates this aspect and can also be used to compare across trees – further assisting with interpretation. The index provides a measure that incorporates both the similarity of focal species to each other and the differentiation from non-focal species, which we regard as two key aspects of convergence. However, we must note that a high (or low) Wheatsheaf index can result from either of these aspects, for example, from close similarity in phenotypic values or from less phenotypically similar species that are more phylogenetically distant. Therefore, if we are interested in how a given value arose, we must look back at the tree to further inform our interpretations of the underlying patterns. In most or all cases, it is probable that both of these elements will be responsible in part.

#### Limitations to the application of the index

As mentioned earlier, the Wheatsheaf index requires (semi)continuous rather than discrete traits, unless there are multiple discrete traits to be included in the same analysis. This restriction is imposed on logical grounds. If a trait is either present or absent, then organisms cannot be more or less convergent for that trait: they either are convergent (share the trait) or not. Therefore, in the case of single discrete traits, it is meaningless to give a measure of the strength of convergence and the best we can do is to identify whether or not convergence has occurred and look for correlates with any hypothesized focal niche. If, however, there are multiple discrete traits, then we may sensibly ask questions about the strength of convergence providing we are concerned with a set of such traits rather than each one individually. In this case, we can measure the strength of convergence in a phenotypic space defined by a set of binary traits, as this essentially creates a quantitative scale of similarity across traits (i.e. species can be more similar by sharing a larger number of discrete traits).
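To illustrate how a set of binary traits yields a graded scale of similarity, the sketch below computes Euclidean distances between presence/absence vectors. The species and trait codings are invented for the example. For 0/1 data, the distance is the square root of the number of traits on which two species differ.

```python
import math


def binary_distance(a, b):
    """Euclidean distance between two equal-length 0/1 trait vectors."""
    return math.dist(a, b)


# Hypothetical presence/absence codings for four discrete traits.
species_x = (1, 0, 1, 1)
species_y = (1, 0, 0, 1)  # differs from x in one trait
species_z = (0, 1, 0, 0)  # differs from x in all four traits

near = binary_distance(species_x, species_y)
far = binary_distance(species_x, species_z)
```

Species sharing more discrete traits end up closer together, which is what makes a strength of convergence meaningful for the set even though it is not for any single binary trait.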

We have not examined the impact of taxon sampling within a clade, but given that all distances are pairwise distances, we do not expect incomplete sampling to be a problem, at least for analyses on the same tree. If incomplete sampling does not pose a problem, we could potentially take a large taxonomic group (e.g. birds, insects, animals) and sample a number of species from this group, encompassing both focal and non-focal taxa, with which we can calculate the Wheatsheaf index. However, we recommend where possible using reasonably well-sampled clades for analysis as this will reduce any concerns over selection of species for inclusion and so avoid potential confirmation bias arising from non-random choice of species to include. In particular, and given that the index works well on small trees, we would recommend that such questions are addressed by taking a number of smaller trees and comparing results across them, rather than using a very large but very poorly sampled clade.

It is important to choose the focal group based on clear, objective criteria based on an *a priori* hypothesis for two reasons. First, if we assume that convergence is due to adaptation for a particular niche, then it must be considered in relation to that niche. In essence, this instils a biological context to studies of convergence and encourages hypothesis-driven research. Even if we do not assume that the observed convergence is adaptive, the analysis should still be hypothesis-driven in that focals may be defined based on *a priori* identification of convergent species using other methods (e.g. SURFACE). Secondly, where we consider convergence to be adaptive, it allows us to consider whether convergence has been driven by adaptation to the hypothesized niche. In the case of body shape in burrowing lizards, we might have three data sets with different classifications for the focal group: burrowing, sandy soils and dense ground vegetation. We could then compare the strength of convergence for each of these and examine whether one shows a stronger signal than the others.

A final limitation of our method is that, in the current implementation, it is problematic to include fossil taxa. Because phylogenetic relatedness is penalized based on the distance from the root of the tree until the point when the pair of species diverged, the method assumes that the species continued along independent lineages until the present day. As a pair of extinct taxa may have been closely related at the time of their extinction but would be penalized based only on the time of their divergence, they would be considered by our method to be more distantly related than they actually are. Therefore, the Wheatsheaf index can currently only be applied to trees of extant species, although this could potentially be addressed in a future development by using a cophenetic phylogenetic distance to penalize phenotypic similarity when extinct species are included in the study.
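A cophenetic distance is the summed branch length along the path joining two tips, so it remains small for two lineages that went extinct shortly after diverging. The sketch below shows that calculation only; it is not an extension of the index itself. The nested-tuple tree format, the function name and the toy tree are all assumptions made for the example.

```python
from itertools import combinations


def cophenetic_distances(tree):
    """Path length between every pair of tips.

    Hypothetical tree format: a tip is (branch_length, "name") and an
    internal node is (branch_length, [child_nodes]).
    """
    distances = {}

    def walk(node):
        # Returns {tip: distance from the top of this node's branch to the tip}.
        length, content = node
        if isinstance(content, str):
            return {content: length}
        below = [walk(child) for child in content]
        # Tips in different child subtrees are joined through this node.
        for left, right in combinations(below, 2):
            for x, dx in left.items():
                for y, dy in right.items():
                    distances[frozenset((x, y))] = dx + dy
        return {tip: d + length for group in below for tip, d in group.items()}

    walk(tree)
    return distances


# Toy non-ultrametric tree: two short-lived fossil tips and one extant tip.
fossil_tree = (0.0, [(1.0, [(0.5, "fossil_A"), (0.2, "fossil_B")]),
                     (3.0, "extant_C")])
coph = cophenetic_distances(fossil_tree)
```

Here the two fossil tips are separated by a short path even though their split lies well below the present day, which is the property the proposed extension would rely on.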

#### Concordance of empirical results with previous literature

In our *Anolis* lizard data set, perhaps the most notable finding is that ecomorphs differ in the strength of their convergence – grass-bush and trunk-ground anoles stand out as having particularly strong convergence compared to others. Furthermore, some traits are more strongly convergent within some ecomorphs but not others. Therefore, patterns of convergence in particular traits are ecomorph specific. Given the different niches inhabited by each ecomorph, this is perhaps not surprising as different traits may be more or less needed for a given situation and so the divergence between ecomorphs drives the evolution of different combinations of traits. We will now discuss and highlight that many of our results are consistent with previous literature, which again indicates that the Wheatsheaf index is a useful and meaningful measure of convergent evolution.

Our analyses found the strongest convergence in limb length occurred in grass-bush anoles compared to the other ecomorphs, consistent with Losos’ (1990b, 2009) finding of relationships between limb length and jumping and sprinting (perhaps particularly important for grass-bush anoles). The strong convergence of lamellae number detected in trunk-ground anoles suggests that there is a notable degree of adaptation in this trait. This could be a consequence of opposing selection pressures favouring fewer lamellae than highly arboreal ecomorphs but still enough to permit adequate climbing ability, for example, for making quick dashes down tree trunks to capture prey (Losos 2009). Grass-bush anoles have a small body size to facilitate movement through their structurally complex microhabitat and have long hindlimbs, short forelimbs and an exceptionally elongated tail (Losos 2009). Consistent with this, we found that the Wheatsheaf index was very high for body size, limb length and tail length in grass-bush anoles.

#### Extendibility and final comments

It should be noted that, in the current version of the index, the term used to penalize phenotypic similarity for phylogenetic relatedness includes a matrix of shared proportional distances. Consequently, penalized phenotypic distances increase with time since divergence of a given species pair. This implicitly assumes an evolutionary model similar to Brownian motion, wherein we expect greater phenotypic disparity with greater time since divergence. However, the method can be readily extended to explicitly incorporate other evolutionary models by generating the matrix of phylogenetic distances under these models, such as the various variance–covariance structures available in the R package ape (Paradis, Claude & Strimmer 2004). This is a simple extension that relates to the creation of the input files before the calculations of *w* are conducted, but may serve to increase the flexibility of the index further.
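The link between Brownian motion and time since divergence can be made explicit with the standard results below. The notation is introduced here for illustration and does not come from the method itself: *X*_{i} is the trait value of species *i* relative to the root, *t*_{i} is its root-to-tip path length, *t*_{ij} is the path length shared by species *i* and *j*, and σ² is the rate parameter.

```latex
\operatorname{Var}[X_i] = \sigma^2 t_i, \qquad
\operatorname{Cov}[X_i, X_j] = \sigma^2 t_{ij}, \qquad
\mathbb{E}\!\left[(X_i - X_j)^2\right] = \sigma^2 \,(t_i + t_j - 2\,t_{ij})
```

For an ultrametric tree scaled to unit height, *t*_{i} = *t*_{j} = 1 and *t*_{ij} = *p*_{ij}, so the expected squared difference reduces to 2σ²(1 − *p*_{ij}): it grows as the shared proportion falls.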

Another useful extension would be a ‘multi-focal-group’ implementation of the Wheatsheaf index, by which we mean the ability to investigate many focal groups in the same analysis. For instance, several focal groups (e.g. ecomorphs) could be included in the same index value to assess the extent of convergence in the clade as a whole. However, care would need to be taken to ensure that differences between focal groups do not mask convergence within each focal group.

Finally, we would like to highlight once more that the Wheatsheaf index is not designed to test for the presence of convergence. There are many good methods available for this (see 'Introduction'), and we assume that the selection of a group to use our index on is based on the presence of convergent evolution in the clade and that species contained within it have desirable characteristics for the question being asked in a given study. When convergence has been demonstrated, our method then allows the strength of this convergence to be quantified. In particular, if the specific value of *w* is to be interpreted, the *P*-values must be discussed alongside any inference in order to account for topological constraints on *w*.

We have developed and herein presented a novel method for the quantification of convergent evolution. The Wheatsheaf index is intended as an addition to the methodological toolkit for the analysis of convergence (used along with other methods, e.g. those for identification of convergence), and it is hoped that it will prove useful in elucidating details of this important and widespread evolutionary process.