Testing gradual and speciational models of evolution in extant taxa: the example of ratites


  • M. LAURIN,

    1. UMR 7207, CNRS/MNHN/UPMC ‘Centre de Recherches sur la Paléobiodiversité et les Paléoenvironnements’, Muséum National d’Histoire Naturelle, Département Histoire de la Terre, Bâtiment de Géologie, Paris Cedex 05, France

    1. Experimental Zoology Group, Wageningen University and Research Centre, Wageningen, The Netherlands

    1. UMR 7207, CNRS/MNHN/UPMC ‘Centre de Recherches sur la Paléobiodiversité et les Paléoenvironnements’, Muséum National d’Histoire Naturelle, Département Histoire de la Terre, Bâtiment de Géologie, Paris Cedex 05, France

    1. UPMC, Univ. Paris 06, UMR 7193, ISTeP, Paris, France
    2. CNRS, UMR 7193, ISTeP, Paris, France
  • J. CUBO

    1. UPMC, Univ. Paris 06, UMR 7193, ISTeP, Paris, France
    2. CNRS, UMR 7193, ISTeP, Paris, France

Michel Laurin, UMR CNRS 7207 ‘Centre de Recherches sur la Paléobiodiversité et les Paléoenvironnements’, Muséum National d’Histoire Naturelle, Département Histoire de la Terre, Bâtiment de Géologie, Case Postale 48, 43 rue Buffon, F-75231 Paris Cedex 05, France.
Tel.: +331 40 79 36 92; fax: +331 40 79 37 39; e-mail: michel.laurin@upmc.fr


Abstract

Ever since Eldredge and Gould proposed their model of punctuated equilibria, evolutionary biologists have debated how often this model is the best description of nature and how important it is compared to the more gradual models of evolution expected from natural selection and the neo-Darwinian paradigm. Recently, Cubo proposed a method to test whether morphological data in extant ratites are more compatible with a gradual or with a speciational model (close to the punctuated equilibrium model). As shown by our simulations, a new method to test the mode of evolution of characters (involving regression of standardized contrasts on their expected standard deviation) is easier to implement and more powerful than the previously proposed method, but the Mesquite module comet (aimed at investigating evolutionary models using comparative data) performs better still. Uncertainties in branch length estimates are probably the largest source of potential error. Cubo hypothesized that heterochronic mechanisms may underlie morphological changes in bone shape during the evolution of ratites. He predicted that the outcome of these changes may be consistent with a speciational model of character evolution because heterochronic changes can be instantaneous in terms of geological time. Analysis of a more extensive data set confirms his prediction despite branch length uncertainties: evolution in ratites has been mostly speciational for shape-related characters. However, it has been mostly gradual for size-related ones.


The proposal of the model of punctuated equilibria by Gould & Eldredge (1977) triggered a debate about the tempo and mode of evolution. Several palaeontological studies attempted to determine which of the two main models (natural selection with variable rates of evolution, or punctuated equilibria characterized by long periods of stasis interrupted by brief periods of change that coincide with cladogenetic events) was more compatible with data from several successive populations representing a few evolving lineages sampled over relatively long periods of time (several tens of thousands of years to a few million years). Based on a review of the literature (including previous reviews), Benton & Pearson (2001) argued that both patterns occur in the fossil record of eukaryotes, but that gradual evolution prevails in unicellular eukaryotes of the marine plankton, whereas a punctuated equilibrium pattern may be more common in metazoans. However, this latter conclusion remains tentative because the fossil record of metazoans is much less complete than that of many unicellular organisms with mineralized skeletons.

Evolutionary models of characters are interesting because they can provide evidence for selection and trends, limits on character values, and patterns of change, and thus contribute to refining evolutionary theory. Determining the correct evolutionary model is also important because modern comparative methods used to test character correlation and infer ancestral values assume a Brownian motion model, and strong departures from this model can lead to inaccurate results (Díaz-Uriarte & Garland, 1996; Martins et al., 2002). Punctuated equilibrium is such a departure. Comparative data that result from such an evolutionary model can be analysed by most comparative tests, but instead of branch lengths proportional to evolutionary time or to the variance observed in other characters (which may have evolved according to other models), branches of equal length should be used, on a tree that includes a representative sample of all known (extant and extinct) lineages of the clade. Using appropriate branch lengths should lead to more accurate estimates of correlation and ancestral values.

More recently, comparative data from extant taxa have been used to investigate the preponderance of these evolutionary models. Ratites are good candidates to test models of character evolution because this clade contains a small number of species (thus making exhaustive sampling possible). Note that the model of punctuated equilibria has two variants: (i) the punctuational model, in which changes occur at the time of speciation (cladogenesis) only in a single daughter species (clade), and (ii) the speciational model, according to which changes occur at the time of speciation in both daughter species (clades) (Rohlf et al., 1990). Cubo (2003) recently proposed a method to test whether a character evolved according to a gradualist or a speciational model. His test is based on determining which phylogenetic distance matrix best explains the phenotypic distance matrix of the relevant character. One matrix is based on estimated times of divergence between terminal taxa and represents a gradual model of evolution, whereas the other has unitary branch lengths and represents a speciational model of evolution. Thus, Cubo (2003) regressed distance matrices of individual characters against the phylogenetic distance matrices of the taxa assuming gradual evolution (branch lengths reflecting evolutionary time) and a speciational model (branches of equal length). His data set included dimensionless shape variables: the ratio diaphyseal diameter/total length of stylopodial (humerus and femur) and zeugopodial bones (ulna, radius and tibiotarsus) and the ratio wing length/leg length. Cubo (2003) tested the significance of these regressions using permutations. This procedure suggested that some phylogenetic distance matrices explained the character data better than others, as shown by the probability that the regression coefficients reflected random fluctuations.
However, in some cases, neither phylogenetic distance matrix seemed to explain the character data, whereas in others, both the gradual and the speciational model of evolution seemed to be compatible with the data. Thus, the resolution of this method may not be optimal. We propose instead to regress standardized contrasts on their expected standard deviation, a test that is implemented in Mesquite, and we use simulations to assess the relative merits of both tests, as well as the performance of a third approach available in the Mesquite module comet (Lee et al., 2006).
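A minimal Python sketch can make the two phylogenetic distance matrices of Cubo's (2003) test concrete. It assumes a hypothetical encoding in which each node of a bifurcating tree stores the index of its parent (-1 for the root) and the length of the branch leading to it; the three-tip tree and its ages are invented. Summing time-calibrated branch lengths along the path between two tips gives the 'gradual' matrix, whereas giving every branch a length of 1 gives the 'speciational' matrix, in which the distance is simply the number of branches separating the tips.

```python
import numpy as np

def root_path(node, parent):
    """Nodes on the path from `node` up to the root, both included."""
    path = [node]
    while parent[node] != -1:
        node = parent[node]
        path.append(node)
    return path

def patristic_matrix(parent, length, tips):
    """Pairwise tip distances: sum of the branch lengths on the connecting path."""
    n = len(tips)
    d = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            pa, pb = root_path(tips[a], parent), root_path(tips[b], parent)
            shared = set(pa) & set(pb)  # common ancestor and everything above it
            # The branch leading to each unshared node lies on the path between the tips.
            d[a, b] = d[b, a] = sum(length[x] for x in pa + pb if x not in shared)
    return d

# Hypothetical toy tree (t2, (t3, t4)): node 0 is the root, node 1 the ancestor of t3 and t4.
parent = [-1, 0, 0, 1, 1]
time_lengths = [0.0, 2.0, 5.0, 3.0, 3.0]   # invented ages, in My
unit_lengths = [0.0, 1.0, 1.0, 1.0, 1.0]   # speciational model: every branch has length 1
tips = [2, 3, 4]

gradual = patristic_matrix(parent, time_lengths, tips)
speciational = patristic_matrix(parent, unit_lengths, tips)
```

Under the unit-length tree, t3 and t4 are two branches apart and each of them is three branches from t2, so phenotypic distances are expected to track the number of cladogeneses rather than elapsed time.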

One of the most sophisticated tools to investigate evolutionary models using comparative data is the Mesquite module comet, which uses maximum likelihood and the Akaike Information Criterion (AIC) to compare the fit of nine evolutionary models to the data (Lee et al., 2006). This module implements methods presented in more detail by Oakley et al. (2005). The nine models represent all possible combinations of two properties, each of which can follow one of three submodels (Lee et al., 2006: fig. 1). Thus, evolutionary change can be purely phylogenetic (each branch of the reference phylogeny is used), nonphylogenetic (only terminal branches are used; the internal ones are set to 0) or punctuated (only one of every pair of sister branches stemming from a node has a positive length; the other is set to 0). The branch lengths can follow the reference phylogeny (‘distance’ in the terminology of Lee et al., 2006), can all be equal or can be estimated from the data (‘free’ in the terminology of Lee et al., 2006). The evolutionary model that Cubo (2003) called gradual is the distance, purely phylogenetic model of Lee et al. (2006), whereas Cubo’s (2003) speciational model is Lee et al.’s (2006) equal, purely phylogenetic model. Note that in this last model, according to Mooers et al. (1999), phenotypic change between taxa is proportional to the number of speciation events (cladogeneses) that have occurred between them.
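The six comet models with fixed branch lengths can be viewed as transformations of the reference tree. The Python sketch below assumes a hypothetical encoding in which each node stores its parent index (-1 for the root) and the length of the branch leading to it. It does not reproduce comet's code, it omits the three 'free' models (whose branch lengths comet estimates from the data), and under the punctuated submodel it arbitrarily lets the first listed daughter branch keep its length, which is an assumption rather than comet's documented rule.

```python
def comet_lengths(parent, length, submodel, regime):
    """Branch lengths for one of six fixed-length models.

    submodel: 'phylogenetic', 'nonphylogenetic' or 'punctuated'
    regime:   'distance' (reference lengths) or 'equal' (all lengths 1)
    """
    n = len(parent)
    children = {i: [] for i in range(n)}
    for i, p in enumerate(parent):
        if p != -1:
            children[p].append(i)
    if regime == "distance":
        out = [float(x) for x in length]
    elif regime == "equal":
        out = [1.0] * n
    else:
        raise ValueError(f"unknown regime: {regime}")
    out[parent.index(-1)] = 0.0  # the root has no subtending branch
    if submodel == "nonphylogenetic":
        # Only terminal branches are kept; internal branches are set to 0.
        for i in range(n):
            if children[i]:
                out[i] = 0.0
    elif submodel == "punctuated":
        # At each node only one daughter branch keeps a positive length
        # (assumption: the first child listed; comet's own choice may differ).
        for kids in children.values():
            for k in kids[1:]:
                out[k] = 0.0
    elif submodel != "phylogenetic":
        raise ValueError(f"unknown submodel: {submodel}")
    return out

# Hypothetical toy tree (t2, (t3, t4)) with invented reference lengths.
parent = [-1, 0, 0, 1, 1]
length = [0.0, 2.0, 5.0, 3.0, 3.0]
gradual = comet_lengths(parent, length, "phylogenetic", "distance")      # Cubo's gradual model
speciational = comet_lengths(parent, length, "phylogenetic", "equal")    # Cubo's speciational model
```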

Below, we use comet to reanalyse the data on appendicular bone shape on which Cubo (2003) based his argument for a mostly speciational model of evolutionary change in osteological characters in ‘ratites’. We added 48 characters to this data set. These analyses were performed using a phylogeny based on morphological characters that suggests that ratites are monophyletic (Bourdon et al., 2009), as well as three molecular phylogenies.


Analysis of the evolutionary model (gradual/speciational)

We tested the performance of the test developed by Cubo (2003) to determine whether a character evolved according to a gradualist or a speciational model. For this purpose, we simulated the evolution of 100 characters using a Brownian motion model in Mesquite (Maddison & Maddison, 2010) on two phylogenies (Fig. 1a,c) with 36 terminal taxa. These represent the characters that have evolved according to a gradual model. We also simulated the evolution of 100 characters using a Brownian motion model on two phylogenies with unitary branch lengths (Fig. 1b,d); these represent the characters that have evolved according to a speciational model. We then regressed distance matrices obtained from each character against the phylogenetic distance matrices that reflect two phylogenies: (i) the original phylogeny; (ii) a phylogeny with the same topology in which all branches were of equal length. As in Cubo (2003), the test is based on determining which phylogenetic distance matrix (one is based on estimated times of divergence between terminal taxa, whereas the other has unitary branch lengths) best explains the distance matrix of the relevant character. For this purpose, we regressed distance matrices of individual characters against the phylogenetic distance matrices using multiple linear regressions. The significance of these regressions was tested using permutations and a forward selection procedure (p to enter = 0.05). Cubo (2003) used a similar procedure to discriminate between two main alternative sets of branch lengths: the original branch lengths (implying a gradual model) and equal branch lengths implying a speciational model.
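The simulation design can be sketched in Python, assuming a hypothetical encoding in which each node stores its parent index (-1 for the root) and the length of the branch leading to it. The helper names (yule_tree, simulate_bm, node_depths) are ours, not Mesquite's, and Mesquite's own Yule and Brownian-motion routines may differ in details such as how the last branch segment is drawn. Characters simulated on the time-calibrated tree stand for gradual evolution, and characters simulated on the same topology with unit branch lengths stand for speciational evolution.

```python
import numpy as np

def yule_tree(n_tips, rate, rng):
    """Pure-birth (Yule) tree as parent/length lists; parents precede children."""
    parent, length = [-1, 0, 0], [0.0, 0.0, 0.0]  # root plus two founding lineages
    active = [1, 2]  # lineages still growing
    while True:
        # Waiting time to the next split: each lineage splits at rate `rate`.
        wait = rng.exponential(1.0 / (rate * len(active)))
        for node in active:
            length[node] += wait
        if len(active) == n_tips:
            return parent, length
        splitting = active.pop(int(rng.integers(len(active))))
        for _ in range(2):  # the splitting lineage gets two daughters
            parent.append(splitting)
            length.append(0.0)
            active.append(len(parent) - 1)

def node_depths(parent, length):
    """Root-to-node distance for every node."""
    depth = [0.0] * len(parent)
    for i, p in enumerate(parent):
        if p != -1:
            depth[i] = depth[p] + length[i]
    return depth

def simulate_bm(parent, length, n_chars, sigma, rng):
    """Brownian motion from a root value of 0; returns an (n_nodes, n_chars) array."""
    values = np.zeros((len(parent), n_chars))
    for i, p in enumerate(parent):
        if p != -1:
            # Change along a branch ~ Normal(0, sigma^2 * branch length).
            values[i] = values[p] + rng.normal(0.0, sigma * np.sqrt(length[i]), n_chars)
    return values

rng = np.random.default_rng(2024)
parent, time_lengths = yule_tree(36, rate=1.0, rng=rng)
unit_lengths = [0.0 if p == -1 else 1.0 for p in parent]
internal = set(parent)
tips = [i for i in range(len(parent)) if i not in internal]
gradual_chars = simulate_bm(parent, time_lengths, 100, 1.0, rng)[tips]
speciational_chars = simulate_bm(parent, unit_lengths, 100, 1.0, rng)[tips]
```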

Figure 1.

 Trees used to establish the validity of two proposed methods to determine whether characters evolved according to a speciational or a gradualistic model of evolution. (a) The first random (Yule) tree produced by Mesquite used to simulate the characters that evolve according to a Brownian motion model. This also represents the true tree (in which branch lengths represent time). (b) Tree of identical topology but in which all branches are of equal length. This tree was used to generate characters that evolve according to a speciational model of evolution. Note that branch lengths do not represent time here; the true tree, in which branch lengths represent time, is still represented by (a). Similar pairs are presented for the second (c, d) random tree, also generated using a Yule process.

In our tests using Cubo’s (2003) original implementation, the forward selection procedure selects the variable (in this case, a phylogenetic distance matrix representing a tree) with the most significant coefficient of correlation (lowest probability, not necessarily the highest R2), provided that the probabilities of both the R2 and the b coefficient (slope) are lower than the p-to-enter (here, 0.05 for the first step of the analysis). Then, the remaining variable(s) are tested to determine whether their addition significantly improves the regression model (using multiple regression); again, if the probabilities of both the R2 and the b coefficient are lower than the p-to-enter value (here 0.025, because at this step it must be half the p-to-enter value of the first step), the variable is entered into the model. With two competing trees, this analysis requires up to two steps (a single step suffices when neither tree yields a significant regression). The statistical significance of the R2 and the b coefficient is tested using 999 permutations in the program Permute (Casgrain, 2005); the regression on the unpermuted values is added to this sample of randomized data, which makes the test conservative. Thus, up to 3000 individual regressions are used to determine which tree(s) correspond to the model of character evolution. In sum, over 1 200 000 regressions were performed for this study. Given that the forward selection procedure can select more than one tree, the correctness score of the test is 1/n, where n is the number of selected trees, provided that the latter include the correct tree. If none of the trees were selected, the correctness score is 0. The values reported below are the averages of the correctness scores for 100 characters.
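The forward selection on distance matrices and its correctness score can be sketched as follows. This is a simplified stand-in for Permute (Casgrain, 2005), not its algorithm: it tests only the R2 gained by each candidate matrix (Permute also tests the b coefficient), shuffles the taxon labels of the character matrix, and counts the unpermuted statistic among the permutations. The function names are hypothetical, and the matrices in the checks are random illustrative data.

```python
import numpy as np

def upper(m):
    """Upper-triangle entries (i < j) of a square distance matrix."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]

def r_squared(y, predictors):
    """R^2 of an OLS fit of y on the given predictor vectors plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def perm_p(char_d, new_d, entered=(), n_perm=999, rng=None):
    """Permutation p-value of the R^2 gained by adding new_d to the entered matrices.

    Rows and columns of the character matrix are permuted together; the
    unpermuted value is counted too, so the smallest p is 1 / (n_perm + 1).
    Simplification: the b coefficient is not tested separately.
    """
    rng = rng if rng is not None else np.random.default_rng()
    base = [upper(m) for m in entered]
    full = base + [upper(new_d)]

    def gain(y):
        return r_squared(y, full) - r_squared(y, base)

    observed = gain(upper(char_d))
    n = char_d.shape[0]
    hits = 1  # the unpermuted data
    for _ in range(n_perm):
        p = rng.permutation(n)
        if gain(upper(char_d[np.ix_(p, p)])) >= observed - 1e-12:
            hits += 1
    return hits / (n_perm + 1)

def forward_select(char_d, candidates, p_enter=0.05, rng=None):
    """Enter candidate trees while their p-value is below the p-to-enter,
    which is halved after each entry (0.05, then 0.025 with two trees)."""
    selected, threshold, remaining = [], p_enter, dict(candidates)
    while remaining:
        entered = [candidates[s] for s in selected]
        pvals = {name: perm_p(char_d, m, entered, rng=rng) for name, m in remaining.items()}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= threshold:
            break
        selected.append(best)
        del remaining[best]
        threshold /= 2
    return selected

def correctness(selected, true_tree):
    """1/n if the true tree is among the n selected trees, otherwise 0."""
    return 1.0 / len(selected) if true_tree in selected else 0.0
```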

We also evaluated two other related but somewhat simpler procedures to determine the evolutionary model of a character, which consist of choosing the tree that has the lowest probability (even if it is > 0.05) associated with either the explained variance or the b coefficient in the first step of the forward selection procedure. In case of a tie, the correctness score is 1/n, where n is the number of selected trees, provided that the latter include the correct tree.
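The 'lowest probability' variants and their tie rule reduce to a few lines. In this sketch the function name is hypothetical, the tree names are placeholders and the p-values in the checks are invented; note that a p-value of 0.20 still wins if it is the lowest.

```python
def lowest_p_score(pvalues, true_tree):
    """Choose the tree(s) with the lowest p-value, whatever its size; ties share credit."""
    best = min(pvalues.values())
    chosen = [name for name, p in pvalues.items() if p == best]
    return 1.0 / len(chosen) if true_tree in chosen else 0.0
```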

We also tested the performance of a second method to establish whether a character has evolved according to a gradual or to a speciational model. This method consists of regressing standardized contrasts on their expected standard deviation (based on the branch lengths); it is implemented in Mesquite and is commonly used to determine whether selected branch lengths are adequate to standardize data prior to performing an analysis of phylogenetically independent contrasts (e.g. Laurin, 2004; Cubo et al., 2005). If the character has evolved according to a Brownian motion model and the branch lengths have been estimated correctly, there should be no significant relationship (the slope should be about 0 and its associated probability should be high, typically over 0.05, reflecting adequate contrast standardization), whereas unitary branch lengths should provide inferior standardization (the probability associated with the slope should be lower). If the character has evolved according to a speciational model, a nonsignificant relationship will usually be found using unitary branch lengths, provided that all cladogeneses are documented (or a representative sample thereof), including those that have given rise to lineages that are now extinct. Conversely, branch lengths reflecting time should yield inferior standardization (with a lower probability associated with the slope). This test thus selects as the best phylogeny the one with the highest probability associated with the slope of the linear regression of standardized contrasts on their expected standard deviation. Note that the test used by Cubo (2003), as well as any other test that we could imagine, is also subject to this latter (and most problematic) requirement when a speciational model is among those tested. Clearly, the quality of the fossil record is the most limiting factor in our ability to detect speciational change from comparative data.
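A sketch of the contrast-based diagnostic, assuming Felsenstein's independent-contrasts algorithm on a fully bifurcating tree with strictly positive branch lengths, encoded (hypothetically) as a parent array in which every parent index precedes its children. As is usual for this diagnostic, it regresses the absolute values of the standardized contrasts on their standard deviations; scipy.stats.linregress supplies the two-sided p-value of the slope, and the branch-length set with the highest p-value is retained. This is not Mesquite's code.

```python
import numpy as np
from scipy import stats

def contrasts(parent, length, tip_values):
    """Standardized independent contrasts and their expected standard deviations."""
    n = len(parent)
    children = [[] for _ in range(n)]
    for i, p in enumerate(parent):
        if p != -1:
            children[p].append(i)
    x = dict(tip_values)              # node values; internal nodes are filled in below
    v = [float(b) for b in length]    # branch lengths, lengthened as nodes are collapsed
    cs, sds = [], []
    for node in reversed(range(n)):   # children have higher indices than their parents
        if not children[node]:
            continue
        a, b = children[node]         # assumes exactly two children per internal node
        total = v[a] + v[b]
        cs.append((x[a] - x[b]) / np.sqrt(total))
        sds.append(np.sqrt(total))
        # Weighted ancestral value and extra branch length for the parent node.
        x[node] = (x[a] / v[a] + x[b] / v[b]) / (1.0 / v[a] + 1.0 / v[b])
        v[node] += v[a] * v[b] / total
    return np.array(cs), np.array(sds)

def standardization_p(parent, length, tip_values):
    """p-value of the slope of |standardized contrasts| on their standard deviations.
    linregress raises ValueError if all standard deviations are identical."""
    cs, sds = contrasts(parent, length, tip_values)
    return stats.linregress(sds, np.abs(cs)).pvalue

def best_branch_lengths(parent, candidates, tip_values):
    """Name of the branch-length set whose slope has the highest p-value."""
    return max(candidates, key=lambda name: standardization_p(parent, candidates[name], tip_values))
```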

Lastly, we tested the performance of comet (Lee et al., 2006) by determining which of the two models of interest (pure phylogenetic/distance and pure phylogenetic/equal) best fit the data (lower AIC scores are better). When the best model was neither of these two, we still scored the character on the basis of the fit of both of these models, ignoring the score of the seven other models. We followed the same procedure in our analysis of the empirical ratite data set.
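The AIC comparison between the two models of interest can be illustrated with a Brownian-motion likelihood, assuming that tip values follow a multivariate normal distribution whose covariance is proportional to shared branch length from the root. This is a sketch, not comet's code, and the tree encoding (parent index, -1 for the root, plus the length of the branch leading to each node) is hypothetical. Both models estimate the same two parameters (root value and rate), so their AIC difference reduces to a difference in log-likelihood.

```python
import numpy as np

def vcv(parent, length, tips):
    """Brownian-motion covariance: shared root-to-tip path length of each tip pair."""
    def branches(node):
        on_path = set()
        while parent[node] != -1:
            on_path.add(node)  # the branch leading to `node`
            node = parent[node]
        return on_path
    paths = [branches(t) for t in tips]
    return np.array([[sum(length[k] for k in pa & pb) for pb in paths] for pa in paths])

def bm_aic(y, C):
    """AIC of Brownian motion with ML root value and rate (k = 2 parameters)."""
    n = len(y)
    Cinv = np.linalg.inv(C)
    one = np.ones(n)
    mu = (one @ Cinv @ y) / (one @ Cinv @ one)   # GLS estimate of the root value
    r = y - mu
    sigma2 = (r @ Cinv @ r) / n                  # ML rate
    _, logdet = np.linalg.slogdet(C)
    loglik = -0.5 * (n * np.log(2 * np.pi * sigma2) + logdet + n)
    return 2 * 2 - 2 * loglik

def best_model(y, covariances):
    """Name of the model with the lowest AIC."""
    return min(covariances, key=lambda name: bm_aic(y, covariances[name]))
```

In use, one covariance matrix would be built from time-calibrated branch lengths (distance model) and one from unit branch lengths (equal model), for example best_model(y, {"distance": vcv(parent, time_lengths, tips), "equal": vcv(parent, unit_lengths, tips)}).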

All our tests of the methods relied on two random trees generated by the Yule algorithm of Mesquite, to cover a diversity of tree symmetry and branch length distribution. Testing the impact of tree symmetry, number of taxa and branch length distribution would of course improve the reliability of such tests, but such a procedure would require software development that is beyond the scope of this study.

Description of characters

The skull was described using 18 continuous characters distributed over the ventral side of the cranium (Table 1). Each character was measured twice using a digital caliper (accuracy 0.01 mm; Sylvac, Crissier, Switzerland), and the average of the two values was used for further analysis. To reduce size effects, all characters were scaled to the width of the skull measured at the quadrate–jugal articulation (parameter A), which adds 17 continuous characters to the data set. When possible, multiple specimens of a single species were measured to obtain a mean species value for the analysis. When a character was absent, its value was set to zero. In four cases, the museum specimens were incomplete and some characters could not be measured; a specimen was included in the analysis only if fewer than three characters could not be measured. In these four cases, missing values were calculated based on the mean relative value of the species in the same genus, and these relative values were then used to calculate absolute values for the missing parameter. The characters used in the analysis give a good description of the palatine–pterygoid complex (PPC) (Gussekloo & Zweers, 1999; Gussekloo & Bout, 2002), which plays an important role in the cranial kinesis of birds (Zusi, 1984). The pterygoid, the palatine and in some cases the vomer are bony elements that play an important role in transferring muscle force to either open or close the upper bill (Bock, 1964; Gussekloo et al., 2001).
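The scaling and imputation rules can be written out as a short sketch. It assumes hypothetical dictionaries of measurements (in mm) keyed by the character letters of Table 1, with None for a character that could not be measured (an absent structure is recorded as 0, not None) and with A always measured; the function names and the numbers in the example are invented.

```python
def impute_missing(specimen, congeners, max_missing=2):
    """Fill missing (None) measurements from congeneric species' mean relative values.

    Returns None when three or more characters are missing (specimen excluded).
    """
    missing = [k for k, v in specimen.items() if v is None]
    if len(missing) > max_missing:
        return None
    filled = dict(specimen)
    for k in missing:
        mean_rel = sum(c[k] / c["A"] for c in congeners) / len(congeners)
        filled[k] = mean_rel * specimen["A"]  # back to an absolute value
    return filled

def standardize(specimen):
    """Scale every character to skull width A (A itself is not duplicated)."""
    return {k: v / specimen["A"] for k, v in specimen.items() if k != "A"}

# Invented example: character C could not be measured on this specimen.
specimen = {"A": 20.0, "B": 10.0, "C": None}
congeners = [{"A": 10.0, "B": 4.0, "C": 2.0}, {"A": 20.0, "B": 9.0, "C": 6.0}]
filled = impute_missing(specimen, congeners)   # C becomes mean(2/10, 6/20) * 20, i.e. 5.0
scaled = standardize(filled)                   # B -> 0.5, C -> 0.25
```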

Table 1.   Description of the continuous morphological characters representing the ventral side of the cranium of ratites and related taxa. These correspond to characters 20–37 (raw measurements) and characters 38–54 (standardized characters) in Data S1–S3. They are denoted by the same letter followed by ‘m’ (for raw measurement) or ‘s’ (standardized) in Data S1–S3.
A: Skull width at the quadrate–jugal articulation (standard)
B: Distance between most distal points of proc. orbitales quadrati
C: Width at pterygoids at quadrate–pterygoid articulation
D: Width at most rostral part of pterygoids at the pterygoid–palatine connection
E: Maximal width of the right pterygoid in the transversal plane
F: Width of the vomer (caudal)
G: Width of the vomer (rostral)
H: Distance between the anguli caudolaterales of the palatal wings (pars lateralis)
I: Maximal distance between the lateral margins of the palatal wings at their rostral endings
K: Width between palates at position ‘I’
L: Width at most caudal part of the palatines at the pterygoid–palate connection
M: Width between the connection of the proc. palatinus and proc. jugalis of the maxilla
N: Width of the rostrum parasphenoidale incl. proc. basipterygoidei if present
O: Distance foramen magnum to measurement ‘N’
P: Distance foramen magnum to most caudal part of an element of the PPC connecting or crossing the r. parasphenoidale
Q: Maximal length palatine
R: Width at palatine–maxillae articulation
S: Internal width at palatine–maxillae articulation

The morphology of appendicular bones was quantified through measurements of total length and diaphyseal diameter of stylopodial bones (humerus and femur) and zeugopodial bones (ulna, radius and tibiotarsus) to the nearest 0.01 mm using a digital caliper (Roch, Lunéville, France). Dimensionless variables were computed: for each bone, the ratio diaphyseal diameter/total length was calculated (shape characters). In addition, the ratio wing length/leg length was calculated (limb length was computed as stylopodial length + zeugopodial length). Thirteen appendicular bone characters were added to the data set of Cubo (2003): the 10 size characters used to compute shape characters (i.e. total length and diaphyseal diameter of stylopodial and zeugopodial bones) plus tarsometatarsus total length, diaphyseal diameter and shape. Mean values of these ratios for each species were used, and sampling error was ignored because sample sizes were small. Data were collected for twelve species, but each character is documented for only ten species (not always the same ones). All data can be found in Data S1.
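The shape characters can be sketched in the same spirit. The text does not state which zeugopodial bone enters wing length, so using the ulna here is an assumption; the function names and the measurements are invented.

```python
def shape_characters(bones):
    """Dimensionless characters from (total length, diaphyseal diameter) pairs in mm.

    Assumption: wing = humerus + ulna and leg = femur + tibiotarsus.
    """
    shape = {f"{name}_diameter_length": diam / total for name, (total, diam) in bones.items()}
    wing = bones["humerus"][0] + bones["ulna"][0]
    leg = bones["femur"][0] + bones["tibiotarsus"][0]
    shape["wing_leg_ratio"] = wing / leg
    return shape

def species_means(per_specimen):
    """Average each ratio over the specimens of a species (sampling error ignored)."""
    keys = per_specimen[0].keys()
    return {k: sum(s[k] for s in per_specimen) / len(per_specimen) for k in keys}

# Invented measurements for one specimen: (total length, diaphyseal diameter), in mm.
bones = {"humerus": (100.0, 10.0), "ulna": (50.0, 5.0),
         "femur": (200.0, 40.0), "tibiotarsus": (300.0, 30.0)}
ratios = shape_characters(bones)
```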

Choice of the reference phylogenies and temporal calibration

Ratite phylogeny is still in a state of flux, with important differences between molecular (Cooper et al., 2001; Haddrath & Baker, 2001; Hackett et al., 2008; Harshman et al., 2008; Phillips et al., 2010; Johnston, 2011; and references therein) and morphological studies (Bourdon et al., 2009; and references therein), and even between studies using the same kind of evidence. Divergence times are even more difficult to estimate within the context of a molecular phylogeny, although Phillips et al. (2010) provide a very good starting point. For all these reasons, and because Cubo (2003) had used two phylogenies (Cooper et al., 2001; Haddrath & Baker, 2001), we decided to use four phylogenies to test evolutionary models in ratites. These include those of Cooper et al. (2001: fig. 2) and Haddrath & Baker (2001: fig. 2), both as modified by Cubo (2003: fig. 1a), and Phillips et al. (2010: fig. 5), which are all molecular phylogenies, as well as a morphological one (Bourdon et al., 2009) that we dated using a combination of fossil and biogeographical data. These four trees allow us to assess the robustness of our results to phylogenetic uncertainties.

Figure 2.

 One of the four trees (called ‘palaeontological tree’ in the text) used for the species-level analysis of the ratite empirical data set. The genus-level analysis was carried out by pruning the tree to retain one terminal taxon per genus and by inserting the generic averages into the remaining taxa.

Bourdon et al. (2009), using 129 morphological characters, assumed that ratites were monophyletic, but were unable to find evidence for the monophyly of extant Australasian ratites suggested by all molecular studies. Bourdon et al. (2009) found the New Zealand ratites (kiwis plus moas) as the sister group of all other ratites. Within this last clade, the aepyornithids (Madagascar) are the sister group of a clade comprising all other extant ratites. Finally, Struthio (Africa) and the Rhea-Pterocnemia clade (South America) are successive sister groups of the Australian Casuarius-Dromaius clade (Bourdon et al., 2009).

To date the tree by Bourdon et al. (2009), we estimated divergence times using well-known ratite fossils: the minimal age of a node was set by the age of the oldest fossil included in this node, if there was one. The divergence between Dromaius and Casuarius was dated to 35–38 My, following Boles (2001) and Paton et al. (2002). The oldest known ratite, Diogenornis fragilis, has been identified as a rheid (Alvarenga, 1983; Mayr, 2009), which allowed us to estimate an age of 56–59 My for the divergence Rheidae/Casuaridae, in the late Palaeocene. Other Tertiary ratite fossils were not relevant to this study, because they were either too recent to estimate the age of other nodes (Mourer-Chauviré et al., 1996; Bertelli & Chiappe, 2005) or not well enough known to place them unambiguously in the phylogeny (Houde, 1986; Houde & Haubold, 1987; Grellet-Tinner & Dyke, 2005; Mayr, 2005; Bibi et al., 2006). A possible exception is the Eocene Lithornis, but its affinities may be with tinamous (Grellet-Tinner & Dyke, 2005; Johnston, 2011), which would place it outside ratites and hence outside the sampled taxa on the topology of Bourdon et al. (2009).

Vicariance biogeography proved to be congruent with the ages estimated by the use of fossils, and some ornithologists have hypothesized that all ratites are descended from a flightless ancestor that was widespread in Gondwana (see for instance Cracraft, 1973, 1974, 2001; Bourdon et al., 2009). This hypothesis allowed us to use geological events to date parts of the tree. Indeed, South America and Australia remained in contact through Antarctica until the Paleogene (Woodburne & Case, 1996), and sweepstake dispersal was still possible until the early Eocene (Veevers et al., 1991; Lawver et al., 1992). Thus, we estimated a divergence time between the South American Pterocnemia-Rhea clade and the Australian Casuarius-Dromaius clade at 60 Mya, which is consistent with the age of Diogenornis, the first South American ratite. The loss of contact between Australia and south-east Papua occurred in the early Eocene (Veevers & McElhinny, 1976), about 25 million years before Emuarius. Africa drifted away from South America (and thus from Antarctica) in the late Albian (Scotese, 2001), which may fix the divergence between Struthio and the clade Rheidae-Casuaridae at 90–110 My. However, this is tentative because there is no fossil record of paleognaths in Africa before the Miocene, by which time there are ostriches both there and in Eurasia. Phillips et al. (2010: 102) suggest that ostriches invaded Africa from Eurasia in the Miocene, but that is not certain because this is based on claimed close relationships between the mid-Eocene Palaeotis from Messel (Germany) and ostriches, for which Phillips et al. (2010: 99) cite Houde (1986), whereas a more recent unpublished analysis cited by Dyke & van Tuinen (2004: 161) has instead found it to be a stem-ratite. 
We estimated that the divergence between Aepyornithidae and the clade Struthio-Rheidae-Casuaridae occurred between 130 and 110 My, as the Madagascar/India block drifted away from Antarctica in the Early Cretaceous (Scotese, 2001).

The palaeogeographic dating roots the clade of paleognathous birds in the Early Cretaceous, a very old estimate compared to the oldest undoubted ratite fossils known from the Paleogene (around half the age of the Early Cretaceous) or indeed the oldest undoubted fossils of crown-group birds (Kurochkin et al., 2002; Clarke et al., 2005) and their sister group (Clarke & Chiappe, 2001) from the end of the Cretaceous. The much lower ages implied by the fossil record (Hope, 2002; Clarke, 2004) would require several independent losses of flight among ratites during the Tertiary to explain their distribution and thus much morphological convergence between the various ratite taxa. This hypothesis is suggested by the most recent molecular phylogenies, which place tinamous within ratites (Hackett et al., 2008; Harshman et al., 2008; Phillips et al., 2010), but this is unproblematic because we have included such phylogenies among those used in our tests; we have not changed the dates of the molecular phylogenies.

A problem in the vicariance model is the case of New Zealand ratites, because New Zealand drifted away from Antarctica after the separation between Madagascar and Antarctica. Assuming ratite monophyly and a single loss of flight, as we have done to date the tree by Bourdon et al. (2009), this incongruence can be resolved only by the hypothesis that the initial divergence between the moa-kiwi lineage and all other ratites occurred before the separation of Gondwana and New Zealand, and that differential extinction events led to the extinction of the other ratite lineage in New Zealand and of the kiwi-moa lineage on other continents, as suggested by Bourdon et al. (2009). The fossil record is so far silent on this question.

We did not find palaeontological or molecular data that could be used to reliably date the divergences between species. Thus, we simply inserted the minimal branch lengths that we enforced throughout all trees (5 My) between species, whenever molecular ages were unavailable. This is analogous to the method proposed by Laurin et al. (2009) to deal with missing branch length data in comparative analyses. For the two other trees, we used the branch lengths shown in Cubo (2003: fig. 1a). However, to assess whether unreliable branch lengths between closely related species (within genera) favour the speciational model over the gradual one, we performed the analyses using generic averages (first analysis) for all characters and repeated them using species data (second analysis) for the four trees.
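The two data-preparation steps mentioned here (giving undated interspecific branches the 5 My minimum and computing generic averages) can be sketched as follows. The function names are hypothetical, undated branches and undocumented values are assumed to be stored as None, and the species values in the checks are invented.

```python
from collections import defaultdict

MIN_BRANCH_MY = 5.0  # minimal branch length enforced between species (from the text)

def fill_branch_lengths(lengths, minimum=MIN_BRANCH_MY):
    """Replace undated branch lengths (None) by the enforced minimum."""
    return [minimum if b is None else b for b in lengths]

def generic_averages(species_values):
    """Average a character over the documented species of each genus.

    species_values maps binomials ('Genus species') to a value, or None if
    the character is undocumented for that species.
    """
    groups = defaultdict(list)
    for binomial, value in species_values.items():
        if value is not None:
            groups[binomial.split()[0]].append(value)
    return {genus: sum(vals) / len(vals) for genus, vals in groups.items()}
```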


Our simulations indicate that the test proposed by Cubo (2003) to determine whether a character evolved according to a gradualistic or a speciational model has only a moderate success rate (Table 2). When forward selection was used to determine which of the two phylogenetic distance matrices (unitary branch lengths or branch lengths reflecting evolutionary time) best explained character data, the correct matrix was identified in only 5–82% of the cases (with an average success rate of 52%). This low and very heterogeneous success rate arises because in many cases neither phylogenetic distance matrix was found to be significantly correlated with the phenotypic data, and when one was, it was often the wrong one; in both cases, this was scored as a failure of the test. Using the probability associated with the explained variance or with the b coefficient to choose between phylogenetic distance matrices gives slightly better, but still very heterogeneous results (17–94% of correct results using the probability associated with the variance; 28–94% using the b coefficient). This great heterogeneity concerns mostly the second Yule tree, in which a bias in favour of the speciational phylogenetic distance matrix was present, yielding very low success rates (5–27.5%) when the true model of evolution was gradual. Conversely, these three approaches (Cubo’s original, or both modifications thereof) on the same tree yielded very high success rates when characters followed speciational evolution (82–94%), apparently reflecting the same bias. Note that, of all methods analysed in this study, Cubo’s original method is the only one that takes into account exclusively those phylogenetic distance matrices significantly related to the trait under analysis.
Variants of this method, although more successful, select phylogenetic distance matrices on the basis of the lowest probability, even when this probability is higher than 0.05. In other words, even when the analyses conclude that neither of the models significantly explains the variation of the trait under analysis, these methods consider that one of them fits the data better than the other.

Table 2. Power of the tests to determine whether a character evolved according to a gradualistic or a speciational model. These are the test proposed by Cubo (2003), as originally implemented (‘matrix selection’), modified to use the smallest probability (even if greater than 0.05) associated with the explained variance, modified to use the probability associated with the slope, the test using phylogenetically independent contrasts and the maximum likelihood test (using AIC scores) implemented in comet (Lee et al., 2006). In the tests of Cubo and the new contrast-based test, four trees were used to discriminate between the models: two topologies and two evolutionary models (one in which the branch lengths reflect geological time, and another with unitary branch lengths). The original method by Cubo (2003) is based on linear regressions between phylogenetic distance and phenetic distance matrices. The proportion of the simulations in which the correct tree was selected for all methods is the average for 100 simulations for each of the two trees and for each evolutionary model (speciational and gradual); when Cubo’s (2003) original test selected both phylogenetic distance matrices, this simulation was scored as 0.5. The modified version of Cubo’s (2003) test is the proportion of times that the correct tree has the lowest probability associated with the explained variance or with the slope, even when these probabilities exceed the 0.05 threshold. When, for a given topology, the probability associated with the slope was the same for both evolutionary models, we scored 0.5. The contrast-based method (‘contrasts’ in the table) is based on a regression between standardized contrasts and their expected standard deviation (based on branch lengths). The choice of the model is based on the probability associated with the slope (higher is better).
| Test | True evolutionary model of the characters | Proportion of simulations with correct results: tree 1 | Proportion of simulations with correct results: tree 2 | Proportion of simulations with correct results: average for the two trees |
|---|---|---|---|---|
| Cubo (2003), matrix selection | Gradual | 0.610 | 0.050 | 0.330 |
| Cubo (2003), matrix selection, modified | Gradual | 0.600 | 0.170 | 0.385 |
| Cubo (2003), slope | Gradual | 0.605 | 0.275 | 0.440 |
| Cubo (2003), matrix selection | Speciational | 0.610 | 0.820 | 0.715 |
| Cubo (2003), matrix selection, modified | Speciational | 0.570 | 0.940 | 0.755 |
| Cubo (2003), slope | Speciational | 0.550 | 0.935 | 0.742 |

AIC, Akaike Information Criterion.

Our proposed test, which consists of regressing standardized contrasts against their expected standard deviation, is slightly better. Its global success rate ranges from 76% to 99%, depending mainly on the true evolutionary model of the characters and, to a lesser extent, on the tree. Overall, the contrast-based method performed better with characters evolving according to a speciational model (maximum success rate of 99%) than with characters evolving gradually according to a Brownian motion model (maximum success rate of 77%). However, comet outperformed both methods. Its global success rate was about 97% and did not differ significantly between the two models. Therefore, only comet was used to test the evolutionary model of our ratite data.
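A minimal, dependency-free sketch of the contrast-based test may make the procedure concrete. It computes standardized independent contrasts with Felsenstein's pruning algorithm on a bifurcating tree. It then regresses the contrasts on their expected standard deviations and keeps the set of branch lengths with the weakest slope. With correct branch lengths, the contrasts should not depend on their expected SD. Regressing the absolute values of the contrasts, rather than the signed values, is an assumption of this sketch. Both candidate trees share one topology, so the regressions have the same degrees of freedom, and the smallest |t| corresponds to the highest P. The five-taxon tree, its branch lengths and the trait values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    length: float                    # length of the branch above this node
    value: Optional[float] = None    # trait value (tips only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tip(value: float, length: float) -> Node:
    return Node(length, value=value)


def inner(left: Node, right: Node, length: float = 0.0) -> Node:
    return Node(length, left=left, right=right)


def independent_contrasts(node: Node, out: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Felsenstein's pruning on a bifurcating tree.

    Appends (standardized contrast, expected SD) for each internal node to `out`
    and returns (node value estimate, branch length lengthened for its error).
    """
    if node.left is None or node.right is None:
        return node.value, node.length
    x1, v1 = independent_contrasts(node.left, out)
    x2, v2 = independent_contrasts(node.right, out)
    sd = math.sqrt(v1 + v2)                        # expected SD of x1 - x2
    out.append(((x1 - x2) / sd, sd))
    x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)     # weighted ancestral value
    return x, node.length + v1 * v2 / (v1 + v2)


def slope_t(xs: List[float], ys: List[float]) -> float:
    """t statistic of the OLS slope of ys on xs (model with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if n < 3 or sxx == 0:
        raise ValueError("need at least 3 contrasts with differing SDs")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    rss = sum((y - (my - b * mx) - b * x) ** 2 for x, y in zip(xs, ys))
    if rss == 0:
        return 0.0 if b == 0 else math.copysign(math.inf, b)
    return b / math.sqrt(rss / (n - 2) / sxx)


def choose_model(trees: Dict[str, Node]) -> str:
    """Return the branch-length set with the weakest |contrast| vs SD slope."""
    scores = {}
    for name, tree in trees.items():
        pairs: List[Tuple[float, float]] = []
        independent_contrasts(tree, pairs)
        scores[name] = abs(slope_t([sd for _, sd in pairs], [abs(c) for c, _ in pairs]))
    return min(scores, key=scores.get)


# Hypothetical tree ((A,B)X,(C,(D,E)Z)Y) with made-up trait values.
values = {"A": 2.0, "B": 2.4, "C": 5.0, "D": 4.1, "E": 4.3}


def build(lengths: Dict[str, float]) -> Node:
    return inner(
        inner(tip(values["A"], lengths["A"]), tip(values["B"], lengths["B"]), lengths["X"]),
        inner(tip(values["C"], lengths["C"]),
              inner(tip(values["D"], lengths["D"]), tip(values["E"], lengths["E"]), lengths["Z"]),
              lengths["Y"]),
    )


time_lengths = {"X": 4, "A": 6, "B": 6, "Y": 2, "C": 8, "Z": 5, "D": 3, "E": 3}  # gradual
unit_lengths = {k: 1 for k in time_lengths}                                     # speciational
print(choose_model({"gradual": build(time_lengths), "speciational": build(unit_lengths)}))
```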

The comet analyses of the 54 osteological characters of ratites show that about 70% (37) evolved according to a gradual model, and only about 30% (17) according to a speciational model (Table 3). These figures are grand averages over the four trees and over both genus-level (generic averages) and species-level data. Neither the tree nor the taxonomic level has much influence on these results. At the genus level, the tree that implies the lowest number of ‘gradual’ characters (Cooper et al., 2001) still finds 35, whereas the tree that supports the greatest amount of gradual change (Haddrath & Baker, 2001) finds 40. At the species level, the spread is even narrower, from 37 to 39. A paired-sample t test, performed manually (Zar, 1984) and repeated using Statistica, shows that overall more characters evolved according to a gradual than to a speciational model (Data S3). This holds both when genera (t3 = 8.25, P = 0.003726) and when species (t3 = 26.94, P = 0.000112) are used. However, size characters (measurements) predominantly follow a gradual model (87% and 97% of them at the genus and species levels, respectively). By contrast, shape characters (ratios of the former) predominantly follow a speciational model (55% and 63% at the genus and species levels, respectively). Again, this conclusion does not depend heavily on the selected tree. Among the size characters, the trees support from 25 (Cooper et al., 2001) to 27 (Haddrath & Baker, 2001) ‘gradual’ characters at the genus level, and all trees find 29 at the species level. Similarly, among shape characters, the number of ‘gradual’ characters ranges from 9 (palaeontological tree) to 13 (Haddrath & Baker, 2001) at the genus level. At the species level, it ranges from 8 (Cooper et al., 2001) to 10 (Phillips et al., 2010). A paired-sample t test (Zar, 1984) shows that the difference in evolutionary model between size (raw) and shape characters is very highly significant (Data S3) for both genera (t3 = 24.24, P = 0.000154) and species (t3 = 48.99, P = 0.000019).
This suggests a fair amount of independence between raw (size-related) and shape characters. No single tree seems to give outlier values, and each yields one of the highest or lowest values at least once; the two trees that most often yield extreme (but by no means ‘outlying’) values are those of Haddrath & Baker (2001) and Cooper et al. (2001).

Table 3.   Evolutionary model of ratite osteological characters according to the four tested trees. For the data set, ‘all’ indicates results for all 54 characters; ‘size’ indicates results for the 30 unstandardized, size-related characters; ‘shape’ indicates results for the 24 shape characters.
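| Tree | Data set | Genus level: number of gradual characters | Genus level: number of speciational characters | Species level: number of gradual characters | Species level: number of speciational characters |
|---|---|---|---|---|---|
| Palaeontological and biogeographical | All | 35 | 19 | 38 | 16 |
| Phillips et al., 2010 | All | 37 | 17 | 39 | 15 |
| Haddrath & Baker, 2001 | All | 40 | 14 | 38 | 16 |
| Cooper et al., 2001 | All | 35 | 19 | 37 | 17 |
| Average of four trees | All | 36.75 | 17.25 | 38 | 16 |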
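The paired-sample t tests reported above can be checked directly from the counts in Table 3 with a short, dependency-free sketch. The genus-level gradual count for the palaeontological tree is taken as 35. This is the value implied by the 54-character total and by the four-tree average of 36.75. At the species level, the counts of gradual shape characters are obtained by subtracting the 29 gradual size characters found by all trees from each tree's total. The genus-level size/shape breakdown is not fully reported, so only the species-level size-versus-shape test is reproduced. The two-tailed P value uses the closed-form cumulative distribution of Student's t, which, as written, is exact only for 3 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired-sample t statistic and its degrees of freedom."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d))), len(d) - 1


def two_tailed_p_df3(t):
    """Two-tailed P for Student's t with exactly 3 df (closed-form CDF)."""
    x = abs(t) / math.sqrt(3)
    return 1 - (2 / math.pi) * (math.atan(x) + x / (1 + x * x))


# Tree order: palaeontological/biogeographical, Phillips et al. (2010),
# Haddrath & Baker (2001), Cooper et al. (2001).
tests = {
    "genus, gradual vs speciational": ([35, 37, 40, 35], [19, 17, 14, 19]),
    "species, gradual vs speciational": ([38, 39, 38, 37], [16, 15, 16, 17]),
    "species, gradual size vs gradual shape": ([29, 29, 29, 29], [9, 10, 9, 8]),
}
for label, (a, b) in tests.items():
    t, df = paired_t(a, b)
    print(f"{label}: t{df} = {t:.2f}, P = {two_tailed_p_df3(t):.6f}")
# genus, gradual vs speciational: t3 = 8.25, P = 0.003726
# species, gradual vs speciational: t3 = 26.94, P = 0.000112
# species, gradual size vs gradual shape: t3 = 48.99, P = 0.000019
```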


Speciational vs. gradual models of evolution

A survey of the literature suggests that it will generally be difficult to determine whether characters evolved according to a gradual or a punctuational model, for several reasons. First, as pointed out previously, the tests that have been proposed so far (including the new test presented earlier) require fairly precise knowledge of the evolutionary time separating the sampled species to assess the fit of a gradual model of evolution to the data. This is problematic because neither of the two main sources of timing data currently provides more than a crude estimate of the chronology of taxonomic diversification. The fossil record is notoriously incomplete, and molecular dating relies on calibration constraints extracted predominantly from that same fragmentary fossil record. To illustrate this, Marjanović & Laurin (2007) proposed minimal divergence dates based on the fossil record for about 320 lissamphibian clades. Of these, only four had enough known extinct relatives to estimate their maximum age. Yet, Marjanović & Laurin (2007) demonstrated that these few maximum age constraints were crucial for deriving plausible molecular estimates of the ages of most other clades. For ratites, Phillips et al. (2010) used maximum ages for every calibration constraint. Other methodological problems plaguing molecular dating are well known and have been adequately described elsewhere (Rodríguez-Trelles et al., 2002; Shaul & Graur, 2002; Brochu, 2004a,b; Graur & Martin, 2004; Britton, 2005).

Second, assessing the fit of a speciational model to the data requires data about all extant and extinct species of a taxon, or at least a representative sample of the latter (i.e. with fairly homogeneous sampling in all groups). This condition is highly limiting. According to plausible models and our knowledge of past biodiversity, fewer than 1% of the species that have ever lived on this planet are known from fossils (Newman, 2001; Laurin, 2005). Mooers et al. (1999) argued that a punctuational model can be established using extant species if most of the extant species of a clade are included in the study. However, this method assumes that extinct species are either homogeneously distributed on the tree or negligible in number (Mooers et al., 1999). At least the latter assumption is unrealistic in most cases because most species are already extinct, as mentioned earlier. Thus, in practice, such a test is possible only within a taxon whose fossil record is dense enough to allow detection of most cladogenetic events, which is a very uncommon situation (Prothero, 2004; Laurin, 2010).

Nonetheless, our simulations show that if all the rather restrictive conditions mentioned earlier are met, the correct model of evolution of characters can be inferred by comet with great reliability (about 97% global success rate). Thus, the problem does not lie in the statistical analysis of the data, but rather in obtaining a phylogeny with correct topology and branch lengths.
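The likelihood comparison on which such AIC-based model choice rests can be illustrated with a deliberately simplified sketch. It is not comet's implementation, which may differ in the models and estimators it uses. A trait is assumed to evolve by Brownian motion, so tip values are multivariate normal with covariance σ²C. Here, C gives the branch length shared by each pair of tips from the root. The gradual model uses time-proportional branch lengths, whereas the speciational model sets every branch to 1. The mean and σ² are estimated by maximum likelihood, and the model with the lower AIC is preferred. Both models have two parameters, so comparing AIC scores here amounts to comparing likelihoods. The tree, branch lengths and trait values are hypothetical. The data enter only through C, which shows why the quality of the branch lengths, rather than the statistics, limits the test.

```python
import math
from typing import Dict, List


def covariance(paths: Dict[str, List[str]], lengths: Dict[str, float]) -> List[List[float]]:
    """C[i][j] = total length of the branches shared by the root-to-tip paths of i and j."""
    names = list(paths)
    return [[float(sum(lengths[b] for b in set(paths[i]) & set(paths[j]))) for j in names]
            for i in names]


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular `low` with low @ low.T == a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def forward_solve(low: List[List[float]], b: List[float]) -> List[float]:
    """Solve low @ z = b for lower-triangular low."""
    z: List[float] = []
    for i, bi in enumerate(b):
        z.append((bi - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def brownian_aic(y: List[float], cov: List[List[float]]) -> float:
    """AIC of y ~ N(mu * 1, sigma2 * cov) with ML mu and sigma2 (k = 2)."""
    n = len(y)
    low = cholesky(cov)
    zy = forward_solve(low, y)           # low^-1 y
    z1 = forward_solve(low, [1.0] * n)   # low^-1 1
    mu = sum(a * b for a, b in zip(z1, zy)) / sum(a * a for a in z1)  # GLS mean
    resid = [a - mu * b for a, b in zip(zy, z1)]                      # low^-1 (y - mu 1)
    sigma2 = sum(r * r for r in resid) / n
    logdet = 2 * sum(math.log(low[i][i]) for i in range(n))
    loglik = -0.5 * (n * math.log(2 * math.pi * sigma2) + logdet + n)
    return 2 * 2 - 2 * loglik


# Hypothetical five-taxon tree ((A,B),(C,(D,E))): branch labels from root to each tip.
paths = {"A": ["X", "A"], "B": ["X", "B"], "C": ["Y", "C"],
         "D": ["Y", "Z", "D"], "E": ["Y", "Z", "E"]}
time_lengths = {"X": 4, "A": 6, "B": 6, "Y": 2, "C": 8, "Z": 5, "D": 3, "E": 3}
unit_lengths = {b: 1 for b in time_lengths}   # one unit of change per branch
y = [2.0, 2.4, 5.0, 4.1, 4.3]                 # hypothetical trait values for A..E

aic = {"gradual": brownian_aic(y, covariance(paths, time_lengths)),
       "speciational": brownian_aic(y, covariance(paths, unit_lengths))}
print(min(aic, key=aic.get), aic)
```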

Our results are encouraging because they confirm the conclusion of Cubo (2003) despite the methodological differences and our larger samples of characters and phylogenies. That conclusion is that the osteological characters of ratites that reflect shape evolved mostly according to a speciational model. Our results also suggest that this does not apply to size-related characters. Thus, over two-thirds of the characters in our sample (which includes 30 size-related and 24 shape characters) have evolved according to a gradual model. Size- and shape-related characters therefore appear to follow different models in ratites. These results provide additional support for the hypothesis of Cubo (2003) that heterochronic mechanisms may underlie morphological changes in bone shape during the evolution of ratites. Two arguments support this link. First, heterochronic changes are instantaneous on a geological time scale (Gould, 1977), so that the outcome of these changes may be consistent with a speciational model of character evolution. Second, only evolutionary shape changes (and not evolutionary size changes) could be evidence for heterochrony (Gould, 2000).

Our conclusions on these points should be seen as tentative because the uncertainties in branch lengths and topology remain substantial. Nevertheless, the choice of tree (among the four tested) has very little impact on the results, which suggests that our results are fairly robust to phylogenetic uncertainties concerning ratites. The palaeontological tree, whose palaeobiogeographical dating rests on the largest number of hypotheses, yields results congruent with those of the other trees. Indeed, it is not one of the two trees (Cooper et al., 2001; Haddrath & Baker, 2001) that most frequently yield extreme (but still not outlying) values. Thus, in this case, there is no sharp difference between morphological and molecular signals.


We thank the CNRS and the French Ministry of Research (ML, operating grant to UMR 7207; JC, operating grant to UMR 7193), the Netherlands Organization for Scientific Research (NWO; SG) and the university study subsidy office of Austria (DM) for funding this research.

Data deposited at Dryad: doi: 10.5061/dryad.cr0qh5gc