How biodiversity is generated and maintained underlies many major questions in evolutionary biology, particularly relating to the tempo and pattern of diversification through time. Molecular phylogenies and new analytical methods provide additional tools to help interpret evolutionary processes. Evolutionary rates in lineages sometimes appear punctuated, and such “explosive” radiations are commonly interpreted as adaptive, leading to causative key innovations being sought. Here we argue that an alternative process might explain apparently rapid radiations (“broom-and-handle” or “stemmy” patterns seen in many phylogenies) with no need to invoke dramatic increase in the rate of diversification. We use simulations to show that mass extinction events can produce the same phylogenetic pattern as that currently being interpreted as due to an adaptive radiation. By comparing simulated and empirical phylogenies of Australian and southern African legumes, we find evidence for coincident mass extinctions in multiple lineages that could have resulted from global climate change at the end of the Eocene.
The cumulative fossil record can give the impression that the diversity of life has increased inexorably through time (Raup 1991; Nee 2006; Benton and Emerson 2007: Fig. 1), although punctuated with occasional mass extinction events. What we see today is the standing diversity—the net difference between cumulative speciation (births) and extinction (deaths)—which is a minute fraction of all the species that have ever lived (Jablonski 1995; May et al. 1995; Niklas 1997). The tempo and pattern of diversification remain controversial (Benton and Emerson 2007)—does macroevolution proceed at a relatively constant rate, accumulating lineages exponentially and punctuated only by mass extinction events, or does it proceed through bursts of speciation triggered by processes such as adaptive radiation but otherwise remaining relatively constant? These alternatives underlie such controversies as whether the Cambrian “explosion” of animal phyla was real (Rokas et al. 2005) and perhaps triggered by environmental change (Conway Morris 2006) or was an artifact of flawed methods for estimating diversification rates (Ho et al. 2005).
Teasing apart diversification patterns through time (birth and death rates) is problematic and, until recently, has mostly relied on observations of appearance and disappearance of fossil taxa (Raup 1976; Jablonski 1995; Niklas 1997). However, the fossil record is a biased sample, and notoriously incomplete, making quantification difficult (Raup 1976; Jablonski 1995; Foote and Raup 1996; Niklas 1997; Peters and Foote 2002; Smith 2007). For example, organisms with soft body parts or occurring in arid habitats are seldom fossilized, with observed diversity directly related to the amount of available sedimentary strata (Raup 1976).
Molecular phylogenies can be used to model changes in birth and death rates without needing the fossil record, except to calibrate the relaxed molecular clocks that provide a time frame for the model (Harvey et al. 1994; Nee 2006; Rabosky 2006b). They can represent a near-complete sample of extant species, including lineages of organisms that have been excluded from the fossil record because of their life history or habitat. These reconstructed molecular phylogenies can be converted to models of lineages through time (LTT), which might be useful for interpreting the tempo and pattern of change if common patterns can be discerned and alternative causes excluded. If birth and death rates remain constant, then well-sampled LTT are exponential curves, which are linear in a semi-log plot (Fig. 1A) but sometimes with a late upturn (Fig. 1B) (Harvey et al. 1994; Nee 2006; Rabosky 2006b). Shifts in birth and death rates can leave distinctive signatures in phylogenies, resulting in departures from linearity in semi-log LTT plots (Fig. 1C–F) (Harvey et al. 1994; McKenna and Farrell 2006; Nee 2006; Rabosky 2006b; Ricklefs 2007; McPeek 2008). For example, a density-dependent lineage death rate (Moran process), which is often predicted for an adaptive radiation (Sepkoski 1984; Benton and Emerson 2007), gives a convex LTT with a steep slope early, flattening later (Fig. 1E) (Harvey et al. 1994; Nee 2006; Ricklefs 2007; McPeek 2008). By contrast, the signature of a mass extinction is an LTT with a late upswing in slope (Harvey et al. 1994), technically known as an antisigmoidal curve (Fig. 1F). Although the signatures of some of these processes have been compared in simulated and real data (Harvey et al. 1994; Nee et al. 1995; McKenna and Farrell 2006; Benton and Emerson 2007; Ricklefs 2007; McPeek 2008), others have not. 
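The conversion from a dated tree to an LTT curve can be sketched as follows. This is a minimal illustration in Python, not the software used in this study. The function names `ltt_from_node_ages` and `semilog_slope` are hypothetical, and the input is assumed to be the ages of the internal nodes (root included) of a fully bifurcating, ultrametric tree. A near-constant slope on the semi-log scale corresponds to the constant-rate expectation described above.

```python
import math


def ltt_from_node_ages(node_ages):
    """Return (age, lineage count) pairs, oldest node first.

    node_ages: ages (time before present) of the internal nodes of a
    bifurcating ultrametric tree, root included (assumed input format).
    """
    ages = sorted(node_ages, reverse=True)  # oldest (root) first
    # The root split yields 2 lineages; each later node adds one more.
    return [(age, i + 2) for i, age in enumerate(ages)]


def semilog_slope(ltt):
    """Least-squares slope of ln(lineages) against time elapsed since the root."""
    root_age = ltt[0][0]
    xs = [root_age - age for age, _ in ltt]
    ys = [math.log(n) for _, n in ltt]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Fitting the slope separately to successive time windows is one simple way to look for the departures from linearity discussed above, although the choice of window boundaries would itself be an assumption.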
An increase in speciation rate, such as that often predicted from the early stage of an adaptive radiation, is expected to result in a sudden significant increase in the rate of diversification (Fig. 1D) (Turgeon et al. 2005; Rabosky 2006b; Crisp and Cook 2007; Moore and Donoghue 2007; McPeek 2008). Mass extinctions produce a sharp drop in cumulative fossil diversity and are commonly thought to stimulate an adaptive radiation, manifest by a sharp increase in the rate of diversification (Sepkoski 1984; Benton and Emerson 2007). Can these quite different macroevolutionary processes be distinguished by their impact on the shapes of molecular phylogenies? Here we use LTT modeling with simulated data to address this question. We then apply this modeling to a real dataset comprising a large sample of legumes from Australia and infer likely causes of observed LTT rate changes.
The legumes are one of the most diverse plant families, with 18,000 species occurring worldwide and across all habitats. Molecular dating indicates that the legume family diversified rapidly from about 60 million years before present (Ma), at the beginning of the Cenozoic (Lavin et al. 2005), and that, from about 50 Ma, large radiations occurred in Australia (tribes Mirbelieae and Bossiaeeae, Crisp et al. 2004) and Africa (genistoid legumes including tribe Podalyrieae, Boatwright et al. 2007). This was a period of dramatic global climate change, with the greenhouse world of 50 Ma giving way to a much cooler, drier, and more seasonal climate 32 Ma (Zachos et al. 2001; Beerling 2007). Given their high diversity and the timing of their radiation, the legumes make an excellent study group for testing hypotheses about environmental correlates of diversification in lineages.
Material and Methods
SIMULATION OF LINEAGES THROUGH TIME
Birth–death phylogenies were simulated using Phyl-O-Gen (Rambaut 2002) with various parameter combinations including: constant birth (b) and death (d) rates; an increase or decrease in rates; sequences of rate changes; episodes of stasis (no or very slow net diversification); and mass extinction episodes of different intensities. To address the question posed in the Introduction, three kinds of models were compared: (1) two diversification rates (slow then fast); (2) a single rate punctuated by a mass extinction episode; and (3) three rates (fast, very slow/stasis, fast) to test whether a stasis episode can produce an effect similar to an extinction. Extinction was assumed to be random and the extremes of clade extinction and uniform extinction (Harvey et al. 1994) were not simulated. Absolute and relative birth and death rates were varied systematically. Usually, we set d ≤ 0.5b because a higher d/b ratio resulted in a high rate of simulation failure (>50%) due to whole phylogeny extinction. One hundred simulations were run per model with different random seeds, and each tree was grown to 700 extant “species,” from which 50% (i.e., 350) were sampled randomly to reflect the size and sampling of the real phylogenies (see below). Whole and reconstructed trees and their LTT plots were generated graphically in Phyl-O-Gen and compared visually (cf. Harvey et al. 1994). Simulation scripts with parameter values are detailed in Appendix S1.
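The logic of the mass extinction model can be sketched in a few lines of Python. This is an illustrative approximation under stated assumptions, not the Phyl-O-Gen scripts of Appendix S1. The function name `simulate_reconstructed_ages` and its parameters are hypothetical; the extinction episode is assumed to be instantaneous at a fixed time `t_extinction`, with each lineage surviving independently with probability `survival` (random extinction).

```python
import random


def simulate_reconstructed_ages(b, d, n_extant, frac_sampled=0.5,
                                t_extinction=None, survival=1.0, seed=0):
    """Grow a birth-death tree to n_extant tips, sample a fraction of them,
    and return the node ages of the reconstructed tree (oldest first).
    Returns None if the whole phylogeny goes extinct."""
    rng = random.Random(seed)
    parent, start = [None], [0.0]   # lineage 0 is the stem, born at time 0
    alive, t = [0], 0.0
    pending = t_extinction          # mass extinction not yet applied
    while alive and len(alive) < n_extant:
        t_next = t + rng.expovariate((b + d) * len(alive))
        if pending is not None and t_next > pending:
            # Instantaneous episode: every lineage survives independently.
            t, pending = pending, None
            alive = [i for i in alive if rng.random() < survival]
            continue                # waiting times are memoryless, so redraw
        t = t_next
        i = alive.pop(rng.randrange(len(alive)))
        if rng.random() < b / (b + d):
            # Speciation: lineage i ends and is replaced by two daughters.
            for _ in range(2):
                parent.append(i)
                start.append(t)
                alive.append(len(parent) - 1)
        # Otherwise lineage i goes extinct (background extinction).
    if not alive:
        return None
    sampled = rng.sample(alive, max(2, round(frac_sampled * len(alive))))
    # A lineage is a node of the reconstructed tree if both of its
    # daughters have at least one sampled descendant.
    daughters_hit = {}
    for tip in sampled:
        node = tip
        while parent[node] is not None:
            seen = daughters_hit.setdefault(parent[node], set())
            if node in seen:
                break               # path above here is already recorded
            seen.add(node)
            node = parent[node]
    ages = [t - start[next(iter(kids))]
            for kids in daughters_hit.values() if len(kids) == 2]
    return sorted(ages, reverse=True)
```

For example, `simulate_reconstructed_ages(b=1.0, d=0.5, n_extant=700, t_extinction=6.0, survival=0.05)` would mimic a single-rate model with a 95% extinction; the rate values and the extinction time here are arbitrary placeholders. Because the sketch stops at the speciation event that produces the final tip, the youngest node can have an age of zero.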
We sampled about 50% of the ca. 700 known species of Mirbelieae and Bossiaeeae. Trees were rooted with outgroups that included the likely sister group (Hypocalypteae, Lavin et al. 2005), and other taxa sampled from across the legumes. Sequences were obtained from our previous studies (Crisp and Cook 2003a,b,c; Orthia et al. 2005), GenBank, or newly generated as follows.
Genomic DNA was extracted from fresh or silica-gel-preserved leaf material using either cetyltrimethylammonium bromide (CTAB) (Doyle 1991) or a DNeasy plant mini kit (Qiagen, Doncaster, Australia) following the manufacturer's instructions. Three DNA loci were amplified: two from the chloroplast (trnL-trnF and partial ndhF) and one from the nucleus (ITS) (primers are listed in Table S1). PCRs were carried out in a 25 μl volume with 2 mM MgCl2 and using 0.8 U of Platinum Taq (Invitrogen, Mt Waverley, Australia). A standard PCR cycling protocol was followed with an annealing temperature of 55°C. Fragments were sequenced in both forward and reverse directions using ABI Big-Dye chemistry on an ABI Prism 3100 genetic analyser (Applied Biosystems, Carlsbad, CA). DNA sequences were edited using Sequencher version 4.5 (GeneCodes, Ann Arbor, MI) and aligned manually in Se-Al (Rambaut 1996). Parts of the ITS and trnL-trnF sequences that were not confidently alignable across the more distantly related terminals were offset or omitted; ambiguity in aligning trnL-trnF and ITS prevented the use of outgroups more distantly related than Baphia (Lavin et al. 2005).
For comparison with the Australian taxa, a dataset was compiled for the southern African legume tribe Podalyrieae using ITS sequences from GenBank for 107 of the 128 known species in the tribe (Boatwright et al. 2007).
For each locus, phylogenies were estimated using maximum likelihood (ML), retaining the best tree from five searches using the genetic algorithm implemented in GARLI (Zwickl 2006) with a GTR + I + G model, found to be optimal by Modeltest (Posada and Crandall 1998) for all datasets. Also, a Bayesian Monte Carlo Markov chain (MCMC) search (MrBayes 3.1.2: Ronquist and Huelsenbeck 2003) was conducted for ndhF, with the data partitioned by the three codon positions (models: 1st and 2nd = GTR + I + G; 3rd = GTR + G) and indels scored as binary characters (standard model). To achieve convergence of the two parallel runs (determined by the multiple tests recommended by the authors), it was necessary to run these analyses for 10^7 generations with eight chains and temperature = 0.1. A consensus tree (e.g., from a Bayesian MCMC search) is less suitable than a single ML tree for molecular dating because it is a composite tree with nonoptimal branch lengths and multiple zero length internodes, potentially leading to dating artifacts. Therefore, as the resulting topologies and branch lengths showed little difference between the two search methods, most subsequent analyses were based on the optimal GARLI trees. The trnL-trnF and ndhF data showed no supported differences in resolution and were combined into a single cpDNA partition for comparative analysis.
RELAXED MOLECULAR CLOCK DATING
Phylograms were transformed into chronograms, in which branch lengths were proportional to time, using penalized likelihood (PL) in r8s 1.71 (Sanderson 2003) with smoothing parameters optimized by fossil cross-validation. To estimate minimum absolute ages of nodes, several primary (fossil-based) and secondary calibration points were used, as described previously (Lavin et al. 2005; Boatwright et al. 2007; Crisp and Cook 2007). The 95% confidence intervals around identified dates of change in diversification rate (see Results) were estimated: (1) from the ndhF data using PL with 100 Bayesian post-burnin trees, sampled randomly from the 95% credible set; and (2) from all loci using the “confidence” function in r8s. The first method is more reliable than the second but was not used for all loci because it requires a profile of suitable trees (Sanderson 2004), such as were available only for ndhF. In any case, the confidence intervals estimated for ndhF by both methods were nearly identical (Table 1).
Table 1. Estimated times (Ma) of slope upturn in LTT, following the inferred mass extinctions at plateaux in Fig. 2. Confidence intervals (95%) were derived in r8s using the “profile” command with a posterior sample of trees (ndhF Bayesian trees), and using the “confidence” function in r8s (all ML trees).
Mirbelieae + Bossiaeeae
DIVERSIFICATION OF LEGUMES
Cumulative LTT from the legume chronograms were plotted in Excel using tables of estimated node ages imported from r8s. Outgroups were first deleted to avoid artifacts resulting from undersampling of species (Nee et al. 1995). Significance of shifts in diversification rates over time was tested using the gamma and ML tests implemented in LASER (Rabosky 2006a).
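For readers unfamiliar with the gamma test, the statistic can be computed directly from the node ages of a chronogram. The sketch below assumes that the test follows the standard definition of Pybus and Harvey (2000) and that significance is assessed with a one-tailed test against the standard normal distribution; it is an illustration in Python, not the LASER code, and the function names `gamma_statistic` and `one_tailed_p` are hypothetical.

```python
import math


def gamma_statistic(node_ages):
    """Gamma statistic from the ages of all internal nodes (root included)
    of a bifurcating ultrametric tree, assuming complete sampling."""
    ages = sorted(node_ages, reverse=True)   # root first
    n = len(ages) + 1                        # number of tips
    if n < 3:
        raise ValueError("gamma needs at least three tips")
    bounds = ages + [0.0]                    # append the present
    # weighted[k - 2] = k * g_k, where g_k is the time spent with k lineages
    weighted = [k * (bounds[k - 2] - bounds[k - 1]) for k in range(2, n + 1)]
    total = sum(weighted)
    running = 0.0
    partial = 0.0
    for w in weighted[:-1]:                  # i = 2 .. n-1
        running += w
        partial += running
    numerator = partial / (n - 2) - total / 2
    return numerator / (total * math.sqrt(1 / (12 * (n - 2))))


def one_tailed_p(gamma):
    """Lower-tail probability of gamma under the standard normal."""
    return 0.5 * (1 + math.erf(gamma / math.sqrt(2)))
```

Negative values indicate that nodes are concentrated toward the root relative to the constant-rate pure-birth expectation, and positive values that they are concentrated toward the present. Incomplete sampling biases gamma downward, so a correction would be needed before applying this sketch to a 50% sample.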
Additional simulations were aimed at reconstructing the diversification rate changes observed in the real data. Goodness of fit between the simulated and real LTT plots was assessed visually. For the two best-fitting models, 95 LTT plots (i.e., excluding the outlying 5%) were output from Phyl-O-Gen as graphical vectors and overlaid in a single plot to visualize the stochastic variation.
Results
In general, simulations produced expected results (cf. Harvey et al. 1994; Nee et al. 1995). Constant birth and death rates, with 50% sampling of lineages, gave approximately linear LTT (cf. Fig. 1A and Nee et al. 1995: fig. 11.10). The different models of diversification gave different LTT plots. Simulations having two phases of diversification (slow then fast) gave a linear LTT plot that abruptly increased in slope when the diversification rate increased (Fig. S1, cf. Turgeon et al. 2005). When the fast and slow phases were reversed, the resulting LTT rose linearly then abruptly decreased in slope (cf. Fig. 1C). In simulations with three diversification phases (fast, very slow/stasis, fast), corresponding changes of LTT slope were sometimes evident but often not (e.g., Fig. S2). In contrast, mass extinction models consistently resulted in antisigmoidal LTT curves, i.e., in which the line rose steeply to a plateau, then again rose steeply (Fig. 3). The simulated phylogenies from which these curves were plotted clearly showed a mass extinction signature (Figs. S3 and S4), including a “broom and handle” (Crisp et al. 2004) shape in the reconstructed tree (Fig. S4), i.e., with lineages having long stems and speciose crowns. The LTT curves closely resembled those from mass extinction simulations by Harvey et al. (1994). They also resembled a few of the curves from the three-phase simulations with a stasis period, but in most of the latter no plateau was evident (Fig. S2). See Appendix S1 for further description of simulation results.
DIVERSIFICATION OF LEGUMES
All three genes sampled for Mirbelieae + Bossiaeeae resulted in very similarly shaped phylogenies, having little difference in topology or relative branch lengths. The chronograms (Figs. S5–S8) of each show an initial period of diversification within the ingroup, after which the three main clades (Daviesia, Pultenaea s.l., and Bossiaea) have “broom and handle” shapes with long stems and speciose crowns that are near-polytomous at the base. LTT plots for each chronogram (Fig. 2A–C and S9) are antisigmoidal curves (cf. Figs. 1F and 3), i.e., they rise steeply at first, curve over to a plateau between 50 and 33 Ma, then rise steeply again to the present. These curves were found consistently for all gene regions and rate constancy was rejected (e.g., for the ndhF data: gamma = −2.040, P = 0.021; delta AICrc = 86.19, P < 0.000). The chronogram of the African tribe Podalyrieae (Fig. S10) gave an LTT plot (Fig. 2D) with the same shape, having a plateau between 50 and 30 Ma. Rate constancy was again rejected.
In both the Mirbelieae + Bossiaeeae and the Podalyrieae, the plateau period ceased abruptly as the final diversification period began. Mean estimated starting dates of this diversification period (28.5–34.0 Ma) were similar across all gene regions, with confidence intervals overlapping (Table 1). ndhF gave younger mean dates than the other loci but these estimates could be more reliable because the ndhF data included more outgroups (cf. Figs. S5–S8) and more primary calibration points. Also, alignment offsets in trnL-trnF and ITS probably artificially shortened the deep internal branches in the trees. This would have the effect of drawing the more distal nodes towards the root, resulting in older date estimates for these nodes.
The best-fit LTT simulations were produced by models having two diversification periods punctuated by a 95–97% mass extinction (e.g., Fig. 3). A composite plot shows that, despite stochastic noise, an antisigmoidal curve very similar to those in Figure 2 emerges from multiple simulations (Fig. 4A). In this model, parameter values closely fit those from the empirical data: the number of lineages in the reconstructed trees was 8–10 at the plateau and 350 (being a 50% sample of the 700 lineages in the whole tree) at the end of the final period, and the relative lengths of the three periods were similar (cf. Figs. 2 and 4A). Note that in the LTT plot of a reconstructed tree (i.e., including only lineages that survive to the present), the time of the mass extinction is marked by the end of the plateau, when the diversification rate increases again (compare the whole and reconstructed trees and their LTT in Figs. 3, S3, and S4). The plateau extends back in time from that point because, by chance, a large extinction removes some larger clades from nodes deep in the tree.
Some three-rate models (i.e., fast/slow/fast, with a stasis episode instead of a mass extinction) produced a small proportion of LTT similar to those from the real data but most LTT from these models had different shapes, lacking plateaux (e.g., Fig. S2). Nevertheless, a diffuse antisigmoidal shape is evident within the “noisy” composite plot (Fig. 4B). Simulations using two-rate models (also lacking a mass extinction) did not reproduce the distinctive antisigmoidal LTT given by the real data and showed no evidence of a plateau (e.g., Fig. S1).
Discussion
Given the results of the simulations presented here, what is the most likely explanation for the observed pattern of evolutionary diversification in the legume taxa? The null hypothesis of constant-rate lineage diversification has been clearly rejected. The pattern of change has a distinctive signature that is replicated across two taxa and three loci from two genomes. This signature is not consistent with a simple explosive radiation model having a single increase in the diversification rate; nor is it convex, as expected in a density-dependent adaptive radiation (Harvey et al. 1994; McKenna and Farrell 2006; McPeek 2008). The signature is most closely matched by a model having constant diversification punctuated by a mass extinction. Alternatively, a model with three diversification episodes at different rates (fast/slow/fast) sometimes produces similar LTT and cannot be ruled out by simulation alone.
The diversification changes in the legumes were congruent in both pattern and timing across two major lineages and two continents. These lineages were evolving independently at the time and this strongly implicates an external driver of the changes. An intrinsic trigger, such as a key innovation, would not be expected to arise simultaneously in multiple separate lineages. An external driver would be expected to simultaneously affect multiple lineages and is therefore more likely, given the congruence in our data. Could a cryptic mass extinction be the external driver of the antisigmoidal signature in the legume lineages?
Mass extinction events are marked by the rapid disappearance of biological diversity from the fossil record at significantly more than the background rate. They have been recognized by their impact on animal diversity (Sepkoski 1984; Benton 1995; Jablonski 2005) but loss of plant fossil diversity has been much less evident during these events, possibly reflecting ecological or taphonomic differences between the organisms (Niklas 1997; McElwain and Punyasena 2007). Just after the end of the Eocene, c. 32–30 Ma, a global change in climate occurred, triggered by the final phase in the break-up of the supercontinent Gondwana and the opening of the Southern Ocean (Zachos et al. 2001; Lawver and Gahagan 2003). The climate cooled and became seasonal as the first ice cap formed in Antarctica and the West Wind Drift circulation system established, continuing till today. A hypothesis of mass extinction following major environmental change predicts that multiple taxa occurring in the affected environments should have similar signatures in their molecular phylogenies. We have found similar signatures in three clades within the Australian legumes Mirbelieae + Bossiaeeae (Bossiaea, Daviesia, and Pultenaea s.l.) and an African legume clade (Podalyrieae). A common link between these signatures and the climate change 32–30 Ma is further supported by the inferred timing of the extinction events: c. 28–34 Ma in the Australian groups and c. 30 Ma in the African group. Phylogenies of co-occurring plant taxa in the southern hemisphere, such as Casuarinaceae, show a “broom and handle” signature (Crisp et al. 2004) and should be investigated for evidence of mass extinction in the same period. Alternatively, if McPeek's (2008) meta-community model were applicable, with either ecological or geographic speciation dominating in different lineages, resulting, respectively, in decelerating and accelerating radiations, then co-occurring taxa would not show congruent LTT curves. However, McPeek's model does not include mass extinction episodes.
Associate Editor: J. Huelsenbeck
We thank J. Ross for providing specimens of Bossiaea, D. Morris and H. Reichel for some of the sequences, and R. Carter for running the tests in LASER. This research was supported by grants to both authors from the Australian Research Council.