Inclusion of a near-complete fossil record reveals speciation-related molecular evolution




  1. The rate of genetic evolution is often too variable among lineages to be explained by a strict molecular clock, prompting alternative ecological and evolutionary hypotheses to explain this rate heterogeneity.
  2. One controversial hypothesis is that speciation provokes a burst of rapid genetic change, giving molecular evolution a punctuational component. The amount of root-to-tip genetic change therefore tends to increase with the number of identified speciation events (nodes) along the root-to-tip path in molecular phylogenies. The controversy arises because nodes on molecular phylogenies can typically only be counted if both descendants are extant.
  3. Here, using stratigraphic, phylogenetic and ecological data from the exceptional fossil record of Cenozoic macroperforate planktonic foraminifera, we test whether among-lineage rate heterogeneity is explained by ecological factors (abundance, life history and environment) and by the numbers of speciation events according to fossil lineage, fossil morphospecies and molecular species concepts.
  4. The number of nodes between root and tips on the fossil lineage phylogeny was a statistically significant correlate of the rate of molecular evolution over the same root-to-tip path. The speciation counts from other species concepts and hypothesized ecological drivers had considerably less support.
  5. Our results showcase how the fossil record contains signals of biological processes that drive genetic evolution, justifying calls to further marry fossil and molecular data when studying macroevolution over geological time-scales.

Introduction


Gene sequences evolve along the branches of the developing species phylogeny. The rate of this evolution often shows too much heterogeneity to be compatible with a strict molecular clock, inviting the search for causal factors (Martin & Palumbi 1993; Mooers & Harvey 1994; Lanfear, Welch & Bromham 2010). Explanatory variables identified so far fall into three broad categories (reviewed by Lanfear, Welch & Bromham 2010; Gaut et al. 2011): life history (e.g. generation time, effective population size: Martin & Palumbi 1993; Mooers & Harvey 1994; Bromham 2011), environment (e.g. environmental energy, Davies et al. 2004) and clade diversity (Barraclough & Savolainen 2001; Pagel & Meade 2006; Goldie, Lanfear & Bromham 2011; Lawrence et al. 2012). When clade diversity was first shown to correlate with rates of molecular evolution, the direction of causality was left open (e.g. Barraclough, Harvey & Nee 1996). More recent papers argue that a large proportion of molecular evolution, across many phylogenies of different clades, is associated with speciation events, prompting the suggestion that speciation per se causes a burst of genetic evolution (Webster, Payne & Pagel 2003; Pagel, Venditti & Meade 2006; Venditti, Meade & Pagel 2006). Theoretical models exploring the time to speciation have, particularly, focused on the factors that affect fixation probability of mutations, including postzygotic isolation (Orr & Turelli 2001) and population subdivision to reduce effective population size (Orr & Orr 1996; Gavrilets 1999). Regardless of whether molecular evolution drives speciation or vice versa (Barraclough, Harvey & Nee 1996), the correlation between diversification and the rate of molecular evolution may, of course, simply reflect some codependent third (or other) variable.

Empirical tests of the cause of the correlation between species’ phylogenetic path length (i.e. the amount of molecular evolution inferred to have taken place along the root-to-tip path) and the number of nodes along said path are contentious, however. There are two main objections. First, the rate of gene sequence evolution inferred from a molecular phylogeny is prone to a long-recognized artefact (e.g. Goodman et al. 1974) known as the node density effect (Sanderson 1990): the amount of change inferred will tend to increase with the number of nodes between the root of the phylogeny and the terminal extant species. The second objection is that molecular phylogenies only contain direct information about speciation events whose daughters both leave extant descendants. For all but the most recent clades, this is likely to be a small fraction of all speciation events. This difficulty is exacerbated if the relative diversities of subclades have changed over time (e.g. Ezard et al. 2011; Jetz et al. 2012) and model fit fails to acknowledge this. The consequence is to reduce any correlation between current diversity and numbers of speciation events (this can also occur if the phylogeny does not contain all relevant extant species: Witt & Brumfield 2004).

A diverse extant clade with complete genome coverage and a complete fossil record would provide the ideal testing ground for resolving the controversy about the role of speciation in gene sequence evolution: the total number of speciation events in each extant species’ history could be counted and compared against the explanatory power of other proposed ecological, life history and/or environmental correlates of molecular evolution. Competing hypotheses could be distinguished using multiple regression.

Cenozoic macroperforate planktonic foraminifera possibly provide, at present, the closest approximation to this ideal. This clade of open-ocean plankton has an exceptional fossil record, due to extremely large population sizes (Norris 2000), extremely widespread species distributions (Hemleben, Spindler & Anderson 1989) and calcium carbonate ‘shells’ (tests) that record morphology, ecology (e.g. Coxall et al. 2000) and climate (e.g. Pearson et al. 2001). Their fossils form thick deposits over much of the open-ocean floor and have been sampled widely from different ocean basins and geological epochs by the Integrated Ocean Drilling Program and its predecessors; these cores provide the foundation of our understanding of palaeoclimate (Zachos et al. 2001). The results from this comprehensive sampling have been captured in the Neptune database (Lazarus 1994; Spencer-Cervato 1999), which therefore contains data to estimate how geographical distributions have changed over the ancestry of each extant species (Liow et al. 2010). Aze et al. (2011) recently published a complete species-level estimate of the clade's phylogeny from only fossil data, inferring species’ functional ecologies and the relationships among morphospecies and underlying evolutionary species lineage (Stanley 1979). Here, we test speciational, morphological, ecological and environmental hypotheses of what drives molecular evolution.

Materials and methods

Fossil phylogenies

We use a complete phylogeny of macroperforate planktonic foraminifer species of the Cenozoic Era, constructed exclusively from fossil data to provide palaeontologically calibrated ages for every divergence within the clade that are independent of molecular data (Aze et al. 2011). The fossil phylogenies were constructed using typical microfossil practice. This means a (more or less) literal reading of the fossil record to assign specimens to morphospecies, that is, species-level taxa identified from morphology (Pearson 1998a). Aze et al. (2011) recognized 339 morphospecies and then aimed to eliminate ‘pseudospeciation’ and ‘pseudoextinction’ (Stanley 1979), which arise when gradual evolution causes morphospecies to intergrade through time, by constructing another related phylogeny based on inferred evolutionary species. The lineage phylogeny of 210 species serves as the basis for our palaeontologically calibrated ages. This traditional approach to phylogenetic reconstruction is facilitated by the completeness of this group's fossil record: species-level lineages in this group have at least an 81% chance of being detected per million year interval (Ezard et al. 2011), making this species-level fossil record at least as complete as the best genus-level records of macroinvertebrates (Foote & Sepkoski 1999). Given this exceptional species-level completeness, the phylogeny of evolutionary species provides more comprehensive counts of cladogenetic (lineage-splitting) events in each extant species’ history than the typical molecular restriction of extant descendants.

Molecular phylogenies

Throughout the manuscript, we refer to species by the current names only but used both new and old conventions to obtain all possible data from GenBank. Of the 34 extant lineages identified morphologically from fossil data, 17 were represented in GenBank (Fig. 1) at sufficient quality to permit analysis and construction of molecular phylogenies. Nearly all of the DNA sequence data available for the clade are from a single gene, small subunit ribosomal RNA (SSU rDNA, e.g. Darling & Wade 2008; Ujiie & Lipps 2009). We constructed molecular phylogenies from data accessed in GenBank, discarding sequences shorter than 500 base pairs, likely misidentifications, and other specimens previously declared problematic (Aurahs et al. 2009a; Göker et al. 2010). We constructed our trees using 138 samples (Table S1) by, wherever possible, selecting 10 individuals at random from each morphological species to account for the computational demands of the Q-INS-i alignment algorithm (see below). To avoid within-species node density effects, we constructed data sets with one randomly selected sample from each species. In this way, we built 100 molecular phylogenies constrained to the fossil topology and 100 unconstrained molecular phylogenies.
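
The subsampling scheme described above can be sketched in a few lines of Python. This is an illustration only, not the script used for the analyses: the record layout (accession, species, sequence) and the function name build_replicates are hypothetical, and the filters for misidentified or otherwise problematic specimens are omitted.

```python
import random
from collections import defaultdict

MIN_LENGTH = 500       # sequences shorter than this are discarded (from the text)
MAX_PER_SPECIES = 10   # cap on individuals retained per morphological species
N_REPLICATES = 100     # number of one-sequence-per-species data sets


def build_replicates(records, seed=0):
    """Build replicate data sets with one accession per species.

    records: iterable of (accession, species, sequence) tuples
    (a hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    by_species = defaultdict(list)
    for accession, species, sequence in records:
        if len(sequence) >= MIN_LENGTH:
            by_species[species].append(accession)

    # Retain at most MAX_PER_SPECIES randomly chosen individuals per species.
    pool = {
        species: rng.sample(accessions, min(MAX_PER_SPECIES, len(accessions)))
        for species, accessions in sorted(by_species.items())
    }

    # Each replicate holds exactly one randomly selected accession per species.
    return [
        {species: rng.choice(accessions) for species, accessions in pool.items()}
        for _ in range(N_REPLICATES)
    ]
```

Drawing a single sequence per species in each replicate is what removes within-species nodes from the trees; repeating the draw 100 times propagates the sampling uncertainty into the downstream statistics.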

Figure 1.

The Aze et al. (2011) lineage fossil phylogeny of evolutionary species of Cenozoic macroperforate planktonic foraminifera with the 17 species sampled from GenBank in thick black lines and their ancestors in dashed black lines. The time-scale is from Berggren et al. (1995). Drawn using paleoPhylo (Ezard & Purvis 2009).

Alignment of SSU rDNA sequences is not straightforward within this group as sequence length is highly variable (Darling & Wade 2008). These variable length ‘expansion sequences’ complicate the construction of large-scale phylogenies. Previous strategies have included aligning only the most strongly conserved ‘stem’ regions (e.g. Darling et al. 2000) and a multi-analysis approach in which alignment parameters were varied systematically (e.g. Aurahs et al. 2009a). Careful alignment is especially important when estimating rates of molecular evolution (Lanfear, Welch & Bromham 2010), so, for the first time in this group, we used structural alignment methods (Q-INS-i algorithm implemented in MAFFT, Katoh & Toh 2008). Q-INS-i uses an iterative fold-then-align procedure (Gardner & Giegerich 2004) to estimate and account for common secondary structures among unaligned sequences. The alignment was then inspected by eye, revealing one probable misalignment (Turborotalita quinqueloba accession number AY241710), which we excluded from all subsequent analyses.

All phylogenetic inference was conducted using BEAST 1.7.4 (Drummond et al. 2012). For the trees with one representative from each of the 17 lineages, we used a GTR + Γ model of sequence evolution, an exponential relaxed molecular clock (Drummond et al. 2006), Yule tree prior and two independent MCMC runs of 5 million generations each. For the constrained phylogenies, we used the fossil lineage phylogeny given in Aze et al. (2011) to fix the tree topology and to constrain node ages, assuming that node ages have small, uniform uncertainty. The dating of the sample at which the speciation is thought to have taken place has an approximate uncertainty of 100 000 years; this figure should be interpreted as an estimate of the lower bound (e.g. Norris & Hull 2012). Speciation could have taken place earlier but been missed because it occurred in an unsampled part of the world (unlikely to cause a long delay, given the geographical distribution of sampling) or because morphological differentiation only arose after speciation.

Burn-in, convergence and mixing for all parameters were evaluated by reference to the effective sample size and by visual inspection of trace plots in Tracer 1.5. We discarded the first 1000 trees (10%) from each run as a conservative burn-in and summarized the combined posterior distribution as a single maximum clade credibility (MCC) tree, with branch lengths calculated by averaging across the posterior distribution. All trees, alignments, BEAST inputs and our data frame are available on FigShare via
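
As a rough illustration of the burn-in step, the sketch below discards the first 10% of samples from each run, pools the remainder and averages each branch length. It is not the BEAST tooling itself: the function names and the representation of a posterior sample as a dictionary of branch lengths are hypothetical, and the per-branch average assumes a fixed topology, as in the constrained analyses. If 1000 trees are 10% of a run, each run holds 10 000 sampled trees, which over 5 million generations would imply sampling every 500 generations; that interval is inferred here, not stated in the text.

```python
def discard_burnin(runs, burnin_percent=10):
    """Drop the first burnin_percent of samples from each run and pool the rest.

    runs: list of runs, each a list of posterior samples, where a sample is
    a dict mapping a branch label to its length (hypothetical layout).
    """
    kept = []
    for samples in runs:
        n_burn = len(samples) * burnin_percent // 100  # integer arithmetic
        kept.extend(samples[n_burn:])
    return kept


def mean_branch_lengths(samples):
    """Average each branch length over pooled samples (fixed topology assumed)."""
    branches = samples[0].keys()
    return {b: sum(s[b] for s in samples) / len(samples) for b in branches}
```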

Explanatory variables

Phylogenetic path length, that is the sum of molecular evolution between the root and the focal tip, is an integrative measure over the whole history of the clade (Welch & Waxman 2008). We have therefore made use of the rich fossil record for this group to develop similarly integrative explanatory variables, summarizing the history along each extant species’ ancestry. Such integrative variables are expected from first principles to produce greater statistical power than variables taken solely from extant representatives (Welch & Waxman 2008), but are seldom available. Our explanatory variables were as follows (bold words refer to results in Table 1):

Table 1. AICc (Akaike information criterion corrected for small sample size) scores between the peaks of kernel density plots for all hypothesized drivers of molecular evolution

                              Unconstrained       Constrained
Model                         AICc    ΔAICc       AICc    ΔAICc
Fossil lineage nodes          5·02    1·77        26·18   0
Extant sampled nodes          3·25    0           29·55   3·37
Fossil morphospecies nodes    11·65   8·40        30·13   3·95
Ecogroup changes              11·67   8·42        30·65   4·47
Morphological changes         11·33   8·08        31·28   5·10
  1. Speciation events between the root of the phylogeny and the extant tips. Fossil lineage nodes (Aze et al. 2011) represent our best estimates of numbers of true speciation (i.e. cladogenetic) events in each species’ ancestry. We also calculated fossil morphospecies nodes (Aze et al. 2011) to give a coarse proxy for rates of phenotypic evolution and extant sampled nodes from a phylogeny of extant species only.
  2. Speciation in this clade typically appears asymmetric, in that one descendant is markedly more distinct from the ancestor than the other (Pearson 1998b; Aze et al. 2011). We therefore counted, on the lineage phylogeny, the proportion of nodes throughout each species’ ancestry when the focal species was a new, descendant species and not the persistent ancestor. A strong correlation between this descendance and phylogenetic path length would be consistent with the hypothesis that bursts of molecular evolution occur when species are at lower effective population sizes, as might be expected in the newly formed daughter at each speciation event.
  3. If rapid genetic change is associated with rapid adaptive evolution, then a key factor might be how many times each lineage has changed its niche. Ecogroups are defined by a combination of carbon and oxygen stable isotope ratios (reviewed by Aze et al. 2011) and reflect the depth habitats that extinct species are inferred to have inhabited. Following Ezard et al. (2011), we classified morphospecies as inhabiting mixed, thermocline and subthermocline layers. Morphogroups are defined by the presence or absence of three morphological innovations: photosynthetic algal symbionts, spines and keels (reviewed by Aze et al. 2011), all of which affect foraging behaviour (Hemleben, Spindler & Anderson 1989). We therefore counted the number of ecogroup changes and morphogroup changes, including reversals, between the root and tip of the fossil lineage phylogeny. These changes usually, but not always, occur at nodes.
  4. Temperature (Davies et al. 2004): Many species in the clade live in the surface mixed layer of tropical or subtropical water masses, where waters are warm and individuals are potentially exposed to mutagenic UV radiation. Other species are found predominantly in cooler, lower-energy environments – higher latitudes or deeper water. We combined the ecological and phylogenetic data for the ancestral morphospecies to compute the proportion of time when ancestral species were associated with high-energy environments (ecogroups 1, 2 and 6 in Aze et al. 2011).
  5. Ecological Dominance: Ecologically dominant, abundant, widespread species might evolve at different rates from rarer species with a more restricted distribution. After synonymizing taxonomic names in Neptune (Lazarus 1994; Spencer-Cervato 1999) with those in Aze et al. (2011), we calculated – within each 1 My temporal bin from 65 Ma to the present – the proportion of sampled sites that contained ancestors of each extant lineage. Averaging these values across the 65 bins then gave a prevalence value for the history of each extant lineage. Neptune usually reports abundance on an ordinal scale (from rarest to most common): present, trace, rare, frequent, common, abundant, dominant. For each extant species, we estimated abundance as the proportion of occurrences along the lineage in which the species or its ancestor was at least ‘common’.
  6. Spinosity is associated with ecological and life-history variations (Hemleben, Spindler & Anderson 1989). Spinose species often contain photosymbionts and probably have generations of 2–4 weeks, whereas nonspinose species may have longer generation times (Hemleben, Spindler & Anderson 1989). In contrast to the other ecological variables, spinosity has been strongly phylogenetically conserved: eight of the 17 species in our study have not had spines at any point in their Cenozoic ancestry, whereas the remaining nine evolved spines around 64·85 Mya. We therefore score this as a binary trait.
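
Several of the variables listed above reduce to simple summaries along each extant species' root-to-tip path. The following Python sketch shows one way they might be computed; it is not the code used for the analyses. All function names and input layouts are hypothetical, counting the root as a node is a convention assumed here, and every time bin is assumed to contain at least one sampled site.

```python
ABUNDANCE_SCALE = ["present", "trace", "rare", "frequent",
                   "common", "abundant", "dominant"]  # rarest to most common


def root_to_tip(tip, parent, branch_length):
    """Return (path length, node count) between the root and `tip`.

    parent maps each node to its ancestor (the root has no entry);
    branch_length maps each node to the length of the branch above it.
    """
    length, nodes, node = 0.0, 0, tip
    while node in parent:
        length += branch_length[node]
        node = parent[node]
        nodes += 1  # one ancestral node reached per step, root included
    return length, nodes


def descendance(is_new_descendant):
    """Proportion of nodes at which the focal lineage was the new descendant."""
    return sum(is_new_descendant) / len(is_new_descendant)


def count_changes(states):
    """Number of state changes along a root-to-tip sequence, reversals included."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


def prevalence(bins):
    """Mean proportion of sampled sites containing the lineage.

    bins: one (sites_with_lineage, sites_sampled) pair per 1 My bin.
    """
    return sum(found / sampled for found, sampled in bins) / len(bins)


def abundance(occurrences, threshold="common"):
    """Proportion of occurrences recorded as at least `threshold`."""
    cutoff = ABUNDANCE_SCALE.index(threshold)
    hits = sum(1 for o in occurrences if ABUNDANCE_SCALE.index(o) >= cutoff)
    return hits / len(occurrences)
```

For example, count_changes(["mixed", "thermocline", "mixed"]) counts two ecogroup changes, because the return to the mixed layer is a reversal and is counted in its own right.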

Statistical analyses

We ran all models independently over each of the 100 constrained and 100 unconstrained trees and so obtained distributions of statistical quantities. We regressed root-to-tip phylogenetic path length (measured as substitution rate) for the 17 lineages against the explanatory variables detailed in the previous subsection using phylogenetic generalized least squares (PGLS) models from the caper package in R (v. 2.15.2, R Core Team 2012). We constructed a full model, a null model and univariate models for each explanatory variable. The null model assumes that speciation has no impact on molecular evolution. No interactions were fitted due to low statistical power because only 17 phylogenetically dependent data points are available. We used the Akaike information criterion (AIC) corrected for small sample size (AICc, Burnham & Anderson 2002) for model selection. Burnham & Anderson (2002, p. 71) consider models within 2 AIC units (ΔAIC < 2) of the minimum AIC all to have ‘substantial’ support, while a model with ΔAIC > 4 has ‘considerably less’ support. As the distributions of AICc scores are not well described by parametric distributions, we take the AICc score at the highest point of the kernel density as the most likely value.
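
The model-selection quantities described above can be illustrated with a short Python sketch. It is not the R code used for the analyses: the function names are hypothetical, and the Gaussian kernel with a normal-reference bandwidth is an assumption, because the text does not state which kernel or bandwidth was used.

```python
import math


def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction: AIC + 2k(k + 1) / (n - k - 1)."""
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def kde_mode(values, grid_size=512):
    """Location of the highest point of a Gaussian kernel density estimate."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)  # normal-reference bandwidth (assumed)
    if h == 0.0:
        return values[0]  # all values identical
    lo, hi = min(values) - 3 * h, max(values) + 3 * h
    best_x, best_density = lo, -1.0
    for i in range(grid_size):
        x = lo + (hi - lo) * i / (grid_size - 1)
        # Unnormalized density; the normalizing constant does not move the peak.
        density = sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def delta_scores(peaks):
    """Differences between each model's peak AICc and the lowest peak."""
    best = min(peaks.values())
    return {model: score - best for model, score in peaks.items()}
```

In use, kde_mode would be applied to the 100 AICc scores that each model produces across the replicate trees, and delta_scores to the resulting peaks, so that the ΔAICc thresholds of 2 and 4 are applied to differences between peaks.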


Results

When the molecular phylogeny is constrained to the fossil lineage topology, the strongest correlate of molecular evolution in extant species was the number of nodes between root and tip on the fossil lineage phylogeny (Table 1). The difference in AICc scores between the fossil lineage nodes model and the null model is statistically significant (ΔAICc = 2·56), as are the differences from the fossil morphospecies nodes (ΔAICc = 3·95) and extant sampled nodes (ΔAICc = 3·37) models. The ecological hypotheses that act as potential ‘third’ variables all had considerably less support (i.e. ΔAICc > 4, Table 1). To test the robustness of these results, we also tested SSU rDNA phylogenies that were not constrained to the fossil topology. In this instance, the model with the lowest AICc was actually extant sampled nodes, followed by the fossil lineage nodes (ΔAICc = 1·77), which also had substantial support. All remaining models, including the null and all ecological models, had considerably less support (Table 1). The distributions of AICc scores had less variability in constrained than unconstrained phylogenies (Fig. 2). As we shall see, however, the significant correlation between extant sampled nodes and the rate of molecular evolution does not match expectations as clearly as it might first appear.

Figure 2.

Kernel density of Akaike information criterion corrected for small sample size (AICc) for the null model (dark grey), fossil lineage nodes (blue), fossil morphospecies nodes (light blue) and sampled extant nodes (orange) when constrained (a) or not constrained (b) to the topology of the fossil lineage phylogeny. The vertical lines denote the highest point on the kernel density, which we interpret as the most likely AICc score.

From first principles, we expect this correlation to be positive if rates of molecular evolution are indeed higher at and around speciation. The node counts from different species concepts are correlated among themselves, but only moderately so (Fig. 3). For example, Globigerina bulloides and Globoconella inflata both have four nodes between root and tip on the molecular phylogeny, but the corresponding numbers on the fossil lineage phylogeny are 6 and 12. Orbulina universa has seven nodes between root and tip on the molecular phylogeny, but 12 – the same as G. inflata – on the fossil lineage phylogeny.

Figure 3.

Correlated node counts (speciation events) across the three species concepts considered. The Spearman rank correlation is 0·66 between fossil lineage and fossil morphospecies phylogenies, 0·69 between fossil lineage and molecular phylogenies and 0·35 between fossil morphospecies and molecular phylogenies. Darker grey indicates more than one data point with the same values.
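
For readers unfamiliar with the statistic, a Spearman rank correlation is a Pearson correlation computed on ranks, with tied values sharing their average rank. The Python sketch below is a generic illustration with hypothetical function names, not the code used to produce Fig. 3.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```

Applied to the three species named in the text (molecular node counts 4, 4 and 7; fossil lineage node counts 6, 12 and 12), spearman returns 0.5. That value describes only this three-species subset and is not expected to match the correlation of 0·69 reported for all 17 lineages.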

Root-to-tip molecular evolution does indeed correlate positively with fossil lineage node counts on both constrained and unconstrained phylogenies (Fig. 4a, d). Correlations with fossil morphospecies node counts were marginally negative but not significantly so for either topology (Fig. 4b, e). For sampled extant node counts, this correlation is positive on the constrained phylogenies, but negative on unconstrained phylogenies (Fig. 4c, f). The AICc scores for fossil morphospecies on both phylogenies and sampled extant counts on unconstrained phylogenies (Table 1) might therefore be interpreted as overestimates due to the observed negative correlations (Fig. 4). The model-averaged parameters represent a marginally statistically significant effect of the fossil lineage model for constrained phylogenies (Fig. 4a), and it is only the fossil lineage model that returns positive correlations with the rate of molecular evolution for both constrained and unconstrained phylogenies (Fig. 4).

Figure 4.

Scatter plots of the three node counts and mean branch length over the 100 constrained (top row) and 100 unconstrained (bottom row) trees, with model-averaged mean PGLS coefficients, standard errors, P-values and mean r² in the legend. Error bars are 95% parametric confidence intervals.

Discussion


Diversification rate has often been observed to correlate with root-to-tip molecular evolution (Barraclough, Harvey & Nee 1996; Webster, Payne & Pagel 2003; Pagel, Venditti & Meade 2006; Eo & DeWoody 2010; Lanfear et al. 2010; Venditti & Pagel 2010). The underlying assumption of such correlations, typically restricted to comparisons among extant species only, is that observed node counts from molecular trees are tightly correlated with the actual node count, despite being incomplete due to the absence of extinct species. Here, the number of speciation events on the fossil lineage phylogeny correlated positively with the rate of molecular evolution, insofar as this can be quantified by a single gene, unlike the corresponding counts from the fossil morphospecies or molecular phylogenies (Fig. 4). The unconstrained molecular phylogeny node count had the lowest AICc score among the three species concepts (Table 1, Fig. 2b), but this arose from a negative correlation (Fig. 4f). We are unaware of any evolutionary mechanism that could generate a negative correlation, but incomplete and/or nonrandom sampling among extant species is a potential statistical mechanism. The positive correlation between fossil lineage node counts and the rate of molecular evolution has similar statistical support to extant node counts for unconstrained trees and is the outstanding candidate for constrained trees (Fig. 2, Table 1), so we interpret this as our best performing explanatory variable. One explanation for the positive correlation between fossil lineage node counts and the rate of molecular evolution is that these counts are, presumably, the best estimate among the three species concepts of the true number of cladogenetic events along the same phylogenetic path (Quental & Marshall 2010).

Why might numbers of speciation events matter? Rather than driving molecular evolution, speciation could merely correlate with another variable, measured or not, that is the true cause of both elevated diversification and higher rates of molecular evolution. We attempted to quantify many of the proposed third variables that could drive both speciation and molecular evolution, but none provided any improvement over the null model (Table 1). Our time-averaged explanatory data offer no support for the hypothesis that high temperatures promote rapid change in this group (Allen et al. 2006; though note they analysed morphospecies), as they appear to in other clades (Davies et al. 2004), nor for any effect of prevalence or abundance. We might have also envisaged support for the impact of effective population size via the descendance variable, assuming that newly formed species initially have lower abundance than their ancestors, but none was detected (Table 1). Changes in the depth or timing of gamete release have the potential to cause reproductive isolation (Norris 2000), and the life-history characteristics of planktonic foraminifera (vertical migration and synchronized release of large numbers of isogamous gametes: Hemleben, Spindler & Anderson 1989) suggest that decreased reproductive output when locally rare (Allee effects) could be catastrophic in this clade. Generation time is perhaps the predictor with most empirical support to date (Bromham 2011), but current data for this clade are too coarse and incomplete, especially for the deep-dwelling globorotaliids, to permit a meaningful test. Our failure to find statistical support for these proposed correlations does not mean that they are not actually the true drivers of molecular evolution or speciation (but see Lanfear et al. 2010). If we assume that the lack of support for third variables is robust, this leaves the possibility that speciation drives rates of molecular evolution, or vice versa. We cannot separate these hypotheses with current data. We constructed integrative explanatory variables to sum variation throughout the history of the clade (Welch & Waxman 2008), but a potential consequence of such conglomerate constructs is to dilute the impact of short bursts of environmental volatility. If these short bursts leave long-lasting evolutionary signatures via the fixation of particular mutations or speciation events or both, then analysis in a stochastic framework with a temporally autocorrelated random process could prove insightful.

The clade's enviable fossil record makes them ideal for palaeontological analyses of wider biological interest (Wei & Kennett 1988; Schmidt et al. 2004; Ezard et al. 2011). In other important respects, the group is far from a model system for evolutionary biology, however. Nearly all the current sequence data are from a single gene and only available for around half of the extant species (Darling & Wade 2008; Aurahs et al. 2009b). Furthermore, SSU rDNA often underestimates true levels of biodiversity (at least in terrestrial species: Tang et al. 2012), making work to sequence additional genes (Longet & Pawlowski 2007) particularly urgent. These limitations reemphasize how important fossils can be when inferring evolutionary relationships (Quental & Marshall 2010; Slater, Harmon & Alfaro 2012), particularly those close together deep in the clade's history (Rokas & Carroll 2005). Both molecular and palaeobiological approaches to identify the drivers of diversification attempt to do the same thing, albeit at different scales, so the correlation between fossil lineage node counts and rate of molecular evolution might be anticipated assuming that morphological variation represents the outcome of multilocus genetic variation related to the SSU rDNA divergence (Will, Mishler & Wheeler 2005). As with previous comparisons of molecular and fossil phylogenies (reviewed in Aze et al. 2011), there were substantial differences between trees obtained by constraining or by not constraining the SSU rDNA tree to the fossil topology: an SH test (Shimodaira & Hasegawa 1999) rejected the fossil topology for each of our 100 unconstrained trees (median log-likelihood difference = 70·6).

Several extant morphologically defined species contain deep divergences, often reflecting biogeographical differences, suggesting that many morphologically defined species may be more inclusive than genetic species (see Darling & Wade 2008 for a review). A serious issue for palaeontological approaches is the nonuniform distribution of cryptic genetic diversity across the phylogeny (Darling & Wade 2008). Although advances in laboratory techniques hold great promise for extracting more reliable genetic material and for filling gaps in our knowledge of the life history and ecology of certain species (Hemleben, Spindler & Anderson 1989), such advances should not render palaeobiological approaches irrelevant. Andre et al. (2013) showed extremely reduced genetic variation in four morphospecies within the Globigerinoides sacculifer plexus and concluded there had been taxonomic over-interpretation. However, these four morphospecies are considered part of the same biological lineage (Aze et al. 2011), and the asserted morphological diversification is associated with developmental plasticity (Bijma et al. 1992), so is not expected to be underpinned by genetic differentiation.

If different species concepts are used, then estimates of speciation (Fig. 3), extinction and diversification rates change (Ezard et al. 2012) because different species concepts represent different ideas of what speciation and extinction mean (Stanley 1979). Unlike the fossil morphospecies phylogeny, the fossil lineage phylogeny of evolutionary species aimed to exclude ‘pseudospeciation’ and ‘pseudoextinction’ (Stanley 1979). Pseudospeciation is not ‘biological speciation’ because the evolution is anagenetic rather than cladogenetic. Although counts of speciation events on fossil morphospecies, fossil lineage and molecular phylogenies are correlated (Fig. 3), their explanatory power differs strikingly (Table 1, Fig. 4). Our analyses support the idea that molecular evolution is associated with increased speciation along a root-to-tip phylogenetic path if an evolutionary species concept is used, but also suggest that it is challenging to obtain satisfactory estimates using only sequence data because of missing speciation events (Table 1, Fig. 4). Analytical methods have been developed to compensate for these missing speciation events (Bokma 2008; Ingram 2011), but their assumptions about how clades diversify are seldom if ever met (Mooers et al. 2007; Rabosky & McCune 2010; Ezard et al. 2011; Etienne et al. 2012), and molecular phylogenies can provide only a partial test of them.

This is the first attempt to separate numbers of speciation events in lineages' history from numbers of nodes in molecular phylogenies. An interesting comment on the correlations (Figs 2 and 4) is that the fossil phylogenies were constructed in a traditional, ‘expert opinion’ way (a more or less literal reading of this group's exceptional fossil record), whereas the molecular phylogenies used modern statistical methods. Cenozoic macroperforate planktonic foraminifera have a track record for enabling the integration of genetic and morphological data to delimit cryptic diversity (Huber, Bijma & Darling 1997; de Vargas et al. 1999), and developments in both molecular and palaeontological disciplines will push evolutionary biology towards the desired interdisciplinary, integrated taxonomy (Vogler & Monaghan 2007). Our results exemplify how the number of speciation events on fossil lineage phylogenies can increase our understanding of how, if not why, speciation disrupts molecular evolution (Fig. 4).

Acknowledgements


We thank Tim Barraclough, Luke Harmon, Pincelli Hull, Gene Hunt, Arne Mooers, Paul Pearson, Graham Slater and an anonymous reviewer for helpful comments on earlier drafts and the editors for their patience, all of which improved our manuscript considerably. GT is funded by NERC Fellowship NE/G012938/1 and THGE by NERC Fellowship NE/J018163/1. We are not aware of any conflict of interest to declare.