History is written by the victors: The effect of the push of the past on the fossil record

Abstract Survivorship biases can generate remarkable apparent rate heterogeneities through time in otherwise homogeneous birth‐death models of phylogenies. They are a potential explanation for many striking patterns seen in the fossil record and molecular phylogenies. One such bias is the “push of the past”: clades that survived a substantial length of time are likely to have experienced a high rate of early diversification. This creates the illusion of a secular rate slow‐down through time that is, rather, a reversion to the mean. An extra effect increasing early rates of lineage generation is also seen in large clades. These biases are important but relatively neglected influences on many aspects of diversification patterns in the fossil record and elsewhere, such as diversification spikes after mass extinctions and at the origins of clades; they also influence rates of fossilization, changes in rates of phenotypic evolution and even molecular clocks. These inevitable features of surviving and/or large clades should thus not be generalized to the diversification process as a whole without additional study of small and extinct clades, and raise questions about many of the traditional explanations of the patterns seen in the fossil record.

The patterns of diversity through time have been of continuous interest ever since they were broadly recognized in the 19th century (e.g., Phillips 1840). In particular, both major radiations (such as the origin of animals (Budd and Jensen 2000) or angiosperms (Sanderson and Donoghue 1994)) and the great mass extinctions (e.g., the end-Permian (Erwin 1993) or end-Cretaceous (Friedman 2010; Hull et al. 2011)) have attracted much attention, with an emphasis on trying to understand the causal mechanisms behind these very striking patterns. For example, the "Cambrian Explosion" and "Great Ordovician Biodiversification Event" have both been discussed at great length, with suggested mechanisms ranging from the cooling of the Earth to bombardment with cosmic rays or secular changes in developmental mechanisms (Smith and Harper 2013). However, in the midst of this search, the effects of survival biases on creating the patterns under consideration have hardly been considered. The last decades have also seen a great deal of interest and work on mathematical approaches to diversification and extinction (e.g., Stadler et al. 2014; Wang et al. 2013; Lieberman 2001; Ezard et al. 2012), including some that touch on the topics considered by this article (see especially Mooers et al. 2011; Stadler and Steel 2012; Ricklefs 2007; and Stadler 2013), but there is hardly any literature on the dynamics of clade origins from the perspective of survival biases and their effect on the fossil record. In this article, then, we wish to explore the basis for such biases and then consider how they carry over to various important aspects of the observed large-scale patterns of evolution, with particular (but not exclusive) focus on the sort of data that can be extracted from the fossil record. In the following analyses, we calculate diversification over an interval of 0.1 Myrs and plot graphs with an interval of 2 Myrs.
We term the (often unvarying) rate of a particular type of event (extinction or speciation) in a model as the "background" rate; and the rate of such events measured during a particular interval of time as the "observed" rate. For example, the background rate of rolling a six using an unbiased 6-sided die is 0.17; but if in seven trials five sixes happened to be rolled, the observed rate would be 0.71.
Our mathematical models are implemented in R (R Core Team 2017) and the code for this article is available in the Supporting Information.
The "push of the past"
Nee and colleagues (Nee et al. 1994b; Harvey et al. 1994) summarized the general mathematics of stochastic birth-death models as applied to phylogenetic diversification (see also especially Ricklefs 2007). In such models, each lineage has a certain chance of either disappearing ("death," the rate of which is usually labeled "μ") or splitting into two ("birth," rate "λ"). Many models that consider diversity in this way have used constant birth and death rates that have revealed much of interest about phylogenetic processes (Nee 2001; Nee 2006). As Nee et al. (1994b) pointed out, conditioning clades on survival to the present generates two biases in the rate of diversification through time: the "push of the past" (POTPa) and the "pull of the present" (POTPr).
The POTPa emerges as a feature of diversification by the fact that all modern clades (tautologically) survived until the present day. This singles them out from the total population of clades that could be generated from any particular pair of background birth and death rates: clades that happened by chance to start off with higher net rates of diversification have better-than-average chances of surviving until the present day. As Nee et al. (1994a) put it, such clades "got off to a flying start"; and they accumulate species faster than one would expect. Once clades become established, they are less vulnerable to random changes in the observed diversification rate, and this value therefore tends to revert to the background rate through time. Long-lived clades thus tend to show high observed rates of diversification at their origin, which then decrease to their long-term average as the present is approached. It is important to note that such an effect is only seen in the rates of appearance of total species through time, not in the rate of appearance of lineages. A similar effect should apply to now extinct clades (such as trilobites) that nevertheless survived a substantial length of time. This effect is analogous to the "weak anthropic principle," which contends that only a certain subset of possible universes, that is those with particular initial conditions, could generate universes in which humans could evolve in order to experience them. Similarly, to ask the question why living clades appear to originate with bursts of diversification that then moderate through time is to miss the point that this pattern is a necessary condition for (most) clades to survive until the present day.
The "pull of the present" (POTPr) is, conversely, an effect seen in the number of lineages through time that will eventually give rise to living species, which is effectively what is being reconstructed with molecular phylogenies. As the present is approached, the number of lineages leading to recent diversity should increase faster than the background rate of diversification because less time is available for any particular lineage to go extinct. Thus, the POTPa affects reconstructions of diversity through time; and the POTPr affects the number of ancestral lineages through time (Fig. 1A).
Despite the theoretical considerations above, molecular phylogenies, far from showing a pronounced POTPr, typically show either no change in observed diversification rate as the present is approached, or even a marked slow-down in rate, the opposite effect to what might be expected. Why this might be the case has been the subject of intensive research over the last few years, with various models being proposed, the most important of which are the "protracted speciation" model of , and various proposals that carrying capacities of environments lead to "diversity-dependent diversification" (DDD) (see Rabosky 2013 for review; see also Cusimano and Renner 2010 for skepticism about the reality of the effect). Diversification patterns thus potentially show two sorts of slow-down: one after the initial burst of diversification, and one as the present day is reached. This article deals with the first of these effects. When clades themselves had a recent origin, these two effects can of course be confounded, so in our discussion we largely confine ourselves to old clades whose "beginning" and "end" are clearly separated.
Although the POTPa is a known effect (albeit one sometimes confused with the "Large Clade Effect" that we discuss below; e.g., Wahlberg et al. 2009; cf Ricklefs 2007), and emerges naturally from conditioning on clade survival to the present (see e.g., Stadler 2010; Stadler et al. 2015), it has had no penetration into the paleontological literature, where it is most important. In this article, therefore, we wish to show how influential it and related effects are, and to discuss their implications for general discussions about the reasons behind typical patterns of diversification seen in the fossil record and molecular phylogenies, including guidelines for how they might be detected.

Mathematical Analysis
In this section, we extend the approach of Nee et al. (1994b) by explicitly conditioning the birth-death model on the number of extant species in the crown group, and by considering the full distribution of clade abundances over time rather than just the central expectation.

SURVIVING CLADES THROUGH TIME
As noted by previous studies (Nee et al. 1994b; Strathmann and Slatkin 1983), in the classic birth-death model with speciation rate per species λ and extinction rate per species μ, the number of extant species, $n_t$, in a surviving clade at a given time $t$ from the origin at time zero obeys the following zero-truncated geometric probability distribution:

$$P(n_t = n) = (1 - a_t)\,a_t^{\,n-1}, \qquad a_t = \frac{\lambda\left(1 - e^{-(\lambda-\mu)t}\right)}{\lambda - \mu e^{-(\lambda-\mu)t}}, \qquad n \geq 1. \tag{1}$$

It is also useful to introduce the survival probability, $s_t$: the probability that a lineage with one originating species will survive for a duration of time $t$,

$$s_t = \frac{\lambda - \mu}{\lambda - \mu e^{-(\lambda-\mu)t}}. \tag{2}$$

For the limiting case where λ − μ → 0, see Strathmann and Slatkin (1983). Nee et al. (1994b) proceeded by conditioning the distribution of $n_t$ on the tree surviving until some future time, $T$. Here, we also condition on there being $n_T$ extant species at time $T$. By Bayes' rule:

$$P(n_t \mid n_T) = \frac{P(n_T \mid n_t)\,P(n_t)}{P(n_T)}. \tag{3}$$

Two terms in this equation are given immediately from equation (1). We can evaluate the remaining term, $P(n_T \mid n_t)$, by recognizing $n_T$ as a sum of $m_t$ i.i.d. geometric random variables obeying equation (1) over a time period $T - t$, where $m_t$ is the (unknown) number of species at time $t$ that will give rise to surviving lineages. This implies that $P(n_T \mid m_t)$ follows a truncated negative binomial distribution (with $n_T$ taking a minimum value of $m_t$):

$$P(n_T \mid m_t) = \binom{n_T - 1}{m_t - 1}\,(1 - a_{T-t})^{m_t}\,a_{T-t}^{\,n_T - m_t}. \tag{4}$$

The number of lineages that survive depends on the probability, $s_{T-t}$, for a lineage to survive from time $t$ to time $T$, and follows a binomial distribution:

$$P(m_t \mid n_t) = \binom{n_t}{m_t}\,s_{T-t}^{\,m_t}\,(1 - s_{T-t})^{n_t - m_t}. \tag{5}$$

Combining the previous equations, and summing over the unknown value of $m_t$, we are left with the following expression for the number of living species at time $t$, conditioned on the number of species in the present:

$$P(n_t \mid n_T) \propto P(n_t) \sum_{m_t = 1}^{\min(n_t,\,n_T)} P(n_T \mid m_t)\,P(m_t \mid n_t), \tag{6}$$

where the relevant probability mass functions are as defined above and the constant of proportionality, $1/P(n_T)$ in equation (3), is fixed by normalizing over $n_t$. We can further evaluate the conditional probability of $m_t$, the number of species at time $t$ that will have at least one descendant at time $T$. Here, we integrate over possible values of $n_t$:

$$P(m_t \mid n_T) \propto P(n_T \mid m_t) \sum_{n_t \geq m_t} P(m_t \mid n_t)\,P(n_t). \tag{7}$$

What sort of expectations should we have about the size and duration of the POTPa?
By looking at a small initial interval of time $t$ from the origin and considering both the probability that the clade has diversified to two species in this interval and the probability that it will survive to the present, we can estimate the initial observed rate of diversification for surviving clades:

$$P(n_t = 2 \mid \text{survive to } T) = \frac{P(\text{survive to } T \mid n_t = 2)\,P(n_t = 2)}{P(\text{survive to } T)} \approx \frac{\left[1 - (1 - s_T)^2\right]\lambda t}{s_T} = \lambda t\,(2 - s_T), \tag{8}$$

where we have assumed that $s_{T-t} \approx s_T$ for small $t$. It follows that the initial rate of diversification, $R_0$, in the POTPa can be estimated by:

$$R_0 \approx \lambda\,(2 - s_T). \tag{9}$$

If we look back to the origins of major clades, we expect $s_T$ to be small for geologically significant periods of time, and thus for these examples the rate can be further approximated as

$$R_0 \approx 2\lambda \tag{10}$$

(we note that similar results concerning the interior branch lengths of reconstructed trees have been derived by Stadler and Steel 2012).
It is important to note that at the precise origin of a clade that will survive to the present, the observed extinction rate is necessarily zero, since any extinction event would terminate the clade. Thus at this point the observed speciation and diversification rates are the same.
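The distributions used in this section are straightforward to evaluate numerically. The following Python sketch (independent of the R code in the Supporting Information) assumes the standard constant-rate forms for the zero-truncated geometric parameter, here called `a_t`, and for the survival probability `s_t`, and combines them with the negative binomial and binomial terms to give the distribution of the number of species at time t conditioned on the number at time T. The function names and the truncation point `n_max` are illustrative choices, and the brute-force double sum is only practical for small clades.

```python
import math

def a_t(t, lam, mu):
    """Parameter of the zero-truncated geometric law for clade size (lam != mu)."""
    r = lam - mu
    e = math.exp(-r * t)
    return lam * (1.0 - e) / (lam - mu * e)

def s_t(t, lam, mu):
    """Probability that a single lineage has living descendants after time t."""
    r = lam - mu
    return r / (lam - mu * math.exp(-r * t))

def geom_pmf(n, t, lam, mu):
    """P(n_t = n) for a clade that is still alive at time t."""
    a = a_t(t, lam, mu)
    return (1.0 - a) * a ** (n - 1)

def posterior_nt(t, T, n_T, lam, mu, n_max):
    """P(n_t = n | n_T) for n = 1..n_max, normalized over that range."""
    a = a_t(T - t, lam, mu)
    s = s_t(T - t, lam, mu)
    weights = []
    for n in range(1, n_max + 1):
        likelihood = 0.0
        for m in range(1, min(n, n_T) + 1):
            # n_T as a sum of m geometric clade sizes (negative binomial term)
            neg_binom = math.comb(n_T - 1, m - 1) * (1.0 - a) ** m * a ** (n_T - m)
            # m of the n species at time t leave descendants (binomial term)
            binom = math.comb(n, m) * s ** m * (1.0 - s) ** (n - m)
            likelihood += neg_binom * binom
        weights.append(geom_pmf(n, t, lam, mu) * likelihood)
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative values only: a small clade with 10 living species after 20 Myrs.
post = posterior_nt(10.0, 20.0, 10, 0.6, 0.5, 60)
mean_n = sum((i + 1) * p for i, p in enumerate(post))
print("posterior mean of n_t halfway through:", round(mean_n, 2))
```

Working in log space (for example with `math.lgamma`) would be needed before applying the same sum to clades with thousands of species.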
The result above shows that if we know λ and μ, the expected POTPa should produce a decline in observed diversification rate from an initial value of about 2λ down to the background value of λ − μ, as deduced from the fossil record. However, the confidence intervals on the initial value are broad: for example, in Figure 1 over the first million years, the 95% range for the initial rate runs from as low as 0 to as high as 3 (i.e., 6λ). Thus slowdowns seen in rates of diversification that begin with a wide range of values and quickly decline (largely being over by the time of the establishment of the crown group), with reconstructed rates in the stem lineage being significantly higher than in the generated plesions (i.e., extinct branches), are attributable to the POTPa. For the case of the birds discussed below, one would expect (in the fossil record) an observed initial diversification rate of about 1.2 (i.e., 2λ), that is about 11 times faster than the background rate. Another example is provided by the study of diversification rates in placental mammals (Raia et al. 2013), which simulates a best-fit homogeneous model with parameters λ = 0.7, μ = 0.6, and λ − μ = 0.1. A POTPa would be consistent with a decline in rates from 1.4 to 0.1 over 5-10 million years, which closely matches their reconstruction of phenotypic rates (which they show to be correlated with diversification rates in their data). Thus, the calculated POTPa corresponds closely to real-world examples.
Even though the POTPa as an average effect quickly declines in time, the survivorship bias that gives rise to it must persist along the surviving lineages: every clade that will survive to the present commences with one original species, which is vulnerable to extinction. As the survival rate typically remains low until close to the present, it follows that the constant renewal of the surviving stem lineages follows a quasi-fractal pattern of repetition. A further notable feature is that as the present day is approached, and the survival probability thus tends to one, the observed rate of speciation along surviving lineages declines back toward λ.
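The behaviour of the observed speciation rate along a surviving lineage can be tabulated directly. The sketch below assumes the small-interval approximation in which the rate at time t is λ(2 − s), with s the probability of surviving the remaining time T − t; the function names are illustrative, and the parameters are those quoted for the Figure 1 example (λ = 0.5107, μ = 0.5, T = 500 Myrs).

```python
import math

def survival_probability(t, lam, mu):
    """Probability that one lineage has living descendants after time t."""
    r = lam - mu
    return r / (lam - mu * math.exp(-r * t))

def stem_speciation_rate(t, T, lam, mu):
    """Observed speciation rate at time t on a lineage that survives to T,
    using the approximation lam * (2 - s_{T-t})."""
    return lam * (2.0 - survival_probability(T - t, lam, mu))

lam, mu, T = 0.5107, 0.5, 500.0
for t in (0.0, 250.0, 400.0, 490.0, 499.0, 500.0):
    rate = stem_speciation_rate(t, T, lam, mu)
    print(f"t = {t:5.0f} Myrs  rate = {rate:.4f}  ({rate / lam:.2f} x lambda)")
```

The rate stays close to 2λ for most of the clade's history and falls to λ only in the last few million years, when the remaining time is too short for extinction to be likely.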
An example plot displaying the various parameters that govern this analysis is given in Figure 1A (cf Nee et al. 1994b). This plot is for a clade that has 10,000 living species/lineages, which emerged about 500 million years ago, and which has an average lineage duration of 2 million years (i.e., μ = 0.5).
The blue line gives the number of species at any particular time; and the slope of the blue line is the observed diversification rate governed by the time elapsed and total number of taxa at the Recent, n T . Conversely, the red line gives the number of lineages that gave rise to living species/lineages. We take as the rate of speciation the maximum likelihood estimate of λ, given μ, T and n T , which in this case is 0.5107 (rates in this article are given to a maximum of four decimal places). Thus the rate of diversification (λ − μ) is c. 0.0107 per species per million years. As can be seen, the slopes of the two lines diverge at the beginning, representing the push of the past, and at the end, representing the pull of the present, both of which are large in this case. If there had been a deterministic (i.e., nonstochastic) radiation of species from the Cambrian onwards with the net diversification rate of 0.0107 species per species per million years, then instead of 10,000 there would have been only about 210 species of this taxon today. Figure 1B shows the implied large spike in the initial observed diversification rate, owing to the POTPa, with the initial observed diversification rate (∼1) being 100 times the underlying average. Note also that this effect generates a (noncausal) correlation between diversity and diversification rate ( Fig. 1C): as diversity increases, (average) rate of diversification decreases.
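The numbers quoted for this example can be reproduced with a short calculation. The sketch below assumes that the maximum-likelihood speciation rate is the value at which the expected size of a surviving clade equals the observed number of living species (which is what maximizing the zero-truncated geometric likelihood gives); the bisection solver and all function names are illustrative rather than taken from the Supporting Information.

```python
import math

def mean_size_given_survival(lam, mu, T):
    """Expected number of species at time T in a clade that has survived to T."""
    r = lam - mu
    return (lam * math.exp(r * T) - mu) / r

def fit_speciation_rate(mu, T, n_T):
    """Bisection for lam such that the conditional mean clade size equals n_T.
    Assumes n_T > 1 + mu * T, so that a solution with lam > mu exists."""
    lo, hi = 1e-9, 1e-6
    while mean_size_given_survival(mu + hi, mu, T) < n_T:
        hi *= 2.0  # expand the bracket until it contains the root
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_size_given_survival(mu + mid, mu, T) < n_T:
            lo = mid
        else:
            hi = mid
    return mu + 0.5 * (lo + hi)

mu, T, n_T = 0.5, 500.0, 10_000
lam = fit_speciation_rate(mu, T, n_T)
deterministic_size = math.exp((lam - mu) * T)  # growth with no survivorship bias
print("speciation rate:", round(lam, 4))
print("net diversification rate:", round(lam - mu, 4))
print("species expected from deterministic growth:", round(deterministic_size))
```

The same function can be applied to any other combination of extinction rate, clade age, and living diversity.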
How realistic are the numbers in our example? The size (and thus importance) of the POTPa depends on the rate of extinction relative to the other parameters. Extinction rates have proven difficult to estimate from both molecular phylogenies (notably Rabosky 2010; but see also Beaulieu and O'Meara 2015; and Rabosky 2016) and the fossil record (see e.g., discussions in Alroy 1999; Alroy 2014; Wagner and Lyons 2010; and Hagen et al. 2017). Nevertheless, the fossil record in particular shows that extinction rates must be relatively high, as most species across a wide range of taxa only last a few million years at most in the record (e.g., Crampton et al. 2016). For example, extinction rates over all marine invertebrates have been broadly estimated at c. 0.25 per species per million years (Barnosky et al. 2011; Raup 1991), and for Cenozoic mammals at up to 2 per species per million years (Barnosky et al. 2011; Ceballos et al. 2015). Even highly conservative (i.e., low) estimates of background extinction rates, which partly equate species with genera in the fossil record, suggest rates in excess of 0.13 (de Vos et al. 2015; cf Harnik et al. 2012; see Alroy 2014, however, for a critique of the methodology used in the latter, which produces a notable downwards bias). An example clade would be the birds, which, although their crown group may have originated some 120 Ma (Jetz et al. 2012; but see also Prum et al. 2015 and Ksepka et al. 2017 for a more compressed view of bird evolution), nevertheless seem to have undergone a mass extinction along with the other dinosaurs 66 Ma (Longrich et al. 2011). Such a clade thus took approximately 66-70 Myrs to radiate into 10,000 species. Assuming a (probably conservative) extinction rate of 0.5 based on other land vertebrates (Loehle and Eschenbach 2012), and taking 70 Ma as the start of the radiation, this would imply a speciation rate of 0.6068 and a diversification rate of 0.1068 (cf Jetz et al. 2012).
Of course, all these numbers are approximate, but our aim with them is to show that the patterns we discuss in this article arise from very typical empirical values seen in analyses of extinction and diversification. Assuming that birds are a "typically" sized clade (see below), this would imply a notably enhanced rate of diversification with initial rate of 1.214 species per species per million years that would decline over about 5-10 million years (cf Ksepka et al. 2017).
Although the parameters we have explored in this article thus seem to be typical of diversifications over a large range of species numbers and time, we wish to stress the important point that the clades that emerge from them and survive for long periods are rare. In our example, although the living clade survived for 500 Myrs, the median survival time of an average clade generated by these parameters is only 2 Myrs. Furthermore, only 2.1% of clades thus generated will survive for 500 Myrs. These numbers emphasize how unusual long-surviving clades are, even when there is a net positive diversification rate; survival rates for other diversification scenarios are given in Figure 4. The POTPa generally has the paradoxical effect of making high extinction rates increase observed rates of diversification and numbers of living species, but only in the rare clades that managed to survive.
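Both survival figures follow from the survival probability of a single founding lineage. The sketch below assumes the standard constant-rate expression for that probability and solves it in closed form for the time at which it falls to one half; the function names are illustrative.

```python
import math

def survival_probability(t, lam, mu):
    """Probability that a clade founded by one species is still alive at time t."""
    r = lam - mu
    return r / (lam - mu * math.exp(-r * t))

def median_clade_lifetime(lam, mu):
    """Time at which survival_probability falls to 1/2, for lam > mu.
    Returns infinity when at least half of all clades survive indefinitely."""
    r = lam - mu
    arg = (lam - 2.0 * r) / mu  # value of exp(-r * t) at the median
    if arg <= 0.0:
        return math.inf
    return -math.log(arg) / r

lam, mu = 0.5107, 0.5
print("fraction surviving 500 Myrs:", round(survival_probability(500.0, lam, mu), 4))
print("median clade lifetime (Myrs):", round(median_clade_lifetime(lam, mu), 2))
```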

CROWN GROUP ORIGINS
We now turn our attention to estimating the origin time of extant crown groups. First we consider the definition of a "randomly selected" crown group used by Raup (1993): the group emerging from the common ancestor of two randomly selected extant species.
Assume that we have selected one species at random from $n_T$ extant taxa. We now select a second. What is the probability, $W(t \mid n_T)$, that we select one that shares a common ancestor with the first at time $t$? If there are $m_t$ lineages at time $t$ that will survive to the present, then each of these must give rise to at least one extant species, leaving $n_T - m_t$ remaining extant species that do not inevitably have to join up with the other $m_t - 1$ ancestor species. Each species from this remaining set will have a $1/m_t$ probability of sharing an ancestor with the first selected, thus:

$$W(t \mid n_T, m_t) = \frac{n_T - m_t}{n_T - 1}\cdot\frac{1}{m_t}.$$

Obtaining the desired probability, $W(t \mid n_T)$, requires a posterior-weighted summation over the possible values of $m_t$:

$$W(t \mid n_T) = \sum_{m_t} W(t \mid n_T, m_t)\,P(m_t \mid n_T),$$

where the posterior distribution on $m_t$ is calculated as above. $W(t \mid n_T)$ represents a cumulative distribution function for the timing of crown group origins for randomly selected pairs of species, looking backwards in time. The corresponding probability density function, $w(t \mid n_T)$, is given by differentiation of $W(t \mid n_T)$ with respect to $t$. Compared to Raup's model (which can be most closely approximated by the Yule process (see below), although he did not include a stochastic component), our model delays the average time of origin of Raupian crown groups, because the POTPr allows a longer period of lower early lineage diversification rates. Nevertheless, it remains true that randomly selected pairs of taxa will also tend to have early origins (Fig. 4F, I). As can be seen (Fig. 4C), the Yule process forces crown groups defined in this way to emerge very early. Budd and Jackson (2016) simulated the origin of the first crown groups in clades conditioned on survival, a topic of much interest in the "Cambrian Explosion" literature (e.g., Erwin 1993).
In simulations that start with one lineage and go on to diversify to the Recent, the time the simulation begins can be taken as the origin of the total group, and the emergence of the crown group (for the entire clade) as the time when $m_t$ (the number of lineages at any time $t$ that will give rise to living descendants) first equals two (i.e., when the basal split of the crown group is formed).
Since this state can only be reached from a previous state of $m_t = 1$, the probability density $u(t)$ that the first crown emerges at time $t$ can therefore be calculated by considering the rate of change in the probability that $m_t = 1$:

$$u(t) = -\frac{d}{dt}\,P(m_t = 1 \mid n_T).$$

As summarized in Nee et al. (1994b) and Figure 1A, we can see that $m_t$ in the early stages of diversification essentially depends . Thus a simple approximation for the expected length of time it takes for the first crown-group to emerge is given by: Thus $t_{cg}$, the time in millions of years ago that the first crown group is expected to have emerged is simply where $T$ is the time elapsed since the origin of the total group. As an interesting aside, the underlying diversification rate λ − μ is thus approximated by: The combination of the POTPa and the dependence of $t_{cg}$ on λ − μ means that stem and crown groups exhibit different characteristics of diversification and diversity, as the first crown group tends to emerge as the effect of the POTPa fades away. An example of this is given in Figure 2B.

OVERVIEW
Within a particular total group, then, stem groups are characterized by high observed diversification rates and low diversity; and crown groups by low diversification rates and increasing diversity (cf Fig. 1C). The interaction between the crown group and the POTPa allows us to understand why it is that the crown group emerges just as the POTPa dies away: the POTPa is an effect seen when there are few (surviving) lineages and as soon as there are two rather than one, the likelihood of the clade surviving until the present is considerably increased.
It is important to note that the high rates of observed diversification in stem groups are not general features, as we are applying a homogeneous model of diversification. Rather, unusually high observed diversification rates are concentrated in the stem lineage that leads to the crown group(s) (cf Stadler and Steel 2012). Stem groups should thus generate a high number of so-called "plesions" (i.e., extinct sister groups to crown groups (Budd 2001;Craske and Jefferies 1989)) that themselves will diversify and go extinct at approximately the background rate governed by λ − μ. From equation (10) we can see that the rate of speciation along most of the stem lineages, and thus the rate of production of plesions, remains close to 2λ, although the rate slowly declines until close to the Recent, when it precipitously drops to λ. Similarly, lengths of stem-groups also decrease, over a longer timescale, as the present is reached (see Fig. 2A for graphical treatment). Such effects should be observable and inferable from the fossil record by plotting speciation and extinction rates using appropriate methods for extracting these values (e.g., Alroy 2000; Alroy 2014).
This analysis gives us a remarkable perspective on the fossil record (Fig. 3), which is after all considered on methodological grounds, as taxa in cladograms are only ever terminals, to be composed entirely of plesions (Budd 2003; but see also Gavryushkina et al. 2014) except for fossils of extant species. Average rates of speciation (and, as we shall argue below, rates of phenotypic evolution) typify the clouds of plesions that are constantly being generated (and dissipating) at a high rate; but underlying them, and hidden from view, are stem lineages that speciate at twice the normal rate. It is only briefly, at the beginnings of radiations and after the great mass extinctions (see below) that these obscuring clouds are stripped away, and we get to peer at the underlying hyperactive stem lineages. Once again though it must be stressed that this pattern only emerges as a result of our perspective in the Recent, which allows us to distinguish stem lineages from plesions.

DIVERSIFICATION SCENARIOS
Armed with the mathematical analysis and example above, we are now in a position to analyze various scenarios that might play out in patterns of diversity and the fossil record. In each, we wish to examine: (i) the size of the POTPa effect; (ii) the distribution of the timing of crown group origins; and (iii) the relative proportions that the stem and crown groups take up of the total group.

THE YULE PROCESS
The Yule process (Yule 1924) governs diversification processes with no extinction, that is, with μ = 0. Of course this is not realistic over geologically significant time periods, but it is nevertheless important for showing the contrast with more realistic models. Furthermore, it can be used to model surviving lineages through time, which by construction undergo no extinction.
Under the no-extinction model, as all species give rise to living lineages, it is clear that the blue and red lines of Figure 1 are coincident (Fig. 4A) irrespective of the error on each. There is neither a pull of the present nor a push of the past (Fig. 4B), and the slope of the line simply gives the diversification rate through the time required to lead to the observed n T . The rate of diversification is completely constant along the mean, since the diversification rate has been selected to generate n T (i.e., 1000 species in this case). Nonetheless, as the confidence region shows, early fluctuations in this process are possible, which we consider further later. Another feature of the no-extinction model is that total and crown groups are nearly coincident for any particular clade, as stem-groups grow by extinction (Budd 2003). A lag at the beginning is possible though, before the first speciation event takes place. Figure 4D-F and G-I model two net diversification models; one with μ = 0.1 and the other with μ = 0.5, both with the best-fit implied λ (the maximum-likelihood value given T, n T and the selected value of μ). As can be seen, increasing μ increases the POTPa.
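The absence of a push of the past under the Yule process can be checked by setting μ = 0 in the general birth-death expressions. The sketch below assumes the standard constant-rate forms for the geometric parameter and the survival probability, the approximation λ(2 − s) for the initial observed rate, and a clade age of 500 Myrs with 1000 living species (the age is taken from the scenario discussed later in the text); the function names are illustrative.

```python
import math

def a_t(t, lam, mu):
    """Parameter of the zero-truncated geometric law for clade size."""
    r = lam - mu
    e = math.exp(-r * t)
    return lam * (1.0 - e) / (lam - mu * e)

def s_t(t, lam, mu):
    """Probability that one lineage has living descendants after time t."""
    r = lam - mu
    return r / (lam - mu * math.exp(-r * t))

T, n_T = 500.0, 1000
lam = math.log(n_T) / T  # rate at which pure-birth growth reaches n_T on average
mean_size = 1.0 / (1.0 - a_t(T, lam, 0.0))
initial_rate = lam * (2.0 - s_t(T, lam, 0.0))

print("Yule speciation rate:", round(lam, 5))
print("survival probability:", s_t(T, lam, 0.0))
print("expected living species:", round(mean_size, 1))
print("initial observed rate / lambda:", round(initial_rate / lam, 3))
```

With no extinction every lineage survives, so the survival probability is one and the initial observed rate equals the background rate.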

AND EXTINCTION
If μ is set very low (e.g., μ = 0.01 for $T$ = 500 Myrs and $n_T$ = 1000), then the POTPa can be much reduced. However, such models imply very implausible species longevities: a typical species would be expected to survive 100 Myrs in this model, which is not realistic for the Phanerozoic. Such numbers may, however, be more appropriate to the Proterozoic (Butterfield 2007).
Models with the same background diversification rate can have high and low turnover (e.g., λ = 0.6, μ = 0.5, or λ = 0.2, μ = 0.1). As models with larger diversity fluctuations will be more vulnerable to extinction than ones with smaller fluctuations, it follows that high turnover scenarios require a larger POTPa to escape the early period of vulnerability.
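The contrast between the two turnover scenarios can be made concrete with a few lines of arithmetic. The sketch below assumes the standard survival probability for a constant-rate birth-death process, the approximation λ(2 − s) for the initial observed rate in a surviving clade, and a clade age of 500 Myrs; the function and key names are illustrative.

```python
import math

def survival_probability(t, lam, mu):
    """Probability that one lineage has living descendants after time t."""
    r = lam - mu
    return r / (lam - mu * math.exp(-r * t))

def potpa_summary(lam, mu, T):
    """Survival chance and size of the initial rate excess for one scenario."""
    s_T = survival_probability(T, lam, mu)
    initial_rate = lam * (2.0 - s_T)
    return {
        "survival": s_T,
        "initial_rate": initial_rate,
        "fold": initial_rate / (lam - mu),  # excess over the background rate
        "mean_species_duration": 1.0 / mu,
    }

high = potpa_summary(0.6, 0.5, 500.0)  # high turnover
low = potpa_summary(0.2, 0.1, 500.0)   # low turnover, same net rate
for name, res in (("high turnover", high), ("low turnover", low)):
    print(name, {k: round(v, 3) for k, v in res.items()})
```

Under these assumptions the high-turnover clade is less likely to survive and, when it does, shows a much larger initial excess over the shared background rate of 0.1.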

Mass Extinctions
We have chosen for simplicity a diversification model that is diversity-independent and has homogeneous rates of extinction both through time and for taxon-age (cf Van Valen 1973). Nevertheless, the handful of mass extinctions through time have had a large impact on diversification patterns (Benton and Emerson 2007). The most important are perhaps the end-Permian (with c. 80% of all species going extinct (Stanley 2016)) and the end-Cretaceous (c. 68% loss (Stanley 2016)). Such events could be considered as simply "resetting the clock": that is, if evidence exists that extinction was extremely severe in a particular clade, then $T$ should be considered to restart at that point. Some overall patterns of diversification suggest that the only truly important mass extinction in this regard is the end-Permian one (Sepkoski 1998; Aberhan and Kiessling 2012), which divides Phanerozoic time more or less into two, with large, but largely uncommented, POTPa effects at the beginning of each. One interesting effect is that the bigger a mass extinction, the bigger the subsequent POTPa would be, assuming something survives to the present. Even so, these big pushes can never make up for the lost diversity, even if they compensate for it to some extent. For example, for a diversification that started 500 Ma, and that would have generated 1000 living species without any disturbance, and with a background extinction rate of 0.5 (and implied maximum-likelihood speciation rate of 0.504), a mass extinction 250 Ma down to only one species and the subsequent POTPa and reradiation would only generate 240 living species: it is a rerun of the original radiation but in half the time. On the other hand, without any POTPa, this reradiation would be expected to generate only three living species. It is possible to model the POTPa with a standing diversity, and show how the size of the POTPa declines as surviving diversity increases.
We modelled this by plotting number of survivors against immediate observed diversification rate postextinction (Fig. 5) for different rates of background extinction for a radiation that took 250 Myrs to generate 1000 species. As can be seen, extinctions can indeed generate a large POTPa, but the number of remaining species for the clade needs to be reduced to a few percent of their original numbers. Thus, really large POTPa effects after a mass extinction are likely to be contingent on large extinctions preceding them in the clade in question. It should be noted that individual clades can be reduced to just a few surviving species, even when overall extinction rates during a mass extinction are relatively moderate (such "dead clades walking", however, are likely to be common: see for example comments in Jablonski 2002).
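Both the reradiation example and the dependence on standing diversity can be explored numerically. The sketch below is not the calculation behind Figure 5: it assumes that the speciation rate is fitted by matching the expected size of a surviving clade to the living diversity, and it obtains the rate for a clade of k survivors by rescaling each speciation and extinction event by the change it causes in the clade's chance of reaching the present (a standard way of conditioning a birth-death process on survival). All function names, and the choice of 250 Myrs of remaining time, are illustrative.

```python
import math

def survival_probability(t, lam, mu):
    """Probability that one lineage has living descendants after time t."""
    r = lam - mu
    return r / (lam - mu * math.exp(-r * t))

def mean_size_given_survival(lam, mu, t):
    """Expected clade size after time t, given that the clade is not extinct."""
    r = lam - mu
    return (lam * math.exp(r * t) - mu) / r

def fit_speciation_rate(mu, T, n_T):
    """Bisection for lam such that the conditional mean clade size equals n_T.
    Assumes n_T > 1 + mu * T, so that a solution with lam > mu exists."""
    lo, hi = 1e-9, 1e-6
    while mean_size_given_survival(mu + hi, mu, T) < n_T:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_size_given_survival(mu + mid, mu, T) < n_T:
            lo = mid
        else:
            hi = mid
    return mu + 0.5 * (lo + hi)

def rate_with_k_survivors(k, time_left, lam, mu):
    """Per-species net diversification rate for a clade of k species,
    conditioned on the clade as a whole surviving for time_left."""
    q = 1.0 - survival_probability(time_left, lam, mu)

    def h(j):
        # chance that at least one of j species leaves living descendants
        return 1.0 - q ** j

    return (lam * h(k + 1) - mu * h(k - 1)) / h(k)

mu = 0.5
lam = fit_speciation_rate(mu, 500.0, 1000)
rerun_with_push = mean_size_given_survival(lam, mu, 250.0)
rerun_without_push = math.exp((lam - mu) * 250.0)
print("fitted speciation rate:", round(lam, 4))
print("living species after rerun, with POTPa:", round(rerun_with_push))
print("living species after rerun, without POTPa:", round(rerun_without_push, 1))
for k in (1, 2, 10, 100, 1000):
    print(k, "survivors -> rate", round(rate_with_k_survivors(k, 250.0, lam, mu), 4))
```

For a single survivor the conditioned rate reduces to λ(2 − s), and it falls towards λ − μ as the number of survivors grows, in line with the requirement that a clade be reduced to a small fraction of its diversity before a large POTPa appears.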
Another POTPa-like bias may also be having an effect, which is the effect of the POTPa on fossilization rates themselves. One of the controls on the preservation probability of a taxon is its true (as opposed to fossil record) temporal duration, and thus its extinction rate (Foote 1997;Foote and Raup 1996). When diversity drops to a low level, survivorship over the next short interval of time is compromised, with the implication that only taxa that experience unusually high rates of diversification are likely to survive-and thus enter the fossil record. Figure 6 shows there is a strong relationship between survivorship on a million or submillion year scale and diversification rates. In brief: taxa straight after a mass extinction or at the beginning of a radiation have an unusually poor chance of entering the fossil record, as their diversity is so low and their chance of almost instant extinction is so high. However, the taxa that by chance experience high rates of early diversification are much more likely to survive long enough to generate a discoverable fossil record. Such an effect may at least partly lie behind the observation that fossilization rates seem to be depressed after mass extinctions (notably the end-Permian (Twitchett 2001)). Thus, one interesting aspect to this pattern in the record, that such "recoveries" seem to be delayed, with clades sometimes taking millions of years to show increased rates of diversification (see e.g., discussion in Sepkoski 1998), may be partly explicable by this effect too: early survivors are simply such low diversity that they tend to go extinct faster than they can enter the fossil record.

BIRTH-DEATH MODELS
The various cases we have considered above show that the POTPa is in general a very important factor that cannot be neglected in trying to understand diversity patterns of the past. The most important control on the size of the POTPa is the extinction rate (compare Figs. 4E and H), although time to the Recent also has some effect. Thus, when significant time periods have passed, the POTPa is always large unless the background extinction rate is extremely low (cf Ricklefs 2007), much lower than seems to be typical for at least Phanerozoic taxa, which typically have a lifetime of a few million years (Sepkoski 1998).
Because of the nature of the homogeneous model we are using, we wish to stress its "Copernican" aspect, that is that diversification is on average the same at all times. Each stem lineage will be characterized by a high POTPa, but as diversification continues, its distorting effect on average diversification rates in surviving lineages will be diminished by two factors. The first (which is small until the POTPr is reached) is that as time advances, each lineage has less time to survive until the present. The second is that as diversification proceeds, more average or even below average diversification-rate lineages will be present, and thus the overall average rate of diversification will be swamped by their diversification rates. In the first stem group, so few lineages are present that the implied POTPa on the stem lineage will have a disproportionate effect on average diversification rates. Such controls produce the characteristic decline in average actual diversification rates through time, even though an observer at any particular time would not notice any difference whatsoever.
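The dilution of the POTPa as diversity builds up can be illustrated with a small calculation. This is a minimal sketch under stated assumptions rather than the method used for the figures: it conditions on indefinite survival of the clade (not survival to a fixed Recent), requires lam > mu, and uses the fact that a clade of n species then escapes extinction with probability 1 - (mu/lam)^n. Reweighting the birth and death rates by this probability gives the average per-species net rate in surviving clades. The function name and the parameter values are illustrative.

```python
def conditioned_net_rate(n, lam, mu):
    """Average per-species net diversification rate at diversity n, conditioned
    on the clade never going extinct (assumes lam > mu)."""
    q = mu / lam  # ultimate extinction probability of a single species

    def survives(k):
        # Probability that a clade of k species never goes extinct.
        return 1.0 - q ** k

    # Transitions are reweighted by the survival probability of the state
    # they lead to, relative to that of the current state.
    birth = lam * survives(n + 1) / survives(n)
    death = mu * survives(n - 1) / survives(n)
    return birth - death


if __name__ == "__main__":
    lam, mu = 0.5, 0.4  # illustrative rates per Myr
    for n in (1, 2, 5, 20, 100):
        print(n, round(conditioned_net_rate(n, lam, mu), 4))
```

With these illustrative rates the value at n = 1 is lam + mu = 0.9, nine times the background lam - mu = 0.1, and it relaxes toward the background value as n grows. The ratio (lam + mu)/(lam - mu) also shows why the extinction rate is the main control on the size of the effect.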

The Large Clade Effect: An Analogy to the POTPa in Reconstructed Phylogenies
So far, we have considered the effect of survivorship biases in the blue line of Figure 1. Our exploration of the POTPa shows, however, that when conditioned on survival, it remains nearly constant along the surviving lineages at close to 2λ until the Recent is approached. Hence, it largely cannot account for long-term declines in phenotypic and molecular rates along the lineages (see below) or lineage production rates themselves. However, survival alone is not the only characteristic of a clade that can lead to statistical biases. As we have shown, the birth-death model can also be conditioned on the number of extant species in the Recent, n_T. Therefore we can ask about the characteristics of outliers within the set of surviving clades, specifically those with a larger-than-expected present diversity (Ricklefs 2007). Such outliers represent those clades that are held up as the most "successful" examples of their type and, erroneously as we shall see, are often presented as "representative" of their particular time of origination. As Pennell et al. (2012) observed through simulation analysis, larger than average clades are statistically more likely to show a slowdown relative to smaller clades. To illustrate this, we recalculated the example shown in Figure 1, but conditioned it to generate 100,000 instead of 10,000 species (Fig. 7). Under such rare conditions, more lineages need to be generated than normal, and the most likely moment to do this (as can be seen in the confidence regions of Fig. 1A) is at the beginning, when overall numbers of lineages are small and statistical fluctuations more noticeable in effect.
Under such circumstances, a lineage effect is produced (Phillimore and Price 2008;Pennell et al. 2012), which could be called the large clade effect (LCE). Although it is smaller than the classical POTPa (in the example of Fig. 7 the rate of speciation along the lineages increases to 2.2λ from 2λ), it has the effect of speeding up the appearance of new living lineages near the beginning, and thus makes crown groups emerge (even) earlier. The lineage through time plot thus takes on a characteristic inverted "S" shape that is often seen in plots of molecular phylogenies (e.g., Harmon et al. 2003). Like the POTPa, it has a quasi-fractal organization: within a given clade, larger subclades will experience greater early diversification than smaller subclades. As we discuss below, this effect thus influences rates of evolution in large clades, and will be particularly prominent if such clades have happened to attract more than average attention, as has indeed been suggested (Ricklefs 2007). A correlation with lineage diversity is also generated (Fig. 7C), which in our example can be seen for about the first 20 lineages.
The initial magnitude of the LCE can be explicitly calculated in terms of the size of the clade relative to its expected size conditioned on the background speciation and extinction rates, E(n_T | survive). To determine the initial rate relative to the background value R = λ − μ, we consider the probability that the new clade with one lineage diverges into two lineages within a small unit of time, both a priori and conditioned on the final clade size. Recalling that the distribution of n_T conditioned on m_t is negative binomial, we can see that the expected size of the initial LCE, and thus the magnitude of the later slowdown, is proportional to the eventual clade size. It should be noted that the clade containing any randomly chosen species is expected to be twice the average clade size, and thus there is a consistent bias toward this effect appearing.
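The proportionality can be made explicit with a short calculation. What follows is a sketch under stated assumptions and not necessarily the authors' own expression: the clade is taken to be far from the Recent, so that a single lineage splits into two surviving lineages at the a priori rate R = λ − μ; the number of extant descendants of one surviving lineage is taken to be geometric with mean E(n_T | survive); and the symbol p is introduced here for illustration only.

```latex
\begin{align*}
P(n_T = n \mid m_t = 1) &= p\,(1-p)^{n-1},
  \qquad p = \frac{1}{E(n_T \mid \text{survive})},\\
P(n_T = n \mid m_t = 2) &= (n-1)\,p^{2}\,(1-p)^{n-2},\\
\frac{P(\text{split in } \mathrm{d}t \mid n_T = n)}{P(\text{split in } \mathrm{d}t)}
  &= \frac{P(n_T = n \mid m_t = 2)}{P(n_T = n \mid m_t = 1)}
   = \frac{(n-1)\,p}{1-p}
   = \frac{n-1}{E(n_T \mid \text{survive}) - 1}
   \approx \frac{n}{E(n_T \mid \text{survive})},\\
\frac{E(n_T^{2} \mid m_t = 1)}{E(n_T \mid m_t = 1)}
  &= \frac{2-p}{p}
   = 2\,E(n_T \mid \text{survive}) - 1 .
\end{align*}
```

Under these assumptions the initial rate of lineage appearance is approximately R · n_T / E(n_T | survive), so a clade ten times larger than expected starts at about 10(λ − μ). The last line is the size-biased mean of the same geometric distribution, which is close to twice the average clade size when the average is large.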

Effects of the POTPa and LCE on Rates of Phenotypic and Molecular Change
The rate of phenotypic change through time is another pattern that has seen a great deal of interest (e.g., Westoll 1949;Lloyd et al. 2012;Lee et al. 2013;Ruta et al. 2006;Brusatte et al. 2010;Bronzati et al. 2015; but see also Harmon et al. 2010). A classical pattern of rates of phenotypic change is that rates are elevated at the origin of a clade and then show an exponential decline (e.g., Erwin 1993). Such a pattern looks, of course, like a POTPa effect, but this effect would seem to rely on a correlation between rates of phenotypic change and diversification. While this seems both intuitively reasonable and has much theoretical backing, this pattern has been difficult to demonstrate and indeed some studies have failed to reveal it (e.g., Adams et al. 2009;Hopkins and Smith 2015; but see also Rabosky and Adams 2012 who review the topic in general). Our model can account for such patterns by considering the fossil record to consist of plesions that are generated by a rapid rate of speciation in an underlying but unseen stem lineage. In principle at least, each of these speciation events (at least as recognized in the fossil record) should be accompanied by a set of diagnostic synapomorphies that accumulate within the stem lineage twice as fast as they do in the plesions that arise from it. This effect does rely on some sort of correlation between phenotypic change and diversification rates. However, in at least the fossil record, where distinct taxa are recognized solely on the phenotypic differences, some such connection must exist.
As the survivorship bias that leads to the POTPa remains more or less constant along long stretches of the lineages until the present is reached, it follows that it should not generate a "slow-down" in measured rates of either phenotypic or molecular change along the lineages (i.e., measured along the red line of Fig. 1) when the present is far away. Average phenotypic rates of change should however decline through time when measured over all fossil taxa (i.e., measuring the rate of phenotypic change in the blue, rather than red, line of Fig. 1). The notable study of lungfish evolution through time by Lloyd et al. (2012) reconstructed rates of phenotypic evolution through time and indeed noted such a decline (see their Fig. 4; note that it also shows characteristic postextinction spikes; cf Raia et al. 2013 that shows a similar decline through time). However, as the authors note, the (reconstructed) stem lineage leading up to the extant lungfish retains high rates of phenotypic change much later than the initial rapid decline in overall rates, while the plesions appear to show no such pattern (the authors do not differentiate between the two in their analysis of the decline in rates). Study of bird and dinosaur phenotypic change rates also strikingly shows a similar concentration of change along the stem leading to the extant taxa (Benson et al. 2014). This pattern is exactly what the model we develop here would predict, as it confines the POTPa to the stem lineages, and suggests that their documented decline of phenotypic rates of evolution is a striking consequence of the POTPa (compare their Fig. 1 with our Fig. 3). Clearly, it would be possible to test this pattern in other groups too. The POTPa should not, however, affect rates of molecular evolution because this cannot be measured in the blue line, only in the red.
The LCE, conversely, should affect rates measured along the lineages, but in a subtle way. For a big LCE, for example the 10× larger than expected effect of our Figure 7, the rate of speciation along the lineages is initially increased only very modestly, that is in this instance from 2λ to 2.2λ. However, the initial rate of appearance of lineages increases from λ − μ to 10(λ − μ). The implication of both together is that even if rates of speciation are correlated with rates of molecular and phenotypic change, neither of the latter should noticeably increase as a result of the LCE. However, the amount of that approximately constant change in large clades that is curated into the present along the lineages is disproportionately sourced from the early stage of the clade's history, when the LCE is in effect, at a rate proportional to the size of the LCE. In other words, when more early species survive to become lineages, more of the total amount of phenotypic and molecular change that took place during that interval of time is captured by those lineages and thus survives to the present, rather than being lost to the unobservable plesions.
Thus, the initial change per unit time per lineage should be increased proportionally to the size of the LCE for both molecular and phenotypic change. This can give rise to large effects, which can be seen in the study of Lee et al. (2013), where large initial rates for both phenotypic and molecular evolution can be seen as measured along the lineages. We note that the initial effect seen in Lee et al. (2013) is approximately 10× the normal rate and lasts until about 17 lineages have been created, very similar to our calculation in Figure 7C.
One implication of this finding is that in unusually large clades, one should expect a concentration of rapid molecular evolution in early lineages, which, if not corrected for, will make molecular clocks overestimate origination times. Such an effect could in principle account for the continuing discrepancy between molecular clock estimates for the origin of the animals and the fossil record (e.g., Lee et al. 2013), but only in large clades, such as the arthropods (Lee et al. 2013) and, of course, the animals themselves. Thus, although various studies have shown that rates of molecular change may or may not be correlated with diversification rates (e.g., Barraclough and Savolainen 2001;Bromham 2003;Davies et al. 2004;Pagel et al. 2006;Lanfear et al. 2010;Goldie et al. 2011), our model suggests that it is not diversification rates per se, but rates of lineage creation, that are correlated with (curated) amounts of molecular change.

Rate Heterogeneity
A question that arises from this analysis is: "when is it appropriate to attribute heterogeneities in frequencies of events to intrinsic survivorship biases such as the POTPa and LCE rather than to adopt models where background probabilities vary exogenously through time?" To take an example from the fossil record: some extinct taxa, notably the trilobites (e.g., Bell 2013), indeed show a very rapid initial diversification, followed by a fairly drawn-out decline and final extinction. It is clear that such a decline cannot be realistically modeled by keeping the same background rate of diversification through time: it implies that the most appropriate background diversification rate has actually turned negative (cf Stanley et al. 1981). One should note here, however, that given that the trilobites experienced several mass extinctions, these singular events may have successively reduced their diversity to the point where they became vulnerable to stochastic extinction, even with net positive diversification rates.
Several recent software packages (e.g., RPANDA (Morlon 2016) and TreePar (Stadler 2011b)) have been developed for detecting statistically significant rate shifts of this sort within clades (Stadler 2013; see also Arbour and Santana 2017;Jetz et al. 2012 for examples of examination of rate shifts in different clades). How do the effects we outline here intersect with them? We have shown above the expected sizes of both the POTPa and LCE, which are themselves rate heterogeneities that arise from homogeneous models when conditioned on survival and/or clade size. Nevertheless, there is a difference between such heterogeneities and those seen from more complex models, because the heterogeneities that emerge from survivorship bias are strictly local as opposed to global in effect. However, it currently remains unclear how they could be disambiguated from each other. Genuine DDD should improve the survivorship of early clades and thus leave fewer plesions, but this effect will resemble the LCE (see Fig. 7 for the apparent diversity dependent diversification under such conditions).
For a particular surviving clade, inferred diversification rate variation that falls within the expectations of the POTPa outlined here should not be generalized as pointing to a time-specific period of enhanced diversification, for example after mass extinctions. Furthermore, given such patterns are inevitable, they should not be taken on their own as evidence for a particular generative mechanism, even if one thinks on other grounds that such mechanisms are probable. Conversely, evidence of diversification bursts should be taken more seriously when it occurs in short-lived clades (e.g., in the fossil record) or if it occurs across a whole clade that is already well-established.
The LCE, conversely, can by definition only be measured in the lineages. We have given an expression above for its expected size, which depends on the size of a clade relative to a base-line expectation for a given background rate of diversification. We note here, however, that there are two problems with estimating clade size. The first of these is relatively straightforward, and relates to our inability to count all living species. This has been accounted for by, for example, Stadler (2011a), where it is assumed that every species could be identified with a fixed probability (for sampling of higher taxa only, see Stadler and Bokma 2012). The second issue, which is more serious, concerns the nature of the species-level birth-death model we have been using. For the past, this model is an appropriate representation of the outcomes of the evolutionary process, and abundance of species is an appropriate measure of diversity. However, as we approach the present, the validity of this model arguably breaks down, as species lose their singular identity and become more accurately represented by individuals, sub-populations and nascent new species (see e.g., Etienne and Rosindell 2012;Stadler 2013). This break-down is likely to account at least partly for the apparent lack of an observed POTPr in molecular phylogenies, which has been attributed to various types of background rate heterogeneity (Stadler 2013). Although this topic is clearly an active area of research, one approach as far as detection of the LCE is concerned would be to compare the relative present sizes of otherwise equivalent clades of similar ages. The LCE would predict that their early rates of lineage diversification and the size of their subsequent slowdown would be proportional to their current clade size. The general stochasticity of the whole process can of course lead to a very wide range of possible outcomes, and a suitably large sample of clades would be required to reliably detect the effect.
Simulation of large numbers of clades, with a range of both POTPa and LCE, and taking into account the possible disturbing effects of mass extinctions, may assist in fully understanding the range of possible outcomes of rate heterogeneity that can arise from homogeneous models.
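A simulation of this kind can be sketched in a few lines. The example below is illustrative only and makes several simplifying assumptions: it tracks total species diversity rather than reconstructed lineages, uses constant rates with no mass extinctions, and takes its parameter values (rates per Myr, a 30 Myr history, diversity sampled at 5 Myr) and function names from nowhere in the analyses above. It compares early diversity in all clades, in surviving clades, and in the smaller and larger halves of the survivors.

```python
import random


def simulate_clade(lam, mu, t_early, t_end, rng):
    """Simulate one homogeneous birth-death clade starting from a single species.
    Returns (diversity at t_early, diversity at t_end)."""
    n, t = 1, 0.0
    n_early = None
    while n > 0:
        # Waiting time to the next speciation or extinction event.
        t += rng.expovariate(n * (lam + mu))
        if n_early is None and t > t_early:
            n_early = n  # diversity standing at t_early
        if t > t_end:
            break  # n is the diversity standing at t_end
        n += 1 if rng.random() < lam / (lam + mu) else -1
    if n_early is None:
        n_early = 0  # clade went extinct before t_early
    return n_early, n


def early_diversity_summary(lam=0.5, mu=0.4, t_early=5.0, t_end=30.0,
                            n_clades=2000, seed=1):
    """Mean diversity at t_early for all clades, for survivors to t_end, and
    for the smaller and larger halves of the survivors ranked by final size."""
    rng = random.Random(seed)
    runs = [simulate_clade(lam, mu, t_early, t_end, rng) for _ in range(n_clades)]
    survivors = sorted((r for r in runs if r[1] > 0), key=lambda r: r[1])
    half = len(survivors) // 2

    def mean_early(group):
        return sum(r[0] for r in group) / len(group)

    return {
        "all": mean_early(runs),
        "survivors": mean_early(survivors),
        "small_survivors": mean_early(survivors[:half]),
        "large_survivors": mean_early(survivors[half:]),
    }


if __name__ == "__main__":
    print(early_diversity_summary())
```

Survivors should show higher early diversity than clades in general (the POTPa), and the larger survivors higher early diversity than the smaller ones (the analog of the LCE in raw diversity). Extending the sketch to reconstructed lineages or to imposed mass extinctions would require tracking the tree itself.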
The intersection of background rate heterogeneity and survivorship biases raises important issues about the generalizability of theories about diversification. Clades that survive until the present day are biased by the POTPa, and of those, the large clades will be further biased by the LCE. Thus, large living clades represent a very unrepresentative sample of clades in general, and their features will not be universal to the entire population of clades that have been generated by the evolutionary process. These unusual clades can of course be modeled with specific models that describe the rate shifts that must exist in them. Such models could be used to systematically generate similar large, surviving clades, but would likely fail to generalize to the more numerous, smaller and nonsurviving clades. Thus, analyzing only large, surviving clades to the exclusion of smaller and extinct clades will not demonstrate whether the properties of the studied clades result from survivorship biases, exogenous rate variation, or both. Indeed, the very simplicity of birth-death models and their powerful application suggests that the data we have are not in general sufficient to distinguish between the various ecological and evolutionary events that eventually give rise to them. As Nee remarked some years ago: "It is well known that completely different mechanisms can generate the same pattern: the distribution of parasitic worms among people is the same as the distribution of word usage in Shakespeare - the negative binomial. This means that the patterns themselves cannot inform us about mechanism and some other techniques are needed" (Nee 2002).

Summary
In this article, we have explored the patterns of diversification that can be generated by a retrospective view of a purely homogeneous model of diversification. These patterns can be substantial and highly nonhomogeneous, and it is essential to understand these "null" hypotheses before considering causal explanations for any residuals (cf Stadler and Steel 2012). Patterns of diversification through time have been much discussed in the literature (e.g., Hopkins and Smith 2015), with a common pattern being that diversification rates are high at the beginning of major evolutionary radiations, in both raw diversity counts and lineages through time plots. Various mechanisms for such effects have been proposed (such as filling empty ecological niches or unusual or flexible developmental evolution). The question that the analysis above poses is, however: are such patterns inevitably generated by the push of the past and/or the large clade effect? We have shown that the POTPa is strongest when background extinction rates are high, and that in likely scenarios for the evolution of large clades, it eventually accounts for nearly all of modern diversity. Furthermore, the POTPa impacts many other aspects of diversification dynamics, including recovery from mass extinctions. Indeed, the universality of such processes extends beyond evolutionary biology, with similar patterns being observed, for instance, in the size- or age-dependent growth of companies (see e.g. Reichstein and Dahl 2004 and references therein). Even under homogeneous models, large clades can be generated at the edge of likely distributions that possess another characteristic, the "large clade effect," which generates distinctive patterns in phenotypic and molecular evolution. Harvey et al. (1994), when briefly describing the POTPa, commented that "If these statistical effects are not fully appreciated, it could be tempting to misinterpret such a higher early slope as evidence for lineage birth rates being higher, and/or lineage death rates being lower, at earlier times" (p. 526). Here, we have attempted to quantify both the size of, and controls on, this effect, and to show just how important it is in patterns of changes of rates of evolution through time, including: dependency of rates of diversification on diversity; initial bursts of diversification at the origin of clades; and the effects of mass extinctions. Although it seems natural to take the history and diversification of large and ultimately successful clades such as the arthropods as proxies for evolutionary radiations as a whole (e.g., Briggs et al. 1992;Lee et al. 2013), including after mass extinctions, our analysis shows this to be particularly fraught with difficulties: the history of life was written by the victors.

AUTHOR CONTRIBUTIONS
The study and concepts within this article were jointly developed. Both authors reviewed the text and accept responsibility for the entire article.