Modelling distributions of fossil sampling rates over time, space and taxa: assessment and implications for macroevolutionary studies

Authors


Correspondence author. E-mail: wagnerpj@si.edu

Summary

  1. Observed patterns in the fossil record reflect not just macroevolutionary dynamics, but preservation patterns. Sampling rates themselves vary not simply over time or among major taxonomic groups, but within time intervals over geography and environment, and among species within clades. Large databases of presences of taxa in fossil-bearing collections allow us to quantify variation in per-collection sampling rates among species within a clade. We do this separately not just for different time/stratigraphic intervals, but also for different geographic or ecologic units within time/stratigraphic intervals. We then re-assess per-million-year sampling rates given the distributions of per-collection sampling rates.
  2. We use simple distribution models (geometric and lognormal) to assess general models of per-locality sampling rate distributions given occurrences among appropriate fossiliferous localities. We break these down not simply by time period, but by general biogeographic units in order to accommodate variation over space as well as among species.
  3. We apply these methods to occurrence data for Meso-Cenozoic mammals drawn from the Paleobiology Database and the New and Old Worlds fossil mammal database. We find that all models of distributed rates do vastly better than the best uniform sampling rates and that the lognormal in particular does an excellent job of summarizing sampling rates. We also show that the lognormal distributions vary fairly substantially among biogeographic units of the same age.
  4. As an example of the utility of these rates, we assess the most likely divergence times for basal (Eocene–Oligocene) carnivoramorphan mammals from North America and Eurasia using both stratigraphic and morphological data. The results allow for unsampled taxa or unsampled portions of sampled lineages to be in either continent and also allow for the variation in sampling rates among species. We contrast five models using stratigraphic likelihoods in different ways to summarize how they might affect macroevolutionary inferences.

Introduction

A concern expressed in even the oldest studies of evolution using fossil data is that inconsistent sampling might distort evolutionary patterns (Darwin 1859). Inconsistent sampling over time, geography and among taxa affects our perceptions of a wide range of macroevolutionary issues: from more general to more specific, these span from differences in overall richness (Raup 1972; Alroy et al. 2001), to extinction and origination rates (Sepkoski 1975; Foote 1997, 2001; Alroy 2000) and further down to specific ideas about timings of extinctions (Signor & Lipps 1982; Marshall 1995a) and originations (Wagner 1995a, 2000a; Huelsenbeck & Rannala 1997). These issues in turn spill over into other macroevolutionary issues such as whether apparent patterns of punctuated or continuous morphological change might reflect sampling (Marshall 1995b) or whether apparent shifts in rates of morphological change reflect differences in sampling affecting how much time lineages might have had to accumulate change (Wagner 1995b, 1997). Thus, being able to model variation in the rates at which we sample taxa from the fossil record transcends simple interest in sampling itself.

A key point is that there is no such thing as ‘the quality’ of the fossil record: the probabilities of sampling taxa, either per-stage or per-million years, vary enormously over time (Alroy 1999; Foote 2001), among taxonomic groups (Foote & Sepkoski 1999), across geography and environment (Smith 2001), and among species within clades (Wagner 2000a). Foote (1997; also Foote & Raup 1996) presents methods that can assess the first two issues, that is, variation in rates at which we sample taxa over time (Foote 2001) and among major clades (Foote & Sepkoski 1999). These methods require only synoptic compilations of first and last occurrences such as provided by Sepkoski (2002). However, they provide only single ‘average’ sampling rates for whole taxonomic groups and/or stratigraphic intervals. These rates themselves reflect two factors: sampling at the finest stratigraphic levels (i.e. individual collections of fossils from particular rock layers) and the number of collections within a stratigraphic or million-year interval. Assessing variation in per-stage or per-million-year sampling rates among taxa in the same interval (e.g. different species in a clade, or species from different habitats or geographic areas) therefore requires that we assess per-collection sampling rates.

Fortunately, palaeontologists have assembled large databases of fossil occurrences and distributions of sedimentary rock such as the Paleobiology Database (http://paleodb.org). These provide information about numbers of finds, numbers of sampling opportunities, and where those finds and opportunities exist geographically and environmentally. This opens the door to modelling sampling as distributions of per-collection rates rather than single ‘average’ per-stage or per-million-year rates and then extrapolating distributions of per-stage or per-million-year sampling rates from per-collection sampling rates.

The issue of rate variation is hardly specific to sampling rates from the fossil record. Phylogeneticists deal with an analogous problem when accommodating variation in rates of character change. Instead of deriving specific rates for individual characters, phylogeneticists assume that rates are drawn from model distributions such as the gamma (Yang 1994). Instead of assuming that single gammas fit all data partitions (e.g. different genes), they examine whether different data partitions fit different gamma distributions (Yang 1996). We adopt the same approach by first assessing whether model distributions for sampling rates such as geometrics or lognormals better predict distributions of fossil occurrences than do single sampling rate models. We further adapt this by breaking the distributions up by time and geography in order to better model suspected variation in the fossil record. Finally, we apply these results to a particular issue – the likelihood of stratigraphic gaps associated with divergence times implicit to a hypothesized phylogeny – and provide cursory discussion of how these results might affect macroevolutionary inferences.

Data and methods

Distributions of sampling rates

Some basic issues concerning the data

Our tests rely on occurrence (=incidence) data of fossil species. In particular, we are interested in how well distributions of sampling rates predict frequencies of occurrences per-sampling opportunity (i.e. relevant fossil-bearing collection). This introduces one major difference between our goal and the conceptually similar goal of modelling abundance distributions within communities (May 1975; Gray 1987). In those studies, the number of specimens limits how many specimens can belong to Species A, B, C, etc. Thus, if there are 100 specimens, then only one species can have 100 individuals. With occurrence data, in contrast, the number of possible occurrences for each species is the number of collections. Thus, if there are 100 collections, then theoretically all species could have 100 occurrences.

‘Per-sampling opportunity’ leads to our next basic issue: exactly which fossiliferous collections count as sampling opportunities? We define ‘sampling opportunities’ as collections from which a species of interest could have been sampled had it been present. For example, we do not expect to sample terrestrial vertebrates from marine sediments; thus, a collection of marine invertebrates is not a sampling opportunity for terrestrial mammals. This goes beyond basic environmental differences. For example, a Cenozoic locality preserving only terrestrial plants almost certainly captures an environment that hosted mammals, too. However, taphonomic processes (e.g. factors causing fossilization; Behrensmeyer & Kidwell 1985) can exclude basic preservational groups. Thus, environmental and taphonomic controls (Bottjer & Jablonski 1988) are critical for assessing sampling opportunities: a collection is an opportunity only if it shows that it could have contained the species of interest.

Assessed models

The first model that we assess is not a distribution, but a single rate. This represents the simplest (one parameter) model and thus is the null relative to all others. Many studies estimate global sampling rates per-chronostratigraphic unit (e.g. stage or substage; Foote & Raup 1996). Given a per-stage sampling rate Rs and N collections in that stage, we can estimate the per-collection sampling rate to be:

$$R_c = 1 - (1 - R_s)^{1/N}$$

Thus, if one estimates Rs = 0·333 for an interval with N = 100 collections, then one would estimate an ‘average’ Rc = 0·004.
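The conversion can be sketched in a few lines of Python. This is a minimal illustration that assumes a per-stage rate is the probability of at least one find across N independent collections sharing one per-collection rate, so that Rs = 1 − (1 − Rc)^N; the function name is ours, not part of any published code.

```python
def per_collection_rate(per_stage_rate, n_collections):
    """Invert Rs = 1 - (1 - Rc)**N to recover the per-collection rate Rc."""
    if not 0.0 <= per_stage_rate < 1.0:
        raise ValueError("per_stage_rate must lie in [0, 1)")
    if n_collections < 1:
        raise ValueError("n_collections must be at least 1")
    return 1.0 - (1.0 - per_stage_rate) ** (1.0 / n_collections)


# Worked example from the text: Rs = 0.333 with N = 100 collections.
rc = per_collection_rate(0.333, 100)
print(round(rc, 3))  # 0.004
```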

For numerous reasons, we do not expect a single Rc to summarize all taxa and all collections. Buzas et al. (1982) consider two models when looking at the distribution of occurrences: the log-series (Fisher, Corbet & Williams 1943) and the lognormal (Preston 1948). Their primary justification for doing this was that abundances of fossil taxa within collections often fit these two distributions well. Although we also are using occurrence data, we are modelling underlying sampling rates rather than occurrence frequencies. Thus, the log-series is unusable as it models distributions of discrete variables (e.g. taxa with 1, 2, etc. finds) rather than fractional variables such as rates. However, we can use geometric distributions (Motomura 1932) as an alternative. Like the log-series, the geometric distribution assumes that the relevant variables follow a uniform exponential distribution. We might expect geometric distributions of sampling rates if there is no cohesion among the numerous processes underlying sampling rates (e.g. geographic ranges, relative abundance and sample size from collections, local preservation potential and ease of identification and recovery). Conversely, if there are central tendencies to those processes that are associated with particular taxa, then we expect lognormal distributions (Montroll & Shlesinger 1982). Wagner & Marcot (2010) show that sampling rates among some Ordovician–Silurian gastropods follow a lognormal distribution.

Both the geometric and lognormal models have a basic sampling rate, r, as one parameter. For the geometric distribution, there is an additional ‘decay’ parameter, δ, giving how many times lower the next sampling rate is. For the taxon with the ith highest sampling rate, the per-collection sampling rate Rci is

$$R_{ci} = \frac{r_1}{\delta^{\,i-1}}$$

where r1 is the sampling rate of the most easily found taxon. Note that these rates are per-collection, so r1 cannot exceed 1·0. The uniform is a special case of the geometric in which δ = 1, that is, there is no decay in rates over taxa.
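As an illustration, the geometric model can be sketched as follows. We assume here that each successive taxon's rate is the previous rate divided by δ, i.e. Rci = r1/δ^(i−1), which is our reading of the description above; the function name and example values are hypothetical.

```python
def geometric_rates(r1, delta, n_taxa):
    """Per-collection rates for taxa ranked 1..n_taxa under a geometric model.

    r1: rate of the most easily found taxon (a probability, so at most 1.0).
    delta: decay, i.e. how many times lower each successive rate is.
    """
    if not 0.0 < r1 <= 1.0:
        raise ValueError("r1 must lie in (0, 1]")
    if delta < 1.0:
        raise ValueError("delta must be at least 1")
    # The exponent i runs 0, 1, 2, ... so the first taxon keeps rate r1.
    return [r1 / delta ** i for i in range(n_taxa)]


print(geometric_rates(0.5, 2.0, 3))  # [0.5, 0.25, 0.125]
print(geometric_rates(0.1, 1.0, 3))  # delta = 1 recovers the uniform model
```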

For the lognormal, the basic rate r represents the geometric mean of the rates. The distribution is determined by two more parameters. One is a magnitude parameter, m, which gives one standard deviation in magnitude of change around the mean. The second is true richness (S). For the taxon with the ith highest sampling rate, Rci is

$$R_{ci} = r \times m^{\,\mathrm{norminv}\left(\frac{S+1-i}{S+1}\right)}$$

where norminv(x) gives the number of standard deviations away from the mean for which x is the area under the bell curve to the left of that point. The latter parameter illustrates the importance of S. At S = 50 taxa, Rc1 is proportional to m to the power norminv(50/51) = 2·06; however, at S = 100 taxa, Rc1 is proportional to m to the power of norminv(100/101) = 2·33. Note that the uniform is a special case of the lognormal where m = 1·0 and S = ∞.
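A short Python sketch of the lognormal model follows. It assumes that the taxon with the ith highest rate sits at the normal quantile (S + 1 − i)/(S + 1), which matches the norminv(50/51) and norminv(100/101) examples above; the function name is ours, and capping rates at 1·0 is our own assumption.

```python
from statistics import NormalDist


def lognormal_rates(r, m, richness):
    """Per-collection rates for taxa ranked i = 1..S under a lognormal model.

    r: geometric mean rate; m: magnitude of change per standard deviation;
    richness: true richness S.
    """
    norminv = NormalDist().inv_cdf  # standard normal quantile function
    rates = []
    for i in range(1, richness + 1):
        z = norminv((richness + 1 - i) / (richness + 1))
        # Capping at 1.0 is our assumption: a per-collection rate is a probability.
        rates.append(min(1.0, r * m ** z))
    return rates


print(round(NormalDist().inv_cdf(50 / 51), 2))    # 2.06, as quoted in the text
print(round(NormalDist().inv_cdf(100 / 101), 2))  # 2.33, as quoted in the text
```

With m = 1·0 every taxon receives rate r, which is the uniform special case.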

There are several other distribution models (e.g. gamma, Zipf, etc.) that one might consider. However, we found none of them to perform as well as the best models considered here, and there are no particular theoretical reasons to expect these distributions. Therefore, we do not discuss them here. We do include the likelihoods of saturated models (Sanderson 2002) where the expected number of taxa with 1…N finds equals the observed. Saturated models provide the maximum possible likelihood of any hypothesis derived from these sorts of models and thus provide a useful benchmark for evaluating the performance of the simple models.

Model assessment

We assess hypotheses under any particular model by deriving the expected frequencies of taxa with 1…N finds and then using multinomial probability to assess the likelihood of the rate distribution given occurrence data. For any distribution, the expected frequencies given Rc are

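$$f(k) = \frac{\sum_{i=1}^{S} \binom{N}{k} R_{ci}^{\,k} (1 - R_{ci})^{N-k}}{\sum_{i=1}^{S} \left[1 - (1 - R_{ci})^{N}\right]}$$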

The numerator sums the binomial probability of k occurrences given N collections and sampling rate Rc. The denominator sums binomial probabilities of finding the taxon at all (i.e. one minus the probability of 0 finds). This conditions ƒ(k) on the taxa being found, which is appropriate because we can tally taxa with 1, 2, etc. occurrences, but not those with 0 occurrences. The lognormal is the only model for which S is an explicit parameter (Wagner, Kosnik & Lidgard 2006). In the case of the geometric, we sum up to the i at which Rci becomes so low that it no longer elevates the denominator past the 4th decimal point. In the case of the uniform, the summation is unnecessary as Rci = r for any taxon i; thus, the summations in both the numerator and denominator can be eliminated. For the saturated model, we forgo this equation and simply set ƒ(x) = Sx/Sobs, where Sx is the observed number of taxa with x occurrences and Sobs is the observed number of taxa.
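The calculation described above can be sketched as follows. This is an illustrative implementation under our reading of the text (summed binomial probabilities of k finds divided by summed probabilities of at least one find); the function name and example rates are hypothetical, and for intervals with many hundreds of collections a log-space version would be needed to avoid numerical overflow.

```python
from math import comb


def expected_frequencies(rates, n_collections):
    """Expected proportions of sampled taxa with k = 1..N finds.

    rates: per-collection sampling rates, one per taxon (Rci).
    Returns a list whose element k - 1 is f(k).
    """
    n = n_collections
    # Denominator: summed probability that each taxon is found at least once.
    p_found = sum(1.0 - (1.0 - rc) ** n for rc in rates)
    freqs = []
    for k in range(1, n + 1):
        # Numerator: summed binomial probability of exactly k finds.
        p_k = sum(comb(n, k) * rc ** k * (1.0 - rc) ** (n - k) for rc in rates)
        freqs.append(p_k / p_found)
    return freqs


# Hypothetical example: five taxa with geometrically decaying rates, 20 collections.
example_rates = [0.30 / 1.5 ** i for i in range(5)]
f = expected_frequencies(example_rates, 20)
print(round(sum(f), 6))  # 1.0 (the conditioned frequencies sum to one)
```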

For any particular Rc from any given model given the data, the sufficient statistic for the likelihood is

$$L \propto \prod_{x=1}^{N} f(x)^{\,n(x)}$$

where n(x) gives the number of taxa with x = 1…N occurrences (see Figs S1–S4). Although the uniform represents a special case of the geometric and the lognormal, the geometric is not a special case of the lognormal. Therefore, we use Akaike's modified information criterion (Sugiura 1978) to compare the best representatives of each model. We include comparisons with the saturated model simply to inform the readers' intuitions regarding how close simple models come to maximally explaining the data.
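A hedged sketch of the model-comparison step: the log-likelihood below is the multinomial kernel (the sum of n(x) ln ƒ(x), omitting the constant multinomial coefficient), and the AICc uses the standard small-sample form −2 lnL + 2k + 2k(k + 1)/(n − k − 1). Treating the number of sampled taxa as the sample size n is our assumption, and the counts are invented for illustration.

```python
from math import log


def log_likelihood(observed_counts, expected_freqs):
    """Sum n(x) * ln f(x) over occurrence classes x = 1..N."""
    return sum(n * log(f)
               for n, f in zip(observed_counts, expected_freqs) if n > 0)


def aicc(ln_l, n_params, n_obs):
    """Small-sample Akaike information criterion."""
    k = n_params
    return -2.0 * ln_l + 2.0 * k + (2.0 * k * (k + 1)) / (n_obs - k - 1)


# Hypothetical counts of taxa with 1, 2, 3 and 4 occurrences.
counts = [12, 5, 2, 1]
s_obs = sum(counts)
# Saturated model: expected frequencies equal the observed ones.
saturated = [n / s_obs for n in counts]
print(round(log_likelihood(counts, saturated), 2))
```

No candidate set of expected frequencies can exceed the saturated log-likelihood, which is why it serves as the benchmark.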

Data

We use occurrences and collections of terrestrial mammals from the Campanian through the Pliocene. The bulk of the data come from the Paleobiology Database (PaleoDB; http://paleodb.org), downloaded on 29 November 2012. We augment these data with the New and Old Worlds (NOW) database (formerly Neogene Old World; http://www.helsinki.fi/science/now/), after vetting the data to remove duplicate localities and occurrences. Uhen et al. (2013) review both databases and numerous macroevolutionary studies that use these data (Alroy 1996, 1998, 1999; Fortelius et al. 1996, 2002; Raia et al. 2012). We include only occurrences identified to the species level and thus only those collections with such occurrences. We use species from the Lepidosauromorpha (e.g. lizards, snakes and relatives) as a taphonomic control group. Our justification for this is that terrestrial localities from which workers can identify lepidosauromorphs to the species level have the potential to preserve mammal specimens that can be identified to the same level. Conversely, our analyses exclude Cetacea and other exclusively marine mammals, as we do not expect to find terrestrial mammals in fossil beds yielding those taxa. In total, we use 46612 occurrences of 8129 species from 7871 localities (Table 1) and bin these into standard stages and substages of the Mesozoic and Cenozoic. PaleoDB data represent 5587 references.

Table 1. General summary of analysed data

Stage/Substage(s)        Onset (Millions of Years Ago)   Sampled taxa   Occurrences   Collections
Campanian                83·5                            89             314           133
Maastrichtian            70·6                            86             728           198
Danian                   65·5                            306            1396          253
Selandian–Thanetian      61·7                            363            1668          296
Ypresian                 55·8                            553            6458          998
Lutetian                 48·6                            445            1605          328
Bartonian–Priabonian     40·4                            762            2326          374
Rupelian                 33·9                            484            1778          489
Chattian                 28·4                            397            871           202
Aquitanian–Burdigalian   23·0                            1429           6875          1082
Langhian–Serravallian    16·0                            1284           7890          1230
Tortonian–Messinian      11·6                            1667           7326          1165
Zanclean                 5·3                             831            2131          369
Piacenzian–Gelasian      3·6                             1309           5246          754

Note: ‘Onset’ gives the beginning of the stratigraphic units in question in millions of years as per Gradstein, Ogg & Smith (2005). Sampled taxa give numbers of species sampled.

Mammalian localities are not evenly distributed over time (Fig. 1). In particular, the Miocene and Pliocene have more localities than expected given their durations, and the Paleocene and Cretaceous have fewer localities than expected. This pattern becomes more pronounced when we subdivide collections by continent. Although large proportions of Miocene and Pliocene collections come from Eurasia, a much smaller proportion of pre-Miocene collections are from Eurasia. African sampling not only is less than North American or Eurasian, but is simply quite poor prior to the Miocene. (Both results likely reflect the NOW database beginning as the Neogene Old World database, leading it to still be better populated with post-Oligocene data than pre-Miocene data.) The bulk of Campanian–Oligocene localities in our pooled data set come from North America.

Figure 1.

Chronology and stratigraphy for Campanian–Pliocene mammals. Time scale is modified from Gradstein, Ogg & Smith (2005). ‘Coll.’ gives the number of collections (taphonomically controlled fossiliferous localities) within each stratigraphic unit for the globe, Africa, Eurasia and North America. Data are from the PaleoDB and NOW databases. Solid lines divide the stages and substages used in these analyses; dashed lines separate substages lumped into single units.

Using the localities per-stratigraphic unit shown in Fig. 1, we evaluate the different basic models for per-collection sampling rates both globally and then by individual continent.

Results

For every interval considered, the lognormal distribution outperforms the geometric and (especially) the uniform model (Table 2; see Table S1 for the parameters of the best representative of each model). This is not simply due to extra parameters: the lognormal does much better than the geometric and uniform given AICc scores (Table S2). These simple models do very good jobs of summarizing the data. The difference in log-likelihoods between the best uniform rate hypothesis and the saturated model (i.e. the best possible hypothesis) represents the maximum improvement in log-likelihood that a model can offer. If we scale this difference to 1·0, then we find that in nearly all intervals, lognormals provide over 90% (and frequently over 95%) of the possible improvement in log-likelihoods over the uniform distribution (Fig. 2). Thus, lognormals do not leave huge room for improvement by still more complex models.
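The scaling described above is simple arithmetic, sketched here with the Ypresian log-likelihoods from Table 2. The function name is ours; the values are copied from the table.

```python
def scaled_improvement(lnl_model, lnl_uniform, lnl_saturated):
    """Share of the maximum possible log-likelihood gain achieved by a model.

    0.0 means no better than the best uniform rate; 1.0 means as good as
    the saturated model.
    """
    return (lnl_model - lnl_uniform) / (lnl_saturated - lnl_uniform)


# Ypresian row of Table 2.
uniform, geometric, lognormal, saturated = -8684.0, -1664.8, -1496.5, -1389.3

print(round(scaled_improvement(geometric, uniform, saturated), 3))  # 0.962
print(round(scaled_improvement(lognormal, uniform, saturated), 3))  # 0.985
```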

Table 2. Log-likelihoods for the best representatives of the three rate distribution models considered

Stage/Substage(s)        Uniform    Geometric   Lognormal   Saturated
Campanian                −253·7     −192·2      −183·6      −172·4
Maastrichtian            −902·3     −232·3      −216·1      −181·9
Danian                   −1281·9    −725·3      −696·5      −665·6
Selandian–Thanetian      −1929·9    −931·4      −790·3      −755·0
Ypresian                 −8684·0    −1664·8     −1496·5     −1389·3
Lutetian                 −1778·6    −944·3      −816·6      −789·6
Bartonian–Priabonian     −2076·8    −1486·6     −1399·6     −1375·5
Rupelian                 −2186·3    −1090·3     −868·4      −836·6
Chattian                 −752·8     −617·8      −569·6      −556·6
Aquitanian–Burdigalian   −7055·8    −3549·3     −3195·5     −3122·5
Langhian–Serravallian    −8853·7    −3398·1     −3094·3     −3024·2
Tortonian–Messinian      −7938·9    −3906·0     −3486·6     −3438·1
Zanclean                 −1988·2    −1473·4     −1318·6     −1295·4
Piacenzian–Gelasian      −5450·9    −2948·8     −2654·4     −2605·7

Note: ‘Saturated’ gives the maximum possible log-likelihood from a theoretical hypothesis predicting that the expected E[ƒn] = observed ƒn for all n = 1…N occurrences. We use log-likelihoods to evaluate how well the geometric and lognormal perform relative to the maximum possible performance.

Figure 2.

The performance of the geometric and lognormal models relative to the best possible performance. The X-axis represents the difference between the best uniform rate model and a ‘saturated’ model that predicts the observed frequencies of taxa with 1, 2, … N occurrences. The latter represents the maximum possible likelihood for models such as the geometric or lognormal.

Lognormal sampling rates change perceptions of overall sampling markedly. Here, we summarize the lognormal using the midpoint rates associated with four equal-area partitions, that is, the rates at which 12·5%, 37·5%, 62·5% and 87·5% of taxa have higher rates (Yang 1994). The most likely uniform rate frequently is higher than the 4th quartile rate (Fig. 3) simply because the few commonly occurring taxa are less probable given low sampling rates than many infrequently occurring taxa are given high sampling rates. The 4th quartile rates often are similar to average per-million-year sampling rates estimated by range data for North American species alone (0·34 in Foote & Raup 1996; 0·48 in Foote 1997). Thus, both methods of ‘uniform’ sampling rates accommodate the common taxa before the rare ones.
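The four summary rates can be sketched as follows, assuming that a lognormal with geometric mean r and magnitude m has quantile p at r × m^norminv(p). The function name and the parameter values are hypothetical.

```python
from statistics import NormalDist


def partition_midpoint_rates(r, m, n_partitions=4):
    """Rates at the midpoints of equal-area partitions of a lognormal.

    With four partitions the midpoints are the 12.5th, 37.5th, 62.5th and
    87.5th percentiles, returned from lowest to highest rate.
    """
    norminv = NormalDist().inv_cdf
    midpoints = [(2 * j + 1) / (2 * n_partitions) for j in range(n_partitions)]
    return [r * m ** norminv(p) for p in midpoints]


# Hypothetical lognormal: geometric mean 0.01, rates changing 3x per standard deviation.
for rate in partition_midpoint_rates(0.01, 3.0):
    print(round(rate, 5))
```

Because the partitions are symmetric about the median, the lowest and highest rates are equally far from r on a log scale.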

Figure 3.

Per-collection sampling probabilities in different stratigraphic intervals for Meso-Cenozoic mammals. X gives the uniform rate maximizing the probability of the observed fossil record. We summarize the lognormal using the medians of four equal-area partitions of the lognormal distribution; thus, the highest rate is the 87·5th percentile rate, whereas the lowest rate is the 12·5th percentile rate.

We approximate sampling rates per-million years within intervals, Rm, using the number of localities per-million years (Fig. 4). For any interval i, we estimate Rm as:

$$R_m = 1 - (1 - R_c)^{N_i / t_i}$$

where Ni is the number of collections and ti is the duration of the interval in millions of years.
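A minimal sketch of this rescaling, assuming that Ni/ti collections fall in each million years and that each offers an independent chance Rc of a find. The per-collection rate of 0·004 is hypothetical; the Ypresian collection count and duration (55·8 − 48·6 = 7·2 myr) come from Table 1.

```python
def per_myr_rate(per_collection_rate, n_collections, duration_myr):
    """Probability of at least one find per million years.

    Assumes n_collections / duration_myr independent sampling
    opportunities per million years, each with the same rate.
    """
    collections_per_myr = n_collections / duration_myr
    return 1.0 - (1.0 - per_collection_rate) ** collections_per_myr


# Ypresian: 998 collections over 7.2 myr, with a hypothetical Rc of 0.004.
rm = per_myr_rate(0.004, 998, 7.2)
print(round(rm, 2))  # 0.43
```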

Figure 4.

Estimated per-collection sampling rates per-million years in different stratigraphic intervals for Meso-Cenozoic mammals. Each per-collection sampling rate from Fig. 3 is rescaled to $1 - (1 - R_c)^{N_i/t_i}$, where Ni is the number of collections and ti the duration (in millions of years) for interval i.

We can improve overall likelihoods still further not by employing more complicated mathematical models, but by applying separate lognormals to different data partitions. Here, we partition sampling rate distributions among the major biogeographic regions in our pooled data set (Fig. 5; Fig. S5 gives per-collection sampling rates; Table S3 gives details about each lognormal). We would get different sampling rates per-million years for these geographic partitions simply because of the different numbers of sampling opportunities per-million years in these regions (Fig. 1). However, the best model per-collection rate distributions often are very different for different regions from the same interval. The most obvious case is the early Eocene (approximately 52·5 Ma), where the single, global lognormal (Fig. 4) apparently represents a bad compromise between different regional lognormals (Fig. 5). However, several other intervals show marked differences in both geometric means and variances of per-collection sampling rates.

Figure 5.

Estimated collection sampling probabilities per-million years in different stratigraphic intervals for Meso-Cenozoic mammals now divided into different biogeographic units. Note that intervals with too few localities are excluded. See Fig. 3 for additional explanation.

Discussion

Application of results: divergence times among carnivoramorphan mammals

In lieu of a traditional discussion, we present an applied example using mammalian sampling rates in a phylogenetic context. Phylogenies are particularly useful for our discussion for two basic reasons. First, phylogenies allow reconstruction of ancestral geographic distributions (Ree 2005; Ree & Smith 2008), which in turn let us take advantage of different sampling rates for different regions. Second, there is some branch duration that will maximize the likelihood of any branch on a phylogeny given some distribution of character states and some hypothesized rate(s) of change (Felsenstein 1973). This latter point is particularly critical because alternate macroevolutionary hypotheses often predict different rates of morphological change over particular time intervals and thus are optimized at different branch durations. Branch duration therefore is a critical nuisance parameter for hypotheses ranging from punctuated versus continuous morphological change (Marshall 1995b) to ‘big bangs’ in morphologic evolution early in clade histories (Wagner 1995b, 1997; Ruta, Wagner & Coates 2006). If we can indefinitely extend branch durations (and thus the time to accumulate change), then we often can elevate the likelihood of hypotheses of constant rates when a literal reading of the fossil record suggests rate decreases. However, the likelihoods of these durations might be strongly reduced if the stratigraphic gaps implicit to those durations are improbable given sampling rates (Huelsenbeck & Rannala 1997; Wagner 2000a). ‘Molecular clock’ studies reverse the emphasis on the same parameters: character rate variation is the nuisance parameter, and branch durations are the inference (Huelsenbeck, Larget & Swofford 2000; Drummond et al. 2006). Although this traditionally was restricted to molecular characters, recent divergence-date studies have extended this to morphological characters when dating branches leading to fossil taxa (Pyron 2010, 2011; Ronquist et al. 2012). Marshall (2008) addresses bracketing divergence times with limited occurrence data; here, we approach the same general problem using a much broader range of information from occurrence data.

To explore how our approach might alter inferences, we calibrate branch durations for early (Eocene – Oligocene) carnivoramorphan mammals from North America and Eurasia under five different models using morphological, biogeographic and stratigraphic data. Here, we will discuss how different approaches in general and how branch duration calibrations in particular might affect conclusions drawn from Wesley-Hunt's (2005) analyses of carnivoramorphan dental disparity.

We use character data and a corresponding parsimony tree from Wesley-Hunt & Flynn (2005). We estimate morphological likelihoods using an Mk model assuming continuous change through time (Lewis 2001), with morphological rates assumed to follow a lognormal distribution estimated from compatibility tests (Wagner 2012) and an initial estimate of unsampled lineages over which change could accrue based on global lognormal preservation rates (Foote 1996). We also use an Mk model to estimate likelihoods of geographic distributions (Ree & Smith 2008), with lineages assumed to occupy either North America or Eurasia, and with the probability of an unsampled lineage being in North America calculated simply as:

display math

Given this particular tree we use, ancestral geographies have very high probabilities for either North America or Eurasia save for a very few branches near a likely North American → Eurasian incursion (Fig. S6; see also Table S4).

The stratigraphic likelihood of any branch duration is the probability of zero finds over X million years based on per-collection sampling rates illustrated in Figs 3-5 and collections per-million years illustrated in Fig. 1:

display math

where A is a possible ancestral region, st and ST are the first and last stages (or other chronostratigraphic units) over which a branch spans, t is the duration within that stage that the branch spans, and q is one of four quartiles within the lognormal distribution (Wagner & Marcot 2010). Thus, RmA•s•q gives the per-million-year sampling rate for Area A in stage s from lognormal quartile q, and (1 − RmA•s•q)^t gives the probability of zero finds over t million years. Branch likelihoods are now

display math
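As a hedged illustration of the zero-find term described above, the sketch below computes the probability of a gap within a single stage. It averages (1 − Rm)^t over the four quartile rates with equal weight and then weights each region by the probability that the lineage occupied it. The equal quartile weighting, the function name and all rates and probabilities shown are our assumptions for illustration; how terms combine across several stages is not addressed here.

```python
def gap_probability_one_stage(region_probs, quartile_rates, t_myr):
    """Probability of zero finds over t_myr within one stage.

    region_probs: {region: probability the unsampled lineage lived there}.
    quartile_rates: {region: per-million-year rates for the four quartiles}.
    """
    total = 0.0
    for region, p_region in region_probs.items():
        rates = quartile_rates[region]
        # Equal-weight average over quartiles (our assumption).
        p_gap = sum((1.0 - rm) ** t_myr for rm in rates) / len(rates)
        total += p_region * p_gap
    return total


# Hypothetical values: a lineage probably in North America, unsampled for 2 myr.
probs = {"North America": 0.9, "Eurasia": 0.1}
rates = {
    "North America": [0.05, 0.20, 0.45, 0.80],
    "Eurasia": [0.01, 0.05, 0.15, 0.40],
}
print(round(gap_probability_one_stage(probs, rates, 2.0), 3))
```

Setting all four quartile rates equal and using a single region reduces this to (1 − Rm)^t, the uniform-rate case.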

We then estimate divergence times and branch durations under five different models. In decreasing order of complexity, these are:

  • Model 1: Separate regional lognormal sampling rates from Fig. 5 (Fig. 6);
  • Model 2: Global lognormal sampling rates from Fig. 4 (Fig. S7a);
  • Model 3: Global uniform sampling rates from Fig. 4 (Fig. S7b);
  • Model 4: Branch durations optimized solely to fit the Mk model, with sampling rates ignored (Fig. S7c);
  • Model 5: Minimum divergence times determined as the oldest occurrences of descendant taxa, with both sampling rates and Mk likelihoods ignored (Fig. S7d).

Figure 6.

Phylogeny of early carnivoramorphan mammals with branch durations estimated from both morphological and stratigraphic data. Branch widths reflect stratigraphic likelihoods assuming different lognormal sampling rates for North America and Eurasia (Model 1 of text; see Fig. 5), with the probability of unsampled lineages occupying North America or Eurasia estimated based on illustrated distributions and an Mk model with an expectation of 0·011 geographic shifts per-myr (see Fig. S6). Morphological likelihoods use an Mk model (Lewis 2001) assuming a single lognormal distribution for rates of morphological change, which has a geometric mean of 0·028 changes per-myr that changes 3·4 times every standard deviation. Data and phylogeny modified from Wesley-Hunt & Flynn (2005). See Fig. S7 for trees estimated using different models.

The models become progressively simpler by assuming fewer variable terms in the stratigraphic likelihood. Model 1 allows for a separate RmA•s•q for each quartile in each region in each stage s. Model 2 simplifies this by assuming that Rm1•s•q = Rm2•s•q = … = RmA•s•q for all A regions (i.e. global lognormal sampling rates for each stage s). This eliminates the first summation. Model 3 simplifies still further by assuming that RmA•s•1 = RmA•s•2 = RmA•s•3 = RmA•s•4 for all A regions (i.e. a global uniform sampling rate for each stage s). This effectively reduces the second summation to (1 − Rms)^t. Finally, Models 4 and 5 effectively eliminate this last term by assuming Rms = 0·0 and thus the probability of any gap = 1·0. In both cases, stratigraphic data are used only to set minimum divergence times: that is, the appearance of the oldest taxon in a clade if we assume no sampled ancestors. Within each clade, Model 4 uses the Mk model to calibrate some divergence time preceding the oldest taxon's first appearance (e.g. Viverravus gracilis in the V. minutus + V. gracilis pair). However, Model 5 ignores even this and will essentially ascribe branch durations of 0 to species such as V. gracilis. Note that Model 5 therefore necessarily ascribes the minimum possible divergence times (unless we add ancestor–descendant hypotheses; see Wagner 2000b,c) because we use no information to infer that divergences might be older than absolutely necessary. Model 4 ascribes the maximum possible divergence times because stratigraphic data cannot gainsay morphological data.

Table 3 summarizes the total time that each model allots for character change. Figure 7 contrasts the differences in individual branch durations, using the Model 1 (regional lognormal sampling rates) tree as a benchmark. These make a very important point: although our intuition might be that accounting for variation in the fossil record should increase the probability of gaps, in some cases, it will have the opposite effect. Here, assuming global lognormal sampling rates makes many gaps more probable rather than less probable (and thus deep divergences more likely rather than less likely; Fig. 8a) than when we account for geographic variation in sampling rates (i.e. Model 2 vs. Model 1 trees). This is because so many of the necessary gaps are for lineages that likely resided in the well-sampled North American realms. However, allowing for variation in sampling rates among taxa yields the intuitive result: not only do global uniform sampling rates imply much shorter branch durations than regional lognormals (Fig. 7), but uniform sampling rates make these shorter branch times less likely than regional lognormal sampling rates make longer branch durations (Fig. 8b).

Table 3. Sum of branch durations inferred under different models

Model                                          ∑ Branch durations (myr)
1: Regional lognormal sampling rates + Mk      131·8
2: Global lognormal sampling rates + Mk        140·1
3: Global uniform sampling rates + Mk          121·6
4: Mk only                                     141·8
5: Minimum divergence times                    101·5

This gives the total amount of time allotted for character change under Mk (or other) models. myr, million years.

Figure 7. Differences in branch durations between the Model 1 tree (using regional lognormal sampling rates to calculate stratigraphic likelihoods) and Model 2 (global lognormal sampling rates), Model 3 (global uniform sampling rates), Model 4 (stratigraphic likelihood ignored) and Model 5 (minimum divergence time) trees. Positive numbers indicate that the branch duration is greater on the Model 1 tree than on the contrasted tree.

Figure 8. Log-likelihoods (lnL) of gaps implied by branch durations in carnivoramorphan phylogeny (Fig. 6). Note that although each plotted branch links the same taxa, the durations sometimes are different (Fig. 7). (a) Regional lognormal sampling rates versus global lognormal sampling rates. (b) Regional lognormal sampling rates versus global uniform sampling rates.

Carnivoramorphan dental characters show high disparity despite low taxonomic diversity very early in clade history (Wesley-Hunt 2005), which is consistent with elevated early rates of change (Foote 1993). However, dental characters are only a subset of the characters that we use (Wesley-Hunt & Flynn 2005). We therefore can ask whether disparity patterns among dental characters predict rates of change among all characters. The Model 4 tree obviously will contradict ideas of elevated early rates because it calibrates both early and late branch durations assuming the same rates of change. However, stratigraphic likelihoods assuming lognormal distributions of sampling rates (either global or regional) also allow for comparably deep divergences (and thus reasonably continuous rates of change) at the base of the clade. In contrast, the Model 3 (global uniform) tree would be much more disposed to favour elevated early rates of change, as it allots far shorter basal branch durations under the same model of morphological change. Finally, the Model 5 tree basically leaves the question open, as it provides no means for implying deeper divergences.

Dental disparity did not increase following the near extinction of likely competitors (the Creodonta) during the late Eocene, which suggests that increased ecological opportunities for carnivoramorphans did not elevate their rates of dental change. Notably, all four models using the Mk model for all characters are consistent with this: although those using stratigraphic data shorten many of the branch durations relative to the Mk model alone, they still allot substantial time for change. Again, the Model 5 tree will be more disposed towards favouring elevated late Eocene rates simply because that tree allots no time for many nodal branches to accumulate change.

Although paleontological rate studies frequently ask whether rates are elevated early in clade history or in association with some major event, none have explored the idea of local rate variation among branches (Huelsenbeck, Larget & Swofford 2000; Drummond et al. 2006). This is where the Model 1 tree might well yield more tangible differences from the Model 4 tree: allowing some probability of elevated ‘local’ rates would elevate the total likelihood of the branch durations reduced by stratigraphic data (Fig. 7). However, the Model 1 tree would be less prone to doing this than the Model 3 tree simply because regional lognormal sampling rates reduce the likelihoods of gaps much less than global uniform rates do. Note that any stratigraphic likelihood model would be less biased towards supporting local rate heterogeneity than the Model 5 (minimum divergences) tree. This is not just because all stratigraphic likelihood trees extend many branches that have near-zero durations under minimum divergences, but also because the stratigraphic likelihoods reduce the durations of branches linking clades in some cases.

Future directions

Relaxing assumptions about continuous distributions of localities over time

When assessing sampling rates per million years, our approach currently assumes that localities are continuously distributed throughout an interval. This is rarely, if ever, true. However, biochronological techniques for ordinating localities based on constituent species, combined with some absolute dates, offer the potential for very high-resolution biochronological placement of localities (Alroy 1994; Sadler, Kemple & Kooser 2003). This can allow us to generate different per-myr and even per-collection sampling rates within intervals.

Origination and extinction

Another advantage to ordinating collections within stratigraphic units is that these results will often show that species durations are less than that of whole stratigraphic intervals (Alroy 1996). This is important because our estimates of sampling rates do not take into account turnover within stratigraphic intervals: instead, we assume that any taxon present in an interval was present throughout the entire interval. For taxa with true durations less than that of the entire interval, many collections currently tallied as ‘gaps’ actually come from before or after the species’ lifetimes, which biases our method towards underestimating sampling rates. An obvious next step in this sort of approach is to add origination and extinction parameters (Weiss & Marshall 1999). However, we stress that the approach as done here should be ‘conservative’ with respect to rejecting null hypotheses because of long stratigraphic gaps.
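The direction of this bias is easy to see with invented numbers. The sketch below assumes that collections are spread evenly through the interval, so that a species living through only a fraction of the interval could have been sampled in only that fraction of the collections. The function names and counts are hypothetical and are not taken from the data sets analysed here.

```python
def per_collection_rate(occurrences, collections):
    """Naive per-collection sampling rate: occurrences / available collections."""
    return occurrences / collections


def rate_given_true_duration(occurrences, collections, fraction_of_interval):
    """Rate when the species lived through only part of the interval.
    Assumes collections are evenly distributed through the interval, so only
    fraction_of_interval * collections could ever have sampled the species."""
    available = collections * fraction_of_interval
    return occurrences / available


# Hypothetical species found in 6 of 60 collections from one interval.
naive = per_collection_rate(6, 60)
corrected = rate_given_true_duration(6, 60, 0.5)  # species lived for half the interval

print("rate assuming presence throughout the interval:", naive)
print("rate if the species lived for half the interval:", corrected)
```

Counting the collections that pre-date or post-date the species as gaps halves the estimated rate in this example, which is why ignoring turnover within intervals makes long gaps appear more probable than they should.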

Conclusions

The existence of large databases of fossil occurrences such as the PaleoDB and NOW allows us to assess sampling rates over time and space in greater detail than ever before. Here, we show that in the case of fossil mammals at least, lognormal distributions of sampling rates among taxa prevail. Moreover, these distributions vary considerably over time and among contemporaneous geographic areas. Many interesting macroevolutionary hypotheses concerning rates of morphological change, speciation patterns and turnover events differ in the gaps they require between observed stratigraphic ranges and either divergence times or extinction times. Combining these models and these data should allow evolutionary biologists to more fully exploit the fossil record as a tool for corroborating or contradicting these hypotheses while at the same time allowing for the uncertainties inherent to the fossil record.

Acknowledgements

We thank the special issue editors for their invitation and their subsequent forbearance. We also thank D. Bapst and P. D. Polly for very insightful reviews that (hopefully) led to clarification of our primary goals and concepts. For discussions about the appropriate distributions to model sampling rates, we thank J. Alroy. This represents PaleoDB Publication No. 182. For those data, we thank in particular J. Alroy, K. Behrensmeyer, M. Uhen, A. Turner, L. v. d. Hoek Ostende and M. Carrano. Occurrence data and a C program for estimating per-collection sampling rates are available at the Dryad Data repository (http://datadryad.org; doi:10.5061/dryad.3b87j).
