Time for a rethink: time sub‐sampling methods in disparity‐through‐time analyses
Data archiving statement:
Data for this study are available in Bapst et al. (2016b); Wright (2017b); Brusatte et al. (2014b); Beck & Lee (2014) but can also be found on GitHub: https://github.com/nhcooper123/time-slice; and Zenodo: https://doi.org/10.5281/zenodo.1172000. Supporting information can be found in the Dryad Digital Repository: https://doi.org/10.5061/dryad.vp4q518.
Abstract
Disparity‐through‐time analyses can be used to determine how morphological diversity changes in response to mass extinctions, or to investigate the drivers of morphological change. These analyses are routinely applied to palaeobiological datasets, yet, although there is much discussion about how to best calculate disparity, there has been little consideration of how taxa should be sub‐sampled through time. Standard practice is to group taxa into discrete time bins, often based on stratigraphic periods. However, this can introduce biases when bins are of unequal size, and implicitly assumes a punctuated model of evolution. In addition, many time bins may have few or no taxa, meaning that disparity cannot be calculated for the bin and making it harder to complete downstream analyses. Here we describe a different method to complement the disparity‐through‐time tool‐kit: time‐slicing. This method uses a time‐calibrated phylogenetic tree to sample disparity‐through‐time at any fixed point in time rather than binning taxa. It uses all available data (tips, nodes and branches) to increase the power of the analyses, specifies the implied model of evolution (punctuated or gradual), and is implemented in R. We test the time‐slicing method on four example datasets and compare its performance in common disparity‐through‐time analyses. We find that the way we time sub‐sample taxa can change our interpretations of the results of disparity‐through‐time analyses. We advise using multiple methods for time sub‐sampling taxa, rather than just time binning, to gain a better understanding of disparity‐through‐time.
Disparity‐through‐time analyses are common in palaeontology (Gould 1991; Briggs et al. 1992; Wills et al. 1994; Foote 1994). They reveal how the morphological diversity of clades has changed through time, and allow us to make inferences about the breadth of ecological niches that extinct taxa occupied (Foote 1997). Results from disparity‐through‐time studies also provide insights into the ecological impacts of mass extinctions, competitive replacements, and the drivers of morphological evolution (Foote 1996; Brusatte et al. 2008b; Friedman 2010). Unfortunately, the way we perform these analyses may have profound effects on our conclusions.
Disparity‐through‐time analyses have two main components: calculating disparity, and creating time sub‐samples of the data. Here we focus on the latter. The nature of disparity (i.e. it is a diversity metric) means that it cannot be calculated using a single individual, so some way of sub‐sampling taxa is required. Changes in disparity‐through‐time are generally investigated by calculating the disparity of taxa present during specific time intervals or time bins (e.g. Cisneros & Ruta 2010; Prentice et al. 2011; Hughes et al. 2013; Hopkins 2013; Benton et al. 2014; Benson & Druckenmiller 2014). These time bins are usually defined based on stratigraphy (e.g. Cisneros & Ruta 2010; Prentice et al. 2011; Hughes et al. 2013; Benton et al. 2014) but can also be arbitrarily chosen time bins of equal (or approximately equal) duration (Butler et al. 2012; Hopkins 2013; Benson & Druckenmiller 2014). However, this approach has several limitations.
First, time bins defined by stratigraphy are not of equal size, biasing higher disparity towards longer stratigraphic periods. This can be dealt with using rarefaction methods, i.e. repeating the analysis while resampling the taxa to have the same number of taxa in each bin (e.g. using bootstrapping with limited resampling). This can, however, lead to large confidence intervals when there are stratigraphic periods with few species. Other studies split large time bins so they are of roughly equal size, but this is often an ad hoc procedure that can introduce more bias depending on where bins are split. Second, the time binning approaches (whether bins are equally sized or not) favour punctuated equilibrium modes of evolution. Whether the disparity represents an average across the interval (with no interpretation of if or how it varies within the time bin), or it is effectively postulated to be constant, when analysing the changes in disparity‐through‐time, this method will only allow changes in disparity to occur between intervals rather than also allowing for gradual changes within intervals (a pattern that is fairly common in the fossil record; Hunt et al. 2015). Third, when investigating changes in disparity due to events at a specific time point (e.g. a mass extinction) time bins may not have high enough resolution to resolve changes at the event; for example, if time bins are every 20 million years it may be hard to capture the effects of an event five million years into the bin. Finally, time bin analyses are often limited by the number of taxa in each bin. If there are insufficient taxa in a time bin, disparity cannot be calculated, so further analyses, e.g. correlations of disparity with hypothesized drivers of morphological evolution, are not possible.
To address these issues, we propose a ‘time‐slicing’ approach that takes advantage of the wealth of palaeontological datasets that now have associated phylogenies. Time‐slicing uses a phylogenetic tree and considers subsets of taxa at specific equidistant points in time, as opposed to considering subsets of taxa between two points in time (a similar approach was outlined by Halliday & Goswami 2016). This results in even‐sampling across time and permits us to define the underlying model of character evolution (punctuated or gradual). Time‐slicing also includes any element present in the phylogeny (branches, nodes and tips) at the time‐slice in question as part of the disparity calculation. This allows us to measure disparity at time points where there are no sampled terminal taxa, and increases the sample size at each time point, making downstream analyses of the drivers of disparity much more feasible.
Here we present our time‐slicing methods using four datasets taken from the literature. We calculate disparity‐through‐time for each dataset using a range of time binning and time‐slicing methods, and then compare these approaches with respect to the relative disparities calculated, but also investigate how the different approaches influence biological conclusions. We find that the choice of time sub‐sampling method can have profound effects on the conclusions of disparity‐through‐time analyses.
Material and Method
Overview
To test the different time sub‐sampling methods, we followed the protocol below (Fig. 1). All the code needed to reproduce these analyses (along with detailed instructions) is provided on GitHub (Guillerme & Cooper 2018a).

Example datasets
To test the different time binning/slicing methods we selected four datasets: a mammal dataset from Beck & Lee (2014), two theropod datasets from Brusatte et al. (2014a) and Bapst et al. (2016a), and a crinoid dataset from Wright (2017a). Table 1 and Guillerme & Cooper (2018b, appendix S1) provide more details. Each dataset consists of first and last occurrence dates for all taxa, a matrix of morphological characters in NEXUS format, and a time‐scaled phylogeny. These datasets are freely available with their accompanying papers (Table 1) but for reproducibility purposes we also provide the data we used on GitHub (Guillerme & Cooper 2018a).
| | Beck 2014 | Brusatte 2014 | Bapst 2016 | Wright 2017 |
|---|---|---|---|---|
| Group | Mammals | Theropods | Theropods | Crinoids |
| No. taxa | 106 | 152 | 89 | 42 |
| No. characters | 421 | 853 | 374 | 87 |
| Age range (myr)ᵃ | 171.8–0 | 168.5–66 | 207.2–66 | 485.4–372.2 |
| Mass extinction (mya) | 66 (K–Pg) | NA | NA | 443 (O–S) |
| References | Beck & Lee (2014) | Brusatte et al. (2014a) | Bapst et al. (2016a) | Wright (2017a) |
| Data reference | Beck & Lee (2014) | Brusatte et al. (2014b) | Bapst et al. (2016b) | Wright (2017b) |
- ᵃ Age ranges are root time to most recent tip taxon.
Preparing the data for disparity‐through‐time analysis
Estimating ancestral character states
For each dataset we estimated the ancestral character states at each node using the AncStatesEstMatrix function from the Claddis R package (Lloyd 2016; R Core Team 2017). This function uses the re‐rooting method (Yang et al. 1996; Garland & Ives 2000) to get Maximum Likelihood estimates of the ancestral states for each character at every node in the phylogeny (based on the rerootingMethod function in phytools; Revell 2012). Inapplicable and missing characters for any taxon were treated as ambiguous characters (i.e. any possible observed state for the character). To prevent poor ancestral state estimations from biasing our results, especially when a lot of error is associated with the estimations, we only included ancestral state estimations with a scaled Likelihood ≥ 0.95. Ancestral state estimations with scaled Likelihoods below this threshold were recoded as missing data (‘?’). This allowed our results to be less dependent on the quality (or the absence thereof) of the ancestral state estimations, especially in parts of the datasets where data were sparse. This approach is similar to Brusatte et al. (2011) but uses model‐based estimations (rather than parsimony) allowing us to control for ambiguous (i.e. poorly estimated) nodes.
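The thresholding step described above can be sketched in a few lines. This is a minimal Python illustration rather than the Claddis code used in the study (which was run in R); the function name `recode_uncertain_states` and the dictionary input format are hypothetical choices made for the example.

```python
def recode_uncertain_states(scaled_likelihoods, threshold=0.95):
    """Return the best-supported state, or '?' if its support is too low.

    scaled_likelihoods maps each candidate state (e.g. '0', '1') to its
    scaled likelihood at one node for one character (hypothetical format).
    """
    best_state = max(scaled_likelihoods, key=scaled_likelihoods.get)
    if scaled_likelihoods[best_state] >= threshold:
        return best_state
    return "?"  # recoded as missing data, as in the text


node_estimates = [
    {"0": 0.98, "1": 0.02},  # well supported: state '0' is kept
    {"0": 0.60, "1": 0.40},  # ambiguous: recoded as missing
]
recoded = [recode_uncertain_states(est) for est in node_estimates]
print(recoded)
```

Raising or lowering `threshold` trades the number of retained node estimates against their reliability.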
Building morphospaces
To explore disparity‐through‐time in our datasets, we used a morphospace approach (e.g. Foote 1994, 1996; Wesley‐Hunt 2005; Brusatte et al. 2008b; Friedman 2010; Toljagić & Butler 2013; Hughes et al. 2013). Morphospaces can be obtained from any multidimensional morphological dataset but can differ in the data used (e.g. discrete or continuous), and whether they include phylogenetic data or not. Although empirical morphospaces from discrete or continuous data have been shown to have similar properties (Foth et al. 2012; Hetherington et al. 2015), our morphospaces are based on discrete morphological data (originally collected for phylogenetic analysis; cf. geometric morphometric data) and include some phylogenetic information (see above). Mathematically, our morphospaces are n dimensional objects that summarize the distances between discrete morphological characters of the taxa present and their ancestors.
Constructing distance matrices
To estimate the morphospaces for each of our datasets we first constructed pairwise distance matrices of length n, where n is the total number of tips and nodes in the dataset. We calculated the n × n distances using the Gower distance (Gower 1971), i.e. the number of mismatched characters over the number of shared characters. This allows us to correct for distances between two taxa that share many characters and could be closer to each other than to taxa with fewer characters in common (i.e. because some pairs of taxa share more characters in common than others, they are more likely to be similar). For discrete morphological matrices, using this corrected distance is preferable to the raw Euclidean distance because of its ability to deal with discrete or/and ordinated characters as well as with missing data (Anderson & Friedman 2012). However, the Gower distance cannot calculate distances when taxa have no overlapping data. Therefore, we used the TrimMorphDistMatrix function from the Claddis R package to remove pairs of taxa with no cladistic characters in common. This led to us removing 9 taxa from the Bapst et al. (2016a) dataset, and 19 from the Brusatte et al. (2014a) dataset, but none from the other two datasets (see Guillerme & Cooper (2018b, appendix S1) for details of the species removed).
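As an illustration of the distance described above, the following Python sketch computes the proportion of mismatched characters among those coded in both taxa. It assumes unordered discrete characters and uses `?` for missing data; the function name `gower_distance` and the toy character vectors are hypothetical, and this is not the Claddis implementation.

```python
def gower_distance(taxon_a, taxon_b, missing="?"):
    """Proportion of mismatched characters among those coded in both taxa.

    Returns None when the two taxa share no coded characters, which is
    the situation that forces such taxa to be trimmed from the matrix.
    """
    shared = [(a, b) for a, b in zip(taxon_a, taxon_b)
              if a != missing and b != missing]
    if not shared:
        return None
    mismatches = sum(1 for a, b in shared if a != b)
    return mismatches / len(shared)


a = ["0", "1", "1", "?", "0"]
b = ["0", "0", "1", "1", "?"]
c = ["?", "?", "?", "0", "?"]

print(gower_distance(a, b))  # one mismatch over three shared characters
print(gower_distance(a, c))  # no shared characters, so no distance
```

The `None` result for the second pair is the case that the trimming step is designed to remove.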
Ordination
After constructing our distance matrices, we transformed them using classical multidimensional scaling (MDS; Torgerson 1965; Gower 1966; Cailliez 1983). This method (also referred to as PCO, e.g. Brusatte et al. 2015; or PCoA, e.g. Paradis et al. 2004; but distinguished in Legendre & Legendre 2012) is an eigen decomposition of the distance matrix. Because we used Gower distances instead of raw Euclidean distances, negative eigenvalues can be calculated. To avoid this problem, we first transformed the distance matrices by applying the Cailliez correction (Cailliez 1983) which adds a constant c* to the values in a distance matrix (apart from the diagonal) so that all the Gower distances become Euclidean (dGower + c* = dEuclidean; Cailliez 1983). We were then able to extract k eigenvectors for each matrix (representing the k dimensions of the morphospace) where k is equal to n − 2, i.e. the number of taxa in the matrix (n) minus the last two eigenvectors that are always null after applying the Cailliez correction. Contrary to previous studies (e.g. Brusatte et al. 2008a; Cisneros & Ruta 2010; Prentice et al. 2011; Anderson & Friedman 2012; Hughes et al. 2013; Benton et al. 2014), we use all k dimensions of our morphospaces and not a sub‐sample representing the majority of the variance in the distance matrix (e.g. selecting only x dimensions that represent up to 90% of the variance in the distance matrix; Brusatte et al. 2008b; Toljagić & Butler 2013). Note that our morphospaces represent an ordination of all possible morphologies coded in each study through time. It is unlikely that all morphologies will co‐occur at each time point, therefore, the disparity of the whole morphospace is expected to be greater than the disparity at any specific point in time.
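For reference, the ordination steps described above can be written compactly. This is the textbook formulation of classical MDS with an additive constant; the symbols $\mathbf{J}$ (centring matrix), $\mathbf{B}$ (double-centred matrix), $\mathbf{D}'^{(2)}$ (matrix of squared corrected distances) and $\mathbf{X}$ (matrix of ordination scores) are notation introduced here for illustration, and the block is a sketch rather than the exact code path of the R functions used.

```latex
\begin{aligned}
d'_{ij} &= d_{ij} + c^{*} \quad (i \neq j), \qquad d'_{ii} = 0,\\
\mathbf{B} &= -\tfrac{1}{2}\,\mathbf{J}\,\mathbf{D}'^{(2)}\,\mathbf{J},
\qquad \mathbf{J} = \mathbf{I} - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top},\\
\mathbf{B} &= \mathbf{V}\,\boldsymbol{\Lambda}\,\mathbf{V}^{\top},
\qquad \mathbf{X} = \mathbf{V}_{k}\,\boldsymbol{\Lambda}_{k}^{1/2},
\quad k = n - 2.
\end{aligned}
```

The columns of $\mathbf{X}$ are the $k$ morphospace dimensions on which disparity is later calculated.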
Disparity‐through‐time analyses
Disparity‐through‐time analyses were performed using the dispRity R package (Guillerme 2015).
Calculating disparity
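We calculated disparity as the sum of the variances of each dimension of the morphospace:

$$\text{Disparity} = \sum_{i=1}^{k} \sigma^{2}_{k_i} \tag{1}$$

where $\sigma^{2}_{k_i}$ is the variance of the $i$th dimension of the morphospace, with $i$ ranging from 1 to $k = n - 2$ and $n$ being the number of taxa in the dataset. Note that there are still statistical issues with this metric (such as the co‐variance between dimensions not being measured), but for the purposes of comparison with previous work we decided to use a standard metric for these analyses.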
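A small numerical sketch may help here. It assumes the metric is the sum of the variances of the morphospace dimensions, which is the reading suggested by the remark about co-variances above. The function name `sum_of_variances` and the toy scores are hypothetical, and the study itself used the dispRity R package rather than this code.

```python
from statistics import variance


def sum_of_variances(scores):
    """Sum the sample variance of every morphospace dimension.

    scores is a list of rows (one per taxon or node), each row holding
    that element's ordination score on every dimension.
    """
    dimensions = zip(*scores)  # transpose rows into per-dimension columns
    return sum(variance(column) for column in dimensions)


# Three elements in a toy two-dimensional morphospace.
toy_scores = [
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 4.0],
]
print(sum_of_variances(toy_scores))
```

With fewer than two elements `statistics.variance` raises an error, which mirrors the point that disparity cannot be calculated from a single individual.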
Time sub‐sampling
To estimate disparity‐through‐time we first need to split the data into time sub‐samples. Here we use three time sub‐sampling methods:
- Stratigraphic time bins. This is the traditional method, where all the taxa within each stratigraphic period are included in the disparity calculation. This often leads to bins of unequal duration. Here we use stratigraphic stages and epochs.
- Equally sized time bins. This is another commonly used method, where the time frame of interest is split into equally sized time bins, then all the taxa within each time bin are included in the disparity calculation.
- Time‐slicing. We describe this in more detail below, but in brief, time‐slicing uses a phylogeny, and rather than binning the data, it takes slices through a phylogeny and includes all the taxa and nodes in that slice within the disparity calculation.
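To make the binning step concrete, the sketch below assigns taxa to time bins in Python. It assumes that a taxon belongs to every bin that its occurrence range overlaps, which is one plausible reading of "all the taxa within each time bin"; the function name `taxa_in_bin` and the occurrence data are hypothetical.

```python
def taxa_in_bin(ranges, bin_old, bin_young):
    """Names of taxa whose occurrence range overlaps a time bin.

    ranges maps taxon name -> (first occurrence, last occurrence) in Ma,
    so first >= last. Bin limits are also in Ma with bin_old > bin_young.
    """
    return sorted(
        name for name, (first, last) in ranges.items()
        if first > bin_young and last < bin_old
    )


# Hypothetical occurrence data (Ma), not taken from any of the datasets.
ranges = {
    "taxon_A": (100.0, 90.0),
    "taxon_B": (95.0, 70.0),
    "taxon_C": (60.0, 55.0),
}
bins = [(100.0, 80.0), (80.0, 60.0), (60.0, 40.0)]
for old, young in bins:
    print((old, young), taxa_in_bin(ranges, old, young))
```

In this toy case two of the three bins hold a single taxon, so disparity could not be calculated for them.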
Time‐slicing
The ‘time‐slicing’ approach considers subsets of taxa in the morphospace at specific equidistant points in time, as opposed to considering subsets of taxa between two points in time. This results in even‐sampling of the morphospace across time and allows us to use different underlying models of character evolution (punctuated or gradual).
In practice, time‐slicing considers the disparity of any element present in the phylogeny (branches, nodes and tips) at any point in time. When the phylogenetic elements are nodes or tips, the ordination scores for the nodes (estimated using ancestral state estimations as described above) or tips are directly used for calculating disparity. When the phylogenetic elements are branches we choose the ordination score for the branch using one of two evolutionary models:
- Punctuated evolution. This model selects the ordination score from either the ancestral node or the descendant node/tip of the branch regardless of the position of the slice along the branch. Similarly to the time bin approach, this reflects a model of punctuated evolution where changes in disparity occur either at the start or at the end of a branch over a relatively short time period, and clades undergo long periods of stasis during their evolution (Gould & Eldredge 1977; Hunt 2007). We apply this model in four ways:
  - The ‘acctran’ model, always selecting the ordination score of the descendant node/tip of the branch.
  - The ‘deltran’ model, always selecting the ordination score of the ancestral node of the branch.
  - The ‘random’ model, randomly selecting the ordination score of either the ancestor or the descendant of the branch.
  - The ‘proximity’ model, selecting the ordination score of the ancestor if the slice occurs in the first half of the branch, and the descendant if the slice occurs in the second half of the branch.

  The first two models assume that changes always occur early (accelerated transition) or late (delayed transition) along the branches. The third model makes neither assumption and simply selects data from the ancestor or the descendant at random, and the fourth bases the selection of either the ancestor or the descendant on where the slice occurs along the branch. These punctuated models select the ordination score from either the ancestor or the descendant only once in the whole disparity analysis. For example, when using the ‘random’ model, if the data of the ancestor have been randomly chosen, only these data will be used during the bootstrapping (see below) and for the disparity calculation.
- Gradual evolution. Unlike the punctuated models, the following models do not select the ordination score of either the ancestor or the descendant but associate a probability to both. This reflects a model of gradual evolution where changes in disparity are gradual and cumulative along the branch.
  - The ‘equal splits’ model (probabilistic), selects the ordination score from both the ancestor and the descendant with an equal probability:

    $$p(\text{ancestor}) = p(\text{descendant}) = 0.5 \tag{2}$$

  - The ‘gradual splits’ model (probabilistic), selects the ordination score from both the ancestor and the descendant with a probability function of the distance between the nodes/tip at the ends of the branch and the slice:

    $$p(\text{ancestor}) = 1 - \frac{d(\text{ancestor}, \text{slice})}{d(\text{ancestor}, \text{descendant})} \tag{3}$$

    $$p(\text{descendant}) = 1 - \frac{d(\text{descendant}, \text{slice})}{d(\text{ancestor}, \text{descendant})} \tag{4}$$

    where $d(x, y)$ is the distance along the branch between $x$ and $y$.

In these models, the ordination scores of both the ancestor and descendant contribute to the disparity calculation. For example, using the ‘gradual splits’ model, if the slice occurs in the third quarter of a branch joining node A to node/tip B (75% of the total branch length), after bootstrapping, the disparity results will be based on 25% of the data from A and 75% of the data from B. Because of the probabilistic nature of these models, they are only meaningful when calculating disparity from bootstrapped datasets.
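The six branch models can be summarised as the probability of using the descendant's ordination score. The Python sketch below is an illustration under stated assumptions, not the dispRity implementation: the function name `descendant_probability` is hypothetical, `position` is the relative position of the slice along the branch (0 at the ancestor, 1 at the descendant), and a slice exactly at the midpoint is arbitrarily assigned to the ancestor under the ‘proximity’ model.

```python
import random


def descendant_probability(model, position, rng=random):
    """Probability of using the descendant's ordination score.

    position is the slice's relative position along the branch:
    0.0 at the ancestral node, 1.0 at the descendant node/tip.
    """
    if model == "acctran":
        return 1.0  # change assumed early: always the descendant
    if model == "deltran":
        return 0.0  # change assumed late: always the ancestor
    if model == "random":
        return float(rng.random() < 0.5)  # drawn once, then fixed
    if model == "proximity":
        return 1.0 if position > 0.5 else 0.0  # midpoint tie: assumption
    if model == "equal splits":
        return 0.5
    if model == "gradual splits":
        return position
    raise ValueError(f"unknown model: {model}")


for model in ("acctran", "deltran", "proximity",
              "equal splits", "gradual splits"):
    print(model, descendant_probability(model, position=0.75))
```

For a slice at 75% of the branch length, ‘gradual splits’ gives the descendant a probability of 0.75 and the ancestor 0.25, matching the worked example in the text.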
It is important to note that the time‐slicing method is not an ancestral states estimation method per se. This method does not estimate values along a branch applying a model (cf. methods described for ancestral character estimation in the ‘Preparing the data for disparity‐through‐time analysis’ section above) but rather chooses between the two available pieces of information (the ordination score of the descendant or the ancestor) using the methods described above. This allows the method to be used in post‐ordination analysis where the data used in each time‐slice is data already present in the morphospace. In other words, this method does not require a re‐ordination of the morphospace every time a slice goes through a branch, thus allowing the properties of the morphospace (e.g. distance between species, variance of each axis, etc.) to remain constant. For example, using the ‘equal splits’ model on an ancestor and a descendant with PCO1 values of respectively 0.04 and 0.03, after a sufficient number of bootstrap replicates (e.g. 100) the value along the branch will be close to 0.5 × 0.04 + 0.5 × 0.03 = 0.035. By estimating this value rather than generating it (i.e. creating a new element mid‐way along the branch that would be the average of the descendant and ancestor – 0.035) we obtain the same results without modifying the morphospace properties.
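The ‘equal splits’ example above can be checked with a short simulation. This Python sketch uses the two PCO1 values from the text and draws one of them per bootstrap replicate with probability 0.5; the seed and the number of replicates (more than the 100 mentioned above, to reduce sampling noise) are arbitrary choices for the illustration.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

ancestor_score, descendant_score = 0.04, 0.03
n_replicates = 10_000

# Each replicate picks one of the two existing scores with probability 0.5,
# so no new point is ever added to the morphospace.
draws = [
    ancestor_score if rng.random() < 0.5 else descendant_score
    for _ in range(n_replicates)
]
print(round(mean(draws), 4))
```

Every value in `draws` is one of the two scores already present in the morphospace, yet their average approaches the mid-branch value of 0.035.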
Comparing time sub‐sampling methods
To compare the time binning and time‐slicing approaches we applied the methods as follows (see Fig. 2):

- Stratigraphy: time sub‐samples defined by stratigraphic periods (Fig. 2):
- Time bins (unequal). We calculated disparity for the taxa in each stratigraphic period (stage or epoch). To reduce the influence of outliers on our disparity estimates, we bootstrapped each disparity measurement for each time bin by randomly resampling with replacement a new sub‐sample of taxa from the observed taxa in the bin 100 times. We then calculated the median disparity value for each time bin along with the 50% and 95% confidence intervals.
- Time‐slices (non‐equidistant). We calculated disparity using our time‐slicing approach with slices occurring at the midpoint of each stratigraphic period (stage or epoch), and using all six time‐slicing methods (acctran, deltran, random, proximity, equal splits and gradual splits). To reduce the influence of outliers on our disparity estimates, we bootstrapped each disparity measurement as described above for the stratigraphic time bins.
- Duration: time sub‐samples defined by the duration of stratigraphic periods (Fig. 2):
- Time bins (equal). We calculated disparity for the taxa in each time bin where time bin size was defined by the mean duration of the stratigraphic period (stage or epoch), and bootstrapped the disparity values as described above.
- Time‐slices (equidistant). We calculated disparity using our time‐slicing approach where the interval between slices was defined by the mean duration of the stratigraphic period (stage or epoch). We used the six time‐slicing methods and bootstrapped as described above.
- Number: time sub‐samples defined by the number of stratigraphic periods (Fig. 2):
- Time bins (equal). We calculated disparity for the taxa in each time bin where the number of time bins was defined by the number of stratigraphic periods (ages or epochs) in the time frame of interest, and bootstrapped the disparity values as described above.
- Time‐slices (equidistant). We calculated disparity using our time‐slicing approach where the number of slices was defined by the number of stratigraphic periods (ages or epochs) in the time frame of interest. We used the six time‐slicing methods and bootstrapped as described above.
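The difference between the ‘duration’ and ‘number’ schemes can be seen in a small worked example. The Python sketch below uses hypothetical stratigraphic boundaries and a hypothetical dataset time frame; the function names are illustrative, and truncating the final ‘duration’ bin at the youngest age is an assumption of this sketch.

```python
def equal_bins_by_duration(oldest, youngest, width):
    """Bin limits (Ma) of fixed width, starting at the oldest age."""
    limits = []
    old = oldest
    while old > youngest:
        young = max(old - width, youngest)  # last bin may be truncated
        limits.append((old, young))
        old = young
    return limits


def equal_bins_by_number(oldest, youngest, n_bins):
    """Bin limits (Ma) splitting the time frame into n_bins equal parts."""
    width = (oldest - youngest) / n_bins
    return [(oldest - i * width, oldest - (i + 1) * width)
            for i in range(n_bins)]


# Hypothetical stratigraphic boundaries (Ma); not a real time scale.
boundaries = [100.0, 90.0, 70.0, 65.0, 40.0]
durations = [old - young for old, young in zip(boundaries, boundaries[1:])]
mean_duration = sum(durations) / len(durations)  # (10 + 20 + 5 + 25) / 4

# Hypothetical dataset time frame: root at 95 Ma, youngest tip at 45 Ma.
print(mean_duration)
print(equal_bins_by_duration(95.0, 45.0, mean_duration))
print(equal_bins_by_number(95.0, 45.0, n_bins=len(durations)))
```

Because the dataset's time frame rarely coincides with stratigraphic boundaries, the two schemes generally place their bin limits (or slices) at different ages even when they yield the same number of sub-samples.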
We also recorded the number of taxa (or taxa and nodes for time‐slicing methods) in each sub‐sample as a proxy for taxonomic diversity.
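The bootstrapping procedure used throughout can be sketched as follows. This Python illustration assumes the sum-of-variances metric and percentile-based intervals; the interval method, the function names and the toy scores are assumptions of the sketch rather than details confirmed by the text, and the study itself used the dispRity R package.

```python
import random
from statistics import median, variance


def sum_of_variances(scores):
    """Sum of the sample variances of each morphospace dimension."""
    return sum(variance(column) for column in zip(*scores))


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 1]) of sorted values."""
    index = q * (len(sorted_values) - 1)
    lower = int(index)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = index - lower
    return (sorted_values[lower] * (1 - fraction)
            + sorted_values[upper] * fraction)


def bootstrap_disparity(scores, n_replicates=100, rng=random):
    """Median disparity with 50% and 95% percentile intervals."""
    replicates = []
    for _ in range(n_replicates):
        resample = [rng.choice(scores) for _ in scores]  # with replacement
        replicates.append(sum_of_variances(resample))
    replicates.sort()
    return {
        "median": median(replicates),
        "ci50": (percentile(replicates, 0.25), percentile(replicates, 0.75)),
        "ci95": (percentile(replicates, 0.025), percentile(replicates, 0.975)),
    }


# Toy sub-sample of five elements in a two-dimensional morphospace.
toy_scores = [[0.0, 1.0], [1.0, 1.5], [2.0, 4.0], [0.5, 2.0], [1.5, 3.0]]
result = bootstrap_disparity(toy_scores, rng=random.Random(1))
print(result)
```

For the probabilistic time-slicing models, the draw between ancestor and descendant scores would be repeated inside each replicate before the disparity calculation.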
Testing for differences in the time sub‐sampling methods
Testing for statistical differences among the time sub‐sampling methods described above is difficult, as we need to compare similar units, and also to tackle questions important to the interpretation of disparity‐through‐time analyses. We therefore present three different, simple ways of comparing the time sub‐sampling methods as follows.
Systematic differences in disparity‐through‐time
To test whether using time bins or time‐slices resulted in significantly different disparity values at common time points, we used paired Wilcoxon tests to compare the median bootstrapped disparities obtained in the stratigraphy (time sub‐samples defined by stratigraphic periods), duration (time sub‐samples defined by the duration of stratigraphic periods), and number (time sub‐samples defined by the number of stratigraphic periods) analyses described above.
Due to the uneven spread of taxa across phylogenies, some time bins will contain one or no species, meaning that we cannot estimate disparity for that time bin. Therefore, we first removed the time bins, and corresponding time‐slices, without disparity estimates. We then performed paired Wilcoxon tests with Bonferroni corrected p‐values, so that bins and slices for the same time period are being compared. Significant results suggest that there is a systematic difference in disparity values at each time point, depending on whether bins or slices are used.
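The two bookkeeping steps around the test (dropping time points without estimates, then correcting the p-values) can be sketched as below. The function names and toy values are hypothetical; the paired Wilcoxon test itself is left to a statistics library, for example `wilcox.test(..., paired = TRUE)` in R or `scipy.stats.wilcoxon` in Python.

```python
def paired_estimates(bin_values, slice_values):
    """Keep only time points where both methods yielded a disparity value."""
    return [
        (b, s) for b, s in zip(bin_values, slice_values)
        if b is not None and s is not None
    ]


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical median disparities; None marks a bin with too few taxa.
bin_disparity = [None, 1.2, 1.5, None, 2.0]
slice_disparity = [1.0, 1.4, 1.9, 2.1, 2.2]
pairs = paired_estimates(bin_disparity, slice_disparity)
print(pairs)

# Hypothetical raw p-values from three separate comparisons.
corrected = bonferroni([0.0002, 0.02, 0.4])
print(corrected)
```

Only the three time points with estimates from both methods enter the paired comparison.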
Disparity peaks
We are perhaps more interested in how the conclusions of disparity‐through‐time analyses are influenced by the choice of time sub‐sampling method, rather than the disparities estimated by each method per se, especially as these will be influenced by the number of taxa (and/or nodes) included in each sub‐sample. Therefore, we also investigated where peaks of disparity occurred in each of our datasets for the different time sub‐sampling methods. We calculated the maximum bootstrapped disparities for each dataset and for each time sub‐sampling method, along with their associated confidence intervals. Significant shifts in disparity peaks suggest that the choice of time sub‐sampling method will influence our conclusions about relative changes in the disparity of our groups through time.
Effects of mass extinction events
Many analyses of disparity‐through‐time aim to demonstrate differences in disparity before and after mass extinction events. Two of our four datasets contain taxa before and immediately after a mass extinction (Cretaceous–Palaeogene 66 Ma; Beck & Lee 2014; Ordovician–Silurian 455–430 Ma; Wright 2017a), so we used Wilcoxon tests with Bonferroni corrected p‐values to compare disparity in the time bin/slice prior to the appropriate mass extinction, to that of the time bin/slice following the extinction event. Significant results suggest an effect of the mass extinction on disparity in the group. We then compare these results across the time sub‐sampling methods to determine if our conclusions change depending on the method used. We repeated these analyses using the two time bins/slices after the one immediately following the mass extinction event to account for any lag effects of the mass extinction on disparity.
Results
Disparity‐through‐time analyses
Disparity changes through time for each of our four datasets (Fig. 3; Guillerme & Cooper 2018b, appendix S2: figs A1–A2). Relative disparities tend to be lower with time binning methods, probably because these contain fewer taxa than time‐slicing methods. The six different time‐slicing methods (acctran, deltran, random, proximity, equal splits and gradual splits) show similar patterns, so we focus only on the results for one method with a punctuated model of evolution (specifically the ‘proximity’ method), and one method with a gradual model of evolution (specifically the ‘gradual splits’ method). Results for all six methods can be found in Guillerme & Cooper (2018b, appendix S2: figs A1–A2).

Testing for differences in the time sub‐sampling methods
Systematic differences in disparity‐through‐time
There is no overall significant systematic difference among the disparities calculated using time bins and those calculated using the time‐slicing methods (Table 2; Guillerme & Cooper 2018b, appendix S2: table A1). Instead, the differences depend on the dataset and method in question. For example, the Brusatte et al. (2014a, b), Bapst et al. (2016a, b) and Wright (2017a, b) datasets show significant differences when using bins versus time‐slices defined by stratigraphy, but the Beck & Lee (2014) dataset appears robust to these different approaches. Likewise, the Beck & Lee (2014), Brusatte et al. (2014a) and Bapst et al. (2016a) datasets have different disparities when the number of bins or slices is the mean number of stratigraphic periods, but this is not seen in the Wright (2017a) dataset. Note that for epochs, we find fewer significant differences simply because the smaller numbers of bins and slices being compared means we have low power to detect a significant difference.
| Dataset | Period | Methodaa
Time‐slices used either a punctuated (‘proximity’ method) or gradual (‘gradual splits’ method) model of evolution.
|
Stratigraphybb
Stratigraphy uses unequal time bins or non‐equidistant time‐slices, where the width of the bin, or the interval between slices, is equivalent to stratigraphic stages or epochs.
|
Durationcc
Duration uses equal time bins or equidistant time‐slices, where the width of the bin, or the interval between slices, is the average duration of stratigraphic stages or epochs in the time frame of the dataset.
|
Numberdd
Number uses equal time bins or equidistant time‐slices, where the number of bins, or the number of slices, is the average number of stratigraphic stages or epochs in the time frame of the dataset.
|
|---|---|---|---|---|---|
| Beck 2014 | Stage | Gradual splits | 111 | 115ee
p < 0.001 (p‐values were Bonferroni corrected). Results for other time‐slicing methods are in Guillerme & Cooper (2018b, appendix S2: table A1).
|
65ee
p < 0.001 (p‐values were Bonferroni corrected). Results for other time‐slicing methods are in Guillerme & Cooper (2018b, appendix S2: table A1).
|
| Beck 2014 | Stage | Proximity | 105 | 83 | 68ee
p < 0.001 (p‐values were Bonferroni corrected). Results for other time‐slicing methods are in Guillerme & Cooper (2018b, appendix S2: table A1).
|
| Beck 2014 | Epoch | Gradual splits | 21 | 39 | 43ee
p < 0.001 (p‐values were Bonferroni corrected). Results for other time‐slicing methods are in Guillerme & Cooper (2018b, appendix S2: table A1).
|
| Beck 2014 | Epoch | Proximity | 21 | 36 | 32 |
| Brusatte 2014 | Stage | Gradual splits | 28ee
p < 0.001 (p‐values were Bonferroni corrected). Results for other time‐slicing methods are in Guillerme & Cooper (2018b, appendix S2: table A1).
|
61ee
p < 0.001 (p‐values were Bonferroni corrected). Results for other time‐slicing methods are in Guillerme & Cooper (2018b, appendix S2: table A1).
|
52ee
p < 0.001 (p‐values were Bonferroni corrected). Results for other time‐slicing methods are in Guillerme & Cooper (2018b, appendix S2: table A1).
|
| Brusatte 2014 | Stage | Proximity | 27e | 31 | 28e |
| Brusatte 2014 | Epoch | Gradual splits | 3 | 6 | 6 |
| Brusatte 2014 | Epoch | Proximity | 0 | 5e | 5 |
| Bapst 2016 | Stage | Gradual splits | 93 | 153 | 165 |
| Bapst 2016 | Stage | Proximity | 57e | 47 | 75e |
| Bapst 2016 | Epoch | Gradual splits | 4 | 6 | 12 |
| Bapst 2016 | Epoch | Proximity | 2 | 0e | 8 |
| Wright 2017 | Stage | Gradual splits | 152e | 155 | 116 |
| Wright 2017 | Stage | Proximity | 160e | 175e | 101 |
| Wright 2017 | Epoch | Gradual splits | 28 | 29 | 21 |
| Wright 2017 | Epoch | Proximity | 23 | 28 | 18 |
- a Time‐slices used either a punctuated (‘proximity’ method) or gradual (‘gradual splits’ method) model of evolution.
- b Stratigraphy uses unequal time bins or non‐equidistant time‐slices, where the width of the bin, or the interval between slices, is equivalent to stratigraphic stages or epochs.
- c Duration uses equal time bins or equidistant time‐slices, where the width of the bin, or the interval between slices, is the average duration of stratigraphic stages or epochs in the time frame of the dataset.
- d Number uses equal time bins or equidistant time‐slices, where the number of bins, or the number of slices, is the average number of stratigraphic stages or epochs in the time frame of the dataset.
- e p < 0.001 (p‐values were Bonferroni corrected). Results for other time‐slicing methods are in Guillerme & Cooper (2018b, appendix S2: table A1).
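Footnote e refers to Bonferroni-corrected p-values. As a reminder of what that correction does, the following is a minimal Python sketch: each raw p-value is multiplied by the number of tests and capped at 1. The function name `bonferroni` and the raw p-values are hypothetical and chosen only for illustration; they are not taken from the study, and the sketch does not reproduce the tests used there.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values.

    Each p-value is multiplied by the number of tests and capped at 1.
    """
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw p-values, for illustration only (not from the study).
raw = [0.0001, 0.004, 0.03, 0.5]
adjusted = bonferroni(raw)

# Flag the comparisons that stay below the p < 0.001 threshold of footnote e.
significant = [p < 0.001 for p in adjusted]

for p_raw, p_adj, flag in zip(raw, adjusted, significant):
    print(f"raw = {p_raw:.4f}  adjusted = {p_adj:.4f}  p < 0.001: {flag}")
```

With four tests, only the smallest hypothetical p-value remains below 0.001 after correction, which shows why a corrected threshold is more conservative than an uncorrected one.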
Disparity peaks
In the Beck & Lee (2014) and Bapst et al. (2016a) datasets, disparity peaks occur at much older ages when time‐slicing rather than time binning approaches are used (Fig. 4; Guillerme & Cooper 2018b, appendix S2: figs A3–A4). This is also true for stratigraphic time bins in the Wright (2017a) dataset, although when using equal time bins the peaks are later than, or very similar to, those from the time‐slicing methods (Fig. 4; Guillerme & Cooper 2018b, appendix S2: figs A3–A4). Across the three time binning methods, the Brusatte et al. (2014a) dataset has similar disparity peaks whichever method is used, the Wright (2017a) dataset shows variation in peaks only when using unequal time bins (stratigraphy), whereas in the Bapst et al. (2016a) and Beck & Lee (2014) datasets, stratigraphic (unequal) versus equally sized time bins make a large difference to where the disparity peak occurs (Fig. 4; Guillerme & Cooper 2018b, appendix S2: figs A3–A4). Additionally, there seem to be only small discrepancies within the time‐slicing methods (gradual splits vs proximity), except in the Beck & Lee (2014) dataset, where the gradual splits model recovered disparity peaks at younger ages than the proximity model (Fig. 4; Guillerme & Cooper 2018b, appendix S2: figs A3–A4).

Effects of mass extinction events
Mass extinction events influence disparity in both the Beck & Lee (2014) and Wright (2017a) datasets (Fig. 5). However, whether this change in disparity is significant or not depends on the method used to create time sub‐samples (Fig. 5), and whether stages or epochs are used. In general, for the Beck & Lee (2014) dataset, time binning tended to give more significant results than time‐slicing methods, but this was not the case for the Wright (2017a) dataset.

Discussion
Disparity‐through‐time analyses are influenced by the choice of time sub‐sampling method used to divide the taxa. While differences in the relative disparities calculated among time sub‐sampling methods may not be of much biological importance, these changes can have important implications for the conclusions of downstream analyses. For example, using stratigraphic epochs as our reference time period, there are 21 potential methods for time sub‐sampling our data (splitting by stratigraphy, number and duration, see methods, and using time bins or one of six time‐slicing methods). Of these 21 methods, in 16 (76%) we show that placental mammals (Beck & Lee 2014) significantly increased in disparity in the time bin/slice immediately after the K–Pg mass extinction event, and in 20 (95%) we show that crinoids (Wright 2017a) significantly decreased in disparity in the time bin/slice immediately after the O–S mass extinction event. Given the high congruence (76% and 95%) of these results, one could argue that time sub‐sampling methods are not important. However, if we had chosen to investigate crinoid disparity only using time bins and splitting these so the number of time bins was equal to the number of epochs (number), we would have concluded that the O–S extinction had no effect on crinoid disparity. Likewise, the timing of peak disparity differs among methods. This is particularly evident when comparing stratigraphic time bins to time‐slicing methods, where for most of our datasets stratigraphic time bins give a much later time to peak disparity. This could have major implications for our understanding of how morphological diversity changes through time, for example in response to climate. These results highlight the sensitivity of disparity‐through‐time analyses to the choice of time sub‐sampling method. Fortunately, this issue is easy to solve: either disparity‐through‐time analyses should use, and report results from, multiple time sub‐sampling methods (as demonstrated here), or great care should be taken in determining the appropriate time sub‐samples to answer the question of interest.
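The count of 21 sub-sampling methods and the two congruence percentages quoted above can be checked with a short enumeration. The Python sketch below crosses the three ways of splitting time (stratigraphy, duration, number) with time bins plus six time-slicing methods. The time-slicing labels are placeholders, because not all six methods are named in this section; the counts 16 and 20 are those reported in the text.

```python
from itertools import product

# Three ways of choosing where the bins or slices fall (see table footnotes b-d).
schemes = ["stratigraphy", "duration", "number"]

# Time bins plus six time-slicing methods. The slicing labels are placeholders.
samplers = ["time bins"] + [f"time-slicing method {i}" for i in range(1, 7)]

# Every combination of a scheme and a sampler is one time sub-sampling method.
combinations = list(product(schemes, samplers))
n_methods = len(combinations)


def percent(count, total):
    """Return count as a percentage of total, rounded to the nearest integer."""
    return round(100 * count / total)


# Counts reported in the text for the two mass extinction examples.
mammal_support = percent(16, n_methods)   # placental mammals, K-Pg
crinoid_support = percent(20, n_methods)  # crinoids, O-S

print(n_methods, mammal_support, crinoid_support)
```

Listing the combinations explicitly also gives a convenient template for looping over every sub-sampling method when reporting results from several of them.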
Time‐slicing has several advantages over time binning approaches (using either equally or unequally sized bins). First, it allows us to use as much as possible of the available information, in the form of phylogenetic relationships and ancestral taxa. This increases our ability to investigate key biological questions, such as how various drivers influence morphological diversity through time, and how mass extinctions influence disparity (Foote 1996; Brusatte et al. 2008b; Friedman 2010), both by increasing the statistical power of analyses and through the availability of data at key time points in the history of our groups. Second, we are able to be more explicit about the mode of evolution in our clades; in time‐slicing we can apply punctuated or gradual models of trait change rather than making an assumption of punctuated evolution. This may be important, as gradual change is a common pattern of trait evolution in the fossil record (Hunt 2007).
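To make the distinction between the two kinds of model concrete, the following Python sketch shows one way a slice that cuts through a branch could be assigned trait data from either end of that branch. It assumes that the ‘proximity’ method takes whichever of the ancestor or descendant is nearer in time to the slice, and that the ‘gradual splits’ method picks the descendant with a probability equal to the fraction of the branch already elapsed at the slice. These rules, the function names, the tie-breaking choice and the example ages are illustrative assumptions, simplified to a single random draw; the published R implementation may differ in detail.

```python
import random


def proximity(ancestor_age, descendant_age, slice_age):
    """Punctuated-style choice: use whichever end of the branch is nearer in time.

    Ages are in millions of years before present, so
    ancestor_age >= slice_age >= descendant_age.
    Ties go to the ancestor (an arbitrary choice made for this sketch).
    """
    to_ancestor = ancestor_age - slice_age
    to_descendant = slice_age - descendant_age
    return "ancestor" if to_ancestor <= to_descendant else "descendant"


def gradual_split(ancestor_age, descendant_age, slice_age, rng=random):
    """Gradual-style choice: pick the descendant with a probability equal to
    the fraction of the branch that has elapsed at the slice."""
    branch_length = ancestor_age - descendant_age
    p_descendant = (ancestor_age - slice_age) / branch_length
    return "descendant" if rng.random() < p_descendant else "ancestor"


# Hypothetical branch from 100 Ma (ancestor) to 60 Ma (descendant), sliced at 90 Ma.
print(proximity(100, 60, 90))  # the ancestor is 10 Myr away, the descendant 30 Myr

# Under the gradual rule the descendant is drawn about a quarter of the time here.
rng = random.Random(1)
draws = [gradual_split(100, 60, 90, rng) for _ in range(10000)]
print(draws.count("descendant") / len(draws))
```

Under the proximity rule the sampled data switch abruptly halfway along the branch, whereas under the gradual rule the chance of sampling the descendant rises steadily along it, which is the sense in which the two rules imply different models of evolution.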
Of course, the method also has limitations. The main one is practical: it requires a time‐calibrated phylogeny, and these are not available for all palaeontological datasets. Furthermore, like most phylogeny‐based methods, time‐slicing depends on ancestral state estimations. Care should be taken in interpreting these, as they are highly dependent on the data and models used for the estimations (Ekman et al. 2008; Slater et al. 2012). The difference between the time‐binning and time‐slicing results could also simply be due to the nature of the fossil record. Rates of sedimentation vary in time and space, influencing the groups found within the rock record and their temporal distribution. In this case, different beds could represent different ‘packages’ of fauna through time separated by gaps, resulting in natural ‘bins’ rather than slices of the data. Slicing through such strata will yield similar results no matter where in time the slice occurs. It is important to note, however, that the time‐slicing method also includes ancestral estimations (either through the nodes or the branches) that are by definition not available in the fossil record and thus are not influenced by its nature. Additionally, this effect is likely to be most obvious in groups where the fossil record is ‘patchy’ (e.g. vertebrates) but less problematic for groups with a more continuous record, such as Foraminifera. Finally, Hunt et al. (2015) found that time series are best characterized by gradual directional changes (biased random walks). In fact, homogeneous directional changes are more likely to be supported than heterogeneous ones (e.g. punctuated changes) in longer duration series with few samples in each series. In our implementation of time‐slicing, the models are not selected based on any model fit criterion (e.g. AIC) but merely on researcher assumptions. We thus suggest that both types of models (punctuated and gradual) are tested during analysis, unless there is strong independent support for one or the other.
Conclusions
The choice of time sub‐sampling method can alter the conclusions we obtain from disparity‐through‐time studies. Time‐slicing methods, with explicit models of evolution, provide an alternative to traditional time binning approaches. Note that while we introduce the time‐slicing methods here, and describe their advantages, we do not suggest that time‐slicing is necessarily the ‘best’ method for time sub‐sampling in all cases. As with all methods, the choice of methodology should be appropriate for the question and data at hand. However, we do strongly recommend performing disparity‐through‐time analyses using a series of appropriate time sub‐sampling methods, and comparing these, to ensure that results are not merely a consequence of the time sub‐sampling method employed.
Acknowledgements
NC thanks Mark Sutton and Philip Mannion for the invitation to contribute to the ‘Evolutionary Modelling’ symposium at The Palaeontological Association Annual Meeting 2017. TG acknowledges support from the Australian Discovery Project Grant number DP170103227 awarded to Vera Weisbecker. We thank Dave Bapst, Graeme Lloyd, April Wright and David Wright for assistance in gathering data for the analyses and/or discussions about the approach; and Steve Brusatte, Sally Thomas and one anonymous reviewer for helpful comments on the manuscript.
References