Amniotes through major biological crises: faunal turnover among Parareptiles and the end-Permian mass extinction

Abstract: The Parareptilia are a small but ecologically and morphologically diverse clade of Permian and Triassic crown amniotes generally considered to be phylogenetically more proximal to eureptiles (diapsids and their kin) than to synapsids (mammals and their kin). A recent supertree provides impetus for an analysis of parareptile diversity through time and for examining the influence of the end-Permian mass extinction on the clade’s origination and extinction rates. Phylogeny-corrected measures of diversity have a significant impact on both rates and the distribution of origination and extinction intensities. Time calibration generally results in a closer correspondence between origination and extinction rate values than in the case of no time correction. Near the end-Permian event, extinction levels are not significantly higher than origination levels, particularly when time calibration is introduced. Finally, regardless of time calibration and/or phylogenetic correction, the distribution of rates does not differ significantly from unimodal. The curves of rate values are discussed in the light of the numbers and distributions of both range extensions and ghost lineages. The disjoint time distributions of major parareptile clades (e.g. procolophonoids and nycteroleterids-pareiasaurs) are mostly responsible for the occurrence of long range extensions throughout the Permian. Available data are not consistent with a model of sudden decline at the end-Permian but rather suggest a rapid alternation of originations and extinctions in a number of parareptile groups, both before and after the Permian/Triassic boundary.

Mass extinctions provide the most dramatic example of large-scale biological crises on record. The modalities of decline, disappearance and/or recovery of different clades inform our understanding of the tempo and mode of ecosystem collapse and recovery and allow us to explore the interplay of biotic and abiotic factors in shaping clade diversification. Therefore, the study of mass extinctions is of great interest to palaeontologists, evolutionary biologists, and macroecologists (e.g. Benton 2003; Erwin 2006; Purvis 2008). However, the specific responses of individual groups to extinctions remain inadequately documented, particularly in the terrestrial domain. It is clear that some clades of plants and animals were minimally affected by these events, whilst others were hit to varying degrees. One of the challenges in the analysis of large-scale extinctions is to integrate patterns that emerge for specific groups to understand how they contribute to broader local and global trends.

Numerous papers have used tetrapods for case studies of the impact of the end-Permian mass extinction on land (in particular, see papers by Pitrat 1973; Benton 1987; Olson 1989; Milner 1990; Maxwell 1992; Modesto et al. 2001, 2003; Benton et al. 2004; Ward et al. 2005; Botha and Smith 2006; Retallack et al. 2006; Roopnarine et al. 2007; Ruta et al. 2007; Angielczyk and Walsh 2008; Fröbisch 2008; Ruta and Benton 2008; Sahney and Benton 2008; Kalmar and Currie 2010; Sahney et al. 2010). The major conclusions of these papers differ depending upon the taxonomic hierarchy used (e.g. species; genera; families; orders), the broader scope of the investigation (e.g. ecological remodelling; postextinction recovery; net speciation rates; faunal turnover; changes in morphofunctional complexes; extinction selectivity) or the geographic areas under scrutiny (e.g. global vs. continental/subcontinental vs. basin scale). For example, in their studies of diversification patterns and faunal turnover in temnospondyl amphibians (the most speciose group of early tetrapods), Ruta et al. (2007) and Ruta and Benton (2008) showed that rates of cladogenesis increased considerably immediately before the end-Permian event as well as in the lower part of the Early Triassic. They also showed that family-level rates of extinction near the end-Permian decreased substantially when phylogeny was introduced as a correction factor for calculations of diversity estimates (i.e. estimates that are inferred from plotting a tree on a time scale). Finally, they found a lower percentage of new family originations near the Palaeozoic–Mesozoic boundary when phylogenetically corrected diversity counts were taken into account. 
As an additional example, Angielczyk and Walsh (2008) found little support for the hypothesis that decreasing oxygen levels across the Permian–Triassic transition accounted for differences in the size and proportions of the nares and secondary palate in anomodont therapsids (especially after accounting for body size and phylogeny). Finally, Sahney et al. (2010) observed that in the aftermath of major extinctions – including the end-Permian event – tetrapods invaded available ecological space at a faster rate than their predecessors, using their own key evolutionary adaptations.

This paper is the first in a series discussing macroevolutionary patterns in a diverse group of extinct amniotes called the Parareptilia (see review in Tsuji and Müller 2009). Together with eureptiles (the clade that includes diapsids) and synapsids, parareptiles are one of the three major groups within crown amniotes. They encompass small to very large tetrapods that are almost universally placed in phylogenetic proximity to eureptiles (e.g. Tsuji and Müller 2009). They have been recorded from the middle Sakmarian to the late Rhaetian, with occurrences in Antarctica, Canada, Madagascar, North and South Africa, North and South America, North China, Russia, UK and western continental Europe. Parareptiles are of great zoological interest because some constituent clades in this group – particularly procolophonids and pareiasaurs – have at times been considered to be phylogenetically close to the ancestry of turtles (see Rieppel 2007 for a review and Werneburg and Sánchez-Villagra 2009 for a novel perspective on this issue).

Our goal in the present work is threefold: (1) using a recently published species-level supertree of parareptiles (Tsuji and Müller 2009), we explore the influence of phylogeny on palaeodiversity estimates throughout the recorded history of this group; (2) we assess the performance of several metrics for origination and extinction rates (e.g. Foote 2000; Foote and Miller 2007; Fröbisch 2008; Ruta and Benton 2008); (3) we quantify the (dis)continuities in the magnitude of origination and extinction rates throughout parareptile history. Note that we use the terms ‘continuity’ and ‘discontinuity’ in a statistical sense to indicate unimodal and multimodal distributions of extinction and origination rate values, respectively (see Wang 2003).

Our first objective – although conceptually simple – is also the most challenging. Analyses of palaeodiversity have long been part of palaeobiological investigations, and the availability of detailed databases has promoted a resurgent interest in the topic. At the same time, we are witnessing an ever-growing perception of the limitations of such data, and continuous efforts are now being made to evaluate their potential (for recent reviews related to the continental fossil record, see also Benton and Simms 1995; Sahney and Benton 2008; Kalmar and Currie 2010; Sahney et al. 2010).

With our first objective, we seek to establish whether values of parareptile diversity extrapolated from plotting parareptile phylogeny on a time scale (see also Smith 1994 for the methodological protocols) are significantly higher than observed diversity values (i.e. those based exclusively upon the known fossil record, without using phylogeny as a correction factor) at any point in the group’s history. Ultimately, we want to assess the extent to which corrected diversity counts compensate for vagaries of fossil preservation. We anticipate that corrected counts can be used effectively in conjunction with nonphylogenetic proxies for the quality of the fossil record to provide a more complete picture of palaeodiversity patterns than known occurrences alone.

Our second objective addresses evolutionary turnover among parareptiles through an analysis of extinction and origination rates. As rate metrics differ in the type and number of parameters used in their calculation, we employed a variety of methods (for detailed discussions, see Alroy 2000; Foote 2000; Alroy et al. 2001; Wang 2003; Fröbisch 2008; Ruta and Benton 2008).

Originations and extinctions of variable intensity may precede or follow biotic crises and episodes of intense speciation. The relationships between small- and large-scale extinctions are well documented for marine biota (e.g. Wang 2003, and references therein) but have not been addressed in detail in the terrestrial realm. Also, emphasis on mass extinctions has partially overshadowed efforts to understand the patterns of origination and their correlation with extinctions (but see Benton et al. 2004; Fröbisch 2008; Ruta and Benton 2008, and references therein). Therefore, our major goal is to quantify models of rise and decline in parareptiles.

For our third objective, we apply the analytical and statistical protocols outlined by Wang (2003) to quantify the statistical continuity or discontinuity between large-scale and background extinction episodes and diversification events. Such continuity may imply similarity of causes, effects and intensities. However, a major difficulty for the continuity of cause is the establishment of a direct and unambiguous association between a specific factor (or a set of synergistic causes; e.g. temperature changes; increased concentration of poisonous gases in the atmosphere) and successive episodes of taxon decline and/or disappearance. Similarly, for continuity of effect, one would need to identify some similarities in the implications of separate extinction episodes (e.g. extinctions consistently affect only those organisms that belong to a specific trophic level or that fall within a specific range of body sizes).

Therefore, in this paper, we focus exclusively on the continuity of magnitude of extinction and origination rates and on the distributions of such rates. The statistical treatment of rate magnitude is feasible and has already been explored in the marine fossil record (Wang 2003). The analytical framework is conceptually much simpler than in the case of analyses of cause and effect, which we will deal with in another paper (but see Benton et al. 2004 and Sahney et al. 2010 for some methodological approaches).

The key purpose of the continuity analyses is to assess whether at some point during their evolutionary history, parareptiles experienced unusually high levels of origination or extinction that ‘stood out’ relative to other, less conspicuous levels. In statistical terms, this translates into a statistically significant separation, or discontinuity (multimodality), between different values of origination or extinction rates. In other words, we want to understand whether the distribution of origination and extinction rates through time is unimodal or multimodal. Unimodality implies that the values of origination or extinction rates merge smoothly, and no set of values is significantly different from the other values (i.e. there is only one maximum in the probability density function associated with the values). Multimodality implies that the distribution of rate values shows two or more maxima (i.e. two or more modes in the profile of the probability density function) that are statistically different. The importance of this approach lies in the fact that it allows us to quantify episodes of biotic crises in a robust statistical framework that takes into account the whole history of a group.
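As a purely illustrative sketch of what 'one maximum' versus 'two or more maxima' in a probability density function means in practice, the Python fragment below counts the local maxima of a Gaussian kernel density estimate computed from a list of rate values. It is not the testing protocol of Wang (2003) and it does not assess statistical significance; the function names, the fixed bandwidth and the rate values are all hypothetical and chosen only for illustration.

```python
import math


def gaussian_kde(values, bandwidth, grid):
    """Evaluate a Gaussian kernel density estimate of `values` on `grid`."""
    norm = len(values) * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm
        for x in grid
    ]


def count_modes(values, bandwidth, n_grid=512):
    """Count local maxima of the density estimate on an evenly spaced grid.

    The bandwidth is an assumption supplied by the user; a wider bandwidth
    smooths the density and can only merge modes, never sharpen them.
    """
    lo = min(values) - 3.0 * bandwidth
    hi = max(values) + 3.0 * bandwidth
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    dens = gaussian_kde(values, bandwidth, grid)
    # A grid point is a mode if it is strictly higher than both neighbours.
    return sum(
        1 for i in range(1, n_grid - 1) if dens[i - 1] < dens[i] > dens[i + 1]
    )


# Hypothetical rate values: one tight cluster, then the same cluster plus a
# second, well separated group of unusually high rates.
background_rates = [0.10, 0.11, 0.12, 0.13, 0.15]
with_outlying_rates = background_rates + [0.90, 0.92, 0.95]

modes_background = count_modes(background_rates, bandwidth=0.05)
modes_with_outliers = count_modes(with_outlying_rates, bandwidth=0.05)
```

In this toy example, the first set of values gives a single maximum, whereas the second gives two. Whether such a separation is statistically significant is a separate question that requires the formal tests referred to in the text.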

Diversity analyses: basic assumptions

Traditionally, analyses of diversity (= taxonomic richness) have relied upon a simple count of observed taxa through time. The utility of the taxic approach is debated. When taxonomic sampling is dense and uniform, and groups are very speciose, taxon counts may approximate palaeodiversity estimates. The taxic approach is inevitable when phylogenies are not available for certain groups or include only a limited sample of taxa (comments in Benson et al. 2010). For some groups, including parareptiles, observed taxonomic diversity may be inextricably linked to sampling efforts in a few exceptionally productive areas (e.g. the South African Karoo and the Russian cis-Urals). Also, widespread occurrences of singletons (i.e. taxa present only in one stage; see also below) may indicate that taxon sampling is patchy (Foote and Miller 2007).

Phylogeny-corrected estimates of diversity (discussions in Smith 1994), such as are obtained by plotting a cladogram on a time scale, offer a suitable alternative to taxic estimates. It is important to clarify that a phylogeny only enables us to generate minimally corrected taxon counts by showing the placement and lengths of ghost lineages and range extensions. The extent and distribution of ghost lineages and range extensions give an indication of where systematic and/or sampling efforts might improve our knowledge of a group (e.g. Norell 1992; Smith 1994). Importantly, they can be used to refine estimates of extinct biodiversity by adding to the observed temporal ranges of taxa.

Our definitions of ghost lineage and range extension follow Smith (1994). Although the two concepts are related, they are often inaccurately used, conflated, or even synonymised in the palaeontological literature. We prefer to maintain their distinction, in agreement with previous works (e.g. Ruta and Benton 2008) and with the descriptions in Smith (1994). A ghost lineage is an internal cladogram branch superimposed on a time scale, i.e. ‘… an entire branch of an evolutionary tree for which there is no fossil record, but which needs to be hypothesized after combining cladistic and biostratigraphic data’ (Smith 1994, p. 139, also fig. 6.5). Given two terminal sister taxa with different earliest documented occurrences, the range extension is the temporal range added to the first documented appearance of the younger terminal taxon, drawing it back to the first documented appearance of the older taxon. In Smith’s (1994, p. 138 and also fig. 6.5) account, ‘… [an] observed stratigraphic range can be extended at either end’. However, as explained below, if we regard all taxa as being monophyletic, then we can simplify the construction of range extensions by extending back in time the range of a younger taxon so that it meets the earliest observed record of an older sister taxon or sister group. We point out that ghost lineages and range extensions are contingent on the current knowledge of the fossil record. The true extension of the temporal range of any taxon rests exclusively on additional fossil discoveries.
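The construction of a range extension for two sister taxa can be sketched in a few lines of Python. This is a minimal illustration under the assumptions stated in the text (all taxa monophyletic, extensions drawn backward only); the function name and the ages, given in millions of years before present (Ma), are hypothetical.

```python
def range_extension(first_appearance_a, first_appearance_b):
    """Return the range extension (in myr) implied by two sister taxa.

    Ages are in Ma, so a larger number means an older first appearance.
    The younger taxon is extended back to the earliest observed record of
    the older one; the older taxon receives no extension.
    """
    older = max(first_appearance_a, first_appearance_b)
    younger = min(first_appearance_a, first_appearance_b)
    return older - younger


# Hypothetical sister taxa first recorded at 270.0 Ma and 265.5 Ma.
extension = range_extension(270.0, 265.5)
```

With these illustrative ages, the younger taxon receives a range extension of 4.5 myr. Two sister taxa with identical first appearances receive no extension at all.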

Inclusion of ghost lineages and range extensions may remarkably alter palaeodiversity counts, but only a handful of studies have explored their effects in detail (e.g. Smith 1994; Fara 2004; Ruta and Benton 2008; Barrett et al. 2009; Butler et al. 2009; Benson et al. 2010; Young et al. 2010; Mannion et al. 2011). Here lies an interesting implication for the study of palaeodiversity: establishing the extent to which correction of diversity estimates through phylogeny can be used to compensate for gaps in the palaeontological record. We emphasize that we do not use phylogenies – no matter how detailed and well supported – as predictors of true taxonomic richness at any one point in time (true richness is, and will always remain, unknown). Instead, we seek to determine the extent to which phylogenetic corrections approximate diversity values extrapolated from independent and nonphylogenetic proxies. For example, suppose that estimated diversity values were obtained from a regression of observed diversity against a given nonphylogenetic predictor (e.g. number of fossil-bearing formations or outcrop areas) over a certain time period (e.g. Smith and McGowan 2007). The phylogenetically corrected values may be significantly lower or significantly higher than the estimated values from such a regression. If they are lower than the regressed estimates, then, under the assumption of uniform sampling, we conclude that the group is undersampled during the time periods in which discrepancies between values occur. If the phylogenetically corrected values are higher than the regressed estimates, then we conclude (under the same assumption of uniform sampling) that the group is better sampled than theoretical predictions suggest.

Material and methods

All data used in the calculations are available in the form of an EXCEL spreadsheet, downloadable as an online supplementary file (Data S1–S5).

Parareptile phylogeny

Recent fossil discoveries, novel insights into the evolution of various morphofunctional complexes and assembly of increasingly detailed taxon/character matrices for phylogenetic reconstruction have improved significantly our knowledge of the interrelationships of parareptiles and their affinities to other clades of amniotes (for synopses, see Modesto et al. 2001, 2003, 2009; Jalil and Janvier 2005; Modesto 2006; Tsuji 2006; Müller and Tsuji 2007; Reisz et al. 2007; Cisneros 2008a, b; Müller et al. 2008; Säilä 2008, 2009; Sues and Reisz 2008; Tsuji and Müller 2008, 2009). Certain portions of parareptile phylogeny remain unresolved, pending a re-assessment of various key species and the construction of enlarged, refined and integrated data matrices. Despite these uncertainties, progress in parareptile phylogeny was distilled in the form of a supertree (Tsuji and Müller 2009; for similar methodological approaches, see Ruta et al. 2007, Ruta and Benton 2008, and Kammerer and Angielczyk 2009). This supertree is the starting point for our analyses of macroevolutionary patterns in parareptiles because it includes the vast majority of described species (Text-fig. 1).

TEXT-FIG. 1. Supertree of parareptiles superimposed on a stratigraphic time scale of the Permian (partim) and Triassic (modified after Tsuji and Müller 2009). The same tree is also reported in the Data S1 of the online supplementary data, where the observed ranges, range extensions and ghost lineages are shown using colour-coded symbols. Here, the tip of a terminal branch represents the last recorded datum for any taxon. Thus, the length of a terminal branch consists of the length of the taxon’s observed range plus the length of its range extension. SAK, Sakmarian; ART, Artinskian; KUN, Kungurian; ROA, Roadian; WOR, Wordian; CAP, Capitanian; WUC, Wuchiapingian; CHX, Changhsingian; IND, Induan; OLE, Olenekian; ANS, Anisian; LAD, Ladinian; CRN, Carnian; NOR, Norian; RHT, Rhaetian.

Wherever possible, we split genera into their constituent species. We manually grafted species not present in Tsuji and Müller (2009) onto the supertree based upon their placements in the phylogenetic analyses accompanying their descriptions. For example, the Middle Permian South African parareptile Australothyris smithi was the sister taxon to Ankyramorpha (the clade consisting of Lanthanosuchoidea, Bolosauridae, Nyctiphruretus and Procolophonia) in Modesto et al. (2009). Accordingly, we placed it between the Millerosauria and the Lanthanosuchoidea in the supertree.

We chose not to resolve arbitrarily any polytomy in the supertree, pending clarification of sister group relationships in future, character-based analyses. Unresolved polytomies necessarily imply an underestimate of phylogenetically corrected taxonomic richness (see below) because for any N taxa that are collapsed in a polytomy, there are N-1 internal nodes and N-2 internal branches deriving from a full resolution of that polytomy. Although such internal branches are in some cases excluded from calculations of phylogeny-based diversity estimates (e.g. Ruta and Benton 2008), they too form a component of palaeodiversity (e.g. see Milner 1990; Smith 1994; Fara 2004). A minimum of three and a maximum of six lineages are joined in the polytomous nodes of the supertree (Text-fig. 1); thus, one to four additional internal branches would result from these polytomies in a completely bifurcating tree. However, experimenting with a fully resolved tree does not affect our main results to any remarkable degree (e.g. curves of origination/extinction rates; degree of correlation between originations and extinctions; statistical distributions of rate values), despite readjustments of both diversity and rate values.
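The arithmetic behind the missing internal branches can be checked with a short Python sketch. The function name is hypothetical; the counts follow directly from the statement that N lineages collapsed in a polytomy yield N-2 internal branches once that polytomy is fully resolved.

```python
def extra_internal_branches(n_lineages):
    """Internal branches gained by fully resolving a polytomy of N lineages."""
    if n_lineages < 3:
        raise ValueError("a polytomy joins at least three lineages")
    return n_lineages - 2


# The supertree's polytomies join between three and six lineages.
smallest_polytomy = extra_internal_branches(3)
largest_polytomy = extra_internal_branches(6)
```

A polytomy of three lineages therefore hides one internal branch and a polytomy of six hides four, which matches the range of one to four additional branches given above.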

Diversity estimates: some basic assumptions

Taxa and time bins.  To construct observed and inferred curves of diversity (i.e. curves that take into account ghost lineages and range extensions), we considered three categories of taxa. These categories account for the stratigraphic positions of the taxa relative to the upper and lower boundaries of a time bin (we use stages as temporal subdivisions; see below). Thus, we distinguish (1) taxa that are present solely within a stage (singletons); (2) taxa that cross both the upper and the lower boundaries of a stage; (3) taxa that cross either the upper or the lower boundary of a stage, but not both. In the case of observed occurrences, these three categories are all given equal importance for two main reasons. First, the various origination and extinction metrics that we use permit flexibility in their treatment of singletons (except in the case of Foote’s 2002 per-capita rate metrics). Second, singleton taxa are by far the most represented (77.1 per cent), followed by taxa crossing a single boundary (21.7 per cent) and a single taxon crossing both the upper and the lower boundaries of an interval (1.2 per cent). As singletons form a significant percentage of documented parareptiles, we find it hard to justify their a priori exclusion from diversity calculations (see also Fröbisch 2008; Lloyd et al. 2008; Marx 2008; Ruta and Benton 2008; Barrett et al. 2009; Butler et al. 2009; Benson et al. 2010; Marx and Uhen 2010).
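The three categories of taxa can be expressed as a simple classification rule. The Python sketch below is one possible reading of that rule, not the procedure used to build the supplementary spreadsheet; the function name, the category labels and the ages (in Ma) are hypothetical.

```python
def classify_taxon(first_ma, last_ma, bin_base_ma, bin_top_ma):
    """Classify a taxon's range relative to the boundaries of one time bin.

    Ages are in Ma, so larger numbers are older: `first_ma` >= `last_ma`
    and `bin_base_ma` > `bin_top_ma`.
    """
    crosses_lower = first_ma > bin_base_ma and last_ma < bin_base_ma
    crosses_upper = first_ma > bin_top_ma and last_ma < bin_top_ma
    if crosses_lower and crosses_upper:
        return "crosses both boundaries"
    if crosses_lower or crosses_upper:
        return "crosses one boundary"
    if first_ma <= bin_base_ma and last_ma >= bin_top_ma:
        return "singleton"
    return "absent from bin"


# Hypothetical bin spanning 260-254 Ma and four hypothetical taxon ranges.
base, top = 260.0, 254.0
singleton = classify_taxon(258.0, 256.0, base, top)
through_ranger = classify_taxon(265.0, 250.0, base, top)
enters_from_below = classify_taxon(265.0, 256.0, base, top)
older_taxon = classify_taxon(270.0, 262.0, base, top)
```

In this sketch, a taxon whose first or last appearance coincides exactly with a boundary is treated as not crossing it, which is an assumption of the example and not a convention stated in the text.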

For brevity, we did not use subsampling or rarefaction methods to explore the effects of different numbers of taxon occurrences or collections in different time bins. We will address this issue in a separate paper (but see Lloyd et al. 2008 and Ruta and Benton 2008 for examples).

Ghost lineages and range extensions.  Following protocols outlined in previous analyses (e.g. Smith 1994; Fara 2004; Ruta and Benton 2008), our calculations of inferred diversity counts rely upon the extent and location of ghost lineages and range extensions. Range extensions are easy to interpret. We assume that a taxon maintains its identity throughout the entire duration of its extended temporal range. This assumption is a necessary simplification, but it implies only a minimal, reasonably conservative correction for inferred diversity. Ghost lineages require some clarification. They are seldom taken into account in palaeodiversity analyses, despite the fact that they, too, contribute to minimally extrapolated counts of diversity (e.g. see Smith 1994; Fara 2004; Ruta and Benton 2008; Young et al. 2010).

A practical problem associated with trees superimposed on a time scale is the length assigned to internal branches (ghost lineages). Suppose that a number of taxa show identical earliest documented occurrences. Further, suppose that these taxa branch from adjacent points in a phylogeny (i.e. they form a pectinate arrangement) or are part of the same clade. Barring the possibility of a simultaneous speciation (equivalent to assigning no duration at all to the internal branches), we face the issue of determining a suitable length (duration) for the ghost lineages. Ruta and Benton (2008) assumed that in this situation, the branching events could be regarded as simultaneous. Here, we opt for a less restrictive approach, and instead, assign an arbitrary and equal length of 0.25 myr to ghost lineages. We apply this convention exclusively to the situation in which an internal branch connects taxa with identical first documented appearances. We emphasize that the length of 0.25 myr is arbitrary, given the absence of an explicit model of rate changes (e.g. one derived from character distribution). Such a short duration implies relatively rapid accumulations of originations and extinctions (an internal branch occurring solely within a time bin effectively behaves like a singleton and thus contributes one origination event and one extinction event to that time bin). It follows that short internal branches counteract the effects of simultaneous extinctions and originations of widely represented singletons (Data S1 of the online supplementary file), because they add an extra component to rate metric calculations. Indirectly, they also bias calculations of rates towards the null hypothesis that extinction rates near the Permo-Triassic boundary were not significantly different from those in preceding or successive time intervals (i.e. the distribution of extinction rate values was unimodal). This is because additional data points are added to the range of rate values.

Status of terminal taxa.  We assume the monophyly of all taxa in the supertree, i.e. we posit that each is supported by at least one autapomorphy. This is not unreasonable, although the status of each species should be re-addressed in the light of comprehensive cladistic analyses of all parareptiles. Previous studies of various parareptile clades do show occurrences of metataxa sensu Smith (1994), e.g. among procolophonids and nycteroleterids-pareiasaurs (for a detailed account of potential autapomorphies, see reference list in Tsuji and Müller 2009, and recent examples in Cisneros and Ruta 2010). However, this could be attributed to paucity of specimens and/or poor preservation of relevant material and the consequent lack of comprehensive lists of comparable characters. By treating all taxa as monophyletic, we simplify the construction of range extensions, so that no taxon is directly ancestral to any stratigraphically later taxon or group of taxa (e.g. see Smith 1994, p. 129).

The issue of zombie lineages.  One final comment concerns the extent to which taxa may have occurred beyond their known earliest and latest observed datum. ‘Backward’ extensions of taxa depend exclusively upon the hypothesis of sister group relationships plotted on a time scale (though accuracy of a phylogenetic hypothesis and consideration of ancestor–descendant relationships will influence the extension and position of the lineages). However, ‘forward’ extensions are problematic. These are ‘… unsampled portion[s] of a taxon’s range occurring after the final appearance of the taxon in the fossil record prior to its actual extinction’ (Lane et al. 2005, p. 23). For these extensions, Lane et al. (2005) aptly coined the term ‘zombie lineage’ (Note that a zombie lineage turns into a range extension, itself part of a Lazarus taxon (a taxon with a significant gap in its stratigraphic record; Smith 1994), as soon as a later representative of that taxon is discovered). Unfortunately, there are no ways of determining the extension of zombie lineages when known taxon occurrences are represented either by single specimens or by very limited numbers of horizons and collections. We agree that the unknown contributions of zombie lineages to the total diversity of a taxonomic group present a severe challenge to palaeodiversity analyses, and this remains true for analyses that make use of phylogenetic correction (e.g. Wagner 1995, 2000; Foote 1996; Lane et al. 2005). Unfortunately, probabilistic estimates of upper and lower extensions of observed taxon ranges (e.g. Strauss and Sadler 1989; Marshall 1990, 1994, 1997; Wang and Marshall 2004) are not easily applicable to our case study.

However, zombie lineages may not pose a problem if palaeodiversity studies are conducted at species level. The estimated known (observed) ranges for various parareptiles vary slightly (Data S1 of the online supplementary file), but their duration can be in part reconciled with current estimates of mean species longevity in tetrapods. Early studies in this field (Stanley 1979) provided estimates of 1 or 2 myr for vertebrates, so our ranges are close to or slightly greater than these (see also Stanley 1985). Recent works indicate turnover values within intervals of approximately two orders of magnitude relative to Stanley’s (e.g. see Makarieva and Gorshkov 2004) in various groups of extant tetrapods, but there is still a dearth of analyses focussing on extinct tetrapods (e.g. see King 1993; Liow 2007). Clearly, future studies of palaeodiversity will benefit from augmented knowledge in this area.

Time intervals

For all analyses, we use stratigraphic stages as time bins (e.g. see Benton et al. 2004; Lloyd et al. 2008; Ruta and Benton 2008; Barrett et al. 2009; Butler et al. 2009; Benson et al. 2010) and we did not attempt to lump stages together or subdivide them further. The selection of time bins of identical (or nearly identical) lengths does offer obvious advantages, but it is not practical in our case. Fröbisch (2008) used intervals of 1 myr in his study of anomodonts, but stratigraphic plots of anomodont ranges reveal a good continuity of occurrences and very limited gaps. Parareptiles display a much patchier record. Extensive ghost lineages and range extensions imply that no real interpretable results would emerge from the analyses of turnover, rates and continuity of originations/extinctions using very fine time bins. For instance, with 1 myr bins, we would have several examples of zero value rates. Although these could conceivably be real values, we suspect that they would reflect record absence instead of true signal. Similarly, rate metrics would result in strings of zero values which would artificially create a gap between a major extinction in one bin and each of the adjacent zero value bins. By taking stages as time units, our bins have very uneven temporal durations. To overcome this problem, we present all rate metrics in both uncalibrated and time-calibrated versions (i.e. we temporally scale uncalibrated rate values by introducing interval duration as a correction factor; see Foote and Miller 2007). We did not explore alternative options, such as the use of intervals of arbitrary unit length (e.g. 3 myr), but we will discuss this approach in a separate paper on the quality of the parareptile fossil record.

We attempted to place taxa as accurately as possible within each stage, based upon the relevant information in the literature as well as data on co-occurring taxa from other groups (e.g. see Fröbisch 2008). When a taxon occurs solely in a stage (i.e. it does not cross the upper and lower boundaries of the stage), and no precise data on its distribution were available, we considered it to extend throughout the whole duration of the stage. However, when more precise data were available, we tried to position the taxon as accurately as possible within the stage, so that it occupies only part of the time bin. The observed ranges of species do not pose a major problem for analyses of turnover based upon observed and phylogeny-corrected taxon counts, as we use stages as time intervals: however, the use of finer temporal subdivisions likely would have introduced errors because these kinds of subdivisions would have necessitated extremely accurate placements of taxa relative to both interval durations and boundaries. Finally, in assigning ages to species, we did not try to correct for dating errors (but see also Lane et al. 2005; Pol and Norell 2006).

Origination and extinction rates

The performance of different rate metrics has been discussed elsewhere, and we agree that ‘… each has strengths and weaknesses, and each may give biased estimates of the true [per-taxon] likelihood of extinction [or origination] under certain conditions’ (Wang 2003, p. 461). For these reasons, we applied some of the most widely employed metrics (Data S2 and S3 of the online supplementary file). The selected metrics belong to two broad categories. In the first category, we express origination and extinction rates as percentages of the total diversity of taxa (i.e. all parareptiles in the supertree). In the second category, we employ one of a variety of so-called mean standing diversity (MSD) values per stage (Van Valen 1984; Hammer and Harper 2006; Ruta and Benton 2008), with origination and extinction rates expressed as percentages of MSD for each stage. The MSD in each time interval is calculated through the addition of ‘partial weights’ assigned to various taxa in that interval (e.g. Sepkoski 1975; Foote 2000; Hammer 2003), and weighting follows simple rules. A ‘weight’ of 1 is given to taxa that cross both the lower and the upper boundaries of an interval; taxa that cross only one but not both boundaries are given a ‘weight’ of 0.5; finally, following Hammer (2003), singletons are given a ‘weight’ of 0.3 (see also Ruta and Benton 2008). The MSD is thus given by the sum of all the ‘weights’, each ‘weight’ being multiplied by the number of taxa to which it is assigned.

With observed taxa, MSD calculations are straightforward. With inferred taxa, any internal cladogram branch occurring solely within an interval behaves like a singleton (see above) and accordingly receives a ‘weight’ of 0.3. Although arbitrary (e.g. see Ruta and Benton 2008), such ‘weights’ conform to common practice and usage in the relevant literature. Unlike Ruta and Benton (2008), however, we do not include any calculations based upon alternative ‘weighting schemes’.

We used nine types of rate metrics (Data S2 and S3 of the online supplementary file). The first of these takes into account the total number of observed taxa in our supertree. For each stage, we report the numbers of taxon originations (i.e. the earliest appearances of taxa in the stage) and extinctions (i.e. the last appearances of taxa in the stage) as percentages of the total number of taxa. If ‘obs.o’, ‘obs.e’ and ‘obs.total’ represent, respectively, the number of observed originations for a stage, the number of observed extinctions for that stage and the total number of observed taxa, then the first metric [observed per-taxon proportional originations (extinctions) per interval] is calculated as follows:

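\[ \frac{\text{obs.o}}{\text{obs.total}} \quad \text{and} \quad \frac{\text{obs.e}}{\text{obs.total}} \tag{1} \]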

The second metric [observed per-taxon proportional originations (extinctions) per million year] is similar to the first metric, except that the values derived from the two formulae above are divided by the stage duration given in millions of years (Δt):

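\[ \frac{\text{obs.o}}{\text{obs.total} \cdot \Delta t} \quad \text{and} \quad \frac{\text{obs.e}}{\text{obs.total} \cdot \Delta t} \tag{2} \]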

With introduction of phylogenetic correction, both ghost lineages and range extensions are added to the number of originations (‘inf.o’) and extinctions (‘inf.e’) and to the total number of taxa (‘inf.total’). Thus, the third rate metric [inferred per-taxon proportional originations (extinctions) per interval] is:

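\[ \frac{\text{inf.o}}{\text{inf.total}} \quad \text{and} \quad \frac{\text{inf.e}}{\text{inf.total}} \tag{3} \]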

If stage duration is considered, then the fourth rate metric [inferred per-taxon proportional originations (extinctions) per million year] is:

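\[ \frac{\text{inf.o}}{\text{inf.total} \cdot \Delta t} \quad \text{and} \quad \frac{\text{inf.e}}{\text{inf.total} \cdot \Delta t} \tag{4} \]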
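The first four metrics share one pattern: a count of originations or extinctions divided by a fixed total, optionally divided again by stage duration. The sketch below illustrates that pattern under stated assumptions; the function name and all counts are hypothetical and are not taken from the supertree.

```python
def proportional_rate(events, total_taxa, duration_myr=None):
    """Originations or extinctions in a stage divided by the total taxon count.

    If duration_myr (the stage duration, delta t) is given, the proportion is
    further divided by it, giving a per-million-year value.
    """
    if total_taxa <= 0:
        raise ValueError("total_taxa must be positive")
    rate = events / total_taxa
    if duration_myr is not None:
        if duration_myr <= 0:
            raise ValueError("duration_myr must be positive")
        rate /= duration_myr
    return rate


# Hypothetical counts for one stage.
obs_o, obs_total = 4, 40    # observed originations, total observed taxa
inf_o, inf_total = 12, 80   # counts after adding ghost lineages and range extensions
delta_t = 5.0               # stage duration in million years

metric_1 = proportional_rate(obs_o, obs_total)           # observed, per interval
metric_2 = proportional_rate(obs_o, obs_total, delta_t)  # observed, per million years
metric_3 = proportional_rate(inf_o, inf_total)           # inferred, per interval
metric_4 = proportional_rate(inf_o, inf_total, delta_t)  # inferred, per million years
percent_1 = 100 * metric_1  # the text reports these proportions as percentages
```

Because the denominator is the same for every stage, these four metrics differ between stages only through the event counts and, for the time-corrected versions, the stage durations.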

In the previous four rate metrics, the total number of taxa (either obs.total or inf.total) in the denominator is the same, regardless of the time interval considered. Conversely, the following four rate metrics – collectively referred to as the Van Valen metrics (Van Valen 1984) – are obtained through replacement of the total number of taxa (observed or inferred) in the denominator with the MSD. The observed (‘obs.MSD’) and the inferred (‘inf.MSD’) mean standing diversities are calculated using the weighting rules discussed earlier. As in the case of the previous four rate metrics, the Van Valen metrics can be calculated per interval and per million year. The fifth, sixth, seventh and eighth rate metrics are as follows:

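\[ \frac{\text{obs.o}}{\text{obs.MSD}} \quad \text{and} \quad \frac{\text{obs.e}}{\text{obs.MSD}} \tag{5} \]
\[ \frac{\text{obs.o}}{\text{obs.MSD} \cdot \Delta t} \quad \text{and} \quad \frac{\text{obs.e}}{\text{obs.MSD} \cdot \Delta t} \tag{6} \]
\[ \frac{\text{inf.o}}{\text{inf.MSD}} \quad \text{and} \quad \frac{\text{inf.e}}{\text{inf.MSD}} \tag{7} \]
\[ \frac{\text{inf.o}}{\text{inf.MSD} \cdot \Delta t} \quad \text{and} \quad \frac{\text{inf.e}}{\text{inf.MSD} \cdot \Delta t} \tag{8} \]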

In these rate formulae, the values of both obs.MSD and inf.MSD usually differ in each time interval (Van Valen 1984; Foote 1994; Wang 2003; Ruta and Benton 2008).
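The effect of a stage-specific denominator can be illustrated with a small sketch. The function name, the stage labels and all numbers below are hypothetical; the only point is that identical event counts yield different Van Valen rates when MSD differs between stages.

```python
def van_valen_rate(events, msd, duration_myr=None):
    """Originations or extinctions divided by the stage's mean standing diversity."""
    if msd <= 0:
        raise ValueError("msd must be positive")
    rate = events / msd
    return rate if duration_myr is None else rate / duration_myr


# Hypothetical stages: (label, extinctions, MSD, duration in million years).
stages = [
    ("stage A", 3, 6.0, 4.0),
    ("stage B", 3, 2.0, 1.5),
]

# The same number of extinctions gives different rates because MSD differs.
per_interval = {name: van_valen_rate(e, msd) for name, e, msd, _ in stages}
per_myr = {name: van_valen_rate(e, msd, dt) for name, e, msd, dt in stages}
```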

Finally, we employed a ninth metric discussed at length by Foote (2000). His origination and extinction rates, referred to as per-capita rates, make use of the number of taxa that cross both the lower and upper boundaries of a time interval, indicated by Nbt. Two additional quantities, indicated by Nt and Nb, represent the total number of taxa that are present at the end (top or upper boundary) and at the beginning (bottom or lower boundary) of an interval, respectively. Nt includes Nbt plus those taxa that first appear in the interval and cross its upper boundary only (NFt). Nb includes Nbt plus those taxa that cross the lower boundary of the interval and are last observed in the interval (NbL). For extinctions, the ‘per-capita’ qualifier implies that ‘… the number of extinctions is scaled to the number of lineages at risk and to the amount of time they are at risk’ (Foote and Miller 2007, p. 180). Similar reasoning can be extended to originations. Foote’s (2000) per-capita origination (p) and extinction (q) rates are given by the following formulae, respectively:

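\[ p = -\frac{\ln\left(N_{bt}/N_{t}\right)}{\Delta t}, \qquad q = -\frac{\ln\left(N_{bt}/N_{b}\right)}{\Delta t} \tag{9} \]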
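A short sketch of the per-capita calculation follows, assuming the form in which Foote's (2000) rates are usually written (stated in the docstring). The function name and the example counts are hypothetical. Returning `None` when Nbt is zero is a design choice of this sketch, reflecting the fact that the logarithm is then undefined.

```python
import math


def foote_per_capita_rates(n_bt, n_ft, n_bl, duration_myr):
    """Return (p, q), or None when the rates are undefined.

    n_bt: taxa crossing both boundaries (Nbt)
    n_ft: taxa first appearing in the interval and crossing its top (NFt)
    n_bl: taxa crossing the bottom and last appearing in the interval (NbL)

    Assumed form: p = -ln(Nbt / Nt) / dt and q = -ln(Nbt / Nb) / dt,
    with Nt = Nbt + NFt and Nb = Nbt + NbL.
    """
    if n_bt == 0 or duration_myr <= 0:
        return None  # logarithm of zero (or invalid duration): no rate
    n_t = n_bt + n_ft
    n_b = n_bt + n_bl
    p = -math.log(n_bt / n_t) / duration_myr
    q = -math.log(n_bt / n_b) / duration_myr
    return p, q


# Hypothetical stage: 5 range-through taxa, 5 new arrivals, no losses, 2 Myr long.
p, q = foote_per_capita_rates(5, 5, 0, 2.0)
```

Singletons do not enter any of the three counts, which is why this metric ignores them.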

As the number of observed taxa that cross the lower and upper boundaries is very small, we opted for a compromise solution, whereby we used Foote’s (2000) metrics in conjunction with phylogenetic correction. As a result, we calculated Nbt, NbL and NFt using observed and inferred taxa. In the case of the Sakmarian, Carnian, Norian and Rhaetian, Nbt equals zero, invalidating the use of the two metrics for those stages. However, this does not constitute a limitation. The Sakmarian and Rhaetian represent the extremes of the entire time interval of interest (i.e. the recorded history of parareptiles). They include, respectively, the earliest and latest recorded appearances of parareptiles. As our focus is on a clearly monophyletic group with no older sister groups included, and because p and q disregard singletons, no Nbt taxa can logically be present in those two stages. However, unlike Foote’s (2000) metrics, all other metrics do permit inferences about origination and extinction at the initial and terminal stages of parareptile history and thus take singletons into account.

We calculated the majority of rates (excluding Foote’s per-capita rates) for 15 time intervals (Sakmarian to Rhaetian), which formed the input for turnover analyses. We quantified the strength of the correlation (rank-based association) between the two sets of values (origination and extinction rates) derived from each metric with Spearman’s ρ and Kendall’s τ, as implemented in PAST v. 2.08 (e.g. Hammer et al. 2001; Hammer and Harper 2006). Spearman’s ρ measures the ability of a strictly increasing or strictly decreasing (monotonic) function to describe the relationship between the two rate variables. Kendall’s τ determines the degree of association between the two variables in terms of numbers of both concordant and discordant pairs. Given two points in a bivariate plot, each with coordinates corresponding to rate values (e.g. originations on the horizontal axis; extinctions on the vertical axis), such points are said to form a concordant pair if the difference between the values along one of the two axes has the same sign as the difference between the values along the other axis, when the values are taken in the same order on both axes (e.g. Sokal and Rohlf 1995; Ruta and Benton 2008).
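The two coefficients can be illustrated with a minimal pure-Python sketch. It is not the PAST implementation: Kendall's τ is computed here in its simplest form (τ-a, with tied pairs counted as neither concordant nor discordant), and Spearman's ρ as the Pearson correlation of average ranks. The rate values are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # differences share the same sign on both axes
        elif sign < 0:
            discordant += 1   # differences have opposite signs
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; assumes neither input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical origination and extinction rates for five stages.
origination = [0.1, 0.4, 0.2, 0.8, 0.5]
extinction = [0.2, 0.3, 0.1, 0.9, 0.6]
rho = spearman_rho(origination, extinction)
tau = kendall_tau(origination, extinction)
```

In this example only one of the ten pairs of stages is discordant, so both coefficients are high.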

We used the nonparametric tests in a general way to test for the strength of the association between origination and extinction values. However, the data form a time series, i.e. a collection of measurements at successive time intervals. For this reason, the data points (individual measurements of rates) may not be independent. Therefore, we introduced a set of calculations for the autocorrelation function to test the independence of the data (for a similar approach, see also Marx and Uhen 2010). We checked for autocorrelations in the residuals (null hypothesis of no autocorrelation) up to lag 3 (i.e. correlation of residual rate values at time t vs. values at times t − 1, t − 2 and t − 3) using the Durbin–Watson test (Durbin and Watson 1950, 1951). We treated extinction as the response (dependent) variable and origination as the predictor (independent) variable. All statistical calculations were undertaken in the packages ‘car’, ‘lmtest’ and MASS (e.g. see Venables and Ripley 2002; Zeileis and Hothorn 2002; Fox and Weisberg 2011) compiled for the ‘R’ language and environment for statistical computing and graphics: http://www.r-project.org. The p values for the Durbin–Watson statistic were estimated via bootstrapping (1000 replicates, with resampling from observed residuals).
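The statistic behind this check can be sketched as follows, assuming the usual definition of the Durbin–Watson d and its straightforward generalization to lag k. This is an illustration only: it is not the code of the R packages named above, it omits the bootstrap p values, and the rate values are hypothetical.

```python
def ols_residuals(predictor, response):
    """Residuals from a simple least-squares regression with an intercept."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, response))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(predictor, response)]


def durbin_watson(residuals, lag=1):
    """d = sum over t of (e_t - e_(t-lag))^2, divided by the sum of e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - lag]) ** 2
        for t in range(lag, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical rates for six successive stages.
origination = [0.10, 0.30, 0.20, 0.50, 0.40, 0.60]  # predictor
extinction = [0.15, 0.20, 0.30, 0.45, 0.35, 0.70]   # response
residuals = ols_residuals(origination, extinction)
d_by_lag = {lag: durbin_watson(residuals, lag) for lag in (1, 2, 3)}
```

Values of d near 2 indicate little autocorrelation at the given lag; values well below 2 point to positive autocorrelation, and values well above 2 to negative autocorrelation.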

Estimates of probability density for origination and extinction rate values

To evaluate statistically the (dis)continuity of origination and extinction magnitude (rate values), we employed a method termed ‘kernel density estimation’ (e.g. Parzen 1962; Silverman 1981, 1986; Wang 2003). Wang (2003) provided an elegant and comprehensive discussion of the method and its rationale, and the reader is referred to his paper for a detailed exposition. Here, we offer only a very succinct treatment. Briefly, kernel density estimation is a nonparametric technique used to estimate the probability density function of a random variable. Given a finite data sample, we want to build an estimate of the occurrence of any given value in the sample. For each of the rate metrics discussed in the previous section, we calculate a set of rate values for each stage. To analyse the distribution of these variables, we could arrange them along a hypothetical axis and we could then group the points into a number of rectangular ‘bars’ of arbitrary and identical unit width (the width of the intervals into which the range of values is subdivided). The heights of these bars would then describe the frequency of occurrence of the points along the axis, the height being proportional to the number of points included in the bar.

However, the width of the bars and their position along the axis where the points are aligned can be chosen in different, arbitrary ways. In addition, given both a selected width for the bar and the number of bars, we are left with the problem of accounting for variations in the distribution of the points within either an individual bar or adjacent bars (i.e. points can be irregularly spaced).

Kernel density estimation circumvents these issues. With this method, we position a series of bars of unit width on the axis along which the points are aligned, so that one bar is centred on each point. The height of each bar equals 1/n, where n is the number of observations or time bins. Next, a cumulative height (a ‘block-like’ density) is built directly above each point, by adding up the heights of all the bars that overlap one another above any given point. For example, if three bars overlap one another above a point, then the height of the corresponding block is 3/n. The ‘block-like’ aspect of the bars is subsequently eliminated by replacing them with a kernel density estimator, i.e. a smooth curve that allows one to visualize contributions of each data point to the overall distribution. The area below this estimator is 1, and the position of each point has an associated probability density. Commonly, the kernel is a Gaussian function with a mean value of 0 and a variance of 1, which ensures that the area under the estimator integrates to 1. The smoothing parameter is known as the bandwidth. Probability density estimations were calculated with software available at: http://www.wessa.net/rwasp_density.wasp. In addition, the results were checked with codes in ‘R’ (Wang 2003).
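A Gaussian kernel density estimator of the kind described here can be written in a few lines. The sketch below is illustrative only and is not the software used in the study; the function name, the rate values and the bandwidth are hypothetical.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function giving the kernel density estimate at any point x.

    Each observation contributes a Gaussian bump of standard deviation
    `bandwidth`; the bumps are averaged so that the total area equals 1.
    """
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


# Hypothetical rate values for five stages and a hypothetical bandwidth.
rates = [0.05, 0.10, 0.12, 0.30, 0.65]
density = gaussian_kde(rates, bandwidth=0.1)
density_near_cluster = density(0.10)  # close to three of the five values
density_far_away = density(1.50)      # far from all values
```

A smaller bandwidth produces a bumpier curve that follows individual points, whereas a larger bandwidth merges neighbouring bumps into a single smooth mode.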

Thackeray (1990), Gilinsky (1994), Newman and Eble (1999) and Wang (2003) noted a trend of decreasing extinction intensities in the Phanerozoic. Owing to this trend, relatively recent mass extinction events would be comparable in intensity to older background extinction levels but might be significantly separated from more recent background extinction levels. To compensate for this effect, Wang (2003) recommended the use of residuals from a regression of extinction rates on time bin duration as input data for calculations of probability density estimates. We note that the total time interval used in the present work (i.e. the entire recorded history of parareptiles) is relatively small. Therefore, any long-term decline of extinction intensities is not likely to affect our general conclusions. Despite this, we found it useful to examine probability density estimates that take into account Wang’s (2003) correction method. Together with extinction rates, we also regressed origination rates. An additional benefit of Wang’s (2003) approach is that it allows us to eliminate biases in origination and extinction rates that might result from the use of unequal time bin lengths and from the widespread occurrence of singletons. Although Δt offers a partial temporal correction, the use of rate residuals further compensates for unevenness of sampling in the parareptile record.

In PAST, we performed ordinary least squares (OLS) regressions using the rate values as the response variable and the stage durations as the predictor variable. Subsequently, we used regressed values of the response variable (that is, the rate values along the regression line) as input for probability density estimations. In this context, OLS has an exclusively forecasting purpose. Thus, neither the degree of collinearity nor the strength of association between the predictor variable and the response variable is of immediate interest. In a similar way, we did not explore the fit and suitability of nonlinear regression models to our data, because we are solely concerned with the impact of estimated rates on probability density calculations.
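The regression step can be sketched in pure Python as below. This is not the PAST routine; the function name, the stage durations and the rate values are hypothetical. Both the values along the regression line and the residuals are returned, so that either can be passed on to a density estimation.

```python
def ols_fit(durations, rates):
    """Least-squares line of rate on stage duration.

    Returns (slope, intercept, fitted, residuals), where `fitted` holds the
    values along the regression line and `residuals` the observed minus
    fitted values.
    """
    n = len(durations)
    mean_x = sum(durations) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in durations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(durations, rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in durations]
    residuals = [y - f for y, f in zip(rates, fitted)]
    return slope, intercept, fitted, residuals


# Hypothetical stage durations (million years) and extinction rates.
stage_durations = [2.0, 4.0, 6.0, 8.0]
extinction_rates = [0.30, 0.10, 0.25, 0.05]
slope, intercept, fitted, residuals = ols_fit(stage_durations, extinction_rates)
```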

Concordance between phylogeny and stratigraphy

The availability of a parareptile supertree gives us an opportunity to investigate the amount of congruence between the supertree and first occurrence records. However, we present only a preliminary treatment of this topic in the light of a planned complete re-evaluation of the quality of the parareptile fossil record. We performed permutation tests (1000 replicates) in GHOSTS v. 2.3 (Wills 1999, 2007) to assess the significance of values of three well-established indexes, the Relative Completeness Index (RCI; Benton 1994), the Stratigraphic Consistency Index (SCI; Huelsenbeck 1994) and the Gap Excess Ratio (GER). Recently introduced and more sophisticated indexes (Wills et al. 2008) will be examined in a forthcoming publication (but see also Cisneros and Ruta 2010). As GHOSTS can handle a maximum of 74 taxa, we modified the supertree by deleting a number of taxa from the more derived portions of each clade, wherever applicable (data available upon request).
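For orientation, the three indexes can be sketched from their usual definitions in the literature. This is an assumption-laden illustration, not the GHOSTS implementation: the minimum implied gap (MIG), the summed range length (SRL) and the node counts are taken as given inputs, and all numbers are hypothetical.

```python
def gap_excess_ratio(mig, first_appearances_ma):
    """GER = 1 - (MIG - Gmin) / (Gmax - Gmin).

    Gmin is the span between the oldest and youngest first appearances;
    Gmax is the sum of the gaps between the oldest first appearance and
    every other first appearance. Ages are in Ma (larger means older).
    """
    oldest = max(first_appearances_ma)
    youngest = min(first_appearances_ma)
    g_min = oldest - youngest
    g_max = sum(oldest - age for age in first_appearances_ma)
    if g_max == g_min:
        raise ValueError("GER is undefined when Gmax equals Gmin")
    return 1.0 - (mig - g_min) / (g_max - g_min)


def relative_completeness_index(mig, srl):
    """RCI = 100 * (1 - MIG / SRL), with SRL the summed observed range length."""
    return 100.0 * (1.0 - mig / srl)


def stratigraphic_consistency_index(consistent_nodes, total_nodes):
    """SCI = stratigraphically consistent nodes / nodes assessed."""
    return consistent_nodes / total_nodes


# Hypothetical inputs: four first appearances and a tree implying 40 Myr of gaps.
first_appearances = [290.0, 280.0, 270.0, 260.0]
ger = gap_excess_ratio(40.0, first_appearances)
rci = relative_completeness_index(40.0, 80.0)
sci = stratigraphic_consistency_index(6, 8)
```

In a permutation test, the same index would be recomputed after repeatedly shuffling the first-appearance dates across the tips of the tree, and the observed value compared with the resulting distribution.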

Results

Taxonomic richness through time

A visual comparison between the curves of observed (Text-fig. 2A; white squares) and phylogeny-corrected (Text-fig. 2A; black rhombs) diversity illustrates the dramatic increase in the numbers of taxa when ghost lineages and range extensions are taken into account. We report differences between inferred and observed values in Text-figure 2B as bars, with percentage increase in diversity implied by phylogeny tagged to each bar. Percentage increase in the Permian ranges from 87.5 (Wuchiapingian) to 1800 per cent (Sakmarian). In the Triassic, percentages vary from 0 (Rhaetian) to 200 per cent (Ladinian). More importantly, differences between inferred and observed values do not correlate significantly with stage durations (Spearman’s ρ = −0.31541, p = 0.25215; Kendall’s τ = −0.22331, p = 0.2459), suggesting that they are mostly dependent upon tree shape. Without time calibration, the mean difference between values is significantly higher in the Permian than in the Triassic (Mann–Whitney test: U = 7.5, z = −2.321, p = 0.0203). When time calibration is applied (i.e. the difference between values is divided by stage duration), the values for the Permian are not significantly higher than the values for the Triassic (Mann–Whitney test: U = 13, z = −1.678, p = 0.09334). Finally, a two-sample Kolmogorov–Smirnov test rejects the null hypothesis of equal distributions of values for the uncorrected and corrected curves (D = 0.6; p = 0.0047152).

TEXT-FIG. 2.

 Changes in parareptile diversity per time interval, using stratigraphic stages (from Sakmarian to Rhaetian) as time bins. For clarity of illustration, all time bins are marked by segments of equal length in this and in other figures. The vertical grey bars mark the Permo-Triassic boundary. A, profiles of observed (bottom curve, white squares) and inferred (top curve, black rhombs) diversity; for each stage, observed (taxic) diversity is the count of all taxa actually recorded, whereas inferred (phylogeny-corrected) diversity is given by observed taxa (or their range extensions) plus ghost lineages. B, differences (black vertical bars) between the inferred and the observed diversity values for each stage; the number above each bar is the percentage increase entailed by the inferred value relative to the observed value. For stage name abbreviations, see caption of Text-figure 1.

The striking difference between these curves is borne out by the application of Raup and Sepkoski’s (1986) sampling error method to taxonomic counts. Assuming a uniform taxonomic sampling (various tests of the quality of the parareptile record will be discussed in a separate work), each observed diversity value is symmetrically bracketed by an interval of two standard errors, calculated as plus or minus the square root of observed diversity (Raup and Sepkoski 1986; Eble 2000; Ruta and Benton 2008). The upper and lower boundaries of these intervals are represented by white rectangles in Text-figure 2A. The upper boundaries, in particular, occur below the phylogeny-corrected estimates for the whole Permian and for several stages in the Triassic.
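The bracketing described above amounts to a one-line calculation per stage. The sketch below uses hypothetical diversity values and a hypothetical function name; it simply checks whether a phylogeny-corrected estimate lies above the upper bound of the observed value.

```python
import math


def sampling_error_bounds(observed_diversity):
    """Lower and upper bounds: observed diversity minus or plus its square root."""
    half_width = math.sqrt(observed_diversity)
    return observed_diversity - half_width, observed_diversity + half_width


# Hypothetical stage: 9 observed taxa, 17 taxa after phylogenetic correction.
observed, inferred = 9, 17
lower, upper = sampling_error_bounds(observed)
inferred_exceeds_upper = inferred > upper
```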

Observed diversity increases steadily for most of the Permian, peaking in the Wuchiapingian. The Wuchiapingian value is slightly higher than the Olenekian value – the highest for the Triassic. However, when we introduce ghost lineages and range extensions, the Wuchiapingian and the Olenekian values are identical. Inferred diversity rises and falls irregularly throughout the Permian, with Kungurian and Wordian peaks alternating with Artinskian and Roadian troughs.

Faunal turnover

We present curves for origination (black circles) and extinction (white triangles) through time derived from formulae 1–8 in Text-figures 3 and 4 (see also Data S3 of the online supplementary file). Text-figures 5 and 6 show the associated probability density estimations for the relevant rate metrics. Text-figure 7A depicts the curves of origination and extinction values obtained with Foote’s (2000) formula 9, and Text-figure 7B, C illustrates the associated probability density estimations. Without time correction, extinction intensity in the Changhsingian is always higher than in the Induan, only marginally so in the case of three metrics (Text-figs 3C, 4A, C), but remarkably so in the case of one metric (Text-fig. 3A). With time correction, the Induan has the highest extinction rate of all stages (Text-figs 3B, D, 4B, D). In time-corrected calculations, only two metrics yield extinction rate values of high magnitude relative to the Induan value, namely in the Wordian (Text-fig. 3D) and in the Roadian and Ladinian (Text-fig. 4B). With Foote’s (2000) per-capita rates (Text-fig. 7A), extinction intensities rise fairly dramatically throughout the latest part of the Permian and after the Palaeozoic–Mesozoic boundary, before decreasing steeply during the Triassic.

TEXT-FIG. 3.

 Curves of origination (black circles) and extinction (white triangles) rates. The rate metrics employ the total number of observed (A, B) or inferred (C, D) taxa to normalize numbers of originations and extinctions in each time interval and either exclude (A, C) or include (B, D) correction for time interval duration. A, B, observed per-taxon proportional originations and extinctions per interval and per million year, respectively. C, D, inferred per-taxon proportional originations and extinctions per interval and per million year, respectively. The vertical grey bars mark the Permo-Triassic boundary. For stage name abbreviations, see caption of Text-figure 1.

TEXT-FIG. 4.

 Curves of origination (black circles) and extinction (white triangles) rates. The rate metrics employ observed (A, B) or inferred (C, D) mean standing diversity (MSD) to normalize numbers of originations and extinctions in each time interval and either exclude (A, C) or include (B, D) correction for time interval duration. A, B, uncorrected and time-corrected Van Valen metrics for observed MSD, respectively; C, D, uncorrected and time-corrected Van Valen metrics for inferred MSD, respectively. The vertical grey bars mark the Permo-Triassic boundary. For stage name abbreviations, see caption of Text-figure 1.

TEXT-FIG. 5.

 Probability density estimates of rate values based upon the total number of observed (A–D) or inferred (E–H) taxa. The curves represent probability density functions, with probability density values on the vertical axis and rate values on the horizontal axis. For each curve, we report the optimal bandwidth value (h) and the degree of significance (p) associated with Silverman’s critical bandwidth test (null hypothesis of unimodal distribution of rate values). A–D, probability densities of observed per-taxon proportional originations (A, C) and extinctions (B, D), and calculated per interval (A, B) and per million year (C, D). E–H, probability densities of inferred per-taxon proportional originations (E, G) and extinctions (F, H), and calculated per interval (E, F) and per million year (G, H).

TEXT-FIG. 6.

 Probability density estimates of rate values based upon observed (A–D) or inferred (E–H) mean standing diversity (MSD) of taxa. The curves represent probability density functions, with probability density values on the vertical axis and rate values on the horizontal axis. For each curve, we report the optimal bandwidth value (h) and the degree of significance (p) associated with Silverman’s critical bandwidth test (null hypothesis of unimodal distribution of rate values). A–D, probability densities of originations (A, C) and extinctions (B, D) using Van Valen metric for observed MSD, and calculated per interval (A, B) and per million year (C, D). E–H, probability densities of originations (E, G) and extinctions (F, H) using Van Valen metric for inferred MSD, and calculated per interval (E, F) and per million year (G, H).

TEXT-FIG. 7.

 A, curves of per-capita origination (black circles) and extinction (white triangles) rates using inferred counts of taxa. The vertical grey bar marks the Permo-Triassic boundary. For stage name abbreviations, see caption of Text-figure 1. B, probability density estimates of per-capita rate values of originations based upon the total number of inferred taxa. C, probability density estimates of per-capita rate values of extinctions based upon the total number of inferred taxa. For each curve in B and C, we report the optimal bandwidth value (h) and the degree of significance (p) associated with Silverman’s critical bandwidth test (null hypothesis of unimodal distribution of rate values).

In terms of statistical significance, the different rate metrics do not give consistent results for the correlation between extinction and origination intensities. Both Spearman’s ρ and Kendall’s τ indicate nonsignificant correlations between originations and extinctions for the uncorrected Van Valen metric for observed taxa and the uncorrected and time-corrected Van Valen metrics for inferred taxa (formulae 5, 7, 8). For all other metrics (formulae 1–4, 6, 9), both tests indicate a significant correlation (Data S4 of the online supplementary file). Discrepancies among these results are not easy to interpret, but they highlight the necessity of comparing different measures of faunal turnover. Taken at face value, the significance of results for six of the nine metrics implies that as net increases in speciation took place, parareptiles experienced almost simultaneous episodes of decline (i.e. relatively fast turnover). However, when a standardization by mean standing diversity is introduced (as with the Van Valen metrics), nonsignificant correlations are almost the norm. Therefore, scaling of rates for lineages that are at risk produces – for most of the Van Valen metrics – disjoint originations and extinctions, implying differential impact on overall turnover.

The Durbin–Watson test for autocorrelation returned nonsignificant results (Data S5 of the online supplementary file) for almost all comparisons between origination and extinction rate values and for all time lags. The only exception is the occurrence of autocorrelation at lag 1 for the inferred per-taxon proportional originations and extinctions per interval (formula 3 above; Durbin–Watson d = 0.9849337; p = 0.03). However, we urge caution in the interpretation of these results because of the small number of values in each autocorrelation calculation. A more detailed autocorrelation study will form part of a separate publication on the quality of the parareptile fossil record.

Another way to look at turnover is through correlations between rate and diversity. Significant and strong correlations may be expected in some cases. For example, high extinction intensities are likely to be associated with diversity peaks. However, this is not always the case. Indeed, a dramatic decline or disappearance of taxa may coincide with a diversity minimum (e.g. see Ruta and Benton 2008). We thus tested the null hypothesis of no significant correlation between inferred numbers of taxa per stage and corresponding rate values that make use of inferred taxa, as these provide a more complete picture of diversity than taxic counts. The inferred per-taxon proportional originations and extinctions per interval and per million year (formulae 3 and 4) all correlate significantly with inferred diversity per stage, as do the time-corrected Van Valen rates of origination (formula 8). The other three Van Valen rates, however, show a nonsignificant correlation with diversity (originations and extinctions for formula 7; extinctions for formula 8), as do Foote’s per-capita metrics (formula 9).

From the Anisian onward, however, the pattern of faunal turnover becomes increasingly difficult to interpret given the dramatic decrease in the number of taxa, both observed and inferred. At present, it is difficult to ascertain whether such a decrease represents genuine signal or merely an artefact of sampling. We will address this issue in a separate publication.

Continuity of originations and extinctions

We first subjected the sets of values used to produce curves of origination and extinction rates for each of the metrics described earlier (Text-figs 3, 4) to probability density estimations (Text-figs 5–7). Subsequently, we applied Silverman’s critical bandwidth test (e.g. Silverman 1986; Wang 2003) to evaluate the null hypothesis of unimodal distributions of the origination and extinction values. In all cases, we show exclusively the graphs of probability density functions, but frequency histograms are also available upon request.
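The quantity at the heart of the critical bandwidth test can be sketched as follows. The sketch assumes a Gaussian kernel, counts modes on a finite grid, and finds by bisection the smallest bandwidth at which the density estimate has a single mode. It covers only this first ingredient: the bootstrap step that converts the critical bandwidth into a p value is omitted. Function names, grid size and the example samples are hypothetical.

```python
import math


def count_modes(sample, bandwidth, grid_size=1024):
    """Count local maxima of a Gaussian kernel density estimate on a grid.

    The normalizing constant is left out because it does not change the
    number of modes.
    """
    lo = min(sample) - 3 * bandwidth
    hi = max(sample) + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    density = [
        sum(math.exp(-0.5 * ((lo + i * step - x) / bandwidth) ** 2) for x in sample)
        for i in range(grid_size)
    ]
    modes, rising = 0, False
    for previous, current in zip(density, density[1:]):
        if current > previous:
            rising = True
        elif current < previous and rising:
            modes += 1       # a rise followed by a fall marks one mode
            rising = False
    return modes


def critical_bandwidth(sample, iterations=40):
    """Approximate the smallest bandwidth giving a single mode, by bisection."""
    spread = max(sample) - min(sample)
    if spread == 0:
        raise ValueError("sample values must not all be equal")
    low, high = 0.0, spread  # assumed: a bandwidth equal to the range gives one mode
    for _ in range(iterations):
        mid = (low + high) / 2
        if count_modes(sample, mid) <= 1:
            high = mid
        else:
            low = mid
    return high


# Hypothetical samples: two well-separated clusters versus evenly spread values.
two_clusters = [0.10, 0.11, 0.12, 0.13, 0.14, 0.90, 0.91, 0.92, 0.93, 0.94]
evenly_spread = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
h_clusters = critical_bandwidth(two_clusters)
h_spread = critical_bandwidth(evenly_spread)
```

A large critical bandwidth means that heavy smoothing is needed to remove the extra modes, which is the pattern expected when rate values fall into distinct groups.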

Silverman’s critical bandwidth test returns nonsignificant p values for all metrics. In no case does the distribution of rate values differ significantly from unimodal, regardless of whether phylogenetic correction and/or stage duration correction are introduced (Text-figs 5–7). In very general terms, our results match similar findings by Wang (2003). In his analysis of the continuity between background and mass extinctions in the Phanerozoic, Wang (2003) detected a significant departure from unimodality only in the case of one metric, the number of extinctions per interval (not used here, as it does not account for total diversity and is sensitive to highly unequal variances; see Foote 1994; Wang 2003; Foote and Miller 2007). For originations, and employing total diversity count of parareptiles as the standardizing parameter, the smallest and the largest p values relate, respectively, to the inferred per-taxon proportional originations per million year (0.118; see Text-figure 5G) and to the observed per-taxon proportional originations per interval (0.755; see Text-figure 5A). As regards extinctions, again using total diversity count of parareptiles as a standardizing factor, the smallest and largest p values pertain, respectively, to the inferred per-taxon proportional extinctions per million year based upon the Van Valen metric (0.127; see Text-figure 6H) and to the inferred per-taxon proportional extinctions per million year (0.419; see Text-figure 5H). The critical bandwidth test also fails to reject the null hypothesis of unimodal distribution among the rate values for Foote’s (2000) per-capita rates (Text-fig. 7B, C).

We obtained nonsignificant results when we analysed regression residuals of all metrics. In the interest of brevity, we do not report the probability density estimation graphs and the corresponding critical bandwidths. However, both can be easily generated using the data in Data S4 of the online supplementary file together with the protocols detailed in the methods. Because the critical bandwidth tests included a relatively small number of observations (maximum value of n = 15), they may have low power. For this reason, rejection of the null hypothesis would be difficult unless a truly remarkable discontinuity in the rate values could be observed. This issue will be addressed more fully in forthcoming publications on the turnover rate of various tetrapod groups across the Permo-Triassic boundary.

Discussion

Parareptile diversification through time

Visual inspection of the parareptile phylogeny plotted on a time scale (Text-fig. 1) shows a series of cladogenetic events, mostly in the middle to late Sakmarian, resulting in the emergence of a variety of small but ecologically differentiated groups. These groups precede the radiation of the two major clades, nycteroleterids-pareiasaurs and procolophonoids, and each is subtended by a very long ghost lineage. The nycteroleterid-pareiasaur/procolophonoid group minimally dates to the mid-part of the Kungurian, based upon observed taxon occurrences, and is also rooted into the Sakmarian. The stratigraphic distribution of nycteroleterids-pareiasaurs reveals diversity peaks in the Capitanian and the Wuchiapingian. The Wuchiapingian also marks a notable drop in the taxic diversity of pareiasaurs. The stratigraphic distribution of procolophonoids is more uniform than that of pareiasaurs, at least in the Early Triassic when two peaks are recorded in the Induan and in the Olenekian. The ancestry of procolophonoids still remains elusive (Tsuji and Müller 2009): procolophonoids join the nycteroleterids-pareiasaurs via a ghost lineage spanning the middle Kungurian to the early/middle Wuchiapingian, and further conspicuous gaps dot their Middle to Late Triassic fossil record.

When ghost lineages and range extensions are introduced, the most extensive gaps in the fossil record of parareptiles appear confined mostly to the Middle-earliest Late Permian and to the Middle and Late Triassic (Text-fig. 1; Data S1 of the online supplementary data). Middle-earliest Late Permian gaps characterize millerettids (clade including Milleretta rubidgei to Millerosaurus ornatus), lanthanosuchoids (clade including Lanthaniscus efremovi to Colobomycter pholeter) and bolosaurids (clade including Eudibamus cursoris to Bolosaurus striatus). Middle and Late Triassic gaps characterize leptopleuronine procolophonids (clade including Pentaedrusaurus ordosianus to Hypsognathus fenneri). Curiously, counts of taxic diversity do not highlight a sudden decline in diversity in the latest part of the Middle Permian, but a decline is evident in the corrected estimates (Kungurian–Roadian transition; Text-fig. 2A). This decline, commonly termed Olson’s Gap, has been observed in other tetrapods (Benton 1987; Erwin 1990; Lucas 2004; Ruta and Benton 2008; Sahney and Benton 2008), but it remains unclear whether it represents a genuine phenomenon or an artefact.

Clade dynamics in parareptiles

The different metrics do not provide a coherent picture of clade dynamics, in partial agreement with similar recent studies on other tetrapod groups (e.g. Fröbisch 2008; Ruta and Benton 2008). In most cases, introduction of time calibration tends to cause the origination and extinction curves to track each other fairly closely (Text-figs 3B, D, 4B, D). The most striking examples of mismatch in rate profiles are the Van Valen metric based upon observed data without time calibration (Text-fig. 4A) and, to a lesser extent, proportional rates based upon observed data with time calibration (Text-fig. 3B).

Taken together, the patterns shown in Text-figures 3, 4 and 7A best summarize the dynamics of parareptiles as a group. Throughout the Mesozoic, the group is represented only by the procolophonids, and they are solely responsible for the apparent peak in rate values observed during the lowermost Triassic. Conversely, pareiasaurs (together with a very limited sample of taxa from more basal clades) are chiefly responsible for the rate patterns during the Guadalupian and the Lopingian. At present, it is impossible to combine results from these two disjoint temporal distributions into a unified pattern, owing to the long ghost lineage at the base of procolophonoids. Despite this problem, available data do not seem consistent with a sudden decline of parareptiles near the end of the Palaeozoic. Instead, they suggest a rapid alternation of decline and speciation events in different clades (see also Fröbisch 2008 and Ruta and Benton 2008).

Parareptiles and the end-Permian crisis

The evolutionary history of parareptiles presents an interesting challenge for analyses of the impact of the end-Permian mass extinction. During the Capitanian and Wuchiapingian, parareptiles suffered two successive episodes of decline, but these mainly affected separate clades within the pareiasaurs. By the end of the Changhsingian, most pareiasaurs were already extinct, and only a few lineages belonging to slightly older clades (but no new clades) survived until the end of the Palaeozoic. Together with these pareiasaur lineages, a small number of basal parareptiles – notably millerettids – and some basal procolophonoids also went extinct. When plotted on a stratigraphic scale, the phylogeny of parareptiles does not show the pattern of increasing diversity throughout the Permian and sudden truncation at the end of the Palaeozoic that has been noted in other terrestrial vertebrate groups, particularly temnospondyls (Milner 1990; Ruta et al. 2007; Ruta and Benton 2008). This largely agrees with previous suggestions that the end-Permian mass extinction did not have a dramatic impact on parareptiles (e.g. Modesto et al. 2001, 2003; Botha et al. 2007). However, only one clade of parareptiles, the Procolophonoidea, crosses the Permo-Triassic boundary, and it accounts for all Mesozoic parareptile diversity. Similar phylogenetic ‘bottlenecks’ have also been documented in the Synapsida and Eureptilia. For each group, only a single lineage, Therapsida and Diapsida respectively, survived the end-Permian mass extinction and explanations for this selective survival remain elusive.

The ghost lineage connecting procolophonoids to nycteroleterids-pareiasaurs almost certainly represents an artefact of sampling. In the absence of any data on putative procolophonoid ancestors from the middle Kungurian to the early–middle Wuchiapingian, we can only speculate on plausible diversification scenarios for this clade. Considering that in other ankyramorphan parareptiles, such as bolosaurids, the major phylogenetic diversification seemingly did not occur later than the Permo-Carboniferous transition (e.g. Müller et al. 2008), we must assume that the poor fossil record of Early Permian parareptiles and the absence of any Carboniferous taxa potentially have a severe impact on scenarios of early parareptilian diversification. More solid conclusions must await additional discoveries of parareptiles from crucial time intervals.

Phylogeny and stratigraphy

The overall concordance between the stratigraphic order of appearance of taxa and their branching order in the supertree is fairly poor. The SCI, RCI and GER values are significantly different from random (p = 0.01), but only 31 of 71 internal supertree nodes are stratigraphically consistent (SCI = 0.43662). In addition, the summed duration of ghost lineages and range extensions exceeds the summed duration of observed taxon ranges (RCI = −22.781775; see also Data S1 of the online supplementary data). Finally, the amount by which the total duration of ghost lineages plus range extensions exceeds its minimum possible value (the summed gaps between successive earliest observed occurrences) is small relative to the difference between its maximum possible value (the summed gaps between the earliest known record of the oldest taxon and those of all other taxa) and that minimum (GER = 0.823506). Comparisons with the results of Cisneros and Ruta (2010) suggest that the fit of phylogeny to stratigraphy may vary across individual parareptile clades. Although the overall fit may be poor, the congruence may increase in specific groups, procolophonids being an obvious example.
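The three indices can be written out compactly, as in the sketch below, which uses their commonly cited definitions: SCI as the proportion of stratigraphically consistent nodes, RCI as (1 − MIG/SRL) × 100, and GER as 1 − (MIG − Gmin)/(Gmax − Gmin). Here MIG is the summed duration of ghost lineages and range extensions, SRL the summed observed ranges, and Gmin and Gmax the minimum and maximum possible values of MIG. Only the node counts (31 of 71) come from the text; the durations passed to `rci` and `ger` are hypothetical, because the underlying values are not given here.

```python
def sci(consistent_nodes, internal_nodes):
    """Stratigraphic Consistency Index: share of stratigraphically consistent nodes."""
    return consistent_nodes / internal_nodes


def rci(mig, srl):
    """Relative Completeness Index, in percent; negative when MIG exceeds SRL."""
    return (1 - mig / srl) * 100


def ger(mig, g_min, g_max):
    """Gap Excess Ratio: 1 means the minimum possible implied gap, 0 the maximum."""
    return 1 - (mig - g_min) / (g_max - g_min)


# Node counts reported in the text.
print(round(sci(31, 71), 5))

# Hypothetical durations (Myr), chosen only to show the behaviour of each index.
print(rci(mig=150.0, srl=100.0))               # implied gaps longer than observed ranges
print(ger(mig=30.0, g_min=10.0, g_max=110.0))  # implied gaps close to the minimum
```

A negative RCI therefore does not contradict a fairly high GER: the first compares implied gaps with observed ranges, whereas the second asks how close those gaps are to the smallest total that the observed first appearances allow.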

Conclusions

We compared and contrasted various palaeodiversity metrics for the fossil record of parareptiles and highlighted for the first time the overall impact of the end-Permian extinction event on the clade as a whole. Our major conclusions are as follows:

  1. Phylogenetically corrected estimates of diversity greatly augment our knowledge of the faunal turnover and dynamics of originations and extinctions of clades through time. Although corrected estimates cannot themselves be used as a proxy for total diversity at any one time in the history of a group, they do highlight areas where future effort can improve the quality of the record. They can also be used in conjunction with several nonphylogenetic proxies of record completeness to provide alternative measures of sampling effort and quality. The introduction of ghost lineages and range extensions in calculations of parareptile diversity through time produces diversity counts that greatly exceed the upper bounds of confidence intervals around observed diversity values, both for the entire Palaeozoic history of parareptiles and for part of their Triassic history. In addition, the apparent rise in observed diversity values throughout most of the Palaeozoic contrasts with alternations of increasing and decreasing diversity when phylogeny correction is used.
  2. Various metrics of origination and extinction rates do not give a unified picture of parareptile diversification during their recorded history. Disjoint curves of origination and extinction appear to be the norm when time scaling is excluded from calculations, whereas originations and extinctions track one another more closely when time correction is used. Origination and extinction magnitudes are invariably close to one another in time-calibrated curves, regardless of the metric used, and these magnitudes are invariably higher after the Permo-Triassic extinction event than they are immediately before the end-Permian. In addition, origination magnitude slightly exceeds extinction magnitude in the Induan, whereas the opposite pattern is observed in the Changhsingian.
  3. Even with different combinations of time calibration and phylogenetic correction, rate values are distributed according to a unimodal pattern. Despite the occurrence of local maxima in the profile of the probability density function derived from the vast majority of rate metrics, these maxima are not significantly separate. The main conclusion from this finding is that at no point during their history did parareptiles experience unusually high or low levels of origination or extinction. As a corollary, we conclude that the group did not suffer from significantly higher extinction levels during the end-Permian event than it did in other periods of its history.
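The idea behind the unimodality result in the third conclusion can be sketched as follows, assuming a Gaussian kernel density estimate and a Silverman-style critical bandwidth, that is, the smallest bandwidth at which the estimated density has no more than one mode. This is an illustration only and not the R code used for the analyses. It omits the bootstrap step that yields a significance level, and the two data sets are invented.

```python
import math


def kde(xs, grid, h):
    """Gaussian kernel density estimate of xs, evaluated at each grid point."""
    norm = 1.0 / (len(xs) * h * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in xs) for g in grid
    ]


def count_modes(xs, h, n_grid=400):
    """Count local maxima of the density estimate on an evenly spaced grid."""
    lo, hi = min(xs) - 3 * h, max(xs) + 3 * h
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    d = kde(xs, grid, h)
    return sum(1 for i in range(1, n_grid - 1) if d[i - 1] < d[i] >= d[i + 1])


def critical_bandwidth(xs, k=1, iters=40):
    """Bisect for the smallest bandwidth giving at most k modes.

    Relies on the mode count not increasing as the bandwidth grows, and
    assumes that a bandwidth equal to the data range is wide enough.
    """
    lo, hi = 1e-3, max(xs) - min(xs)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if count_modes(xs, mid) <= k:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical rate values: one cluster versus two well-separated clusters.
one_cluster = [0.30, 0.42, 0.47, 0.50, 0.52, 0.55, 0.61, 0.70]
two_clusters = [0.10, 0.12, 0.15, 0.18, 0.20, 0.80, 0.82, 0.85, 0.88, 0.90]

h_one = critical_bandwidth(one_cluster)
h_two = critical_bandwidth(two_clusters)
print(h_one, h_two)
```

A large critical bandwidth means that heavy smoothing is needed to remove the extra mode, which is evidence for more than one mode. Local maxima that vanish under light smoothing, as in the rate distributions discussed above, are not treated as significantly separate.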

Acknowledgments

We are greatly indebted to Professors Pierre Legendre (Université de Montréal, Département des Sciences Biologiques, Canada) and Steve Wang (Department of Mathematics and Statistics, Swarthmore College, USA) for much generous and prompt advice on numerous statistical issues. Professor Wang also made his R codes (probability density estimates and critical bandwidth test) available to us. MR had countless discussions on mass extinctions with Professor Mike Benton (Department of Earth Sciences, University of Bristol) and Dr Andrew Milner (Department of Palaeontology, The Natural History Museum, London). An early draft of this work was greatly improved by constructive comments from Drs Kenneth Angielczyk (Department of Geology, Field Museum, Chicago), Jennifer Botha-Brink (Department of Karoo Palaeontology, National Museum, Bloemfontein, South Africa), Matthew Wills (Department of Biology and Biochemistry, University of Bath) and Professor Steve Wang (also acknowledged above). Our research was funded by the Natural Environment Research Council (Advanced Research Fellowship NE/F014872/1 to MR); Conselho Nacional de Desenvolvimento Científico e Tecnológico (Grant CNPq 155085/2006-9 to JCC); and Deutsche Forschungsgemeinschaft (Grant Mu 1760/2-3 to JM, LAT, and TL).

Editor. Kenneth Angielczyk
