VARIABLE PRESERVATION POTENTIAL AND RICHNESS IN THE FOSSIL RECORD OF VERTEBRATES

Variation in preservation and sampling probability clouds our estimates of past biodiversity. The most extreme examples are Lagerstätten faunas and floras. Although such deposits provide a wealth of information and represent true richness better than other deposits, they can create misleading diversity peaks because of their species richness. Here, we investigate how Lagerstätten formations add to time series of vertebrate richness in the UK, Germany and China. The first two nations are associated with well-studied fossil records and the last is a country where palaeontology has a much shorter history; all three nations include noted Lagerstätten in their fossil records. Lagerstätten provide a larger proportion of China’s sampled richness than in Germany or the UK, despite comprising a smaller proportion of its fossiliferous deposits. The proportions of taxa that are unique to Lagerstätten vary through time and between countries. Further, in all regions, we find little overlap between the taxa occurring in Lagerstätten and in ‘ordinary’ formations within the same time bin, indicating that Lagerstätten preserve unusual faunas. As expected, fragile taxa make up a greater proportion of richness in Lagerstätten than in the remainder of the fossil record. Surprisingly, we find that Lagerstätten account for a minority of peaks in the palaeodiversity curves of all vertebrates (18% in the UK; 36% in Germany and China), and Lagerstätten count is generally not a good overall predictor of the palaeodiversity signal. Vastly different sampling probabilities through taxa, locations and time require serious consideration when analysing palaeodiversity curves.

The fossil record is a key source of information on changes in biodiversity through deep time. Correlations between diversity and environmental time series can be tested, and from these, causal relationships hypothesized at different scales (Benton et al. 2004; Mayhew et al. 2008; Hannisdal & Peters 2011). There are many problems, however, with taking global taxon counts across long spans of time at face value. Raup (1972) argued that the empirical global palaeodiversity signal was driven primarily by differing levels of sampling within each time bin.
Although palaeontologists agree that the fossil record is incomplete, there has been debate about the extent of the problem and how to deal with it (Smith 2001; Benton et al. 2011; Smith & McGowan 2011; Mannion et al. 2013; Walker et al. 2017). One approach is to use sampling proxies such as the number of collections, localities or formations to model geological control of the fossil record (Smith 2007; Smith & McGowan 2007; Lloyd & Friedman 2013). Problems have been noted with the use of strict proxies as they are redundant with the palaeodiversity signal (Benton et al. 2011), except in the case of broader sample counts; for example, a count of ammonite diversity versus all marine localities. Geological map area avoids many of the redundancy problems, but even this can be problematic as a sampling proxy (Dunhill 2011, 2012; Dunhill et al. 2014a, b). Further, Smith & McGowan's (2007) sampling proxy and residual diversity estimates modelling approach produces contrary results and is statistically flawed (Dunhill et al. 2014a, b, 2018; Brocklehurst 2015; Sakamoto et al. 2016; Close et al. 2018). More useful may be subsampling approaches (rarefaction: Tipper 1979; SQS: Alroy 2010) and a Poisson sampling model (TRiPS: Starrfelt & Liow 2016), which have been employed to control for various sampling biases, with varying degrees of success.
We are interested here in the impact of Lagerstätten on palaeodiversity. Lagerstätten were defined originally as deposits or sites that contain 'an unusual amount of palaeontological information' (Seilacher 1970). Seilacher et al. (1985) identified two types: conservation Lagerstätten, which preserve exceptional fossil detail or less recalcitrant ('soft') tissues; and concentration Lagerstätten, which can include an unusually large number and density of taxa or specimens. For the rest of this study we follow the narrower definition of Muscente et al. (2017) who focused on 'exceptionally preserved fossils', so we exclude concentration deposits such as bone beds, comprising mainly hard tissues, often in abraded condition. Lagerstätten are traditionally thought of as representing unusual, localized depositional environments, but there is strong evidence for clustering in space and time, perhaps reflecting global-scale phenomena that enable several similar Lagerstätten to occur in an area or within a time bin (Muscente et al. 2017).
It is commonly assumed that Lagerstätten generate global diversity peaks, for example the Burgess Shale peak in the global Cambrian palaeodiversity plot (Sepkoski 1996), but is this always the case? At a local scale, even a small change in preservation probability (that is, one bed containing highly abundant and more complete fossils) can change the shape of the measured diversity curve in a section (Walker et al. 2017). At a larger scale, and with extremely good preservation, one might expect the effects of this heterogeneity on diversity curves to be even larger. Indeed, Benson et al. (2010) and Benson & Butler (2011) found that Lagerstätten provided more than half of the marine tetrapod taxa of the Triassic-Jurassic.
Although Lagerstätten are exceptional windows into particular facies and ecosystems, and provide a wealth of information for functional inferences and the timeline of key evolutionary innovations (Selden & Nudds 2012), they can, ironically, be a hindrance for diversity studies that require equal sampling throughout. The impact of Lagerstätten varies between taxonomic groups, and is strong for fragile, hollow-boned taxa such as pterosaurs (Dean et al. 2016), birds (Brocklehurst et al. 2012), and bats (Brown et al. 2019). Dean et al. (2016) found a causal relationship between Lagerstätten occurrences and diversity of pterosaurs, and a correlation between skeletal completeness and diversity that broke down when data from Lagerstätten were removed.
Lagerstätten are not the only cause of artificial peaks in raw diversity counts; monographic effects (detailed descriptions of a taxonomic group, occasionally limited to a single time interval or locality) can cause these too (Raup 1972; Fara 2004; Lloyd & Friedman 2013). Without referring to the original data, these may be mistaken for the Lagerstätten effect, and since a prerequisite for writing a monograph is usually an abundance of describable fossils, monographic and Lagerstätten effects may be linked. Therefore, monographic effects must be considered when investigating the diversity of a single taxonomic group, but they are probably less important in studies of numerous taxa over large geographical areas (Raup 1976).
Our aim is to address three questions: (1) Do Lagerstätten significantly alter our perceptions of the vertebrate fossil record? (2) How does this vary between countries with widely different sampling histories? (3) Do Lagerstätten contribute significantly to our understanding of new ecosystems, or do they mostly record previously identified ecosystems in greater detail? To address these questions, we explore how the palaeodiversity signal from three countries (UK, Germany, China) depends on ordinary and Lagerstätten formations, discriminating between marine and terrestrial deposits, and between fragile and robust fossil specimens. We also explore the data using model-fitting approaches to identify potential key drivers of the palaeodiversity signals.
We conduct this study at the scale of countries, as these allow finer subdivision of stratigraphy than global-scale data. Each country has a different combination of facies that may be more or less likely to preserve exceptional fossils. In addition, countries have national geological surveys (and in many cases, researchers) who have erected the stratigraphic frameworks by which rocks are dated, making the correlation of units easier within than between countries. Further, the palaeontological research history may be defined by national boundaries. The regional approach that we take is in line with many recently published studies of diversity and sampling bias (Crampton et al. 2003; Dunhill 2012; Dunhill et al. 2012, 2013, 2014a; Benson et al. 2016; Close et al. 2017; Walker et al. 2017).

Defining Lagerstätten, biotas and formations
It is an oversimplification to say that the fossil record is made up of Lagerstätten and ordinary deposits. Preservation probability is a continuum (Seilacher et al. 1985), depending on the differing taphonomic processes behind each fossiliferous deposit (Allison 1988; Briggs 1995, 2003; Sansom 2014). However, even though Lagerstätten are end members of high preservation probability, palaeontologists generally divide fossiliferous deposits into either Lagerstätten or not (Butler et al. 2009, 2013; Benson & Butler 2011; Dean et al. 2016); we follow this assumption but consider various ways of defining the cut-off between the two categories.
Geological formations are units that can be tracked laterally over mappable areas. For this study, geological survey definitions of formations are used for Germany and the UK, and 'palaeontological community' definitions (formations that have been entered into the Paleobiology Database; PBDB; https://paleobiodb.org/) are used for the Chinese data. There may be differences between these formation definitions in each country; we explore this issue on the basis of our empirical data and discuss below possible reasons for the differences in duration and definition.
Lagerstätten are usually sites or units below the scale of formation, although there are some exceptions (e.g. the units containing the Jehol Biota). We use the term 'Lagerstätte' rather than 'biota' because this is the common term in two of the countries we sampled (UK and Germany); the term biota, commonly used for Chinese examples, may correspond to a geographically and stratigraphically localized Lagerstätte, or may include numerous separate Lagerstätten (e.g. Yanliao and Jehol biotas). We standardize the scale at which Lagerstätten are designated to formation level. In fact, previously in the diversity count literature, whole formations have been described as Lagerstätten (Benson & Butler 2011; Dean et al. 2016), although generally a Lagerstätte is a single horizon or set of horizons within a formation. Formations clearly have different scales (Benton et al. 2011), but they are more comparable in scale than one group is to another or one bed is to another, for example. The time span represented by individual formations is also assessed here.
Formations that have been described as, or as containing, one or more Lagerstätten are referred to here as 'Lagerstätte formations', and all other formations are termed 'ordinary formations'. Of course, there is much value in studying the variety of sediments labelled 'ordinary formations' here; the label is purely a designation based on a lack of exceptional preservation, rather than on poor preservation.

Data collection
Occurrence data for China, the UK, and Germany were downloaded from the PBDB (26 February 2016), searching for each country in turn and the taxon 'Vertebrata'. Over recent years there has been a concerted effort to increase the coverage of a number of vertebrate groups in the PBDB (Mannion et al. 2013). We cleaned the data before analysis to remove synonyms, uncertain taxa and trace fossils. Most Lagerstätten were accurately documented because of abundant publications and a focus by data enterers, but some 'ordinary' formations presented less accurate data.
These three countries were chosen because they represent different aspects of sampling. Fossil taxa from the UK and Germany have been named and systematically recorded for hundreds of years, whereas samples from China have only recently been systematically named. Famous Lagerstätten have been found in all three countries, and especially from China, where the Jehol Biota has greatly increased our knowledge of the Cretaceous Terrestrial Revolution (Benton et al. 2008).
Stratigraphic information was extracted from the PBDB to produce lists of fossiliferous geological formations for each country. Synonymous geological formations from these data were found and renamed, and the duration of deposition was established. Formations were coded as 'marine', 'terrestrial' or 'mixed/marginal'. For the UK, the majority of the formation names, durations and depositional environments came from the BGS Lexicon of Named Rock Units (British Geological Survey 2016), with a literature search performed for the remaining units (see Walker et al. 2019). For Germany, these same data were derived from Litholex (Deutsche Stratigraphische Kommission) and the Stratigraphische Tabelle von Deutschland (Menning 2002). However, such geological survey data on Chinese formations were not available, so ages from the PBDB were used. The PBDB is not the best source of stratigraphic data (Benton et al. 2013), even though it has been used extensively elsewhere to collect formation counts; dates relate to individual collections, so they do not always represent the full time span of a formation, and single formations may be ascribed different chronostratigraphic ages depending on the original publication opinions.
Each formation was coded as a Lagerstätte formation or an ordinary formation based on a census of the literature. Some, such as Solnhofen or Jehol, were identified as Lagerstätten in 100% of the literature, whereas others were sometimes called a Lagerstätte, sometimes not. We tried both conservative and permissive approaches and were permissive in making a maximum count of Lagerstätten. According to some, the UK has no Lagerstätten at all, whereas others, followed here, would allow that, for example, the Early Jurassic Charmouth Mudstone Formation of Lyme Regis (Benson et al. 2010) and the Beacon Limestone Formation of Strawberry Bank (Williams et al. 2015) show evidence of exceptional preservation. Adopting this 'permissive' approach means we do not underestimate the occurrences of Lagerstätten, and in any case all, even the doubtful Lagerstätten, show higher-than-average preservation quality.
Fossil occurrences were allocated to time bins. Time bins with a mean duration of 11 myr (standard deviation = 3.8 myr) were used, the same as used by Alroy et al. (2001) to achieve durations that are as similar as possible, and with a sufficient sample size per bin. Where a formation in reality spans several time bins, the formation occurrence was counted in all possible bins. In other cases, multiple bins are indicated because of uncertainty over dating in the PBDB, and here we assigned each collection an age between the upper and lower age limits randomly, using a bootstrapping approach over a uniform probability distribution. This process was repeated 1000 times to obtain mean occurrence counts with confidence intervals reflecting the gross uncertainty of dating the collections. Bootstrapped counts of collections, occurrences, and genera were produced in this way, for Lagerstätte and ordinary formations, and from marine and terrestrial lithofacies in each country. All calculations were carried out in R (R Core Team 2016).
Analysis was carried out at genus level to ensure enough data could be included and to manage some forms of uncertainty. Genera have been used as a proxy for species in previous studies (Sepkoski 1996; Dunhill et al. 2014a); however, genera are not defined biologically, may not be monophyletic, and include widely varying numbers of species (Hendricks et al. 2014). This must be weighed against the volume of data required for a study such as this one. PBDB data enterers have made an effort to include data at species level where possible, but this is not uniform, and especially so for incomplete specimens, which may be recorded simply to genus level. Further, for some taxa, the distinctions between species are disputed, and in any case, most of the vertebrate taxa we consider are monospecific, meaning that generic counts per bin are not so different from species counts.
Taxonomic richness for each country-level study was considered both with and without Lagerstätten. The UK and Germany should have similar patterns of outcrop, being geographically close. China, on the other hand, has been less thoroughly sampled for fossils (being larger in geographical area and having been sampled over a shorter period of history), and should lie at a lower point on a regional-scale 'collector curve'. Here we use collector curves as a measure of research history, and in particular the accumulation of novel taxa, as done in previous palaeontological studies (Tarver et al. 2007; Benton 2008, 2015).
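The random age assignment described above can be illustrated with a minimal sketch. It is written in Python rather than R, and the function name, bin names, bin boundaries and collection age ranges are all hypothetical values chosen for illustration; it returns only the mean count per bin, not the confidence intervals.

```python
import random
from collections import defaultdict

def bootstrap_bin_counts(age_ranges, bins, n_reps=1000, seed=0):
    """Mean number of collections per time bin under dating uncertainty.

    age_ranges: list of (max_age, min_age) in Myr for each collection.
    bins: list of (name, older_bound, younger_bound) in Myr.
    Both inputs are hypothetical structures, not the PBDB format.
    """
    rng = random.Random(seed)
    totals = defaultdict(int)
    for _ in range(n_reps):
        for max_age, min_age in age_ranges:
            # One uniform draw between the age limits of the collection
            age = rng.uniform(min_age, max_age)
            for name, older, younger in bins:
                if younger <= age <= older:
                    totals[name] += 1
                    break  # a boundary age goes to the first matching bin
    return {name: totals[name] / n_reps for name, _, _ in bins}

# Toy example: one well-dated collection and one that straddles two bins
toy_bins = [("older bin", 170.0, 160.0), ("younger bin", 160.0, 150.0)]
toy_ranges = [(168.0, 162.0), (165.0, 155.0)]
print(bootstrap_bin_counts(toy_ranges, toy_bins))
```

The well-dated collection always falls in the older bin, whereas the poorly dated one is split between the two bins in proportion to how much of its age range each bin covers.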

Fragile and robust taxa: a new metric
If Lagerstätten form an important part of the total palaeodiversity record, they are assumed to achieve this by providing better sampling of fragile taxa than in 'ordinary' fossil beds. It has been claimed that the fossil records of some delicate and small taxa, namely birds [...] (Benton 1987), and that their fossil records are dominated by occurrences in Lagerstätten. To test this, occurrences of four of these 'fragile' vertebrate taxa (lissamphibians, lepidosaurs, pterosaurs, birds) in Lagerstätte formations and ordinary formations were counted, and the proportions of 'robust' and 'fragile' taxa in each comparable time bin were calculated. Note that these designations as 'robust' and 'fragile' are arbitrary and include aspects of body size and bone fragility; lissamphibians, squamates, pterosaurs, and birds are all generally smaller than turtles, crocodilians, dinosaurs, and mammals, may have hollow bones, and lack armour plates and horns. There are many exceptions of course, including tiny mammals and large squamates, such as mosasaurs.
To document the proportions of fragile to robust taxa in Lagerstätte and ordinary formations, we developed a new ratio. We could not simply record actual numbers in each category because the values are affected by phylogeny; clades of both types may originate, diversify or become extinct within any time bin. Therefore, the ratio for each time bin was normalized by the equivalent ratio for ordinary formations in the same bin. The ratio is given in Equation 1, where N is number of taxa (species or genera).
This equation is analogous to an isotope fractionation equation; the Lagerstätte formation ratio takes the place of the 'sample' and the ordinary formation ratio takes the place of the 'standard'. This ratio is calculated for each time bin and country with enough data; i.e. in time bins where fragile and robust taxa occur in both Lagerstätten and ordinary formations. The fragile/robust fossil ratio indicates the extent to which the hard-to-preserve groups, such as Lissamphibia, Lepidosauria, Pterosauria and Aves, are represented in Lagerstätte formations compared to ordinary formations. It is an indicator of how much of the richness of these groups has been lost in time bins containing no Lagerstätte formations. When r_f = 0, the ratio of fragile to robust taxa in Lagerstätte formations is the same as in ordinary formations; when r_f > 0, the ratio of fragile to robust taxa is greater in Lagerstätte formations than in ordinary formations; and when r_f < 0, the ratio of fragile to robust taxa in Lagerstätte formations is smaller than in ordinary formations. Since fragile groups such as birds and pterosaurs are more commonly preserved in Lagerstätten, the expectation is that r_f >> 0 for all time bins. For each country, a mean ratio is also calculated.
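One form of the ratio that is consistent with the properties just listed, written by analogy with the delta notation used for isotope fractionation, is sketched below. The subscripts Lag and ord (Lagerstätte and ordinary formations) are labels introduced here, and the absence of any scaling factor is an assumption, so the published Equation 1 may differ in detail.

```latex
r_f = \frac{\left( N_{\mathrm{fragile}} / N_{\mathrm{robust}} \right)_{\mathrm{Lag}}}
           {\left( N_{\mathrm{fragile}} / N_{\mathrm{robust}} \right)_{\mathrm{ord}}} - 1
```

With this form, equal ratios in the two formation types give r_f = 0, a higher fragile-to-robust ratio in Lagerstätte formations gives r_f > 0, and a lower one gives r_f < 0.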

Regression modelling
It has previously been suggested that the occurrence of Lagerstätten can be a driver of diversity counts (Butler et al. 2009; Benson & Butler 2011; Lloyd & Friedman 2013; Dean et al. 2016). This hypothesis is examined using linear regression models for each country. Previously, this effect has been corrected by coding the presence or absence of Lagerstätten in each time bin (Benson & Butler 2011; Dean et al. 2016). Every time bin containing Lagerstätten is given equal weight, regardless of the number of Lagerstätten contained within. To improve this, we use a count of the number of Lagerstätten in each time bin, rather than binary coding. For comparison we have, however, duplicated the modelling process using a binary presence/absence time series; these results are reported in Walker et al. (2019).
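The difference between the two codings can be shown with a short sketch. It is in Python rather than R, and the function name and time bin labels are hypothetical.

```python
from collections import Counter

def lagerstatten_series(bin_names, lagerstatte_bins):
    """Build count and binary predictor series over ordered time bins.

    bin_names: ordered list of time bin labels.
    lagerstatte_bins: one bin label per Lagerstaette formation.
    """
    counts = Counter(lagerstatte_bins)
    count_series = [counts.get(name, 0) for name in bin_names]
    # Binary coding discards how many Lagerstaetten a bin contains
    binary_series = [1 if c > 0 else 0 for c in count_series]
    return count_series, binary_series

# Toy example with hypothetical bins: two Lagerstaetten in J2, one in J3
count_series, binary_series = lagerstatten_series(
    ["J1", "J2", "J3"], ["J2", "J2", "J3"]
)
print(count_series, binary_series)
```

Under binary coding the bins J2 and J3 are indistinguishable, whereas the count series retains the fact that J2 holds twice as many Lagerstätte formations.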
Model fit was assessed in two ways: R² values (best fit), which do not consider model simplicity, and AICc weights (Akaike information criterion corrected for small sample sizes; best fit, model simplicity and sample size), which do. Tests for multicollinearity were used to determine whether different predictor variables (e.g. formation count, rock volume) are redundant with each other; for example, time bins with more fossil-bearing formations may also include more Lagerstätten if preserved rock volume from this time bin is high. Multicollinearity can be assessed by computing the variance inflation factor (VIF), which measures how much the variance of a regression coefficient is inflated due to multicollinearity in the model. The smallest possible value of VIF is 1 (absence of multicollinearity), and a VIF value that exceeds 5 or 10 indicates a problematic amount of collinearity (James et al. 2013).
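As an illustration of the VIF calculation, the sketch below regresses each predictor on the others and applies VIF = 1 / (1 - R²). It uses Python with the standard library only; the helper names and the toy time series are hypothetical, and this is not the code used in the study.

```python
def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented normal equations: (A^T A | A^T y)
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(k):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def variance_inflation_factors(columns):
    """columns: dict mapping predictor name -> values per time bin."""
    vifs = {}
    for name, y in columns.items():
        others = [v for other, v in columns.items() if other != name]
        X = [list(row) for row in zip(*others)]
        mean_y = sum(y) / len(y)
        ss_tot = sum((v - mean_y) ** 2 for v in y)
        r2 = 1.0 - ols_rss(X, y) / ss_tot
        vifs[name] = 1.0 / (1.0 - r2)  # fails if a predictor is exactly redundant
    return vifs

# Toy series (hypothetical): collections closely track formations
toy = {
    "formations": [3, 5, 2, 8, 6, 4, 7, 1],
    "collections": [7, 9, 5, 17, 11, 9, 15, 2],
    "lagerstatten": [0, 1, 0, 0, 2, 0, 1, 0],
}
print(variance_inflation_factors(toy))
```

In the toy data the formation and collection counts are nearly proportional, so both receive VIF values above the threshold of 5 quoted above.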
We used regression models to test how well various sampling proxies predict genus richness. Regression models here were not used to produce estimates of 'true' diversity; they are used to obtain a greater understanding of the interplay between diversity and sampling. Regression models were calculated for each country. The German dataset contained both marine and terrestrial Lagerstätten, so the data could be considered for each; this was not possible for the data from the UK and China. The possible controlling variables included in the models were: time bin length, number of formations, number of collections, number of Lagerstätte formations in each bin, and the r_f ratio; the last as a measure of how much the absence of Lagerstätten in a time bin might explain the absence of fragile taxa.
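A comparison of candidate models by AICc weight can be sketched as follows. The sketch assumes that every non-empty subset of predictors is fitted by ordinary least squares with Gaussian errors, so that AIC = n ln(RSS/n) + 2k with k counting the slopes, the intercept and the residual variance; the source does not state how the model set was built. It uses Python with the standard library only, and the function names and toy data are hypothetical.

```python
import math
from itertools import combinations

def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(k):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def aicc_weights(predictors, richness):
    """Akaike weights (from AICc) for every non-empty predictor subset."""
    n = len(richness)
    names = list(predictors)
    scores = {}
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            X = [list(row) for row in zip(*(predictors[v] for v in subset))]
            rss = ols_rss(X, richness)
            k = size + 2  # slopes + intercept + residual variance
            aic = n * math.log(rss / n) + 2 * k
            scores[subset] = aic + 2 * k * (k + 1) / (n - k - 1)
    best = min(scores.values())
    rel = {m: math.exp(-(s - best) / 2) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# Toy data (hypothetical): eight time bins, three candidate predictors
toy_predictors = {
    "formations": [3, 5, 2, 8, 6, 4, 7, 1],
    "collections": [7, 9, 5, 17, 11, 9, 15, 2],
    "lagerstatten": [0, 1, 0, 0, 2, 0, 1, 0],
}
toy_richness = [12, 15, 9, 30, 21, 14, 27, 4]
for model, w in sorted(aicc_weights(toy_predictors, toy_richness).items(),
                       key=lambda kv: -kv[1]):
    print(model, round(w, 3))
```

The weights sum to 1 across the model set, so each can be read as the relative support for that model among those compared. With six candidate predictors there would be 2^6 - 1 = 63 non-empty subsets.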

True richness using a Poisson sampling model (TRiPS)
To assess the effects of Lagerstätten on palaeodiversity time series, we sought a method that would provide sampling-free estimates of total, ordinary, and Lagerstätten-based counts per time bin. The TRiPS method (True richness estimated using a Poisson sampling model) uses occurrence data and the duration of a time bin to estimate original richness (Starrfelt & Liow 2016). These authors claimed that the TRiPS method provides not simply a measure of relative incompleteness of sampling per bin, but generates an absolute, corrected palaeodiversity time series.
To apply the method, sampling is modelled as a time-homogeneous Poisson process. Based on taxon abundances, a single sampling rate for a particular time interval can be estimated using maximum likelihood. In theory, this sampling rate encompasses all aspects of sampling, from decay through to preservation, diagenesis and sampling by palaeontologists. The estimate in each time bin is independent of all other time bins. Starrfelt & Liow (2016) claimed that the TRiPS method produces a 'real' diversity estimate, unlike subsampling and sampling proxy methods, which only aim to produce relative diversity estimates and therefore need to be viewed in the context of diversity estimates in other time bins. Estimates of richness were obtained by: (1) using all data; and (2) including data only derived from Lagerstätten. This is an empirical examination of the Lagerstätten effect on TRiPS.
We calculated TRiPS for each of the 1000 bootstrapped runs to accommodate uncertainty in the dating of the collections in the dataset. The mean TRiPS score was taken, and the TRiPS uncertainty was interpreted as the mean of the confidence intervals from all runs.
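The logic of a Poisson sampling estimate can be illustrated with a simplified sketch. It assumes that the number of occurrences per observed taxon follows a zero-truncated Poisson distribution with mean equal to the sampling rate multiplied by the bin duration, and it uses the observed richness divided by the sampling probability as the point estimate; both simplifications are assumptions made here, and the sketch is not the published TRiPS implementation. It is written in Python, and the function name and toy counts are hypothetical.

```python
import math

def poisson_richness_estimate(occurrence_counts, duration_myr):
    """Return (sampling rate per Myr, sampling probability, richness estimate).

    occurrence_counts: occurrences of each observed taxon in one time bin
    (all values >= 1). duration_myr: length of the time bin in Myr.
    """
    n_obs = len(occurrence_counts)
    mean_count = sum(occurrence_counts) / n_obs
    if mean_count <= 1.0:
        raise ValueError("all taxa are singletons; the rate cannot be estimated")
    # The maximum-likelihood estimate of mu = rate * duration for a
    # zero-truncated Poisson solves mu / (1 - exp(-mu)) = mean_count.
    lo, hi = 1e-9, mean_count
    for _ in range(200):  # bisection; the left-hand side increases with mu
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < mean_count:
            lo = mid
        else:
            hi = mid
    mu = (lo + hi) / 2
    p_sampled = -math.expm1(-mu)  # probability a taxon is sampled at least once
    return mu / duration_myr, p_sampled, n_obs / p_sampled

# Toy bin (hypothetical): seven genera with these occurrence counts, 11 Myr long
rate, p, richness = poisson_richness_estimate([1, 1, 2, 3, 5, 1, 2], 11.0)
print(rate, p, richness)
```

Because the sampling probability is below 1, the estimate always exceeds the observed genus count, and bins dominated by singletons yield low sampling probabilities and correspondingly large corrections.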

Formation durations and environments
Average formation duration is remarkably similar in the UK (9.81 myr) and Germany (9.85 myr); in China it is nearly twice as long (17.08 myr). This difference does not arise from differing methods used to compile the formation duration information (geological survey data for the UK and Germany and PBDB data for China) because formation durations for the UK and Germany based on PBDB data (8.21 myr and 7.82 myr respectively) are similar to their respective geological survey durations.
Depositional environments within the study time interval (early Silurian to mid-Cenozoic) differ in each country. In the UK and Germany, they are mixed (UK formations: 58% marine, 31% terrestrial; German formations: 65% marine, 23% terrestrial), whereas the majority of Chinese formations that yield vertebrates are terrestrial (86% of the formations in this dataset). It could be argued that the high proportion of terrestrial Lagerstätten in China reflects a strong focus on dinosaurs; this cannot be tested, but there has been strong collecting pressure recently to add marine vertebrates, for example from the key Triassic marine biotas (Benton et al. 2013). The differing proportions probably partially or largely reflect the relative proportions of outcrop of Mesozoic rocks in each of the three countries.

UK time series
In the UK, 8% of formations are Lagerstätte formations, but these provide 14% of collections and contain 19% of genera recorded (Table 1). Because 8% of genera occur in both Lagerstätte formations and ordinary formations, 11% of genera are found only in Lagerstätten.
Most formations are marginal, mixed or marine (Walker et al. 2019). Terrestrial genus counts are highest in the Middle Triassic, Upper Jurassic and Lower Cretaceous, with the terrestrial formations in these intervals being particularly rich in taxa and yielding large proportions of the collections derived from the time bins they occupy. None of the purely terrestrial formations counts as a Lagerstätte formation (Fig. 1).
Marine Lagerstätte formations occur in the Upper Triassic, throughout the Jurassic, the middle Cretaceous, and the Palaeogene (Fig. 1). Marine Lagerstätte formations yield a large number of collections in the Upper Jurassic. There is little overlap in the genera occurring in marine Lagerstätte formations and ordinary formations, except in the Palaeogene. Lagerstätte formations account for most of the genus richness in the marine Jurassic.
The Middle to Upper Jurassic has the most fossiliferous formations (Fig. 2B). Similarly, counts of genera and collections also peak in the Upper Jurassic, but other highs in genus and collection counts (Lower Cretaceous, Palaeogene and Neogene) do not correspond to large peaks in the number of formations (Fig. 2A). There is little error introduced into the collection and genus curves as a result of uncertainty in age estimation (Walker et al. 2019). Lagerstätte formations contribute to the diversity peaks of the Upper Jurassic and Palaeogene, and account for all of the diversity in the Upper Cretaceous. There is little sharing of genera between Lagerstätte formations and ordinary formations.

German time series
In Germany, 13% of formations are Lagerstätte formations, but these provide 42% of collections and contain 30% of genera recorded (Table 1). Because 8% of genera occur in both Lagerstätte formations and ordinary formations, 22% of genera are found only in Lagerstätten.
The rock record could be split into terrestrial and marine components (Fig. 1). Terrestrial sediments yielded the greatest number of collections and genera by far in the Middle-Upper Triassic, Lower Cretaceous and Neogene.
Marine Lagerstätte formations occur in the Lower Devonian, Lower Triassic, Middle-Upper Jurassic and Lower Cretaceous (Fig. 1). Marine Lagerstätte formations add greatly to the collection and genus counts in the Devonian, Triassic and Jurassic-Lower Cretaceous. There is little overlap between genera found in marine Lagerstätte formations and ordinary formations. Lagerstätte formations add most of the genus richness in the Devonian, Triassic and Jurassic time bins in the marine record of Germany.
Terrestrial Lagerstätte formations occur in the Triassic and Palaeogene (Fig. 1). Deposits of these ages have yielded a large proportion of the collections and of genus richness within the respective time bins. Again, there is little overlap between genera occurring in Lagerstätte formations and ordinary formations, with some genera occurring in both types in only the Upper Triassic. Age uncertainty introduces little error into the terrestrial specimen and genus counts in Germany, except for genus counts in the Palaeogene. Lagerstätte formations add about half of the Triassic diversity in the terrestrial record, and account for most of the terrestrial Palaeogene richness.
The German rock record has the largest number of fossiliferous formations in the Triassic, with smaller peaks in the Upper Jurassic and Palaeogene-Neogene (Fig. 3). The number of collections through time largely follows this pattern. However, there are many collections in the Lower Jurassic, all of which come from Lagerstätte formations. There is also a dearth of Devonian to Carboniferous material in the data, with most of the diversity in this period deriving from two collections of material from a single formation (the Lower Devonian Hunsrück Slate).
Generally, genus richness does not follow the same pattern as formations and collections; genus richness is relatively low in the Palaeozoic and Mesozoic, with a large peak in the Neogene. There is little uncertainty introduced into the curves as a result of age estimate uncertainty, except in the genus counts in the Palaeogene, Devonian and Upper Jurassic. Lagerstätte formations make up a small proportion of the fossiliferous formations. Lagerstätte formations occur in the Devonian, Triassic, Jurassic, Palaeogene and Neogene. These few Lagerstätte formations have yielded large numbers of collections in the Triassic to Lower Cretaceous, and a large proportion of genus diversity in this period and in the Palaeogene. However, the large Neogene peak consists of data from formations that have not been defined as Lagerstätten. There is little sharing of genera between Lagerstätte formations and ordinary formations.

Chinese time series
In China, only 7% of formations are Lagerstätte formations, but these provide 26% of collections and contain 32% of genera. Because only 3% of genera occur in both Lagerstätten and ordinary formations, 29% of genera from China are found only in Lagerstätten.
The fossil record is almost entirely made up of terrestrially deposited formations, with the exception of examples in the Silurian, Permian-Triassic and Lower Cretaceous (Fig. 1). This contrasts with the mostly marine record of the UK and the evenly split record of Germany. Because continental deposits dominate, the terrestrial patterns are similar to the patterns seen for all formations. There are relatively small numbers of terrestrial Lagerstätte formations in the Lower Triassic, Upper Jurassic and Lower Cretaceous, but these are entirely responsible for producing the peaks in genus richness in these periods. There are only three marine Lagerstätten in this dataset.
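The share of genera restricted to Lagerstätten follows in each country from subtracting the shared genera from all genera recorded in Lagerstätte formations. A small worked check of the percentages reported in the text, in Python and with a hypothetical function name, is:

```python
def lagerstatten_only_share(in_lagerstatten_pct, shared_pct):
    """Percentage of genera found only in Lagerstaette formations."""
    return in_lagerstatten_pct - shared_pct

# (genera in Lagerstaette formations, genera shared with ordinary formations),
# as percentages of all genera, taken from the text for each country
reported = {"UK": (19, 8), "Germany": (30, 8), "China": (32, 3)}

for country, (in_lag, shared) in reported.items():
    print(country, lagerstatten_only_share(in_lag, shared))
```

The Chinese record combines the largest share of genera in Lagerstätte formations with the smallest overlap, which is why its Lagerstätten-only share is the highest of the three.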
The Chinese geological record has a large number of fossiliferous formations in the Lower Cretaceous, and in the mid-Palaeogene (Fig. 4B). Peaks and troughs in the number of collections through time match peaks and troughs in genus diversity.

Fragile and robust taxa
The groups defined here as fragile taxa comprise a varying proportion of the diversity in each time bin and each country (see Walker et al. 2019). Lissamphibia, Lepidosauria and Pterosauria originated in the Triassic, and occur sporadically from the Triassic to Neogene in the UK. In Germany, fragile taxa occur in the Jurassic, Palaeogene and Neogene. In China, fragile taxa make up most of the diversity in the Lower Cretaceous, and this diversity is largely composed of birds.  for each country (in fact, the Coniacian in both countries, arising from the very small sample sizes of only one or two specimens). If we tried to calculate r_f values for individual clades, the standard deviations would probably become larger because of the patchiness of the data.

Model fits
The best three model fits for each country, and for the marine and terrestrial subsets of the German data, are listed in Table 3 and Walker et al. (2019). In the UK, the model using the number of collections and specimens is the best for predicting raw richness (Table 3; AICc weight = 0.346, of 63 models). The second-best model includes the number of collections and specimens, plus formations. In Germany, the best model overall includes the number of specimens, collections, formations and Lagerstätten (Table 3; AICc weight = 0.158, of 63 models). When the German data are split into marine and terrestrial, marine data are best predicted by the number of collections and specimens (Table 3; AICc weight = 0.294, of 32 models), with the second-best model using only the number of specimens. Terrestrial German data are best predicted by the model including the number of collections, specimens, formations and Lagerstätten (Table 3; AICc weight = 0.779, of 31 models), although within this model the collections time series shows a high level of multicollinearity (VIF = 5.1; Walker et al. 2019). Chinese measured richness is best predicted by the number of specimens and formations (Table 3; AICc weight = 0.271, of 63 models). In other Chinese models with less weight, specimens and collections, when used together in a model, display a high level of multicollinearity (VIF > 5), suggesting that they are redundant with each other (Walker et al. 2019), as might have been predicted (Benton 2015).
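For readers unfamiliar with AICc weights, the following minimal sketch shows how such weights are conventionally obtained from a set of candidate models. It assumes the standard definitions of AICc and Akaike weights; the text does not state how the values in Table 3 were computed, and the function names and example AICc values are hypothetical.

```python
import math


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC for a model with k estimated parameters
    fitted to n observations (here, n would be the number of time bins)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(aicc_values):
    """Convert AICc values for a set of candidate models into weights that
    sum to 1; the model with the lowest AICc receives the largest weight."""
    best = min(aicc_values)
    relative = [math.exp(-0.5 * (value - best)) for value in aicc_values]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical AICc values for three candidate models (not taken from Table 3).
weights = akaike_weights([100.0, 101.2, 103.5])
```

Because the weights are relative to the whole candidate set, a best-model weight such as 0.158 among 63 models indicates that support is spread across many models, whereas a weight such as 0.779 indicates one clearly preferred model.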

TRiPS estimates
Generic richnesses estimated from TRiPS (Fig. 5) broadly follow the same patterns and are rarely much higher than raw measured richnesses in all three countries. The error bars in Figure 5 show that there are significant departures from the raw richness calculations in some time bins (e.g. Palaeozoic to Permian of Germany), which also have the largest error bars.
TRiPS estimates for Lagerstätten are no more similar to raw richness measurements than the estimates from the whole datasets (the R² values in Fig. 6A-C are within the same range as the same values for Fig. 6D-F). The Lagerstätten calculations also have large error bars which, in some time bins, are greater than the range of the estimated time series (Fig. 5).
TABLE 3. The three best-fitting models that explain aspects of the fossil genus richness palaeodiversity time series for each country, with the German record shown in total and subdivided into marine and terrestrial categories.

Formation duration
It is remarkable that the average formation duration in China is nearly twice as long as average formation durations in the UK and Germany. This is unexpected for the methodological reason that the Chinese duration data are derived from individual collection dates from the PBDB only, not from entire formation duration data as for Germany and the UK. There are two possible explanations: either (1) the concept of a formation encompasses less time and fewer sediments in the UK and Germany than in China; or (2) the dating of collection records in the PBDB is less precise than the dating of formations by geological surveys in Europe. The second suggestion has been rejected (see Results) because mean formation durations are the same in the PBDB and in independent survey data from the UK and Germany. Therefore, the first explanation is probably correct: the difference is a historical aspect of the ways in which survey geologists worked in their respective countries. It reflects the huge size of China compared to the UK and Germany, and the fact that exposure in China is often better than in northern Europe, so correlations can be made directly over longer distances, making some formations geographically and temporally huge (Benton et al. 2011). Further, geological surveys in Europe began in the 1830s, whereas the China Geological Survey was refounded in 1999 (Jackson 2018), thus providing less time for revision and subdivision.
Which taxa are preserved in Lagerstätten?
We have noted the problem of establishing a sharp definition of Lagerstätten, but Lagerstätte type might also affect how taxa co-occur in ordinary fossil beds. A conservation Lagerstätte can preserve a range of soft tissues and soft-bodied organisms, and so is likely to expand raw diversity counts, for example by 50% for pterosaurs (Dean et al. 2016) and by a similar amount for marine reptiles (Benson & Butler 2011). On the other hand, a concentration Lagerstätte such as a bonebed can preserve a large amount of mineralized material, but much of it may represent taxa already known from ordinary fossil beds. Lagerstätten distribution is variable in time and space; for example, conservation Lagerstätten are particularly common in the Cambrian and Jurassic (Allison & Briggs 1993; Muscente et al. 2017). This variation arises from real changes in environment, making exceptional preservation more likely in some regions and time periods. As regions and taxa are sampled, the accumulation of fossiliferous formations follows a collector curve. In the early stages of investigation (e.g. in the nineteenth century in the UK), new discoveries were frequent. As collecting within a region matures, fewer new taxa are discovered, as common taxa have already been found (Benton 2008, 2015). Lagerstätten are likely to be discovered relatively early in the cycle as their fossils are abundant and spectacular. We cannot provide definitive evidence about the . Whether these count as rare late chance finds, or whether Lagerstätten can continue to be discovered even after centuries of search, has to be quantified. Thus, the influence of Lagerstätten on diversity counts may depend on the current point along the collector curve.
In our study, for all three countries, and as expected, 'fragile' taxa make up a greater proportion of taxa in Lagerstätte formations than in ordinary formations in the same time bins. The proportions in Table 2 suggest that 'fragile' taxa are on average nearly 14 times more likely to be preserved in Lagerstätten than in 'ordinary' formations. This high figure is skewed by the Chinese data, which have a ratio of 25.4, much higher than the German ratio of 1.4.
It might be expected that Lagerstätten would preserve broadly the same taxa as in less informative deposits of similar age and facies, but with the addition of some less easily preserved taxa. However, we find there is little overlap in the genera preserved in Lagerstätte formations and ordinary formations (Table 1), even in Germany, where the r_f ratio is low and there are therefore fewer fragile taxa unique to Lagerstätten. This might be expected if some time bins had few Lagerstätten, but this does not seem to be the case. When the time series data from each country are considered, the overlap between genera preserved in Lagerstätten and in ordinary formations remains small even in the time bins containing multiple Lagerstätten, and when Lagerstätten contain large proportions of diversity. There are three possible explanations for this:  (1977). Therefore, museum collections might feature the birds, mammals or trilobites at the expense of plants or brachiopods. This could be true of some Jehol Group localities, where collectors have specialized in recovering rare bird fossils. However, with time, palaeontologists make more controlled samples (Benton 2015), where every fossil is recovered and catalogued, and in the case of the PBDB, all taxa in collections are recorded, not just the rare ones. In summary, it is certainly true that many Lagerstätten preserve unusual environments (1) or are subject to sedimentary and environmental filters (2), but collection-level datasets based on museum collections may reflect the bonanza effect (3).

Differences between countries
The impact of the Lagerstätten effect varies among the three countries. Germany has the highest proportion of Lagerstätte formations making up the rock record (Table 1; 14%) but a proportion of diversity from Lagerstätte formations similar to that in China (30% and 33% respectively). The UK has few Lagerstätte formations (8% of formations), and few genera from such units (19%). China has few Lagerstätte formations (7%), but a large proportion of genera from Lagerstätten (32%). Thus, in terms of genus count, Lagerstätten have yielded less diversity in the UK record than in that of Germany, but in China, a large proportion of diversity derives from exceptionally preserved deposits.
The palaeodiversity curves show a number of Lagerstätten and other peaks. For the UK (Fig. 2A), only two of the 11 peaks (18%) can be ascribed to Lagerstätten. Others, such as the Middle Triassic (Anisian) peak, correspond to ordinary fossiliferous formations; in this case, the widespread Helsby Sandstone Formation and equivalents. This unit is also likely to be well sampled due to wide exposure through quarrying for building stone (Dunhill et al. 2013). The Late Jurassic peaks represent the fossiliferous Oxford Clay and Kimmeridge Clay formations, which yield abundant marine reptiles and other fossils, although their identification as Lagerstätten might be queried by some. Although largely clay-dominated and so not naturally exposed extensively, these units were massively quarried for brick-making and so many fossils were collected (Dunhill et al. 2014b). The Early Cretaceous peak corresponds to the Wealden sandstones and mudstones, sources of abundant fossils, but not Lagerstätten as there is little or no soft tissue preservation. The same is true of some of the Palaeogene units such as the London Clay, with abundant fossils, which are heavily studied despite limited exposure (Dunhill et al. 2014a).
In Germany (Fig. 3A), four of the 11 peaks (36%) can be explained by Lagerstätten. These are: the Triassic peak, represented largely by the Muschelkalk, with rich finds in several formations over wide areas, but not all classed as Lagerstätten; the Early Jurassic peak, largely the Holzmaden Lagerstätte; the Late Jurassic peak, largely the Solnhofen Lagerstätte; and the Palaeogene peak, the Messel Lagerstätte. However, the very high diversity Neogene peak is composed of non-Lagerstätte formations.
In China (Fig. 4A), Lagerstätten peaks are also significant, explaining four of the 11 peaks (36%). These include the Middle Triassic (Panxian, Luoping, Guanling), the Late Jurassic (Yanliao) and Early Cretaceous (Jehol). The relative importance of Lagerstätten to the fossil record of China may have arisen because Germany and the UK have been more thoroughly studied than China; this could be explicitly tested by examining the collector curve for each country. Alternatively, terrestrial Lagerstätte formations could contain a greater number of taxa than marine Lagerstätte formations. However, when German units are split into marine and terrestrial, a greater proportion of genera are sampled from Lagerstätte formations in marine environments (32%) than terrestrial ones (18%). Another explanation may be the geological and geographical characteristics of each country; China is large, and Chinese Lagerstätten such as the Middle Triassic Guanling Formation and the Early Cretaceous Jehol Group cover enormous areas, and so have yielded huge numbers of fossils from hundreds of localities each.

Model fitting
Lagerstätte count appears as an independent variable in two of the three best German models, but in only the third-best model from the UK, and in none of the top three Chinese models. In a localized study of fishes in Great Britain, Lloyd & Friedman (2013) found that the presence of Lagerstätten in time bins was included in one of their three top models, but the best global marine reptile models of Benson & Butler (2011) all included the presence of Lagerstätten. These previous results suggest that the effect of Lagerstätten on taxic richness is variable, and the results presented here suggest a regional effect, with Lagerstätten exerting more control over diversity in Germany than in the UK and China. The results for China contradict conclusions that may be drawn from Table 1 (that a large proportion of genera come from Lagerstätten), probably because data are skewed by the high Lagerstätte genus richness count in the Lower Cretaceous.
For the German data that have been split into marine and terrestrial facies, Lagerstätte count is included in the top two terrestrial models but in none of the top three marine models. This suggests a greater Lagerstätte control in German terrestrial than marine environments.
Regression modelling has been used frequently in seeking to identify which predictor variables best explain the response variable (palaeodiversity), but we confirm here the finding of other recent studies (Benton et al. 2011; Dunhill et al. 2014a, 2018; Benton 2015) that the method can be problematic. Input variables are often related, and here for several of the Chinese models, specimens and collections are redundant with each other. Additionally, the differences in AICc values between best and near-best models are often small, suggesting that, for the total German data, for example, there is little to choose between the top two or three models.
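The redundancy between predictors noted above is usually diagnosed with the variance inflation factor, VIF = 1/(1 − R²), where R² comes from regressing one predictor on the others. The sketch below is a simplified illustration for the two-predictor case only, in which R² equals the squared Pearson correlation; it is not the procedure used for Table 3, and the function names and counts per time bin are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2).
    With more predictors, R^2 must come from a multiple regression."""
    r_squared = pearson_r(x, y) ** 2
    if r_squared >= 1.0:
        return math.inf
    return 1.0 / (1.0 - r_squared)


# Hypothetical specimen and collection counts per time bin (not real data).
specimens = [12, 40, 33, 85, 60, 150]
collections = [5, 14, 10, 30, 26, 48]
vif = vif_two_predictors(specimens, collections)
```

Under this definition, the threshold VIF > 5 corresponds to R² > 0.8, so the reported value of VIF = 5.1 implies that about 80% of the variance in the collections series is explained by the other predictors in that model.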

Estimating true diversity
In all three countries, the TRiPS estimates of richness are remarkably similar to the raw richness measurements. Additionally, in many of the time bins, the TRiPS method produces very large error bars. There are two possible explanations for this: (1) in all three countries, sampling does not change the gross palaeodiversity pattern, and only minimally reduces the total richness that is recorded in fossiliferous sediments; or (2) TRiPS cannot correct for sampling in real datasets. This inadequacy was also suggested by Close et al. (2018), who assessed different estimators and extrapolators of the fossil record, including TRiPS.
Lagerstätten, when compared with units with non-exceptional preservation, may provide a more accurate record of past biodiversity for the ecosystems they preserve than 'ordinary' deposits. Mineral replacement allows for less recalcitrant tissues to be preserved, a greater variety of taxa are fossilized, and preservation potential is higher in these deposits (Muscente et al. 2017). Additionally, palaeontological research effort is often concentrated on exceptional deposits, as 'systematists follow the fossils' (Raup 1977). Elevated preservation potential and sampling effort combine to produce the Lagerstätten effect. Because exceptional deposits have higher preservation potential and are more thoroughly sampled, it might be that raw richness measured from Lagerstätten-only data would be more similar to TRiPS estimates than the all-formations data.
Contrary to expectation, TRiPS calculations based on all data and the Lagerstätten-only data are both equally similar to their respective raw richnesses (R² values in Fig. 6). This could be because the Lagerstätten data dominate the total data sets in some way, or because the TRiPS method is more dependent on the input data, and less able to predict missing data, than has been assumed (Close et al. 2018).
For the TRiPS method to have utility, and to meet the claims of its authors (Starrfelt & Liow 2016), it should yield a richness count that is close to the truth, compensating for heterogeneous sampling. This is difficult to achieve because the fossil record is so patchy, with gaps in time bins that lack Lagerstätten, or even lack Lagerstätten from multiple sedimentary environments. The findings here that TRiPS values mirror raw richness counts and that the error bars produced by the method span a range greater than the time bin values (as found by Starrfelt & Liow 2016) both call into question the utility of this form of TRiPS in real palaeontological datasets.

CONCLUSIONS
Not all of the dramatic peaks seen in the richness curves here are produced by Lagerstätten, although some regional patterns emerge: (1) a small number of Lagerstätten yield a large proportion of diversity in China; (2) in Germany, exceptional preservation is more widely spread through the fossil record, yielding a similarly large proportion of genus diversity; (3) in the UK, a small number of Lagerstätten yield a moderate proportion of genus diversity. The fossil records of the UK and Germany (in particular the fossil records of fragile taxa in these countries) are less dominated by Lagerstätten than that of China.
The extent to which fragile taxa are preserved in Lagerstätten also varies among the three countries; the fragile taxa occurrence per robust taxon rate (r_f ratio) in Chinese Lagerstätten is much higher than in the UK or German equivalents. This indicates that fragile taxa are hugely undersampled in ordinary formations, or, conversely, that robust taxa are underrepresented in Lagerstätten in China; this could be further tested. We infer that some Lagerstätten truly over-sample fragile taxa because they preserve unusual ecosystems where, for example, birds or pterosaurs are especially represented; numbers of such taxa are high from China because of the great scale of some Lagerstätten deposits such as those in the Jehol Group.
It is unlikely that the TRiPS method can provide a true estimate of past richness. TRiPS produces diversity curves that are remarkably similar to the empirical curves. In exceptional deposits this result might be unsurprising, but this also occurs when using data from deposits which have not been termed 'Lagerstätten', and so the correction factors do not seem to differentiate cases where we have independent evidence for good or poor sampling.