The fossil record of ichthyosaurs, completeness metrics and sampling biases

Ichthyosaurs were highly successful marine reptiles with an abundant and well‐studied fossil record. However, their occurrences through geological time and space are sporadic, and it is important to understand whether times of apparent species richness and rarity are real or the result of sampling bias. Here, we explore the skeletal completeness of 351 dated and identified ichthyosaur specimens, belonging to all 102 species, the first time that such a study has been carried out on vertebrates from the marine realm. No correlations were found between time series of different skeletal metrics and ichthyosaur diversity. There is significant geographical variation in completeness, with the well‐studied northern hemisphere producing fossils of much higher quality than the southern hemisphere. Medium‐sized ichthyosaurs are significantly more complete than small or large taxa: the incompleteness of small specimens was expected, but it was a surprise that larger specimens were also relatively incomplete. Completeness varies greatly between facies, with fine‐grained, siliciclastic sediments preserving the most complete specimens. These findings may explain why the ichthyosaur diversity record is low at times, corresponding to facies of poor preservation potential, such as in the Early Cretaceous. Unexpectedly, we find a strong negative correlation between skeletal completeness and sea level, meaning the most complete specimens occurred at times of low global sea level, and vice versa. Completeness metrics, however, do not replicate the sampling signal and have limited use as a global‐scale sampling proxy.

Palaeontologists are keen to find a reliable means of assessing the completeness of the fossil record. Suggested approaches include sampling standardization to equalize sample sizes, comparison and correction of fossil record data with proposed metrics of sampling such as formation or collection counts, identification of implied gaps (Lazarus gaps, ghost ranges) and consideration of specimen quality (reviewed in Smith 2007; Benton et al. 2011). In terms of specimen quality, it might be hypothesized that times of overall poor sampling should also correspond to times of poor specimen quality: incomplete or damaged specimens would be hard to identify, and so diversity would be underestimated. Completeness metrics have been devised to document the preservation quality of taxa or individual specimens. These include taxon completeness scores that document whether species are represented by isolated bones, complete skulls or multiple skeletons (Fountaine et al. 2005; Benton 2008; Dyke et al. 2009), and completeness scores that document the percentage of the skeleton that is present (Mannion and Upchurch 2010; Beardmore et al. 2012; Brocklehurst et al. 2012).
The relationship between specimen completeness and diversity is unclear. One might expect that diversity would be highest when skeletons were most complete, and indeed, Brocklehurst et al. (2012) found a positive and statistically significant correlation between completeness and diversity for Mesozoic birds, and Mannion and Upchurch (2010) also found a correlation for sauropodomorph dinosaurs, but only for the Late Cretaceous. On the other hand, Brocklehurst and Fröbisch (2014) found a negative relationship between skeletal completeness and diversity for early synapsids, indicating a tendency among palaeontologists to name many species based on incomplete material.
Equally interesting is to assess whether skeletal completeness is a predictor of sampling more generally. Initial studies using completeness scores on terrestrial animals including sauropodomorph dinosaurs (Mannion and Upchurch 2010), birds (Brocklehurst et al. 2012) and nonmammalian synapsids (Brocklehurst et al. 2013; Walther and Fröbisch 2013; Brocklehurst and Fröbisch 2014) did not find any relationship between times when skeletal completeness was low and times of poor overall sampling (i.e. low numbers of species, low numbers of fossiliferous formations). If anything, some times of apparently poor overall sampling corresponded to high overall skeletal completeness values, based on small numbers of sites of exceptional preservation. This could reflect some particular aspects of the sporadic nature of preservation of terrestrial fossil deposits and terrestrial tetrapods, so we chose to explore a group that is marine and apparently has a rich fossil record (McGowan and Motani 2003), the ichthyosaurs.
Ichthyosaurs were highly successful pelagic predators with a temporal range from the Early Triassic to the early Late Cretaceous (Motani 2009). They have an abundant fossil record for a large proportion of this time and have been intensely studied since the early nineteenth century. Many researchers have examined the diversity of ichthyosaurs as part of studies of all Mesozoic marine reptiles or for particular Mesozoic stages. While some consider potential biases affecting the fossil record (Benson et al. 2010; Benson and Butler 2011; Benton et al. 2013; Kelley et al. 2014), others only briefly mention (Thorne et al. 2011; Fischer et al. 2012) or do not consider (Zammit 2012) how this might affect observed diversity.
Mesozoic marine reptiles, including ichthyosaurs, have figured prominently in recent debates about the quality of the fossil record. In an initial study of the marine reptile record (Benson et al. 2010), strong correlations were found between apparent diversity and numbers of fossil reptile-bearing formations, and this was taken as evidence of prevalent bias. In a further study by Benson and Butler (2011), the ranking of rock volume and apparent diversity was found to indicate a biased record for pelagic taxa, but the correlations between formations and diversity for shelf taxa were ascribed by them to a 'common cause' (Peters 2005), namely sea level change and the resultant areas of continental flooding. This example illustrates how the commonly found covariation between fossil diversity and fossil-bearing formations could result from one of three causes, namely bias (Barrett et al. 2009; Benson et al. 2010), common cause (Peters 2005) or redundancy (Dunhill et al. 2014a), and all three should be considered as potential explanations (Benton et al. 2011; Upchurch et al. 2011).
Here, we explore the completeness of ichthyosaur specimens through their entire temporal and geographical distributions and investigate relationships with palaeodiversity, the rock record and sea level. We seek to identify times of low preservation quality, when a paucity of well-preserved fossils could increase the difficulty of identifying species. We explore the relationship between host facies and completeness, as original depositional conditions can greatly affect preservation. We also compare records from the northern and southern hemispheres, as a preliminary test for any geographical variation in specimen completeness.

Data
We constructed a matrix of 351 specimens, representing all 102 currently valid ichthyosaur species (Cleary et al. 2015, appendix 1, sheets 1-3). Up to ten specimens were scored from each species (range 1-10, mean = 3.44 specimens per species), and information was drawn primarily from the literature, in papers containing good images or detailed descriptions of specimens (or a combination of both). TJC also visited Bristol City Museum and Art Gallery and the Natural History Museum in London to study otherwise inaccessible specimens, test the coding methods on actual fossils and check aspects of ichthyosaur anatomy.
Decisions on which taxa to include and exclude from this study were made using the most recent taxonomic literature (McGowan and Motani 2003; Maisch 2010). If a species was considered a nomen dubium, it was excluded, except in cases where taxonomic validity was debated, for example the Cretaceous genus Platypterygius.
Here, for completeness, we chose to retain species whose status is debated (Zammit 2012) as the study is based on individual specimens, and records of stratigraphic age, geographical location and overall size are unaffected.
A wealth of information was collected for each specimen (Cleary et al. 2015, appendix 1, sheets 1-3, 15), including geographical locality (modern coordinates), age (stratigraphic stage), body size (based on the length of the humerus when available) and geological setting (facies, divided into fine and coarse siliciclastic and carbonate categories, or a combination of the two).

Completeness metrics
We used two completeness metrics, the Skeletal Completeness Metric (SCM) and Beardmore's Skeletal Completeness Metric (BSCM). The SCM was devised by Mannion and Upchurch (2010) to document the skeletal completeness of sauropodomorph dinosaurs, and we adapted it for use with ichthyosaurs. The premise is to separate the skeleton into regions and then assign each region a percentage based on how much of the total skeleton that region represents. For ichthyosaurs, we divided the body into the skull, cervical + dorsal vertebrae, caudal vertebrae, pectoral girdle and forelimb, and pelvic girdle and hindlimb (Fig. 1A). We altered the proportions assigned to each skeletal division between Triassic and Jurassic/Cretaceous ichthyosaurs, as their body structure changed through time. As an example, the skull is rated at 20% in Triassic ichthyosaurs, but 30% in Jurassic and Cretaceous forms because it accounts for relatively more distinctive characters in later forms (Fig. 1A). Note that some regions are further subdivided. For example, in the Jurassic-Cretaceous, a preserved forelimb represents 10% of the total body and comprises the humerus (4%), radius (1%), ulna (1%) and phalanges (4%) (Fig. 1A). All divisions and subdivisions are listed in Figure 1A and Cleary et al. (2015, appendix 1, sheet 4). The sum of percentages from each area preserved gives a total SCM score.
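Because the SCM is simply a weighted sum, its arithmetic can be sketched in a few lines. The sketch below is illustrative only: `scm_score` is a hypothetical helper, and it includes only the weights quoted in the text (Jurassic-Cretaceous skull 30%; forelimb humerus 4%, radius 1%, ulna 1%, phalanges 4%). The full weighting scheme is given in Figure 1A and Cleary et al. (2015) and is not reproduced here.

```python
# Partial, illustrative weights (percent of the whole skeleton) for a
# Jurassic-Cretaceous ichthyosaur. Only these values are quoted in the text;
# the remaining regions (vertebrae, girdles, hindlimb) are deliberately omitted.
WEIGHTS_JK_PARTIAL = {
    "skull": 30.0,      # whole skull, scored as one unit
    "humerus": 4.0,
    "radius": 1.0,
    "ulna": 1.0,
    "phalanges": 4.0,   # forelimb total = 10%
}


def scm_score(preserved, weights):
    """Hypothetical helper: SCM = sum of the weights of preserved elements.

    `preserved` lists element names visible on the scored side; duplicates
    are ignored, and an unknown name raises KeyError so that typos are not
    silently scored as zero.
    """
    return sum(weights[element] for element in set(preserved))


# A specimen with skull, humerus and phalanges on the scored side:
print(scm_score(["skull", "humerus", "phalanges"], WEIGHTS_JK_PARTIAL))  # 38.0
```

A complete forelimb on its own scores 10.0, matching the 10% forelimb total given in the text.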
The SCM had to be further adapted because ichthyosaurs are usually preserved in a lateral orientation, with only one side visible. Therefore, we report the completeness of one side of each specimen only. Where ichthyosaurs are preserved in the rarer dorsoventral orientation, we chose the best preserved side of the two. The skull must also be included as a whole entity, rather than as its individual components, as the compression of carcasses often eliminates cranial sutures (McGowan and Motani 2003). Two SCM values were recorded for each species: the SCM1 was based on the most complete specimen from each species; and the SCM2 was a composite of the SCM1 value plus any missing parts added from other specimens.
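SCM1 and SCM2 differ as 'best single specimen' versus 'union of elements across specimens'. A minimal sketch follows, assuming each specimen is recorded as a set of preserved element names and reusing the same partial, illustrative weights quoted in the text. The function name `scm1_scm2` is hypothetical.

```python
# Partial, illustrative Jurassic-Cretaceous weights, as quoted in the text.
WEIGHTS = {"skull": 30.0, "humerus": 4.0, "radius": 1.0, "ulna": 1.0, "phalanges": 4.0}


def scm1_scm2(specimens, weights):
    """Hypothetical helper returning (SCM1, SCM2) for one species.

    specimens: list of sets of preserved element names, one set per specimen.
    SCM1 scores the single most complete specimen; SCM2 scores the union of
    elements seen in any specimen, i.e. the best specimen plus missing parts
    filled in from the others, with each element counted only once.
    """
    if not specimens:
        raise ValueError("need at least one specimen")

    def score(elements):
        return sum(weights[e] for e in elements)

    scm1 = max(score(s) for s in specimens)
    scm2 = score(set().union(*specimens))
    return scm1, scm2


# A skull-only specimen plus an isolated partial forelimb:
print(scm1_scm2([{"skull"}, {"humerus", "radius", "phalanges"}], WEIGHTS))  # (30.0, 39.0)
```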
The BSCM was designed for use with marine crocodilians (Beardmore et al. 2012), but we modified it for use with the ichthyosaur body plan. The skeleton is divided into areas (Fig. 1B), and the completeness of each region is assessed according to a simplified scale, with a value between 0 (absent) and 4 (mostly/totally complete). For example, if approximately 40% of the dorsal vertebrae are present, then the dorsal section will score 2 (25-50% complete). The criteria for each numbered category can be found in Figure 1B and Table 1. The scores for all regions are summed, and the total is divided by the total possible score (24) to give a BSCM score, which is then multiplied by 100 to obtain a percentage, for better comparability with the SCM. As with the SCM, only one side of each specimen is measured. Furthermore, the cervical vertebrae are amalgamated with the dorsal vertebrae, as it can be hard to determine the division between these two areas in some taxa (Fig. 1B). We also integrated the pelvic and pectoral girdles into the limb categories, as they were not included in the original metric (Fig. 1B). Two versions of the BSCM are given: BSCM1 for the single best preserved specimen of each species, and BSCM2 for a composite comprising the best individual plus others that provide information on elements missing in the best specimen, to provide the most complete value possible.
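The BSCM arithmetic can be sketched as follows. A total possible score of 24 at up to 4 points per region implies six scored regions. Only the 0 (absent), 2 (25-50%) and 4 (mostly/totally complete) categories are described in the text, so the 1 and 3 bin edges below are assumptions standing in for the criteria in Table 1. Both function names are hypothetical.

```python
N_REGIONS = 6          # implied by 24 / 4; region names are not assumed here
MAX_PER_REGION = 4


def percent_to_category(percent_present):
    """Map the % of a region present to a 0-4 category.

    Only 0 (absent), 2 (25-50%) and 4 (mostly/totally complete) are described
    in the text; the 1 and 3 bins and the exact boundaries are ASSUMPTIONS
    standing in for the criteria in Table 1.
    """
    if percent_present <= 0:
        return 0
    if percent_present < 25:
        return 1
    if percent_present <= 50:
        return 2
    if percent_present <= 75:
        return 3
    return 4


def bscm(region_scores):
    """BSCM (%) = sum of the region categories / 24 * 100."""
    if len(region_scores) != N_REGIONS:
        raise ValueError(f"expected {N_REGIONS} region scores")
    if any(not 0 <= s <= MAX_PER_REGION for s in region_scores):
        raise ValueError("each region score must be between 0 and 4")
    return 100.0 * sum(region_scores) / (N_REGIONS * MAX_PER_REGION)


print(percent_to_category(40))      # 2, as in the dorsal-vertebrae example
print(bscm([4, 2, 3, 1, 0, 2]))     # 50.0
```

Each region contributes equally to the total. This equal weighting is the main design difference from the SCM.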

Comparative time series
Several time series of physical environmental variables and potential sampling metrics were compiled and divided into time bins equivalent to Mesozoic stratigraphic stages. There are no values for the Bathonian or Valanginian stages because these stages have not yielded ichthyosaurs identified to species level. This may affect correlation strength and significance, but omitting these stages might have removed a genuine signal of non-preservation, so we ran the analyses twice, once retaining and once removing the zero-value data. Mean completeness values for each time bin were calculated by averaging the SCM and BSCM values of all ichthyosaur species included in that time bin. For each time bin, we also recorded ichthyosaur diversity (number of species, from our data), the number of all fossiliferous marine formations (FMFs) and ichthyosaur collections, taken from the Paleobiology Database (PaleoDB; http://fossilworks.org; http://www.paleobiodb.org/). Sea level data were taken from the standard summaries by Haq et al. (1987).
Additional data recorded for each individual specimen included body size. This was assessed in classes based on the length of the humerus: small (<6 cm), medium (6-14 cm) and large (>14 cm). Exact body sizes were not estimated, because the humerus is easy to measure accurately and is proportional to total body size in any taxon (Maxwell 2012; Martin et al. in press), and we were interested simply in broad patterns of skeletal completeness among size classes. Ichthyosaur specimens were further categorized as coming from the modern northern or southern hemisphere, as a means of assessing evenness of collecting across the globe. The sedimentary facies of each specimen was also noted, as predominantly siliciclastic or carbonate, based mainly on categories given in the PaleoDB.
For these additional data, we grouped all individual specimens into categories, rather than using the 'best specimen' and 'composite' metrics.
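The size classification and the per-bin averaging described above can be sketched briefly. The helper names are hypothetical. The humerus cut-offs come from the text, but the text does not state how values of exactly 6 or 14 cm were assigned, so treating them as medium is an assumption. The stage scores in the usage line are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def size_class(humerus_length_cm):
    """Small < 6 cm, medium 6-14 cm, large > 14 cm (humerus length).

    Assigning exactly 6 and 14 cm to 'medium' is an assumption.
    """
    if humerus_length_cm < 6:
        return "small"
    if humerus_length_cm <= 14:
        return "medium"
    return "large"


def mean_completeness_by_bin(records):
    """Average completeness per stage.

    records: iterable of (stage, score) pairs, one per species in that stage.
    Stages without scored ichthyosaurs (e.g. Bathonian, Valanginian) simply
    do not appear; whether to add them back as zeros is a separate decision.
    """
    by_stage = defaultdict(list)
    for stage, score in records:
        by_stage[stage].append(score)
    return {stage: mean(scores) for stage, scores in by_stage.items()}


print(size_class(5.5), size_class(6.0), size_class(15.2))  # small medium large
# Made-up scores, for illustration only:
print(mean_completeness_by_bin([("Sinemurian", 80.0), ("Sinemurian", 60.0), ("Albian", 30.0)]))
# {'Sinemurian': 70.0, 'Albian': 30.0}
```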
In our study, we did not distinguish Lagerstätten from other deposits for statistical comparison, as the distinction is not clear for ichthyosaurs, and perhaps also for other marine reptiles, especially when compared to pterosaurs and birds (e.g. Brocklehurst et al. 2012). An easy solution would have been to choose only those geological formations that are traditionally called Lagerstätten (e.g. Guanling, Holzmaden, Solnhofen) and compare them with the rest. However, there is a sliding scale of ichthyosaur completeness between these and other units of excellent preservation that are only sometimes called Lagerstätten (e.g. Lias of Dorset, Oxford Clay), so drawing the line would be arbitrary.
Relationships between pairs of time series were assessed using pairwise Spearman rank correlation tests and multiple regression models, following the methods of Benson and Butler (2011). False discovery rate (FDR) corrections were applied to families of associated correlation tests using the method of Benjamini and Hochberg (1995), to reduce the chance of type I statistical errors. Both linear models ('lm' and 'step' functions in R) and generalized least squares (GLS) models (nlme and qpcR packages in R; 'gls' and 'AICc' functions) were applied. The linear models allowed sequential removal and addition of time series to seek the model that best explained the completeness metrics. GLS models take account of autocorrelation, and the GLS estimator is unbiased, consistent, efficient and asymptotically normal. We used the first-order autoregressive (AR(1)) correlation model, which has the property of seeking autocorrelation at up to one lag in either direction, and of minimizing the error term (Box et al. 1994). The quality of fit of models can be estimated using the AIC and BIC values given by the GLS output, but these may not perform well for small sample sizes such as ours. Therefore, we used Akaike's second-order corrected information criterion (AICc; 'AICc' function in the qpcR package). For the GLS models, we do not provide correlation coefficients, nor do we compute R-squared ('pseudo-R-squared'), F-values or p-values, as the merits of such estimators are currently debated (e.g. Freese and Long 2006).
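To show the mechanics of the correlation step, the sketch below implements Spearman's rank correlation (the Pearson correlation of average ranks, which handles ties) and the Benjamini-Hochberg adjustment in plain Python. It is not the authors' R code, which is given in Cleary et al. (2015, supplement, appendix 6). In R, the equivalent calls would be `cor.test(x, y, method = "spearman")` and `p.adjust(p, method = "BH")`. The p-value for rho itself is omitted to keep the sketch short.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no rank correlation")
    return cov / math.sqrt(var_x * var_y)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For example, raw p-values of 0.01, 0.30, 0.04 and 0.02 from one family of four tests adjust to about 0.04, 0.30, 0.053 and 0.04. The test with raw p = 0.04 therefore no longer passes a 0.05 threshold. A correlation can be significant 'before FDR correction' but not after in exactly this way.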
The aim was to determine whether any of the various metrics might be a reliable indicator of sampling quality, and also why some time bins might be better or worse represented by fossil specimens, and whether this might be associated with differences in specimen size or sedimentary facies available. Differences in completeness were assessed using Kruskal-Wallis tests. All analyses were carried out in R (v. 3.1.1), and we give code for the functions we used (Cleary et al. 2015, supplement, appendix 6).
During the Triassic, completeness is lowest during the Ladinian for all metrics (Fig. 2). This dip reflects the limited geographical range of sampling: only two species are known, from one area of British Columbia. The rise in the Carnian after this low represents the Chinese Guanling Lagerstätte, whereas during the Norian there are numerous specimens, but of poor completeness. Most of the Norian specimens are from a small area of Canada, and there is evidence for a marine transgression during this stage (Edwards et al. 1994), which may have led to a lack of the restricted basinal facies that are associated with exceptional preservation.
Completeness varies throughout the Jurassic (Fig. 2), with the first peak in the Sinemurian, corresponding to the heavily sampled Blue Lias and Charmouth Mudstone formations (Dunhill et al. 2012), which have yielded many excellent, complete ichthyosaur specimens since the early 1800s. Completeness falls during the Middle Jurassic (Fig. 2), reflecting a paucity of localities, which produce only a sparse assemblage of incomplete specimens. The Callovian peak in completeness reflects the geographically restricted collections from the Oxford Clay Formation, which yield exquisite and occasionally mostly complete ichthyosaur specimens (Martill 1986). There is a dramatic drop in preservation quality across the Jurassic-Cretaceous boundary (Fig. 2), and it has long been debated whether this represents an extinction event or simply a major facies change, from marine to continental deposits, across Europe. In fact, the extinction rate of ichthyosaurs across the J/K boundary appears no higher than the background rate (Fischer et al. 2012; Zammit 2012), despite claims of an apparent mass extinction event at that time (Bambach 2006). Completeness remains relatively low throughout the Cretaceous (Fig. 2), apart from a spike during the Albian, although even this remains lower than in the periods of best preservation in the Jurassic.
Our plot of ichthyosaur diversity through time (Fig. 3A) shows peaks in the Early and Middle Triassic, the Early and Middle Jurassic, the latest Jurassic (Tithonian) and the early Late Cretaceous. This diversity time series represents counts from the taxa we assessed, so it is not complete, but it shows the same pattern as seen in previous, comprehensive compilations (e.g. Benson and Butler 2011, fig. 3), except for our J/K peak. In many cases, the peaks represent Lagerstätten, sites of exceptional fossil preservation.
We compared the various completeness metrics with a number of sampling proxies (Figs 3-5). The results with and without the zero-zero data are broadly similar (Table 2; Cleary et al. 2015, supplement, appendix 2), although removing the zero-zero data highlights the relationships between sea level and specimen completeness. All results discussed further in this study refer to the data set with the zero-zero Bathonian and Valanginian data removed. Ichthyosaur diversity correlates significantly with collection count, as do collection and formation counts, although only before FDR correction (Table 2). This could indicate a sampling bias or, more likely, may relate to the relative rarity of ichthyosaur fossils compared to other fossil groups, and thus redundancy between the diversity and collection metrics (Dunhill et al. 2014a). The lack of correlation between raw diversity and the completeness metrics, however, indicates that these metrics are unrelated to diversity and that their use as a sampling proxy is limited. Ichthyosaur collections show no correlation with the completeness metrics (Table 2). This suggests that, across time bins, there is no link between the abundance of ichthyosaur specimens and specimen completeness. Fossiliferous marine formation counts (FMFs) show no correlation with diversity, sea level or the completeness metrics (Table 2; Figs 4-5). One would expect rising sea level to increase formation count, because most marine formations are from the continental shelf and rising sea level expands the area of continental shelf, but this does not appear to be the case here. Sea level also does not correlate with any of the other proxies. However, sea level correlates negatively and significantly with all the specimen completeness metrics (Table 2), and all but the correlation between sea level and BSCM1 survive FDR correction (Table 2).
This shows that ichthyosaur specimen completeness is highest during times of low sea level and deteriorates as sea levels rise.
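Some of the correlations plotted for these comparisons were computed on generalized-differenced series, as the figure captions indicate. Differencing of this kind reduces the chance that shared trends or autocorrelation produce spurious correlations. The sketch below shows one common formulation: it estimates the lag-1 dependence as the least-squares slope of each value on its predecessor and subtracts it. The exact variant used for the figures is not specified here, and some published implementations also remove a linear trend first. The function name is hypothetical.

```python
def generalized_difference(series):
    """Generalized differencing of an evenly binned time series (one variant).

    Estimates the lag-1 dependence as the ordinary least-squares slope of
    x[t] on x[t-1] and returns x[t] - slope * x[t-1]. The output is one value
    shorter than the input.
    """
    if len(series) < 3:
        raise ValueError("need at least three values")
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0:
        raise ValueError("lagged series is constant; slope undefined")
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    slope = sxy / sxx
    return [c - slope * p for p, c in zip(prev, curr)]
```

A purely geometric series such as 1, 2, 4, 8, 16 is entirely explained by its predecessor (slope 2), so all of its differenced values are zero. Both series in a pair would be transformed this way before Spearman's rho is computed.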
As the completeness of ichthyosaur specimens seems to vary considerably between the Triassic-Jurassic and Cretaceous time bins, with an apparent marked dip in completeness across the Jurassic-Cretaceous boundary (Fig. 2), all correlations were run again for the Triassic-Jurassic and Cretaceous separately (Table 2). The results for the Triassic-Jurassic data were very similar to those for the total data set, albeit with stronger negative correlations between sea level and completeness, and with non-significant correlations between diversity and collections, and between collections and formations (Table 2). The Cretaceous data consist of fewer time bins, so that analysis lacks sufficient statistical power to support any conclusions.
The model-fitting procedures provide rather different results. Multiple regressions highlight combinations of sea level, formations, collections and time period as the best predictors of specimen completeness (Table 3). As with the correlation results, the relationship between completeness and both sea level and formations is negative, suggesting that lower sea levels and fewer sampled formations are associated with specimens of higher completeness. The relationship between time period and completeness is also negative (time period was coded as Triassic-Jurassic = 1 and Cretaceous = 2), confirming that Triassic-Jurassic specimens are, on average, more complete than Cretaceous specimens. The only independent variable that does not feature in any of the best-fitting models is diversity, providing further evidence that recorded ichthyosaur diversity is not linked to specimen completeness. Generalized least squares models, however, do not eliminate diversity from the best models for predicting specimen completeness (Table 4). Ranked by AICc value, the SCM1 is best explained by the model comprising collections, formations and sea level, and worst by the model comprising diversity, collections and formations. All five time series are roughly equally distributed between the best 16 models and the poorest 16 models, although, of the single-factor models, time period performed best, and sea level, collections, diversity and formations were progressively poorer correlates of the SCM1 time series. The 'top five' models all contain collections and sea level as parameters, while the 'bottom five' share no single parameter, although formations occur in four of the five. All four completeness metrics showed similar best and poorest models (Cleary et al. 2015, supplement, appendix 4): the best models were 14 and 23 in all cases, with 7, 3, 16 and 9 always within the top five.
The poorest five models were generally some mix of 25, 30, 17, 18 and 8, with 19, 22 and 28 featuring once. The GLS results are therefore equivocal, and do not confirm that diversity is unrelated to specimen completeness.
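The AICc ranking used above penalizes extra parameters more heavily than AIC when the number of observations is small, as with stage-level time series. A minimal sketch of the standard formula follows. `rank_models` is a hypothetical helper, and the log-likelihoods are made up purely to show how AICc can reverse an AIC ranking. This is not the qpcR implementation.

```python
def aicc(log_likelihood, k, n):
    """Second-order (small-sample) Akaike information criterion.

    AIC  = -2 * logL + 2k
    AICc = AIC + 2k(k + 1) / (n - k - 1)
    k counts every estimated parameter (for a GLS fit this typically includes
    the residual variance and any correlation parameter); n is the number of
    observations, here time bins.
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)


def rank_models(fits, n):
    """Hypothetical helper: fits maps model name -> (logL, k); best (lowest AICc) first."""
    return sorted(fits, key=lambda name: aicc(fits[name][0], fits[name][1], n))


# Made-up fits for n = 20 bins: model "b" has the better AIC (25 vs 26)
# but the worse AICc (about 29.3 vs 27.5) because of its extra parameters.
fits = {"a": (-10.0, 3), "b": (-7.5, 5)}
print(rank_models(fits, 20))  # ['a', 'b']
```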

Variation in completeness with body size, geography and lithology
Completeness varies with size: medium-sized ichthyosaurs were significantly more complete than smaller or larger ichthyosaurs (BSCM; Kruskal-Wallis: χ² = 10.578, df = 2; p = 0.005). Small and large ichthyosaurs had very similar median completeness (Fig. 6; Cleary et al. 2015, appendix 1, sheet 10). This is surprising, because the null expectation was that larger ichthyosaurs would be more completely preserved than smaller ones, given the robustness of larger bones and their increased resistance to disarticulation and decay. There is a large range of completeness in each category (Fig. 6), however, which may be attributed to other factors such as geographical location and facies. Note that, for the statistics in this section, SCM and BSCM were so similar that only one set of results is given for the size, hemisphere and geology comparisons (see Figs 6-8).
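The Kruskal-Wallis tests reported in this section compare rank sums among groups and refer the H statistic to a χ² distribution with (groups - 1) degrees of freedom, applying a correction for tied values as R's `kruskal.test` does. The plain-Python sketch below is illustrative only, and its function names are hypothetical. Its closed-form χ² upper-tail function reproduces the p-values quoted here from the reported statistics; for example, χ² = 10.578 with df = 2 gives p ≈ 0.005.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (integer df >= 1).

    Uses closed forms of the regularized upper incomplete gamma function
    Q(df/2, x/2) for integer and half-integer arguments.
    """
    y = max(x, 0.0) / 2.0
    if df % 2 == 0:
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def kruskal_wallis(*groups):
    """Return (H, df, p), with the usual correction for tied values."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sum, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted_sum += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted_sum - 3.0 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    if tie_term:
        denominator = 1.0 - tie_term / (n ** 3 - n)
        if denominator == 0:
            raise ValueError("all observations are tied")
        h /= denominator
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)
```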
(Figure caption: D indicates that the data have undergone generalized differencing prior to the application of Spearman rank correlation tests. A, BSCM1/2 and diversity; B, BSCM1/2 and collections; C, BSCM1/2 and fossiliferous marine formations (FMFs); D, SCM1/2 and sea level. See Table 2 for correlation coefficients and p-values.)
Northern hemisphere ichthyosaurs tend to be much more complete than southern hemisphere specimens (Kruskal-Wallis: χ² = 8.745, df = 1; p = 0.003). While material from the northern hemisphere shows a large amount of variation at individual localities, southern hemisphere ichthyosaurs consistently show low completeness values, with the exception of two reasonably complete specimens from Argentina (Fig. 7). When comparing the completeness of specimens recovered from different facies, we found no detectable difference between those recovered from coarse- vs. fine-grained lithologies (Cleary et al. 2015, appendix 1, sheet 12; for SCM, Kruskal-Wallis: χ² = 2.374, df = 1; p = 0.1). However, ichthyosaur specimens in (coarser-grained) sandstones generally showed lower completeness scores than those in (finer-grained) mudstones in the original data, suggesting that grain size has some effect on completeness, but that a combination of facies factors (grain size and composition) is more important for preservation. A key example of these factors is whether a sediment is primarily siliciclastic or carbonate in its underlying lithology. There is a significant difference in completeness scores between specimens preserved in lithologies of differing composition (Cleary et al. 2015, appendix 1, sheet 13; for SCM, Kruskal-Wallis: χ² = 8.840, df = 2; p = 0.01). Ichthyosaurs from predominantly siliciclastic deposits were best preserved, followed by those from mixed siliciclastic/carbonate facies, with the worst preserved recovered from predominantly carbonate units. When lithological categories are combined to reflect both composition and grain size, the five categories (Fig. 8; Cleary et al. 2015, appendix 1, sheet 14) show significant differences in completeness (for SCM, Kruskal-Wallis: χ² = 17.474, df = 4; p = 0.002). Coarse siliciclastic and fine carbonate sediments appear to be associated with poor fossil completeness, while fine siliciclastic sediments consistently yield the most complete specimens (Fig. 8).
However, we do see a high variance of completeness values, especially among the finer-grained lithologies and mixed facies (Fig. 8).

Comparison of completeness metrics
The very close correlation between the SCM and BSCM was surprising, as they had been expected to differ. SCM assigns completeness based on the amount each region contributes to the overall skeleton, whereas BSCM gives all regions the same relative weighting. This means that SCM accounts for the higher preservation potential of some parts over others, while BSCM does not. However, the nearly uniform, very highly significant correlation between the two suggests that this difference does not matter in practice. It may also suggest that either metric would be equally useful in studies of overall skeletal completeness such as this one; the BSCM (Beardmore et al. 2012) is quicker to assess than the SCM (Mannion and Upchurch 2010).

Drivers of diversity and fossil quality
Diversity and collections correlate, albeit only before FDR correction (Table 2). In general, any single ichthyosaur species may be part of many collections (as listed by the PaleoDB). This could be read as a simple metric of sampling: the more collections that are made (reflecting a combination of rock availability and collecting effort), the more ichthyosaur species are identified. Equally, this could be an indicator of the 'bonanza effect' (Raup 1977): time bins containing abundant fossils are much visited and much collected, so many ichthyosaur taxa are identified (Raup 1977; Dunhill et al. 2014b). Brocklehurst et al. (2013) suggested that this may have been the case in their study of synapsid diversity, in which they found a similar significant correlation. Do collections drive diversity in this case (evidence of bias), or does diversity drive collections? Similarly, low collection counts show some correspondence with low ichthyosaurian diversity, reflecting an absence of ichthyosaur material. It is unclear whether this means that ichthyosaurs were rare or absent in life (biological signal), were not preserved (preservation bias; geological signal) or were present in the rocks but have not been collected (sampling bias). Dunhill et al. (2014a) found that the two variables drove each other equally in the fossil record of Great Britain, suggesting redundancy between the two signals. It is therefore not a given that the rarity or abundance of specimens or collections is simply a metric of sampling; it could reflect reality.
There are exceptions to the correlation between diversity and collections. The Albian, for example, has the highest number of ichthyosaur collections but only nine recognized species (the Anisian holds the record, with 19 species). Here, other factors come into play. The Albian shows generally low values of specimen completeness, which compromises the ability of palaeontologists to identify ichthyosaur collections; a lack of collections, in turn, generally hinders the identification of new species. Further, a mix of siliciclastic and carbonate facies is associated with lower completeness values. We do not have independent evidence, but it could also be that Albian ichthyosaur localities have been less intensively studied than those from some other stages.
There was no correlation between diversity and any of the four completeness metrics (Table 2; Fig. 4A-B). It was predicted that high completeness of specimens ought to enable more to be identified to species level and thus should enhance the reported diversity. Instead, specimen completeness appears to have no bearing on diversity and is therefore a poor proxy for global-scale sampling. Specimen completeness does not account for other confounding factors that can affect how completeness varies between time bins. For example, one time bin may have beautiful, near-complete fossils but be poorly sampled, while another may be heavily sampled but produce only an abundance of scrappy fossils. A case in point is the Anisian, which shows moderate mean completeness values but high diversity, arising from large numbers of formations that show wide variation in completeness scores, including high values in some Lagerstätten.
Other studies have found a variety of results for this relationship. Brocklehurst et al. (2012) found a significant positive correlation between diversity and completeness for Mesozoic birds. Perhaps completeness provides a better proxy for sampling with terrestrial species, or for the avian fossil record in particular, which is notoriously patchy. Mannion and Upchurch (2010), however, demonstrated a lack of correlation between SCM and diversity in their sauropodomorph study. We found that medium-sized ichthyosaurs had higher completeness values than small or large specimens; there may be a size threshold beyond which completeness begins to decline. It is possible that diversity has been inflated in some places because of the habit of naming new species from poor fossil remains; this may also apply to our study. A similar explanation is offered by Brocklehurst and Fröbisch (2014), who noted poor taxonomic practices in the mid-twentieth century in naming pelycosaurian-grade synapsids. Completeness metrics are useful for elucidating certain aspects of bias in the fossil record, as Benton et al. (2013) noted, but they cannot capture the entirety of the sampling biases affecting it.
The correlation of FMFs with collections (Table 2) is in line with earlier studies (Benson et al. 2010; Benson and Butler 2011) and suggests that both reflect some aspect of sampling. However, contrary to these studies, we found no correlation between formations and diversity. Benson and Butler (2011) regarded the formations–diversity relationship as key evidence for a rock record bias mechanism driving the record of open-ocean, pelagic marine reptiles. Without any significant correlation here, we can only conclude either that the FMF metric is not a good sampling proxy or, if it is a good proxy, that we are not observing any significant sampling bias. FMFs also showed no correlation with sea level.
Arguably our most striking result is the strong but negative correlation between sea level and all variants of the completeness metrics (Table 2). Oddly, Mannion and Upchurch (2010) also found a negative correlation between the skeletal completeness of sauropodomorph dinosaurs and sea level, primarily in the Late Jurassic and Early Cretaceous, which they found hard to explain. They suggested that high sea levels might reduce the available land area and so in some way diminish the quality of preservation of sauropod skeletons. In our case, the finding that ichthyosaurs are better preserved at times of low sea level and more poorly preserved at times of high sea level could indicate something about their habitats and eventual death locations. Some classic Lagerstätten, such as Solnhofen, represent shallow settings, whereas others, such as Holzmaden and the Oxford Clay, represent deeper water settings, yet all of these formed at times of high global sea level.
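Completeness and sea level are both stage-level time series, and two series that each trend or autocorrelate through time can correlate spuriously. Studies of this kind often detrend such series with generalized differencing before testing; this section does not say whether that was done here, so the sketch below only illustrates one common form of the technique, in plain Python. The function name is hypothetical.

```python
def generalized_difference(series):
    """Generalized differencing of a stage-ordered time series.

    Regress x[t] on x[t-1] by ordinary least squares to get the slope b,
    then return x[t] - b * x[t-1] for t = 1..n-1 (one value shorter than
    the input). This is intended to reduce first-order autocorrelation
    before two series are correlated.
    """
    if len(series) < 3:
        raise ValueError("need at least 3 time bins")
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0:
        raise ValueError("lagged series is constant, so the slope is undefined")
    slope = sum(
        (p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr)
    ) / sxx
    return [c - slope * p for p, c in zip(prev, curr)]
```

One would apply it separately to the completeness series and the sea-level series, each ordered from oldest to youngest stage, and then correlate the two differenced series (for example, with a rank correlation).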
Similar patterns are present in invertebrate species on a more local geographical and temporal scale (Smith et al. 2001). In that case, the culprit appears to be a lack of suitable taphonomic settings: repeated transgressions in the Cretaceous created new areas of onshore, moderate-depth (20–50 m) deposits in which skeletal remains were best preserved (Kidwell and Baumiller 1990). However, these deposits were removed by erosion during the following regression, leaving a sequence dominated by deeper water facies with poorer preservation potential. Many ichthyosaur specimens are found in these shallower water settings (Martill 1986). Rising sea level through the Cretaceous is also negatively correlated with completeness (Table 2), but this correlation is minor compared with those found for the Triassic–Jurassic and for the complete dataset.

Size, geographical location and geology
It was expected that larger ichthyosaurs would be better preserved. This is the norm for most fossil groups, including marine invertebrates (Cooper et al. 2006; Sessa et al. 2009) and some dinosaurs (Brown et al. 2013). Unexpectedly, we found that medium-sized ichthyosaurs had a higher median completeness than small or large ones, although there is much variation within each category (Fig. 6). Smaller specimens were expected to be less well preserved because smaller carcasses are less robust (Brown et al. 2013), but this is confounded by Lagerstätten that can preserve small forms in excellent detail. The largest ichthyosaur specimens might have been expected to be the best preserved, but this category contains many of the incomplete ichthyosaurs from poorly sampled areas such as Argentina and Russia. Mannion and Upchurch (2010) found low completeness for sauropodomorphs despite their large size, indicating that other factors may be at play. On the other hand, they noted that basal sauropodomorphs were the most complete and titanosaurs the least complete, perhaps reflecting the giant size of the latter. Is there an upper size limit for good preservation quality, and could this apply to ichthyosaurs? There may also be geographical or sampling biases, which are discussed below.
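The size comparison rests on median completeness per size class (Fig. 6) and a significance test that this section does not name. A minimal sketch, assuming a Kruskal–Wallis test (a common nonparametric choice for comparing several groups of scores, not necessarily the test used in the study), is shown below in plain Python. The function names and size-class labels are hypothetical, and the p value uses the usual large-sample chi-squared approximation, which for exactly three groups (2 degrees of freedom) has the closed form exp(-H/2).

```python
import math
from collections import Counter
from statistics import median


def kruskal_wallis_3(groups):
    """Kruskal-Wallis H test for exactly three groups, with tie correction.

    Under the null hypothesis H is approximately chi-squared with
    k - 1 = 2 degrees of freedom, whose upper tail is exp(-H / 2).
    Returns (H, approximate p value).
    """
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("need exactly three non-empty groups")
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Mean 1-based rank for each distinct value, so ties share a rank.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all scores are identical")
    # Clamp tiny negative values produced by float rounding.
    h = max(h / correction, 0.0)
    return h, math.exp(-h / 2)


def compare_size_classes(by_class):
    """Median completeness per size class plus a Kruskal-Wallis test.

    by_class maps a size-class label (e.g. "small", "medium", "large")
    to a list of completeness scores for specimens in that class.
    """
    medians = {label: median(scores) for label, scores in by_class.items()}
    h, p = kruskal_wallis_3(list(by_class.values()))
    return medians, h, p
```

Reporting medians alongside the test keeps the direction of any difference visible, since H on its own says only that the classes differ, not which one is most complete.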
Ichthyosaur specimens from the southern hemisphere tend to be much less complete than those from the northern hemisphere (Fig. 7). This could arise either from low preservation potential of facies in the south or, more likely, from a lack of study. Argentinian ichthyosaur specimens, for example, show consistently low completeness scores, yet the vast majority of southern hemisphere ichthyosaurs originate from mudstones or even black shales, which should be associated with high completeness. This finding can be seen as only provisional, however, because the southern hemisphere sample is very small, with only 21 specimens. Most localities in the northern hemisphere have been studied for a long time and have yielded dozens or hundreds of specimens. Argentina, on the other hand, continues to produce new species of ichthyosaur (Fernández and Maxwell 2012), and many countries have barely been explored by palaeontologists. Brocklehurst et al. (2012) found a similar contrast between north and south for Mesozoic birds, with the majority of specimens originating from modern latitudes of 30–60°N.
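The hemisphere contrast compares two samples of completeness scores, one of them small (21 southern specimens). A minimal sketch of one way to test such a contrast, assuming a two-sided Mann–Whitney U test with the normal approximation (an illustrative choice; the study's own test is not stated here), is given below in plain Python. The function name is hypothetical, and for a sample as small as 21 an exact or permutation version of the test would be a reasonable alternative.

```python
import math
from collections import Counter


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U for sample a, z, p). Tied values share their mean rank and
    the variance of U is tie-corrected; no continuity correction is used.
    """
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted(list(a) + list(b))
    n = n1 + n2
    # Mean 1-based rank for each distinct value in the pooled sample.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    r1 = sum(rank_of[v] for v in a)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all values are identical")
    z = (u1 - mean_u) / math.sqrt(var_u)
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, z, p
```

Called with the northern scores as `a` and the southern scores as `b`, a positive z indicates that northern specimens tend to rank as more complete.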
Our finding that sedimentary grain size does not affect fossil completeness (Fig. 8) was unexpected. It was predicted that fine-grained sediments would host higher quality fossils than coarse-grained rocks. This is because grain size controls sediment permeability: the finer the grains, the narrower the pore spaces between them, and the lower the permeability. Finer grains restrict ion transport in pore waters, impeding decay-causing bacteria (Allison 1988a), so such sediments tend to produce fossils of much higher completeness than coarse-grained sediments. The controls on preservation quality are likely more varied than grain size alone. Each sedimentary lithology represents a different depositional environment that can exert a variety of controls on preservation potential and thus on specimen completeness. For example, black shales deposited in deep, anoxic waters represent ideal conditions for excellent preservation of marine fossils (Allison 1988a, b), but other kinds of mudstones, such as those deposited at delta fronts, lack such properties.
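The link between grain size and permeability invoked above can be made concrete with the Kozeny–Carman relation, a standard idealization for a packed bed of uniform spheres that is not used in this study. It expresses permeability k in terms of grain diameter d and porosity φ:

$$
k = \frac{\phi^{3}\, d^{2}}{180\,(1-\phi)^{2}}
$$

At fixed porosity, k scales with d², so a sediment whose grains are 100 times finer is roughly 10⁴ times less permeable. Natural muds depart from this idealization (platy clay minerals, poor sorting, compaction), so the relation indicates direction and order of magnitude rather than actual values for the formations studied here.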
The higher chance of good preservation of ichthyosaurs in siliciclastic than in carbonate sediments (Fig. 8) may reflect a greater abundance of benthic organisms in marine carbonates than in muds and sands, especially in coral reefs. Benthic organisms often scavenge carcasses on the seafloor, which can scatter parts and reduce their completeness. These organisms need an oxygenated water column to thrive; the most complete ichthyosaur fossils often come from areas of anoxia, which cannot support benthic organisms, and these are generally black shales, as noted above. Best and Kidwell (2000) likewise found higher quality preservation in siliciclastic sediments than in carbonates or mixed siliciclastic–carbonate sediments.
Some fine-grained siliciclastic rocks in this study do not yield ichthyosaur fossils of high completeness, whether because of the original environment of deposition or because the sites today lie in inaccessible or remote areas (e.g. Spitsbergen, Norway), where complete skeletons are hard to recover or specimens are prone to intense weathering. There is also the issue of immature sampling in particular geographical areas, as mentioned above. More data are required from localities that have not been intensely sampled, to identify areas of particularly poor preservation and elucidate causal mechanisms.

Implications for palaeodiversity studies
This study is the first to examine specimen completeness in a group of marine vertebrates. It has revealed that ichthyosaur fossil completeness varies greatly through the Mesozoic and has clarified how skeletal completeness relates to diversity and various sampling proxies. It is widely agreed that the vertebrate fossil record is incomplete and poorly sampled (Benson et al. 2010; Mannion and Upchurch 2010; Benson and Butler 2011; Benton et al. 2011, 2013), but determining the amount of error is extremely difficult. Widespread covariation of rock record and palaeodiversity signals has frequently been interpreted simply as evidence of bias, but it could equally be explained by a common cause model, or by varying degrees of redundancy between palaeodiversity and sampling proxy signals (especially counts of formations or collections for sparsely sampled taxa).
In the case of ichthyosaurs, it is evident that sampling worldwide has been extremely uneven, with long histories of collecting in western Europe, but relatively limited collecting in many other parts of the world. With greater effort devoted to collecting in the southern hemisphere, some of the inequalities of human sampling effort could be mitigated.
The absence of a relationship between numbers of fossiliferous formations and apparent diversity suggests that unevenness in knowledge of ichthyosaurs may result from other factors. Dunhill et al. (2014a) showed that formation count may fail as a sampling signal because it is redundant with recorded diversity. Clearly there are also geographical inequalities in sampling, and these may reflect differences in human effort devoted to particular time bins and to northern, rather than southern, continents. Most important was the evidence we have identified for selectivity in skeletal completeness scores relating to specimen size and facies. The most favourable conditions for high skeletal completeness scores were for medium-sized specimens preserved in fine-grained siliciclastic rocks (particularly in black shales).
There is no reason that SCMs should correlate with other sampling metrics that attempt to quantify rock volume or rock availability, such as formation counts (Mannion and Upchurch 2010). Completeness metrics, however, have a direct causal link to observed diversity (Benton et al. 2013), as taxonomic identification requires a certain level of completeness. We therefore expected to find a correlation between completeness and diversity, but this was not the case. It has yet to be determined whether ichthyosaur specimen completeness correlates with the sampling-corrected diversity signal, as Brocklehurst et al. (2012) found for Mesozoic birds.

CONCLUSIONS
1. A study of 351 specimens belonging to 102 ichthyosaur species from the Olenekian to the Cenomanian shows that skeletal completeness does not correlate with diversity and thus is likely not a good global-scale sampling proxy. Completeness does have a relationship with collection counts, but the weakness of the correlation means that only tenuous conclusions can be drawn.
2. Completeness fluctuated throughout the Mesozoic, with times of high quality often marked by localities of exceptional preservation (Lagerstätten). Times of low completeness were also identified: the Ladinian, much of the Middle Jurassic (Aalenian–Bathonian) and most of the Early Cretaceous. These times of low data quality should be taken into account when examining apparent ichthyosaur diversity.
3. Completeness is affected by ichthyosaur body size. Fossils of medium body size are the best preserved, but small sample size may account for the apparently poorer preservation of larger ichthyosaurs, whose robust bodies we would expect to be more resistant over time.
4. Ichthyosaurs from the northern hemisphere are much more complete than those from the southern hemisphere; it is unclear whether this is due to sampling or geological biases. The prevalence of fine-grained siliciclastic formations in the south suggests the former is more likely.
5. Facies composition has a significant effect on fossil completeness, with fine siliciclastic sediments showing the highest preservation potential, particularly if associated with anoxia, as in black shales. Coarse carbonate and coarse siliciclastic sediments appear to have the poorest preservation potential, perhaps because of their greater permeability to pore waters carrying oxygen and decay bacteria.
6. Skeletal completeness varies negatively with global sea level, which relates to the availability of suitable facies for the preservation of fossils.
7. Completeness metrics are an effective proxy for highlighting preservational bias in the fossil record. However, they do not capture the entirety of the bias signal and cannot explain ichthyosaur diversity patterns. Further study is required to obtain a larger picture of ichthyosaur fossil completeness through the Mesozoic, and to understand how this may affect observed diversity and the perceived evolution of the ichthyosaurs.