On formation‐based sampling proxies and why they should not be used to correct the fossil record

The fossil record is a unique resource on the history of life, but it is well known to be incomplete. In a series of high‐profile papers, a residual modelling technique has been applied to correct the raw palaeodiversity signal for this bias and incompleteness, and the claim is made that the processed time series are more accurate than the raw data. We apply empirical and simulation approaches to test for correlation and directionality of any relationships between rock and fossil data. The empirical data comprise samples of the global fossil record through the Phanerozoic, and we use simulations to assess whether randomly sampled subsets of modelled data can be improved by application of the residual modelling technique. Our results show that using formation counts as a sampling proxy to correct the fossil record via residual modelling is ill founded. The supposedly independent model of sampling is information‐redundant with respect to the raw palaeodiversity data it seeks to correct, and so the outputs are generally likely to be further from the truth than the raw data. We recommend that students of palaeodiversity cease to use residual modelling estimates based on formation counts, and suggest that results from a substantial number of papers published in the past ten years require re‐evaluation.

T H E fossil record provides the only direct evidence which can be used to analyse biodiversity patterns over extended periods of geological time (Raup 1972;Smith 2007). However, it is generally accepted that the fossil record is compromised by incompleteness and bias, and therefore numerous methods have been employed to try to recover a bias-free, or corrected, palaeodiversity signal (Raup 1972(Raup , 1976 A commonly implemented technique for correcting for sampling biases in palaeodiversity studies is the residual modelling method, proposed by Smith & McGowan (2007) and refined by Lloyd (2012). The method employs a model-fitting approach using sampling proxies to identify times of poor and good sampling and to apply post hoc corrections. The residual modelling method is worth exploring in some detail because it has become the method of choice for a large number of high-profile papers, many of which make radical claims about macroevolutionary consequences after correction of the data; it has been cited over 175 times (Google Scholar, September 2017).
The most common sampling proxies used in conjunction with the residual modelling technique are counts of fossiliferous formations per time bin (Fr€ obisch 2008; Barrett et al. 2009;Benson et al. 2010;Benson & Butler 2011;Benson & Upchurch 2013;Dean et al. 2016). These are normally compiled in one of two ways, either as: (1) strict fossiliferous formation counts that contain only fossils of members of the clade of interest, i.e. clade-bearing formations (CBFs) (Fr€ obisch 2008;Barrett et al. 2009;Butler et al. 2009); or (2) total fossiliferous formation counts (TFFs) (Marx & Uhen 2010), which include all fossiliferous formations in which particular fossils of the group in question could have possibly occurred (see Table 1 for a list of acronyms complete with explanations). Although it has been claimed that formation counts are suitable sampling proxies, as they summarize aspects of rock volume, facies heterogeneity, geographical and temporal dispersion, and research effort (Benson & Upchurch 2013), many studies have criticized the use of such approximations (Crampton et al. 2003;Dunhill 2011;Dunhill et al. 2014b;Benton 2015). Perhaps the most prominent criticism has been the identification of redundancy between formation counts and raw palaeodiversity data (Benton et al. 2011;Dunhill et al. 2014b;Benton 2015) in other words, that one time series is as likely to drive the other as vice versa. For sparsely occurring fossils such as dinosaurs, the two time series are essentially the same (Benton 2015). Dunhill et al. (2014b) provided a quantitative assessment of this issue using Information Transfer (IT) (Hannisdal 2011a, b;Hannisdal & Peters 2011) and concluded that, in the British marine fossil record, the strong association between formation counts and raw diversity is best explained as a result of redundancy.
Here, we further test the redundancy hypothesis on both observed global fossil data from the Ordovician-Neogene, and on simulated data, expanding on the approach of Brocklehurst (2015). Specifically, we confirm two predictions of the redundancy hypothesis, namely that: (1) raw palaeodiversity is more information-redundant with CBFs than TFFs; and (2) even in a random world, CBFs can be driven more by diversity than by sampling.

Data
Global Phanerozoic generic occurrence databases for a number of major marine clades, Arthropoda, Bivalvia, Brachiopoda, Cephalopoda, Echinodermata, Foraminifera and Vertebrata, along with numbers of clade-bearing and total marine fossiliferous formations, were downloaded from the Paleobiology Database (PBDB) (http://paleobiodb.org; Clapham et al.

Simulations
The simulations used in this study are an expansion of those presented by Brocklehurst (2015). The original simulation used a birth-death model with parameters for dispersal and local extinction to simulate the evolution of a clade over ten notional geographical regions. Within each region there were 10 localities, which would become the fossil-bearing localities sampled by palaeontologists. To simulate the taphonomic removal of specimens, each species was subject to random deletion from each locality in which it lived. The probability of a species being sampled at each locality is hereafter called PTAPH. To simulate the removal of fossil-bearing formations from the record, by erosion for example, being covered or simply not yet being found, entire regions were sampled with the probability PFORM. The number of regions sampled (not randomly deleted) in each time bin was stored to represent use of the number of fossil-bearing formations as a sampling proxy. Finally, individual localities within the sampled regions were sampled with a probability PLOC, and the number of localities sampled in each time bin was stored as a second sampling proxy. For further details of the original simulation see Brocklehurst (2015).
The simulation presented in this study includes two modifications to Brocklehurst (2015). First, having simulated a clade via the birth-death model, a smaller clade from within this was chosen at random to be the clade of interest. This allowed a comparison of two different classes of sampling proxy: formations bearing the clade of interest (CBFs), and formations bearing the larger clade containing the clade of interest, the wider clade-bearing formations (WCBFs). This latter class of proxy has been used in a number of studies (Marx & Uhen 2010;Mannion et al. 2011;Brocklehurst et al. 2012Brocklehurst et al. , 2013 in an attempt to circumvent the issues of redundancy and nonoccurrences (palaeontologists having examined rocks of a particular age but not having found representatives of the clade of interest). As noted before (e.g. Benton et al. (2011), palaeontologists often used CBFs as yardsticks of sampling but, unlike in standard ecological sampling theory, they ignored null returns (non-occurrences), so choosing to exclude powerful evidence for relative sampling quality between temporal or spatial bins.
The second change was to add another ten regions to the simulated landscape, which the simulated clade would not be permitted to enter. However, these regions and the localities within were subjected to random deletion as described above, and could be counted towards a sampling proxy (TFFs). This allows assessment of whether all sampled formations/localities should be included in a sampling proxy, or whether the analyst should be more selective and only count those in which the clade of interest could have lived; for example, when examining a clade found only in shallow marine shelf environments, should deep marine formations be included? With these additions to the simulation, four classes of proxy could be generated: (1) sampled formations/localities bearing the clade of interest (CBFs); (2) sampled formations/localities bearing the wider clade containing the clade of interest (WCBFs); (3) sampled formations/localities which the clade of interest could potentially have entered, i.e. potential clade-bearing formations (PCBFs); and (4) all sampled formations/localities (TFFs). Localities were not tested because the original simulations showed that they were a poor proxy for sampling (Brocklehurst 2015). The code for the simulations is presented in Dunhill et al. (2017).
Eight clades were simulated (Dunhill et al. 2017). Speciation and origination rates were equal, so the diversification of the clade proceeded via a random walk. Clades that did not survive for at least 100 time bins were discarded, as were clades containing fewer than 1000 and more than 3000 taxa. The birth-death model output was converted to a phylogeny, and nodes were selected at random one at a time. Once a node containing at least 25% of the total number of taxa but no more than 75% was selected, this became the clade of interest.
For each clade, we varied PFORM amongst time bins in order to capture fluctuating sampling levels through the time series. Therefore, we can test whether formation residual corrections perform better than raw diversity when sampling is heterogeneous and test the level of sampling at which raw taxonomic counts stop being more reliable proxies for true diversity. Hence, we are giving formation counts the best possible chance of being a good proxy for sampling (as varying sampling is dictated entirely by PFORM). Three sets of simulations were carried out incorporating different degrees of variation of PFORM. The first allowed PFORM to vary from 0.1 to 0.9, the second from 0.2 to 0.6, the third from 0.3 to 0.5. In each time bin in each simulation, a value of PFORM was selected at random from a uniform distribution covering the permitted range of values. PLOC and PTAPH did not vary across time bins within any of these simulations.
At each sampling level, 100 simulations were run. For each, a raw diversity estimate was calculated along with the four classes of formation-based sampling proxies. These were used to calculate four residual diversity estimates. The coefficient of determination (R 2 ) was used to quantify the shared variation between the residual diversity estimates and the true diversity estimates, and also shared variation between the sampling proxies and both the true diversity and the raw diversity estimate. Examples of the simulated diversity estimates and proxies are shown in Figure 2.
In the event, all eight clades showed extremely similar results across all sampling regimes, and so we report our statistical analyses for one example clade. For the lowest sampling level (0.1), the simulated raw diversity was in some cases too sparse for statistical analysis, hence we report our analyses of the example clade at three levels of PLOC and PTAPH: 0.2, 0.5 and 0.8.

Statistical analysis
Here we are primarily interested in quantifying the relative strength of statistical associations between sampling proxies and raw diversity, and, in the simulated cases, between sampling proxies, residual diversity estimates, true diversity, and true sampling. For the empirical data, we first carried out pairwise Spearman rank-order correlation tests between formation counts and raw diversity, detrended by generalized differencing, using an R script by G. Lloyd (http://www.graemetlloyd.com/pubdata/func tions_2.r). False discovery rate corrections for multiple comparisons were applied using the method of Benjamini & Hochberg (1995).
Next, we evaluated the relative strength of statistical associations between pairs of time series using two quantities: (1) the coefficient of determination (R 2 ), calculated as the square of the Pearson product-moment correlation, equivalent to a linear regression with intercept term; (2) pairwise information transfer (IT), a more generalized measure of shared information, calculated in each direction, X?Y and Y?X (Hannisdal 2011a, b). To minimize directional bias due to differences in non-stationarity, time series were detrended (linearly, or, if necessary, using a higher-order polynomial fit) to satisfy a stationarity criterion (Kwiatkowski et al. 1992). All records were log transformed and normalized to mean zero and unit standard deviation prior to analysis. To quantitatively characterize the robustness of the IT and R 2 , each time series was randomly subsampled across a spectrum of sample sizes, down to half the total number of time bins, with 200 iterations for each sample size. IT and R 2 results were then integrated across the subsampling spectrum. For each iteration, we computed the corresponding IT and R 2 values for 500 amplitude-adjusted Fourier transform surrogate time series.
Note that we use R 2 as a relative measure of shared variation, not as a basis for significance testing. We thus calculated R 2 on both raw time series and on the detrended series used in IT analysis. Similarly, we use IT primarily as a relative measure of generalized statistical dependence (mutual information), but we only report pairwise IT results for detrended data to avoid non-stationarity bias. For the residual diversity simulations, we evaluated the relative degree of association in sets of three variables using two quantities: (1) the partial rank (Spearman) correlation between X and Y, conditioned on a third variable Z; (2) conditional IT between X and Y when taking into account shared information with a third variable, Z. The analytical settings for conditional or partial analyses were identical to the pairwise analyses, except that the data were not detrended, iterations corresponded to simulation runs, and the surrogates were random shuffles of the original data.
To facilitate the comparison of IT and correlationbased results, all values are reported as relative to the 99th percentile of a distribution of values calculated for 500 surrogate time series (Fig. 3). The surrogates can be thought of as a null distribution unique to each combination of variables, but here we are not focusing on hypothesis (significance) testing. For the simulation results, significance testing is unwarranted. Instead, we are interested in the relative strength of statistical association as a measure of the degree of shared variation and information redundancy.

Empirical data
All clades in the analyses of empirical data show closer correlations between CBFs and raw diversity than between TFFs and raw diversity ( Table 2). All clades show significant correlation with TFFs, although the correlations for arthropods and vertebrates become nonsignificant after correction for multiple comparisons (Table 2).
There is strong bidirectional IT between raw diversity and CBFs in all clade data sets (Fig. 4). In general, there is less IT between raw diversity and TFFs (Fig. 4). On average, only bivalves and echinoderms have detectable IT with TFFs (i.e. the median being above the zero line). The coefficient of determination (R 2 ) for the raw time series (Fig. 5A) and the detrended data ( Fig. 5B) is consistent with the correlation and IT results in terms of the relative strength of CBFs and TFFs with respect to raw diversity within each clade. The R 2 on detrended records (Fig. 5B) is also in broad agreement with the pairwise IT analysis (Fig. 4), which is to be expected when the statistical associations are predominantly linear or monotonic: Within lineages, raw diversity is more strongly associated F I G . 2 . Examples of simulation output, illustrating the true diversity, the raw diversity, clade-bearing formation (CBF) and potential clade bearing formation (PCBF) proxies and residual diversity estimates calculated from these proxies. PFORM varies between 0.1 and 0.9 in each time bin. A-B, PLOC (random sampling of individual localities) and PTAPH (random sampling for taphonomic reasons) = 0.5; A, individual localities (PLOC). C-D, PLOC and PTAPH = 0.8. Colour online.
with CBFs than TFFs, and across lineages, TFFs tend to be below zero, on average (median).

Simulated data
The simulations demonstrate, as expected, that as sampling increases, the various residual diversity estimates display an increasingly better fit to true diversity, and the fit of the various proxies to the raw diversity decreases (Fig. 6A). CBFs show by the far the best fit to the raw diversity, but when used to calculate the residual diversity estimate, produce by the far the worst fit to the true diversity, even worse than the raw diversity estimate at low sampling levels. TFFs sampled in each time bin have the worst fit to the raw diversity. However, the best residual estimates are produced using formations that the clade of interest was 'allowed' to enter (i.e. PCBFs), despite the poor fit between this proxy and raw diversity (Fig. 6B). Residuals calculated using WCBFs fit the raw diversity estimate better than those calculated using CBFs, but not so well as those calculated using PCBFs. When we compare the four proxies to PFORM (our actual measure of true sampling heterogeneity) TFF shows the best correlation, in spite of performing extremely poorly when used to calculate residual diversity (Fig. 6C). This shows that the best proxy for indicating sampling heterogeneity is not the best proxy for producing residual diversity estimates.
Conditional IT analysis of the simulated data shows that residual diversity estimates based on CBFs are worse predictors of true diversity than raw diversity unless the sampling level is very high (i.e. PLOC and PTAPH values are high and PFORM variability is high) (Fig. 7).
However, residual diversity estimates based on WCBFs, PCBFs and TFFs all show improved estimates of true diversity relative to raw diversity. The best predictor of true diversity comes from residual diversity estimates based on PCBFs (Fig. 7). The conditional IT and partial correlations agree when it comes to the relative strength of associations (Fig. 7). By looking at the relative strength of influence of true diversity and sampling on the different proxies, we can tease apart redundancy between the various proxy classes and true diversity. Conditional IT from true diversity to CBFs, beyond shared information with sampling (i.e. PFORM), demonstrates why residual diversity based on CBFs performs worse than the other proxies. At low sampling levels, CBFs are driven more by the true diversity of the clade of interest than by sampling (i.e. PFORM), therefore we can interpret this as information redundancy between CBFs and true diversity (Fig. 8). However, as PFORM variability increases, the sampling signal swamps all other signals in the proxy. As we  increase the phylogenetic scope of the formation counts, from WCBFs via PCBFs to TFFs, the sampling signal becomes more dominant, because any diversity signal in these proxies is less redundant with the true diversity of the sub-clade of interest (Fig. 8). Again, the conditional IT results agree with the partial correlation results (Fig. 8).

DISCUSSION
The strong correlations observed between CBFs and raw diversity could indicate severe temporal sampling bias in the fossil record. It could be said that the weaker correlations observed between TFFs and raw diversity (or complete lack of correlation in some clades) mean that TFFs are not as effective a sampling proxy as CBFs. However, when we consider that the IT between CBFs and raw diversity is strongly symmetrical (Fig. 4), it is not possible to claim that one signal drives the other more than vice versa. The relative strength of relationships between the CBFs/TFFs and raw diversity is confirmed by the R 2 values (Fig. 5). This result is supported by the simulation results, which not only show that, at low sampling levels, residual diversity estimates based on CBFs are worse predictors of true diversity than the raw data (e.g. Fig. 7B), despite showing strong correlation with raw diversity, but also that CBFs are driven more by true diversity than they are by sampling intensity, even when we give formations the best possible chance of being indicative of sampling (e.g. Fig. 8B). It is, therefore, a cause for concern that studies that have used CBFs as sampling proxies, such as Fr€ obisch (2008)  F I G . 7 . At low sampling levels, residual diversity estimates using clade-bearing formation counts (CBFs) are further from the truth than raw data. Plotted values are integrated, conditional IT (A) and partial correlation coefficients (B) on simulated time series of true diversity (Div), raw sampled diversity (Raw), and residual diversity estimates (RDE). Panel rows correspond to the four different proxy classes (CBF, WCBF, PCBF and TFF) used to compute the RDE. Panel columns correspond to different levels of variability in sampling (PFORM), increasing from left to right. Values on the abscissa represent the probability of location sampling (PLOC) and preservation (PTAPH). Conditional IT from raw to true diversity, given RDE (Div Raw | RDE) quantifies how much information raw diversity provides on true diversity beyond the information already contained in RDE. Conversely, conditional IT from RDE to true diversity, given raw diversity (Div RDE | Raw) quantifies how much information RDE provides on true diversity beyond the information already contained in raw diversity. Partial correlations can be interpreted analogously. All IT and correlation values are relative to the 99th percentile of surrogate time series. Filled circles are medians, and error bars encompass the range of IT or correlation values obtained across the 100 simulation runs, with 500 surrogates for each run. Colour online. corrected data that may be further from the truth than the original raw data. Our results emphasize that the strength of the correlation between sampling proxies and raw diversity is not a good indicator of the degree of sampling bias in the data. It has previously been suspected that formation counts and raw palaeodiversity are information redundant, for two reasons: tallying and comparability. The palaeontological literature grows by the discovery of new fossils, and as those fossils are added to the catalogues of palaeodiversity, so too are the formations in which they occur. This is why, for sparsely occurring fossils at least, the tally of known fossils of group A (i.e. diversity), and the count of known formations yielding fossils of group A (the CBF, i.e. dispersal), both grow in tandem and both equally reflect intensity of sampling (Benton 2015). Therefore, they both equally reflect estimated palaeodiversity and sampling effort. The comparability argument can be put in two ways: first, geological formations vary in volume over eight orders of magnitude, from 0.073-225 000 km 3 (Benton et al. 2011;Dunhill et al. 2012) so they are a poor means of binning data and making comparisons of any kind. Second, their definition can depend on the richness of their fossil content (an aspect of redundancy that the simulations cannot account for) so greater environmental or faunal turnover may enable finer stratigraphic partitioning (Dunhill et al. 2014a, b). It is highly likely, especially in the case of CBFs, where formations that only contain the specific fossil group under study are used as sampling proxies, that raw palaeodiversity drives formation counts every bit as much as formation counts drive raw palaeodiversity. Even total formation counts may be inextricably linked to palaeodiversity, as exemplified by the Triassic-Jurassic fossil record of Great Britain, where formation boundaries are not independent of changes in fossil richness (Dunhill et al. 2014b). Benton (2015 showed that the discovery of new dinosaurs through research time has been closely linked to the discovery of new dinosaur-bearing formations (or the reverse), and the fact that they covary does not mean that one signal can be used to correct the other. Thus, CBFs only allow for the quantification of what has already been sampled and make no allowance for future possibilities to sample, whereas WCBFs, PCBFs, and TFFs offer some perspective on the 'unknown' (Benton et al. 2011(Benton et al. , 2013. Therefore, sampling proxies that include formations that have not yielded fossils of the clade in question a priori represent a better sampling proxy than CBFs, because the former is a closer approximation of supposed total sampling effort and its underlying driver (i.e. availability of sedimentary rock) whilst the latter ignores all sampling that failed to find the group in question. However, the simulations assume that the definition of formations is random with respect to fossil diversity, which is likely to be violated in the real world due to mutual dependencies with facies changes. In addition to this, our empirical results show that IT is still symmetrical, although much weaker between TFFs and raw diversity, suggesting that this supposed sampling proxy may also be information redundant with raw diversity, despite not being as closely linked.
The simulations do show that including formations containing the wider clade of interest or, better still, the potential to include the clade of interest, produces residual diversity estimates that are more similar to true diversity than the raw data, albeit in a best-case scenario where formations equal true sampling. We also show that all sampling proxies perform better at higher levels of sampling. This gives some confidence in using residual modelling as a conservative method to correct for sampling bias in the fossil record, but only if the correct sampling proxy is used and our sampling of the fossil record is already very good. Therefore, the choice of sampling proxy is important and the problem therein is to ensure that the correct sampling proxy is employed. For example, the best performing sampling proxies are counts of formations that could potentially yield fossils of the clade of interest. How do we go about defining which formations might preserve our clade of interest? Whatever the answer, it is unavoidably highly subjective. It may be F I G . 8 . At low sampling levels, the clade-bearing formation (CBF) sampling proxy is driven more by diversity than by sampling. In the simulations, sampling variability is entirely driven by PFORM, and entirely random with respect to true diversity. Plotted values are integrated, conditional IT (A) and partial correlation coefficients (B) on simulated time series of a sampling proxy (Proxy), true diversity (Div), and true sampling (PFORM). Panel rows correspond to the four different proxy classes (CBF, WCBF, PCBF and TFF). Panel columns correspond to different levels of variability in sampling (PFORM), increasing from left to right. Values on the abscissa represent the probability of location sampling (PLOC) and preservation (PTAPH). Conditional IT from true diversity to the sampling proxy, given true sampling (Proxy Div | PFORM) quantifies how much information true diversity provides on the sampling proxy, beyond the information already contained in PFORM. Conversely, conditional IT from PFORM to the sampling proxy, given true diversity (Proxy PFORM | Div) quantifies how much information PFORM provides on the sampling proxy beyond the information already contained in true diversity. Partial correlations can be interpreted analogously. All IT and correlation values are relative to the 99th percentile of surrogate time series. Filled circles are medians, and error bars encompass the range of IT or correlation values obtained across the 100 simulation runs, with 500 surrogates for each run. Colour online. easier to define a higher clade of interest, but there is still the question of how wide a clade is required to reach the optimum in estimating sampling intensity. Neither should we rely on a sampling correction method that cannot cope with poor sampling, particularly as the residual modelling approach has been widely used when the empirical fossil record (e.g. vertebrates) has been deemed too poor for sampling standardization approaches.

CONCLUSION
Our results indicate that the close correlations commonly observed between formation counts and raw palaeodiversity are often the result of information redundancy, rather than evidence of large-scale temporal sampling biases. It is therefore inadvisable to use metrics of formation counts to 'correct' raw fossil diversity using a residual modelling approach for two reasons: (1) easily definable formation counts produce inaccurate residual diversity estimates, i.e. CBFs, are information redundant with regard to raw sampled diversity and perform increasingly poorly as sampling levels are degraded; and (2) formation counts, shown to produce more accurate residual diversity estimates in simulated data, are difficult and subjective to define in empirical studies. Coupled with results from similar simulation studies showing that the residual modelling technique performs less well than phylogenetic diversity estimates and even more poorly than raw fossil diversity estimates when using CBFs (Brocklehurst 2015), the fact that formation counts have been repeatedly shown to be a poor proxy for actual sampling effort (Crampton et al. 2003;Dunhill 2011) and a recent critique of the methodology of residual modelling showing that there are objective statistical flaws in the residual modelling method as most commonly applied (Sakamoto et al. 2017), our results suggest that residual modelling using sampling proxies is not an appropriate method for correcting for temporal sampling biases in the fossil record.
Our study does not mean that the fossil record is not biased, it undoubtedly is, but there are much more appropriate methods available to palaeontologists to address this issue than residual modelling based on formation counts (Smith 2007;Alroy 2010;Benton et al. 2011;Hannisdal et al. 2012Hannisdal et al. , 2017Liow 2013;Starrfelt & Liow 2016;Walker et al. 2017). Second, any prior assumption that the fossil record is generally very poor and biased in a major way should be reconsidered on a case-by-case basis. It might just be that the fossil record is adequate, via the application of appropriate analytical techniques, for many of the macroevolutionary and palaeobiological questions that interest modern palaeontologists.
Acknowledgements. We thank James Crampton, Noel Heim and Andrew Smith for providing constructive critiques which greatly helped improve this manuscript. We thank Sally Thomas for providing editorial comments. This is Paleobiology Database official publication 294. AMD is funded by a Leverhulme Early Career Fellowship (ECF-2015-044) and a NERC research grant (NE/P013724/1). BH is funded by the Research Council of Norway grant 231259, and by the Bergen Research Foundation. NB is funded by Deutsche Forschungsgemeinschaft grant FR 2457/5-1, awarded to Professor J€ org Fr€ obisch.
Author contributions. AMD, BH and MJB conceived the project. AMD processed the data. BH and AMD performed the statistical analyses. NB performed the simulations. AMD led the writing of the manuscript. All authors reviewed and edited the final manuscript.