Histological skeletochronology indicates developmental plasticity in the early Permian stem lissamphibian Doleserpeton annectens

Abstract Doleserpeton annectens is a small‐bodied early Permian amphibamiform, a clade of temnospondyl amphibians regarded by many workers to be on the lissamphibian stem. Most studies of this taxon have focused solely on its anatomy, but further exploration of other aspects of its paleobiology, such as developmental patterns, is critical for a better understanding of the early evolutionary history of lissamphibians. Here, we present a histological analysis of growth patterns in D. annectens that utilizes 60 femora, the largest sample size for any Paleozoic tetrapod. We identified pervasive pairs of closely spaced lines of arrested growth (LAGs), a pattern that indicates a marked degree of climatic harshness and that would result in two cessations of growth within a presumed single year. We documented a wide degree of variation compared to previous temnospondyl skeletochronological studies, reflected in the poor correlation between size and inferred age, but this observation aligns closely with patterns observed in extant lissamphibians. Furthermore, sensitivity analyses conducted by subsampling our dataset at more typical sample sizes for paleontological studies produced a wide range of results. This includes biologically improbable results and exceptionally well‐fit curves that demonstrate that low sample size can produce potentially misleading artifacts. We propose that the weak correlation between age and size represents developmental plasticity in D. annectens that typifies extant lissamphibians. Detection of these patterns is likely only possible with large sample sizes in extinct taxa, and low sample sizes can produce false, misleading results that warrant caution in drawing paleobiological interpretations from such samples.

To date, only two other amphibamiforms have been histologically sampled, the early Permian branchiosaurid Apateon von Meyer, 1844 spp. from Europe  and the Early Triassic micropholid Micropholis stowi Huxley, 1859from South Africa (McHugh, 2015, and neither study provided quantitative assessments of growth. The paucity of work on amphibamiforms stems from the paucity of material; many terrestrial amphibamiforms are represented by few specimens and predominantly cranial material that is not amenable to histological sampling. Doleserpeton annectens is thus unique in being one of the most abundant tetrapods at the Richards Spur locality, rivaled only by the eureptile Captorhinus Cope, 1895(MacDougall, 2017, with dense accumulations of Doleserpeton-bearing matrix referred to as "D-concentrate" (Bolt, 1969). In particular, it is well-represented by long bones, more specifically femora ( Figure 1). Therefore, the abundance of material, in concert with the highly nested position of D. annectens within Amphibamiformes (e.g., Schoch, 2019;Sigurdsen & Bolt, 2010), makes this taxon an ideal candidate for exploring aspects of developmental paleobiology along the presumed lissamphibian stem. The objectives of this study are twofold: (a) to compile and to examine the skeletochronology of D. annectens using long bone histology and (b) to assess the effect of sample size on skeletochronological studies through the use of sensitivity analyses by subsampling the exceptionally large sample size (n = 60) available for this taxon.

| Specimen selection
In order to examine growth patterns in D. annectens, sixty isolated femora from the early Permian karst deposits near Richards Spur, Oklahoma, that were donated by W. May (Figures 1 and 2).
Embedded specimen resin blocks and their associated thin sections are reposited at the Royal Ontario Museum Vertebrate Paleontology collection (ROMVP) and are assigned the catalogue numbers of ROMVP 79310 to 79354, inclusive, and ROMVP 80676 to 80690, inclusive (Table 1).
Specimen identification was based on previous descriptions of the femur of D. annectens (e.g., Sigurdsen & Bolt, 2010). Femora were selected based on completeness and to sample as broad of a size range as possible. Only right femora were chosen to avoid potential sampling of the same individual. General features of terrestrial dissorophoid femora are conserved, but relative osteological development and size-related features make taxonomic distinctions from the large-bodied olsoniforms readily apparent (e.g., Sullivan, Reisz, & May, 2000). However, we note that there is virtually no postcrania (and no limb elements) of the other two Richards Spur amphibamiforms, Pasawioops mayi  and Tersomius dolesensis Anderson & Bolt, 2013. Based on the current understanding of size relationships, P. mayi appears to have reached a much larger adult size than Doleserpeton. The holotype skull measures 32.6 mm in length , but much larger individuals have also been reported (Maddin, Fröbisch, Evans, & Milner, 2013), while D. annectens skulls range from 12 to 19 mm (Bolt, 1969;Sigurdsen & Bolt, 2010). T. dolesensis is somewhat larger than D. annectens; the holotype skull of the former is about 22.5 mm in length, but its level of ossification indicates that it may not be a full adult (Anderson & Bolt, 2013). Material referred to D. annectens is far more abundant than P. mayi and T. dolesensis (Bolt, 1969;Sigurdsen & Bolt, 2010) and further comments on how this affects the results of this study are presented in the discussion.

| Histological preparation
Histological sampling followed standard procedures of the Royal Ontario Museum (ROM). Specimens were photographed in several standard profiles using a Leica DVM6 tilting microscope prior to sampling. Specimens were glued to a base coat to standardize sectioning plane, embedded using Castolite AC resin, allowed to cure for a minimum of 24 hr, and sectioned at approximately the minimum diaphyseal circumference, which does not necessarily correspond to the geometric mid-length of the element. Because of the small size of these specimens, the cut was made slightly to one side of this minimum circumferential region in order to avoid a kerf loss (0.25 mm) that could result in the full loss of the minimum circumference. Specimens were cut using a Buehler IsoMet 1000 fitted with a 0.25-mm wafer blade at a speed of 275 rpm and then mounted to frosted plexiglass slides using cyanoacrylate adhesive. Polishing was done using the Hillquist 1010 grinding cup, followed by manual polishing on glass plates with 1,000-mesh levigated grit and 5-micron aluminum dioxide. Slides were imaged on a Leica DVM6 digital transmitted light microscope using LAS (Leica Application Suite) X software and a Nikon AZ-100 microscope with a Nikon DS-Fi2 camera using NIS Elements BR software.

| Skeletochronological methods
Retrocalculation methods (Woodward, Padian, & Lee, 2013) are often used to infer the amount of skeletochronological information that has been lost due to remodeling of the periosteal bone from within the medullary cavity. Some of the most common methods include section stacking, in which sections of variable size (and presumably variable maturity) are overlaid to infer how many growth cycles have been obscured, and arithmetic models in which the spacing between lines of arrested growth (LAGs) in variably sized specimens is measured and then combined to reconstruct a section with a fully preserved record. Both methods rely on a consistent plane of section at the minimum diaphyseal diameter. This is typically not a major challenge for large-bodied tetrapods in which an entire "puck" containing the minimum diaphyseal region can be extracted and an exact section can be made. However, sectioning small elements that are sometimes less than 5 mm in the longest axis while maintaining a perfectly consistent plane is challenging. As a result, there are clear differences between sections related to the relative cortical thickness and medullary cavity circumference. This does not affect the quality or counting of LAGs when sections are made approximately at this region (personal observation). However, it does interfere with the aforementioned methods in which a perfectly consistent plane is required to properly align growth marks and to properly compare relative cortical thickness. In order to at least account for specimens with the same LAG count, but with disparity in the presence of endosteal remodeling (suggesting that ones with remodeling are older than their LAG count indicates), we added one LAG to each specimen in which remodeling was observed. Both adjusted and nonadjusted data are presented in the results (Figure 3). This is admittedly semi-arbitrary, as some individuals almost certainly lost more than one annual growth cycle to remodeling, but we elected to pursue a more conservative approach given the lack of reliability of retrocalculation methods for this dataset. Without any sort of calibration method similar to mark-recapture studies performed in extant tetrapods, it is unclear at which point more than one LAG may be lost to remodeling, and we did not attempt to define an arbitrary size or inferred age threshold at which to add a second inferred growth cycle.
Double LAGs were counted as one LAG in the skeletochronological analysis. This is based on previous work on both extinct (e.g.,  and extant (e.g., Smirina, 1994)

| Sensitivity analysis
In order to assess the effect of sample size on growth curve reconstructions, we performed a random subselection experiment.
Specimens were randomly sampled (repetition, or selection of the same specimen, was prohibited) in bins of 10, 20, and 30 samples, Under this model, the y-intercept (b) represents the hypothetical femoral length at birth (time = 0), although the femur is virtually assured not to have been fully ossified at this time, and the slope (m) represents the "growth rate" in the sense of change in femur length relative to the proxy for age (LAG count). This model was selected because it has been used in other studies and because the goal of this analysis is only to test the effects of sample size variation, rather than to conclusively determine the best model or to refine an existing one. This was run for 5,000 iterations for each sampling bin. Histograms for the correlation coefficient (R 2 value), the y-intercept, and the slope were then produced. This analysis was run using the retrocalculated data. All data analysis was performed in R Studio v.1.2.1335.

| Histological description and ontogeny
The bone histology and microanatomy of Doleserpeton Bolt, 1969 has been analyzed and described by previous workers (Castanet et al., 2003;de Ricqlés, 1981;Laurin, Girondot, & Loth, 2004)  with other patterns were also noted; these comprise single LAGs (n = 6), double LAGs transitioning to single LAGs toward the periphery (i.e., later in development; n = 7), and an indeterminate pattern without a discernible trend (n = 4). The sample size for these uncommon LAG patterns is too low to derive reasonable quantitative correlations with size regardless of the distribution of the four points; data plotted by LAG pattern are included in Figure   S1. Full-size microphotographs of representative thin sections are included in Appendix 1.

Similar variation was noted in the study of the branchiosaurid
Apateon by , although those patterns may be more readily correlated with the variable paleoaltitudes of montane lakes (but see Laurin & Soler-Gijón, 2010;Schultze, 2009 for alternative paleoenvironmental interpretations). In contrast, LAGs are totally absent in a sampled humerus of the terrestrial amphibamiform M. stowi. This was interpreted as evidence for fast growth during early development among Early Triassic temnospondyls (McHugh, 2015), although in the strictest sense, it indicates only that growth did not totally cease in this taxon.
The presence of fibrolamellar bone in M. stowi (rather than lamellar bone as in Apateon and D. annectens) suggests a faster absolute growth rate that is likely associated with its larger size (Schoch & Rubidge, 2005).
Remodeling is variable across our sample, and secondary osteons are not present, as in Apateon and M. stowi (McHugh, 2015;. Some specimens of D. annectens have a relatively thick layer of endosteal bone that is separated from the periosteal lamellar bone by a line of resorption and that extends around the entire circumference of the medullary cavities. Other specimens only have incomplete rings of remodeling. Osteocyte lacunae are relatively evenly dispersed throughout the cortex, although they cannot always be clearly discerned because they appear to be translucent or are not visible in the focal plane of the thin section when sections are not z-stacked.

| Growth curve reconstruction
LAGs could be definitively counted for 52 of the 60 sections that were produced (

| Sensitivity analysis
The  Castanet, and Francillon (1985) and Castanet and Caetano (1993 (Tinsley & Tocque, 1995), a factor that could have also been influential at low-latitude paleoenvironments of the early Permian such as Richards Spur (e.g., Woodhead et al., 2010 and references therein).
Cessations in growth due to insufficient nutrition may also occur and are usually identified as LAGs with an incomplete circumference, or false LAGs, in extant lissamphibians (e.g., Castanet & Smirina, 1990;Hemelaar, 1985;Sagor, Ouellet, Barten, & Green, 1998) Miaud and Guillaume (2005) showed regular LAG patterns in both populations, which the authors suggested could relate to cessation of growth associated with reduced feeding (of uncertain relationship to reduced prey populations). Although climatic fluctuations may strongly influence prey populations (probably arthropods in the case of Doleserpeton), environmental perturbation is not the only limiting factor on prey density. A more comprehensive survey of the skeletochronology and inferred life histories of the Richards Spur tetrapods will be necessary to more thoroughly explore this hypothesis.

| Skeletochronology
Our analysis produces a widely variable dataset that cannot be wellapproximated by mathematical growth models, although there is a clear degree of correlation between inferred age and femur size (Table 2; Figure 3). Typically, skeletochronological studies of extant lissamphibians have used either linear or von Bertalanffy (VBGM) models to attempt to fit their data to a predictable growth curve.
In this study, it was not possible to fit a VBGM to the data, perhaps because of the relative scaling between the range of sizes among femora and the range of inferred ages in the analysis. The VBGM predicts a rapid growth rate in early ontogeny that plateaus in later stages, and thus, incomplete sampling of the growth trajectory may produce a dataset that cannot be properly modeled using nonlinear models. It is worth noting that the VBGM was originally developed Developmental plasticity is a widespread ecological attribute in extant lissamphibians (e.g., Alcobendas & Castanet, 2000;Augert & Joly, 1993;Denoël & Joly, 2000;Eden et al., 2007) that permits them to adapt to local environments and that in turn produces variation in growth curve reconstructions. This hypothesis can be ex-

| Effects of sample size on growth curve reconstruction
The subsampling experiments that we performed here indicate that growth curve reconstruction is extremely susceptible to low sample size. Some permutations of the smallest subsampling (n = 10), a common sample size for extinct tetrapods (see Steyer et al., 2004 Linnaeus, 1758;Emerson, 1988). This diminution occurs during metamorphosis when much of the skeleton remains unossified and would thus not be reflected in the long bone histology, and the disparity between larval and adult sizes is probably related to an unusually large tadpole stage that evolved in response to environmental harshness (Emerson, 1988). In a similar vein, a few replicates with a subsample of n = 10 produced y-intercept values that greatly exceeded the size of all specimens sampled in this analysis.
Sample sizes are typically low for any extinct tetrapod because of taphonomic biases, incompleteness of specimens, and restricted access to materials for destructive sampling. As a result, these datasets are less robust than those that can be produced for extant tetrapods, especially lissamphibians, for which several hundred individuals may be sampled by using relatively noninvasive methods such as toe clippings. The few studies that have previously produced a growth curve for temnospondyls (e.g., Konietzko-Meier & Sander, 2013;Steyer et al., 2004) have sampled relatively few (<12) specimens but also produced a well-fit curve (R 2 > .90) using linear models (either simple linear or second-order polynomial). Although this may be interpreted as reflective of low intraspecific variation throughout ontogeny, such results may also be interpreted as an artifact of low sampling size or incomplete sampling of the entire ontogenetic trajectory (i.e., only a stage that is well-approximated by linear models) based on our findings. A strong fit is not equivalent to biological accuracy. This is especially salient if it has not been tested whether multiple specimens of the same size are of the same predicted maturity. Typically, paleohistological studies of growth have tried to sample as broad of a size range as possible, but this often results in large gaps in size between specimens and an absence of testing of multiple specimens of a comparable size, which is arguably the more direct way to assess whether size and age are tightly correlated.
These findings are not meant to suggest that we interpret wellfit linear models like those of Steyer et al. (2004) and Konietzko-Meier and Sander (2013)  However, our unusually high sample does provide a cautionary tale showing that limited sample size and limited sampling of specimens of the same size (to test whether they are the same age) can produce misleading or biologically implausible results. Regardless of whether a sample size for a paleohistological study is considered to be standard for the field, this does not negate the fact that interpretations derived from low sample size are inherently tenuous.
Obtaining a strongly fit curve (or line) to a dataset compiled from a low sample size should be treated with some skepticism because other large datasets examining ontogeny in extinct tetrapods (e.g., Griffin & Nesbitt, 2016;Sander & Klein, 2005) often recover patterns that are indicative of marked plasticity and for which growth curves cannot be well-modeled. Linear models in particular are not typically regarded as accurate or precise models for growth in tetrapods because growth in the form of size changes is virtually assured to plateau at later life stages (although linear models may approximate particular stages of growth). Assessing the effects of methodology, such as sample size, sampling bins, and bin frequency is an important step in evaluating previous growth curve reconstructions.
Modern lissamphibian studies have long shown a poor correlation between age, either inferred or known, and body size in some taxa (e.g., Halliday & Verrell, 1988; but see Laurin & Germain, 2011), regardless of whether factors such as number of sampled populations, uniform time of sampling, and biological sex can be controlled for.
Better correlations are produced between body mass and body size, but the former is not available to paleontologists and is difficult to produce without relying heavily on either age (poor correlation) or body size (usually unknown and the dependent variable in this relationship). In most instances of paleontological studies, it cannot be demonstrably proven that all of the individuals in a given locality were part of the same interbreeding biological population, rather than a time-averaged assemblage. However, the fissure fill nature of Richards Spur possibly exacerbates the time-averaging relative to other localities, and this should be considered when comparing our findings to those of other studies.

| The role of apomorphy-based identification in histology
Apomorphy-based identification (e.g., Bell, Gauthier, & Bever, 2010;Nesbitt & Stocker, 2008) is a taxonomic practice that relies on unique derived features (apomorphies) to justify taxonomic identifications and specimen referrals. It is considered to be more rigorous than resemblance-based identification but can also be limited by outdated taxonomy. Apomorphy-based identification is rarely applied for paleohistological work, which is the product of a number of factors related to histological methods. First, the majority of all paleohistological work to date focuses on isolated postcranial elements, which are more readily accessible but also less likely to be properly referable. Second, a historic precedent on cranial characters and features for identification and diagnoses of new taxa limits the ability to link isolated postcrania with the more diagnostic crania without articulated specimens. Thus, taxonomic identification of many specimens selected for histological sampling is identified by a combination of resemblance-based identification and circumstantial evidence, such as stratigraphic occurrence and faunal community assemblage. We are not suggesting that this is a poor practice-in many instances, it is both robust and the only viable practice-but that does not eliminate the limitations associated with data obtained through a sample selected in this way, and such shortcomings should be explicitly stated for transparency.
With respect to this study, D. annectens is only one of three amphibamiforms and nine dissorophoids at the Richards Spur locality.
Because of the size disparity between adult amphibamiforms and adult olsoniforms, small yet relatively well-ossified elements can be excluded from consideration as markedly immature olsoniforms.
This assumption is further validated upon examination of the data, as there are no sampled specimens of a markedly young age. However, the other two amphibamiforms, P. mayi  and T. dolesensis (Anderson & Bolt, 2013), are represented almost exclusively by cranial material. More broadly, little is known about terrestrial Permian amphibamiform postcranial anatomy, as many taxa are represented only by cranial material, and what is known indicates little more than slight variation in proportions between species (e.g., Clack & Milner, 2010;Daly, 1994 We would thus predict that even if there is inadvertent capture of non-Doleserpeton amphibamiforms in our sample, the proportion of "foreign" taxa would be relatively minimal and thus unimportant from a statistical perspective. It is likely that such capture would increase the observed variation under an assumption that the three amphibamiforms did not follow an identifical growth trajectory or achieve the same maximum body size. However, without a clearer understanding of various attributes of the ontogeny of P. mayi and T. dolesensis (e.g., maximum body size), it is not clear exactly how capture of these taxa in our sample could influence the interpretations of the data.
The challenge of working with isolated materials is not limited to this study, but we reiterate that it is important to be explicit about this shortcoming, nonetheless. Paleohistological sampling is fundamentally opportunistic. Limitations on accessibility to specimens for histological sampling of extinct tetrapods typically result in the near-exclusive use of isolated elements that are identified by circumstantial evidence (e.g., stratigraphic occurrence in a monotaxic bone bed, relative abundance) or resemblance-based identification.
Outside of mass-death assemblages that probably represent a single catastrophic event (e.g., the metoposaurid Dutuitosaurus ouazzoui Dutuit, 1976;Steyer et al., 2004), the sampled population usually cannot be determined to represent a true, single, original population. The potential problem of inadvertently sampling other taxa is not exclusive to this study (e.g., Konietzko-Meier & Klein, 2013). In this study, we followed previous practices and identified femora that clearly matched those identified to D. annectens in past studies (e.g., Sigurdsen & Bolt, 2010) and have observed larger dissorophoid femora from the locality that are too large to belong to D. annectens but that are too small and well-ossified to belong to the co-occurring olsoniforms. Histological variation that would clearly indicate divergent growth patterns attributable to taxonomy is not present in our sample beyond some variability in LAG patterns. This is not itself evidence for explicit variability associated with taxonomy based on our previous discussion regarding the formation of double LAG patterns.

| CON CLUS ION
Here, we have presented a skeletochronological analysis of the presumed stem lissamphibian D. annectens, represented by the largest sample size for a histological study of a Paleozoic tetrapod to date.
Our analysis reveals a high degree of variability within the sample, likely reflecting developmental plasticity, a common ecological strategy among extant lissamphibians to cope with unstable and unpredictable environmental conditions. These findings thus suggest the retention of a deep temporal origin of a life-history strategy common to many metazoan clades that characterizes both terrestrial amphibamiforms and extant lissamphibians and that may have contributed to the persistence of this particular clade of temnospondyls. The presence of double LAGs in most of our specimens further supports this hypothesis, indicating that these animals were often slowing their growth twice a year and thus experiencing two distinctly seasonal types of climatic harshness. Poor correlation between inferred age (proxied by LAG count) and body size (proxied by femur length) is suggestive of developmental plasticity that would be adaptive for the climate conditions at Richards Spur, although microevolution, time-averaging, and inadvertent capture of other amphibamiforms cannot be fully excluded.
Low sample size inherently undermines the reliability of statistical inference (e.g., Button et al., 2013). The results of our sensitivity analysis have implications specifically for similar skeletochronological studies of extinct tetrapods because they indicate that low sample size may mask variation and produce artificial results that can be highly improbable (e.g., negative growth rates) and highly compelling (well-fit curves). Low sample size is a reality of working with the fossil record, and our results should not be considered as a criticism of previous studies that utilized lower sample sizes. However, our analysis underscores the point that caution should be exercised in making interpretations from growth curves reconstructed from low sample sizes, even if a well-fit curve is recovered from the available data. This may differ between clades with so-called determinate (e.g., birds, mammals) and indeterminate (e.g., lissamphibians) growth, as well as disparate life strategies that affect ontogeny and survivorship within a clade. Simply because a sample size is "normal" or relatively high for a paleontological study does not negate the fact that when the sample size is small in absolute or statistical terms, it may be lacking in controls (e.g., stratigraphic), and drawing inferences from such a sample cannot be considered to be robust. This should not preclude the forming of evidence-based interpretations and informed speculation but rather emphasizes the need for limitations to be made explicit and for workers to avoid making excessive extrapolations in their conclusions from the available limited data.

ACK N OWLED G M ENTS
We thank W. May for generously donating his personal collection of Doleserpeton limb elements that were sampled for this experiment.
Thanks to K. Seymour (ROMVP) for collections numbers and to N.

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R CO NTR I B UTI O N S
BMG and RRR conceived and designed the study. RRR contributed materials. BMG performed the data analyses. BMG and YH interpreted the data and drafted the manuscript. All authors contributed to editing the manuscript and approved it for submission.

DATA AVA I L A B I L I T Y S TAT E M E N T
All data included in this study are included in the main document.