Machine learning confirms new records of maniraptoran theropods in Middle Jurassic UK microvertebrate faunas

Current research suggests that the initial radiation of maniraptoran theropods occurred in the Middle Jurassic, although their fossil record is known almost exclusively from the Cretaceous. However, fossils of Jurassic maniraptorans are scarce, usually consisting solely of isolated teeth, and their identifications are often disputed. Here, we apply different machine learning models, in conjunction with morphological comparisons, to a suite of isolated theropod teeth from Bathonian microvertebrate sites in the UK to determine whether any of these can be confidently assigned to Maniraptora. We generated three independent models developed on a training dataset with a wide range of theropod taxa and broad geographical and temporal coverage. Classification of the Middle Jurassic teeth in our sample against these models and comparison of the morphology indicates the presence of at least three distinct dromaeosaur morphotypes, plus a therizinosaur and troodontid in these assemblages. These new referrals significantly extend the ranges of Therizinosauroidea and Troodontidae by some 27 myr. These results indicate that not only were maniraptorans present in the Middle Jurassic, as predicted by previous phylogenetic analyses, but they had already radiated into a diverse fauna that pre‐dated the break‐up of Pangaea. This study also demonstrates the power of machine learning to provide quantitative assessments of isolated teeth in providing a robust, testable framework for taxonomic identifications, and highlights the importance of assessing and including evidence from microvertebrate sites in faunal and evolutionary analyses.

Maniraptora is a diverse and speciose clade of theropod dinosaurs that includes some of the most familiar small-bodied predators of the Cretaceous Period, such as Velociraptor and Deinonychus.
In addition to these iconic dromaeosaurids, the clade also includes troodontids, scansoriopterygids, oviraptorosaurs, therizinosaurs, alvarezsaurs and the only living dinosaurs, birds. During the Cretaceous they occupied a wide variety of niches, from obligate herbivores to arboreal insectivores and cursorial predators. Maniraptoran remains are best known from the northern hemisphere, but they achieved a wide geographic distribution that also encompassed South America, Africa and Madagascar (Ding et al. 2020).
Although maniraptoran remains are known almost exclusively from the Cretaceous, ghost lineages derived from phylogenetic analyses indicate that it is likely that their initial radiation occurred in the Middle Jurassic (Holtz 2000; Rauhut 2003; Xu et al. 2010; Carrano et al. 2012; Rauhut & Foth 2020). This date is bracketed by discoveries of earlier-branching coelurosaurs, such as tyrannosauroids, in Middle Jurassic deposits (Rauhut et al. 2010). However, the Jurassic maniraptoran record is frustratingly incomplete: a handful of named taxa are known from the Late Jurassic (Archaeopteryx, scansoriopterygids and possibly Ornitholestes) and there is one possible Middle Jurassic representative, Eshanosaurus, the identification and dating of which remains contentious (Kirkland & Wolfe 2001; Xu et al. 2001; Barrett 2009). Nevertheless, fragmentary, generically indeterminate remains of some maniraptoran subclades, such as possible dromaeosaurs, have been reported from Middle Jurassic microvertebrate sites in Europe and Asia (Evans & Milner 1994; Metcalf & Walker 1994; Averianov et al. 2005; Prasad & Parmar 2020). However, due to the disarticulated nature of the material, it has not been possible to identify these specimens beyond clade level and these identifications have been questioned, even at this coarse level of taxonomic resolution (Benson 2010a; Foth & Rauhut 2017; Ding et al. 2020; Sellés et al. 2021). This, and issues relating to the dating of some sites, has meant that these discoveries have usually been excluded from, or overlooked by, broader evolutionary analyses. As a result, they have had little impact on determining the divergence times or palaeobiogeographic relationships of the major maniraptoran lineages. Consequently, the discovery of temporally well-constrained maniraptoran material from the Jurassic is of critical importance to more accurately constrain the timing of this major diversification event and shed light on early maniraptoran evolution.
Dinosaur teeth, including those of theropods, were continually shed and replaced throughout the animal's life and are highly resistant to chemical alteration and abrasion (Argast et al. 1987; Currie et al. 1990; Farlow et al. 1991). As a result, they are abundant in many Mesozoic deposits and sometimes represent the only evidence recording the dinosaur species-richness at such sites (e.g. Evans & Milner 1994; Fiorillo & Currie 1994; Larson & Currie 2013; Gates et al. 2015). The comparatively simple structure of theropod teeth has made identifications difficult historically, given that traditional taxonomic characters lack the resolution for distinguishing the teeth of closely related clades. However, apomorphy-based identifications, and statistical and morphometric analyses, have now been developed that offer solutions to this problem (Currie et al. 1990; Farlow et al. 1991; Smith et al. 2005). Most recently, the use of machine learning procedures has been shown to produce accurate group discrimination when applied to morphological data (Hoyal Cuthill et al. 2019; MacLeod & Kolska Horwitz 2020). Wills et al. (2021) applied this technique to a diverse sample of theropod teeth and demonstrated that these methods lead to higher classification accuracies than more traditional statistical analyses.
Here, we apply these new methods to a large sample of isolated theropod teeth from a series of UK Middle Jurassic microvertebrate sites. Using machine learning and morphological-based approaches we demonstrate that many of these teeth can be referred with confidence to three distinct maniraptoran lineages (Dromaeosauridae, Troodontidae, Therizinosauroidea). These represent some of the earliest, or the earliest, records of these clades known from anywhere in the world, and their presence confirms the predictions of numerous phylogenetic analyses. They indicate that multi-taxic maniraptoran faunas were established by the Bathonian, millions of years earlier than the well-sampled biotas from the Late Jurassic (e.g. Yanliao biota) or late Early Cretaceous (e.g. Jehol biota) that previously represented the best windows on the initial diversification of the clade.

GEOLOGICAL SETTING
Rapid changes in sedimentary facies took place during the Middle Jurassic in the region that is now the UK, with the shallow marine conditions that prevailed during the Early Jurassic giving way to more varied environments, ranging from open shallow-water marine in the south of England to increasingly non-marine strata in the East Midlands, Yorkshire and Scotland (Fig. 1). Deposition took place in a series of rifted basins with intervening structural highs and carbonate shelves developed on the margins of these landmasses. In southern and central England, there were emergent landmasses in the areas that are now South-West England, Wales and the London area. The generally north-south seaway between these consisted of open marine conditions in the south, a lagoon and mudflat complex in the north and a series of oolitic shoals separating these. Sea-level fluctuations throughout the Bathonian often caused pauses in marine sedimentation with occasional localized emergence accompanied by the development of hardgrounds, palaeosols and terrigenous sediment influxes (Palmer & Jenkyns 1975; Palmer 1979; Horton et al. 1995; Wyatt 1996; Underwood 2004; Hesselbo 2008; Barron et al. 2012; Wills et al. 2019). These changing conditions created a mosaic of different environments that were populated by a series of diverse Bathonian vertebrate faunas. Although the remains of large-bodied terrestrial taxa are relatively rare, several important microvertebrate localities have yielded large numbers of small vertebrate remains, including sharks, bony fish, mammals, turtles, crocodilians, choristoderes, pterosaurs, squamates and amphibians (Freeman 1976a, 1976b, 1979; Metcalf et al. 1992; Evans & Milner 1994; Wills et al. 2014, 2019). Dinosaur teeth are common and some of these were referred tentatively to various coelurosaurian theropod clades (Freeman 1976a, 1976b, 1979; Metcalf et al. 1992; Evans & Milner 1994; Wills et al. 2014, 2019).
A summary of each main locality is provided below.
The clay unit developed following a flooding event that introduced the initial sediment into the karstic hollow, which subsequently became a coastal marsh pond supporting a wide variety of freshwater organisms (Metcalf 1995). The introduction of terrestrial vertebrate remains occurred both as a direct result of the initial flooding event and subsequent fluvial transport into the pond. This lens of non-marine sediments is over- and underlain by oolitic limestones that were deposited in fully (but lagoonal) marine environments.

Woodeaton
Woodeaton Quarry presents a continuous section through most of the Bathonian including the Rutland, White Limestone and the lower part of the Forest Marble formations, with lower horizons being also briefly exposed (Palmer 1973, 1974; Palmer & Jenkyns 1975; Horton et al. 1995; Wyatt 2002; Wills et al. 2019). Microvertebrates have been recovered from bed 23 of the Bladon Member, White Limestone Formation, H. retrocostatum Zone (Barron et al. 2012; Wills et al. 2019). This is a pale massive clay, marl or impure limestone that can be traced across the entire quarry face. Unlike other British Middle Jurassic microvertebrate sites, which represent shallow brackish to freshwater ponds, lakes or marginal marine settings of a restricted geographical extent, Woodeaton represents a larger scale brackish water lagoon of fluctuating salinity with periodic influxes of seawater that experienced seasonal aridity (Wills et al. 2019).

Kirtlington
This quarry exposes sections through the White Limestone Formation and overlying Forest Marble and Cornbrash formations with the microvertebrate horizon, the 'Mammal Bed', forming a thin and impersistent lens of unconsolidated brown marl at the boundary between the White Limestone and Forest Marble formations (H. retrocostatum Zone). While its exact correlation with Woodeaton is uncertain, it appears that the Kirtlington fauna is of a slightly younger age than the approximately coeval section at Woodeaton (McKerrow et al. 1969; Wills et al. 2019). The Mammal Bed at Kirtlington formed in a shallow marginal marine environment during a period of marine regression along a shallow coastal plain region characterized by coastal lakes, swamps and lagoons (Freeman 1979; Palmer 1979; Evans & Milner 1994).

Watton Cliff
The Forest Marble Formation (C. discus Zone) section at Watton Cliff is composed of a 10 m thick lower sequence of clays and shales followed by 3-5 m of cross-bedded bioclastic limestones and 9 m of inter-bedded clays and siltstones (Woodward 1894; Strahan 1898; Torrens 1969b; Cope et al. 1980; Melville & Freshney 1982; Holloway 1983; Barron et al. 2012). The entire succession represents open marine conditions, probably with a moderate water depth, although with signs of weak storm influence (such as rippled sand lenses) throughout. There is a cross-stratified bioclastic unit, most of which is strongly cemented, with lenses and irregular patches that lack this cement forming a bioclastic gravel. These unconsolidated patches seem to represent either channels or burrows and commonly contain water-worn vertebrate material (Dineley & Metcalf 1999; Benton et al. 2005); similar material is also present (although harder to extract) in the cemented sediment. The Watton Cliff site represents deposition of a shell bank, possibly during storm-related events (Holloway 1983), in an open marine, clear water, shallow coastal sea on a gently sloping shelf, which was subject to continuous wave action in a tide-dominated system with runoff channels developing during emergent conditions. Terrestrial and freshwater organisms are present as allochthonous elements deposited alongside marine invertebrates and vertebrates (marine sharks and teleosaurid crocodilians) (Hunter & Underwood 2009).

MATERIAL AND METHOD
The material consists of isolated theropod teeth from four Middle Jurassic (Bathonian, Great Oolite Group) localities in the UK (Fig. 1). Except for the Hornsleasow material, which is housed in the Museum of Gloucester, the specimens are held in the collections of the Natural History Museum, London. Some of the previously collected material had undergone an initial sort and was assigned either a general taxonomic identification (e.g. 'theropod') or a morphotype (e.g. 'morphotype A').
New material from Woodeaton was obtained by bulk sediment collection on site following initial fieldwork to identify productive horizons. Large bulk samples (often weighing several tonnes) were screen-washed using the methodology described by Ward (1981) to produce an initial concentrate of vertebrate material. This was split into four size fractions (500 μm-1 mm, 1-2 mm, 2-4 mm, >4 mm) to facilitate initial sorting and picking using a binocular microscope. We initially identified 164 isolated theropod teeth from the older collections and new Woodeaton material (Kirtlington, n = 49; Hornsleasow, n = 50; Watton Cliff, n = 4; Woodeaton, n = 61) of which 149 were sufficiently complete to warrant further investigation.
All teeth in the sample underwent a combination of optical imaging with a Dino-Lite AM 7915 MZTL microscope and scanning electron microscopy on a LEO 1455VP microscope. We also scanned each (complete) tooth using micro-computed tomography (μCT) with a Nikon Metrology HMX ST 225 μCT scanner and a Zeiss Versa μCT scanner at a range of voxel resolutions from 4 to 30 μm and created 3D models from the CT volumes using Avizo (v.8.1; ThermoFisher) (Appendix S1). Five morphometric variables were collected from each tooth, which were measured directly from the images using Fiji (Schindelin et al. 2012) and from the 3D models using Avizo. The measurements are simple 2D linear distances (Fig. 2) between landmarks on the tooth crown: crown height (CH), height of the crown measured from the tip of the tooth to the base of the enamel; crown base length (CBL), length of the base of the crown measured along its mesiodistal axis; crown base width (CBW), width of the base of the crown measured along its linguolabial axis perpendicular to the CBL; average number of denticles per millimetre along the mesial carina (MDM); and average number of denticles per millimetre along the distal carina (DDM). When a measurement could not be taken due to crown damage it was recorded as NA in the data, and carinae with no denticles were recorded as zero for either MDM or DDM variables. When required, the crown base ratio (CBR) is calculated as CBW/CBL, and the denticle size density index (DSDI) provides a measure of the size difference between mesial and distal denticles. Given this and the lack of comparative digital image-based theropod tooth datasets we feel that the approach we have taken is appropriate.
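The derived ratios described above can be illustrated with a minimal sketch. It assumes that DSDI is the ratio of mesial to distal denticle density (MDM/DDM), which is consistent with the later statement that DSDI > 1 indicates larger distal denticles but is not stated explicitly here. The function names and the example tooth record are hypothetical.

```python
NA = None  # stand-in for the "NA" recorded when crown damage prevents a measurement


def crown_base_ratio(cbw, cbl):
    """CBR = CBW / CBL; returns NA if a measurement is missing or CBL is zero."""
    if cbw is NA or cbl is NA or cbl == 0:
        return NA
    return cbw / cbl


def dsdi(mdm, ddm):
    """ASSUMPTION: DSDI = MDM / DDM (mesial over distal denticle density).

    A carina without denticles is recorded as zero, so the ratio is
    undefined when DDM is zero and NA is returned in that case.
    """
    if mdm is NA or ddm is NA or ddm == 0:
        return NA
    return mdm / ddm


# Hypothetical tooth record: lengths in mm, densities in denticles per mm.
tooth = {"CH": 5.2, "CBL": 2.5, "CBW": 1.25, "MDM": 8.7, "DDM": 7.0}
cbr = crown_base_ratio(tooth["CBW"], tooth["CBL"])
ratio = dsdi(tooth["MDM"], tooth["DDM"])
```

Under this assumption a tooth with more denticles per millimetre on the mesial carina than on the distal carina returns a DSDI above 1, meaning that its distal denticles are the larger ones.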
To determine the taxonomic identifications of the teeth we undertook a quantitative analysis of morphometric data using a mixture of machine learning models following the methodology of Wills et al. (2021). We used three different machine learning techniques: mixture discriminant analysis (MDA), random forests (RF) and C5.0, and combined the classification results from all models to form an ensemble classifier. The three models differ in their approach to learning, enabling us to base the final classification prediction on the output of more than one technique. MDA is a non-linear extension of linear discriminant analysis whereby each class is modelled as a mixture of multiple multivariate normal subclass distributions, RF is an ensemble consisting of classification or regression trees (in this case classification trees) where the prediction from each individual tree is aggregated to form a final prediction, and C5.0 is a decision tree classifier based on information theory (Hastie & Tibshirani 1996; Breiman 2001; Kuhn et al. 2018; Wills et al. 2021). Models were combined into an ensemble classifier using both a simple majority voting rule and by combining the class prediction posterior probabilities for each tooth.
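The two ensemble rules can be sketched as follows. This is an illustration only: the text does not state how the posterior probabilities were combined, so simple averaging across models is assumed here, and the function names and posterior values are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class predicted by most models, or None without a strict majority."""
    label, count = Counter(predictions).most_common(1)[0]
    return label if count > len(predictions) / 2 else None


def combine_posteriors(posteriors):
    """ASSUMPTION: combine by averaging each class posterior across models."""
    classes = set().union(*posteriors)
    mean = {c: sum(p.get(c, 0.0) for p in posteriors) / len(posteriors) for c in classes}
    best = max(mean, key=mean.get)
    return best, mean[best]


# Hypothetical posteriors for one tooth from the three models.
mda = {"Dromaeosauridae": 0.7, "Troodontidae": 0.2, "Tyrannosauroidea": 0.1}
rf = {"Dromaeosauridae": 0.6, "Troodontidae": 0.3, "Tyrannosauroidea": 0.1}
c50 = {"Dromaeosauridae": 0.4, "Troodontidae": 0.5, "Tyrannosauroidea": 0.1}

votes = [max(p, key=p.get) for p in (mda, rf, c50)]
voted_class = majority_vote(votes)
combined_class, combined_p = combine_posteriors([mda, rf, c50])
```

In this example one model disagrees with the other two, yet both rules return the same class. The combined posterior additionally shows how strongly the ensemble favours that class.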
To build and train the models we combined several published datasets (Farlow et al. 1991; Sankey et al. 2002; Currie & Varrichio 2004; Smith et al. 2005; Larson 2008; Longrich 2008; Sankey 2008; Rauhut et al. 2010; Larson & Currie 2013; Hendrickx et al. 2015a; Gerke & Wings 2016; Larson et al. 2016; Young et al. 2019) that had been used for prior morphometric analysis with additional measurements taken as part of this study. The resultant dataset covers a wide range of theropod taxa with a broad geographical and temporal distribution, although there is some bias to North American Late Cretaceous taxa (Fig. 3). See the supporting data for a summary of the data used, taxonomic groups chosen and sample sizes used in our analysis.
Different definitions have been applied to these morphometric variables, with Smith et al. (2005) and Hendrickx et al. (2015a) differing in their methods for measuring CBL and CH (Fig. 2), and we used the corrected data provided by Gerke & Wings (2016) where possible. However, the difference in methodology has little overall effect on the reclassification rate, and the per-clade accuracies returned from the combined training dataset used here are similar to those reported by Wills et al. (2021). Prior to training these models the data were cleaned to improve model performance. First, we removed any outliers using a density-based spatial clustering algorithm (DBSCAN), which assumes that clusters of data form dense regions in space separated or surrounded by regions of lower density, with the outliers (or noise) falling in the lower density space (Ester et al. 1996). Outliers distort morphospace by shifting the mean centroid of a group in the direction of the outlier, which affects the model accuracy and the resultant classification. Second, we removed any classes with fewer members than the number of predictive variables (five), and last, we removed cases with missing data because this can have a detrimental effect on machine learning models; similarly, any unknown teeth with missing data were excluded from final classification. The data were log-transformed (adding a value of 1 to enable the transformation of zero values), scaled and centred prior to analysis. We made no attempt to directly address class imbalance by creating synthetic data (due to the detrimental effect this has on model accuracy) and used equal prior probabilities in all models (Wills et al. 2021). From an initial dataset of 3886 specimens, data cleaning resulted in a final set of 1702 usable cases.
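Part of the cleaning and transformation sequence described above can be sketched as follows. This is a simplified illustration and not the analysis code: the DBSCAN outlier step is omitted, the sample standard deviation is assumed for scaling, and the function names, record layout and example values are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, stdev

VARIABLES = ("CH", "CBL", "CBW", "MDM", "DDM")


def clean(cases, min_class_size=len(VARIABLES)):
    """Drop classes with fewer members than predictors, then cases with missing data."""
    counts = Counter(c["clade"] for c in cases)
    kept = [c for c in cases if counts[c["clade"]] >= min_class_size]
    return [c for c in kept if all(c.get(v) is not None for v in VARIABLES)]


def transform(cases):
    """Apply log(x + 1), then centre each variable and scale it to unit variance."""
    logged = [[math.log1p(c[v]) for v in VARIABLES] for c in cases]
    columns = list(zip(*logged))
    mus = [mean(col) for col in columns]
    sds = [stdev(col) or 1.0 for col in columns]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in logged]


# Hypothetical records: clade "A" has six teeth, clade "B" only three.
cases = [
    {"clade": "A", "CH": 1.0 + i, "CBL": 2.0 + i, "CBW": 1.0 + i, "MDM": 0.0, "DDM": 5.0 + i}
    for i in range(6)
] + [
    {"clade": "B", "CH": 9.0, "CBL": 4.0, "CBW": 2.0, "MDM": 3.0, "DDM": 3.0}
    for _ in range(3)
]
cases[0]["CH"] = None  # damaged crown: measurement recorded as NA

cleaned = clean(cases)
scaled = transform(cleaned)
```

Adding 1 before taking the logarithm keeps unserrated carinae (recorded as zero) in the dataset, because log(0 + 1) = 0 is defined whereas log(0) is not.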
We undertook an initial exploration of clade feature space for the transformed morphometric variables using two different dimension reduction techniques to visualize the data, principal components analysis (PCA) and t-distributed stochastic neighbour embedding (t-SNE). We used both techniques given that PCA tries to preserve the global structure of the data whereas t-SNE looks to preserve local structure by keeping similar instances close to each other, potentially giving different insights into the data.
We undertook a series of non-parametric statistical analyses using permutational multivariate analysis of variance (PERMANOVA) with the Mahalanobis distance (Anderson & Walsh 2013; Anderson 2017), to obtain estimates of the statistical significance of training set group separations in feature space. PERMANOVA is used to compare groups of objects by testing for equivalence between the group centroids. The test works on the underlying distance matrix derived from the input variables rather than the raw or ordinated data. Given that PERMANOVA tests only whether all of the centroids in the data are equal, we performed post-hoc comparisons between the groups using a pairwise implementation of the PERMANOVA test with Bonferroni-corrected p-values. The PERMANOVA and pairwise-PERMANOVA tests were each performed with 10 000 replications.
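The logic of the permutation test and the Bonferroni correction can be sketched in a few lines. This is a minimal illustration under stated assumptions: it substitutes squared Euclidean distance for the Mahalanobis distance used in the study, uses 999 permutations rather than 10 000 to keep the example fast, and runs on hypothetical two-dimensional points.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance (a stand-in for the Mahalanobis distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_f(points, labels):
    """PERMANOVA pseudo-F statistic computed from pairwise squared distances."""
    n = len(points)
    groups = sorted(set(labels))
    ss_total = sum(sq_dist(points[i], points[j]) for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(sq_dist(points[i], points[j]) for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(points, labels, n_perm=999, seed=0):
    """Return the observed pseudo-F and its permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(points, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical, well-separated groups of six points each.
points = [(0.1 * i, 0.0) for i in range(6)] + [(5.0 + 0.1 * i, 1.0) for i in range(6)]
labels = ["A"] * 6 + ["B"] * 6
f_obs, p_value = permanova(points, labels)
```

A pairwise analysis would call `permanova` once for each pair of groups and pass the resulting p-values to `bonferroni`.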
For each model the cleaned data were split in an 80:20 ratio, preserving the overall class distribution of the data (Kuhn 2008), into a training dataset (1364 cases) and a testing dataset (338 cases). The models were developed on the training data and then assessed against the testing data. Testing data were not used in the initial model. The teeth to be classified were then run through each model in turn to provide independent classifications based on different techniques. We used k-fold cross-validation on the training set with k = 10 to give an overall model accuracy. We also ran each model permutation using a range of tuning parameters to obtain the highest accuracy. For MDA we modelled the response using a range of subclasses, from one to eight, for each taxonomic class; the RF model was tuned by varying the random subset of predictors that the model uses at each split in the tree (mtry parameter) from two to five and we grew the forest to 2000 trees; and for the C5.0 model we varied the number of model iterations from 1 to 100 and used both rule- and tree-based classifier models (Kuhn & Johnson 2013; Wills et al. 2021). In addition to the predicted class generated from the models we also calculated the posterior probability of the predicted class for each tooth. Training of the models relies on a random selection of teeth from the overall training data for each run, and indeed within each model there will be a degree of randomization input into the training. As a result, there may be slightly different results obtained from different training cycles of the models. For more details on the techniques involved and descriptions of the differences between the machine learning algorithms see Wills et al. (2021).
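The partitioning scheme described above can be sketched as follows. This is an illustration of a class-preserving 80:20 split followed by 10-fold cross-validation on the training portion, not the code used in the study. The function names and the class counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split case indices so that each class keeps roughly the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    train, test = [], []
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def k_folds(indices, k=10, seed=0):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]


# Hypothetical class counts for 100 training teeth.
labels = ["Dromaeosauridae"] * 50 + ["Troodontidae"] * 20 + ["Tyrannosauridae"] * 30
train_idx, test_idx = stratified_split(labels)
folds = list(k_folds(train_idx, k=10))
```

Each tuning-parameter setting would be fitted on the `fit` indices and scored on the `held_out` indices of every fold, and the held-out testing set would be consulted only once the final model had been chosen.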
Dental terminology and nomenclature follow those outlined by Hendrickx et al. (2015b), and anatomical descriptions are based on morphological observations by one of the authors (SW). Geological, sedimentological and palaeoenvironmental observations are based on the study of published literature and field observations by one of the authors (SW).
All analyses were performed using R v.4.0.5 (R Core Team 2020) in RStudio (RStudio Team 2020). The following R packages were used for specific models or processes:

Machine learning models
The difficulties in providing accurate quantitative assessments of theropod tooth morphological discrimination are highlighted in Figure 4. Here, we show two different feature-space representations of the untrained morphological data, a PCA ordination and a t-SNE ordination, which clearly demonstrate the degree of overlap between numerous theropod clades. Non-parametric statistical tests on the t-SNE ordinated training data confirm this. The PERMANOVA test indicates that although the separation between groups is statistically significant overall (F = 169.6, p < 0.01), there is difficulty in resolving between-group structures for some group-pairs as demonstrated by the pairwise PERMANOVA tests (Fig. 5). This is consistent with previous reports in the literature in which attempts to distinguish theropod taxa using PCA or linear discriminant analysis have produced high degrees of feature-space overlap between some taxonomic groups (e.g. Hendrickx et al. 2019; Young et al. 2019; Noto et al. 2022). This result is unsurprising given that we are constrained in attempting to differentiate teeth with very similar gross morphology based on a small set of morphological measurements. As MacLeod et al. (2022) noted, however, this does not preclude the possibility that different techniques may uncover significant between-group differences that can be used as the basis of a classification. In fact, when comparing the between-group structures for Maniraptora with other groups, the pairwise PERMANOVA tests (Table 1) suggest that these taxa are differentiable from most major theropod clades (p < 0.01).
We also conducted PERMANOVA tests on the trained MDA feature-space scores generated from the training data (Fig. 6). The overall test rejected the null hypothesis that there are no between-group differences (p < 0.01) but, as before, the post-hoc pairwise tests indicate that some group-pairs might be difficult to differentiate using this method, highlighting the importance of using multiple techniques to compare and classify isolated theropod teeth.
Fig. 4. Untrained ordinated feature-space occupation for teeth comprising the training data set formed by: A, the first two principal component (PC) axes; B, the first two t-distributed stochastic neighbour embedding (t-SNE) axes.

Fig. 5. Training data PERMANOVA Bonferroni adjusted p-values for pairwise clade groups using untrained t-distributed stochastic neighbour embedding (t-SNE) ordinated feature-space based on three t-SNE dimensions.

All three machine learning techniques have similar levels of accuracy (Table 2), with the overall accuracy of the machine learning models ranging from 82.4% (C5.0) to 85.6% (RF). When the models were run against the test dataset the two decision-tree algorithms, RF at 88.4% and C5.0 at 85.4%, slightly outperformed the MDA model. At the individual clade level (Table 3; Fig. 7) the performance of both the ensemble model and the individual machine learning classifiers that make up this ensemble varies, with classification accuracy ranging from 50% to 100% (Fig. 7). Maniraptoran clades have a high level of classification accuracy regardless of the machine learning model used, ranging from 92.8% (Dromaeosaur morphotype A, RF model) to 100% (Dromaeosaur morphotype B, RF model; Dromaeosaur morphotype C, MDA model; and Therizinosauria, RF model). The variation in clade accuracy is driven by several factors, including the number of cases comprising the training group for that particular clade; morphological overlap with other clades; and the limited morphological measurements used to train the classifiers. The accuracy results reported here are derived from cross-tabulation tests on the classified testing data and confirm, as MacLeod et al. (2022) note, that good levels of discrimination for some clades can be achieved by machine learning even when group-level feature-spaces overlap.
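The cross-tabulation of known against predicted classes that underlies these accuracy figures can be sketched as follows. The sketch assumes that per-clade accuracy means the fraction of each clade's test teeth assigned to the correct class, which may differ from the statistic reported in the tables. The function name and the labels are hypothetical.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Cross-tabulate true and predicted classes.

    Returns the overall accuracy and, for each true class, the fraction of
    its cases that were assigned to the correct class.
    """
    table = defaultdict(lambda: defaultdict(int))
    for t, p in zip(true_labels, predicted_labels):
        table[t][p] += 1
    per_class = {t: row[t] / sum(row.values()) for t, row in table.items()}
    correct = sum(row[t] for t, row in table.items())
    return correct / len(true_labels), per_class


# Hypothetical test-set labels: D = dromaeosaurid, T = troodontid.
true = ["D", "D", "D", "D", "T", "T"]
pred = ["D", "D", "D", "T", "T", "D"]
overall, by_clade = per_class_accuracy(true, pred)
```

A clade with few test cases, such as "T" here, can swing between very low and very high accuracy on the strength of a single tooth, which is one reason why per-clade figures vary more than the overall accuracy.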

UK Bathonian sites
The classification results from the UK Bathonian isolated teeth (Table 4) indicate the presence of three distinct dromaeosaur morphotypes. These morphotypes are strongly supported across all machine learning models and the ensemble classifier in either majority-vote or combined probability mode. Our confidence in the classifications is a combination of the machine learning results from three independent classifiers and our post-hoc morphological analysis. In all machine learning systems there is likely to be a degree of misclassification and in this case the models incorrectly classified GCLRM G8-23 as a dromaeosaur rather than a troodontid, NHMUK PV R37948 as a troodontid rather than a dromaeosaur and GCLRM G167-32 as a dromaeosaur rather than a therizinosaur (see Systematic palaeontology, below). The posterior probabilities from the ensemble classifier (Fig. 8) also add to our confidence in the machine learning prediction given that the majority of the teeth return high posteriors in favour of the assigned class, with the second-highest class posterior in each case also indicating maniraptoran affinities. In addition, it is clear from the trained MDA data (Fig. 9) that the small teeth from these sites occupy a segment of feature-space that is both congruent with a broad maniraptoran feature-space and distinct from that occupied by other Jurassic taxa.

Description. Morphotype A tooth crowns (36 in total) are ziphodont, range in CH from 1.45 mm to 7.79 mm (Fig. 11) and have serrated distal carinae and unserrated mesial carinae. The distal crown margin is concave, the crowns are labiolingually compressed (CBR 0.36-0.76) and their lingual and labial surfaces possess centrally placed concave depressions that extend apically to the mid-height of the crown surface. These depressions, especially where strongly developed, result in a lemniscate (figure-of-eight) basal cross-section.
Both mesial and distal carinae are well developed with the distal carina often deflected labially towards the crown base and the mesial carina twisted slightly and deflected lingually basally. The distal carina extends from the crown apex to the crown base and bears denticles that are generally restricted to the lower two-thirds of the carina, although occasionally reaching the apex. The distal denticles decrease in size both apically and distally from carina midlength. Distal denticles are small, ranging in length from 0.05 mm to 0.27 mm (18.2 per mm to 3.6 per mm), are subrectangular in shape with a convex external margin, and are orientated perpendicular to the carina (except for a few teeth in which the denticles are slightly inclined apically). The mesial carina extends from the apex of the crown to a position approximately two-thirds down the crown and lacks denticles. The crown surface has a braided enamel texture consisting of sinuous grooves and ridges that are orientated apicobasally (Hendrickx et al. 2015a, 2019).

Description. Morphotype B crowns (37 in total) are grouped together by the machine learning analysis but show considerable variation in denticle size differences between carinae, hence might encompass several different subgroups with broadly similar morphology. Tooth crowns are ziphodont, slightly larger than morphotype A (CH ranging from 1.66 mm to 19 mm) and have a straight to concave distal margin. Most of the crowns are labiolingually narrow (CBR < 0.6) although four (NHMUK PV R37934, NHMUK PV R36778, NHMUK PV R37911 and NHMUK PV R37931) have a CBR of >0.8 (Fig. 11) and may represent more mesially positioned teeth (Hendrickx et al. 2019). In contrast to morphotype A, the depressions on the lingual and labial surfaces are less prominent. Consequently, the basal cross-section of morphotype B ranges from a weaker lemniscate outline to a more oval or lenticulate shape.
The mesial and distal carinae are both well developed and extend from the crown apex to just above the crown base with the distal carina often exhibiting a labial deflection basally and the mesial carina (where preserved) twisted slightly lingually. In contrast to morphotype A, both mesial and distal carinae are denticulate. Mesial denticles are restricted to the apical region of the carina but distal denticles extend over the full length of the carina. The distal denticles are generally larger than the mesial denticles with DSDI > 1. However, in some smaller crowns (NHMUK PV R36778, NHMUK PV R37912, NHMUK PV R37913, NHMUK PV R37937, NHMUK PV R37911, NHMUK PV R37951) the DSDI is <1, indicating that the mesial denticles are larger than distal ones. In several crowns (NHMUK PV R37936, NHMUK PV R37943, NHMUK PV R37916, NHMUK PV R37931, GCLRM G167-24, GCLRM G10-37, GCLRM GTube 67, NHMUK PV R37923 and NHMUK PV R37938) the difference in size between mesial and distal denticles is exaggerated, with DSDI > 1.4, and it is possible that they may represent either a variation within this morphotype or a separate morphotype. However, in the absence of any other morphological differences, and given the machine learning support for this grouping, we have elected to keep these crowns in morphotype B. Mesial and distal denticles are all rectangular to subrectangular in shape with a convex external margin and are orientated perpendicular to the carina. The crown surface has a braided enamel texture consisting of sinuous grooves and ridges orien-

Abbreviations: combined posterior probability, assigned tooth morphotype by combining posterior probabilities from three machine learning models; majority vote, assigned tooth morphotype following simple majority vote of three machine learning models; morphotype, assigned tooth morphotype following machine learning and visual description; P, combined posterior probability value.

Description. Morphotype C includes two small, damaged crowns ranging in CH from 0.61 mm to 1.6 mm with a concave distal margin. The crowns are labiolingually narrow (CBR c. 0.5) and the depressions on the lingual and labial surfaces seen in morphotypes A and B are absent or weakly developed, resulting in a subcircular to oval basal cross-section. Both mesial and distal carinae are present, extending from the crown apex to just above the crown base, and are denticulate. Mesial denticles are restricted to the upper half of the carina, and distal denticles extend from the base to just below the crown apex. The mesial carina is extensively worn on both teeth. Denticles on the mesial and distal carinae are equal to subequal in size, with a DSDI of 1.1. The serration density on both the mesial and distal carinae is substantially greater than in morphotype B, with mesial denticles ranging from 15 per mm (NHMUK PV R36779) to 18 per mm (NHMUK PV R37920) and distal denticles from 13 per mm (NHMUK PV R36779) to 17.4 per mm (NHMUK PV R37920). By contrast, morphotype B mesial denticles average 8.7 per mm and distal denticles, 7.0 per mm. Both mesial and distal denticles are rectangular to subrectangular in shape with a convex external margin and are orientated perpendicular to the carina. These small teeth, although damaged and worn in places, appear to represent a morphotype distinct from morphotype B based on their smaller size and greater serration density on both carinae.

TROODONTIDAE Gilmore, 1924 Gen. et sp. indet.
Description. GCLRM G8-23 is a small, almost complete isolated tooth with a distinctive morphology. The tooth shows some damage at the base of the distal carina and at the crown apex where denticles are missing. The crown is small (CH 2.9 mm) and phylloform, with a slight lingual inclination. It is labiolingually compressed (CBR 0.53), lenticular in basal cross-section and has a weak constriction at the base. The distal margin of the crown is straight to weakly concave and the mesial margin is convex. The mesial and distal carinae are both denticulate with large, prominent and apically orientated denticles. The mesial carina reaches the base of the crown; however, due to damage it is not possible to confirm this for the distal carina. Distal denticles are both significantly larger and fewer in number than the mesial denticles with a DSDI of 1.43. Both mesial and distal denticles appear to extend from the base of the crown to the apex, although damage to the basal portion of the distal carina obscures this somewhat. Mesial denticles decrease in size both apically and basally from the crown midpoint whereas distal denticles increase in size slightly towards the apex. Distal denticles are subrectangular in shape, being slightly longer mesiodistally than apicobasally, and have convex external margins. The denticles are aligned perpendicular to the carina towards the base of the crown but become apically orientated and hooked midway along the carina. Mesial denticles have a parallelogram-shaped outline in labial view caused by the apical orientation of the denticles along the carina. Grooves are present between adjacent denticles on both carinae but do not extend to the crown surface.

THERIZINOSAUROIDEA Maleev, 1954 Gen. et sp. indet.
Description. GCLRM G167-32 is an isolated complete crown that is phylloform in shape, labiolingually compressed and subsymmetrical in both lingual and labial views with convex mesial and distal margins. The crown is small with a crown height of 3.5 mm, a maximum width of 2.8 mm (decreasing to 2.4 mm at the crown base; crown base occupying around 85% of the maximum crown width, CBR = 0.73), and has a small basal constriction. The labial surface is strongly convex. The lingual surface is dominated by a median ridge running from apex to base forming a slightly convex profile bounded by mesial and distal concave depressions adjacent to both carinae. Carinae are present on both margins of the teeth with the mesial carina restricted to the upper half of the crown and the distal carina extending toward, but not reaching, the crown base. Both carinae are denticulated with fewer, and larger, denticles towards the apex than at the mid-crown position. Average denticle sizes on both carinae are equal with the distal carina ranging from 6.3 per mm at mid-crown to 5.9 per mm apically and the mesial carina being 6.6 per mm at mid-crown to 5.7 per mm apically. Denticles appear to reach almost to the apex of the crown although slight damage and wear at the apex obscures this. The denticles are rectangular, being slightly longer apicobasally, have a convex exterior margin and are slightly inclined apically.

Fig. 9. Trained feature-space occupation of UK Bathonian teeth compared with training data based on two mixture discriminant analysis (MDA) dimensions. A, compared with all taxa in the training data with maniraptoran clades highlighted. B, compared with Jurassic taxa. CV, canonical variate.
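The ratios quoted in the descriptions above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the conventional definitions that this section uses but does not restate: CBR as crown base width divided by crown base length, denticle density as denticles counted per millimetre of carina, and DSDI as mesial denticle density divided by distal denticle density. The function names are illustrative, not part of the authors' workflow.

```python
def crown_base_ratio(base_width_mm: float, base_length_mm: float) -> float:
    """CBR, assumed here to be crown base width / crown base length."""
    return base_width_mm / base_length_mm


def denticle_density(denticle_count: int, carina_length_mm: float) -> float:
    """Denticles per millimetre along a measured stretch of carina."""
    return denticle_count / carina_length_mm


def dsdi(mesial_per_mm: float, distal_per_mm: float) -> float:
    """Denticle size density index, assumed to be mesial / distal density.

    Values above 1 mean fewer (hence larger) denticles on the distal carina.
    """
    return mesial_per_mm / distal_per_mm


# Densities reported in the text for morphotype B (averages) and
# for the two morphotype C crowns.
dsdi_b_mean = dsdi(8.7, 7.0)
dsdi_c_r36779 = dsdi(15.0, 13.0)
dsdi_c_r37920 = dsdi(18.0, 17.4)

print(round(dsdi_b_mean, 2))    # 1.24
print(round(dsdi_c_r36779, 2))  # 1.15
print(round(dsdi_c_r37920, 2))  # 1.03
```

Under these assumed definitions the two morphotype C crowns bracket the reported DSDI of 1.1, and the morphotype B averages give a ratio above 1, in line with the statement that distal denticles are generally larger than mesial ones. A ratio of averaged densities is not the same as an average of per-tooth DSDI values, so the first figure is indicative only.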

Morphological comparisons
Dromaeosaurid morphotypes. We interpret morphotypes A–C as dromaeosaurids based on the machine learning classification and several morphological characters of the teeth, which, in

Troodontid morphotype. We refer the single tooth GCLRM G8-23 to Troodontidae on both morphological-based considerations and machine learning morphospace position. GCLRM G8-23 resembles the teeth of troodontids based on its large, bulbous, widely spaced and apically inclined denticles on the distal carina, the overall phylloform shape of the crown, and the presence of a basal constriction. The presence of denticles on both the mesial and distal carinae is seen in derived troodontids (

Abelisaurid lateral teeth also share this denticle morphology; however, the distal margins of most abelisaurid crowns, with a few exceptions, tend to be convex rather than straight to weakly concave and have a triangular crown shape rather than the phyllodont shape seen here (Hendrickx & Mateus 2014).
Therizinosauroid morphotype. We refer the single tooth GCLRM G167-32 to Therizinosauroidea on morphological-based considerations only, given that this tooth was incorrectly classified as a dromaeosaurid in the machine learning analysis. The subsymmetrical phylloform crown and basal constriction seen in GCLRM G167-32 are features shared with therizinosauroids such as

DISCUSSION
The application of machine learning techniques, combined with morphological-based approaches, to isolated teeth from Bathonian microvertebrate sites confirms the presence of at least three maniraptoran taxa in the assemblage: three dromaeosaur morphotypes (which might indicate multiple dromaeosaur taxa); a troodontid; and a therizinosaur.

The analysis of A. sarcophagus teeth suggests that strongly heterodont dentitions can influence morphospace occupation, with premaxillary teeth quantifiably different to maxillary and dentary teeth but with no quantifiable difference between maxillary and dentary teeth. Analysis of 848 teeth from 23 skulls of C. bauri using both discriminant analysis and canonical variate analysis shows that positional variation does not influence morphospace occupation but that it can be influenced by ontogeny. This does suggest that a degree of caution is warranted when ascribing morphotypes of isolated theropod teeth to different taxa; hence here we distinguish the teeth only as morphotypes within a broader taxonomic framework. These results provide the first quantitative support for the presence of maniraptoran theropods in the Middle Jurassic, from sites that are well constrained biostratigraphically in Bathonian ammonite zones, increase the known diversity of Middle Jurassic theropods from the UK, and provide the oldest occurrences of troodontids and therizinosaurs worldwide (Fig. 7). These identifications provide the first definitive body-fossils consistent with predictions made by phylogenetic analyses, which posited the likely presence of these clades at this time (Holtz 2000; Rauhut 2003; Xu et al. 2010; Carrano et al. 2012; Rauhut & Foth 2020). Previous reports of Middle Jurassic maniraptoran occurrences have been disputed (Foth & Rauhut 2017; Ding et al. 2020) or have considerable temporal and stratigraphic confusion (Sullivan et al. 2014; Xu et al. 2016).
The age of the paravians from the Middle to Upper Jurassic Daohugou Beds (Yanliao biota) in northeastern China is controversial because

Our results show that Maniraptora was not only established by the Bathonian but was already diverse at this time, at least in Laurasia, and also extend significantly the known temporal ranges of all major maniraptoran clades. Therizinosaurians, excluding the controversial occurrence of Eshanosaurus (Xu et al. 2001; Barrett 2009), are currently known mainly from the Cretaceous of Asia apart from the basal, and oldest, therizinosauroids Falcarius and Martharaptor from the Berriasian Cedar Mountain Formation of Utah (Kirkland et al. 2005; Senter et al. 2012; Joeckel et al. 2020) and the Turonian taxon Nothronychus from New Mexico and Utah (Kirkland & Wolfe 2001). The occurrence of a therizinosaur in the Bathonian of the UK extends the temporal range of this clade by c. 27 myr (Fig. 7). Dromaeosaurs had an almost pan-global distribution during the Late Cretaceous, although they are best known from Asia and North America. The earliest definitive dromaeosaurs, excluding records of referred isolated teeth, are from the Barremian Jehol biota of China (Xu et al. 2000; Zheng et al. 2009). Isolated teeth from the Middle and Late Jurassic of Laurasia and Gondwana have been assigned to the clade previously (Zinke 1998; Hendrickx & Mateus 2014; Vullo et al. 2014; Prasad & Parmar 2020) but their identifications have not been widely accepted (Foth & Rauhut 2017; Ding et al. 2020; Sellés et al. 2021). Our results, however, offer the first quantitative assessment of potential dromaeosaur teeth from the Middle Jurassic, confirming the existence of the clade by the Bathonian and a confirmed range extension of some 38 myr (Fig. 7). Based on comparisons with our data, it seems likely that some other published Jurassic records also represent this clade, although rigorous analysis will be needed to confirm this suggestion.
Troodontids are known primarily from the Cretaceous of Asia, Europe and North America (Brown & Schlaikjer 1943; Russell 1946; Barsbold et al. 1987; Currie 1987; Sellés et al. 2021) and possibly the Late Jurassic of China (Hu et al. 2009; Turner et al. 2012; Brusatte et al. 2014), although more recent analyses consider these Late Jurassic taxa to be basal avialans (Foth & Rauhut 2017; Pei et al. 2017). Isolated teeth from the Late Jurassic of Portugal and North America and the Late Cretaceous of India have been assigned to the clade (Chure 1994; Zinke 1998; Goswami et al. 2013) although many of these identifications have been questioned (Ding et al. 2020). Thus, our confirmed Middle Jurassic European troodontid pushes back the origin of this clade by 27 myr (Fig. 7) from the Berriasian (Geminiraptor, Utah; Senter et al. 2010) to the Bathonian.
The presence of this diverse Middle Jurassic biota also suggests we need to re-visit the biogeographical scenarios that have been proposed to account for patterns in maniraptoran faunal distributions (Case et al. 2007; Rauhut et al. 2010; Zanno 2010b; Ding et al. 2020). Two non-mutually exclusive scenarios are widely accepted as having major impacts on maniraptoran biogeographical distributions: vicariance from a widespread initial distribution, driven by continental break-up and fragmentation (Fastovsky & Weishampel 1996; Upchurch et al. 2002; Zanno 2010b; Ding et al. 2020), and faunal dispersal with dispersal routes shaped by the establishment of land bridges between continental masses (Upchurch et al. 2002; Dunhill et al. 2016; Ding et al. 2020). It is also likely that regional extinction events played a part in shaping biogeographical distributions (Sereno 1997; Barrett et al. 2011; Benson et al. 2012). The presence of Middle Jurassic Laurasian proceratosaurids and earliest Cretaceous Gondwanan ornithomimosaurs suggests that coelurosaurs were widespread before the break-up of Pangaea (Rauhut et al. 2010; Choiniere et al. 2012), with a recent analysis by Ding et al. (2020) suggesting that continental-scale vicariance was an important factor in accounting for coelurosaurian biogeographical distributions. Due to the uncertainty created by the absence of definitive and temporally well constrained pre-Cretaceous maniraptorans (Zanno 2010b; Foth & Rauhut 2017; Sellés et al. 2021), several different scenarios have been put forward to account for maniraptoran distributions while accepting that more fossil evidence would be needed in order to test these. For example, Foth & Rauhut (2017) suggested that all maniraptoran clades more derived than Ornitholestes originated and diversified in eastern Asia, followed by dispersal from this area to Europe and North America by the Late Jurassic.
By contrast, the pan-Laurasian distribution of Early Cretaceous therizinosaurs has been taken to indicate either a vicariance event, with therizinosaurs present in Asia and North America prior to major rifting and the opening of the North Atlantic, or a dispersal of basal therizinosaurs between North America and Asia via land bridges after the rifting event.

The presence of maniraptorans in the Middle Jurassic (Fig. 16) suggests that a pan-Pangaean distribution was established before continental separation began at c. 170 Ma (Scotese 2021). A combination of vicariance events driven by continental separation, regional extinctions and later dispersal events can then be invoked to account for the later Mesozoic distributions.
Machine learning provides a powerful new tool that can provide quantitative assessments of isolated theropod tooth identifications and has been shown to outperform other analytical methods (Wills et al. 2021). The use of multiple machine learning algorithms, as applied here, enables the corroboration of results by checking the predictions of one technique against those derived from another. It is also important to note the limitations of any technique: our study was constrained (due to the nature of the training datasets available) to a small number of morphometric variables. Moreover, data availability was too poor to accurately describe a model in some cases. However, we expect the ability to classify isolated teeth in this manner to improve with the collection of more data (including 3D data) to train the classifiers. For now, we emphasize the importance of cross-checking results from machine learning analyses with more traditional morphological-based approaches.
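The two ensemble modes referred to in this study (majority vote and combined posterior probability) can be illustrated with a short sketch. This is not the authors' implementation: the combination rule is assumed here to be a simple unweighted mean of the per-model posteriors, and the posterior values for the three hypothetical models are invented purely for illustration.

```python
from collections import Counter
from typing import Optional


def majority_vote(labels: list[str]) -> Optional[str]:
    """Return the class chosen by more than half of the models, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def combined_posterior(posteriors: list[dict[str, float]]) -> tuple[str, float]:
    """Average per-class posteriors across models (an assumed combination rule)
    and return the best class together with its combined posterior."""
    classes = posteriors[0].keys()
    mean = {c: sum(p[c] for p in posteriors) / len(posteriors) for c in classes}
    best = max(mean, key=mean.get)
    return best, mean[best]


# Hypothetical posteriors for one tooth from three independent models.
model_outputs = [
    {"Dromaeosauridae": 0.7, "Troodontidae": 0.2, "Therizinosauroidea": 0.1},
    {"Dromaeosauridae": 0.5, "Troodontidae": 0.4, "Therizinosauroidea": 0.1},
    {"Dromaeosauridae": 0.3, "Troodontidae": 0.6, "Therizinosauroidea": 0.1},
]

# Each model's own top-ranked class feeds the majority vote.
votes = [max(p, key=p.get) for p in model_outputs]
vote_label = majority_vote(votes)
prob_label, prob_value = combined_posterior(model_outputs)

print(vote_label)                        # Dromaeosauridae
print(prob_label, round(prob_value, 2))  # Dromaeosauridae 0.5
```

In this made-up case one model disagrees with the other two, yet both modes return the same class. The combined posterior also exposes how close the runner-up class is, which is the kind of information that morphological cross-checking can then act on.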

CONCLUSION
The use of machine learning algorithms has enabled us to confirm, in a quantifiable framework, the presence of a diverse maniraptoran theropod fauna in the Middle Jurassic (Bathonian) of the UK. Our sample includes the oldest-known occurrences of Troodontidae and Therizinosauroidea. This confirms a Middle Jurassic (or earlier) origin for Maniraptora and suggests that the clade had a pan-Pangaean distribution prior to continental break-up. The presence of these early maniraptorans, currently known only from isolated teeth, highlights the importance of incorporating microvertebrate remains into faunal and evolutionary analyses. The accuracy of machine learning results is hampered by the quality of the data used to train the models, and larger datasets will be required to improve model performance, but the combination of these results with morphological-based identifications can overcome this issue to provide a robust, testable framework for taxonomic identifications.