The Tien Shan vole (Microtus ilaeus; Rodentia: Cricetidae) as a new species in the Late Pleistocene of Europe

Abstract Grey voles (subgenus Microtus) represent a complex of at least seven closely related and partly cryptic species. The range of these species extends from the Atlantic to the Altai Mountains, but most of them occur east of the Black Sea. Using ancient DNA analyses of the Late Pleistocene specimens, we identified a new mtDNA lineage of grey voles in Europe. Phylogenetic analysis of mitochondrial DNA cytochrome b sequences from 23 voles from three caves, namely, Emine–Bair–Khosar (Crimea, Ukraine), Cave 16 (Bulgaria), and Bacho Kiro (Bulgaria), showed that 14 specimens form a previously unrecognized lineage, sister to the Tien Shan vole. The average sequence divergence of this lineage and the extant Tien Shan vole was 4.8%, which is similar to the divergence of grey vole forms, which are considered distinct species or being on the verge of speciation; M. arvalis and M. obscurus or M. mystacinus and M. rossiaemeridionalis. We estimated the time to the most recent common ancestor of the grey voles to be 0.66 Ma, which is over twice the recent estimates, while the divergence of the extant Tien Shan vole and the new lineage to be 0.29 Ma. Our discovery suggests that grey voles may have been more diversified in the past and that their ranges may have differed substantially from current ones. It also underlines the utility of ancient DNA to decipher the evolutionary history of voles.


| INTRODUC TI ON
Climatic and environmental changes in the Pleistocene are widely understood to have shaped the species evolutionary history and distribution of mammals worldwide (Hofreiter & Stewart, 2009;Stuart, 2015). The climatic and environmental processes that occurred during the Late Pleistocene and the onset of the Holocene had a profound impact on the extinction and formation of species that make up extant mammal faunas (Baca et al., 2017;Cooper et al., 2015;Lorenzen et al., 2011;Sommer, 2020). Small mammals are an integral component of every biome, and rodents, which are well-known arvicolines (subfamily Arvicolinae), are the main component of nearly every Late Pleistocene fossil assemblage in the Northern Hemisphere. In Europe, aside from lemmings (Dicrostonyx and Lemmus), several vole species (from the genera Arvicola, Lasiopodomys, Alexandromys, and Microtus) are the predominant type of small mammal (Kowalski, 2001).

and distributed in Europe and
Western and Central Asia. The specific status of members of this group remains a matter of taxonomic debate. For example, some authors consider M. arvalis and M. obscurus as separate species (Tougard et al., 2013), whilst others consider them chromosomal races of the single species (Golenishchev et al., 2019;Sibiryakov et al., 2018). Similarly, the taxonomic status of the 54-chromosome forms of grey voles, that is, M. rossiaemeridionalis and M. mystacinus, is ranked differently. Some authors treat these chromosomal forms as the single species M. mystacinus (e.g., , whilst others reserve this species name only for the Iranian clade and classify European populations to the separate cryptic species M. rossiaemeridionalis (Golenishchev et al., 2019). In the Late Pleistocene of south-eastern Europe, only two species of the arvalis group are recognized, namely, the common vole (M. arvalis), which has been detected in the Balkans and Pannonian Basin (e.g., Bogićević et al., 2017;Mauch Lenardić, 2007;Popov, 2018) but is absent in the western and northern fringes of the Black Sea (Krokhmal & Rekovets, 2010;Petculescu & Ştiucă, 2008), and M. obscurus, which has been detected only in Crimea (Markova, 2011;Ridush et al., 2013).
The species determination as M. arvalis based on morphology has recently been confirmed by genetic studies of sub-fossil specimens from different parts of Europe Lemanik et al., 2020). Given the morphological and morphometric uniformity of molars of species from the arvalis group (Kochev, 1986;Markova et al., 2010), the timing of evolutionary events within the subgenus is based on molecular data. However, estimates of divergence times vary considerably depending on the calibration method used.
The age of the most recent common ancestor of grey voles was estimated to be between 1.2 Ma, as determined using fossil calibration (Thanou et al., 2020), and 0.315 Ma, as determined using ancient DNA-based calibration .
Unfortunately, the remarkable variation in the estimates of species divergence prevents the accurate association of evolutionary events with biogeographic or climatic data and the reconstruction of the evolutionary history of this group.
We sampled isolated molars or mandibles with molars identified as M. arvalis or Microtus sp. on the basis of morphology. To increase the number and temporal range of radiocarbon-dated specimens, we included two common vole mandibles, one from Obłazowa cave (WE) and another from Obłazowa 2 in Poland. Finally, we extracted DNA from ethanol-preserved tissue fragments of four Tien Shan voles collected by JZ in the early 1990s in Kyrgyzstan and stored at the Institute of Vertebrate Biology, Czech Academy of Sciences, Brno, Czechia (Table S1).

| Morphometric analyses
The morphological characteristics and general taxonomy of the fossil specimens, including those belonging to the newly identified mtDNA lineage (n = 14), were identified using linear measurements.
The nomenclature and measurements of the first lower molar, total length (L), and length of the anteroconid complex (A) were conducted according to the methods proposed by van der Meulen (1973). La/ Li indices were calculated to quantify the degree of asymmetry between triangles T4 and T5 according to the method prescribed by Nadachowski (1984a) and modified by Cuenca-Bescós and Laplana (1995). This morphometric parameter is useful to distinguish vole species with an arvaloid morphology (e.g., Bogićević et al., 2012;Nadachowski, 1984aNadachowski, , 1984b).

| DNA extraction, enrichment, and sequencing
The samples were processed in a clean laboratory at the Laboratory of Paleogenetics and Conservation Genetics, Centre of New Technologies, University of Warsaw. The pre-PCR laboratory has separate rooms designated for specimen preparation, DNA extraction, and library preparation. There is an increased air pressure to enforce unidirectional airflow toward the exterior of the laboratory, and the rooms were UV irradiated after each use. No modern tissues of any species have ever been processed in this laboratory. The post-PCR laboratory is physically separated from the pre-PCR one.
Prior to DNA extraction, each tooth was rinsed with ultrapure water and crushed with a pipette tip. DNA was extracted using a protocol optimized for the retrieval of ultrashort DNA sequences (Dabney et al., 2013). Each batch of extractions was accompanied by blanks to monitor possible contaminations. A fraction of the extract was converted into double-stranded, double-indexed Illumina libraries following the protocol of Meyer and Kircher (2010) with minor modifications (Baca et al., 2019). After the fill-in reaction, each library was amplified for 19 cycles in three replicates by using AmpliTaq Gold 360 DNA polymerase (Applied Biosystems). To enrich the mtDNA libraries, we performed two cycles of in-solution hybridization following the protocol by Horn (2012). After hybridization, the libraries were quantified, pooled, and sequenced on the NextSeq550 platform (2 × 75 bp, MID output).
DNA from modern specimens was extracted in a laboratory that was physically isolated from the ancient DNA and post-PCR laboratories by using a DNeasy blood & tissue kit (QIAGEN ® ) following the manufacturer's recommendations. A fraction of the DNA extracts was converted into sequencing libraries and subjected to in-solution target enrichment by using the same protocols conducted to obtain libraries from the ancient DNA except that the new libraries were amplified for 12, instead of 19, cycles and only one round of hybridization was performed.
Adaptors and low-quality nucleotides were removed, and overlapping reads were merged using AdapterRemoval v.2 (Schubert et al., 2016). Filtered reads were mapped to the mtDNA sequences of various Microtus species using bwa mem (Li, 2013). Here, only reads longer than 30 bp and MapQ >30 were retained. Putative PCR duplicates were removed with the samtools rmdup command (Li et al., 2009). Comparison of mapping statistics with different mtDNA references enabled the preliminary species assignment. Variants were called using the bcftools mpileup and call commands, and alignments were visually inspected in Tablet (Milne et al., 2013). A list of regions with a coverage of <3 was generated using the bedtools genomecov command (Quinlan & Hall, 2010), and these regions were subsequently masked with N. Consensus was called using the bcftools consensus command. The mtDNA genome of extant M. ilaeus was de novo assembled using NOVOplasty (Dierckxsens et al., 2017).

| Phylogenetic analyses
Phylogenetic reconstruction was conducted based on a sequence of the mtDNA cytochrome b (1143 bp). We used a dataset of 152 sequences that included all species of Microtus (s. str.), as well as species from the subgenera Sumeriomys (social voles) and Terricola (M. subterraneus (Selys-Longchamps, 1836)). We used two sequences of field vole (M. agrestis) and another two sequences of European snow vole (Chionomys nivalis (Martins, 1842)) as out-groups (Table   S2). We accepted only sequences covering over 70% of the mtDNA cytochrome b. To include also available genetic information from M. ilaeus igromovi, a vicariant population of M. ilaeus classified as a subspecies (Golenishchev et al., 2019), additional analysis was performed using short cytochrome b fragment (341 bp).
The best-fitting substitution model was revealed by jModelTest2 (Darriba et al., 2012) to be TrN+I+G. Bayesian phylogeny was reconstructed using MrBayes 3.2.7a (Ronquist et al., 2012). We conducted two independent runs with four coupled chains each for 5×10 6 generations sampled every 500 generations. A maximum likelihood tree was reconstructed using IQtree (Nguyen et al., 2014) with 1000 ultrafast bootstrap replicates to assess branch support. In both analyses, the data were partitioned into three codon positions.
We used the Bayesian approach implemented in BEAST 1.10.4 (Suchard et al., 2018) to estimate divergence times within the Microtus subgenus. In this analysis, we used only sequences with known sampling times (n = 95; Table S2). To calibrate the molecular clock, we used 12 sequences obtained from directly radiocarbondated specimens. Nine of these sequences originated from M. arvalis and have been previously reported . Three other dated sequences, that is, two from M. arvalis and one from the new mtDNA lineage, are reported here for the first time (Table   S4). To increase the number of sequences from the newly reported mtDNA lineage for divergence dating analysis, we used the sequences of two specimens from layer G in Emine-Bair-Khosar dated to the post-LGM period (Doan et al., 2018) and assigned them a sampling time of 15 cal ka BP. We used the Bayesian evaluation of temporal signal (BETS) (Duchene et al., 2020) to check whether sufficient temporal resolution is available within our dataset to calibrate the molecular clock. We compared the support for the four models; in two of them, we assigned real sampling times to the sequences (heterochronous analysis) and then used either strict clock or uncorrelated relaxed lognormal clock. In the two other models, we used the same sampling time (i.e., isochronous analysis) for all sequences and applied either strict clock or uncorrelated relaxed lognormal clock. We applied a constant population size tree prior for all analyses and a CTMC rate reference prior for the heterochronous datasets (Ferreira & Suchard, 2008).
Each analysis was run for 20 million generations sampled every 2000 generations. We then estimated the log marginal likelihoods (MLE) of each model by using the generalized stepping-stone (GSS) sampling approach (Baele et al., 2016). The MLE calculation comprised 50 path steps, each of which was run for 10 6 iterations. We performed two replicates of each BEAST analysis. Convergence and stationarity were inspected in Tracer 1.7 (ESS > 200 for all parameters), and trees from the two runs were combined in logcombiner and summarized in treeannotator.
The mean divergence (D xy ) between mtDNA lineages was calculated in MegaX (Kumar et al., 2018) by using the TrN substitution model with uniform rates across sites and 100 bootstrap replicates to determine the variance. Both transitions and transversions were included, and pairwise deletion was employed in case of missing data.

| Radiocarbon dating
Three vole mandibles, one each from Obłazowa WE, Obłazowa 2, and Bacho Kiro Cave, were pre-treated for radiocarbon dating at the Department of Human Evolution, Max Planck Institute for Evolutionary Anthropology (MPI-EVA, Leipzig, Germany), following the protocol for <100 mg bone samples described in Fewlass et al. (2019). The quality of the collagen extracts was assessed based on the yield as a percentage of the original bone weight (minimum requirement, 1%). The elemental and isotopic ratios of the extracts (~0.5 mg) were measured on a Thermo Finnigan Flash elemental analyzer coupled to a Thermo Delta Plus XP isotope ratio mass spectrometer. If sufficient collagen was extracted, the collagen was graphitized using automated graphitization equipment  at the Laboratory of Ion Beam Physics, ETH-Zurich (Switzerland) and dated on a MIni CArbon DAting System (MICADAS) accelerator mass spectrometer (Wacker, Bonani, et al., 2010) (Laboratory Code: ETH). If the extracted collagen yield was insufficient for graphitization, it was combusted to CO 2 and measured directly using a gas interface system coupled to the gas ion source of the MICADAS (Wacker et al., 2013) following the protocol described in Fewlass et al. (2018Fewlass et al. ( , 2019

| RE SULTS
Although we enriched our libraries for the whole mtDNAs, in numerous specimens, we encountered a mixture of at least two sequences spanning usually the first ca. 11 kb or, much less often, a larger portion of the mtDNA genome; thus, reliable consensus calling could not be achieved. The contaminating sequences were likely to be nuclear sequences of mitochondrial origin (numts). The presence of numts in mtDNA assemblies was previously reported for various Microtus species (Barbosa et al., 2018;Duckett et al., 2021;Triant & DeWoody, 2008), and their presence are especially expected in the case of the assemblies based on mtDNA-enriched libraries. Therefore, we called consensus sequences only for mtDNA fragments where numts were not present, and this was either whole mtDNA, ca. 4.3 kb of mtDNA (12,001-16,267 according to NC_038176.1) or only the cytochrome b sequence (1143 bp) (Table S1). In the case of several specimens, the presence of second sequence prevented reliable consensus calling even for the cytochrome b fragment and such specimens were discarded (Table   S1). We recovered at least the cytochrome b sequence of 22 specimens from the three investigated caves and the two common vole specimens from Poland. All the specimens yielded a damage pattern and length distribution of DNA molecules characteristic of ancient DNA (Table S1).

| Phylogenetic analysis and divergence estimates
The phylogeny based on the mtDNA of cytochrome b recovered all main mtDNA lineages of grey voles with high support (Figures 1 and S1). One specimen from Bacho Kiro cave was revealed to be European pine vole (M. subterraneus). Four specimens were classified as common vole (M. arvalis), two from Bacho Kiro and two from Cave 16, another three specimens from Emine-Bair-Khosar cave were identified as M. obscurus. A total of 14 specimens, specifically, eight from Emine-Bair-Khosar, five from Cave 16, and one from Bacho Kiro, formed a previously unrecognized, highly supported lineage sister to the Tien Shan vole (M. ilaeus; Figures 1 and S1). The mean divergence (D xy ) of this new lineage and the Tien Shan vole was 4.8% ± 0.6%, which is slightly higher than the D xy between M. arvalis and M. obscurus (3.9% ± 0.5%) and similar to the D xy of M. mystacinus and M. rossiaemeridionalis (4.8% ± 0.7%; Table S3). It was also much higher than divergence between M. ilaeus and M. ilaeus igromovi (2.6% ± 0.7%; based on 341-bp fragment; Table S3).

| Estimation of divergence times within the subgenus Microtus
The three specimens pre-treated for radiocarbon dating produced high-quality collagen in terms of Carbon-to-Nitrogen ratio and yield (Table S4; van Klinken, 1999;Talamo et al., 2021). The resulting dates were consistent with the stratigraphic position of the specimens (Text S1) and used for calibration of the molecular clock. BETS analysis showed that our dataset contains sufficient temporal signal to calibrate the molecular clock. Amongst the models tested, the model with a strict molecular clock and correctly assigned sampling times produced the highest log MLE and it was highly supported (2lnBF = 23.54, 19.52, and 15.09 with respect to models with the correct sampling times and an uncorrelated relaxed clock, the model with no sampling times and a strict clock, and the model with no sampling times and an uncorrelated relaxed clock, respectively; Table S5). The validity of our dataset to estimate rates was further supported by the results of the date randomization test (Table S6)

| Morphometric features of the new mtDNA lineage
Measurements were performed on all 23 specimens that yielded mtDNA sequences. We also calculated the La/Li index of subset specimens from Bacho Kiro and Cave 16 because this parameter has not been reported in original publications describing the assemblages of small mammals from these sites (Nadachowski, 1984b;Popov, 2000). The tooth size of specimens from the newly identified lineage was between 2.53 and 3.31 mm ± SD, 2.92 ± 0.24; Figure 3; Table S7). The teeth of specimens from Bulgaria were significantly larger (xL = 3.16, n = 6) than those from Crimea (xL = 2.71, n = 7; T-test, p < .005; Figure 3). In Bulgaria, specimens from the new lineage were larger than those of M. arvalis and exceeded the size range of M. agrestis (Figure 3), but this result must be treated with caution because only very small number of individuals were measured. The La/Li index varied between 61.4 and 85.7 (x ± SD, 72.55 ± 6.61; n = 14) and were within the range recorded for M. arvalis, that is, generally exceeded 65.0 (Nadachowski, 1984a; Figure 3; Table S7).
The La/Li index for a random subset of specimens from Bacho Kiro (x ± SD, 67.98 ± 7.41; n = 84) and Cave 16 (x ± SD, 71.18 ± 6.75; n = 60) also yielded values in the range of species from the arvalis group ( Figure 3). The anterioconid complex of M. ilaeus, especially the anterior cap, was highly variable, and no morphological features that could clearly distinguish M. ilaeus from M. arvalis and/or M. obscurus were found (Figure 4).  Table S7). The size of the teeth (L) of the new species was highly variable but remained within the range of tooth sizes reported for other species from the subgenus . Differences in tooth size between M. ilaeus from Bulgaria and Crimea and M. ilaeus and M. arvalis from Bulgaria may reflect partitioning into different ecological niches; the current data, however, are too limited to build any definite conclusions.

| Characterization of the newly discovered lineage
The difference in size between M. arvalis and M. ilaeus from Bulgaria may have, however, some implications for understanding of paleontological small mammal assemblages from the area.
Amongst rodents from the late Middle and Late Pleistocene of Bulgaria, voles with the arvaloid m1 morphology were found in 11 sites (Popov, 2018). These voles are most often referred to as Microtus ex gr. arvalis/agrestis (Nadachowski, 1984b;Popov, 2000Popov, , 2018Popov & Marinska, 2007) or Microtus ex gr. arvalis (Ivanova et al., 2016;Popov, 1994). Because of the large variability and range of the m1 length of specimens with arvaloid morphology from Bacho Kiro and Cave 16, these specimens were previously suggested to belong to two species: the smaller M. arvalis and the larger M. agrestis (Nadachowski, 1984b;Popov, 2000). Considering the genetic and morphometric data obtained in this study, these species may be considered to actually be two species from the arvalis group, namely, the smaller M. arvalis and the larger M. ilaeus, not M. agrestis, especially because the latter species does not occur in Bulgaria today (Shenbrot & Krasnov, 2005) and was not indisputably found in the Late Pleistocene (Popov, 2018).
In contrast to other species of the subgenus Microtus, the newly described mtDNA lineage of M. ilaeus most probably did not survive into the present time; however, the exact timing and causes of its extinction remain unclear. Specimens from Cave 16 were obtained from layers below the Campanian Ignimbrite tephra dated to ca. 40 ka (Giaccio et al., 2008;Popov, 2018), which is slightly older than the age of the specimen from Bacho Kiro. The distribution of large (L > 3.1 mm) teeth at both Bulgarian sites, F I G U R E 2 Maximum clade credibility phylogeny of the subgenus Microtus. The phylogeny was reconstructed in BEAST and is drawn to a timescale. Black dots indicate nodes with >0.95 posterior probability. The estimated divergence times with their 95% highest posterior density intervals are indicated next to the nodes leading to the main lineages. The median of the 95.4% calibrated range ages of the directly radiocarbon dated specimens are given in the labels

Microtus rossiaemeridionalis
which were previously considered M. agrestis and now may be attributed to M. ilaeus, may provide more detailed insight into the chronology of this form. In Cave 16, such large teeth were found nearly exclusively in layers 9 and 8, both of which date to over 40 kya (Popov, 2000; Text S1). In Bacho Kiro, the same-size teeth occurred in multiple layers, with a noticeable peaks in layers 13 and 6b-7, and disappeared in layer 3. The new radiocarbon chronology of the site suggests that layer 13 dates to more than 51 ky BP, while the layers 6b and 7 to 43-36 ka cal BP (Fewlass et al., 2020).
There are no radiocarbon dates from layer 3 but based on small mammal assemblage it was dated to the LGM (Kozłowski, 1982;Nadachowski, 1984b). Specimens belonging to the new mtDNA lineage from Emine-Bair-Khosar were distributed across layers H and G between depths of 4.8 and 1.9 m. Layer H is dated to F I G U R E 3 Morphometric characteristics of the newly identified mtDNA lineage of Microtus ilaeus. The boxplots summarize the variations in length (L) and La/Li index of the arvaloid m1s from the three studied sites. L is given according to Nadachowski (1984b) and Popov (2000). The box extends from 1st to 3rd quartile, while the whiskers extend to the data points within 1.5 times the interquartile range (IQR) from the ends of the box, values outside this range are denoted with circles. Vertical lines denote medians, whilst crosses denote mean values. The colored symbols represent individual measurements of specimens assigned to species using mtDNA sequences. The reference ranges for M. arvalis (ARV) and M. agrestis (AGR) are based on modern specimens according to Nadachowski (1984a) (Table S7) the older part of MIS 3 (between >50 and 36 ka cal BP) (Ridush et al., 2013). The youngest specimens come from the uppermost part of layer G. This layer is attributed to MIS 2, most probably the post-LGM or even the Late Glacial period (Ridush et al., 2013).
These findings suggest that the extinction of the newly described mtDNA lineage may have been caused by environmental and climatic changes associated with the Pleniglacial to the Late Glacial or the Late Glacial to the Holocene transitions.

| Molecular rates and dating of divergences within subgenus Microtus
The Estimates of evolutionary rates are known to scale negatively with the age of calibration Molak & Ho, 2015).
In previous studies, very recent calibration points were used to estimate rates; in the case of M. arvalis, these points were sequences from specimens radiocarbon-dated to no more than 6 kya. In the case of M. agrestis, the calibration points employed were biogeographic events younger than 10 kya. The substitution rates obtained from such calibration are more suitable for the estimation of recent, intraspecific rather than interspecific events.
In the present work, we used calibration specimens dated up to 40 kya to estimate interspecific divergence times. Consequently, our estimates were nearly two times older than those proposed by , who based their calculations on the substitution rate of Martínková et al. (2013). Our estimates are also nearly two times younger than the estimates based on fossil calibration (Thanou et al., 2020 (Bannikova et al., 2010). The obtained estimation is also highly consistent with the first appearance of the M. arvalis group in fossil record (MIS 14-11; 0.60-0.45 Ma). Although, the distinction between species from the arvalis group and M. agrestis in fossil record may be difficult as exemplified by our study, the appearance of arvalis group around 0.60-0.45 Ma was reported independently by multiple authors (Berto et al., 2021;Fejfar, 1965;Kolfschoten, 1990;Maul & Markova, 2007 (Bikchurina et al., 2021). Therefore, the taxonomic forms identified above should be considered valid species despite their low genetic divergence of only 4%-5%. The mtDNA divergence between the newly identified lineage and M. ilaeus is also much higher than between the latter and M. ilaeus igromovi, which are considered subspecies ( Figure S1; Golenishchev et al., 2019).
This reinforces the interpretation that the newly identified lineage represents a distinct species. On the other hand, Allen et al. (2020) suggested that interspecific reproductive isolation in mammals appears when cytb DNA sequences differ by over 7.2%. Because of these differences in interpretation of mtDNA diversity, we do not postulate taxonomic changes in the M. arvalis species group.
Further accumulation of genomic and morphological data will provide a better understanding of the evolutionary history of this lineage.
Overall, our ancient DNA study revealed the existence of a previously unknown, highly divergent population of the Tien Shan vole (M.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no conflict of interest.