Quaternary geomorphological and climatic changes associated with the diversification of Iberian freshwater fishes: The case of the genus Cobitis (Cypriniformes, Cobitidae)

Abstract We studied the population genetic structure of Cobitis vettonica, an endangered freshwater fish species endemic to the Iberian Peninsula, in order to propose a biogeographic model of the responses of species to the multiple changes that occurred in the Iberian hydrological system during the Quaternary period. We also deciphered the relationship of C. vettonica with its sister species C. paludica, particularly in sympatric areas, and provide genetic information for conservation purposes. To achieve this goal, we analyzed both mitochondrial and nuclear data (the cytochrome b and the nuclear recombination activating 1 genes) and a battery of single‐nucleotide polymorphisms (SNPs) of 248 individuals of C. vettonica or C. paludica from 38 localities, including some sympatric ones, covering the entire distribution area of C. vettonica. We highlight the important role played by the hydrogeomorphological processes and climatic changes that occurred in the Iberian Peninsula during the Quaternary on both the population structure of C. vettonica and its relationship with its sister species C. paludica. Our results support the genetic introgression of populations at the eastern limit of the distribution of C. vettonica. Furthermore, we postulate genetic introgression in sympatric areas. Finally, we propose the establishment or expansion of four Operational Conservation Units (OCUs) for C. vettonica, and highlight the threat faced by its populations due to the low level of genetic diversity detected for some of its populations and genetic introgression with C. paludica, which could eventually displace C. vettonica, resulting in a loss of diversity in this species.


| INTRODUC TI ON
The Quaternary is a geological period characterized by glacialinterglacial cycles that have dominated the global climate since 2.58 Mya to the present era (Gibbard et al., 2010;Pillans & Naish, 2004), which have had drastic consequences on the evolution of the biota of many regions. Due to advances in the field of phylogeography, we have a better understanding of the responses of organisms to Pleistocene events (Hewitt, 2004;Weiss & Ferrand, 2007).
Classical studies support the role of Mediterranean peninsulas as refuges for fauna during the Quaternary, which provided the stock for recolonizations of northern and central Europe (Hewitt, 1996;Taberlet & Bouvet, 1994). The Iberian Peninsula is considered one of the most important refuges during this period, as shown by several studies, mainly of terrestrial fauna (Querejeta et al., 2017;Valdiosera et al., 2008).
Primary freshwater fishes (i.e., fishes strictly intolerant of saltwater; Myers, 1966) have limited dispersal abilities and are often restricted to specific hydrographic basins. Dispersal between basins during the Quaternary was possible mainly through downstream connections caused by the decrease in sea level during cold periods or by upstream piracy (Corral-Lou et al., 2019;Mesquita et al., 2005;Perea & Doadrio, 2015). The formation of the Iberian fluvial network also culminated during this period, which affected the region's hydrogeomorphology and therefore the current evolutionary patterns found in primary freshwater fauna (Alonso et al., 2007;Corral-Lou et al., 2021;Pais et al., 2012;Perea et al., 2016). Likewise, the current population structure of many Iberian freshwater fishes has been attributed to the interaction of various natural factors that occurred during the Quaternary such as the drying up of bodies of water, sealevel fluctuations, tsunamis, stream piracy, isolation of basins, hydrogeomorphological changes, and secondary contact of two different basins (Casal-López et al., 2017;Corral-Lou et al., 2019;Gante et al., 2009;Perea et al., 2016).
Despite efforts made in the last several decades, there is still much to be learned about the impact of Quaternary changes on the evolutionary processes and patterns of diversification of Iberian freshwater fish populations. Especially those species with a restricted distribution range since they were probably more affected by climatic and geological changes during the Quaternary than species with larger ranges. In order to address some open evolutionary questions, such as the role of hydrogeomorphologic changes and genetic introgression in species evolution, we analyzed populations of Cobitis vettonica as a case study. This species is an endangered Iberian freshwater fish whose distribution is restricted to a few rivers in the Tagus and Duero basins in the mid-western Iberian Peninsula (Figure 1). It inhabits rivers with low pH and water hardness levels, and gravel and rocky bottoms, and is more commonly found in the headwaters of these rivers (Carmona et al., 1999;Collares-Pereira et al., 2021;Doadrio et al., 2011;Perdices & Coelho, 2020). In contrast, the sister species of C. vettonica, Cobitis paludica (Doadrio & Perdices, 2005;Perdices & Doadrio, 2001), is a generalist that inhabits most of the Iberian drainages including Tagus and Douro, with a preference for streams close to the main channel with high suspended solids, high water hardness, low transparency, low current velocity and muddy bottoms (Carmona et al., 1999;Doadrio et al., 2011). Sympatric zones of C. vettonica and C. paludica have been reported in the western limits of the distribution area of C. vettonica but genetic introgression between them has not been reported (Perdices & Coelho, 2020). However, in some eastern populations there has been mention of genetic introgression between C. vettonica and C. paludica, but no more information has been detailed (Doadrio et al., 2011. Sister species of Iberian freshwater fishes generally have an allopatric distribution that was established mainly before the Quaternary (Doadrio, 1988;Sousa-Santos et al., 2019). Some species now show patterns of sympatry in certain areas as a consequence of secondary contacts during the Quaternary, which has led to genetic introgression, as is the case of some species within the genera of Luciobarbus and Phoxinus (Corral-Lou et al., 2019;Denys et al., 2013;Machordom et al., 1990). In the case of C. vettonica and C. paludica, phylogenetic studies have shown they diverged during the Plio-Pleistocene (Doadrio & Perdices, 2005;Sousa-Santos et al., 2014). However, both the origin of the sympatric zones between C. vettonica and C. paludica and the plausible genetic introgression between them in the western distribution area of C.
vettonica are unknown. All these points make fishes with restricted distribution areas as excellent models to decipher the Quaternary effects on the diversification of freshwater-restricted taxa within a glacial refuge as the Iberian Peninsula ).
Phylogeographic studies of Iberian primary freshwater fishes have mainly used mitochondrial genes combined with nuclear markers such as microsatellites or introns (Casal-López & Doadrio, 2018;Corral-Lou et al., 2019;Gonzalez et al., 2014Gonzalez et al., , 2018Perea & Doadrio, 2015). In recent years, phylogeographic and population genetics studies have taken advantage of next-generation sequencing (NGS) technologies to broadly screen the genome at high resolution, yet some challenges remain in the analysis of NGS F I G U R E 1 Sampling localities included in this study. The orange shading delimits the Duero Basin, and the green shading delimits the Tagus Basin. Numbers correspond to those listed in Table 1 data such as the handling of large-scale and complex data, the upstream process of pipeline, and the limited number of available reference genomes (McCormack et al., 2013;Tan et al., 2019).
Combined analyses of single-nucleotide polymorphisms (SNPs) with mitochondrial and nuclear markers have proven useful to provide more complete and more detailed phylogeographic and biogeographic models of the past and present relationships of populations of various species Mendes et al., 2019;Zarraonaindia et al., 2012).
The main aim of this study is to provide a robust biogeographic model for the species C. vettonica throughout its entire distribution as a witness of the evolution of the Iberian Peninsula throughout the Quaternary. We also decipher the relationship of C. vettonica with its sister species C. paludica at the limits of its distribution area. In addition, we assess the population structure and genetic diversity of the species across its distribution to revise the Operational Conservation Units (OCUs; Doadrio et al., 1996) previously established for C. vettonica  or to establish new ones. More detailed and accurate OCUs and deciphering the relationship with its sister species throughout its entire range are key for the effective management and conservation of the endangered C. vettonica. To achieve these goals, we analyzed the mitochondrial marker cytochrome b (MT-CYB), the nuclear marker recombination activating gene 1 (RAG1), and a set of SNPs obtained through next-generation sequencing of populations throughout the entire distribution range of C. vettonica and some of C. paludica from adjacent sub-basins.

| Sampling, DNA extraction, amplification and sequencing
We sampled 204 individuals of C. vettonica and 60 of C. paludica from a total of 38 localities ( Figure 1; Table 1). The 27 localities (23 in the Tagus and 4 in the Duero) in which C. vettonica was found cover the entire known distribution area of the species, and includes its type locality (Árrago River, Tagus Basin). The samples of C. paludica were collected from 13 localities, including its type locality (Tiétar River, Tagus Basin): of these, 11 were either adjacent to those of C. vettonica or in sympatric localities in the Tagus Basin, and two were in the Duero Basin ( Figure 1; Table 1). Tissue samples were obtained from the DNA and Tissue Collection at the National Museum of Natural Sciences of Madrid (MNCN-CSIC; Table S1). Sequences of MT-CYB of C. vettonica and C. paludica available in GenBank were also included in the study (33 of C. vettonica and 7 of C. paludica; Table S1).
For each individual, DNA was extracted from ventral fin tissue using the Qiagen DNeasy ® Blood and Tissue Kit (Qiagen, Inc., Valencia, CA, USA), following the manufacturer's protocol.

| Genotyping and SNP filtering
For the SNP study, we selected eight populations of each C. vettonica and C. paludica (Table 1) based on the structure observed in a previous study using mitochondrial data and available DNA . A total of 106 individuals (60 of C. vettonica and 46 of C. paludica; Table S1) were used to prepare libraries for double digest restriction site-associated DNA sequencing (ddRADseq) following the protocol described by Kess et al. (2016). All DNA samples were used as an input for a custom library preparation protocol. Libraries were dual-indexed for postsequencing demultiplexing. The samples were run in a NovaSeq 6000 PE150 lane.
Trimmomatic 0.36 (Bolger et al., 2014) was used to remove adapters (ILLUMINACLIP option). Using the process_radtags program in STACKS 2.4 (Catchen et al., 2013), all reads were truncated to the same length of 95 bp, and low-quality reads were removed using the -q parameter according to phred33 system. A de novo_ map analysis was also performed in STACKS, in which different programs were run to assemble loci in each individual (ustacks), build a catalogue (cstacks), match all generic samples against the catalogue (sstacks), and reconstruct loci using R2 reads and identify SNPs using the metapopulation information (gstacks). Prior to running the STACKS modules, several tests were used to identify which parameters maximized the number of SNPs obtained in at least 80% of the individuals (r80 rule; Paris et al., 2017). Since SNPs were obtained for both species, we explored parameters in a conservative way in the de_novo maps module of stacks. The parameter m (i.e., the minimum depth of coverage required to create a stack) was set to 5 due to the depth of coverage values obtained. The M parameter (i.e., number of mismatches allowed between stacks within individuals) has to be carefully set. If it is set too high, paralogous or nonhomologous loci can be incorrectly merged into the same locus and if it is set too low, homologous loci can be lost. For this reason, the parameter M was explored between 3 and 5. Following the indications of Paris et al. (2017), the values of n (i.e., the number of mismatches allowed between stacks between individuals) were explored from 2 to 4 (Table S2).
Finally, the selected parameters were m = 5, M = 3, and n = 2. The TA B L E 1 Information on the sampling localities included in the present study

| Phylogeny
A phylogenetic tree was constructed to assess relationships based on the collapsed MT-CYB haplotypes matrix. The selected substitution models were SYM, HKY + I, and GTR + G for the first, second, and third position, respectively, based on the results obtained in PartitionFinder2 (Lanfear et al., 2017) using the Akaike information criterion (AIC; Akaike, 1974). The analysis was implemented in MrBayes v3.2 (Ronquist et al., 2012), with two simultaneous independent runs each with four Markov chain Monte Carlo (MCMC), which were run for 5 × 10 7 generations. The first 25% of generations were removed as burn-in. Posterior probability (pp) values were used to assess the reliability of the phylogenetic hypothesis.
Two sequences of other species of Cobitis were used as outgroups: C. bilineata (EF605321.1) and C. zanandreai (AF263089.1). The genetic divergence between the lineages obtained in the MT-CYB analysis was evaluated through uncorrected p-distances with 1000 permutations using MEGA v.6.0 (Tamura et al., 2013). For the SNP data, phylogenetic relationships were evaluated using the complete SNP matrix (C. vettonica + C. paludica) through the Maximum Likelihood (ML) method in the RaxML program implemented in CIPRES Science Gateway v3.3. The evolutionary model selected was ASC_GTRGAMMA, as recommended in the program manual for data that contain only variable sites, and the Lewis ascertainment bias correction was used (Stamatakis, 2016). The robustness of the tree was evaluated with 1000 bootstrap (b) replicates.

| Genetic structure
To examine the genetic structure of all studied populations of C.
vettonica and C. paludica, haplotype networks for the two genes were constructed using the median-joining algorithm (Bandelt et al., 1999) implemented in PopART (Leigh & Bryant, 2015). Analysis of molecular variance (AMOVA) was used to determine the source of the genetic variation in MT-CYB for the populations identified as C.
vettonica (excluding V4 subset, see below) using different groupings based on basin, sub-basin, and evolutionary lineage. These analyses were implemented in Arlequin v.3.11 (Excoffier & Lischer, 2010) with 10,000 permutations. Population differentiation in terms of Φst (Hudson et al., 1992) between all population pairs (except those represented by only one individual) for the complete dataset (C. vettonica and C. paludica) was also calculated in Arlequin v.3.11. For the localities in which mitochondrial haplotypes of both C. vettonica and C. paludica were found, the Φ st was calculated by treating the individuals of each species as separate populations.
For the SNPs data, only neutral loci were taken into account to study the population structure in order to avoid the bias that could be caused by candidate loci for selection. For this reason, we evaluated the presence of loci under selection with BayeScan v.2.0 (Foll & Gaggiotti, 2008), using the default parameters except for the prior odds (prior odds = 100; Lotterhos & Whitlock, 2014). We removed the loci under selection and constructed two new matrices for the neutral loci, keeping 3998 loci in the C. vettonica matrix and 4500 loci in the C. vettonica + C. paludica matrix. STRUCTURE (Pritchard et al., 2000) was used to assess the genetic structure based on the two SNP matrices. The most probable number of subpopulations (K) for each analysis was equal to the number of populations studied + 1 (K = 9 and K = 17, respectively). We performed 10 independent simulations for each K with a burn-in length of 50,000 and 50,000 MCMC repetitions after the burn-in. The most probable number of subpopulations for each matrix was estimated by taking into account the results of both the ΔK (Evanno et al., 2005) and the Puechmaille method (Puechmaille, 2016). The web server StructureSelector (Li & Liu, 2018) was used to make these estimations. We selected K based on both methods because a greater probability of K = 2 exists when the structure is analyzed with the ΔK Notes: Also detailed are the number of individuals studied for each of the marker types: mtDNA (MT-CYB), nDNA (RAG1), and SNPs. Under the mtDNA column, for each sympatric area, the two numbers indicate the number of individuals analyzed for C. vettonica and C. paludica, respectively.

TA B L E 1 (Continued)
method (Janes et al., 2017), and similar results for different methods indicate a clear signal (Puechmaille, 2016).

| Genetic diversity and demography
Different genetic diversity parameters were estimated for both MT-CYB and RAG1 in DnaSP for each river, sub-basin, species, evolutionary lineage, and OCUs. The studied genetic diversity parameters were haplotype or nuclear alleles number (h or a for MT-CYB and RAG1, respectively), haplotype or allelic diversity (Hd or a d for MT-CYB and RAG1, respectively), nucleotide diversity ( To evaluate the demography of populations of C. vettonica, deviations from a model of mutation-drift equilibrium for MT-CYB were tested using Fu's Fs (Fu, 1997) and Tajima's D (Tajima, 1989), as implemented in Arlequin v.3.11.

| Divergence times and niche modeling
Divergence times among the populations of C. vettonica were estimated using a relaxed lognormal clock and a coalescent model on the collapsed MT-CYB haplotypes matrix in BEAST v 1.8.4 . The molecular clock was calibrated using, as a normal prior, an evolutionary rate of 0.34% divergence per lineage per million years (Doadrio & Perdices, 2005), and the substitution model used was GTR + I. The quality of the model was evaluated by the Area Under the Curve (AUC) derived from the Receiver Operating Characteristic (ROC).

| Ancestral area reconstructions
The ancestral areas of C. vettonica were reconstructed using both Huebra and Yeltes, K: Tamuja and Almonte, L: Ibor, and M: Tiétar. We used the BEAST-derived trees as the input files for RASP. We eliminated 50% of the total initial trees, and used 100 random trees for the analysis. Ancestral ranges were limited to include no more than three adjacent areas. Only the reconstructed areas for C. vettonica are shown as this lineage is the focus of the present study.

| Phylogeny
The phylogenetic analysis of MT-CYB recovered two main lineages corresponding to the species C. vettonica (S1) and C. paludica (S2) with high support (pp = 1; Figure 2). The species C. vettonica (S1) was not monophyletic since many individuals identified morphologically as C. vettonica from the Alagón sub-basin populations were found in both lineages (i.e., Jerte, Francia, Alagón, Caparro, Cuerpo de Hombre [Cdh], and Gata rivers). The lineage of C. paludica (S2) was also not a monophyletic group since one individual identified morphologically as C. paludica from the Salor sub-basin was included in the C. vettonica lineage (S1). Within C. vettonica (S1), there were two well-differentiated lineages ( Figure 2). The first grouped popu- were not resolved. However, within V2, we detected another highly supported monophyletic grouping (pp = 1) in which only the single individual studied from Ladrillar River was excluded. The second well-differentiated lineage within C. vettonica (V3; pp = 1) included populations from the Ponsul, Aravil, and Erjas sub-basins located along the western limit of the species' distribution range, and one individual from Salor River (Figure 2).
The unrooted phylogenetic tree of C. vettonica and C. paludica

| Genetic structure
The haplotype network for MT-CYB was consistent with the phylogenetic reconstruction for MT-CYB. The network showed two main groups separated by 20 mutational steps corresponding to the species C. vettonica and C. paludica (S1 and S2, respectively; Figure 4). For C. vettonica (S1), the network was more informative F I G U R E 2 Phylogenetic tree based on Bayesian inference for the mitochondrial marker MT-CYB. Posterior probability values are indicated above branches. S1: Cobitis vettonica. The vertical colored bars represent the three main lineages (V1-V3) of the species and their localities (see legend). S2: Cobitis paludica. Also, the map is shown indicating the sampling locations referenced in Table 1. The color of the sampling point indicates the detected lineage in each population based on the results of the phylogenetic tree TA B L E 2 Uncorrected absolute genetic distances between the main lineages of C. vettonica (V1, V2, and V3) and C. paludica (S1  Table 1 F I G U R E 4 Haplotype network for the mitochondrial marker MT-CYB. The localities are indicated by different colors and numbers (see Table 1). Mutational steps are represented as follows: one short line for two steps, two short lines for three steps, or a circle with the number of steps indicated for four or more steps. V4* refers only to those individuals identified as C. vettonica and found in the S2 lineage According to the AMOVAs with the MT-CYB data, the highest percentage of variance among groups was when the groupings were comprised of the lineages found in the previous analysis (V1, V2, V3, and V4; Table 3). The percentage of variance explained between basins (Duero and Tagus) was not significant, with most of the variation explained by differences between and within populations.
Population differentiation between localities within each lineage of C. vettonica for MT-CYB was low (Φ st < 0.34), and no significant differences were found, except for the Acebo River, which showed a high level of differentiation with the Árrago river (Φ st > 0.74; Table   S4). High values of Φ st were detected between populations from different lineages (Φ st > 0.75).
The results of the SNP structure analyses showed that the most probable number of subpopulations (genetic groups) for C. vettonica matrix was K = 4 and K = 2 for Puechmaille and ΔK methods, respectively (Table S5)

| Genetic diversity and demography
Overall genetic diversity parameters for all individuals identified as C. vettonica based on MT-CYB were high (Table 4). Excluding populations of C. vettonica in which only one individual was studied, Hd ranged from 0 to 1, and h ranged from 1 to 7. The populations with the lowest values of genetic diversity were Mayas, Acebo, Arades, and Caparro with only one haplotype each (Table 4). The most variable populations for all of the diversity parameters were the lower and middle Erjas, Alagón, and Gata. The sub-basins with the highest F I G U R E 5 Haplotype network for the nuclear marker RAG1. The colors correspond to the different localities, except pink, which is used for all localities of C. paludica. Mutational steps are represented as follows: no line for one step and one short line for two steps. In the legend, the populations identified as C. vettonica are represented by circles and those identified as C. paludica are represented by squares. Numbers correspond to those used in Table 1 genetic diversity values were Erjas and Alagón, specifically the western Alagón sub-basin. The one with the lowest values was Águeda with only one haplotype.
The highest number of private alleles for C. vettonica, based on the SNP data, was found in the upper Erjas (227 alleles) and Caparro (76 alleles) rivers (Table 5). However, when the complete matrix including the populations of C. paludica was taken into account, the number of private alleles decreased drastically for all rivers, except Mayas (41 vs. 40) and Gata (1 vs. 0), whose numbers remained nearly the same as when only C. vettonica was considered. In the upper Erjas and Caparro rivers, the number of private alleles decreased to eight and two, respectively (Table 5). Notably, when Aurela and Salor TA B L E 3 Results of the AMOVA for MT-CYB for groups established according to basin, sub-basin and lineage (see text for more details)

F I G U R E 6
Results of the STRUCTURE analyses. The first graph shows the results obtained for K = 2 and K = 4 when only the populations of C. vettonica were analyzed (1-8). The third and fourth graphs show the results obtained when K = 2 and K = 6 when populations of both C. vettonica (1-8) and C. paludica (9-16) were analyzed. Abbreviation: R. for river TA B L E 4 Genetic variability parameters for MT-CYB and RAG1 and the results of the neutrality tests of Tajima and Fu for the different groups (see text for details

| Divergence time and niche modeling
The topology of the tree obtained by calibrating the molecular clock was identical to that of the phylogenetic Bayesian tree (Figure 7) (Elith et al., 2006;Swets, 1988 Figure 8A,B). For the LIG period ( Figure 8C), the area with the optimal conditions for the species generally overlapped with that found for the present, although the highest probability values were concentrated in the central part of the species' current distribution (i.e., the western Alagón and Águeda sub-basins in the Tagus and Duero basins, respectively). For the LGM period ( Figure 8D), optimal conditions were located in the southwestern area of the species current distribution (the Erjas and lower Árrago sub-basins and Salor, Aravil, and Ponsul sub-basins). Some rivers outside the current distribution of the species in Portugal also showed optimal conditions.

| Ancestral area reconstructions
In general, the results of the ancestral area reconstructions using either the S-DIVA or S-DEC method were congruent (Figure 9).
Both the vicariant events between lineages and the ancestral area with the highest marginal probability were consistent between the two methods. The most probable ancestral areas were as follows: Alagón-Erjas sub-basins for the junction node of V1-V2-V3, Alagón sub-basin for V1-V2, and Erjas sub-basin for V3. Both analyses supported the diversification of the main evolutionary lineages due to vicariant events. Several dispersal and vicariant events were also detected within V1 and V3 (Figure 9). The population structure of C. vettonica based on the mitochondrial data differed from the main pattern observed in other European freshwater fishes, which is that gene flow within the same basin is greater than among different basins (e.g., Buj et al., 2017;Corral-Lou et al., 2019;Marić et al., 2017Marić et al., , 2019Seifertová et al., 2012;Wetjen et al., 2020). Specifically, for C. vettonica, genetic differentiation between populations from the same basin (Tagus Basin) is greater than between populations from the two main basins (Duero and Tagus basins). Similarly, greater differentiation within the same sub-basins than between different sub-basins has been found for Achondrostoma salmantinum, which has a restricted distribution area that partially overlaps with that of C. vettonica . Iberian freshwater fishes with a wider distribution range that partially or completely overlaps with that of C. vettonica, such as Squalius carolitertii, S. pyrenaicus, S. alburnoides, or Luciobarbus bocagei, do not show a population structure pattern similar to that of C. vettonica (Cunha et al., 2004;Doadrio et al., 2002;Perea et al., 2021;Sousa-Santos et al., 2007). Differences in the genetic pattern of the various species may be due to differences in size and dispersal abilities. Species of Cobitis are smaller benthic organisms with lower dispersal abilities (like A. salmantinum) than species of Luciobarbus and Squalius (Doadrio et al., 2011). This difference in the biology of freshwater fish species, along with different colonization times, could account for the population structure found in C. vettonica and A. salmantinum and the lack of structure in the species of Luciobarbus and Squalius (see Biogeography section).
Consistent with this, we present strong evidence confirming the in-

| (V1) Western Alagón sub-basin and Duero Basin
The Gata population within the western Alagón sub-basin was the only one that presented a mitochondrial haplotype of C. paludica.
However, in the SNP analysis, none of the individuals showed admixture with C. paludica, and they also resolved as a phylogenetically isolated group. Therefore, at present, there is no evidence to support genetic introgression with C. paludica in this area.

| (V2) Eastern Alagón sub-basin
We found strong evidence of genetic introgression between C. paludica and C. vettonica in the eastern Alagón sub-basin, in line with previous studies (Doadrio et al., 2011 The case of the population from Caparro River is particularly interesting. It was the only eastern Alagón sub-basin population that showed a closer phylogenetic relationship with populations of C.
paludica than with those of C. vettonica and a higher degree of genetic introgression. This fact could be explained by two hypotheses.
The first is that the source population was C. paludica and, through a founder effect, a few individuals of C. vettonica established themselves in this locality, giving rise to a population of C. paludica introgressed by C. vettonica. The second hypothesis is that the source population was C. vettonica, but it was displaced by C. paludica after this species' arrival due to its better adapted genome (Abbott et al., 2013).

| (V3) Ponsul, Aravil, and Erjas sub-basins
According to the mitochondrial analyses, the Ponsul, Aravil, and Erjas sub-basin populations of C. vettonica were related to a population of C. paludica from Salor River whose most common haplotype was also shared by all of these populations of C. vettonica. In addition, the However, we postulate genetic introgression in the Ponsul, Aravil, and lower Erjas sub-basin, where both species are present (Perdices & Coelho, 2020). This postulation is based on their location outside the area indicated by the ecological niche models and the fact that introgression by hybridization is a common phenomenon that has been detected for not only C. vettonica and C. paludica but also other species of European Cobitis (Bohlen & Ráb, 2001;Janko et al., 2007;Ráb & Slavík, 1996). Furthermore, this general pattern has often been observed in the Iberian Peninsula for two sympatric species as a consequence of secondary contact after diversification (Corral-Lou et al., 2019;Doadrio et al., 2021;Perea et al., 2016Perea et al., , 2021. Within this group, hybrid localities seem to be restricted to the lower reaches of rivers, while the upper reaches are inhabited by only C. vettonica (Perdices & Coelho, 2020). The habitat preferences of the two species, C. paludica in the middle and lower reaches, and C. vettonica in the upper reaches (Doadrio et al., 2011), likely explains the restricted distribution of the hybrids.

| Biogeography
According to our divergence time estimations, the timing of these divergences agrees with a common period of population differentiation for many Iberian freshwater fish species, which in turn coincides with the culmination of the formation of the Iberian drainage network and the Pleistocene glacial cycles (Casal-López et al., 2017;Corral-Lou et al., 2019Pais et al., 2012;Perea et al., 2016).
Although geological information for the study area is scarce, several tectonic and climatic events are known to have occurred during the Quaternary near the study area, which influenced the hydrogeomorphology of the aquatic network such as the fluvial capture phenomena, the presence of paleoglaciers during the last glacial cycle, and changes in the trajectory of some rivers (Benito et al., 2003;Carrasco et al., 2013Carrasco et al., , 2015Goy et al., 2020). Therefore, we propose several hypotheses based on the biogeography of the species, which may reveal the hydrogeomorphological evolution of the study area within the Iberian Peninsula. The divergence of the two other lineages comprising the western and the eastern Alagón sub-basin populations (V1 and V2) also occurred during the Pleistocene (Chibanian period, ~80,000 Mya). The complex geology of the Alagón sub-basin is evident by the anomalous current trajectory of some of its rivers (Carrasco & Pedraza, 1991;Díez Herrero, 2003;Goy et al., 2020;Jiménez, 1994;Schnabel & Gutiérrez, 2014). For instance, the two banks of the Alagón subbasin, represented by the two mitochondrial lineages, are currently joined by the mouth of the Árrago River in the Alagón River, that is, the main river of the western and the eastern Alagón sub-basin, respectively. The union between the two banks occurred relatively recently as a consequence of the upriver action of the Alagón River, which captured the Caparro River that eventually flowed into the Tagus through the Fresnedosa riverbank (Schnabel & Gutiérrez, 2014). Given this context, we propose two scenarios for the divergence of V1 and V2. In the first, the two lineages diverged prior to the union of the two banks and later genetic flow between populations was not possible due to the poor habitat conditions for C.
vettonica in the lower Alagón sub-basin. In the second, they diverged after the union of the banks due to climatic changes during the LGM.
According to our niche models, the optimal ecological conditions for C. vettonica were located further downstream and, as predicted by phylogeography theory, the latitudinal distribution of organisms retracted southwards (Ehrich et al., 2007;Hewitt, 1996;Rodríguez et al., 2011). Once optimal conditions returned to the river headwaters, the species went back upstream toward both the east and west margins of the Alagón sub-basin, and subsequently diversified.
After the divergence of the two banks of the Alagón, the populations in the Duero and western Alagón sub-basin diverged.
Several studies have reported on the genetic structure of freshwater fauna associated with connections between the Tagus Basin and adjacent basins (e.g., Alagón, Alberche and Lozoya rivers; Carmona et al., 2000;Casas-Sainz & De Vicente, 2009;Doadrio, 1988;Pérez-González, 1980;Sousa-Santos et al., 2007). Although recent connections may be explained as a consequence of the tectonic activity in the region (Goy et al., 2020), there are no geological studies that support a recent connection between the western Alagón sub-basin (Tagus Basin) and the Águeda sub-basin (Duero Basin), despite the close proximity of some of their river headwaters (in some cases, as little as 150 m of linear distance).
A connection, however, has been hypothesized to explain the co-occurrence of individuals with the mitochondrial genome of Squalius carolitertii (distributed in the Duero Basin) and those of S. pyrenaicus (distributed in the Tagus Basin) in the Árrago River . The close proximity of Mayas (Duero Basin) and Árrago (Tagus Basin) rivers, whose courses are only separated by ~150 m, is particularly notable. Piracy events that occurred between these two rivers in the Pleistocene may explain the close mitochondrial relationship of the populations inhabiting them, as suggested by the dispersal event estimated in our ancestral area reconstruction analysis. Later isolation of the Duero and Tagus hydrological basins then led to the divergence of these populations, as corroborated by the SNP analysis.

| Conservation
The habitats of C. vettonica have been and continue to be threatened by the main causes of biodiversity loss (e.g., overexploitation, water pollution, flow modification, habitat destruction and degradation, and the introduction of invasive species) (Doadrio et al., 2011;Dudgeon, 2019;Sousa-Santos et al., 2014). Given these threats, the populations in OCU I (Águeda sub-basin) are particularly vulnerable as they had the lowest genetic diversity values for both MT-CYB and the SNPs, and only one MT-CYB haplotype shared by all the populations. This low level of genetic diversity makes them more sensitive to extrinsic changes and therefore at greater risk of extinction (Frankham et al., 2002). In addition, populations in OCU I are shrinking as a result of the formation of dams along the Águeda sub-basin, and in some localities (i.e., Turones), they have even disappeared.
The species' distribution in this area is now restricted to a few tributaries of Mayas River. Although the genetic diversity values of populations in OCU II were higher than those of OCU I, the possibility of a bottleneck followed by a population expansion cannot be rejected for these populations, despite a decrease in the number of individuals observed in recent years (Doadrio et al., 2011). Exotic species, such as Lepomis gibbosus and Micropterus salmoides, have colonized the upper parts of the rivers covered by OCU II, which may be one of the main causes for the decline in the number of C. vettonica in these rivers. Other potential causes of this decline remain unknown.
For OCU IV, there was no genetic distinction between the non-introgressed (upper Erjas) and putatively introgressed (Ponsul, Aravil, and lower Erjas) populations. Despite this, any management plan must take into consideration this potential introgression as the indiscriminate mixing of individuals from these two groups could lead to genetic introgression with C. paludica throughout the entire area of OCU IV. Cobitis paludica, which is more of a generalist than C. vettonica, is widely distributed throughout the Iberian Peninsula, occupying a great variety of ecological niches (Doadrio et al., 2011).
Thus, genetic introgression by C. paludica in sympatric localities would likely prove disadvantageous for C. vettonica in general, as two of the four OCUs (III and IV) would be affected by this phenomenon. It could lead to the extinction of C. vettonica in these areas or to adaptive variations, resulting in a major loss of the genetic diversity of this species. a member of the Spanish Supercomputing Network, which was used to perform simulations/analyses.

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The new sequences of the mitochondrial (MT-CYTB; OM234794-OM235001) and nuclear (RAG1; OM235002-OM235091) markers obtained from this study are available in GenBank.