Patterns of speciation in a parapatric pair of Saturnia moths as revealed by target capture

The focus of this study has been to understand the evolutionary relationships and taxonomy of a widely distributed parapatric species pair of wild silk moths in Europe: Saturnia pavonia and Saturnia pavoniella (Lepidoptera: Saturniidae). To address species delimitation in these parapatric taxa, target enrichment and mtDNA sequencing was employed alongside phylogenetic, admixture, introgression, and species delimitation analyses. The dataset included individuals from both species close to and farther away from the contact zone as well as two hybrids generated in the lab. Nuclear markers strongly supported both S. pavonia and S. pavoniella as two distinct species, with hybrids forming a sister group to S. pavoniella. However, the Maximum Likelihood (ML) tree generated from mtDNA sequencing data presented a different picture, showing both taxa to be phylogenetically intermixed. This inconsistency is likely attributable to mitonuclear discordance, which can arise from biological factors (e.g., introgressive hybridization and/or incomplete lineage sorting). Our analyses indicate that past introgressions have taken place, but that there is no evidence to suggest an ongoing admixture between the two species, demonstrating that the taxa have reached full postzygotic reproductive isolation and hence represent two distinct biological species. Finally, we discuss our results from an evolutionary point of view taking into consideration the past climatic oscillations that have likely shaped the present dynamics between the two species. Overall, our study demonstrates the effectiveness of the target enrichment approach in resolving shallow phylogenetic relationships under complex evolutionary circumstances and that this approach is useful in establishing robust and well‐informed taxonomic delimitations involving parapatric taxa.


| INTRODUC TI ON
A fundamental unit of biological diversity is the species.Essentially, all organisms are assignable to a species, although the identification of a species is not always straightforward due to morphological or genetic similarities between species.These difficulties are further confounded by the fact that many species remain undescribed and hence unnamed.The semantics over the word 'species' represents another long-standing debate among taxonomists.Consequently, over 20 species concepts have put forward (Hausdorf, 2011;Mallet, 2005;Mayden, 1997).These concepts, despite being mutually incompatible, share one common denominator: species are separately evolving metapopulation lineages (De Queiroz, 1999).Species delimitation, that is, the process of identifying species-level biological diversity (Carstens et al., 2013), is a crucial aspect of all taxonomic, evolutionary, and biodiversity research (Firneno et al., 2021;Hey, 2001).
Despite the importance of the species in characterizing biodiversity, delimiting species remains challenging as boundaries between them cannot always be defined unambiguously based on morphological and/or genetic characteristics (Barley et al., 2013).
For example, many species tend to exhibit considerable phenotypic variability across their range, while others appear morphologically indistinguishable despite reproductive isolation (Knowlton, 1993).
Evolutionary processes such as hybridization, introgression, and incomplete lineage sorting can further complicate the delimitation of species (Harrison & Larson, 2014;Ivanov et al., 2018), especially in cases of limited geographic sampling (Chambers & Hillis, 2020).
Reaching a consensus over the principles on how species should be delimited under various evolutionary circumstances is a pre-requisite for efficient communication of biodiversity and conservation efforts (Gaston & Spicer, 2004;Tobias et al., 2010).
Climatic oscillations during the Pleistocene have shaped the genetic variability and distribution of European Lepidoptera to a considerable extent (Schmitt, 2007).As speciation is often a long and gradual process, the uncertainty over species boundaries is especially pronounced under allopatric and parapatric settings (Joshi et al., 2022;Mutanen et al., 2012;Nosil, 2008).In cases of parapatrically distributed taxa, delimitation of species is further complicated by the possibility of admixture and hybridization between taxa, as the ranges of two taxa may have a zone of overlap with an incomplete reproductive barrier (Bull, 1991).It is important to note that observing parapatry in nature is difficult and can easily but not necessarily be regarded as another mode of speciation since parapatric distribution can be the result of parapatric speciation or secondary contact between populations first diverged in allopatry (Abbott et al., 2013).
Here, we focus on a parapatric pair of small Emperor moths belonging to the family Saturniidae: Saturnia pavonia Linnaeus, 1758 and Saturnia pavoniella Scopoli, 1763.The two species are widely distributed across Europe with S. pavonia found in Northern and Central Europe and S. pavoniella in Southern and Eastern Europe while the situations in Southern France, Iberian, and Balkan Peninsulas are still under debate (Huemer & Nässig, 2003;Mazel, 2007).S. pavoniella was separated from S. pavonia based on the differences in morphological characteristics of the wings and other body parts including the genitalia, as well as the infertility of female F1 hybrids.However, in Southern France, S. pavoniella populations show a mixture of morphological characters from both taxa (Mazel, 2007).Indeed, both species have been observed to overlap locally and create a suture zone in Southern France, Czech Republic, Italy, and Austria.While the distributions of the two taxa have not yet been elucidated in all detail, it seems evident that they form a pair of parapatrically distributed taxa with a long but narrow zone of contact, because both taxa are seldom co-occurring.The two taxa have been suggested to occasionally exchange genetic material as male hybrids can be fertile (Huemer & Nässig, 2003).Whether this capacity to hybridize has resulted in introgression and the disruption of the barrier to gene flow rendering the populations to remain discrete has remained unstudied in natural conditions.
With the increasing availability of genomic-scale data, it is now possible to understand the interplay between different evolutionary processes responsible for formation and distribution of biological diversity (Ferrer Obiol et al., 2023).Although whole genomes are now being sequenced at a relatively fast rate and are potentially an ideal source of data for phylogenomic studies, they remain unavailable for many non-model taxa (Breinholt et al., 2018).Approaches such as target enrichment methods allow for the efficient capture and sequencing of specific regions of the genome, which can be used to infer species boundaries (Mamanova et al., 2010).Further methodological advances have allowed the development of species delimitation programs in statistically rigorous frameworks such as Bayesian statistics and multispecies coalescent or MSC (Leaché et al., 2014;Yang, 2015;Zhang et al., 2018), to delimit species from molecular data.
Our aim was to shed light on the evolutionary relationships and speciation process of the parapatric and near-cryptic species pair of S. pavonia and S. pavoniella and to investigate whether they represent a single variable species or two distinct species in terms of biological and phylogenetic species concepts.To address these questions, we utilized a target enrichment approach that generates large number of fixed loci across the genome (Joshi et al., 2022;Mayer et al., 2021).
In relation to that, we further investigated the levels of admixture and traces of historical or ongoing introgression between the two taxa.Motivation to understand this system stems largely from the | 3 of 18 KHAN et al.
desire to find out better justified and more standardized ways to delimit species under complex evolutionary circumstances such as parapatry as this remains a largely unresolved problem that causes taxonomic instability.The benefits of using a set of standardized genetic markers in establishing a consistent criterion for species delimitation has been discussed in Eberle et al. (2020) and implemented in a set of metazoan taxa in Dietz et al. (2022).

| Taxon sampling and sample preparation
The study dataset involved a total of 27 specimens of the genus Saturnia, including 11 S. pavonia, 13 S. pavoniella, and 1 S. josephinae as an outgroup (Table 1).Additionally, two laboratory hybrid specimens (one male and one female) were also included in the dataset.One of the aims of including lab-reared hybrids was to confirm them as hybrids based on genetic data.The specimens were initially identified and assigned to putative species based on morphological characteristics (Details given in Appendix S1).The samples were collected from nine different countries across Europe (Austria, Croatia, Czech Republic, Finland, France, Italy, Slovenia, Spain) using the sex pheromone (E6,Z11)-C16Ac synthesized for this study by Till Tolasch (Germany).Lures were prepared at INRAE Orléans from 11-mm red rubber septum lures (Wheaton Scientific) loaded with heptane solutions (100 μL, 1 mg/mL) of the synthesized pheromone that had been shipped by courier to INRAE from Tolasch Lab in Germany.Loaded septa were placed in a 20-ml glass vial and kept in a freezer when not in use.Lures were sent by post to several colleagues across Europe to attract and collect males for this study (see Acknowledgements for list of colleagues).
The female used to rear the hybrids was a lab reared S. pavoniella from the Var region in Southern France and the male was a wild S.  States) following the protocols given by the manufacturers.

| Target enrichment library preparation
Following extraction, the DNA concentration of each sample was measured using the Invitrogen Quant-iT PicoGreen dsDNA

| Barcoding
For the mitochondrial DNA (mtDNA) sequencing, the COI gene was amplified using the primers HybLCO and HybHCO.PCR was conducted following the protocols in deWaard et al. ( 2008) (protocol given in Appendix S1).The PCR cleanup was then performed using Sephadex columns (Sigma-Aldrich).The purified product was sent to Macrogen Europe for sequencing.
The COI sequences received from Macrogen Europe were aligned using ClustalW as implemented in MEGA X software v10.0.5 (Kumar et al., 2018) and BioEdit 7.2.5 sequence alignment editor (Hall, 1999).The reference sequence against which the other sequences were aligned was retrieved from BOLD Systems 4.0 database (http:// www.bolds ystems.org).This reference specimen was one of those included in the sample specimens.For the specimens which failed the sequencing, we extracted the COI barcode region from the target enrichment data using exonerate ver.2.4.0 (Slater & Birney, 2005), with the alignment of individuals that were successfully barcoded as a query and our de-novo assembled target enrichment assemblies as a target (details of the command are given in Appendix S1).These extracted barcodes were aligned with the rest of the barcodes using Muscle aligner on EMBL-EBI server (https:// www.ebi.ac.uk/ Tools/ msa/ muscle/ ) with default parameters.The alignment was manually checked in Geneious Prime (https:// www.genei ous.com/ prime/ ) before proceeding for phylogenetic analysis.
To visualize divergence patterns in COI, phylogenetic trees of the aligned COI barcode sequences were generated using IQTREE ver.2.0.3 (Minh et al., 2020), using the best-fit model estimated by the program using ModelFinder (Kalyaanamoorthy et al., 2017) and ultrafast bootstrapping with 1000 replicates.We used the -bnni option to reduce the impact of model violation and overestimated branch support.Twenty likelihood searches were performed and the tree with the highest log-likelihood was visualized in FigTree v1.4.4 (available from tree.bio.ed.ac.uk/ softw are/ figtr ee/ ).

| Target enrichment bioinformatics
The demultiplexed data after being checked for quality were processed using the TEnriAn pipeline (Mayer et al., 2021).During the first step of the TEnriAn workflow, adaptor sequences and lowquality bases were removed from the demultiplexed data using fastp (Chen et al., 2018).The reads were then assembled using Trinity version 2.9.0 (Grabherr et al., 2011).We skipped the Contamcheck step during the assembly, where the data are checked and filtered for potential cross-contaminations, using 'all against all' BLAST searches as it was observed to be removing large amounts of useful data due to our samples being very closely related.Instead, we used the original assembled files before contamination checking as input for Orthograph v. 0.7.1.(Petersen et al., 2017) step.During this step, clusters of orthologous loci are inferred, using a pre-generated reference database.In our case, this reference database was generated using the coding gene alignments of five species -Bombyx mori, Danaus plexippus, Heliconius melpomene, Melitaea cinxia, and Papilio glaucus.Next, the orthologous clusters were aligned, using reference Hidden Markov Models (HMMs) in hmmalign (part of the HMMER package, http:// hmmer.org), and the alignments were filtered using various criteria such as coverage and outliers, following Mayer et al. (2021).We further removed the loci having GC content of higher than 60%, as those can have a negative effect on downstream phylogenetic analysis.

| SNP calling
To perform population genetic analyses, we also did variant calling on the target enrichment dataset.We used the ingroup sample with the highest number of orthologous clusters inferred overall (during orthology assessment step in the TEnriAn workflow) as a reference for SNP calling.The raw cleaned data (i.e., after removing adaptors and low-quality bases) were mapped against this reference using the BWA-MEM algorithm in bwa 0.7.17 (available from bio-bwa.sourc eforge.net) with the minimum seed length set to 30.
SNPs were then extracted from the sorted BAM files generated during mapping by using the samtools v. 1.16.1 (Danecek et al., 2021) mpileup option piped together with the bcftools call option.Filtering was then accomplished using bcftools to remove indels and multiallelic SNPs and keeping only biallelic SNPs.The number of SNPs obtained after this filtering step were 10,265.The SNP coverage distribution is shown in Figure S6.

| Phylogenetic analyses
The filtered alignments from the TEnriAn workflow were concatenated using FASConcat-G available at https:// github.com/ Patri ckKue ck/ FASco nCAT-G.This concatenated dataset was used to infer Maximum Likelihood tree using IQTREE 2.0.3 (Minh et al., 2020).To partition the data according to loci, we set up partitioned analysis using IQTREE with the option -m TESTMERGEONLY to resemble PartitionFinder (Chernomor et al., 2016) and the rcluster algorithm (Lanfear et al., 2014) with the rcluster percentage set to 10, under the AICc criterion.The best partitioning scheme was then used as an input for a partitioned analysis in IQ-TREE.We used the ultrafast bootstrap approximation with 1000 replicates (Hoang et al., 2018).
To further reduce the impact of severe model violations, the -bnni option was used.We also performed an SH-like approximate likelihood ratio test (Guindon et al., 2010) with 1000 replicates using the -alrt option.Twenty tree searches were run and the tree with the highest log-likelihood was visualized in Figtree.
Generally, in phylogenomic datasets with multiple loci, some amount of discordance is expected among the gene trees.To investigate the impacts of incomplete lineage sorting, we inferred a species tree using ASTRAL-III v. 5.7.4 (Zhang et al., 2018), since it is statistically consistent under the multispecies coalescent model.
A set of gene trees was inferred using IQTREE version 2.0.3 (Minh et al., 2020) where IQTREE was prompted to perform model selection (using ModelFinder) and tree inference separately for each locus.To improve the ASTRAL results further, the gene trees were filtered using program TreeShrink (Mai & Mirarab, 2018) before running ASTRAL to detect and prune any abnormally long branches.
The program was run in default 'per-species' mode with a false positive tolerance rate (α) set to 0.05.The resulting shrunk output gene trees were used as an input for ASTRAL, which generated a species tree along with the quartet score.The tree was visualized in FigTree v1.4.4.

| Population genetics
To get a notion of the possible number of genetic clusters and related species in the dataset, we performed Principal Component Analysis (PCA) on the target enrichment SNP dataset.The PCA was ran using dudi.pcafunction in the 'adegenet package' (Jombart & Ahmed, 2011) in R studio v1.4.1106.We also calculated the pairwise F ST from the same SNP dataset using R package hierfstat (Goudet, 2005).
STRUCTURE analysis (Pritchard et al., 2000) was also conducted on the target enrichment SNP dataset to infer population structure as well as admixture patterns.We used unlinked SNPs (as one of the assumptions of STRUCTURE includes that the loci are at HW equilibrium), and for this SNP data was filtered using a custom Perl script (https:// www.biost ars.org/p/ 313701/ , accessed on 23/08/2023) to keep only one SNP per scaffold.The final number of SNPs after this filtering was 1831.The analysis was done assuming five clusters, that is for K = 1 to K = 5, each with 10 replicate runs.
We used Clustering Markov Packager Across K or CLUMPAK ver.1.1 (Kopelman et al., 2015) to align the cluster assignments across all 10 replicates for K = 2, K = 3, and K = 4 and Distruct (Rosenberg, 2004) to show barplots for the K populations.Structure Harvester (Earl & vonHoldt, 2012) was used to obtain optimal number of genetic clusters (K) using ∆K method with 500,000 generations for the Markov chain and a value of 100,000 as burn-in.The program produces a graph based on the Evanno method (Evanno et al., 2005), between K cluster and DeltaK along the X and Y axis respectively, which shows the K with the highest DeltaK along the axes.

| Species delimitation analyses
To perform species delimitation analyses, we conducted Trinomial distribution analysis tr2 (Fujisawa et al., 2016), in which the distribution of rooted triplets is modelled to find the congruence between gene trees under multispecies coalescent framework.In short, it measures the incongruence of tree topologies within species and congruence between species.We tested a two-species hypothesis compared to the null model (assuming a single species) and obtained the results based on calculation of -log (posterior probability).
Additionally, we calculated the Genealogical Divergence Index (gdi) (Jackson et al., 2017) for the genetically distinct clades identified from phylogenetic analyses with BPP using the approach described in Leaché et al. (2019).For this, we used our complete target enrichment dataset including all the loci.We first conducted a full BPP analysis (A11), where species tree and species delimitation are jointly estimated.We observed that BPP consistently inferred delimitation when all the populations were assigned to distinct species with posterior probability of 1, irrespective of prior combinations used.This asymptotic behaviour of Bayesian species delimitation models, that is, increase in the posterior probability of two-species model or species-splitting as the number of loci increases, has been discussed in Leaché et al. (2019).Next, we conducted a BPP analysis where species delimitation and species tree were fixed, to infer theta and tau values for each node corresponding to the distinct clades in the tree (BPP A00 analysis).
We used beta prior values 0.004 for theta (population size) and 0.002 for tau (divergence times) respectively and the tree inferred by ASTRAL as a guide tree.Both values were estimated using Bayesian analysis, by specifying the 'e' option in the A00 analysis, which was conducted two independent times using a MCMC chain length of 1,000,000 and burn-in of 100,000.The convergence of the two runs was assessed in Tracer v1.7.2 (Rambaut et al., 2018) and the converged runs were concatenated to generate posterior distributions for the multispecies coalescent parameters.The gdi was calculated as where tau_AB is the inferred divergence time of the clade from its sister group, and theta_A is the inferred population size for the clade.We determined the final theta and tau values as the median theta and tau values from the converged run.A gdi value of less than 0.2 suggests the conspecificity of sister clades, but values between 0.2 and 0.7 are indicative of the so-called 'grey zone', and a gdi value of >0.7 supports them to being a distinct species (Jackson et al., 2017).

| Isolation by distance
We tested for isolation by distance (IBD) using a Mantel test between a matrix of genetic distances and matrix of geographic distances using the mantel.randtestfunction in R package adegenet.This test finds the correlation between individual Edwards' genetic distances and Euclidean geographic distances (Mantel, 1967).Based on the simulated p-value, we determined whether isolation by distance was significant (p < .05).

| D-statistics
We used the program Dsuite (Malinsky et al., 2021)  assuming every specimen to be a distinct species/population, excluding hybrids from the analysis.D-statistics was calculated using Dtrios command, which calculates ABBA-BABA proportions and f4-ratio statistics for all possible trios of populations/species.We also performed the f-branch (fb) test which calculates f-branch statistic values for each branch on the input tree, including internal branches (Malinsky et al., 2021); it is useful in guiding the interpretation of correlated f4-ratio results.We used the tree inferred from ASTRAL as an input hypothesis tree, to calculate the f-branch statistic.
Additionally, we also tested for gene flow among the populations of two species using HyDe program (Blischak et al., 2018), which

| Overview of the dataset
With regard to our genomic dataset, the average number of informative loci retained was 439 (SD = 29.7,Table 1).The average amount of missing data was 26.42% (SD = 0.043, Table 1).For our Mitochondrial DNA (mtDNA) dataset, 23 specimens were successfully sequenced, and for the remainder we used the barcodes extracted from target enrichment dataset for further analysis.

| Phylogenetic analyses
The preliminary specimen identification was done based on the morphological characteristics.The ML tree obtained from the mitochondrial COI dataset (Figure 2) did not show reciprocal monophyly, and the taxa seemed to be phylogenetically intermixed.However, five specimens of S. pavoniella formed a sister group to the remaining specimens, suggesting that at least this species has species-specific haplotypes.
The ML tree from the target enrichment dataset showed two separate and strongly supported clusters corresponding to S. pavonia (indicated by orange, Figure 2) and S. pavoniella (indicated by blue, Figure 2), with SH-aLRT/UFBoot support values 100/100 and 100/99 respectively.The hybrids are highly supported as the sister group to S. pavoniella.One individual from S. pavonia (ID SAT024) has an unstable longer branch length which could be potentially caused by errors during the alignment step, or alternatively due to sequencing errors that were not filtered out at the bioinformatics step.
We observed that the species tree obtained from the dataset filtered using TreeShrink did not differ from the one obtained using data without filtering.Both species trees (generated from filtered and unfiltered gene trees) correspond to the ML tree generated from our genomic dataset (Figure 3 and Figure S5).Similar to the ML tree, S. pavonia and S. pavoniella were separated into two clusters with quartet support values of 0.99 and 1, respectively, with the hybrids highly supported as the sister group of S. pavoniella.As in the ML tree, this relationship is highly supported, but the clade including only the two hybrids has low support.We further observed that in both the COI and ASTRAL trees, two individuals from Italy, TLMF Lep 30361 and TLMF Lep 30362, were placed on an internal branch longer than for the rest of the individuals.To dissect the reasons behind this pattern, we further built a neighbour-joining (NJ) tree for the COI data, where it was observed that S. josephinae was nested within the variability of S. pavonia and S. pavoniella.Thus, we selected a more distant outgroup, Saturnia pyri Denis & Schiffermüller, 1775, to root the COI ML tree.However, we continued to observe the same pattern where the two specimens are placed on a longer internal branch, in ML tree.The neighbour-joining tree, on the other hand, once again displayed the pattern of S. josephinae being nested inside S. pavonia-S.pavoniella clade (Figure S7).

| PCA and F ST
PCA based on the SNP dataset resulted in two genetically distinct clusters for S. pavonia and S. pavoniella (Figure 4).The two individuals that are situated between these two clusters in the figure are hybrids that appear intermediate and stay separated from both clusters (Figure 4).Three S. pavonia specimens with sample IDs.SAT026, MM27425, and MM27428 are separated from the rest of the S. pavonia individuals mainly along PC2, indicating to some extent the presence of genetic differentiation in these three individuals.The pairwise F ST between the two species was calculated as 0.345.

| Genomic admixture and isolation by distance
The analysis of genomic admixture using STRUCTURE produced the highest likelihood estimate for K = 2 clusters for the target enrichment SNP dataset as can be seen in Table S1.The CLUMPAK analysis of STRUCTURE results gave bar-plots that show cluster assignment from K = 2 to K = 4 (Figure S1).The results do not suggest admixture between the two parapatric taxa.The only admixture that was observed was in the hybrids that were included in the study.
From the geographic distribution of genomic admixture, it can be observed that both species have maintained genetic separation, despite range overlap in the hybrid zone (Figure 4).Only very slight admixture was observed (from K = 3) in a few S. pavonia and S. pavoniella individuals (Figure S4).
The isolation by distance was observed to be non-significant (pvalue > .05, Figure S3).

| Species delimitation
The tr2 species delimitation analysis gave log-likelihood scores based on posterior probability for the null model (assuming S. pavonia and S. pavoniella as a single species) and model1 (assuming S.

| Introgression
Out of 2024 trios analysed, high D-statistics values with significant p-values were observed for about 515 trios (D-statistics > 0.25, Table S2).From these, we further separated the trios where both P1  2) and additionally an elevated f4 ratio (Figure S2).However, Malinsky et al. (2021) showed that the D and f 4 -ratio statistics are correlated and that a significantly elevated result for a trio does not necessarily indicate that the populations are involved in gene flow event.However, we observed stronger f-branch signals, that is, between 5 and 10% for several branches including pavonia7 (SAT026) and the entire pavoniella clade as well as pavoniella14 (SAT013) and several pavonia branches (Figure S2c), indicating past introgression incidents between the two species.

| DISCUSS ION
S. pavonia and S. pavoniella represent a well-known system of recently diverged Lepidoptera with parapatric distributions and a narrow contact zone in Europe, similar to M. athalia and M. celadussa (Joshi et al., 2022;Tahami et al., 2021).We used this species pair as a model to elucidate the evolutionary genetic patterns and admixtures as well as species delimitation under parapatric mode of distribution.
Based on hundreds of loci, we attempt to interpret the observations from evolutionary and taxonomic points of view, also taking into consideration the past climate oscillations that have shaped the genetic diversity of many European Lepidoptera (Hewitt, 1996(Hewitt, , 2000(Hewitt, , 2004)).Systems like the one studied by us are likely to be common in nature, and while some cases are well documented, many more remain poorly understood and even undetected, because parapatric taxa usually represent recent divergence of lineages at the interface of species and population levels, rendering observations on two taxonomically distinct but possibly cryptic entities challenging.
Possibly numerous closely related species have undergone a similar phase of a secondary contact following partial or full reproductive incompatibility that evolved during prolonged isolation, especially during glaciation maxima when ranges were split into several isolated refugia (Schmitt, 2007).Periodic expansions of the species' ranges due to climate oscillations can be expected to have led to many secondary contacts between diverged populations, which in turn would have provided grounds for ecological differentiation of reproductively diverged populations through reinforcement and character displacement.

| Mitonuclear discordance
The Maximum Likelihood tree for the mtDNA dataset did not fully separate the two taxa, suggesting that S. pavonia and S. pavoniella may not be distinct at the species level (Figure 1).In contrast, the Maximum Likelihood tree and the ASTRAL species tree from the genomic dataset recovered the two taxa as separate entities, with hybrids being grouped together and appearing intermediate with regards to the wild collected specimens (Figures 1 and 2).The causes for the (seemingly) mitonuclear discordance can be nonbiological (operational) or biological (Ivanov et al., 2018).Possible biological causes include incomplete lineage sorting (ILS), introgression, horizontal gene transfer, and Wolbachia-mediated selective sweeps following hybridization (Bergthorsson et al., 2003;Mutanen et al., 2016;Soucy et al., 2015;Toews & Brelsford, 2012).The operational causes include misidentifications, contaminations, paralogues, NUMTs, and chimeric sequences, but also inconsistencies due to inaccurate taxonomy (Funk & Omland, 2003;Mutanen et al., 2016).In this study, operational causes were ruled out via a thorough analysis and investigation; furthermore, the results were re-analysed to systematically eliminate operational causes as contributing factors.
Therefore, it seems evident that the two species display genuine mitonuclear discordance.
While it is difficult to distinguish between incomplete lineage sorting and introgression based on phylogenomic analyses (Funk & Omland, 2003), insights of whether ILS is present in the dataset or not can be obtained.Generally, when ILS is present, individual genealogies tend to be discordant with each other and the species tree, because under this scenario gene genealogies are supposed to vary in a stochastic manner (Degnan & Rosenberg, 2009;Maddison, 1997).We observed that both concatenation and coalescent approaches displayed similar relationships between the two taxa and the normalized quartet score from ASTRAL was calculated as 0.53.This score is an indicator of gene tree discordance due to ILS, although in some cases it also could be an indicative of gene tree error (Mirarab, 2019).We conclude that moderate ILS likely is present in our study species.
There are several well-documented cases of mitonuclear discordance resulting from historical introgression (Linnen & Farrell, 2007;Shaw, 2002;Sota & Vogler, 2001).pavonia and S. pavoniella do not display significant levels of admixture.It has been proposed that the two species could show sporadic introgression through fertile hybrid males (Huemer & Nässig, 2003).
Analysis of ancient introgression using Dsuite supports this idea of episodes of introgression in the past.However, presently the two taxa show no or only little gene flow between them.
We show that the HyDe program can be reliably used to detect hybridization and the presence of hybrids in phylogenomic datasets.
As in our case, the two hybrid individuals were lab-reared and their status as hybrids was therefore known.With wild collected specimens, hybrid status may be difficult to document, although its detection could sometimes be important for example in conservation efforts (Godinho et al., 2011;Harmoinen et al., 2021).As we demonstrated here, the gamma value can give an idea of F1 hybrids, as it represents the probability that the hybrid population is sister to one of the two parental populations.The value close to 0.5 suggests that they are about 50:50 (F1) hybrids (Blischak et al., 2018).At the same time, it should be emphasized that the process of hybridization is not always uniform, and that hence the gamma value can vary widely between individuals within hybridizing population.This was the case with our data.
Although the topologies of the barcode and ASTRAL trees differed from each other, in both trees we observed two specimens from Italy placed on long internal branches.In ASTRAL, such long branches are generally indicative of gene tree discordance, as the branch lengths are in coalescent units (Sayyari & Mirarab, 2016).
In some cases, they could also be indicative of the presence of pseudogenes or bad alignments due to missing data or shorter sequences, etc.As we had carefully checked all our alignments (barcode and nuclear genes) before proceeding for further analysis, the possibility of misalignments can be ruled out in both cases; the biological causes, if any, remain to be discovered.We further observed the somewhat surprising pattern of COI ML and NJ trees displaying different phylogenetic relationships when rooted on a more distant outgroup.
Results based on species delimitation analyses using BPP were found to contradict those of other analyses, in which S. pavonia and S. pavoniella were clearly identified as separate species.Gdi values have been shown to be unreliable predictors of the species status (Dietz et al., 2022), and gdi values estimated for taxonomically distinct species may sometimes be lower than those estimated for intraspecific entities (Jackson et al., 2017), which was also observed to be the case in the present study.

| Character displacement and parapatry
Results from our analysis of nuclear markers strongly suggest that the two taxa are separate species that fulfil the criteria of both the to gene flow, despite having a narrow geographical overlap.This is also evident from isolation by distance (IBD) analysis, where geographic distance was not found to be significant in explaining the observed genetic separation of the two taxa.Often, species that are a part of a parapatric distribution system show striking morphological and ecological similarities, but morphologically intermediate individuals may occur (Guiller et al., 2017;Slender et al., 2017;Tahami et al., 2021).In the present study system, S. pavonia and S. pavoniella are morphologically similar, although S. pavoniella is slightly larger.
The differences between the wing patterns and the genitalia were noticed 20 years ago only, and despite being separated by Huemer and Nässig (2003), members of the two populations were commonly identified as varieties of a single species.A possibility for hybridization exists in and around the contact zone as pheromonal differentiation seems to be incomplete and females of both species attract males of the other species.Natural hybridization has been reported by Huemer and Nässig (2003).Although hybridization occurs, it seems that the hybrids are sterile in natural populations as we could not see any signs of present introgression.Thus the postzygotic reproductive isolation seems to be complete.
Two evolutionary scenarios can explain the origin of parapatric mode of distribution: (1) parapatric speciation and (2) secondary contact of the populations that diverged in allopatry.While parapatric speciation is disputable and documented cases are few, there is evidence that under certain circumstances strong disruptive selection can promote speciation by driving adjacent populations to adapt to different ecological conditions and to differ through character displacement and reinforcement, eventually leading to reproductive isolation (Reifová et al., 2011).However, S. pavonia and S. pavoniella seem to share very much the same ecological niche and are attracted to the pheromones of "wrong" species, suggesting that they have not been in contact for a very long time as otherwise pheromonal differentiation would be expected.Thus, our observations are not fully consistent with the scenario of speciation through ecological separation (due to adaptations to local habitats) under parapatry.
The two Saturnia populations presumably diverged during the last interglacial period in distinct refugia in allopatry.This isolation led to the appearance of postzygotic isolation by genetic drift (rather than selection), followed by the post-glacial expansion of ranges resulting in a secondary contact, as has been assumed to have taken place in other similar cases (Tahami et al., 2021).As body size is closely associated with the mating behaviour, the size difference between parapatric taxa could potentially result from character displacements in secondary contact periods (Zhang & Kubota, 2023).
The possibility that the two Saturnia species evolved body size difference through this process can therefore not be ruled out.
In theory, the secondary contact of diverged populations might not only lead to a continuing speciation process through reinforcement (Pfennig, 2016), but also to the merging of the populations.As there is a mechanism at play that maintains a reproductive barrier between S. pavonia and S. pavoniella, as indicated by most of the analyses, the second scenario appears unlikely.Some evidence of past introgression was observed in our data, but it seems to have ceased and has not resulted in disruption of this barrier.In some cases, hybrid zones can also act as a 'genetic sink' that strongly restricts gene flow between the species (Hafner et al., 1983;Yanchukov et al., 2006).Indeed, it has been suggested that the divergence in local allopatry is protected by hybrid zones as the taxa later become parapatric (Hewitt, 1988).To test this idea in the present study system, a more detailed study of samples from the contact zone would be required.
Parapatric species systems provide interesting models to study speciation in action.Events such as these take place at the interface of population genetics and phylogenetics levels for which reason analyses designed to address the question at both levels are potentially useful.Previously studied cases of parapatric species pairs or groups have revealed common patterns that suggested speciation not having reached completion makes taxonomic delimitation of parapatric taxa a very challenging and often largely arbitrary exercise, thus highlighting that speciation as a slow gradual process rather than a sudden event.From an evolutionary point of view, one might expect to observe systems at different stages of the speciation-population continuum, particularly when parapatry has resulted from a secondary contact of populations differentiated in allopatry, for example during glacial refugial periods.

| Potential of target enrichment in species delimitation
A number of recent studies have promoted the use of multi-locus genomic approaches such as ultraconserved elements (UCEs), ddRAD-sequencing, and target capture to aid species delimitation (Natusch et al., 2020;Rancilhac et al., 2019;Smith et al., 2014).
Target capture is a cost-effective approach with greater sequencing depth and less data burden compared to whole genome sequencing (WGS) and whole exome sequencing (WES) (Bewicke-Copley et al., 2019).Evidently, the number of loci recovered in the present study was sufficient to clarify the taxonomic relationships between the two very closely related species.The percentage of missing data ranged from 18.56% to 35.08%, which is modest compared to those typically seen in RAD data sets (Lee et al., 2018).The amount of recovered data provided insights into evolutionary processes that are of high relevance in species delimitation, including admixture, presence of gene flow, and reciprocal monophyly.There is therefore little doubt that the used approach is powerful enough to address all the relevant evolutionary and other epistemological aspects needed for robust species delimitation.
This is a starting point to address perhaps even more difficult ontological and philosophical aspects of species delimitation.
Because parapatric species systems may show a variety of patterns for example with regard to the levels of admixture and mode of divergence (or merging), it might not be possible to establish objective criteria for taxonomic delimitation of such taxa.Due to this inherent arbitrariness, we have before been calling for more consistency for delimitation of species under such complex evolutionary circumstances.This would serve taxonomic stability the best, not jeopardizing scientific validity.Needless to say, the two Saturnia species this study has focused on do not represent the most complex case in the speciation continuity as their integrity as biological or phylogenetic species did not remain unclear.In case of ongoing but restricted gene flow, the cohesion (or lack of it) of the two lineages might provide a useful solution to species delimitation problem of admixing parapatric taxa (Templeton, 1989), although other feasible solutions may remain.

| CON CLUS IONS
We investigated the parapatric relationship of S. pavonia and S. pavoniella to elucidate genetic patterns and admixture under a complex evolutionary setting in which speciation of recently diverged populations seems not to have reached full completion.We further addressed the question if genomics tools such as target capture, which enables the accumulation of vast amounts of genetic data, could possibly provide better means to delimit species under complex circumstances, which, for instance, parapatric species pairs commonly represent.Our findings support the hypothesis that the two taxa are distinct species based on both biological and phylogenetic species concepts.We found evidence for mitonuclear discordance due to past introgression and incomplete lineage sorting.
Our observations suggest that S. pavonia and S. pavoniella have experienced a secondary contact after a period of isolation.Periodic range expansions due to climate oscillations may have resulted in many secondary contacts of such diverged populations, which ultimately could lead to ecological differentiation through character displacement.Our study highlights the importance of understanding the evolutionary processes that shape the genetic diversity of organisms in parapatric relationships, which are likely to be common in nature.We found that species delimitation of parapatric species can at times be even relatively straightforward as we could not detect any ongoing introgression in Saturnia.However, the two reproductively isolated taxa have not yet completed speciation from an ecological standpoint as their sexual pheromones continue to attract heterospecific individuals and hybridizations occur.We assume that the lack of pheromonal differentiation plays a significant evolutionary role in maintaining parapatry in moths, as individuals dispersing into the area of the other species are likely to have low fitness.This situation creates evolutionary "pressure" for the two species to develop more reliable prezygotic isolation mechanisms, for example, through pheromonal character displacement.Although we lack direct evidence, we hypothesize that this process is underway.Conducting detailed studies focusing on the natural pheromone chemistry of the two species in and around the contact zone would provide valuable insights into this process.
female was tied around the base of its two forewings with a piece of 50-70 cm of cotton string.The string was itself attached to a shrub to prevent the female from escaping.The female called and mated with one of the wild S. pavonia males she had attracted.The mated S. pavoniella female laid eggs and the hybrid caterpillars were reared to obtain F1 hybrid adults.DNA was extracted from the antenna or thorax of either ethanol preserved or dry specimens.DNA extraction was performed using the QIAGEN DNeasy Blood and Tissue Kit (Hildesheim, Germany) or the Omega Bio-tek E.Z.N.A. Insect DNA Kit (Norcross, United to calculate the Patterson's D-statistics to test for the presence of historical introgression in our dataset.This test was first introduced in Green et al. (2010) to detect the presence of Neandertal ancestry in modern humans, with further theoretical improvements and applications in Reich et al. (2010) and Durand et al. (2011).This test computes the excess of alleles in one species (usually denoted as P3) in relation to two sister species (P1 and P2), and a fourth outgroup species as a reference.For this, we used a vcf file containing multiallelic SNPs as an input.The Dsuite program uses only biallelic SNPs by default, thereby eliminating the need to filter multiallelic SNPs file beforehand.Since we did not have three populations (excluding the outgroup), we ran this program hybridization in phylogenomic datasets using phylogenetic invariants.The program uses D-statistics and site-pattern probabilities to test for hybridization.The analysis was run on concatenated sequence data.Since the main purpose behind using this program was to confirm the hybrid status of the two lab-reared hybrids based on genetic data, we defined the two hybrids as a separate population, along with S. pavonia and S. pavoniella as another two populations and S. josephinae as an outgroup.First, we performed a 'full analysis' using run_hyde.pyscript, which tests all possible triples of populations.Further, we tested for hybridization at individual level within populations detected as hybrids using individual_hyde.pyscript.The results were interpreted based on gamma (γ) value and a significance score (p-value).
pavonia and S. pavoniella as two different species) of -(231271.46)and -(7339.42)respectively, clearly preferring the two-species model (model1).The genealogical divergence index (gdi) value for divergence of S. pavonia clade from S. pavoniella was calculated as 0.277, suggesting the distinct species status of the two clades being ambiguous.
and P2 are represented by same species, after which we obtained about 182 trios where both P1 and P2 are represented by S. pavonia and P3 by S. pavoniella and about 223 trios where P1 and P2 are represented by S. pavoniella and P3 by S. pavonia.Out of these, a significantly high D-statistic value (>=0.4) was observed for about 53 trios (Table ABBA STRUCTURE analysis and F I G U R E 1 . 1 Representative images of live wild female and male (a) S. pavoniella and (b) S. pavonia specimens.| 11 of 18 KHAN et al. geographic distribution of genomic admixture demonstrate that S.

F
Maximum likelihood tree inferred using barcode data (on left) and target enrichment data (on right).S. pavonia individuals are indicated by orange colour and S. pavoniella by blue colour.For barcode ML tree, numbers on the branches indicate ultrafast bootstrap support values (UFBoot), and for target enrichment tree, they indicate sh-aLRT/UFboot values.Both specimens SAT022 (male) and SAT023 (female) indicated by the box are lab-reared F1 hybrids obtained from the following crossing: pavoniella female * pavonia male.biological species concept (reproductive barrier present) and the phylogenetic species concept (reciprocal monophyly).The apparent lack of gene flow suggests that the taxa have developed barriers

F
I G U R E 3 ASTRAL tree inferred for Saturnia.Numbers on the branches indicate quartet support values.Both specimens SAT022 (male) and SAT023 (female) indicated by the box are lab-reared F1 hybrids obtained from the following crossing: pavoniella female * pavonia male.

F
I G U R E 4 (a) PCA for Saturnia SNP dataset.(b) Pie charts based on membership coefficient matrix at K = 2 mapped on geographic coordinates.Hybrid individuals are indicated by 'H'.
pavonia from the Loire valley.The crossing was made in a garden in Indre & Loire in central France where the reared S. pavoniella