Colonization of Northern Europe by Zygaena filipendulae (Lepidoptera)

Abstract Northern and mountainous ice sheets have expanded and contracted many times due to ice ages. Consequently, temperate species have been confined to refugia during the glacial periods wherefrom they have recolonized warming northern habitats between ice ages. In this study, we compare the gene CYP405A2 between different populations of the common burnet moth Zygaena filipendulae from across the Western Palearctic region to illuminate the colonization history of this species. These data show two major clusters of Z. filipendulae populations possibly reflecting two different refugial populations during the last ice age. The two types of Z. filipendulae only co‐occur in Denmark, Sweden, and Scotland indicating that Northern Europe comprise the hybridization zone where individuals from two different refugia met after the last ice age. Bayesian phylogeographic and ecological clustering analyses show that one cluster probably derives from an Alpe Maritime refugium in Southern France with ancestral expansive tendencies to the British Isles in the west, touching Northern Europe up to Denmark and Sweden, and extending throughout Central Europe into the Balkans, the Peleponnes, and South East Europe. The second cluster encompasses East Anatolia as the source area, from where multiple independent dispersal events to Armenia, to the Alborz mountains in north‐western Iran, and to the Zagros mountains in western Iran are suggested. Consequently, the classical theory of refugia for European temperate species in the Iberian, Italian, and Balkan peninsulas does not fit with the data from Z. filipendulae populations, which instead support more Northerly, mountainous refugia.

sheltered mountain valleys, could also have played a role for the survival and recolonization of some temperate species (Schmitt & Varga, 2012). Since glacials are approximately 10 times longer than interglacials, they are thought to have played a major role in population divergence for temperate species (Hewitt, 2011). Studies have shown that populations become extinct as their habitat disappears, rather than physically moving into refugia at the beginning of a glacial period (Hampe & Petit, 2005). Accordingly, populations in longterm refugia (inhabited for at least one full glacial/interglacial cycle) are descended from populations that were already in place before the onset of glaciation (Hewitt, 2011). Long-term refugia are therefore expected to harbor the greatest level of genetic diversity within a species range, whereas populations from recent colonizations have the least diversity (Hewitt, 2011). The last glacial period occurred from approximately 110,000 to 12,000 years ago, and Denmark, the rest of Scandinavia and Scotland, was covered in ice until at least 15,000 years ago, resulting in most present day species having arrived in these areas after this time.
Phylogeographic analyses can be used to unravel historical processes resulting in contemporary population distributions based on gene sequences (Hickerson et al., 2010). Bayesian phylogeographic and ecological clustering (BPEC) (Manolopoulou & Hille, 2018;Manolopoulou, Hille, & Emerson, 2019) allows estimation of the posterior probabilities for haplotype tree networks under a coalescent-based migration-mutation model (Manolopoulou & Emerson, 2012). BPEC combines an evolutionary model for the genealogical relationships among sampled DNA sequences, specified as latitudes and longitudes, together with a geographical model representing splitting and dispersal events forming spatial clusters, assuming that population substructure is the result of individuals migrating to a new geographical area.
Burnet moths (Zygaena, Lepidoptera) are brightly colored day flying moths comprising 98 species distributed across most of Europe.
Zygaena filipendulae (Figure 1) are very common and especially cold-hardy burnet moths occurring as far North as Northern Norway and Scotland as well as across most of Europe. Hofmann and Tremewan (2017, p. 49) inferred hypothetical areas of origin of the genus Zygaena and found clear indications for a primary differentiation of the species groups belonging to the "filipendulae"-stem group from the Balkans, around the Black Sea, and Caucasus region, from where several species expanded during different epochs.
Zygaena filipendulae was stated to have a Euro-Siberian distribution belonging to those species groups derived from the Middle East.
When the preferred larval food plant Lotus corniculatus (Zagrobelny, Bak, & Møller, 2008) is sown, for example along the bank of a newly built or renovated road, Z. filipendulae can appear within a few years.
Zygaena filipendulae has been established as a model system to study the evolution and roles of its chemical defense compounds, the cyanogenic glucosides linamarin and lotaustralin (Zagrobelny & Møller, 2011;Zagrobelny et al., 2014). The biosynthetic and degradative pathways of these compounds have been solved in Z. filipendulae (Jensen et al., 2011;Pentzold et al., 2017), and the first enzyme in the biosynthetic pathway, CYP405A2, was used for the present study ( Figure 2). CYP405A2 is ~6,000 bp long with nine introns ranging from 79 to 1,278 bp in length. Accordingly, ~4,500 bp of the gene is noncoding introns. Such noncoding nuclear regions of a genome have previously been shown to be optimal for studies of populations from a species distribution range, due to the rapid accumulation of variation within them, in contrast to the low level of variation accumulating in coding regions (Hewitt, 2011).
In this study, we compare the coding and noncoding parts of

| Biological specimens and sequencing
CYP405A2 was sequenced and compared for 1-2 specimens from 30 different populations of Z. filipendulae. Larvae or moths were collected from seven populations around Denmark, and Z. filipendulae moths from a further 23 populations were obtained from collaborators around Europe (Table 1) (see Acknowledgments) either preserved in ethanol or as dried specimens. Genomic DNA was isolated from moths or larvae using DNeasy Blood and Tissue kit (Qiagen 69504). CYP405A2 genes from the specimens were PCR amplified from genomic DNA diluted to a concentration of 10 ng/ µl, using the following program: denaturation 94°C for 2 min followed by 30 rounds of denaturation at 94°C for 30 s, annealing at 60°C for 30 s, and extension at 72°C for 1 min. Final extension was at 72°C for 7 min. Primers are shown in Supporting Information Table S1. Three different polymerases were used in the PCR reactions: proofreading X7 (made in our laboratory) or nonproofreading TaKaRa (TaKaRa Bio Company) or KapaTaq (KapaBioSystem KR0352). The PCR products were purified on 1% agarose gels and extracted with a gel extraction kit (Omega E.Z.N.A Gel Extraction Kit D2500-02). The purified fragments were ligated into commercially available vectors: X7 products into pJET1.2 (CloneJet K1232), the other two into pCR2.1TOPO TA (Invitrogen 450641).

| Assembly and phylogenetic analysis
The fragments of CYP405A2 obtained from all the specimens were manually assembled into full-length genomic sequences in MEGA7 (Kumar, Stecher, & Tamura, 2016 partly by MUSCLE (Edgar, 2004a,2004b and partly manually (due to the difficulty of aligning intron sequence in any computer program). Genes were submitted to Genbank (Accession numbers shown on Figure 3). The evolutionary history was inferred with the maximum-likelihood method based on the Tamura 3-parameter model and a discrete Gamma distribution. This model was chosen based on the model test from the MEGA7 program, where the model with the lowest BIC score (Bayesian Information Criterion) is considered to describe the observed substitution pattern best.
Phylogenetic trees were also constructed using BEAST (Suchard et al., 2018) with essentially the same results as MEGA7 (results not shown). Evolutionary distances, ancestral sequences, and molecular clock analyses were all carried out in MEGA7 with default parameter settings.

| RE SULTS
To unravel the relationships between different populations of Z. filipendulae, CYP405A2 sequences from 43 specimens were analyzed. Z. filipendulae CYP405A2 sequences are between 92% and 99% identical to each other (Table 2) and the outgroups Zygaena lonicera and Zygaena transalpina are 89%-93% and 82%-85% identical to them, respectively. From the phylogenetic tree (Figure 3), it is evident that the specimens largely fall into two clusters (the European cluster and the Caucasian cluster). The specimen from Iran Northwest seems ancestral to both clusters. Remarkably, some populations (from Denmark, Scotland East, and France Southeast) have specimens from both clusters and the single specimens analyzed from Sweden and Armenia are both heterozygous with one allele belonging to each cluster. The longest evolutionary distances between any of the specimens analyzed (Table 2) can be found between members of the European cluster and the Caucasian cluster.
More variation is evident within the Caucasian cluster than within the European cluster (Table 2), and in some European populations, there is almost no variation between specimens analyzed.
When looking more closely at the introns, it is clear that most of the variation can be found in intron 1, 2, and 4, whereas intron 5-9 are a lot more conserved. The large variation in intron 4 is the main reason for the segregation of the phylogenetic tree into the Caucasian and European cluster, since there are two "versions" of this intron, the Caucasian (631 bp ± 10) and the European (971 bp ± 33). When calculating an ancestral sequence for the Caucasian and European cluster as well as an ancestor of all Z. filipendulae populations, the ancestors both have features mostly from the European cluster in intron 4. They both group with Greece South and France Southeast in a maximum-likelihood tree with 1,000 bootstrap replicates (data not shown). The accuracy scores for the ancestral sequences are 0.998 and 0.997, respectively, implying that on average any base has a 99.8% or 99.7% chance of being correct (Hall, 2011). This indicates that intron 4 has evolved from a European common ancestor in spite of more variation present in the Caucasian cluster.
Since Tajimas rate test and molecular clock analysis using maximum likelihood showed equal evolutionary rate throughout the tree, it is possible to do a molecular clock analysis. However, a fixed time point is needed to calibrate the clock, and the paleontological record only contain two different Zygaenidae genera described from the Miocene (5-23 MYA) in central Europe (Naumann, 1987).
None of them can be ascribed to a specific species; one is probably an early Epizygaenella (sister group to Zygaena) and the other could be either Praezygaena, Reissita, Epizygaenella, or Zygaena.
A Z. filipendulae + Z. transalpina + Zygaena nevadensis stem group was previously established based on molecular-phylogenetic reasoning (Niehuis, Hofmann, Naumann, & Misof, 2007), but to date there is no evidence of when Z. filipendulae diverged from other Zygaena species. Consequently, time points based on available habitat after the withdrawal of the Northern European ice sheets were used to calibrate the molecular clock. Since Scotland East is the earliest diverging species in the Caucasian cluster, this divergence point could presumably not be earlier than 15,000 years ago (http://www.donsmaps.com/icemaps.html), because Scotland was covered in ice before that timepoint. When using this calibration point, timings of other events in the phylogenetic tree appeared to have happened later than they logically could have, for example Z. lonicera should have split from Z. filipendulae only 22,000 years ago which seems highly unlikely. Consequently, the split between the European and the Caucasian cluster appear to be much earlier than the 15,000 years ago used to calibrate the tree and consequently, it is not possible to assign a reliable calibration time to the molecular clock tree (Figure 4). As evident in Figure 4   could not be unambiguously assigned to a single cluster. It is remarkable that in specimens tested from two neighboring populations in England, only differing in the timing of the end of diapause in spring (England Early and England Late), we found a 100% assignment to the Caucasian cluster for the early specimen, and a two-third assignment to the European population for the late specimen. Another example is the haplotypes of both strands of the gene in the Armenian specimen, being assigned with high proportion to the predominant West-Central-North cluster though located in the Eastern region.
Theoretical predictions of coalescent theory states that a root haplotype is expected to have a higher number of haplotype connections in the network, rather than being close to the tips. The fact that BPEC does provide a finite (probability) distribution of haplotype trees-as well as the existence of the underlying migration modelallows this method to incorporate the uncertainty in the haplotype rooting.
F I G U R E 4 Phylogenetic tree with relative times inferred using the maximum-likelihood method under the Tamura 3-parameter model. The rates among sites were treated as a Gamma distribution with invariant sites. There were a total of 8,248 positions in the final dataset. The Western-Central-Northern European cluster is shown in red and the Caucasian-Iranian cluster is shown in blue

| D ISCUSS I ON
We analyzed specimens from populations of Z. filipendulae across the Western Palearctic region to infer their phylogeographic relationships. Only a fraction of the total genetic diversity within populations of a species can be seen in any study, relying on the number of specimens and populations analyzed. Most previous studies have relied on the mitochondrial COI gene and a sequence of 648 bp in length (Huemer, Mutanen, Sefc, & Hebert, 2014;Mutanen et al., 2012). We sequenced ~6,000 bp from each specimen, and the results clearly show two population clusters, shown in red (West- is highly problematic to find enough samples of the "native" plant populations and to avoid dilution of the phylogenetic signal by sequences from the sown plants.
Conclusively, two major groupings of Z. filipendulae populations were found in this study which could reflect two different refugial populations during the last ice age, an Alpine/Maritime refugium in Southern France and another in East Anatolia. These two types of Z. filipendulae are only co-occurring in Denmark, Sweden, and Scotland which could indicate that Northern Europe comprise the "hybridization zone" where individuals from the two different refugia met after the last ice age. BPEC analyses identified genetically distinct geographical population clusters and ancestral locations in Z. filipendulae and suggest an Alpine/Maritime refugium in Southern France and another in East Anatolia. Consequently, the classical theory of refugia for European temperate species in the Iberian, Italian, and Balkan peninsulas does not fit with the data from Z. filipendulae populations, which instead support more Northerly, mountainous refugia.