Genetic structure and demographic history of Lymantria dispar (Linnaeus, 1758) (Lepidoptera: Erebidae) in its area of origin and adjacent areas

Abstract We analyzed the population genetic structure and demographic history of 20 Lymantria dispar populations from Far East Asia using microsatellite loci and mitochondrial genes. In the microsatellite analysis, the genetic distances based on pairwise F ST values ranged from 0.0087 to 0.1171. A NeighborNet network based on pairwise F ST genetic distances showed that the 20 regional populations were divided into five groups. Bayesian clustering analysis (K = 3) demonstrated the same groupings. The populations in the Korean Peninsula and adjacent regions, in particular, showed a mixed genetic pattern. In the mitochondrial genetic analysis based on 98 haplotypes, the median‐joining network exhibited a star shape that was focused on three high‐frequency haplotypes (Haplotype 1: central Korea and adjacent regions, Group 1; Haplotype 37: southern Korea, Group 2; and Haplotype 90: Hokkaido area, Group 3) connected by low‐frequency haplotypes. The mismatch distribution dividing the three groups was unimodal. In the neutral test, Tajima's D and Fu's FS tests were negative. We can thus infer that the Far East Asian populations of L. dispar underwent a sudden population expansion. Based on the age expansion parameter, the expansion time was inferred to be approximately 53,652 years before present (ybp) for Group 1, approximately 65,043 ybp for Group 2, and approximately 76,086 ybp for Group 3. We propose that the mixed genetic pattern of the inland populations of Far East Asia is due to these expansions and that the inland populations of the region should be treated as valid subspecies that are distinguishable from other subspecies by genetic traits.

ellite loci were mainly conducted by North American researchers. In the first attempt by Bogdanowicz et al. (1997), four markers were developed and used to assay allelic variation in four gypsy moth populations (Japan, Far East Russia, China, and North America). Subsequently, Keena et al. (2008) evaluated flight capability and related traits using four microsatellite loci (from Bogdanowicz et al., 1997) and mitochondrial DNA analyses of samples obtained from 46 geographic strains. In Far East Asia, Koshio et al. (2002) compared the allele types of regional populations using three microsatellite loci of Japanese samples from three local populations; however, they did not consider population structure because of small sample sizes. Recently, Wu et al. (2015) thoroughly analyzed the population structure of the Holarctic gypsy moth and performed an origin test for each regional population using nine microsatellite loci, including three from Bogdanowicz et al. (1997).
These studies were conducted from the perspective of quarantine inspection (or invasive species control), and the number of sampled individuals was large; however, the number of sampled areas in each region was small, leading to taxonomic confusion with respect to the subspecies of L. dispar. For example, it was reported that two Asian subspecies, L. dispar asiatica and L. dispar japonica, were difficult to distinguish using morphological characters, with individuals of L. dispar asiatica collected from the southern coastal area of Korea having characteristics similar to L. dispar japonica . To resolve this taxonomic confusion at the subspecific level, a demographic history of the Far East Asian populations of L. dispar based on intensive sampling is required. Therefore, the goal of this study was to reveal the population genetic structure and demographic history of L. dispar in Far East Asia, including in the region of species origin: Hokkaido, Japan. For this purpose, we analyzed the genetic diversity and demographic history of L. dispar from Far East Asia using eight microsatellite loci and three mitochondrial genes (cytochrome c oxidase I [COI], ATP6, and ATP8 genes).

| Sampling and genomic DNA extraction for NGS and pyrosequencing
For NGS, we extracted genomic DNA from an egg mass of L. dispar.
The egg mass was collected from Suwon, Korea (37°14.092′N, 127°02.840′E; Figure 2b: Site A). In the egg mass, we selected 50 eggs and extracted genomic DNA using a NucleoSpin ® Tissue Kit (Macherey-Nagel GmbH, Düren, Germany) following the manufacturer's instructions. The sequencing was performed with a MiSeq Sequencer (Illumina, San Diego, CA, USA) by the DNA sequencing company DisGene (Daejeon, Korea). The resulting contigs were assembled in CLC workbench (CLC Bio, Aarhus, Denmark).

| Sampling and genomic DNA extraction for genetic structure analysis
For polymerase chain reaction (PCR) analysis of polymorphisms with the developed microsatellite markers and for genetic structure analysis, 552 samples were collected from 20 sites in Mongolia (1), Russia (1), China (1), Korea (12), and Japan (5) using pheromone attraction traps ( Figure 2, Table 1). The thoracic muscle of each individual was removed for the extraction of genomic DNA. For morphological examination, fore and hind wings were prepared as specimens on a glue board. Abdomens were maintained at −20°C for examination of genitalia. Genomic DNA was extracted using a DNeasy ® Blood & Tissue Kit (Qiagen, Leipzig, Germany) according to the manufacturer's instructions.

| Microsatellite locus identification and marker development
Microsatellite loci were identified using Phobos ver. 3.3.12 (Leese, Mayer, & Held, 2008;Mayer, Leese, & Tollrian, 2010) with the following conditions: repeated sequence length, 2-4 base pairs (bp) and repeat count, greater than four. AT-rich loci were excluded from the investigated microsatellite loci, and for loci that were repeated more than six times, primer sets were chosen using the primer design software PRIMER 3 (Koressaar & Remm, 2007;Untergrasser et al., 2012) with the following criteria: melting temperature, 55.5-56.5°C; GC content, over 30%; and primer length, 18-22 bp. A hundred and fifty primer sets were designed, and PCR tested for specificity and the presence of polymorphic amplification using one sample from each of the twelve regional populations from Korea. PCRs for the primer qualification test were conducted with AccuPower PCR primer for genotyping was labeled with 6-carboxyfluorescein at the 5′ end (Schuelke, 2000). Of the labeled markers, eight were selected for microsatellite marker assessment by a PCR amplification test. For microsatellite marker assessment, 432 samples from the 20 regional populations were genotyped (Table 1). These PCRs were performed by the DNA sequencing company Bionics (Seoul, Korea).

| Mitochondrial DNA sequencing
For the analysis of L. dispar genealogy in Far East Asia, we selected three mitochondrial genes: COI, ATP6, and ATP8. The COI gene may not be suitable for population analysis because its intraspecific variation is relatively low and its interspecific variation is relatively high (Cameron & Whiting, 2008;Wu et al., 2015); however, when combined with other genes, it may be useful (Hajibabaei, Singer, Hebert, & Hickey, 2007). The ATP6 and ATP8 genes show relatively higher intraspecific variation and are known to be suitable for population genetic analysis (Cameron & Whiting, 2008;Wu et al., 2015). The former region of the COI gene was amplified using the LCO1490 (5′-GGTCAACAAATCA TAAAGATATTGG-3′) and HCO2198 (5′TAAACTTCAGGGTGACCAA AAAATCA-3′) primer set (Folmer, Black, Hoeh, Lutz, & Vrijenhock, 1994) and a GeneMax Tc-s-B PCR cycler (BIOER, Hangzhou, China).
PCR conditions were set as in Hebert, Cywinska, Ball, and deWaard (2003). The ATP6 and ATP8 genes were amplified using the primer set from Wu et al. (2015) and an ABI Veriti 96-well Thermal Cycler (Applied Biosystems ® ; Thermo Fisher Scientific Inc., MA, USA). PCR products were checked using 1% agarose gel electrophoresis. The PCR products were purified and sequenced using the sequencing services of Macrogen (Seoul, Korea) and Bionics (Seoul, Korea). The obtained sequences were submitted to NCBI GenBank (Table 1).

| Microsatellite loci data analysis
Genotyping errors (such as null alleles and scoring errors) on selected markers were checked with MICRO-CHECKER ver. 2.2.3 (Oosterhout, Hutchinson, Wills, & Shipley, 2004). The pairwise linkage disequilibrium values for pairs of loci were then examined using Arlequin ver. 3.1 (Excoffier, Laval, & Schneider, 2005). Genetic diversity parameters such as allele frequency, genotype number, allele type, gene diversity, heterozygosity, and polymorphism information content (PIC) were calculated with PowerMarker ver. 3.5 (Liu & Muse, 2005). Hardy-Weinberg equilibrium (HWE) across loci was estimated after sequential Bonferroni correction (Rice, 1989). To test the isolation by distance (IBD) model, the correlation between genetic distance and geographic distance was calculated using Mantel's test with 30,000 randomizations in IBD ver. 3.23 (Jensen, Bohonak, & Kelley, 2005). To estimate genetic differentiation among regional populations, analysis of molecular variance (AMOVA) was used. AMOVA was calculated using the Kimura two-parameter model in Arlequin ver. 3.1 (Excoffier et al., 2005). We ascertained the allele type frequencies based on microsatellite loci for each population and estimated the pairwise genetic distances between the populations based on allele type frequencies with PowerMarker ver. 3.5 (Liu & Muse, 2005). Based on the pairwise genetic distances, a network estimating the genealogical relations among the 20 regional populations was calculated with SplitTree4 (Huson & Bryant, 2006). We tested the genetic differentiation among the populations using a model-based Bayesian analysis with STRUCTURE ver. 2.3.4 (Falush, Stephens, & Pritchard, 2003;Pritchard, Stephens, & Donnelly, 2000) under the following conditions: a correlated-allele model with a 500,000 burnin period, 750,000 MCMC reps after burn-in, K from 2 to 8, and 20 iterations. The value of the ad hoc statistics ∆(K) was then estimated with Harvester (Earl & von Holdt, 2012) using the average value of LnP(D) to estimate the number of genetic groups (Evanno, Regnaut, & Goudet, 2005).

| Mitochondrial sequence data analysis
The obtained sequences were manipulated as a raw data set using MEGA 6 (Tamura, Stecher, Peterson, Filipski, & Kumar, 2013), and sequence divergence was estimated. The standard diversity indices (the number of haplotypes and polymorphic sites) were estimated using DnaSP ver. 5.10.01 (Librado & Rozas, 2009), and the raw data set was converted for analysis in Arlequin and NETWORK. The molecular diversity indices (haplotype diversity and nucleotide diversity) were estimated using Arlequin ver. 3.1 (Excoffier et al., 2005). To estimate the genealogical relations among the haplotypes, a median-joining network was calculated using NETWORK ver. 4.6.1.3 (http://www. fluxus-engineering.com). F ST distances among all pairs in the population were used to assess the genetic structure of L. dispar asiatica.
Population pairwise F ST values were calculated using the Kimura two-parameter model in Arlequin ver. 3.1 (Excoffier et al., 2005) (significance test = 0.05; significance level = 1,000 permutations). To estimate genetic differentiation among regional populations, AMOVA was used following the Kimura two-parameter model in Arlequin ver.
3.1 (Excoffier et al., 2005). To test the IBD model, the correlation between genetic distance and geographic distance was calculated using Mantel's test with 30,000 randomizations in IBD ver. 3.23 (Jensen et al., 2005).
To estimate the demographic history of the gypsy moth populations, a mismatch distribution analysis was conducted using Arlequin ver. 3.1 (Excoffier et al., 2005). Sudden expansion of the population was first estimated in a mismatch distribution graph: unimodal or not unimodal. Deviation from the demographic expansion model was estimated using the sum of the squared deviation and Harpending's raggedness index (Harpending, 1994). To estimate population equilibrium, Fu's FS test (Fu, 1997) and Tajima's D test (Tajima, 1989)  distribution, the expansion time of the population was calculated using the formula tau = 2ut (tau = age expansion parameter; u = the aggregate mutation rate over the region of DNA under study; and t = generation time) (Roger & Harpending, 1992), with the assumption that the mutation rate of insect mitochondrial DNA is 2.3% per million years (Brower, 1994).

| NGS sequencing and microsatellite marker development
Illumina sequencing of the genomic DNA from L. dispar eggs obtained 3,974,358,483 bp from 15,988,036 reads, with an average of 248 bp per read, which assembled into 718,940 contigs, with an average of 511 bp per contig (Table S1). The contigs contained 1,867 microsatellite loci (excluding AT repeats; the length of the repeated base, 2-4 bp; and repeated more than four times). Of these, 430 loci showing more than six repeated motifs were tested for the probability of marker design with PRIMER3 (Koressaar & Remm, 2007;Untergrasser et al., 2012). We were able to design primer sets for 207 loci, from which we randomly selected 150 loci for PCR tests to examine polymorphism.

| Microsatellite marker assessment
Microsatellite markers were initially assessed using 20 regional populations (one population from Mongolia, one from China, one from T A B L E 2 (Continued) Russia, 12 from Korea, and five from Japan). Testing for genotyping errors at each locus revealed that one marker (346,977 in Site 22) showed evidence of null alleles (Table 4). However, the null allele frequency of the marker was 0.1947, which is lower than 0.20. In microsatellite analysis, the frequencies of null alleles are almost always p < .40 and usually p < .20 (Dakin & Avise, 2004). When microsatellite null alleles are uncommon to rare (p < .20), their presence causes a slight underestimate of the average exclusion probability at a locus; however, this is usually not of sufficient magnitude to warrant great concern. For p > .20, however, the mean "estimated with null" exclusion probability can be much higher than the "true" and "estimated without null" values (Dakin & Avise, 2004). Therefore, we retained the marker showing p < .20 in our population genetic analysis. We estimated that these markers were independently evolved within our samples; however, the tests for linkage disequilibrium were not significant.
Therefore, genetic diversity indices (major allele frequency, genotype number, allele number, gene diversity, observed heterozygosity, and PIC) of each regional population were estimated with eight markers.
We determined the allele type frequencies based on microsatellite loci for each population (Table S2). For each marker, gene diversity, observed heterozygosity, and PIC ranged from 0.2500 to 0.8963, 0.2778 to 1.0000, and 0.2374 to 0.8877, respectively. The genetic diversity was high, ranging from 2 to 16 alleles for each marker (Table S3). The exact p-values of HWE were calculated for each regional population after sequential Bonferroni corrections (p = .0003). Deviations from HWE were not detected in the exact p-values; however, some markers were identified as relatively low exact p-values in each population (Table S3). Thus, we compared the differences between gene diversity and observed heterozygosity in each population for each marker.
Two populations, Sites 12 and 26, had observed heterozygosity values lower than the gene diversity values in all loci except locus 344,041.
Lower observed heterozygosity values than the gene diversity values suggest significant homozygosity, and this implies the presence of null alleles or allelic dropout, linkage of alleles, or inbreeding (Damm, Armstrong, Arjo, & Piaggio, 2015). However, we tested for the presence of null alleles or allelic dropout and linkage of alleles through the previous analyses. Lastly, if the violation were a consequence of inbreeding, we would have expected to observe such a phenotype at many or all loci, not just at a single locus (Damm et al., 2015;Selkoe & Toonen, 2006). The samples from Sites 12 and 26 might indicate inbreeding or sib sampling. In this study, however, we retained the samples from the two sites in our analysis because the deviation from HWE was not detected at all analyzed loci (Table S3). Therefore, we suggest that the developed eight novel microsatellite markers may be useful for a population genetic analysis of L. dispar.

| Pairwise F ST genetic distances
The population genetic structure of L. dispar in Far East Asia was calculated with F ST values. Pairwise F ST distances among regional populations ranged from −0.0087 to 0.1171 (Table 5: Lower side).
Considering the genetic distances in each geographical region, the regional populations in Hokkaido (Sites 35-38), the species origin region, showed relatively low genetic distances (from −0.0055 to −0.0010); however, compared to other regional populations, their genetic distance was relatively high (0.0472 to 0.1171). A Mongolian regional population (Site 34), which was further from Hokkaido than other regional populations, showed relatively high genetic distances

| NeighborNet network
In a NeighborNet network based on pairwise F ST genetic distances, the 20 regional populations could be divided into five groups: Group 1, distance; however, their genetic distance was similar to Site 1 (geographically close to Incheon Harbor) and Site 16 (geographically close to Uljin Harbor). These two regions are geographically close to Busan Harbor, which is a frequent entry port for vessels (Choi, 2014). We therefore suspect that these two regional populations may frequently interbreed with the regional populations near Incheon Harbor and Uljin Harbor.

| Bayesian clustering
For the model-based Bayesian analysis, K was estimated by varying it from two to eight, and the ad hoc statistics ∆(K) (Evanno et al., 2005) indicate the maximum level of structure in three genetic groups ( Figure 5). Lymantria dispar has been divided into two subspecies in Asia, L. dispar asiatica (or L. dispar dispar) and L. dispar japonica, based on mitochondrial DNA and microsatellite analysis (Bogdanowicz et al., 2000;Wu et al., 2015). Our study showed similar results; however, the Far East Asian gypsy moth populations were distinguishable as three types according to sampling region ( Figure 6). Comparing the individual colored bar plots among the regional populations revealed that the frequency of the green-colored genetic content was high in Hokkaido regional populations (the species origin region) (Figures 6q, r, s, and t), the frequency of the red genetic content was high in Jeju regional populations (Figure 6o), and the frequency of the blue genetic content was high in Mongolian regional populations (Figure 6a).

The regional populations from the Korean Peninsula and adjacent
areas showed a mixed pattern in comparison with the Jeju regional populations and Mongolian regional populations.
Comparing the individual colored bar plots of each regional population, Sites 35, 36, 37, and 38 from Hokkaido were clearly distinct in genetic makeup from the regional populations of the Korean Peninsula and adjacent areas ( Figure 6). Several individuals ( (Table 6, F ST = 0.04192). In the Chinese and Russian regional populations, however, the blue genetic content was higher than other genetic content types. Several individuals (Figure 6i:   (Table S5).

| Mitochondrial genealogy
In the median-joining network, three high-frequency haplotypes (H1, 151ex; H37, 75ex; and H90, 73ex) were connected to each other by low-frequency haplotypes (Figure 7). This pattern was revealed in the pairwise F ST distances (  In particular, haplotype H90 appeared only in Hokkaido regional populations and was connected with haplotype H1 by haplotype H95, which is another Hokkaido haplotype (Figure 7). The were connected with each other through low-frequency haplotypes ( Figure 7). Therefore, the Far East Asian gypsy moth populations may have undergone sudden population expansion.

| Mitochondrial DNA haplotype mismatch distribution
The median-joining network revealed a star-shaped mtDNA genealogy, so we analyzed the mismatch distribution, applying a sudden population expansion model. We conducted the analysis using the three groups recognized above, and we found that the mismatch F I G U R E 7 Median-joining network using mitochondrial genes of Lymantria dispar from Far East Asia graphs of the groups were unimodal and the mismatch parameters were insignificant (Figure 8). In neutral equilibrium, Tajima's D and Fu's FS tests also had negative values in all three groups (Figure 8).
We therefore consider that the mismatch analysis supports a sudden population expansion.
The expansion time of each group was inferred using the observed value of the age expansion parameter (tau), the equation t = tau/2u (Roger & Harpending, 1992), and an insect mtDNA mutation rate of 2.3% per MY per lineage for silent sites (Brower, 1994

| DISCUSSION
The taxonomic status of the two subspecies of L. dispar in Far East Asia has been debated (Arimoto & Iwaizumi, 2014;Schintlmeister, 2004). In a recent study using molecular data (Wu et al., 2015), L. dispar dispar (European subspecies) was clearly distinct from the Asian two subspecies; however, the Asian subspecies were difficult to distinguish from each other. The Japanese subspecies, L. dispar japonica, was genetically similar to the populations from the southern end of the Korean Peninsula, and the Korean populations had mixed genetic content (Wu et al., 2015). We examined the previous study's collecting sites and found they were mainly located near seaports. We therefore included inland populations in the present study ( Figure 2).
The Würm glacial period can be divided into three glacial stages and two subinterglacial stages (Gao et al., 2016;Han & Meng, 1996;Ma, Yu, Wang, & Yao, 2006). The mean temperature during the period was approximately 5°C lower than at present, based on the snow-line elevation on Mt. Fuji in Japan (Kim, 2011). In Europe, three ice sheets (Scandinavian, British, and Alpine) developed to cover a large part of the continent (Trojan, 1997). The advancing glacier forced the flora and fauna of the warm and temperate zones southward, and refugia were formed in the Mediterranean region (Trojan, 1997). In a great amount of Siberia, large ice masses eliminated all plants and animals; however, eastern regions (including Ussuri Land, Korea, Manchuria, and Japan) remained ice-free as fauna-and flora-preserving areas during the glaciation period (Trojan, 1997). During these periods, the flora of the southern part of Korea showed the features of a cool, temperate climate (Chung, Lee, Lim, & Kim, 2005). For example, Polypodiaceae, Alnus spp., Carpinus spp., and deciduous Quercus spp. were distributed in the area (Chung et al., 2005;Kim, 2011). Carpinus spp. and Quercus spp. are the food plants of L. dispar asiatica in Korea (Lee et al., 2002).
During the last glacial maximum (approximately 20,000-18,000 ybp), however, Picea spp., Abies spp., Pinus spp., and Larix spp. were distributed in Far East Asia as it changed to a subarctic climate (Kim, 2011;Yoon & Hwang, 2009). The coastline during this period was quite different from the present. The west sea of the Korean Peninsula was a low hilly area because the sea level was approximately 30-130 m below present levels (Kim, 2011;Park & Cho, 1998;Park, Yoo, Lee, & Lee, 2000). The Japanese islands were connected with Sakhalin and the southeast part of the Korean Peninsula by a land bridge (Park et al., 2000;Trojan, 1997 (Park & Son, 2008), and therefore, genetic interaction between Group 1 and Group 2 would not have been possible. In the model-based Bayesian analysis using microsatellite loci, K (assumed as the number of populations) was calculated to be three, the same as the number of groups examined in the mitochondrial genealogy. The genetic diversity of the regional populations was higher in the Korean Peninsula than in other regions, with the Korean Peninsula populations showing the same mixed pattern reported previously (Wu et al., 2015). We suggest that this genetic pattern might have been caused by multiple sudden population expansions, and the demographic patterns caused by the Würm glacial period may have resulted in the present genetic diversity. In genetic makeup, however, the regional populations near the Busan seaport (Sites 27 and 28) were similar to the middle area of the Korean Peninsula. This might have been caused by vessels arriving in Korea and anchoring at the ports of Incheon or Busan (Choi, 2014;Kim, Kim, Kim, & Lee, 2008). We also looked for this genetic pattern in several samples from Russian and Japanese populations (Figures 6b, p, q, r, s, and t). Thus, we suggest that several individuals might have been introduced into each region via vessels arriving at seaports.
We can suggest that L. dispar in Far East Asia are divided into two types (the inland type and the Hokkaido type), although the analyzed samples did not cover the full distributional region of the species in Far East Asia. Taxonomically, 15 nomino-subspecies have been assigned view that L. dispar asiatica is a synonym of L. dispar dispar because the type locality of L. dispar asiatica is close to Europe, which is the type locality of L. dispar dispar. However, we consider that thorough genetic analyses on regional populations have to be undertaken in other regions of Eurasia to characterize the lineages of gypsy moth across its native range. A taxonomic system of the subspecies of L. dispar could therefore be re-established if each regional lineage revealed by genetic analysis is analyzed and compared with the topotypes collected from the type locality of each subspecies.