Underestimated mitochondrial diversity in gypsy moth Lymantria dispar from Asia

The gypsy moth Lymantria dispar (L.) (Erebidae) is one of the most important agricultural and forest insect pests. It currently includes three subspecies: Lymantria dispar dispar (L.), Lymantria dispar asiatica Vnukovskij and Lymantria dispar japonica (Motschulsky). The first subspecies is known as the European gypsy moth (EGM), whereas the latter two are collectively referred to as the Asian gypsy moth (AGM). Asian gypsy moth possesses traits that include a strong female flight capability and broader larval host range, which make it a more threatening pest than EGM when invading new areas. To better understand the genetic structure within AGM and in particular among Chinese populations, we reconstructed a mitochondrial gene tree based on four mitochondrial protein‐coding genes (ND2, ND6, ATP6 and ATP8) from 17 Chinese locations and 21 locations from other countries. None of the three subspecies of gypsy moth is recovered as monophyletic on the mitochondrial gene tree, likely as a result of human‐mediated transportation and a high level of cryptic genetic diversity. Chinese populations are split into four mitochondrial clusters. Surprisingly, China I is nested within EGM and China II is the sister group of sampled EGM. China III grouped with samples from South Korea and the Russian Far East to represent the typical L. d. asiatica. China IV formed a sister relationship with the former group. The results of the present study confirm that the recently discovered genotype from the Black Sea‐Caspian region consists of an independent mitochondrial genetic lineage from other subspecies.


Introduction
The gypsy moth Lymantria dispar (L.) is a highly destructive defoliator. Its larvae attack a broad group of host plants ranging from oaks to conifers, causing billions of dollars in damage to the natural forest and urban trees (Pogue & Schaefer, 2007). The distribution of gypsy moth spans the temperate forests of the entire Eurasian continent, including Japan, having extended into North America approximately 150 years ago (Giese & Schneider, 1979). Species with such wide ranges are often assigned into multiple subspecies or geographical races, sometimes even arbitrarily (Huang & Knowles, 2016). Three subspecies of L. dispar are currently recognized mainly based on their geographical distribution and different female flight capability: Lymantria dispar dispar (L.) from Europe and North America, also known as European gypsy moth (EGM); Lymantria dispar asiatica Vnukovskij from mainland East Asia; and Lymantria dispar japonica (Motschulsky) from Japan. Although a systematic review by Pogue and Schaefer (2007) defined wing colour and markings for subspecies identification, it has been shown that such characters are not accurate predictors of taxonomic status (Keena et al., 2008). The most notable feature shared by the two Asian subspecies is that most Asian females are capable of strong ascending flight, which is rarely found in EGM populations (Rozkhov & Vasil'eva, 1982;Wallner et al., 1995;Keena et al., 2008). This characteristic could allow them to spread rapidly once established in introduced areas. For pest management purpose, the United States Department of Agriculture used to define any biotype of gypsy moth possessing female flight capability as the Asian gypsy moth (AGM) (Keena et al., 2008). Additionally, AGM larvae have a broader host range that covers over 500 tree species, including those not utilized by EGM (Baranchikov & Sukachev, 1989).
When commercial vessels are docked at East Asian seaports, female AGM are often attracted to light sources, resulting in egg mass deposition on cargo or the superstructure of the ship (Wallner et al., 1995). AGM has been intercepted at North America from 1991 onward but has yet not been established (Qian, 2000). The source of some of those introductions comprised ships from the Russian Far East [Animal and Plant Health Inspection Service (APHIS), 2016]. To prevent further introductions of AGM, the U.S.A., Canada and other countries have negotiated trade deals with Russia and East Asian countries, requiring cargo coming from regions with native AGM populations is treated or quarantined. For example, the North America Plant Protection Organization (NAPPO) (China) has adopted specific phytosanitary measurements for five types of cargo and vessels that come from, or have visited, areas infested with AGM (NAPPO, 2009). Given the ecological and economic significance of AGM, there has been a long-standing interest in research that focuses on classification, ecology and evolution of this pest insect. However, despite the fact that China represents a major portion of the geographical range of AGM, genetic diversity and phylogenetic relationships among Chinese populations have not been studied in great detail. Gypsy moths from China are often treated as a single population (Pogue & Schaefer, 2007).
Mitochondrial DNA (mtDNA) is widely used as a convenient molecular marker in population-and species-level studies of phylogenetic relationships. Djoumad et al. (2017) sequenced the entire mitochondrial genome of 10 gypsy moths throughout the range of the species and suggested that mtDNA contains many variable sites suitable for deciphering relationships between subspecies and among populations. Their study also discovered an Iranian haplotype, which is the sister lineage to all other gypsy moth subspecies. In an analysis of the mitochondrial cytochrome c oxidase subunit I (COI) gene, Chen et al. (2016) found that gypsy moths from two populations in China had both Asian and European mitochondrial haplotypes, suggesting undocumented cryptic genetic lineage or human-aided movement of foreign moths into China. In the present study, we aimed to better understand the cryptic mitochondrial genetic diversity within gypsy moth subspecies and, in particular, among Chinese populations, using three mitochondrial fragments that span four protein-coding genes: ND2, ND6, ATP6 and ATP8. This work represents by far the most extensive sampling of gypsy moths from China (Wu et al., 2015;Djoumad et al., 2017).

Insect collection and DNA extraction
Gypsy moth samples were collected from 38 geographical locations worldwide, including 17 in China (Fig. 1). Pheromone-trapped adult male moths and hand-collected egg masses and larvae were used in the sample collection. To ensure that our samplings represent local genetic diversity, adult moths were mixed from different traps and egg masses and larvae were collected from multiple distant trees. The number of moths collected from each location is listed in Table 1. Adults were poisoned and dried for DNA extraction. Larvae and eggs were soaked in anhydrous ethanol to induce death. Genomic DNA was extracted using the Insect gDNA Miniprep Kit (BIOMIGA, China) in accordance with the manufacturer's instructions. Extracted DNA was examined using a NanoDrop 2000 Spectrophotometer V1.0 (Thermo Fisher, Waltham, Massachusetts) to ensure appropriate quality and quantity and then stored at −20 ∘ C. Sample collection was authorized by the National Forestry Bureau Forest Protection Station (China). All samples were used for scientific experiments only.

Phylogenetic analysis
DNA sequences were checked using bioedit (Hall, 1999) and mega, version 6 ( Tamura et al., 2013). The four mitochondrial protein-coding gene sequences were assembled and concatenated into a combined dataset for analyses using mega. Corresponding DNA sequences of the Iranian specimen from Djoumad et al. (2017) were downloaded from Gen-Bank (accession number KY923068) for comparison with our dataset. We also calculated mean p-distance among genetic groups in mega. Portions of alignment where indels were present were eliminated from the calculation. Mitochondrial sequences were used to infer phylogenetic relationships using a Bayesian approach with Lymantria umbrosa as the outgroup Figure 1 Sampling sites of Lymantria dispar in China, South Korea, Japan and other parts of the world (the hatched area shows locations of the former three countries). Chinese sites are denoted by solid symbols representing mitochondrial lineages. Arrows indicate sites that are close to others and cannot be positioned exactly. Other sampling sites are denoted by open circles. Numbers next to sampling sites correspond to locality names listed in Table 1. taxon because of its known close relationship with gypsy moth (Djoumad et al., 2017).
We first tested adherence of the phylogeny to a molecular clock using Tajima's one degree of freedom method (Tajima, 1993) implemented in mega. If the clock hypothesis was not rejected, we estimated the mitochondrial genealogy with divergence times within L. dispar under a strict clock model in beast, version 1.7.5  with the lepidopteran mtDNA substitution rate of 1.1 × 10 −8 per site per year (Brower, 1994). The nucleotide substitution model GTR+I+G was selected based on the Akaike information criterion obtained from jmodeltest, version 2.1.4 (Darriba et al., 2012). The Bayesian Markov chain Monte Carlo method was run for 50 million generations sampled every 5000 generations. Chain convergence was assessed by effective sample size values of the parameters in tracer, version 1.5 . The first quarter of trees were discarded as burn-in. We also used maximum likelihood (ML) analysis in raxml, version 7.0.4 (Stamatakis et al., 2008) to confirm the topology recovered by Bayesian analysis. The same substitution model was used and rapid bootstrap values were calculated with 100 replicates. In addition, we constructed a haplotype network based on the statistical parsimony method using tcs, version 1.21 (Clement et al., 2000) with 95% confidence limit for connections. The haplotype network can display mutational differences between haplotypes.

Results
The final alignment contains 146 L. dispar and six L. umbrosa, and there are 78 distinct haplotypes in the former species. The total length of concatenated sequences is 2102 bp. Gen-Bank accession numbers for all newly generated sequences are provided in the Supporting information (Appendix S1). The Bayesian and ML analysis yielded congruent tree topologies and so only the Bayesian tree is presented here (Fig. 2) (a version of the Bayesian tree with complete terminal taxa names is provided in the Supporting information, Appendix S2). The overall tree topology resembles those of previous studies from deWaard et al. (2010), Wu et al. (2015) and Djoumad et al. (2017), which placed the basal dichotomy within L. dispar between Asian populations and European/North American populations. We also discovered unexpected genetic groupings from samples collected in Asia. Particularly the placements of some Chinese populations lead to the lack of monophyly in all three currently recognized subspecies. The most parsimonious network is also in line with the Bayesian tree and those cryptic Chinese lineages are characterized by long steps of mutations from common haplotypes (see Supporting information, Appendix S3). Uncorrected group mean p-distances between Chinese groups are provided in  Another notable result is that two Chinese locations, Zunyi and Dayi, contained haplotypes belonging to different Chinese groups and are yet are distributed in sympatry (Fig. 1). In Zunyi, specimens possessed haplotypes from either the China II or III groups, whereas individuals collected in Dayi were separated into the China III and IV groups. Our result also showed that samples collected from Krasnodar, Russia grouped with the Iranian genotype, which could represent an independent lineage from the three known subspecies. Lastly, Tajima's one degree of freedom method did not reject the null hypothesis of a strict molecular clock ( 2 = 3, d.f. = 1, P = 0.08). Molecular dating suggested that most major cladogenetic events occurred between 0.1 and 0.3 million years ago during the Middle Pleistocene.

Discussion
Given the economic and ecological importance of gypsy moth, an accurate assessment of the extent of the genetic diversity in this insect provides critical information for pest management, including the development of prevention and exclusion measures. Although it is easier for policy-makers to consider different subspecies of L. dispar as discrete taxonomic units, the actual relationships between genetic lineages could be more complex. The results of previous studies (deWaard et al., 2010;Wu et al., 2015;Djoumad et al., 2017) involving a phylogenetic analysis supported monophyly for some of the subspecies, although the results of the present study show that reciprocal monophyly of all three recognized subspecies is not recovered by the mitochondrial data. A major reason for this result is the high level of cryptic diversity among Chinese populations, which have diverged into at least four genetic clusters (China I-IV). Uncorrected between-group mean p-distances are between 0.7% and 1.19%, which are comparable to the divergence between Asian and European gypsy moths (Wu et al., 2015). However, the lack of phylogenetic monophyly does not necessary reflect an imperfect taxonomy. For a group of organisms that are almost morphologically indistinguishable, such as subspecies of L. dispar, specimen identity is often assumed to be in accordance with their geographical origin. However, the assumption may not hold true when samples are moved outside their native range by humans (see below). Thus, the interpretation of relationships between subspecies can be confounded.
One interesting discovery is that moths from the China III and IV groups fall into the EGM clade. Chen et al. (2016) constructed a mitochondrial gene tree based on COI sequences and also suggested close affinity between EGM and a likely AGM population from the Russian Siberia. Therefore, both studies indicated that some AGM individuals possess mitochondrial haplotypes that are more closely related to EGM than to other AGM lineages. In our case, the two sites in Guizhou province are characterized by a major altitudinal change, with the highest mountain peak at 1750 m a.s.l. in the south and the lowest valley at 609 m a.s.l. in the north. Such geographical properties produce a three-dimensional distribution of climate characteristics, which could contribute to genetic isolation such that evolutionary forces including drift and selection would take effect. However, this hypothesis does not explain the appearance of similar haplotypes in Inner Mongolia over 2500 km away. An alternative, more plausible reason for China III and China IV appearing to be closely related to EGM comprises the historical human-mediated introductions of EGM into China. For example, during World War II, United States Air Force Bases were built in numerous places in China, including Guizhou, where Kaili Huangping Airport was completed in March 1945. This airport was only approximately 150 km from either Xifeng or Zunyi. It was an important military base for the United States Air Force when assisting China in their defence against Japanese aggression. Kaili Huangping Airport was home to over 800 famous pilots ('Flying Tigers') and staff and developed into a central airport in the southwest region (Pan & Yang, 2015). More than 100 aircraft took off each day. Military cargo originating in the U.S.A. may have accidentally transported gypsy moth egg masses or pupae to China, from which the China III and China IV groups are descended. Additionally, current international trade can continue to introduce non-native individuals if the risk of introduction is not mitigated (APHIS, 2016). After establishment, gypsy moth egg masses and larvae may further hitchhike on almost any household items to travel to other parts of the country, leading to an admixed genetic pattern. Similarly, the placement of four specimens collected from Kyushu, Japan, into the clade of mainland AGM populations could also be the result of human-aided introduction.
Another interesting observation within Chinese gypsy moths is that some locations harbour mitochondrial lineages belonging to different groups (China II, III and IV). To our knowledge, this level of genetic heterogeneity has never been reported in previous studies. Zunyi of Guizhou province and Dayi of Sichuan province, two locations in southwestern China, both demonstrated a sympatric mixture of two groups of haplotypes.
One explanation for this is retained ancestral polymorphism, similar to that observed in the Florida grasshopper sparrow (Bulgin et al., 2003). On the other hand, our analysis indicates that the divergence of those haplotypes occurred approximately 0.2-0.3 million years ago. This could be explained by the onset of a greater glacial-interglacial variability subsequent to the Middle Pleistocene. When ancestral populations retreated to warmer areas in the south during glacial periods, genetic divergence could take place as a result of isolation of distant refugia. When the temperature rose again, populations from separated refugia came into contact (post-glacial secondary contact), resulting in the sympatric occurrence of multiple genetic lineages (Kangas et al., 2015). Lastly, we cannot rule out human-mediated introduction that would also bring distinct genetic lineages into contact. Locating source populations of those different genetic lineages in Zunyi or Dayi from elsewhere would support the latter hypothesis.
An early study suggested the likelihood of mitonuclear discordance when assessing gypsy moth population genetic structure . For example, populations from Central Asia display EGM-type mitochondrial haplotypes, as also shown in the present study, whereas females from these populations often exhibit AGM-like flight capability and their nuclear genomes appear to have more in common with L. dispar asiatica (Keena et al., 2008;Wu et al., 2015;Stewart et al., 2016;Djoumad et al., 2017;Picq et al., 2017). This is probably because mitochondrial genomes are maternally inherited. Therefore, it is likely that phylogenetic tree reconstructed by nuclear genes will have a topology different from that of a mitochondrial gene tree. To further test the monophyly of each subspecies of L. dispar and understand the mechanism underlying the observed mitochondrial genetic pattern (human-mediated introduction versus evolutionary forces), a collaborative work that focuses on the nuclear genome of those questionable AGM samples is in progress.
The present study confirms the results of the study by Djoumad et al. (2017) showing that the Iranian genotype represents a gypsy moth mitochondrial lineage separated from the currently recognized three subspecies, and this Iranian specimen is nested within our independent collections of moths from the nearby coast of the Black Sea. We cannot rule out the possibility that those specimens represent another named but little known or rarely studied species of Lymantria, although we queried the published COI barcode of the Iranian specimen against The Barcode of Life Data System (http://www.barcodinglife.org) and did not find any close match (> 99%) with other available reference barcodes in the genus. This new lineage could either replace L. umbrosa as the immediate sister taxon to gypsy moth or may be more closely related to the former species than to the latter. However, morphological comparison or nuclear DNA data are not currently available. The distributional range of the new lineage does not appear to be restricted to a single location but may encompass the Black Sea-Caspian region (and therefore can be referred to as the Black Sea-Caspian lineage). A recent study identified additional specimens from Georgia that also belong to the Black Sea-Caspian lineage (Lackovi'c et al., 2018). Its taxonomic status should be examined in an integrative perspective (i.e. a combination of molecular, morphological and ecological data) because the result may impact regulatory policies, which need to reflect the correct taxa composition of quarantine species.
In conclusion, in the present study, we report genetic evidence for the lack of monophyly of all three subspecies of L. dispar, likely as a result of both human-mediated introduction (China I and II) and a high level of cryptic genetic diversity within Chinese populations (China III and IV). The occurrence of four divergent Chinese mitochondrial lineages, some of which are sympatric, highlights the importance of developing efficient strategies that prevent their introduction to other countries. Admixture among separate invasive AGM lineages could increase the adaptive potential, contribute to successful invasions in new areas, increase the likelihood of mating with local EGM population, and confound attempts to monitor or eradicate L. dispar in a given area.