Global phylogeography and invasion history of the spotted lanternfly revealed by mitochondrial phylogenomics

Abstract Biological invasion has been a serious global threat due to increasing international trade and population movements. Tracking the source and route of invasive species and evaluating the genetic differences in their native regions have great significance for the effective monitoring and management, and further resolving the invasive mechanism. The spotted lanternfly Lycorma delicatula is native to China and invaded South Korea, Japan, and the United States during the last decade, causing severe damages to the fruits and timber industries. However, its global phylogeographic pattern and invasion history are not clearly understood. We applied high‐throughput sequencing to obtain 392 whole mitochondrial genome sequences from four countries to ascertain the origin, dispersal, and invasion history of the spotted lanternfly. Phylogenomic analyses revealed that the spotted lanternfly originated from southwestern China, diverged into six phylogeographic lineages, and experienced northward expansion across the Yangtze River in the late Pleistocene. South Korea populations were derived from multiple invasions from eastern China and Japan with two different genetic sources of northwestern (Loess Plateau) and eastern (East Plain) lineages in China, whereas the each of Japan and the United States had only one. The United States populations originated through single invasive event from South Korea, which served as a bridgehead of invasion. The environmental conditions, especially the distribution of host Ailanthus trees, and adaptability possibly account for the rapid spread of the spotted lanternfly in the native and introduced regions.

molecular markers can promote effective monitoring and quarantine aimed at the source regions (Ascunce & Shoemaker, 2011).
Defining the accurate phylogeographic pattern and demographic history of native and invasive populations is crucial for effective management of invasive pests, because different control measures need to be applied to those populations with significant genetic structures and different sources. Studying genetic diversity of native and invasive pest populations can also provide important insights understanding the successful invasion process. For instance, higher genetic diversity might be associated with stronger adaptive abilities (Garnas et al., 2016), and low genetic diversity among invasive population may suggest the presence of bottleneck or founder effects (small initial invasive population) during initial invasion (Javal et al., 2019;. In addition, comparing genetic background of native and different invasive populations may inform the sites of secondary introductions and intermediate invasion regions between native and an introduced country (Bertelsmeier & Keller, 2018;Qiao et al., 2019). A comprehensive understanding of genetic diversity and invasion history of invasive pests is fundamental for future study on the genomic mechanism behind their successful invasions.
The spotted lanternfly (SLF), Lycorma delicatula (White), is a phytophagous hemipteran insect first recorded and commonly found in northern China (Liu, 1939). In China, this hopper is not regarded as a serious pest because it primarily feeds on non-economic plants, including Ailanthus altissima (Miller) Swingle (tree-of-heaven), Melia azedarach Linnaeus and Vitis plants. However, in its new habitats, it feeds on grapevines and other fruit trees, leading to a huge decline in the quality and yield of grapes and causing severe economic losses to the fruits industry. The invasion of the SLF into other country was first confirmed in 2004, when it was found in western South Korea. This is followed by its rapid spread across the country (Han et al., 2008).
Subsequently, it was identified in Ishikawa Prefecture, Japan Tomisawa et al., 2013), and recently in the eastern United States, including 13 counties in southeastern Pennsylvania and nine other surrounding states (NYSIPM, 2020), despite that persistent quarantine efforts surrounding the first detection site have been made. Because the SLF is the only fulgorid planthopper in northeastern North America, very few natural enemies have been reported attacking it, so that the spread of this hopper will likely continue and increase further (Clifton et al., 2019). Commercial transportation of more than 70 fruit and timber plants in 25 families, which are potential hosts, may be involved in its rapid, worldwide spread (Dara et al., 2015). For these phytophagous insects, the environmental conditions, especially food resources, play a pivotal role in determining successful colonization and spread in the introduced regions.

Several phylogeographic and population genetic studies
have partially revealed the invasion process of the SLF between East Asian countries Park et al., 2013;Zhang et al., 2019). Analysis of specimens from China, South Korea, and Japan suggest that native populations in northern China of the SLF is likely the source of introduction to Japan and South Korea Zhang et al., 2019), and that multiple independent invasion in the western coastal regions, followed by long-distance, inland dispersal may account for the rapid spread of the SLF throughout South Korea . It remains unknown, however, how this hopper crosses the Pacific Ocean, arrives and spreads in the United States Park et al., 2013;Zhang et al., 2019).
Previous phylogeographic analysis for SLF, based on limited population sampling and a few mitochondrial markers have been unable to reveal detailed invasion history due to limited phylogenetic resolutions (Hirase et al., 2016). Mitochondrial genome (mitogenome) is a complete organelle genome in eukaryotic cells, comprising 37 coding genes and a control region (Cameron, 2014). Mitochondrial genes have short coalescent times, unambiguous orthology, and rapid evolutionary rates compared to nuclear genes (Allio et al., 2017) and have been the foundation of phylogeography studies for the past 30 years (Hickerson et al., 2010). Recently, whole mitogenome sequences have been used in animal phylogeography to provide high resolution phylogenetic tree (Bjork et al., 2011;Chang et al., 2017;Du et al., 2020;Fields et al., 2018;Ma et al., 2012;Morin et al., 2010).
The development of high-throughput sequencing offers a cost-effective approach to study large-scale mitogenomic phylogeography, especially for recently diverged species (Du et al., 2019). To obtain a comprehensive understanding on phylogeographic pattern and invasion history from East Asia to North America, we analyzed whole mitogenome sequences of 392 individuals, representing a total of 40 native and invasive populations. We estimated divergence time and demographic history of different phylogeographic lineages of the SLF. Based on these results, we discussed the origin, dispersal, and demographic history of this hopper in China, as well as its spread route into other parts of the world, including North America.

| Sampling and total genomic DNA extraction
A total of 392 SLF specimens were collected from 40 populations in China, South Korea, Japan, and the United States, covering the native and invasive regions of this species (Figure 1, Table 1). One specimen of Lycorma meliae collected in New Taipei City of Taiwan was included as an outgroup. All samples were stored immediately in absolute ethanol in the field and transferred to −80°C freezer back to the laboratory. Total genomic DNA was extracted from the muscles of thorax using a DNeasy Blood & Tissue Kit (QIAGEN).

F I G U R E 1
Geographical distribution of 40 sample populations in East Asia and United States. The colors of the circles represent the six phylogeographic lineages referring to Figure 3b, and the area of the circle is proportional to the number of individuals except YNPB. See Table 1      π, nucleotide diversity.

TA B L E 1 (Continued)
2.2 | Mitochondrial metagenomic sequencing and assembly from pooled DNA DNA from each SLF specimen was pooled with equimolar quantities of genomic DNA from another two species studied in our laboratory (a lace bug and an assassin bug) to reduce the sequencing expense after several preliminary experiments (Gillett et al., 2014;Hahn et al., 2013). A library with insert size of 350 bp was constructed from each sequencing pool and 150 bp paired end reads were sequenced to acquire at least 6 Gb data through Illumina NovaSeq 6000 platform (Berry Genomics). Clean reads were obtained after trimming adapters using Trimmomatic (Bolger et al., 2014). The published mitogenome (GenBank accession EU909203) was used as a reference in our study. For each library, clean reads were mapped onto the reference mitogenome (with the sequencing coverage summarized in Table S6), using Geneious 11.1.4 (http://www.genei ous. com/) with parameters of up to 5% mismatches, a maximum gap size of 5 bp, and minimum overlap of 40 bp with at least 95% similarity.
Due to the distant relationships of the three species and the strict similarity and continuity requirements of the parameters, only reads of SLF were allowed to be isolated and mapped onto the reference genome from the sequencing readpool. The consensus sequences were generated as novel individual mitogenomes under threshold of 75% uniformity each site. The mitogenomic assembly of L. meliae also followed this workflow with additional mapping iterations to fish reads to known regions until the whole sequence was constructed (Hahn et al., 2013). Annotation of transfer RNA (tRNA) genes was rechecked using MITOS Web Server (Bernt et al., 2013).
Protein-coding genes (PCGs) and ribosomal RNA (rRNA) genes were aligned with homologous genes of the reference mitogenome.

| Phylogenetic analysis and network
Twenty-two tRNAs, two rRNAs, one control region, and several non-coding regions of the mitogenome were aligned respectively using MAFFT 7.0 online server with the G-INS-i strategy (Katoh & Standley, 2013). Thirteen PCGs were also aligned using TranslatorX online platform (Abascal et al., 2010) with all stop codons removed.
In this study, we prepared two concatenated datasets in the following analysis: (a) the WG dataset, which contained the complete mitogenome sequences; and (b) the PRT dataset, which included all 13 PCGs, rRNA, and tRNA sequences. Bayesian analysis of population structure was performed using BAPS 6.0 (Cheng et al., 2013), based on the WG and PRT datasets under the spatial clustering of groups of individuals.
Phylogenetic reconstruction was performed based on the haplotypes of WG dataset. Haplotype data were generated using DnaSP 6.0 (Rozas et al., 2017). PartitionFinder2 (Lanfear et al., 2012) was used to select the best-fit schemes and substitution models with "greedy" algorithm and Akaike information criterion. Although 44 partitions were predefined for the WG dataset, including 39 codon partitions for PCGs, two for rRNA, and three for tRNA, control region, and non-coding region. A single scheme with GTR + I + G model was selected ultimately. Maximumlikelihood (ML) analysis was applied with 1,000 replicates using ultrafast bootstrap approximation approach (Minh et al., 2013) in IQ-TREE 1.6.5 (Trifinopoulos et al., 2016). Bayesian inference (BI) analysis was performed using MrBayes 3.2.2 (Ronquist & Huelsenbeck, 2003) with two simultaneous Markov chain Monte Carlo (MCMC) runs of 2 million generations. All trees were sampled every 1,000 generations, and first 25% of them were discarded as burn-in.
Median-joining networks (Bandelt et al., 1999) were constructed separately for three lineages resolved in phylogenetic analysis above based on the PRT dataset using PopART (population analysis with reticulate trees) (Leigh & Bryant, 2015).

| Genetic diversity and population differentiation
To test genetic diversity divergence across geographic populations and phylogenetic lineages in our study, the number of polymorphic sites (S), the number of haplotypes (H), haplotype diversity (Hd), and nucleotide diversity (π) were calculated based on the PRT dataset using DnaSP software. We used Arlequin software to calculate the genetic differentiation F ST values and to obtain p-values for F ST values using 3,000 permutations. MEGA7 software (Kumar et al., 2016) was used to calculate uncorrected p-distance between different populations, lineages, and species to check their differentiation levels. Mantel tests were performed to test the isolation-by-distance model for all populations. F ST and linear geographic distance (ln km) were plotted with 999 replicates in the ade4 v. 1.7 module of R with the introduced populations excluded.

| Divergence time estimation
Divergence time was calculated using BEAST 2.4.8 (Bouckaert et al., 2014) based on the PRT dataset. Considering there are no fossils available for the calibration of this species, we used a molecular clock for divergence time estimation. In addition, as all gene sequences were estimated as a single scheme with GTR + I + G model above, cox1 gene was evaluated as a separate partition with the substitution rate fixed at 1.77% per site per million years (Papadopoulou et al., 2010). Other PCGs, rRNA and tRNA genes were applied as another three partitions, of which the evolutionary rates were estimated accordingly. An uncorrelated lognormal relaxed clock model was applied with the birth-death model for the tree prior. Two independent MCMC runs were performed for 500 million generations with tree sampling every 50,000 generations. We used Tracer 1.7 (Rambaut et al., 2018) to verify that the MCMC runs had reached a stationary distribution based on the effective sample sizes (ESS) of each estimated parameter to be larger than 200. TreeAnnotator program in BEAST 2 software was used to generate the consensus tree with "mean height" after discarding the first 25% trees as burn-in.

| Demographic history
The neutral tests of Tajima's D (Tajima, 1989) and Fu's Fs (Fu, 1997) were performed using Arlequin software with default parameter settings. Tajima's D compares the number of segregating sites to the average number of pairwise differences, and Fu's Fs is based on the probability to observe a certain number of alleles given the average number of pairwise differences. The p-values of these two statistics were obtained by comparing the observed statistics values with 1,000 simulated values under the hypothesis of selective neutrality and population equilibrium (Excoffier & Lischer, 2015). The mismatch distribution analyses were also performed for the six lineages using Arlequin software with default parameter settings. The mismatch distribution analysis computes the distribution of the observed number of differences between pairs of haplotypes in the populations, which is usually multimodal in demographic equilibrium population but unimodal in recent expansion populations (Rogers & Harpending, 1992). The departure of the observed mismatch distribution from the expected distribution under the expansion hypothesis was tested by comparing the observed distribution with a simulated mismatch distribution resulting from a bootstrap analysis with 1,000 replicates; two statistics, the sum of square deviations (SSD) and Harpending's raggedness index (r) were used to study the difference between observed and expected distributions (Excoffier & Lischer, 2015).

BSP analyses were applied for different lineages using Coalescent
Bayesian Skyline model for the tree prior in BEAST 2 software.
The same partitions and substitution rates were applied consistent with these in the estimation of divergence time. Two independent MCMC runs were conducted for 100 million generations sampling every 1,000 generations. Effective population size through time was determined using Tracer software to generate the plots. showed much lower diversity (Table 1) and S2). The Mantel test produced a significant correlation (p < .01) between the genetic and geographical distance in SLF populations ( Figure S1).

| Phylogeographic subdivisions and divergence time
The phylogenetic trees based on the WG dataset (including complete mitogenome sequence) using ML and BI methods contained six phylogeographic lineages (Figure 2) For the six lineages resolved in the phylogenetic analyses, the three lineages from southern Yangtze regions (Yunnan Plateau, Guizhou Plateau, and Southeast Hills) had higher diversity than those from northern Yangtze regions (Table 2). Yunnan Plateau lineage with the fewest individuals showed the highest π values, followed by Southeast Hills and Guizhou Plateau lineages (Table 2). For the three northern Yangtze lineages (East Plain, Sichuan Basin, and Loess Plateau), East Plain lineage exhibited the lowest diversity, even though it comprised the largest sample size (Table 2).  Table 3). The genetic distance (uncorrected p-distance) varied from 0.00085 to 0.01737 between the different lineages, being smallest between East Plain and Sichuan Basin lineages and largest between Southeast Hills and Yunnan Plateau lineages, which were both lower than 0.06021-0.06229 between two species (below diagonal in Table 3).
In the phylogenetic trees, the United States and most South Korea

| Haplotype networks
The median-joining networks supported similar results of the phylogenetic relationships. For the East Plain (EP) lineage, several typical TA B L E 2 Genetic diversity, neutrality test, and statistics of mismatch distribution of the six phylogeographic lineages  The other haplotypes of south EP were distributed far from the main haplotype groups with relatively large mutations (Figure 4a; Table S3).  Table S5).

| Demographic history
Multiple methods including neutral test, mismatch distribution, and Bayesian skyline plot (BSP) analyses were used to explore the

| Phylogeographic pattern in native regions
Several studies have attempted to reveal the authentic phylogeographic structure of SLF Park et al., 2013;Zhang et al., 2019). The comprehensive sampling in our phylogenetic analyses has successfully resolved this question. Six phylogeographic lineages were detected, which were consistent with the distribution of main Chinese topographical regions (i.e., East Plain, Loess Plateau, Sichuan Basin, Southeast Hills, Guizhou Plateau, and Yunnan Plateau). The Yangtze River is the most obvious geographic barrier for this species, shaping the divergence pattern between northern and southern Yangtze regions, which was also supported by previous studies Zhang et al., 2019). Besides the Yangtze River, there were several other geographical barriers that caused the differentiation of

F I G U R E 4
Median-joining networks for three phylogeographic lineages. Colored circles represent different haplotypes, black circles represent missing haplotypes that were not observed, and solid lines between haplotypes represent mutation steps. The area of the circle is proportional to the number of haplotypes. See Tables S3-S5 for haplotype frequencies expansion of this species. However, we also identified that specific Chinese populations such as Yinchuan, Ningxia (NXYC) which were located near the Loess Plateau regions but genetically, mostly contained individuals from the East Plain lineage. In this region, the SLF was first recorded in 2010, and its outbreak has occurred on the roadside Ailanthus trees since then (Zheng, 2018). It is likely that the SLF was introduced domestically through the timber trade and transportation between different provinces of China, which easily broke through the geographic barriers and facilitated the gene flow between these regions (Capinha et al., 2015;Zheng, 2018).

| Origin, dispersal and demographic history
Combining the phylogenetic analyses and divergence time estimation, we inferred that SLF underwent a dispersal from southwestern to northeastern China during the late Pleistocene. SLF may have originated from Yunnan Plateau, not from northern China as initially thought (Liu, 1939). Southwestern China, especially Eastern Himalaya and Hengduan Mountains regions, has long been regarded as one of the most biodiverse regions in East Asia and the origin of different organisms including many insect species (Song et al., 2018;Wei et al., 2015).

The population localities in the Guizhou Plateau and Yunnan
Plateau regions were included in population genetic study for the first time Zhang et al., 2019). The lack of these two pivotal lineages in previous studies resulted in an inaccurate understanding of origin and dispersal history of SLF (Liu, 1939;Zhang et al., 2019). Thus, comprehensive sampling should always be implemented as far as possible to reveal the phylogeographic structure and demographic history of a native species.
As the East Plain regions, including Northeast China Plain, North China Plain, and Middle-Lower Yangtze Plain, harbors abundant SLF populations, it has long been recognized as the ancestral distribution area (Dara et al., 2015;Han et al., 2008;Liu, 1939 (Dara et al., 2015). Tree-of-heaven has been thought to be native to northeastern and central China and is commonly distributed in the Yellow River and Yangtze River basin areas as border trees and protection forests (Hu, 1979). The SLF is particularly attracted to tree-of-heaven and its wide distribution   Figure 4a) and

| Global invasion history
South Korea implied the invasive routes across the sea, which was consistent with the earliest records and previous population genetic studies in South Korea (Han et al., 2008;Park et al., 2013 Ocean (Kim et al., 2017).
It is important to note that in the invasion history of SLF, the introduced population expanded quickly outward in South Korea (Han et al., 2008) and the United States (NYSIPM, 2020), even though a quarantine zone surrounding the site of first detection had been established. Tree-of-heaven was first introduced to the Philadelphia area of the United States in 1784 (Shah, 1997) and was widely distributed in South Korea and Japan before the invasion of SLF (Shimizu, 2003). The earlier invasion of tree-of-heaven in these countries possibly facilitated the quick invasion, colonization, and spread of SLF. The absence of natural enemies (Clifton et al., 2019) and suitable climatic conditions further contributed to the dispersal and damage of SLF in the invaded countries. Other regions such as many European countries have experienced the invasion of tree-ofheaven since long before (Hu, 1979) and are thus, possibly exposed to a high risk of outbreak. Therefore, monitoring and quarantining invasive SLF will be essential in these regions.

| Implications for biological invasions
Biological invasion represents one of the natural rapid evolutionary processes in contemporary time scales. The genetic diversity of introduced populations might increase largely due to different invasion scenarios (Garnas et al., 2016). In our study, South Korea was inferred to serve as a bridgehead of global invasion, resulting in the introduction of SLF to the United States. Bridgehead effect has been considered as a pivotal factor to explain increasingly successful global invasions (Ascunce & Shoemaker, 2011;Javal et al., 2019).
Nevertheless, little is known about how bridgehead effects facilitate secondary invasion. Bridgehead population may experience rapid evolution and possess higher adaptive abilities, which may lead to higher invasion capability. However, evidence supporting these hypotheses remains lacking (Bertelsmeier & Keller, 2018). The invading SLF population in the United States shows extremely low genetic diversity, but is currently undergoing rapid spread across eastern states, exhibiting exceptional adaptability to the new environment. This is a remarkable case of successful biological invasion with bridgehead effect and low genetic diversity, which can be established as a model to explore mechanisms behind the enhanced invasiveness in bridgehead populations. However, as discussed in the above sections, adaptation is not always necessary for outward establishment from the native ranges, because the introduced habitats generally have similar environmental conditions, especially host plants.

| CON CLUS I ON S AND FUTURE DIREC TIONS
Based on the detailed analysis of mitogenomic phylogeography, we explored the invasion history of SLF from China, to South Korea, Japan, and the United States. We revealed that the invasion of SLF into the United States may be a secondary invasion from a bridgehead population in South Korea. Our study supports the great potential of mass mitogenome sequences for resolving not only the phylogeographic pattern but also the invasive source and route of the pest species. Using high-throughput sequencing and metagenomic approach, multiple invasive species could be analyzed together to reveal shared or different invasive patterns in defined regions. Mitogenome normally under strong purifying selection pressure offers little information for key genes behind invasion. Nevertheless, with whole genome sequence of SLF currently available (Kingan et al., 2019), a clear and accurate genetic structure derived from mitogenomic analysis may build a foundation for future studies on genomic mechanisms mediating rapid worldwide spread of this severe invasive pest.

ACK N OWLED G EM ENTS
This work was supported by grants from the National Natural Science