Genome‐scale phylogeography resolves the native population structure of the Asian longhorned beetle, Anoplophora glabripennis (Motschulsky)

Abstract Human‐assisted movement has allowed the Asian longhorned beetle (ALB, Anoplophora glabripennis (Motschulsky)) to spread beyond its native range and become a globally regulated invasive pest. Within its native range of China and the Korean peninsula, human‐mediated dispersal has also caused cryptic translocation of insects, resulting in population structure complexity. Previous studies used genetic methods to detangle this complexity but were unable to clearly delimit native populations which is needed to develop downstream biosurveillance tools. We used genome‐wide markers to define historical population structure in native ALB populations and contemporary movement between regions. We used genotyping‐by‐sequencing to generate 6102 single‐nucleotide polymorphisms (SNPs) and amplicon sequencing to genotype 53 microsatellites. In total, we genotyped 712 individuals from ALB’s native distribution. We observed six distinct population clusters among native ALB populations, with a clear delineation between northern and southern groups. Most of the individuals from South Korea were distinct from populations in China. Our results also indicate historical divergence among populations and suggest limited large‐scale admixture, but we did identify a restricted number of cases of contemporary movement between regions. We identified SNPs under selection and describe a clinal allele frequency pattern in a missense variant associated with glycerol kinase, an important enzyme in the utilization of an insect cryoprotectant. We further demonstrate that small numbers of SNPs can assign individuals to geographic regions with high probability, paving the way for novel ALB biosurveillance tools.


| INTRODUC TI ON
Globalization has increased the spread and establishment of invasive species throughout the world, causing irreversible damage to forest ecosystems (Millar & Stephenson, 2015;Seebens et al., 2017). As it is difficult and expensive to manage established invasive species, the prevention and early detection of new invasive species and populations are the cornerstones of an effective management response (Reaser et al., 2020). In this respect, biosurveillance represents a knowledge framework that facilitates early detection and rapid response to new invasive threats, reducing the risk and impact of invasive species in novel habitats (Blackburn et al., 2020).
Tracking introduction pathways and identifying the geographic sources of alien pests is an essential component of an invasive species biosurveillance pipeline Cristescu, 2015).

Knowledge of invasion sources allows management responses to
focus on high-risk points of entry and routes of spread, which facilitates invasive species monitoring, trade negotiations, and future risk assessments . However, this approach requires clear delimitation of an invasive species' population structure in its native range so that intercepted individuals can be genetically assigned to a source population Manel et al., 2005;Roe et al., 2019). Thus, accurately characterizing the population structure and genetic diversity within the native range of an invasive species is an essential step toward effective biosurveillance and management of high-risk pests.
The Asian longhorned beetle (ALB, Figure 1) (Coleoptera: Cerambycidae: Anoplophora glabripennis (Motschulsky)) is a woodboring insect native to China and the Korean peninsula that has become a highly destructive invasive pest and poses a global threat to temperate broadleaved forests (Lingafelter & Hoebeke, 2002).
This species is broadly distributed in East Asia, spanning much of the temperate forested regions in mainland China and the Korean Peninsula and extending south of 35°N into subtropical regions in China and it broadly overlaps with the citrus longhorned beetle (CLB, A. chinensis (Forster)), another important invasive pest. Within its native range, ALB is considered an important forest pest, damaging and killing many tree species, such as poplar (Populus spp.), willow (Salix spp.), and maple (Acer spp.) (Lingafelter & Hoebeke, 2002;Sjöman et al., 2014). In less than two decades, the area that ALB populations occupied had increased by seven-fold, and by 1994, had impacted over 333,000 ha of broadleaf forests within China (Luo et al., 2000).
ALB was initially detected in North America in 1996 and then in 2001 in Europe where it has established and spread, causing severe ecological and economic losses in these newly invaded regions (Meng et al., 2015). ALB larvae tunnel and feed within the host xylem, making visual detection challenging. Wood damaged by ALB is inferior in quality and is often used to construct solid wood packaging material (e.g., pallets, crates, dunnage) which is then used a clear delineation between northern and southern groups. Most of the individuals from South Korea were distinct from populations in China. Our results also indicate historical divergence among populations and suggest limited large-scale admixture, but we did identify a restricted number of cases of contemporary movement between regions. We identified SNPs under selection and describe a clinal allele frequency pattern in a missense variant associated with glycerol kinase, an important enzyme in the utilization of an insect cryoprotectant. We further demonstrate that small numbers of SNPs can assign individuals to geographic regions with high probability, paving the way for novel ALB biosurveillance tools.

K E Y W O R D S
gene flow, genotyping-by-sequencing, glycerol, insect pest, invasion history, population assignment F I G U R E 1 Larva and adult of Anoplophora glabripennis, the Asian longhorned beetle. Pictures were kindly provided by Dr. Brent Sinclair to transport goods. Indeed, infested wood packaging material is considered a crucial pathway for ALB's invasion, facilitated by increasing globalization and trade (Wu et al., 2017). In response to the ALB invasion, the Food and Agriculture Organization (FAO) established International Standards for Phytosanitary Measures (ISPM) 15, which set minimum heat and fumigation treatments for wood packaging material used in global trade to render potential invasives nonviable (FAO, 2002). However, prior to global acceptance of ISPM 15, ALB was transported throughout the world and introduced to many new broadleaf ecosystems. Despite these new phytosanitary protocols, ALB continues to pose a significant threat to temperate forests outside its native range with recent interceptions and introductions (Haack, 2020;Pedlar et al., 2020) throughout the invaded range. Therefore, biosurveillance tools are needed to manage the threat posed by ALB and identify the regional source(s) of invasion.
Complex biogeographic patterns are typical among native species within China (Lyu et al., 2020). While East Asia is known to maintain complex, deeply divergent population structure (Qiu et al., 2011), this historic structuring can be eroded with contemporary migration between distinct populations, often facilitated through anthropogenic movement of individuals beyond their ability to disperse naturally (Eloff et al., 2020). In ALB's native range, population differentiation is driven by both evolutionary and anthropogenic processes (Colunga-Garcia et al., 2010;Huang et al., 2020;Shatz et al., 2013). Indeed, forestry practices in northern China from the 1950s to1970s, including the Three-North Shelter reforestation project which is the largest artificial forest worldwide, created forest conditions that caused ALB populations to irrupt and spread beyond their historic range. Previous studies not only reported moderate levels of population differentiation but also found high levels of admixture (Carter et al., 2009;Javal et al., 2019a,b). These authors hypothesized that this complex population structure was due to widespread contemporary movement of ALB associated with movement of infested wood out of China's Three-North Shelter reforestation plantation. This assisted migration caused gene flow between different populations, blurring the historic population structure. However, these previous methods used random amplified polymorphic DNA markers , mitochondrial DNA (Carter et al., 2009(Carter et al., , 2010Javal et al., 2019b), and small sets of microsatellite markers (Carter et al., 2010;Javal et al., 2019a), which lack the resolution needed to describe complex patterns of historic population structure and contemporary gene flow and, therefore, more informationrich marker systems are needed.
Genomic data provide unparalleled insights into the invasion history of an invasive pest . These data can quantify variation among populations and allow us to identify individuals that share common ancestors and geographic origins. Genomic technologies, such as reduced representation libraries (Altshuler et al., 2000;Van Tassell et al., 2008), provide access to variable genetic loci throughout the genome. Genotyping-by-Sequencing (GBS) (Elshire et al., 2011), double digest restriction-site associated DNA sequencing (ddRADseq) (Peterson et al., 2012), and DNA simple sequence repeats (microsatellites)-based genotyping (Jarne & Lagoda, 1996;Luikart et al., 2003) have been used to characterize population structure in a range of invasive insect pests, such as Lymantria dispar from Asia (Picq et al., 2018;Wu et al., 2020), the Asian granulated ambrosia beetle Xylosandrus crassiusculus (Storer et al., 2017), and the fall webworm Hyphantria cunea . With these approaches, high numbers of polymorphic markers can resolve fine-scale patterns of genomic differentiation and assign invasive samples to distinct populations with high levels of accuracy. Furthermore, genomic markers were able to identify signatures of population expansion within the invaded range, providing further insight into the patterns of invasion and spread within invasive pest populations Picq et al., 2018;Storer et al., 2017;Wu et al., 2020).
In this study, we comprehensively defined the native ALB population structure and quantified the scale of contemporary gene flow in this species by leveraging the informative power of genome-wide molecular markers. To complement and cross-validate our inferred population structure, we used two sets of independent data: SNPs obtained from GBS, and microsatellites obtained from amplicon sequencing. First, we generated SNPs using a reduced representation library approach to characterize genomic diversity and population structure in ALB's native range. We then validated our proposed population structure with an independent set of insect samples and 53 microsatellites developed with ddRADseq and genotyped through amplicon sequencing. Using our panel of informative SNPs, we delineate ALB population structure, reconstruct population history scenarios to describe the observed population structure, and identified loci that were under selection that may show adaptive significance within ALB populations. We identified loci that proved diagnostic for our delineated ALB populations and provided high assignment accuracy for downstream use in an amplicon-based ALB biosurveillance tool for this high-risk invasive forest pest.

| ALB sampling and DNA preparations for SNP genotyping
We collected 480 samples from 20 sites across China and South Korea (Figure 2a; sampling details in Table S1), which covered c. 40% of ALB's distribution in the native range according to its most current distribution record (CABI, 2021;Yan, 1985). Our samples also covered several Three-North regions, including Heilongjiang, Jilin, Liaoning, Hebei, Beijing City, Inner Mongolia, and Ningxia. These represent the major biogeographic regions in northern China, with the Greater Khingan Range dividing the Northeast and the Helan Mountains dividing the Northwest (He et al., 2017;Huang et al., 2012). In the south, the Huai River basin is a major transition zone between northern and southern climatic regions as it approximates the 0°C January isotherm (Shi et al., 2014). The Yangtze River is also recognized as an important North-South boundary within the region.
Based on the regions separated by these major geographic barriers, we defined the three regions located north of the Huai River as "Northeast (NE)," "North (N)," and "Northwest (NW)", respectively, and the one southern region, where ALB sampling occurred, "South (S)". Bengbu (BB) is located within the Huai River basin and was therefore included in the "South" region together with Cixi (CIX).
Before DNA isolation, 95% ethanol was used to remove any contaminations from the beetle's surface. For each sample, a single leg or larval tissue was cut into smaller pieces, and the tissue was ground using a mixer mill (Retsch MM400, Germany) at 29 Hz for 1 min.
DNA was extracted using the DNeasy 96 Blood &Tissue Kit (Qiagen) following the manufacturer's instructions. The quality and quantity of DNA samples were assessed using the NanoDrop ND-1000 spectrophotometer and Qubit 2.0 Fluorometer, respectively. For each sample, a total of 1μg or 2μg DNA was prepared for sequencing.  Table 1. The icon size of each population is relative to the sample size. We divided the ALB sampling locations in China into four areas, according to biogeography, distinguished by different colors: N, north China (blue); NW, northwest China (green); NE, northeast China (lilac); S, south China (red); in addition to China, Korea (turquois), was also sampled for ALB. Each point with a different color or shape represents a distinct sampling location. TOL is treated as a NE population given its geographical location. The three rivers shown are Huang River, Huai River, and Yangtze River from north to south. The Fast-GBS pipeline v1.0 was adopted to process raw sequencing reads (Torkamaneh et al., 2017). Within this pipeline, SABRE 1.0 was used to demultiplex barcoded reads into separate files (Joshi, 2011), Cutadapt 2.1 to remove adapter sequences (Martin, 2011), Burrows-Wheeler Aligner (BWA) 0.7.17 (Li & Durbin, 2010) to align reads to the ALB reference genome which consists of 10,474 scaffolds (GCA_000390285.1) (McKenna et al., 2016), Samtools 1.8 to convert sam files to bam format and index (Li et al., 2009), and lastly, PLATYPUS 0.8.1.1 to call the SNP variants (Rimmer et al., 2014).
The resulting SNP variants were filtered using VCFtools 0.1.16 (Danecek et al., 2011). Basic filters were applied to retain only biallelic SNPs, remove indels and variants with a FILTER flag other than PASS, and remove loci with more than 50% missing data. Samples with >60% missing data were removed. Finally, only loci with read depth >5, minor allele frequency (MAF) >0.05, minor allele count (MAC) >3, and missing data per site <10% were retained. We estimated relatedness between samples through the PLINK method of moment (MoM) using the "snpgdsIBDMoM" function in SNPRelate (Zheng et al., 2012) implemented in R (Team, 2015). Samples with an identity-by-descent coefficient exceeding 0.25, which are considered full sibs, were removed. However, genetically related individuals from Korea, where we had limiting sampling, were initially retained for analyses to explore our Korean samples' overall genetic diversity and compare them to the Chinese populations. Following these comparisons, these analyses were repeated without the related Korean individuals. The SNP variant filtering pipeline is outlined in Table S2.

| Genetic diversity
We calculated population genetic indices such as nucleotide diversity (π), observed heterozygosity (H o ), and expected heterozygosity (H e ) using the function "populations" in STACKS v2.3e (Catchen et al., 2013 Table S1; for sampling and basic population genetic analyses involving microsatellites, see Table S3.
We used PCA to demonstrate genetic similarities among individuals and then DAPC to delimit distinct genetic clusters. Finally, we used sPCA to explicitly incorporate spatial information to quantify geographic patterns of genetic variation. We performed all three analyses using adegenet v2.1.2 (Jombart, 2008). For PCA analysis, we used the "glPca" function with retaining 20 PCs. For DAPC analysis, we first ran the "find.clusters" function to determine the number of groups that best summarize variations in the data running from K = 1 to 16. We then used the "dapc" function while retaining 250 PCs and five discriminant analysis eigenvalues. In the sPCA analysis, we ran the "spca" function by choosing a Delaunay triangulation as a connection network and kept the first positive eigenvalue. To assess similarities between geographical and genetic distributions, we performed a Procrustes analysis and a Mantel test (Lisboa et al., 2014).
We used the "procrustes" function in the R package MCMCpack 1.4-7  to perform a Procrustes transformation analysis on the first two PCs identified in the PCA analysis, in which the PCA coordinates were scaled to geographical coordinates. We determined the Pearson correlation coefficient between the pairwise genetic distance (F ST ) and the geographical distance between populations using stat_cor() in R.
Based on the distinct groups identified by DAPC analysis, we performed an analysis of molecular variance (AMOVA) using an infinite allele model in GenoDive 3.04 (Meirmans, 2020) to separate total genetic variance into among-individuals within-populations, among-populations within-groups, and among-groups covariance components.
Contemporary movement of ALB was previously hypothesized by authors (Carter et al., 2009;Javal, Lombaert, et al., 2019), therefore, we wanted to quantify levels of admixture within each population. We used a sparse non-negative matrix factorization (sNMP) approach to estimate individual ancestry coefficients using LEA 2.0 (Frichot & François, 2015). This method allowed us to estimate homozygote and heterozygote frequencies and avoid the assumption of Hardy-Weinberg equilibrium (HWE) (Frichot et al., 2014). We ran ten replicates with K range 1 to 16 using the "snmf" function. It used a cross-validation approach to estimate the entropy of each K, where the minimum K value is the best estimate. We then ran "snmf" again with the ascertained K and an alpha-value (regularization parameter) of 100 to estimate the individual ancestry coefficients. We also computed a maximum likelihood (ML) phylogeny using 1000

| Gene flow and population history models
We calculated recent gene flow among populations with a Bayesian Markov Chain Monte Carlo (MCMC) resampling method in BayesAss 3.04, which estimates recent migration rates between populations as the proportion of migrants per generation (Mussmann et al., 2019). First, we ran 1,000,000 iterations without burn-in and then adjusted the parameter for allele frequencies to 0.3 to align with the suggested acceptance rate of 20%~60% (Wilson & Rannala, 2003).
Then, we performed three independent runs using 10 million iterations with a burn-in of one million and a sampling interval of 1000.
We used the default settings (0.1) for the mixing parameters associated with proposed moves (i.e., inbreeding coefficient and migration rates). We assessed convergence of the estimated parameters by plotting the combined trace plot and the probability distribution using Tracer 1.7.1 (Rambaut et al., 2018).
We used Bayesian inference with the structured coalescent estimator Migrate 4.4.4 (Beerli et al., 2019) to test population history models for the four regions N, S, NE, and NW based on our population structure results. Thus, the following different population scenarios were evaluated: seven scenarios ( Figure S1, models 1-7) that assumed the most recent common ancestry occurred in the N or NW regions based on their high genetic diversity, and among which, four of the scenarios ( Figure S1, models 2-5) assumed that the S and NE populations were from different sources; we included two admixture models ( Figure S1, models 8-9) to assess whether population admixture has occurred within the North region, given our admixture result. We randomly selected 20 individuals from each region and ran the models with a strictly filtered SNP dataset that contained no missing data (n = 422 SNPs). We used one long chain with four replicates for each run with 1,000,000 iterations and a burn-in of 10,000. We compared the models using the log marginal likelihood (Beerli et al., 2019). To further estimate effective population sizes, divergence time, and migration rates of the optimal model identified with Migrate, we used an alternative coalescent-based simulation method, Fastsimcoal 2.6 (Excoffier & Foll, 2011). We applied easySFS.py (https://github.com/isaac overc ast/easySFS) to generate the joint site frequency spectrum (SFS). We performed 100 independent parameter estimations to obtain the maximum composite likelihood of the joint SFS, each run with 100,000 iterations and 40 cycles of expectation conditional maximization (ECM) (Lanier et al., 2015) using a maximum likelihood method. The mutation rate μ was set to 2.9e −09 per base and per generation based on Heliconius melpomene (Keightley et al., 2015), assuming a 1-year life cycle (generation time) for ALB (Hu et al., 2009). The best run was selected based on the highest maximum likelihood. We computed the 95% confidence intervals of the parameter estimates of the best run by simulating 50 SFS of the estimates and re-estimating parameters each time (Lanier et al., 2015;Zhao et al., 2019b). PGDSpider 2.1.1.5 was used to convert the vcf file to the required file formats for the above analyses (Lischer & Excoffier, 2012).

| Selection detection
We used the F ST -based genome scan method OutFLANK v0.2 (Whitlock & Lotterhos, 2015) to detect loci putatively under selec- and FlyBase (Larkin et al., 2021) to determine molecular function and biological processes. We identified a potentially adaptive outlier SNP within glycerol kinase and conducted a phylogenetic analysis for related insect genes using the Neighbor-joining method with 1000 bootstraps in the MEGA X software . The allele frequencies of the candidate SNP across all populations were visualized in R (Team, 2015).

| Microsatellite discovery and analysis
To independently assess population structure with a different set of data, we genotyped 232 new ALB specimens collected independently from 16 locations in China ( We applied DAPC to characterize genetic clusters using Ward's hierarchical clustering method (Ward, 1963). We further assessed the distribution of genetic variance between and within genetic clusters as assessed through an AMOVA based on the clusters according to our DAPC results. Statistical significance of covariance associated with each hierarchical level was calculated with 1000 permutations.

| Congruence between genome-wide marker sets
To assess the congruence between the two marker sets, we performed a Mantel test using a Monte-Carlo method on the pairwise F ST generated from SNPs and microsatellites in ade4 (Chessel et al., 2004). We used F ST values calculated from the five sampling locations (i.e., BJ, HS, CIX, TOL, and YC) present in both datasets. We performed the 'mantel.rtest' analysis on the two F ST matrices with 9999 permutations.

| Population assignment
To identify markers for downstream diagnostic tool development, we wished to identify SNPs that could accurately assign individuals to their source populations. To do so, we used Mycorrhiza 0.0.28, a genotype assignment software that employs machine learning and phylogenetic networks to identify informative SNPs that can assign individuals to their source population (Georges-Filteau et al., 2020).
The SNPs were ranked by discriminatory power based on mutual information. Mycorrhiza works in two steps. First, it generates a pairwise genetic distance matrix from the genotype data and uses it to construct a phylogenetic split system using the Neighbor-Net method in the program SplitsTree 4.14.6 (Huson, 1998;Huson & Bryant, 2006). Second, the split system is used in a two-fold crossvalidation procedure via a random split method in the scikit-learn library.
With this approach, we generated two SNP assignment sets: (1) North versus South, as defined by our sPCA results; and (2) Six regional groups, as defined by our DAPC results ( Figure S2).  Figure S4B,D). However, this relationship appeared to be driven by a single population, Cixi (CIX), and when we removed this population from the analysis, the latitudinal cline was no longer significant ( Figure S5).

| Population genetic structure
In the PCA analysis, we observed Chinese and Korean populations were distinct (three individuals from KOR and KNA, upper right quadrant, Figure 2b). However, the three additional Korean samples from INC and ULS, collected in urban areas, were nested within the cluster containing samples from China.
Chinese ALB populations in the NW, S, and NE regions formed distinct clusters, while populations from the N region separated into three clusters, one that was unique, and the others were split among the NE and NW regions (Figure 2c,d; Figure S6). The sPCA analysis exposed a distinct north-south genetic break (p-value = 0.001) (Figure 3a), which was consistent with the F ST values, where populations in the north were all highly divergent from those in the south (Table S5) identified significant among-group (14.8%) and among-population within-group (7.9%) genetic covariances (Table S6; p < 0.05).
Individual admixture estimates identified eight distinct populations ( Figure 3c) based on the cross-entropy criterion ( Figure S7B).

Admixture analyses showed that Chengde (CHE) and Beijing (BJ)
(N region), as well as Yanji (YJ) (NE region), have the most complex admixture patterns, a result supported by relatively low F ST values for these populations (Table S5). We also show that Yanchi (YC) and  (Table S8).  Table 1.

F I G U R E 3 (a)
A total of 6102 SNPs were used

| Selection detection
We did not detect outliers in OutFLANK using a q-value cutoff of 0.05, while we did detect SNP loci putatively under selection through fsthet ( Figure S9; For loci that were functionally similar to previously annotated regions, we identified loci associated with a range of molecular functions and biological processes (Table S9). Of particular interest was the glycerol kinase (GLK) gene (AGLA000593, i5K database, https:// i5k.nal.usda.gov/Anopl ophora_glabr ipennis, Accessed Jan 2022).
The SNP under selection was annotated as a missense mutation under positive selection (F ST = 0.335) ( Table S9). We compared the ALB GLK gene to previously published insect GLK genes. We observed that the ALB GLK gene is closely related to genes described in other beetle taxa (Figure 6a), including Dendroctonus ponderosae and Tribolium castaneum, in agreement with the species' phylogenetic relationships shown in McKenna et al. (2016). We plotted the allele frequency of the SNP (A/G) and showed that the allele frequency exhibits a distinct clinal trend within our ALB populations (Figure 6b; for allele count details, see Table S10).

| Microsatellites analyses
After correcting for sample size variation, allelic richness is similar across locations (Table S3) was increased to K = 6, an additional cluster was formed by seven specimens from Zunyi (ZUY) that became separated from cluster 1.
However, another five specimens collected in the same location remained with cluster 1. For the AMOVA with K = 5, most of the genetic variance (84.56%, p < 0.001) is found within locations, with 7.22% (p < 0.001) of the variance among locations within clusters, and 8.22% (p < 0.001) of the total variance among clusters.

| Congruence of genetic markers
We used a Mantel test between the two F ST matrices generated from SNPs and microsatellites to assess congruence between

F I G U R E 4 Maximum likelihood phylogenetic tree for Chinese
Asian longhorned beetles analyzed in RAxML. A total of 6102 SNPs were used in the analysis. Each branch represents a sample. The branches are colored by their population codes (explained in Table 1). The width of each branch corresponds with their bootstrap value (widest branch with a bootstrap value of 100). Fan-like sections indicate subregions NE, S, and NW (whereby SHI is actually in the N region). The rest of the populations are from the N region our marker and specimen sets (Tables S5 and S11). Here, we observed a significant congruence between our datasets (r = 0.951, p-value = 0.039) based on 9999 replicates between the two datasets. We observed similar, although not identical, clustering in our two DAPC analyses on these distinct datasets (Figures 3b and 7).
Furthermore, we also found that the positions of the N-S break between regions in the two sPCA analyses were the same (Figure 3a; Figure S10). Only the branch supports above 70% are indicated. The closest homology for the ALB protein was found with D. ponderosae. (b) Geographic map of the allele frequency distribution for the glycerol kinase gene AGLA000593 across the 16 Chinese ALB populations studied by GBS technology. G refers to the reference allele and A to the alternative allele (the missense mutation)

| Population assignment using SNP markers
The accuracy of assignments to different groups was conducted using an increasing SNP set (up to 500 SNPs; Figure 8). We assigned individuals to the N and S groups with >90% accuracy using only 20 SNPs that we ranked as the most discriminant and with near 100% with 100-500 SNPs (Figure 8a). Assigning individuals to one of the six DAPC groups reduced accuracy to 96.4% with 200 SNPs (Figure 8b). Assignment accuracy for each population assignment class using 500 SNPs was shown in Figure S11. When the whole set (6102 SNPs) was used in the tests, accuracies were 100% (two groups) and 98.6% (six groups), respectively.

| DISCUSS ION
The Asian longhorned beetle is a widespread pest found throughout the temperate forests in East Asia. Global spread of this species Chen & Lou, 2019), which can result, at least partially, from the biogeographic history of the area. We observed some distinct phylogeographic patterns within ALB consistent with earlier publications and patterns seen in other forest species (Carter et al., 2009;Du et al., 2019Du et al., , 2021. First, we observed a distinct cluster of Korean ALB specimens from forested areas in Kangwon (KOR) and Pocheon (KNA) (except three individuals sampled from urban areas, which will be discussed further below). These populations were distinct from ALB populations in China, consistent with previously published results (Carter et al., 2010;Javal et al., 2019a;Lee et al., 2020) and were collected in the native forest from the northeast region of the country, within the hypothesized historic distribution of the species (Lee et al., 2020). Our limited sample of insects from this forested region showed low genetic diversity, contrary to the more extensive sampling in Lee et al. (2020). Given that our samples from Kangwon (KOR) were collected from a few trees at a rest stop in Korea and proved to be highly related based on kinship estimation performed in SNPRelate (Table S12), these can, therefore, be considered isolated populations with low genetic diversity due to inbreeding and sampling bias. Despite these differences, the unique Korean population structure and genetic diversity we observed in ALB, relative to mainland China, were consistent with phylogeographic patterns observed in other forest species (reviewed in Qiu et al., 2011), including assassin bugs (Du et al., 2019), raccoon dogs , orchids (Tian et al., 2018), and oaks (Zeng et al., 2015). Baekdudaegan, the main mountain range that runs the length of the Korean Peninsula, has been identified as an important biodiversity hot spot that served as a glacial refugium for boreal and temperate forest species during the last glacial maximum (LGM) . Complex phylogeographic structure is also typical among forest species in eastern China (Qiu et al., 2011). We found that ALB genetic variation was hierarchically structured, with sharp regional genetic breaks and internal population subdivisions with varying levels of admixture. Regionally, ALB populations were divided into northern and southern populations, similar to, but more clearly defined than, results in earlier publications (Carter et al., 2009;Javal et al., 2019a). This North-South population break is seen in a wide range of forest species (reviewed in Qiu et al., 2011), including ALB host plants (Betula, Chen & Lou, 2019;Acer, Guo et al., 2014;Populus, Hou et al., 2018). In our SNP dataset, this North-South division aligned with the hypothesized location of an "aridity belt" along the Yellow River (= Huang He) in East Asia (Milne & Abbott, 2002). Geographic barriers such as mountain chains, arid regions, rivers, and geographic distance are key factors driving population structure in many widely distributed Asian insect species (e.g., Du et al., 2021).
Population structure can be driven by a combination of geographic, spatial, and environmental conditions. Often, these vary along spatial or temporal gradients; temperature conditions, light . We observed latitudinal gradients in ALB with genetic diversity peaking between 35°N to 40°N ( Figure S4), similar to previous findings in ALB (Javal et al., 2019a), and spotted lanternfly (Du et al., 2021;Zhang et al., 2019). Thus, the higher genetic diversity found for the central populations compared to the edge populations would support the "center-periphery hypothesis" for ALB's native distribution.
Climate is critical in shaping biogeography within continents (Ficetola et al., 2017;Mckown et al., 2014). For ectothermic species such as insects, temperature is particularly influential in defining species ranges (Angilletta, 2009). Surviving subzero temperatures is physiologically challenging and can limit the northward expansion of insects (Brightwell et al., 2010).  (Feng et al., 2016b;Javal et al., 2018;Keena & Moore, 2010), and the greatest population density is north of the 0ºC isotherm (Yan, 1985). Therefore, although ALB has been found throughout China, their populations may be limited by biotic pressures (Fragnière et al., 2015), hence, limiting their effective population size and leading to the lower genetic diversity observed in the southernmost population CIX, and the northernmost populations HRB and CHC in terms of latitude.

| Population history and contemporary movement
The relative contribution of contemporary dispersal and historic population structure represent two opposing forces when seeking to reconstruct the population structure of an organism. Contemporary movement is thought to blur the historic population boundaries that evolved over thousands of years. Contemporary dispersal, as influenced by anthropogenic movement of ALB throughout the native range, was hypothesized to have significantly disrupted the natural population boundaries (Carter et al., 2009;Javal et al., 2019a).
We detected evidence of gene flow and admixture within the native range. Globally, gene flow was low between regions, except for specific sites in the NW and N contributing to gene flow between the two regions. Our population history modeling dated evidence to c. 16,300 years ago in our most probable demographic scenario.
Indeed, such dispersal for ALB might have been driven by changing environmental factors coinciding with the retreat of the glacial coverage that began c. 18,000 years ago as the end of the LGM. It is probable that ALB experienced similar fragmentation and distribution restrictions as its host plants, allowing congruent phylogeographic structuring to evolve. Then, this regional population structure would have been maintained given ALB's low dispersal ability (Smith et al., 2001(Smith et al., , 2004Williams et al., 2004) and limited movement beyond the natal tree. The second major population split and scattering of ALB was dated to c. 5000 years ago, thus following the Holocene climatic optimum in China (6000 years ago), when temperate deciduous forest vegetation reached almost 1000 km further north (48°N) than at present and extended further southwestwards at higher elevations than today (Yu et al., 2000). As climatic restrictions eased, ALB expanded its range in concert with its host plants, leading to localized admixture between nearby regions while still maintaining distinct regional structure. Similar population structuring, including regional admixture, has been reported in temperate forest trees, including Juglans species (Bai et al., 2016) and Betula platyphylla (Chen & Lou, 2019). We also observed evidence of movement out of the North to shipping hub between China and South Korea (Li, 2012). As ALB are poor dispersers (Smith et al., 2001(Smith et al., , 2004Williams et al., 2004), the current long-distance movement is likely linked to human activities.
Although our findings do not support the hypothesis that contemporary anthropogenic movement was widespread and blurred regional population differences (Carter et al., 2009;Javal et al., 2019a,b), we were, nonetheless, able to detect cryptic movement of ALB within the native range.

| Selection detection
We identified several loci under selection within our ALB populations. We were particularly interested in glycerol kinase (GLK), an enzyme that catalyzes an important rate-limiting step in the utilization of glycerol at diapause termination (Kihara et al., 2009). Glycerol is composed of two polyols and is a well-known cryoprotectant molecule used by insects to prevent intracellular ice formation (e.g., Cheng et al., 2014;Storey & Storey, 2012). Prior evidence also showed that ALB exhibits both seasonal and population differences in the accumulation of this important cryoprotectant (Feng et al., 2014, which coincide with gene expression that peaks during winter (Xu et al., 2021). Moreover, recent laboratory experiments performed on ALB colonies showed that glycerol content changes within the hemolymph occur during diapause and that such changes are related to concurrent shifts in the supercooling point in ALB (Torson et al., 2021). In our study, we found that the missense mutation within the GLK gene occurs at much higher

| Dataset congruence
Here, we used two different genetic marker systems and slightly different sampling strategies within the main geographic regions to verify the overall genetic pattern of ALB in its native range.
In addition to complementary findings, in both cases, the overall genetic structure differentiated NE, NW, and S regions. Given the highly congruent results from the two sets of independent markers, it is worthwhile to consider the pros and cons of the two approaches. The biggest advantage of GBS is undoubtedly its capability to produce thousands of SNPs. Genome-wide SNP markers provide in-depth insights into the level of population structure that may not be revealed by other types of genetic markers. For example, our SNP data show a genetic cline along a latitudinal gradient in Eastern China, a result not reported by Sanger sequencing or microsatellite data (Carter et al., 2010;Javal, Lombaert, et al., 2019). For organisms whose population structure is obscured by recent long-distance dispersal (such as human-aided movement as seen in many invasive species), SNP-based analysis proves to be particularly powerful in resolving the underlying genetic pattern (Picq et al., 2018). Additionally, GBS can generate SNPs that are candidate markers for identifying associations between genotypes and phenotypes that may point to processes of selection or adaptation to novel environmental conditions (Wickland et al., 2017), which can ultimately improve our assessment of invasive risk.
However, microsatellite markers do have their own merits. Using the designed PCR primers, there is no need to sequence the genomes and run the bioinformatic pipeline again for joint SNP calling when new samples are added to the existing database. Although it is not always possible to compare different datasets produced by different laboratories due to inconsistencies in allele size calling (Vignal et al., 2002), combined with its proven capability to resolve population structures and less requirement on bioinformatics, microsatellite markers will continue playing an important role in population genetic studies.

| Developing genomic tools for biosurveillance
Preventing the incursion of invasive forest pests is paramount to an effective biosurveillance program. One of the most critical applications in an effective biosurveillance program is (a) accurate detection of regulated pests to enable an early, rapid response; (b) ability to trace interceptions back to the source and inform trade regulations and policies Roe et al., 2019). Given the consequences of inaccurate information, it is imperative that the identification and assignment of intercepted invasive species is robust and accurate. As such, genomic tools are becoming the "gold standard" to support regulatory decision-making Roe et al., 2019).
Our genomic results showed equally strong genetic differentiation among native ALB populations, with limited contemporary migration. Therefore, we were able to exceed 90% predictive accuracy with only 200 SNP markers for the six genetic groups delimited within our dataset. We needed even fewer SNP markers to confidently assign individuals to Northern or Southern regions with 95% accuracy. A study on Lymantria dispar spp. also showed that a high degree of genetic differentiation led to high population assignment success (Picq et al., 2018). They used laboratory colonies that consisted of eight groups that were genetically distinct, and the assignment success was generally high (86%-100%) using different sets of SNPs (12, 24, 48, and 96). However, for populations of white bass (Morone chrysops) with a moderate genetic differentiation between six populations (F ST = 0.083), the assignment accuracy reached 99.78% with 57 SNPs (Zhao et al., 2019b).
Moreover, progress has been made in developing assignment tools using SNP markers. For example, an assay that contains 324 SNPs, allowing for predictions of phenotype, biogeographical ancestry, and male lineage has been demonstrated to be a robust and sensitive human forensics tool (Diepenbroek et al., 2020). Overall, such an approach is also very promising for translating our results into developing SNP panels for a biosurveillance purpose in forest protection. While our sampling design covered most of the regions in China, there are records of ALB throughout China (Yan, 1985), and we have not fully sampled the entire distribution. Given our microsatellite results, the NW population Pengyang (PEY) that is not included in the SNP dataset formed a distinct cluster. There are likely other populations with a different genetic make-up that have not been collected. Therefore, it is necessary to consider the possibility of unsampled sources when moving forward with genomics-based biosurveillance.

| CON CLUS ION
Our study revealed clear regional differentiation among native ALB populations using two independent datasets that comprised complementary genetic data and sample sets. Biogeography, drift, and limited dispersal capacity are likely key factors that aided the formation and subsequent maintenance of the ALB population structure within its native range. Future research is poised to explore the environmental factors, including both abiotic (e.g., temperature, precipitation, etc.) and biotic (e.g., host plants), that shaped the complex genetic structure we observed in this pest. The patterns of genetic structure varied among regions and while large-scale human-assisted ALB population migration was limited, we were able to detect it in both datasets. Our ability to resolve this complex pattern of movement in the native range demonstrated the ability for our markers set to serve as a diagnostic tool to track invasive ALB populations outside their native range. Using our genome-wide markers in a diagnostic framework will lead to the development of biosurveillance tools to aid rapid screening and pathway analyses of new ALB interceptions in support of plant protection agencies and regulatory bodies.

ACK N OWLED G M ENTS
We acknowledge funding of the study through Genome Canada, Genome British Columbia, Genome Québec, the Canadian Forest Service, the Canadian Food Inspection Agency, and FPInnovations through the Large Scale Applied Research Project LSARP 10106.
This work was also supported by a four-year grant from the China Scholarship Council to MC. We are thankful to Gwylim Blackburn for feedback and discussions.

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.