Rapid and strong population genetic differentiation and genomic signatures of climatic adaptation in an invasive mealybug

A growing number of studies suggest that adaptation of invasive species plays key roles in their successful establishment in novel environments. However, adaptation of invasive species to climatic conditions remains poorly characterized. This study aimed to understand the population genetic structure produced by the cotton mealybug Phenacoccus solenopsis invasion and to identify preliminary signals of selection during its range expansion.

Previous studies showed that these species usually exhibit higher phenotypic plasticity than native species (Amy Michelle, Michael, & Nicotra, 2011;Felden et al., 2018;Vanwallendael et al., 2018), helping them to colonize new areas. In addition, both theoretical and empirical studies suggest that invasive populations of many species are expected to undergo evolutionary adaptation to respond to new conditions (Andrew, Jensen, Hagen, Lundregan, & Griffith, 2018;Lee, 2002;Renault et al., 2018;van Boheemen & Atwater, 2019;Willoughby et al., 2018). While this evolutionary potential may be curtailed as a consequence of bottlenecks limiting genetic diversity, most invasive populations are still expected to have the potential to adapt rapidly through evolution (Bock et al., 2015;Estoup et al., 2016). This means that invasive populations provide an opportunity to understand and contrast population evolution under both neutral (bottlenecks, gene flow) and adaptive processes (Bock et al., 2015;Lee, 2002). Among many aspects of the environment that can restrict the distribution and abundance of invasive species, climatic conditions are likely to be particularly important, posing strong selective pressures on invasive populations (Chown et al., 2015;Colautti & Barrett, 2013;Renault et al., 2018;Roura-Pascual et al., 2011;van Boheemen & Atwater, 2019). Invertebrate species are particularly strongly influenced by climatic conditions on many biological characters, such as reproduction, developmental durations, diapause and survival. Studies have shown differences in climatic adaptation between invasive and non-invasive populations of invertebrates (Goubert et al., 2017;Hill, 2013;Jarošík, Kenis, Honěk, Skuhrovec, & Pyšek, 2015;Urbanski et al., 2012) and comparisons of species distributions in native and invaded regions provide additional indirect support for adaptive evolutionary shifts (Hoffmann, 2017). In newly colonized areas, climatic differences between early invaded areas and distribution edges may lead to differed selective pressures, and the opportunity to investigate rates of climatic adaptation in invading invertebrates.
The cotton mealybug, Phenacoccus solenopsis Tinsley (Hemiptera: Pseudococcidae), is a damaging and emerging invasive pest with harmful effects on agricultural and ornamental plants, with the pest attacking approximately 60 families of plants, including crop species such as cotton (Ibrahim, Moharum, & Elghany, 2015;Wang et al., 2010). Originating from North America, this species has spread rapidly to 24 countries. Its distribution was restricted in cold areas as shown by its native range, and its potential distribution predicted from climatic models (Wang et al., 2010). In China, it was first reported in Guangdong province in 2008 (Lu, Zeng, Wang, Xu, & Chen, 2008;Wu & Zhang, 2009). Within ten years, it had rapidly spread to many areas of China and reached its predicted distribution range, posing a severe threat to cotton production .
All developmental stages of P. solenopsis are wingless, and most times, they are attached to the host plant, except for a short-lived flying male stage (Lu et al., 2008), limiting the distance covered by active dispersal. Long-distance dispersal of this species is mainly mediated by humans transporting infested plants, and thus, ongoing gene flow may be among populations. The distribution of P. solenopsis is mainly restricted by temperature and humidity (Dhawan, Singh, Aneja, & Saini, 2009;Wang et al., 2010). In areas with dry conditions, P. solenopsis occurs more commonly on roots, stems and foliage close to the soil line, while in humid areas, it settles on the upper foliage of the plant (Hodgson, Abbas, Arif, Saeed, & Karar, 2008). In China, P. solenopsis mainly invaded into southern areas with relatively high In this study, we examined population genetic structure and genomic signals of climatic selection of P. solenopsis across populations from China using SNPs, microsatellites and mitochondrial DNA. Our aim was to understand the population genetic structure produced by the P. solenopsis invasion and to identify preliminary signals of selection. We tested whether the low mobility of P. solenopsis impacted genetic structure and whether there were genomic signatures of adaptation under strong climatic pressure, especially in populations of the northern expansion of this species.

| Sample collection and DNA extraction
Female adults of P. solenopsis were collected from 11 locations across its distribution areas in China in 2017 (Table S1, Figure 1a)
Libraries were constructed for each individual as described elsewhere (Peterson et al., 2012). Twenty individuals from each population were used for genotyping. In brief, 120 ng of genomic DNA in a 50 µl volume was digested with NlaIII and Acil restriction enzymes (New England Biolabs) for 3 hr at 37°C. Restricted DNA was purified with 75 µl (1.5×) SpeedBeads (GE), and each individual was ligated to a unique pair of modified Illumina P1 (5 bp) and P2 adapters (4 bp) overnight at 16°C, followed by a heat-deactivation F I G U R E 1 Sampling locations (a) and genetic structure of Phenacoccus solenopsis populations (b, c) based on potentially neutral genetic SNPs. JXNC: Nanchang, Jiangxi; JXGZ: Ganzhou, Jiangxi; FJPT: Putian, Fujian; ZJHZ: Zhejiang, Hangzhou; HNYY: Yueyang, Hunan; HNCS: Changsha, Hunan; XJTL: Tulufan, Xinjiang; AHHF: Hefei, Anhui; JSWX: Wuxi, Jiangsu; GDGZ: Guangdong, Guangzhou; GXNN: Nanning, Guangxi. The DAPC analysis showed XJTL and FJPT are outlier populations (b); The ADMIXTURE analysis showed XJTL, FJPT and AHHF were separated into different clusters when K was increased from 2 to 4, while when K was increased to 11, almost all populations were assigned into different clusters except for HNCS and JXNC (c) [Correction added on 30 May 2020, after first online publication: the location of red dots inside Figure 1a were previously incorrect and have been updated in this version.]  We used StackS version 2.0 for SNP calling (Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013). FaStQc version 1.1.5 (Andrews, 2013) was used to estimate the content of GC and quality of raw sequences. Raw reads with sufficiently high sequencing quality and correct barcode were retained. After quality control, ddRAD barcodes were removed by the process_radtags program (Catchen et al., 2013). Sequences were aligned to draft genome of P. solenopsis The final data sets were filtered using r package vcfR (Knaus & Grünwald, 2017) by retaining SNPs with coverage at least 10 X, a missing rate in one sample <5%, and samples with a missing rate of SNPs <20%.

| Microsatellite genotyping and mitochondrial DNA sequencing
As a complement to SNPs, we genotyped 21 microsatellite loci from 24 individuals of P. solenopsis per population following the method described in Ma et al. (2019) (Table S2). Briefly, a universal PC tail (5′CAGGACCAGGCTACCGTG3′) (Blacket, Robin, Good, Lee, & Miller, 2012) was used to identify forward primer candidates for amplification. All PCR products were analysed using an ABI 3730xl

| Outlier and neutral SNP identification
We used four population differentiation (PD) methods to identify outliers from all SNPs. First, BayeScan version 2.1 (Foll & Gaggiotti, 2008) was used to directly estimated the probability that a locusspecific F ST was different to a population-specific F ST . We ran this test with a prior odds value of 10, with 20 pilot runs of 5,000 iterations, a burn-in of 50 000 and with a thinning interval of 10. Loci with a false discovery rate (FDR) with <0.05 (q < 0.05) were candidate outliers. Second, arleQuin version 3.5.2.2 software (Excoffier & Lischer, 2010) was used to screen outliers with a P value lower than .01. Third, a Bayesian approach with the BayEnv model implemented in BaypaSS version 2.1 (Gautier, 2015) was used with default parameters. The XTX statistics were calibrated as suggested by Gautier (2015) and kept SNPs with values greater than a 99% threshold.
Fourth, PCAdapt was used (Lotterhos & Whitlock, 2014), which has powerful and low positive rates in independent evaluations of various outliers methods. We ran PCAdapt with a burn-in of 200 steps, K = 10, and a false discovery rate (FDR) lower than 0.01.
The 19 bioclimatic variables at resolution 10 min of degrees were downloaded from the WorldClim database (https://www.world clim.org/) for correlative analysis. To avoid high collinearity, we removed correlated predictors with |r| > .8. Six climatic variables remained for subsequent analysis, including bio3 (Isothermality), bio8 (Mean Temperature of Wettest Quarter), bio9 (Mean temperature of driest quarter), bio10 (Mean Temperature of Warmest Quarter),
We used three environmental association (EA) methods to identify outliers. First, BaypaSS version 2.1 was used for genotype-environment association analysis under the Basic Model. Second, latent factor mixed models (LFMM) implemented in the LEA R package (Eric, Schoville, Guillaume, & Olivier, 2013) were used to select outliers with |z|-scores and converted them into adjusted p values based on the Fisher-Stouffer method. Using the Benjamini-Hochberg procedure (Benjamini & Hochberg, 1995), we expected FDR correction at 5% to export candidate SNPs with adjusted p values <.05.
Third, we used a multivariate approach in a redundancy analysis (RDA) to estimate the extent to which the explained variance in SNP genotypes was explained by climatic variables and population structure and by their collinear portion (spatially autocorrelated climatic variation). RDA is a constrained linear ordination method that combines multiple linear regression and PCA (principal component analysis). Population structure effects were approximated by the first two principal components of a new PCA performed in R package. Two PCA vectors (PCA1 and PCA2) were remained when correlated predictors with |r| > .8 were removed. Since spatial autocorrelation is usually reflected by neutral genetic structure, we did not include spatial covariates to avoid over-conditioning the model (Rellstab, Gugerli, Eckert, Hancock, & Holderegger, 2015).
Both full RDA (population structure and climate) and partial redundancy analysis (pRDA) (population structure or climate) models were analysed in the r package vegan. The independent effect of the environment was the variance values for the constrained matrix of population structure in the pRDA, while the independent effect of population structure was the equivalent for the constrained matrix of climate. The collinear proportion was calculated by subtracting the independent effects of environment and population structure from the total amount of variance explained in the full RDA model. We used the loadings of the SNPs in the ordination space to determine which SNPs are candidates for local adaptation (Forester, Lasky, Wagner, & Urban, 2018).
Histograms of the loadings on each RDA axis usually show normal distributions. SNPs loading in the tails are more likely to be under selection as a function of predictors. So, we identified outliers as SNPs that load in the tails of these distributions with three times of standard deviation cut-off (two-tailed p = .0027). We identified SNPs for local adaptation loading for the first three significantly constrained axes.
Potential PD SNPs were those identified in each of the four PD methods, and potential EA SNPs were those identified in each of the three EA methods. The candidate SNPs under selection were those identified by both EA and PD methods, while potential neutral SNPs were those not identified in any of the outlier analysis.

| Genetic diversity analysis
For SNPs, genetic diversity was measured using the populations pro- For microsatellites, the genotyped data were corrected by MICRO-CHECKER (Van Oosterhout et al., 2004). The H o and H e were calculated by Microsatellite Tools (Park, 2001). Deviation from Hardy-Weinberg equilibrium (HWE), pairwise mean population differentiation (F ST ) and inbreeding coefficients (F IS ) were estimated in genepop version 4.0.11 (Raymond & Rousset, 1995). We used FStat version 2.9.3 to test average of allelic richness (A R ) of each locus (Goudet, 2017).

| Population genetic structure analysis
We estimated the individual ancestries using a maximum likelihood method implemented in admixture version 1.3.0 (Alexander, Novembre, & Lange, 2009

| Demographical history analysis
Contemporary effective population size for each population was in China using microsatellites. We performed independent analyses on ten pairs of population, whereas population XJTL was always included as suggested by Lombaert et al. (2014). Based on the results of genetic diversity and genetic structure, populations were divided into a northwestern population (XJTL) and a southern population group. Three biologically scenarios were tested: (a) one was dispersed from the other population (scenario 1 and 2); (b) two populations were sequentially introduced from a source population (scenario 3); (c) one was introduced from a source population, and the other was independently derived from another source population (scenario 4) ( Figure S2). Methods for DIYABC analysis are described elsewhere (Wei et al., 2015), and priors for the parameters are provided in Table S3.

| Gene annotation of environmentassociated outliers
Genes located 1 kb upstream and downstream of the candidate out-

| Genotyping and genetic diversity
An average sequencing depth of 37.18× was obtained after data filtering and trimming (  Table 1). The nucleotide diversity (Pi) ranged from 0.09 to 0.28. Two populations of XJTL and AHHF at the edge of the northern expansion had the lowest genetic diversity, and XJTL had the highest number of private alleles (Table 1).
For microsatellites, we found 46 alleles from 260 individuals.
There was no significant departure from HWE and linkage disequilibrium for a locus in all populations or a population on all loci, except for deviations from HWE in FJTP and GDGZ based on all microsatellites (p < .01) (Table S5). Negative F IS (except for ZJHZ) and level of allele richness was found among all populations based on microsatellites (Table S5).
Population genetic structure analysis using ADMIXTURE showed that the 11 populations were optimally divided into 11 clusters. When K was increased from 3 to 11, XJTL was the first separated cluster, followed by AHHF and FJPT (Figure 1c). In DAPC analysis, results showed that XJTL and FJPT were clustered into two separate groups; other populations were assembled into one cluster ( Figure 1b). STRUCTURE and DACP analyses based on microsatellite loci generated similar results to those based on SNPs, indicating strong population differentiation of P. solenopsis and a high level of differentiation of the distantly distributed northwestern population of XJTL ( Figure S4).   Table S6).

| Demographic history
DIYABC analysis revealed that scenario 2 was supported with the highest posterior probability in 9 of 10 analyses, suggesting that the XJTL population was derived from southern populations in China.

| Climatic effect on genetic variance
The full RDA model supported the role of climate and/or population structure in shaping the distribution of SNP genotypes (p = .001; R 2 = .367). In the partial RDA model, both climate conditioned on population structure (p = .001; R 2 = .152) and population structure conditioned on bioclimate (p = .001; R 2 = .058) showed significant effects on genetic variation. The first four RDA axes in the partial RDA model (where the climate was conditioned on population structure) were significant and together represented 79.61% of the constrained variance in SNP genotypes (Table S8). Climate accounted for 43.23% of the total explained genetic variance, population structure accounted for 15.90% and the collinear contribution of climate and population structure accounted for 40.87% (Table S9). When climate effects were conditioned by population structure effect in partial RDA analysis, the bio10 (mean temperature of warmest quarter) was highly correlated with genetic variance when the first two RDA axes were considered (Figure 3a), and the bio3 (isothermality) was highly correlated with genetic variance when the RDA3 and RDA4 axes were considered (Figure 3b).

| Candidate genes related to climatic adaptation
There are 190 candidate SNPs obtained by both EA and PD methods.
In total, 33 genes were obtained related to candidate loci, and 26 genes were annotated by eggNOG-mapper, which were enriched to energy metabolism (carboxylesterase and esterase), growth and development (ChBD, chitin-binding peritrophin-A domain), transcription factors (bHLH, basic helix-loop-helix), detoxification metabolism (ABC transporter and UDP-glycosyltransferase activity) (Tables S10, S11). In RDA analysis, we obtained 111 SNPs associated with climate variables. The bio10 variable had the most associated loci, followed by bio9 (mean temperature of driest quarter) and bio3 ( Figure S5).

| D ISCUSS I ON
In this study, we used genome-wide SNPs, microsatellites and mitochondrial DNA to examine the invasion genetics of the  F I G U R E 3 Partial RDA analysis on genetic variance explained by the effects of climate when population structure effects were constrained. Individuals from the same population were indicated by circles with the same colour. The blue vectors are the environmental predictors. The relative arrangement of individuals and climate variables in the ordination space reflects their relationship with the ordination axes, which are linear combinations of the predictor variables. Combinations of the first four ordination axes were shown, that is RDA1 and RDA2 (a), RDA3 and RDA4 (b). Six climate variables were used, including bio3 (Isothermality), bio8 (Mean temperature of wettest quarter), bio9 (Mean temperature of driest quarter), bio10 (Mean temperature of warmest quarter), bio15 (Precipitation seasonality) and bio17 (Precipitation of driest quarter)

| Invasion history of this invasive species in China
Based on historical records, P. solenopsis first invaded Guangdong in 2008 (Wu & Zhang, 2009) Table S1). Historical records showed that it was introduced into Xinjiang more recently. The distribution of the invasive species in southern China is now continuous, but the Xinjiang population is an outlier far from the early invasion area of this pest and with different (cold and dry) climatic conditions. Historical records could not determine whether the Xinjiang population was introduced from adjacent countries.
Our population genetic approach provides supplemental information on the dispersal history of this invasive species, as has been the case for other invasive species (Arnaud & Thomas, 2010;Cao, Wei, Hoffmann, Wen, & Chen, 2016 (Cao et al., 2016;Eric et al., 2015). Nevertheless, testing samples of populations from adjacent countries will be required for confidence in this conclusion.

| Rapid and strong genetic differentiation after the introduction
Although P. solenopsis has been in China for only about ten years, population genetic analysis based both on SNPs and microsatellites revealed strong population genetic differentiation, especially between the bridgehead populations involved in the northern expansion, such as XJTL and AHHF. Genome-wide SNPs here provided much higher resolution than microsatellites in detecting population structure, consistent with other comparisons of these marker systems (Rašić et al., 2014). Nevertheless, the level and pattern of population differentiation based on these marker systems was similar as expected if both are acting in a neutral manner at the population level. The highest F ST values which were between XJTL and other populations ranged from 0.45 to 0.64, while the lowest level of differentiation was 0.097 between two adjacent populations, estimated from SNPs (Table 2). Because all individuals used in our study shared one mitochondrial cox1 haplotype, the strong genetic differentiation we observed is unlikely to be attributable to an unidentified cryptic species. This is an example of rapid and strong genetic differentiation developing in an invasive species.
Genetic differentiation of invasive species may be related to different source populations, bottlenecks, genetic drift and/or rapid adaptive evolution (Bertelsmeier & Keller, 2018;Cao et al., 2017;Dlugosch & Parker, 2010). Based on the demographic analysis, the population differentiation in the invasive species was mainly caused by evolutionary processes in China rather than involving different sources of introduced populations. The low mobility of most developmental stages on plants makes this species relatively easy to identify through quarantine screening, which may have limited the number of individuals dispersing into a new area, accounting for the small effective population size and limited recent gene flow. These factors can lead to rapid genetic differentiation among populations (Ellstrand & Elam, 1993).

| Climatic adaptation
Genetic isolation by environmental conditions may generally be more common that isolation by geographical distance (Sexton, Hangartner, & Hoffmann, 2014). Phenacoccus solenopsis was discovered in 1898 by Tinsley in New Mexico, USA (Tinsley, 1898), and the potential distribution of this species appears to be limited by cold temperatures at high latitudes and altitudes, as well as aridity (Wang et al., 2010). The location of this species on the plant appears to be influenced by humidity (Hodgson et al., 2008). The current distribution of P. solenopsis in China has reached its predicted extent based on climatic models (Wang et al., 2010), but even so the cold and dry conditions experienced in northern China may impose strong stress on the invasive species, promoting evolutionary adaptation. Our multivariate analysis shows that variables related to both temperature and precipitation were associated with genetic patterns across populations of the invasive species, indicating the possible importance of environmental conditions overall. The genomic signatures provide possible indications on climate adaptation of P. solenopsis.
Due to the confound influence of population structure, we were unable to exclude the neutral processes in multivariate analysis.
Multiple sampling from different locations with the same climate conditions may help to reduce the influence of neutral variation in identification of adaptive loci.
In many species of mealybug, both sexes are capable of mating multiple times on the same day and on sequential days (Seabra et al., 2013). These mating behaviours may also be the case for P. solenopsis and help to reduce inbreeding and contribute to a higher effective population size. The P. solenopsis occurs multiple generations a year with development time from egg to adult stage about 25-30 days (Prasad, Prabhakar, Sreedevi, Rao, & Venkateswarlu, 2012). These biological characters in the invasive species may increase the likelihood of rapid adaptation to a new environment (rapid generation time, increased genetic variability).
We found a higher observed heterozygosity than expected heterozygosity and negative F IS in eight populations of P. solenopsis based on SNPs. Heterozygote excess can result from many potential causes, such as changes in effective population size, mating systems, asexual reproduction and overdominance (Stoeckel et al., 2006).
Fewer breeders due to a small effective population size can result to heterozygote excess (Pudovkin, Zaykin, & Hedgecock, 1996). In P. solenopsis, there was a higher heterozygosity and smaller effective population size in XJTL with the opposite situation occurring in JXNC (Table 1), which may be consistent with this expected pattern. In male mealybugs, paternally derived chromosomes will be inactivated during spermatogenesis and therefore only the maternally derived euchromatic chromosomes are transmitted by males to the progeny, a phenomenon known as paternal genome elimination (PGE) (Seabra et al., 2013). PGE is a form of haplodiploid reproduction (Normark, 2004). Haplodiploid population is expected to be different from diploids in evolution, such as effective population size and stable polymorphisms (Hedrick & Parker, 1997). Whether this form of inheritance in mealybugs impacts on genetic diversity and rates of adaptive evolution requires further study.
To provide a preliminary test of potential adaptive patterns, we identified genes associated with climatic variables involved in metabolic processes and molecular processes. These functional genes identified in our study provide candidates for environmental adaptation of the invasive species which requires further work. For instance, it may be possible to compare the cold/aridity responses of this species and relate variation within and across populations to the SNP variation.
The invasion success of many species will depend on their ability to respond to natural selection (Colautti & Barrett, 2013;Lee, 2002).
The recently developed method of characterizing genomic differences among populations provides an efficient approach to provide a preliminary test of local adaptation (Jensen, Foll, & Bernatchez, 2016;Munshi-South, Zolnik, & Harris, 2016;Pujolar et al., 2014) which can then be validated through additional work. However, few cases of adaptation of alien species to climatic conditions have been reported (Card et al., 2018). Our study provides SNP-based evidence of potential climatic adaptation in P. solenopsis, which may help to explain its invasiveness and its spatial expansion in recent years.

| CON CLUS ION
Our study on the population genetics of an invasive species P. solenopsis revealed three major findings on the invasion of this species. First, the data are consistent with the invasive species invading China from a single source population. Second, extremely rapid and strong population genetic differentiation among populations appears to have developed after introduction, suggesting limited gene flow, and small effective population sizes. Third, we found suggestive evidence of genomic signatures of climatic adaptation in an invasive species despite very low genetic diversity in the introduced population. Our study suggests rapid evolution in an invasive species which are worth investigating in additional studies with an experimental focus.

ACK N OWLED G EM ENTS
The study was supported by the National

CO N FLI C T O F I NTE R E S T
The authors declare they have no competing interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data and scripts supporting the results of this article are available in the Dryad repository (https://doi.org/10.5061/dryad.0rxwd brwf).