Invasion genomics uncover complex introduction patterns of the globally invasive whitefly, Bemisia tabaci MED

The sweet potato whitefly, Bemisia tabaci MED is a globally invasive species that causes serious economic damage to agroecosystems. Despite the significant threat it poses to agricultural and economic crops worldwide, the global perspective of the invasion patterns and genetic mechanisms contributing to the success of this notorious pest is still poorly understood. The objective of this research was to enhance genome and population genetic analyses to better understand the intricate invasion patterns of B. tabaci MED.


| INTRODUC TI ON
Biological invasions, as a biogeographic phenomenon, present an increasing threat to ecological and evolutionary stability of the invaded communities when non-native species are introduced to geographic regions outside of their native range (Ahmed et al., 2021;Diagne et al., 2021;Pearson & Ortega, 2013;Sardain et al., 2019).
Understanding the causes of biological invasions requires that parallel field studies are conducted in the native and introduced ranges to elucidate how biogeographical shifts alter the individual performance, population success and community-level impacts of invading species (Pearson et al., 2022). A crucial first step in mitigating the impact of invasive species is to comprehend their invasion history, such as the invasion origin, expansion process, invasive capability and possible genetic adaptation to novel environment stress in invaded regions (Estoup & Guillemaud, 2010;Ryan et al., 2019). The role of genomics in understanding these invasions has gained increasing recognition, and the provision of reference genomes is allowing for the use of restriction site-associated DNA sequencing (RAD-seq) to accurately delineate the evolutionary history and genetic composition of both native and invasive populations (North et al., 2021;Yang et al., 2020).
The sweet potato whitefly, Bemisia tabaci (Gennadius; Homoptera: Aleyrodidae) species complex, is a notorious agricultural pest worldwide that comprising at least 44 cryptic species differed in many biological traits such as virus transmission, insecticide resistance and host plant ranges (Kanakala & Ghanim, 2019).
While the majority of cryptic species within the B. tabaci complex are restricted to particular geographic areas (Parrella et al., 2012), certain ones such as B. tabaci MEAM1 (Middle East-Asia Minor I, formerly known as biotype B) and B. tabaci MED (Mediterranean, formerly known as biotype Q) are highly invasive and have now spread globally and become significant global pests (De Barro et al., 2011;Guo et al., 2021;Wan & Yang, 2016). Their strong polyphagy, high reproductive rate, high resistance to insecticides and ability to transmit viruses make B. tabaci MED and MEAM1 a serious threat to cultivated plants in any invaded area, especially where the whiteflies encounter favourable agroecological conditions (Horowitz et al., 2005;Navas-Castillo et al., 2011;Oliveira et al., 2001). B. tabaci MED was first discovered in the Iberian Peninsula (Mediterranean Basin) in 1997 and has since then become a widespread invasive cryptic species and invaded many countries such as North America, South Korea and China over the past several decades (Chu et al., 2006;Dennehy et al., 2005;Guirao et al., 1997;Lee et al., 2005;Li et al., 2017;Moya et al., 2001).
Phylogenetic analysis has revealed a considerable genetic variability among mitochondrial lineages of B. tabaci MED. Among the four mitochondrial variants that have been discovered, two of them, referred to as Q1 (western Mediterranean populations) and Q2 (eastern Mediterranean populations), are extremely invasive and have extended their presence to non-native regions Gauthier et al., 2014;Tsagkarakou et al., 2007). These lineages are also referred to as Spanish-Q (Western) and Israel-Q (Eastern) when using microsatellite markers (Rosell et al., 1997;Horowitz et al., 2003;Brown, 2010).
In 2003, B. tabaci MED was first discovered in Yunnan Province and gradually become the dominant cryptic species in the field in most regions of China since 2008 (Chu et al., 2006;Li et al., 2017Li et al., , 2023. The B. tabaci MED that invaded China belongs to the Western Q1 lineage, while the North American invasion of the same species has origins from both the Western Q1 and Eastern Q2 lineages Parrella et al., 2014;Hadjistylli et al., 2016). Interestingly, in certain areas of China, the invasive B. tabaci MED has been discovered to displace the previously established B. tabaci MEAM1, particularly where insecticides are applied frequently, which attributed to the greater resistance to insecticides, higher fitness and increased tolerance to high temperatures of resistant B. tabaci MED populations (Bonato et al., 2007;Chu et al., 2010Luo et al., 2010;Wang et al., 2010). However, in North America, the B. tabaci MED is almost limited to greenhouses with weaker competitiveness (McKenzie et al., 2009(McKenzie et al., , 2012. Additionally, the two lineages (Q1 adaptation to the new environment. Population genomics analysis showed that three highly differentiated genetic groups were formed, and complex and extensive gene flow occurred across the Mediterranean populations. The genetic admixture patterns in East Asia populations were distinct from those in North America, indicating that they had different source populations.

Conclusions:
The high-quality, chromosome-scale genome of B. tabaci MED offered opportunities for more comprehensive genome-wide studies and provided solid foundation to the complex introduction events and the differential invasiveness of B. tabaci MED worldwide.

K E Y W O R D S
Bemisia tabaci, chromosome level, genetic differentiation, genome assembly, invasion genomics, spread pattern and Q2) have different endosymbiont compositions (Gueguen et al., 2010;Parrella et al., 2014), which has generated further interest in determining which lineage has more potential for invasiveness beyond its native Mediterranean range. Given that, researchers have started to pay more attention to comparing the population structures and evolutionary histories obtained from microsatellite and mitochondrial DNA markers. While the major lineages, clades and biotypes identified from the two molecular techniques are in agreement across several studies (Boykin et al., 2007;Dinsdale et al., 2010), there are still notable discrepancies between nuclear and mitochondrial relationships.
The advancement and reduced cost of RAD-seq technology have made it possible to study the biology of invasion and spread, as well as evolutionary genetics in a number of invasive species (Wagner et al., 2013;Nadeau et al., 2013;Takahashi et al., 2014;Lozier, 2014). While nextRAD sequencing was used for analysing the admixture and demographic history of B. tabaci MED, the comprehensive genetic structure at a global scale is still not well understood due to limited sampling, particularly in Asia's invasive regions (Elfekih et al., 2018). Another genotyping technology, the Type IIB-Digest Restriction Site-Associated DNA fragment procedure (2bRAD), has been shown to be effective in characterizing population genetics in small-bodied insect pests (Elfekih et al., 2018;Wang et al., 2012;Yang et al., 2020). A high-quality draft genome of the B. tabaci species could provide a fundamental basis for the identification of SNPs and selected genes that probably involved in environmental adaptation. Currently, three as- However, the scaffold N50 for these sequences only ranges from 0.4 M to 3.2 M due to limitations in sequencing technology (Chen et al., 2016(Chen et al., , 2019Xie et al., 2017). Thus, the high-quality genome of B. tabaci MED at the chromosome level should be generate, which could provide more comprehensive data for accurately uncovering the complex invasion patterns and genetic mechanisms behind the invasiveness discrepancy among different invasive populations.
Here, we first reported a chromosome-level, high-quality genome of B. tabaci MED and highlighted the expanded gene families and positively selected genes that probably related to environmental adaption and invasion success. Furthermore, we conducted a population genomics analysis of B. tabaci MED from its native Mediterranean regions (Spain, Croatia, Bosnia and Herzegovina, Cyprus and Israel) and invaded regions (China, South Korea and North America) using the 2b-RAD method and mitochondrial DNA sequences. Based on these genomic datasets, we aim to delineate the complex invasion patterns of B. tabaci MED by inferring the spatial patterns of genetic diversity, population structure and gene flow inference. These results could not only offer opportunities for more comprehensive genome-wide studies but also provide a solid foundation to delineate the complex introduction events and the differential invasiveness of this global invasive pest.

| Sample collection
The whitefly samples prepared for genome sequencing were collected from Hainan Province, China, in 2017. These whiteflies were maintained for approximately 30 generations under controlled conditions (27 ± 2°C, 16 L: 8D and RH 60 ± 5%). In preparation for Hi-C and genomic sequencing, about 300 pairs of fresh adult whiteflies were collected and then sequenced at Qingdao Biomarker Technologies Corporation.
Our previous analyses evidenced that at least four whitefly individuals for each locality can yield a reliable estimate of population genomic parameters such as genetic diversity and estimates of population genetic differentiation using 2b-RAD sequencing and library construction (Qu et al., 2019). So, we selected five or ten whitefly individuals randomly from each of the Mediterranean regions and Mediterranean. In the invaded range, 20 populations were selected from its main distribution areas, including South Korea (6), China (10) and North America (4). The samples were all stored in 95% ethanol for further 2b-RAD sequencing.

| DNA extraction
The DNA extraction process was performed using the TIAMamp Micro DNA Kit (TIANGEN, Beijing, China) in accordance with manufacturer instructions. About 300 pairs of newly emerged adults or each individual adult were homogenized with a cell disrupter and then yield genomic DNA following the kit protocol.
The DNA quantity and quality controls were tested by Qubit2.0, Agilent 2100 and Q-PCR method. The extracted genomic DNA was preserved at −20°C until genomic sequencing, mitochondrial cytochrome oxidase I (mtCOI) fragment amplification and 2bRAD sequencing.

| Identification of B. tabaci mitochondrial haplotypes
The identification of specimens from native and invaded range was performed using the primer pair 2195-MF(5′-CTGGT T Y T TTG GTC ATC CRGARGT-3′) and 2830-R(5′-CA ATC AGC ATA ATC TGA ATATCG-3′) (Simon et al., 1994). The PCR reactions were conducted in 25 μL solutions, consisting of 22 μL 1.1 × buffer (from Tsingke Biotech), 1 μL of template DNA and 1 μL of each primer (10 μM). The thermocycling programme consisted of starting with a denaturation step at 98°C for 2 min. This was followed by 35 cycles of a denaturation step at 98°C for 10 s, an annealing step at 50°C for 10 s and an extension step at 72°C for 1 min. The protocol concluded with a final extension step at 72°C for 2 min. The PCR products were sequenced using an ABI 3730 DNA Analyzer at Genscript Biotech (Qingdao, China). The mtCOI sequences were aligned using ClustalW in MEGA 6.0 (Tamura et al., 2013), and a total of 620 bp of mtCOI fragment were obtained from each sample.

| Genome survey and sequencing
To evaluate the genome size of B. tabaci MED, an Illumina highthroughput sequencing library with an insert size of 350 bp was constructed from genomic DNA and paired-end sequenced on the Illumina NovaSeq-6000 platform. We performed the k-mer analysis to determine the genome size using Jellyfish v2.1.3 software with the k-mer value of 17 ( Figure S1; Marcais & Kingsford, 2011).
Afterwards, the repeat content and heterozygosity rate were calculated with GenomeScope 2.0 software following the k-mer frequency distribution (Ranallo-Benavidez et al., 2020).
Short paired-end libraries (350-bp insert size) were generated and sequenced on the Illumina NovaSeq-6000 platform for Illumina sequencing. A SMRT bell library (20-kb insert size) was constructed and run on the PacBio Sequel II platform for Pacific Biosciences (PacBio) sequencing. We filtered the Illumina raw sequencing data by trimming the adapters, over 10% of nucleotides (N) designated as unknown, and 50% of low-quality bases (q-value ≤ 10), generating a total of 105.92Gb clean and high-quality reads. For the raw data from PacBio, short subreads (<5 kb) were removed and only a single representative subread was kept for each PacBio read.
Approximately 300 pairs of fresh B. tabaci MED adult samples were prepared for Hi-C library construction with a slight modification of the procedure described by Putnam et al., 2016. The procedure of cross-linking involved subjecting the cells to a 2% formaldehyde solution, which was terminated by the addition of 2.5 M glycine.
Afterwards, the cells were quickly frozen in liquid nitrogen and preserved at a temperature of −80°C. Then, the chromatin underwent digestion utilizing the Hind III restriction enzyme. The preparation of the Hi-C library involved incubating it with biotin, followed by the addition of T4 DBA ligase to facilitate DNA ligation. After adding the Proteinase K for reverse cross-linking, the ligated DNA was then cut into fragments that varied in size from 300 to 700 bp. It was repaired by blunt-end ligation and A-tailing, after which it was purified using biotin-streptavidin pull-down. They were sequenced using the Illumina NovaSeq6000 platform after quality assessment.

| Genome assembly
We constructed the draft genome by utilizing the raw reads produced by both the Illumina and PacBio sequencing platforms. PacBio long reads established the genome framework and Illumina short reads improved the genome assembly. The correction of PacBio long reads was carried out first using the CANU program v1.7, followed by assembling the corrected reads into contigs using SmartDeNovo (available at https://github.com/ruanj ue/smart denovo) (Koren et al., 2017).
The PacBio reads were used to improve the primary contigs with Quiver software (SMRT Link v5.0.1), and the gaps were filled with PBJelly software (English et al., 2012). Finally, the Illumina short reads were used to correct the genome assembly errors with Pilon v1.22 software (Walker et al., 2014).
Th Hi-C reads were aligned to the B. tabaci MED draft genome, and only the data that were uniquely mapped were used for assembly with the LACHESIS software (Burton et al., 2013;Langmead et al., 2009). After removing the duplicate data, the HiC-Pro (version 2.8.1) was applied to assess the quality (Servant et al., 2015). Afterwards, the corrected scaffolds were put together using LACHESIS and the interaction matrix of all chromosomes was generated along with heatmaps to provide a visual representation. The quality and completeness were assessed by conducting Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis using busco version 3.0 (Simão et al., 2015) and Core Eukaryotic Genes Mapping Approach (CEGMA) analysis using cegma v2.5 (Parra et al., 2007).
Tandem repeats in the genome were identified using Tandem Repeat Finder v4.07b. With these procedures, we successfully annotated the repeat sequences of the B. tabaci MED genome.
We used multiple methods to identify the conventional small noncoding RNAs, such as small nuclear RNAs (snRNAs), small nucleolar RNAs (snoRNAs), transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs). In detail, rRNAs were located through a BLAST search against a database of invertebrate rRNA, with a threshold E-value of 1e−10. The tRNAscan-SE software was used for the identification of tRNAs (Lowe & Eddy, 1997). The identification of miRNAs, snRNAs and snoRNAs was performed using INFERNAL (v1.1rc4) and searching against the Rfam database (Nawrocki & Eddy, 2013).
We applied three different gene annotation methods, includ- The functions of the genes were determined through alignment with several widely used databases, including eukaryotic orthologous groups of proteins (KOG), National Center for Biotechnology Information (NCBI), nonredundant protein database (NR), Trembl, Kyoto Encyclopedia of Genes and Genomes (KEGG) and Swiss-Prot. The alignment was performed using blast v2.2.31 with a 1e-5 threshold (Altschul et al., 1990). The Gene Ontology (GO) database was used for gene annotation via the blast2go software following the aligned results from the NR database (Conesa et al., 2005; Marchler-Bauer et al., 2010).

| Phylogenetic analysis of B. tabaci MED with other insect species
We conducted a phylogenetic analysis with 13 other insects genomes from the NCBI database, including Halyomorpha halys and Ceratitis capitata (GCA_905071925.1). Tetranychus urticae (GCA_000239435.1) was selected as an outgroup for constructing the phylogenetic tree. A total of 291 single-copy genes with high quality were chosen to build the phylogenetic tree. We aligned the coded protein sequences of these orthologous genes using MAFFT and then removed the poorly aligned regions using Gblocks (Katoh et al., 2009;Talavera & Castresana, 2007). The construction of maximum likelihood phylogenetic tree was then performed using IQ-TREE v1.6.11 with the best-fitting model LG + F + I + G4 and 1000 bootstrap replications (Nguyen et al., 2015).
The PAML package (version 4.9) was used to calculate the divergence times between the 14 species through the implementation of the 'correlated molecular clock' and 'JC69' model using the MCMCTREE (Puttick, 2019;Yang, 1997

| Gene family expansion and contraction analysis
First, all genes were assigned to corresponding gene families  (Han et al., 2013). The gene count file and an ultrametric species tree with branch length information were provided as the input for CAFE analysis. The optimal lambda parameter was automatically determined by the programme. The CAFE analysis revealed that those gene families with a family-wide p-value and viterbi p-value lower than .05 were defined to evolve rapidly. These expanded gene families were then annotated to KEGG pathways, and pathway with q-value <0.05 was considered significantly enriched.

| Positive selection analysis
The CodeML module in PAML was performed with the branchsite model to determine whether the corresponding gene was positively selected in B. tabaci MED (Yang, 1997 Subsequently, the codon multiple sequence alignment was generated based on the protein multiple sequence alignment result using the PAL2NAL software (Suyama et al., 2006). The 'chi 2 ' programme in the PAML programme was used to perform likelihood ratio tests (LRT) on Model A (assuming that the foreground branch ω was in a positive choice, i.e. ω > 1) and the null model (meaning that the ω value of any site was not allowed to be >1), with significance assessed at p < .01. The Bayesian method (BEB, Bayes empirical Bayes method) was then used to obtain positive selection sites (greater than 0.95 is usually considered significantly positively selected sites), and the genes receiving significant positive selection were ultimately obtained (Yang, 2005). The KEGG pathway enrichment analysis was performed for these positively selected genes in B. tabaci MED, and pathway with q-value <0.05 was considered significantly enriched.

| 2b-RAD sequencing for population genomics analysis
A total of 220 representative individuals from both native and invaded regions were selected for 2b-RAD sequencing. The construction and sequencing of the libraries were performed at Qingdao OE Biotech Co., Ltd. In brief, the genomic DNA was first digested with 1 unit of BsaXI (New England Biolabs, catalogue number R0609), and then, the fragments were linked with short adapter sequences . The Illumina NovaSeq sequencing platform was used to sequence the pooled purified PCR products.

| SNP calling
Raw reads were filtered following the criteria: Q ≥ 30 for ≥75% of each read, <8% N bases, and reads that did not contain BsaXI restriction sites. The paired-end raw reads were then merged using Pear v.0.9.6 and mapped to the B. tabaci MED reference genome using the SOAP aligner (-r0 -M 4 -v2), allowing for a maximum of two nucleotide base mismatches and filtering for unique and optimal alignments (Catchen et al., 2013;Li et al., 2008). PicardTools (http://broad insti tute.github.io/picar d/; v1.118) was utilized for coordination, sorting, elimination of PCR duplicates and the creation of a bam file index.
Finally, genome variants were identified using the Genome Analysis Toolkit version 4.0 following the criteria: (1) removing sites with minor allele frequency of less than 0.05; (2) removing all sites that had more than two alleles; and (3) removing sites that had >20% of missing data (McKenna et al., 2010). This generated a total of 20,026 single-nucleotide polymorphism (SNP) sites across all samples.

| Phylogenetic and population structure analysis
We constructed the neighbour-joining (NJ) tree using TreeBest

| Population genetic and gene flow analysis
The VCFtools v0.1.15 was utilized to compute the fixation statistics (F ST ) and nucleotide diversity (π) using a 10-Kb sliding window with no overlapping along each chromosome (Danecek et al., 2011 (Tables 1 and S2). Furthermore, the Hi-C data were utilized to anchor, order and orient these scaffolds, resulting in the formation of 10 linkage groups (LG) that contained more than 99% of the assembled sequences ( Figure 1b). The BUSCOs analysis revealed that B. tabaci MED genome assembly covers 96.5% of complete BUSCOs (Table 1).
In total, 275.02 Mb repeat sequences, accounting for approximately 43.14% of the assembled genome, were identified. This includes 110.3 Mb of transposable elements (TEs; 17.3%) which represent 17.3% of the total repeat sequences. These assembled TEs and tandem repeat sequences provide a valuable resource for studying transposon-mediated gene duplication, transposon expansion, the structure of centromeres, telomeres and heterochromatic regions (Table S3). The two most abundant repeat types were long interspersed nuclear elements (LINE) at 2.35% and long terminal repeat retrotransposons (LTR) at 1.32% (Table S3). Due to limitations in the analytical methods or database, almost 28% of the repeat sequences could not be categorized as they belong to unknown repeat types.
We used the EVM pipeline to combine homology-based annotation, transcriptome-based annotation and ab initio prediction. A total of 17,956 protein-coding genes were discovered in the B. tabaci MED genome (Figure 1d). Subsequently, these genes were functionally annotated using databases such as NR, KOG, KEGG, Trembl, Swiss-Prot and protein family (Pfam). As a result, 15,112 genes (84.16%) were successfully annotated (Tables S4 and S5).

| Species phylogeny analysis
The results showed that B. tabaci MED closely grouped with B. tabaci  (Tables 3 and S7).

| Genetic diversity and differentiation
To uncover the genetic variations in B. tabaci MED across different geographical populations, 220 individuals were sampled from F I G U R E 1 Genome sequencing assembly and annotation of Bemisia tabaci MED. (a) the sequencing approach used for the B. tabaci MED genome involved a combination of short-read Illumina sequencing, long-read PacBio sequencing and Hi-C sequencing; (b) a circular plot representation of the B. tabaci MED genome displays its characteristics using 500 Kb sliding windows. The tracks on the genome, starting from the outermost panel and moving inward, represent the following: Karyotype, density of protein-coding genes, density of repeat, density of G + C content and density of sample depth; (c) the contact map of the B. tabaci MED genome was constructed using Hi-C sequencing data, resulting in the formation of 10 linkage groups. The plot uses a colour bar to display the interaction intensity frequency, with yellow representing low intensity and red indicating high intensity. LG1-10 means chromosome 1-10; (d) EVM integration of coding gene sequence number predicted from three methods: Ab initio, homologue, RNAseq.  LG01 LG07 LG10 LG08 LG06 LG05 LG03 LG02 LG04 LG09  To determine the genetic differentiation among the three putative groups, we calculated the genetic distance, represented by the fixation index values (F st ), between the three groups. (Figure 3c).
The F st value between Groups B and C (F st = 0.07) was lower than the F st value between Groups A and B (F st = 0.19) or C (F st = 0.14), indicating that the populations in Groups B and C have the least genetic differentiation from the invasive population. This result is in agreement with the phylogenetic analysis. We further found little genetic diversity within groups Group A (π = 3.31 × 10 −5 ), Group B (π = 3.55 × 10 −5 ) and Group C (π = 3.45 × 10 −5 ) (Figure 3c).

| Gene flow and genetic structure of global Bemisia tabaci MED
The TreeMix analysis showed one gene flow event was found from the Central Mediterranean populations to North America population when assuming two gene flow events, and another gene flow event was observed from Western and Central Mediterranean populations  (Figure 3e). Furthermore, based on the lowest CV error value (K = 3), we found that the B. tabaci MED populations can be divided into three distinctive clusters, which was also consistent with PCA and phylogenetic analysis (Figures 3f and S3).
At K = 4, Group A, Group B and Group C could be also readily distinguished, but the CY and SPA populations were exhibited different population structure. Two new clusters emerged at K = 6, one is in Group A (USA0601c) and another is in Group C (CY and SPC).

| Geographic distribution of subgroups using mtCOI and SNP dataset
According to mtCOI typing results of all samples, seven mtCOI haplotypes were identified as Hap1-Hap7 ( Figure S4). Among these haplotypes, Hap1, Hap2 and Hap4-Hap6 are classified as group Q1, while Hap3 and Hap7 belong to Q2 group ( Figure S4; Table S8). with Q2 (Figure 4a; Table S8). The SNP data analysis is in general agreement with mitochondrial relationships with minor conflicts. For example, the Q1 populations from China, South Korea and western Mediterranean almost belonged to Group C, but the Cyprus (Eastern Mediterranean) was also identified as Group C (belong to Q2 using mtCOI marker). Consistently, the populations in American and BH mixed in Q1 with Q2 also have blended characteristics in their nuclear DNA (Figure 4b). Combined with mtCOI and SNP dataset analysis, we infer that East Asia populations and North America populations had different source populations.

| DISCUSS ION
In the present study, we utilized long-read sequencing technology (PacBio) along with short-read sequencing and chromatin confirmation assays (Hi-C) to generate a chromosome-level,      (Klai et al., 2020;Tay et al., 2010). Additionally, the average length of both exons and introns of each predicted gene is significantly larger than that in other whitefly genomes (Table 1). These observations indicate that the genome sequencing and assembly are valuable, which can serve as a reference genome for investigating evolutionary history and genetic adaptation.

ID Description
As shown by Zhang et al. (2013), gene families with rapid evolution and significant divergence among closely related species are more prone to being associated with adaptation compared with other genes. Our findings in this study showed that 50 gene families were expanded in the B. tabaci MED species, and the significant expansions are likely driven by the differential selection pressures and environmental conditions experienced by themselves in the invaded regions, such as changes in host range, resistance to insecticides or/and thermotolerance (Mahadav et al., 2009;Wang et al., 2023).
These gene families that underwent rapid expansion were related to various metabolism pathways related to insecticides, including drug metabolism-cytochrome P450, metabolism of xenobiotics by cytochrome P450. The superfamily cytochrome P450s, among the most important phase I metabolic enzymes, participate in the metabolic detoxification of toxic xenobiotics and the reduction in the penetration rates of insecticides (Bass et al., 2015). There is mounting evidence to show that the cytochrome P450 monooxygenases gene family had been proved to be associated with insecticide resistance in B. tabaci MED (Liang et al., 2022;Yang et al., 2013).
Moreover, B. tabaci MED has grown resistant to insecticides from many chemical classes, making it more tolerant to insecticides such as pyriproxyfen and neonicotinoids compared with the MEAM1 (Basit, 2019;Horowitz et al., 2005Horowitz et al., , 2020. The increased resistance to multiple chemical insecticides classes is a major force in becoming a globally invasive pest (Pan et al., 2015;Horowitz et al., 2020;Guo et al., 2021). The rapid expansion of genes related to physiological metabolism may play an essential role in rapid adaptation to the environment stress. Furthermore, some gene families that underwent rapid expansion were enriched into Insulin secretion and GnRH secretion pathways. In insects, ecdysteroids and insulin-like peptides (ILPs) appear to act as the most general growth-promoting signalling factors (Lin & Smagghe, 2019). Insulin and insulin-like factor signalling (IIS) pathways play important roles in insects such as in body size, growth and development, diapause and wing dimorphism (Ding et al., 2017;Sim & Denlinger, 2008;Williams et al., 2006;Xu et al., 2015). Gonadotropin-releasing hormones (GnRHs) play pivotal roles in reproduction via the hypothalamus-pituitary-gonad axis (HPG axis) in invertebrates (Sakai et al., 2020). Given that, we speculate that these rapidly expanded gene families in B. tabaci MED contributed to their population increase and outbreak within short time in invaded environment. Furthermore, a total of 436 genes were identified as positively selected genes. Functional enrichment analysis showed that these genes are enriched in multiple signalling pathways, including hedgehog signalling pathway, adrenergic signalling in cardiomyocytes, Ras signalling pathway and Wnt signalling pathway.
In all, not only are these genes expected to enhance the organism's ability to adapt to environmental stress, but they also play a role in the detoxification of both endogenous and xenobiotic reactive carbonyl compounds (Simard et al., 2020).
The invasion and spread pattern of alien species has garnered significant attention (Dalmon et al., 2008;, Chu et al., 2010. The utilization of RAD-seq approach has evidenced to be effective in uncovering the invasion and spread of various species  Figure 3). Surprisingly, while Groups A and B were found to cluster together in the phylogeny, the first principal component (PC1), which explains a significant portion of the variance in the data, actually separates Group A from both Group B and Group C. It is possible that Groups A and B may have diverged relatively recently in their evolutionary history and share many similar genetic characteristics, but may have undergone different adaptations or have been subject to different environmental pressures that influenced their trait variation. It is important to note that this separation may not necessarily correspond to the evolutionary relationship between the groups, as the PCA is based on quantitative measures of similarity between the data points rather than on a phylogenetic tree. In fact, a phylogenetic tree could potentially provide more insights into the population clusters. Further analysis may be needed to fully understand the relationships between these groups driving their patterns of variation, such as using the data from whole genome resequencing. The mitochondrial data revealed that the B. tabaci MED population in Mediterranean regions can be separated into two groups, Q1 and Q2, which are predominantly found in the western and eastern Mediterranean regions, respectively ( Figure 4a).
The combination of SNP and mitochondrial analysis reveals the geographical distribution of subgroups, highlighting the clear genetic heterogeneity between North American and Asian populations ( Figure 4). This finding is in line with previous studies, which suggested that B. tabaci MED populations in Asia originated from western Mediterranean countries Chu, Hu, et al., 2012;Lee et al., 2014;Park et al., 2012) and those in North America originated from both eastern Mediterranean countries (Hadjistylli et al., 2016). We used genome-wide SNP analysis to explore genetic mixing patterns between populations that were collected from various geographical localities worldwide. This allowed us to gain insights into migration events between native and invasive populations. The genetic mixing analysis performed using Treemix showed that one gene flow event was found from populations in the Central Mediterranean populations to those in North America population when assuming two gene flow events, while the other events were characterized from Western and Central Mediterranean populations to Asian populations (Figure 3e). The results were in agreement with the genetic structure analysis and previous report indicating that the populations in Asia and North America have distinct ancestral origins . However, a prior study revealed that there was a migration event from invasive populations in Arizona (North America) to native regions in Greece, which can be attributed to the impact of the trade in ornamental plants (Cheek & Macdonald, 1994;Dalton, 2006). The tries. In all, our findings provide a crucial resource for genomic study of invasive whitefly and will facilitate to develop effective strategies for future pest management.

ACK N O WLE D G E M ENTS
This research was supported by the National Key R&D Program of China (Grant No. 2022YFD1401204) and the Taishan Scholar Foundation of Shandong Province (tstp20221135).

CO N FLI C T O F I NTE R E S T S TATE M E NT
The authors have no conflicts of interest to declare.

PEER R E V I E W
The peer review history for this article is available at https:// www.webof scien ce.com/api/gatew ay/wos/peer-revie w/10.1111/ ddi.13751.

DATA AVA I L A B I L I T Y S TAT E M E N T
The genome assembly data have been deposited in the figshare da-