Limited introgression from non‐native commercial strains and signatures of adaptation in the key pollinator Bombus terrestris

Insect pollination is fundamental for natural ecosystems and agricultural crops. The bumblebee species Bombus terrestris has become a popular choice for commercial crop pollination worldwide due to its effectiveness and ease of mass rearing. Bumblebee colonies are mass produced for the pollination of more than 20 crops and imported into over 50 countries including countries outside their native ranges, and the risk of invasion by commercial non‐native bumblebees is considered an emerging issue for global conservation and biological diversity. Here, we use genome‐wide data from seven wild populations close to and far from farms using commercial colonies, as well as commercial populations, to investigate the implications of utilizing commercial bumblebee subspecies in the UK. We find evidence for generally low levels of introgression between commercial and wild bees, with higher admixture proportions in the bees occurring close to farms. We identify genomic regions putatively involved in local and global adaptation, and genes in locally adaptive regions were found to be enriched for functions related to taste receptor activity, oxidoreductase activity, fatty acid and lipid biosynthetic processes. Despite more than 30 years of bumblebee colony importation into the UK, we observe low impact on the genetic integrity of local B. terrestris populations, but we highlight that even limited introgression might negatively affect locally adapted populations.


| INTRODUC TI ON
Insect pollinators play a vital role in sustaining natural ecosystems and in the productivity of agricultural crops (Biesmeijer et al., 2006;Gill et al., 2016;Klein et al., 2007;Potts et al., 2010).Evidence of widespread declines in pollinators in recent decades has been attributed to a combination of several human-mediated changes in the environment at local and regional scales, including habitat loss, intensification of agriculture and pesticide use and the introduction of parasites and pathogens (Goulson et al., 2015;Ollerton et al., 2014;Powney et al., 2019).In addition, the use of managed and commercial bee colonies, including bumblebee colonies, for crop pollination and honey production has promoted the transfer of diseases (Goulson, 2010), and facilitated the spread of non-native species and subspecies, posing a risk to endemic species (Jaffe et al., 2016;Lecocq et al., 2016;Seabra et al., 2019).Declines in wild bumblebee species, in particular, have been highlighted (Cameron & Sadd, 2020;Goulson et al., 2015;Grixti et al., 2009;Potts et al., 2010;Soroye et al., 2020;Williams & Osborne, 2009), and bumblebee colonies are mass produced for the pollination of more than 20 crops and are imported into over 50 countries, including those outside the native ranges of reared bumblebee species (Velthuis & van Doorn, 2006).
The successful deployment of commercially reared bumblebees around the world, particularly Bombus terrestris, is largely due to the development of effective mass-rearing techniques (Velthuis & van Doorn, 2006), and the ability of these bees to efficiently pollinate crops such as tomatoes, strawberries and melons through 'buzz pollination', where the bees release pollen from flowers by shaking the anthers at high frequency (Buchmann, 1983;Nayak et al., 2020;Wahengbam et al., 2019).Tomatoes, Solanum lycopersicum, which are cultivated on a large scale and pollinated by bumblebees worldwide, have an annual economic export value of 10.06 billion USD (Cooley & Vallejo-Marin, 2021).However, the risk of invasion by commercial non-native bumblebees is considered one of 15 emerging issues for global conservation and biological diversity (Sutherland et al., 2017).
Commercial bees can potentially escape from greenhouses and successfully establish new populations and/or hybridize with endemic species or subspecies.Such invasions may drive changes in plantpollinator interactions and local bumblebee population diversity, as well as result in the spread of new pathogens (reviewed in Mallinger et al., 2017).Bombus terrestris is considered to have high biological invasion potential because these bees are ecologically flexible, emerge early in the season and have highly polylectic foraging habits (Dafni & Shmida, 1996;Hingston et al., 2002;Hingston & McQuillan, 1998;Matsumura et al., 2004).The native range of the different subspecies of B. terrestris comprises Europe, the Middle East and parts of North Africa (Rasmont et al., 2008).International trade of commercially reared B. terrestris colonies is reported to be the main cause of invasions into several countries including Chile, China, Israel, Japan, Mexico, South Africa, South Korea and Taiwan (Acosta et al., 2016;Amots Dafni et al., 2010;Hingston et al., 2002;Howlett & Donovan, 2010;Inari et al., 2005;Matsumura et al., 2004;Schmid-Hempel et al., 2014;Velthuis, 2002;Velthuis & van Doorn, 2006), and B. terrestris is also an invasive species in New Zealand and Tasmania (Australia) (Lye et al., 2011;Semmens et al., 1993).Invasions of B. terrestris are thus considered significant at a global scale, and this may be especially due to the rate of spread, which has been up to 200 km per year in Chile and Argentina (Acosta et al., 2016;Dafni et al., 2010;Schmid-Hempel et al., 2014).
The importation of commercial B. terrestris into regions where consubspecific populations occur has also been highlighted as concerning but has been less investigated (Ings et al., 2006;Lecocq et al., 2016).Bombus terrestris includes nine subspecies that are classified by their body hair colour patterns, semiochemical profile, genetic composition and distributional ranges (Cejas et al., 2018;Rasmont et al., 2008), with two of the most widely used subspecies in artificial rearing being Bombus terrestris dalmatinus Dalla Torre and Bombus terrestris terrestris (Linnaeus) (Velthuis & van Doorn, 2006).
A key example concerns the UK where the endemic subspecies is and potentially increase their invasive potential (Facon et al., 2011;Lecocq et al., 2016).Furthermore, if invasion leads to the establishment of populations of commercially bred bumblebee subspecies, and/or hybridization between these bees and native subspecies occurs, a multitude of key biological traits in native populations may be affected, with impacts on their genetic diversity, ecological adaptation, as well as key plant-pollinator interactions.However, the effects of commercially bred subspecies on native pollinator genetic integrity and evolution have been the focus of few studies to date (Cejas et al., 2021;Kardum Hjort et al., 2022;Seabra et al., 2019).
Hybridization between native and commercially bred subspecies in the UK, and the establishment of populations of the commercial subspecies is plausible.For instance, hybridization between different B. terrestris subspecies under laboratory conditions and the production of viable colonies has been demonstrated (De Jonghe, 1986;Ings et al., 2005;Lecocq et al., 2015), and mate choice studies suggest that although mating is not random, B. t. dalmatinus can and do mate with B. t. audax (Ings et al., 2005).Further, recent studies have detected introgression between commercial and native subspecies in the Iberian peninsula (Bartomeus et al., 2020;Cejas et al., 2018Cejas et al., , 2020;;Seabra et al., 2019), and between B. terrestris lineages in Britain and continental Europe potentially resulting from the import of commercial colonies (Moreira et al., 2015).In addition, reports of winter activity have increased in the UK since the early 1990s (Hart et al., 2021;Robertson, 1991;Stelzer et al., 2010).Most northern European B. terrestris populations are active during the warmer parts of the year, when queens emerge from diapause in the spring and produce a colony over the summer, and then new queens enter diapause at the end of the summer (Løken, 1973;Rasmont et al., 2008).(Owen et al., 2016).On the other hand, introgression was not detected between commercial B. t. dalmatinus and wild B. t. audax in the UK (Hart et al., 2021), or between commercial and wild B. impatiens in New England, USA (Suni et al., 2017), and no evidence was found for recent introgression of commercial B. terrestris into local wild conspecific populations in southern Sweden (Kardum Hjort et al., 2022).It may simply be the case that the behaviour is plastic, and that B. t. audax can express winter-active behaviour under the right environmental conditions, such as can be seen with the facultative expression of social behaviour in Halictus sweat bees (Field et al., 2010).
Here, we use genome-wide SNPs to detect and quantify introgression of commercial B. terrestris subspecies into native UK bumblebees, and evaluate the consequences of importing commercially bred non-native bumblebee subspecies into the UK.We test the hypothesis that importing such bees into the UK has facilitated hybridization between commercial and native subspecies, and that B. terrestris throughout the island of the UK has become a single genetically homogenous population.We conduct population genomic analyses of natural populations of B. terrestris close to (<6 km) and far from (>20 km) farms using commercial colonies for pollination, as well as commercial populations, and test for hybridization.We estimate the genetic diversity of natural and commercial populations and identify private alleles in the wild populations that can be used to help distinguish the native and commercial subspecies of B. terrestris.We also identify genomic regions that may be associated with global and local adaptation in this key pollinator.

| Sampling
We studied 129 bumblebee specimens (Bombus terrestris), including drones (males) and workers (females), from seven natural populations throughout the United Kingdom (Figure 1a), and three populations representative of the commonly used commercial subspecies/ strains (Table S1).Sampling was conducted from farms and from wild populations later in the season when drones were flying; therefore, the sampled populations include both drones (haploid males) and workers (diploid females), and differences in ploidy were accounted for in analysis (see below).All bees were collected in 2014 when both native and non-native subspecies were supplied to UK farms by companies that rear bumblebees for pollination.Natural populations were sampled from sites close to (<6 km) and far from (>20 km) farms using commercial colonies (data on farms and commercial colonies reported by farmers and collected by Natural England, Welsh Nature Policy and Science and Advice for Scottish Agriculture departments), and therefore deemed more and less likely, respectively, to be interacting with bees from commercial strains (Table S1).Bombus terrestris reportedly has a foraging range of up to 2 km (Walther-Hellwig & Frankl, 2000), with queen dispersal estimated as 3-5 km (estimated for other species only to date -B.pascuorum and B. lapidarius (Lepais, 2010)), and with males able to disperse up to 9.9 km (Kraus et al., 2009).With the permission of the farmers, commercial populations were sampled on site at two large tomato farms, using

| Sequencing, mapping and genotyping
Genomic DNA was extracted from the whole body of each bee stored in ethanol and refrigerated using a Qiagen DNeasy Blood & Tissue Kit (Qiagen) following the manufacturer's protocol.Gel electrophoresis and a Qubit v4.0 fluorometer (Life Technologies) were used to assess the integrity and quantify the DNA of each sample respectively.The species identity of all samples was confirmed by sequencing the mtDNA COI region and aligning sequences against entries in the NCBI database using BLAST tools.
Approximately 100 ng of DNA per sample was used to construct double-digest restriction site-associated DNA (ddRAD) using the quaddRAD protocol (Franchini et al., 2017) following the modifications introduced in Franchini et al. (2018).Double digestion was carried out with a rare cutter, PstI, and a frequent cutter, MspI, restriction enzyme.A single quaddRAD library including 129 barcoded individuals was size-selected from 450 to 550 bp in a Pippin Prep system (Sage Science) and paired-end sequenced (2 × 150 bp) in an Illumina HiSeq 2500 platform at the genome facility of Tufts University (TUCF Genomics).
A total of 117.67 million (M) raw reads were processed by the clone_filter and process_radtags scripts implemented in the Stacks v2.59 (Catchen et al., 2011) package in order to remove PCR duplicates (clone_filter option: "--inline_inline") and demultiplex individuals while discarding low-quality reads (process_radtags options: "-c -q").A final data set of 107.89 M paired reads (averaging 0.84 M reads per individual) was aligned to the Bombus terrestris genome (Bter_1.0;NCBI GenBank assembly accession: GCA_000214255.1) using the program bwa-mem v0.7.17 (Li & Durbin, 2009) with default parameters.A total of 125 individual mapping files, compressed to BAM format and sorted with Samtools v1.9 (Danecek et al., 2021), were retained after excluding four samples due to exceptionally low number of aligned reads.Downstream variant and genotype calling was performed using GATK v4.3.0.0 (Van der Auwera & O'Connor, 2020) in a three-step procedure: (i) the Hap-lotypeCaller module was used to call variants per-sample in GVCF mode by genotyping each sample based on its ploidy, and thus allowing us to maximize the number of variants across ploidy levels; (ii) per-sample GVCF files produced by HaplotypeCaller were combined into a multi-sample GVCF with the module CombineGVCFs; (iii) joint genotyping on all individual samples, pre-called with HaplotypeCaller, was performed using GenotypeGVCFs with the '-all-sites' option in order to output both variant and invariant sites, required to calculate nucleotide diversity (π) and absolute divergence (dxy).To produce a VCF file including only variant sites, the GATK output in standard VCF format was handled and filtered with VCFtools v0.1.15(Danecek et al., 2011) by removing short indels ('--remove-indels'), keeping only biallelic SNP loci ('--min-alleles 2 --max-alleles 2') with a mapping quality (MQ) > 30 ('--minQ 30') and minimum read depth of 5 ('--minDP 5').Finally, sites with a minimum allele count of 3 ('--mac 3') and genotyped in at least 50% of the individuals in each population were retained.This procedure resulted in a highly complete matrix including 8953 variable sites with only 3.47% missing genotypes across all samples (missing genotypes per population ranging from 2.03% to 5.54%).To reduce the effect of non-independence of markers due to physical linkage and linkage disequilibrium (LD), and thus satisfying one of the main assumptions of model-based algorithms used in this study, we generated a new SNP data set by pruning linked variants showing a r 2 (a commonly used measure for LD) greater than 0.15.To this end, we used the program Plink v1.9 (Purcell et al., 2007) with the option '--indep-pairwise 50 10 0.15' and ended up with a data set including 6774 unlinked SNPs.Invariant sites, selected by applying a minimum allele frequency filter ('--max-maf 0'), were processed by removing indels, applying a minimum read depth of 5 and a 50% missing data threshold.Independently filtered variant and invariant sites were then combined in a single VCF file using BCFtools v1.16 (Li, 2011) resulting in a total of 650,772 loci.As a final step, we "haploidized" the diploid individuals by randomly assigning heterozygote alleles as either homozygote reference or homozygote alternative using the "pseudoHaploidize" function included in the correctKin package (Nyerki et al., 2023).This procedure was implemented to create a bias-free SNP data set to be processed by algorithms that do not deal with mixed-ploidy populations.These new genotype data made of real haploids and haploidized diploid individuals were used for all downstream analyses except for the Entropy runs (see details below).

| Genetic diversity and differentiation
We used the program pixy v1.2.5 (Korunes & Samuk, 2021) to compute sliding-window estimates of both nucleotide diversity within populations (π), relative divergence between populations (F ST ) and absolute divergence between populations (dxy).Slidingwindow computations were performed across the genome using non-overlapping sliding windows of 200 kbp.The algorithm implemented in pixy allows to calculate π and dxy from VCF files encoding both variant and invariant sites, and provides unbiased estimates of these genetic parameters (Korunes & Samuk, 2021) using, for each window, only the actual sequence data in the window for the computation.As pixy computes per-window point estimates of π and dxy, we also computed bootstrapped estimates for each window in each population by obtaining bootstrapped VCF files (sampling with replacement the individuals in each population; 100 bootstraps) and subjecting the resulting VCF files to separate pixy runs.This allowed us to obtain a 95% confidence interval for each window of the genome containing sites (for each population, we excluded windows with more than 10% missing data).
Using the same pixy output (which includes number of variant sites and number of comparisons), we also obtained genome-wide 95% confidence intervals for π for each population and dxy for each comparison between populations.We then deemed the windows whose confidence interval did not overlap with the genome-wide 95% confidence interval for a given population/comparison, as either high or low nucleotide diversity/absolute divergence regions.
Similarly, using the R package hierfstat v0.5-11 (Goudet, 2005), we used bootstrapping to obtain 95% confidence intervals for F ST both genome-wide among populations and for each of the same windows used in the analysis of nucleotide diversity.As this analysis was performed bootstrapping loci, we restricted this analysis to combinations of comparison and windows containing at least 5 SNPs.We designated 'high F ST windows' for a given comparison between two populations (e.g.Bognor vs. Thurso), those windows whose confidence interval entirely exceeded the genome-wide 99% confidence interval for that specific comparison.These windows can be interpreted as 'high F ST ' in the sense that they exceed the level of divergence normally observed between two given populations.The location of 'high F ST ' windows can be checked against other features (e.g.locations of high genetic diversity) to uncover putative local or global adaptation (Booker et al., 2021); therefore, for this analysis, we focused on wild UK bees and chromosomal regions only.For instance, patterns of π and dxy in high F ST regions may reveal whether local or global adaptation is the most likely process giving rise to high F ST (Booker et al., 2021).Specifically, following Booker et al. (2021), among high F ST regions, those which had low π (for both populations) and dxy were deemed as putatively under global adaptation, and those which had high dxy were deemed as putatively under local adaptation.
As local adaptation can produce both regions of high and low nucleotide diversity (Russ & Jasper, 2020), we did not consider π to identify regions under local adaptation.
Genomic regions found to contain adaptive loci were then screened for protein-coding genes that were subjected to gene function (based on Gene Ontology, GO) and pathways enrichment analyses (using the Kyoto Encyclopedia of Genes and Genomes, KEGG, database).To this end, we first intersected the coordinates of the genomic windows under adaptation with those of the Bter_1.0annotation using the intersect module of the BEDtools v2.31.0 package (Quinlan & Hall, 2010) and extracted the genes included in the intersected regions using Bash commands.Second, these genes (test sets) were compared with the whole B. terrestris genes (baseline set, source: Ensembl Genomes Metazoa release 54) to identify significantly overrepresented GO terms and KEGG pathways.To test for significance, a Fisher's exact test implemented in g:Profiler ve107_ eg54_p17_bf42210 (Raudvere et al., 2019) was used (g:SCS multiple testing correction method, significance threshold of .05).

| Overall population structure
To assess the degree of overall population structure and genetic differentiation in our samples, we applied a combination of different approaches.First, we obtained point estimates of global F ST for each population comparison using hierfstat and considered a given comparison between populations significant if the 95% confidence interval obtained through bootstrap (see above) did not include zero.Then, a principal component analysis (PCA) was performed in R based on the genotype data of the haploidized and pruned data set after mean substitution of missing data and centring.Scatterplots of scores along the first few principal components can be highly misleading particularly when these account for a small proportion of total variation (Bookstein, 2017;Jombart et al., 2009;Lever et al., 2017).For this reason, we performed additional tests to check whether patterns of dispersion observed along the first two principal components were also observed when analysing all principal components.To achieve this, we used two approaches typically employed in morphometrics and ecology.Using the R package GeometricMorphometricsMix (Fruciano, 2018), we used bootstrap resampling to obtain population-specific 95% confidence intervals for multivariate variance (the variance across all the principal components).In this approach, which is commonly used in morphometrics (e.g.Fruciano et al., 2014), if a given population is more variable in PC space than the others, we expect its confidence interval to be disjunct from the others.The second approach involved the PER-MDISP2 procedure (Anderson, 2006), as implemented in the R package vegan (Oksanen et al., 2016), using Euclidean distances and the median distance from the centroid of each population as a measure of dispersion.An analysis of variance (ANOVA) on these measures of dispersion was then employed to test whether there was at least one population with larger dispersion.
To investigate the possible presence of genetic clustering in the whole data set, we used the widely used model-based algorithms implemented in Admixture v1.23 (Alexander et al., 2009).Admixture was run with default parameters and allowing the number of predefined genetic clusters (K) to range from 1 to 10. Statistical support for the different number of inferred clusters was assessed using the cross-validation function included in the package.To further assess the genetic relationships among samples, we used RADpainter v 0.3.2_r109(Malinsky et al., 2018) which, differently from widely used clustering approaches (e.g.Admixture), infers the ancestry of genomic segments in a population.The program then sums the coancestry values across all loci and generates a coancestry matrix for the full data set.
To ensure an accurate estimation of the genotype and ancestry parameters in our data set composed of mixed-ploidy (haploid and diploid) bumblebees, we used the hierarchical Bayesian model implemented in the program Entropy v2.0 (Shastry et al., 2021).This program is appositely devised to deal with data sets that include samples with different ploidy levels.Entropy was run on the genotype likelihoods encoded as "PL" field in the GATK VCF file that were first converted into a readable format (MPGL) using the vcfR v1.13.0 R package (Knaus & Grunwald, 2017), for 100,000 steps with 50,000 burn-in and stored every 10th step in memory.For each K model (putative number of demes) we ran three independent chains and assessed convergence using the assessconvergence.R script provided in the package.The number of K that best fits the data was evaluated using the widely applicable information criterion (WAIC) (Watanabe, 2010), and estimates were extracted from the raw Entropy HDF5 output files using the estpost.entropyscript.The admixture proportions (Q scores) for all individuals and all K values in a stacked barplot fashion were drawn using the plotadmix.R scrip.We used the VCF-contrast module of the VCFtools package to detect any private alleles in the wild bees.Target (wild) and background (commercial bees plus B. t. dalmatinus) samples were specified using the + < list>' and -<list> options respectively.Major and minor allele frequency for each private allele was calculated using VCFtools with the 'freq' option.Raw numbers of private alleles per population were normalized to account for differences in population size.

| Overlap of genomic regions of interest
To test for the significance of the overlap of genomic regions of interest, including regions of high and low nucleotide diversity and putative local and global adaptation, we used the permutational procedure implemented in the package regioneR v1.30.0 (Gel et al., 2016) using the number of overlaps as test statistic and 100 permutations.This procedure performs permutation tests by randomizing genomic regions, therefore controlling for chromosome size and distribution of regions in the genome.Importantly, for comparisons of regions of interest, we restricted the analysis to chromosomal regions in the reference genome only (i.e.we excluded regions in unplaced scaffolds).We also masked those regions (i.e.windows) which did not contain sequence data in our data set -which is crucial because not masking those regions would result in elevated type I errors (i.e.finding erroneous 'significant overlaps').
The first tests of overlap we ran were pairwise among populations testing for the overlap of, respectively, low nucleotide diversity and high nucleotide diversity regions, reflecting the hypothesis that -across wild populations and between commercial and wild populations -the same regions of the genome tend to harbour genetic diversity or show reductions in genetic diversity.Further, we tested for the overlap of those regions identified in the wild samples as under global adaptation, and windows of low nucleotide diversity for the commercial samples.This test reflects the question of whether regions putatively under global adaptation in the UK (characterized, among other things, by low π) correspond to regions of low genetic diversity specific to the UK (in which case we expect a non-significant overlap with commercial samples).Vice versa, a significant overlap would indicate that regions putatively under global adaptation may be either false positives or reflect global adaptation at a scale broader than the UK.Finally, we tested the overlap for wild samples of regions showing signals of local adaptation, with the location of SNPs containing private alleles for the wild samples.The rationale for this test is that in regions under local adaptation we expect to find private alleles more frequently (Sjöstrand et al., 2014).

| Sequencing, mapping and genotyping
A total of 125 individuals were kept after excluding those with low number of reads aligned to the reference bumblebee genome (average number of paired-reads per individual that aligned concordantly: 700,139; SD: 169,189).After genotype calling and filtering according to the criteria detailed above (see Section 2), a matrix of 650,772 SNP loci (including 8953 variants, of which 6774 had reduced linkage between nearby sites) was retained for downstream genomic analyses.

| Genetic diversity and differentiation
An exploratory PCA (Figure 1b) identified distinguishable groups formed by wild and commercial individuals, with clearly distinct scores along the first axis of variation (PC1).The commercially reared non-native (B.t. terrestris-like) individuals were the most divergent, while the commercial native (B.t. audax-like) individuals formed a separate group, equally distant from commercial non-native and the wild bees, these latter clustered tightly together, except for two individuals from Brockholes (a more defined clustering that mirrors their sampling locality can be observed in the PCA conducted on the wild samples only; see Figure S1).The four B. t. dalmatinus bees were also located between the commercial non-native and the wild samples in the scatterplot of the scores along the first two principal components; however, the commercial native specimens had distinct scores along the second principal component (PC2).In addition to variation in location among groups, the scatterplot of the scores along the first two principal components showed substantial variation in dispersion among populations.In particular, the commercially reared non-native population appeared approximately as dispersed as the variation among all other samples.However, an investigation of dispersion using all principal components revealed a much lower degree of dispersion, with the commercial non-native population showing a tendency for higher dispersion, but never to the extent one would have expected from the scatterplot of the scores along the first two principal components.Indeed, confidence intervals for multivariate variance of each population are never fully disjunct from each other (Figure S2, Table S2).A similar result is obtained with the procedure based on Euclidean distances in which a permutationalbased analysis rejected the null hypothesis that all populations have the same dispersion and the pattern was mainly driven by the commercial non-native samples having higher dispersion but not to the extent implied by the scatterplot along the first two principal components (Figure S3, Table S3).
The analysis of intra-population genetic diversity revealed similar patterns to the ones observed in dispersion in the full PCA space, with no single population having its confidence interval for genomewide genetic diversity disjunct from the ones from other populations (Figure S4).However, the confidence interval for the commercial non-native sample only overlapped with the commercial native sample and dalmatinus (Figure S4), further confirming the idea that this population had a tendency for higher genetic variability (note how dalmatinus has always very broad confidence intervals due to the low sample size).The sliding window analysis of nucleotide diversity was based, considering all individuals in the analysis, on an average of 566.77 (SD 381.69) base pairs per window, which increased to 622.3 (SD 387.4) restricting the computation to windows mapping to chromosomes in the reference genome.This analysis revealed a large number of windows deemed to have either high or low genetic diversity, with low genetic diversity windows noticeably more frequent than high genetic diversity windows in all populations (Figure S5).Among all samples, the commercial non-native sample had the largest number of windows -both high and low diversity (Figure S5).

| Overall population structure
The general population structure pattern highlighted by the PCA was confirmed by the Admixture analysis (Figure 2).When the to be effective in detecting both large-and fine-scale genetic structures (Steinig et al., 2016), allowed us to distinguish groups including the commercial native, the non-native and the wild bees.Among the wild bees, individuals that tended to show more similarity to the commercially reared bees (i.e. higher number of connections) were highlighted (Figure 3).S4.Across all pairwise comparisons among the wild samples, we found a total of 67 windows deemed as 'high F ST ' windows (Table S5), F I G U R E 2 Genetic clustering and individual ancestry at K = 2 and 3, the most supported numbers of clusters in an Admixture analysis (see cross-validation error plot in Figure S7) that revealed evidence of recent limited admixture between wild and commercial populations.corresponding to 45 unique windows in the bumblebee genome (we note that the same window can be 'high F ST ' in multiple comparisons).Across the 67 windows deemed as 'high F ST ', 40 are also considered 'high genetic diversity' in at least one of the two populations making up each comparison, four are also considered 'low genetic diversity' in at least one of the two populations in the comparison, and 24 are neither low nor high genetic diversity for both the two populations making up each comparison (Table S5).Out of the 67 high F ST windows, two (1 unique) are also low dxy and low π in both populations being compared (regions under putative global adaptation).Conversely, 50 windows (31 unique) had high dxy and were deemed as regions under putative local adaptation.Notably, 39 of these 50 windows (25 unique) were also 'high π' in at least one population being compared.
In terms of overlap for regions with high or low nucleotide diversity, we found contrasting results.In the case of regions of high nucleotide diversity, there was always a significant (p < .01;Table S6) overlap in regions between pairs of populations (i.e.different populations tend to share the same regions of high nucleotide diversity).Even though all tests were significant, the Z scores tended to be higher when testing the overlap between wild populations, and lower in comparisons involving the commercial samples (Table S6).
The Z scores in this analysis are the number of standard deviations between the observed number of overlaps and the average number of overlaps obtained under the null hypothesis that overlaps only occur by chance.Thus, a higher Z score in a given comparison can be interpreted as stronger evidence for overlap and a stronger departure from overlap just due to chance.Conversely, tests of overlaps for low nucleotide diversity windows were almost always non-significant (Table S6), except for a few comparisons involving the populations of Bognor, Broadway, Templeton and Treborth.
The tests of overlaps comparing high F ST windows under putative global adaptation for wild samples with low π regions in commercial samples were always non-significant (1 overlap with commercial non-native, p = .49,Z score 0.64; 1 overlap with commercial native, p = .3,Z score 1.28).That is, regions under putative global adaptation in the UK (which are characterized, among other things, by low π) are not typically the same regions which are low π in commercial samples.Finally, the test of overlap comparing windows in wild samples showing putative signals of local adaptation, with the location of SNPs containing private alleles for the wild samples, was also not significant (25 overlaps, p = −.77,Z score −0.74).
The genomic regions likely under global and local adaptation contain 21 and 182 protein-coding genes respectively (Tables S7   and S8).While we found no functional enrichment in the first gene set (global adaptation), the genes of the second set (local adaptation) showed enrichment for 12 GO terms related to, among others, taste receptor, oxidoreductase activity, fatty acid and lipid biosynthetic processes, and for one KEGG pathway (biosynthesis of unsaturated fatty acids; Table S9).
Using RADpainter, we aimed to identify genetic divergence and signatures of hybridization among populations.The RADpainter results mirrored the main patterns detected in the results of the other analyses.We find support for a relatedness scenario with two fully

| DISCUSS ION
We assessed the implications of utilizing commercially reared nonnative and native bumblebee subspecies in the UK for the genetic integrity of their wild native counterparts, and we asked whether F I G U R E 4 Clustered RADpainter coancestry matrix of the 125 individuals genotyped in this study.The darker the square, the higher the genetic similarity between a pair of individuals.these wild bumblebees showed signs of local and global adaptation.
This was achieved by investigating the population genomics of seven wild populations of Bombus terrestris throughout the UK as well as populations of commercially reared non-native and native subspecies.Minimal genetic structuring was detected in wild populations across the UK, and all populations were found to be genetically distinct from populations of the commercial subspecies, with low levels of introgression detected between commercial and wild populations.Wild bees occurring close to farms that utilized commercial subspecies for pollination were genetically assigned, typically in low proportions, to the commercial subspecies.We detected private alleles in the wild bees that can be used as genomic markers to help distinguish the native and non-native subspecies of B. terrestris.
In addition, we identified genomic regions potentially involved in adaptation.
Using different lines of investigation into population genetic structuring, we find that overall, wild populations of B. terrestris throughout the UK are not genetically distinct from one another, suggesting that there are no real barriers to gene flow.However, one exception is the population from far northern Scotland, Thurso, where individuals are relatively more strongly differentiated from the other wild populations.This suggests that there may be an element of isolation by distance in populations occurring at geographic extremes.The general results found here are mirrored by earlier studies where, for instance, a relatively homogenous populations structure was found across mainland Europe populations, but where high differentiation was detected between islands, such as between Mediterranean islands, and between Ireland and Britain, as well as between the islands and continental Europe (Estoup et al., 1996;Moreira et al., 2015;Widmer et al., 1998).Goulson, 2010).In addition, the lack of established populations of the commercial subspecies suggests that recent observations of winter-active B. terrestris in the southern UK (Hart et al., 2021;Robertson, 1991;Stelzer et al., 2010) may instead be native bees that are adapting to increasing temperatures and the availability of nonnative garden flowers for foraging.
Where introgression was detected was between the commercial subspecies and wild populations close to agricultural regions where greenhouses utilize commercial colonies.These wild populations include those in the regions of Bognor Regis, Broadway, Brockholes Sweden (Kardum Hjort et al., 2022;Seabra et al., 2019), and B. impatiens in New England, USA (Suni et al., 2017).The finding that individuals from the commercial native subspecies are genetically assigned in similar proportions to the wild individuals cluster and the commercial non-native bees cluster (at K = 2) may be due to the use of B. terrestris queens from several different origins in commercial breeding programs (Velthuis & van Doorn, 2006), and/or interbreeding in breeding facilities.
There are several factors that may be acting to limit introgression between the different subspecies of B. terrestris, despite the use of commercial colonies having presented opportunities for interaction between the groups.There may be mating barriers between the different subspecies, even though they reportedly can mate and produce viable offspring under laboratory conditions (De Jonghe, 1986;Ings et al., 2005;Lecocq et al., 2015), as well as under natural conditions where different subspecies co-exist, such as reported in southern France (Ings et al., 2005).For instance, male bumblebee sex pheromones, which act as an attractant for unmated queens (Kullenberg et al., 1973), have been found to vary distinctly in composition between dalmatinus, terrestris and sassaricus, whereas the subspecies terrestris and lusitanicus were found to be similar in composition (Coppee et al., 2008).These sex pheromones are thought to be crucial in the bumblebee mate recognition system, with males of several bumblebee species performing patrolling behaviour and scent marking objects that attracts conspecific virgin females (reviewed in Baer, 2003).Further, it could be possible that the timing of the production of the queens and drones may vary between the different subspecies, and the phenology of the different subspecies may be less well adapted to the UK climate (e.g.Owen et al., 2016).We find that, while no population in our analyses has distinctively higher genetic diversity than the other populations, commercial populations showed a tendency towards higher genetic diversity compared to wild populations.This higher level of genetic diversity may again be influenced by the breeding program techniques, including the use of queens from different origins (Velthuis & van Doorn, 2006) and crossing between lineages in breeding facilities.Other studies have found no large difference in genetic diversity between wild and commercial B. terrestris (Kardum Hjort et al., 2022;Seabra et al., 2019). Interestingly, although Kardum Hjort et al. (2022) find that both wild and commercial groups show moderate genetic diversity, they find that wild bumblebees have higher nucleotide diversity (π), while the number of segregating sites (θw) was higher in commercial bumblebees.These authors show that this discrepancy is reflected in a significantly positive Tajima's D that may be indicative of balancing selection in wild bumblebees in Southern Sweden.Additional studies have found somewhat higher genetic diversity in wild compared to commercial bumblebee populations (Hart et al., 2021;Moreira et al., 2015;Suni et al., 2017).These different patterns in diversity may be due to differences in commercial bumblebee rearing techniques, as well as differences in population dynamics, such as population contractions and declines between different geographic regions.Declines in diversity and abundance, as well as population contractions in wild bees, have been reported in several studies (e.g.Goulson & Hughes, 2015;Marshman et al., 2019).
Interestingly, we find putative evidence for both global and local genomic adaptation.We caution, however, that it remains a challenge, in general, to accurately disentangle the contributions of the different evolutionary processes (e.g.Charlesworth & Jensen, 2021, 2022).Charlesworth and Jensen (2022), for instance, highlight several neutral genetic processes, as well as demographic and selective forces, that likely play a role in restricting observed levels of DNA sequence variation in natural populations to a smaller range that would be expected from differences in their population sizes alone (Lewontin's Paradox).Their evaluations suggest that populations size, as well as selective sweeps and mutational bias, are of high relative importance in resolving this paradox.
Here, we find that regions of high nucleotide diversity tend to overlap across all comparisons, including between commercial and wild populations, suggesting that regions of the genome harbouring high diversity tend to be the same between populations of the different subspecies.The existence of typically high diversity regions across subspecies may be related to variation in recombination rates across regions of the genome -which we did not study here.However, regions of low nucleotide diversity did not significantly overlap.Several processes can drive low nucleotide diversity, and if more processes are acting there will be a higher likelihood that different factors are driving each population.Further, the finding that there is a lack of overlap between regions under putative global adaptation in the UK (characterized by low π, low dxy and high F ST ) in wild individuals, and low nucleotide diversity regions in commercial individuals, provides support for these low π regions of the genome being specific to the wild bees rather than being a general characteristic of the genome of both groups.
We found 21 genes in the regions of the genome putatively under global adaptation, and 182 genes in the regions under local adaptation.Notably, the genes found to be under local adaptation B. t. audax (Britain, Ireland and associated smaller islands), but commercially produced B. t. terrestris and B. t. dalmatinus have been imported for pollination (from the 1980s until 2015), particularly in central and southern England (Natural England).In such commercially bred species, artificial selection and hybridization may cause changes to life-history traits and ecological niche characteristics, multiple commercial colonies designated as B. t. terrestris-like (nonnative subspecies) and B. t. audax-like (native subspecies) respectively.Samples of B. t. dalmatinus, the third commonly commercially bred and supplied subspecies, were supplied as museum samples originally collected in Turkey, the native region of this subspecies.F I G U R E 1 (a) Map of UK showing the sampling localities of the wild-caught bees used in this study.Orange and green dots indicate locations close to and far from farms using commercial colonies respectively.(b) The two main axes of genetic variation of a PCA.

Finally
, to visualize genotypic relationships among individuals, we used the NetViewR v2.1.0program (Steinig et al., 2016) at multiple k-NN values (k-NN = 10-60).Optimization of k-NN values was performed by plotting each k-NN value against the number of communities detected using a "Fast-greedy", an "Infomap" and a "Walktrap" clustering algorithm.Networks were constructed at increasing values of k-NN (20, 40 and 60) allowing us to detect both fine-and large-scale genetic structure.As input data for NetviewR, we used a shared-allele distance matrix (1-IBS) that was generated by Plink from the full set of filtered variants in VCF format.
Bayesian algorithm was set to two genetic clusters, the commercially reared non-native and the wild individuals were assigned to different groups with high assignment scores (admixed individuals were exceptions, and none of these had assignment score (Q) < 0.5 to the correct group of origin).The commercial native and the B. t. dalmatinus bees were found to be admixed with similar proportion to the two defined genetic clusters.When further potential genetic structure was evaluated by increasing the number of K (K = 3), the commercial native bees showed the highest assignment score (Q ranging from 0.8049 to 1, average Q = 0.9523) to a new genetic cluster, and several wild individuals were also assigned to the cluster with low Q, with the exception of one bee from Broadway (Q = 0.8359).Evident substructure in the wild cluster (so identified with K = 2) arose when K was set to 4, with several wild-caught individuals showing substantial admixture with multiple ancestries.Exploring further potential clustering across the whole sample set failed to clearly identify a match between inferred genetic clusters and sampling locality, and thereby confirmed the low genetic structure across the wild bees in the UK.The Admixture plots corresponding to K = 2-10 are provided in the Figure S6 along with the results of the cross-validation techniques that suggested that the numbers of clusters with highest statistical support are K = 2-3 (Figure S7).The NetView results support the results of the previously applied population clustering methods and provide deeper insight into the interconnectedness within and between bumblebee populations.Setting K-NN to 20, 30 and 40, values that have been shown

Finally , F
ST statistics were congruent with the patterns of population structure revealed by the clustering approaches.Generally, we found low pairwise levels of relative genetic differentiation across the whole data set (F ST ranging from 0.0002 to 0.13), with the highest F ST values shown by the commercial populations.In the pairwise comparisons among the wild samples, Thurso showed the highest level of genetic divergence with the other populations.Similar patterns were detected using an absolute measure of genetic divergence (dxy).All genome-wide pairwise F ST values are reported in Table

F
NetView networks constructed at k-NN values of 20 (a and e), 40 (b and f) and 60 (c and g).Individuals are coloured based on sampling locality (e, f and g) and according to the Admixture assignment proportions at K = 2 (a, b and c).Admixture plot (d) is the same as Figure 2.
supported clusters: one that includes all the commercially reared native and non-native individuals, the B. t. dalmatinus bees, and a single wild individual from the locality of Broadway; a second cluster that includes the wild individuals (Figure4).In the first main cluster, RADpainter identified two well-resolved sub-clusters: the first grouping all the commercially reared non-native individuals, with high values of coancestry; the second grouping the commercially reared native individuals with the wild individual nested into, and the B. t. dalmatinus bees.In the second main cluster, only the individuals from the far north-eastern wild population, Thurso were grouped into a monophyletic clade (high ancestry values), while the remaining wild bees were assigned to several sub-clusters which do not necessarily correspond to the populations of origin (Figure4).Similar to what was highlighted by RADpainter, the hierarchical Bayesian model implemented in the program Entropy recovered a clear distinction between the wild and commercially reared nonnative samples, while, on average, the commercially reared native individuals and the B. t. dalmatinus bees showed higher levels of admixture (Figure5a).Notably, at K = 2 all individuals from the wild populations further from farms were assigned to the same genetic cluster with a very high assignment proportion (Q ranging from 0.9903 to 0.9998, average Q = 0.9984; Figure5a,b), while wild individuals closer to farms showed a slightly higher level of admixture with the cluster corresponding to the commercially reared non-native bees (Q ranging from 0.6572 to 0.9998, average Q = 0.9687).Similar patterns were observed at K = 3, where individuals from populations closer to the farms showed higher level of admixture with the two genetic clusters formed by commercially reared bees (Figure5a,c).Further sub-structuring at higher levels of K (4-10) was observed suggesting on the one hand a low level of divergence among the wild populations, and on the other hand a non-homogenous genetic background of the commercial bees, either natives or non-native subspecies (Figure5; FigureS8).
The wild and commercial populations of B. terrestris were found to be genetically differentiated, with generally only low levels of introgression observed between commercially reared populations and wild populations, and limited evidence for established populations of the commercial subspecies being detected.However, even limited introgression is likely associated with long-term risks for the conservation of wild bees.Importantly, these population genomic patterns are evident from several lines of investigation, including Admixture analysis, genotype relationships using NetView, estimates of F ST , genetic ancestry and implementing the hierarchical Bayesian model in Entropy.Interestingly, these patterns of limited introgression are evident despite seemingly ample opportunity for invasion and hybridization of non-native subspecies due to a history of importation of commercial subspecies from the 1980s until 2015 in the UK (Department for the Environment, Food and Rural Affairs, 2014;Goulson, 2010;Moreira et al., 2015).Further, this provides evidence contrary to predictions that B. terrestris in the UK is a single homogenous population in which native and imported subspecies have introgressed (including the native B. terrestris audax and nonnative B. terrestris dalmatinus originating from Greece and Turkey;

F
I G U R E 5 (a) Individual coancestry inferred by the hierarchical Bayesian model implemented in the program Entropy for K 2, 3 and 4. (b) Assignment scores (Q) to the commercial non-native cluster (inferred from Entropy at K = 2 and represented by green bars in a) of the wild bees close and far to the farms.(c) Assignment scores (Q) to the commercial native cluster (K = 3, blue bars in a) of the wild bees close and far to the farms.Nature Preserve and Deeping St Nicholas.Perhaps unsurprisingly, the overall direction of gene flow was found to be from commercial populations towards wild populations, which is similar to the findings of Seabra et al. (2019) in their analysis of introgression between commercial and native B. terrestris in the western Iberian Peninsula.Interestingly, one individual from Broadway stood out as being highly assigned to the commercial native bee cluster (at K = 3) using Admixture and Entropy.This individual is thus potentially either highly introgressed with native commercial bees, an escaped bee from farms utilizing native commercial colonies, genetically similar to the bees used to establish the commercial native stock, or a bee from established wild colonies of commercial bees.The geographic location of Broadway in the central region of the United Kingdom means that at the time of sampling there were farms using both non-native and native commercial colonies close to, or within foraging distance of the sampling location (Natural England), which could facilitate the different scenarios of introgression or escape.Indeed escapees from commercial colonies have been detected in other studies of B. terrestris in the Iberian peninsula and southern France and B. t. lusitanicus in the Iberian Peninsula.However, it has been suggested that warmer temperatures due to climate change may enable winter activity, and facilitate the successful establishment of commercial B. t. dalmatinus colonies in the south of the UK Expression of winter-active behaviour is usually restricted to populations occurring in the Mediterranean Basin, which includes the native range of B. t. dalmatinus, but also B. t. terrestris in southern