Colonization history and population differentiation of the Honey Bees (Apis mellifera L.) in Puerto Rico

Abstract Honey bees (Apis mellifera L.) are the primary commercial pollinators across the world. The subspecies A. m. scutellata originated in Africa and was introduced to the Americas in 1956. For the last 60 years, it hybridized successfully with European subspecies, previous residents in the area. The result of this hybridization was called Africanized honey bee (AHB). AHB has spread since then, arriving to Puerto Rico (PR) in 1994. The honey bee population on the island acquired a mosaic of features from AHB or the European honey bee (EHB). AHB in Puerto Rico shows a major distinctive characteristic, docile behavior, and is called gentle Africanized honey bees (gAHB). We used 917 SNPs to examine the population structure, genetic differentiation, origin, and history of range expansion and colonization of gAHB in PR. We compared gAHB to populations that span the current distribution of A. mellifera worldwide. The gAHB population is shown to be a single population that differs genetically from the examined populations of AHB. Texas and PR groups are the closest genetically. Our results support the hypothesis that the Texas AHB population is the source of gAHB in Puerto Rico.


| INTRODUC TI ON
The gentle Africanized honey bees of Puerto Rico (gAHB) are a unique population that combines some desirable traits, such as mite resistance (intense grooming and biting behavior that does not allow the proliferation of the mites). These bees have not been affected by losses common in the US and the world, as population samples showed an absence or low levels of seven viruses monitored in the National Honey Bee Health Survey (Madella et al., 2016). Also, gAHB have reduced colony defensiveness (Rivera-Marchand, Oskay and , and the least defensive colonies show the highest rate of foraging and honey reserves (Rivera-Marchand, Giray, & Guzmán-Novoa, 2008). This admixed population is part of the broader history of the accidental introduction of Africanized honey bees (AHB) to continental Brazil and later spread across the Neotropics and southern Nearctic. Since its introduction and spread, AHB has had significant ecological, agricultural, and human impact (Morse et al., 1973;Nelson et al., 2017). As part of this expansion and assisted by human transit, AHB arrived to Puerto Rico in 1994 (Cox, 1994). However, AHB's continental origin remains unknown and only one introduction event is thought to have occurred (see Rivera-Marchand et al., 2008;Galindo-Cardona et al., 2013).
Like the rest of the New World, Puerto Rico had an existing population of EHB, which were introduced by colonizers (Engel, 1999;Horn, 2005) prior to the arrival of AHB on the island. These EHB were likely an admixed population combining genetic diversity from current commercial "Italian" strains (C group) and initial historical stocks from Spain (M group) (Phillips, 1914;Taylor, 1977;Whitfield et al., 2006). However, by the time AHB arrived, this initial EHB population had been severely negatively impacted by the 1980s introduction of Varroa (de Guzman, Rinderer and Stelzer, 1997 (Guzman-Novoa & Correa-Benitez, 1996). In addition, honey bee colonies in Puerto Rico have not been affected by the degree of losses common in mainland US and other parts of the world (e.g., Oldroyd, 2007;Giray et al., 2010).
Though much is known about the events surrounding introduction and spread of AHB in the island and the selective pressures it experienced to become the gAHB (Avalos et al., 2017), the genetic origin and patterns of admixture of this population remain poorly understood. Past studies identified that gAHB is a contiguous population spanning Puerto Rico and two adjacent islands (Vieques, Culebra) with no detectable population substructure (Galindo-Cardona et al., 2013). Analysis of parental lineage through mitotype identification showed a single African matriline present in the island, in contrast with five detected in continental AHB populations (Rivera-Marchand et al., 2008). In addition, we know the population has retained a sizeable proportion of EHB alleles, with a suggested 40% introgression (Galindo-Cardona et al., 2013). Identifying the putative AHB founding population giving rise to gAHB can help understand the range and changes in genetic diversity leading to the evolution of this unique population and further inform how allelic profiles conferring both reduced colony defensiveness and parasite resistance may arise (Hunt et al., 2007;Navajas et al., 2008;Tsuruda et al., 2012).
In this study, we capitalize on a previous data set representing the widest geographical sampling available to date for honey bees (Whitfield et al., 2006), albeit with a greater representation of Africanized honey bees. We expand on this coverage by adding samples from the gAHB population in Puerto Rico. We implemented the combined data set to elucidate the recent genetic history of gAHB.
Specifically, we address three major aims: (a) to describe the genetic structure and ancestry contributions to the gAHB population in Puerto Rico, (b) to assess the geographic origin of gAHB parental populations, and (c) to examine the possible existence of populations with similar genetic profiles to that of gAHB in the broader spectrum of continental AHB genetic diversity. In addition, we assess if gAHB were a genetic mosaic in parts of its genome by contrasting whether alleles from one of the parental lineages were more frequent in gAHB than expected for particular markers. These aims provide a critical biogeographical context for a population known for its evolutionary novelty, furthering projects on current and future traits of interest.

| Data collection and processing
A total of 40 gAHB samples were collected from the Gurabo Apiary in Puerto Rico (18°15′27.65″N, 65°59′11.16″W, Figure 1 in Galindo-Cardona et al., 2013). To prevent oversampling of maternal alleles, only one bee per colony was subjected to genetic analysis. Samples were of different pupal stages to ensure colony origin. Genomic DNA from half the thorax of an individual honey bee was extracted using DNeasy extraction kit from QIAGEN ® with the animal tissue protocol. The extracted DNA was assessed using agarose gel electrophoresis (1%), NanoDrop (NanoDrop ND-1000), and Qubit Fluorometer (Invitrogen™), according to the manufacturer's instructions.

| Genotyping
The data were obtained with the same SNP panel used by Whitfield and colleagues (Whitfield et al., 2006). Briefly, we used Illumina's Bead Array Technology and the Illumina GoldenGate ® allele-specific extension assay (Illumina) with a custom Oligo Pool Assay (OPA), following manufacturer's protocols. Activated DNA targets were bound with allele-specific oligo (ASO), each dyed differently at the imaging stage (Whitfield et al., 2006).

| Genetic structure and ancestry in gAHB
We examined genetic clustering and population structure via discriminant analysis of principal components (DAPC; Jombart, Devillard, & Balloux, 2010) and STRUCTURE (Pritchard, Stephens, & Donnelly, 2000). Genetic structure via DAPC comprised the determination of optimal clusters achieved by using the find.clusters() function in the adegnet R package (Jombart et al., 2010). The approach applies successive k-means clustering of a PCA derived from the genotype matrix (917 SNP × 370 samples) and produces a goodness of fit BIC criteria for each level of k ( Figure 1a). In this analysis, k represents a "preselected parameter corresponding to an a priori number of populations or genetic groups, represented by a set of allele frequencies described in the data" (Pritchard et al., 2000).
The optimal number of k and corresponding sample assignations to these clusters are used to identify the principal components that maximize differentiation between clusters while minimizing differentiation within clusters (Jombart et al., 2010). These were juxtaposed with our a priori bins of samples to outline genetic history vis-a-vis geographic distribution and parental origin (Figure 1b).
A separate STRUCTURE analysis was run with gAHB (n = 40) and Texas AHB (n = 101) populations to determine differentiation of these two populations ( Figure 2). We also compared DAPC cluster assignation with phylogenetic relationships between samples.
Our approach used functions from the ape package in R to derive Euclidean distances between samples using the genotype matrix to create a per-sample neighbor-joining tree (Figure 3b). This way, we could examine the genetic proximity of mis-assigned samples.

| Geographic origin of gAHB
Using the combined 370 sample × 917 SNP data set, we applied a phylogenetic analysis to identify the genetic and geographic source of gAHB. Our approach used Prevosti's absolute genetic distance (Prevosti, Ocaña, & Alonso, 1975) to quantify individual relationships within and among populations. These distance matrices were reduced to (a) a rooted dendrogram at the population level using an Unweighted Pair Group Method with Arithmetic Mean (UPGMA) hierarchical clustering strategy (Sokal and Michener, 1958), and (b) an unrooted neighbor-joining (NJ) phylogenetic tree (Nei & Saitou, 1987).

| Mosaic test
We tested the deviation from an admixture model for specific markers by comparing all SNP marker allele frequencies across an F I G U R E 1 Identification of unsupervised genetic clustering via k-means selection. a, Plot of the Bayesian information criteria (y-axis) used to select the optimal number of possible genetic clusters (x-axis) in our data set. A k = 8 number of clusters was optimal for this data set (highlighted by an asterisk). b, The plot illustrates relationship of cluster memberships between prior population clusters (y-axis) and derived unsupervised genetic clusters (x-axis) for the data set. Square size indicates number of samples as defined in the legend

| Genetic structure and ancestry in gAHB
Results of the DAPC cluster assessment identified an optimal number of K = 8 genetic clusters in the data set ( Figure 1a)  F I G U R E 3 Structure clusters derived from genetic similarity across the data set. a, Principal components analysis (PCA) of the data set. b, Neighbor-joining tree plotted as an unrooted cladogram of the same data set used to explore genetic relationship between samples and its correlate with cluster assignation. Specific labels are provided for a subset of nine samples from continental hybrid populations (WWR, Latin American Transect) grouped in the same cluster as all the gAHB samples. In both panels colors are provided to highlight the previously defined genetic clusters (as in Methods 2.4, also in Figure 1b) the hybrid spectrum, it is a stable population found on the island, supporting a single, undivided population.

| Geographic origin of gAHB
We examined the combined 917 SNP × 370 sample data set to explore the origin of gAHB in Puerto Rico. We included New World groups (Texas, Brazil and gAHB) into the STRUCTURE analysis using genetic distances among the groups (Figure 4).

| Cluster assignment population
A PCA of the data set was conducted to examine population structure using the K = 8 clusters (Figure 3a). This analysis showed the gAHB cluster (Cluster 3, green) to be intermediate between the mostly AHB cluster (Cluster 4, pink) and the mostly EHB cluster (Cluster 8, orange; Figure 3a). Cluster assignation also identified some samples clustering with gAHB. Further examination revealed these samples to be from the WWR temporal transect and the Latin American geographic transect conducted by Whitfield et al., (2006).
Most of the samples clustered with gAHB had African mitotypes as reported by Whitfield et al. (2006). Cross-referencing collection dates with the identified WWR samples showed that most of the samples that fell within Cluster 3 corresponded to the early portion of the time series (1995)(1996) which correlated with the earliest description of AHB in Puerto Rico (Cox, 1994). The other misidenti-

| Mosaic test
We tested the mosaic hypothesis by comparing all SNP marker allele frequencies across a calculated hybrid frequency of AHB (from Arizona and Texas samples) and EHB bees (Texas and Managed colonies) in the samples of gAHB allele frequencies. The correlation coefficient for this comparison was 0.86 (r = 0.86, df = 916, p < 0.01), and Mahalanobis distance analysis revealed 60 SNPs that were outliers; they had allele frequencies with a significant deviation from the expected admixture frequency ( Figure 5). This is greater than 6.5% of all SNPs, indicating that there are at least six times more loci than what either parental population would resemble by chance (cut off for outliers was <0.01), supporting the mosaic hypothesis. Because of markers' dispersion (i.e., only ~917 total number of markers were considered, 60 outliers identified as significantly different from hybrid), associations with known genes and traits were not explored.

| D ISCUSS I ON
The genetic structure of gAHB found in Puerto Rico supports a single colonization event, as indicated by monophyly of this group.
Specific phylogeographic relations indicate the potential source population to be the Texas AHB. The single colonization hypothesis Rico (Delgado et al., 2012;Rivera-Marchand, Oskay and Giray, 2012). gAHB is a distinct population derived from the broader AHB hybridization spectrum, further evolved on the island (Avalos et al., (2017) and Figure 3b). It appears that the colonization event c.a. 1994 (Cox, 1994) initiated a process of hybridization that after 20 years leads to the establishment of an admixed island population (Avalos et al., 2017). Currently, the island population is an important reservoir of genetic diversity with traits of high interest for apiculture and agriculture as discussed in relation to Varroa resistance, reduced aggressiveness, and low viral load.
One hypothesis addressing the distinct genetic variation in the gAHB population is that introgression of alleles varied in different proportion from locus to locus, making some traits mostly African, others European. Honey bees have a relatively small genome (Honey bee Genome Sequencing Consortium, 2006) and the highest recombination rate reported of any multicellular organism so far (Beye et al., 2006). These characteristics foster the rapid development of novel combinations of genetic variation. In Puerto Rico, gAHB has undergone a soft selective sweep favoring retention of genetic variation in the frequency profile of many alleles across the genome (Avalos et al., 2017) and leading to a genetic mosaic.
Previously, even with only a few markers it was observed that two of eight microsatellite loci tested deviated from the expected allele frequencies based on an admixture model (Galindo-Cardona et al., 2013). We tested this hybrid mosaic hypothesis now with 917 markers across the genome and found 6.5% of the markers to deviate from the admixture model, demonstrating the "mosaic" characteristic.
Cluster and assignment analyses converge in that ( The gAHB population (Cluster 3) placement agrees with the close pattern of the New World clusters that shows a recent and likely ongoing admixture of variable degrees of intensity (see Figure 3a, and Whitfield et al., 2006). In addition, gAHB lies intermediate in the spectrum between EHB and AHB groups (Clusters 8 and 4, Figure 3a).

| CON CLUS ION
We conclude that AHB on PR hybridized with EHB and processes of local selection and extraordinary features of the island resulted in an "island bee" currently called gAHB. The ancestral parental gAHB came from Texas. The gAHB population has diverged from its origin (Texas) and is a population with a distinct stable genetic structure. Our results suggest that gAHB may represent a new ecotype of Apis mellifera.

CO N FLI C T O F I NTE R E S T S
None of the authors have any competing interests.