Admixture in Africanized honey bees (Apis mellifera) from Panamá to San Diego, California (U.S.A.)

Abstract The Africanized honey bee (AHB) is a New World amalgamation of several subspecies of the western honey bee (Apis mellifera), a diverse taxon historically grouped into four major biogeographic lineages: A (African), M (Western European), C (Eastern European), and O (Middle Eastern). In 1956, accidental release of experimentally bred “Africanized” hybrids from a research apiary in Sao Paulo, Brazil initiated a hybrid species expansion that now extends from northern Argentina to northern California (U.S.A.). Here, we assess nuclear admixture and mitochondrial ancestry in 60 bees from four countries (Panamá; Costa Rica, Mexico; U.S.A) across this expansive range to assess ancestry of AHB several decades following initial introduction and test the prediction that African ancestry decreases with increasing latitude. We find that AHB nuclear genomes from Central America and Mexico have predominately African genomes (76%–89%) with smaller contributions from Western and Eastern European lineages. Similarly, nearly all honey bees from Central America and Mexico possess mitochondrial ancestry from the African lineage with few individuals having European mitochondria. In contrast, AHB from San Diego (CA) shows markedly lower African ancestry (38%) with substantial genomic contributions from all four major honey bee lineages and mitochondrial ancestry from all four clades as well. Genetic diversity measures from all New World populations equal or exceed those of ancestral populations. Interestingly, the feral honey bee population of San Diego emerges as a reservoir of diverse admixture and high genetic diversity, making it a potentially rich source of genetic material for honey bee breeding.


| INTRODUC TI ON
Hybridization, the interbreeding of distinct genetic lineages, has long complicated taxonomic boundaries and challenged the perception of species as discrete taxonomic and evolutionary units. Initially considered an evolutionary dead end, hybridization is now recognized as a driver of adaptation and diversification across various evolutionary lineages. Sunflower (Helianthus) hybrids colonize habitats prohibitive to parental species (Rieseberg et al., 2003;Whitney et al., 2015). Admixture jumpstarted the spectacular diversification and adaptive radiation recognized in African cichlids (Seehausen, 2004).
The Africanized honey bee (AHB) is a human-mediated hybrid of the American continents, created from the intercross between an African subspecies (Apis mellifera scutellata) and various European and Middle Eastern honey bee subspecies. The western honey bee (Apis mellifera) is taxonomically diverse, comprising over thirty recognized subspecies traditionally clustered into four major, lineages based on genetic, geographic, and morphometric data: A (African), M (Western European), C (Eastern European), and O (Middle East and Anatolia) Ruttner, 1988;Wallberg et al., 2014;Whitfield et al., 2006). Recently, new lineage groupings have been proposed: Y, for bees from the Arabian Peninsula (see, e.g.,  and Z (Alburaki et al., 2013) referring to bees of the traditional O clade from Syria. Here, we are particularly interested in which honey bee lineages contribute to New World bee populations and we use the A, M, C, and O nomenclature because known importations of bees to the New World come from these clades (Carpenter & Harpur, 2021;Ruttner, 1988). Substantial variation in behavior, morphology, and genetics exists across honey bee subspecies, even within the overarching clades. The Eastern European subspecies (C clade) are particularly favored in commercial beekeeping due to their gentle nature and predilection for honey production. In contrast, African subspecies are largely disfavored for both commercial and hobbyist use due to the intensity of their nest defense and high propensity to abscond (abandon the nest en masse and move to another (Ruttner, 1988)).
Early honey bee importations to the Americas were largely Western European (M) and Eastern European (C) in origin (reviewed in Schneider et al., 2004). Generally, African (A) subspecies were excluded from importation with modest exceptions (see Schiff & Sheppard, 1993). However, temperate-adapted, non-native European honey bees (EHB) struggled to thrive in the Neotropics.
In response, honey bee researchers in Sao Paolo, Brazil initialized a breeding program in 1956, importing 47 queens of the African subspecies (A. m. scutellata) for experimental crossing (reviewed in Schneider et al., 2004). Researchers bred this African subspecies with European races, hoping to forge a superior hybrid for tropical beekeeping. These hybrid "Africanized" honey bees (AHB) escaped from their experimental apiaries and spread into the surrounding countryside (reviewed in Schneider et al., 2004).
The expansion of the AHB across the American continents over the past 60+ years is considered one of the "most spectacular biological invasions of all time" (Pinto et al., 2005). From their original Brazilian epicenter, AHB spread across South and Central America, rapidly hybridizing with and replacing the pre-existing European honey bee population with one of predominantly African ancestry (reviewed in Schneider et al., 2004 and references therein;Whitfield et al., 2006;Nelson et al., 2017;. On the southern front, AHBs reached their range limit in Argentina in the 1970s at approximately 34° south latitude, presumably stopped from advancing further by the colder climate (Taylor & Spivak, 1984).
In their northern expansion, AHB reached Panamá by 1982, Costa Rica by 1986, Mexico by 1989, Texas by 1990, and California by 1994(Kim & Oguro, 1999. Currently, African mitochondria and nuclear markers are present in feral honey bees in California as far as 38° north latitude (Calfee et al., 2020;Kono & Kohn, 2015;Lin et al., 2017). While the current northern range limit may not be stable in the face of climate change, northward range expansion has clearly slowed in comparison to its explosive (160-500 km/year) neotropical expansion (Schneider et al., 2004 and references therein).
The dramatic shift from European to predominantly African ancestry throughout the majority of the New World suggests that Africanization provides ecological advantages in the areas where it dominates. Several behavioral and physiological traits of the Africanized honey bee are thought to drive AHB success: high reproductive rates; intense nest defense (McNally & Schneider, 1992a, 1992bFewell & Bertram, 2002; see also Breed et al., 2004); and higher resistance to infestation from the mite (Varroa destructor), a parasite and disease vector implicated in honey bee nest failure (Goulson et al., 2015;Guzman-Novoa et al., 1996). The advantages, if any, AHB derives from its remaining European ancestry are less clear although Nelson et al. (2017) identified European genes underlying ovary size inferred to be selected for and Harpur et al. (2020) showed that both African and European ancestry underlies AHB nest defense.
In this study, we characterize the admixture and genetic diversity of AHB in the Neotropics and the southwestern United States using a dataset of 60 high depth (~25×) AHB whole genome sequences (WGS) collected from four regions spanning ~6000 km. Each sampling site reflects a distinct time since initial contact between resident European and advancing Africanized forms: the isthmus of Panamá; Guanacaste NP, Costa Rica; Chiapas, Mexico; and San Diego County, CA, U.S.A. We leverage two existing sequencing projects (Harpur et al., 2014;Wallberg et al., 2014) and the recently published honey bee reference genome with chromosome-length scaffolds (Wallberg et al., 2019) for our analyses. This is the first genomic study to assess ancestry in AHB samples from Central America and Mexico as well as the first to assay the contribution of the O lineage in the regions sampled. The contribution of this lineage to the California honey bee population was not evaluated in previous genomic studies (Calfee et al., 2020; and is of interest because mitochondria from this lineage are known to be present at considerable frequency in southern California's feral honey bees (Kono & Kohn, 2015) and occasionally elsewhere in the United States (Magnus & Szalanski, 2010). Ultimately, our study aims to broaden our understanding of the admixture dynamics of a massive hybrid takeover of an invasive social insect of great agricultural importance. were workers collected while foraging on flowers. San Diego bees were collected across 15 sites each separated by >5 km so that each likely represents a worker from a different colony. The furthest collection sites were separated by 65 km. Collection sites ranged from urban to rural settings. Due to the presence of hobbyist and agricultural beekeeping we do not rule out the possibility that the captured honey bees were from managed rather than feral hives. However, most honey bee foragers in San Diego are from feral hives (Kono & Kohn, 2015, and see results).  Apis mellifera carnica (n = 19) and Apis mellifera ligustica (n = 10) and 20 genomes from the Middle Eastern (O) clade, including the subspecies: Apis mellifera anatoliaca (n = 10) and Apis mellifera syriaca (n = 10) (see Tables 1, S1). In total, we used a panel of 89 reference honey bee genomes representing the four major honey bee clades and spanning 7 subspecies.

| DNA extraction & sequencing
We

| Variant calling and genotype likelihood estimation
We used the program ANGSD v0.930 (Korneliussen et al., 2014) to call variant sites and estimate genotype likelihoods (--doGlf 2) across all 159 honey bee genomes. The major and minor alleles were inferred (--doMajorMinor 1) as follows. A threshold likelihood ratio for SNP calling was set (--SNP_pval 1e-6) and allele frequencies were calculated using inferred major and minor alleles (--doMaf 1). In addition, we discarded reads with a mapping quality below 30 and a base quality below 20. We removed tri-allelic sites and only included sites in which we had at least 63% of individuals reporting information with a depth of coverage of at least 3×. We chose a genotype likelihood approach over a called genotype approach such as that used by other ancestry software progams like ADMIXTURE (Alexander et al., 2009) as genotype likelihoods have been shown to be robust to low-coverage sequencing data (Korneliussen et al., 2014;Skotte et al., 2013). This SNP set was then thinned for linkage disequilibrium, keeping 1 in every 100th SNP for an average spacing of 689-bp distance between SNPs.

| Admixture and principal components analysis (PCA)
For admixture analysis, we used the program NGSadmix (Skotte et al., 2013), which uses a genotype-likelihood approach that factors in uncertainty associated with next-generation sequencing and has been shown to have good performance even with lowcoverage data. We ran NGSadmix using the BEAGLE genotype likelihood files created by ANGSD with K values ranging from 2 to 6 (K = number of assumed genetic clusters). We included only SNPs that were present in at least 94% of all individuals and had a minimum minor allele frequency of 5%. Here, we focus on the results from K = 4 genetic clusters because we are interested in assessing the contributions of the four ancestral lineages (A, M, C, and O) historically imported into the Americas. However, we also report admixture results from K = 2-6 in Figure S1. We used R (R Core Team, 2014) to graph admixture estimates. We used PCAngsd (Korneliussen et al., 2014) to conduct a principal components analysis of all SNPs, and graphed the resulting PCA with R (R Core Team, 2014).

| Mitochondrial sequence assembly and phylogenetic analysis
Filtered reads of all 60 New World honey bees were aligned to a mitochondrial reference genome from an individual of subspecies A. m. ligustica sequenced by Crozier and Crozier (1993)

| Measures of genetic diversity
To assess allelic diversity, we calculated estimates of both pairwise theta (̂ π ), based on the number of mean pairwise differences between sequences, and Watterson's theta (̂ w ), based on the measure of segregating sites for each sampled and reference population using ANGSD v.928 (Korneliussen et al., 2014). Our reference populations were created by including honey bees from both the did not sequence bees from this lineage. We additionally calculated genetic diversity measures for each reference population, separated by sequencing project, to ensure that the different sequencing methods did not unduly influence our results (Table S3). Using only sites in which at least 50% of individuals in a population provided data, we estimated the folded site frequency spectrum (SFS) across the entire genome using the reference honey bee genome as the ancestral state. We then calculated and averaged thetas per site, including invariant sites, using ANGSD's realSFS program.
To ensure that our diversity estimates were not overly affected by the difference in coverage between our reference and newlysequenced genomes, we calculated an additional measure of pairwise nucleotide diversity (̂ π ) using only higher confidence SNPs with >5% minor allele frequency (MAF) in the total sample, following a pipeline described in Calfee et al. (2020). Using ANGSD, we first identified a set of SNPs with >5% minor allele frequency in the total sample and inferred the major and minor alleles at those SNPs

| Principal component analysis (PCA)
The

| Mitochondrial ancestry in Africanized honey bee samples
Each mitochondrial sequence from New World honey bees   Figure S2.

| Genetic diversity
All four sampled AHB populations have similar levels of genetic diversity for all three estimators which use both genome-wide sites and a smaller number of high-quality SNPs (  et al., 2002). Beekeeping with C-lineage honey bees was widespread across Mexico prior to the arrival of AHB, with an estimated 1.5 million managed colonies present throughout the country and Mexico remains one of the largest exporters of honey on the global market (Guoda et al., 2001;Winston, 1979). In contrast, Costa Rica and Panamá both had modest managed beekeeping activity prior to AHB arrival, and feral EHB colonies were quite rare, particularly in the rainy lowlands (Lobo, 1995;Roubik & Boreham, 1990). Additionally, many beekeepers in Central America abandoned the trade after AHB arrival (van Veen et al., 1998). Thus, AHB likely encountered a much smaller population of EHB in Central America than in Mexico, allowing for a rapid and extensive Africanization of the honey bee gene pool.
In striking contrast to the honey bees from Mexico and Central

America, African ancestry in honey bees collected in San Diego
County, California (U.S.A.) is relatively low (x = 37.5% ± 1.12%) and San Diego bees feature substantial genomic contributions from all four major honey bee lineages (Figures 1 and 2). Eastern European (C) ancestry (x = 34.8% ± 1.14%) in San Diego honey bees is substantially higher than the other three sampled sites while the contribution of the Western European (M) lineage (x = 19.1% ± 0.41%) is also somewhat elevated (Figure 1, Table 2). Honey bees from San Diego also have lower African genomic content in comparison to those from Texas and Arizona (~75% A) (Bozek et al., 2018;Whitfield et al., 2006). in southern California (Calfee et al., 2020;. However, this is the first assessment of Middle Eastern (O) ancestry in southern California honey bees.
Why is African genomic content in southern California bees much lower than elsewhere? We explore three hypotheses that might account for this. First, models built from climate data fitted to the southern AHB range limit predict that colder winter weather plays a considerable role in halting AHB expansion. (Harrison et al., 2006;Southwick et al., 1990;Taylor & Spivak, 1984). Nearer the northern Second, gene flow from managed European honey bee populations could restrain the introgression of genes of African origin in San Diego County. In the United States, AHB are generally considered unfit for apiculture and commercial agriculture due to undesirable characteristics such as a higher propensity to sting and to abandon their nests (reviewed in Schneider et al., 2004). The desired lineages of European clades are actively maintained via consistent requeening of colonies with mated queens of European origin (Schiff & Sheppard, 1995. However, such beekeeping practices have failed to noticeably inhibit the introgression of high levels of African genes into feral Texas and Arizona bee populations (Bozek et al., 2018;Pinto et al., 2005;Whitfield et al., 2006). One potential mitigating factor preventing excessive Africanization of San Diego honey bees may be the large agricultural presence in the county with ~230,000 acres of planted crops, many of which (e.g., avocados and citrus) use honey bees for pollination services (San Diego County Crop Statistics Annual Report, 2019). Gene flow from high-density European managed hives could counter Africanization.
However, genetic swamping by managed honey bees would require that a substantial fraction of the honey bees in San Diego County come from managed, genetically European hives. Our finding that all 15 foraging workers examined here had substantial African and Middle Eastern ancestry-lineages not used in managed coloniesargues against this. This finding is consistent with the hypothesis, supported by previous (Kono & Kohn, 2015) and current mitochondrial data, that most bees foraging in San Diego County, whether in urban or nonagricultural rural settings, derive from feral, Africanized

colonies.
Finally, insufficient time may have elapsed since the introduction of the AHB to San Diego County for African ancestry to reach levels comparable to those seen elsewhere in the southwestern U.S. (Pinto et al., 2005;Whitfield et al., 2006). AHB arrived in San Diego County in 1994 and our bees were sampled more than two decades later, suggesting either that Africanization is taking much longer than in Texas, or differences in conditions in San Diego relative to Texas lead to reduced penetration of the African genome.
Notably, all four sampled regions report a significant amount of Western European (M) ancestry. Studies that have tracked the process of Africanization elsewhere have shown that African genetic material largely or completely replaces genomic content from the Eastern European (C) lineage, while the contribution from the M lineage to genomes of AHB remains substantial and is never completely eliminated (Clarke et al., 2002;Nelson et al., 2017;Pinto et al., 2005;Whitfield et al., 2006 which is associated with a QTL for worker ovary size (Calfee et al., 2020;Nelson et al., 2017). In addition, genes of both M and A ancestry appear to underlie nest defense behavior in AHB (Harpur et al., 2020). Further work is needed to determine whether these regions of M ancestry are under selection in our sampled populations. Alternatively, small amounts of M ancestry may be hitchhiking within predominantly African genomes.
Mitochondrial analysis of our four sampled populations is largely consistent with findings from nuclear genomes (Figure 3; Additionally, depending on the estimator, the genetic diversity of admixed populations exceeds or equals that of the African clade, the honey bee lineage previously found to have the highest genetic diversity (Calfee et al., 2020;Espregueira Themudo et al., 2020;Harpur et al., 2012;Wallberg et al., 2014). The high level of genetic diversity of admixed populations likely results from both substantial contributions from the genetically diverse African lineage in addition to admixture bringing together variation found among ancestral lineages (Harpur et al., 2012 (Carpenter & Harpur, 2021;Ruttner, 1988 (Hung et al., 2018(Hung et al., , 2019. This occurs in spite of carrying detrimental viral diseases at titers similar to those found in managed bees, suggesting they can resist negative viral affects for which managed hives receive mitigating treatments (Geffre et al., 2021). Future work to determine local genomic ancestry could investigate selection on genomic regions that consistently come from African versus European lineages. Such regions, and the genes they contain, are critical to understanding the genetic changes that underlie the ecological success of Africanized honey bees. Such analyses could also shed light on the locations and origins of genomic regions useful for breeding managed honey bees to be more resistant to factors currently harming the honey bee industry.

ACK N OWLED G M ENTS
We

DATA AVA I L A B I L I T Y S TAT E M E N T
All DNA sequence files are deposited and publicly available on the Dryad database (https://doi.org/10.5061/dryad.xwdbr v1ff).