Species complex diversification by host plant use in an herbivorous insect: The source of Puerto Rican cactus mealybug pest and implications for biological control

Abstract Cryptic taxa have often been observed in the form of host‐associated species that diverged as the result of adaptation to alternate host plants. Untangling cryptic diversity in species complexes that encompass invasive species is a mandatory task for pest management. Moreover, investigating the evolutionary history of a species complex may help to understand the drivers of their diversification. The mealybug Hypogeococcus pungens was believed to be a polyphagous species from South America and has been reported as a pest devastating native cacti in Puerto Rico, also threatening cactus diversity in the Caribbean and North America. There is neither certainty about the identity of the pest nor the source population from South America. Recent studies pointed to substantial genetic differentiation among local populations, suggesting that H. pungens is a species complex. In this study, we used a combination of genome‐wide SNPs and mtDNA variation to investigate species diversity within H. pungens sensu lato to establish host plant ranges of each one of the putative members of the complex, to evaluate whether the pattern of host plant association drove diversification in the species complex, and to determine the source population of the Puerto Rican cactus pest. Our results suggested that H. pungens comprises at least five different species, each one strongly associated with specific host plants. We also established that the Puerto Rican cactus pest derives from southeastern Brazilian mealybugs. This is an important achievement because it will help to design reliable strategies for biological control using natural enemies of the pest from its native range.


| INTRODUC TI ON
Herbivorous insects are often involved in close interactions with their hosts since they are a food source and provide mating and oviposition sites (Schoonhoven, Van Loon, van Loon, & Dicke, 2005).
Such intimacy often entails the evolution of adaptations that allow insects to cope with specific features of their host plants. Hence, host plant shifts may affect the evolution of features associated with feeding location, oviposition, and development on the host (Orsucci et al., 2018;Schoonhoven et al., 2005). Adaptation to new hosts may cause, either as a direct consequence or as a by-product, the evolution of sexual isolation, highlighting the role of host plant shifts in speciation (Funk, Nosil, & Etges, 2006;Nosil, 2012).
Phytophagous insects with presumptively wide host ranges, that is, generalists, pose a concern from an applied view when dealing with insect pests. Many authors have suggested that polyphagous species are formed by locally adapted specialized populations or cryptic specialist species (e.g., Forbes et al., 2017;Loxdale & Harvey, 2016). A species complex made up of cryptic specialists pose an additional complication in defining host range. The distinction between true polyphagous and cryptic specialist species is crucial for the design of biological control programs (see below). In addition, cryptic species complexes offer the opportunity to investigate the role of host plants in the diversification of herbivorous insects since such complexes often include species of recent origin (Bakovic et al., 2019;Malka et al., 2018;Winter, Friedman, Astrin, Gottsberger, & Letsch, 2017).
Distinguishing between intra-and interspecific genetic variation is particularly relevant in cases of suspected cryptic species, especially when members of a guild of cryptic species are involved in invasions of new areas. Invasive insects may cause global problems upon spread to new territories, not only to agriculture, but also to biological diversity since invasive species are among the main causes of biodiversity loss (Newbold et al., 2015). The proper identification of invasive species is a necessary task to design successful biological control strategies to prevent or reduce harmful effects of invaders.
With the advent of molecular data, an increasing number of studies showed that some putative polyphagous insects are, actually, species complexes embracing several deeply diverged species, each one specialized to different host plant use (Egan, Nosil, & Funk, 2008;Nosil et al., 2012;Powell, Forbes, Hood, & Feder, 2014; but see Vidal, Quinn, Stireman, Tinghitella, & Murphy, 2019). So far, most population surveys have been based on the evaluation of a single genetic marker, the so-called DNA-barcoding gene encoding cytochrome oxidase subunit I (mtDNA) (Dinsdale, Cook, Riginos, Buckley, & De Barro, 2010;Stouthamer et al., 2017). However, it is well known that in many cases, this marker provides information limited to the mitochondrial lineage, potentially resulting in misidentification of species boundaries due either to incomplete lineage sorting or introgressive hybridization (Després, 2019;Eyer, Seltzer, Reiner-Brodetzki, & Hefetz, 2017).
Recently, population genetic studies benefited from availability of new methodologies based on high-throughput sequencing of genomic libraries containing a reduced-representation of nuclear genomes, known as genotype by sequencing, such as RADseq (Andrews, Good, Miller, Luikart, & Hohenlohe, 2016;Baird et al., 2008). These methodologies provide large multilocus datasets that can be used to evaluate cryptic diversity in species complexes (e.g., in Elfekih et al., 2018) and to trace the origin of invading pests (Anderson, Tay, McGaughran, Gordon, & Walsh, 2016;Ryan et al., 2019).
The cactus mealybug pest invading Puerto Rico and the adjacent smaller islands represents a threat for cactus diversity in the Caribbean, Central and North America. The pest was initially reported as Hypogeococcus pungens Granara de Willink (Hemiptera: Pseudococcidae), under the common name Harrisia cactus mealybug (HCM), the successful biological control agent released against an invasive cactus in Australia and South Africa (McFadyen & Tomley, 1978, 1981Paterson et al., 2011). Currently, H. pungens sensu lato is considered a species complex, and in its native range (South America), it was reported to feed on species of Amaranthaceae, Portulacaceae, and Cactaceae (Ben-Dov, 1994;Claps & Haro, 2001;Zimmermann, Pérez, Cuen, Mandujano, & Golubov, 2010). Recent studies based on molecular data and assessing reproductive compatibility (Poveda-Martínez et al., 2019) and performance on different hosts plants (Aguirre et al., 2016) suggest that populations collected in Argentina, initially identified as H. pungens sensu lato, were, actually, a species complex comprising at least two species: one mealybug species feeding on Amaranthaceae (H. pungens sensu stricto) and the other an undescribed species feeding on cactus. Interestingly, the Puerto Rican cactus pest appears closer to H. pungens sensu stricto in phylogenetic trees suggesting that the pest shares a recent ancestor with the latter rather than with the sympatric, cactus feeding new species from Argentina. However, none of the mitochondrial haplotypes found in Argentina matched the single haplotype detected in the Puerto Rican mealybugs feeding on Cactaceae, suggesting that the source population of the pest was not Argentina (Poveda-Martínez et al., 2019).
In the present study, we extended the sampling effort to a large geographic area comprising both the native (Argentina, Paraguay, and Brazil) and non-native (Puerto Rico and southern United States) ranges by collecting mealybugs on host plants recognized as part of the diet of H. pungens sensu lato. We used a combination of genome-wide SNPs and mtDNA variation to investigate: (a) genomic diversity within H. pungens species complex, (b) host plant ranges of each one of the putative members of the complex; (c) whether host plant shifts drove the diversification in the species complex, and (d) the source population of the Puerto Rican cactus pest. Based on these results, we will be able to search for and select biological control strategies using natural enemies, either those that co-evolved with the pest (classical biological control) or those that did not co-evolve but attack closely related species to the pest (new association biological control).

| Sample collections and DNA extraction
Samples of the mealybugs were collected in the native range of H. pungens sensu lato in three South American countries (n = 80) and in the recently invaded areas of Puerto Rico and the continental United States (n = 93). Mealybugs were collected on the host plants reported as part of the diet of H. pungens sensu lato (Cactaceae, n = 89; Amaranthaceae, n = 69; and Portulacaceae, n = 25). In the native range, mealybugs were collected in northern and northwestern Argentina (n = 21 on Cactaceae and n = 18 on Amaranthaceae), along the Atlantic coast of Brazil (n = 23 on Cactaceae; n = 11 on Amaranthaceae and n = 4 on Portulacaceae), and in western Paraguay (n = 3 on Cactaceae). We also included samples collected in the non-native range in Puerto Rico (n = 42 on Cactaceae; n = 24 on Amaranthaceae, and n = 11 on Portulacaceae), southeastern United States (n = 11 on Amaranthaceae), and in southwestern United States (n = 5 on Cactaceae). Additionally, we included samples from southeastern Australia (n = 5 on Cactaceae), where H. pungens sensu lato from Argentina was introduced as a biological control agent against cactus weeds in the 1970s (McFadyen, 2012). All individuals were preserved in 100% ethanol and stored in a freezer until DNA extraction. Information concerning sampling localities, geographic coordinates, and host plants species is presented in Figure 1 and Table 1. Genomic DNA was extracted using entire bodies of adult female mealybugs using Qiagen DNeasy Blood & Tissue Kit according to manufacturer's instructions, adding 2 µl of RNAse A after the lysis step. DNA was quantified using Qubit 2.0 Fluorometer (Life Technologies), and quality was assessed in a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies Inc.).

| SNP calling and genomic data filtering
The genotyping analysis used custom scripts (SNPsaurus, LLC) that trimmed the reads using bbduk (BBMap, http://sourc eforge.net/ proje cts/bbmap/) (Bushnell, 2017) (Bushnell, 2017). The assembly was done using abyss-pe (Jackman et al., 2017). Assembled contigs shorter than 250 bp were removed and then aligned using blastn to the NCBI nt database. Blast hits to bacterial species were removed. Reference draft genome is available as FuEDEI_HPun_1.1.fa at the NCBI under accession number: JAAOIU000000000. Cleaned reads were then mapped to the reference mealybug draft genome with an alignment identity threshold of 0.95 using bbmap (BBMap tools). Genotype calling was done by using callvariants (BBMap tools). The Variant Calling File generated was filtered to remove individuals with more than 10% missing data, sites with more than 20% missing genotypes, and loci with minor allele frequency (MAF) lower than 0.03 using VCFtools v1.15 (Danecek et al., 2011). We randomly selected one SNP per contig to minimize linkage disequilibrium (LD) and to ensure the independence of the SNPs employed in the next analyses. We also excluded from our dataset SNPs with more than two allele variants and indels.
We used BayeScan to test for outlier SNPs. Fit to Hardy-Weinberg expectations (HWE) of variant frequencies for each locus within populations was tested using the exact test implemented in dDocent (Puritz, Hollenbeck, & Gold, 2014).
Expected heterozygosity (H e ), global differentiation estimates (G st ), and pairwise F st estimates between populations from native and non-native ranges were calculated using mmod and Adegenet (Jombart, 2008) packages available in R (www.r-proje ct.org  (Frichot & François, 2015) to assign samples to genetic clusters. sNMF estimates individual ancestry coefficients utilizing the same likelihood model underlying STRUCTURE (Pritchard, Stephens, & Donnelly, 2000) and ADMIXTURE (Alexander, Novembre, & Lange, 2009), but uses nonnegative matrix factorization and least squares optimization to accommodate genome scale datasets. Ancestry coefficients were calculated by sNMF for K values ranging from 1 to 10, with 100 repetitions for each value of K, and the optimal K value was assessed using the cross-entropy criterion based on the prediction of masked genotypes to evaluate the error of ancestry estimation. The second approach used to assess population structure was a discriminant analysis of principal components DAPC (Jombart, Devillard, & Balloux, 2010) as implemented in R in adegenet package, using K means clustering to identify the optimal number of clusters for K values ranging from 1 to 10, and estimating individual admixture coefficients. We then calculated Bayesian information criterion scores (BIC) to identify the K value with the best fit. Additional pairwise F st estimates between groups identified in prior clustering analyses were performed using the SNPs dataset (142 individuals, 1,707 SNPs).

| Influence of host plants and geographic distance on genetic divergence
We analyzed the influence of geographic distance and host plants on genetic divergence among H. pungens complex populations from the native range. Non-native populations in these analyses were excluded because samples collected in these locations reached the areas by human mediated dispersal and would distort the analyses.
We used a combination of correlation tests and multiple regression analyses of distance matrices to analyze the importance of geographic distance and host plant use on genetic divergence.
In these analyses, we used pairwise F st linearized values at population level based either on nuclear SNPs or mtDNA datasets to construct the matrix of genetic distances. To construct the geographic distance matrix, we used the coordinates (latitude and longitude) of each population (Table 1). Host plant distance matrix was built considering host plant families where mealybugs were collected: Cactaceae, Portulacaceae, and Amaranthaceae. Thus, we constructed a binary matrix with three categories (the host plant families) considering patterns of host plant use.
To analyze the influence of geographic distance and host plant use on genetic divergence, we employed two Mantel tests (Mantel, 1967), first considering geographic and genetic matrices and second host plant and genetic distance matrices. We then conducted a partial Mantel test (Smouse, Long, & Sokal, 1986) to evaluate the influence of host plants on genetic divergence, using pairwise F st and host plant distance matrix, while controlling for geographic distance (Legendre & Fortin, 2010). For all correlation analyses, we used the R package vegan v2.4.4 (Oksanen et al., 2007) and 9,999 permutations to assess statistical significance. We further tested the influence of geographic distance and host plant use on genetic divergence by performing matrix regressions using the function MRM from the package ecodist v2.0.1 (Goslee & Urban, 2007) and ran 9,999 permutation to assess the significance of regression coefficients.
ModelFinder estimated GTR as the best-fitted model according to

| NextRAD and mitochondrial data
The nextRAD dataset consisted, on average, of

| Genomic diversity and differentiation in the H. pungens species complex
Both SNPs and mtDNA datasets revealed intermediate levels of genetic diversity but great levels of differentiation among populations.
Overall nuclear and mitochondrial genetic diversity were high across the native range and low in the non-native range (Table 3) (Table S2).
Concerning the mtDNA dataset (

| Influence of host plant use and geographic distance in genetic variation
Analyses of the influence of host plants and geographic distance on genetic variation in the H. pungens species complex showed that host plant use was a better predictor of genetic distance than geographic distance (Table 4). Correlation tests between host plant use and genetic distance matrices and between geographic and genetic distance matrices using the nuclear SNPs dataset were significant  (Table 4).

| Species tree reconstruction and species delimitation analysis
Both ML and BI phylogenetic analyses and the species tree inferred by SNAPP, all of them based on the SNPs dataset, revealed five major well supported clades (Figure 3a  Abbreviations: ARA, Argentina mealybugs feeding on Amaranthaceae; ARC, Argentina, Australia and Paraguay mealybugs feeding on Cactaceae; BRA, Brazil mealybugs feeding on Amaranthaceae; BRC, Brazil mealybugs feeding on Cactaceae. Paraguay, and Australia feeding on Cactaceae (clade ArPaAu-C) that appeared close to Amaranthaceae-feeding mealybugs from southeastern Brazil (clade Br-A) (Figure 3c).

Species delimitation analysis based on the BFD* method using
SNPs data supported a model in which H. pungens was a complex of five species (Model 1, Table S3), supporting the picture depicted by the phylogenetic tree and species tree produced with SNPs data (Figure 3a,b). This means that the clade which included H. pungens  Figure S1).  which the Puerto Rico cactus pest was derived. This is an important achievement since it will help to design reliable strategies for classical biological control using natural enemies, such as specialized parasitoids, associated with the pest in its native range.

| Hypogeococcus pungens species complex and host plant use
The specimens used to describe H. pungens were originally collected on an Amaranthaceae host (Granara de Willink, 1981). Mantel tests for population genetic and phylogenetic data (Guillot & Rousset, 2013;Harmon & Glor, 2010;Legendre & Fortin, 2010;Legendre, Fortin, & Borcard, 2015), it should be recalled that our conclusions are also based on the results of a multiple regression analysis that revealed that host plant use (accounting for more than 48% of total variance) is a better predictor of genetic divergence among populations (Table 4).
In this context, our results pointed to host plant use as an im- from Argentina (ArPaAu-C) and cactus feeding mealybugs from southeastern Brazil (BrPr-C) exceeded that which would be expected for conspecific populations, as indicated by our species delimitation analyses. In this context, it is worth mentioning that besides the large geographic distance separating populations of these clades and the fact that both feed on Cactaceae, these two putative species fed on cactus that belong to different genera. Mealybugs of the ArAuPa-C clade fed on species of the genera Cleistocactus and Harrisia, whereas BrPr-C thrived on species of other genera like Pilosocereus for native Brazilian mealybugs, and various genera for non-native Puerto Rican pest mealybugs (Table 1). It may be argued that divergent selection on alternate hosts and allopatry appeared as the more likely drivers of divergence. As a matter of fact, there is evidence that divergent selection caused by alternate hosts acting on ancestral fragmented populations is sufficient to produce genetic  (Hamon, 1984;Tomley & McFadyen, 1984;Williams, 1973  Argentina, or H. pungens (Julien & Griffiths, 1999;McFadyen, 2012;Zimmermann et al., 2010).
Overall, our data provided evidence of two factors that influenced at least 48% (Table 4)  Aguirre & Logarzo (also encyrtid primary parasitoids from Argentina and Paraguay) were only found associated to H. pungens sensu stricto and the ArAuPr-C clade (Triapitsyn et al., 2018). The influence of such differential ecological interactions with natural enemies might be affecting patterns of genetic divergence in this species complex.

Results of phylogenetic and species delimitation analyses
were not entirely congruent. Analysis with mtDNA visualized four major clades, whereas genome-wide SNPs allowed to detect five well supported clades (Figure 3). The main difference being that Hypogeococcus mealybugs from northeastern Brazil, Puerto Rico, and United States (BrPRUS-AP clade) and H. pungens sensu stricto (Ar-A) were different species according to nuclear SNPs, while these clades appeared collapsed in the same group with mtDNA data ( Figure 3). Such mito-nuclear inconsistencies have often been reported in insects (Hinojosa et al., 2019;Weigand et al., 2017). Even though the mtDNA has been useful to trace the evolutionary history in many species groups (Ball & Armstrong, 2006;Hebert, Penton, Burns, Janzen, & Hallwachs, 2004), incomplete lineage sorting and past hybridization events may obscure species delimitation based only on mtDNA data, particularly in recently diverged taxa (Després, 2019;Hickerson, Meyer, & Moritz, 2006;Hinojosa et al., 2019). For instance, Moreyra et al. (2019) reported a mitogenomic study in a cluster of closely related cactophagous Drosophila spp. inhabiting the southern cone of South America and found that the evolutionary history inferred by mitogenomes is not completely concordant with the phylogeny depicted by nuclear genomes (Hurtado, Almeida, Revale, & Hasson, 2019). The authors proposed that either incomplete lineage sorting (ILS) and/or introgressive hybridization could account for the pattern observed. A word of caution is needed before arriving at definitive conclusions in the present study due to limitations of the mtDNA dataset, that consisted of only a few hundred base pairs of the mtDNA gene encoding COI. In contrast, the evolutionary history depicted by the nuclear genomic dataset may be considered more reliable since it consisted of more than one thousand widely distributed SNPs.
Phylogenetic trees based on either nuclear SNPs or mtDNA indicated that feeding on Amaranthaceae was the more likely ancestral condition. However, results yielded by mtDNA and SNPs datasets were not completely congruent concerning the evolution of host plant use; both cactus feeding species formed a derived monophyletic clade in the tree based on nuclear SNPs, while cactophagy appeared to have evolved twice in the tree obtained with mtDNA data.
However, to unveil the ancestral host in these clades and to evaluate the evolution of host plant use and the biogeographic history of the genus in the continent, other well-delimited cactophagous species of the genus Hypogeococcus from South America, such as Hypogeococcus spinosus Ferris and H. festerianus, should be included in an expanded analysis.

| Puerto Rican cactus mealybugs derive from southeastern Brazilian cactus feeding populations
Our analyses, based on nuclear SNPs and mtDNA, allowed us to establish that the Puerto Rican cactus pest derives from a population similar to the southeastern Brazilian cactus feeding clade (Figures 2 and 3). In our previous study using only mtDNA and to develop more specific biological control strategies aimed at protecting wild cactus from this mealybug. In classical biological control programs, the correct identification of the target species is a key issue for searching for natural enemies of the pest in its native area (Hoelmer & Kirk, 2005). Thus, untangling the evolutionary history of the H. pungens species complex did not only have taxonomic and systematic relevance, but from a practical perspective, it indicated that the design of biological control strategies (e.g., search for natural enemies) against the pest should focus on southeastern Brazilian cactus feeding mealybugs.
Our results showed that Amaranthaceae-and/or Portulacaceaefeeding mealybugs collected in Puerto Rico and continental United States were also derived from Amaranthaceae-and/or Portulacaceae-feeding Hypogeococcus, though, in this case, from northeastern Brazil (Figures 2 and 3). These findings suggested two invasion events for mealybugs of the H. pungens species complex into Puerto Rico and the continental United States, one involving cactus feeders and the other mealybugs feeding on Amaranthaceae and/or Portulacaceae. Comparisons of the levels of genetic diversity in the native and invasive ranges of these mealybugs supported the idea of the occurrence of founder events during the colonization of Puerto Rico and southern United States. In both cases, introduced populations showed lower levels of genetic diversity in both mtDNA and nuDNA than in the respective native ranges, suggesting that a reduced number of individuals were involved in each colonization process (Table 3).
Many invasive species have been capable of thriving in novel environments despite the reduction of genetic variation as a consequence of founder events that may negatively impact fitness, survival, and evolutionary potential of the invasive populations (Koch et al., 2020;Logan, Minnaar, Keegan, & Clusella-Trullas, 2020;Tsutsui, Suarez, Holway, & Case, 2000). In this vein, two putative species of the H. pungens complex have been successful invaders and continue to spread throughout the Caribbean and southern United States, threatening native species and cactus diversity despite the loss of genetic variation.

ACK N OWLED G M ENTS
We thank Andrés Fernando Sánchez Restrepo and Nadia Jiménez

CO N FLI C T O F I NTE R E S T
The authors declare that they have no conflict of interest.