Invasion origin, rapid population expansion, and the lack of genetic structure of cotton bollworm (Helicoverpa armigera) in the Americas

Abstract In 2013, Helicoverpa armigera (Hübner) (Lepidoptera: Noctuidae) was officially declared as present in Brazil and, after two years, the species was detected in the Caribbean and North America. Information on genetic features and accurate distribution of pests is the basis for agricultural protection policies. Furthermore, such knowledge is imperative to develop control strategies, understand the geographical range, and genetic patterns of this species in the Americas. Here, we carried out the widest sampling of H. armigera in the South American continent and Puerto Rico, after we estimated the diversity, demographic parameters, and genetic structure. The Internal Transcribed Spacer 1 (ITS1) nuclear marker was used to investigate the presence of putative hybrids between H. armigera and H. zea, and they were observed at a frequency of 1.5%. An ABC analysis, based in COI gene fragment, suggested Europe as the origin of South America specimens of H. armigeraand following a movement northward through the Caribbean. Three mtDNA genes and three nDNA markers revealed high genetic diversity distributed without the defined population structure of H. armigera in South America. Most of the genetic variation is within populations with a multidirectional expansion of H. armigera among morphoclimatic regions. High genetic diversity, rapid population expansion, and hybridization have implications for pest management since they suggest that adaptive alleles are spread through wide areas in South America that favor rapid local adaptation of H. armigera to new and disturbed environments (e.g., in agricultural areas).


| INTRODUC TI ON
The agribusiness sector accounts for more than 20% of the Brazilian gross domestic product (GDP) (IBGE, 2016). Nevertheless, this sector suffers economic losses of US$ 17.7 billion per year due to pest damage on 35 major crops (Oliveira, Auad, Mendes, & Frizzas, 2014). Plant protection policies to control insect pests are limited by lack of efficient management strategies and new population outbreaks caused by ecologic disturbances and invasive pests. The introduction of exotic pests has been an enduring problem in the world, causing large negative impacts over the past three decades (Lopes-da-Silva, Sanches, Stancioli, Alves, & Sugayama, 2014). This is the case of the phytosanitary crisis caused by the Old World cotton bollworm, Helicoverpa armigera (Hübner) (Lepidoptera: Noctuidae), in Brazil and its dispersion northward to the Americas.
After the first record of H. armigera in Brazil in January 2013 in Goiás, Mato Grosso, and Bahia States (red dots in Figure 1; Czepak, Albernaz, Vivan, Guimarães, & Carvalhais, 2013;Tay et al., 2013), H. armigera population outbreaks occurred in the same year in a wide geographical area, especially the row crop areas of the northeastern and central regions of the country and constantly associated with reports of control failures of pyrethroid pesticides EMBRAPA, 2013). Strategic control tactics to contain or eliminate invasive pests depend on an accurate spatial characterization of the invasion and dispersion processes of the species in its new territory. However, H. armigera is morphologically similar to the native pest H. zea (Boddie) (Lepidoptera: Noctuidae). In larval stages, they are morphologically indistinguishable, which made data collection concerning geographical distribution and dispersion of this pest in Brazil difficult (Pogue, 2004). Furthermore, hybridization events between H. armigera and H. zea are a real scenario under natural conditions Laster & Hardee, 1995;Laster & Sheng, 1995;Leite, Corrêa, et al., 2017).
Despite the absence of an official record about specific invasion routes of H. armigera to Brazil, sequential investigations rapidly started, indicate new reports of H. armigera in the Americas. Initially, outbreaks began to be reported in other South American countries (Arnemann et al., 2016;Murúa et al., 2014), then to Caribbean (NAPPO, 2014) and, finally, Florida, USA, in 2015(APHIS, 2015Hayden & Brambila, 2015). Around three years after being officially reported, H. armigera had almost reached the entire American continent. Its rapid geographical expansion is favored by high dispersion capacity mediated by the wind, mild winter temperatures, and the intense cropping systems, which provide a wide availability of food and high reproductive rates during the four seasons of the year. All detections of H. armigera in Puerto Rico have been reported in the southern drier region of the island that concentrates on year-round production of vegetable (i.e., tomato, peppers) and is one of the largest winter nursery breeding operations in the United States for major row crops (i.e., soybean, corn, cotton, and sorghum) (Gerrero, 2009).
Recent population studies revealed a high genetic diversity of H. armigera populations in Brazil and genetic similarity among Brazilian individuals with populations originating in Europe, Africa, and Asia (Anderson, Tay, McGaughran, Gordon, & Walsh, 2016;Leite, Alves-Pereira, Corrêa, Zucchi, & Omoto, 2014;Mastrangelo et al., 2014;Tay et al., 2017). The high genetic diversity of H. armigera increases doubts surrounding an invasion process and population dynamics of H. armigera in Brazil, since high genetic diversity is not expected in invasive insects due to the founder effect of invasion processes.
Hybridization is an important event in the process of colonization of new ecosystems by invasive species because they may receive new and favorable alleles of a native species (Lewontin & Birch, 1966;Mesgaran et al., 2016). Fertile hybrids between H. armigera and H. zea, in laboratory conditions, were initially reported by Laster and Hardee (1995) and Laster and Sheng (1995). However, due to the reproductive isolation resulting from the large distance of occurrence between these two species, hybrids had never been observed under natural conditions. This recurrent question has now arisen after the H. armigera invasion in the Americas and the occurrence of these two species in the same geographical region with a huge impact on pest management.  Czepak et al. (2013) immediately preserved at −20°C in 100% ethanol until DNA extraction. Total DNA was extracted from the abdomen of the specimens using the standard phenol:chloroform method, adapted for microcentrifuge tubes (Lyra, Klaczko, & Azeredo-Espin, 2009). The pellet was resuspended in 100 µl of TE buffer and stored at −20ºC.

| Mitochondrial markers
We used three partial sequences of mitochondrial genes: cytochrome c oxidase subunit I (COI; COIF/COIR; Li et al., 2011), cytochrome c oxidase subunit II (COII; A-tLEU/B-tLYS; Liu & Beckenbach, 1992), and cytochrome b (Cyt b; Harm CB-J-10769/Harm CB-N-11325; primers designed in this study) (Appendix A). The PCR were conducted in a final volume of 25 μl for mtDNA with 25 ng of total DNA, 1.5 U of TaqDNA polymerase (Fermentas International Inc., Canada), 56 µM of dNTPs, 2.5 mM of MgCl 2 , 0.3 mg/ml of0020BSA, 10× TaqBuffer, and 0.16 µM of each primer. Successful amplifications were purified with the Illustra TM GFX TM kit (GE Healthcare, Bucks, UK), cycle sequencing reactions were completed using the BigDye ® Terminator v3.1 kit (Thermo Fisher Scientific Inc., Waltham, EUA), and bidirectional sequenced by the ABI 3730xl DNA Analyzer sequencer (Applied Biosystems, Foster City, EUA). Geneious 6.0.6 software package (Biomatters Ltd, Auckland, NZ) was used to assemble the nucleotide sequences into a contig for each specimen. The generated sequences were aligned employing the multiple sequences alignment algorithm implemented in Clustal Omega (Sievers et al., 2011), manually edited and the translated sequences were checked for the presence of premature stop codon in MEGA 5.1 (Tamura et al., 2011). Sequences were independently identified by BLAST 2.3.0 search (www.ncbi.nlm.nih.gov/BLAST/) against sequences stored in NCBI GenBank database (http://www. ncbi.nlm.nih.gov/genbank/). In this step, COI also were used to H. armigera species identification.

| Inferring Brazilian H. armigera origin
In short, we used an Approximate Bayesian Computation (ABC) approach (Csilléry, Blum, Gaggiotti, & François, 2010), designed here for the first time for H. armigera, to investigate the origin of its invasion in South America. We designed demographic and coalescent F I G U R E 2 An ABC analysis performed to investigate founder effect in H. armigera samples from South America. I-In horizontal, three independent ABC runs of scenarios based in COI marker represent individuals coming from (1) Asia; (2) Europe, and (3) Africa. Arrows represent time into past, T0 = present, T1 = divergence time, and T2 = coalescent time. Summary statistics vector most informative for the data are between parentheses. In vertical and inside each scenario, three models illustrate (a) constant population size; (b) exponential population growth, and (c) very rapid population expansion. Numbers at the bottom of the graph correspond to posterior probabilities (PP) while bold numbers inside the rectangle represent the highest PP found in each scenario. The goodness of fit test of the scenarios is shown on the right; Red cross: observed SuSt. II-Last independent ABC run for the scenarios comparison containing the best models obtained before (1c, 2c, 3c) models (see below) that were used to generate a simulated data dis- Initially, three simple and specific scenarios were designed (independent ABC runs) testing three similar models in each as shown in The ABC analyses were performed separately for each scenario in the ms 20161016 coalescent sampler (Hudson, 2002) to construct a prior distribution of simulated datasets. For each scenario, a custom Python™ script was designed with 300,000 data simulations.
All scripts used in the ABC analysis and mentioned below are de- After obtaining the prior distribution, the goodness of fit test was performed in the R package ABC (Csilléry, François, & Blum, 2012) with the gfitpca function to evaluate how well the simulated models fit the observed data. A good fit occurs when the observed SuSt is positioned within the simulated SuSt. Next, all possible combinations of two or more simulated SuSt were grouped into vectors. A rejection step was conducted with a Python script in the Euclidean distance C program msReject (http://msbayes.sourceforge.net/) using 10 simulations from the prior distribution of each model treated as empirical data to find the vector that best detects the true model (Tsai & Carstens, 2013). Then, these vectors were employed to find the 2.5% and 97.5% weighted values of each parameter in the R package ABC which were, respectively, the minimum and maximum new limits of these parameters in the main ABC analysis.
Therefore, the main ABC analysis was performed as described above, but now with 900,000 data simulations and restricted parameters. The model most supported by the data in each scenario was selected using msReject to retain the posterior distribution of 0.001 of the simulations closest to summary statistics from the empirical data. Again, only the vector selected previously was used. Lastly, the models with the highest PP values found in each of the three scenarios were carried out in a final independent run (Figure 2-II), applying a hierarchical procedure similar to Fagundes et al. (2007), to infer the most likely model among the scenarios. This new ABC analysis and the rejection step were performed as described above, but with 3,000,000 data simulations and restricted parameters. We estimated the number of transitions and transversions and the nucleotide (π) and haplotype diversity (Ĥ) (Nei, 1987). The neutrality tests Tajima's D (Tajima, 1989), Fu's Fs (Fu, 1997) and the mismatch distribution (Rogers & Harpending, 1992) were performed to test the neutral hypothesis and investigate the demographic events. The neutrality tests were tested for significance by generating 1,000 random samples using coalescent simulations. Hierarchical analysis of molecular variance (AMOVA) was conducted to estimate the population genetic structure within and among morphoclimatic groupings with 10,000 permutations and gamma = 0. Pairwise comparisons of the fixation index, F ST , were used to determine the genetic differences among groupings with the significance assessed by 10,000 permutations for each pairwise comparison and the gamma α value set to zero. All analyses were carried out using Arlequin v.3.5 (Excoffier & Lischer, 2010).

| Genealogical inferences
An unrooted statistical parsimony haplotype network (gene genealogies) was estimated with the default 95% plausible connection limit using the TCS 1.2.1 (Clement, Posada, & Crandall, 2000)   algorithm for each locus in each population (Guo & Thompson, 1992) using 10,000 dememorizations, 1,000 batches, and 10,000 iterations per batch to ensure a standard error <0.01. The global test across loci and locality was performed using Fisher's method once the linkage disequilibrium analyses (Slatkin, 1994) showed no linkage disequilibrium at significance level = 0.05, performed in Arlequin v.3.5 with 10,000 permutations for all pairs of loci. Allelic richness was calculated by Fstat 2.9.4 (Goudet, 1995).

| Approximate Bayesian Computation
Overall, π and πb were the most informative summary statistics (also Another ABC analysis performed as described above, but with 470 sequences from the Americas, also found scenario 2 with the highest PP value (data not shown). Many haplotypes are also shared between South America and Europe (Appendices B and C). Our study presents the first results obtained with this exploratory analysis, and we suggest that this is a powerful tool to boost the investigation of this recent invasion in the American continent, which can be intensely driven using a genomic approach.

| Genetic diversity, demography, and population structure
The total haplotype and nucleotide diversity for COI, COII, and Cyt b concatenated genes were Hd = 0.926 ± 0.006 and π = 0.002 ± 0.001 (Table 1 and   where the observed data distribution was clearly unimodal positioning out of the expected distribution and the raggedness index was small (r = 0.0235), which is to be expected in a population that has experienced recent expansion (Table 2).
Additionally, little evidence of subdivision into genetically distinct populations was found by AMOVA, with a variation of 92.46% occurring within sites' level and only 7.54% among sites' level also indicating genetic homogeneity Ɵ ST = 0.0754 and p-value = 0.0074) (Table 3). For three hierarchical levels, where the differentiation was also tested among the morphoclimatic regions, the results were similar, with −0.67% of variation occurring among regions' level and 8.08% among sites within regions' level (Table 3).

| nDNA-PCR-based structure in H. armigera
A total of 92 samples were successfully predicted to allele size for all three nDNA-PCR markers (data are available on Dryad: https://doi.org/10.5061/dryad.rd1570s). The remaining four samples, which had unspecific amplification for any nDNA marker, were discarded from the analysis. From those samples, 56 alleles were found in total for the three markers (Table 4).
DDC and RpS6 had a large number of alleles (ranging from 170 TA B L E 2 Neutrality test statistics and mismatch distribution analysis from 303 H. armigera based on COI, COII, and Cyt b concatenated mtDNA  (Weir & Cockerham, 1984) were

| Inferring South America H. armigera origin
ABC is a powerful approach to investigate complex demographic events and biological invasion source that occurred in the past (Camargo et al., 2012;Tsai & Carstens, 2013 European origin of the species in Brazil is supported by reports of human-mediated dispersion via international trade of agricultural products between the Americas and Europe over the past 500 years, trade that has intensified over the last 100 years and much more over the last three decades of globalization (Barbosa et al., 2015;Biondi, Guedes, Wan, & Desneux, 2018;di Castri, 1989;Grapputo, Boman, Lindstroem, Lyytinen, & Mappes, 2005;Oliveira, Corrêa, Souza, Guedes, & Oliveira, 2013;Paini et al., 2016;Weintraub et al., 2017).
We found two haplotypes in Puerto Rico for the COI (Appendices B and C). One of them is the second most frequent haplotype in its world distribution. The other has only been found in the northwestern region of Brazil. Thus, we believe these are clues suggesting that individuals found in Puerto Rico should have originated in South America, through a process of population geographical expansion after the initial infestation, despite the high genetic differentiation found using nDNA markers.
Common mtDNA haplotypes are shared among the Americas, Europe, Africa, and Asia, including the sharing of alleles associated with pyrethroid resistance Tay et al., 2017).
Furthermore, H. armigera populations show weak genetic structure and high gene flow in the Old World and Australia (Anderson et al., 2016;Behere et al., 2007;Endersby, Hoffmann, McKechnie, & Weeks, 2007;Li et al., 2011;Nibouche, Bues, Toubon, & Poitout, 1998;Tay et al., 2008;Vijaykumar, Krishnareddy, Kuruvinashetti, & Patil, 2008;Weeks et al., 2010). The high diversity and weak genetic structure in the Old World make it difficult to describe with accuracy the geographical origin, routes, and the number of invasion events of H. armigera to Brazil. However, they also reduce the impact of this information because there is a wide transference of alleles among geographical sites in the Old World (Anderson et al., 2016;Behere et al., 2007;Nibouche et al., 1998;Rasool et al., 2014;Stokes, Mckechnie, & Forrester, 1997). Finally, there is consensus that the invasion of South America by H. armigera was not originated from Oceania (Anderson et al., 2016;Durigan et al., 2017;Tay et al., 2017). America (Fitt, 1989;Hardwick, 1965;Nibouche et al., 1998). The genetic variation is primarily within populations with recent population expansion and multidirectional dispersion among sites and morphoclimatic regions. The weak geographical population structure detected at the regional level in South American populations of H. armigera was also described at continental and intercontinental levels in the Old World (Anderson et al., 2016;Behere et al., 2013;Nibouche et al., 1998;Scott et al., 2005;Song et al., 2018). High population admixture has serious implications for pest management, since alleles may spread to wide areas and promote an increase of genetic diversity and fitness in local populations of H. armigera (Fraimout et al., 2017;Lombaert et al., 2010;Rius & Darling, 2014;Seymour et al., 2016).  Nagoshi et al., 2017).

| High genetic diversity and population expansion of H. armigera
It is probably associated with contrary wind patterns between the South and North American continents that limit long-distance dis-

| Presence of Helicoverpa putative hybrids in Brazil
We found a low number of putative hybrids between H. armigera and H. zea Brazilian strains under natural conditions. This result agrees with the previous studies that suggest low hibridization between H. armigera and H. zea in Brazil (Anderson et al., , 2016Leite, Pereira, et al., 2017. However between the species, promoting a fitness increase in Helicoverpa local populations and the rapid evolution of resistance to pesticides and Bt crops in South America (Chakroun et al., 2016;Downes, Walsh, & Tay, 2016;Han et al., 2015;Joußen & Heckel, 2016).

ACK N OWLED G M ENTS
We thank the following entities for their financial support: Fundação To the Editorial UPR-Experimental Agricultural System for revision.

CO N FLI C T O F I NTE R E S T
We have no competing financial interests.

AUTH O R S' CO NTR I B UTI O N
RMG, TM, JCVR, DFP, and AMLAE conceived and designed the study. RMG collected the data. ASC, CO, and AMLAE provided reagents and analytical tools. RMG, TM, and ASC analyzed the data. RMG, TM, and ASC wrote the manuscript. All authors read, corrected, and approved the manuscript.

DATA ACCE SS I B I LIT Y
All COI, COII, Cyt b, and ITS1 sequences used in this study were de-

A PPE N D I X B
Parsimony haplotype network of COI 626 bp-long mtDNA from 526 H. armigera sampled in the Americas (n = 470), Asia (n = 40), Europe (n = 11), and Africa (n = 5). Circles represent different haplotypes nominated in ascending order from most frequent to least frequent (1-57).
Circles area is proportional to the number of individuals carrying this haplotype as shown by the scale at the top-right. Colors in the circles indicate the proportional distribution of the continents in each haplotype, small black dots represent presumed, but unsampled haplotypes and the number of connection lines are proportional to number of mutational steps between any two different haplotypes. The five sequences found in Puerto Rico are indicated in the two corresponding haplotypes (2 and 9).

A PPE N D I X C
Sharing of haplotypes based on cytochrome c oxidase subunit I (COI) among South America, Central America, Asia, Europe, and Africa continents.