Population genetic structure of the carrot weevil (Listronotus oregonensis) in North America

Abstract Population genetic studies of insect pests enhance our ability to anticipate problems in agroecosystems, such as pest outbreaks, insecticide resistance, or expansions of the host range. This study focuses on geographic distance and host plant selection as potential determinants of genetic differentiation of the carrot weevil Listronotus oregonensis, a major pest of several apiaceous crops in North America. To undertake genetic studies on this species, we assembled the first complete genome sequence for L. oregonensis. Then, we used both haplotype discrimination with mitochondrial DNA (mtDNA) and a genotyping‐by‐sequencing (GBS) approach to characterize the genetic population structure. A total of 220 individuals were sampled from 17 localities in the provinces of Québec, Ontario, Nova Scotia (Canada), and the state of Ohio (USA). Our results showed significant genetic differences between distant populations across North America, indicating that geographic distance represents an important factor of differentiation for the carrot weevil. Furthermore, the GBS analysis revealed more different clusters than COI analysis between Québec and Nova Scotia populations, suggesting a recent differentiation in the latter province. In contrast, we found no clear evidence of population structure associated with the four cultivated apiaceous plants tested (carrot, parsley, celery, and celeriac) using populations from Québec. This first characterization of the genetic structure of the carrot weevil contributes to a better understanding of the gene flow of the species and helps to adapt local pest management measures to better control this agricultural pest.

and generate different patterns of geographic variation in life history traits of herbivorous insects, including developmental rate, phenology (Renner & Zohner, 2018), host plant specificity (Jiggins & Bridle, 2004), insecticide resistance (Ffrench-Constant et al., 2000), or dispersal capacity (Mazzi & Dorn, 2012). Knowledge of the genetic structure of insect populations can thus contribute to the design and implementation of locally, better adapted pest management strategies (Anderson et al., 2016;Rollins et al., 2006).
Genetic variations within and among populations result from a number of evolutionary drivers: mutation, gene flow, genetic drift, and selection (Coyne, 1992;Pinho & Hey, 2010). For herbivorous insects, their capacity to disperse and colonize new habitats is a major determinant leading to divergent genetic population structure (Mazzi & Dorn, 2012). Efficient dispersal capacity facilitates gene flow between regions, thereby reducing genetic structuring (Bohonak, 1999;Broquet & Petit, 2009;Kim & Sappington, 2013;Roderick, 1996). In contrast, the genetic diversity of species having poor dispersal capacity is often limited to the diversity associated with a single or a few events of colonization (bottleneck and founder effects) and further reduced following strong genetic drift. Such a pattern can result in significant population structuring (Hanks & Denno, 1994;Slatkin, 1987). Abiotic factors such as climate and landscape also affect the dispersal capacity of insects and trigger isolation (Grez & Villagran, 2000). Although geographic distance represents one of the most important factors in genetic differentiation of populations, host suitability and availability can also lead to host-associated genetic differentiation (Angelella et al., 2019;Antwi et al., 2015;Hood et al., 2020). This evolutionary pathway constitutes a continuum of differentiation that can generate genetic changes across populations leading to reproductive isolation and sympatric speciation (Drès & Mallet, 2002;Forbes et al., 2017). The host plant can act as the main determinant of population differentiation alone (Groman & Pellmyr, 2000;Silva-Brandão et al., 2018) or in combination with geographic isolation (Agosta, 2006). From an applied perspective, pest species having distinct genotypes associated with different host plants or cultivars may differ in their vulnerability to pest control methods (Machado et al., 2008;Martel et al., 2003;Shufran et al., 2000).
Native to North America, the carrot weevil, Listronotus oregonensis (LeConte) [Coleoptera; Curculionidae], is mainly distributed in the Great Lakes region (Justus & Long, 2019). This pest attacks Apiaceae, notably carrot (Daucus carota L. subsp. sativus), parsley (Petroselinum crispum L.), and celery (Apium graveolens L.) (Chandler, 1926). Distribution across its range appears to be highly frag- (Polygonaceae) can also be exploited by the carrot weevil and may contribute to maintain local populations nearby agricultural areas (Boivin, 2013). The carrot weevil completes one to three generations per year depending on the latitude (Boivin, 1999), and climate warming tends to increase voltinism in northern regions (Boivin, 2013;Telfer et al., 2018). Adults have a low dispersal capacity, moving mainly by walking despite being winged (Wright & Decker, 1957).
Females lay their eggs on petioles, and larvae typically burrow into roots. Significant crop damage can be observed, reaching up to 90% in the absence of pest control strategies (Boivin, 1999). Carrot weevil management relies mainly on adult scouting, synchronized applications of foliar insecticides, and crop rotations (Gagnon et al., 2021;Justus & Long, 2019). Levels of damage have likely increased recently across North America, suggesting a change in L. oregonensis phenology or local adaptation of populations (e.g., pesticide resistance) that could disrupt pest control (Telfer et al., 2018).
The main objective of this study was to investigate the genetic structure of L. oregonensis populations across its range in North America based on two complementary approaches: haplotype discrimination with mitochondrial DNA (mtDNA) and genotypingby-sequencing (GBS). In addition, a whole-genome assembly was produced and released as the first draft genome for this species.
More specifically, these genomic resources for the carrot weevil allowed us to examine the role of geographic distance and host plant selection in structuring populations. dulce), parsley (P. crispum L.), and celeriac (A. graveolens L. var. rapaceum), using carrot-bait traps (Boivin, 1985) placed at the edge of commercial fields. This sampling technique is recommended for all Apiaceae crops (Boivin, 1985;Justus & Long, 2019). Sampling sites were chosen according to (i) their geographic location across the range of the carrot weevil, (ii) their history of infestation by L. oregonensis, and (iii) species of host plant (Table S1). In the province of Québec, a finer geographic scale analysis was carried out focusing sampling efforts on weevils from various host plant species. We also tested individuals from a rearing colony established at the Saint-Jean-sur-Richelieu Research and Development Centre of Agriculture and Agri-Food Canada, with initial specimens collected at the experimental farm located at Sainte-Clotilde-de-Chateauguay. This laboratory population is maintained on carrot roots and has never been restocked with other field-collected individuals for 15 years. A total of 220 individuals were sampled from 17 localities distanced by a maximum of 2000 km, from Ohio to Nova Scotia (Table S1). Adult weevils were preserved in 1.5-ml Eppendorf-type vials with 95% ethanol until DNA extraction. Voucher specimens (one individual per locality) were also deposited at the Ouellet-Robert Entomological Collection (Université de Montréal, Montréal, Québec, Canada) (QMOR57165-QMOR57181).

| DNA extraction
A DNA extraction protocol was adapted from the DNeasy Blood and Tissue Kit (Qiagen). For each individual, the six legs were removed from the body and frozen at −80°C for at least one hour to facilitate tissue grinding. Legs were then crushed in extraction buffer directly into the 1.5-ml Eppendorf tube using a pestle (Ultident Scientific).
Samples were incubated overnight at 56°C (16 h) in the lysing buffer, and all subsequent steps followed the recommendations from the extraction kit. DNA was eluted in 100 μl of distilled water and stored at −20°C. Concentration and purity of extracted DNA were analyzed with a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific) and a Qubit fluorometer (Invitrogen). DNA amount per sample was normalized to between 100 and 500 ng/μl for mtDNA (COI) sequencing and to 2 ng/μl for the GBS approach.

| Genome sequencing and assembly
Fifty weevils from the rearing colony, aged between 2.5 and 3.5 months, were used to extract gDNA using the method described

| Bioinformatic processing
Sequences from the five nanopore runs were merged before being simultaneously basecalled, trimmed of adaptor sequences, and filtered based on quality scores using Guppy (v.3.6; Oxford Nanopore) F I G U R E 1 Listronotus oregonensis sampling locations across northeastern North America with the flags qscore_filtering and min_qscore 7 set. The Illumina short reads were trimmed of adaptors and filtered based on a minimum Q score of 30 using Fastp (v.0.20.1;Chen et al., 2018). Quality filtered reads were assembled into draft contigs using a two-step assembly approach. First, the long-read assembly pipeline wtdbg2 (v2.5; Ruan & Li, 2020) was used to generate an initial set of contigs using the parameter -X 25, -x preset1, -g 1.3G, -L 8k, -p 21, --edge-min 2, and --rescue-low-cov-edges set. In a second assembly, the short reads were combined with a 25× subset of the longest nanopore reads using the hybrid assembler HASLR (v.0.8a1; Haghshenas et al., 2020) under default settings. Both assemblies were then merged into a single draft using Quickmerge (v.0.3; Chakraborty et al., 2016). The resulting draft was then scaffolded using LINKS (v 1.8.7;Warren et al., 2015) prior to polishing using TGS-GapCloser (v1.1.1; Xu et al., 2020) and pilon (v.1.23;Walker et al., 2014), with long-read and short-read sequences, respectively.
We then used BlobTools (v1.1.1; Laetsch & Blaxter, 2017) to remove any bacterial contamination from the final assembly before assessing completeness using BUSCO (v4.0.6;Seppey et al., 2019) with the Arthropoda lineage specified. The carrot weevil genome size was also estimated by the J. Spencer Johnston's laboratory at the Texas A&M University using flow cytometry on six adult males and six adult females following a protocol modified from Hare and Johnston (2012).

| Genotyping approaches
Haplotype discrimination with mtDNA COI and a GBS approach were used as two complementary methods to characterize genetic population structures of the carrot weevil. The combination of genotyping methods contributes to a more global comprehension of the genetic population structure by analyzing regions (mtDNA vs genomic DNA) having different molecular evolutionary rates (Xia et al., 2018). GBS analyses show more recent genetic changes from single-nucleotide polymorphisms (SNPs) on nuclear DNA, while COI mtDNA can go much further back in the evolution timescale (Avise, 1991;Brumfield et al., 2003). Moreover, mtDNA, transmitted only by the mother, is not subjected to genetic recombination and is more sensitive to the founder effect (Avise, 1991;Birky et al., 1989;DeSalle & Giddings, 1986). By using these two approaches, we aimed to assess the genetic differences between populations of the carrot weevil and to estimate whether these differences occurred on a more distant or recent timescale.
The amplified PCR fragments were sequenced with a 3730xl DNA Analyzer (Applied Biosystems) at the Genome Québec Innovation Center and McGill University (Montréal, Québec, Canada). Then, forward and reverse sequences were assembled and edited to 685 bp using CLC Main Workbench V20.0 (Qiagen). Sequence alignment was performed on MEGA X, using MUSCLE with the default options (Kumar et al., 2018). A total of 206 individuals from 17 localities were analyzed.

| Genotyping-by-sequencing (GBS)
Each carrot weevil individual was genotyped by the GBS method de-

| Bioinformatics and genetic analyses
For COI, measures of haplotype (H) and nucleotide (π) diversity were generated using ARLEQUIN v3.1 (Excoffier et al., 2005). Tajima's D (Tajima, 1989) and Fu's F S (Fu, 1997) neutrality tests were performed to determine whether populations presented cases of expansion or a bottleneck of their genetic diversity. The fixation indices (F ST ) were calculated to compare each pair of populations based on haplotype frequency. Networks to depict relationships among haplotypes were also produced using Pop Art with a statistical parsimony network (TCS network) (Clement et al., 2002). To investigate for genetic differences between carrot weevil populations from different regions or host plants, AMOVAs were performed using ARLEQUIN v3.1 (Excoffier et al., 2005). Finally, we tested for correlations between F ST and geographic distance using the Mantel test with 1000 permutations in ARLEQUIN.
GBS reads were processed using the software Stacks v.2.54 (Catchen et al., 2013). Raw sequences were demultiplexed and filtered for quality using the process_radtags function (parameters set to -r -c and -q). The reads were then aligned to L. oregonensis draft reference genome (GenBank: JAHBCN000000000) using BWA (Li, 2013). Sequence polymorphisms were called and attributed to each population using the function gstacks (-m), with the parameters set to tolerate, considering all populations combined, a maximum of 10% missing data per locus (-R 0.9), a minimum minor allele frequency of 5% (-min-maf 0.05), and a minimum minor allele count of 3 (-min-mac 3). Read coverage was manually checked among samples to avoid low coverage individuals and ensure genotype accuracy (mean per sample coverage = 23×). Genetic parameters were computed using the populations program in Stacks. Genomic diversity was estimated according to the observed heterozygosity (HO), expected heterozygosity (HE), nucleotide diversity (π), and the inbreeding coefficients (F IS ). F ST values were calculated using the fstats function to compare each pair of populations for each variable site (Meirmans & Hedrick, 2011). Fisher's exact test was also computed for each F ST value.
Genetic differentiation between and among populations was visualized using a principal component analysis (PCA) with the function dudi.pca from the ade4 package (Chessel et al., 2004) and a discriminant analysis of principal components (DAPC) with the function dapc from the adegenet package (Jombart, 2008;Jombart et al., 2010) in R (R Core Team, 2020). Admixture was evaluated by calculating the posterior membership probability (probability of each individual to belong to predetermined populations) from the DAPC results.
AMOVAs were also performed to explore genetic differences between carrot weevil populations associated with different regions or host plants using the function poppr.amova from the poppr package (Kamvar et al., 2014). Correlation between the genetic (F ST ) and the geographic distance using the Mantel test with 1000 permutations was performed using the adegenet package.

| Outlier detection and selection by host plants
Outlier analyses were conducted on the populations from Québec to detect correlations between SNPs putatively under selection and the host plant. First, we used a Bayesian method implemented in the software BayeScan, version 2.1 (Foll & Gaggiotti, 2008), that assumes a Dirichlet distribution of allele frequencies between populations (Foll, 2012). This program estimates the probability that each locus is subject to selection by using a logistic regression on the two locus-population F ST coefficients. We used BayeScan with the default parameters, with a minimum number of iterations set to 50,000, the length of 20 pilot runs to 10,000 iterations, and the burn-in length to 50,000 iterations. The decision thresholds to call a SNP as being under selection were a q-value (analog to a false discovery rate p-value) under 0.05 and a posterior probability (comparison with a neutral model) above 0.91, which correspond to a strong relationship on the Jeffreys scale (Foll, 2012). Second, we used a multivariate approach based on discriminant analysis of principal components (DAPC) to identify the specific SNPs most influential in separating samples based on the host plant. The number of clusters in DAPC was set a priori based on the host plant (k = 4). Then, a locus-specific loading plot was created using a threshold of 0.001 to identify the SNPs that most contribute to separating individuals on the first two components. Additional PCAs were produced using only the outlier SNPs. Then, the position of the outlier SNPs was retrieved from the genome assembly and compared for both methods.
For the SNPs located in predicted genes, their putative function was determined by comparing the coding sequence with the National Center for Biotechnology Information (NCBI) protein database by means of BLASTX (Altschul et al., 1990) on the nonredundant (nr) sequence database using an E-value significance cutoff of 1e−5.

| Genome assembly and annotation
Whole-genome sequencing generated just over 35 M nanopore  (Table S2).
Genome completeness, following the analysis of 1013 conserved arthropod genes, was estimated at 82.6% (Table S2).

| COI
We obtained 220 sequences of 685 bp of the partial COI from 17 populations of L. oregonensis collected from Québec, Ontario, Nova Scotia, and Ohio. Fourteen samples were withdrawn from analyses due to poor DNA quality, and a single individual from one field was not used in our data set. DNA sequence data and specimen collection Eighteen polymorphic sites (S) were observed, with a haplotype diversity (H) ranging between 0.562 and 0.964 (Table 1). The haplotype diversity was high in each population, except for samples col-  (Table 4).

| GBS
We sequenced 157 individuals from 17 populations of L. oregonensis collected from Québec, Ontario, Nova Scotia, and Ohio. Ten  (Table 2). Nucleotide diversity was high among carrot TA B L E 1 Sample size (n), number of haplotypes (k), number of polymorphic sites (PS), haplotype diversity (H) ± SD, and nucleotide diversity (π) ± SD; results of Tajima's D and Fu's F S neutrality tests with p-values for each Listronotus oregonensis population analyzed with COI analysis   Figure 4a). In addition, a discriminant analysis of the principal components confirmed a strong clustering between geographic regions, reinforcing the pattern of genetic differentiation associated with distance ( Figure 4b). This was also supported by the isolationby-distance tests, which showed a good correlation between geographic distance and genetic differentiation (IBD: r² = 0.6989, p = 0.01; Figure S1). Furthermore, AMOVA indicated a significant contribution of geographic regions in genetic differentiation, but with a smaller percentage of variation (15.92% in GBS compared with 45.55% in COI) ( Table 3).
The membership probabilities of the DAPC analysis for each individual revealed a strong admixture among the populations from Québec and to a lesser degree, with those from Ontario ( Figure 5).
The two populations from Nova Scotia were admixed together but did not share genotypic information with the other locations. Individuals from Ohio did not show any sign of admixture with other field populations. They shared some information with the rearing colony individuals, suggesting the presence of noninformative loci or heterozygote deficiency due to inbreeding in these populations.  Table 4). However, a discriminant analysis using the four host plant species as groups revealed a slight clustering by hosts ( Figure 6b) that could indicate an ongoing differentiation process limited to a few genomic regions.

| Outlier detection and selection by host plants
The

TA B L E 2
Sample size (n), observed heterozygosity (Ho), expected heterozygosity (He), nucleotide diversity (π), and the inbreeding coefficients (F IS ) for each Listronotus oregonensis population analyzed with GBS analysis clustering the samples based on host plant (7 for the first axis and 9 for the second; Figure S4). The PCAs of these two reduced data sets show a fair clustering of the samples based on their host plant without a clear separation of the four groups ( Figures S5 and S6).
However, the detailed analysis of allele frequency among each individual for these SNPs did not identify a clear pattern and does not support strong selection (Figures S7 and S8). Interestingly though, one of these SNPs was detected by both methods and located in an intron of a predicted gene. The closest accession (XP_030748181.1) on NCBI was from the rice weevil (Sitophilus oryzae) and contained a conserved domain encoding a DDE superfamily endonuclease (pfam03184).

TA B L E 3
Results of the AMOVA of 17 populations of Listronotus oregonensis collected from three Canadian provinces and one US state under the COI and GBS analyses

| DISCUSS ION
Our study revealed significant genetic differences between distant populations of the carrot weevil across North America, indicating that geographic distance represents a key factor of differentiation.
In contrast, no clear evidence of sympatric speciation or genetic differentiation associated with the host plant was observed in the populations of Québec collected from four different cultivated apiaceous plants (carrot, parsley, celery, and celeriac).
Analyses of both mitochondrial and nuclear DNA allowed the characterization of the genetic structure of carrot weevil populations at two timescales. Mitochondrial DNA goes much further back in the evolutionary history of a species compared with nuclear DNA that reveals more recent changes (Avise, 1991;Brumfield et al., 2003). For example, the COI mtDNA network analysis showed that carrot weevil populations from Nova Scotia were slightly more similar to populations from Québec than those from Ontario and Ohio, such a pattern being compatible with a classic stepping-stone model (Kimura & Weiss, 1964)  Both analyses (mtDNA and GBS) showed great haplotype diversity and complex genetic structure among populations of carrot weevils, which is characteristic of native species (Kim & Sappington, 2004a;Zhang et al., 2018). We also detected heterozygote deficiency in all populations, indicating that populations are locally confined. This pattern could potentially be explained by a limited dispersal capacity of adult carrot weevils and/or the presence of well-structured populations. The movement of an organism is a fundamental element in ecology and evolutionary processes, which determines its dispersal capacity and distribution area (Nathan, 2008;Nathan et al., 2008). Distribution area is also influenced by biotic and abiotic factors such as resource availability, competition, predation, climate, photoperiod, and landscape (Grez & Villagran, 2000;Renner & Zohner, 2018). The carrot weevil, moving mostly by walking, could also be limited in its dispersal between regions sampled because the apiaceous cultures are not contiguous across the landscape. Like the carrot weevil, the boll weevil (Anthonomus grandis Boheman) is not an efficient flyer (McKibben et al., 1991) and hardly flies headway against surface winds (Hardee et al., 1969;Moody et al., 1993). In accordance with our results, Kim and Sappington (2004b) and Raszick et al. (2021) demonstrated using mtDNA and nuclear DNA (ddRADseq) that geographic distance is an important factor of genetic differentiation for boll weevil populations in North America. Following the theory of isolation-by-distance, several authors have concluded that differentiation between populations generally increases with geographic distance (Slatkin, 1987). In this study, we found that L. oregonensis populations at the regional scale (>1500 km) are  (Olivieri et al., 2008).
Some insects develop a preference for a given host based on its suitability for development, survival, and reproduction (Hawthorne & Via, 2001;Singer et al., 1988), and others will have different mate choice behaviors depending on the affiliation to a host plant (Feder et al., 1994;Nosil et al., 2007). For example, the milfoil weevil (Euhrychiopsis lecontei Dietz) showed a genetic differentiation between individuals that feed on a native host plant, the northern water milfoil (Myriophyllum sibericum Komarov), and individuals that prefer a congeneric introduced species, the Eurasian water milfoil (Myriophyllum spicatum L.) (Roketenetz et al., 2017). In addition, the apple maggot fly (Rhagoletis pomonella Walsh) was previously feeding on the ancestral downy hawthorn (Crataegus mollis Torrey and Gray) but developed host shift for the domesticated apple tree (Malus pumila Miller), leading to sympatric host race formation (Bush, 1969;Feder et al., 2003 To complement the population genetic study, a first carrot weevil genome assembly was generated. The obtained genome indicates a great completeness and quality with a recovering of 82.6% of complete BUSCO genes. The 1.3 GB size for the genome assembly is large compared with other weevil (Coleoptera: Curculionidae) species such as the red palm weevil, Rhynchophorus ferrugineus (589 MB) (Dias et al., 2021), and the rice weevil, Sitophilus oryzae (770 MB) (Parisot et al., 2021), but similar to a sister species, the Argentine stem weevil, Listronotus bonariensis (1.1 GB) (Harrop et al., 2020). As for L. bonariensis, L. oregonensis presents a high proportion of repetitive sequences, explaining the large size of these two genomes. The L. oregonensis genome will also facilitate future genetic studies to help management of this pest species.
From an applied perspective, these results help to refine cur-

CO N FLI C T O F I NTE R E S T
The authors have no conflict of interest to disclose.