Genomic‐based epidemiology reveals independent origins and gene flow of glyphosate resistance in Bassia scoparia populations across North America

Abstract Genomic‐based epidemiology can provide insight into the origins and spread of herbicide resistance mechanisms in weeds. We used kochia (Bassia scoparia) populations resistant to the herbicide glyphosate from across western North America to test the alternative hypotheses that (i) a single EPSPS gene duplication event occurred initially in the Central Great Plains and then subsequently spread to all other geographical areas now exhibiting glyphosate‐resistant kochia populations or that (ii) gene duplication occurred multiple times in independent events in a case of parallel evolution. We used qPCR markers previously developed for measuring the structure of the EPSPS tandem duplication to investigate whether all glyphosate‐resistant individuals had the same EPSPS repeat structure. We also investigated population structure using simple sequence repeat markers to determine the relatedness of kochia populations from across the Central Great Plains, Northern Plains and the Pacific Northwest. We found that the original EPSPS duplication genotype was predominant in the Central Great Plains where glyphosate resistance was first reported. We identified two additional EPSPS duplication genotypes, one having geographical associations with the Northern Plains and the other with the Pacific Northwest. The EPSPS duplication genotype from the Pacific Northwest seems likely to represent a second, independent evolutionary origin of a resistance allele. We found evidence of gene flow across populations and a general lack of population structure. The results support at least two independent evolutionary origins of glyphosate resistance in kochia, followed by substantial and mostly geographically localized gene flow to spread the resistance alleles into diverse genetic backgrounds.


| INTRODUC TI ON
The rate at which herbicide resistance evolves in plant populations is dictated by the interplay of genetic and ecological factors (Hawkins et al., 2019;Kreiner et al., 2018), which determine whether resistance emerges and spreads via (i) a small number of initial evolutionary origins, or even a single evolutionary origin, followed by subsequent dispersal across landscapes; or (ii) via numerous local, independent evolutionary origins with limited and localized dispersal (Baucom, 2019). Important genetic determinants include the potential for resistance to evolve from standing genetic variation versus the requirement for populations to "wait" for de novo variation to arise via mutation after the onset of selection (Barrett & Schluter, 2008), which in turn dictates the likelihood of soft versus hard selective sweeps for herbicide resistance in plant populations (Hermisson & Pennings, 2005;Messer & Petrov, 2013). In the absence of standing variation, the time for novel adaptive mutations to arise depends on the size of the mutational target (the number of adaptive mutations/genotypes that endow resistance) and fitness trade-offs (Neve et al., 2014;Vila-Aiub et al., 2009). Ultimately, these genetic factors determine the extent to which the emergence of resistance is mutation-limited.
If molecular genetics establishes the rate at which adaptive variation arises, ecological factors determine the rate at which newly evolved or pre-adapted genotypes disperse between populations and across landscapes. In highly mobile and migratory organisms, such as migratory insects, relatively infrequent mutations may be dispersed long distances, whereas in less mobile species (including plants), resistance spread is likely to be more limited, making it relatively more likely that widely distributed resistance traits arise from multiple independent mutational origins (Beckie et al., 2021). However, in agroecosystems, the dispersal of arthropods, weeds and pathogens may often be facilitated by human activities. This is particularly true for weed seeds that are often moved across landscapes by farm machinery and potentially across continents in contaminated crop seed (Benvenuti, 2007), meaning that pollen-and/or seed-mediated dispersal is more frequent than would be predicted by consideration of ecological factors (Beckie et al., 2019).
Although these genetic and ecological factors are fundamentally important for understanding the rate of adaptation of plant populations to herbicides and for tailoring appropriate resistance management guidelines, relatively few studies have explored regional-scale patterns of resistance frequency and dispersal or associated genetic and genomic signatures of resistance alleles. In Europe, studies focused on landscape-, national-and continentallevel patterns of resistance to acetyl CoA carboxylase (ACCase-) and acetolactate synthase (ALS-) inhibiting herbicides have implicated multiple, independent evolutionary origins of herbicide resistance alleles in weeds (Délye et al., 2010;Dixon et al., 2021;Menchari et al., 2006). Multiple origins of resistance for the ACCase-and ALS-inhibiting herbicide groups may be due in part to standing genetic variation for resistance to these herbicides (Délye et al., 2013), multiple possibilities for target-site mutations (Gaines et al., 2020) and the lack of large fitness costs for resistance to these herbicides (Vila-Aiub et al., 2009). In North America, studies focusing on the evolution and spread of glyphosate resistance have been more equivocal, with some support for multiple independent origins (Okada et al., 2013), some studies presenting evidence for both long-distance dispersal and local evolution , and others implicating a single evolutionary origin with subsequent widespread dispersal (Molin, Patterson, et al., 2020;Molin et al., 2018).
Here we use a combination of molecular population genetics and genomic-based epidemiology (defined here as the use of genomic data to determine the origin(s) and spread of known resistance genotypes) to review evidence for the evolutionary origins of glyphosate resistance within and between multiple populations of the agricultural weed kochia [Bassia scoparia (L.) A. J. Scott, synonymous with Kochia scoparia (L.) Schrad.], an introduced weed that occurs in Plains where glyphosate resistance was first reported. We identified two additional EPSPS duplication genotypes, one having geographical associations with the Northern Plains and the other with the Pacific Northwest. The EPSPS duplication genotype from the Pacific Northwest seems likely to represent a second, independent evolutionary origin of a resistance allele. We found evidence of gene flow across populations and a general lack of population structure. The results support at least two independent evolutionary origins of glyphosate resistance in kochia, followed by substantial and mostly geographically localized gene flow to spread the resistance alleles into diverse genetic backgrounds.

K E Y W O R D S
gene duplication, gene flow, herbicide resistance, independent evolution, mobile genetic elements, population genetics the semi-arid arable lands of the western USA and Canada (Friesen et al., 2009). Kochia populations from North America exhibit high levels of genetic diversity but a lack of strong population structure (Kumar et al., 2019;Martin et al., 2020;Mengistu & Messersmith, 2002), probably due to several reproductive traits that promote cross-pollination and long-range dispersal of pollen and seed (Beckie et al., 2016).
Genome sequencing has revealed that the structural rearrangement that caused duplication of the gene encoding the glyphosate target enzyme (5-enolpyruvylshikimate-3-phosphate synthase, EPSPS) in glyphosate-resistant kochia is due to complex interactions between mobile genetic elements and local tandem rearrangements (Jugulam et al., 2014;Patterson et al., 2019).
These molecular mechanisms are unlikely to spontaneously generate the same genotype in multiple kochia populations and thus we are presented with a unique opportunity to employ genomicbased epidemiology to address questions about the evolutionary origins of glyphosate resistance in kochia. If many individuals from different populations contain the same structural rearrangement, this would provide strong evidence for a single evolutionary origin. If this is true, analysing patterns of neutral genetic variation using more traditional population genetics approaches will identify evidence of relatedness and gene flow between glyphosateresistant populations. If populations exist that lack this exact rearrangement and instead contain an EPSPS gene duplication with a different structural rearrangement, this would provide evidence for an independent evolutionary origin. Alternatively, an EPSPS gene duplication with a different structural rearrangement could provide evidence for a shared evolutionary origin followed by a subsequent rearrangement.

| Plant materials
Seeds were collected from kochia individuals from locations in the western USA and western Canada between 2010 and 2015 (Table 1).
Crops grown at the sampled locations included winter wheat, no-till fallow and sugar beet. A total of 44 kochia populations from eight different states in the USA and one province in Canada were used for the analyses (Table 1) (Kumar et al., 2018), and Montana (Kumar et al., 2014) were previously screened for glyphosate resistance.

| Glyphosate resistance screening
A population ID was assigned to each locality (Table 1). Each population ID consists of the abbreviated state/province of origin, a unique identifying number, and its designation as GR or GS (e.g., CO1R = Colorado resistant population 1). To determine variation in each population for glyphosate susceptibility and resistance, a screening was performed in the glasshouse. Seeds from each population were planted in germination flats. After emergence, seedlings were transplanted into 18-insert (7 × 7-cm pots) flats containing custom mix potting soil (Fafard, Sun Gro Horticulture), and grown at 23°C under a 14-hr light/10-hr dark cycle with supplemental light from sodium halide lamps. Plants were watered daily and fertilized once (Miracle-Gro, Scotts Miracle-Gro Company). When plants were 3 weeks old, a leaf was collected for DNA extraction (see below) and nine to 18 plants per population (Table 1) were sprayed with commercially formulated glyphosate (Roundup WeatherMax) in distilled water at 0.84 kg acid equivalent ha −1 . Glyphosate applications were performed using a moving flat-fan nozzle (8002EVS) in a laboratory spray chamber at 156 L ha −1 spray volume. Three weeks after herbicide treatment, individual survival for each population was assessed.
Populations for which at least one individual survived were classified as GR.

| DNA extraction and genotyping
For total genomic DNA extraction, leaf tissue was collected from plants grown as previously described prior to glyphosate treatment.
Samples were immediately frozen in liquid nitrogen and stored at −80°C. Tissue was ground using a TissueLyser II (Qiagen; 30 Hz for 2 min). Genomic DNA was extracted from 100 mg fresh weight tissue following a modified cetyltrimethylammonium bromide (CTAB) extraction protocol (Doyle, 1991). DNA quality and concentration were measured using a NanoDrop spectrophotometer (Thermo Scientific ND-1000). All DNA samples were normalized to 5 ng µl −1 with deionized water.

| Genomic-based epidemiology using EPSPS copy number and associated duplication markers
The EPSPS locus has been sequenced from a single GR kochia individual using BAC libraries (Patterson et al., 2019  The qPCR protocol of Patterson et al. (2019) was used and relative copy number was calculated using the ∆Ct method (Schmittgen & Livak, 2008). The type I and type II EPSPS duplication qPCR primers have a forward primer in the MGE and a reverse primer in either the type I or the type II sequence, respectively, enabling amplification only when the MGE is located next to the type I or type II repeat segment. Populations and their genotypes and geographical locations were plotted in R using the ggplot and sf packages (R Core Team, 2019).

| SSR genotyping
To develop polymorphic genetic markers for kochia, Roche 454 sequencing technology (Keck Center, University of Illinois) was used to determine partial genomic sequence from a single GR kochia individual taken from the KS-R1 population used in Wiersma et al. (2015).
Approximately 75.2 million aligned bases (from 357 million total bases sequenced) with reads having an average length of 557 bases were obtained. This data set was screened for simple sequence repeats (SSRs) using Imperfect SSR Finder (https://data.nal.usda.gov/ dataset/imperfect-ssr-finder). Primers to amplify the identified SSRs were also designed using Imperfect SSR Finder ffollowing the approach described by Lee et al. (2009) Note: For each locus, primer sequence and melting temperature (T m ) and targeted repeat motif are provided. PCR expected amplicon size and annealing temperature used for PCR are also indicated.

| Genetic diversity and population structure
The evaluation of linkage equilibrium and Hardy-Weinberg equilibrium of loci was done using exact tests with the functions test_LD and test_HW, respectively in the "genepop" package (version 1.1.7; Rousset, 2008). Descriptive summaries for each population across loci were calculated using the divBasic function in the "diveRsity" package (version 1.9.90; Keenan et al., 2013). Descriptive summaries for each locus across populations were calculated using the locus_table function in the "poppr" package (version 2.8.6; Kamvar et al., 2014Kamvar et al., , 2015. Missing data were assessed using the info_table function in the "poppr" package (Kamvar et al., 2014(Kamvar et al., , 2015 and loci with more than 10% data missing and individuals with more than 20% data missing were removed. Descriptive summaries for populations and loci were made on the entire data set and again on the data set after removing loci and individuals with unacceptable levels of missing data. A population-level phylogeny based on the neighbour-joining clustering method using Prevosti's genetic distance model was generated with bootstrapped support using 1000 replicates with the aboot function in the "poppr" package (Kamvar et al., 2014(Kamvar et al., , 2015 and independent runs using the LOCPRIOR model (sampling location information included) to account for weak structure signals in the data set. The most likely number of clusters was determined using the Evanno method (Evanno et al., 2005) as implemented in structure harvester version 0.6.94 (Earl & vonHoldt, 2012). The final analysis for K = 3 was performed using a burn-in of 50,000 and 500,000 MCMC replications with 20 independent runs. Runs were summarized using clumpp version 1.1.2b (Jakobsson & Rosenberg, 2007) utilizing the Greedy algorithm and visualized with distruct version 1.1 (Rosenberg, 2004).   A was consistent with the initial characterization of the EPSPS repeat structure (Patterson et al., 2019) in that type I and II PCR markers always amplified together, and type I (long repeat) was nearly always present at higher copy number than type II (short repeat) ( Table 2; Table S1). Both Genotype A and Genotype B

| Genomic-based epidemiology using EPSPS copy number and associated duplication markers
showed a positive correlation between EPSPS copy number and MGE copy number (Figure 2), while Genotype C had no increase in MGE with increasing EPSPS copy number (Figure 2). With few exceptions, individuals with wild-type EPSPS copy number of 1 (S genotypes) had no increase in MGE copy number (Figure 2; Table   S1). Some individuals had much higher EPSPS copy number than previously reported; for example, individuals collected in Montana had 20-30 copies with no presence of type I or II and very high (>60 copies) MGE (Table 3; Table S1).
All GS kochia samples in the survey had one copy of EPSPS per haploid genome, no amplification of type I or II markers, and most had four to six copies of the MGE per haploid genome (Figure 2; Table S1). MGE copy number was increased to >10 in one GS individual from each of four populations (Figure 2; Table S1; located in Colorado, Nebraska, and Montana) without a corresponding increase in EPSPS gene duplication (Table S1).

| Genetic diversity and population structure
To determine if the population genetics data supported the three independent origins identified by genomic-based epidemiology, we developed 11 SSR markers (  (Waples, 2015) and many CIs spanned from negative to positive and over very large ranges.
As loci and populations did not meet the assumptions of HWE, a neighbour-joining tree was used to assess genetic similarity be- The structure analysis showed that K = 3 was the best-supported number of clusters or gene pools ( Figure S1) and also supported the grouping of OR4R and MT2R with the Central Great Plains populations including CO1R, KS10R and KS11R (Figure 4) Table 2 for population-level genotype proportions). Note: Genotype A is defined as increased EPSPS copy number, presence of type I and II repeats, MGE ≥ 10; Genotype B is defined as increased EPSPS copy number, no type I or II repeats, and MGE ≥ 10; Genotype C is defined as increased EPSPS copy number, no type I or II repeats, MGE < 10; Genotype S is defined as EPSPS copy number of 1. Proportion colour scale from 0 (yellow) to 1 (green).
of the variation and PC2 explaining 3.9%, and no clear patterns of clustering by region were identified ( Figure S2). in geographically distinct locations, with two independent origins supported by the data and the possibility of a third independent origin to be investigated. Determining whether genotype C, the event not associated with increased MGE copy number, occurred independently from genotype A via a different molecular genetic mechanism will require additional sequencing to assemble this specific EPSPS duplication variant.

| Evolutionary origin(s) of resistance
The first report of GR kochia was from Kansas in 2007 (Waite et al., 2013) and reports have since confirmed glyphosate resistance in multiple US states and Canadian provinces (Kumar et al., 2019). The widespread regional evolution of GR kochia has negatively impacted the sustainability of reduced-tillage weed management and  (Kumar et al., 2019). The mechanism of glyphosate resistance has been thoroughly investigated in kochia, in terms of physiology and fitness penalty as well as the genetic mechanisms that cause resistance (Beckie et al., 2018;Martin et al., 2017;Osipitan & Dille, 2017;Wiersma et al., 2015). The first populations characterized for the molecular genetics of the resistance allele were from the Central Great Plains region (Kansas and Colorado) and had tandem EPSPS gene duplications that occurred at a single locus (Jugulam et al., 2014;Patterson et al., 2019). An increased copy number of the EPSPS gene has been identified as the resistance mechanism in all studied kochia populations to date from across seven US states (Montana, Wyoming, Oregon, Idaho, Nebraska, Kansas and Colorado) and in Canada (Gaines et al., 2016;Godar et al., 2015;Kumar et al., , 2018Martin et al., 2017;Wiersma et al., 2015).

| Genomic mechanisms underlying resistant copy number variants
Mobile genetic elements can generate novel structural genomic variation through removal and/or insertion at new loci (Bennetzen, 2005;Stapley et al., 2015), which can result in adaptive genetic changes in plant (Li et al., 2018) and insect (Schmidt et al., 2010)

| Gene flow of resistant genotypes
We  ( Figure 1, Table 3; Table S1). These two populations showing admixture in Colorado indicates gene flow has occurred in this area.
We predicted that population genetics analysis would show clear geographical structure if independent glyphosate resistance evolutionary events occurred and were followed by rapid dispersal and introgression within a region. Aside from a large grouping of Central Great Plains populations and a second grouping of Pacific Northwest populations, we were not able to identify clear geographical structure for the three regions corresponding to the three EPSPS genotypes. This lack of population and geographical structure is also supported by a PCA ( Figure S2). This aligns with previous population genetics studies in kochia that have found  (Friesen et al., 2009;Kumar et al., 2019;Mengistu & Messersmith, 2002). A recent study using high-coverage singlenucleotide variant data found almost no population structure (Martin et al., 2020), concluding that high gene flow rates occur across GR kochia populations. Although some strong signals of relatedness were detected between populations from geographically isolated locations, such as OR4R and KS10R (Figures 3 and   4), we consider it to be unlikely that the same duplication mechanism is evolving independently multiple times within a region on different genetic backgrounds. Instead, it is more likely the SSR data have insufficient resolution to identify population genetic relationships, and extensive long-distance gene flow via seeds may make regional differences harder to detect. Whereas the structure analysis supports three groups, populations were not consistently assigned to three groups corresponding geographically to the regions containing the three EPSPS genotypes and most populations contained some presence of all three structure groups. This further supports the weak population structure found in the PCA. Kochia has protogynous flowers in which the stigmas emerge first and are receptive to pollen from other flowers before pollen production within the same flower occurs, reducing the self-pollination rate (Blackwell & Powell, 1981;Guttieri et al., 1995). Additionally, kochia is a well-known tumbleweed species, with some plants dispersing seeds for dozens or even hundreds of miles (Kumar et al., 2019). This dispersal mechanism greatly increases the spread of herbicide-resistance alleles and makes containment extremely difficult (Beckie et al., 2016;Kumar et al., 2019;Stallings et al., 1995).
Further research will be needed to quantify long-distance gene flow in kochia, and the relative contributions of natural tumbling dispersal and human-mediated seed migration. Palmer amaranth (Amaranthus palmeri), an obligate outcrossing weed species subjected to widespread glyphosate selection pressure, also maintains high levels of genetic diversity and little population structure.

| Evolutionary origin(s) and gene flow of resistance in other species
Single origins of herbicide resistance followed by substantial geographical distribution by pollen-and seed-mediated gene flow is known to have a major contribution to resistance frequency in several weed species (Beckie et al., 2019). The high sequence similarity of the extrachromosomal DNA containing the EPSPS gene in Palmer amaranth (Koo et al., 2018;Molin, Yaguchi, et al., 2020) across widespread populations supports the hypothesis of a single origin followed by dispersal (Molin, Patterson, et al., 2020;Molin et al., 2017Molin et al., , 2018. GR populations of flaxleaf fleabane (Erigeron bonariensis) from across multiple Australian states were highly related, supporting a single origin of resistance followed by a high frequency of seed movement (Minati et al., 2020). In contrast to these singleorigin examples, multiple independent origins of glyphosate resistance with little population structure were found in GR populations of common morning glory (Ipomoea purpurea) (Kuester et al., 2015).
A genomics-based approach in the same species found evidence for parallel genetic responses in genomic regions encoding potential herbicide detoxification enzymes, while other genomic regions showed divergent patterns of selection (Van Etten et al., 2020).
Convergent evolution of EPSPS gene duplication with unique structural variation was found in waterhemp (Amaranthus tuberculatus) from the US Midwest and Ontario, Canada, suggesting at least two independent evolutionary origins of resistance alleles followed by gene flow and introgression into two populations . Multiple independent origins of glyphosate resistance were also detected in horseweed (Erigeron canadensis) in California, with localized movement of resistant individuals accounting for spread on regional levels correlating with groundwater regulations that encouraged more glyphosate use and less use of other herbicides (Okada et al., 2013).
In summary, we have used genomic-based epidemiology to track the mutations underlying one specific origin of glyphosate resistance in kochia and showed that at least two independent origins of glyphosate resistance have evolved in kochia, followed by substantial regional gene flow to spread the resistance alleles to new genetic backgrounds. Due to the tumbling dispersal of kochia, intercepting seed movement across the landscape has high potential to mitigate the negative impact of herbicide resistance spreading from an initial origin. With the kochia reference genome now available (Patterson et al., 2019), the population genomics approach used by Kreiner et al. (2019) can be used in kochia to study population divergence and origins of resistance . From the results of this study, we will continue to investigate the hypothesis that genotype C represents an independent origin and the hypothesis that genotype B is either an independent origin or derived from genotype A with loss F I G U R E 4 Bayesian clustering analysis (structure) in glyphosate-resistant kochia (Bassia scoparia). Assignment of 509 kochia individuals from 44 populations to the K = 3 genetic clusters inferred by analysis. Populations are sorted by region (Central Great Plains, Northern Plains and Pacific Northwest) and alphabetized within region. Each horizontal bar corresponds to a distinct individual and its probability of assignment, q, to each cluster [Colour figure can be viewed at wileyonlinelibrary.com] of PCR primer sites for the type I and type II markers. We will test these hypotheses by sequencing and assembling the duplicated region from individuals with genotypes B and C to provide insights and new markers to further investigate the evolutionary dynamics of the EPSPS tandem duplication in kochia across western North America.

ACK N OWLED G EM ENTS
The work was supported by USDA National Institute of Food and

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
SSR genotypic data and R scripts used for population genetics analy-