Call of the wild rice: Oryza rufipogon shapes weedy rice evolution in Southeast Asia

Abstract Agricultural weeds serve as productive models for studying the genetic basis of rapid adaptation, with weed‐adaptive traits potentially evolving multiple times independently in geographically distinct but environmentally similar agroecosystems. Weedy relatives of domesticated crops can be especially interesting systems because of the potential for weed‐adaptive alleles to originate through multiple mechanisms, including introgression from cultivated and/or wild relatives, standing genetic variation, and de novo mutations. Weedy rice populations have evolved multiple times through dedomestication from cultivated rice. Much of the genomic work to date in weedy rice has focused on populations that exist outside the range of the wild crop progenitor. In this study, we use genome‐wide SNPs generated through genotyping‐by‐sequencing to compare the evolution of weedy rice in regions outside the range of wild rice (North America, South Korea) and populations in Southeast Asia, where wild rice populations are present. We find evidence for adaptive introgression of wild rice alleles into weedy rice populations in Southeast Asia, with the relative contributions of wild and cultivated rice alleles varying across the genome. In addition, gene regions underlying several weed‐adaptive traits are dominated by genomic contributions from wild rice. Genome‐wide nucleotide diversity is also much higher in Southeast Asian weeds than in North American and South Korean weeds. Besides reflecting introgression from wild rice, this difference in diversity likely reflects genetic contributions from diverse cultivated landraces that may have served as the progenitors of these weedy populations. These important differences in weedy rice evolution in regions with and without wild rice could inform region‐specific management strategies for weed control.


| INTRODUCTION
Crop domestication and agricultural weed evolution together represent two of the best documented forms of rapid evolution in plant species. Both of these processes are at play in the evolution of weedy crop relatives, which have recently gained attention as valuable systems for studying the genetic basis of rapid adaptation in agroecosystems (Vigueira, Olsen, & Caicedo, 2013). Weedy crop relatives are also increasingly recognized as long-standing components of agroecosystems and integral contributors to crop evolutionary dynamics (Fénart, Arnaud, De Cauwer, & Cuguen, 2008; Li, Li, Jia, Caicedo, & Olsen, 2017; Roumet, Noilhan, Latreille, David, & Muller, 2013).
In the case of domesticated rice (Oryza sativa L.), conspecific weed strains occur in rice production areas worldwide, where they aggressively compete with crop varieties for nutrients and light. Weedy rice infestations can reduce harvests by more than 80% if left unchecked and are considered a primary constraint on rice productivity in the United States and other world regions. Weedy rice control is hindered by its close phenotypic similarity to its domesticated relative, especially at the vegetative stage, and by the potential for crop-to-weed movement of herbicide resistance alleles (Chen, Lee, Song, Suh, & Lu, 2004; Shivrain et al., 2009). Although both cultivated and weedy rice are predominantly self-fertilizing, multiple instances of introgression of herbicide resistance alleles have been documented since the commercialization of herbicide-resistant cultivars in the early 2000s (Busconi, Rossi, Lorenzoni, Baldi, & Fogher, 2012; Engku et al., 2016; Shivrain et al., 2007), indicating the potential for weed adaptation through crop-to-weed gene flow. Phenotypic characteristics of weedy rice include rapid growth and soil nutrient uptake; highly shattering seeds that are easily dispersed into crop fields; strong seed dormancy, which allows seeds to remain viable in the seed bank for several years; and proanthocyanidin-pigmented pericarps, a dormancy-associated trait found in wild Oryza species (reviewed in Nadir et al., 2017).
As a conspecific weed of a genomic model crop species, weedy rice has provided a productive system for studying the dynamics of agricultural weed adaptation. Studies over the last two decades have characterized the population structure of weed strains and revealed independent weed origins in different world regions, with most weed strains closely related to domesticated rice (Akasaka, Ushiki, Iwata, Ishikawa, & Ishii, 2009; Cao et al., 2006; Cho, Chung, & Suh, 1995; Huang et al., 2017; Londo & Schaal, 2007; Song, Chuah, Tam, & Olsen, 2014). Weedy rice strains in the southern United States are among the genetically best characterized. A combination of analyses, including comparative QTL mapping (Qi et al., 2015), population genomics, candidate gene studies (Reagon, Thurber, Olsen, Jia, & Caicedo, 2011; Thurber et al., 2010; Vigueira, Li, & Olsen, 2013), and selection scans with whole-genome sequences, has revealed that the two major strains present in US rice fields likely evolved in Asia through two independent episodes of dedomestication from cultivated rice. No wild Oryza species occur in North America, and there is minimal evidence for any direct role of wild populations in the evolution of these weed populations. The US weed strains are also characterized by low genetic diversity, consistent with population bottlenecks during their introduction from Asia, which most likely occurred through accidental contamination of grain stocks.
Examination of weedy rice in different regions of Asia can provide valuable points of comparison to the genetically well-characterized US weed strains. Here, we focus on portions of Southeast Asia (Thailand, Vietnam, Cambodia, Malaysia, and Indonesia) and Northeast Asia (specifically South Korea). In the case of Southeast Asia, which represents one of the likely centers of early rice cultivation, three factors would be expected to contribute to more complex weed evolutionary dynamics than in other regions: (i) the presence of Oryza rufipogon Griff. (hereafter wild rice), the crop's wild progenitor, which is outcrossing and interfertile with both cultivated and weedy rice (Majumder, Ram, & Sharma, 1997); (ii) a far greater diversity of crop varieties and landraces in this region, some of which could be contributing to the weed's evolution (Song et al., 2014); and (iii) the very rapid proliferation of weedy rice across this region in recent decades due to agricultural shifts away from hand-transplanting of rice seedlings toward mechanized direct-seeded rice cultivation (Chauhan, 2013; Sudianto et al., 2016). In contrast to Southeast Asian weeds, those in Northeast Asia are similar to US weeds in that they occur outside the geographical range of wild Oryza populations and the area of high crop varietal diversity.
Among these different factors that could shape weedy rice evolution, the presence or absence of wild rice populations could be particularly important for weedy rice adaptation. Some wild rice traits, such as freely shattering seed and persistent seed dormancy, would be expected to be highly adaptive if introgressed into weedy rice populations. In contrast, wild rice traits such as perenniality, sporadic seed production and prostrate plant architecture would all be expected to be maladaptive for survival in cultivated rice fields. Given this combination of potentially beneficial and maladaptive traits for weedy rice, one might expect differential evidence of wild-to-weed introgression in the specific genomic regions that would confer weed-adaptive traits. Because rice is a genomic model species with a well-annotated reference genome and molecularly well-characterized domestication genes, evidence for such adaptive introgression can be explicitly examined using dense, genome-wide SNP markers (Hufford et al., 2013). This genome-wide approach can serve as a useful complement to recent candidate gene studies which have suggested adaptive introgression of wild rice alleles conferring shattering (sh4, Song et al., 2014) and seed dormancy (Rc, Cui et al., 2016) in some Malaysian weedy rice strains.
In this study, we used genome-wide SNPs generated through genotyping-by-sequencing (GBS) to compare the genetic composition and evolution of weedy rice strains in Southeast Asia, Northeast Asia, and the United States. We specifically address the following questions: (i) How do weeds from these different world regions compare with respect to relationships to cultivated rice varieties and to wild rice? (ii) To what extent does wild rice hybridization shape the genetic composition of Southeast Asian weeds? (iii) Is there evidence that wild rice hybridization with weeds in Southeast Asia has led to the differential introgression of loci associated with weed-adaptive traits?

| Sampling and genotyping
Rice seeds were obtained from the International Rice Germplasm Collection (IRGC), the United States Department of Agriculture (USDA), and from direct rice field collections in Malaysia (BK Song collections). Sampling included 133 weedy, 73 cultivated, and 34 wild rice accessions (Table 1 and Table S1) for a total of 240 accessions. Seeds were germinated for each sampled accession in the greenhouse at Washington University, and leaf tissue was collected from young seedlings. DNA was extracted using DNeasy Plant DNA kits (QIAGEN) or a modified CTAB procedure (Doyle, 1991). DNA concentrations were determined using Qubit Fluorometric Quantification. Genotyping-by-sequencing was carried out on 1 μg of genomic DNA (100 ng/μl) at Cornell University's Genomic Diversity facility based on the methods outlined by Elshire et al. (2011). Briefly, each sample was digested with ApeKI followed by ligation of barcode and common adapters. Barcoded libraries were sequenced on an Illumina HiSeq2000 sequencer (Illumina Inc., San Diego, CA) with single-end 100-bp chemistry. Raw sequence data were processed using a standard TASSEL-GBS pipeline (Bradbury et al., 2007). First, reads were filtered out if "N" was reported in the first 72 bases or if a read did not contain a perfect match to any of the barcodes used in this study. Tags represented by fewer than five reads of identical sequence were also discarded. All filtered tags were then aligned to the rice genome MSU 6.0 assembly (http://rice.plantbiology.msu.edu) using the Burrows-Wheeler alignment (BWA) tool (Li & Durbin, 2009), allowing a maximum of four mismatches and no gaps within 5 bp at the end of each read. SAMConverter was employed to convert SAM files to TagsOnPhysicalMap (TOPM) files, which were used to store information on the identified SNPs and small indels. Loci with more than 10% missing data and monomorphic loci were discarded. After this filtering process, a total of 44,769 SNPs were retained for further analyses.
Raw reads were submitted to the NCBI Short Read Archive (accession SRX576894).
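The final locus filter described above (discarding loci with more than 10% missing data and monomorphic loci) can be illustrated with a minimal sketch. This is not the TASSEL-GBS implementation; the function name `filter_loci`, the genotype encoding (two-character allele strings, `None` for a missing call), and the toy locus names are all assumptions made for illustration.

```python
def filter_loci(genotypes, max_missing=0.10):
    """Return IDs of loci that pass both filters.

    genotypes: dict mapping a locus ID to a list of genotype calls,
    one per accession. Each call is a string of alleles such as "AA"
    or "AG"; None marks a missing call (hypothetical encoding).
    """
    kept = []
    for locus_id, calls in genotypes.items():
        n_missing = sum(1 for call in calls if call is None)
        if n_missing / len(calls) > max_missing:
            continue  # more than 10% missing data
        observed = {allele for call in calls if call is not None for allele in call}
        if len(observed) < 2:
            continue  # monomorphic among the called accessions
        kept.append(locus_id)
    return kept


# Toy data for ten accessions (invented values, not from the study).
toy = {
    "S1_100": ["AA"] * 6 + ["GG"] * 4,                    # polymorphic, complete
    "S1_200": ["CC"] * 10,                                # monomorphic
    "S1_300": ["TT"] * 5 + ["CC"] * 3 + [None] * 2,       # 20% missing
    "S1_400": ["AA"] * 5 + ["AG"] + ["GG"] * 3 + [None],  # exactly 10% missing
}
passing = filter_loci(toy)
print(passing)
```

In this sketch a locus with exactly 10% missing calls is retained, because the stated criterion is "more than 10%".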

| Population structure, PCA, and nucleotide diversity analyses
Bayesian analysis of population structure was performed in fastSTRUCTURE (Raj, Stephens, & Pritchard, 2014), with K values varying from 1 to 10 and three replicates for each K. The Python script chooseK.py, distributed with fastSTRUCTURE, was used to identify the K value that maximized the marginal likelihood. Principal components analysis (PCA) was performed using the smartpca software in the EIGENSOFT package (Patterson, Price, & Reich, 2006).
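The model-selection step above amounts to comparing marginal likelihoods across candidate K values. The sketch below shows one simple way to do this, by averaging over replicates and taking the maximum; it is not the chooseK.py code, and the function name `choose_k` and all likelihood values are invented for illustration.

```python
from statistics import mean


def choose_k(marginal_likelihoods):
    """Return the K whose replicates have the highest mean marginal likelihood.

    marginal_likelihoods: dict mapping K to a list of per-replicate
    marginal likelihood values (hypothetical input layout).
    """
    return max(marginal_likelihoods, key=lambda k: mean(marginal_likelihoods[k]))


# Invented values for K = 1..4 with three replicates each.
example = {
    1: [-1.20, -1.21, -1.20],
    2: [-0.95, -0.96, -0.95],
    3: [-0.90, -0.91, -0.90],
    4: [-0.92, -0.93, -0.91],
}
best_k = choose_k(example)
print(best_k)
```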
A sliding window analysis for each identified rice genetic group was performed to estimate relative nucleotide diversity between rice groups across the genome. SNPs were converted from HAPMAP format to VCF format using TASSEL (Bradbury et al., 2007), and average pairwise nucleotide diversity (π) was estimated with a window size of 300 kbp and a step size of 100 kbp for a total of 3,679 sliding windows in VCFtools (Danecek et al., 2011). The mean nucleotide diversity and variation were calculated and visualized in R. GBS data cover a reduced fraction of the genome, so nucleotide diversity estimates likely include only a fraction of the total SNPs present in any given window. These estimates will therefore be impacted by the use of GBS data; however, the relative nucleotide diversity estimates between rice groups should not be affected, as the same set of markers is used in all rice groups.
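As a rough illustration of the windowed estimate described above, the following sketch computes π in 300-kbp windows with a 100-kbp step for a single chromosome. It is a simplified stand-in, not the VCFtools implementation: it assumes biallelic SNPs, takes per-site diversity as 2c(n − c) / (n(n − 1)) for c alternate alleles among n sampled chromosomes, and divides the windowed sum by the window span in base pairs. The function names and the toy sites are hypothetical.

```python
def site_pi(alt_count, n_chrom):
    """Expected pairwise differences at one biallelic site."""
    if n_chrom < 2:
        return 0.0
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def windowed_pi(sites, chrom_length, window=300_000, step=100_000):
    """Return (start, end, pi) for each sliding window on one chromosome.

    sites: list of (position, alt_count, n_chrom) tuples with 1-based
    positions; n_chrom counts the chromosomes with non-missing calls.
    """
    results = []
    start = 1
    while start <= chrom_length:
        end = min(start + window - 1, chrom_length)
        total = sum(site_pi(a, n) for pos, a, n in sites if start <= pos <= end)
        results.append((start, end, total / (end - start + 1)))
        start += step
    return results


# Two invented SNPs on a 400-kbp toy chromosome, 10 chromosomes sampled.
toy_sites = [(50_000, 5, 10), (250_000, 2, 10)]
windows = windowed_pi(toy_sites, chrom_length=400_000)
for start, end, pi in windows:
    print(start, end, pi)
```

Because only SNPs recovered by GBS enter the sum, absolute values from a calculation like this underestimate true diversity, which is why the comparison between groups is made on the same marker set.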

| Subpopulation structure of Malaysian weedy rice
Because a previous analysis of Malaysian weedy rice that we performed using simple sequence repeat (SSR) markers (Song et al., 2014)

TABLE 1 Summary of 240 rice accessions used in this study, including country of collection and type

| Local ancestry estimation (HAPMIX)
Because population structure analysis of Southeast Asian weeds

For the first scenario, VTCI (Vietnam, Thailand, Cambodia, and Indonesia) wild rice (13 accessions) and landraces (15 accessions) were employed as the two putative parental populations, and the 19 weedy accessions from the same region were employed as the descendant population (Table S2). Indonesian samples were not included in this analysis because of insufficient wild rice collections in that country. To further investigate the role of wild alleles in the origin of wild-like traits that are present in these weeds, six well-studied domestication-related genes that control weed-adaptive traits were chosen to examine whether these genes or genomic regions were likely introgressed from wild accessions: An-1, controlling awn development (Luo et al., 2013); Bh4, controlling hull color (Zhu et al., 2011); sh4, controlling grain shattering (Li, Zhou, & Sang, 2006); qSW5, controlling seed size (Shomura et al., 2008; Weng et al., 2008); PROG1, controlling prostrate versus erect growth (Jin et al., 2008; Tan et al., 2008); and Rc, controlling pericarp pigmentation (Sweeney, Thomson, Pfeil, & McCouch, 2006).
For each of these genes, VTCI weeds are characterized primarily by the phenotype found in wild rice. The genomic locations of these genes were verified using the latest rice reference genome assembly (IRGSP version 7).
For the second and the third admixture scenarios, we aimed to assess the extent to which three potential parental groups (elite cultivars, landraces, and wild rice) contributed to the Malaysian weedy rice genome. In the second scenario, the four Malaysian elite rice accessions and six Malaysian landrace accessions (which showed genetic similarity to weedy rice in population structure analyses) were employed as the two proposed parental populations. In the third scenario, we used seven Malaysian wild rice accessions in place of the landrace accessions used in the second scenario (Table S2). Although the actual parental accessions are unknown in each scenario, the use of these proposed parental accessions is arguably justified because they are geographically sympatric with the local weedy rice and genetically most similar based on our global ancestral estimation (see Results).
To convert SNP data from ANCESTRYMAP format to the EIGENSTRAT format, which is required by HAPMIX, the CONVERTF function in the EIGENSOFT package (Patterson et al., 2006)

this assumption is consistent with population structure results. The number of generations since admixture was set to 10,000. This is most likely an overestimate, as the earliest origins of rice domestication likely date to no more than 10,000-12,000 years ago; however, HAPMIX results are robust to inaccurate estimations of time (Price et al., 2009), and the large estimate ensures that the time frame of weedy rice evolution is encompassed in the simulation. All of the remaining parameters used default settings. Results were visualized in R.

| Population structure
aromatic, tropical japonica, and temperate japonica varieties (within the japonica subspecies). Results from K = 4-7 are also presented for comparison. As found in previous studies (Reagon et al., 2010), US weedy rice groups into two distinct subpopulations that are closely related to the two cultivated rice subgroups within the indica subspecies: aus-like strains, corresponding to the previously described black-hull awned (BHA) weeds, and indica-like strains, corresponding to the previously described straw-hull awnless (SH) weeds.
Like US weeds, the sampled South Korean weedy rice accessions also segregate into two distinct genetic groups. For these samples, however, one group is indica-like while the other is grouped with japonica rice, specifically the temperate japonica varieties cultivated primarily in Northeast Asia (Figure 1). This evidence for temperate japonica-derived South Korean weeds is consistent with other reports of japonica-like weedy rice in temperate east Asia (Cho et al., 1995; He, Kim, & Park, 2017).

In contrast, cultivation of aus rice varieties is restricted to northern regions of the Indian subcontinent and is absent in Southeast Asia. As such, the aus-like component may be more likely to reflect weed descent from aus-like wild rice than from cultivated aus ancestors.
Overall, the combination of crop-like and wild-like genetic components in VTCI weeds suggests an origin through hybridization between cultivated rice and wild rice growing in close proximity to rice fields.
Malaysian weed samples are genetically distinct from the other Southeast Asian weeds (Figure 1).

(Reagon et al., 2010). In contrast, weedy rice from VTCI has nucleotide diversity levels that are only slightly lower than the levels found in O. rufipogon (2.14 × 10⁻⁵). These high levels of nucleotide diversity could be driven by genetic input from multiple sources into these weeds, particularly indica cultivated rice and O. rufipogon. Malaysian weeds show a slight reduction in genetic diversity when compared to indica cultivars (1.34 × 10⁻⁵).

| Admixture characterization
To quantitatively describe the degree of admixture in different weedy rice populations revealed by the fastSTRUCTURE analysis, we adopted a threshold in which individuals whose genome is composed of <80%

(Figure 4a-c). For the VTCI samples, there is a higher inferred level of contribution from wild rice (green) than from local landraces (red) (Figure 4a). Using a criterion of >80% ancestry to define genomic regions with a dominant contribution from a single ancestor, we found that 5.4% of the VTCI genomes are dominated by wild rice, with <1% showing a predominant inferred contribution from local landraces (Figure S4). Notably, all six investigated weediness candidate genes (An-1, Bh4, sh4, qSW5, PROG1, and Rc) were located within genomic regions dominated by wild rice alleles (Figure S5). Consistent with these patterns, seed phenotyping

| DISCUSSION
Weedy rice provides a valuable model to study rapid evolution and to test the importance of introgression from crops and their wild relatives in this process. In this study, we used more than 40,000 genome-wide SNPs to study the origin of weedy rice in five Southeast

| Contributions of wild and cultivated rice to weedy rice evolution
Weedy rice strains from different world regions have originated independently from one another. In nearly all cases, there is some genetic contribution of cultivated rice to the ancestry of weedy rice populations. However, the contribution of wild rice to weedy rice ancestry varies widely. US and South Korean weeds have no indication of wild rice contributions, and dedomestication events from cultivated rice seem to be responsible for the origin of these weeds.

(Figures 1 and 5).
Interestingly, indica-like weeds from Malaysia appear to be genetically more similar to US indica-like weeds than to weeds from other parts of Southeast Asia or South Korea (PCA; Figure S2). This could point to a shared common ancestor of Malaysian and US weeds. That conclusion is not supported by previous findings based on SSR markers (Song et al., 2014; Figure S2), so we cannot rule out independent dedomestication events leading to these two weedy rice populations. The contribution of aus genes to weedy rice in Southeast Asia is smaller than that of indica, but is still present. Given the absence of aus rice cultivation in Southeast Asia, this contribution most plausibly reflects introgression from aus-like wild rice rather than cultivated aus varieties.
A subset of South Korean weedy rice is of temperate japonica origin. Weedy rice found in California (USA), which was not included in this study, is also of temperate japonica origin (Kanapeckas et al., 2016). It would be interesting to investigate whether these two weedy rice populations evolved independently from one another or reflect a single dedomestication event. To our knowledge, there is no evidence of weedy rice evolution from tropical japonica or aromatic rice varieties.

| Standing variation versus adaptive introgression
South Korean and US weedy rice populations do not show evidence of introgression of alleles from wild rice since their origin. Therefore, weed-adaptive traits that are found in these populations are likely derived from standing genetic variation in the cultivated rice ancestral population. Previous studies have found standing variation in cultivar ancestors to be a contributing factor in weedy traits. The functional Rc allele that results in red-colored pericarps in wild rice is present in some cultivated rice varieties that are the likely source of this allele in weedy rice from the United States. Genetic bottlenecks during dedomestication could contribute to fixation of rare alleles from cultivated varieties in weedy populations. In a worldwide survey of cultivated and wild rice, one aus cultivar shared a functioning Bh4 allele, which underlies the black-hull phenotype, with the fixed allele of BHA weedy rice from the United States (Vigueira, Li et al., 2013).
However, standing variation does not seem to be the most likely source of all weed-adaptive traits in US weedy rice; for example, the sh4 allele that underlies loss of seed shattering in cultivated rice is also fixed in US weedy rice. The gain of seed shattering in these weeds may thus be due to recombination of standing genetic variation in the weeds or new mutations since weed establishment. Distinct QTLs are responsible for shattering in BHA and SH weeds from the United States (Qi et al., 2015), suggesting different underlying genetic mechanisms in the re-emergence of shattering.
Our genomic analysis of weedy rice from Southeast Asia paints a different picture from that of US weeds. Wild rice has contributed weed-adaptive alleles to weedy populations from Malaysia, Thailand, Vietnam, Cambodia, and Indonesia. Wild rice is the inferred contributing parent for genomic regions that house the six candidate weediness genes we tested (Figure S5), specifically An-1, controlling awn development (Luo et al., 2013); Bh4, controlling hull color (Zhu et al., 2011); sh4, controlling grain shattering (Li et al., 2006); qSW5, controlling seed size (Shomura et al., 2008; Weng et al., 2008); PROG1, controlling prostrate versus erect growth (Jin et al., 2008; Tan et al., 2008); and Rc, controlling pericarp pigmentation (Sweeney et al., 2006). The wild rice contributions at all six of these regions suggest that this pattern is due to adaptive introgression rather than random, neutral introgression of portions of the genome.
One pattern that emerges from all of the weedy rice strains examined here is that to become weedy, rice apparently requires some genomic background from cultivated rice mixed with adaptive weedy alleles that are gained through standing variation, new mutations, or adaptive introgression from wild rice. In regions where reproductively compatible wild rice is present, it appears that adaptive introgression from wild relatives is an effective route to acquisition of adaptive alleles. A more complete sampling of weedy rice, wild rice, and cultivars from Southeast Asia would be beneficial to untangle exactly which genomic regions are cultivar-like and which are wild-like across weedy populations. These regions, if shared across populations, would presumably be important to weedy rice establishment or maintenance. It would also be interesting to track this over time as management of rice fields in Southeast Asia changes from traditional practices to more mechanized farming.

FIGURE 5 Current hypothesis of relationships between wild, cultivated, and weedy rice included in this study. Circles represent each group with connected triangles that represent population bottlenecks that lead to establishment of the new group. Green arrows represent introgression from wild rice into weedy populations after establishment. Average nucleotide diversity (π) is indicated for each group.

| Weedy rice management implications
Weedy rice populations have undergone a massive increase in regions of Malaysia and Thailand as a result of agronomic shifts toward industrialized rice production that relies on mechanized direct seeding of rice fields (Chauhan, 2013; Sudianto et al., 2016). Traditional rice farming in this region has relied on hand-transplanting of paddy-grown rice seedlings into prepared, flooded fields. While extremely labor-intensive, that method provides ample opportunities for hand-weeding of rogue weed seedlings. In contrast, direct seeding reduces opportunities for weeding and increases risks of weed seed dissemination between fields through shared farm equipment (Chauhan, 2013; Nadir et al., 2017). In Thailand, agricultural changes in the last decade away from direct seeding toward mechanized planting of seedlings have proved an effective weed control strategy (S. Jamjod, Chiang Mai Univ., pers. comm.). As cultivation of herbicide-resistant rice has become more popular with Malaysian farmers in the last decade, the problem of herbicide-tolerant weedy rice has concurrently arisen (Engku et al., 2016; Ruzmi, Ahmad-Hamdani, & Bakar, 2017). This causes additional strain on the rice industry, particularly in regions where herbicide-tolerant rice cultivation has not been widely adopted.
Southeast Asian weedy rice is marked by much higher levels of genetic diversity than weedy rice from the other world regions sampled for this study. This high level of diversity, as well as the continued potential for genetic introgression from both wild and cultivated rice in the region, will likely make management of weedy populations extremely difficult. Although weed control in Thailand has shown promise over the last decade, the genetic potential for adaptation to these new practices is quite high in Thailand's weeds. Given our findings, management strategies that include control of wild rice populations in closest proximity to cultivated fields may be beneficial to weed control, as this could reduce the continued movement of adaptive alleles from wild rice into weedy populations. The high levels of genetic diversity in weedy rice from Southeast Asia further suggest that management in those regions should not rely solely on a single method (such as herbicide treatment), as the potential for rapid adaptive evolution is extremely high. Regardless of approach, careful monitoring of weedy populations during any shift in agricultural practices should be used to ensure the continued effectiveness of weed management strategies.