Genetic variation and possible origins of weedy rice found in California

Abstract Control of weeds in cultivated crops is a pivotal component in successful crop production allowing higher yield and higher quality. In rice‐growing regions worldwide, weedy rice (Oryza sativa f. spontanea Rosh.) is a weed related to cultivated rice which infests rice fields. With populations across the globe evolving a suite of phenotypic traits characteristic of weeds and of cultivated rice, varying hypotheses exist on the origin of weedy rice. Here, we investigated the genetic diversity and possible origin of weedy rice in California using 98 simple sequence repeat (SSR) markers and an Rc gene‐specific marker. By employing phylogenetic clustering analysis, we show that four to five genetically distinct biotypes of weedy rice exist in California. Analysis of population structure and genetic distance among individuals reveals diverse evolutionary origins of California weedy rice biotypes, with ancestry derived from indica, aus, and japonica cultivated rice as well as possible contributions from weedy rice from the southern United States and wild rice. Because this diverse parentage primarily consists of weedy, wild, and cultivated rice not found in California, most existing weedy rice biotypes likely originated outside California.

especially in terms of how these de-domesticated plants came to be and how their populations and genomes are evolving (Ellstrand et al., 2010; Qiu et al., 2017; Wedger & Olsen, 2018).
Asian domesticated rice (Oryza sativa L.) originated in South and Southeast Asia from the wild rice species Oryza rufipogon Griff.
Several distinct types of cultivated rice, including japonica, indica, and aus varieties, have evolved through multiple domestication events, adaptation to environment and rice-growing practices, and selection for agronomic and culinary traits (Londo, Chiang, Hung, Chiang, & Schaal, 2006). Cultivated rice has been a model genetic system for agricultural plants because of its small genome, ease of genetic manipulation, and importance as a food source globally (Wedger & Olsen, 2018).
This has led to a wealth of developed genetic resources, which can be used for the study of plant evolution. In the United States, two major rice-growing regions, the Mississippi River flood plain in the southern United States and the Sacramento Valley region of California, produce tropical japonica and temperate japonica rice, respectively.
Weedy rice (O. sativa f. spontanea Rosh.), also known as red rice, is a major problematic weed of rice agriculture in many regions (Wedger & Olsen, 2018; Figure 1). It is considered to be the same species as cultivated rice, O. sativa (Langevin, Clay, & Grace, 1990). While populations of weedy rice vary, it can generally be distinguished from cultivated rice by the red seed pericarp it is named for, high seed shattering, and increased seed dormancy (Gealy, 2005). This weed competes with cultivated rice in the field, leading to yield losses of up to 49% in the southern United States (Shivrain et al., 2009). Weedy rice is phenotypically similar to cultivated rice during the vegetative stage, making it difficult to identify until late in the growing season. The phenotypic and biological similarities of weedy rice with cultivated rice make it difficult to control in-season with either hand-weeding or chemical weed control methods. Because weedy rice is conspecific with cultivated rice, the abundant genetic resources developed for cultivated rice can also be applied to weedy rice. With diverse populations found in rice-growing regions around the world, weedy rice can be used as a model system to study weedy plant evolution and to understand the process of de-domestication.
As early evolutionary biologists considered the origins of weeds that are related to crops, various hypotheses were proposed for evolutionary pathways to weediness (Baker, 1974; De Wet & Harlan, 1975; Ellstrand et al., 2010). Weedy rice populations across the globe offer support for many of these hypotheses. The endoferal hypothesis states that weedy crop relatives are derived directly from the crop as a result of de-domestication (Gressel, 2005). Local weedy populations could have descended either from locally grown cultivars or from distantly located cultivars transported through movement of contaminated seed. Endoferal weedy rice populations that likely originated from local rice varieties have been identified in China (Cao et al., 2006; Xia, Wang, Xia, Zhao, & Lu, 2011). Some of these weed populations may have arisen through hybridization of indica and japonica cultivars, followed by environmental adaptation (Qiu et al., 2014). Weedy rice populations from the southern United States have been found to be descended from Asian indica and aus rice cultivars not grown in the United States (Londo & Schaal, 2007; Reagon et al., 2010). New weedy rice biotypes may also arise by hybridization of cultivated rice with existing weedy rice biotypes. Some populations of weedy rice in the southern United States originally derived from Asian cultivated sources have hybridized with each other and with local cultivars, resulting in distinct weedy rice populations (Shivrain et al., 2010). DNA sequencing of southern United States weedy rice and Asian weedy rice revealed that some populations contain a functional allele of the Rc gene responsible for both red pericarp and increased seed dormancy, indicating a possible relationship to wild rice or Asian rice landraces never selected for white pericarp (Li, Li, Jia, Caicedo, & Olsen, 2017; Subudhi et al., 2012).
In contrast to the endoferal hypothesis, the exoferal hypothesis states that weedy populations are the result of hybridization of the crop with its wild relative (Ellstrand et al., 2010; Gressel, 2005), in the case of rice most likely with O. rufipogon or O. nivara.
Domesticated rice and its wild ancestor O. rufipogon have some reproductive barriers, but gene flow is possible between domesticated, weedy, and wild rice (Bah, Merwe, & Labuschagne, 2017; Chu & Oka, 1970; Gealy, Mitten, & Rutger, 2003; Langevin et al., 1990). Some studies have proposed that some southern United States weedy rice populations evolved from crop-wild hybridization in China (Kanapeckas et al., 2016; Londo & Schaal, 2007), although there is limited empirical evidence for this. One final hypothesis is that weedy rice populations may not be derived from cultivated rice at all, but rather derived directly from wild rice species such as O. rufipogon or O. nivara, and that these populations have adapted to the environment of the cultivated rice agroecosystem, evolving phenotypic similarity with cultivated rice while retaining the seed shattering and seed dormancy traits of the wild species (Gressel, 2005). Some south Asian populations of weedy rice are likely descended from a wild rice ancestor (Huang et al., 2017). These hypotheses for the origins of de-domesticated populations are not mutually exclusive, and it is possible that a weedy population may have genetic contributions from several wild, weedy, or domesticated ancestors.
FIGURE 1 Weedy rice panicles in a field in Colusa County, California (photo credit: Luis Espino)
It is clear from these previous studies that weedy rice around the world has evolved through several independent origins from diverse sources. The suite of phenotypic traits characteristic of weedy rice has evolved multiple times through convergent evolution of diverse genetic mechanisms (Li et al., 2017; Qi et al., 2015; Qiu et al., 2017; Thurber, Jia, Jia, & Caicedo, 2013). These studies highlight the need to investigate the evolutionary origins of weedy rice in individual regions to gain a greater understanding of weedy rice evolution as a whole (Wedger & Olsen, 2018).
In California, weedy rice was reported in the early 20th century shortly after the beginning of commercial rice production and was hypothesized to have originated from contaminated seed from the southern United States (Bellue, 1932). In the 1950s, weedy rice was thought to be eradicated. The use of a continuously flooded system and the widespread adoption of a certified seed program involving third-party field inspections and rice variety certification were credited as the reasons for the disappearance of weedy rice.
In 2003, however, weedy rice of a single biotype was reported in a dry-seeded rice field (Kanapeckas et al., 2016). Since then, weedy rice has been identified in at least 60 fields and on over 4,050 ha in 2016 (Whitney Brim-DeForest, personal communication, February 5, 2018). While weedy rice in the southern United States has been well-characterized (Li et al., 2017; Londo & Schaal, 2007), weedy rice in California is a recent and growing problem with previous studies limited to one or two biotypes (Kanapeckas et al., 2017, 2016; Londo & Schaal, 2007). It is unclear whether California weedy rice is derived from the weedy rice present in the southern United States, from cultivated rice inside or outside of California, from Asian wild rice, or from hybridization of any of these groups.
In this study, we seek to investigate the genetic diversity and relationships of California weedy rice, in order to gain insights into its evolutionary origins. We used microsatellite (SSR) markers and an Rc gene-specific marker to genotype 48 California accessions of weedy rice, as well as weedy rice from the southern United States, wild rice, and cultivated rice at 99 loci. We used phylogenetic, population structuring, and genetic distance-based approaches to examine possible relationships and evolutionary hypotheses for the origin of California weedy rice. We hypothesized that genetic diversity of weedy rice biotypes and their relationships to other rice groups would indicate multiple independent evolutionary origins of California weedy rice.

| Plant material
(Table 1). Samples were obtained from commercial rice fields in five of the nine major rice-producing counties (Glenn, Colusa, Butte, Yuba, and Sutter counties) in the northern Sacramento Valley region of California. The majority of the 2006 collections were strawhull awned type or bronzehull awnless type, while several phenotypic types were present in 2016 collections (Tables 1 and 2).
Four of the 2006 accessions were also used to produce plant material for other studies (Kanapeckas et al., 2017, 2016). To enable comparison with other weedy and wild rice, we included 20 weedy rice accessions from the southern United States (Arkansas, Mississippi, Missouri, Louisiana, and Texas) and 8 wild rice accessions. We also included a total of 22 cultivated rice accessions: 6 temperate japonica, 4 tropical japonica, 5 indica, 5 aus, 1 aromatic group V, and 2 red-pericarp specialty rice accessions. Samples of cultivated rice, southern weedy rice, and wild rice were obtained from USDA collections and from the Rice Experiment Station (Biggs, California) (Table 1).

| Genetic analysis
Genomic DNA was extracted from a 4-cm-long piece of leaf tissue from each plant sample using a modified TE-potassium acetate extraction protocol (Tai & Tanksley, 1990). Extracted genomic DNA was used directly for genotyping with 98 microsatellite (SSR) markers and 1 Rc gene-specific marker (Subudhi et al., 2012) (Supporting Information Table S1). PCR amplification was performed with 0.1 µM labeled forward and reverse primers, 0.06 µM unlabeled dNTPs, 1× PCR buffer, 0.08 units BioReady Taq polymerase (Bulldog Bio, Portsmouth, NH), and 10 ng DNA in an 8 µl PCR reaction. PCR reactions were run in a thermocycler with an initial denaturing step of 5 min at 94°C, followed by 35 cycles of 15 s at 94°C, 15 s at 55°C, and 30 s at 72°C, and a final extension of 5 min at 72°C. Products were resolved in a 6% polyacrylamide gel using an ABI 377 DNA sequencer (Applied Biosystems, Waltham, MA). Allelic differences between samples were scored based on allele size for genetic diversity (GenAlEx) and population structure (STRUCTURE) analyses. Genetic data were also scored as present (1) or absent (0) for each allele for the construction of phylogenetic trees.
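The two scoring schemes described above (allele sizes versus presence/absence of each allele) can be illustrated with a minimal Python sketch. The sample names, locus names, and allele sizes are hypothetical, and the function is only an illustration of the recoding step, not the procedure or software used in this study.

```python
from typing import Dict, List, Tuple

# Hypothetical input: sample -> locus -> (allele size 1, allele size 2) in bp.
Genotypes = Dict[str, Dict[str, Tuple[int, int]]]


def to_presence_absence(
    genotypes: Genotypes,
) -> Tuple[List[Tuple[str, int]], Dict[str, List[int]]]:
    """Recode size-scored genotypes as a 0/1 matrix, one column per (locus, allele)."""
    # Every distinct (locus, allele size) pair seen in any sample becomes a column.
    columns = sorted(
        {
            (locus, allele)
            for loci in genotypes.values()
            for locus, alleles in loci.items()
            for allele in alleles
        }
    )
    # A cell is 1 if the sample carries that allele at that locus, else 0.
    matrix = {
        sample: [1 if allele in loci.get(locus, ()) else 0 for locus, allele in columns]
        for sample, loci in genotypes.items()
    }
    return columns, matrix


# Illustrative data only (not from this study).
example: Genotypes = {
    "weedy_A": {"SSR1": (150, 150), "SSR2": (200, 204)},
    "cultivar_B": {"SSR1": (154, 154), "SSR2": (200, 200)},
}
columns, matrix = to_presence_absence(example)
```

In this sketch a missing locus is scored as 0 for all of its alleles, which is an assumption; a real analysis would need an explicit missing-data code.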
To examine the biotypes of weedy rice existing in California and the southern United States, genetic diversity and differentiation indices, including the mean number of alleles detected per locus, Shannon diversity index, observed and expected heterozygosity, unbiased expected heterozygosity, and inbreeding coefficient were assessed for weedy rice biotypes (populations) using GenAlEx v6.5 software (Peakall & Smouse, 2006). Genetic differences among groups of California weedy rice were also inferred by conducting an analysis of molecular variance (AMOVA) in GenAlEx software.
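For readers unfamiliar with these indices, the following Python sketch computes them for a single locus using the standard frequency-based definitions (Shannon index I = -Σ p ln p; expected heterozygosity He = 1 - Σ p²; unbiased He = 2N/(2N - 1) × He; inbreeding coefficient F = 1 - Ho/He). The function name and example genotypes are hypothetical, and the sketch is meant to illustrate the quantities rather than to reproduce the GenAlEx implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def locus_diversity(genotypes: List[Tuple[int, int]]) -> Dict[str, float]:
    """Per-locus diversity indices from diploid genotypes scored as allele sizes."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Allele frequencies over the 2N gene copies sampled.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [count / (2 * n) for count in counts.values()]
    shannon = -sum(p * math.log(p) for p in freqs)
    # Observed heterozygosity: fraction of individuals with two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    he = 1.0 - sum(p * p for p in freqs)
    uhe = (2 * n / (2 * n - 1)) * he
    # F is undefined at a monomorphic locus (He = 0).
    f = 1.0 - ho / he if he > 0 else float("nan")
    return {
        "n_alleles": float(len(counts)),
        "shannon": shannon,
        "ho": ho,
        "he": he,
        "uhe": uhe,
        "f": f,
    }


# Illustrative data only: three homozygotes and one heterozygote.
result = locus_diversity([(150, 150), (150, 150), (154, 154), (150, 154)])
```

A largely selfing population, as expected for rice, would show few heterozygotes and therefore a value of F close to 1.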
To assess relationships between all rice samples, phylogenetic analysis of all 96 samples of California weedy rice, southern United States weedy rice, wild rice, and cultivated rice was conducted using neighbor-joining analysis with 1,000 bootstrap iterations in DARwin v.6 software (Perrier, Flori, & Bonnot, 2003) with allelic data from 99 genetic markers. To assess the membership of individual genotypes
TABLE 1 List of rice genotypes used in the genetic study (genotype source, grain attributes, and presence or absence of 14-basepair deletion in Rc gene)

| RESULTS
The 99 markers used in this study covered the 12 chromosomes of rice with an average of 8 markers per chromosome and a mean interval distance of 4.43 Mb between markers. The markers showed high polymorphism with an average of 5 alleles and a mean polymorphism information content (PIC) value of 0.61 per marker. In total, 508 different alleles were scored among 96 rice genotypes using the 99 markers (Supporting Information Table S1). The presence or absence of a 14-basepair deletion at the Rc gene correlated with red or white pericarp in rice individuals (Table 1). All weedy rice individuals had the wild-type allele lacking the deletion, demonstrating the effectiveness of this marker for genetic identification of red pericarp in California weedy rice (Table 1).
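The text does not state which formula was used for the polymorphism information content. As an assumption, the sketch below uses the common definition PIC = 1 - Σ pᵢ² - Σ_{i<j} 2 pᵢ² pⱼ², where pᵢ is the frequency of allele i at a marker. The function name and example frequencies are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def polymorphism_information_content(freqs: Sequence[float]) -> float:
    """PIC for one marker from its allele frequencies (which must sum to 1)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    # Correction term summed over every unordered pair of distinct alleles.
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - pair_term


# Two equally frequent alleles versus four equally frequent alleles.
pic_two = polymorphism_information_content([0.5, 0.5])
pic_four = polymorphism_information_content([0.25, 0.25, 0.25, 0.25])
```

Under this definition, PIC rises with both the number of alleles and the evenness of their frequencies, so a mean of 0.61 indicates markers that are considerably more informative than a balanced two-allele marker (0.375).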
In the neighbor-joining phylogenetic analysis, individuals largely clustered by rice type (Figure 2). While bootstrap support for many basal branches of the tree is low, the grouping of most rice individuals into clusters by rice type is well-supported. The California weedy rice samples were grouped into four clusters, which correspond to five distinct biotypes categorized by hull color, grain type, and presence of awn (Figure 2, Table 2).
The first cluster grouped all the short grain (SG), strawhull, awn-
It is unclear from this analysis, however, whether the California weedy rice could be derived directly from these noncertified varieties or whether their relationship is the result of gene flow from these varieties or their ancestors into another population. Since California weedy rice individuals clustered into distinct biotypes, genetic differences among groups of weedy rice were examined in more detail.
Analysis of molecular variance (AMOVA) indicated that California weedy rice collections are very diverse, with the majority of the variation (55%) due to differences among groups (biotypes) while 40% is due to variation among individuals, and differences within group or biotype account for only 5% of genetic variation (Table 3). Each weedy rice biotype is genetically distinct from the others with an overall FST value of 0.548 among biotypes. Comparison of genetic diversity patterns among the four major biotypes ( as would be expected for a species such as rice that reproduces primarily by self-fertilization.
To investigate the relationships among rice individuals while allowing for gene flow and admixture, unlike phylogenetic analysis, STRUCTURE analysis was used to assign each individual's genotype to genetic clusters or populations. The largest increase in data probability (ΔK) was observed at K = 6 (Supporting Information Figure S1) (Evanno, Regnaut, & Goudet, 2005), and this model distinguishes the major biotype groups fairly well (Figure 3). The STRUCTURE grouping of California weedy rice individuals and all other rice samples is consistent with their group membership from phylogenetic analysis (Figure 2). The majority of individuals assign to a single cluster with high probability, and most individuals of the same biotype assign to the same genetic cluster (Figure 3).
However, the majority of indica rice and wild rice individuals assign to multiple clusters, indicating higher background genetic diversity or admixture between clusters. Some weedy rice individuals also assign to multiple clusters, indicating hybridization with or evolutionary origin from other rice groups. The cluster that all Type  (Figure 4). The first three axes account for 22.9%, 11.6%, and 10.2% of genetic variation present. As in previous analyses, most rice individuals cluster together by rice type, and were spatially well differentiated on the first two axes (Figure 4). Type 1 rice clustered closely with aus rice, Basmati, the single temperate japonica individual that clustered separately from the others in phylogenetic analysis  (Table 5). In contrast, low pairwise FST between a weedy rice biotype and another rice type can indicate more shared genetic content. For example, Type 2 shows low differentiation from indica cultivars (FST = 0.224) and from wild rice (FST = 0.214), indicating less differentiation between these groups and possible relatedness.
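To make the interpretation of pairwise FST concrete, the sketch below computes a frequency-based estimate for one locus and two groups, FST = (HT - HS) / HT, where HS is the mean expected heterozygosity within groups and HT is the expected heterozygosity of the pooled (mean) allele frequencies. This is an illustrative assumption about the estimator; the values reported here may have been obtained differently (for example through AMOVA), and the allele frequencies below are hypothetical.

```python
from typing import Dict

AlleleFreqs = Dict[int, float]  # allele size (bp) -> frequency within one group


def expected_heterozygosity(freqs: AlleleFreqs) -> float:
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_fst(group1: AlleleFreqs, group2: AlleleFreqs) -> float:
    """Single-locus FST = (HT - HS) / HT for two equally weighted groups."""
    alleles = set(group1) | set(group2)
    # Pooled frequencies are the unweighted mean of the two groups.
    pooled = {a: (group1.get(a, 0.0) + group2.get(a, 0.0)) / 2 for a in alleles}
    hs = (expected_heterozygosity(group1) + expected_heterozygosity(group2)) / 2
    ht = expected_heterozygosity(pooled)
    return (ht - hs) / ht if ht > 0 else 0.0


# Illustrative frequencies only (not from this study).
fst_fixed = pairwise_fst({150: 1.0}, {154: 1.0})  # fixed for different alleles
fst_same = pairwise_fst({150: 0.5, 154: 0.5}, {150: 0.5, 154: 0.5})
fst_partial = pairwise_fst({150: 0.9, 154: 0.1}, {150: 0.3, 154: 0.7})
```

Values near 0 indicate groups with nearly identical allele frequencies, while values near 1 indicate groups fixed for different alleles; a multilocus estimate would combine HS and HT across loci before taking the ratio.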

| DISCUSSION
The increasing spread of weedy rice in California and the recent report of weedy rice originating from cultivated California rice varieties (Kanapeckas et al., 2016) raised questions about the origin of California weedy rice and its management. For this reason, we conducted a genetic study to understand the relationships between existing weedy rice in California and to investigate their possible origins. In the phylogenetic analysis, weedy rice individuals clustered together by biotype, indicating that for California weedy rice biotypes, samples can be easily classified by phenotype into groups that are biologically and genetically meaningful (Figure 2, Table 2). The five biotypes of California weedy rice clustered within multiple larger genetic groups of weedy, wild, and cultivated rice (Figure 2). This division of weedy rice into separate clusters most likely indicates at least four separate evolutionary origins of California weedy rice from diverse lineages of cultivated, weedy, and wild rice. In fact, the four major groups of weedy rice are quite divergent from each other based on principal component analysis (Figure 4). Population structure analysis gives more insight into relationships of individuals and biotypes, revealing close correspondence between genetic populations and rice types (Figure 3). However, some rice groups, especially wild rice and indica rice, are more genetically heterogeneous, with genotypes assigning to multiple genetic clusters. STRUCTURE analysis also identified admixed individuals, indicating hybridization of weedy rice both with other weedy rice biotypes and with wild and cultivated rice (Figure 3), despite the fact that rice is primarily self-fertilizing with generally low outcrossing rates (0.4%-11%) (Xia et al., 2011).
Individual and biotype differentiation analyses provide insights into the relationships of California weedy rice biotypes. The high pairwise FST values between most California weedy rice biotypes, with the exception of Type 5, and the temperate japonica cultivars widely grown in California indicate high genetic differentiation between California weedy rice and California cultivated rice and their relatively low shared genetic content ( necessarily exclude the possibility of infrequent hybridization with cultivated rice within California. One Type 1 individual and one Type 2 individual showed over 10% genetic assignment to the genetic cluster containing Type 5 and japonica rices in STRUCTURE analysis (Figure 3). However, the majority of California weedy rice biotypes have a high inbreeding coefficient and low level of heterozygosity at 99 loci (Table 4). Therefore, it is likely that hybridization between rice groups happened many years or generations ago. Type 5 weedy rice was shown in phylogenetic, STRUCTURE, and PCA analyses to be closely related to japonica cultivars, raising the question of whether it is derived directly from the temperate japonica cultivars grown in California or from tropical japonica cultivars grown outside California and imported. The high inbreeding coefficient (FIS = 0.83) of Type 5 weedy rice (Table 4) and moderate genetic differentiation (FST = 0.247) from temperate japonica rice (Table 5) make it likely that its evolutionary origin significantly predates its recent detection, although it is possible that a small weedy rice population could have been present unnoticed for some time prior to detection.
Another possibility for the origin and spread of California weedy rice is from the cultivation of red-pericarped specialty rice varieties. While the majority of rice-growing acreage in California is devoted to noncolored pericarp rice production, some specialty colored pericarp rice varieties are also grown at a commercial scale.
Two noncertified specialty rice varieties (called RR125 and RR126 here) are medium grain rice with red pericarp. RR125 is similar to bronzehull Type 2 weedy rice while RR126 is a strawhull type that is mostly awnless. In phylogenetic analysis, RR125 was grouped with Type 2 weedy rice and RR126 was grouped with Type 5 weedy rice
Overall, the phylogenetic, population structure, and principal component analyses above allow some insights into the ancestry of California weedy rice and into the prevalence of evolutionary histories of de-domestication. Type 1 weedy rice is likely evolutionarily derived from aus rice or possibly a wild rice species, as is the blackhull awned rice from the southern United States (Li et al., 2017).
These two American weedy rice biotypes may have a single origin from Asian rice or separate origins followed by hybridization with each other. However, they are both genetically and phenotypically distinct, as Type 1 rice is neither blackhulled nor awned (Table 2).
Because the specific origins and any subsequent hybridization are unclear, it cannot be determined whether this biotype is derived from endoferal de-domestication directly from the crop cultivars or exoferal de-domestication through hybridization of cultivars and/or weedy populations. Type 2 weedy rice is most closely related to strawhull weedy rice from the southern United States, and these two groups likely evolved by endoferal de-domestication from indica rice. Alternatively, these two groups were also placed with wild rice species in several analyses and could have some ancestry from undomesticated rice. Regardless, it is unclear whether Type 2 and southern strawhull weedy rice have a single origin or separate origins.
Type 3 weedy rice of California is highly differentiated from other rice types and has ambiguous evolutionary origins. Based on its closest relationship in the phylogenetic analysis, it appears to have evolved from wild rice (Figure 2) and may have retained wild traits such as pubescence of leaves, presence of long awn, high seed dormancy, and shattering, consistent with the results of Londo and Schaal (2007) using a single California weedy rice genotype (RR28).
In another study of Type 3 weedy rice by Kanapeckas et al. (2016), California strawhull weedy rice had the lowest mean population divergence (ϕST) from O. rufipogon from Southeast Asia, but was interpreted as having diverged from California cultivated rice based on coalescent modeling analysis. More study of this weedy rice biotype may be needed to fully understand its evolutionary origins.
Type 4 weedy rice is most likely descended from Type 3 weedy rice.
Type 5 weedy rice is endoferally derived from japonica rice, but it is not clear whether the direct ancestor is tropical japonica rice or temperate japonica rice grown inside or outside of California. For all California weedy rice biotypes, the presence of the causative Rc allele for red pericarp associated with wild rice or landrace rice never selected for white pericarp means that genetic contributions of endoferal ancestry from landrace rice or exoferal ancestry from wild rice cannot be ruled out.
In conclusion, the five major California weedy rice biotypes are not all closely related to each other and have diverse parentage from several major lineages of cultivated rice and wild rice, as well as relationships with weedy rice from the southern United States. Most biotypes are likely derived from independent origins outside of California, although hybridization between biotypes or with local cultivars may contribute to the evolution of weedy rice populations. Future study of California weedy rice with sequence data may help elucidate the evolutionary relationships of weedy rice types with currently ambiguous origins. The recent rediscovery and rapid spread of multiple weedy rice biotypes with evolutionary origins outside of California highlights the need for management of current weedy populations and measures to prevent further introductions of weedy rice into California.

ACKNOWLEDGMENTS
The authors would like to thank the California Rice Research Board for providing the funding for this research. The authors would also like to thank the California Rice Experiment Station and the Director, Dr. Kent McKenzie, who provided laboratory and greenhouse space. Dr. Paul Sanchez, Dr. Amar Godar, Michael Lee, Carson Tibbits, James Broaddus, and Christopher Boggs provided assistance in the greenhouse.

CONFLICT OF INTEREST
None declared.

DATA ACCESSIBILITY
Genotype data for this paper have been archived in the Dryad repository (https://doi.org/10.5061/dryad.9v60c13) and are accessible there. To protect rice growers' identities, weedy rice sampling locations will be provided to those who wish to sign a confidentiality agreement.

REFERENCES
Bah, S., van der Merwe, R., & Labuschagne, M. T. (2017). Estimation of outcrossing rates in intraspecific (Oryza sativa) and interspecific