Population genetics and evolutionary history of the wild rice species Oryza rufipogon and O. nivara in Sri Lanka

Abstract Genetic diversity and population genetic structure of the wild rice species Oryza rufipogon and O. nivara in Sri Lanka were studied using 33 microsatellite markers. A total of 315 individuals of 11 natural populations collected from the wet, intermediate, and dry zones of the country were used in the study. We found a moderate to high level of genetic diversity at the population level, with the polymorphic loci (P) ranging from 60.6% to 100% (average 81.8%) and the expected heterozygosity (H E) varying from 0.294 to 0.481 (average 0.369). A significant genetic differentiation between species and strong genetic structure within species were also observed. Based on species distribution modeling, we detected the dynamics of the preferred habitats for the two species in Sri Lanka and demonstrated that both O. rufipogon and O. nivara populations have expanded substantially since the last internal glacial. In addition, we showed that the geographical distribution of the two species corresponded to the climate zones and identified a few of key environmental variables that contribute to the distribution of the two species, implying the potential mechanism for ecological adaptation of these two species in Sri Lanka. These studies provided important insights into the population genetics and evolution of these wild species in Sri Lanka and are of great significance to the in situ conservation and utilization of these wild resources in genetic improvement of rice.

The rice genus consists of diploid and tetraploid species with genome constitutions of AA, BB, CC, EE, FF, BBCC CCDD and HHKK, and HHJJ (Aggarwal, Brar, Nandi, Huang, & Khush, 1999;Ge, Sang, Lu, & Hong, 1999), which is an important gene bank harboring numerous favorable traits and genes that can be used to enhance the genetic basis of rice cultivars (Sang & Ge, 2007;Vaughan, Morishima, & Kadowaki, 2003). In particular, the wild relatives containing AAgenome have high genetic compatibility with cultivated rice and are the most accessible genetic resources for rice breeding (Sang & Ge, 2007;Vaughan et al., 2003;Zhu & Ge, 2005). Many studies have revealed that various traits derived from wild resources have successfully been integrated to rice cultivars for enriching them with many features (Ali, Sanchez, Yu, Lorieux, & Eizenga, 2010;Brar & Khush, 1997;He et al., 2017;McCouch et al., 2013;Vaughan et al., 2003).
In the present study, we used microsatellite markers to evaluate the genetic variation and population genetic structure for a diverse collection of the O. rufipogon and O. nivara populations throughout the three climatic zones, that is, wet, intermediate, and dry zones, of Sri Lanka. By determining the level and patterns of genetic diversity within the two wild rice species, we expected to discover the unique genetic resources maintained in the Sri Lankan populations, to reveal their genetic relationships, and to uncover the correlations between genetic components with geographic and climatic factors.
In addition, these two wild species, along with the cultivated rice have become an excellent working system for studying adaptation and speciation (Choi, Platts, Fuller, & Wing, 2017;Grillo et al., 2009;Huang et al., 2013;Liu et al., 2015;Zheng & Ge, 2010). Given the fact that Sri Lanka is an isolated island with diversified and well-understood ecosystems (Seo et al., 2005), investigations on the Sri Lankan populations of the wild rice will contribute to a better understanding of adaptation and speciation process of the wild rice and also provide additional insights into the evolutionary history and species divergence and adaptation in general. F I G U R E 1 Geographical locations of 11 Oryza rufipogon (the squares) and Oryza nivara (the circles) populations sampled in this study. Dark lines separate the climatic zones in Sri Lanka. Detailed information on these populations is provided in Table S1 information of these populations is provided in Supporting Information Table S1.

| DNA extraction and PCR amplification
The gDNA was extracted from 20 mg of silica gel-dried leaves, using the Plant genomic DNA kit (Biomed DL114-01) following by the CTAB protocol (Saghai-Maroof, Soliman, Jorgensen, & Allard, 1984) and further quantified using a Nanodrop 2000 (Scientific, 2009). Thirty-three microsatellite (Simple Sequence Repeat -SSR) primer pairs designed from cultivated rice were used to assay genetic variation of all included materials, based on the RiceGenes Database (https://gramene.org). Detailed information of the primer pairs is given in Table S2 The PCR products (2 µl) were diluted in 8 µl of ultrapure water and scoured in 100% alcohol for 15 min. The diluted DNA was dried and mixed with highly deionized formamide (Applied Biosystems) and then submitted for fragment analysis by capillary electrophoresis using an Applied Biosystems 3130×l DNA analyzer (Applied Biosystems). The DNA fragment size analysis and allele calling were performed using GeneScan and GeneMapper software (Applied Biosystems), followed by the manual allele binning. The matrix standard kit was used to generate the "multicomponent matrix" when analyzing the FAM-, TAMRA-, and HEX-labeled DNA fragments with the ABI3130 series systems. Data Collection Software (Applied Biosystems) used the multi component matrix for the instrument to analyze automatically for three different colored fluorescent dye-labeled samples in a single capillary. In addition, reference lanes were further verified by polyacrylamide gel electrophoresis. In the case of non-amplification, the PCR was repeated to exclude technical failure and a null allele was recorded if both PCRs failed.

| Data analysis
All SSR loci were initially evaluated for the adherence to the Hardy-Weinberg equilibrium and for presence of null alleles using the method based on heterozygous deficiency (Brookfield, 1996).
Genetic parameters for each population were assessed by calculating the allelic richness (A), the mean observed heterozygosity (H O ), the percentage of polymorphic loci (P), and the unbiased expected heterozygosity (H E ). All the estimates of genetic diversity were calculated using GenAIEx 6.5.01 software (Peakall & Smouse, 2006).
To identify the number of distinct clusters that best explain the genetic structure of the data set, we performed the structure analysis using the program STRUCTURE 2.3.4 (Falush, Stephens, & Pritchard, 2003;Pritchard, Stephens, & Donnelly, 2000), with default parameters. The K values ranging from 1 to 12 were evaluated using 12 independent runs for each K and 10 6 MCMC steps, assuming a burn-in period of 10 4 steps. The suggested K values were evaluated using the ∆K statistic developed by Evanno, Regnaut, and Goudet (2005). The level of genetic structure among the K clusters suggested by this analysis was evaluated. We also assessed both inter-and intra-specific genetic structure of the populations via multivariate principal component analyses (PCoA) through the dudi.pca() function of "adegenet" R package (Jombart & Ahmed, 2011). An analysis of molecular variance (AMOVA) were performed, implemented in GenAIEx 6.5.01 software (Peakall & Smouse, 2006), to assess the proportion of genetic variance explained by the differences within and between populations. An UPGMA clustering analysis of all populations was performed based on Nei's (1972) unbiased genetic distance, and the tree was visualized with TREEVIEW ver. 1.52 (Page, 1996). We also calculated the F ST value using FSTAT version 2.9.3 (Goudet, 2002) to investigate the group relationship. This method combines existing data with ecological climate layers to predict the relative probability of the existence of the species in a certain geographical area (Phillips et al., 2006  , the last glacial maximum (LGM; ~21kya) and the last interglacial (LIG; ~120 to 140kya) conditions. In addition, we added 107 O. nivara and 38 O. rufipogon records collected from Sri Lanka from field investigations. We defined the study extent according to the commonly accepted range of the two species, between 5.55 0 N and 9.51 O N, 79.42 O E and 81.53 O E. All the 145 localities of our own collection sites were used to build the models using the default parameters for convergence threshold (10 -5 ) and number of iterations (500). To insure the consistency of the model predictions, 75% of the localities were used to train the model and 25% were used to test it.  Table   S2). The table of allelic frequencies of each population is available from the first author upon request.

| RE SULTS
We calculated the genetic diversity at both population and species levels and found a moderate to high genetic diversity (Table 1) Table 1).
The STRUCTURE analysis done by running the K values from 2 to 12 based on the allele frequency distribution showed that the first divergence (K = 2) occurred between two major groups (i.e., two species; Figure 2). From K = 3 to K = 12, each species was further divided into distinct subgroups, corresponding to the populations within species. It is evident that there is admixed genetic background for individuals in some populations although a majority of individuals could be clearly assigned to a single population.
Interestingly, the O. rufipogon populations with many admixed individuals (SL01-R and SL06-R) were sampled from the intermediate zone (Figure 1). The PCoA also confirms the two major groups, with SL01-R being in a different place in the plot, a reflection of its admixture feature (Figure 3). The AMOVA showed that a large proportion of the total genetic diversity existed within (47.7%) and among populations (40%), with the genetic differentiation between species relatively low (12.3%) but significant (Table 2).

| D ISCUSS I ON
Microsatellites (SSRs) are featured as co-dominant, highly polymorphic, abundant, and randomly distributed markers in the genomes and have been successfully used for a variety of studies in rice biology (Kuroda et al., 2007;McCouch et al., 1997;Pusadee et al., 2013;Samal et al., 2018;Wang et al., 2014;Zhou et al., 2003).
As SSR markers have good cross-species amplification to the close relatives, they provide a powerful tool for studying O. nivara and O. rufipogon given that thousands of SSR markers across rice genome have been well characterized (Cho et al., 2000;.

| Genetic diversity and population genetic structure of O. rufipogon and O. nivara in Sri Lanka
To date, many investigations on genetic diversity of O. rufipogon and O. nivara have been undertaken using SSR markers and relatively high genetic diversity has been detected at both population and species levels (Banaticla-Hilario et al., 2013;Kuroda et al., 2007;Pusadee et al., 2013;Samal et al., 2018;Zhou et al., 2003;). In a study on the Chinese O. rufipogon populations, Zhou, Wang, Davy, and Liu (2013) found a genetic diversity of 0.413 and 0.787, as measured by H E , at population and species levels, respectively. Kuroda et al. (2007) studied It is well recognized that the perennial O. rufipogon is predominantly outcrossing while the annual O. nivara is highly self-fertilizing (Sang & Ge, 2007). Therefore, the O. rufipogon populations exhibited significantly lower differentiation than the O. nivara populations, as detected by different molecular markers in previous studies (Kuroda et al., 2007;Liu et al., 2015;Pusadee et al., 2016).

| Evolutionary history and origin of O. nivara in Sri Lanka
The hypothesis that O. nivara originated from O. rufipogon to associate with an ecological shift from a persistently wet to a seasonally dry habitat (Barbier, 1989;Vaughan, Lu, & Tomooka, 2008) has been supported by many recent studies (Grillo et al., 2009;Liu et al., 2015;Zheng & Ge, 2010). Interestingly, these two species were sympatric in many areas across their entire geographic distribution (Liu et al., 2015;Vaughan et al., 2008), which raised an interesting question that where and   Figure 6). In addition, we identified a few of the key environmental variables that have contributed to the distribution of the two species.
One interesting and challenging question is whether the Sri Lankan O. nivara originated locally within the island or introduced from the areas outside the island. Pant and Kumar (1997) reported that similar climatic conditions arose between the plains of southern India and northern Sri Lanka since the late Pleistocene.
Evidence also showed that Sri Lanka has been connected with Indian subcontinent and the dispersal across the Palk Strait between southeastern India and northern Sri Lanka was possible during the LGM (21 kya; Figure 6). In addition, the well-understood climatic zones in Sri Lanka and their close match to the distribution of the two species (i.e., O. rufipogon in the wet zone and O. nivara in the dry zone) provide a unique system to investigate adaptive divergence and ecological speciation in plant species. Together, our study on O. nivara and O. rufipogon populations in Sri Lanka not only provide important insights into the population genetics and evolution history of these wild species, but also is of great significance for in situ conservation and management of these genetic resources.

ACK N OWLED G M ENTS
We thank Shan-Shan Li and Mu-Fan Geng for their technical assistances and helpful comments on the manuscript. We also thank the

Current
LGM LIG

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R CO NTR I B UTI O N S
S. G., D. R., and S. S. conceived the ideas. D. R. and Q-L. M. collected the data. S. S., A. T., and A. M. analyzed the data; and S. G., A. T., A.
M., B. M., and S. S. wrote the paper.

DATA ACCE SS I B I LIT Y
Data available from the Dryad Digital Repository.