Rice has played an essential role for our species, as it provides the staple food for over half of the human population. Increasing rice yield and improving its quality are important tasks in future efforts to meet the growing demands on food supply from population growth and economic development worldwide (Khush, 2001). Understanding the evolutionary history of cultivated rice and its wild relatives will help achieve these goals by facilitating the utilization of useful genes in wild rice (Ge et al., 1999; Wing et al., 2005; Sang & Ge, 2007). Furthermore, rice and its relatives have become an excellent model system for various evolutionary and functional studies owing to their relatively small genome sizes, dense genetic maps and extensive genome colinearity with other cereal species, as well as the completion of genome sequencing of two rice subspecies (Wing et al., 2005; Ammiraju et al., 2008). To take full advantage of this system for comparative and functional studies, we need a better understanding of its fundamental background, including the evolutionary history and dynamics of rice and its relatives.
The genus Oryza consists of two cultivated and c. 22 wild rice species, and is represented by 10 distinct genome types, including six diploid (A, B, C, E, F and G) and four allotetraploid (BC, CD, HJ and HK) genome types (Aggarwal et al., 1997; Ge et al., 1999). Through extensive phylogenetic analyses, the phylogenetic relationships among different genome types and species in Oryza have been well established (Ge et al., 1999; Zou et al., 2008). However, a few important evolutionary parameters have not been studied extensively or remain controversial, including the divergence times among lineages and ancestral effective population sizes. Several studies have used different methods to estimate the divergence times of the Oryza species, and their estimates differ for some lineages (Second, 1985; Guo & Ge, 2005; Ammiraju et al., 2008; Lu et al., 2009; Sanyal et al., 2010; Tang et al., 2010). Estimates of ancestral effective population sizes are almost entirely lacking; the only exception is Zhang & Ge (2007), who estimated the ancestral effective population sizes of C-genome Oryza species. Recently, Ai et al. (2012) estimated the effective population sizes for 10 extant diploid Oryza species, but no ancestral effective population size was obtained.
Divergence time is one of the key factors required to interpret the patterns of speciation and rates of adaptive radiation, and it is also necessary for estimating rates of genetic and morphological change and for understanding biogeographic history (Kumar & Hedges, 1998; Arbogast et al., 2002). Absolute divergence times allow evolutionary events to be placed in the appropriate context of global climate changes and geographical events, thereby suggesting possible mechanisms of lineage divergence and speciation (Tiffney & Manchester, 2001; Stromberg, 2005; Prasad et al., 2011). Effective population size is a central parameter in models of population genetics, conservation and human evolution, and plays important roles in plant and animal breeding (Rannala & Yang, 2003; Charlesworth, 2009). It helps answer numerous evolutionary questions by determining the equilibrium level of neutral or weakly selected variability in populations and evaluating the effectiveness of selection relative to genetic drift (Yang, 2002; Charlesworth, 2009). Estimates of the ancestral effective population size around the time of speciation can give very useful insights into the historical demographic process and speciation (Chen & Li, 2001; Broughton & Harrison, 2003; Rannala & Yang, 2003; Zhang & Ge, 2007; Zhou et al., 2007; Yang, 2010).
In this study, we sampled all six diploid genomes of the genus Oryza and selected 106 single-copy nuclear genes across all 12 rice chromosomes to estimate the divergence times and ancestral population sizes of major lineages in this genus. With these parameters available, we were able to estimate the time frame of species diversification in Oryza and to explore potential climatic factors underlying the diversification of the genus. Such information may help to better understand the evolutionary history of Oryza and provide further insights into the pattern and mechanism of speciation and diversification in plants in general.
Divergence time estimates of six diploid genome types based on 106 loci are shown in Table 1 and Fig. 1. Time estimates obtained from MCMCTREE and MULTIDIVTIME are very similar. The origin of the rice genus is dated c. 13–15 Ma, with 95% CI at (14.35, 16.57), (14.49, 16.64) and (12.12, 14.75) for clock2 and clock3 of MCMCTREE, and MULTIDIVTIME, respectively. The two programs dated the divergence of F-genome at c. 15 and c. 13 Ma, respectively. For the divergence of other genomes, the two programs gave largely consistent results: E-genome branched out from the ancestral lineage of the A-, B-, and C-genomes at c. 7 Ma, and C-genome diverged at c. 6 Ma, whereas the A- and B-genomes separated at c. 5.5 Ma.
Table 1. Divergence time estimation by the programs MCMCTREE and MULTIDIVTIME
|Node||MCMCTREE prior||MCMCTREE posterior (clock2)||MCMCTREE posterior (clock3)||MULTIDIVTIME prior||MULTIDIVTIME posterior|
|I||10.00 (5.00, 15.00)||15.30 (14.35, 16.57)||15.40 (14.49, 16.64)||9.63 (5.25, 14.53)||13.46 (12.12, 14.75)|
|II||8.01 (3.04, 13.84)||15.17 (14.20, 16.43)||15.35 (14.44, 16.59)||7.91 (3.22, 13.49)||12.80 (11.52, 14.04)|
|III||6.02 (1.58, 12.04)||7.50 (6.87, 8.23)||7.36 (6.83, 8.04)||6.09 (1.77, 11.93)||6.79 (6.00, 7.58)|
|IV||4.03 (0.58, 9.80)||6.13 (5.57, 6.76)||5.87 (5.44, 6.41)||4.21 (0.56, 9.88)||5.79 (5.10, 6.49)|
|V||2.04 (0.05, 6.99)||5.61 (5.08, 6.21)||5.31 (4.89, 5.81)||2.21 (0.06, 7.30)||5.49 (4.83, 6.16)|
Figure 1. The chronogram of the genus Oryza based on 106 loci, showing the results of Bayesian MCMCTREE analysis under the clock2 model with calibration of (5, 15) Myr. The capital letters represent the genome types, followed by the sampled species in parentheses. The ancestral nodes are indicated as I, II, III, IV and V. The branch lengths of the tree represent the posterior means of the date estimates, with the grey bars illustrating 95% confidence intervals (CIs). The details of the dates are shown in Table 1. The numbers beside the nodes indicate the θ values of ancestral populations estimated by the two-species ML and Bayesian MCMC methods, respectively. Below the chronogram is a plot of temperature changes since the Miocene (modified from Zachos et al., 2001).
The 95% CIs overlap considerably between the divergence times of the G-genome and the F-genome (nodes I and II), and between the divergence times of the C-genome and the A- and B-genomes (nodes IV and V). This implies two episodes of rapid speciation that gave rise to the G- and F-genomes and to the A-, B- and C-genomes. To explore whether the overlap was caused by the prior settings used in the MCMC analyses, we examined the effect of the priors for fossil calibration and for rate heterogeneity across branches using two sets of parameters. In the first set, we narrowed and widened the age constraints for node I by using different calibrations, that is, (10, 15), (5, 15) and (5, 20) (given as minimum and maximum age in parentheses, in Myr). In the second set, we changed the parameter settings that control rate variation across branches: for MCMCTREE, we assigned the parameter σ² the gamma priors G(1, 10), G(2, 1) and G(5, 1); for MULTIDIVTIME, we set the brownmean parameter to 1, 3, 5 and 10. The results show that different calibrations had only a slight effect on node ages: the variation in posterior mean times was < 3.36 Myr in MCMCTREE and < 0.26 Myr in MULTIDIVTIME (Fig. S1b,d). Compared with the calibration, the prior for rate heterogeneity across branches had slightly larger effects on the posterior estimates, with differences in posterior means of < 6.09 Myr in MCMCTREE and < 0.88 Myr in MULTIDIVTIME (Fig. S1a,c). For the two time spans of interest, between nodes I and II and between nodes IV and V, the CIs always overlapped (Fig. S1).
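The overlap check described above can be illustrated with a minimal Python sketch. It uses only the MCMCTREE clock2 intervals from Table 1; the dictionary name `CLOCK2_INTERVALS` and the helper `intervals_overlap` are hypothetical names introduced here for illustration and are not part of the original analysis.

```python
# Posterior 95% intervals (Ma) copied from Table 1, MCMCTREE clock2 column.
# The names below are illustrative assumptions, not part of the study's pipeline.
CLOCK2_INTERVALS = {
    "I": (14.35, 16.57),
    "II": (14.20, 16.43),
    "III": (6.87, 8.23),
    "IV": (5.57, 6.76),
    "V": (5.08, 6.21),
}


def intervals_overlap(a, b):
    """Return True if the closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Compare each node with the next younger node on the backbone of the tree.
for older, younger in [("I", "II"), ("II", "III"), ("III", "IV"), ("IV", "V")]:
    shared = intervals_overlap(CLOCK2_INTERVALS[older], CLOCK2_INTERVALS[younger])
    print(f"nodes {older} and {younger}: overlap = {shared}")
```

Under these values, only the node pairs I/II and IV/V share part of their intervals, which is the pattern the text interprets as two episodes of rapid speciation.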
As stated by Yang & Rannala (2006), for a specified set of fossil calibrations, the error of posterior time estimates cannot be reduced to zero by increasing the number of sites in the sequence. Yang & Rannala (2006) predicted that when the amount of sequence data approaches infinity, the posterior means and the 95% CIs for all node ages will lie on a straight line. We therefore examined whether adding more sequence data would improve the date estimates by plotting the posterior means of divergence time against the width of the corresponding 95% CIs for each node, following Yang & Rannala (2006). As shown in Fig. S2, a nearly perfect linear relationship exists between the posterior means and the 95% CI widths, regardless of the program used (r² = 0.84–0.95). The high coefficients of determination suggest that our sequence data are highly informative, and that the precision of the time estimates is unlikely to improve by adding more sequences.
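As a rough illustration of this diagnostic, the following Python sketch regresses CI width on posterior mean for the five nodes in the MCMCTREE clock2 column of Table 1. It is a minimal sketch only: the helper name `r_squared` is hypothetical, and because it uses five points from a single analysis, its result is not expected to reproduce the r² range reported for Fig. S2.

```python
# (posterior mean, lower 95% bound, upper 95% bound) in Ma,
# copied from Table 1, MCMCTREE clock2 column.
CLOCK2 = {
    "I": (15.30, 14.35, 16.57),
    "II": (15.17, 14.20, 16.43),
    "III": (7.50, 6.87, 8.23),
    "IV": (6.13, 5.57, 6.76),
    "V": (5.61, 5.08, 6.21),
}


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


means = [mean for mean, _, _ in CLOCK2.values()]
widths = [upper - lower for _, lower, upper in CLOCK2.values()]

r2 = r_squared(means, widths)
print(f"r^2 between posterior mean and CI width: {r2:.3f}")
```

A value close to 1 indicates that CI width scales almost linearly with node age, the situation in which additional sequence data are expected to add little precision.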
Our likelihood-based relative rate test (RRT) showed that, of the 106 genes in the dataset, 66 did not reject the clock hypothesis and were thus included in our estimation of ancestral effective population size (Table S1). For the two-species ML method, we used several classes of paired species in which all pairs share the same ancestral population, following the method of Satta et al. (2004). For node I, we used five pairs of species: A- and G-genomes, B- and G-genomes, C- and G-genomes, E- and G-genomes, and F- and G-genomes; for node II, we used four pairs of species: A- and F-genomes, B- and F-genomes, C- and F-genomes, and E- and F-genomes. Similarly, for nodes III, IV and V, we used three pairs, two pairs and one pair of species, respectively. All estimates of ancestral polymorphism (θ) were fairly large (≥ 0.013), and the estimates within each class were similar (Table 2).
Table 2. Estimates of θ values of ancestral populations in Oryza using the two-species maximum likelihood (ML) and Bayesian Markov chain Monte Carlo (MCMC) methods based on 66 neutrally evolving loci
|Node||Method||Paired taxa||θ||95% CI|
|I||Two-species ML||A-G||0.030||0.018, 0.042|
|I||Bayesian MCMC|| ||0.026||0.020, 0.034|
|II||Two-species ML||A-F||0.017||0.009, 0.025|
|II||Bayesian MCMC|| ||0.025||0.008, 0.051|
|III||Two-species ML||A-E||0.017||0.011, 0.023|
|III||Bayesian MCMC|| ||0.028||0.022, 0.035|
|IV||Two-species ML||A-C||0.017||0.011, 0.023|
|IV||Bayesian MCMC|| ||0.025||0.015, 0.039|
|V||Two-species ML||A-B||0.018||0.012, 0.024|
|V||Bayesian MCMC|| ||0.025||0.008, 0.050|
Variation in evolutionary rates among loci may influence the estimates: ignoring it might inflate the coalescent variation among loci and thus lead to overestimation of the ancestral effective population size (Yang, 1997). We therefore calculated the relative rate of each gene following Yang (2002); that is, for a given locus, we averaged the JC69 distances from the A-, B-, C-, E- and F-genomes to the G-genome, and then divided this average by its mean across all 66 loci. Although we found minimal rate variation among the 66 loci (Fig. S3), we evaluated the effect of among-locus rate variation in the Bayesian MCMC method, in which the relative rate of each locus can be taken into account. Analyses with and without the relative rates of each locus did not yield significantly different estimates (P = 0.89, Wilcoxon signed-ranks test), and thus we show only the results that account for rate heterogeneity (Table 2).
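The relative-rate calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names `jc69_distance` and `relative_rates`, the dictionary layout of the alignments, and the toy sequences are all hypothetical and are not taken from the study's data or software.

```python
import math


def jc69_distance(seq_a, seq_b):
    """JC69 distance between two aligned sequences, ignoring non-ACGT sites."""
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)  # proportion of differing sites
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def relative_rates(loci, outgroup="G"):
    """Per-locus relative rate: mean ingroup-to-outgroup JC69 distance,
    divided by the mean of that quantity across all loci."""
    mean_distances = []
    for alignment in loci:
        distances = [
            jc69_distance(seq, alignment[outgroup])
            for name, seq in alignment.items()
            if name != outgroup
        ]
        mean_distances.append(sum(distances) / len(distances))
    grand_mean = sum(mean_distances) / len(mean_distances)
    return [d / grand_mean for d in mean_distances]


# Toy alignments (hypothetical): genome label -> aligned sequence at one locus.
toy_loci = [
    {"A": "ACGTACGTAC", "G": "ACGTACGTAA"},  # 1 of 10 sites differ
    {"A": "ACGTACGTAC", "G": "ACGAACGTAA"},  # 2 of 10 sites differ
]
rates = relative_rates(toy_loci)
print(rates)
```

By construction the relative rates average to 1 across loci, so a value above 1 marks a locus that evolves faster than the dataset average.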
Given that little prior information is available for θ, we performed a series of analyses using different priors for θ in the Bayesian MCMC method, with means of 0.005, 0.01, 0.02, 0.03 and 0.05, based on estimates of ancestral polymorphisms of Oryza C-genome species and of other genera (Rannala & Yang, 2003; Zhang & Ge, 2007; Zhou et al., 2007; Yang, 2010). The posterior distributions of the θ values for the five internal nodes are plotted in Fig. 2 and the details are provided in Table S2. As shown in Fig. 2, the posterior θ estimates obtained with different priors were very similar for the ancestral populations of all genomes (node I) and of the A-/B-/C-/E-genomes (node III). For the ancestral population of the A-/B-/C-genomes (node IV), the posterior estimates were slightly influenced by the priors, but the posterior distributions overlapped substantially. For the ancestral population of the A-/B-/C-/E-/F-genomes and that of the A-/B-genomes (nodes II and V), the mean and distribution of the posteriors depended on the priors. Although the priors for population divergence time (τ) were assigned based on previous dating results (Ammiraju et al., 2008; Lu et al., 2009; Tang et al., 2010), we also varied the τ priors four-fold to examine the effect on the posterior estimates. These analyses showed that the τ priors had only a negligible effect on the estimates (results not shown).
Figure 2. The posterior distributions of θ values for the five internal nodes for the genus Oryza in Fig. 1. Different priors for θ were used in the analyses, with the prior means 0.005, 0.01, 0.02, 0.03 and 0.05, respectively. The ordinate and abscissa represent the frequency and midvalue of θ values, respectively.
Taken together, the estimates of ancestral effective population size from the two methods were largely consistent, except that the estimates for node III were slightly lower with ML than with the Bayesian method (Table 2). Our estimates suggest large θ values for the ancestral populations throughout the evolutionary history of Oryza. Taking the generation time of the ancestral species as 1 yr and the evolutionary rate as 6 × 10⁻⁹ substitutions per site per year (Gaut, 1998), we estimated the ancestral effective population size to be over 540 000 throughout the evolutionary history.
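The conversion from θ to effective population size can be checked with a short Python sketch. It assumes the standard diploid relation θ = 4Nₑμ, with μ expressed per site per generation; this relation is not written out in the text above, and the function name `effective_population_size` is a hypothetical name used only for illustration.

```python
def effective_population_size(theta, rate_per_site_per_year, generation_time_years=1.0):
    """Solve theta = 4 * Ne * mu for Ne, with mu per site per generation.

    Assumes a diploid population; the formula is the standard relation
    and is an assumption here, not a statement quoted from the text.
    """
    mu_per_generation = rate_per_site_per_year * generation_time_years
    return theta / (4.0 * mu_per_generation)


# Smallest theta reported for any ancestral node (all estimates were >= 0.013),
# with the rate of 6e-9 substitutions per site per year and a 1-yr generation time.
ne_lower_bound = effective_population_size(0.013, 6e-9, generation_time_years=1.0)
print(f"Ne is at least about {ne_lower_bound:,.0f}")
```

With θ = 0.013 this gives roughly 5.4 × 10⁵, consistent with the lower bound of over 540 000 stated above; larger θ values in Table 2 scale the estimate proportionally.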