The size of cultivated rice (Oryza sativa) grains has been altered by both domestication and artificial selection over the course of evolutionary history. Several quantitative trait loci (QTLs) for grain size have been cloned in the past 10 yr. To explore the natural variation in these QTLs, grain width and weight 2 (GW2), grain size 5 (GS5) and QTL for seed width 5 (qSW5) were resequenced, and grain size 3 (GS3) was genotyped, in a germplasm collection of 127 rice (O. sativa) varieties and 10–15 accessions of wild rice (Oryza rufipogon).
Ten, 10 and 15 haplotypes were observed for GW2, GS5 and qSW5, respectively. qSW5 and GS3 had the strongest effects on grain size and have been widely utilized in rice production, whereas GW2 and GS5 showed more modest effects.
GS5 showed little sequence variation in either the O. sativa germplasm or its progenitor O. rufipogon, whereas qSW5 exhibited the highest level of nucleotide diversity. GW2 showed signs of purifying selection. The four grain size genes experienced different selection intensities depending on their genetic effects. In the indica population, linkage disequilibrium (LD) was detected among GS3, qSW5 and GS5.
The substantial genetic variation in these four genes provides the flexibility needed to design various rice grain shapes. These findings provide insight into the evolutionary features of grain size genes in rice.
Asian cultivated rice (Oryza sativa) is a staple food that provides three-quarters of the caloric intake of the population of Southeast Asia (Khush, 1997). Grain size is measured in terms of either grain weight or shape (grain length (GL) and grain width (GW)), and is a major component of crop yield and an important trait for appearance quality (Xing & Zhang, 2010). Different grain sizes are favoured by different local cultures and cuisines. Grain size characteristics, which are immediately obvious to consumers, are major factors defining market value (Fitzgerald et al., 2009). Detailed knowledge of the genetic factors controlling grain size enables breeders to design appropriate genotypes for distinct preferences.
In Asian cultivated rice (O. sativa), the two major subpopulations, indica and japonica, can be differentiated on the basis of morphological and genetic differences (Chou, 1948; Caicedo et al., 2007). Genome-wide studies have demonstrated that these differences arose from genetically distinct gene pools in a common ancestor, Oryza rufipogon, by a process of continuous selection for desirable features (Huang et al., 2012b). In addition to being an important domestication trait, grain size acts as a distinguishable character for the two rice subpopulations (Fitzgerald et al., 2009). Thus, dissection of the genetic basis of grain size and isolation of grain size genes will increase our understanding of the origin and evolution of rice.
In recent years, several quantitative trait loci (QTLs) regulating grain size have been fine-mapped (Xie et al., 2006; Liu et al., 2009; Bai et al., 2010; Shao et al., 2010). Grain size 3 (GS3), the first grain-length QTL to be cloned, has been isolated in different genetic backgrounds in several studies and functions as a negative regulator of grain size (Fan et al., 2006; Takano-Kai et al., 2009). A nonsense mutation in the second exon is associated with large grain size and is widespread in global rice collections (Fan et al., 2009; Wang et al., 2011). Further molecular characterization of GS3 identified four putative domains, of which the organ size regulation (OSR) domain is both necessary and sufficient for its function (Mao et al., 2010). A major GW QTL, grain width and weight 2 (GW2), encodes a previously uncharacterized Really Interesting New Gene (RING)-type E3 ubiquitin ligase, and increased GW is associated with a mutation that causes premature truncation of the GW2 protein (Song et al., 2007). QTL for seed width 5 (qSW5) is the most important QTL for GW and grain weight. A 1212-bp deletion in qSW5 was confirmed as the causal mutation underlying increases in both GW and grain weight, and a yeast two-hybrid assay further suggested a role for this gene in the ubiquitin-proteasome pathway (Shomura et al., 2008; Weng et al., 2008). Grain size 5 (GS5) is a minor GW QTL located c. 2 Mb from qSW5 that acts as a positive regulator of grain size: higher expression of this gene is associated with both larger grains and an accelerated cell cycle (Li et al., 2011). Together with the newly cloned genes OsSPL16 and GL3.1 (Qi et al., 2012; Wang et al., 2012), GS3, GW2, qSW5 and GS5 are the main determinants of grain dimensions. However, the extent of natural variation, the allele frequencies and the functional differences associated with these QTLs in rice germplasm remain unclear, as does their relative importance in determining grain shape. Addressing these questions will clarify the role of each gene in grain shape formation and provide an efficient way to target and improve grain shape.
Natural populations, which harbour abundant natural variation, provide an excellent opportunity to identify key single nucleotide polymorphisms (SNPs) and insertions/deletions (InDels) associated with gene function (Zhu et al., 2008). Patterns of nucleotide diversity can reflect the evolutionary history and geographical distribution of important genes or alleles (Izawa et al., 2009), and association analyses based on functional genes facilitate the dissection of traits controlled by many genes. For example, polymorphisms in six flowering-pathway genes were sequenced in a rice collection and, combined with further experiments, showed that variation in Hd1 proteins, Hd3a promoters and Ehd1 expression levels all contributes to the diversity of flowering time in cultivated rice (Takahashi et al., 2009). Similarly, a comparison of the nucleotide diversity of four major flowering genes, PhyB, Hd1, Hd3a and Ehd1, between cultivars and wild rice revealed significant selection on these genes and geographical adaptation to differing photoperiods (Huang et al., 2012a).
In this study, GW2, qSW5 and GS5 were sequenced in a germplasm collection of 127 varieties of cultivated rice (O. sativa) and 10–15 accessions of wild rice (O. rufipogon). The GS3 genotypes of all samples were determined using functional markers (Fan et al., 2009). Allele frequencies and the effects of the alleles on grain size were compared separately in the indica and japonica subpopulations. We found that the genes showing signatures of significant selection have long been exploited in rice breeding. These results provide insight into the evolutionary features of grain size genes as well as information that may be useful for future molecular applications of these genes in rice breeding.
Materials and Methods
A total of 127 cultivated rice (Oryza sativa) varieties, collected worldwide, were used in this study. Most accessions are landraces from rice-growing regions in China, and some are modern cultivars (Supporting Information Table S1). An additional 15 accessions of common wild rice (Oryza rufipogon) were obtained from the International Rice Research Institute (IRRI; Table S2). All material was planted during the 2010 rice-growing season in the experimental field of Huazhong Agricultural University, Wuhan, China. For each accession, ten plants were transplanted in a single row, with 16.5 cm between plants and 26.4 cm between rows. Field management followed standard rice production practice. Harvested paddy rice was air-dried before measurement. For each plant, ten randomly chosen, fully filled grains were lined up lengthwise along a vernier calliper to measure GL and then arranged side by side to measure GW. Grain thickness (GT) was measured for each grain individually with a vernier calliper, and the values were averaged to give the measurement for the plant. Grain weight was measured on a sample of 300 grains and converted to 1000-grain weight (KGW).
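The per-plant trait values come from simple conversions. The sketch below assumes that GL and GW are the total span of the ten aligned grains divided by ten, that GT is the mean of the individual grain measurements, and that KGW is the 300-grain sample weight scaled by 1000/300. The function names and the example measurements are hypothetical and are not data from this study.

```python
from statistics import mean


def mean_grain_dimension_mm(total_span_mm: float, n_grains: int = 10) -> float:
    """Mean GL (grains end to end) or GW (grains side by side) per grain."""
    return total_span_mm / n_grains


def grain_thickness_mm(per_grain_thickness_mm: list[float]) -> float:
    """GT is measured grain by grain and averaged for the plant."""
    return mean(per_grain_thickness_mm)


def thousand_grain_weight_g(sample_weight_g: float, n_grains: int = 300) -> float:
    """Scale the weight of an n-grain sample to 1000-grain weight (KGW)."""
    return sample_weight_g * 1000 / n_grains


# Hypothetical measurements for one plant (not data from the study).
gl = mean_grain_dimension_mm(81.7)   # ten grains lined up end to end
gw = mean_grain_dimension_mm(30.2)   # the same ten grains side by side
gt = grain_thickness_mm([2.1, 2.0, 2.2, 2.1, 2.0, 2.1, 2.2, 2.0, 2.1, 2.2])
kgw = thousand_grain_weight_g(7.5)   # 300 grains weighing 7.5 g
print(f"GL={gl:.2f} mm, GW={gw:.2f} mm, GT={gt:.2f} mm, KGW={kgw:.2f} g")
```

Keeping the grain counts as parameters makes the same helpers usable if a different sample size is weighed.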
DNA extraction, PCR amplification, cloning and sequencing
Fresh leaves were harvested from field-grown plants, and genomic DNA was extracted using the cetyl trimethylammonium bromide (CTAB) method (Murray & Thompson, 1980). Gene fragments were amplified from genomic DNA using LA Taq (TaKaRa Biotechnology, Dalian, China) with standard PCR protocols. Because heterozygous genotypes occur in O. rufipogon, the PCR products from these accessions were ligated into the pGEM-T Easy Vector (Promega, USA), independent plasmid clones were sequenced, and one allele sequence was selected at random for each accession. Before sequencing, 8 μl of PCR product was treated with 5 U of ExoI (NEB, Ipswich, MA, USA) and 0.13 U of shrimp alkaline phosphatase; singletons and ambiguous sites were resequenced as necessary. Sequence contigs were assembled with sequencher 4.1.2 (Gene Codes Corporation, Ann Arbor, MI, USA). The Cleaved Amplified Polymorphic Sequences (CAPS) marker SF28, developed previously (Fan et al., 2009), was used to genotype the GS3 alleles. All primers used for PCR and sequencing are listed in Table S3.
DNA sequence analysis and population structure analyses
The DNA sequences were aligned using the clustal x 2.1 program (Larkin et al., 2007) and manually adjusted in BioEdit (Hall, 1999). The number of segregating sites (S), haplotype diversity (Hd), nucleotide diversity (π) (Nei, 1987), Watterson's estimator from S (θw) (Watterson, 1975) and two parameters of the neutrality test, Tajima's D (Tajima, 1989) and Fu and Li's D (Fu & Li, 1993), were calculated using dnasp version 5.0 software (Librado & Rozas, 2009). The evolutionary history was inferred using the neighbour-joining method with the mega5 program (Tamura et al., 2011).
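The statistics above were computed in dnasp. As an illustration of what S, π, θw and Tajima's D measure, the sketch below computes them from a set of aligned sequences using the standard formulas of Nei (1987), Watterson (1975) and Tajima (1989). It is not the dnasp implementation: sites with a gap or ambiguous base in any sequence are simply dropped (complete deletion), which is an assumption of the sketch, and Fu and Li's D is not reproduced.

```python
from itertools import combinations
from math import sqrt

NUCLEOTIDES = set("ACGT")


def diversity_stats(seqs: list[str]) -> dict:
    """S, pi and theta_W (both per site) and Tajima's D for aligned sequences."""
    seqs = [s.upper() for s in seqs]
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # Keep only columns where every sequence has an unambiguous base.
    sites = [i for i in range(length) if all(s[i] in NUCLEOTIDES for s in seqs)]
    if not sites:
        raise ValueError("no analysable sites")
    n_sites = len(sites)
    S = sum(1 for i in sites if len({s[i] for s in seqs}) > 1)

    # k: mean number of differences between two randomly chosen sequences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a[i] != b[i] for i in sites) for a, b in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    tajima_d = None  # undefined when there are no segregating sites
    if S > 0:
        tajima_d = (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

    return {"S": S, "pi": k / n_sites, "theta_w": S / a1 / n_sites, "tajima_d": tajima_d}


# Toy alignment (hypothetical, not data from the study).
print(diversity_stats(["AAAA", "AAAT", "AATT", "ATTT"]))
```

A negative D arises when π (pairwise differences) falls below θw (driven by the number of segregating sites), i.e. when there is an excess of rare variants; this is the pattern that neutrality tests such as those reported for GW2 below pick up.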
For population structure analysis, 24 simple sequence repeat (SSR) markers, one on each of the short and long arms of the 12 rice chromosomes, were selected at random from the SSR genetic map (Temnykh et al., 2001) and used to genotype the 127 rice varieties. The 24 markers were RM529, RM522, RM526, RM211, RM411, RM60, RM518, RM348, RM574, RM274, RM508, RM412, RM427, RM172, RM339, RM408, RM553, RM321, RM484, RM239, RM224, RM479, RM247, and RM463. PCR was performed using standard protocols, and PCR products were separated on 4% denaturing polyacrylamide gels to score the alleles of each marker. structure 2.3.2 was used to infer the population structure with a burn-in of 10 000, a run length of 100 000 and a model allowing for admixture and correlated allele frequencies (Pritchard et al., 2000). The number of subpopulations, K, was tested from one to five, and five independent runs yielded consistent likelihoods for each K.
Expression and statistical analyses
Young pre-heading panicles from three independent plants of each accession were harvested for the expression analysis. Fresh leaves at the seedling stage were also collected. Total RNA was extracted using TRIzol reagent (Invitrogen). Two micrograms of total RNA was reverse-transcribed using SuperScript III reverse transcriptase (Invitrogen) in a final volume of 20 μl to obtain cDNA. Real-time PCR was performed using gene-specific primers (Table S3) on a 7500 real-time PCR system (Applied Biosystems, Carlsbad, CA, USA) according to the manufacturer's instructions. Four technical replicates were performed for each sample. The rice ubiquitin gene was used as the internal control. Expression level data were obtained using the relative quantification method.
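The text does not say which relative quantification model was used. A common choice is the comparative Ct (2^−ΔΔCt) method, sketched below under that assumption: the target Ct is normalized to the internal control (ubiquitin here) and then to a calibrator sample. The method assumes roughly equal, near-100% amplification efficiencies for both genes. The calibrator and all Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct, calibrator_target_ct, calibrator_reference_ct):
    """Fold change of a target gene by the comparative Ct (2^-ddCt) method.

    Each argument is a list of technical-replicate Ct values; the reference
    gene is the internal control used for normalization.
    """
    delta_sample = mean(target_ct) - mean(reference_ct)
    delta_calibrator = mean(calibrator_target_ct) - mean(calibrator_reference_ct)
    delta_delta = delta_sample - delta_calibrator
    return 2 ** (-delta_delta)


# Hypothetical Ct values, four technical replicates each (not data from the study).
fold = relative_expression(
    target_ct=[24.0, 24.2, 23.8, 24.0],
    reference_ct=[18.0, 18.1, 17.9, 18.0],
    calibrator_target_ct=[25.0, 25.1, 24.9, 25.0],
    calibrator_reference_ct=[18.0, 17.9, 18.1, 18.0],
)
print(f"fold change relative to calibrator: {fold:.2f}")
```

A sample whose normalized Ct is one cycle lower than the calibrator's, as here, has about twice the transcript level.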
Differences in phenotypic values and gene expression levels between haplotypes were tested by one-way ANOVA or Student's t-test. Where the result was significant (P < 0.05), Duncan's multiple range test was used for further comparisons. Statistical analyses were performed using statistica software (StatSoft 1995, Tulsa, OK, USA). The squared allele frequency correlation (R2) between segregating sites was used to measure the degree of LD. Inter-locus R2 values were calculated with tassel 3.0 (Bradbury et al., 2007) for segregating sites between all gene pairs, separately in indica and japonica, and the schematic was generated in R (version 2.15.0). In addition, 204 SNPs in GS3, qSW5, GS5 and GW2 were downloaded from the RiceVarMap website (http://ricevarmap.ncpgr.cn/django/home/) to test LD between loci; R2 was calculated separately in an indica population of 787 accessions and in a temperate japonica population of 429 accessions.
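The haplotype comparisons rest on the one-way ANOVA F statistic, the ratio of between-group to within-group mean squares. The sketch below computes F and its degrees of freedom in plain Python as an illustration of what statistica reports; it is not the software's code. The P value would then come from the F distribution (for example `scipy.stats.f.sf(F, df1, df2)`, or the whole test via `scipy.stats.f_oneway`), and Duncan's multiple range test is not reproduced. The grain-width values are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    values = [x for g in groups for x in g]
    n_total = len(values)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical GW values (mm) for varieties grouped by haplotype.
print(one_way_anova_f([[2.6, 2.7, 2.8], [3.0, 3.1, 3.2], [2.9, 3.0, 3.1]]))
```

With two groups, F equals the square of the pooled-variance Student's t, so the two tests named above agree in that case.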
Phenotypic and genetic variation in the rice germplasm
The germplasm collection of 127 rice varieties showed substantial variation in grain size: GL ranged from 6.32 to 12.95 mm, GW from 2.29 to 4.43 mm, GT from 1.67 to 2.60 mm, and KGW from 11.50 to 41.28 g (Table S1). Significant correlations were detected among the grain shape traits (Table S4). GL was negatively correlated with GW, GW was positively correlated with GT, and GL, GW and GT were all positively correlated with KGW. These patterns are consistent with KGW being jointly determined by GL, GW and GT.
All 24 SSR markers were polymorphic in the collection of 127 varieties, with two to 16 alleles per marker (mean 4.95), indicating substantial genetic variation. A clear population structure was detected in the collection: the groups corresponded to the indica and japonica populations, which contained 80 and 47 varieties, respectively. GW, GT and KGW differed significantly between the two populations (Fig. 1b–d), whereas GL did not (Fig. 1a).
GS3 genotype determination
The CAPS marker SF28 was used to genotype the GS3 locus. The functional (GS3-C) and nonfunctional (GS3-A) alleles were found in 43 and 36 varieties, respectively, of the indica population, and in 23 and 24 varieties of the japonica population. The frequency of the nonfunctional allele was similar in the two populations. GL and GW differed significantly between the functional and nonfunctional GS3 genotypes within both populations, whereas a significant difference in KGW was detected only in the japonica population (Fig. 1e–h).
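The text does not state how the GS3 allele frequencies of the two populations were compared. One quick check, sketched here as an illustration rather than as the authors' analysis, is a Pearson chi-square test on the 2 × 2 table of allele counts given above (no continuity correction; for 1 degree of freedom the upper-tail probability equals erfc(√(χ²/2))).

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 count table.

    Returns (chi2, p) with 1 degree of freedom; for df = 1 the upper-tail
    probability is erfc(sqrt(chi2 / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, erfc(sqrt(chi2 / 2))


# Rows: indica, japonica; columns: nonfunctional GS3-A, functional GS3-C (counts from the text).
gs3_counts = [[36, 43], [24, 23]]
for name, (a_count, c_count) in zip(["indica", "japonica"], gs3_counts):
    print(f"{name}: GS3-A frequency = {a_count / (a_count + c_count):.3f}")
chi2, p = chi_square_2x2(gs3_counts)
print(f"chi2 = {chi2:.3f}, P = {p:.2f}")
```

On these counts the GS3-A frequencies are about 0.46 (indica) and 0.51 (japonica), giving χ² ≈ 0.36 and P ≈ 0.55, which is in line with the statement that the frequencies are similar.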
GW2 haplotype and association analysis
The complete genomic sequence of GW2 was resequenced in the 127 varieties and analysed for nucleotide diversity. Thirty-six SNPs and one InDel were detected in the 6852-bp alignment (Fig. 2a). Of these polymorphisms, 12 SNPs and the InDel were located in the promoter and 5′ untranslated region (UTR), two synonymous SNPs occurred in the first and eighth exons, and the remaining 22 SNPs were located in introns. The 1-bp deletion in the fourth exon previously reported to cause a loss of function (Song et al., 2007) was not detected in this collection. Ten haplotypes, named GW2-1 to GW2-10, were defined on the basis of the SNPs and the InDel. GW2-2, GW2-3 and GW2-4 were largely represented by japonica varieties (67.8%) and were genetically distant from the other haplotypes, which were mainly represented by indica varieties (89.7%) (Fig. 2b). The most prevalent haplotypes were GW2-1, GW2-2 and GW2-3, carried by 46, 29 and 18 varieties, respectively, followed by GW2-4 and GW2-5 (12 and 10 varieties, respectively); the remaining haplotypes were rare, each carried by only one to four varieties.
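Defining haplotypes amounts to grouping accessions that carry identical alleles at every polymorphic site. The sketch below does this and also computes Nei's haplotype diversity, Hd = n/(n − 1) × (1 − Σpᵢ²), the Hd statistic reported in Table 5. Numbering haplotypes by decreasing frequency appears to match the GW2 and GS5 naming used here, but that is an assumption. The accessions and alleles are toy values, with an InDel coded as "+"/"−".

```python
from collections import Counter


def name_haplotypes(genotypes: dict[str, tuple[str, ...]], prefix: str) -> tuple[dict[str, str], Counter]:
    """Group accessions with identical alleles at all polymorphic sites.

    Haplotypes are numbered by decreasing frequency; ties keep first-seen order.
    """
    counts = Counter(genotypes.values())
    labels = {hap: f"{prefix}-{rank}" for rank, (hap, _) in enumerate(counts.most_common(), start=1)}
    return {acc: labels[hap] for acc, hap in genotypes.items()}, counts


def haplotype_diversity(counts: Counter) -> float:
    """Nei's haplotype diversity: Hd = n/(n-1) * (1 - sum(p_i^2))."""
    n = sum(counts.values())
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


# Toy genotypes at two SNPs and one InDel (hypothetical accessions).
toy = {
    "acc1": ("A", "G", "+"),
    "acc2": ("A", "G", "+"),
    "acc3": ("A", "G", "+"),
    "acc4": ("T", "G", "+"),
    "acc5": ("A", "C", "-"),
}
names, counts = name_haplotypes(toy, "GW2")
print(names, round(haplotype_diversity(counts), 3))
```

Hd approaches 1 when most accessions carry different haplotypes and is 0 when all carry the same one.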
Because of their high frequency in the germplasm, grain size was compared among varieties carrying haplotypes GW2-1, GW2-3 to GW2-5. No difference in grain size was observed among the four major GW2 haplotypes in the indica population. In japonica, however, GL, GW, GT and KGW differed significantly between GW2-2 and GW2-3 (Fig. 2c–f), with GW2-3 varieties having more slender but heavier grains. GW2-2 and GW2-3 were distinguished by three diagnostic SNPs (Fig. 2a), two in the promoter and one in an intron. To determine whether the promoter SNPs affected grain size at the transcriptional level, GW2 expression was compared between the two haplotypes; however, no difference in expression level was observed in either panicle or leaf tissue (Figs 2g, S1).
qSW5 haplotype and association analysis
The entire qSW5 locus (3209 bp) was sequenced in the 127 varieties, revealing 67 SNPs and two large InDel regions. Three major qSW5 types were distinguished by the variation in the large InDel regions, corresponding to the Kasalath, Nipponbare and Indica II types described previously (Shomura et al., 2008). Forty indica and 13 japonica varieties carried the functional Kasalath allele (qSW5-K), whereas 31 indica and 34 japonica varieties carried the Nipponbare allele (qSW5-N), which has the 1212-bp deletion. Finally, seven indica varieties carried the Indica II type, represented here by ZALE (qSW5-Z), which has a 950-bp deletion in the 3′-UTR. Each type comprised two to eight haplotypes defined by the polymorphic sites of the 67 SNPs and five InDels (Fig. 3a). In addition, four nonsynonymous and eight synonymous SNPs were detected in the exon of qSW5, resulting in four amino acid changes (Fig. S2).
Significant differences in GL, GW and GT were detected between qSW5-K and qSW5-N types within the indica and japonica populations (Fig. 3b–e). However, the grain size of varieties carrying the qSW5-Z type alleles was similar to that of the varieties with the qSW5-N type allele within the indica population.
GS5 haplotype and association analysis
GS5 is a minor QTL for GW. Forty-one SNPs and six InDels were detected in the 6.4-kb genomic region of GS5. Of these, 34 SNPs and three InDels were located in the promoter; two InDels in the first exon resulted in a difference of two to five glycine (Gly) residues; three contiguous SNPs in the second exon led to two amino acid changes; a nonsynonymous SNP was detected in the ninth exon; and the remaining six SNPs were located in an intron and the 3′ UTR. These amino acid variants are the same as those found previously (Li et al., 2011), where the coding-region variation was shown not to alter GS5 function. Ten haplotypes, named GS5-1 to GS5-10, were defined (Fig. 4a,b). The most prevalent haplotypes were GS5-1, GS5-2 and GS5-3, corresponding to the haplotypes carried by ZS97, ZH11 and H94 in that study (Li et al., 2011); they were carried by 46, 35 and 28 varieties, respectively. Each of the other haplotypes was carried by fewer than five varieties.
Of the 46 varieties with the GS5-1 haplotype, 43 were indica and three were japonica. A similar pattern was observed for GS5-3, which was found in 26 indica and two japonica varieties. By contrast, GS5-2 was predominantly carried by japonica varieties (four indica and 31 japonica). This distribution indicates that GS5-2 was the only prevalent haplotype in the japonica population, whereas GS5-1 and GS5-3 were both prevalent in the indica population. A comparison of grain size between the GS5-1 and GS5-3 haplotypes in the indica population revealed significant differences in GW, GT and KGW: varieties with the GS5-3 haplotype had more slender, lighter grains (Fig. 4c–f).
Previous work showed that higher expression of GS5 is correlated with larger grain size (Li et al., 2011). However, no difference in GS5 expression in panicle tissue was detected between the GS5-1 and GS5-3 haplotypes in this germplasm (Fig. 4g).
Evaluation of genetic effects of alleles in the near-isogenic lines (NILs)
To validate the germplasm results in a uniform genetic background, several series of chromosomal segment substitution lines in the ZS97 background were screened for target lines carrying different donor alleles at the four grain size genes, and their genetic effects were evaluated (X. J. Qiu & S. B. Yu, unpublished; Sun et al., 2013). Because GW2-1 and GW2-4 were prevalent only in the indica population, whereas GW2-2 and GW2-3 were mostly present in the japonica population, grain size was compared separately between GW2-1 and GW2-4 and between GW2-2 and GW2-3 (Table 1). No significant differences in GL or GW were detected between ZS97 (GW2-4 allele) and NIL(MH63) (GW2-1 allele), whereas GL and GW differed significantly between NIL(NIP) (GW2-2 allele) and NIL(SLG) (GW2-3 allele). In addition, GL differed significantly between ZS97 (GS3-C allele) and NIL(MH63) (GS3-A allele), and GW differed significantly between ZS97 (GS5-1 allele) and NIL(MH63) (GS5-3 allele). GL and GW also differed significantly between ZS97 (qSW5-N allele) and NIL(MH63) (qSW5-K allele) (Table 1). These results agree closely with those from the germplasm and with previous reports (Fan et al., 2006; Shomura et al., 2008; Li et al., 2011).
Table 1. Comparison of rice (Oryza sativa) grain size between haplotypes of GW2, GS5, qSW5 and GS3 in the genetic background of Zhanshan97
NIL, near-isogenic line; NIL(MH63), NIL(NIP) and NIL(SLG) represent near-isogenic lines carrying Minghui 63, Nipponbare and SLG-1 segments in the target gene regions. Hap, the haplotype carried at the target gene in the NIL. GL, grain length; GW, grain width. All data are presented as mean ± SD. P values were calculated using the pair-wise Student's t-test.
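Gene | Line | Hap | GL (mm) | GW (mm)
GW2 | ZS97 | GW2-4 | 8.17 ± 0.10 | 3.02 ± 0.03
GW2 | NIL(MH63) | GW2-1 | 8.08 ± 0.25 | 2.95 ± 0.01
GW2 | NIL(NIP) | GW2-2 | 8.34 ± 0.27 | 3.02 ± 0.11
GW2 | NIL(SLG) | GW2-3 | 9.28 ± 0.09 | 3.71 ± 0.10
GS5 | ZS97 | GS5-1 | 8.17 ± 0.10 | 3.02 ± 0.03
GS5 | NIL(MH63) | GS5-3 | 8.45 ± 0.60 | 2.82 ± 0.22
qSW5 | ZS97 | qSW5-N | 8.17 ± 0.10 | 3.02 ± 0.03
qSW5 | NIL(MH63) | qSW5-K | 8.58 ± 0.28 | 2.72 ± 0.06
GS3 | ZS97 | GS3-C | 8.17 ± 0.10 | 3.02 ± 0.03
GS3 | NIL(MH63) | GS3-A | 9.27 ± 0.38 | 2.96 ± 0.13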
Allelic combinations of four grain size genes in the rice germplasm
Although GS3, qSW5, GS5 and GW2 are each known to regulate grain size, their combined effects have remained unclear. To evaluate these effects, we examined the grain size of different allelic combinations and the level of LD between the genes. R2 values between segregating sites across GS5, qSW5 and GW2 were calculated separately in the indica and japonica populations.
An LD block was observed between GS5 and qSW5 in the indica population (Fig. 5a), as expected given that the two genes are physically linked, c. 2 Mb apart. Among the 27 indica varieties carrying GS5-3, 24 (89%) also carried the functional qSW5-K allele; both alleles promote a more slender grain shape. Of the 45 varieties carrying GS5-1, 34 (76%) carried qSW5-N. GL and GW differed significantly between qSW5-K and qSW5-N regardless of the GS5 genotype, but did not differ between the two GS5 genotypes sharing the same qSW5 genotype (Table 2).
Table 2. Comparison of grain size characters among genotype combinations at qSW5 and GS5 in the rice (Oryza sativa) indica population
N, the number of indica varieties; GL, grain length; GW, grain width; KGW, 1000-grain weight. All data are presented as mean ± SD. Letters are ranked by Duncan test at P < 0.01; values within a column sharing a letter do not differ significantly.
GL (mm) | GW (mm) | KGW (g)
9.38 ± 1.04b | 2.73 ± 0.21a | 23.18 ± 3.9a
9.11 ± 1.01bc | 2.63 ± 0.2a | 19.72 ± 5.82a
8.29 ± 0.68ac | 3.18 ± 0.23b | 23.6 ± 2.62a
8.04 ± 0.13a | 3.19 ± 0.4b | 21.47 ± 1.32a
In the japonica population, GL and GW differed significantly between GW2-2 and GW2-3; however, 16 (89%) of the 18 varieties carrying GW2-3 also carried GS3-A, and both of these genotypes are associated with long, narrow grains. Of the 29 varieties carrying GW2-2, 20 (69%) carried GS3-C; these alleles promote short, wide grains. Differences in GL and GW were therefore examined among the two-gene combinations in the japonica population (Table 3). Among varieties carrying the functional GS3-C allele, GL did not differ between the GW2-2 and GW2-3 genotypes, whereas among varieties carrying the nonfunctional GS3-A allele, GL differed significantly between the two GW2 genotypes. In other words, the difference in GL between the two GW2 genotypes depends on the GS3 genotype. GW did not differ among the four homozygous allelic combinations at the two genes.
Table 3. Comparison of grain size characters among genotype combinations at GS3 and GW2 in the rice (Oryza sativa) japonica population
N, the number of japonica varieties; GL, grain length; GW, grain width; KGW, 1000-grain weight. All data are presented as mean ± SD. Letters are ranked by Duncan test at P <0.01. The same letter within the same column represents no significant difference.
GL (mm) | GW (mm) | KGW (g)
7.81 ± 0.47a | 3.77 ± 0.24a | 24.77 ± 2.5ab
7.65 ± 0.79a | 3.62 ± 0.09a | 22.94 ± 0.23a
7.97 ± 0.8a | 3.56 ± 0.53a | 24.51 ± 4.74ab
9.83 ± 1.29b | 3.35 ± 0.5a | 29.46 ± 6.13b
GS3 and qSW5 had significant effects in both the indica and japonica populations, and a strong LD block was detected in the indica population. Within our collection, 39 varieties with the GS3-A and qSW5-K combination produced slender grains, and 47 varieties carrying GS3-C and qSW5-N alleles produced round grains. These two combinations accounted for 67% of the total population, twice the share of the other two combinations combined (33%). Grain size was then compared among the four homozygous combinations at GS3 and qSW5 within the indica and japonica populations (Table 4). In the indica population, GL differed significantly between the GS3-C and GS3-A groups regardless of the qSW5 genotype. Interestingly, GL also differed between the two homozygous qSW5 genotypes in the GS3-A background. Similarly, GW differed significantly between the two homozygous qSW5 genotypes regardless of the GS3 genotype, and between the two GS3 genotypes in the qSW5-N background. The combination of functional qSW5 and functional GS3 alleles produced the shortest and narrowest grains and hence the lowest KGW, whereas the remaining three combinations had nearly the same KGW. In the japonica subpopulation, no significant differences were detected in GL, GW or KGW.
Table 4. Comparison of grain size characters among genotype combinations at QTL for seed width 5 (qSW5) and grain size 3 (GS3) in rice (Oryza sativa) indica and japonica respectively
N, the number of varieties; GL, grain length; GW, grain width; KGW, 1000-grain weight. All data are presented as mean ± SD. Letters are ranked by Duncan test at P < 0.01; values within a column sharing a letter do not differ significantly.
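Population | qSW5 | GS3 | GL (mm) | GW (mm) | KGW (g)
indica | qSW5-K | GS3-C | 8.08 ± 0.70a | 2.65 ± 0.24a | 18.28 ± 3.28a
indica | qSW5-K | GS3-A | 9.68 ± 0.68c | 2.67 ± 0.18a | 22.07 ± 5.65b
indica | qSW5-N | GS3-C | 8.01 ± 0.44a | 3.24 ± 0.21c | 23.02 ± 2.15b
indica | qSW5-N | GS3-A | 8.75 ± 0.80b | 3.03 ± 0.20b | 23.77 ± 3.49b
japonica | qSW5-K | GS3-C | 7.37 ± 0.84a | 3.47 ± 0.38ab | 20.78 ± 0.36a
japonica | qSW5-K | GS3-A | 10.08 ± 0.63b | 2.96 ± 0.33a | 25.6 ± 3.39ab
japonica | qSW5-N | GS3-C | 7.79 ± 0.48a | 3.71 ± 0.27b | 24.44 ± 2.39ab
japonica | qSW5-N | GS3-A | 8.91 ± 1.70ab | 3.66 ± 0.44b | 29.67 ± 7.2b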
To confirm the LD results obtained in the 127-variety germplasm collection, SNP data for 1213 varieties in the RiceVarMap database (H. Zhao et al., unpublished) were used to examine LD among GS3, qSW5, GS5 and GW2 separately in the indica and temperate japonica populations. SNPs whose minor allele frequency exceeded 20% in each population were used to calculate R2. OsMADS3 was used as a control because it is located on chromosome 1, functions in stamen development (Hu et al., 2011) and has no known role in grain size. In the indica population, LD blocks were detected among GS3, qSW5 and GS5 (Fig. 5c), whereas in the temperate japonica population, high levels of LD were observed among all the analysed genes except qSW5 (Fig. 5d). These results agree with those obtained in the germplasm (Fig. 5a,b), and the difference between the two populations is consistent with previous reports that LD extends much further in temperate japonica (> 500 kb) than in indica (c. 75 kb) (Mather et al., 2007). The low LD observed for qSW5 in japonica is probably attributable to its location in a recombination hotspot (Wan et al., 2008) and its high nucleotide diversity in japonica (see next section).
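The R2 values were computed in tassel. The sketch below shows the underlying calculation for two biallelic sites, r² = D² / [pA(1 − pA)pB(1 − pB)] with D = pAB − pApB, after dropping SNPs whose minor allele frequency does not exceed the 20% threshold. It assumes homozygous (inbred) accessions coded 0/1 with no missing data, which is a simplification of how tassel handles real genotype files. The SNP names and genotypes are hypothetical.

```python
def minor_allele_frequency(genotypes: list[int]) -> float:
    """genotypes: 0/1 allele codes for homozygous (inbred) accessions."""
    p = sum(genotypes) / len(genotypes)
    return min(p, 1 - p)


def r_squared(x: list[int], y: list[int]) -> float:
    """Squared allele-frequency correlation between two biallelic sites."""
    if len(x) != len(y):
        raise ValueError("sites must be scored on the same accessions")
    n = len(x)
    p_a = sum(x) / n
    p_b = sum(y) / n
    p_ab = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic site")
    return d * d / denom


def pairwise_r2(snps: dict[str, list[int]], min_maf: float = 0.2) -> dict[tuple[str, str], float]:
    """r^2 for every pair of SNPs whose minor allele frequency exceeds min_maf."""
    kept = [name for name, g in snps.items() if minor_allele_frequency(g) > min_maf]
    return {(a, b): r_squared(snps[a], snps[b]) for i, a in enumerate(kept) for b in kept[i + 1:]}


# Hypothetical SNPs scored on six accessions.
snps = {
    "s1": [1, 1, 0, 0, 1, 0],
    "s2": [1, 1, 0, 0, 1, 0],
    "s3": [1, 0, 1, 0, 1, 0],
    "rare": [1, 0, 0, 0, 0, 0],  # MAF = 1/6, removed by the 20% filter
}
print(pairwise_r2(snps))
```

An R2 of 1 means that the two sites always co-segregate in the sample, as s1 and s2 do here.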
Nucleotide diversity and evolutionary pattern of grain size genes
The nucleotide diversity of GW2, GS5 and qSW5 was investigated in the rice (O. sativa) germplasm. Ten to 15 wild rice (O. rufipogon) accessions were sequenced to investigate evolutionary patterns (Table 5).
Table 5. Nucleotide diversity of grain width and weight 2 (GW2), grain size 5 (GS5) and QTL for seed width 5 (qSW5)
S, number of segregating sites; π, average number of nucleotide differences per site between two randomly chosen sequences; θ, Watterson's estimator; D1, Tajima's D; D2, Fu and Li's D; Hd, haplotype diversity. Significance: *, P < 0.05; **, P < 0.02.
Across the GW2 genomic locus, nucleotide diversity was π = 2.09 × 10−3 (on average, 2.09 differences per kilobase between two randomly chosen sequences), nearly a four-fold reduction compared with O. rufipogon (π = 7.57 × 10−3). Fu and Li's D values were negative and deviated significantly from neutrality in the whole collection and in O. sativa in particular. A sliding window of π at every 500 bp was plotted for GW2 (Fig. S3): π was consistently high across the whole locus in O. rufipogon but extremely low in both the indica and japonica subpopulations. Specifically, 60- to 110-bp deletions covering the initiation codon were detected in 10 of the 15 O. rufipogon accessions. Loss of the original start codon (ATG) may shift the translation start to the next ATG, 141 bp downstream, resulting in a 47-amino-acid deletion in the GW2 protein. However, no SNPs or InDels were detected in the RING domain in either O. sativa or O. rufipogon. In addition, a phylogenetic tree with two major clades was constructed from the 10 GW2 haplotypes of O. sativa and the O. rufipogon genotypes (Fig. 6a). One clade comprised eight O. rufipogon genotypes, all of which contained a 237-bp deletion in the last intron, while the 10 O. sativa haplotypes and the remaining seven O. rufipogon genotypes formed the other clade. Thus, on the basis of GW2 variation, O. rufipogon could be divided into two subgroups separated by a large genetic distance, consistent with findings at the whole-genome level (Huang et al., 2012b). By contrast, all GW2 haplotypes detected in the 127 O. sativa varieties fell into one subgroup, indicating that the GW2 alleles of O. sativa originated from a single GW2 progenitor.
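The arithmetic behind the predicted truncation and the diversity reduction can be made explicit. Assuming the next ATG lies in the same reading frame as the original start codon (which the 47-residue figure implies, since 141 is divisible by 3), and using the π values reported above:

$$
\frac{141\ \text{bp}}{3\ \text{bp per codon}} = 47\ \text{codons} \;\Rightarrow\; 47\ \text{N-terminal amino acids lost}
$$

$$
\frac{\pi_{\text{O. rufipogon}}}{\pi_{\text{O. sativa}}} = \frac{7.57\times10^{-3}}{2.09\times10^{-3}} \approx 3.6
$$

A ratio of about 3.6 is the basis for describing the reduction in GW2 diversity as nearly four-fold.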
For qSW5, π across the entire genomic locus was 5.72 × 10−3 (5.72 differences per kilobase between two randomly chosen sequences), approximately half the value found in its ancestor (π = 10.34 × 10−3). Fu and Li's D values were negative and deviated significantly from neutrality for the whole collection. The phylogenetic tree constructed from the qSW5 haplotypes of O. sativa and the 14 O. rufipogon genotypes is shown in Fig. 6(b). qSW5-K2 and K3, together with most of the wild rice genotypes, were distantly divergent from the other haplotypes; the remaining qSW5-K haplotypes were genetically closer to qSW5-N and qSW5-Z, which clustered with four O. rufipogon genotypes. Of the 14 O. rufipogon genotypes, only one (ID: 48), from India, carried a 1212-bp deletion identical to that of qSW5-N; the others were classified in the qSW5-K group.
At the GS5 locus, π was 2.75 × 10−3, slightly lower than in O. rufipogon (π = 3.86 × 10−3). A neutrality test detected no signal of selection in either subpopulation. The sliding-window plot revealed higher π in the promoter than in the other regions (Fig. S4), consistent with the previous conclusion that promoter differences, rather than coding variation, are associated with GS5 function (Li et al., 2011). A phylogenetic tree constructed from the 10 GS5 haplotypes of O. sativa and the 14 O. rufipogon genotypes formed two major clades (Fig. 6c). One clade contained a widely divergent group that included GS5-1 and GS5-3, the most prevalent haplotypes in the indica subpopulation; the major japonica haplotype, GS5-2, clustered in the other clade. In summary, GS5 alleles in O. sativa showed high genetic similarity to those of O. rufipogon, and some varieties carried the same alleles as wild rice.
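The sliding-window plots (Figs S3, S4) show π computed separately in consecutive stretches of the alignment. The sketch below reproduces that idea; it is not the dnasp routine. The text gives only "every 500 bp", so both the window length and the step are parameters (defaulting to 500 bp each), and only full-length windows are reported. Columns with a gap or ambiguous base in any sequence are skipped, which is also an assumption. The toy alignment is hypothetical.

```python
from itertools import combinations
from typing import Optional

NUCLEOTIDES = set("ACGT")


def window_pi(seqs: list[str], start: int, end: int) -> Optional[float]:
    """Per-site nucleotide diversity (pi) within alignment columns [start, end)."""
    sites = [i for i in range(start, end) if all(s[i] in NUCLEOTIDES for s in seqs)]
    if not sites:
        return None  # no analysable column in this window
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a[i] != b[i] for i in sites) for a, b in pairs)
    return total_diffs / len(pairs) / len(sites)


def sliding_window_pi(seqs: list[str], window: int = 500, step: int = 500) -> list[tuple[float, Optional[float]]]:
    """(window midpoint, pi) for each full-length window along the alignment."""
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    return [
        ((2 * start + window) / 2, window_pi(seqs, start, start + window))
        for start in range(0, length - window + 1, step)
    ]


# Toy 10-bp alignment with variation only in the second half.
toy = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT"]
print(sliding_window_pi(toy, window=5, step=5))
```

Setting the step smaller than the window gives overlapping windows and a smoother curve, at the cost of adjacent points no longer being independent.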
GS3 and qSW5 play important roles in the natural variation of rice grain size
In the rice germplasm investigated in this study, GS3 was the most important gene for GL, whereas qSW5 exerted the greatest effect on GW, regardless of genetic background. These results were confirmed with near-isogenic lines (NILs) carrying different alleles in the same ZS97 background (Table 1). GW2 was previously reported to have a large effect on GW (Song et al., 2007); however, the 1-bp deletion causing a frameshift mutation was not detected in our collection. Nor was it found in other studies (Takano-Kai et al., 2009; Yan et al., 2011), indicating that this 1-bp deletion is a rare mutation. In the present work, GW2-2 and GW2-3 differed in GL and GW in the japonica population, despite the lack of amino acid differences between them. By contrast, the effects of GW2-2 and GW2-3 on GW evaluated in the indica ZS97 background were inconsistent with those in the germplasm: the GW2-3 type had lower GW than the GW2-2 type in the germplasm, whereas the NIL SLG (carrying the GW2-3 allele) had greater GW than the NIL Nip (carrying the GW2-2 allele) (Table 1), suggesting that the introgressed SLG segment may have a large effect on GW. Interestingly, no QTL for GW was detected at GW2 in a recombinant inbred line (RIL) population derived from ZS97 × Nanyangzhan, which carries the same GW2-3 allele as SLG (Wang, 2007), whereas a major GW QTL, qGW2, was detected around GW2 in a RIL population derived from ZS97 × SLG (Sun et al., 2013). This suggests that SLG carries another GW-increasing QTL in the qGW2 region that confounded the evaluation of the GW2 allelic effect, which may explain why the NIL SLG had wider grains than expected. Expression analyses also revealed no difference in expression levels between GW2-2 and GW2-3. Thus, further experiments are needed to clarify the mechanism underlying the difference between the two alleles.
We also found that the effect of GS5 was masked by qSW5. Indica varieties carrying the two major GS5 haplotypes (GS5-1 and GS5-3) differed significantly in GL and GW, in agreement with the results from the NILs (Table 1). However, no differences in transcript levels were detected, which is inconsistent with a previous report (Li et al., 2011). Because the varieties in this collection were diverse and varied widely in grain size, we suggest that the genetic background may have masked the small effect of GS5.
Varieties with the GS3-A/qSW5-N genotype, which carry nonfunctional alleles at both the GS3 and qSW5 loci, did not have the largest GL, GW and KGW values, as might have been expected (Fig. S5). We speculate that the effects of other, as yet undiscovered grain size genes might become apparent when the two major grain size genes, GS3 and qSW5, are both nonfunctional. These varieties therefore provide important resources for identifying new grain size genes.
Both GS3 and qSW5 are major genes controlling grain size, and they always show large genetic effects, regardless of the genetic background. By contrast, GS5 and GW2 act as supporting actors, modifying grain size to a lesser extent.
Widely used grain size genes have been subjected to high selection pressure
Nucleotide diversity differed among GW2, qSW5 and GS5. A strong selection signal was detected in GW2 and qSW5, but not in GS5. GW2 encodes a conserved RING-type protein with E3 ubiquitin ligase activity, and this activity is retained in the C-terminally deficient GW2 that alters rice grain size (Song et al., 2007). Although GW2 showed a high level of nucleotide diversity in O. rufipogon, its RING domain was invariant. Moreover, no homologues of GW2 could be found in the rice genome. Taken together, these features indicate that the RING domain of GW2 is important and unique in the rice genome, and that it might be a target of natural selection for functions beyond grain size determination. In addition, Fu and Li's D values deviated negatively from neutrality in O. sativa, whereas diversity remained high in the wild ancestor, suggesting that GW2 may have undergone long-term purifying selection during rice evolution and improvement. The more conserved GW2 alleles of O. sativa may have evolved from a single ancient allele of O. rufipogon. By contrast, we did not detect any signals of selection for GS5, which has a limited effect on grain size. GS5 haplotypes in cultivated rice were genetically close to those of its O. rufipogon ancestor. Moreover, GS5 encodes a putative serine carboxypeptidase that, together with another 70 members, makes up a large family functioning in multiple cellular processes in rice (Feng & Xue, 2006). Therefore, GS5 may not be essential for determining grain size and may be functionally redundant with other serine carboxypeptidases, making it plausible that GS5 has not been a target of selection in rice. A significant selection signal was detected in qSW5, similar to that detected in GS3 in a previous study (Takano-Kai et al., 2009).
This finding is expected, because qSW5 has a large effect on grain size and could easily have been targeted by artificial selection based on trait performance. Thus, qSW5 may have experienced strong artificial selection during evolution. Interestingly, all qSW5 alleles of O. rufipogon contained the intact gene sequence, except one that carried the same 1212-bp deletion as qSW5-N.
The four grain size genes thus appear to have experienced selection intensities that correlate with their different genetic effects. That is, genes with large genetic effects, such as GS3 and qSW5, have been under strong selection throughout evolutionary history, ultimately yielding favourable alleles that have been widely used in rice breeding. This interpretation is consistent with the fact that artificial selection in the past was performed primarily on the basis of target trait phenotypes, so genes with predominant effects on a trait would have been heavily selected. Similar results have been reported for many important rice genes that played key roles in rice domestication and breeding, such as Rc, Wx, qSH1 and PROG1 (Yamanaka et al., 2004; Konishi et al., 2006; Sweeney et al., 2007; Jin et al., 2008).
Combinatorial selection of major grain size genes
In this study, a high level of LD was detected among GS3, qSW5 and GS5 in the indica population (Fig. 5a,c). The LD between GS5 and qSW5 can be attributed to their physical linkage on chromosome 5, but the LD between GS3 and qSW5, located on chromosomes 3 and 5, respectively, may have been caused by artificial selection, because slender grains and round grains are preferred by people in different regions. Indeed, the GS3-A/qSW5-K genotype has slender (long but narrow) grains, and the GS3-C/qSW5-N genotype has round (short but wide) grains. These two genotypes made up the largest proportion of our collection (Table 4). By contrast, the GS3-C/qSW5-K genotype, which has short, narrow grains and the smallest KGW, was rare in the population. Varieties with the nonfunctional GS3-A/qSW5-N genotype are also rare in the rice germplasm. In indica, GW is frequently negatively correlated with grain eating quality, presumably because qSW5 is linked to chalk5, which controls chalkiness (Tan et al., 2000); wide indica grains frequently have high chalkiness, which results in poor quality. Thus, breeders' pursuit of both high grain yield and high-quality grain appearance probably explains why the GS3-A/qSW5-N genotype, with long and wide grains, was seldom found in the collection. The GS3-A/qSW5-K and GS3-C/qSW5-N genotypes might be the optimal allelic combinations for both yield and quality breeding in rice.
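The association between GS3 and qSW5 described above can be quantified from the counts of the four two-locus genotype classes. Because the varieties are inbred, each line can be treated as one haplotype, so the standard measures D, r² and Lewontin's |D'| apply directly. This Python sketch is illustrative: the study's own LD software and settings are not described in this section, and any counts passed to it here are invented rather than taken from Table 4.

```python
from typing import Tuple


def two_locus_ld(n_ak: int, n_an: int, n_ck: int, n_cn: int) -> Tuple[float, float, float]:
    """LD between GS3 (alleles A/C) and qSW5 (alleles K/N) in inbred lines.

    Arguments are counts of lines in each two-locus class
    (GS3-A/qSW5-K, GS3-A/qSW5-N, GS3-C/qSW5-K, GS3-C/qSW5-N).
    Returns (D, r2, |D'|).
    """
    total = n_ak + n_an + n_ck + n_cn
    if total == 0:
        raise ValueError("no lines counted")
    p_a = (n_ak + n_an) / total      # frequency of GS3-A
    p_k = (n_ak + n_ck) / total      # frequency of qSW5-K
    d = n_ak / total - p_a * p_k     # coefficient of linkage disequilibrium
    var = p_a * (1 - p_a) * p_k * (1 - p_k)
    if var == 0:
        raise ValueError("one locus is monomorphic; LD is undefined")
    r2 = d * d / var
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_a * (1 - p_k), (1 - p_a) * p_k)
    else:
        d_max = min(p_a * p_k, (1 - p_a) * (1 - p_k))
    return d, r2, abs(d) / d_max


# Hypothetical counts, not the study's data.
print(two_locus_ld(40, 5, 5, 30))
```

An excess of GS3-A/qSW5-K and GS3-C/qSW5-N lines relative to the other two classes gives D > 0 and an elevated r², the pattern described above; the measure itself cannot distinguish selection from other causes of association.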
Variable selection for grain size improvement
Phenotypic selection efficiently captures strong alleles of major genes, whereas its power to select moderate alleles, or genes with moderate effects on target traits, is low. In the era of molecular selection, however, such moderate effects deserve attention because they can be targeted directly to enhance selection efficiency. In this study, 10 or more haplotypes were discovered for each of GW2, GS5 and qSW5. Diagnostic SNPs that distinguish these alleles can be developed into functional markers for artificial selection. In addition, new functional alleles that generate novel trait variation in nature should be given priority for genetic improvement. Taking GW2 as an example, a frameshift mutation in GW2 resulted in a 26.2% increase in GW and a 49.8% increase in KGW (Song et al., 2007), making this GW2 allele desirable for rice improvement. However, our resequencing showed that this type of GW2 allele was absent from all 127 rice varieties, and it was also not found in a population of 235 rice varieties (Takano-Kai et al., 2009). We therefore conclude that, although this type of GW2 allele is desirable for rice breeding, it has seldom been used in rice improvement because of its rarity in nature.
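To illustrate how diagnostic SNPs could serve as functional markers, the following Python sketch assigns a line to a haplotype by matching its alleles at a small marker panel. The panel (HAP-1 to HAP-3 at positions 101, 356 and 902) is entirely hypothetical and does not correspond to the GW2, GS5 or qSW5 polymorphisms identified in this study.

```python
from typing import Dict, Optional

# HYPOTHETICAL diagnostic panel: positions and alleles are placeholders,
# not the real polymorphisms reported for GW2, GS5 or qSW5.
DIAGNOSTIC_PANEL: Dict[str, Dict[int, str]] = {
    "HAP-1": {101: "A", 356: "G", 902: "T"},
    "HAP-2": {101: "G", 356: "G", 902: "T"},
    "HAP-3": {101: "G", 356: "C", 902: "C"},
}


def call_haplotype(genotype: Dict[int, str],
                   panel: Dict[str, Dict[int, str]] = DIAGNOSTIC_PANEL) -> Optional[str]:
    """Assign a line to the single haplotype consistent with its SNP calls.

    genotype -- observed base at each assayed position; missing positions
                are allowed and treated as compatible with any allele.
    Returns None if no haplotype, or more than one, is compatible.
    """
    compatible = [
        name
        for name, alleles in panel.items()
        # genotype.get(pos, base) returns `base` for a missing call,
        # so missing data never rules a haplotype out.
        if all(genotype.get(pos, base) == base for pos, base in alleles.items())
    ]
    return compatible[0] if len(compatible) == 1 else None


print(call_haplotype({101: "A", 356: "G", 902: "T"}))  # HAP-1
print(call_haplotype({356: "G", 902: "T"}))            # None (HAP-1 or HAP-2)
```

Because missing calls never exclude a haplotype, a line is assigned only when exactly one haplotype remains; ambiguous or unmatched lines are left unassigned for further genotyping.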
In summary, we investigated functional and evolutionary differences among the four grain size genes GS3, qSW5, GW2 and GS5. qSW5 and GS3 are the key genes controlling grain size and are widely used in rice breeding. Although the mechanisms by which these genes regulate grain size are still not thoroughly understood, elucidating their molecular features and evolutionary patterns in rice germplasm will facilitate breeding for improved grain yield and quality. Importantly, some varieties shared the same genotypes at all four grain size genes but still showed substantial phenotypic differences, suggesting that other genes also contribute to rice grain size. For example, the grain size QTLs OsSPL16 and GL3.1 were cloned very recently (Qi et al., 2012; Wang et al., 2012). Identifying more genes that regulate grain size will be necessary to dissect the genetic constitution of grain size in each variety more precisely. These cloned genes and ongoing gene cloning work will provide a comprehensive understanding of how rice grain size is controlled, which can then be applied to design varieties of the desired quality.
The authors thank Mr J. B. Wang for his excellent work in the field and Dr H. Y. Du for providing DNA of O. rufipogon varieties. This work was supported in part by grants from the National Key Program for Development of Basic Research of China (2010CB125901), the National 863 Program for Functional Genomics of Stress Resistance and Nutrient Utility in Rice (2012AA10A303), and the National Special Program for Research on Transgenic Plant of China (2011ZX08009-001).