Genomic dissection of small RNAs in wild rice (Oryza rufipogon): lessons for rice domestication


  • Yu Wang,

    1. Department of Agronomy & Key Laboratory of Crop Germplasm Resource of Zhejiang Province, Zhejiang University, Hangzhou, China
  • Xuefei Bai,

    1. Department of Agronomy & Key Laboratory of Crop Germplasm Resource of Zhejiang Province, Zhejiang University, Hangzhou, China
  • Chenghai Yan,

    1. Department of Agronomy & Key Laboratory of Crop Germplasm Resource of Zhejiang Province, Zhejiang University, Hangzhou, China
  • Yiejie Gui,

    1. Department of Agronomy & Key Laboratory of Crop Germplasm Resource of Zhejiang Province, Zhejiang University, Hangzhou, China
  • Xinghua Wei,

    1. State Key Laboratory of Rice Biology, China National Rice Research Institute, Chinese Academy of Agricultural Sciences, Hangzhou, China
  • Qian-Hao Zhu,

    1. CSIRO Plant Industry, Canberra, ACT, Australia
  • Longbiao Guo,

    Corresponding author
    1. State Key Laboratory of Rice Biology, China National Rice Research Institute, Chinese Academy of Agricultural Sciences, Hangzhou, China
    • Department of Agronomy & Key Laboratory of Crop Germplasm Resource of Zhejiang Province, Zhejiang University, Hangzhou, China
  • Longjiang Fan

    Corresponding author
    • Department of Agronomy & Key Laboratory of Crop Germplasm Resource of Zhejiang Province, Zhejiang University, Hangzhou, China

Authors for correspondence:

Longjiang Fan

Tel: +86 571 88982730


Longbiao Guo

Tel: +86 571 63370537



  • The lack of a MIRNA set and genome sequence of wild rice (Oryza rufipogon) has prevented us from determining the role of MIRNA genes in rice domestication.
  • In this study, a genome, three small RNA populations and a degradome of O. rufipogon were sequenced on the Illumina platform, and the expression levels of microRNAs (miRNAs) were investigated using miRNA chips.
  • A de novo O. rufipogon genome was assembled from c. 55× coverage of raw sequencing data, and a total of 387 MIRNAs were identified in the O. rufipogon genome based on c. 5.2 million unique small RNA reads from three different tissues. Of these O. rufipogon MIRNAs, 259 were not found in cultivated rice, suggesting that they have been lost in cultivated rice. We also found that 48 MIRNAs were novel in cultivated rice, suggesting that they were potential targets of domestication selection. Some miRNAs showed significant expression differences between wild and cultivated rice, suggesting that miRNA expression could also be a target of domestication, as demonstrated for the miR164 family.
  • Our results illustrate that MIRNA genes, like protein-coding genes, might have been significantly shaped during rice domestication and could be one of the driving forces of rice domestication.

Introduction

Noncoding small RNAs, including microRNAs (miRNAs) and small-interfering RNAs (siRNAs), are critical regulators in plants. They regulate the expression of their target genes at the post-transcriptional level by mRNA cleavage or translational repression (Bartel, 2004). miRNAs are processed from hairpin-structured precursors by Dicer-like enzymes (Voinnet, 2009). Plant miRNAs play critical roles in a variety of developmental processes (Llave et al., 2002; Aukerman & Sakai, 2003; Chen, 2004). Thousands of miRNAs have been identified in a wide range of plant species, from the unicellular green alga Chlamydomonas reinhardtii to higher plants such as rice (Kozomara & Griffiths-Jones, 2011). However, new miRNAs are still being identified even in well-characterized plant species. For example, a recent study identified 76 new rice miRNAs by massive sequencing of small RNA populations from different tissues (Jeong et al., 2011).

Given the broad regulatory functions of miRNAs in rice, whether MIRNA genes were under artificial selection during domestication is an intriguing question. Population genetic analysis has revealed strong positive selection at the miR156b/c locus in rice (Wang et al., 2007). Recent studies have shown that miR156 controls ideal plant architecture, an important agronomic trait, by regulating the expression level of OsSPL14, and that it was a target of artificial selection (Jiao et al., 2010; Miura et al., 2010). Meanwhile, a large-scale investigation of selection signals at MIRNA genes indicated that several MIRNA genes might have undergone artificial selection during rice domestication (Wang et al., 2010). Based on ANOVA of miRNAs and their binding sites in target genes and in paralogs of target genes, a dynamic gain and loss of miRNA binding sites, caused by nucleotide substitutions or insertions/deletions (indels), was detected during rice evolution (Guo et al., 2008).

The wild rice population holds great genetic diversity and is an important genetic resource for rice breeding. Comparative analysis of the genomes of cultivated and wild rice could provide crucial insights into rice domestication. Although the genomes of the two cultivated rice subspecies (Oryza sativa ssp. indica and japonica) have been fully sequenced, and five accessions of Oryza rufipogon, a wild ancestor of O. sativa, have recently been sequenced to a mean coverage of 10.3× (Xu et al., 2012), a whole-genome sequence of O. rufipogon has not yet been generated. Moreover, no miRNA has yet been identified in O. rufipogon. A more complete O. rufipogon genome will offer a good opportunity to identify all O. rufipogon miRNAs and to study the evolutionary dynamics of miRNAs during rice domestication at a genome-wide scale.

In this work, we sequenced an accession of O. rufipogon to c. 55× coverage, identified miRNAs using small RNAs generated from three different tissues of O. rufipogon and identified miRNA targets in O. rufipogon by degradome sequencing. Using a comparative genomics approach, we found that rice MIRNA genes have experienced a complex evolutionary process during domestication.

Materials and Methods

Plant material

A Chinese accession of wild rice (O. rufipogon Griff.), collected from Dongxiang, Jiangxi Province, and provided by the China National Rice Research Institute, was used for genome, small RNA and degradome sequencing. Six cultivars (Oryza sativa L.) and six accessions of O. rufipogon (Supporting Information, Table S1), randomly selected from our collections kindly provided by the International Rice Research Institute (IRRI) and the China Rice Research Institute, were used in miRNA microarray experiments, as previously described (Wang et al., 2010).

Genome sequencing and annotation

Genomic DNA of O. rufipogon was extracted from young seedlings using the CTAB protocol (Chong, 2001). Fragmented DNA was fractionated by electrophoresis, and DNA fragments of the desired length were excised and gel-purified. Purified DNA was then used to generate two paired-end sequencing libraries with insert sizes of c. 500 bp and 2 kb. DNA sequencing was performed on the Illumina HiSeq2000 platform.

Sequencing reads were aligned to the rice reference genome sequence (O. sativa ssp. japonica, Release 7 of the MSU Rice Genome Annotation Project) using SOAP2 (Li et al., 2009a,b) and the bwa package (Li & Durbin, 2009). The sequencing depth and coverage relative to the rice reference genome were calculated from the alignment. Genomic variation detection and visualization were performed using samtools-0.1.18 (Li et al., 2009a,b) and inGAP-sv (Qi et al., 2010). The raw sequence data have been deposited in the NCBI Short Read Archive under accession number SRA055709.
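The depth and coverage calculation mentioned above can be illustrated with a minimal sketch. It assumes reads have already been aligned and reduced to half-open (start, end) reference intervals; the function names are hypothetical, and this is not the SOAP2/bwa/samtools workflow itself.

```python
def per_base_depth(read_intervals, region_start, region_end):
    """Per-base read depth over the half-open region [region_start, region_end)."""
    depth = [0] * (region_end - region_start)
    for start, end in read_intervals:
        # Clip each aligned read to the region before counting it.
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos - region_start] += 1
    return depth


def summarize_depth(depth, min_depth=4):
    """Return (mean depth, breadth of coverage, every base >= min_depth).

    Breadth is the fraction of bases covered by at least one read.
    """
    if not depth:
        return 0.0, 0.0, False
    mean = sum(depth) / len(depth)
    breadth = sum(d > 0 for d in depth) / len(depth)
    return mean, breadth, min(depth) >= min_depth
```

For example, `per_base_depth([(0, 10), (5, 15), (8, 20)], 5, 12)` gives `[2, 2, 2, 3, 3, 2, 2]`. The `min_depth=4` default is only an illustrative threshold for checking whether a locus (e.g. a MIRNA precursor) is covered well enough for SNP calling.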

The de novo assembly of the O. rufipogon genome was performed using SOAPdenovo (Li et al., 2010a). Raw reads were preprocessed to remove adaptors and to filter out low-quality reads (reads with ≥ 50% of nucleotides having a quality value ≤ 5). Error correction was performed with the ‘Correction’ program to remove low-frequency k-mers and improve the assembly (Li et al., 2010a,b). The best assembly (i.e. the one with the longest contig N50) was obtained with a k-mer size of 49, and its contigs were used for scaffold construction based on the paired-end information of the short reads. Gaps within the assembled scaffolds were then closed using GapCloser v1.1. Repeat sequences were annotated by searching against Repbase (version 16.06) and the Oryza Repeat Database (version 3.3) using RepeatMasker, and were then masked. Repeat-masked contigs were subjected to gene prediction using Augustus. Known noncoding RNAs were also annotated by searching against the Rfam database (version 10.0).
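The low-quality read filter described above (a read is discarded when ≥ 50% of its bases have a quality value ≤ 5) can be sketched as follows. This is an illustration, not the authors' preprocessing code: the function names are hypothetical, and the Phred offset of 33 is an assumption (older Illumina pipelines encoded qualities with an offset of 64).

```python
def is_low_quality(quality_string, phred_offset=33, max_q=5, max_fraction=0.5):
    """Flag a read when >= max_fraction of its bases have Phred quality <= max_q.

    phred_offset=33 assumes Sanger/Illumina 1.8+ encoding (an assumption).
    """
    if not quality_string:
        return True
    scores = [ord(ch) - phred_offset for ch in quality_string]
    n_low = sum(q <= max_q for q in scores)
    return n_low >= max_fraction * len(scores)


def filter_reads(records, **kwargs):
    """Keep (sequence, quality) records that are not flagged as low quality."""
    return [(seq, qual) for seq, qual in records if not is_low_quality(qual, **kwargs)]
```

With offset 33, `'#'` encodes Q2 and `'I'` encodes Q40, so `"##II"` (half the bases at Q2) is rejected while `"#III"` is kept. Passing `phred_offset=64` adapts the same check to older Illumina output.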

Sequencing of small RNA populations and degradome

Total RNA was extracted from leaves and roots of four-leaf-stage seedlings (c. 20 d old) and from dehusked developing grains collected 5–10 d post-anthesis (DPA) from O. rufipogon. Small RNA sequencing was performed on the Illumina GA II. Two corresponding small RNA datasets from Nipponbare (O. sativa ssp. japonica; GSE11014 and GSE19602; Zhu et al., 2008; He et al., 2010) were downloaded from the GEO database.

A degradome sequencing library was constructed using mRNA isolated from leaves of the same four-leaf-stage seedlings used for small RNA sequencing, following previously described protocols (Addo-Quaye et al., 2008; German et al., 2008). The raw small RNA and degradome sequencing data have been submitted to NCBI GEO under accession number GSE39309.

Analyses of small RNA and degradome data

Raw small RNA reads and degradome reads were trimmed to remove adaptors and filtered to remove low-quality reads using custom Perl scripts. Clean reads of 20–24 nt were then mapped to the de novo assembled O. rufipogon contig sequences using BLAST (W = 7, E = 1000). Reads mapped to coding regions and to known noncoding RNAs (rRNA, tRNA, snRNA and snoRNA) were excluded from further analysis. The known rice miRNAs were downloaded from miRBase (version 18.0).
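The adaptor trimming and 20–24 nt size selection can be sketched as below. The authors used custom Perl scripts that are not described in detail; this Python version is only an illustration, and the `ADAPTER` sequence is a hypothetical placeholder, not the adaptor actually used in this study.

```python
# Hypothetical 3' adaptor prefix used only for illustration; replace it with
# the adaptor sequence of the library kit actually used.
ADAPTER = "TGGAATTCTCGG"


def trim_adapter(read, adapter=ADAPTER, min_overlap=6):
    """Cut the read at the first full adaptor match, or remove a partial
    adaptor prefix (at least min_overlap nt) at the 3' end of the read."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def clean_small_rna_reads(reads, min_len=20, max_len=24):
    """Trim adaptors and keep only inserts of min_len..max_len nt."""
    kept = []
    for read in reads:
        insert = trim_adapter(read)
        if min_len <= len(insert) <= max_len:
            kept.append(insert)
    return kept
```

A read consisting of the 20-nt miR156 sequence followed by adaptor is reduced to the 20-nt insert and kept, while a 4-nt insert is discarded by the size filter.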

A bioinformatics pipeline (Fig. 1), designed following previously described criteria (Meyers et al., 2008), was used to predict new miRNAs in O. rufipogon. Each step was executed with custom Perl scripts. The core program was a modified version of MIREAP developed by Jeong et al. (2011). In addition, three further criteria were adopted to ensure the quality of the predicted MIRNAs: the abundance of the small RNA reads; > 90% of the reads from each pre-miRNA mapping to the same strand; and the two most abundant reads accounting for > 70% of the total reads mapped to each pre-miRNA. The secondary structure of candidate pre-miRNAs was predicted using the Vienna RNA package (Hofacker, 2003) and checked manually.
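The three additional criteria can be expressed as a simple filter over the reads mapped to one candidate precursor. This is a sketch of our reading of the text, not the authors' script: the minimum-abundance value is not given in the text, so `min_total_reads` is a hypothetical parameter, and the input format is assumed.

```python
from collections import Counter


def passes_extra_filters(mapped_reads, min_total_reads=10,
                         strand_cutoff=0.9, top2_cutoff=0.7):
    """Apply the three additional MIRNA filters to one candidate precursor.

    mapped_reads: list of (read_sequence, strand, count) tuples for the small
    RNA reads mapped to this candidate pre-miRNA (assumed input format).
    min_total_reads: hypothetical abundance threshold (not stated in the text).
    """
    total = sum(count for _, _, count in mapped_reads)
    # Criterion 1: enough reads overall.
    if total == 0 or total < min_total_reads:
        return False
    per_strand = Counter()
    per_sequence = Counter()
    for seq, strand, count in mapped_reads:
        per_strand[strand] += count
        per_sequence[(seq, strand)] += count
    # Criterion 2: more than 90% of reads on the same strand.
    if max(per_strand.values()) / total <= strand_cutoff:
        return False
    # Criterion 3: the two most abundant reads make up more than 70% of all reads.
    top_two = sum(count for _, count in per_sequence.most_common(2))
    return top_two / total > top2_cutoff
```

A precursor dominated by its miRNA and miRNA* reads on one strand passes, whereas one with reads split evenly between strands, or scattered over many distinct sequences, is rejected.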

Figure 1.

The bioinformatics pipeline for identification of microRNAs (miRNAs) from the small RNA populations of Oryza rufipogon. MicroRNAs were identified by a series of filtering steps shown in the diagram and described in the text. The numbers of candidate small RNAs and known rice miRNAs identified at each step are indicated.

Targets of known rice miRNAs and newly identified wild rice miRNAs were predicted with psRNATarget (Dai & Zhao, 2011) by searching against the O. rufipogon coding regions annotated in this study. Evidence for miRNA-mediated cleavage was obtained by analyzing the degradome sequencing data with the CleaveLand pipeline (Addo-Quaye et al., 2009).
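In plants, miRNA-guided cleavage typically occurs opposite the bond between nucleotides 10 and 11 of the miRNA (counted from its 5′ end), so a degradome read whose 5′ end falls at the corresponding target position is evidence of slicing. The sketch below computes that expected position and checks whether it carries the most abundant degradome peak on the transcript. It is loosely in the spirit of CleaveLand but is a simplification with hypothetical function names, not CleaveLand's implementation or its category scheme.

```python
from collections import Counter


def expected_cleavage_5p(site_start, mirna_len):
    """0-based target position of the 5' end of the 3' cleavage fragment.

    Assumes the miRNA pairs with target positions site_start ..
    site_start + mirna_len - 1 (target written 5'->3'), so miRNA nucleotide k
    (1-based from its 5' end) pairs with target position
    site_start + mirna_len - k, and cleavage falls between miRNA
    nucleotides 10 and 11.
    """
    return site_start + mirna_len - 10


def degradome_support(read_5p_ends, site_start, mirna_len):
    """Return (expected position, reads at that position, is_top_peak)."""
    counts = Counter(read_5p_ends)
    pos = expected_cleavage_5p(site_start, mirna_len)
    at_site = counts.get(pos, 0)
    top_peak = max(counts.values(), default=0)
    return pos, at_site, at_site > 0 and at_site == top_peak
```

For a 21-nt miRNA whose complementary site starts at target position 100, the expected fragment starts at position 111; 20 degradome reads starting there, against smaller peaks elsewhere, would give `(111, 20, True)`.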

Analysis of miRNA expression

Twelve accessions, comprising six O. sativa (three indica and three japonica) and six O. rufipogon lines, were grown under the same conditions (Table S1). Fresh leaves from four-leaf-stage seedlings were used for total RNA extraction. miRNA chips were designed by the LC Sciences Company, China, based on the rice miRNAs deposited in miRBase version 13.0. Each chip contains 253 miRNA probes, representing 414 rice miRNAs. For each chip, small RNA samples from one cultivar and one O. rufipogon accession were hybridized separately.

Microarray assays were performed by the LC Sciences Company. Five micrograms of total RNA was size-fractionated using a YM-100 Microcon centrifugal filter (Millipore), and the isolated low-molecular-weight RNAs (< 300 nt) were 3′-extended with a poly(A) tail using poly(A) polymerase. An oligonucleotide tag was then ligated to the poly(A) tail for later fluorescent dye staining; two different tags were used for the two RNA samples in dual-sample experiments. Chip hybridization was performed overnight using a microcirculation pump (Atactic Technologies, China). The hybridization melting temperatures of the probes were balanced by chemical modification of the detection probes. Hybridization was performed in 100 μl 6× SSPE buffer (0.90 M NaCl, 60 mM Na2HPO4, 6 mM EDTA, pH 6.8) containing 25% formamide at 34°C. Hybridization images were collected using a laser scanner (GenePix 4000B, Molecular Devices, China) and digitized using Array-Pro image analysis software (Media Cybernetics, China). Data were analyzed by first subtracting the background and then normalizing the signals using a LOWESS (locally weighted regression) filter. For two-color experiments, the ratio of the two sets of detected signals (log2-transformed and balanced) and t-test P-values were calculated; miRNAs with P < 0.01 were considered differentially expressed. The raw chip data have been deposited in NCBI GEO under accession number GSE39309.
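The LOWESS normalization step can be illustrated with a minimal, dependency-free sketch: for each two-color spot, the log-ratio M = log2(R/G) is regressed locally on the average log-intensity A = ½·log2(R·G), and the fitted intensity-dependent trend is subtracted from M. This is a single-pass local linear fit with tricube weights (no robustness iterations); the bandwidth `frac` and the function names are assumptions, not the settings used by LC Sciences.

```python
import math


def lowess_fit(x, y, frac=2 / 3):
    """Local linear regression with tricube weights, evaluated at each x."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # number of points per local window
    fitted = []
    for xi in x:
        # Window half-width: distance to the k-th nearest point (itself included).
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        w = [(1 - min(abs(xj - xi) / h, 1.0) ** 3) ** 3 for xj in x]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def lowess_normalize(red, green, frac=2 / 3):
    """Return intensity-bias-corrected log2 ratios for two-color spots."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]
```

If one channel is uniformly twice as bright, or the dye bias grows smoothly with intensity, the normalized log-ratios are pulled back to about zero, so only spot-specific differences remain for the t-test.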

Results

De novo assembly of the O. rufipogon genome

To perform a genome-wide comparison of MIRNA genes between cultivated and wild rice and to investigate the effects of domestication on MIRNA evolution in rice, we sequenced the genome of O. rufipogon (collected from Dongxiang, China) using next-generation sequencing. Two paired-end (PE) libraries with insert sizes of c. 500 bp and 2 kb were constructed and sequenced. In total, 22.4 Gb of sequence data were generated, corresponding to c. 55× coverage of the reference rice genome (O. sativa ssp. japonica, TIGR V6.1) (Table S2). The short reads were assembled de novo using SOAPdenovo. We obtained 343.7 Mb of scaffold sequences, with N50 values of 27 880 bp for scaffolds and 1101 bp for contigs. To annotate the genome, repeat sequences (25.2%) were first identified by RepeatMasker and masked. The remaining sequences were used for gene prediction with Augustus. In total, 29 660 genes (protein-coding sequences) were predicted. These genes were further annotated by BLASTX/BLASTP searches against the nucleotide (NCBI) and protein (SWISSPROT) databases. Known noncoding RNAs (rRNAs, tRNAs, snRNAs and snoRNAs) were also annotated by BLAST searches against the Rfam database.
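The scaffold and contig N50 values reported above follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of all assembled bases. A minimal sketch (the lengths in the example are made-up toy values, not data from this assembly):

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly
```

For toy lengths 2 to 10 (54 bp in total), 10 + 9 + 8 = 27 bp reaches half the total, so `n50(range(2, 11))` returns 8.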

Identification of miRNAs and their targets in O. rufipogon

To identify miRNAs in O. rufipogon, we constructed three small RNA sequencing libraries using RNA isolated from leaves and roots of four-leaf-stage seedlings and from developing seeds. In total, 7 572 483, 6 626 103 and 6 427 458 clean reads of 20–24 nt were obtained from the leaf, root and developing-seed libraries, respectively. In all three libraries, 24-nt and 21-nt small RNAs were the most abundant (Table 1). These reads were mapped to the draft O. rufipogon genome sequence by BLAST (E = 1000, W = 7); 59.8, 54.0 and 56.2% of the reads, respectively, mapped exactly. For miRNA identification, reads mapped to known noncoding RNAs were excluded from further analysis.
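The mapping percentages can be recomputed from the read counts reported in Table 1 as a quick consistency check. This is simple arithmetic on the published numbers, not part of the original analysis:

```python
# Read counts as reported in Table 1: (20-24 nt reads, reads mapped to contigs).
libraries = {
    "leaf": (7_572_483, 4_531_410),
    "root": (6_626_103, 3_576_573),
    "developing seed": (6_427_458, 3_609_178),
}

# Percentage of reads mapped to the O. rufipogon contigs, to one decimal place.
mapping_rate = {
    name: round(100 * mapped / total, 1)
    for name, (total, mapped) in libraries.items()
}
print(mapping_rate)  # {'leaf': 59.8, 'root': 54.0, 'developing seed': 56.2}
```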

Table 1. Summary of high-throughput sequencing of small RNAs and degradome from Oryza rufipogon

Category | Small RNA: Leaf | Small RNA: Root | Small RNA: Developing seed | Degradome (a): Leaf
Raw reads | 7 572 483 | 6 626 103 | 6 427 458 | 12 048 037
Unique reads | 1 570 177 | 1 274 554 | 2 350 532 | 470 548
Mapped to contig (b): reads | 4 531 410 | 3 576 573 | 3 609 178 | 2 282 736
   Specific in each library | 1 060 734 | 602 238 | 2 475 430 | NA
   Conserved in all libraries | 2 750 528 | 2 647 377 | 773 948 | NA
Mapped to contig (b): unique reads | 1 018 843 | 773 319 | 1 191 354 | 81 657
   Specific in each library | 775 330 | 546 705 | 1 024 137 | NA
   Conserved in all libraries | 63 660 | 63 660 | 63 660 | NA
Singleton | 772 715 | 640 686 | 888 682 | 7155

  NA, not available.
  a Including 18–33 nt reads.
  b Contigs from the wild rice genome generated in this study; for small RNAs, only 20–24 nt reads were counted.

MIRNA genes were predicted using the pipeline illustrated in Fig. 1. To ensure the quality of the predicted MIRNAs, three additional criteria (the third filter in Fig. 1) were applied on top of those described previously (Meyers et al., 2008). Using this pipeline, we identified 387 MIRNAs, including 128 known MIRNAs from 56 families and 259 novel MIRNAs from 207 families. Of the 387 MIRNAs, 28 were present in all three tissues (leaf, root and developing grain) and 51 in at least two tissues. The precursor sequences, stem-loop structures and read numbers in the three tissues of these MIRNAs are given in Table S3. Of the 259 novel MIRNAs, at least 20 (Table 2) could be O. rufipogon-specific for the following reasons: they are absent from the cultivated rice genomes (40 cultivars; Xu et al., 2012); each has more than four mismatches in the mature sequence relative to all known plant miRNAs (miRBase version 18); and no small RNA reads were detected at their homologous pre-miRNA loci in cultivated rice. Three examples of such O. rufipogon-specific miRNAs are shown in Fig. 2. Based on homolog searches and sequence comparison, large deletions (> 20 nt; 16 MIRNAs) or nucleotide mutations (> 4 nt; four MIRNAs) were found to be responsible for their loss or dysfunction in cultivated rice. For the remaining 239 novel O. rufipogon MIRNAs, small (< 4 nt) deletions or nucleotide mutations seem to be responsible for their dysfunction in cultivated rice. To test whether the mutations found in O. rufipogon-specific MIRNAs resulted from sequence misassembly, the precursors of 15 O. rufipogon-specific MIRNAs were amplified and sequenced; the assembly was confirmed in 14 cases, suggesting that these MIRNAs have indeed been lost in cultivated rice.
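One of the three criteria for calling a miRNA O. rufipogon-specific is that its mature sequence has more than four mismatches to every known plant miRNA. The text does not say how mismatches were counted, so the sketch below assumes a simple ungapped comparison anchored at the 5′ end, with any length difference counted as extra mismatches; the function names are hypothetical. Sequences are written in the DNA alphabet, as in Table 2.

```python
def mismatches(a, b):
    """Position-wise mismatches from the 5' end; extra length counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def is_candidate_specific(mature, known_mirnas, max_mismatches=4):
    """True if `mature` differs by more than max_mismatches from every known miRNA."""
    return all(mismatches(mature, known) > max_mismatches for known in known_mirnas)
```

Against the 20-nt miR156 mature sequence (TGACAGAAGAGAGTGAGCAC), a single-substitution variant is not a candidate, whereas oru-miRX65 (Table 2) differs at far more than four positions. In practice `known_mirnas` would be the full set of plant mature miRNAs from miRBase, and this check covers only one of the three criteria.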

Figure 2.

Stem-loop structures of three selected novel microRNAs (miRNAs) in Oryza rufipogon. Base pairing between each miRNA and its predicted target is shown, with two dots representing a Watson–Crick pair and one dot representing a G–U (guanine and uracil) pair.

Table 2. Summary of novel microRNAs (miRNAs) unique to the wild rice (Oryza rufipogon)

miRNA | Mature sequence | Size (nt) | No. of reads | Predicted target (a) | Score | Homolog in O. sativa | Score | Annotation | Evidence (b)
oru-miRX30 | AACGGTGTAATCGGATAGTAGATC | 24 | 29 | C7180253_g269 | 3 | Os09g37200.1 | NA | Transferase family protein, putative, expressed | NY
oru-miRX65 | CGGATTTGAGCTTGCCGACGGC | 22 | 32 | scaffold13410_g38 | 0.5 | Os10g40770.1 | 0 | Expressed protein | Y
oru-miRX69 | ACGGTATGATCGGATTGTAGATTG | 24 | 7 | scaffold3144_g418 | 3 | Os03g17350.1 | 3 | White-brown complex homolog protein, putative, expressed | Y
 | | | | scaffold3653_g404 | 3 | Os04g50920.1 | 3 | WRKY37, expressed | Y
 | | | | scaffold7063_g237 | 3 | Os08g17390.1 | NA | Expressed protein | Y
oru-miRX83 | TTATGTTTTTGAGAGAGGGGC | 21 | 5 | scaffold8192_g41 | 3 | NA | | NA | Y
oru-miRX105 | AAAGTATTTTGATCGGATGGG | 21 | 22 | scaffold461_g560 | 3 | Os03g51020.1 | NA | Expressed protein | Y
oru-miRX109 | TTGGAACATAGGAATTTTACA | 21 | 16 | scaffold591_g301 | 2.5 | Os05g25890.1 | NA | Expressed protein | Y
oru-miRX121 | ATATTCCGTAGATGTTAATGAATC | 24 | 10 | scaffold1019_g92 | 2.5 | Os02g57090.1 | 2.5 | Anthranilate phosphoribosyltransferase, putative, expressed | Y

  a NA, no target predicted or no homolog found in Oryza sativa; only targets with the best scores are shown.
  b Y, not found in the dataset, including 40 rice cultivars sequenced by Xu et al. (2012); NY, not determined yet.

All 387 miRNAs identified in O. rufipogon were subjected to target prediction, and 169 (including 105 novel miRNAs) were predicted to target annotated O. rufipogon genes (Table S4). Of the 20 O. rufipogon-specific miRNAs, six have predicted targets in the gene set annotated from the O. rufipogon genome, and most of these targets are not transcription factors (Table 2), a characteristic of nonconserved miRNAs. Degradome sequencing complements small RNA sequencing for high-throughput identification of miRNA target genes. To confirm the predicted targets, we constructed a degradome sequencing library using RNA from four-leaf-stage O. rufipogon seedlings. In total, 12 048 037 raw degradome reads were generated. Using previously described approaches (Addo-Quaye et al., 2008, 2009), targets were identified for 88 miRNAs (including 11 novel miRNAs and one O. rufipogon-specific miRNA) (Fig. 3; Table S5). As expected, novel targets were found for novel miRNAs identified in O. rufipogon. For instance, oru-miRX35 was found to target tryptophan synthase beta chain 1 (scaffold2742_g70) (Fig. 3). Conserved miRNAs were found to regulate conserved targets in O. rufipogon (Table S5), but novel targets were also found for conserved miRNAs: for example, genes encoding a glucan endo-1,3-beta-glucosidase precursor (scaffold485_g607, homolog of Os02g04670) and a Bowman–Birk type bran trypsin inhibitor precursor (scaffold4757_g143, homolog of Os01g03380) were found to be targeted by miR156 and miR408, respectively. These results suggest that miRNA–target interactions have been dynamic during rice domestication.

Figure 3.

Signal plots of the candidate targets of microRNAs (miRNAs) identified by degradome sequencing. Targets of three conserved and novel miRNAs are shown in the left and right panels, respectively. The x-axis shows the nucleotide position along the target gene, and the y-axis shows read abundance in transcripts per billion (TPB). Each circle represents a degradome fragment mapped to the target gene; the circle indicated by the red arrow represents the expected miRNA cleavage product. Base pairing between each miRNA and its predicted target is shown below each panel, with a vertical line representing a Watson–Crick pair and a circle representing a G–U (guanine and uracil) pair.

Genomic variations of MIRNAs between wild and cultivated rice

To further investigate the dynamics of MIRNAs during rice domestication, we mapped O. rufipogon genomic and small RNA reads onto the MIRNAs previously identified in cultivated rice (O. sativa). Of the 543 rice MIRNAs deposited in miRBase (version 18), 495 were covered by both genomic reads (over the whole precursor) and small RNA reads (over the mature sequence) from O. rufipogon, and 48 were not supported by any O. rufipogon small RNA read (Table 3). Of the 495 MIRNAs common to cultivated and wild rice, 272 were unchanged during rice domestication, whereas 223 had small indels or mutations in their precursors or mature sequences (Table S6); these changes did not appear to affect the processing of these MIRNAs in cultivated rice. Of the 48 MIRNAs found only in O. sativa, 25 had no corresponding pri-miRNAs in O. rufipogon. Two such MIRNAs (MIR1435 and MIR413) are shown in Fig. 4(a). Mapping of the O. rufipogon PE reads onto the japonica genome sequence (Nipponbare) clearly shows that these two MIRNAs are located in a region carrying a large deletion in O. rufipogon. For the remaining 23 MIRNAs, ancestral sequences were found in O. rufipogon, but they were unable to form the stem-loop structures required for miRNA production because of large indels (≥ 20 nt) or mutations (> 4 nt) in the mature sequences (Table 3; Table S6). Two such MIRNAs are shown in Fig. 4(b): parts of MIR1442 and MIR439h, including the entire mature sequences, are missing from the wild rice genome. Genomic variations in the precursors of 38 previously identified conserved miRNA families in wild rice are shown in Fig. 5. These results demonstrate that most MIRNAs have been retained or fixed in the cultivated rice population.
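The classification behind Table 3 can be summarized as a small decision procedure. The sketch below is an illustrative reading of the text, not the authors' code; its boolean inputs are hypothetical summaries of the alignment, folding and small RNA evidence for one O. sativa MIRNA, and the category labels follow Table 3.

```python
from collections import Counter


def classify_known_mirna(small_rna_in_wild, locus_in_wild, hairpin_in_wild,
                         mature_identical, precursor_identical):
    """Place one O. sativa MIRNA into a Table 3 category.

    The five booleans are hypothetical summaries of the comparison with
    O. rufipogon: small RNA reads detected, orthologous locus found,
    orthologous precursor folds into a hairpin, mature sequence identical,
    and precursor outside the mature miRNA identical.
    """
    if small_rna_in_wild and locus_in_wild:
        # "Common in O. sativa and O. rufipogon"
        if not mature_identical:
            return "SNP/indels in miRNA"
        if not precursor_identical:
            return "SNP/indels in pre-miRNA (except miRNA)"
        return "No mutation"
    # "Novel in O. sativa": no supporting small RNA reads in wild rice
    if not locus_in_wild:
        return "No corresponding MIRNA in O. rufipogon"
    if not hairpin_in_wild:
        return "Unable to form hairpin structure in O. rufipogon"
    return "Unclassified"


def tally(records):
    """Count categories over an iterable of 5-tuples of booleans."""
    return Counter(classify_known_mirna(*record) for record in records)
```

Applied to all 543 miRBase entries, such a tally would be expected to reproduce the counts in Table 3 (272 + 50 + 173 common and 23 + 25 novel), given the same underlying evidence.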

Figure 4.

Examples of genomic variations between wild rice Oryza rufipogon microRNAs (miRNAs) and their counterparts in cultivated rice (Oryza sativa). (a) Two miRNA loci in cultivated rice (osa-miR1435 and osa-miR413) were not found in O. rufipogon. Mapping results of the wild rice paired-end reads to the Nipponbare genome are shown. (b) Two examples of miRNAs disabled in the wild rice by sequence deletions. Two large deletions cover the entire mature sequences of osa-miR1442 and osa-miR439h. For each miRNA, mapping results of the wild rice reads on pre-miRNA sequences are shown.

Figure 5.

Genomic variations in precursors of 38 previously identified conserved microRNA (miRNA) families in wild rice (Oryza rufipogon). Also see Table 4 for details.

Table 3. Classification of 543 known Oryza sativa microRNAs (miRNAs)

Category | No. of miRNAs (a)
Common in O. sativa and Oryza rufipogon | 495
   No mutation between O. sativa and O. rufipogon | 272 (128)
   SNP/indels in miRNA | 50 (1)
   SNP/indels in pre-miRNA (except miRNA) | 173 (36)
Novel in O. sativa (b) | 48
   Unable to form hairpin structure in O. rufipogon | 23 (0)
   No corresponding MIRNA in O. rufipogon | 25 (1)
Total | 543 (166)

  a Numbers in parentheses are the numbers of miRNAs conserved in plants.
  b No small RNA reads were found in our Oryza rufipogon small RNA dataset.

Previous investigations of miRNA::target interactions suggest that miRNAs and their targets coevolve (Guo et al., 2008; Cuperus et al., 2011). Our results support this notion: no target was found in O. rufipogon for about half of the MIRNAs newly evolved in O. sativa, and when a target was predicted in O. rufipogon, it usually had a higher score (i.e. was less likely to be a real target) than the target predicted in O. sativa (Table 4).

Table 4. The predicted microRNA (miRNA) targets in Oryza rufipogon and Oryza sativa

miRNA (a) | O. sativa target | Score | O. rufipogon target | Score | Homolog | Annotation
osa-miR1435 | Os03g42280.1 | 1.5 | scaffold9489_g182 | 3 | Y | B3 DNA binding domain containing protein, expressed
osa-miR2097-3p | Os01g14130.1 | 2.5 | scaffold13939_g54 | 2 | Y | Alpha/beta hydrolase fold, putative, expressed
osa-miR2097-5p | Os08g43920.1 | 3 | scaffold15723_g69 | 2 | ND | Carrier, putative, expressed
osa-miR2103 | Os03g50830.1 | 0 | scaffold9433_g80 | 2 | Y | Conserved hypothetical protein
osa-miR2120 | Os11g35750.1 | 0 | Not found | | ND | Conserved hypothetical protein
osa-miR2928 | Os01g22954.1 | 2 | Not found | | ND | Serine carboxypeptidase, putative, expressed
osa-miR413 | Os08g17400.1 | 2.5 | Not found | | ND | WRKY DNA-binding domain containing protein, expressed
osa-miR416 | Os08g31780.1 | 2.5 | Not found | | ND | MLA1, putative, expressed
osa-miR5079 | Os02g27130.1 | 1.5 | Not found | | ND | Expressed protein
osa-miR5526 | Os02g21009.2 | 2.5 | Not found | | ND | Sodium/calcium exchanger protein, putative, expressed
osa-miR1427 | Os02g16030.1 | 2.5 | scaffold13728_g98 | 3 | ND | Hairpin-induced protein 1 domain containing protein, expressed
osa-miR1851 | Os01g61520.1 | 0.5 | scaffold28_g88 | 2.5 | Y | Hypothetical protein
osa-miR414 | Os12g01320.1 | 0 | scaffold2732_g45 | 0 | Y | Retrotransposon protein, putative, Ty3-gypsy subclass
osa-miR5155 | Os04g40660.1 | 2.5 | scaffold6007_g136 | 2.5 | Y | MA3 domain containing protein, expressed
osa-miR5162 | Os10g41920.1 | 1.5 | scaffold762_g56 | 3 | ND | Expressed protein
osa-miR5484 | Os02g50140.1 | 3 | scaffold12274_g127 | 3 | Y | Caleosin related protein, putative, expressed
osa-miR5532 | Os07g38664.1 | 3 | scaffold16393_g42 | 3 | ND | Expressed protein
osa-miR1318 | Os03g59790.1 | 0.5 | Not found | | ND | EF hand family protein, putative
osa-miR1442 | Os02g28720.1 | 2.5 | Not found | | ND | Spotted leaf 11, putative, expressed
osa-miR2872 | Os07g28050.1 | 0.5 | Not found | | ND | Retrotransposon protein, putative, unclassified
osa-miR5074 | Os10g06480.1 | 0 | Not found | | ND | Retrotransposon protein, putative, Ty3-gypsy subclass
osa-miR5160 | Os10g36130.1 | 1 | Not found | | ND | Hypothetical protein

  a miRNAs identified only in the cultivated rice (O. sativa) and their best predicted targets are listed. Y, yes; ND, not determined.

Expression profiles of miRNAs in wild and cultivated rice

To investigate changes in miRNA expression between cultivated and wild rice, six miRNA chips were used to compare miRNA expression levels in six cultivars (three each from the indica and japonica groups) and six accessions of O. rufipogon. Each chip was hybridized with low-molecular-weight RNAs (< 300 nt) extracted from four-leaf-stage seedlings of one cultivar and one wild accession (Table S1). Expression divergence between cultivated and wild rice was assessed at a false discovery rate (FDR) of < 3%. MicroRNAs with significant expression changes between cultivated and wild rice were grouped by hierarchical clustering (Fig. 6). In total, 87 miRNA families showed altered expression levels between cultivated and wild rice (seven selected miRNA families are shown in Table S7). Interestingly, most of these families (71) tended to be up-regulated in the cultivars, and 20 of these 71 families were conserved miRNAs. For example, the miR164 family was significantly up-regulated in all six cultivars tested (Table S7). Meanwhile, the expression levels of several miRNAs diverged between the indica and japonica groups. For example, significant down-regulation of miR156, miR162, miR167 and miR397 was observed only in the indica group and not in the japonica group, and miR396 was significantly up-regulated in the indica group but down-regulated in the japonica group. These differences might result from the independent domestication routes of the two subspecies of O. sativa.
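The text does not state how the FDR was controlled. A common choice is the Benjamini–Hochberg procedure, sketched below purely as an illustration of what an FDR cut-off such as 3% means; it should not be read as the method actually used in this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, keeping the q-values monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def significant_at_fdr(p_values, fdr=0.03):
    """Indices of tests declared significant at the given FDR level."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q < fdr]
```

For P-values `[0.01, 0.04, 0.03, 0.005]`, the adjusted values are approximately `[0.02, 0.04, 0.04, 0.02]`, so only the first and last tests pass a 3% FDR.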

Figure 6.

Clustering of microRNAs (miRNAs) based on miRNA microarray chip results. Six cultivars and six wild rice (Oryza rufipogon) accessions are labeled at the top of the figure, while miRNAs are listed on the right.

Although microarrays are a powerful tool for profiling miRNA expression, they detect only relatively highly expressed miRNAs whose sequences are represented on the chip. Because our miRNA chip was designed based on the rice miRNAs in miRBase version 13, and many new miRNAs have been identified since then, we obtained a more comprehensive view of changes in all miRNAs between wild and cultivated rice by comparing their read numbers (reads per million (RPM)) in O. rufipogon (this study) with those generated previously from equivalent tissues of O. sativa (japonica; Zhu et al., 2008; He et al., 2010). In general, the high-throughput sequencing results were consistent with our miRNA microarray results (Table S8). For example, the up-regulation of the miR164 family in the cultivars was supported by the small RNA sequencing results (Fig. S1). In addition, small RNA sequencing was able to distinguish the expression levels of individual members of this family, which is difficult to achieve with the miRNA chip.
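Reads per million (RPM) rescales raw read counts so that libraries of different sizes can be compared. The sketch below shows RPM and a log2 fold change between two libraries. The text does not specify which library total was used as the denominator (all clean reads or genome-matched reads), and the pseudocount that avoids log2(0) is an added assumption.

```python
import math


def rpm(read_count, library_size):
    """Reads per million: scale a raw count to a library of one million reads."""
    return read_count * 1e6 / library_size


def log2_fold_change(count_a, size_a, count_b, size_b, pseudocount=1.0):
    """log2 ratio of RPM values; the pseudocount avoids taking log2 of zero."""
    return math.log2(rpm(count_a, size_a) + pseudocount) - math.log2(
        rpm(count_b, size_b) + pseudocount
    )
```

For example, 400 reads in a 2-million-read library and 100 reads in a 1-million-read library give 200 and 100 RPM, a twofold (log2 = 1) difference when no pseudocount is added, even though the raw counts differ fourfold.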

To test whether miRNAs with significant expression differences between wild and cultivated rice were potential targets of domestication selection, neutrality tests for positive selection were carried out for the miR164 family in the wild and cultivated populations. To this end, we sequenced the precursors of all six miR164 members in 54 cultivars (O. sativa, indica and japonica) and 15 accessions of O. rufipogon (Wang et al., 2010). A significant reduction in nucleotide diversity relative to the wild group, ranging from 1.32- to 7.31-fold (as measured by θW), was observed for all six members in the two cultivar groups (Table S9), suggesting a domestication bottleneck effect on these MIRNA loci. A similar effect has been observed in protein-coding genes (Zhu et al., 2007). In addition to Tajima's D (Tajima, 1989), we used Fay and Wu's H test, which detects positive selection from an excess of high-frequency derived alleles (Fay & Wu, 2000) rather than from the excess of low-frequency polymorphisms left after a selective sweep, which is what Tajima's D detects. Significant positive selection was detected at MIR164c, d and e in the cultivated population but not in the wild rice population, suggesting that these three members were putative targets of artificial selection during rice domestication. Furthermore, using the published genomic sequences of 20 domesticated and five wild rice lines (Xu et al., 2012), we performed Tajima's D tests following the method of Zhu et al. (2007) for the 188 miRNAs with expression differences between wild and cultivated lines detected in the miRNA array experiment (Table S8), and found significant positive selection in regions (< 10 kb) containing 29 of these MIRNAs (data not shown). This result suggests that these MIRNAs might be targets of domestication selection.
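The diversity and neutrality statistics named above can be computed from aligned precursor sequences with the standard textbook formulas: Watterson's θW = S/a1, nucleotide diversity π, Tajima's D, and the unnormalized Fay and Wu's H = θπ − θH, which needs an outgroup to polarize alleles. The sketch below is illustrative only: the software used in the study is not named, significance would normally be assessed by coalescent simulation (not included), and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Column indices that are polymorphic among equal-length aligned sequences."""
    return [j for j in range(len(seqs[0])) if len({s[j] for s in seqs}) > 1]


def mean_pairwise_differences(seqs):
    """Nucleotide diversity (pi) per locus: mean number of differences per pair."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def harmonic_a1(n):
    """a1 = sum_{i=1}^{n-1} 1/i, the Watterson normalizing constant."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(seqs):
    """Watterson's theta per locus: S / a1."""
    return len(segregating_sites(seqs)) / harmonic_a1(len(seqs))


def tajimas_d(seqs):
    """Tajima's D (Tajima, 1989); returns 0.0 when there are no segregating sites."""
    n = len(seqs)
    s = len(segregating_sites(seqs))
    if s == 0:
        return 0.0
    a1 = harmonic_a1(n)
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (mean_pairwise_differences(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def fay_wu_h(seqs, outgroup):
    """Unnormalized Fay and Wu's H = theta_pi - theta_H.

    The outgroup base at each site is taken as ancestral; sites are assumed
    biallelic, with the outgroup carrying one of the two alleles.
    """
    n = len(seqs)
    theta_pi = theta_h = 0.0
    for j in segregating_sites(seqs):
        derived = sum(s[j] != outgroup[j] for s in seqs)  # derived-allele count
        theta_pi += 2.0 * derived * (n - derived) / (n * (n - 1))
        theta_h += 2.0 * derived * derived / (n * (n - 1))
    return theta_pi - theta_h
```

An excess of singletons (as after a sweep) pushes D below zero, while derived alleles at high frequency make H negative. The fold reduction in diversity reported above corresponds to the ratio `watterson_theta(wild) / watterson_theta(cultivated)` for each locus.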

Taken together, these results provide evidence of expression divergence of miRNAs between cultivated and wild rice and demonstrate that, in some cases, this expression divergence could be a result of domestication selection.


In this study, we generated a de novo assembly of the genome of O. rufipogon (the ancestor of cultivated rice, O. sativa) and carried out a genome-wide investigation of miRNAs in O. rufipogon using next-generation sequencing and miRNA microarrays. To our knowledge, this is the most deeply sequenced (> 50×) O. rufipogon genome so far and the first large-scale attempt at miRNA identification in wild rice. Deeper sequencing promises a better de novo assembly of the wild rice genome; Li et al. (2010a,b) suggested 30× as the minimal sequence depth required for a proper assembly of the human genome. Deeper sequencing also ensures adequate depth (e.g. > 4×) for calling single nucleotide polymorphisms (SNPs) between wild and cultivated rice. For example, our sequencing coverage (c. 55×) of O. rufipogon provided at least 4.5× coverage at all 548 known rice miRNA loci, whereas c. 40 miRNA loci had < 4× read coverage in each of the five wild lines that were sequenced to 8.5–11.5× genomic coverage (Xu et al., 2012) (Fig. S2). Comparative analyses of MIRNAs, miRNA::target interactions and miRNA expression in cultivated and wild rice revealed a complex and dynamic evolutionary process of miRNAs during rice domestication and provided a lesson for rice domestication: MIRNA genes, like protein-coding genes, were significantly shaped by artificial selection.
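The depth criterion described above amounts to a per-locus filter. The sketch below is minimal and rests on two assumptions: per-base read depths have already been computed for each MIRNA locus (for example from read alignments), and a locus is judged by its mean depth. Whether the study used the mean, the minimum or another summary is not stated here. The locus names and depths are hypothetical.

```python
from statistics import mean

MIN_DEPTH = 4.0  # depth threshold quoted in the text for reliable SNP calling


def low_coverage_loci(depth_by_locus: dict[str, list[int]],
                      min_depth: float = MIN_DEPTH) -> list[str]:
    """Return loci whose mean per-base read depth is below min_depth.

    Loci with no depth values at all are reported as low coverage.
    """
    return sorted(locus for locus, depths in depth_by_locus.items()
                  if not depths or mean(depths) < min_depth)


# Hypothetical per-base depths over three MIRNA loci (not data from this study).
depths = {
    "MIRNA_A": [50, 52, 55, 57],
    "MIRNA_B": [3, 4, 2, 5],
    "MIRNA_C": [9, 8, 10, 11],
}
print(low_coverage_loci(depths))  # ['MIRNA_B']
```

Replacing `mean` with `min` would give a stricter filter that flags any locus containing even one poorly covered base.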

Gain and loss of MIRNAs as a driving force for rice domestication

In rice, exhaustive miRNA investigations have been carried out in many tissues and developmental stages over the past 10 yr (…). In this study, we identified 754 MIRNAs in O. rufipogon: 387 via small RNA population sequencing (i.e. the de novo approach) and 495 via a comparative genomic approach, with 128 identified by both approaches. Many O. rufipogon MIRNAs were not identified by the de novo approach even though their small RNAs were detected, most likely because of the high stringency of our MIRNA prediction program and/or low expression of these MIRNAs in the wild rice tissues we analyzed. Of the 387 O. rufipogon MIRNAs identified using the de novo approach, 259 were not found in cultivated rice, suggesting that a number of MIRNAs have been lost during rice domestication. Conversely, 48 MIRNAs were present only in O. sativa and not in O. rufipogon. In about half of these MIRNAs, mutations disabling the stem-loop structure were observed in O. rufipogon (Fig. 5); for the remainder, no corresponding genomic sequences were found in O. rufipogon. This suggests that these MIRNAs most likely evolved in cultivated rice through accumulation of mutations and/or genome rearrangement.
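The MIRNA counts reported above are linked by inclusion-exclusion: 387 + 495 − 128 = 754. The short Python check below reproduces this arithmetic and then repeats it with sets. The locus identifiers are placeholders (hypothetical), constructed only so that the set sizes match the reported counts. In practice the sets would hold the O. rufipogon MIRNA loci returned by each approach.

```python
def union_size(n_a: int, n_b: int, n_both: int) -> int:
    """Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|."""
    return n_a + n_b - n_both


# Counts reported in the text.
print(union_size(387, 495, 128))  # 754

# The same bookkeeping with sets of placeholder locus IDs (hypothetical),
# built only so that the set sizes match the reported counts.
de_novo_ids = {f"locus{i}" for i in range(0, 387)}
comparative_ids = {f"locus{i}" for i in range(259, 754)}
print(len(de_novo_ids & comparative_ids))  # 128 found by both approaches
print(len(de_novo_ids - comparative_ids))  # 259 found only by the de novo approach
print(len(de_novo_ids | comparative_ids))  # 754 in total
```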

MIRNAs function by regulating the expression levels of their target genes. Some targets of the 259 miRNAs identified only in O. rufipogon are transcription factors, such as members of the WRKY and MYB families, which could be involved in important developmental processes; however, the vast majority of their targets seem to be related to specific biological pathways (Table S4). Loss of these miRNAs means loss of regulation of their targets in cultivated rice. It is not clear which developmental events and biological processes have been affected by the loss of these miRNA::target interactions in cultivated rice, or how these changes shaped cultivated rice. Nevertheless, the findings demonstrate that the evolution of MIRNAs follows a trend similar to that of protein-coding genes, many of which have been lost in cultivated populations during domestication (Lam et al., 2010; Xu et al., 2012). On the other hand, new miRNA::target interactions might have been established for the newly evolved MIRNAs that became fixed in cultivated rice. If such an interaction played an important role during rice domestication, we would expect it to be conserved across diverse cultivated rice accessions. On checking the MIRNA sequences in the 40 domesticated rice accessions sequenced recently (Xu et al., 2012), we found no mutations in 15 of the 48 newly evolved MIRNAs. This suggests that these MIRNAs might have experienced strong artificial selection and were important for rice domestication, as shown previously for other MIRNAs (Wang et al., 2007, 2010; Jiao et al., 2010).
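The conservation check described above can be sketched as a simple comparison against a reference. This is a minimal illustration, assuming that each accession's MIRNA sequences have already been extracted and aligned to the same coordinates as the reference. Under that assumption, exact string equality means "no mutation". A MIRNA missing from an accession is counted conservatively as not invariant. The function name, MIRNA names and sequences are hypothetical.

```python
def invariant_mirnas(reference: dict[str, str],
                     accessions: list[dict[str, str]]) -> list[str]:
    """Names of MIRNAs whose sequence matches the reference in every accession.

    A MIRNA absent from an accession (e.g. no aligned sequence) is treated
    conservatively as not invariant.
    """
    return sorted(
        name for name, ref_seq in reference.items()
        if all(acc.get(name) == ref_seq for acc in accessions)
    )


# Hypothetical, pre-aligned precursor fragments (not data from this study).
reference = {"MIR_a": "ACGTACGTAC", "MIR_b": "GGCCTTAAGG", "MIR_c": "TTTTCCCCAA"}
accessions = [
    {"MIR_a": "ACGTACGTAC", "MIR_b": "GGCCTTAAGG", "MIR_c": "TTTTCCCCAA"},
    {"MIR_a": "ACGTACGTAC", "MIR_b": "GGCCTAAAGG", "MIR_c": "TTTTCCCCAA"},  # SNP in MIR_b
    {"MIR_a": "ACGTACGTAC", "MIR_b": "GGCCTTAAGG"},                        # MIR_c missing
]
print(invariant_mirnas(reference, accessions))  # ['MIR_a']
```

Treating missing data as variation can only underestimate the number of invariant MIRNAs. The opposite choice, which ignores accessions lacking a sequence, would overestimate it.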

A similar phenomenon of birth and death of MIRNAs has been observed in Arabidopsis (Fahlgren et al., 2007, 2010; Ehrenreich & Purugganan, 2008; Zhang et al., 2011). Frequent birth and death of MIRNA genes and dynamic miRNA::target interactions were observed when miRNAs in A. thaliana were compared with those in its close relative A. lyrata. These results suggest that both natural selection and domestication selection could play a significant role in shaping small RNA populations in plants.

Expression of miRNAs was also shaped during domestication

We found that the sequences of about half of the 543 O. sativa MIRNAs remained unchanged during rice domestication. For the remaining MIRNAs common to O. sativa and O. rufipogon, most mutations (SNPs or small indels) between the two species were heterogeneous (Table S6), implying that the miRNA::target interactions for these MIRNAs were stable in cultivated rice and its ancestor. However, both the miRNA microarray and high-throughput sequencing showed significant expression changes for some miRNA families between cultivated rice and its ancestor. Such changes could have two causes. The first is a difference in the transcriptional strength of MIRNAs resulting from mutations in the promoter region that affect the recruitment efficiency of the transcription machinery. The second is mutations in the pri- or pre-miRNAs that affect miRNA biogenesis. These mutations could be targets of artificial selection and become fixed in cultivated populations. No case of the first possibility has been reported for MIRNAs, but it has been documented for protein-coding genes in rice. For example, a SNP in the 5′ regulatory region of the seed-shattering gene qSH1 caused loss of seed shattering owing to the absence of abscission layer formation in cultivated rice (Konishi et al., 2006). For the second possibility, it has been demonstrated that mutations in pre-miRNAs, particularly in the loop-distal region of the hairpin, can affect the secondary structure and processing of pre-miRNAs. Consequently, they affect miRNA biogenesis, including the accuracy and rate of DCL1 cleavage (Mateos et al., 2010; Song et al., 2010; Werner et al., 2010). In addition, our population genetic investigation of the miR164 family suggested significant positive selection on at least three of its members in the domesticated rice population (Wang et al., 2010 and this study). It has been shown that miR164 targets NAC domain transcription factor genes, including Os12g41680, Os06g23650 and Os04g38720 (Wu et al., 2009; Li et al., 2010a,b), and overexpression of miR164b caused semi-dwarfism and reduced fertility in rice (Zhu et al., 2012). Therefore, miR164 might be important for establishing proper plant architecture in cultivated rice.
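The second possibility discussed above, in which a point mutation in a pre-miRNA disrupts its hairpin, can be illustrated with a fixed secondary structure written in dot-bracket notation. The same idea applies to the O. rufipogon mutations that disable the stem-loop of some MIRNAs. This is a minimal sketch, assuming that the structure is already known (for example, predicted beforehand by an RNA folding program) and stays fixed, and that sequences use the RNA alphabet. Because it does not re-fold the mutant, it only asks whether a single base pair is lost and cannot capture global structural rearrangements or effects on processing. The hairpin and function names are hypothetical.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure: str) -> dict[int, int]:
    """Map each paired position to its partner, from balanced dot-bracket notation."""
    stack: list[int] = []
    partners: dict[int, int] = {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()  # raises IndexError if the brackets are unbalanced
            partners[i], partners[j] = j, i
    return partners


def breaks_pair(seq: str, structure: str, pos: int, alt: str) -> bool:
    """True if substituting `alt` at 0-based `pos` removes a base pair in `structure`."""
    partner = pair_table(structure).get(pos)
    if partner is None:
        return False  # unpaired (e.g. loop) position: no pair to lose
    return (alt, seq[partner]) not in PAIRS


# A toy hairpin (hypothetical): a 3-bp stem closed by an AAA loop.
hairpin_seq = "GCGAAACGC"
hairpin_dot = "(((...)))"
print(breaks_pair(hairpin_seq, hairpin_dot, 0, "A"))  # True: A cannot pair with C
print(breaks_pair(hairpin_seq, hairpin_dot, 1, "U"))  # False: U-G wobble keeps the pair
print(breaks_pair(hairpin_seq, hairpin_dot, 4, "G"))  # False: loop position
```

Counting G-U wobble pairs as compatible reflects their common occurrence in plant pre-miRNA stems. A stricter check could drop them from `PAIRS`.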


This work was supported by the National Natural Science Foundation of China (31071393) and the PhD Programs Foundation of the Ministry of Education of China (20100101110096).