Comprehensive analyses of microRNA gene evolution in paleopolyploid soybean genome


  • Zhengkui Zhou,

    1. State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
    • These authors contributed equally to this work.
  • Zheng Wang,

    1. State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
    • These authors contributed equally to this work.
  • Weiyu Li,

    1. State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
    • These authors contributed equally to this work.
  • Chao Fang,

    1. State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
    2. University of Chinese Academy of Sciences, Beijing, China
  • Yanting Shen,

    1. State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
    2. University of Chinese Academy of Sciences, Beijing, China
  • Congcong Li,

    1. State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
    2. University of Chinese Academy of Sciences, Beijing, China
  • Yunshuai Wu,

    1. State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
  • Zhixi Tian

    Corresponding author
    • State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China

For correspondence (e-mail


miRNA genes are thought to undergo rapid birth and death processes in genomes, and the emergence of a MIRNA-like hairpin provides the basis for the formation of a functional miRNA gene. However, the factors affecting the formation of an active miRNA gene from a MIRNA-like hairpin within a genome remain unclear. We performed a genome-wide investigation of MIRNA-like hairpin accumulation, expression, structural changes and relationships with annotated genomic features in the paleopolyploid soybean genome. Our results showed that miRNA gene evolution was greatly affected by adjacent gene and transposable element content, by the rate of genetic recombination at the location of emergence, and by the structural divergence of the miRNA gene itself. Further investigation suggested that miRNA genes from different duplication sources followed distinct evolutionary trajectories and that the accumulation of MIRNA-like hairpins might be an important factor in causing long terminal repeat retrotransposons to lose activity during genome evolution.


MicroRNAs (miRNAs) play crucial regulatory roles in diverse biological processes in plants and animals (Jones-Rhoades et al., 2006; Ambros and Chen, 2007; Zhang et al., 2007) through the modulation of gene expression at both transcriptional and post-transcriptional levels (Carrington and Ambros, 2003; Ambros, 2004; Bartel, 2004; Chen, 2005). Studies have suggested that different miRNAs have originated from different sources. One popular model has postulated that miRNA genes arise from their target genes by inverted gene duplication (Allen et al., 2004; Axtell and Bowman, 2008; Tang, 2010). The duplicated members become miRNA genes through inversion and variation, which allow the gene transcripts to be adapted as substrates for Dicer-like (DCL) enzymes (Allen et al., 2004). Another model has suggested that miRNA genes are ‘birthed’ randomly from the numerous inverted repeats in the genome, an idea that is supported by the fact that precursor arms of many ‘young’ miRNAs do not show sequence similarity to any other sequences in the genome (Lu et al., 2005; Rajagopalan et al., 2006; Kasschau et al., 2007; Felippes et al., 2008). Comparisons among species have indicated that miRNA genes are subjected to frequent birth and death processes in the genomes (Fahlgren et al., 2007, 2010; Ma et al., 2010; Abrouk et al., 2012; Meunier et al., 2013). It has been estimated in Drosophila that approximately 12 new miRNA genes have arisen every million years, but many of these quickly died out after formation (Lu et al., 2008). Conserved miRNA genes have maintained a relatively slow birth and death rate, such that in grass species the birth rate has been estimated as 0.87–1.27 per million years (Abrouk et al., 2012).

Many evolutionary factors can affect miRNA gene evolution. One evolutionary force is duplication, including whole-genome duplication (WGD) and tandem duplication (TD) (Fahlgren et al., 2007). After duplication, miRNA genes undergo dispersal and diversification processes similar to those found in coding genes (Jiang et al., 2006; Li and Mao, 2007). Examination of some miRNA gene families has demonstrated that duplicated miRNA genes have different spatial and temporal expression patterns, a finding that suggests that duplicated copies acquire new functionality as they evolve (Maher et al., 2006). Nevertheless, it is likely that WGD does not significantly affect the total number of conserved miRNA genes; rather, it promotes their fast death (Abrouk et al., 2012). Another evolutionary force that affects miRNA gene evolution is the functional divergence of their target genes. As crucial regulators, miRNAs coevolve with their target genes to maintain their regulatory functions (Felippes et al., 2008).

Until now, most efforts to identify miRNAs have focused specifically on highly expressed miRNAs and have excluded transposable elements (Lisch, 2009); however, evidence has suggested that new miRNA genes could be produced as a consequence of transposon activity (Piriyapongsa and Jordan, 2007, 2008). In addition, newly birthed miRNAs tend to be expressed at lower levels than conserved miRNAs (Fahlgren et al., 2007) and, furthermore, some miRNAs may have lost their activity over the course of evolution (Lu et al., 2008). These findings suggested that genomic structures might play a role in the formation of mature miRNA genes. It has been proposed that miRNA genes always pass through a crucial stage, named the MIRNA-like hairpin, during birth and death processes (Nozawa et al., 2010; Berezikov, 2011). Structural variation in MIRNA-like hairpins may affect expression, and routine analytical approaches may miss miRNA genes with low expression and, thus, the corresponding MIRNA-like hairpins. Consequently, large-scale investigations of MIRNA-like hairpins have seldom been reported, and the factors that affect the formation of a miRNA gene from a MIRNA-like hairpin remain unclear. A comprehensive investigation of MIRNA-like hairpins at a genome-wide level will strengthen our understanding of miRNA gene birth and death.

In an attempt to answer the above questions, we conducted a genome-wide identification of MIRNA-like hairpins and comprehensive analyses of their accumulation, expression, structural changes and association with various genomic features within the soybean (Glycine max) genome. The soybean genome has undergone two rounds of WGD and many TDs within the last 60 million years (Schlueter et al., 2007; Gill et al., 2009; Schmutz et al., 2010). Nearly 75% of the genes have been found to be present in multiple copies, but duplicated gene pairs have been highly diversified, such that ~25% of the soybean gene duplicates have been silenced and lost, and many new genes created (Shoemaker et al., 2006; Schmutz et al., 2010). The paleopolyploidy and rapid divergence of the soybean genome make it an ideal genome for evolutionary analyses (Chan et al., 2012). Our results indicated that large numbers of duplicated MIRNA-like hairpins exist in the soybean genome, and that their distribution and activities have been highly affected by surrounding genomic features and their own structural variation. Moreover, they may play an essential role in the suppression of transposable element (TE) amplification.


Overview of soybean small RNAs sequencing

To survey the accumulation, expression and divergence of miRNA genes and MIRNA-like hairpins at a genome-wide level in soybean, 23 small RNA libraries were constructed from root, shoot, leaf, flower, and seed at different developmental stages, including germination, trefoil, flowering, seed development and plant senescence (Figure S1). In total, about 124.6 million (M) reads (~4.36 Gb) were obtained, with an average of 5.42 M reads (~189.6 Mb) per sample. After trimming adaptor sequences and filtering out low-quality reads, 117.1 M high-quality reads, which corresponded to 94.0% of the total reads, were retained (Figure 1). The average number of high-quality reads per sample was 5.1 M (Table S1). The length of the reads ranged from 18 to 30 nucleotides (nt), with the most abundant sequences at 21, 22 and 24 nt (Figure S2). About 81.6% of the high-quality sequences perfectly matched the soybean genome; 16.6 M reads, corresponding to 14.2% of the high-quality reads, matched structural non-coding RNAs, such as ribosomal RNA (rRNA), transfer RNA (tRNA), small nuclear RNA (snRNA) and small nucleolar RNA (snoRNA) (Figure 1 and Table S1).
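The per-library averages and percentages quoted above follow directly from the reported totals. The short Python check below recomputes them; the variable names are illustrative, and only the four input numbers come from the text.

```python
# Read counts in millions (M), as reported in the text.
TOTAL_READS_M = 124.6        # all raw reads
HIGH_QUALITY_M = 117.1       # reads kept after adaptor trimming and filtering
STRUCTURAL_NCRNA_M = 16.6    # reads matching rRNA, tRNA, snRNA, snoRNA
N_LIBRARIES = 23


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


reads_per_library = TOTAL_READS_M / N_LIBRARIES
hq_per_library = HIGH_QUALITY_M / N_LIBRARIES
hq_percent = percent(HIGH_QUALITY_M, TOTAL_READS_M)
ncrna_percent = percent(STRUCTURAL_NCRNA_M, HIGH_QUALITY_M)

print(f"raw reads per library:          {reads_per_library:.2f} M")
print(f"high-quality reads per library: {hq_per_library:.1f} M")
print(f"high-quality fraction:          {hq_percent:.1f}%")
print(f"structural ncRNA fraction:      {ncrna_percent:.1f}%")
```

Note that the 14.2% figure is relative to the high-quality reads, not to the raw total.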

Figure 1.

Overview of the small RNA sequences. Each number in the pie represents the percentage of reads in each category relative to the total small RNA sequences. See text for details about the calculation.

After removal of the sequences that did not map to the genome or that matched non-coding structural RNAs, the remaining reads were used for the subsequent analyses.

Identification of small RNAs and MIRNA-like hairpins in the soybean genome

In total, 555 Glycine max miRNAs and 24 586 miRNAs from other species have been annotated in the miRBase database (Release 19: August 2012) (Griffiths-Jones, 2004; Griffiths-Jones et al., 2006, 2008; Kozomara and Griffiths-Jones, 2011). By homologue search and bioinformatic analysis, 1780 annotated and 1198 previously unknown miRNAs were identified from the 23 small RNA libraries. In total, 9.1 M reads matched these miRNAs, corresponding to 7.3% of the total small RNA sequence reads (Figure 1). The miRNA read length distribution ranged from 18 to 30 nt, with the majority at 21 nt (Figure S3c). Clustering of miRNA expression demonstrated that samples of the same tissue from different developmental stages, such as leaves, flowers, pods, and seeds, grouped more closely (Figure S3e).

To identify all potential miRNAs, including those that are functional, potentially non-functional, and highly duplicated, a combined database that included all miRNA genes in the miRBase (Griffiths-Jones, 2004; Griffiths-Jones et al., 2006, 2008; Kozomara and Griffiths-Jones, 2011) and novel miRNA genes identified from our analysis was created for miRNA inquiries. Through a homologue search (see 'Experimental Procedures'), 28 281 MIRNA-like hairpins belonging to 2978 families were identified (Table S2).

No significant differences were observed when characteristics, such as family membership, length distribution of the sequence reads and miRNA expression clustering, were compared between datasets from miRNAs and all MIRNA-like hairpins (Figure S3). In an attempt to ascertain the evolutionary processes acting on miRNA at a genome-wide level, the dataset of MIRNA-like hairpins with 28 281 members was used for the following analyses.

Genomic landscape of MIRNA-like hairpin distribution and expression

To obtain an overview of the landscape of MIRNA-like hairpins within the genome, their distributions were pooled in 1-Mb contiguous subregions along each of the 20 chromosomes, as illustrated in Figure 2. This process demonstrated that MIRNA-like hairpins were distributed unequally along each chromosome. Overall, more MIRNA-like hairpins were found in the pericentromeric regions (average density of 45.0/Mb) than in the chromosome arms (average density of 9.1/Mb) (Figure S4a), but with dramatic variations within specific subregions (Figure 2).
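The pooling into contiguous 1-Mb subregions described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the use of hairpin midpoints as positions, and the toy coordinates are all assumptions.

```python
from collections import Counter

WINDOW_BP = 1_000_000  # 1-Mb contiguous windows, as in the text


def window_density(midpoints, chrom_length, window=WINDOW_BP):
    """Count hairpins per contiguous window along one chromosome.

    midpoints    -- 0-based hairpin midpoint coordinates in bp (assumed input)
    chrom_length -- chromosome length in bp
    Returns one count per window; the last window may be shorter than `window`.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = Counter(pos // window for pos in midpoints)
    return [counts.get(i, 0) for i in range(n_windows)]


# Toy coordinates (hypothetical, not taken from the study).
toy_midpoints = [120_000, 950_000, 1_000_000, 2_400_000, 2_450_000, 2_999_999]
densities = window_density(toy_midpoints, chrom_length=3_500_000)
print(densities)  # hairpins per window, in chromosome order
```

Because the last window of a chromosome is usually shorter than 1 Mb, a real analysis would probably normalise its count by the actual window length before comparing densities.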

Figure 2.

Distribution and expression of MIRNA-like hairpins, transcriptome expression, genomic features and estimated local genetic recombination (GR) rates along the 20 soybean chromosomes. For each chromosome, the tracks from top to bottom show the distribution of MIRNA-like hairpins pooled in 1-Mb regions (histogram), the expression of miRNA-5p, the expression of miRNA-3p, transcriptome expression, the genomic features, and the estimated local GR rates (curves). The grey transparent shading indicates the GR-suppressed pericentromeric regions.

Usually, a functional mature miRNA has a higher expression level than its corresponding miRNA* (Jones-Rhoades et al., 2006; Chen, 2009; Voinnet, 2009), but the reverse phenomenon also exists (Zhao et al., 2012). In some cases it is difficult to distinguish the mature miRNA from its miRNA* strand, as both are expressed. In our analysis, we therefore evaluated the expression of miRNA-5p and miRNA-3p independently. Average expression levels in 1-Mb contiguous windows along each chromosome were calculated (Figure 2) and, in contrast with MIRNA-like hairpin accumulation, the expression of miRNA in the pericentromeric regions was lower than in the chromosome arms (Figure S4b, c).

Variations of genomic features along the genome can reflect distinct biological properties. For example, changes in rates of genetic recombination (GR) can affect the accumulation of TEs (Wright et al., 2003; Tian et al., 2009); specifically, the pericentromeric regions tend to show suppressed GR, fewer genes and more TE accumulation than the chromosome arms (Zhang and Gaut, 2003). Thus, we hypothesized that these genetic characteristics might also affect variation in MIRNA-like hairpin distribution and expression.

Correlations of MIRNA-like hairpin distribution with genomic features

We next wanted to determine what aspects of the genomic landscape might affect MIRNA-like hairpin accumulation.

First, correlations between the distribution of MIRNA-like hairpins and genomic features such as TE content, gene content, GR rate and GC content were computed in the contiguous 1-Mb windows across the genome. The results indicated that MIRNA-like hairpin accumulation was highly associated with genomic features, in that MIRNA-like hairpin density was significantly negatively correlated with gene content (Table 1 and Figure 3a) and GR rate (Table 1 and Figure 3d), while significantly positively correlated with TE content (Table 1 and Figure 3b) and GC content (Table 1 and Figure 3c). To eliminate the potential effect of window size, another dataset of contiguous 500-kb windows was used to compute correlations (Table 1). These results were consistent with those of the 1-Mb windows. Genetic recombination in the pericentromeric regions is highly suppressed and affects the distribution of TEs and genes (Ma and Bennetzen, 2006; Tian et al., 2009). In order to eliminate this ‘centromere effect’, another analysis was performed using datasets in which the pericentromeric regions were excluded. Similar correlations were detected, except that the correlation coefficients decreased slightly (Table S3). Thus, analyses from different datasets indicated that the accumulation of MIRNA-like hairpins was highly related to local genome structure.

Table 1. Correlations of MIRNA-like hairpin density with genomic features

Window   Feature                                       r (a)     P (b)
1 Mb     Proportion of genes (DNA%)                    −0.834    <10^-4
1 Mb     Proportion of transposable elements (DNA%)     0.895    <10^-4
1 Mb     GC content (%)                                 0.905    <10^-4
1 Mb     Genetic recombination rates (cM/Mb)           −0.555    <10^-4
500 kb   Proportion of genes (DNA%)                    −0.756    <10^-4
500 kb   Proportion of transposable elements (DNA%)     0.825    <10^-4
500 kb   GC content (%)                                 0.844    <10^-4
500 kb   Genetic recombination rates (cM/Mb)           −0.442    <10^-4

(a) Pearson correlation coefficient.
(b) All P-values calculated by 10 000 bootstrap resamplings.

Figure 3.

Correlations of MIRNA-like hairpin density with genomic features. (a) MIRNA-like hairpin density plotted against gene content. (b) MIRNA-like hairpin density plotted against TE content. (c) MIRNA-like hairpin density plotted against GC content. (d) MIRNA-like hairpin density plotted against genetic recombination (GR) rates.
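Table 1 reports Pearson coefficients with P-values from 10 000 bootstrap resamplings, but the exact resampling scheme is not described in this section. The sketch below shows one common way such a test could be set up, under stated assumptions: windows are resampled with replacement as (x, y) pairs, and the two-sided P-value is twice the fraction of bootstrap coefficients whose sign differs from the observed one. The function names and toy data are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def bootstrap_p(xs, ys, n_boot=10_000, seed=0):
    """Observed r and a two-sided bootstrap P-value (assumed scheme, see lead-in)."""
    rng = random.Random(seed)
    n = len(xs)
    r_obs = pearson_r(xs, ys)
    sign_flips = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample windows with replacement
        r_boot = pearson_r([xs[i] for i in idx], [ys[i] for i in idx])
        if r_boot * r_obs <= 0:  # resampled r is zero or has the opposite sign
            sign_flips += 1
    return r_obs, min(1.0, 2 * sign_flips / n_boot)


# Toy windows (hypothetical): TE content versus hairpin density.
te_content = list(range(20))
hairpin_density = [3 * x + (x % 3) for x in te_content]
r, p = bootstrap_p(te_content, hairpin_density, n_boot=1_000, seed=1)
print(f"r = {r:.3f}, bootstrap P = {p}")
```

Under this scheme, a P-value of 0 from B resamplings would be reported as P < 1/B, which is consistent with the "<10^-4" entries obtained from 10 000 resamplings.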

We performed a genome-wide search for MIRNA-like hairpins in the rice genome to determine whether the relationship of MIRNA-like hairpins with genomic features could also be found in other species. In total, 7996 MIRNA-like hairpins were identified (Table S4). The density was about 20/Mb, slightly lower than that in soybean (25/Mb). As in soybean, MIRNA-like hairpin accumulation was significantly negatively correlated with gene content and GR rate, and significantly positively correlated with TE content and GC content (Table S5), suggesting that this finding is not unique to soybean and may be true for other organisms.

Correlation of MIRNA-like hairpin expression with genomic features

In contrast with the relationship of MIRNA-like hairpin accumulation with the various genomic features, miRNA expression was significantly positively correlated with gene content and GR rates, and significantly negatively correlated with TE content and GC content (Table 2). To determine the correlation between miRNA expression and the surrounding genes, the same 23 samples were deep sequenced using RNA-Seq; RNA expression in each window was calculated (Figure 2). We found that miRNA-5p and miRNA-3p expression were significantly positively correlated with local gene expression (Table 2).

Table 2. Correlations of miRNA expression with genomic features

                                              miRNA-5p expression    miRNA-3p expression
Feature                                       r (a)     P (b)        r (a)     P (b)
Proportion of genes (DNA%)                     0.338    <10^-4        0.384    <10^-4
Proportion of transposable elements (DNA%)    −0.361    <10^-4       −0.421    <10^-4
GC content (%)                                −0.314    <10^-4       −0.382    <10^-4
Genetic recombination rates                    0.262    <10^-4        0.286    <10^-4
RNA expression level                           0.311    <10^-4        0.357    <10^-4
MIRNA loop length                              0.297    <10^-4        0.367    <10^-4
MIRNA diversity                               −0.171    <10^-4       −0.204    <10^-4

(a) Pearson correlation coefficient.
(b) All P-values calculated by 10 000 bootstrap resamplings.

The miRNA transcribed region comprises the miRNA-5p, the miRNA-3p, the loop and other sequences. The loop region is a variable sub-region that helps to maintain miRNA gene activity (Tang, 2010). The miRNA-5p and miRNA-3p sequences are not completely complementary to each other and also undergo dynamic divergence during miRNA gene evolution (Lu et al., 2008; Voinnet, 2009). To determine whether these characteristics of the miRNA gene affect miRNA expression, we evaluated the divergence between the miRNA-5p and the miRNA-3p, and the loop length, for each MIRNA-like hairpin. Interestingly, and similar to MIRNA-like hairpin accumulation and expression, the divergence and loop length in the pericentromeric regions and chromosome arms showed different patterns. The average divergence between the miRNA-5p and the miRNA-3p in the pericentromeric regions was higher than in the chromosome arms (Figure S4d), and the average loop length in the pericentromeric regions was shorter than that in the chromosome arms (Figure S4e). Correlation analyses of miRNA-5p and miRNA-3p expression with miRNA divergence and loop length revealed that miRNA expression was significantly negatively correlated with miRNA divergence and significantly positively correlated with miRNA loop length (Table 2). Additional analyses using the 500-kb window dataset demonstrated similar relationships (Table S6).
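How divergence and loop length were measured is not spelled out in this section. As a rough illustration only, the sketch below assumes that divergence is the fraction of mismatched positions between the miRNA-5p and the reverse complement of the miRNA-3p in an ungapped comparison, and that loop length is the number of nucleotides between the two arms. It ignores G:U wobble pairs and 3' overhangs; all names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def arm_divergence(mir_5p, mir_3p):
    """Fraction of mismatched positions between 5p and revcomp(3p).

    Ungapped comparison over the shorter of the two arms (an assumption).
    """
    partner = reverse_complement(mir_3p)
    n = min(len(mir_5p), len(partner))
    mismatches = sum(a != b for a, b in zip(mir_5p, partner))
    return mismatches / n


def loop_length(end_5p, start_3p):
    """Nucleotides strictly between the arms, given 1-based inclusive coordinates."""
    return start_3p - end_5p - 1


# Toy hairpin (hypothetical): a 20-nt 5p arm and a 3p arm with one substitution.
toy_5p = "UGACAGAAGAGAGUGAGCAC"
perfect_3p = reverse_complement(toy_5p)
toy_3p = "A" + perfect_3p[1:]  # change the first base of the 3p arm

print(arm_divergence(toy_5p, perfect_3p))
print(arm_divergence(toy_5p, toy_3p))
print(loop_length(end_5p=40, start_3p=101))
```

A production analysis would more likely take the pairing from a predicted secondary structure than from a fixed ungapped alignment.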

The above analyses were based on the dataset from contiguous windows along the chromosome. To further verify these relationships, another dataset from individual MIRNA-like hairpins was adopted. The GR rate for each MIRNA-like hairpin was estimated based on its midpoint; GC content, gene content, TE content, and RNA expression within the surrounding 1-Mb region (500 kb for each side from the midpoint) and 500-kb region (250 kb for each side from the midpoint) were computed. Correlation analyses suggested that miRNA expression was affected by characteristics of the MIRNA-like hairpin and its surrounding genomic features in that expression was positively correlated with MIRNA loop length, GR rate, gene content and surrounding RNA expression, but negatively correlated with miRNA diversity, surrounding TE and GC content (Table S7).
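The per-hairpin windows described above (500 kb or 250 kb on each side of the midpoint) can be illustrated with GC content as the example feature. This is a minimal sketch with hypothetical function names; clipping the window at chromosome ends is an assumption, since the text does not say how edge cases were handled.

```python
def flank_bounds(midpoint, flank, chrom_length):
    """Half-open [start, end) window of `flank` bp on each side of the midpoint,
    clipped to the chromosome (clipping is an assumption)."""
    start = max(0, midpoint - flank)
    end = min(chrom_length, midpoint + flank)
    return start, end


def gc_percent(seq):
    """GC content (%) over called bases; N and other symbols are ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return float("nan")
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def gc_around(chrom_seq, midpoint, flank=500_000):
    """GC content of the window centred on a hairpin midpoint."""
    start, end = flank_bounds(midpoint, flank, len(chrom_seq))
    return gc_percent(chrom_seq[start:end])


# Toy chromosome (hypothetical): an AT-rich half followed by a GC-rich half.
toy_chrom = "AT" * 10 + "GC" * 10
print(gc_around(toy_chrom, midpoint=20, flank=5))
print(flank_bounds(midpoint=2, flank=5, chrom_length=len(toy_chrom)))
```

Gene content, TE content and surrounding RNA expression could be summarised over the same bounds by intersecting annotation intervals with the window.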

Differentiation of duplicated MIRNA-like hairpins derived from different duplication events

Among the 28 281 MIRNA-like hairpins we identified, only 1647 were single copy; the majority had multiple copies (Table S2), a finding that indicates the existence of abundant duplicated MIRNA-like hairpins in the soybean genome. In addition, many MIRNA-like hairpin elements belonging to the same family shared almost identical flanking sequences.

During the last 60 million years the soybean genome has undergone two rounds of WGD and many TD events (Schlueter et al., 2007; Gill et al., 2009; Schmutz et al., 2010), which may have led to the generation of duplicate MIRNA-like hairpins. By integrating the duplicated regions of the soybean genome with the MIRNA-like hairpin loci, 594 and 454 MIRNA-like hairpins mediated by WGD and TD, respectively, were identified; together these accounted for 3.7% of the total MIRNA-like hairpins. In addition to the WGD- and TD-mediated MIRNA-like hairpin duplications, the other 25 586 elements frequently have nearly identical copies in the genome. It would be interesting to decipher how such large numbers of duplicated MIRNA-like hairpins were generated. Because MIRNA-like hairpin accumulation was highly positively correlated with TEs (Table 1 and Figure 3b), we proposed that TEs, along with their amplification in the genome, might be important for MIRNA-like hairpin duplication. Further analysis indicated that about 86% of the MIRNA-like hairpins co-localized with TEs (78% with long terminal repeat retrotransposons (LTR-RTs) and 8% with DNA TEs) (Figure S5). Thus, our data indicate that large numbers of duplicated MIRNA-like hairpins exist in the soybean genome and were mediated by WGD, TD and TE amplification.
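The counts in the two paragraphs above can be cross-checked with simple arithmetic. The snippet below is a sketch of that bookkeeping; the reading that the "other 25 586 elements" exclude both the single-copy hairpins and the WGD/TD-mediated ones is an inference that happens to reproduce the reported number, not something the text states explicitly.

```python
# Counts reported in the text.
TOTAL_HAIRPINS = 28_281
SINGLE_COPY = 1_647
WGD_MEDIATED = 594
TD_MEDIATED = 454

wgd_td = WGD_MEDIATED + TD_MEDIATED
wgd_td_percent = 100.0 * wgd_td / TOTAL_HAIRPINS

# Assumed derivation of the "other" elements (see lead-in).
other_duplicated = TOTAL_HAIRPINS - SINGLE_COPY - wgd_td

# WGD-mediated hairpins come in pairs.
wgd_pairs = WGD_MEDIATED // 2

print(f"WGD + TD hairpins: {wgd_td} ({wgd_td_percent:.1f}% of total)")
print(f"other duplicated hairpins: {other_duplicated}")
print(f"WGD pairs: {wgd_pairs}")
```

The 297 WGD pairs obtained here match the number of pairs plotted in the WGD analysis later in the text.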

The more recent soybean WGD occurred 13 million years ago (Schmutz et al., 2010), but TE amplification and TD occur continually. The different timeframes of these duplication mechanisms may result in different miRNA evolutionary processes. Hence, characteristics of the different MIRNA-like hairpin duplication groups were compared. The results demonstrated that miRNA expression of both WGD- and TD-mediated MIRNA-like hairpins was significantly higher than that of TE-mediated MIRNA-like hairpins (Figure 4a,b and Table S8). In addition, their loop lengths were longer and their GR rates were higher than those of TE-mediated MIRNA-like hairpins (Figure 4d,e and Table S8). These patterns were consistent with our previous correlations (Table 2). Inconsistent results were found when miRNA diversity was investigated in the different duplication groups. For example, according to our correlation analyses, the miRNA diversity of TD-mediated MIRNA-like hairpins should be lower than that of TE-mediated MIRNA-like hairpins, as they had relatively high levels of expression; however, the opposite was found. In addition, the expression of TD-mediated MIRNA-like hairpins was even higher than that of WGD-mediated MIRNA-like hairpins, whose diversity was much lower (Figure 4 and Table S8). When the MIRNAs in TD-mediated MIRNA-like hairpins were examined in detail, a block containing 180 almost identical, highly expressed copies of Gma-MIRL-1099 and Gma-MIRL-421 was found on chromosome 13 from 14.78 Mb to 15.54 Mb. Gma-MIRL-1099 and Gma-MIRL-421 were duplicated as a unit and distributed at near-regular intervals (Figure S6a, b). Dot-plot analysis indicated that this region was highly duplicated (Figure S6c). When the MIRNA-like hairpins in this region were excluded, the expression level of the TD-mediated duplication group became much lower (Figure 4a,b and Table S8) and the diversity also decreased (Figure 4c). Thus, with the exclusion of the highly duplicated block of MIRNA-like hairpins, the overall relationships between miRNA expression and other characters among the different duplication groups showed the same patterns as in the other analyses.

Figure 4.

Comparisons of MIRNA-like hairpin characters among different duplication groups. (a) miRNA-5p expression, (b) miRNA-3p expression, (c) MIRNA-like hairpin diversity, (d) MIRNA-like hairpin loop length, and (e) genetic recombination (GR) rates.

We performed additional correlation analyses between the expression of MIRNA-like hairpins and other features within each duplication group to determine the major differences among the groups. These results (Table S9) revealed that: (i) a common relationship for all groups was that miRNA-5p and miRNA-3p expression was positively correlated with loop length and negatively correlated with miRNA diversity; (ii) the correlation coefficients for the duplication groups were lower than those for the overall dataset, and the correlation coefficients for TE-mediated duplicated MIRNA-like hairpins were much lower than those of WGD- and TD-mediated duplicated MIRNA-like hairpins; and (iii) apart from MIRNA-like hairpin loop length and diversity, the other important factors affecting MIRNA-like hairpin expression differed between the WGD- and TD-mediated duplications. For WGD-mediated MIRNA-like hairpins, the surrounding GC content was an important factor correlated with their expression, while for TD-mediated MIRNA-like hairpins, the surrounding TE content was more significant.

The common relationships reinforced the concept that the structure of MIRNA-like hairpins greatly affects their expression, whereas the differences among duplication groups and the lower correlation coefficients indicate that origin and genomic location also influence their activity. Together, these observations imply that the expression of MIRNA-like hairpins is determined by a combination of their structure and the surrounding genomic features.

Conservation and specification of WGD-duplicated MIRNA-like hairpins

Plotting the 297 pairs of WGD-mediated MIRNA-like hairpins on the genome together with duplicated gene pairs showed that most of them co-localized with WGD-generated genes (Figure 5a). To determine whether WGD miRNA genes co-evolved with their surrounding duplicated genes, the divergence of each duplicated miRNA gene pair and of the surrounding 10 duplicated gene pairs (five paired genes from each side) was computed, and the correlation between the divergence of MIRNA-like hairpins and that of the surrounding 10 duplicated gene pairs was calculated. A slightly positive correlation was detected (r = 0.16, P = 0.015), which suggested that the surrounding duplicated genes can affect the evolution of the adjacent WGD-mediated MIRNA-like hairpins.

Figure 5.

Whole-genome duplication (WGD)-mediated MIRNA-like hairpin duplications. (a) Distribution of WGD-mediated duplicated MIRNA-like hairpins and WGD gene pairs. Pink lines indicate the WGD-mediated duplicated MIRNA-like hairpins, and the grey lines indicate WGD gene pairs. (b) The distribution of WGD pairs along pair diversity. (c) Correlation of paired MIRNA-like hairpin expression at different diversity levels.

Similar to the fate of duplicated genes, some of the duplicated MIRNAs may become pseudogenes, functionally redundant, neofunctionalized, or sub-functionalized (Roulin et al., 2013). In our analysis, about half of the paired MIRNA-like genes appear to maintain their original functions, as reflected by their nearly identical miRNA-5p (50%) or miRNA-3p (55%) (Figure 5b). These MIRNA-like genes may be functionally redundant in the soybean genome. The other half of the duplicated MIRNA-like genes had diverged and had different expression patterns or levels, including miRNA156, which is thought to play an important role in plant development through regulation of Squamosa-promoter-binding proteins. In Arabidopsis, different members of this family are undergoing different evolutionary processes: some of them have lost their activity and some pairs are functionally redundant (Maher et al., 2006). In soybean, the Gma-MIR156 family contains 30 members, of which 12 pairs are the result of WGD. Our analyses demonstrated that the expression patterns of different MIR156 family members, especially between different duplication pairs, were highly divergent (Figure S7). Even for the two members of a WGD pair, when their miRNAs diverged, their expression also showed a divergent pattern, for example Gma-MIRNA156h and Gma-MIRNA156k, and Gma-MIRNA156j and Gma-MIRNA156e (Figure S7), a finding that suggests that they might be undergoing neofunctionalization or subfunctionalization.

When the relationship between the expression patterns of duplicated genes and their divergence was investigated, we found that the pairs with lower divergence rates had relatively higher expression correlations than those that were more diverged (Figure 5b). There was one exception, in that those miRNA-3p with divergences of 0.2–0.6 were more correlated than those with divergences of 0.1–0.199. This may be due to a bias from the small number of paired MIRNA-like hairpins in these two groups.

To maintain regulatory function, miRNAs co-evolve with their target genes (Maher et al., 2006; Felippes et al., 2008). Here, the expression pattern divergence of the duplicated MIRNA pairs may be related to the evolution of their target genes, an aspect that requires more detailed functional research in the future. From another perspective, the divergence and the correlation of expression patterns between paired genes suggest that expression is probably affected by the divergence of MIRNA-like hairpins.
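The comparison described above, expression correlation of WGD pairs grouped by divergence, can be sketched as follows. The bin edges for the two upper groups follow the ranges quoted in the text; the lowest bin (below 0.1), the use of Pearson correlation across samples, and all names and toy values are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


# Divergence bins as (label, low, high), low inclusive and high exclusive.
# The first bin is an assumption; the other two follow the ranges in the text.
BINS = [("<0.1", 0.0, 0.1), ("0.1-0.199", 0.1, 0.2), ("0.2-0.6", 0.2, 0.6001)]


def mean_r_by_divergence(pairs):
    """pairs: (divergence, expression_copy_1, expression_copy_2) per WGD pair.

    Returns {bin label: mean expression correlation, or None if the bin is empty}.
    """
    grouped = {label: [] for label, _, _ in BINS}
    for divergence, expr_1, expr_2 in pairs:
        for label, low, high in BINS:
            if low <= divergence < high:
                grouped[label].append(pearson_r(expr_1, expr_2))
                break
    return {label: (sum(rs) / len(rs) if rs else None) for label, rs in grouped.items()}


# Toy pairs (hypothetical expression across four samples).
toy_pairs = [
    (0.00, [1, 2, 3, 4], [2, 4, 6, 8]),   # identical miRNAs, co-expressed
    (0.15, [1, 2, 3, 4], [4, 3, 2, 1]),   # diverged miRNAs, opposite patterns
]
summary = mean_r_by_divergence(toy_pairs)
print(summary)
```

Reporting the number of pairs per bin alongside each mean would make small-sample effects, such as the exception noted in the text, easier to spot.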

Accumulation of MIRNA-like hairpins in LTR-RTs

Transposable elements play an important role in plant adaptive evolution by generating variation in gene expression and function (Lisch, 2013). It has been shown that different TE families were active and amplified within distinct evolutionary timeframes. Some TE families may have lost activity, as shown by the relatively ‘older’ timeframes of their amplification peaks. In contrast, some families have a greater number of copies that have arisen recently, a finding that indicates that they may still be active (Du et al., 2010b). The process that results in a decrease in amplification activity after an ‘accumulation peak’ remains unclear. LTR-RTs are less abundant in animals, but are common in plants (Wicker et al., 2007). Our analyses demonstrated that many MIRNA-like hairpins co-localized with LTR-RTs. We compared the temporal pattern of accumulation of intact LTR-RT elements with the ratio of LTR-RTs containing MIRNA-like hairpins (intact LTR-RTs with MIRNA-like hairpins versus total intact LTR-RTs within the timeframe). Interestingly, we found a MIRNA-like hairpin peak (in the ratio of LTR-RTs with MIRNA-like hairpins) before the LTR-RT accumulation peak for many ‘older’ LTR-RT families. For example, the TEs Gmr3, Gmr1 and Gmr25 were thought to have lost their activity after their highest amplifications at 0.5–1.0, 0.5–1.0, and 2.0–2.5 million years ago (mya), respectively (Figure 6a–c). When the ratios of LTR-RTs with MIRNA-like hairpins were calculated, peaks at 2.0–2.5, 2.0–2.5, and 2.5–3.0 mya, respectively, were detected. However, this was not the case for other families that had more recent activity, such as Gmr2, Gmr4 and Gmr9, although the ratios of LTR-RTs with MIRNA-like hairpins were quite high (Figure 6d–f). These results suggest that MIRNA-like hairpins may play an important role in the evolution of LTR-RTs, in that their accumulation within an LTR-RT family can temper the amplification of other LTR-RTs within that family.
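The two curves compared above, the number of intact LTR-RTs per timeframe and the ratio of those elements that contain a MIRNA-like hairpin, can be computed as in the sketch below. It assumes that an insertion age (in million years) is already available for each intact element and uses 0.5-million-year bins to match the intervals quoted in the text; the function name and toy data are hypothetical.

```python
BIN_WIDTH_MY = 0.5  # bin width in million years (matches ranges such as 2.0-2.5 mya)


def hairpin_ratio_by_age(elements, bin_width=BIN_WIDTH_MY):
    """Summarise one LTR-RT family by insertion-time bin.

    elements -- (insertion_age_my, has_hairpin) for each intact element
    Returns {bin_start_my: (n_intact_elements, ratio_with_hairpin)}.
    """
    totals = {}
    with_hairpin = {}
    for age, has_hairpin in elements:
        idx = int(age // bin_width)  # integer bin index avoids float-key drift
        totals[idx] = totals.get(idx, 0) + 1
        if has_hairpin:
            with_hairpin[idx] = with_hairpin.get(idx, 0) + 1
    return {
        idx * bin_width: (n, with_hairpin.get(idx, 0) / n)
        for idx, n in sorted(totals.items())
    }


# Toy family (hypothetical ages and hairpin calls).
toy_family = [
    (0.6, False), (0.7, False), (0.9, True),   # 0.5-1.0 mya bin
    (2.1, True), (2.3, True), (2.4, False),    # 2.0-2.5 mya bin
]
profile = hairpin_ratio_by_age(toy_family)
for bin_start, (n, ratio) in profile.items():
    print(f"{bin_start:.1f}-{bin_start + BIN_WIDTH_MY:.1f} mya: n={n}, ratio={ratio:.2f}")
```

The accumulation peak and the hairpin-ratio peak of a family can then be located by taking the bins with the largest count and the largest ratio, respectively.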

Figure 6.

Timing and accumulation of different long terminal repeat retrotransposon (LTR-RT) families and related MIRNA-like hairpins. Accumulation (○) and ratio of intact elements containing MIRNA-like hairpins (●) of Gmr3 (a), Gmr1 (b), Gmr25 (c), Gmr2 (d), Gmr4 (e), and Gmr9 (f) along insertion time. mya: million years ago.
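The ratio plotted in Figure 6 (intact LTR-RTs with MIRNA-like hairpins versus total intact LTR-RTs within a timeframe) can be illustrated with a minimal Python sketch. The function name, the input layout (insertion time in mya paired with a flag for hairpin presence) and the 0.5-million-year bin width are assumptions made for illustration, not the pipeline used in this study.

```python
from collections import Counter


def hairpin_ratio_by_bin(elements, bin_width=0.5):
    """Summarize intact LTR-RTs per insertion-time bin.

    elements: iterable of (insertion_time_mya, has_hairpin) pairs
              (hypothetical input layout).
    Returns {bin_start: (total_elements, elements_with_hairpin, ratio)}.
    """
    total = Counter()
    with_hairpin = Counter()
    for age, has_hairpin in elements:
        # Assign the element to the bin [bin_start, bin_start + bin_width).
        bin_start = int(age // bin_width) * bin_width
        total[bin_start] += 1
        if has_hairpin:
            with_hairpin[bin_start] += 1
    return {
        b: (total[b], with_hairpin[b], with_hairpin[b] / total[b])
        for b in sorted(total)
    }


# Toy data only; these values are not taken from the study.
toy = [(0.7, False), (0.9, True), (0.6, False),
       (2.1, False), (2.3, True), (2.4, True)]
for bin_start, (n_total, n_hairpin, ratio) in hairpin_ratio_by_bin(toy).items():
    print(bin_start, n_total, n_hairpin, round(ratio, 2))
```

Plotting the first value (accumulation) and the last value (ratio) of each bin against insertion time would give the two curves described in the caption.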

To confirm this finding, the LTR-RT elements of a distinct family, Gmr9, were analyzed in detail. Gmr9, the largest family in the soybean genome, can be further classified into subfamilies based on sequence divergence. These subfamilies have undergone different evolutionary processes that have resulted in distinct amplification timeframes (Du et al., 2010a). SAREA and SNRES1 have more recently inserted copies than their duplicate pairs, SAREB and SNRES2, respectively. Thus, SAREA and SNRES1 were thought to have been active more recently (Du et al., 2010a). In our analysis, large numbers of MIRNA-like hairpins that co-localized with Gmr9 were detected. The accumulation timeframes of MIRNA-like hairpins and LTR-RTs in the different Gmr9 subfamilies were checked and, interestingly, MIRNA-like hairpin peaks were detected before or within the same timeframe as the accumulation of SAREB and SNRES2 LTR-RTs (Figure 7b,d), but this was not the case for SAREA and SNRES1 (Figure 7a,c). To verify the activities of these four subfamilies, their expression levels were further compared. We found that SAREA and SNRES1 had higher expression levels than SAREB and SNRES2 (Figure 7e), while the miRNA-5p in SNRES2 and the miRNA-3p in SAREB had higher expression levels than those in SNRES1 and SAREA, respectively (Figure 7f,g).

Figure 7.

Timing, accumulation and expression level of different Gmr9 subfamilies and MIRNA-like hairpins. Number of accumulated intact elements (○) and percentage of intact elements containing MIRNA-like hairpins (●) of SAREA (a), SAREB (b), SNRES1 (c) and SNRES2 (d). (e–g) Expression levels of the different Gmr9 subfamilies (e), and of the miRNA-5p (f) and miRNA-3p (g) within the elements. mya: million years ago.


Evolution of miRNA genes was affected by genomic features and internal structural variation

miRNAs were initially discovered in C. elegans (Lee et al., 1993), but have subsequently been found in many animal (Ambros, 2004) and plant species (Millar and Waterhouse, 2005; Zhang et al., 2006). The presence of some highly conserved miRNAs in both plants and animals indicates that microRNA-based genes emerged early during the evolution of plants and animals (Pasquinelli et al., 2000; Axtell and Bartel, 2005). Among the 28 281 MIRNA-like hairpins examined in this study, about 90% were identified by homology searches against miRNAs from other species, indicating their presence before species divergence. Besides conserved miRNA genes, each plant species has distinct miRNAs that do not exist in other species. For example, among the ~100 miRNA families from Arabidopsis, only 21 are present in the monocot rice (Axtell and Bowman, 2008). Even between closely related species, some miRNAs are present in one but not the other (Felippes et al., 2008). These results suggest that many miRNA genes evolve rapidly (Fahlgren et al., 2007, 2010; Ma et al., 2010; Abrouk et al., 2012; Meunier et al., 2013), and that most are expected to lose activity quickly (Lu et al., 2008; Meunier et al., 2013). Based on our analysis, the ratio of duplicated MIRNA-like hairpins from genome-wide and TD events to single copy MIRNA-like hairpins was 0.36 (2.1–5.8%). This low ratio indicates that many new MIRNA-like hairpins have been generated or that duplicated ones have been lost. This result supports the idea that miRNA genes are evolving more quickly than other genes.

miRNA genes can emerge from different sources, such as gene duplication, introns, TEs, and so on (Berezikov, 2011), but how genomic features affect MIRNA-like hairpin structure formation and expression is unclear. Through correlation analyses, we found that MIRNA-like hairpin accumulation and expression were greatly influenced by surrounding genomic features. Nevertheless, the correlation of their accumulation with genomic features contrasted with that of their expression. We propose that this discrepancy might be due to the evolution of MIRNA-like hairpins into functional miRNA genes, or to more miRNA genes retaining activity in gene-rich environments than in TE-rich regions. Hence, knowledge of the region in which MIRNA-like hairpins originated would be very useful for better understanding the evolutionary processes and fates of miRNA genes. After formation, miRNA genes can be duplicated through whole-genome, tandem, and segmental duplication (Maher et al., 2006; Li and Mao, 2007). The MIRNA-like hairpins from different sources showed different characteristics. For example, MIRNA-like hairpins duplicated through TEs showed higher diversity, lower expression and shorter loop length than the other groups. Single-copy MIRNA-like hairpins and those duplicated through WGD had higher expression than the others, with lower diversity and longer loops, a finding that indicates that they may be more active. This result further suggests that surrounding genomic features greatly affect the evolutionary processes of MIRNA-like hairpins.

The evolution of a MIRNA-like hairpin into a functional miRNA gene is gradual, and involves changes such as point mutations in the miRNA-5p and miRNA-3p sequences and variation in loop length (Tang, 2010; Berezikov, 2011). In our analyses, MIRNA-like hairpin expression was significantly negatively correlated with divergence between miRNA-5p and miRNA-3p and significantly positively correlated with MIRNA loop length. This finding hints that both the birth and death of miRNA genes, or their activities, were associated with changes in stem pair divergence and in the loop length region, which is consistent with a recent study on short tandem target mimic (STTM) (Yan et al., 2012). Yan et al. verified that the activity of the STTM was greatly affected by the length of the short spacer between the two sequences that are partially complementary to the small RNA. Although in our study a positive correlation was detected between MIRNA-like hairpin expression and loop length, it is hard to say that a longer loop is better for MIRNA-like hairpin function in each specific miRNA gene. Instead, we suggest that, within certain length limitations, longer loops may generate more active miRNA genes. This finding would be beneficial for future small interfering RNA studies.

In summary, our data indicate that duplication together with local genomic features and hairpin structure variation play critical roles in miRNA gene evolution.

Role of MIRNA-like hairpins in the evolution of transposons

LTR-RTs are the most abundant TEs in plant genomes. Their accumulation is the primary mechanism driving plant genome expansion (Bennetzen et al., 2005). Several mechanisms have been reported to counteract this expansion, including DNA methylation (Hamilton et al., 2002; Palmer et al., 2003; Emberton et al., 2005), conversion to heterochromatin (Lippman et al., 2004), and recombination-generated deletions (Devos et al., 2002; Tian et al., 2009). Even though all LTR-RTs may be subjected to these evolutionary forces, different families showed activity and amplification within distinct evolutionary timeframes. The factors that cause an LTR-RT family to lose its activity are poorly understood.

Studies have suggested that miRNA genes may arise from their target genes by inverted duplication (Axtell and Bowman, 2008; Tang, 2010), and it has been found that many miRNA precursor genes strongly resemble TEs, such as MITEs, with nearly perfect inverted repeats (Allen et al., 2004). Further evidence in both mammals and plants has indicated that some miRNA precursor genes are, in fact, transposons (Smalheiser and Torvik, 2005; Piriyapongsa and Jordan, 2008). However, most efforts to identify miRNAs have specifically excluded TEs; thus, miRNAs located in TEs were missed and the relationship between miRNAs and transposons has seldom been reported.

Among the MIRNA-like hairpins in this study, about 86% were located in TE regions, a finding that suggests that MIRNA-like hairpins may play important roles in TE activity and evolution. We found that non-active TE families had MIRNA-like hairpin peaks shortly before their burst. However, for those that are still active, no MIRNA-like hairpin peak was found, which suggests that the MIRNA-like hairpins might be associated with LTR-RT activity. We hypothesize that the process might be as follows: (i) MIRNA-like hairpins arose along with the LTR-RTs during evolution and some of these MIRNA-like hairpins formed functional miRNA genes; (ii) these functional miRNA genes may have functioned as regulators of the LTR-RTs belonging to a TE family; and (iii) gradually the LTR-RTs belonging to this family lost their activities, possibly due to the role of the miRNA gene. Although this model needs additional evidence for validation, it at least provides some insight into the possibility that miRNA genes may affect the activity of TEs, and sheds light on our understanding of the evolution of transposable elements.

Experimental Procedures

Plant growth condition and material collection

Soybean (Glycine max cv. Williams 82) plants were grown in the Experimental Station of the Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, in Beijing. Materials were collected at different developmental stages and quickly frozen in liquid nitrogen.

Small RNA library and RNA-Seq library construction and sequencing

Total RNA was isolated using TRIzol reagent (Invitrogen). Small RNA libraries were constructed as described previously (Lu et al., 2007). Briefly, small RNAs in the size range of 15–30 nucleotides (nt) were purified from 200 μg of total RNA by 15% denaturing polyacrylamide gel electrophoresis (PAGE) and ligated to an adapter. After first-strand synthesis and 16 cycles of PCR amplification, the final bands were purified by PAGE and submitted for sequencing on a Hi-Seq 2000 analyzer. RNA-Seq library construction was performed following the method described previously (Severin et al., 2010). RNAs were used as templates for random-primed cDNA synthesis after denaturation, and then the cDNAs were ligated to adapters. DNA libraries were amplified by PCR for 16 cycles. The purified products were submitted for sequencing on the Hi-Seq 2000 analyzer.

Computational analysis of sequencing data

After removal of low quality reads and clipping of adapter sequences, the raw sequencing data were mapped to the soybean reference genome using the program BWA (Li and Durbin, 2009). Reads perfectly matching the soybean genome, excluding those matching tRNAs, rRNAs, snRNAs, and snoRNAs in the Rfam database, were used for further miRNA analysis.

miRNA prediction

Known and novel miRNAs were identified using the miRDeep pipeline (Friedlander et al., 2008). In brief, sequences from 100 nt upstream to 100 nt downstream of the remaining aligned reads were extracted from the soybean genome. Potential miRNA stem loops were located by sliding a 100-nt window in 10-nt increments along each strand of the extracted sequences and folding each window with the secondary structure prediction program RNAfold (Hofacker et al., 1994) to identify stem–loop structures. Z-scores (Washietl et al., 2005) were adopted to filter the candidates. Only hairpins with a z-score less than −3.0 were annotated as novel miRNA genes.
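The windowing and z-score filtering steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the folding step itself (RNAfold) is not reproduced, and the z-score is computed here against a supplied list of background free energies, which may differ from the procedure of Washietl et al. (2005).

```python
import statistics


def sliding_windows(seq, window=100, step=10):
    """Yield (start, subsequence) for windows of `window` nt advanced by `step` nt.

    A sequence shorter than `window` yields a single, shorter window.
    """
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, seq[start:start + window]


def mfe_z_score(mfe, background_mfes):
    """Z-score of a folding energy against background energies (assumed inputs)."""
    mean = statistics.mean(background_mfes)
    sd = statistics.stdev(background_mfes)
    return (mfe - mean) / sd


def passes_filter(mfe, background_mfes, threshold=-3.0):
    """Keep a candidate hairpin only if its z-score is below the threshold."""
    return mfe_z_score(mfe, background_mfes) < threshold


# Toy region: 100 nt flank + 21 nt read + 100 nt flank (placeholder sequence).
region = "A" * 221
print(sum(1 for _ in sliding_windows(region)))
```

In practice each window would be folded to obtain its minimum free energy before `passes_filter` is applied; the energies used for testing below are toy values.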

A search for all potential MIRNA-like hairpins was conducted by BLAST against a combined miRNA database that included the miRNAs from the miRBase database (Release 19: August 2012) (Griffiths-Jones, 2004; Griffiths-Jones et al., 2006, 2008; Kozomara and Griffiths-Jones, 2011) and the miRNAs identified above. Sequences that matched at least 90% of the length of a miRNA precursor with a similarity of at least 90% were considered to be potential MIRNA-like hairpins. Their structures, including miRNA, loop and miRNA*, were annotated based on the corresponding homologous miRNA genes.
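The 90% length and 90% similarity criterion can be expressed as a simple filter over alignment records. The sketch below is hypothetical: the function name and argument layout are assumptions, both values are treated as minimum thresholds, and percent identity is used as a stand-in for similarity.

```python
def is_candidate_hairpin(percent_identity, alignment_length, precursor_length,
                         min_identity=90.0, min_coverage=0.90):
    """Return True if a hit covers enough of the precursor at high enough identity.

    percent_identity: identity of the alignment, in percent (0-100).
    alignment_length: number of precursor positions covered by the alignment.
    precursor_length: full length of the known miRNA precursor.
    """
    coverage = alignment_length / precursor_length
    return percent_identity >= min_identity and coverage >= min_coverage


# Toy hits: (percent identity, alignment length, precursor length).
hits = [(95.0, 95, 100), (95.0, 80, 100), (85.0, 100, 100)]
kept = [h for h in hits if is_candidate_hairpin(*h)]
print(len(kept))
```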

miRNA expression and transcriptome data analysis

Filtered small RNA reads that mapped perfectly to a mature miRNA-5p or miRNA-3p were counted towards the expression of that miRNA-5p or miRNA-3p. If a read mapped to a single locus, it was assigned entirely to that miRNA. In contrast, if a read mapped to more than one precursor locus, its count was divided by the number of hits and the resulting average was assigned to each of the matching miRNAs. Final expression of miRNA-5p or miRNA-3p was calculated as TP5M.
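The read-assignment rule above can be sketched as follows. The function name and input layout are hypothetical, and two assumptions are made that the text does not spell out: TP5M is read as transcripts per five million, and the normalization denominator is the total number of filtered reads in the library.

```python
from collections import defaultdict


def mirna_expression(read_hits, total_reads, scale=5_000_000):
    """Distribute read counts across matching loci and normalize.

    read_hits: iterable of (read_count, [locus, ...]) pairs, one per distinct
               small RNA sequence (hypothetical layout).
    total_reads: normalization denominator (assumed: all filtered reads).
    """
    raw = defaultdict(float)
    for count, loci in read_hits:
        if not loci:
            continue  # sequence does not match any mature miRNA
        share = count / len(loci)  # equal split across all matching loci
        for locus in loci:
            raw[locus] += share
    return {locus: value * scale / total_reads for locus, value in raw.items()}


# Toy example: 10 reads unique to locus "a", 6 reads shared by "a" and "b".
expression = mirna_expression([(10, ["a"]), (6, ["a", "b"])], total_reads=1_000_000)
print(expression)
```

Splitting shared reads equally keeps the summed expression across loci equal to the total normalized read count, at the cost of not distinguishing which duplicate is actually transcribed.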

TopHat (Trapnell et al., 2009, 2012) was used to map the RNA-Seq data from all 23 samples to the soybean genome, and Cufflinks (Trapnell et al., 2010) was used to calculate the overall transcriptional expression in each 1 Mb window.

The distribution of miRNA and expression, and subsequent statistical analyses

Each chromosome was split into contiguous 1 Mb and 500 kb regions (called windows) from its beginning to its end. GR rates were obtained for each window and plotted on the basis of their midpoints. The miRNA, TE, and gene contents/densities were calculated based on their proportions within each window. For the dataset of chromosome arm windows, the first window, the last window, and the windows covering pericentromeric regions were not included in the analysis. The correlations among the investigated parameters were assessed using Pearson's correlation with 10 000 bootstrap resamplings, as described previously (Tian et al., 2009).
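A minimal Python sketch of the windowing and of a bootstrapped Pearson correlation is given below. All function names are hypothetical, and the use of a percentile interval over the resampled coefficients is an assumption made for illustration; the exact procedure of Tian et al. (2009) is not reproduced here.

```python
import math
import random


def split_into_windows(chrom_length, size=1_000_000):
    """Contiguous (start, end) windows from the beginning to the end of a chromosome."""
    return [(start, min(start + size, chrom_length))
            for start in range(0, chrom_length, size)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_pearson(x, y, n_boot=10_000, seed=0):
    """Return (observed r, lower, upper) using a 95% percentile interval."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample windows with replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # correlation is undefined for a constant resample
        estimates.append(pearson(xs, ys))
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[min(int(0.975 * len(estimates)), len(estimates) - 1)]
    return pearson(x, y), lower, upper


# Toy per-window values (not data from this study).
te_density = [0.1, 0.3, 0.5, 0.6, 0.8, 0.9]
hairpin_density = [0.2, 0.25, 0.5, 0.55, 0.7, 0.95]
print(bootstrap_pearson(te_density, hairpin_density, n_boot=1000))
```

Resampling whole windows keeps each pair of per-window values together, so the interval reflects variation among windows rather than within them.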


We thank Dr Jianchang Du (Jiangsu Academy of Agricultural Sciences, China), and Mrs Shouhong Sun (Institute of Genetics and Developmental Biology, China) for their helpful comments and discussions. This work was supported by the National Natural Science Foundation of China (Grant Nos. 91131005 and 31222042), and the National Key Basic Research Program (No. 2013CB835200).