Some transposable elements (TEs) show extraordinary variance in abundance along sex chromosomes but the mechanisms responsible for this variance are unknown. Here, we studied Ogre long terminal repeat (LTR) retrotransposons in Silene latifolia, a dioecious plant with evolutionarily young heteromorphic sex chromosomes. Ogre elements are ubiquitous in the S. latifolia genome but surprisingly absent on the Y chromosome.
Bacterial artificial chromosome (BAC) library analysis and fluorescence in situ hybridization (FISH) were used to determine Ogre structure and chromosomal localization. Next generation sequencing (NGS) data were analysed to assess the transcription level and abundance of small RNAs. Methylation of Ogres was determined by bisulphite sequencing. Phylogenetic analysis was used to determine mobilization time and selection forces acting on Ogre elements.
We characterized three Ogre families ubiquitous in the S. latifolia genome. One family is nearly absent on the Y chromosome despite all the families having similar structures and spreading mechanisms. We showed that Ogre retrotransposons evolved before sex chromosomes appeared but were mobilized after formation of the Y chromosome. Our data suggest that the absence of one Ogre family on the Y chromosome may be caused by 24-nucleotide (24-nt) small RNA-mediated silencing leading to female-specific spreading.
Our findings highlight epigenetic silencing mechanisms as potentially crucial factors in sex-specific spreading of some TEs, but other possible mechanisms are also discussed.
Transposable elements (TEs) are ubiquitous in all eukaryotes. In plants, long terminal repeat (LTR) retrotransposons are the most widespread TEs (Finnegan, 1989). These TEs use a ‘copy and paste’ mode of spread and are among the main drivers that increase the size of plant genomes in the evolutionary short term (Kumar & Bennetzen, 1999) leading to extraordinary variations in genome size within even closely related species (Vitte & Bennetzen, 2006). The evolutionary dynamics of TEs alternate between transposition bursts and periods when the mobility of TEs is very low (Naito et al., 2006; Bergman & Bensasson, 2007). Bursts of retrotransposition are well documented in multiple mutant lines of Arabidopsis thaliana (Tsukahara et al., 2009), and demonstrate the stochastic and independent ability of each TE type to be activated. Factors that can account for variation in the stability and activity of TEs include: dependence on the host's genetic background (Vu & Nuzhdin, 2011), changes in host silencing mechanisms, the ability of TEs to escape host regulation, and the degree of TE self-regulation (Ågren & Wright, 2011).
As has been shown in chromoviruses (Gao et al., 2008) and yeast (Boeke & Devine, 1998), TEs usually colonize various chromosomal niches, which is the likely consequence of TEs targeting specific chromosomal regions. Specific distribution of some TE families has been observed on heteromorphic sex chromosomes in plants (Steinemann & Steinemann, 2005). For example, large accumulations of LINE (long interspersed nuclear element) elements on the Y chromosome were observed in hemp (Cannabis sativa) (Sakamoto et al., 2000). In the dioecious plant Silene latifolia, copia retrotransposons showed accumulation on the large Y chromosome compared with other chromosomes, while Ogre and Retand (Tat) retrotransposons which are very abundant in the genome were absent or underrepresented on the Y chromosome (Kejnovsky et al., 2006a; Cermak et al., 2008; Filatov et al., 2009). A similar depletion of Tat and Athila elements on the Y chromosome was observed in Rumex acetosa with the XY1Y2 system (Steflova et al., 2013). Overall, this pattern of absence of some TEs on Y chromosomes contradicts the hypothesis that repetitive elements tend to accumulate on Y chromosomes because of a lack of recombination (Charlesworth, 1991). These patterns of TE distribution on the Y chromosome indicate that other factors and mechanisms shaping the genomic landscape of these chromosomes are in action.
Ogre elements represent a widespread family of giant (up to 23 kb long) gypsy-type LTR retrotransposons present in many plant species (Macas & Neumann, 2007). Their gag/pol gene product ratio is probably regulated by splicing of the primary transcript, as has been shown in pea (Pisum sativum) and Medicago truncatula (Neumann et al., 2003; Steinbauerová et al., 2008). Another interesting feature of Ogre retrotransposons is the presence of relatively well-conserved extra open reading frames (eORFs) of unknown function downstream or upstream of their gag–pol genes (Neumann et al., 2003; Steinbauerová et al., 2012). In the dioecious plant species S. latifolia, Ogre elements account for c. 13% of the genome and have significantly enlarged its size (Macas et al., 2011; Cegan et al., 2012). Similar levels of Ogre element amplification have also been observed in Vicia pannonica (Neumann et al., 2006). There are numerous other cases where amplification of mobile elements has significantly enlarged the genomes of one species compared with the genomes of its relatives (Hawkins et al., 2006; Piegu et al., 2006; Ungerer et al., 2006).
Several mechanisms could explain the absence of Ogre elements on the Y chromosome and are examined here. First, Ogre elements could have expanded in the genome before the origin of sex chromosomes and remained inactive during the evolution of sex chromosomes. Secondly, there may be a mechanism for Ogre removal from the Y chromosome. Thirdly, because they colonize only recombining parts of the genome, the spread of Ogre elements could be connected in some way with the recombination process. Fourthly, interaction with some female-specific cellular proteins could be crucial for Ogre retrotransposition. Lastly, Ogre could be active only in female individuals because of either female-specific activation or male-specific silencing.
Materials and Methods
All plant species of the genus Silene used in this study came from the collection at the Institute of Biophysics in Brno, Czech Republic. Full-length Ogre elements were isolated from the bacterial artificial chromosome (BAC) library characterized by Cegan et al. (2010). Ogre element copy numbers in the S. latifolia Poir. genome were estimated by hybridization of respective LTR probes with the BAC library and by counting reads in the genomic DNA 454 and Illumina libraries. To examine the transcription, splicing and diversity of Ogre elements, total RNA and DNA were isolated from male and female leaves and flower buds and used as a template for amplification reactions. To prevent the formation of chimeric PCR products, we used an emulsion PCR protocol (Williams et al., 2006). The mapping of transcription starts and ends was performed using the SMARTer RACE cDNA amplification kit (Clontech, Mountain View, CA, USA). Fluorescence in situ hybridization experiments were performed according to Lengerova et al. (2004) with slight modifications. In situ detection of Ogre mRNAs in cryosections of S. latifolia anthers and pistils was performed according to Brewer et al. (2006). RNA-seq and small RNA-seq Illumina sequenced libraries were prepared from purified pollen grains, pistils and young male and female leaves of the same S. latifolia population from Brno Bystrc, Czech Republic. Transcript level and small RNA abundance were estimated by counting reads and normalizations as stated in the Supporting Information Methods S1. The methylation level of Ogre elements was examined by bisulphite sequencing of Ogre genomic copies isolated from pollen grains, flower buds and leaves; primers were designed using Bisprimer (Kovacova & Janousek, 2012). To distinguish Ogre copies from vegetative cells and sperm cells of the pollen grain, hierarchical cluster analysis was performed.
Ancestral state reconstructions, mobilization times and selection analyses of Ogre elements were performed by phylogenetic analyses as described in detail in Methods S1. Primers used in this study are summarized in Table S1. All methods are described in detail in Methods S1.
Structure and chromosomal localization of Ogre families in S. latifolia
To examine the structure of Ogre elements, we utilized sequences obtained from similarity-based clustering of 454 reads of S. latifolia genomic DNA (Macas et al., 2011). We identified three abundant Ogre families represented by three 454 clusters: CL5, CL6 and CL11. Reconstructed elements contained gag and pol genes as well as eORFs typical for Ogre LTR retrotransposons (Steinbauerová et al., 2012). Ogre CL5 was identical to the Ogre identified by Cermak et al. (2008) and SlOgre1 partially characterized by Filatov et al. (2009), while Ogre CL6 and Ogre CL11 were newly identified in this study. To obtain full-length Ogre sequences, we screened the BAC library of S. latifolia. In 32 BAC clones, we found four full-length or nearly complete Ogre elements (accession numbers KC206272–KC206275; Table 1), and 83 partial Ogre sequences.
Table 1. Genomic copies of Ogre elements in Silene latifolia
Columns: TE length (bp) | LTR – left (bp) | LTR – right (bp) | LTR similarity (%)
Recovered cell values: Left LTR deletion (1235–1589); c. 17 900
TE, transposable element; LTR, long terminal repeat; TSD, target site duplication; PBS, primer binding site; cons, consensus based on two incomplete left and right LTRs. Some information is not available (n/a).
The structure of typical elements of the three Ogre families (Ogre CL5, Ogre CL6 and Ogre CL11), including nonautonomous Ogre CL5del, is depicted in Fig. 1(a). Despite similar lengths and high structural similarity, the three Ogre families differed significantly in their sequences. Pairwise homology was 70% in gag–pol genes, but much lower in other parts of the elements (Fig. 1b). The nonautonomous Ogre CL5del lacked the protease domain (Fig. 1c) and Ogre CL5 had two eORFs with opposite orientations, while Ogre CL6 and Ogre CL11 had only one eORF. These eORFs showed protein similarity to ORF1 described in pea Ogre elements, but their function is still enigmatic (Neumann et al., 2003; Macas & Neumann, 2007; Steinbauerová et al., 2012). Short regions (c. 8–10 bp) at the 3′ and 5′ ends of the LTRs of all studied Ogre elements exhibited high conservation (Fig. 1d). These sites are known to bind integrase before element integration (Hindmarsh & Leis, 1999; Cherepanov et al., 2011), which suggests that all three Ogre families utilize the same integration mechanism. Polypurine tracts (PPTs) within a 70-bp conserved region were also the same for all Ogre families.
A striking feature of all three Ogre families was the presence of a fully conserved secondary primer binding site (PBS) located 80–90 bp upstream of the primary PBS inside the left LTR (Fig. 1d). Both PBS sites were complementary to the 3′ end of tRNA-Arg, which is typical for Ogre elements in other plant species (Neumann et al., 2003; Macas & Neumann, 2007). The high conservation of the secondary PBS indicates its functional importance, possibly as an alternative site for reverse transcription initiation.
The chromosomal distribution of Ogre CL5 has been studied in S. latifolia, Silene diclinis and Silene dioica by Cermak et al. (2008), Filatov et al. (2009) and Howell et al. (2009) using the internal region of the elements. In all cases, Ogre CL5 has been found to be very abundant in the whole genome, with the exception of the nonrecombining part of the Y chromosome. Here, we also used integrase and LTR regions as probes for fluorescence in situ hybridization (FISH) and mapped not only Ogre CL5, but also Ogre CL6 and CL11 (Fig. 2). As expected, Ogre CL5 was ubiquitously localized with the exception of the Y chromosome but, surprisingly, Ogre CL6 and CL11 were localized on all chromosomes including the Y chromosome. We used degenerate primers to amplify Ogre elements from genomic DNA of Silene vulgaris, Silene colpophylla and Silene otites and used the amplified products as FISH probes on chromosomes of these Silene species (Table S1c). We found that the copy number of Ogre elements in these species was much lower, indicating specific and extensive amplification of Ogre elements (CL5, CL6 and CL11) in the ancestor of the section Melandrium. The phylogenetic relationships within the genus Silene are explained in Fig. 3. Copy numbers of Ogre elements in the S. latifolia genome as estimated by hybridization of LTR probes with the BAC library are 4990, 4220 and 2880 for Ogre CL5, CL6 and CL11, respectively. Similar results were obtained by counting reads in the Illumina and 454 genomic libraries (Table S2).
In order to distinguish possible differences in the Ogre neighbourhood (heterochromatin or euchromatin), we analysed flanking sequences in Illumina genomic reads mapped onto Ogre elements. We found no differences in GC content or nucleotide composition (data not shown), suggesting that Ogre elements do not prefer any specific chromosomal loci (sequence context) for insertion. This result is consistent with the regular spatial distribution of Ogre elements along chromosomes.
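The flank comparison described above can be illustrated with a minimal sketch. It assumes the flanking sequences have already been extracted as plain strings. The function names and the toy sequences are hypothetical and are not taken from the study's pipeline.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in counted) / len(counted)


def mean_gc(flanks) -> float:
    """Average GC fraction over a collection of flanking sequences."""
    values = [gc_fraction(s) for s in flanks]
    if not values:
        raise ValueError("no flanking sequences given")
    return sum(values) / len(values)


# Toy flanks for two hypothetical element families (illustrative data only).
flanks_family_a = ["ATGCGT", "GGCCAT"]
flanks_family_b = ["ATATGC", "GCGCAT"]
difference = mean_gc(flanks_family_a) - mean_gc(flanks_family_b)
```

A difference close to zero across many flanks would be in line with the absence of an insertion-site preference, although a formal comparison would also need a statistical test.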
Ogre elements are transcribed and spliced
We studied the transcription of Ogre elements by RT-PCR using total RNA from roots, leaves and various floral tissues, including whole male and female flower buds and separated pistils and pollen grains, as templates. Primers were designed on the basis of 454 reconstructed elements (Table S1a,b). We found that all three Ogre families were constitutively transcribed in all tested tissues (data not shown). Further, we determined the transcript level by counting RNA-seq reads mapped to Ogre LTRs and GAG domains (Fig. 4a,b) in flower buds, leaves, pistils and pollen. The transcript level was much higher in LTRs than in GAG domains. This suggests that potential full-length Ogre transcripts form a minority of all Ogre transcripts. Transcripts within LTRs may represent either defective RNA Pol II products that do not extend to the polyA end or long noncoding RNAs.
To reveal whether Ogre is transcribed from its own promoter and forms functional polyadenylated mRNA, we performed 5′ and 3′ end RACE experiments on total RNA isolated from flower buds. Transcription starts were localized inside the left LTR but at different positions in the three Ogre families: 196, 496 and 1677 bp from the beginning of the LTR in Ogre CL5, Ogre CL11 and Ogre CL6, respectively (Fig. S1a). In contrast to the unambiguous transcription start sites, polyadenylated transcription ends were not as specific. The conserved polyadenylation signal CATAAA was mapped to position c. 1870 bp of the Ogre CL5 LTR and was followed by polyA tails at distances of 24–150 bp (Fig. S1b). Ogre CL11 polyA tails were mapped to positions 2671–2794 bp and Ogre CL6 polyA tails were 2281–2320 bp and 2712–2800 bp from LTR starts. In addition to dominant 5′ and 3′ RACE products, we found a few clones indicating some minor alternative transcription starts and ends. Taken together, the RACE experiments showed that all Ogre families are driven by their own promoters localized within their LTRs and that Ogre elements produce full-length polyadenylated mRNA transcripts.
The presence of Ogre transcripts prompted us to perform in situ hybridization of Ogre-derived probes on cryosections of anthers and ovaries of S. latifolia. As shown in Fig. 5, expression of Ogre CL5 and Ogre CL11 was visible in pollen (Fig. 5a,c) as well as in cells inside female ovules: integuments, the nucellus and the diploid secondary nucleus of the central cell (Fig. 5e,g). The lack of hybridization signal with sense probes indicates no or a very low level of cotranscription of Ogre elements (Fig. 5b,d,f,h). This supports the previous conclusion that Ogre elements are actively transcribed from their own promoters.
In 454 data, we found a variant of Ogre CL5 with the protease domain deleted: Ogre CL5del (Fig. 1a). PCR with primers bridging the potential intron showed that the short (spliced, 627-bp-long) fragment dominated in RNA, while longer (unspliced, 2227-bp-long) fragments prevailed in genomic DNA (Fig. S1a,b). These fragments were cloned and sequenced. Multiple sequence alignment revealed specific splicing sites bordering the protease gene, indicating that autonomous Ogre CL5 transcripts can be spliced (Fig. S1c). Clones containing Ogre CL5del were highly divergent in the 100-bp region around splicing sites. For this reason, Ogre CL5del is probably an ancient spliced copy, which is nonautonomous but active. We counted the autonomous Ogre CL5, its spliced variant and nonautonomous Ogre CL5del in Illumina genomic data sets and discovered that they make up 83.5, 3.5, and 13% of genomic copies, respectively. Open reading frames in the majority of nonautonomous and spliced Ogre elements remained intact, which implies that truncated copies can produce polyproteins containing all proteins except protease.
Branching order of Silene Ogre retrotransposon groups
To assess the pattern of Ogre evolution in the genus Silene, we constructed phylogenetic trees from the alignment of Silene Ogre elements as the ingroup and several Ogre elements from distantly related species as the outgroup (alignment A and translated alignment A). Fig. 6 and Notes S1 show the results of the maximum likelihood analysis of the nucleotide data sets. The phylogenetic tree clearly shows that the Silene Ogre retrotransposons are subdivided into three families that correspond to Ogre CL5, Ogre CL6 and Ogre CL11. The Shimodaira–Hasegawa-like approximate likelihood ratio test (aLRT; Anisimova & Gascuel, 2006) values that support the grouping of each family of Silene Ogre retrotransposons were within the range 0.96–1.00 (Notes S1), and thus the subdivision into the three families is strongly supported. Apparently, the branch leading to Ogre CL11 first evolved from a hypothetical ancestor, and the sequences Ogre CL5 and Ogre CL6 evolved later. Interestingly, S. otites and S. colpophylla, representatives of the subgenus Silene, contain only Ogre CL6 and Ogre CL11 elements, whereas species of the subgenus Behenantha contain all three Ogre families (Fig. 6, Notes S1). The results of phylogenetic analyses performed by Bayesian inference and on the translated alignment confirm the results obtained by the maximum likelihood approach (Notes S1, S2).
Approximate timing of the Silene Ogre emergence
To assess the timing of the evolution of all three groups of Silene Ogre retrotransposons, we constructed a phylogram based on sequences of four nuclear genes (fructose-2,6-bisphosphatase (Atanassov et al., 2001), spermidine synthase (Atanassov et al., 2001), CCLS1 (Barbacar et al., 1997; Zluvova et al., 2010), and eIF4A (Zluvova et al., 2005, 2006); Fig. 3, Table S3). The phylogram was used as an input in the ancestral state reconstruction to calculate the probability of the presence of Ogre CL5 at three internal nodes: (1) the node corresponding to the ancestor of the Silene subgenus Silene, (2) the node corresponding to the ancestor of the sections Melandrium, Conoimorpha, Behenantha and Physolychnis within the Silene subgenus Behenantha, and (3) the node corresponding to the ancestor of the genus Silene. The results are summarized in Table 2. The results of both maximum likelihood and Bayesian inference show that the ancestor of the subgenus Silene probably did not contain Ogre CL5, whereas Ogre CL5 was present in the ancestor of the sections Melandrium, Conoimorpha, Behenantha and Physolychnis. The probability that Ogre CL5 was present in the ancestor of the genus Silene is 10⁻⁵ (as calculated by maximum likelihood) or even less (as calculated by Bayesian inference). Thus, the presence of Ogre CL5 in the ancestor of the genus Silene is unlikely. We used the chronogram (prepared from the same nucleotide data set as the phylogram) to assess the approximate timing of the emergence of the Silene Ogre families (Fig. S2). According to our results, both Ogre CL6 and Ogre CL11 probably appeared before the split of the subgenera Silene and Behenantha and thus were probably present at the origin of the genus Silene, that is, they are older than 17 million yr. Ogre CL5 appeared between the emergence of the most recent common ancestor (MRCA) of the genus Silene and the MRCA of the sections Melandrium, Conoimorpha and Physolychnis within the subgenus Behenantha.
Thus, Ogre CL5 probably emerged between 13.4 and 17 million yr ago.
Table 2. Probability of Ogre CL5 being present at three different nodes of the Silene phylogenetic tree presented in Fig. 3
Column: Probability of the presence of CL5
Recovered cell value: 5 × 10⁻⁶
Estimation of Ogre mobilization time
To determine the expansion period of Ogre elements in the Silene genome, we used two independent approaches to calculate the length of the terminal branches within the Ogre tree: the terminal branch lengths in the chronogram calculated in BEAST (Drummond & Rambaut, 2007) and the length of terminal branches for synonymous substitutions calculated in PAML (Yang, 2007). BEAST analysis suggested that Ogre CL5 actively expanded through the genome c. 1.75 million yr ago, Ogre CL6 c. 3 million yr ago, and Ogre CL11 c. 1.5 million yr ago (Fig. 7a). The Mann–Whitney test showed that the difference between CL5 and CL6 was not significant (Table S4).
The distribution of branch lengths for synonymous substitutions (calculated in PAML) in Ogre CL5 was found to peak at c. 7.5% of synonymous nucleotide changes per codon, in Ogre CL6 at c. 6%, and in Ogre CL11 at c. 3% (Fig. 7b). The difference between Ogre CL5 and Ogre CL6 was not statistically significant (Table S4). To convert the percentage of synonymous substitutions to time, we estimated the relative time of mobilization maxima using four chronograms of sex-linked genes (XY4, Atanassov et al., 2001; DD44, Moore et al., 2003; Cyp, Bergero et al., 2007; XY1, Delichère et al., 1999; Rautenberg et al., 2008). The chronograms are presented in Fig. S3, and the results are summarized in Table S5 and Fig. S4. The predicted mobilization maxima are c. 2.7 million yr ago in the case of Ogre CL5, c. 2.2 million yr ago in the case of Ogre CL6, and c. 1.1 million yr ago in the case of Ogre CL11.
Thus, Ogre CL5 and Ogre CL6 were massively mobilized first, followed by Ogre CL11. The fact that most of the analysed Ogre CL5 and Ogre CL6, but not Ogre CL11 BAC clones are already degenerated (they have disturbed open reading frames; Table S6) also supports the hypothesis that Ogre CL5 and Ogre CL6 are significantly older than Ogre CL11. Because the oldest analysed sex-linked gene pair (XY4) split c. 6 million yr ago, the maximum of mobilization of all Silene Ogre elements was present at a time when the sex chromosomes were already established. It is also worth noting that, according to our chronograms, speciation within the section Melandrium probably started 0.8–1.4 million yr ago (Fig. S4). Thus, the maximum of the transposition activity of Ogre CL5 and Ogre CL6 took place before speciation within the section Melandrium, but the maximum of the transposition activity of Ogre CL11 took place slightly before or at the time of speciation.
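The reported mobilization maxima are roughly proportional to the PAML peaks of synonymous divergence. The sketch below checks this with a simple linear-clock assumption: it calibrates on the Ogre CL5 values given in the text and rescales the other two peaks. This is only a back-of-the-envelope consistency check under that assumption, not the chronogram-based conversion used in the study (Methods S1), and the function names are hypothetical.

```python
# Peaks of synonymous divergence (% synonymous changes per codon) from the text.
PEAKS_PERCENT = {"CL5": 7.5, "CL6": 6.0, "CL11": 3.0}


def calibrate(percent: float, age_myr: float) -> float:
    """Million years per 1% synonymous divergence (assumes a linear clock)."""
    if percent <= 0:
        raise ValueError("divergence must be positive")
    return age_myr / percent


def to_age(percent: float, myr_per_percent: float) -> float:
    """Convert a divergence peak to an approximate age in million years."""
    return percent * myr_per_percent


# Calibrate on Ogre CL5 (7.5% corresponds to c. 2.7 million yr in the text).
rate = calibrate(PEAKS_PERCENT["CL5"], 2.7)
ages = {fam: round(to_age(p, rate), 2) for fam, p in PEAKS_PERCENT.items()}
```

Under this assumption the rescaled values for Ogre CL6 and Ogre CL11 fall close to the c. 2.2 and c. 1.1 million yr reported above.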
Selection analysis of Silene Ogre elements
In order to assess whether Ogre elements evolved different interactions with the host, we estimated the ratio (ω) of the non-synonymous substitution rate (dN) to the synonymous substitution rate (dS) using a partial sequence of the gag gene and the full pol gene. Whereas by the branch test, the terminal branches of Ogre CL5 and Ogre CL6 had a significantly higher ω than the background, the ω of the terminal branches of Ogre CL11 did not significantly differ from the background (Table S7). However, the branch analysis of the branches leading from the MRCA of all Silene Ogre retroelements to the MRCA of Ogre CL5, Ogre CL6 and Ogre CL11 (hereafter referred to as internal branches) showed very strong purifying selection along these branches (Table S7). When we compared the site models M1a and M2a, we found no evidence for positive selection (data not shown). However, a comparison of models M0 and M3 showed that M3 better fitted the data (P < 10⁻¹⁶), meaning that allowing the ω ratio to vary across the sequence provided a better fit than assuming one ω ratio.
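Comparisons of nested codon models such as M0 versus M3 are usually made with a likelihood ratio test. The sketch below shows the arithmetic. It assumes M3 was run with three site classes, giving four extra free parameters over M0 (the number of classes is not stated in the text), and the log-likelihood values are hypothetical placeholders, not results from this study.

```python
import math


def omega(dn: float, ds: float) -> float:
    """Ratio of non-synonymous to synonymous substitution rates (dN/dS)."""
    if ds <= 0:
        raise ValueError("dS must be positive")
    return dn / ds


def chi2_sf_even_df(x: float, df: int) -> float:
    """Chi-square survival function, closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, P value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods for M0 (null) and M3 (alternative).
stat, p_value = likelihood_ratio_test(lnl_null=-10250.0, lnl_alt=-10200.0, df=4)
```

A very small P value in such a test only indicates that ω varies among sites. On its own it does not show positive selection, which is why the M1a versus M2a comparison is reported separately.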
The branch-site analysis of terminal branches (for a brief explanation of the method, see Methods S1 – selection analyses) revealed the effects of purifying selection in most codons, but did not detect any codon under adaptive evolution in any group of Ogre retroelements (Table 3). Compared with the analysis of the internal branches, the percentage of neutrally evolving codons was more than twice as high. Interestingly, the branch-site analysis of internal branches detected adaptive evolution in all three groups of Ogre retroelements. The positively evolving codons are distributed along the Ogre sequences nonrandomly (Fig. 8). The fewest positively selected codons are present within the reverse transcriptase, RNase H and integrase domains, whereas the highest number of sites under positive selection are present between functional domains and also within GAG.
Table 3. Branch-site analysis of terminal and internal branches of Ogre elements in Silene latifolia
Panels: branch-site analysis of terminal branches; branch-site analysis of internal branches
Columns in each panel (percentage of codons consistent with): Purifying selection (%) | Neutral evolution (%) | Adaptive evolution (%)
Recovered cell values: 8.55 (ω = 142.5; P < 10⁻¹⁶); 8.44 (ω = 119.1; P < 10⁻¹⁶); 13.09 (ω = 31.9; P < 10⁻¹⁶)
Note that the percentage of codons consistent with neutral evolution is higher in terminal branches, whereas a high percentage of codons consistent with adaptive evolution was detected in the internal branches.
Ogre elements are differentially regulated by 24-nucleotide small RNAs
To uncover possible regulation of distinct Ogre families by small RNA (sRNA) molecules, we isolated and sequenced fractions of RNA (15–50 nt) from male and female leaves, female reproductive organs (fertilized and unfertilized pistils) and pollen grains. The sequencing reactions were performed using the Illumina platform. We also used external sRNA libraries (accession numbers in Methods S1) and added them to our data sets. sRNAs were mapped onto the sequences of the three Ogre families, as well as onto those of the Retand and Athila LTR retrotransposons. The counts of mapped reads were normalized to element sequence length, element copy number and size of the respective library (see Methods S1). We found a large number of sRNA molecules distributed particularly along LTRs of all studied TE types (Fig. S5). Twenty-four-nucleotide sRNAs were among the most abundant sRNA species. They were ten times more abundant than 23-nt, 22-nt and 21-nt sRNAs (Table S8, Fig. 9). Interestingly, Ogre CL5 and Retand expressed c. 10-fold more sRNAs than Ogre CL6, Ogre CL11 and two Athila families (Fig. 9) in all tissues. Thus, the Ogre CL5 and Retand elements, both of which are underrepresented on the Y chromosome, are probably more intensively regulated by sRNAs than the other elements.
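The three-way normalization mentioned above can be sketched as follows. The exact formula is given in Methods S1 and is not reproduced here. This version simply assumes a value expressed as reads per kilobase of element, per genomic copy, per million library reads. The function name and the read counts are hypothetical, and only the copy numbers are taken from the text.

```python
def normalize_srna(mapped_reads: float, element_length_bp: float,
                   copy_number: float, library_size: float) -> float:
    """Reads per kb of element, per genomic copy, per million library reads.

    Assumed normalization scheme, not necessarily the one in Methods S1.
    """
    for name, value in (("element_length_bp", element_length_bp),
                        ("copy_number", copy_number),
                        ("library_size", library_size)):
        if value <= 0:
            raise ValueError(f"{name} must be positive")
    per_kb = mapped_reads / (element_length_bp / 1000.0)
    per_copy = per_kb / copy_number
    return per_copy / (library_size / 1_000_000.0)


# Copy numbers from the BAC hybridization estimate reported in the text.
COPY_NUMBER = {"CL5": 4990, "CL6": 4220, "CL11": 2880}

# Hypothetical read counts, LTR length and library size, for illustration only.
example = normalize_srna(mapped_reads=50_000, element_length_bp=2_000,
                         copy_number=COPY_NUMBER["CL5"],
                         library_size=10_000_000)
```

Dividing by copy number matters here because the families differ in abundance, so raw counts alone would overstate the sRNA load of the more numerous families.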
To reveal whether highly abundant 24-nt sRNAs drive the methylation of Ogre elements, we performed bisulphite sequencing of Ogre LTRs. We found that CG and CHG methylation (H = A, T, or C) was largely retained in mature pollen compared with flower buds and leaves. By contrast, CHH methylation was lost in somatic tissues and restored in pollen for most of the studied LTR regions (Fig. 10). As bisulphite-treated reads from pollen showed variable levels of CHH methylation, we performed a hierarchical cluster analysis and separated these reads based on CHH methylation level (see Methods S1). Clusters of reads with high CHH methylation probably represent copies from the vegetative nucleus, whereas those with lower CHH methylation are from sperm cells. Detailed CHH methylation levels and the 24-nt sRNA distribution are depicted in Fig. S6. The colocalization of 24-nt sRNA abundance and increased CHH methylation suggests that 24-nt sRNAs may drive CHH methylation of Ogre LTRs. We did not determine the methylation level in pistils and seeds because of the complexity of the tissues, but we presume that the abundant 24-nt sRNA molecules guide CHH methylation of Ogre elements in the embryo, as suggested by Jullien et al. (2012).
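The distinction between CG, CHG and CHH contexts can be made concrete with a small sketch. It assumes a bisulphite-converted read aligned to its reference without gaps and on the same strand, so that an unmethylated cytosine reads as T and a methylated cytosine stays C. The function names and the toy sequences are hypothetical, and this is not the analysis pipeline described in Methods S1.

```python
from collections import Counter


def cytosine_context(ref: str, i: int):
    """Classify the cytosine at ref[i] as 'CG', 'CHG' or 'CHH' (H = A, C or T).

    Returns None if ref[i] is not C or the downstream bases are missing or ambiguous.
    """
    if ref[i] != "C":
        return None
    first, second = ref[i + 1:i + 2], ref[i + 2:i + 3]
    if first == "G":
        return "CG"
    if first not in ("A", "C", "T"):
        return None
    if second == "G":
        return "CHG"
    if second in ("A", "C", "T"):
        return "CHH"
    return None


def methylation_by_context(ref: str, read: str) -> dict:
    """Fraction of methylated cytosines per context for one aligned read."""
    if len(ref) != len(read):
        raise ValueError("reference and read must have the same length")
    ref, read = ref.upper(), read.upper()
    methylated, total = Counter(), Counter()
    for i in range(len(ref)):
        context = cytosine_context(ref, i)
        if context is None or read[i] not in "CT":
            continue  # skip non-cytosines and uninformative read bases
        total[context] += 1
        if read[i] == "C":  # C retained after conversion means methylated
            methylated[context] += 1
    return {context: methylated[context] / total[context] for context in total}


# Toy example: one cytosine in each context.
levels = methylation_by_context("ACGTCAGTCTTA", "ACGTTAGTCTTA")
```

Per-read CHH fractions computed this way are the kind of value on which reads could then be grouped, for example to separate highly and weakly CHH-methylated copies.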
As shown in Fig. 4, the transcript level was very high in LTRs and low in GAG domains. Thus, full-length polyadenylated Ogre mRNAs were probably rare, possibly as a result of silencing guided by 24-nt sRNAs. The transcript level along LTRs differed for distinct Ogre families and correlated with peaks of sRNA distribution (Fig. S7). Interestingly, the transcript level was very high around the transcription start site of Ogre CL5 in pollen (Fig. S7). This may indicate that Ogre CL5 is more intensively silenced by the RNA-directed DNA methylation (RdDM) machinery in the male germline.
PCR amplification and FISH experiments revealed that Ogre elements are present in all dioecious and nondioecious Silene species, in agreement with previous findings that described Ogre elements as being widespread in plants (Macas & Neumann, 2007). Our results indicate that Ogre elements escaped host control recently, after the common ancestor of the sex-chromosome-possessing Melandrium species had diverged from other Silene species. Thus, the spread of Ogre elements through the genome accompanied the evolution and diversification of the dioecious Silene species. The recentness of their spread is also supported by high pairwise identity of left and right LTRs (Table 1) and the ongoing transcription of Ogre elements (Fig. 4). Interestingly, in the Silene genome, 3.5% of all copies were integrated after splicing, which is similar to the proportions of spliced Ogre copies in other species: 1.5% in pea (Neumann et al., 2003) and 3.2% in Medicago truncatula (Steinbauerová et al., 2008). Splicing seems to be a general property of these giant TEs and may be a mechanism that regulates the gag/pol product ratio (Neumann et al., 2003).
One of the most interesting characteristics of Ogre retrotransposons in Silene is the underrepresentation of Ogre CL5 within the Y chromosome, in contrast to the uniform chromosomal distribution of the Ogre CL6 and Ogre CL11 families. This pattern is not exceptional among plant LTR retrotransposons, as Retand elements are also underrepresented on the Y chromosome of S. latifolia (Kejnovsky et al., 2006a). Similar chromosomal distributions of TEs have been observed in hemp and sorrel (Rumex acetosa) (Sakamoto et al., 2000; Steflova et al., 2013). Here, we addressed the question of why Ogre CL5 is underrepresented on the Y chromosome. The first hypothesis that we tested was that Ogre CL5 expanded before the origin of the Y chromosome while other Ogre elements expanded more recently. However, our model of Ogre evolution within the genus Silene (Fig. 11) shows that activity of all three groups of Ogre elements was highest after the Y chromosome originated and before speciation of three dioecious species. This suggests that other factors must be responsible for the differential chromosomal patterns of distinct Ogre families on the Y chromosome.
Another hypothesis was that Ogres were removed from the nonrecombining part of the Y chromosome. Rapid DNA loss counterbalances genome expansion in plants (Hawkins et al., 2009). If current transposition of Ogre CL5 is slowed as a result of silencing and the DNA removal is faster in the Y chromosome than in others, Ogre CL5 elements would appear underrepresented in the Y chromosome. This explanation is in compliance with DNA loss from Y chromosomes at later evolutionary stages (for a review, see Bachtrog, 2013). However, it is contrary to previous findings that the Y chromosome of S. latifolia accumulates repetitive sequences, microsatellites and chloroplast DNA (Hobza et al., 2006; Kejnovsky et al., 2006b; Kubat et al., 2008), and also to the hypothesis that TEs tend to accumulate in Y chromosomes as a result of the lack of recombination with X chromosomes (Charlesworth, 1991; Steinemann & Steinemann, 2005). Importantly, the histogram of terminal branch lengths in the chronogram obtained in BEAST (Fig. 7) shows that the number of recent insertions is about the same for all Ogre families. Filatov et al. (2009) also reported the age distribution of Ogre insertions in several Silene species and found that, although the maximum of transposition activity was present several millions of years ago, all species contain recent insertions. Thus, the mechanism removing Ogre CL5 from the Y chromosome would have to be very efficient and specific for this TE family. For example, recombination between the left and right LTRs of a single element should remove the internal region of the element, whereas solo LTRs should remain in the genome (Shirasu et al., 2000; Devos et al., 2002). Nevertheless, FISH experiments revealed that the hybridizing signal seems to be identical for both LTR and RT domain probes (Fig. 2). Taken together, these results suggest that loss of Ogre elements on the Y chromosome is not likely to be a sufficient explanation.
As Ogre CL5 is present only in the recombining pseudoautosomal region (PAR) of the Y chromosome, another hypothesis explaining this pattern is that Ogre CL5 spread is in some way connected with recombination. It is known that integrases of some Ty3/gypsy LTR retrotransposons can interact with histones and direct integration of elements into specific chromatin locations (Gao et al., 2008; Neumann et al., 2011). Similar tethering mechanisms, where integrase interacts with various factors, are crucial for insertion and targeting of the murine leukemia virus (MLV) and human immunodeficiency virus (HIV) retroviruses (Buschman, 1994; Buschman & Miller, 1997; Buschman et al., 2005; Lewinski et al., 2006; Cherepanov et al., 2011). Even small changes in the integrase sequence can influence the interaction of integrase with host proteins and thereafter insertion of a retrovirus or LTR retrotransposon into genomic DNA. We showed that the early evolution of all three Ogre families was connected with a high percentage of codons that evolved adaptively (Table 3). The remainder of the codons are under strong purifying selection. Baucom et al. (2009) obtained similar results in rice (Oryza sativa). We detected adaptive evolution mostly in the regions between functional domains and in GAG. We can hypothesize that the other functional domains were selected for their ideal function for a long time and thus further changes do not increase the ability of Ogre elements to insert into chromosomes. The low number of codons consistent with adaptive evolution within Ogre integrases reduces the probability, but does not exclude the possibility, that Ogre CL5 integrase evolved a new requirement to bind a factor present on recombining chromosomes or that Ogre CL6 and Ogre CL11 lost this requirement and started to insert into all chromosomes including Y. Additionally, we found high conservation of both ends of the LTRs (Fig. 1d), where integrase binds (Hindmarsh & Leis, 1999; Cherepanov et al., 2011), and that the base composition of flanking sequences is identical for all Ogre families, supporting the idea that they share the same integration mechanism and lack of integration site selectivity. The presence of Ogre CL5 in the PAR region and not in other parts of the Y chromosome could alternatively be explained by recombinational transfer from the X chromosome, as PAR regions are subject to an increased probability of crossing-over (Otto et al., 2011). For these reasons, we consider it improbable that integration of Ogre CL5, but not Ogre CL6 and CL11, is connected to the recombination machinery, but to completely exclude this possibility, further studies of interactions of integrase with the host proteome will have to be performed.
Another hypothesis is based on specific activity of the Ogre CL5 family in female plants. This can evolve either through interaction of a female factor with Ogre CL5 during its life cycle or through sex-specific transcriptional regulation. However, we found that all Ogre elements are transcribed in both sexes. As the Ogre CL5 transcript level along its LTR was found to be spatially different and higher than the transcript levels of other Ogre families (Fig. S7), the promoter strength of Ogre families probably differs or there may be different mechanisms regulating Ogre families at the transcriptional or posttranscriptional level. Distinguishing between coding and noncoding transcripts is difficult; thus, the excess of RNA transcripts from LTRs over transcripts within the internal region (Fig. 4) can mirror either aberrant RNA Pol II transcripts or long noncoding RNAs that are produced by Pol IV and Pol V involved in RdDM. In this respect, an interesting question emerged: why do some TEs produce much more sRNAs than do others? In Arabidopsis thaliana, the production of 24-nt sRNA molecules is dependent on RNA Pol IV synthesizing long noncoding RNAs (lncRNAs) that are made double-stranded by RNA-dependent RNA polymerase 2 (RDR2) and processed by Dicer-like nucleases (Haag et al., 2012; Wierzbicki, 2012; Wierzbicki et al., 2012). The mechanism controlling RNA Pol IV targeting is still not fully understood. In A. thaliana, recent findings demonstrate that methylated H3K9 is recognized by SAWADEE homeodomain homolog 1 (SHH1), an RNA Pol IV-interacting protein, and that 44% of RNA Pol IV-dependent sRNAs are also SHH1 dependent (Law et al., 2013). Thus, we can speculate that Ogre CL5 and Retand LTRs are targeted by RNA Pol IV more than the other elements, leading to a higher production of sRNAs. This hypothesis is supported by the higher transcript level in Ogre CL5 and Retand LTRs matching the distribution of sRNAs (Fig. S7), and by the excess of transcripts in LTRs over internal regions (Fig. 4). Nevertheless, the reduced transcript level within the internal region can alternatively be explained by aberrant RNA Pol II products or their rapid degradation.
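The comparison of sRNA size classes used in this argument can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name `size_class_fractions` and the toy reads are hypothetical, and the sketch assumes that the reads have already been adapter-trimmed and assigned to a single TE family.

```python
from collections import Counter


def size_class_fractions(reads, classes=(21, 24)):
    """Return the fraction of reads that fall in each sRNA length class.

    `reads` is an iterable of sequence strings assumed to map to one TE family.
    """
    lengths = Counter(len(read) for read in reads)
    total = sum(lengths.values())
    if total == 0:
        # No reads mapped: report zero for every class instead of dividing by zero.
        return {n: 0.0 for n in classes}
    return {n: lengths[n] / total for n in classes}


# Toy reads invented for illustration only (not data from this study).
toy_reads = ["A" * 24, "C" * 24, "G" * 21, "T" * 22]
fractions = size_class_fractions(toy_reads)
```

Comparing such fractions between TE families is meaningful only after normalization for library size and element copy number, which this sketch does not attempt.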
The high level of sRNAs complementary to Ogre CL5 and Retand elements indicates a possible role of epigenetic mechanisms in regulating these TEs. Their conspicuous chromosomal pattern suggests that the mechanism regulating their spread has to be conserved through many generations. Thus, epigenetic changes in the germline and during development may provide an explanation. Recent findings suggest that, in A. thaliana, the mechanism for transposon control has evolved in reproductive cells where TEs are reactivated by genome-wide loss of DNA methylation during both male and female gametogenesis and embryonal development. In the vegetative nucleus of pollen grains, TEs are demethylated as a result of the active effect of Demeter (DME) and down-regulation of the heterochromatin remodeller decrease in DNA methylation 1 (DDM1), which leads to TE transcription and subsequent production of 21-nt sRNA molecules. These sRNAs then travel to the transcriptionally silent sperm cells to reinforce TE silencing (Slotkin et al., 2009; Ibarra et al., 2012). As Ogre CL5 accumulates much higher levels of 21-nt sRNAs than the other elements, we can speculate that it is silenced more intensively in pollen.
Whereas DDM1 is required for asymmetric DNA methylation of internal regions of long TEs and is mediated by chromomethylase 2 (CMT2) separately from RdDM (Zemach et al., 2013), TE edges (LTRs) are also methylated by the RdDM pathway mediated by domains rearranged methyltransferase 2 (DRM2). During pollen development, CHH methylation is gradually lost from microspores and sperm cells, while it is restored in the vegetative nucleus where DRM2 is substantially accumulated (Calarco et al., 2012). We showed that there are two clusters of Ogre elements with low and high CHH methylation levels in pollen. We presume that the highly methylated elements represent copies from the vegetative nucleus, while the demethylated elements are from sperm cells. This suggests that the mechanism of epigenetic reprogramming is probably similar in A. thaliana and Silene. The increased CHH methylation of Ogre elements is probably guided by 24-nt sRNAs, as indicated by the colocalization of sRNAs and CHH methylation level in pollen (Fig. S6). As the transcript level inside Ogre elements is very low in pollen (Fig. 4), all Ogre elements may be silent and unable to retrotranspose in the male germline. Alternatively, the Ogre CL6 and CL11 families are not fully silenced because of lower 24-nt sRNA levels (Fig. 9), and only Ogre CL5 is prevented from transposing into the paternal genome and thus into the Y chromosome. However, Retand produces more complete transcripts, suggesting that this element remains active in pollen. This would explain its significantly lower presence on the Y chromosome.
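The sequence contexts referred to here (CG, CHG and CHH, where H is A, C or T) can be made explicit with a short sketch. It is illustrative only and is not the bisulphite analysis performed in this study: the function names and the per-site read counts are hypothetical, and only forward-strand cytosines are considered.

```python
def cytosine_context(seq, i):
    """Classify the cytosine at index i of a forward-strand sequence.

    Returns "CG", "CHG" or "CHH", or None when the context cannot be
    determined (too close to the 3' end, or an ambiguous base downstream).
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position is not a cytosine")
    downstream = seq[i + 1:i + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2 or any(base not in "ACGT" for base in downstream):
        return None
    return "CHG" if downstream[1] == "G" else "CHH"


def methylation_by_context(seq, calls):
    """Pool per-site counts into a methylation level per sequence context.

    `calls` maps a cytosine index to (methylated_reads, total_reads).
    """
    pooled = {}
    for index, (methylated, total) in calls.items():
        context = cytosine_context(seq, index)
        if context is None:
            continue
        m, t = pooled.get(context, (0, 0))
        pooled[context] = (m + methylated, t + total)
    return {context: m / t for context, (m, t) in pooled.items() if t > 0}


# Toy sequence and read counts invented for illustration only.
toy_seq = "CGACAGCAT"
toy_calls = {0: (9, 10), 3: (5, 10), 6: (1, 10)}
levels = methylation_by_context(toy_seq, toy_calls)
```

Pooling read counts per context, rather than averaging per-site ratios, weights each site by its coverage. This is one of several reasonable choices.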
A similar phenomenon is thought to occur in the female gametophyte embryo sac of A. thaliana (Gehring et al., 2009; Hsieh et al., 2009) and can be used to explain the absence of Ogre CL5 on the Y chromosome. During female gametogenesis, the activity of the maintenance DNA methyltransferases methyltransferase 1 (MET1) and CMT3 required for CG and CHG methylation is undetectable and this leads to progressive demethylation, activation and transposition of TEs in the maternal genome. By contrast, both maintenance and de novo DNA methyltransferases DRM1 and DRM2 are expressed strongly in the embryo, while the same genes remain inactive in the central cell and endosperm. Reactivated TEs in the endosperm then serve as a source of migrating sRNAs that guide gradual remethylation of TEs after fertilization and throughout embryogenesis (Calarco & Martienssen, 2012; Jullien et al., 2012). Relatively small changes in the level of 24-nt sRNAs should be enough to reactivate TEs, as shown in cotton (Gossypium hirsutum) (Romanel et al., 2012). Thus, we hypothesize that high levels of 24-nt sRNAs derived from Ogre CL5 and Retand elements ensure that these two TE families are preferentially methylated in the CHH sequence context by the de novo methyltransferases DRM1 and DRM2, while TEs with lower 24-nt sRNA levels remain active for some time during embryogenesis. Silencing in pollen, together with silencing in the embryo, leads to complete protection of the paternal genome from insertions of Ogre CL5. Nevertheless, further studies of sRNA abundance, transcription and methylation level will have to be performed in pollen and seed (embryonal) tissues.
In summary, we found that three Ogre families escaped host regulation recently and spread intensively in the dioecious Melandrium species, resulting in significant genome size increases. We suggest that LTRs coding for regulatory sequences are remethylated by the RdDM pathway in pollen and assume that they may also be methylated in the embryo. The 10-fold greater number of sRNAs targeting one Ogre family (Ogre CL5) probably results in rapid silencing of this particular element in the male germline and in the embryo after fertilization. This presumed silencing mechanism may have resulted in the absence of Ogre CL5 on the Y chromosome. We also hypothesize that similar epigenetic mechanisms may account for the absence of TEs on the sex chromosomes of other plant species. Nevertheless, other mechanisms, such as TE insertion connected with recombination or the need for female factors to allow TE spreading, cannot be excluded. Last but not least, a similar mechanism may influence maternal genome-specific spreading of TEs in other plant species without sex chromosomes.
We thank Dr Jiri Siroky for performing in situ hybridization experiments on cryosections of anthers and ovaries and Eleni Michu for providing genic sequences. The access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme ‘Projects of Large Infrastructure for Research, Development, and Innovations’ (LM2010005), is greatly appreciated. This work was supported by grants from the Czech Science Foundation (grant nos. P501/10/P483, P501/10/0102, P305/10/0930 and P501/12/2220), by the project ‘CEITEC – Central European Institute of Technology’ (CZ.1.05/1.1.00/02.0068) from the European Regional Development Fund and by the project OPVK (CZ.1.07/2.3.00/20.0045). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.