Short interfering RNAs (siRNAs) homologous to transcriptional regulatory regions can induce RNA-directed DNA methylation (RdDM) and transcriptional gene silencing (TGS) of target genes. In our system, siRNAs are produced by transcribing an inverted DNA repeat (IR) of enhancer sequences, yielding a hairpin RNA that is processed by several Dicer activities into siRNAs of 21–24 nt. Primarily 24-nt siRNAs trigger RdDM of the target enhancer in trans and TGS of a downstream GFP reporter gene. We analyzed siRNA accumulation from two different structural forms of a trans-silencer locus in which tandem repeats are embedded in the enhancer IR and distinguished distinct RNA polymerase II (Pol II)- and Pol IV-dependent pathways of siRNA biogenesis. At the original silencer locus, Pol-II transcription of the IR from a 35S promoter produces a hairpin RNA that is diced into abundant siRNAs of 21–24 nt. A silencer variant lacking the 35S promoter revealed a normally masked Pol IV-dependent pathway that produces low levels of 24-nt siRNAs from the tandem repeats. Both pathways operate concurrently at the original silencer locus. siRNAs accrue only from specific regions of the enhancer and embedded tandem repeat. Analysis of these sequences and endogenous tandem repeats producing siRNAs revealed the preferential accumulation of siRNAs at GC-rich regions containing methylated CG dinucleotides. In addition to supporting a correlation between base composition, DNA methylation and siRNA accumulation, our results highlight the complexity of siRNA biogenesis at repetitive loci and show that Pol II and Pol IV use different promoters to transcribe the same template.
RNA-directed DNA methylation (RdDM) is a short interfering RNA (siRNA)-mediated epigenetic pathway that contributes to the transcriptional silencing of transposons and other repetitive sequences in plants. RdDM requires specialized transcriptional machinery that includes two RNA polymerase II (Pol II)-related, plant-specific RNA polymerases, called Pol IV and Pol V, as well as a number of ancillary factors that assist Pol-IV and Pol-V transcription, and participate in the silencing effector complex. Pol IV initiates siRNA biogenesis by transcribing a single-stranded RNA that is copied into double-stranded RNA by RNA-DEPENDENT RNA POLYMERASE 2 (RDR2). The double-stranded RNA is processed by DICER-LIKE 3 (DCL3) to produce 24-nt siRNAs that are the major trigger for DNA methylation. The Pol-V complex acts downstream of siRNA biogenesis to generate scaffold RNAs that are believed to base-pair with complementary siRNAs incorporated into ARGONAUTE 4, resulting in the recruitment of DOMAINS REARRANGED METHYLTRANSFERASE 2 (DRM2) to catalyze de novo methylation of DNA at the target site (He et al., 2011; Pikaard et al., 2012; Wierzbicki, 2012; Matzke and Mosher, 2014).
Short interfering RNAs of 24 nt can also be generated in a Pol IV–RDR2-independent pathway, wherein Pol II transcribes a DNA inverted repeat (IR), producing a single-stranded RNA that is self-complementary and folds back on itself to form an RNA hairpin. The hairpin RNA is processed redundantly by several DCL enzymes, including DCL3, into siRNAs of 21–24 nt (Dunoyer et al., 2005; Kanno et al., 2008; Daxinger et al., 2009). Transgene IRs are frequently used to silence genes at the transcriptional (Eamens et al., 2008; Kanno et al., 2008; Finke et al., 2012) and post-transcriptional levels (Dunoyer et al., 2005; Fusaro et al., 2006; Molnar et al., 2010). In Arabidopsis thaliana (Arabidopsis), several endogenous IRs that give rise to hairpin-derived siRNAs have been described (Henderson et al., 2006; Zhang et al., 2007; Dunoyer et al., 2010).
As assessed by genome-wide patterns of DNA methylation and siRNA accumulation, the primary targets of RdDM are transposons and non-protein coding repeats (Cokus et al., 2008; Lister et al., 2008). These sequences are most abundant in pericentromeric heterochromatin, where they are targets of both RdDM-dependent and RdDM-independent epigenetic silencing pathways (Zemach et al., 2013); however, a number of targets of RdDM are also present in genic and intergenic regions in euchromatin (Huettel et al., 2006; Lee et al., 2012; Zemach et al., 2013). Features that recruit Pol IV and Pol V are not yet fully understood, but specific nucleic acid structures, RNA and certain chromatin modifications may all play a role (Haag and Pikaard, 2011). Two recent studies suggested that Pol IV is recruited to a subset of its targets through its association with SAWADEE HOMEODOMAIN HOMOLOG 1/DNA-BINDING TRANSCRIPTION FACTOR 1 (SHH1/DTF1), which binds histone H3 methylated at lysine 9 through its SAWADEE domain (Law et al., 2013; Zhang et al., 2013). Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) analyses of genome-wide Pol-V occupancy identified up to two thousand peaks corresponding to transposons and genes that are distributed among all five Arabidopsis chromosomes (Wierzbicki et al., 2012; Zhong et al., 2012). A common sequence motif at these loci that might be involved in recruiting Pol V was not identified (Wierzbicki et al., 2012). It therefore remains unclear whether Pol IV and Pol V recognize and use conventional promoters in a way similar to Pol II (Haag and Pikaard, 2011).
Various tandem repeats, such as those in the promoter regions of FLOWERING WAGENINGEN (FWA; Soppe et al., 2000; Chan et al., 2006) and SUPPRESSOR OF DRM1 DRM2 CMT3 (SDC; Henderson and Jacobsen, 2008) in Arabidopsis, and at the b1 locus in Zea mays (maize) (Sidorenko et al., 2009), have been identified as targets of RdDM and sources of Pol IV–RDR2-dependent 24-nt siRNAs. This may reflect an inherent ability of tandem repeats to sustain RDR-DCL-dependent siRNA biogenesis (Martienssen, 2003). Similarly to other types of repeat, the majority of tandem repeats with overlapping siRNAs and DNA methylation are present in pericentromeric heterochromatin. A previous survey of unique tandem repeats in euchromatic regions found that only a small percentage are associated with siRNAs and DNA methylation in Arabidopsis (Chan et al., 2006). Thus, tandem repeats – particularly in euchromatic contexts – are not invariably targets of Pol IV- and Pol V-mediated RdDM. The reasons for the differential siRNA accumulation and methylation of euchromatic tandem repeats remain unclear.
We used a transgene silencing system that incorporates both an IR and tandem repeats in a forward-genetic screen to identify factors required for RdDM and transcriptional gene silencing (TGS) in Arabidopsis. The silencing system consists of a target (T) locus containing a GFP reporter gene under the control of an upstream target enhancer, and an unlinked silencer (S) locus containing an IR of target enhancer sequences downstream of the 35S promoter. A short tandem repeat comprising three copies of a 42-bp monomer is embedded in the enhancer sequence, which is derived from an endogenous pararetrovirus (EPRV) of Nicotiana tomentosiformis (Gregor et al., 2004; Chabannes and Iskra-Caruana, 2013). Pol-II transcription of the enhancer IR produces a hairpin RNA that is processed by several DCL activities to produce 21-, 22- and 24-nt siRNAs (Kanno et al., 2008). Primarily the 24-nt size class, which is produced by DCL3, induces methylation in trans of the target enhancer at the T locus, resulting in TGS of the GFP gene (Daxinger et al., 2009; Figure 1). To identify mutants in this RdDM pathway, we treated doubly homozygous seeds, which harbor T and S transgenes, with ethyl methanesulfonate (EMS) and screened M2 seedlings for release of GFP silencing (Kanno et al., 2008). We retrieved 13 dms (defective in meristem silencing) mutants harboring recessive, loss-of-function mutations in genes encoding Pol-V pathway components (Eun et al., 2012).
In addition to mutations in genes encoding RdDM factors, our screen could also potentially recover so-called cis mutants in which structural alterations of the S locus abolished the silencing ability and restored GFP expression in otherwise wild-type plants. Here, we report two cis mutants identified in our screen and describe how one allowed us to dissect distinct Pol II- and Pol IV-dependent pathways of siRNA biogenesis operating at the original silencer locus. We compare these two pathways and discuss sequence features that promote siRNA accumulation at tandem repeats in euchromatin.
Similarly to the dms mutants, two cis mutants, named 1.6x and 15.5a, were identified by their GFP-positive phenotype in a T+S background. As described below, these mutants did not appear to contain hairpin-derived siRNAs (21–24 nt) in an initial northern blot analysis, suggesting that structural changes at the S locus may be responsible for the observed deficiency in GFP silencing. Further investigation confirmed this hypothesis.
Structure of silencer locus in cis mutants
Polymerase chain reaction (PCR) analysis revealed that the 1.6x mutant contained a substantially altered S locus, lacking both the enhancer IR and 35S promoter (Figure 2). The IR may be inherently unstable, rendering this region susceptible to spontaneous deletion. This potential instability should be considered when using IRs to induce gene silencing in transgenic plants. The extensive deletion in the 1.6x mutant accounts for both the absence of hairpin-derived siRNAs and the unmethylated state of the target enhancer (Figure S1).
For the 15.5a mutant, PCR analysis and whole-genome sequencing revealed an unusual structural variant of the S locus in which only the 35S promoter was deleted, by an unknown mechanism, leaving the enhancer IR largely intact (Figures 2, S2). We will hereafter refer to the altered silencer locus as SΔ35S and the 15.5a cis mutant as T+SΔ35S, and the original unaltered silencer locus as SWT and plants containing this locus as T+SWT (‘WT’, standing for original unaltered or ‘wild-type’ silencer). The lack of readily detectable hairpin-derived siRNAs in our initial northern blot analysis of T+SΔ35S plants (Figure S1) thus reflected the loss of Pol-II transcription of the enhancer IR following the deletion of the 35S promoter.
GFP expression and methylation of target enhancer in T+SΔ35S plants
Unexpectedly, despite the absence of hairpin-derived siRNAs in T+SΔ35S plants, GFP expression was still partially silenced, as assessed by the visualization of GFP fluorescence in seedlings (Figure 3a, compare T+SΔ35S with unsilenced T and fully activated T+SWTnrpe1) and GFP protein levels on western blots (Figure 3b, compare lanes 2 and 4). Consistent with partial GFP silencing, the target enhancer retained residual methylation, which was restricted to the tandem repeat region (Figure 3c, T+SΔ35S). This methylation pattern differed from that observed in T+SWT plants, where GFP is fully silenced and methylation extends throughout the entire target enhancer (Figure 3c, T+SWT). Similarly to the enhancer methylation in T+SWT plants (Kanno et al., 2008; Daxinger et al., 2009), the tandem repeat methylation in T+SΔ35S plants is the result of RdDM, as indicated by its loss in a mutant defective in Pol-V function (nrpe1) and dependence on the SΔ35S locus (Figure S3, T+SWTnrpe1).
Accumulation of siRNAs in T+SΔ35S plants
In view of the tandem repeat methylation in the T+SΔ35S plants and previous reports that tandem repeats can be sources of Pol IV-dependent siRNAs (Chan et al., 2006; Henderson and Jacobsen, 2008; Sidorenko et al., 2009), we tested again for siRNAs in T+SΔ35S plants by northern blotting using a probe specific to the tandem repeat region. After a long exposure time, we were able to detect a faint 24-nt band in these plants (Figure 4, lane 2). Unlike Pol II-dependent hairpin-derived siRNAs (21–24 nt), which were detectable on northern blots in T+SWT plants in a Pol-IV (nrpd1) mutant background (Figure 4, lane 4), the 24-nt siRNAs in T+SΔ35S plants disappeared in the nrpd1 mutant (Figure 4, lanes 5 and 6). The disappearance of the 24-nt siRNAs coincided with a complete loss of methylation in the tandem repeat at the T locus (Figure 3c, T+SΔ35Snrpd1) and with a full reactivation of GFP expression (Figure 3a, T+SΔ35Snrpd1; Figure 3b, lanes 5 and 6). By contrast, the extensive methylation induced by Pol II-dependent, hairpin-derived siRNAs originating from the intact SWT locus was not eliminated in an nrpd1 mutant (Figure 3c, T+SWTnrpd1; Daxinger et al., 2009), nor was GFP expression reactivated (Figure 3a, T+SWTnrpd1; Figure 3b, lane 3).
T+SΔ35S plants thus revealed a previously obscured class of Pol IV-dependent 24-nt siRNAs that only became evident following the deletion of the 35S promoter at the SΔ35S locus, which eliminated the more abundant Pol II-dependent hairpin-derived siRNAs. Despite their relatively low abundance, Pol IV-dependent 24-nt siRNAs were able to trigger methylation in the tandem repeat at the T locus, resulting in the partial silencing of GFP expression in T+SΔ35S plants. Pol IV-dependent biogenesis of tandem repeat 24-nt siRNAs relies at least to some extent on RDR2, as illustrated by the partial loss of GFP silencing and target enhancer methylation in T+SΔ35Srdr2 plants (Figure S4).
Pol II- and Pol IV-dependent pathways appear to contribute contemporaneously to siRNA accumulation in T+SWT plants, as shown by a northern blot analysis of siRNA accumulation in these plants, and in mutants deficient in Pol-II and Pol-IV function. Pol II-dependent 21-nt siRNAs were the major size class in the nrpd1 mutant, in which Pol IV-dependent 24-nt siRNAs are substantially reduced relative to T+SWT plants (Figure 4, compare lanes 4 and 1). By contrast, Pol IV-dependent 24-nt siRNAs were the major size class in a hypomorphic nrpb2 mutant, which displayed lower levels of Pol II-dependent 21- and 22-nt siRNAs, compared with T+SWT plants (Figure 4, compare lanes 7 and 1; Figure S5). The pattern of siRNA accumulation in T+SWT plants thus represents the sum of Pol II- and Pol IV-dependent siRNAs, suggesting that these pathways operate concurrently at the original unaltered silencer locus.
Biased accumulation of Pol II- and Pol IV-dependent siRNAs at target enhancer
To examine siRNA accumulation from the Pol-II and Pol-IV pathways in more detail, we deeply sequenced small RNAs from T+SWT and T+SΔ35S plants. A markedly non-uniform distribution of siRNAs from both pathways was observed along the enhancer sequence (Figure 5). A similar pattern of siRNA accumulation in T+SWT plants was observed in an independent study (Melnyk et al., 2011). Both Pol II- and Pol IV-dependent siRNAs were concentrated in the tandem repeat region and in a short upstream peak (Figure 5a, T+SWT and T+SΔ35S). Comparable accumulation patterns were observed in the absence of the T locus (Figure 5a, SWT and SΔ35S), which produced negligible quantities of siRNAs on its own (Figure 5a, T), confirming that essentially all siRNAs originated at the enhancer sequences of the respective silencer locus. As expected from the northern blot analysis, despite the similar accumulation patterns for Pol II- and Pol IV-dependent siRNAs, the major size class differed for the two pathways, with the predominant species being 21 and 24 nt, respectively (Figure 5b). As reported previously, siRNA generation downstream of the enhancer sequence was observed only in T+SWT plants (Figure S6; Daxinger et al., 2009), consistent with the spread of methylation into this region (Figure 3c).
In principle, the concentration of siRNAs in the tandem repeat region can be explained by the proposed affinity of Pol IV for such repeats (discussed further below); however, the uneven distribution of Pol II-dependent siRNAs was unanticipated because they are produced from a perfectly base-paired hairpin RNA that is expected to be processed homogeneously by DCL activities (Dalakouras and Wassenegger, 2013). One possibility is that Pol II-dependent siRNAs may accumulate preferentially from internal regions of the IR, which contain the tandem repeats (Figure 1), as suggested in an earlier study (Stam et al., 1998). Another factor to consider is sequence composition. This possibility was suggested by the siRNA-depleted spacer between the tandem repeats and upstream peak (Figure 5a, all genotypes). This spacer region is relatively GC-poor compared with the flanking upstream peak and tandem repeats (Figure S7).
siRNA accumulation biases at another transgene IR and endogenous tandem repeats
To extend our analysis to additional repeats, we examined siRNA accumulation at another transgenic IR for which deep sequencing data are available and at endogenous tandem repeats in euchromatin.
In a previous study, deep sequencing of siRNAs in a transgenic line expressing an IR of ‘GF’ sequences to silence a GFP reporter gene revealed distinct ‘hot spots’ of siRNA accumulation (Molnar et al., 2010). We analyzed the sequencing data and found that the siRNA peaks represent the more internal parts of the IR and generally correspond to regions that contain the most CG dinucleotides (Figure S8).
Using etandem (http://emboss.sourceforge.net/apps/release/6.0/emboss/apps/etandem.html) to screen the Arabidopsis genome (TAIR10 version), we identified 786 euchromatic tandem repeats, defined as being at least 4 Mb away from the centromere, with a minimum monomer size of 40 bp and a score >80 (calculated directly by etandem; Table S1). Of these, 282 are in intergenic (IGN) regions, 486 are in genes, four are in pseudogenes and 14 are in transposons (Figure S9; Table S1, column I). The majority (68.1%) of IGN tandem repeats are sources of Pol IV-dependent 24-nt siRNAs, compared with only around 9.5% of the tandem repeats in genic regions (Figure S9; Table S1, column L). Consistent with a sequence bias of siRNA accumulation, regions of tandem repeats (all categories) that accumulate more siRNAs are significantly GC-rich (Figure 6a, P < 10⁻¹⁰). Moreover, of the IGN and genic tandem repeats that accumulate Pol IV-dependent 24-nt siRNAs, most (91.1 and 78.3%, respectively) contain methylated CG dinucleotides, whereas the majority of IGN and genic tandem repeats that do not accumulate 24-nt siRNAs (82.2 and 72.3%, respectively) lack CG methylation (Figures 6b, S9A) or CG dinucleotides altogether (Table S1, column C, red zeroes). Tandem repeats in pseudogenes and transposons showed a similar trend (Figure S9A; Table S1, column C). Examples of the correlation between siRNA accumulation and regions of tandem repeats that are relatively GC-rich and contain methylated cytosines are shown in Figure S10. Note that in these examples CG methylation is not lost in an nrpd1 mutant.
We exploited a structural variant identified in a forward genetic screen to dissect distinct Pol II- and Pol IV-dependent pathways of siRNA biogenesis at a trans-silencer locus containing inverted and tandem repeats. In addition to highlighting the complexity of siRNA accumulation at compound repetitive loci, our studies allow comparisons between the two pathways operating at a single locus and help to identify general features influencing siRNA accumulation at repeats in euchromatin.
As expected, abundant hairpin-derived siRNAs (21–24 nt) were detected in T+SWT plants, where Pol II transcribes the enhancer IR from a 35S promoter. Because the siRNAs (21–24 nt) accrued from both the 5′ and 3′ ends of the target enhancer (upstream peak and tandem repeat, respectively), we presume that the hairpin RNA formed from a full-length, or nearly full-length, single-stranded transcript of the enhancer IR. Pol IV–RDR2-dependent 24-nt siRNAs detected in the T+SΔ35S mutant derived primarily from the tandem repeat, with lesser quantities accumulating from the upstream peak. The ability of Pol IV to carry out transcription after deletion of the 35S promoter indicates that it can function in the absence of a constitutive Pol-II promoter. The identity of the promoter used by Pol IV is not clear, but it is likely to reside in or near the tandem repeat itself. Whether this tandem repeat binds transcription factors is not known. Tandem repeats that mediate paramutation at the b1 locus in maize contain transcriptional regulatory sequences and give rise to Pol IV-dependent siRNAs (Belele et al., 2013).
Pol II-dependent siRNAs were considerably more abundant than Pol IV-dependent siRNAs, which became noticeable only when the Pol-II pathway was eliminated in T+SΔ35S plants. Although they produce different quantities of siRNAs, the Pol II- and Pol IV-dependent pathways are not mutually exclusive, and can apparently coexist at the unaltered silencer locus in T+SWT plants (Figure 7). Both pathways generate siRNAs from the tandem repeat, indicating that each polymerase can use this region as a template for transcription. Thus, chromatin features facilitating transcription by one polymerase do not necessarily preclude transcription by the other; however, the lower levels of siRNAs produced in the Pol-IV pathway suggest that it is much less efficient than the Pol-II pathway.
Although our study does not identify the promoter used by Pol IV, it does illuminate features promoting Pol IV-dependent accumulation of siRNAs at tandem repeats. Tandem repeats are often sites of amplification of Pol IV–RDR2-dependent siRNAs (Chan et al., 2006; Henderson and Jacobsen, 2008; Sidorenko et al., 2009), which is attributed to the ability of RDR and DCL to more efficiently generate siRNAs from a tandem array than from a single copy sequence (Martienssen, 2003). In our study, the major peaks of accumulation of Pol IV-dependent 24-nt siRNAs were observed at the tandem repeats, suggesting that Pol IV indeed recognizes these repeats and amplifies siRNAs preferentially in this region. However, we also identified a potential contribution of sequence composition to preferential siRNA accumulation. The tandem repeats as well as the upstream ‘peak’, which also preferentially accumulates siRNAs, are generally GC-rich compared with the intervening siRNA-depleted spacer region. A similar sequence bias in siRNA accumulation within a tandem repeat monomer was reported for the b1 tandem array in maize, where siRNAs accumulated preferentially from the 5′ half of the monomer unit, and not from the AT-rich 3′ end (Arteaga-Vazquez et al., 2010).
Prompted by these findings, we expanded our analysis by examining siRNA accumulation at 786 endogenous tandem repeats located in euchromatin. This survey revealed that nearly 70% do not accumulate Pol IV-dependent 24-nt siRNAs, substantiating a previous study showing that euchromatic tandem repeats are not invariably targeted by Pol IV–RDR2 activities (Chan et al., 2006). Around three-quarters of the tandem repeats lacking siRNAs are present in genes, where they might be protected from Pol-IV activity by unidentified chromatin features associated with Pol-II transcription of protein-coding genes. By contrast, the majority of tandem repeats in IGN regions accumulate Pol IV-dependent 24-nt siRNAs and, similarly to the enhancer tandem repeat in the T+S system, most of these accrue from more GC-rich segments containing methylated CG dinucleotides. The CG methylation persists in an nrpd1 mutant, indicating that it is maintained independently of RdDM. A scenario consistent with these findings is that Pol IV is recruited to tandem repeats in IGN regions by pre-existing CG methylation (Chan et al., 2006), perhaps established by 21-nt siRNAs and Pol II-dependent mechanisms (Zheng et al., 2009; Nuthikattu et al., 2013; Stroud et al., 2013), and subsequent siRNA amplification occurs through RDR2 and DCL3 activities (Martienssen, 2003). The resulting 24-nt siRNAs induce non-CG methylation to create the specific pattern of methylation of cytosines in all sequence contexts observed at repeats and transposons (Saze and Kakutani, 2011).
As discussed above, the presence of tandem repeats and variations in GC content and CG methylation appear to bias the accumulation of Pol IV-dependent 24-nt siRNAs in the T+S system; however, a similar non-uniform siRNA profile was observed for Pol II-generated, hairpin-derived siRNAs (21–24 nt) that are largely independent of Pol-IV activity. This result contrasts with the expectation that Pol II-dependent hairpin-derived siRNAs would be distributed more or less evenly along the entire inverted repeat that constitutes the Pol-II transcription unit (Dalakouras and Wassenegger, 2013). Given the comparable distribution patterns for both Pol IV- and Pol II-dependent siRNAs, we presume that similar factors, including GC content and CG methylation, influence the accumulation of siRNAs of all sizes in both pathways. The enhanced siRNA accumulation at sequences with a higher GC content and methylated CG dinucleotides is consistent with previous studies showing that CG and non-CG methylation can feed back positively on 24-nt siRNA accumulation, perhaps reflecting an affinity of Pol IV for methylated DNA (Law and Jacobsen, 2010; Haag and Pikaard, 2011; Stroud et al., 2013). Our data suggest that the accumulation of 21-nt siRNAs is likewise influenced by sequence composition. In accordance with this result, a recent study revealed a significant correlation between high GC content and increased expression of plant siRNAs of all size classes (Zhang et al., 2014).
A further factor that may influence the preferential accumulation of Pol II-dependent siRNAs from the tandem repeats in the T+S system is their position close to the center of the IR, which might be more stably base-paired than distal regions in the hairpin RNA (Stam et al., 1998). This idea is supported by another transgenic silencer locus we examined, in which the hot spots of ‘GF’ siRNA accumulation are also derived from internal regions of the IR (Molnar et al., 2010). This internal region does not contain tandem repeats but is relatively rich in CG dinucleotides. An siRNA peak was also detected close to the center of endogenous IR-71 and, similarly to the enhancer IR in the T+S system, this central region contains a short tandem repeat (two copies of a 60-bp monomer; Henderson et al., 2006). Conceivably, this tandem repeat could promote preferential siRNA accumulation by further stabilizing base pairing in this region of the RNA hairpin or by attracting Pol IV–RDR2. Local sequence elements or structural features have also been proposed to promote preferential generation of siRNAs involved in post-transcriptional gene silencing (De Paoli et al., 2009). Further work is required to understand fully the siRNA sequence biases that are observed in deep-sequencing efforts. Knowledge gained from such investigations will guide efforts to optimize siRNA production for gene-silencing applications in basic and applied research.
Previous publications describe the isolation of nrpb2-3 (Zheng et al., 2009), nrpd1-7 (Smith et al., 2007) and nrpe1-10 (Kanno et al., 2008) mutants, which are defective in Pol-II (hypomorphic allele), Pol-IV and Pol-V function, respectively. The T-DNA insertion line rdr2-2 (SALK_059661) was obtained from the Arabidopsis Biological Resource Center (ABRC, http://abrc.osu.edu). The 1.6x and 15.5a (T+SΔ35S) cis-mutants (identified in seed batches 1 and 15, in the sixth and fifth screens, respectively) were identified by the GFP-positive phenotype of M2 seedlings derived from EMS-mutagenized M1 seeds of our T+S transgenic line (Kanno et al., 2008; Eun et al., 2012). The T and S transgene loci are present in intergenic regions on the top and bottom arms of chromosome 1 at nucleotide positions 10 343 986 and approximately 17 480 000, respectively. Silencer (S) strains (SWT and SΔ35S) were generated by crossing out the target (T) transgene from T+SWT and T+SΔ35S lines. Silencer and target transgenes were genotyped using primers shown in Table S2. The sequences of the T and S transgene constructs are available under the accession numbers HE582394 and HE584556, respectively.
The sequence of the silencer locus in the T+SΔ35S line (Figure S2) was deduced from whole-genome sequence data. Paired-end sequencing of a genomic DNA library was performed with 72-bp read length using an Illumina Genome Analyzer II, as described previously (Eun et al., 2011; Sasaki et al., 2012). For assembling the sequence of the silencer (SΔ35S) locus in the T+SΔ35S line, the sequence of the SWT locus (accession number HE584556) was used as a reference sequence (Eun et al., 2011). The whole-genome sequencing data from the T+SΔ35S line are available from the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) Sequence Read Archive (SRA), under accession number SRX312270.
DNA methylation analysis of target enhancer
Bisulfite sequencing was used to determine the DNA methylation status of the target enhancer at the T locus. Genomic DNAs were extracted using a DNeasy Plant Mini Kit (Qiagen, http://www.qiagen.com). For bisulfite sequencing, 1 μg of genomic DNA was digested by HindIII and purified by QIAquick PCR purification kit (Qiagen). DNA (500 ng) was used for bisulfite conversion of unmethylated cytosines to uracils using the EpiTect Bisulfite Kit (Qiagen). Converted DNA was used as a template for PCR using the primers listed in Table S2. One of the primers used was specific for the target locus so that no fragment from the copies of the enhancer IR at the silencer locus was amplified. Amplified fragments were cloned using the T-Easy Vector System (Promega, http://worldwide.promega.com), and 14–20 clones were sequenced.
Small RNA northern blotting analysis
Small RNA was isolated from 250 mg of mixed floral inflorescences using a mirVana miRNA Isolation Kit (Ambion, now Life Technologies, http://www.lifetechnologies.com). RNA was separated by electrophoresis on a 15% denaturing polyacrylamide/urea gel and transferred to Hybond-N+ (GE Healthcare, http://www.gehealthcare.com) by electroblotting using a Trans-Blot SD Semi-Dry Transfer Cell (Bio-Rad, http://www.bio-rad.com). The membrane was hybridized overnight with [γ-32P]ATP-labeled oligonucleotides (Table S2) using Ultra Hyb oligo Buffer (Ambion) at 40°C. The two probes used are for the top strand, and cover the entire 42-bp tandem repeat monomer. The membrane was washed twice at 40°C in 2X SSC, 0.5% SDS for 15 min, and exposed to X-ray film for 4 days for hairpin-derived siRNAs and 5 h for miR171 at −80°C.
Western blotting analysis
The protein extraction, SDS-PAGE and western blotting to detect GFP protein were carried out as previously described (Sasaki et al., 2012).
Small RNA deep sequencing
For small RNA libraries, total RNA from the materials described above was isolated using Tri Reagent™ (Molecular Research Center, Inc., http://www.mrcgene.com). Small RNA libraries were constructed using the Illumina TruSeq Small RNA Sample Preparation Kit (RS-200-0012) and sequenced on an Illumina HiSeq2000 instrument at the Delaware Biotechnology Institute of the University of Delaware. Raw sequencing data were first trimmed of adapter sequences, then the read counts were normalized based on the total abundance of genome-matched small RNA reads, excluding structural sRNAs originating from annotated tRNA, rRNA, small nuclear and small nucleolar RNAs. For the transgene loci (wild-type or mutated Silencer), bowtie (http://bowtie-bio.sourceforge.net/index.shtml) was used to map small RNA reads to the transgene sequences, with the small RNA abundance expressed as reads per five million total genome-matched reads (RP5M). For whole-genome small RNA analyses, the ‘hits normalized abundance’ (HNA) values were calculated by dividing the normalized abundance (in RP5M) of each small RNA read by its number of hits, where the number of hits is defined as the number of locations at which a given sequence perfectly matches the genome. The small RNA sequence data are available from the NCBI Gene Expression Omnibus (GEO) under GEO series accession number GSE47852.
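The two normalizations described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the read identifiers and all numbers are hypothetical, and it assumes that raw counts and per-read genome hit counts have already been obtained from the mapping step.

```python
def rp5m(raw_counts, total_genome_matched):
    """Scale raw read counts to reads per five million genome-matched reads (RP5M)."""
    scale = 5_000_000 / total_genome_matched
    return {seq: count * scale for seq, count in raw_counts.items()}


def hits_normalized_abundance(normalized, hits):
    """Divide each read's RP5M value by its number of perfect genome matches (HNA)."""
    return {
        seq: abundance / hits[seq]
        for seq, abundance in normalized.items()
        if hits.get(seq, 0) > 0  # reads without a genome match are skipped
    }


# Hypothetical toy data: two distinct small RNA sequences.
raw = {"siRNA_a": 200, "siRNA_b": 50}   # raw read counts
hits = {"siRNA_a": 1, "siRNA_b": 4}     # perfect-match locations in the genome

normalized = rp5m(raw, total_genome_matched=10_000_000)
hna = hits_normalized_abundance(normalized, hits)
```

In this toy case a read that maps to four locations contributes one quarter of its RP5M value to each location, which keeps multi-copy sequences from inflating the abundance at any single repeat.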
Statistical analysis of GC contents of regions preferentially accumulating siRNAs
We identified 248 tandem repeats that accumulated 24-nt siRNA reads (Figure S9A; Table S1, column L). Within these 248 tandem repeats, the sequences were further classified into two regions, with or without siRNA accumulation, for the GC-content analysis. The GC-content percentage (GC%) based on the genomic sequence is defined as:
$$\mathrm{GC\%} = \frac{N_G + N_C}{N_A + N_T + N_G + N_C} \times 100$$
where $N_X$ is the number of occurrences of nucleotide X in the region.
For each repeat, we calculated the difference of GC% (ΔGC) between the siRNA-accumulating regions (GCsiRNA%) and the non-siRNA-accumulating regions (GCnon%), as:
$$\Delta\mathrm{GC} = \mathrm{GC}_{\mathrm{siRNA}}\% - \mathrm{GC}_{\mathrm{non}}\%$$
We compared the ΔGCs in repeats with the randomized genomic sequences, in which the genomic sequences of each of the 248 tandem repeats are shuffled (http://github.com/wwliao/deltagc_repeat). The result shows that the ΔGCs in repeats are significantly greater than those in randomized sequences, suggesting that the siRNA-accumulating regions are significantly GC-rich.
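The comparison with shuffled sequences can be sketched as follows. This is an illustration only and does not reproduce the code in the cited repository: it assumes the conventional definition of GC% as 100 × (G + C) / region length, represents the siRNA-accumulating region as a hypothetical per-base boolean mask, and uses a simple empirical fraction in place of whatever statistical test was actually applied. The toy sequence is invented.

```python
import random


def gc_percent(seq):
    """Percentage of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def delta_gc(seq, sirna_mask):
    """GC% of siRNA-accumulating positions minus GC% of the remaining positions."""
    inside = "".join(b for b, m in zip(seq, sirna_mask) if m)
    outside = "".join(b for b, m in zip(seq, sirna_mask) if not m)
    return gc_percent(inside) - gc_percent(outside)


def shuffled_delta_gc(seq, sirna_mask, rng):
    """Delta GC after shuffling the bases while keeping the mask fixed."""
    bases = list(seq)
    rng.shuffle(bases)
    return delta_gc("".join(bases), sirna_mask)


# Hypothetical repeat: GC-rich first half accumulates siRNAs, AT-rich half does not.
seq = "GCGCGCGCATATATAT"
mask = [True] * 8 + [False] * 8

observed = delta_gc(seq, mask)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
null = [shuffled_delta_gc(seq, mask, rng) for _ in range(1000)]
# Fraction of shuffled sequences with a delta GC at least as large as observed.
empirical_fraction = sum(d >= observed for d in null) / len(null)
```

Shuffling preserves the base composition of each repeat, so a small empirical fraction indicates that the GC enrichment is tied to where the siRNAs accumulate and not to the overall composition of the repeat.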
Whole-genome bisulfite sequencing
The methylation of endogenous tandem repeats (Figure S10) was extracted from whole-genome bisulfite sequencing data following the standard BS-seq protocol (Schmitz et al., 2013). Approximately 3 μg of T+SWT genomic DNA was sonicated to approximately 250 bp before it was ligated to Illumina adaptors, then size-selected, denatured and treated with sodium bisulfite to reveal the methylation status of the fragments. The BS-seq libraries were sequenced using the Illumina HiSeq 2000 platform, according to the manufacturer's instructions. Sequencing was carried out for up to 100 cycles in paired ends. The reads were aligned to the reference genome (TAIR10) using the modified bisulfite aligner, bsseeker (Chen et al., 2010). To produce genome-wide DNA methylation profiles, the methylation level for each covered cytosine in the genome was calculated. Because bisulfite treatment converts unmethylated cytosines (Cs) to uracils, which are read as thymines (Ts) after amplification, the methylation level at individual cytosines is estimated as #C/(#C + #T), where #C represents the number of methylated reads and #T corresponds to the number of unmethylated reads. The methylation level per cytosine provides an estimate of the percentage of cells containing methylation at this cytosine. The raw reads and the processed data set for the T+SWT methylome can be downloaded from NCBI GEO under accession GSE47453.
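As an illustration of the per-cytosine estimate #C/(#C + #T), the following Python sketch computes methylation levels from read counts. The function names and the counts are hypothetical, and the handling of uncovered cytosines (returning None and excluding them from averages) is an assumption, not a description of how bsseeker reports them.

```python
def methylation_level(n_c, n_t):
    """Estimate methylation at one cytosine as #C / (#C + #T).

    n_c: reads retaining C (methylated); n_t: reads showing T (unmethylated).
    Returns None when the cytosine is not covered by any read.
    """
    covered = n_c + n_t
    if covered == 0:
        return None
    return n_c / covered


def mean_methylation(sites):
    """Average methylation level over the covered cytosines in a region."""
    levels = [methylation_level(c, t) for c, t in sites]
    levels = [level for level in levels if level is not None]
    return sum(levels) / len(levels) if levels else None


# Hypothetical (#C, #T) counts for three cytosines; the last one is uncovered.
sites = [(9, 1), (0, 10), (0, 0)]
per_site = [methylation_level(c, t) for c, t in sites]
region_mean = mean_methylation(sites)
```

A level of 0.9 at the first site would be read as methylation in roughly 90% of the sampled cells, in line with the interpretation given above.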
Financial support was provided by Academia Sinica and the Austrian Academy of Sciences. T.S. was supported by the Japan Society for the Promotion of Science Postdoctoral Fellowships for Research Abroad. Work in the Meyers Laboratory was supported by NSF award #1051576. We acknowledge the sequencing services provided by the National Center for Genome Medicine of the National Core Facility Program for Biotechnology, National Science Council, Taiwan. We are grateful to Lucia Daxinger for performing the initial northern blot analysis to detect hairpin-derived siRNAs in the 15.5a mutant, Xuemei Chen for nrpb2-3 seeds, Christine Ying for editorial assistance, and Ming-Tsung Louis Wu and Attila Molnar for helpful discussions. The authors have no conflicts of interest to declare.