Characterization of small RNAs in Xenopus tropicalis gastrulae

Authors

  • Fernando Faunes,

    1. Center for Aging and Regeneration and Millennium Nucleus in Regenerative Biology, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile
    • F.F., L.I.A., and J.L. contributed equally to this work.

  • Leonardo I. Almonacid,

    1. Molecular Bioinformatics Laboratory, Millennium Institute on Immunology and Immunotherapy, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile
    2. Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile
    • F.F., L.I.A., and J.L. contributed equally to this work.

  • Francisco Melo,

    1. Molecular Bioinformatics Laboratory, Millennium Institute on Immunology and Immunotherapy, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile
    2. Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile
  • Juan Larrain

    Corresponding author
    1. Center for Aging and Regeneration and Millennium Nucleus in Regenerative Biology, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile
    • F.F., L.I.A., and J.L. contributed equally to this work.


Abstract

Here, we report and characterize deep sequencing data and bioinformatics analysis of small RNAs from Xenopus tropicalis gastrulae. A total of 17,553,124 reads with a perfect match to the genome, derived from 2,616,053 unique sequences, were identified. Seventy-seven percent of these sequences were not found in previous reports from X. tropicalis oocytes and somatic tissues. Bioinformatics analyses indicate that a large fraction of the small RNAs are PIWI-interacting RNAs. Up to 23.9% of small RNAs mapped to transposable elements and 27% to genic regions. Most of the abundant transposable element-derived small RNAs are found in both oocyte and gastrula libraries, suggesting that transposons also need to be silenced during early development. Additionally, miRNAs were identified and many of them are not present in oocytes, suggesting that miRNA expression is stage specific. To the best of our knowledge, this is the first high-throughput data release and bioinformatics characterization of small RNAs during Xenopus development. genesis 50:260–270, 2012. © 2012 Wiley Periodicals, Inc.

INTRODUCTION

The identification of different classes of small RNAs is one of the most important findings in gene expression studies in the last two decades. Although the first micro RNAs (miRNAs) were individually discovered (Lee et al.,1993; Wightman et al.,1993), the other two major classes of small RNAs, small interfering RNAs (siRNAs) and PIWI-interacting RNAs (piRNAs), have been mainly identified using different sequencing technologies in the past few years (Brennecke et al.,2007; Tam et al.,2008; Watanabe et al.,2008b). Although these methodologies have identified hundreds of thousands of different small RNAs in several species, a large fraction of the sequences is observed only once (i.e., one read) in the different libraries. This suggests that these populations, in particular piRNAs, are highly complex and that, therefore, these sequencing technologies do not cover the complete catalogue of small RNAs (Aravin et al.,2006; Brennecke et al.,2007; Girard et al.,2006; Siomi et al.,2011).

Small RNAs regulate gene expression during development and in adult tissues (Ghildiyal and Zamore,2009). miRNAs control translation and stability of target mRNAs and, for example, miR-430 accelerates the deadenylation and clearance of maternal mRNA in early zebrafish embryos (Giraldez et al.,2006). siRNAs control the expression of endogenous genes and transposable elements both in somatic tissues and the germline (Chung et al.,2008; Tam et al.,2008; Watanabe et al.,2008b). In contrast, piRNAs mainly protect genome integrity against transposons in the germ cells. However, recent evidence shows that piRNAs can also control the expression of endogenous genes (Robine et al.,2009; Saito et al.,2009).

During Xenopus development, the patterning of the anterior-posterior and dorsoventral axes begins at the gastrula stage, 10 hours after fertilization. At this stage, Spemann's organizer, present at the dorsal side, is crucial for the proper patterning of the embryo (De Robertis et al.,2000). Several miRNAs are expressed at the gastrula stage (Watanabe et al.,2005), and miR-15/16 are important for regulating Nodal signalling in the dorsoventral axis (Martello et al.,2007). To our knowledge, no previous global analysis of the small RNA population at this important stage has been performed.

In Xenopus, different libraries of small RNAs have been successfully prepared by deep sequencing and characterized from oocytes, eggs, liver, and skin (Armisen et al.,2009; Lau et al.,2009). In addition, about 10 million RNAs bound to XIWI protein have been sequenced from Xenopus tropicalis eggs (Lau et al.,2009; Robine et al.,2009). Importantly, the size distribution of oocyte small RNAs suggests that most sequences in oocytes correspond to the piRNA class (Armisen et al.,2009), consistent with their role in the protection of the germline against new insertions of transposable elements (Malone and Hannon,2009). In contrast, in somatic tissues small RNA sequences correspond mostly to miRNAs and siRNAs (Armisen et al.,2009).

We previously used Illumina sequencing technology to generate a library of small RNAs from X. tropicalis gastrula embryos. This library was specifically built to identify small RNAs derived from Tc1-like DNA transposable elements, as we previously found that some of them are differentially expressed during early Xenopus development (Faunes et al.,2009,2011). Here, we present a complete characterization of this X. tropicalis gastrula library and a comparison to published libraries. We obtained 17,553,124 reads with a perfect match to the genome. Seventy-seven percent of unique sequences corresponded to novel sequences identified in this work. The length distribution of this population and its presence in XIWI-bound small RNA libraries suggest that piRNAs are the most abundant class in the gastrula. Up to 23.9% of total reads can be mapped to transposable elements, and detailed analyses indicate that these sequences are piRNAs. Importantly, 27.3% of sequences map to genic regions. Small RNAs derived from genes included piRNAs and other types of small RNA families. This is the first deep sequencing analysis of small RNAs at the gastrula stage during Xenopus development.

RESULTS

General Description of the Small RNA Library from Gastrulae

Small RNA libraries from X. tropicalis oocytes, eggs, liver, and skin have been successfully prepared and analyzed (Armisen et al.,2009; Lau et al.,2009; Robine et al.,2009). In this work, small RNAs from dorsal and ventral explants at the gastrula stage of X. tropicalis embryos were isolated and processed for Illumina/Solexa deep sequencing. Sequences between 20 and 32 nucleotides (nt) were selected and mapped to the genome without mismatches (see Methods), generating final sets of 9,737,900 and 7,815,224 reads in the dorsal and ventral libraries, respectively (NCBI accession number GSE30067).

Although the comparison between these libraries could be useful for identifying novel small RNAs expressed differentially along the dorsoventral axis, for the analysis described in this article we generated a single, merged set of “gastrula” data for two main reasons: (1) different length variants of a unique “core” sequence were present with completely opposite frequencies in the two libraries (Supporting Information Fig. S1); and (2) because these sequencing technologies do not cover the complete small RNA transcriptome, quantitative comparisons among libraries are difficult. Consistent with this point, miR-15 and miR-16, which are known to be enriched in the ventral side (Martello et al.,2007), appeared at similar frequencies in both libraries (data not shown). The same results were obtained for selected sequences with high differences in frequency between both libraries (data not shown), suggesting that, under the conditions in which we performed these experiments, direct comparisons between the two libraries are not feasible.

For the analysis performed in this report, we have merged all the sequences from both libraries with perfect matches to the genome. The merged library contains 17,553,124 total reads corresponding to 2,616,053 unique sequences (Table 1). In this set, length variants were considered as different unique sequences (Supporting Information Fig. S1). Forty-four percent of sequences (1,162,615 sequences) had a single match to the genome (unambiguous mapping) and 56% matched two or more times. From this set of sequences with multiple matches, 22.0% matched to more than 10 positions in the genome (Fig. 1a). Sixty-six percent of the unique sequences that match to the genome (1,729,885 sequences) had only a single read (Fig. 1b).
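The distinction between unique sequences and reads used above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the function name `summarize_library` and the toy sequences are hypothetical, and only the two counts taken from Table 1 are real data.

```python
from collections import Counter

def summarize_library(reads):
    """Collapse a redundant list of reads into unique sequences and summarize it."""
    counts = Counter(reads)  # unique sequence -> number of reads
    n_unique = len(counts)
    n_reads = sum(counts.values())
    n_single = sum(1 for n in counts.values() if n == 1)  # "sequenced once"
    return {
        "unique": n_unique,
        "reads": n_reads,
        "single_pct_unique": round(100.0 * n_single / n_unique, 1),
        "single_pct_reads": round(100.0 * n_single / n_reads, 1),
    }

# Toy input (hypothetical sequences, not taken from the gastrula library).
toy_reads = ["TGAGGTAGTAGATTGTATAGTT"] * 3 + [
    "TACCCTGTAGATCCGAATTTGT",
    "TAGCAGCACGTAAATATTGGCG",
]
toy_summary = summarize_library(toy_reads)

# Cross-check of the "Sequenced once" row of Table 1 from the reported counts.
table1_single_pct_unique = round(100.0 * 1_729_885 / 2_616_053, 1)
table1_single_pct_reads = round(100.0 * 1_729_885 / 17_553_124, 1)
```

Because length variants are kept as separate keys, two reads that differ by a single 3′ nucleotide count as two unique sequences, which matches the convention stated above.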

Figure 1.

Characterization of small RNAs of X. tropicalis. a) Histogram of cumulative frequency of the number of reads of gastrula small RNAs. b) Histogram of cumulative frequency of the number of matches to the X. tropicalis genome. c) The nucleotide composition of unique sequences was determined by using Weblogos tool. Orange character shows the bias in the 5′ end. d) Histogram of sequence length of the set of gastrula small RNAs. Black color indicates the percentage of sequences found in the XIWI-bound small RNA libraries. White color indicates the percentage of sequences which are not found in the XIWI-bound small RNA libraries.

Table 1. Description of Set of Small RNAs Obtained from X. tropicalis Gastrula

                              Unique sequences (%)    Total reads (%)
Mapping to genome             2,616,053 (100.0)       17,553,124 (100.0)
Single match to the genome    1,162,615 (44.4)        5,497,219 (31.3)
Sequenced once                1,729,885 (66.1)        1,729,885 (9.9)

  1. Number of individual sequences (unique sequences), total reads, and percentage for each class. Sequenced once correspond to sequences with a frequency of 1 read.

We compared our library with previously published X. tropicalis libraries (Armisen et al.,2009; Lau et al.,2009; Robine et al.,2009) and found that 77.7% of the unique sequences (2,032,109 sequences) were detected only in our gastrula library. The set of sequences also found in other libraries (22.3% of unique sequences) accounted for 74.0% of total gastrula reads. Therefore, novel small RNAs were identified at the gastrula stage in this work.

piRNAs Are the Most Abundant Class of Small RNAs in the Gastrula Stage

There are three major classes of small RNAs: miRNA, siRNA and piRNA (Kim et al.,2009). A bias to uracil at the 5′ end has been described mainly for miRNAs and piRNAs (Kim,2006). A bias in length has also been observed: miRNAs and siRNAs are mainly 21–24 nt long and piRNAs are between 24 and 30 nt long.

We observed a bias for uracil at the 5′ end for unique sequences in the whole population of small RNAs in the gastrula library (Fig. 1c), suggesting that most of the small RNAs are members of the miRNA and piRNA families. The length distribution of gastrula-derived small RNAs showed a peak at 28 nt, and 95.3% of reads are between 24 and 32 nt in length (Fig. 1d). In addition, a specific comparison with the two published XIWI-bound small RNA libraries from eggs was performed (Lau et al.,2009; Robine et al.,2009). We found that 18.2% of the unique sequences were present in the XIWI-bound small RNA libraries, corresponding to 67.8% of the total reads (Fig. 1d). Together, these results suggest that piRNAs are the most abundant class of small RNAs present at the gastrula stage.
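The two descriptive statistics used here, the 5′ nucleotide composition of unique sequences and the read-weighted length distribution, can be sketched as follows. This is an illustrative Python example under stated assumptions, not the analysis code of this work: all function names and the toy counts are hypothetical, and uracil appears as `T` because sequencing reads are reported as DNA.

```python
from collections import Counter

def five_prime_fractions(unique_seqs):
    """Fraction of unique sequences starting with each nucleotide."""
    first = Counter(seq[0] for seq in unique_seqs)
    total = sum(first.values())
    return {base: n / total for base, n in first.items()}

def read_fraction_in_length_range(read_counts, min_len, max_len):
    """Fraction of total reads whose sequence length lies in [min_len, max_len]."""
    total = sum(read_counts.values())
    in_range = sum(n for seq, n in read_counts.items()
                   if min_len <= len(seq) <= max_len)
    return in_range / total

def modal_length(read_counts):
    """Sequence length carrying the largest number of reads (the 'peak')."""
    by_len = Counter()
    for seq, n in read_counts.items():
        by_len[len(seq)] += n
    return by_len.most_common(1)[0][0]

# Toy library (hypothetical): unique sequence -> number of reads.
toy_counts = {
    "T" + "A" * 27: 6,  # 28 nt, 5' U
    "T" + "C" * 25: 2,  # 26 nt, 5' U
    "A" + "G" * 21: 2,  # 22 nt, 5' A
}
toy_five_prime = five_prime_fractions(toy_counts)
toy_24_32 = read_fraction_in_length_range(toy_counts, 24, 32)
toy_peak = modal_length(toy_counts)
```

The 5′ composition is computed over unique sequences while the length statistics are weighted by reads, following the way the two measurements are described in the text.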

Transposon Derived Small RNAs

It has been shown that about one third of the X. tropicalis genome corresponds to transposable elements (Hellsten et al.,2010). Considering that piRNAs could be derived from transposable elements, we searched the set of small RNA sequences against databases containing representative transposable element sequences (Jurka,2000; Smit et al.,1996–2010). Three different sets of sequences were used for this analysis: (1) sequences with a single match to the genome, matching transposable elements without mismatches (Class A); (2) all sequences matching transposable elements without mismatches (Class B); and (3) all sequences, allowing up to three mismatches to transposable elements (Class C). The last analysis was performed because the available databases only contain representative sequences for each transposable element, and genomes are known to contain several mutated copies of each transposon family (Brennecke et al.,2007).
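The difference between the zero-mismatch searches (Classes A and B) and the search allowing up to three mismatches (Class C) can be illustrated with a naive scan over both strands of a representative element. This is a minimal sketch under stated assumptions, not the search tool used in this work: `best_hit`, the toy element, and the toy queries are all hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_hit(small_rna, element, max_mismatches):
    """Return (orientation, 0-based position, mismatches) of the best hit, or None.

    'sense' means the small RNA itself matches the element sequence;
    'antisense' means its reverse complement does.
    """
    best = None
    for orientation, query in (("sense", small_rna), ("antisense", revcomp(small_rna))):
        for i in range(len(element) - len(query) + 1):
            mm = hamming(query, element[i:i + len(query)])
            if mm <= max_mismatches and (best is None or mm < best[2]):
                best = (orientation, i, mm)
    return best

# Toy representative element and queries (hypothetical sequences).
toy_element = "GATTACAGGCCTTAGCATCGGATCCAAGTTCGA"
sense_query = toy_element[4:26]                 # exact copy of a 22-nt window
antisense_query = revcomp(toy_element[4:26])    # exact antisense copy
mutated_query = "ACATGCCTTAACATCGGATCCA"        # sense copy with two substitutions

hit_exact = best_hit(sense_query, toy_element, 0)
hit_antisense = best_hit(antisense_query, toy_element, 0)
hit_mutated_strict = best_hit(mutated_query, toy_element, 0)
hit_mutated_relaxed = best_hit(mutated_query, toy_element, 3)
```

The mutated query stands in for a small RNA derived from a diverged genomic copy of the element: it is missed by the strict search but recovered when up to three mismatches are allowed.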

For Class A, only 1.0% of the unique sequences (1.6% of total reads) mapped to transposable elements (Table 2). For Classes B and C, the percentage of unique sequences matching transposons increases to 5.7 and 17.7% (11.0 and 23.9% of total reads), respectively. This indicates that most of the multiple matches in the genome correspond to transposons.

Table 2. Mapping of Small RNAs to Transposable Elements

Class A: Single match to the genome
                                                                                 Unique (%)           Reads (%)
Total                                                                            1,162,615 (100)      5,497,219 (100)
Mapping to transposable elements                                                 11,375 (1.0)         87,562 (1.6)
Mapping to transposable elements and found in XIWI-bound small RNA libraries     2,287 (20.1)         59,590 (68.0)
Mapping to transposable elements, filtered (a)                                   11,375               87,562
Antisense orientation (b)                                                        8,669 (76.2)         81,451 (93.0)
Sense orientation (b)                                                            2,706 (23.8)         6,111 (7.0)

Class B: Complete set
                                                                                 Unique (%)           Reads (%)
Total                                                                            2,616,053 (100)      17,553,124 (100)
Mapping to transposable elements                                                 149,123 (5.7)        1,935,195 (11.0)
Mapping to transposable elements and found in XIWI-bound small RNA libraries     44,711 (29.9)        1,551,439 (80.1)
Mapping to transposable elements, filtered (a)                                   147,816              1,912,095
Antisense orientation (b)                                                        105,639 (71.5)       1,596,997 (83.5)
Sense orientation (b)                                                            42,177 (28.5)        315,098 (16.5)

Class C: Complete set + 3 mismatches to transposable elements
                                                                                 Unique (%)           Reads (%)
Total                                                                            2,616,053 (100)      17,553,124 (100)
Mapping to transposable elements                                                 464,083 (17.7)       4,202,079 (23.9)
Mapping to transposable elements and found in XIWI-bound small RNA libraries     100,501 (21.6)       3,135,841 (74.6)
Mapping to transposable elements, filtered (a)                                   454,385              4,137,920
Antisense orientation (b)                                                        313,797 (69.1)       3,290,762 (79.5)
Sense orientation (b)                                                            140,588 (30.9)       847,158 (20.5)

  • Sequences with a single match to the genome (Class A) or the complete set of sequences were searched in transposable element sequences with 0 mismatches (Class B) or allowing up to three mismatches (Class C).

  • (a) Sequences simultaneously mapping in the sense and antisense orientations to the same or different transposable elements were removed from analysis. For Class B, 1,307 unique sequences (23,100 total reads) were removed. For Class C, 9,698 unique sequences (64,154 total reads) were removed.

  • (b) Percentage relative to the total number of sequences mapping to transposable elements filtered in each class.

A detailed analysis of the Class B sequences was performed. The length distribution of small RNAs in this class showed a peak at 28 nt (Fig. 2a). A strong bias to the antisense orientation (83.5% of total reads) relative to the transposable element sequence was observed (Table 2). According to the ping-pong model of piRNA amplification, biases to uracil in the 1st position (5′ end) and to adenine in the 10th position should be observed (Brennecke et al.,2007; Gunawardane et al.,2007; Kim,2006). Analysis of gastrula sequences mapped to transposable elements showed these biases: to uracil in the 1st position for antisense sequences and to adenine in the 10th position for sense sequences relative to transposable elements (Fig. 2b and Supporting Information Fig. S2). Finally, 80.2% of total reads were found in the published XIWI-bound small RNA libraries. Altogether, these results strongly suggest that around 20% of the small RNAs found in gastrulae are derived from transposons and that most of them are piRNAs.
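The link between the two nucleotide biases follows from the geometry of the ping-pong model: partner piRNAs on opposite strands overlap by 10 nt at their 5′ ends, so a uracil at position 1 of one partner pairs with an adenine at position 10 of the other. The sketch below illustrates this relationship and the positional frequency measurement. It is illustrative Python only; the function names and the two toy sequences are hypothetical, and uracil is written as `T`.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def base_fraction(seqs, position, base):
    """Fraction of sequences carrying `base` at the 1-based `position`."""
    eligible = [s for s in seqs if len(s) >= position]
    if not eligible:
        return 0.0
    return sum(s[position - 1] == base for s in eligible) / len(eligible)

def is_pingpong_pair(seq_a, seq_b, overlap=10):
    """True if the first `overlap` nt of the two sequences are reverse complements."""
    if len(seq_a) < overlap or len(seq_b) < overlap:
        return False
    return seq_a[:overlap] == revcomp(seq_b[:overlap])

# Toy partners (hypothetical): an antisense piRNA starting with U (T) and a
# sense partner whose first 10 nt pair with the antisense 5' end.
toy_antisense = "TGCATCGGATCCAAGTTCGAGCTAGCTA"
toy_sense = revcomp(toy_antisense[:10]) + "GGTTCAAGCTTAGCCATG"

u1_antisense = base_fraction([toy_antisense], 1, "T")
a10_sense = base_fraction([toy_sense], 10, "A")
paired = is_pingpong_pair(toy_antisense, toy_sense)
```

Applied separately to the sense and antisense sets, `base_fraction` gives the kind of position-specific frequencies that sequence logos display graphically.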

Figure 2.

Small RNAs mapping to transposable elements. a) Length distribution of small RNAs (Class B). b) Weblogos of unique sequences mapping to transposable elements in sense and antisense orientation. Orange characters show the bias in the 10th position for sense sequences and in the 5′ end for antisense sequences.

The Pool of Small RNAs Mapped to Transposable Elements Is Similar Between Oocytes and Gastrula

Transposable elements with the highest number of mapped reads included L1-55-XT with 365,755 reads, HAT-10-XT with 116,080 reads, and PIGGYBAC-1-XT with 114,661 reads (Table 3 and Supporting Information File S1). The five elements with the highest numbers of reads accounted for 41.6% of total reads mapped to transposable elements. Notably, four out of these five transposable elements were also found among the transposable elements with the highest number of total reads in previously published oocyte libraries (Table 3) (Armisen et al.,2009). These results suggest that the pool of small RNAs mapped to transposable elements is similar in oocytes and gastrula.
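The 41.6% share can be checked directly from the read counts in Table 3. The short Python sketch below is only an arithmetic cross-check; it assumes that the denominator is the unfiltered Class B total of reads mapping to transposable elements in Table 2 (1,935,195), which is the value that reproduces the reported percentage. The variable names are hypothetical.

```python
# Read counts per element, copied from Table 3 (Class B).
top_elements = {
    "L1-55-XT": 365_755,
    "HAT-10-XT": 116_080,
    "PIGGYBAC-1-XT": 114_661,
    "HARBINGER-2-XT": 107_087,
    "TC1-5-Xt(Xeminos)": 100_714,
    "POLINTON-1-XT": 90_817,
    "HAT-9-XT": 85_123,
    "CR1-1B-XT": 78_738,
    "ERV1-3-I-XT": 69_911,
    "HELITRON-N1A-XT": 66_903,
}

# Assumed denominator: Class B reads mapping to transposable elements (Table 2).
class_b_te_reads = 1_935_195

top5_reads = sum(sorted(top_elements.values(), reverse=True)[:5])
top5_share_pct = round(100.0 * top5_reads / class_b_te_reads, 1)
```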

Table 3. Transposable Elements with the Highest Number of Small RNAs Mapped

Transposable element      Unique sequences    Reads
L1-55-XT (a)              7,734               365,755
HAT-10-XT (a)             6,484               116,080
PIGGYBAC-1-XT (a)         7,138               114,661
HARBINGER-2-XT (a)        3,980               107,087
TC1-5-Xt(Xeminos)         3,906               100,714
POLINTON-1-XT (a)         4,940               90,817
HAT-9-XT (a)              5,525               85,123
CR1-1B-XT                 7,312               78,738
ERV1-3-I-XT (a)           3,196               69,911
HELITRON-N1A-XT           2,258               66,903

  • Small RNAs (class B) were searched in the set of representative sequences of transposable elements. The 10 elements with the highest number of reads are shown.

  • (a) Sequences also found in the top 10 transposable elements with high number of reads in oocytes libraries (Armisen et al.,2009).

Small RNAs mapped to the transposable element L1-55-XT were analyzed in more detail (Fig. 3a). Small RNAs mapped to this element showed the typical antisense bias of piRNAs. The length distribution showed a peak at 28–29 nt (Fig. 3b). Nucleotide analyses showed the bias to uracil at the 5′ end for antisense reads and the bias to adenine in the 10th position (Fig. 3c). In addition, 94.5% of the reads mapping to L1-55-XT were also found in published XIWI-bound small RNA libraries. All these results strongly suggest that L1-55-XT-mapped small RNAs correspond to the piRNA class.

Figure 3.

Analysis of small RNAs mapping to L1-55-XT transposable element. a) Distribution of small RNAs in L1-55-XT sequence. Reads of sequences mapping in sense orientation are shown above the line and reads of sequences mapping in antisense orientation are shown below the line. b) Length distribution of small RNAs mapping to L1-55-XT transposable element. c) Weblogos of unique sequences mapping to L1-55-XT. Orange characters show the bias in the 10th position for sense sequences and in the 5′ end for antisense sequences.

Some miRNAs Are Specifically Expressed at Gastrula Stage

Several miRNAs are expressed during early Xenopus development (Martello et al.,2007; Walker and Harland,2008; Watanabe et al.,2005,2008a). To determine which miRNAs were detected in our library, we mapped the small RNA sequences observed in the gastrula library to mature miRNA sequences available in miRBase (Griffiths-Jones,2006). In our gastrula library, we found 76 out of the 169 miRNAs described in miRBase. Fifty-six out of these 76 miRNAs have two or more reads (Supporting Information File S2). The most abundant miRNAs detected were xtr-let-7f (1,154 reads), xtr-miR-367 (1,130 reads), xtr-miR-363-5p (1,055 reads), xtr-miR-103 (308 reads), and xtr-miR-16c (262 reads). Only xtr-let-7f was also found as one of the most abundant miRNAs in published somatic and oocyte libraries (Armisen et al.,2009). xtr-miR-16c and xtr-miR-363-5p were found at lower frequencies in published somatic and oocyte libraries. xtr-miR-367 and xtr-miR-103 were not found in somatic or oocyte libraries. Therefore, in contrast to our findings with the most abundant piRNAs, we found that some miRNAs can be specifically expressed at the gastrula stage.

Small RNAs Derived from Annotated Genes

Although siRNAs and piRNAs were initially described as being mainly derived from transposons and as protecting the germ line against them, recent evidence shows that these small RNAs also derive from genes (Robine et al.,2009; Saito et al.,2009; Tam et al.,2008; Watanabe et al.,2008b). As mentioned earlier, only 5–20% of total reads mapped to transposons, suggesting that many small RNAs can be derived from genic regions. To determine whether the small RNAs in the gastrula library matched annotated transcripts, the Ensembl gene annotation was used as a reference frame (Flicek et al.,2008). Considering only the sequences with a single match to the genome, we found that 72.7% mapped to intergenic regions and 27.3% (317,625 unique sequences) mapped to regions containing annotated genes, including exons, introns, and untranslated regions (UTRs) (Table 4). Ninety-four percent of unique sequences mapping to genes mapped to introns. Only 0.4% of unique sequences that mapped to genes also mapped to transposable elements, indicating that most of these small RNAs mapped specifically to genes and not to transposable elements included in genes. It is important to mention that the number of small RNAs mapping to genes could be higher, because in the current annotation of the X. tropicalis genome only 20% of the genes contain an annotated UTR element. Consistent with this fact, the addition of up to 5,000 nt at the 5′ and 3′ ends of annotated genes without assigned UTRs increased the matching of unique small RNAs up to 42.1% (Table 4). In conclusion, between 23 and 42% of small RNAs could be derived specifically from genic regions, and most of them come from intronic regions.
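The classification of uniquely mapped small RNAs as genic or intergenic, and the effect of extending gene boundaries by a flanking window, can be sketched with a simple interval-overlap test. This is an illustrative Python example, not the annotation procedure used in this work: `classify_position`, the scaffold name, and all coordinates are hypothetical, and strand and exon/intron structure are ignored.

```python
def classify_position(chrom, start, end, genes, flank=0):
    """Classify a mapped small RNA as 'genic', 'near gene', or 'intergenic'.

    genes: iterable of (chrom, gene_start, gene_end); coordinates are
    1-based and inclusive. `flank` is the number of bases added on both
    sides of each gene when testing for proximity.
    """
    near = False
    for g_chrom, g_start, g_end in genes:
        if g_chrom != chrom:
            continue
        if start <= g_end and end >= g_start:
            return "genic"  # overlaps the annotated gene body
        if start <= g_end + flank and end >= g_start - flank:
            near = True     # overlaps only the extended window
    return "near gene" if near else "intergenic"

# Toy annotation and mapped reads (hypothetical coordinates).
toy_genes = [("scaffold_1", 10_000, 20_000)]

inside = classify_position("scaffold_1", 15_000, 15_027, toy_genes)
downstream_no_flank = classify_position("scaffold_1", 23_000, 23_027, toy_genes)
downstream_2kb = classify_position("scaffold_1", 23_000, 23_027, toy_genes, flank=2_000)
downstream_5kb = classify_position("scaffold_1", 23_000, 23_027, toy_genes, flank=5_000)
other_scaffold = classify_position("scaffold_2", 15_000, 15_027, toy_genes, flank=5_000)

# Cross-check of the genic fraction reported in Table 4.
genic_pct = round(100.0 * 317_625 / 1_162_615, 1)
```

Increasing `flank` from 2,000 to 5,000 moves reads that lie just outside an annotated gene from the intergenic class into the near-gene class, mirroring the stepwise rows of Table 4.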

Table 4. Mapping of Small RNAs to Ensembl Genes

                                            Sequences (%)            Reads (%)
Sequences with single match to genome       1,162,615 (100.0)        5,497,219 (100.0)
Sequenced once                              839,139 (72.2)           839,139 (15.3)
Intergenic                                  844,990 (72.7) (a)       4,607,211 (83.8) (a)
  Near to gene (2000 bp)                    56,927 (6.7)             334,571 (7.3)
  Near to gene (3000 bp)                    82,285 (9.7)             454,166 (9.9)
  Near to gene (4000 bp)                    103,471 (12.3)           581,270 (12.6)
  Near to gene (5000 bp)                    124,667 (14.8)           718,801 (15.6)
Gene                                        317,625 (27.3) (a)       890,008 (16.2) (a)
  Sense match                               216,555 (68.2)           518,340 (58.2)
  Antisense match                           101,070 (31.8)           371,668 (41.8)
Protein coding genes (d)                    316,291 (99.6) (b)       861,292 (96.8) (b)
  Sense match                               215,483 (68.1) (c)       491,861 (57.1) (c)
    Exon                                    12,298 (5.7)             20,001 (4.1)
    Intron                                  203,185 (94.3)           471,860 (95.9)
  Antisense match                           100,808 (31.9) (c)       369,431 (42.9) (c)
    Exon                                    6,592 (6.5)              16,984 (4.6)
    Intron                                  94,216 (93.5)            352,447 (95.4)

  • The set of small RNAs with a single match to the genome was mapped to Ensembl genes (annotation version 58.41). This table summarizes the number of sequences obtained for each class.

  • (a) Indicate percentage relative to the total number of sequences with a single match to the genome.

  • (b) Indicate percentage relative to the total number of sequences mapping to known Genes.

  • (c) Indicate percentage relative to the total number of sequences mapping to the protein coding genes.

  • (d) Sequences mapping to structural RNA and to miRNAs were removed from this table to generate the “protein coding genes” class.

The observed length distribution (Fig. 4a) and the presence in XIWI libraries (40.8% of total reads) indicate that some of the small RNAs derived from genic regions are indeed piRNAs. Of note, among the small RNAs that match to transposons a higher percentage (67.8% of total reads) were found in the XIWI-bound small RNA libraries. Along the same lines, a trend to adenine in the 10th position was observed in both sense and antisense reads (Supporting Information Fig. S3), and 68% of unique sequences mapped in the sense orientation (Table 4). These observations suggest that small RNAs derived from genic regions are a mixed population of piRNAs and other types of small RNA families, in contrast to the transposon-derived small RNAs, the vast majority of which are piRNAs.

Figure 4.

Small RNAs derived from genes. a) Length distribution of small RNAs mapping to known genes. b) Distribution of small RNAs in the sequence of sarcoglycan, delta (scgd). c) Distribution of small RNAs in the sequence of UPF0704 protein C6orf165.

Genes containing the highest number of reads were identified according to sense or antisense orientation. Genes with the highest number of sense-mapping reads corresponded to ENSXETG00000012610 (no description) with 33,466 reads, xtr-mir-363 with 8,538 reads, and sarcoglycan delta (35 kDa dystrophin-associated glycoprotein) with 7,683 reads (Supporting Information File S3). Genes with the highest number of antisense-mapping reads corresponded to UPF0704 protein C6orf165 with 31,078 reads and amiloride-sensitive cation channel 1 with 26,438 reads (Supporting Information File S4). A deeper analysis was performed for two protein-coding genes with a description. For sarcoglycan delta (Fig. 4b) and UPF0704 protein C6orf165 (Fig. 4c), most of the small RNAs mapped to intronic regions. Interestingly, in the case of UPF0704 protein C6orf165, a region in the last intron generated both sense and antisense small RNAs. For the genes sarcoglycan delta and UPF0704 protein C6orf165, the length distribution of the reads showed a peak at 28–29 nt (data not shown), suggesting that these small RNA sequences also correspond to piRNAs. Lower numbers of reads were obtained from highly expressed genes such as ef1a (35 reads) and odc (nine reads) or from specific genes involved in dorsoventral patterning that are known to be expressed at the gastrula stage, such as chordin (15 reads) or goosecoid (one read).

DISCUSSION

The use of next-generation sequencing technologies has greatly increased our knowledge of transcriptomics. These methodologies have been used to study long transcripts and small RNAs. In X. tropicalis, small RNAs from oocytes, eggs, liver, skin, and XIWI-bound small RNAs have been prepared and characterized (Armisen et al.,2009; Lau et al.,2009; Robine et al.,2009). In this study, we characterized a new library of small RNAs from X. tropicalis gastrula stage embryos. The number of reads obtained in this work is larger than in any previously published library, and 77.7% of the small RNAs identified here correspond to novel sequences. Interestingly, although only 22.3% of the unique sequences can be found in published libraries, these sequences generated 74.0% of total reads in the gastrula, suggesting that these small RNAs are abundant.

In our dataset, 66.1% of unique sequences were only observed once (i.e. one read). In all other published libraries, between 62.2 and 81.8% of unique sequences were observed as a single read, except in the skin library with 26.4% of unique sequences (data not shown). The presence of a high fraction of sequences with a single read indicates that, although these analyses give us a huge amount of sequences, they still do not cover the complete small RNA transcriptome. These results indicate that the complexity of the pool of small RNAs is high, consistent with previous reports [reviewed in (Siomi et al.,2011)]. Considering this, we did not attempt a quantitative comparison of our library to somatic or germline libraries. Finally, the presence of length variants of the same sequences with varying frequencies in different libraries does not allow direct reliable comparisons (Supporting Information Fig. S1).

The length distribution and the comparison of our set with previously published XIWI-bound small RNAs from eggs strongly suggest that these sequences are piRNAs. Therefore, the pool of small RNAs in the gastrula is similar to the pool in oocytes and eggs (Armisen et al.,2009; Lau et al.,2009). Considering that the gastrula (stage 10) emerges only 10 hours after fertilization, the simplest explanation is that these sequences are stable during oocyte development, egg fertilization, and early development. Similar results have been obtained in Drosophila (Malone et al.,2009). Consistent with this possibility, XIWI proteins are present at higher levels during early development until stage 20–26 (Wilczynska et al.,2009). An alternative is that the same pool of small RNAs is synthesized after the beginning of zygotic transcription.

piRNAs control the expression and mRNA levels of transposable elements to protect genome integrity, mainly in the germline (Malone and Hannon,2009). The small RNAs mapping to transposable elements presented a strong bias to the antisense orientation relative to the transposon sequences and a peak at 28 nt in length, and their nucleotide composition showed the typical biases of piRNAs. In addition, 80.2% of reads were found in XIWI-bound small RNA libraries. These results suggest that this population corresponds to piRNAs. Our analyses also indicated that up to 23.9% of our reads could be mapped to known transposable elements. In the X. tropicalis genome, about one third of the genome corresponds to transposable elements (Hellsten et al.,2010), and therefore an enrichment of these elements in our set of gastrula small RNAs was not detected. Similar values were obtained in previous analyses of XIWI-bound, egg, and microtubule-bound small RNA libraries (Lau et al.,2009). Consistent with this result, transposable elements with the highest number of reads in gastrula also have a high number of reads in oocyte libraries (Table 3).

In the gastrula small RNAs, we detected 45% of the miRNAs described in the miRBase database (76 out of 169 sequences). Interestingly, some of the abundant miRNAs in the gastrula (xtr-miR-367 and xtr-miR-103) were not found in somatic or oocyte libraries, suggesting that these miRNAs are specifically expressed after the beginning of zygotic transcription. The role of these miRNAs is not known. One possibility is that these miRNAs regulate the clearance of maternal mRNAs, as previously shown in zebrafish (Giraldez et al.,2006).

Importantly, the number of reads mapped to transposable elements is lower than the number of putative piRNAs in our set. One possibility is that the rest of the piRNAs map to unknown transposable elements in the genome. An alternative is that they map to protein-coding genes, as has been previously described (Aravin et al.,2001; Robine et al.,2009; Rouget et al.,2010; Saito et al.,2009). Although 72.7% of unambiguous sequences in our set of small RNAs mapped to intergenic regions, we also detected small RNAs mapping to genes. This ratio is similar to the ratio between genic and intergenic regions in human and probably other vertebrate genomes (Gibbs et al.,2004; Hellsten et al.,2010; Venter et al.,2001), suggesting that the distribution of these small RNAs had no bias to genic or intergenic regions. Interestingly, the bias to antisense was not observed for small RNAs mapping to genes. In addition, the trend to adenine in the 10th position was detected in both sense and antisense small RNAs. These results suggest that primary piRNAs can be produced from both strands or that other classes of small RNAs are present in the gastrula. Consistent with this possibility, the percentage of small RNA sequences found in XIWI-bound small RNA libraries is only 40.8%, compared to 80.1% of total reads mapping to transposable elements. Whether these sequences are real piRNAs or another class of small RNAs, and whether they control the expression of protein-coding genes, must be determined in each case.

It is important to mention that gene annotation in X. tropicalis is still incomplete and about 80% of the known genes do not contain assigned UTRs. Therefore, the percentage of sequences mapping to genes was probably underestimated in our analysis. Consistent with this idea, the percentage of sequences mapping to genes increased from 27.3 to 42.1% when 5,000 bp of genome sequence were added at both ends of the gene, suggesting that a fraction of small RNAs can be derived from untranslated regions. A low number of reads was mapped to highly expressed genes such as ef1α and odc. In contrast, a higher number of reads was unambiguously mapped to specific genes, confirming that the production of small RNAs is a highly regulated process (Robine et al.,2009). Considering the sequences mapping to known genes, most of them corresponded to introns. Interestingly, specific regions within genes are enriched in small RNAs (Fig. 4). Genes with a higher number of mapping reads correspond to sarcoglycan delta and UPF0704 protein C6orf165. The expression or function of these genes during Xenopus development has not been shown.

This is the first global description of the pool of small RNAs during early Xenopus development, and it provides important new information on this small RNA population. Consistent with previous work in other species or tissues, the piRNA population seems to be highly complex and abundant. Although an important fraction of small RNAs maps to transposable elements, another fraction maps specifically to genes and intergenic regions. Whether these sequences control gene expression still needs to be determined.

METHODS

Embryo and Oocyte Manipulation

X. tropicalis manipulations were performed as described (http://tropicalis.berkeley.edu/home/index.html).

Deep Sequencing of X. tropicalis Small RNAs

Total RNA from dorsal and ventral explants of X. tropicalis gastrula was isolated using TRIzol (Invitrogen). The proper isolation of explants was verified by RT-PCR using chordin as a dorsal marker and sizzled as a ventral marker (data not shown). Small RNAs between 18 and 35 nt long were size-selected after gel electrophoresis and processed for Solexa/Illumina sequencing, followed by data analysis as previously described (Faunes et al., 2011). Briefly, adapter sequences were removed and sequences between 20 and 32 nt were selected for further analysis. Gastrula small RNA sequences were filtered using three criteria. First, low complexity sequences were removed (sequences consisting only of polyA, polyT, polyC, or polyG). Second, sequences in the range of 20–32 nt were selected. Third, only sequences that mapped to the X. tropicalis genome [Joint Genome Institute, Assembly v4.1 (Hellsten et al., 2010)] were considered. The software Bowtie was used to map sequences to the genome (Langmead et al., 2009). In this work, we refer to the individual distinct sequences (nonredundant set) as “unique sequences”, and to the set of all sequences obtained (redundant set) as “reads.”
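The first two filtering criteria and the distinction between "unique sequences" and "reads" can be illustrated with a minimal Python sketch. This is not the pipeline used in this work; the function names and the toy input are hypothetical, and the third criterion (mapping to the genome with Bowtie) is not reproduced here.

```python
from collections import Counter


def is_low_complexity(seq):
    """True if the sequence is a homopolymer (polyA, polyT, polyC, or polyG)."""
    return len(set(seq)) == 1


def filter_reads(reads, min_len=20, max_len=32):
    """Apply the low-complexity and length filters.

    Returns a Counter mapping each unique sequence to its read count.
    Genome mapping (the third criterion) is assumed to happen elsewhere.
    """
    counts = Counter()
    for seq in reads:
        seq = seq.upper()
        if is_low_complexity(seq):
            continue
        if not (min_len <= len(seq) <= max_len):
            continue
        counts[seq] += 1
    return counts


# Toy input (hypothetical sequences): a homopolymer, one 22-nt sequence
# seen twice, one sequence that is too short, and one that is too long.
toy_reads = [
    "A" * 22,
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTT",
    "ACGT",
    "ACGT" * 9,
]
counts = filter_reads(toy_reads)
n_unique = len(counts)         # number of "unique sequences"
n_reads = sum(counts.values())  # number of "reads"
```

In this sketch the nonredundant set is the set of keys of the counter, and the redundant set is recovered by summing the counts.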

Small RNA sequences of all X. tropicalis libraries were downloaded from Gene Expression Omnibus (GSE15556, GSE14952, GSE19173) (Armisen et al., 2009; Lau et al., 2009; Robine et al., 2009), and the same filters described above were applied to all of these libraries. An all-against-all sequence comparison among libraries was performed.
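An all-against-all comparison of this kind can be sketched as pairwise set intersections over the unique sequences of each library. The sketch below assumes exact sequence identity as the matching criterion, which is an assumption and not a detail stated here; library names and sequences are toy values.

```python
from itertools import combinations


def compare_libraries(libraries):
    """libraries: dict mapping library name -> set of unique sequences.

    Returns a dict mapping each pair of library names to the number of
    unique sequences the two libraries share (exact identity assumed).
    """
    shared = {}
    for a, b in combinations(sorted(libraries), 2):
        shared[(a, b)] = len(libraries[a] & libraries[b])
    return shared


def fraction_novel(query, others):
    """Fraction of unique sequences in `query` absent from every set in `others`."""
    if not query:
        return 0.0
    seen = set().union(*others)
    return len(query - seen) / len(query)


# Toy libraries (hypothetical sequences, shortened for readability).
libraries = {
    "gastrula": {"AAC", "CCG", "GGT", "TTA"},
    "oocyte": {"AAC", "CCG"},
    "somatic": {"CCG", "TTT"},
}
shared = compare_libraries(libraries)
novel = fraction_novel(
    libraries["gastrula"], [libraries["oocyte"], libraries["somatic"]]
)
```

Holding each library as a set keeps the comparison linear in the library sizes, which matters for libraries with millions of unique sequences.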

Analysis of Transposable Elements

The Repbase sequence database, version 14, was used (Jurka, 2000). To this database, we added Tc1-like sequences available in the RepeatMasker database (Smit et al., 1996–2010) to generate a complete set of 238 transposons. Small RNAs were mapped to these databases using Bowtie with 0 or 3 mismatches. The profiles of small RNAs mapping to transposable elements were constructed by adding to each position the number of reads mapping to that position, keeping sense and antisense orientations separate. Analysis of nucleotide composition was performed by aligning all sequences at the 5′ end and filling 3′-end gaps with the symbol “-” up to the length of the longest sequence. Sequence logos were created with WebLogo software (version 3) (http://weblogo.googlecode.com/files/weblogo-3.0.tar.gz).
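The profile construction and the 5′-end alignment can be sketched as follows. This is an illustration under stated assumptions: a read is taken to contribute its count to every position it covers, coordinates are 0-based, and the hit tuple layout and function names are hypothetical.

```python
from collections import Counter


def build_profile(element_length, hits):
    """Build sense and antisense read-count profiles for one element.

    hits: iterable of (start, length, strand, read_count), with a 0-based
    start on the element and strand "+" (sense) or "-" (antisense).
    """
    sense = [0] * element_length
    antisense = [0] * element_length
    for start, length, strand, count in hits:
        track = sense if strand == "+" else antisense
        # Clip the read to the element boundaries before adding counts.
        for pos in range(max(start, 0), min(start + length, element_length)):
            track[pos] += count
    return sense, antisense


def pad_to_longest(seqs):
    """Align sequences at the 5' end and fill 3'-end gaps with '-'."""
    if not seqs:
        return []
    width = max(len(s) for s in seqs)
    return [s.ljust(width, "-") for s in seqs]


def column_composition(padded):
    """Count the symbols found at each position of the padded alignment."""
    return [Counter(column) for column in zip(*padded)]


# Toy example: three hits on a 6-nt element, and three short sequences.
sense, antisense = build_profile(6, [(0, 3, "+", 2), (2, 3, "-", 5), (1, 2, "+", 1)])
padded = pad_to_longest(["TGA", "TGAC", "TC"])
composition = column_composition(padded)
```

The per-position counts returned by `column_composition` are the kind of input a sequence logo summarizes; a bias such as uridine at position 1 or adenine at position 10 would appear as a dominant symbol in the corresponding column.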

Analysis of miRNAs

X. tropicalis miRNA sequences available in miRBase (August 2011) were used (Griffiths-Jones, 2006).

Analysis of Annotated Transcripts

Ensembl annotation of the X. tropicalis genome (version 58) was obtained using BIOMART (http://www.ensembl.org/biomart/index.html). Genomic gene coordinates, exons, and the description of each gene were used to intersect the small RNAs with the annotation. All genes were separated into three classes according to the BIOMART classification: protein-coding genes, miRNAs, and structural RNA genes (rRNA, snRNA, and snoRNA were included in this class). In this analysis, only sequences with a single match to the genome (unambiguous mapping) were used. The coordinates of each small RNA, obtained after mapping it to the genome, were intersected with the gene and exon coordinates. The profiles of small RNAs mapped to transcripts were constructed as indicated for transposable elements.
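The coordinate intersection can be sketched with a simple interval-overlap test. The sketch below is a hypothetical illustration: it assumes closed intervals, ignores strand, and labels a read as exon, intron, or flank; the optional `flank` argument mimics extending each gene at both ends, as done with 5,000 bp in the analysis discussed above. The dictionary layout, names, and coordinates are toy assumptions.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def annotate(read, genes, flank=0):
    """Label one uniquely mapped read against a list of gene records.

    read: (chrom, start, end)
    genes: list of dicts with keys "name", "chrom", "start", "end", "exons",
           where "exons" is a list of (start, end) tuples.
    Returns a list of (gene_name, label) with label "exon", "intron", or "flank".
    """
    chrom, start, end = read
    labels = []
    for gene in genes:
        if gene["chrom"] != chrom:
            continue
        if not overlaps(start, end, gene["start"] - flank, gene["end"] + flank):
            continue
        if any(overlaps(start, end, s, e) for s, e in gene["exons"]):
            labels.append((gene["name"], "exon"))
        elif overlaps(start, end, gene["start"], gene["end"]):
            labels.append((gene["name"], "intron"))
        else:
            labels.append((gene["name"], "flank"))
    return labels


# Toy annotation: one gene with two exons on a hypothetical scaffold.
genes = [{
    "name": "toy_gene",
    "chrom": "scaffold_1",
    "start": 1000,
    "end": 2000,
    "exons": [(1000, 1200), (1800, 2000)],
}]
exon_hit = annotate(("scaffold_1", 1100, 1125), genes)
intron_hit = annotate(("scaffold_1", 1500, 1525), genes)
outside = annotate(("scaffold_1", 2500, 2525), genes)
flank_hit = annotate(("scaffold_1", 2500, 2525), genes, flank=5000)
```

The linear scan over genes is kept for clarity; for a whole-genome annotation, sorting the genes per scaffold or using an interval index would be the more practical choice.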

Acknowledgements

The authors thank Dasfne Lee-Liu for critical reading of the manuscript.
