The transcriptome landscape of Arabidopsis male meiocytes from high-throughput sequencing: the complexity and evolution of the meiotic process


  • Hongxing Yang,

    1. State Key Laboratory of Genetic Engineering, Institute of Plant Biology, Center for Evolutionary Biology, School of Sciences, Fudan University, 220 Handan Road, Shanghai 200433, China
    • These authors contributed equally to this work.

  • Pingli Lu,

    1. Department of Biology, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
    • These authors contributed equally to this work.

  • Yingxiang Wang,

    Corresponding author
    1. State Key Laboratory of Genetic Engineering, Institute of Plant Biology, Center for Evolutionary Biology, School of Sciences, Fudan University, 220 Handan Road, Shanghai 200433, China
  • Hong Ma

    1. State Key Laboratory of Genetic Engineering, Institute of Plant Biology, Center for Evolutionary Biology, School of Sciences, Fudan University, 220 Handan Road, Shanghai 200433, China
    2. Department of Biology, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
    3. Institutes of Biomedical Sciences, 138 Yixueyuan Road, Shanghai 200032, China
    • These authors contributed equally to this work.

(fax +86 21 65643794; e-mail or fax +86 21 55664187; e-mail


Meiosis is essential for eukaryotic sexual reproduction, with two consecutive rounds of nuclear divisions, allowing production of haploid gametes. Information regarding the meiotic transcriptome should provide valuable clues about global expression patterns and detailed gene activities. Here we used RNA sequencing to explore the transcriptome of a single plant cell type, the Arabidopsis male meiocyte, detecting the expression of approximately 20 000 genes. Transcription of introns of >400 genes was observed, suggesting previously unannotated exons. More than 800 genes may be preferentially expressed in meiocytes, including known meiotic genes. Of the 3378 Pfam gene families in the Arabidopsis genome, 3265 matched meiocyte-expressed genes, and 18 gene families were over-represented in male meiocytes, including transcription factor and other regulatory gene families. Expression was detected for many genes thought to encode meiosis-related proteins, including MutS homologs (MSHs), kinesins and ATPases. We identified more than 1000 orthologous gene clusters that are also expressed in meiotic cells of mouse and fission yeast, including 503 single-copy genes across the three organisms, with a greater number of gene clusters shared between Arabidopsis and mouse than either shares with yeast. Interestingly, approximately 5% of transposable element genes were apparently transcribed in male meiocytes, with a positive correlation to the transcription of neighboring genes. In summary, our RNA-Seq transcriptome data provide an overview of gene expression in male meiocytes and invaluable information for future functional studies.


Meiosis is essential for sexual reproduction in eukaryotes, and is characterized by homologous chromosome (homolog) pairing, synapsis and meiotic recombination (Hamant et al., 2006; Ma, 2006; Handel and Schimenti, 2010). Meiotic recombination not only ensures accurate homolog segregation, but generates genetic diversity within a population (Ma, 2005). Because meiosis is evolutionarily conserved, knowledge of meiotic gene activities from model systems could provide invaluable information for studying and understanding meiosis in other eukaryotes, such as crop plants. Furthermore, many of the genes affecting fundamental structures (e.g. the synaptonemal complex) and processes (e.g. meiotic recombination and chromosome segregation) are conserved between fungi, plants and animals (Zickler and Kleckner, 1999; Ma, 2005; Hamant et al., 2006; Wilkins and Holliday, 2009; Handel and Schimenti, 2010).

Arabidopsis thaliana is an excellent model system and has been used extensively to study fundamental biological processes, including meiosis (Ma, 2006). Dozens of meiotic genes have already been discovered in Arabidopsis (Klimyuk and Jones, 1997; Grelon et al., 2001; Armstrong et al., 2002; Azumi et al., 2002; Li et al., 2004; Stacey et al., 2006). Interestingly, analyses in Arabidopsis also led to some new findings about the meiotic process (Ma, 2005), such as the function of PTD in meiotic cross-over formation and possibly in Holliday junction resolution (Wijeratne et al., 2006).

In flowering plants, male meiosis occurs within the anther, which also contains several somatic tissues (Ma, 2005). Recently, microarray analyses in Arabidopsis and other plants have identified reproductive genes using young wild-type and mutant anthers spanning the time of meiosis (Wijeratne et al., 2007; Hobo et al., 2008; Ma et al., 2008), allowing identification of candidate meiotic genes, such as RCK and PTD, which were then shown to be important for meiotic recombination (Chen et al., 2005; Wijeratne et al., 2006). Because meiocytes are surrounded by somatic anther cells (Goldberg et al., 1993), analysis of the meiocyte transcriptome will significantly expand our knowledge of gene expression in meiocytes, particularly meiocyte-preferential expression. The meiocyte transcriptome has already been studied in yeasts, rodents and human (Chu et al., 1998; Primig et al., 2000; Chalmel et al., 2007; Juneau et al., 2007; Wilhelm et al., 2008), but not in plants.

Whole-transcriptome sequencing, usually termed RNA sequencing (RNA-Seq), has emerged as a powerful and quantitative method to explore the RNA transcripts in a specific tissue or cell type. Compared to microarrays, RNA-Seq methods rely less on existing genomic sequence information, cover a substantially larger range of expression levels, and can even uncover unexpected regulatory mechanisms (Wang et al., 2009). It has been used to investigate the mRNA and small RNA transcriptomes in Arabidopsis flowers (Lister et al., 2008) and meiotic transcriptomes in fission yeast (Wilhelm et al., 2008). To better understand meiosis in flowering plants, we used the SOLiD RNA-Seq platform in combination with a new technique of micro-collection for Arabidopsis male meiocytes to investigate the gene expression profile, and compared it with other expression information for plant tissues and non-plant cells.


RNA-Seq reveals that approximately 20 000 genes are active in Arabidopsis male meiocytes

To investigate the Arabidopsis male meiocyte transcriptome, we implemented high-throughput RNA-Seq using the SOLiD technology to analyze RNAs from male meiocytes isolated using a stepwise procedure (see Experimental procedures). Two separate sequencing runs were performed, producing 14 341 970 35 bp and 28 777 947 50 bp mapped reads, with 30 or 33% matching unique genomic positions, respectively (Table S1). Of the uniquely mapped reads, 90% were located in known exons (Table 1), and 2–3% of the reads covered known exon–intron junctions or completely fell within introns, with the remainder in intergenic regions. The reads that mapped to multiple positions were located in intergenic regions significantly more frequently than in genes (Table S1). Expression was detected across whole chromosomes, except for very low levels in the centromeric regions (Figure 1a), for both uniquely and multiply mapped reads (Figure S1), suggesting that centromeres are repressed during meiosis.

Table 1.   Summary of transcript types detected by short reads

                   Replication 1               Replication 2
Transcript type    Gene number   Read number   Gene number   Read number
mRNA               20 163        2 933 935     20 380        4 802 459
tRNA               35            24 091        34            47 175
rRNA               7             1 484 571     7             3 296 637
All                21 078        4 459 246     21 225        8 172 507

Abbreviations: ncRNA, non-coding RNA; snoRNA, small nucleolar RNA; snRNA, small nuclear RNA.

Figure 1.

 The Arabidopsis male meiocyte transcriptome.
(a) The transcriptomic landscape for chromosome 1. For each window of 100 kb, normalized total reads matching exons or introns are shown above the x axis, and normalized total reads matching intergenic regions or transposable element genes are shown below the x axis. RPKM, reads per kilobase per million mapped reads.
(b) Number of genes that were detected with various numbers of reads.
(c) Number of genes detected with at least five reads in each replication.
(d) Number of genes with at least one intron detected by at least five reads in each replication.

We then determined the genome-wide meiocyte gene expression profile based on reads assigned to genes. In total, 21 078 and 22 339 genes with at least one read were detected in the two sequencing runs, respectively, with 19 829 genes detected in both runs. Most of the genes detected had fewer than 500 reads, with >71% of the genes having 5–500 reads (Figure 1b). Expression levels for genes with fewer than five reads showed greater variation between the two sequencing runs than other genes (Figure S2). This is probably due to the sampling effect of the sequencing runs, although other possibilities cannot be ruled out, such as sequencing errors. Subsequently, we focus on the 16 162 genes with at least five reads in both sequencing runs, unless indicated otherwise (Figure 1c).
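The filtering rule described above (at least five reads in both sequencing runs) can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the gene identifiers and read counts are invented toy values, and `MIN_READS` and `reliably_detected` are hypothetical names.

```python
# Toy read counts per gene for two sequencing runs (hypothetical values).
run1 = {"geneA": 120, "geneB": 3, "geneC": 7, "geneD": 0}
run2 = {"geneA": 210, "geneB": 9, "geneC": 5, "geneE": 40}

MIN_READS = 5  # threshold stated in the text


def reliably_detected(counts_a, counts_b, min_reads=MIN_READS):
    """Return genes with at least `min_reads` reads in both runs."""
    shared = counts_a.keys() & counts_b.keys()
    return sorted(
        gene for gene in shared
        if counts_a[gene] >= min_reads and counts_b[gene] >= min_reads
    )


detected = reliably_detected(run1, run2)
print(detected)
```

Requiring the threshold in both runs, rather than in the pooled counts, discards genes such as `geneB` whose apparent expression depends on a single run.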

We also found that introns of 2698 genes had at least one read in each of the two sequencing runs; among these genes, 425 had at least five intronic reads in each sequencing run (Figure 1d and Table S2), suggesting non-coding transcripts or possible unannotated exons. Compared to introns lacking sequenced reads, these putatively expressed introns were on average much longer (556 versus 152 nucleotides; Mann–Whitney rank sum test, P = 0), and were much more frequently the first introns of the corresponding genes (173 first introns among 444 putatively expressed introns versus 19 433 first introns among all 109 733 introns without any reads; chi-squared test, P = 4e−19). For 167 putatively expressed introns, we found conserved functional RNA sequences using RNA family information from the Rfam database (Gardner et al., 2009), including enrichment for microRNAs (miRNAs), snoRNAs (including H/ACA box snoRNAs) and group II introns. Expression of these intronic non-coding RNAs may interfere with splicing of the relevant introns (Brown et al., 2008). Moreover, putatively expressed introns with conserved RNA structures were expressed at significantly higher levels than other putatively expressed introns (291 versus 29, average number of reads for the two runs; Mann–Whitney test, P = 5e−15), suggesting that they may indeed represent functional RNA elements.

As a first test of the reliability of our transcriptome data, we detected the expression of all 71 genes with documented functions in Arabidopsis meiosis (Ma, 2006; Mercier and Grelon, 2008) (Table S3). Second, we detected expression of many homologs of meiosis-related genes from other model species. For example, for 83 human meiosis-related genes in the GenBank database, we identified 72 Arabidopsis homologs, all of which were detected (Table S4). Third, we examined the expression of selected genes using real-time PCR and in situ RNA hybridization, and the results were consistent with the RNA-Seq data (Figure 2 and Figures S3 and S4). In contrast, the Arabidopsis homologs (AT3G20580 and AT4G12520) of two tapetum-preferential rice genes (Os06g0685100 and Os04g0554500) (Suwabe et al., 2008) were expressed in the tapetum (Figure S4a,b), but were not detected in the male meiocyte RNA-Seq data. Together, these results support the reliability of our male meiocyte data.

Figure 2.

 Validation of gene expression patterns.
(a–d) Analysis of the expression of genes AT5G62080, AT4G35420, AT5G16920 and AT4G14080 using real-time RT-PCR. RT, root; SM, stem; RL, rosette leaf; CL, cauline leaf; Se, silique; S1–9, stage 1–9 flower; S10–11, stage 10–11 flower; S12–13, stage 12–13 flower; OF, open flower. The data are means ± SD of three replicates.
(e–l) RNA in situ hybridization of anther sections using the following antisense probes: (e, f) AT5G62080, (g, h) AT5G16920, (i, j) AT5G53815. (k, l) Hybridization using a sense probe as a control. (e, g, i) Anther early stage 5, showing strong expression in male meiocytes and the tapetum; (f, h, j) anther stage 6. ep, epidermis; en, endothecium; t, tapetum; ml, middle layer; pmc, pollen mother cells; vc, vascular and connective tissues. Scale bar = 10 μm.

Transcriptome comparison between Arabidopsis male meiocyte and other tissues

To characterize the Arabidopsis male meiocyte transcriptome further, we compared it with published transcriptomic data of dozens of wild-type Arabidopsis tissues (Schmid et al., 2005; Wijeratne et al., 2007). The male meiotic transcriptome shows the largest overlap (approximately 67%) with tissues undergoing active cell division (meiosis or mitosis), including floral buds (stage 9), anthers (stages 4–6) and the shoot apex (Figure 3a). Approximately 60% of genes detected in male meiocytes were also expressed in mature vegetative tissues, and smaller proportions overlapped with the expression in post-meiotic stamens and post-meiotic carpels (Figure 3a). Tissue transcriptome clustering on the basis of present/absent expression signals was consistent with the above results (Figure 3b).

Figure 3.

 Transcriptome comparison between male meiocyte and sporophytic tissues in Arabidopsis.
(a) Number of genes expressed in male meiocytes and one other tissue. Bars show the number of genes expressed in both male meiocytes and another tissue type. Labels: flower-S9, flowers at stage 9; carpels-12, carpels in stage 12 flowers; stamens-S12, stamens in stage 12 flowers. Other tissues are as indicated. Except for male meiocytes, expression data for all other tissues were from a previous microarray study (Schmid et al., 2005). Housekeeping genes were defined as genes expressed in all 79 samples examined in that study (Schmid et al., 2005).
(b) Transcriptome clustering based on binary expression signals (present/absent). Agglomerative hierarchical clustering was performed using functions from the R software (R Development Core Team, 2010), with distances between tissues/cell types calculated as the proportion of shared expressed genes and the clustering algorithm set as Ward’s minimum variance. In the names of tissues, suffixes indicate developmental stages, where ‘d’ represents days and ‘s’ indicates flowering developmental stages. Sources of data are the same as in (a).
(c) Venn diagram for the number of genes expressed in male meiocytes, anthers (stage 4–6) and flowers (stage 9). Expression data for anthers were from Wijeratne et al. (2007); those for flowers were from the same sources as used in (a).
(d) Venn diagram for the number of genes expressed in male meiocytes, inflorescence meristem and leaves. Expression data for the inflorescence meristem and leaves were obtained from the At-TAX database (Laubinger et al., 2008).
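The distance underlying the clustering in panel (b) can be illustrated with a small sketch. The caption states only that distances were based on the proportion of shared expressed genes; here that is assumed to mean one minus the shared fraction of the union (a Jaccard distance), which may differ from the measure actually used. The tissue names, gene sets and the function `shared_distance` are hypothetical, and the Ward clustering step itself is not reproduced.

```python
from itertools import combinations

# Hypothetical present/absent calls: each sample maps to its set of expressed genes.
expressed = {
    "meiocyte": {"g1", "g2", "g3", "g4"},
    "anther": {"g1", "g2", "g3", "g5"},
    "leaf": {"g1", "g6"},
}


def shared_distance(a, b):
    """Assumed distance: 1 minus the fraction of genes expressed in both samples."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Pairwise distances between all samples, keyed by alphabetically ordered pairs.
distances = {
    (s, t): shared_distance(expressed[s], expressed[t])
    for s, t in combinations(sorted(expressed), 2)
}
for pair, dist in sorted(distances.items()):
    print(pair, round(dist, 3))
```

A matrix of such distances is what an agglomerative method would take as input; in this toy example the meiocyte sample lies closer to the anther than to the leaf.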

We also compared our meiocyte transcriptome data with microarray results from anthers and floral buds. We identified 3409 genes whose expression was detected in our RNA-Seq experiment but was missed by the previous microarray experiments (Figure 3c). Most of these genes (2162/3409; approximately 63%) were not represented on the Arabidopsis ATH1 chip array used previously (Schmid et al., 2005; Wijeratne et al., 2007). The remaining 1247 genes were not previously detected, possibly due to dilution by non-meiotic cells and/or limited sensitivity of microarray experiments. Another study with inflorescence meristems and leaves using whole-genome tiling arrays (Laubinger et al., 2008) examined many more genes (30 228) than are present on the ATH1 chip (20 969). Our data uncovered expression of 1473 genes that were not detected by the tiling array study (Figure 3d), even though approximately 78% of these genes were represented on the tiling array, indicating that these genes might be male meiocyte-preferential. Taken together, these results indicate that deep sequencing can greatly increase the sensitivity of transcriptome analysis.

Gene families enriched in male meiocyte-expressed genes

We next searched for genes encoding specific Pfam-defined protein domains that are over-represented in male meiocytes. Of the 3378 Pfam domains (Finn et al., 2010) found in Arabidopsis, 3265 (approximately 97%) matched meiocyte-expressed genes, indicating the functional complexity of the Arabidopsis meiotic transcriptome. Furthermore, with a family-wise error rate of 0.05, 18 Pfam domains were enriched among meiocyte-expressed genes (Table 2 and Table S5), including several related to meiosis, such as the kinesin, helicase_C and DEAD box domains (Guogas et al., 2009).
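To make the idea of domain enrichment under a family-wise error rate concrete, the following sketch tests each domain with a one-sided hypergeometric test and applies a Bonferroni correction. Both choices are assumptions made for illustration; the text does not state which test or correction was used. The genome size, domain names and counts are invented toy values, and `hypergeom_upper_tail` and `enriched_domains` are hypothetical helper names.

```python
from math import comb

# Hypothetical toy numbers: genome size, number of meiocyte-expressed genes,
# and per-domain (family size, members observed among expressed genes).
GENOME_TOTAL = 1000
EXPRESSED_TOTAL = 600
FWER = 0.05
domains = {"domA": (50, 45), "domB": (40, 24)}


def hypergeom_upper_tail(k, family_size, expressed_total, genome_total):
    """P(X >= k) when `expressed_total` genes are drawn without replacement from
    `genome_total` genes, of which `family_size` carry the domain."""
    upper = min(family_size, expressed_total)
    denom = comb(genome_total, expressed_total)
    num = sum(
        comb(family_size, i) * comb(genome_total - family_size, expressed_total - i)
        for i in range(k, upper + 1)
    )
    return num / denom


def enriched_domains(domains, expressed_total, genome_total, fwer=FWER):
    """Return (name, observed, expected, p) for domains passing a Bonferroni cutoff."""
    cutoff = fwer / len(domains)  # Bonferroni: divide by the number of tests
    hits = []
    for name, (size, observed) in sorted(domains.items()):
        expected = size * expressed_total / genome_total
        p = hypergeom_upper_tail(observed, size, expressed_total, genome_total)
        if observed > expected and p < cutoff:
            hits.append((name, observed, expected, p))
    return hits


hits = enriched_domains(domains, EXPRESSED_TOTAL, GENOME_TOTAL)
for name, observed, expected, p in hits:
    print(f"{name}: observed={observed}, expected={expected:.1f}, P={p:.2e}")
```

The expected count here is simply the family size scaled by the overall fraction of expressed genes, which corresponds to the Observed and Expected columns of an enrichment table.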

Table 2.   The 18 most enriched Pfam domains in Arabidopsis male meiocytes

Pfam domain    Observed    Expected    P value

To identify putative meiotic regulators, we also examined the expression of transcription factor genes in male meiocytes. Despite their potential importance in regulating gene expression, few transcription factors involved in plant meiosis have been functionally characterized (Yang et al., 2003b). Here, we found that 1074 transcription factor genes annotated by the AGRIS database (Davuluri et al., 2003) were expressed in male meiocytes, indicating significant over-representation (1074 of 1851 predicted transcription factor genes; chi-squared test, P < 2e−16). RNA in situ hybridization confirmed the meiotic expression pattern of four randomly selected transcription factor genes (Figure S4c–f). In addition, 50 of the 52 transcription factor families defined in the AGRIS database were represented by at least one member in our sequencing, with 25 transcription factor families being represented by at least ten members each in male meiocytes (Figure S5). These findings suggest the involvement of diverse transcriptional regulators during meiosis.

Identification of meiosis-preferential genes

To identify male meiocyte-preferentially expressed genes, we wished to compare our RNA-Seq data with published microarray data. However, direct comparison of RNA-Seq data and microarray results using an existing program was not feasible. We thus examined whether a gene is expressed in a tissue using the present/absent designations from microarray analysis, with data from experiments using either the Affymetrix ATH1 arrays or the Arabidopsis 1.0F tiling arrays (>30 000 genes) and more than 90 tissue samples. In total, we found 844 genes whose expression in male meiocytes was very reliable, but whose expression in non-meiotic samples was very weak or even undetected, suggesting that they were male meiocyte-preferential. Several known meiotic genes were found in the meiosis-preferential gene set, including SDS (Azumi et al., 2002), MMD1 (Yang et al., 2003b) and RCK (Chen et al., 2005). Furthermore, gene ontology (GO) analysis showed that several enriched biological processes were important for meiosis, such as meiotic recombination, chiasma formation and synapsis (Table S6). Another of the most enriched GO terms was related to the ubiquitylation process (GO:0006511; 18 genes), suggesting the existence of meiosis-specific protein degradation pathways. In total, we identified 17 Pfam gene families that are significantly enriched in male meiocyte-preferential genes, six of which are of unknown function (Table S7).
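The present/absent comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the gene and sample names are invented, and the cutoff `MAX_PRESENT_SAMPLES` (here zero, i.e. called present in no non-meiotic sample) is a hypothetical stand-in for the criterion of "very weak or even undetected" expression, whose exact definition is not given in the text.

```python
# Hypothetical inputs: genes reliably detected in meiocytes by RNA-Seq, and
# microarray "present" calls for non-meiotic samples.
meiocyte_reliable = {"g1", "g2", "g3", "g4"}
present_calls = {
    "leaf": {"g1", "g5"},
    "root": {"g1", "g3"},
    "stage12_stamen": {"g1"},
}
MAX_PRESENT_SAMPLES = 0  # assumed cutoff, not taken from the study


def preferential(reliable, calls, max_present=MAX_PRESENT_SAMPLES):
    """Return reliable meiocyte genes called present in at most `max_present` samples."""
    result = []
    for gene in sorted(reliable):
        n_present = sum(gene in genes for genes in calls.values())
        if n_present <= max_present:
            result.append(gene)
    return result


candidates = preferential(meiocyte_reliable, present_calls)
print(candidates)
```

Working with binary calls sidesteps the problem that RNA-Seq read counts and microarray intensities are not on a common scale.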

The conservation and divergence of meiotic processes revealed by comparative transcriptome analysis

To investigate the similarity and differences in meiocyte transcriptomes between organisms, we used the software Inparanoid (Remm et al., 2001, version 3.0) to identify putative orthologs for Arabidopsis genes in several other species, including poplar (Populus trichocarpa), rice (Oryza sativa), budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe), mouse (Mus musculus) and human.

We first identified the number of meiocyte-expressed Arabidopsis genes that had homologs in each of the examined species, compared with all Arabidopsis genes (Figure 4a; binomial test, P < 2e−16 for all cases). For the 16 162 meiocyte-expressed Arabidopsis genes, approximately three-quarters and two-thirds have homologs in poplar and rice, respectively, with lower proportions having homologs to mammalian genes (Figure 4a; binomial test, P < 2e−16 for all comparisons), and even fewer having homologs to yeast genes (Figure 4a; binomial test, P < 2e−16 for all comparisons), suggesting greater conservation between plant and mammalian meiosis than with yeasts.

Figure 4.

 Conservation of meiotic genes across model species.
(a) Percentage of Arabidopsis genes that lack or have homologs in other model species. Group A are Arabidopsis genes that do not have detected homologs in other species. The remaining bars represent the percentages of Arabidopsis genes with homologs in one or more other species. Gene homologs were identified using the software Inparanoid (version 3.0, Remm et al., 2001). Species: A, Arabidopsis; P, poplar; R, rice; H, human; M, mouse; F, fission yeast; B, budding yeast; MY, mammal plus yeasts, i.e. the union of human, mouse and the two yeasts.
(b) Meiocyte transcriptome comparison among Arabidopsis, mouse and fission yeast. Numbers indicate orthologous gene clusters with at least one member expressed in meiotic cells in one or more organisms. Orthologous clusters were identified using Inparanoid. Meiocyte transcriptome data for mouse and yeast were from previous publications (Chalmel et al., 2007; Wilhelm et al., 2008).
(c) Conserved single-copy genes expressed in meiotic cells of the three organisms. These single-copy genes are among the orthologous gene clusters in (b).
(d) Percentage of yeast core meiotic genes (Primig et al., 2000) with homologs expressed in meiotic cells of Arabidopsis and/or mouse.

We then compared the Arabidopsis male meiocyte transcriptome with that of mouse (Chalmel et al., 2007) and fission yeast (Wilhelm et al., 2008) to estimate the number of genes expressed during meiosis in two or more organisms. Because of gene duplication, the numbers of genes in a homologous group were not always the same for all species; we therefore defined homologous gene clusters as those that had at least one gene from each species (Figure 4b). A total of 1012 gene clusters were conserved in both sequence and expression across the three organisms (Figure 4b). The precursors of these genes possibly functioned in meiocytes of the last common ancestor of plants, fungi and animals. Among the 1012 conserved gene clusters in meiocytes, 503 were found to contain single-copy genes in each of the three organisms (Figure 4c and Table S8), suggesting that they are highly conserved. GO annotation analysis showed that, in addition to housekeeping genes, genes participating in DNA repair and replication were significantly enriched among the conserved single-copy genes (Table S9).
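The selection of conserved clusters and of single-copy clusters can be sketched as below. This is a simplified illustration, not the Inparanoid workflow: the cluster identifiers and gene names are invented, and each listed gene is assumed to be a cluster member that is expressed in meiotic cells of the given organism.

```python
# Hypothetical ortholog clusters: cluster id -> species -> meiosis-expressed members.
clusters = {
    "c1": {"arabidopsis": ["At1"], "mouse": ["Mm1"], "fission_yeast": ["Sp1"]},
    "c2": {"arabidopsis": ["At2", "At3"], "mouse": ["Mm2"], "fission_yeast": ["Sp2"]},
    "c3": {"arabidopsis": ["At4"], "mouse": ["Mm3"], "fission_yeast": []},
}
SPECIES = ("arabidopsis", "mouse", "fission_yeast")

# Conserved: at least one expressed member in every species.
conserved = sorted(
    cid for cid, members in clusters.items()
    if all(members.get(sp) for sp in SPECIES)
)

# Single-copy: exactly one member per species among the conserved clusters.
single_copy = [
    cid for cid in conserved
    if all(len(clusters[cid][sp]) == 1 for sp in SPECIES)
]

print(conserved, single_copy)
```

In this toy example `c2` is conserved but not single-copy because of a duplication in one species, and `c3` is excluded because it has no expressed member in fission yeast.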

A significantly higher number of meiocyte-expressed genes were shared between Arabidopsis and mouse than between either and fission yeast (Figure 4b; chi-squared test, P < 2e−16). In particular, 1497 orthologous gene clusters were shared exclusively between the Arabidopsis and mouse male meiocyte transcriptomes, supporting greater similarity in meiotic processes between mammals and plants, and consistent with previous molecular genetic and evolutionary studies (Lin et al., 2006, 2007; Ma, 2006). This difference in gene conservation might be due to relatively rapid evolution of gene function in yeasts, including extensive gene losses, such as three genes in the recA/RAD51 gene family (Lin et al., 2006). Of the three genes in the recA/RAD51 gene family that have been lost in yeasts, RAD51C and RAD51D were found in our group of meiotic genes shared by Arabidopsis and mouse.

We also compared our meiocyte transcriptome with four previously defined meiotic gene clusters in budding yeast, peaking at different times during meiosis (Primig et al., 2000). For yeast meiotic clusters I–IV, approximately 40, 28, 16 and 24%, respectively, had homologs in Arabidopsis (Table S10), and almost all of these Arabidopsis homologs were also expressed in male meiocytes (Figure 4d). Homologs in Arabidopsis for cluster II genes were enriched for proteins required for DNA replication, recombination and chromosome synapsis, while cluster III genes primarily encoded proteins for cell-cycle progression, such as the anaphase-promoting complex, SEC14 proteins and cyclins (Table S10).

Evidence for active transposable elements in male meiosis

Because genomic re-organization or mutations induced by insertion/deletion of transposable elements (TEs) during meiosis can potentially be transmitted to the next generation, we investigated the transcriptional activities of TEs in meiocytes. Our RNA-Seq experiment provided the opportunity to detect expression of TE-encoded genes, allowing estimation of transposon activity in meiocytes. We identified 467 TE genes (of 3901 annotated TEs, approximately 12%) that had one or more uniquely mapped reads, indicating possible activities of these TEs during meiosis. We concentrated on 191 potentially active TE genes (approximately 5%) for which at least one read was detected in both sequencing runs. The mean expression level of this small set of TE genes was approximately 2.9 ± 1.0 RPKM (reads per kilobase of mRNA length per million mapped reads) (mean ± standard error of mean), which is significantly lower than the value of 56.8 ± 37.7 RPKM for endogenous protein-coding genes (Mann–Whitney test, P < 2e−16), suggesting strong constraints on TE activities in meiocytes. The detected TEs belong to eight TE families, as well as an unclassified group (Figure 5a). The two most enriched TE families were Copia and Gypsy, both of which are long terminal repeat (LTR) retrotransposons. However, one of the non-LTR retrotransposon families, short interspersed nuclear element (SINE), showed the highest proportion of active genes (9 of 17, Figure 5a). These active SINE genes showed highly variable as well as significantly higher expression than other expressed TE genes (33.3 ± 19.7 versus 1.4 ± 0.2 RPKM; Mann–Whitney test, P = 5e−5; Figure 5b), suggesting that this type of TE gene may play a special role in meiosis.
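As a reminder of how the RPKM unit used above is computed, the definition (reads per kilobase of transcript length per million mapped reads) can be written as a short function. The function name `rpkm` and the numbers in the example are illustrative only and are not values from this study.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and (total / 1e6) is the same as multiplying by 1e9.
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Illustrative values: 150 reads on a 1500 bp transcript, 3 million mapped reads.
example = rpkm(150, 1500, 3_000_000)
print(round(example, 2))
```

Normalizing by both transcript length and sequencing depth is what allows expression levels to be compared between genes of different sizes and between runs with different numbers of mapped reads.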

Figure 5.

 Transposable elements (TEs) active in Arabidopsis male meiocytes.
(a) Distribution of expressed TE genes. Numbers represent active genes (before the forward slash) and total genes (after the forward slash) for each TE family.
(b) Variation of expression levels for genes of TE families (same order as in a). The bold central line indicates the median expression value, the upper and lower borders show the upper and lower quartile points, respectively, and the ‘whiskers’ extend to the most extreme points within 1.5× interquartile range of the box.
(c) Distribution of expression values for TE genes active in male meiocytes. Expression data are from male meiocytes (this study) and 11 sporophytic tissues (tiling array experiments, see text). Genes present in each platform were equally divided into 100 groups according to ranked expression levels, separately for each tissue/cell type. TE genes were then classified into one of the expression groups in each tissue according to their respective expression values.
(d) Heat map of ranked expression values for TE genes expressed in male meiocytes. Tissues: leaf_35d, senescing leaves; leaf_7d, expanding leaves at day 7; v.s.m., vegetative shoot meristem; inflo_wt, wild-type inflorescences; inflo_clv3-7, mutant inflorescences; i.s.m., inflorescence shoot meristem. The bar on the left indicates the family for the TE gene, for which the key is presented above the heat map. The color key in the top left-hand corner is for colors in the heat map.
(e) Median expression level (RPKM) of the ten closest genes upstream or downstream of active TE genes, silenced TE genes or protein-coding genes, respectively. On the x axis, the origin point represents the three types of genes taken for comparison, while negative coordinates and positive coordinates represent upstream and downstream gene loci, respectively.
(f) Transcript types of genes adjacent to active TE genes, silenced TE genes and protein-coding genes.
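The rank-based grouping used for panel (c), in which genes are divided equally into 100 groups by ranked expression within each tissue, can be sketched as follows. The function name `percentile_groups` and the toy expression values are hypothetical, and ties are broken by gene name here, which is an assumption rather than a documented choice.

```python
def percentile_groups(expression, n_groups=100):
    """Assign each gene to one of `n_groups` equal-sized rank bins (1 = lowest)."""
    # Rank genes by expression; ties are broken by gene name (assumed convention).
    ranked = sorted(expression, key=lambda gene: (expression[gene], gene))
    n = len(ranked)
    return {gene: (i * n_groups) // n + 1 for i, gene in enumerate(ranked)}


# Toy example: ten genes split into five groups of two.
toy = {f"g{i}": float(i) for i in range(10)}
groups = percentile_groups(toy, n_groups=5)
print(groups)
```

Because the bins are defined separately for each tissue or platform, a gene's group number can be compared between RNA-Seq and array data even though the raw expression units differ.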

To identify commonly as well as meiocyte-preferentially expressed TE genes, we compared expression patterns for active TE genes in meiocytes and sporophytic tissues. Of the 191 putative meiocyte-expressed TE genes, 183 have available expression data in sporophytic tissues obtained via tiling array experiments (Laubinger et al., 2008). Most TE genes were expressed at very low levels across sporophytic tissues, consistent with their low expression in meiocytes (Figure 5c). We identified three clusters of TE genes based on their expression pattern (Figure 5d). The first cluster, including 117 genes, showed little or no expression across all tissues or cell types, suggesting broad repression of their activities. The second cluster of 19 genes showed broad expression across most tissues. The remaining 47 genes showed meiosis-preferential expression (Figure 5d). In this cluster, 21 TEs were not assigned to any family; of the remaining 26, LTR retrotransposons were the most enriched class, including seven Copia and six Gypsy genes. The second most enriched meiosis-preferential TE family was the SINE family (seven genes), which also propagates via retrotransposition. Three long interspersed nuclear element (LINE) genes were also included in this cluster. To validate the expression pattern of these meiosis-preferential TE genes, we performed RNA in situ hybridizations for a Copia element (AT5G53815), and found that it was strongly expressed in male meiocytes (Figure 2i,j). Although the relationship between active TEs and meiosis is largely unknown, it is possible that TEs that are active during meiosis could lead to genetic changes in the next generation.

To investigate factors that might affect TE expression in meiocytes, we compared the genomic contexts for TE genes that were either transcribed in meiocytes or silent, with those for endogenous protein-coding genes expressed in meiocytes. The gene density (27.0 ± 0.4 genes per 100 kb) for genomic regions with active TEs was significantly lower than that in regions with active protein-coding genes (30.4 ± 0.0 genes per 100 kb; Mann–Whitney test, P < 2e−16), but higher than that in genomic regions with silent TE genes (24.0 ± 0.1 genes per 100 kb; Mann–Whitney test, P = 1.5e−9). Analysis of expression of the surrounding genes revealed different patterns among these three types of genes: genes neighboring silent TEs showed lower expression than those neighboring active TEs, which in turn exhibited lower expression than those neighboring protein-coding genes (Figure 5e and Table S11). The gene types of the neighbors of the three types of gene were also different (Figure 5f): most neighbors of protein-coding genes were also protein-coding (91.0%), whereas most neighbors of silent TE genes were also TE-related (60.0%), consistent with the previous finding that TEs are strongly under-represented near protein-coding regions (Wright et al., 2003). Coincident with their expression in meiocytes, active TEs were flanked more often by protein-coding genes than by TE genes (58.4 versus 31.2%), which is significantly different from silent TE genes (chi-squared test, P < 2e−16). Thus the transcriptional states of TEs may be correlated with those of nearby genes, and multiple TEs within a region tend to be co-repressed.


In this study, we explored the transcriptome of a single Arabidopsis cell type, the male meiocyte, using RNA-Seq, and detected the expression of approximately 20 000 genes. The reliability of our transcriptomic data was supported by detection of all currently known Arabidopsis meiotic genes and many Arabidopsis homologs of meiotic genes in other species, and by the results of real-time quantitative RT-PCR and RNA in situ analysis. Over 60% of the annotated Arabidopsis genes have detectable expression in meiocytes, greater than previously detected expression in other Arabidopsis single cell types and even tissues with multiple cell types (Schmid et al., 2005; Yang et al., 2008). In particular, guard cells express fewer than 50% of the genes on the microarray chip, suggesting greater meiocyte transcriptome complexity and/or RNA-Seq sensitivity (Yang et al., 2008). The expression of the majority of Arabidopsis genes suggests that the meiocyte transcriptome supports much more than meiosis-specific functions.

Our comparison of the Arabidopsis male meiocyte transcriptome with mouse and yeast meiocyte transcriptomes indicated that a greater number of genes were expressed in Arabidopsis male meiocytes and that these represent a greater percentage of the total annotated genes than in mouse. Therefore, it is possible that the Arabidopsis male meiocyte is less specialized than the mouse male meiocyte. Unlike the early separation of germline cells from somatic tissues in animals (Hubbard and Pera, 2003), plant meiocytes are only a small number of mitoses from totipotent stem cells of the meristem (Ma, 2005), and possibly retain some stem cell properties. Furthermore, at both the sequence and expression levels, there is a greater degree of conservation between Arabidopsis and mouse meiocyte transcriptomes than of either to those of yeasts, even though fungi are evolutionarily closer to animals. Because meiosis is an ancient process, it is possible that plants and animals, which have longer generation times, have retained a greater extent of functional conservation, whereas the single-cellular yeasts have diverged more rapidly, consistent with molecular evolutionary studies of meiotic gene families (Lin et al., 2006, 2007).

Gene families implicated in meiotic processes

Our study provides direct support for the activity of candidate meiotic genes that are homologs of known meiotic genes in other organisms. For example, the MSH protein family includes MSH2–MSH7 (Lin et al., 2007). MSH2, 3 and 6 are important for DNA mismatch repair in yeast (Marsischky et al., 1996). In Arabidopsis, MSH4 and MSH5 are important for meiotic cross-over (Higgins et al., 2004, 2008; Lu et al., 2008). In vitro, the Arabidopsis MSH2/3, MSH2/6 and MSH2/7 heterodimers recognize subsets of DNA lesions (Culligan and Hays, 2000; Wu et al., 2003). AtMSH2 also plays anti-recombination roles (Emmanuel et al., 2006). However, the function of other MSH genes in meiosis is not clear. The expression of all Arabidopsis MSH genes in male meiocytes suggests that they function in meiosis (Table S12). AtPMS1 plays an anti-recombination role in mitosis (Li et al., 2009). The finding of AtPMS1 expression in male meiocytes (Table S12) suggests that it may also function in meiosis.

Kinesins are motor proteins that function in the assembly and positioning of spindle and other microtubule structures (Miki et al., 2005). Arabidopsis has 61 kinesin genes (Reddy and Day, 2001), three of which, ATK1, ATK5 and TETRASPORE, play crucial roles in male meiosis (Chen et al., 2002; Yang et al., 2003a; Quan et al., 2008). We detected expression of these three genes and 51 additional kinesin genes in male meiocytes, suggesting that many kinesins function in meiosis, consistent with the finding that multiple kinesins tend to work together (Hancock, 2008), and further suggesting that Arabidopsis meiocytes require active participation of the vast majority of kinesins.

DNA helicases are important for the maintenance of genome stability, affecting DNA repair and recombination (Guogas et al., 2009), and include two closely related RecQ helicases, AtRECQ4A and 4B (Hartung et al., 2007) and ROCK-N-ROLLERS (RCK), which are involved in chromosome synapsis and cross-over formation (Chen et al., 2005). RCK also has a DEAD box domain, which is another enriched Pfam domain, and 88 more DEAD-box-containing ATP-binding helicases were present in our transcriptome dataset (Table S5). Furthermore, expression was detected for other helicases in the AAA protein family, which includes a phylogenetically defined ‘meiotic’ group (Frickey and Lupas, 2004; Forster et al., 2008). This meiotic group includes the PCH2 protein, which is conserved in animals and fungi and acts as a checkpoint for defects in the meiotic synaptonemal complex (Wu and Burgess, 2006). Arabidopsis male meiocytes express at least 103 AAA-ATPase genes, suggesting that they regulate many aspects of meiosis. Of these, one shows high sequence similarity to the yeast PCH2 protein (AT4G24710, BLAST E-value = 1e−51). Although plant meiosis appears to differ from animals in terms of checkpoint regulation (Yang et al., 2003b; Ma, 2006), expression of PCH2 provides an entry point to study this important regulatory mechanism and possible conservation between plants and animals.

Regulatory control of meiosis

Although few transcription factors are known to regulate meiosis in plants, except the MMD1 gene encoding a PHD domain protein (Yang et al., 2003b), our analysis uncovered expression of many predicted transcriptional regulators. We identified 54 genes containing the PHD domain in Arabidopsis, with statistically significant over-representation in male meiocytes (Table 2). In addition, we found enrichment of transcription factor families, including significant over-representation of the GRAS family (29 of 33 genes; chi-squared test, P = 3.7e−5), which is a plant-specific transcription factor family that plays diverse roles during plant development (Hirsch and Oldroyd, 2009). In the monocot Lilium longiflorum, the GRAS gene LlSCL regulates meiosis-associated genes (Morohashi et al., 2003). An Arabidopsis homolog (AT1G07530) of LlSCL was strongly expressed in male meiosis, suggesting that it may also regulate meiotic gene expression.

Gene expression may also be regulated at the chromatin and post-transcriptional levels. Of the 18 over-represented protein families in male meiocytes, some are probably important for chromatin structure, such as core histone proteins and methyltransferases, and others probably process RNA post-transcriptionally, such as RNA recognition motif-containing proteins. Interestingly, our sequencing also detected reads for 25 miRNA genes, 14 of which had at least one read in both runs (Table S13). Because we used a poly(T) oligo to specifically purify mRNAs for library construction and sequencing, we were only able to detect pri-miRNA molecules, which have poly(A) tails (Kurihara and Watanabe, 2004). Additional analysis indicated that the two genes with most reads, namely miR839 and miR163, were both species-specific (Xie et al., 2005; Rajagopalan et al., 2006). miR163 has five experimentally validated target genes, all encoding S-adenosyl-l-methionine-dependent methyltransferases (Allen et al., 2004). These five genes form two clusters in the genome: one on chromosome 1 (AT1G66690, AT1G66700 and AT1G66720) and the other on chromosome 3 (AT3G44860 and AT3G44870). The miR163 gene (AT1G66725) itself maps just downstream of the first cluster. The tandem positions of miRNA and target genes have been taken as evidence for the inverted duplication hypothesis for evolution of miRNA genes (Allen et al., 2004). Our RNA-Seq analysis did not detect reads for the three genes in the first cluster, and only one or two reads were detected for the two genes of the second cluster. miR839 has only one computationally predicted target, AT2G33710 (Zhang et al., 2010), which encodes an AP2 domain-containing transcription factor, for which no reads were detected. The limited or absent expression of the target genes suggests that they are silenced by the miRNAs. Several other detected miRNA genes were also non-conserved, such as miR824, miR856, miR780 and miR775 (Fahlgren et al., 2007). Therefore, species-specific post-transcriptional control of gene expression may have evolved to regulate the meiosis process.

Furthermore, putative proteins for ubiquitylation are also significantly enriched. The two most highly enriched domains, DUF627 and DUF629, with undefined functions, are present in 15 proteins, 12 of which also contain the UCH domain, which is known to be involved in ubiquitylation/deubiquitination (Li et al., 2000). Another domain involved in ubiquitin-related proteolysis, UBA, was present in three proteins encoded by meiocyte-preferentially expressed genes (Figure S4l). The UBA domain was also found in one of two proteins containing two other domains of unknown function, DUF908 and DUF913, suggesting that the DUF domains might function in ubiquitylation-related processes. All together, five of the 17 enriched Pfam domains are related to ubiquitylation, suggesting that protein degradation plays important roles in male meiocytes.

Assessing the conservation of functional pathways in meiosis

Our data make it feasible to systematically examine the possible functional conservation in Arabidopsis of entire meiotic pathways that have been established in other organisms. In yeasts (S. cerevisiae and S. pombe) and mammals, the RAD1/RAD10 (XPF/ERCC1) protein complex participates in both the nucleotide excision repair (NER) and single-strand annealing (SSA) DNA repair pathways (Ataian and Krebs, 2006). NER repairs helix-distorting DNA lesions, such as UV photoproducts and bulky covalent chemical adducts, whereas SSA participates in the repair of DNA double-stranded breaks, mainly between or within repeated sequences (Ataian and Krebs, 2006). In Arabidopsis, little is known about genes involved in these two pathways, especially in meiosis (Bray and West, 2005), although the PTD gene for cross-over formation is a distant homolog of RAD10/ERCC1 (Wijeratne et al., 2006). We identified Arabidopsis homologs for most of the proteins involved in the NER and SSA pathways, and found that all these homologs were expressed in male meiocytes, suggesting that both pathways are functional during Arabidopsis meiosis (Table S14).

Of the components of the NER pathway, 34 have detectable Arabidopsis homologs but two do not, namely RAD14 and ABF1. For the SSA pathway, 5 of 16 genes lack obvious homologs: RAD52, RAD59, RFA3, SAW1 and SLX4 (BLAST E-value <0.1). These genes are also absent in the rice and poplar genomes, suggesting that plants might use proteins encoded by other genes. In S. cerevisiae, RAD14 and ABF1 bind to DNA and interact with the RAD1/RAD10 and RAD7/RAD16 complexes, respectively, to facilitate recognition of damaged nucleotides (Yu et al., 2004, 2009; Guzder et al., 2006). Therefore, plants might recognize DNA lesions via other mechanisms during NER. In the SSA pathway, the genes lacking Arabidopsis homologs play essential functions. The S. cerevisiae RAD52 and RAD59 proteins form a complex to promote annealing of complementary single-stranded DNA (Sugawara et al., 2000; Davis and Symington, 2001), suggesting that other factors in plants serve the roles of RAD52/RAD59 in annealing complementary single strands during the SSA process in Arabidopsis meiosis.

Telomeres are essential for genomic stability (Shen et al., 2010) and play important roles in meiotic chromosome pairing (Liu et al., 2004; Siderakis and Tarsounas, 2007). We found a high level of expression of several genes related to telomeres, such as AT5G06310 (14.85 RPKM) and AT2G05210 (38.84 RPKM), which encode homologs of Pot1 (protection of telomeres 1), AtSTN1 (AT1G07130; 4.00 RPKM), which is required for telomere length homeostasis and chromosome end protection (Shakirov et al., 2005; Song et al., 2008), and AT5G03610 (1.10 RPKM) and AT4G06634 (8.75 RPKM), which encode homologs of NDJ1 and ZSCAN4, respectively, that function in telomere maintenance and meiotic recombination (Wu and Burgess, 2006).

Transposable elements active in meiosis

Transposable elements are considered to be a major force shaping the structure and dynamics of host genomes (Pace and Feschotte, 2007; Gogvadze and Buzdin, 2009). TEs have also contributed to the generation of genetic diversity and new genes (Xiao et al., 2008). Despite these potential benefits, TE activities are primarily mutagenic and they are generally silenced (Tran et al., 2005; Slotkin and Martienssen, 2007). Thus the propagation of TE and the host defensive mechanisms form a continuous battle. It is important for TEs to be mobile in the germline, and it is less deleterious to transpose in diploid tissues because mutations can be compensated by the functional copy. Indeed, TE expression in haploid pollen is limited to the vegetative cell and absent from sperm cells (Slotkin et al., 2009). Male meiocytes represent the final stage of the diploid phase, and the last opportunity for transposition in a diploid cell. We found that approximately 5% of annotated TE genes are expressed in Arabidopsis male meiocytes, and that the expression of TE genes is correlated with that of their neighboring genes.

The detection of active pollen TEs only in vegetative cells but not in sperm cells (Slotkin et al., 2009) suggests that genomic alterations generated by pollen-active TEs cannot be transmitted, as the vegetative nucleus does not contribute to the next generation. On the other hand, genomic changes during male meiosis can be transmitted, if they do not damage gene functions important for meiosis or pollen development. It appears that plants have evolved mechanisms to prevent transposition in the haploid sperm, but transposons have remained active during meiosis, resulting in a balanced activity that allows limited TE mobility in the germline. Recently, Mourier and Willerslev (2010) analyzed the activities of LTR retrotransposons in the fission yeast S. pombe during mitotic growth and meiosis, and found that a small proportion of LTRs contribute significantly to the repertoire of active LTRs; during meiosis, the transcription of solitary LTRs correlated with that of nearby protein-coding genes. Therefore, meiotic activity of a small percentage of transposons might be a conserved feature of the transposon lifecycle.

In previous studies, co-expression of adjacent genes was interpreted as having shared regulatory elements (Williams and Bowles, 2004), transcriptional read-through (Ebisuya et al., 2008) and chromatin remodeling (Chen et al., 2010). Our analysis indicates that a small proportion of TE genes might be co-expressed with their neighboring genes during meiosis, possibly caused by chromatin remodeling. Transcription of neighboring genes might cause the local chromatin of active TE genes to be open, increasing accessibility. On the other hand, it has been shown that silencing of TEs is possibly achieved via chromatin-based methylation mechanisms (Tran et al., 2005), possibly affecting the expression of nearby genes. During meiosis, transcription of TE genes might affect chromatin structures, which could in turn facilitate meiotic processes, such as pairing and recombination, both of which are dependent on an opened chromatin for search of the homologous chromosome.

Meiosis is both an essential step for sexual reproduction and one of the most ancient processes to facilitate re-distribution of genetic variation and increase biodiversity. These functions of meiosis have contributed greatly to the immense success of eukaryotes. Our analysis showed that the Arabidopsis male meiocyte transcriptome is very complex and contains many genes with potentially important functions that require further study. Furthermore, plant and mammalian meiocyte transcriptomes might be more similar to each other than either is to those of yeasts. We also found evidence for active TEs in the male meiocytes, providing an additional means to increase genetic changes, and possibly even contributing to the process of meiosis itself. These observations form a foundation for future functional studies.

Experimental procedures

Plant materials, growth conditions, and Arabidopsis male meiocyte isolation

All Arabidopsis plants in this study were of the Columbia (Col) ecotype. All plants were grown in Metro-Mix 200 soil (Greenhouse & Nursery Supplies) under 16 h light and 8 h dark in a growth chamber at 18–22°C. To isolate Arabidopsis male meiocytes, young floral buds were harvested, and stage 5–7 anthers were separated from other floral organs on a glass slide under a stereoscopic microscope (Nikon) (Figure S6a–c). The dissected anthers were immediately transferred to a micro-chamber with liquid meiocyte medium. Subsequently, two tiny needles were used to gently dissect anthers to release the meiocytes under the dissecting microscope (Figure S6d–f). The micro-chamber with dissected samples was moved onto an inverted microscope (Zeiss) with a micro-manipulation platform (Figure S6d), which was used to collect the meiocytes. Meiocytes during prophase I were easily detached from the tapetum, but remain associated with each other (Figure S6e), forming a cluster called a ‘worm’. Subsequently, meiocytes dissociate and have thickened walls (Figure S6f). A micro-glass-needle was mounted onto the micro-manipulator, and used to pipette meiocytes in the chamber by gentle mouth pipetting through a long plastic pipette. We refer to this newly developed method as ‘micro-collection of male meiocytes’ (Figure S6). Because the meiocytes were progressively separated from other non-meiotic cells, the preparation had little contamination, with an estimated purity of over 95% (Figure S6g). The collected meiocytes were immediately frozen in liquid nitrogen and kept at −80°C until further use.

Total RNA isolation and cDNA amplification by PCR

Total RNA was extracted from male meiocytes using Trizol reagent (Invitrogen) according to the manufacturer’s protocol. Approximately 300 ng total RNA was used for cDNA synthesis with SuperScript® III reverse transcriptase (Invitrogen) and a SMART™ PCR cDNA synthesis kit (Clontech). Twenty-one cycles of long-distance PCR (according to the Clontech protocol) were used to amplify total cDNAs. Finally, approximately 6 μg cDNA was used for sequencing by the Pennsylvania State University DNA Facility using Life Technologies’ SOLiD™ sequencing platform.

Mapping of short color-space reads

Two runs of sequencing yielded 32 156 564 reads of 35 bp and 87 448 276 reads of 50 bp, respectively. Mapping of the reads was performed against downloaded Arabidopsis genomic sequences and annotations (release 9) from the Arabidopsis Information Resource (TAIR) database (TAIR9), using a third-party software package, the Short Read Mapping Package (SHRiMP, Rumble et al., 2009), with default parameters, except for the cross-over penalty score, which was set to −25. To map reads against spliced exons, 34 bp (for the first run) or 49 bp (for the second run) at either side of all possible junctions between annotated exons were joined into a pseudo-sequence, which was used as a pseudo-chromosome for read mapping. Mapping results for genomic sequence and for exon junction sequences were combined after removal of redundant alignments.
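The construction of exon junction pseudo-sequences described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name, the 0-based half-open coordinate convention, the truncation of flanks at exon boundaries and the toy sequence are all assumptions made for illustration. The flank length is taken as the read length minus one (34 bp for 35 bp reads, 49 bp for 50 bp reads), so that any read aligned to a pseudo-sequence must span the junction.

```python
from itertools import combinations

def junction_sequences(chrom_seq, exons, read_len):
    """Build junction pseudo-sequences for all pairs of exons of one gene.

    chrom_seq : chromosome sequence as a string
    exons     : list of (start, end) tuples, 0-based half-open (assumption)
    read_len  : read length in bp; the flank on each side is read_len - 1
    Returns a list of (upstream_index, downstream_index, sequence).
    """
    flank = read_len - 1
    exons = sorted(exons)
    junctions = []
    for (i, (s1, e1)), (j, (s2, e2)) in combinations(enumerate(exons), 2):
        if e1 >= s2:
            # Overlapping or directly adjacent exons do not form a junction.
            continue
        # Assumption: flanks are truncated if the exon is shorter than flank.
        left = chrom_seq[max(s1, e1 - flank):e1]
        right = chrom_seq[s2:min(e2, s2 + flank)]
        junctions.append((i, j, left + right))
    return junctions

# Toy example (hypothetical data): three exons, 4 bp reads, 3 bp flanks.
toy_chrom = "AAAAACCCCCGGGGGTTTTT"
toy_exons = [(0, 5), (8, 12), (15, 20)]
toy_junctions = junction_sequences(toy_chrom, toy_exons, read_len=4)
```

The resulting sequences could then be concatenated, with a coordinate table, into a pseudo-chromosome for the read mapper.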

Gene expression profiling and determination of expressed introns

Uniquely mapped reads were assigned to genes using exon coordinates. Due to alternative splicing, a genic region could be annotated as exon and/or intron. Reads mapped to such regions were assigned a type according to the order of priority: exon >exon–intron junction >intron. Accordingly, any read that overlapped with an intron by at least 10 nucleotides was considered to be mapped to that intron. An intron was considered to be expressed if it had at least five uniquely mapped reads. The gene expression level was calculated as the number of reads per kilobase of mRNA length per million mapped reads. The mRNA length for a gene was calculated as the length of the longest variant plus the lengths of all expressed introns.
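The rules above can be written compactly. The following Python sketch is illustrative only: the function names, the string labels for read types and the numerical values in the example are hypothetical, and the code assumes that read counts per gene and per intron have already been tabulated.

```python
# Priority used when a region is annotated differently across splice variants.
TYPE_PRIORITY = ["exon", "exon-intron junction", "intron"]
MIN_INTRON_READS = 5  # an intron is 'expressed' with at least five unique reads

def resolve_read_type(candidate_types):
    """Return the highest-priority type among the annotations a read matches."""
    for read_type in TYPE_PRIORITY:
        if read_type in candidate_types:
            return read_type
    raise ValueError("read does not match any genic annotation")

def effective_mrna_length(longest_variant_bp, introns):
    """Longest variant plus the lengths of all expressed introns.

    introns : list of (length_bp, unique_read_count) tuples
    """
    expressed = sum(length for length, reads in introns
                    if reads >= MIN_INTRON_READS)
    return longest_variant_bp + expressed

def rpkm(read_count, mrna_length_bp, total_mapped_reads):
    """Reads per kilobase of mRNA length per million mapped reads."""
    return read_count * 1e9 / (mrna_length_bp * total_mapped_reads)

# Hypothetical gene: 1500 bp longest variant, one expressed 300 bp intron
# (7 reads) and one unexpressed 200 bp intron (2 reads).
length = effective_mrna_length(1500, [(300, 7), (200, 2)])
value = rpkm(read_count=90, mrna_length_bp=length,
             total_mapped_reads=10_000_000)
```

In this example the effective length is 1800 bp, and 90 reads out of 10 million mapped reads correspond to an RPKM of 5.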

Gene ontology enrichment analysis

Gene ontology (GO) annotations for Arabidopsis proteins (revision 1.1275) were downloaded. GO enrichment analysis was implemented using the R Bioconductor package topGO (Gentleman et al., 2004; Alexa et al., 2006), using default algorithms and Fisher’s exact test to assess statistical significance.
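For readers unfamiliar with the underlying statistic, the one-sided Fisher’s exact test for enrichment of a single GO term is the upper tail of a hypergeometric distribution. The Python sketch below shows only this classical test; it does not reproduce topGO or its handling of the GO graph structure, and the function name and example counts are hypothetical.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test P-value for over-representation.

    N : number of genes in the background
    K : number of background genes annotated with the GO term
    n : number of genes in the set of interest
    k : number of genes in the set of interest annotated with the term
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Toy example: 10 background genes, 4 annotated, 3 selected, 2 of them annotated.
p_toy = fisher_enrichment_p(k=2, n=3, K=4, N=10)
```

For the toy counts, the tail is (6 × 6 + 4 × 1) / 120 = 1/3. With real data, one such P-value would be computed per GO term and then corrected for multiple testing.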

Identification of putative orthologs

We employed a widely used software, Inparanoid version 3.0 (Remm et al., 2001), to identify potential orthologous clusters between species. The similarity cut-off for a potential homolog of an Arabidopsis gene was set to 100 bits (in terms of Blast score), which generally corresponds to an E-value of 1e−10 to 1e−15. Of the 30 688 Arabidopsis genes, 15 576, 13 797, 5367, 6931, 3550 and 3383 had detectable orthologs in poplar, rice, human, mouse, fission yeast and budding yeast, respectively. Orthologous clusters between mouse and budding yeast were identified in the same manner. Appendix S1 gives more details of this procedure.

Gene family annotation

The protein domain annotations were obtained from the Pfam database (Finn et al., 2010). Arabidopsis protein sequences were then searched against protein family models in the Pfam-A database, and a putative domain was accepted if the E-value was below 1e−7, resulting in 19 973 Arabidopsis proteins identified as having at least one Pfam domain. Similarly, Pfam domains were identified for 5187 S. cerevisiae proteins, 4625 S. pombe proteins and 41 911 M. musculus proteins. Transcription factor family annotations were from the AGRIS database (Davuluri et al., 2003), which contains information for 1851 Arabidopsis transcription factors. RNA family models were downloaded from the Rfam database (Gardner et al., 2009). To identify conserved RNA sequences for putatively expressed introns, Rfam models were searched against sequences of putatively expressed introns using the cmsearch program (Nawrocki et al., 2009).

Transposable element data

Within the TAIR9 annotations, there are 3901 TE genes. Of the 467 TE genes expressed in male meiocytes, 325 were uniquely mapped to specific TEs, but only 263 had been assigned to respective TE families. We thus compared mRNA sequences of potentially expressed TE genes that lacked a known family with repetitive sequences for Arabidopsis from the Repbase Update database (Jurka et al., 2005), using a cut-off BLAST E-value <1e−30 for membership in a TE family. This analysis was able to designate a TE family type to 83 genes previously lacking TE family information.
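The family assignment step can be sketched as a filter over tabulated BLAST hits. The Python example below is an illustration under stated assumptions rather than the procedure used here: the tuple layout, the gene and family identifiers, and the choice of the hit with the lowest E-value when several hits pass the cut-off are all hypothetical; only the cut-off itself (E-value <1e−30) is taken from the text.

```python
def assign_te_families(hits, max_evalue=1e-30):
    """Assign each TE gene to the family of its best hit below the cut-off.

    hits : iterable of (gene_id, family, evalue) tuples from a BLAST search
           of TE gene mRNAs against a repeat library (layout is an assumption)
    Returns a dict mapping gene_id to family; genes without a passing hit
    are left unassigned.
    """
    best = {}
    for gene_id, family, evalue in hits:
        if evalue >= max_evalue:
            continue  # hit is not significant enough for family membership
        # Assumption: keep the most significant hit when several pass.
        if gene_id not in best or evalue < best[gene_id][1]:
            best[gene_id] = (family, evalue)
    return {gene_id: family for gene_id, (family, _) in best.items()}

# Hypothetical hits for three TE genes lacking family information.
example_hits = [
    ("te_gene_1", "family_A", 1e-45),
    ("te_gene_1", "family_B", 1e-80),
    ("te_gene_2", "family_A", 1e-12),  # fails the cut-off
    ("te_gene_3", "family_C", 0.0),
]
assigned = assign_te_families(example_hits)
```

Here te_gene_1 is assigned to family_B, te_gene_3 to family_C, and te_gene_2 remains without a family.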


We thank Professor S. Assmann (Biology Department of the Pennsylvania State University) for use of the inverted microscope and Professor F. Chalmel, Professor J. Bähler and Dr S. Marguerat for kindly supplying published transcriptomic data. This project was supported by funds from Rijk Zwaan, the Netherlands, the Pujiang Plan and Fudan University (the ‘985’ and ‘211’ Programs of the Ministry of Education). H.M. was also supported by the Biology Department at the Pennsylvania State University.