Analysis of plant germline development by high-throughput RNA profiling: technical advances and new insights




Reproduction is a crucial step in the life cycle of plants. The male and female germline lineages develop in the reproductive organs of the flower, which in higher plants are the anthers and ovules, respectively. Development of the germline lineage initiates from a dedicated sporophytic cell that undergoes meiosis to form spores that subsequently give rise to the gametophytes through mitotic cell divisions. The mature male and female gametophytes harbour the male (sperm cells) and female gametes (egg and central cell), respectively. Those unite during double fertilization to initiate embryo and endosperm development in sexually reproducing higher plants. While cytological changes involved in development of the germline lineages have been well characterized in a number of species, investigation of the transcriptional basis underlying their development and the specification of the gametes proved challenging. This is largely due to the inaccessibility of the cells constituting the germline lineages, which are enclosed by sporophytic tissues. Only recently, these technical limitations could be overcome by combining new methods to isolate the relevant cells with powerful transcriptional profiling methods, such as microarrays or high-throughput sequencing of RNA. This review focuses on these technical advances and the new insights gained from them concerning the transcriptional basis and molecular mechanisms underlying germline development.

The plant life cycle and germline development

Unlike in animals, the precursors of the plant germline lineage are not set aside early during development (reviewed by Dickinson and Grant-Downton, 2009). Instead, the plant life cycle alternates between a diploid sporophytic and a haploid gametophytic generation. The gametophytic generation has been progressively reduced during the evolution of land plants (reviewed by Haig and Wilczek, 2006; Dickinson and Grant-Downton, 2009). In bryophytes, the gametophytic generation is the dominant multicellular phase, while the life cycle of pteridophytes and higher land plants is dominated by the sporophyte and the gametophyte remains nutritionally dependent on the sporophyte in the latter (Haig and Wilczek, 2006). In flowering plants (angiosperms), male and female gametophyte development takes place in specialized reproductive organs of the flower, the anthers and the ovules, respectively. Single determined sporophytic cells, the archespores, are selected to undergo meiosis and to give rise to dimorphic male and female spores during micro- and megasporogenesis, respectively. Subsequently, the haploid gametophytes harbouring the gametes develop from the spores through mitotic divisions. Thus, the archesporial cells can be viewed as the first cells of the plant reproductive or germline lineages committed to produce the gametes (Grossniklaus, 2011; Schmidt et al., 2011). However, later stages of reproductive development, e.g. when the gamete lineage is specified, have also been proposed to be the decisive step in germline specification (Dickinson and Grant-Downton, 2009; Twell, 2011).

While the development of both male and female reproductive lineages starts with the selection of a sporophytic cell (meiocyte) that undergoes meiosis to form haploid spores, there are important differences during gametophyte development and gamete specification (Figure 1). In the male germline, all four microspores originating from a pollen or microspore mother cell (MiMC) undergo asymmetric pollen mitosis I (PMI) to produce a vegetative and a generative cell (reviewed by Borg et al., 2009). During pollen mitosis II (PMII) the generative cell divides to form two sperm cells that are delivered to the female gametophyte by the pollen tube, which is formed by growth of the vegetative cell (Figure 1). The timing of these developmental processes varies between species. In most species, the generative cell undergoes PMII during pollen tube growth. In other species, such as crucifers or grasses, however, PMII and maturation of the tricellular pollen take place in the anther prior to pollen release and germination (Boavida et al., 2005). During development of the female germline, a Polygonum-type embryo sac is formed in >70% of all species, including Brassicaceae (e.g. the model plant Arabidopsis thaliana) and Gramineae (e.g. maize, wheat, rice) (reviewed by Yadegari and Drews, 2004; Brukhin et al., 2005; Sprunck and Gross-Hardt, 2011). In contrast to the male germline, however, typically only one megaspore survives after meiosis of the megaspore mother cell (MeMC) while the others degenerate. This functional megaspore gives rise to a Polygonum-type female gametophyte (embryo sac) by three rounds of mitosis in a syncytium, followed by cellularization of the eight-nucleate embryo sac (Figure 1). The embryo sac harbours the female gametes, the egg and central cell, both of which require fertilization to initiate development of the diploid embryo and the triploid endosperm, respectively (Figure 1).

Figure 1.

 Schematic representation of male and female gametophyte development in Arabidopsis thaliana.
Germline development starts with sporophytic cells differentiating into spore mother cells (female, megaspore mother cell; male, microspore/pollen mother cell). These mother cells undergo meiosis resulting in the formation of four haploid spores, three of which degenerate in the case of the female. Each functional spore subsequently gives rise to one gametophyte. Abbreviations: MeMC, megaspore mother cell; FMS, functional megaspore (FG1); e2nES/l2nES, early/late two-nucleate embryo sac (FG2/FG3); 4nES, four-nucleate embryo sac (FG4); 8nES, eight-nucleate embryo sac (FG5–FG6); (p)AP, (precursor of) antipodal cell; (p)EGG, (precursor of) egg cell; (p)SYN, (precursor of) synergid cell; (p)CC, (precursor of) central cell; mES, mature embryo sac (FG7); MiMC, microspore mother cell; MS, microspore; 1nP, uninucleate pollen; e2cP/l2cP, early/late bicellular pollen; mP, mature pollen; VC, vegetative cell; GC, generative cell; SC, sperm cell.

The identification of genes important for plant reproduction, and especially the investigation of the gene regulatory networks underlying plant germline specification and development, have proved to be difficult due to technical obstacles. For the female reproductive lineage in particular, the low number of cells in the germline lineage and their inaccessibility – they develop enclosed by sporophytic tissue – have long hampered transcriptional profiling approaches. However, recent advances in establishing methods to isolate individual cells from the germline lineages in combination with high-throughput transcriptome profiling techniques have yielded important new insights and will be summarized in this review.

Early approaches to identify genes important for or expressed during plant germline development

Over the last decade, genetic screens using the model plant A. thaliana have led to the identification of a number of genes involved in sporo- and gametogenesis (reviewed in Yadegari and Drews, 2004; Brukhin et al., 2005; Berger and Twell, 2011; Chang et al., 2011; Sprunck and Gross-Hardt, 2011; Twell, 2011). But only a few genes with specific expression in gametophytic cells were identified, mostly by enhancer detection, a method allowing the identification of genes based on their pattern of expression (Sundaresan et al., 1995; Grossniklaus et al., 1998, 2002; Gross-Hardt et al., 2007). A more comprehensive picture of the transcriptional landscape in the cells of the gametophyte only became possible with the advent of transcript profiling methods. These were used in combination with mutants lacking a female gametophyte, such as sporocyteless/nozzle (spl/nzz), coatlique (coa) and determinant infertile 1 (dif1), in which development typically arrests before the initiation of meiosis, at the megaspore stage, and during meiosis, respectively (Bai et al., 1999; Bhatt et al., 1999; Yang et al., 1999; Johnston et al., 2007). Transcriptional profiling of isolated ovules or pistils from those mutants was subsequently used to identify genes expressed in the female gametophyte. Comparative profiling of wild-type and spl mutant ovules using Affymetrix ATH1 arrays identified 225 potentially gametophyte expressed genes (Yu et al., 2005), while 1260 potentially embryo sac expressed genes were identified based on comparative profiling of wild-type and mutant spl and coa ovules and pistils, respectively (Johnston et al., 2007). In addition, 71 and 382 genes were identified to be downregulated in dif1 mutant ovules (Jones-Rhoades et al., 2007; Steffen et al., 2007), using either the Affymetrix ATH1 or tiling arrays. 
While giving important new insights into the transcriptional basis of embryo sac development, these studies were limited by several drawbacks: (i) expression of a considerable number of gametophytic genes was superposed by the expression of sporophytic genes in pistils or ovules and thus remained undetected, (ii) different influences of the mutants on sporophytic gene expression made the results more difficult to interpret, and (iii) the numbers of genes potentially expressed in the embryo sac remained considerably smaller than the estimated transcriptome size required for germline development (Yu et al., 2005; Johnston et al., 2007; Jones-Rhoades et al., 2007; Steffen et al., 2007). It became obvious from these studies that significant technical improvements were required to allow germline-specific sampling and transcriptional profiling (Johnston et al., 2007).

Technical advances in isolation of germline-specific cells and tissues

The first attempts to isolate embryo sacs from ovule tissues date back to the middle of the 20th century (reviewed by Xin and Sun, 2010). Isolation of male and female gametes in combination with expressed sequence tag (EST) sequencing and generation of cDNA libraries subsequently allowed the identification of genes expressed in gametes from maize (Zea mays), wheat (Triticum aestivum), Nicotiana tabacum and Arabidopsis, and from the generative cell from Lilium longiflorum (Dresselhaus et al., 1994; Kumlehn et al., 2001; Xu et al., 2002; Engel et al., 2003; et al., 2005; Sprunck et al., 2005; Okada et al., 2006; Yang et al., 2006; Xin and Sun, 2010; Xin et al., 2011). However, suitable techniques for targeted isolation of almost every cell type of interest from male and female germline lineages were established only recently, based on micromanipulation, fluorescence-activated cell sorting (FACS), and laser-assisted microdissection (LAM) (reviewed by Xin and Sun, 2010; Hu et al., 2011). In addition, a method for the isolation of nuclei from specific cell types (isolation of nuclei tagged in specific cell types, INTACT) has recently been developed (Deal and Henikoff, 2011). In brief, micromanipulation is based on manual dissection of the tissue, sometimes in combination with enzymatic digestions of cell wall components, while FACS sorts cells based on their fluorescence and light scattering characteristics (reviewed by Hu et al., 2011). During LAM, cells or tissue types of interest are isolated with a laser from thin sections of fixed and embedded tissue (reviewed in Day et al., 2005; Nelson et al., 2006). The method was originally developed for the isolation of specific cells from animal tissues (Emmert-Buck et al., 1996) and first used for plant cells only more recently (Kerk et al., 2003; Casson et al., 2005). 
The INTACT method, on the other hand, uses affinity-based purification of nuclei expressing biotinylated proteins in the nuclear envelope of the target cells (Deal and Henikoff, 2011). The suitability of the different methods for the isolation of individual cells from male and female germline lineages, however, is largely dependent on the cell type of interest and the species used. Male gametophytic cells can be relatively easily isolated, e.g. by osmotic shock and separation by Percoll gradient centrifugation, as successfully applied for uninucleate microspores, binucleate pollen, and sperm cells from Arabidopsis and rice (Oryza sativa) (Table 1; Honys and Twell, 2004; Wei et al., 2010; reviewed by Xin and Sun, 2010). In addition, FACS has been successfully used to sort mature Arabidopsis pollen and to isolate sperm cells from maize and Arabidopsis (Table 1). Micromanipulation has been applied to isolate male Arabidopsis meiocytes (microspore mother cells, MiMCs) (Table 1) and the generative cell of L. longiflorum, but also embryo sacs, female gametes and zygotes from a variety of different species including maize, A. thaliana, O. sativa, Torenia fournieri and Alstroemeria aurea (Becker et al., 2003; Engel et al., 2003; Pina et al., 2005; Hoshino et al., 2006; Okada et al., 2006, 2007; Chen et al., 2010; Takanashi et al., 2010; Ohnishi et al., 2011; Yang et al., 2011; Libeau et al., 2011; reviewed by Xin and Sun, 2010; Hu et al., 2011). These powerful techniques have disadvantages, however: FACS often requires the use of a cell-type- or tissue-specific marker, as does the INTACT method. Apart from this, FACS, INTACT and micromanipulation may require prolonged handling or treatment with macerating enzymes, such that effects on RNA expression patterns or RNA stability cannot be fully excluded. However, if handling time is kept short, the transcriptional program of specific cell types does not appear to undergo substantial changes (Birnbaum et al., 2003). 
LAM, on the other hand, is applicable for cell-type-specific isolation with little cross-contamination for a variety of purposes, because: (i) the use of specific markers is not required, and (ii) the tissue is fixed prior to any manipulation, such that transcriptional profiles are unaffected by the handling (Wuest et al., 2010; Schmidt et al., 2011; Schmid et al., 2012). The downside of LAM is that it can be very time-consuming, depending on the cell type of interest. Also, the cell type of interest needs to be structurally distinguishable from the surrounding tissue in thin sections, and isolated cells may contain minor contamination from neighbouring cells depending on the exact structural organization of the tissue within the section. While the laser beam leaves nucleic acids in the adjacent cytoplasm mostly intact, the thickness of the beam/section, and thus the suitability for the isolation of small cell types, varies with the LAM system used. Laser-assisted microdissection has been successfully applied to profile the cell-type-specific transcriptomes at different developmental stages of the male and female germline in different species, including MeMCs (Schmidt et al., 2011), the three cell types of mature female gametophytes in Arabidopsis (Wuest et al., 2010) and different developmental stages of the male germline in rice (O. sativa ssp. japonica ‘Nipponbare’), including pre-meiotic MiMCs, microspores, bicellular and tricellular pollen (Hirano et al., 2008; Hobo et al., 2008; Suwabe et al., 2008; Tang et al., 2010).

Table 1. Summary of recent transcriptome analyses of different developmental stages from pollen mother cell and megaspore mother cell to mature gametophytes of the male and female plant germline lineages using microarrays or high-throughput sequencing of RNA (RNA-Seq). Studies analysing exclusively stages after pollen germination or fertilization are not included.

Developmental stage | Isolation technique | Species | Transcriptome profiling method | Literature

Male germline lineage
Pre-meiotic pollen mother cell | Laser microdissection | Oryza sativa ssp. japonica | 44K Agilent microarray | Tang et al. (2010)
Meiocyte | Micromanipulation | Arabidopsis thaliana | SOLiD sequencing | Yang et al. (2011)
Meiocyte | Micromanipulation | A. thaliana | Illumina sequencing | Chen et al. (2010)
Meiocyte | Micromanipulation | A. thaliana | CATMA microarray | Libeau et al. (2011)
Meiocyte, tetrad, UNM, BCP, TCP | Laser microdissection | O. sativa ssp. japonica | 44K Agilent microarray | Suwabe et al. (2008), Hobo et al. (2008), Hirano et al. (2008)
UNM, BCP, TCP | Percoll gradient centrifugation | A. thaliana | Affymetrix ATH1 array | Honys and Twell (2004)
UNM, BCP, TCP | Percoll gradient centrifugation | O. sativa ssp. japonica | Affymetrix rice genome array | Wei et al. (2010)
Generative cell | Micromanipulation | Lilium longiflorum | cDNA microarrays | Okada et al. (2007)
Mature pollen |  | A. thaliana | Affymetrix ATH1 array | Schmid et al. (2005)
Mature pollen (hydrated, non-hydrated) | FACS | A. thaliana | 8K Affymetrix GeneCHIP | Becker et al. (2003)
Mature pollen | Filtration | A. thaliana | 8K Affymetrix GeneCHIP | Honys and Twell (2003)
Mature pollen | FACS | A. thaliana | Affymetrix ATH1 array | Pina et al. (2005)
Mature pollen | Manual collection after rubbing anthers together on cover slips | Glycine max | Soybean GeneCHIP | Haerizadeh et al. (2009)
Mature pollen, anthers | Collected from anthers shedding pollen | Z. mays | 44K maize oligonucleotide array | Ma et al. (2008)
Pollen germination, pollen tube growth | Vacuum method | A. thaliana | Affymetrix ATH1 array | Wang et al. (2008)
Pollen, pollen tubes | Separation from anthers with steel sieve | Petunia axillaris | cDNA spotted microarray | Ishimizu et al. (2010)
Pollen, pollen tubes (germinated in vitro and semi in vivo) | Vacuum method | A. thaliana | ATH1 array | Qin et al. (2009)
Sperm cells | FACS | A. thaliana | Affymetrix ATH1 array | Borges et al. (2008)
Sperm cells | Morphology based selection | Plumbago zeylanica | cDNA spotted microarray | Gou et al. (2009)

Female germline lineage
Megaspore mother cell (MeMC) | Laser microdissection | A. thaliana | Affymetrix ATH1 array | Schmidt et al. (2011)
Egg cell, central cell, synergids | Laser microdissection | A. thaliana | Affymetrix ATH1 array | Wuest et al. (2010)
Central cell | Laser microdissection | A. thaliana | RNA-seq, SOLiD | Schmid et al. (2012)
Egg cell, synergid cells | Micromanipulation | O. sativa ssp. japonica | 44K Agilent microarray | Ohnishi et al. (2011)

Abbreviations: UNM, uninucleate microspore; BCP, bicellular pollen; TCP, tricellular pollen; FACS, fluorescence-activated cell sorting.

Increasing estimates of transcriptome size of the germline lineages reflect advances in profiling methods

The relative ease of access to and isolation of cells from the male as compared to the female germline and their higher abundance is reflected by the considerably higher number of studies analysing gene expression of the male as compared to the female germline (Table 1, Figures 2 and 3). This is largely due to the number of cells that can be isolated in a certain time period and, consequently, the amount of total RNA that can be obtained for transcriptional studies. For example, around 100 000 Arabidopsis sperm cells could be isolated in one FACS session, and 16 ng of total RNA was used as input for subsequent transcriptome analyses (Borges et al., 2008). In addition, approximately 480 MiMCs per Arabidopsis flower could be isolated and 3.5 μg of total RNA was obtained from samples of approximately 57 600 cells (Libeau et al., 2011). In contrast, one Arabidopsis flower harbours only about 50 ovules with one developing female germline lineage each. Using micromanipulation of target cells from rice, 3000 egg cells and 1000 synergid cells were collected (Ohnishi et al., 2011). In Arabidopsis, several hundred cells from the female germline can be isolated separately using LAM, resulting in an estimated 0.3–1.5 ng of isolated total RNA (Wuest et al., 2010; Schmidt et al., 2011; Schmid et al., 2012). Due to the small amounts of total RNA obtained from specific cells of the female germline, linear amplification of the mRNA is required prior to transcriptome analysis (Wuest et al., 2010; Schmidt et al., 2011; Schmid et al., 2012), typically resulting in a shortening of the RNA fragments and a preferential amplification of, on average, approximately 400–500 bp of 3′ sequences of the transcripts. 
To account for the cell-type-specific analysis in combination with this amplification bias, the AtPANP algorithm has been developed and tested for the analysis of Affymetrix ATH1 array data, outperforming the standard MAS5 algorithm in terms of accuracy and precision (Wuest et al., 2010).
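The MiMC figures quoted above (Libeau et al., 2011) can be turned into a rough per-cell RNA yield. The short Python sketch below only restates that arithmetic; it assumes that every sampled cell contributes equally and that no RNA is lost during extraction, which is a simplification. The function and variable names are illustrative.

```python
# Figures quoted in the text for Arabidopsis MiMCs (Libeau et al., 2011).
TOTAL_RNA_UG = 3.5        # total RNA obtained, in micrograms
CELLS_SAMPLED = 57_600    # approximate number of MiMCs in the samples
MIMCS_PER_FLOWER = 480    # approximate number of MiMCs isolated per flower


def rna_per_cell_pg(total_rna_ug, n_cells):
    """Average total RNA per cell in picograms (1 ug = 1e6 pg)."""
    return total_rna_ug * 1e6 / n_cells


def flowers_needed(n_cells, cells_per_flower):
    """Number of flowers implied by the sampled cell count."""
    return n_cells / cells_per_flower


per_cell = rna_per_cell_pg(TOTAL_RNA_UG, CELLS_SAMPLED)
flowers = flowers_needed(CELLS_SAMPLED, MIMCS_PER_FLOWER)
print(f"about {per_cell:.1f} pg total RNA per MiMC, from about {flowers:.0f} flowers")
```

Under these assumptions the yield is on the order of 60 pg per cell. This helps explain why the 0.3–1.5 ng recovered from several hundred laser-dissected female germline cells has to be amplified before profiling.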

Figure 2.

 Advances in transcriptional profiling of gametophytes and gametophytic cell types.
The figure summarizes the transcriptome sizes of various stages of different cell and tissue types of the germline from several angiosperms estimated by high-throughput profiling (arrays or high-throughput RNA sequencing). The example of Arabidopsis thaliana illustrates the technical advances in transcriptional profiling with a general increase of transcriptome size estimates in the past few years. Also visible is the reduction of transcriptome size during pollen development. Abbreviations: MeMC, megaspore mother cell; egg, egg cell; syn, synergid cell; cen, central cell; MiMC, microspore mother cell; MeC, meiocyte; MS, microspore; 2cP, bicellular pollen; 3cP, tricellular pollen; mP, mature pollen; hP, hydrated pollen; gP, germinated pollen; PT, pollen tube; 30mPT/240mPT, pollen tube grown for 30/240 min; SSPT, pollen tube grown through stigma and style. References marked with NA do not provide transcriptome sizes.

Figure 3.

 Genes preferentially expressed in gametophytic tissues and cell types.
Expression values (log2 scale, calculated with robust multi-array analysis [RMA]; Irizarry et al., 2005) of genes preferentially expressed in individual cells or tissue types from the male and female germline lineages are summarized in a heatmap (blue/red indicate low/high expression values). Replicates are averaged. The data set consisted of several mixed tissues, and specific tissue and cell types from Birnbaum et al. (2003), Honys and Twell (2004), Nawy et al. (2005), Pina et al. (2005), Schmid et al. (2005), Yu et al. (2005), Lee et al. (2006), Levesque et al. (2006), Brady et al. (2007), Borges et al. (2008), Wang et al. (2008), Qin et al. (2009), Yadav et al. (2009), Wuest et al. (2010), Schmidt et al. (2011). Others comprise sporophytic tissues such as siliques, rosette leaves, cotyledons, roots, root xylem, inflorescences, seeds, etc. Data were processed as described in Schmidt et al. (2011), except using an updated annotation of the ATH1 microarray (TAIRG, version 14), an adjusted P-value cutoff of 0.05 and a minimal fold-change of four (on log2 scale). Abbreviation: fl. st., floral stage.

The first studies to investigate transcriptomes of plant gametophytes using microarrays were performed using Arabidopsis pollen together with Affymetrix 8K GeneCHIPs, representing approximately 8000 of the currently 33 602 annotated loci (TAIR10; Table 1). In 2003, two independent studies identified 1584 and 992 genes expressed in pollen using this microarray, respectively (Table 1, Figure 2; Becker et al., 2003; Honys and Twell, 2003). Only 1 year later, studying the expression in uninucleate microspores, binucleate, trinucleate and mature pollen, a total of 13 977 male gametophyte expressed genes were identified using the Affymetrix ATH1 array (Honys and Twell, 2004). On this array more than 22 500 probesets are spotted, originally designed for the detection of approximately 24 000 genes. It was estimated that 61.9% of all genes represented on the Affymetrix ATH1 array are expressed in the male gametophyte (Honys and Twell, 2004). Using the Affymetrix ATH1 array, on average 6044 genes (Figure 2; 7235, 6587, 5004, 7177, 3954 and 6304, respectively) were identified to be expressed in mature pollen in independent studies (Table 1, Figure 2; Honys and Twell, 2004; Pina et al., 2005; Schmid et al., 2005; Borges et al., 2008; Wang et al., 2008; Qin et al., 2009). These numbers demonstrate that the technical advance in array technology from the Affymetrix 8K GeneCHIP to the Affymetrix ATH1 array led to the identification of, on average, >4.5 times more expressed genes. Nevertheless, the highest and lowest estimates for expression in Arabidopsis mature pollen differ by 3281 genes. These differences are probably due to different pollen harvesting methods, different Arabidopsis accessions, and different algorithms used to decide on the presence or absence of expression. 
Microarrays were also used for transcriptional profiling of mature pollen from maize, soybean (Glycine max) and Petunia axillaris, and uninucleate microspores, bicellular and tricellular pollen from rice (Table 1, Figure 2; Ma et al., 2008; Haerizadeh et al., 2009; Ishimizu et al., 2010; Wei et al., 2010). Apart from this, genes expressed in the generative cell from L. longiflorum have been identified using cDNA microarrays (Table 1; Okada et al., 2007). The transcriptome of isolated sperm cells from Arabidopsis was determined using the Affymetrix ATH1 array, leading to the identification of 5829 sperm-cell-expressed genes (Table 1, Figure 2; Borges et al., 2008). Using morphology-based selection and cDNA spotted microarrays, gene expression in the dimorphic sperm cells of Plumbago zeylanica was analysed separately (Table 1, Figure 2; Gou et al., 2009).
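The summary statistics given above for Arabidopsis mature pollen follow directly from the per-study gene counts. As a minimal sketch in Python, using only the numbers quoted in the text (variable names are illustrative), the mean, the fold increase between the two array generations and the spread between studies can be recomputed as follows:

```python
from statistics import mean

# Pollen-expressed gene counts quoted in the text.
pollen_8k = [1584, 992]                              # Affymetrix 8K GeneCHIP studies
pollen_ath1 = [7235, 6587, 5004, 7177, 3954, 6304]   # Affymetrix ATH1 studies

mean_8k = mean(pollen_8k)
mean_ath1 = mean(pollen_ath1)

# Fold increase in detected genes from the 8K to the ATH1 array.
fold_increase = mean_ath1 / mean_8k

# Difference between the highest and lowest ATH1-based estimates.
spread_ath1 = max(pollen_ath1) - min(pollen_ath1)

print(f"mean 8K: {mean_8k}, mean ATH1: {mean_ath1}")
print(f"fold increase: {fold_increase:.2f}")
print(f"spread between ATH1 studies: {spread_ath1}")
```

The ATH1 mean of 6043.5 rounds to the 6044 genes stated in the text, the fold increase of roughly 4.7 is consistent with ">4.5 times", and the spread reproduces the difference of 3281 genes.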

Only recently, earlier developmental stages during microsporogenesis, i.e. Arabidopsis MiMCs, were studied to obtain new insights into the transcriptional basis of meiosis (Table 1, Figure 2; Chen et al., 2010; Libeau et al., 2011; Yang et al., 2011). Using high-throughput sequencing of RNA (RNA-Seq) around 21 500 annotated loci likely to be expressed were identified (Yang et al., 2011, 19 829 with at least one read in both replicates; Chen et al., 2010, 23 843 with at least one read per million reads). A direct comparison with the data from Libeau et al. (2011), where the transcriptome of MiMCs was measured using the Complete Arabidopsis Transcriptome MicroArray (CATMA) is, however, difficult because the authors do not provide estimates of the transcriptome size. Nevertheless, the total number of genes found to be expressed in MiMCs using RNA-Seq is well above any previous reports from any studied cell type of the male germline lineage of Arabidopsis. Apart from the size of the transcriptome at this developmental stage, this largely reflects technical advances in the technology of transcriptome profiling by RNA-Seq as compared to microarrays, as will be discussed in more detail below. The different developmental stages of the male germline – from pre-meiotic MiMCs to the tricellular pollen – were also analysed in rice using LAM and microarrays (Table 1, Figure 2; Hobo et al., 2008; Hirano et al., 2008; Suwabe et al., 2008; Tang et al., 2010). Consistently, the studies led to the estimation of the expression of approximately 60% or more of all genes in the genome in either rice MiMCs (17 196 of 29 008 genes with representative probes on the array; Tang et al., 2010) or Arabidopsis MiMCs (studies using RNA-Seq, Chen et al., 2010; Yang et al., 2011). 
Even though the transcriptome size at different stages of male germline development cannot always be directly compared due to the use of different profiling and isolation techniques, a significant reduction in size and complexity of the transcriptome from microsporogenesis, over early stages of microgametogenesis, to mature pollen is evident. In addition, Arabidopsis mature pollen was characterized by a relatively small transcriptome size compared with that of vegetative tissues, consistent with findings from a study analysing the soybean pollen transcriptome (Figure 2; Honys and Twell, 2004; Pina et al., 2005; Schmid et al., 2005; Haerizadeh et al., 2009). The number of expressed genes is amazingly close to the estimated 20 000 transcripts in Tradescantia paludosa pollen that was based on hybridization kinetics (Willing and Mascarenhas, 1984).
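The two RNA-Seq estimates for MiMCs cited above rest on different criteria for calling a locus expressed: at least one read in both replicates (Yang et al., 2011) versus at least one read per million reads (Chen et al., 2010). The Python sketch below illustrates, on invented toy counts, how such criteria can lead to different transcriptome sizes. The gene names, read counts and library sizes are hypothetical, and pooling the replicates before computing reads per million is an assumption that may not match the procedure of the cited studies.

```python
def called_in_all_replicates(counts, min_reads=1):
    """Genes with at least `min_reads` reads in every replicate."""
    return {gene for gene, reps in counts.items()
            if all(c >= min_reads for c in reps)}


def called_by_rpm(counts, library_sizes, min_rpm=1.0):
    """Genes reaching `min_rpm` reads per million in the pooled replicates."""
    total_reads = sum(library_sizes)
    called = set()
    for gene, reps in counts.items():
        rpm = sum(reps) * 1_000_000 / total_reads
        if rpm >= min_rpm:
            called.add(gene)
    return called


# Hypothetical read counts per gene in two replicates (toy data).
toy_counts = {"geneA": [12, 9], "geneB": [1, 0], "geneC": [3, 2]}
# Hypothetical numbers of aligned reads per replicate.
toy_library_sizes = [4_000_000, 5_000_000]

by_replicates = called_in_all_replicates(toy_counts)
by_rpm = called_by_rpm(toy_counts, toy_library_sizes)

print(sorted(by_replicates))  # ['geneA', 'geneC']
print(sorted(by_rpm))         # ['geneA']
```

In this toy case the replicate-based rule calls two genes and the reads-per-million rule only one, which is one reason why transcriptome size estimates from different studies are not directly comparable.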

In contrast to the relatively well-studied male germline, the transcriptional networks underlying female gametophyte development have only recently been investigated. To date, the transcriptomes of cell types of the mature embryo sac from Arabidopsis (egg cell, central cell, synergids) and rice (egg cell, synergids), and the Arabidopsis MeMC (Table 1, Figure 2; Wuest et al., 2010; Ohnishi et al., 2011; Schmidt et al., 2011) have been described. Using LAM in combination with the Affymetrix ATH1 array, 9115 genes were identified with evidence of expression in the Arabidopsis MeMC (Schmidt et al., 2011), only slightly more than the 8850 genes identified to be expressed in the cells of the mature gametophyte (7171 in the egg cell, 7287 in the central cell and 5628 in synergids; Wuest et al., 2010). A direct comparison with the transcriptome sizes of rice egg cells and synergids is not possible as the authors did not provide such estimates (Ohnishi et al., 2011). These results suggest that the complexity of the transcriptome is not reduced to the same extent during female as during male germline development. However, more than twice the number of genes with evidence of expression in Arabidopsis central cells have been identified using LAM in combination with RNA-Seq than previously identified using LAM and the Affymetrix ATH1 array [17 419 (Schmid et al., 2012) compared with 7287 (Wuest et al., 2010) genes]. This suggests a superior performance of RNA-Seq in the detection of expressed genes as compared to the broadly used Affymetrix ATH1 array.
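The comparisons in this paragraph can be checked with a few lines of Python. The sketch uses only numbers quoted in this review; relating the RNA-Seq count to the 33 602 annotated TAIR10 loci is an illustrative assumption, since the cited study may have used a different annotation baseline.

```python
# Gene counts quoted in the text.
MEMC_ATH1 = 9115                  # Arabidopsis MeMC (Schmidt et al., 2011)
MATURE_GAMETOPHYTE_ATH1 = 8850    # cells of the mature gametophyte (Wuest et al., 2010)
CENTRAL_CELL_ATH1 = 7287          # central cell, ATH1 array (Wuest et al., 2010)
CENTRAL_CELL_RNASEQ = 17_419      # central cell, RNA-Seq (Schmid et al., 2012)
TAIR10_LOCI = 33_602              # annotated loci in TAIR10, as given in the text

# Relative difference between the MeMC and the mature gametophyte (array data).
memc_excess = (MEMC_ATH1 - MATURE_GAMETOPHYTE_ATH1) / MATURE_GAMETOPHYTE_ATH1

# Fold difference between RNA-Seq and the ATH1 array for the central cell.
rnaseq_fold = CENTRAL_CELL_RNASEQ / CENTRAL_CELL_ATH1

# Share of annotated loci detected by RNA-Seq in the central cell.
rnaseq_share = CENTRAL_CELL_RNASEQ / TAIR10_LOCI

print(f"MeMC vs mature gametophyte: +{memc_excess:.1%}")
print(f"RNA-Seq vs ATH1 (central cell): {rnaseq_fold:.2f}-fold")
print(f"share of TAIR10 loci detected by RNA-Seq: {rnaseq_share:.1%}")
```

The MeMC estimate exceeds that of the mature gametophyte by only about 3%, whereas RNA-Seq detects roughly 2.4 times as many genes in the central cell as the ATH1 array.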

RNA-Seq outperforms microarrays in terms of detection range and for transcriptome profiling of non-model species

For Arabidopsis transcriptional profiling the Affymetrix ATH1 array is so far the most frequently used platform, offering the advantage that a high number of different cell and tissue types can be directly compared (Schmid et al., 2005; Wuest et al., 2010; Schmidt et al., 2011). Nonetheless, microarrays have several limitations: (i) high background levels due to cross-hybridization, (ii) a lack of sensitivity at low and high expression levels, and (iii) reliance upon existing knowledge about the genome sequence (Wang et al., 2009). In addition, some microarrays designed for direct transcriptional profiling (i.e. non-tiling arrays such as the Affymetrix ATH1 or CATMA arrays) can become outdated in terms of transcriptome coverage (e.g. ATH1 and CATMA arrays cover only around 64 and 66% of the 33 602 annotated loci in TAIR10), and do not offer the possibility of detecting previously unknown transcribed regions, and splice or sequence variants. Apart from this, probes for detecting genes with preferential or specific expression in the gametophytes are under-represented on the Affymetrix ATH1 array as compared with probes for detection of genes preferentially expressed in sporophytic tissues (Jones-Rhoades et al., 2007).

RNA-Seq has the potential to overcome these limitations (Marioni et al., 2008; Wang et al., 2009), and therefore also offers the opportunity to study organisms lacking reference sequences, or to identify novel loci and alternative splicing events (Trapnell et al., 2010). In terms of transcriptome size, RNA-Seq detects far more expressed genes than any study using Affymetrix ATH1 arrays for the profiling of cell types from the male or female germline lineages (Chen et al., 2010; Yang et al., 2011; Schmid et al., 2012). Beside the effect of whole genome coverage, the difference is probably due to the higher sensitivity of RNA-Seq, as many genes seem to be expressed at a level that is not distinguishable from the background on the Affymetrix ATH1 arrays (Yang et al., 2011; Schmid et al., 2012). Interestingly, the increase in transcriptome size was not proportional for all classes of genes but more strongly affected certain gene classes, which are likely to be important for developmental processes and specific cellular functions (Schmid et al., 2012).

Another prominent feature in RNA-Seq data is the presence of reads aligning to non-exonic regions (7% and 16% of all uniquely aligning reads in Schmid et al., 2012; Yang et al., 2011, respectively), including introns, regions flanking annotated loci, and isolated intergenic regions. The high number of non-exonic alignments in the central cell compared with other RNA-Seq transcriptomes (7% in a pool of organs and seedlings, Filichkin et al., 2010; 3.5% in unopened flower buds, Lister et al., 2008) may indicate transcriptional alterations prevalent in the central cell and novel transcribed regions that are specific to this cell type (Schmid et al., 2012).

RNA-Seq has also been used for transcriptional profiling in non-model organisms. One possible approach for data analysis is the alignment of the reads to known sequences from a closely related organism. Szövényi et al. (2011) chose this strategy to compare the sporophytic with the gametophytic generation of the water moss Funaria hygrometrica, using reference sequences from Physcomitrella patens. Around 30% of the reads could be aligned to regions with an average nucleotide similarity of 95% (range 77–100%), indicating close genetic relatedness of the two species (Szövényi et al., 2011). However, this similarity estimate may be biased towards a high value considering that alignments in more diverse regions are likely to fail the alignment criteria (Szövényi et al., 2011). A limiting factor of this approach is not only the availability of reference sequences from a closely related species but also the read length and the total number of reads. Given the need for a permissive alignment strategy, the approach may be feasible for experiments with a relatively small number of long reads (approximately 600 000 high-quality reads obtained with the 454 pyrosequencer used in Szövényi et al., 2011; average length not provided by the authors), but may perform poorly in an experiment with millions of short reads. In this case, de novo assembly of short reads into transcripts may perform significantly better (example given in Schmid et al., 2012). This approach has recently been used to characterize the transcriptome of the (homosporous) gametophyte of the bracken fern Pteridium aquilinum, which has a diploid chromosome count of 2n = 104, and a genome size of about 9.8 Gbp (Der et al., 2011). 
The authors detected around 52 000 unique sequences (unigenes), of which 62% showed high similarities to known proteins (NCBI non-redundant protein database). Notably, the data not only represented an 865-fold increase over the EST data available prior to the study on GenBank, but also led to the identification of 548 potentially amplifiable simple sequence repeats (SSRs) that may be used for genotyping (Der et al., 2011). In addition, homologues of more than 50% of the presumably gametophyte-specific genes from Arabidopsis were identified (Der et al., 2011; the list of gametophyte-specific genes in Arabidopsis was based on the data from Honys and Twell, 2004; Yu et al., 2005; Wuest et al., 2010). This indicates that, in the long run, RNA-Seq used for non-model plants or plants without a sequenced reference genome can provide important insights into the development and evolution of the germline lineage and the alternation of generations in land plants.
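The caveat raised above for cross-species alignment, namely that the observed similarity is biased upwards because reads from divergent regions fail the alignment criteria, can be made concrete with a toy calculation. The per-read identities and the 77% cutoff below are hypothetical values chosen for illustration; they are not taken from Szövényi et al. (2011).

```python
from statistics import mean


def observed_similarity(true_identities, min_identity):
    """Mean identity of the reads passing the cutoff, and the aligned fraction."""
    aligned = [x for x in true_identities if x >= min_identity]
    if not aligned:
        return None, 0.0
    return mean(aligned), len(aligned) / len(true_identities)


# Hypothetical per-read nucleotide identities (%) between two species.
identities = [60, 65, 70, 72, 78, 85, 90, 95, 98, 100]

true_mean = mean(identities)
observed_mean, aligned_fraction = observed_similarity(identities, min_identity=77)

print(f"mean identity over all reads:      {true_mean:.1f}%")
print(f"mean identity over aligned reads:  {observed_mean:.1f}%")
print(f"fraction of reads aligned:         {aligned_fraction:.0%}")
```

In this toy example, the reads that survive the cutoff are on average more similar to the reference than the full set of reads. This is the direction of bias described for the similarity estimate.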

Novel insights into the transcriptional basis underlying germline specification

Studies analysing the transcriptional basis underlying male germline determination, sperm cell fate and pollen development provided new insights into the molecular mechanisms governing these important reproductive processes. Consistently, during male germline development a trend towards reduced transcriptome size and complexity over the course of microsporogenesis and microgametogenesis has been observed (Honys and Twell, 2004; Wei et al., 2010). While ≥60% of genes have been estimated to be expressed at the onset of male germline development in pre-meiotic MiMCs (Tang et al., 2010; Yang et al., 2011), the transcriptome size of mature pollen has been estimated to comprise ≤30% of annotated loci (Pina et al., 2005; Schmid et al., 2005). However, as different transcriptional profiling methods have been used in these studies, and genes preferentially expressed in gametophytes are less represented on the Affymetrix ATH1 array used to estimate the pollen transcriptome size, this difference might be overestimated. Nevertheless, despite the reduction of the overall transcriptome size during pollen maturation, an increasing functional specification of genes expressed in pollen has been observed, leading to estimates of 10–26% of pollen-specific genes (Figure 3; reviewed by Borg et al., 2009). Analysing enriched gene expression at each developmental stage from the uninucleate microspore, bicellular and tricellular pollen, to the mature pollen grain, Wei et al. (2010) described a ‘U-type’ change in the numbers of preferentially expressed genes per developmental stage in rice and Arabidopsis, reaching a maximum level in mature pollen grains.
Consistently, a reduced diversity of transcripts together with a functional skew towards transcripts related to cytoskeletal, cell wall and signalling processes has been described for mature pollen, probably important for germination, pollen tube growth and double fertilization (Honys and Twell, 2003, 2004; Pina et al., 2005; Schmid et al., 2005; Becker and Feijó, 2007; Borg et al., 2009; Haerizadeh et al., 2009). In addition, transcripts for translation and transcription were under-represented, with the exception of certain classes of transcription factors, notably including non-classical MADS-box transcription factors, i.e. type I and MIKC* (Honys and Twell, 2004; Pina et al., 2005; reviewed by Grennan, 2007; Borg et al., 2009). Interestingly, together with the RWP-RK domain and reproductive meristem (REM) transcription factor families, type I MADS domain transcription factors have also been identified as being up-regulated in the female gametophyte in comparison with other tissues, and were found to be exclusively enriched in reproductive tissues (Wuest et al., 2010). This is in good agreement with recent studies on the expression and role of type I MADS box proteins during reproductive development (Bemer et al., 2010; reviewed by Masiero et al., 2011). This suggests that transcriptional profiling and enrichment analyses can aid in the identification of genes crucial for – or specifically expressed during – distinct stages of germline development and reproduction (Figure 3).
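As an illustration of what an enrichment analysis of this kind involves, the sketch below asks whether a gene family is over-represented among the genes enriched in one cell type, using a generic one-sided hypergeometric test. This is a common choice and is shown as an assumption; it is not necessarily the statistic applied in the studies cited above, and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(population, family_in_population, sample, family_in_sample):
    """One-sided hypergeometric P(X >= family_in_sample).

    population            -- number of genes assayed
    family_in_population  -- members of the gene family among them
    sample                -- number of genes enriched in the cell type
    family_in_sample      -- family members among the enriched genes
    """
    upper = min(family_in_population, sample)
    total = comb(population, sample)
    tail = sum(
        comb(family_in_population, k)
        * comb(population - family_in_population, sample - k)
        for k in range(family_in_sample, upper + 1)
    )
    return tail / total


# Hypothetical counts: 22 000 genes assayed, 60 family members,
# 1 000 genes enriched in the cell type, 12 of them family members.
p = enrichment_p_value(22000, 60, 1000, 12)
print(f"enrichment p-value: {p:.3g}")
```

Exact integer arithmetic is used for the binomial coefficients, so the only rounding happens in the final division. In a real analysis, p-values for many gene families would additionally need correction for multiple testing.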

While certain genes and functions might be shared between male and female reproductive lineages, others are clearly distinct (Figure 3). Interestingly, enriched expression of PAZ and PIWI domain-encoding proteins is a dominant feature of the egg transcriptome (Wuest et al., 2010). While small RNA pathways were first thought to be absent in Arabidopsis pollen and have not been detected in soybean pollen (Pina et al., 2005; Haerizadeh et al., 2009), expression of genes involved in small RNA pathways has subsequently been detected in Arabidopsis pollen and sperm (Borges et al., 2008; Grant-Downton et al., 2009a). Expression of genes involved in small RNA pathways has also been observed during megasporogenesis (Schmidt et al., 2011). However, expression patterns in the MeMC were distinct from those of male or female gametophytes and gametes (Schmidt et al., 2011). In addition to studying the transcriptional profile of genes involved in small RNA pathways, expression of known and novel small RNAs in the male germline has also been investigated using RNA-Seq or miRCURY LNA microarrays (Table 2) (Chambers and Shuai, 2009; Grant-Downton et al., 2009b; Wei et al., 2011).

Table 2. Recent studies addressing small RNAs during development of the male and female germline lineage

Developmental stage | Isolation technique | Species | Profiling method | Literature
Male germline lineage
UNM, BCP, TCP | Percoll gradient centrifugation | O. sativa ssp. japonica | Solexa sequencing | Wei et al. (2011)
Mature pollen | Percoll gradient centrifugation | A. thaliana | 454 sequencing | Grant-Downton et al. (2009b)
Mature pollen | Modified hand-held vacuum | A. thaliana | miRCURY LNA array | Chambers and Shuai (2009)

UNM, uninucleate microspore; BCP, bicellular pollen; TCP, tricellular pollen.

In contrast to the relatively high number of studies addressing the transcriptional basis of microgametogenesis, only a few recent studies analyse gene expression during microsporogenesis (Chen et al., 2010; Tang et al., 2010; Libeau et al., 2011; Yang et al., 2011). The transcriptomes of Arabidopsis MiMCs isolated by micromanipulation and of rice pre-meiotic MiMCs isolated by laser microdissection have recently been studied with the purpose of identifying new genes playing a role in meiosis or in the context of meiotic cell divisions (Chen et al., 2010; Tang et al., 2010; Libeau et al., 2011; Yang et al., 2011). Consistently, Yang et al. (2011) reported expression of all 71 genes with described functions in meiosis, while enrichment of a number of meiotic genes in Arabidopsis MiMCs has been reported in other studies (Chen et al., 2010; Libeau et al., 2011). Interestingly, Tang et al. (2010) identified pathways important for meiotic recombination and cell cycle progression as well as expression of known meiotic genes enriched in pre-meiotic MiMCs, in agreement with the hypothesis that the transcriptional basis relevant for meiosis is already set up before its onset. Also, in Arabidopsis MeMCs sampled predominantly before meiosis to prophase of meiosis I, a number of genes with documented functions in meiosis but not in somatic tissues were found to be expressed (Schmidt et al., 2011). Importantly, this study documented the prevalence of the biological process translation as well as the relevance of ATP-dependent RNA helicases in MeMCs, which play a role in the specification of the female germline lineage. These regulatory features are shared by the plant and animal germline (Schmidt et al., 2011). Similarly, expression of 89 DEAD-box containing ATP-binding helicases has also been observed in MiMCs (Yang et al., 2011).
Together, recent studies addressing cell-type-specific profiling of distinct developmental stages during male and female germline development in angiosperms provided important insights into their underlying gene expression profiles, molecular functions and regulatory pathways. However, a more detailed discussion of these findings and pathways, for example with respect to hormone signalling, cell–cell communication or gene regulation, is outside the scope of this review.

Conclusions and outlook

Over the last decade, methodological improvements in both cell- and tissue-type-specific isolation methods as well as rapidly evolving techniques for whole-genome transcriptional profiling have provided new insights into the transcriptional basis and molecular mechanisms underlying the specification and development of the plant germline. Within a few years, knowledge of genes expressed at certain developmental stages of the male or female germline lineage has increased by one to two orders of magnitude, allowing investigations of gene and pathway enrichment to identify the underlying molecular mechanisms. However, while a relatively high number of studies have addressed transcriptional profiles underlying the development of the male lineage, only a few studies have concentrated on the female lineage, due to the small number and inaccessibility of the cells involved. Nevertheless, these studies allowed the identification of major trends, like the distinctiveness of transcriptional patterns underlying male and female gametophyte development and the realization that similar genes and pathways are active during specification of the plant and animal germline (Wuest et al., 2010; Schmidt et al., 2011). As RNA-Seq allows investigations of almost all species of interest and is not restricted to an analysis of model systems with known and annotated genomes, it is foreseeable that in the coming years these technological improvements will help us to gain a deeper understanding of plant germline development. In particular, broadening the investigations to non-model organisms spanning the phylogenetic tree of land plants is likely to yield exciting insights into the evolutionary trends with respect to the alternation of generations as well as the underlying molecular determinants of germline fate.