• soybean;
  • gene expression atlas;
  • comparative genomic;
  • transcription factors;
  • nodulation



Soybean (Glycine max L.) is a major crop providing an important source of protein and oil, which can also be converted into biodiesel. A major milestone in soybean research was the recent sequencing of its genome. The sequence predicts 69 145 putative soybean genes, with 46 430 predicted with high confidence. In order to examine the expression of these genes, we utilized the Illumina Solexa platform to sequence cDNA derived from 14 conditions (tissues). The result is a searchable soybean gene expression atlas accessible through a browser. The data provide experimental support for the transcription of 55 616 annotated genes and also demonstrate that 13 529 annotated soybean genes are putative pseudogenes, and 1736 currently unannotated sequences are transcribed. An analysis of this atlas reveals strong differences in gene expression patterns between different tissues, especially between root and aerial organs, but also reveals similarities between gene expression in other tissues, such as flower and leaf organs. In order to demonstrate the full utility of the atlas, we investigated the expression patterns of genes implicated in nodulation, and also transcription factors, using both the Solexa sequence data and large-scale qRT-PCR. The availability of the soybean gene expression atlas allowed a comparison with gene expression documented in the two model legume species, Medicago truncatula and Lotus japonicus, as well as data available for Arabidopsis thaliana, facilitating both basic and applied aspects of soybean research.



After grasses, legumes are the most economically important plant family based on their consumption in human and animal nutrition. In addition, the use of legumes in biofuel production will further increase the economic impact of this plant family. These characteristics justify a substantial effort by the research community to better understand legume biology. An attribute of most legumes is the development of a symbiotic interaction with soil bacteria (rhizobia) that fix and assimilate atmospheric dinitrogen (atmN2). This symbiosis is based on the chemical recognition of diffusible signals by both partners, which determines the specificity of the interaction (Oldroyd and Downie, 2008). For example, the recognition of the lipo-chitin Nod factor, produced by rhizobia, by the root hair cells of the compatible host leads to plant morphological and biochemical changes (e.g. root hair cell curling, cortical cell division, induction of Nod factor-responsive plant genes and calcium spiking in root hair cells). These changes are the first signs of the development of a new plant organ, the nodule, where the bacteria differentiate into bacteroids and reduce atmN2. In exchange, the plant provides a steady supply of carbon to the bacteroids.

As part of the effort to better understand legume biology, the genome sequences of three legume species are now complete, or nearly complete: that is, Lotus japonicus (Lotus), Glycine max (soybean) and Medicago truncatula (Medicago). Schmutz et al. (2010) recently described the complete soybean genome sequence. In each case, a large number of genes were predicted. The availability of these genome sequences now enables a variety of functional genomic methods to characterize these genes and their related functions. For example, large-scale cDNA sequencing technologies [e.g. 454 Life Sciences (Margulies et al., 2005) and Illumina Solexa platforms (Bennett et al., 2005)] provide a means to accurately profile gene expression (e.g. Libault et al., 2010). In the past, gene expression atlases were established in Arabidopsis thaliana (Schmid et al., 2005), Oryza sativa (Nobuta et al., 2007; Jiao et al., 2009), M. truncatula (Benedito et al., 2008) and L. japonicus (Hogslund et al., 2009) by using massive, parallel-signature sequencing and array-hybridization technologies.

In this study, the high-throughput Illumina Solexa sequencing platform was used to develop a gene expression atlas of the soybean genome. cDNAs derived from a total of nine different soybean tissues were sequenced. Included in the soybean gene atlas are five additional data sets, described by Libault et al. (2010), for a combined total of 14 different conditions (tissues). This provides an unprecedented coverage of the transcriptome, including documentation of expression from annotated pseudogenes and unannotated genes, and also provides accurate quantification of low-abundance transcripts (Cheung et al., 2006; Weber et al., 2007; Libault et al., 2010). To demonstrate the utility of the soybean gene expression atlas, we focused specifically on expression in root hair cells, as well as on meristem-specific genes and expression of transcription factor (TF) genes. The results from the soybean gene expression atlas were also compared with previously published expression data from A. thaliana, M. truncatula and L. japonicus. For example, the comparison with the well-annotated A. thaliana genome identified putative soybean genes involved in the determination of floral organs and the maintenance of the shoot apical meristem (SAM). The availability of the soybean gene expression atlas should facilitate additional studies on the basic biology of soybean, while also supporting applied research to improve soybean agronomic performance.

Results and discussion


Sequence-based transcriptome atlas of soybean: an overview

We used the Illumina Solexa sequencing platform to quantify the expression of soybean genes (i.e. the number of sequence reads/million reads aligned) in nine different conditions: root hair cells isolated 84 and 120 h after sowing (HAS), root tip, root, mature nodules, leaves, SAM, flower and green pods. Our choice to include root hair cells isolated at two different time points in this analysis was motivated by the changes in their transcriptome during development (Libault et al., 2010). Between 4.18 and 6.84 million reads of around 36 bp were generated for each of the nine conditions. Among them, 45.8–82.6% of the reads aligned with five or fewer loci on the soybean genome (Table 1). This variation reflects the proportion of unaligned and repetitive reads (i.e. reads matching more than five loci), which was highest in the pod sample (54.2% of the total reads) and lowest in the flower sample (17.4% of the total reads). We classified the sequence reads aligned with five or fewer loci on the soybean genome into two groups based on the number of matches identified against the soybean genome [i.e. non-unique reads (from two to five loci) and unique reads (only one soybean locus); Table 1]. To ensure accuracy in the quantification of expression in the different tissues tested, only the sequence reads matching uniquely against the soybean genome were used. A total of 51 529 annotated soybean genes (74.5% of the 69 145 putative, annotated soybean genes) were found to be expressed in at least one condition (Table S1). Included in the present analysis are five additional data sets described by Libault et al. (2010) [i.e. root hairs harvested 12, 24 and 48 h after Bradyrhizobium japonicum inoculation (HAI); 24-HAI mock-inoculated root hairs; and 48-HAI inoculated stripped roots (Table S2)], resulting in the documentation of expression for a total of 52 947 annotated genes.
No gene expression in any of the 14 conditions was detected for 16 198 annotated genes, suggesting that these genes were not expressed, were expressed at a level below our detection limit or were expressed only under highly restricted conditions (Table S2). The data also show expression from 7314 different soybean loci currently lacking any gene annotation (Table S2). Considering only the nine conditions sequenced as part of the current study, the data demonstrate expression from 7174 currently unannotated regions (Table S1). A number of root hair genes were found to be specifically expressed upon inoculation with B. japonicum, as documented by Libault et al. (2010).
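The read-binning and normalization steps described above can be sketched as follows. This is a minimal illustration on toy data, not the authors' pipeline: the function names and the toy reads are hypothetical, and the sketch assumes that "reads aligned" means reads matching five or fewer loci (unique plus non-unique), which the text does not state explicitly.

```python
from collections import Counter


def classify_read(n_loci: int) -> str:
    """Bin a read by the number of genomic loci it matches."""
    if n_loci == 0:
        return "unaligned"
    if n_loci == 1:
        return "unique"
    if n_loci <= 5:
        return "non-unique"  # two to five matches
    return "repetitive"      # more than five matches


def reads_per_million(unique_hits, aligned_total):
    """Scale per-gene unique read counts to reads per million reads aligned."""
    return {gene: count * 1_000_000 / aligned_total
            for gene, count in unique_hits.items()}


# Toy input (hypothetical): (gene hit by the read or None, number of loci matched)
toy_reads = [("GeneA", 1), ("GeneA", 1), ("GeneB", 1), (None, 3), (None, 9), (None, 0)]

classes = Counter(classify_read(n) for _, n in toy_reads)
# Only uniquely matching reads are used to quantify expression.
unique_hits = Counter(g for g, n in toy_reads if classify_read(n) == "unique")
# Assumption: the denominator is unique + non-unique reads.
aligned_total = classes["unique"] + classes["non-unique"]
rpm = reads_per_million(unique_hits, aligned_total)
print(classes, rpm)
```

Counting only unique reads trades some sensitivity for accuracy: reads from recently duplicated (homeologous) genes that cannot be placed unambiguously are simply not assigned to either copy.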

Table 1.   Distribution of Illumina-Solexa 36-bp reads according to their alignment against the Glycine max (soybean) genome

Sample                   G. max unique   G. max non-unique   Unaligned and highly       Total reads
                                         (2–5 matches)       repetitive reads
                                                             (>5 matches)
Root tip                 3 235 689       850 750             1 068 142                  5 154 581
Root                     3 790 433       884 257             1 432 754                  6 107 444
84-HAS root hairs        2 828 246       719 626             2 063 637                  5 611 509
120-HAS root hairs       4 086 965       1 052 457           1 698 787                  6 838 209
Nodule                   3 401 083       936 037             1 999 389                  6 336 509
Leaves                   2 813 916       1 202 914           1 279 012                  5 295 842
Shoot apical meristem    3 947 566       1 041 894           1 488 700                  6 478 160
Flower                   3 372 444       902 730             901 116                    5 176 290
Green pods               1 462 809       453 340             2 268 639                  4 184 788

HAS, h after sowing.
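As a cross-check on the alignment percentages quoted in the text, the short sketch below recomputes them from the read counts in Table 1. The counts are copied from the table; the variable and function names are illustrative only.

```python
# Read counts copied from Table 1:
# (unique, non-unique 2-5 matches, unaligned or >5 matches, total)
TABLE1 = {
    "Root tip": (3_235_689, 850_750, 1_068_142, 5_154_581),
    "Root": (3_790_433, 884_257, 1_432_754, 6_107_444),
    "84-HAS root hairs": (2_828_246, 719_626, 2_063_637, 5_611_509),
    "120-HAS root hairs": (4_086_965, 1_052_457, 1_698_787, 6_838_209),
    "Nodule": (3_401_083, 936_037, 1_999_389, 6_336_509),
    "Leaves": (2_813_916, 1_202_914, 1_279_012, 5_295_842),
    "Shoot apical meristem": (3_947_566, 1_041_894, 1_488_700, 6_478_160),
    "Flower": (3_372_444, 902_730, 901_116, 5_176_290),
    "Green pods": (1_462_809, 453_340, 2_268_639, 4_184_788),
}


def aligned_percent(unique, non_unique, unaligned, total):
    """Percentage of reads that match five or fewer loci."""
    return 100 * (unique + non_unique) / total


# Each row should add up to its reported total.
for sample, row in TABLE1.items():
    assert sum(row[:3]) == row[3], sample

aligned = {s: round(aligned_percent(*row), 1) for s, row in TABLE1.items()}
unaligned = {s: round(100 * row[2] / row[3], 1) for s, row in TABLE1.items()}

for sample in TABLE1:
    print(f"{sample:<22} aligned {aligned[sample]:5.1f}%  unaligned/repetitive {unaligned[sample]:5.1f}%")
```

The lowest and highest aligned fractions come from green pods and flower, respectively, which matches the 45.8–82.6% range given in the text.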

The soybean genome annotation, as described by Schmutz et al. (2010), refers to 46 430 soybean genes predicted with high confidence, with the remaining genes predicted with low confidence. We compared our gene list for which no detectable expression was found across 14 conditions with the list of low-confidence genes. From the list of 16 198 putative genes lacking expression, 12 673 (78.2%) were predicted with low confidence in the current soybean genome annotation (Table S3). The presence of an expressed sequence tag (EST) or full-length cDNA sequence led to the annotation of the remaining 3525 genes with high confidence (Table S3). Having reviewed the conditions in which these 3525 transcripts were detected, we conclude these genes were expressed under highly restricted conditions, such as at very specific stages of organ development or in specific response to abiotic stress, such as drought stress. Therefore, it is likely that most of the 12 673 low-confidence genes, which lack expression, are pseudogenes.

Soybean is an allotetraploid that has undergone at least two rounds of whole genome duplication, with the most recent having occurred approximately 13 Mya (Schlueter et al., 2004, 2007; Gill et al., 2009). In a previous study, we demonstrated cases in which the homeologous gene pairs showed significant divergence in their expression (Libault et al., 2010). In order to examine this on a whole genome basis, we established syntenic relationships between 19 533 annotated genes (28.2% of the annotated soybean genes) to establish their homeology (Table S4). Among the 12 673 predicted pseudogenes, we identified homeologs expressed at some level in all conditions tested for only 61 (<1%; Table S5). Such results are consistent with current theories of gene evolution, where, after whole genome duplication, gene fates include silencing or neofunctionalization of one of the two copies (Adams, 2007).

A number of sequence reads matched against the 7314 loci currently lacking gene annotation (Table S2). The majority of these loci (7127) were found in regions assembled as part of the chromosome pseudomolecules, whereas the remainder (187) were located on currently unanchored scaffolds. In a previous study, we demonstrated the use of high-throughput cDNA sequencing to improve the current soybean genome annotation (Libault et al., 2010). Therefore, we mined 20 kbp of the genomic DNA sequence around each of the 7127 regions found to have gene expression. Using FGENESH, we predicted putative protein-coding genes for 6059 of the 7127 loci (85%). Among them, 4323 of the gene predictions overlapped existing annotated genes, resulting in the 5′ or 3′ expansion of the currently annotated cDNA sequences (Table S6). The remaining 1736 genes predicted by FGENESH did not overlap currently annotated genes, suggesting the existence of new protein-coding genes. We used InterProScan (Zdobnov and Apweiler, 2001) software to identify the signature domains of the encoded proteins: 542 and 1194 genes encode proteins with and without conserved domains, respectively (Table S7). Altogether, our analysis suggested that 57 352 soybean genes are transcribed (i.e. 55 616 of the 69 145 putative genes in the current, published soybean genome annotation, plus 1736 newly annotated genes; the remaining 13 529 annotated genes are putative pseudogenes).
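The gene counts reported in this section fit together arithmetically, as the following bookkeeping sketch shows. All numbers are taken from the text; the variable names are illustrative only.

```python
# Counts quoted in the text
ANNOTATED_GENES = 69_145          # putative genes in the published annotation
expressed_in_14 = 52_947          # annotated genes detected in >= 1 of 14 conditions
not_detected = 16_198             # annotated genes with no detected expression
low_conf_not_detected = 12_673    # of those, predicted with low confidence
unannotated_loci = 7_314          # expressed loci lacking annotation
anchored_loci = 7_127             # located on chromosome pseudomolecules
fgenesh_predicted = 6_059         # loci with an FGENESH gene prediction
overlap_existing = 4_323          # predictions extending annotated genes
transcribed_annotated = 55_616    # annotated genes considered transcribed

# Derived quantities
high_conf_not_detected = not_detected - low_conf_not_detected
unanchored_loci = unannotated_loci - anchored_loci
new_genes = fgenesh_predicted - overlap_existing
putative_pseudogenes = ANNOTATED_GENES - transcribed_annotated
total_transcribed = transcribed_annotated + new_genes

print(f"undetected, high confidence: {high_conf_not_detected}")
print(f"low-confidence share of undetected genes: {100 * low_conf_not_detected / not_detected:.1f}%")
print(f"FGENESH success rate: {100 * fgenesh_predicted / anchored_loci:.0f}%")
print(f"new genes: {new_genes}, pseudogenes: {putative_pseudogenes}, transcribed: {total_transcribed}")
```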

Tissue-specific gene expression

Benedito et al. (2008) observed large differences between the transcriptomes of different M. truncatula organs, based on a number of DNA microarray hybridizations. Similarly, Schmid et al. (2005) and Aceituno et al. (2008) concluded that the A. thaliana transcriptome varied strongly from one organ to another. These studies suggest that the identity of specific plant organs is derived from the respective transcriptome. In soybean, across the nine tissues tested in the current study, the proportion of annotated/unannotated sequences transcribed was similar from one tissue to another (min. 52.4% in pod; max. 61.2% in the SAM; Table 2). Altogether, these percentages were slightly lower than those reported in M. truncatula (55–63%; Benedito et al., 2008) and A. thaliana tissues (55–67%; Schmid et al., 2005). Such differences might be a direct consequence of the non-negligible number of putative pseudogenes mentioned above, and might also reflect the residual background or cross-hybridization existing when using array hybridization technology. A similar number of soybean genes were expressed in a single cell type (root hair) and in multicellular organs (e.g. 45 717, 40 034, 43 377 and 46 173 soybean genes were expressed in flower, pod, 84- and 120-HAS root hair cells, respectively; Table 2). Jiao et al. (2009) previously reported that transcripts undetectable in cDNA derived from shoot, root or germinated seeds could be detected if mRNA was sampled from a single cell type from this organ. Therefore, we hypothesize that the heterogeneous population of differentiated cells composing a soybean organ results in a larger diversity of expressed sequences, but also in the poor detection of low-abundance transcripts. In contrast, cDNA derived from the single-cell root hairs allows for the detection of low-abundance transcripts, because of a lack of dilution from other tissues, and the homogeneity of the tissue sampled.
Apparently, these opposing factors result in approximately the same number of transcripts sequenced from a single cell type and multicellular organ samples.

Table 2.   Distribution of expressed and not expressed annotated and unannotated sequences across nine Glycine max (soybean) tissues

                         Number of expressed sequences         Number of silenced sequences (i.e. no transcript detected)
Tissue                   Annotated (%)     Unannotated (%)     Annotated (%)     Unannotated (%)
Root hair 84 HAS         38 645 (50.54)    4732 (6.19)         30 500 (39.89)    2582 (3.38)
Root hair 120 HAS        40 849 (53.43)    5324 (6.97)         28 296 (37.01)    1990 (2.60)
Root tip                 36 882 (48.24)    4624 (6.05)         32 263 (42.20)    2690 (3.52)
Root                     40 576 (53.07)    5126 (6.71)         28 569 (37.37)    2188 (2.86)
Nodule                   36 369 (47.57)    4438 (5.81)         32 776 (42.87)    2876 (3.76)
Leaf                     37 600 (49.18)    4518 (5.91)         31 545 (41.26)    2796 (3.66)
Shoot apical meristem    41 415 (54.17)    5341 (6.99)         27 730 (36.27)    1973 (2.58)
Flower                   40 863 (53.44)    4854 (6.35)         28 282 (36.99)    2460 (3.22)
Pod                      36 325 (47.51)    3709 (4.85)         32 820 (42.92)    3605 (4.72)
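The percentages in Table 2 appear to be expressed relative to the full set of 76 459 sequences (69 145 annotated genes plus 7314 unannotated loci); that denominator is inferred from the counts rather than stated in the table. The sketch below, with illustrative names, recomputes the per-tissue fraction of expressed sequences under that assumption.

```python
# Counts copied from Table 2:
# (expressed annotated, expressed unannotated, silenced annotated, silenced unannotated)
TABLE2 = {
    "Root hair 84 HAS": (38_645, 4_732, 30_500, 2_582),
    "Root hair 120 HAS": (40_849, 5_324, 28_296, 1_990),
    "Root tip": (36_882, 4_624, 32_263, 2_690),
    "Root": (40_576, 5_126, 28_569, 2_188),
    "Nodule": (36_369, 4_438, 32_776, 2_876),
    "Leaf": (37_600, 4_518, 31_545, 2_796),
    "Shoot apical meristem": (41_415, 5_341, 27_730, 1_973),
    "Flower": (40_863, 4_854, 28_282, 2_460),
    "Pod": (36_325, 3_709, 32_820, 3_605),
}

# Assumed denominator: all annotated genes plus all expressed unannotated loci.
TOTAL_SEQUENCES = 69_145 + 7_314


def percent_expressed(row):
    """Share of all sequences (annotated + unannotated) detected in a tissue."""
    expressed_annotated, expressed_unannotated, _, _ = row
    return 100 * (expressed_annotated + expressed_unannotated) / TOTAL_SEQUENCES


# Every row should account for the full set of sequences.
for tissue, row in TABLE2.items():
    assert sum(row) == TOTAL_SEQUENCES, tissue

expressed_counts = {t: row[0] + row[1] for t, row in TABLE2.items()}
expressed_pct = {t: round(percent_expressed(row), 1) for t, row in TABLE2.items()}

for tissue in TABLE2:
    print(f"{tissue:<22} {expressed_counts[tissue]:>6} sequences ({expressed_pct[tissue]}%)")
```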

To better establish the identity of the different soybean tissues, we generated a heat map based on the correlation between their transcriptomes (Figure 1a). Based on this map, the nine organs can be divided into three different groups: (i) root tip, root and root hairs; (ii) SAM, pod, flower and leaf; and (iii) nodule. The lack of correlation between root-related tissues and aerial organs was previously reported by Benedito et al. (2008) in M. truncatula. These results are likely to reflect the divergence in function between the root and aerial portions of the plant. Consistent with this notion, other tissues show significant overlap in their transcriptomes. For example, gene expression in the soybean pod and SAM was strongly correlated (Figure 1a). The transcription profile can also reflect development. For example, the flower and leaf transcriptomes were closely correlated. In 1790, Goethe hypothesized that floral organs were modified leaves (Coen, 2001). Indeed, four MADS-box TF genes named SEPALLATA1–4 (SEP1, SEP2, SEP3 and SEP4, previously named AGL2, AGL4, AGL9 and AGL3) were characterized for their role in the acquisition of floral organ identity, as sep mutants develop leaf-like organs instead of flowers (Honma and Goto, 2001; Pelaz et al., 2001; Ditta et al., 2004). These results suggest that organ-specific gene expression could be the result of the action of relatively few regulatory genes.


Figure 1.  Comparison of the transcriptomes of various Glycine max (soybean) tissues. Ward hierarchical clustering of log2 transformed gene distribution in nine diverse soybean organs [root hair cells isolated 84 and 120 h after sowing, root tip, root, mature nodules, leaves, shoot apical meristem (SAM), flower and green pods], based on Pearson correlation coefficients. The entire soybean tissue transcriptome (a) or the 28 374 annotated soybean genes identified to be expressed in all nine tissues (b) were used to generate two distinct maps. The color scale indicates the degree of correlation (green, low correlation; red, strong correlation). The heat map was generated using JMP Genomics 4.0.


The soybean nodule transcriptome showed little correlation with other organs, with the exception of mature roots. It is interesting to note that the soybean root hair transcriptome was not strongly correlated with that of the whole root, nor with any of the other soybean tissues analyzed (Figure 1a). This is likely to reflect the specialization of this single cell type, but also the tissue dilution that occurred by sampling the other organs, especially the roots.
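To make the tissue-to-tissue comparison concrete, the following sketch computes Pearson correlation coefficients between log2-transformed expression profiles, the similarity measure named in the legend of Figure 1. It is a toy illustration in plain Python rather than a reproduction of the JMP Genomics analysis: the expression values are invented, the pseudocount of 1 is an assumption (the text does not say how zero counts were handled), and the Ward clustering step is not implemented here.

```python
import math


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount is an assumption to handle zeros."""
    return [math.log2(v + pseudocount) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical expression values (reads per million) for five genes in three tissues
toy_expression = {
    "root": [120.0, 3.0, 0.0, 45.0, 8.0],
    "root_hair": [100.0, 5.0, 1.0, 30.0, 10.0],
    "leaf": [2.0, 90.0, 60.0, 4.0, 7.0],
}

logged = {tissue: log2_transform(v) for tissue, v in toy_expression.items()}
corr = {(a, b): pearson(logged[a], logged[b]) for a in logged for b in logged}

for (a, b), r in corr.items():
    print(f"{a:>9} vs {b:<9} r = {r:+.2f}")
```

In the toy data the two root-derived profiles correlate positively and both correlate negatively with leaf, mimicking the root versus aerial split described above. A correlation matrix of this kind is the input that a hierarchical clustering method such as Ward's would then group into a heat map.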

In a previous study, Aceituno et al. (2008) showed that the Arabidopsis organ transcriptomes were not strongly affected in response to environmental changes. Therefore, the unique transcriptomic patterns exhibited by the various soybean organs are likely to reflect their unique identity, and are not the result of specific environmental conditions. In order to better understand soybean organ development, we analyzed the soybean gene expression atlas to identify those genes that were ubiquitously expressed across the nine tissues, and those showing a very high level of tissue-specific expression. The results of this analysis showed that 58 703 soybean genome loci, including both annotated and unannotated regions, were expressed in at least one of the nine soybean tissues. Roughly half of these genes (28 374) were transcribed ubiquitously (Table S8). In theory, organ identity could depend on both the level of expression of ubiquitously expressed genes and the organ-specific expression of selected genes. To address this issue, we first compared the overall expression levels of the 28 374 ubiquitous genes between the nine conditions (Figure 1b). As shown in Figure 1b, this analysis revealed significant differences in the absolute expression levels of the 28 374 ubiquitously expressed genes. These data also leave the impression that few, if any, soybean genes are stably expressed in the various soybean tissues. In order to examine this directly, we included the additional five conditions from the publication by Libault et al. (2010) to define genes constitutively expressed by the following criteria: (i) the gene was expressed in all 14 conditions tested; (ii) the fold change in the relative expression levels was not higher than three between conditions where genes were the most and the least expressed. These criteria identified 2532 putative constitutive genes (Figure S1; Table S9).
Among these, PFAM, KOG or PANTHER conserved domains were identified for 2187 genes, leading to the identification of 140 TF genes [2.5% of the 5671 predicted TF genes in the soybean genome; Schmutz et al., 2010; Libault et al., 2009a] (PFAM, KOG and PANTHER domain predictions are available online). Such a relatively low number is a direct reflection of the specific role of TF genes in the determination of plant organ identity.
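The two criteria for constitutive expression can be written as a small predicate. This is a minimal sketch under stated assumptions: the function name and example profiles are hypothetical, "expressed" is taken to mean a value above zero, and the fold change is taken as the ratio of the highest to the lowest expression value across conditions.

```python
def is_constitutive(expression, max_fold=3.0):
    """Return True if a gene meets both criteria for constitutive expression.

    expression: expression levels (e.g. reads per million), one per condition.
    (i)  the gene is expressed (value > 0, an assumption) in every condition;
    (ii) the ratio of highest to lowest expression is not higher than max_fold.
    """
    if not expression or any(value <= 0 for value in expression):
        return False
    return max(expression) / min(expression) <= max_fold


# Hypothetical profiles across the 14 conditions
stable = [10.0, 12.0, 9.5, 11.0, 14.0, 13.0, 10.5, 9.0, 12.5, 15.0, 11.5, 10.0, 13.5, 9.8]
variable = [10.0, 12.0, 9.5, 55.0, 14.0, 13.0, 10.5, 9.0, 12.5, 15.0, 11.5, 10.0, 13.5, 9.8]
missing = [10.0, 12.0, 0.0, 11.0, 14.0, 13.0, 10.5, 9.0, 12.5, 15.0, 11.5, 10.0, 13.5, 9.8]

for name, profile in [("stable", stable), ("variable", variable), ("missing", missing)]:
    print(name, is_constitutive(profile))
```

Because criterion (ii) is a ratio of extremes, a single outlying condition is enough to exclude a gene, which helps explain why only a small fraction of ubiquitously expressed genes pass the filter.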

We also sought to identify soybean transcripts expressed solely in one soybean organ. These genes were classified into four groups depending on their tissue specificity, defined by the fold change between the expression levels in the most highly expressing and second most highly expressing tissues: preferentially (≥3- and <10-fold change), specifically (≥10- and <100-fold change), very specifically (≥100- and <1000-fold change) and exclusively expressed in one tissue (≥1000-fold change). These criteria identified 5313, 1374, 147 and nine genes that were preferentially, specifically, very specifically and exclusively expressed in one tissue, respectively (Figure 2a; Table S10). Benedito et al. (2008) reported that M. truncatula seeds and nodules possessed the largest number of tissue-specific genes. Hogslund et al. (2009) found that L. japonicus flowers exhibited the highest degree of tissue-specific gene expression. In soybean, the largest numbers of tissue-specific genes were identified in nodules and flowers (1465 and 1145 genes, respectively; ≥3-fold change; Figure 2b). Using more stringent parameters, soybean nodule, flower and pod were the organs that were strongly enriched in highly tissue-specific genes (61, 54 and 29 genes, respectively; ≥100-fold change; Figure 2b). Given the lack of correlation in overall gene expression between the nodule transcriptome and the other tissues sampled (Figure 1), it was not surprising to identify this tissue among those showing the highest level of organ-specific gene expression. In contrast, it would appear that the correlation in the overall level of gene expression between flowers and leaves (Figure 1) hides a significant level of flower-specific gene expression (1145 flower-specific genes; ≥3-fold change). These genes are clearly strong candidates for determining the specific functional components of the flower. The overall soybean transcriptome was also mapped relative to the position of the respective genes in the assembled soybean genome.
As an aid to visualization of these data, we established a color-code map for each chromosome, and for each tissue, to reflect the overall gene expression level (Figure 3). These data, as well as the data from the earlier Libault et al. (2010) study, can best be viewed as part of the soybean genome browser. Visualizing the data in this way rapidly demonstrates that most of the protein-coding genes and also the most strongly expressed genes are located on the chromosome arms, whereas expression from the less gene-dense pericentromeric regions is much reduced.
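The tissue-specificity thresholds used in this section lend themselves to a short classification routine. The sketch below is illustrative rather than the authors' implementation: it assumes the fold change is the ratio between the highest and second-highest expression values across tissues, and it treats a gene detected in only one tissue (second-highest value of zero) as exclusive; both choices, like the function name and example values, are assumptions.

```python
import math


def specificity_class(expression):
    """Classify a gene by the fold change between its top two tissues.

    expression: dict mapping tissue name -> expression level (at least two tissues).
    Returns (top_tissue, label).
    """
    ranked = sorted(expression.items(), key=lambda item: item[1], reverse=True)
    (top_tissue, top), (_, second) = ranked[0], ranked[1]
    if top <= 0:
        return None, "not expressed"
    # Assumption: no expression elsewhere counts as an unbounded fold change.
    fold = math.inf if second == 0 else top / second
    if fold >= 1000:
        label = "exclusive"
    elif fold >= 100:
        label = "very specific"
    elif fold >= 10:
        label = "specific"
    elif fold >= 3:
        label = "preferential"
    else:
        label = "not tissue-specific"
    return top_tissue, label


# Hypothetical genes
examples = {
    "gene_1": {"nodule": 500.0, "root": 4.0, "leaf": 1.0},
    "gene_2": {"flower": 9.0, "leaf": 3.0, "pod": 2.0},
    "gene_3": {"pod": 12.0, "root": 0.0, "leaf": 0.0},
    "gene_4": {"root": 20.0, "root_tip": 15.0, "leaf": 1.0},
}

for gene, profile in examples.items():
    print(gene, specificity_class(profile))
```

Comparing against the second-highest tissue, rather than the mean of all others, is a conservative choice: a gene expressed strongly in two related organs is not called specific for either.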


Figure 2.  Gene expression specificity across nine Glycine max (soybean) tissues. (a) All soybean transcripts (dashed grey line), unannotated transcripts (black line) and transcription factor transcripts (grey line) were classified into four groups according to their tissue specificity, defined by the fold change between the expression levels in the most highly expressing and second most highly expressing tissues: preferentially (≥3- and <10-fold changes), specifically (≥10- and <100-fold changes), very specifically (≥100- and <1000-fold changes) and exclusively identified in one tissue (≥1000-fold change). (b) Distribution of the number of overall soybean transcripts in the nine different soybean tissues tested according to their level of specificity (3-, 10-, 100- and 1000-fold change cut-off).



Figure 3.  Color code maps of gene expression across the 20 Glycine max (soybean) chromosomes. For each chromosome, gene expression (i.e. number of sequence reads per million reads aligned: <0.5, yellow; 0.5–2, orange; 2–5, light green; 5–10, green; 10–25, greenish brown; 25–50, brown; 50–100, brownish red; >100, red) is indicated for nine different tissues (from top to bottom: root hairs 84 h after sowing, root hairs 120 h after sowing, nodule, root, root tip, shoot apical meristem, leaf, flower and pod). The final color strip at the bottom of each chromosome represents gene density (i.e. number of genes per 100 kbp; 0 to 15 or higher, shown from black to white). These maps were generated by using the comparative map and trait viewer (cmtv) software.


Root hair and meristem-specific soybean genes

Root hairs are single-cell extensions of the root epidermis, and play a key role in water and nutrient uptake. In legumes, they play a second role as the primary site for rhizobial infection, leading to the development of nitrogen-fixing nodules. Root hairs also exhibit polar cell expansion. In a previous study, we identified around 2000 soybean genes regulated in root hair cells in response to B. japonicum infection (Libault et al., 2010). In order to extend our understanding of the soybean root hair cell, we also sought to identify genes that were specifically expressed in root hairs. Using the same criteria outlined above, we identified 451 soybean sequences that were preferentially expressed in root hairs, including 69 and three root hair-specific and highly specific genes, respectively (Table S11). Using PFAM, KOG and PANTHER domain predictions, we predicted the functions of 304 of the 451 annotated genes. Some gene families are clearly over-represented in this list of root hair-specific genes. For example, cellulase (three genes, 1%), pectinesterase (four genes, 1.3%), peroxidase (eight genes, 2.6%) and extensin genes (four genes, 1.3%) were gene families preferentially expressed in root hairs (χ2 < 1 × e−50). These families represented only 0.06% (28 genes), 0.3% (144 genes), 0.4% (205 genes) and 0.03% (16 genes), respectively, of the 47 724 soybean annotated genes for which predicted functions were established. It is likely that the expression of these gene families reflects the polar growth of the root hair cells, where continuous cell wall expansion is required, and where reactive oxygen species are essential (Baumberger et al., 2001, 2003; Bucher et al., 2002; Carol and Dolan, 2006).
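The over-representation of these families can be illustrated with the counts given in the text. The sketch below computes a fold enrichment and a simple one-degree-of-freedom chi-square goodness-of-fit statistic for each family. The text does not describe how the authors' test was set up, so this is only one plausible formulation, it is not expected to reproduce the reported statistic, and the function names are hypothetical.

```python
ROOT_HAIR_GENES = 304    # root hair-preferential genes with a predicted function
GENOME_GENES = 47_724    # annotated genes with a predicted function

# family: (genes in the root hair list, genes in the genome), from the text
FAMILIES = {
    "cellulase": (3, 28),
    "pectinesterase": (4, 144),
    "peroxidase": (8, 205),
    "extensin": (4, 16),
}


def fold_enrichment(in_list, in_genome, list_size=ROOT_HAIR_GENES, genome_size=GENOME_GENES):
    """Frequency of a family in the list divided by its frequency in the genome."""
    return (in_list / list_size) / (in_genome / genome_size)


def chi_square_gof(in_list, in_genome, list_size=ROOT_HAIR_GENES, genome_size=GENOME_GENES):
    """Chi-square statistic comparing observed family membership in the list
    with the membership expected from the genome-wide frequency."""
    expected_in = list_size * in_genome / genome_size
    expected_out = list_size - expected_in
    observed_out = list_size - in_list
    return ((in_list - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)


for family, (in_list, in_genome) in FAMILIES.items():
    print(f"{family:<15} {100 * in_list / ROOT_HAIR_GENES:4.1f}% of list, "
          f"{100 * in_genome / GENOME_GENES:4.2f}% of genome, "
          f"fold {fold_enrichment(in_list, in_genome):5.1f}, "
          f"chi-square {chi_square_gof(in_list, in_genome):7.1f}")
```

With expected counts this small (roughly one gene or fewer per family), an exact test such as the hypergeometric test would usually be preferred over the chi-square approximation.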

Shoot apical and root meristems are the locations of the intense cell division required for plant growth. We combined the transcriptomes of these two meristematic tissues to identify 28 soybean genes that were preferentially expressed in the soybean meristematic zones (Table S11). Among these, 18 genes encode proteins with conserved domains, including three encoding a predicted kinesin, a regulator of cytokinesis (Müller et al., 2006). In addition, eight transcriptional and translational regulators (e.g. bHLH, SBP, Zf-HD TFs; RNA polymerase subunit, PIWI and ribosomal protein) were also preferentially expressed in soybean meristematic zones, suggesting strong transcriptional and translational activities, which are probably also involved in maintaining the high cell division rate and in controlling cell determination, differentiation and elongation.

Expression pattern of soybean nodulation-related genes

A unique feature of legumes, including soybean, is their formation of a novel root organ, the nodule, in response to rhizobial infection. Previously, Schmutz et al. (2010) annotated approximately 100 soybean genes as those predicted to play a role in nodulation, based on an extensive review of the nodulation literature. Among these 100 putative nodulation-related soybean genes, 14 were regulated during root hair cell infection by B. japonicum (Libault et al., 2010). An examination of the soybean gene expression atlas showed that only one, Glyma13g12440 (a putative GmN56 gene; Schmutz et al., 2010), of the 100 soybean nodulation-related genes (Table S12) was not expressed in any of the nine tissues sampled. In a previous study of soybean nodulation, Kouchi and Hata (1995) clearly identified a transcript for GmN56. Consequently, we looked at the expression of Glyma13g12490 and Glyma13g12500, two genes homeologous to Glyma13g12440 (Schmutz et al., 2010). Both genes were expressed, and at a significantly higher level in nodules (Figure S2; Table S12). Therefore, it is likely that the GmN56 EST identified by Kouchi and Hata (1995) arose from either Glyma13g12490 or Glyma13g12500, and not from Glyma13g12440. Of the remaining 99 putative nodulation-related genes, 70 were not expressed preferentially in nodules (<3-fold change between nodule and the eight remaining tissues), including those encoding the putative Nod factor receptors (NFR1α-β and NFR5α-β), and TFs known to regulate root hair cell infection (e.g. NSP1 and NSP2) (Table S12). The induction of the expression of these genes during root hair infection by B. japonicum (Libault et al., 2010), but not in mature nodules, is in agreement with their early role during legume infection (Catoira et al., 2000; Amor et al., 2003; Madsen et al., 2003; Oldroyd and Long, 2003; Radutoiu et al., 2003; Kalo et al., 2005; Smit et al., 2005; Heckmann et al., 2006; Murakami et al., 2006).
The remaining 29 genes were preferentially expressed in nodules (≥3-fold change; Figure S2; Table S12). Among these, 16 and seven genes were specifically (≥10- and <100-fold changes) and very specifically (≥100- and <1000-fold changes) expressed in the nodules (Figure S2; Table S12). Homeologous pairs of NIN (Glyma04g00210 and Glyma06g00240), NIN2 (Glyma12g05390 and Glyma11g13390) and CYCLOPS genes (Glyma01g35260 and Glyma09g34690) were expressed specifically in soybean nodules (Figure S2; Table S12). The role of NIN in L. japonicus nodule development was previously noted by Schauser et al. (1999), whereas CYCLOPS function during L. japonicus nodule development was not clearly established (Yano et al., 2008). In addition, consistent with their initial characterization, 23 genes encoding nodulins were also expressed specifically in nodules. Recently, Haney and Long (2009) identified seven flotillin-like genes in M. truncatula, which are homologs of the soybean nodulin gene GmNod53b (Winzer et al., 1999). Two of the M. truncatula flotillin genes were induced at 24 HAI with Sinorhizobium meliloti. Utilizing the GmNod53b sequence, we identified only two homeologous flotillin genes in soybean (Glyma06g06930 and Glyma04g06830; e-value < e−20). However, their expression patterns across the nine tissues were very different. For example, Glyma04g06830 expression was not detected in any tissue except the nodule, where its transcript was barely detected. Glyma06g06930 was strongly and primarily expressed in nodules, but was also expressed in root hair cells not inoculated with B. japonicum. In addition, Glyma06g06930 expression was induced in soybean root hairs at 12 HAI with B. japonicum (3.7-fold change), but not at 24 or 48 HAI (Table S2). These data suggest that the flotillin encoded by Glyma06g06930 is likely to be orthologous to the genes shown by Haney and Long (2009) to be crucial to root hair infection by S. meliloti.

Expression patterns of soybean transcription factor genes

The TF genes are of clear interest because they control plant responses to the environment, as well as developmental pathways (for a review, see Libault et al., 2009a). For example, our earlier study (Libault et al., 2010) identified a number of soybean TF genes whose expression responded to B. japonicum inoculation. Soybean homologs of MtHAP2.1, MtERN and LjNIN, genes controlling M. truncatula and L. japonicus nodule development (Schauser et al., 1999; Combier et al., 2006; Middleton et al., 2007), were clearly identified based on syntenic relationships and their nodule-specific expression (Libault et al., 2009a,b).

The soybean gene expression atlas was mined to identify TF genes exhibiting tissue-specific expression. This analysis identified 624 TF genes that were expressed preferentially in one soybean tissue compared with the eight others, including 114, five and one TF genes, specifically, very specifically and exclusively expressed in one tissue, respectively (Figure 2a; Table S13).
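The tissue-specificity classes used here can be illustrated with a minimal Python sketch. It assumes that each gene is described by one normalized expression value per tissue, and that the class is assigned from the ratio between the highest and second-highest values, using the thresholds given in the Table S10 legend. The function name, the input layout and the treatment of a zero runner-up value as an infinite fold change are illustrative assumptions, not the procedure actually used to build the tables.

```python
from typing import Dict, Optional, Tuple

# Fold-change cut-offs from the Table S10 legend, checked from strictest to loosest.
THRESHOLDS = [
    (1000.0, "exclusive"),
    (100.0, "very specific"),
    (10.0, "specific"),
    (3.0, "preferential"),
]


def classify_specificity(
    expression: Dict[str, float],
) -> Tuple[Optional[str], Optional[str], float]:
    """Return (top tissue, class label, fold change) for one gene.

    `expression` maps tissue name -> normalized expression level.
    The label is None when the fold change is below 3.
    """
    if len(expression) < 2:
        raise ValueError("at least two tissues are required")
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    (top_tissue, top), (_, second) = ranked[0], ranked[1]
    if top <= 0:
        # Gene not detected in any tissue: nothing to classify.
        return None, None, 0.0
    # Assumption: an undetected runner-up is treated as an infinite fold change.
    fold = float("inf") if second <= 0 else top / second
    for cutoff, label in THRESHOLDS:
        if fold >= cutoff:
            return top_tissue, label, fold
    return top_tissue, None, fold
```

For example, `classify_specificity({"nodule": 500.0, "root": 4.0, "leaf": 1.0})` gives a fold change of 125 and therefore the "very specific" class for the nodule.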

Examination of this list of 120 TF genes specifically expressed in at least one tissue (≥10-fold change) identified a significant number of C2H2 (Zn) and NIN-like TF genes expressed preferentially in nodules (Figure 4). As described above, the role of NIN-like genes in legume nodulation is well established. However, to date, there is no functional demonstration of a role for C2H2 (Zn) TF genes during legume nodulation. Our data suggest that this should be examined more closely. Members of the Homeodomain TF family were restricted to the SAM, whereas members of the LIM, MADS and NAC TF families were preferentially expressed in flowers, suggesting a specific role for these TF gene families in the normal development of these tissues (Figure 4). In A. thaliana, a large number of MADS TF genes, such as SEP1, SEP2, SEP3, SEP4, APETALA1 (AP1), APETALA3 (AP3), PISTILLATA (PI) and AGAMOUS (AG), are key regulators of flower development (for a review, see Robles and Pelaz, 2005). Arabidopsis thaliana Homeodomain TF genes, such as WUSCHEL (WUS) and SHOOTMERISTEMLESS (STM), are important in the formation and maintenance of the SAM (Barton and Poethig, 1993; Endrizzi et al., 1996; Laux et al., 1996; Mayer et al., 1998). Consequently, we hypothesized that some of the soybean Homeodomain and MADS TF genes expressed specifically in the SAM and flower may be orthologs of WUS and STM, and of SEP1, SEP2, SEP3, SEP4, AP1, AP3, PI and AG, respectively. In order to establish this orthology, we looked for syntenic relationships between these gene families in the A. thaliana and G. max genomes. With the exception of the SEP3 and PI genes, we identified soybean orthologs of the flower- and SAM-related Arabidopsis genes (Figure S3). In most cases, the recent duplication of the soybean genome logically led to the identification of two putative orthologs. More surprisingly, the Glyma18g50900 gene was identified as the potential ortholog of both SEP1 and SEP2, whereas the region encoding Glyma02g13420 was orthologous to both SEP4 and AP1. Such a surprising result suggested that the gene pairs SEP1/SEP2 and SEP4/AP1 probably diverged from common gene ancestors before the divergence between soybean and Arabidopsis. To provide further evidence of the orthology between the soybean MADS and Homeodomain genes and WUS, STM, SEP1, SEP2, SEP4, AP1, AP3 and AG, we mined the Arabidopsis gene expression data (Hruz et al., 2008) to compare the expression profiles of the genes in both organisms. As in A. thaliana, a significant number of soybean genes putatively involved in flower development were strongly but not exclusively expressed in flowers (Figure 5). Among them, four MADS genes (Glyma01g08150, Glyma02g13420, Glyma04g02980 and Glyma06g02990), orthologs of AtAP1, AtSEP4 and AtAP3, were identified as specifically expressed in flowers (Figure S3; Table S13). The function of the remaining four soybean MADS genes and seven Homeodomain genes expressed specifically in flower and SAM needs to be investigated. Altogether, this analysis clearly demonstrates the usefulness of combining genome and transcriptome comparisons to identify genes playing critical developmental roles in soybean.


Figure 4.  Distribution of Glycine max (soybean) transcription factor genes expressed specifically in one soybean tissue, based on their family membership. The sub-pies highlight the distribution of specific transcription factor gene families in the different tissues, based on the specificity of their expression.


Figure 5.  Gene expression patterns of Arabidopsis genes involved in the formation and maintenance of the shoot apical meristem (SAM) and the determination of flower organs (a), and their putative orthologs in Glycine max (soybean) (b). Genevestigator (Hruz et al., 2008) and the soybean gene atlas were mined to establish the expression pattern of the Arabidopsis and soybean genes, respectively.

Taking advantage of this analysis, and to validate the accurate measurement of soybean gene expression by Illumina Solexa technology, we compared the Illumina Solexa data set with transcriptomic analyses performed on 11 soybean tissues using the previously published quantitative RT-PCR primer set library, designed against more than 1000 soybean regulatory genes, including 652 TF genes (Libault et al., 2009b). In virtually all cases, the qRT-PCR results validated the measurements made by Illumina Solexa sequencing. Full details are provided in Appendix S1.

Comparison of the M. truncatula, L. japonicus and G. max transcriptome

Glycine max, M. truncatula and L. japonicus probably diverged around 40 Mya, and extensive microsynteny remains between their genomes (Choi et al., 2004; Cannon et al., 2006; Young and Udvardi, 2009). This relationship provides opportunities to transfer genetic knowledge between these three species. However, such comparisons also need to allow for divergence in the expression patterns of orthologous genes during legume evolution, especially given the more recent whole genome duplication in soybean (Schlueter et al., 2004, 2007), and the silencing of homeologous genes (Libault et al., 2009a,b; present study). Consequently, the use of orthology to deduce common function among the three legume species will require not only the establishment of a syntenic relationship, but also the demonstration of similar gene expression patterns. This further illustrates the utility of gene expression atlases for these three species.

The majority of gene expression data available for M. truncatula and L. japonicus come from a variety of Affymetrix microarray experiments. Therefore, as a first step to compare gene expression from these two species with that of soybean, we sought to identify the orthologous genes present on the M. truncatula and L. japonicus Affymetrix arrays, and their counterparts in soybean. To simplify this analysis, we focused on the 147 annotated soybean genes expressed very specifically in only one tissue (≥100-fold change; Table S10). Subsequently, we mined the M. truncatula and L. japonicus expression data for the corresponding orthologs by referencing the respective gene expression atlases (Benedito et al., 2008; Hogslund et al., 2009). This approach allowed the direct comparison of 40 soybean genes in five tissues (nodule, root, leaf, flower and pods) with the corresponding M. truncatula orthologs in the same five tissues, and the L. japonicus orthologs in four tissues (nodule, root, leaf and flower). This comparison showed that 18 soybean genes share similar tissue specificity with their putative orthologs in M. truncatula and L. japonicus (Table S14). This number may simply reflect the difficulty of establishing true orthology, or may reflect subfunctionalization or neofunctionalization of the remaining 22 soybean putative orthologs. To better establish orthology, we analyzed microsynteny between the G. max, M. truncatula and L. japonicus loci encoding the various putative orthologs. Significant microsynteny was found between three G. max and M. truncatula and eight G. max and L. japonicus gene regions (Figure S4; Table S14). For example, microsynteny was found between the Glyma01g44660 soybean gene region and the corresponding regions in both M. truncatula (Medtr5g006680) and L. japonicus (CM0591.50.nd). These three genes were expressed specifically in flowers.

Interestingly, during our analysis we also highlighted synteny between legume genes not identified during the initial screen (Figure S4). For instance, in addition to apparent orthology to Glyma07g16290, was also orthologous to Glyma18g40360, a soybean gene preferentially expressed in the nodules, based on the soybean gene atlas (Figure S4; Table S2). These three genes are predicted to encode C2H2 (Zn) TFs, consistent with the previously mentioned abundant expression of this family of TF genes in nodule tissue. Microsynteny was also found between genes in G. max and M. truncatula that have very different expression patterns. For example, Glyma09g41200, Glyma18g44670 and Glyma18g44680 are soybean genes expressed specifically in flowers, and lie on a region of the soybean genome microsyntenic to Medtr7g080300, which also appears microsyntenic to the soybean loci encoding Glyma01g32750 and Glyma01g32760, two soybean genes expressed in a variety of organs (Figure S4; Table S2). This example suggests the subfunctionalization of Glyma01g32750 and Glyma01g32760 after the divergence of G. max and M. truncatula. As Glyma18g44670–Glyma18g44680 and Glyma01g32750–Glyma01g32760 probably arose by tandem duplication, we assume that the subfunctionalization of Glyma01g32750 and Glyma01g32760 occurred after the duplication of the soybean genome, but before their tandem duplication. The above example further illustrates the value of genome and transcriptome comparisons that allow interesting conclusions concerning the orthology of specific genes, and their evolutionary history. Space prevents us from presenting a variety of additional examples. At this point, the annotation of the G. max, M. truncatula and L. japonicus genomes clearly needs improvement. We predict that the full integration of the syntenic and transcriptome analysis of these three genomes will ultimately lead to the systematic identification of legume orthologs. At that point, it will be possible to rapidly transfer genetic and functional knowledge derived in one species to the others.

Experimental procedures

Bacterial cultures

Bradyrhizobium japonicum USDA110 was grown at 30°C for 3 days in HM medium (Cole and Elkan, 1973), supplemented with yeast extract (0.025%), d-arabinose (0.1%) and chloramphenicol (0.004%). Before plant inoculation, B. japonicum cells were pelleted (2000 g for 10 min), washed and diluted with sterile water to OD600 = 0.1.

Plant culture

All tissues described below were isolated from soybean G. max (L.) Merr. cultivar ‘Williams 82’ plants. For each tissue, three independent biological replicates were performed on a different set of plants to ensure the reproducibility of the plant tissues analyzed (i.e. seeds were sowed three times on different days, and tissues were harvested as described below).

Soybean seeds were surface sterilized according to the method described by Wan et al. (2005), and were sowed on nitrogen-free B&D agar medium (Broughton and Dilworth, 1971). Untreated root hair cells and stripped roots used for qRT-PCR were isolated from 3-day-old seedlings, as described by Wan et al. (2005). A similar protocol was used to isolate 84- and 120-HAS root hairs (Libault et al., 2010; 84- and 120-HAS root hairs were mock-inoculated root hairs isolated 12 and 48 h after being sprayed with water).

Other tissues were isolated as described below. The 3-day-old seedlings were germinated between moist Whatman filter paper. Root tips were harvested from these seedlings. To produce other tissues, germinated seedlings were transferred to the glasshouse under long-day conditions (16-h day/8-h night) at 27°C on Promix Bx soil (Premier Horticulture). Fourteen-day-old SAM (V2 stage), 18-day-old trifoliate leaves, stem and roots (V2 stage), flowers (R2 stage), and seeds and pods (R6 stage) were harvested. Nodules were harvested 32 days after the inoculation of 1 ml of B. japonicum suspension (OD600 = 0.1) on transferred 3-day-old seedlings.

RNA extraction, DNase treatments, and reverse transcription

Total RNA was isolated using Trizol Reagent (Invitrogen) according to the manufacturer’s instructions, followed by a chloroform extraction to improve its purity. Total RNAs were then treated and reverse-transcribed differently, depending on the technology used to quantify cDNA levels.

qRT-PCR.  The qRT-PCR reactions including the different controls were performed as described by Libault et al. (2009b).

Solexa sequencing.  For each condition, similar quantities of total RNA isolated from three independent biological replicates were pooled together. After first- and second-strand cDNA synthesis, the cDNAs were end repaired prior to ligation of Solexa adaptors. The products were sequenced on a Solexa platform.

Quantitative PCR reaction conditions and data analysis

The qRT-PCR reactions were performed as described by Libault et al. (2009b). The specificity of primer sets was confirmed by analyzing the dissociation curve profile of each qRT-PCR amplicon, and the efficiency of primers (Peff) was quantified using LinRegPCR (Ramakers et al., 2003). Cons6, encoding an F-box protein (Libault et al., 2008), was used to normalize the expression levels of putative soybean regulatory genes. The cycle threshold (Ct) value of the reference gene was subtracted from the Ct value of the test gene analyzed (ΔCt). The expression level (E) of each gene was calculated according to the equation $E = P_{\mathrm{eff}}^{-\Delta C_t}$. The average of the expression levels between three different replicates was calculated.
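The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming one Ct value for the test gene and one for the reference gene (Cons6) per biological replicate, and a single primer efficiency per primer set. The function names are hypothetical, and this is not the analysis script used in the study.

```python
from statistics import mean
from typing import Sequence


def relative_expression(
    ct_target: float, ct_reference: float, primer_efficiency: float
) -> float:
    """Expression level E = Peff ** (-dCt), with dCt = Ct(target) - Ct(reference)."""
    delta_ct = ct_target - ct_reference
    return primer_efficiency ** (-delta_ct)


def mean_expression(
    ct_targets: Sequence[float],
    ct_references: Sequence[float],
    primer_efficiency: float,
) -> float:
    """Average E over biological replicates (three in the study)."""
    if len(ct_targets) != len(ct_references):
        raise ValueError("one reference Ct is required per target Ct")
    if not ct_targets:
        raise ValueError("at least one replicate is required")
    return mean(
        relative_expression(t, r, primer_efficiency)
        for t, r in zip(ct_targets, ct_references)
    )
```

With a perfect efficiency of 2 and a ΔCt of 3 cycles, `relative_expression(25.0, 22.0, 2.0)` returns 0.125, i.e. the test gene is expressed at one-eighth of the reference level.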

Solexa read alignment, statistical analysis and data representation

Illumina Genome Analyzer II image data were base-called and quality filtered using the default filtering parameters of the Illumina GA Pipeline GERALD stage (Illumina, Inc.). Alignments of passing 36-mer reads to all contigs of the Glyma1 8x Soybean Genome assembly (Soybean Genome Project) were performed using gsnap (Wu and Nacu, 2010), an alignment program derived from gmap (Wu and Watanabe, 2005), with optimizations for aligning short transcript reads from next-generation sequencers to genomic reference sequences. Alignments were processed using the Alpheus pipeline (Miller et al., 2008), keeping only alignments that had at least 34 out of 36 identities, and had no more than five equivalent best hits. Read counts used in expression analyses were based on the subset of uniquely aligned reads that also overlapped the genomic spans of the Glyma1 gene predictions. Read counts for a given sample were normalized by expressing each gene’s uniquely aligned read count per million reads uniquely aligning within that sample.
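The filtering and normalization rules above can be made concrete with a short Python sketch. It assumes that each alignment record carries the number of identities (out of 36) and the number of equivalent best hits, and that the total number of uniquely aligned reads in the sample is known. The record layout and function names are hypothetical and do not reproduce the Alpheus pipeline itself.

```python
from typing import Dict, NamedTuple


class Alignment(NamedTuple):
    read_id: str
    identities: int  # matching bases out of the 36-mer read
    best_hits: int   # number of equivalent best placements in the genome


def passes_filter(
    aln: Alignment, min_identities: int = 34, max_best_hits: int = 5
) -> bool:
    """Keep alignments with >= 34/36 identities and <= 5 equivalent best hits."""
    return aln.identities >= min_identities and aln.best_hits <= max_best_hits


def is_unique(aln: Alignment) -> bool:
    """Only uniquely aligned reads contribute to the expression counts."""
    return passes_filter(aln) and aln.best_hits == 1


def reads_per_million(
    gene_counts: Dict[str, int], total_unique_reads: int
) -> Dict[str, float]:
    """Normalize per-gene unique read counts to reads per million unique reads."""
    if total_unique_reads <= 0:
        raise ValueError("total_unique_reads must be positive")
    scale = 1_000_000 / total_unique_reads
    return {gene: count * scale for gene, count in gene_counts.items()}
```

Passing the sample-wide total as a separate argument reflects the wording of the text: the denominator is every read aligning uniquely within the sample, not only the reads that overlap a gene prediction.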

The raw and normalized Solexa data are available online, whereas the entire set of Solexa sequences used in our studies can be downloaded from the NCBI SRA browser (accession number SRA012188.1).

The color code maps of the soybean transcriptome across the 20 chromosomes were generated by using the comparative map and trait viewer (cmtv) software (Sawkins et al., 2004).

Synteny analysis

To establish microsynteny between G. max and A. thaliana, amino acid sequences of the A. thaliana candidate genes and at least the 20 genes surrounding them were blasted against soybean genome sequences. Using a P < e−20 as a cut-off, BLAST results and gene annotation were analyzed manually to establish microsynteny.
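The microsynteny analysis was performed manually, but its logic can be sketched in Python to show what such a screen looks for. The sketch assumes a list of BLAST hits for the candidate gene and its neighbors, reads the cut-off as 1e-20, and counts how many distinct query genes have a significant hit close to the soybean locus of interest. The record layout, the 500-kb window and the gene identifiers in the usage note are all illustrative assumptions; none of them comes from the study.

```python
from typing import Iterable, NamedTuple, Set


class BlastHit(NamedTuple):
    query: str       # A. thaliana gene (candidate or one of its neighbors)
    chromosome: str  # soybean chromosome carrying the hit
    position: int    # start coordinate of the hit (bp)
    evalue: float


def syntenic_support(
    hits: Iterable[BlastHit],
    anchor_chromosome: str,
    anchor_position: int,
    window_bp: int = 500_000,   # hypothetical window size, not from the study
    max_evalue: float = 1e-20,  # assumed reading of the "e-20" cut-off
) -> Set[str]:
    """Return the query genes with a significant hit near the anchor locus."""
    supported: Set[str] = set()
    for hit in hits:
        if (
            hit.evalue < max_evalue
            and hit.chromosome == anchor_chromosome
            and abs(hit.position - anchor_position) <= window_bp
        ):
            supported.add(hit.query)
    return supported
```

A region where many of the roughly 20 neighboring genes are returned would be a candidate for the manual inspection of gene order and annotation described above; the number of supporting genes required is left to the analyst.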

To compare the gene expression of orthologous genes between G. max, M. truncatula and L. japonicus, we first mapped the Medicago and Lotus Affymetrix probe sets against their respective genomes based on NCBI BLASTN searches. Only probe sets with at least nine matching probes, sited at least 22-bp up- or downstream of a 4000-bp region, were considered for further analysis. The BLAST of the predicted soybean transcripts against the Medicago Mt v3.0 and Lotus pseudogenomes, combined with the mapping of the Medicago and Lotus Affymetrix probe sets, led to a direct comparison of the expression of the soybean, Medicago and Lotus genes. When genes shared a similar tissue specificity, we highlighted their orthology by establishing a microsynteny relationship between them using the same methodology described above.

Graphics showing microsynteny relationships were generated by using cmtv (Sawkins et al., 2004).


Acknowledgements

We thank Melanie Mormile, Sandra Thibivilliers and Charlie P. Jones for their critical reading of the manuscript. We also thank Chia Rou Yeo for technical assistance and Shaoxing Wang for providing some total RNA samples. We are also grateful to the Medicago Genome Sequence Consortium (MGSC) for providing M. truncatula genomic sequences. This work was funded by a grant from the National Science Foundation (Plant Genome Program, #DBI-0421620). TJ, LDF and DX were supported by United Soybean Board grant #8236.


References

  • Aceituno, F.F., Moseyko, N., Rhee, S.Y. and Gutierrez, R.A. (2008) The rules of gene expression in plants: organ identity and gene body methylation are key factors for regulation of gene expression in Arabidopsis thaliana. BMC Genomics, 9, 438.
  • Adams, K.L. (2007) Evolution of duplicate gene expression in polyploid and hybrid plants. J. Hered. 98, 136–141.
  • Amor, B.B., Shaw, S.L., Oldroyd, G.E., Maillet, F., Penmetsa, R.V., Cook, D., Long, S.R., Denarie, J. and Gough, C. (2003) The NFP locus of Medicago truncatula controls an early step of Nod factor signal transduction upstream of a rapid calcium flux and root hair deformation. Plant J. 34, 495–506.
  • Barton, M.K. and Poethig, R.S. (1993) Formation of the shoot apical meristem in Arabidopsis thaliana: an analysis of development in the wild type and in the shoot meristemless mutant. Development, 119, 823–831.
  • Baumberger, N., Ringli, C. and Keller, B. (2001) The chimeric leucine-rich repeat/extensin cell wall protein LRX1 is required for root hair morphogenesis in Arabidopsis thaliana. Genes Dev. 15, 1128–1139.
  • Baumberger, N., Doesseger, B., Guyot, R. et al. (2003) Whole-genome comparison of leucine-rich repeat extensins in Arabidopsis and rice. A conserved family of cell wall proteins form a vegetative and a reproductive clade. Plant Physiol. 131, 1313–1326.
  • Benedito, V.A., Torres-Jerez, I., Murray, J.D. et al. (2008) A gene expression atlas of the model legume Medicago truncatula. Plant J. 55, 504–513.
  • Bennett, S.T., Barnes, C., Cox, A., Davies, L. and Brown, C. (2005) Toward the 1,000 dollars human genome. Pharmacogenomics, 6, 373–382.
  • Broughton, W.J. and Dilworth, M.J. (1971) Control of leghaemoglobin synthesis in snake beans. Biochem. J. 125, 1075–1080.
  • Bucher, M., Brunner, S., Zimmermann, P., Zardi, G.I., Amrhein, N., Willmitzer, L. and Riesmeier, J.W. (2002) The expression of an extensin-like protein correlates with cellular tip growth in tomato. Plant Physiol. 128, 911–923.
  • Cannon, S.B., Sterck, L., Rombauts, S. et al. (2006) Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proc. Natl Acad. Sci. USA, 103, 14959–14964.
  • Carol, R.J. and Dolan, L. (2006) The role of reactive oxygen species in cell growth: lessons from root hairs. J. Exp. Bot. 57, 1829–1834.
  • Catoira, R., Galera, C., De Billy, F., Penmetsa, R.V., Journet, E.P., Maillet, F., Rosenberg, C., Cook, D., Gough, C. and Denarie, J. (2000) Four genes of Medicago truncatula controlling components of a nod factor transduction pathway. Plant Cell, 12, 1647–1666.
  • Cheung, F., Haas, B.J., Goldberg, S.M., May, G.D., Xiao, Y. and Town, C.D. (2006) Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology. BMC Genomics, 7, 272.
  • Choi, H.K., Mun, J.H., Kim, D.J. et al. (2004) Estimating genome conservation between crop and model legume species. Proc. Natl Acad. Sci. USA, 101, 15289–15294.
  • Coen, E. (2001) Goethe and the ABC model of flower development. C. R. Acad. Sci. III, 324, 523–530.
  • Cole, M.A. and Elkan, G.H. (1973) Transmissible resistance to penicillin G, neomycin, and chloramphenicol in Rhizobium japonicum. Antimicrob. Agents Chemother. 4, 248–253.
  • Combier, J.P., Frugier, F., De Billy, F. et al. (2006) MtHAP2-1 is a key transcriptional regulator of symbiotic nodule development regulated by microRNA169 in Medicago truncatula. Genes Dev. 20, 3084–3088.
  • Ditta, G., Pinyopich, A., Robles, P., Pelaz, S. and Yanofsky, M.F. (2004) The SEP4 gene of Arabidopsis thaliana functions in floral organ and meristem identity. Curr. Biol. 14, 1935–1940.
  • Endrizzi, K., Moussian, B., Haecker, A., Levin, J.Z. and Laux, T. (1996) The SHOOT MERISTEMLESS gene is required for maintenance of undifferentiated cells in Arabidopsis shoot and floral meristems and acts at a different regulatory level than the meristem genes WUSCHEL and ZWILLE. Plant J. 10, 967–979.
  • Gill, N., Findley, S., Walling, J.G., Hans, C., Ma, J., Doyle, J., Stacey, G. and Jackson, S.A. (2009) Molecular and chromosomal evidence for allopolyploidy in soybean. Plant Physiol. 151, 1167–1174.
  • Haney, C.H. and Long, S.R. (2009) Plant flotillins are required for infection by nitrogen-fixing bacteria. Proc. Natl Acad. Sci. USA, 107, 478–483.
  • Heckmann, A.B., Lombardo, F., Miwa, H., Perry, J.A., Bunnewell, S., Parniske, M., Wang, T.L. and Downie, J.A. (2006) Lotus japonicus nodulation requires two GRAS domain regulators, one of which is functionally conserved in a non-legume. Plant Physiol. 142, 1739–1750.
  • Hogslund, N., Radutoiu, S., Krusell, L. et al. (2009) Dissection of symbiosis and organ development by integrated transcriptome analysis of Lotus japonicus mutant and wild-type plants. PLoS ONE, 4, e6556.
  • Honma, T. and Goto, K. (2001) Complexes of MADS-box proteins are sufficient to convert leaves into floral organs. Nature, 409, 525–529.
  • Hruz, T., Laule, O., Szabo, G., Wessendorp, F., Bleuler, S., Oertle, L., Widmayer, P., Gruissem, W. and Zimmermann, P. (2008) Genevestigator V3: a reference expression database for the meta-analysis of transcriptomes. Adv. Bioinformatics, 420747.
  • Jiao, Y., Tausta, S.L., Gandotra, N. et al. (2009) A transcriptome atlas of rice cell types uncovers cellular, functional and developmental hierarchies. Nat. Genet. 41, 258–263.
  • Kalo, P., Gleason, C., Edwards, A. et al. (2005) Nodulation signaling in legumes requires NSP2, a member of the GRAS family of transcriptional regulators. Science, 308, 1786–1789.
  • Kouchi, H. and Hata, S. (1995) GmN56, a novel nodule-specific cDNA from soybean root nodules encodes a protein homologous to isopropylmalate synthase and homocitrate synthase. Mol. Plant Microbe Interact. 8, 172–176.
  • Laux, T., Mayer, K.F., Berger, J. and Jurgens, G. (1996) The WUSCHEL gene is required for shoot and floral meristem integrity in Arabidopsis. Development, 122, 87–96.
  • Libault, M., Thibivilliers, S., Bilgin, D.D., Radwan, O., Benitez, M., Clough, S.J. and Stacey, G. (2008) Identification of four soybean reference genes for gene expression normalization. The Plant Genome, 1, 44–54.
  • Libault, M., Joshi, T., Benedito, V.A., Xu, D., Udvardi, M.K. and Stacey, G. (2009a) Legume transcription factor genes: what makes legumes so special? Plant Physiol. 151, 991–1001.
  • Libault, M., Joshi, T., Takahashi, K. et al. (2009b) Large-scale analysis of putative soybean regulatory gene expression identifies a myb gene involved in soybean nodule development. Plant Physiol. 151, 1207–1220.
  • Libault, M., Farmer, A., Brechenmacher, L. et al. (2010) Complete transcriptome of soybean root hair cell, a single cell model, and its alteration in response to Bradyrhizobium japonicum infection. Plant Physiol. 152, 541–552.
  • Madsen, E.B., Madsen, L.H., Radutoiu, S. et al. (2003) A receptor kinase gene of the LysM type is involved in legume perception of rhizobial signals. Nature, 425, 637–640.
  • Margulies, M., Egholm, M., Altman, W.E. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437, 376–380.
  • Mayer, K.F., Schoof, H., Haecker, A., Lenhard, M., Jurgens, G. and Laux, T. (1998) Role of WUSCHEL in regulating stem cell fate in the Arabidopsis shoot meristem. Cell, 95, 805–815.
  • Middleton, P.H., Jakab, J., Penmetsa, R.V. et al. (2007) An ERF transcription factor in Medicago truncatula that is essential for Nod factor signal transduction. Plant Cell, 19, 1221–1234.
  • Miller, N.A., Kingsmore, S.F., Farmer, A.D. et al. (2008) Management of high-throughput DNA sequencing projects: Alpheus. J. Comput. Sci. Syst. Biol. 1, 132–148.
  • Müller, S., Han, S. and Smith, L.G. (2006) Two kinesins are involved in the spatial control of cytokinesis in Arabidopsis thaliana. Curr. Biol. 16, 888–894.
  • Murakami, Y., Miwa, H., Imaizumi-Anraku, H., Kouchi, H., Downie, J.A., Kawaguchi, M. and Kawasaki, S. (2006) Positional cloning identifies Lotus japonicus NSP2, a putative transcription factor of the GRAS family, required for NIN and ENOD40 gene expression in nodule initiation. DNA Res. 13, 255–265.
  • Nobuta, K., Venu, R.C., Lu, C. et al. (2007) An expression atlas of rice mRNAs and small RNAs. Nat. Biotechnol. 25, 473–477.
  • Oldroyd, G.E. and Downie, J.A. (2008) Coordinating nodule morphogenesis with rhizobial infection in legumes. Annu. Rev. Plant. Biol. 59, 519–546.
  • Oldroyd, G.E. and Long, S.R. (2003) Identification and characterization of nodulation-signaling pathway 2, a gene of Medicago truncatula involved in Nod factor signaling. Plant Physiol. 131, 1027–1032.
  • Pelaz, S., Tapia-Lopez, R., Alvarez-Buylla, E.R. and Yanofsky, M.F. (2001) Conversion of leaves into petals in Arabidopsis. Curr. Biol. 11, 182–184.
  • Radutoiu, S., Madsen, L.H., Madsen, E.B. et al. (2003) Plant recognition of symbiotic bacteria requires two LysM receptor-like kinases. Nature, 425, 585–592.
  • Ramakers, C., Ruijter, J.M., Deprez, R.H. and Moorman, A.F. (2003) Assumption-free analysis of quantitative real-time polymerase chain reaction (PCR) data. Neurosci. Lett. 339, 62–66.
  • Robles, P. and Pelaz, S. (2005) Flower and fruit development in Arabidopsis thaliana. Int. J. Dev. Biol. 49, 633–643.
  • Sawkins, M.C., Farmer, A.D., Hoisington, D., Sullivan, J., Tolopko, A., Jiang, Z. and Ribaut, J.M. (2004) Comparative map and trait viewer (CMTV): an integrated bioinformatic tool to construct consensus maps and compare QTL and functional genomics data across genomes and experiments. Plant Mol. Biol. 56, 465–480.
  • Schauser, L., Roussis, A., Stiller, J. and Stougaard, J. (1999) A plant regulator controlling development of symbiotic root nodules. Nature, 402, 191–195.
  • Schlueter, J.A., Dixon, P., Granger, C., Grant, D., Clark, L., Doyle, J.J. and Shoemaker, R.C. (2004) Mining EST databases to resolve evolutionary events in major crop species. Genome, 47, 868–876.
  • Schlueter, J.A., Lin, J.Y., Schlueter, S.D. et al. (2007) Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing. BMC Genomics, 8, 330.
  • Schmid, M., Davison, T.S., Henz, S.R., Pape, U.J., Demar, M., Vingron, M., Scholkopf, B., Weigel, D. and Lohmann, J.U. (2005) A gene expression map of Arabidopsis thaliana development. Nat. Genet. 37, 501–506.
  • Schmutz, J., Cannon, S.B., Schlueter, J. et al. (2010) Genome sequence of the palaeopolyploid soybean. Nature, 463, 178–183.
  • Smit, P., Raedts, J., Portyanko, V., Debelle, F., Gough, C., Bisseling, T. and Geurts, R. (2005) NSP1 of the GRAS protein family is essential for rhizobial Nod factor-induced transcription. Science, 308, 1789–1791.
  • Wan, J., Torres, M., Ganapathy, A., Thelen, J., DaGue, B.B., Mooney, B., Xu, D. and Stacey, G. (2005) Proteomic analysis of soybean root hairs after infection by Bradyrhizobium japonicum. Mol. Plant Microbe Interact. 18, 458–467.
  • Weber, A.P., Weber, K.L., Carr, K., Wilkerson, C. and Ohlrogge, J.B. (2007) Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiol. 144, 32–42.
  • Winzer, T., Bairl, A., Linder, M., Linder, D., Werner, D. and Muller, P. (1999) A novel 53-kDa nodulin of the symbiosome membrane of soybean nodules, controlled by Bradyrhizobium japonicum. Mol. Plant Microbe Interact. 12, 218–226.
  • Wu, T.D. and Nacu, S. (2010) Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics, 26, 873–881.
  • Wu, T.D. and Watanabe, C.K. (2005) GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics, 21, 1859–1875.
  • Yano, K., Yoshida, S., Muller, J. et al. (2008) CYCLOPS, a mediator of symbiotic intracellular accommodation. Proc. Natl Acad. Sci. USA, 105, 20540–20545.
  • Young, N.D. and Udvardi, M. (2009) Translating Medicago truncatula genomics to crop legumes. Curr. Opin. Plant Biol. 12, 193–201.
  • Zdobnov, E.M. and Apweiler, R. (2001) InterProScan--an integration platform for the signature-recognition methods in InterPro. Bioinformatics, 17, 847–848.

Supporting Information

Figure S1. Expression levels of putative soybean (Glycine max) constitutive genes in 14 different conditions (y–axis) compared with the average of their expression levels across the 14 conditions (x–axis).

Figure S2. Twenty-nine soybean nodulation-related genes were expressed preferentially in mature nodules. For each gene, the expression levels in each of the nine tissues tested [root hair cells isolated 84 and 120 h after sowing (HAS), root tip, root, mature nodules, leaves, shoot apical meristem (SAM), flower and green pods] are expressed as percentages. Soybean genes that are very specifically expressed in the nodules (≥100-fold change compared with the tissue showing the second-highest expression) are highlighted by an arrow.

Figure S3. Syntenic relationship between G. max and A. thaliana genes involved in flower organ determination and the maintenance of the shoot apical meristem. Gene orthologs highlighted in the manuscript are indicated in bold characters. Other gene annotations are indicated to show the limits of the orthologous chromosome regions compared in this analysis.

Figure S4. Syntenic relationship between G. max, M. truncatula and L. japonicus genes surrounding soybean nodule- and flower-specific genes. The gene structure of soybean (Gm), Medicago (Mt) and Lotus (Lj) chromosome fragments was compared to highlight synteny relationships. These comparisons were performed based on gene function, order and direction. Green and red links between chromosome fragments highlight synteny between genes. Synteny relationships highlighted in Supplemental table 4 are highlighted in red. Other orthologies are highlighted in black. Gene orthologs highlighted in the manuscript are indicated in bold characters. Other gene annotations are indicated to show the limits of the orthologous chromosome regions compared in this analysis.

Figure S5. Comparison of the transcriptomes of 1016 soybean regulatory genes quantified by qRT-PCR across 11 soybean tissues. Ward hierarchical clustering using Pearson correlation coefficients was performed using qRT-PCR gene expression normalized against the mean value of gene expression across 11 tissue samples [root hair cells, root tip, stripped root, root, nodulated roots, stem, leaf, shoot apical meristem (SAM), flower, pods and seeds]. The color scale indicates the degree of correlation (green: low correlation; red: strong correlation). The heat map was generated using JMP Genomics 4.0. Details of the raw data used to generate this graphic are available in Supplemental Table 15.

Table S1. Gene expression pattern of predicted and unannotated Glycine max (soybean) genes in nine different tissues.

Table S2. Gene expression pattern of predicted and unannotated Glycine max (soybean) genes in nine different tissues, and in root hair and stripped roots in response to Bradyrhizobium japonicum.

Table S3. Confidence in gene prediction, according to Schmutz et al. (2010), for the 16 198 soybean genes not expressed in soybean tissues or in the early steps of nodulation. Genes predicted with low confidence are highlighted in bold characters.

Table S4. Gene expression of Glycine max (soybean) homeologous genes.

Table S5. Gene expression of Glycine max (soybean) homeologous genes relative to putative pseudogenes.

Table S6. Unannotated sequence reads that overlap Glycine max (soybean) annotated genes leading to an improvement of the soybean gene annotation.

Table S7. Identification of the signature domains of the 1736 proteins encoded by the putative new Glycine max (soybean) genes.

Table S8. Expression levels of Glycine max (soybean) sequences identified to be ubiquitously expressed across the nine soybean tissues tested.

Table S9. Gene expression and function of putative Glycine max (soybean) constitutive genes across 14 different conditions. TF genes are highlighted in bold characters.

Table S10. Identification of Glycine max (soybean) transcripts preferentially (≥3- and <10–fold changes between the expression levels in the most highly expressing and second most highly expressing tissues; yellow), specifically (≥10- and <100–fold changes; orange), very specifically (≥100- and <1000–fold changes; red) and exclusively (≥1000–fold change; purple) identified in one of the nine tissues tested.
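The fold-change tiers listed for Table S10 can be written as a small classification rule. This is a minimal sketch: it assumes that the fold change is the ratio of the highest to the second-highest expression value across tissues, and it treats a zero second-highest value as an infinite fold change, which the caption does not specify. The function names are hypothetical.

```python
def fold_change_top_two(values):
    """Ratio of the highest to the second-highest expression value."""
    ranked = sorted(values, reverse=True)
    top, second = ranked[0], ranked[1]
    if second == 0:
        # Assumption: expression detected in a single tissue only.
        return float("inf") if top > 0 else 0.0
    return top / second


def classify_specificity(fold_change):
    """Map a fold change to the tier names and colors used in the caption."""
    if fold_change >= 1000:
        return "exclusively (purple)"
    if fold_change >= 100:
        return "very specifically (red)"
    if fold_change >= 10:
        return "specifically (orange)"
    if fold_change >= 3:
        return "preferentially (yellow)"
    return None  # below the 3-fold threshold: not tissue-enriched


# Hypothetical expression values for one transcript across nine tissues.
hypothetical_values = [0.5, 1.0, 45.0, 0.0, 2.0, 3.0, 0.2, 1.5, 0.8]
tier = classify_specificity(fold_change_top_two(hypothetical_values))
```

In this hypothetical case the ratio is 45.0 / 3.0 = 15, which falls in the "specifically" tier.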

Table S11. Identification of Glycine max (soybean) transcripts preferentially (≥3- and <10–fold change; yellow), specifically (≥10- and <100–fold change; orange) and very specifically (≥100- and <1000–fold change; red) identified in soybean root hair cells and meristems.

Table S12. Relative gene expression levels of putative Glycine max (soybean) nodulation-related genes in nine different tissues, including mature nodules. These genes were identified by Schmutz et al. (2009). Genes highlighted in yellow are preferentially expressed in mature nodules compared to the eight remaining tissues.

Table S13. Identification of Glycine max (soybean) transcription factor genes preferentially (≥3- and <10–fold changes; yellow), specifically (≥10- and <100–fold changes; orange), very specifically (≥100- and <1000–fold changes; red) and exclusively (>1000–fold change; purple) expressed in one out of the nine tissues tested.

Table S14. Gene expression pattern between Glycine max (soybean), Medicago truncatula and Lotus japonicus orthologous genes. The three legume atlases were mined to compare the expression of putative orthologous genes (present study; Benedito et al., 2008; Høgslund et al., 2009). In several cases, microsynteny relationships confirmed the orthology between these genes (See Supplemental Figure 5 for details).

Table S15. Gene expression of 1016 Glycine max (soybean) regulatory genes in 11 different soybean tissues. qRT-PCR was used to quantify their expression.

Table S16. Identification of tissue-specific Glycine max (soybean) regulatory genes, based on qRT-PCR experiments. The expression patterns of tissue-specific TF genes identified by qRT-PCR were compared with those obtained by Solexa sequencing. TF genes identified as tissue-specific by only one technology (≥3-fold change) are highlighted in bold characters, and their tissue specificity is highlighted in green. Tissue-specific TF genes identified by both technologies are highlighted in yellow. Columns N and O list the tissue specificity of the TF genes (bold characters) or the tissue in which they are preferentially expressed (normal characters).

Appendix S1. Large-scale qRT-PCR of Glycine max (soybean) transcription factor genes.

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer-reviewed and may be re-organized for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.

TPJ_4222_sm_appendix-s1.doc (31K)
TPJ_4222_sm_figure-s1.pdf (972K)
TPJ_4222_sm_figure-s2.pdf (167K)
TPJ_4222_sm_figure-s3.pdf (1367K)
TPJ_4222_sm_figure-s4.pdf (1218K)
TPJ_4222_sm_figure-s5.pdf (109K)
TPJ_4222_sm_table-s1.rar (3802K)
TPJ_4222_sm_table-s2.rar (5053K)
TPJ_4222_sm_table-s3.xls (1085K)
TPJ_4222_sm_table-s4.rar (2291K)
TPJ_4222_sm_table-s5.xls (54K)
TPJ_4222_sm_table-s6.xls (1114K)
TPJ_4222_sm_table-s7.xls (1972K)
TPJ_4222_sm_table-s8.rar (1151K)
TPJ_4222_sm_table-s9.xls (3174K)
TPJ_4222_sm_table-s10.xls (2029K)
TPJ_4222_sm_table-s11.xls (180K)
TPJ_4222_sm_table-s12.xls (54K)
TPJ_4222_sm_table-s13.xls (192K)
TPJ_4222_sm_table-s14.xls (39K)
TPJ_4222_sm_table-s15.xls (946K)
TPJ_4222_sm_table-s16.xls (72K)
