Key words: comparative transcriptomics, model species, microarray.

Plant biology has reached a stage where more and more molecular tools are being developed for species other than the general reference species like Arabidopsis or rice. Unfortunately, this does not often include the construction of ‘whole genome’ microarrays. Although genome-wide expression analysis is very instructive in obtaining clues about (novel) genes that are active under the studied conditions, the cost of designing and producing microarrays for the wide variety of species with interesting traits is too great. That is also why, in recent years, several successful attempts have been made to carry out heterologous microarray hybridizations for studying the transcriptome of different non-model species. In this issue of New Phytologist, Hammond et al. (pp. 239–260) present an improved method of using an Arabidopsis Affymetrix array to determine the transcript profile of two related Thlaspi species with contrasting metal accumulation and tolerance phenotypes.

Comparative transcriptomics in plant biology

The two main reasons for performing heterologous transcript profiling are: (1) to compare two closely related species with contrasting traits, and (2) to compare different tissues, stages or conditions in a species for which no specific arrays are available. These two objectives are often combined, as has also been done by Hammond et al. The need for comparative transcriptomics, which inherently involves heterologous microarray hybridization, is often fueled by the presence of interesting traits and properties in one species that are not found in any of the current model species. If the right conditions and tissues are chosen for sampling, and assuming that many of the basic processes in plants are similar, the differences found by comparative transcriptomics will include the detection of genes involved in the studied trait or property. Good examples of this are adaptive traits, such as physiological adaptation to continuous adverse environmental conditions (low or high light intensity, high altitude, low or high temperatures, extreme soil composition – pH, salt, heavy metals, nutrient deficiencies, regular flooding, etc.) or developmental adaptations (leaf or root hair density, leaf shape, flowering time, vernalization response, wax deposition, etc.). Other examples are differences in general plant architecture such as fruit or flower size, tuberization, etc.; differences in plant defense or differences in primary or secondary metabolite production. In order to obtain informative hybridization results, the two species under comparison should be closely related to allow for sufficient probe–cDNA cross-hybridization.

In particular, Arabidopsis arrays have been used for comparative transcriptomics to compare Arabidopsis (Arabidopsis thaliana) with related Brassicaceae species such as Arabidopsis halleri (Becher et al., 2004; Weber et al., 2004), Thlaspi caerulescens (Hammond et al., 2006; van de Mortel et al., unpublished), Thellungiella halophila (Taji et al., 2004; Gong et al., 2005), Brassica oleracea (Hammond et al., 2005) and B. napus (Li et al., 2005). More recently, a tomato array has been used successfully to examine fruit ripening and development in tomato, eggplant and pepper (Moore et al., 2005), and no doubt more such experiments will soon follow. Three of the Brassicaceae species were studied because of their special adaptive traits: Zn and/or Cd hyperaccumulation and hypertolerance in A. halleri and T. caerulescens, and salt and cold tolerance in T. halophila. An interesting general trend that emerged from the analysis of the adapted species was that many orthologues of genes that were induced by stress in Arabidopsis were more highly expressed in the adapted species, especially in the absence of the stressor.

As the Arabidopsis genome was the first plant genome to be fully sequenced, the largest variety of array platforms is available for this model species. In general, four different types can be distinguished: (1) spotted (full-length) cDNA microarrays; (2) spotted PCR-amplified gene-specific sequence tag (GST) or genomic amplicon arrays; (3) on-slide synthesized short oligonucleotide arrays; and (4) spotted or on-slide synthesized long oligonucleotide arrays (Rensink & Buell, 2005). Spotted cDNA microarrays are the least sophisticated, containing a collection of clones from different cDNA libraries, which are PCR-amplified using vector-specific primers. Short oligonucleotide arrays contain probes up to 25 bases in length. Long oligonucleotide arrays contain probes of between 50 and 70 bases. The spotted gene-specific fragment arrays contain unique segments of the gene, which are amplified from genomic DNA (gDNA) or gDNA libraries using specific primers for each gene fragment. Oligo arrays are generally more technologically advanced and are therefore often only commercially available, while the spotted PCR fragment arrays are often developed within academia. In addition to the Arabidopsis arrays, there is a growing list of arrays for other species, mainly crops like barley, Brassica, Citrus, grape, lily, maize, potato, rice, soybean, sugar cane, tomato and wheat, but also for trees such as poplar, pine and spruce and for the legume model Medicago truncatula (Rensink & Buell, 2005; Huang et al., 2006; Ralph et al., 2006).

Considering that the successful use of heterologous microarray hybridization depends largely on the level of sequence similarity, and that this is highest among members of the same plant family, the current range of arrays already covers many agronomically important plant families (Brassicaceae, Leguminosae, Gramineae, Solanaceae). Of course, sequence diversity among members of the same family may still be too great to permit heterologous microarray hybridization. Even if the average level of sequence identity is sufficient, it is important to note that a subset of probes may not hybridize efficiently because of lower local sequence identity. In the Thlaspi interspecies comparison method described by Hammond et al., the Affymetrix arrays used contained several 25-bp oligonucleotide probes per gene. T. caerulescens has an average coding region DNA identity with Arabidopsis of 88.5% (Rigola et al., 2006). Upon heterologous hybridization, many probes did not perfectly match the orthologous Thlaspi sequence, resulting in a relatively low number of ‘present’ calls. After hybridization with Thlaspi gDNA, non-fitting probe pairs could be discarded without discarding the entire probe-set, so that acceptable expression data could still be generated. This method is only useful for microarray platforms that contain multiple probes per gene. Moreover, when comparing two species, care should be taken to include the same probes in the final set for comparison; using one probe for a gene in one species and another probe for the orthologous gene in the other species can lead to false conclusions about expression levels. When using arrays with only one probe per gene, discarding probes that do not hybridize properly to the target cDNAs will result in fewer genes for which expression can be determined. This may be a disadvantage compared with short oligonucleotide arrays, but on the other hand longer oligos tolerate lower sequence conservation between species, so fewer probes will be discarded than with short oligo arrays.
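The probe-selection idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published procedure of Hammond et al.: the intensity threshold, the plain averaging of retained probes, the function names (`select_probes`, `shared_probe_mask`, `probe_set_expression`) and all signal values are hypothetical.

```python
from statistics import mean


def select_probes(gdna_signal, threshold):
    """Return the IDs of probes whose gDNA hybridization signal exceeds the threshold."""
    return {probe for probe, signal in gdna_signal.items() if signal > threshold}


def shared_probe_mask(gdna_species_a, gdna_species_b, threshold):
    """Keep only probes that hybridize well to the gDNA of BOTH species,
    so that the same probes are used for each gene in the comparison."""
    return select_probes(gdna_species_a, threshold) & select_probes(gdna_species_b, threshold)


def probe_set_expression(probe_sets, rna_signal, mask):
    """Average the retained probes of each probe-set (assumed summary method).
    Probe-sets with no retained probe are dropped."""
    expression = {}
    for gene, probes in probe_sets.items():
        kept = [rna_signal[p] for p in probes if p in mask]
        if kept:
            expression[gene] = mean(kept)
    return expression


# Hypothetical toy data: two probe-sets, made-up signal intensities.
probe_sets = {"geneA": ["a1", "a2", "a3"], "geneB": ["b1", "b2"]}
gdna_a = {"a1": 500, "a2": 80, "a3": 450, "b1": 60, "b2": 90}
gdna_b = {"a1": 480, "a2": 300, "a3": 50, "b1": 70, "b2": 40}
rna_a = {"a1": 1200.0, "a2": 300.0, "a3": 900.0, "b1": 20.0, "b2": 35.0}

mask = shared_probe_mask(gdna_a, gdna_b, threshold=100)
print(sorted(mask))                                   # ['a1']
print(probe_set_expression(probe_sets, rna_a, mask))  # {'geneA': 1200.0}
```

In this toy case geneA is still measured through its one shared probe, whereas geneB loses all of its probes and drops out, which mirrors the loss of genes that occurs on platforms with a single probe per gene.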

Challenges for comparative transcriptomics

Comparative transcriptomics can be a very rewarding tool for discovering new gene expression profiles in plant species in the absence of species-specific cDNA information or microarrays, on the condition that there are arrays from a sufficiently closely related species. It is clear that this is still not the case for many important families. Currently, for instance, the Caryophyllaceae, Chenopodiaceae, Compositae, Rosaceae and Umbelliferae are not yet represented among the species for which microarrays are available. Even though spotted cDNA microarrays are being developed for a growing number of species, including T. caerulescens (Plessl et al., 2005), the number of genes represented on such arrays is often limited, and more information will probably be obtained by hybridizing to a heterologous, but genome-wide, microarray. Rather than trying to complete as many species-specific arrays as possible, it will probably be more efficient to focus on the development of family-specific arrays. Such an array may well be based on the transcriptome of one reference species, like Arabidopsis for the Brassicaceae, supplemented with probes representing genes not found in Arabidopsis but present in other Brassicaceae. The growing collection of Expressed Sequence Tags being generated for many different species would be an excellent source of information for the careful design of plant-family oriented, gene-specific long oligonucleotide probes. This will not be easy, and it will be a true challenge to develop the bioinformatic tools enabling the design of gene-specific probes that can detect orthologues while distinguishing paralogues, a problem that has also not yet been solved in the design of species-specific microarrays.
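The selection criterion for such a probe can be illustrated with a deliberately simplified sketch. It assumes that the candidate probe and the corresponding orthologue and paralogue windows have already been aligned to equal length, and it uses plain percentage identity as a stand-in for hybridization behaviour. The function names and both identity thresholds are hypothetical, and a real design tool would need to consider far more (melting temperature, secondary structure, incomplete EST coverage).

```python
def identity(seq_a, seq_b):
    """Fraction of identical positions between two equal-length, pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def probe_is_family_specific(probe, orthologues, paralogues,
                             min_orthologue_identity=0.9,
                             max_paralogue_identity=0.8):
    """Accept a probe only if it is similar enough to every orthologue window
    and dissimilar enough from every paralogue window (thresholds are assumptions)."""
    detects_orthologues = all(
        identity(probe, seq) >= min_orthologue_identity for seq in orthologues
    )
    avoids_paralogues = all(
        identity(probe, seq) <= max_paralogue_identity for seq in paralogues
    )
    return detects_orthologues and avoids_paralogues


# Hypothetical 10-base windows, far shorter than a real long-oligo probe.
probe = "ACGTACGTAC"
orthologues = ["ACGTACGTAT"]   # 9 of 10 positions identical
paralogues = ["ACGAACCTTC"]    # 7 of 10 positions identical

print(probe_is_family_specific(probe, orthologues, paralogues))  # True
```

The two thresholds pull in opposite directions: relaxing the orthologue threshold to cover more distant family members makes it harder to keep paralogues below their threshold, which is one way of stating why the design problem is difficult.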

Another important requirement for improving comparative transcriptomics is the ability to account for all possible transcripts that can be found in a plant cell. Even for Arabidopsis, for which the full genome sequence is known and gene expression has been studied by many groups all over the world, the recent use of whole-genome tiling arrays (WGAs) showed that there were still many regions with transcriptional activity where no gene had been annotated (Mockler & Ecker, 2005). WGAs contain non-overlapping or partially overlapping probes that are tiled to cover the entire genome. They are instructive for identifying genes expressed rarely or at low levels, or miRNAs, that are hard to identify or predict otherwise, and for designing probes to add to the current arrays. Although WGAs are very informative for plant genomics, including transcriptomics, the high costs associated with making them make it unlikely that this kind of array will soon be available for many gene families. For these too, model species will lead the way.
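The difference between non-overlapping and partially overlapping tiling can be made concrete with a small sketch. The function name `tile_probes`, the probe length and the step sizes are illustrative assumptions and do not describe any particular commercial array design.

```python
def tile_probes(sequence, probe_len=25, step=25):
    """Return (start, probe) pairs tiled along a sequence.

    step == probe_len gives non-overlapping tiles;
    step < probe_len gives partially overlapping tiles.
    Trailing bases that do not fill a whole probe are left uncovered.
    """
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    last_start = len(sequence) - probe_len
    return [
        (start, sequence[start:start + probe_len])
        for start in range(0, last_start + 1, step)
    ]


# Hypothetical 60-base stretch of genomic sequence.
genome = "ACGT" * 15

non_overlapping = tile_probes(genome, probe_len=25, step=25)
overlapping = tile_probes(genome, probe_len=25, step=10)

print([start for start, _ in non_overlapping])  # [0, 25]
print([start for start, _ in overlapping])      # [0, 10, 20, 30]
```

A smaller step gives denser coverage of each genomic region at the price of more probes, which is one reason the cost of such arrays rises quickly with genome size.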


References

  • Becher M, Talke IN, Krall L, Kramer U. 2004. Cross-species microarray transcript profiling reveals high constitutive expression of metal homeostasis genes in shoots of the zinc hyperaccumulator Arabidopsis halleri. Plant Journal 37: 251–268.
  • Gong Q, Li P, Ma S, Rupassara I, Bohnert HJ. 2005. Salinity stress adaptation competence in the extremophile Thellungiella halophila in comparison with its relative Arabidopsis thaliana. Plant Journal 44: 826–839.
  • Hammond JP, Bowen HC, White PJ, Mills V, Pyke KA, Baker AJM, Whiting SN, May ST, Broadley MR. 2006. A comparison of the Thlaspi caerulescens and T. arvense shoot transcriptome. New Phytologist 170: 239–260.
  • Hammond JP, Broadley MR, Craigon DJ, Higgins J, Emmerson ZF, Townsend HJ, White PJ, May ST. 2005. Using genomic DNA-based probe selection to improve the sensitivity of high-density oligonucleotide arrays when applied to heterologous species. Plant Methods 1: doi: 10.1186/1746-4811-1-10.
  • Huang J, Chen F, Del Casino C, Autino A, Shen M, Yuan S, Peng J, Shi H, Wang C, Cresti M, Li Y. 2006. LlANK, characterized as a ubiquitin ligase, is closely associated with membrane-enclosed organelles and required for pollen germination and pollen tube growth in Lilium longiflorum. Plant Physiology. (In press.)
  • Li F, Wu X, Tsang E, Cutler AJ. 2005. Transcriptional profiling of imbibed Brassica napus seed. Genomics 86: 718–730.
  • Mockler TC, Ecker JR. 2005. Applications of DNA tiling arrays for whole-genome analysis. Genomics 85: 1–15.
  • Moore S, Payton P, Wright M, Tanksley S, Giovannoni J. 2005. Utilization of tomato microarrays for comparative gene expression analysis in the Solanaceae. Journal of Experimental Botany 56: 2885–2895.
  • Plessl M, Rigola D, Hassinen V, Aarts MGM, Schat H, Ernst D. 2005. Transcription profiling of the metal-hyperaccumulator Thlaspi caerulescens (J. & C. PRESL). Zeitschrift für Naturforschung 60c: 216–223.
  • Ralph S, Park JY, Bohlmann J, Mansfield SD. 2006. Dirigent proteins in conifer defense: gene discovery, phylogeny, and differential wound- and insect-induced expression of a family of DIR and DIR-like genes in spruce (Picea spp.). Plant Molecular Biology 60: 21–40.
  • Rensink WA, Buell CR. 2005. Microarray expression profiling resources for plant genomics. Trends in Plant Science 10: 603–609.
  • Rigola D, Fiers M, Vurro E, Aarts MGM. 2006. The heavy metal hyperaccumulator Thlaspi caerulescens expresses many species-specific genes as identified by comparative EST analysis. New Phytologist. (In press.)
  • Taji T, Seki M, Satou M, Sakurai T, Kobayashi M, Ishiyama K, Narusaka Y, Narusaka M, Zhu J, Shinozaki K. 2004. Comparative genomics in salt tolerance between Arabidopsis and Arabidopsis-related halophyte Salt Cress using Arabidopsis microarray. Plant Physiology 135: 1697–1709.
  • Weber M, Harada E, Vess C, Roepenack-Lahaye E, Clemens S. 2004. Comparative microarray analysis of Arabidopsis thaliana and Arabidopsis halleri roots identifies nicotianamine synthase, a ZIP transporter and other genes as potential metal hyperaccumulation factors. Plant Journal 37: 269–281.