A comparison of the Thlaspi caerulescens and Thlaspi arvense shoot transcriptomes

Authors


Author for correspondence: Martin R. Broadley Tel: +44 (0) 115 9516382 Fax: +44 (0) 115 9516334 Email: martin.broadley@nottingham.ac.uk

Summary

  • Whole-genome transcriptome profiling is revealing how biological systems are regulated at the transcriptional level. This study reports the development of a robust method to profile and compare the transcriptomes of two nonmodel plant species, Thlaspi caerulescens, a zinc (Zn) hyperaccumulator, and Thlaspi arvense, a nonhyperaccumulator, using Affymetrix Arabidopsis thaliana ATH1-121501 GeneChip® arrays (Affymetrix, Santa Clara, CA, USA).
  • Transcript abundance was quantified in the shoots of agar- and compost-grown plants of both species. Analyses were optimized using a genomic DNA (gDNA)-based probe-selection strategy based on the hybridization efficiency of Thlaspi gDNA with corresponding A. thaliana probes. In silico alignments of GeneChip® probes with Thlaspi gene sequences, and quantitative real-time PCR, confirmed the validity of this approach.
  • Approximately 5000 genes were differentially expressed in the shoots of T. caerulescens compared with T. arvense, including genes involved in Zn transport and compartmentalization.
  • Future functional analyses of genes identified as differentially expressed in the shoots of these closely related species will improve our understanding of the molecular mechanisms of Zn hyperaccumulation.

Introduction

The advent of whole-genome transcriptome profiling, using high-density microarrays, has had a substantial impact on our understanding of biological systems and how these are regulated at the transcriptional level. Currently, these advances have been restricted to a small number of species for which microarrays have been developed. This is particularly evident for plants, with several microarray platforms commercially available for Arabidopsis thaliana (L.) Heynh., and one or more platforms available for barley (Hordeum vulgare), rice (Oryza sativa), maize (Zea mays), tomato (Lycopersicon esculentum), soybean (Glycine max), sugar cane (Saccharum officinarum), grape (Vitis vinifera) and wheat (Triticum aestivum). In contrast, the transcriptomes of most plant species have received little attention, as extensive sequence information and fabrication of custom arrays are required before experimentation can begin. This study reports the development of a robust method to profile and compare the transcriptomes of two nonmodel plant species, Thlaspi caerulescens J & C Presl., a zinc (Zn) hyperaccumulator, and T. arvense L., a nonhyperaccumulator, using Affymetrix A. thaliana ATH1-121501 (ATH1) GeneChip® arrays (Affymetrix, Santa Clara, CA, USA).

Affymetrix high-density oligonucleotide (oligo) GeneChip® arrays are a widely used tool for transcriptional profiling (Lipshutz et al., 1999; Hennig et al., 2003). In contrast to other microarray platforms [such as the longer oligo arrays produced by Agilent Technologies Inc. (Palo Alto, CA, USA) and Qiagen (Crawley, West Sussex, UK)], GeneChip® arrays represent each gene with multiple oligo probes. Probes representing one gene are collectively called a probe set. Each probe set comprises between 11 and 20 probe pairs. Each probe pair comprises a perfect-match (PM) and a mismatch (MM) probe. The PM probe is a 25-base sequence, complementary to the 3′ end of the target transcript, whilst the MM probe is identical to the PM probe but with a single mismatch at the 13th base. Fluorescently labelled RNA is hybridized to the array, and the resultant signal intensities are imaged in order to quantify the abundance of each transcript present in the sample. Depending on the normalization procedure adopted, the abundance of each transcript is calculated from its hybridization efficiency with its PM complementary probe, but with a correction factor applied, depending on, for example, nonspecific hybridization to its MM probe. Probe-level information is generally integrated at the level of the probe set (i.e. gene) before biological interpretation. GeneChip® arrays provide reproducible, accurate data at high throughput rates, and these data can be curated and compared across experiments through the use of data repositories (Lipshutz et al., 1999; Zhu & Wang, 2000; Hennig et al., 2003; Craigon et al., 2004; Redman et al., 2004). For example, GeneChip® array technology has been adopted widely within the plant sciences and data from several thousand GeneChip® arrays on the model plant A. thaliana are publicly available (Craigon et al., 2004).

A few studies have used GeneChip® arrays designed to a model species to interrogate the transcriptome of another species (‘cross-species transcriptomics’; Chismar et al., 2002; Enard et al., 2002; Caceres et al., 2003; Higgins et al., 2003; Becher et al., 2004; Khaitovich et al., 2004; Uddin et al., 2004; Weber et al., 2004; Hammond et al., 2005). In general, these studies have not accounted for the probe-level effects caused by the inefficient hybridization of certain transcripts from the target species to their appropriate GeneChip® probes designed for the model species. Probe-level effects will impact substantially on probe-set expression estimates (Ji et al., 2004; Grigoryev et al., 2005; Hammond et al., 2005). For example, when probe-set signal values are calculated using Microarray Analysis Suite (MAS version 5.0; Affymetrix), the weight carried by each probe within a probe set is inversely related to its distance from the median value of all probes within the probe set. Thus, probes not generating signals because of sequence polymorphisms with the transcript from the target organism can reduce the quality of information available. To counter this problem, some studies have adopted an RNA-based probe-selection system, for example, to study nonhuman mammalian transcriptomes using human GeneChip® arrays (Ji et al., 2004; Grigoryev et al., 2005), using probe masking to exclude probes which hybridized weakly to their target transcript. Such an approach will tend to favour the quantification of abundant transcripts at the expense of those expressed at lower levels.
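The down-weighting of probes that lie far from the probe-set median can be illustrated with a one-step Tukey biweight, the kind of robust average that MAS 5.0 is documented as using for probe-set signals. The sketch below is a simplified illustration under stated assumptions, not the Affymetrix implementation: the function name, the tuning constants `c` and `eps`, and the log2 probe signals are hypothetical values chosen for the example.

```python
import statistics


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight average; returns (estimate, weights).

    c and eps are tuning constants (assumed defaults for this sketch).
    """
    med = statistics.median(values)
    # Median absolute deviation from the median (robust spread).
    mad = statistics.median([abs(v - med) for v in values])
    weights = []
    for v in values:
        u = (v - med) / (c * mad + eps)  # scaled distance from the median
        # Weight shrinks with distance and is zero beyond the window.
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate, weights


# Hypothetical log2 signals for one 11-probe set; the last two probes
# hybridize poorly, e.g. because of sequence polymorphisms.
log_signals = [9.0, 9.2, 8.8, 9.1, 8.9, 9.0, 9.3, 8.7, 9.1, 6.0, 6.2]
estimate, weights = tukey_biweight(log_signals)
# The two weak probes fall outside the weighting window and get weight 0,
# so they no longer inform the probe-set estimate.
```

In a cross-species hybridization many probes in a set may behave like the last two values here, which is why excluding such probes before signal calculation can matter.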

Alternatively, a genomic DNA (gDNA)-based probe-selection strategy can be used to optimize cross-species transcriptomics (Hammond et al., 2005). In this approach, gDNA from the species of interest is labelled and hybridized to the GeneChip® array. Perfect-match probes which hybridize efficiently to the gDNA are selected for subsequent use in interpreting GeneChip® arrays challenged with RNA from the species of interest. This technique avoids bias towards abundant transcripts. This study reports the development of a gDNA-based probe-selection strategy to enable the robust profiling and comparison of the transcriptomes of two nonmodel plant species, T. caerulescens, a Zn hyperaccumulator, and T. arvense, a nonhyperaccumulator, using the ATH1 GeneChip® array.

Under natural conditions, most species contain < 0.1 mg Zn g−1 dry weight (DW) in their shoots, although the shoot Zn concentration of several hyperaccumulator species can exceed 10 mg Zn g−1 DW when growing in their natural habitats (Reeves & Baker, 2000). Zinc hyperaccumulators have been studied for their biological interest and conservation value (Whiting et al., 2005), and for their possible role in phytoremediation or phytomining (Chaney, 1993; Salt et al., 1998). Approximately 12 species of Zn hyperaccumulators occur in the Brassicaceae genera Arabidopsis and Thlaspi sensu lato (Reeves & Baker, 2000). Of these, Thlaspi caerulescens has been the most studied physiologically (Baker et al., 1994; Vázquez et al., 1994; Brown et al., 1995; Lasat et al., 1996, 1998; Pollard & Baker, 1996; Shen et al., 1997; F. J. Zhao et al., 1998, H. Zhao et al., 1998; Küpper et al., 1999, 2004; Frey et al., 2000; Pence et al., 2000; Whiting et al., 2000, 2003; Assunção et al., 2001; Lombi et al., 2001, 2002; White et al., 2002; Peer et al., 2003; Piñeros & Kochian, 2003; Cosio et al., 2004, 2005; Papoyan & Kochian, 2004; Ma et al., 2005). This short-lived, nonmycorrhizal perennial occurs on calamine [enriched with Zn, lead (Pb) and often cadmium (Cd)], serpentine [enriched in cobalt (Co), chromium (Cr), iron (Fe), magnesium (Mg) and nickel (Ni)] and nonmineral soils. It is extremely tolerant of Zn and, when grown hydroponically, it can accumulate 30 mg Zn g−1 shoot DW without toxicity symptoms.

Zinc plays an important role in enzyme regulation and transcription, but high cytoplasmic Zn2+ concentrations are toxic because they interfere with cellular processes (Gaither & Eide, 2001). Thus, cytoplasmic Zn2+ concentrations must be tightly controlled; a factor of profound importance in metal hyperaccumulation. In plants, cytoplasmic Zn2+ homeostasis is controlled by selective cation uptake, by translocation to the shoot, and by compartmentalization within and between cells (Andrews, 2001). It is also influenced by the abundance of proteins that chelate Zn2+. In T. caerulescens, three features of Zn homeostasis have been linked to Zn hyperaccumulation. First, T. caerulescens has unusually active mechanisms for Zn uptake and translocation to the shoot compared with T. arvense L., which does not accumulate appreciable quantities of Zn in its shoot (Lasat et al., 1996, 1998; Pence et al., 2000; Whiting et al., 2000; Assunção et al., 2001; White et al., 2002; Piñeros & Kochian, 2003). Translocation of Zn to shoots prevents toxic levels of Zn accumulating in the roots, enabling T. caerulescens to tolerate high concentrations of Zn in the soil. Secondly, the synthesis of histidine and organic acids, such as malate, is implicated in the cellular detoxification and xylem transport of Zn in T. caerulescens, thus helping it to maintain cytoplasmic Zn2+ homeostasis (Salt et al., 1999). Thirdly, Zn accumulates in epidermal cell vacuoles and cell walls of shoot tissues, where it can exist in an ionic form at concentrations > 60 mg Zn g−1 DW (Vázquez et al., 1994; Küpper et al., 1999, 2004; Frey et al., 2000; Cosio et al., 2004; Ma et al., 2005). The compartmentalization of Zn prevents toxic levels of Zn2+ accumulating in the cytoplasm. Zinc appears to be largely absent from within mesophyll and stomatal-complex cells (Frey et al., 2000; Cosio et al., 2005).

A comprehensive transcriptome analysis of Thlaspi species has not yet been reported and no custom microarrays are publicly available to assay these species. Previously, the Affymetrix A. thaliana AG GeneChip® array (representing 8300 genes) was used to demonstrate that genes involved in Zn transport and homeostasis were differentially expressed in shoots and roots of the Zn-tolerant, Zn-hyperaccumulator species Arabidopsis halleri (L.) O’Kane & Al-Shehbaz, compared with the nonhyperaccumulator species A. thaliana (Becher et al., 2004; Weber et al., 2004). The ATH1 GeneChip® array (representing 22 746 genes) has since been used to study A. halleri and Arabidopsis lyrata (L.) O’Kane & Al-Shehbaz ssp. petraea (a nonhyperaccumulator species; H. J. Newbury, University of Birmingham, UK, unpubl. obs.; http://affymetrix.arabidopsis.info/narrays/experimentpage.pl?experimentid=85). Further, the transcriptome of T. caerulescens has recently been profiled using a custom cDNA spotted array representing 1900 expressed sequence tags (ESTs) (Plessl et al., 2005) and a 60-mer oligo microarray (M. G. M. Aarts, Wageningen University, the Netherlands, pers. comm.) whose 40 000 probes were designed to represent full-genome coverage of A. thaliana (Arabidopsis 3 oligo microarray; Agilent Technologies Inc.).

This study provides a comprehensive comparison of the transcriptional profiles of T. caerulescens and T. arvense shoots. Thlaspi caerulescens (genus Noccaea) and T. arvense (genus Thlaspi sensu stricto) sequences segregate into distinct monophyletic clades within the genus Thlaspi sensu lato (Koch & Mummenhoff, 2001). However, these species are morphologically similar and are routinely used in comparative physiological and molecular studies on metal hyperaccumulators (Lasat et al., 1996; Pence et al., 2000; Assunção et al., 2001; Piñeros & Kochian, 2003). In this study, a gDNA-based probe-selection strategy was used to profile and compare the transcriptomes of T. caerulescens and T. arvense. This strategy was shown to be robust and valid, based on in silico alignments of ATH1 GeneChip® array probes with T. caerulescens and T. arvense gene sequences, and quantitative real-time PCR. Many transcripts encoding proteins with putative roles in cellular Zn homeostasis were differentially expressed in the shoots of T. caerulescens compared with T. arvense, including several genes previously implicated in Zn transport and Zn compartmentalization.

Materials and Methods

Plant material and growth conditions

Seeds of T. caerulescens J & C Presl. (‘Viviez’ population, France; Reeves et al., 2001) and T. arvense L. (collected by A. J. M. Baker in Toronto, Canada) were washed in 70% [volume/volume (v/v)] ethanol, rinsed in deionized water and surface-sterilized using NaOCl (1% active chlorine). Seeds were imbibed for 4 d in sterile deionized water at 4°C to break dormancy. Imbibed seeds were sown into unvented, polycarbonate culture boxes (Sigma-Aldrich, Dorset, UK), on 75 ml 0.8% (w/v) agar containing 1% (w/v) sucrose and a basal salt mix (Murashige & Skoog, 1962). Nutrient agar made using the Murashige & Skoog (MS) basal salt mix has a Zn concentration of 0.03 mM. Boxes were placed in a growth room under a 16-h photoperiod at a constant temperature of 24°C. Illumination was provided by a bank of 100-W cool fluorescent tubes (Type ‘84’, Philips, Eindhoven, the Netherlands), giving a photon flux density between 400 and 700 nm [i.e. photosynthetically active radiation (PAR)] of 50–80 µmol photons m−2 s−1 at plant height. To enable a comparison of gene expression under different growth conditions, unsterilized seeds were also sown into pots containing a mix of 25% sand and 75% (v/v) compost (Shamrock medium-grade sphagnum peat; Scotts UK, Bromford, Suffolk, UK). This mix had a Zn content of 0.2 mg l−1 [± 0.04 standard error of the mean (SEM); n = 3]. Plants were grown under a 16-h photoperiod at a constant temperature of 22°C. Plant shoots were harvested 64 d after sowing. Three independent samples of shoot tissue were obtained from T. caerulescens and T. arvense plants grown on agar, and two samples of shoot tissue were obtained from T. caerulescens and T. arvense plants grown on compost. Shoot material was bulked from between three and eight plants per sample. Shoot tissue was placed inside sterile screw-cap Eppendorf tubes and snap-frozen in liquid N2. Samples were stored at −70°C before total RNA extraction.

RNA extraction and hybridization to A. thaliana ATH1 GeneChip® arrays

RNA was extracted from T. caerulescens and T. arvense, using methods described by Hammond et al. (2003). To each sample, 1 ml of TRIzol reagent was added, and total RNA was subsequently extracted according to the manufacturer's instructions (Invitrogen, Paisley, UK), with the following modifications: (i) after homogenization with the TRIzol reagent, the samples were centrifuged to remove any remaining plant material and the supernatant was then transferred to a clean Eppendorf tube, and (ii) to aid precipitation of RNA from the aqueous phase, 0.25 ml of isopropanol and 0.25 ml of 1.2 M NaCl solution containing 0.8 M sodium citrate were added. This procedure precipitated the RNA whilst maintaining the proteoglycans and polysaccharides in a soluble form. Extracted total RNA was then purified using the ‘RNA Cleanup’ protocol for RNeasy columns (Qiagen). RNA yield and purity were determined using an Agilent 2100 Bioanalyser (Agilent Technologies Inc.). Approximately 5 µg of total RNA was reverse-transcribed at 42°C for 1 h to generate first-strand cDNA using 100 pmol oligo dT(24) primer containing a 5′-T7 RNA polymerase promoter sequence, 50 mM Tris-HCl (pH 8.3), 75 mM KCl, 3 mM MgCl2, 10 mM dithiothreitol (DTT), 10 mM dNTPs and 200 units of SuperScript II reverse transcriptase (Invitrogen). Following first-strand synthesis, second-strand cDNA was synthesized using 10 units of Escherichia coli polymerase I, 10 units of E. coli DNA ligase and 2 units of RNase H in a reaction containing 25 mM Tris-HCl (pH 7.5), 100 mM KCl, 5 mM MgCl2, 10 mM (NH4)2SO4, 0.15 mM β-NAD+ and 10 mM dNTPs. The second-strand synthesis reaction proceeded at 16°C for 2 h before 10 units of T4 DNA polymerase was added and the reaction allowed to proceed for a further 5 min. The reaction was terminated by adding 0.5 M EDTA. Double-stranded cDNA products were purified using the GeneChip® Sample Cleanup Module (Affymetrix).
The synthesized cDNAs were in vitro transcribed by T7 RNA polymerase (Enzo BioArray High Yield RNA Transcript Labelling Kit; Enzo Life Sciences Inc., Farmingdale, NY, USA) using biotinylated nucleotides to generate biotinylated complementary RNAs (cRNAs). The cRNAs were purified using the GeneChip® Sample Cleanup Module (Affymetrix). The cRNAs were then randomly fragmented at 94°C for 35 min in a buffer containing 40 mM Tris-acetate (pH 8.1), 100 mM potassium acetate, and 30 mM magnesium acetate to generate molecules of approx. 35–200 bp. Arabidopsis thaliana ATH1 GeneChip® arrays (Affymetrix) were hybridized with 15 µg of fragmented labelled cRNA for 16 h at 45°C as described in the Affymetrix Technical Analysis Manual. GeneChip® arrays were stained with streptavidin-phycoerythrin solution and scanned with a G2500A GeneArray Scanner (Affymetrix). Following scanning, nonscaled RNA signal intensity (CEL) files were generated using Microarray Analysis Suite (MAS version 5.0; Affymetrix). Nonscaled RNA CEL files contain the raw signal intensity values for each probe on the array, generated from the scanned image of the GeneChip® array. Therefore, each RNA CEL file contains signal intensity values for the 11 PM probes and 11 MM probes within each probe set, giving more than 500 000 signal intensity values for the ATH1 GeneChip® array. RNA CEL files were not scaled before further analysis using the Robust Multichip Average (RMA) preprocessor in GeneSpring (Agilent Technologies Inc.). [Note: When the RNA CEL files were scaled to a target intensity of 100, the average scale factor was 1.36 (± 0.07 SEM).] All data have been made publicly available at NASCArray http://arabidopsis.info and http://affymetrix.arabidopsis.info/xspecies.

Genomic DNA-based probe selection

To enable a robust comparison of the T. caerulescens and T. arvense transcriptomes, a gDNA-based probe-selection strategy was used. First, gDNA was extracted from T. caerulescens and T. arvense using the method of Thomas et al. (1994), which was adapted to use a pestle and mortar to disrupt plant material under liquid nitrogen. Two phenol:chloroform (1 : 1, v/v) extractions were performed. Genomic DNA was labelled using the Bioprime DNA labelling System (Invitrogen) and subsequently hybridized to ATH1 GeneChip® arrays for 16 h at 45°C, using standard Affymetrix hybridization protocols, followed by the Affymetrix eukaryotic wash protocol that included antibody staining. ATH1 GeneChip® arrays were hybridized with 0.5 µg of labelled gDNA. Subsequently, the ATH1 GeneChip® arrays were scanned on a G2500A GeneArray scanner and a gDNA CEL file was generated using MAS 5.0 (Affymetrix). Two gDNA CEL files were generated that contained the nonscaled gDNA hybridization intensities for ATH1 GeneChip® array probes when challenged with either T. caerulescens or T. arvense gDNA fragments. Only one gDNA hybridization was performed for each species, because replicate gDNA hybridizations all challenge the GeneChip® arrays with the whole genome and would therefore give the same relative positions of hybridization intensities for each gene. When the data were scaled to a target intensity of 100, the scale factors were 1.92 and 2.50 for arrays challenged with T. caerulescens and T. arvense gDNA, respectively. Nonscaled genomic DNA CEL files are available at http://affymetrix.arabidopsis.info/xspecies.

Probe pairs from the nonscaled gDNA CEL files were selected for subsequent Thlaspi transcriptome analysis using a parser written in Perl (http://www.perl.com) to generate custom probe mask files. The Perl script selects probe pairs in which the PM probe has a gDNA hybridization intensity greater than a user-defined threshold. This script then identifies probe sets retaining one or more PM probes with a gDNA hybridization intensity greater than the imposed threshold. Consequently, 25 bp of homologous probe sequence of A. thaliana was the minimum theoretical requirement for transcriptome analysis of T. caerulescens and T. arvense. Selected probe sets, now defined by between one and 11 probe pairs, are collated in a custom Chip Description File (CDF). This CDF file provides the template for the generation of a signal for the probe set when analysing the Thlaspi transcriptome (i.e. the RNA CEL files). The custom CDF files thus allow information to be extracted from the RNA CEL files for only those probe pairs whose PM probe has a gDNA hybridization intensity above the imposed threshold. Note that both PM and MM probes are retained in the CDF file.
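The selection rule described above can be sketched in a few lines. This is an illustrative Python sketch, not the authors' Perl script (Xspecies): it does not read CEL or CDF files, and the function name `build_probe_mask` and the dictionary layout are assumptions made for the example. The intensities are the T. caerulescens gDNA values listed for ZNT1 and MTP1 in Table 1.

```python
def build_probe_mask(gdna_pm, threshold):
    """Map each retained probe-set ID to the indices of its retained probe pairs.

    gdna_pm: {probe_set_id: [gDNA intensity of each PM probe]} (assumed layout).
    A probe pair is retained when its PM gDNA intensity exceeds the threshold;
    a probe set is retained when at least one probe pair survives.
    """
    mask = {}
    for probe_set, intensities in gdna_pm.items():
        kept = [i for i, value in enumerate(intensities) if value > threshold]
        if kept:
            mask[probe_set] = kept
    return mask


# T. caerulescens gDNA signals for two probe sets, taken from Table 1.
tc_gdna = {
    "260462_at": [440.8, 174.3, 480.0, 258.5, 180.0, 365.5,
                  178.0, 157.3, 282.3, 253.5, 86.8],   # ZNT1
    "266718_at": [268.8, 78.3, 223.8, 140.0, 310.8, 104.3,
                  327.0, 355.0, 90.3, 105.0, 158.0],   # MTP1
}

mask_300 = build_probe_mask(tc_gdna, 300)  # both probe sets keep three probe pairs
mask_400 = build_probe_mask(tc_gdna, 400)  # MTP1 has no PM probe above 400
```

Raising the threshold therefore trades probe-set coverage for hybridization confidence: at a threshold of 400 the MTP1 probe set would be dropped from the T. caerulescens mask altogether.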

The Perl script was written to enable user-specified gDNA hybridization intensity thresholds for probe mask file generation to be set. Thus, the optimal gDNA hybridization intensity threshold for interpreting transcriptomic data could be determined systematically and empirically (see Results). Twelve probe mask files (CDF files) were subsequently generated for each species (24 in total) using gDNA hybridization intensity thresholds from 50 to 1000 (50, 100, 150, 200, 300 … , 1000). The ATH1 GeneChip® array CDF file represents a universal CDF file which can be used for both species, but with no probe pairs excluded. The Perl script, Xspecies version 1.1, is available at http://affymetrix.arabidopsis.info/xspecies.

Interpretation of Thlaspi transcriptome data using gDNA-based probe selection

The 10 nonscaled RNA CEL files, representing replicates of both species grown under both conditions, were loaded into GeneSpring analysis software (GeneSpring 7.2; Silicon Genetics, CA, USA) using the Robust Multichip Average (RMA) prenormalization algorithm (Irizarry et al., 2003). To establish the optimal gDNA hybridization threshold for interpreting transcriptional data, the nonscaled RNA CEL files were first loaded into GeneSpring and prenormalized as a single experimental group using the A. thaliana CDF file (i.e. with no probe selection). These 10 RNA CEL files were then reanalysed within GeneSpring using the RMA prenormalization algorithm, but now using each of the 24 different probe mask (CDF) files in turn. Note that, at each gDNA hybridization threshold, all 10 RNA CEL files from the transcriptional analysis were prenormalized together. This procedure thus generated 25 sets of data, each containing 10 grouped RNA CEL files, within GeneSpring (one set with no probe selection, 12 with probe selection based on T. caerulescens gDNA hybridization and 12 with probe selection based on T. arvense gDNA hybridization). Subsequently, at each of the given gDNA hybridization thresholds, the T. caerulescens RNA CEL files prenormalized with a T. caerulescens probe mask file (CDF) were combined with the T. arvense RNA CEL files prenormalized with a T. arvense probe mask file (CDF). This procedure thereby reduced the 25 sets of transcriptional data to 13, each one representing a different gDNA hybridization intensity threshold. Combining the T. caerulescens and T. arvense data sets in this way ensured that only common probe sets were used in all transcriptome comparisons. As these probe sets are comprised of different PM probes for T. caerulescens and T. arvense, and as there are large differences in signal values generated across a probe set (Table 1), the different thermodynamic properties of individual probes may bias the analysis. 
However, this potential bias is mitigated by the RMA prenormalization algorithm (Irizarry et al., 2003), and the strategy of maximizing PM probe retention minimizes any loss of data quality and quantity.
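The bookkeeping in this procedure can be checked with a short sketch. It is a hedged illustration only: the helper `common_probe_sets` and the two small masks are hypothetical stand-ins for the species-specific CDF files, and the actual analysis was carried out within GeneSpring.

```python
# The 12 gDNA hybridization intensity thresholds used for probe selection.
thresholds = [50, 100, 150, 200] + list(range(300, 1001, 100))

# One unmasked data set plus one masked data set per species per threshold.
n_sets_before = 1 + 2 * len(thresholds)
# After pairing the two species at each threshold.
n_sets_after = 1 + len(thresholds)


def common_probe_sets(mask_tc, mask_ta):
    """Probe sets retained in both species-specific masks (hypothetical helper)."""
    return sorted(set(mask_tc) & set(mask_ta))


# Illustrative masks: probe-set ID -> retained probe-pair indices.
mask_tc = {"260462_at": [0, 2, 5], "266718_at": [4, 6, 7]}
mask_ta = {"260462_at": [0, 2, 4, 5, 8, 9], "259161_at": [0, 3, 6]}

shared = common_probe_sets(mask_tc, mask_ta)
# Only "260462_at" is shared, although it is defined by different PM probes
# in the two species.
```

The shared probe set in this example is represented by three probe pairs in one mask and six in the other, which is the situation the paragraph above refers to when it notes that common probe sets may comprise different PM probes.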

Table 1.  Cross-validation of genomic DNA (gDNA)-based probe selection for interpretation of transcriptional profiles of Thlaspi caerulescens and Thlaspi arvense obtained using the ATH1 GeneChip® array. Thlaspi caerulescens and T. arvense were grown on agar and compost under zinc (Zn)-replete conditions

Gene      Probe-set ID   Corresponding A. thaliana gene (a)   % similarity between A. thaliana and Thlaspi coding sequence (b)
                            Perfect-match probes for each probe set
                                   1       2       3       4       5       6       7       8       9      10      11

ZNT1      260462_at      At1g10970                            82.7
  % sequence similarity (c)       84      96     100      96      92      84      96     100      92      92      92
  Tc gDNA signal (d)           440.8   174.3   480.0   258.5   180.0   365.5   178.0   157.3   282.3   253.5    86.8
  Tc RNA mean signal (e)      1385.0   633.3   883.3   353.6   332.1   173.1  2164.4   658.7   475.6   194.2   359.8
  Ta gDNA signal (f)           458.5   274.3   577.3   233.3   449.5   461.5   230.8   253.0   584.3   737.0   126.3
  Ta RNA mean signal (g)       196.2   115.9   216.3   264.5   250.0   228.4  1180.2   210.9   434.9   181.5   242.9

ZNT2/4    259723_at      At1g60960                            75.3
  % sequence similarity (c)       92      84      92      84      96      92      88      88      88      88      92
  Tc gDNA signal (d)           296.8   203.0   604.3    91.5   699.5   843.5   114.0   295.8   115.3   198.0   287.8
  Tc RNA mean signal (e)      1569.6  1711.6  4228.9   244.1  1684.4  1654.1   441.7   880.3   158.1  2041.9  2342.5
  Ta gDNA signal (f)           338.8   286.3   221.8   102.0   692.3  1115.8   195.3   460.3   149.3   246.8   410.8
  Ta RNA mean signal (g)       287.0   717.8   407.6   251.7   438.6   271.2   405.6   230.3   231.4   200.0   443.6

ZNT5      264574_at      At1g05300                            83.6
  % sequence similarity (c)       72      92      76      80     100      84      80      72      92      92      92
  Tc gDNA signal (d)           623.3   135.0   136.5   153.8   414.0   366.8   113.5   257.0   132.5   146.0   135.8
  Tc RNA mean signal (e)       123.0   204.2   151.6   121.0   700.4   708.6   202.4   141.2   156.6   452.3   487.4
  Ta gDNA signal (f)           742.8   229.8   231.5   352.5   750.5   653.8   118.0   382.8   318.0   272.5   173.3
  Ta RNA mean signal (g)       137.5   311.4   240.3   125.5  2030.6   884.4   193.0   315.0   728.0  1128.5  1833.7

HMA4      267488_at      At2g19110                            71.7
  % sequence similarity (c)       72      76      68      84      64      64      60      64      64      84      76
  Tc gDNA signal (d)           160.5   430.0   608.5  1758.3   213.0   163.5   353.8   194.3   180.0  1600.8   517.8
  Tc RNA mean signal (e)       235.7   608.5   158.5  1163.1   100.9    95.3    94.4   190.9    84.0   293.8   236.0
  Ta gDNA signal (f)           293.8   740.0   874.5  2244.8   253.0   413.3   460.8   301.0   250.3  1648.8   853.3
  Ta RNA mean signal (g)       311.6   586.1   194.9  1128.2    98.9    97.0    86.7   126.3    76.8   288.7   183.7

NR1       259681_at      At1g77760                            84.7
  % sequence similarity (c)       56      56      80      88      84      84      88      92      84      80     100
  Tc gDNA signal (d)           230.3   102.0   233.5   253.5   139.8   266.8   524.8   435.5   621.3   262.3   450.3
  Tc RNA mean signal (e)       159.4   309.1   210.5   416.3   116.4   108.5   198.7   323.4   213.8   103.6  3466.5
  Ta gDNA signal (f)           309.3   112.3   553.3   369.3   297.3   696.8  1488.5   626.8   664.3   483.5   370.0
  Ta RNA mean signal (g)       231.3   236.1   224.6   292.8  1122.8   100.8   250.5   447.5   192.5   117.5  1958.9

CA        259161_at      At3g01500                            89.4
  % sequence similarity (c)       96     100      92      92      92     100      96      92      96      92     100
  Tc gDNA signal (d)           139.8   242.3   133.8   427.3   294.0   348.5   571.3   100.5   114.3   141.8   601.0
  Tc RNA mean signal (e)       381.0  6398.0   342.2  5659.6  1461.6  3636.5  1347.2   168.8  1596.6   202.0  3641.7
  Ta gDNA signal (f)           539.3   128.0   286.3   521.8   384.8   496.8   925.8   110.8   168.0   182.3   948.5
  Ta RNA mean signal (g)     10520.9  1890.4   163.4  2798.7  5587.1  3578.2  2044.3   178.6  2819.3   143.5  1127.4

NAS1      264261_at      At1g09240                            85.0
  % sequence similarity (c)       88      96     100      96      96      96     100     100      96      60      84
  Tc gDNA signal (d)           293.0   311.8   473.0   232.8   250.3   654.5   191.3   692.5    85.5    86.5   362.3
  Tc RNA mean signal (e)       753.7   481.2   453.0   538.7  1418.4  2315.2   764.6  1343.3   313.1   176.7   195.4
  Ta gDNA signal (f)           403.0   513.5   521.5   335.3   255.5   806.3   218.3   486.0    85.3   127.3   402.3
  Ta RNA mean signal (g)       805.3   302.8   147.2   554.7   359.4  3683.2   503.6   348.4   125.8   153.1   223.0

MTP1      266718_at      At2g46800                            77.5
  % sequence similarity (c)       96      88      88      92      80      80      92      92      84      88     100
  Tc gDNA signal (d)           268.8    78.3   223.8   140.0   310.8   104.3   327.0   355.0    90.3   105.0   158.0
  Tc RNA mean signal (e)       157.2   110.7   286.0   196.9  1941.4  1617.7   101.3   100.8   274.0   109.4   339.4
  Ta gDNA signal (f)           814.0   100.3   575.8   245.0   561.3   139.3   580.5   565.0   140.8   169.5   303.0
  Ta RNA mean signal (g)       959.0   125.2   355.8   281.0   177.2   257.8   101.2   363.5    84.9   114.2  3686.0

CHS       250207_at      At5g13930                            84.0
  % sequence similarity (c)       72      88      84      96      88      92      92     100      88      92      96
  Tc gDNA signal (d)           276.8   748.3   186.3   318.5   100.8   266.8   244.5   449.0    85.8   309.8   196.8
  Tc RNA mean signal (e)       222.5   839.8   255.4   512.6   190.2   401.1   134.9   436.0   451.3   681.7   286.1
  Ta gDNA signal (f)           728.0  1290.3   385.0   440.8   146.8   356.3   383.8   324.5   115.0   416.5   346.0
  Ta RNA mean signal (g)       260.1  1268.3   209.8   518.5   177.4   439.6   146.4   601.0   706.8   876.6   536.9

  • Labelled genomic DNA and RNA from both species were hybridized to Affymetrix Arabidopsis thaliana ATH1-121501 GeneChip® arrays. Genomic DNA hybridization intensity values were used to select probe pairs that were suitable for interpretation of transcriptional data from T. caerulescens and T. arvense.
  • a  The T. caerulescens [zinc transporter 1 (ZNT1), AF133267; ZNT2/4, AF275752/AF292370; ZNT5, AF292029; heavy-metal-associated domain-containing protein 4 (HMA4), TCA567384; nitrate reductase 1 (NR1), AY551529; carbonic anhydrase (CA), AY551530; nicotianamine synthase 1 (NAS1), TCA300446] or T. arvense [metal tolerance protein 1 (MTP1), AY483146; chalcone synthase (CHS), AF144535] coding sequences were used to perform a BLAST search against the A. thaliana genome to identify the homologous sequence.
  • b  A. thaliana sequences were aligned with the respective T. caerulescens or T. arvense sequences using the ClustalW algorithm within Vector NTI to obtain % similarities.
  • c  Perfect-match oligo sequences from probe sets were imported into Vector NTI and aligned with gene sequences using the ‘Motif’ analysis function to obtain % similarities.
  • d  Nonscaled raw hybridization intensity value from hybridization of T. caerulescens genomic DNA onto the ATH1 GeneChip® arrays.
  • e  Mean nonscaled hybridization intensity value from hybridization of T. caerulescens RNA onto the ATH1 GeneChip® arrays (n = 5).
  • f  Nonscaled raw hybridization intensity value from hybridization of T. arvense genomic DNA onto the ATH1 GeneChip® arrays.
  • g  Mean nonscaled hybridization intensity value from hybridization of T. arvense RNA.
  • Ta, T. arvense; Tc, T. caerulescens.

Within each of the 13 transcriptome data sets interpreted at a different gDNA hybridization intensity threshold, in which each data set contained 10 RNA CEL files, per-gene normalizations were applied to the probe-set signal values as follows. For each biological replicate and growth condition, probe-set signal values from T. caerulescens and T. arvense were standardized to the probe-set signal value of T. arvense (i.e. T. arvense samples were considered to be the control), by dividing RMA prenormalized probe-set signal values from T. caerulescens by the RMA prenormalized probe-set signal values from T. arvense (Supplementary Table 2). Putative genes (i.e. probe-sets) with differential hybridization intensities between T. caerulescens and T. arvense were identified using a two-step process: (i) genes that were 2-fold up- or down-regulated were selected, and (ii) a Welch's t-test was performed to identify genes that were differentially expressed between T. caerulescens and T. arvense using the Benjamini-Hochberg false discovery rate (FDR) multiple testing correction at 0.05 or 0.005.
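The two-step selection can be sketched as follows. This is a minimal illustration under stated assumptions rather than the GeneSpring procedure itself: the function names are hypothetical, the Welch's t-test P-values are taken as precomputed inputs (they could be obtained, for example, with `scipy.stats.ttest_ind(..., equal_var=False)`), the gene names, ratios and P-values are invented, and applying the Benjamini-Hochberg correction only to the genes that pass the fold-change filter is an assumption about the order of operations.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_differential(genes, fold=2.0, fdr=0.05):
    """genes: list of (name, Tc/Ta ratio, Welch P-value) tuples."""
    # Step (i): keep genes at least `fold`-fold up- or down-regulated.
    step1 = [g for g in genes if g[1] >= fold or g[1] <= 1.0 / fold]
    # Step (ii): control the false discovery rate among the retained genes.
    adjusted = benjamini_hochberg([g[2] for g in step1])
    return [g[0] for g, q in zip(step1, adjusted) if q <= fdr]


# Hypothetical genes: (name, Tc/Ta ratio, Welch's t-test P-value).
genes = [("g1", 4.0, 0.001), ("g2", 1.2, 0.0001), ("g3", 0.4, 0.02),
         ("g4", 2.5, 0.2), ("g5", 0.25, 0.004)]

selected_005 = select_differential(genes, fdr=0.05)
selected_0005 = select_differential(genes, fdr=0.005)
```

In this example `g2` is excluded at step (i) despite its small P-value because its ratio is below 2-fold, and tightening the FDR from 0.05 to 0.005 shortens the list further.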

Table 2.  Cross-validation of genomic DNA (gDNA)-based probe selection for interpretation of transcriptional profiles of Thlaspi caerulescens and Thlaspi arvense obtained using the ATH1 GeneChip® array
Column headings:
  Gene
  ATH1-121501 probe-set ID
  Corresponding A. thaliana gene (a)
  Normalized probe-set value (b)
    No probe signal cut-off: Tc/Ta ratio; P-value
    Probe signal cut-off = 300: Tc/Ta ratio; P-value
  Quantitative PCR analysis (c)
    External [Zn] 3 µM: Tc/Ta ratio; SEM
    External [Zn] 30 µM: Tc/Ta ratio; SEM
  Amplification efficiencies for quantitative PCR analysis (d)
    18S rRNA; Gene
  • Thlaspi caerulescens and T. arvense were grown on agar and compost under zinc (Zn)-replete conditions. Labelled genomic DNA and RNA from both species were hybridized to Affymetrix Arabidopsis thaliana ATH1-121501 GeneChip® arrays. Genomic DNA hybridization intensity values were used to select probe pairs that were suitable for interpretation of transcriptional data from T. caerulescens and T. arvense.

  • a

    The T. caerulescens [zinc transporter 1 (ZNT1), AF133267; ZNT2/4, AF275752/AF292370; ZNT5, AF292029; heavy-metal-associated domain-containing protein 4 (HMA4), TCA567384; nitrate reductase 1 (NR1), AY551529; carbonic anhydrase (CA), AY551530; nicotianamine synthase 1 (NAS1), TCA300446] or T. arvense [metal tolerance protein 1 (MTP1), AY483146; chalcone synthase (CHS), AF144535] coding sequence was used to perform a BLAST search against the A. thaliana genome to identify the homologous sequences and design primers for quantitative PCR.

  • b

    Nonscaled raw RNA hybridization intensity values were robust multichip average (RMA) normalized using either the ATH1-121501 GeneChip® array Chip Description File (CDF) (no gDNA-based probe selection) or a custom CDF in which probe pairs were selected with T. caerulescens or T. arvense gDNA hybridization intensity values greater than 300. RMA normalized probe-set signal values were then standardized by dividing probe-set signal values from T. caerulescens by the probe-set signal values from T. arvense. Values presented are the mean of five replicates, three with plants grown on agar and two with plants grown on compost. P-values are from a Welch's t-test using the Benjamini and Hochberg false discovery rate multiple testing correction.

  • c

    Quantitative PCR for these genes was performed on samples taken from T. caerulescens and T. arvense grown at 3 and 30 µM Zn. Transcript abundances in T. caerulescens were normalized to 18S rRNA abundance and then to the transcript abundance in T. arvense.

  • d

    Amplification efficiencies (Ex) were calculated from the standard curves derived for the endogenous control gene (18S rRNA) and the gene of interest using the equation $E_x = 10^{-1/b} - 1$, where b is the slope of the standard curve.

  • Ta, T. arvense; Tc, T. caerulescens; SEM, standard error of the mean.

ZNT1 | 260462_at | At1g10970 | 2.55 | 0.0159 | 4.42 | 0.0045 | 10.22 | 3.50 | 1.07 | 0.76 | 1.04 | 1.08
ZNT2/4 | 259723_at | At1g60960 | 6.49 | 0.0012 | 12.52 | 0.0007 | 52.63 | 11.43 | 43.72 | 13.35 | 1.04 | 1.03
ZNT5 | 264574_at | At1g05300 | 0.36 | 0.0043 | 2.41 | 0.0095 | 4.18 | 1.27 | 0.70 | 0.36 | 1.07 | 1.14
HMA4 | 267488_at | At2g19110 | 1.03 | 0.1360 | 2.19 | < 0.0001 | 40.50 | 26.02 | 88.02 | 41.61 | 1.07 | 1.12
NR1 | 259681_at | At1g77760 | 0.87 | 0.2230 | 1.75 | 0.0045 | 28.00 | 8.87 | 42.75 | 12.66 | 1.04 | 1.08
CA | 259161_at | At3g01500 | 0.99 | 0.8890 | 1.11 | 0.3840 | 1.55 | 0.59 | 0.30 | 0.14 | 1.04 | 1.01
NAS1 | 264261_at | At1g09240 | 1.39 | 0.0127 | 0.96 | 0.6890 | 3.06 | 2.40 | 6.05 | 1.31 | 1.04 | 1.09
MTP1 | 266718_at | At2g46800 | 0.94 | 0.0325 | 0.24 | < 0.0001 | 0.73 | 0.22 | 0.97 | 0.69 | 1.04 | 1.08
CHS | 250207_at | At5g13930 | 0.92 | 0.2420 | 1.75 | 0.0018 | 0.68 | 0.24 | 0.07 | 0.05 | 1.04 | 1.06

In silico alignment of A. thaliana sequences and PM probes with Thlaspi gene sequences

Coding sequences for seven T. caerulescens [heavy-metal-associated domain-containing protein 4 (HMA4), TCA567384; nitrate reductase 1 (NR1), AY551529; zinc transporter 1 (ZNT1), AF133267; ZNT2/4, AF275752/AF292370; ZNT5, AF292029; carbonic anhydrase (CA), AY551530; nicotianamine synthase 1 (NAS1), TCA300446] and two T. arvense [chalcone synthase (CHS), AF144535; metal tolerance protein 1 (MTP1), AY483146] genes were downloaded from the GenBank database (http://www.ncbi.nlm.nih.gov/entrez, 25/07/2004; Benson et al., 2004). Corresponding A. thaliana sequences were identified using the BLAST algorithm against the GenBank database to identify the closest match to the Thlaspi gene sequence (Altschul et al., 1997; http://www.ncbi.nlm.nih.gov/BLAST). All sequences were loaded into Vector NTI for further analysis (version 9.0.0; Invitrogen). A. thaliana sequences were aligned with Thlaspi sequences using the ClustalW algorithm within Vector NTI. Perfect-match oligo sequences from probe sets assigned to the A. thaliana sequences identified by the BLAST algorithm were downloaded from Affymetrix (http://www.affymetrix.com). Perfect-match sequences were imported into Vector NTI and aligned with gene sequences using the ‘Motif’ analysis function.
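
The percentage similarity between a 25-mer PM probe and a Thlaspi coding sequence can be illustrated with a simple ungapped sliding-window comparison. The sketch below is not the Vector NTI ‘Motif’ function, whose internals are not described here; the function name and both sequences are hypothetical, and gaps are deliberately ignored.

```python
def best_identity(probe, target):
    """Return (percentage identity, start position) of the best ungapped
    match of `probe` against every window of `target`."""
    probe, target = probe.upper(), target.upper()
    n = len(probe)
    if len(target) < n:
        raise ValueError("target is shorter than the probe")
    best_pct, best_pos = -1.0, -1
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        matches = sum(a == b for a, b in zip(probe, window))
        pct = 100.0 * matches / n
        if pct > best_pct:  # keep the first window with the highest score
            best_pct, best_pos = pct, start
    return best_pct, best_pos


# Hypothetical 25-mer probe and a target carrying two mismatches to it.
probe = "ATGGCTTCTTCACCGTTGGAAGCTC"
target = "GGGTT" + "ATAGCTTCTTCACCGTTGGCAGCTC" + "AACCC"
pct, pos = best_identity(probe, target)
print(pct, pos)
```

Because each probe is 25 nucleotides long, every mismatch lowers the identity by 4 percentage points, which is why the similarities reported for individual probes are multiples of 4.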

Quantitative real-time PCR (qPCR)

To verify GeneChip® array expression data, qPCR was performed on T. caerulescens and T. arvense genes for which coding sequences were available in GenBank (Supplementary Table 1). Because of limited seed availability, different Thlaspi populations were used for the qPCR confirmation. Seeds of T. caerulescens (‘Ganges’ population, France) and T. arvense (collected from Wharf Ground field, Wellesbourne, Warwickshire, UK) were sterilized, imbibed and sown in agar as described previously, using a 10% basal salt formulation to reduce the ambient external Zn concentration ([Zn]ext). The agar in each box was amended with ZnSO4 to provide [Zn]ext at 3, 30, 60, 150, 300, 600, 1200 and 1800 µM. To enable accurate determination of tissue Zn concentrations, 65Zn was added to the agar at an activity concentration of 2.1 kBq 65Zn µmol−1 Zn. Boxes were placed in the growth room as described previously in this paper. Plant shoots were harvested 42 d after sowing and samples were split. For qPCR, subsamples were placed inside sterile screw-cap Eppendorf tubes and snap-frozen in liquid N2. Samples were stored at −70°C before total RNA extraction. For tissue Zn concentration analysis, subsamples were weighed fresh and dry and 65Zn γ-emissions were counted for 600 s per sample on an automatic well-type gamma counter (Wallac 1480 Wizard; Perkin-Elmer Life Sciences, Turku, Finland).

Total RNA was extracted from whole plants as described previously in this paper. Residual chromosomal DNA was removed using the on-column DNase digestion protocol during the purification of the RNA with the RNeasy columns (Qiagen). Reverse transcription was performed on 1 µg of total RNA from T. caerulescens and T. arvense using the ThermoScript RT-PCR system (Invitrogen). The cDNA synthesis reaction was carried out using random hexamers (50 ng µl−1). Primers for quantitative PCR were designed to the cDNA sequence of each of the nine genes (Supplementary Table 1) and an 18S rRNA control gene using the Primer3 primer design tool (Rozen & Skaletsky, 2000; http://frodo.wi.mit.edu/cgi-bin/primer3/primer3.cgi). Initial primer sequences were checked for secondary structures using the PrimerSelect programme (DNASTAR Inc., Madison, WI, USA) and were subjected to BLAST searches for short, nearly exact sequences against the GenBank database (Altschul et al., 1997; http://www.ncbi.nlm.nih.gov/BLAST). This search was optimized for short sequences (between 7 and 20 bp), as short sequences will often not find a significant match using standard BLAST settings, to ensure primer sequences were specific to the target Thlaspi cDNA. The BLAST search was optimized by decreasing the word size to seven, and increasing the Expect value to 1000 (i.e. lowering the stringency of the search). Sequences forming internal loops or dimers, or with homology to other gene sequences, were discarded and new primers designed.

The expression of nine genes was successfully quantified in the shoots of both T. caerulescens and T. arvense using the ‘standard curve’ method for mRNA quantification with normalization to the endogenous control gene, 18S rRNA (Wong & Medrano, 2005; Supplementary Table 1; T. caerulescens: HMA4, TCA567384; NR1, AY551529; ZNT2/4, AF275752/AF292370; ZNT1, AF133267; ZNT5, AF292029; CA, AY551530; NAS1, TCA300446; T. arvense: CHS, AF144535; MTP1, AY483146). The expression of all nine genes was quantified in samples from plants grown at 3 and 30 µM [Zn]ext, whilst the expression of HMA4, ZNT2/4 and NR1 was also quantified in T. caerulescens at all eight levels of [Zn]ext. ZNT2 and ZNT4 are likely to be alleles of the same gene, as there is very little sequence difference and they were isolated from different accessions. Indeed, they are represented by the same probe set on the ATH1 GeneChip® array (Table 1).

Quantification of transcript levels for each gene was achieved by qPCR using SYBR® Green fluorescent dye (Applied Biosystems, Warrington, UK) and an ABI Prism 7900 HT sequence detection system (Applied Biosystems). Technical and biological triplication was used. Reactions were conducted using 15 µl per reaction in 384-well plates consisting of 2 ng of cDNA sample, 1 µM 5′ and 3′ primer and 7.5 µl of 2 × SYBR® Green PCR master mix (Applied Biosystems). The reaction conditions were 50°C (2 min) and 95°C (10 min) for one cycle, and 95°C (15 s) and 60°C (1 min) for 40 cycles. A dissociation step of 95°C for 15 s, 60°C for 15 s and 95°C for 15 s was used for melting curve analyses so that primer dimers and nonspecific products could be detected. Control reactions (see Supplementary Table 6; NTC, no template controls) using either 2 ng of total RNA or RNase-free H2O were used to test for gDNA contamination in each sample.

A standard curve for each gene was used to enable quantification of unknown samples. A dilution series of 20, 2, 0.2 and 0.02 ng of reverse-transcribed T. caerulescens total RNA (cDNA) µl−1 was used for the standard curves of HMA4, NR1, ZNT2/4, ZNT1, ZNT5, CA and NAS1, and a dilution series of 20, 2, 0.2 and 0.02 ng of reverse-transcribed T. arvense total RNA (cDNA) µl−1 was used for the standard curves of CHS and MTP1. The cycle threshold (Ct) and normalized fluorescence (ΔRn, normalized against the 6-carboxy-X-rhodamine (ROX) passive reference and background fluorescence from the baseline) values were determined using the ABI Prism sequence detector software (version 2.1). The standard curves for each gene [transcript quantity (ng) vs Ct value], based on the dilution series above, were first drawn, and then initial transcript quantities for unknown samples were derived from their respective standard curves based on their observed Ct value. The Ct values were determined using a threshold value of 0.2. The amplification efficiencies (Ex) of the reactions were determined from the slope of the standard curve using the following equation:

$$E_x = 10^{-1/b} - 1 \qquad \text{(Eqn 1)}$$

(b, the slope of the standard curve.)

This is derived from the following equation:

$$X_n = X_0\,(1 + E_x)^n \qquad \text{(Eqn 2)}$$

[Xn, the number of target molecules at cycle n; X0, the initial number of target molecules; n, the number of cycles (Lekanne Deprez et al., 2002).]

Theoretically, amplification efficiencies calculated using Eqn 1 should be equal to 1 (1 represents a 100% efficient reaction). The standard curve method does, however, slightly overestimate the amplification efficiencies. Thus, average amplification efficiencies for the 18S rRNA gene were 1.04 (n = 10). Amplification efficiencies for each target gene standard curve and the 18S rRNA standard curve to which they were normalized are presented in Table 2.
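
As a worked illustration of the standard curve method, the sketch below fits Ct against log10(quantity) by ordinary least squares, converts the slope to an amplification efficiency with Eqn 1, and reads an unknown quantity back from its Ct value. The function names and the Ct values are hypothetical; the study itself used the ABI Prism software for this step, and only the dilution series (20, 2, 0.2 and 0.02 ng) is taken from the text.

```python
import math


def fit_standard_curve(quantities_ng, ct_values):
    """Least-squares fit of Ct = a + b * log10(quantity); returns (a, b)."""
    xs = [math.log10(q) for q in quantities_ng]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def amplification_efficiency(slope):
    """Eqn 1: E_x = 10^(-1/b) - 1, where b is the standard curve slope."""
    return 10 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct, intercept, slope):
    """Invert the standard curve to estimate the starting quantity (ng)."""
    return 10 ** ((ct - intercept) / slope)


# Dilution series from the text; the Ct values are invented so that the
# template exactly doubles each cycle (slope = -1 / log10(2)).
dilutions = [20.0, 2.0, 0.2, 0.02]
ideal_slope = -1.0 / math.log10(2.0)
cts = [22.0 + ideal_slope * math.log10(q) for q in dilutions]

intercept, slope = fit_standard_curve(dilutions, cts)
efficiency = amplification_efficiency(slope)
unknown = quantity_from_ct(22.0, intercept, slope)
print(round(slope, 3), round(efficiency, 3), round(unknown, 3))
```

A slope shallower than about −3.32 yields an efficiency above 1, which is how values such as 1.04 arise from real standard curves.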

Transcript levels of each gene were normalized to 18S rRNA by dividing the mean quantity from the technical and biological replicates for the gene of interest by the mean quantity from the technical and biological replicates for 18S rRNA. The expression of all genes was measured relative to the expression of the gene in T. arvense grown at 3 µM Zn (calibrator sample). Transcript quantities from T. caerulescens grown at 3 µM Zn and T. arvense grown at 30 µM Zn were divided by transcript quantities from T. arvense grown at 3 µM Zn. Transcript quantities from T. arvense grown at 3 µM Zn were divided by themselves to give a relative value of 1. Relative transcript quantities for T. caerulescens grown at 30 µM Zn were calculated as the product of the relative transcript abundance for T. caerulescens grown at 3 µM Zn and the transcript abundance for T. caerulescens grown at 30 µM Zn divided by the transcript abundance for T. caerulescens grown at 3 µM Zn. Raw data for all qPCR experiments are available in Supplementary Table 6.
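
The normalization chain can be followed with a small numerical example. All quantities below are invented, the function name is hypothetical, and the last step assumes that the value for T. caerulescens at 30 µM Zn is scaled through the ratio of its abundance at 30 µM to its abundance at 3 µM, which makes it equivalent to dividing directly by the calibrator.

```python
from statistics import mean


def normalized_quantity(gene_qty, rrna_qty):
    """Mean target-gene quantity divided by mean 18S rRNA quantity."""
    return mean(gene_qty) / mean(rrna_qty)


# Invented standard-curve quantities (ng) for three replicates per sample.
ta_3 = normalized_quantity([1.0, 1.2, 0.8], [10.0, 10.0, 10.0])   # calibrator
tc_3 = normalized_quantity([5.0, 5.5, 4.5], [10.0, 10.0, 10.0])
tc_30 = normalized_quantity([2.4, 2.5, 2.6], [10.0, 10.0, 10.0])

rel_ta_3 = ta_3 / ta_3                 # calibrator divided by itself
rel_tc_3 = tc_3 / ta_3                 # Tc at 3 uM relative to Ta at 3 uM
rel_tc_30 = rel_tc_3 * (tc_30 / tc_3)  # assumed chaining through Tc at 3 uM
print(rel_ta_3, round(rel_tc_3, 3), round(rel_tc_30, 3))
```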

Results

The aim of this study was to develop a robust method to compare the transcriptomes of two nonmodel plant species, T. caerulescens and T. arvense, using the commercially available ATH1 GeneChip® array. A gDNA-based probe-selection strategy based on the hybridization efficiency of T. caerulescens and T. arvense gDNA with corresponding A. thaliana probes was adopted. The validity of this approach was tested using in silico alignments of GeneChip® probes with Thlaspi gene sequences, and qPCR. This validation focused on nine genes, seven from T. caerulescens and two from T. arvense, which were chosen according to the availability of sequence information (Tables 1 and 2).

A. thaliana and Thlaspi sequence comparisons suggest gDNA-based probe selection is feasible

Sequences for the nine T. caerulescens and T. arvense genes chosen for validation of the gDNA-based probe-selection strategy were used to identify homologous A. thaliana sequences. Homologous Thlaspi and A. thaliana gene sequences were aligned to determine the similarity at the sequence level between the species. The average similarity between the coding regions of the genes was 81.5% (Table 1). The gene encoding a carbonic anhydrase showed the greatest similarity between A. thaliana and T. caerulescens, with 89.4% of the nucleotides in the coding region being identical. Perfect-match probe sequences for probe sets representing A. thaliana genes homologous to the nine Thlaspi sequences were downloaded from Affymetrix. As expected, PM oligo sequences were 100% identical to their corresponding A. thaliana gene sequence. When the PM oligo sequences were aligned with the Thlaspi gene sequences, an average of 87% of nucleotides were identical, equivalent to approx. 240 identical nucleotides between the Thlaspi sequences and the A. thaliana PM probes (Table 1). The colinearity (conserved order from the 5′ to the 3′ end of the gene sequence) of the PM probe sequences from A. thaliana is conserved along seven of the nine Thlaspi sequences analysed in detail, with sequences encoding TcHMA4 and TcNR1 showing partial colinearity. Seven of the nine Thlaspi gene sequences had 100% identity with one or more of the ATH1 GeneChip® array PM probes. Two-thirds of the Thlaspi sequences analysed had more than 89% identity with the perfect-match probe sequences. The T. caerulescens gene sequence encoding the P1B-type ATPase, TcHMA4 (a heavy-metal-associated domain-containing protein), had the lowest homology to the PM probe sequences (Table 1).

Thlaspi gDNA and RNA hybridize to ATH1 GeneChip® arrays

As the genomes of T. caerulescens and T. arvense have not been sequenced, identification of PM probes suitable for interrogating these transcriptomes by sequence alignment is not possible. Thus, gDNA from T. caerulescens and T. arvense was labelled and hybridized to the ATH1 GeneChip® arrays. Hybridization signals were detected for all probes on the GeneChip® for both T. caerulescens and T. arvense gDNA samples. The average gDNA hybridization signal intensities for all probes on the ATH1 GeneChip® array were 345 and 475 for T. caerulescens and T. arvense, respectively. The average hybridization signal intensity was higher, for all genes analysed in detail, when gDNA from T. arvense was hybridized to the ATH1 GeneChip® array than when gDNA from T. caerulescens was hybridized (Table 1). As gDNA from both T. caerulescens and T. arvense generated hybridization signals on the ATH1 GeneChip® array, this information can be used to identify probes and probe sets that have good homology between A. thaliana and T. caerulescens or T. arvense to analyse the transcriptional profile of these species.

ATH1 GeneChip® arrays were challenged with total RNA from shoots of T. caerulescens and T. arvense grown on agar (n = 3) and compost-based (n = 2) substrates under Zn-replete conditions. Hybridization signals were detected for all probes on the ATH1 GeneChip® for both T. caerulescens and T. arvense RNA samples. The average hybridization signal with T. caerulescens RNA was 343 and the average hybridization signal with T. arvense RNA was 346.

Probe selection based on gDNA hybridization signals optimizes transcriptional analysis

As labelled RNA from Thlaspi hybridized to the ATH1 GeneChip® array, it is feasible to perform transcriptional comparisons of T. caerulescens and T. arvense. However, sequence polymorphisms between Thlaspi and A. thaliana and between T. caerulescens and T. arvense will affect overall signals calculated using all the probes in the probe set. For example, within the probe set representing an A. thaliana gene encoding a carbonic anhydrase, a 73-fold difference was observed between the lowest and highest hybridization intensity values of individual probes for the cRNA hybridization (Table 1). The average difference between the lowest and highest hybridization intensity values of individual probes was over 20-fold within the probe sets of the nine genes analysed in detail, when T. caerulescens or T. arvense RNA was hybridized to the array (Table 1).

Thus, a gDNA-based probe-selection strategy was adopted. The efficiency of gDNA-based probe selection was determined empirically by using 13 probe mask files with gDNA hybridization intensity thresholds ranging from 0 (i.e. no probe selection) to 1000. At increased gDNA hybridization intensity thresholds for probe mask file generation, probe-set retention was good (Fig. 1). For example, the probe mask files for both Thlaspi species generated using a gDNA hybridization intensity threshold of 300 masked > 50% of the available PM oligo probes with the loss of 3% of available A. thaliana probe sets (Fig. 1). This result is consistent with observations of the gDNA hybridization efficiency in Brassica oleracea (Hammond et al., 2005; http://affymetrix.arabidopsis.info/xspecies). Using a gDNA hybridization intensity threshold of 300, the average number of probe pairs retained to calculate the transcript abundance from the RNA CEL files was four and seven for T. caerulescens and T. arvense, respectively (Table 1; bold figures). On average, there were 3.8 probes common to both species using a gDNA hybridization threshold of 300 for the probe sets analysed, with the majority of probes selected using T. caerulescens gDNA also being selected by hybridization to gDNA from T. arvense (Table 1).
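
The probe-selection step can be sketched as a simple threshold filter. The function name and the data layout are hypothetical, and the sketch assumes that a probe set is retained whenever at least one of its probe pairs exceeds the gDNA threshold. The intensities are the gDNA signals for the CHS probe set (250207_at) as read from Table 1.

```python
def build_probe_mask(gdna_intensity, threshold):
    """Map each probe-set ID to the indices of probe pairs whose gDNA
    hybridization intensity exceeds `threshold`; empty sets are dropped."""
    mask = {}
    for probe_set, values in gdna_intensity.items():
        kept = [i for i, v in enumerate(values) if v > threshold]
        if kept:  # assumption: one retained probe pair keeps the probe set
            mask[probe_set] = kept
    return mask


# gDNA signals for the 11 probe pairs of 250207_at (CHS), from Table 1.
tc_gdna = {"250207_at": [276.8, 748.3, 186.3, 318.5, 100.8, 266.8,
                         244.5, 449.0, 85.8, 309.8, 196.8]}
ta_gdna = {"250207_at": [728.0, 1290.3, 385.0, 440.8, 146.8, 356.3,
                         383.8, 324.5, 115.0, 416.5, 346.0]}

tc_mask = build_probe_mask(tc_gdna, threshold=300)
ta_mask = build_probe_mask(ta_gdna, threshold=300)
common = sorted(set(tc_mask["250207_at"]) & set(ta_mask["250207_at"]))
print(len(tc_mask["250207_at"]), len(ta_mask["250207_at"]), common)
```

For this probe set, every probe pair selected with T. caerulescens gDNA is also selected with T. arvense gDNA, in line with the pattern described in the text.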

Figure 1.

(a) Arabidopsis thaliana perfect-match (PM) probes and probe sets from the Affymetrix A. thaliana ATH1-121501 GeneChip® array used to study the transcriptomes of Thlaspi caerulescens and Thlaspi arvense, as a function of the genomic DNA hybridization intensity thresholds used to generate the probe mask files, following the hybridization of labelled gDNA to GeneChip® arrays. Squares represent T. caerulescens and circles T. arvense. Closed symbols are scaled to the left-hand y-axis (i.e. probe sets used in probe mask files), and open symbols are scaled to the right-hand y-axis (i.e. PM probes retained in probe mask files). (b) Probe sets in common between T. caerulescens and T. arvense as a function of the gDNA hybridization intensity thresholds used to generate the probe mask files.

There was close association between agar- and compost-grown transcriptional profiles for T. caerulescens and T. arvense, indicating that the substrate effect was small in comparison to species effects in determining the differential expression of genes between the two species (Fig. 2; Supplementary Table 2). Data from all agar- and compost-grown Thlaspi samples were clustered using the ‘Condition Tree’ analysis tool with a Pearson correlation similarity measure in GeneSpring (Fig. 2). Probe selection at a gDNA hybridization intensity threshold of 300 was used and probe-set signals were normalized to the median value across all GeneChip® array data. The number of genes differentially expressed more or less than 2-fold (FDR of 0.05) between T. caerulescens and T. arvense increased substantially when probe mask files were used (Fig. 3). When no probe selection was used, 159 genes were identified as differentially (> 2-fold or < 0.5-fold) expressed (72 higher and 87 lower expression) in the shoots of T. caerulescens compared with T. arvense. When probe selection at a gDNA hybridization intensity threshold of 300 was used, 5782 genes were identified as differentially (> 2-fold or < 0.5-fold) expressed (3816 higher and 1966 lower expression). Thus, the sensitivity to detect transcript differences between the two species was substantially improved using gDNA-based probe selection (Fig. 3).

Figure 2.

Clustered GeneChip® array data from RNA extracted from the shoots of Thlaspi caerulescens and Thlaspi arvense grown under two conditions (agar- and compost-grown plants). A gDNA-based probe-selection strategy was adopted for both species at a gDNA hybridization intensity threshold of 300. Probe-set signals were normalized to the median value of all GeneChip® arrays. Clustering was performed in GeneSpring (Silicon Genetics) using the ‘Condition Tree’ analysis tool and a Pearson correlation similarity measure.

Figure 3.

Genes differentially expressed in the shoots of Thlaspi caerulescens and Thlaspi arvense as a function of the gDNA hybridization intensity threshold used to generate probe mask files for the transcriptome analysis. (a) Genes expressed significantly > 2-fold and (b) < 0.5-fold in T. caerulescens compared with T. arvense; for data see Supplementary Table 2.

Nine genes were used to validate gDNA-based probe selection (Table 1). Two genes whose expression was detected as higher in T. caerulescens than in T. arvense with no probe selection (ZNT1, 2.55-fold, P < 0.05 and ZNT2/4, 6.49-fold, P < 0.01) had increased differential expression when a probe mask file generated at a gDNA hybridization intensity threshold of 300 was used (ZNT1, 4.42-fold, P < 0.005 and ZNT2/4, 12.52-fold, P < 0.001). The expression of ZNT5 was detected as lower in T. caerulescens than in T. arvense with no probe selection (0.34-fold, P < 0.005). However, ZNT5 had higher expression in T. caerulescens when a probe mask file generated at a gDNA hybridization intensity threshold of 300 was used (2.24-fold, P < 0.01; Table 1). Three genes were not significantly differentially expressed when no probe selection was used (HMA4, 1.03-fold, P > 0.05; NR1, 0.87-fold, P > 0.05; CHS, 0.92-fold, P > 0.05). However, there was higher expression detected in T. caerulescens than in T. arvense following probe selection (HMA4, 2.19-fold, P < 0.0001; NR1, 1.75-fold, P < 0.005; CHS, 1.75-fold, P < 0.005). The expression of a carbonic anhydrase showed no differential expression between T. caerulescens and T. arvense either with or without gDNA-based probe selection (Table 1). Thus, gDNA-based probe selection dramatically affects the estimates of differential gene expression between T. caerulescens and T. arvense.

Quantitative PCR confirms differential gene expression between T. caerulescens and T. arvense

Quantitative PCR (qPCR) was used to confirm the reliability of gDNA-based probe selection for transcriptome analysis of T. caerulescens and T. arvense. Primers were designed to coding regions of published T. caerulescens or T. arvense gene sequences (Supplementary Table 1) and transcript abundance was determined at two levels of [Zn]ext for nine genes (Table 2; Fig. 4a; Supplementary Table 6). The expression of genes in T. caerulescens and T. arvense was minimally affected by [Zn]ext (Fig. 4a,b), consistent with the results of the condition tree analysis (Fig. 2). A Spearman's rank correlation coefficient analysis was used to rank relative abundance of T. caerulescens compared with T. arvense transcripts for nine genes, based on qPCR and ATH1 GeneChip® array data (d.f. = 7; Fig. 4c). Note that TcZNT2 and TcZNT4 are represented by the same probe set on the A. thaliana ATH1-121501 GeneChip® array (259723_at; Table 2) and thus mean transcript abundances were used for qPCR analyses (Fig. 4). Quantitative PCR data for agar-grown plants were compared with GeneChip® array data for both agar- and compost-grown plants (Fig. 4c). In the absence of gDNA-based probe selection, relative transcript abundance in T. caerulescens, compared with T. arvense, measured by qPCR and ATH1 GeneChip® arrays did not correlate significantly for agar-grown plants (at 3 µM [Zn]ext, r = 0.52, P = 0.15; at 30 µM [Zn]ext, r = 0.43, P = 0.24) or for compost-grown plants (at 3 µM [Zn]ext, r = 0.40, P = 0.29; at 30 µM [Zn]ext, r = 0.32, P = 0.41). When gDNA-based probe selection was used at a gDNA hybridization intensity threshold of 300, significant correlations were observed between qPCR and ATH1 GeneChip® array data for both agar-grown plants (at 3 µM [Zn]ext, r = 0.77, P = 0.016; at 30 µM [Zn]ext, r = 0.70, P = 0.036) and for compost-grown plants (at 3 µM [Zn]ext, r = 0.70, P = 0.036; at 30 µM [Zn]ext, r = 0.68, P = 0.042).
Thus, although seed availability necessitated the use of different populations of Thlaspi from those used for the GeneChip® analysis, qPCR results were largely consistent with transcriptome profiles. However, the relative expression estimates of two genes, CHS and NAS1, were different between the two methods of transcript quantification. These differences may be a result of the use of different biological samples, different populations of plants, contrasting primers, and/or different normalization procedures.
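
The rank correlation used for this comparison can be reproduced in outline with the values in Table 2. The sketch below is a plain-Python Spearman coefficient with average ranks for ties; the function names are hypothetical. Because Table 2 reports array ratios averaged over agar- and compost-grown replicates, the result need not equal the coefficients quoted above for each substrate separately.

```python
def average_ranks(values):
    """Rank values in ascending order, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's coefficient as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Tc/Ta ratios from Table 2, in the order ZNT1, ZNT2/4, ZNT5, HMA4, NR1,
# CA, NAS1, MTP1, CHS.
array_cutoff_300 = [4.42, 12.52, 2.41, 2.19, 1.75, 1.11, 0.96, 0.24, 1.75]
qpcr_3_um = [10.22, 52.63, 4.18, 40.50, 28.00, 1.55, 3.06, 0.73, 0.68]
rho = spearman_rho(array_cutoff_300, qpcr_3_um)
print(round(rho, 2))
```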

Figure 4.

(a) Real-time quantitative PCR (qPCR) of transcript abundance in Thlaspi caerulescens and Thlaspi arvense at 3 and 30 µM [Zn]ext, relative to transcript abundance in T. arvense at 3 µM (mean ± standard error of the mean, n = 3 biological replicates, each consisting of three technical replicates). Primers were designed to published sequences of T. caerulescens (zinc transporter 1 (ZNT1), AF133267; ZNT2/4, AF275752/AF292370; ZNT5, AF292029; heavy-metal-associated domain-containing protein 4 (HMA4), TCA567384; nitrate reductase 1 (NR1), AY551529; carbonic anhydrase (CA), AY551530; nicotianamine synthase 1 (NAS1), TCA300446) and T. arvense (metal tolerance protein 1 (MTP1), AY483146; chalcone synthase (CHS), AF144535) genes (Supplementary Table 1) and used to assay transcript abundance in both species (Supplementary Table 6). (b) Comparison of relative transcript abundance in T. caerulescens compared with T. arvense based on qPCR at 3 and 30 µM [Zn]ext (the dotted line represents unity). (c) Spearman's rank correlation coefficient analysis of relative transcript abundance of T. caerulescens compared with T. arvense transcripts for nine genes, based on qPCR at 3 µM [Zn]ext and ATH1-121501 GeneChip® array data for both agar- (closed circles) and compost-grown (open circles) plants (d.f. = 7). Data are presented as a function of the gDNA hybridization intensity threshold used to generate probe mask files for the transcriptome analysis. Note that ZNT2 and ZNT4 are represented by the same probe set on the Affymetrix A. thaliana ATH1-121501 GeneChip® array (259723_at; Table 2) and thus mean transcript abundances are used.

Numerous genes are differentially expressed in T. caerulescens and T. arvense shoots

Further biological interpretations were undertaken using gDNA-based probe selection at a hybridization intensity threshold of 300. This threshold was selected as optimal, based on (i) the retention of 97% of probe sets whilst 50% of the probe pairs were retained (Fig. 1), (ii) the maximal detection of significantly differentially expressed genes between T. caerulescens and T. arvense (Fig. 3), and (iii) the significant correlation between ATH1 GeneChip® array data and qPCR confirmation of gene expression. Data obtained using all 13 gDNA hybridization intensity thresholds (0–1000) are available for scrutiny (Supplementary Table 2). To reduce the occurrence of differentially expressed genes occurring by chance, the FDR for the Welch t-test was set to 0.005 for biological interpretation of the data. In total, 4947 genes were identified as significantly differentially (> 2-fold or < 0.5-fold) expressed in the shoots of T. caerulescens compared with T. arvense in both agar- and compost-grown conditions (Supplementary Table 2). The abundance of 3349 transcripts was higher in the shoots of T. caerulescens compared with T. arvense and the abundance of 1598 transcripts was lower (Supplementary Tables 3 and 4).

Table 3.  Ten transcripts with the highest and lowest expression in the shoots of agar- and compost-grown Thlaspi caerulescens compared with Thlaspi arvense
ATH1-121501 GeneChip® array identifier | AGI number | Putative function | Normalized expression in shoots of T. caerulescens compared with T. arvense | Standard error | Welch t-test P-value
  1. AGI, Arabidopsis Genome Initiative.

Highest expression
257462_at | At1g65740 | F-box protein | 234.46 | 15.99 | < 0.0001
262632_at | At1g06680 | Photosystem II oxygen-evolving complex | 192.50 | 35.30 | < 0.0001
255381_at | At4g03510 | E3 ubiquitin ligase | 124.53 | 22.19 | < 0.0001
262679_at | At1g75830 | Plant defensin protein, putative (PDF1.1) | 123.16 | 7.97 | < 0.0001
261599_at | At1g49700 | Hypothetical protein | 91.76 | 14.05 | < 0.0001
259426_at | At1g01470 | Expressed protein | 84.36 | 4.65 | < 0.0001
262262_at | At1g70780 | Expressed protein | 81.43 | 5.56 | < 0.0001
252570_at | At3g45300 | Isovaleryl-CoA-dehydrogenase | 73.56 | 13.45 | < 0.0001
266118_at | At2g02130 | Plant defensin protein, putative (PDF2.3) | 73.03 | 2.81 | < 0.0001
245650_at | At1g24735 | Caffeoyl-CoA 3-O-methyltransferase | 72.14 | 4.93 | < 0.0001
Lowest expression
262684_s_at | At1g76030 | Vacuolar ATP synthase subunit | 0.024 | 0.002 | < 0.0001
266865_at | At2g29980 | Omega-3 fatty acid desaturase (FAD3) | 0.024 | 0.001 | < 0.0001
266421_at | At2g38540 | Nonspecific lipid transfer protein 1 (LTP 1) | 0.023 | 0.001 | < 0.0001
258299_at | At3g23410 | Alcohol oxidase-related | 0.023 | 0.004 | < 0.0001
256277_at | At3g12120 | Omega-6 fatty acid desaturase (FAD2) | 0.021 | 0.003 | < 0.0001
251013_at | At5g02540 | Short-chain dehydrogenase/reductase | 0.021 | 0.002 | < 0.0001
246997_at | At5g67390 | Expressed protein | 0.020 | 0.002 | < 0.0001
258774_at | At3g10740 | Glycosyl hydrolase family 51 | 0.017 | 0.002 | < 0.0001
261821_at | At1g11530 | Thioredoxin family | 0.014 | 0.001 | < 0.0001
250862_s_at | At5g04800 | 40S ribosomal protein | 0.012 | 0.000 | < 0.0001

Table 4.  Determining allelic variation for Thlaspi genes using the Affymetrix Arabidopsis thaliana ATH1-121501 GeneChip® array
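Gene | Species | Probe-set ID | GenBank accession/putative allele | % similarity between PM probe and coding sequence, PM probe 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
  1. Gene sequences for the zinc transporter 1 (ZNT1) and ZNT2/4 genes from T. caerulescens, and the metal tolerance protein 1 (MTP1) genes from T. arvense, T. caerulescens, and T. goesingense were downloaded from the GenBank database and imported into Vector NTI. Perfect-match (PM) oligo sequences from corresponding probe sets on the ATH1-121501 GeneChip® array were aligned with gene sequences using the ‘Motif’ analysis function to obtain percentage similarities.

ZNT1 | Thlaspi caerulescens | 260462_at | AF275751 | 84 | 96 | 100 | 96 | 92 | 84 | 96 | 100 | 92 | 92 | 92
ZNT1 | Thlaspi caerulescens | 260462_at | AJ313521 | 84 | 96 | 100 | 96 | 92 | 84 | 96 | 100 | 92 | 64 | 60
ZNT1 | Thlaspi caerulescens | 260462_at | AF133267 | 84 | 96 | 100 | 96 | 92 | 84 | 96 | 100 | 92 | 92 | 92
ZNT2/4 | Thlaspi caerulescens | 259723_at | AF292370 | 92 | 84 | 92 | 84 | 96 | 92 | 88 | 88 | 88 | 88 | 84
ZNT2/4 | Thlaspi caerulescens | 259723_at | AF275752 | 92 | 84 | 92 | 84 | 96 | 92 | 88 | 88 | 88 | 88 | 92
ZNT2/4 | Thlaspi caerulescens | 259723_at | AJ538346 | 92 | 84 | 92 | 84 | 92 | 92 | 88 | 92 | 88 | 88 | 60
MTP1 | Thlaspi arvense | 266718_at | AY483145 | 96 | 88 | 88 | 92 | 80 | 80 | 92 | 92 | 84 | 88 | 100
MTP1 | Thlaspi caerulescens | 266718_at | AY483146 | 92 | 84 | 84 | 88 | 88 | 88 | 88 | 88 | 92 | 80 | 84
MTP1 | Thlaspi goesingense | 266718_at | AY560017 | 84 | 84 | 84 | 88 | 84 | 84 | 88 | 88 | 88 | 76 | 84
MTP1 | Thlaspi goesingense | 266718_at | AY560018 | 84 | 84 | 84 | 88 | 84 | 84 | 88 | 88 | 92 | 76 | 84
MTP1 | Thlaspi goesingense | 266718_at | AY560019 | 84 | 84 | 84 | 88 | 84 | 84 | 88 | 88 | 92 | 76 | 84
MTP1 | Thlaspi goesingense | 266718_at | AY044453 | 84 | 80 | 80 | 88 | 84 | 84 | 88 | 88 | 92 | 76 | 84
MTP1 | Thlaspi goesingense | 266718_at | AY044454 | 60 | 80 | 84 | 88 | 84 | 84 | 88 | 88 | 88 | 76 | 84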

Homologues of two plant defensin genes, PDF1.1 (At1g75830) and PDF2.3 (At2g02130), were amongst the 10 most highly expressed genes in the shoots of agar-grown T. caerulescens compared with T. arvense and two endoplasmic reticulum-localized fatty acid desaturase genes (FAD2/FAD3; At3g12120/At2g29980) and a two-pore calcium channel (TPC1; At4g03560) were amongst the 10 genes expressed at much lower levels in T. caerulescens shoots than in T. arvense (Table 3). The biological significance of these initial observations requires further investigation.

Transcripts homologous to A. thaliana genes encoding members of membrane transport protein families with a putative involvement in Zn transport were differentially expressed in T. caerulescens compared with T. arvense. There was a higher expression of genes homologous to three members of the A. thaliana ZIP (ZRT, IRT-like proteins) transporter family, including AtIRT3 (19- and 6.7-fold in agar- and compost-grown plants, respectively; At1g60960), AtZIP6 (2.3- and 3.4-fold; At2g30080), and, in agar-grown conditions only, AtZIP7 (2.1-fold; At2g04032). Two members of this family had lower expression in shoots of T. caerulescens compared with T. arvense: AtZIP3 (0.5-fold; At2g32270) in both agar- and compost-grown conditions, and AtZIP10 (0.5-fold; At1g31260) in agar-grown plants only. Four genes encoding P-type ATPase proteins had higher expression in shoots of T. caerulescens compared with T. arvense: the P1B-type ATPases AtHMA3 (3.5- and 3.9-fold; At4g30120) and AtHMA4 (2.2- and 2.2-fold; At2g19110) and the Ca2+-transporting ATPase AtACA13 (5.4- and 4.7-fold; At3g22910) in both agar- and compost-grown conditions, and the Ca2+-transporting ATPase AtACA12 (3.7-fold; At3g63380) in agar-grown plants only. Transcripts with homology to three CDF transporters (At2g39450, At2g04620 and At3g12100) had higher expression in T. caerulescens than in T. arvense. The expression levels of three alcohol dehydrogenases including a cinnamyl-alcohol dehydrogenase (up to 26-fold; At1g09500), a histidinol dehydrogenase (up to 5-fold; At5g63890), a beta-lactamase (2.1-fold; At5g63420), two carbonic anhydrases (including At4g20990, up to 18-fold), a metalloprotease (6.2-fold; At2g32480), a metallothionein (2.5-fold; At3g15353) and several genes involved in glutathione metabolism (up to 11-fold for AtGST16; At2g02930) were all higher in T. caerulescens than in T. arvense in at least one growth condition (Supplementary Tables 3 and 4).
Over 1770 of the genes identified as being differentially expressed between T. caerulescens and T. arvense were classified as having unknown function. It is unclear how T. caerulescens coordinates its unusually active mechanisms for uptake, translocation and compartmentalization of Zn (Ernst et al., 2002; White et al., 2002). However, the transcripts with homologies to the A. thaliana genes identified in this and previous studies with altered expression levels in T. caerulescens are targets for future molecular and functional analysis in both shoots and roots as Thlaspi molecular resources become available (e.g. sequence information and transformation systems; Peer et al., 2003).

Following gDNA-based probe selection at a hybridization intensity threshold of 300, 561 genes were not selected for further transcriptional analysis. All 561 were excluded based on hybridization of T. caerulescens gDNA to the ATH1 GeneChip® array, and 107 of them were also excluded based on hybridization of T. arvense gDNA. Of the 561 genes not selected, 253 were described as encoding expressed or hypothetical proteins, 26 were described as belonging to the F-box protein family, 17 were described as having transposase or transposon activity and 12 were described as encoding disease resistance proteins (Supplementary Table 5).

Expression of HMA4, ZNT2/4 and NR1 in T. caerulescens under increasing [Zn]ext

The expression of HMA4, ZNT2/4 and NR1 was studied further in T. caerulescens under conditions of increasing [Zn]ext, as these three genes showed consistently high expression in T. caerulescens compared with T. arvense. Although T. arvense showed reduced growth at > 30 µM [Zn]ext, the shoot growth of T. caerulescens was not affected until 600 µM [Zn]ext (Fig. 5a). Thlaspi caerulescens accumulated substantially more Zn than T. arvense on a shoot fresh (Fig. 5b) and dry weight basis (data not shown). There was little difference in gene expression between plants grown at 3 and 30 µM [Zn]ext (Fig. 5c–e). However, expression of HMA4 increased significantly at 150 and 300 µM [Zn]ext before returning to control levels at 600 µM [Zn]ext (Fig. 5). The expression of ZNT2/4 decreased at > 30 µM [Zn]ext. The expression of NR1 was less affected by [Zn]ext, although it increased significantly at > 1200 µM [Zn]ext. The functional significance of the effects of [Zn]ext on the shoot expression of these genes requires testing with more detailed spatial expression and functional analyses.

Figure 5.

(a) Shoot fresh weight (FW) and (b) shoot zinc (Zn) concentration responses of Thlaspi caerulescens (open squares) and Thlaspi arvense (closed circles) to increasing [Zn]ext [mean ± standard error of the mean (SEM), n = 3]. (c–e) Shoot expression of T. caerulescens heavy-metal-associated domain-containing protein 4 (HMA4), zinc transporter 2/4 (ZNT2/4), and nitrate reductase 1 (NR1) genes was measured using real-time quantitative PCR (qPCR). Primers were designed to published gene sequences (Supplementary Table 1) and data were normalized to an 18S rRNA control (mean ± SEM, n = 3; Supplementary Table 6). Amplification efficiencies (Ex), calculated from the standard curves for the gene of interest using the equation $E_x = 10^{-1/b} - 1$, where b is the slope of the standard curve, were TcHMA4 = 1.04; TcZNT2/4 = 1.05; TcNR1 = 1.03.
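As an illustration of the efficiency calculation quoted in the caption, the following minimal sketch converts a standard-curve slope into an amplification efficiency. It assumes the standard curve plots threshold cycle against log10 of template quantity, so that the slope b is negative; the function name and the example slope are hypothetical and are not taken from this study.

```python
def amplification_efficiency(slope: float) -> float:
    """Return E = 10**(-1/slope) - 1 for a qPCR standard-curve slope.

    Assumes the curve is threshold cycle versus log10(template quantity),
    so a valid slope is negative (illustrative assumption).
    """
    if slope >= 0:
        raise ValueError("standard-curve slope is expected to be negative")
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical slope: about -3.32 cycles per 10-fold dilution corresponds
# to a doubling of product per cycle, i.e. an efficiency close to 1.
example_efficiency = amplification_efficiency(-3.32)
print(f"{example_efficiency:.2f}")
```

On this scale an efficiency of 1 corresponds to perfect doubling per cycle, so the reported values of 1.03 to 1.05 would indicate slightly shallower slopes than the hypothetical one used here.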

Discussion

This study reports the development of a gDNA-based probe-selection strategy to enable the robust profiling and comparison of the transcriptomes of two nonmodel plant species, T. caerulescens, a Zn hyperaccumulator, and T. arvense, a nonhyperaccumulator, using the ATH1 GeneChip® array. This approach was adopted to overcome the potential limitations of cross-species transcriptomic approaches which can arise because of sequence polymorphisms between the target species and the model species to which the probes were designed (Ji et al., 2004; Hammond et al., 2005). The validity of this approach was confirmed using in silico alignment of selected A. thaliana and Thlaspi probe and gene sequences and qPCR.

Genomic DNA-based probe selection is appropriate for interpretation of the T. caerulescens and T. arvense transcriptomes

The average similarity between the coding regions of the A. thaliana and T. caerulescens or T. arvense genes was 81.5%, based on the analysis of nine genes for which sequence information was publicly available. This observation is consistent with previous reports of high (87–88%) sequence homology between noncoding regions of Thlaspi and A. thaliana at the DNA level (Peer et al., 2003; Table 1). On average, 87% of nucleotides were identical when PM oligo probe sequences from A. thaliana were aligned with the T. caerulescens and T. arvense gene sequences (Table 1). The generation of a reliable signal will be affected by several factors, including the formation of secondary structures in the probe and the thermodynamic properties of annealing between the probe and the transcript being analysed. There was no correlation between the thermodynamic properties of the probes and the probe signal intensity (data not shown). Thus, the lack of correlation between probe/target homology and probe signal strength may be a result of the formation of secondary structures (e.g. hairpin loops), which have the potential to increase or decrease the signal from a particular probe (Grigoryev et al., 2005). Probe design criteria, used by Affymetrix to avoid cross-hybridization with nonspecific targets, suggest that two 8-mer perfect matches or 12 consecutive matching bases will produce stable hybridization and signal generation (Affymetrix Technical Note, Array Design for the GeneChip® Human Genome U133 Set; http://www.affymetrix.com/support/technical/technotes/hgu133_design_technote.pdf).
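The probe-level comparison described above can be sketched in a few lines. The example below is only an illustration: it assumes an ungapped, position-by-position comparison of a 25-mer probe with an equally long target segment, it reads the quoted design criteria as "one run of at least 12 matches, or at least two separate runs of at least 8 matches", and both sequences are hypothetical rather than real ATH1 or Thlaspi sequences.

```python
from typing import List


def percent_identity(probe: str, target: str) -> float:
    """Percentage of identical positions in an ungapped comparison."""
    if len(probe) != len(target):
        raise ValueError("probe and target segment must have equal length")
    matches = sum(p == t for p, t in zip(probe, target))
    return 100.0 * matches / len(probe)


def match_runs(probe: str, target: str) -> List[int]:
    """Lengths of consecutive runs of matching bases, in order."""
    runs, current = [], 0
    for p, t in zip(probe, target):
        if p == t:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def meets_match_criteria(probe: str, target: str) -> bool:
    """Assumed reading of the criteria: one run >= 12 or two runs >= 8."""
    runs = match_runs(probe, target)
    return any(r >= 12 for r in runs) or sum(r >= 8 for r in runs) >= 2


# Hypothetical 25-mer probe and target segment with two mismatches.
PROBE = "ATGGCTAGCTTAGGCATCGATCGTA"
TARGET = "ATGGCTAGATTAGGCATTGATCGTA"

print(percent_identity(PROBE, TARGET))      # identity in per cent
print(match_runs(PROBE, TARGET))            # run lengths between mismatches
print(meets_match_criteria(PROBE, TARGET))  # assumed stability check
```

With 25-mer probes, each mismatch changes the identity by 4%, which is why percentage similarities computed this way fall on multiples of four.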

PM probes with high homology with the target organism have the potential to identify sequence differences between species and also allelic variation within a species. For example, sequences for alleles encoding ZNT1, ZNT2/4 and MTP1 were aligned with PM probe sequences from their corresponding probe sets on the ATH1 GeneChip® array (Table 4). For the majority of the PM probes, each allele had identical percentage similarities. For ZNT1, for which three alleles from T. caerulescens were considered, probes 10 and 11 have the potential to discriminate the allele encoded by AJ313521. The allelic variation is greater between the alleles for ZNT2/4, with the potential to discriminate between all three alleles based on hybridization to the ATH1 GeneChip® array (Table 4). Further, there are clear differences between copies of MTP1 from different species and there is the potential to discriminate between the different alleles of MTP1 from Thlaspi goesingense (Table 4).
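The reasoning about discriminating probes can be made explicit with a small sketch. The percentage similarities below are transcribed from Table 4 for the ZNT1 and ZNT2/4 alleles; the helper function is hypothetical and simply reports the probe numbers at which the alleles differ in similarity. Equal percentages do not prove that two alleles are identical under a probe, so this is a conservative screen rather than the analysis used in the study.

```python
from typing import Dict, List

# Percentage similarities for PM probes 1-11, transcribed from Table 4.
TABLE4: Dict[str, Dict[str, List[int]]] = {
    "ZNT1": {
        "AF275751": [84, 96, 100, 96, 92, 84, 96, 100, 92, 92, 92],
        "AJ313521": [84, 96, 100, 96, 92, 84, 96, 100, 92, 64, 60],
        "AF133267": [84, 96, 100, 96, 92, 84, 96, 100, 92, 92, 92],
    },
    "ZNT2/4": {
        "AF292370": [92, 84, 92, 84, 96, 92, 88, 88, 88, 88, 84],
        "AF275752": [92, 84, 92, 84, 96, 92, 88, 88, 88, 88, 92],
        "AJ538346": [92, 84, 92, 84, 92, 92, 88, 92, 88, 88, 60],
    },
}


def discriminating_probes(alleles: Dict[str, List[int]]) -> List[int]:
    """Return 1-based probe numbers whose similarity differs among alleles."""
    columns = zip(*alleles.values())
    return [i for i, col in enumerate(columns, start=1) if len(set(col)) > 1]


for gene, alleles in TABLE4.items():
    print(gene, discriminating_probes(alleles))
```

For ZNT1 the screen returns probes 10 and 11, matching the text; for ZNT2/4 it returns probes 5, 8 and 11, with probe 11 alone taking a different value for each of the three alleles.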

The use of gDNA-based probe selection at low hybridization intensity thresholds resulted in a rapid exclusion of probe pairs whilst probe sets were retained (Fig. 1). At higher gDNA hybridization intensity thresholds, many more probe sets were excluded and there was a concomitant decline in the number of genes differentially expressed between T. caerulescens and T. arvense (Fig. 3). Thus, although more stringent gDNA-based probe selection might be warranted for individual genes, an optimum gDNA hybridization intensity must be selected for initial transcriptome comparisons of two species. A gDNA hybridization intensity threshold of 300 was selected for further transcriptome comparisons in this study. When probe selection at a gDNA hybridization intensity threshold of 300 was used, 5782 genes were identified as differentially (> 2-fold or < 0.5-fold) expressed (3816 higher and 1966 lower expression) in the shoots of T. caerulescens compared with T. arvense. When no probe selection was used, 159 genes were identified as differentially (> 2-fold or < 0.5-fold) expressed (72 higher and 87 lower). Whilst the sensitivity to detect differentially expressed genes is clearly superior when probe selection is applied, the number of genes expressed at higher levels in T. caerulescens than in T. arvense is greater than would be expected by chance. To explore this observation, we reanalysed data on an individual replicate basis, before normalizing the data on a per gene basis, using (i) a standard MAS5.0 (Affymetrix) normalization algorithm, with probe set signals scaled to 100 (Supplementary Figs 1 and 2), and (ii) a more stringent gDNA-based probe selection strategy to include only probe pairs that were common to both species, using RMA prenormalization (Supplementary Fig. 5). MAS5.0 normalization reduced the uniformity of the data (Supplementary Figs 1 and 2). 
However, the use of a more stringent gDNA-based probe-selection strategy resulted in a more uniform distribution of data (at an FDR of 0.05, 754 genes were identified as expressed at a higher level in T. caerulescens than in T. arvense and 711 were expressed at lower levels; Supplementary Table 7). Thus, whilst RMA prenormalization is a more appropriate normalization strategy than MAS5.0, the trade-off between the retention of as many informative GeneChip® probes as possible and the uniformity of data following normalization at the probe level clearly warrants further theoretical analysis using a range of interspecies comparisons. Nevertheless, detailed in silico and qPCR analyses of the gDNA-based probe-selection strategy adopted in this study, based on the retention of as many informative GeneChip® probes per species as possible, show it to be robust.
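The trade-off between probe-pair and probe-set retention can be illustrated with a minimal sketch of threshold-based probe selection. The rule used here, that a probe pair is kept when its gDNA hybridization intensity exceeds the threshold and a probe set is kept when at least one of its probe pairs survives, is an assumption made for illustration; the probe-set names and intensities are hypothetical.

```python
from typing import Dict, List


def select_probes(
    gdna_intensity: Dict[str, List[float]], threshold: float
) -> Dict[str, List[int]]:
    """Map each retained probe set to the indices of its retained probe pairs.

    Assumed rule: keep a probe pair if its gDNA signal exceeds the threshold;
    drop a probe set only when none of its probe pairs is kept.
    """
    selected = {}
    for probe_set, intensities in gdna_intensity.items():
        kept = [i for i, value in enumerate(intensities) if value > threshold]
        if kept:
            selected[probe_set] = kept
    return selected


# Hypothetical gDNA hybridization intensities for three probe sets.
GDNA = {
    "set_A": [950.0, 120.0, 410.0, 80.0],
    "set_B": [310.0, 290.0, 60.0, 45.0],
    "set_C": [150.0, 90.0, 40.0, 20.0],
}

for threshold in (0, 100, 300, 500):
    kept = select_probes(GDNA, threshold)
    n_pairs = sum(len(indices) for indices in kept.values())
    print(f"threshold={threshold}: {n_pairs} probe pairs, {len(kept)} probe sets")
```

In this toy data set, raising the threshold from 0 to 100 halves the number of probe pairs while all three probe sets are retained, and probe sets start to drop out only at higher thresholds, which mirrors the qualitative pattern described for Figs 1 and 3.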

Genes implicated in Zn transport are differentially expressed

There is evidence that genes encoding heavy metal transporters, such as TcZNT1, TcZNT2 and TcZTP1, have roles in Zn hyperaccumulation as their expression is constitutively higher in both roots and shoots of T. caerulescens compared with T. arvense (Pence et al., 2000; Assunção et al., 2001). Both TcZNT1 and TcZNT2 share sequence and structural similarities with members of the ZIP transporter family, which are implicated in the accumulation of essential metals and the detoxification of harmful ones in a wide range of organisms. There was differential expression of five genes homologous to the A. thaliana ZIP transporter family, including higher expression of AtIRT3, AtZIP6 and AtZIP7, and lower expression of AtZIP3 and AtZIP10 in shoots of T. caerulescens compared with T. arvense (Supplementary Tables 3 and 4). Other proteins that can mediate Zn2+ fluxes between inter- and intracellular compartments and that may be involved in Zn hyperaccumulation in T. caerulescens include members of the cation diffusion facilitator family (CDF; Gaither & Eide, 2001; Kim et al., 2004), and certain P1B-type ATPase (HMA; heavy-metal-associated domain-containing) proteins (Williams et al., 2000; Hussain et al., 2004; Papoyan & Kochian, 2004; Williams & Mills, 2005).

Transcripts with homology to three CDF transporters (At2g39450, AtMTP11; At2g04620, AtMTP12; At3g12100, AtMTP5; Delhaize et al., 2003) were expressed at higher levels in T. caerulescens than in T. arvense. The transcript abundance of TcMTP1 has previously been shown to be higher in the roots and shoots of three T. caerulescens ecotypes than in the nonhyperaccumulator T. arvense (Assunção et al., 2001). Curiously, TcMTP1 (At2g46800) transcript abundance was lower in T. caerulescens than in T. arvense in the present study (Supplementary Table 4). Kim et al. (2004) have recently characterized TgMTP1 from T. goesingense, a hyperaccumulator of Zn and Ni, and demonstrated its role in Zn efflux from cells. In the Zn hyperaccumulator A. halleri, transcript levels of AhMTP1 are higher than in the Zn-sensitive species A. lyrata (Dräger et al., 2004; Krämer, 2005). The high expression of AhMTP1 in A. halleri has been attributed to two genetically unlinked genomic copies of the gene, which cosegregated with Zn tolerance in a back-cross between A. halleri and A. lyrata (Dräger et al., 2004). AhMTP1 was also able to complement the Zn hypersensitivity of a Saccharomyces cerevisiae zrc1cot1 mutant strain.

Functional analysis of TcHMA4 indicates a role in xylem loading of Zn (Papoyan & Kochian, 2004), an observation consistent with a proposed role for AtHMA2 and AtHMA4 in maintaining Zn homeostasis in A. thaliana. For example, the hma2hma4 double mutant accumulated less Zn than its wild type (Hussain et al., 2004), whilst overexpression of AtHMA4 in A. thaliana increased the shoot accumulation of Zn and Cd (Verret et al., 2004). Four genes of the P-type ATPase family had greater expression in shoots of T. caerulescens compared with T. arvense: homologues of the P1B-type ATPases AtHMA3 and AtHMA4, and the Ca2+-transporting ATPases AtACA13 and AtACA12. Intriguingly, AhHMA3 is also expressed at very high levels in the shoots of A. halleri, also a Zn hyperaccumulator, when compared with its nonhyperaccumulating relatives (Becher et al., 2004; H. J. Newbury, unpublished observations; http://affymetrix.arabidopsis.info/narrays/experimentpage.pl?experimentid=85). However, AtHMA3 did not complement the Zn-hypersensitive yeast strain Δzrc1, but did complement the Cd/Pb-hypersensitive yeast strain Δycf1 (Gravot et al., 2004). The T. caerulescens orthologues of AtHMA3 and AtHMA4 are clearly targets for further detailed study.

Previously, the AG GeneChip® array (representing c. 8300 genes) was used to demonstrate that genes involved in Zn transport and homeostasis were differentially expressed in shoots and roots of the Zn-tolerant, Zn-hyperaccumulator species A. halleri compared with the nonhyperaccumulator species A. thaliana (Becher et al., 2004; Weber et al., 2004). Of the genes identified as differentially expressed in the studies utilizing the A. thaliana AG GeneChip® array, only 16 genes were common to the genes identified as significantly differentially expressed between T. caerulescens and T. arvense in this study. These included transcripts homologous to AtCAX2, AtHMA3, AtZIP6, AtZAT/MTP1, and a cytochrome P450. The low number of overlapping genes may be a consequence of not accounting for sequence polymorphisms between A. thaliana and the target organism when analysing the data. The ATH1 GeneChip® array (representing 22 746 genes) has since been used to study A. halleri and A. lyrata ssp. petraea (a nonhyperaccumulator species; H. J. Newbury, unpublished observations; http://affymetrix.arabidopsis.info/narrays/experimentpage.pl?experimentid=85). We thus analysed the A. halleri and A. lyrata ssp. petraea data in GeneSpring in the absence of any probe-selection strategy. Twenty genes were expressed at levels > 2-fold greater in both T. caerulescens and in A. halleri shoots, compared with their nonhyperaccumulating congeners. These include AtHMA3 and AtIRT3, described previously. Further, several genes with metal-binding functions (e.g. a magnesium-chelatase, At4g18480), an inducible high-affinity phosphate transporter gene Pht1;4 (At2g38940), and a gene encoding a metal efflux protein (At2g39450) were also expressed at higher levels in the shoots of both T. caerulescens and A. halleri than in their nonhyperaccumulating congeners. However, as there are c. 1000 species-specific gene expression differences in T. caerulescens and A. halleri shoots, the attractive hypothesis of conserved gene function in brassicaceous hyperaccumulators requires further testing.

Perspective

A notable success of the A. thaliana research community has been to develop high-density oligo arrays with full-genome coverage. Such arrays are well suited to the creation of standardized databases containing many thousands of experiments (Craigon et al., 2004). Although commercial microarrays are not available for Thlaspi species, their transcriptomes can be analysed using microarrays designed for A. thaliana. However, if appropriate probes are not selected, transcriptome analyses will suffer from reduced sensitivity. A gDNA-based probe selection strategy to select probes for transcript analysis is one option. In this study, a strategy to select probes based on their hybridization to T. caerulescens or T. arvense gDNA has been described and shown to be robust. This strategy is likely to be applicable to other species for which there is little genomic information.

Comparing the shoot transcriptomes of T. caerulescens to those of T. arvense has yielded insights into the ability of T. caerulescens to hyperaccumulate Zn. In total, 4947 transcripts (representing homologues, possibly orthologues, of genes in A. thaliana) were identified as differentially (> 2-fold or < 0.5-fold) expressed in the shoots of T. caerulescens compared with T. arvense. The abundance of 3349 transcripts was greater in the shoots of T. caerulescens compared with T. arvense and the abundance of 1598 transcripts was lower. These included genes from gene families that encode Zn transporters, and those involved in Zn2+ compartmentalization, whose function has been previously characterized in T. caerulescens and other Brassicaceae species. Genes identified in this study are ideal candidates for future functional analyses to further our understanding of the molecular mechanisms of Zn hyperaccumulation.

Acknowledgements

Seeds of Thlaspi caerulescens (‘Viviez’ population, France) were obtained from the School of Botany Germplasm Collection, University of Melbourne, Australia, from samples initially collected by R. D. Reeves (Massey University, New Zealand). This research was supported by New Lecturer's Awards from the Nuffield Foundation and the University of Nottingham (MRB), the Biotechnology and Biological Sciences Research Council (PJW, HCB), the UK Department for the Environment, Food and Rural Affairs (Defra; HH3501SFV, HH3504SPO; JPH, PJW, MRB), and by the UK BBSRC Investigating Gene Function Arabidopsis award (IGF12422; STM). We thank Dr M. G. M. Aarts (Wageningen) for continued discussions on Thlaspi transcriptomics and the anonymous referees whose helpful and detailed comments have substantially improved this paper.

Ancillary