An efficient procedure for normalizing ionomics data for Arabidopsis thaliana


(Author for correspondence: tel +44 (0)115 9516382; email

We propose an efficient procedure for normalizing Arabidopsis thaliana ionomics data. The ionome is the mineral nutrient and trace element composition of an organism, which includes elements nonessential for plant growth (Salt et al., 2008; Baxter, 2009). The ionome can be considered as the inorganic subset of the metabolome at a given moment in space and time, and it is dependent on evolutionary, genetic, developmental and environmental factors. The ionomes of several plant species (e.g. A. thaliana, Brassica, Oryza sativa (rice) and Lotus) are currently being characterized (Lahner et al., 2003; Broadley et al., 2008; Salt et al., 2008; Chen et al., 2009; Hammond et al., 2009; White et al., 2010), as is the ionome of baker’s yeast (Saccharomyces cerevisiae; Eide et al., 2005; Danku et al., 2009). The most comprehensive ionomics data by far are for A. thaliana (Salt et al., 2008). Currently, these data comprise 128 238 unique samples from 10 386 genotypes/lines/accessions, and include 17 785 fast-neutron (FN) mutagenized plants, 24 685 T-DNA mutagenized plants (representing 1941 unique genes), 14 258 ethylmethane sulfonate (EMS) mutagenized plants and 32 280 wild-type plants, including 522 different accessions and inbred lines (all data were obtained from a database download from on September 23 2009). Several genetic correlates of mutant mineral phenotypes have been identified. For example, among FN mutants, a major calcium (Ca) phenotype was recently identified for a gene (enhanced suberin biosynthesis 1, ESB1) that controls the radial apoplastic transport of Ca to the root stele (Baxter et al., 2009).

An alternative to screening induced mutations for altered ionomic phenotypes is to exploit the reservoir of natural variation in A. thaliana (Nordborg et al., 2005). Currently, the ionomes of recombinant inbred progeny of six mapping populations and > 360 natural accessions have been reported. Correlating variation in genotype with phenotype among natural accessions – after correcting for population structure – has identified alleles that are under selection and are of putative adaptive significance. For example, a low shoot molybdenum (Mo) concentration among 92 A. thaliana accessions has been linked to a naturally occurring deletion in the promoter of a mitochondrion-localized transporter (molybdenum transporter 1, MOT1; Baxter et al., 2008a). Furthermore, higher shoot sodium (Na) accumulation in two accessions from Tossa del Mar (Ts-1) and Tsu (Tsu-1), in the coastal regions of Spain and Japan, respectively, has been linked to a novel AtHKT1 allele (Rus et al., 2006). Natural genetic resources are still relatively underexploited, primarily because natural phenotypic variation is associated with genotypic variation at multiple loci.

A. thaliana ionomics data are being generated by high-throughput phenotyping. Data are managed using the Purdue Ionomics Information Management System (PiiMS) and are publicly available at for viewing, download and re-analysis. The workflow is described in detail elsewhere (Lahner et al., 2003; Baxter et al., 2007). Briefly, genotypes/lines/accessions are sown in ‘trays’ comprising 70–108 individual units, with Col-0 included as a reference accession in each tray. Each genotype/line/accession is represented by a minimum of six replicate plants within a tray. The sowing pattern is varied across trays to reduce positional effects. Seeds are sown onto moist soil, stratified at 4°C for 48–72 h and grown for 36–40 d in a climate-controlled room at 19–24°C with 10 h of photosynthetically active light at 80–100 μmol m−2 s−1. Two or three leaves (with a total fresh weight of 1–5 mg) are sampled for subsequent mineral analysis using inductively coupled plasma mass spectrometry (ICP-MS). The leaf mineral composition for each element is determined using a per-tray global weight-normalization procedure, based on the leaf content of all elements (Lahner et al., 2003). This allows accurate comparison of genotypes/lines/accessions within each tray. There are currently > 1500 ‘trays’ of A. thaliana data in the public domain, generated continuously since October 2001.
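The weight-normalization step is described here only by reference to Lahner et al. (2003). As a purely hypothetical illustration of the general idea of inferring how much tissue went into each sample from all elements at once (this is not the PiiMS algorithm, whose details are not given here), the sketch below uses a median-of-ratios heuristic: a sample's relative tissue amount is the median, across elements, of its reading divided by the tray-wide median for that element. The function names and the example tray are invented for this illustration.

```python
import numpy as np


def relative_weight_factors(tray_matrix):
    """Hypothetical per-tray weight estimate (NOT the PiiMS algorithm).

    tray_matrix: samples x elements array of ICP-MS solution concentrations
    for one tray (all element medians assumed > 0). A sample's relative
    tissue amount is the median, across elements, of its reading divided
    by the tray-wide median for that element (median-of-ratios heuristic).
    """
    m = np.asarray(tray_matrix, dtype=float)
    element_medians = np.median(m, axis=0)   # reference profile for the tray
    ratios = m / element_medians             # each sample relative to reference
    return np.median(ratios, axis=1)         # one factor per sample


def weight_normalize(tray_matrix):
    """Divide each sample's readings by its estimated relative weight."""
    m = np.asarray(tray_matrix, dtype=float)
    return m / relative_weight_factors(m)[:, None]


# Hypothetical tray: three plants with identical composition but different
# amounts of tissue in the digest (1x, 2x and 0.5x).
profile = np.array([10.0, 2.0, 50.0, 0.5])
readings = np.vstack([profile, 2.0 * profile, 0.5 * profile])
print(relative_weight_factors(readings))
print(weight_normalize(readings))
```

Because each factor is a median over elements, one aberrant element reading in a plant moves that plant's factor less than a mean over elements would. The output remains relative to the tray's median sample; converting to mg kg−1 dry weight would require calibration that this sketch deliberately omits.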

Inter-tray comparison of genotypes/lines/accessions is nontrivial. Despite the maintenance of standard conditions during the A. thaliana ionomics workflow, plants inevitably experience slight variation in environmental conditions (temperature, watering, etc.) between experiments. Furthermore, certain environmental conditions (e.g. nutrient supply) are manipulated deliberately for experimental reasons. For data analysis, because at least one accession (Col-0) is common to all experiments, we can treat the workflow as delivering a set of related experiments, in which the terms ‘experiment’ and ‘tray’ are synonymous. The workflow can also be considered as an incomplete-block design, with ‘accession’ representing the fixed (i.e. treatment) factor. All random factors, including environmental and technical (e.g. analytical) variation, integrate at the level of ‘tray’. Residual maximum likelihood (REML) procedures (Patterson & Thompson, 1971; Robinson, 1987; Welham & Thompson, 1997) can provide reliable estimates of treatment means, and can recover information on variance structures, both from incomplete-block designs and from sets of related experiments with unknown error sources (e.g. Broadley et al., 2001, 2008; Watanabe et al., 2007). REML procedures can also be used to determine the relative contributions of genotypic and environmental sources of variation acting on the phenotype.
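For reference, the design described above can be written in standard linear mixed-model notation. The symbols below are ours rather than the authors', and the criterion shown is the textbook REML log-likelihood (Patterson & Thompson, 1971), not a transcription of the GenStat implementation, whose parameterization may differ.

```latex
% y_{ijk}: leaf concentration of one element, plant k of accession i in tray j
y_{ijk} = \mu + \alpha_i + t_j + \varepsilon_{ijk}, \qquad
t_j \sim N(0, \sigma^2_T), \quad \varepsilon_{ijk} \sim N(0, \sigma^2_E)

% Matrix form: X holds accession indicators, so \beta collects the accession
% means \mu + \alpha_i; Z holds tray indicators and u the tray effects t_j
\mathbf{y} = X\boldsymbol{\beta} + Z\mathbf{u} + \mathbf{e}, \qquad
V = \operatorname{Var}(\mathbf{y}) = \sigma^2_T Z Z^{\top} + \sigma^2_E I

% Generalized least-squares estimate of the accession means, given V
\hat{\boldsymbol{\beta}} = (X^{\top} V^{-1} X)^{-1} X^{\top} V^{-1} \mathbf{y}

% REML log-likelihood, maximized over (\sigma^2_T, \sigma^2_E), up to a constant
\ell_R = -\tfrac{1}{2}\left[\log|V| + \log\left|X^{\top} V^{-1} X\right|
  + (\mathbf{y} - X\hat{\boldsymbol{\beta}})^{\top} V^{-1}
    (\mathbf{y} - X\hat{\boldsymbol{\beta}})\right]
```

The REML-estimated accession means are the elements of \hat{\beta} evaluated at the variance components that maximize \ell_R.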

Here, we tested the hypothesis that REML-estimated means provide a reliable and efficient method for normalizing A. thaliana ionomics data across multiple trays. We tested this hypothesis using a core set of 96 natural accessions of A. thaliana, representing most of the common allelic variants within the species (Nordborg et al., 2005), grown across 210 trays (; downloaded May 2009). Data were identified using the accession name. Observed (solution concentration in μg l−1, measured using ICP-MS) and weight-normalized (mg kg−1 dry weight) data were downloaded for cells in each tray containing the accession, and data for cells containing Col-0 in the same tray were downloaded in the same way. Data were analysed separately for each element. For the REML procedure, we used an additive linear mixed model with ‘accession’ defining the fixed-treatment effect and ‘tray’ defining the sum of random effects. This approach partitions the total variation in leaf mineral concentration into the components of an additive model and provides estimated means for the ‘accession’ fixed factor. Thus, variation in leaf mineral concentration caused by genotype is assigned to the ‘accession’ term, whereas variation arising from environmental effects and from plant-to-plant effects is assigned to the ‘tray’ and ‘residual’ terms, respectively. The approach is analogous to adjusting a treatment mean in a designed experiment by removing a blocking term. By comparing the proportion of variation in leaf mineral concentration explained by the model terms, we were therefore able to determine the scale of the genotype effect relative to tray-to-tray and plant-to-plant variability. Such techniques have previously been used to extract evolutionary information from other unstructured data sets (Watanabe et al., 2007, and references therein). Statistical analyses were conducted using GenStat (Release; VSN International Ltd, Hemel Hempstead, UK).
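A minimal numerical sketch of this model follows, assuming NumPy and a dense-matrix implementation that is practical only for small examples; data sets of the size analysed here (over 12 000 plants per element) would need dedicated mixed-model software such as the GenStat routines the authors used. The function `reml_accession_means`, the grid search over the variance ratio, and the simulated tray design are all hypothetical illustrations, not the authors' code.

```python
import numpy as np


def reml_accession_means(y, accession, tray, grid=None):
    """Hypothetical REML sketch: accession = fixed effect, tray = random effect.

    Model: y = X @ beta + Z @ u + e, u ~ N(0, s2_tray I), e ~ N(0, s2_resid I).
    The ratio lam = s2_tray / s2_resid is chosen by grid search on the REML
    log-likelihood with s2_resid profiled out. Dense n x n algebra only.
    """
    y = np.asarray(y, dtype=float)
    acc_levels, acc_idx = np.unique(np.asarray(accession), return_inverse=True)
    tray_levels, tray_idx = np.unique(np.asarray(tray), return_inverse=True)
    n = y.size
    X = np.zeros((n, acc_levels.size))   # one indicator column per accession
    X[np.arange(n), acc_idx] = 1.0
    Z = np.zeros((n, tray_levels.size))  # one indicator column per tray
    Z[np.arange(n), tray_idx] = 1.0
    p = X.shape[1]
    grid = np.logspace(-4, 3, 300) if grid is None else grid

    best = None
    for lam in grid:
        H = np.eye(n) + lam * (Z @ Z.T)                # V / s2_resid
        Hinv = np.linalg.inv(H)
        XtHX = X.T @ Hinv @ X
        beta = np.linalg.solve(XtHX, X.T @ Hinv @ y)   # GLS accession means
        r = y - X @ beta
        s2_resid = (r @ Hinv @ r) / (n - p)            # profiled residual variance
        loglik = -0.5 * ((n - p) * np.log(s2_resid)
                         + np.linalg.slogdet(H)[1]
                         + np.linalg.slogdet(XtHX)[1])
        if best is None or loglik > best[0]:
            best = (loglik, lam, beta, s2_resid)

    _, lam, beta, s2_resid = best
    means = dict(zip(acc_levels.tolist(), beta))
    return means, lam * s2_resid, s2_resid


def example_data(seed=0):
    """Hypothetical incomplete-block design: Col-0 in every tray, accession
    'A' only in trays with effect 0, 'B' only in trays with effect +10.
    True values: Col-0 = 50, A = B = 60; residual SD = 1."""
    rng = np.random.default_rng(seed)
    tray_effect = {"T1": 0, "T2": 0, "T3": 0, "T4": 10, "T5": 10, "T6": 10}
    true_mean = {"Col-0": 50.0, "A": 60.0, "B": 60.0}
    y, acc, tray = [], [], []
    for t, eff in tray_effect.items():
        other = "A" if eff == 0 else "B"
        for a in ("Col-0", other):
            for _ in range(3):  # three replicate plants per accession per tray
                y.append(true_mean[a] + eff + rng.normal(0.0, 1.0))
                acc.append(a)
                tray.append(t)
    return np.array(y), np.array(acc), np.array(tray)


y, acc, tray = example_data()
raw = {a: y[acc == a].mean() for a in ("Col-0", "A", "B")}
reml, s2_tray, s2_resid = reml_accession_means(y, acc, tray)
print({a: round(float(raw[a]), 1) for a in raw})    # raw B inflated by the tray effect
print({a: round(float(reml[a]), 1) for a in reml})  # A and B brought back together
```

In the simulated design, accession B is grown only in trays with a positive tray effect, so its arithmetic mean is inflated; the REML means use the Col-0 reference shared by every tray to remove that effect. REML means are expressed on the scale of an average tray, so differences between accessions, rather than their absolute values, are the quantities to compare.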

As the aim of the ionomics workflow is to determine ‘relative’ leaf mineral composition with precision, rather than ‘absolute’ mineral concentrations, we tested the effectiveness of REML by exploring the relationships between the leaf concentrations of three pairs of elements: calcium (Ca) and magnesium (Mg); potassium (K) and rubidium (Rb); and sulfur (S) and selenium (Se). Based on previous studies (Broadley et al., 2004; Watanabe et al., 2007; White et al., 2007), we expected each of these element-pairs to correlate in absolute terms. The correlations between all three element-pairs improved substantially when data were REML-normalized across trays (Fig. 1, Table 1). This improvement was consistent for both observed and weight-normalized leaf concentrations. For example, arithmetic mean weight-normalized leaf Ca and Mg concentrations do not correlate across 96 accessions (r = 0.17, P > 0.05; Fig. 1d). Following REML normalization, leaf Ca and Mg concentrations are highly correlated (r = 0.82 and 0.77 for observed and weight-normalized values, respectively; Fig. 1a,b), as are leaf K and Rb (r = 0.71 and 0.70; Fig. 1e,f) and leaf S and Se (r = 0.65 and 0.64; Fig. 1i,j). Additional correlation structures between elements in the ionomics database (i.e. ‘mineral signatures’) remain to be explored.
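Table 1 marks each coefficient as significant or ‘ns’. For a Pearson correlation between n accession means, the usual test statistic is t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch below computes it with NumPy and the standard library, approximating the two-sided P value by Simpson integration of the t density so that no statistics package is needed; the helper names are hypothetical, and a library routine such as `scipy.stats.pearsonr` would normally be preferred.

```python
import math

import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two vectors of accession means."""
    return float(np.corrcoef(x, y)[0, 1])


def t_from_r(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided P value: 1 - 2 * integral of the density over [0, |t|].

    The integral uses composite Simpson's rule; steps must be even.
    """
    b = abs(t)
    h = b / steps
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * s * h / 3)


# Weight-normalized Ca : Mg coefficients from Table 1, n = 96 accessions.
n = 96
for label, r in [("means", 0.175), ("REML", 0.769)]:
    t = t_from_r(r, n)
    print(label, round(t, 2), two_sided_p(t, n - 2))
```

With r = 0.175 and n = 96 the statistic is about 1.72, below the 5% two-sided critical value of roughly 1.99 for 94 degrees of freedom, which is consistent with the ‘ns’ label in Table 1.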

Figure 1.

Relationship in leaf mineral concentration between three element-pairs among natural accessions of Arabidopsis thaliana. (a–d) Calcium (Ca) and magnesium (Mg) concentrations (n = 96); (e–h) potassium (K) and rubidium (Rb) concentrations (n = 63); (i–l) sulfur (S) and selenium (Se) concentrations (n = 96). Accessions are as defined in Nordborg et al. (2005) and data were downloaded from (May 2009). Panels labelled ‘O_...’ refer to observed data values (units, μg l−1); panels labelled ‘WN_...’ refer to weight-normalized values (units, mg kg−1 dry weight). Panels labelled ‘…_mean’ are arithmetic means of accessions; panels labelled ‘…_REML’ are accession means estimated using a residual maximum-likelihood procedure. Correlation coefficients are provided in Table 1.

Table 1.   Correlation coefficients between leaf mineral concentrations of element-pairs, based on arithmetic means, or residual maximum likelihood (REML)-estimated means, for 96 accessions of Arabidopsis thaliana, grown in 210 trays with Col-0 common to all trays (see Fig. 1 for related scatter plots)
Element correlation    Observed values              Weight-normalized values
                       R (means)a    R (REML)       R (means)    R (REML)
Ca : Mg                0.005 ns      0.818          0.175 ns     0.769
K : Rb                 0.202 ns      0.705          0.384        0.701
S : Se                 0.078 ns      0.650          0.476        0.635

a Correlation coefficient. All are significant at P < 0.0001, except those marked ‘ns’, which are P > 0.05.
Ca, calcium; K, potassium; Mg, magnesium; Rb, rubidium; S, sulfur; Se, selenium.

A valid criticism is that the REML model can only achieve successful normalization of leaf mineral concentration on a per-accession basis if nongenotypic effects (represented in this analysis by ‘tray’) are reasonably uniform across all genotypes/lines/accessions, and if these effects are additive. If the environment induces significant Genotype × Environment interactions, or if nongenotypic effects are nonadditive, then the process is compromised. To explore this, we examined the proportion of variation in leaf mineral concentration explained by each of the model terms, using a random model of the form (Accession + (Accession/Tray)) (Table 2). In general, Accession effects are small compared with plant-to-plant and tray-to-tray variation, which dominate the analysis. Furthermore, for each element, the proportion of variation in leaf mineral concentration explained by the Accession × Tray interaction term is small relative to that explained by Accession effects alone. Given this small contribution of the Accession × Tray interaction term, the REML procedure appears to be appropriate for A. thaliana ionomics data, although any normalization step should be applied with caution. Although globally small, the interaction term is where ionomics can undoubtedly capture some of the most interesting aspects of plant biology, namely genotype-specific responses to the environment. More detailed variance components analysis will guide future experimental designs to explore these responses.
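Table 2 expresses each variance component as a percentage of their sum. The sketch below shows that arithmetic, assuming (as one common convention, not necessarily the authors' exact choice) that a negative estimate, such as the Accession × Tray term flagged in the Table 2 footnote, is set to zero before the total is formed. The function name and the component values in the example are hypothetical and are not the Table 2 estimates.

```python
def percent_of_variation(components):
    """Express REML variance components as percentages of their total.

    Negative estimates are truncated to zero first (an assumed convention,
    in the spirit of treating such a term as effectively zero).
    """
    clipped = {term: max(0.0, float(v)) for term, v in components.items()}
    total = sum(clipped.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {term: 100.0 * v / total for term, v in clipped.items()}


# Hypothetical components for one element (arbitrary units, not Table 2 values).
example = {
    "Accession": 2.0,
    "Tray": 5.0,
    "Accession x Tray": -0.1,
    "Residual": 3.0,
}
print(percent_of_variation(example))
```

Truncating before summing keeps the percentages within 0–100 and summing to 100; reporting the raw negative estimate instead, as some software does, would leave the other percentages slightly above their truncated values.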

Table 2.   Variance components analyses of leaf mineral concentration from the residual maximum likelihood (REML) procedure using the random model of (Accession + (Accession/Tray)) for 96 accessions of Arabidopsis thaliana, grown in 210 trays with Col-0 common to all trays
Model term                                   Proportion of variation in leaf mineral nutrient concentration explained by model term (%)
Accession × Tray                             0.28      0.53      0.17      −
Residual term (plant-to-plant variation)     13.65     9.77      41.85     11.49     32.90     9.24
n                                            12 175    12 175    11 826    3450      5593      12 175

a Negative variance-component values can arise during model fitting; here, the Accession × Tray term is effectively zero.
Ca, calcium; K, potassium; Mg, magnesium; Rb, rubidium; S, sulfur; Se, selenium.

From these observations, we conclude that REML normalization is an efficient procedure for estimating leaf mineral concentrations from the A. thaliana ionomics database. Such normalization should facilitate the more efficient exploration of the large ionomics data sets currently being developed, because these data sets are composed of many hundreds of experiments extending across multiple years. Comparison of the ionome of different genotypes across the multiple experiments will enhance the ability to identify informative mutants and natural variants, as well as allow the identification of classes of ionomic profiles with common underlying physiological foundations (Baxter et al., 2008b).


BBSRC Agri-Food IPA, BB-G013969-1 (MRB and JPH); the Scottish Government Rural and Environment Research and Analysis Directorate (PJW); US National Science Foundation (IOB 0419695) and the US National Institutes of Health (R01 GM078536) (DES).