Trait-directed de novo population transcriptome dissects genetic regulation of a balanced polymorphism in phosphorus nutrition/arsenate tolerance in a wild grass, Holcus lanatus



  • The aim of this study was to characterize the transcriptome of a balanced polymorphism, under the regulation of a single gene, for phosphate fertilizer responsiveness/arsenate tolerance in wild grass Holcus lanatus genotypes screened from the same habitat.
  • De novo transcriptome sequencing, RNAseq (RNA sequencing) and single nucleotide polymorphism (SNP) calling were conducted on RNA extracted from H. lanatus. Roche 454 sequencing data were assembled into c. 22 000 isotigs, and paired-end Illumina reads for phosphorus-starved (P−) and phosphorus-treated (P+) genovars of tolerant (T) and nontolerant (N) phenotypes were mapped to this reference transcriptome.
  • Heatmaps of the gene expression data showed strong clustering of each P+/P− treated genovar, as well as clustering by N/T phenotype. Statistical analysis identified 87 isotigs as significantly differentially expressed between N and T phenotypes and 258 between P+ and P− treated plants. The SNPs and transcripts that systematically differed between N and T phenotypes were associated with regulatory functions, namely proteases, kinases, ribonuclear RNA-binding proteins and transposable elements.
  • A single gene for arsenate tolerance led to distinct phenotype transcriptomes and SNP profiles, with large differences in upstream post-translational and post-transcriptional regulatory genes rather than in genes directly involved in P nutrition transport and metabolism per se.

Introduction

Grasses are known to have complex genomes of various size, which are often large, with extensive repetitive elements, local rearrangements, and differences in genomic structure, ploidy level and chromosome number, and are consequently challenging with respect to whole-genome sequencing and assembly (Buckler et al., 2001; Feuillet & Keller, 2001; Jackson et al., 2011; Hamilton & Buell, 2012). Transcriptome sequencing bypasses genomic complexity by focusing on protein coding genes (Hamilton & Buell, 2012; c. 25 000–56 000 in grasses) derived from a well-conserved and relatively small proportion of the genome (c. 84% of gene families shared between grass subfamilies; The International Brachypodium Initiative, 2010). As transcripts are sensitive to environment, RNA sequencing can be clustered by function through use of appropriate experimental manipulation to further aid in gene pathway identification (Suarez Rodriguez et al., 2010; Urzica et al., 2012; O'Rourke et al., 2013).

While genomic sequencing has led to an unparalleled advance in our understanding of the physiology and ecology of model organisms, it is not currently feasible (because of the expense) to conduct trait-led genome sequencing investigations in all wild species of interest (Nawy, 2012); rather, there is a reliance on finding traits that exist (naturally or through mutation) in model species (Jackson et al., 2011). While laboratory-generated (site-directed, knockdown, overexpression, or chemical- or radiation-induced) mutants are essential for studying gene function and serve as a breeding resource (Kuromori et al., 2009), natural polymorphisms in nonmodel species are very much an untapped resource, as these have evolved under natural selection rather than artificial manipulation. Naturally selected genes, as a consequence, are ecologically fit, having evolved within a complex genetic network.

The wild grass Holcus lanatus, an outcrossing diploid (2n = 14) which is closely related to Brachypodium distachyon (Aliscioni et al., 2012), has a remarkable balanced polymorphism in arsenate tolerance, screened from seminatural, non-arsenic-contaminated populations (Meharg et al., 1993), coded by a single gene (Macnair et al., 1992). As arsenate is a phosphate analogue it has been postulated that this polymorphism is maintained as a consequence of phosphorus (P) nutrition, not arsenate tolerance per se, particularly as the tolerance gene co-segregates with suppression of high-affinity phosphate transport (HAPT; Meharg et al., 1992a; Meharg & Macnair, 1992b), although an explicit ecological link to the P status of soils has yet to be proven (Naylor et al., 1996). Characterization of the phosphate/arsenate transport system in tolerant and nontolerant individuals screened from uncontaminated populations has shown the same suppression of HAPT exhibited by mine tolerants that are exposed to high levels of soil arsenic contamination (Meharg & Macnair, 1992c), although in mine populations HAPT suppression is more extreme, probably as a result of the gene going to fixation in mine populations (Meharg et al., 1993) and the involvement of genetic modifiers (Macnair et al., 1992).

With respect to utilizing transcriptome sequencing to identify the genes involved in arsenate tolerance, the comparison between mine and nonmine populations is confounded as there are multiple differences in habitat (mineral nutrition, soil structure, co-contaminants, wind erosion etc.), in addition to arsenate tolerance, that will be subject to selection within disparate ecotypes. Conversely, screening tolerant and nontolerant mature plants (i.e. propagation from field-collected tillers rather than from seed) from the same population, as reported here, avoids the problem that transcriptomes will fundamentally differ between populations from contrasting habitats. If a distinct phenotype under the regulation of a single gene is screened from a population with a balanced polymorphism, it can be assumed that the frequency of all other genes between the phenotypes is random if genotype is the unit of replication. Here, a trait-directed de novo transcriptomic approach (sequence assembly and annotation, assessment of transcript expression levels and SNP calling) was applied to H. lanatus arsenate-tolerance phenotypes from the same uncontaminated population. Furthermore, the role of P in maintaining the polymorphism was also investigated by the use of a factorial treatment design (low and high P, and tolerant and nontolerant phenotypes), as well as by assessing the P responsiveness of these phenotypes on their soil of origin to provide an ecophysiological context for interpreting the transcriptome data.

Materials and Methods

Chemicals used in experiments were trace element grade or better, while all chemicals used for analytical purposes were Aristar grade.

Plant and soil collection

Single tillers of Holcus lanatus L. were collected from a semi-natural grassland, Cruickshank Botanic Gardens (CBG), University of Aberdeen, UK, where H. lanatus is a dominant species. Only one tiller was taken from each of 250 individual plants, and only isolated plants were selected, spaced at least 5 m from each other. Surface soil (0–10 cm) from the CBG population was collected, 2-mm-sieved and stored in field moist conditions until use. Tillers were cultivated in a temperate glasshouse following potting (10-cm-wide pots) into John Innes number 2 compost.

Plant growth characterization

Tiller testing was conducted according to procedures outlined in Macnair et al. (1992). Following tiller tolerance testing, genotypes were selected for further study where their longest root length after 2 wk of growth in arsenate-free solution exceeded 100 mm. The plants were classified as either nontolerant (N) or tolerant (T) based on their tolerance index (TI), where TI = 100 × (root growth in 0.013 mM arsenate/root growth in absence of arsenic), with this segregation shown in Fig. 1. Out of the 250 genotypes screened, c. 30 of each phenotype were selected for detailed study.
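The tolerance index defined above can be written as a short calculation. The following is a minimal sketch only: the function name and the example root lengths are hypothetical and are not data from this study.

```python
def tolerance_index(root_growth_with_as: float, root_growth_without_as: float) -> float:
    """TI = 100 x (root growth in 0.013 mM arsenate / root growth without arsenic)."""
    if root_growth_without_as <= 0:
        raise ValueError("root growth in the arsenic-free control must be positive")
    return 100.0 * root_growth_with_as / root_growth_without_as


# Hypothetical root lengths in mm, for illustration only.
example_ti = tolerance_index(60.0, 120.0)
print(f"TI = {example_ti:.1f}")
```

A plant whose roots grow half as much in arsenate as in the control therefore has a TI of 50.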

Figure 1.

Tolerance index (100 × root length in +As/root length in −As) of the Cruickshank Botanic Gardens Holcus lanatus population grown hydroponically in 0.013 mM arsenic as arsenate for 2 wk. N, nontolerant (pink); T, tolerant (orange).

A P fertilization experiment was conducted on these c. 60 genotypes, split into N and T phenotypes, where each genotype was a single replicate, and where the CBG soil was fertilized with phosphate (100 mg P kg−1 soil DW as disodium phosphate). Plants were grown in control (no fertilization) and with fertilization for 60 d before harvesting. At harvest, roots were washed free of soil and root and shoot dry weights recorded along with shoot P. P was analysed in powdered shoot material by inductively coupled plasma–mass spectrometry (ICP-MS) on an Agilent 4000 instrument (Agilent, Santa Clara, CA, USA), following microwave-assisted digestion (CEM-Technologies, Buckingham, UK) in concentrated nitric acid.

RNA preparation

Tillers were grown hydroponically for 2 wk in 50 ml of either complete Hoagland's solution (+) or that solution minus the phosphate (−), in individual centrifuge tubes with the base of the tubes covered in tinfoil to block out light. Plants of N and T phenotypes were grown in a heated glasshouse (c. 18°C), under supplemental sodium lamps. Two independent experiments were conducted. Expt 1 consisted of four plants (N+, N−, T+ and T−). Expt 2 consisted of 16 plants (four each of N+, N−, T+ and T−). Roche 454 sequencing (Roche, Branford, CT, USA) was conducted on plants from Expt 1 on one N replicate grown with (P+) or without (P−) phosphate. A genotype of the N phenotype was chosen for 454 transcriptome sequencing and assembly as it was assumed that the N phenotype would be the most P-responsive from previous physiological studies (Meharg & Macnair, 1992a). Illumina sequencing (Illumina, San Diego, CA, USA) was performed on plants from Expt 2, consisting of four replicate individual genotypes (not including the genotype used for 454 sequencing) of each phenotype (T and N) grown in each of P+ and P− in a factorial design, phenotype (tolerance) by P treatment; that is, 16 samples in total. All four plants from Expt 1 were also sequenced using Illumina technology, but were not included in the phenotype × P Illumina analysis, as their RNA was extracted at a different time-point, which led to differential gene expression. However, these additional T and N genotypes were used in SNP analysis, so that SNP calling was conducted on 20 samples.

On harvesting, roots and shoots of each replicate were blotted lightly dry before RNA extraction. Samples were ground under liquid nitrogen and total RNA was extracted using the Plant RNeasy extraction method (Qiagen), with the additional on-column DNase treatment. The resulting material was stored at −80°C until shipping on dry ice for analysis.

Transcriptome sequencing

Sequencing was conducted at the Max Planck Plant Breeding Research Institute, Cologne, Germany. RNA was reverse-transcribed to cDNA, fragmented, polyA-enriched and sequenced on a Roche 454 GS-FLX (titanium chemistry) and HiSeq 2000 (100-bp paired-end Illumina technology).

For generation of a reference transcriptome assembly, two normalized Roche 454 libraries were prepared for one arsenic-nontolerant genovar (N) and sequenced on a Roche 454 GS-FLX using titanium chemistry. Half a plate was used for transcriptome sequencing of P-treated (N+) plants and half a plate for nontreated (N−) plants.

For gene expression and SNP analysis (RNAseq, RNA sequencing), a 100-bp paired-end Illumina sequencing library was generated for each of the 20 samples. These included all four samples from the first experiment (T−, N−, T+ and N+), as well as another 16 samples (Expt 2) consisting of four tolerant (T) and four nontolerant (N) genovars, receiving P− and P+ treatment (four each of N−, N+, T− and T+). Illumina reads were mapped to the 454 reference transcriptome assembly.

For annotation of the reference transcriptome with standalone BLAST (blast-2.2.22), the Osativa_193_transcript and Osativa_193_peptide databases, the plant-refseq database and the nucleotide (nt) database were downloaded.

For generation of the reference transcriptome, the 454 reads were adapter and quality trimmed and assembled with newbler version 2.6 (Roche). After assembly, isotigs were annotated by blasting them against Oryza sativa transcript (BLASTn), O. sativa peptide, plant-refseq (BLASTx) and nt (BLASTn) using an e-value cut-off of 1.00E-08. A BLAST report was compiled by parsing all BLAST results with Perl scripts using BioPerl modules (Stajich et al., 2002) to extract the top hit accession, description, E-value, and per cent identity (id) from each BLAST search. Annotations of putative function for O. sativa peptides and O. sativa transcripts were inserted into the BLAST report after submission to
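The top-hit extraction described above was done with Perl and BioPerl. As an illustration only, the same idea can be sketched in Python, assuming the BLAST results are available in the standard 12-column tabular format (query, subject, per cent identity, ..., e-value, bit score). The function name and the example rows are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1.00e-08  # significance cut-off stated in the text


def top_hits(tabular_text: str, cutoff: float = EVALUE_CUTOFF) -> dict:
    """Return the lowest e-value hit per query from 12-column tabular BLAST output."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        pct_identity, evalue = float(row[2]), float(row[10])
        if evalue >= cutoff:
            continue  # hit does not pass the e-value cut-off
        if query not in best or evalue < best[query]["evalue"]:
            best[query] = {
                "accession": subject,
                "pct_identity": pct_identity,
                "evalue": evalue,
            }
    return best


# Hypothetical rows: two hits for one isotig, one non-significant hit for another.
EXAMPLE = (
    "isotig00001\thitA\t92.5\t200\t15\t0\t1\t200\t1\t200\t1e-50\t180\n"
    "isotig00001\thitB\t80.0\t150\t30\t0\t1\t150\t1\t150\t1e-20\t90\n"
    "isotig00002\thitC\t70.0\t50\t15\t0\t1\t50\t1\t50\t1e-03\t30\n"
)
print(top_hits(EXAMPLE))
```

Hit descriptions are not part of the tabular format, so a full report of the kind described in the text would need them joined in from the database annotation.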

For identification of differentially expressed transcripts/genes, 100-bp paired-end Illumina reads from Expt 1 (four samples) and Expt 2 (16 samples) were aligned to the assembled 454 reference transcriptome (isotigs) using bowtie (Langmead et al., 2009), allowing multiple matches (option -a), reporting only the best hits obtained for each read pair (options --best --strata), with an allowed maximum of three end-to-end mismatches (option -v 3) to an isotig. For each of the 20 samples, the number of reported reads aligning to each isotig was counted with use of a Perl script.
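The per-isotig counting step was performed with a Perl script that is not described further. A minimal Python sketch of the same idea is given here, assuming the alignments were written in SAM format and that every reported alignment record is counted once; both are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def count_reads_per_isotig(sam_lines) -> Counter:
    """Count reported alignments per reference sequence (isotig) in SAM records."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a valid alignment record
        flag, reference = int(fields[1]), fields[2]
        if flag & 0x4 or reference == "*":
            continue  # read is unmapped
        counts[reference] += 1
    return counts


# Hypothetical SAM records (only the first few fields matter for counting).
EXAMPLE_SAM = [
    "@SQ\tSN:isotig00001\tLN:1200",
    "read1\t99\tisotig00001\t10\t255\t100M\t=\t150\t240\t*\t*",
    "read1\t147\tisotig00001\t150\t255\t100M\t=\t10\t-240\t*\t*",
    "read2\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(count_reads_per_isotig(EXAMPLE_SAM))
```

Because multiple matches were allowed, a read reported against several isotigs contributes to each of them under this counting scheme.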

Differentially expressed transcripts/isotigs were identified in R with the package DESeq (Anders & Huber, 2010) using a false discovery rate (FDR) < 0.1 as the cut-off for significance. DESeq analysis was carried out on the Illumina data from Expt 2 (n = 4) for all four pairwise comparisons (N− versus T−, N+ versus T+, N− versus N+, and T− versus T+) to allow identification of genes relevant to P+/P− treatment and those involved in conferring the T versus N phenotype. After DESeq analysis, the log2 fold change (log2FC) in genes with low expression was recalculated as follows: DESeq-calculated normalized counts of < 5 were set to a baseline of 5. This allowed estimation of log2FC even if one of the treatments showed a mean normalized count of 0 and furthermore ensured that all two-fold changes reported had normalized mean expression values of at least 10 in one of the treatments in question. Isotigs with FDR < 0.1 for any treatment comparison (344 isotigs) were submitted to BLAST2GO (default settings: BLASTx, nr, BLAST expect value 1.0E-03, and number of BLAST hits 20; Conesa & Gotz, 2008) for further functional analysis including Gene Ontology, Enzyme Code and Interpro domain search. Fold changes and associated probabilities for isotigs/genes specifically discussed in the text are given in Table 1.
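The baseline adjustment of log2FC for low-expression genes can be illustrated with a short sketch. The function name and the example counts are hypothetical; only the baseline of 5 comes from the text.

```python
import math

BASELINE = 5.0  # normalized counts below this value are raised to it


def floored_log2fc(mean_a: float, mean_b: float, baseline: float = BASELINE) -> float:
    """log2(mean_a / mean_b) after raising both normalized means to the baseline."""
    floored_a = max(mean_a, baseline)
    floored_b = max(mean_b, baseline)
    return math.log2(floored_a / floored_b)


# Hypothetical mean normalized counts.
print(floored_log2fc(0.0, 40.0))   # a zero count no longer gives an undefined ratio
print(floored_log2fc(10.0, 3.0))   # the smallest pair that still yields a two-fold change
print(floored_log2fc(2.0, 4.0))    # both below the baseline, so no change is reported
```

With both means floored at 5, an absolute log2FC of 1 or more requires the larger mean to be at least 10, which is the property described in the text.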

Table 1. Log fold changes (logFC) for Holcus lanatus isotigs/genes discussed in the text

| Isotig | Functional annotation | T−/N− logFC | T−/N− FDR | T+/N+ logFC | T+/N+ FDR | N−/N+ logFC | N−/N+ FDR | T−/T+ logFC | T−/T+ FDR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Up-regulated in T | | | | | | | | | |
| 13038 | Retrotransposon protein | 6.98 | 6.39E−04 | 6.37 | 1.37E−03 | 0.00 | 1.00E+00 | 0.61 | 1.00E+00 |
| 13187 | Transposon protein | 6.47 | 2.46E−01 | 6.55 | 8.90E−02 | 0.00 | 1.00E+00 | −0.07 | 1.00E+00 |
| 13075 | Transposon protein | 5.70 | 2.21E−02 | 6.20 | 1.75E−02 | 0.00 | 1.00E+00 | −0.50 | 1.00E+00 |
| 15546 | Retrotransposon protein | 6.60 | 1.49E−03 | 6.17 | 5.23E−04 | 0.00 | 1.00E+00 | 0.42 | 1.00E+00 |
| 7417 | Retrotransposon protein | 1.96 | 3.62E−01 | 3.85 | 3.05E−05 | 2.12 | 8.42E−02 | 0.23 | 1.00E+00 |
| Down-regulated in T | | | | | | | | | |
| 2887 | Transposon protein | 6.14 | 2.21E−01 | 6.32 | 2.98E−02 | −0.17 | 1.00E+00 | 0.00 | 1.00E+00 |
| 5913 | Retrotransposon protein | 3.01 | 2.64E−01 | 3.26 | 6.07E−02 | −0.04 | 1.00E+00 | 0.21 | 1.00E+00 |
| 17128 | Aspartic proteinase nepenthesin-1 precursor | 1.83 | 3.00E−02 | 2.34 | 5.23E−04 | 0.18 | 1.00E+00 | 0.69 | 1.00E+00 |
| 10719 | Aspartic proteinase nepenthesin-1 precursor | −0.87 | 9.10E−01 | 1.21 | 9.17E−02 | −0.20 | 1.00E+00 | 0.14 | 1.00E+00 |
| 3772 | FtsH protease | −0.95 | 2.41E−01 | −1.12 | 2.38E−02 | −0.14 | 1.00E+00 | 0.04 | 1.00E+00 |
| 20448 | Ubiquitin family protein | −1.15 | 5.65E−01 | −1.35 | 7.41E−02 | 0.29 | 1.00E+00 | 0.48 | 1.00E+00 |
| Up-regulated in P− | | | | | | | | | |
| 3645 | Putative subtilisin homologue (nonspecific protease) | 0.11 | 1.00E+00 | −0.15 | 1.00E+00 | 1.50 | 1.52E−08 | 1.76 | 9.28E−12 |
| 8177 | Ubiquitin-conjugating enzyme protein | 0.05 | 1.00E+00 | 0.03 | 1.00E+00 | 1.44 | 4.91E−06 | 1.46 | 9.68E−06 |
| 17019 | Ubiquitin domain-containing protein 1 | −0.19 | 1.00E+00 | 0.11 | 1.00E+00 | 1.10 | 1.06E−02 | 0.80 | 2.89E−01 |
| 1092 | Inorganic phosphate transporter | 0.97 | 1.00E+00 | 0.50 | 1.00E+00 | 4.10 | 6.27E−06 | 4.57 | 4.82E−07 |
| 18981 | Inorganic phosphate transporter | 0.04 | 1.00E+00 | −0.34 | 1.00E+00 | 4.31 | 5.59E−07 | 4.69 | 7.06E−08 |
| 9507 | Inorganic phosphate transporter | −0.03 | 1.00E+00 | −0.15 | 1.00E+00 | 1.51 | 5.25E−02 | 1.63 | 1.53E−02 |
| 16690 | Inorganic phosphate transporter | 0.30 | 1.00E+00 | 0.36 | 1.00E+00 | 1.63 | 2.58E−03 | 1.58 | 3.29E−03 |
| 1429 | SPX domain-containing protein | 0.00 | 1.00E+00 | −0.04 | 1.00E+00 | 3.28 | 2.70E−24 | 3.33 | 1.36E−24 |
| 5965 | SPX domain-containing protein | 0.35 | 1.00E+00 | 0.23 | 1.00E+00 | 6.39 | 2.42E−25 | 6.51 | 1.91E−26 |
| 10850 | Glycerophosphoryl diester phosphodiesterase protein | −0.06 | 1.00E+00 | −0.22 | 1.00E+00 | 1.56 | 1.45E−08 | 1.71 | 4.72E−10 |
| 3521 | Sucrose phosphate synthase | 0.44 | 1.00E+00 | 0.20 | 1.00E+00 | 0.61 | 4.16E−01 | 0.84 | 3.50E−02 |
| 1457 | Soluble inorganic pyrophosphorylase | 0.15 | 1.00E+00 | −0.07 | 1.00E+00 | 1.75 | 9.69E−07 | 1.97 | 6.23E−09 |
| 10929 | Purple acid phosphatase | −0.09 | 1.00E+00 | −0.33 | 1.00E+00 | 7.59 | 1.03E−05 | 7.83 | 1.70E−06 |
| 10537 | Purple acid phosphatase precursor | 0.22 | 1.00E+00 | 0.35 | 1.00E+00 | 3.99 | 1.36E−13 | 3.86 | 1.30E−12 |
| 12752 | Aldose 1-epimerase | −0.09 | 1.00E+00 | −0.04 | 1.00E+00 | 0.97 | 2.83E−03 | 0.92 | 2.41E−03 |
| 11786 | Glucose-6-phosphate phosphate translocator | 0.09 | 1.00E+00 | −0.09 | 1.00E+00 | 1.20 | 7.20E−03 | 1.38 | 1.20E−03 |
| Down-regulated in P− | | | | | | | | | |
| 19161 | Protease inhibitor | −0.41 | 1.00E+00 | −0.38 | 1.00E+00 | 1.01 | 2.18E−01 | 1.05 | 9.68E−02 |
| 9222 | Putative Deg protease homologue | −0.36 | 1.00E+00 | −0.06 | 1.00E+00 | −0.52 | 1.00E+00 | −0.82 | 7.77E−02 |
| 11028 | Integral membrane DUF6 cont. protein (auxin ind.) | −0.23 | 1.00E+00 | 0.41 | 1.00E+00 | −0.78 | 5.39E−01 | 1.43 | 5.58E−04 |

  1. In bold, black are absolute LogFC > 1, which corresponds to absolute fold changes > 2. In bold, italics are statistically significant changes with false discovery rate (FDR) < 0.1.

  2. cont., containing; DEG, degradation; ind. induced; N, nontolerant; P, phosphorus treatment; T, tolerant.

For identification of isotigs showing homology to specific proteins of interest, proteins were blasted against all isotigs (tBLASTn). For visualization of assigned protein homologies, some selected isotigs were translated into protein (expasy) and aligned with homologous plant protein sequences using mafft version 7 (Katoh & Standley, 2013). mafft alignments were imported into seaview (Gouy et al., 2010), and the alignments exported.

samtools was used for identification of SNPs with variant and mapping quality > 20 (Li et al., 2009). SNP tables of all 20 plants were merged, and homozygous and heterozygous SNPs consistent across N and T phenotypes (n = 10) were extracted with a Perl script as potentially relevant drivers of the N versus T phenotype.
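The extraction of phenotype-consistent SNPs was done with a Perl script whose exact criteria are not given. One plausible reading is sketched below in Python, under the assumption that a site is "consistent" when every N plant shares one genotype call, every T plant shares another, and the two differ. The data layout, function name and example calls are all hypothetical.

```python
def consistent_snps(calls: dict, phenotype: dict) -> list:
    """Return sites where all N samples share one genotype and all T samples another.

    calls:     {(isotig, position): {sample: genotype string}}
    phenotype: {sample: "N" or "T"}
    """
    n_samples = {s for s, p in phenotype.items() if p == "N"}
    t_samples = {s for s, p in phenotype.items() if p == "T"}
    hits = []
    for site, genotypes in sorted(calls.items()):
        n_calls = {genotypes.get(s) for s in n_samples}
        t_calls = {genotypes.get(s) for s in t_samples}
        if None in n_calls or None in t_calls:
            continue  # at least one plant has no call at this site
        if len(n_calls) == 1 and len(t_calls) == 1 and n_calls != t_calls:
            hits.append(site)
    return hits


# Hypothetical example with two plants per phenotype.
PHENOTYPE = {"N1": "N", "N2": "N", "T1": "T", "T2": "T"}
CALLS = {
    ("isotigA", 101): {"N1": "A/A", "N2": "A/A", "T1": "A/G", "T2": "A/G"},
    ("isotigA", 250): {"N1": "C/C", "N2": "C/T", "T1": "T/T", "T2": "T/T"},
    ("isotigB", 77): {"N1": "G/G", "N2": "G/G", "T1": "G/G", "T2": "G/G"},
}
print(consistent_snps(CALLS, PHENOTYPE))
```

Only the first site is returned: the second varies within N and the third does not differ between phenotypes.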

The sequencing experimental design data and read files are available in the ArrayExpress database under accession number E-MTAB-1678 and in the European Nucleotide Archive under study accession number ERP003246.

Results

There was a clear segregation into arsenate T and N phenotype classes using root extension assays in the presence and absence of arsenate (Fig. 1). With respect to the P-responsiveness of these two phenotypes, when grown in their soil of origin the plants responded differentially to phosphate fertilizer (Fig. 2). General linear modelling (GLM; using minitab v.16, Minitab, State College, PA, USA) of ranked data (which was necessary because of non-normality of the untransformed data) showed a significant (P = 0.004) phenotype × P-fertilization interaction for shoot : root ratio; all other model terms were not significant for this comparison. The T phenotype did not reduce relative root growth in response to P addition, whereas for the N phenotype the shoot : root ratio increased; that is, N produced more shoot relative to root. No model terms were significant (i.e. all P > 0.05) for either shoot weight or shoot P, while only the treatment term was significant for root weight. The total arsenic concentration was significantly higher in shoots of the N phenotype than in those of the T phenotype in an ANOVA model on ranked data (P = 0.018), and the arsenic concentration was significantly suppressed (P = 0.048) by P fertilization (Fig. 2).

Figure 2.

Box plots of the response of Cruickshank Botanic Gardens Holcus lanatus tolerant (T) and nontolerant (N) phenotypes to phosphate (P) fertilization (100 mg kg−1 P on a soil DW basis) when grown from tillers for 2 months on their soil of origin. Arsenate-tolerant plants, orange bars; nontolerant, pink bars. No P fertilization, bars without hatching; P-fertilized plants, hatched bars. Shown on the bars are the median and the 25th and 75th (outer box), 10th and 90th (whiskers) and 5th and 95th (dots) percentiles.

Roche 454 sequencing generated c. 1 million reads and a total of 474 megabases (MB) of sequence data for assembly of a reference transcriptome. Approximately 82% of all reads and c. 85% of all bases (400 MB) aligned. The inferred read error was 1.06% (Supporting Information Table S1). Assembly with newbler 2.6 generated a reference transcriptome (Table S2) with a total of 22 313 isotigs. The overall size of the assembled reference transcriptome was 29 MB. The average isotig size obtained was 1302 bp, the N50 isotig size was 1489 bp, and the number of isotigs ≥ 1 kb was 12 828 (Table S1). Isotigs were blasted against O. sativa, plant refseq transcriptome databases and the nonredundant nucleotide database (nt; Table S3). Of these, 18 204 returned a match against O. sativa transcripts, 18 954 against O. sativa peptides, 19 344 against plant refseq and 19 589 against nt (cut-off for significance < 1.00E-08). BLAST against nt returned hits almost exclusively against plant cDNA/mRNA, predominantly Hordeum vulgare (barley) and Triticum aestivum (wheat).
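For readers unfamiliar with the N50 statistic reported above, the following sketch shows the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together hold at least half of all assembled bases. The function name and example lengths are hypothetical.

```python
def n50(lengths) -> int:
    """Length L such that sequences of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for a non-empty list of positive lengths")


# Hypothetical isotig lengths in bp.
print(n50([2000, 1500, 1000, 500, 500]))
```

As a rough consistency check on the reported figures, 22 313 isotigs at an average of 1302 bp give approximately 29.05 million bp, in line with the stated 29 MB assembly size.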

Paired-end Illumina sequencing produced an average of c. 53 million reads for each of the 20 libraries, giving a total of 1052 million 100-bp paired-end reads or 210 gigabases (GB) of sequence data. Forty-eight per cent (100 GB) of the HiSeq data aligned successfully to the assembled 454 transcriptome without any prior clipping of reads, with an average of 25 million full-length 100-bp paired-end reads (5 GB) for each of the 20 libraries mapping successfully to isotigs with three or fewer end-to-end basepair mismatches (Table S1). Of 22 313 isotigs, only 52 isotigs did not obtain any mapped paired-end Illumina reads when aligning all Illumina reads (from all 20 individual samples), giving good verification of the 454 reference transcriptome assembly (Table S4). The isotigs that did not obtain any mapped Illumina reads may have been either too short (< 250 bp) or misassembled. Nonaligned Illumina reads, in contrast, are likely to be a mixture of poor-quality reads and reads containing adapter sequence, which would require end-trimming, or alternatively originate from transcripts with low expression, which may not be represented in the 454 reference library, but may still be picked up amongst the Illumina sequences because of the much deeper coverage achieved with Illumina technology.

The transcript expression results showed strong clustering of each P+ and P− treated genotype, verifying that each plant was indeed a different genotype (Fig. 3). T genotypes clustered together, as did N genotypes, showing that the phenotypes had distinctive transcript expression signatures. Differential expression analysis was carried out with DESeq (Anders & Huber, 2010) and identified 344 isotigs, with an FDR of 0.1, significantly up- or down-regulated in response to either P+/P− treatment or N/T phenotype (Fig. 4, Table S3). Of these, 87 isotigs were shown to be differentially expressed between the N and T phenotypes, while the majority, 258 isotigs, were shown to be differentially expressed in response to P+/P− nutrition treatment. There was no overlap between the significant isotigs identified for P+/P− treatment and N/T phenotype response.

Figure 3.

Heatmap showing the Euclidean distances between Holcus lanatus genotypes as calculated from the variance stabilizing transformation of the RNAseq count data (gplots and heatmap2, R). The heatmap shows a colour representation of the Euclidean distance matrix (from dark grey for zero distance to white for large distance), and the dendrogram represents a hierarchical sample-to-sample clustering. While the phosphorus treatments (P+/P−) of each genotype show the highest degree of similarity, there is also clustering between tolerant (T; orange) and nontolerant (N; pink) phenotype samples.

Figure 4.

Overview of the Holcus lanatus RNAseq gene expression results: up- and down-regulated genes were identified for the phosphorus treatments (P−/P+) and tolerant (T)/nontolerant (N) phenotypes, with the average fold change shown by either P treatment or phenotype. Significant fold changes between P treatments are coloured green (P−/P+) and those between T and N phenotypes are coloured blue (T/N). All genes significantly differentially regulated (at a false discovery rate (FDR) = 0.01) between either P treatments or phenotypes are shown; bars are white where the fold change was not statistically significant (at FDR = 0.01) in the other treatment comparison.

The amplitude of fold change was greater between phenotypes than between P treatments, while annotation was better for P-responsive genes (Fig. 4). Annotated genes in classes that are highly relevant to the current study show that transcripts significantly differentially regulated between T and N phenotypes are dominated by kinases, pathogen resistance, plant growth regulators (PGRs), proteases, transposable elements (TEs) and RNA-directed activity, but none involved in phosphate transport (Table S3, Fig. 4).

With respect to the T phenotype, the only annotated gene absent compared with the N phenotype, where it is highly expressed, is a kinase receptor (isotig09647). Furthermore, isotigs with significant homology to cbl (calcineurin B-like)-interacting kinase, MAPK (Mitogen-activated protein kinase) kinase and serine/threonine kinases had systematic differences in SNPs between the N and T phenotypes, as did isotigs showing homology to proteasome-associated protein, transferases and a ribonucleoprotein/RNA recognition protein (Fig. 5, Table S5). Other transcripts belonging to kinases, some of these showing homology to cbl-interacting kinases 9, 14 and 23, were found to be up-regulated under low P status (Fig. 4, Table S3).

Figure 5.

Consistent tolerant (T) versus nontolerant (N) (n = 10) homozygous (c. 100%) and heterozygous (c. 50%) single nucleotide polymorphisms (SNPs) identified in Holcus lanatus genotypes are found predominantly in transcripts with regulatory function. Red, transversions A ↔ C, A ↔ T, G ↔ C and G ↔ T (interchange of two-ring purine (A, G) for one-ring pyrimidine (C, T)); green, the more common transitions A ↔ G and C ↔ T (interchange of purine for purine (A ↔ G) or pyrimidine for pyrimidine (C ↔ T)).

Some sequences with homology to kinase PSTOL1 (phosphorus-starvation tolerance 1) were identified in H. lanatus, such as isotig20112, which showed c. 88% (identities 145/164) homology to the serine/threonine protein kinase LOC_Os01g04570.2 and 63% (identities 108/169) homology to Pstol1/OsPupK46-2, also annotated as serine/threonine protein kinase (Fig. S1), and isotig03216, which showed 50% (identities 158/312) homology to protein serine/threonine kinase LOC_Os01g04570.2 and 49% (identities 154/313) homology to Pstol1/OsPupK46-2. Isotig20112 was expressed in all T and N phenotypes but was approximately four-fold down-regulated in two of four N phenotypes, with no P effect. Isotig03216 was also expressed in all N and T phenotypes, with N5, N4, N2 and T4 showing more than three-fold lower expression compared with T2, T3, T5 and N3, but the overall observed approximately two-fold up-regulation in T− versus N− was not statistically significant (Table S3).

The only transcript absent in the T phenotype and present in the N phenotype (present in three out of five N phenotypes and absent in all five T phenotypes), isotig09647, was a receptor protein kinase, although a transposon was also severely suppressed. Another probable serine/threonine protein kinase WNK2 (lysine deficient protein kinase 2)-like, isotig18018, was c. 60-fold up-regulated in T compared with N. It is apparent that kinases play a role in both T/N phenotypic and P-responsive differences in the H. lanatus transcriptomes presented here and that there is possibly some level of functional redundancy.

Also of note is that an auxin-binding protein (isotig16840) was highly expressed in the T phenotype compared with the N phenotype. The alignment of the translated isotig16840 is shown in Fig. S2 and shows high homology to auxin-binding proteins of rice (O. sativa) and B. distachyon. The higher expression of an auxin-binding protein was accompanied by decreased expression of an expansin precursor transcript in the T phenotype, compared with the N phenotype, and the enhanced expression of an auxin-responsive protein (isotig11028) under high P (Table S3). The alignment of translated isotig12721 with the expansin precursor can be seen in Fig. S3.

While most expressed transposons, some of these annotated with RNA-directed activity, were up-regulated in T compared with N (isotig13038, isotig13187, isotig13075, isotig15546 and isotig07417), one transposon of the en/spm (enhancer/suppressor-mutator) subclass (isotig02887/02888) and one unclassified nucleic acid-binding transposon (isotig05913) were suppressed in T compared with N (Table S3).

A suite of isotigs involved in protein degradation, including two with homology to an aspartic proteinase nepenthesin precursor (isotig17128 and isotig10719), one FtsH (filamentation temperature sensitive H) protease (isotig03772) and a ubiquitination-like protein (isotig20448), were suppressed in the T phenotype compared with the N phenotype, indicating that post-translational protein degradation may be an important factor for the N phenotype. With respect to P treatment, a ubiquitin-domain protein (isotig17019), a protease inhibitor (isotig19161), a putative subtilisin homologue and a nonspecific protease (isotig03645) were significantly up-regulated in response to P starvation, while a ubiquitin-conjugating enzyme-like (isotig08177), an ICE (interleukin-1 convertase)-like protease p20 domain-containing protein (isotig10848), a putative Deg (degradation) protease homologue (isotig09222) and an LTPL113 (lipid transfer protein-like 113) protease inhibitor (isotig19161) were significantly down-regulated in response to P starvation.

Furthermore, six consistent homozygous and 30 consistent heterozygous SNPs for N versus T phenotype (n = 10) were identified (Fig. 5). These occurred predominantly in transcripts associated with regulatory functions such as proteolysis (protease, proteasome subunit and heat shock), protein modification (cysteine desulfurase, glycosyl transferase, transferase and kinase) and RNA recognition (RNA-binding and ribonucleoprotein) and were a mixture of transition and transversion SNPs (Fig. 5). The largest number of SNPs (three transversion and one transition SNP) between T and N were identified in a putative proteasome subunit (isotig11038), with another three SNPs identified in an FtsH protease (isotig03772) and two in a heat-shock protein (isotig03795/03796; Fig. 5).
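The distinction between transition and transversion SNPs follows directly from base chemistry: a transition swaps a purine for a purine (A ↔ G) or a pyrimidine for a pyrimidine (C ↔ T), whereas a transversion swaps a purine for a pyrimidine or the reverse. A minimal sketch of that classification, with a hypothetical function name, is:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def snp_class(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref!r} -> {alt!r}")
    same_ring_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_ring_class else "transversion"


for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(ref, alt, snp_class(ref, alt))
```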

The transcripts differentially regulated by P treatment were more completely annotated than those whose expression differed between phenotypes (Table S3), with many of the significant genes having well-known roles in P transport and metabolism, as well as in post-transcriptional and post-translational regulation and signalling (Fig. 4, Table S3). Some gene expression responses involved phosphate transport. These include three isotigs annotated as phosphate co-transporters and two with SPX domains. All showed the same general pattern, with respect to transcript counts, with up-regulation under P starvation and down-regulation in the P-replete treatment, but there was little difference with respect to the expression changes in response to P−/P+ in the T and N phenotypes. Alignments of phosphate transport translated isotigs to proteins are shown in Figs S4 and S5. Isotig01092, isotig18981 and isotig09507 showed strong homology to AAM49810.1, a putative rice HAPT (Fig. S4), and isotig16690 showed strong homology to Q651J5 phosphate transporter PHO1-3, which is known to be induced under P-deficient conditions (Fig. S5). Among the few isotigs with significantly higher counts in P+ compared with P− in T, but not in N, was an auxin-induced protein (isotig11028; Fig. S6).


Tolerance phenotypes

The balanced polymorphism in arsenate tolerance previously characterized in H. lanatus (Meharg & Macnair, 1992c; Meharg et al., 1993; Naylor et al., 1996) was also clearly exhibited in the CBG population, as expected. Here we report the first phosphate-specific response for this polymorphism, as previous investigations tried to link tolerance gene frequency to soil phosphate status (Naylor et al., 1996), rather than studying phenotype response to P fertilization. A differential allocation to root and shoot biomass in response to P fertilization was observed, T maintaining a relatively constant shoot : root ratio while N showed the classic plant response to P addition in that shoot : root ratio increased (Gojon et al., 2009). This differential response in shoot : root ratio to P availability is likely to be the reason why this polymorphism is maintained, and this will be outlined in more detail in subsequent publications. Shoot P (root P was not measured as it is impossible to remove all adhering soil, which greatly confounds interpretation) does not differ between phenotypes, and this shoot P status is not P fertilizer responsive, all indicating tight homeostasis, a known characteristic with respect to plant P nutrition (Hill et al., 2006; Gojon et al., 2009).

Transcriptome assembly and expression patterns

With respect to annotation of the transcript data, the highest homologous matches against plant-refseq were invariably identified against protein/transcript sequences of B. distachyon, Sorghum bicolor and O. sativa, with B. distachyon being the most frequent hit. Of the grasses with currently sequenced genomes, H. lanatus is most closely related to B. distachyon (Aliscioni et al., 2012). The only previously reported H. lanatus gene sequence, AY704470, a CDC25 (Cell division cycle 25) phosphatase homologue (Bleeker et al., 2006), was identified in all genotypes here, with isotig19077 showing 100% identity to this published sequence (tBLASTn; Fig. S7), giving further verification of the transcriptome assembly.

A key finding of this study was the independence of transcript expression between N/T and P+/P− nutrition; in particular, transcript expression of genes involved directly in P nutrition (i.e. those encoding enzymes involved in P transport and P compound synthesis/degradation) was not the basis of tolerance as would have been hypothesized from a priori knowledge, as the T phenotype results in suppressed phosphate/arsenate transport (Macnair et al., 1992; Meharg & Macnair, 1992a), including in T phenotypes screened out of populations established on uncontaminated soils (Meharg & Macnair, 1992c). Given the large differential expression of classes of transcripts involved in post-translational and post-transcriptional regulation, probably regulating phosphate transporter enzyme production/degradation/activation, it is argued here that tolerance is attributable to these post-translational and post-transcriptional regulatory genes.

Phosphorus-sensitive transcriptome

Many of the transcripts responsive to P nutrition are regulated as expected, with many obviously involved in P nutrition and generally strongly up-regulated in response to P stress. For example, the four phosphate transport-related transcripts and the two SPX transcripts identified in H. lanatus in this study were induced by P stress, consistent with what has been observed generally in the literature, as exemplified by the phosphate-responsive transcriptome of white lupin (Lupinus albus) (O'Rourke et al., 2013). Other transcripts coding for proteins involved in P regulation or transport identified (Table S3) include glycerophosphodiester/phosphodiesterase, sucrose phosphate synthase, glucose pyrophosphorylase, purple acid phosphatase, phosphatase, and glucose-6-phosphate phosphate translocator, which were also differentially regulated when comparing P-stressed and P-replete lupin plants (O'Rourke et al., 2013). Also, it is now well established that P-responsive genes (1) give rise to transcriptome signal cascades and/or (2) are involved in these signal cascades (Chiou & Lin, 2011; Wu et al., 2013).

Tolerant/nontolerant transcriptome

Although we know that arsenate tolerance is under the control of a single gene (Macnair et al., 1992), the differential expression of transcripts between the T and N phenotypes represents a suite of genes, for which there are two potential explanations. The first is that there is an as yet unidentified gene that is itself controlling the transcript production of a host of genes differentially expressed in the T/N comparison. The second, which is not exclusive of the first, is that differences in metabolism resulting from differential function of a gene(s) may lead to feedback regulating transcripts of interrelated functions, such as the obvious impact of P starvation on P metabolism observed here. P stress perception in plants is known to induce a host of differential responses, such as tillering, root biomass production, arbuscular mycorrhizal regulation, and rhizosphere excretion of dicarboxylic acids to mobilize phosphate from iron minerals, and this will lead to differential regulation of a network of genes (Hill et al., 2006; Gojon et al., 2009; Chiou & Lin, 2011). Again, the arsenate tolerance gene has a range of pleiotropic consequences (shoot/root biomass allocation, HAPT suppression and arsenate tolerance itself), consistent with such a feedback and/or upstream regulator model.

A number of recent studies have found upstream regulators of low P adaption responses, with low P adaption having been postulated as the driver in maintaining the H. lanatus balanced polymorphism under study here (Meharg & Macnair, 1992c; Naylor et al., 1996). ALFIN-LIKE proteins have been implicated in regulating root hair growth in Arabidopsis under P stress (Chandrika et al., 2013). ALFIN-LIKE proteins are a small family of plant homeodomain (PHD)-containing putative transcription factors with a methylated histone residue-binding component, and ALFIN-LIKE 6 was shown to control the transcription of a range of genes involved in growth, in particular root hair growth (Chandrika et al., 2013). A homologue of this gene was identified in our H. lanatus transcriptome (isotig02053) and the translated isotig showed high similarity to Arabidopsis and even higher similarity to B. distachyon ALFIN-LIKE protein (Fig. S8). This transcript was equally highly expressed in all N and T plants and not differentially regulated between either P treatments or phenotypes, suggesting that it has a limited role in P nutrition and arsenate tolerance in H. lanatus.

Kinases were strongly differentially expressed in T/N transcript comparisons, with some kinases having recently been identified as being central in phenotypic differences in plant root response to P status (Gamuyao et al., 2012) as well as shown to be up-regulated in response to arsenate stress (Huang et al., 2012). A kinase that confers adaption to soil P stress in rice, PSTOL1, has been characterized (Gamuyao et al., 2012). This gene is an enhancer of early root growth and its over-expression leads to increased grain yields, hypothesized to be attributable to more efficient P capture as a result of larger root systems, with larger root systems characterizing the H. lanatus T phenotype here (Fig. 2). PSTOL1 homologues, although detected in the transcriptome, had no role in differential tolerance or P nutrition response in H. lanatus. However, Cbl-interacting kinases were differentially expressed between T and N phenotypes here; these are serine/threonine protein kinases, as is PSTOL1 (Gamuyao et al., 2012).

Auxins, and associated expansins, were also differentially expressed between phenotypes and these have a key role in root growth (Cosgrove, 1999), with differential shoot/root allocation in response to P fertilization being identified here as another pleiotropic effect of arsenate tolerance. Ubiquitins, also differentially expressed in T/N comparisons, which mark proteins for proteasome-mediated degradation, are thought to have a key role in regulation of plant SPX domains in response to P stress (Wu et al., 2013).

Transposons and retrotransposons, the class of isotigs most frequently and most strongly differentially expressed between phenotypes, are thought to play a role in post-transcriptional regulation, with silencing of TEs involving both transcriptional and post-transcriptional mechanisms (Okamoto & Hirochika, 2001; Mirouze & Paszkowski, 2011). In plants, retrotransposons are commonly known to be expressed under conditions of stress (Grandbastien, 1998). It is also thought that retrotransposon activation is sensitive to environment (Grandbastien, 1998; Mirouze & Paszkowski, 2011), further enhancing their candidacy for regulating stress responses, such as nutritional deficiencies. TEs can generate gene variation and functional changes (Gao et al., 2012) and are a source of small RNAs, and they have been implicated in gene regulation in both animals and plants (McCue & Slotkin, 2012). A role for small RNAs in regulation of P starvation in plants is emerging (Fang et al., 2009; Hsieh et al., 2009; Chiou & Lin, 2011; O'Rourke et al., 2013). Further to this, a potential role of a small RNA targeting transcripts involved in post-transcriptional/post-translational regulation leading to T and N phenotypes in H. lanatus is worth further investigation.

There were large and systematic differences in SNPs between T and N phenotypes. Ten of these were identified in transcripts annotated as protease, protease subunit and heat-shock proteins. We assume that these SNPs are of genomic origin, but without the genomic sequence of H. lanatus it is not possible to rule out the possibility that targeted mRNA editing may be involved in some of these cases. RNA editing, which was first identified in the cox2 (cytochrome c oxidase subunit) mRNA of Trypanosoma brucei, is thought to play an important role in organelles (plastids and mitochondria) of plants, with those identified typically involving a change of a specific C to U, but other changes can as yet not be ruled out (Grennan, 2011; Jiang et al., 2012). The role of RNA editing in plant plastids as well as nuclear-encoded RNA/mRNA remains to be further investigated by systematic sequencing of the plant genome (DNA) and transcriptome (cDNA), as has been described for identification of RNA editing sites in human studies (Ramaswami et al., 2012). While RNA-binding proteins of the pentatricopeptide repeat family, multiple organellar RNA editing factors and chloroplast ribonucleoproteins are known to be involved in RNA editing, further proteins remain to be identified (Tillich et al., 2009; Grennan, 2011; Takenaka et al., 2012). So it is noteworthy in this context that one of the homozygous SNPs identified between the T and N phenotypes in this study is in isotig05374, which shows 70% homology to an RNA recognition motif-containing protein/predicted ribonucleoprotein. Further to that, an exonuclease (isotig08248), which mediates RNA degradation (Stoppel & Meurer, 2012), was strongly up-regulated in the N phenotype. 
Thus, both the N/T gene expression findings and the SNPs obtained for the N/T phenotypes suggest that post-translational regulation of proteins via the ubiquitin–proteasome system plays an important role in determining the N and T phenotypes and, furthermore, indicates a potential role of post-transcriptional regulation (RNA degradation and a possible role of RNA editing). A causative upstream master regulatory gene, inducing post-translational and maybe also post-transcriptional events of consequence for arsenate resistance and P uptake efficiency in these plants, remains to be identified, and potential involvement of small RNAs should be investigated in this context.
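The comparison of genomic DNA with cDNA proposed above as a way of separating genomic SNPs from RNA editing can be sketched as follows. This is a minimal illustration under stated assumptions, not a method used in this study: the function name, the site labels and the base calls are hypothetical, one base call per site is assumed for each of the DNA and cDNA data sets, and bases are assumed to be reported on the sense strand of the transcript. A C-to-U edit in RNA is read as T in cDNA, so a C in DNA paired with a T in cDNA is flagged as the typical organellar edit.

```python
# Illustrative sketch only; site labels and base calls are hypothetical.
def editing_candidates(dna_calls, cdna_calls):
    """Return sites where the cDNA base differs from the genomic base.

    dna_calls and cdna_calls map a site label to a single base, both given
    on the transcript's sense strand (an assumption of this sketch).
    Sites absent from the cDNA calls are skipped. C>T mismatches are labelled
    'C-to-U' because U in RNA is read as T in cDNA; all others are 'other'.
    """
    candidates = {}
    for site, dna_base in dna_calls.items():
        cdna_base = cdna_calls.get(site)
        if cdna_base is None or cdna_base == dna_base:
            continue  # no cDNA coverage, or cDNA agrees with the genome
        kind = "C-to-U" if (dna_base, cdna_base) == ("C", "T") else "other"
        candidates[site] = (dna_base, cdna_base, kind)
    return candidates
```

A site at which the T and N phenotypes differ in cDNA but carry the same genomic base would appear in this output for one phenotype only, which is the pattern expected of phenotype-specific editing rather than of a genomic SNP.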

With respect to arsenate tolerance, the character used to screen the phenotypes under study, one gene with a putative role in arsenic transport/metabolism, in addition to phosphate transporters and their regulators, that was differentially regulated between phenotypes was an arsB-like gene. The alignment of isotig09604 with an arsB-like protein can be seen in Fig. S9. arsB is widely present in arsenic-resistant bacteria, where its role is as an arsenite efflux channel (Yang et al., 2012), although in plants this class of aquaglyceroporins is also involved in silicic acid transport (Zhao et al., 2009). Logoteta et al. (2009) reported that differential efflux of arsenite was not found between H. lanatus T and N phenotypes. A protein, HLASR (H. lanatus arsenate reductase), has been identified in H. lanatus to have a constitutive, but not an adaptive, role in the metabolism of arsenate, as an arsenate reductase (Bleeker et al., 2006). It was thought that the product of this HLASR gene only had a secondary role in arsenic metabolism and that its primary role was homologous to that of CDC25 phosphatases, which activate cyclin-dependent kinases in Arabidopsis, which are involved in cell cycle regulation. HLASR is also thought to have a role in GSH (reduced glutathione) oxidation (Bleeker et al., 2006). An exact protein match (isotig19077) to this Cdc25-like H. lanatus ASR, the only gene sequence previously published for this species, was found in all 10 N and 10 T transcriptomes (Fig. S7), and was shown not to be differentially regulated, confirming that it has no adaptive role in tolerance.

We identified 87 transcripts whose expression significantly differed between the T and N phenotypes, and 19 transcripts (17 with functional annotation) with consistent SNPs (36 SNPs in total) in all 10 T and N genotypes (Table S5). It is noteworthy that consistent SNPs and significant gene expression changes between N and T phenotypes are aggregated in transcripts with regulatory functions (Figs 4, 5). Differential post-translational and/or post-transcriptional regulation (involving ubiquitin, proteases, kinases, methylation, transposons, retrotransposons and RNA-binding proteins), potentially of HAPT proteins or proteins acting on HAPT, therefore appears to define the N/T phenotypes. Whether the identified SNPs are all genomic SNPs or whether in some cases mRNA editing may be involved remains to be elucidated. A master regulatory gene, potentially to be found within our list of target genes, or possibly in the form of an as yet unidentified small RNA, producing the observed effect on genes involved in post-translational events, including protein degradation via the ubiquitin/proteasome complex and potentially also post-transcriptional events mediated by RNA-binding proteins, is still to be identified. This characterization of the genetic consequences of the P response polymorphism in H. lanatus provides an unparalleled insight into the signal cascades, optimized under natural selection, involved in P nutrition and has major consequences for understanding how plants respond to P nutrition and adapt to arsenate in their environment. We anticipate that this as yet unknown master regulatory gene and its downstream targets, which we have already identified, will be of significant consequence for future study and breeding of P-efficient forage plants and cereal crops.

Furthermore, the transcriptome characterized here will enable future transcriptomic studies on arsenic mine-adapted plants (Macnair et al., 1992) to be more focused, as transcripts identified in this study will be the starting point for looking at further selection that may occur on mine spoil soil. Key to this is the fact that the identification of transcripts involved in arsenate tolerance within a polymorphic population will enable the identification of confounding genes that have no involvement in arsenate tolerance, but appear in comparisons of mine and non-mine population transcriptomes as a result of other selection pressures.


We would like to thank Prof. Dr Maarten Koornneef (MPIPZ, Cologne, Germany) for facilitating the sequencing, and for hosting A.A.M. as a guest at MPIPZ.