Comprehensive gene expression atlas for the Arabidopsis MAP kinase signalling pathways


Author for correspondence:
László Bögre
Tel:+44 1784 443407
Fax:+44 1784 414224


  • Mitogen activated protein kinase (MAPK) pathways are signal transduction modules with layers of protein kinases having c. 120 genes in Arabidopsis, but only a few have been linked experimentally to functions.
  • We analysed microarray expression data for 114 MAPK signalling genes represented on the ATH1 Affymetrix arrays; determined their expression patterns during development, and in a wide range of time-course microarray experiments for their signal-dependent transcriptional regulation and their coregulation with other signalling components and transcription factors.
  • Global expression correlation of the MAPK genes with each of the represented 21 692 Arabidopsis genes was determined by calculating Pearson correlation coefficients. To group MAPK signalling genes based on similarities in global regulation, we performed hierarchical clustering on the pairwise correlation values. This should allow inferring functional information from well-studied MAPK components to functionally uncharacterized ones. Statistical overrepresentation of specific gene ontology (GO) categories in the gene lists showing high expression correlation values with each of the MAPK components predicted biological themes for the gene functions.
  • The combination of these methods provides functional information for many uncharacterized MAPK genes, and a framework for complementary future experimental dissection of the function of this complex family.


Completion of genome sequences for a rapidly expanding list of organisms has dramatically changed the experimental approaches in biological sciences; it has permitted examination of thousands of different variables across most components of living organisms. However, extracting information from the vast amount of data and placing it into organized units, networks and pathways remains a challenge. This is so even for extensively studied organisms such as Arabidopsis, where only half of the c. 27 000 genes have been functionally annotated based on sequence similarities to known genes, and, among these, experimental evidence concerning their function is available only for 11%. Therefore, the functional elucidation of unknown genes is one of the major challenges in plant biology (Saito et al., 2008). Among the different approaches employed to elucidate gene functions, genetics has limitations, including lack of phenotypes owing to genetic redundancy or compensatory mechanisms in the regulatory systems, and pleiotropic phenotypes unrelated to the direct gene function. Methods in biochemistry only enable the study of components in isolation rather than in connected networks. In signalling pathways no single protein or gene is responsible for any individual biological response, but upon stimulation, signalling proteins display collective dynamic behaviour that none of the individual modules can exhibit in isolation. Thus, signal transduction can be viewed as a gradient of quantitative information propagated throughout a dense protein network from an input through individual receptors to diverse biological outputs. Therefore it is important to find methods that take into account the complex behaviour of biological systems.

In plant research, there is a paucity of data at the proteome level, while plenty of genome-wide gene expression data are available. Although it is generally accepted that signalling processes are primarily regulated at the posttranscriptional level, there is a significant regulatory input at the gene expression level.

Hints of function can be gathered by comparing expression patterns of genes with unknown functions with patterns of characterized plant genes. Gene-to-gene correlations are measured, for example, by Pearson's correlation, which provides links between genes with similarities in expression pattern across multiple datasets (Aoki et al., 2007). Recently, a graphical Gaussian model has been applied to the Arabidopsis gene network to infer coregulation between gene pairs while also taking into account the behaviour of other genes in the network (Ma et al., 2007). The large number of available datasets allows coexpression analysis across many experimental conditions and tissues, with the effects of individual laboratories and other variations being effectively diminished. Such coexpression analysis has uncovered gene regulatory mechanisms in model organisms such as Escherichia coli and yeast (Stuart et al., 2003). Recently, accumulation of Arabidopsis microarray data has facilitated genome-wide inspection of gene coexpression profiles in this model plant (Aoki et al., 2007), and these methods are increasingly used for gene identification and functional predictions (e.g. in cellulose biosynthesis; Persson et al., 2005), reconstruction of metabolic pathways (Wei et al., 2006; Saito et al., 2008), and identification of transcription factors regulating pathways and processes (Hirai et al., 2007; Sonderby et al., 2007; Beekwilder et al., 2008).

Interpretation of functional genomics data is greatly facilitated by a structured description of known biological information at different levels of granularity. The Gene Ontology (GO) project aims at capturing the increasing knowledge on gene function in a controlled vocabulary applicable to all organisms (Ashburner et al., 2000). Gene ontology consists of three hierarchically structured vocabularies that describe gene products in terms of their associated biological processes, molecular functions and cellular components. Gene products may be annotated to one or several nodes in each hierarchy. Coexpressed genes can be grouped by functional annotation through a gene-set enrichment analysis (GSEA; Subramanian et al., 2005). There are online tools for GSEA that enable researchers to reduce datasets consisting of thousands of genes into a limited number of biological processes, such as the Biological Networks Gene Ontology tool (BiNGO; Maere et al., 2005). The increasing complexity of functional genomics data also drives the development of methods and tools for data integration and visualization. Cytoscape is an open-source software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other functional genomics data (Shannon et al., 2003). The Cytoscape platform supports the development of plugin tools that extend the core functionality. BiNGO is a plugin for Cytoscape that assesses the overrepresentation of GO categories in a set of genes (Maere et al., 2005).

Here we apply various approaches for genome-wide gene expression analysis to learn about the functions of the mitogen-activated protein kinase (MAPK) pathways in Arabidopsis. The MAPK pathways are conserved signal transduction modules composed of distinct combinations of at least three families of protein kinases: MAPKKK (MAP3K/MEKK), MAPKK (MKK/MEK), and MAPK (MPK), which are activated in a cascade by sequential phosphorylation. Arabidopsis MAPK signalling components have been annotated based on sequence conservation (Jonak et al., 2002; MAPkinase-group, 2002). There are c. 120 MAPK signalling components annotated, but few have an experimentally verified function.

Within the plant MAPK layer, two structural subtypes can be distinguished, those containing a TEY phosphorylation motif in the activation loop (TEY subtype) and those containing a TDY motif (TDY subtype). Phylogenetic analysis further classified TEY subtype into related A, B, C groups, with the TDY subtype forming a more distant group D (MAPkinase-group, 2002). MPK21–23 form a separate group related both to MAPKs and cyclin-dependent kinases (CDKs), and it is not clear whether they are part of MAPK signalling pathways.

The MKKs lie upstream of the MAPKs. They are the least diverse in the cascades and were therefore suggested to act as points of intersection and integration between converging signals from upstream MAPKKKs and divergent outputs through downstream MAPKs. Sequence analysis has placed the plant MKKs into four groups (A–D; Jonak et al., 2002; MAPkinase-group, 2002). Only a single MKK has been identified in the green algae Chlamydomonas and Ostreococcus, indicating that the complexity in MAPK signalling pathways in higher plants might have evolved from a single pathway (Merchant et al., 2007; Palenik et al., 2007).

The MAPKKKs form the largest family of MAP kinase components, with the Arabidopsis genome encoding 60–80 putative members (Jonak et al., 2002; Champion et al., 2004), but very few members of this family have assigned biological functions. They contain different potential regulatory domains outside the catalytic domain, which means they can be regulated by a variety of upstream signals and then selectively activate MKKs. Sequence analysis of the protein kinase catalytic domain shows that Arabidopsis MAPKKKs fall into three main classes: MEKKs, RAF-like and ZIK-like. RAF1 is a human oncogene, and was found to mediate a variety of mitogenic stimuli to MAPK signalling pathways. The ZIK group of plant MAPK kinase kinases were originally named after ZIK1, a putative MAPKKK that is associated with the MAPK regulator ZR1 (Wrzaczek & Hirt, 2001). This group of kinases consists of 11 members and is also called WNK (with no lysine kinase) because of its similarity to a recently found novel gene family involved in regulating the ion permeability of epithelia in mammals (Xu et al., 2005). However, it is important to note that for only a couple of MAPKKKs and none of the RAF and ZIK-related kinases has it been demonstrated that they actually function as MAPKKKs. Thus, we refer only to the MEKK-like genes as MAPKKKs (MAPKKK1–20) and use the nomenclature of RAF (RAF1–48) and ZIK (ZIK1–10) for the two other groups (Jonak et al., 2002).

Here we present a comprehensive transcriptional analysis of 114 potential Arabidopsis MAPK signalling genes represented on the ATH1 Affymetrix microarray using genome-wide gene expression data publicly available. The combination of expression analysis over a large set of well-defined experimental conditions with analysis of GO annotation of genes having correlated gene expression patterns allowed us to suggest involvement in specific functions for a large number of uncharacterized MAPK genes. The suggested roles for specific signalling components and modules can be further tested with targeted experiments.

Materials and Methods

Search for MAPK signalling genes with significant regulation during stress responses

All expression data were downloaded from the NASC webpage. Different datasets were grouped into various functional biological categories, where common gene responses may be expected. For each treatment the average signal was calculated (average of two chips after response to environmental stresses; average of three chips after response to pathogen infection). These processed absolute signals (Xi) were centred across each time-course experiment after stress treatment, and for each gene the normalized signal (Xnorm) at time i was determined as follows:

Xnorm = (Xi − Xaverage)/SD

Values of normalized signals were imported into the software GeneMaths (version 2.0; Applied Maths, Sint-Martens-Latem, Belgium).
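The normalization above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function name and the example signals are hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, stdev


def normalize_time_course(signals):
    """Centre and scale one gene's averaged signals across a time course.

    Implements Xnorm = (Xi - Xaverage) / SD for every time point i.
    Assumption: SD is the sample standard deviation over the time course.
    """
    avg = mean(signals)
    sd = stdev(signals)
    if sd == 0:
        # A flat profile carries no variation; report zeros instead of dividing by 0.
        return [0.0 for _ in signals]
    return [(x - avg) / sd for x in signals]


# Hypothetical averaged signals for one gene at four time points.
example = [100.0, 150.0, 300.0, 250.0]
normalized = normalize_time_course(example)
```

After this transformation every gene profile has mean 0 and unit variance across the time course, so genes with very different absolute signal levels can be compared by the shape of their response.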

To identify MAPK signalling genes that show significant regulation and coregulation with other putative regulatory genes in time-course experiments after stress treatment, a subset of genes including protein kinases, protein phosphatases and transcription factors represented by probes on the ATH1 array was used. As one criterion for regulation, the fold change (FC) in expression was calculated by comparing for each time-point the average signal to the appropriate control sample at time 0 (in each dataset or time course). Only those genes that were expressed in at least one experiment (≥ 1 present call per dataset or time course) and that showed at least a threefold change in expression compared with the appropriate control sample were used for further cluster analyses. Hierarchical cluster analysis of variance-normalized signals was performed to identify coregulation of selected genes.
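The selection criterion can be illustrated with a short Python sketch. The function names and example values are hypothetical, and two points are assumptions not stated in the text: that a threefold decrease counts in the same way as a threefold increase, and that detection calls are available as simple true/false flags.

```python
def max_fold_change(time_course, control):
    """Largest change, in either direction, relative to the time-0 control.

    Assumes strictly positive signals; down-regulation is expressed as the
    inverse ratio so that a threefold decrease also yields 3.0.
    """
    ratios = [x / control for x in time_course]
    return max(max(r, 1.0 / r) for r in ratios)


def passes_filter(time_course, control, present_calls, min_fc=3.0):
    """Keep a gene if it has >= 1 present call and >= min_fc fold change."""
    return any(present_calls) and max_fold_change(time_course, control) >= min_fc


# Hypothetical gene: control signal 100, fourfold induction at the second time point.
selected = passes_filter([120.0, 400.0, 90.0], 100.0, [True, False, False])
```

Genes passing this filter would then be carried forward to the hierarchical cluster analysis of the variance-normalized signals.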

Global expression correlation analysis

The Nottingham Arabidopsis Stock Centre's microarray database (NASCArrays) was the source of data for the generation of scatter plots and the calculation of the relative correlation value. All Affymetrix ATH1 GeneChip array data held at NASC are normalized using the MASuite 5.0 Scaling Protocol Algorithm to exclude the top 2% and bottom 2% of signal intensities before the mean is calculated. Expression data from individual experiments were normalized by scaling to a mean value of 100 for each slide; no other modification or scaling was carried out before calculating Pearson correlation coefficients unless otherwise specified. The ‘Superbulk gene’ file was downloaded (June 2005). The file consisted of nearly 1800 hybridizations (see the Supplementary Material, Table S1), each with expression level measurements for over 22 500 genes represented on the ATH1 array. The arrays are derived from varied experiments, tissues, conditions, treatments and genetic backgrounds, providing the diversity for expression correlation analysis. A cut-off value of 1 (all values < 1 were discarded) was applied to the data before performing the analysis. A few slides (< 50 from three separate experiments) that used RNA from species other than A. thaliana, or that involved pre-amplification of the RNA used as the source for the hybridization, were excluded. In the Supplementary Material, Table S1 lists all GeneChip experiments used with a NASCArrays experiment reference number, a short description of the experiment and a hyperlink to the NASC webpage with detailed information on each experiment, such as conditions and number of replicate slides used. This name is the same as in the ‘Superbulk gene’ file. For the correlation analysis, no further array normalization or processing of replicates was performed. The correlation analysis was performed essentially as described by Toufighi et al. (2005) by calculating the Pearson correlation coefficient for each gene pair from a two-gene scatterplot in the linear space using standard linear regression analysis. Briefly, for two sets of expression values (where X = {X1, X2, ... , Xn} and Y = {Y1, Y2, ... , Yn}), the Pearson correlation coefficient is defined as

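$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$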

and ranges from 1, for perfect correlation, to −1 for perfect anticorrelation (Toufighi et al., 2005). The program used for automation of the correlation calculation will be described in detail elsewhere (L. Mizzi & P. Morandini, unpublished).
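As an illustration of the calculation, the standard Pearson coefficient for one gene pair can be written as follows. This is a minimal sketch, not the unpublished program referred to above; the function name and the toy expression vectors are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


# Toy data: the second vector is exactly twice the first, the third is reversed.
r_same = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_opposite = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Applied to each MAPK gene against every other gene on the array, across all hybridizations, this yields one row of the correlation matrix used in the subsequent analyses.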

Generation of compressed gene expression correlation heatmaps

Calculations were carried out for all 114 MAPK genes against the 21 692 genes uniquely represented by a probe on Affymetrix's ATH1 GeneChip array. For each gene pair, the Pearson correlation coefficients were imported into GeneMaths (version 2.01) for visualization. Because calculating Pearson correlation requires normally distributed data, some expression datasets required logarithmic transformation (such as those from developmental stages, pollen or cycloheximide treatment, which are characterized by extremely high expression values for some genes). Hence, for consistency, all other datasets were also log-transformed and Pearson correlation coefficients were calculated for both linear and log-transformed data. Unless otherwise specified, all expression analyses and correlations were carried out using data without log transformation. To identify the expression relationships of the 114 MAPK cascade genes and to group these genes further based on correlation of expression across a wide range of random experiments, hierarchical clustering analysis was performed and a new matrix was calculated by an unweighted pair group method using arithmetic averages (UPGMA, large N/p; Eisen et al., 1998) as the clustering algorithm. A compressed correlation heatmap for all 114 MAPK genes against a list of 1612 transcription factors represented by specific probes on Affymetrix's ATH1 GeneChip array was generated by the same method.
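The idea of clustering genes on their correlation profiles with UPGMA can be sketched as below. This is an illustrative pure-Python version, not the GeneMaths implementation; the distance measure (1 minus the Pearson correlation between two profiles), the function names and the toy profiles are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upgma(profiles):
    """Average-linkage (UPGMA) clustering; returns merges as (left, right, distance)."""
    sizes = {name: 1 for name in profiles}  # cluster id -> number of member genes
    dist = {frozenset(p): 1.0 - pearson(profiles[p[0]], profiles[p[1]])
            for p in combinations(profiles, 2)}
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = a + "+" + b
        for c in sizes:
            if c not in (a, b):
                # UPGMA update: size-weighted mean of the two old distances.
                dist[frozenset((merged, c))] = (
                    sizes[a] * dist[frozenset((a, c))]
                    + sizes[b] * dist[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        merges.append((a, b, dist[frozenset((a, b))]))
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return merges


# Toy correlation profiles (made-up values) of three genes against four target genes.
toy_profiles = {
    "gene_a": [0.9, 0.8, -0.2, -0.5],
    "gene_b": [0.8, 0.9, -0.1, -0.4],
    "gene_c": [-0.6, -0.4, 0.9, 0.7],
}
merge_history = upgma(toy_profiles)
```

In this toy case the two genes with similar profiles are joined first and the dissimilar gene is attached last, which is the behaviour exploited when grouping MAPK genes by their global correlation patterns.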

GO significance analysis of coregulated gene sets by BiNGO

The highest correlating genes to each of the 114 MAPK genes represented by a probe on the ATH1 GeneChip array were identified (cut-off values between ≥ 0.5 and ≥ 0.75) and then used to ascertain whether specific GO categories were significantly over-represented using BiNGO (Maere et al., 2005).
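Over-representation of a GO category in a coexpressed gene list is commonly assessed with a hypergeometric test, which is one of the tests BiNGO offers. The sketch below shows that calculation under the assumption that this test was used (the text does not state which option was chosen); the function name and the numbers in the example are hypothetical, and the multiple-testing correction that BiNGO can also apply is omitted.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Probability of seeing at least k annotated genes in a list of n genes.

    N: genes in the reference set (e.g. all genes on the array)
    K: reference genes annotated to the GO category
    n: genes in the coexpressed list
    k: genes in the list annotated to the GO category
    """
    upper = min(n, K)
    total = comb(N, n)
    # Upper tail of the hypergeometric distribution, summed with exact integers.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical example: 2 of 2 listed genes carry an annotation held by 2 of 4 genes.
p_small = hypergeom_enrichment_p(2, 2, 2, 4)
```

A small p-value indicates that the GO category occurs in the coexpressed list more often than expected by chance given its frequency among all genes on the array.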

Results and Discussion

Developmental regulation of MAPK signalling genes

First, the absolute expression levels of MAPK signalling genes, as measured by the Affymetrix microarray, were analysed and their developmental regulation visualized. The AtGenExpress project represents a developmental series (Atlas) of gene expression profiles from various developmental stages and tissues (NASCARRAYS-149-155, Schmid et al., 2005). In addition, NASCARRAYS-367 was included, which is a gene expression map of tissues from Arabidopsis root, derived from dissection and cell sorting (Birnbaum et al., 2003). The combined data were organized to provide a visual overview of the expression levels of individual MAP-kinase-related genes in various organs and developmental stages. Related tissue types were grouped: flowers and pollen; leaves; roots; seedlings, shoots and stem; siliques; seeds (see the Supplementary Material, Fig. S1). In Fig. S1 each bar represents the average of the absolute detected signal values for the gene in question from a specific tissue sample calculated from three replicate experiments. The graphical representation of the data in Fig. S1 provides an overview of expression levels and some characteristics of tissue specificity, but does not allow the expression level in specific samples to be determined. Therefore the data were also organized in an Excel format that is searchable and displayable for all the tissue types and all the sample points (see the Supplementary Material, Data S1).

It is notable that MAPK signalling components show a wide range of expression levels and distinct regulation during plant development. All components are expressed in at least one tissue sample above a detectable level (above c. 20 units in Affymetrix chips), indicating that none of these genes is a pseudogene. Typically, the expression levels are in the range of hundreds of units, and no obvious differences in average expression levels are apparent among the different MAPK signalling tiers (Fig. S1). To provide a global overview of the tissue-specific expression of MAPK signalling components, we normalized their expression across the AtGenExpress developmental series as fold changes, and performed hierarchical clustering analysis of variance-normalized signals to identify coregulation among genes. The data were visualized as a heat map (Fig. 1). Although there are clear overlaps in the expression of MAPK signalling genes among organs, the clustering broadly assigned MAPK signalling genes to typical expression patterns in shoot apex, flower, leaf, pollen, root and seed. Consistent with their related developmental origin, the largest overlap in expression is between leaf and flower samples, particularly those of leaf vs sepal and petal. Large numbers of MAPK signalling genes were highly expressed in pollen, some of which are also abundant in seeds.

Figure 1.

Mitogen-activated protein kinase (MAPK) gene atlas during Arabidopsis development. Expression data for MAPK signalling genes in different stages of development in flowers, pollen, leaves, roots, seedlings, shoots, stems, siliques and seeds were normalized for fold changes and hierarchical clustering was applied to group genes based on similarities in their expression, represented as a heat map. Samples were numbered as follows. Flowers: 1, flowers stage 9; 2, flowers stage 10/11; 3, flowers stage 12; flowers stage 12, 4, sepals; 5, petals; 6, stamens; 7, carpels; 8, flowers stage 15; flowers stage 15, 9, pedicels; 10, sepals; 11, petals; 12, stamens; 13, carpels; flowers stage 12, 14, clv3–7; 15, lfy-12; 16, ap1–15; 17, ap2–6; 18, ap3–6; 19, ag-12; 20, ufo-1; 21, pollen; 22, flower 28 d long day. Leaves: 23, cotyledon 7 d; 24, leaves 1–2, 7 d; 25, rosette leaf #4, 10 d; 26, rosette leaf #4, 10 d gl1; 27, rosette leaf #2, 17 d; 28, rosette leaf #24, 17 d; 29, rosette leaf #6, 17 d; 30, rosette leaf #8, 17 d; 31, rosette leaf #10, 17 d; 32, rosette leaf #12, 17 d; 33, rosette leaf #12, 17 d, gl1; 34, leaf 7 petiole, 17 d; 35, leaf 7 proximal half, 17 d; 36, leaf 7 distal half, 17 d; 37, senescing leaf 35 d; 38, cauline leaf, 21 d; 39, vegetative rosette, 7 d; 40, vegetative rosette, 14 d; 41, vegetative rosette, 21 d; 42, leaf, 15 d long day. Root: 43, 7 d; 44, 17 d; 45, 15 d long day; 46, 8 d Murashige and Skoog (MS) agar; 47, 8 d MS agar 1% sucrose; 48, 21 d MS agar; 49, 21 d MS agar 1% sucrose; 50, lateral cap; 51, endodermis; 52, tip; 53, elongation zone; 54, hair zone; 55, style; 56, epidermis; 57, endodermis + cortex. Seedling: 58, seedling 7 d green parts; 59, rosette 21 d; 60, rosette 22 d; 61, rosette 23 d; 62, seedling green part 8 d; 63, seedling green part 8 d, 1% sucrose; 64, seedling green part 21 d; 65, seedling green part 21d, 1% sucrose; 66, hypocotyl 7 d. 
Shoot apex: 67, shoot apex + young leaves 7 d; 68, shoot apex 7 d; 69, shoot apex 14 d; 70, stem 2nd internode; 71, stem 1st node, shoot apex, inflorescence 21 d; 72, WT; 73, clv3-2; 74, lfy-12; 75, ap1-15; 76, ap2-6; 77, ap3-6; 78, ag-12; 79, ufo-1. Siliques and seeds: 80, 8-wk-old, stage 3; 81, stage 4; 82, stage 5; 83, stage 6; 84, stage 7; 85, stage 8; 86, stage 9.

The MAPK signalling components in plants have undergone a large expansion through gene duplication (Hamel et al., 2006). From the overall transcription pattern during development, it appears that gene duplication in many instances has been followed by changes in the expression level of paralogous pairs, while a similar regulation has been retained. Hence, for example, MPK2 is generally more highly expressed than MPK1, MPK7 than MPK14, and MKK2 than MKK1 (Fig. S1).

Paralogous gene pairs can also differ in the level of transcriptional control, with one showing constitutive expression, while the other is under strong transcriptional regulation. Examples of such gene pairs are MPK6–MPK3, MPK4–MPK11, MKK7–MKK9, MAPKKK8 (MEKK1)–MAPKKK10 (MEKK3), Raf7–Raf8/9 and ZIK1–ZIK2, where in each case the first member of the paralogous gene pair is constitutive, while the second is transcriptionally regulated (Fig. S1). Among these gene pairs, we know that MPK6 and MPK3 are both implicated in various biotic and abiotic stress signalling pathways and have redundant essential functions during plant development, as revealed by the additive embryo lethal or haplo-insufficiency phenotypes of their double mutant combination (Wang et al., 2007, 2008). MPK6 is constitutively present in many tissues and is possibly part of a primary response by being activated at the protein level upon stimuli. By contrast, MPK3 is highly regulated at the transcriptional level, and might reinforce the primary response detected by MPK6. A similar diversification of gene function has been well studied in the yeast cell cycle, where the CLN3 G1 cyclin is constitutively expressed and regulated by nutrient availability at the posttranslational level. CLN3 can switch on the expression of further dynamically regulated, structurally and functionally related G1 cyclins, CLN1 and CLN2, and together these three G1 cyclins are required for entry into the cell cycle (Futcher, 2002). Although signalling pathways are primarily regulated at the protein level, transcriptional regulation can thus represent a strong attribute towards function in certain components.

Transcriptional regulation of MAPK signalling genes during cell proliferation

In animal and yeast cells MAPK pathways are pivotal for the entry into cell proliferation and for the regulation of the cell cycle, but in plant cells only the role of a specific MAPK pathway in cytokinesis has been demonstrated (Takahashi et al., 2004). We have investigated the transcriptional regulation of MAPK signalling genes in Arabidopsis cultured cells during the synchronized cell cycle after release from an S-phase block imposed by the DNA polymerase inhibitor aphidicolin, during re-entry into the cell cycle after sucrose starvation, and during the logarithmic to stationary growth phase (Menges et al., 2003). We found a cluster of MAPK signalling genes with clear mitotic regulation, including the previously identified MKK6 and MPK13, and two upstream MAPKKKs, MAPKKK2 and MAPKKK12 (see the Supplementary Material, Fig. S2; Melikant et al., 2004; Takahashi et al., 2004). Three other MAPK signalling genes within this cluster, MPK8, MAP4K8 and MAPKKK1, had late S, early G2-specific expression. In agreement with being regulated during cell proliferation, members of the mitotic cluster are strongly expressed in the shoot meristem (Fig. 1). A large cluster of MAPK signalling genes was upregulated in sucrose starvation and in stationary phase culture, another cluster in S-phase, and a further cluster during re-entry into the cell cycle after sugar refeeding (Fig. S2).

Identification of MAPK signalling genes with significant transcriptional regulation in response to stresses

To gain insights into the signal-dependent transcriptional regulation of MAPK signalling components, and to identify further signalling components, such as receptor-like kinases, other protein kinases, protein phosphatases and transcription factors that are coregulated with MAPKs in specific stress experiments, we searched for genes within these gene families having significant (at least threefold) regulation in normalized time-course experiments in response to well-defined stress stimuli (see the Materials and Methods section and the Supplementary Material Table S1, Data S2). These microarray experiments included cold stress, osmotic stress, salt stress, drought stress, genotoxic stress, oxidative stress, UV-B stress, wounding, heat stress, Pseudomonas syringae, Phytophthora infestans, Botrytis cinerea and elicitors (LPS, HrpZ, Flg22 and NPP1), and are all publicly available (Table S1, Data S2). Changes in gene expression, visualized as heat maps, are shown in Fig. 2 for the UV-B experiment, in Fig. 3 for Pseudomonas, in the Supplementary Material Fig. S3 for fungal pathogens and elicitors, and in Fig. S4 for a variety of abiotic stresses. The results of the analysis for all the abiotic stress treatments leading to significant transcriptional upregulation, transient upregulation or downregulation of specific MAPK components are summarized in Table 1.

Figure 2.

Regulation of mitogen-activated protein kinase (MAPK) genes by UV-B exposure. Normalized gene expression levels are visualized as heat maps. Treatments as described by NASCARRAYS-144.

Figure 3.

Transcriptional regulation of mitogen-activated protein kinase (MAPK) signalling genes by bacterial pathogens. Normalized gene expression levels are visualized as heat maps. Treatments as described by NASCARRAYS-120. DC3000, Pseudomonas syringae pv. tomato DC3000 (virulent); AvrRpm1, P. syringae pv. tomato avrRpm1 (avirulent); hrcC-, P. syringae pv. tomato DC3000 hrcC-(mutant for transferring the virulent factors); P. syringae (nonhost), P. syringae pv phaseolicola.

Table 1.  Mitogen-activated protein kinase (MAPK) signalling genes transcriptionally regulated during abiotic stress treatments
ShootUpMPKKK18, Raf7/31/35, ZIK4MPKKK14/18/19, Raf6/12/35 MPK7MPKK18, Raf12/35MPK7ZIK4MPK7MPKKK8, ZIK4 MPK7MPKKK19, ZIK4 MPK11Raf8,/25, ZIK4Peak 1MPKKK14, Raf39, Zik3/5 MKK9MPKKK14, Raf8/27/43MKK9 MPK3/11
PeakMPKKK14/20MPK11MPKKK8/12/17, ZIK4MKK4 MPKKK14/18, Raf32 MKK4/9   Peak 2MPKKK5/19, Raf43 MKK4MPKKK15, Raf13
DownMPKKK1, Raf39, ZIK2/3/5 MPK17Raf13/30/32/39, ZIK5MPKKK14, Raf30MKK9 MPKKK14, Raf39 MKK9MPKK14, Raf32/39 MKK9MPKKK13/14, Raf32/39, ZIK2/3 MKK9 MPK3/5/13Peak 3MPKKK8, Raf22 MKK1/2MPK7ZIK4MPK7
RootUpMPKKK18, Raf2/8/30/32/35, ZIK9 MPK11MPKKK13, Raf6/35/38, ZIK3MPKKK5/13/17, Raf35/46 MPK11   MPK12   
Peak MPKKK17/18MPKKK8/14/15MPKKK15/19, Raf30MPKKK19Raf30    
DownMPKKK14MPKKK15Raf34MPK10  MPK13   

In these analyses a large number of MAPK components were identified as transcriptionally responsive to specific stress treatments, only a couple of which have been previously reported. The following general conclusions can be drawn. Transcriptional stress responses are very different in the shoot and root samples (Table 1, and the Supplementary Material, Fig. S4). Some MAPK signalling genes respond specifically to selected abiotic stresses (e.g. MPK11 is transiently induced only by cold, while its induction is sustained by oxidative stress). MKK4 is transiently induced by osmotic and drought stress in shoots. There are MAPK signalling genes that respond to a variety of stresses; for example, MPK7 is upregulated by osmotic, salt, drought, genotoxic, UV-B and wound stresses, while MKK9 is downregulated by salt, genotoxic, oxidative and heat stresses in the shoot but induced by salt stress in the root. MPK13 expression was found to correspond to meristematic tissues during development (Fig. 1 and the Supplementary Material, Figs S1 and S2), and in agreement with the notion that stress halts proliferation, a number of stress treatments downregulate MPK13 expression in the root (Table 1, Fig. S4). UV-B treatment leads to the transcriptional induction of distinct sets of MAPK components in a cascade, with MAPKKK14, MKK9 and MPK3 being early-induced, MAPKKK19, MKK4, MPK5 and MPK11 mid-induced, and MAPKKK8, MKK1, MKK2 and MPK7 late-induced (Fig. 2). Encounters with different bacterial and fungal pathogens were also found to induce the transcription of specific sets of MAPK components: MKK4 and MPK11 are induced by nonhost P. syringae; MAPKKK15, Raf13, Raf43 and MPK17 are rapidly (2 h post-inoculation) induced by both host and nonhost pathogens; and Raf31, MAPKKK19, MAPKKK20 and MKK9 are induced only after 6 h by virulent, but already after 2 h by avirulent, pathogen attack. 
Raf27, MAPKKK14, MAPKKK17, MAPKKK18 and MPK7 are late induced by both virulent and avirulent Pseudomonas strains (Fig. 3). Pathogens are sensed by plants through the recognition of pathogen-associated molecular patterns (PAMPs), and a MAPK pathway involving MAPKKK8 (MEKK1), MKK4 and 5, and MPK3 and 6 has been proposed as a cascade based on the kinase activations (Asai et al., 2002). We find that different PAMPs can trigger the transcriptional activation of a largely similar set of MAPK components with rapid kinetics, including MAPKKK5, 15, 19, MKK4 and 6 and MPK3 and 11 (Fig. S3). Consistent with the induction by virulent, avirulent, nonhost and hrc-mutant strains, Raf43, MAPKKK15, MKK4 and MPK11 are also induced by various elicitors (Fig. S2). Infection by the fungal pathogen Phytophthora infestans led to the rapid transcriptional induction of MAPKKK19, MKK9 and MKK4, while Botrytis cinerea infection led to the rapid transcriptional induction of MAPKKK18, 19 and 20, Raf43, ZIK2, 8, suggesting that signalling in response to bacterial and fungal pathogen attack is distinct (Fig. S3). Whereas a largely overlapping activation of MAPK activities was found for MPK3, MPK4 and MPK6 by numerous biotic and abiotic stress stimuli (Nakagami et al., 2005), a surprisingly specific stress-activation pattern of a number of distinct MAPK signalling genes was found at the transcriptional level.

Transcriptional regulation of MAPK signalling genes in response to plant hormones

Plant hormones are diffusible signalling molecules that provide an interface to most physiological processes during development and in response to environmental stimuli. Therefore, we analysed the transcriptional regulation of MAPK signalling components in microarray experiments in which seedlings were treated with the major plant hormones and with compounds that inhibit hormone action (Table 2; Supplementary Material, Fig. S5). Considering how little is known about the role of MAPKs in hormone signalling, it was surprising that many are specifically induced by hormones at the transcriptional level. The stress hormone abscisic acid (ABA) led to the transcriptional activation of MAPKKK15, 17, 18, Raf6, 12, 35 and MPK7. Consistent with the recent discovery of the role of ABA in pathogen response, MAPKKK17, 18 and MPK7 are also induced by virulent and avirulent pathogens (de Torres-Zabala et al., 2007; Fig. 3). Methyl jasmonate (MJ) was found to induce MAP4K9, MAPKKK19 and MAPKKK20. Ethylene induced MAPKKK1, MKK7, MPK13, MAP4K10 and MKK3. MKK9 is induced by both ABA and MJ. Corresponding to the known role of MJ in signalling to fungal pathogens, an overlapping set of MAPK signalling genes was induced by MJ, Phytophthora (MAPKKK19, MKK9) and Botrytis (MAPKKK20, MAPKKK19). MAPKKK14 and MPK3 are transcriptionally induced by auxin, which is in agreement with the observation that auxin activates MAPKs in Arabidopsis at the protein kinase level (Mockaitis & Howell, 2000). MAP4K7 is upregulated by zeatin, while in the cytokinin biosynthetic mutant background zeatin treatment led to a much more robust induction of a number of MAPK genes, including Raf27, 30, CTR1, ZIK2, MAPKKK19, 20, MKK4, MKK7 and MKK9 (see the Supplementary Material, Fig. S5). The auxin transport inhibitor was found to strongly upregulate MAP4K10, MAPKKK19 and MKK9. In agreement with this, a close paralogue of MKK9, MKK7, was shown to regulate auxin transport (Zhang et al., 2007). 
Gene expression is controlled, in many instances, by labile transcriptional repressors that are removed in response to signalling events. Therefore inhibition of protein synthesis by cycloheximide (CHX) can lead to the rapid induction of genes regulated in this fashion. A large number of MAPK signalling genes are induced by cycloheximide treatment, including MAPKKK5, 8(9), 8(10), 14, 15, 16, 19 and 20, Raf27 and 28, MKK1, 4, 6 and 9, and MPK3, 5, 11, 17 and 23 (Table 2, Figs S5 and S6).

Table 2.  Mitogen-activated protein kinase (MAPK) signalling genes transcriptionally regulated by plant hormones, hormone inhibitors and an inhibitor of protein synthesis
| Tier | ABA | MJ | BL | ACC | ET inh. | IAA | Auxin inhibitor | Cytokinin | CHX |
  • ABA, abscisic acid; MJ, methyl jasmonate; BL, brassinolide; ACC, amino-cyclopropane carboxylic acid; ET, ethylene; IAA, indole acetic acid; CHX, cycloheximide.
  • * Genes induced by cytokinin in ARR22-ox background.

| MAP4K |  | MAP4K9 |  |  |  |  |  | MAP4K7 |  |
(ANP1), MAPKKK5/17/18, Raf6/12/35(ANP1), MAPKKK7/18/19/20 (ANP1), Raf8/13/27  Raf19MAPKKK20*,
Raf1* (CTR1),
MAPKKK8 (MEKK1) APKKK14/15/16/19/20,
Raf2 (EDR1)

Hierarchical clustering of MAPK components based on their global expression pattern across a large number of experiments

In order to find genes with globally similar transcriptional regulation to MAPK signalling components, we calculated pairwise Pearson correlation coefficients of MAPK signalling genes with all the Arabidopsis genes represented on the ATH1 Genechip array over a large number (1800) of hybridizations (see the Materials and Methods section) from different genetic backgrounds, tissues, conditions and treatments.

With this large matrix of correlation coefficients in hand (114 MAPK components vs c. 22 500 genes), we first sought groups of MAPK signalling genes with similar global transcriptional regulation. The assumption is that genes with correlated expression, and therefore related functions, will show coregulation with the same group of genes and will therefore cluster together. In this way we should be able to group MAPK components and coregulated genes together, and infer functional information within the coregulated gene groups from genes with established functions. To do this we performed hierarchical biclustering analysis on the matrix of pairwise Pearson correlation coefficients between MAPK signalling genes and all the Arabidopsis genes, first using linearly scaled expression data (Fig. 4).
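The procedure described above can be sketched on toy data. The following is a minimal illustration, not the authors' pipeline: the gene names (`kinA`, `g_def1`, etc.) and expression values are hypothetical, and the use of Euclidean distance with average linkage is an assumption, since this section does not state which distance or linkage was used. Only the MAPK axis is clustered here; a biclustering would apply the same procedure to the columns as well.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_profiles(query, genome):
    """One row per query gene: its correlation with every gene in `genome`."""
    columns = sorted(genome)
    return {name: [pearson(expr, genome[g]) for g in columns]
            for name, expr in query.items()}


def euclidean(u, v):
    """Euclidean distance between two correlation profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_distance(a, b, profiles):
    """Mean pairwise distance between the members of clusters a and b."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def cluster_rows(profiles, n_clusters):
    """Agglomerative (average-linkage) clustering of correlation profiles."""
    clusters = [[name] for name in sorted(profiles)]
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(pair[0], pair[1], profiles))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


# Hypothetical expression values over six hybridizations (not real data).
genome = {
    "g_def1": [1, 5, 1, 6, 1, 5],
    "g_def2": [2, 9, 2, 8, 3, 9],
    "g_cyc1": [1, 1, 6, 6, 1, 1],
    "g_cyc2": [2, 3, 9, 8, 2, 2],
}
kinases = {
    "kinA": [1, 4, 1, 5, 2, 4],
    "kinB": [3, 8, 2, 9, 3, 8],
    "kinC": [1, 2, 7, 6, 1, 2],
    "kinD": [2, 2, 8, 9, 3, 2],
}

profiles = correlation_profiles(kinases, genome)
groups = cluster_rows(profiles, n_clusters=2)
print(groups)
```

In this sketch the kinases are grouped by the similarity of their whole correlation profiles rather than by their direct pairwise correlation, which mirrors the idea that components sharing the same set of coregulated genes should fall into the same cluster.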

Figure 4.

Hierarchical clustering of correlation coefficient values calculated using linearly scaled (normalized) expression values of gene pairs from 114 mitogen-activated protein kinase (MAPK) signalling genes and c. 22 500 Arabidopsis genes in 1730 experiments carried out on the Affymetrix ATH1 GeneChip array. The heatmap represents a compressed picture of all 21 692 unique genes represented by probes on the ATH1 array (left to right), with the shading representing the degree of correlation with each of these probe sets. The cluster tree on the right represents the similarity of expression of each MAPK cascade component across all probe sets with other MAPK cascade genes.

Clustering of correlation coefficient values placed MAPK signalling genes into distinct clusters irrespective of the MAPK tiers they occupy within the signalling cascade or of their phylogenetic sequence similarities. Four major clusters were established. Three of those clusters could be assigned to biological functions based on published evidence on their members’ functions.

The cluster between MPK4 and MPK22 (Cluster I) consists of a number of MAPK components that have defence-related functions (e.g. MPK1/2/4/6, MKK4/5, MAPKKK8 [MEKK1]) (Asai et al., 2002; Doczi et al., 2007). Cluster II is formed between MAPKKK1 and MPK13 and contains ANP1/2/3 and YODA, which have important roles in developmental and cell cycle-related processes (Soyano et al., 2003; Lukowitz et al., 2004; Melikant et al., 2004). The third major cluster is formed between Raf45 and MPK15 (Cluster III). Within this cluster there are two subclusters, Cluster IIIa and IIIb. Cluster IIIb is defined by genes having functions in stress signalling, mainly in biotic defence responses. Supporting functional evidence is available for MKK1/2 and MPK3 (Teige et al., 2004; Meszaros et al., 2006). Cluster IV is formed between MPK9 and MAP4K8, and contains genes of unknown function. Members of this cluster are coexpressed with a very large number of diverse genes; many of them have more than 300 correlators with a correlation coefficient higher than 0.9. However, no such cluster is observed if log-transformed data are used to calculate the correlation coefficient, indicating that a relatively small number of experiments with high expression values contribute to this cluster (see below).

Some developmental stages or experimental conditions, such as stages of pollen development and cycloheximide treatment, are characterized by extremely high expression values for a large number of genes. Logarithmic transformation of the expression values reduces the impact of these high expression levels on the calculated correlation coefficients. Therefore, we also calculated Pearson correlation coefficients from log-transformed expression values and performed hierarchical biclustering on this matrix. As expected, Cluster IV was dissolved with this method. The other functional clusters largely remained as obtained with linear values, but were somewhat reorganised. A very distinct ‘development and cell cycle’ cluster (Cluster II) formed when applying this method (Fig. S6).
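The effect of the transformation can be illustrated with a small numerical sketch. The two expression vectors below are hypothetical: the first five samples are anticorrelated, and the sixth mimics a single condition with extreme signal for both genes. The base of the logarithm (base 2 here) is an assumption and does not affect the correlation.

```python
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical linear-scale signals; the last sample is the extreme one.
gene_x = [100, 400, 200, 800, 150, 20000]
gene_y = [800, 150, 400, 100, 600, 30000]

# Linear values: the single extreme sample dominates the coefficient.
r_linear = pearson(gene_x, gene_y)

# Log-transformed values: the extreme sample is compressed towards the rest.
r_log = pearson([log2(v) for v in gene_x], [log2(v) for v in gene_y])

# Dropping the extreme sample shows the relationship in the remaining samples.
r_without_extreme = pearson(gene_x[:-1], gene_y[:-1])

print(f"linear: {r_linear:.3f}  log2: {r_log:.3f}  "
      f"without extreme sample: {r_without_extreme:.3f}")
```

With linear values the coefficient is close to 1 although the two genes behave oppositely in five of the six samples; after log transformation the coefficient is noticeably lower. This is the kind of inflation that would explain why a cluster driven by a few high-expression experiments disappears on the log scale.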

We also examined whether the clustering of MAPKs is altered when the biclustering is performed on correlation coefficients calculated for a specific subset of genes, the 1612 transcription factors. The resulting arrangement of the main MAPK clusters was largely similar to that obtained with the full Arabidopsis gene set (Fig. S7).

Gene-set enrichment analysis for GO functional categories using gene lists showing high expression correlation with MAPK signalling components

By calculating the Pearson correlation coefficients for MAPK signalling components with all the Arabidopsis genes represented on the ATH1 Genechip array we generated lists of genes ranked according to their correlation with each of the MAP kinase components (see the Supplementary Material, Data S4). These data are valuable for finding genes that show strong coregulation with particular MAPK components (e.g. other MAPK components, other protein kinases, receptor-like kinases or transcription factors). For example, MPK3, MKK1, MKK2, MKK4 and MKK5 are all strongly coregulated with a number of WRKY transcription factors, in agreement with the published data that these MAPK signalling components are all implicated in pathogen response and that a number of them are known to regulate the expression of pathogen-activated genes through WRKYs (Asai et al., 2002; Andreasson et al., 2005). MPK3, which is known to be involved in a variety of biotic and abiotic stress responses, shows global correlations in its expression with MKK9, EDR1 and MAPKKK8 (MEKK1). Indeed, MPK3 was recently shown to be a downstream target of MKK9, and is negatively regulated by CTR1, a RAF-like kinase closely related to EDR1 (Yoo et al., 2008). MAPKKK8 (MEKK1) was also shown to be an upstream kinase to MPK3 (Asai et al., 2002). This shows that it is feasible, based on coregulation, to identify not only components with related functions, but also components that are part of the same signalling cascade. A strong correlation in global expression was found between MKK6 and MAPKKK12, a module that was implicated in regulating cell proliferation or cytokinesis (Soyano et al., 2003). A somewhat weaker, but significant, correlation also exists between MKK6 and MPK13, components that have been shown to interact and constitute a signalling pathway together (Melikant et al., 2004).
Looking for further correlators among potential upstream signalling components of this putative cell cycle MAPK module identified an AGC kinase, AGC1–3 (Bogre et al., 2003), and a receptor-like kinase, At3g24660, while potentially downstream there is a group of transcription factors showing strong coregulation with these signalling components, namely, Zinc Finger Family Protein (At4g22250), Basic Helix–Loop–Helix 1 (EGL3) (At1g63650), Squamosa promoter binding protein (At3g57920), Scarecrow transcription factor family protein (At1g63100), c-myb-like transcription factor family MYB3R (At5g11510) and Zinc Finger Family Protein, GATA type (At3g06740). The MYB3R type transcription factor was shown to regulate the expression of mitotic genes through the MSE promoter element (Ito et al., 2001). Upon further examination of the annotation of the list of genes coregulated with AGC1–3, MAPKKK12, MKK6 and MPK13 we also noted genes, such as CYCB1 and the Expressed Protein, Growth Regulation Factor (At4g37490), that hint at functions in cell proliferation, as has been confirmed in published experiments (Soyano et al., 2003). Correspondingly, MAPKKK12, MKK6 and MPK13 all have a clear mitosis-specific expression pattern in cell synchronization experiments (Fig. S2).

To systematically study gene lists that show expression correlations with MAPK components (see the Supplementary Material, Data S3), we examined whether they are enriched in genes belonging to specific functional categories. Aoki et al. (2007) characterized topological features of the entire Arabidopsis coexpression network, and established that biologically significant modules are expected to be found above a correlation coefficient threshold ranging from 0.55 to 0.66. Accordingly, we varied the correlation coefficient threshold between 0.5 and 0.7. The top correlators for each MAPK gene were classified into functional subgroups using BiNGO, a tool for gene-set enrichment analysis for GO (Maere et al., 2005). The GO categories analysed were biological process (BP), molecular function (MF) and cellular component (CC). The GO terms in Table 3 and the Supplementary Material Tables S2–S4 were selected first for their highest significance based on the P-value and second by their position in the resulting network graphic, as nodes furthest down the hierarchy are probably the most relevant (Maere et al., 2005). Selected biological process GO term overrepresentation for MAP kinase signalling genes is summarized in Table 3, while the complete dataset can be found in Table S2. Molecular function and cellular component GO term overrepresentation lists are given in the Supplementary Material, Tables S3 and S4, respectively. For each gene the number of correlators over a linear correlation coefficient threshold value, the ID and description of the selected GO terms, and their corresponding statistical characteristics are shown. A functional overview of MAP kinase signalling genes, as supported by published evidence and as revealed by gene set enrichment analysis for GO terms, is provided in Table 4. We found biologically informative biological process GO terms other than kinase or signalling functions for 41 genes, which is about one-third of all MAP kinase signalling genes.
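The thresholding and overrepresentation steps can be sketched as follows. This is a minimal illustration and not a reimplementation of BiNGO: the hypergeometric test is an assumption (this section names the tool and the Benjamini & Hochberg correction, but not the test), and the gene identifiers, correlation values and annotation sets are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    hits = range(k, min(n, K) + 1)
    return sum(comb(K, i) * comb(N - K, n - i) for i in hits) / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini & Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment(correlations, annotation, threshold):
    """Adjusted P-value per GO term for genes correlated at or above threshold."""
    universe = set(correlations)
    selected = {g for g, r in correlations.items() if r >= threshold}
    terms = sorted(annotation)
    raw = [hypergeom_upper_tail(len(selected & annotation[t]), len(selected),
                                len(universe & annotation[t]), len(universe))
           for t in terms]
    return dict(zip(terms, benjamini_hochberg(raw)))


# Hypothetical correlations of 20 genes with one MAPK gene (not real data).
correlations = {f"g{i}": 0.1 for i in range(20)}
correlations.update({"g0": 0.9, "g1": 0.8, "g2": 0.75,
                     "g3": 0.65, "g4": 0.6, "g5": 0.55})
annotation = {
    "defence response": {"g0", "g1", "g2", "g3"},
    "photosynthesis": {"g4", "g10", "g11", "g12"},
}

for threshold in (0.5, 0.6, 0.7):
    print(threshold, enrichment(correlations, annotation, threshold))
```

Running the loop over several thresholds shows how the size of the correlator list, and with it the significance of each term, changes with the cut-off, which is why a range of thresholds rather than a single value is explored.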

Table 3.  Gene ontology (GO) overrepresentation analysis for biological process of the mitogen-activated protein kinase (MAPK) signalling components
| Gene | Correlators | GO-ID | Description | Corrected P-value | Cluster frequency | Total frequency |
| MPK3 | 108 (≥ 0.6) | 6952 | Defence response | 4.28E-08 | 14/76, 18.4% | 409/20763, 1.9% |
|  |  | 50896 | Response to stimulus | 1.58E-05 | 22/76, 28.9% | 1755/20763, 8.4% |
|  |  | 42829 | Physiological defence response | 1.58E-05 | 7/76, 9.2% | 120/20763, 0.5% |
|  |  | 51869 | Physiological response to stimulus | 1.58E-05 | 7/76, 9.2% | 124/20763, 0.5% |
|  |  | 6468 | Protein amino acid phosphorylation | 2.32E-05 | 14/76, 18.4% | 768/20763, 3.6% |
| MPK9 | 91 (≥ 0.6) | 42545 | Cell wall modification | 3.45E-03 | 4/63, 6.3% | 65/20763, 0.3% |
|  |  | 7047 | Cell wall organization and biogenesis | 1.08E-02 | 4/63, 6.3% | 116/20763, 0.5% |
|  |  | 45229 | External encapsulating structure organization and biogenesis | 1.08E-02 | 4/63, 6.3% | 116/20763, 0.5% |
|  |  | 16043 | Cell organization and biogenesis | 1.50E-02 | 9/63, 14.2% | 826/20763, 3.9% |
|  |  | 7010 | Cytoskeleton organization and biogenesis | 1.50E-02 | 4/63, 6.3% | 145/20763, 0.6% |
| MPK11 | 82 (≥ 0.65) | 9607 | Response to biotic stimulus | 2.60E-07 | 10/62, 16.1% | 235/20763, 1.1% |
|  |  | 51707 | Response to other organism | 1.93E-06 | 9/62, 14.5% | 231/20763, 1.1% |
|  |  | 42829 | Physiological defence response | 3.04E-06 | 7/62, 11.2% | 120/20763, 0.5% |
|  |  | 51869 | Physiological response to stimulus | 3.04E-06 | 7/62, 11.2% | 124/20763, 0.5% |
|  |  | 6952 | Defence response | 9.66E-06 | 10/62, 16.1% | 409/20763, 1.9% |
| MPK12 | 75 (≥ 0.55) | 15979 | Photosynthesis | 1.78E-05 | 5/54, 9.2% | 48/20763, 0.2% |
|  |  | 19685 | Photosynthesis, dark reaction | 3.80E-03 | 2/54, 3.7% | 5/20763, 0.0% |
|  |  | 9853 | Photorespiration | 3.66E-02 | 2/54, 3.7% | 21/20763, 0.1% |
|  |  | 9416 | Response to light stimulus | 3.66E-02 | 4/54, 7.4% | 190/20763, 0.9% |
|  |  | 9314 | Response to radiation | 3.66E-02 | 4/54, 7.4% | 193/20763, 0.9% |
| MPK13 | 41 (≥ 0.5) | 7167 | Enzyme-linked receptor protein signalling pathway | 1.81E-05 | 5/29, 17.2% | 115/20763, 0.5% |
|  |  | 7169 | Transmembrane receptor protein tyrosine kinase signalling pathway | 1.81E-05 | 5/29, 17.2% | 115/20763, 0.5% |
|  |  | 7166 | Cell surface receptor linked signal transduction | 3.81E-05 | 5/29, 17.2% | 145/20763, 0.6% |
|  |  | 6468 | Protein amino acid phosphorylation | 1.29E-04 | 8/29, 27.5% | 768/20763, 3.6% |
|  |  | 16310 | Phosphorylation | 1.79E-04 | 8/29, 27.5% | 828/20763, 3.9% |
| MPK17 | 63 (≥ 0.55) | 6081 | Aldehyde metabolism | 1.32E-02 | 2/41, 4.8% | 7/20763, 0.0% |
|  |  | 6891 | Intra-Golgi vesicle-mediated transport | 1.72E-02 | 2/41, 4.8% | 11/20763, 0.0% |
|  |  | 6097 | Glyoxylate cycle | 3.19E-02 | 1/41, 2.4% | 1/20763, 0.0% |
|  |  | 46487 | Glyoxylate metabolism | 3.19E-02 | 1/41, 2.4% | 1/20763, 0.0% |
|  |  | 6886 | Intracellular protein transport | 3.19E-02 | 3/41, 7.3% | 137/20763, 0.6% |
| MPK22 | 88 (≥ 0.55) | 9873 | Ethylene mediated signalling pathway | 4.06E-03 | 3/59, 5.0% | 20/20763, 0.0% |
|  |  | 9723 | Response to ethylene stimulus | 4.93E-03 | 4/59, 6.7% | 74/20763, 0.3% |
|  |  | 160 | Two-component signal transduction system (phosphorelay) | 5.21E-03 | 3/59, 5.0% | 31/20763, 0.1% |
|  |  | 7165 | Signal transduction | 2.75E-02 | 7/59, 11.8% | 522/20763, 2.5% |
|  |  | 10119 | Regulation of stomatal movement | 2.75E-02 | 2/59, 3.3% | 15/20763, 0.0% |
| MKK1 | 94 (≥ 0.6) | 6952 | Defence response | 3.68E-10 | 15/67, 22.3% | 409/20763, 1.9% |
|  |  | 9617 | Response to bacterium | 3.56E-07 | 7/67, 10.4% | 77/20763, 0.3% |
|  |  | 50896 | Response to stimulus | 3.36E-06 | 21/67, 31.3% | 1755/20763, 8.4% |
|  |  | 6468 | Protein amino acid phosphorylation | 2.94E-05 | 13/67, 19.4% | 768/20763, 3.6% |
|  |  | 9751 | Response to salicylic acid stimulus | 4.53E-05 | 5/67, 7.4% | 63/20763, 0.3% |
| MKK2 | 91 (≥ 0.6) | 6952 | Defence response | 3.49E-12 | 16/60, 26.6% | 409/20763, 1.9% |
|  |  | 9617 | Response to bacterium | 1.74E-09 | 8/60, 13.3% | 77/20763, 0.3% |
|  |  | 51707 | Response to other organism | 1.74E-09 | 11/60, 18.3% | 231/20763, 1.1% |
|  |  | 42829 | Physiological defence response | 1.74E-09 | 9/60, 15.0% | 120/20763, 0.5% |
|  |  | 9607 | Response to biotic stimulus | 1.74E-09 | 11/60, 18.3% | 235/20763, 1.1% |
| MKK4 | 105 (≥ 0.6) | 42829 | Physiological defence response | 4.97E-08 | 9/78, 11.5% | 120/20763, 0.5% |
|  |  | 51869 | Physiological response to stimulus | 4.97E-08 | 9/78, 11.5% | 124/20763, 0.5% |
|  |  | 51707 | Response to other organism | 4.97E-08 | 11/78, 14.1% | 231/20763, 1.1% |
|  |  | 9607 | Response to biotic stimulus | 4.97E-08 | 11/78, 14.1% | 235/20763, 1.1% |
|  |  | 6952 | Defence response | 1.24E-07 | 13/78, 16.6% | 409/20763, 1.9% |
| MKK5 | 34 (≥ 0.55) | 42829 | Physiological defence response | 1.15E-05 | 5/27, 18.5% | 120/20763, 0.5% |
|  |  | 51707 | Response to other organism | 1.15E-05 | 6/27, 22.2% | 231/20763, 1.1% |
|  |  | 9607 | Response to biotic stimulus | 1.15E-05 | 6/27, 22.2% | 235/20763, 1.1% |
|  |  | 51869 | Physiological response to stimulus | 1.15E-05 | 5/27, 18.5% | 124/20763, 0.5% |
|  |  | 9814 | Defence response, incompatible interaction | 1.90E-03 | 3/27, 11.1% | 71/20763, 0.3% |
| MKK6 | 272 (≥ 0.65) | 6996 | Organelle organization and biogenesis | 3.96E-20 | 35/200, 17.5% | 427/20763, 2.0% |
|  |  | 6259 | DNA metabolism | 1.03E-19 | 29/200, 14.5% | 275/20763, 1.3% |
|  |  | 7017 | Microtubule-based process | 1.76E-15 | 16/200, 8.0% | 76/20763, 0.3% |
|  |  | 7010 | Cytoskeleton organization and biogenesis | 1.35E-14 | 19/200, 9.5% | 145/20763, 0.6% |
|  |  | 6325 | Establishment and/or maintenance of chromatin architecture | 1.06E-13 | 16/200, 8.0% | 101/20763, 0.4% |
| MKK9 | 45 (≥ 0.6) | 9737 | Response to abscisic acid stimulus | 4.67E-02 | 3/25, 12.0% | 123/20763, 0.5% |
| MAPKKK1 | 32 (≥ 0.5) | 7049 | Cell cycle | 1.19E-04 | 4/21, 19.0% | 87/20763, 0.4% |
|  |  | 278 | Mitotic cell cycle | 1.19E-04 | 3/21, 14.2% | 26/20763, 0.1% |
|  |  | 87 | M phase of mitotic cell cycle | 1.39E-03 | 2/21, 9.5% | 11/20763, 0.0% |
|  |  | 7067 | Mitosis | 1.39E-03 | 2/21, 9.5% | 11/20763, 0.0% |
|  |  | 16570 | Histone modification | 1.83E-03 | 2/21, 9.5% | 14/20763, 0.0% |
| MAPKKK2 | 71 (≥ 0.55) | 30705 | Cytoskeleton-dependent intracellular transport | 2.32E-03 | 4/53, 7.5% | 60/20763, 0.2% |
|  |  | 7010 | Cytoskeleton organization and biogenesis | 2.38E-03 | 5/53, 9.4% | 145/20763, 0.6% |
|  |  | 7049 | Cell cycle | 3.36E-03 | 4/53, 7.5% | 87/20763, 0.4% |
|  |  | 6996 | Organelle organization and biogenesis | 3.53E-03 | 7/53, 13.2% | 427/20763, 2.0% |
|  |  | 7018 | Microtubule-based movement | 5.05E-03 | 3/53, 5.6% | 43/20763, 0.2% |
| MAPKKK4 | 14 (≥ 0.5) | 7020 | Microtubule nucleation | 1.86E-02 | 1/9, 11.1% | 1/20763, 0.0% |
| MAPKKK9 | 139 (≥ 0.5) | 6952 | Defence response | 1.94E-09 | 17/98, 17.3% | 409/20763, 1.9% |
|  |  | 6468 | Protein amino acid phosphorylation | 5.09E-08 | 20/98, 20.4% | 768/20763, 3.6% |
|  |  | 51707 | Response to other organism | 5.96E-08 | 12/98, 12.2% | 231/20763, 1.1% |
|  |  | 9607 | Response to biotic stimulus | 5.96E-08 | 12/98, 12.2% | 235/20763, 1.1% |
|  |  | 50896 | Response to stimulus | 5.96E-08 | 29/98, 29.5% | 1755/20763, 8.4% |
| MAPKKK10 | 155 (≥ 0.6) | 6952 | Defence response | 5.09E-20 | 27/109, 24.7% | 409/20763, 1.9% |
|  |  | 50896 | Response to stimulus | 1.01E-12 | 38/109, 34.8% | 1755/20763, 8.4% |
|  |  | 42829 | Physiological defence response | 2.53E-09 | 11/109, 10.0% | 120/20763, 0.5% |
|  |  | 51869 | Physiological response to stimulus | 2.71E-09 | 11/109, 10.0% | 124/20763, 0.5% |
|  |  | 51707 | Response to other organism | 1.10E-08 | 13/109, 11.9% | 231/20763, 1.1% |
| MAPKKK12 | 321 (≥ 0.75) | 6996 | Organelle organization and biogenesis | 4.38E-23 | 43/272, 15.8% | 427/20763, 2.0% |
|  |  | 6259 | DNA metabolism | 4.77E-21 | 34/272, 12.5% | 275/20763, 1.3% |
|  |  | 6325 | Establishment and/or maintenance of chromatin architecture | 1.20E-17 | 21/272, 7.7% | 101/20763, 0.4% |
|  |  | 6323 | DNA packaging | 1.20E-17 | 21/272, 7.7% | 101/20763, 0.4% |
|  |  | 7001 | Chromosome organization and biogenesis (sensu Eukaryota) | 4.23E-17 | 21/272, 7.7% | 108/20763, 0.5% |
| MAPKKK13 | 125 (≥ 0.55) | 48316 | Seed development | 2.78E-10 | 14/93, 15.0% | 238/20763, 1.1% |
|  |  | 48608 | Reproductive structure development | 2.78E-10 | 14/93, 15.0% | 241/20763, 1.1% |
|  |  | 19952 | Reproduction | 2.66E-09 | 14/93, 15.0% | 294/20763, 1.4% |
|  |  | 9793 | Embryonic development (sensu Magnoliophyta) | 1.39E-07 | 11/93, 11.8% | 217/20763, 1.0% |
|  |  | 9790 | Embryonic development | 2.04E-07 | 11/93, 11.8% | 230/20763, 1.1% |
| MAPKKK14 | 152 (≥ 0.6) | 50896 | Response to stimulus | 2.84E-07 | 31/110, 28.1% | 1755/20763, 8.4% |
|  |  | 42221 | Response to chemical stimulus | 1.49E-05 | 18/110, 16.3% | 775/20763, 3.7% |
|  |  | 6952 | Defence response | 8.67E-04 | 11/110, 10.0% | 409/20763, 1.9% |
|  |  | 9693 | Ethylene biosynthesis | 1.42E-03 | 3/110, 2.7% | 12/20763, 0.0% |
|  |  | 9692 | Ethylene metabolism | 1.42E-03 | 3/110, 2.7% | 12/20763, 0.0% |
| MAPKKK18 | 35 (≥ 0.5) | 9737 | Response to abscisic acid stimulus | 4.26E-12 | 9/27, 33.3% | 123/20763, 0.5% |
|  |  | 9414 | Response to water deprivation | 5.59E-08 | 6/27, 22.2% | 80/20763, 0.3% |
|  |  | 9415 | Response to water | 5.68E-08 | 6/27, 22.2% | 88/20763, 0.4% |
|  |  | 9725 | Response to hormone stimulus | 5.68E-08 | 9/27, 33.3% | 413/20763, 1.9% |
|  |  | 9628 | Response to abiotic stimulus | 3.01E-07 | 9/27, 33.3% | 513/20763, 2.4% |
| MAPKKK19 | 118 (≥ 0.5) | 50896 | Response to stimulus | 1.23E-05 | 24/84, 28.5% | 1755/20763, 8.4% |
|  |  | 9607 | Response to biotic stimulus | 4.26E-05 | 9/84, 10.7% | 235/20763, 1.1% |
|  |  | 51707 | Response to other organism | 2.70E-04 | 8/84, 9.5% | 231/20763, 1.1% |
|  |  | 9617 | Response to bacterium | 6.99E-04 | 5/84, 5.9% | 77/20763, 0.3% |
|  |  | 42430 | Indole and derivative metabolism | 5.78E-03 | 3/84, 3.5% | 30/20763, 0.1% |
| Raf1 | 5 (≥ 0.5) | 9723 | Response to ethylene stimulus | 5.85E-03 | 2/5, 40.0% | 74/20763, 0.3% |
| Raf4 | 105 (≥ 0.55) | 9873 | Ethylene mediated signalling pathway | 2.53E-03 | 3/66, 4.5% | 20/20763, 0.0% |
|  |  | 51603 | Proteolysis during cellular protein catabolism | 2.53E-03 | 5/66, 7.5% | 141/20763, 0.6% |
|  |  | 6511 | Ubiquitin-dependent protein catabolism | 2.53E-03 | 5/66, 7.5% | 141/20763, 0.6% |
|  |  | 19941 | Modification-dependent protein catabolism | 2.53E-03 | 5/66, 7.5% | 141/20763, 0.6% |
|  |  | 43632 | Modification-dependent macromolecule catabolism | 2.53E-03 | 5/66, 7.5% | 141/20763, 0.6% |
| Raf5 | 29 (≥ 0.5) | 15995 | Chlorophyll biosynthesis | 6.41E-03 | 2/20, 10.0% | 15/20763, 0.0% |
|  |  | 7582 | Physiological process | 6.41E-03 | 17/20, 85.0% | 8922/20763, 42.9% |
|  |  | 15994 | Chlorophyll metabolism | 6.41E-03 | 2/20, 10.0% | 20/20763, 0.0% |
|  |  | 6779 | Porphyrin biosynthesis | 6.41E-03 | 2/20, 10.0% | 23/20763, 0.1% |
|  |  | 6778 | Porphyrin metabolism | 6.41E-03 | 2/20, 10.0% | 28/20763, 0.1% |
| Raf8 | 64 (≥ 0.6) | 48316 | Seed development | 2.25E-06 | 8/46, 17.3% | 238/20763, 1.1% |
|  |  | 48608 | Reproductive structure development | 2.25E-06 | 8/46, 17.3% | 241/20763, 1.1% |
|  |  | 19952 | Reproduction | 6.89E-06 | 8/46, 17.3% | 294/20763, 1.4% |
|  |  | 9790 | Embryonic development | 1.49E-05 | 7/46, 15.2% | 230/20763, 1.1% |
|  |  | 9793 | Embryonic development (sensu Magnoliophyta) | 1.38E-04 | 6/46, 13.0% | 217/20763, 1.0% |
| Raf13 | 92 (≥ 0.5) | 9695 | Jasmonic acid biosynthesis | 7.55E-04 | 3/64, 4.6% | 16/20763, 0.0% |
|  |  | 31408 | Oxylipin biosynthesis | 7.55E-04 | 3/64, 4.6% | 16/20763, 0.0% |
|  |  | 9694 | Jasmonic acid metabolism | 7.55E-04 | 3/64, 4.6% | 17/20763, 0.0% |
|  |  | 31407 | Oxylipin metabolism | 7.55E-04 | 3/64, 4.6% | 17/20763, 0.0% |
|  |  | 42829 | Physiological defence response | 9.66E-04 | 5/64, 7.8% | 120/20763, 0.5% |
| Raf14 | 156 (≥ 0.6) | 6394 | RNA processing | 1.56E-17 | 19/100, 19.0% | 192/20763, 0.9% |
|  |  | 16070 | RNA metabolism | 4.16E-11 | 26/100, 26.0% | 956/20763, 4.6% |
|  |  | 6139 | Nucleobase, nucleoside, nucleotide and nucleic acid metabolism | 9.27E-11 | 35/100, 35.0% | 1910/20763, 9.1% |
|  |  | 43283 | Biopolymer metabolism | 8.11E-08 | 35/100, 35.0% | 2459/20763, 11.8% |
|  |  | 6395 | RNA splicing | 4.25E-06 | 6/100, 6.0% | 46/20763, 0.2% |
| Raf25 | 92 (≥ 0.5) | 6952 | Defence response | 5.53E-07 | 12/62, 19.3% | 409/20763, 1.9% |
|  |  | 6468 | Protein amino acid phosphorylation | 4.77E-06 | 14/62, 22.5% | 768/20763, 3.6% |
|  |  | 50896 | Response to stimulus | 6.04E-06 | 20/62, 32.2% | 1755/20763, 8.4% |
|  |  | 16310 | Phosphorylation | 6.04E-06 | 14/62, 22.5% | 828/20763, 3.9% |
|  |  | 6796 | Phosphate metabolism | 7.58E-06 | 14/62, 22.5% | 871/20763, 4.1% |
| Raf26 | 110 (≥ 0.7) | 6066 | Alcohol metabolism | 3.42E-03 | 6/79, 7.5% | 152/20763, 0.7% |
|  |  | 19321 | Pentose metabolism | 3.42E-03 | 3/79, 3.7% | 19/20763, 0.0% |
|  |  | 5996 | Monosaccharide metabolism | 3.75E-03 | 5/79, 6.3% | 116/20763, 0.5% |
|  |  | 5975 | Carbohydrate metabolism | 2.56E-02 | 8/79, 10.1% | 527/20763, 2.5% |
|  |  | 6014 | d-Ribose metabolism | 2.56E-02 | 2/79, 2.5% | 12/20763, 0.0% |
| Raf27 | 120 (≥ 0.65) | 51869 | Physiological response to stimulus | 6.37E-06 | 8/85, 9.4% | 124/20763, 0.5% |
|  |  | 51707 | Response to other organism | 2.32E-05 | 9/85, 10.5% | 231/20763, 1.1% |
|  |  | 9607 | Response to biotic stimulus | 2.32E-05 | 9/85, 10.5% | 235/20763, 1.1% |
|  |  | 42829 | Physiological defence response | 2.32E-05 | 7/85, 8.2% | 120/20763, 0.5% |
|  |  | 50832 | Defence response to fungus | 1.10E-04 | 4/85, 4.7% | 26/20763, 0.1% |
| Raf38 | 16 (≥ 0.5) | 9638 | Phototropism | 1.62E-03 | 2/11, 18.1% | 13/20763, 0.0% |
|  |  | 9637 | Response to blue light | 1.62E-03 | 2/11, 18.1% | 16/20763, 0.0% |
|  |  | 9606 | Tropism | 2.62E-03 | 2/11, 18.1% | 28/20763, 0.1% |
|  |  | 9416 | Response to light stimulus | 2.62E-03 | 3/11, 27.2% | 190/20763, 0.9% |
|  |  | 9314 | Response to radiation | 2.62E-03 | 3/11, 27.2% | 193/20763, 0.9% |
| Raf42 | 171 (≥ 0.55) | 48316 | Seed development | 1.72E-03 | 9/124, 7.2% | 238/20763, 1.1% |
|  |  | 48608 | Reproductive structure development | 1.72E-03 | 9/124, 7.2% | 241/20763, 1.1% |
|  |  | 9793 | Embryonic development (sensu Magnoliophyta) | 3.24E-03 | 8/124, 6.4% | 217/20763, 1.0% |
|  |  | 19952 | Reproduction | 3.24E-03 | 9/124, 7.2% | 294/20763, 1.4% |
|  |  | 9790 | Embryonic development | 3.24E-03 | 8/124, 6.4% | 230/20763, 1.1% |
| Raf47 | 16 (≥ 0.5) | 43283 | Biopolymer metabolism | 3.18E-02 | 5/9, 55.5% | 2459/20763, 11.8% |
|  |  | 9960 | Endosperm development | 3.18E-02 | 1/9, 11.1% | 6/20763, 0.0% |
|  |  | 9653 | Morphogenesis | 3.18E-02 | 2/9, 22.2% | 212/20763, 1.0% |
|  |  | 6468 | Protein amino acid phosphorylation | 3.18E-02 | 3/9, 33.3% | 768/20763, 3.6% |
|  |  | 16310 | Phosphorylation | 3.18E-02 | 3/9, 33.3% | 828/20763, 3.9% |
| ZIK3 | 131 (≥ 0.55) | 6767 | Water-soluble vitamin metabolism | 3.17E-05 | 6/103, 5.8% | 48/20763, 0.2% |
|  |  | 6766 | Vitamin metabolism | 5.57E-05 | 6/103, 5.8% | 59/20763, 0.2% |
|  |  | 16108 | Tetraterpenoid metabolism | 9.86E-05 | 4/103, 3.8% | 18/20763, 0.0% |
|  |  | 16116 | Carotenoid metabolism | 9.86E-05 | 4/103, 3.8% | 18/20763, 0.0% |
|  |  | 15979 | Photosynthesis | 1.62E-04 | 5/103, 4.8% | 48/20763, 0.2% |
| ZIK4 | 136 (≥ 0.5) | 9606 | Tropism | 1.63E-03 | 4/95, 4.2% | 28/20763, 0.1% |
|  |  | 9638 | Phototropism | 2.21E-03 | 3/95, 3.1% | 13/20763, 0.0% |
|  |  | 9414 | Response to water deprivation | 2.21E-03 | 5/95, 5.2% | 80/20763, 0.3% |
|  |  | 9637 | Response to blue light | 2.21E-03 | 3/95, 3.1% | 16/20763, 0.0% |
|  |  | 9415 | Response to water | 2.21E-03 | 5/95, 5.2% | 88/20763, 0.4% |
| ZIK9 | 221 (≥ 0.55) | 6979 | Response to oxidative stress | 2.69E-04 | 9/155, 5.8% | 143/20763, 0.6% |
|  |  | 6800 | Oxygen and reactive oxygen species metabolism | 2.69E-04 | 9/155, 5.8% | 152/20763, 0.7% |
|  |  | 50896 | Response to stimulus | 4.30E-04 | 31/155, 20.0% | 1755/20763, 8.4% |
|  |  | 6950 | Response to stress | 6.39E-04 | 17/155, 10.9% | 665/20763, 3.2% |
|  |  | 6468 | Protein amino acid phosphorylation | 1.04E-02 | 16/155, 10.3% | 768/20763, 3.6% |
| MAP4K2 | 44 (≥ 0.5) | 48507 | Meristem development | 6.44E-03 | 3/32, 9.3% | 44/20763, 0.2% |
|  |  | 10087 | Vascular tissue development (sensu Tracheophyta) | 6.44E-03 | 2/32, 6.2% | 10/20763, 0.0% |
|  |  | 10073 | Meristem maintenance | 6.44E-03 | 2/32, 6.2% | 11/20763, 0.0% |
|  |  | 48532 | Shaping of an anatomical structure | 2.25E-02 | 2/32, 6.2% | 26/20763, 0.1% |
|  |  | 9933 | Meristem organization | 2.25E-02 | 2/32, 6.2% | 26/20763, 0.1% |
| MAP4K3 | 83 (≥ 0.6) | 48856 | Anatomical structure development | 2.08E-02 | 6/51, 11.7% | 512/20763, 2.4% |
|  |  | 9790 | Embryonic development | 2.08E-02 | 4/51, 7.8% | 230/20763, 1.1% |
|  |  | 9211 | Pyrimidine deoxyribonucleoside triphosphate metabolism | 2.08E-02 | 1/51, 1.9% | 1/20763, 0.0% |
|  |  | 46075 | dTTP metabolism | 2.08E-02 | 1/51, 1.9% | 1/20763, 0.0% |
|  |  | 9202 | Deoxyribonucleoside triphosphate biosynthesis | 2.08E-02 | 1/51, 1.9% | 1/20763, 0.0% |
| MAP4K10 | 208 (≥ 0.65) | 15979 | Photosynthesis | 4.19E-17 | 14/147, 9.5% | 48/20763, 0.2% |
|  |  | 19684 | Photosynthesis, light reaction | 1.34E-07 | 7/147, 4.7% | 29/20763, 0.1% |
|  |  | 6091 | Generation of precursor metabolites and energy | 1.11E-05 | 16/147, 10.8% | 450/20763, 2.1% |
|  |  | 9773 | Photosynthetic electron transport in photosystem I | 2.20E-05 | 3/147, 2.0% | 3/20763, 0.0% |
|  |  | 9767 | Photosynthetic electron transport | 2.48E-05 | 4/147, 2.7% | 10/20763, 0.0% |
List of MAP kinase cascade genes with relevant biological process GO term overrepresentation within their coregulated gene sets, obtained using the BiNGO tool. Column two shows the number of correlators over a linear correlation coefficient threshold value (in parentheses). P-values (column five) were corrected by the Benjamini & Hochberg False Discovery Rate (FDR) correction. Cluster frequency (column six) is the occurrence of a given GO term within correlated annotated genes. Total frequency (column seven) is the occurrence of the same GO term within all annotated genes.
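As a worked reading of the two frequency columns, the short sketch below takes the MPK3 "Defence response" row of Table 3 (14/76 and 409/20763) and derives a fold enrichment from it. The fold enrichment is not a quantity reported in the table; it is computed here only for illustration. The helper `truncated_percent` is a hypothetical name, and it reflects the observation that the percentages printed in Table 3 appear to be cut, rather than rounded, to one decimal place.

```python
from math import floor


def truncated_percent(fraction):
    """Percentage cut (not rounded) to one decimal place."""
    return floor(fraction * 1000) / 10


# MPK3, GO-ID 6952 (Defence response), values taken from Table 3.
cluster_frequency = 14 / 76      # term occurrences among annotated correlators
total_frequency = 409 / 20763    # term occurrences among all annotated genes

fold_enrichment = cluster_frequency / total_frequency

print(truncated_percent(cluster_frequency))  # matches the 18.4% in the table
print(truncated_percent(total_frequency))    # matches the 1.9% in the table
print(round(fold_enrichment, 2))
```

The defence response term is thus roughly nine times more frequent among the annotated MPK3 correlators than among all annotated genes, which is the contrast that the corrected P-value in column five evaluates.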
Table 4.  Summary of mitogen-activated protein (MAP) kinase cascade gene functions as provided by published data or predicted by gene ontology (GO) significance analysis
| Tier | Defence: experimental | Defence: GO significance | Ethylene, jasmonate: experimental | Ethylene, jasmonate: GO significance | Cell cycle, development: experimental | Cell cycle, development: GO significance | Light: experimental | Light: GO significance |
| MAP4K |  |  |  |  |  | MAP4K2, MAP4K3, MAP4K4 |  | MAP4K10 |
| MAPKKK | MAPKKK8 (MEKK1), Raf2 (EDR1) | MAPKKK9, MAPKKK10, MAPKKK17, MAPKKK18, MAPKKK19, Raf25, Raf27, ZIK9 | Raf1 (CTR1) | MAPKKK14, Raf1 (CTR1), Raf4, Raf12 (?), Raf13 | MAPKKK1 (ANP1), MAPKKK2 (ANP2), MAPKKK4 (YDA), MAPKKK12 (ANP3) | MAPKKK1 (ANP1), MAPKKK2 (ANP2), MAPKKK12 (ANP3), MAPKKK13, Raf8, Raf9, Raf21, Raf42, Raf47 |  | Raf5, Raf38, ZIK3, ZIK4 |

To validate the predictive power of the approach, we compared these results with published information on established MAPK cascades. The first complete MAPK pathway in plants was reported by Asai et al. (2002) to consist of MEKK1–MKK4/5–MPK3/6 functioning in innate immunity. However, it was recently shown that MEKK1 is not essential for regulating MKK4/5 and MPK6 in response to biotic and oxidative stresses, indicating the presence of other MAPKKKs with overlapping functions (Ichimura et al., 2006; Nakagami et al., 2006; Suarez-Rodriguez et al., 2007). MEKK1 (also named MAPKKK8) does not show significant GO term overrepresentation for any biological function, but genes correlating in expression with two closely related genes, MAPKKK9 (MEKK2) and MAPKKK10 (MEKK3), are implicated in defence response. The coregulated gene lists of MKK4, MKK5 and MPK3 also showed biological process GO term overrepresentation for defence response with very low corrected P-values, confirming published results with this pathway.

Another established pathway that can validate functional predictions based on transcriptional correlation and GO term overrepresentation is the cytokinesis-controlling ANP1/2/3–MKK6–MPK13 pathway (Krysan et al., 2002; Soyano et al., 2003; Melikant et al., 2004; Tanaka et al., 2004), which is the Arabidopsis orthologue of the tobacco NPK1–NQK1–NRK1 pathway (Takahashi et al., 2004). GO term overrepresentation analysis of coregulated gene lists unambiguously resulted in associating the ANP genes (MAPKKK1 is ANP1, MAPKKK2 is ANP2 and MAPKKK12 is ANP3) with the cell cycle and cytoskeleton (Table 3). These genes have been shown to function as positive regulators for cytokinesis and negative regulators for stress responses by reverse-genetic approaches (Krysan et al., 2002). MKK6-coregulated genes are over-represented with cell cycle and cytokinesis GO terms, with very low P-values, again providing strong support to our analytical approach.

The above examples of MAPK pathways with known functions show that coregulated gene lists and their GO annotations can provide functional clues in many cases, with the prerequisite that the genes studied are themselves under transcriptional control. Of the 23 MAP kinase genes, seven produced coregulated gene lists with significant overrepresentation of biological process GO terms supported by a reasonable number of genes.

MPK12 is coregulated with genes annotated in photosynthesis and response to light stimulus. Promoter–reporter fusion experiments have revealed that the expression of MPK12 in aerial tissues of Arabidopsis is largely restricted to guard cells (Hamel et al., 2006). MPK9 is associated with biochemical processes of the cell wall and MPK22 with ethylene signalling. Molecular function GO term overrepresentation for MAP kinases relates to protein kinase activities (Table S3), implicating further protein kinase cascades (MPK3, MPK5, MPK11, MPK13, MPK20 and MPK22), protein phosphatases (MPK6) and other signalling processes, such as calmodulin binding (MPK11). A number of coregulated gene lists are enriched in DNA binding and transcription factor activity (MPK1, MPK14) or indicate the regulation of specific biochemical processes, such as cell wall modifications (MPK8, MPK9). MKK3, like its downstream group C MAP kinases, is associated with transcription-related BP and MF GO terms.

Interestingly, MKK-coregulated gene sets are enriched in defence-related GO terms. MKK1, 2, 4 and 5 are all associated with defence responses, and MKK9 is associated with response to ABA stimulus (Table 3). Accordingly, the majority of Arabidopsis MKKs, representing all four MKK groups, have been shown to participate in pathogen defence responses: MKK1 (Meszaros et al., 2006), MKK2 (Brader et al., 2007), MKK3 (Doczi et al., 2007), MKK4/5 (Asai et al., 2002) and MKK7 (Zhang et al., 2007). MKK7 is induced by ethylene, while a closely related gene, MKK9, is transcriptionally activated by MJ, ABA and the auxin transport inhibitor TIBA (see the Supplementary Material, Figs S5 and S6).

The number of putative MAP kinase kinase kinases (MAPKKKs) in Arabidopsis is not only much larger than that of MPKs and MKKs, but they also display a greater variety in primary structure and domain composition. The knowledge of Arabidopsis MAPKKK functions is very limited, and any functional associations would therefore be very informative and a useful guide for further experiments. Genes coregulated with MAPKKK14 are over-represented in GO terms of response to various stimuli and ethylene biosynthesis. This gene was shown to be rapidly induced by light and to be potentially involved in phyA-mediated far red light-induced seedling de-etiolation (Khanna et al., 2006). MAPKKK14 is also induced by auxin (Fig. S5). Of the further MAPKKKs of unknown function, the GO term analysis of the coregulated gene list predicts a functional involvement for MAPKKK13 in seed and embryonic development. This is in agreement with the specific expression of MAPKKK13 in siliques and seeds (Figs 1, S1). MAPKKK18 is associated with genes having roles in response to abscisic acid and water stress, and MAPKKK19 with biotic stimulus and indole metabolism. Among the RAF-like genes, CTR1 (RAF1) is constitutively expressed. The ENHANCED DISEASE RESISTANCE 1, EDR1 (RAF2) and the related RAF3 are coregulated with genes involved in signalling processes, but GO significance analysis did not reveal any specific physiological process. Interestingly, RAF4 is associated with the processes of ethylene signalling and ubiquitin-dependent proteolysis and is thus potentially a transcriptionally regulated functional paralogue of CTR1. Ubiquitin-dependent protein degradation has already been shown to play an important role in ethylene signalling at the level of the ethylene-insensitive3 (EIN3)/EIN3-Like (EIL) transcription factors (Dreher & Callis, 2007).

RAF5 to RAF48 are RAF-related MAPKKKs with no available functional information. Protein amino acid phosphorylation and related GO terms indicate that kinase activities are predominantly over-represented among the genes coregulated with this subset, and indeed these GO terms are the only ones obtained for 12 of the genes, indicating that they may be part of further protein kinase signalling pathways. Sixteen RAFs are coregulated with gene sets in which GO terms for specific biological processes are over-represented. RAF5 is coregulated with genes in chlorophyll (porphyrin) biosynthesis/metabolism, RAF8 and RAF42 with genes in seed/embryonic development, RAF14 with RNA processing and splicing, and RAF21 with terms that suggest involvement in meiosis and microtubule organization. RAF25 and RAF27 are associated with defence response, and RAF26 with alcohol metabolism. Molecular function GO terms for RAF30 suggest a role in reactive oxygen species (ROS) signalling. RAF34 is associated with protein glycosylation, and RAF38 with light-responsive mechanisms and phototropism.

To date, there is no information on the function of any member of the ZIK/WNK group, although it was found that the transcription of four members of this group (WNK1/2/4/6) is under the control of circadian rhythm (Nakamichi et al., 2002). GO significance analysis of coregulated genes associates ZIK3 with photosynthesis and vitamin metabolism. ZIK4 is strongly coregulated with genes in light responses, phototropism and water stress. ZIK6 is known to bind and to phosphorylate V-ATPase subunit C (Hong-Hermesdorf et al., 2006), but our analysis did not result in any functional prediction. ZIK9 was associated with oxidative stress signalling. Taken together, out of the 80 MAPKKKs, functional information supported by experiments is available for only seven genes (ANP1–3, CTR1, EDR1, YDA and MEKK1). The analysis of coregulated genes for GO term overrepresentation predicts putative functions for a further 29 genes, summarised in Table 4.

The functions of all 10 MAP4Ks are unknown, but the coregulated gene list for MAP4K2 predicts a function in meristem development and maintenance, and MAP4K3 is also linked with development. MAP4K10 is coregulated with genes involved in photosynthesis-related processes.


MAPK pathways are conserved signal transduction modules in eukaryotes, but these modules have diversified and have distinct roles in distantly related organisms, such as plants, yeasts and animals. There is only a single MKK encoded in the sequenced unicellular algae, Chlamydomonas and Ostreococcus (Merchant et al., 2007; Palenik et al., 2007), suggesting that the common ancestor of animals, fungi and plants also had just one MKK and that the diversification of MAPKs in these kingdoms has occurred independently. Therefore functional information generated on MAPKs in yeast and animal systems is not transferable to plant models. In higher plants the MAPK signalling components are highly expanded in number, with c. 120 putative genes in Arabidopsis, and are abundantly used in many biological processes, but our insights into their functions are limited to a small proportion of MAPK components.

Although it is generally accepted that signalling processes are primarily regulated at the post-transcriptional level, a significant portion of regulatory input occurs at the gene expression level. For example, signal transduction pathways activate specific transcription factors that coordinately regulate sets of genes. Because signalling networks contain ample feedback and feed-forward regulation (Kitano, 2004), it is common that the set of genes coordinately regulated by a signalling pathway includes components of the very same pathway, regulators of the pathway (e.g. protein phosphatases) or genes coding for parallel pathways within the same biological process. Such network topologies can lead to significant coregulation of signalling components with their targets. The large volume of available gene expression data can therefore provide a valuable resource for functional information on signalling genes. Time-course experiments following responses to specific stimuli are particularly valuable.
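The coregulation argument above can be illustrated with a minimal Python sketch that ranks genes by the Pearson correlation of their expression profiles with a "bait" signalling gene. It assumes that each profile is a list of expression values measured across the same experiments; the gene names, the toy values and the 0.8 cut-off are hypothetical and are not taken from this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A flat profile has no variance, so the correlation is undefined.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def coregulated(bait, expression, threshold=0.8):
    """Return (gene, r) pairs with r >= threshold, highest correlation first."""
    bait_profile = expression[bait]
    hits = []
    for gene, profile in expression.items():
        if gene == bait:
            continue
        r = pearson(bait_profile, profile)
        if r >= threshold:  # NaN never passes this comparison
            hits.append((gene, r))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: five time points per gene.
expression = {
    "BAIT_KINASE": [1.0, 2.0, 4.0, 8.0, 4.0],
    "GENE_A": [2.0, 4.0, 8.0, 16.0, 8.0],  # proportional to the bait
    "GENE_B": [8.0, 4.0, 2.0, 1.0, 2.0],   # moves opposite to the bait
    "GENE_C": [3.0, 3.0, 3.0, 3.0, 3.0],   # not regulated at all
}

hits = coregulated("BAIT_KINASE", expression)
for gene, r in hits:
    print(f"{gene}\t{r:.3f}")
```

In this toy set only GENE_A passes the cut-off; the anticorrelated and the flat profiles are excluded. The threshold is an arbitrary choice for illustration and would need to be justified against the distribution of correlation values in a real data set.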

Our comprehensive analysis of Arabidopsis MAPK signalling components confirms this idea, and shows that a large number of these genes are under significant transcriptional regulation. We combined several strategies: statistical analysis of publicly available microarray data for the regulation of MAPK components during development, during the cell cycle and in response to hormones and to biotic and abiotic stress treatments; a search for gene sets coregulated with MAPKs; and analysis of GO-term overrepresentation in these gene lists. Together, these identified biological processes in which MAPK signalling genes are involved. This large compendium of information should aid further experimentation to study the role of MAPK signalling in diverse biological processes in higher plants.
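As an illustration of how GO-term overrepresentation in a coregulated gene list can be scored, the sketch below uses a one-sided hypergeometric test. This is a common choice for such analyses and is an assumption here, not necessarily the statistic used in this study; the gene identifiers and annotations are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(max(k, 0), min(K, n) + 1)
    )
    return favourable / total


def go_enrichment(hits, background, annotations):
    """Score each GO term for overrepresentation among the hits.

    annotations maps a GO term to the set of genes annotated with it.
    Returns (term, overlap, p_value) tuples, smallest p-value first.
    """
    background = set(background)
    hits = set(hits) & background
    results = []
    for term, genes in annotations.items():
        annotated = set(genes) & background
        overlap = len(annotated & hits)
        if overlap == 0:
            continue
        p = hypergeom_upper_tail(overlap, len(annotated), len(hits), len(background))
        results.append((term, overlap, p))
    return sorted(results, key=lambda row: row[2])


# Hypothetical toy data: 20 background genes, 4 of them coregulated with a bait.
background = [f"g{i}" for i in range(1, 21)]
annotations = {
    "defence response": {"g1", "g2", "g3", "g4", "g5"},
    "photosynthesis": {"g6", "g7", "g8", "g9", "g10"},
}
coregulated_genes = ["g1", "g2", "g3", "g6"]

enriched = go_enrichment(coregulated_genes, background, annotations)
for term, overlap, p in enriched:
    print(f"{term}\toverlap={overlap}\tp={p:.4f}")
```

With a genome-scale background and many GO terms tested per gene list, the raw p-values would have to be adjusted for multiple testing before any term is reported as over-represented.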


The work was funded by a joint BBSRC grant to J.A.H.M. and L.B. (BBS/B/13268 and BBS/B/13314, respectively), the British Council exchange program between the laboratories of L.B. and P.M., and a European Commission Marie Curie fellowship (EIF 041909) to R.D.