The conjugation of the small ubiquitin-related modifier, SUMO, to substrate proteins is a reversible and dynamic process, and an important response of plants to environmental challenges. Nevertheless, reliable data have so far been restricted largely to the model plant Arabidopsis thaliana. The increasing availability of genome information for other plant species offers the possibility to identify a core set of indispensable components, and to discover species-specific features of the sumoylation pathway. We analyzed the enzymes responsible for the conjugation of SUMO to substrates for their conservation between dicots and monocots. We thus assembled gene sets that relate the Arabidopsis SUMO conjugation system to that of the dicot species tomato, grapevine and poplar, and to four plant species from the monocot class: rice, Brachypodium distachyon, Sorghum bicolor and maize. We found that a core set of genes with clear assignment in Arabidopsis had highly conserved homologs in all tested plants. However, we also observed a variation in the copy number of homologous genes, and sequence variations that suggested monocot-specific variants. Generally, SUMO ligases and proteases showed the most pronounced differences. Finally, we identified potential SUMO chain-binding ubiquitin ligases, pointing to an in vivo function of SUMO chains as degradation signals in plants.
The modification of substrate proteins by covalent linkage to the small ubiquitin-related modifier, SUMO, occurs by a dedicated set of enzymes. The process is mechanistically similar to the conjugation of ubiquitin, the most prominent representative of a family of small protein modifiers with conserved structure (Hochstrasser, 2009). SUMO is synthesized as a pre-protein that needs to be processed by SUMO proteases to expose a carboxyl-terminal diglycine motif. In a two-step reaction, the heterodimeric SUMO activating enzyme, SAE, forms a thioester between SUMO’s terminal glycine (Gly) residue and an active site cysteine (Cys) of the enzyme. The process starts with the formation of an AMP–SUMO linkage from ATP and SUMO’s carboxyl-terminal Gly residue. Under release of AMP, the activated SUMO carboxyl terminus is then transferred onto the active site Cys of SAE to form a thioester. The whole process involves dramatic conformational changes of the enzyme (Olsen et al., 2010). Subsequently, activated SUMO is transferred to the active site Cys of the SUMO conjugating enzyme (SCE; also called UBC9 in some animals and fungi). SCE can conjugate SUMO to substrate proteins, resulting in an isopeptide linkage formed between the carboxyl-terminal Gly of SUMO and the ε-amino group of a lysine (Lys) residue within the substrate. So far, conjugation has been observed exclusively to ε-amino groups of Lys residues. This differs from the more complex ubiquitin conjugation machinery, where transfer to α-amino groups, or to substrate Cys residues, has also been documented (cf. Vosper et al., 2009, and references therein). In vitro, and probably also in vivo, SCE can modify many substrates in the absence of substrate specificity factors (SUMO ligases). 
Direct substrate interaction and modification by SCE usually depend on the presence of the short sumoylation consensus motif, consisting of a hydrophobic aliphatic amino acid, followed by the Lys residue to be modified, any amino acid and an acidic residue (ΨKxD/E in one-letter code). Nonetheless, SUMO ligases play important roles in vivo to determine the substrate range and extent of sumoylation. Figure 1 summarizes the reaction steps of the SUMO conjugation cycle.
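As a rough illustration of the consensus motif described above, the following Python sketch scans a protein sequence for ΨKxD/E matches. It assumes that Ψ can be approximated by I, L or V, because the text does not enumerate the residues. The function name and the test sequence are invented for illustration. A match only flags a candidate Lys residue; it is not evidence of modification in vivo.

```python
import re

# Assumption: Psi (hydrophobic aliphatic residue) is taken as I, L or V.
# The lookahead reports every match, including overlapping ones.
CONSENSUS = re.compile(r"(?=([ILV]K[A-Z][DE]))")


def find_consensus_lysines(protein_seq):
    """Return (1-based Lys position, 4-residue motif) for each PsiKx[D/E] match."""
    seq = protein_seq.upper()
    # m.start() is the 0-based index of Psi; the Lys follows it directly,
    # so its 1-based position is m.start() + 2.
    return [(m.start() + 2, m.group(1)) for m in CONSENSUS.finditer(seq)]


# Made-up toy sequence, not a real protein.
toy = "MSAVKEEPLKTDGGIKQE"
print(find_consensus_lysines(toy))
# prints [(5, 'VKEE'), (10, 'LKTD'), (16, 'IKQE')]
```

Because SCE can also modify Lys residues outside this motif, such a scan is best treated as a first filter rather than as a predictor.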
Functional studies in plants, as well as the characterization of sumoylation enzymes, have so far been restricted largely to Arabidopsis thaliana (for recent reviews, see Miura et al., 2007a; Lois, 2010; Miura & Hasegawa, 2010; H. J. Park et al., 2011). More recently, the first experimental data for the monocot plant rice were published (Park et al., 2010; Thangasamy et al., 2011; Wang et al., 2011). SUMO conjugation has been shown to be essential in Arabidopsis (Saracco et al., 2007), and work by several groups has demonstrated its importance for the integration of environmental inputs and for adequate reaction to stress conditions (Yoo et al., 2006; Catala et al., 2007; Conti et al., 2008; Jin et al., 2008; Chen et al., 2011; Miura et al., 2011). A significant number of plant-specific sumoylation substrates have been identified recently (Budhiraja et al., 2009; Elrouby & Coupland, 2010; Miller et al., 2010). With an impressive body of information available for Arabidopsis, but relatively little insight into SUMO conjugation in other plants, we wanted to understand which of the components identified in Arabidopsis are conserved in other plants, and which genes point to more divergent features of the pathway. The increasing number of sequenced plant genomes (for review, see Feuillet et al., 2011) and plant gene databases (Martinez, 2011) provide promising tools for the characterization of complete pathways. For comparison with Arabidopsis, we used the assembled genome data of tomato (Solanum lycopersicum, genome size c. 800 Mb), grapevine (Vitis vinifera, genome size 487 Mb), poplar (Populus trichocarpa, genome size 550 Mb), rice (Oryza sativa; genome size 466 Mb), Brachypodium distachyon (genome size 270 Mb), Sorghum bicolor (genome size 697 Mb) and maize (Zea mays; genome size c. 2800 Mb). 
For maize, we expected to find genes with high similarity to the Sorghum homologs but, as a result of a recent genome duplication, in twice the number found in Sorghum. Surprisingly, this was not the case in most instances, which we ascribe to incomplete sequence availability or annotation in maize. Generally, annotation of the Arabidopsis and rice genomes is most advanced, whereas some genes in other species were not annotated in their full length. We are nonetheless convinced that the survey provides a valid overview of the set of SUMO conjugation and deconjugation enzymes in plants.
Table 1. Enzymes from the sumoylation pathway in Arabidopsis thaliana (Arabidopsis), Solanum lycopersicum (tomato), Vitis vinifera (grapevine), Populus trichocarpa (poplar), Oryza sativa (rice), Brachypodium distachyon (Brachypodium), Sorghum bicolor (Sorghum) and Zea mays (maize)
SUMO activating enzyme subunit 1 (SAE1)
SUMO activating enzyme subunit 2 (SAE2)
SUMO conjugating enzyme (SCE)
SUMO (SUM1 homologs)⁶
SUMO ligase SIZ1 type
SUMO ligase HPY2/MMS21 type
SUMO ligase PIAS-like
SUMO protease class A¹⁰
SUMO protease class B1 OTS type
SUMO protease class B2
SUMO protease class C ESD4 type
SUMO domain containing protein
¹ Previous annotations of the Arabidopsis Col-0 genome additionally listed gene At5g50680, with a sequence identical to that of At5g50580. This entry was removed from the most recent update.
² Hypothetical open reading frame GRMZM2G129575 is significantly shorter than other SAE2 reading frames. However, de novo prediction allows this open reading frame to be extended, generating a gene that is as long as and highly similar to Sorghum SAE2 (see Supporting Information Fig. S2 on alignment).
³ A gene previously annotated as a potential SCE1 pseudogene, At5g02240 (SCE1b), encompasses a conserved gene currently annotated as a steroid dehydrogenase. It was therefore excluded from the table. See text for further details.
⁵ Genes Os04g49130, Bradi5g19200, Sb06g026280, Sb06g026270, Sb06g026250, GRMZM2G433968, GRMZM2G038851, GRMZM2G341089 and GRMZM2G146142 apparently form a monocot-specific subgroup (see Fig. S3 on alignment).
⁶ For the complete set of Arabidopsis SUMO genes, see Novatchkova et al. (2004). Orthologs to the additional Arabidopsis SUMO genes are difficult to find in other plants, although all plants encode ‘noncanonical’ SUMO genes, with unknown function.
⁷ GRMZM2G053898 does not contain the conserved carboxyl-terminal residues of SUMO. It may therefore be nonfunctional, or the gene may be incompletely annotated.
⁸ According to current annotation, GRMZM2G002999 is shorter, aligning only to the carboxyl-terminal part of other genes listed in this group. This could result from incomplete annotation or from gene truncation.
⁹ The gene model currently representing Os05g48880 in databases is shorter than other HPY2 genes, lacking part of the zf-MIZ domain. However, de novo prediction for this gene allows the construction of an extended reading frame that contains all parts expected for an HPY2 ortholog (see Fig. S6 on alignment).
¹⁰ In addition to the genes listed, which fall into classes A, B1, B2 and C, we identified SUMO protease candidates that do not fit into any of these classes: rice genes Os01g33530; Os03g42960; Os04g25110; Os09g08450; Os09g11860; Os09g23240; Os10g33450; Os11g12500; Brachypodium genes Bradi2g26350; Bradi2g33410; Bradi3g28630; Bradi5g15320; maize genes GRMZM2G312375; GRMZM2G321795; GRMZM2G332829; Sorghum gene Sb03g029665; grapevine gene GSVIVT01007609001; and poplar gene POPTR_0004s06880.
¹¹ Current annotations suggest that Sorghum gene Sb03g040230 and maize GRMZM2G177324 are significantly shorter than other homologs, aligning to only a portion of the reading frame of Arabidopsis protease At1g09730. However, these hypothetical open reading frames could be part of longer genes with more extended homology.
¹² Genes Os01g69040, Bradi2g58870, Sb03g043910 and GRMZM2G359505 have two short amino acid insertions in common, and may therefore form a monocot-specific subgroup (see Fig. S11 on alignment).
SUMO chain binding protein (ubiquitin ligase)
SUMO activating enzyme SAE
The SUMO activating enzyme is a heterodimer. The smaller subunit, SAE1, is represented by two genes in Arabidopsis: SAE1a and SAE1b (see Table 1). There is considerable difference in amino acid sequence, but both are competent for SUMO activation (Budhiraja et al., 2009), and no functional differentiation has been reported so far. The larger subunit SAE2, which contains the active site Cys, is encoded by a single gene in Arabidopsis. Taken together, we found that all plants contained SAE genes in low copy numbers, suggesting that single copies of SAE1 and SAE2 were present in the common ancestor of monocots and dicots.
SUMO conjugating enzyme SCE
Arabidopsis encodes a single SCE gene, SCE1 (Kurepa et al., 2003; Novatchkova et al., 2004). Another entry of previous surveys, annotated as a possible gene with similarity to SCE1 (At5g02240 in Kurepa et al., 2003; Novatchkova et al., 2004), probably consists of two distinct open reading frames. One is a presumed pseudogene with identifier At5g02244 in the most recent TAIR release; the other is an abscisic acid (ABA)-responsive steroid dehydrogenase (At5g02240). Neither of these loci is listed in Table 1. In contrast with Arabidopsis, all other plants of Table 1 encode at least two SCE genes. Monocots have additional SCE genes with a slightly different sequence (cf. SCE alignment provided as Supporting Information Fig. S3), which may be considered as a monocot-specific subgroup.
SUMO genes encode precursor proteins with carboxyl-terminal extensions. After extension cleavage by SUMO-specific proteases, the exposed, conserved carboxyl terminus is linked to enzyme active site Cys residues, and eventually to substrates. Some ‘noncanonical’ SUMO proteins have mutations in conserved residues of the carboxyl-terminal region. The functional implications of these changes are not yet fully understood, but decreased cleavage by SUMO-specific proteases is one of the consequences (Budhiraja et al., 2009). Arabidopsis SUMO1 and SUMO2 are each other’s paralogs, representing the most highly expressed, ‘canonical’ isoforms. Over-expression of either SUMO1 or SUMO2 is correlated with an attenuation of ABA-mediated growth inhibition, and combined mutation of both SUMO isoforms is lethal (Lois et al., 2003; Saracco et al., 2007). Arabidopsis contains six additional SUMO genes, SUMO3–8, and one pseudogene (Novatchkova et al., 2004). Among this group, there is good evidence for the participation of SUMO3 (At5g55170) in SUMO conjugation, whereas evidence for the conjugation of the other isoforms is scarce or nonexistent. SUMO3 is nonessential, and its expression level is lower than that of SUMO1 and SUMO2. SUMO1 and SUMO2, but not SUMO3, can form SUMO chains, and the isoforms also differ in their characteristics as substrates of desumoylating enzymes (Chosed et al., 2006; Colby et al., 2006). SUMO3 is elicitor inducible, and over-expression activates plant defense (van den Burg et al., 2010).
Table 1 and Fig. S4 list the sequences identified as orthologs of Arabidopsis SUMO1 and SUMO2. Although other genes homologous to Arabidopsis SUMO genes exist in flowering plants, their relationship to the Arabidopsis genes has not been resolved clearly using orthology tools, and they were not included in Table 1.
Ligases are proteins that increase the rate of SUMO conjugation to substrates and influence the substrate specificity of the SUMO conjugation system. This can occur if the ligase brings substrate and SCE into close proximity, by providing binding interfaces for both. Interestingly, as SCE itself often binds to substrates, this is not the only method of catalytic enhancement. It has also been suggested that certain SUMO ligases may only interact with SCE, and enhance SUMO transfer by imposing conformational constraints on SUMO-loaded SCE (Gareau & Lima, 2010). Likewise, domains specifying a particular subcellular localization, plus SCE interaction, can increase local SCE concentration to promote sumoylation of certain substrates. A known SCE interaction domain, present in most (but not all) identified SUMO ligases, is the MIZ-type zinc finger (zf-MIZ, also called SP-RING; Hochstrasser, 2001), a domain in which two zinc ions are coordinated via a set of conserved Cys and histidine (His) residues. zf-MIZ is part of all three SUMO ligase types known in Arabidopsis.
Arabidopsis SUMO ligase At3g15150 was identified independently by two groups and designated as AtMMS21 and HPY2, respectively (Huang et al., 2009; Ishida et al., 2009). This ligase plays a role in DNA metabolism and meristem maintenance. Homologs to this protein also exist in fungi and animals. In most plants, HPY2 is apparently a single copy gene. In Brachypodium, two paralogs exist that are closely linked, suggesting that a species-specific tandem duplication resulted in the gene number increase. We therefore conclude that a single gene of this class belongs to the core set of plant SUMO ligases.
Two additional proteins of Arabidopsis, At1g08910 and At5g41580, carry the zf-MIZ domain that is characteristic of many SUMO ligases (Novatchkova et al., 2004). The encoded proteins, PIAS-like 1 and PIAS-like 2 (PIAL1 and PIAL2), respectively, show in vitro SUMO ligase activity (K. Tomanov & A. Bachmair, unpublished). Expression data (University of Toronto Arabidopsis eFP Browser http://bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi) indicate that PIAL1 is stress inducible. The PIAL1/2 class of SUMO ligases has identifiable homologs in the analyzed plants. PIAL1/2 class members show a high level of sequence conservation in the amino-terminal region that decreases in the second half of the proteins, after the zf-MIZ domain (for details, see Fig. S7). Most plants listed have one homolog, which is more similar to PIAL2 than to PIAL1.
SUMO proteases have a dual function. They provide free SUMO by hydrolyzing peptide linkages in primary translation products of SUMO genes, which encode carboxyl-terminal extensions linked to the SUMO sequence (see Fig. 1). In addition to precursor cleavage, these proteases function as isopeptidases to release and recycle SUMO from protein conjugates (Colby et al., 2006; Mukhopadhyay & Dasso, 2007). There is good evidence that SUMO proteases contribute to the regulation of flowering time, plant–pathogen interactions and adaptation to abiotic stress factors (Murtas et al., 2003; Xu et al., 2007; Conti et al., 2008; Kim et al., 2008). The specificity of SUMO proteases in animals and fungi derives, to a large extent, from their subcellular localization, and mutant enzymes with improper localization do not fully complement null phenotypes (Panse et al., 2003; Mukhopadhyay & Dasso, 2007).
An exhaustive listing of plant SUMO-specific proteases is particularly challenging, for the reasons explained below. The known SUMO proteases of Arabidopsis belong to the C48 clade of Cys proteases (Mukhopadhyay & Dasso, 2007; van der Hoorn, 2008). However, proteases specific for the modifier ubiquitin belong to five different structural groups, including metalloproteases (for reviews, see Routenberg Love et al., 2007; Komander et al., 2009), and one SUMO-specific protease of baker’s yeast, Wss1, is a metalloprotease (Mullen et al., 2010). Plants encode proteins with homology to Wss1 (e.g. Arabidopsis gene At1g55915 is a potential candidate). Likewise, Arabidopsis genes AtULP2a–h have been annotated as potential SUMO proteases (Kurepa et al., 2003), but experimental proof has not yet been published. This suggests that the experimentally verified SUMO-specific proteases of Arabidopsis do not represent the complete set. We nonetheless concentrated on the C48 clade of proteases, and used Phytozome 8 gene family 318727390 as the basis for analysis. Figure 2(a) shows a phylogenetic tree of Arabidopsis entries (generated using MrBayes 3.2.1). It contains, in addition to branches with known SUMO proteases, two new branches that represent additional candidate genes. In a second step, we searched our set of plant genomes for presumed orthologs of the Arabidopsis genes, using OMA, OrthoMCL, Inparanoid and RoundUp. Figure 2(b) shows a graphic representation of gene relationships using CLANS (Frickey & Lupas, 2004). Table 1 lists all entries of Fig. 2. Footnote 10 of Table 1 lists genes that did not fall into one of the four defined classes. Figure S9 shows a protease sequence alignment. Taken together, the searched plant species encode potential orthologs of the known Arabidopsis SUMO proteases, but additional proteases with SUMO specificity are likely to exist in plants.
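The text does not state how the predictions of the four orthology tools were combined. One simple option, shown here purely as an assumption, is to keep a candidate pair only when a minimum number of tools agree on it. The Python sketch below works on hand-made placeholder identifiers. It does not call any of the tools, and the function name and the `min_support` threshold are hypothetical.

```python
from collections import Counter


def consensus_orthologs(predictions, min_support=2):
    """Keep (query, candidate) pairs predicted by at least `min_support` tools.

    predictions: dict mapping a tool name to a set of (query, candidate) pairs.
    Returns a dict mapping each retained pair to its number of supporting tools.
    """
    support = Counter()
    for pairs in predictions.values():
        support.update(set(pairs))  # each tool votes at most once per pair
    return {pair: n for pair, n in support.items() if n >= min_support}


# Placeholder identifiers only; these are not real predictions.
example = {
    "OMA": {("query_1", "cand_A")},
    "OrthoMCL": {("query_1", "cand_A"), ("query_1", "cand_B")},
    "Inparanoid": {("query_1", "cand_A")},
    "RoundUp": set(),
}
print(consensus_orthologs(example))
# prints {('query_1', 'cand_A'): 3}
```

A stricter or looser threshold changes how many of the lower-confidence candidates, such as those collected in footnote 10 of Table 1, would be retained.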
Group A, potential SUMO protease At3g48480
This group of genes awaits experimental proof regarding its activity as SUMO-specific proteases. Its relationship to the other groups, however, is highly suggestive of the proposed activity. Most plants of Table 1 encode a single gene of this class.
Group B1, OTS1 and OTS2
The two genes OTS1, At1g60220, and OTS2, At1g10570, have overlapping functions and have been implicated in salt stress resistance (Conti et al., 2008). The copy number of this group varies from one in grapevine and poplar to six in Sorghum (Table 1).
Group B2, potential SUMO proteases At1g09730 and At4g33620
These genes (called ULP2like2 and 1, respectively, in Novatchkova et al., 2004) have not yet been functionally characterized. The At1g09730 gene is large and, consistent with a high expression level, cDNAs have been isolated. The second Arabidopsis gene of this class, At4g33620, has a significantly lower expression level than At1g09730. Mutation of At1g09730, but not of At4g33620, results in reduced growth (H-P. Stuible & A. Bachmair, unpublished). Predicted orthologs exist in all plants of Table 1.
Group C, ESD4 and related
ESD4 of Arabidopsis (At4g15880) localizes to the nuclear periphery and contributes to flowering time regulation (Murtas et al., 2003; Xu et al., 2007). By contrast, its relative ELS1 (At3g06910) is extranuclear (Hermkes et al., 2011). The two genes are functionally distinct, as is evident from their different mutant phenotypes. Because esd4 mutants have a more severe phenotype than els1 mutants, we hypothesize that ESD4 is the central gene of this group. A third candidate gene, At4g00690, has not been functionally characterized and may be a pseudogene. Each plant of Table 1 has at least one representative of this group.
Sumoylation functions mainly by promoting the formation of new intra- and intermolecular protein contacts (Kerscher, 2007). These interactions allow the establishment of functional networks between sumoylated proteins and their noncovalent interactors (Hecker et al., 2006; Kerscher, 2007). So-called SUMO-interacting motifs (SIMs) are the mediators of noncovalent interactions between SUMO and SUMO binding proteins. SIMs are characterized by a loose consensus sequence, ΨΨxΨD/S/E or D/S/EΨxΨΨ, where Ψ symbolizes the hydrophobic amino acids I, L, V, M or F, x can be any amino acid, and D, S and E represent single-letter amino acid abbreviations (Miteva et al., 2010). The short length and variability of these sequences result in poor conservation (they may disappear and reappear in a different part of a protein, or in a different subunit of a protein complex). However, one class of animal and fungal proteins is characterized by a tandem arrangement of four SIMs, which therefore have specific affinity for binding to SUMO chains. In addition, these proteins have a RING domain and function as ubiquitin ligases, channeling proteins with a SUMO chain into the ubiquitin-proteasome-dependent degradation pathway (Plechanovova et al., 2011; Praefcke et al., 2012). It has been shown previously that Arabidopsis SUMO1/2 can form chains (Colby et al., 2006). We therefore gave consideration to candidate loci with the above structural hallmarks. Arabidopsis has two proteins with four or five SIMs and one RING domain, At3g07200 and At5g48655. The performed orthology predictions suggest that they are each other’s paralogs and that potential orthologs exist in other plants. Some monocot representatives have two characteristic sequence insertions compared with the other family members, and may therefore form a monocot-specific subclass (Table 1, Fig. S11). 
We thus hypothesize that plants use SUMO chains as degradation signals, channeling chain-modified substrate proteins into degradation by SUMO chain binding ubiquitin ligases.
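The SIM consensus given above (ΨΨxΨD/S/E or D/S/EΨxΨΨ, with Ψ = I, L, V, M or F) can be written as a pattern search. The Python sketch below reports all SIM-like hits in a sequence and tests whether at least four of them lie close together. The window of 100 residues, the function names and the toy sequence are assumptions made for illustration. The sketch does not test for a RING domain, which would require a separate domain search.

```python
import re

PSI = "[ILVMF]"   # hydrophobic residues named in the text
ACID = "[DSE]"    # D, S or E
# Either orientation of the motif; the lookahead reports overlapping hits too.
SIM = re.compile(
    rf"(?=({PSI}{{2}}[A-Z]{PSI}{ACID}|{ACID}{PSI}[A-Z]{PSI}{{2}}))"
)


def find_sims(seq):
    """Return the 1-based start positions of all SIM-like matches."""
    return [m.start() + 1 for m in SIM.finditer(seq.upper())]


def has_sim_cluster(seq, min_sims=4, window=100):
    """True if at least `min_sims` hits start within `window` residues.

    min_sims=4 follows the text; window=100 is an arbitrary illustrative value.
    """
    starts = find_sims(seq)
    for i in range(len(starts) - min_sims + 1):
        if starts[i + min_sims - 1] - starts[i] < window:
            return True
    return False


# Made-up toy sequence with four SIM-like stretches separated by Gly spacers.
toy = "GGVIDLEGGGGDVAILGGGGIVTVDGGGGSLPVIGG"
print(find_sims(toy))        # prints [3, 12, 21, 30]
print(has_sim_cluster(toy))  # prints True
```

Because the consensus is loose, single hits are expected by chance in many proteins, which is why the clustering criterion matters more than any individual match.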
In summary, we list the predicted orthologs of known and currently unappreciated components of the SUMO conjugation system and identify candidates for monocot-specific subgroups of enzymes. The overall structural conservation of the SUMO conjugation system in flowering plants underpins the value of Arabidopsis as a model in SUMO research, and monocot-specific features promise interesting results from experimental approaches in monocots.
Work in A.B.’s laboratory is supported by the German Research Foundation DFG (grant 1158/5-1, SPP1365) and by the Austrian Science Foundation FWF (grant P21215-B12). K.H.’s work is supported by DFG (SPP1365).