N-terminal protein acetylation is common in eukaryotes and halophilic archaea, but very rare in bacteria. We demonstrate that some of the most abundant proteins present in the crenarchaeote Sulfolobus solfataricus, including subunits of the thermosome, proteasome and ribosome, are acetylated at the N-terminus. Modification was observed at N-terminal serine, alanine, threonine and methionine–glutamate sequences. A conserved archaeal protein, ssArd1, was cloned and expressed in Escherichia coli, and shown to acetylate the same N-terminal sequences in vitro. The specific activity of ssArd1 is sensitive to protein structure as well as to sequence context. The crenarchaeota and euryarchaeota apparently differ in how frequently Met–Glu termini are acetylated: this modification appears much more common in S. solfataricus. In eukarya, this sequence is acetylated by the related Nat3 acetylase. ssArd1 thus has a relaxed sequence specificity compared with the eukaryotic N-acetyltransferases, and may represent an ancestral form of the enzyme. This is another example of archaeal molecular biology resembling that of eukaryotes rather than bacteria.
In eukarya, acetylation of the α-amino group at the N-terminus of proteins is an extremely common modification, affecting 80–90% of cytosolic mammalian proteins and 50% of proteins in Saccharomyces cerevisiae (reviewed in Polevoda and Sherman, 2003a). S. cerevisiae has three protein N-terminal acetyltransferases (NATs), NatA, NatB and NatC, with catalytic subunits Ard1, Nat3 and Mak3 respectively. None of these genes is essential in yeast, suggesting that Nα acetylation is not itself an essential modification, although deletion mutants show a variety of growth and other defects (reviewed in Polevoda and Sherman, 2003b). NatA acetylates N-termini beginning with Ser, Ala, Gly or Thr, exposed after removal of the initiator Met residue. NatA acetylates over 90% of proteins with an N-terminal Ser residue, and over 50% of those with an N-terminal Ala (Driessen et al., 1985; Polevoda and Sherman, 2003a). NatB is specific for the N-terminal sequences Met–Glu and Met–Asp, while NatC acetylates termini in which Met is followed by a bulky hydrophobic amino acid (Polevoda and Sherman, 2003a). In Homo sapiens, hsArd1 (human Ard1) forms a complex with the NatH protein, which in turn interacts with the ribosome to facilitate co-translational acetylation of proteins (Arnesen et al., 2005). The Ard1, Mak3 and Nat3 proteins are homologous and belong to the GNAT superfamily of acetyltransferases (Vetting et al., 2005).
In contrast, bacterial N-terminal protein acetylation is highly unusual. In Escherichia coli, the NAT proteins RimI, RimJ and RimL are specific for acetylation of the ribosomal proteins S18, S5 and L12 respectively (Yoshikawa et al., 1987; Tanaka et al., 1989). There is no evidence for more general N-terminal protein acetylation in bacteria analogous to the eukaryal pathways. Although they lack a nucleus, the archaea have information-processing pathways, such as DNA replication and transcription, that resemble those in eukarya and are quite distinct from the equivalent bacterial processes (Reeve, 2003; White, 2003; Kelman and White, 2005; Robinson and Bell, 2005). Until recently, only a handful of N-terminally acetylated archaeal proteins had been identified, leading to the suggestion that archaea, like bacteria, do not acetylate significant numbers of proteins (Polevoda and Sherman, 2003a). However, a recent proteomic study of two halophilic euryarchaeal species, Halobacterium salinarum and Natronomonas pharaonis, revealed that 14–19% of N-terminal peptides were modified by Nα acetylation (Falb et al., 2006). In these species, acetylation was restricted to termini with Ser or Ala residues, suggesting that the archaeal acetylase has a NatA-like specificity.
Here we report the identification of 17 proteins with acetylated N-termini from the crenarchaeote Sulfolobus solfataricus. The acetylated N-terminal sequences include Ser, Ala, Thr and Met–Glu. We demonstrate that a single NAT family member, S. solfataricus Ard1 (ssArd1), conserved in archaea and eukarya, can acetylate all of these sequences in vitro. We propose that ssArd1 represents an ancestral form of the eukaryal NAT family with relaxed sequence specificity.
Limited proteomic survey of N-terminal acetylation in S. solfataricus
As there was only limited information on the extent of protein N-terminal acetylation in archaea, we undertook a proteomic survey of acetylation in S. solfataricus proteins. An S. solfataricus cell lysate was fractionated by gel filtration chromatography and early fractions, enriched for large proteins and protein complexes and typically containing scores or hundreds of different proteins, were digested by trypsin and characterized by mass spectrometry, as described in Experimental procedures.
The proteins for which an unambiguous N-terminal peptide was identified are summarized in Table 1. Highly expressed proteins, such as proteasome, thermosome and ribosome subunits, were well represented. This probably reflects the greater likelihood of detecting N-terminal peptides from highly expressed proteins, as well as the enrichment of proteins present in high-molecular-weight complexes. Of the 26 N-terminal peptides identified, 17 were acetylated. These included the Alba1 and glutamate dehydrogenase proteins, both shown previously to be acetylated (Bell et al., 2002; Polevoda and Sherman, 2003a). Although limited in scope, this survey represents a significant increase in the number of known acetylated S. solfataricus proteins, and suggests that a high proportion of proteins in this species may be modified by acetylation. Most of the acetylated N-terminal peptides began with Ser or Ala, consistent with removal of the initiator methionine followed by acetylation. One acetylated threonine terminus was identified, together with four N-terminal Met–Glu sequences. In yeast, Met–Glu termini are acetylated by a specialized acetylase, Nat3, whereas NatA acetylates peptides from which the methionine has been removed to expose a Ser, Ala, Thr or Gly residue (Polevoda and Sherman, 2003a). Of the nine unacetylated N-terminal peptides, one (the thermosome β-subunit) was also found in acetylated form, suggesting that partial acetylation can occur. The remainder began with Val, Pro or an uncleaved Met followed by a large uncharged residue; in eukaryotes, the last class are substrates for the Mak3 acetylase. Our data suggest that protein acetylation may be quite common in S. solfataricus, consistent with a recent large-scale proteomic study of the euryarchaea H. salinarum and N. pharaonis, which estimated that 10–15% of proteins in these species are acetylated at the N-terminus (Falb et al., 2006). This stands in stark contrast to the situation in bacteria, where N-terminal acetylation is limited to a few ribosomal proteins (Polevoda and Sherman, 2003a).
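The assignments in this paragraph rest on a few simple sequence rules from the yeast literature, as summarized in the Introduction. The sketch below encodes those rules for a mature N-terminus (initiator Met already removed or retained) and reports which yeast catalytic subunit would be expected to acetylate it. It is illustrative only: the function name `yeast_nat_class` and the residue groupings are assumptions made for this example, not a published predictor, and apart from MEEK (the SSB N-terminus) the example termini are invented.

```python
# Hypothetical helper: map a mature protein N-terminus to the yeast NAT
# catalytic subunit expected to acetylate it, using simplified textbook rules.
ARD1_FIRST = set("SAGT")   # NatA/Ard1: Ser, Ala, Gly, Thr exposed after Met removal
NAT3_SECOND = set("ED")    # Nat3: Met-Glu and Met-Asp termini
MAK3_SECOND = set("LIFW")  # Mak3: Met followed by a bulky hydrophobic residue


def yeast_nat_class(n_terminus):
    """Return 'Ard1', 'Nat3', 'Mak3' or None for an N-terminal sequence
    written as it exists after any methionine processing."""
    seq = n_terminus.upper()
    if not seq:
        return None
    if seq[0] != "M":
        # Initiator Met has been removed: only Ard1-type termini qualify.
        return "Ard1" if seq[0] in ARD1_FIRST else None
    if len(seq) < 2:
        return None
    if seq[1] in NAT3_SECOND:
        return "Nat3"
    if seq[1] in MAK3_SECOND:
        return "Mak3"
    return None


for terminus in ["SAGT", "TPEK", "MEEK", "MLKV", "VNPE", "PSEK"]:
    print(terminus, yeast_nat_class(terminus))
# SAGT Ard1 / TPEK Ard1 / MEEK Nat3 / MLKV Mak3 / VNPE None / PSEK None
```

Under these rules the Val- and Pro-initiated termini fall outside all three classes, while Met–Glu termini map to Nat3 rather than Ard1, which is why their acetylation in S. solfataricus is informative about ssArd1 specificity.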
Table 1. N-terminal peptides from S. solfataricus proteins detected by mass spectrometry.
Column: N-terminal tryptic peptide. Proteins listed include: hypothetical coiled coil; PCD5 DNA binding; CRISPR associated (cmr4); CBS domain protein.
LSU, large subunit ribosomal protein; SSU, small subunit ribosomal protein.
Identification, cloning and purification of the ssArd1 homologue
The genome sequence of S. solfataricus encodes a clear candidate N-terminal protein acetylase, ssArd1 (gene sso0209), with 37% sequence identity to the human Ard1 (hsArd1) protein (Fig. 1). Homologues are present in other archaeal genome sequences, but ssArd1 is more similar to hsArd1 than to any euryarchaeal sequence, or to the eukaryal Mak3 and Nat3 proteins. The gene encoding ssArd1 was amplified by PCR and cloned into the plasmid pET28c, allowing expression of the native protein in E. coli. Recombinant ssArd1 was purified as described in Experimental procedures (Fig. 2A).
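A sequence identity figure such as 37% is a property of a particular alignment. As a minimal sketch, assuming a pairwise alignment has already been produced and that identity is counted only over columns where neither sequence has a gap, percent identity can be computed as below. Conventions differ between tools (some divide by alignment length or by the length of the shorter sequence), so values from different programs are not directly comparable. The aligned fragments in the example are invented and are not ssArd1 or hsArd1 sequence.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two equal-length aligned sequences, counting only
    columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned fragments (not real ssArd1/hsArd1 sequence)
print(percent_identity("MNIRKA-TLED", "MNVRRARPLED"))  # → 70.0
```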
ssArd1 acetylates the N-terminus of ssAlba1
One of the few proteins previously known to be acetylated at the N-terminus in S. solfataricus is the chromatin protein Alba1 (Bell et al., 2002). However, when expressed in E. coli, Alba1 is predominantly unacetylated. We therefore used recombinant unacetylated ssAlba1 as a substrate for ssArd1, incubating the two proteins with acetyl-coenzyme A (AcCoA) in assay buffer at 55°C. Mass spectrometry revealed that a single acetyl group was added to Alba1 when both ssArd1 and AcCoA were present, giving the expected mass increase of 42 Da and a protein of molecular mass 10 496 Da (Fig. 2B). In S. solfataricus, Alba1 is acetylated both at the N-terminus and on a single internal lysine residue, Lys16 (Bell et al., 2002) (Fig. 3A). To determine the site of acetylation of Alba1 by ssArd1, acetylation was carried out using radioactive 14C-AcCoA, and the labelled protein was separated from unincorporated AcCoA by SDS-PAGE, followed by phosphorimaging and quantification (Fig. 3B and C). The rates of acetylation of wild-type (wt) Alba1, a K16E mutant and an N-terminal S2P mutant were compared. The rates of acetylation of wt and K16E Alba1 were essentially the same, whereas acetylation of the S2P mutant was severely reduced. These observations confirmed that ssArd1 is an N-terminal acetylase, consistent with the previous identification of a distinct enzyme, Pat, that is responsible for acetylation of Lys16 of Alba1 (Marsh et al., 2005).
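The intact-mass readout reduces to simple arithmetic: each acetylation adds an acetyl group (C2H2O), about 42.04 Da in average mass, so the unmodified Alba1 implied by the figures above is about 10 454 Da. The sketch below infers the number of acetyl groups from a mass shift, assuming average intact-protein masses and a hypothetical tolerance of 2 Da. Mass alone cannot tell an N-terminal acetyl group from one on Lys16, which is why the 14C assay with the K16E and S2P mutants was needed.

```python
ACETYL_AVG_DA = 42.0367  # average mass of C2H2O added per acetylation


def count_acetylations(unmodified_da, observed_da, tolerance_da=2.0):
    """Estimate how many acetyl groups explain an intact-mass shift.
    Returns None if the shift is not close to a whole number of acetyl groups."""
    shift = observed_da - unmodified_da
    n = round(shift / ACETYL_AVG_DA)
    if n < 0 or abs(shift - n * ACETYL_AVG_DA) > tolerance_da:
        return None
    return n


unmodified = 10496 - 42  # unmodified Alba1 mass implied by the text
print(count_acetylations(unmodified, 10496))  # → 1
```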
Specificity of ssArd1 for the N-terminal amino acid sequence
The eukaryal Ard1 protein family is specific for acetylation of Ser, Ala and Thr residues, with much lower levels of acetylation of Gly, Val and Cys residues (Polevoda and Sherman, 2003a). To test the sequence specificity of ssArd1, we made a series of site-directed mutants of the Alba1 protein in which the second amino acid (after the methionine) was changed to Ala (S2A), Glu (S2E), Gly (S2G), Leu (S2L), Thr (S2T) or Val (S2V). All the mutant proteins were stable and were purified as for the wt Alba1 protein (Fig. 4). Mass spectrometry confirmed that the N-terminal Met residue was removed from the S2A, S2G, S2T and S2V proteins, as expected from the known specificity of methionine aminopeptidase in E. coli, whereas the S2E and S2L mutants retained their N-terminal Met residue. The rate of N-terminal acetylation of each mutant protein was determined using the same 14C-AcCoA incorporation assay and SDS-PAGE. The data showed that, in vitro, ssArd1 preferentially acetylates N-termini with Ser and Ala residues. However, there was appreciable acetylation of both sequences with an N-terminal methionine, Met–Glu and Met–Leu. These are not substrates for eukaryotic Ard1 in vivo, but are targeted by Nat3 and Mak3 respectively (Polevoda and Sherman, 2003a). The Alba1 mutant with an N-terminal valine was also acetylated at an appreciable rate. These data suggest that ssArd1 has a relaxed sequence specificity compared with the eukaryotic proteins. However, two recombinant proteins with valine at the N-terminus were not acetylated by ssArd1 in vitro (Fig. 5), and several unacetylated N-terminal peptides beginning with valine are listed in Table 1. This may reflect differences between the in vivo activity of ssArd1 and the in vitro assay, but it may also relate to differences in the exposure of the N-terminus in different proteins (see Discussion).
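The pattern of Met retention in the Alba1 mutants follows the well-established side-chain-size rule for E. coli methionine aminopeptidase: the initiator Met is generally cleaved when the second residue is small (Gly, Ala, Ser, Cys, Thr, Pro or Val) and retained otherwise. The sketch below applies that rule to the position-2 substitutions tested. It is a simplification (cleavage efficiency also depends on downstream residues and expression level), the function name is illustrative, and the "XX" after position 2 is a placeholder rather than Alba1 sequence.

```python
# Residues at position 2 that permit initiator-Met removal by E. coli MetAP
# (small side chains G, A, S, C, T, P, V); a simplified rule.
METAP_PERMISSIVE = set("GASCTPV")


def mature_n_terminus(sequence):
    """Return the N-terminus expected after MetAP processing."""
    if len(sequence) >= 2 and sequence[0] == "M" and sequence[1] in METAP_PERMISSIVE:
        return sequence[1:]
    return sequence


for mutant, residue in [("wt", "S"), ("S2A", "A"), ("S2E", "E"), ("S2G", "G"),
                        ("S2L", "L"), ("S2T", "T"), ("S2V", "V")]:
    kept = mature_n_terminus("M" + residue + "XX").startswith("M")
    print(mutant, "Met retained" if kept else "Met removed")
# Only S2E and S2L print "Met retained", matching the mass spectrometry above.
```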
Acetylation of other Sulfolobus proteins by ssArd1 in vitro
To test whether the specificity of ssArd1 depends purely on sequence context, we measured the rate of acetylation of a variety of recombinant S. solfataricus proteins with different structures and N-terminal sequences (Fig. 5). Alba1 was confirmed as the best substrate in the panel. Two other proteins with alanine residues at their N-termini, the Holliday junction resolving enzymes Hjc and Hje (Kvaratskhelia and White, 2000), were acetylated at an appreciable rate, consistent with the known specificity of eukaryotic Ard1 for Ser and Ala termini. Notably, however, the S. solfataricus single-stranded DNA-binding protein SSB was acetylated more quickly than either of these proteins. SSB has the N-terminal sequence MEEK–, which is acetylated not by eukaryal Ard1 but by Nat3 (Polevoda and Sherman, 2003a). Together with the data for the Alba1 S2E mutant, whose N-terminus begins ME–, this provides strong evidence that ssArd1 can acetylate Nat3-type as well as Ard1-type substrates.
Whereas most of the proteins tested had been expressed in recombinant form in E. coli, the S. solfataricus protein Sso7d had been purified from its native host and shown by mass spectrometry to be unacetylated in vivo (data not shown), despite the presence of a favoured Ala residue at the N-terminus. Consistent with this, ssArd1 did not acetylate Sso7d in vitro, supporting the view that ssArd1 is the major N-terminal protein acetylase in S. solfataricus. The lack of activity against Sso7d may reflect the structure of its N-terminus, which is closely associated with the rest of the protein (Fig. 5D) (Agback et al., 1998; Su et al., 2000) and may therefore be unavailable for modification. In contrast, the N-termini of Alba1, SSB, Hjc and Hje, which are all substrates for the acetylase, are disordered in their respective crystal structures and presumably solvent exposed in solution (Bond et al., 2001; Wardleworth et al., 2002; Kerr et al., 2003; Middleton et al., 2004).
A limited proteomic study of protein acetylation in S. solfataricus has identified 17 proteins with modified N-termini and only nine unacetylated N-terminal peptides. Notably, many of these proteins, such as subunits of the ribosome, proteasome and thermosome, are highly abundant, and we predict that more sensitive analyses would detect many other acetylated proteins. Taken together with the recent estimate that at least 10% of proteins are acetylated in two species of halophilic euryarchaea (Falb et al., 2006), this strongly suggests that protein acetylation is common in the archaea. If this is correct, it is another respect in which the molecular biology of the archaea resembles that of the eukarya, in stark contrast to bacteria, where only a handful of proteins are known to be modified. The spectrum of N-terminal sequences acetylated in S. solfataricus proteins, comprising primarily Ser, Ala and Met–Glu termini, is also similar to that in eukarya.
The S. solfataricus Ard1 homologue shares substantial amino acid sequence identity with the eukaryotic NAT family of acetyltransferases (37% with hsArd1), suggesting conservation of function. There is a single Ard1 protein in S. solfataricus and many other archaea, whereas eukarya have multiple paralogues (e.g. three in yeast). Unlike the eukaryal NAT family proteins, ssArd1 does not appear to be part of a larger protein complex, but is an active Nα-acetyltransferase in isolation. The greater complexity of the eukaryal acetylase machinery compared with the archaeal system is typical of many cellular processes conserved between the two domains. For example, the archaeal transcription machinery is a much simpler version of the eukaryotic transcription apparatus, which has increased in complexity in the course of evolution (Bell and Jackson, 1998). We therefore postulate that the situation in S. solfataricus represents an ancestral state, with a single NAT family acetyltransferase that functions independently of other proteins. In eukarya, gene duplication has since given rise to paralogous NAT proteins specialized for different N-terminal sequences, and non-catalytic subunits that may influence substrate specificity and anchor the eukaryotic NAT proteins to the ribosome have also evolved.
Consistent with this hypothesis, the biochemical characterization of ssArd1 demonstrates that the protein can acetylate a much wider range of protein N-terminal sequences than any of the eukaryotic NAT proteins. In particular, the strong specific activity against Ser, Ala and Met–Glu termini accounts for the majority of the acetylated protein N-termini detected so far in S. solfataricus, suggesting that ssArd1 is the sole functional NAT in this organism, and that it combines the specificity of the yeast Ard1 and Nat3 proteins. We cannot yet say whether acetylation happens co-translationally in archaea, as is the case in eukarya. However, the observation that the folded 3D structure of Sso7d may prevent acetylation of the N-terminus in vivo in S. solfataricus could be construed as evidence that acetylation happens post-translationally.
The contrast with the situation in euryarchaea is intriguing, as acetylation patterns in the two halophilic species studied so far suggest that Met–Glu termini are not modified (Falb et al., 2006). A comprehensive mass spectrometric characterization of 72 proteins from Methanocaldococcus jannaschii revealed only one acetylated protein (Forbes et al., 2004), suggesting protein acetylation may be a rare modification in this species. It would therefore be interesting to determine whether Nα protein acetylation in other euryarchaea follows the same pattern.
In conclusion, the crenarchaeal N-terminal acetylation machinery appears to resemble an ancestral version of the eukaryotic one, and is an ideal model for structural and mechanistic studies of this conserved enzyme family.
ssArd1 amplification, cloning, expression and purification
ssArd1 (gene sso0209) was amplified by PCR with Pfu polymerase using the following primers:
The PCR product was cloned into the NcoI/BamHI sites of the pET28c vector for native protein expression, and the construct was transformed into E. coli Rosetta (DE3) cells (Novagen). Protein expression was induced by addition of 0.1 mM IPTG to cultures at an OD600 of 0.6, followed by incubation at 37°C for 3 h. Cells were pelleted by centrifugation and the pellet was frozen overnight. Pellets were resuspended on ice in lysis buffer (20 mM Tris-HCl pH 8.5, 200 mM NaCl, 1 mM EDTA, 1 mM DTT) containing 1 mM benzamidine, and the cells were lysed by sonication on ice. Soluble protein was separated from cell debris by centrifugation at 40 000 g, 4°C for 30 min. The cleared lysate was diluted twofold with lysis buffer and heated for 25 min at 65°C, and the precipitated E. coli proteins were removed by centrifugation as above. The heat-treated, cleared lysate was diluted fivefold in 20 mM Tris-HCl pH 8.0, 1 mM EDTA and 1 mM DTT to lower the salt concentration, then passed slowly through an equilibrated 5 ml HiTrap Q-Sepharose column (GE Healthcare) followed by a 5 ml HiTrap heparin column to remove major contaminants. ssArd1 did not bind appreciably to either column under these conditions. ssArd1 was precipitated from the flow-through by addition of solid ammonium sulphate to 80% saturation on ice and collected by centrifugation as above. This step reduced the volume and also eliminated some persistent contaminating proteins. The pellet was resuspended on ice in 20 mM Tris-HCl pH 8.0, 200 mM NaCl, 1 mM EDTA and 1 mM DTT, and the protein was subjected to a final purification step on a HiLoad® 26/60 Superdex® gel filtration column (GE Healthcare).
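The two dilution steps set the ionic strength at which ssArd1 passes through the Q-Sepharose and heparin columns. As a worked check using only the buffer compositions given, and assuming ideal mixing (C1V1 = C2V2), the twofold dilution with lysis buffer leaves NaCl at 200 mM because the diluent itself contains 200 mM NaCl. The fivefold dilution into NaCl-free buffer then brings it to 40 mM. The helper name `mix` is illustrative.

```python
def mix(conc_sample_mm, conc_diluent_mm, fold):
    """NaCl concentration (mM) after diluting a sample `fold`-fold with a
    diluent: one volume of sample plus (fold - 1) volumes of diluent."""
    return (conc_sample_mm + (fold - 1) * conc_diluent_mm) / fold


nacl = 200.0                # lysis buffer NaCl, mM
nacl = mix(nacl, 200.0, 2)  # twofold with lysis buffer: still 200 mM
nacl = mix(nacl, 0.0, 5)    # fivefold in Tris-HCl/EDTA/DTT without NaCl
print(nacl)  # → 40.0
```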
Fractionation of S. solfataricus proteins for mass spectrometry
Sulfolobus solfataricus P2 biomass was obtained from Dr Neil Raven, Centre for Applied Microbiology and Research, Porton Down, UK. Cell lysis, centrifugation and chromatography steps were carried out at 4°C. Five grams of cells were thawed in 12 ml of buffer A (50 mM MES pH 6.5, 200 mM NaCl, 1 mM DTT, 1 mM EDTA) and immediately sonicated for 5 × 1 min with cooling for 2 min between bursts. The lysate was centrifuged at 40 000 g for 30 min, and the supernatant was passed through a 0.45 μm syringe filter. The filtered, cleared lysate was applied to a 26/70 gel filtration column (Superdex Hi-Load, GE Healthcare) equilibrated with buffer A, and fractionated by isocratic chromatography. Fractions eluting from the column were monitored by absorbance at 280 nm. Fractions eluting early from the column, assumed to be enriched for large proteins and protein complexes, were analysed by mass spectrometry as described below.
Detection of acetylated N-terminal peptides by mass spectrometry
The pH of the fractions was adjusted to pH 8.0 using 1 M Tris pH 8.0, and trypsin (0.5 ml, 0.1 mg) was added. The samples were incubated at 37°C overnight. The resulting peptides were diluted 1:5 into 5% formic acid and then separated using an UltiMate nanoLC (LC Packings, Amsterdam) equipped with a PepMap C18 trap and column, using a 3.5 h gradient of increasing acetonitrile concentration containing 0.1% formic acid (5–35% acetonitrile in 3 h, 35–50% in a further 30 min, followed by 95% acetonitrile to clean the column). The eluent was sprayed into a Q-Star Pulsar XL tandem mass spectrometer (Applied Biosystems, Foster City, CA) and analysed in Information Dependent Acquisition (IDA) mode, with Gas 1 set to 5, Gas 2 to 0, Curtain Gas to 20, IonSpray Voltage to 2800, Declustering Potential to 60, Focusing Potential to 230, Declustering Potential 2 to 15 and Collision Gas to 4. The collision energy was calculated from the mass of the peptide being fragmented by the Analyst IDA CE parameters script. Tandem mass spectrometry data for doubly and triply charged precursor ions were analysed using ProID software (Applied Biosystems), searching against an S. solfataricus protein database containing entries with and without the N-terminal methionine. The data were searched with tolerances of 0.2 Da for the precursor and fragment ions, trypsin as the cleavage enzyme, one missed cleavage, and acetylation and methionine oxidation selected as possible modifications. Unambiguous acetylation was assigned if the peptide match gave a confidence score of 99.
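The database strategy above (each protein entered with and without its initiator Met, with acetylation as a variable modification) can be illustrated at the precursor-mass level. The sketch below takes a protein sequence, forms the first tryptic peptide of each database form, and reports which unmodified or N-acetylated candidates fall within 0.2 Da of an observed neutral mass. It is not ProID: it ignores missed cleavages, Met oxidation and fragment-ion scoring, and it compares neutral masses rather than m/z. The protein sequence and observed mass are hypothetical.

```python
# Monoisotopic residue masses (Da)
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
ACETYL = 42.010565  # C2H2O added by N-alpha acetylation


def first_tryptic_peptide(seq):
    """Peptide from the N-terminus up to the first K/R not followed by P."""
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            return seq[: i + 1]
    return seq


def peptide_mass(peptide, acetylated=False):
    """Neutral monoisotopic mass of an unmodified or N-acetylated peptide."""
    mass = sum(RESIDUE[aa] for aa in peptide) + WATER
    return mass + ACETYL if acetylated else mass


def match_n_terminus(protein, observed_mass, tol=0.2):
    """Return (peptide, acetylated) candidates within tol Da of observed_mass."""
    forms = [protein]
    if protein.startswith("M"):
        forms.append(protein[1:])  # database entry without the initiator Met
    hits = []
    for form in forms:
        pep = first_tryptic_peptide(form)
        for acetylated in (False, True):
            if abs(peptide_mass(pep, acetylated) - observed_mass) <= tol:
                hits.append((pep, acetylated))
    return hits


print(match_n_terminus("MSAGTPEKVLR", 730.35))  # → [('SAGTPEK', True)]
```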
Measurement of acetylated and unacetylated intact protein masses
The protein sample (15 ml, 5 pmol ml−1) was desalted on-line through a MassPrep On-Line Desalting Cartridge (2.1 × 10 mm), eluting with an increasing acetonitrile concentration (from 2% acetonitrile, 98% aqueous 1% formic acid to 98% acetonitrile, 2% aqueous 1% formic acid), and delivered to an electrospray ionization mass spectrometer (LCT, Micromass, Manchester, UK) previously calibrated using myoglobin. The resulting envelope of multiply charged signals was deconvoluted using MaxEnt1 software to give the molecular mass of the protein.
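MaxEnt1 deconvolutes the whole charge-state envelope by a maximum-entropy method. The underlying relationship can be shown more simply: for two adjacent peaks of the envelope, carrying charges z and z + 1, both the charge and the neutral mass follow directly from their m/z values. The sketch below assumes protonated ions ([M + zH]z+) and ideal peak positions; it is not the MaxEnt algorithm, and the example peaks are synthetic values generated for a 10 496 Da protein.

```python
PROTON = 1.007276  # Da


def charge_and_mass(mz_low_charge, mz_high_charge):
    """From adjacent envelope peaks with charges z and z + 1, return
    (z, neutral mass). Uses m1 - m2 = M / (z(z + 1)) and m2 - H = M / (z + 1)."""
    z = round((mz_high_charge - PROTON) / (mz_low_charge - mz_high_charge))
    mass = z * (mz_low_charge - PROTON)
    return z, mass


# Synthetic peaks for M = 10496 Da at charges 10 and 11
mz10 = (10496 + 10 * PROTON) / 10
mz11 = (10496 + 11 * PROTON) / 11
z, mass = charge_and_mass(mz10, mz11)
print(z, round(mass, 2))  # → 10 10496.0
```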
Site-directed mutagenesis of Alba1
The construction of site-directed mutants of Alba1 was carried out using the Quikchange site-directed mutagenesis kit (Stratagene). The sequences of oligonucleotides used for mutagenesis are available from the corresponding author on request. Mutant proteins were expressed and purified as described for the corresponding wt proteins.
The standard acetylation assay contained 250 nM ssArd1, 20 μM Alba1 and 18 μM 14C-acetyl-CoA in assay buffer (50 mM Tris-HCl pH 7.0, 1 mM EDTA, 10% glycerol) in a total volume of 10 μl, incubated at 55°C for the time indicated. Control reactions lacking ssArd1 were treated in the same way to determine non-enzymatic background acetylation rates, which were subtracted from the calculated rates. Reactions were stopped by freezing in liquid nitrogen and stored until needed. To quantify acetylation, samples were thawed quickly with the addition of SDS sample loading buffer and loaded onto SDS-PAGE gels. Following electrophoresis, gels were vacuum dried and exposed to a phosphorimager plate overnight. Plates were scanned using a Fuji FLA-5000 phosphorimager, and the ratios of incorporated to unincorporated 14C-acetate were determined using Image Gauge software (Fuji). All experiments were carried out in triplicate for each time point, and standard errors were calculated.
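The quantification described above can be turned into a rate in three steps: convert the fraction of 14C label in the protein band into a concentration of transferred acetyl groups, take the slope over time, and subtract the slope of the no-enzyme control. The sketch below assumes that the fraction incorporated is incorporated / (incorporated + unincorporated) signal, that the 18 μM AcCoA is the total label pool, and that the early time course is linear. The function names and the signal values are invented for illustration.

```python
def acetylated_um(incorporated, unincorporated, acoa_um=18.0):
    """Concentration (uM) of acetyl groups transferred to protein, from
    phosphorimager signals for protein-bound and free 14C label."""
    return acoa_um * incorporated / (incorporated + unincorporated)


def slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def initial_rate(times, enzyme_signals, control_signals, acoa_um=18.0):
    """Background-corrected rate (uM per s): slope with ssArd1 minus the
    slope of the matching no-enzyme control."""
    with_enzyme = [acetylated_um(i, u, acoa_um) for i, u in enzyme_signals]
    control = [acetylated_um(i, u, acoa_um) for i, u in control_signals]
    return slope(times, with_enzyme) - slope(times, control)


# Invented (incorporated, unincorporated) signal pairs at 0, 60 and 120 s
times = [0, 60, 120]
enzyme = [(0, 1000), (50, 950), (100, 900)]
control = [(0, 1000), (5, 995), (10, 990)]
print(round(initial_rate(times, enzyme, control), 4))  # → 0.0135
```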
To determine the target site(s) of acetylation of Alba1 by ssArd1, wt, K16E and S2P Alba1 proteins (20 μM) were acetylated for 0, 20, 40, 60, 90, 120, 180, 240 and 300 s. The N-terminal sequence specificity of ssArd1 was investigated using versions of Alba1 mutated at position 2, following the initiating methionine (S2A, S2E, S2G, S2L, S2T and S2V), which were analysed in the same way with time points of 0, 60, 120, 180, 240 and 300 s. Rates of acetylation of other S. solfataricus proteins (Hjc, Hje, PCNA1, PCNA2, SSB, Sso7d, XPB1 and XPF), all at 10 μM, were compared with that of wt recombinant Alba1 at a single time point (300 s).
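For the single-time-point comparison, a natural summary is each protein's activity relative to wt Alba1, with the triplicate standard errors carried through. The sketch below assumes independent triplicates and uses the usual first-order error propagation for a ratio of means; the replicate values are invented and do not correspond to any protein in the panel.

```python
import math
import statistics


def mean_sem(replicates):
    """Mean and standard error of the mean for a list of replicate values."""
    mean = statistics.mean(replicates)
    sem = statistics.stdev(replicates) / math.sqrt(len(replicates))
    return mean, sem


def relative_activity(test, reference):
    """Ratio of test to reference means, with first-order propagated SEM."""
    m_t, s_t = mean_sem(test)
    m_r, s_r = mean_sem(reference)
    ratio = m_t / m_r
    return ratio, ratio * math.sqrt((s_t / m_t) ** 2 + (s_r / m_r) ** 2)


# Invented triplicate signals at 300 s (arbitrary units)
reference = [10.0, 11.0, 9.0]  # e.g. wt Alba1
test = [6.0, 6.5, 5.5]         # a hypothetical test protein
ratio, err = relative_activity(test, reference)
print(f"relative activity: {ratio:.2f} ± {err:.2f}")  # → relative activity: 0.60 ± 0.05
```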
Thanks to Paul Talbot for technical assistance and for fractionation of S. solfataricus proteins for mass spectrometry. Thanks to M.F.W. lab members Jodi Richards, Jo Parker, Jana Rudolf and Jen Roberts for the purified recombinant S. solfataricus proteins, and to Jana Rudolf, Sonia Paytubi and Taciana Kasciukovic for helpful discussions. This work was funded by the BBSRC. The Mass Spectrometry facility in St Andrews is funded by grants from the Wellcome Trust.