Protein alpha-N-acetylation studied by N-terminomics

Authors


K. Gevaert & P. Van Damme, VIB Department of Medical Protein Research, UGent Faculty of Medicine and Health Sciences, Ghent University, A. Baertsoenkaai 3, B-9000 Ghent, Belgium
Fax: +32 92 649 496
Tel: +32 92 649 274
E-mail: kris.gevaert@vib-ugent.be; petra.vandamme@vib-ugent.be

Abstract

Cotranslational protein N-terminal modifications, including proteolytic maturation such as initiator methionine excision by methionine aminopeptidases and N-terminal blocking, occur universally. Protein alpha-N-acetylation, or the transfer of the acetyl moiety of acetyl-coenzyme A to nascent protein N-termini, catalysed by multisubunit N-terminal acetyltransferase complexes, generally takes place during protein translation. Nearly all protein modifications are known to influence different protein aspects such as folding, stability, activity and localization, and several studies have indicated similar functions for protein alpha-N-acetylation. However, until recently, protein alpha-N-acetylation remained poorly explored, mainly due to the absence of targeted proteomics technologies. The recent emergence of N-terminomics technologies that allow isolation of protein N-terminal peptides, together with proteogenomics efforts combining experimental and informational content, has greatly boosted the field of alpha-N-acetylation. In this review, we report on such emerging technologies as well as on breakthroughs in our understanding of protein N-terminal biology.

Abbreviations
(a)TIS, (alternative) translation initiation site
hyx, hyrax
NAT, N-terminal acetyltransferase
SCX, strong cation exchange

Protein alpha-N-acetylation and N-terminal acetyltransferases

Alpha-N-acetylation occurs in all kingdoms of life, but has mostly been studied in eukaryotes where it is a highly abundant modification [1–5]. This modification occurs by transfer of an acetyl group from acetyl-coenzyme A to the alpha-amino group of the first amino acid residue of a protein or peptide. It is both a very common cotranslational and a rather rare post-translational process catalysed by N-terminal acetyltransferases (NATs), and all eukaryotic NATs identified to date are fully or partially associated with ribosomes [6–11]. From yeast to humans, there is a set of five conserved NATs, NatA–NatE [12], whereas higher eukaryotes also harbour an additional NAT-type, NatF [13] (see also Table 1). Each NAT is composed of one or more specific subunits and acetylates a subset of protein N-termini roughly defined by the N-terminal amino acid sequence. NatA, composed of the catalytic subunit Naa10p (Ard1) and the auxiliary subunit Naa15p (Nat1/NATH), was the first NAT identified and potentially acetylates Ser-, Ala-, Thr-, Val-, Gly- and Cys- N-termini after initiator methionine cleavage [4,14–16]. Also, acidic N-termini (Asp-/Glu-) may be targeted post-translationally by NatA or Naa10p [17]. Furthermore, the chaperone-like protein Huntingtin (Htt)-yeast-2-hybrid protein K was identified as a stable interactor of hNatA and also influenced alpha-N-acetylation of a NatA substrate, suggesting that this may be an essential functional component of NatA [18]. NatB is composed of the catalytic subunit Naa20p (Nat3) and the auxiliary subunit Naa25p (Mdm20), and potentially acetylates Met–Asp-, Met–Glu- and Met–Asn- [7,19–21]. The hydrophobic N-termini, Met–Leu-, Met–Ile-, Met–Tyr- and Met–Phe-, are potential targets of the NatC complex composed of the catalytic subunit Naa30p (Mak3) and the auxiliary subunits Naa35p (Mak10) and Naa38p (Mak31) [8,22,23].
NatD is the catalytically active Naa40p (Nat4) which acetylates the Ser-starting N-termini of histones H2A and H4 in yeast [24]. NatE is defined as Naa50p (Nat5) which in part is physically associated with the NatA subunits Naa10p and Naa15p [6,25–27]. Human Naa50p in vitro acetylates Met–Leu- and similar N-termini [17,28], but its in vivo activity and its dependency on Naa10p and Naa15p are yet to be determined. Recently, we revealed that higher eukaryotes express an additional NAT-type, NatF, defined by the catalytic Naa60p enzyme [13]. NatF acetylates Met–Lys-, Met–Ala-, Met–Leu- and other N-termini, including those found to hold an increased N-acetylation potential in higher eukaryotes. As such, NatF displays both a unique substrate profile and a potential redundancy with NatC and NatE.
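The sequence-based substrate classes described above can be summarized as a small lookup. The following Python sketch is only an illustration of those rules: the function name `candidate_nats` and the residue sets are transcribed from the specificities listed in this section, the input is assumed to be the mature N-terminal sequence as observed (i.e. after any initiator methionine removal), and NatD is omitted because its histone substrates cannot be recognized from the first residue alone. The result lists candidate NATs and is not a prediction that acetylation actually occurs.

```python
from typing import List

# Residue sets transcribed from the specificities listed in the text.
NATA_FIRST = set("SATVGC")   # Ser-, Ala-, Thr-, Val-, Gly-, Cys- (initiator Met removed)
ACIDIC_FIRST = set("DE")     # Asp-/Glu- (possible post-translational NatA/Naa10p targets)
NATB_SECOND = set("DEN")     # Met-Asp-, Met-Glu-, Met-Asn-
NATC_SECOND = set("LIYF")    # Met-Leu-, Met-Ile-, Met-Tyr-, Met-Phe-
NATE_SECOND = set("L")       # Met-Leu- (in vitro activity of human Naa50p)
NATF_SECOND = set("KAL")     # Met-Lys-, Met-Ala-, Met-Leu- (among others)


def candidate_nats(mature_nterm: str) -> List[str]:
    """Return the NAT types whose listed specificity matches the N-terminus."""
    seq = mature_nterm.upper()
    if not seq:
        return []
    first = seq[0]
    if first in NATA_FIRST:
        return ["NatA"]
    if first in ACIDIC_FIRST:
        return ["NatA/Naa10p (post-translational)"]
    if first == "M" and len(seq) > 1:
        second = seq[1]
        hits = []
        if second in NATB_SECOND:
            hits.append("NatB")
        if second in NATC_SECOND:
            hits.append("NatC")
        if second in NATE_SECOND:
            hits.append("NatE")
        if second in NATF_SECOND:
            hits.append("NatF")
        return hits
    return []
```

A Met–Leu- N-terminus returns NatC, NatE and NatF together, which reflects the potential redundancy between these NATs noted above.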

Table 1.   Composition and subunits of eukaryotic N-terminal acetyltransferases (NATs). The NatA complex in humans may be composed of one of two different catalytic subunits, the paralogues hNaa10p or hNaa11p, and one of two different auxiliary subunits, the paralogues hNaa15p or hNaa16p. The NatE activity may require both Naa15p and Naa10p as auxiliary subunits. NatF is not present in Saccharomyces cerevisiae.
Type | NatA | NatB | NatC | NatD | NatE | NatF
Homo sapiens
Catalytic subunits | hNaa10p, hNaa11p | hNaa20p | hNaa30p | hNaa40p | hNaa50p | hNaa60p
Auxiliary subunits | hNaa15p, hNaa16p | hNaa25p | hNaa35p, hNaa38p | – | hNaa15p, hNaa10p | –
S. cerevisiae
Catalytic subunits | yNaa10p (Ard1p) | yNaa20p (Nat3p) | yNaa30p (Mak3p) | yNaa40p (Nat4p) | yNaa50p (Nat5p) | –
Auxiliary subunits | yNaa15p (Nat1p) | yNaa25p (Mdm20p) | yNaa35p (Mak10p), yNaa38p (Mak31p) | – | yNaa15p (Nat1p), yNaa10p (Ard1p) | –

Functions of protein alpha-N-acetylation: protein stability, localization and beyond

The prevailing view on alpha-N-acetylation is that it protects proteins from degradation. Quite unexpectedly, Varshavsky and colleagues recently demonstrated that alpha-N-acetylation can create specific degradation signals, referred to as AcN-degrons, which are targeted by a specific ubiquitin ligase that tags these proteins for degradation [29]. By contrast, recent proteomics studies demonstrated that the steady-state protein expression levels of NatA and NatB substrates in yeast remained largely unchanged [4,21], indicating that not all NatA- and NatB-mediated alpha-N-acetylation events in yeast yield AcN-degrons [29]. Puzzling here is that the SP-starting Ymr090w protein, previously found to be Nα-free [4], in compliance with the (X)PX rule of protein N-acetylation [30], was nonetheless shown to harbour a NatA-dependent AcN-degron [29].

Besides this more general view on the effect of protein alpha-N-acetylation on protein stability, its exact biological implication has only been determined for a limited number of proteins, ranging from no observed functional effects, to alteration in protein localization, protein complex formation and enzyme kinetics (Table 2). For example, Arl proteins, small Arf-like GTPases, play key roles in targeting subcellular organelles; only in their GTP-bound state is the hydrophobic side of the N-terminally positioned helix exposed, enabling interaction with the lipid bilayer. Upon GTP hydrolysis, however, the GDP-bound form is displaced from the membrane as the helix becomes buried in the protein core. Alpha-N-acetylation of these N-terminal, amphipathic helices of yeast Arl3p and human Arl8b by NatC was shown to be a prerequisite for targeting these GTPases to the Golgi and lysosomes, respectively. As such, in these cases, alpha-N-acetylation seems to be an alternative for fatty acid-based modification (i.e. N-myristoylation and N-palmitoylation) in targeting small GTPases to membranes [8,31–33]. Construction of chimeric proteins further demonstrated that alpha-N-acetylation is more likely to be important in removing the N-terminal positive charge, thereby promoting protein insertion into the lipid bilayer, than in organelle-specific targeting [32]. Protein malfunctioning inflicted by the presence of an alpha-N-acetyl group was demonstrated for the thalassaemia-causing Raleigh mutation in haemoglobin. Here, a Val to Ala mutation at position 2 leads to alpha-N-acetylation and results in a decreased affinity for oxygen [34,35]. Other roles for eukaryotic protein alpha-N-acetylation and their associated function(s) are summarized in Table 2.

Table 2.   Known effects and functions of cotranslational alpha-N-acetylation of eukaryotic proteins. Empty cells continue the entry of the row above.
Protein | Function | Role of alpha-N-acetylation | References
MATα2 | Site-specific endonuclease | Targeted degradation | 29
Tbf1p | Transcriptional activator | |
Slk19p | Regulator of chromosome segregation | |
Pop2p | mRNA deadenylase | |
Hsp104p | Chaperone | |
Tho1p | Nuclear RNA-binding protein | |
Ubp6p | Ubiquitin carboxyl-terminal hydrolase | |
Aro8p | Aromatic amino acid aminotransferase | |
Act1p | Cell motility | Protein–protein interaction | 20,87,88
Tpm1p | Muscle contraction, actin filament stabilization | |
Gag | Viral capsid protein | | 22
HbF | Oxygen transport | | 35
Orc1p | Transcriptional silencing | | 89
Sir3p | Transcriptional silencing | | 90
Arl3p | GTPase | Protein localization | 31
hArl8b | | | 8,32
Grh1p | Endoplasmic reticulum to Golgi vesicle-mediated transport | | 91
Trm1p | tRNA (guanine-N2-)-methyltransferase activity | | 92
GNMT | Glycine N-methyltransferase | Enzymatic function | 93
Tfs1p | Carboxypeptidase Y inhibition | Protease inhibition | 94
Subunits of the 80S ribosomal complex | Protein synthesis | Efficiency of protein synthesis/80S ribosome assembly/translational fidelity | 95

N-Terminal protein sequence characteristics

Considerable differences in the use of amino acids at protein N-termini compared with internal protein sequences have been identified [36,37]. Amino acid frequencies at human protein N-termini are indeed biased and alanines are highly favoured at N-terminal positions, especially at the second position [36]. Interestingly, this is the case for serine in Saccharomyces cerevisiae [4] and Halobacterium salinarum [2]. Further, a higher frequency is also evident for arginine and leucine, whereas isoleucine, tyrosine and acidic amino acids are disfavoured at human protein N-termini. Furthermore, an apparently repetitive occurrence (or enrichment) of the starting amino acid was observed for certain classes of protein N-termini, including the (Met)–Ala-, (Met)–Ser- and (Met)–Thr- NatA-type class of N-termini [36]. Further, for NatB substrates, conservation of the N-terminal sequence (NAT class) is not common between lower and higher eukaryotes [21], which is well in line with the fact that the N- and C-termini of proteins display a lower degree of conservation compared with internal protein parts [37]. However, evolution of N-terminal sequences was shown not to be a major cause of the observed evolutionary increase in N-terminal acetylation [13]. All of the above indicates that phenomics data related to specific (often yet undefined) NAT substrates cannot simply be extrapolated from one species to another, despite the generally evolutionarily conserved NAT specificity profiles [4,7,8].

Evolutionary shifts in NAT-specificities and alpha-N-acetylomes

Sequence analysis of yeast and human protein N-termini suggests that evolution or differences in substrate N-terminal sequences may account for only a minor part of the observed evolutionary shift in alpha-N-acetylation [13]; protein N-acetylation in archaea, yeast, fruit fly and human occurs on ∼ 15%, 60%, 70% and 80% of all (soluble) proteins, respectively [2,4,38]. It was only recently demonstrated that the level of alpha-N-acetylation in archaea [2,39–41] more closely resembles that of eukaryotes as opposed to bacteria, although the overall numbers of acetylated proteins are fewer and archaeal protein N-termini are typically partially acetylated. It is interesting to note that in the archaeal Sulfolobus solfataricus genome only one NAT-encoding gene was identified. The resulting protein was confirmed to display NAT-activity and based on its 37% sequence identity with hNaa10p, was named ssArd1 (ssNaa10p). Surprisingly, besides the broader specificity profile compared with the typical NatA-type specificity [i.e. ssArd1 also acetylates (Met)-Ser- and (Met)-Ala-], this protein additionally demonstrated NAT activity towards Met–Glu- and Met–Leu-starting N-termini [41], leading to the hypothesis that this NAT might well represent an ancestral NAT with a broader substrate-specificity profile. Interestingly, independent of its other composite NatA partners, hNaa10p was recently shown to display a specificity profile that diverged from that of endogenous hNatA because it also displayed a prominent acetylating activity towards Glu- as well as Met–Glu- starting N-termini, which was postulated to represent a NatA-independent, post-translational hNaa10p activity [17].

The regulation of substrate specificity and NAT stability via complex formation [6,7,9,17,42], post-translational modifications of NATs [28,43] and the existence of various splice variants and NAT paralogues [44–46] potentially contribute to the increased complexity of alpha-N-acetylation observed in higher eukaryotes and require further molecular characterization. In this respect, in silico analysis combined with in vitro peptide and in vivo protein acetylation analyses led to the identification of a novel NAT specific for higher eukaryotes and displaying a unique specificity profile [13]. When expressed in yeast, this NAT increased the overall fraction of alpha-N-acetylated yeast proteins by 10%, thereby shifting the general acetylation levels towards those of higher eukaryotes.

Protein N-acetylation studies point to alternative translation initiation (TIS)

Alternative splicing and alternative promoter usage are major contributors to the increase in protein diversity in higher eukaryotes [47,48]. Synthesis of various protein isoforms might be fine-tuned and controlled at the nucleotide sequence level by secondary RNA structure constraints proximate to the translation start codon and by the regulation/occurrence of trans-acting factors [49]. Both allow for a context-dependent, leaky mechanism of mRNA scanning [50], internal entry of the ribosome or ribosomal shunting [51] as opposed to the more general linear scanning mechanism of translation initiation making use of the 5′ 7-methylguanosine, or m7G, RNA cap-recruited 40S ribosomal subunits to scan and allow for the initiation of translation at the first encountered AUG codon [52]. However, the occurrence and usage of alternative AUG and non-AUG start codons within a single transcript and their contribution to proteome diversity have been neglected. An accumulating body of evidence on mRNA transcript information clearly demonstrates the possible generation of protein isoforms differing at their N-terminus [53]. These may be either N-terminally extended or truncated through the use of alternative translation initiation sites (aTIS) in the 5′-UTR or of downstream in-frame start codons, respectively. Using comparative genomics, it was recently demonstrated that both types of aTIS are conserved in eukaryotic genomes [54–56], further highlighting their possible functional significance. As a consequence, the usage of aTISs might give different functionalities to their resulting protein products, including distinct localization patterns [57] and protein interaction potential among others, as demonstrated for a few tens of proteins [58–63].

Figure 1 shows the identification of the conserved database-annotated N-termini (with and without initiator methionine processing) and of the aTIS1-derived N-terminus in the human Mps one binder kinase activator-like 3 (MOBL3) protein and its Drosophila orthologue, Mps one binder kinase activator-like 4 (MOB4). Alternative splicing may give rise to the identified conserved aTIS2, because for both species the coding sequence resides in exon 2. Alpha-N-acetylation on all identified MOBL3/4 protein variants, in addition to the presence of optimal KOZAK (in case of human proteins being gccRccAUGG with R = A or G and capitals indicative of the major determining nucleotides) and Cavener (KOZAK-like motif in insects, including Drosophila melanogaster, being cAAaAUG [64,65]) sequence motifs surrounding the annotated as well as the downstream AUG start codons, points to species-conserved occurrence of alternative translation. Nonetheless, in the human UniProtKB/Swiss-Prot release version 2011_03, this protein product was marked to be the result of erroneous translation initiation (EMBL-CDS: AAP97221.1). Remarkably, the partial degree of alpha-N-acetylation of both database-annotated protein N-termini (Fig. 1B) (15.3% for the V2M-starting human N-terminus and 29.8% for the M1K-starting Drosophila N-terminus) and the absolute distance between the aTISs, a factor known to contribute to the regulation of aTIS selection [54], are also conserved. Because the extreme N-terminus harbours a putative signal peptide and the protein is known to reside at the membrane of Golgi cisternae, it is tempting to speculate that the loss of a few N-terminal residues and/or the presence or absence of the initiator methionine and alpha-N-acetylation might confer an altered stability and/or localization on the resulting isoforms.

Figure 1.

Conservation of alternative translation initiation sites (aTIS) in the Mps one binder kinase activator-like protein from fruit fly and human, as demonstrated by mass spectrometric identification of matching protein N-termini. (A) Sequence alignment of the first 50 amino acid residues of human MOBL3 and Drosophila melanogaster MOB4 shows that all methionines are conserved. Identified (alternative) protein N-terminal peptides are indicated in bold. The loss of the first one or two amino acids in the aTIS as opposed to the database-annotated N-termini identified is indicated in italics. (B) Alignment of the corresponding nucleotide sequences surrounding the identified aTIS AUG codons (underlined). Nucleotide preferences matching the KOZAK (MOBL3) or Cavener (MOB4) consensus sequence motifs are indicated with capital letters. (C) Illustrative MS-spectra for the different categories of N-terminal peptides. Database-annotated protein N-termini are indicated in the left-hand panels. These peptides were identified as Ac-2VMAEGTAVLR11 (upper) (the initiator methionine retaining N-terminus was additionally identified) and 1MKMADGSTILR11 (lower) [(13C2D3)Ac denotes a 13C2D3 alpha-N-acetylated amino group] originating from the human MOBL3 and the fruit fly MOB4 protein. The ion intensities of the two different N-terminal peptide forms indicate that both N-termini were partially acetylated (for 15.3 and 29.8%, respectively). MS-spectra of the conserved and fully alpha-N-acetylated aTIS-generated N-termini, respectively AEGTAVLR and ADGSTILR, are displayed in the right-hand panels. Further, another alternative N-terminus, being 33MDSTLAVQQYIQQNIR48 and 33MDSTLAVQQYIQQLIKR49 in MOBL3 and MOB4, respectively (both fully alpha-N-acetylated), presumably produced by alternative splicing, was also identified.

Post-translational alpha-N-acetylation

As opposed to eukaryotes and archaea, alpha-N-acetylation occurs, although rarely, post-translationally in bacteria. In Escherichia coli, only four N-acetylated proteins, including the elongation factor EF-Tu and some ribosomal proteins, have been identified [1]. Alpha-N-acetylation of L12 was shown to be required for ribosomal stalk complex stabilization [66].

Besides the well-reported occurrence and biological implications of post-translational alpha-N-acetylation on certain regulatory peptides and hormones including the α-melanocyte-stimulating hormone and β-endorphin in mammals (reviewed in [38]), the widespread assumption is that in eukaryotes alpha-N-acetylation occurs strictly cotranslationally. More recently, however, indirect evidence has accumulated for a more generalized occurrence of post-translational alpha-N-acetylation. In this respect, identifying the upstream proteases exposing alpha-N-acetylatable N-termini will be of vital importance. In various plant studies, > 50 processed and alpha-N-acetylated mature chloroplast protein N-termini have recently been identified [67], all of which were nuclear encoded (see also representative example in Fig. 2 as reported in [67]). The transport of proteins in higher plant chloroplasts across the membrane is often mediated by N-terminal chloroplast transit peptides and strikingly, the N-acetylated residues of these chloroplast proteins (closely) mapped to the experimentally verified/predicted processing event by the stromal processing peptidase, revealing acetylation motifs of the NatA type, recognizable by a presumed nonluminal NAT [67–69]. As such, post-translational alpha-N-acetylation is highly likely to follow stromal processing peptidase cleavage events. Recently, Bienvenut et al. [70] showed a similar mechanism in the alga Chlamydomonas reinhardtii in which chloroplast proteins with a processed transit peptide were identified as N-terminally acetylated.

Figure 2.

Alpha-N-acetylated mature protein N-terminus of the nuclear-encoded chloroplast protein Myo-Inositol monophosphatase-like 1. (A) The mature N-terminus 61VLSEVSDQTR70 of the myo-inositol monophosphatase-like 1 protein (AT1G31190.1) (indicated in the protein sequence in bold) was identified as alpha-N-acetylated. The predicted chloroplast transit sequence (using TargetP at http://www.cbs.dtu.dk/services/TargetP-1.1/) is indicated in italics and the predicted stromal processing peptidase processing site is underlined. The lower part schematically represents the domain organization of this protein and its predicted transit peptide (purple) and stromal processing peptidase cleavage site. Inositol_P = inositol monophosphatase domain. (B) The corresponding MS/MS spectrum of the doubly charged peptide ion with the observed b- and y-type fragment ions indicated.

Further, Rubenstein and colleagues revealed that, in various eukaryotic species, some of the highly conserved actins undergo post-translational processing by which the alpha-N-acetylated Met or Cys is removed by an N-acetylaminopeptidase followed by alpha-N-acetylation of the newly exposed acidic N-terminal residue [71–74]. Recently, we reported on a putative role for hNaa10p in the post-translational acetylation of these processed actin N-termini. In fact, we demonstrated that hNaa10p very strongly acetylates peptides representative of the acidic (i.e. the Asp- and Glu-starting) N-termini of β- and γ-actin. Further, we also showed that a major fraction of endogenous hNatA is nonpolysomal and that a portion of hNaa10p is independent of the hNatA complex, suggesting that either hNatA or hNaa10p may act post-translationally towards the processed actins [17]. Further, Helbig et al. [36] reported on the identification of over 200 so-called internal (partially) N-acetylated protein N-termini in yeast and demonstrated the lack of (a) consensus sequence(s) of such N-termini as opposed to database-annotated, N-acetylated protein N-termini. Thus, overall, all of these data hint at NATs working in a post- as well as cotranslational mode.

The use of oligopeptides for studying the sequence specificities of alpha-N-acetyltransferases in vitro

Synthetic peptides resembling the N-terminal parts of nascent polypeptides have been successfully used as substrates for selected NAT complexes or catalytic subunits. For such studies, one needs a (mixture of) peptide(s), a NAT complex or catalytic subunit and an acetyl donor, acetyl coenzyme A or (stable) isotopically labelled variants thereof. Following in vitro alpha-N-acetylation, the mixture can either be separated by reverse-phase HPLC or the substrate peptide can be isolated to calculate the degree of acetylation. The former is based on the fact that an acetylated peptide is slightly more hydrophobic than its nonacetylated counterpart, and thus based on UV absorbance intensities, the absolute degree of acetylation can be derived by comparing the UV traces of both types of peptides. The latter is typically used for radiolabelled (following acetylation) peptides in which scintillation counters are used to measure radioisotopes, thus leading to relative levels of acetylation when comparing different substrate peptides.
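The HPLC-based read-out described above reduces to a simple ratio of peak areas. The Python sketch below illustrates that arithmetic only; the function name `acetylation_degree` is hypothetical, and the calculation assumes that the acetylated and nonacetylated forms of a peptide give the same UV response per mole, which is an approximation that should be checked for a given peptide and wavelength.

```python
def acetylation_degree(area_acetylated: float, area_free: float) -> float:
    """Percentage of a peptide that is alpha-N-acetylated.

    Both arguments are integrated UV peak areas (same units) of the
    acetylated and the nonacetylated form of one substrate peptide.
    Assumption: equal molar UV response for both forms.
    """
    if area_acetylated < 0 or area_free < 0:
        raise ValueError("peak areas cannot be negative")
    total = area_acetylated + area_free
    if total == 0:
        raise ValueError("at least one peak area must be positive")
    return 100.0 * area_acetylated / total
```

The same ratio applies to paired ion intensities of the two forms of an N-terminal peptide in MS-based measurements, under the analogous assumption of comparable ionization.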

A more in-depth and largely unbiased analysis of the sequence specificities of the NAT complexes and their individual catalytic subunits can be carried out by using proteome-derived peptide libraries as enzyme substrate pools [13,17] (Fig. 3D). In brief, in an isolated proteome, cysteines are in vitro blocked by alkylation and lysines by acetylation. Such a proteome is then digested with trypsin, which now only cleaves C-terminal to arginines, and alpha-N-acetylated peptides are depleted from this complex peptide mixture by strong cation exchange (SCX) at low pH ([75], and see below). This mixture, now composed of peptides containing free alpha-amino groups, is incubated with a NAT complex or a catalytic NAT subunit of interest and with an isotopically labelled variant of acetyl coenzyme A such as 13C2D3-acetyl-coenzyme A. As a result, during enzymatic acetylation, NAT substrate peptides acquire a 13C2D3-acetyl group at their alpha-N-terminus, aiding in identifying true peptide substrates following LC-MS/MS analysis. Following the acetylation reaction, in vitro acetylated peptides are enriched by SCX at low pH and finally analysed by LC-MS/MS.
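The logic of this peptide library approach can be mimicked in silico. The Python sketch below is a simplified illustration and not the published workflow: the function names are hypothetical, digestion is modelled as cleavage after every arginine without missed cleavages and without any special treatment of Arg–Pro bonds, and the mass shifts are the monoisotopic masses calculated for the net addition of an acetyl group (C2H2O) and of its 13C2D3 variant.

```python
import re

# Monoisotopic mass shifts (Da) for the net addition of an acetyl group.
LIGHT_ACETYL = 42.010565  # natural-isotope acetyl (C2H2O)
HEAVY_ACETYL = 47.036105  # 13C2 D3-acetyl, as donated by 13C2D3-acetyl-coenzyme A


def arg_only_digest(protein):
    """Cleave C-terminal to every Arg, mimicking trypsin on a proteome whose
    lysines were blocked by acetylation (simplified: no missed cleavages)."""
    return re.findall(r"[^R]*R|[^R]+$", protein.upper())


def free_amine_library(protein):
    """Internal peptides only: every peptide except the protein N-terminal
    one carries a newly generated, free alpha-amino group."""
    return arg_only_digest(protein)[1:]


def is_in_vitro_acetylated(observed_shift, tol=0.005):
    """True if an N-terminal mass shift matches the heavy acetyl label,
    which distinguishes in vitro from in vivo (light) acetylation."""
    return abs(observed_shift - HEAVY_ACETYL) <= tol
```

The roughly 5 Da difference between the two labels is what allows in vitro acetylated substrate peptides to be told apart from any residual in vivo acetylated peptides after LC-MS/MS analysis.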

Figure 3.

 Proteomics technologies to study protein or peptide alpha-N-acetylation. Protein mixtures are in vitro S-alkylated and free amines are either N-acylated (enrichment of all N-termini, independent of their Nα-modified status) or left unmodified (exclusive enrichment of in vivo alpha-N-blocked N-termini). Subsequently, the samples are typically digested with trypsin. (A) (Alpha-)amine specific coupling methods to deplete N-free peptides include terminal amine isotopic labelling of substrates and CNBr-activated Sepharose. (B) SCX at low pH can be used to enrich for N-terminal peptides, with internal peptides being retained on the SCX-resin and serving as an N-free peptide library for in vitro NAT-assays (see D). (C) Biotinylated internal peptides are affinity-trapped on an avidin column to enrich for N-terminal peptides. (D) Peptide in vitro Nα-acetylation. N-terminal peptides are subsequently analysed by LC–MS/MS analysis.

Proteomics for studying protein alpha-N-acetylation

Contemporary proteomics is driven by mass spectrometry and is typically performed bottom-up, meaning that proteomes are first digested into peptides, typically using trypsin, and these peptides are analysed by means of mass spectrometry and, following identification, serve as proxies for the identification and quantification of their parent proteins. Because of the high complexity of peptide mixtures generated by whole-proteome digestion, protein N-terminal peptides need to be enriched prior to LC-MS/MS analysis; they would otherwise remain largely undetected in the background of all other peptides. Over the past several years, various laboratories have introduced different strategies for enriching N-terminal peptides, and the most important are discussed below (see also Fig. 3).

In 2003, the Gevaert lab reported on the use of diagonal, reverse-phase peptide chromatography to enrich for protein N-terminal peptides [76] (Fig. 3B). A detailed protocol describing all consecutive steps of this so-called N-terminal combined fractional diagonal chromatography approach was recently published [77]. In brief, prior to trypsin digestion, cysteines are blocked by alkylation and lysines and primary alpha-amines by acylation. As a result, trypsin digestion will yield two main classes of peptides; N-terminal peptides and non-N-terminal peptides or internal peptides. The actual chemical difference between both peptide classes is that the N-terminal peptides have a blocked (acylated) alpha-amino group, whereas the internal peptides have a free, thus primary, alpha-amino group. Following trypsin digestion, this peptide mix is fractionated a first time by RP-HPLC. Clearly, each of these peptide fractions contains both N-terminal and internal peptides; the latter are specifically targeted by reaction with 2,4,6-trinitrobenzenesulfonic acid and thus made more hydrophobic by the trinitrophenyl group that they acquire on their alpha-N-terminus. As a result, during a second, identical RP-HPLC separation, the modified internal peptides shift away from the nonmodified N-terminal peptides, and the latter are collected for further LC-MS/MS analysis.

Because N-terminal combined fractional diagonal chromatography is a negative selection procedure for N-terminal peptides, i.e. it removes non-N-terminal peptides instead of positively selecting N-terminal ones, this technology was recently also used for detailed, quantitative studies on protein alpha-N-acetylation. A first study, published in 2007, was performed on the halophilic archaea H. salinarum and Natronomonas pharaonis and, rather unexpectedly, reported that 13–18% of archaeal proteins were subject to alpha-N-acetylation [2].

Combined with other proteomics technologies, including SCX at low pH (see below), N-terminal combined fractional diagonal chromatography was used to map the N-terminal proteome of Drosophila melanogaster Kc167 cells [30]. Besides reporting on the identification of over 1200 mature protein N-termini in these fruit fly cells, the (X)PX rule was derived from the proteomics data. This rule indicates that proline prevents alpha-N-acetylation if it is present at protein position 2, following the (nonretained) initiator methionine, or at protein position 3 following an amino acid with a small gyration radius (e.g. glycine and serine) and the nonretained initiator methionine. Given the current focus on the influence of protein alpha-N-acetylation on protein stability [29], this (X)PX rule gives researchers the possibility of studying protein stability (and protein function and localization) when removing the end-standing acetyl group. Both pioneering N-terminal combined fractional diagonal chromatography studies catalogued alpha-N-acetylation in different organisms, but did not provide clues as to the actual degree of acetylation. In fact, this important quantitative aspect of alpha-N-acetylation was assessed in another study that reported on the specificities of the yeast and human NatA acetyltransferase complex by the use of amino-directed, and isotopically labelled mass tags [4].
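The (X)PX rule as stated above can be written as a short sequence check. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the input is the database-annotated sequence starting with the initiator methionine, and the set of "small" residues at position 2 is assumed here to be the residues listed earlier as NatA-type N-termini after initiator methionine cleavage (the text itself only names glycine and serine as examples).

```python
# Assumption: residues with a small gyration radius, after which the
# initiator Met is removed. The text names Gly and Ser as examples.
SMALL_RESIDUES = set("GSATCV")


def xpx_predicts_free_nterm(annotated_sequence: str) -> bool:
    """True if the (X)PX rule predicts a free (nonacetylated) alpha-amine.

    Case 1: Pro at position 2, exposed after initiator Met removal.
    Case 2: Pro at position 3, preceded by a small residue at position 2
            (initiator Met again removed).
    """
    seq = annotated_sequence.upper()
    if len(seq) < 3 or seq[0] != "M":
        return False
    if seq[1] == "P":
        return True
    return seq[1] in SMALL_RESIDUES and seq[2] == "P"
```

For instance, an Ala-to-Pro substitution at position 2 turns a sequence for which the rule is silent into one predicted to remain alpha-amino free, which is the kind of engineered variant the rule makes possible.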

Several other negative selection procedures also target internal peptides using for instance beads onto which amino-scavengers are immobilized, such as N-terminal amine isotopic labelling of substrates that uses dendritic polyglycerol aldehyde polymers to trap internal peptides [78] and CNBr activated Sepharose beads [79] (Fig. 3A). Yet another strategy is to label the alpha-N-terminus of internal peptides with affinity tags such as biotin allowing for their subsequent removal from proteome digests [80] (Fig. 3C).

As mentioned above, SCX chromatography at low pH can be used to enrich blocked N-terminal peptides from protein digests (e.g. [81] and Fig. 3B). More recently, the Heck lab exploited SCX at low pH to enrich for acetylated peptides in whole-proteome digests [82]. Later, the Heck lab experimented with the endoproteinase Lys-N (thus cleaving N-terminal to lysine residues in proteins and yielding internal peptides starting with lysine) in combination with SCX at low pH and different peptide fragmentation techniques and showed that specific peptide classes, the most important being alpha-N-acetylated peptides and phosphopeptides, could be selectively enriched [83]. The latter technology was recently used to study the substrates of the yeast NatB complex [21]. In brief, in this study, following metabolic labelling with nitrogen-15, the N-terminal proteomes of wild-type yeast and a nat3(naa20)Δ strain were compared, and out of 756 acetylated protein N-termini, 59 different yeast NatB substrates were identified.

Concluding remarks

Accurate prediction of the occurrence of alpha-N-acetylation and its degree is not straightforward [84,85], because besides the primary amino acid sequence, the mRNA structure (RNA-stalling), the structure of the emerging polypeptide, the occurrence of cotranslational membrane insertion, complex formation, speed of translation and chaperone activity among others, and the expression of species-specific NATs (as is the case for NatF) may all potentially contribute to the occurrence and the extent of alpha-N-acetylation. As such, detailed and comprehensive experimental data on various N-acetylomes are still needed to understand the biology of protein N-acetylation. Organisms and cellular model systems in which the NATs are manipulated, combined with high-throughput N-terminal proteomics, appear to be a very powerful and effective route. Further, qualitative and quantitative in vitro NAT assays provide essential supporting information, and in some cases may even guide our path towards interesting biological findings. As an important tool for assessing the functional implications of protein N-acetylation, the (X)PX rule may be implemented. In a transgenic fruit fly model, Goetze et al. demonstrated the effects of hyrax (Hyx) protein N-acetylation when comparing wild-type N-acetylated Hyx protein and the alpha-amino free form Hyx(A2P), of which the latter only partially rescued a mutant hyx allele and displayed reduced expression levels. Nonetheless, the creation of N-acetylated proteins and their N-free counterparts requires great care, because the translation initiation context and, concomitantly, the translation efficiency, RNA/protein stability, post-translational modification of an altered amino acid, efficiency of initiator methionine processing and/or level of expression might differ between the two protein variants.

Regarding the functional impact of protein alpha-N-acetylation, there is still no clear consensus based on the variety of published data. Several studies, including the work by Goetze et al. on the Hyx protein, suggest that alpha-N-acetylation confers stability on proteins. However, Hwang et al. [29] demonstrated the involvement of alpha-N-acetylation in yeast protein degradation. These apparent discrepancies may simply be due to the fact that there is no absolutely general rule; alpha-N-acetylation may or may not impose a gain- or loss-of-function on its target protein depending on the specific protein in question. It is interesting that in certain cases the absence of alpha-amino acetylation is more important (conserved) than its presence. In the case of various 20S proteasome catalytic subunits, propeptides protect the mature N-terminal catalytic threonine residues from alpha-N-acetylation and in this way prevent the loss of specific peptidase activities [86]. Further, a significant correlation between alpha-amino free protein N-termini and the presence of certain protein domains was also observed [30].

Further improvements of the applied methodology, the generation of additional genetic models and in-depth functional studies of selected candidates are likely to be instrumental in the elucidation of the ‘why?’ and ‘how?’ of alpha-N-acetylation.

Acknowledgements

PVD is a Postdoctoral Fellow of the Research Foundation–Flanders (FWO-Vlaanderen). KG acknowledges support of research grants from the Fund for Scientific Research–Flanders (Belgium) (project numbers G.0042.07 and G.0440.10), the Concerted Research Actions (project BOF07/GOA/012) from Ghent University and the Inter University Attraction Poles (IUAP06). TA is supported by the Norwegian Research Council (Grant 197136 to TA), and the Norwegian Cancer Society.
