Halophilic archaea thrive in environments with salt concentrations approaching saturation. However, little is known about the way in which these organisms stabilize their secreted proteins in such ‘hostile’ conditions. Here, we present data suggesting that the utilization of protein translocation pathways for protein secretion by the Halobacteriaceae differs significantly from that of non-haloarchaea, and most probably represents an adaptation to the high-salt environment. Although most proteins are secreted via the general secretion (Sec) machinery, the twin-arginine translocation (Tat) pathway is mainly used for the secretion of redox proteins and is distinct from the Sec pathway, in that it allows cytoplasmic folding of secreted proteins. tatfind (developed in this study) was used for systematic whole-genome analysis of Halobacterium sp. NRC-1 and several other prokaryotes to identify putative Tat substrates. Our analyses revealed that the vast majority of haloarchaeal secreted proteins were predicted substrates of the Tat pathway. Strikingly, most of these putative Tat substrates were non-redox proteins, the homologues of which in non-haloarchaea were identified as putative Sec substrates. We confirmed experimentally that the secretion of one such putative Tat substrate depended on the twin-arginine motif in its signal sequence. This extensive utilization of the Tat pathway in haloarchaea suggests an evolutionary adaptation to high-salt conditions by allowing cytoplasmic folding of secreted proteins before their secretion.
Protein folding under high-salt conditions is crucial for the survival of halophilic archaea, as these organisms accumulate potassium in their cytoplasm to balance the high sodium concentration of the environment (Ginzburg et al., 1970). It has been shown that chaperones support correct protein folding in the cytoplasm of halophilic archaea (Franzetti et al., 2001). Once properly folded, halophilic proteins are stabilized by the same intramolecular forces as their non-halophilic counterparts (Frolow et al., 1996). In addition, aggregation is prevented by minimalization of hydrophobicity and accumulation of negative charges on the protein surface (Madern et al., 2000; Mevarech et al., 2000; Scandurra et al., 2000). Folding might even be accelerated as a result of high intracellular salt conditions. But how can halophilic archaea stably secrete proteins across the cytoplasmic membrane? Bacteria and non-halophilic archaea use the general secretory (Sec) pathway for most secreted proteins, which requires that the proteins are kept in an unfolded state (Pohlschröder et al., 1997; Driessen et al., 1998), a risky scenario in extremely halophilic conditions. Thus, folding of secreted haloarchaeal proteins before translocation would (i) allow their stabilization; and (ii) prevent aggregation, both intra- and extracellularly. The recently discovered twin-arginine translocation (Tat) pathway is capable of translocating folded proteins (Berks et al., 2000; Robinson and Bolhuis, 2001). The use of this pathway by the haloarchaea would also alleviate the need for protein folding outside the cell, where chaperones might not be available. The existence of the Tat pathway in haloarchaea is supported by the observation that homologues of the known essential Tat components (TatA/B and TatC) are present in these organisms (Ng et al., 2000).
Tat substrates can be recognized by their signal sequences, which contain a twin-arginine motif followed by a stretch of uncharged residues (Cristobal et al., 1999; Berks et al., 2000). Signal sequences of Tat substrates resemble those of Sec substrates in that both contain a positively charged hydrophilic n-region, followed by a hydrophobic stretch (h-region) and, in most cases, a cleavage site (c-region; Fig. 1). However, Tat signal sequences contain the conserved twin-arginine motif, which consists of two neighbouring arginines within a less conserved amino acid pattern (Berks et al., 2000; Robinson and Bolhuis, 2001). The h-region of Tat signal sequences is generally less hydrophobic than that of Sec signal sequences and often consists of an uncharged region with neutral hydrophobicity (Cristobal et al., 1999; this work). In addition, it has been suggested that as yet unidentified structural determinants in the mature part of the protein may also play a role in Tat substrate specificity (Cristobal et al., 1999; Halbig et al., 1999; Sanders et al., 2001).
The analysis of known extracellular proteins (which include an α-amylase, proteinases and halocins; Kamekura et al., 1992; 1996; Kobayashi et al., 1994; Price and Shand, 2000) from halophilic archaea revealed that all had typical Tat signal sequences. We thus sought a better understanding of the role that this pathway plays in these organisms. To this end, we scanned the entire Halobacterium sp. NRC-1 genome sequence (Ng et al., 2000; currently the only available complete genome sequence of a halophilic archaeon) for Tat and Sec substrates. Our analyses confirmed that almost all secreted proteins from halophilic archaea are putative Tat substrates. We therefore hypothesize that the extensive use of the Tat pathway by the Halobacteriaceae represents an evolutionary solution to the problem posed by the need for stabilization of secreted proteins in high-salt conditions.
Having observed that (i) all known extracellular proteins of the Halobacteriaceae had typical Tat signal sequences; and (ii) homologues of these proteins in organisms other than Halobacteriaceae contained Sec signal sequences, we sought to better understand the role played by the Tat pathway in haloarchaeal protein translocation. We thus examined the utilization of the Tat and Sec pathways in Halobacterium sp. NRC-1 by analysing the entire genome using the following strategy: (i) we developed a Tat substrate recognition program (tatfind) to detect putative Tat substrates in Halobacterium sp. NRC-1 (this program is based on the position and sequence of the Tat pattern, as well as on the position, length and hydrophobicity of an uncharged region following the twin-arginine pattern); (ii) we examined the usage of the Sec pathway in Halobacterium sp. NRC-1 by identifying putative secreted proteins with signalp (Nielsen et al., 1997); (iii) all signalp-positive candidates were analysed further with tmhmm (Sonnhammer et al., 1998) to eliminate proteins with multiple membrane-spanning segments; and (iv) all remaining candidates were classified based on their subcellular localization. In any genome analysis, only a fraction of coding sequences (CDS) can be positively identified. We chose to analyse the proteins encoded by annotated CDS and assumed that this subset is representative of the whole population of CDS. A detailed map of the analyses and complete results of individual steps of the process can be accessed on the internet (http://www.sas.upenn.edu/~pohlschr).
Identification and analysis of putative Tat substrates
Applying tatfind to the Halobacterium sp. NRC-1 genome, we identified 64 putative secreted Tat substrates (see web page). Of these, 34 had detectable homologues with assigned functions that could be classified into one of four groups: (i) eight putative extracellular enzymes; (ii) 13 binding proteins; (iii) seven redox proteins; and (iv) six other surface proteins (Table 1).
Table 1. Putative Tat substrates of Halobacterium sp. NRC-1 and their closest homologues in both Halobacteriaceae and non-Halobacteriaceae.
Homologues of putative Halobacterium sp. NRC-1 Tat substrates were identified in other organisms by blastp and screened with tatfind and signalp to determine whether they were potential Tat or Sec substrates respectively (see web page). Although all haloarchaeal homologues were putative Tat substrates, the non-haloarchaeal homologues of all non-redox Tat substrates were putative Sec substrates (Table 1). We thus hypothesized that halophilic archaea use the Tat system as a major secretion pathway.
Identification of putative Tat substrates in non-Halobacteriaceae
To substantiate our hypothesis, the same analyses were carried out with three additional non-halophilic archaeal genomes. We identified only eight putative Tat substrates in Archaeoglobus fulgidus, a thermophilic non-halophilic euryarchaeon closely related to Halobacterium sp. NRC-1 (Table 2). Consistent with our previous findings, of the seven proteins that had assigned functions, six were putative redox proteins. Similarly, in two thermophilic crenarchaeota, Sulfolobus solfataricus and Aeropyrum pernix, only four and 10 putative Tat substrates, respectively, could be identified (see web page). Of these, all the assigned substrates were putative redox proteins (Table 2).
Table 2. Classification of putative archaeal Tat substrates.
The accuracy of tatfind was assessed by screening the genomes of (i) Escherichia coli, which has been estimated to contain more than 400 exported proteins (http:www.cf.ac.ukbiosistaffehrmannhead.html) including 26 that are known or hypothesized to be secreted via the Tat pathway; and (ii) Methanococcus jannaschii, which lacks homologues of the Tat translocation genes. tatfind predicted 34 substrates in E. coli, successfully identifying all 26 previously reported Tat substrates (Robinson and Bolhuis, 2001; Stanley et al., 2001). Most remarkably, this algorithm did not predict any proteins in M. jannaschii, which is consistent with the lack of the Tat genes in this methanogen. In addition, tatfind identified no cytoplasmic proteins in A. fulgidus, A. pernix and S. solfataricus.
Taken together, these results support our hypothesis that the extensive use of the Tat pathway is an unusual feature of protein secretion in the Halobacteriaceae, and strongly suggest that tatfind is a suitable tool for the prediction of Tat substrates in a broad range of prokaryotes.
Identification of putative archaeal Sec substrates
To determine the significance of the relatively high number of putative extracellular Tat substrates identified in the Halobacteriaceae, it was necessary to examine the utilization of the Sec pathway for haloarchaeal secretion. Putative Sec substrates from Halobacterium sp. NRC-1 were identified by submitting all CDS to signalp. However, signalp predictions cannot be considered to be Sec pathway specific per se, as this program is not able to clearly distinguish between substrates of the Sec and Tat translocation pathways, because the overall structure of Tat and Sec signal sequences is conserved. Furthermore, the Sec machinery, unlike the Tat pathway, does not recognize specific amino acid patterns. Therefore, we used tatfind to identify and eliminate putative Tat substrates from the signalp output. Of the remaining signalp positives, 171 did not contain membrane-spanning segments past the first 50 amino acids, as identified by tmhmm. Fifty of these could be putatively identified by blastp analysis, of which the majority (33) were false positives as they had significant similarity to cytoplasmic or membrane proteins. Furthermore, five proteins had significant similarity to flagellins, which are known to use a specialized translocation system (Thomas et al., 2001). The remaining 12 proteins were putative Sec substrates, although their cell surface localization was evident in only three cases, including a homologue of a type IV protease and two proteins involved in uptake pathways (see web page). In contrast, when similar analyses were conducted with the A. fulgidus genome, we observed that the majority of proteins were secreted via the Sec pathway (see web page). These results indicate that the vast majority of putative secreted proteins of Halobacterium sp. NRC-1 are translocated via the Tat pathway, and suggest that this phenomenon is specific to the Halobacteriaceae.
Functional analysis of the haloarchaeal Tat motif in vivo
It is known that the exchange of the twin arginines (RR) with lysines (KK) in the twin-arginine motif of Tat substrates results in a block of Tat-dependent protein translocation in E. coli (Stanley et al., 2000). To determine the Tat specificity of an extracellular halophilic protein, we replaced the RR of the α-amylase precursor from the alkalihalophile Natronococcus sp. strain Ah-36 (Kobayashi et al., 1994) (AmyRR) with KK (AmyKK). Using the genetically amenable halophile Haloferax volcanii for the expression of these constructs, we observed that the export of active AmyKK was blocked. This was indicated by the absence of clear halos around cells grown on starch medium that was exposed to iodine vapour (Fig. 2A). Immunoblot analysis of culture supernatants and cell extracts confirmed that the enzyme was synthesized in both cell types, but was secreted only in cells harbouring wild-type α-amylase (Fig. 2B). Approximately 90% of α-amylase activity was present in the culture supernatant of wild-type cells, whereas only cell-associated activity was observed for cells expressing AmyKK (data not shown). We conclude that this halophilic α-amylase can fold inside the cytoplasm, and that the twin-arginine motif is indeed essential for its secretion.
Systematic whole-genome analyses can be useful in the characterization of biological adaptations to highly selective environments. A good example is the analysis of the Thermoplasma volcanium genome, which confirmed principles of thermal adaptation (Kawashima et al., 2000). Although the understanding of halophilicity has thus far been limited to the characterization of model proteins, our analyses of the recently sequenced Halobacterium sp. NRC-1 genome have revealed a strategy used by halophilic archaea to survive at salt concentrations that are close to saturation.
Prompted by the surprising finding that all known extracellular proteins from halophilic archaea had typical Tat signal sequences, we hypothesized that, in response to extremely high-salt conditions, the Halobacteriaceae rerouted the translocation of most secreted proteins to the Tat pathway, allowing these substrates to fold in the cytoplasm before their secretion. In addition to the presence of the Tat genes in the Halobacterium sp. NRC-1 genome, our in vivo analyses of halophilic α-amylase provided experimental evidence for the utilization of the Tat pathway by the Halobacteriaceae. To substantiate our hypothesis further, it was necessary to identify putative Tat and Sec substrates from the Halobacterium sp. NRC-1 genome. We thus developed tatfind, which allowed us to distinguish between the structurally similar signal sequences of these two substrate types.
It is intriguing that, although putative structural determinants in the mature part of Tat substrates have not yet been identified and thus could not be included in tatfind, this algorithm identified all experimentally confirmed E. coli Tat substrates and did not identify any substrates in M. jannaschii, an archaeon lacking the Tat genes. Thus, the criteria used by this program provide sufficient information to predict Tat substrates with high accuracy.
As expected, our analyses revealed that the majority of secreted proteins in Halobacterium sp. NRC-1 were putative Tat substrates, whereas only three putative Sec substrates could be identified. We do not exclude the possibility that more Sec substrates might exist among unassigned or signalp-negative proteins. However, as the motif recognized by tatfind is based solely on the signal sequences of assigned putative Tat substrates, it is also likely that an even greater number of haloarchaeal Tat substrates will be revealed as the understanding of the motif is expanded.
Although the secretion of a number of non-redox proteins via the Tat pathway has been reported in thermophilic and mesophilic organisms (Berks et al., 2000; Angelini et al., 2001; Robinson and Bolhuis, 2001), it is very unusual that the Halobacteriaceae make such extensive use of this pathway. This is supported by our observations that (i) the homologues of putative non-redox Tat substrates from Halobacterium sp. NRC-1 were also predicted Tat substrates in other Halobacteriaceae, whereas the homologues of these proteins in non-haloarchaea were predicted Sec substrates; (ii) the majority of putative Halobacterium sp. NRC-1 Tat substrates were non-redox proteins; and (iii) in the thermophilic archaeon A. fulgidus, the number of putative Sec substrates exceeded the number of Tat substrates. In addition, tatfind analysis of the nearly completed H. volcanii genome sequence has demonstrated an extensive usage of the Tat pathway for secreted proteins, similar to that observed for Halobacterium sp. NRC-1 (data not shown).
Further investigation of protein translocation in haloarchaea may provide insight into the way in which high-salt conditions might have led to the extensive use of the Tat pathway in halophilic archaea. The reason for this phenomenon could be that (i) under high-salt conditions, many proteins either fold too rapidly or simply cannot be maintained in an unfolded state, as required for Sec-mediated translocation; or (ii) rapid cytoplasmic folding might help to prevent aggregation before translocation. These explanations presume that Sec-dependent protein translocation occurs post-translationally, that is, that the protein is fully translated before translocation is initiated. As components of the signal recognition particle (a ribonucleoprotein complex required for co-translational translocation) have been identified and shown to be essential in halophilic archaea (Rose and Pohlschröder, 2002), the problem of rapid folding in the cytoplasm could have been overcome by translocating the proteins co-translationally. However, this mode of translocation would result in the secretion of unfolded proteins into an external environment that is likely to lack chaperones. As it has been shown that chaperones are often required for proper folding of halophilic proteins, the Tat pathway might provide a solution to this extracellular folding problem (Franzetti et al., 2001). In view of the exceptional usage of the Tat pathway in the Halobacteriaceae, it is likely that this system is costly and used only when cytoplasmic maturation steps are required or unavoidable.
Sequences used in the analyses
All predicted proteins (identified and putative) as annotated in GenBank records [NC_000917.1 (Archaeoglobus fulgidus); AE004437, AE004438 (plasmid NRC100) and NC_001869.1 (plasmid NRC200) (Halobacterium sp. NRC-1)] were analysed to identify putative Tat and Sec substrates. Furthermore, [U00096.1 (Escherichia coli K12); NC_000909, NC_001732 (large extrachromosomal element), NC_001733 (small extrachromosomal element) (Methanococcus jannaschii); NC_000854.1 (Aeropyrum pernix K1); and AE006641 (Sulfolobus solfataricus)] were analysed to identify putative Tat substrates.
Tat substrate prediction
A perl program (tatfind; available from the authors) was written to identify putative Tat substrates among the set of all putative coding sequences (CDS) from the entire genome. The patterns recognized by tatfind were taken from the literature (Halbig et al., 1999; Robinson and Bolhuis, 2001; Stanley et al., 2001), as well as from a list of putative secreted proteins from Halobacterium sp. NRC-1, and then refined with residues found in putative Tat substrates from other Halobacteriaceae. The position of the pattern in the N-terminus, as well as the length and position of the following uncharged region, was adjusted according to known extrema in bacterial Tat substrates [i.e. Zymomonas mobilis glucose:fructose oxidoreductase (GFOR): largest n-region with RR at position 30/31; E. coli HybO: shortest uncharged stretch of only 13 residues in length; E. coli NapG: uncharged stretch begins nine residues after the RR motif] (Halbig et al., 1999; Robinson and Bolhuis, 2001; Stanley et al., 2001). The final version of tatfind, which was used to analyse the complete set of predicted proteins of all species described in this work, searched for the following pattern between residues 2 and 35 of the predicted protein: X₋₁R₀R₊₁X₊₂X₊₃X₊₄, where the amino acid at position X₋₁ had a hydrophobicity score ≤ 0.26; X₊₂ had a hydrophobicity score ≤ 0.02; X₊₃ had a hydrophobicity score ≥ −0.77 (positively charged residues were excluded from this position); and X₊₄ was one of the residues I, L, V, M or F. All hydrophobicity values were taken from Cid et al. (1992). If the above pattern was found, tatfind assessed three additional criteria: (i) whether there was an uncharged stretch of at least 13 residues in the 22 residues following the RR; (ii) whether the uncharged stretch started directly after a negatively charged residue (not allowed except after positions +2 and +5); and (iii) whether the hydrophobicity sum of the first 13 residues of the uncharged region was < 8.0.
If the above pattern was found and all the criteria were met, then a sequence was considered to be a putative Tat substrate.
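The decision rule described above can be illustrated with a minimal Python sketch. It is not the original perl program, and several points are assumptions made only for illustration: the whole six-residue pattern is required to lie within residues 2 to 35; D and E are treated as negatively charged and K and R as positively charged; the uncharged stretch is taken to be a run lying entirely within the 22 residues after the RR; and, because the Cid et al. (1992) values are not reproduced here, a stand-in scale (Kyte-Doolittle hydropathy divided by 4.5) is used. With this stand-in scale the sketch will not reproduce tatfind predictions; the real hydrophobicity values would have to be supplied through the `scale` argument. The function names and the toy sequences are hypothetical.

```python
NEGATIVE = set("DE")          # assumption: residues counted as negatively charged
POSITIVE = set("KR")          # assumption: residues counted as positively charged
CHARGED = NEGATIVE | POSITIVE

# STAND-IN scale (Kyte-Doolittle / 4.5), NOT the Cid et al. (1992) values.
_KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
       "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
       "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
       "R": -4.5}
PLACEHOLDER_SCALE = {aa: value / 4.5 for aa, value in _KD.items()}

NAN = float("nan")  # comparisons with NaN are False, so unknown residues fail


def _uncharged_stretch_ok(seq, i, scale):
    """Check criteria (i)-(iii); i is the 0-based index of the first arginine."""
    start = i + 2                          # position +2, first residue after RR
    end = min(len(seq), start + 22)        # the 22 residues following the RR
    j = start
    while j < end:
        if seq[j] in CHARGED:
            j += 1
            continue
        k = j
        while k < end and seq[k] not in CHARGED:
            k += 1
        if k - j >= 13:                    # (i) uncharged run of >= 13 residues
            before = j - 1                 # residue directly preceding the run
            if seq[before] in NEGATIVE and (before - i) not in (2, 5):
                return False               # (ii) run starts after D/E
            total = sum(scale.get(aa, NAN) for aa in seq[j:j + 13])
            return total < 8.0             # (iii) hydrophobicity sum
        j = k
    return False


def is_putative_tat_substrate(seq, scale):
    """Return True if seq satisfies the motif and all three extra criteria."""
    seq = seq.upper()
    # X(-1) must be residue >= 2 and X(+4) residue <= 35 (1-based numbering).
    for i in range(2, min(30, len(seq) - 5) + 1):
        if seq[i] != "R" or seq[i + 1] != "R":
            continue
        x_m1, x_p2, x_p3, x_p4 = seq[i - 1], seq[i + 2], seq[i + 3], seq[i + 4]
        motif = (scale.get(x_m1, NAN) <= 0.26
                 and scale.get(x_p2, NAN) <= 0.02
                 and scale.get(x_p3, NAN) >= -0.77
                 and x_p3 not in POSITIVE
                 and x_p4 in "ILVMF")
        if motif and _uncharged_stretch_ok(seq, i, scale):
            return True
    return False


# Hypothetical toy N-terminus and its twin-lysine variant (not real proteins).
TOY_RR = "MSNDTSRRTFLAGAAGLAALAGSAVAGCSS"
TOY_KK = TOY_RR.replace("RR", "KK")
rr_call = is_putative_tat_substrate(TOY_RR, PLACEHOLDER_SCALE)
kk_call = is_putative_tat_substrate(TOY_KK, PLACEHOLDER_SCALE)
```

In this toy case the RR sequence is accepted and its KK variant is rejected, which mirrors the logic of the AmyRR/AmyKK comparison only at the level of the sequence rule.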
Sec substrate prediction
Identification of putative Sec substrates required multiple steps (see web page). signalp (Nielsen et al., 1997) was used to identify putative signal sequence-containing proteins from the complete set of all predicted proteins. As signalp has not been trained specifically to recognize archaeal signal sequences, we cast the largest possible net and used options for Gram–, Gram+ and eukaryotic signal sequence predictions, as well as a lenient cut-off value ≥ 3 ‘yes’ responses by at least one predictive option. Of the signalp positives, we then excluded all tatfind-positive proteins (see Discussion). From this subset, proteins with tmhmm-predicted transmembrane helices (Sonnhammer et al., 1998) outside the N-terminal 50 residues were removed, as these proteins are not secreted but, rather, are integral membrane proteins.
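The filtering steps above can be sketched in Python as a simple sequence of exclusions applied to precomputed results. This is an illustrative sketch only: it does not run signalp, tatfind or tmhmm, and the record layout, the option names and the protein identifiers are hypothetical. It also assumes that a helix counts as lying outside the N-terminal 50 residues when it ends beyond residue 50.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ProteinCalls:
    """Precomputed predictions for one protein (hypothetical record layout)."""
    protein_id: str
    signalp_yes: Dict[str, int]            # option name -> number of 'yes' responses
    tatfind_positive: bool = False
    tm_helices: List[Tuple[int, int]] = field(default_factory=list)  # 1-based (start, end)


def signalp_positive(p: ProteinCalls, min_yes: int = 3) -> bool:
    # Lenient cut-off: >= 3 'yes' responses from at least one predictive option.
    return any(count >= min_yes for count in p.signalp_yes.values())


def integral_membrane(p: ProteinCalls, n_terminal: int = 50) -> bool:
    # Assumption: a helix ending beyond residue 50 lies outside the N-terminus.
    return any(end > n_terminal for _start, end in p.tm_helices)


def candidate_sec_substrates(calls: List[ProteinCalls]) -> List[str]:
    """signalp positives, minus tatfind positives, minus integral membrane proteins."""
    return [p.protein_id for p in calls
            if signalp_positive(p)
            and not p.tatfind_positive
            and not integral_membrane(p)]


# Hypothetical example records.
EXAMPLE = [
    ProteinCalls("protA", {"gram-": 4, "gram+": 2, "euk": 1}, False, [(5, 27)]),
    ProteinCalls("protB", {"gram-": 4, "gram+": 4, "euk": 3}, True, []),
    ProteinCalls("protC", {"gram-": 3, "gram+": 1, "euk": 0}, False, [(8, 30), (120, 142)]),
    ProteinCalls("protD", {"gram-": 2, "gram+": 2, "euk": 2}, False, []),
]
sec_candidates = candidate_sec_substrates(EXAMPLE)
```

Only the first record survives all three filters; candidates that pass would still need the homology-based check for cytoplasmic or membrane localization before being called putative Sec substrates.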
As part of our analysis, we wished to determine whether the proteins detected by signalp and tatfind in Halobacterium sp. NRC-1 had homologues in other species that were also potentially secreted and, if so, via which route (Tat or Sec; see web page). All putative proteins that were positive by tatfind or signalp (after removing integral membrane proteins) were analysed further by blastp to identify homologues (E-value ≤ 10⁻¹⁰) (Altschul et al., 1990). The homologues were then examined, and cytoplasmic proteins were removed. The non-cytoplasmic homologues were then analysed by tatfind, signalp and tmhmm to determine their potential mode of secretion.
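The E-value cut-off used to select homologues can be illustrated with a short Python sketch. It assumes that the blastp hits are available as tab-delimited text with the query identifier, subject identifier and E-value in columns 1, 2 and 11, as in the standard 12-column tabular output of current BLAST+ releases. That layout is an assumption for illustration and is not necessarily the output format used in the original analysis; the identifiers in the example are hypothetical.

```python
from typing import Dict, Iterable, List


def select_homologues(lines: Iterable[str],
                      max_evalue: float = 1e-10) -> Dict[str, List[str]]:
    """Group subject IDs by query ID, keeping hits with E-value <= max_evalue."""
    hits: Dict[str, List[str]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                       # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            hits.setdefault(query, []).append(subject)
    return hits


# Hypothetical tabular hits (columns: qseqid sseqid pident length mismatch
# gapopen qstart qend sstart send evalue bitscore).
EXAMPLE_HITS = [
    "queryA\tsubject1\t45.2\t310\t160\t5\t12\t320\t8\t315\t3e-62\t210",
    "queryA\tsubject2\t28.0\t120\t80\t3\t40\t158\t33\t150\t2e-04\t42.1",
    "queryB\tsubject3\t39.9\t280\t150\t6\t5\t282\t10\t287\t1e-10\t95.0",
]
homologues = select_homologues(EXAMPLE_HITS)
```

Each retained subject would then be screened in turn with tatfind, signalp and tmhmm, as described above.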
In vivo analysis of haloarchaeal Tat secretion
Construction of wild-type and signal sequence mutant α-amylase plasmids.
The twin-arginine residues of the wild-type Natronococcus sp. strain Ah-36 (Kobayashi et al., 1994) α-amylase signal sequence were replaced with twin-lysine residues using site-directed polymerase chain reaction (PCR) mutagenesis. A 125 bp fragment encompassing the 5′ end of the α-amylase gene (including 57 bp upstream of the open reading frame, ORF) was PCR amplified (forward primer 5′-GTTAGCACTAAGCTTCGAAACCGAATTAAAATCATTAT-3′; reverse primer 5′-CGAGCGCAGGACGGTCTTTTTGTCGATACCCGCCG-3′) from pANAM121 (which harbours the wild-type α-amylase; Kobayashi et al., 1994), thus replacing the twin arginines at amino acid positions 16 and 17 with twin lysines. A second 1543 bp fragment encompassing the α-amylase ORF starting from nucleotide 91 (amino acid 11) was PCR amplified (forward primer 5′-TCGGCGGGTATCGACAAAAAGACCGTCCTGCGCTCG-3′; reverse primer 5′-GACTGTGGTACCTCAGTCGTCGTCGGACAG-3′) from pANAM121. These two fragments were ligated together using a modified PCR, and the product was PCR amplified (forward primer 5′-GTTAGCACTAAGCTTCGAAACCGAATTAAAATCATTAT-3′; reverse primer 5′-GACTGTGGTACCTCAGTCGTCGTCGGACAG-3′) to produce the final signal sequence mutant α-amylase insert (AmyKK). The wild-type α-amylase insert (AmyRR) was also generated by PCR using these two primers. Both inserts were cut with HindIII and KpnI and cloned into pMLH3 (Holmes et al., 1994), resulting in the expression vectors pAMY-KK and pAMY-RR. These constructs were then used to transform H. volcanii (strain WFD11) (Charlebois et al., 1987; Cline et al., 1989).
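As a consistency check on the mutagenic primers quoted above, the following Python sketch tests two things: that the reverse primer of the first fragment and the forward primer of the second fragment are complementary (so that the two fragments share an overlap for joining), and that the forward primer encodes lysines at positions 16 and 17. It assumes the standard genetic code and, following the statement that the second fragment starts at amino acid 11, that the first codon of the forward primer corresponds to residue 11. The helper names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons in frame 0; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[n:n + 3]] for n in range(0, usable, 3))


# Mutagenic primers as quoted in the text (written 5' to 3').
FRAGMENT1_REVERSE = "CGAGCGCAGGACGGTCTTTTTGTCGATACCCGCCG"
FRAGMENT2_FORWARD = "TCGGCGGGTATCGACAAAAAGACCGTCCTGCGCTCG"

# The forward primer carries one extra 5' base relative to the reverse primer.
primers_overlap = reverse_complement(FRAGMENT1_REVERSE) == FRAGMENT2_FORWARD[1:]

# Assumption: the first codon of the forward primer is amino acid 11.
peptide = translate(FRAGMENT2_FORWARD)
residues = {11 + offset: aa for offset, aa in enumerate(peptide)}
```

Under these assumptions the two primers are exact complements over 35 nucleotides, and the AAA AAG codons fall at residues 16 and 17, in agreement with the positions stated in the text.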
Expression and detection of wild-type and signal sequence mutant α-amylase
In vivo starch hydrolysis assay.
Secretion of the wild-type and mutant α-amylase was examined by a starch hydrolysis assay. H. volcanii harbouring pAMY-KK or pAMY-RR were grown on rich medium (Charlebois et al., 1987) supplemented with 0.2% soluble starch. Once single, separated colonies were visible, the plates were exposed to iodine vapour to detect starch hydrolysis by extracytoplasmic α-amylase.
To confirm that α-amylase was synthesized in these cells, Western blot analysis of SDS-denatured cell extracts (wcl) and culture supernatants (sup) from H. volcanii expressing either AmyRR or AmyKK was performed. Cells were cultured in rich medium to an absorbance (600 nm) of 0.6, pelleted and lysed with SDS-PAGE sample buffer. The cell supernatant was precipitated with 10% trichloroacetic acid. Samples were electrophoresed on an SDS–polyacrylamide gel, electroblotted to nitrocellulose, probed with a polyclonal antibody against α-amylase and visualized with the ECL chemiluminescent system (Amersham).
We thank David Roos for use of his computational infrastructure, D. DelVecchio and Kieran Dilks for analysing the H. volcanii genome with tatfind, and Jon Beckwith, Fevzi Daldal, Mike Dyall-Smith and Michael Ehrmann for valuable comments on the manuscript. Support was provided to T.B. by a research fellowship from the German Academy of Natural Scientists Leopoldina (grant BMBF-LPD 9901/8–14), to R.W.R. by a predoctoral fellowship from the American Heart Association (ref. no. 0110093U), and to M.P. by grants from the National Science Foundation (grant MCB-9816411) and the Department of Energy (grant DEFG0201ER15169).