Transport of solutes and polypeptides across membranes is an essential process for every cell. In the past, much attention has focused on helical transporters. Recently, β-barrel-shaped transporters have attracted attention as well. Members of this family are found in the bacterial outer membrane and in the outer membranes of endosymbiotically derived organelles. Here we analyze the features and the evolutionary development of a specific translocator family, the β-barrel-shaped polypeptide transporters. We identified sequence motifs that characterize all transporters of this family, as well as motifs specific to a certain subgroup of this class. The general motifs are related to the structural composition of the pores. Further analysis revealed that two of the motifs lie at a defined distance from the C-terminus of the proteins. Finally, the evolutionary relationship of the proteins and of the motifs is discussed.
Transport of solutes or macromolecules such as proteins across membranes requires a proteinaceous channel or transporter. Besides their mode of action, these proteins can be classified according to their substrates or to the secondary structure of their membrane domain. In terms of secondary structure, α-helical and β-sheet channels can be distinguished. Both types of channels show a high neighbourhood correlation with respect to fold, suggesting similar folds of the membrane-inserted domains. In the past, much attention was given to the α-helical channels [3–5]. However, ion channels formed by β-sheets have recently moved into the focus of interest [6,7]. Analysis of these channels made it obvious that they emerged from outer membrane proteins of prokaryotic endosymbionts, as these are the only β-barrel-type membrane proteins found in bacteria. This class of proteins is present in organellar membranes of eukaryotic organisms: in the outer mitochondrial membrane, which emerged from α-proteobacteria, in the outer envelope of chloroplasts [10,11], which emerged from cyanobacteria, and possibly even in the peroxisomal membrane. The peroxisomal β-barrel protein might indicate either the discussed endosymbiotic origin of peroxisomes (for example) or a redistribution of proteins within the cell as a result of the gene transfer accompanying the other two endosymbiotic events. Most of the β-barrel-type channels of eukaryotes belong to the porin family. Recent research revealed that β-barrel-type channels are also involved in the translocation of polypeptides, in the assembly of proteins in the outer membrane of endosymbiotic organelles [16–19] and in the assembly of proteins in the outer membrane of bacteria [7,20,21].
One polypeptide transporter, found in Bordetella pertussis, is FhaC, which secretes the main adhesin, filamentous haemagglutinin [22,23]. FhaC is an outer membrane protein that facilitates the translocation of polypeptides, and related proteins occur in various Gram-negative pathogens. A further member of the family of accessory outer membrane proteins involved in the secretion of haemolysins or adhesins in various Gram-negative pathogens is ShlB, first found in Serratia marcescens. Despite the homology between FhaC and ShlB, the two proteins are not interchangeable, indicating the molecular specificity of the transporters. Structural modelling of FhaC and ShlB suggests long loop regions at the N-terminus, whereas the C-terminal portion seems to be involved in pore formation. It was further established that ShlB has two functions. On the one hand, the channel formed by ShlB facilitates the translocation of ShlA. On the other hand, ShlB activates ShlA by changing the conformation of this substrate, thereby inducing the transfer of phosphatidylethanolamine [29–31] that is required for the activation of ShlA. A third class of translocators is formed by the Omp85/D15 homologues. Omp85 is an essential component of outer membrane biogenesis in the Gram-negative bacterium Neisseria meningitidis. Similar to the ShlB family, Omp85 seems to have two functions: the assembly of outer membrane proteins and the translocation of lipids. Voulhoux and coworkers further suggested a β-barrel-shaped membrane structure. However, little more is known about these proteins.
Toc75, the 75-kDa subunit of the translocation complex of the outer envelope of chloroplasts of Pisum sativum, is a member of this polypeptide-transporting family found in the chloroplast, an endosymbiotically derived organelle. It is one of the major proteins of this membrane and acts as the protein translocation channel. In contrast to the other identified polypeptide transporters, Toc75 requires assisting proteins such as Toc159 to translocate proteins. As for FhaC, ShlB and Omp85, structural modelling of Toc75 from Pisum sativum and of Toc75-V from Arabidopsis thaliana suggests a β-barrel structure. Furthermore, it was proposed that the Toc75 family might have evolved from the ShlB [37,38] or the Omp85 class. Recently, a protein of the outer membrane of the second endosymbiotically derived organelle, the mitochondrion, was identified as a member of this family of polypeptide-transporting proteins. The protein was termed Sam50, Tob55 or mitochondrial Omp85 homologue. It facilitates the assembly of proteins into the outer mitochondrial membrane [17–19].
Here we present an analysis of this transport protein family. We observed a putative motif in the N-terminal region, albeit with lower reliability than a conserved motif in the C-terminal region. We provide evidence that this conserved C-terminal motif is specific for polypeptide-transporting proteins and that it is involved in pore formation. Its possible function is discussed.
Motifs in β-barrel-shaped polypeptide-transporting proteins
β-barrel-type proteins are divided into several subclasses according to their structural or functional features, as discussed elsewhere. Here we analyzed 71 β-barrel-shaped proteins with putative polypeptide-transporting function (Table S1). To test the specificity of the identified motifs, we included 10 members of the β-barrel-shaped FepA family. These proteins are known to facilitate iron transport across the bacterial outer membrane and are not involved in polypeptide transport.
Analyzing the β-barrel-shaped polypeptide-transporting proteins (Table S1) with the ‘motif alignment and search tool’, we identified four motifs (Fig. 1, Table 1). The most probable sequence of each motif is shown in Fig. 1A. The motifs are not found in the members of the FepA class, with the exception of the protein from Pseudomonas putida KT2440, in which motifs 3 and 4 were detected, albeit with the highest P-values of the whole set of analyzed proteins. The P-value of an identified motif within a target sequence is computed as follows. First, the match score of the target segment is calculated with the position-specific scoring matrix that MEME derives from its hidden Markov model of the motif. This match score is then compared with the match scores of random amino acid strings generated from the background letter frequencies (Table S2) of the sequence pool (Table S1). The P-value is estimated as the fraction of random strings whose match score is greater than or equal to the score of the putative motif in the target sequence. The threshold for selecting sequences containing a specific motif was set to a P-value of 10⁻¹⁰ for the highest certainty (Table S1, bold) and to 0.001 for low certainty (Table S1). The latter value corresponds to one occurrence of a motif with a score better than the threshold in every 1000 sequences randomly generated with the same amino acid frequencies as the sequence pool. According to these limits, we conclude that all identified motifs are restricted to the polypeptide transporters, as they cannot be found in the solute-transporting FepA family. Furthermore, pairwise motif correlation revealed no significant overlap between the identified motifs: the correlation values ranged from 0.16 to 0.28 and were therefore below the threshold of 0.6 suggested by MEME.
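The empirical P-value described above can be sketched as follows. This is a minimal illustration, not MEME's implementation. It assumes the motif is given as per-position letter frequencies, with pseudocounts already added so that no frequency is zero. These frequencies are converted to log-odds scores against the background, and the P-value is estimated by sampling random strings from the background frequencies. The function names and the toy two-letter alphabet are hypothetical. Sampling can only resolve P-values down to roughly 1/n_random, so a threshold such as 10⁻¹⁰ requires computing the score distribution exactly rather than by simulation.

```python
import math
import random


def log_odds_pssm(column_freqs, background):
    """Turn per-position letter frequencies into log-odds scores (bits)."""
    return [{aa: math.log2(col[aa] / background[aa]) for aa in background}
            for col in column_freqs]


def match_score(pssm, segment):
    """Sum the position-specific scores of a segment as long as the motif."""
    return sum(col[aa] for col, aa in zip(pssm, segment))


def empirical_p_value(pssm, segment, background, n_random=10000, seed=0):
    """Fraction of random background strings scoring >= the observed segment."""
    rng = random.Random(seed)
    letters = list(background)
    weights = [background[aa] for aa in letters]
    observed = match_score(pssm, segment)
    hits = 0
    for _ in range(n_random):
        random_segment = rng.choices(letters, weights=weights, k=len(pssm))
        if match_score(pssm, random_segment) >= observed:
            hits += 1
    return hits / n_random


# Toy example: a 3-position motif over a hypothetical two-letter alphabet.
background = {"A": 0.5, "L": 0.5}
pssm = log_odds_pssm([{"A": 0.9, "L": 0.1}] * 3, background)
print(empirical_p_value(pssm, "AAA", background))  # expected to be near 0.5**3 = 0.125
```

Only the all-"A" string reaches the score of "AAA" here, so the estimate approaches 0.5³. A segment at the minimum possible score would give a P-value of 1.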
Motifs 1 and 2 are each polypeptide stretches of 150 amino acids. In motif 1, 58% of the positions are occupied by similar amino acids and 37% by identical amino acids in all candidates analyzed (Fig. 1B). In motif 2, slightly more positions are defined by the properties of their amino acids (60%), but fewer positions are identical (33%, Fig. 1B). Motif 1 is located in the C-terminal region, whereas motif 2 was identified in the N-terminal portion of the proteins. In contrast to motifs 3 and 4, motifs 1 and 2 were identified in only a few polypeptides (Fig. 1C, Table S1). Interestingly, both motifs 1 and 2 were found almost exclusively in the Oma87 homologues (Table S1, Fig. 6), a protein class of so far unknown function. In our subsequent investigation we focused on motifs 3 and 4, as these are characteristic of the whole family.
Table 1. Log likelihood ratio (llr) values and E-values of the identified motifs.
Motifs 3 and 4 are present in almost all investigated polypeptide transporters (Fig. 1C, Table S1): 53 of the analyzed transporters contain motif 3 and 51 contain motif 4 (Fig. 1C). The phylogenetic distribution of the motifs is displayed in Fig. 6. Notably, the polypeptide transporters of the Oma87 family of higher organisms contain neither motif 3 nor motif 4. Motif 3 comprises 43 amino acids and motif 4 comprises 30 amino acids (Fig. 1A). Among the occurrences of motif 3, 79% of all positions are occupied by similar amino acids and 44% by identical amino acids (Fig. 1B). In contrast, the sequences representing motif 4 are more diverse. At certain positions (positions 1, 3, 4, 18, 20, 24, 25, 26 and 30), two defined but different amino acids occur (Fig. 1B). Apart from this split between two amino acids, however, these positions are clearly defined. Taking these positions into account, 70% of all positions are defined by a class of amino acids and 23% by a specific amino acid.
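Percentages such as those reported for motifs 1 to 4 can be obtained by scanning the aligned motif occurrences column by column. The sketch below is an assumption about how such numbers could be computed, not the procedure behind Fig. 1B. The similarity classes are hypothetical, because the grouping used in the figure is not specified. A column counts as identical, or as similar, when a single residue, or a single class, occupies at least `min_fraction` of the sites. A value of 1.0 means all candidates.

```python
from collections import Counter

# Hypothetical physicochemical classes; not necessarily those used for Fig. 1B.
SIMILARITY_CLASSES = ["AVLIM", "FWY", "ST", "NQ", "DE", "KRH", "G", "P", "C"]
CLASS_OF = {aa: idx for idx, group in enumerate(SIMILARITY_CLASSES) for aa in group}


def dominant_fraction(items):
    """Share of the most common item in a column."""
    return Counter(items).most_common(1)[0][1] / len(items)


def column_conservation(sites, min_fraction=1.0):
    """Return (% identical positions, % similar positions) for aligned motif sites."""
    length = len(sites[0])
    identical = similar = 0
    for j in range(length):
        column = [site[j] for site in sites]
        if dominant_fraction(column) >= min_fraction:
            identical += 1
        # Letters outside the classes (e.g. 'X') form their own class.
        classes = [CLASS_OF.get(aa, aa) for aa in column]
        if dominant_fraction(classes) >= min_fraction:
            similar += 1
    return 100.0 * identical / length, 100.0 * similar / length


print(column_conservation(["AVKD", "AIKE", "ALKD"]))  # (50.0, 100.0)
```

Identical residues always share a class, so the similar percentage can never fall below the identical one. This matches the ordering of the reported values.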
Analysis of the two C-terminal motifs 3 and 4
To gain insight into the function of the detected motifs, we analyzed the physicochemical parameters of the two motifs. Strikingly, according to the exact β-sheet (EBS) score, both motifs consist of two transmembrane β-sheet regions (Fig. 2A,C). The EBS score is based on the amino acid distribution in membrane segments of β-barrel proteins. These two transmembrane β-sheet regions are also visible in the alternating hydrophobicity profile (Fig. 2B,D). For this profile, the hydrophobicity values of the amino acids according to the octanol scale of White and Wimley were used to calculate the alternating hydrophobicity, a typical signature of membrane-inserted β-sheets [44,45]. Additionally, all motif occurrences were analyzed with mcmbb, a program that probes for a β-barrel conformation. Of all identified motif 3 sequences, 49% were selected by the program (Table S3). When only the sequences with a P-value below 1e-10 were analyzed, 95% were selected by mcmbb as forming a transmembrane β-sheet structure. The same procedure applied to the motif 4 sequences gave a prediction rate of 60% for all selected motifs and 79% for motifs with a P-value below 1e-10. This strongly supports the notion that the detected motifs indeed represent transmembrane regions. To further support this conclusion, the topology of all sequences representing motif 3 or motif 4 was analyzed with Pred-TMBB [47,48]. Subsequently, the percentage of sequences predicted in sheet conformation at each position of the motif was calculated, either for all sequences found (Fig. 2E, solid line) or for motifs with a P-value below 1e-10 (Fig. 2E, dashed line). For motif 3, two regions were identified in which most sequences were predicted to form a transmembrane β-sheet (Fig. 2E, left). For motif 4, two regions in this conformation were also observed (Fig. 2E, right), although the second transmembrane segment was predicted less frequently than the first.
This is in line with the observation that the EBS score of this transmembrane sheet is lower than that of the first predicted sheet (Fig. 2C). Nevertheless, when the sequences with a P-value below 1e-10 were analyzed, more than 60% of them contained a sheet in this region (Fig. 2E, right). Thus, the prediction based on the statistical analysis (Fig. 2A,C) and that of the hidden-Markov-model-based method (Fig. 2E) identify the same motif regions as forming transmembrane β-sheets (compare Fig. 2A,E, left; Fig. 2C,E, right). For motif 4, however, the scores from the Pred-TMBB models are slightly shifted toward the C-terminus. We conclude that each motif represents a structural unit composed of two β-sheets.
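The per-position analysis of the Pred-TMBB predictions (Fig. 2E) amounts to counting, at each motif position, how many aligned occurrences are predicted to lie in a transmembrane strand. The sketch assumes that each prediction has already been cut to the motif and encoded as one character per residue, with "M" marking transmembrane β-strand residues. This encoding and the helper names are hypothetical and need not match Pred-TMBB's actual output format.

```python
def sheet_percentage(topologies, strand_symbol="M"):
    """Percentage of occurrences predicted as transmembrane strand at each position."""
    length = len(topologies[0])
    if any(len(t) != length for t in topologies):
        raise ValueError("all topology strings must cover the same motif positions")
    return [100.0 * sum(t[j] == strand_symbol for t in topologies) / len(topologies)
            for j in range(length)]


def sheet_profiles(predictions, cutoff=1e-10):
    """Profiles for all occurrences and for those with a P-value below the cutoff."""
    all_profile = sheet_percentage([topo for topo, _ in predictions])
    strict = [topo for topo, p_value in predictions if p_value < cutoff]
    strict_profile = sheet_percentage(strict) if strict else None
    return all_profile, strict_profile


# Hypothetical predictions: (topology over the motif, motif P-value).
predictions = [("MMMiii", 1e-12), ("iMMMii", 1e-5), ("MMiiii", 1e-11), ("iiiiii", 1e-3)]
all_profile, strict_profile = sheet_profiles(predictions)
print(all_profile)     # [50.0, 75.0, 50.0, 25.0, 0.0, 0.0]
print(strict_profile)  # [100.0, 100.0, 50.0, 0.0, 0.0, 0.0]
```

Plotting the two returned lists against motif position gives the kind of solid and dashed curves described for Fig. 2E.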
We further analyzed the position of the two motifs within the amino acid sequences of the target proteins. We first looked at the relative position, normalized to the length of the protein, with respect to either the N-terminus or the C-terminus. No significant cluster was observed. Next, we analyzed the absolute distance, in amino acids, from the start of the motif to each terminus. Again, no relation to the N-terminus was observed. In contrast, the spacing to the C-terminus is highly conserved (Fig. 3). The first amino acid of motif 3 lies 118 amino acids from the C-terminus (Fig. 3A), and that of motif 4 lies 40 amino acids from the C-terminus (Fig. 3B). Given the length of motif 3 (43 amino acids), this implies an almost constant spacing of about 35 amino acids (118 − 43 − 40) between the C-terminal end of motif 3 and the N-terminal start of motif 4.
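The spacing argument is simple arithmetic on sequence positions. The sketch below assumes 1-based motif start positions and measures the distance to the C-terminus as the number of residues C-terminal to the motif's first residue. The paper does not state its exact convention, but the derived gap does not depend on it as long as both motifs are measured the same way. The summary function uses the sample mean and standard deviation as Gaussian parameters. This is a simple shortcut, not the least-squares histogram fit performed in SigmaPlot. All names and the example protein are hypothetical.

```python
import statistics

MOTIF3_LENGTH = 43  # amino acids (Fig. 1A)


def c_terminal_distance(protein_length, motif_start):
    """Number of residues C-terminal to the motif's first residue (1-based start)."""
    return protein_length - motif_start


def gap_between_motifs(d3, d4, motif3_length=MOTIF3_LENGTH):
    """Residues strictly between the end of motif 3 and the start of motif 4."""
    return d3 - motif3_length - d4


def gaussian_summary(distances):
    """Mean and population SD as simple Gaussian parameters for a set of distances."""
    return statistics.mean(distances), statistics.pstdev(distances)


# Hypothetical 600-residue protein with motif 3 starting at residue 482 and motif 4 at 560.
d3 = c_terminal_distance(600, 482)
d4 = c_terminal_distance(600, 560)
print(d3, d4, gap_between_motifs(d3, d4))  # 118 40 35
```

In this example motif 3 spans residues 482 to 524, so residues 525 to 559, 35 in total, separate it from motif 4.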
With this in mind, we analyzed the existing topological models of Nme-I-Omp85, Bpe-FhaC and Ath-Toc75-V. We aligned the region including motifs 3 and 4 (Fig. 4; black box above the sequence, motif 3; black box below the sequence, motif 4). The alignment revealed that Nme-I-Omp85 has enlarged loop regions in motif 3 (Fig. 4), which explains its high P-value of 4.3e-6 for this motif. However, the transmembrane segments proposed individually in earlier work (Fig. 4, grey frames below the sequence) align very well. The exceptions are one missing segment in Bpe-FhaC (first segment, Fig. 4) and an additional segment in Ath-Toc75-V (fourth segment, Fig. 4). Furthermore, the proposed transmembrane segments agree with the physicochemical analysis (Fig. 2) of the whole set of sequences representing motifs 3 and 4. The analysis of the exact β-barrel score [11,42] and of the alternating hydrophobicity [45,49] revealed that motifs 3 and 4 each contain two transmembrane β-strand segments (Fig. 2, grey boxes above). This agrees with the analysis of the motif sequences using hidden-Markov-model-based methods (Fig. 2E and not shown). In addition, the motifs previously identified by Eckart et al. (Fig. 4, dashed boxes) lie in this region and cover most of the transmembrane segments (Fig. 4). They were subsequently confirmed by Voulhoux (Fig. 4, open box) and Gentle (Fig. 4, grey box). The POTRA motif identified earlier was postulated to be a polypeptide-binding motif, and motifs 1 and 2 are specific for the Oma87 family. In contrast, we conclude that motifs 3 and 4 are related to the general pore-forming region. The regulation of polypeptide translocation through the channel might therefore be largely conserved and defined by these two motifs, whereas specificity is conferred by the accompanying N-terminal region of the protein.
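Alternating hydrophobicity can be quantified in several ways. The sketch below uses one simple formulation, assumed here: the hydrophobic moment at a periodicity of two residues, |Σ (−1)^i h_i| / w over a sliding window of w residues. It grows when hydrophobic and polar residues alternate, as in a membrane-inserted β-strand. Whether it matches the exact formulation in the cited work is an assumption, as is the window of 9 residues. Kyte–Doolittle hydropathy values stand in for the White–Wimley octanol scale used in the analysis, and any per-residue scale can be passed instead.

```python
# Kyte-Doolittle hydropathy values, used only as a familiar stand-in for the
# White-Wimley octanol scale applied in the analysis.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def alternating_hydrophobicity(seq, window=9, scale=KYTE_DOOLITTLE):
    """Period-2 hydrophobic moment |sum((-1)**i * h_i)| / window for each window start."""
    h = [scale[aa] for aa in seq]
    profile = []
    for start in range(len(h) - window + 1):
        moment = sum((-1) ** i * h[start + i] for i in range(window))
        profile.append(abs(moment) / window)
    return profile


print(alternating_hydrophobicity("LSLSLSLSL"))  # strictly alternating: high value
print(alternating_hydrophobicity("LLLLLLLLL"))  # uniformly hydrophobic: low value
```

A uniformly hydrophobic stretch scores low even though it is very hydrophobic. This is the property that separates β-strand signatures from α-helical transmembrane segments in such profiles.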
The identification of new Toc75-related proteins
Using motifs 3 and 4, we searched for members of this family in Arabidopsis thaliana. This search identified two putative members, Ath-P1 and Ath-P2 (At3g44160, At3g48620). The mRNA encoding these proteins was detectable in roots, flowers and flower stalks (Fig. 5A, lanes 4, 2 and 3). The mRNA level of Ath-P1 in flowers was comparable to that in flower stalks, whereas the level of Ath-P2 was slightly lower (Fig. 5A, lanes 2 and 3). For both proteins, almost no mRNA was detectable in leaves. This result was confirmed by Affymetrix gene chip analysis. Here, however, only the combined expression of both genes could be analyzed, as both genes are annotated to the same probe set of the ATH1 genome chip. Nevertheless, the combined expression of both genes in leaves was 10 times lower than that of Ath-Toc75-III (Fig. 5B). In addition, the diurnal expression of Ath-P1 and Ath-P2 did not differ drastically from that of Ath-Toc75-III. This further suggests tissue-dependent differential expression rather than differential expression during the daily cycle. It might also indicate a differentiated function of the two proteins compared with Ath-Toc75-III and Ath-Toc75-V (Fig. 5). Both Ath-P1 and Ath-P2 are smaller (47 and 36 kDa, respectively). Sequence comparison of Ath-P1 and Ath-P2 with Ath-Toc75-III and Ath-Toc75-V revealed that both proteins lack the N-terminal domain, which was proposed to form long soluble loops and to contain the POTRA motif. The phylogenetic tree (Fig. 6) indicates a close relationship between Ath-P1 and Ath-Toc75-V, which form a cluster with 96% support, whereas the phylogenetic affiliation of Ath-P2 remains unresolved. It remains to be investigated whether Ath-P1 and Ath-P2 assemble polypeptide transporters and, if so, how the polypeptides are recognized.
Analysis of the evolutionary relation of the β-barrel proteins
β-barrel proteins present in eukaryotes most likely evolved from the outer membrane proteins of Gram-negative bacteria. Recently, however, the relation of certain mitochondrial proteins (Sam50/Tob55) and chloroplast proteins (Toc75) to the Omp85 class was discussed [17–19,32]. Interestingly, in Sam50/Tob55 only motif 1 was identified with high probability (Table S1), suggesting a relation to the Oma87 class but not to Omp85. This relation is further substantiated by the phylogenetic tree (Fig. 6), in which the five Oma87 sequences from Metazoa form a highly supported sister group to Ncr-Tob55 and Sce-Sam50. It might therefore be speculated that these Oma87 proteins actually represent the homologues of the mitochondrial polypeptide transporter.
However, the bacterial Oma87 sequences (Eco, Gme, Hso, Nar, Pmu, Rru, Vvu) do not cluster in the tree. The proteins of the Toc75 class show high similarity to the consensus of motifs 3 and 4, as do the members of the Omp85 class. However, the Toc75 class is not a homogeneous group from a phylogenetic perspective, and its phylogenetic relationship to other protein classes remains unclear. Almost all proteins assigned to the Toc75 class that contain motifs 3 and 4 fall into a weakly supported group (58%). Within this group, four supported subgroups (support values above 80%) are discernible. The largest is centred on Ath-Toc75-III, the main protein translocation channel in A. thaliana, and consists of representatives from higher plants (Ath, Osa, Psa). The second comprises a collection of proteins from Nostoc sp., and the third again consists of proteins from cyanobacteria (Pmar, Ssp). As already mentioned, the final subgroup has the highest support value and clusters the newly identified protein Ath-P1, which contains only motif 3, with Ath-Toc75-V. In contrast, Ath-P2 does not belong to this phylogenetic group even though it contains both motifs.
The polypeptide transporters with adhesin selectivity (ShlB and FhaC) are closely related evolutionarily. The exceptions are ShlB from Bordetella pertussis (Bpe-ShlB), Haemophilus influenzae (Hin-HuxB) and Xanthomonas axonopodis (Xax-ShlB). These three proteins, together with Omp85 from Geobacter sulfurreducens (Gsu-Omp85), form a separate branch with almost no support. The proteins assigned to the Omp85 family, however, do not form a single phylogenetic group. This might reflect the fact that many of these proteins were annotated solely on the basis of low sequence similarity, without any functional information.
Recently, a new class of membrane proteins was defined according to their function as polypeptide transporters. The list of such proteins includes the members of the family of accessory outer membrane proteins involved in the secretion of haemolysins or adhesins in various Gram-negative pathogens. It also includes a growing number of proteins of the endosymbiotic organelles. Beyond the functional relation of these proteins, sequence analysis of the β-barrel-type proteins involved in polypeptide transport revealed a possible structural relation. First, a cluster of short motifs in the C-proximal region of such proteins was identified. This cluster was partly confirmed in the Omp85 family [19,32]. Comparison with these earlier motif predictions revealed that motif 3 was partly identified by Voulhoux (Fig. S1B), although in that work the motif was limited to 20 amino acids. Motif 4 shows significant overlap with the motif identified by Gentle et al. Interestingly, the work of Gentle (Fig. S1B) suggests an extension of the motif towards the N-terminus. This is in line with the motif prediction by Eckart et al. for Toc75 homologues. Motifs 3 and 4 are found with the highest reliability in proteins homologous to Toc75 (Table S1). However, including the other polypeptide transporters revealed that the motif conserved across the entire family is shorter than when only Toc75 proteins are analyzed (not shown). Both motifs lie at a defined distance from the C-terminus of the proteins (Fig. 3). This underlines that these specific pore-forming regions remained conserved during evolution (Fig. 6). Moreover, the motifs reflect not only similar sequence features but also structural features, as determined by their physicochemical parameters (Fig. 2). Each motif defines two transmembrane β-sheets.
The conservation of the transmembrane sheets suggests similar gating and translocation behaviour among the members of this family. Accordingly, the N-terminal region would be essential for fine-tuning the specific function of each individual protein. This hypothesis remains to be confirmed. Furthermore, this notion suggests that the proposed topological models of FhaC and Omp85 should contain an additional transmembrane domain (Fig. S2).
Furthermore, we identified one motif close to the N-terminus. An N-terminal region was previously identified and termed POTRA (polypeptide transport-associated domain). Voulhoux and coworkers also identified a similar but shorter consensus sequence. It was proposed that this region might function in polypeptide transport. However, motif 2 identified here does not correspond to the POTRA motif. Instead, it shows some overlap with the C-terminal portion of this earlier consensus sequence (Fig. S1). Motifs 1 and 2 are limited to a certain subset of proteins of the Oma87 family. Remarkably, the same motif 1 is found in Sam50/Tob55. This observation, together with the evolutionary relation of Sam50 and Tob55 to certain members of the Oma87 family (Fig. 6), suggests a functional relation within this group. However, the phylogenetic tree also underlines that the current nomenclature of Oma87 and Omp85 proteins requires further investigation to understand their functional relations. Based on the large distance of this family from the iron transporter FepA (Fig. 6), we conclude that the class of polypeptide-transporting proteins must have evolved from a common branch.
Combining our findings with those of others, the polypeptide transporters can be identified by at least three signature sequences, namely POTRA, motif 3 and motif 4. These are absent from solute-transporting proteins such as those of the FepA class. The POTRA motif, however, is less well defined, as we could not identify it in a larger pool of sequences. Based on motifs 3 and 4, two new members of the family were identified in Arabidopsis thaliana (Fig. 5), and their function remains to be explored.
Sequence selection and motif detection
The amino acid sequences of the proteins listed in Table S1 were identified by homology searches at http://www.ncbi.nlm.nih.gov/BLAST/, using one member of each described protein class as the query. Sequences were checked for redundancy and further analyzed with the ‘multiple EM for motif elicitation’ (MEME) program at http://meme.sdsc.edu/meme/website/meme.html (and references therein). The sequences were analyzed for the presence of motifs with the following parameters: any number of repetitions; a maximum of 5 motifs to be found; a minimum motif length of 20 amino acids and a maximum length of 200 amino acids. MEME selects motifs based on the statistical significance of the log likelihood ratio (Table 1) of motif occurrence within the user-defined range. The E-value of a motif estimates the number of motifs (Table 1) that would have an equal or higher log likelihood ratio if the sequences had been generated randomly. The random sequences are generated according to a 0-order background Markov model consisting of the letter frequencies in the training set (Table S2).
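To make the log likelihood ratio concrete, consider a set of aligned motif sites. The llr compares the probability of these sites under the motif's column frequencies with their probability under the background letter frequencies. The sketch below computes 0-order background frequencies from a sequence pool and the llr of a given site set, in bits. It is a simplified illustration under stated assumptions (no pseudocounts, log base 2, sites already aligned), not MEME's internal calculation. It also does not derive the E-value, which additionally requires the distribution of llr values expected for random sequences. All names are illustrative.

```python
import math
from collections import Counter


def background_frequencies(sequences):
    """0-order letter frequencies of a sequence pool (cf. Table S2)."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def log_likelihood_ratio(sites, background, base=2.0):
    """Sum over columns and letters of n * log(f / b) for aligned motif sites."""
    llr = 0.0
    for j in range(len(sites[0])):
        counts = Counter(site[j] for site in sites)
        for aa, n in counts.items():
            freq = n / len(sites)
            llr += n * math.log(freq / background[aa], base)
    return llr


pool = ["AALL", "AL"]
bg = background_frequencies(pool)              # {'A': 0.5, 'L': 0.5}
print(log_likelihood_ratio(["AA", "AA"], bg))  # 4.0 (bits)
```

Columns whose composition matches the background contribute nothing. Sites that mirror the background therefore yield an llr of zero, however many of them there are.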
Analysis of the physicochemical parameters
The distance from the first amino acid of each motif to the N- and C-termini was calculated using the length of each individual sequence. The distribution was fitted with a Gaussian function by least-squares analysis using the built-in tool of SigmaPlot. The EBS score and the alternating hydrophobicity were calculated individually for each identified motif as described. The average was calculated using SigmaPlot. For the weighted average, each result was weighted by its significance S, calculated from its P-value p as S = −log(p). The β-sheet topology of the motifs was predicted with three programs: proftmb (accessible at http://www.rostlab.org/services/proftmb/index.html), mcmbb (at http://bioinformatics.biol.uoa.gr/mcmbb/) and Pred-TMBB [47,48] (at http://bioinformatics.biol.uoa.gr/PRED-TMBB/). The output of the last program was analyzed by calculating, for each position of the motif, the percentage of sequences predicted to form a transmembrane β-sheet. This was done either for all selected motifs (Table S1) or for all motifs with a P-value below 1e-10 (Table S1, bold numbers).
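The significance weighting can be sketched as a weighted mean of per-position profiles, such as EBS or alternating-hydrophobicity values, with each motif occurrence weighted by S = −log(p). The text does not state the logarithm base. Base 10 is assumed here, which only rescales all weights by a common factor and therefore leaves the weighted average unchanged. The function names are illustrative. P-values must be below 1, since p = 1 would receive zero weight.

```python
import math


def significance(p_value):
    """S = -log10(p); base 10 assumed (the base cancels in the weighted mean)."""
    if not 0.0 < p_value < 1.0:
        raise ValueError("P-value must lie strictly between 0 and 1")
    return -math.log10(p_value)


def weighted_profile(profiles, p_values):
    """Significance-weighted mean of equally long per-position profiles."""
    weights = [significance(p) for p in p_values]
    total = sum(weights)
    return [sum(w * prof[j] for w, prof in zip(weights, profiles)) / total
            for j in range(len(profiles[0]))]


# Two hypothetical two-position profiles: the more significant one dominates.
print(weighted_profile([[1.0, 0.0], [0.0, 1.0]], [1e-10, 1e-5]))  # about [0.667, 0.333]
```

An occurrence with p = 10⁻¹⁰ counts twice as much as one with p = 10⁻⁵, so highly significant motif matches dominate the averaged profiles.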
Arabidopsis thaliana (ecotype Columbia) plants were grown on Murashige and Skoog medium plates. Plants were grown in a cycle of 16 h light at 21 °C and 8 h dark at 16 °C. Seedlings were transferred to soil after 18 days, and growth was continued under the same conditions. Different plant organs (as indicated) were collected from 55-day-old flowering plants. Total RNA was isolated from the individual plant material using the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to the recommended protocol. RT-PCR reactions were performed with 20 ng RNA and the SuperScript One Step RT-PCR Kit with Platinum Taq (Invitrogen, Karlsruhe, Germany) as described by the manufacturer. Reverse transcription was performed for 30 min at 45 °C, followed by 40 PCR cycles.
The tree-puzzle program was applied in its parallel version 5.2 to infer a maximum-likelihood phylogenetic gene tree (http://www.tree-puzzle.de). We used the JTT model with constant rates across sites. Twenty-five thousand puzzling steps were applied, and the resulting consensus tree was used for the analysis.
We are grateful to Dr S. Smith (University of Edinburgh, UK) for permission to use his Affymetrix data for the diurnal cycle and to A. Vojta for permission to use his Affymetrix data. We are grateful to Dr H. A. Schmidt (FZ-Jülich) for help with the computation on the JUMP supercomputer at the ZAM/NIC of the Research Center Jülich. This work was supported by grants to E.S. from the Deutsche Forschungsgemeinschaft (SFB-TR01), Fonds der Chemischen Industrie and Volkswagenstiftung and to A.v.H. from the Deutsche Forschungsgemeinschaft (SFB-TR01).