Toward genomic identification of β-barrel membrane proteins: Composition and architecture of known structures


  • William C. Wimley

    Corresponding author
    1. Department of Biochemistry SL43, Tulane University Health Sciences Center, New Orleans, Louisiana 70112-2699
    • Department of Biochemistry SL43, Tulane University Health Sciences Center, New Orleans, LA 70112-2699; fax: (504) 584-2739.


The amino acid composition and architecture of all β-barrel membrane proteins of known three-dimensional structure have been examined to generate information that will be useful in identifying β-barrels in genome databases. The database consists of 15 nonredundant structures, including several novel, recent structures. Known structures include monomeric, dimeric, and trimeric β-barrels with between 8 and 22 membrane-spanning β-strands each. For this analysis the membrane-interacting surfaces of the β-barrels were identified with an experimentally derived, whole-residue hydrophobicity scale, and then the barrels were aligned normal to the bilayer and the position of the bilayer midplane was determined for each protein from the hydrophobicity profile. The abundance of each amino acid, relative to the genomic abundance, was calculated for the barrel exterior and interior. The architecture and diversity of known β-barrels were also examined. For example, the distribution of rise-per-residue values perpendicular to the bilayer plane was found to be 2.7 ± 0.25 Å per residue, or about 10 ± 1 residues across the membrane. Also, as noted by other authors, nearly every known membrane-spanning β-barrel strand was found to have a short loop of seven residues or fewer connecting it to at least one adjacent strand. Using this information we have begun to generate rapid screening algorithms for the identification of β-barrel membrane proteins in genomic databases. Application of one algorithm to the genomes of Escherichia coli and Pseudomonas aeruginosa confirms its ability to identify β-barrels, and reveals dozens of unidentified open reading frames that potentially code for β-barrel outer membrane proteins.

The β-barrel is one of two known structural motifs for membrane-spanning proteins. As many as several hundred β-barrel species can be found in the outer membrane of Gram-negative bacteria (Schulz 2000; Alm et al. 2000; Molloy et al. 2000), and they also occur in the outer membranes of mitochondria (Benz 1994) and chloroplasts (Fischer et al. 1994). In addition to these native proteins, the β-barrel motif is also used by a large, diverse set of secreted membrane permeabilizing protein toxins and antibiotics that assemble into β-barrels on exogenous membranes (Saier 2000). In a recent review, Schulz (2000) summarized the main structural features shared by all known β-barrel membrane proteins in a list of 10 explicit rules: in summary, known β-barrels are composed of an even number of membrane-spanning β-strands with an antiparallel β-meander topology. Neighboring strands in the barrel are connected by alternating long and short loops. The lipid-interacting outer surfaces of all β-barrels are hydrophobic, and have a band of aromatics near the bilayer interfaces, while the internal residues have an intermediate polarity. Known structures contain between 8 and 22 strands and include monomeric, dimeric, and trimeric β-barrels. Many of these features are apparent in the structure of the dimeric β-barrel phospholipase, OmpLA, which is shown in Figure 1.

One might assume that knowing these explicit rules would make the prediction of β-barrel structure and topology and the identification of β-barrels in genome databases readily solvable problems. In fact, several different types of structure prediction algorithms have been applied with mixed success (Schirmer and Cowan 1993; Fischbarg et al. 1995; von Heijne 1996), and recent structure prediction algorithms based on neural networks have been able to make reasonably accurate predictions of β-barrel structure and topology (Gromiha et al. 1997; Jacoboni et al. 2001). But these predictions were made for proteins already known to be β-barrel membrane proteins by other means. A more difficult part of the problem, and one that has not yet been solved, is the accurate identification of β-barrel membrane proteins in genome databases from physical principles. Currently, β-barrels are identified in genome annotations mainly by their homology to known β-barrels. Each Gram-negative bacterial genome has hundreds of “putative” and “probable” outer membrane proteins identified in this way. It would also be useful to be able to identify them through their fundamental physical properties so that novel classes of β-barrels can be identified, and so that the homology-based annotation can be verified. Because each bacterial genome has as many as 1000 hypothetical or unknown proteins that have not been classified at all, there are undoubtedly many β-barrel membrane proteins that have not yet been identified.

We are broadly interested in understanding β-barrel membrane proteins through a knowledge of their composition and physical properties and through parallel studies of how model β-sheets assemble in membranes (Bishop et al. 2001). In theory, a thorough understanding of the fundamental physical principles should contain sufficient information to allow researchers to determine if an unknown protein sequence is a β-barrel membrane protein. For α-helical bundle membrane proteins this idea is a proven one; prediction algorithms based on the physical principle that membrane-spanning helices will have a contiguous stretch of 19 or more hydrophobic residues have very high accuracy (Rost et al. 1995; Casadio et al. 1996; Krogh et al. 2001), exceeding 99% in recent applications (S. Jayasinghe, K. Hristova, and S.H. White, 2001). However, β-barrel membrane proteins have been more difficult to identify from physical principles for several reasons. First, their hydrophobic, membrane-interacting residues are cryptic, hidden in the alternating inside-outside (dyad repeat) motif. Second, compared to helical membrane proteins, there are many fewer membrane-interacting residues on each strand, and this reduces the uniqueness of the membrane-spanning sequences. And third, some β-sheets in soluble proteins have, superficially, many of the same physical properties, such as similar strand length and amphipathicity, as the β-sheets of β-barrel membrane proteins. In this work we set out to analyze the composition and architecture of all β-barrel membrane proteins of known structure, including many new structures, and to generate a body of data that will be a useful starting point in the rapid identification of β-barrel membrane proteins in genome databases.


The β-barrel database

All of the initial β-barrel structures published in the early 1990s belong to the closely related class of trimeric porins of 16 or 18 membrane-spanning β strands. The architecture of this class of porins has been discussed in the literature (Seshadri et al. 1998). In the last few years, the total number of known β-barrel membrane proteins has nearly doubled, and the architectural diversity of known structures has increased significantly with the addition of new β-barrel membrane proteins having different functions, topology, and architecture. For example, three-dimensional structures are now known for the monomeric, TonB-dependent transport proteins FepA (Buchanan et al. 1999) and FhuA (Locher et al. 1998), which have 22 β-strands each and for the trimeric, single-barrel transporter TolC (Koronakis et al. 2000) in which each monomer contributes four β-strands to a 12-stranded barrel. New additions also include the first known dimeric β-barrel, OmpLA (Snijder et al. 1999), shown in Figure 1, and the adhesion protein OmpX (Vogt and Schulz 1999), a monomeric eight-stranded β-barrel.

For this work we identified all β-barrel membrane proteins in the Protein Data Bank (Berman et al. 2000) and used a BLAST (Altschul et al. 1990) sequence alignment to screen each sequence against all other sequences in the PDB. For closely homologous or identical sequences (i.e., those with more than 70% conserved residues) we eliminated all but one member. The β-barrel database that we used in the calculations is described in detail in Table 1. It has 15 diverse members comprising a total of 210 membrane-spanning β-strands with more than 2000 amino acids in the membrane-spanning segments.
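The redundancy filter described above can be illustrated with a minimal Python sketch. It assumes that a pairwise measure of the fraction of conserved residues is already available (in this work it came from BLAST alignments); the function names and the toy similarity measure below are hypothetical and are not the software used here.

```python
def filter_redundant(sequences, similarity, threshold=0.70):
    """Keep one representative from each group of closely related sequences.

    sequences  -- dict mapping an identifier to its amino acid sequence
    similarity -- callable(seq_a, seq_b) -> fraction of conserved residues
                  (assumed to come from a pairwise alignment)
    threshold  -- a sequence more similar than this to a kept one is dropped
    """
    kept = {}
    for name, seq in sequences.items():
        if all(similarity(seq, other) <= threshold for other in kept.values()):
            kept[name] = seq
    return kept


def toy_similarity(a, b):
    """Hypothetical stand-in: fraction of identical positions, no alignment."""
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))
```

The pass is greedy, so which member of a homologous group survives depends on input order; any member would serve equally well for the composition analysis.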

Identification of membrane-spanning segments

Three features, which are present in all β-barrel structures, were used to align the XY plane of each protein's Cartesian coordinates with the putative plane of the bilayer: the band of aromatics that lies in the bilayer interfacial region (Schiffer et al. 1992; von Heijne 1994; Yau et al. 1998), the band of charged residues just outside of the aromatics, and the band of aliphatic residues that interact with the hydrocarbon core of the bilayer (see Fig. 1 for an example). Structure coordinates were transformed as described in Materials and Methods so that the three bands of residues around each β-barrel (aromatic, aliphatic, and charged) were aligned with the XY plane of the new coordinate system.

After aligning the structures along the bilayer normal, we identified all β-strands in each structure using the annotation in the PDB datafile, and we identified the β-strands that span the membrane by inspection of molecular graphics images. One additional residue beyond the designated membrane-spanning β-sheet was also included in each strand segment. Residues in a membrane-spanning strand were designated as either exposed, internal, or involved in protein–protein interfaces. Exposed residues were those whose Cα to Cβ vector extended away from the axis of the barrel and whose side chain was more than 50% “solvent” exposed on the barrel surface. Internal residues were those whose Cα to Cβ vector pointed towards the interior of the barrel. The geometry of β-sheet secondary structure places side chains on alternating inner and outer surfaces of the β-sheet so this distinction is unambiguous. We classified the numerous glycine residues in the β-barrel database by the orientation of their Cα-H vectors and the exposure of the α carbon. We did not differentiate between internal residues that were exposed to water within an aqueous pore or those that were buried in the protein. Residues in protein–protein contacts were those residues whose Cα to Cβ vector was oriented out from the barrel axis, but whose side chain was not exposed in the multimer structure because of protein–protein contacts. Because we are trying to characterize and exploit the unique physical properties of the membrane-interacting surfaces of these proteins, we have excluded the residues in protein–protein contacts from the database. The properties and composition of these residues, which are similar to protein–protein interfaces in soluble proteins, have been discussed (Seshadri et al. 1998).
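The orientation test that separates outward-facing from inward-facing residues can be sketched as follows. This is an illustration only: it assumes that the barrel axis is parallel to z after alignment and that its (x, y) position is known, and it covers only the direction of the Cα to Cβ vector, not the 50% exposure criterion or the protein–protein contact check. The function name is hypothetical.

```python
def side_chain_orientation(ca, cb, axis_xy=(0.0, 0.0)):
    """Return 'outward' or 'inward' for one residue.

    ca, cb  -- (x, y, z) coordinates of the alpha and beta carbons, in a
               frame where the barrel axis is parallel to z
    axis_xy -- (x, y) position of the barrel axis (assumed known)
    """
    # Radial vector from the barrel axis to the alpha carbon (xy plane only).
    rx, ry = ca[0] - axis_xy[0], ca[1] - axis_xy[1]
    # Direction of the side chain, projected onto the xy plane.
    sx, sy = cb[0] - ca[0], cb[1] - ca[1]
    # A positive dot product means the side chain points away from the axis.
    return "outward" if rx * sx + ry * sy > 0 else "inward"
```

For glycine, the same test could be applied to the Cα-H vector, as described above.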

Identification of the bilayer midplane with hydrophobicity profiles

Hydrophobicity profiles for the external and internal residues for all XY-aligned structures were calculated by summing the hydrophobicity of all β-strand residues within a 5-Å sliding window that was moved along the axis of the bilayer normal. Examples of hydrophobicity profiles for external residues are shown in Figure 2A and B. For this analysis we used an experimentally derived hydrophobicity scale measured for peptides partitioning into bulk octanol (Wimley et al. 1996). This scale is “absolute” in the sense that it is a whole-residue hydrophobicity scale that includes contributions from both the side chains and the polypeptide backbone. Thus, negative ΣΔG values indicate a net preference of the polypeptide in the window for an octanol phase relative to water. For all the β-barrel structures examined, the hydrophobicity profile of the external surfaces was very similar to the examples shown in Figure 2A and B, with a band of negative ΣΔG 27-Å wide (average: 26.5 ± 0.7 SD Å) flanked by regions of large positive ΣΔG. The 27-Å band corresponds to the width of the bacterial outer membrane. The crossover points signify the edges of the hydrophobic membrane phase.

The midpoint of the negative ΣΔG band, as delineated by the crossover points, was taken to be the midpoint of the bilayer. We transformed the coordinates of the β-barrel structures so that the bilayer midplane for all structures was set to z = 0. This places all of the proteins in the database on a universal “bilayer” coordinate system. The transbilayer profiles for all of the β-barrel proteins in the database (e.g., Fig. 2A,B) were remarkably similar. Composite profiles calculated from the sum of all the β-barrels are shown in Figure 3A and B. There are several universal features of the hydrophobicity profiles that may be important for genomic identification of β-barrel membrane proteins: the 27-Å negative ΣΔG band, the pronounced peaks in the distribution of external aromatic residues at ±10 Å, and the peaks in the abundance of external charged residues at ±15 Å. In Figure 3B we also show the hydrophobicity profile of the internal β-barrel residues, which have a featureless broad hydrophilic character across the membrane.
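The profile calculation and the midplane assignment can be sketched in Python as shown here. The sketch assumes that each residue has already been reduced to a position along the bilayer normal and a whole-residue free energy; the function names, the 0.5-Å step, and the use of the outermost sign changes as the crossover points are assumptions made for illustration.

```python
def hydrophobicity_profile(residues, window=5.0, step=0.5):
    """Sum residue free energies inside a window sliding along z.

    residues -- list of (z, dG) pairs: z is the residue position along the
                bilayer normal (Å) and dG its whole-residue transfer free
                energy (negative values are hydrophobic)
    Returns a list of (window_center, summed_dG) pairs.
    """
    zs = [z for z, _ in residues]
    z_min, z_max = min(zs), max(zs)
    half = window / 2.0
    n_steps = int(round((z_max - z_min) / step))
    profile = []
    for k in range(n_steps + 1):
        center = z_min + k * step
        total = sum(dg for z, dg in residues if abs(z - center) <= half)
        profile.append((center, total))
    return profile


def crossover_points(profile):
    """Return z values where the profile switches between negative and
    non-negative, located by linear interpolation between window centers."""
    crossings = []
    for (z0, g0), (z1, g1) in zip(profile, profile[1:]):
        if (g0 < 0) != (g1 < 0):
            crossings.append(z0 + (z1 - z0) * g0 / (g0 - g1))
    return crossings


def bilayer_midplane(profile):
    """Midpoint between the outermost crossover points (a simplification)."""
    crossings = crossover_points(profile)
    if len(crossings) < 2:
        raise ValueError("profile does not cross zero twice")
    return (crossings[0] + crossings[-1]) / 2.0
```

Subtracting the returned midplane value from every z coordinate places a structure on the common bilayer coordinate system with the midplane at z = 0.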

Composition of β-barrels

The β-barrel database contains 1592 amino acids in membrane-spanning β-barrels that are either exposed or internal and about 400 additional residues that are found at protein–protein interfaces. Raw abundance (Fig. 4) was determined for residues within the 27 Å width of the bilayer, or ±13.5 Å from the bilayer midplane and also for interfacial and hydrocarbon core regions of the bilayer separately. The bilayer thickness was subdivided, following structural models of bilayers (Wiener and White 1992), into a hydrocarbon core region ±6.5 Å from the midplane and an interfacial region between 6.5 and 13.5 Å from the midplane. Interior residues had similar abundances in both regions of the bilayer, as shown in Figure 4B and listed in Table 2. However, some external residues had very distinct abundance differences between the hydrocarbon core and the interface. For example, tyrosine is about twofold more abundant in the interface than the core, and tryptophan is about sixfold more abundant in the interface, while leucine and alanine are about half as abundant in the interface as in the hydrocarbon core. Abundance data are given in Table 2, and are available as electronic supplementary material.
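The subdivision of the bilayer used for the abundance counts can be written as a short Python sketch. The region boundaries are those given above; the function names, and the choice to normalize each region by its own residue total, are assumptions made for illustration.

```python
from collections import Counter

CORE_HALF_WIDTH = 6.5      # Å from the midplane: hydrocarbon core
BILAYER_HALF_WIDTH = 13.5  # Å from the midplane: outer edge of the interface


def bilayer_region(z):
    """Classify a position along the bilayer normal (midplane at z = 0)."""
    distance = abs(z)
    if distance <= CORE_HALF_WIDTH:
        return "core"
    if distance <= BILAYER_HALF_WIDTH:
        return "interface"
    return "outside"


def raw_abundance(residues):
    """Fractional abundance of each amino acid in the core and interface.

    residues -- list of (amino_acid, z) pairs for one surface (external or
                internal) of the barrels
    """
    counts = {"core": Counter(), "interface": Counter()}
    for aa, z in residues:
        region = bilayer_region(z)
        if region in counts:
            counts[region][aa] += 1
    result = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        result[region] = {aa: n / total for aa, n in counter.items()} if total else {}
    return result
```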

The information content of an amino acid abundance measurement such as those shown in Figure 4A and B does not reside in the raw abundance values but instead in the deviation of the observed abundance from the expected genomic abundance. We, therefore, calculated the expected abundance of each amino acid in the database, $f_x$, using a weighted average of genomic abundances, $f_x^i$, using

$$f_x = \sum_i w_i f_x^i$$

where the relative weight, $w_i$, is for each organism, $i$. Weights were calculated by

$$w_i = \frac{n_i}{n_{\mathrm{total}}}$$

where $n_i$ is the number of amino acids in the database that are from each organism, $i$, and $n_{\mathrm{total}}$ is the total number of amino acids in the database. Relative β-barrel abundance values (Table 2) were calculated by dividing raw abundance by the weighted expectation values, $f_x$. Relative abundances are plotted in Figure 5A and B and are listed in Table 2. The dotted line in the relative abundance plots (Fig. 5A,B) shows the value of 1 expected from the genomic abundance. Deviations from 1 are a measure of the information content of each amino acid (Seshadri et al. 1998). Note that the most abundant external β-barrel residues, leucine and valine (Fig. 4A), have a smaller information content in the relative scale (Fig. 5A) because of their high natural abundance, while the aromatics have a high information content.
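The weighted expectation and the relative abundance can be illustrated with a minimal Python sketch. The function names and data layout are hypothetical, and any numbers passed in are the caller's; none of the values of Table 2 are reproduced here.

```python
def expected_abundance(genomic_freqs, residue_counts):
    """Weighted average of genomic amino acid frequencies.

    genomic_freqs  -- {organism: {amino_acid: genomic frequency}}
    residue_counts -- {organism: number of database residues from it}
    """
    n_total = sum(residue_counts.values())
    expected = {}
    for organism, freqs in genomic_freqs.items():
        weight = residue_counts[organism] / n_total  # w_i = n_i / n_total
        for aa, f in freqs.items():
            expected[aa] = expected.get(aa, 0.0) + weight * f
    return expected


def relative_abundance(raw, expected):
    """Raw abundance divided by the genomic expectation, per amino acid."""
    return {aa: raw[aa] / expected[aa] for aa in raw}
```

A relative abundance of 1 means that the residue occurs exactly as often as expected from the genomes; the natural logarithm of this ratio is the quantity later summed in the β-strand score.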

Architecture of β-barrels

The goal of this work is to obtain information from known β-barrels that will be useful in characterizing unknown sequences in genome databases. Thus, we also need to explore the architecture and architectural diversity of known structures. The most relevant architectural variable is the rise per residue of the β-strands along the direction normal to the bilayer plane. Simulations have shown that the shear number and tilt angle of β-barrels can vary within certain bounds (Murzin et al. 1994; Sansom and Kerr 1995), as reflected in the known structures. Although the maximum possible rise per residue is about 3.6 Å for a β-strand perpendicular to the bilayer, known structures (Schulz 2000) and theory (Sansom and Kerr 1995) suggest that tilted strands are energetically preferred. We determined the distribution of β-barrel rise per residue values at the bilayer midplane by calculating the value, over the three residues closest to the midplane, for each membrane-spanning strand. The results, shown in Figure 6, demonstrate the narrow range of variation in known structures. The rise per residue in the database is 2.7 ± 0.25 Å per residue, or about 10 ± 1 residues across the membrane.
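The relation between rise per residue and the number of residues needed to cross the membrane can be made concrete with a short Python sketch. How the three-residue value was computed is not spelled out above, so the sketch assumes half the z displacement between the two neighbors of the residue nearest the midplane; the function names are hypothetical.

```python
def rise_per_residue(strand_z):
    """Rise per residue (Å) at the bilayer midplane for one strand.

    strand_z -- z coordinates of consecutive strand residues, with the
                bilayer midplane at z = 0
    """
    if len(strand_z) < 3:
        raise ValueError("need at least three residues")
    # Residue nearest the midplane, excluding the two strand ends.
    i = min(range(1, len(strand_z) - 1), key=lambda k: abs(strand_z[k]))
    # Two residue steps separate the neighbors on either side.
    return abs(strand_z[i + 1] - strand_z[i - 1]) / 2.0


def residues_across_membrane(thickness, rise):
    """Number of residues needed to span a membrane of given thickness."""
    return thickness / rise
```

With the 27-Å hydrophobic thickness and the mean rise of 2.7 Å per residue, the second function returns about 10 residues, which is the strand length used for the sliding-window analysis.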

We also calculated the distribution of loop length in the β-barrels in the database. These data are shown in Figure 7. In this work, loops are defined as segments between membrane-spanning β-strands that are outside the thickness of the membrane, in other words, more than 13.5 Å from the bilayer midplane. Note that about half of the loops are shorter than six residues, indicating that most membrane-spanning β-strands are connected to at least one other strand by a short loop. This suggests that the β-hairpin is the basic structural building block of β-barrel membrane proteins. As is apparent in the example shown in Figure 1 and in Figure 2A and B, the short and long loops of β-barrel membrane proteins are generally segregated onto opposite sides of the membrane.


Uniqueness of membrane β-barrel dyad repeats

Membrane-spanning β-strands, like all β-sheets, have a dyad repeat topology in which alternating residues are oriented toward alternating faces of the sheet. In β-barrel membrane proteins about half of the membrane-spanning residues are hydrophobic residues that are oriented toward the membrane lipids, while the other half are more hydrophilic residues that are oriented toward the interior of the barrel. Several β-barrel identification algorithms have been developed, in part, on the idea that membrane β-barrels could be recognizable through the dyad repeat of hydrophobic (external) and hydrophilic (internal) residues (e.g., Fischbarg et al. 1995). However, difficulties arise when genome databases are screened for β-barrel membrane proteins using this simple idea because the interiors of membrane-spanning β-barrels are not necessarily very hydrophilic, and because many soluble β-sheets also have a similar dyad repeat motif in which one hydrophobic face of a sheet is buried and one hydrophilic face is more exposed to the aqueous phase. Our goal in this work was to use the known β-barrels to generate a data set based on the observed abundance of the amino acids and the architecture of β-barrel membrane proteins that will further help to differentiate β-barrel membrane proteins from the abundant amphipathic β-sheets of soluble proteins.

From the strand length distribution shown in Figure 6 we concluded that a search for a membrane-spanning segment of 10 residues will be able to identify most transmembrane β-strands. We performed a 10-residue sliding window analysis for each protein examined. For each 10-residue sliding window in a protein's amino acid sequence we calculated a “β-strand score” based on the two abundance data sets (interior and exposed) determined for β-barrel membrane proteins (shown in Fig. 5A,B, and listed in Table 2) using

$$\beta\text{-strand score} = \sum_{i=1,3,5,7,9} AX_i^{\mathrm{in}} + \sum_{i=2,4,6,8,10} AX_i^{\mathrm{out}}$$

or

$$\beta\text{-strand score} = \sum_{i=1,3,5,7,9} AX_i^{\mathrm{out}} + \sum_{i=2,4,6,8,10} AX_i^{\mathrm{in}}$$

whichever is highest, where $AX_i^{\mathrm{in}}$ and $AX_i^{\mathrm{out}}$ are ln (relative abundance) values for interior (in) and exterior (out) residues (Table 2) for the ith amino acid in the sliding window. A comparison between the β-strand scores for the membrane-spanning β-strands of β-barrel membrane proteins and the whole E. coli genome (Perna et al. 2001) is shown in Figure 8. The peak for the β-barrel strands is at approximately 2.5 σ from the center of the genome distribution. This is a good starting point for the distinction of membrane-spanning β-strands in genome databases. We also made the same calculations using a simple dyad repeat of alternating octanol hydrophobicity (Wimley et al. 1996). The results of this comparison, shown in Figure 9, show that the distinction between membrane-spanning β-strands and the genomic distribution is significantly poorer than for the scores generated with the abundance data of Table 2.
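The sliding-window β-strand score can be sketched in Python as follows. The sketch assumes that the ln (relative abundance) values for interior and exterior residues are supplied as two dictionaries keyed by one-letter amino acid code; the function name is hypothetical and the values of Table 2 are not reproduced.

```python
def strand_scores(sequence, ln_in, ln_out, window=10):
    """β-strand score for every window of a sequence.

    sequence -- amino acid sequence (one-letter codes)
    ln_in    -- {amino_acid: ln(relative abundance), interior residues}
    ln_out   -- {amino_acid: ln(relative abundance), exterior residues}
    Returns one score per window start position.
    """
    scores = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        # Register A: residues 1, 3, 5, ... face in; 2, 4, 6, ... face out.
        reg_a = sum(ln_in[aa] if k % 2 == 0 else ln_out[aa]
                    for k, aa in enumerate(segment))
        # Register B: the opposite assignment of the two faces.
        reg_b = sum(ln_out[aa] if k % 2 == 0 else ln_in[aa]
                    for k, aa in enumerate(segment))
        # The dyad register is unknown in advance, so keep the higher score.
        scores.append(max(reg_a, reg_b))
    return scores
```

Taking the maximum over the two registers is what allows a single pass over a sequence without knowing beforehand which residues of a candidate strand face the lipid.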

β-barrel profiles

An example of a 10-residue sliding window score profile using the abundance data in Table 2 is shown in Figure 10A. The sequence examined is the membrane-spanning domain of the 22-stranded monomeric β-barrel FhuA from E. coli. The actual membrane-spanning β-strands are shown as solid black bars. For reference, the figure has a gray area between 2 and 6 that covers the range in which most membrane-spanning β-strands are found (see Fig. 8). Note that the algorithm is successful at identifying most membrane-spanning β-strands, although there are also some false positive peaks. A similar overprediction is encountered for the prediction of transmembrane helices in many hydropathy analyses (Zen et al. 1995; Casadio et al. 1996; Krogh et al. 2001). The results of this analysis were the same if we treated FhuA as an unknown protein and left it out of the abundance calculation.

To improve the ability to rapidly recognize β-barrels in genome databases and to simplify the sliding window average, we also incorporated the architectural data (Figs. 6, 7) into a secondary sliding window calculation that gives a “β-hairpin” score from the β-strand score. The β-hairpin score, as shown in Figure 10B, is the sum, in a 25-residue sliding window, of the highest β-strand score in residues 1–10 and the highest β-strand score in residues 15–25. The β-hairpin score is thus highest when there are two β-strand peaks separated by a short loop. A prototypical β-hairpin with two 10-residue β-strands separated by a five-residue loop (see Figs. 6, 7) will give a high, flat peak in this β-hairpin analysis. Note in Figure 10B that most of the β-hairpins of FhuA are correctly identified in this analysis.
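The secondary window can be sketched in Python as shown here. The sketch assumes that the β-strand scores are available as a list with one value per sequence position; how each 10-residue window score is assigned to a position is not specified above, so that mapping is left to the caller, and the function name is hypothetical.

```python
def hairpin_scores(strand, window=25):
    """β-hairpin score for every 25-residue window.

    strand -- list of β-strand scores, one value per sequence position
    Returns one score per window start position.
    """
    scores = []
    for start in range(len(strand) - window + 1):
        first = max(strand[start:start + 10])         # residues 1-10
        second = max(strand[start + 14:start + 25])   # residues 15-25
        scores.append(first + second)
    return scores
```

Positions 11 to 14 of each window are ignored, so a single isolated strand peak cannot contribute to both halves of the sum at the same window position.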

Screening of genomic data

These analyses are being conducted so that we can begin to develop methods for rapidly identifying potential β-barrels in genome databases. Potential β-barrels can then be further analyzed with neural network-based structure prediction algorithms (Gromiha et al. 1997; Jacoboni et al. 2001) and with molecular biology and proteomics tools (Molloy et al. 2000). A rapid genomic screening algorithm requires a simple parameterization or scoring of each protein sequence. One feature we expect to find in all β-barrel membrane proteins is a set of roughly 5 to 15 peaks in the β-hairpin analysis like that in Figure 10B. The number of β-strands or β-hairpins is expected to scale approximately with protein size; thus, in our preliminary genomic analyses we calculated a single β-barrel score for each protein by summing the high peaks as follows:

equation image

and we obtained the distribution shown in Figure 11 for the E. coli genome. We chose a cutoff value of 6 because it correctly identifies ∼90% of the β-hairpins in our structure database, without also including many false peaks (see Fig. 10B). Using this algorithm, we calculated scores for three sets of known β-barrel membrane proteins: known crystal structures used in this work (Table 1), trimeric porins, and TonB-dependent outer membrane receptors. The median genomic score is 0.4, whereas all members of these three sets of β-barrel membrane proteins are found beyond the 85th percentile at 1.0 and many score higher than the 97th percentile score at 2.0. The eight-stranded β-barrel OmpX (Table 1), at 5.5, is the highest scoring protein in the entire E. coli genome.

Using this simple and rapid scoring algorithm we have begun to analyze the whole genomes of Gram-negative bacteria. Here we discuss preliminary results from the genomes of Escherichia coli and Pseudomonas aeruginosa as examples. After scoring and ranking all the open reading frames in these two genomes, we examined the 125 highest scoring proteins for each genome. These proteins, which represent about 2.5% of all open reading frames, fall between 1.7 and 5.5 in β-barrel score (Fig. 11). They have been categorized in Table 3. We find four main classes of proteins in this high-scoring group. Known outer membrane proteins and putative or probable outer membrane proteins, identified by sequence homology, comprise approximately half of the genes in the highest scoring group. This observation strongly supports the idea that this algorithm can accurately detect β-barrel membrane proteins. Unidentified open reading frames or hypothetical proteins also comprise about half of these highest scoring proteins. It seems very likely that some of these sequences encode functional β-barrel membrane proteins. Interestingly, we also find a significant number of fimbrial (piliar) proteins, fimbrial usher proteins, adhesin-like proteins, and exoproteins in this highest scoring group. These are all proteins that reside in, or pass through, the outer membrane. Proteins or hypothetical proteins belonging to other classes, such as probable soluble enzymes, comprise only a very small fraction of the high-scoring genes. The complete genomic lists of β-barrel scores are provided as Electronic Supplementary Material to this manuscript.


We have analyzed the amino acid composition and architecture of all β-barrel membrane proteins of known structure. These data have been used to develop a simple algorithm for rapidly screening genomes for potential β-barrel membrane proteins. Application of this algorithm to the genomes of the Gram-negative bacteria Escherichia coli and Pseudomonas aeruginosa has revealed dozens of potential β-barrel membrane proteins that have not previously been identified or annotated as such. Future experiments will be directed toward refinement of the screening algorithm and toward application of proteomics methods to determine if the potential β-barrels that we have identified can be expressed as β-barrel membrane proteins in bacterial outer membranes.

Materials and methods

Transformation of PDB coordinates to the bilayer plane

Each protein's XYZ PDB coordinates were transformed to align the “bilayer plane” of the protein with the XY plane of the coordinate system. First, the PDB coordinate file was converted to a kinemage file using PreKin (Richardson and Richardson 1994). With the program Mage (Richardson and Richardson 1994) we viewed the kinemage and used the position of the external aromatics, aliphatics, and charged residues to align each protein with the XY plane. The transformation matrix was obtained from Mage and used in a modified version of the program KinPlot (Wimley et al. 1994) to transform the coordinates and rewrite them in PDB format. The output of this procedure is a PDB format file in which the plane of the bilayer is coincident with the XY plane of the atomic coordinate system. Alignment of the proteins along the z-axis is described in the text. All the software used in this work that is not publicly obtainable is available from the author upon request.
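The coordinate transformation itself amounts to applying a rotation matrix to every atom and then shifting along z. A minimal Python sketch is given here for illustration; it is not the modified KinPlot code, the function name is hypothetical, and the rotation matrix and midplane position are assumed to be supplied by the caller.

```python
def to_bilayer_frame(coords, rotation, z_midplane=0.0):
    """Rotate coordinates into the bilayer frame and center them on z.

    coords     -- list of (x, y, z) atomic coordinates
    rotation   -- 3x3 rotation matrix as a list of three rows
    z_midplane -- z position of the bilayer midplane after rotation
    """
    transformed = []
    for x, y, z in coords:
        # Matrix-vector product, one row of the rotation matrix at a time.
        new_x = rotation[0][0] * x + rotation[0][1] * y + rotation[0][2] * z
        new_y = rotation[1][0] * x + rotation[1][1] * y + rotation[1][2] * z
        new_z = rotation[2][0] * x + rotation[2][1] * y + rotation[2][2] * z
        # Shift so that the bilayer midplane sits at z = 0.
        transformed.append((new_x, new_y, new_z - z_midplane))
    return transformed
```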

Hydrophobicity profiles

Hydrophobicity profiles were calculated over a 5-Å sliding average window, which was moved across the protein in the bilayer coordinate system along a line normal to the bilayer. The “location” of each residue was taken to be the XYZ coordinates of the β-carbon, or the α-carbon for glycine. We examined the differences that would occur in the locations of long polar side chains, such as lysine, if we instead used the position of the polar side-chain moiety, but we found only small net differences from the position of the β-carbon (∼1 Å or less). The octanol hydrophobicity scale, which has been discussed in detail elsewhere (Wimley et al. 1996; White and Wimley 1998, 1999), is based on the partitioning of peptides of the form AcWL-X-LL into bulk octanol. The scale is less permissive of polar residues, and appears to be a good scale for mimicking the environment of membrane proteins.

Electronic supplemental material

Electronic supplemental material consists of tabulated amino acid abundance data (Table 2) and tables of sorted β-barrel scores for the complete genomes of the two Gram-negative bacteria discussed in the text: Escherichia coli and Pseudomonas aeruginosa. After the file header, the genomic data are given in five columns: β-barrel score (sorted), protein length, number of peaks in the β-hairpin score greater than 4.0 (Fig. 10), description of the protein in the genome annotation, and the protein's code. File name conventions are as follows: Ecoli.doc: Escherichia coli; Paeruginosa.doc: Pseudomonas aeruginosa.

Table 1. The β-barrel database
Protein | Organism | PDB code (a) | Architecture | Strands | Reference
Porin | Rhodobacter capsulatus | 2POR | trimer | 16 | Weiss and Schulz 1992
Pho E | Escherichia coli | 1PHO | trimer | 16 | Cowan et al. 1992
Porin | Rhodobacter blastica | 1PRN | trimer | 16 | Kreusch and Schulz 1994
Omp F | Escherichia coli | 1OPF | trimer | 16 | Cowan et al. 1995
α hemolysin | Staphylococcus aureus | 1AHL | heptameric single barrel (b) | 2 | Song et al. 1996
Maltoporin | Salmonella typhimurium | 2MPR | trimer | 18 | Meyer et al. 1997
Omp A | Escherichia coli | 1BXW | monomer | 8 | Pautsch and Schulz 1998
Sucrose porin | Salmonella typhimurium | 1AOS | trimer | 18 | Forst et al. 1998
FhuA | Escherichia coli | 1BY5 | monomer | 22 | Locher et al. 1998
Osmoporin | Klebsiella pneumoniae | 1OSM | trimer | 16 | Dutzler et al. 1999
FepA | Escherichia coli | 1FEP | monomer | 22 | Buchanan et al. 1999
OmpLA | Escherichia coli | 1QD6 | dimer | 12 | Snijder et al. 1999
Omp X | Escherichia coli | 1QJ9 | monomer | 8 | Vogt and Schulz 1999
Tol C | Escherichia coli | 1EK9 | trimeric single barrel (c) | 4 | Koronakis et al. 2000
Omp 32 | Comamonas acidovorans | 1E54 | trimer | 16 | Zeth et al. 2000

(a) Accession number for the structure in the Protein Data Bank (Berman et al. 2000).
(b) Each monomer contributes two β-strands to the 14-stranded barrel.
(c) Each monomer contributes four β-strands to the 12-stranded barrel.

Table 2. Composition data for β-barrels of known structure
 | Abundance on external surfaces | Abundance on internal surfaces
Amino acid | Raw (d) | Norm (e) | Raw | Norm | Raw | Norm | Raw | Norm | Raw | Norm | Raw | Norm

(a) The bilayer is defined as the region ±13.5 Å from the bilayer midplane defined as shown in Figures 1, 2, and 3.
(b) The interface is the region more than ±6.5 Å from the bilayer midplane, but equal to or less than 13.5 Å away.
(c) The hydrocarbon core of the membrane is the region within ±6.5 Å of the bilayer midplane.
(d) Raw abundance is abundance in the β-barrel database divided by the total number of amino acids.
(e) Normalized abundance is the raw abundance divided by the genomic abundance, calculated as described in the text.
(f) There are no cysteine residues in the β-barrel database. For genomic screening the normalized abundance of Cys was set to 0.02.
(g) There are no glutamate residues in the external hydrocarbon core areas. For genomic screening the normalized abundance of Glu in the core was set to 0.02.
(h) There are no lysine residues in the hydrocarbon core areas. For genomic screening the normalized abundance of Lys in the core was set to 0.02.

Table 3. Analysis of high-scoring proteins in bacterial genomes
Protein classification | Escherichia coli (a) | Pseudomonas aeruginosa (b)
Known outer membrane proteins (d) | 21 (c) | 22 (c)
Putative or probable outer membrane proteins | 39 | 28
Unidentified or hypothetical proteins | 40 | 65
Fimbrial proteins, fimbrial ushers, and adhesins | 16 | 5
Other proteins | 9 | 5

a Complete genome of E. coli O157:H7 (Perna et al. 2001). Annotation dated January 25, 2001.

b Complete genome of Pseudomonas aeruginosa PAO1 (Stover et al. 2000). Annotation dated August 30, 2000.

c We chose for close examination the 125 proteins that scored the highest in the β-hairpin score. These constitute about 2.5% of the genome and cover scores equal to or higher than ∼1.7, as shown in Figure 11. The complete genome lists of β-barrel scores are provided as Electronic Supplemental Material to this manuscript.

d Known proteins were those designated in the genome annotation as outer membrane proteins, porins, outer membrane receptors, etc., but without any adjectives such as “probable,” “possible,” or “putative.”
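
The selection in footnote c and the annotation rule in footnote d can be sketched as follows. This is an illustration only: the keyword lists, the order in which the categories are tested, and the function names are assumptions, not the procedure actually used to build Table 3.

```python
# Hypothetical keyword lists; the source gives only examples followed by "etc."
OUTER_MEMBRANE_TERMS = ("outer membrane", "porin")
HEDGING_ADJECTIVES = ("probable", "possible", "putative")
FIMBRIAL_TERMS = ("fimbria", "usher", "adhesin")


def top_scoring(scores, n=125):
    """Return the n protein identifiers with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def classify(annotation):
    """Assign a genome annotation string to one of the Table 3 categories."""
    text = annotation.lower()
    if any(term in text for term in FIMBRIAL_TERMS):
        return "fimbrial proteins, fimbrial ushers, and adhesins"
    if not text.strip() or "hypothetical" in text:
        return "unidentified or hypothetical"
    if any(term in text for term in OUTER_MEMBRANE_TERMS):
        # Footnote d: "known" requires the absence of hedging adjectives.
        if any(adj in text for adj in HEDGING_ADJECTIVES):
            return "putative or probable outer membrane protein"
        return "known outer membrane protein"
    return "other"
```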
Fig. 1.

Molecular graphics image of a β-barrel outer membrane protein, the dimeric phospholipase OmpLA (Snijder et al. 1999). The interfacial aromatic residues tryptophan and tyrosine are shown in green and the external charged residues in blue. These residues were used to orient the dimer in the bilayer plane (see text). The grid superimposed over the structure shows the bilayer coordinate system into which the protein was transformed by the procedures described in the text.

Fig. 2.

Examples of external hydrophobicity profiles for two β-barrels. (A) The trimeric 18-stranded sucrose porin from Salmonella typhimurium (Table 1). (B) The monomeric 22-stranded iron transport protein FepA from Escherichia coli (Table 1). A 5-Å sliding window was used to generate hydrophobicity profiles for exposed barrel residues that were identified and centered on the bilayer midplane as described in the text. The hydrophobicity scale used was an experimentally determined scale based on partitioning of model peptides into octanol. Negative numbers on the X-axis signify residues closer to the periplasmic space. Negative numbers on the Y-axis signify residues that are more hydrophobic.

Fig. 3.

Composite transbilayer profiles for all β-barrel membrane proteins of known structure. (A) Fractional abundance of external aromatic and ionized residues summed over a 5-Å sliding window. The abundance is divided by the total number of external residues within the window. (B) Composite hydrophobicity of internal and exposed amino acids in the β-barrel membrane proteins of known structure (Table 1). The hydrophobicity scale is an absolute scale based on octanol partitioning of model peptides (Wimley et al. 1996), and was calculated using a 5-Å sliding window. Negative numbers on the X-axis signify residues closer to the periplasmic space, and negative numbers on the Y-axis of (B) signify greater hydrophobicity. The hydrophobic thickness of the membrane, 27 Å, is centered on X = 0 Å, and is shown as a gray box. Note that the hydrophobicity scale is an absolute scale that has not been normalized. The fact that the natural zero level of the octanol scale corresponds exactly to the actual membrane-spanning segments has been noted elsewhere in applications to helical bundle membrane proteins (S. Jayasinghe, K. Hristova, and S.H. White 2001).
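
The windowed fraction in panel (A) can be sketched as below. The residue sets, the treatment of the 5-Å window as a full width centered on each position, the 1-Å step, and the function names are all assumptions made for illustration.

```python
AROMATIC = frozenset("WY")   # assumption: Trp and Tyr, as highlighted in Fig. 1
IONIZED = frozenset("DEKR")  # assumption: Asp, Glu, Lys, and Arg


def windowed_fraction(residues, members, center, width=5.0):
    """Fraction of external residues in the window that belong to `members`.

    residues: list of (z, one_letter_code), with z in Å from the midplane.
    Returns None when the window contains no residues.
    """
    half = width / 2.0
    in_window = [aa for z, aa in residues if abs(z - center) <= half]
    if not in_window:
        return None
    hits = sum(1 for aa in in_window if aa in members)
    return hits / len(in_window)


def transbilayer_profile(residues, members, z_min=-20.0, z_max=20.0, step=1.0):
    """Evaluate the windowed fraction at evenly spaced positions along z."""
    n_steps = int(round((z_max - z_min) / step))
    centers = [z_min + i * step for i in range(n_steps + 1)]
    return [(c, windowed_fraction(residues, members, c)) for c in centers]
```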

Fig. 4.

Raw amino acid abundance for the external and internal amino acids in the database of all known β-barrel membrane proteins. (A) External residues. (B) Internal residues. Raw abundance values are the total number of each amino acid divided by the total number of amino acids in that structural subclass. In addition to the abundance across the whole bilayer, we also show the abundance for each of two bilayer regimes: the hydrocarbon core within ±6.5 Å of the bilayer midplane and the bilayer interface between 6.5 and 13.5 Å from the midplane. Abundance values are ranked, left to right, by the value for the whole bilayer.

Fig. 5.

Normalized amino acid abundance for the external and internal amino acids in the database of all known β-barrel membrane proteins. (A) External residues. (B) Internal residues. Normalized abundance values are the raw abundance (Fig. 4, Table 2) divided by the weighted genomic abundance of each amino acid (see text). In addition to the abundance across the whole bilayer, we also show the abundance for each of two bilayer regimes: the hydrocarbon core within ±6.5 Å of the bilayer midplane and the bilayer interface between 6.5 and 13.5 Å from the midplane. The line at 1.0 marks residues whose abundance in the database equals their genomic abundance. Abundance values are ranked, left to right, by the value for the whole bilayer.

Fig. 6.

Histogram of the rise per residue in β-barrel membrane proteins of known structure. For each lipid-exposed β-strand in our database we calculated the rise per residue from the three residues closest to the bilayer midplane. The scale at the top shows a conversion to the number of residues required to span the 27-Å thickness of the membrane.
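
The conversion shown on the top scale of the histogram can be sketched as follows. How the three residues are chosen and how the rise is computed from them are assumptions here (the three consecutive residues with the smallest summed distance from the midplane, and half the z separation of the outer two); the function names are hypothetical.

```python
MEMBRANE_THICKNESS = 27.0  # Å, the hydrophobic thickness used in this work


def rise_per_residue(z_coords):
    """Rise per residue (Å) along the bilayer normal for one strand.

    z_coords: z positions (Å from the midplane) of consecutive residues.
    """
    if len(z_coords) < 3:
        raise ValueError("need at least three residues")
    # Pick the three consecutive residues lying closest to the midplane.
    best = min(
        range(len(z_coords) - 2),
        key=lambda i: sum(abs(z) for z in z_coords[i:i + 3]),
    )
    # Two residue-to-residue steps separate the first and third residue.
    return abs(z_coords[best + 2] - z_coords[best]) / 2.0


def residues_to_span(rise, thickness=MEMBRANE_THICKNESS):
    """Number of residues needed to cross the membrane at a given rise."""
    return thickness / rise
```

With a rise of 2.7 Å per residue this gives about 10 residues across the 27-Å membrane.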

Fig. 7.

Histogram of interstrand loop lengths in the known β-barrel membrane proteins. In this measurement, the length of a loop is the number of residues between two β-strands that lie outside the bilayer, that is, more than 13.5 Å from the bilayer midplane. The distribution is bimodal, with about 45% of the loops shorter than eight residues and 55% of the loops longer.
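
The loop-length count can be sketched as below, assuming a list of per-residue z positions in sequence order. Treating every residue inside the bilayer as part of a strand is a simplification, and the function names are hypothetical.

```python
def loop_lengths(z_by_residue, half_width=13.5):
    """Lengths of runs of residues outside the bilayer that lie between strands.

    Runs before the first or after the last in-bilayer residue are not
    counted, because they do not connect two strands.
    """
    lengths = []
    run = 0
    seen_strand = False
    for z in z_by_residue:
        if abs(z) > half_width:
            run += 1
        else:
            if seen_strand and run > 0:
                lengths.append(run)
            seen_strand = True
            run = 0
    return lengths


def short_loop_fraction(lengths, cutoff=8):
    """Fraction of loops shorter than `cutoff` residues."""
    return sum(1 for n in lengths if n < cutoff) / len(lengths)
```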

Fig. 8.

Distribution of β-strand scores for the whole Escherichia coli genome (Perna et al. 2001) and for the membrane-spanning β-strands of known β-barrel proteins (Table 1). β-Strand scores reflect the match between the composition of alternating amino acids in an unknown segment and the composition expected from the analysis of known β-barrels. Calculation of β-strand scores is described in the text. Note that the center of the distribution for the known β-barrel membrane proteins lies about 2.5 σ from the genomic peak.

Fig. 9.

Distribution of alternating hydrophobicity scores for the whole Escherichia coli genome (Perna et al. 2001) and for the membrane-spanning β-strands of known β-barrel proteins (Table 1). Alternating hydrophobicity scores reflect the idea that the residues on the inside and outside of a β-barrel will have a hydrophobic-hydrophilic pattern. Calculation of alternating hydrophobicity scores is described in the text. The value cannot be negative because we take the higher of the two possible scores for the 10-residue window. Note that the overlap is much greater than the overlap in Figure 8, and thus, alternating hydrophobicity is a weaker detection method than the abundance comparison in Figure 8.
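
One way to see why the score cannot be negative is the following sketch. It is not the calculation described in the text: it assumes that the two possible scores correspond to the two alternation registers of a 10-residue window and that each is a simple difference of summed hydrophobicities, so that the two scores are equal and opposite. The function names are hypothetical.

```python
def alternating_score(hydrophobicity):
    """Higher of the two register scores for one window of per-residue values."""
    even_sum = sum(hydrophobicity[0::2])
    odd_sum = sum(hydrophobicity[1::2])
    register_a = even_sum - odd_sum  # even positions taken as one face
    register_b = odd_sum - even_sum  # the opposite assignment
    # The two registers are equal and opposite, so the maximum is never negative.
    return max(register_a, register_b)


def sliding_alternating_scores(hydrophobicity, window=10):
    """Score every full-length window along a sequence of hydrophobicity values."""
    last_start = len(hydrophobicity) - window
    return [
        alternating_score(hydrophobicity[i:i + window])
        for i in range(last_start + 1)
    ]
```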

Fig. 10.

Examples of sliding window scores for the membrane-spanning segment of FhuA, a monomeric 22-stranded β-barrel (Table 1). The actual membrane-spanning strands are shown by the horizontal bars. (A) β-Strand score calculated as described in the text. A membrane-spanning β-strand will have a sharp peak. The gray box represents the area in which most known membrane-spanning β-strands fall. Note that every β-strand in this protein has a corresponding peak in this regime. (B) β-Hairpin score is the sum, in a 25-residue sliding window, of the highest peak in residues 1–10 and the highest peak in residues 15–25. Arrows denote the location of the short turns between known β-strands. Note that most of the β-hairpins in the protein are correctly identified.
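
The β-hairpin score in panel (B) can be sketched directly from its definition. The input is assumed to be a list of per-residue β-strand scores, the "highest peak" is taken here to be the maximum value in each sub-range, and the function name is hypothetical.

```python
def hairpin_scores(strand_scores, window=25):
    """β-Hairpin score for every 25-residue window of per-residue strand scores."""
    scores = []
    for start in range(len(strand_scores) - window + 1):
        segment = strand_scores[start:start + window]
        first_strand = max(segment[0:10])    # residues 1-10 of the window
        second_strand = max(segment[14:25])  # residues 15-25 of the window
        scores.append(first_strand + second_strand)
    return scores
```

Residues 11 through 14 of each window are skipped, which leaves room for the short turn that connects the two strands of a hairpin.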

Fig. 11.

Distribution of β-barrel scores for all proteins in the E. coli genome and in sets of known β-barrel membrane proteins. The known proteins are from three groups: known structures from the Protein Data Bank (Table 1), trimeric porins, and TonB-dependent outer membrane receptors. Note that the known outer membrane proteins have scores that fall well beyond the mean of the E. coli distribution, 0.4.


The New Orleans Protein Folding Intergroup is gratefully acknowledged for many invaluable discussions, and we thank Samuel J. Landry and William F. Walkenhorst for critically reading the manuscript. We are indebted to Dr. Harald Engelhardt (Max-Planck Institute for Biochemistry, Munich) for sending the coordinates of Omp32 before their release from the PDB. Funded by NIH (GM60000) and the Louisiana Board of Regents Support Fund 1999-02-RD-A-43.
