A knowledge-based potential highlights unique features of membrane α-helical and β-barrel protein insertion and folding


  • Daniel Hsieh,

    1. BioMaPS Institute and the Graduate Program in Computational Biology and Molecular Biophysics, Rutgers University, Piscataway, New Jersey 08854
    2. Department of Biochemistry, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, Piscataway, New Jersey 08854
    3. Center for Advanced Biotechnology and Medicine, Piscataway, New Jersey 08854
  • Alexander Davis,

    1. Center for Advanced Biotechnology and Medicine, Piscataway, New Jersey 08854
  • Vikas Nanda

    Corresponding author
    1. Department of Biochemistry, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, Piscataway, New Jersey 08854
    2. Center for Advanced Biotechnology and Medicine, Piscataway, New Jersey 08854
    • Department of Biochemistry, Robert Wood Johnson Medical School—UMDNJ, Piscataway, NJ 08854



Outer membrane β-barrel proteins differ from α-helical inner membrane proteins in lipid environment, secondary structure, and the proposed processes of folding and insertion. It is reasonable to expect that outer membrane proteins may contain primary sequence information specific for their folding and insertion behavior. In previous work, a depth-dependent insertion potential, Ez, was derived for α-helical inner membrane proteins. We have generated an equivalent potential for TM β-barrel proteins. The similarities and differences between these two potentials provide insight into unique aspects of the folding and insertion of β-barrel membrane proteins. This potential can predict orientation within the membrane and identify functional residues involved in intermolecular interactions.


Our current biophysical understanding of membrane protein folding and function lags significantly behind that of water-soluble proteins. This gap is particularly evident for outer membrane β-barrel proteins. Transmembrane β-barrel proteins (TMBs) fulfill many important functions from nutrient uptake to cell signaling to virulence, in Gram-negative bacteria, mitochondria1 and chloroplasts.2 Although in vivo folding and insertion of TMBs are guided by chaperones,3 folding can proceed in vitro in synthetic vesicles in the absence of such cellular machinery.4–9 This result suggests that the primary sequence encodes sufficient information for successful TMB insertion and folding. For transmembrane α-helical (TMH) proteins, analysis of sequences and high resolution structures has provided insight into the physical basis of folding and insertion. Here we present a statistical potential for predicting the thermodynamics of TMB insertion into a lipid bilayer.

It is reasonable to expect that different rules govern insertion of TMHs and TMBs. The two structural classes differ in insertion pathway, lipid environment, and secondary structural preferences. Folding and insertion models of TMBs contrast with the two-stage folding model that has been proposed for TMHs based on pioneering studies of bacteriorhodopsin.10 In this model, individual helices insert into the membrane and subsequently fold into the final tertiary structure. In contrast, TMBs are proposed to fold and insert in a concerted process,7, 9, 11–13 with partial folding preceding insertion in vivo in some cases.14, 15

Differences in lipid environment may also influence the manner in which TMB and TMH sequences have evolved for folding and function. Lipid composition varies between the two membranes, resulting in different headgroup polarities as well as bilayer thickness. In addition, while the inner membrane is composed of phospholipids, the outer membrane also contains lipopolysaccharides in its outer leaflet, rendering it highly impermeable to polar solutes.16 TMBs have been observed to directly bind to lipopolysaccharides in high-resolution crystal structures.17

The secondary structure context may influence the evolutionary selection and placement of amino acids. First, it is well established that amino acid frequencies for α-helices and β-sheets are distinct;18, 19 certain residues promote or disrupt each type of secondary structure. Second, TMBs and TMHs utilize different motifs facilitating oligomerization,20, 21 such as the GxxxG motif found in glycophorin-A.22 Another example is the capacity for amino acids such as lysine or arginine to “snorkel,” allowing the amino acid to place its polar moiety in the aqueous/headgroup region while burying the aliphatic portion of the sidechain in the hydrophobic region of the bilayer.23, 24 Different snorkeling propensities are observed for α-helices and β-strands at the N and C termini.

We focus on the energetics of insertion of TMBs into the lipid bilayer. This study builds on previous work that explored partitioning effects on amino acids into the lipid bilayer both experimentally25–27 and computationally.28–30 These studies assume a single parameter for each amino acid which reflects the energetic cost of transferring a residue from water to a purely nonpolar environment, thus treating the lipid bilayer as a hydrophobic slab in an aqueous environment. However, neutron diffraction studies have shown the water-lipid interface to be continuous rather than discrete.31 To account for such heterogeneity, it is necessary to develop a depth-dependent potential where the properties of each amino acid are described by several parameters.

A depth dependent statistical potential called Ez was developed for predicting individual amino acid energies of insertion for TMHs.32 For purposes of clarity, we refer to this TMH potential as ΔEzα and the new TMB potential as ΔEzβ. The ΔEzα potential recapitulated subtle features of α-helical protein insertion. This potential was applied to the de novo design of membrane-soluble peptides that targeted platelet integrin receptors and modulated their thrombogenic behavior.33 Ezβ is expected to be similarly useful in the design of TMBs.

There are advantages and drawbacks to using a knowledge-based potential. The accuracy of the potential is restricted by the size of the protein dataset from which it is derived. It is also argued that knowledge-based potentials provide limited information on folding intermediates or transition states.34 However, statistical potentials can capture subtle or unknown physicochemical features that ab initio potentials may fail to include.

Due to the utility of Ezα, we were motivated to develop a depth-dependent potential for TMBs that reflected the unique lipid environment, folding pathway and secondary structural context of this class of proteins. TMB-specific features of Ezβ provide new insights into mechanisms of folding, insertion and function.


Comparison of Ezβ with Ezα

Ezβ was calculated using the same protocol as Ezα32 (see Methods section). Thirty-five high-resolution TMB crystal structures of low sequence homology were centered in the membrane and aligned such that the barrel axis coincided with the membrane normal (z-axis). Although Ezα was constructed using all residues in the dataset, Ezβ only considered the lipid-facing residues and extra-membrane turns. The validity of this subset, which included 4710 of the 12,886 total residues, will be discussed later in this study. Due to the limited dataset size, differential partitioning between outer and inner leaflets of the lipid bilayer was not considered. Parameters for cysteine and methionine were not calculated due to insufficient counts. Only absolute distance from the membrane center was taken into account.

Most amino acids exhibited similar distributions in Ezα and Ezβ. As expected, polar residues preferred the outside of the membrane, while hydrophobic residues had the reverse preference (Fig. 1, Table I). Aromatic residues were predominantly situated in the headgroup region. Values of the parameter ΔE0 for Ezβ strongly correlated with those of Ezα (R2 = 0.78) and with an experimentally derived hydrophobicity scale25 (R2 = 0.68), indicating that general physicochemical behavior was conserved across both classes of membrane proteins (Fig. 2).

Figure 1.

Propensity profiles of each amino acid. Profiles are plotted as a function of membrane depth, separated into three categories: (A) hydrophobic, (B) polar, and (C) aromatic + glycine. Threonine showed no depth-dependent bias and could not be fit to either functional form.

Figure 2.

Correlation with other hydrophobicity scales. (A) Comparison of ΔE(0) for ΔEzβ and ΔEzα. (B) Comparison of ΔEzβ with an experimental hydrophobicity scale.25

Table I. Parameters of the Ezβ Potential
  Values of parameters of the Ezβ potential function determined by leave-one-out analysis. Paq is the limiting propensity in water (z = ∞); ΔE(z) is the free energy change by transferring the amino acid from water to membrane depth z. ΔEmin is interpreted as ΔE(zmin), where zmin is the depth z at which an aromatic residue attains most favorable insertion energy; zmid is where a nonaromatic, nonglycine residue reaches its half-maximal energy; n and σ are the steepness and width of transition, respectively.

Sigmoidal functional form
 Residue   Paq          ΔE0 (kcal/mol)     zmid (Å)     n
 Ala       0.7 ± 0.0    −0.8 ± 0.0         6.0 ± 0.1    7.1 ± 0.9
 Leu       0.1 ± 0.0    −2.0 ± 0.0         17 ± 0.0     2.9 ± 0.1
 Ile       0.3 ± 0.0    −1.0 ± 0.1         15 ± 0.1     18 ± 1.8
 Val       0.1 ± 0.0    −1.5 ± 0.0         15 ± 0.1     23 ± 4.2
 Asp       3.0 ± 0.0    1.3 ± 0.0          15 ± 0.1     7.2 ± 0.3
 Glu       2.9 ± 0.3    1.1 ± 0.1          15 ± 0.3     8.0 ± 1.9
 Lys       3.0 ± 0.1    1.3 ± 0.0          14 ± 0.3     3.7 ± 0.2
 Asn       1.7 ± 0.0    0.7 ± 0.0          13 ± 0.1     29 ± 1.9
 Pro       1.7 ± 0.2    0.8 ± 0.1          11 ± 0.3     8.8 ± 5.6
 Gln       1.6 ± 0.3    0.7 ± 0.1          11 ± 0.4     7.7 ± 1.5
 Arg       3.0 ± 0.0    1.4 ± 0.1          12 ± 1.0     1.7 ± 0.2
 His       1.3 ± 0.0    1.2 ± 0.0          8.0 ± 0.2    14 ± 1.4
 Ser       2.4 ± 0.2    0.9 ± 0.1          17 ± 0.0     3.6 ± 1.5

Gaussian functional form
 Residue   Paq          ΔEmin (kcal/mol)   zmin (Å)     σ (Å)
 Phe       0.0 ± 0.0    −3.0 ± 0.0         9.5 ± 0.1    10 ± 0.1
 Trp       0.1 ± 0.0    −2.1 ± 0.2         11 ± 0.1     6.2 ± 0.4
 Tyr       0.3 ± 0.0    −1.3 ± 0.0         9.3 ± 0.0    3.6 ± 0.1
 Gly       1.4 ± 0.0    0.7 ± 0.0          9.9 ± 0.1    2.9 ± 0.1

However, a few differences between Ezα and Ezβ parameters pointed to unique features of TMBs. First, values of the parameter zmin for Trp, Tyr, and Phe in Ezβ were smaller than the corresponding values in Ezα, consistent with the proposal that the inner membrane has a thicker hydrophobic core than the outer membrane due to differences in lipid composition.7 The average value of zmin for aromatic residues in TMHs (12.4 Å) versus TMBs (10.1 Å) differs by 2.3 Å per leaflet, which corresponds to a predicted difference in hydrophobic thickness of ∼5 Å across the bilayer.

Second, Ezα and Ezβ differed in values of the parameter n for aliphatic amino acids. This parameter represents the steepness of the transition from one environment to another and was previously interpreted as the cooperativity of the interaction with the environment.32 In the case of Ezα, Leu, Val, and Ile had similar values of n, suggesting similar cooperativities in TMH proteins. In contrast, we observed values of n that were 5- to 6-fold greater for Ile and Val relative to Leu in TMB proteins. In many structures in our TMB training set, the β-sheet ends at the lipid/water interface. Therefore, the steep change in Ile and Val propensities may be due to a favorable β-sheet preference for β-branched amino acids,18, 35 rather than cooperativity from sidechain-lipid interactions.

Third, Phe and Gly required different functional forms of E(z) [Eqs. (4) and (5)] in Ezβ than in Ezα to describe their partitioning behavior. In TMBs, phenylalanine partitioned like the other two aromatic residues, localizing to the headgroup region, in contrast to its preference for the bilayer center in TMHs36, 37 (Fig. 3). Tyr and Trp presumably localize in the headgroup region due to hydrophobic interactions with lipids and hydrogen bonding between the sidechain and water. Phenylalanine, on the other hand, lacks a polar moiety on the sidechain and thus cannot hydrogen bond with water. In TMHs, phenylalanine behaves mostly like a hydrophobic amino acid and preferentially localizes in the center of the membrane; it has the largest Ezα zmid of all the amino acids, suggesting some affinity for the headgroup region as well. In TMBs, affinity for the headgroup region is more pronounced, presumably due to favorable π-stacking interactions between aromatic residues in adjacent β-strands, a general feature in β-sheet proteins.38

Figure 3.

Insertion energy profiles of glycine and phenylalanine. Energies of insertion of glycine and phenylalanine with respect to membrane depth compared between Ezα (dashed) and Ezβ (solid).

Relative structural properties of aromatic groups were reflected in the Ezβ parameters. The optimal Ezβ propensities of aromatic residues in the headgroup region (Pmin: Tyr = 2.7, Trp = 2.3, and Phe = 1.9) were consistent with experimentally derived stabilities of substitutions in OmpA (Tyr = −2.6, Trp = −2.0, and Phe = −1.0 kcal/mol).39

Glycine in Ezα was poorly fit to a sigmoidal function (ΔE0 = −0.01 kcal/mol), indicating no depth preference. In TMBs, one would expect Gly to be uniformly destabilizing due to its inability to shield the backbone from competing interactions with solvent.40 Surprisingly, glycine was found at a higher-than-expected frequency at z = 0 and was unfavorable only in the headgroup region; the distribution best fit a Gaussian with a positive ΔEmin. The center of the bilayer is only minimally hydrated,31 mitigating the competing solvent interactions. The headgroup region has significant water content, making the presence of glycine destabilizing to cross-strand hydrogen bonds in this region. Glycine is again found in extra-membrane loops due to its flexibility and the absence of secondary structure. Glycine is thus uniquely unfavorable at the headgroup region, the only location with both secondary structure and hydration. In some β-structures, the presence of glycine is compensated by cross-strand pairing with aromatic residues through an interaction called aromatic rescue.41 However, in the headgroup region, aromatic rescue must compete with solvent hydrogen bonding (i.e., snorkeling23) and π-stacking interactions. Thus, very few instances of aromatic rescue were observed in this region in contrast with other locations within TMBs.42

Orientation of TMBs

The Ezβ training set was initially aligned using a grid-search algorithm; an ensemble of rigid-body rotations about the local x- and y-axes of the experimental structure (θx and θy in Fig. 4) was assessed for maximal projection of the β-strands on the z-axis. We subsequently realigned the same structures using the Ezβ potential to assess whether orientations remained consistent across the training set. The average change in angle post-Ezβ alignment between the geometrically determined barrel axis and the membrane normal was 9.3° ± 16°, consistent with similar measures using other orientation prediction methods (6.5° ± 7.8°) for the same protein dataset.43 The displacement of the barrel center-of-mass from the center of the bilayer was 0.9 ± 1.9 Å. Deviations from zero displacement of the barrel center-of-mass or from a coincident barrel axis and membrane normal reflect limitations of the training-set alignment based on geometric criteria. Three proteins not in the original training set (α-hemolysin, FpvA, and OprG) also optimally aligned at the center of the membrane with the barrel axis nearly coincident with the membrane normal (Supporting Information Fig. S1).

Given the discrepancies between Ezα and Ezβ parameters, one might expect that Ezα would be unsuitable for aligning TMBs. However, alignment of the TMB training set by Ezα was comparable, with center of mass within 1.8 ± 1.2 Å of the membrane center and angular deviation of the membrane normal from the barrel normal of 8.5° ± 11.5°. In terms of OMP placement within the membrane, the similarities between the α- and β-potentials dominate over differences.

Sampling rigid-body rotations of FecA (PDB ID 1KMO) around the minimum orientation resulted in a narrow, funnel-shaped energy landscape (Fig. 4), suggesting that amino acid insertion propensities specify a unique depth and orientation of TMBs within the lipid bilayer. The magnitude of the minimum corresponds with the size of the protein; TtoA (PDB ID 3DZM), which was less than one-third the size of FecA, had a significantly shallower minimum.

Lipid facing residues dominate insertion energetics

Only lipid-facing residues were included in the calculation of Ezβ, while the original Ezα potential sampled all residues. This was intended to reflect distinct folding pathways employed by TMH and TMB proteins. The two-stage folding model of TMHs implies essentially all transmembrane amino acids interact with lipids at some point during folding. In contrast, TMB folding coincides with insertion so that amino acids buried in the native state might never contact lipids during folding. Therefore, it was expected that buried amino acids of TMHs would show a depth-dependent bias, supporting the existence of a folding transition state where all positions interact with lipids to some degree, while buried positions in TMBs would not present a detectable bias.

Distributions of buried amino acids were compared to lipid-facing positions in both TMBs and a set of TMH proteins from the original Ezα potential (Fig. 5). Regardless of secondary structure, lipid-facing residues showed the most pronounced depth-dependent bias, consistent with a strong sequence conservation for promoting membrane insertion. A nominal depth dependence was observed for buried, aliphatic amino acids in both TMBs and TMHs. Buried aromatic amino acids did not show any clear propensity to localize in one region of the membrane.5

Figure 4.

Energy landscape of barrel orientation. Rigid rotations of (A) FecA, PDBID - 1KMO and (B) TtoA, PDBID - 3DZM about x- and y-axes at z = 0 Å.

Figure 5.

Depth dependence of lipid-facing residues. Distributions of lipid-exposed (filled circles) and buried (open circles) aliphatic, polar, and aromatic residues of TMHs (left) and TMBs (right).

A striking difference was observed in the distribution of buried polar amino acids. TMBs showed a flat distribution across the bilayer, but TMHs had a pronounced bias. Fitting the group of polar amino acids together to parameters describing a sigmoidal distribution showed an approximately tenfold greater ΔE0 for TMHs over TMBs (Table II). This discrepancy supports a two-stage TMH folding model where buried amino acids must interact with the lipid bilayer and thus facilitate insertion.

Table II. ΔE0 (kcal/mol) of Buried and Exposed Positions
 Lipid facing   Buried   Lipid facing   Buried
  a. Phe was considered aliphatic for TMHs and aromatic for TMBs.


Discriminating protein–protein interaction sites

If lipid-accessible amino acids are strongly conserved to promote insertion into the bilayer, one might expect surfaces buried upon TMB oligomerization to show a weaker depth-dependent bias. Lipid-facing residues at TMB protein–protein interfaces were noted to have unique amino acid compositions.44 We tested whether Ezβ could predict the binding sites of five oligomers in the data set: ScrY, maltoporin, OMPLA, OprP, and OmpC. An interfacial moment for an aligned β-barrel monomer was defined as the sum of radial moments from the barrel axis to lipid-facing amino acids:

$$\vec{M} = \sum_{i} E_i \, \vec{s}_i \qquad (1)$$

where si was the projection of a radial unit vector on the xy-plane from the center of the barrel to the amino acid at position i, and Ei was the Ezβ insertion energy. This moment matched the protein–protein interface for four out of five oligomers (Fig. 6). With an average of 30% of lipid-facing residues buried in a protein–protein interface, the probability of correctly predicting four out of five protein interfaces by chance was one in two hundred.

Figure 6.

Interfacial moment of the barrel exterior [Eq. (1), red] consistently discriminates the binding interface. The hydrophobicity moment (black) is computed using amino acid transfer energies from Ref. 25.

Polar amino acids in the bilayer are hallmarks of membrane protein interaction sites, mediating interprotein networks of hydrogen bonds and ionic interactions.45–47 However, a similarly calculated experimental hydrophobicity-moment25 did not consistently discriminate the binding interface. Only for OMPLA was the most hydrophobic face of the barrel opposite the binding site.

A hydrophobicity-moment does not adequately capture the combined contributions of polar amino acids in the bilayer center and nonpolar amino acids in the headgroup and extra-membrane region. This can be directly visualized on the protein–protein interface by normalizing the Ezβ insertion energy for each amino acid type from red (most unfavorable) to blue (most favorable) (Fig. 7). Red-colored residues presumably have unfavorable interactions with lipids and/or water, which are relieved upon protein oligomerization. For the sucrose porin ScrY [Fig. 7(A)], unfavorable positions are found near the center of the bilayer, such as the buried Lys 186, and in the extra-membrane region: Ala 133, Leu 129 and Leu 170 form a hydrophobic surface that binds to Val 428 of an adjacent chain. In the case of OMPLA [Fig. 7(B)], mainly polar amino acids near the center of the bilayer are detected, resulting in discrimination of the binding interface by both Ezβ and hydrophobicity-moments. In both cases, these interfacial amino acids are also highly conserved as determined by CONSURF.48, 49 Not all unfavorable, conserved positions have clear roles in oligomerization, such as the L2-loop in ScrY and Trp 58 in OMPLA. These may play additional structural or functional roles.

Figure 7.

(A) Sucrose porin ScrY (1A0S) and (B) outer membrane phospholipase A (1QD6) interfacial residues colored by Ezβ potential. Amino acids not favorably placed are colored red, while those more favorable for insertion are colored blue. * Residues in the L2 loop: Asn 149, Asp 150, Ala 153, and Ser 156.

The magnitude of the Ezβ interfacial moment generally correlates with the existence of an oligomeric state in high-resolution structures (Fig. 8), with notable exceptions. Protein length does not correlate in a straightforward way with the moment magnitude, indicating that protein size is not its primary determinant. OMPLA has the lowest moment magnitude of all the oligomeric proteins in the training set, despite crystallizing as a dimer (PDB ID 1QD6). OMPLA exists in a monomer-dimer equilibrium regulated by calcium and substrate binding;53, 54 the weaker moment is consistent with the need for allosteric modulators to promote dimer formation. A few proteins that crystallized as monomers, 1FEP, 1K24, and 2F1C, had biochemical evidence suggesting the existence of protein–protein interactions.50–52

Figure 8.

Large magnitudes of the Ezβ interfacial moment [Eq. (1)] generally correlate with crystallographic evidence of oligomerization (yellow bars). TMBs without a clear protein–protein interface are shown in blue. The number to the right of each bar is the β-strand count. Biochemical data support the potential presence of oligomers for (a) Ref. 50, (b) Ref. 51, and (c) Ref. 52.


A statistical analysis of amino acid depth preferences confirmed that TMH and TMB proteins obeyed similar broad physicochemical rules: aliphatic residues preferred the hydrophobic bilayer center while polar amino acids were sparse in the same region, and aromatic belts girdled the protein at the water-lipid interface. A more detailed examination of the Ezα and Ezβ parameters revealed differences for each class of proteins consistent with unique aspects in structure, folding pathway and lipid environment.

Only recently has a direct experimental thermodynamic scale of transfer energies been attempted for a TMB, OMPLA, allowing a direct comparison to Ezβ.55 Nineteen substitutions were made at Ala 210, which is within one angstrom of the bilayer center assumed in our calculations. A direct comparison of ΔΔGs of mutation in OMPLA to ΔE0 from Ezβ shows a very good correlation (R2 = 0.77) if proline and phenylalanine are omitted. Proline can be structurally disruptive to both α-helical and β-strand regular secondary structures, but studies of Pro substitutions in bacteriorhodopsin reveal a complex dependence of stability on local structural context.56 Often, the effects of proline are on kinetics of folding or post-translational processing. Such effects do not factor in equilibrium thermodynamic measurements, but would certainly affect TMB sequence evolution. The other outlier, phenylalanine, was the most favorable substitution for Ala at position 210 in OMPLA, whereas Ezβ indicated Phe at z = 0 Å was destabilizing. The reason for this discrepancy is not clear, but may result from kinetic constraints imposed by off-pathway interactions between a centrally located Phe and other aromatic groups in a partially folded intermediate.57 This highlights both the challenge of developing a sufficiently general amino acid transfer scale, and the limitations of attributing knowledge-based potential parameters to purely thermodynamic effects when sequence conservation comes from many aspects of protein function.

The Ezβ potential can be applied to challenges in TMB structural bioinformatics, from structure prediction to identifying protein and lipid binding sites. Although sequence conservation is often used to identify functional sites, it is difficult to determine whether conservation is instead due to constraints of folding and structure.58 Ezβ, in concert with existing TMB design potentials,42, 59–61 can be used to identify amino acids conserved primarily for folding and insertion, allowing a clearer discrimination of positions required for protein or lipid interactions.62

As with the application of Ezα to the design of the CHAMP peptides that bind TMH targets, Ezβ will be a useful tool in TMB protein engineering. There is growing interest in modifying TMBs as small molecule biosensors,63, 64 engineered enzymes65 and drug delivery agents.66, 67 In many of these cases, improving TMB stability is an important engineering goal.63 To incorporate TMBs into synthetic membranes, Ezβ can be adjusted to accommodate bilayers of varying thickness to minimize hydrophobic mismatch in the new design.68 In the future, we plan to incorporate this potential into software for computational protein design, toward the development of fully de novo TMB proteins.


TMB data set

A set of 35 crystallized protein structures (see Supporting Information Table S1) was compiled from a larger list, filtering for a maximum of 26% pairwise sequence homology using EMMA, a ClustalW69 interface included within software suite EMBOSS.70 For oligomeric structures the first chain was used in calculating Ezβ.

Geometric alignment of TMBs along the z-axis

A script was constructed using protCAD,71 a set of in-house libraries for protein design, to align the β-barrel axis to the z-axis (i.e., normal to the membrane bilayer). The best-fit Euler rotation parameters were determined using a grid-search algorithm that maximized the projection of the transmembrane segments of β-strands to the z-axis. Transmembrane segments were specified by Orientation of Proteins in Membranes.43 The center of rotation was determined preliminarily by the center of mass of all Cα of the TM segments.
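The grid search described above can be illustrated with a minimal sketch. It assumes each transmembrane strand is summarized by a single unit direction vector and that rotations about x and y are sampled in fixed steps; the function names, the step size, and the angular range are hypothetical and are not taken from protCAD.

```python
import math

def rotate_xy(v, theta_x, theta_y):
    """Rotate vector v about the x-axis, then about the y-axis (angles in radians)."""
    x, y, z = v
    y, z = y * math.cos(theta_x) - z * math.sin(theta_x), y * math.sin(theta_x) + z * math.cos(theta_x)
    x, z = x * math.cos(theta_y) + z * math.sin(theta_y), -x * math.sin(theta_y) + z * math.cos(theta_y)
    return (x, y, z)

def grid_search_alignment(strands, step_deg=5, max_deg=90):
    """Return (score, theta_x_deg, theta_y_deg) maximizing the summed |z| projection.

    strands: list of unit vectors, one per transmembrane strand (an assumed
    simplification; antiparallel strands are handled by taking |z|).
    """
    best = (-1.0, 0, 0)
    for ax in range(-max_deg, max_deg + 1, step_deg):
        for ay in range(-max_deg, max_deg + 1, step_deg):
            score = sum(
                abs(rotate_xy(s, math.radians(ax), math.radians(ay))[2])
                for s in strands
            )
            if score > best[0]:
                best = (score, ax, ay)
    return best

# Two antiparallel strands lying along x need a 90 degree rotation about y.
score, ax, ay = grid_search_alignment([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])
```

An exhaustive grid is slow but has no starting-point dependence, which suits a one-time alignment of a few dozen structures.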

Calculating Ezβ parameters

The center of the bilayer was defined as z = 0 Å, and assumed symmetric over the inner and outer leaflets. The coordinates of the Cβ of an amino acid were used to specify its distance from the bilayer center. Only residues with a fraction of maximal solvent accessible surface area (SASA) greater than 20% were considered.72 SASA was calculated using DSSP.73 Thus the dataset was primarily composed of lipid-facing and extra-membrane residues.

The model environment was divided into discrete 3 Å bins, offset by 1.5 Å. The propensity, Pres,bin [Eq. (2)], was defined as the observed frequency of an amino acid over its expected frequency in a certain bin:

$$P_{res,bin} = \frac{n_{res,bin}}{n_{tot} \, f_{res} \, f_{bin}} \qquad (2)$$

nres,bin was the observed number of a particular residue found in a bin. For example, nLys, 4.5 Å was the number of lysines observed at both 4.5 ± 1.5 Å and −4.5 ± 1.5 Å away from the center of the bilayer. ntot represented the total number of residues in the dataset; fres was the frequency of the residue in the entire dataset, and fbin was the frequency in a certain bin. Once all Pres,bin were calculated, they were fit using nonlinear least squares to a continuous function P(z).
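As an illustration of the propensity calculation, the following sketch counts residues in bins of |z| and divides observed by expected counts. It assumes non-overlapping 3 Å bins centered at 1.5, 4.5, 7.5 Å and so on, which is one reading of "3 Å bins, offset by 1.5 Å"; the function names and the toy input are hypothetical.

```python
from collections import Counter

BIN_WIDTH = 3.0   # Å, from the text
BIN_OFFSET = 1.5  # Å, from the text (assumed here to be the first bin center)

def bin_center(z):
    """Map a signed depth z (Å) to the center of its bin on |z|."""
    k = int(abs(z) // BIN_WIDTH)
    return k * BIN_WIDTH + BIN_OFFSET

def propensities(residues):
    """residues: iterable of (residue_name, z). Returns {(name, bin_center): P}."""
    pairs = [(name, bin_center(z)) for name, z in residues]
    n_tot = len(pairs)
    n_res = Counter(name for name, _ in pairs)
    n_bin = Counter(b for _, b in pairs)
    n_res_bin = Counter(pairs)
    result = {}
    for (name, b), observed in n_res_bin.items():
        f_res = n_res[name] / n_tot   # frequency of the residue in the dataset
        f_bin = n_bin[b] / n_tot      # frequency of the bin in the dataset
        result[(name, b)] = observed / (n_tot * f_res * f_bin)
    return result

# Toy input: both lysines fall in the 4.5 Å bin, one on each side of the bilayer.
toy = [("Lys", 4.0), ("Lys", -5.0), ("Leu", 1.0), ("Leu", -2.0)]
P = propensities(toy)
```

A propensity above 1 means the residue is enriched in that bin relative to a random distribution; below 1 means it is depleted.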

equation image(3)

ΔE(z) = E(∞) – E(z), the energetic cost of transferring an amino acid from solvent to a particular depth in the membrane; Paq was the propensity for the amino acid to partition in the aqueous phase. The propensities were fit using either a sigmoidal or Gaussian distribution. The sigmoidal form [Eq. (4)] represented the proclivity of an amino acid to partition into either the hydrophobic or aqueous phase.

equation image(4)

ΔE(0) was the transfer energy from water to the center of the bilayer: E(∞) − E(0). Other parameters included the depth at which transfer energy was half-maximal (zmid), and the steepness of transition (n). The physical interpretation of the parameters was described in the original Ezα potential derivation.32 zmid describes how deeply an amino acid prefers to localize in the membrane. n reflects how tightly the position of an amino acid on the surface of a TMB is coupled with the change in hydrophobicity from a polar aqueous to nonpolar environment.

The Gaussian form [Eq. (5)] modeled the tendency for certain amino acids to partition into the interfacial headgroup region of the bilayer. Key parameters included the energy ΔEmin at depth zmin and the width of the transition, σ. Parameters of the individual fits are presented in Table I.

equation image(5)
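The two functional forms can be explored numerically with the sketch below. The exact expressions are assumptions chosen to match the parameter descriptions in the text (a sigmoid that equals ΔE(0) at the center and half of it at zmid; a Gaussian that reaches ΔEmin at zmin with width σ; a Boltzmann-like link between propensity and energy). The value of kT is also assumed, since no temperature is stated. Parameter values are taken from Table I.

```python
import math

KT = 0.593  # kcal/mol near 298 K; assumed, the temperature is not stated in the text

def delta_e_sigmoid(z, de0, zmid, n):
    """Assumed sigmoidal form: de0 at z = 0 and de0 / 2 at |z| = zmid."""
    return de0 / (1.0 + (abs(z) / zmid) ** n)

def delta_e_gaussian(z, demin, zmin, sigma):
    """Assumed Gaussian form: reaches demin at |z| = zmin with width sigma."""
    return demin * math.exp(-((abs(z) - zmin) ** 2) / (2.0 * sigma ** 2))

def propensity(paq, delta_e):
    """Assumed Boltzmann-like link between propensity and insertion energy."""
    return paq * math.exp(-delta_e / KT)

# Table I values: Leu (sigmoidal) and Tyr (Gaussian).
leu_center = delta_e_sigmoid(0.0, de0=-2.0, zmid=17.0, n=2.9)
leu_mid = delta_e_sigmoid(17.0, de0=-2.0, zmid=17.0, n=2.9)
tyr_peak = propensity(0.3, delta_e_gaussian(9.3, demin=-1.3, zmin=9.3, sigma=3.6))
```

Under these assumptions the peak propensity computed for Tyr is close to the Pmin value of 2.7 quoted in the Results, which offers a rough consistency check on the assumed forms rather than a confirmation of them.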

These discrete propensities were then fit with a nonlinear continuous curve according to Eq. (3) [with ΔEzβ chosen according to either Eqs. (4) or (5)]. Each fit was described by a set of parameters per residue. A jackknife (leave-one-out) method was applied to the dataset of 35 TMBs to obtain standard error. Met and Cys were removed from the analysis due to low count on the TMB exteriors (68 Met and one Cys).
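A leave-one-out standard error can be sketched as follows. The code uses the standard jackknife formula and a generic estimator; in the setting described above each sample would be one TMB structure and the estimator would refit a parameter with that structure left out. The helper names are hypothetical, and the source does not state which variance formula was used.

```python
import math

def jackknife_se(samples, estimator):
    """Standard jackknife (leave-one-out) standard error of estimator(samples)."""
    n = len(samples)
    # Recompute the estimate n times, omitting one sample each time.
    loo = [estimator(samples[:i] + samples[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in loo))

def mean(values):
    return sum(values) / len(values)

# For the mean, the jackknife reproduces the usual standard error of the mean.
se = jackknife_se([1.0, 2.0, 3.0, 4.0], mean)
```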

Buried versus exposed distributions for TMHs and TMBs

Twenty-four TMH structures used in the original Ezα potential were obtained.32 For each secondary structural class of protein, the buried and exposed regions were determined by PDB and DSSP criteria. Residues with SASA73 greater than or equal to 20% of the maximal accessible surface area72 were considered exposed.

Orientation dependent energy calculation

Each protein was rigidly rotated about the x- and y-axes, coupled with translation along the z-axis. At each rigid rotation step of 10°, a total energy of insertion was calculated as the sum of individual residue Ezβ energies as a function of residue type and depth.
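The orientation scan can be sketched on a toy system. The energy function below uses Table I parameters for Leu and Lys with an assumed sigmoidal form; the toy coordinates, the angular range, and all function names are hypothetical.

```python
import math

# Table I values (dE0 in kcal/mol, zmid in Å, n). The functional form
# dE0 / (1 + (|z| / zmid)^n) is an assumption, not quoted from the text.
PARAMS = {"Leu": (-2.0, 17.0, 2.9), "Lys": (1.3, 14.0, 3.7)}

def residue_energy(name, z):
    de0, zmid, n = PARAMS[name]
    return de0 / (1.0 + (abs(z) / zmid) ** n)

def transform(coord, ax_deg, ay_deg, dz):
    """Rotate about x, then about y, then translate along z."""
    x, y, z = coord
    ax, ay = math.radians(ax_deg), math.radians(ay_deg)
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    x, z = x * math.cos(ay) + z * math.sin(ay), -x * math.sin(ay) + z * math.cos(ay)
    return (x, y, z + dz)

def total_energy(residues, ax_deg, ay_deg, dz=0.0):
    """Sum of per-residue energies at the transformed depths."""
    return sum(
        residue_energy(name, transform(c, ax_deg, ay_deg, dz)[2])
        for name, c in residues
    )

def scan(residues, step_deg=10, max_deg=60, dz=0.0):
    """Return {(ax, ay): energy} over a grid of rotations in 10 degree steps."""
    angles = range(-max_deg, max_deg + 1, step_deg)
    return {(ax, ay): total_energy(residues, ax, ay, dz) for ax in angles for ay in angles}

# Toy "barrel": leucines at the bilayer center, lysines outside the membrane.
toy = [("Leu", (15.0, 0.0, 0.0)), ("Leu", (-15.0, 0.0, 0.0)),
       ("Lys", (0.0, 0.0, 25.0)), ("Lys", (0.0, 0.0, -25.0))]
landscape = scan(toy)
best = min(landscape, key=landscape.get)
```

In this toy case any tilt moves the leucines away from the center and the lysines toward it, so the untilted orientation is the single minimum of the landscape.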

Mapping protein–protein interaction sites

The ΔEz of each residue in the TMB was normalized to the range [0,1] with respect to its ΔEmin. Residues were colored by the normalized insertion energy using a red–white–blue scheme.74 Amino acids at the oligomerization interface were defined as those that have a minimum 40% change in SASA between monomer and oligomer as determined by calculated DSSP values.

Ezβ Moments

For each protein, residues having at least 20% SASA exposed in the monomer are determined by DSSP. The residues should not be interior-facing; this condition was checked manually. To calculate the moment, start with the top view of the protein, looking through the barrel axis. The Cα atoms of these residues are associated with (1) position vectors from a central point (see next section for calculating this point) that are projected onto the membrane then normalized and (2) corresponding Ezβ energies which are normalized to a range of one's choice. For example, the least favorable energy can be assigned a value of 0 and the best energy 1. The final Ezβ moment is the inner product between the Ezβ energies and the normalized position vectors across all of the selected residues.

Choosing the appropriate point of central tendency for calculating the Ezβ moment

To calculate a total Ezβ moment, a central point must first be picked to assign position vectors to each Cα atom of the included residues. To also guarantee that the direction of the moment is invariant given an arbitrary Ezβ range (min and max energy values), this central point must have the property that if all the energies are equal, the total moment is the zero vector. This same point has the property of minimizing the sum of distances to these points.75 Such a central point is called the geometric median or Fermat point. Determining the geometric median of these proteins requires an iterative algorithm based on the work of Weiszfeld,76 and this computation is implemented by a Python script written and maintained by Daniel J. Lewis of UCL Geography.77

From an aligned structure, residues included for moment calculation form a collection of n Cα position vectors projected onto the membrane plane, $\vec{r}_1, \ldots, \vec{r}_n$, each paired with associated Ez energies $E_1, \ldots, E_n$. Each unit position vector can be denoted as $\vec{s}_i = (\vec{r}_i - \vec{c}) / \lVert \vec{r}_i - \vec{c} \rVert$, where $\vec{c}$ is the geometric median obtained by Weiszfeld's algorithm using the chosen Cα's.76 The total Ezβ moment is the inner product $\sum_{i=1}^{n} E_i \, \vec{s}_i$.

By this choice of geometric median as $\vec{c}$, we have satisfied the following criteria:

  • 1. $\sum_{i=1}^{n} \vec{s}_i = \vec{0}$.
  • 2. The direction of the moment is invariant to any linear transformation of the ΔEz range (min and max): $\sum_{i} (a E_i + b) \, \vec{s}_i = a \sum_{i} E_i \, \vec{s}_i$.
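The two criteria can be checked numerically with a minimal sketch of the Weiszfeld iteration and the moment sum. This is not the script cited above; the function names, the tolerance, and the toy coordinates are hypothetical, and points that coincide with the current estimate are simply skipped.

```python
import math

def geometric_median(points, iters=500, eps=1e-10):
    """Weiszfeld iteration for 2D points, started from the centroid."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    for _ in range(iters):
        wsum = wx = wy = 0.0
        for x, y in points:
            d = math.hypot(x - cx, y - cy)
            if d < eps:  # skip a point sitting on the current estimate
                continue
            wsum += 1.0 / d
            wx += x / d
            wy += y / d
        if wsum == 0.0:
            break
        nx, ny = wx / wsum, wy / wsum
        moved = math.hypot(nx - cx, ny - cy)
        cx, cy = nx, ny
        if moved < eps:
            break
    return (cx, cy)

def ez_moment(points, energies):
    """Sum of energy-weighted unit vectors from the geometric median."""
    cx, cy = geometric_median(points)
    mx = my = 0.0
    for (x, y), e in zip(points, energies):
        d = math.hypot(x - cx, y - cy)
        if d == 0.0:
            continue
        mx += e * (x - cx) / d
        my += e * (y - cy) / d
    return (mx, my)

# Toy projected Cα positions (Å) and energies (arbitrary units).
pts = [(10.0, 0.0), (0.0, 8.0), (-7.0, 1.0), (2.0, -9.0), (5.0, 5.0)]
energies = [0.2, 1.0, 0.5, 0.1, 0.9]
flat = ez_moment(pts, [1.0] * len(pts))               # criterion 1
m1 = ez_moment(pts, energies)
m2 = ez_moment(pts, [2.0 * e + 3.0 for e in energies])  # criterion 2
```

With equal energies the moment vanishes to within the convergence tolerance, and rescaling the energies as aE + b scales the moment by a without changing its direction.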


The authors thank Dr. James Stapleton for critical reading of the manuscript.