A mass weighted chemical elastic network model elucidates closed form domain motions in proteins



Abstract

An elastic network model (ENM), usually a Cα coarse-grained one, has been widely used to study protein dynamics as an alternative to classical molecular dynamics simulation. This simple approach dramatically reduces computational cost, but it sometimes fails to describe a feasible conformational change because of unrealistically excessive spring connections. To overcome this limitation, we propose a mass-weighted chemical elastic network model (MWCENM) in which the total mass of each residue is assumed to be concentrated on the representative alpha carbon atom and various stiffness values are precisely assigned according to the types of chemical interactions. We test MWCENM on several well-known proteins for which both closed and open conformations are available, as well as on three α-helix-rich proteins. Normal mode analysis of these proteins reveals that MWCENM not only generates more plausible conformational changes, especially for closed forms of proteins, but also preserves protein secondary structures, thus distinguishing MWCENM from traditional ENMs. In addition, MWCENM reduces computational burden by using a sparser stiffness matrix.

Introduction

Most proteins undergo conformational changes, which are closely related to their specific biological functions, such as catalysis and regulation.1, 2 Therefore, in past decades, a number of experimental and theoretical approaches have been proposed to understand the functional dynamics of proteins. Various experimental techniques, including cryo-EM, X-ray crystallography, and NMR, have succeeded in determining protein structures at the atomic level. Although these structures provide good starting points for molecular dynamics simulations and the elucidation of protein dynamics,3–6 limitations in simulation time scale, data size, and computational cost remain.7, 8

Elastic network model (ENM) based normal mode analysis (NMA) was proposed as an alternative method better suited to studying collective motions in macromolecules.9–12 In ENM, the system is represented as a network of virtual springs connecting point masses; the point masses represent the protein residues, and the springs represent their interactions.13–17 To reduce computational cost, this coarse-grained protein model adopts a simplified Hookean potential instead of an all-atom empirical potential function.18, 19 Moreover, various types of connection rules have been proposed to capture biologically relevant collective modes.

The most common and simplest method is the distance-cutoff rule. Both empirical and theoretical studies have suggested that the cutoff should be at least 11 Å to guarantee system stability.19–21 However, this traditional method sometimes fails to capture biologically important functional motions in the low-frequency normal modes, especially for proteins in closed forms. This failure results from the discrepancy between the virtual spring connections in ENM and the actual chemical interactions in native protein structures. For example, for the closed form of lactoferrin, the distance-cutoff method does not show the expected “tweezers” movement because of unrealistic constraints between the two closed lobes.22 Moreover, the rigidity of protein structures is not carefully considered, because the traditional connection rule for ENM uses a uniform spring constant for all types of interactions. In reality, protein secondary structures are thought to behave as rigid bodies under thermal fluctuation because of their relatively strong covalent and hydrogen bond connections.23–25
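For concreteness, the distance-cutoff rule can be sketched as follows. This is a minimal Python/NumPy illustration, not the Matlab code used in this study; the function name, the uniform spring constant gamma = 1, and the ideal-helix test geometry are assumptions chosen for the example. Every pair of Cα atoms closer than the cutoff receives an identical spring, and the Hessian is assembled from the standard 3 × 3 super-elements of the Hookean ENM potential.

```python
import numpy as np

def traditional_enm_hessian(coords, cutoff=11.0, gamma=1.0):
    """3N x 3N Hessian of a C-alpha ENM in which every pair of residues closer
    than `cutoff` (angstroms) is joined by a spring of uniform stiffness gamma."""
    x = np.asarray(coords, dtype=float)
    n = len(x)
    hess = np.zeros((3 * n, 3 * n))
    n_springs = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = x[j] - x[i]
            dist2 = d @ d
            if dist2 >= cutoff ** 2:
                continue                          # no spring beyond the cutoff
            n_springs += 1
            block = -gamma * np.outer(d, d) / dist2   # off-diagonal super-element
            hess[3*i:3*i+3, 3*j:3*j+3] = block
            hess[3*j:3*j+3, 3*i:3*i+3] = block
            hess[3*i:3*i+3, 3*i:3*i+3] -= block       # diagonal = -sum of off-diagonals
            hess[3*j:3*j+3, 3*j:3*j+3] -= block
    return hess, n_springs

# Hypothetical test geometry: C-alpha trace of an ideal alpha helix
# (radius 2.3 A, rise 1.5 A, 100 degrees per residue).
t = np.arange(30)
helix = np.column_stack([2.3 * np.cos(np.radians(100 * t)),
                         2.3 * np.sin(np.radians(100 * t)),
                         1.5 * t])
H, n_springs = traditional_enm_hessian(helix, cutoff=11.0)
print(n_springs, H.shape)
```

Because every pair inside the cutoff receives the same gamma, residues facing each other across a closed cleft are tied together as stiffly as sequence neighbors, which is the kind of excess connectivity discussed above.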

To improve the accuracy of stiffness and connectivity, a chemical-bond based connection rule was proposed, in which stiffness values are assigned according to the type of chemical bond, including disulfide bonds, covalent bonds, hydrogen bonds, salt bridges, and van der Waals interactions.20 In this model, one can reduce the computation time by using smaller and more realistic distance-cutoff values of less than 8 Å without loss of generality. Figure 1 compares traditional distance-cutoff based ENM with chemical-bond based ENM. In both cases, the orange spheres and lines represent the alpha carbons and their interactions, respectively.

Figure 1.

Schematic of coarse-grained ENMs using the traditional distance-cutoff method (left) and chemical bond information (right). The representative alpha carbons are shown as orange spheres. In traditional ENM, the interactions within the cutoff distance of 11 Å are shown as blue solid lines. In chemical bond based ENM, various types of lines depict the various chemical interactions. Black, green, cyan, and yellow solid lines, and blue dotted lines represent backbone, hydrogen bonds, ionic bonds, disulfide bonds, and van der Waals interactions, respectively. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

In this article, we propose a more elaborate connection rule that represents not only chemical interactions but also the inertial effect, by assuming that the entire mass of an amino acid is concentrated on its alpha carbon atom. This mass-weighted chemical ENM, called MWCENM, maintains a sparse stiffness (Hessian) matrix and enables us to analyze protein dynamics more precisely. To verify the proposed methodology, we compute overlap values between normal modes and conformational changes of several proteins for which both open and closed conformations are available. Moreover, three alpha-helix-rich proteins are tested to compare torsion angle distortion between the traditional distance-cutoff method and the proposed MWCENM. Lastly, the computation times of both approaches are compared.

Results

To verify the proposed method, we determined the overlap values for 10 closed-form proteins whose clearly distinct open forms are also available (Table I). Here, we discuss three of these closed-form proteins in detail: SARS protease,26, 27 threonyl-tRNA synthetase,28 and lactoferrin.29 We also measured the torsion angle changes in three alpha-helix-rich proteins: myoglobin,30 calmodulin,31 and human UMP/CMP kinase.32 These proteins were selected from the Macromolecular Movements Database.33, 34 The overlap tests evaluate how precisely the proposed method captures the conformational changes observed in proteins, especially for the closed forms, for which traditional connection rules have often failed to capture conformational changes.35, 36 A torsion angle measurement for alpha-helix-rich proteins shows quantitatively whether protein secondary structures are preserved in MWCENM-based NMA.

Table I. Overlap Value for Tested Proteins

| Protein name | PDB code^a | MWCENM CSO | MWCENM Overlap (mode) | Traditional ENM CSO | Traditional ENM Overlap (mode) |
|---|---|---|---|---|---|
| SARS protease | 1UK4 | 0.666 | 0.353 (8) | 0.537 | 0.284 (8) |
| | 2A5A | 0.510 | 0.306 (9) | 0.532 | 0.310 (8) |
| Threonyl-tRNA synthetase | 1EVL | 0.851 | 0.555 (16) | 0.630 | 0.293 (18) |
| | 1EVK | 0.879 | 0.550 (10) | 0.799 | 0.387 (13) |
| Lactoferrin | 1LFG | 0.890 | 0.636 (7) | 0.866 | 0.543 (7) |
| | 1LFH | 0.919 | 0.630 (9) | 0.922 | 0.519 (7) |
| Guanylate kinase | 1EX7 | 0.970 | 0.919 (7) | 0.917 | 0.770 (7) |
| | 1EX6 | 0.968 | 0.882 (7) | 0.944 | 0.864 (7) |
| Sucrose phosphatase (SPP) | 1TJ5 | 0.864 | 0.644 (10) | 0.881 | 0.664 (9) |
| | 1S2O | 0.960 | 0.903 (7) | 0.971 | 0.937 (7) |
| LAO binding protein | 1LST | 0.907 | 0.446 (15) | 0.862 | 0.576 (9) |
| | 2LAO | 0.970 | 0.946 (7) | 0.964 | 0.819 (7) |
| α-Ketoglutarate dioxygenase | 1GY9 | 0.922 | 0.798 (7) | 0.885 | 0.756 (8) |
| | 1OTJ | 0.951 | 0.884 (7) | 0.958 | 0.826 (7) |
| Adenylate kinase | 1AKY | 0.634 | 0.377 (10) | 0.620 | 0.464 (9) |
| | 1DVR | 0.949 | 0.792 (8) | 0.948 | 0.730 (8) |
| Diphtheria toxin | 1MDT | 0.642 | 0.352 (9) | 0.588 | 0.358 (10) |
| | 1DDT | 0.710 | 0.409 (8) | 0.723 | 0.562 (8) |
| CBL | 2CBL | 0.911 | 0.744 (7) | 0.855 | 0.653 (9) |
| | 1B47 | 0.921 | 0.701 (7) | 0.901 | 0.655 (9) |

^a For each protein, the first and the second PDB codes represent closed and open forms, respectively.

All structural information was obtained from the Protein Data Bank,37 and MWCENM-based NMA was performed using Matlab. All protein images were generated with Visual Molecular Dynamics.38 Matlab code for both MWCENM and traditional ENM was taken from the online morph server KOSMOS (http://bioengineering.ac.kr/kosmos).39 NMA with various simulation options can be requested from KOSMOS, and the simulation results can be downloaded and visualized using a 3D interactive viewer.

Figure 2 compares the NMA results for the closed-form proteins by plotting the CSO (see Methods section) over the 20 lowest-frequency modes. CSO results from MWCENM are shown as a blue solid line, and those from traditional ENM with a distance cutoff of 11 Å as a red dashed line. The CSO values of the first six modes are zero because these modes correspond to rigid-body motions.

Figure 2.

The CSO values of closed-form proteins using traditional ENM (red dashed line with open circles) and MWCENM (blue solid line with closed rectangles). The CSO values of the first six modes are always zero because these modes represent rigid-body motions.

MWCENM gives CSO values similar to or higher than those of traditional ENM. For SARS protease, the CSO value from MWCENM is 0.666, whereas that from traditional ENM is only 0.537 (Table I). Both methods show their largest increase in CSO at the 8th mode, and MWCENM shows two further increases, at the 10th and 18th modes. Figure 2(B) shows a more impressive result for threonyl-tRNA synthetase. Both methods show similar CSO profiles up to the 16th mode, where MWCENM shows its largest increase in CSO. When the 20 lowest modes are accumulated, MWCENM ultimately reaches a CSO of 0.85, compared with 0.63 for traditional ENM. Because lower-frequency modes are closely correlated with more significant functional motions, these high overlap values strongly indicate the better simulation accuracy of the proposed MWCENM.36, 40

We also tested the closed form of lactoferrin [Fig. 2(C)]. Unlike for SARS protease and threonyl-tRNA synthetase, traditional ENM and MWCENM produced nearly the same CSO distribution. Both show the same bending motion at the 7th mode, which represents one of the most significant functional motions of lactoferrin; however, another peak, at the 12th mode, involves remarkably different vibrational behavior depending on the type of ENM.

Figure 3 illustrates the 10th mode shape from traditional ENM and the 12th mode shape from MWCENM. For comparison, we also illustrate the conformational change from the closed form to the open form, which is represented by a combination of two large collective motions: a bending motion between the head and the two lobes, and a relative scissoring motion between the two lobes. Figure 3(B) shows an additional bending motion generated by the 10th mode of traditional ENM. If the first bending motion, which occurs at the 7th mode, is approximated as a half sine wave with a hinge point in the middle, this second bending motion approximates a full sine wave in which the upper head and the lower lobes bend in opposite directions. In contrast, MWCENM captures the scissoring motion between the two lobes at the 12th mode [Fig. 3(C), Supporting Information]. This result not only correlates well with the significant conformational change in lactoferrin shown in Figure 3(A), but also agrees with previous studies.17, 22, 29 From these results, we conclude that MWCENM better captures the intrinsic dynamics of proteins in closed form because it contains detailed information about mass weights and chemical interactions and properly modulates the number of spring connections. Excessive spring connections have often been identified as a limitation of traditional ENM.

Figure 3.

Comparison of important normal mode shapes of the closed form of lactoferrin. N, S1, and S2 denote the three domains of lactoferrin: head (green), left lobe (yellow), and right lobe (orange), respectively. The red arrows indicate the direction vector at each residue. (A) The conformational change vector from the closed form to the open form. (B) The 10th normal mode vectors from traditional ENM. (C) The 12th normal mode vectors from MWCENM.

Additionally, we tested seven other proteins that also have both open and closed conformations. The CSO results for all 10 proteins are listed in Table I. For the closed forms, MWCENM gives higher CSO values than traditional ENM for nine of the ten proteins, the exception being sucrose phosphatase (1TJ5). Moreover, most closed-form proteins reach higher overlap values at lower modes when MWCENM is applied. In contrast, the CSO values for the open forms from MWCENM are not markedly higher than, but remain similar to, those of traditional ENM. This quantitative study strongly suggests that MWCENM is one of the best choices for protein modeling, regardless of the type of protein conformation.

We also measured torsion angle changes in alpha-helix-rich proteins to test how well MWCENM preserves secondary structures in NMA. The first five nonrigid-body modes were used to calculate torsion angle changes, and the results are depicted in Figure 4. The torsion angle changes for each mode are plotted as a line of a different color, and alpha-helical regions are marked with gold lines in the middle. Figure 4(A) shows the case of myoglobin. The torsion angle fluctuations from MWCENM clearly distinguish alpha-helical regions from the rest of the protein: most high peaks lie outside the alpha-helical regions. In contrast, alpha-helical regions are not easily identifiable from traditional ENM because of large and disordered torsion angle variations. A comparison of mode shapes between MWCENM and traditional ENM also supports these results (Supporting Information). In the 7th mode from traditional ENM, the alpha helices of myoglobin fail to preserve their original conformations, showing bending or twisting motions, whereas all alpha helices in MWCENM either translate or rotate like rigid bodies. An interactive view is available in the electronic version of the article.

Figure 4.

Comparison of traditional ENM (left) with MWCENM (right) in terms of torsion angle fluctuation caused by the first five nonrigid-body modes. The gold lines located in the middle of each plot indicate alpha helical regions. (A) Myoglobin (PDB: 101M), (B) Calmodulin (PDB: 1CLL), and (C) Human UMP/CMP kinase (PDB: 1TEV).

Calmodulin provides a more vivid comparison between traditional ENM and the proposed MWCENM [Fig. 4(B)]. Among the many alpha helices in calmodulin, we focus on the longest one, comprising residues 65–92. Every MWCENM mode produces torsion angle changes close to zero in this helix, whereas the corresponding modes of traditional ENM show higher values. Similar results are observed for human UMP/CMP kinase [Fig. 4(C)].

Consequently, we note that NMA based on traditional ENM fails to preserve secondary structures in proteins. In contrast, the proposed MWCENM preserves the rigidity of these structures and enables us to identify secondary structural regions by measuring torsion angle changes. MWCENM performs well here because it replaces many unrealistic distance-based virtual springs of traditional ENM with the realistic hydrogen bonds found in alpha-helical regions.

To compare the computational complexity of traditional ENM and MWCENM, we measured the computation time required by each method. MWCENM usually requires less computation time than traditional ENM because it reduces the number of spring connections on the basis of chemical information. Figure 5 shows the results of the computational cost analysis for both methods. The density of each elastic network is indicated by the number of connections in the linking matrix, and the time required to perform NMA on each protein structure is shown on top of each bar (in seconds). As predicted, MWCENM creates a much sparser elastic network and thus requires less computation time than traditional ENM. Additionally, the sparseness of the resulting network increases with protein size. All calculations were performed on a PC with a 2.33 GHz Intel Core 2 Quad CPU and 4.00 GB of RAM.
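The link between network density and cost can be illustrated with a small sketch. Each spring contributes two nonzero off-diagonal 3 × 3 blocks to the Hessian and each residue one diagonal block, so the fraction of nonzero blocks is (N + 2S)/N² for N residues and S springs (assuming every residue has at least one spring and no pair is counted twice). The helper names and the ideal-helix geometry are hypothetical, and the example only contrasts an 11 Å cutoff with an 8 Å cutoff; it does not reproduce the chemical connection rule itself.

```python
import math

def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Hypothetical test geometry: C-alpha trace of an ideal alpha helix."""
    return [(radius * math.cos(math.radians(turn_deg * i)),
             radius * math.sin(math.radians(turn_deg * i)),
             rise * i) for i in range(n)]

def contact_pairs(ca, cutoff):
    """All residue pairs (i < j) whose C-alpha distance is below `cutoff`."""
    return [(i, j) for i in range(len(ca)) for j in range(i + 1, len(ca))
            if math.dist(ca[i], ca[j]) < cutoff]

def block_density(n_residues, n_springs):
    """Fraction of nonzero 3x3 blocks in the 3N x 3N Hessian:
    N diagonal blocks plus two off-diagonal blocks per spring."""
    return (n_residues + 2 * n_springs) / n_residues ** 2

ca = ideal_helix(100)
for cutoff in (11.0, 8.0):
    s = len(contact_pairs(ca, cutoff))
    print(cutoff, s, round(block_density(len(ca), s), 3))
```

How much of this sparsity turns into saved time depends on the solver: iterative sparse eigensolvers that compute only the lowest modes exploit it, whereas a dense eigensolver costs the same regardless of how many entries are zero.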

Figure 5.

Elastic network density comparison between traditional ENM (white bars) and MWCENM (black bars). The number of residues in each protein is given in parentheses. The elapsed time for NMA is displayed on top of each bar.

Discussion

In traditional alpha-carbon coarse-grained ENM, the various existing chemical interactions, including covalent bonds, hydrogen bonds, ionic bonds, disulfide bonds, and van der Waals interactions, are oversimplified by a distance-based connection rule with a uniform spring constant. Therefore, NMA based on traditional ENM has limitations in describing realistic conformational changes of proteins. For example, NMA of closed-form proteins often fails to capture their intrinsic functional motions, such as opening modes, because of unrealistically excessive spring connections among proximal regions. In addition, traditional ENM does not adequately represent the rigidity of secondary structural elements, such as alpha helices, in NMA. To overcome these problems, MWCENM is proposed in this article. This method achieves both simulation accuracy and computational efficiency by optimizing the elastic network with stiffness values that depend on the type of chemical interaction and by treating the inertia of each amino acid as a lumped mass at the representative alpha carbon atom.

To validate the proposed method, several case studies were performed. First, overlap values for closed-form proteins were investigated. MWCENM mostly shows higher CSO values than traditional ENM and captures functionally important opening modes that are rarely observed in traditional ENM. Second, the torsion angle fluctuation, which is highly correlated with the rigidity of a protein structure, was measured in alpha-helix-rich proteins; MWCENM preserves alpha-helical structures better than traditional ENM. Third, we compared computational complexity in terms of computation time and elastic network density. The optimized connectivity of MWCENM dramatically reduces computation time, and the reduction grows with protein size. Consequently, the proposed MWCENM enables us to study protein dynamics more rapidly and more accurately by adopting chemically informed spring connection rules.

Methods

MWCENM with NMA

In MWCENM, the elastic network is constructed in two steps, called backbone modeling and spatial interaction modeling. In backbone modeling, each alpha carbon i is connected by virtual springs to the alpha carbons i + 1, i + 2, and i + 3, so that every four consecutive atoms along the backbone are mutually linked. Because these constraints in Cartesian space are equivalent to a representation in 3N − 6 internal coordinates (N − 1 bond lengths, N − 2 bond angles, and N − 3 torsion angles), they stabilize the stiffness matrix, leaving only six zero eigenvalues, which correspond to rigid-body motions. Then, additional springs representing nonsequential but spatially close interactions, such as disulfide bonds, hydrogen bonds, salt bridges, and van der Waals interactions, are added to the network model. Backbone modeling is required to ensure system stability in MWCENM, whereas spatial interaction modeling is required to improve the accuracy of the simulation model.
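The backbone-modeling step can be sketched in a few lines of Python. The function name and 0-based residue indexing are illustrative assumptions; the stiffness ratios follow Table II (100 for the covalent i, i + 1 pair and 1 for the i, i + 2 and i, i + 3 pairs). The count check mirrors the argument above: (N − 1) + (N − 2) + (N − 3) = 3N − 6 springs, one per internal coordinate.

```python
def backbone_springs(n_residues, k_covalent=100.0, k_nonbonded=1.0):
    """Backbone modeling: connect C-alpha i to i+1 (covalent) and to i+2, i+3
    (nonbonded). Returns (i, j, k_ij) tuples with 0-based residue indices."""
    springs = []
    for i in range(n_residues):
        for offset in (1, 2, 3):
            j = i + offset
            if j < n_residues:
                k = k_covalent if offset == 1 else k_nonbonded
                springs.append((i, j, k))
    return springs

n = 50
springs = backbone_springs(n)
# N-1 bond lengths + N-2 bond angles + N-3 torsion angles = 3N-6 constraints
print(len(springs), 3 * n - 6)   # 144 144
```

Assuming no four consecutive Cα atoms are coplanar, these 3N − 6 springs form a minimally rigid framework (each new atom is braced to three earlier ones), which is why only the six rigid-body eigenvalues vanish.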

The order of magnitude of each spring's stiffness is assigned on the basis of the order of magnitude of the average bonding energy of the corresponding chemical interaction.41, 42 First, the strongest bonds, such as backbone covalent bonds and disulfide bonds, are easily modeled because a PDB file includes the information for these bonds. Second, hydrogen bonds are modeled using the HBPLUS program, which automatically generates hydrogen bond information from the topology of the given protein structure.43 Third, salt bridges are created between every pair of oppositely charged amino acids whose cation and anion lie within 4 Å of each other.44 Finally, van der Waals interactions are added between all pairs of representative atoms within 8 Å, with stiffness depending on their distance, because these nonbonded interactions are relatively weak and follow a Lennard–Jones potential profile whose reliable energy range extends up to 8 Å.45 Table II summarizes the stiffness values used in MWCENM.

Table II. Various Stiffness Values in MWCENM

| Connection type | Stiffness ratio | Cutoff condition |
|---|---|---|
| Backbone (covalent) | 100 | Residue number (between i and i + 1) |
| Backbone (nonbonded) | 1 | Residue number (between i and i + 2 or i + 3) |
| Disulfide bond | 100 | PDB information |
| Hydrogen bond^a | 10 | HBPLUS |
| Salt bridge | 10 | Distance between charged residues < 4 Å |
| Van der Waals force^b | 1 | Nonbonded distance < 4 Å |
| | [equation image; see note b] | 4 Å ≤ nonbonded distance, dij < 8 Å |

^a HBPLUS is a program used to calculate all possible hydrogen bonds within a protein.
^b The stiffness ratio for the second range of the van der Waals interaction is fitted using only the attractive term of the Lennard–Jones potential.
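A sketch of the spatial-interaction step follows, under several assumptions that are ours, not the paper's: disulfide and hydrogen-bond partners are supplied as residue-index pairs (for example, parsed from SSBOND records and from HBPLUS output; the parsing is not shown), salt bridges are detected from caller-supplied charged side-chain atoms, van der Waals springs use Cα distances for pairs more than three residues apart, and a pair linked by several interactions keeps only its stiffest spring. The 4–8 Å van der Waals stiffness of Table II is not recoverable here, so it is passed in as a hypothetical callable `far_vdw_stiffness`; the constant used in the usage example is a placeholder, not the paper's value.

```python
import math

# Stiffness ratios taken from Table II.
K_DISULFIDE, K_HBOND, K_SALT_BRIDGE, K_VDW_NEAR = 100.0, 10.0, 10.0, 1.0

def salt_bridge_pairs(charged_atoms, cutoff=4.0):
    """charged_atoms: list of (residue_index, sign, xyz), sign = +1 for a cationic
    and -1 for an anionic side-chain atom. Returns residue pairs (i < j) with an
    oppositely charged atom pair closer than `cutoff` angstroms."""
    pairs = set()
    for a, (ri, si, xi) in enumerate(charged_atoms):
        for rj, sj, xj in charged_atoms[a + 1:]:
            if si * sj < 0 and ri != rj and math.dist(xi, xj) < cutoff:
                pairs.add((min(ri, rj), max(ri, rj)))
    return pairs

def spatial_springs(ca, disulfides, hbonds, salt_bridges, far_vdw_stiffness):
    """Spatial interaction modeling: return {(i, j): k_ij} with i < j.
    Assumption: a pair linked by several interactions keeps its stiffest spring."""
    springs = {}

    def add(i, j, k):
        key = (min(i, j), max(i, j))
        springs[key] = max(k, springs.get(key, 0.0))

    for i, j in disulfides:            # e.g., from SSBOND records of the PDB file
        add(i, j, K_DISULFIDE)
    for i, j in hbonds:                # e.g., residue pairs parsed from HBPLUS output
        add(i, j, K_HBOND)
    for i, j in salt_bridges:
        add(i, j, K_SALT_BRIDGE)
    for i in range(len(ca)):
        for j in range(i + 4, len(ca)):    # |i - j| <= 3 is covered by the backbone
            d = math.dist(ca[i], ca[j])
            if d < 4.0:
                add(i, j, K_VDW_NEAR)
            elif d < 8.0:
                add(i, j, far_vdw_stiffness(d))   # Table II, 4 A <= d < 8 A
    return springs

# Toy coordinates (angstroms), not a real protein.
ca = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0), (0, 3.5, 0), (0, 0, 5.0)]
charged = [(1, +1, (3.8, 1.0, 0.0)), (5, -1, (3.8, 1.0, 3.0))]
bridges = salt_bridge_pairs(charged)
springs = spatial_springs(ca, disulfides=[], hbonds=[(0, 4)], salt_bridges=bridges,
                          far_vdw_stiffness=lambda d: 0.5)   # placeholder value only
print(springs)
```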

For NMA, the equation of motion is derived from Lagrangian mechanics:

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\delta}_i}\right) - \frac{\partial L}{\partial \delta_i} = 0 \tag{1}$$

where L = T − V, and T and V denote the kinetic energy and potential energy of the given MWCENM, respectively. δi is the displacement of the ith atom, that is, the ith three-component block of the generalized deviation vector δ ∈ ℝ^3N; physically, it is a small fluctuation from the initial position xi(0), so that xi(t) = xi(0) + δi(t). The total kinetic energy of a network of N point masses is the sum of the kinetic energies of all constituent atoms:

$$T = \frac{1}{2}\sum_{i=1}^{N} m_i\, \dot{\delta}_i^{T}\dot{\delta}_i \tag{2}$$

where mi is the lumped mass of the ith residue, which depends on its amino acid type. The total potential energy is

$$V = \frac{1}{2}\sum_{i<j} k_{i,j}\left(\left\| x_j(t) - x_i(t)\right\| - \left\| x_j(0) - x_i(0)\right\|\right)^{2} \tag{3}$$

where ki,j is the spring constant between the ith and jth atoms given in Table II (zero for unconnected pairs). Substituting these two energy terms into Eq. (1) yields the following equation of motion; its full derivation is available in Ref. 16:

$$M\ddot{\delta} + K\delta = 0 \tag{4}$$

where M is the global inertia matrix, a block-diagonal matrix whose diagonal sub-matrices Mi,i = mi I3 each contain the lumped mass mi. Likewise, K is the global stiffness matrix composed of sub-stiffness matrices Ki,j, which are derived from the potential energy and defined by

$$K_{i,j} = -\frac{k_{i,j}}{\left\|x_{ij}(0)\right\|^{2}}\, x_{ij}(0)\, x_{ij}(0)^{T} \quad (i \neq j), \qquad K_{i,i} = -\sum_{j \neq i} K_{i,j}, \qquad x_{ij}(0) = x_j(0) - x_i(0) \tag{5}$$
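The super-element form of Eq. (5) and the spring energy of Eq. (3) can be cross-checked numerically. The sketch below (Python/NumPy; function names and the random test geometry are illustrative assumptions) assembles K from a list of (i, j, kij) springs, using Kij = −kij xij xijᵀ/|xij|² for i ≠ j and Kii = −Σj≠i Kij, and then verifies that for a small displacement εδ the exact spring energy approaches ½ε² δᵀKδ.

```python
import numpy as np

def stiffness_matrix(coords, springs):
    """Global stiffness matrix K (3N x 3N) of Eq. (5) for springs (i, j, k_ij)."""
    x0 = np.asarray(coords, dtype=float)
    K = np.zeros((3 * len(x0), 3 * len(x0)))
    for i, j, k in springs:
        d = x0[j] - x0[i]
        block = -k * np.outer(d, d) / (d @ d)    # off-diagonal super-element K_ij
        K[3*i:3*i+3, 3*j:3*j+3] += block
        K[3*j:3*j+3, 3*i:3*i+3] += block
        K[3*i:3*i+3, 3*i:3*i+3] -= block         # K_ii = -sum_{j != i} K_ij
        K[3*j:3*j+3, 3*j:3*j+3] -= block
    return K

def spring_energy(coords0, coords, springs):
    """Potential energy of Eq. (3) for a deformed configuration `coords`."""
    x0 = np.asarray(coords0, dtype=float)
    x = np.asarray(coords, dtype=float)
    return 0.5 * sum(k * (np.linalg.norm(x[j] - x[i]) - np.linalg.norm(x0[j] - x0[i])) ** 2
                     for i, j, k in springs)

# Check: for a small displacement eps*delta, V is close to 0.5 * eps^2 * delta^T K delta.
rng = np.random.default_rng(0)
x0 = rng.normal(scale=5.0, size=(5, 3))
springs = [(0, 1, 100.0), (1, 2, 100.0), (0, 2, 1.0), (2, 3, 10.0), (1, 4, 1.0), (3, 4, 100.0)]
K = stiffness_matrix(x0, springs)
delta = rng.normal(size=15)
eps = 1e-5
exact = spring_energy(x0, x0 + eps * delta.reshape(5, 3), springs)
quadratic = 0.5 * eps ** 2 * delta @ K @ delta
print(exact, quadratic)
```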

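Substituting $\delta = M^{-1/2}\tilde{\delta}$ into Eq. (4) and premultiplying by $M^{-1/2}$ yields the equation of motion with the mass-weighted stiffness matrix $\tilde{K}$:

$$\ddot{\tilde{\delta}} + \tilde{K}\tilde{\delta} = 0, \qquad \tilde{K} = M^{-1/2} K M^{-1/2} \tag{6}$$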

Once NMA is performed with respect to the transformed vector in Eq. (6), the resulting eigenvectors must be transformed back by premultiplying them by M−1/2. This process yields both the eigenvalues and the eigenvectors of the given MWCENM, which can be interpreted as squared vibration frequencies and the corresponding vibration modes, respectively.46
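The mass-weighting step of Eq. (6) and the back-transformation can be sketched as follows (Python/NumPy; an illustration, not the authors' Matlab implementation). Because M is diagonal, M^(−1/2) K M^(−1/2) reduces to dividing each entry Kab by √(ma mb). The usage example is a deliberately tiny two-residue system with a closed-form answer: one spring of stiffness k between masses m1 and m2 has a single nonzero eigenvalue, ω² = k/μ, with reduced mass μ = m1m2/(m1 + m2). The masses (approximate average residue masses of glycine and tryptophan, in Da) and k = 100 are illustrative values.

```python
import numpy as np

def mass_weighted_modes(K, residue_masses):
    """Normal modes of M d'' + K d = 0 via the mass-weighted matrix of Eq. (6).
    Returns eigenvalues (squared angular frequencies, ascending) and Cartesian
    mode shapes as columns, back-transformed by M^(-1/2)."""
    m = np.repeat(np.asarray(residue_masses, dtype=float), 3)  # x, y, z share m_i
    w = 1.0 / np.sqrt(m)                                       # diagonal of M^(-1/2)
    K_tilde = K * np.outer(w, w)                               # M^(-1/2) K M^(-1/2)
    eigvals, U = np.linalg.eigh(K_tilde)                       # symmetric eigensolver
    return eigvals, w[:, None] * U                             # delta = M^(-1/2) u

# Two residues joined by one spring along x.
k, m_gly, m_trp = 100.0, 57.05, 186.21
B = -k * np.diag([1.0, 0.0, 0.0])       # off-diagonal block for a bond along x
K = np.block([[-B, B], [B, -B]])
eigvals, modes = mass_weighted_modes(K, [m_gly, m_trp])
mu = m_gly * m_trp / (m_gly + m_trp)    # reduced mass
print(eigvals[-1], k / mu)              # stretching mode: omega^2 = k / mu
```

The returned mode shapes are M-orthonormal (Vᵀ M V = I) rather than orthonormal in the ordinary sense, which is the expected consequence of the back-transformation by M^(−1/2).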

Overlap of normal modes

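The overlap value is widely used to measure the similarity between the direction of a conformational change and a calculated normal mode of a given protein. Following Marques and Sanejouand,47 it is defined as

$$O_j = \frac{\left|\sum_{i=1}^{N} a_{ij}\cdot r_i\right|}{\left(\sum_{i=1}^{N}\left\|a_{ij}\right\|^{2}\sum_{i=1}^{N}\left\|r_i\right\|^{2}\right)^{1/2}} \tag{7}$$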

where Oj is the overlap value between the jth normal mode vector and the conformational change vector, aij is the displacement vector of the ith alpha carbon in the jth normal mode (eigenvector), and ri is the displacement vector of the ith alpha carbon between the two superimposed structures. A higher overlap value indicates higher modeling accuracy; an overlap value of 1 implies that the computed normal mode vector exactly captures the direction of the conformational change. To validate the proposed MWCENM, we also used the cumulative square overlap (CSO) of the first k modes, defined as

$$\mathrm{CSO}_k = \left(\sum_{j=1}^{k} O_j^{2}\right)^{1/2} \tag{8}$$

which quantitatively measures how well the first k modes collectively represent the conformational change of the given protein.10
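The overlap and CSO definitions translate directly into code. In this sketch (Python/NumPy; function names are illustrative), both the mode and the conformational change are flattened 3N vectors, and the change vector is assumed to come from two structures that have already been superimposed.

```python
import numpy as np

def overlap(mode, change):
    """Overlap O_j: |sum_i a_ij . r_i| / sqrt(sum_i |a_ij|^2 * sum_i |r_i|^2)."""
    a = np.ravel(mode)
    r = np.ravel(change)
    return float(abs(a @ r) / np.sqrt((a @ a) * (r @ r)))

def cso(modes, change, k):
    """Cumulative square overlap of the first k columns of `modes` (3N x M)."""
    return float(np.sqrt(sum(overlap(modes[:, j], change) ** 2 for j in range(k))))

# Toy example: an orthonormal mode set and a 'conformational change' vector.
modes = np.eye(6)
change = np.array([3.0, 4.0, 0.0, 0.0, 0.0, 0.0])
print([round(overlap(modes[:, j], change), 3) for j in range(3)])  # [0.6, 0.8, 0.0]
print(cso(modes, change, 1), cso(modes, change, 2))
```

The CSO is bounded by 1 only when the modes are mutually orthogonal in the ordinary Euclidean sense; modes back-transformed by M^(−1/2) are M-orthogonal instead, so checking this bound (or normalizing the modes) is a reasonable precaution when applying the formula to mass-weighted modes.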

Torsion angle

Torsion angle variation along the backbone numerically represents the 3D topology of a given protein. Protein rigidity (or flexibility) can therefore be measured easily by the torsion angle change during a local vibration or a large conformational transition. Figure 6 illustrates the torsion angle, defined as the angle between two planes, π1 and π2, each formed by three consecutive atoms among the four given atoms Ci−2, Ci−1, Ci, and Ci+1.

Figure 6.

Schematic of torsion angle definition. π1 is the plane defined by the first three atoms Ci−2, Ci−1, and Ci. Similarly, π2 is defined by the next three atoms Ci−1, Ci, and Ci+1. The torsion angle, θ, is determined by the angle between these two planes.
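The torsion angle of Figure 6 can be computed from four consecutive Cα positions. The sketch below (plain Python; helper names are illustrative) uses the common atan2 form of the dihedral angle, which returns a signed value between −180° and 180°. How per-residue changes are aggregated over modes for Figure 4 is not specified here, so `torsion_changes` simply reports the absolute, wrap-corrected change between two conformations as an assumption.

```python
import math

def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(c0, c1, c2, c3):
    """Signed angle (degrees) between plane pi1 = (c0, c1, c2) and
    plane pi2 = (c1, c2, c3), i.e. C(i-2), C(i-1), C(i), C(i+1)."""
    b0, b1, b2 = _sub(c1, c0), _sub(c2, c1), _sub(c3, c2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)       # normals of pi1 and pi2
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def torsion_changes(ca_before, ca_after):
    """Absolute change of every C-alpha torsion, wrapped into [0, 180] degrees."""
    changes = []
    for i in range(2, len(ca_before) - 1):
        t0 = torsion_angle(*ca_before[i - 2:i + 2])
        t1 = torsion_angle(*ca_after[i - 2:i + 2])
        changes.append(abs((t1 - t0 + 180.0) % 360.0 - 180.0))
    return changes

print(round(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 6))  # 90.0
# Hypothetical ideal-helix C-alpha trace; a rigid translation keeps all torsions.
helix = [(2.3 * math.cos(math.radians(100 * i)), 2.3 * math.sin(math.radians(100 * i)),
          1.5 * i) for i in range(12)]
shifted = [(x + 1.0, y - 2.0, z + 0.5) for x, y, z in helix]
print(max(torsion_changes(helix, shifted)))
```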
