Protein topology defined by the matrix of residue contacts has proved to be a fruitful basis for the study of protein dynamics. The widely implemented coarse-grained elastic network model of backbone fluctuations has been used to describe crystallographic temperature factors, allosteric couplings, and some aspects of the folding pathway. In the present study, we develop a model of protein dynamics based on the classical equations of motion of a damped network model (DNM) that describes the folding path from a completely unfolded state to the native conformation through a single-well potential derived purely from the native conformation. The kinetic energy gained through the collapse of the protein chain is dissipated through a friction term in the equations of motion that models the water bath. This approach is completely general and sufficiently fast that it can be applied to large proteins. Folding pathways for various proteins of different classes are described and shown to correlate with experimental observations and molecular dynamics and Monte Carlo simulations. Allosteric transitions between alternative protein structures are also modeled within the DNM through an asymmetric double-well potential.
It has long been the goal of biologists and physicists to relate the protein sequence to the protein fold. The mapping of sequence space onto fold space is problematic because even though most homologous proteins fold in a similar fashion, there are proteins of similar shape that can have little sequence similarity.1, 2 Moreover, many proteins function through adopting multiple conformations and are allosterically regulated by binding events.3–6 Also, protein misfolding, which underlies many disease states, can be triggered by relatively minor sequence changes.7 However, the structural transitions that a protein goes through during folding, referred to as the folding path, appear to be largely defined by the final structure8 even when two proteins with little sequence similarity fold to a similar structure.9 Because the folding path is largely encoded by the native state topology, it can be studied in a sequence-independent manner.
Native structure-based folding dynamics has been modeled by molecular dynamics (MD) simulations with interactions defined by Go potentials.10–12 In these models, the potential is parameterized by the native conformation, which is defined at various levels of coarse-grained reduction.13 In their simplest incarnation, Go models represent the Cα chain with non-neighboring atoms moving under a Lennard-Jones-like potential parameterized by the native conformation and a power law potential for the more rigid covalent bond interactions. These models have been extended to include statistical potentials modeling pseudo-torsion angles14, 15 and have also included Cβ atoms.16, 17 In the limit where the fold dynamics is encoded purely by the native residue contacts and the potential is a simple quadratic function with an influence radius cut-off, one obtains the elastic network model (ENM).18–20 The ENM can be viewed as a simple dynamical extension of the protein fold topology encoded by the matrix of residue contacts. Such coarse-grained reductions of the protein chain greatly speed up dynamical simulations. In contrast, full-atom MD simulations21 are currently limited to relatively short time frames that fall well below the folding times for large proteins, and it is difficult to see how residue-independent folding characteristics can naturally emerge from such treatments.
In the ENM, the protein is represented by a Cα backbone structure that undergoes coupled harmonic oscillation with other Cα atoms within a defined sphere of influence. The potential is approximated by a quadratic fluctuation term, which can be diagonalized to give eigenmodes that describe the linearly independent oscillations of the protein. Fluctuation correlations show excellent agreement with crystallographic B-factors that measure Cα positional uncertainty19, 20, 22 and have been successful in describing allosteric couplings.23, 24 Also, a perturbative treatment of ENM has been applied to interaction site prediction.25–28 In many cases, global structural transitions have conformational change vectors correlating with the low-energy eigenvectors of the ENM corresponding to one of the alternative conformations.29–31 In addition to predicting allosteric couplings, large-scale allosteric transition pathways have been modeled by an iterative ENM normal mode approach32–36 and within a plastic network model (PNM), where the pathway is defined as the minimum energy path interpolating between two distinct ENM minima.37, 38 Protein unfolding has also been treated within the ENM through a perturbative correlation analysis where, with each iteration, the residue pair undergoing the largest fluctuations has its contact broken.39 Unfolding pathways are thought to mirror folding pathways,40 and the analysis of Su et al.39 shows the contact matrix evolution to be in broad agreement with experimental and theoretical observations.
When modeling the folding pathway, it is no longer possible to apply the normal mode analysis of ENM as the motions are no longer small oscillations about the native state. As mentioned above, unfolding can be investigated within an iterative ENM model, but it is difficult to see how to apply this iterative approach to folding. Alternatively, the full classical equations of motion of the ENM can be solved numerically to simulate global folding. In particular, we can start from a completely unstructured protein represented by a random noncontacting Cα backbone and use numerical iteration to follow the evolution of the protein fold. However, this leads to an unphysical rapidly escalating kinetic energy component as new residues are continuously brought within the influence radius. To resolve this problem, we introduce a damped network model (DNM) with a friction or damping term modeling energy dissipation to the water environment in which the protein is folding. The resulting folding path, defined as a series of structural transitions eventually converging to the native configuration, has many features in common with experimental observations and full atom simulations. In contrast to the unfolding pathway derived through sequential release of unstable couplings,39 the present approach is not sequential and many folding events occur throughout the protein at the same time, which is a more realistic scenario.
Local minima along the folding pathway are a common problem in folding simulations and there are instances with the DNM where the protein stalls at a non-native structure. Within this deterministic model, the stalling is a consequence of the initial random conformation, but the space of initial configurations leading to folding is sufficiently large for this not to be a critical problem. The reason for this is that we are not following a simple steepest descent minimization strategy but following a vibrating polymer that is sampling configurations away from the local minima. This is similar to the role played by the temperature in Monte Carlo simulations,41 but we are not restricted to generating local moves and the protein is folding all at once.
In many instances, proteins adopt multiple conformations and can dynamically switch between conformations. Conformational switches are intimately related to protein function, where, for example, a ligand binding event can result in a remote or allosteric conformational change in the protein that leads to a downstream signaling event as the protein is switched from an inactive to an active conformation.42 Such multiple conformations have been studied extensively and have led some researchers to further separate sequence from structure and question the discretization of structural domains.43
The transition dynamics from one structure to another can also be modeled by a network model but with two discrete minima. In particular, the structural transition path of adenylate kinase (AKE) has been studied within the context of a double-well PNM where the transition pathway is defined as a minimal energy interpolation between the two minima representing the open and closed configurations.37, 38 We argue that this transition can also be modeled with the classical equations of motion of a double-well DNM. In particular, starting with the protein in one potential minimum, we introduce an asymmetry in the double-well potential destabilising the protein and eventually leading it to fall into the lower energy conformation. The transition dynamics are parameterized by the environmental viscosity, the influence radius, and the potential asymmetry, which together trigger and dictate the nature of the transition events.
We first present the DNM folding pathways for two extensively studied proteins, barnase and chymotrypsin inhibitor 2. The DNM method is not restricted to small proteins, and we illustrate the folding pathway for a larger protein, a serine lipase. The asymmetric double-well version of DNM is then introduced and shown to model the allosteric transition in AKE. There follows a Discussion section. The mathematical framework behind the DNM approach is given in the Methods section at the end.
There is a wealth of data on the intermediate states that proteins adopt on the way to their native fold or alternatively from the native fold to the denatured state.44 The most direct measurements come from NMR experiments where one can follow the appearance/disappearance of interactions with changing temperature or environmental pH. These data can be presented in the form of an evolving contact matrix and we can make a direct comparison with our pathways. Computationally intensive full-atom MD simulations have shown good agreement with NMR studies, at least for small proteins, and these can also be compared with our approach. In what follows, we will look first at two small proteins that have been extensively studied in terms of their folding pathways and then we will describe the folding pathway of a larger protein.
Barnase is an αβ-protein that has been the subject of many folding simulations and NMR experiments.44–47 The crystal structure consists of 108 residues and the secondary structure consists of: α1(7–17), loop1(18–26), α2(27–33), loop2(34–41), α3(42–45), β1(49–56), loop3(57–68), β2(69–76), loop4(77–85), β3(86–91), β4(95–99), β5(105–108). Starting with a random noninteracting fold and solving the DNM equations of motion, we obtain our predicted folding path. A representative folding path is illustrated in Figure 1, giving the Cα backbone structure at various values of the native fold contact fraction, Q. Native contacts were defined as those between residues that are more than three residues apart along the chain and within 8 Å of each other. To get a full picture of the folding pathway, we performed 100 folding runs and calculated the average occupancy of the contacts at various Q values. The average contact matrices for various Q values are shown in Figure 2, and the evolution of the total native contact fraction is shown in Figure 3. Fold evolution as a function of Q is a description adopted in the MD simulation study.47
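The contact definition above translates directly into a short routine. The following is a minimal sketch in Python, not the authors' C implementation; the function names `native_contacts` and `contact_fraction` are hypothetical, and treating "within 8 Å" as an inclusive bound is an assumption.

```python
import math

def native_contacts(coords, min_sep=4, cutoff=8.0):
    """Return the set of residue pairs (i, j), i < j, that are in contact.

    A pair counts as a contact when the residues are more than three
    positions apart along the chain (j - i >= min_sep) and their C-alpha
    atoms lie within `cutoff` angstroms (inclusive bound: an assumption).
    """
    pairs = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.add((i, j))
    return pairs

def contact_fraction(coords, native_pairs, cutoff=8.0):
    """Fraction Q of the native contacts that are formed in `coords`."""
    if not native_pairs:
        return 0.0
    formed = sum(1 for i, j in native_pairs
                 if math.dist(coords[i], coords[j]) <= cutoff)
    return formed / len(native_pairs)
```

Given the native Cα coordinates, `native_contacts` is evaluated once, and `contact_fraction` can then be called on each trajectory frame to follow Q along a folding run.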
The folding nucleation sites have been localized to the β34 and β45 sheets through NMR and MD simulations. It has also been shown that the α2, α3, and loop2 contacts are stable under multiple high-temperature MD unfolding simulations.47 It is interesting to compare these results with our methodology, especially as the unfolding of this protein has been successfully modeled by an iterative normal mode analysis.39 The small-scale mobility of barnase has been the subject of a coarse-grained network model analysis using rigid substructure decomposition that shows good agreement with NMR experiments.48 Under the classical DNM oscillator equations of motion, the fold can be illustrated by the contact map evolving with Q, see Figure 2. The initial secondary structure contacts, at Q = 0.1, correspond to the three N-terminal helices. The fold nucleation contacts corresponding to the non-neighboring secondary structure interactions of the β34 and β45 sheets together with the second and third helices and the loop2 interactions emerge at Q = 0.3. This is in agreement with NMR and MD simulations.47 The early folding of the first helix is mirrored in its stability in unfolding MD simulations and this is in contrast to the ENM iterative unfolding result.39 The folding rate profile is given in Figure 3 and this shows a greater folding rate at the beginning and end of the fold. Of the 100 random initial conformations, 48 resulted in full folding to the native fold and the results presented are the averages over complete folds; the average iteration count per fold was 7466. Reaching the native fold was defined as native contact occupancy of greater than 95%. A run was judged to stall in a local minimum away from the native fold if the root-mean-squared distance matrix difference (RMSD) to the native fold oscillated above the cut-off (1 Å) for a sufficient number of iterations.
One could equally well define fold termination as the fold to native RMSD falling below 1 Å with essentially the same results. When the influence radius was lowered, the folding success rate diminished, to 8% for a 13 Å influence radius (19,783 steps) and 26% for a 14 Å influence radius (13,854 steps). A representative folding animation (100 steps per frame) is given as Supporting Information Movie S1.
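The termination criteria described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the distance-matrix RMSD is averaged over all pairs i < j, and the "sufficient number of iterations" for a stall is represented by a hypothetical `patience` parameter whose value is not given in the text. The names `distance_rmsd` and `run_status` are likewise hypothetical.

```python
import math

def distance_rmsd(coords, native):
    """Root-mean-squared difference between two C-alpha distance matrices.

    Averaging over all pairs i < j is an assumption; being distance based,
    the measure needs no structural superposition.
    """
    n = len(native)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = (math.dist(coords[i], coords[j])
                    - math.dist(native[i], native[j]))
            total += diff * diff
            count += 1
    return math.sqrt(total / count)

def run_status(q_history, rmsd_history, q_folded=0.95, rmsd_cut=1.0,
               patience=1000):
    """Classify a run as 'folded', 'stalled', or 'running'.

    q_folded and rmsd_cut follow the text (95% and 1 angstrom);
    patience is a hypothetical stand-in for "a sufficient number
    of iterations".
    """
    if q_history[-1] > q_folded:
        return "folded"
    recent = rmsd_history[-patience:]
    if len(recent) == patience and min(recent) > rmsd_cut:
        return "stalled"
    return "running"
```

A driver loop would append Q and the distance-matrix RMSD after each iteration (or every few iterations) and stop as soon as `run_status` returns something other than `"running"`.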
Another well-studied folding pathway is that of the chymotrypsin inhibitor 2, protein data bank (PDB) accession 2CI2.39, 46, 49–51 This is a relatively small molecule with 64 residues and a secondary structure progression: α1(12–24), β1(28–34), loop1(35–34), β2(45–52), β3(61–64). The DNM folding path contact maps are shown in Figure 4. Of 100 random initial noninteracting folds, 88 complete folding paths resulted. NMR and MD simulations have established the α-helix and a central hydrophobic cluster (32–37) as the first contacts established in the folding path.49 The DNM simulations are consistent with this scenario, with the α1 and Leu32-Thr37 contacts emerging in the Q = 0.1 plot of Figure 4. One recurring feature of the simulated unfolding path of 2CI2 is the stability of the β12 sheet relative to the β23 sheet.39, 49–51 Within our simulation, this contact cluster appears at Q = 0.3 but only emerges strongly at Q = 0.5, after the β23 contacts are established. This may reflect an incomplete mirroring of the folding versus unfolding pathways. However, running the DNM folding simulation with the smaller influence radius of 13 Å, the β12 contacts emerge at Q = 0.1 and before the β23 cluster, see Figure 5. In this case, only 12% of the folding paths are complete and, for folding to be successful, an early contact cluster that does not nucleate at residues close on the chain, the β12 cluster, needs to be established. Thus the influence radius can have an effect on the fold progression and we will see this later on for models of allosteric switching. A representative folding animation (100 steps per frame, Rc = 15 Å) is given as Supporting Information Movie S2.
To illustrate the generality of this approach, we folded the relatively large globular serine lipase belonging to the hydrolase fold class, PDB accession 1TIB. This protein is an αβ-protein consisting of 269 residues. Multiple folding runs were carried out, and of 100 random starts, there were 26 complete folds, taking on average 28,435 iterations of the DNM equations of motion. The folding pathway is illustrated in Figure 6. The striking feature here is the early emergence of the α-helices. The fold nucleation sites are predicted to be the β34 and β56 sheet contacts. The fold proceeds by the formation of two distinct globular domains, with the C-terminal domain nucleating around the β10-11 sheet. A representative folding animation (100 steps per frame) is given as Supporting Information Movie S3.
AKE is an enzyme catalyzing the interconversion of adenine nucleotides. On binding either AMP or ATP, it undergoes a large conformational transition where the nucleoside monophosphate binding (NMPbind) domain and the LID domain fold into the large CORE domain,52–55 see Figure 7(A). This transition has been studied experimentally by fluorescence resonance energy transfer and there are structural data available on homologous proteins that have structures lying between the opened and closed conformations. There have been many coarse-grained model studies of the AKE structural transition pathway. Some groups have developed iterative normal mode interpolations, based on Cα and full-atom rigid substructure models.32–36 Seeliger et al.56 generate an ensemble of possible deformations based on distance constraints and show that the AKE transition overlaps with a subset of deformations. Transition pathways have also been defined as saddle point solutions of double-well potentials connecting the two ENMs corresponding to the alternative conformations.57–59 The treatment closest to our approach is the double-well PNM where the transition pathway is defined as a minimal energy interpolation between the two minima representing the open and closed configurations.37, 38 Both treatments with the simple double-well potential find the LID region closing before the NMPbind region. Maragakis and Karplus37 show that AKE approaches intermediate structures along the transition path. A more complex potential used in the double-well network model (DWNM) of Chu and Voth38 results in two minimal free-energy paths depending on the minimization procedure and one of these has the NMPbind region closing before the LID region. This latter pathway has also been shown in recent simulation studies to characterize the unbound as opposed to bound AKE closure.60, 61
Structural transitions can be modeled within the DNM with an asymmetric double well. The opened and closed conformations of AKE correspond to PDB accessions 4AKE (chain A) and 1AKE (chain A), respectively. The double-well potential is the same as described above37, 38 with an additional asymmetry factor such that a protein initially in the high-energy minimum will over time slip out of this state and migrate to the lower minimum, see Methods Eqs. (13) and (14). As expected, the transition is faster for larger influence radii and requires a minimal asymmetry parameter. The transition path in terms of the RMSD between the intermediate fold and the terminal conformations is given in Figure 7(B). The transition path time increases with decreasing asymmetry parameter α, below α = 0.66, the transition no longer occurs. To investigate the LID and NMPbind closure, we divide the residues of AKE into three regions NMPbind (48–55), CORE (170–214), and LID (121–160). Following Chu and Voth,38 we follow the closure with the parameter , where do/c is the distance between the LID/NMPbind centroid and the CORE centroid in the opened/closed configuration, see Figure 7(C,D). Interestingly, for an influence radius of 10.5 Å and above, the NMPbind domain closes before the LID domain and this pathway is remarkably similar to that observed in the DWNM treatment.38 However, for influence radii below 10.5 Å, the LID domain closes before the NMPbind domain in agreement with the results of the PNM treatment.37, 38 This order swap does not occur with varying the asymmetry parameter, Figure 7(C). As this pathway depends critically on the influence radius, we looked to the crystal data to see if the B-factors can be used to fix the influence radius through Eqs. (7) and (15). We find that for the closed structure, the correlation of fluctuations with B is maximal at Rc = 10 Å whereas for the opened conformation, this correlation is relatively unchanged over the range 9–20 Å. 
This does not allow us to unambiguously fix Rc and so we have to fall back on the conclusion in Chu and Voth38 and posit the coexistence of the two pathways. The folding animation for Rc = 15 Å (20 steps per frame) is given as a Supporting Information Movie S4 and the animation for Rc = 10 Å (100 steps per frame) is given as a Supporting Information Movie S5.
We have presented a DNM description of protein dynamics and applied it to protein folding and large-scale structural transitions. The DNM can be viewed as a reduction of Go model dynamics or a simple extension of the ENM. The model presented here differs from Go models in that the bonded and nonbonded interactions are both modeled with a quadratic potential that extends over a finite radius of influence with the oscillations being damped by a friction term. Within ENM, the structure oscillates about the native conformation and the results are in good agreement with dynamical data encoded in the crystallographic B-factors. The ENM has also served as a powerful tool for pinning down allosteric couplings between protein residues and predicting possible ligand binding sites. Large-scale protein motions associated with allosteric switches have been modeled with an iterative ENM and a plastic extension of the ENM, the PNM. Directly relevant to the present study, protein unfolding has been modeled with an iterative ENM strategy. Our coarse-grained DNM model can be viewed as a further extension of these approaches in that the protein evolves according to the classical equations of motion. This is not a straightforward energy minimization strategy that would be beset by local energy minima but a model where the protein has a kinetic component that is continuously being dissipated through a friction term. Consequently, the local minima stalling frequency is relatively low as the protein is always sampling nearby conformations away from the local minima. The model has relatively few parameters and is easy to implement. It appears that there is good agreement with folding pathways that have been reported in the literature, with striking similarity to the reported folding paths of barnase and chymotrypsin inhibitor 2. The method can equally well be applied to proteins of arbitrary length, and we present the folding path of the 269 residue lipase as an example.
Our methodology for modeling structural transitions is an extension of the PNM potential to include an asymmetry parameter in the double-well potential so that under the classical equations of motion, the protein slips out of the higher energy minimum and ends up in the lower energy minimum. Again we illustrate and validate the methodology with a well-studied system, the allosteric switch in AKE. In the structural transition of AKE, there are two alternative pathway topologies depending on which of the LID or NMPbind domains closes first. We have shown that there is a switch between the two pathways as the influence radius passes through 10.5 Å and also defined the minimal asymmetry parameter for the transition to occur.
All the results presented in this study are sequence independent and this has proved to be a successful simplification of the protein dynamics within coarse-grained network models. However, the contribution of each residue along the chain can be examined by varying the spring constant coupling and this has been done for the allosteric switch in GroEL.62 In our case, it will be interesting to see whether residue-specific spring constant variation leads to different folding pathways and/or success rates, and whether the DNM methodology can then be extended to look at mutational instability and protein misfolding. This is the subject of a work in preparation (AJT and GW).
We conclude on a technical note. The ENM potential is closely related to a global minimization function which is commonly exploited in solving the molecular distance geometry problem (MDGP),63 which determines the native structure of a protein based only on distance data between all atoms or a subset of atoms. The approach described in this article can be exploited to solve the general MDGP and related problems in a global nonperturbative fashion that competes well with other MDGP methods (AJT and GW, in preparation).
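The ENM potential between two residues i and j is given by
\[ V_{ij} = \frac{k_{ij}}{2}\left(|\mathbf{r}_{ij}| - |\mathbf{r}^{0}_{ij}|\right)^{2}, \tag{1} \]
where $\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j$ and $\mathbf{r}_i$ is the position vector of the ith Cα along the protein chain. The zero superscript refers to the stable native configuration about which the protein oscillates and $k_{ij}$ is the set of harmonic spring constants.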
In the simplest incarnation of ENM, the spring constants are taken to be residue independent.19, 20 In practice, the nearest neighbor separation must fluctuate over a narrow range restricted by the covalent bond architecture linking neighboring Cα atoms, so that we define
In the absence of this relative rigidity in the neighboring atom fluctuations, the intermediate fold conformations have unphysically large nearest neighbor separations. Interestingly, the folding efficiency is also greatly enhanced when nearest neighbor atoms are forced to stay within physical bounds.
Only residues within a given sphere of influence, Rc, can be considered to physically interact with each other so that the ENM spring constant vanishes for well-separated residues:
The dynamics is partly driven by a repulsive term when and . For large Rc, it could be argued that there is an unphysically large repulsive term. However, when the repulsive contribution is rendered constant by scaling for and , we obtain essentially the same folding and transition pathways.
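The assignment of spring constants described in this part of the Methods can be sketched as a small function. This is a hedged illustration only: the base constant `k = 0.5` and `r_c = 15.0` follow values quoted elsewhere in the text, but the stiffening factor for chain neighbours (`bond_factor = 10.0`), the choice to keep chain neighbours coupled at any separation, and the function name `spring_constant` are assumptions made for the example.

```python
import math

def spring_constant(i, j, r_i, r_j, k=0.5, r_c=15.0, bond_factor=10.0):
    """Spring constant k_ij for residues i and j at positions r_i and r_j.

    Assumptions (not taken from the source): chain neighbours are always
    coupled and are stiffer by a hypothetical factor `bond_factor`; all
    other pairs share the residue-independent constant k when their
    current separation is within the influence radius r_c, and are
    uncoupled otherwise.
    """
    if i == j:
        return 0.0
    if abs(i - j) == 1:
        return bond_factor * k
    return k if math.dist(r_i, r_j) <= r_c else 0.0
```

Because the cut-off is applied to the current separation, the set of coupled pairs grows as the chain collapses, which is the behaviour the damping term is introduced to control.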
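Provided fluctuations are small about the equilibrium conformation, we can approximate the potential to quadratic order. Here, $V \approx \frac{1}{2}\,\Delta\mathbf{r}^{T} H\, \Delta\mathbf{r}$, where H is the Hessian matrix and $\Delta\mathbf{r} = \mathbf{r} - \mathbf{r}^{0}$ are the fluctuations about the native conformation.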
This now enables us to exactly solve for the thermodynamic fluctuations
where $k_B$ is Boltzmann's constant and T is temperature, and thus relate the parameters of the theory to the crystallographic temperature factors, defined as $B_i = \frac{8\pi^{2}}{3}\langle \Delta\mathbf{r}_i^{2} \rangle$.64 Specifically, the spring constant is then given by
where brackets refer to averaging over the protein chain.
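Now we will consider the coarse-grained protein model as a deterministic classical system governed by the equations of motion, $m\ddot{\mathbf{r}}_i = -\nabla_i V$. Explicitly:
\[ m\ddot{\mathbf{r}}_i = -\sum_{j} k_{ij}\left(|\mathbf{r}_{ij}| - |\mathbf{r}^{0}_{ij}|\right)\hat{\mathbf{r}}_{ij}, \tag{8} \]
where the “hat” refers to the unit vector, $\hat{\mathbf{r}}_{ij} = \mathbf{r}_{ij}/|\mathbf{r}_{ij}|$. To dissipate the kinetic energy component, we introduce a friction or damping term, μ, into the equations of motion:
\[ \ddot{\mathbf{r}}_i = -\sum_{j} \frac{k_{ij}}{m}\left(|\mathbf{r}_{ij}| - |\mathbf{r}^{0}_{ij}|\right)\hat{\mathbf{r}}_{ij} - \mu\,\dot{\mathbf{r}}_i. \tag{9} \]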
These are the DNM equations of motion and apart from the native target structure, the solutions will depend on the spring constant per unit mass, the damping term, and the influence radius. Their values will be discussed in the next section.
In practice, the differential equations Eq. (9) are solved by the symplectic Euler method65
where n is the iteration number and δ is the time increment.
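A single iteration of the damped dynamics can be sketched as below. This is a minimal Python illustration, not the authors' C program. It assumes unit mass, a friction force proportional to velocity, a symplectic (semi-implicit) Euler update in which the velocity is advanced first using the old positions and velocities and the position is then advanced with the new velocity, and a hypothetical stiffening factor `bond_factor` for chain neighbours. The defaults `k = 0.5`, `mu = 1.0`, `r_c = 15.0`, and `dt = 0.01` are the parameter values quoted in the text.

```python
import math

def accelerations(pos, vel, native, k=0.5, mu=1.0, r_c=15.0,
                  bond_factor=10.0):
    """Harmonic network accelerations plus friction, for unit mass.

    bond_factor (chain-neighbour stiffening) is a hypothetical value.
    """
    n = len(pos)
    # Start from the friction contribution -mu * v on every node.
    acc = [[-mu * v for v in vel[i]] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(pos[i], pos[j])
            if d == 0.0:
                continue  # direction undefined for coincident nodes
            if j - i == 1:
                k_ij = bond_factor * k  # chain neighbours: always coupled
            elif d <= r_c:
                k_ij = k                # inside the influence radius
            else:
                continue                # outside it: no interaction
            d0 = math.dist(native[i], native[j])
            scale = k_ij * (d - d0) / d
            for c in range(3):
                f = scale * (pos[i][c] - pos[j][c])
                acc[i][c] -= f  # pulls i toward j when the pair is stretched
                acc[j][c] += f  # equal and opposite on j
    return acc

def dnm_step(pos, vel, native, dt=0.01, **params):
    """Advance positions and velocities in place by one time increment."""
    acc = accelerations(pos, vel, native, **params)
    for i in range(len(pos)):
        for c in range(3):
            vel[i][c] += dt * acc[i][c]  # velocity first
            pos[i][c] += dt * vel[i][c]  # then position, with new velocity
```

Positions and velocities are lists of mutable three-component lists, so `dnm_step` can be called repeatedly on the same objects; a stretched two-node chain, for example, relaxes to its native separation within a few thousand steps.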
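When the protein can adopt two native conformations “a” and “b,” the potential will be either $V^{a}_{ij}$ or $V^{b}_{ij}$ depending on which is minimal, where $|\mathbf{r}^{a}_{ij}|$ and $|\mathbf{r}^{b}_{ij}|$ correspond to the Cα distances for the two conformations. That is
\[ V_{ij} = \min\left(V^{a}_{ij},\, V^{b}_{ij}\right). \tag{11} \]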
To ensure smooth interpolation between the two minima, we introduce a small parameter e:37, 38
For a transition to occur within the context of classical trajectories, we introduce a destabilising term that favors one minimum over the other. The simplest way to do this is via an asymmetry parameter α. Specifically,
with the double-well classical equations of motion given by
Of course, an isolated residue sitting in the high-energy minimum will not end up in the lower minimum as there is an energy barrier to overcome. However, for an allosteric transition, there is in general a set of residues within the protein for which the transition path is short enough for the barrier to vanish and it is these residues that drive the transition. It will be shown that transitions only occur for a large enough asymmetry.
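The double-well construction can be illustrated for a single residue pair. The sketch below is an assumption-laden example rather than the potential used in this study: the smooth interpolation uses the mixing form familiar from the plastic network model, applied here pair by pair, and the asymmetry factor α is left out because its functional form cannot be inferred from the text. The function names are hypothetical; `eps = 0.1` is the smoothing factor quoted in the parameter section.

```python
import math

def well(d, d0, k=0.5):
    """Single harmonic well in the pair distance d with minimum at d0."""
    return 0.5 * k * (d - d0) ** 2

def double_well_hard(d, d_a, d_b, k=0.5):
    """Lower of the two wells: the non-smooth double-well potential."""
    return min(well(d, d_a, k), well(d, d_b, k))

def double_well_smooth(d, d_a, d_b, k=0.5, eps=0.1):
    """Smoothed double well (PNM-style mixing form, assumed here).

    Reduces to double_well_hard as eps -> 0 and lowers the cusp where
    the two wells cross by eps.
    """
    v_a, v_b = well(d, d_a, k), well(d, d_b, k)
    return 0.5 * (v_a + v_b - math.sqrt((v_a - v_b) ** 2 + 4.0 * eps ** 2))
```

Evaluating both functions on a grid of distances between `d_a` and `d_b` shows the cusp at the crossing point being rounded off, which is what makes the force continuous along a classical trajectory.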
We now deal with the parameterization of the model. Folding within the DNM is achieved by initiating the protein chain in a random completely noninteracting configuration and then following the trajectories of the Cα nodes under the equations of motion. At each stage, only those nodes within the influence radius are visible to the individual residues, Eq. (5), and as the protein collapses, more residues come within the influence radius and there is a build-up of kinetic energy that is dissipated via a friction term that models the water environment, Eq. (9).
Based on protein interaction statistical potential models, see for example,66, 67 we take the influence radius to be 15 Å. For larger radii, the protein will fold faster but such large range interactions are unphysical. Within ENM, the radius of influence is chosen to maximize the correlation of the normal mode fluctuation magnitudes with the crystallographic temperature B-factors.19, 20 However, the correlation varies with structure and we find maximal correlations for chymotrypsin inhibitor 2 (2CI2) at 15 Å, barnase (1A2P) at 7 Å, hydrolase (1TIB) at 12–20 Å, and for AKE, the correlation is maximal at 10 Å for the closed conformation (1AKE) and 9–20 Å for the opened conformation (4AKE). Without crystallographic data on the intermediate fold conformations, it is difficult to derive Rc from ENM analysis and we are forced to rely on potential model parameters. We find that if the influence radius falls below 15 Å, then the folding probability is substantially reduced and the folding time increased. However, when considering DNM models of allosteric transition, the transition path can undergo a dramatic change with a lower Rc. We take the spring constant per unit mass to be 1/2, which determines the time scale of the problem. Statistical thermodynamics allows us to fix the parameters of the theory based on crystallographic B-factors, Eq. (7). So, with our choice of spring constant per unit mass, we have time units of $\sqrt{m/(2k)}$, where m is the mass of the resonating node, in our case, that of the amino acid. In the numerical iterations, Eq. (10), we set the time increment to be 0.01. The damping term is set to unity. For a small damping term, the iteration becomes unstable and for a large damping term, the fold is slow. For the double-well potential, we set the smoothing factor, Eq. (12), to be 0.1 and perform simulations with various choices of asymmetry parameter, α. All programs were written in C and run on a PC.
Movies were generated in gif format by importing folding coordinate frames into Maple (Waterloo Maple, Maplesoft). Structure images were generated by DS ViewerPro software (Accelrys).