Molecular dynamics (MD) simulations are widely employed for the structural and dynamic characterization of peptides and proteins.[1-4] These simulations rely mainly on classical force fields, such as AMBER,[5-7] CHARMM, GROMOS,[9, 10] and OPLS-AA.[11, 12] Among the different types of force field parameters, backbone and sidechain torsional potentials have been the subject of extensive reoptimization, leading to improved versions of the AMBER[13-16] and CHARMM[17, 18] force fields. Based on a number of detailed benchmark studies,[19-33] AMBER99SB has emerged as one of the force fields that reproduce experimentally measured parameters more accurately than other force fields. This force field has undergone further useful refinements in recent years.[15, 16] To predict the correct balance of secondary structure propensities in proteins, a simple backbone energy correction was introduced to reproduce the fraction of helix measured in short peptides at 300 K; the modified force field is known as AMBER99SB*. More recently, the AMBER99SB force field was improved further (AMBER99SB-ILDN) by refitting the sidechain torsion potentials of four residues: isoleucine, leucine, aspartic acid, and asparagine.
One important property not exploited in force field optimizations for biomolecular MD simulations is the timescale of motion of a given backbone or sidechain fragment. As a result, while motionally averaged experimental NMR parameters can be reproduced well by new force fields, the timescale over which this averaging is achieved may deviate significantly from experiment. Timescales are rarely verified because either experimental data are not available or it is not clear how the force field parameters should be modified to better reproduce the experimental data. To explore the possibility of including experimentally known motional timescales in force field optimizations, we have selected in this work a relatively simple example: proline (Pro) sidechain dynamics. The simplicity of Pro dynamics arises from the fact that, unlike other amino acid residues, Pro has a unique cyclic structure that interconverts continuously between two conformers, known as Cγ-endo and Cγ-exo. A further factor in favor of the Pro residue is that numerous theoretical and experimental studies have been undertaken in the past, focusing mainly on pyrrolidine ring dynamics. Furthermore, the torsional parameters of the Pro residue have not been optimized previously; instead, standard force field parameters obtained for open-chain fragments are used for proline. As a result, the pyrrolidine ring geometry predicted by AMBER force fields is relatively flat compared to single-crystal X-ray diffraction data or quantum-mechanical (QM) calculations, as judged by the value of the endocyclic torsion χ2 (Fig. 1) or the pseudorotation amplitude χm, also known as the maximum puckering angle. To a first approximation, the nonplanarity of the pyrrolidine ring can be assessed by how far atoms Cβ and Cγ are from the plane formed by the remaining three atoms.
The further they are from the plane, the higher the absolute values of χ2 and χm. Changes in ring geometry also have energetic implications: as shown previously, the larger the maximum puckering angle, the larger the pyrrolidine ring interconversion barrier in Pro and hydroxyproline (Hyp) residues. A higher energy barrier implies less frequent transitions, that is, longer motional timescales. Based on these considerations, force field optimizations may potentially improve the accuracy of MD simulations in predicting both the structure and the dynamics of the Pro residue in proteins. Note that one of the important attributes of the Pro residue is its hinge-like function, which enhances the probability of β-turns in proteins. Therefore, accurate predictions of proline structure and dynamics may have critical implications for the outcome of MD descriptions of proteins.
Returning to the original problem of force field optimization, we expect that introducing an additional dynamics constraint should be advantageous from a methodological point of view, as force field optimizations often yield multiple solutions that fit the experimental data equally well. This is not surprising, as the experimental data consist of motionally averaged NMR J couplings and chemical shifts, which depend on the relative populations of conformers but not on how fast they exchange. Fitting timescales in combination with NMR J couplings and/or chemical shifts is expected to select the correct solution in such cases. Unlike previous optimizations based on quantum-mechanical calculations, we will use experimentally measured NMR J couplings in our initial reoptimization of the Pro sidechain torsion potentials. Our approach is based on either a simple grid search or iterative fitting of experimental NMR data, in which a figure-of-merit function is evaluated using MD trajectories calculated for each trial set of parameters. Once torsional force fields reproducing the experimental NMR J couplings have been identified, we will probe which MD-predicted timescales of motion best match the experimental data. 13C NMR spin-lattice relaxation times will be used to estimate both overall and intramolecular timescales of motion. In addition to the Pro residue, we will also reoptimize the torsional force field parameters of the trans-4-hydroxy-L-proline (Hyp) residue to match experimental dynamics data.
MATERIALS AND METHODS
Apart from Ace-Hyp-NHMe (AHM) and Ace-Hyp-Gly (AHG), all peptides were used as received from Sigma Aldrich and Cambridge Bioscience. The synthesis of AHM and AHG is described in the Supporting Information. Experimental values of proton 3JHH couplings and internuclear proton distances for N-acetyl-L-proline (NAcPro), Gly-Pro-Gly-Gly (GPGG) and Val-Ala-Pro-Gly (VAPG) in D2O solutions at 298 K were taken from Refs. [34-36]. The experimental data for angiotensin II, AHM and AHG were determined in this work (see below) using full lineshape analysis. Unless otherwise specified, a trans orientation about the amide bond preceding the Pro (or Hyp) residue is assumed. For the 3J couplings determined from full lineshape analysis, the standard deviation was estimated to be <0.1 Hz.[34-36] Experimental values of 3J couplings for ubiquitin were taken from Refs.  and . The root-mean-square (rms) deviations in the 3D-derived 3J values of ubiquitin were estimated to be of the order of ∼0.1 Hz.
Solution 1H NMR spectra were recorded on a Bruker Avance III 600 MHz NMR spectrometer equipped with a 5 mm cryoprobe (1H 600.13 MHz and 13C 150.90 MHz). Data acquisition and processing were performed using standard TopSpin (version 2.1) software. 1H and 13C chemical shifts were calibrated using dioxane shifts in D2O (1H 3.75 ppm, 13C 67.19 ppm). Uncertainties in measured values of 1H and 13C chemical shifts were typically better than ±0.01 ppm. Unless otherwise specified, NMR measurements were carried out at 298 K. High (>300 K) and low (<300 K) temperature calibrations were carried out using standard samples of 80% 1,2-ethanediol in DMSO-d6 and 4% CH3OH in CD3OD, respectively.
The 13C spin-lattice relaxation times (T1) were measured for peptide solutions in either D2O or H2O:D2O (9:1) using a standard inversion-recovery technique with 13C observation under proton decoupling. To minimize errors associated with low signal-to-noise ratios, these experiments were carried out on a 600 MHz spectrometer with a dual-channel 1H/13C cryoprobe whose sensitivity is optimized for 13C detection. From five independent measurements carried out at the probe ambient temperature (293 K) on different dates over 60 days for a 214 mM solution of GPGG in D2O, the standard deviations of the 13C T1 values were within 0.4–1.4% of the corresponding mean values. From three independent measurements carried out at 298 K for 77 mM VAPG in H2O:D2O (9:1), the standard deviations of the 13C T1 values were within 0.2–1.1% of the corresponding mean values.
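As a minimal sketch of how T1 can be extracted from inversion-recovery intensities, the Python function below assumes ideal inversion, M(t) = M0[1 − 2exp(−t/T1)], and a known fully relaxed intensity M0. These are simplifying assumptions of ours: a nonlinear fit that also adjusts M0 and the inversion efficiency is more robust to pulse imperfections. The function name `fit_t1_inversion_recovery` is hypothetical.

```python
import math


def fit_t1_inversion_recovery(delays_s, intensities, m0):
    """Estimate T1 (s) from inversion-recovery data with a known M0.

    Assumed model: M(t) = M0 * (1 - 2 * exp(-t / T1)).
    Linearized form: y(t) = ln[(M0 - M(t)) / (2 * M0)] = -t / T1,
    fitted by least squares through the origin. Points with M(t) >= M0
    (fully relaxed or noisy) must be excluded beforehand.
    """
    sum_ty = 0.0
    sum_tt = 0.0
    for t, m in zip(delays_s, intensities):
        y = math.log((m0 - m) / (2.0 * m0))
        sum_ty += t * y
        sum_tt += t * t
    slope = sum_ty / sum_tt  # least-squares slope, equal to -1/T1
    return -1.0 / slope
```

For noise-free synthetic data generated with T1 = 0.5 s, the function returns 0.5 s; with real data, the spread of five or three repeated measurements (as above) gives the reported standard deviations.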
Chemical shift anisotropies (Δσ, in ppm) of aliphatic carbons were measured for a solid sample of L-proline using slow magic-angle spinning (MAS, 2.5 kHz) on a Bruker AVANCE III 850 spectrometer equipped with a 4 mm CPMAS probe. The estimated Δσ values (−43 ppm for Cα and −30 ppm for Cγ of L-proline) were used when calculating correlation times from 13C T1 values. These calculations show that, even at Δσ = −43 ppm, the dipolar relaxation mechanism remains the dominant factor determining 13C T1 at 14.1 T, while chemical shift anisotropy accounts for <1% of T1 values.
MD calculations and simplex fittings
All MD simulations were carried out using GROMACS (version 4.5.5). One NAcPro molecule (terminated with CO2−, with a Na+ cation added for neutralization) was solvated with 147 water molecules in a dodecahedral box with a volume of 4.7 nm3. Periodic boundary conditions and the TIP3P water model were employed in all MD simulations. An integration time step of 2 fs was used, and neighbor lists were updated every 5th step. The particle mesh Ewald (PME) method with fourth-order interpolation was employed for electrostatics. The neighbor list and real-space cutoff distances were set to 0.9 nm, similar to those used in optimizations of the original force field and its recent modifications.[5-7, 13-16] Van der Waals interactions were treated with a twin-range cutoff method using the neighbor list and van der Waals cutoff distances; the van der Waals cutoff distance was 0.9 nm.[5-7, 13-16] The temperature was maintained at 298 K using velocity rescaling with a stochastic term (V-rescale) and a time constant of 0.1 ps. A Parrinello–Rahman scheme was employed for pressure control at 1 bar, using a coupling constant of 2 ps and an isothermal compressibility of 4.5 × 10−5 bar−1. Prior to production MD runs, including those performed within downhill simplex optimizations,[44, 45] the system was energy-minimized using steepest-descent and conjugate gradient algorithms. Minimization was followed by four equilibration stages: 40 ps with position restraints on the solute, to allow the water molecules to equilibrate around it, followed by 100 ps of NVT dynamics, 200 ps of NPT dynamics and another 200 ps of NVT dynamics. Reproducible production MD simulations at each step of the simplex fittings were performed for 7.5–40.5 ns in the NVT ensemble; the first 0.5 ns of each run were discarded from the calculation of averaged NMR parameters.
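The run parameters listed above can be collected into a GROMACS .mdp fragment. The sketch below is a hypothetical Python helper (`render_mdp`) that renders them: the keys are standard GROMACS 4.5 .mdp option names, the values are those stated in the text, and settings the text does not state (for example, bond constraints or output frequencies) are deliberately left out rather than guessed.

```python
# Run parameters from the text, keyed by GROMACS .mdp option names.
NVT_PRODUCTION = {
    "integrator": "md",
    "dt": 0.002,              # 2 fs time step (ps)
    "nstlist": 5,             # neighbor list updated every 5th step
    "rlist": 0.9,             # neighbor-list cutoff (nm)
    "coulombtype": "PME",
    "pme_order": 4,           # fourth-order PME interpolation
    "rcoulomb": 0.9,          # real-space cutoff (nm)
    "vdwtype": "cut-off",
    "rvdw": 0.9,              # van der Waals cutoff (nm)
    "tcoupl": "V-rescale",    # velocity rescaling with a stochastic term
    "tc-grps": "System",
    "tau_t": 0.1,             # ps
    "ref_t": 298,             # K
    "pcoupl": "no",           # production runs use the NVT ensemble
}

# Additional settings for the NPT equilibration stage.
NPT_OVERRIDES = {
    "pcoupl": "Parrinello-Rahman",
    "pcoupltype": "isotropic",
    "tau_p": 2.0,               # ps
    "ref_p": 1.0,               # bar
    "compressibility": 4.5e-5,  # bar^-1
}


def render_mdp(options):
    """Format a dict of .mdp options as 'key = value' lines."""
    return "".join(f"{key:<16} = {value}\n" for key, value in options.items())


npt_text = render_mdp({**NVT_PRODUCTION, **NPT_OVERRIDES})
```

The rendered text is a fragment to be merged into a complete .mdp file, not a full input on its own.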
For the parameter sets selected from the simplex fittings, additional 200 ns long MD simulations were carried out.
The vicinal 3J couplings of the five-membered pyrrolidine ring in NAcPro (as well as in the other peptides, see below) were calculated for each MD frame using the empirically optimized Karplus-type equations 8C and 8D of Haasnoot et al. These equations contain terms that account for differences in the electronegativities of α- and β-substituents and are therefore better suited than the original Karplus equation to the analysis of pyrrolidine ring 3J couplings. The precision (expressed as the rms deviation) of equation 8C of Haasnoot et al. for a structural fragment containing two substituents (-CH2X-CH2Y-) is estimated as 0.367 Hz, based on a set of 45 experimental 3JHH couplings. The precision of equation 8D for a fragment containing three substituents (-CHXY-CH2Z-) is estimated as 0.485 Hz, based on a set of 100 experimental 3JHH couplings.
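The generalized Karplus form of Haasnoot et al. adds electronegativity-dependent substituent terms to the cos²φ and cos φ terms. The sketch below assumes the commonly quoted form 3J = P1cos²φ + P2cosφ + P3 + Σi Δχi[P4 + P5cos²(ξiφ + P6|Δχi|)], where φ is the H-C-C-H dihedral, Δχi is the (β-corrected) electronegativity difference of substituent i relative to hydrogen and ξi = ±1 encodes its orientation. The numerical parameter sets are the values commonly cited for the two- and three-substituent cases and should be checked against the original publication before use; all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HaasnootParams:
    p1: float
    p2: float
    p3: float
    p4: float
    p5: float
    p6: float  # degrees


# Assumed values (verify against Haasnoot et al. before use).
TWO_SUBSTITUENTS = HaasnootParams(13.86, -0.81, 0.0, 0.56, -2.32, 17.9)
THREE_SUBSTITUENTS = HaasnootParams(13.22, -0.99, 0.0, 0.87, -2.46, 19.9)


def haasnoot_3j(phi_deg, substituents, p):
    """Vicinal 3J(H,H) in Hz for the H-C-C-H dihedral phi_deg.

    substituents: list of (delta_chi, xi) pairs, one per non-hydrogen
    substituent; delta_chi is its electronegativity difference relative to
    hydrogen (already corrected for beta-substituents) and xi is +1 or -1
    depending on its orientation relative to the coupled protons.
    """
    phi = math.radians(phi_deg)
    j = p.p1 * math.cos(phi) ** 2 + p.p2 * math.cos(phi) + p.p3
    for delta_chi, xi in substituents:
        angle = math.radians(xi * phi_deg + p.p6 * abs(delta_chi))
        j += delta_chi * (p.p4 + p.p5 * math.cos(angle) ** 2)
    return j
```

Evaluating this function for every frame and averaging over the trajectory gives the MD-averaged couplings used in the fittings.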
To analyze MD trajectories, including those obtained at each step of the simplex fittings, dihedral angles were extracted from frames recorded every 0.01 ps. The 3J couplings calculated from the dihedral angles in each frame were then averaged over the duration of the MD simulation. The rms deviation, $\mathrm{rms}J_p = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(J_i^{\mathrm{exp}} - J_i^{\mathrm{MD}}\right)^2}$ (denoted rmsJp for the 3JHH couplings of the pyrrolidine ring), was used as the figure-of-merit function in the simplex fittings, where $J_i^{\mathrm{exp}}$ and $J_i^{\mathrm{MD}}$ are the conformationally averaged experimental and calculated couplings, respectively, and N is the number of J couplings available (N = 10 for the Pro sidechain). As the simplex method may converge to a local minimum of the merit function,[44, 45] it is important to consider several sets of starting values of the optimized parameters xj (j = 1, …, n). This was achieved through the factor c used to construct the n + 1 vertices of the initial simplex: the first vertex contains the original AMBER99SB values of x1, …, xn; in the second vertex, x1 is replaced by x1 + c x1 with the other parameters unchanged; in the third vertex, x2 is replaced by x2 + c x2; and so on. Several simplex fittings were performed with c varied between 0.2 and 5 (see the main text for further details); in addition, for |c| < 1, both positive and negative values were considered. An additional constraint, xj > 0, was imposed in the simplex fittings.
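The averaging, the rmsJp merit function and the construction of the initial simplex described above can be sketched in pure Python as follows. The function names are hypothetical, per-frame couplings are assumed to be available as lists, and the penalty used to enforce xj > 0 is one simple choice that the text does not specify.

```python
import math


def md_averaged_couplings(per_frame_couplings):
    """Average each of the N couplings over frames (rows = frames)."""
    n_frames = len(per_frame_couplings)
    return [sum(column) / n_frames for column in zip(*per_frame_couplings)]


def rms_jp(j_exp, j_md):
    """rmsJp = sqrt((1/N) * sum_i (J_exp,i - J_MD,i)^2)."""
    n = len(j_exp)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(j_exp, j_md)) / n)


def initial_simplex(x0, c):
    """Return n + 1 starting vertices: x0, then x0 with x_j -> x_j + c * x_j."""
    vertices = [list(x0)]
    for j in range(len(x0)):
        vertex = list(x0)
        vertex[j] = x0[j] + c * x0[j]
        vertices.append(vertex)
    return vertices


def constrained_merit(merit, x, penalty=1.0e6):
    """Reject trial points violating x_j > 0 by returning a large penalty."""
    if any(xj <= 0.0 for xj in x):
        return penalty
    return merit(x)
```

In a real fitting, `merit(x)` would run an MD simulation with force constants x, average the per-frame couplings and return `rms_jp` against experiment; that expensive step lies outside this sketch.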
For further optimization and validation of newly derived force field parameters, 800 ns MD simulations of GPGG, VAPG, Gly-Pro-Phe (GPF), 1.5 μs MD simulations of angiotensin II, 1 μs MD simulations of human ubiquitin (PDB entry 1UBQ), 600 ns and 1.5 μs MD simulations of AHM and 1.5 μs MD simulations of AHG were carried out. One molecule of zwitterionic GPGG was solvated with 253 water molecules in a dodecahedral box with a volume of 8.3 nm3. For VAPG, one molecule of zwitterionic peptide was solvated with 260 water molecules in a dodecahedral box with a volume of 8.4 nm3. In the case of GPF, one molecule of zwitterionic peptide was solvated with 292 water molecules in a dodecahedral box with a volume of 9.3 nm3. Similarly, one molecule of angiotensin II (with a Cl- anion added for neutralization) was solvated with 1201 water molecules in a dodecahedral box with a volume of 40.8 nm3. One molecule of ubiquitin (with six Na+ cations added for neutralization) was solvated with 2605 water molecules in a cubic box with a volume of 91.1 nm3. For the Hyp parameter optimizations, one molecule of AHM was solvated with 225 water molecules in a dodecahedral box with a volume of 7.4 nm3 and one molecule of AHG (with a Na+ cation added for neutralization) was solvated with 300 water molecules in a dodecahedral box with a volume of 9.4 nm3. Other conditions and parameters of MD simulations were the same as described above for NAcPro. Frames recorded every 1 ps were used in estimating averaged 3J-couplings from MD simulations of GPGG and ubiquitin.
The calculated 3JHH couplings are expected to depend on the length of the MD simulation. To estimate the significance of this dependence, we considered MD simulations of varying lengths. Calculations of 3JHH couplings from 600, 700, and 800 ns long MD simulations of GPGG using the modified force field (parameter set (25), Table 1) showed a largest variation of less than ±0.023 Hz in the calculated 3JHH values for a 200 ns change in simulation length (<0.5% of the 3JHH value). Two MD simulations of GPGG, 800 ns and 3 μs long, were available for parameter set (19), which has the third largest value of V3 considered (6.92437 kJ mol−1, Table 1). These were used to estimate errors in MD-predicted quantities. The changes on going from 800 ns to 3 μs were (see Tables 1-4 for definitions of parameters): Pexo 0°, Pendo 0°, χm +0.1°, xendo +0.9%, rmsJp +0.025 Hz, pf 0%, dter 0 Å, Nψ1 +0.41, Nψ2 +0.04, Nϕ3 −0.33, Nψ3 −0.29, Nχ1 −0.02, Nχ2 +0.01, S2 0, τi −0.02 ps. A negative sign corresponds to a decrease in the value on increasing the length of the MD simulation. The absolute values of these changes can be considered an upper-limit estimate of the errors involved, because the force constant in parameter set (19) is higher than that in the final selected set (25), so that longer MD simulations are required for convergence of the calculated parameters in the case of (19).
Table 1. Summary of Torsional Force Constants (Vn, in kJ mol−1), Their Phases (γn, in Degrees) and the Pyrrolidine Ring Conformational Characteristics of NAcPro(a)
Columns include V1, V2, V3, V4, V5 and V6 (all in kJ mol−1).
(a) γ1 = γ2 = 180° and γ3 = γ4 = γ5 = γ6 = 0°.
From least-squares fittings of the vicinal 3J-couplings using Eqs. 8C and 8D of Haasnoot et al.
The motionally averaged 3J-couplings of the peptide backbone of GPGG and ubiquitin were calculated using quantum-mechanically derived Karplus relationships[31, 49] and empirically parameterized Karplus equations.[50, 51]
Interatomic distances from the MD simulations of GPGG were calculated in a manner similar to that used in NMR measurements: (i) the internuclear distance ri for each pair of hydrogen atoms was calculated in each MD frame i; (ii) the quantity ηi = ri−6 was calculated as a measure of the expected NOE in each frame; (iii) the sum of ηi over all frames was used as a measure of the expected total NOE over the full length of the MD run; (iv) using r = 2.4 Å as the reference Hα-Hβ3 distance in the Pro residue, internuclear distances for the other proton pairs were calculated from the η ∼ r−6 relationship.
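Steps (i)–(iv) amount to r−6 averaging followed by calibration against a reference pair. A minimal sketch, with hypothetical pair labels and per-frame distances in Å:

```python
def noe_distances(per_frame_distances, ref_pair, r_ref=2.4):
    """Convert per-frame proton-proton distances (Angstrom) into NOE distances.

    per_frame_distances maps a pair label to its distances in each MD frame
    (all pairs sampled over the same frames). The NOE of each pair is taken
    as eta = sum_i r_i**-6, and r_pair = r_ref * (eta_ref / eta_pair)**(1/6).
    """
    eta = {pair: sum(r ** -6 for r in distances)
           for pair, distances in per_frame_distances.items()}
    eta_ref = eta[ref_pair]
    return {pair: r_ref * (eta_ref / value) ** (1.0 / 6.0)
            for pair, value in eta.items()}
```

For a pair that alternates between 2.0 and 4.0 Å, the r−6-averaged distance is about 2.24 Å, much closer to the shorter distance than the arithmetic mean of 3.0 Å, which is why NOE-derived distances are dominated by short-distance conformers.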
As shown by Tropp, when overall molecular motions are relatively slow and intramolecular motions are relatively fast, as in globular macromolecules, NOEs may show an r−3 dependence. For the tetrapeptide GPGG used in this work for the NOE analysis, the timescales of both overall and intramolecular motions are relatively fast. We therefore used the r−6 dependence of NOEs, consistent with the simplified growth rates method widely used for interproton distance measurements in small molecules.[53-57]
To determine autocorrelation times for the intramolecular motions of the C-H bonds from MD simulations, the corresponding internal autocorrelation functions were calculated using the following equation:
$$C_I(t) = \left\langle P_2\left[\hat{\mu}(\tau)\cdot\hat{\mu}(\tau + t)\right]\right\rangle_{\tau} \tag{1}$$
where $\langle\cdots\rangle_{\tau}$ denotes an average over the MD trajectory, $\hat{\mu}$ is a unit vector along the C-H bond and $P_2(x) = (3x^2 - 1)/2$ is the second-order Legendre polynomial. Prior to the calculations, the overall rotational and translational motions of the solute molecule were removed from the MD trajectory. This was accomplished by superimposing the sequence of four bonded peptide backbone atoms C(Pro)-Cα(Pro)-N(Pro)-C(i+1) on the corresponding atoms of the snapshot at the midpoint of the production run, which was chosen as the reference structure. A similar approach was used by Showalter and Brüschweiler in their detailed analysis of NMR relaxation data (for a detailed discussion see Section 2.3 of Ref. ). The Lipari–Szabo model was used to fit the initial 20 ns of the autocorrelation functions:
$$C_I(t) = S^2 + \left(1 - S^2\right)\exp\left(-t/\tau_e\right) \tag{2}$$
In Eq. (2) above, S2 denotes the order parameter and τe is the autocorrelation time for the intramolecular CH bond reorientations.
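The internal autocorrelation function and the Lipari–Szabo model can be sketched as follows, assuming unit C-H vectors from an already superimposed trajectory. The text fits the Lipari–Szabo model by least squares over the first 20 ns; a nonlinear least-squares routine (for example, SciPy's `curve_fit`) would be the direct equivalent. To keep the sketch dependency-free, we substitute a simpler plateau/integral estimator, which is not the authors' procedure. All function names are hypothetical.

```python
import math


def p2(x):
    """Second-order Legendre polynomial P2(x) = (3x^2 - 1) / 2."""
    return 0.5 * (3.0 * x * x - 1.0)


def internal_acf(unit_vectors, max_lag):
    """C_I(t) = <P2(mu(tau) . mu(tau + t))>_tau for lags 0..max_lag (frames).

    unit_vectors: C-H bond unit vectors (x, y, z) from a trajectory in which
    overall rotation and translation have already been removed.
    """
    n = len(unit_vectors)
    acf = []
    for lag in range(max_lag + 1):
        total = 0.0
        for k in range(n - lag):
            a, b = unit_vectors[k], unit_vectors[k + lag]
            total += p2(a[0] * b[0] + a[1] * b[1] + a[2] * b[2])
        acf.append(total / (n - lag))
    return acf


def lipari_szabo(t, s2, tau_e):
    """Lipari-Szabo model: C_I(t) = S^2 + (1 - S^2) exp(-t / tau_e)."""
    return s2 + (1.0 - s2) * math.exp(-t / tau_e)


def estimate_s2_tau_e(acf, dt, plateau_points):
    """Plateau/integral estimate of (S^2, tau_e); not a least-squares fit.

    S^2 is the mean of the last plateau_points values; tau_e follows from
    the area under C_I(t) - S^2, which equals (1 - S^2) * tau_e once decayed.
    """
    s2 = sum(acf[-plateau_points:]) / plateau_points
    excess = [c - s2 for c in acf]
    area = dt * (sum(excess) - 0.5 * (excess[0] + excess[-1]))  # trapezoid rule
    return s2, area / (1.0 - s2)
```

The estimator only works when the correlation function has fully decayed to its plateau within the analyzed window; otherwise a direct fit of the model is preferable.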
All quantum-mechanical calculations were carried out using Gaussian 09. Geometry optimizations were carried out using various combinations of QM methods and basis sets, as described in the main text. The “nosymm” keyword of Gaussian 09 was used to disable molecular symmetry in the QM calculations. For DFT M06-2X[61, 62] geometry optimizations, the ultrafine numerical integration grid (99 radial shells and 590 angular points per shell) was used, combined with the “verytight” convergence criterion (requiring the root-mean-square force to be smaller than 1 × 10−6 Hartree Bohr−1). Frequency calculations were also undertaken to verify that the optimized geometries correspond to true minima. The IEFPCM reaction field method[63, 64] was used to account for water solvent effects. The jump angles Δθ of the CH bonds resulting from pyrrolidine ring interconversion were determined using Python Molecular Viewer (version 1.5.4).
Calculations employing MP2 and M06-2X methods were also carried out in which a selected dihedral angle was incremented or decremented in 5° steps. Basis sets considered are specified in the main text. At each step the selected dihedral angle was fixed with all the remaining degrees of freedom optimized using MP2 or M06-2X QM calculations. A relaxed 1D potential energy surface scan was performed in this manner and minimized QM energies at each step were obtained. The QM-optimized structures were then used in molecular mechanics (MM) calculations using AMBER99SB force field to obtain the corresponding MM energies (see the main text for further details).
The original conformational notation proposed by Haasnoot et al. for L-prolines is used in this work. The exo and endo orientations of the Pro ring carbon Cγ are defined relative to the substituent (COO or CONH group) at the Cα carbon of the Pro ring. The endo- and exocyclic torsion angles are defined in Figure 1.
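The pseudorotation phase angle P identifies a given conformation on the pseudorotation circle, and the pseudorotation amplitude χm is the maximum value attained by χ1–χ5. P and χm were calculated using the equations of Westhof and Sundaralingam:
$$\tan P = \frac{(\chi_4 + \chi_1) - (\chi_3 + \chi_5)}{2\chi_2\left(\sin 36^\circ + \sin 72^\circ\right)} \tag{3}$$
$$\chi_m = \frac{\chi_2}{\cos P} \tag{4}$$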
Note that 180° is added to the calculated value of P if χ2 < 0. From the distributions of endocyclic torsional angles, a two-site exchange between Cγ-endo and Cγ-exo conformations of the pyrrolidine ring of Pro and Hyp residues was observed in MD simulations of the peptides considered. The populations of these ring conformations are denoted as xendo and xexo (in % with xendo + xexo = 100%).
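The pseudorotation calculation and the endo/exo population count can be sketched as follows. We assume the Westhof–Sundaralingam form with χ2 as the reference torsion, tan P = [(χ4 + χ1) − (χ3 + χ5)]/[2χ2(sin 36° + sin 72°)] and χm = χ2/cos P, with χ5 preceding χ1 around the ring (our indexing assumption), and we classify frames with χ2 < 0 as Cγ-endo, consistent with the χ2 values of ∼−40° (endo) and +40° (exo) quoted for the QM minima. Function names are hypothetical.

```python
import math

SIN_SUM = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))


def pseudorotation(chi1, chi2, chi3, chi4, chi5):
    """Return (P, chi_m) in degrees from the endocyclic torsions chi1..chi5.

    Assumed Westhof-Sundaralingam form with chi2 as the reference torsion:
        tan P = [(chi4 + chi1) - (chi3 + chi5)] / [2 chi2 (sin 36 + sin 72)]
        chi_m = chi2 / cos P
    180 degrees is added to P when chi2 < 0 (chi2 must be non-zero).
    """
    tan_p = ((chi4 + chi1) - (chi3 + chi5)) / (2.0 * chi2 * SIN_SUM)
    p = math.degrees(math.atan(tan_p))
    if chi2 < 0:
        p += 180.0
    chi_m = chi2 / math.cos(math.radians(p))
    return p % 360.0, chi_m


def endo_population(chi2_values):
    """x_endo in %, counting frames with chi2 < 0 as C-gamma endo."""
    n_endo = sum(1 for chi2 in chi2_values if chi2 < 0)
    return 100.0 * n_endo / len(chi2_values)
```

For an ideal ring, χj = χm cos[P + 144°(j − 2)], the function recovers the input P and χm exactly, which is a convenient check of the index convention before applying it to MD frames.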
Initial simplexed MD fittings of experimental NMR data
In our initial revision of the AMBER99SB force field, we undertook simplex fittings of 3JHH couplings, in which the C-C-C-C dihedral parameters for the endocyclic carbons of the Pro residue in N-acetyl-L-proline (NAcPro) and Gly-Pro-Gly-Gly (GPGG) were optimized. This choice was dictated by the availability of accurate experimental data for NAcPro and GPGG.[34-36] In particular, full lineshape analysis was employed to derive accurate experimental 3JHH couplings in D2O solutions, with an estimated standard deviation of ≤0.03 Hz for vicinal couplings. As for the choice of force field, an analysis of >10 different force fields applied to GPGG identified AMBER99SB as the force field that best reproduces experimentally measured NMR parameters in aqueous solution. Thus, further improvement of this force field presents a challenging task for the simplex fittings of 3JHH couplings.
While AMBER99SB satisfactorily predicts the relative energies of the Cγ-exo and Cγ-endo conformations (as judged by comparing their populations in AMBER99SB MD simulations with those determined experimentally from least-squares fittings of 3JHH couplings),[31, 35, 36] the predicted number of χ2 transitions in the Pro-2 residue of GPGG is nearly four times higher than the number of backbone ψ transitions of Gly-3 (see Table 2 in Ref. ). This disagrees with the available experimental data. In particular, the autocorrelation times and activation parameters reported for GPGG in water from 13C spin-lattice relaxation time measurements show that the frequency of the torsional transitions involving the Cγ atom of Pro-2 is of the same order of magnitude as that of the torsional transitions involving the Cα atom of Gly-3 (see Tables 1 and 2 in Ref. ). Thus, the Pro force field parameters must be optimized to reproduce the experimentally observed timescale of Pro sidechain motions. As discussed above, apart from dynamic aspects, there is also a need to improve the predicted structure of the pyrrolidine ring. The pyrrolidine ring geometry predicted by AMBER99SB MD simulations is flatter (χm ≈ 35°, where χm is approximately equal to the largest of the endocyclic torsions χ1−χ5, usually χ2) than that from NMR, X-ray and QM calculations (χm = 37°−42°).[34, 35] The reason for this difference is that AMBER force fields use the same set of CCCC dihedral parameters for both cyclic systems (e.g., Cα−Cβ−Cγ−Cδ in Pro, corresponding to the endocyclic torsion χ2) and open-chain systems (see Ref.  for details of how the CCCC parameter was derived).
Table 2. Conformational Populations and Geometries of the Pro ring in GPGG in Water as Predicted by NMR and by 800-ns Long MD Simulations Using Various Sets of Torsional Parameters for the Pro residue
From least-squares fittings of the vicinal 3J-couplings using Eqs. 8C and 8D of Haasnoot et al.
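For our initial simplex optimizations, a standard AMBER dihedral energy term of the following form was used:
$$E_{\mathrm{dih}}(\theta) = \sum_{n} \frac{V_n}{2}\left[1 + \cos\left(n\theta - \gamma_n\right)\right] \tag{5}$$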
where Vn is the dihedral force constant (amplitude), n is the dihedral periodicity and γn, with a value of either 0° or 180°, is the phase of the dihedral angle θ. The dihedral force constants Vn were optimized to obtain the best agreement between experimental and MD-predicted 3J couplings of NAcPro. They were optimized for the angle χ2 (Fig. 1), which is usually the largest of the endocyclic dihedral angles χ1−χ5 of the Pro sidechain in peptides and proteins. In the original AMBER99SB force field, there are three non-zero Vn values (V1, V2, and V3) for the χ2 = CT-CT-CT-CT torsion (CT denotes a tetrahedral carbon). Thus, the three parameters V1, V2, and V3 were optimized in our simplex fittings, each step of which consisted of an MD simulation followed by calculation of the MD-averaged 3JHH couplings using the modified Karplus equations of Haasnoot et al.
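The dihedral term can be evaluated directly. The sketch below uses the (Vn/2)[1 + cos(nθ − γn)] convention of Eq. (5), which is consistent with the later statement that Edih(χ2 = 0°) equals V3 when γ1 = γ2 = 180° and γ3 = 0°. The function name and the force constants in the usage note are hypothetical.

```python
import math


def e_dih(theta_deg, force_constants, phases_deg):
    """E_dih(theta) = sum_n (V_n / 2) * [1 + cos(n * theta - gamma_n)], n = 1, 2, ..."""
    energy = 0.0
    for n, (v_n, gamma_n) in enumerate(zip(force_constants, phases_deg), start=1):
        energy += 0.5 * v_n * (1.0 + math.cos(math.radians(n * theta_deg - gamma_n)))
    return energy
```

With hypothetical values V = (1.3, 0.7, 6.9) kJ mol−1 and phases (180°, 180°, 0°), `e_dih(0.0, ...)` returns V3 = 6.9 kJ mol−1, because the n = 1 and n = 2 terms vanish at θ = 0°; at θ = ±40° the energy is much lower, so V3 controls the height of the barrier between the puckered states.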
Before deciding on the length of the MD simulations within the simplex fittings, we examined the convergence of the population of the endo conformation (xendo, in %) using a 500 ns long MD simulation (Fig. S1, Supporting Information). The results indicate that after the initial ∼10 ns the populations of the two conformers have converged sufficiently: the population of the endo conformer is 56.6% after 10 ns, 56.3% after 20 ns, 56.5% after 100 ns and 56.7% after 500 ns. Even between 1.5 and 10 ns, the population deviations are within ±2.0% (Fig. S1, Supporting Information). We therefore used 7.5 ns long MD simulations at each step of our simplex fittings. The first 0.5 ns were treated as an equilibration period, and the corresponding data were discarded from the calculation of averaged 3JHH couplings. Up to 10 different simplexed MD runs were carried out using different scaling factors c between −0.5 and 5, with 50–200 steps of 7.5 ns long MD simulations in each case.
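A minimal sketch of this convergence check, assuming each frame has already been classified as endo or exo (for example, by the sign of χ2). The 500 ps default discard mirrors the 0.5 ns equilibration period; the function names are hypothetical.

```python
def running_population(is_endo, frame_dt_ps, discard_ps=500.0):
    """Cumulative endo population (%) after discarding an equilibration period.

    is_endo: one boolean per frame (True for C-gamma endo).
    Returns x_endo after 1, 2, ... production frames.
    """
    start = int(round(discard_ps / frame_dt_ps))
    n_endo = 0
    history = []
    for k, flag in enumerate(is_endo[start:], start=1):
        if flag:
            n_endo += 1
        history.append(100.0 * n_endo / k)
    return history


def max_deviation(history, from_index):
    """Largest |x_endo(t) - final x_endo| from a given frame onward."""
    final = history[-1]
    return max(abs(x - final) for x in history[from_index:])
```

Comparing `max_deviation` for increasing starting points against a tolerance such as ±2% gives a simple criterion for choosing the run length used in each simplex step.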
The original AMBER99SB force field parameters, together with those derived from our simplex fittings of experimental 3JHH couplings, are shown in Table 1. Five optimized parameter sets (1)–(5) were selected from the simplex fittings, showing rms deviations from the experimental 3JHH couplings (rmsJp) of less than 0.8 Hz based on 7 ns long MD simulations. For comparison, rmsJp = 0.96 Hz for the original AMBER99SB force field. Considering that increases in force constants during simplex optimizations may lead to longer convergence times, we used additional 200 ns long MD simulations for final estimates of the merit function (rmsJp) for parameter sets (1)–(5) and AMBER99SB. The results of these simulations are summarized in Table 1.
As can be seen from Table 1, parameter sets (1)–(5) obtained from simplexed MD simulations show 5–14% improvements in rms values compared to the original AMBER99SB force field. The χm values for (1)–(5) are slightly larger than that of the original force field and hence in better agreement with the NMR, XRD and QM results (37°–42°).[34, 35] The Edih(χ2) graphs for the CT-CT-CT-CT fragment (Fig. S2, Supporting Information) show that parameter sets (1)–(4) give higher maxima at χ2 = 0°. These maxima equal V3, because the n = 1 and n = 2 terms of Eq. (5) vanish at χ2 = 0° when γ1 = γ2 = 180°. In the transition state between the Cγ-endo and Cγ-exo conformations of the pyrrolidine ring, χ2 is 0°. Thus, an increase in V3 corresponds to an increase in the activation energy of ring interconversion. According to the Arrhenius relationship, a higher activation energy is expected to decrease the frequency of transitions between the Cγ-endo and Cγ-exo states.
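The Arrhenius argument can be made quantitative with a one-line estimate. This sketch assumes an unchanged pre-exponential factor, so the rate ratio depends only on the change in activation energy; the function names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def relative_transition_rate(delta_ea_kj, temperature_k=298.0):
    """k_new / k_old = exp(-delta_Ea / (R T)), assuming an unchanged prefactor."""
    return math.exp(-delta_ea_kj / (R_KJ * temperature_k))


def barrier_increase_for(rate_ratio, temperature_k=298.0):
    """Activation-energy increase (kJ/mol) that scales the rate by rate_ratio."""
    return -R_KJ * temperature_k * math.log(rate_ratio)
```

For example, a fourfold reduction in ring-flip frequency at 298 K (of the order of the mismatch noted for GPGG) corresponds to an activation-energy increase of about 3.4 kJ mol−1 under these assumptions. Whether a given change in V3 translates one-to-one into such an increase is not guaranteed, because other force-field terms also contribute to the barrier.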
The above results suggest that relatively short MD simulations, combined with subsequent long MD simulations for selected parameter sets, can be applied to refine force field parameters, provided that the force constants do not increase significantly. Note that the simplex fittings described in this work generate a new MD trajectory for each trial set of parameters to evaluate the rms deviation between experimental and MD-predicted NMR data; that is, new conformations are generated at each step of the fittings (see Single Trajectory Reweighting Approach section below). However, the current method is computationally expensive, and relatively large increases in the optimized parameters may not be described adequately by the short MD simulations used in the simplex fittings.
QM optimizations of force field parameters
After the initial simplexed MD simulations, we considered QM optimizations of force field parameters, followed by iterative MD simulations to further refine the parameters obtained from QM fittings. Four sets of QM calculations were considered to estimate the dependence of the results on the choice of basis set and QM method, and to assess the level of uncertainty involved: M06-2X/def2-TZVP, M06-2X/6-31G(d,p), M06-2X/cc-pVTZ and MP2/6-31+G(d). Based on previous studies,[35,70] these QM methods and basis sets reproduce relative conformational energies and geometries in good agreement with experimental data. Calculations of 31 conformers of NAcPro were carried out in which the Cα-Cβ-Cγ-Cδ dihedral angle (χ2) was varied in 5° steps between −75° and +75°. The QM-predicted energy profiles in the gas phase and in water (using IEFPCM)[63, 64] as a function of χ2 are compared in Figure 2. Considering the relative energies of the Cγ-endo and Cγ-exo conformers (with χ2 values of ∼−40° and +40°, respectively), the experimentally measured conformer ratio in water (xendo = 61% and xexo = 39%) is best reproduced by the IEFPCM(H2O) MP2/6-31+G(d) and M06-2X/def2-TZVP calculations [Fig. 2(c)], which predict Cγ-endo populations of 66% and 71%, respectively. Thus, the results of these two sets of calculations were used in our further analysis.
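The link between relative conformer energies and populations can be sketched with a two-state Boltzmann model. This is a simplification of ours: the QM minimum energies are treated as free energies, ignoring entropy, zero-point and thermal corrections. The function names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def endo_fraction(e_endo_kj, e_exo_kj, temperature_k=298.0):
    """Two-state Boltzmann population of C-gamma endo (0..1)."""
    return 1.0 / (1.0 + math.exp((e_endo_kj - e_exo_kj) / (R_KJ * temperature_k)))


def energy_difference_for(x_endo, temperature_k=298.0):
    """E_endo - E_exo (kJ/mol) that reproduces a given endo fraction."""
    return -R_KJ * temperature_k * math.log(x_endo / (1.0 - x_endo))
```

Under this approximation, the measured 61:39 ratio corresponds to E(endo) − E(exo) ≈ −1.1 kJ mol−1 at 298 K, whereas the 66% and 71% predictions correspond to about −1.6 and −2.2 kJ mol−1, respectively.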
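The following merit function of Lindorff-Larsen et al. was used in our fittings:
$$\Phi = \left[\frac{\sum_{i=1}^{M} \exp\left(-\beta E_i^{\mathrm{QM}}\right)\left(E_i^{\mathrm{QM}} - E_i^{\mathrm{MM}}\right)^2}{\sum_{i=1}^{M} \exp\left(-\beta E_i^{\mathrm{QM}}\right)}\right]^{1/2} \tag{6}$$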
where $E_i^{\mathrm{QM}}$ and $E_i^{\mathrm{MM}}$ are the QM and molecular mechanics (MM) energies of conformation i, respectively, and M is the number of conformations optimized at the QM level (31 in this case). The inverse temperature β is set to 1.0 mol kcal−1 (see discussion in Ref.  regarding the choice of β value). Following the approach developed by Lindorff-Larsen et al., the MM energy is given by the AMBER99SB energy, EA99SB, plus a new torsion term that replaces the existing AMBER99SB torsion, VA99SB(θ):
$$E^{\mathrm{MM}}(\theta) = E_{\mathrm{A99SB}} - V_{\mathrm{A99SB}}(\theta) + \sum_{n=1}^{N} \frac{V_n}{2}\left[1 + \cos\left(n\theta - \gamma_n\right)\right] + k_0 \tag{7}$$
where k0 is a constant, Vn are the force constants of the N-term cosine expansion and γn are the corresponding phases of the dihedral angle θ.
Simulated annealing fittings were employed to minimize Φ for the torsion θ = χ2 with N = 3 by varying the Vn and γn values in the torsional force-field term. In line with the approach used to modify the AMBER99SB backbone potential, we assumed that Vn ≥ 0 kJ mol−1 and that γn is either 0° or 180°. However, on fitting the gas-phase data, the fitted values of V1, V2 and V3 were all 0 kJ mol−1 for both the MP2/6-31+G(d) and M06-2X/def2-TZVP profiles. We therefore consider only the IEFPCM(H2O) data below, and any further reference to MP2/6-31+G(d) and M06-2X/def2-TZVP calculations assumes the use of IEFPCM(H2O).
The values of the merit function Φ for the original AMBER99SB force field compared to the MP2/6-31+G(d) and M06-2X/def2-TZVP profiles were 2.84 and 2.24 kcal mol−1, respectively (after k0 corrections according to Eq. (7)). Simulated annealing fittings reduce these to 2.39 kcal mol−1 for parameter set (6), obtained from fittings to the MP2/6-31+G(d) profile, and to 1.93 kcal mol−1 for parameter set (7), obtained from fittings to the M06-2X/def2-TZVP profile (Table 1 and Fig. 2). In both cases, V1 = V2 = 0 and V3 ≠ 0 kJ mol−1. Such a result is not surprising, considering that χ2 in the pyrrolidine ring varies between ∼−40° and +40°: only the V3 term (with γ3 = 0°) has its maximum at χ2 = 0°, while the V1 and V2 terms (with γ1 = γ2 = 180°) have minima of 0 kJ mol−1 at χ2 = 0°. The value of V3 increases significantly compared to the original force field, in qualitative agreement with the earlier simplexed MD simulations (parameter sets (1)–(5)), which indicated better agreement with experiment on increasing V3. In 200-ns MD simulations of NAcPro (Table 1), parameter set (7) from the M06-2X calculations shows significantly better agreement with experiment than set (6) derived from the MP2 calculations.
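The fitting step can be illustrated with a minimal simulated annealing loop. Because the merit function itself is not reproduced in this section, the sketch uses a hypothetical stand-in: a Boltzmann-weighted RMS deviation with weights exp(−βEQM) and the offset k0 eliminated analytically. It optimizes only V3 with γ3 = 0°, and both the "QM" profile and the MM baseline are synthetic. Energies are in kcal mol−1 with β = 1.0 mol kcal−1, as in the text.

```python
import math
import random

def torsion(theta_deg, v3):
    """Assumed threefold term V3 * [1 + cos(3*theta)] (gamma3 = 0)."""
    return v3 * (1.0 + math.cos(math.radians(3.0 * theta_deg)))

def merit(v3, angles, e_qm, e_mm_base, beta=1.0):
    """Hypothetical Boltzmann-weighted RMS deviation (kcal/mol), a stand-in for Eq. (6).

    The constant offset k0 is eliminated analytically as the weighted mean residual.
    """
    w = [math.exp(-beta * e) for e in e_qm]
    resid = [q - (b + torsion(t, v3)) for t, q, b in zip(angles, e_qm, e_mm_base)]
    k0 = sum(wi * r for wi, r in zip(w, resid)) / sum(w)
    var = sum(wi * (r - k0) ** 2 for wi, r in zip(w, resid)) / sum(w)
    return math.sqrt(var), k0

def anneal_v3(angles, e_qm, e_mm_base, steps=4000, t_start=0.2, seed=1):
    """Simulated annealing over V3 >= 0 with a linear cooling schedule."""
    rng = random.Random(seed)
    v3 = 0.0
    cur, _ = merit(v3, angles, e_qm, e_mm_base)
    best_v3, best = v3, cur
    for step in range(steps):
        temp = t_start * (1.0 - step / steps) + 1e-6
        trial = abs(v3 + rng.gauss(0.0, 0.1))          # reflect at zero: V3 >= 0
        val, _ = merit(trial, angles, e_qm, e_mm_base)
        # Metropolis criterion: always accept downhill, sometimes uphill
        if val <= cur or rng.random() < math.exp(-(val - cur) / temp):
            v3, cur = trial, val
            if cur < best:
                best_v3, best = v3, cur
    return best_v3, best

# Synthetic profile: 31 points from -75 to +75 degrees in 5-degree steps
angles = [-75.0 + 5.0 * i for i in range(31)]
e_mm_base = [7.8e-7 * (t * t - 1600.0) ** 2 for t in angles]   # hypothetical double well
e_qm = [b + torsion(t, 1.0) + 0.3 for t, b in zip(angles, e_mm_base)]
v3_fit, phi = anneal_v3(angles, e_qm, e_mm_base)
print(f"V3 = {v3_fit:.3f} kcal/mol, Phi = {phi:.4f} kcal/mol")
```

Since the synthetic data were built with V3 = 1 kcal mol−1, the annealer should return a value close to 1 (multiply by 4.184 for kJ mol−1). The Boltzmann weights concentrate the fit on the low-energy puckered region near ±40°.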
Using the QM-derived parameter set (7) as a starting point, simplexed MD simulations were carried out to optimize the value of V3. Initially, 40-ns MD simulations were used at each step of the simplexed MD procedure to calculate the merit function. The parameter sets with the lowest merit function values, (8)–(23), were then selected for further 200-ns MD simulations (Table 1). On increasing the length of the MD runs from 40 ns to 200 ns, the rmsJp values for parameter sets (8)–(23) increase from 0.65–0.85 Hz to 0.73–1.08 Hz. The MD-averaged χm values predicted by these parameter sets (Table 1) show better agreement with the experimental NMR value than the original AMBER99SB force field. However, it is likely that at relatively high values of V3 the short MD simulations used in the simplexed MD fittings were not sufficiently converged. We therefore retain all new parameter sets (6)–(23) in our further analysis, as these provide a sufficiently fine sampling of V3 values between 1.9 and 9.3 kJ mol−1. Parameter sets (1)–(5) were also included in our further analysis.
Single trajectory reweighting approach
The first application of the energy-based reweighting approach[71-74] to fit the 3J-couplings of NAcPro, optimizing the three parameters V1, V2, and V3, led to unusually large values of V2 and V3 when a 500-ns MD trajectory with frames recorded every 0.04 ps was used: V1 = 0.0419, V2 = 22.3835, and V3 = 22.5864 kJ mol−1, with an rms deviation of 0.79 Hz. With such a high value of V3, a significantly smaller number of pyrrolidine ring transitions would be expected in MD simulations than, for example, peptide backbone transitions, which does not agree with experiment. As discussed by Li and Brüschweiler, the effectiveness of the reweighting scheme depends critically on the degree of overlap between the parent and the reweighted trajectories, since the reweighting procedure does not create any new conformations. On introducing a collectivity parameter κ with the requirement κ > 50% (see Eq. (2) and the discussion following it in Ref. ), a physically plausible solution was obtained from the 500-ns parent trajectory of NAcPro in water: V1 = 0, V2 = 0.0009, and V3 = 2.3891 kJ mol−1. This set of parameters is essentially the same as (14) (Table 1) and is therefore not included in our further analysis. On increasing the number of terms from three to six in Eq. (5), an alternative set of parameters was derived using the reweighting approach, which is included in Table 1 as (24). Based on 200-ns MD simulations of NAcPro, this set of parameters performs slightly better than AMBER99SB and is therefore included in our further analysis.
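The energy-based reweighting idea, and the overlap problem it runs into, can be sketched in a few lines: each frame of the parent trajectory receives a weight proportional to exp(−βΔU), where ΔU is the change in the Pro torsion energy, and observables are re-averaged with these weights. The "collectivity" below, the exponential of the weight entropy divided by the number of frames, is an assumed stand-in for κ (it equals 1 for uniform weights and drops towards 1/M when a few frames dominate); the exact definition in the cited reference may differ, and all names are hypothetical.

```python
import math

def reweight(delta_u, beta):
    """Normalized weights w_i ~ exp(-beta * dU_i) for frames of a parent trajectory.

    delta_u[i] = U_new(x_i) - U_parent(x_i) for frame i, e.g. the change in the
    Pro torsion energy; beta must use the same energy units.
    """
    shift = min(delta_u)  # subtract the minimum to avoid overflow
    raw = [math.exp(-beta * (du - shift)) for du in delta_u]
    total = sum(raw)
    return [r / total for r in raw]

def collectivity(weights):
    """exp(weight entropy) / number of frames: 1 for uniform weights, ~1/M if one frame dominates.

    Assumed stand-in for the collectivity parameter kappa.
    """
    entropy = -sum(w * math.log(w) for w in weights if w > 0.0)
    return math.exp(entropy) / len(weights)

def reweighted_average(values, weights):
    """Reweighted ensemble average of an observable (e.g. a 3J coupling per frame)."""
    return sum(v * w for v, w in zip(values, weights))

# Toy example: torsion-energy changes (kJ/mol) for five frames at 298 K
beta = 1.0 / (0.0083145 * 298.0)
w = reweight([0.0, 1.5, 3.0, 6.0, 12.0], beta)
kappa = collectivity(w)
print(f"kappa = {kappa:.2f} -> {'accept' if kappa > 0.5 else 'reject'} (kappa > 50% criterion)")
```

In this picture, a threshold on κ rejects trial parameters whose weights collapse onto a few frames, i.e. parameter sets that the parent trajectory cannot represent.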
MD simulations of Gly-Pro-Gly-Gly
For further examination, we carried out MD simulations of GPGG (Fig. 3) using parameter sets (1)–(24) and the original AMBER99SB force field. Note that AMBER99SB* and AMBER99SB-ILDN simulations would be identical to the AMBER99SB simulation in this case, as GPGG contains only Gly and Pro residues. A recent study verifying different force fields using GPGG used 2-μs MD simulations, which were sufficient for the majority of the force fields considered. However, the convergence of the population of the folded form with respect to the length of the MD run for the AMBER99SB force field (Fig. 6 in Ref. ) shows that no significant change in this population occurs after 600 ns. We therefore carried out 800-ns MD simulations for our analysis.
From the results obtained for the Pro ring in GPGG (Table 2), all new parameter sets show better agreement with the experimental data, with rmsJp values between 0.48 and 0.62 Hz, than the original force field (0.66 Hz). More importantly, all the tested sets provide higher values of χm (36.4°–41.3°) than the original set of parameters (35.3°). These results confirm that the new parameter sets predict pyrrolidine ring geometries in better agreement with NMR, XRD and QM data than the original force field.
We have also analyzed NMR parameters that depend on the backbone conformation of GPGG. In particular, internuclear distances for seven proton pairs in GPGG were previously determined from NOE data. MD-averaged internuclear distances were estimated over the 800-ns length of each of three MD simulations, and the rms deviations between the experimental and MD-predicted distances (rmsd) were calculated (Table SI, Supporting Information). In addition, four 3JCH and two 3JHH couplings were available from NMR measurements for the GPGG backbone,[31, 36] which were used for NMR versus MD comparisons. As described previously, two empirical (corresponding to rmsJe1 and rmsJe2 in Table SI)[50, 51] and two QM-derived Karplus equations (corresponding to rmsJq1 and rmsJq2 in Table SI) were used to exclude possible model-dependent deficiencies. For rmsJq1 and rmsJq2, we used B972- and B3LYP-predicted Karplus relationships, which have been shown to be sufficiently accurate.[75-81] The results summarized in Table SI confirm that modifications of the Pro χ2 dihedral parameters do not cause any significant changes in the backbone conformations, as there is very good agreement for all MD simulations for parameters averaged over backbone conformations. Similarly, the population of the U-shaped folded conformation of GPGG (pf, %) and the mean terminal N…C′ distance (dter, Å) predicted by the new parameter sets agree with those predicted by the original AMBER99SB force field (Table 3).
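Two averaging choices are implicit in such NMR versus MD comparisons: 3J values should be averaged over frames (rather than computed from an averaged angle), and NOE-derived distances are often compared with ⟨r−6⟩−1/6 averages. The sketch below uses a generic Karplus form A cos²θ + B cos θ + C with coefficients supplied by the caller (no specific published coefficients are implied); the r−6 averaging is an assumption, since the averaging scheme used for the distances is not specified here, and all names are hypothetical.

```python
import math

def karplus(theta_deg, a, b, c):
    """Generic Karplus relation: 3J = A cos^2(theta) + B cos(theta) + C (Hz)."""
    t = math.cos(math.radians(theta_deg))
    return a * t * t + b * t + c

def md_average_j(angles_deg, coeffs):
    """Average the coupling over frames (not the coupling of the average angle)."""
    return sum(karplus(t, *coeffs) for t in angles_deg) / len(angles_deg)

def noe_average_distance(distances):
    """<r^-6>^(-1/6) average, a common (assumed) choice for NOE comparisons."""
    mean_r6 = sum(r ** -6 for r in distances) / len(distances)
    return mean_r6 ** (-1.0 / 6.0)

def rms_deviation(exp_vals, calc_vals):
    """Root-mean-square deviation, e.g. rmsJ or rmsd between NMR and MD."""
    n = len(exp_vals)
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(exp_vals, calc_vals)) / n)
```

Using several independent Karplus parameterizations with the same MD frames, as described above, simply means calling `md_average_j` once per coefficient set and comparing the resulting rms deviations.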
Table 3. Conformational Properties of GPGG Derived from MD Simulations
Shown are the population of the folded form (pf); the mean terminal N…C′ distance (dter), the number of ψ2, ϕ3, ψ3 and χ2 torsional transitions per ns (Nψ2, Nϕ3, Nψ3 and Nχ2, respectively). Frames recorded every 1 ps were used in the calculations of Nψ2, Nϕ3, Nψ3 and Nχ2.
Matching relative motional timescales from MD simulations and experiment
To identify which of the new parameter sets is likely to reproduce both the structural and dynamic properties of Pro residues more accurately, we have considered the timescales of motions in GPGG. First, we consider the numbers of ψ2, ϕ3, ψ3 (see definitions of angles in Fig. 3), χ1 and χ2 (see definitions of angles in Fig. 1) torsional transitions per nanosecond (Nψ2, Nϕ3, Nψ3, Nχ1, and Nχ2 in Table 3). As expected, the backbone transition numbers (Nψ2, Nϕ3 and Nψ3) are not affected by the change of the Pro torsional parameters, whereas moderate (Nχ2 ≈ 41–66) and significant (Nχ2 ≈ 1–37) decreases in Nχ2 are observed for parameter sets (1)–(5) and (6)–(24), respectively, compared to the original force field (Nχ2 ≈ 81). For the new parameter sets containing only a single V3 term, there is a linear relationship between ln(Nχ2) and V3 (Fig. 4), as well as between ln(Nχ2) and χm (Fig. S3, Supporting Information) and between χm and V3 (Fig. S4, Supporting Information). Thus, the Pro sidechain torsional potential can be adjusted so that the timescale of the sidechain dynamics matches that from experiment.
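Transition counts such as Nχ2 depend on how a transition is defined. A minimal sketch, assuming a simple two-state assignment with a dead band around the planar ring (frames with χ2 within ±20° of 0° keep their previous state); the threshold and the function names are hypothetical, and the definition used for Table 3 may differ.

```python
def count_transitions(chi2_series, threshold=20.0):
    """Count endo <-> exo interconversions in a chi2 time series (degrees).

    Frames with chi2 < -threshold are assigned to endo and chi2 > +threshold
    to exo (following the text's convention of endo ~ -40 deg, exo ~ +40 deg);
    frames in between keep the previous state, so fluctuations around the
    planar ring are not counted as transitions.
    """
    state = None
    transitions = 0
    for chi2 in chi2_series:
        if chi2 < -threshold:
            new_state = "endo"
        elif chi2 > threshold:
            new_state = "exo"
        else:
            continue  # inside the dead band: state unchanged
        if state is not None and new_state != state:
            transitions += 1
        state = new_state
    return transitions

def transitions_per_ns(chi2_series, dt_ps):
    """Transitions per nanosecond for frames saved every dt_ps picoseconds."""
    total_ns = len(chi2_series) * dt_ps / 1000.0
    return count_transitions(chi2_series) / total_ns

series = [-40, -35, 5, -10, 38, 42, 10, -41]   # toy trajectory, 1 ps per frame
print(count_transitions(series), transitions_per_ns(series, 1.0))
```

Because fast recrossings are only resolved if they are sampled, such counts depend on the frame interval, which is why the frame spacing (1 ps) is specified for Table 3.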
Using 13C spin-lattice relaxation times measured for GPGG in water at 303 K, Mikhailov et al. estimated the autocorrelation time of the CγH bonds of Pro-2 to be 27 ± 1.5 ps. As the accuracy of this type of measurement depends critically on the signal-to-noise ratio, we have repeated the 13C spin-lattice relaxation time measurements of GPGG using a higher-field NMR spectrometer (14.1 T, 600 MHz 1H frequency) equipped with a cryoprobe (Tables SII and SIII, Supporting Information). To analyze the T1 values and derive correlation times, we have used the approach developed by Ernst et al.,[82, 83] which differs from that used by Mikhailov et al. The following equations were used to derive the correlation times for the overall (τc) and the intramolecular ring interconversion (τe) processes from the measured T1 relaxation times[82-84]:
where Δθ is the jump angle of the CH bond on conformational transition, γH and γC are the gyromagnetic ratios of 1H and 13C, ħ is Planck's constant divided by 2π, rCH = 1.09 Å is the CH bond length, Δσ is the chemical shift anisotropy of the 13C nucleus considered (see Experimental), and N is the number of H atoms attached to the C atom. Note that in Eq. (8), the populations xendo and xexo are fractions that sum to 1 (not percentages).
The correlation time τc can be determined from the NT1 value (where N is the number of H atoms bonded to C) of the backbone Cα carbons, which are least affected by intramolecular motions and hence best describe the overall motion of the molecule.[82-86] In GPGG, the NT1 values of the Cα carbons are 1.146 s (Gly-1), 0.995 s (Pro-2), 1.106 s (Gly-3), and 1.836 s (Gly-4) (Table SIII, Supporting Information). The terminal-residue Cα carbons of Gly-1 and Gly-4 show the largest values, which suggests additional intramolecular dynamics for these carbons compared with the mid-chain Cα carbons of Pro-2 and Gly-3. The minimum value of NT1 is observed for the Cα carbon of Pro-2; we have therefore used the T1 of this backbone carbon to determine the correlation time τc for the overall motion. The intramolecular motion most likely to influence the T1 value of this carbon is the pyrrolidine ring interconversion. However, as estimated previously, the jump angle Δθ is <5° for the Cα carbon of the pyrrolidine ring (see Table IX in Ref. ). Using Eqs. (8)–(11), it can be estimated that Δθ = 5° leads to only a ∼0.4% increase in the T1 value, which can therefore be neglected. From the T1 value of 995 ± 6 ms for the Cα carbon of Pro-2 in GPGG, measured at 298 K for a 57 mM solution in D2O, the correlation time τc is 48.2 ± 0.3 ps. This value was used in the analysis of the T1 value for the Cγ carbon of Pro-2 in GPGG to determine the correlation time τe for the intramolecular ring interconversion (see below).
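For reference, the sketch below evaluates the standard dipolar 1H–13C expression for T1 with a Lipari–Szabo spectral density, in which the internal term has the effective correlation time τ = (1/τc + 1/τe)−1, and inverts it by bisection. It is an assumed form rather than a transcription of Eqs. (8)–(11): the chemical shift anisotropy contribution is omitted, isotropic overall tumbling is assumed, the field is taken as 14.1 T, and all function names are hypothetical.

```python
import math

MU0_OVER_4PI = 1e-7          # T m A^-1 (SI)
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752218744e8     # rad s^-1 T^-1
GAMMA_C = 6.728284e7         # rad s^-1 T^-1
R_CH = 1.09e-10              # m, C-H bond length used in the text

def spectral_density(omega, tau_c, tau_e=None, s2=1.0):
    """Lipari-Szabo J(omega) (2/5 factor absorbed into the prefactor of t1_dipolar)."""
    j = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    if tau_e is not None and s2 < 1.0:
        tau = 1.0 / (1.0 / tau_c + 1.0 / tau_e)   # effective internal correlation time
        j += (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return j

def t1_dipolar(b0, n_h, tau_c, tau_e=None, s2=1.0):
    """13C T1 (s) from 1H-13C dipolar relaxation only; CSA is neglected."""
    w_h, w_c = GAMMA_H * b0, GAMMA_C * b0
    d2 = (MU0_OVER_4PI * GAMMA_H * GAMMA_C * HBAR / R_CH ** 3) ** 2

    def j(w):
        return spectral_density(w, tau_c, tau_e, s2)

    rate = n_h * d2 / 10.0 * (j(w_h - w_c) + 3.0 * j(w_c) + 6.0 * j(w_h + w_c))
    return 1.0 / rate

def _bisect(f, target, lo, hi, iters=100):
    """Solve f(x) = target for a function that decreases monotonically on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_tau_c(t1_obs, b0, n_h=1):
    """Overall tumbling time from a backbone C-alpha T1 (rigid-site assumption)."""
    return _bisect(lambda tc: t1_dipolar(b0, n_h, tc), t1_obs, 1e-12, 5e-10)

def solve_tau_e(t1_obs, b0, n_h, tau_c, s2):
    """Internal correlation time from a ring-carbon T1 with tau_c and S^2 fixed."""
    return _bisect(lambda te: t1_dipolar(b0, n_h, tau_c, te, s2), t1_obs, 1e-14, 1e-9)

tau_c = solve_tau_c(0.995, 14.1)                     # Pro-2 C-alpha, N = 1
tau_e = solve_tau_e(0.898, 14.1, 2, tau_c, 0.27)     # Pro-2 C-gamma, N = 2
print(f"tau_c = {tau_c * 1e12:.1f} ps, tau_e = {tau_e * 1e12:.1f} ps")
```

With these assumptions the sketch gives τc close to 48 ps and τe close to 30 ps for the GPGG values quoted in this section; small differences from 48.2 and 29.7 ps are to be expected because the CSA term is omitted.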
In Eq. (8), two terms are weighted by factors that depend on the populations of the Cγ-endo and Cγ-exo ring conformers (xendo and xexo, with xendo + xexo = 1) and on the jump angle Δθ of a given CH bond direction on changing the ring conformation. The largest jump angles are expected for the Cγ carbon of the pyrrolidine ring. Thus, the T1 relaxation times of the Cγ carbons (Tables SII and SIII, Supporting Information) were used for the τe determinations. Madi et al. determined Δθ values from dihedral angles, which they estimated using the Karplus relationship. Because the accuracy of the Karplus relationship for predicting dihedral angles is relatively poor, we have taken a different approach, in which QM-predicted geometries are used. This approach is supported by the finding that, in the absence of relatively strong intermolecular interactions, QM geometries accurately reproduce experimental molecular geometries derived from X-ray and neutron diffraction measurements. We used the two lowest-energy conformations of NAcPro from the M06-2X/def2-TZVP IEFPCM(H2O) calculations described above, the geometries of which were optimized without any restrictions. Additional frequency calculations were carried out to verify that the final structures correspond to true minima. The obtained structures correspond to the Cγ-endo and Cγ-exo conformations of the pyrrolidine ring, with P/χm values of 171.5°/39.3° and 16.5°/39.0°, respectively. As discussed previously,[82, 83] the most rigid part of the Pro ring in peptides is the C-N-Cα-C fragment, where the two C atoms are the carbonyl carbons of the COMe and COO groups in the case of NAcPro (see Fig. 5). We therefore overlaid the Cγ-endo and Cγ-exo conformations such that the rms deviation in the positions of the four atoms of the C-N-Cα-C fragment is minimal (Fig. S5, Supporting Information). The angle Δθ was then estimated as the angle between the corresponding Cγ-H bond directions in the two conformations.
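The overlay-then-measure procedure can be sketched with the Kabsch algorithm, an SVD-based least-squares superposition that is one standard way to minimize the rms deviation of the four C-N-Cα-C atoms; the text does not state which superposition algorithm was used. The sketch assumes NumPy, the function names are hypothetical, and any coordinates passed to it would come from the two optimized conformers.

```python
import numpy as np

def kabsch_rotation(p, q):
    """Rotation R such that (p - mean(p)) @ R.T best matches q - mean(q) (least squares)."""
    p0 = p - p.mean(axis=0)
    q0 = q - q.mean(axis=0)
    h = p0.T @ q0                                  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))         # guard against reflections
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

def jump_angle(frag_a, frag_b, bond_a, bond_b):
    """Angle (degrees) between a C-H bond vector of conformer A and of conformer B
    after superimposing the reference fragments (e.g. the C-N-C-alpha-C atoms).

    frag_a, frag_b: (4, 3) arrays of fragment coordinates in A and B.
    bond_a, bond_b: C-H vectors (H minus C position) in A and B.
    """
    r = kabsch_rotation(np.asarray(frag_a, float), np.asarray(frag_b, float))
    va = r @ np.asarray(bond_a, float)             # A's bond expressed in B's frame
    vb = np.asarray(bond_b, float)
    cosang = va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))
```

Only the rotation matters here, because bond vectors are coordinate differences and are unaffected by the translation part of the overlay. Computing the angle for both Cγ-H bonds and averaging gives a single Δθ, as done in the text.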
The values of Δθ determined for the Cγ-Hγ2 and Cγ-Hγ3 bonds were 82.65° and 82.47°, with an average value of 82.56°, which was used as a fixed value of Δθ in our fittings of the T1 relaxation times of Cγ carbons. The populations of the Cγ-endo and Cγ-exo ring conformers are known from the analysis of 3JHH coupling constants measured at 298 K (Table 2) and are assumed to be temperature independent. With these restrictions in place, the correlation time τe for the intramolecular ring interconversion process was determined from the T1 values measured for the Cγ carbons at different temperatures. Comparison of Eq. (8) above with Eq. (37) of Lipari and Szabo shows that, for the two-site jump model, the generalized order parameter depends on the populations of the conformers and on the jump angle Δθ, and can be calculated using the following relationship:
$$S^2 = 1 - 3\,x_{\mathrm{endo}}\,x_{\mathrm{exo}}\,\sin^2\Delta\theta$$
For xendo = 0.543 and Δθ = 82.56°, the calculated experimental value of S2 is 0.27.
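The two-site expression for S2 is easy to evaluate directly. A short sketch (the function name is hypothetical) that reproduces the experimental S2 values quoted in this section from xendo and Δθ:

```python
import math

def s2_two_site(x_endo, jump_angle_deg):
    """Generalized order parameter for a two-site jump: 1 - 3 x_endo x_exo sin^2(dtheta)."""
    x_exo = 1.0 - x_endo
    return 1.0 - 3.0 * x_endo * x_exo * math.sin(math.radians(jump_angle_deg)) ** 2

print(round(s2_two_site(0.543, 82.56), 2))   # GPGG        -> 0.27
print(round(s2_two_site(0.523, 82.56), 2))   # trans-VAPG  -> 0.26
print(round(s2_two_site(0.83, 82.56), 2))    # cis-VAPG    -> 0.58
```

Because 1 − S2 scales with xendo(1 − xendo), S2 is smallest at equal populations (0.25 for Δθ = 90°) and rises quickly as one conformer dominates.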
Using the measured T1 values for the Cα and Cγ carbons of Pro in GPGG for the 214 mM solution in D2O (Table SII, Supporting Information), the correlation times τc and τe were determined at different temperatures (Table SIV, Supporting Information). Assuming an Arrhenius dependence of the correlation times [τ = τ0 exp(Ea/RT)], the activation parameters for the pyrrolidine ring interconversion are Ea = 16.4 ± 1.2 kJ mol−1 and τe0 = (4.1 ± 1.6) × 10−14 s. To estimate the errors in the activation parameters, we excluded the two highest and the two lowest temperatures from consideration, which led to Ea values between 15.6 and 17.6 kJ mol−1 and τe0 values between 2.5 × 10−14 and 5.5 × 10−14 s. The estimated correlation time τe for the CγH bond movements in Pro-2 of GPGG resulting from the pyrrolidine ring interconversion is 27.2 ps at 303 K, in good agreement with the value of 27 ps reported by Mikhailov et al.
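A sketch of the Arrhenius analysis: a least-squares straight line through ln τ versus 1/T gives Ea from the slope and τ0 from the intercept. The function names are hypothetical. Evaluating the rounded parameters quoted above at 303 K gives about 27.5 ps, slightly above the quoted 27.2 ps, presumably because the reported Ea and τe0 are rounded.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1

def arrhenius_tau(temp_k, ea_kj, tau0_s):
    """Correlation time tau = tau0 * exp(Ea / (R T))."""
    return tau0_s * math.exp(ea_kj / (R * temp_k))

def fit_arrhenius(temps_k, taus_s):
    """Least-squares line ln(tau) = ln(tau0) + (Ea/R) * (1/T); returns (Ea in kJ/mol, tau0 in s)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(tau) for tau in taus_s]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope * R, math.exp(my - slope * mx)

print(f"{arrhenius_tau(303.0, 16.4, 4.1e-14) * 1e12:.1f} ps")   # 27.5 ps
```

Refitting after dropping the extreme temperatures, as described above, amounts to calling `fit_arrhenius` on the reduced temperature list.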
Our MD simulations were carried out at 298 K. Using the T1 value of 898 ± 4 ms for the Cγ carbon of Pro-2 in GPGG, measured at 298 K for the diluted 57 mM solution of GPGG in D2O, we estimated the correlation time τe for the CγH bond reorientations in Pro-2 resulting from the pyrrolidine ring interconversion as 29.7 ± 0.4 ps, which is slightly smaller than the value of 30.3 ps calculated from the activation parameters reported above for the 214 mM solution. As higher concentrations may in principle lead to partial self-association of the peptides, we used the experimental value of τe = 29.7 ps at 298 K as the reference point for our MD simulations. The values of τe calculated for the 14 parameter sets with a single non-zero V3 term (γ3 = 0°, Table 4) show a linear correlation (Fig. S6, Supporting Information): V3 (in kJ mol−1) = 1.9272 ln τe (in ps) − 2.1881 (r2 = 0.9975). Using this relationship, we estimate V3 = 4.3474 kJ mol−1 for τe = 29.7 ps. As a backward verification, the 800-ns MD simulation at 298 K with V3 = 4.3474 kJ mol−1 (γ3 = 0°) predicts τe = 28.7 ps and S2 = 0.29, in close agreement with the experimentally measured values of τe = 29.7 ps and S2 = 0.27. This parameter set (denoted as (25) in Tables 1–5) is selected as the final solution, which reproduces the experimental structural (Tables 2 and 3) and dynamic properties (Tables 3 and 4, Fig. S7) of the Pro sidechain significantly better than the original AMBER99SB force field.
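The calibration step amounts to inverting an empirical straight line in ln τe. A minimal sketch with the coefficients quoted in this section (function and constant names are hypothetical); the same helpers apply to the analogous Hyp relationship reported further below.

```python
import math

def v3_from_tau_e(tau_e_ps, slope, intercept):
    """Empirical calibration V3 = slope * ln(tau_e / ps) + intercept (V3 in kJ/mol)."""
    return slope * math.log(tau_e_ps) + intercept

def tau_e_from_v3(v3, slope, intercept):
    """Inverse of the same calibration: tau_e (ps) expected for a given V3."""
    return math.exp((v3 - intercept) / slope)

PRO_CHI2 = (1.9272, -2.1881)   # Pro chi2 fit, gamma3 = 0 (GPGG, 298 K)
HYP_CHIH = (3.6404, -10.555)   # Hyp chi_h fit, gamma3 = 30 deg (AHM, 298 K)

print(f"{v3_from_tau_e(29.7, *PRO_CHI2):.3f}")   # 4.347
print(f"{v3_from_tau_e(82.6, *HYP_CHIH):.3f}")   # 5.514
```

The small difference from the quoted 4.3474 kJ mol−1 in the fourth decimal presumably reflects rounding of the published coefficients.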
Table 4. Intramolecular Autocorrelation Times τe (in ps) and Order Parameters S2 for the CγH Bond Reorientations of Pro in GPGG as Predicted by 800-ns MD simulations
In another set of optimizations, we considered variations of both the V3 force constant and the phase γ3. The value of V3 was varied between 1 and 5 kJ mol−1 in steps of 1 kJ mol−1, while γ3 was varied between −50° and 50° in steps of 10°. The results of 700-ns MD simulations for each pair of V3 and γ3 values are summarized in Tables SV–SVIII in the Supporting Information. Across the four properties considered (rmsJp, xendo, τe and S2), the force field with V3 = 4.0 kJ mol−1 and γ3 = 0° shows the best agreement with experiment. This additional grid search confirms that the solution obtained above (V3 = 4.3474 kJ mol−1, γ3 = 0°) lies in the only region of best agreement found in the sampled two-dimensional (V3, γ3) parameter space.
Influence on the backbone conformation
To examine the influence of the new sidechain parameter set on protein backbone conformations and dynamics, we carried out 1-μs MD simulations of ubiquitin. The three Pro residues of ubiquitin, Pro-19, Pro-37, and Pro-38, were considered, and their conformational characteristics are compared in Table SIX (Supporting Information). Compared to the original force field, parameter set (25) leads to higher χm values (38.8–39.5°), in better agreement with experimental XRD data.[48, 88] In particular, the solid-state values of χm are 42.5° (Pro-19), 44.2° (Pro-37), and 45.2° (Pro-38).
Unlike Pro-19 and Pro-37, the pyrrolidine ring of Pro-38 in ubiquitin is predominantly in the Cγ-exo conformation according to MD simulations (Table SIX), which is in agreement with the finding that in Xaa-Yaa-Gly triplets of collagen the Pro ring prefers the endo pucker (i.e., the Cγ-endo conformation) in the X position, while in the Y position it prefers the exo pucker.[89, 90] In principle, this could be verified experimentally by measuring accurate values of the 3JHH couplings of the pyrrolidine rings in ubiquitin. However, the pyrrolidine ring protons usually give strongly coupled 1H NMR spectra because of the small chemical shift differences between the methylene protons in the β and γ positions. Accurate measurements of the JHH couplings would therefore require a full lineshape analysis, which is complicated by strongly overlapping spectra in the case of proteins.
The values of Nχ2 for the ubiquitin prolines are in good agreement with those predicted for the Pro residue in GPGG, although the number of χ2 transitions decreases significantly for Pro-38, which is likely caused by the preceding Pro-37 residue. We compared the three experimental 3J(C′,Hα) couplings of 1.22 Hz (Pro-19), 1.71 Hz (Pro-37), and 1.06 Hz (Pro-38) in ubiquitin[37, 38] with those calculated from MD simulations of ubiquitin using Karplus parameters derived empirically and from DFT B3LYP/EPR-III calculations. Compared to the AMBER99SB*-ILDN calculations, parameter set (25) leads to only small variations in the 3J values (Table SIX, Supporting Information). This result confirms that the changes in the sidechain dynamics, which move the Cγ atom between positions below and above the Cα-N-Cδ plane, cause only small changes in the torsional angle Hα-Cα-N-C (Fig. 5 and Fig. S5).
Finally, the performance of the parameter sets AMBER99SB*-ILDN and (25) was compared using experimental values of five different types of backbone 3J couplings, each of which has been determined for 60–67 amino acid residues in ubiquitin.[37, 38] In calculating the MD-averaged 3J couplings, we considered up to four different sets of Karplus parameters for each type of 3J coupling.[49, 50] From the results summarized in Table SX (Supporting Information), both force fields reproduce the 3J couplings equally well, confirming that the new Pro torsion potential does not cause undesirable side effects on backbone conformations compared to the original force field, the performance of which has been verified extensively.[3, 4, 14, 15, 19-33]
Force field validation
As an independent test, we used NMR data and MD simulations of Val-Ala-Pro-Gly (VAPG). In Table 5, we compare the conformational populations and geometries of the Pro ring in VAPG in water as determined by NMR and predicted by 800-ns MD simulations. The rmsJp values relative to the experimental values of ten 3JHH couplings show that the new force field (25) reproduces the experimentally measured values better than the original force field. The value of χm serves as a measure of the non-planarity of the five-membered ring, and the results confirm that the new force field (25) leads to significantly improved agreement with experiment compared to the original AMBER99SB force field.
Table 5. Conformational Populations and Geometries of the Pro ring in Aqueous Solutions of Peptides from NMR and MD Simulations Using Different Sets of Torsional Parameters for the Pro residue
1.5 μs MD simulations for angiotensin II and 800 ns for other peptides were analyzed.
The rms deviation for NMR is for fittings of experimental 3JHH values using Eqs. (8C) and (8D) of Haasnoot et al., on the assumption of a two-site conformational exchange between the Cγ-endo and Cγ-exo conformers with χmendo = χmexo.
The values and uncertainties were determined using T1=386 ± 12 ms for 13Cγ of Pro-7. From M06-2X/def2-TZVP calculations of GPF, the jump angle Δθ was 83.16°.
In terms of motional dynamics, the 800-ns MD simulations at 298 K using the original AMBER99SB force field predict a correlation time of 4.2 ps and a generalized order parameter of 0.35 for the Pro ring interconversion. The predicted value of τe differs significantly from the value measured experimentally in this work from T1(13C) values at 298 K (Table SXI, Supporting Information): 30.7 ± 0.5 ps for the 77 mM solution of VAPG in H2O:D2O (9:1). For xendo = 0.523 and Δθ = 82.56°, the estimated experimental value of S2 is 0.26. Note that in VAPG, the NT1 values of the Cα carbons are 0.751 s (Val-1), 0.614 s (Ala-2), 0.641 s (Pro-3) and 1.142 s (Gly-4) (Table SXI, Supporting Information). Judging by the NT1 values, the Cα site of Ala is least affected by intramolecular motions; the T1 value of this carbon was therefore used to determine the correlation time for the overall molecular motion (τc = 82.8 ± 0.7 ps). The corresponding values predicted by the new force field are τe = 28.6 ps and S2 = 0.31, in good agreement with experiment.
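On the simulation side, S2 and τe can be estimated from the CγH unit-vector trajectory once overall tumbling has been removed (for example, by fitting every frame onto a reference structure). The sketch below assumes such pre-aligned unit vectors: S2 is taken from the plateau formula S2 = (3/2)Σij⟨uiuj⟩² − 1/2, and τe as the integral of the normalized decay of the internal correlation function over the computed lag window. This integral definition is one common choice and may differ from the procedure behind the MD values quoted in this section; all names are hypothetical.

```python
def p2(x):
    """Second-order Legendre polynomial."""
    return 1.5 * x * x - 0.5

def order_parameter(vectors):
    """Plateau value S^2 = 3/2 * sum_ij <u_i u_j>^2 - 1/2 for unit vectors
    from which overall tumbling has already been removed."""
    n = len(vectors)
    total = 0.0
    for i in range(3):
        for j in range(3):
            m_ij = sum(v[i] * v[j] for v in vectors) / n
            total += m_ij * m_ij
    return 1.5 * total - 0.5

def internal_acf(vectors, max_lag):
    """C_I(k) = <P2(u(t) . u(t + k))>, averaged over all time origins t."""
    acf = []
    for lag in range(max_lag + 1):
        vals = [p2(sum(a * b for a, b in zip(vectors[t], vectors[t + lag])))
                for t in range(len(vectors) - lag)]
        acf.append(sum(vals) / len(vals))
    return acf

def tau_e_from_acf(acf, s2, dt_ps):
    """Integral of (C_I - S^2) / (1 - S^2) over the lag window (trapezoidal rule), in ps."""
    f = [(c - s2) / (1.0 - s2) for c in acf]
    return dt_ps * (sum(f) - 0.5 * (f[0] + f[-1]))
```

For a synthetic two-state jump between vectors 82.56° apart with equal populations, the plateau value approaches 1 − 3(0.5)(0.5)sin²Δθ ≈ 0.26, matching the two-site formula used for the experimental S2.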
Although we have primarily focused on force field optimizations for the trans rotamer about the bond preceding the Pro residue, it is of interest to verify whether the new force field offers any improvement for the cis rotamer compared to the original force field. In the case of cis-VAPG (with the cis-orientation of the CH2 group of Gly and the CO group of Pro), the 3JHH couplings predicted by the new force field (25) agree better with the experimental values than those of the original force field, as judged by the rmsJp values of 1.00 Hz and 1.32 Hz for force fields (25) and AMBER99SB, respectively. However, the agreement with experiment is not as good as for trans-VAPG considered above, because the new force field predicts a lower population of the Cγ-endo conformer (74%, as opposed to the experimental value of 83%). This difference in the predicted population is further amplified in the predicted value of S2 = 0.41 (experimental value 0.58), as 1 − S2 is proportional to the product of xendo and (1 − xendo). At the same time, the value of τe = 20.9 ps predicted by the new force field is in good agreement with the experimental value of 22 ± 2 ps. For comparison, the values of S2 and τe predicted by AMBER99SB are 0.41 and 3.3 ps, respectively.
Replacing the residue following Pro with Phe has been shown to increase the population of the Cγ-endo conformer. We have re-determined the conformational characteristics of the Pro residue in Gly-Pro-Phe (GPF) using the experimental values of all ten 3JHH couplings reported by Anteunis et al. and the least-squares fitting procedure described previously. The results summarized in Table 5 confirm that the content of the Cγ-endo conformer is higher in GPF (xendo = 68.0%) than in GPGG and VAPG. However, the change is not as large as previously predicted (xendo = 85%) using the Karplus relations of Pogliani et al. In Table 5, we compare the conformational populations and geometries of the Pro ring in GPF in water from 800-ns MD simulations and experiment. As in the case of the tetrapeptide VAPG above, the rmsJp values relative to the experimental values of ten 3JHH couplings show that the new force field (25) reproduces the experimentally measured values better than the original force field. The higher values of xendo and χm compared to the original force field are also in better agreement with experiment (Table 5).
We have also analyzed NMR data and MD simulations of the octapeptide angiotensin II (Asp-Arg-Val-Tyr-Ile-His-Pro-Phe, Fig. S8 in Supporting Information). After initial assignment of the peaks in the 1H and 13C spectra of a 16 mM solution of angiotensin II in D2O using 2D NMR spectra (Tables SXII and SXIII, Supporting Information), a full 1H NMR lineshape analysis was carried out to determine the vicinal 3JHH couplings of the Pro-7 sidechain (Fig. S9 and Table SXIV, Supporting Information), which were subsequently analyzed to estimate the conformational characteristics of the pyrrolidine ring of Pro-7 in angiotensin II. In addition, 13C spin-lattice relaxation times were measured at 298 K (Table SXV in Supporting Information), which allowed S2 and τe to be determined. As in the case of GPGG and VAPG discussed above, the T1 values of the backbone Cα carbons clearly decrease towards the mid-chain residues:

Residue:   Asp-1  Arg-2  Val-3  Tyr-4  Ile-5  His-6  Pro-7  Phe-8
T1 (ms):   520    355    347    310    324    327    372    448
The minimum value of T1 observed for the Cα carbon of Tyr-4 suggests that this site is least affected by intramolecular motions. It is therefore best suited for determining the correlation time τc of the overall molecular motion. From Eqs. (8)–(11), the value of τc corresponding to T1 = 310 ± 3 ms is 246 ± 6 ps. This value was used in the analysis of the T1 value for the Cγ carbon of Pro-7 in angiotensin II to determine the correlation time τe for the intramolecular ring interconversion (see below).
To estimate the jump angle Δθ in angiotensin II, we used M06-2X/def2-TZVP calculations of GPF, in which Phe follows Pro as in angiotensin II. After overlaying the Cγ-endo and Cγ-exo conformations of GPF such that the rms deviation in the positions of the four atoms of the C-N-Cα-C fragment is minimal, the jump angle Δθ was determined as 83.16° (82.97° for Cγ-Hγ2 and 83.34° for Cγ-Hγ3), which was used as a fixed value of Δθ in our fittings of the T1 relaxation data.
In Table 5, we compare conformational populations and geometries of the pyrrolidine ring of angiotensin II in water determined by NMR and by 1500-ns long MD simulations. The rmsJp values relative to experimental values of 10 3JHH-couplings show that the new force field (25) with rmsJp = 1.03 Hz reproduces the experimentally measured values better than the original force field with rmsJp = 1.32 Hz. For the pseudorotation amplitude χm, the results confirm that the new force field (25) leads to significantly improved agreement (χm = 38.8°) with experiment (χm = 42° ± 2°) compared to AMBER99SB (χm = 35.5°). Regarding motional dynamics (Table 5), the timescale of motion is reproduced significantly better by the new force field (25). The corresponding values of τe are 8.4, 33.1 and 32 ± 4 ps for AMBER99SB, the new force field (25) and experiment, respectively.
Finally, the experimental overall and internal correlation times (τc/τe) were 48.2 ps/29.7 ps in GPGG, 82.8 ps/30.7 ps in VAPG and 246 ps/32 ps in angiotensin II. Despite the fivefold increase in the correlation time of the overall motion, the timescale of the internal motion remains essentially unchanged in these peptides of varying size. Thus, it is likely that the overall molecular motion and the intramolecular dynamics of the Pro ring are independent in the peptides considered.
Force field parameters of hydroxyproline
Together with Pro and Gly, the 4-hydroxy-L-proline residue (Hyp) is one of the main building blocks of collagen,[89, 90, 93] although it is not among the 20 standard amino acids. In the GROMACS implementation of AMBER99SB, the force field parameters of Mooney et al. are used for the N-Cδ-Cγ-O torsion of Hyp, although the reparameterization by Park et al. has been shown to reproduce the experimentally observed preference of Hyp for the Cγ-exo over the Cγ-endo conformer better than that of Mooney et al. Our MD simulations of Ace-Hyp-NHMe (AHM, Fig. 6) agree with these findings (Table 6). The predicted population of the Cγ-endo conformer is 51.4% with the parameters of Mooney et al., while the smaller value of 6.7% predicted with the Hyp parameters of Park et al. is in good agreement with the experimental value of 12%. Similarly, the experimental 3JHH couplings of the Hyp ring are better reproduced by the parameters of Park et al. (rmsJp = 1.05 Hz) than by those of Mooney et al. (rmsJp = 2.72 Hz). However, both parameter sets yield χm values that indicate flattened ring geometries compared to experiment (Table 6). Furthermore, the motional characteristics of the ring dynamics predicted by both parameter sets are in sharp contrast with experiment, showing significantly higher frequencies of ring interconversion. In particular, the correlation times of the ring interconversions (τe) are 7.8 ps (Mooney et al.), 1.5 ps (Park et al.) and 82.6 ps (experiment).
Table 6. Conformational Populations and Geometries of the Hyp Ring in AHM in Water from NMR and 1.5-μs Long MD Simulations Using Various Sets of Torsional Parameters for the Hyp Residue
Apart from the original AMBER99SB force fields using the Hyp force field parameters of Mooney et al. and Park et al., all other models use V3=4.3474 kJ mol−1 (γ3 = 0°) for the endocyclic CCCC (χ2) torsion of the Hyp residue of AHM.
The modified Hyp force field parameters of Park et al. were used as a Ryckaert–Bellemans function with C0 = 0.6527 kJ mol−1 and C2 = 12.46832 kJ mol−1.
The rms deviation for NMR is for fittings of experimental 3JHH values using Eqs. (8C) and (8D) of Haasnoot et al., assuming a two-site exchange between the Cγ-endo and Cγ-exo conformers and χmendo = χmexo.
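Because the Hyp parameters of Park et al. are given in Ryckaert–Bellemans form, it can help to see them as an ordinary periodic term. Assuming the GROMACS convention VRB = Σn Cn cosⁿψ with ψ = φ − 180°, the identity cos²ψ = (1 + cos 2φ)/2 turns C0 + C2 cos²ψ into C0 + (C2/2)[1 + cos 2φ], i.e. a twofold term of about 6.23 kJ mol−1 (phase 0°) plus a constant. The sketch checks this numerically; the function names are hypothetical.

```python
import math

def ryckaert_bellemans(phi_deg, coeffs):
    """GROMACS-style RB dihedral: sum_n C_n cos^n(psi), psi = phi - 180 deg (kJ/mol)."""
    cos_psi = math.cos(math.radians(phi_deg - 180.0))
    return sum(c_n * cos_psi ** n for n, c_n in enumerate(coeffs))

C_PARK = [0.6527, 0.0, 12.46832]   # C0, C1, C2 in kJ/mol, from the table note

def periodic_equivalent(phi_deg):
    """Same potential rewritten as C0 + (C2/2) * [1 + cos(2*phi)]."""
    c0, c2 = C_PARK[0], C_PARK[2]
    return c0 + 0.5 * c2 * (1.0 + math.cos(math.radians(2.0 * phi_deg)))

for phi in (0.0, 60.0, 90.0, 180.0):
    print(phi, round(ryckaert_bellemans(phi, C_PARK), 4), round(periodic_equivalent(phi), 4))
```

Written this way, the Park et al. term is a twofold potential on the N-Cδ-Cγ-O torsion, whereas the χh parameters optimized below use a threefold V3 term with a phase.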
We have optimized the force field parameters for the hydroxyproline N-Cδ-Cγ-O torsional angle (denoted χh) to better match the dynamic characteristics of the Hyp sidechain. The new force field (25) for the CCCC (χ2) torsion was kept fixed (V3 = 4.3474 kJ mol−1 and γ3 = 0°) in these optimizations for the Hyp residue. In the original AMBER99SB force field, V3 = 0.65084 kJ mol−1 and γ3 = 0° for the hydroxyproline N-Cδ-Cγ-O (χh) torsion. Initially, 1.5-μs MD simulations were carried out in which the value of V3 for χh was gradually increased (Table 6). These showed that the population xendo approaches the experimental value only at very high values of V3 (see Table 6), for which even 1.5-μs MD simulations may not be sufficient for convergence of the predicted population.
Similar to the Pro residue considered above, we used QM calculations to fit the χh parameters of Hyp. M06-2X/def2-TZVP IEFPCM(water) calculations were carried out for 26 conformers of AHM in which the N-Cδ-Cγ-O dihedral angle was varied in 5° steps between 52.8° and 177.8°. Simulated annealing was used to minimize the merit function Φ [Eq. (6)] as a function of θ = χh by varying V3 (with γ3 = 0°) and k0 [Eq. (7)]. This gave V3 = 5.5574 kJ mol−1, with only a small improvement in Φ (0.44 kcal mol−1) compared to the original force field with the Hyp parameters of Mooney et al. (0.46 kcal mol−1). The QM-optimized value is close to V3 = 5.7 kJ mol−1 in Table 6, which predicts a much higher xendo than observed experimentally. Therefore, no new MD simulations were carried out with this value.
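Eqs. (6) and (7) are not reproduced here, so the following is only a sketch under stated assumptions: Φ is taken to be the rms deviation between QM and MM energies along the scan, and the MM energy is written as E_rest(θ) + (V3/2)[1 + cos 3θ] + k0, where E_rest is the MM energy with the fitted term switched off. With γ3 fixed at 0°, this model is linear in V3 and k0, so ordinary least squares (a swapped-in technique, not the simulated annealing used in the text) finds the same minimum in closed form.

```python
"""Sketch: fitting V3 and an energy offset k0 to a QM torsion scan.

Assumptions: Phi is the rms deviation between QM and MM energies; the MM
energy is E_rest(theta) + (V3/2)[1 + cos(3*theta)] + k0 with gamma3 = 0;
E_rest is the MM energy with the fitted term switched off. The model is
linear in V3 and k0, so ordinary least squares replaces simulated annealing.
All energies must share one unit (e.g. kJ/mol).
"""
import math
from typing import Sequence, Tuple


def fit_v3_k0(theta_deg: Sequence[float], e_qm: Sequence[float],
              e_rest: Sequence[float]) -> Tuple[float, float, float]:
    """Return (V3, k0, Phi) minimising the rms QM-MM deviation."""
    f = [0.5 * (1.0 + math.cos(math.radians(3.0 * t))) for t in theta_deg]
    y = [q - r for q, r in zip(e_qm, e_rest)]  # target for V3*f + k0
    n = len(f)
    fm, ym = sum(f) / n, sum(y) / n
    sxx = sum((a - fm) ** 2 for a in f)
    sxy = sum((a - fm) * (b - ym) for a, b in zip(f, y))
    v3 = sxy / sxx
    k0 = ym - v3 * fm
    phi = math.sqrt(sum((v3 * a + k0 - b) ** 2 for a, b in zip(f, y)) / n)
    return v3, k0, phi
```

On a synthetic 26-point scan from 52.8° to 177.8° built with V3 = 5.5574 and k0 = 1.2, the function returns those values with Φ ≈ 0; with real QM data, Φ would be nonzero and its value depends on the exact definition in Eq. (6).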
In a new set of optimizations, we varied both the force constant V3 and its phase γ3. The results of 600-ns MD simulations for each pair of V3 and γ3 values are summarized in Tables SXVI–SXIX (Supporting Information). Across the four properties considered (rmsJp, xendo, τe and S2), the force field with V3 = 5.3 kJ mol−1 and γ3 = 30° shows the best agreement with experiment. Spin-lattice relaxation time measurements for a 59 mM solution of AHM in D2O at 298 K give τc = 32.8 ± 0.5 ps, τe = 82.6 ± 2.8 ps and S2 = 0.69 ± 0.01 (the full NMR data for AHM are given in Tables SXX–SXXII, Supporting Information). The τe values obtained with V3 = 4.3, 5.3, and 6.3 kJ mol−1 at γ3 = 30° follow a satisfactory linear relationship between V3 and ln τe: V3 (in kJ mol−1) = 3.6404 ln τe (in ps) − 10.555 (r2 = 0.9968). Using this relationship, we estimate V3 = 5.5138 kJ mol−1 for the experimental value of τe = 82.6 ps. This value of V3, together with the phase γ3 = 30°, was used in our further verifications (referred to as parameter set (h13)). A 1.5-μs MD simulation using parameter set (h13) for χh of Hyp (with force field (25) for the χ2 potential) confirmed the improved parameterization of the χh potential: it gives S2 = 0.69 and τe = 77.6 ps, compared with S2 = 0.34 and τe = 7.8 ps for the original AMBER99SB force field and the experimental values S2 = 0.69 and τe ≈ 83 ps (Table 7). The predicted xendo population of 9.6% is also in close agreement with the experimental value of 11.9%. In addition, χm increases from 35.0° for AMBER99SB to 39.5° for (h13), closer to the experimental estimate of 42° ± 2°. As expected, these improvements are reflected in a considerable reduction of rmsJp, from 2.72 Hz for AMBER99SB with the Hyp parameters of Mooney et al. to 0.62 Hz for model (h13).
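The interpolation step above can be checked directly from the reported coefficients. The sketch below uses a = 3.6404 and b = −10.555 from the text (V3 in kJ mol−1, τe in ps); its inverse shows the Arrhenius-like exponential growth of τe with V3. The generic helper `fit_log_linear` is illustrative: its inputs would be the (V3, τe) pairs from the MD runs, which are tabulated in the Supporting Information and not reproduced here.

```python
"""Sketch: using the reported fit V3 = a*ln(tau_e) + b to choose V3.

a = 3.6404 and b = -10.555 are the coefficients quoted in the text for
gamma3 = 30 deg (V3 in kJ/mol, tau_e in ps). fit_log_linear is a generic
least-squares helper; its inputs would be the MD (V3, tau_e) pairs.
"""
import math
from typing import Sequence, Tuple


def v3_for_tau(tau_ps: float, a: float = 3.6404, b: float = -10.555) -> float:
    """Force constant predicted to give correlation time tau_ps."""
    return a * math.log(tau_ps) + b


def tau_for_v3(v3: float, a: float = 3.6404, b: float = -10.555) -> float:
    """Inverse relation: tau_e grows exponentially with V3 (Arrhenius-like)."""
    return math.exp((v3 - b) / a)


def fit_log_linear(v3s: Sequence[float],
                   taus: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of V3 = a*ln(tau) + b; returns (a, b, r^2)."""
    x = [math.log(t) for t in taus]
    n = len(x)
    xm, ym = sum(x) / n, sum(v3s) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, v3s))
    syy = sum((yi - ym) ** 2 for yi in v3s)
    a = sxy / sxx
    return a, ym - a * xm, sxy * sxy / (sxx * syy)
```

Evaluating `v3_for_tau(82.6)` gives about 5.514 kJ mol−1, matching the value of 5.5138 kJ mol−1 quoted above.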
Table 7. Conformational Populations and Geometries of the Hyp Ring in AHM and AHG in Water from NMR and 1500-ns MD Simulations Using Different Sets of Torsional Parameters for the Hyp Residue
Apart from the original AMBER99SB force field with the Hyp parameters of Mooney et al. or Park et al., all other models use V3 = 4.3474 kJ mol−1 (γ3 = 0°) for the endocyclic CCCC (χ2) torsion of the Hyp residue.
The modified Hyp force field parameters of Park et al. were used as a Ryckaert–Bellemans function with C0 = 0.6527 kJ mol−1 and C2 = 12.46832 kJ mol−1.
The rms deviation for NMR refers to fits of the experimental 3JHH values assuming a two-site exchange between the Cγ-endo and Cγ-exo conformers and χm(endo) = χm(exo).
The values and uncertainties were determined from T1 of 13Cγ of Hyp in 59 mM D2O solutions. From M06-2X/aug-cc-pVTZ calculations of AHM, the jump angle Δθ used to determine S2 and τe in AHM and AHG was 82.64°. The τc values determined from T1 of 13Cα of Hyp were 32.8 ± 0.5 ps for AHM and 43.5 ± 0.6 ps for AHG.
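A jump angle enters S2 through a two-site jump model. The sketch below assumes that the C–H vector jumps between two orientations separated by Δθ with populations xendo and 1 − xendo, so that S2 = 1 − 3 p1 p2 sin²Δθ and the internal correlation function decays with 1/τe equal to the sum of the forward and backward rates. It illustrates the model only and is not presented as the exact procedure used for Table 7.

```python
"""Sketch: order parameter and rates for a two-site jump model.

Assumes the C-H vector jumps between two orientations separated by
dtheta with populations x_endo and 1 - x_endo, and that the internal
correlation function decays as S2 + (1 - S2) exp(-t / tau_e), where
1/tau_e = k_endo_to_exo + k_exo_to_endo.
"""
import math
from typing import Tuple


def two_site_s2(x_endo: float, dtheta_deg: float) -> float:
    """S2 = 1 - 3 p1 p2 sin^2(dtheta) for a two-site jump."""
    p1, p2 = x_endo, 1.0 - x_endo
    return 1.0 - 3.0 * p1 * p2 * math.sin(math.radians(dtheta_deg)) ** 2


def two_site_rates(x_endo: float, tau_e_ps: float) -> Tuple[float, float]:
    """Return (k_endo_to_exo, k_exo_to_endo) in ps^-1 from x_endo and tau_e."""
    k_sum = 1.0 / tau_e_ps
    # Detailed balance: x_endo * k_endo_to_exo = (1 - x_endo) * k_exo_to_endo
    return (1.0 - x_endo) * k_sum, x_endo * k_sum
```

For instance, xendo = 0.119 and Δθ = 82.64° give S2 ≈ 0.69 under this model.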
Further independent validation of the hydroxyproline parameters was carried out using 1.5-μs MD simulations of N-acetyl-4-hydroxy-L-proline-glycine (Ace-Hyp-Gly, AHG, Fig. 6; full NMR data are given in Tables SXX–SXXII, Supporting Information). The new force field (h13) for the χh torsion, together with force field (25) for the endocyclic χ2 torsion, shows much better agreement with experiment than the original AMBER99SB force field (Table 7). The value of χm increases to 39.6°, from 35° and 34.6° for AMBER99SB with the Hyp parameters of Mooney et al. and Park et al., respectively; for comparison, χm = 42° ± 2° from analysis of the experimental NMR data. The predicted xendo also agrees better with experiment: the new force field gives 9.4%, compared with the experimental value of 13.8% ± 0.5%. This is also reflected in the reduced rmsJp value of 0.64 Hz (Table 7). By far the largest improvement is in the dynamic characteristics of the hydroxyproline ring interconversion. For example, the original force field with the Hyp parameters of Mooney et al. predicts τe = 8.7 ps and S2 = 0.34, whereas the experimental values are τe = 80 ± 4 ps and S2 = 0.65 ± 0.01. The new force field predicts τe = 79.9 ps and S2 = 0.70, in quantitative agreement with experiment and significantly better than the original force field (Table 7).
We propose a new approach to force field optimization that aims to reproduce experimental dynamic characteristics in biomolecular MD simulations, in addition to improving the prediction of motionally averaged structural properties available from experiment. As the source of experimental data for the dynamics fits, we use 13C NMR spin-lattice relaxation times T1 of various backbone and sidechain carbon atoms, which allow the correlation times of both overall molecular reorientation and intramolecular motions to be determined selectively. For the relative conformational stabilities and structural fits, we use motionally averaged experimental values of three-bond NMR couplings (3J). The proline residue and its derivative 4-hydroxyproline, with their relatively simple structures and sidechain dynamics, were chosen to assess the new approach in this work. Initially, the grid search and simplex-based MD simulations identified a large number of parameter sets that fit the experimental J couplings equally well. Using the Arrhenius-type exponential relationship between the force constant and the correlation time, the MD data for a series of parameter sets were analyzed to determine the force constant that best reproduces the experimental timescale of the sidechain dynamics. Verification of the new force field parameters against NMR J couplings and correlation times showed consistent and significant improvements over the original force field in reproducing both structural and dynamic properties. These results suggest that matching experimental timescales of motion together with motionally averaged properties is a valid and robust approach to force field parameter optimization. Such a comprehensive approach is not restricted to the cyclic proline and 4-hydroxyproline residues and can be extended to the sidechain structure and dynamics of other amino acid residues, as well as to the protein backbone.
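To show how τc, τe and S2 connect to a measured 13C T1, the sketch below uses the Lipari–Szabo model-free spectral density with purely dipolar relaxation by directly bonded protons. Several inputs are assumptions not taken from this work: chemical shift anisotropy is neglected, rCH = 1.09 Å, and the field of 14.1 T is hypothetical. Fitting τc, τe and S2 to measured T1 values would invert this forward model; the two-site jump analysis described above additionally constrains S2.

```python
"""Sketch: 13C T1 from the Lipari-Szabo model-free spectral density.

Assumptions: purely dipolar relaxation by n_h directly bonded protons
(CSA neglected), r_CH = 1.09 Angstrom, and a hypothetical 14.1 T field.
"""
import math

MU0_OVER_4PI = 1e-7          # T^2 m^3 J^-1
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752218744e8     # rad s^-1 T^-1
GAMMA_C = 6.728284e7         # rad s^-1 T^-1


def spectral_density(omega: float, tau_c: float, tau_e: float, s2: float) -> float:
    """Model-free J(omega) in s; tau_c and tau_e in s."""
    tau = 1.0 / (1.0 / tau_c + 1.0 / tau_e)   # effective internal time
    return 0.4 * (s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
                  + (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2))


def carbon_t1(tau_c_ps: float, tau_e_ps: float, s2: float, n_h: int = 1,
              b0_tesla: float = 14.1, r_ch: float = 1.09e-10) -> float:
    """Dipolar 13C T1 (s) for a carbon with n_h attached protons."""
    tau_c, tau_e = tau_c_ps * 1e-12, tau_e_ps * 1e-12
    w_h, w_c = GAMMA_H * b0_tesla, GAMMA_C * b0_tesla
    d = MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_C / r_ch ** 3

    def j(w: float) -> float:
        return spectral_density(w, tau_c, tau_e, s2)

    r1 = n_h * d ** 2 / 4.0 * (j(w_h - w_c) + 3.0 * j(w_c) + 6.0 * j(w_h + w_c))
    return 1.0 / r1
```

Because small molecules such as AHM tumble fast (ωτc ≪ 1 here), T1 is nearly field independent in this regime, and the dependence on S2 and τe is what makes the internal motion accessible.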
In cases more complex than the Pro or Hyp sidechain dynamics, QM methods may also prove successful in providing information regarding the barrier heights of conformational changes, especially when the interpretation of the NMR relaxation data is not straightforward.
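One way QM barrier heights could be compared with NMR timescales is through transition-state theory. The sketch below assumes the Eyring equation with a transmission coefficient of 1, a free-energy barrier in kJ mol−1 and a temperature in K; it is a rough conversion, not part of the procedure used in this work, and the function names are illustrative.

```python
"""Sketch: Eyring relation between a free-energy barrier and a jump rate.

Assumes transition-state theory with a transmission coefficient of 1;
barrier in kJ/mol, temperature in K, rate in s^-1.
"""
import math

KB = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
R = 8.314462618e-3     # gas constant, kJ mol^-1 K^-1


def eyring_rate(dg_kj: float, temp: float = 298.0) -> float:
    """Rate constant (s^-1) for a free-energy barrier dg_kj (kJ/mol)."""
    return KB * temp / H * math.exp(-dg_kj / (R * temp))


def barrier_from_rate(k: float, temp: float = 298.0) -> float:
    """Inverse of eyring_rate: barrier (kJ/mol) implied by rate k (s^-1)."""
    return -R * temp * math.log(k * H / (KB * temp))
```

For a two-site process, a rate extracted from τe and the populations could be converted to an apparent barrier with `barrier_from_rate` and compared with a QM estimate, keeping in mind that solvent friction and recrossing make the unit transmission coefficient an approximation.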
The authors thank University College London (UCL) for the provision of computational facilities. The authors acknowledge the use of the UCL Legion High Performance Computing Facility (Legion@UCL), and associated support services, in the completion of this work. The work presented here made use of the Emerald High Performance Computing facility. The Center is owned and operated by the e-Infrastructure South Consortium formed from the universities of Bristol, Oxford, Southampton and UCL in partnership with STFC Rutherford Appleton Laboratory. Prof Ad Bax is thanked for the provision of the details of their J-coupling measurements.[37, 38] Dr. Viktor Hornak is thanked for the details of their force field optimizations. Helpful and stimulating suggestions by the reviewers are gratefully acknowledged.