On the Use of Side‐Chain NMR Relaxation Data to Derive Structural and Dynamical Information on Proteins: A Case Study Using Hen Lysozyme

Abstract Values of S2CH and S2NH order parameters derived from NMR relaxation measurements on proteins cannot be used straightforwardly to determine protein structure because they cannot be related to a single protein structure, but are defined in terms of an average over a conformational ensemble. Molecular dynamics simulation can generate a conformational ensemble and thus can be used to restrain S2CH and S2NH order parameters towards experimentally derived target values S2CH (exp) and S2NH (exp). Application of S2CH and S2NH order‐parameter restraining MD simulation to bond vectors in 63 side chains of the protein hen egg white lysozyme using 51 S2CH (exp) target values and 28 S2NH (exp) target values shows that a conformational ensemble compatible with the experimentally derived data can be obtained by using this technique. It is observed that S2CH order‐parameter restraining of C−H bonds in methyl groups is less reliable than S2NH order‐parameter restraining because of the possibly less valid assumptions and approximations used to derive experimental S2CH (exp) values from NMR relaxation measurements and the necessity to adopt the assumption of uniform rotational motion of methyl C−H bonds around their symmetry axis and of the independence of these motions from each other. The restrained simulations demonstrate that side chains on the protein surface are highly dynamic. Any hydrogen bonds they form and that appear in any of four different crystal structures, are fluctuating with short lifetimes in solution.


On the Use of Side-Chain NMR Relaxation Data to Derive Structural and Dynamical Information on Proteins: A Case Study Using Hen Lysozyme
Lorna J. Smith, [a] Wilfred F. van Gunsteren, [b] and Niels Hansen* [c] Values of S 2 CH and S 2 NH order parameters derived from NMR relaxation measurements on proteins cannot be used straightforwardly to determine protein structure because they cannot be related to a single protein structure, but are defined in terms of an average over a conformational ensemble. Molecular dynamics simulation can generate a conformational ensemble and thus can be used to restrain S 2 CH and S 2 NH order parameters towards experimentally derived target values S 2 CH (exp) and S 2 NH (exp). Application of S 2 CH and S 2 NH order-parameter restraining MD simulation to bond vectors in 63 side chains of the protein hen egg white lysozyme using 51 S 2 CH (exp) target values and 28 S 2 NH (exp) target values shows that a conformational ensemble compatible with the experimentally derived data can be obtained by using this technique. It is observed that S 2 CH orderparameter restraining of CÀ H bonds in methyl groups is less reliable than S 2 NH order-parameter restraining because of the possibly less valid assumptions and approximations used to derive experimental S 2 CH (exp) values from NMR relaxation measurements and the necessity to adopt the assumption of uniform rotational motion of methyl CÀ H bonds around their symmetry axis and of the independence of these motions from each other. The restrained simulations demonstrate that side chains on the protein surface are highly dynamic. Any hydrogen bonds they form and that appear in any of four different crystal structures, are fluctuating with short lifetimes in solution.

Introduction
During the past 50 years, the determination of structure of proteins in crystal based on the reflections of X-rays has become a standard procedure to obtain information on proteins at the atomic level of resolution. Over the past 30 years NMR measurement of proteins in solution has also become a standard technique, not only to obtain information on protein structure, but also on dynamics at the atomic level of resolution. Several quantities that are observable by NMR, such as nuclear Overhauser enhancements (NOEs), 3 J couplings or chemical shifts, only provide local structural information. Only residual dipolar couplings (RDCs) do provide longer-range information in terms of the average (relative) directions of bond vectors throughout a molecule. In contrast, X-ray diffraction of crystals yields non-local information and, in addition, its information density, that is, the ratio of the number of independent measured values of observable quantities for a molecule and the number of independent molecular degrees of freedom, is much higher than that of NMR experiments on proteins in solution. On the other hand, NMR measurements may provide dynamic information in the form of relaxation data, for example, expressed as S 2 order parameters, and of all techniques available to obtain information on proteins in solution, NMR shows the highest information density. [1] All techniques to derive structural information from the measurement of observable quantities Q make use of a relation of Q to structure r, a function Q(r). [1] Since virtually all experimental techniques measure an average hQi space,time of Q over the molecules (space) in the test tube and over a time window determined by the type of experiment, the derivation of structural information from a set of hQi values should account for the averaging involved in the measurement. This can be done by applying multi-molecule averaging [2] or by time-averaging [3] structure refinement instead of the commonly used single-structure refinement technique. Application of time-averaging structure refinement to proteins based on X-ray data, [4,5] NMR NOE [6,7] or 3 J coupling [8] data showed the protein structural variation to be much larger than that observed using single-structure refinement techniques. [1] For observable quantities Q, such as X-ray reflection intensities I hkl , NOEs (when represented as atom-atom distance bounds), 3 J couplings or chemical shifts, it is possible to formulate a function Q(r) relating a Q value to a particular structure r. RDCs may be directly related to structure by assuming a single alignment tensor representing the anisotropic rotational distribution of the molecule, which is, unfortunately, unknown. For other observable quantities, such as S 2 order parameters, the function relating Q to r involves some average over the Boltzmann ensemble of structures in solution, Q(hf(r)i), where f denotes the function of r that is being averaged. [1] This means that structure refinement based on such quantities must involve the averaging hf(r)i in addition to the averaging hQ(hf(r)i)i. [9] S 2 order parameters are commonly derived from an analysis of NMR relaxation data using a so-called "model free" model [10] and can be calculated as an ensemble-or time-average of a function of the three Cartesian coordinate components of the 13 CÀ 1 H or 15 NÀ 1 H bond vector. [11] They are commonly interpreted as a measure of the geometrical restriction (S 2 = 0: no restriction; S 2 = 1: complete restriction) of the directions of the mentioned bond vectors on a time scale faster than the stochastic rotational tumbling of the molecule in solution, for proteins of the order of nanoseconds. For relatively stable proteins, order parameters involving backbone atoms lie generally in the range 0.75 < S 2 < 0.95. Exceptions occur in longer loops connecting secondary structure elements (S 2 values as low as 0.5) and for thermodynamic conditions, such as low pH, that are known to often destabilise proteins. For bond vectors in side chains, values as low as 0.05 can be found.
MD simulations have reproduced experimentally derived order-parameter values for bond vectors involving backbone atoms with some success. [12,13] For side chains, this is more challenging, [14][15][16][17] because of the flexibility of side chains and of the multiple hydrogens bound to the 13 C atom in a CH 3 -group or bound to the 15 N atom in a NH 2 À group. This suggests the use of S 2 order-parameter structure refinement using time averaging in order to obtain a conformational ensemble compatible with the order-parameter data. The technique of time-averaging order-parameter structure refinement has been tested on the backbone 15 NÀ 1 H order parameters of the B3 domain of protein G, [9] and subsequently applied to backbone 15 NÀ 1 H order parameters of the protein IL-4 at pH 6 to detect inconsistencies and model flaws regarding complementary sets of NMR data, [18] and applied to backbone 15 NÀ 1 H order parameters of the protein hGH at pH 2.7 in order to explain the occurrence of low order-parameter values in the middle of stable helices. [19] The application of S 2 order-parameter restraining to CH 3 and NH 2 moieties in protein side chains is more challenging than to backbone NH groups. The multiple hydrogens cause ambiguity regarding peak assignments, which requires additional assumptions. The directional variability of bond vectors in side chains is generally larger than for backbone NÀ H or C α À H vectors, leading to a greater variety of and smaller order-parameter values. Third, averaging over side-chain motions may take longer to converge than over limited backbone motion in a stable protein.
Here the application of S 2 order-parameter restraining to side-chain NH, NH 2 and CH 3 moieties of the protein hen egg white lysozyme (HEWL) [20,21] is investigated. An earlier compar-ison of the experimentally derived S 2 values with those obtained from short, 1 ns unrestrained MD simulations showed a poor relation between simulation and experiment, [14] which could be due to the short simulation time period or force-field deficiencies, assuming no flaws in the experimental data. Use of an improved force field, of much longer sampling, and of S 2 order-parameter restraining might generate a conformational ensemble compatible with these and other experimental data on HEWL.
HEWL is one of the proteins most studied. Several X-ray crystal structures are available, and sets of NOE data, [22] of 3 J couplings, [23] RDC values, [24] and of S 2 order parameters, [20,21] measured at a variety of thermodynamic conditions. Here the configurational ensembles, generated in unrestrained and in order-parameter restrained MD simulations, will be used to calculate side-chain 13 CÀ 1 H and 15 NÀ 1 H order-parameter values, which are compared to values obtained from NMR measurements at pH 3.5. [20,21] The side-chain S 2 values were separated into two groups, one of CÀ H values and another of NÀ H values. By separately restraining these subsets of S 2 values, it could be investigated whether restraining one subset would improve the agreement with values derived from experiment for the other subset of S 2 values. Four X-ray crystal structures were used in the simulations and for comparison, in order to delineate the influence of a particular starting structure on the generated configurational ensemble.
The simulated configurational ensembles were also used to calculate NOE atom-atom distances that were compared to a set of NOE atom-atom distance bounds derived from experiment at pH 3.8. [22] A set of side-chain 3 J HαHβ couplings derived from experiment is available. [22,23] The 26 3 J HαHβ couplings in side chains for which S 2 order-parameter values derived from experiment are available, were used for comparison. They regard χ 1 torsional angles close to the backbone. A comparison of simulated with experimentally derived 3 J HαHβ coupling values is not straightforward though, because the Karplus relation 3 J(θ) that connects a torsional angle θ to a 3 J coupling, possesses a rather large uncertainty of 1-2 Hz, [25] small 3 J coupling values ( � 4 Hz) are difficult to determine precisely from spectra, and 3 J couplings in the range 5-8 Hz may result from averaging over long time periods (microseconds). The set of RDCs for side chains of HEWL [24] was not used for comparison, because they strongly depend on the solvent environment in the measurement. [26] doubly protonated. [32] The simple point charge (SPC) model [33] was used to describe the solvent molecules in the rectangular periodic box. To compensate for the overall positive charge of the protein, 10 Cl À ions were included in the solution. All bond lengths and the bond angle of the water molecules were kept rigid with a relative geometric precision of 10 À 4 using the SHAKE algorithm, [34] allowing for a 2 fs MD time step in the leap-frog algorithm [35] used to integrate the equations of motion. For the non-bonded interactions a triple-range method [36] with cut-off radii of 0.8/1.4 nm was used. Short-range van der Waals and electrostatic interactions were evaluated every time step based on a charge-group pair list. [29] Medium-range van der Waals and electrostatic interactions, between pairs at a distance larger than 0.8 nm and shorter than 1.4 nm, were evaluated every fifth time step (10 fs), at which time point the pair list was updated, and kept constant between updates. Outside the larger cut-off radius a reaction-field approximation [37,38] with a relative dielectric permittivity of 61 [39] was used. Minimum-image periodic boundary conditions were applied.

Simulation set-up
Four X-ray crystal structures were used as initial structures for the energy minimisations followed by MD simulations.
1. Structure 2VB1 from the Protein Data Bank (PDB), [40] derived from a triclinic unit cell at 0.065 nm resolution at T = 100 K. It contains multiple side-chain conformations for 46 residues. 2. Structure 4LZT from the PDB, derived from a triclinic unit cell at 0.095 nm resolution at T = 295 K. It contains multiple side-chain conformations for 8 residues. 3. Structure 1IEE from the PDB, derived from a tetragonal unit cell at 0.094 nm resolution at T = 110 K. It contains multiple sidechain conformations for 33 residues. 4. Structure 1AKI from the PDB, derived from a orthorhombic unit cell at 0.15 nm resolution at T = 298 K. It contains no multiple side-chain conformations.
For the initial structures the side-chain conformation with the highest occupancy was chosen.
An initial structure was first energy minimised in vacuo to release possible strain induced by small differences in bond lengths, bond angles, improper dihedral angles, and short non-bonded contacts between the force-field parameters and the X-ray structure. Subsequently, the protein was put into a rectangular box filled with water molecules, such that the minimum solute-wall distance was 1.0 nm, and water molecules closer than 0.23 nm from the solute were removed. This resulted in boxes with 12157 water molecules for the initial protein structures. In order to relax unfavourable contacts between atoms of the solute and the solvent, a second energy minimisation was performed for the protein in the periodic box with water while keeping the atoms of the solute harmonically position-restrained [29] with a force constant of 25 000 kJ mol À 1 nm À 2 .
The resulting protein-water configuration (i. e., coordinates) was used as initial configuration for the MD simulation. In order to avoid artificial deformations in the protein structure due to relatively high-energy atomic interactions still present in the system, the MD simulation was started at T = 60 K and then the temperature was slowly raised to T = 308 K. Initial atomic velocities were sampled from a Maxwell distribution at T = 60 K. The equilibration scheme consisted of five short 20 ps simulations at temperatures 60, 120, 180, 240 and 308 K at constant volume. During the first four of the equilibration periods, the solute atoms were harmonically restrained to their positions in the initial structures with force constants of 25 000, 2500, 250, and 25 kJ mol À 1 nm À 2 . The temperature was kept constant using the weak coupling algorithm [41] with a relaxation or coupling time τ Τ = 0.1 ps. Solute and solvent were separately coupled to the heat bath. Following this equilibration procedure, the simulations were performed at a reference temperature of 308 K and a reference pressure of 1 atm. The pressure was kept constant using the weak coupling algorithm [41] with a coupling time τ p = 0.5 ps and an isothermal compressibility k T = 4.575 × 10 À 4 (kJ mol À 1 nm À 3 ) À 1 . The centre of mass motion of the system was removed every 1000 time steps (2 ps).

Order-parameter restraining
Two sets of 13 CÀ 1 H and 15 NÀ 1 H side-chain order-parameter target values S 2 XY (exp) [20,21] for restraining [9] were used, see Tables 1-3. 1. A set of 51 S 2 CH (exp) values for CH 3 moieties in 30 residues, [21] 2. A set of 28 S 2 NH (exp) values for NH and NH 2 moieties in six Trp, five Arg, fourteen Asn and three Gln residues. [20] The distribution of these S 2 values over the protein is shown in Figure 1.
For the Asn and Gln residues, one S 2 NH (exp) value per NH 2 group was available. This required the assignment to one of the two NH1 and NH2 bond vectors. This was done by calculating S 2 NH1 (MD) and S 2 NH2 (MD) values from the unrestrained simulation MD_2VB1 starting from the 2VB1 X-ray structure and then selecting the NÀ H vector with its S 2 NH (MD) value closest to S 2 NH (exp) for restraining. A corresponding procedure was used to assign experimentally unassigned S 2 CG1 and S 2 CG2 values for Val residues and S 2 CD1 and S 2 CD2 values for Leu residues. For an ideal methyl group with equal and fixed CÀ H bond lengths and HÀ CÀ H bond angles in which rotation around the symmetry axis occurs uniformly, the order parameter for the CÀ H bond vector is given by [15] S 2 rot ¼ ðð3 cos 2 bÀ 1Þ=2Þ 2 (1) where β is the angle between a CÀ H vector and the symmetry axis, which can be considered equal to the CÀ C bond vector of the bond to the C-atom adjacent to the CH 3 group. When in addition the rotational motion around the CÀ C axis is independent of the motion of the C-axis itself, one may factorise their contributions, When β = 109.5°, one has [15] S 2 rot = 0.111. Thus the methyl group restraining is applied to the CÀ C bond vector and the target value is Table 1. S 2 CH values (51) derived from relaxation measurements and from four unrestrained MD simulations starting from four X-ray crystal structures, and the mean of the latter four values and the root-mean-square deviation (RMSD) from it. Order-parameter target values larger than 0.95 were set to 0.95 (second column between brackets). Values differing more than 0.2 from the experimental value (0.95 in case the experimental value is 1) are denoted using italics.

MD simulations performed
Four unrestrained MD simulations, starting from the four mentioned X-ray crystal structures, were performed:   (11) for Trp (NE1-HE1) and Arg (NE-HE) side chains derived from relaxation measurements and from four unrestrained MD simulations starting from four X-ray crystal structures, and the mean of the latter four values and the root-mean-square deviation (RMSD) from it. Values differing more than 0.2 from the experimental value are denoted using italics.
Residue Experimental MD simulation value [20] 2VB1  (17) for Asn (ND2-HD21, -HD22) and Gln (NE2-HE21, -HE22) side chains derived from relaxation measurements and from four unrestrained MD simulations starting from four X-ray crystal structures, and the mean of the latter four values and the root-mean-square deviation (RMSD) from it. The experimental values correspond to either HD/E21 or HD/E22. [20] The assignment in the second column is based on the best agreement with the values of the MD_2VB1 simulation (third column). Values differing more than 0.2 from the experimental value are denoted using italics.

Analysis of atomic trajectories
Trajectory energies and atomic coordinates were stored at 5 ps intervals and used for analysis. [42] S 2 order parameters were calculated using the ensemble averaging expression [11] where τ indicates the time-averaging window, here 1 ns, shorter than the rotational correlation time of 5.7 ns of HEWL in solution, [22] are the components of the vector r XY �r X À r Y connecting atoms X and Y, and r XY �j r XY j its length. [9] To obtain a dimensionless quantity the term in curly brackets is multiplied with the 6 th power of the effective length (r eff XY ) of the vector r XY . Because in the present work bond length constraints are used, the length of r XY is essentially constant over time and thus equal to its effective value r eff XY . Before calculating S 2 XY , the protein trajectory structures are superimposed using the backbone atoms (N, C α , C) of residues 3-126 in the fit in order to eliminate the effect of overall rotation of the protein upon the S 2 XY values. Use of only the backbone atoms of four of the five α-helices and two β-strands in HEWL (residues 4-15, 24-36, 41-45, 50-53, 89-99, and 108-115) did not lead to significantly different S 2 XY values. In the S 2 order-parameter restraining simulations, the S 2 order parameter is calculated at every time step (2 fs) using an exponential damping factor in the average [9] with a memory relaxation time τ sr = 200 ps, and no rotational fit of the protein structures is carried out, which means that the calculated order parameters in the biased simulation are influenced by the stochastic tumbling of the protein. These S 2 order-parameter values will thus differ slightly from the ones calculated from the saved trajectory structures, because in the averages in Equation (4) trajectory structures 5 ps apart are used, the exponential damping factor is not used, the averaging period is 1 ns and a rotational fit of the protein structures is carried out. However, to analyse all trajectories in the same way Equation (4) was used for both the unrestrained and restrained trajectories.
In view of the uncertainty inherent to the derivation of S 2 XY (exp) values from relaxation experiments and inherent to the calculation of S 2 XY (MD) values from MD simulation, a deviation of less than 0.2 between simulation and experiment is considered insignificant.
The GROMOS force fields treat aliphatic carbons as united CH, CH 2 and CH 3 atoms. So inter-hydrogen distances involving the aliphatic hydrogen atoms were calculated using virtual atomic positions for CH and pro-chiral CH 2 [43] and pseudo-atomic positions for CH 3 [44] for those hydrogen atoms. [29] The pseudo-atom NOE distance bound corrections of ref. [44] were used. [1] The set of NOE distance bounds can be found in Table S1 in the Supporting Information, together with values obtained from the seven simulations. The NOE between Trp28 HZ3 and Leu56 HG was reassigned as between Trp28 HZ3 and Leu56 HD* following reassessment of the experimental spectra. Inter-hydrogen distances were calculated as hr À 3 i À 1/3 , that is, using r À 3 averaging over the trajectory structures, where r indicates the actual hydrogen-hydrogen distance. [45] In view of the uncertainty inherent to the calculation of NOE bounds and r À 3 averaged distances, deviations from experiment of less than 0.1 nm are considered insignificant.
For the calculation of the side-chain 3 J Hα-Hβ couplings, the Karplus relation [46,47] was used with the parameter values a = 9.5 Hz, b = À 1.6 Hz and c = 1.8 Hz. [48] In view of the various factors contributing to an uncertainty of about 2 Hz inherent to the Karplus relation linking structure and 3 J couplings, a deviation of less than 2 Hz between 3 J Hα-Hβ coupling values calculated from MD trajectory structures and 3 J Hα-Hβ coupling values derived from experiment is considered insignificant.
Atom-positional root-mean-square differences RMSD(t) between trajectory structures and the X-ray crystal structures and atompositional root-mean-square fluctuations (RMSF), i. e. around their average positions, in the MD trajectories were calculated after superimposing the backbone atoms (N, C α , C) of residues 3-126 to eliminate the contribution of overall translation and rotation of the protein.
The secondary structure assignment was done with the program DSSP, based on the Kabsch-Sander rules. [49] Hydrogen bonds were identified according to a geometric criterion: a hydrogen bond was assumed to exist if the hydrogen-acceptor distance was smaller than 0.25 nm and the donor-hydrogen-acceptor angle was larger than 135°. Table 1  are expected for side chains at the protein surface. Yet, this seems not always true. Val99 has in the 2VB1 X-ray structure a solvent accessible area of only 7 % and its side chain is surrounded by the side chains of Tyr20, Trp28, Ile98, and Tyr108. This leads to simulated S 2 CH values of about 0.71 to be compared to the experimentally derived value of 0.5. Thr89 has in the 2VB1 X-ray structure a solvent accessible area of 76 %. This leads to simulated S 2 CH values of about 0.68 to be compared to the experimentally derived value of 1.0. A larger variation (RMSD � 0.12) of S 2 CH values between the four MD simulations is observed for Met12 CE, Leu75 CD1, Ile78 CG2, Val99 CG1 and CG2, and Met105 CE. Table 2 lists the 11 S 2 NH values for Trp (NE1-HE1) and Arg (NE-HE) side chains derived from relaxation measurements and from four unrestrained MD simulations starting from four X-ray crystal structures. Three simulations show only one deviation from the experimentally derived value larger than 0.2. Both, large and small values are well reproduced. Table 3 lists the 17 S 2 NH values for Asn (ND2-HD21, -HD22) and Gln (NE2-HE21, -HE22) side chains derived from relaxation measurements and from four unrestrained MD simulations starting from four X-ray crystal structures. The simulations show one or two deviations from the experimentally derived value larger than 0.2, four simulated values, for Asn65 and 74, smaller than the experimentally derived values, and two simulated values, for Asn103 and 113, larger than the experimentally derived ones. Table 4 shows the occurrence (%) of hydrogen bonds involving the side chains of Arg, Asn, Gln and Trp residues in the four X-ray structures and in the four unrestrained MD simulations starting from the four respective X-ray structures. The four hydrogen bonds present in all four X-ray structures are also observed in the MD simulations, but with widely different occurrences (0-85 %). For the seven hydrogen bonds observed in only three of the four X-ray structures the occurrences vary from 0 to 99 %. Considering the different thermodynamic conditions under which the X-ray diffraction of the different crystals was measured and the pH of the NMR measurement in aqueous solution, the observed differences are not surprising.

Comparison of S 2 order-parameter values calculated from unrestrained MD trajectories with NMR derived values
Figures 2 and S1-S3 show the secondary structure elements [49] of HEWL as a function of time calculated for the four unrestrained MD simulations. Five α-helices (red; residues 4-15, 24-36, 89-99, 108-115 and 121-125) and three β-strands (blue; residues 41-45, 50-53 and 58-59) are largely maintained, but the α-helix at residues 108-115 turns into two β-bridges (yellow) after 3 ns in the MD_1IEE simulation ( Figure S2). All four simulations show a helix of alternating α-helical (red) and 3 10helical (black) character at residues 80-85. At residues 21-24, simulation MD_1AKI ( Figure S3) shows a 3 10 -helix, which is lost after about 6 ns in the MD_2VB1 ( Figure 2) and MD_1IEE ( Figure S2) simulations, and within 1 ns in the MD_4LZT simulation ( Figure S1). Figure 3 shows the backbone C α atom-positional rootmean-square fluctuations (RMSF) as function of residue sequence number in the four unrestrained MD simulations MD_ 2VB1 (black), MD_4LZT (red), MD_1IEE (green) and MD_1AKI (blue) starting from the respective four X-ray structures. Apart from the residues beyond residue sequence number 100 at the C-terminal part of the polypeptide chain, the motional patterns in the four simulations are rather similar, except for residues 21-24 in the MD_4LZT (red) simulation that become very mobile, their initial 3 10 -helical character being lost. Table 5 lists the 51 S 2 CH values derived from relaxation measurements and from the unrestrained and order-parameter re- Figure 2. Secondary structure elements [49] as a function of time calculated for the unrestrained MD simulation MD_2VB1 starting from the 2VB1 X-ray structure. Red: α-helix; green: π-helix; black: 3 10 -helix; blue: β-strand; yellow: β-bridge; brown: bend; grey: turn.   * Some hydrogen bonds involving aspartic acid side chains are present in the simulations with either OD1 or OD2 acting as the acceptor. In these cases (marked OD1/2) the highest population of the hydrogen bond involving either OD1 or OD2 is listed. Similarly for hydrogen bonds involving asparagine NH 2 groups in some cases (marked ND2-HD1/2) the highest population of a hydrogen bond where the donor is either ND2-HD21 or ND2-HD22 is listed while for arginine NH2 groups in some cases (marked NH1/2-HH1/2) the highest population of a hydrogen bond where the donor is either NH1-HH11, NH1-HH12, NH2-HH21 or NH2-HH22 is listed.

Comparison of S 2 order-parameter values calculated from S 2 order-parameter restraining MD trajectories with NMR derived values
slightly worsens the agreement between simulation and experiment for the S 2 NH (exp) values. Worsening by more than 0.1 is observed for Trp108, Trp111, and Arg112 (Table 6), and for Gln57 HE21 and Asn65 HD22 (Table 7). Yet, also some improvement by 0.1 of the agreement between simulation and experiment for the S 2 NH values is observed, for example, for Asn44 HD22, Asn46 HD22 and Asn103 HD22 (Table 7). Tables 6 and 7 show that in the unrestrained MD simulation only two S 2 NH order-parameter values (in italics) deviate more than 0.2 from the experimentally derived values, for Trp62 (Table 6) and for Asn103 HD22 (Table 7). S 2 order-parameter restraining towards the set Nres of 28 target S 2 NH (exp) values leads, as expected, to good agreement between simulation and experiment for the 28 S 2 NH order parameters. No deviations larger than 0.2 are observed.
S 2 order-parameter restraining towards the set C + Nres of 79 target S 2 CH (exp) and S 2 NH (exp) values leads, as expected, to good agreement between simulation and experiment for 78 S 2 order parameters ( Figure S4), only one deviation larger than 0.2 is observed, for the S 2 CH value of Ala95 CB ( Table 5). The S 2 order- Table 5. S 2 CH values (51) derived from relaxation measurements and from the unrestrained and order-parameter restrained MD simulations starting from the 2VB1 X-ray crystal structure. Order-parameter target values larger than 0.95 were set to 0.95 (second column between brackets). Values differing more than 0.2 from the experimental value (0.95 in case the experimental value is 1) are denoted using italics.

Residue and methyl group
Experimental value [21] Unrestrained MD Order-parameter restrained MD parameter restraining is not able to reduce the S 2 CH (MD) value from 0.94 in the unrestrained simulation to the target S 2 CH (exp) value of 0.68. Enhancing the mobility of the CA-CB vector that is close to the polypeptide backbone and in a residue that is in the centre of a helix seems impossible with the parameters applied here.  Table 8 shows the occurrence (%) of hydrogen bonds involving the side chains of Arg, Asn, Gln and Trp residues in the MD_2VB1 unrestrained MD simulation and in the three S 2 order-parameter restraining MD simulations starting from the 2VB1 X-ray structure. The unrestrained simulation shows nine hydrogen bonds with an occurrence larger than 50 %. S 2 orderparameter restraining MD simulation reduces this number to 5, 3 and 3 for the three S 2 order-parameter restraining MD simulations using the Cres, Nres or C + Nres sets of restraints, respectively. All but one of the occurrences of the mentioned 9 hydrogen bonds are reduced by the S 2 order-parameter restraining. In contrast, only one hydrogen-bond occurrence is raised above 50 % by restraining, the hydrogen bond Asn46 ND2-HD22-Ser50 O, from 2 % to 51, 27 and 17 %, respectively. Of the 52 hydrogen bonds listed (i. e., observed in either the four X-ray structures or for at least 20 % in the seven MD simulations), 38 are observed in the unrestrained simulation, 39, 37 and 42 are observed in the three S 2 order-parameter restraining MD simulations using the Cres, Nres or C + Nres sets of restraints, respectively.
There are a number of examples where the MD_2VB1 unrestrained simulation yields a S 2 value larger than experiment. When restraining reduces the S 2 value, a reduction or change in the populations of hydrogen bonds involving that side chain is observed. For example, Asn44 (experimental S 2 NH value 0.51 compared to 0.71 and 0.58 in the MD_2VB1 and MD_ 2VB1_C + Nres simulations, respectively, Table 7) shows a reduction in the populations of hydrogen bonds to the side chains of Asp52 and Gln57 (Table 8) and Asn46 (experimental S 2 NH value 0.62 compared to 0.82 and 0.56 in the MD_2VB1 and MD_2VB1_ C + Nres simulations, respectively) shows a reduction in the population of the hydrogen bond to the side chain of Asp52, but an increase in the population of hydrogen bonds to both the main chain and side chain oxygens of Ser50 in the restrained simulations. Similarly, for Asn103 (experimental S 2 NH value 0.26 compared to 0.61 and 0.17 in the MD_2VB1 and MD_ 2VB1_C + Nres simulations, respectively), the hydrogen bond to the side chain of Asp101 is almost completely lost in the restrained simulations. Figures 4 to 6 show the secondary structure elements [49] of HEWL as a function of time calculated for the three S 2 orderparameter restraining MD simulations starting from the 2VB1 Xray structure. Compared to the MD_2VB1 unrestrained simulation (Figure 2), S 2 CH order-parameter restraining towards the S 2 CH (exp) values of set Cres induces some changes in secondary  (11) for Trp (NE1-HE1) and Arg (NE-HE) side chains derived from relaxation measurements and from the unrestrained and orderparameter restrained MD simulations starting from the 2VB1 X-ray crystal structure.

Residue
Experimental value [20] Unrestrained MD Order-parameter restrained MD  . Secondary structure elements [49] as a function of time calculated for the S 2 CH order-parameter restraining MD simulation MD_2VB1_Cres starting from the 2VB1 X-ray structure. Red: α-helix; green: π-helix; black: 3 10helix; blue: β-strand; yellow: β-bridge; brown: bend; grey: turn. structure ( Figure 4). Residues 19-21 become 3 10 -helical, the second α-helix (residues 24-36) is slightly more stable at its Cterminal end, the helix of alternating α-helical and 3 10 -helical character at residues 80-85 gains 3 10 -helical character, residues 103-108 become α-helical after 4 ns, and the α-helix 108-115 gains π-helical character. Generally, the 3 10 -helical character is increased. S 2 NH order-parameter restraining towards the S 2 NH (exp) values of set Nres shows similar changes in the secondary structure ( Figure 5). The same observation holds for the MD_ 2VB1_C + Nres simulation ( Figure 6). Figure 7 shows the backbone C α atom-positional rootmean-square fluctuations (RMSF) as function of residue sequence number in the unrestrained MD simulation MD_2VB1 (black) and in the S 2 order-parameter restraining simulations MD_2VB1_Cres (magenta), MD_2VB1_Nres (cyan) and MD_2VB1_ C + Nres (orange) all starting from the 2VB1 X-ray structure. S 2 CH order-parameter restraining induces mobility for residues 1-17, residues 85-105 and residues 109-112. Restraining S 2 NH order parameters shows increased mobility for residues 100-105. Restraining to both sets of order parameters shows increased mobility for residues 100-104 and 109-110.  (17) for Asn (ND2-HD21, -HD22) and Gln (NE2-HE21, -HE22) side chains derived from relaxation measurements and from the unrestrained and order-parameter restrained MD simulations starting from the 2VB1 X-ray crystal structure. The experimental values correspond to either HD/E21 or HD/E22. [20] The assignment in the second column is based on the best agreement with the values of the MD_2VB1 simulation (third column). The NÀ H vectors used as restraint are indicated by * . Values differing more than 0.2 from the experimental value are denoted using italics.

Residue
Experimental value [20] Unrestrained MD   5. Secondary structure elements [49] as a function of time calculated for the S 2 NH order-parameter restraining MD simulation MD_2VB1_Nres starting from the 2VB1 X-ray structure. Red: α-helix; green: π-helix; black: 3 10helix; blue: β-strand; yellow: β-bridge; brown: bend; grey: turn.  Table 8. Occurrence (%) of hydrogen bonds (52) involving the side chains of Arg, Asn, Gln and Trp residues in the MD_2VB1 unrestrained MD simulation and in the three S 2 order-parameter restraining MD simulations starting from the 2VB1 X-ray structure. Only hydrogen bonds present in one of the X-ray structures or with a population of at least 20 % in any of the restrained or unrestrained MD simulations are included. Only hydrogen bond populations of 1 % or greater are shown.

Discussion
* Some hydrogen bonds involving aspartic acid side chains are present in the simulations with either OD1 or OD2 acting as the acceptor. In these cases (marked OD1/2) the highest population of the hydrogen bond involving either OD1 or OD2 is listed. Similarly for hydrogen bonds involving asparagine NH 2 groups in some cases (marked ND2-HD1/2) the highest population of a hydrogen bond where the donor is either ND2-HD21 or ND2-HD22 is listed while for arginine NH2 groups in some cases (marked NH1/2-HH1/2) the highest population of a hydrogen bond where the donor is either NH1-HH11, NH1-HH12, NH2-HH21 or NH2-HH22 is listed.
Nres simulation. S 2 order-parameter restraining to all 79 experimentally derived S 2 order-parameter values yields even less deviations larger than 0.1 (12) than restraining only to the 51 S 2 CH order-parameter values (16). Tables 10 and 11 summarise the deviations of S 2 NH (MD) values from S 2 NH (exp) values for the 28 S 2 NH order parameters in the seven MD simulations. Of course, S 2 NH order-parameter restraining marginally improves the agreement with experiment. S 2 CH order-parameter restraining does not improve the agreement with experiment for the 28 S 2 NH order parameters, from 12 deviations larger than 0.1 in the MD_2VB1 simulation to 11 deviations in the MD_2VB1_Nres simulation. Combining S 2 CH and S 2 NH order-parameter restraining yields almost equally good agreement.
The data in Tables 5 and 7 and 9-11 indicate some unreliability of S 2 CH order-parameter restraining, which may have different experimental or computational sources: 1) Some S 2 CH (exp) values (Table 5) are very different for two vectors in the same side chain, for example, for Ile58 CG2 and CD (1.0 and 0.16) and less so for Leu25 CD1 and CD2 (1.0 and 0.609) and Ile55 CG2 and CD (0.739 and 0.323). This suggests an unlikely large difference in mobility for nearby CÀ H vectors. 2) Some residues in the protein interior show unexpectedly low S 2 CH (exp) values, indicating high mobility, for example Val99 (CG1 0.487 and CG2 0.517) with a solvent accessible area in the 2VB1 X-ray structure of only 7 %. 3) Some residues at the surface of the protein show unexpectedly high S 2 NH (exp) values, indicating low mobility, for example Thr89 CG2 with S 2 CH (exp) = 1.0 and a solvent accessible area in the 2VB1 X-ray structure of 76 %. 4) As discussed in section 2.3, the S 2 CH order-parameter restraining algorithm restrains the CÀ C vector adjacent to the three CÀ H vectors of a methyl group, of which the relaxation is measured experimentally. This procedure is based on the assumption that the rotational motion of the CÀ H vectors around the axis of symmetry of the CH 3 group is uniform and decoupled from the motion of the symmetry axis itself (the CÀ CH 3 vector). These assumptions need not be true. This suggests that S 2 CH orderparameter restraining is less reliable than S 2 NH order-parameter restraining, for which the latter assumptions are not invoked. Table 12 shows the number of NOE distance bound violations in the four X-ray crystal structures and the seven MD simulations for the 1630 NOE distance bounds specified in Table S1. S 2 CH order-parameter restraining decreases the number of NOE bound violations larger than 0.1 nm from 42 in the unrestrained 2VB1 simulation to 34 in the MD_2VB1_Cres simulation. S 2 NH order-parameter restraining reduces the number Figure 6. Secondary structure elements [49] as a function of time calculated for the S 2 CH and S 2 NH order-parameter restraining MD simulation MD_2VB1_N + Cres starting from the 2VB1 X-ray structure. Red: α-helix; green: π-helix; black: 3 10 -helix; blue: β-strand; yellow: β-bridge; brown: bend; grey: turn. Figure 7. Backbone C α atom-positional root-mean-square fluctuations (RMSF) as function of residue sequence number for the unrestrained MD simulation MD_2VB1 (black) and for the three S 2 order-parameter restraining MD simulations MD_2VB1_Cres (magenta), MD_2VB1_Nres (cyan) and MD_ 2VB1_C + Nres (orange) all starting from the 2VB1 X-ray structure. The trajectory structures are translationally and rotationally superimposed using the backbone atoms (N, C α , C) of residues 3-126. The black bars at the top indicate secondary structure elements of HEWL (thick bars: α-helix; thin bars, β-strand). order-parameter restraining yields better agreement, with 30 violations larger than 0.1 nm, as well as worse agreement, with nine violations larger than 0.2 nm. S 2 order-parameter restraining in MD simulation improves agreement with experimentally derived NOE atomatom distance bounds. Table 13 shows 26 side-chain 3 J HαHβ coupling values for side chains, for which S 2 order-parameter values derived from experiment are available, as derived from NMR measurements [23] as well as from the unrestrained and S 2 order-parameter restraining MD simulations starting from the 2VB1 X-ray crystal structure. In the unrestrained simulation, six 3 J HαHβ coupling values differ more than 2 Hz from the experimentally derived values, four for side chains for which experimentally derived S 2 CH order-parameter values are available and two for side chains for which experimentally derived S 2 NH order-parameter values are available. S 2 CH order-parameter restraining induces two more deviations of 3 J HαHβ -couplings in side chains with S 2 CH orderparameter values, while S 2 NH order-parameter restraining induces four more deviations of 3 J HαHβ -couplings in side chains with S 2 NH order-parameter values. Combined S 2 CH and S 2 NH order-parameter restraining also leads to four more deviations of 3 J HαHβcouplings, two in side chains with S 2 CH order-parameter values and two in side chains with S 2 NH order-parameter values. Overall, S 2 order-parameter restraining in MD simulation did not improve the agreement with experiment for the 26 side-chain 3 J HαHβ coupling values. For some residues, Thr51 for example, the agreement improves, whereas for other residues, Val29 for example, the agreement worsens.

Conclusions
S 2 order parameters for CÀ H and NÀ H bonds in proteins derived from NMR relaxation measurements reflect the directional mobility of these bonds. Consequently, they cannot be related to a single protein structure, but to an ensemble of such structures. Therefore, S 2 order parameters are not used as data for standard single-structure determination of proteins. A comparison of four X-ray structures of HEWL and four MD simulations starting from these four different X-ray structures illustrates the need of a conformational ensemble representation of the HEWL protein, in particular for its side chains. MD simulation allows for averaging over an ensemble of trajectory structures, which is used in protein structure determination based on S 2 order parameters. [9] S 2 NH order-parameter restraining can be directly applied to the NÀ H bond vectors in a protein. In contrast, S 2 CH order-parameter restraining of CÀ H bond vectors in methyl groups makes no sense because of the fast rotation of the three CÀ H bonds around their symmetry axis parallel to the CÀ CH 3 bond. By assuming the above-mentioned rotation to be uniform and independent from the motion of the symmetry axis itself, [15] the S 2 CH order-parameter restraining algorithm can be applied to the CÀ CH 3 bond. The results for HEWL show that S 2 CH order-parameter restraining is more problematic than S 2 NH order-parameter restraining, which may be due to less valid assumptions and approximations used to derive experimental S 2 CH (exp) values from NMR relaxation measurements and the assumptions of uniform rotational motion of methyl CÀ H bonds around their symmetry axis and of the independence of these motions from each other. The application of S 2 order-parameter restraining to the protein HEWL shows that this technique is able to produce a conformational ensemble compatible with the experimentally derived S 2 CH (exp) and S 2 NH (exp) values. S 2 order-parameter restraining in MD simulation does improve the agreement with 1630 NOE atom-atom distance bounds for HEWL. It maintains the overall structure of the protein and induces slightly more mobility, reflected in the backbone atom-positional fluctuations. The unrestrained MD simulations show a high level of conformational disorder for side chains on the protein surface. However, this disorder is increased even further on S 2 orderparameter restraining. In the MD simulations, which show good agreement with the experimental order parameters, the populations of many of the hydrogen bonds that are seen in all or most of the X-ray structures are low. This has important implications for the use of X-ray structure data in areas such as drug design, the interpretation of mutational data and receptor binding studies.  Table 13. Side-chain 3 J HαHβ coupling values (26, for side chains for which S 2 order-parameter values derived from experiment are available), in Hz, derived from NMR measurements and from the unrestrained and order-parameter restrained MD simulations starting from the 2VB1 X-ray crystal structure. Experimental data is from Tables III and IV of ref. [23] and consists of values that could be stereo-specifically assigned based on NMR data as well as of values that could not be stereospecifically assigned in this way (marked with *). For the latter, stereo-specific assignment of the experimental values for the β 2 and β 3 hydrogens is based on the 3 J HαHβ coupling values calculated from the four unrestrained MD simulations starting from the four X-ray structures in case 4 or 3 of the unrestrained MD simulations suggested the same stereo-specific assignment. The root-mean-square fluctuations (RMSF) of the 3 J HαHβ couplings in the MD simulations are given within parentheses. MD values differing more than 2 Hz from the experimental value are denoted using italics.