A geometry optimization force field was developed using ultra high-resolution structures and tested using high- and low-resolution X-ray structures. Protein and small molecule X-ray data was used. When applied to ultra high-resolution structures the force field conserves the internal geometry and local strain energy. When applied to low-resolution structures there is a small change in geometry accompanied by a large drop in local strain energy. Although optimization causes only small structural changes in low-resolution X-ray models, it dramatically modifies profiles for hydrogen bonding, Van der Waals contact, bonded geometry, and local strain energy, making them almost indistinguishable from those found at high resolution. Further insight into the effect of the force field was obtained by comparing geometries of homologous proteins before and after geometry optimization. Optimization causes homologous regions of structures to become similar in internal geometry and energies. Once again, the changes only require small atomic movements. These findings provide insights into the structure of molecular complexes. The new force field contains only short-range interatomic potential functions. Its effectiveness shows that local geometries are determined by short-range interactions which are well modeled by the force field. Potential applications of this study include detection of possible structural errors, correction of errors with minimal change in geometry, improved understanding and prediction of the effects of modifying ligands or proteins, and computational addition of structural water.
X-ray models of biomolecules are rich in structural information. However, many of these structures have regions where low electron density prevents accurate assignment of atom coordinates. Often these errors do not affect conclusions about structure and function. However, in structure-based design, geometry errors can cause serious problems.1
The use of geometry restraints, in X-ray refinement,2 helps to reduce errors, but inaccuracies remain.3, 4 Large errors in assignment of backbone or side-chain torsions can occur at resolutions greater than 2.0 Å; smaller errors of around 0.3 Å are found in structures with resolution greater than 1.5 Å.
In tests of ligand docking and scoring, poor results have been attributed to errors in the binding site model.5, 6 Small structural errors can cause large errors in computed energies, leading to incorrect docking. Errors also compromise the usefulness of X-ray data in the development of accurate docking and scoring methods.
In principle, geometry-related problems in structure-based drug design could be resolved by a force field capable of producing accurate minimum energy geometries of complex molecules. In 1986, a study by Whitlow and Teeter7 suggested that the development of accurate force fields for macromolecules may not be easy. They applied AMBER8 to a very accurate X-ray model (Crambin, 0.95 Å) and found that energy minimization introduced errors. The errors could be partly reduced by use of short-range nonbonded forces. The errors appear to be related to the absence of a complete set of explicit solvent molecules; unfortunately, this is a feature of nearly all PDB structures.
In 1993 a new force field, QXP,9 was developed from AMBER9 with the goal of improved modeling of molecular complexes. A few more ultra high-resolution PDB structures were available for use in the development which was strongly influenced by the work of Whitlow and Teeter. QXP can be used for docking with flexible binding site side-chains and has predicted docking geometries which were confirmed by X-ray analysis.10–13 However, the force field still caused significant distortion when applied to entire proteins.
There appear to be few other reports on the suitability of force fields for prediction of macromolecular geometries. In 2003, a review of many force fields14 confirmed the scarcity of publications on this topic, attributing this to a lack of accurate experimental data.
A set of 1500 PDB files with resolutions of 1.2 Å or less is now available.15 Atoms in these structures have sharply defined electron density (Fig. 1) allowing unambiguous coordinate assignment, unbiased by geometric restraints. No restraints of any kind were used in the refinement of 145 PDB structures used extensively in this study.
Small molecule crystals provide a second source of high quality geometries.16 Combined with the PDB data, this provides a large set of accurate models of molecular complexes. Small molecule crystals contain data for many different types of chemistry. High-quality regions of PDB files, which have an accuracy approaching that of small molecule crystal data,3, 7, 17 provide many examples of complex patterns found in biomolecules.
Using this data, a new force field (CrystalGEO) was developed from QXP with the following goals:
1. accurate modeling of structures, with complex networks of forces, at a local energy minimum;
2. an extended set of force field parameters for ligands;
3. nonbonded potentials which work for models with incomplete amounts of explicit solvent.
The force field was developed by evaluating variations of potential functions using accurate test structures. Variations that produce the smallest changes in geometry were kept. This protocol provided a force field with self-consistent relative energies.
To test the resultant force field, many ultra high-resolution structures were energy minimized. Absolute atom movements and changes in internal geometry were measured. With the new force field, small changes in test structures still occurred. The significance of these changes was assessed by calculating their effect on local internal energy.
Next, the force field was applied to models with less accurate data. Two sources of data were used: low-resolution structures and high B-factor regions of high-resolution structures. Results were compared to those for accurate models using measures such as atomic movement, energy, number of explicit hydrogen bonds, hydrogen bond length, and van der Waals contact distance.
These tests were statistical. An additional test was developed to provide information about the effect of the force field on specific errors. This test takes advantage of the fact that the same protein often appears in several PDB structures and they have regions with similar conformations. Although similar in overall structure, there are variations in geometry that become larger at low resolution, presumably due to structural errors. Comparison of a region in a low-resolution structure with the equivalent region in an accurate structure provides a method for identifying structural error. Comparisons made before and after energy minimization give atom by atom information about the effects of the force field on these errors.
Energy minimization can only be used to remove geometric errors that are small. Preparation of X-ray models for structure design may also require changes in side chain conformation and addition of structural water. Preliminary tests of the ability of the force field to address these issues are reported.
The structures used in this study are rich in geometric and chemical diversity
The small molecule data (COD) has many chemically distinguishable substructures: 154 hydrogen bond pairs, 583 Van der Waals contact pairs, 1153 bond lengths, 34,031 bond angles, and 51,313 dihedral angles. The number of distinct motifs in ultra high-resolution PDB structures is smaller, but in both data sets there are hundreds of examples for most substructure types.
Ultra high-quality PDB structures contain geometries similar to those found in the COD
One hundred forty-five PDB files refined with no restraints were examined. Well-ordered regions in these structures with sharp electron density and accurate atomic coordinates can be identified by their low B-factors (Fig. 1 and Supporting Information). Many types of bond length and bond angle in PDB files and small molecule crystals (COD) have similar values. Linear regression of average values (PDB vs. COD) for 83 types of bond length and 70 types of bond angle gives: bond length: n = 83, r2 = 0.99, s = 0.02 Å; bond angle: n = 70, r2 = 0.96, s = 0.02 Å (1–3 distance).
Bond angles with a broad distribution (caused by complex conformational effects) were not included.
In PDB files, the number of chemically distinct heavy atom hydrogen bonding pairs is 42. There are 154 distinct Van der Waals contact pairs. Correlations for contact distance (PDB vs. COD) are: hydrogen bond: n = 42, r2 = 0.75, s = 0.08 Å; Van der Waals: n = 154, r2 = 0.85, s = 0.17 Å.
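The regression statistics quoted above (n, r2, s) can be illustrated with a minimal sketch. It assumes that s is the residual standard error of an ordinary least-squares fit, which the text does not state explicitly, and the bond length values are hypothetical placeholders rather than data from this study.

```python
import math

def linear_fit_stats(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (n, slope, intercept, r2, s), where s is the residual
    standard error with n - 2 degrees of freedom (an assumed definition).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / syy
    s = math.sqrt(ss_res / (n - 2))
    return n, slope, intercept, r2, s

# Hypothetical average bond lengths (in Å) for five bond types.
cod_means = [1.23, 1.33, 1.46, 1.52, 1.53]  # small molecule averages
pdb_means = [1.24, 1.33, 1.45, 1.53, 1.52]  # PDB averages for the same types
n, slope, intercept, r2, s = linear_fit_stats(cod_means, pdb_means)
print(f"n = {n}, r2 = {r2:.3f}, s = {s:.3f} Å, slope = {slope:.2f}")
```

The same function could be applied to the contact distance averages; only the input lists change.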
Use of very high-quality data as a training set led to a significantly improved force field
The major improvement occurred in the effectiveness of the nonbonded potential functions, as shown by a 5-fold reduction in the effects of energy minimization on internal geometry (see Fig. 1, Supporting Information).
The modified force field caused only small changes to accurate structures
Models for testing the accuracy of the new force field were set up to restrict motion to high-quality atoms. Effects of the force field on these models are shown in Table I. Changes in internal geometry were around 0.09 Å for small molecules and 0.06 Å for PDB structures.
Table I. Movement of Atoms, Resulting from Energy Minimization with the New Force Field, Depends on Resolution
To assess the effect of the force field on energy, nonbonded energies before and after energy minimization were compared by linear regression. Atom energies were calculated as the sum of contact, hydrogen bonding and Van der Waals repulsion terms. Statistics for energies of atoms with B-factors less than 10 are as follows:
Restricted optimization: n = 4000, r2 = 0.95, s = 0.19 kJ, slope = 0.93;
All atom optimization: n = 4000, r2 = 0.94, s = 0.18 kJ, slope = 0.95.
Effects of the force field on less accurate structures depend on resolution and B-factor
Effects of energy minimization on hydrogen bond length, in high- and low-resolution PDB structures, are shown in Figure 2. Frequency profiles for hydrogen bonds in X-ray structures are broader at low resolution. After energy minimization, the profiles were almost identical at all resolutions and were the same as those found using ultra high-resolution X-ray coordinates. After energy minimization, the peak widths for 1.2, 2.0, and 2.5 Å differed by no more than 0.02 Å.
Table I compares the effect of optimization on changes in the local internal geometry of atoms for a series of structures of different resolution. For low B-factor atoms in ultra high-resolution structures the change was 0.06 Å. For 30 PDB structures at 2.5 Å, the average change was 0.17 Å. There was also a progressive increase in atom movement when moving from high to low resolution.
The force field was tested on 168 HIV protease complexes covering a range of resolutions
HIV complexes, with resolutions ranging from 1.0 to 3.0 Å, were energy minimized. Figure 3 shows that atom movement was typically less than 0.4 Å at high resolution; at low resolution, movement was variable, ranging from 0.4 to 0.8 Å. Figure 3 also compares internal energies of the complexes before and after energy minimization.
Molecular graphics (Figs. 4 and 5) show the effect of energy minimization on repulsion energies in two complexes with different resolutions. In the high-resolution structure (1KZK) there were a few high energy atoms before and after energy minimization. The low-resolution X-ray structure (1HVS) had more high energy atoms, but after energy minimization the number became similar to that for 1KZK.
Hydrogen bonding completeness can be expressed as the percentage of atoms which form a hydrogen bond with explicit atoms or implicit water. In ultra high-resolution structures, around 99% of the hydrogen bonding atoms were satisfied, i.e. there were 1% unsatisfied hydrogen bonds.
Energy minimization of structures with resolutions below 1.5 Å caused a small drop in the average percentage of unsatisfied hydrogen bonds, from 1.1% to 0.8%. At resolutions above 2.4 Å the drop was from 2.3% to 1.5%. Twenty-four energy minimized structures with resolution below 1.65 Å had an unsatisfied hydrogen bond count of less than 1.3%. For resolutions greater than 2.0 Å, 12 of 47 energy minimized structures had a value greater than 1.5%.
The force field produced very similar geometries in homologous regions in proteins
Similarities in internal geometry and energy in homologous regions of pairs of HIV complexes were calculated before and after geometry relaxation. Atoms with similar local geometries were identified by comparing the local distance geometry of atoms with identical residue numbers, chain-names, and atom labels (Supporting Information). Figure 6 shows regions with similar internal geometries in 1HVI and 1HVS (1.8 Å and 2.25 Å) before and after geometry optimization. An improvement in the match of homologous regions is apparent. Linear regression was used to compare internal energies of equivalent atoms in the two complexes using atoms from regions which match (d < 0.5 Å).
Optimized: n = 1065; r2 = 0.92; s = 0.21 Å; X-ray: n = 1065; r2 = 0.65; s = 0.29 Å. Results for 2QD7 (1.11 Å) and 1MTB (2.50 Å) showed that optimization can lead to a match between a low-resolution and an ultra high-resolution structure.
Optimized: n = 706; r2 = 0.98; s = 0.12 Å; X-ray: n = 706; r2 = 0.69; s = 0.44 Å.
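The comparison of local distance geometry between equivalent atoms can be sketched as follows. The exact definition used in this study is given in the Supporting Information and is not reproduced here; this sketch assumes one plausible reading, namely the RMS difference in distances from an atom to its neighbours within a cutoff. The function name, the 5.0 Å neighbour radius, and the atom keys are hypothetical; only the 0.5 Å match threshold comes from the text.

```python
import math

def local_distance_rms(coords_a, coords_b, atom, radius=5.0):
    """RMS difference in atom-to-neighbour distances between two structures.

    coords_a and coords_b map an atom key, for example
    (chain, residue number, atom label), to (x, y, z) in Å.
    Neighbours are chosen within `radius` of `atom` in structure A
    (an assumed cutoff). The measure needs no superposition because
    it uses internal distances only.
    """
    center_a = coords_a[atom]
    center_b = coords_b[atom]
    squared = []
    for key, pos_a in coords_a.items():
        if key == atom or key not in coords_b:
            continue
        d_a = math.dist(center_a, pos_a)
        if d_a > radius:
            continue
        d_b = math.dist(center_b, coords_b[key])
        squared.append((d_a - d_b) ** 2)
    if not squared:
        return None  # no shared neighbours to compare
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical three-atom fragments from two structures of the same protein.
struct_a = {("A", 25, "CA"): (0.0, 0.0, 0.0),
            ("A", 25, "CB"): (2.0, 0.0, 0.0),
            ("A", 25, "N"): (0.0, 2.0, 0.0)}
struct_b = {("A", 25, "CA"): (0.0, 0.0, 0.0),
            ("A", 25, "CB"): (2.3, 0.0, 0.0),
            ("A", 25, "N"): (0.0, 2.0, 0.0)}
d = local_distance_rms(struct_a, struct_b, ("A", 25, "CA"))
print(f"d = {d:.3f} Å, match = {d < 0.5}")
```

Atoms with d below the 0.5 Å threshold would then be the ones passed to the energy regression.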
Computed waters match structural X-ray waters
The procedure for water addition, described in Methods, was tested using 168 HIV structures. The X-ray structures contain 290 water molecules which have B-factors of less than 30 and form three or more hydrogen bonds to nonwater atoms. Examples of computed waters matching X-ray waters are shown in Figure 7. Each of the X-ray waters had a nearest computed water at an average distance of 0.58 Å, with 85% of the X-ray waters having a computed water within 1.0 Å. The effect of water addition on atom movement during energy minimization was also examined: water addition caused a small increase in RMS atom movement, from 0.38 to 0.48 Å.
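The two summary numbers for water matching (average nearest distance and the fraction of X-ray waters with a computed water within 1.0 Å) can be computed as in the sketch below. The coordinates are hypothetical, and the function names are illustrative rather than taken from the software used in this study.

```python
import math

def nearest_distances(xray_waters, computed_waters):
    """Distance from each X-ray water to its nearest computed water (Å)."""
    return [min(math.dist(x, c) for c in computed_waters) for x in xray_waters]

def match_summary(xray_waters, computed_waters, cutoff=1.0):
    """Return (mean nearest distance, fraction of X-ray waters within cutoff)."""
    distances = nearest_distances(xray_waters, computed_waters)
    mean_distance = sum(distances) / len(distances)
    fraction = sum(1 for d in distances if d <= cutoff) / len(distances)
    return mean_distance, fraction

# Hypothetical oxygen positions in Å.
xray = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
computed = [(0.3, 0.0, 0.0), (5.0, 0.4, 0.0), (0.0, 5.0, 2.0), (9.0, 9.0, 9.0)]
mean_distance, fraction = match_summary(xray, computed)
print(f"mean = {mean_distance:.2f} Å, within 1.0 Å = {100 * fraction:.0f}%")
```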
Preliminary results suggest that the force field may be useful for assessing side-chain conformations
Nine HIV complexes were used, each providing 150 flexible side-chains. Water was excluded. The search was incomplete for 3% of the side chains which had a lowest energy conformer more than 1 kJ greater than the starting structure. Of the remaining side-chains those with less than 45% exposure to solvent were most predictable; 97% of these side-chains had a lowest energy conformer with an RMSD less than 0.5 Å from the starting structure. The average RMSD for these side chains was 0.087 Å.
For 20 side-chains the lowest energy conformer differed by more than 0.5 Å RMSD from the test structure. Compared to the test structure, they had an average RMSD of 1.4 Å and an average energy 2.7 kJ lower. Of the 20 side-chains, 19 occur in lower resolution structures (1.85 Å or greater).
X-ray structures provide an information-rich training set for force field development
In the high-quality data used in this study, bond lengths and bond angles are much more tightly defined than torsion angles, hydrogen bond lengths, and van der Waals contacts. Deviations from a standard value indicate parts of the structure that might be strained. Molecular mechanics provides a model for determining strained geometries. The deviations are caused by conflicting forces that can be calculated. When the forces on an atom are not balanced, movement of the atom will reduce the energy. When the forces are balanced, the structure is at an energy minimum; no movement will occur, although some internal strains remain.
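The idea that balanced forces leave residual strain can be shown with a toy model: one atom on a line, pulled by two harmonic terms with conflicting preferred positions. This is an illustrative sketch only; the force constants, rest positions, and step size are arbitrary assumptions and do not come from the force field described here.

```python
def energy(x, k1=1.0, a=0.0, k2=3.0, b=1.0):
    """Sum of two harmonic terms with conflicting rest positions a and b."""
    return 0.5 * k1 * (x - a) ** 2 + 0.5 * k2 * (x - b) ** 2

def net_force(x, k1=1.0, a=0.0, k2=3.0, b=1.0):
    """Net force on the atom: the negative gradient of the energy."""
    return -(k1 * (x - a) + k2 * (x - b))

def minimize(x, step=0.1, tol=1e-8, max_iter=10000):
    """Steepest descent: move along the net force until the forces balance."""
    for _ in range(max_iter):
        force = net_force(x)
        if abs(force) < tol:
            break
        x += step * force
    return x

x_min = minimize(0.0)
print(f"x = {x_min:.4f}, net force = {net_force(x_min):.1e}, "
      f"residual strain energy = {energy(x_min):.4f}")
```

At the minimum the atom sits at neither rest position, so both terms stay strained even though the net force is zero.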
The large variety of examples of internal strain found in the experimental data provides a powerful training set for development of a force field which can conserve the geometry of these structures.
The new force field uses modified potential functions for non-bonded interactions
One of the challenges in developing the new force field was to find potential functions for nonbonded forces in models with limited explicit solvent. Replacing electrostatic potentials with a simple hydrogen bonding potential and modifying the calculation of the Van der Waals attraction caused a major improvement in the force field (Supporting Information).
Geometric restraints and force fields assist development of molecular models in different ways
In X-ray crystallography, knowledge-based geometry restraints are helpful when the electron density is too diffuse to indicate individual atom positions. For bond lengths and some bond angles the restraints are precise; for other features they must allow for variation observed in accurate structures. Moderate use of restraints allows reasonable geometries to be developed where electron density is weak, without losing definitive input from well defined regions. A 2010 review17 outlines limitations of current restraints.
Force fields use a mathematical model of molecular potential energy to predict precise minimum energy geometries. The current force field was developed to achieve consistency between the results of energy minimization and high quality experimental data.
The force field conserved geometry and energies in high quality structures
Energy minimization caused only small changes in accurate test structures (see Results, Table I, Fig. 1). Changes in local internal geometry were less than 0.1 Å. Comparison of the internal energy of atoms before and after energy minimization showed that the movement had only small effects on strain energy. Effects on hydrogen bond length were also small (Fig. 2). This shows that the force field provides an adequate model of forces that determine geometries at a local minimum.
Optimization of low-resolution structures caused small changes which resulted in geometry and energy patterns close to those found at high resolution
When the force field was applied to low-resolution structures, profiles for hydrogen bond length became similar to those found at atomic resolution. Application to a series of HIV protease complexes (Fig. 3) reduced the high energies found at low resolution to values found at high resolution. The reduction is caused by small changes in local internal geometry; around 0.17 Å at 2.5 Å resolution compared to 0.09 Å for high-resolution examples (Table I). These changes of geometry and energy were accomplished with simple down-hill energy minimization involving no conformational searching.
Energy minimization reduced the percentage of unsatisfied hydrogen bonding atoms in low-resolution structures, but some still had more unsatisfied hydrogen bonds than high-resolution structures. These structures may have errors that cannot be corrected by local energy minimization.
The force field can bring homologous regions in different X-ray structures into alignment
The geometries of equivalent atoms in homologous X-ray structures often became very similar after geometry optimization. This indicates that the force field can predict nearly identical energy minima.
Preliminary tests show that the force field may be useful for conformational searching and for placing computed water in a model
Preliminary results show that conformational searching after randomization can predict the geometry of many side-chain conformers with an accuracy of around 0.1 Å. Computed waters are less accurately predicted but nearly all low B-factor X-ray waters that form several hydrogen bonds are close to a modeled water that often has similar bridging properties.
The new force field can provide models with uniform high-quality geometries similar to those found by atomic resolution crystallography
Geometries and energies in optimized structures can give insights into internal stresses in a molecule or complex and their effect on structure. Optimized structures can also reveal stabilizing features such as well-formed hydrogen bonds and atoms rich in Van der Waals attraction.
The methods may be useful for understanding and predicting the effects of structural change. Examples include ligand-induced binding site changes and allosteric mechanisms.
X-ray refinement is another potential application for the force field. Energy minimization coupled with electron density restraints might produce structures with high-quality internal geometries that are also consistent with X-ray data.
Limitations of the geometry force field
The force field appears to be suitable for modeling applications but is less accurate than high-resolution X-ray data. Although accuracy is assessed to be 0.1 Å, errors may be greater for atypical geometries such as might occur in a catalytic site or in ligand binding sites with high selectivity. A second limitation is that the force field only models relative interaction energies. These energies include bonded and nonbonded strain energies as well as attractive energies for hydrogen bonding and Van der Waals interactions. However, for a full energy calculation, additional terms may be needed.
Selection and preparation of accurate X-ray crystal structures required a detailed protocol
Small molecule CIF files, downloaded from the Crystallography Open Database,2 were selected using several criteria: presence of an empirical formula, presence of each atom in the unit cell at full occupancy, and absence of B-factors greater than 30. Files that included atom coordinates, bonds with bond order, and B-factors were prepared with full crystal packing. File conversion used a custom-written program.
Seven hundred and thirty-three X-ray PDB1 entries, resolution 1.2 Å or less, were converted to files suitable for force field evaluation. When an atom had multiple occupancies, the first reported set of coordinates was taken and the atom marked as unreliable. Waters with Van der Waals clashes or with B-factor greater than 50 were removed. Also marked unreliable were atoms with Van der Waals clashes or where an implicit bonded atom was missing.
Proton assignments are often absent in PDB files. In this study, explicit protons were removed. Added protons were optimized using a force field based on small molecule data. Ionization and proton orientation were adjusted to achieve a reasonable hydrogen bonding network. Atom types are similar to those in AMBER.8
Outline of the method of force field development
The force field was refined by observing its effects on the geometry of accurate structures, applying corrections to the energy functions, and then retesting. Estimation of corrections is based on the mechanism of energy minimization. A force field is used to calculate forces on each atom. When the forces do not balance the atom is moved. Movement of an atom, in a structure known to be accurate, indicates errors in energy terms which involve that atom. For example, if the Van der Waals radius for aliphatic hydrogen was too large then on average the distance between this type of hydrogen and other atoms would increase. The extent of the increase gives a guide as to how much to change the Van der Waals radius. However, when the force field has many errors some changes may originate from several sources. For this reason the size of corrections was limited and several cycles of force field refinement were used to approach a stable solution.
Using statistics for changes in each motif value made it possible to combine energy minimization results from many models and improve multiple parameters in a single pass.
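One refinement pass of this kind might look like the sketch below. It is an assumption-laden illustration, not the procedure actually used: the motif names, the gain factor, and the cap on the correction size are hypothetical, and the correction is applied to a generic "preferred distance" parameter. It does reflect two points from the text: shifts are averaged per motif over many models, and the size of each correction is limited.

```python
from collections import defaultdict

def mean_shift_by_motif(observations):
    """Average change in distance (after - before) for each motif type.

    observations: iterable of (motif, distance_before, distance_after) in Å,
    pooled from energy minimization of many accurate models.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for motif, before, after in observations:
        sums[motif] += after - before
        counts[motif] += 1
    return {motif: sums[motif] / counts[motif] for motif in sums}

def capped_corrections(shifts, gain=0.5, max_step=0.02):
    """Propose a correction opposing each mean shift, limited to +/- max_step.

    A positive mean shift (distances grow on minimization) suggests the
    preferred distance is too large, so the correction is negative.
    """
    corrections = {}
    for motif, shift in shifts.items():
        raw = -gain * shift
        corrections[motif] = max(-max_step, min(max_step, raw))
    return corrections

# Hypothetical observations for two motif types.
observations = [
    ("H_aliphatic...C", 2.90, 3.00),
    ("H_aliphatic...C", 2.80, 2.86),
    ("O...H_polar", 1.90, 1.89),
]
shifts = mean_shift_by_motif(observations)
corrections = capped_corrections(shifts)
for motif, delta in corrections.items():
    print(f"{motif}: mean shift {shifts[motif]:+.3f} Å, correction {delta:+.3f} Å")
```

Repeating the pass with the corrected parameters, as described above, would let errors shared between several terms settle gradually rather than being overcorrected in one step.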
In the first stage restraints of bond angles and torsions were used to limit change to nonbonded contact distances and bond length
Small molecule crystal data provided a large sample with reliable values for all atoms in the unit cell. By optimizing with bond angles and torsions restrained to starting values, it was possible to focus on nonbonded interactions. Short-range electrostatic forces did not model hydrogen bonding well and better results were obtained using a strong short-range attractive force between polar hydrogen and acceptor atoms and without electrostatic energy. Changes were also made in the attractive part of the Van der Waals potential.
Restraints were removed to optimize the bonded terms and then the entire force field was refined to achieve self-consistency using both PDB and small molecule crystal data
Bond angle terms for small molecules were optimized keeping only torsion restraints. After this, removal of all restraints allowed torsion parameters to be tested and developed. Final refinement of the force field involved all the potential energy terms found in a combined set of PDB and small molecule crystal structures.
The new force field has major changes in non-bonded energy functions
Changes in non-bonded potential energy functions caused a dramatic improvement in the force field (Supporting Information). No changes were made to the form of the functions used for bonded energy. However, the force field does contain parameters for many additional bond energy terms.
A water addition algorithm was based on the new force field
A 0.33 Å resolution grid map of water energies was prepared. Energies were calculated assuming optimal hydrogen orientation. Points with a local energy minimum were listed. When points were close, the one with the highest energy was removed; the worst clashes were processed first. When no severe clashes remained, the listed grid points were used to position water molecules, oriented for hydrogen bonding, in the model. Energy minimization with the new force field was used to optimize the positions of the waters and remove any with high energy.
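The grid-based selection of candidate water sites can be sketched as follows. This is a simplified reading of the description above: the 2.4 Å minimum separation, the six-neighbour definition of a local minimum, and the interpretation of "worst clash" as the closest pair are assumptions, and the energies are placeholders rather than values from the force field.

```python
import math

def local_minima(grid):
    """Grid indices whose energy is lower than all existing face neighbours.

    grid maps integer (i, j, k) indices to an energy value.
    """
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    minima = []
    for (i, j, k), e in grid.items():
        neighbours = [grid[(i + di, j + dj, k + dk)]
                      for di, dj, dk in offsets
                      if (i + di, j + dj, k + dk) in grid]
        if neighbours and all(e < n for n in neighbours):
            minima.append((i, j, k))
    return minima

def prune_close_points(points, min_separation=2.4):
    """Remove clashing candidates, closest pair first.

    points: list of ((x, y, z), energy). For each clashing pair the
    point with the higher energy is dropped.
    """
    kept = list(points)
    while True:
        worst = None  # (distance, index_i, index_j) of the closest clashing pair
        for i in range(len(kept)):
            for j in range(i + 1, len(kept)):
                d = math.dist(kept[i][0], kept[j][0])
                if d < min_separation and (worst is None or d < worst[0]):
                    worst = (d, i, j)
        if worst is None:
            return kept
        _, i, j = worst
        del kept[i if kept[i][1] > kept[j][1] else j]

# Hypothetical candidate sites: position in Å and energy in arbitrary units.
candidates = [((0.0, 0.0, 0.0), -5.0), ((1.0, 0.0, 0.0), -3.0),
              ((1.5, 0.0, 0.0), -4.0), ((10.0, 0.0, 0.0), -1.0)]
sites = prune_close_points(candidates)
print([energy for _, energy in sites])
```

The surviving sites would then be populated with oriented waters and passed to energy minimization.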
Amino acid side-chain searching used a Monte Carlo method adapted from QXP
Nine HIV protease complexes were energy minimized, water was removed, and test models for side-chains and their environment were prepared. The complexes had resolutions between 1.2 Å and 2.8 Å. Each test amino acid was randomized and subjected to a Monte Carlo search with energy minimization of the test amino acid.
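The general pattern of a Monte Carlo search combined with energy minimization is illustrated below on a single torsion angle. This is a toy sketch, not the QXP search: the energy function, temperature factor, step size, and number of steps are all hypothetical, and a real side-chain search would vary several torsions in the presence of the protein environment.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def energy(theta):
    """Toy torsion energy: threefold term plus a bias favouring theta = pi."""
    return (1.0 + math.cos(3.0 * theta)) + 0.6 * (1.0 - math.cos(theta - math.pi))

def gradient(theta):
    """Analytic derivative of the toy energy with respect to theta."""
    return -3.0 * math.sin(3.0 * theta) + 0.6 * math.sin(theta - math.pi)

def minimize(theta, step=0.05, n_iter=500):
    """Local steepest-descent minimization of the torsion angle."""
    for _ in range(n_iter):
        theta -= step * gradient(theta)
    return theta % TWO_PI

def monte_carlo_search(n_steps=100, kt=0.5, seed=0):
    """Randomize, then repeat: perturb, minimize, Metropolis accept or reject."""
    rng = random.Random(seed)
    current = minimize(rng.uniform(0.0, TWO_PI))  # randomized starting conformer
    best = current
    for _ in range(n_steps):
        trial = minimize((current + rng.uniform(-math.pi, math.pi)) % TWO_PI)
        delta = energy(trial) - energy(current)
        if delta <= 0.0 or rng.random() < math.exp(-delta / kt):
            current = trial
        if energy(trial) < energy(best):
            best = trial  # keep the lowest energy conformer seen so far
    return best, energy(best)

best_theta, best_energy = monte_carlo_search()
print(f"lowest energy torsion = {math.degrees(best_theta):.1f} deg, "
      f"energy = {best_energy:.4f}")
```

Comparing the lowest energy conformer with the starting structure, for example by RMSD, then gives the kind of assessment reported in Results.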