SSC: A tool for constructing libraries for systematic screening of conformers

Authors

  • Sanliang Ling

    Corresponding author
    1. Chemistry—School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh EH14 4AS, United Kingdom
  • Maciej Gutowski

    1. Chemistry—School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh EH14 4AS, United Kingdom

Abstract

Even a relatively small molecule with 10–20 atoms might have a few local minima, which correspond to different conformers. The number of local minima quickly increases with molecular size, and the most common algorithms, driven by calculated forces, frequently identify the minimum that is closest to the initial structure rather than the most stable conformer. Here we discuss how to perform a systematic search of the conformational space for a chain-like molecule. Our approach is fully automated, and the user controls which chemical bonds are probed and with which increments. Moreover, whole fragments of the molecule that are adjacent to each selected rotational bond are rotated in a properly selected cylindrical coordinate system, so that unchemical hybridizations and some “clashes” between neighboring groups, which are common when standard Z-matrices are used, are avoided. A library of potentially relevant conformers is created with a tool that we call SSC, denoting Systematic Screening of Conformers. Each member of the library is prescreened at a predefined level of theory and the most promising conformers are identified. Finally, they are further evaluated at a higher level of theory to identify the most stable structures and their physicochemical properties. As an example, we demonstrate the results of this approach for 2′-deoxycytidine. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011

Introduction

It has recently been demonstrated that combinatorial/computational methods are quite robust in answering fundamental chemical questions. For example, a method has been developed to find the most stable tautomers,1, 2 and the TauTGen code constructs a library of tautomers of a molecular frame built of heavy atoms and hydrogens. Likewise, the ConGENER code3 constructs a library of substitution isomers (the so-called congeners) of a molecule.

The conformational space of a molecule is spanned by all its internal rotational bonds.4 Each combination of internal rotational bonds gives birth to an individual conformation, whereas every conformation has its own, in general different, electronic energy, which corresponds to a point on the potential energy surface. According to the theory of statistical thermodynamics,5 conformations with lower electronic energies have more favorable Boltzmann factors, and they dominate the distribution of conformers. At the same time, only significantly populated conformers contribute to molecular properties measured in experiments. Therefore, to obtain relevant information about a molecule from a theoretical study, one needs to find the low-energy conformations, as they determine its physicochemical properties.
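The link between relative energy and population can be illustrated with a short calculation. The sketch below is only an illustration: the function name `boltzmann_populations` and the example energies are hypothetical, and degeneracies as well as thermal corrections are ignored.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)
T = 298.15             # temperature in K, chosen for illustration

def boltzmann_populations(rel_energies_kcal, temperature=T):
    """Return normalized Boltzmann weights for relative energies in kcal/mol."""
    beta = 1.0 / (R_KCAL * temperature)
    e_min = min(rel_energies_kcal)
    weights = [math.exp(-beta * (e - e_min)) for e in rel_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical relative energies (kcal/mol); these are not values from this work.
example = [0.0, 0.5, 2.0, 4.0]
populations = boltzmann_populations(example)
for e, p in zip(example, populations):
    print(f"{e:4.1f} kcal/mol -> {100 * p:5.1f} %")
```

With these made-up numbers, a conformer lying 4 kcal/mol above the minimum contributes well below 1% at room temperature, which is why only the low-energy part of the library matters for measured properties.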

The potential energy surface of a molecule is characterized by many local minima and one global minimum, and the goal of the conformational search would be to find the global minimum structure as well as the most stable local minimum structures. There are stochastic search methods, such as simulated annealing,6 Monte Carlo, and basin-hopping methods,7, 8 which can explore the potential energy surface efficiently. However, one never knows for sure that the global minimum has been found by a stochastic search method.8 The systematic search method, in which one systematically explores the whole conformational space, is the most reliable approach; its accuracy is restricted only by the accuracy of the theoretical method used and by the accuracy of sampling of the rotational degrees of freedom. To perform a systematic search, one would need to generate a library of all reasonable conformers, which are obtained through systematic rotations of molecular fragments by discrete increments. This conceptually straightforward task quickly becomes very time consuming if done manually. It would be much more advantageous to develop a tool that could generate conformers automatically. Indeed, there are tools which are capable of doing this.9, 10 However, the research community would benefit from a well-documented tool, with the source code available in the public domain, so one could modify it for special applications.

The systematic generation of conformations could be done in either the internal coordinates, which are usually written in the Z-matrix format,4 or the Cartesian coordinates. A few algorithms have been suggested for generation of conformers of a molecule, such as OMEGA,9 and CAESAR,10 both of which generate conformers in such a way that a molecule is divided into smaller fragments, and then all these fragments, which might have different conformations, are reassembled to form new conformers of a molecule. Such approaches are efficient because of adoption of conformations of the fragments. However, these are not full systematic searches, and these methods might miss some conformers, which could not be assembled from the most stable conformations of each fragment. In practice, these methods use force fields for screening at intermediate stages, which limits accuracy of the procedure.

To generate a new conformation, one would need to change dihedral angles. Different schemes have been developed for the purpose of updating dihedral angles, including simple rotations, which are based on a global reference frame,11 the Denavit–Hartenberg local frames,12 and the atom-group local frames.13 Zhang and Kavraki13 compared these three methods, and they concluded that their atom-group local frames are superior to the first two, because their method eliminates book-keeping and error accumulation and provides complete inheritance of rotation matrices. Later on, Choi11 improved the efficiency of the simple rotations method by using a series of consecutive operations, including translations and rotations, to update dihedral angles, and she concluded that the improved simple rotations are as efficient as the atom-group local frames.

Because of the inherent nature of Z-matrix coordinates, in which the definition of coordinates of one atom depends on coordinates of another atom that has already been defined, a systematic generation of conformations based on Z-matrix coordinates encounters problems. An example, which clearly shows the problem when using Z-matrix coordinates to update a dihedral angle and create new conformers, is given below. In this example, an intuitively obvious Z-matrix is given for a conformer of the canonical tautomer of glycine (see Figure 1 for the Z-matrix and for the atomic labels).

Figure 1.

Updating molecular geometry based on the Z-matrix coordinates. Top left: the Z-matrix. Top right: the initial structure. Bottom left and right: unchemical molecular structures resulting from increments of dih5 and dih7 by −60° and +60°, respectively. Color code: white = hydrogen, blue = nitrogen, green = carbon, red = oxygen.

If one wants to generate a new conformer of glycine by updating dihedral angles defined by a rotational bond between atoms 1 and 2, one has three choices, because there are three dihedral angles defined by this rotational bond: “dih5”, “dih7”, and “dih8”. Supposing the initial geometry has Cs symmetry, the dihedral angles “dih7” and “dih8” are equivalent. For this reason, only “dih5” and “dih7” are considered here. If one changes these two dihedral angles by a certain amount (e.g., −60° or +60°), one produces several new conformers, and two of them are shown in Figure 1. One can see that both trial structures are chemically meaningless. For example, in the first case, we updated “dih5” by −60°, which changed only the positions of atoms 5, 6, and 9; this creates a short contact between atoms 5 and 7 and an improper hybridization of atom 1. This is exactly what happens when one manipulates the glycine molecule using this Z-matrix and the commonly used and publicly accessible pre- and post-processing program Molden.14 The problem could be alleviated if atoms 7 and 8 were rotated together with atoms 5, 6, and 9, but it requires special attention from a computational chemist. Alternatively, one can reorder atoms in the coordinate file to avoid this clash, but this approach would require a different ordering of atoms for rotations along different dihedral angles. For glycine, this problem is straightforward to resolve, but it becomes serious and difficult to handle in systems that have many large side chains. A decisive solution of this problem would be to perform a concerted rotation of all fragments that are connected to the chemical bond under consideration. This approach will be outlined in the next section.

Methods

We developed a tool, SSC, denoting Systematic Screening of Conformers.15 SSC is an automatic conformation generator, which we use for systematic searches of conformational spaces of molecules. The user is asked to provide the Cartesian coordinates of a molecule (path of the *.xyz file). To obtain a library of conformations, the user needs to specify which chemical bonds should be activated (the two atoms involved in a rotational bond), and the number nrot of rotamers to be considered for each rotational bond. The value of nrot implies that all groups connected to one side of the rotational bond will be rotated with 360°/nrot increments. This is all that is needed from the user. Next, SSC will generate possible conformations for each rotational bond recursively, until all specified rotational bonds have been scanned. As an output, geometries (in the form of separate *.xyz files) of all considered conformers will be produced.
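The enumeration of rotamer combinations can be sketched as follows. This is not the SSC source code: it replaces the recursion described above with `itertools.product`, which visits the same grid, and the input format, the function name `torsion_grid`, and the second bond (1-5 with nrot = 3) are assumptions made only for illustration.

```python
from itertools import product

def torsion_grid(active_bonds):
    """Yield one tuple of rotation angles (in degrees) per trial conformer.

    active_bonds: list of ((atom_a, atom_b), nrot) entries (hypothetical format).
    """
    per_bond = [
        [k * 360.0 / nrot for k in range(nrot)]
        for _bond, nrot in active_bonds
    ]
    yield from product(*per_bond)

# Bond 1-2 with nrot = 6; the second entry is an illustrative assumption.
bonds = [((1, 2), 6), ((1, 5), 3)]
grid = list(torsion_grid(bonds))
print(len(grid), grid[:4])
```

Each tuple in `grid` lists the rotation applied to every activated bond, so one geometry file would be written per tuple.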

SSC can also prepare Gaussian16 input files for these geometries. To do this, the user will be asked to provide information about the “Route Section” for a Gaussian job, the charge and multiplicity of the molecule. Then Gaussian input files for all geometries available in the library of conformers will be created. Finally, once calculations based on these input files finish, information about converged energies and geometries could be extracted from output files with the Gaussian output tools,17 and the global minimum structure as well as the most stable local minima structures will be identified. Further calculations are typically performed at a higher level of theory for the most promising conformers to determine their relative stability and physicochemical properties.
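As a rough illustration of this step, a minimal Gaussian input file consists of a route section, a title, the charge and multiplicity, and the geometry, separated by blank lines. The helper below is a sketch under that assumption; the function name `gaussian_input`, the title string, and the water geometry are placeholders and do not come from SSC.

```python
def gaussian_input(route, title, charge, multiplicity, atoms):
    """Build the text of a minimal Gaussian input file.

    atoms: list of (element_symbol, x, y, z) tuples, coordinates in Angstrom.
    """
    lines = [route, "", title, "", f"{charge} {multiplicity}"]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol:<2s} {x:14.8f} {y:14.8f} {z:14.8f}")
    lines.append("")  # a blank line terminates the geometry block
    return "\n".join(lines) + "\n"

# Placeholder geometry (water), used only to show the file layout.
water = [
    ("O", 0.0, 0.0, 0.1173),
    ("H", 0.0, 0.7572, -0.4692),
    ("H", 0.0, -0.7572, -0.4692),
]
text = gaussian_input("# B3LYP/6-31G* Opt", "conformer 001", 0, 1, water)
print(text)
```

In a library run, the same route section, charge, and multiplicity would be reused for every geometry, with only the title and coordinates changing.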

In the current contribution, we modify chemical structures in a step-by-step procedure. The total number of created conformers, ntotal, is a product of nrot values for all activated bonds:

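$$ n_{\mathrm{total}} = \prod_{i} n_{\mathrm{rot},i} \qquad (1) $$

where the product runs over all activated rotational bonds i.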

The value of ntotal increases rapidly with the size of the molecule.18, 19 For example, a molecule with six rotational bonds, each probed with 360°/6 increments, would have 46,656 conformers. The value of ntotal dictates which theoretical model is practical when evaluating the level of fitness of each member of the library. In the case of 2′-deoxycytidine discussed in the next section, we could afford to screen the library with quite an accurate electronic structure method, B3LYP/6-31G*. In the case of more complicated molecules, the initial screening might be performed using approximate force fields.
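The growth of the library size is easy to check numerically. The short sketch below evaluates the product of nrot values for the six-bond example above and for the four-angle grid (nrot = 3, 6, 3, 6) used for 2′-deoxycytidine in the Results section; the helper name `n_total` is chosen here for illustration.

```python
from math import prod

def n_total(nrot_values):
    """Number of trial conformers: the product of nrot over all activated bonds."""
    return prod(nrot_values)

six_bonds = n_total([6] * 6)       # six bonds, 60 degree increments
dc_library = n_total([3, 6, 3, 6]) # beta, gamma, epsilon, chi grid for dC
print(six_bonds, dc_library)       # 46656 324
```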

Details on how SSC updates molecular structures and generates new conformations are given below. A flowchart, which explains how SSC works, is given in Figure 2. SSC reads in the Cartesian coordinates file (*.xyz) provided by the user, a list of rotational bonds that will be active in the conformational search, and the corresponding nrot values. For example, the value of nrot for a rotational bond determined by atoms 1 and 2 (see Figure 1) is 6.

Figure 2.

Basic working procedure of SSC.

Next, a connectivity matrix is constructed for the molecule. Let us take glycine as an example (see Figure 1, in which the atom numbering is consistent with the *.xyz file, which defines the geometry). The connectivity matrix, see Figure 3a, is automatically generated in such a way that if two atoms are connected with a chemical bond, we set the related matrix element to 1, or if not, the element is set to 0. The diagonal elements of the matrix are all set to 0. To decide whether two atoms M and N are connected we use a criterion adopted by the Cambridge Structural Database20

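$$ R_{\mathrm{cov}}(M) + R_{\mathrm{cov}}(N) - t < R_0(MN) < R_{\mathrm{cov}}(M) + R_{\mathrm{cov}}(N) + t \qquad (2) $$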

in which R0 (MN) denotes the actual distance between atoms M and N, Rcov denotes the covalent radii, as recommended by Cordero et al.21 and t denotes a tolerance which was set to 0.4 Å. If the value of R0 (MN) fulfills the condition (2) then M and N are connected.
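A connectivity matrix built with this distance criterion can be sketched in a few lines. The code is an illustration, not the SSC implementation: the covalent radii are the Cordero et al. values for H, sp3 C, N, and O as recalled here and should be checked against reference 21, and the water geometry is a placeholder test case.

```python
import math

# Covalent radii in Angstrom (assumed values; verify against reference 21).
R_COV = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}
TOL = 0.4  # tolerance t in Angstrom, as stated in the text

def connectivity_matrix(symbols, coords, tol=TOL):
    """Return a symmetric 0/1 matrix; the diagonal elements stay 0."""
    n = len(symbols)
    conn = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            r_sum = R_COV[symbols[i]] + R_COV[symbols[j]]
            if r_sum - tol < r < r_sum + tol:
                conn[i][j] = conn[j][i] = 1
    return conn

# Placeholder geometry (water): O bonded to two H atoms, no H-H bond.
symbols = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.1173), (0.0, 0.7572, -0.4692), (0.0, -0.7572, -0.4692)]
conn = connectivity_matrix(symbols, coords)
print(conn)
```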

Figure 3.

The connectivity matrix of the glycine molecule: (a) the original one, (b) the one with atoms C1 and C2 disconnected, in which the modified elements are marked in green, and the red and blue markings indicate molecular fragments connected to C1 and C2, respectively.

The left-most part of (2) is also used to identify potential clashes that might develop between various parts of the molecule in the course of systematic rotations. If the distance between any two atoms M and N is smaller than Rcov(M) + Rcov(N) − t then the user is notified about a potential clash and further input is required to decide whether this conformer should be abandoned or not.
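The clash test uses the same ingredients as the connectivity check. The sketch below flags atom pairs closer than Rcov(M) + Rcov(N) − t; the radii are assumed values (see reference 21), the function name `find_clashes` is hypothetical, and the three-atom geometry is artificial. Atom indices are zero-based.

```python
import math

# Covalent radii in Angstrom (assumed values; verify against reference 21).
R_COV = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

def find_clashes(symbols, coords, tol=0.4):
    """Return (i, j, distance) for every pair closer than R_cov(i) + R_cov(j) - tol."""
    clashes = []
    for i in range(len(symbols)):
        for j in range(i + 1, len(symbols)):
            r = math.dist(coords[i], coords[j])
            if r < R_COV[symbols[i]] + R_COV[symbols[j]] - tol:
                clashes.append((i, j, r))
    return clashes

# Artificial geometry: two H atoms only 0.15 Angstrom apart, one distant O atom.
symbols = ["H", "H", "O"]
coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.15), (0.0, 0.0, 3.0)]
clashes = find_clashes(symbols, coords)
print(clashes)
```

A returned pair corresponds to the situation in which the user would be notified and asked whether the conformer should be abandoned.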

Our approach to creating conformers that differ by rotations around the A-B bond is the following. First, we create a temporary connectivity matrix based upon the original matrix, but with atoms A and B disconnected. For example, we have shown in Figure 3b a connectivity matrix for glycine with atoms 1 and 2 disconnected (the modified elements are marked in green). Next, we create two vectors which have at most L elements (where L is the total number of atoms in the molecule). The first vector contains the labels of atom A and of all those atoms that are directly or indirectly connected to it. The list of these atoms is determined from the temporary connectivity matrix (see red markings in Figure 3b). Analogously, the second vector contains the labels of atom B and of all those atoms that are directly or indirectly connected to it (see blue markings in Figure 3b). For example, in the case of rotations around the bond 1-2 of glycine, one produces two vectors [1, 7, 8, 5, 6, 9] and [2, 3, 4, 10], which characterize two branches of glycine, one connected to atom 1 and another connected to atom 2. The software will recognize which branch of the molecule has fewer atoms, and this branch will be rotated to generate new conformations.
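The branch assignment amounts to a graph traversal on the connectivity with the A-B bond removed. The sketch below uses a breadth-first search on a bond list instead of a matrix, which is an implementation choice made here. The glycine bond list is inferred from the two vectors quoted above; in particular, attaching hydrogen 10 to oxygen 4 is an assumption that does not affect the resulting branches.

```python
from collections import deque

def split_branches(bonds, a, b):
    """Return the atoms reachable from a and from b once the a-b bond is removed."""
    neighbours = {}
    for i, j in bonds:
        if {i, j} == {a, b}:
            continue  # temporarily disconnect A and B
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)

    def reachable(start):
        seen = {start}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nxt in sorted(neighbours.get(atom, ())):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    return reachable(a), reachable(b)

# Assumed glycine bond list (1-based labels as in Figure 1).
GLYCINE_BONDS = [(1, 2), (1, 5), (1, 7), (1, 8), (5, 6), (5, 9),
                 (2, 3), (2, 4), (4, 10)]
branch_a, branch_b = split_branches(GLYCINE_BONDS, 1, 2)
smaller = min(branch_a, branch_b, key=len)  # the branch that would be rotated
print(sorted(branch_a), sorted(branch_b))
```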

Let us assume that the branch connected to atom B is smaller, and we will rotate this branch around the A-B bond. Next steps of our procedure are shown in Figure 4. First we translate the whole molecule, such that the atom A coincides with the origin of the coordinate system (see Figure 4b). Next, we rotate the molecule, such that the atom B ends up on the Z-axis, whereas A remains at the origin of the coordinate system (see Figure 4c). Now, we use cylindrical coordinates to rotate the B branch of the molecule, whereas the A branch remains intact. This approach eliminates potential “clashes” between different parts of the B branch, see Figure 1. Other potential clashes are handled using the Rcov (M) + Rcov (N) − t threshold discussed earlier. After this rotation, we convert cylindrical coordinates back to Cartesian coordinates and print them out for each conformation into a separate *.xyz file. A more detailed description of operations illustrated in Figure 4 could be found in the Supporting Information (see Supporting Information Figure S1 and the following description).
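The translate, align, and rotate sequence can be sketched with the standard library alone. The code below is an illustration under stated assumptions rather than the SSC routine: the alignment is done with two elementary rotations (about z, then about y), which is one of several equivalent choices, the function names are hypothetical, and the four-atom geometry is artificial. Indices are zero-based and the output stays in the aligned frame.

```python
import math

def align_bond_to_z(coords, a, b):
    """Translate atom a to the origin and rotate so that atom b lies on +z."""
    ax, ay, az = coords[a]
    shifted = [(x - ax, y - ay, z - az) for x, y, z in coords]
    bx, by, bz = shifted[b]
    phi = math.atan2(by, bx)                    # azimuth of the A-B vector
    theta = math.atan2(math.hypot(bx, by), bz)  # polar angle of the A-B vector
    cp, sp = math.cos(phi), math.sin(phi)
    ct, st = math.cos(theta), math.sin(theta)
    aligned = []
    for x, y, z in shifted:
        x1, y1 = x * cp + y * sp, -x * sp + y * cp                # rotate about z by -phi
        aligned.append((x1 * ct - z * st, y1, x1 * st + z * ct))  # rotate about y by -theta
    return aligned

def rotate_branch(aligned, branch, angle_deg):
    """Rotate the atoms listed in branch about the z-axis via cylindrical coordinates."""
    delta = math.radians(angle_deg)
    out = list(aligned)
    for i in branch:
        x, y, z = aligned[i]
        rho, phi = math.hypot(x, y), math.atan2(y, x)  # Cartesian -> cylindrical
        out[i] = (rho * math.cos(phi + delta), rho * math.sin(phi + delta), z)
    return out

# Artificial example: A = atom 0, B = atom 1, atom 2 on the B side, atom 3 on the A side.
coords = [(1.0, 1.0, 1.0), (1.0, 2.0, 1.0), (2.0, 2.5, 1.0), (0.0, 0.5, 1.0)]
aligned = align_bond_to_z(coords, 0, 1)
rotated = rotate_branch(aligned, [1, 2], 180.0)
print(rotated)
```

Because every atom of the B branch receives the same angular increment, distances within the branch are preserved, while the A branch is left untouched.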

Figure 4.

Alignment of the A-B rotational bond with the Z-axis.

We do not expect the outcomes of our method to depend on the structure of the initial conformer, provided that all rotational bonds are sampled and the nrot values are properly selected. One should keep in mind that all structures created by SSC are optimized at the screening stage, compensating for differences in internal geometrical parameters of different initial conformers. Our approach might avoid problems encountered in earlier studies, where the results of a conformational search were dependent on which conformer was used as the starting point.22, 23

The MP224 optimizations of the neutral species reported in this study were performed with NWChem 5.0,25 and the B3LYP26, 27 calculations were performed with Gaussian 03.16

Results

As an example, SSC has been used to explore the conformational space of 2′-deoxycytidine (dC), a nucleoside consisting of cytosine and deoxyribose. The molecular framework of dC is shown in Figure 5. It has been reported that in natural nucleic acid helices, the anti combination of the base and the sugar is dominant.28 In a theoretical study performed by Hocquet et al.29 the authors found that for all eight isolated nucleoside molecules studied, the anti conformers were more stable than the syn conformers, and for dC, the anti conformer was more stable than the syn conformer by about 5 kcal/mol. In the current contribution, both the anti and syn combinations of the base and the sugar unit were considered through rotation of the base along the glycosidic bond by a proper increment. The numbering scheme for all atoms is given in Figure 5. In the following discussion, we will add apostrophes on the atomic numbers of all the heavy atoms in the sugar unit to distinguish them from those of the base. All rotational bonds, which are considered in our systematic conformational search, are also labeled in Figure 5. The nrot values for the angles β, γ, ε, χ, and φ considered in our work are 3, 6, 3, 6 and 2, respectively. It should be pointed out here that all previous studies indicated that the amino group of the base is nearly planar,30 and it is far from other parts of the molecule. It implies that any change of the dihedral angle φ has little effect on other degrees of freedom, hence it is reasonable to neglect this degree of freedom at the initial stage of our conformational search. In addition, the puckering of C2′ or C3′ in the sugar, namely the relative stability of the C2′-endo and the C3′-endo conformations, is still controversial.31 Obviously, this degree of freedom could not be properly dealt with using the SSC algorithm. 
Thus, we decided to use a planar sugar conformation in the initial guess structures and let the optimization procedure distort it according to the actual forces acting on the atoms, though this might require more time to complete the geometry optimization. This procedure led in some cases to the C2′-endo and in others to the C3′-endo structures. For the five most stable structures identified in this way, we explored whether an opposite puckering of the sugar ring would lead to a more stable structure.

Figure 5.

Molecular structure of 2′-deoxycytidine.

On the basis of these assumptions, our systematic search of the conformational space of neutral dC was performed in the following way. First, a library of conformers related to the dihedral angles β, γ, ε and χ was generated by SSC, and these initial structures were optimized at the B3LYP/6-31G* level of theory.16 The total number of trial structures in the library was 324, but only 66 distinct conformers were identified at this stage (we used an energy difference of 10^−5 Hartree plus visual evaluation to exclude equivalent minimum energy structures). Second, all these 66 conformers were further optimized at the B3LYP/6-31++G** level of theory. Finally, vibrational frequency analysis was performed at the B3LYP/6-31++G** level for the 30 most stable structures (see Supporting Information for their Cartesian coordinates and absolute electronic energies). These structures were further optimized at the MP2/aug-cc-pVDZ level to take intramolecular dispersion interactions into account and to refine the relative stability of conformers. For the five most stable structures, an opposite puckering of the sugar ring was also considered.
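The energy-based part of this filtering can be sketched as follows. The function name `distinct_by_energy` and the energies are hypothetical, and the sketch covers only the energy criterion: equal energies do not prove that two structures are identical, which is why the visual evaluation mentioned above remains necessary.

```python
def distinct_by_energy(energies, threshold=1e-5):
    """Return indices of representative structures, lowest energy first.

    A structure is kept only if its energy (Hartree) lies at least `threshold`
    above the energy of the previously kept representative.
    """
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    representatives = []
    for i in order:
        if not representatives or energies[i] - energies[representatives[-1]] >= threshold:
            representatives.append(i)
    return representatives

# Hypothetical converged energies in Hartree; these are not values from this work.
energies = [-853.123456, -853.123459, -853.120001, -853.123455, -853.110000]
kept = distinct_by_energy(energies)
print(kept)
```

Comparing against the last kept representative, rather than against the previous structure in the sorted list, prevents a chain of near-degenerate entries from being merged step by step across a span wider than the threshold.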

The eight most stable structures we found are displayed in Figure 6 and the B3LYP/6-31++G** and MP2/aug-cc-pVDZ relative energies of these eight conformers, with geometries optimized at the corresponding levels of theory, are summarized in Table 1. The B3LYP results corrected for zero-point vibration energy, enthalpy, and free energy terms, obtained in rigid rotor/harmonic oscillator approximation for T = 298 K and p = 1 atm, are also listed. The labeling letter N refers to the neutral structure, and the numeral following the letter indicates the stability of the neutral species ordered according to the ascending MP2/aug-cc-pVDZ electronic energy.

Figure 6.

MP2 geometries for the eight most stable conformers of 2′-deoxycytidine. The dotted lines indicate the intramolecular hydrogen bonds identified by Molden with default settings. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Table 1. Relative MP2 and B3LYP energies (in kcal/mol) of the most stable conformers of neutral 2′-deoxycytidine. The zero-point vibrational corrections and thermal contributions to enthalpy and free energy are included at the B3LYP level.

Structure   E(B3LYP)   E(MP2)   E(B3LYP) + ΔE0,vib   E(B3LYP) + ΔH298,corr   E(B3LYP) + ΔG298,corr
N1             0.000    0.000                0.000                   0.000                   0.000
N2            −0.003    0.109                0.053                   0.021                   0.115
N3             1.244    2.009                1.053                   1.223                   0.804
N4             1.808    2.049                1.689                   1.781                   1.487
N5             1.349    2.097                1.059                   1.290                   0.661
N6             3.308    2.733                3.299                   3.362                   3.305
N7             3.953    3.522                3.900                   3.972                   3.833
N8             2.983    3.716                2.473                   2.808                   1.290

As mentioned earlier, our procedure leaves some uncertainty about the puckering of the sugar ring. For this reason, we manually changed the sugar ring puckering for the conformers N1–N5, e.g., if the converged structure was C2′-endo we considered its C3′-endo counterpart, and vice versa. The geometry optimizations of these five structures converged to structures that already existed in our original set listed in Supporting Information Table S2. Some higher energy conformers from Supporting Information Table S2 might favor a puckering of the sugar ring opposite to the one identified by us. In view of the small energy differences between oppositely puckered structures, we do not expect these as yet unidentified structures to become competitive with the most stable structures identified so far.

The results reported in Table 1 and Supporting Information Table S2 demonstrate the critical importance of systematic conformational searches. Our most important finding is that neutral dC favors the syn conformation in the gas phase, rather than the anti conformation, which is common in solid phases32 and in DNA.33 Indeed the seven most stable conformers of dC identified by us are syn, and the first encountered anti structure, N8, is less stable than N1 by 3.6 kcal/mol at the MP2/aug-cc-pVDZ level of theory. This finding is important because many computational studies of dC in the gas phase select an anti conformer, e.g., a study on the intramolecular hydrogen bonding in dC,34 a recent study on anion interactions of dC,35 or an interpretation of photoelectron spectra of anionic nucleosides.36

The formation of intramolecular hydrogen bonds is very important for the relative stability of various conformers of biomolecules.37 The results reported in Figure 6, Table 1, and Supporting Information Table S2 reveal that the seven most stable neutral conformers are characterized by a hydrogen bond between C2-O2 and O5′-H13, which is plausible for the syn conformers. This kind of intramolecular hydrogen bond is impossible for the anti conformers, e.g., the conformer N8 supports only a weak hydrogen bond formed between C6-H4 and C5′-O5′. We also found that the conformers N1 and N2 have very similar molecular structures and stability. They differ only in the orientation of the hydroxyl group connected to the C3′ atom of the sugar ring.

From Figure 6, we could see that the conformers N1 and N3 differ by the puckering of the sugar ring, with N1 being C2′-endo and N3 being C3′-endo, and they differ in stability by 2.0 kcal/mol, see Table 1. Thus our results strongly favor the C2′-endo conformation to be dominant in the gas phase, while earlier B3LYP results29 favored the C3′-endo conformation by 0.53 kcal/mol. Further physicochemical consequences of findings resulting from our systematic search will be discussed in our future report.

Conclusions

We have presented the Systematic Screening of Conformers (SSC) tool, which generates an initial library of conformers for a systematic screening of the conformational space of a molecule. After providing an initial structure, the user defines which chemical bonds should be rotated and with which increments. By rotating a whole fragment of the molecule, which is adjacent to the activated bond, unchemical hybridizations and some “clashes” between neighboring groups are avoided. Each member of the library is prescreened at a predefined level of theory and the most promising conformers are identified. The tool offers a significant saving of human time, which otherwise would be spent on generation of hundreds or thousands of initial structures and corresponding input files. We envisage usage of this tool by both computational chemists and those experimentalists who perform supporting computational studies.

The tool could be used to identify the most stable conformer of a molecule. Thus it might provide useful benchmarks for stochastic search methods.8 The cost of an SSC search scales steeply with the molecular size. With present-day computational resources it is applicable to systems with up to 5–6 rotational bonds probed with 2 < nrot < 6, if density functional electronic structure methods are used to screen the library. Larger molecules could be studied if initial screening was performed using more approximate force fields. The SSC approach is not limited to molecules in the gas phase. After inclusion of solvent effects at the level of a polarizable continuum model, it is applicable to molecules in any solvent.

An application of our tool has been demonstrated for a nucleoside, 2′-deoxycytidine. We demonstrated that syn conformers should be dominant in the gas phase rather than anti conformers, which are favored in crystalline structures and in DNA.28 The dominance of syn conformers in the gas phase is dictated by an intramolecular hydrogen bond between C2-O2 and O5′-H13. Our results also suggest that the dominant puckering of the sugar ring in the gas phase should be C2′-endo, rather than C3′-endo.29

It should be pointed out that SSC is applicable to chain-like molecules with complicated side chains. We pointed out difficulties encountered when dealing with puckered ring-like fragments, e.g., a sugar unit in nucleosides. For molecules with overall closed ring structures, such as cyclodextrin, SSC would not work, because for a given rotational bond A-B, the molecule could not be divided into two decoupled parts associated with atoms A and B, and SSC could not recognize which part of the molecule should be rotated. If such a situation actually happens, SSC will be able to recognize the problem and alert the user. There might also be problems for molecules with symmetric intramolecular hydrogen bonds, because SSC will assume that the central hydrogen atom is connected to both acceptors, and thus forms a ring. Solutions to these problems will be considered in our future work. For instance, it has been reported that ring internal torsion angles could be used to describe ring puckering.38 In the near future, the current tool, SSC, will be integrated with TauTGen,1 a tool for formation of libraries of tautomers, so that the structure of a molecule could be determined automatically taking into account various conformers and tautomers.
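One simple way to recognize the ring situation is to test whether atoms A and B remain connected after the A-B bond has been removed. The sketch below illustrates that idea and is not necessarily how SSC performs the check; the function name `bond_is_in_ring` and the small test graph (a three-membered ring with one substituent) are made up for the example.

```python
def bond_is_in_ring(bonds, a, b):
    """Return True if a and b are still connected once the a-b bond is removed."""
    neighbours = {}
    for i, j in bonds:
        if {i, j} == {a, b}:
            continue  # drop the bond under consideration
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)
    seen, stack = {a}, [a]
    while stack:
        atom = stack.pop()
        for nxt in neighbours.get(atom, ()):
            if nxt == b:
                return True  # a second path exists, so the bond closes a ring
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

# Hypothetical test graph: atoms 1-2-3 form a ring, atom 4 is attached to atom 3.
RING_WITH_TAIL = [(1, 2), (2, 3), (3, 1), (3, 4)]
ring_bond = bond_is_in_ring(RING_WITH_TAIL, 1, 2)
chain_bond = bond_is_in_ring(RING_WITH_TAIL, 3, 4)
print(ring_bond, chain_bond)
```

A bond for which the function returns True cannot be used to split the molecule into two independent branches, which is the case in which the user would be alerted.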

Acknowledgements

The authors thank Dr. Maciej Haranczyk for helpful discussions. Acknowledgement is given to the professional support provided by colleagues in Heriot-Watt University Information and Computing Services (UICS) when using the Heriot-Watt high performance cluster services. In addition, computing resources were available through (i) a project (m875) from the National Energy Research Scientific Computing Center (NERSC), and (ii) a Computational Grand Challenge Application grant from the Molecular Sciences Computing Facility in the Environmental Molecular Sciences Laboratory (EMSL). Pacific Northwest National Laboratory is operated by Battelle for the U.S. DOE under Contract DE-AC06-76RLO1830.
