Unfavorable regions in the Ramachandran plot: Is it really steric hindrance? The interacting quantum atoms perspective

An accurate description of the intrinsic preferences of amino acids is important when developing a biomolecular force field. In this study, we use a modern energy partitioning approach called Interacting Quantum Atoms to inspect the cause of the φ and ψ torsional preferences of three dipeptides (Gly, Val, and Ile). Repeating energy trends at each of the molecular, functional group, and atomic levels are observed across both (1) the three amino acids and (2) the φ/ψ scans in Ramachandran plots. At the molecular level, it is surprisingly electrostatic destabilization that causes the high‐energy regions in the Ramachandran plot, not molecular steric hindrance (related to the intra‐atomic energy). At the functional group and atomic levels, key peptide atoms (O_{i-1}, C_i, N_i, N_{i+1}) and some sidechain hydrogen atoms (H_γ) are identified as responsible for the destabilization seen in the energetically disfavored Ramachandran regions. Consistently, the O_{i-1} atoms are particularly important for the explanation of dipeptide intrinsic behavior, where electrostatic and steric destabilization unusually complement one another. The findings suggest that, at least for these dipeptides, it is the peptide group atoms that dominate the intrinsic behavior, more so than the sidechain atoms. © 2017 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.


Introduction
Since the original schematic published [1] in 1963 by Ramachandran et al., few efforts have advanced the understanding of the commonly denoted "forbidden" and "accepted" regions of the Ramachandran φ/ψ plot. Subsequent work by Mandel et al. [2] allowed a more advanced representation of the Ramachandran φ/ψ plot, detailing the locations of specific hard-sphere repulsions. We note that Mandel's plot is still used in recent undergraduate biochemistry textbooks. [3] Hence, despite the passing of almost half a century, the hard-sphere repulsion models have been accepted and incorporated in the development of many modern-day molecular force fields. Perhaps regrettably, the development of these force fields focused more on successfully parameterising torsional angles than on understanding the quantum mechanical nature of the interactions between the atoms involved. We believe that such a greater understanding is an important step towards simplifying the parameterization task and, especially, putting it on a firmer footing. In other words, the re-parameterization of conventional force fields typically creates new terms for experimentally observed structural effects. However, a method that directly partitions quantum mechanical information has a better chance of capturing all effects from the outset, without extra corrections.
The conformational propensity of the 20 natural amino acids relies on three factors: intrinsic behavior, amino acid sequence, and chemical environment. [4] Understanding the combined influence of all three aspects on a molecular system requires each factor's individual behavior to be understood first.
Addressing the first factor, that of intrinsic behavior, many studies have shown that it causes an amino acid to show preferences in φ/ψ space. [4][5][6][7][8][9][10] One way of investigating intrinsic behavior is through the use of coil libraries. Coil libraries contain sequences of amino acids that form neither α-helical nor β-sheet conformations in experimental X-ray crystal structures. However, the conformations collected are always influenced by the surrounding protein structure since they are simply extracted from the initial complete protein. Hence, the individual amino acids are possibly biased by the tertiary structure of a protein. They are also still biased by being inside a sequence of amino acids. However, it is also known that isolated dipeptide structures are not representative of the amino acid's behavior in oligopeptide chains. [11] Note, as a brief aside, that the often used but actually confusing name "dipeptide" refers to a single amino acid, flanked by a peptide bond at both termini. Here, we must ask at what point an oligopeptide becomes a sequence, such that the peptide's behavior results from sequential effects rather than from the intrinsic behavioral effects of its amino acids. Could it be that only the presence of intramolecular stabilization should be associated with the intrinsic behavior? Although some investigations of single amino acids replicate their behavior in larger systems poorly, there are also many reports identifying the importance of their study. [11][12][13][14]
The second factor is that of sequencing effects. As the name suggests, this is the effect of having a sequence of amino acids, that is, an oligopeptide. The formation of α-helices, β-sheets, and loops results from the specific sequence of amino acids in a chain. 
The α-helices, β-sheets, and loops are regularly occurring oligopeptide structural arrangements allowing amino acids to be packed closer together, typically through hydrogen bonding across different amino acids in the sequence. The formation of such secondary structure landmarks still relies partly on intrinsic propensity, but one could argue that it is dominated by the interatomic interactions formed between the sidechain and backbone atoms of neighboring amino acids in a sequence. Capturing the behavior of oligopeptide sequences remains a non-trivial task. A good example is given in a study by Best et al., [15] which reported the helical character induced upon tri-, tetra-, and penta-Ala oligopeptides by many common force fields: GROMOS (53a6) ~13.1%, CHARMM27 (with CMAP) ~57.5%, AMBER03 ~62.3%, AMBER99 ~94.2%, and AMBER94 ~97.6%. The experimental value each force field was striving to achieve was ~20%. The helicity was excessive for all force fields other than GROMOS. Such is the motivation for many studies determining new or improved torsional potentials in all conventional force fields. [16][17][18][19][20][21][22][23]
The third factor, chemical environment, is perhaps the most difficult to investigate due to the computational expense. Chemical environment behavior may be investigated by observing the influence of multiple sequences on a defined central sequence. However, chemical environment may also refer to solvation effects. Both introduce many new intermolecular interactions to consider, and increase the system size dramatically.
Here we use quantum mechanical methods to investigate the first factor: intrinsic behavior. In order to observe the intrinsic behavior exclusively, so-called dipeptides have been chosen as a first point of investigation. We isolate the intrinsic behavior by eliminating (i) sequencing effects, by working with the single amino acid blocked (or "capped") with an acetyl group (–COCH3) and an amide group (–NHCH3), and (ii) chemical environment, by working in vacuo. The additional benefit of vacuum conditions, other than computational cost, is that they are relevant to the study of amino acids in the hydrophobic core of folded proteins, which is typically inaccessible to solvent. [24] Working with gas phase ab initio data is also in accordance with most force field development. [25][26][27][28][29][30] Our investigation aims to validate (or contrast, where appropriate) the long-standing interpretation of the regions in the Ramachandran plot. To do this, we will use the Interacting Quantum Atoms (IQA) energy partitioning method, which falls under the umbrella of Quantum Chemical Topology (QCT), a name first coined [31] in 2003. The IQA method allows the calculation of atomic energies, which together account for the full molecular energy. The atomic energies can be classified into both intra- and inter-atomic components, and also by energy type, for example, electrostatic, exchange, correlation, and so forth. Strategically chosen excursions (see Figure 1) through φ/ψ conformational space are used to obtain system conformations representing multiple regions of the Ramachandran plot. 
Three systems are investigated: glycine (Gly), valine (Val), and isoleucine (Ile), representing a gradual increase in the number of atoms making up the aliphatic residues, going from –NH–C_αH_2–CO– (in Gly) to the β-branched residues Val and Ile. The investigation will allow us to identify the key atoms, both single atoms and groups thereof, that are responsible for both high- and low-energy regions in the respective Ramachandran plots, along with any global trends that consistently appear across the three systems investigated. The existence of global trends at the atomic level will indicate transferability [32,33] within the systems, a key cornerstone of many force fields and a topic we have previously reported on for oligopeptide chains. [34]

Dataset generation
The optimized geometries of the global energy minima of glycine (Gly) dipeptide, valine (Val) dipeptide, and isoleucine (Ile) dipeptide were taken from our previous work. [35] The angles φ and ψ are the standard backbone dihedral angles, φ = C_{i-1}–N_i–C_α–C_i and ψ = N_i–C_α–C_i–N_{i+1}; Figure 2 shows these two angles and the nuclei involved in defining them. The generic notation used here will be explained in the next section. For each system in turn, the φ and ψ dihedral angles were rotated in 15° increments between −180° and +180°, resulting in 24 = [180 − (−180)]/15 geometries, additional to the global minimum. First, the ψ dihedral angle of the global minimum was frozen, while the φ angle was rotated by the increment angle over the full range (−180° ≤ φ ≤ +180°) using the GAUSSVIEW package. Once all 24 additional geometries for φ were obtained, collectively known as the phi (φ) scan, the procedure was repeated but now freezing the global minimum's φ angle and incrementing the ψ angle over the full range (−180° ≤ ψ ≤ +180°), generating the psi (ψ) scan. The 48 additional geometries (24 for each of the two scans) were then relaxed through geometry optimization while keeping both the φ and ψ dihedral angles frozen. Note that the residues were also optimized and not kept rigid relative to φ or ψ. The program GAUSSIAN09 [38] was used to perform the geometry optimization, the single-point energy calculations and the printing of the wavefunction for subsequent QCT analysis, for each geometry. The optimizations and calculations were also performed at B3LYP/apc-1 level, [39] which is the same level of theory with which the optimized coordinates were originally obtained. In total, 6 (= 2 × 3) sets of geometries were obtained, arising from two scans carried out on each of the three capped amino acids, each set consisting of 25 (= 24 + 1) geometries. In summary, the overall analysis of all systems is based on 147 = 6 × 25 − 3 IQA-partitioned wave functions, where we corrected for the fact that the three global minima are used for both φ and ψ.
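The bookkeeping of the scans described above can be checked with a short sketch. This is only an illustration of the counting, not the scripts used in the study: the function and variable names are hypothetical, and the choice of which endpoint (−180° or +180°, which describe the same conformation) is kept is an assumption.

```python
# Illustrative sketch of the scan bookkeeping; all names are hypothetical.
STEP = 15  # dihedral increment in degrees, as stated in the text


def scan_angles(start=-180, stop=180, step=STEP):
    """Return the frozen dihedral values (degrees) for one scan.

    -180 and +180 describe the same conformation, so only one endpoint is
    kept (assumption: +180), giving (180 - (-180)) / 15 = 24 angles.
    """
    return list(range(start + step, stop + 1, step))


systems = ["Gly", "Val", "Ile"]
scans = ["phi", "psi"]

angles = scan_angles()
geometries_per_set = len(angles) + 1      # 24 constrained geometries + global minimum
number_of_sets = len(systems) * len(scans)

# Each global minimum is shared by the phi and psi sets of its system,
# so it must only be counted once per system.
unique_geometries = number_of_sets * geometries_per_set - len(systems)

print(len(angles), geometries_per_set, number_of_sets, unique_geometries)
# prints: 24 25 6 147
```

The final count reproduces the 147 wave functions quoted in the text.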
After the ab initio calculations described above, the IQA energy partitioning calculations were performed using the AIMAll program [40] (version 16.01.09). The non-default settings requested in AIMAll for the IQA calculations were: the "TwoE" program for the calculation of intra-atomic electron-electron repulsion energies was turned off (-usetwoe=0), the target spacing between interatomic surface paths was tightened from fine to very fine to ensure accurate atomic integrations (-iasmesh=veryfine), and atomic IQA energies were requested (-encomp=3). The IQA energy partitioning is outlined in Section "The Interacting Quantum Atoms (IQA) Approach", which provides only the relevant equations. In order to gauge the accuracy of AIMAll's energy partitioning, the IQA molecular energies were compared to the ab initio energies obtained from GAUSSIAN09. The discrepancy between this (unpartitioned) ab initio molecular energy and the IQA-reconstructed molecular energy is referred to as the IQA recovery error. For some geometries of higher energy, the obtained IQA recovery error was considered to be too high (1 kJ mol⁻¹ < IQA recovery error < 1.5 kJ mol⁻¹). Hence, the IQA energies for these geometries were recalculated using stricter conditions for the basin outer angular quadrature (-boaq=skyhigh_leb instead of the default -boaq=auto) in order to obtain better atomic integration accuracy. The best IQA energies, as determined by the IQA recovery error, were incorporated into the final dataset and will be reported in the Results section.

Generic notation
Figure 2 illustrates the notation followed throughout this article. A generic notation is useful in the current study because it allows atoms that are present in all three amino acids to be identified using a single atom label and, thus, easily compared across the three amino acids. This notation is more concise than the unique atomic labels assigned by GAUSSVIEW, which naturally change with varying system size. 
The standard residue subscript labels are used, namely α, β, γ, and δ, and each is assigned to the covalently bonded atoms forming the residue. Both the carbonyl and amino groups at either side of the C_α are labelled with the subscript "i," which refers to the central residue. Either side of these groups, the adjacent carbonyl and amino groups are labelled as "i − 1" and "i + 1," respectively. So, an increasing index refers to a move towards the NHCH3 terminus (by convention on the right), while a decreasing index refers to moving in the opposite direction, towards the acetyl C(=O)CH3 terminus (on the left).

The Interacting Quantum Atoms (IQA) approach
IQA [41] is a topological approach that sits alongside the Quantum Theory of Atoms in Molecules (QTAIM) [42][43][44] and the Electron Localization Function (ELF) [45] under the collective header of QCT. [46,47] All three share the central idea of using the gradient vector field to extract chemical information from a system. QTAIM and IQA both share the presence of topological atoms. Topological atoms, such as those seen in Figure 2 for isoleucine dipeptide, are finite-volume three-dimensional fragments of space, each representing a single atomic basin determined by the gradient paths of a system's electron density. These atomic basins (i.e., atoms) are well-defined even when molecules are compressed (short range van der Waals complexes), and they are space-filling (i.e., non-overlapping and gapless). The latter feature ensures that in an analysis of properties derived from the electron density (such as atomic energies) no part of the system is unaccounted for. This hallmark is an important advantage [48] of QCT, particularly when applications are expanded to interactions between ligands and proteins, [49] where classically standardized van der Waals radii are currently used, leaving areas of space unattributed to either the ligand or the protein. The force field previously entitled Quantum Chemical Topological Force Field, [50] and recently renamed FFLUX, [51] is currently being developed with topological atoms at its heart. FFLUX features a novel design, unlike the classical designs used in other popular force fields such as AMBER and CHARMM. FFLUX maps geometrical change to a change in atomic energy through a machine learning method known as kriging. [52] Two recent publications [50,53] describe its architecture and the process of model building in detail. FFLUX uses four primary energies to describe a molecule (or any system). 
The energies are obtained via the IQA energy partitioning, and include the intra-atomic energy, the classical electrostatic energy, the exchange energy and the correlation energy. Each will be introduced in turn and described through the following equations.
IQA partitions a molecule's energy, E_IQA^Mol, into a sum of atomic energies, E_IQA^A, which in turn are composed of intra-atomic and inter-atomic energy components:

$$E_{\mathrm{IQA}}^{\mathrm{Mol}} = \sum_{A} E_{\mathrm{IQA}}^{A} = \sum_{A} \left( E_{\mathrm{intra}}^{A} + \frac{1}{2} \sum_{B \neq A} V_{\mathrm{inter}}^{AB} \right) \tag{1}$$

where A and B represent atoms, the superscript denotes the atoms the energy is associated with and the subscript denotes the type of energy, a format that applies to all subsequent equations. The intra-atomic energy can be divided into its kinetic, T, and potential, V, energy contributions as follows:

$$E_{\mathrm{intra}}^{A} = T^{A} + V_{\mathrm{ee}}^{AA} + V_{\mathrm{en}}^{AA} \tag{2}$$

where T^A represents the kinetic energy of atom A, V_ee^AA is the (repulsive) potential energy between the electrons within atom A, and V_en^AA is the (attractive) potential energy between the electrons and the nucleus of atom A.
Similarly, the interatomic energy can be divided into its potential energy contributions (there is no kinetic contribution this time):

$$V_{\mathrm{inter}}^{AB} = \left( V_{\mathrm{nn}}^{AB} + V_{\mathrm{en}}^{AB} + V_{\mathrm{ne}}^{AB} \right) + V_{\mathrm{ee}}^{AB} \tag{3}$$

where V_en^AB, V_ne^AB, and V_ee^AB follow the same format as described earlier. This time the ordering of the superscript and subscript plays a more important role. For example, V_en^AB refers to the electrons of A and the nucleus of B. Additionally, V_nn^AB is the (repulsive) potential energy between the nuclei of A and B. The first three terms are bracketed to illustrate their connection to forming the "classical" electrostatic energy. To complete the electrostatic energy, V_ee^AB must be expanded to:

$$V_{\mathrm{ee}}^{AB} = V_{\mathrm{Coul}}^{AB} + V_{\mathrm{x}}^{AB} + V_{\mathrm{corr}}^{AB} \tag{4}$$

Here, "Coul" refers to the Coulombic interaction between the electrons, "x" represents the exchange energy, and "corr" the correlation energy. Now that the Coulombic energy has been separated from V_ee^AB, the classical electrostatic energy V_cl^AB can be represented as:

$$V_{\mathrm{cl}}^{AB} = V_{\mathrm{nn}}^{AB} + V_{\mathrm{en}}^{AB} + V_{\mathrm{ne}}^{AB} + V_{\mathrm{Coul}}^{AB} \tag{5}$$

allowing the interatomic interaction energy to be rearranged to

$$V_{\mathrm{inter}}^{AB} = V_{\mathrm{cl}}^{AB} + V_{\mathrm{x}}^{AB} + V_{\mathrm{corr}}^{AB} = V_{\mathrm{cl}}^{AB} + V_{\mathrm{xc}}^{AB} \tag{6}$$

This arrangement is intuitive: the classical electrostatic energy can be identified separately from the exchange and correlation energies, which together can be thought of as the covalent contribution within an interaction.
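As an illustration of how the pairwise terms described above regroup into a classical electrostatic part and an exchange-correlation part, the following minimal sketch may help. The class name, field names and all numerical values are hypothetical and chosen only so that the arithmetic is easy to follow; they are not results from this study.

```python
from dataclasses import dataclass


@dataclass
class PairTerms:
    """Interatomic energy terms for one atom pair A-B (hypothetical values, kJ/mol)."""
    v_nn: float    # nucleus(A)-nucleus(B) repulsion
    v_en: float    # electrons(A)-nucleus(B) attraction
    v_ne: float    # nucleus(A)-electrons(B) attraction
    v_coul: float  # Coulombic part of the electron(A)-electron(B) energy
    v_x: float     # exchange part of the electron-electron energy
    v_corr: float  # correlation part of the electron-electron energy

    def v_ee(self) -> float:
        """Full electron-electron energy: Coulomb + exchange + correlation."""
        return self.v_coul + self.v_x + self.v_corr

    def v_cl(self) -> float:
        """Classical electrostatic energy: the three bracketed terms plus Coulomb."""
        return self.v_nn + self.v_en + self.v_ne + self.v_coul

    def v_xc(self) -> float:
        """Exchange-correlation ("covalent") contribution."""
        return self.v_x + self.v_corr

    def v_inter(self) -> float:
        """Total interatomic energy, regrouped as classical + exchange-correlation."""
        return self.v_cl() + self.v_xc()


pair = PairTerms(v_nn=420.0, v_en=-310.0, v_ne=-295.0,
                 v_coul=190.0, v_x=-12.5, v_corr=-0.5)
print(pair.v_cl(), pair.v_xc(), pair.v_inter())
# prints: 5.0 -13.0 -8.0
```

Note how large individual terms of opposite sign cancel to leave a small classical electrostatic energy, which is why the regrouped form is easier to interpret than the raw components.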
A recent FFLUX publication [53] introduced the use of interatomic energies designated by AA′ instead of AB. Here A′ represents every other atom in the molecular system except A. Thus, the notation AA′ denotes the interatomic energy between an atom A and its surrounding environment A′, such that

$$V_{\mathrm{inter}}^{AA'} \approx \sum_{B \neq A} V_{\mathrm{inter}}^{AB} \tag{7}$$

The energies in eq. (7) are only approximately equivalent because they use two separate algorithms for calculation, one analytical (left term) and one numerical (right term), naturally resulting in some minor differences between the values.
In this investigation, we will also study the IQA energies at the molecular level, as well as at the atomic level and at (functional) group level (more precisely, at the level of a meaningful collection of atoms). In order to define the "molecular energies," we observe that:

$$E_{i}^{\mathrm{Mol}} = \sum_{A} E_{i}^{A} \tag{8}$$

where "i" may be substituted for the IQA energy type of choice (i.e., intra, IQA, cl or xc). Note that for the latter two subscripts, E is replaced by V (E_cl^A → V_cl^AA′ and E_xc^A → V_xc^AA′). A particular type of molecular energy (e.g., electrostatic or exchange) is then obtained from a simple summation of the respective energy type over every atom A. Similarly, a particular energy of a (functional) group is obtained by energy summation over every atom belonging to the (functional) group. As a result, a hierarchical search for chemical insight can be carried out whereby first the total energy profile itself is studied (E_IQA^Mol), next the various energy types at the molecular level (E_i^Mol), then at the (functional) group level (E_i^G) (where G is any meaningful collection of atoms), and finally the various energy types at the atomic level (E_i^A). The IQA approach has been used to study many different chemical systems such as the interactions of Zn(II) complexes, [54] organoselenium molecules, [55] halogen-trinitromethanes, [56] halogen bonding, [57,58] and hydrogen bonding. [59,60] IQA has also been used to shed light on chemical phenomena such as steric repulsion, [61][62][63] hyperconjugation, [64] reactions, [65] and transferability. [34] The broad applicability of IQA, and its well-defined and robust quantitative nature, make it ideal for the current investigation. We note that IQA does not suffer from the list of conceptual and numerical problems plaguing the older and more traditional energy decomposition analysis, the many variants of which have recently been reviewed and critically discussed. [66]
The next point to highlight regards IQA's compatibility limitations, in particular the lack of affordable correlation. Until recently, IQA was incompatible with theory levels other than Hartree-Fock, full configuration interaction, configuration interaction with single and double excitations, and complete active space. This is because perturbation theory remains computationally very expensive even for small systems, and standard density functional theory (DFT) does not provide a well-defined second-order reduced density matrix. However, recent developments have managed to expand IQA's application to include at least some correlation through B3LYP [67][68][69] and M06-2X level DFT, and the direct correlation through the coupled cluster with single and double excitations [70,71] level. In 2016, MPn-IQA (n = 2, 3, or 4) also became possible. [72] The inclusion of correlation is anticipated to have important consequences in the investigation of systems driven by dispersion energy. For further details on the expansion of IQA, the reader is directed to the respective references. Likewise, for a more complete description of the IQA approach, the original paper of Blanco et al. [41] should be consulted.
A final point discusses a potential concern in connection with the validity of the atomic virial theorem. Although at the root of QTAIM, this theorem is actually irrelevant for both IQA atomic energies and molecular energies, in the sense that IQA does not assume nor use the virial theorem (either atomic or molecular) in any way. We also note that self-consistent virial scaling (SCVS) is not applicable anyway to DFT methods, such as B3LYP, so the energy partitioning in this paper could not benefit from such a correction in the first place. It is debatable whether virial-based atomic energies are useful, in practice, even with SCVS to satisfy the molecular virial theorem. One could go as far as to state that virial-based energies are given in AIMAll basically for historical reasons. In summary, our results are not affected by the concern raised above.

Preliminary analysis
The aforementioned IQA recovery error is due to the integration error, L(Ω), that accompanies each atomic integration. For our systems, the mean absolute IQA recovery errors for the φ scans were 0.39, 0.60, and 0.95 kJ mol⁻¹ for Gly, Val, and Ile, respectively. For the ψ scans, they were 0.35, 0.58, and 0.57 kJ mol⁻¹, respectively. With observed relative energy barriers of up to ~64 kJ mol⁻¹ and mean absolute IQA recovery errors of up to 0.95 kJ mol⁻¹, the maximum percentage error of the values becomes (0.95/64) × 100 = 1.5%. We conclude that all effects seen and discussed are far above integration noise. The energy profile with the highest energy range (Ile-φ) also has the highest mean absolute IQA recovery error. Larger atomic integration errors are typically observed for atoms in more complex geometries, for example, in molecules energetically far from the global energy minimum or in molecules with unusual topology.
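The error estimate above can be reproduced with a few lines. The sketch below is illustrative only: the function names are hypothetical, the recovery error is taken here as the sum of atomic IQA energies minus the unpartitioned molecular energy (the sign convention is an assumption), and the 1 kJ mol⁻¹ threshold mirrors the value quoted in the Dataset generation section.

```python
def recovery_error(e_molecular, atomic_iqa_energies):
    """IQA-reconstructed energy minus the unpartitioned energy (same units).

    Assumption: the sign convention (reconstructed minus reference) is ours.
    """
    return sum(atomic_iqa_energies) - e_molecular


def needs_recalculation(error_kj_mol, threshold=1.0):
    """Flag geometries whose absolute recovery error exceeds the threshold."""
    return abs(error_kj_mol) > threshold


# Mean absolute IQA recovery errors quoted in the text (kJ/mol).
mean_abs_error = {
    ("Gly", "phi"): 0.39, ("Val", "phi"): 0.60, ("Ile", "phi"): 0.95,
    ("Gly", "psi"): 0.35, ("Val", "psi"): 0.58, ("Ile", "psi"): 0.57,
}

largest_barrier = 64.0  # approximate maximum relative barrier (kJ/mol)
worst_scan = max(mean_abs_error, key=mean_abs_error.get)
worst_error = mean_abs_error[worst_scan]
percent_error = worst_error / largest_barrier * 100

print(worst_scan, round(percent_error, 1))
# prints: ('Ile', 'phi') 1.5
```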

Analysis at molecular level
Figure 3 plots the E_IQA^Mol energy profiles for each system, and for both the φ and ψ scans. The colored regions depict the similarity between Gly and Val/Ile energies: (1) brown indicates confluence between all three dipeptides, (2) navy indicates the appearance of an additional maximum in Val/Ile, not seen for Gly, and (3) orange indicates a change in position of a maximum seen for all three dipeptides. The first point to note is the similarity between the Val and Ile energy profiles throughout both scans. Using the Pearson correlation coefficient r, where r = 1 indicates a perfectly correlated dataset, values of r = 0.996 and r = 0.997 are obtained between the Val and Ile energy profiles within the φ and ψ scans, respectively. The striking similarity between the profiles of Val and Ile is not surprising given that their side chains differ only by a methylene group. In contrast, the energy profile of Gly is less correlated to that of Val, for example, with r = 0.857 and r = 0.833 for the φ and ψ scans, respectively. However, in the φ scan, it is clear that the common backbone structure between Gly and Val/Ile accounts for the molecular barrier interval of −150° ≤ φ ≤ +15° (brown area in Fig. 3, panel φ), where very similar energy profiles are observed across all three systems. Outside of this interval, we deduce that the sidechain must influence the energy profile and cause the maximal φ torsional barrier at +165° for Val and Ile, which is absent in Gly (navy area in Fig. 3, panel φ). In the ψ scans, the Gly and Val/Ile energy profiles are less correlated according to r, and indeed turn out to be more different visually. The additional local maximum at ψ = +15° (navy area in Fig. 3, panel ψ), and the translation of the maximum barrier in ψ from −75° (in Gly) to −120° (in Val/Ile) (orange area in Fig. 3, panel ψ), suggest a broader influence of the sidechain throughout the dihedral angles. In summary, the sidechain influences the energy profile more in ψ than in φ, because the former lacks the brown area of high energy profile confluence. To aid the interpretation of Figure 3, Figure 4 shows the molecular graphs of the two energy maxima in the φ scan (−15° and +165°), and two energy maxima in the ψ scan (−120° and +15°).
Figure 2. [...] Note that not all methylene or methyl hydrogen atoms are labeled in order to avoid cluttering the figure. This emblematic figure was generated by the in-house program IRIS, which is based on previously published [36,37] algorithms. The following fragmentation will prove to make sense later in this article: CH3|C(=O)–N(H)|C_αHR|C(=O)–N(H)|CH3, where each fragment is flanked by two vertical bars and consists of 4, 4, 15, 4, and 4 atoms, respectively, totalling 31 atoms; (right) schematic clarifying the notation in this paper (for the ith amino acid, which is isoleucine in this case).
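The Pearson correlation coefficient used above to compare energy profiles can be computed as in the following minimal sketch. The two short profiles are made-up placeholders (relative energies in kJ mol⁻¹ at a handful of dihedral angles), not data from Figure 3, and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Placeholder relative-energy profiles sampled at the same dihedral angles.
profile_a = [0.0, 12.0, 30.0, 18.0, 5.0, 40.0]
profile_b = [0.0, 14.0, 33.0, 17.0, 6.0, 44.0]

r = pearson_r(profile_a, profile_b)
print(f"r = {r:.3f}")
```

A value of r close to 1 indicates that the two profiles rise and fall together, which is the sense in which the Val and Ile profiles are described as strikingly similar.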
So far, we have presented an unpartitioned perspective based on geometrical differences between Gly, Val, and Ile, allowing us to comment on the general influence of the sidechain (and its size) on the molecular energy. The current literature states that the barrier at φ = +165° is a result of the β-branching of the residue causing hard-sphere steric clashes [73] between O_{i-1} and C_β (at B3LYP/ANO-L-VDZP level), whereas the barrier at ψ = −120° relates to clashes [74] between C_β and N_{i+1}. However, around ψ = +15°, where our earlier observations would suggest a sidechain-related destabilization, the literature reports the region as being "sterically allowed." [74] This region will be investigated further later. Finally, the barrier at φ = −15°, which occurs in all three systems, is reportedly due to two sets of backbone clashes, one between O_{i-1} and C_i, and a second [74] between O_{i-1} and N_{i+1}. Figure 4 is meant to help in visualising the atomic clashes mentioned above, but a careful inspection may leave the impression that a more thorough atomic analysis, in the spirit of topological atoms, is needed.
Up to this point, we have only commented on the general regions where one expects the sidechain atoms to influence (navy/orange in Fig. 3), or not (brown), the molecular energies, given the structural differences between Gly and Val/Ile. To comment further on the agreement between literature and the IQA perspective, it is necessary to partition the molecule into fragments. The IQA interpretation of the causes of the observed maxima (and minima) will now be investigated, for each system, at three partitioning levels: molecular, functional group (or collections of atoms) and atomic (i.e., single atom).
We now analyze the overall trends of the molecular IQA energies (relative to the global minimum) illustrated for the φ and ψ scans in Figures 5 and 6, respectively. These figures show, for each of the three systems, the profile of each IQA molecular energy contribution to ΔE_IQA^Mol, that is, ΔV_cl^Mol, ΔE_intra^Mol, and ΔV_xc^Mol (see eq. (8)). For convenience, ΔE_IQA^Mol is plotted again for each system, repeating what was already shown in Figure 3. It is clear that the energy scale of Figure 3 differs from that of Figure 5 (or Fig. 6) by about an order of magnitude. This scale difference explains why the energy barriers look less pronounced at the scale of hundreds of kJ mol⁻¹, which is nevertheless necessary to show the behavior of the three types of molecular energy contributions to the total molecular energy. The difference in energy scales also immediately suggests that substantial energy cancellation must take place.
Indeed [...] (bottom). Accompanying the destabilising intra-atomic energy is a stabilization of the electrostatic energy within these atoms, which is expected during the formation of a hydrogen bond.
The fact that V_cl^Mol causes its own high-energy regions is interesting, surprising, and perhaps even controversial. To explain why, it is necessary to refer to a recent study completed within our group, in which fluctuations in E_intra were observed to mimic a Buckingham-type potential and hence to contribute significantly towards steric hindrance. [76] As a result, the behavior of ΔE_intra can be viewed as a measure of the steric hindrance. However, this conclusion is only based on a set of observed correlations, and on statistically robust fits to classical Buckingham-type (exponential) potentials, successfully obtained for many small van der Waals complexes. Unpublished results (involving an oligopeptide intra-atomic energy analysis) have also shown that E_intra often correlates with the atomic volume of an atom: where the energy is stabilized, the atomic volume increases. Thus, the general stabilization of ΔE_intra^Mol across the energy profile should be interpreted as due to expanding (i.e., relaxing) atoms when the backbone is extending itself. Such extension occurs when φ or ψ values move towards −180° or 180°, that is, further away from the global minimum. To be more specific, of the 49 (= 24 + 24 + 1) conformations (see Section 2.1) studied for each dipeptide, only four are sterically destabilized relative to the global minimum (deducible from Fig. 6, Val/Ile orange curve). At this minimum, there is an electrostatically stabilising intramolecular hydrogen bond, which explains why ΔV_cl^Mol stabilizes at nearby torsional angles (Fig. 6, Val/Ile purple curve at ψ = 0° and ψ = 60°). From this reasoning, we can initially conclude that it is generally not steric hindrance (through hard-sphere clashes) that causes the molecule to be less stable in many regions of the Ramachandran plot. Instead, the high energies are caused by a lack of electrostatic stability. 
Figures 5 and 6 also show how Gly is more electrostatically destabilized than Val and Ile for many dihedral angles (both φ and ψ) on the energy profile. However, the greater electrostatic destabilization is also accompanied by a greater ΔE_intra^Mol stabilization, resulting in Val and Ile having the higher ΔE_IQA^Mol, and therefore the higher barriers, at such φ/ψ angles. Hence, IQA confirms that Gly, due to the absence of a side chain, is conformationally less restricted than other amino acids, which is expressed through greater relative stabilization via ΔE_intra^Mol. These results are another example of the prominent relationship between the ΔE_intra and ΔV_cl energies: as one becomes stabilized, the other typically becomes destabilized. This counterbalancing effect is elaborated upon in our recent work [77] on large water clusters.
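The cancellation between the molecular contributions discussed in this section can be made concrete with a small sketch. It assumes, in line with the IQA partitioning, that the relative molecular energy is the sum of the relative intra-atomic, classical electrostatic and exchange-correlation contributions. All numbers are invented for illustration and the function names are hypothetical.

```python
def total_delta(contributions):
    """Relative IQA molecular energy as the sum of its three contributions."""
    return contributions["intra"] + contributions["cl"] + contributions["xc"]


def dominant_destabilizer(contributions):
    """Name of the contribution with the largest (most positive) relative energy."""
    return max(contributions, key=contributions.get)


# Invented relative energies (kJ/mol) with respect to the global minimum,
# keyed by a dihedral angle in degrees.
profile = {
    -120: {"intra": -150.0, "cl": 190.0, "xc": -10.0},
    60: {"intra": 40.0, "cl": -35.0, "xc": 0.0},
}

for angle, contributions in profile.items():
    print(angle, total_delta(contributions), dominant_destabilizer(contributions))
# prints:
# -120 30.0 cl
# 60 5.0 intra
```

In the first invented conformation, contributions of well over 100 kJ mol⁻¹ cancel to a barrier of a few tens of kJ mol⁻¹, and the destabilization is electrostatic in origin while the intra-atomic (steric) term is stabilizing.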

Analysis at (functional) group level
Next we observe the functional group behavior. As mentioned in the caption of Figure 2, the following partition will prove useful: CH3|C(=O)–N(H)|C_αHR|C(=O)–N(H)|CH3. We introduce the following notation to describe these fragments: (i) the methyl groups are combined and this collection of eight atoms is called "Caps," (ii) the peptide group at the C-terminus (i.e., left, involving O_{i-1}–C_{i-1}–N_i–H_i) is called "Pep−," (iii) the peptide group at the N-terminus (i.e., right, involving O_i–C_i–N_{i+1}–H_{i+1}) is called "Pep+," (iv) the pivotal C_α atom (and the one H_α (for Ile/Val) or two H_α (for Gly) atoms bonded to it) is called (CH)_α, and finally (v) the sidechain atoms (full chain, for Val/Ile only) are called "Sidechain". The energies associated with the five atom groups defined above are labelled with these group names (E_i^G with G = Caps, Pep−, Pep+, (CH)_α, or Sidechain). Figures 8 and 9 plot the functional group analysis for the φ and ψ scans, respectively. Figures 8 and 9 illustrate a few interesting points. First, through the newly available functional group energy profiles we can now establish a high degree of transferability, that is, a high similarity between the various energy profiles in both φ and ψ scans. This analysis allows us to dissect the consistency in the molecular energy trends seen in Figures 5 and 6. Second, the α-atoms fluctuate very little across the φ and ψ scans, staying within ±15 kJ mol⁻¹. These atoms are at the pivot point of the dihedral rotations and link the backbone to the sidechain. One would expect them to be energetically sensitive but this is not the case. Third, regions within the φ and ψ scans can be attributed to certain functional groups. As a remarkable example, we see that the peptide groups alone are responsible for the barriers in the φ scan within an interval approximately stretching from the global minimum to φ = −15°. Outside of this interval the barrier results from both the peptide groups and the sidechain atoms. 
Within the w scan, again only the peptide groups are responsible for the barrier right of the global minimum (w > 1758) in all systems (allowing for discrepancies up to $5 kJ mol 21 ). For glycine, this remarkable match extends from w 5 21808 to w 5 2458. Again, outside these areas, the a-atoms and the sidechain (for Val/Ile) atoms make significant contributions towards the barrier seen at w 5 1158. Collectively, the barriers at u 5 11658 and w 5 1158 result from important sidechain contributions, confirming our earlier hypothesis on the rationale behind each of these maxima for Val and Ile.
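The group partition can be sketched as a simple summation. The sketch assumes that a group energy is the sum of the atomic energies of its member atoms, which is an assumption of this example rather than a statement of the exact workflow used. The atom labels, the `GROUPS` mapping, and all energies are hypothetical; the Caps and Sidechain groups are left out for brevity.

```python
from typing import Dict, List

# Hypothetical atom labels; group membership follows the fragment definitions in the text.
GROUPS: Dict[str, List[str]] = {
    "Pep-": ["O(i-1)", "C(i-1)", "N(i)", "H(i)"],
    "Pep+": ["O(i)", "C(i)", "N(i+1)", "H(i+1)"],
    "(CH)a": ["Ca", "Ha"],
}


def group_energies(
    atomic: Dict[str, float], groups: Dict[str, List[str]]
) -> Dict[str, float]:
    """Sum atomic relative energies (kJ/mol) into functional-group energies."""
    return {name: sum(atomic[atom] for atom in atoms) for name, atoms in groups.items()}


# Invented atomic relative energies at a single scan point (kJ/mol).
atomic = {
    "O(i-1)": 30.0, "C(i-1)": -12.0, "N(i)": 8.0, "H(i)": -1.0,
    "O(i)": 2.0, "C(i)": 5.0, "N(i+1)": 14.0, "H(i+1)": -3.0,
    "Ca": 1.0, "Ha": 2.0,
}
by_group = group_energies(atomic, GROUPS)
```

Because each atom belongs to exactly one group, the group energies add up to the same total as the atomic energies, which is what allows a barrier to be attributed to particular fragments.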
The behavior of each functional group energy (ΔE_intra, ΔV_cl, and ΔV_xc) composing the total energy of the fragment (ΔE_IQA) may be seen in the Supporting Information, in Figures S1 to S3 for φ and Figures S4 to S6 for ψ. From these plots, it is clear that the peptide groups experience very large

In summary and broadly speaking, the molecular intra-atomic and electrostatic energies are remarkably well described by the peptide atoms alone. However, such remarkable behavior is lost when the intra-atomic and electrostatic energies are added, resulting in a more constrained relative energy range. In other words, the resultant cancellation, and the concomitant intricate interplay, leads to energy magnitudes similar to those of the remaining atomic groups (ΔE_intra^Mol and ΔV_cl^Mol) and of the exchange energy in general (ΔV_xc).
Supporting Information Figure S3 shows how ΔV_xc is dominated by only one peptide group (Pep+), which is significantly

Figure S6 shows the same effect for the ψ scan, although not so pronounced. For clarity, Pep+ corresponds to the peptide group with Hi+1 forming an intra-molecular hydrogen bond with the Oi−1 and Ni atoms (see Fig. 7).

Analysis at atomic level
As a result of the group partitioning, molecular behavior has been localized to, for example, peptide atoms for certain torsional intervals. The energy profiles have also been rationalized by their electrostatic, steric, and exchange origins. Next, we take our analysis one partitioning step further and observe the energies at the atomic level, where we aim to isolate the individual atoms causing the barriers observed within the φ/ψ scans. So far, we have learnt about the consistency of the molecular and group energy profiles across each system. At the more refined atomic level, we now also expect to see this consistency.

Figures 10 and 11 plot the atomic energy profiles of Val for the φ and ψ scans, respectively, and Supporting Information Figures S7 and S8 plot the same type of information for the remaining two systems: Gly and Ile. Here, we only report the Val results because it is clear that the atomic trends are very similar throughout each system, within each scan. To clarify, where backbone atomic destabilization is observed within Gly, it is equally present within Val. In addition, by comparing Figures 10 and 11 to Supporting Information Figures S7 and S8, it is clear that the Val and Ile plots are almost identical for every torsional angle. Hence, we focus on the general trends using Val as the example.
In Figures 10 and 11, we plot only the key atoms with energy fluctuations greater than ±10 kJ mol−1. Many sidechain and methyl-cap atoms fall below this threshold and are hence not included in these figures. In fact, very few sidechain atoms show any significant energy deviations, except for two Hγ atoms. The Cα atoms also fluctuate very little (within ±8 kJ mol−1) but are included in the plot to demonstrate this key point. In the group analysis, we already established a lack of energy fluctuation for the Cα atoms but we reiterate this surprising result considering Cα's key bridging role. The Oi−1 atoms are the most perturbed atoms across all six torsional scans, indicating their importance to the overall molecular stability. In contrast to the destabilising behavior of the Oi−1 atoms, the vicinal Ci−1 atoms are significantly stabilized throughout. Within the φ scan of Figure 10, we also see Ni becoming the most destabilized atom when −90° < φ < 0°. In the ψ scan of Figure 11, Ni starts to match Oi−1 in terms of destabilization magnitude in the vicinity of the ψ = −120° barrier. We learn that the Oi−1 and Ni atoms dominate the destabilization within each system, across both scans. Many of the remaining atoms fluctuate with some preference towards stabilization or destabilization, or oscillate around the zero-energy line (given by the global minimum). Figures 10 and 11 also show the atomic basins of the most destabilising atoms present at the barriers at φ = −15° and φ = +165°, and at ψ = −120° and ψ = +15°, respectively.

The literature reasoning behind each barrier will now be compared with the IQA-based reasoning. Both Mandel et al. [2] and Ho et al. [74] state that the clash between Oi−1 and Ni+1 contributes to the barrier at φ = −15°. The ΔE_IQA^A analysis shown in Figure 10 confirms this because the largest destabilising (i.e., positive energy) contributors to this barrier are indeed Oi−1 and Ni+1. However, Mandel et al. [2] quote the clash between Oi−1 and Ci as an extra contributor to this barrier, which we cannot confirm because the positive ΔE_IQA^A for Ci is an order of magnitude smaller than that of Ni+1.
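The selection of key atoms by their energy fluctuation, as used for Figures 10 and 11, can be sketched as a simple filter. The sketch assumes that "fluctuation" means the largest absolute relative energy an atom reaches anywhere along a scan. The function name, atom labels, and profile values are hypothetical.

```python
from typing import Dict, List


def key_atoms(profiles: Dict[str, List[float]], threshold: float = 10.0) -> List[str]:
    """Return atoms whose relative energy exceeds +/- threshold (kJ/mol) somewhere in the scan."""
    return sorted(
        atom
        for atom, values in profiles.items()
        if max(abs(v) for v in values) > threshold
    )


# Invented atomic relative-energy profiles over three scan points (kJ/mol).
atomic_profiles = {
    "O(i-1)": [0.0, 35.0, 20.0],
    "N(i)": [0.0, -4.0, 18.0],
    "Ca": [0.0, 6.0, -7.5],
    "Cb": [0.0, 3.0, -2.0],
}
selected = key_atoms(atomic_profiles)
```

With these toy values only the two backbone atoms pass the ±10 kJ mol−1 cut, while the Cα-like and Cβ-like profiles stay below it.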
We now analyze the barrier at φ = +165° in a similar way. Ho et al. [74] suggest that a clash between Oi−1 and Cβ causes this barrier. Our analysis confirms that Oi−1 is indeed a major factor of destabilization (large positive ΔE_IQA^A value) but Cβ is not at all (in fact, because its value is always smaller than 4 kJ mol−1, it is not even shown in Fig. 10). However, if our analysis is forced to point out destabilising atoms from the side chain then one Hβ and one Hγ atom emerge. Much more significant destabilization originates from Ni and Ci. We are now in a position to refine our earlier observation in the molecular energy analysis (see Section "Analysis at molecular level"). Although the sidechain causes the φ = +165° barrier, the barrier results from three peptide atoms (Oi−1, Ci, and Ni) being destabilized alongside two sidechain hydrogen atoms, but not sidechain carbons.
Within the ψ scans, we do not see the reported [74] clash between Ni+1 and Cβ when ψ = −120°. Instead, we observe that Ni and Oi−1 are most destabilized alongside Hα (see Fig. 11). Moreover, the suggested Ni+1 is actually stabilising at ψ = −120°, according to our findings. For the ψ = +15° barrier, which is known to be sterically allowed but without specific clashing atoms identified, we discover that the sidechain (within Val/Ile) is destabilized through Hγ. Indeed, in Figure 11 we see a clear peak at ψ = +15° for Hγ. In addition to Hγ being destabilized, Hα and Oi−1 are also destabilising when ψ = +15°.
Overall, some of our atomic interpretations of energy barriers are quite different to those in previous literature. However, our energies are more complete than those represented by, for example, the hard-sphere model, which only considers the steric-like behavior of an atom. To better understand the nature of the destabilization of each atom, it is necessary to observe the causal energies (ΔE_intra^A, ΔV_cl^AA′, and ΔV_xc^AA′) composing ΔE_IQA^A. The Supporting Information reports each of these three atomic energy profiles in Figures S9 to S11 for the φ scan, and in Figures S12 to S14 for the ψ scan. Collectively, Supporting Information Figures S9 to S14 allow us to identify the source of destabilization for every atom known to be significantly destabilized (through ΔE_IQA^A) at the barrier peaks. The results are summarized in Table 1.
We note an unusual result for two cases: Oi−1 (φ = +165° and ψ = −120°) and Ci (φ = +165°), where we observe the anomalous combination of both significantly destabilising steric (intra-atomic) and destabilising electrostatic energies. For Oi−1 in particular, the anomalous lack of cancellation causes the atom to be the most destabilized atom (through ΔE_IQA^A) across all six torsional scans.
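Identifying the source of an atom's destabilization at a barrier peak can be sketched as a sign check on its three energy components. The sketch assumes that a positive relative energy means destabilization with respect to the global minimum. The function name, the tolerance parameter, and the numbers are hypothetical.

```python
from typing import List


def destabilising_terms(
    d_intra: float, d_cl: float, d_xc: float, tol: float = 0.0
) -> List[str]:
    """List which energy components of an atom are destabilising (greater than tol, in kJ/mol)."""
    terms = {"steric": d_intra, "electrostatic": d_cl, "exchange": d_xc}
    return [name for name, value in terms.items() if value > tol]


# Invented values: the usual case, where one term offsets the other ...
usual = destabilising_terms(d_intra=-25.0, d_cl=40.0, d_xc=-3.0)
# ... and the anomalous case, where steric and electrostatic terms are both positive.
anomalous = destabilising_terms(d_intra=12.0, d_cl=20.0, d_xc=-4.0)
```

In the second toy case nothing cancels, so the atomic total is larger than either component alone, which is the pattern described for Oi−1.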
Energy profile consistency has been identified within both the molecular energy analysis and the (functional) group analysis. When a system is partitioned, consistency of energy trends is commonly known as "transferability," which is a key topic in force field design. If atomic energies (group or single atoms) are identified as being consistent across systems, then such atomic energies are said to be transferable. Categorising such transferable atoms should become a significant topic within computational chemistry itself.
Finally, we comment further on the transferable energy trends. Some weak trends occur in the backbone atoms when comparing Gly with Val/Ile but they are strengthened when comparing Val and Ile. Within the atomic analysis, Gly and Val atomic energies are similar to within 8 kJ mol−1 (while −165° < φ < +15°) and within 9 kJ mol−1 (across all ψ angles). The Val and Ile atomic energies are even closer, within 5 kJ mol−1 (while −165° < φ < +165°) and within 3 kJ mol−1 (when ψ < 0° or ψ > 0°). The minimal energy discrepancies across Gly, Val, and Ile corroborate fragment transferability, which force field developers need in their atom typing. The results presented also support some of our other work [34] on IQA and transferability.
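A "similar to within N kJ mol−1" statement of this kind can be checked with a short sketch that takes the largest absolute difference between two atomic energy profiles inside a torsional window. The function name, the inclusive window bounds, and all values are assumptions made for illustration; none of the numbers come from the paper.

```python
from typing import List


def max_abs_deviation(
    angles: List[float],
    profile_a: List[float],
    profile_b: List[float],
    lo: float,
    hi: float,
) -> float:
    """Largest |a - b| (kJ/mol) over scan points whose angle lies in [lo, hi] degrees."""
    diffs = [
        abs(a - b)
        for angle, a, b in zip(angles, profile_a, profile_b)
        if lo <= angle <= hi
    ]
    if not diffs:
        raise ValueError("no scan points fall inside the requested window")
    return max(diffs)


# Invented profiles for the same atom in two systems (kJ/mol), on a coarse scan grid.
angles = [-180.0, -90.0, 0.0, 90.0, 180.0]
atom_in_val = [12.0, 4.0, 0.0, 9.0, 30.0]
atom_in_ile = [10.0, 5.0, 0.0, 7.0, 22.0]

inside_window = max_abs_deviation(angles, atom_in_val, atom_in_ile, -165.0, 165.0)
whole_scan = max_abs_deviation(angles, atom_in_val, atom_in_ile, -180.0, 180.0)
```

Restricting the window changes the result, which is why the agreement figures in the text are quoted together with the angular range over which they hold.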

Conclusions
In this study, three dipeptides (Gly, Val, and Ile) were investigated to gain a better understanding of the intrinsic behavior of amino acids at three successive levels of detail: molecular, (functional) group, and atomic. The topological energy partitioning method called IQA provided four types of energy to achieve this goal: intra-atomic (self) energy (E_intra), electrostatic energy (V_cl), exchange(-correlation) energy (V_xc), and the sum of all three (E_IQA). We determined the causes of the high-energy regions at relevant combinations of φ/ψ in the Ramachandran plots.
At the molecular level, a destabilization of the electrostatic energy is the cause of the barrier regions across both the φ and ψ scans, and across each dipeptide system. However, each electrostatic barrier is dampened by counter-stabilization from ΔE_intra^Mol and ΔV_xc^Mol. That electrostatics dictates the barriers is an unexpected conclusion given the prevailing view that steric hindrance can explain the Ramachandran regions.
At the atom-group level, the peptide groups are consistently the cause of the barriers at φ = −15° and ψ = −120°, with the barriers at φ = +165° and at ψ = +15° arising as the result of both the peptide and sidechain groups becoming destabilized, cooperatively.
At the atomic level (A), the aforementioned group trends were reflected in destabilized ΔE_IQA^A energies for key peptide atoms (Oi−1, Ci, Ni, and Ni+1) and some sidechain hydrogen atoms (Hβ and Hγ).
The origin of the atomic destabilization was also clarified through the analysis of the ΔE_intra^A, ΔV_cl^AA′, and ΔV_xc^AA′ energies (A′ is the atomic environment of A), confirming some steric destabilization within the Oi−1, Ci, Hα, and sidechain Hγ and Hβ atoms at barrier peaks. Surprisingly and interestingly, the energies of the sidechain carbon atoms (Cβ, Cγ, Cδ), and more importantly Cα, remained relatively unperturbed throughout.
Finally, some very promising results regarding transferability were observed, where atomic energies differ by less than 9 kJ mol−1 between Gly and Val/Ile, and by less than 5 kJ mol−1 between Val and Ile, for the majority of torsional angles across both φ/ψ scans.

[a] Their atomic basins are not drawn in Figure 10 but collectively contribute around the barrier.

Acknowledgment
We thank Dr Todd Keith for useful comments on virial-based atomic energies.