Using the Relative Energy Gradient Method with Interacting Quantum Atoms to Determine the Reaction Mechanism and Catalytic Effects in the Peptide Hydrolysis in HIV‐1 Protease

Abstract The reaction mechanism in an active site is of the utmost importance when trying to understand the role that an enzyme plays in biological processes. In a recently published paper [Theor. Chem. Acc. 2017, 136, 86], we formalised the Relative Energy Gradient (REG) method for automating an Interacting Quantum Atoms (IQA) analysis. Here, the REG method is utilised to determine the mechanism of peptide hydrolysis in the aspartic active site of the enzyme HIV‐1 Protease. Using the REG method along with the IQA approach we determine the mechanism of peptide hydrolysis without employing any arbitrary parameters and with remarkable ease (albeit at large computational cost: the system contains 133 atoms, which means that there are 17 689 individual IQA terms to be calculated). When REG and IQA work together it is possible to determine a reaction mechanism at atomistic resolution from data directly derived from quantum calculations, without arbitrary parameters. Moreover, the mechanism determined by this novel method gives concrete insight into how the active site residues catalyse peptide hydrolysis.


Introduction
The quantum physics governing ac hemicals ystem is complex: each atom interacts with every other,a ccording to various energy types:e lectrostatic or Coulomb, exchange and electron correlation. Moreover, fromaphysical point of view,k inetic energy also plays ar ole althought his role is rarely discussed in chemistry textbooks, if at all. Despite this underlying physical complexity,ac hemist routinelys ingles out af ew atoms in a given system in order for them to explain the behaviouro ft he total system. For example, textbooks will tell that the base-pair complex guanine···cytosine is more stable than the adenine···thymine complexb ecause the formerh as three hydrogen bonds while the latter has only two. However,i tw as shown that this explanation is not correct in work [1] that introduced the so-called secondary interaction hypothesis but later it was shown [2,3] that this hypothesis is not correct either.T his situation proves the need for am oderna nd rigorous protocol to bridge the gap between quantumm echanical data and a chemicalr ationale. In summary,t he main question one faces is whichf ragment of at otal system dictates the behaviour of the total system,i nt erms of energy profiles. More precisely,t wo questionsc an be posed:" which atoms are involvedi nd eterminingt he behaviour of as ystem?" and "which energy types determine the total behaviouro ft he system?" In ar ecently published paper [4] we proposed an answer to these two questionsb yi ntroducing the so-called Relative EnergyG radient (REG) method. The REGm ethodi sa ne xhaustive, parameterless and generalm ethod that can be appliedt o any energetically partitioned potential energys urface (PES).I n essence, the REG methode nables (i)the determination of a subset of partitioned energies that best describe the total behaviour of the system, and (ii)the extraction of chemical insight from an energetically partitioned system.T he REG method wasd esigned to allow for the automated analysiso f arbitrarily sized chemical systems, andi sp articularly useful for large systems. Indeed, as the system size increases, the number of energy terms increases dramatically,m aking it unwieldya nd even impossible to study large systems manually.
In our original paper [4] we showedh ow the REG methodd etectedthe energy terms that most determine the behaviouro f the water dimer as the distance between the two monomers varied. It was shown that the energyt erm that most favours binding in the water dimer is the classical (electrostatic) term betweent he hydrogen bond donora nd the hydrogen bond acceptora toms. The purposeo ft he currentp aper is to show that the REG method is ag eneral and powerful tool that not only offers insighti nto the local stability preference in small molecules (e.g. the gauche effect [5] )b ut also elucidates chemical mechanism in biomolecular active sites. In the latter,t he REG method operateso na no verwhelmingly large amount of data. We show that the REG methodc an be used to pinpoint atoms and functional groups that facilitate and inhibit reactions in ap rotein's active site.
In this study,t he so-calledI nteracting QuantumA toms( IQA) methodi su sed in conjunction with the REG method.T he IQA methodw as inspired by early work [6] and is derived from the QuantumT heory of Atoms in Molecules (QTAIM), [7][8][9][10][11] an established and popular [12] electron density partitioning method. The domain of applicabilityo fQ TAIM is large andh as also been used in QSAR studies. [13][14][15] The IQA approach requires integratingo ver the electron density within QTAIM atoms, also known as topological atoms, in order to obtain well-defined intra-atomic and inter-atomic energies. [16,17] The IQA approach has been used successfully in an umber of papers, including (but not limitedt o) the following case studies:t he nature of halogen interactions in perhalogenated ethanes, [18] prototype S N 2r eactions, [19] proton-transfer reactions, [20] intramolecular bond paths between electronegative atoms, [21] hydrogen-hydrogen interactions with respect to the torsional barrier in biphenyl, [22] short-range electrostatic potentials across torsional barriers, [23] CO 2 trapping by adduct formation, [24] atom-atom repulsion as Buckingham potentials, [25] and the diastereoselective allylation of aldehydes. [26] Allt heses tudies focusedo ns mall molecules, whereas in this paper we show how IQA is able to give conclusive results when appliedt ol arge biomolecules by using it in conjunction with the REG method.
In principle,t he REG methodc an be applied to anye nergy partitioning scheme but the IQA method offersanumber of attractive features. In particular,w hen compared to other energy partitioning methods,s uch as Energy Decomposition Analysis (EDA) [27,28] for example, there are three reasons supporting the combination of REG with IQA.
The first is that IQA provides ad efinition of an atom,u nlike EDA. The definitiono fa na tom enables assigning energies to atoms and the pairwise interactions they are involved in. IQA providesc hemically intuitive [29] and meaningful energies. Indeed,t he three major energy components of IQA (E A intra , V AB xc and V AB cl )r espectively relate to an atom's internal energy,t he covalenti nteraction energy between atoms A and B,a nd the "classical" Coulombi nteraction energy between atoms A and B.T hese contributions can be treated on ane qual footing within IQA. In contrast, EDA only partitions complexes into their monomeric constituents (or even molecular fragments), thus not offering information at atomicr esolution.M oreover, EDA requires ar eference state, [30] whereas IQA does not.
As econd reasonf or using IQA is the space-filling nature of the topological atoms, which meanst hat (i)there are no overlapping areas of electron density,a nd that (ii)there are no gaps between the atoms, that is, each point in space belongs to at opological atom. This space-filling property prevents the over-counting of electron density and also ensures that all electrond ensity is accounted for (allowing for av ery small loss due to ac onstant electron density envelope at the outer edge of am olecule). The latter property prevents under-counting of electron density.T he space-filling nature of the topological atoms leads to the additivity of their properties.I no ther words, the energy of af unctional group is simply the sum of the energieso ft he constituent atoms (with as mall numerical error introduced by the integration methodu sed).
At hird and final reason is that IQA also has three advantages over NaturalB ond Orbitals (NBO). First, IQA allows for welldefined energy terms to be analyzed, whereas NBO only approximates the electrostatic energy within am olecular system. The reason for NBO's approximation is that the electrostatic interactions and "steric" interactions are convoluted within the so-called Lewis-type energy.T his convolution is due to the use of ar eference state within the method. [31,32] Secondly, IQA is able to describe "through-space" interactions, whereas NBO is not. [33] Thirdly,S tone [34] showed that NBO is particularly prone to errors duetob asis set superposition, while IQA is not.
In summary,w ed ecided to move forwardw ith IQA because of its atomicr esolution, its independence of reference states, its full and clear separation of the different types of energy contribution, and its capacity to reveal "through-space" interactions.H owever,w en ote that IQA is computationally expensive due to the finite volume and complicated shapeo ft opological atoms.
It is possible to use alternative metrics to determine reaction mechanism. Such methods include the "Bond Evolution Theory", in which the Electron Localisation Function is used to study the "flow of electron density" within asystem. [35] Another such example (involving electron density) has been shownt o be ap oor indicator of bonding is that of QTAIM'sb ond critical points, which can vanish upon nuclearv ibrations. [36] By using the IQA methodt od etermine the reactionm echanism we are using ap hysically robust methodology.T his is because energy is adefinite measureofr elative stabilityand can therefore indicate bonds as energeticallyf avourable exchange-correlation energies between atoms.
Aspartic proteases are ac ommont ype of protease enzyme. Their active sites contain aspartate residues, which activate water to catalysep eptideh ydrolysis. Figure 1o utlines the traditional mechanism [37,38] of peptide hydrolysis in aspartic proteases.
Peptide hydrolysis is an important step in the action of many diseases such as breast cancer, malaria, hypertension, Figure 1. The traditional mechanism for the peptideh ydrolysisreaction in aspartic proteases. The forward directiono ft his reaction occursf rom left to right,a convention that is maintained throughout this article. The straight arrows representt he overriding direction from reactant (left) to products (right), over the transition state (middle).
Alzheimer's disease and the human immunodeficiency virus (HIV). [39][40][41] Concerning the HIV,t he aspartic protease site (alongside the reverse transcriptase [42] and the integrase [43] sites) is one of three active sites commonly used as drug targets. The protease site is ac ommon drug target due to its important role in replicating the HIV. [44] There has been much research into the inhibition of the aspartic protease activesite [45,46] including research into dual-acting drugs that, for example,i nhibit multiple sites relevant in the replication of the HIV. [47] The traditional mechanism,s hown in Figure 1, consists of two main features.T he first is nucleophilic attack of the water upon the peptidec arbon atom to form the transition state. The second is the hydrolysis of the peptideb ond in the transition state, in which the carboxyl groups of the aspartate play ac rucial role. In this reaction the aspartate groups play a catalytic role, by hydrogen donation and extraction. Figure 2s hows the overall tertiary structure, obtain from a crystallographic measurement, of the HIV-1 Protease( PDB Code:4 HVP). [40] This protein is made up of two subunits, one on the left and one on the right.I mportant regions of the protein consist of flaps, aspartic groups in the active site and the dimerisation domain.T hese regions are encircled in Figure 2. This protein is an enzymet hat activates aw ater molecule, which then performsanucleophilic attack on the carbonyl carbon of the scissile peptidebond, as explained below.

System Details
Computationally,i ti sn ot possible to handle the whole protein, that is, finding aS CF-converged wave functionw ithin a reasonable time and perform an IQA analysis on it. Hence the protein was truncated to the same size as that in the study by Garrec et al. [48a] in 2011. Note that am ore recent paper [48b] from ad ifferent group also studied this system but appeared after the current work had started. Garrece ta l. give ad etailed discussion as to why this is an appropriate model, which can perhaps be summarised as follows:t he model is largee nought o include all the important direct and indirect energetic interactions (vide infra), but at the same time it remains small enough to be described at ar easonablel evel of theory.F ollowing their model, we terminatet he residues with am ethyl group attached to the peptidel inkage and, in addition, the methyl groups of threonine residuesa re replaced by hydrogen. Figure 3s chematically shows this truncated system which contains 133 atoms (only af ew hydrogen atoms are shown for clarity), and for whicht he wave functiona nd the IQA analysis could be obtained. Thus, this model has each of the aspartate residues (Asp25a nd Asp25(')) bonded to two other amino acids, leading to two amino acid chains:A sp25-Thr26-Gly27 and Asp25(')-Thr26(')-Gly27('). We shall refer to one chain as "primed" and the other as "unprimed", which relates to the notation used in the text and figures. Such ac hain of three amino acids is often referredt oa sac atalytic triad. [49] The primeda nd unprimed catalytic triads have ar igid, coiled structure, such that they maintain close proximity.T his rigidity is due to an umber of hydrogen bonds that naturally occur in the enzymea nd are present in our modela nd are referredt o as the "fireman's grip". [50] Another reason for this rigidity,p roposed by others, is that it is induced by the peptided ipole moments in Thr26(')-Gly27(')t hat interact with the negatively chargedA sp25(')r esidues. [50][51][52] This rigidity,w hichi sd ue to the fireman's grip and peptided ipole moments, is crucial in keeping the Asp25a nd Asp25(') carboxyl groups coplanar, which, in turn, is crucial for the active site to act as ac atalyst. Hence we consideri tc rucial to include the catalytic triad in the wave function calculation in order to maintain the rigid structure and coplanarity required forc atalytic behaviour. [53][54][55] Furthermore, when choosing more atoms with which we representt he system, we considered that the inhibition of the active site by as ubstrate can cause changesi nt he tertiary structure of ap rotein. In ac rystallographic study [40] of HIV-1 Proteasei tw as found that the presence of al igand bound in the active site caused changes as large as 7 in the backbone (a region of the often described as the "flaps",s ee Figure 2). These large atomicd isplacements indicate that these "flaps"  may be importanti nt he catalytic behavior of the system.T herefore, the truncated system studied contains ar epresentation of these "flaps" in the form of two doubly methylated peptide bonds( residues 49('), 50('), 49 and 50).
The two water molecules in Figure 3, the reactive water molecule WR and W301,w ere not present in the crystal structure 4HVP.H ence they had to be added to the crystal structure, whichw as carried out before by others. [48a] The WR water molecule is crucial as it is neededt oa ct as the nucleophile in the peptide hydrolysis reaction. The two Me49(')ÀIle50(') groups are bonded to the "flaps" (in Figure 2) and the role of the W301 water molecule is to maintain cohesion of these groups and that of the rest of the system.T he NH groups in the Ile50/50(')a mino acids strongly bind the W301 water molecule,w hich in turn binds to the substrate and ensures that the substrate geometry is correctfor peptide hydrolysis.
The reaction-step and atomic numbering used throughout this paper is shown in Figure4.T his region is highlighted as the atoms in this region are used throughout the Results and Discussion.

Results and Discussion
Figure 5s hows the total energy Potential Energy Surface (PES), startingf rom the reactants at Intrinsic Reaction Coordinate (IRC) sample number 1, and endinga tt he transition state at IRC sample number 11.T his path covers at otal energy change of 66.8 kJ mol À1 .I nt his study we follow the IRC from left to right in Figure 5. Doing this gives information regarding the first step of the peptideh ydrolysis, from whichi ti sp ossible to determine the reactionm echanism using the REG method in conjunction with IQA. Calculation of the reaction mechanism will be guidedb yt he REG method when applied to pairwise atomistic IQA energies. Firstly,w ew ill evaluatet he exchangecorrelation energies( commonly associated with covalency) to show an energetic reaction mechanism that is consistent with the traditional chemical interpretationofareaction mechanism (i.e. covalent bondsf orming and breaking). Secondly, we will discusst he electrostatic energies associated with the peptide hydrolysis, showing that the REG method is able to recover common chemical ideas (such as electrostatic hydrogen bonds and electrostatic nucleophilic attack) from physical principles. Finally,w ew ill discusst he V inter (A,B) terms. These terms are the sum of pairwise electrostatic and exchange-correlation energies anda ss uch give am ore complete overall picture of the reactionm echanism. Using V inter (A,B) terms allows for the assessment of cancellations in the exchange-correlation and electrostatic energies, and returnst otal pairwise interaction energies between atoms.

Determining the mechanism of peptide hydrolysis
To determine the mechanism of peptide hydrolysis in the HIV protease site, the exchange-correlation IQA energies were evaluated using the REG method. The exchange-correlation energy relates to the chemical concept of covalency,a nd thus to the  Figure 3), showing the atomic numbering used throughout this paper.B ecause they are the only groups required in the analysis, only the Asp25 and Asp25(')c arboxyl groups, the reactive (WR) water molecule, and the peptidea rea of the substrate are shown. The structure on the right corresponds to the transition state, and the straight reaction arrows reflect that the energy of this state is higher than that of the left structure. Notethat the similar arrows in Figure 1r epresentthe overall direction from reactant to products, and hence face the opposite way. making and breakingo fb onds. This explainsw hy we analyse this particular type of energy separately,i no rder to determine the reactionm echanism as traditionally thought of by chemists. The exchange-correlation IQA energies were ordered (as discussed in the Theoretical and Computational Background), with the more negative REG values considered to be more important in defining the reactionmechanism than the less negative (or even positive) REG values are. Thiso rdering from most negative to most positive is evident as the energy gradientso f the IQA energy terms that are most stabilised in forming the transition state must be the opposite of the total system energy gradient. As such, negative REG values imply stabilisation, when going from the reactants to the transition state, whereas positive REG values imply destabilisation. We also stress that the validity of the REG method in this case is evident through the Pearson correlation coefficients R in Ta ble 1 all being approximately equal top ositive or negative unity.
It is possible to represent the results of Ta ble1 as am echanism, as shown in Figure 6. The mechanism in Figure 6d iffers from at raditional chemical mechanism in its meaning, because it does not directly represent the movement of electrons. In-stead, it represents the exchange-correlationb etween two topological atoms becoming energetically more favourable or more unfavourable when progressing from IRC sample point 1 to IRC sample point 11.W ehave assigned colour, direction and labels to the curly arrows to aid in understanding the values given in Ta ble 1. The colouro fe ach arrow represents the sign of the REG term with which it is associated.I nF igure 6g reen represents an egative REG value and red represents ap ositive REG value.
Energies with negative REG values regress in the opposite directionc ompared to the total energy,w hen going from the reactants to the transition state. The total energy increases in the forward direction (from IRC sample point 1t o1 1). Hence, the total energy gradient is positive, because aR EG is essentially ar atio of gradients (i.e. partitioned energy over total energy), an egative REG means that the partitioned energy must decrease in the forward direction. Thus, an egative REG value represents stabilisation or bond strengthening (or formation) over the course of the reaction. In contrast, red arrows have positive REG values that represent destabilisation (i.e. regress in the same direction as the total energy) when going from the reactants to transition state. Hence these REG values represent bond weakening (or breaking) over the course of the reaction.
The directionality of the arrows is somewhat arbitrary as the reactioni sr eversible. However,a ss tated earlier, we consider the reaction proceeding from IRC Sample Number 1t o1 1i n Figure 5i no rder to show the mechanism proceedingi nt he same directiona st raditional peptideh ydrolysis. To summarise the colour scheme used: * Red arrows ande nergy terms both represent the exchangecorrelation energies that become lessf avourable over the course of the reaction( i.e. indicative of bondsb reaking/ weakening).
* Green arrows and energy terms both represent the exchange-correlation energies that become more favourable over the course of the reaction (i.e. indicative of bonds forming/strengthening).
The energetic mechanism determined by the REG values in Ta ble 1i ss hown in Figure 6. The first point to note is that the Table 1. The exchange-correlation IQA energies with largest magnitude REG valuesa re shown, along with their Pearson correlation coefficients R. The column marked "Label" is used to mark the curly arrows shown in Figure 6. Note that the prime (i.e. ')i nt his and subsequent tables is unrelated to the prime used to distinguish the amino acid residues (e.g. Figure 3). REG-determined energetic mechanism shows an ucleophilic attack mechanism (whichi ss imilar to the traditional mechanism). In fact, the bonds breaking andf orming in the traditional mechanism are shown to be of crucial importance as they have the largestm agnitude REG values in Ta ble 1 ( A, A', B and  B'). This shows that the REG methodi sa ble to recover wellknown features of reactions by only considering the largest REG values. However,t he mechanism determined by the REG methoda lso includes additional curly arrows noti ncluded in the traditional mechanism.T hese additional curly arrows express more subtle effects than those shown in Figure 1. The cascadingn ature of the arrows in Figure 6, along with the alternating green and red arrows, shows that the REG method is able to show how as every bond strengthens, another weakens. This represents the textbook understanding of valence bondingp atterns that only allow for ag iven number of bonds depending on the number of valence electrons. While au seful "rule of thumb" for the organic chemist, the physical nature of bondingi sm uch subtlera nd is reflected in our treatment. Indeed, not all the terms found by the REG method are cascading. For example, E, G and H are through-space bonding changes.T hrough-space exchange-correlation [56] may not be intuitive to many chemists who are used to integerb ond and valence numbers. However, through-space exchange-correlation is physically reasonable and is often reflected in textbooks by concepts such as conjugation (which have non-integer valence numbers) and so-called "electron deficient" bonding seen often in boron chemistry.D etecting through-space interactions is one of the benefits of working with IQA overt he orbital based method NBO. [33] This ability to determine throughspace interactions allows for the observation of subtle catalytic effects in the peptide hydrolysis reaction. The through-space exchange-correlation within the substrate mentioned in the previousp aragraph is not shown in Figure 6b ut presenti nT able 1. Ta ble1 shows the increased stabilityo ft he through-space exchange-correlation between the reactive water oxygen (O66) and the amide oxygen (O67) shown by the energy term labelled E. This increased stability could be attributed to partial conjugation of the amide and the reactive water in the transition state. This partial conjugation is also evidenced by energy terms G and H,which both involve O66 as well. In particular, energy term G can be seen as partial conjugation of the aspartic carbonyl oxygen (O37) and the reactive water (O66), while energy term Hc orrespondst o partial conjugation between the amide nitrogen (N59) and the reactive water (O66).W ithin the IQA method, it is not possible to determine conjugation in the traditional senseb ecause IQA does not explicitly refer to molecular orbitals in its output. However, there is definitely an increase in the exchange-correlation bondingb etween the atoms discussed.
To show further how to use the output of the REG method we will discuss an umber of REG values shown in Ta ble1.T he REG value with the most negative value in Table 1( i.e.A )r epresents the bond formation between O66 and C58, which agreesw ith the traditional mechanism of bond formation by nucleophilic attack. Indeed, energy terms with more negative REG values show larger relative stability over the course of the reaction, which in turn indicates bond formation compared to mere bond strengthening.
The energy term with the second most negative REG value (B) corresponds to the exchange-correlation between H100 and O67. This term shows one of the Aspartic groupss tabilising the peptide substrate by partially protonating the oxygen of the amide (which starts to form an alcohol). This also shows the asymmetry in the role of the Aspartic groups,s uch that Asp25 activates the water whereas Asp25(')n ot only activates the water but also stabilises the peptideu pon nucleophilic attack. The term with the third largest REG value (C) shows the "double" bond forming in the Asp25(')c arbonyl, which can be considered to also facilitate the process discussed previously regarding the curly arrow labelledB .B yc onsidering terms B and C it is possible to see how the residues in the active site facilitatet he nucleophilic attack of the water on the peptide. Considering all the other negative REG values in Ta ble 1, it is possible to see in Figure 6h ow the active site facilitates peptide hydrolysis in both activating the water molecule and stabilising the peptide in the transition state. It is possible to study al arger subset of exchange-correlation IQA terms than the one studied here, which would allow for the observation and analysis of more subtle catalytic properties of the active site.
The exchange-correlation IQA terms with positive REG values (marked in red) destabilise as the reactionp rogresses and therefore indicate bond-weakening andb ond-breaking. As indicated by having the most positive REG value, the exchange-correlation term that most destabilises is that between O36 and H100. This term represents the partial deprotonation of the catalytic Asp25('), which is required to stabiliset he formation of the alcohol group on the substrate during peptide hydrolysis.
The second most positive IQA energy term (B')d etermined by the REG method is that of the exchange-correlation between the amide carbon (C58) and amide oxygen (O67), which represents the weakeningo ft he bond between them. The formally double bond C(58)=O(67) is parto fa na mide group, together with the formally single bond C(58)ÀN (59).T hese bonds are linked via the classical resonance canonicals, which weakens the former and strengthens the latter.H ence, if C(58)= O(67) is weakened, through interaction with an atom external to the amide group, C(58)ÀN(59) will also be weakened. Thus, the cleaving of the scissile peptide bond is helped by the weakening of C(58)=O (67).
By considering the terms represented by C' and E' we can see the asymmetric activation of the reactive water by the Asp25 and Asp25(')g roups because, traditionally,o ne hydrogen atom is fully deprotonated by Asp25(')a nd the other hydrogen forms ah ydrogen bond with the Asp25 carboxyl group. This asymmetryi ss hown in the REG methodb yt he difference in the two REG values( C ' = 2.0 and E' = 1.3). Finally,w e also see bondingr earrangement in the Asp25(')c arboxylic acid group because one of the oxygens (O36) is deprotonated and the other is protonated (O37). As the curly arrows cascade this rearrangement leads to the bond weakening betweena toms O66 and H101, as represented by the curly arrow labelled E'. The term E' represents bond weakening (not bond breaking). This can be seen in Table 1, as the REG value for E' is smaller in magnitude than C',w hich indicates less of an energetic change over the course of the reaction. This bond weakening is associated with the hydrogen bond formation between O11 and H101 (term F).
Here we have shown that determining reaction mechanism and catalytic properties within an active sight from REG values is simple.W es howedt hat, by simply looking at ordered lists of REG values, chemical insight can be gained. We can also say that this chemical insight is physically well-grounded as it is based purelyo nw ell-defined energies.

Electrostatic contributions to the peptide hydrolysis
In the previous sectioni tw as shown how the exchange-correlation energies could be used to define ar eaction mechanism that represents covalent bonds breakinga nd forming. Althoughs imilart ot he traditional chemical mechanism, the exchange-correlation terms do not allow for exhaustive conclusions to be drawn as the electrostatic energy must also be considered. For this reason, we will study the electrostatic terms here. It should be noted that the REG values in Ta ble 2 are directly comparable to those in Table 1.
Here we discuss some of the largest electrostatic contributions to the overall stabilityo ft he reactionm echanism using the REG method. The IQA electrostatic energy that most contributes to the reaction is the electrostatic interaction between the amide oxygen (O67) and the Asp25(')c arboxylic acid hydrogen (H100). The fact that the REGo ft his electrostatic term is larger than its counterpart exchange-correlation term (see Ta ble 1) showst hat this proton transfer is primarily dominated by electrostatics. Furthermore, we can state that the electrostatic term is primarily polarisation driven. As shown in Table 3, the polarisation REG for this interaction is the most negative and constitutes À6.1/À6.8 = 90 %o ft he overall interaction. The two next largest electrostatic REG values relate to the deprotonation of the Asp25(')g roup, whichw as discussed in the previouss ection. These values show that the electrostatic interactions betweent he carboxylic carbon (C35) of the Asp25(') and the carbonyl oxygen (O36), as well as the 1,3 electrostatic interaction between C35 and H100, energetically favour the reaction proceeding. The former favourable electrostatic interaction is likelyd ue to the increased stabilisation as the oxygen charge increases, while the latter is likelyd ue to the increased distance between the positive hydrogen (H100) and the positive carbon (C35) as the hydrogen deprotonates.
The next most negative term represents the electrostatic interaction between the nucleophilico xygen (O66) of the WR water and the carbon (C58) of the amide (which is attacked). This electrostatici nteraction shows that the nucleophilic attack is only partially driven by electrostatics because in the previous section we showedt hat the attack was also driven by exchange-correlation. From Ta ble 3, we can see that this nucleophilic attack is primarily based on the monopolar energy, which corresponds to charge transfer and is thus not due to polarisation.Weshow here that the nucleophilic attack has stabilising contributions from both electrostatic and exchangecorrelation energies. By comparing the REG values of these two energies, we find that both are important as drivers of the reaction, bute xchange-correlation is slightly more of ad riving factor as it has al arger REG value, in magnitude,( À4.8, see Ta ble 1) than the electrostatic REG value (À3.2, see Table 2).
Finally,w es how that the electrostatic terms that mosti nteract to prevent the interaction are listed with positive values in Ta ble 2. The electrostatic IQA terms that most opposes the reaction are those between the amide oxygen (O67) and the amide carbon (C58), and the amide nitrogen (N59) and the amide carbon (C58). This shows that peptideh ydrolysis not only destabilises the exchange-correlation between the amide atoms (see previous section), but it is also destabilises the amide atoms electrostatically.
Finally,w ec onsider the final energyt ype, denoted V inter (A,B), which is the sum of the corresponding (A,B) exchange-correlation term and the electrostatict erm. The purpose of this final analysisi st oa llow for anyc ancellation between the exchangecorrelation and electrostatic pairwise energies, and thereby investigate the total interaction between two atoms. Using the  total interaction energy is useful for determining the reaction mechanism because in nature there is no distinction in stability due to exchange-correlation or electrostatic energy.W ec an therefore see how the total interaction between two atoms dictates the reactionm echanism. Table 4r eports the results of using the REGmethod on the total interatomic energies.
From Ta ble 4w ea re able to draw am echanism much like the one for exchange-correlation. Figure 7s hows this interatomic energy reaction mechanism.F igure 7s hows many of the same features as in Figure 6, with somen otable differences. One such difference is that the hydrogen transfer between the amide oxygen (O67) and the Asp25(')h ydrogen( H100) is more dominant than the nucleophilic attack of the WR oxygen (O66) on the amide carbon (C58). This must be due to the additional electrostatic terms, because the electrostatic REG value (see Ta ble 2) of the former is the most important electrostatic contribution.I ti sa lso evident that the deprotonation and bond weakening between the oxygen and the hydrogens in the WR water is not as important as shown in the exchange-correlation case. Therefore we can state that the WR water is electrostatically bound in the transition state by the Asp25 and Asp25(') groups.T wo very important interactions, shown by Figure 7, that inhibitt he reaction occurring are those between the amide carbon (C58) and its neighbouringo xygen (O67) and nitrogen (N59).I np articular,t he total interaction points to the fact that the destabilisation of the carbon-nitrogen interaction is al arge energy barrier to the reaction occurring.
Pragmatically,w ew ould recommend that using total pairwise IQA interaction energies is the universal methodf or understanding ar eactionm echanism in large systems using the REG method. As it gives the total energy between two atoms and therefore contains the most information about how reactions occur in large systems.
The intra-atomic energy terms have not yet been considered. In this system we find that the intra-atomic REG values range from 2.2 to À6.3. This makest he intra-atomic terms less dominant than the interatomic IQA energy terms, which have aR EG value range of 11.4 to À10.8, as seen in Ta ble 4. The intra-atomic terms are also not as usefuli nd etermining reaction mechanism because their purely intra-atomic natureg ives no information regardinga tomistic pairwise information.

Conclusions
In this study of ap eptideh ydrolysis reaction in the HIV-1 Protease active site, we have shown that the REG method, when used in conjunction with IQA energies can be used to determine the mechanism of reaction. In using IQA energies we have an energetic scheme that is without arbitrary parameters, well-defined atomsa nd withoutd ubiouse nergy terms. The REG method is also parameterless and therefore the determination of the reaction mechanism is achieved here withouta rbitraryp arameters. Furthermore, this reaction mechanism gives much more insight into catalysis than the traditional mechanism and is based on an absolute measureofinteratomic energy,n ot on electron density given the ongoing controversy surroundingt he interpretationofb ond critical points.
It is pleasingt hat the REG method reveals "through-space" interactions. These can be problematic in terms of standard chemicalintuition. In this work, however,t hey roll out naturally and are fully integrated (in terms of computationalt reatment) with the bonded interactions. This ability to determine through-space interactions allows for the observation of subtle catalytic effects in the peptide hydrolysis reaction.
This study shows that the REG methodi sapowerful tool that can be used to automate the analysiso fl arge chemical systemsa nd allows for the exhaustive study of such as ystem providing chemical insight. The REG method requires no prior knowledge of the system and so can be useful tool in studying protein-ligand binding, active site reaction mechanismsa nd conformational changes within large biomolecules. We hope that this work will set an example in how to use IQA in large systemstor ecover physically meaningful results.

Theoretical and ComputationalBackground
Interacting quantuma toms (IQA) The IQA methodw as originally derived in its entiretyb y Blanco et al. [17] although the electrostatic terms were previously derived and implemented by Popeliera nd Kosov, [6] and independently and simultaneously by Salvador [57] et al. Equation (1) shows the partitioning method used, in which each energetic term is calculated by integrating,i np rinciple, over either the first-or second-order reduced density matricesw ith the appropriate operator.
The first term in Equation (1) requiresa3D integration of the electrond ensity over each atom independently and contains the (one-electron) intra-atomic kinetic, exchange-correlation and electrostatic energies within the atom. The second and third terms are obtained from 6D integrations over two topological atoms A and B,and represent the (classical) electrostatic and exchange-correlation energies, respectively.N ote that, at long-range, the electrostatic energy can be calculated [58] via atomic multipole moments therebya voiding 6D integration. Because the B3LYP density functional is used in this work, the exchange and correlation energies are combined in Equation (1). Thef irst successful attempta tc ombining ad ensity functional (in this case B3LYP) with IQA, such that all partitioned energies add up to the system's total energy,w as achieved [56] in 2016. We mention that it is also possible to calculate electron correlation explicitly,a gain within IQA, for wavefunctions derived from Møller-Plesset perturbation theory [59,60] and coupled cluster (CCSD(T)). [61,62] To extract more information from the IQA electrostatic terms, we partitiont his energy further into monopolar (charge-transfer) [63] and polarisation terms, which are defined in Equations (2) and (3) in which Q A 00 is the monopole moment( i.e. essentially atomic net charge) of atom A and r AB is the internuclear distance between atoms A and B. These two terms enable furtheru nderstanding of polarisation and charget ransfer without having to invoke any approximations or perturbation theory.The physical picture behindt his further partitioning should be clear.F irst, an atomic monopole moment measurest he build-up or depletion of electronic chargew ithin that atom. For example, the oxygen in water returnsavalue Q O 00 ¼ À1:2, expressing the fact that the oxygen hasa ttracted an electronic charge of 0.6 e from each hydrogen. In non-symmetric cases it is not clear how much charge hasb een drawn from which other atom but the principle remains that atomic monopole moments express charge transfer.S econdly, we view charge transfer as as pecial case of polarisation in general. Actually,t he term polarisation is most often used in the narrower meaning of dipolar polarisation, which we view as ac hange in the atomicd ipole moment. In other words, and in summary,c harge transfer is monopolar polarisation,r eferring to ac hange in the zeroth multipole moment, while (dipolar) polarisation refers to ac hange in the first multipole moment. We briefly mention that polarisation can be predicted by machine learning [64,65] after suitable training.

The relative energy gradient (REG) method
We recently published [4] and applied the REG method to the water dimer as an example of how to understand hydrogen bondingi nt he system. The water dimer was chosen as at rivial system to illustrate the REG methoda saproof of concept. However, the current study is far from trivial because its size is too large to analyse the IQA terms "by hand".
Here we only repeat the fundamentals of the REG method and recommend the originalp aper [4] for more details concerning the logic behindt he method. The REG method has two main objectives:( i) to determines ubsets of partitionede nergies that best represent the total behaviour of the system,a nd (ii)toe xtract chemical insightf rom an energeticallyp artitioned system.G iven these two objectives,i ti sc ritical that the REG method is exhaustive and does not depend on arbitrary parameters. Central to REG is that the system is systematically perturbed, controlled by ac ontrol coordinate s. In the current case study,t he control coordinate is an intrinsic reaction coordinate (IRC) but it can be as imple internuclear distance, or a dihedrala ngle. Typically,o ne needs aboutadozen or so snapshots alongarelevant trajectory in which the whole system evolves. The REG methodt hen looks forc orrelations, occurring along such at rajectory,b etween the total energies and the atomic energies. However,t he REG methodi sa ppliede nergybarrier-wise. In other words, the PES is separated into segments that are defined by the turning points in the total energy.W ec all these segments" barriers" because there is always ad irection along the energy profile in which the energy rises towards al ocal maximum at the turning point, which marks the end of the segment.I nt he current study there is only one barrier.
The IQA partitioning scheme is additive in nature such that the total energy of the system can be recovered by the following sum, in which N is the total number of energy terms and s is the control coordinate, which is sampled at M data points within a single barrier. Practically speaking, Equation (4) is numerically not exact because of the well knowna tomici ntegration error. [66] However,t his error is typicallys ystematic, in that the sum of all IQA energies is typicallys maller than the wavefunction's total energy.I nt his case study,t he maximum error found for any of the M = 11 data points is 26 kJ mol À1 .E nergy gradients are not much affected by this number because of the systematic nature of the error. The REG method relatest he gradient of ap artitioned energy (denoted E i )t ot he total energyo ft he system (denoted E total ). It does this using linear regression as shown in Equation (5), in which m REG i is the RelativeE nergy Gradient, c i is the y intercept (which does not contain chemically relevant information) and s is the control coordinate. Note Equation (5) corresponds to an equationf or every energy term i,a nd that it is fitted to the M datap oints that represent the barrier.F rom here onwards, the m REG i value shall be referred to as the REG value of ag iven energy term i or simply REG i .I ti sc alculated using ordinary least-squares linear regression as shown in Equation (6), in which T denotes the transpose, transforming an Mx1 columnvector into a 1xM row vector or: The energies in the above equations are translated such that the mean energy is 0, where E i is the meano ft he values in vector E i (and the same is true for the total energy). By definition [see Eq. (5)],aR EG value only makes sense if the partitioned energy (E i )a nd the total energy correlatel inearly,o ra s linearly as possible. To quantify the degree of linearity we use the Pearsonc orrelation coefficient, which is calculated in Equation (8), Once the REG valuesf or the system have been calculated, they are ordered from mostp ositive to mostn egative (i.e. largestt o smallest). The magnitude of the REG value gives the ratio of the energy gradient of the partitioned energyc ompared to the total energy.A ss uch if the REG value has am agnitude of 1t hen the partitioned energy has the same magnitude of gradient over the barrier as the total energy.I ft he REG value magnitude is < 1t hen the gradient of the partitioned energy is smaller than the total energy,a nd vice versa.T he sign of the REG value gives informationo nt he direction of the gradient: if the sign of the REG value is positive then the partitioned energy gradienta cts in the same direction as the total energy over the given barrier (i.e. the partitioned energy behaves in a similar fashion to the total energy). In contrast, if the REG value is negative thent he partitioned energy acts in the opposite direction to the total energy over the given barrier( i.e. the partitioned term has the opposite behaviour when compared to the total energy of the system). Being exhaustive, the REG methodr anks all IQA energy terms such that the terms with the largestmagnitude REG values are more chemically relevant than IQA terms with smaller magnitudeR EG values. Due to its exhaustive nature,the REG method observes all subtle catalytic effects, andr anks them in aquantitative manner.

Computational Details
The transition-state geometry was calculated at the B3LYP/6-31 + G(d,p) level of theory,w ithout geometric constraint on the residues, and had one "negative frequency", as would be expected for such as tationary point. Although it is not the normal practicet oc arry out unconstrained optimisationso n enzymef ragments, we chose to do so here as the unconstrained structure we determined was not too different from that obtained by Garrec et al., [48a] who used constraints. Our approachh as, in addition, the advantage of lacking spurious vibrational frequencies causing problems with the determination of the free energy or causing problems for the IRC. Starting from this transition state geometry,a ni ntrinsic reactionc oordinate (IRC) calculation was performed for 100 steps. Along the trajectory 11 geometries were selected, including the transition state and the reactants. The wave functions of these 11 geometries were then calculated at B3LYP/6-31 + G(d,p) level. All the density functional calculations werep erformed using GAUSSIAN09. [67] The IQA analysisw as performed using the program [68] AIMAll version 16.01.09 (in which the default AIMAll integration method [69,70] was used insteado ft he TWOE [71] algorithm, due to observed instabilities). The REG analysis was performed by the in-house code ANANKE. The system contains 133 atoms, which means that there are 17 689 individual IQA terms to be calculated. This number can be justified as follows: for an n-atom system there are n(n-1)/2 interatomic electrostatic energy terms (V cl ), n(n-1)/2 interatomic exchange-correlation energy terms (V xc ), and n atomic energy terms. As ar esult there are 2n(n-1)/2 + n = n 2 IQA terms, and 133 2 = 17 689.