Molecular mechanisms from reaction coordinate graph enabled multidimensional free energies illustrated on water dimer hydrogen bonding

Abstract Computing the free energies of molecular mechanisms in multidimensional space relies on combinations of geometrically complex reaction coordinates. We show how a graph theory implementation reduces complexity, and illustrate this on the arrangements of hydrogen bonding of a water dimer. The reaction coordinates and forces are computed using graphs that define the dependencies on the atoms in the Free Energy from Adaptive Reaction Coordinate Forces (FEARCF) library. The library can be interfaced with classical molecular dynamics as well as quantum molecular dynamics packages. Multidimensional interdependent reaction coordinates are constructed to produce complex free energy hypersurfaces. The reaction coordinates are graphed from atomic and molecular components to define points, distances, vectors, angles, planes and combinations thereof. The resultant free energy surfaces that are a function of the distance, angles, planes, and so on, can represent molecular mechanisms in reduced dimensions from the component atomic Cartesian coordinate degrees of freedom. The FEARCF library can be interfaced with any molecular package. Here, we demonstrate the link to NWChem to compute a hyperdimensional DFT (aug‐cc‐pVDZ basis set and X3LYP exchange correlation functionals) free energy space of a water dimer. Analysis of the water dimer free energy hypervolume reveals that while the chain and cyclic hydrogen bonding configurations are located in stable minimum energy wells, the bifurcated hydrogen bond configuration is a gateway to instability and dimer dissociation.

thermal properties, principally the free energy of association. A central requirement in the computation of free energy is rigorous sampling of the association, conformational, configurational or reaction space fundamental to the free energy landscape for the event(s) under investigation. There are several free energy methods that rely on classical force fields including Umbrella sampling, 4 adaptive biasing potential 5 and meta-dynamics, 6 all of which use a pre-chosen analytical biasing potential to drive molecular systems from regions of low potential energy to regions of high potential energy. The process of choosing weighting function(s) presupposes knowledge of a mechanism for which the free energy landscape is then computed. By way of example, selecting the intermolecular distance between reacting molecules and then implementing a series of umbrella potentials along that coordinate, prevents the system from evolving freely and revealing the interplay of multiple possible reaction mechanisms.
The Free Energy from Adaptive Reaction Coordinate Forces (FEARCF) 7-9 method instead uses numerically derived biasing forces to facilitate complete sampling of the potential energy landscape. A complex and rugged biasing potential is constructed from the density distribution of a molecular system resulting from the sites visited in MD runs. The density distribution is updated after each FEARCF iteration and reaction coordinate forces are computed that drives the system away from equilibrium low energy regions toward less probable high energy areas. The converged iteration results in an unbiased sampled landscape which categorizes it as a Flat Histogram 10 method.
The name implies a convergence toward a condition where the molecular system of interest is able to visit every possible state with equal probability.
Incorporating directional hydrogen bonding between water molecules in any molecular model is central to accurately simulating the natural phenomena. However, a dynamic account of electron redistribution in hydrogen bonding is only possible using quantum dynamics methods. 11,12 What remains outstanding is the computation of the free energy of association of water that accounts for electron redistribution during non-Boltzmann dynamics. Previous reporting of the FEARCF method 7,13,14 involved an implementation within the CHARMM molecular dynamics software which limited its scope to classical and semi-empirical systems. FEARCF has now been packaged within an object-oriented software library enabling its utilization within various molecular dynamics software packages including ones suited for quantum mechanical systems. Recently we reported a computationally efficient method that combines ab initio Hartree Fock dynamics (QSL: Quantum Supercharger Library) 15,16 through MD packages with a comprehensive multidimensional Free energy method (FEARCF) 7-9 that could be applied to problems of this nature. 17 A selection of a single reaction coordinate, ξ 1 to compute the free energy for complex problems such as the hydrogen bonding water dimer is likely to result in the conflation of several events in the 1D energy well. Constructing several reaction coordinates (RCs) that are diverse in geometry (ξ 1 = r, ξ 2 = θ, ξ 3 = φ etc.) representing distance between points, angles between vectors, angles between planes, and so on, to map out multiple dimensions in free energy space is a computational challenge. The numerical approach of FEARCF avoids the complex Jacobian corrections required by other free energy methods.
In addition, the object-oriented design of the FEARCF library enables a more intuitive and efficient means to calculate the atomic biasing forces. This can be best understood by the representation of sets of RCs as graphs.
The FEARCF approach is to express the complex relationship between the RCs and the atomic coordinates and forces on them in terms of graphs. Graphs are composed of elements referred to as vertices (nodes) where the relationship between the nodes are described by edges that connect the nodes. Complex combinatorial selections of multiples of these two objects can model a large variety of problems.
Chemical graph theory was first applied to the discovery and classification of molecular isomers, particularly hydrocarbons, using trees to represent molecular topologies. [18][19][20] More recently other applications have been developed including using graphs to assign unknown spectral resonances in NMR data, 21 discovery of new reaction mechanims 22 and even predicting new reactions entirely. 23 Here we report the use of graphs that reduce the complexity of defining nontrivial RCs and their role in easily combining multiple RCs to efficiently compute high dimension free energy landscapes. The library implementation of the FEARCF method that facilitates an interface with classical and quantum molecular dynamics packages is illustrated in an interface with CHARMM and NWChem to produce classical and quantum free energy hypervolumes (FEVs). By way of illustration the mechanism of association and dissociation of hydrogen bonds in a water dimer is discovered from the multidimensional DFT FEVs using NWChem/FEARCF computations.

| FEARCF method
Previously we have detailed the FEARCF as a flat histogram method 8,9,24 that produces a converged free energy surface enabling equal sampling in molecular and atomic configurational and conformational space. Equal sampling for a set of RCs (ξ) requires that the summed effect of the system potential W(ξ) and the biasing potential U(ξ) must be a constant. For the sake of simplicity we can choose this constant to be 0. To reach convergence the condition: must be satisfied giving the ideal biasing potential of U(ξ) = ÀW(ξ), even though initially W(ξ) is not known. To discover the form of W(ξ) it is convenient to start with U 1 (ξ) = 0. Using a zero bias, a simulation of the molecular system produces a probability density P 1 (ξ). The Boltzmann relationship produces the first estimation of the system potential W 1 (ξ), from which the next estimate of the ideal biasing potential is computed This updated estimate of the biasing potential is then used to run a second iteration of the system from which another probability density is extracted, however the new probability density is that of the biased simulation of the system, that is, P 0 2 (ξ). The unbiased probability density P 2 (ξ) can be recovered since the biasing potential U 2 (ξ) is known because it was used to produce the biased probability density where C is a normalization constant. With this new updated unbiased probability density, a new biasing potential U 3 (ξ) can be created using the relation shown in Equation (3). This process is continued iteratively until a roughly equal sampling occurs with a small ratio between high and low energy regions making it possible for the system to diffuse about the landscape freely. When this is the case, a close approximation of the systems potential W(ξ) has been discovered. To speed up this process, the weighted histogram analysis method (WHAM) 25 is used to combine the result of multiple simulations run at each iteration. This embarrassingly parallel approach allows for a much faster approach to convergence than a serial simulation computation can achieve.
The manner in which the biasing potential U(ξ) is used to calculate individual biasing forces on each of the atoms used to define ξ is through a simple application of the properties of conserved potentials, especially the relation that states that the force is the gradient of the scalar potential, that is, F ! ¼ ÀrU. The gradient is presented in terms of Cartesian coordinates however in FEARCF the potential is defined in terms of ξ. To resolve this discrepancy, we consider the chain rule.
The first term on the right hand side is the gradient of the biasing potential which can be numerically approximated using interpolation schemes such as cubic splining. The second term is the partial derivative of the RCs in terms of the Cartesian coordinates of the atoms used to define ξ. In the case of multiple dimensions this term is split into several partial derivatives which have known analytical solutions and are computable. 26,27 This separation of the atomic forces into a product of partial derivatives is the rationale for representing RCs as graphs. The FEARCF library's use of objects aids in the efficiency and flexibility of computing free energies as a combination of points, distances, planes, vectors, or angles.

| FEARCF library
The A second significant module translates atomic coordinates, masses and forces data provided by the MD package into FEARCF syntax. The force and position data is amended as described above and directed back to the MD package. A library call to FEARCF is placed inside the MD package, typically at the end of the step update section as is illustrated in Figure 1.
The QMD implementation in NWChem has been previously reported. 28 Briefly it has the functionality to perform ab initio molecular dynamics from methods including Hartree-Fock (HF), Density-Functional Theory (DFT) and Møller-Plesset perturbation (MP2). This is achieved with a Born-Oppenheimer approximation treatment of the nuclei, and modeling the electronic structure using Gaussian basis functions. In the case of DFT where the Kohn-Sham equations are given as where the nuclei are represented in the v C r ! term as well as the electronic contributions to the Coulomb potential v C r The nuclei positions R

| Graph Theory applied to reaction coordinates and biasing forces
The graphs constructed in FEARCF show in detail the relationships between the ξ i and the atomic coordinates from which they are ultimately derived. The nodes represent several types of elements (atoms, vectors, planes, angles, etc.) while the edges represent the subroutines that both calculate the quantities themselves, as well calculate the partial derivative terms needed for the biasing force. In this way the constructed RCs can be represented as directed, colored graphs although an explicit inclusion of directed edges is not incorporated since every edge is implicitly bidirectional.
To demonstrate how graphs can show this kind of relationship, a simple graph ( Figure 2) is constructed between a single atom (A 1 ) and a RC (ξ 1 ) that has an explicit dependency on A 1 . Following this ξ 1 is used to define a free energy W(ξ 1 ). To illustrate that the nodes of the graph represent a variety of elements, the nodes are depicted as different color nodes. They share an edge which represents the potential flow of information (via subroutines) between them. For clarity, this edge is be split in two to represent the two types of information transfers namely calculation of the value of ξ 1 as a function of A 1 , and the propagation of the biasing force via a product of partial derivatives, that is, At every integration step of a FEARCF simulation the values of the chosen RCs need to be calculated from the atomic positions. To do this each chosen RC node retrieves information from the nodes, a level below, that are used to define it. If those nodes do not represent atoms, they in turn retrieve information from the nodes a further level below, that were used to define them. This creates a chain Once all RC values are known, the cubic or B-spline interpolation subroutine is used to calculate the gradient, in particular the partial derivatives, of the free energy volume at these ξ i values. The gradient partial derivatives received from the interpolation subroutine for each chosen ξ is sent to the nodes used to define it. These nodes then calculate the partial derivative term required to transform the received derivative to be in terms of the variable each node represents. These nodes then in turn send this derivative to the nodes used to define them. This is repeated in a chain-like manner until the nodes that represent the atoms have accumulated all the partial derivatives needed for the biasing force on each atom (see Equation (5)). The atomic biasing forces are then added to the forces calculated by the MD package interfaced with the FEARCF library in order to update the atomic velocities and positions using both the system and biasing potential.
An early graphing approach defining RCs was implemented in the CHARMM module RXNCOR and a limited precursor to FEARCF 30 expanded on this to undertake QM/MM reactions relying on a pair of distance RCs in two dimensions used for umbrella sampling. [30][31][32] The original implementation in CHARMM was limited by umbrella sampling that is not ideally suited to free energy simulations of more than two dimensions and further is not efficiently parallelizable. The limit to two dimensions is due to the analytical nature of the biasing potential used in umbrella sampling that requires increasingly complex corrections with increasing dimensionality. The early implementation in CHARMM of the adaptive umbrella sampling method which uses WHAM to combine results from multiple parallel simulations has not been developed further to date.
The FEARCF library rectifies these limitations by sourcing its biasing potential from a numerical potential, which does not rely on change of frame corrections. The library being located outside of MD packages such as CHARMM better enables efficient integration of WHAM for parallelization. The graphing approach uses an adjacency matrix A defined as A = [a ij ], a ij 0, 1 ð Þif node i and j share an edge. A simple graph representing the RCs and its adjacency matrix is given as Simple graph illustrating basic function of a graph in the FEARCF library, with information being exchanged between an atom A 1 and a reaction coordinate ξ 1 An associated matrix representing the forces, named here the derivative matrix D, which rather than simply having 1s representing the edges, they can be replaced by terms such D ij ¼ ∂i ∂j which is the partial derivative of node i with respect to j. The atomic biasing forces can then be found by finding a path in the derivative matrix from each reaction coordinate to each connected atom node. The net atomic biasing force on an atom can then be expressed as the sum of the products of the derivatives along each path. 33   The distance reaction coordinates (ξ 1 and ξ 2 ) as defined in terms of the molecular representation ( Figure 3A) are simply represented in a two layer graph ( Figure 3B). The force computation in FEARCF is shown in the derivative matrix ( Figure 3C). The atomic biasing force dependencies and net force on the hydrogen being exchanged can be mathematically defined and computed (Equation (9)). Here i is an index over the set of reaction coordinates and j is an index over the edges that make up the path connecting the reaction coordinates to the atom.
The subroutines used to calculate partial derivatives are defined

| RESULTS AND DISCUSSION
Here we demonstrate the effective graphing structure of the FEARCF library applied to the interplay of hydrogen bonding in a water dimer.
The value of multidimensional FEVs is illustrated by first computing a simple 1D free energy curve, followed by a 2D free energy surface.
These computations have previously been performed using classical 34,35 and quantum 36

| Free energy graph based computation
Free energy is an essential component needed to understand molecular mechanisms. The value of multidimensional reaction dynamics using FEARCF's have been shown for complex enzymatic catalyzed F I G U R E 3 Representations of ammonium proton exchange reaction (A) molecular representation, (B) atoms defining ξ 1 and ξ 2 in colored graph, and (C) force computation from partial derivatives for each edge presented in derivative matrix reactions. 17,37 Further the hypersurface FEVs using FEARCF for the TIP water system illustrated the sensitive nature of FEVs showing that they can be used to distinguish between TIP3P, TIP4P, and TIP5P models. 14 However we did point out that a true mechanism of intermolecular hydrogen bond exchange was not possible using only classical methods. Here, the free energy volumes are computed for a classical water dimer model only to illustrate (i) the graphing approach to reaction coordinate force computation and (ii) demonstrate the flat histogram convergence of FEARCF. The mechanism of hydrogen bond association and dissociation are derived exclusively from an analysis of the QMD FEVs.

| Reaction coordinate and convergence of the 1D FE water dimer
While water is a simple molecule the collective structures and properties in bulk liquid and solid are complex. In hexagonal ice, a water molecule forms a tetrahedral structure with its four closest neighboring water molecules where it participates as a donor in two hydrogen bonds and as an acceptor in two more hydrogen bonds. This local structure is retained in the water solvent form. 38 A key reason for this is the intermolecular hydrogen bonds that form between water molecules. A single water molecule is able to form up to four hydrogen bonds at once, a large number for a molecule of such low molecular weight. These numerous hydrogen bonds are the reason for water's high boiling point and greater density as a liquid compared to a solid.
First let us consider one reaction coordinate, namely the wellstudied scalar distance between the oxygen atoms for the water dimer system ( Figure 4A) along with the graph that represents this reaction coordinate ( Figure 4B). The oxygen atoms are represented as points through which a distance between the two waters are defined.
This distance is the reaction coordinate. The flow of information via the edges in Figure 4B where the value of node r is simply given by . The atomic biasing forces computed through the addition of the derivative in terms of r, is the unit vector b r. This is given in Equation (10).
The 1D classical Water Dimer FEARCF simulation is run for 50 ps each iteration, with 128 simultaneous simulations in each iteration, each simulation run with a single core.
The sampling along ξ = r during the first FEARCF iteration where the biasing force = 0 is centered at 3 Å with the ratio of highest to lowest sampling being 19.3:1 ( Figure 4C). After the 15th FEARCF iteration ( Figure 4E) the sampling along r converges toward a flat histogram with a high to low sampling ratio of 2

| Reaction coordinates and convergence of the 2-D FE water dimer surface
Next we show another common setup for investigating water dimer interactions where two reaction coordinates ( Figure 5A) include a distance (between the center of mass of each water molecule) and an angle (between the two dipole moments of each water molecule). An illustration of the graphing construct for these reaction coordinates is given in Figure 5B. Since the distance is now between centers of mass, all of the atoms in the system are represented as points. The oxygen atom and two hydrogens of a water is needed to define the center of mass W i . This reaction coordinate is the scalar distance While these are not actually the true dipole moment vectors, they are parallel to them which allows us to get the correct value for our angle reaction coordinate φ.
To translate the data via edges shown in Figure 5B r ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi are needed for the ξ nodes. In the case of the atomic biasing forces a similar approach for r as in Equation (10) is used. However, for ϕ the forces for the nodes are given in Equation (11).
For the atom nodes used to define the center of masses, forces are simply weighted according to their atomic masses. The 2D classical water dimer FEARCF simulation is run for 50 ps each iteration, with 128 simultaneous simulations in each iteration, each simulation runs on a single core. The sampling for both r and ϕ ( Figure 5C) for the first FEARCF iteration that has a zero biasing force has a high-tolow sampling ratio of (ξ 1 = r; ξ 2 = ϕ) is (8:1; 28:1) while after the convergence toward a flat histogram the ratio is (1.14:1; 26:1).
While the same minimum energy distance is observe ( Figure 5F) as in the 1D case ( Figure 4F) now the preference of the dipoles to orient themselves between 40 and 120 , that is, a preference for orthogonal configurations hints at hydrogen bonding configurations.
Orthogonal dipole orientations more closely resemble linear chain compared with bifurcated hydrogen bonding. However, there is no single minima for the φ angle. This is consistent with an understanding that φ cannot distinguish between an orientation where the hydrogen bonding is donated from one water giving a bifurcated configuration or when the waters are laying alongside each other. As we have previously noted, dipole interactions are typically described by two degrees of freedom, that is, the separation of the centers and the angle between them. This is a sparse description and contains not much more information than W(r). 14

| Reaction coordinates and convergence of the 4-D FE water dimer hypersurface
Previously we concluded that four reaction coordinates was a useful description leading to the 4D W(r, θ 1 , θ 2 , and ϕ). 14 Here, the parameters are distance between centers of masses (r), the molecular vector angles (θ 1 and θ 2 ) and their relative orientation (ϕ) as shown in Figure 6A. It is similar to the 2D system with the same definition for r however, φ is now a dihedral angle. In addition we have defined angles between the dipole moment vectors b i ! and the vector version r ! of our previously defined distance r. Constructing a graph that best represents these reaction coordinates is illustrated in Figure 6B. Again firstly all the atoms in the system are defined as points from which the center of masses m for each water molecule can be described.
This alone defines the distance reaction coordinate r as well as its vector r ! . Following this, the vectors parallel with the dipole moment vectors b i ! can be described which in turn allows for a definition of the dihedral angle φ between. In addition two angles θ 1 and θ 2 between the dipole moment vectors b i ! and the intermolecular vector r ! are needed.
The value for the r RC node in Figure 6B is the same as for the 2D case while The 4D classical water dimer FEARCF simulation is run for 50 ps each iteration with 128 simultaneous simulations in each iteration; each simulation being run on a single core. The sampling for all ξ i in the first FEARCF iteration with no biasing forces has (ξ 1 = r; ξ 2 = θ 1 ; ξ 3 = θ 2 ; ξ 2 = φ) a high-to-low ratio for the distance of (57.4:1) in Figure 6C while after the 5th FEARCF iteration approximate convergence was reached with high-to-low ratios of (1.8:1) in Figure 6E.
The preference for φ = 180 and θ 1 = 52 /128 is due to 52 being half of the TIP3P parameter angle formed between the two hydrogen-oxygen covalent bonds. At θ 1 = 52 one of the hydrogen atoms of the first water is now in line with the vector r ! facing the oxygen of the second water, while one of the hydrogens of the second water is also in line with r ! but facing away from the other oxygen. Lastly φ = 180 indicates that the second hydrogen of each    Figure 7A are electron density contour plots of C I , C II , C y , B I , and B II .
The closest contours are at a density of 0.03 au where it is apparent that there is a sharing of electrons in the chain case (C I and C II ) that does not occur for C y B I , or B II . This implies that the hydrogen bonds formed at C I and C II are not purely electrostatic in nature but also involve deformation of the electron clouds of the hydrogen bond donor and acceptor ( Figure 7B and Table 1). While the C and Cy con- It is a mid-point between C I and C II .
The nature and extent of the water association is measured through the dipole-dipole correlation function given by Q t where D(t) is the total dipole moment of both water molecules at time t (left panel Figure 8B). To fit the Q t ð Þ decay 2 exponentials of the  Figure 8C). The faster relaxation time τ 1 is due to the librational motion of both dipoles within the minima well about the angles of C I (θ 1 = 65.53 and θ 2 = 118.33 ) and C II (θ 1 = 122.61 and θ 2 = 66.23 ).
This observation for an isolated water dimer is consistent with the well-known phenomenon of water molecules libration in bulk liquid 41,42 indicating that water libration in hydrogen bonding configurations is an innate molecular property rather than a condensed phase induced phenomenon. The slower relaxation time τ 2 is due to the periodic migration from one minima well C I /C II area to another C II /C I via the transition region Cy (Table 1).
Turning our attention to the process of hydrogen bond dissociation (right panel Figure 8A) we investigate trajectories to understand if there is a systematic molecular process of the hydrogen bond between the waters breaking while drifting apart. After passing the energy barrier of 6 kT = 3.55 kcal/mol, the water molecules remain more than 5 Å apart, are no longer hydrogen bonded and do not have an enthalpic means to re-establish the hydrogen bond. The time correlation function can be fitted to single exponential function (right panel Figure 8B) of the form e À tþx ð Þ τ where τ is the relaxation time. Upon fitting, the values are found to be τ ¼ 127:0 AE 1, x ¼ 5:63 AE 0:77 . Analyzing the dissociation trajectories, two of which are projected onto the free energy surface (right panel in Figure 8A), shows that the exit out of the 6 kT hydrogen bond zone is made via a bifurcated hydrogen bonding configuration. A closer look at this through an examination of the time series of the θ and r RCs (right panel Figure 8C) confirms that the gateway out of the hydrogen bond zone of r > 4 Å is through the B I and B II configurations (red and black trajectories on right panel Figure 8A).

| CONCLUSIONS
The implementation of graph theory to define complex reaction coordinates and draw detailed paths to atoms affected by the perturbations due to biasing forces has been detailed here. We showed that the convergence of a multidimensional free energy simulation as a flat histogram having a small sampling ratio between high energy to low energy events can be achieved. This free energy method has been reported previously as a set of algorithms linked to CHARMM. Here, we present a formalized library format for hyperdimensional free energy computations (FEARCF) that can be interfaced with several packages. The seamless link to NWChem presents the opportunity to produce advanced ab initio QMD multidimensional free energy hypersurfaces. We illustrate the value of this by investigating the hydrogen bonding nature of a water dimer by constructing the hyperdimensional DFT (aug-cc-pVDZ basis set and X3LYP exchange correlation functionals) free energy space of a water dimer. Through this we are able to understand that the association mechanisms of a water pair is grounded in two processes that govern the interplay between the chain and cyclic hydrogen bonding configurations. First, the correlated librational motion of both water dipoles that describes the dynamic C I /C II configurations appears to be an innate molecular phenomenon that carries through to bulk water in the condensed phase. Second, there is a periodic interchange between the C I and C II configurations that is via the Cy configuration. In the case of hydrogen bond dissociation there appears to be an organized process that passes from the enthalpically stable dimer 6 kT configurational well into the random independent water dynamics via bifurcated hydrogen bonding. It is now possible to undertake ab initio hyperdimensional free energy investigations of molecular mechanisms through the interfacing of FEARCF with packages such as NWChem and so reduce the reliance on subjective parameterised QM models.