Design, Parameterization, and Implementation of Atomic Force Fields for Adsorption in Nanoporous Materials

Molecular simulations are an excellent tool to study adsorption and diffusion in nanoporous materials. Examples of nanoporous materials are zeolites, carbon nanotubes, clays, metal‐organic frameworks (MOFs), covalent organic frameworks (COFs) and zeolitic imidazolate frameworks (ZIFs). The molecular confinement these materials offer has been exploited in adsorption and catalysis for almost 50 years. Molecular simulations have provided understanding of the underlying shape selectivity, and adsorption and diffusion effects. Much of the reliability of the modeling predictions depends on the accuracy and transferability of the force field. However, flexibility and the chemical and structural diversity of MOFs add significant challenges for engineering force fields that are able to reproduce experimentally observed structural and dynamic properties. Recent developments in design, parameterization, and implementation of force fields for MOFs and zeolites are reviewed.


Introduction
Nanoporous materials are materials with pore shapes and sizes comparable to the typical size of small molecules. Selective adsorption is the process of exploiting this confinement using differences in molecular configurations. Zeolites are 3D, microporous, crystalline solids with well-defined structures. For MOFs, zeolite-specific force fields were replaced by more generic solutions such as DREIDING, [14] or UFF, [15] for example, to tackle the larger chemical diversity of MOFs. These approaches were very successful without the need for re-parameterization. [16][17][18][19] Over the years several challenges were identified. The first major force field challenge is to take the flexibility of the framework into account. [20] All MOFs exhibit some form of flexibility, [21][22][23] ranging from lattice vibrations around equilibrium positions to large-scale structural transformations upon external stimuli. An anomalous structural transformation due to temperature is negative thermal expansion. [24,25] Guest adsorption can induce phase transitions such as breathing. [26,27] Another recently discovered stimulus for breathing is the electric field. [28] A related interesting phenomenon is the gate-opening behavior observed in ZIF-8. These flexible behaviors can have important implications for MOF application prospects [29,30] by enabling, for instance, exceptional gas storage or separation performance that is not possible in traditional rigid materials. Soft vibrational modes (terahertz regime) determine large-scale framework flexibility. [31,32] The most difficult and important part of the force field to get right is the lower-frequency modes that are responsible for large-scale flexibility and structural rearrangement as a function of external stimuli. [20] The second major force field challenge is open metal sites, as found for example in Cu-BTC and MOF-74. Sorbent materials containing open metal sites are promising materials for the separation of olefin-paraffin mixtures. 
[33] Unfortunately, conventional force fields do not capture the olefin-metal interactions. Ideal adsorbed solution theory (IAST) [34] breaks down for olefin adsorption in open-metal-site materials due to non-ideal donor-acceptor interactions. [35] Here, we review the design, parameterization, and implementation of force fields that lie at the foundation of computing adsorption and diffusion in nanoporous materials. First, we briefly describe ab initio and classical simulation approaches. At the heart of classical simulation methodology lies the "force field," an equation describing the dependence of the energy of a system on the coordinates of its particles using chemical concepts like bonds, bends, torsions, and non-bonded terms. [36] The various intra-molecular and inter-molecular potentials that are used in the field of adsorption/diffusion in nanoporous materials are discussed, and the implementation of the energy, gradients, and second derivatives with respect to position and strain is detailed. Then we focus on the details of the long-range interactions, in particular the charge-charge interaction using the Ewald summation and the inclusion of bond dipoles, polarization using induced dipoles, MD/MC implementation details, and how to deal with a net charge in the system. The latter is also needed when decomposing the energy into framework, cation, and adsorbate contributions when the framework itself is net-charged but compensated by cations. Next, the theory of molecular mechanics is reviewed, which uses the implemented force fields to accurately compute zero-Kelvin properties like unit cell size and shape, crystal structure, elastic constants, and vibrational spectra. The ability to compute these properties quickly and accurately allows for a common force field parameterization workflow that optimizes the force field parameters to be as close to ab initio results as possible. 
We also discuss other parameterization approaches, like constructing empirical force fields based on reproducing experimental data, for example, fitting on adsorption isotherms and vapor-liquid experimental data, and fitting on thermal expansion and elasticity properties. We then review specific force fields that have been developed for flexible zeolites and MOFs, and end with an outlook that discusses the aspects involved in implementing force fields on graphics cards, designing and constructing a force field in terms of energy versus entropy and bottom-up versus top-down approaches, and the advantages and pitfalls of using machine learning approaches to implement force fields.

Ab Initio
The term "ab initio" is Latin for "from the beginning." The computational framework is derived from purely theoretical considerations with no inclusion of experimental data. [37][38][39][40][41][42] The most common type of ab initio calculation is called a Hartree-Fock (HF) calculation, an extension of molecular orbital theory, in which the correlated electron-electron repulsion is not specifically taken into account; only its average effect is included in the calculation. Most ab initio calculations make the Born-Oppenheimer approximation, which greatly simplifies the underlying Schrödinger equation by assuming that the nuclei remain fixed during the calculation. Present algorithms in computational chemistry can routinely calculate the properties of molecules that contain up to about 50 electrons with sufficient accuracy. Errors for energies can be less than a few kJ mol⁻¹. For geometries, bond lengths can be predicted within a few picometers and bond angles within 0.5 degrees. The treatment of larger molecules that contain hundreds of electrons is computationally tractable by approximate methods such as density functional theory (DFT).
In DFT, the total energy is expressed in terms of the total electron density, rather than the wave function. [43][44][45][46] In this type of calculation, there is an approximate Hamiltonian and an approximate expression for the total electron density. DFT methods can be accurate at little computational cost. DFT methods that do not include HF exchange can scale better than HF. Some methods combine the density functional exchange functional with the HF exchange term and are known as hybrid functional methods. Despite recent improvements, there are still difficulties in using DFT to accurately describe intermolecular interactions, especially van der Waals forces (dispersion). These cannot be reproduced at the HF level because they are a pure correlation effect. Exact DFT would include all correlation effects, including the dispersion force. However, exact DFT contains a functional of the electron density which is unknown, and most probably unknowable (except by solving the full Schrödinger equation). DFT is currently unable to accurately describe adsorption in nanoporous materials. Phase equilibria are extremely sensitive to the potential parameters of empirically determined classical force fields. Importantly, this also applies to first-principles parameters of the system. [47] The development of new DFT methods to overcome this problem is an active research topic. Current research directions include alterations to the functional, [48] the inclusion of additive terms, [49] the van der Waals density functional (VDW-DF), [50] and the use of maximally localized Wannier functions. [51] VDW-DF has been used to study CO₂ binding in ZIFs. [52] For reaching chemical accuracy for the energies of adsorption, a hybrid high-level (MP2) low-level (DFT + D) quantum chemical method is argued to be required. 
[53,54] Gradient-corrected density functionals with empirical dispersion corrections (PBE-D) as well as a nonlocal correlation functional (vdW-DF2) were found to reproduce available experimental adsorption enthalpies of alkanes in silicalite and HZSM-5 zeolite only in a limited fashion. [55] Regular GGAs do not describe dispersion-governed crystal types well. [56] Currently, dispersion-corrected DFT produces a more strongly associated liquid phase than found in experiment. [57][58][59] Fang et al. systematically compared several dispersion-corrected DFT methods for CO₂ in zeolites. [60] The semi-empirical method by Grimme [48] was found to give the best agreement with high-level quantum and experimental results. The other methods substantially overestimated the binding energies.
The non-local van der Waals density functional (vdW-DF) of Dion et al. [50] is a very promising scheme for the efficient treatment of dispersion-bonded systems. [61,62] The Strongly Constrained and Appropriately Normed (SCAN) semilocal density functional [63] is a major improvement over PBE (and much more so over LSDA), at nearly the same computational cost. rVV10 [64] is a simple revision of the VV10 nonlocal density functional by Vydrov and Van Voorhis [65] for dispersion interactions. Unlike the original functional, this modification allows the nonlocal correlation energy and its derivatives to be efficiently evaluated in a plane-wave framework. Peng et al. constructed a "best-of-both-worlds" van der Waals (vdW) density functional: SCAN + rVV10. [66] The resultant vdW density functional yielded excellent interlayer binding energies and spacings, as well as intralayer lattice constants, in 28 layered materials. Tran et al. tested a wide variety of nonlocal van der Waals functionals for solids. Only three functionals provided reasonably small errors for all properties (intralayer and interlayer lattice constants and interlayer binding energy) and for most solids: PBE + rVV10L, SCAN + rVV10, and rev-vdW-DF2. [67] In flexible MOFs, there is a delicate balance between dispersion and entropy that governs their transformation. [68] Quantum simulations of adsorption (binding energies) of adsorbates in nanoporous materials are relatively scarce. The reason is that adsorption is due to dispersion interactions and therefore a very high level of theory is needed. This significantly reduces the number of atoms that one can reliably include in the simulations. Therefore, most such studies compute binding energies of an adsorbate with a small, representative cluster. However, this is a severe approximation for nanoporous materials, where the molecule is usually (strongly) confined. 
It is more common to see quantum mechanics applied to the properties of the host material itself, for example, infra-red (IR) spectra, unit cell size, and elastic constants. [69][70][71] Moreover, quantum mechanical tools are essential to obtain reliable atomic information on the positions of framework atoms when experimental crystallographic data are missing or only partially known. However, to study adsorption in nanoporous materials at, for example, physiological conditions we must resort to classical methodology. Three main categories of classical simulation methodologies are Molecular Mechanics (MM), Molecular Dynamics (MD), and Monte Carlo (MC).

Molecular Mechanics (MM)
The basic ideas behind MM date back to the 1930s and 1940s. [72][73][74] MM assumes that matter consists of atoms and that for every set of positions of the atoms the potential energy can be defined. In 1950-1970 the advent of computers caused MM methods to grow in popularity at a rapid rate, and currently MM is one of the standard methods of structural chemistry. The MM model is defined in terms of an energy equation. The classical molecular energy U can be described as a Taylor expansion in bonds, bends, torsions, etc. [75,76] This expansion captures all the chemical entities we are used to, such as atoms, bonds, bond angles, and torsions, and physical properties like equilibrium structures, vibrational spectra, etc. On top of the terms in Equation (1) one can add ad hoc terms, such as hydrogen bonding, that are not adequately accounted for otherwise. Equation (1) is historically referred to as a force field. The name arose from the lowest-order approximation using only springs with force constants. The prototypical MM application is energy minimization, for example to obtain the best binding modes of two 3D molecules ("docking"). Some examples of its uses are: [77]
• To obtain reasonable input geometries for more computationally expensive types of calculations,
• To obtain good geometries (and perhaps energies) for small- to medium-sized molecules,
• To calculate the geometries and energies of very large molecules, usually polymeric biomolecules (proteins and nucleic acids),
• To generate the potential energy function under which molecules move, for MD or MC calculations,
• As a (usually quick) guide to the feasibility of, or likely outcome of, reactions in organic synthesis.
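As a toy illustration of the prototypical MM application, the sketch below minimizes the energy of a diatomic with a single harmonic bond by steepest descent. The force constant, reference length, and step size are illustrative placeholders, not taken from any published force field.

```python
# Toy class-I energy: one harmonic bond U = 0.5*k*(r - r0)^2 between two
# atoms on a line. All parameters are illustrative placeholders.
K_BOND = 500.0   # force constant
R0 = 1.5         # reference bond length

def energy(x):
    r = x[1] - x[0]              # bond length (atom 1 to the right of atom 0)
    return 0.5 * K_BOND * (r - R0) ** 2

def gradient(x):
    g = K_BOND * (x[1] - x[0] - R0)   # dU/dr
    return [-g, g]                    # dU/dx0, dU/dx1

def minimize(x, step=1e-3, tol=1e-8, max_iter=10000):
    """Steepest descent: step along -gradient until the forces vanish."""
    for _ in range(max_iter):
        g = gradient(x)
        if max(abs(gi) for gi in g) < tol:
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x
```

Starting from a stretched or compressed geometry, the minimizer relaxes the bond back to its reference length, which is the "best geometry" use case listed above.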

Molecular Dynamics (MD)
MD mimics nature by computing the positions r(t) and velocities v(t) of particles as a function of time t according to Newton's equation of motion f = ma, which relates the force f to the mass m times the acceleration a. [78] MD neglects zero-point motion and the quantization of vibrations. It is therefore not applicable to the very low temperature regime (at low temperature a path-integral formalism is required). [79] The first step in a numerical scheme to solve this equation is to rewrite it as two coupled first-order differential equations:

dr/dt = v,    dv/dt = f/m
The integration scheme is stable for sufficiently small time steps Δt. The permitted time step depends on the system. Stiff bonds require a smaller time step (like 0.5 fs), while for rigid molecules a larger time step can be used (like 2 fs). The main criterion to validate the chosen time step is the long-time conservation of the conserved quantity of the equations of motion. The main use of MD is to generate a thermodynamic ensemble. Newton's equation of motion corresponds to the NVE ensemble: constant number of particles N, constant volume V, and constant energy E.
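The two coupled first-order equations above can be integrated with the velocity-Verlet scheme, a common choice; the sketch below propagates a single harmonic bond in 1D and monitors the conserved total energy. All parameters (force constant, mass, time step) are illustrative.

```python
# Velocity-Verlet integration of one harmonic bond (two atoms, 1D):
# dr/dt = v, dv/dt = f/m. Parameters are illustrative placeholders.
K = 100.0   # bond force constant
R0 = 1.0    # equilibrium bond length
M = 1.0     # particle mass
DT = 0.005  # time step (small, since the bond is stiff)

def forces(x):
    g = K * (x[1] - x[0] - R0)   # dU/dr for U = 0.5*k*(r - r0)^2
    return [g, -g]               # f = -dU/dx

def total_energy(x, v):
    pot = 0.5 * K * (x[1] - x[0] - R0) ** 2
    kin = 0.5 * M * sum(vi * vi for vi in v)
    return pot + kin

def velocity_verlet(x, v, steps):
    f = forces(x)
    for _ in range(steps):
        v = [vi + 0.5 * DT * fi / M for vi, fi in zip(v, f)]  # half kick
        x = [xi + DT * vi for xi, vi in zip(x, v)]            # drift
        f = forces(x)
        v = [vi + 0.5 * DT * fi / M for vi, fi in zip(v, f)]  # half kick
    return x, v
```

The long-time conservation of the total energy is exactly the time-step validation criterion mentioned in the text: doubling DT until the energy drifts reveals the largest permissible step.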
Other ensembles can be generated by extending the equations of motion with thermo- and barostats. [80,81] MD can be applied to comparatively large systems for relatively long times compared to QM. In contrast to MC, MD can be used to compute dynamic properties (time correlation functions and transport properties). Practical aspects of MD are reviewed in refs. [36,82,83] and further details on the algorithms and computational aspects can be found in refs. [78,84-87].

Monte Carlo (MC)
Adv. Theory Simul. 2019, 2, 1900135 (4 of 62)
In the Markov Chain Monte Carlo (MCMC) method [88,89] the system evolves from state to state by applying "MC moves" that modify the current state, denoted the "old" state, to generate a "new" state. The "acceptance rule" for the MC moves guarantees that all states of the Markov chain are visited with the appropriate probability. Hence, no weighting is necessary, and averages can be computed as averages over the chain. For an infinite Markov chain the expression is exact. The MC scheme makes use of the fact that only the relative probability of visiting points in configuration space is needed, not the absolute probability. Common moves are to translate and/or rotate a molecule. Such an attempt can be "accepted" or "rejected" by the acceptance rule, which means that the state has changed or that the new state is simply equal to the old one, respectively. Note that in MC, for particle moves (like translation and rotation), the energy difference used in the MC moves has to be computed only for the atoms that move. Typical MC system moves are volume changes and MC/MD hybrid moves, which are much more expensive since the energy has to be recomputed for the entire system. Advantages of MC are that (for most moves) only energy evaluations are required and that different ensembles are easily handled. Examples are the ensembles used to compute adsorption, the Gibbs ensemble and the grand-canonical ensemble, where the chemical potential is imposed and the number of particles of the adsorbed phase fluctuates. Downsides of MC are that it is considerably harder to apply to chemically complex molecules than MD, and that highly correlated movements are hard to simulate. Also, only static properties can be computed in MC, because there is no time involved in an MC move. Time does not enter the partition function (i.e., no velocities, only positions) and therefore the involved units are energy and distance (not time). 
However, it is possible to artificially introduce time using schemes like "kinetic MC." Practical aspects of MC are reviewed in ref. [90] and further details on the algorithms and computational aspects can be found in refs. [78,84,91-94].
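A minimal sketch of the Metropolis accept/reject rule, for a single particle in a 1D harmonic well (all parameters illustrative). Note that only the energy difference of the moved particle enters the acceptance rule, and that a rejected move means the old state is counted again:

```python
import math
import random

# Metropolis MC for one particle in a 1D harmonic well U(x) = 0.5*k*x^2.
# BETA and K are illustrative placeholders.
BETA = 1.0   # 1/kT
K = 4.0      # well stiffness

def energy(x):
    return 0.5 * K * x * x

def mc_run(steps, max_displ=0.5, seed=42):
    rng = random.Random(seed)
    x, accepted, sum_x2 = 0.0, 0, 0.0
    for _ in range(steps):
        x_new = x + rng.uniform(-max_displ, max_displ)   # trial translation
        du = energy(x_new) - energy(x)                   # only the moved particle
        if du <= 0.0 or rng.random() < math.exp(-BETA * du):
            x, accepted = x_new, accepted + 1            # accept the move
        sum_x2 += x * x          # rejected moves re-count the old state
    return sum_x2 / steps, accepted / steps
```

For this well the exact ensemble average is ⟨x²⟩ = kT/k = 1/(BETA·K), so the chain average can be checked directly against the analytical result.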

Functional Form
The energy equation, Equation (1), is the central workhorse in MM, MD, and MC. The majority of the computational time will be spent here, in computing energies and energy derivatives (e.g., the forces). The term "force field" refers to the functional form and parameter sets of the energy; that is, a force field consists of two parts:
1. An analytical expression for the inter-atomic potential energy U(r) as a function of the atomic coordinates r.
2. Definitions and parameters for "atom types." Chemical elements are classified based on bonding and environment (e.g., carbon: sp³, sp², sp, aromatic, carbonyl). Parameters are assigned based on the atom types involved (the C-C bond length and force constant are different for sp³-sp³ vs sp²-sp² bonds).
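The atom-type idea can be sketched as a parameter lookup keyed on type pairs rather than on elements; the type names and numeric values below are placeholders, not taken from any published force field.

```python
# Sketch of atom typing: bond parameters are looked up per atom-type pair,
# not per element, so sp3 and sp2 carbons get different bonds.
# All values are illustrative placeholders.
BOND_PARAMS = {
    ("C_sp3", "C_sp3"): {"r0": 1.53, "k": 2500.0},   # single bond
    ("C_sp2", "C_sp2"): {"r0": 1.34, "k": 4100.0},   # double bond: shorter, stiffer
}

def bond_params(type_i, type_j):
    """Order-independent lookup of bond parameters for an atom-type pair."""
    key = tuple(sorted((type_i, type_j)))
    try:
        return BOND_PARAMS[key]
    except KeyError:
        raise KeyError(f"no bond parameters for pair {key}")
```

Sorting the pair makes the lookup symmetric in the two atoms, so a bond list never has to store both (i, j) and (j, i) orderings.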
The forces are given by the gradients of the potential energy with respect to the internal coordinates of the molecule. Several types of force fields can be distinguished:
• Rule-based force fields: force fields with generally broad applicability across the periodic table. This type of force field uses generic rules to generate parameters for a range of interactions from a small set of atomic-based parameters (in contrast to using a set of explicit parameters determined for each type of interaction). The rule-based strategy provides a systematic and extensible way to develop a force field.
• First-generation, class I force fields: harmonic functions for stretch and bend, no cross-terms, Lennard-Jones for dispersion interactions. Class I force fields are also called "harmonic" or "diagonal" and are common for large systems like proteins or DNA because of their computational efficiency. It is believed that class I force fields are sufficiently accurate for large biomolecules near room temperature.
• Second-generation, class II force fields: class II force fields add cross-terms, cubic or quartic expansions of stretch and bend, and exponential-type potentials for dispersion interactions, all in an effort to increase the accuracy of the potential energies, the structure, and especially the vibrational frequencies of the system. Because of the additional complexity and computational expense, class II force fields are intended for materials science and small to medium molecules.
• Third-generation, class III force fields: class III force fields include hyperconjugation and polarization. [95][96][97] Polarization is included using either induced dipoles (AMOEBA, SIBFA, NEMO, and others), the Drude model (polarizable CHARMM), or fluctuating charges. Table 1 lists a selection of common force fields.
There are many advantages to the classical approach. Force field-based simulations can handle large systems, and are several orders of magnitude faster (and cheaper) than quantum-based calculations. Moreover, the energy contributions can be analyzed at the level of individual classes of interactions, and the energy expression can be modified to bias the computations. However, classical approaches have difficulties with electronic transitions (photon absorption), electron transport phenomena, proton transfer (acid/base reactions), and chemical reactions. Therefore vibrational properties like intensities cannot be described by classical MD, and it is not possible to compute electronic (band structure, eDOS) or magnetic properties (NMR spectra). A recent approach to handle chemical reactions in classical simulations is the reactive force field (ReaxFF), [161] which is bond-order based. The following characteristics of force fields are a trade-off:
• Accuracy (how accurately are the properties of interest computed),
• Transferability (prediction of properties or systems outside of the fitting set),
• Computational efficiency.

Derivatives
Potential energy functions and their derivatives are an essential ingredient of molecular simulations. For MC methodology the potential energy function U suffices, but MD requires forces f and hence first derivatives of the potential energy function, that is, f = −∂U/∂r. Computing normal modes requires the Hessian matrix, a matrix of second derivatives with respect to position. The Hessian matrix is particularly important, not only for geometry optimization, but also for the characterization of stationary points as minima, transition states, or hilltops, and for the calculation of IR spectra. [77] A generalized Hessian matrix can be formed by including strain derivatives such that cell shape and volume changes are also incorporated. Elastic constants at zero Kelvin can be computed directly at machine precision from the generalized Hessian (see Section 5.1), and the Hessian is also of great use in efficient minimization methodology [162,163] (see Section 5.3). Expressions of first and second derivatives for stretching, linear, nonlinear, and out-of-plane bending and torsional motion have appeared in many articles. 
[164-178]

Table 1. A selection of common force fields.

Acronym | Full name | Class | Typical applications
AMBER | Assisted Model Building with Energy Refinement | I | proteins and nucleic acids [98-105]
AMB/OPLS | AMBER/OPLS | I | proteins, [106] all-atom hydrocarbons, [107] organic liquids, [108] carbohydrates [109]
AMOEBA | Atomic Multipole Optimized Energetics for Biomolecular Applications | III | nucleic acids [110]
CFF91 | Consistent Force Field | II | hydrocarbons, proteins, protein-ligand interactions [111]
CFF93 | Consistent Force Field | II | alkyl groups and alkanes [112]
CFF95 | Consistent Force Field | II | general-purpose [113,114]
CHARMM | Chemistry at HARvard Molecular Mechanics | I | biochemistry, proteins, nucleic acids, organic molecules, metal ions [115-118]
COMPASS | Condensed-phase Optimized Molecular Potentials for Atomistic Simulation Studies | II | small molecules and polymers [119,120]
CVFF | Consistent-Valence Force Field | I | biochemistry, peptides and proteins [121]
DREIDING | Generic force field | rule | general-purpose [14]
DRF90 | Direct Reaction Field | III | small (solvent) molecules [122]
ECEPP/5 | Empirical Conformational Energy Program for Peptides | I | peptides, proteins, and organic molecules [123]
ESFF | Extensible Systematic Force Field | rule | organic, inorganic, and organometallic systems [124]
GAFF | Generalized Amber Force Field | I | general-purpose [125]
GROMOS | GROningen MOlecular Simulation | I | biochemistry, proteins, lipids, nucleic acids, polymers, water-protein systems, [126] oligosaccharides, [127] biomolecules in explicit water [128]
INTERFACE-PCFF | Thermodynamically consistent CFF | II | inorganic, organic, and biological nanostructures [129]
OPLS | Optimized Potentials for Liquid Simulations | I | proteins, nucleic acids, liquid hydrocarbons, [130] amides and peptides, [131] liquid sulfur compounds, [132] liquid alcohols [133]
PCFF | Polymer Consistent Force Field | II | organic polymers, (inorganic) metals, and zeolites [134]
PIPF | Polarizable Intermolecular Potential Function | III | liquid amides and alkanes [135]
MM2 | Molecular Mechanics | II | hydrocarbons, [136] hydrogen bonding, [137] peptides [138]
MM3 | Molecular Mechanics | II | amides, polypeptides and proteins, [139] hydrogen bonding [140]
MM4 | Molecular Mechanics | II | saturated hydrocarbons, [141,142] conjugated hydrocarbons, [143] alkenes, [144,145] sulfides and mercaptans [146]
MMFF94 | Merck Molecular Force Field | II | general-purpose, all organic molecules for drug design [147-150]
NEMO | Non-Empirical Molecular Orbital | III | small molecules and systems [151]
SIBFA | Sum of Interactions Between Fragments Ab initio computed | III | small molecules and flexible proteins [152]
TraPPE | TRAnsferable Potentials for Phase Equilibria | I | small molecules [153-159]
UFF | Universal Force Field | rule | general-purpose [15]
X-Pol | Explicit POLarization | III | fluid systems [160]

Both the gradient and the Hessian can be evaluated in O(N²) steps. [175] Third-order derivatives are used in free-energy minimization. [162] By combining free-energy minimization with empirical fitting, it is possible to determine interatomic potentials with a correct treatment of thermal effects and the zero-point energy. [179] In free-energy minimization the derivatives of the phonon frequencies are needed, which require the phased third derivatives of the energy with respect to either three Cartesian coordinates, or two Cartesian coordinates and one strain, for internal and external variables, respectively. [162] Higher-order derivatives are also required for the analysis of thermal expansion at zero Kelvin. A harmonic approach does not suffice in this case because there would be no lattice expansion due to the symmetrical displacement of atoms about their equilibrium positions. Note that the energy and derivative computation can be parallelized with nearly 100% efficiency for internal coordinates. [180]

The notational convention in this review is that Greek subscripts α, β, γ, δ ∈ {x, y, z} denote the vector components, Latin subscripts i, j, k, l denote atom indices, r_i denotes the position of atom i, and ε denotes the Lagrangian strain tensor.
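The role of the Hessian in characterizing stationary points can be illustrated on a toy system: a finite-difference Hessian of a 1D harmonic bond, whose eigenvalues separate the zero (rigid translation) mode from the positive stretch mode. Parameters and the numerical step h are illustrative.

```python
import math

# Finite-difference Hessian of a 1D harmonic bond between two atoms.
# K and R0 are illustrative placeholders.
K, R0 = 80.0, 1.1

def energy(x):
    return 0.5 * K * (x[1] - x[0] - R0) ** 2

def hessian(x, h=1e-4):
    """Central-difference second derivatives d2U/(dx_p dx_q)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            xpp = list(x); xpp[p] += h; xpp[q] += h
            xpm = list(x); xpm[p] += h; xpm[q] -= h
            xmp = list(x); xmp[p] -= h; xmp[q] += h
            xmm = list(x); xmm[p] -= h; xmm[q] -= h
            H[p][q] = (energy(xpp) - energy(xpm)
                       - energy(xmp) + energy(xmm)) / (4.0 * h * h)
    return H

def eig2x2(H):
    """Eigenvalues of a symmetric 2x2 matrix, in ascending order."""
    a, b, d = H[0][0], H[0][1], H[1][1]
    s = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return (a + d) / 2.0 - s, (a + d) / 2.0 + s
```

At the equilibrium geometry the spectrum is {0, 2K}: one zero translational mode and one positive curvature, confirming a minimum rather than a transition state.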
Derivatives of a function f(y), where y is a function of x₁, x₂, …, are given by application of the chain rule:

∂f/∂x_p = (∂f/∂y)(∂y/∂x_p)
∂²f/(∂x_p ∂x_q) = (∂²f/∂y²)(∂y/∂x_p)(∂y/∂x_q) + (∂f/∂y) ∂²y/(∂x_p ∂x_q)

Here, f could be a potential energy function U(r_ij) that depends on the distance r_ij, which in turn depends on the atomic positions r_i and r_j of the two atoms i and j involved; or a potential energy function U(θ_ijk) that depends on the bend angle θ_ijk, which in turn depends on the atomic positions r_i, r_j, and r_k of the three atoms i, j, k involved, etc. Hence, using the chain rule we can split the derivative into a derivative of the potential energy functional form with respect to r and/or θ, and the derivative of these with respect to the atomic positions. [170] Computing the gradient ∇_{r_i} U(r_ij) of a pair potential U(r_ij) requires the chain rule

∇_{r_i} U(r_ij) = (∂U/∂r_ij)(∂r_ij/∂r_i)

Hence we need derivatives of the distance r_ij = |r_ij|. We will encounter ∂r_ij/∂r_q^α and ∂²r_ij/(∂r_p^α ∂r_q^β) in first and second derivatives, respectively. We find [176]

∂r_ij/∂r_q^α = (δ_qi − δ_qj) r_ij^α / r_ij
∂²r_ij/(∂r_p^α ∂r_q^β) = (δ_pi − δ_pj)(δ_qi − δ_qj)(δ_αβ − r_ij^α r_ij^β / r_ij²)/r_ij

where δ is the Kronecker delta. As an example, consider the harmonic bond potential U(r) = ½ k (r − r₀)². The first derivative is given by

∇_{r_i} U = k (r_ij − r₀) r_ij / r_ij

For strain derivatives we first define the metric tensor by [181,182]

M = h h̄⁻¹

where the matrix h = {a, b, c} is composed of the cell vectors a, b, c of the simulation cell and h̄ refers to h in the reference state (usually taken as the zero-strain configuration). Using M, a homogeneous deformation (i.e., strain) can be applied by relating the position of an atom to its original position in the reference state of the body:

r_i = M r̄_i

The strain tensor can be expressed in terms of the metric tensor as [181,182]

ε = ½ (M‡ M − 1)

where ‡ denotes the transpose of the tensor. Derivatives of any general thermodynamic variable A that is an explicit function of M can then be obtained by applying the chain rule through ε. [181,182] An overview of popular functional forms for atomic potentials can be found in ref. [162]. 
In the next sections, we highlight the first and second derivatives of the most common potentials used in zeolite and MOF work. These expressions are taken from the literature but written down using a uniform notation (and hence changed from the original notation). An implementation of the analytical expressions, as well as their numerical validation, can be found in the Supporting Information.
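In the same spirit as that numerical validation, a minimal sketch: the chain-rule gradient of a pair potential, ∇_{r_i} U = (dU/dr)(r_ij/r), checked against central finite differences for a harmonic bond (illustrative parameters).

```python
import math

# Chain-rule gradient of a pair potential U(r_ij) with respect to r_i,
# validated against finite differences. K and R0 are illustrative.
K, R0 = 120.0, 1.4

def u_of_r(r):
    return 0.5 * K * (r - R0) ** 2   # harmonic bond

def du_dr(r):
    return K * (r - R0)

def dist(ri, rj):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ri, rj)))

def grad_i(ri, rj):
    """Analytic gradient: (dU/dr) * (r_i - r_j)/r."""
    r = dist(ri, rj)
    return [du_dr(r) * (a - b) / r for a, b in zip(ri, rj)]

def grad_i_fd(ri, rj, h=1e-6):
    """Central finite-difference gradient, component by component."""
    g = []
    for a in range(3):
        rp = list(ri); rp[a] += h
        rm = list(ri); rm[a] -= h
        g.append((u_of_r(dist(rp, rj)) - u_of_r(dist(rm, rj))) / (2 * h))
    return g
```

Swapping `u_of_r`/`du_dr` for any other radial form (Morse, Buckingham, etc.) leaves the chain-rule machinery unchanged, which is exactly the separation of functional form and geometry derivatives used below.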

Bond-Stretching Potentials
Covalent forces are complicated in nature, as the electronic structure changes dramatically upon bond formation. The strength of covalent bonds ranges from about 200 to 800 kJ mol⁻¹, and the force operates over very short distances of the order of 1-2 Å. [183] MOFs consist of a coordinating metal atom (or cluster of atoms) with one or more electron-rich ligands attached to it by so-called "coordination bonds." These bonds are weaker (between 50 and 200 kJ mol⁻¹) and the force is directional, acting over a relevant distance of 1.5-2.5 Å. [184] In a coarse-grained approach, bonding can be modeled by bond potentials that describe the change in energy as the bond stretches and compresses. At high temperatures the asymmetry is apparent, as it is harder to compress a bond than it is to stretch it. The bond force constants are very high and motion is usually restricted to the harmonic well of the potential.
The bond vector r_ij can be defined based on the positions of the atoms, r_ij = r_i − r_j (see Figure 1), and points from r_j toward r_i. The distance r_ij between atoms i and j is |r_ij| = √(r_ij ⋅ r_ij). The most common functional form for the bond potential U_b is the expansion

U_b(r) = k₁(r − r₀)² + k₂(r − r₀)³ + k₃(r − r₀)⁴ + …

where r is the current bond length, r₀ the reference bond length, and k₁, k₂, k₃ are the bond force constants. Often the expansion is truncated after the first term. The "Morse" potential is anharmonic and often provides a much better description of the potential:

U(r) = D[1 − e^(−α(r−r₀))]²

Expanding around the equilibrium value leads to

U(r) ≈ Dα²(r − r₀)²[1 − α(r − r₀) + (7/12)α²(r − r₀)² − …]

The first term is the harmonic potential (with k = 2Dα²), and for organic structures where distortions from equilibrium are small, the difference between the potentials is small. For larger deviations the Morse potential provides a significantly better description. The Morse potential provides a restoring force which goes to zero at long distances. For minimization starting far from equilibrium this could result in non-convergence. Some force fields solved this problem by using modifications of Hooke's law. MM2 added a cubic term, making the bond anharmonic. However, this leads to large negative energies for poor initial geometries with large distortions. MM3 added a quartic term to solve this. Note that the 7/12 terms in the MM2/MM3 functional forms originate from the Taylor expansion of the Morse potential; the cubic and quartic terms are chosen to mimic the Morse potential for moderate distortions. Some common bond potentials are listed in Table 2.

Table 2. Common bond potential functional forms.

Force field(s) | Functional form
TraPPE [153] | fixed bond lengths
OPLS [106], UFF [15] | U(r_ij) = ½k(r_ij − r₀)²
AMBER [98], GAFF [125], CHARMM [116] | U(r_ij) = k(r_ij − r₀)²
MM3 [139], AMOEBA [110] | quartic expansion of the Morse potential
CVFF [121], UFF [15] | Morse, U(r_ij) = D[1 − e^(−α(r_ij−r₀))]²
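A small sketch comparing the Morse potential with its harmonic approximation k = 2Dα² (all parameters illustrative): the two agree near r₀ and diverge strongly far from it, and the Morse energy saturates at the dissociation energy D, which is why its restoring force vanishes at long distances.

```python
import math

# Morse potential vs. its harmonic approximation.
# D (well depth), A (width) and R0 are illustrative placeholders.
D, A, R0 = 400.0, 2.0, 1.5
K = 2.0 * D * A * A   # harmonic force constant from the Taylor expansion

def morse(r):
    return D * (1.0 - math.exp(-A * (r - R0))) ** 2

def harmonic(r):
    return 0.5 * K * (r - R0) ** 2
```

Near r₀ the relative difference is of order α(r − r₀); at large stretch the harmonic energy grows without bound while the Morse energy levels off below D.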
It is convenient to separate the bond potential functional form from the other derivatives so that a more general framework can be used. We therefore define the bond potential functional derivatives as

f₁ = (1/u) ∂U/∂u,    f₂ = (1/u) ∂f₁/∂u

We define u = r_ij = r_i − r_j and u = |u| = √(u ⋅ u). The derivatives of U(r) with respect to the Cartesian coordinates are given by [164]

∂U/∂r_i^α = f₁ u^α,    ∂U/∂r_j^α = −f₁ u^α
∂²U/(∂r_i^α ∂r_i^β) = ∂²U/(∂r_j^α ∂r_j^β) = f₂ u^α u^β + f₁ δ_αβ
∂²U/(∂r_i^α ∂r_j^β) = −(f₂ u^α u^β + f₁ δ_αβ)

The derivatives of U(r) with respect to strain are derived by Ray [185] and Lutsko [181]; the first strain derivative is

∂U/∂ε_αβ = f₁ u^α u^β

and the second strain derivative contains, besides the term f₂ u^α u^β u^γ u^δ, explicit terms proportional to f₁. These explicit strain derivative terms arise from the action of the strain derivative on the explicit cell dependence of the microscopic stress. [181]

Table 3. Common Urey-Bradley potential functional forms.
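The strain derivative of a bond potential, ∂U/∂ε_αβ = f₁ u^α u^β with f₁ = (1/u) ∂U/∂u, can be checked numerically by applying a small homogeneous (linear) deformation to the bond vector; a sketch with illustrative parameters, valid to first order in the deformation:

```python
import math

# First strain derivative of a harmonic bond potential, checked by
# deforming the bond vector: u'_a = u_a + s*u_b for component pair (a, b).
# K and R0 are illustrative placeholders.
K, R0 = 90.0, 1.2

def u_pot(d):
    return 0.5 * K * (d - R0) ** 2

def du_pot(d):
    return K * (d - R0)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def strain_deriv_analytic(u, a, b):
    """dU/d(eps_ab) = f1 * u_a * u_b, with f1 = (1/d) dU/dd."""
    d = norm(u)
    return (du_pot(d) / d) * u[a] * u[b]

def strain_deriv_fd(u, a, b, h=1e-6):
    """Finite difference over a small linear deformation of the bond."""
    def deformed(s):
        v = list(u)
        v[a] += s * u[b]   # homogeneous deformation applied to the bond vector
        return v
    return (u_pot(norm(deformed(h))) - u_pot(norm(deformed(-h)))) / (2 * h)
```

The diagonal components (a = b) correspond to uniaxial stretch of the cell, the off-diagonal ones to shear; both reduce to the same f₁ u_a u_b expression.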

Urey-Bradley
The Urey-Bradley term is a special case of cross-terms and consists of a harmonic potential as a function of the distance between the (non-bonded) atoms i and k of an i-j-k angle (i.e., as if there exists an extra bond stretching term between atoms i and k). This Urey-Bradley term is coupled with the i-j and j-k bond stretching terms and the i-j-k angle bending term through basic trigonometric relationships, and is a computationally elegant way of reproducing bond-bond coupling effects in vibrational spectra. However, compared to the more conventional cross-terms, it is poorly transferable to various combinations of three atoms with different reference values for the bonds or angle. Some common Urey-Bradley potentials are listed in Table 3. Derivatives are the same as for bond-potentials.
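In code, the Urey-Bradley term is simply a harmonic potential in the 1-3 distance of the i-j-k angle. A minimal sketch, with made-up k_UB and s_0 values:

```python
import math

def urey_bradley(ri, rj, rk, k_ub=300.0, s0=2.4):
    """Urey-Bradley term: harmonic in the 1-3 distance |ri - rk| of an
    i-j-k angle (illustrative k_ub and s0). Note that the apex atom rj
    does not enter the expression at all."""
    s = math.dist(ri, rk)
    return 0.5 * k_ub * (s - s0) ** 2
```

Because only the i-k distance appears, the derivatives are identical in form to those of a bond potential, as noted in the text.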

Bond Bending Potentials
We define u = r_i − r_j and v = r_k − r_j. The bend angle θ_ijk is the angle between the bond vectors u and v (see Figure 1),

cos θ = (u ⋅ v)/(|u||v|)

To describe bond bending between three atoms one can use the harmonic potential

𝒰(θ) = ½k(θ − θ_0)²

Angles are much softer than bonds, especially in zeolites, where a Si-O-Si angle ranges between 135 and 180 degrees. [186] Also in three- or four-membered rings (cyclopropane, cyclobutane) large deviations from the usual bond angles are observed. A problem with all polynomial representations of angles is that an angle of 180 degrees results in a singular point (unless the reference angle is 180 degrees). The case of 0 degrees is not possible due to the repulsion of the i and k atoms in the i-j-k bend. The singularity is due to a factor 1/sin θ in the force expression. A common solution is to use a trigonometric function, [187] for example

𝒰(θ) = ½ [k/sin²θ_0] (cos θ − cos θ_0)²

Note that close to the maximum this potential has no restoring force, but for small distortions this is not a problem. The MM force fields use higher-order terms; a sixth-power term was needed to describe the highly bent bicyclo[1.1.1]pentane. Cubic and higher terms become desirable when the bending distortions are large. Some common bend angle potentials are listed in Table 4.

Table 4. Common bend angle potential functional forms.
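A minimal sketch of a trigonometric bend potential of the type discussed above, evaluated directly from cos θ so that θ = 180° causes no numerical trouble. The force constant and reference angle are illustrative Si-O-Si-like values, not fitted parameters.

```python
import math

def cosine_bend(ri, rj, rk, k=500.0, theta0=math.radians(155.0)):
    """Harmonic-cosine bend U = 0.5*(k/sin(theta0)**2)*(cos(theta) - cos(theta0))**2.
    Works directly with cos(theta), so a linear i-j-k geometry (theta = 180 deg)
    is not a singular point. k and theta0 are illustrative values."""
    u = [a - b for a, b in zip(ri, rj)]
    v = [a - b for a, b in zip(rk, rj)]
    nu = math.sqrt(sum(c * c for c in u))
    nv = math.sqrt(sum(c * c for c in v))
    cos_t = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return 0.5 * (k / math.sin(theta0) ** 2) * (cos_t - math.cos(theta0)) ** 2
```

The 1/sin²θ_0 prefactor makes the curvature at θ_0 match that of the harmonic potential ½k(θ − θ_0)² for small distortions.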
It is convenient to separate the bend potential functional form from the other derivatives so that a more general framework can be used. The derivative of the potential energy with respect to the atomic positions depends on the functional form of the potential energy function and can be obtained in general using the chain rule [164]

∂𝒰(θ)/∂r = [d𝒰(θ)/d cos θ] (∂cos θ/∂r)

with

∂cos θ/∂r_i = (1/|u|)(v/|v| − cos θ u/|u|),    ∂cos θ/∂r_k = (1/|v|)(u/|u| − cos θ v/|v|),    ∂cos θ/∂r_j = −(∂cos θ/∂r_i + ∂cos θ/∂r_k)

The second derivatives follow from a further application of the chain rule and involve the second derivatives ∂²cos θ/∂r∂r. The strain derivatives are given in ref. [188]. The explicit strain derivative terms appearing in the last term in Equation (57) arise from the action of the strain derivative on the explicit cell dependence of the microscopic stress. [181] The formulae for the strain derivatives are more conveniently written down in terms of the natural log of the cosine of the angle [188]

∂cos θ = cos θ ∂ln cos θ    (58)
∂²cos θ = cos θ (∂²ln cos θ + ∂ln cos θ ∂ln cos θ)

The cross-derivative of the strain derivative with respect to position can be expressed in terms of Hessian elements.

Torsion Potentials
We define u = r_ij, v = r_jk, and w = r_kl, with atom indices as defined in Figure 2. The structure of a molecule is largely determined by energy barriers to rotation about chemical bonds. The torsion energy varies with the dihedral angle φ. The dihedral angle is given by

cos φ = (m ⋅ n)/(|m||n|)

where the following substitutions were made: m = u × v and n = v × w. The sign of φ is taken as the sign of v ⋅ (m × n). [170]

Figure 2. The dihedral angle defined by the angle between the planes formed by atoms i-j-k and j-k-l. A positive dihedral angle (left) and a negative dihedral angle (right) are shown.

Table 5. Common torsion potential functional forms.
The force constants for fourfold and higher terms are small. It is unclear whether these are essential to include; perhaps the van der Waals and dipole interactions already account for these effects. Torsions are even softer than bond angles. All possible φ-values can be found in structures. Therefore, the energy function must be valid over the entire range, must be periodic, and must have stationary points at 0 and 180° (for reasons of symmetry). The periodicity is the number of minima of the potential, usually 3 for an sp³-sp³ bond and 2 for a conjugated bond. The definition of a torsion includes two central and two terminal atoms. The term "torsional" means an internal rigid rotation, and "dihedral" means a rotation of two vicinal bonds about a middle bond. In general, bonds and bend angles are high-frequency modes and generally not too important for thermodynamic quantities. Torsions, however, are very important: they determine the overall structure of a molecule, and improperly calibrated torsions can, for example, cause overstabilization of helices. Some common torsion potentials are listed in Table 5.
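The dihedral-angle construction of the previous paragraphs can be sketched as follows; the vector definitions used here are one common choice (spelled out in the comments), and the overall sign convention depends on that choice.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dihedral(ri, rj, rk, rl):
    """Dihedral angle (radians) between the planes i-j-k and j-k-l.
    Here u = rj - ri, v = rk - rj, w = rl - rk (one common choice);
    m = u x v and n = v x w are the plane normals, and the sign follows
    sign(phi) = sign(v . (m x n))."""
    u = tuple(a - b for a, b in zip(rj, ri))
    v = tuple(a - b for a, b in zip(rk, rj))
    w = tuple(a - b for a, b in zip(rl, rk))
    m, n = cross(u, v), cross(v, w)
    cos_phi = dot(m, n) / math.sqrt(dot(m, m) * dot(n, n))
    phi = math.acos(max(-1.0, min(1.0, cos_phi)))   # clamp against round-off
    return math.copysign(phi, dot(v, cross(m, n)))
```

With this convention a planar trans arrangement gives |φ| = 180° and a planar cis arrangement gives φ = 0.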
A common functional form, for example for alkanes, is the Ryckaert-Bellemans potential [189]

𝒰(ψ) = Σ_{n=0}^{5} C_n cosⁿ ψ

Note that ψ = φ − π is defined according to the polymer convention ψ(trans) = 0. Functional forms for torsion potentials can often be converted into one another using trigonometric identities like cos 2x = −1 + 2cos²x and cos 3x = −3cos x + 4cos³x.
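A sketch of the Ryckaert-Bellemans form; the coefficients below are illustrative butane-like values in kJ mol⁻¹ (the actual fitted values are in ref. [189]).

```python
import math

# Illustrative RB coefficients (kJ/mol), butane-like; see ref. [189] for fits
C = [9.28, 12.16, -13.12, -3.06, 26.24, -31.5]

def ryckaert_bellemans(phi):
    """U(psi) = sum_n C_n cos(psi)**n with psi = phi - pi, so that the
    polymer convention psi(trans) = 0 holds."""
    cos_psi = math.cos(phi - math.pi)
    return sum(c * cos_psi ** n for n, c in enumerate(C))
```

With these coefficients the trans minimum (φ = 180°, ψ = 0) has zero energy and the cis configuration sits on a high barrier, as expected for an alkane backbone torsion.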
It is convenient to separate the torsion potential functional form from the other derivatives so that a more general framework can be used. Using the chain rule we can split the derivative of 𝒰(φ_ijkl) into a derivative of 𝒰 with respect to the dihedral angle and the derivative of the dihedral angle with respect to the atomic positions [170]

∂𝒰(φ)/∂r = [d𝒰(φ)/dφ] (∂φ/∂r)

For the first derivatives it is convenient to define vectors R and S [171] such that only dot products of vectors are involved. [169,170] The second derivatives follow from a further application of the chain rule. [87] The strain derivatives are given in ref. [188]; the explicit strain derivative terms appearing in the last term in Equation (119) arise from the action of the strain derivative on the explicit cell dependence of the microscopic stress. [181] The formulae for the strain derivatives of the torsion are more conveniently written down in terms of the natural logarithm of the square of the cosine of the dihedral angle [188]

∂cos φ = ½ cos φ ∂ln cos²φ    (120)
∂²cos φ = ½ cos φ (∂²ln cos²φ + ½ ∂ln cos²φ ∂ln cos²φ)

The first and second derivatives with respect to strain are given in ref. [188]. The cross-derivative of the strain derivative with respect to position can again be expressed in terms of Hessian elements.

Table 6. Common out-of-plane potential functional forms.
CVFF [121], CFF91 [111], CHARMM [116]
The middle term in Equation (134) leads to a singular 1/sin φ term when φ = 0 or φ = π. This is a problem for torsional expressions, such as those of AMBER/GAFF and CHARMM, that have phase angles φ_0 other than 0 or π. Therefore the equivalent chain-rule Equation (135) is more convenient. [170]

Improper Torsions and Out-of-Plane Potentials
Common planar molecules that contain a double bond or sp² hybridization form planar groups with trigonal centers, for example, peptide bonds, the carbon centers in benzene, and the carbon and nitrogen centers in formamide. This mode of motion is different from bond stretching, bending, and internal rotation, and a dedicated term is needed to control planarity. The associated harmonic potential is

𝒰(χ) = ½k χ²

with χ the out-of-plane angle. Two common definitions in use are: i) the average angle between any bond that extends from the central atom and the plane defined by the other two bonds, and ii) the distance of the central atom from the plane defined by the other three atoms (pyramid height). The out-of-plane potential can also be used for non-planar structures, for example in chiral centers to avoid inversion. Another example of its use is coordination complexes, where the plane of the ligands need no longer be defined exactly; in square planar complexes it is necessary to define an average plane through the ligands (usually the least-squares plane). Note that the definition includes one central atom, which is listed as the second in i-j-k-l: i, k, and l are bonded to the central atom j. The inversion angle potential is the average of the three possible inversion angle terms. The first and second derivatives for out-of-plane bending motion are special cases of the linear bending motion of atoms. [168] An alternative to the out-of-plane angle is the "improper torsion" potential, which simply treats the four atoms in the plane as if they were bonded in the same way as in a true torsional angle. Improper torsions are often used to keep sp² atoms planar and sp³ atooms in a tetrahedral geometry. The CHARMM convention is to list the central atom first, while there are no rules on how to order the other three atoms.
Hence, six possibilities exist for the definition of an improper torsion. The AMBER convention is that the out-of-plane atom is listed in the third position and the order of the other atoms is determined alphabetically by atom type, and by the atom number (i.e., the order in the molecule) when atom types are identical. Common out-of-plane potentials are listed in Table 6. Table 7. Common intra-molecular cross potential functional forms.

Cross Terms
The cross terms arise naturally from the Taylor expansion of the energy; they are not ad hoc functions, and they allow a better reproduction of the observed spectra of intra-molecular vibrations. Apart from anharmonic terms, class II and III force fields contain cross terms that reflect the coupling between adjacent bonds, angles, and dihedrals. For example, bonds and bends interact: as the bend angle becomes smaller, the bond lengths tend to increase. The inclusion of cross terms has two advantages:
• they increase the accuracy of the force field (especially the vibrational frequencies),
• they increase the transferability of the diagonal terms 𝒰_b(r), 𝒰(θ), 𝒰(φ), 𝒰(χ).
Some typical intra-molecular cross potentials are listed in Table 7.
For bond angles smaller than the equilibrium bond angle, bond stretching is observed (e.g., in cyclobutane); this can be described by a cross term that couples the bond lengths and the bend angle.
Similar to the stretch-bend term, the stretch can also couple to the torsion (stretch-torsion), and the bend couples to the torsion (bend-torsion cross terms). These interactions are usually very small, and several force fields neglect these potentials. An important use in zeolite models, however, is to slowly smooth the torsional energy to zero when the Si-O-Si bend approaches 180° (the torsion becomes ill-defined when three atoms are colinear). [190,191]

Coulombic Potential and The Multi-Pole Expansion
So far we have discussed the intra-molecular bond, bend, torsion, and out-of-plane terms, which are sufficient to model small isolated molecules. The non-bonded term is the central workhorse of force fields and acts between atoms that are not linked by covalent bonds. It describes the dispersion and electrostatic interactions between molecules, as well as longer-range interactions within a molecule that are not already described by the bond, bend, torsion, and out-of-plane terms. It is clear that 1-2 and 1-3 pairs are excluded from Lennard-Jones and electrostatics (these are already taken into account by explicit bonding and bending terms), while 1-5 and higher pairs should be included. The 1-4 interaction is a somewhat gray area: some force fields neglect explicit torsions but include dispersion/electrostatics, others use only torsions and no dispersion/electrostatics, and some use both, often with scaling factors (generally 0.5; AMBER uses a scaling factor of 0.8, a scale factor of 0.75 is used in the MMFF94 force field, [147] and a value of 0.5 is adopted in the OPLS-AA force field [109]). The latter option (scaling) generally improves agreement and better reproduces the potential energy of rotation. First, the electrostatic density of the molecules needs to be described. Consider two separate molecules A and B and their associated static charge densities ρ_A and ρ_B. The Coulombic interaction energy then reads [192,193]

𝒰_AB = [1/(4πε_0)] ∫_V′ ∫_V″ ρ_A(r′) ρ_B(r″)/|r′ − r″| dr″ dr′

where ε_0 is the absolute dielectric permittivity of classical vacuum, and where the 3D integration is carried out over the volume V′ of molecule A, described by the vector r′, and over the volume V″ of molecule B, described by the vector r″. If we assume that the extent of the charge distributions ρ_A and ρ_B is much smaller than their separation, we may expand 𝒰_AB in a multipole series with respect to the center-of-mass coordinates r_A and r_B.
The first few multipole moments of the charge distribution ρ_A are determined as [192,193]

q = ∫ ρ_A(r) dr,    μ = ∫ ρ_A(r) r dr,    Q_αβ = ½ ∫ ρ_A(r)(3 r_α r_β − r² δ_αβ) dr

with similar expressions for the charge distribution ρ_B. The monopole moment (q) is a scalar, the dipole moment (μ) is a vector, the quadrupole moment (Q) is a second-rank tensor, and so on. With these multipole moments we can express the interaction potential as a series in 1/r, where r = r_A − r_B. [194] The first term in the expansion is the charge-charge interaction. It is only non-zero if both charge distributions carry a net charge. Charge-charge interactions span long distances, since the distance dependence is only 1/r. The next two terms are charge-dipole interactions. They require that at least one particle carries a net charge. These interactions decay as 1/r² and are therefore of shorter range than the charge-charge interaction. Finally, the fourth term is the dipole-dipole interaction. It is the most important interaction among neutral particles. The dipole-dipole interaction decays as 1/r³ and depends strongly on the dipole orientations. The next higher expansion terms are the quadrupole-charge, quadrupole-dipole, and quadrupole-quadrupole interactions, which decay approximately as 1/r³, 1/r⁴, and 1/r⁵, respectively. By introducing an infinitesimal test charge q at r, the electrostatic potential φ(r) at r is defined according to

φ(r) = lim_{q→0} 𝒰(r)/q

where 𝒰 is the potential energy of the system including the test charge. The α-component of the electrostatic field E(r) at r is defined as E_α(r) = −∂φ(r)/∂r_α. A non-uniform electrostatic field E distorts a charge distribution, thereby inducing multipole moments. The total moments may be expressed (to electric quadrupole order) in terms of the field and its gradients. [192,194] The tensors α_ij, a_ijk, b_ijkl, d_ijkl, … are also properties of the undistorted distribution. They are referred to collectively as polarizabilities (α_ij is the dipole polarizability, a_ijk the quadrupole polarizability, b_ijkl the octopole polarizability, etc.).
The polarizabilities of all but spherically symmetric molecules are anisotropic, having different values along different molecular directions. This arises because the electronic polarizabilities of bonds, which are a measure of the response of the electrons to an external field, are anisotropic. If a charge distribution is unaffected by an applied field, it is said to be non-polarizable or rigid. Molecules are often modeled using the lowest non-zero electric moment. Species such as Na⁺ and Cl⁻ have the charge as their lowest non-zero moment. Molecules such as N₂ and CO₂ have the quadrupole as their lowest non-zero moment; the lowest non-zero moment for methane and tetrafluoromethane is the octopole. Each of these multipole moments can be represented by an appropriate arrangement of point charges: a dipole can be represented using two charges placed an appropriate distance apart, a quadrupole using four charges, and an octopole using eight charges. A complete description of the charge distribution around a molecule requires all of the non-zero electric moments to be specified.
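The point-charge representation of multipoles can be checked numerically. The sketch below builds a CO₂-like linear three-charge model (the charge magnitude and spacing are assumed for illustration, not a fitted model) and verifies that the monopole and dipole vanish while the quadrupole does not.

```python
# Linear three-site point-charge model of a quadrupolar molecule (CO2-like).
# The charge magnitude and spacing below are assumptions for illustration.
q, d = -0.35, 1.16                           # charge (e) and site spacing (Angstrom)
sites = [(-d, q), (0.0, -2.0 * q), (d, q)]   # (z-position, charge) on the z-axis

monopole = sum(c for _, c in sites)          # net charge
dipole_z = sum(c * z for z, c in sites)      # z-component of the dipole
# Traceless quadrupole Q_zz = sum_i c_i (3 z_i^2 - r_i^2)/2; on-axis r_i^2 = z_i^2
quad_zz = sum(c * (3.0 * z * z - z * z) / 2.0 for z, c in sites)
```

Note that this linear arrangement uses three charges rather than four: the central charge of −2q plays the role of two coincident charges, which is a common simplification for linear quadrupolar molecules.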

Van der Waals Potentials
At very small inter-atomic distances, the electron clouds of atoms overlap. A strong repulsive force arises that determines how close two atoms or molecules can ultimately approach each other. These repulsive forces are sometimes referred to as exchange repulsion, hard core repulsion, steric repulsion, or, for ions, the Born repulsion. [183] They are characterized by having very short ranges and increasing very sharply as two molecules come together. They belong to the category of quantum mechanical forces, and unfortunately there is no general equation for describing their distance dependence. Instead, a number of empirical potential functions have been introduced over the years, all of which appear to be satisfactory as long as they have the property of a steeply rising repulsion at small separations. The three most common such potentials are the hard sphere potential, the inverse power-law potential, and the exponential potential.
Three distinct types of forces contribute to the total long-range interaction between polar molecules. [183] They vary with the inverse sixth power of the distance and are collectively known as the van der Waals force.
1. For the dipole-dipole interaction, a Boltzmann averaging of the interaction energy over all orientations (which involves averaging over two polar angles and one azimuthal angle) leads to an angle-averaged interaction free energy that varies with the inverse sixth power of the distance. The Boltzmann-averaged interaction between two permanent dipoles is usually referred to as the "orientation" or "Keesom" interaction. 2. For two molecules possessing permanent dipole moments and polarizabilities, their net dipole-induced dipole energy has a 1/r⁶ distance dependence. This is often referred to as the "Debye interaction" or the "induction interaction." 3. Dispersion forces (often called "London dispersion forces") make up the third and perhaps most important contribution to the total van der Waals force between atoms and molecules. Because they are always present (in contrast to the other types of forces, which may or may not be present depending on the properties of the molecules), they play a role in a host of important phenomena such as adhesion, surface tension, physical adsorption, wetting, the properties of gases, liquids, and thin films, the strengths of solids, the flocculation of particles in liquids, and the structures of condensed macromolecules such as proteins and polymers. [183] Today, the term "van der Waals" interactions often refers only to the London dispersion forces. Dispersion forces are quantum mechanical in origin. Their origin may be understood intuitively as follows: for a nonpolar atom such as helium, the time average of its dipole moment is zero, but at any instant there exists a finite dipole moment given by the instantaneous positions of the electrons about the nuclear protons. This instantaneous dipole generates an electric field that polarizes any nearby neutral atom, inducing a dipole moment in it.
The resulting interaction between the two dipoles gives rise to an instantaneous attractive force between the two atoms, and the time average of this force is finite. This can be summarized by calling the dispersion interaction an interaction between two mutually induced dipoles. [195] The point charge-induced dipole interaction decays as 1/r⁴, the induced dipole-dipole as 1/r⁶, the induced dipole-quadrupole as 1/r⁸, and the induced dipole-octopole and the induced quadrupole-quadrupole as 1/r¹⁰. In general, the dispersion contribution is [195]

𝒰_disp(r) = −C₆^AB/r⁶ − C₈^AB/r⁸ − C₁₀^AB/r¹⁰ − ⋯

The 1/r⁶ term dominates the dispersion, and for computational simplicity the higher-order terms are usually neglected. An interesting quirk of the multipole series is the 1/r⁷ term; this term is missing for atoms and molecules with an inversion center, or when one does rotational averaging, but in general it is present. [195] This dispersion force results from instantaneous dipole/induced dipole (transitory dipole) interactions and is thus also observed acting between noble gas atoms and nonpolar molecules with closed electron shells. The strength of this force depends on the dipole moment of the atom inducing the second dipole and on the polarizability of the second atom. The polarizability depends on the number of electrons in the outer shell of an atom. The same is true for the refraction of light, the so-called "dispersion"; hence the name dispersion forces. However, nothing is being dispersed by dispersion forces, [195] and for better or worse we are stuck with the term. The leading dispersion coefficient can be written in the London form [196]

C₆^AB = (3/2) α_A α_B E_A E_B/(E_A + E_B)

where E_A and E_B are the energies of the first electronic transition in both atoms, and α_A and α_B the polarizabilities of atoms A and B; analogous expressions involving the dipole, quadrupole, and octopole polarizabilities (denoted 1, 2, and 3, respectively) yield C₈ and C₁₀. Nicholson et al. [197] analyzed the magnitude, importance, and origin of these terms for nonpolar molecules in silicalite and AlPO₄-5.
The static polarizabilities for the charged lattice species were estimated from Auger data using a method given in previous work, [198] and repulsive parameters were obtained by fitting to experimental Henry's law constants. The oxygen-atom two-body dispersion accounts for about 80% of the potential energy, while the total interaction of the Si species is of the order of 12%. There are significant contributions from the higher-order (C₈ and C₁₀) terms. The induced interaction is extremely small in silicalite, being less than 0.3% of the total potential energy at the bottom of the well, even for polarizable xenon; this is due to cancellation of the O and Si contributions, which are individually quite large. Unlike Coulomb forces, van der Waals forces are not generally pairwise additive: the force between any two molecules is affected by the presence of other molecules nearby, so one cannot simply add all the pair potentials of a molecule to obtain its net interaction energy with all the other molecules. This is because the field emanating from any one molecule reaches a second molecule both directly and by "reflection" from other molecules, since they, too, are polarized by the field. [183] This effect adds an additional contribution to the total van der Waals interaction energy. In general, we have the many-body expansion

𝒰 = Σ_{i<j} 𝒰₂(i, j) + Σ_{i<j<k} 𝒰₃(i, j, k) + ⋯

The effect of the three-body interaction on the energy or force depends on the relative disposition of the molecules and can be positive or negative, so the net effect is usually small. In silicalite the three-body contribution to the total energy accounts for only 3% of the total potential energy for Ar, but about 15% for Xe. [196] The three-body interaction can be quite significant for polar species. The overall effect of multiple- or many-body (four-body, five-body, etc.) interactions is usually an overall reduction of the strength of the summed pair interactions.
Dispersion interactions between two ground-state species are always attractive (for excited states they can be attractive or repulsive). [195] Short-range interactions, also labeled exchange or overlap forces, are always unfavorable. They occur because electrons with the same spin cannot occupy the same region in space (Pauli exclusion principle). The analytical form of the repulsion is unknown. The most common repulsion/dispersion functional form is the Lennard-Jones (LJ) potential, often encountered in two flavors,

𝒰(r) = 4ε[(σ/r)¹² − (σ/r)⁶]    and    𝒰(r) = ε[(R_min/r)¹² − 2(R_min/r)⁶]

where ε is the well depth and R_min is the distance of minimum interaction energy. Note that the potentials are related by R_min = 2^(1/6)σ. The LJ size parameter σ is the distance at which the interparticle potential is zero. The potential should be viewed as an effective potential with many effects lumped together: the van der Waals term must include not only the long-range induced-dipole-induced-dipole r⁻⁶ dispersion effect, but also the higher multipoles, the damping of the dispersion, the induction, and the steric repulsion when two atoms get close (sometimes called exchange repulsion), and it must also make up for errors in the highly approximate treatment of electrostatics. [195] A common alternative to the LJ potential is the Hill (also called Buckingham) potential function,

𝒰(r) = A e^(−Br) − C/r⁶

The Hill potential, having three adjustable parameters versus two for Lennard-Jones, might be slightly more accurate. A downside of the Hill potential is its divergence to large negative energies as r → 0; therefore the potential needs to be "blocked" or changed to a polynomial repulsion (e.g., MM2) at short distances. It is convenient to separate the VDW pair potential functional form from the other derivatives so that a more general framework can be used. We therefore define the pair potential functional derivatives as for the bond potential, with r = |r_ij| = √(r_ij ⋅ r_ij).
The derivatives of 𝒰(r) with respect to Cartesian coordinates are given in ref. [199], and the derivatives of 𝒰(r) with respect to strain are derived by Ray. [181,185,199] The explicit strain derivative terms appearing in the last term in Equation (165) arise from the action of the strain derivative on the explicit cell dependence of the microscopic stress. [181] Note that these expressions are identical to the bond-stretching ones, as both are valid for pair potentials. Note that it is also common for pair potentials to define f₁† and f₂† to further simplify the equations.
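The two Lennard-Jones flavors and the relation R_min = 2^(1/6)σ can be verified directly; the ε and σ below are illustrative argon-like numbers.

```python
def lj_sigma(r, eps, sigma):
    """4*eps*((sigma/r)**12 - (sigma/r)**6): sigma is the zero-crossing distance."""
    x = (sigma / r) ** 6
    return 4.0 * eps * (x * x - x)

def lj_rmin(r, eps, rmin):
    """eps*((rmin/r)**12 - 2*(rmin/r)**6): rmin is the minimum-energy distance."""
    x = (rmin / r) ** 6
    return eps * (x * x - 2.0 * x)

eps, sigma = 0.997, 3.405            # illustrative argon-like parameters
rmin = 2.0 ** (1.0 / 6.0) * sigma    # with this relation the two forms coincide
```

The first form makes the zero crossing explicit (𝒰(σ) = 0), the second makes the well explicit (𝒰(R_min) = −ε); both describe the same curve.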

Rigid-Body Coordinates and Derivatives
Small molecules and phenyl rings are often kept rigid in molecular simulations. A convenient formulation of rigid units is to use the center of mass (three degrees of freedom) and an orientation (zero, two, or three degrees of freedom, depending on whether it is a single atom, a linear molecule, or a non-linear molecule, respectively). The six degrees of freedom for a non-linear rigid body can be defined in terms of the center-of-mass coordinates r = (r₁, r₂, r₃) and the three components of a vector p = (p₁, p₂, p₃) = θp̂, which specifies both a normalized rotation axis p̂ through the center of mass and a magnitude of rotation θ. [200] For the rotation vector p, the corresponding 3 × 3 rotation matrix R can be expressed as (Rodrigues' rotation formula)

R = I + sin θ p̃ + (1 − cos θ) p̃²

where I is a 3 × 3 identity matrix and p̃ is the skew-symmetric matrix obtained from p̂; the product of the skew-symmetric matrix p̃ and any vector v returns their cross product, p̃v = p̂ × v. One can transform a quaternion q = [q₁, q₂, q₃, q₄]ᵀ to the axis-angle representation, and likewise the rotation matrix. Remembering θ = √(p₁² + p₂² + p₃²), the derivatives R_k = ∂R/∂p_k follow from the derivatives of p̃ given in ref. [200]. When θ = 0 the singularity can be removed, giving R_k = p̃(k), where p̃(k) is obtained from Equation (169) using p̂ with the kth element set to unity. The advantage of this scheme is that all the rigid-body coordinate information is stored in the space-fixed frame, and the derivatives of the rotation matrix can all be programmed in general using a fixed subroutine, which can be called to deal with site-site isotropic potentials as well as single-site or site-site anisotropic potentials.
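Rodrigues' rotation formula is straightforward to implement from the skew-symmetric matrix p̃, including the removable θ → 0 singularity; a dependency-free sketch:

```python
import math

def skew(p):
    """Skew-symmetric matrix p~ such that p~ v = p x v."""
    return [[0.0, -p[2], p[1]],
            [p[2], 0.0, -p[0]],
            [-p[1], p[0], 0.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rodrigues(p):
    """Rotation matrix R = I + sin(theta) p~ + (1 - cos(theta)) p~^2 for a
    rotation vector p = theta * p_hat."""
    theta = math.sqrt(sum(c * c for c in p))
    if theta < 1e-12:   # theta -> 0 is a removable singularity: R -> I
        return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    ph = [c / theta for c in p]
    s, c1 = math.sin(theta), 1.0 - math.cos(theta)
    ps = skew(ph)
    ps2 = matmul(ps, ps)
    return [[(1.0 if i == j else 0.0) + s * ps[i][j] + c1 * ps2[i][j]
             for j in range(3)] for i in range(3)]
```

A quick check is a 90° rotation about the z-axis, which must map the x unit vector onto the y unit vector.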
The current prescription can be extended to obtain the second derivatives by successive application of the chain rule. We then need to compute six additional 3 × 3 matrices of two types, namely ∂²R/∂p_k² and ∂²R/∂p_l∂p_k, which are denoted by R_kk and R_kl. The derivatives p̃_kl are given in ref. [200]. When θ = 0 the singularity can be removed, giving R_kk = p̃(k)p̃(k) and R_kl = p̃(k)p̃(l).
If the coordinates of two rigid bodies are denoted using the subscripts I and J, and the sites within each rigid body by subscripts i and j, then for site-site isotropic potentials the total energy is the sum of the pair potentials 𝒰^VDW_ij between sites i and j. The first and second derivatives are given in ref. [200]. The strain derivative for rigid units is given in ref. [163], where d_i = r_{I,i} − r_I is the distance vector between site i and the center of mass of rigid unit I. Note that the constraints result in a torque, and the instantaneous stress tensor becomes asymmetric (remedied by explicitly symmetrizing the stress tensor). The presence of intra-molecular constraints results in a force term to be added to the strain derivative of an atomic system. The derivatives of the energy with respect to strain and center-of-mass position, and of the energy with respect to strain and orientation, for pair potentials are given in ref. [163]. For pair potentials the strain-strain derivative [163] reduces to Equation (165) for atomic systems. Miller et al. derived a Hamiltonian form of a minimal, nonsingular representation of rigid-body rotations, the unit quaternion. [201] The derived NVE integrator is symplectic and time-reversible for molecules with an arbitrary level of rigidity. Thermo- and barostats can be combined with the Miller scheme to control the temperature and pressure. [202-204]

QM-Corrected Potentials
Carefully calibrated Lennard-Jones models usually give quite satisfactory results. The model captures the first-order effects, but because the parameters are adjustable, a large portion of the remaining physics is also effectively incorporated. It performs well at moderate and high temperatures, where the potential energy is a relatively small part of the total Hamiltonian. Quantum effects are expected to become significant when the de Broglie wavelength Λ = h/√(2πmk_BT) is much larger than the interparticle distance. For T = 300 K one has Λ = 1 Å for a hydrogen atom and Λ = 0.19 Å for a silicon atom. Except for the lightest ones, all atoms can be treated classically. However, as T becomes lower, quantum corrections may be needed. Using the path-integral formalism [205] the Feynman-Hibbs (FH) potential can be obtained, which includes quantum effects:

𝒰_FH(r) = 𝒰(r) + [ℏ²β/(24μ)][𝒰″(r) + 2𝒰′(r)/r]

where μ = m₁m₂/(m₁ + m₂) is the reduced mass of a pair of interacting particles, β = 1/(k_BT) is the inverse temperature, ℏ is the reduced Planck constant, and the prime denotes differentiation with respect to r. The thermodynamics given by the FH potential is modified: [206] for example, the energy acquires a second term that can be interpreted as a quantum correction to the classical kinetic energy. Liu et al. [207] found that quantum effects are important for hydrogen adsorption in MOFs at cryogenic temperatures. For H₂-H₂, μ = m/2 and ℏ²β/(24μ) = 2.005/T, while for H₂-sorbent, μ = m and ℏ²β/(24μ) = 1.002/T. The Feynman-Hibbs potential, truncated at second order in ℏ, was able to satisfactorily account for quantum effects.
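A sketch of the quadratic Feynman-Hibbs correction applied to a Lennard-Jones potential. The unit system (K for energy/k_B, Å, amu) and the numerical prefactor are assumptions chosen so that ℏ²β/(24μ) reproduces the 2.005/T and 1.002/T values quoted above for H₂; the ε and σ are illustrative H₂-like parameters, not values from a specific force field.

```python
import math

# Assumed unit system: energies in K (E/k_B), distances in Angstrom, masses
# in amu. In these units hbar^2/(24 k_B) is approximately 2.0212 K amu A^2.
HBAR2_OVER_24KB = 2.0212

def lj(r, eps, sigma):
    """Classical Lennard-Jones potential."""
    x6 = (sigma / r) ** 6
    return 4.0 * eps * (x6 * x6 - x6)

def lj_d1(r, eps, sigma):
    """First derivative dU/dr."""
    x6 = (sigma / r) ** 6
    return 4.0 * eps * (-12.0 * x6 * x6 + 6.0 * x6) / r

def lj_d2(r, eps, sigma):
    """Second derivative d2U/dr2."""
    x6 = (sigma / r) ** 6
    return 4.0 * eps * (156.0 * x6 * x6 - 42.0 * x6) / r ** 2

def feynman_hibbs(r, eps, sigma, mu, T):
    """Quadratic FH potential: U + (hbar^2 beta / 24 mu) * (U'' + 2 U'/r)."""
    pref = HBAR2_OVER_24KB / (mu * T)
    return lj(r, eps, sigma) + pref * (lj_d2(r, eps, sigma)
                                       + 2.0 * lj_d1(r, eps, sigma) / r)

# Illustrative H2-like parameters (assumed): eps in K, sigma in Angstrom,
# mu = m/2 for the H2-H2 pair in amu
eps, sigma, mu_h2 = 36.7, 2.958, 1.008
rmin = 2.0 ** (1.0 / 6.0) * sigma
```

At the classical minimum 𝒰′ = 0 and 𝒰″ > 0, so the correction raises the effective well depth; since it decays as 1/T, the classical limit is recovered at high temperature.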

Hydrogen Bonding
Ion-dipole interactions can be strong enough to lead to mutual alignment. Dipole-dipole interactions are usually too weak for that, with the exception of the "hydrogen bond" (which is essentially just a particularly strong directional dipole-dipole interaction). Water is the prime example of hydrogen bonding, but hydrogen bonds are also found between molecules with O─H, N─H, F─H, and C─H bonds. The bond moment of O⁻-H⁺ is unusually large. The electronegative oxygen atom can get quite close to the highly polar O⁻-H⁺ groups and experience a very strong field. Hydrogen bond strengths lie between 10 and 40 kJ mol⁻¹, which makes them stronger than a typical van der Waals bond (≈1 kJ mol⁻¹) but still much weaker than covalent or ionic bonds (≈500 kJ mol⁻¹). [183] Even though the hydrogen bond is often considered to be essentially electrostatic, no simple equation for the interaction potential is known; the interaction comprises electrostatic, charge-transfer, polarization, dispersion, and electron-exchange contributions. One does find that the strengths of hydrogen bonds tend to follow a 1∕r² distance dependence. Hydrogen bonds can occur intermolecularly as well as intramolecularly, and can exist in a nonpolar environment. While the van der Waals and electrostatic point-charge interactions are more or less isotropic, hydrogen bonds are directional short-range interactions. The energetically optimum arrangement occurs when the two dipole moments are collinear.

Adv. Theory Simul. 2019, 2, 1900135
Table 8. Common hydrogen bonding potential functional forms.

Ab initio quantum mechanical simulations have shown that this interaction can be accurately modeled by combining Coulomb's law with the Lennard-Jones potential if appropriate partial charges are used. [208] Therefore, several force fields do not model the hydrogen bond explicitly but account for this type of interaction through the van der Waals and electrostatic interactions (MM2). Some force fields employ modified Lennard-Jones potentials (e.g., 12-10, 6-4, and 8-6 functional forms). Other force fields account for the directionality of the hydrogen bond (MM3) or even model lone pairs explicitly. Some common functional forms for hydrogen bonding are listed in Table 8.
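As an illustration of such a modified Lennard-Jones form, a minimal sketch of the 12-10 potential (the parameters below are hypothetical); the chosen coefficients give a minimum of depth −ε exactly at the equilibrium hydrogen-bond distance r₀:

```python
def hb_12_10(r, eps, r0):
    """12-10 hydrogen-bond potential E(r) = eps*(5*(r0/r)**12 - 6*(r0/r)**10).
    The coefficients 5 and -6 place a minimum of depth -eps exactly at r = r0:
    dE/dr = (eps/r0)*(-60*x**13 + 60*x**11) with x = r0/r vanishes at x = 1."""
    x = r0 / r
    return eps * (5.0 * x ** 12 - 6.0 * x ** 10)

# Hypothetical O-H...O parameters: well depth 20 kJ/mol at 2.9 Angstrom
e_min = hb_12_10(2.9, 20.0, 2.9)
```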

Coulomb Potential in Periodic Systems: Ewald Summation
The main challenge in calculating long-range potentials in periodic systems is their slow convergence with distance. The de facto method to deal with this problem is the Ewald summation. Suppose there are N point charges q_1, q_2, …, q_N at positions r_1, r_2, …, r_N within the unit cell, satisfying charge neutrality

Σ_{i=1}^{N} q_i = 0

The vectors a_α, which need not be orthogonal, form the edges of the unit cell. The conjugate reciprocal vectors a*_α are defined by the relations

a_α · a*_β = δ_αβ,  α, β = 1, 2, 3

Let a be the 3 × 3 matrix having the lattice vectors a_α as columns.
Note that the volume V of the unit cell is given by the determinant of a. Furthermore, a⁻¹ is the 3 × 3 matrix having the reciprocal lattice vectors a*_α as rows. The point charge q_i at position r_i has fractional coordinates s_i defined by s_i = a⁻¹ r_i. The charges interact according to Coulomb's law with periodic boundary conditions. Thus a point charge q_i at position r_i interacts with all other charges q_j (with j ≠ i) at positions r_j, as well as with all of their periodic images at positions r_j + n₁a₁ + n₂a₂ + n₃a₃ for all integers n₁, n₂, n₃. It also interacts with its own periodic images at r_i + n₁a₁ + n₂a₂ + n₃a₃ for all integers n₁, n₂, n₃ not all zero. The electrostatic energy U of the unit cell can be written as

U = (1∕2) Σ′_n Σ_i Σ_j q_i q_j ∕ |r_ij + n|

where the outer sum is over the vectors n = n₁a₁ + n₂a₂ + n₃a₃, the prime indicating that terms with i = j are omitted for n = 0. We define the reciprocal lattice vectors k by

k = 2π(k₁a*₁ + k₂a*₂ + k₃a*₃)

with k₁, k₂, k₃ integers not all zero, and the structure factor S(k) by

S(k) = Σ_i q_i exp(i k · r_i)

The structure factor S(k) can be viewed as a discrete Fourier transform of a set of charges placed irregularly within the unit cell. The technique of Ewald summation is applicable to truly periodic systems, that is, the simulation region or unit cell is effectively replicated in all spatial directions. For short-ranged potentials like the Lennard-Jones potential, the cutoff is usually chosen smaller than half the shortest width of the unit cell, such that only summations over interactions in the main cell are required. For distances larger than half the shortest width of the unit cell, neighboring simulation volumes need to be taken into account. In the minimum image convention, the potential seen by a particle is summed over all other particles or their periodic images, whichever is closest. For long-range potentials this construction is inadequate, because the contributions from more distant images at 2L, 3L, etc., are no longer negligible.
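The structure factor is straightforward to evaluate directly; a minimal sketch for an orthorhombic cell (the charges and positions below are made up for illustration):

```python
import cmath
import math

def structure_factor(charges, positions, kvec):
    """S(k) = sum_i q_i exp(i k . r_i): the discrete Fourier transform of the
    point charges placed irregularly within the unit cell."""
    return sum(q * cmath.exp(1j * sum(kc * rc for kc, rc in zip(kvec, r)))
               for q, r in zip(charges, positions))

# Orthorhombic cell: the reciprocal vectors are x_hat/Lx etc., so that
# k = 2*pi*(n1/Lx, n2/Ly, n3/Lz)
L = (10.0, 10.0, 12.0)
q = [1.0, -1.0]                          # charge-neutral set of point charges
pos = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
k = tuple(2.0 * math.pi * n / Li for n, Li in zip((1, 0, 2), L))
S = structure_factor(q, pos, k)
```

For real charges S(−k) is the complex conjugate of S(k), the symmetry exploited later to halve the reciprocal-space loop.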
One might think that these contributions should approximately cancel, and they nearly do. However, one has to take care in which order the sum is performed, as illustrated by an example. Consider a system of two oppositely charged ions, periodically extended to form an infinite 1D line of charges, each separated by a distance R. The potential energy of the reference ion with charge −q is [38]

U = −(2q²∕4πε₀R)(1 − 1∕2 + 1∕3 − 1∕4 + ⋯) = −(q²∕4πε₀R) · 2 log 2

The factor 2 log 2 is the Madelung constant, which is of central importance in the theory of ionic crystals. The series is conditionally convergent, that is, the result depends on the summation order. We can choose a different ordering, for example [209]

1 + 1∕3 − 1∕2 + 1∕5 + 1∕7 − 1∕4 + ⋯

that is, two positive terms followed by a negative term. It can then be shown that the series converges to (3∕2) log 2, a potential energy 50% higher than before. [209] The problem is thus in what order to sum over the periodic images. An intuitive and elegant way of doing this is to build up sets of images contained within successively larger spheres surrounding the simulation region. According to this scheme the energy is expressed as in Equation (205). However, using the Ewald transformation, the single conditionally convergent summation can be replaced with two absolutely convergent summations.
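The order dependence of the conditionally convergent series can be verified numerically by summing the bare series 1 − 1∕2 + 1∕3 − ⋯ in the two orders discussed above:

```python
import math

def alternating(nterms):
    """Natural order: 1 - 1/2 + 1/3 - 1/4 + ... -> log 2."""
    return sum((-1) ** (n + 1) / n for n in range(1, nterms + 1))

def rearranged(nblocks):
    """Two positive terms followed by one negative term:
    1 + 1/3 - 1/2 + 1/5 + 1/7 - 1/4 + ... -> (3/2) log 2."""
    s = 0.0
    for k in range(1, nblocks + 1):
        s += 1.0 / (4 * k - 3) + 1.0 / (4 * k - 1) - 1.0 / (2 * k)
    return s
```

Both routines sum exactly the same set of terms, yet the limits differ by the factor 3∕2 quoted in the text.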

Ewald Transformation
For an infinite system, a meaningful energy can be defined by use of the Ewald transformation. The basic idea of the Ewald approach is the splitting [210,211]

1∕r = erf(αr)∕r + erfc(αr)∕r

where the error function erf(x) and its complement erfc(x) are defined as

erf(x) = (2∕√π) ∫₀ˣ e^(−t²) dt,  erfc(x) = 1 − erf(x)

In Equation (210), the first term goes to a constant (2α∕√π) as r → 0, but has a long tail as r → ∞. The second term is singular as r → 0, but vanishes exponentially as r → ∞. Ewald's idea is to replace a single slowly convergent summation with two rapidly convergent summations: the first term has a convergent summation in the form of its Fourier transform, and the second has a convergent direct summation. For an appropriate choice of the parameter α, the second summation converges quickly and can be evaluated directly. The first term in the summation must be transformed into Fourier space. In order to describe these summations explicitly, we assume that we have a periodic lattice, so that every ion can be located by r_i = r′_i + n, that is, a location r′_i within a unit cell plus a periodic translation vector n. The summation then runs over the N unit cells of the system. Since we have a periodic system, N is infinite, but the energy per unit cell U∕N is well defined. A sum over lattice translations n may be transformed into an equivalent sum over reciprocal lattice translations k according to a Poisson-summation identity in which the unit cell volume V appears. In the resulting expression for the first term, an additional contribution arises from subtracting out the self-interaction (i = j) term from the complete lattice sum. Using r_ij = r_i − r_j, the lattice sum can then be written out in full. The last term in Equation (218) comes from the k = 0 contribution. The omission of this term is called a "tin-foil" boundary condition and corresponds to a system embedded in a medium with infinite dielectric constant (a perfectly conducting medium).
[212] If the surrounding medium is a vacuum (dielectric constant of unity), then an additional surface energy term appears in the equations. However, tin-foil boundary conditions are standard, and for ionic systems the use of this type of boundary condition is essential. [84] The resulting Ewald expression is Equation (220), where the prime in the summation over the lattice translations n indicates that all self-interaction terms are omitted, and where the triple sum over k, i, and j in the reciprocal term of Equation (219) has been replaced by a double sum over k and i (via the structure factor S(k)) in Equation (220).
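The splitting identity of Equation (210) can be checked numerically with the standard-library error functions; the short-ranged part is already negligible at a typical real-space cutoff:

```python
import math

def coulomb_split(r, alpha):
    """Split 1/r into a smooth long-range part erf(alpha*r)/r (summed in
    Fourier space) and a short-ranged part erfc(alpha*r)/r (summed directly
    in real space)."""
    return math.erf(alpha * r) / r, math.erfc(alpha * r) / r

alpha = 0.3                 # 1/Angstrom, a typical magnitude for alpha ~ 6/L
long_part, short_part = coulomb_split(12.0, alpha)
```

The long-range part is smooth everywhere: as r → 0 it tends to the finite constant 2α∕√π instead of diverging, which is why its Fourier series converges rapidly.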

Units
The total electrostatic potential energy of interaction between point charges q_i at the positions r_i is given by Coulomb's law. A note on the prefactor 1∕(4πε₀) of the Coulomb potential and its conversion to internal simulation units is in order here. The prefactor contains the dielectric constant of vacuum ε₀, and the conversion of energy to internal units is (mass unit) × (length unit)²∕(time unit)². For example, using Ångström (10⁻¹⁰ m) as the length scale, picosecond (10⁻¹² s) as the time scale, Kelvin as the temperature, the atomic mass unit (1.6605402 × 10⁻²⁷ kg) as the unit of mass, and the atomic charge (1.60217733 × 10⁻¹⁹ C per particle) as the unit of charge, the unit of energy becomes 10 J mol⁻¹ (1.6605402 × 10⁻²³ J) and the Coulombic conversion factor becomes 138935.48. This factor is needed to convert the electrostatic energy to the internal units at every evaluation.
The factor corresponds to 167101.1 K, 1389.36 kJ mol −1 , and 332.067 kcal mol −1 . In the remainder, the prefactor will be omitted for notational convenience.
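These conversion factors follow directly from the constants quoted above; a quick check (CODATA values for ε₀, k_B, and N_A are assumed):

```python
import math

E_CHARGE = 1.60217733e-19   # C (value used in the text)
AMU = 1.6605402e-27         # kg (value used in the text)
EPS0 = 8.854187817e-12      # F/m
KB = 1.380649e-23           # J/K
NA = 6.0221367e23           # 1/mol

# Internal energy unit: amu * Angstrom^2 / ps^2 = 1.6605402e-23 J
energy_unit = AMU * (1e-10) ** 2 / (1e-12) ** 2

# Coulomb prefactor e^2/(4 pi eps0 * 1 Angstrom) in various energy units
ke_joule = E_CHARGE ** 2 / (4.0 * math.pi * EPS0 * 1e-10)
coul_conv = ke_joule / energy_unit      # internal units
in_kelvin = ke_joule / KB               # Kelvin
in_kj_mol = ke_joule * NA / 1000.0      # kJ/mol
in_kcal_mol = in_kj_mol / 4.184         # kcal/mol
```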

General Formalism Finite System
Nymand and Linse have provided a general formalism for atomic charges, dipoles, and polarizabilities, and this section is taken from their article. [213] Consider a finite system of interacting molecules possessing atomic charges, dipoles, and anisotropic polarizabilities. Throughout, the Einstein summation convention is used for Greek indices. The interaction tensors T_ij = 1∕r_ij and their successive gradients T_ij,α = ∇_α(1∕r_ij), T_ij,αβ = ∇_α∇_β(1∕r_ij), … are employed. Here, r_ij = r_i − r_j, and r_i,α is the α-component of the position vector of atom i. After a generalization of the approach by Vesely, [214] the total potential energy of a finite system containing a set of atoms possessing charges, dipoles, and anisotropic polarizabilities can be expressed as [213]

U = U_int + U_pol (227)

where q_i is the charge on atom i, μ_i,α the α-component of the total dipole moment of atom i, μ^ind_i,α the α-component of the induced dipole moment of atom i, and α_i,αβ the αβ-component of the polarizability tensor of atom i. In Equation (228), the four terms in U_int represent the charge-charge, the charge-total dipole, the total dipole-charge, and the total dipole-total dipole interaction, respectively, whereas U_pol represents the work of forming the induced dipoles. The total dipole moment of atom i is given by

μ_i,α = μ^stat_i,α + μ^ind_i,α

where μ^stat_i,α denotes the α-component of the permanent (static) dipole moment of atom i.
By introducing an infinitesimal test charge q at r, the electrostatic potential φ(r) at r is defined according to φ(r) = lim_{q→0} ∂U∕∂q, where U is the potential energy of the system including the test charge. The α-component of the electrostatic field is defined as E_α(r) = −∂φ(r)∕∂r_α, and the αβ-component of the electric field gradient at r as E_αβ(r) = ∂E_α(r)∕∂r_β. The inclusion of the test charge in the summation of Equation (227) and the application of Equations (231)-(233) to Equation (227) gives the electrostatic potential, field, and field gradients; the corresponding quantities on atom i follow after omission of any self-terms. [213] The induced dipole moments are determined by a minimization of U with respect to μ^ind_i,α,

μ^ind_i,α = α_i,αβ E_i,β (237)

where E_i,β is the β-component of the electrostatic field on atom i arising from the other charges and total dipole moments, as given by Equation (235). The potential energy is often separated into two physically appealing contributions: an electrostatic and an induction term. Using Equations (230) and (237), Equation (227) can be recast according to

U = U_ele + U_ind (238)

where the electrostatic energy is given by Equation (239) and the induction energy by Equation (240). By taking the derivative of the total potential energy U given by Equation (227) with respect to the position of atom i, and noting that U is stationary with respect to variation of μ^ind_i,α, the α-component of the force on atom i arising from its interaction with atom j can be obtained according to Equation (243). The α-component of the total force acting on atom i can conveniently be expressed as in Equation (244), [213] where E_i,α is the α-component of the electrostatic field and E_i,αβ the αβ-component of the field gradient on atom i arising from the other charges and total dipole moments, as given by Equation (235) and Equation (236), respectively.
Finally, the virial Φ, which enters the expression for the pressure, p = p_ideal − Φ∕3V (with V the volume of the system), can be written down explicitly. [213] The virial for a molecular system composed of charges, static, and induced dipole moments formally follows from the pair and total forces f_ij and f_i given by Equation (243) and Equation (244), respectively.

General Formalism Periodic System
Nymand and Linse extended the formalism to periodic systems using the Ewald summation. [213] This section is also taken directly from their work. When truly periodic boundary conditions are applied, the summations over j ≠ i are replaced by infinite lattice sums. In the Ewald formalism, the expression for a given property is conventionally divided into four different terms, viz., i) a real term arising from the short-range interaction in real space, ii) a reciprocal term arising from the long-range interaction in reciprocal space, iii) a self-term correcting an over-counting in reciprocal space, and iv) a surface term appearing in systems with nonconducting surroundings. The modified interaction tensors are denoted by a tilde. In practice, this amounts to replacing r^(−(2n+1)), n ∈ {0, 1, 2, 3}, in the interaction energy expressions by its screened counterpart r̃^(−(2n+1)), where

r̃^(−1) = erfc(αr)∕r

and the screened higher powers for n > 0 follow from the recursion formula Equation (248); this can straightforwardly be extended to higher-order moments. We define T̃_ij = T̃_ij,α = T̃_ij,αβ = T̃_ij,αβγ ≡ 0 when atoms i and j belong to the same molecule. The modified tensors then follow accordingly, where d_ij is the intramolecular distance between atoms i and j.
The tensors ij are used to remove intra-molecular electrostatics. In real space these interactions can simply be skipped. The Fourier part can then be corrected by subtracting real space valuẽ  ij for excluded pairs, that is, by correcting in real space. An important note is that sometimes numerical divergences can occur. For example, in the core-shell models the shell can overlap with its core. This can be resolved by replacing them by analytical expressions for the limit r → 0 for small values of d, for example, d < 10 −4 Å. We have The real-space cancellation is imperfect and better cancellation can be achieved if the excluded interactions are subtracted out using a reciprocal-space expression. [81,215] Nymand and Linse provided expressions for  , i , E i , E i , f i , and Φ for a truly infinite periodic system of charges, static dipoles, and anisotropic polarizabilities. The expressions correctly reduce to those for systems of only charges, or of only dipoles. The volume is defined by V = L x L y L z , where L x , L y , and L z denote the lengths of the box edges, A k is defined according to and Q q according to where 2 ≡ −1. The bar-notationQ q is used for the complex conjugate of Q q . The Potential Energy: For a system of charges, static dipoles, and anisotropic polarizabilities in an infinite lattice, the potential energy can be written as [213] where the four terms are the real, the reciprocal, the self, and the surface contribution, respectively, to the potential energy and  pol represents the work of forming the induced dipoles. From Equation (266) and on, the Ewald convergence parameter is assumed to be chosen such that the screening is sufficiently large to ensure that only particles in the primary box need to be considered when calculating the real space sum. In the self-term  self , the single sum over i is always present. In the Adv. Theory Simul. 
2019, 2, 1900135 remaining part, the index p runs over all particles and the notation i, j ∈ p means that atomic sites i and j both reside within particle p. Hence, the double sum over i and j contributes only if atom i and j belongs to the same particle.
Finally, the surface term U_sur originates from the total dipole moment of the simulation box. The quantity ε_sur entering U_sur is the dielectric constant of the continuum surrounding the replicated sample, and is often chosen to be either ∞ (tinfoil conditions) or 1 (vacuum conditions). For systems such as ionic and dipolar ones, where long-range interactions are important, Ewald summation with tinfoil boundary conditions corresponds to the physically most desirable situation. The general energy expression reduces correctly for systems consisting of i) only charges, by setting μ^stat = μ^ind = 0, and ii) only dipoles, by setting q = 0 and μ^ind = 0. The Electrostatic Potential: The electrostatic potential on atom i arising from the other charges and total dipole moments is obtained from Equation (266). [213] For systems of only atoms, the sum in the self-term vanishes. For a system of charge-neutral molecules, the surface term can be written in a simplified form. The Electrostatic Field: Similarly, an expression follows for the corresponding α-component of the electrostatic field on atom i, Equation (269). [213] In particular, for a system consisting of only charges the reciprocal term simplifies, and for a system of only static dipoles it can be written in terms of Q_q and Q^(stat), defined in Equations (263) and (265). For an atomic system, the self-term is zero. The α-component of the electrostatic field on atom i arising from the other induced dipoles only, E^ind_i,α, is needed later; it is obtained from Equation (269) by setting Q_i = 0 and by replacing μ_i,α by μ^ind_i,α and Q_q by Q^(ind). For the periodic system, Nymand and Linse showed that i) the induced dipole moments are still given by Equation (237), but E_i,α is now given by Equation (269), and ii) the potential energy can again be separated into an electrostatic and an induction term according to Equation (238).
The electrostatic term U_ele is still given by Equation (239) and the induction term by Equation (240). The Electrostatic Field Gradient: The αβ-component of the field gradient on atom i is given by the corresponding derivative expression. [213] For an atomic system, the self-term is zero. Note that E^sur_i,αβ is zero. The Force: The expression for the α-component of the force on atom i is readily obtained from Equation (266) by taking its derivative with respect to r_i,α. [213] Nymand and Linse showed that the use of E_i,α (given by Equation (269)) and E_i,αβ (given by Equation (272)) leads exactly to Equation (244), and that Equation (244) holds term-wise. For a system consisting of only charges the reciprocal term simplifies, as it does for a system consisting of only static dipoles. Note that the self-term is not constant due to the presence of μ_i. For an atomic system, the self-term is zero.
The Virial: The virial can be expressed as [213]

Charge Only Systems
The expressions of the previous section reduce to the familiar ones for systems consisting of only charges. They are easily incorporated into the usual Ewald summation routines. The Potential Energy: The potential energy can be written as the sum of a real, a reciprocal, and a self contribution. The Electrostatic Potential: The electrostatic potential at atom i arising from the other charges follows correspondingly. The Electrostatic Field: The α-component of the electrostatic field on atom i is given by the analogous charge-only expression.

The Electrostatic Field Gradient: The αβ-component of the field gradient on atom i is given by the corresponding charge-only expression. The Force: The α-component of the force on atom i is given by the corresponding charge-only expression. The Hessian: The atomic expression for the Hessian of the real-space part can be written out explicitly [163] and is similar to the expression given by Krishnan and Balasubramanian. [177] Note that the Hessian requires a double summation over particles.
The Strain Derivatives: The strain derivative of the Ewald summation reads as given in ref. [182]. The Ewald expressions for the positional cross derivatives read as in ref. [163]. The strain-strain derivative of the Ewald summation is likewise given in ref. [182].

Polarization via Induced Dipoles
There are basically three methods to generate induced dipoles: [95,216]

1. The Drude oscillator model (also called the core-shell model). In the shell model, the electronic polarizability is incorporated by representing an atom as a two-particle system: a "core" with charge z_i + q_i is attached by a harmonic spring (with spring constant k) to a "shell" with charge −q_i. [216] The atomic polarizability α_i is related to the force constant k of the harmonic spring connecting the core and shell by [96,216]

α_i = q_i²∕k

The shells are massless. For minimization of core-shell models the Hessian matrix ℋ contains both the cores and the shells, because the shells need to be minimized with respect to the cores too. [217] The set of vibrational frequencies can be obtained by diagonalization of the dynamical matrix D = M^(−1∕2) ℋ M^(−1∕2), where M is a diagonal matrix that contains the masses of the atoms. For use in MD, Mitchell and Fincham assigned small masses to the shells (typically taken to be less than 10% of the total particle mass) and proposed a method to determine their motion using adiabatic dynamics. [218]

2. The fluctuating-charge (FQ) model. The FQ model is based on the principle of electronegativity equalization (EE): charge flows between atoms until the electronegativities of the atoms are equalized. [96] The partial charges q_i on each atom of the molecule are found by minimizing the electrostatic energy of the system (equivalent to equalizing the electronegativities) in a given configuration, subject to the constraint that the total charge is conserved. Since the charges depend on the interactions with other charges located on the same or other molecules, their values change at every time step or sampled configuration during a simulation. During the charge-equalization process, charges may be constrained to move only within a molecule or allowed to move between any atom pairs; the latter option leads to charge-transfer effects. The method is further described in Subsection 4.13.
3. The point-polarizable dipole model. In the classical point-dipole approach, the polarization energy is described as the interaction between static point charges and dipoles and the dipole moments they induce. [96] A straightforward way to choose the sites of the inducible dipoles is to put them on all atoms of the system. [95] In a molecular simulation, numerical problems may occur when two inducible dipoles come spatially too close to each other: the dipolar interaction between them mutually enhances their induced dipoles to infinite size. This polarization catastrophe can be avoided in several ways, [95] for example by choosing van der Waals parameters large enough that interatomic distances always remain above the critical distance. Choosing dipole sites that do not reside on atoms is also possible, but leads to more complicated expressions for the forces on these atoms.
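For the core-shell model of item 1, the relation α = q²∕k follows from balancing the spring and electric forces on the massless shell; a minimal sketch with made-up numbers:

```python
def induced_dipole_core_shell(q_shell, k_spring, E_field):
    """Drude/core-shell model: in a uniform field E the massless shell relaxes
    to the displacement d where the spring force balances the electric force,
    k*d = q*E, giving an induced dipole mu = q*d = (q**2/k)*E = alpha*E."""
    d = q_shell * E_field / k_spring   # equilibrium shell displacement
    return q_shell * d                 # induced dipole moment

q, k = 2.0, 8.0                        # made-up shell charge and spring constant
alpha = q * q / k                      # shell-model polarizability alpha = q^2/k
mu = induced_dipole_core_shell(q, k, 0.5)
```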
The procedure for induced dipoles in the formulation of Nymand and Linse for calculating the electrostatic and induction contributions to the energy, forces, and virial in a molecular system, where each molecule is described by atomic charges, dipole moments, and anisotropic polarizabilities, is as follows: [213] 1. Calculate the electrostatic potential φ^stat_i and the electrostatic field E^stat_i,α from the charges and static dipole moments, according to Equation (241) and Equation (242), respectively. 2. Calculate the induced dipole moments μ^ind_i,α, either by prediction or by self-consistent iteration using Equation (237). 3. Calculate the electrostatic energy U_ele and the induction energy U_ind from Equation (239) and Equation (240). Due to the many-body nature of polarization, the interactions between all molecules change and have to be recomputed at every step. In MD simulations the full recomputation has to be done anyway, but simple MC moves work via computing energy differences between the new and the old state for only the atoms that are changed. Having to recompute the full energy for the new and old states means an order-of-magnitude penalty in efficiency. Lachet et al. therefore proposed a procedure to mitigate this limitation by using only the first term of the multipole expansion. [219] The induction energy is then given by

U_ind = −(1∕2) Σ_i α_i |E_i|²

where α_i is the atomic polarizability of interaction site i of the adsorbate, and E_i is the electrostatic field at the position of atom i due to the partial charges of all the framework atoms. The approach accounts solely for the polarization of the adsorbates by the framework and neglects the polarization caused by induced dipoles (called the "back-polarization").
In this way, the difference in induction energy with respect to the previous configuration can simply be added as another energy term in the acceptance rule of the Monte Carlo algorithm, and the computational cost is similar to that of simulations without explicit polarization. Lachet et al. showed that for the xylene/NaY zeolite system this procedure captures around 94% of the total induction energy (the back-polarization effect accounts for only about 6%). [219] When explicitly accounting for polarization, one has to ensure that the force field parameters describing the remaining interactions do not already include an implicit polarization contribution; otherwise, polarization would be double counted, once implicitly and once explicitly. If a standard force field is used as the starting point for the development of a polarizable force field, the implicit polarization must be removed, because current force fields are likely calibrated to reproduce certain experimentally observed properties.
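A minimal sketch of this first-order induction scheme (toy charges and polarizability, Coulomb prefactor omitted as elsewhere in this review). The induction energy is always negative and scales quadratically with the framework charges, and only a single field evaluation per adsorbate site is needed inside the MC acceptance rule:

```python
def field_from_charges(site, framework):
    """Electrostatic field at `site` from fixed framework point charges,
    E = sum_j q_j (r - r_j)/|r - r_j|^3 (Coulomb prefactor omitted)."""
    Ex = Ey = Ez = 0.0
    for qj, rj in framework:
        dx, dy, dz = site[0] - rj[0], site[1] - rj[1], site[2] - rj[2]
        r3 = (dx * dx + dy * dy + dz * dz) ** 1.5
        Ex += qj * dx / r3
        Ey += qj * dy / r3
        Ez += qj * dz / r3
    return Ex, Ey, Ez

def induction_energy(adsorbate_sites, framework):
    """First-order induction energy U_ind = -1/2 sum_i alpha_i |E_i|^2,
    neglecting back-polarization (the approximation of Lachet et al.)."""
    u = 0.0
    for alpha_i, ri in adsorbate_sites:
        E = field_from_charges(ri, framework)
        u -= 0.5 * alpha_i * (E[0] ** 2 + E[1] ** 2 + E[2] ** 2)
    return u

framework = [(1.0, (0.0, 0.0, 0.0)), (-1.0, (4.0, 0.0, 0.0))]  # toy charges
sites = [(1.5, (2.0, 1.0, 0.0))]                               # (alpha_i, position)
u1 = induction_energy(sites, framework)
```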

Electric-Field Dependent Potentials
Cicu et al. proposed a general method to include electric-field-dependent terms in empirical potential functions representing interatomic interactions, [220] and this subsection is taken from their work. Consider a system of N charged particles, interacting with each other via a potential energy function that depends on their positions and on the electric field acting upon each of them, generated by the other particles (and thus itself dependent on the positions):

U = U({r_i} | {E_i})

where r_i are the positions of the particles and E_i are the electric fields. The force acting on particle i is given by Equation (314), [220] where E stands for {E_i}. In particular, it is worth noting that the terms in Equation (314) that include the 3N × 3N matrix correspond to many-body forces involving, in principle, all the particles of the system, as is better illustrated if the atom-pair approximation is assumed. In this approximation, the potential energy is represented by the sum of pairwise interactions U_p(r_ij | E) between i and j. Equation (314) and Equation (315) then become [220] expressions in which, again, E stands for {E_i} and the terms containing the electric-field gradients are grouped into two sums: the former includes the gradient of the electric field acting on particle i, for which the force is calculated, and the latter includes the gradients of the electric fields acting on all the other particles, multiplied by the derivatives, with respect to these fields, of the potential terms involving particle j, even though those terms do not explicitly include any interaction with particle i. It is worth noting that in general the last group of terms is not zero; indeed, it ensures that Newton's third law (as generalized for many-body forces) is obeyed.

Charged Frameworks
A minor complication arises when the framework has a nonzero net electric charge compensated by counterions. Let us assume the framework is kept fixed, but the cations are allowed to move. Although the system as a whole may be electrically neutral, the omission of framework-framework interactions from the calculation means that the Ewald Fourier contributions must be split into separate sums, each of which is net-charged. Not only must the interactions of a framework atom with the other framework atoms be omitted, but also the interactions with all of its periodic images.
Splitting the potential energy into separate contributions involves computing cross terms between components A and B. In real space this is trivially accomplished, but the separation in reciprocal space is more difficult. Cross-term interaction energies in reciprocal space are given by Equation (318). It is important to note that Equation (318) only applies when both of the separate sums over species A and B are charge neutral. This is in general not the case. For example, in MC adsorption simulations one often encounters a negatively charged zeolite charge-compensated by either cations or protons. All-silica zeolites devoid of cations are neutral, but framework ions like aluminum induce a net charge. It is customary to treat charged periodic systems via a uniform neutralizing background plasma. However, this leads to serious artifacts in both the system energy and the pressure, and to unrealistic behavior. [221] Bogusz et al. corrected these artifacts by instituting a net-charge correction term that consists of subtracting off the Ewald sum for a single particle with charge equal to the net charge. [221] This correction implicitly restores α-independence in net-charged systems. We define U_ion as the reciprocal energy of a single ion placed at the center of charge. If only the energy is needed, the particle can simply be placed at the origin. Note that this term then depends only on the shape and size of the simulation cell and has to be computed only once if the cell does not change. The charge-charge interaction energy U_AB of components A and B is then given by an expression in which δ_AB (equal to 1 if A = B and zero otherwise) denotes the Kronecker delta. These expressions are valid for any kind of net-charge arrangement, including a net charge of the total system.
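A sketch of the single-ion term (cubic cell, Gaussian-style units with the Coulomb prefactor omitted as elsewhere in this review): reciprocal sum, self-term, and the k = 0 background term of the neutralizing plasma. The total reproduces the known Wigner self-energy ξq²∕(2L), with ξ ≈ −2.837297 for a cubic cell, and, as the correction requires, is independent of α:

```python
import math

def single_ion_ewald(q_net, L, alpha=None, kmax=8):
    """Ewald energy of a single point charge q_net in a cubic cell of edge L
    with a uniform neutralizing background: reciprocal sum + self-term +
    background (k = 0) term. Subtracting this term is the net-charge
    correction of Bogusz et al. The real-space image sum is negligible for
    the default alpha (erfc(alpha*L) ~ 1e-12)."""
    if alpha is None:
        alpha = 5.0 / L
    V = L ** 3
    u_recip = 0.0
    for nx in range(-kmax, kmax + 1):
        for ny in range(-kmax, kmax + 1):
            for nz in range(-kmax, kmax + 1):
                if nx == ny == nz == 0:
                    continue
                k2 = (2.0 * math.pi / L) ** 2 * (nx * nx + ny * ny + nz * nz)
                u_recip += (2.0 * math.pi / V) * math.exp(-k2 / (4.0 * alpha ** 2)) / k2
    u_self = -alpha / math.sqrt(math.pi)
    u_background = -math.pi / (2.0 * alpha ** 2 * V)
    return q_net ** 2 * (u_recip + u_self + u_background)
```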

MD Implementation
In an MD simulation, all atoms move according to Newton's equations of motion and the forces computed on the atoms. Equation (296) shows that we basically need to compute two sums, over cos(k · r_i) and sin(k · r_i). However, cosines and sines are very expensive to compute, on modern CPUs roughly 10 times more expensive than a multiplication and 15 times more expensive than an addition or subtraction. One trick is to compute the sine and cosine only for the first k-vector and use recursion to build up the higher terms. The Chebyshev method is a recursive algorithm for finding the nth multiple angle knowing the (n − 1)th and (n − 2)th values:

cos(nx) = 2 cos(x) cos((n − 1)x) − cos((n − 2)x)
sin(nx) = 2 cos(x) sin((n − 1)x) − sin((n − 2)x)

Alternatively, we can start with cos(a + b) = cos(a) cos(b) − sin(a) sin(b), substitute a = (n − 1)x and b = x, and obtain for cos(nx) and sin(nx)

cos(nx) = cos((n − 1)x) cos(x) − sin((n − 1)x) sin(x)
sin(nx) = sin((n − 1)x) cos(x) + cos((n − 1)x) sin(x)

This means we have to compute cos(x) and sin(x) only once, and build up the multiple angles via the recursion formulas using additions/subtractions and multiplications only. A doubling in efficiency can be gained by making use of the symmetry S(−k) = S̄(k) of the reciprocal lattice: in the loop over all k-vectors k = 2π(n_x∕L_x, n_y∕L_y, n_z∕L_z), the index n_x needs to range only over the values 0 to k_max. [222] The Ewald method has an execution time that scales as N^(3∕2) for an appropriate α and number of k-vectors. [223] When the total time is optimized, it is roughly equally divided between the real- and reciprocal-space parts of the calculation. [223,224] The parameter α dictates the rate of convergence of the real- and Fourier-space sums. The real-space sum is truncated at r_cut, and α must be chosen such that the contributions beyond r_cut are negligible. The errors for given cutoffs can be estimated. [224][225][226] The cutoff r_cut is usually set to r_cut = L∕2; in practice, α is close to 6∕L, and typically 200-300 wavevectors are used in the reciprocal-space sum.
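The angle-addition recursion above can be sketched as follows; only one cosine and one sine evaluation is needed per dimension:

```python
import math

def multiple_angles(x, nmax):
    """Build cos(n*x) and sin(n*x) for n = 0..nmax from cos(x), sin(x) via
    cos(nx) = cos((n-1)x)*cos(x) - sin((n-1)x)*sin(x)
    sin(nx) = sin((n-1)x)*cos(x) + cos((n-1)x)*sin(x)
    using only multiplications and additions/subtractions."""
    c1, s1 = math.cos(x), math.sin(x)
    cos_n, sin_n = [1.0], [0.0]   # n = 0
    for n in range(1, nmax + 1):
        cos_n.append(cos_n[-1] * c1 - sin_n[-1] * s1)
        # cos_n[-2] is still cos((n-1)x) after the append above
        sin_n.append(sin_n[-1] * c1 + cos_n[-2] * s1)
    return cos_n, sin_n
```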
[78] DL_POLY computes α and the k_max-vectors given an input precision and cutoff r_cut, where L_x, L_y, and L_z are the cell lengths and the function "rint" rounds to the nearest integer. These estimates for the Ewald parameters are largely based on relative error estimates for the real- and reciprocal-space terms. In LAMMPS, [227] the parameter α is chosen from such an error estimate, and the k_max are obtained by iterating from k_max = 1 until the error estimate is smaller than the desired precision. As mentioned, the structure factor S(k) (Equation (207)) can be viewed as a discrete Fourier transform of a set of charges placed irregularly within the unit cell. The Smooth Particle Mesh Ewald (SPME) method is an FFT-based method for the fast evaluation of electrostatic interactions under periodic boundary conditions. [228] The reciprocal-space cost of the Ewald summation is reduced to O(N log N), and hence SPME is much faster for large systems. However, the prefactor is much larger, and it is estimated that the cross-over point lies around 10 000 ions. [217] Cisneros et al. reviewed the classical Ewald summation, particle-particle particle-mesh, particle-mesh Ewald, multigrid, fast multipole, and local methods. [229] As we shall see in the next section, the conventional Ewald method is more convenient for MC algorithms. An alternative for condensed-matter systems is to omit the Fourier part, because in these systems the inherent screening results in an effectively short-ranged interaction. An example is the Wolf direct summation method. [230] Rahbari et al. showed that this direct summation method gives results similar to the Ewald summation for aqueous methanol mixtures. [231]

MC Implementation
Most MC moves are based on changing the coordinates of a single molecule and computing the energy difference of the new trial molecule with respect to the old one. As the positions of all the other molecules are unchanged, it is unnecessary to recompute the reciprocal-space energy contributions of the non-moving molecules. An important example is a rigid zeolite. The precomputing and storing of the Ewald sums makes the Ewald-summation method the preferred method for Monte Carlo simulations.
We assume the simulation starts with a full computation of Equation (322), and that the sums per k-vector are stored per species. For each of the n species, both these stored sums and the net charge of the species need to be kept. After an accepted move, the stored sums and net charges are updated. The energy difference between the selected molecule in the new state and the old molecule of a given species can then be calculated from these stored quantities alone.
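The bookkeeping can be sketched as follows. This is a minimal illustration, in which the symmetry-reduced k-vectors, the precomputed prefactors A(k) (carrying the exp(−k²/4α²)/k² weights and constants), and the function names are our own simplifications:

```python
import numpy as np

def fourier_energy(S, prefac):
    """U_rec = sum_k A(k) |S(k)|^2, with stored structure factors S(k)."""
    return np.sum(prefac * np.abs(S) ** 2)

def trial_move_delta(S, prefac, kvecs, q, r_old, r_new):
    """Reciprocal-space energy difference for displacing one molecule:
    only the moved charges change the stored structure factors S(k)."""
    dS = (np.exp(1j * (kvecs @ r_new.T)) - np.exp(1j * (kvecs @ r_old.T))) @ q
    dU = np.sum(prefac * (np.abs(S + dS) ** 2 - np.abs(S) ** 2))
    return dU, dS   # on acceptance: S += dS
```

On rejection, the stored sums are simply left untouched, so the full Fourier sum never needs to be recomputed during the run.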

Rigid Frameworks
The effect of flexibility on adsorption in zeolites is known to be small. [232] Significant speedups can be obtained by keeping the framework fixed and pre-computing the potential energy surface induced by the framework. [233,234] Instead of looping over all framework atoms to compute the host-adsorbate energy at each time step, one can construct a 3D grid before the simulation and then obtain the energy by interpolation during the simulation. The more points in the grid, the higher the accuracy. RASPA [235] implements the triclinic grid interpolation scheme in three dimensions of Lekien and Marsden. [236,237] The algorithm is based on a specific 64 × 64 matrix that provides the relationship between the derivatives at the corners of an element and the coefficients of the tricubic interpolant for that element. The cubic interpolant and its first three derivatives are continuous and consistent. The same grids can therefore be used for both MC and MD with no additional energy drift besides the drift due to the integration scheme, that is, the energy gradients are the exact derivatives of the energy at each point in the element. For LJ potentials a separate grid needs to be generated for each set of LJ parameters. However, for the real part of the Ewald summation a single grid generated with a probe charge of unity suffices: the real part is then obtained by simple multiplication of the grid value with the actual charge of the atom. Note that this grid depends on α, and hence the real-space grid depends on the number of unit cells used.
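The idea can be illustrated with a simple interpolation kernel. The review describes tricubic interpolation (Lekien and Marsden); the sketch below uses trilinear interpolation on an orthorhombic grid, which conveys the lookup scheme but, unlike the tricubic scheme, does not give continuous derivatives:

```python
import numpy as np

def grid_energy(grid, spacing, r):
    """Look up the host-adsorbate energy at position r from a
    precomputed orthorhombic grid using trilinear interpolation."""
    u = np.asarray(r) / spacing
    i = np.floor(u).astype(int)    # lower corner of the grid element
    f = u - i                      # fractional position inside it
    e = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (f[0] if dx else 1.0 - f[0]) \
                    * (f[1] if dy else 1.0 - f[1]) \
                    * (f[2] if dz else 1.0 - f[2])
                e += w * grid[i[0] + dx, i[1] + dy, i[2] + dz]
    return e
```

During the simulation, a lookup costs eight grid accesses regardless of the number of framework atoms, which is where the speedup comes from.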

Bond-Dipoles in Periodic Systems
The MM2 and MM3 force fields calculate the electrostatic interaction in non-charged polar molecules using bond dipoles. The electrostatic interaction then depends on the angle between the two dipoles and on the angles between each dipole and the vector connecting them. There is little difference with the charge model if properly parameterized; however, almost all force fields prefer the charge model because it is easier to parameterize. Amirjalayer et al. extended the MM3 force field and used a building-block approach for the parameterization of IRMOF-1. [238] We derive here the expressions needed to integrate a host bond-dipole model with adsorbate models that contain atomic charges within the Ewald summation framework. Consider bond-dipoles between pairs of atoms a and b. We denote r_i^ab as the separation (bond) vector of the two atoms in bond-dipole i, r_i^ab = |r_i^ab| as the bond distance, and μ_i as the bond-dipole with predefined (fixed) magnitude. The dipole points along the normalized bond vector and is located at the center r_i of the bond. Using B_0 = 1/r, B_1 = 1/r^3, B_2 = 3/r^5, and B_3 = 15/r^7, we can write the energies for the charge-charge, charge-bond-dipole, and bond-dipole-bond-dipole interactions, where N_ch is the number of charges, N_bd the number of bond-dipoles, r_ij = r_i − r_j the separation vector between charges and/or centers of bond-dipoles, and q_j the charge of ion j. Note that although the energy expressions are the same for bond-dipoles and point dipoles, the expressions for the forces and the stress/pressure are not: for bond-dipoles, the forces are exerted on the two atoms of the bond-dipole. The force expressions follow for charge i and charge j, for bond-dipole i and charge j, and for bond-dipole i and bond-dipole j.
The stress and pressure (related to the trace of the stress tensor) can be obtained using the results for point-dipoles [213] but corrected: the contribution of each charge-charge, charge-bond-dipole, and bond-dipole-bond-dipole interaction involves the volume V and the corresponding force expressions given above. This procedure is related to the conversion between atomic and molecular stress for rigid molecules; usually this conversion produces a torque and an asymmetric stress that needs to be symmetrized. The expressions given above still apply for the real part of the Ewald summation if screened counterparts involving the error-function complement erfc are used. The Fourier part of the charge-charge, charge-bond-dipole, and bond-dipole-bond-dipole energies follows analogously, as do the forces on a charge exerted by other charges, the forces on a charge exerted by the bond-dipoles, the forces on atoms a and b of bond-dipole i exerted by charges, and the forces on atoms a and b of bond-dipole i exerted by other bond-dipoles. In Table 9 we have compiled numerical data for various systems with charges and/or bond-dipoles. The Ewald results differ from the results for a finite system, and the difference is due to the infinite periodic images. For this box size the results are nevertheless close, and they serve as a good check for the derived expressions. Some bond-dipole systems are chosen such that the results are identical in energy to the point-dipole systems discussed by Nymand and Linse. [213] Note that the forces differ, albeit that they are related (the bond-dipole expressions contain a term identical to the one for point-dipoles).

Charge Equilibration Methods
Related to electrostatics is the problem of computing atomic charges in periodic systems. In charge equilibration models, effective atomic charges are obtained by minimizing a potential energy function in which the adjustable parameters are atomic parameters. [239] Ongari et al. recently evaluated various charge equilibration methods to generate electrostatic fields in nanoporous materials. [240] The following section is based on the papers of Wilmer and Snurr. [241,242] They begin by assuming a Taylor-series form for the energy of an isolated (gas-phase) atom as a function of its charge, E_A(q), centered about some particular partial charge q*. The original Qeq method [243] expands around q* = 0, but when atoms are expected to have large positive charges, such as the transition metals in MOFs, the atomic energy given by a second-order Taylor series centered at q* = 0 is unlikely to yield accurate results, because the energy is based on an extrapolation far from the measured value. Significantly improved results are obtained for mono- or divalent ions in metal-organic frameworks by Taylor expanding around the oxidation state. By adding and subtracting E_A(q* + 1) and E_A(q* − 1) from E_A(q*), expressions for the first and second partial derivatives can be obtained.

Table 9. Calculated energy and forces for charges and bond-dipoles as obtained from the finite and infinite system (Ewald summation). Box size L_x = L_y = L_z = 10, reduced units: the unit of charge is 1e, the unit of length is 1 Å, and 4πε_0 = 1. The spherical cutoff is L/2. The Ewald results are corrected for nonzero net charge of the system using the method of Bogusz et al., [221] e.g., the uncorrected energy of a single +1 charge is U = −0.1402549, while the corrected energy is U = 0. The definition of the bond-dipole is that it points from a to b, toward the partially positively charged atom if the magnitude is positive.
Here, χ_q* is the electronegativity (also known as the Mulliken electronegativity) and J_q* is the idempotential (also known as the atomic hardness). The energy of an atom can now be described solely as a function of its charge and measured parameters that are available in the literature for most of the periodic table (currently up to Z = 86). This leads to a quadratic energy function for every atom based entirely on measured quantities. The energy of the entire system E_sys can be expressed as a sum of individual atomic energies and pairwise Coulombic interactions, [242] where E_k^Orb is the orbital energy term for the kth atom (a pairwise damping term that prevents infinite charge separation when two atoms are brought arbitrarily close together), and E_k^Coul is the energy of the charge on the kth atom interacting with the charges of all other atoms in the system. For a non-periodic system, E_k^Coul can be computed using a direct sum, but for a periodic system the Ewald methodology is needed.
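These finite differences can be written out explicitly. In the sketch below (an illustrative function of our own), the atomic energies at q* − 1, q*, and q* + 1 yield the electronegativity as the first derivative and the idempotential as the second; for q* = 0 these reduce to the familiar combinations (IP + EA)/2 and IP − EA of the ionization potential and electron affinity:

```python
def chi_and_J(E_minus, E_0, E_plus):
    """Central finite differences of the atomic energy E_A(q) around q*:
    chi = dE/dq (Mulliken electronegativity) and J = d2E/dq2
    (idempotential), from the energies at q* - 1, q*, and q* + 1."""
    chi = (E_plus - E_minus) / 2.0
    J = E_plus - 2.0 * E_0 + E_minus
    return chi, J

# Illustrative values for carbon (eV): IP = 11.26, EA = 1.26, expanded at
# q* = 0, so E(+1) = IP and E(-1) = -EA relative to E(0) = 0.
chi_C, J_C = chi_and_J(-1.26, 0.0, 11.26)   # chi = 6.26 eV, J = 10.00 eV
```

Expanding around the oxidation state simply means feeding in the energies of, e.g., the +1, +2, and +3 ions instead of the −1, 0, and +1 states.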
The charges q_1, q_2, …, q_N are determined by finding the minimum energy in Equation (382) in a noniterative manner (regardless of whether the system is periodic), subject to the constraint that the charges sum to the net charge of the system, Σ_{k=1}^N q_k = q_tot (Equation (383)), where q_tot must be zero in the periodic case. The general approach to minimizing a function of many variables subject to constraints is via Lagrange multipliers, where the gradient vector spans the N-dimensional space defined by the partial charges. Matching the vector components of Equation (384), we get N − 1 equations (Equation (385)). Combining Equations (385) and (383) provides N equations with N unknowns (the charges), which can be solved simultaneously, without iteration. [242]
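For a finite cluster with a bare 1/r Coulomb kernel (no Ewald, no orbital damping term, energies and distances in consistent reduced units), the constrained minimization reduces to a single linear solve. The sketch below is our own illustration of this noniterative structure, not the production algorithm of refs. [241,242]:

```python
import numpy as np

def qeq_charges(chi, J, pos, q_tot=0.0):
    """Noniterative charge equilibration for a finite cluster:
    minimize sum_k (chi_k q_k + J_k q_k^2 / 2) + sum_{k<l} q_k q_l / r_kl
    subject to sum_k q_k = q_tot, via a Lagrange multiplier mu."""
    n = len(chi)
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    for k in range(n):
        A[k, k] = J[k]
        for l in range(n):
            if l != k:
                # bare Coulomb kernel; a periodic system would need Ewald here
                A[k, l] = 1.0 / np.linalg.norm(pos[k] - pos[l])
        A[k, n] = -1.0             # -mu from the constraint
        b[k] = -chi[k]             # stationarity: chi_k + sum(...) - mu = 0
    A[n, :n] = 1.0                 # charge-conservation row
    b[n] = q_tot
    return np.linalg.solve(A, b)[:n]
```

At the solution, every atom has the same effective electronegativity χ_k + J_k q_k + Σ_l q_l/r_kl = μ, which is the equilibration condition.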

Energy Expansion
The potential energy surface of a (periodic) system can be Taylor-expanded around a configuration x of the system, [162] where the matrix of second derivatives is usually referred to as the Hessian matrix and the superscript ‡ denotes the transpose of a vector or matrix. The real potential energy surfaces of common force fields are rarely harmonic, but the expansion is nevertheless usually truncated at second order (i.e., harmonic analysis). This assumes the energy surface is at least locally quadratic, and the expansion can be used iteratively to find the nearest local minimum. The steps through the energy landscape can therefore not be taken too large. If the potential energy surface truly were quadratic, a method like Newton-Raphson would find the minimum energy configuration in a single step. At the minimum energy configuration the first derivatives are zero, and all harmonic information on the system is therefore contained in the Hessian matrix. The generalized Hessian, having dimensions (3N + 6) × (3N + 6) with N the number of atoms, consists of the force-constant matrix (3N × 3N) and the Born term (6 × 6), the second-order derivatives of the internal energy with respect to position and strain, respectively, together with the position-strain cross-terms. The strain (and also the stress tensor) is symmetric and can be simplified to a 6D vector using Voigt notation: ε = (ε_xx, ε_yy, ε_zz, ε_yz, ε_xz, ε_xy) = (ε_1, ε_2, ε_3, ε_4, ε_5, ε_6). The Born term accounts for distortions of the lattice. The 0 K elastic tensor is defined as the derivative of the stress with respect to the strain [244,245] at zero gradient h = 0 and can be expressed in terms of the generalized Hessian. [181] A relaxation term arises when more than one particle is present in the unit cell: [181] when the system is strained, the atoms need to relax relative to one another, because both before and after the applied strain the system must be in a state of zero net force.
Note that the generalized Hessian encompasses the conventional Hessian and hence all information on the vibrational spectrum. In addition to the vibrational information, the elastic constants also contain information on the gradients and second derivatives with respect to the cell, and on the cross terms of the position and cell gradients.

Normal Mode Analysis
Normal Mode Analysis (NMA) has become one of the standard techniques in the study of the dynamics of molecules. [246] It is the study of motions of small amplitude in harmonic potential wells by analytical means. Here, "small" means small enough that the harmonic approximation holds; usually this means that harmonic vibrational analysis provides a very good description of the system at low temperature. A NMA contains all timescales. Its disadvantages include being limited to motion around a single stable conformation and lacking anharmonic features that are small but sometimes important. Despite these limitations, NMA has found widespread use. The equation of motion of a molecule in a harmonic well is M ẍ = −H x, where M is a diagonal matrix that contains the masses of the atoms and H is the Hessian. [87,246] Substitution of the general solution x = A cos(ωt + φ) into this equation, and rearranging to remove the mass matrix from one side, yields a standard eigenvalue problem H′A′ = λA′. Here, H′ = M^(−1/2) H M^(−1/2) is the mass-weighted Hessian matrix, A′ = M^(1/2) A is an eigenvector of the mass-weighted Hessian, which needs to be un-mass-weighted to obtain the normal mode A = M^(−1/2) A′, and λ = ω². The mass-weighted Hessian is usually projected to remove translation, rotation, and constraints. The harmonic frequencies are related to the eigenvalues of the projected, mass-weighted Hessian by ω_i = √λ_i. The eigenvectors give the direction and relative amplitude of each atomic displacement.

The amplitude A_i for displacement along normal mode i is arbitrary. If the atoms are undergoing thermal fluctuations along each mode, the standard deviation of each atom is obtained by setting A_i = √(2k_B T/m_n), where m_n is the atomic mass of atom n. [87,246] A trajectory of normal mode i using N frames is generated by sampling the displacement along the mode at N equally spaced phases over one period. The magnitude of the motion depends on temperature and is inversely proportional to the frequency. The largest contribution to the atomic displacement therefore comes from the lowest-frequency normal modes, whereas for high-frequency eigenvectors only a few atoms contribute. As mentioned, the low-frequency modes determine large-scale framework flexibility [31,32] and are responsible for structural rearrangement as a function of external stimuli. [20]
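The core of the procedure (without the projection of translation, rotation, and constraints) can be sketched as follows; the function name and units are illustrative:

```python
import numpy as np

def normal_modes(hessian, masses):
    """Diagonalize the mass-weighted Hessian H' = M^(-1/2) H M^(-1/2).
    Returns the harmonic frequencies omega_i = sqrt(lambda_i) and the
    un-mass-weighted normal modes A = M^(-1/2) A' (as columns)."""
    m = np.repeat(masses, 3)                  # one mass per Cartesian dof
    inv_sqrt_m = 1.0 / np.sqrt(m)
    Hp = hessian * np.outer(inv_sqrt_m, inv_sqrt_m)
    lam, Ap = np.linalg.eigh(Hp)
    omega = np.sqrt(np.clip(lam, 0.0, None))  # clip tiny negative round-off
    modes = inv_sqrt_m[:, None] * Ap
    return omega, modes
```

In a production code, the six (or, for constrained systems, more) zero modes would be projected out before diagonalization rather than merely clipped.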

Optimization
One can differentiate the energy Taylor expansion, Equation (386), with respect to x, set the result to zero, and solve to obtain the Newton-Raphson step [87] Δx = −H^(−1) h. The Newton-Raphson step can also be expressed as a sum over the eigenvectors e_i (called local principal modes) and eigenvalues λ_i of the Hessian matrix, where e_i^T h is the component of the gradient along eigenmode e_i. For zero eigenvalues, the corresponding step component is set to zero. A zero eigenvalue means that for a displacement in the direction of the eigenvector the energy does not change, while positive and negative values mean an increase and a decrease in energy, respectively. Therefore, a true minimum has all positive eigenvalues, and a first-order saddle point has exactly one negative eigenvalue. The Newton-Raphson step minimizes along the eigenvectors with positive eigenvalues and maximizes along eigenvectors with negative eigenvalues. Therefore, when the starting configuration has the correct curvature (the desired number of negative eigenvalues), the Newton-Raphson step is a good one to take. In general, however, the step must be modified to obtain a structure of the desired curvature. [248] A simple but very powerful modification is to use a shift parameter that shifts the eigenvalues. [249] Simons et al. [250] derived an equation for the shift parameter that can be solved by iteration. Note that, besides minimization, the method can also be used to find saddle points. [248] Taking a big step leads to an increased likelihood of moving out of the region where the Hessian is valid. Therefore, it makes sense to set a maximum tolerance on the size of any calculated Δx and scale down Δx accordingly if this maximum is exceeded. [250] The mode-following technique is also applicable to minimizations of large quantum mechanical systems.
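The eigenmode form of the step can be sketched directly (our own minimal illustration, without the eigenvalue shift or step-size capping):

```python
import numpy as np

def newton_raphson_step(hessian, gradient, tol=1e-10):
    """Newton-Raphson step in the eigenmode basis:
    dx = -sum_i (e_i^T h / lambda_i) e_i, skipping (near-)zero modes."""
    lam, vecs = np.linalg.eigh(hessian)       # columns of vecs are the e_i
    comp = vecs.T @ gradient                  # gradient component per mode
    step = np.zeros_like(gradient)
    for i, l in enumerate(lam):
        if abs(l) > tol:                      # zero eigenvalue: no step
            step -= (comp[i] / l) * vecs[:, i]
    return step
```

Replacing λ_i by λ_i − λ_shift in the loop implements the eigenvalue-shifting modification, which can force the step toward a stationary point of the desired curvature.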
For such large systems, the analytical evaluation of the Hessian can be computationally and memory demanding, but efficient parallel distributed-data algorithms have been developed. [251] As an illustration, the minimization process using the generalized Hessian is shown in Figure 3 for MIL-53 with 0, 1, or 2 water molecules per unit cell. The initial configuration is the experimental crystal structure [252] with the water at random positions. To find the lowest energy configuration, the minimization was repeated 1000 times for the systems containing water. Initially, many negative eigenvalues are present, and their number increases with the amount of water in the system. The number of minimization steps therefore depends on the amount and initial positions of the adsorbates. Slowly but steadily the number of negative eigenvalues is reduced to zero, although it can increase before decreasing again; this usually corresponds to a substantial structural change (angle and shape change). As can be seen, once zero negative eigenvalues have been reached the convergence is very fast. The stopping criterion is that each of the gradients on the particles and on the cell must be smaller than 10^(−10) kJ mol^(−1) Å^(−2). Initially the gradients are high and the volume increases; this behavior is stronger at higher water content, because the water is placed randomly inside the structure. During the minimization the structure relaxes, and with no water or one water molecule per unit cell the volume stays close to the open-pore starting structure. However, when two water molecules are optimized, the structure collapses to the narrow-pore structure.
The method can be of great utility in force field development and to assess the influence of adsorbates on the framework structure. It is able to provide the correct solution, that is, zero gradients and the desired number of negative eigenvalues, by construction, where other methods can give wrong solutions. The method can be applied to unit cell minimization of periodic systems such as metal-organic frameworks and zeolites, for vibrational analysis (IR spectra and mode analysis), for force field development and for the computation of elastic constants at zero Kelvin. In Subsection 6.9 we will discuss an application of this methodology: the fitting of MOF force fields on elastic constants.

Force Field Optimization of Biological Systems
Figure 3. Mode-following minimization of MIL-53 using the model of Salles et al. [247] with zero, one, and two water molecules per unit cell: a) the number of negative eigenvalues, b) the convergence of the energy, c) the convergence of the maximum of the gradients on the particles (translation and rotation) and on the cell, and d) the volume of the unit cell. Reproduced with permission. [163] Copyright 2009, American Chemical Society.

As an example of how a biological force field is optimized, let us discuss the optimization of the CHARMM force field as described in ref. [253]. The CHARMM potential energy function, Equation (401), is taken from ref. [118]. Figure 4 is a flow diagram of the CHARMM parameter optimization procedure. [253] Loops I, II, and III were included in the optimization of the CHARMM22 force field for nucleic acids, with loop IV representing an extension of that approach included in the CHARMM27 optimization.
In the CHARMM22 nucleic acid parameter optimization, a variety of model compounds were selected and target data were collected on those compounds. This target data included both experimental and ab initio data and acted as the sole basis for the parameter optimization. Empirical force field calculations were performed on the model compounds, and the computed properties were compared with the target data. The parameters were then manually adjusted to better reproduce the target data. Part of this process involved iterative procedures where, upon changing one class of parameters, a set of previously optimized parameters was re-adjusted if necessary (loops I, II, and III in Figure 4). For example, a set of partial atomic charges would be assigned to a model compound, following which dihedral parameters would be adjusted to reproduce a target potential energy surface for that compound. The partial atomic charges would then be reinvestigated because of possible changes in geometry associated with the optimization of the dihedral parameters, which could affect the reproduction of the target data for the charge optimization. This approach yields a parameter set that accurately reproduces a variety of internal (e.g., geometries, vibrational spectra, conformational energetics) and interaction (e.g., interactions with water, heats of sublimation) target data for the selected model compounds. Once the optimization procedure at the model compound level was complete, the resultant parameters were used to perform simulations of B and Z DNA in their crystal environments, both of which yielded satisfactory agreement with experiment. At this point, the CHARMM22 parameterization was considered complete.
Re-optimization of the protein backbone parameters to deviate systematically from the QM energetic data led to improved properties for the protein backbone. This additional procedure is represented by loop IV in Figure 4. The need for this additional loop may reflect limitations associated with the level of theory of the QM data as well as the simplified form of the potential energy function in Equation (401), and it emphasizes the importance of including macromolecular properties as part of the target data for the parameter optimization procedure. The fitting procedure highlights that all the terms in Equation (401) influence each other. In iterative schemes, the long-range interactions are usually fixed first before optimizing the internal parameters. Recent developments and future trends in force fields for biomolecular simulations have been reviewed by Nerenberg and Head-Gordon. [254]

Force Field Optimization for Adsorption and Diffusion in MOFs and Zeolites
In contrast to biological systems, MOFs and zeolites are usually treated as host-guest systems: the host is the framework, and the guests are small adsorbate molecules that adsorb inside the nanopores. The adsorbates experience the confinement induced by the structure. Adsorption is driven by dispersion interactions, and it is crucial to accurately describe the long-range interactions in the system. There is a natural separation of the system into a model for the adsorbates, a model for the adsorbate-framework interactions, and, if the framework is taken as flexible, a model for the framework itself.
The long-range interactions are: 1) the Coulombic interactions, and 2) the van der Waals interactions. A common workflow is first to develop models for the adsorbates. The TraPPE models aim to have a high degree of accuracy in the prediction of thermophysical properties when applied to a range of different compounds, different state points, different compositions, and different properties. This makes TraPPE one of the few force fields generally suitable for materials and industrial applications. For nonbonded interactions, parameters for representative simple molecules are found by empirical fitting procedures that reproduce experimental phase equilibrium data, typically the vapor-liquid coexistence curve (VLCC) but also binary mixtures and vapor-solid equilibria. Fitting to phase equilibrium data provides accurate benchmarks against multiple phases, temperatures, and densities.
The advantage of adsorbate models that reproduce phase equilibrium data is that the saturation value of the adsorption isotherm is well-reproduced by construction. This is important because it allows an examination of the state of the pores of the framework: is there pore blocking? Are there remaining solvent or template molecules in the structure? The TraPPE model and many others use the approximation of fixed point charges. For small molecules, the charges follow from the dipole or quadrupole moment. However, this approximation immediately shows why water is difficult to model: it has a different dipole moment in the gas phase than in the liquid phase, and the bend angle is altered in the adsorbed state. One approach to handle this is to make the potentials electric-field dependent. [220] The framework atoms are also usually modeled with the LJ+Coulombic approach. The next subsection focuses on how to obtain point charges. Once the charges are determined, the next step is to obtain the van der Waals parameters. We discuss two ways: 1) fitting to vapor-liquid equilibrium data for the adsorbate-adsorbate interactions, and 2) fitting to inflection points in isotherms for the adsorbate-framework interactions. A further subsection deals with QM-derived potentials for open-metal sites. Then we discuss how to obtain models for the framework itself, and we close with a discussion of force fields for modeling mechanical properties.

Point Charges
Charge is not a quantum mechanical observable. Electrons are smeared out, and their charge is shared among nearby atoms. We are therefore faced with the difficult problem of how to map the electron density onto classical atomic point charges. [255] Many algorithms have been defined over the past 60 years to divide molecules into atomic parts, [256] for example Mulliken Population Analysis (MPA), [257] Natural Population Analysis (NPA), [258] Bader's AIM scheme, [259] Lowdin Population Analysis (LPA), [260] Hirshfeld partitioning, [261] and Hirshfeld-I partitioning. [262] One approach is to use the concept of Atoms in Molecules (AIM) [263] and introduce AIM subspaces within the molecule. Hirshfeld-I has been extended to periodic bulk materials, [264] and Verstraelen et al. proposed a simple Extended Hirshfeld (Hirshfeld-E or HE) scheme to solve an ambiguity in the Hirshfeld-I scheme when applied to inorganic oxide clusters. [265] Manz and Sholl developed an AIM-based method called the Density Derived Electrostatic and Chemical (DDEC) charges method. [266][267][268] Here, the electron density is partitioned and assigned to each atom employing spherically symmetric weighting functions, [269] with the partial charges optimized to simultaneously reproduce realistic chemical states together with the QM electrostatic potential. This method gives accurate results for a variety of periodic and nonperiodic materials, including molecular systems, solid surfaces, porous solids, and nonporous solids. [268] A second class of methods fits point charges to reproduce the QM Electro-Static Potential (ESP). [270] Subsequently the approach was refined by introducing least-squares fitting of point charges on a grid, [271] fitting the ESP on spherical shells surrounding the atoms (CHELP), CHELP on a grid instead of spheres (ChelpG), which removed the problem of CHELP not satisfying rotational invariance, and the Restrained ElectroStatic Potential (RESP) method, which applies restraints to the partial charges on some atoms to remove the problem of erratic charges for buried atoms.
In the REPEAT method, [272] Campana et al. introduced a new error functional that acts on the relative differences of the potential and not on its absolute values, as is done in molecular ESP charge-derivation methods. Conventional ESP fitting procedures developed for molecular systems will, in general, not work for crystalline systems, because the electrostatic potential in periodic systems is ill-defined up to a constant offset at each spatial position. The method produces sensible partial atomic charges for periodic systems. There is no need to extract a chemically sensible and neutral moiety from the framework, and problems like capping/terminating the fragment to satisfy unfilled valences can be avoided. For many systems, minimal user intervention is required to generate the REPEAT charges, which is necessary for the automation of the charge-derivation procedure. Such automation is very useful in high-throughput computational studies of nanoporous materials. However, Watanabe et al. found for the ZIF-8 system that the REPEAT charges (especially for Zn) depend significantly on the details of the fitting volume. [267] The grid used in fitting the REPEAT charges should cover a volume similar to that available to an adsorbate molecule; it is therefore critical to exclude points near an atomic nucleus in the fitting. [267,272] Watanabe et al. [267] proposed to directly use the electrostatic potential energy calculated from plane-wave DFT. This removes an ambiguity that has been inherent in all previous models of these materials in which point charges are assigned to atoms in the adsorbent. By definition, the DDEC, Hirshfeld, and REPEAT charges sum over the unit cell to the correct total charge. ESP methods have problems with buried atoms; some alternative methods, which are not affected by the presence of buried atoms, have been proposed.
The dipole-preserving and polarization-consistent (DPPC) charges method, [273] based on a Lagrangian optimization to reproduce atomic dipoles and multipoles, [274] introduces features including charge redistribution based on relative atomic electronegativity and the treatment of molecular dipole polarization. The resulting charges accurately reproduce the molecular dipole moments, thus ensuring a good description of the local properties of the system (which may be spoiled by basic ESP fitting). Two major improvements to the state-of-the-art REPEAT method for generating accurate partial charges for molecular simulations of periodic structures were developed by Gabrieli et al. [275] The first, D-REPEAT, consists of the simultaneous fit of the ESP together with the total dipole fluctuations (TDF) of the framework. The second, M-REPEAT, allows the fit of multiple ESP configurations at once. When both techniques are fused into one, the DM-REPEAT method, the resulting charges become remarkably stable over a large set of fitting regions, giving a robust and physically sound solution to the buried-atoms problem. The capabilities of the method were extensively studied for the ZIF-8 framework, and subsequently applied to the IRMOF-1 and ITQ-29 crystal structures. [275] InfiniCharges [276] is a tool for generating reliable partial charges for molecular simulations in periodic systems using the DM-REPEAT method. The stability of the resulting charges over a large set of fitting regions is obtained through the simultaneous fit of multiple electrostatic potential (ESP) configurations together with the total dipole fluctuations (TDF). The program can also perform a standard REPEAT fit and its multiframe extension (M-REPEAT), with the possibility to restrain the charges to an arbitrary value.
A third approach to computing effective charges is through charge equilibration models, [277] as discussed in Subsection 4.13. Charge equilibration tries to find the set of partial atomic charges that minimizes the energy of the system. The method of Rappe and Goddard required iteratively minimizing the system energy in a self-consistent manner. [243] In contrast, the recently proposed method of Wilmer et al. requires no iteration, [241,242] which makes it better suited for high-throughput screening of materials and for calculating fluctuating charges on the fly throughout a molecular simulation. The method of Wilmer is applicable to MOFs: the Taylor series is expanded around the charged metal atom, and the ionization potentials and electron affinities corresponding to the cation are used. Charge equilibration is able to compute charges at only a tiny fraction of the computational cost of DFT methods, and charge equilibration methods have been evaluated for the generation of electrostatic fields in nanoporous materials. [278] Many charge sets have been published for frameworks like MOFs, [279][280][281] COFs, [282] ZIFs, [283,284] and siliceous zeolites. [285] The different methods available to calculate atomic partial charges in MOFs have been reviewed by Hamad et al., [256] who demonstrated the influence of even small structural variations on atomic charges, and the influence of the choice of charges on computed properties. The decision about which method is best is not a simple one, and the choice will depend on factors such as the knowledge and experience of the researcher, the codes he or she has access to, the type of systems to be studied, etc. [256] A relatively new approach to charge assignment is machine learning. [286][287][288] Also note that the force-match procedure, to be discussed in Subsection 6.7, naturally includes the fit of partial charges.

Vapor-Liquid Equilibrium Data
A computationally very efficient model is the Transferable Potentials for Phase Equilibria (TraPPE) force field by Martin and Siepmann. [153,154] The force field describes linear, mono-branched, and di-branched alkanes, [153,154] benzene, pyridine, pyrimidine, pyrazine, pyridazine, thiophene, furan, pyrrole, thiazole, oxazole, isoxazole, imidazole, and pyrazole, [298] primary, secondary, and tertiary amines, nitroalkanes and nitrobenzene, nitriles, amides, pyridine, and pyrimidine, [299] ethers, glycols, ketones, and aldehydes, [158] thiols, sulfides, disulfides, and thiophene, [159] as well as some smaller molecules like CO2 and N2 [157] and ethane and ethylene. [300] Despite the fact that the model lumps CH3, CH2, and CH into single interaction centers, it very accurately reproduces the experimental phase diagram and critical points. This united-atom approach allows for much longer simulation times and larger systems because each of the CHx groups is charge-neutral and charge-charge interactions can be omitted. Some TraPPE models for small molecules are listed in Table 10.
The TraPPE model is calibrated incrementally using vapor-liquid equilibrium (VLE) data: for example, CH4 using methane data, CH3 using ethane data, CH2 using propane data, CH using isobutane data, and C using neopentane data. This constitutes the "calibration set" (Figure 5a). With the final parameters in hand, the remaining experimental data, for example, for heptane or isopentane, can then be used to validate the predictions of the model. If unsatisfactory results are obtained for this "validation set" and the experimental calibration data are not in doubt, then the model needs to be refined. A more sophisticated model could include explicit hydrogens [155] or anisotropic sites. [187,301] However, the united-atom TraPPE model is able to accurately describe alkane properties over a wide range of chain lengths, densities, pressures, and temperatures (Figure 5b).
The TraPPE alkane fitting strategy results in a well depth that increases, and a diameter that decreases, with increasing number of hydrogens. At first it might be surprising that the LJ diameter of a methyl pseudo-atom is smaller than that of a methylene group. However, it has been pointed out that the incremental volume of a methyl group remains much larger than the incremental volume of a methylene group as proposed by Bondi [302] because of the large overlap between a methylene pseudo-atom and its two neighbors. [153,154] Furthermore, it is plausible that the diameter of a methylene pseudo-atom containing two long C-C bonds and two (relatively) short C-H bonds might be larger than that of a pseudo-atom containing a single long C-C bond and three short C-H bonds. [153,154] The TraPPE-UA force field parameters for a given pseudo-atom do not depend upon its neighboring pseudo-atoms. While it could be argued that the parameters of pseudo-atoms should depend upon their nearest neighbors, such a force field for the alkanes would have 69 different types of pseudo-atoms. [154]

Figure 5. Alkane model based on VLE fitting: (left) calibration of CH4 using methane data, CH3 using ethane data, CH2 using propane data, CH using isobutane data, and C using neopentane data; (right) density vs. pressure for a united-atom TraPPE-UA propane model. Experimental data from the NIST database. a) Reproduced with permission. [90] Copyright 2013, The Authors, published by Taylor & Francis. b) Reproduced with permission. [235] Copyright 2015, The Authors, published by Taylor & Francis.
Table 11. Selection of models by the Calero group using shifted LJ potentials (cutoff 12 Å) for molecules calibrated on experimental vapor-liquid equilibrium data. These potentials can be used by both MC and MD. Cross terms are computed using Lorentz-Berthelot mixing rules.

Adjusting these force field parameters to VLE data is currently a cumbersome and computationally expensive task. For alkanes an incremental optimization (e.g., first CH3 from ethane, then CH2 from propane, etc.) is possible due to the simplistic nature of alkanes, but this is the exception rather than the rule. As mentioned before, a force field optimization via the VLE of, for example, CO2 involves optimizing the van der Waals interactions of the carbon and oxygen atom types simultaneously, therefore requiring many time-consuming iterations. An objective function can be defined as the squared deviation between the computed and the experimental data. Current fitting strategies involve lowering this function below a defined threshold using Simplex- or gradient-based methods. Each iteration point requires a full molecular simulation run. Van Westen et al. developed a method to overcome this limitation. [303] The actual optimization is performed in a faster framework than molecular simulation, that is, using the PC-SAFT equation of state. [304] PC-SAFT is able to sufficiently capture the size and shape of molecules and the dispersion interactions of fluids to reduce the number of iterations to only two or three.
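The structure of such a fit can be sketched with a derivative-free search over the squared-deviation objective. The "simulation" below is a cheap analytic surrogate standing in for a full Gibbs-ensemble run, and all data and parameter values are hypothetical:

```python
import numpy as np

# Hypothetical "experimental" saturated liquid densities (T in K, rho in kg/m^3).
T_exp = np.array([150.0, 170.0, 190.0])
rho_exp = np.array([480.0, 430.0, 370.0])

def simulated_density(eps, sigma, T):
    """Stand-in for a full VLE simulation: a cheap analytic surrogate mapping
    LJ parameters (eps in K, sigma in Angstrom) to a liquid density. In a real
    fit, every call here would be a complete molecular simulation run."""
    return 1200.0 * np.sqrt(eps / T) / sigma

def objective(eps, sigma):
    """Squared deviation between computed and experimental data."""
    return float(np.sum((simulated_density(eps, sigma, T_exp) - rho_exp) ** 2))

# Crude derivative-free pattern search (a stand-in for Simplex/gradient methods):
# each objective evaluation would normally cost one molecular simulation.
eps, sigma, step = 120.0, 3.7, 0.2
for _ in range(60):
    candidates = [(eps + dx * step * eps, sigma + dy * step * sigma)
                  for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    eps, sigma = min(candidates, key=lambda c: objective(*c))
    step *= 0.9  # shrink the search radius as the fit converges

final_error = objective(eps, sigma)
```

The point of the PC-SAFT shortcut of Van Westen et al. is precisely to replace the expensive inner call with an equation of state, so that only a final few simulation-based iterations remain.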
The TraPPE model was mainly developed for MC purposes. For pure alkanes, the radial distribution function (RDF) tends to converge to unity beyond 14 Å, indicating that longer-range order is absent. The van der Waals potential is then truncated at this distance and the energy correction due to this truncation is approximated (the "tail correction"). The truncation distance and whether or not to use the tail correction should be considered part of the force field. For molecular dynamics, the truncation in the energy leads to a divergence in the forces. Common approaches include the use of a switching function, where the energy is forced to go smoothly to zero, and shifting the van der Waals potential to zero at the cutoff. An alternative is to use the Ewald summation to compute the van der Waals term, [305,306] but only a few codes have that capability (an example is GULP [307]). The Calero group (re-)calibrated many small-molecule models for a shifted van der Waals potential and a cutoff of 12 Å (see Table 11) for use in both MC and MD. Compared to similar molecules in Table 10, we notice that the size parameter is slightly smaller and the strength parameter has become larger to make up for the attraction lost by shifting the potential.
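The two truncation conventions discussed above can be written down explicitly. The sketch below uses methane-like united-atom parameters for illustration; the tail correction is the standard analytic expression assuming g(r) = 1 beyond the cutoff:

```python
import math

def lj(r, eps, sigma):
    """Lennard-Jones potential u(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def lj_truncated_shifted(r, eps, sigma, rc):
    """Truncated-and-shifted LJ: identically zero at and beyond the cutoff,
    so MD forces remain finite at rc (no impulsive discontinuity)."""
    if r >= rc:
        return 0.0
    return lj(r, eps, sigma) - lj(rc, eps, sigma)

def lj_tail_correction_per_particle(rho, eps, sigma, rc):
    """Energy tail correction per particle for a truncated (unshifted) LJ
    fluid, assuming g(r) = 1 beyond rc:
    u_tail = (8/3) pi rho eps sigma^3 [ (1/3)(sigma/rc)^9 - (sigma/rc)^3 ]."""
    sr3 = (sigma / rc) ** 3
    return (8.0 / 3.0) * math.pi * rho * eps * sigma ** 3 * (sr3 ** 3 / 3.0 - sr3)

# Methane-like united atom: eps/kB = 148 K, sigma = 3.73 A, cutoff 14 A,
# density in A^-3 (illustrative values).
u_shifted = lj_truncated_shifted(4.0, 148.0, 3.73, 14.0)
u_tail = lj_tail_correction_per_particle(0.006, 148.0, 3.73, 14.0)
```

The negative tail correction restores the attraction discarded beyond the cutoff, which is exactly the contribution a shifted-potential model must instead absorb into a larger well depth, as noted for Table 11.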
Subtle differences in energy and structure have been exploited to develop a force field for molecular simulation of martensitic phase transitions. [308] Phase equilibria are extremely sensitive to the potential parameters of empirically determined classical force fields. The reason the VDW parameters are so sensitive to the vapor-liquid coexistence curve is that the molecule experiences two different environments, the low density and the high density, and the parameters need to fit both simultaneously. Likewise, inflections in isotherms are caused by the filling up of two (or more) adsorption sites that differ in energy. While VLE data can fix the parameters for adsorbate-adsorbate interactions, fitting on inflections in isotherms allows the accurate determination of adsorbate-framework parameters.
Figure 6. The calibration of isobutane in MFI zeolite. a) For a fixed σ that is too small the inflection is not reproduced, b) for a fixed σ that is too large the inflection is too large (for several different ε). c) Only a single ε,σ pair is able to reproduce the experimental isotherm. Other models are not able to model the inflection of isobutane in MFI accurately. d) The resulting model accurately predicts the experimental isobutane isotherms from two sources at various temperatures. Reproduced with permission. [309] Copyright 2004, American Chemical Society.

Fitting on Inflections in Adsorption Isotherms
Early work on alkanes in zeolites used the Kiselev model, where adsorbate-framework interactions were optimized to reproduce experimental Henry coefficients and heats of adsorption. [310] The Kiselev approach models the zeolite as a rigid network of oxygen atoms. [12] This is a very common approximation because the large oxygen atoms account for most of the potential energy and essentially shield the much smaller silicon atoms. The oxygen-adsorbate potential is an effective potential that implicitly includes the Si contribution. The advantage of such simplicity is that, when combined with a united-atom model, one can uniquely optimize an alkane-zeolite model (i.e., the obtained parameters are the best ones possible). However, fitting on Henry coefficients and heats of adsorption alone does not uniquely fix the LJ parameters. The methodology proposed by Dubbeldam et al. is therefore based on fitting isotherms with inflection points. [309,311] These inflection points are due to adsorption sites that differ significantly in energy. One site is predominantly filled first, before the other sites fill up (as a function of pressure). The inflection in the isotherm reflects this subtle difference in the adsorption energetics of the sites. Inflections are extremely sensitive to the van der Waals parameters, which represent the dominating mechanism for physisorption.
It is instructive to discuss the role of the size parameter σ of O-CHx. In Figure 6 we show the influence of the parameters on the inflection of 2-methylpropane in MFI. The O-CH parameters remain fixed at σ = 3.92 Å and ε/k_B = 40 K, while ε of O-CH3 is examined over a range of reasonable values for two values of σ of O-CH3: one significantly too small and one significantly too large. A crucial observation is that only a single strength/size parameter pair is able to describe the inflection and the entire isotherm properly. This is in contrast with the common belief that for each value of σ there is a corresponding ε that can describe the isotherm correctly. [312] The shape of the isotherm and the inflection points are most sensitive to the size parameter σ of the interactions, whereas the loading at a given pressure is most sensitive to the strength parameter ε. A higher strength parameter induces an increased loading, and a lower strength parameter results in a decreased loading (at fixed pressure). The amount of inflection is controlled by the size parameter σ. These properties can be exploited to obtain unique parameters.

Figure 7. Comparison of the polarization energy computed with the developed polarizable force field without considering back-polarization and the orbital interaction energy from DFT calculations as a function of the distance between a CO2 molecule and the Mg ion. [313] Reproduced with permission. [313] Copyright 2017, American Chemical Society.
In practice, one proceeds as follows. A reasonable starting size parameter is chosen. For this parameter one iteratively searches for the corresponding strength parameter that matches the experimental data at a pressure significantly below the inflection. The entire isotherm is then followed for increasing pressure until a deviation from the experimental data is observed. The updated size parameter is then found by choosing a higher value for a deviation to the left of the experimental data, and by choosing a lower value for the size parameter for a deviation to the right of the experimental data. This scheme proceeds iteratively until the entire experimental isotherm is accounted for.
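The iterative scheme above can be sketched as two nested one-dimensional searches: an inner loop fixing ε at a pressure below the inflection, and an outer loop adjusting σ on the isotherm past the inflection. The dual-site Langmuir surrogate and its parameter dependence below are a deliberately simplified cartoon (a real calibration would run a GCMC simulation at each point):

```python
import numpy as np

def isotherm_model(p, eps, sigma):
    """Toy dual-site Langmuir surrogate for a simulated isotherm (hypothetical
    mapping). The strong site is controlled mainly by eps; the weak site that
    causes the inflection is also sensitive to sigma."""
    b1 = 1e-3 * np.exp(eps / 50.0)                 # strong-site affinity
    b2 = 1e-9 * np.exp(eps / 50.0) * sigma ** 6    # weak, size-sensitive site
    return 4.0 * b1 * p / (1.0 + b1 * p) + 4.0 * b2 * p / (1.0 + b2 * p)

def fit_eps(sigma, p_low, q_low, lo=50.0, hi=200.0):
    """Inner loop: bisect the strength parameter so the computed loading
    matches the 'experimental' value at a pressure below the inflection."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if isotherm_model(p_low, mid, sigma) < q_low else (lo, mid)
    return 0.5 * (lo + hi)

def fit_sigma(p_low, q_low, p_high, q_high, lo=3.0, hi=4.2):
    """Outer loop: adjust the size parameter until the isotherm past the
    inflection is reproduced, refitting eps at every step."""
    for _ in range(80):
        sigma = 0.5 * (lo + hi)
        eps = fit_eps(sigma, p_low, q_low)
        lo, hi = (sigma, hi) if isotherm_model(p_high, eps, sigma) < q_high else (lo, sigma)
    sigma = 0.5 * (lo + hi)
    return fit_eps(sigma, p_low, q_low), sigma

# 'Experimental' targets generated from known parameters (eps = 100 K,
# sigma = 3.6 A), so the procedure can be checked to recover them uniquely.
q_low = isotherm_model(10.0, 100.0, 3.6)    # below the inflection
q_high = isotherm_model(1e5, 100.0, 3.6)    # past the inflection
eps_fit, sigma_fit = fit_sigma(10.0, q_low, 1e5, q_high)
```

The recovery of a single (ε, σ) pair from two well-chosen points on the isotherm mirrors the uniqueness argument made for the inflection of isobutane in MFI.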
The fitting to well-established inflection points in the isotherms has many advantages [309,311] : i) a unique set of parameters is obtained directly related to a well defined physical property, ii) the parameters are determined accurately, iii) inflections are found at moderate pressures and here the experimental data is most reliable, iv) the inflection is directly related to the structure (e.g., for n-heptane and 2-methylpropane in MFI the inflection occurs exactly at four molecules per unit cell) and v) by explicitly fitting to full adsorption isotherms the proper reproduction of properties such as Henry coefficients, heats of adsorption, adsorption entropies, and maximum loadings is guaranteed.

Polarizable and QM-Derived Force Fields for Open-Metal Sites
When simple force fields fail to accurately describe experimental data, physically based next-generation force fields that efficiently describe effects like polarization need to be adopted. [314] Mg-MOF-74 has been shown to be a promising candidate for carbon capture due to its high CO2 uptake capacity at low partial pressures. Despite significant progress, it is still a major challenge to accurately capture the change of interaction strength with varying metal ions in M-MOF-74 in molecular simulations. One approach is to use ab initio derived force fields, [315,316] another is to include polarization effects explicitly. Guest molecules are polarized in the vicinity of the open-metal sites in M-MOF-74, and this interaction contributes to the enhanced CO2 affinity. Since this is a local effect, it is not possible to simply re-parameterize the guest-host interactions. Considering polarization explicitly can help to create force fields that overcome the shortcomings of current generic force fields. [313,[317][318][319][320] Lachet et al. explored polarization in MC simulations using solely the polarization between the framework and adsorbate molecules and neglecting the polarization caused by induced dipoles, the so-called back-polarization. [219] Thereby, an iterative scheme is avoided and the computational costs of the method are drastically reduced. In fact, the computational costs can be similar to those of simulations without explicit polarization. Figure 7a shows that at the most favorable position for CO2 in Mg-MOF-74, at approximately 2.4 Å, the difference in total energy is approximately 7%. This deviation seems acceptable in comparison with the considerable speedup of the simulations.
Note that one cannot simply add polarization on top of an existing force field. Most likely, the VDW parameters are fitted to experimental data, and hence implicitly contain some sort of average polarization. Becker et al. therefore removed the contribution of implicitly considered polarization from the interaction potential by applying a global scaling parameter to all Lennard-Jones energy parameters developed without explicit polarization. [313] In Figure 7b, we compare the polarization energy of a CO2 molecule approaching the Mg ion of Mg-MOF-74, estimated with the developed polarizable force field, to the orbital interaction energy calculated with ADF. For relevant distances, both methods show a comparable trend for the energy contributions. The most relevant distance between the CO2 molecule and the Mg ion is where the total energy is the lowest (i.e., 2.3-2.5 Å, as shown in Figure 7a). At this distance the polarizable force field predicts that the polarization energy of a single CO2 molecule in Mg-MOF-74 has a significant contribution of around 30% to the total energy. For larger distances the contribution of polarization decreases rapidly.

Figure 8. Comparison between the experimental results of Herm et al. [321] (open), Queen et al. [322] (yellow), Yu et al. [323] (orange), and Dietzel et al. [324] (brown) and simulation results using the polarizable force field of Becker et al. [313] (black), the UFF force field (blue), and the DFT-derived non-polarizable force field of Mercado et al. [325] (green) for CO2 in Mg-MOF-74: a) adsorption isotherm at 298 K (Herm et al. [321] 313 K); b) heat of adsorption as a function of uptake. Reproduced with permission. [313] Copyright 2017, American Chemical Society.
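The non-iterative polarization scheme discussed above amounts to evaluating U_pol = -1/2 α |E|² with only the permanent field of the framework charges, so no self-consistent loop over induced dipoles is needed. The sketch below uses reduced units and a single illustrative +2 charge as a cartoon of an open-metal site (none of the numbers are force field parameters):

```python
import numpy as np

def static_field(pos, charges, positions):
    """Electric field at `pos` from fixed framework point charges (reduced
    units, no periodic images): E = sum_i q_i (r - r_i) / |r - r_i|^3."""
    E = np.zeros(3)
    for q, r in zip(charges, positions):
        d = pos - r
        E += q * d / np.linalg.norm(d) ** 3
    return E

def polarization_energy(alpha, E):
    """U_pol = -1/2 alpha |E|^2 for an isotropic point polarizability, using
    only the permanent (static) field. Back-polarization by the induced
    dipoles is neglected, which is what removes the iterative scheme."""
    return -0.5 * alpha * float(E @ E)

# Cartoon of an open-metal site: a +2 charge at the origin polarizing a guest
# site 2.4 length units away (illustrative numbers only).
E = static_field(np.array([0.0, 0.0, 2.4]), [2.0], [np.zeros(3)])
u_pol = polarization_energy(1.0, E)
```

Because the field of the framework is fixed, U_pol can be tabulated once per grid point, which is why the cost can approach that of a non-polarizable simulation.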
The resulting adsorption isotherms for CO2 in Mg-MOF-74, in comparison to experimental measurements, the UFF force field, and the DFT-derived non-polarizable force field of Mercado et al., [325] are shown in Figure 8a. The simulation results with the polarizable force field clearly display the inflection of the experimental adsorption isotherm. The predicted behavior is significantly better than with the UFF force field. This is expected, because the scaling factors are adjusted to reproduce the experimental data. The overall agreement with the experimental measurements is comparable to that of the DFT-derived non-polarizable force field of Mercado et al. [325] Both force fields can predict the low-fugacity region, which is particularly important for carbon capture. For higher fugacities, simulations with all compared force fields predict higher CO2 uptakes than the experiments. This can be attributed to the fact that a certain degree of inaccessibility due to diffusion limitations or defects in the crystal structure is inherent to experimental structures. It should be noted that Mercado et al. [325] scaled the calculated CO2 uptakes by 0.85 to account for inaccessibility of open-metal sites. This scaling procedure mainly improves the agreement between experiments and computations in the high-fugacity region. Figure 8b shows the heat of adsorption as a function of CO2 molecules per metal ion. The distinct inflection of the adsorption isotherm, caused by the strong affinity of the CO2 molecule toward the metal ions, is reflected in the change of the heat of adsorption with increasing gas uptake. The calculated heat of adsorption has an inflection around one CO2 per metal ion. Before and after the rapid decrease at one CO2 molecule per metal ion, the heat of adsorption increases slightly. This increase can be related to a rise in the total number of adsorbed CO2 molecules and therefore a larger contribution of the CO2-CO2 interactions to the total energy.

Fitting of Force Field Parameters against Ab Initio Generated PES Data
Fitting algorithms are generally based on optimizing a "merit function" that corresponds to minimizing a weighted sum of square deviations between the classical data and reference data points from the QM PES. [76,113] The key quantity is the "sum of squares" F, which measures how good the fit is:

F = Σ_i w_i ( f_i^calc − f_i^obs )²

where f^calc and f^obs are the calculated and observed quantities and w is a weighting factor. The aim of a fit is to minimize the sum of squares by varying the potential parameters. "Force matching" can also be done, [326] and even a more generalized match on energies E, forces F, and stresses S:

F_total = w_E Σ_i (E_i^calc − E_i^ref)² + w_F Σ_j |F_j^calc − F_j^ref|² + w_S Σ_k (S_k^calc − S_k^ref)²

The weight factors are usually in the order: forces ≫ stresses > energies. There are several codes able to perform the parameterization for several classical force fields. GULP [162] allows fitting on many observables like elastic constants, stresses, heat capacity, entropy, bond lengths and angles, relative energies, and energy surfaces. To fit an energy surface it is basically necessary to input all the structures and the energies that correspond to them. ForceFit is a code to fit classical force fields to quantum mechanical potential energy surfaces. [327] ForceFit interfaces with three molecular mechanics codes: DL_POLY, Amber, and LAMMPS. Potfit (https://github.com/potfit) is a free implementation of the force-matching algorithm to generate effective potentials from ab initio reference data. [328] DFTFIT (https://github.com/costrouc/dftfit) is a Python code that uses ab initio data from DFT calculations such as VASP, Quantum Espresso, and Siesta to develop molecular dynamics potentials. DFTFIT uses the least-squares method to fit the stresses, total energy, and forces of a given set of configurations. The user has the flexibility of changing the weights for fitting and uses the relative error to judge the fitness of parameters. Atomicrex is a versatile tool for the construction of advanced atomistic models. [329] In a general sense, it allows one to develop models that describe a given scalar or vectorial property (e.g., total energies and forces) as a function of an atomic (or atom-like) configuration. GAFit is a software package based on a genetic algorithm that fits an analytic function to a given set of data points. [330] The code was interfaced with the CHARMM and MOPAC programs in order to facilitate force field parameterizations. GARFfield (Genetic Algorithm based Reactive Force Field optimizer) is a multi-platform, multi-objective parallel hybrid genetic algorithm (GA)/conjugate-gradient (CG) based force field optimization framework. [331] GARFfield currently supports a range of force field engines via the LAMMPS Parallel Molecular Dynamics Simulator (including the adiabatic ReaxFF). MEAM-fit is an interatomic potential fitting code capable of fitting embedded atom method (EAM) and reference-free modified embedded atom method (RF-MEAM) potentials to energies and/or atomic forces produced by the VASP density functional theory (DFT) package. [332] As a last example, we would like to mention machine learning interpolation of atomic potential energy surfaces to enable the nearly automatic construction of highly accurate atomic interaction potentials. [333]
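As a minimal illustration of the force-matching merit function, the example below fits the two coefficients of a 12-6 pair potential to synthetic reference forces. Because the model force is linear in the coefficients (A, B), the weighted sum of squares minimizes in closed form; the reference data here are generated from known values, not DFT:

```python
import numpy as np

# Synthetic "reference" forces for a pair potential u(r) = A/r^12 - B/r^6,
# standing in for DFT data; generated from known coefficients for checking.
A_true, B_true = 1.0e5, 1.0e2
r = np.linspace(3.2, 6.0, 20)
f_ref = 12.0 * A_true / r**13 - 6.0 * B_true / r**7

# Force matching: minimize F = sum_i w_i (f_calc_i - f_ref_i)^2. Since the
# model force f(r) = 12 A / r^13 - 6 B / r^7 is linear in (A, B), the
# weighted least-squares problem is solved directly.
w = np.ones_like(r)  # uniform weights here; in general forces >> stresses > energies
X = np.column_stack([12.0 / r**13, -6.0 / r**7])
coef, *_ = np.linalg.lstsq(X * w[:, None], f_ref * w, rcond=None)
A_fit, B_fit = coef
```

Realistic force fields are nonlinear in most parameters, which is why codes such as GULP, potfit, or GARFfield wrap this least-squares core in iterative or genetic optimizers.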

Genetic Algorithms
A common fitting approach is to use genetic algorithms (GA). [334] A genetic algorithm has been used to generate plausible crystal structures from knowledge of only the unit cell dimensions and constituent elements. Woodley et al. successfully generated 38 known binary oxides and various known ternary oxides with the perovskite, pyrochlore, and spinel structures, from starting configurations that include no knowledge of the atomic arrangement in the unit cell. [335] There have been a number of approaches to parameterize a force field directly from quantum chemical calculations (see ref. [336] and references therein). Since the true potential energy surface (PES) can be approximated using quantum mechanical methods, a force field can be directly fit to a calculated QM PES by numerically matching the gradients or energy. A sufficiently accurate description of the PES enables fitting a variety of terms in the force field, including harmonic bonding constants as well as anharmonic and coupling terms, using, for example, a freely available code like "ForceFit." [327] Note that it is very hard, if not impossible, to obtain potentials from, for example, rotational energy curves. These profiles are the result of a complicated interaction of many potentials, and special methods were developed to separate them. Although it is theoretically possible to include nonbonded interactions in the fitting, it is more common to obtain charges and van der Waals parameters separately and use these as input. Tafipolsky et al. parameterized a force field from first-principles reference data by optimizing a novel objective function with a GA. [334] The objective function was based on a representation of both the structure and the Hessian matrix in a redundant internal coordinate set, which can in principle be defined automatically from the connectivity of the molecular mechanics force field.
The van der Waals parameters and electrostatics were given as input, as well as the functional forms of the bonded terms and a quantum-optimized representative cluster. The GA is then able to efficiently parameterize the bonded terms such that they reproduce the DFT results (structure and vibrational modes) as closely as possible. It can be used in combination with any quantum-mechanical package that is able to provide the optimized structure and the Hessian matrix in Cartesian coordinates. The evaluation of the fitness of an individual parameter set comprises several basic steps. The stopping criteria for the GA can be set up through separate convergence thresholds for the metric parameters (bonds, angles) and the Hessian terms, or by a fixed number of generations to be evolved.

Elastic Constants
Gale and coworkers used the elastic constant tensor for the optimization of core-shell zeolite models. [162] However, only a handful of parameters needs to be optimized there, and there is little ambiguity in the functional forms of the model. MOFs are chemically much more diverse, but fortunately there are many generic force fields available for the linker molecules. The difficult part is to connect the linkers together via the hinges while making sure the model reproduces the correct mechanical behavior. Heinen et al. [337] argue that reproducing the flexible behavior of the hinges in MOFs can be done by parameterizing the flexible force field on the elastic tensor. The elastic constants can be obtained from experiment when available, but otherwise can be computed by ab initio calculations. Ab initio computations of the elastic tensor for various MOFs have been performed by Tan et al. [32,338,339] and Coudert et al. [340,341] Understanding mechanical properties is crucial, as it has been argued that the elastic tensor can predict flexible behavior. [338] The large-scale flexibility and low-frequency modes in MOFs revolve around the behavior of metal clusters that are thought of as "hinges" acting between rigid "struts." [2] Heinen et al. propose to fit the parameters of the hinges of MOFs on DFT-derived elastic constants. [20,337] It is the hinges that determine the mechanical behavior, and modeling these correctly is vital. The fourth-rank elastic tensor C is related to the second-rank stress and strain tensors via Hooke's law [245]

σ_αβ = Σ_γδ C_αβγδ ε_γδ

with Greek indices denoting Cartesian components. For an orthorhombic unit cell, there are nine independent elastic constants when considering Voigt symmetry.

Figure 9. Elastic tensor-based force field parameterization approach: functional forms and corresponding parameters are changed and/or added until the classical force field elastic tensor agrees with the ab initio derived elastic tensor. Reproduced with permission. [337] Copyright 2017, American Chemical Society.
The first three diagonal components of the elastic tensor in Voigt notation, C11, C22, and C33, represent the stiffness along the principal crystal axes. The other diagonal components, C44, C55, and C66, are the shear coefficients that determine the resistance against angular deformation due to shear strain. The off-diagonal components C12, C13, and C23 are the tensile-coupling interactions between two principal axes. The proposed fitting procedure for the hinges is shown in Figure 9. First, target elastic constants are computed from DFT. This is non-trivial by itself, as the energy landscape around the hinges can be shallow. Very high-precision computations are needed, and since second derivatives are usually computed numerically in DFT using finite differences of the forces, the dependence on the step length of the differentiation needs to be investigated. Next, the parameters of the hinges are initialized with reasonable values based on previous work, other force fields, or intuition. The structure is minimized and the elastic constants are computed. By comparing these values to the target values, a new set of parameters is chosen. Through this iterative process, the final set is obtained.
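The step-length check mentioned above can be sketched with a toy energy-density curve whose curvature at zero strain plays the role of an elastic constant (the numbers are illustrative, not DFT data):

```python
def energy_density(strain):
    """Toy energy density (GPa) of a crystal under uniaxial strain. The
    quadratic coefficient (50 GPa) is the elastic constant the finite
    differences should recover; the cubic and quartic terms mimic the
    anharmonicity of a shallow energy landscape."""
    C11 = 50.0
    return 0.5 * C11 * strain**2 + 300.0 * strain**3 + 1.0e4 * strain**4

def second_derivative_fd(f, h):
    """Central second difference at zero strain: (f(h) - 2 f(0) + f(-h)) / h^2.
    Odd (cubic) terms cancel; even higher-order terms leave an O(h^2) error."""
    return (f(h) - 2.0 * f(0.0) + f(-h)) / (h * h)

# The estimate depends on the step length: the quartic term contaminates
# large steps (error ~ 2e4 * h^2), while in real DFT too-small steps would
# instead amplify numerical noise in the forces.
estimates = {h: second_derivative_fd(energy_density, h) for h in (1e-2, 1e-3, 1e-4)}
```

Scanning such a step-length series before trusting a DFT-derived elastic tensor is exactly the precaution the fitting procedure requires.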

Machine Learning (ML)
A few examples of the use of machine learning in the field of classical simulations for materials science are the deep neural network learning of complex binary sorption equilibria from molecular simulation data, [342] solving vapor-liquid flash problems using artificial neural networks, [343] predicting thermodynamic properties of alkanes, [344] charge assignment, [286][287][288] prediction of partition functions, [345] simulation of infrared spectra, [346] predicting the mechanical properties of zeolite frameworks, [347] CO2 capture using MOFs, [348,349] prediction of the methane adsorption performance of MOFs, [350,351] chemically intuited, large-scale screening of MOFs, [352] screening for precombustion carbon capture using MOFs, [353] screening of MOF membranes for the separation of gas mixtures, [354] and screening of MOFs for use as electronic devices. [355] Using ML to combine the accuracy and flexibility of electronic structure calculations with the speed of classical potentials is a very active research field. [356][357][358] The predictive accuracy of machine learning (ML) models of molecular properties depends on the choice of the molecular representation. [359] Machine learning for atomistic force fields can be used in two different ways:
• to find parameters for a force field with specified functional forms (for example, see ref. [360]),
• to replace the classical or QM force field altogether.
In the second approach the vectorial force on an atom is computed directly from its environment. By directly mapping quantum-mechanically derived force components to a numerical representation of an atom's local environment, accurate and computationally inexpensive force fields can be developed. [361][362][363] A typical workflow is to systematically construct reference data, represent the atomic environments with a numerical fingerprint, sample nonredundant data, and learn the forces. The ML databases can be pre-computed or updated on the fly when required. An example of the on-the-fly approach is the paper of Li et al. [364] Forces on atoms are either predicted or, if necessary, computed by on-the-fly QM calculations and added to a growing ML database, whose completeness is thus never required. As a result, the scheme is accurate and general, while progressively fewer QM calls are needed when a new chemical process is encountered for the second and subsequent times.
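The first two steps of such a workflow, representing an environment with a numerical fingerprint and learning a mapping to forces, can be sketched in a one-dimensional cartoon. The Gaussian "symmetry functions" are illustrative, and the synthetic target is built from the descriptor itself so the regression is exactly solvable (real QM forces would of course not lie in the descriptor's span):

```python
import numpy as np

def fingerprint(r):
    """Minimal radial fingerprint of a local environment: Gaussians centred
    on a few probe distances (a cartoon of atom-centred symmetry functions)."""
    centers = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
    return np.exp(-(r - centers) ** 2 / 0.25)

# Synthetic reference forces standing in for QM data; constructed as a known
# linear combination of the fingerprint so the fit can be verified exactly.
w_true = np.array([3.0, -1.0, 0.5, -0.2, 0.1])
r_train = np.linspace(0.8, 3.2, 25)
X = np.array([fingerprint(r) for r in r_train])
f_train = X @ w_true

# Ridge regression: learn the descriptor -> force mapping from the database.
lam = 1e-12
w_fit = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ f_train)

def predict_force(r):
    """Predict the force for an unseen environment from its fingerprint."""
    return float(fingerprint(r) @ w_fit)
```

Production schemes replace the linear model with kernels or neural networks and the 1D distance with full many-body descriptors, but the database-fingerprint-regression structure is the same.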
Atomistic Machine-learning Package (Amp) is an open-source package designed to easily bring machine learning to atomistic calculations. [365] This allows one to predict (or really, interpolate) calculations on the potential energy surface, by first building up a regression representation from a "training set" of atomic images. The Amp calculator works by first learning from any other calculator (usually quantum mechanical calculations) that can provide energy and forces as a function of atomic coordinates. Depending upon the model choice, the predictions from Amp can take place with arbitrary accuracy, approaching that of the original calculator. Amp is designed to integrate closely with the Atomic Simulation Environment (ASE).
DeePMD-kit is a package written in Python/C++ designed to minimize the effort required to build deep learning-based representations of potential energies and force fields and to perform molecular dynamics. [366] DeePMD-kit is interfaced with TensorFlow (https://www.tensorflow.org), one of the most popular deep learning frameworks, making the training process highly automatic and efficient. On the other end, DeePMD-kit is interfaced with high-performance classical molecular dynamics and quantum (path-integral) molecular dynamics packages, that is, LAMMPS and i-PI, respectively. Thus, upon training, the potential energy and force field models can be used to perform efficient molecular simulations for different purposes. Based on DeePMD-kit, Zhang et al. used active learning to obtain uniformly accurate interatomic potentials for materials simulation. [367] Jpred (http://www.compbio.dundee.ac.uk/jpred4) [368] predicts secondary structural elements for an amino acid sequence using machine learning approaches, based on known 3D structural information as available in the Protein Data Bank. Machine learning approaches can predict protein-ligand binding accurately. [369] Tangadpalliwar et al. developed "ChemSuite," [370] a stand-alone application for chemoinformatics calculations and machine-learning model development. The availability of multi-functional features makes it widely applicable in various fields. Force fields such as UFF are incorporated in the tool for the optimization of molecules. "SchNet" is another software package that can accurately predict a range of properties across chemical space for molecules and materials, where the model learns chemically plausible embeddings of atom types across the periodic table. [371]

Table 12. Functional form of the core-shell model for zeolites, metal-oxides, and minerals. Note that the interaction between a core and its shell is skipped in the VDW and Ewald real part, and a correction term is added to the Fourier sum.

U_total = U_core-shell + U_3-body + U_nonbonded
U_core-shell(r_ij) = Σ_core-shells ½ k r_ij²   (core-shell energy)

Zeolite and MOF Force Fields

Zeolite Force Fields
Metal-oxides are very successfully modeled using the core-shell method. [69,307,[372][373][374] To model polarization on the oxygen, the charge of the oxygen is divided over the "core" and the "shell", which are separated by a spring. Typically, a Buckingham potential is used for the metal (with no attraction term), while for the oxygen both the exponential repulsion and the 1/r^6 attraction term are used. The attraction of the oxygen is very strong (see ref. [335] for a compilation of metal-oxide parameters). A common functional form of the model is shown in Table 12. In the shell model, the short-range repulsion and van der Waals interactions are taken to act between the shell particles. This has the effect of coupling electrostatic and steric interactions in the system: in a solid-state system where the nuclei are fixed at lattice positions, polarization can occur not only from the electric field generated by the neighboring atoms, but also from the short-range interactions with close neighbors. This ability to model both electrical and mechanical polarization is one reason for the success of shell models in solid-state ionic materials. [216] Core-shell models have also been very successfully applied to zeolites. [69,307,[374][375][376][377][378] The predicted crystal structures agree very well with experimental crystallographic data. The core-shell approach has allowed the study of structural phase transitions, zeolite hydration processes, phonon-dispersion curves, and negative thermal expansion in zeolitic materials. However, it unfortunately proved difficult to combine these models with adsorbate models to study adsorption. The core-shell model requires formal charges on the atoms, while charges based on ab initio approaches are significantly lower. In addition, this type of model increases the number of particles in the system, which reduces the efficiency of the calculation.
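As a minimal sketch of the functional form in Table 12, the following pure-Python snippet combines a Buckingham pair term with the harmonic core-shell spring. The parameter values are illustrative placeholders in eV/Angstrom units, not a fitted metal-oxide parameter set:

```python
import math

def buckingham(r, A, rho, C):
    """Buckingham pair potential: exponential repulsion plus 1/r^6 attraction."""
    return A * math.exp(-r / rho) - C / r**6

def core_shell_spring(d, k):
    """Harmonic spring coupling an ion core to its (massless) shell,
    displaced by a distance d; this models the polarization."""
    return 0.5 * k * d**2

# Illustrative (not fitted) O-O parameters and shell spring constant:
pair_energy = buckingham(2.0, A=1388.773, rho=0.3623, C=175.0)
polarization_energy = core_shell_spring(0.05, k=74.92)
```

Note the crossover behavior of the Buckingham term: repulsion dominates at short range and the 1/r^6 attraction at long range, which is why the oxygen attraction term matters for the cohesion of the oxide.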
To overcome these inherent drawbacks, several other force fields for zeolites have been proposed.
The Demontis model consists of an O─Si bond potential and a Urey-Bradley Si─T─Si potential (a bond potential between atoms 1-3). [379,380] The model is restricted to the simulation of adsorbates that carry no charges, such as alkanes simulated in the united-atom approach. The advantage of the model is not its accuracy, but its efficiency. The model was extended to hydrated aluminosilicates including cations, and Coulombic interactions were added. [381] The BKS force field was developed by van Beest et al. to describe the geometries and mechanical properties of silicas and aluminophosphates. [382] The parameterization is based on both ab initio and experimental data. The force field contains four atom types: Si, O, Al, and P. One important observation by the authors is that when the inter-atomic interactions go beyond nearest neighbors, it is inevitable to also use macroscopic information (here, elastic constants) in the force field parameterization. Kramer et al. further outlined their route to develop effective inter-atomic force fields for zeolites from ab initio calculations on small clusters. [383] The Nicholas model used bond, bend, and torsion potentials, as well as Lennard-Jones and Coulombic long-ranged interactions. [190] A Urey-Bradley term was included on the Si─Si nonbonded distance for each Si─O─Si angle to couple the bond-stretching to the bend angle (in silicates the Si─O bond is known to lengthen as the Si─O─Si bond angle becomes smaller). When the Si─O─Si angle is close to linear, it is possible for it to invert, causing a discontinuity in the torsional potential that contains the Si─O─Si angle. Therefore, the torsion potential was coupled to the related Si─O─Si angle, so that the torsional energy goes smoothly to zero as the Si─O─Si angle becomes linear. The force field has been demonstrated to reproduce the structure and dynamics of silica sodalite using energy minimization, normal mode analysis, and molecular dynamics techniques.
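The BKS functional form (Coulomb plus Buckingham, with no short-range term between tetrahedral atoms) can be sketched as follows. The Si─O and O─O parameters and formal partial charges are the values commonly tabulated for BKS silica, reproduced here for illustration only:

```python
import math

KE = 14.399645  # e^2/(4*pi*eps0) in eV*Angstrom

# Commonly tabulated BKS silica parameters: A [eV], b [1/A], C [eV*A^6]
BKS = {("Si", "O"): (18003.7572, 4.87318, 133.5381),
       ("O",  "O"): (1388.7730,  2.76000, 175.0000)}
CHARGES = {"Si": 2.4, "O": -1.2}

def bks_pair(r, s1, s2):
    """BKS pair energy: Coulomb plus Buckingham term.
    Si-Si pairs interact through Coulomb only (no short-range term)."""
    coulomb = KE * CHARGES[s1] * CHARGES[s2] / r
    A, b, C = BKS.get((s1, s2), BKS.get((s2, s1), (0.0, 0.0, 0.0)))
    return coulomb + A * math.exp(-b * r) - C / r**6
```

Around a typical Si─O bond distance the Coulomb attraction between the opposite formal charges dominates, while like-charged Si─Si pairs repel; the short-range terms only shape the near-contact region.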
Schrimpf et al. [384] developed a model for zeolite NaY, accounting for the flexibility of the lattice. It is mainly based on effective pair potentials between neighboring atoms. The comparison of calculated and experimental data shows that the main structural and dynamical properties of the zeolite are reproduced. [384] Smirnov and coworkers parameterized the simplified general valence force field (SGVFF) to study window fluctuations and Raman and infrared spectra of zeolites and aluminosilicates. [385][386][387][388] In a comparison of experimental data to spectra generated by different force fields, Bueno-Pérez et al. found that the Nicholas and Hill force fields generate the most similar IR spectra. [389] However, none of these force fields allowed the identification of frameworks based on their experimental spectra. Hill and Sauer derived a CFF force field for the simulation of aluminum-free zeolite structures [390] and aluminosilicates [391] from results of ab initio calculations on molecular models which represent typical structural elements of zeolites. In addition to the Nicholas terms, they also included bond-bond, bond-bend, bend-bend, and angle-angle-torsion terms. Flexibility is essential for studying properties of the zeolites themselves and, for example, transitions between phases. Yamahara et al. performed MD simulations on MFI and reproduced the temperature-induced phase transition between orthorhombic and monoclinic phases. [392] Bordat et al. [393] modified the BKS model with reduced partial atomic charges and re-optimized covalent bond potential wells, and reproduced both the structure of the framework and its monoclinic to orthorhombic transition. Combined with the SPC/E model of water, the model also reproduces the important result that pure unhydroxylated silica is hydrophobic.
Zeolites with germanium substitutions have also been modeled using force fields based on Buckingham potentials. [376] The study of these zeolites has been of interest for the synthesis of pure silica zeolites. These force fields allow the study of the position and distribution of germanium atoms in the structure. [394,395] The predicted germanium distribution agreed very well with experimental NMR data.
To study the diffusion of benzene in Na-Y and develop a transport theory for cationic zeolites, Auerbach modeled the zeolite as fully flexible. [396] The Si and Al atoms were replaced with an average tetrahedral-site atom (T-atom) because of the difficulty in experimentally determining Al distributions in disordered zeolites. The work was the first long-length-scale simulation of diffusion in cation-containing zeolites. Jaramillo and Auerbach developed and validated a new force field for cations in zeolites, which explicitly distinguishes Si and Al atoms, as well as different types of oxygens in the framework. [397] Their molecular dynamics simulations show that most cations are immobile in Na-X and Na-Y on the MD time scale, even at 1000 K. The MD simulations also show that cation movement is highly correlated. Most of the studies up to this time used force fields with the same partial charge on silicon and aluminum. However, because of the close proximity between cations and framework atoms, this approach is unsuited for modeling cations in zeolites and cannot account for the allocation of cations to specific positions. The model of Jaramillo and Auerbach was extended by Calero et al. to model the adsorption of hydrocarbons in Na-Y, and their results showed that these cations have a large influence not only on diffusion but also on adsorption. [398] Jeffroy et al. provide a general and transferable force field able to predict the structure of zeolites for various types of extra-framework cations. [399] Based on simple functional forms and interactions, it can be easily implemented in most common molecular simulation codes. The optimized force field is validated on structural properties (lattice parameters and Si─O─Al angles) for a large variety of zeolites, including faujasites of different Si/Al ratio and different extra-framework cation types (Li+, Na+, K+, Mg2+, Ca2+, and Co2+).
The transferability of the force field was successfully tested on zeolites of different topologies such as FAU, LTA, MFI, FER, and TON. Shi et al. studied the temperature dependence of zeolite pore sizes using ab initio and classical simulations [400] for all-silica zeolites SOD, FER, and MFI over the temperature range 300-900 K. Such simulations are important for understanding and predicting zeolite/guest fit, especially for relatively bulky guest species. Several force fields were tested and a modified BKS force field was found to give the best agreement with the simulated properties listed above, especially for MFI.
Some more recent work focuses on moving toward more generic force fields. Pedone et al. developed a new empirical pairwise potential model for ionic and semi-ionic oxides. [401] Its transferability and reliability have been demonstrated by testing the potentials on the prediction of structural and mechanical properties of a wide range of silicates. Gabrieli et al. created a new force field for fast MD simulations in flexible aluminosilicates by adapting to the CHARMM functional form. [402] This new force field allows the execution of large-scale simulations in a parallel environment via the most common packages available and correctly reproduces the crystallographic structures and the vibrational properties for silicalite and zeolites Na A, Ca A, Na Y, and Na X. The CHARMM functional form was also used by Praprotnik et al. [403] to construct a new all-atom force field for molecular dynamics simulation of an AlPO4-34 molecular sieve.
Zimmerman et al. proposed a modified CHARMM force field (the ZHB potential) with low point charges for silica. [404] Lennard-Jones zeolite parameters were chosen to be compatible with existing CHARMM parameters. Sahoo and Nair modified the ZHB potential (MZHB) by reformulating its bonding potential, while retaining the nonbonding potential of the ZHB force field. [405] They showed that several structural and dynamic properties of silica, like the IR spectrum, distribution functions, mechanical properties, and negative thermal expansion, computed using the MZHB potential agree well with experimental data. Gu and Hammond developed a potential energy model for siliceous zeolites based in part on fits to infrared (IR) spectra by extending the MZHB potential. [406] This potential was tested in terms of its ability to model seven silica polymorphs (α-quartz, α-cristobalite, and five siliceous zeolites: zeolite Y, sodalite, silicalite-1, zeolite A, and chabazite). A sensitivity analysis shows significant sensitivity of the lattice parameter to the atomic partial charges, indicating a possible target for the parameterization of the atomic partial charges for crystalline materials. The Hill-Sauer force field for silica has recently been improved, and the transferability of the force field was tested on 13 experimentally known silica eight-ring zeolites. [407]

MOF Force Fields
The first work on flexible force fields was on the prototypical IRMOF-1 structure and started around 2006. Greathouse and Allendorf studied the water-induced decomposition of IRMOF-1 using a model adapted from the generic CVFF force field. [408] To incorporate the possible breakdown of the structure, the Zn-O interactions were modeled using a nonbonding approach. Using this model, Greathouse and Allendorf demonstrated a structural collapse occurring around 4% water content. Water stability and adsorption concepts have been reviewed by Burtch, Jasuja, and Walton. [409] The Greathouse and Allendorf model was later extended to other isoreticular zinc carboxylate coordination polymers. [410] Three other flexible force fields for IRMOF-1 were introduced in 2007 by Amirjalayer et al., [238] Dubbeldam et al., [411] and Han and Goddard. [412] Amirjalayer et al. extended the MM3 force field and used a building-block approach for the parameterization of IRMOF-1. It correctly predicts the structure of IRMOF-1, and the MM3 force field was chosen for a more accurate reproduction of the vibrational modes. The parameters for the Zn4O moiety were refined based on first-principles DFT calculations of nonperiodic model systems. [413] Dubbeldam et al. re-parameterized the Greathouse and Allendorf force field by reproducing the experimental lattice parameters and the CO2 and methane adsorption isotherms. [411] These authors showed that IRMOFs have exceptionally large negative thermal expansion coefficients and that this phenomenon is likely a generic feature of MOFs associated with their struts-and-springs, large-pore nature. Negative thermal expansion in MOFs is related to relatively rigid linkers connected to rigid metal clusters via flexible groups such as carboxylate groups, and to the porosity of the structure that allows adequate volume for motion.
The linker molecule undergoes transverse motions due to thermal vibrations, such that at lower temperatures the linker becomes more rigid and stretches out. The simulations were in excellent agreement with later experiments using neutron powder diffraction and with first-principles calculations using minimization of the free energy. [414] Han and Goddard used the DREIDING force field to further elucidate the mechanism of negative thermal expansion in MOFs. [412] The magnitude of the negative thermal expansion effect was found to be sensitive to the specific type of organic linker used. The thermal expansion can even change from negative to positive under the action of the pressure from adsorbed molecules. [415] Huang et al. developed a force field for IRMOF-1 and calculated the phonon thermal conductivity and vibrational power spectra. [416] Using the MM-type force field approach, Schmid and Tafipolsky [417] and Amirjalayer et al. [418] developed force fields for COF-102 and for boron-based covalent organic frameworks (COFs) in general, respectively.
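The transverse-vibration picture of negative thermal expansion can be illustrated with a toy model (not any of the force fields above): a rigid strut whose tip undergoes Gaussian transverse displacements has a mean axial projection that shrinks as the temperature, entering through the displacement variance kT/k, increases:

```python
import math, random

def mean_projected_length(L, kT_over_k, n=50000, seed=7):
    """Average end-to-end projection of a rigid strut of length L whose tip
    undergoes Gaussian transverse displacements with variance kT/k."""
    rng = random.Random(seed)
    sigma = math.sqrt(kT_over_k)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sigma)  # transverse thermal displacement
        total += math.sqrt(max(L * L - x * x, 0.0))  # axial projection
    return total / n

# The effective strut length shrinks as temperature (kT/k) increases:
cold = mean_projected_length(10.0, 0.01)
hot = mean_projected_length(10.0, 1.0)
```

For small displacements the projection is approximately L - ⟨x²⟩/(2L), so the contraction grows linearly with temperature, which is the geometric essence of the struts-and-springs argument.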
The Dubbeldam force field for IRMOF-1 reproduced the experimental thermal expansion properties well. A later refinement was made to reproduce the ab initio elastic tensor. [419] The classical model of IRMOF-1 of Dubbeldam et al. gave C11 = 29.3 GPa, C12 = 11.9 GPa, and C44 = 0.985 GPa. DFT calculations gave C11 = 29.2 GPa, C12 = 13.1 GPa, and C44 = 1.4 GPa at the LDA level using VASP, [70] and the combined experimental and DFT work of Zhou and Yildirim found C11 = 29.42 GPa, C12 = 12.56 GPa, and C44 = 1.16 GPa. [71] In order to refine the model for the reproduction of elastic constants, the generalized Hessian matrix was used to optimize the IRMOF-1 structure to a true minimum with all positive eigenvalues. The second use of the generalized Hessian was to analytically compute the elastic constants with machine precision. This mode-following optimization mechanism was previously applied to core-shell models of minerals and zeolites and is available in the GULP software. [307] Using this optimization procedure on MOFs showed, for example, that the Greathouse and Allendorf minimum structure was a structure with 32 negative eigenvalues. A further optimization using the generalized Hessian matrix showed that their model actually leads to an asymmetric structure with a distorted Zn4O unit (in disagreement with the experimental structure). Mode-following optimization is therefore an incredibly powerful tool to identify problems in force fields.
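The idea behind checking a purported minimum via Hessian eigenvalues can be sketched for a two-dimensional toy case; a single negative eigenvalue reveals a saddle point rather than a true minimum:

```python
def hessian2x2_eigenvalues(hxx, hxy, hyy):
    """Eigenvalues of a symmetric 2x2 Hessian, computed analytically
    from the trace and determinant."""
    tr = hxx + hyy
    det = hxx * hyy - hxy * hxy
    disc = (tr * tr / 4.0 - det) ** 0.5
    return tr / 2.0 - disc, tr / 2.0 + disc

def is_true_minimum(hxx, hxy, hyy, tol=1e-12):
    """A stationary point is a true minimum only if all Hessian
    eigenvalues are positive (positive curvature in every direction)."""
    lowest, _ = hessian2x2_eigenvalues(hxx, hxy, hyy)
    return lowest > tol

# A saddle point is flagged: eigenvalues of [[1, 2], [2, 1]] are -1 and 3.
```

In a real structure optimization the Hessian is 3N x 3N (plus cell degrees of freedom in a generalized Hessian), but the diagnostic is the same: negative eigenvalues at a "converged" geometry indicate that the optimizer stopped at a saddle, as found for the Greathouse and Allendorf IRMOF-1 structure.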
The IRMOF-1 structure does not show large-scale structural changes. Rather, the atoms largely vibrate around their equilibrium positions. Ford et al. found only minor differences between the diffusion results from the rigid and flexible framework models for several short n-alkanes and benzene in IRMOF-1. [420] Extreme examples of framework flexibility are the "breathing MOFs," with the MIL-53 structure as a prime example. [26] Breathing structures can expand or shrink to admit guest molecules. MIL-53 is a three-dimensional MOF containing unidirectional diamond-shaped channels. The open form of MIL-53 (MIL-53lp) is obtained upon calcination of the as-synthesized compound, while the adsorption of water at room temperature leads to the closed, narrow-pore form (MIL-53np). Slawek et al. studied the adsorption of methanol in STAM-1 by means of experimental and molecular simulation techniques, using the flexible force field developed by Zhao et al. [442] for Cu-BTC, due to the similarity between the two MOFs. MC simulations in the osmotic ensemble and MD simulations using this flexible force field provided an accurate description of the framework flexibility upon the adsorption of methanol in STAM-1. The prediction of accurate multicomponent adsorption isotherms for mixtures containing olefin and paraffin molecules in open-metal-site materials like Cu-BTC is difficult. Ideal Adsorbed Solution Theory (IAST) fails, but Heinen et al. have shown that accurate results can be obtained by explicit binary-mixture simulations using a force field that captures the relevant orbital interactions through an additional charge-induced-dipole-like term. [35] Recently, Durholt et al. parameterized a coarse-grained force field for the copper paddle-wheel based HKUST-1, [445] allowing both the length and time scales in the simulation of MOFs to be extended.
The zirconium-based MOF UiO-66(Zr) [446] comprises zirconium oxide clusters connected by benzene dicarboxylate (BDC) linkers. The framework demonstrates excellent chemical, hydrothermal, and mechanical stability, [446] and possesses reverse shape selectivity in the adsorption of hexane and xylene isomers. [447,448] Yang et al. studied CO2 and CH4 mixture adsorption and diffusion in UiO-66 using a fully flexible framework. [449] The force constants for the organic ligand in UiO-66(Zr) were taken directly from the CVFF force field, while the equilibrium bond distances and angles were assigned by averaging the values obtained from the DFT-optimized crystal structure. The interactions between the inorganic and organic parts, that is, the torsion and bend terms, were re-parameterized. The LJ potential parameters were taken from the DREIDING force field. A more transferable model, the BTW-FF model, included UiO-66 and UiO-67 in the initial parameterization set. [450] ZIF materials combine the topology and stability of zeolites with the chemical diversity of MOFs. Typical flexibility effects in ZIFs are, for example, the gate-opening behavior observed in ZIF-8 [451] and the flexibility of ZIF-7 and its structural impact on alcohol adsorption. [452] Zheng et al. presented a full set of force field parameters for ZIF-8, based on the AMBER database and on previously computed partial charges, which reproduce the ZIF-8 structural properties well over a wide range of temperatures and pressures. [453] The AMBER force field was also the basis of the flexible force field of Hu et al., [454] with several parameters taken from quantum chemical calculations. A structural transition from crystalline to amorphous, as found in experiment, was observed. Only with the structural flexibility included were the predicted diffusivities of CH4 and CO2 close to reported data in the literature. Generic force fields like DREIDING were also used to model ZIF-8. [455] Recently, more general and transferable intra-molecular ZIF force fields were developed: by Verploegh et al., [456] ZIF-FF by Weng et al., [457] and MOF-FF for ZIFs by Durholt et al. [458] Several other general approaches and force fields for MOFs have been developed. MOF-FF, a third-generation force field of Schmid and coworkers, is based on the building-block methodology. The first generation has been discussed in the preceding paragraphs. [238] In the second generation, a genetic algorithm was used to derive force field potentials for zinc benzoate and dilithium terephthalate. [336] In the third generation, a new energy expression was proposed, as well as a new parameterization scheme for Cu and Zn paddlewheel-based structures and Zn4O- and Zr6(OH)4-based MOFs. [459] A popular approach is to use a building-block methodology utilizing ab initio calculations on nonperiodic clusters to fit parameters. QuickFF is a program for a quick and easy derivation of force fields for metal-organic frameworks from ab initio input. [460] The QuickFF protocol was extended for an improved accuracy of structural, vibrational, mechanical, and thermal properties of metal-organic frameworks like IRMOF-1, MIL-53(Al), CAU-13, and NOTT-300. [461] UFF4MOF [462,463] is an extension of the original Universal Force Field (UFF) that incorporates additional atom types found in the Computation-Ready Experimental (CoRE) Database. [464] The BTW-FF model is a transferable inter-atomic potential that can be applied to MOFs regardless of metal or ligand identity. The initial parameterization set included IRMOF-1, IRMOF-10, IRMOF-14, UiO-66, UiO-67, and HKUST-1. [450] By extending the BTW-FF model, Bristow et al. developed the vibrational metal-organic framework (VMOF) force field. [465] Here, force field parameters are explicitly fitted to DFT-PBEsol-calculated phonon spectra of periodic binary oxides such as ZnO, ZrO2, TiO2, and Al2O3.
A modified MM3 Buckingham potential for the metal-linker interaction was needed to reproduce the ab initio and experimental structural and mechanical properties of the binary oxides more accurately.

Outlook
Bottom-Up vs Top-Down: Long-range interactions necessitate the use of macroscopic information in the force field parameterization. [382] That means that system properties like the unit cell size/shape and elastic constants are used to fit local parameters in a top-down approach. In contrast, local interactions, like bonds, bends, and torsions, can be efficiently obtained from small clusters using ab initio data as a reference. Small clusters are unsuitable for obtaining long-range interactions, and more and more approaches are based on using the full crystallographic unit cell. Another reason for using the full unit cell, or even larger cells, is the modeling of defects. [466,467]

Ab Initio Approach vs Empirical Fitting: Parameters from one force field are in principle not transferable to another. Force field design and parameterization is highly non-trivial, as evidenced by the large number of different force fields. Between the different classes of force fields the explanation is rather obvious: they have very different functional forms. Within a class, the reasons differ. For class I force fields, the lack of cross-terms makes the diagonal parameters less transferable. Class II force fields applied to systems where polarization and charge transfer are negligible are potentially more accurate, but require more parameters to optimize. However, most of the additional parameters are additional intra-molecular interactions that can be obtained using a genetic algorithm with ab initio structural and vibrational information as target data. The main reason why force field parameterization is difficult is the treatment of long-range interactions.
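The bottom-up step of extracting a local parameter from ab initio cluster data can be sketched as a closed-form least-squares fit of a harmonic bond term to an energy scan around the equilibrium bond length; the scan data and target force constant below are invented for illustration:

```python
def fit_force_constant(displacements, energies):
    """Least-squares fit of E = 0.5*k*d^2 to an energy scan.
    Minimizing sum((E_i - 0.5*k*d_i^2)^2) over k gives the closed form
    k = 2 * sum(E_i * d_i^2) / sum(d_i^4)."""
    num = sum(e * d * d for d, e in zip(displacements, energies))
    den = sum(d ** 4 for d in displacements)
    return 2.0 * num / den

# Synthetic "ab initio" scan generated with k = 500 (e.g., kJ/mol/A^2):
disp = [-0.10, -0.05, 0.05, 0.10]
ener = [0.5 * 500.0 * x * x for x in disp]
k_fit = fit_force_constant(disp, ener)
```

Real cluster fits target many coupled terms at once (bonds, bends, torsions, cross-terms) and therefore use genetic algorithms or regularized regression, but each individual term reduces to exactly this kind of projection of ab initio energies onto a functional form.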
VDW interactions are notoriously difficult to compute accurately at the ab initio level. Yet they are responsible for most of the interaction energy involved in adsorption and for the large-scale structural changes of flexible MOFs. But even if one could compute them with unlimited precision, there is no one-to-one mapping of a classical force field onto the QM PES. If one traces the energy for a CO2 molecule in different orientations as a function of distance to a metal site, then each of these energy-distance curves is reproduced with a different error, and all of them carry an intrinsic error. This error will be magnified at high temperature, where entropy comes into play (and sometimes becomes dominant). Therefore, fitting the ab initio data for VDW potentials is often replaced by fitting to experimental data. Fitting VDW potentials to experimental data (such as VLE curves) has additional advantages. It "lumps" other errors, like the assumption of a rigid molecule, into the mix, which are then optimized as a whole. Even when flexibility and polarization are not explicitly included in the model, they implicitly enter via the fitted parameters.
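Fitting VDW parameters to reference data can be sketched as a brute-force least-squares search over Lennard-Jones parameters. Here the "experimental" energies are synthetic and the parameter grids are illustrative; production fits target macroscopic observables such as VLE curves via full simulations rather than pair energies:

```python
def lj(r, eps, sigma):
    """12-6 Lennard-Jones potential."""
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (s6 * s6 - s6)

def fit_lj(rs, refs, eps_grid, sig_grid):
    """Brute-force least-squares fit of (eps, sigma) to reference
    energies, standing in for a fit to experimentally derived data."""
    best = None
    for eps in eps_grid:
        for sig in sig_grid:
            sse = sum((lj(r, eps, sig) - e) ** 2 for r, e in zip(rs, refs))
            if best is None or sse < best[0]:
                best = (sse, eps, sig)
    return best[1], best[2]

# Synthetic reference data generated with eps = 0.5, sigma = 3.0:
rs = [3.0, 3.4, 3.8, 4.2, 4.6]
refs = [lj(r, 0.5, 3.0) for r in rs]
eps_fit, sig_fit = fit_lj(rs, refs, [0.25, 0.5, 1.0], [2.8, 3.0, 3.2])
```

Because the data here are noiseless and the true parameters lie on the grid, the fit recovers them exactly; with real data the residual never vanishes, which is the intrinsic error discussed above.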
Machine Learning: Molecular simulation is a powerful tool to conduct "in-silico" experiments on atomic systems. DFT is currently applicable to hundreds of atoms, but with a classical formulation trillions of atoms can be treated. [468] However, while the increase in computational power and improved algorithms enable the simulation of larger systems that can be run for longer times, a single simulation is just one typical reproduction of the physics. To really get to the bottom of the physical, chemical, and biological phenomena that occur, it is vital to elucidate the what, why, and how. Therefore, even if one could atomistically simulate the system at large length and time scales, in many cases it is preferable to simplify the system significantly to focus on one particular aspect. For example, using transition state theory for rare events, the reaction coordinate provides valuable information on what exactly happens during the transition and why it occurs.
The main use of molecular simulation is to provide understanding. For this purpose, qualitative agreement with experimental data is sufficient. Reproducing experiments is not the end point, but rather the starting point for obtaining understanding. A decomposition of the total energy into terms with chemical meaning is essential to be able to investigate the microscopic origins of physical behavior. For example, one could switch off terms like electrostatics to check their effect and influence, or increase point charges, or fix certain torsion angles, etc. Molecular simulations can therefore be viewed as "doing experiments on the computer." Machine learning can be used as machinery to better parameterize an existing force field, but also as a new force field approach by itself. Replacing a force field by a machine learning approach has upsides and downsides. A downside is that the machine-learning force field might contain correlations that are artificial or potentially produce physics that is not real. However, if one sees this approach as an additional tool amongst many others, then one can take advantage of it in situations where it is genuinely beneficial. One such example is the screening of a large set of structures for specific applications. [347,352,353,[469][470][471] Here, the problem of false positives does not matter too much, because the screening is used to reduce the millions of structures to a small set of promising ones. This reduced set, usually a few dozen structures, is then re-evaluated and ranked using conventional simulation approaches. Databases of hypothetical zeolite structures have been used to analyze structure-property relations. [472] The Computation-Ready Experimental (CoRE) Database is one of the first attempts to construct a large set of MOF structures for analysis. [464,473,474] For the purpose of screening, new methodologies are actively being developed to generate large sets of hypothetical structures, [475] as well as methods to detect and avoid duplicates. [476,477] Potentially the results of screening could depend on the force field used, but for small molecules not much difference was found in the ranking of materials for the tested DREIDING and UFF force fields. [478]

GPU Computing: Graphics processing units (GPUs) were initially designed to deliver computer graphics. It was soon realized that their massive parallelism could also be exploited in many other computational tasks, for example, molecular simulation. This led to GPGPU (General-Purpose GPU) programming. Modern GPUs contain hundreds of processing units and potentially offer large speedups compared to CPU computing. The growth of the market used to be driven by evolving computer games, but currently augmented reality (AR), virtual reality (VR), and AI on the GPU are gaining a lot of traction.
Over the last decade, GPU-accelerated molecular modeling has really matured. [479] Although capable of very high rates of floating-point arithmetic, GPUs are also intrinsically highly parallel processors, and their effective exploitation typically requires extensive software refactoring and development. [480] CPU parallelism can be achieved with SIMD, OpenMP, and MPI (which is more task-parallel), but for the GPU, specialized programming languages such as CUDA and OpenCL were developed. Many force field implementations now have GPU support, for example, Amber, [481][482][483] AMOEBA, [484] and MFF and UFF. [485] Examples of codes that are designed to run on the GPU are OpenMM [486] for MD and GOMC [487] for MC. The programming paradigms for GPU computing in CUDA/OpenCL are vastly different from those on the CPU. In these languages, a computation must be mapped to workgroups of work-items that can be executed in parallel on the compute units. That means that new algorithms had to be developed that map optimally onto GPU hardware. Examples of these modifications are: nonbonded interactions [488] and efficient cell-list implementations, [489] electrostatics, [490][491][492] many-body potentials, [493,494] ReaxFF, [495] rigid-body constraints, [496] Monte Carlo, [487,[497][498][499][500][501] and MD. [502][503][504] Other use cases are speeding up post-processing, [505] screening, [506] accelerating the analysis of void space in porous materials on multicore and GPU platforms, [507] and boosting theoretical zeolitic framework generation for the determination of new materials structures. [508] The iRASPA GPU-accelerated visualization software for materials allows interactive examination of energy surfaces and computation of surface areas and void volumes. [509] While on the CPU there is little difference in performance between double- and single-precision (except when using SIMD units), the GPU is optimized for single-precision (which is sufficient for the dominant use case: graphics).
Besides double-precision being much slower on the GPU, there are also significantly fewer double-precision computing units. Therefore, most GPU codes use only single-precision. [510] However, for challenging systems the use of single floating-point precision may result in quantitatively and even physically wrong results. [511] The goal of many algorithms is to obtain the same accuracy but better performance with more low-precision computations. The relation between computational precision and final accuracy is complicated. The challenge in using single-precision operations is the reduction of cancellation and round-off errors in the calculation results. One solution is to use mixed-precision models, [512] another is to use quasi-double-precision methods to accelerate scientific applications. [510] Many simulation methods require a large number of independent random variables generated at each step. Some solutions generate random numbers on the CPU and upload these sets to the GPU before use, but solutions have also been proposed that generate random numbers on the GPU. [513,514] The future looks bright for GPGPU. Higher performance will be achieved by increasing the number of computing units. Tensor cores have also recently been introduced for the acceleration of ML on the GPU. On the CPU side, more and more cores are being, and will be, introduced as well. However, GPUs currently significantly outperform CPUs on both bandwidth and floating-point operations, and this gap is growing.
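The flavor of quasi-double-precision techniques can be illustrated with compensated (Kahan) summation, where a few extra same-precision operations recover round-off that plain accumulation discards; the effect below is shown in double precision, but the same trick rescues single-precision accumulations on the GPU:

```python
def naive_sum(values):
    """Plain left-to-right accumulation; low-order bits are lost."""
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    """Compensated summation: a running correction term recovers the
    low-order bits that each floating-point addition rounds away."""
    total, c = 0.0, 0.0
    for v in values:
        y = v - c
        t = total + y
        c = (t - total) - y  # rounding error of this addition
        total = t
    return total

# Adding 1.0 a thousand times to 1.0e16: each unit is below the ulp of
# the running sum, so naive accumulation never advances.
data = [1.0e16] + [1.0] * 1000
```

Here naive_sum(data) stays at 1.0e16, while kahan_sum(data) returns the exact 1.0e16 + 1000.0; the same reasoning motivates mixed-precision force accumulation in GPU MD codes.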
Open Force Field Initiative: Current force fields are very successful at what they were constructed for, but at the same time limited in accuracy and transferability when applied to other systems. There is no "one force field to rule them all," as evidenced by the large number of different force fields in common use (Table 1). The core of present-day force fields has remained almost the same since the 1980s and 1990s, and relatively few new efforts have been undertaken since then. Developing force fields is hard and time-consuming.
Scientific software is called a "community code" if it is freely available, written by a team of developers who welcome user input, and has attracted users beyond the developers. [515] Recent contributions to the LAMMPS molecular dynamics code by other authors are, for example, an implementation of a neural network potential, [516] OCTP, a tool for on-the-fly calculation of transport properties of fluids with the order-n algorithm, [517] and PyLAT, a set of open-source Python-based post-processing routines to compute viscosities, self-diffusivities, ionic conductivities, molecule or ion-pair lifetimes, dielectric constants, and radial distribution functions using best-practice methods. [518] The Open Force Field Consortium (https://openforcefield.org) is one such effort to bundle resources via community codes. The consortium aims to solve these and other significant problems by developing a new generation of open force fields, along with the open software infrastructure and open datasets needed to advance the field and accelerate progress. It aims to make force field fitting and extension straightforward by automating the derivation of i) numerical parameters, ii) the classification of atom types, bond types, etc., and iii) confidence metrics.
The initiative generates and curates the open datasets necessary for producing high-accuracy (bio-)molecular force fields, and also develops machinery to automatically parameterize molecular mechanics force fields. One important aspect is automating chemical perception, that is, the way force field parameters are assigned to a molecule based on its chemical environment. The SMIRKS Native Open Force Field (SMIRNOFF) format avoids the complexities of atom typing and allows changes to be made easily in both the chemical perception and the quantitative parameter space. [519]
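The pattern-based assignment idea can be caricatured in a few lines of Python (a deliberately simplified sketch with invented patterns and parameter values; the real SMIRNOFF format performs SMIRKS substructure matching on the molecular graph). Parameters are listed from generic to specific, every pattern is tested against each bond, and the last match wins, so no fixed atom types are ever assigned:

```python
# Hypothetical bond parameters: (pattern, {force constant k, length r0}).
# '*' is a wildcard element; values are illustrative, not from SMIRNOFF.
BOND_PARAMS = [
    (("*", "*"), {"k": 250.0, "r0": 1.50}),  # generic fallback
    (("C", "C"), {"k": 300.0, "r0": 1.53}),  # more specific, overrides
    (("C", "O"), {"k": 350.0, "r0": 1.43}),
]

def pattern_matches(pattern, bond):
    """Match a bond (pair of element symbols) against a pattern;
    the order of the two atoms is irrelevant."""
    (pa, pb), (a, b) = pattern, bond
    hit = lambda p, e: p == "*" or p == e
    return (hit(pa, a) and hit(pb, b)) or (hit(pa, b) and hit(pb, a))

def assign_bond(bond, params=BOND_PARAMS):
    """Scan all patterns in order and keep the last one that matches,
    emulating a hierarchy of increasingly specific patterns."""
    assigned = None
    for pattern, p in params:
        if pattern_matches(pattern, bond):
            assigned = p  # last match wins
    return assigned

cc = assign_bond(("C", "C"))  # picks the specific C-C entry
nh = assign_bond(("N", "H"))  # falls back to the generic entry
```

The point of the last-match-wins scan is that adding a new, more specific pattern refines the force field without retyping any atoms, which is the flexibility the SMIRNOFF format is designed around.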

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.