Net charge changes in the calculation of relative ligand-binding free energies via classical atomistic molecular dynamics simulation

The calculation of binding free energies of charged species to a target molecule is a frequently encountered problem in molecular dynamics studies of (bio-)chemical thermodynamics. Many important endogenous receptor-binding molecules, enzyme substrates, or drug molecules have a nonzero net charge. Absolute binding free energies, as well as binding free energies relative to another molecule with a different net charge will be affected by artifacts due to the used effective electrostatic interaction function and associated parameters (e.g., size of the computational box). In the present study, charging contributions to binding free energies of small oligoatomic ions to a series of model host cavities functionalized with different chemical groups are calculated with classical atomistic molecular dynamics simulation. Electrostatic interactions are treated using a lattice-summation scheme or a cutoff-truncation scheme with Barker–Watts reaction-field correction, and the simulations are conducted in boxes of different edge lengths. It is illustrated that the charging free energies of the guest molecules in water and in the host strongly depend on the applied methodology and that neglect of correction terms for the artifacts introduced by the finite size of the simulated system and the use of an effective electrostatic interaction function considerably impairs the thermodynamic interpretation of guest-host interactions. Application of correction terms for the various artifacts yields consistent results for the charging contribution to binding free energies and is thus a prerequisite for the valid interpretation or prediction of experimental data via molecular dynamics simulation. Analysis and correction of electrostatic artifacts according to the scheme proposed in the present study should therefore be considered an integral part of careful free-energy calculation studies if changes in the net charge are involved. © 2013 The Authors Journal of Computational Chemistry Published by Wiley Periodicals, Inc.


Introduction
The calculation of binding free energies is a standard task in the thermodynamic analysis of multicomponent molecular systems involving an association reaction between two system constituents, as, for example, an enzyme and a substrate, a receptor and a drug, or a nanocage and a guest compound. Physics-based approaches to compute binding free energies rely on statistical mechanics, which expresses the free energy as the natural logarithm of the system partition function (multiplied by the negative of the thermal energy, 2k B T, where k B is Boltzmann's constant). The underlying configurational ensembles can be generated by, for example, molecular dynamics (MD) simulation. A wealth of methodological improvements, along with increased computational resources allow (in principle) the accurate calculation of binding free energies, as extensively reviewed in the case of protein-ligand association. [1][2][3][4][5][6][7][8][9] However, if conducted without a proper eye on all potential pitfalls, binding free energies may be spuriously affected by limitations of MD simulations, such as, for example, an inadequate force-field description, approximations or/and assumptions in the free-energy calculation methodology, insufficient configurational sampling, or spurious configurational sampling due to the use of an effective electrostatic interaction function. These points are briefly discussed in turn below.
First, besides intrinsic deficiencies of classical force fields such as, for example, the neglect or mean-field treatment of electronic polarizability [10,11] and the use of effective interac-tion energy functions [12][13][14] with empirical parameters, additional problems arise if the system under consideration involves molecular species for which no force-field parameters are available. For instance, standard (bio-)molecular force fields may not provide parameterizations of certain metal ions, cofactors, or drug molecules. Ideally, the corresponding parameters should be parameterized against experimental data using a strategy consistent with the parameterization of the used force field. In practice, however, they are either inferred based on chemical intuition and comparison with similar compounds or taken from automatized parameterization protocols. [15,16] In addition, although the solvent representation in most (bio-)molecular force fields is already highly simplistic (rigid three-site models [17] ), its structural characteristics may be relinquished for the sake of computational savings, the solvent then being modeled implicitly and the solvent-generated electrostatic potential computed via numerical or empirical (generalized Born) solutions of the Poisson-Boltzmann equation. [18][19][20] Second, because they rely on a thorough characterization of the phase space of the system, simulations involving free-energy calculations are computationally expensive, which is why a number of approximate methods are sometimes applied. For instance, the free energy of charging a neutral particle may be estimated from an electrostatic linear-response approximation [21,22] or cumulant expansions at the endpoints of thermodynamic integration (TI). [23][24][25] Similarly, the free energy of growing the van der Waals envelope of a particle is sometimes approximated using physics-based [26,27] or empirical [21] relationships. Furthermore, assumptions in the ansatz of free-energy calculation methods, such as, for example, sufficient overlap of the phasespace distribution functions in different states of relative freeenergy calculations, [28] or electrostatic linear response [22,29] may limit the scope of their applicability. Lastly, discretization errors in numerical free-energy calculation methods, for example, the window width in potential of mean force calculations [30,31] or the integration method in TI, [32,33] limit the precision of the obtained results, although usage of optimal methods for statistical analysis [e.g., Bennett acceptance ratio (BAR) [34,35] or multistate BAR [36,37] approaches] may lead to significant gains in computational efficiency and statistical certainty.
Third, the phase space accessible to the system should be sampled exhaustively and according to the Gibbs measure appropriate for the desired thermodynamic ensemble, for example, canonical Boltzmann weighting in the case of simulations at constant particle number, temperature and volume. However, exhaustive sampling of phase space is complicated by the shear number of possible configurations, growing exponentially with the system size, and by energy barriers higher than k B T, usually not amenable to transitions in plain MD simulation. Enhanced sampling methods can be used to improve coverage of the relevant phase space. A widely-used technique to address this problem involves the alteration of the potential energy function, for example, through local [38,39] or nonlocal [40,41] biasing, or more complex smoothening procedures, [42,43] along with subsequent reweighting of the sampled configurations to the Gibbs measure corresponding to the unaltered potential energy function.
Finally, even if the phase space accessible to the system is sampled exhaustively and according to the Gibbs measure appropriate for the desired thermodynamic ensemble, the sampled configurations might not be representative of the real (experimental) situation because of an approximate or incorrect calculation of interatomic interactions. This is generally the case for electrostatic interactions which, due their long-range nature, are treated in an effective manner during MD simulations. [44][45][46][47][48][49] Ensuing artifacts become strongly apparent in the configurational sampling of systems involving charged particles or in free-energy calculations involving the change of the net charge of the system (charging free energy calculations), and have been reviewed extensively. [48][49][50][51][52][53][54][55][56] For instance, if electrostatic interactions are calculated via lattice-summation (LS) over a periodic system in charging free energy calculations, the orientational polarization of the environment of the particle to be charged will be affected by the influence of the periodic copies of this particle, which is an inappropriate contribution if actually a truly nonperiodic system is to be described. The magnitude of the introduced errors may be strongly dependent on the parameters of the system or the interaction function (e.g., the box-edge length), giving rise to so-called methodology-dependent charging free energies. [54] It has been shown before how charging free energies of monoatomic [54,57,58] and polyatomic [59][60][61] ions in infinitely dilute aqueous solution can be corrected for these errors, such that methodology-independent values are obtained.
The goal of the present article is to address the last point above for model systems representative of a protein-ligand complex in aqueous solution, that is, to present a correction scheme for the charging of polyatomic ions in a low-dielectric cavity functionalized with different chemical groups (section "Simulated guest-host systems"), such that the raw charging free energy DA raw chg ðHÞ of a ligand bound to a host molecule can be corrected to a methodology-independent value DA chg ðHÞ (Fig. 1). Comparison with the corresponding raw or corrected charging free energies in bulk water, DA raw chg ðWÞ or DA chg ðWÞ, respectively, yields the raw or corrected binding free energies of the charged ligand to the host molecule relative to a neutralized analog of the ligand, in the following denoted as DDA raw chg and DDA chg , respectively (Fig. 1). The Figure 1. Thermodynamic cycle illustrating the connection between the binding free energy DA Gd bind of a noninteracting (dummy) guest molecule G d , the binding free energy DA Gq 0 bind of a neutral guest molecule G q 0 , the raw binding free energy DA raw bind of a charged guest molecule G q (full atomic partial charges), and the corrected binding free energy DA bind of the latter guest to a host molecule H. Corresponding complexes formed by the host and the bound guest molecules are denoted H:G d , H:G q 0 , H:G q , and H:G q , respectively. The free energies of growing the van der Waals envelope of the guest molecule, DA vdW ðEÞ, raw free energies of charging, DA chg ðEÞ, and correction terms, DA cor ðEÞ, of the guest in environment E (either water W or the host molecule H) are associated with the reversible work of creating neutral van der Waals particles (G d mutated into G q 0 ), installing partial charges (G q 0 mutated into G q ) and applying corrections for approximate-electrostatics, summation, and finite-size artifacts (G q represented in the simulated situation associated with these artifacts versus in the ideal situation, i.e., a macroscopic nonperiodic system with Coulombic electrostatic interactions). The differences DDA raw chg ¼ DA raw chg ðHÞ2DA raw chg ðWÞ and DDA chg ¼ DA chg ðHÞ2DA chg ðWÞ are the raw and corrected charging contributions to the binding free energy, that is, the raw and corrected binding free energies of the charged guest G q relative to its neutralized analog G q 0 [eqs. (1) and (21)].

FULL PAPER
WWW.C-CHEM.ORG possible occurrence of methodology-dependent artifacts (caused by the use of an approximate electrostatic interaction function, an improper summation scheme and simulated systems of finite size) in DDA raw chg directly impairs calculations of the (absolute) binding free energy of a charged ligand and of relative binding free energies between ligands of different net charge. The value obtained for DDA raw chg is not representative of a macroscopic nonperiodic system with Coulombic electrostatic interactions, and only DDA chg allows a meaningful comparison to or prediction of experimental data measured in systems of macroscopic extent (Fig. 1). This issue was, however, not duly appreciated in previous work. Examples from the authors' own research include, for example, the calculation of ligand binding free energies [25,62,63] or redox potentials. [64] On the long term, increases in computational power as currently mainly driven by graphics processing unit-based electrostatic interaction calculation [65][66][67] and advances in multiscale simulation methodologies targeted to an improved representation of electrostatic interactions [68,69] may eventually allow for the simulation of macroscopic nonperiodic systems with Coulombic electrostatic interactions, or electrostatic interactions truncated at sufficiently large distances, such that an adequate representation of experimental bulk systems is achieved. Before such techniques have become state of the art, however, a scheme that corrects for methodology-induced artifacts will prove valuable in the calculation of binding free energies of charged ligands to (bio-)macromolecular host compounds.

Methods
Simulated guest-host systems A previously described [70] simplified guest-host system was used to assess the size of methodology-induced artifacts in the calculation of (relative) charging free energies (Table 1). Two oppositely-charged guest molecules, methylammonium (MAM) and acetate (ACE), binding to a host C60 molecule (buckyball), or derivatives thereof, were considered. In comparison to a realistic buckyball model, here all C-C bonds were artificially extended to 0.2 nm. For the host molecules, four variants were used: (i) an empty, apolar C60 cavity (CAPO); (ii) a C60 cavity containing a covalently-bound amide group, representing a neutral polar cavity with hydrogen-bonding capability (CHB); (iii) a C60 cavity containing a covalently-bound methylammonium group, representing a positively-charged cavity (CPOS); (iv) a C60 cavity containing a covalently-bound carboxylate group, representing a negatively-charged cavity (CNEG). Because of the high symmetry and low flexibility in these systems, the observed artifacts can be expected to be solely due to methodological aspects rather than, in addition, insufficient sampling. Figure 2a provides a graphical illustration of an example guest-host complex used in the present study. GROMOS molecular topology [71] files for the eight guest-host complexes are provided as supporting information.
The raw charging component DDA raw chg of binding free energies of charged guests MAM and ACE to host molecules CAPO, CHB, CPOS, and CNEG was calculated with MD simulation according to a thermodynamic cycle (Fig. 1) involving the free energies of guest-charging in water DA raw chg ðWÞ and in the host cavity DA raw chg ðHÞ, The MD simulations involved cubic computational boxes containing one guest molecule at multiple charge states q G varying from 0.0 to the full charges Q G ¼ 1:0 e or 21:0 e for MAM and ACE, respectively, in either pure water or the hydrated host cavity. Each system was simulated in four different box sizes of edge lengths L ss ; L s ; L m , and L l , of about 2.46-2.53, 2.90, 3.21-3.25, or 3.80-3.81 nm, respectively, differing by the number of water molecules. Furthermore, two different methods were used to calculate electrostatic interactions (section "MD simulations"), namely either LS [72,73] or moleculebased cutoff truncation in combination with a Barker-Watts reaction-field correction [74] (BM). Simulations with the LS scheme were performed in boxes of edge length L ss ; L s ; L m , and L l , and are in the following referred to as LS,ss, LS,s, LS,m, and LS,l, respectively. Simulations with the BM scheme were performed in boxes of edge length L m and L l , and are in the following referred to as BM,m and BM,l, respectively.

MD simulations
All MD simulations were performed either with a modified version of the GROMOS96 program [71] or with the GROMOS11 program. [75] The former was exclusively used for free-energy calculations in simulations using the particle-particle-particlemesh (P 3 M) method [72,73] for the treatment of electrostatic interactions. Water was represented by means of the three-site simple point charge (SPC) model. [76] Host and guest molecules were described with the GROMOS 53A6 force-field parameter set as in the previous study of Ref. [70]. For CNEG, the appropriate GRO-MOS improper dihedral type 1 (reference value of 0 ) was used for the improper dihedral angle in the formate group rather than an erroneous type 2 (reference value of 35 as in Ref. [70]).
All simulations were carried out under periodic boundary conditions (PBC) based on cubic computational boxes. The equations of motion were integrated using the leap-frog scheme [77] with a timestep of 2 fs. The solute bond-length distances and the rigidity of the water molecules were enforced by application of the SHAKE algorithm [78] with a relative  24 . The center of mass translation of the computational box was removed every 2 ps. The temperature was maintained at 300 K by weak coupling to a heat bath [79] using a coupling time of 0.1 ps and the volume was kept constant. Electrostatic interactions were either calculated using LS based on the P 3 M algorithm with tinfoil boundary conditions, [72,73] or using the BM scheme. [74] The LS scheme was applied with [72,80,81] a spherical hat charge-shaping function of width 1.0 nm, a triangular shaped cloud assignment function, a finite-difference (FD) scheme of order two and a grid spacing of about 0.08-0.12 nm in the different systems. The self-energy term [23,73,[82][83][84] of the guest molecule was not included in the electrostatic potential at the guest atom sites to be consistent with the previously developed correction scheme for single-ion solvation free energies. [54,57] In simulations with the LS scheme, Lennard-Jones interactions were truncated at an atom-based cutoff distance R C ¼ 1:0 nm. Both real-space electrostatic and Lennard-Jones interactions were calculated at each timestep based on a pairlist updated at each timestep. The BM scheme was applied with a value BW ¼ 66:6 for the relative permittivity of the dielectric continuum surrounding the cutoff sphere, as appropriate [85] for the SPC water model. Here, too, the selfenergy term [84,86] of the guest molecule was not included in the calculation of the electrostatic potential at the guest atom sites to be consistent with previous work. [54,57] In simulations with the BM scheme, electrostatic and Lennard-Jones (van der Waals) interactions were truncated at a charge-group cutoff distance R C ¼ 1:4 nm, and calculated at each timestep based on a pairlist that was updated at each timestep. The calculation of the charging component to the binding free energy was performed in two TI procedures [87] considering the free energies of charging the guest molecules in water and in the host cavity (Fig. 1). All production simulations for the free-energy calculations were preceded by an equilibration period of 0.3 ns and lasted 1 ns. The configurations sampled along these simulations were written to file every 0.3 ps for subsequent analysis, whereas the corresponding energetic data was written every timestep.

Structural properties
The configurations sampled in simulations LS,ss, LS,l, and BM,l of the fully charged guests MAM and ACE in hydrated host molecules CAPO, CHB, CPOS, and CNEG were analyzed in terms of the solvent radial distribution function g(r) around the buckyball center of mass, and the solvent radial polarization P(r) around the charged guest. These functions were calculated as Figure 2. a) Stick representation of the guest-host complex MAM-CNEG. All guest-host complexes used in the present study are similarly composed of a noncovalently-bound oligoatomic ion (guest) and an artificial buckyball molecule possibly functionalized with a covalently-bound chemical group (host), as explained in section "Simulated guest-host systems." In the following, they are depicted by a simplified schematic where the ion is drawn as a filled black circle and the buckyball is drawn as the gray surrounding structure. b) Graphic illustration of the meaning of correction terms DA pol , DA psum , DA dir , and DA exc [eqs. (10), (11), (14), (15), (16), (17), (18)] to be applied to the raw charging free energy DA raw chg [eq. (7)], for charging of a guest molecule in a solvated host to get a corrected charging free energy DA chg [eq. (20)] for the LS and BM electrostatic schemes. The guest, host, and water molecules are depicted in black, dark gray, and light gray colors, respectively. Periodic copies of the computational system are depicted with dashed lines. Arrows labeled with the above correction terms depict the concerned interactions, that is, guest-solvent interactions (DA pol , DA psum ), guest-host interactions (DA dir ), and environment-mediated guest-guest interactions (DA exc ). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.] gðrÞ ¼ ð4pg w Drr 2 Þ 21 hNðr; DrÞi; (2) and PðrÞ ¼ l w g wg ðrÞhÑ 21 ðr; DrÞMðr; DrÞi; where h…i denotes ensemble averaging, g w is the solvent number density, Nðr; DrÞ is the number of water molecules j for which r2Dr=2 < r j r þ Dr=2 (r j denoting all possible minimum-image vectors connecting the center of mass of the 60 buckyball carbon atoms to the oxygen atom of any periodic copy of water molecule j),Ñðr; DrÞ is the number of water molecules j for which r2Dr=2 < r j r þ Dr=2 (r j here denoting all possible minimum-image vectors connecting the MAM nitrogen atom or the ACE carboxylate carbon atom to the oxygen atom of any periodic copy of water molecule j), Dr ¼ 0:01 nm is the bin width, l w ¼ 0:0473 e nm is the molecular dipole moment of the SPC water model [76] and Mðr; DrÞ is defined as e j being a unit vector along the dipole moment of molecule j.
where S ¼ 66:6 is the relative dielectric permittivity of the SPC water model. [85] Equation (6) is an approximation because in the considered systems the charge Q G 0 is actually spread out over multiple atom sites. In addition, the dielectric permittivity around the highly-charged MAM-CPOS and ACE-CNEG systems may be lower than the bulk value of 66.6.

Free-energy calculations
Raw charging free energy. Raw charging free energies DA raw chg ðWÞ and DA raw chg ðHÞ [eq. (1)] were calculated with the TI approach [87] along progressive installation of k-dependent intermolecular electrostatic interactions U inter elec ðx; kÞ, where the subscript E denotes the environment of the guest (either W or H), x ¼ fx G ; x E g denotes the 3N-dimensional coordinate vector of the system containing N G guest atoms and N E water and host atoms, k denotes the scaling parameter of the TI procedure, and h…i k;E denotes ensemble averaging over configurations sampled during a simulation where the guest is in environment E, and guest partial atom charges are scaled by k. The term U inter elec ðx; kÞ in eq. (7) was calculated based on the sampled configurations as where o is the permittivity of vacuum and q i is the partial charge of atom i. w ij ðxÞ is the effective pairwise electrostatic interaction function which describes the implementation of the particular electrostatics scheme. [71] The exact details of this function depend on the implementation in a simulation program, and can be found elsewhere. [71,86] Note that the electrostatic interaction energy U inter elec ðx; kÞ is exempt of intramolecular contributions. During the simulations of a given charge state, the partial charges of the guest atoms were scaled by k. For all freeenergy estimates, eleven charge states (k ¼ 0:0, 0.1, …, 0.9, 1.0) were used and, the integral in eq. (7) was evaluated numerically with the trapezoidal rule. Statistical error estimates on ensemble averages pertaining to particular k-values were obtained from block averaging. [88] Errors on the free-energy values were calculated by numerical integration (trapezoidal rule) of the individual errors and amounted to about 0.2-1.9 kJ mol 21 .
Free-energy correction terms. The raw charging free energies DA raw chg ðEÞ [eq. (7)] were used to calculate corresponding methodology-independent values DA chg ðEÞ as [54,57,58] DA chg ðEÞ ¼ DA raw chg ðEÞ þ DA cor ðEÞ; where DA cor ðEÞ is a free-energy correction for the various methodology-dependent errors committed during the simulation. These errors have been discussed extensively for the case of monoatomic [54,56,58] and polyatomic [61] single-ion hydration.
In simulations with the LS and BM schemes, they arise from: i. The deviation of the solvent polarization around the charged solute from the polarization in an ideal macroscopic system with Coulombic electrostatic interactions. This is a consequence of the use of a finite (microscopic) system during the simulation, for example, a computational box simulated under PBC, of the use of approximate (non-Coulombic in the limit of infinite system sizes) electrostatic interaction functions, for example, involving cutoff truncation, and of the use of a solvent model with an inaccurate dielectric permittivity. The corresponding correction term is here denoted DA pol . Note that in previous work, [61] this correction term was denoted DG AþBþD , or split up into three terms [54,57,58] DG A ; DG B , and DG D . ii. The deviation of the solvent-generated electrostatic potential at the atom sites of the charged solute as calculated from the simulated trajectory from the "correct" electrostatic potential. This is a consequence of the possible application of an inappropriate summation scheme for the contributions of individual solvent atomic charges to this potential, that is, summing over individual point charges ("P-summation" [89] ) versus summing over charges within individual solvent molecules ("Msummation" [89] ), as well as of the possible presence of a constant offset in this potential, due to the presence of an interfacial potential at the surface of the solute along with the constraint of vanishing average potential in the LS scheme. The corresponding correction term is here denoted DA psum . Note that in previous work, [54,57,58,61] this correction term was denoted DG C1 . iii. The spurious calculation of guest-host interactions with an effective electrostatics scheme rather than with Coulombic electrostatic interactions. The corresponding correction term is denoted DA dir . Note that it is only pertinent to systems containing the host moiety, that is, it only occurs in DA cor ðHÞ. iv. The presence of electrostatic interactions between excluded atoms in the guest molecule. The corresponding correction term is denoted DA exc . Note that it is absent in the case of monoatomic guest compounds and that it was not explicitly listed in Ref. [61], because there these interactions were considered an integral part of the environmentinduced electrostatic potential at the solute atoms. Figure 2b illustrates the interactions addressed by the afore mentioned correction terms DA pol , DA psum ; DA dir , and DA exc for the case of a guest molecule binding noncovalently to a solvated host. In the following, the calculation of these terms is explained. As in previous work, [61] DA pol can be deduced from the results of two continuum-electrostatics calculations, for the LS scheme and for the BM scheme, where DA CB chg is the charging free energy of the guest molecule in a macroscopic nonperiodic system with Coulombic electrostatic interactions based on the experimental solvent permittivity (CB). DA LS chg and DA BM chg are the charging free energies of the guest molecule in a periodic system with LS or BM electrostatic interactions based on the model solvent permittivity, respectively. Here, the relative dielectric permittivity is set to 66.6 in the calculation of DA LS chg and DA BM chg as appropriate for the SPC water model [85] and to 78.4 in the calculation of DA CB chg , as appropriate for water, [56] to account for the inaccurate dielectric permittivity of the SPC water model. As the guest molecules considered in this study are essentially rigid, DA LS chg ðWÞ and DA BM chg ðWÞ are essentially configuration-independent. Also, the rotational and translational sampling of the guest molecules in the host cavities does not lead to significant fluctuations in DA pol (data not shown). Therefore, the calculations of DA CB chg , DA LS chg , and DA BM chg were only performed based on a sin-gle structure, taken as the final solute configuration of the simulation at guest charge states k 2 f0:0; 0:2; 0:4; 0:6; 0:8; 1:0g in the system with box-edge length L l , as where X ¼ CB, LS, or BM, and / inter ðr i ; k; XÞ is the environment-generated electrostatic potential evaluated at guest atom site i and charge state k for the given boundary conditions and electrostatics scheme X. For both the LS and BM scheme, / inter ðr i ; k; CBÞ was evaluated using the FD Poisson equation solver of Refs. [90][91][92] with the appropriate boundary conditions, and solvent permittivity. The FD solver was also used to evaluate / inter ðr i ; k; LSÞ. Because the FD solver does not offer the option of using the BM scheme, a combination with the fast fourier transform (FFT) Poisson equation solver of Refs. [93,94] was used to evaluate / inter ðr i ; k; BMÞ as where the first and the last two terms on the right-hand side are calculated based on the FD and FFT Poisson equation solvers, respectively. This is done to enhance cancellation of griddiscretization and boundary-smoothing errors in the two different Poisson equation solvers. In both algorithms, the grid spacing was set to about 0.02 nm and the convergence threshold for the electrostatic free energy was set to 10 26 kJ mol 21 . A van der Waals envelope was used to define the guest-host system, where the atomic radii were based on distances at the minimum of the Lennard-Jones potential between the different solute atoms and the oxygen atom of a SPC water molecule [76] using the Lennard-Jones interaction parameters of the GROMOS 53A6 force-field parameter set, [95] reduced by an estimate [96] for the radius of a water molecule (0.14 nm). Polar hydrogen atoms were treated differently [53,55] and assigned an atomic radius of 0.05 nm. Instead of using eq. (12) to evaluate DA X chg , less computationally intensive but more approximate approaches can be used, which are discussed in Appendix section "Calculation of DA pol ". The term DA psum corrects for the atom-based summation scheme implied by the LS and BM schemes in comparison to a proper molecule-based summation scheme. [89] In the present study, it is calculated as for the LS scheme and for the BM scheme, where N A is Avogadro's constant, c w is the quadrupole-moment trace of the water model relative to its single van der Waals interaction site, N w is the number of FULL PAPER WWW.C-CHEM.ORG water molecules, BW the reaction-field permittivity, R C the cutoff distance, L 3 the box volume (here, a constant box-edge length L is adopted because all simulations were performed at constant volume), and hN w ðR C Þi is the average number of water molecules present within the cutoff sphere around the center of mass of the 60 buckyball carbon atoms for in-host charging, or around the MAM nitrogen or ACE carboxylate carbon atoms in the case of in-water charging. For the SPC water model, [76,89] c w ¼ 0:0082 e nm 2 . Equation (15) (10) and (11)] also corrects for the spurious vanishing average (over the computational box) electrostatic potential contribution due to the guest and host quadrupole moments in the MD simulations. Note that this implicit inclusion of the solute-associated DA psum correction in DA pol was not recognized in previous work. [61] Similar to Ref. [61], the term DA C2 was neglected in the present study because this term is proportional to the ratio of the guest volume to the box volume, that is, its magnitude is very small for the systems considered here.
The term DA dir , to be applied only to raw charging free energies of the guest in the host, was calculated as where DA CB chg;GðHÞ , DA raw;LS chg;GðHÞ , and DA raw;BM chg;GðHÞ are charging free energies of the guest due to the host calculated with Coulombic electrostatic interactions in a nonperiodic system and with effective electrostatic interactions (LS or BM) under PBC, respectively. The guest-host complex configurations sampled in the explicitwater MD simulations at all guest charge states q G (section "Raw charging free energy") were used to extract DA CB chg;GðHÞ ; DA raw;LS chg;GðHÞ , and DA raw;BM chg;GðHÞ in postanalyses under NPBC (guest-host complex in vacuum described with Coulombic electrostatic interactions) and under PBC (guest-host complex described with LS or BM electrostatic interactions), respectively (section "Solute and solvent contributions to the free energy of charging").
The term DA exc was calculated as wherew is a modified LS or BM electrostatic interaction function exempt of self term. The integrand of eq. (18) corresponds to minus the first term in eq. (20) of Ref. [61] and corrects for the presence of electrostatic interactions between excluded atoms (first and second covalently-bound neighbors) in the Hamiltonian used in the present simulations. Electrostatic interactions between excluded atoms are equal to the normal interaction function applied to nonexcluded atoms, but reduced by the Coulombic contribution. As a result, excluded atoms may be viewed to interact via a term that depends on the representation of the surroundings of the solute, that is, periodic copies of the computational box in the case of the LS scheme and a continuous medium of homogeneous relative dielectric permittivity in the case of the BM scheme. Therefore, in the present study, interactions between excluded atoms are regarded as contributing in a methodologically-dependent way to the charging free energy. Application of DA exc removes the corresponding contribution. Given the above correction terms, the charging free energy DA chg ðEÞ is calculated according to eq. (9) as the sum of the raw charging free energy DA raw chg ðEÞ, and the correction terms DA pol [eqs. (10) and (11)], DA psum [eqs. (14) and (15)], DA dir [eqs. (16) and (17)] and DA exc ðEÞ [eq. (18)] as DA chg ðWÞ ¼ DA raw chg ðWÞ þ DA pol þ DA psum þ DA exc ðWÞ; (19) in the case of charging in water, and as in the case of charging in the host. These charging free energies were calculated for guests MAM and ACE in water and host molecules CAPO, CHB, CPOS, and CNEG and yield the corrected charging component DDA chg of binding free energies of the guest to the host, Solute and solvent contributions to the free energy of charging The trajectories of guest-host complexes were reanalyzed to obtain the raw free energy of charging the guest molecule due to the host and periodic host copies DA raw chg;GðHÞ as DA raw chg;GðHÞ ¼ ð dk½U tot elec ðk; GHÞ2U tot elec ðk; GÞ2U tot elec ðk; HÞ; (22) where U tot elec ðk; GHÞ; U tot elec ðk; GÞ, and U tot elec ðk; HÞ are the total electrostatic energies sampled in trajectories pertaining to FULL PAPER WWW.C-CHEM.ORG guest charge states defined by k using modified interaction parameters with full guest and host charges, full guest and zeroed host charges and zeroed guest and full host charges, respectively. The corresponding raw free energies of charging the guest molecule due to the solvent and periodic solvent copies DA raw chg;GðWÞ are calculated as DA raw chg;GðWÞ ¼ DA raw chg ðHÞ2DA raw chg;GðHÞ ; where DA raw chg ðHÞ is given by eq. (7). Corrected values DA chg;GðHÞ and DA chg;GðWÞ are calculated as the sum of the raw charging free energies and the correction term DA dir [eqs. (16) and (17) in the case of DA raw chg;GðHÞ , and as the sum of the raw charging free energies and the correction terms DA pol ; DA psum , and DA exc [eqs. (10), (11), (14), (15), and (18)], in the case of DA raw chg;GðWÞ .

Results
Application of correction terms for spurious solvent polarization and wrong dielectric permittivity of the solvent model (DA pol ) [eqs. (10) and (11)], improper electrostatic potential summation (DA psum ) [eqs. (14) and (15)], effective guest-host direct electrostatic interactions (DA dir ) [eqs. (16) and (17)] and the presence of electrostatic interactions between excluded solute atoms in the Hamiltonian (DA exc ) [eq. (18)] to raw charging free energies DA raw chg [eq. (7)] yields corrected values DA chg [eqs. (19) and (20)] reported in Table 2 for charging of guests MAM and ACE in water or in complex with the hydrated host molecules CAPO, CHB, CPOS, or CNEG (Table 1). This is illustrated for four different sizes of the computational box (L ss ; L s ; L m and L l ) and the two different electrostatics schemes (LS and BM) considered (section "MD simulations"). While the root-mean-square deviations for DA raw chg are 6.6, 6.8, 6.7, 8.1, and 9.0 kJ mol 21 for charging of MAM in CAPO, CHB, CPOS, CNEG, and water, respectively, and 12.4, 12.3, 11.8, 13.9, and 10.2 kJ mol 21 for charging of ACE in CAPO, CHB, CPOS, CNEG, and water, respectively, they are reduced to 0.8, 1.0, 1.5, 2.1, and 1.4 kJ mol 21 for MAM and 0.5, 0.6, 1.6, 2.5, and 0.6 kJ mol 21 for ACE in corresponding corrected free energies DA chg . Complexes MAM-CNEG and ACE-CNEG exhibit the largest rmsd values in corrected charging free energies (2.1 and 2.5 kJ mol 21 ). Notably, for these complexes, the DA chg value from simulation LS,ss has higher magnitudes by up to 6.0 kJ mol 21 (MAM-CNEG) or up to 8.0 kJ mol 21 (ACE-CNEG) compared to simulations LS,s, LS,m, LS,l, BM,m, and BM,l. These deviations might be due to the inability of the continuum-electrostatics representation to capture short-range artifacts in solvent structure.
For all guest-host complexes, in simulations LS,ss, g(r) shows spurious density fluctuations (overestimated height of the first and second peaks; Fig. 3) and P(r) shows marked underpolari-zation in comparison to the Born polarization P Born ðrÞ (due to the large influence of periodic solute copies, a consequence of the small edge length of the computational box; Fig. 4). As discussed in Appendix section "Calculation of DA psum ", the former artifact might cause the DA psum correction [eq. (14)] to be inadequate. The long-range regime of the latter artifact is corrected by the DA pol correction [eqs. (10) and (11)]. However, short-range artifacts in the solvent polarization, which affect solvation shell structure in the vicinity of the guest-host complex, cannot be represented by a continuum-electrostatics description of the solvent and are thus not captured by the DA pol correction [eqs. (10) and (11)]. Such artifacts appear especially pronounced in simulation LS,ss of complexes MAM-CNEG and ACE-CNEG. This is evidenced by the relatively large deviation of the short-range P(r) in simulation LS,ss from the corresponding polarization in simulations LS,l and BM,l in the fully-charged complex ACE-CNEG (Fig. 4) and in complex MAM-CNEG containing the uncharged guest molecule (data not shown). Note furthermore that in simulations BM,m and BM,l, P(r) shows marked cutoff artifacts at the cutoff distance of 1.4 nm (a dip in the case of MAM-containing complexes and a crest in the case of ACE-containing complexes) and just before the cutoff distance (overpolarization in the case of MAM-containing complexes). These peculiarities, arising from molecule-based cutoff truncation in an explicit-solvent system, are also not captured by the continuum-electrostatics analog of P(r) for the BM scheme [94,97] and can thus not be remedied by DA pol .
For the systems considered in the present study, the magnitude of correction terms DA pol ; DA psum , and DA dir (CPOS and CNEG only) is very large (on the order of 50-200 kJ mol 21 ). For hydration in pure water, DA pol is always negative (independent of the sign of the guest charge) to account for the underhydration of the guest molecule caused by the presence of neighboring periodic copies (LS scheme), or the omission of guest-solvent interactions outside the cutoff sphere (BM scheme). In contrast, DA pol is positive for MAM-charging in CNEG and ACE-charging in CPOS because in these complexes the initial state of the TI procedure contains a charged guesthost complex, whereas the final state contains a neutral complex, that is, the electrostatic potential sampled at the guest atom sites is spurious in the initial rather than in the final state. As is the case for hydration in pure water, DA psum is negative for the charging of cations (MAM) and positive for the charging of anions (ACE). Because of the absence of electrostatic guest-host interactions, DA dir is zero for the charging in CAPO. As the LS and BM electrostatic interaction functions used in the present study are reduced in comparison to the Coulombic component (presence of self-and reaction-field terms [71,86] ), DA dir is negative for charging of MAM in CHB and the oppositely-charged CNEG and of ACE in CHB and the oppositely-charged CPOS. Likewise, it is positive for charging of the guests in the like-charged hosts, that is, MAM in CPOS and ACE in CNEG. In comparison to the other correction terms, DA exc is of rather small magnitude (0.1-0.4 kJ mol 21 ), because it is a short-range electrostatic interaction reduced by the short-range singularity associated with the Coulombic FULL PAPER WWW.C-CHEM.ORG Table 2. Charging free energies DA chg of the guest molecules MAM and ACE in hydrated host molecules CAPO, CHB, CPOS, and CNEG, as well as in water, computed in cubic computational boxes of edge length L containing N w water molecules using LS or BM electrostatic interactions (sections "Simulated guest-host systems" and "MD simulations"). Values obtained with the LS scheme in boxes of edge lengths L ss ; L s ; L m , and L l are labeled LS,ss, LS,s, LS,m, and LS,l, respectively, and values obtained with the BM scheme in boxes of edge lengths L m and L l are labeled BM,m and BM,l, respectively. The charging free energy DA chg [eqs. (19) and (20)] is calculated as a sum of the raw charging free energy DA raw chg [eq. (7)], and the correction terms DA pol ; DA psum ; DA dir , and DA exc [eqs. (10), (11), (14), (15), (16), (17), (18)]. Error estimates on the raw charging free energies refer to the statistical uncertainty obtained from block averaging. [ (24) and (25)]. DA chg;GðHÞ differs from DA raw chg;GðHÞ in that it is exempt of interaction of the guest with periodic host copies (LS scheme) or of reaction-field terms (BM scheme) and  Table 2). DA chg;GðWÞ differs from DA raw chg;GðWÞ in that it is corrected for all solvent-associated artifacts, that is, spurious solvent polarization and wrong dielectric permittivity of the solvent model, improper electrostatic potential summation and the presence of electrostatic interactions between excluded atoms (DA pol ; DA psum ; DA exc ; Table 2). It can be seen that application of the correction terms may cause considerable shifts in the ratio of DA chg;GðWÞ and DA chg;GðHÞ . In particular, the relative dominance of the components reverses in the case of MAM-CHB and MAM-CPOS, that is, while interactions with the host dominate DDA raw chg , those with the solvent dominate DDA chg (Table 3). Moreover, in the case of ACE-CPOS, the  Table 2) in combination with the fact that DA psum also has a negative sign (cation charging) [eqs. (14) and (15)]. The least change in contributions to the binding free energy occurs with ACE-CHB and is effected by DA pol and DA psum approximately canceling each other (248.8 to 271.0 kJ mol 21 for the former versus 67.6 to 77.2 kJ mol 21 for the latter; Table 2) by virtue of the positive sign of DA psum in the case of anion charging [eqs. (14) and (15)]. Note, in this context, that in system MAM-CPOS, DA pol corrects for spurious polarization around charges þ1 e and þ2 e in the initial and final states of the TI, respectively. Thus, considering, for example, the LS scheme, DA pol approximately evaluates to three times the correction for artificial periodicity in the case of charging a single monovalent ion in a box of edge length L, [23,54,92] namely to 3N A ð8p o Þ 21 L 21 n LS ð12 S Þ 21 , where the factor three arises from the proportionality of this correction to the square of the ionic charge (Appendix section "Calculation of DA pol " ) [eq. (A3)].
The raw charging free energies DA raw chg ðHÞ and DA raw chg ðWÞ, as well as the corrected data DA chg ðHÞ and DA chg ðWÞ may be used to calculate raw and corrected charging contributions to the binding free energies, DDA raw chg and DDA chg , respectively [eqs. (1) and (21)]. For the corrected, that is, methodologyindependent data, this can be done for all possible combinations of system sizes or/and electrostatics schemes used in the simulations of charging in water and in the host molecule. In practice, binding free energies are often calculated using computational boxes that are smaller for the in-water than for the in-host simulations. Table 4 reports the uncorrected data DDA raw chg for such a situation (in-water charging in small box size, here L ss for the LS scheme and L m for the BM scheme; inhost charging in large box size, here L l for the LS and BM scheme) and for those situations where approximate cancelation of periodicity-induced artifacts is expected to occur (inwater and in-host charging in boxes of equal size). Note, however, that the latter cancelation is of greater relevance for the LS scheme, because raw charging free energies obtained from simulations with the BM scheme are less sensitive to system size. [89] The averages DDA chg of corrected values DDA chg [eq. (21)] over all combinations of box sizes used for in-water and in-host charging, along with associated root-mean-square deviations are also provided. Values obtained for DDA raw chg based on L ss for in-water charging and L l for in-host charging differ by 228.1, 228.1, 235.2, and 221.0 kJ mol 21 for MAM binding to Table 3. Charging free-energy contributions due to the solvent, DA chg;GðWÞ [eq. (25)], and due the host molecule, DA chg;GðHÞ [eq. (24)], of the guest molecules MAM and ACE in hydrated host molecules CAPO, CHB, CPOS, and CNEG (section "MD simulations"). Values obtained with the LS scheme in boxes of edge lengths L ss ; L s ; L m , and L l are labeled LS,ss, LS,s, LS,m, and LS,l, respectively, and values obtained with the BM scheme in boxes of edge lengths L m and L l are labeled BM,m and BM,l, respectively (section "Simulated guest-host systems"). The charging free energies DA chg;GðWÞ and DA chg;GðHÞ are calculated as the sum of the raw charging free energy DA raw chg;GðWÞ [eq. (23)] and the correction terms DA pol ; DA psum , and DA exc [eqs. (10), (11), (14), (15), and (18)] and as the sum of the raw charging free energy DA raw chg;GðHÞ [eq. (22)] and the correction term DA dir [eqs. (16) and (17)], respectively (section "Solute and solvent contributions to the free energy of charging").    21 for ACE binding to CAPO, CHB, CPOS, and CNEG, respectively, from corresponding data for DDA chg (BM scheme). The majority of these deviations are nonnegligible, and it is thus essential to correct raw charging contributions to binding free energies. Note that box-edge length dependence is more pronounced for simulations with the LS scheme, because here the system-size parameter crucially determines the magnitude of artificial periodicity artifacts.

Guest Host Scheme
If both the in-water and in-host charging simulations are conducted in boxes of identical edge length, the deviations are significantly reduced for the LS scheme, that is, they evaluate to 22.9, 22.9, 210.0, and 4.2 kJ mol 21 for MAM binding to CAPO, CHB, CPOS, and CNEG, respectively, and to 20.8, 21.3, 6.5, and 29.3 kJ mol 21 for ACE binding to CAPO, CHB, CPOS, and CNEG, respectively, based on simulations in boxes of edge length L l , the best agreement with DDA chg thus being achieved for complexes containing the apolar CAPO and CHB host molecules. Note that simulations in equisized boxes do not lead to an improvement for the BM scheme, where, using data pertaining to edge length L l , the deviations of DDA raw chg from DDA chg are 214.0, 213.6, 217.2, and 210.6 kJ mol 21 for MAM binding to CAPO, CHB, CPOS, and CNEG, respectively, and 5.7, 4.9, 9.5, and 1.6 kJ mol 21 for ACE binding to CAPO, CHB, CPOS, and CNEG, respectively.
The averages DDA chg differ for simulation data pertaining to solely the LS or BM scheme by 0.9-5.0 kJ mol 21 . Overall, the averages DDA chg based on the BM scheme data differ on average by 2.4 kJ mol 21 from the LS scheme data, the agreement between the two different electrostatics schemes being better for ACE-containing complexes (average absolute difference 1.3 kJ mol 21 ) than for MAM-containing complexes (average absolute difference 3.6 kJ mol 21 ). This might be due to favorable cancelation of artifacts in the ACE-containing complexes, as well as the more pronounced cutoff artifacts in P(r) and the continuum-electrostatics-based correction scheme insufficiently capturing the pronounced overpolarization within the cutoff sphere for the MAM-containing complexes (Fig. 4). In comparison to the polarization in a homogeneous dielectric medium, approximated here by the Born polarization P Born ðrÞ [eq. (6)] around a charge of þ1 e (MAM-CAPO, MAM-CHB), þ2 e (MAM-CPOS), 21 e (ACE-CAPO, ACE-CHB), or 22 e (ACE-CNEG) centered at the MAM nitrogen or the ACE carboxylate carbon atom, hydration shell peaks in P(r) appear more pronounced for MAM in comparison to ACE in neutral host cavities CAPO and CHB and significantly broader for MAM in comparison to ACE in host cavities CAPO, CHB and the like-charged functionalized one (CPOS in the case of MAM, CNEG in the case of ACE). A less pronounced water radial polarization around anionic in comparison to cationic solutes was also observed before in the context of the hydration of monoatomic ions and can be drawn back to a decreased orientational freedom of water molecules around cations. [58] The corrected charging contributions (entailing all possible combinations of box sizes for in-water and in-host charging) show rmsd values within 2.5 kJ mol 21 for all complexes except Table 4. Raw charging contributions DDA raw chg [eq. (1)] to binding free energies of guest molecules MAM and ACE to hydrated host molecules CAPO, CHB, CPOS, and CNEG based on values for DA raw chg ðHÞ and DA raw chg ðWÞ calculated in four different system sizes (boxes of edge lengths L ss ; L s ; L m , and L l ; section "Simulated guest-host systems") using LS or BM electrostatic interactions (Table 2). Only a subset of the 16 (LS scheme-based) or four (BM scheme-based) possible combinations is reported. LðWÞ and LðHÞ denote the box-edge lengths used for simulations of in-water and in-host charging, respectively. For comparison, averages DDA chg of corrected values DDA chg [eq. (21)] over all 16 combinations of box sizes fL i g ¼ L ss ; L s ; L m and L l in the case of the LS scheme, over all four combinations of box sizes fL i g ¼ L m and L l in the case of the BM scheme, or over the union of the two latter sets (denoted LS1BM) used for in-water and in-host charging, along with associated root-mean-square deviations (rmsd) are also provided. MAM-CNEG (2.8 kJ mol 21 ) and ACE-CNEG (2.7 kJ mol 21 ). As discussed above, the spread in DDA chg for these systems may be drawn back to the inability of the continuum-electrostatics approximation to capture short-range artifacts in solvent structure which appear to be very strong for the LS,ss simulations of these complexes. Both raw and corrected charging contributions to the binding free energy, DDA raw chg and DDA chg (Table 4), obey intuitive reasoning in that they are least favorable for the like-charged guest-host complexes (MAM-CPOS, ACE-CNEG), considerably less unfavorable for the apolar host cavity (CAPO) and the host cavity allowing hydrogen bonding (CHB) and least unfavorable for the oppositely-charged guest-host complexes (MAM-CNEG, ACE-CPOS). In comparison to charging the guest in water, binding to the host is, however, only favorable in the case of MAM-CNEG (DDA chg ¼ 266:6 kJ mol 21 ; Table 4). The charging of guest molecule ACE is basically indifferent toward pure water or host CPOS environments (DDA chg ¼ 1:1 kJ mol 21 ; Table 4), which can probably be explained in terms of water being an extremely good solvent for anion solvation because the hydrogen atoms of the water molecule can approach anions very closely. [98][99][100][101] Altogether, as it can significantly alter the charging contribution to binding free energies and thus crucially change the interpretation or prediction of experimental data, analysis of possible electrostatic artifacts and application of required correction terms appears very important and should be considered an integral part of careful free-energy calculation studies if changes in the net charge are involved. Note that for more complex guest-host systems (e.g., a drug-receptor complex) it might be necessary to take into account the possible flexibility of the molecules, giving rise to time-dependent correction terms.

Conclusion
The calculation of binding free energies of charged species to a target molecule is a frequently encountered problem in MD studies of (bio-)chemical thermodynamics. A number of important endogenous receptor-binding molecules (e.g., glutamate, acetylcholine), enzyme substrates (e.g., superoxide anion, lysine) or drug molecules (e.g., aspirin, proguanil) have a nonzero net charge. Absolute binding free energies, as well as binding free energies relative to another molecule with a different net charge will be affected by artifacts due to the used effective electrostatic interaction function and associated parameters (e.g., size of the computational box). This is increasingly being recognized in the field of free-energy simulations. Independently from the authors' work, Rocklin et al. proposed a very similar correction scheme. [102] In the present study, charging contributions to binding free energies of either of two ionic guest molecules, MAM and ACE, to functionalized buckyball-like host cavities were calculated with classical atomistic MD simulation. Electrostatic interactions were treated using a LS scheme or a BM scheme, and the simulations were conducted in boxes of four different edge lengths. It was illustrated that: (i) the charging free energies of the guest mole-cules in water and in the host molecule strongly depend on the applied methodology; (ii) the charging free energies of the guest molecules in water and in the host molecule obtained from the LS scheme present a non-negligible dependence on the edge length of the simulation box; (iii) considering the investigated systems, error cancellation in computed charging contributions to binding free energies is only approximately guaranteed for systems with an apolar cavity (zero host charges) if corresponding in-water and in-host charging simulations are performed with the LS scheme in equisized boxes; (iv) neglect of correction terms for the artifacts introduced by the finite size of the simulated system and the use of an effective electrostatic interaction function considerably impairs the thermodynamic interpretation of guest-host interactions, and in particular the relative contributions of the solvent and the host compound; (v) application of correction terms for spurious solvent polarization and wrong dielectric permittivity of the solvent model, improper electrostatic potential summation, effective guest-host direct electrostatic interactions, and the presence of electrostatic interactions between excluded solute atoms in the Hamiltonian yields consistent results for the charging contribution to binding free energies. In particular, rmsd values over 20 results lie within 2.5 kJ mol 21 for all systems except MAM-CNEG and ACE-CNEG. For these systems, the spread might be drawn back to strong artifacts in solvent configurational sampling in a very small computational box using the LS scheme for the treatment of electrostatic interactions that are not captured by the continuum-electrostatics-based correction procedure.
As long as simulations of macroscopic nonperiodic systems with Coulombic electrostatic interactions, or electrostatic interactions truncated at sufficiently large distances, such that an adequate representation of experimental bulk systems is achieved, are out of reach, the proposed correction scheme for the charging contribution to binding free energies is a crucial step in obtaining thermodynamically sensible results for the free energy of binding of charged ligands.
[eq. (10)] or four [eq. (11)] such TIs. However, three alternative computationally less intensive approaches may be conceived of: 1. As the electrostatic potential at the guest atoms varies linearly with the guest charge state, the TI can be cut down to involve only the initial and final state, that is, eq. (12) may be simplified to where the factor (1/2) arises from numerical integration using the trapezoidal rule and the electrostatic potentials are evaluated at k ¼ 0 (uncharged guest) and k ¼ 1 (fully charged guest). DA pol then follows from eqs. (10) where the factor (1/2) arises from the assumption of linear response, and the second term in the curly brackets accounts for a possibly nonzero host charge Q H . In the latter case, eq. (A2) is very approximate because it assumes that the host charge is concentrated at one point and experiences the average of electrostatic potentials monitored at the guest atom sites. DA pol then follows from eqs. (10) and (11). 3. A crude approximation to DA pol for the LS scheme may be obtained through usage of the analytical formula per-tinent to the case of solvation of a single nonpolarizable monoatomic ion of radius R I in a box of edge length L filled with a homogeneous dielectric medium of permittivity S . [54,92] In the present case of a solvated guest-host system, this formula may be rewritten where n LS ¼ 22:837297. [82,83,103] Equation (A3) approximates the guest-host complex by a cavity of radius R vdW containing a point charge of magnitudes Q H and Q G þ Q H in the initial and final states, respectively.
As an example, the case of MAM charging in CPOS in system LS,l is considered. Equation (12), using six charge states as described in section "Free-energy correction terms" yields DA pol ¼ 2146:4 kJ mol 21 (Table 2). Equation (A1) yields the same result, DA pol ¼ 2146:4 kJ mol 21 . Equation (A2) yields DA pol ¼ 2147:3 kJ mol 21 , with contributions of 297.9 and 249.4 kJ mol 21 from the first and second term in curly brackets, respectively. Using Q G ¼ 1 e; Q H ¼ 1 e; R vdW ¼ 0:5 nm; L ¼ 3:80 nm and S ¼ 66:6, eq. (A3) yields 2149.4 kJ mol 21 . Whereas eq. (A1) is numerically accurate (due to a linear charging profile the initial and final points of the charging curve are sufficient to calculate the charging free energy), eqs. (A2) and (A3) rely on the afore mentioned approximations, which causes the resulting DA pol value to be less accurate. However, both of the two latter estimates deviate less than k B T from the numerically accurate DA pol result.

Calculation of DA psum
In Ref. [61], DA psum was calculated as eq. (14) for the LS scheme and using  (20) and Table 2], for charging of the guest molecules MAM and ACE in hydrated host molecules CAPO, CHB, CPOS, and CNEG (section "MD simulations"). The correction term DA psum was evaluated using eq. (14) for the LS scheme and either eq. (A4) or (15) for the BM scheme. The volume V vdW of the host cavity was approximated by a sphere of radius R vdW [eq. (A4)]. For a given approach and choice of R vdW , averages rmsd over the individual DA chg rmsd values are provided.

Guest Host
Equations (14) and (A4), R vdW 5 0.5 nm Equations (14) and (A4), R vdW 5 0.6 nm Equations (14) and (A4), R vdW 5 0.7 nm Equations (14) and (15) DA chg (kJ mol 21  for the BM scheme. V vdW is an estimate for the volume of the solute cavity (guest molecule in the case of in-water charging; host molecule in the case of in-host charging). Here, the volume of the host cavity was approximated by a sphere of radius R vdW (Table A1). In the present study, a slightly different equation was used for DA BM psum [eq. (15)] (section "Free-energy correction terms"). Equation (15) derives from expressing the product of the fraction f c of the cutoff sphere occupied by water with the water number density g w in the original expression [54,89] for DA BM psum in the case of hydration of a single monoatomic ion, Similarly, eq. (14) derives from expressing the product of the fraction f b of the computational box occupied by water with the water number density g w in the original expression [54,89] for DA LS psum in the case of hydration of a single monoatomic ion, Note that the expressions for the water number density used in eqs. (14) and (15), that is, N w L 23 and hN w ðR C Þi½ð4=3ÞpR 3 C 21 , respectively, might not be adequate for simulations LS,ss, because of spurious water density fluctuations in this very small system that are not captured by a global (box-volume associated) density measure. In particular, for simulations LS,ss, the height of the first and second hydration shell peaks around the buckyball cavity is overestimated by up to 18 and 15% for MAM and ACE, respectively, in hydrated host molecules CAPO, CHB, CPOS, and CNEG in comparison to simulations LS,l and BM,l (Fig. 3). Equation (15) seems advantageous in comparison to eq. (A4) because it yields greater methodological independence, as evidenced by the root-mean-square deviations (rmsd) of corrected charging free energies DA chg [eq. (20)] for charging the guest molecules in the host, reported in Table A1, and is independent of the volume of the solute. Three different choices for R vdW were investigated, namely R vdW ¼ 0:5, 0.6, or 0.7 nm. Distances of 0.5 and 0.7 nm from the center of mass of the 60 buckyball carbon atoms correspond to the average location of the 60 buckyball carbons atoms and the value where the water oxygen radial distribution function starts to deviate from zero, respectively. Overall, the cavity volumeindependent eqs. (14) and (15)