## Introduction

The calculation of binding free energies is a standard task in the thermodynamic analysis of multicomponent molecular systems involving an association reaction between two system constituents, for example, an enzyme and a substrate, a receptor and a drug, or a nanocage and a guest compound. Physics-based approaches to compute binding free energies rely on statistical mechanics, which expresses the free energy as the natural logarithm of the system partition function multiplied by the negative of the thermal energy, *k*_{B}*T*, where *k*_{B} is Boltzmann's constant and *T* is the absolute temperature. The underlying configurational ensembles can be generated by, for example, molecular dynamics (MD) simulation. A wealth of methodological improvements, along with increased computational resources, allows (in principle) the accurate calculation of binding free energies, as extensively reviewed in the case of protein-ligand association.[1-9] However, if the calculations are conducted without due attention to potential pitfalls, the resulting binding free energies may be spuriously affected by limitations of MD simulations, such as an inadequate force-field description, approximations and/or assumptions in the free-energy calculation methodology, insufficient configurational sampling, or spurious configurational sampling due to the use of an effective electrostatic interaction function. These points are briefly discussed in turn below.
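
As a minimal illustration of the relation between free energy and partition function mentioned above, the following sketch evaluates the configurational integral of a one-dimensional harmonic potential numerically and compares the result with the analytical value. The potential, the force constant, the integration range, and all function names are assumptions chosen for illustration only; they are not taken from the simulations discussed in this article. Because the configurational integral carries units of length, only differences between free energies computed in this way are physically meaningful.

```python
import math

KB = 0.0083144626  # Boltzmann constant in molar units, kJ/(mol K)


def free_energy(potential, temperature, x_min, x_max, n_bins=20000):
    """Return F = -kB*T*ln(Z) for a 1D potential.

    Z is the configurational integral of exp(-U(x)/(kB*T)) over
    [x_min, x_max], evaluated with the midpoint rule.
    """
    kT = KB * temperature
    dx = (x_max - x_min) / n_bins
    z = sum(
        math.exp(-potential(x_min + (i + 0.5) * dx) / kT)
        for i in range(n_bins)
    ) * dx
    return -kT * math.log(z)


def harmonic(x, k=100.0):
    # Hypothetical harmonic potential, k in kJ/(mol nm^2), x in nm
    return 0.5 * k * x * x


T = 300.0
numeric = free_energy(harmonic, T, -2.0, 2.0)
# Analytical result for the harmonic case: Z = sqrt(2*pi*kB*T/k)
analytic = -KB * T * math.log(math.sqrt(2.0 * math.pi * KB * T / 100.0))
print(f"numeric  F = {numeric:.6f} kJ/mol")
print(f"analytic F = {analytic:.6f} kJ/mol")
```

For realistic molecular systems the partition function cannot be integrated directly in this way, which is why free-energy differences are instead estimated from sampled configurational ensembles.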

First, besides intrinsic deficiencies of classical force fields, such as the neglect or mean-field treatment of electronic polarizability[10, 11] and the use of effective interaction energy functions[12-14] with empirical parameters, additional problems arise if the system under consideration involves molecular species for which no force-field parameters are available. For instance, standard (bio-)molecular force fields may not provide parameterizations of certain metal ions, cofactors, or drug molecules. Ideally, the corresponding parameters should be calibrated against experimental data using a strategy consistent with the parameterization of the force field used. In practice, however, they are either inferred based on chemical intuition and comparison with similar compounds or taken from automated parameterization protocols.[15, 16] In addition, although the solvent representation in most (bio-)molecular force fields is already highly simplistic (rigid three-site models[17]), its structural characteristics may be sacrificed for the sake of computational savings, the solvent then being modeled implicitly and the solvent-generated electrostatic potential computed via numerical or empirical (generalized Born) solutions of the Poisson-Boltzmann equation.[18-20]

Second, because they rely on a thorough characterization of the phase space of the system, simulations involving free-energy calculations are computationally expensive, which is why a number of approximate methods are sometimes applied. For instance, the free energy of charging a neutral particle may be estimated from an electrostatic linear-response approximation[21, 22] or cumulant expansions at the endpoints of thermodynamic integration (TI).[23-25] Similarly, the free energy of growing the van der Waals envelope of a particle is sometimes approximated using physics-based[26, 27] or empirical[21] relationships. Furthermore, assumptions in the ansatz of free-energy calculation methods, such as sufficient overlap of the phase-space distribution functions in different states of relative free-energy calculations,[28] or electrostatic linear response,[22, 29] may limit the scope of their applicability. Lastly, discretization errors in numerical free-energy calculation methods, for example, the window width in potential of mean force calculations[30, 31] or the integration method in TI,[32, 33] limit the precision of the obtained results, although the use of optimal methods for statistical analysis [e.g., Bennett acceptance ratio (BAR)[34, 35] or multistate BAR[36, 37] approaches] may lead to significant gains in computational efficiency and statistical certainty.
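
The effect of the integration method in TI can be illustrated with a small numerical sketch. The curve used below for the ensemble average of the Hamiltonian derivative along the coupling parameter is purely hypothetical (a linear-response term plus a cubic deviation, with made-up coefficients); it is not data from this work. Under strict linear response the trapezoidal rule would be exact, whereas the cubic term introduces a discretization error that shrinks as more coupling-parameter points are used.

```python
def trapezoid(xs, ys):
    """Composite trapezoidal rule on an arbitrary (sorted) grid."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


# Hypothetical coefficients in kJ/mol (assumptions for illustration only)
A_LINEAR = -400.0  # linear-response contribution
B_CUBIC = 60.0     # deviation from linear response


def dh_dlambda(lam):
    """Hypothetical ensemble average <dH/dlambda> at coupling parameter lam."""
    return A_LINEAR * lam + B_CUBIC * lam ** 3


def ti_estimate(n_points):
    """TI charging free energy from n_points equidistant lambda values."""
    lams = [i / (n_points - 1) for i in range(n_points)]
    return trapezoid(lams, [dh_dlambda(lam) for lam in lams])


# Analytical integral of dh_dlambda from 0 to 1
exact = A_LINEAR / 2.0 + B_CUBIC / 4.0

for n in (3, 6, 11, 21):
    error = ti_estimate(n) - exact
    print(f"{n:3d} lambda points: discretization error = {error:.4f} kJ/mol")
```

In an actual calculation each value of the integrand additionally carries a statistical uncertainty from finite sampling, which this sketch deliberately leaves out.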

Third, the phase space accessible to the system should be sampled exhaustively and according to the Gibbs measure appropriate for the desired thermodynamic ensemble, for example, canonical Boltzmann weighting in the case of simulations at constant particle number, temperature and volume. However, exhaustive sampling of phase space is complicated by the sheer number of possible configurations, which grows exponentially with the system size, and by energy barriers higher than *k*_{B}*T*, which are rarely crossed in plain MD simulation. Enhanced sampling methods can be used to improve coverage of the relevant phase space. A widely used technique involves the alteration of the potential energy function, for example, through local[38, 39] or nonlocal[40, 41] biasing, or more complex smoothing procedures,[42, 43] along with subsequent reweighting of the sampled configurations to the Gibbs measure corresponding to the unaltered potential energy function.
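
The reweighting step described above can be sketched for a one-dimensional toy model. Configurations are drawn from a biased ensemble (physical potential plus bias) and each is assigned a weight exp(+*V*_bias/*k*_{B}*T*) to recover averages in the unbiased ensemble. The harmonic potentials, force constants, sample size, and function names are assumptions made for illustration; in a real application the samples would come from a biased MD simulation rather than from a Gaussian random number generator.

```python
import math
import random

KT = 2.494        # thermal energy near 300 K, kJ/mol
K_PHYS = 10.0     # hypothetical physical force constant, kJ/(mol nm^2)
K_BIAS = 6.0      # hypothetical strength of the flattening bias


def bias(x):
    """Bias potential added to the physical potential 0.5*K_PHYS*x^2."""
    return -0.5 * K_BIAS * x * x


def reweighted_average(samples, observable, kT=KT):
    """Unbiased ensemble average from samples of the biased ensemble."""
    weights = [math.exp(bias(x) / kT) for x in samples]
    norm = sum(weights)
    return sum(w * observable(x) for w, x in zip(weights, samples)) / norm


# The biased potential is harmonic with force constant K_PHYS - K_BIAS,
# so exact samples of the biased ensemble are Gaussian distributed.
rng = random.Random(0)
sigma_biased = math.sqrt(KT / (K_PHYS - K_BIAS))
samples = [rng.gauss(0.0, sigma_biased) for _ in range(20000)]

biased_x2 = sum(x * x for x in samples) / len(samples)
unbiased_x2 = reweighted_average(samples, lambda x: x * x)
expected_x2 = KT / K_PHYS  # analytical <x^2> of the physical potential

print(f"<x^2> biased ensemble    : {biased_x2:.4f} nm^2")
print(f"<x^2> after reweighting  : {unbiased_x2:.4f} nm^2")
print(f"<x^2> analytical unbiased: {expected_x2:.4f} nm^2")
```

The bias here only broadens the sampled distribution, so the weights stay bounded and the estimator is well behaved; strongly biased simulations can yield a few dominant weights and correspondingly poor statistics.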

Finally, even if the phase space accessible to the system is sampled exhaustively and according to the Gibbs measure appropriate for the desired thermodynamic ensemble, the sampled configurations might not be representative of the real (experimental) situation because of an approximate or incorrect calculation of interatomic interactions. This is generally the case for electrostatic interactions which, due to their long-range nature, are treated in an effective manner during MD simulations.[44-49] The ensuing artifacts become strongly apparent in the configurational sampling of systems involving charged particles or in free-energy calculations involving a change of the net charge of the system (charging free energy calculations), and have been reviewed extensively.[48-56] For instance, if electrostatic interactions are calculated via lattice summation (LS) over a periodic system in charging free energy calculations, the orientational polarization of the environment of the particle to be charged will be affected by the periodic copies of this particle, which is an inappropriate contribution if a truly nonperiodic system is to be described. The magnitude of the introduced errors may depend strongly on the parameters of the system or the interaction function (e.g., the box-edge length), giving rise to so-called methodology-dependent charging free energies.[54] It has been shown before how charging free energies of monoatomic[54, 57, 58] and polyatomic[59-61] ions in infinitely dilute aqueous solution can be corrected for these errors, such that methodology-independent values are obtained.

The goal of the present article is to address the last point above for model systems representative of a protein-ligand complex in aqueous solution, that is, to present a correction scheme for the charging of polyatomic ions in a low-dielectric cavity functionalized with different chemical groups (section “Simulated guest-host systems”), such that the raw charging free energy of a ligand bound to a host molecule can be corrected to a methodology-independent value (Fig. 1). Comparison with the corresponding raw or corrected charging free energy in bulk water yields, respectively, the raw or corrected binding free energy of the charged ligand to the host molecule relative to a neutralized analog of the ligand (Fig. 1). The possible occurrence of methodology-dependent artifacts (caused by the use of an approximate electrostatic interaction function, an improper summation scheme and simulated systems of finite size) in the raw free energies directly impairs calculations of the (absolute) binding free energy of a charged ligand and of relative binding free energies between ligands of different net charge. The raw value is not representative of a macroscopic nonperiodic system with Coulombic electrostatic interactions; only the corrected value allows a meaningful comparison to, or prediction of, experimental data measured in systems of macroscopic extent (Fig. 1). This issue was, however, not duly appreciated in previous work. Examples from the authors' own research include the calculation of ligand binding free energies[25, 62, 63] and redox potentials.[64]

In the long term, increases in computational power, currently driven mainly by graphics processing unit-based electrostatic interaction calculation,[65-67] and advances in multiscale simulation methodologies targeted at an improved representation of electrostatic interactions[68, 69] may eventually allow for the simulation of macroscopic nonperiodic systems with Coulombic electrostatic interactions, or electrostatic interactions truncated at sufficiently large distances, such that an adequate representation of experimental bulk systems is achieved. Until such techniques become state of the art, however, a scheme that corrects for methodology-induced artifacts will prove valuable in the calculation of binding free energies of charged ligands to (bio-)macromolecular host compounds.