Probing the range of applicability of structure‐ and energy‐adjusted QM/MM link bonds II: Optimized link bond parameters for density functional tight binding approaches

Abstract Optimized link bond parameters for the Cα—Cβ bond of 22 different capped amino acid model systems have been determined at SCC DFTB/mio (self‐consistent charge density functional tight‐binding), SCC DFTB/3ob and GFNn‐xTB (n = 0, 1, and 2) level in conjunction with the AMBER 99SB, 14SB, and 19B force fields. The resulting parameter sets have been compared to newly calculated reference data obtained via resolution‐of‐identity 2nd order Møller–Plesset perturbation theory. The data collected in this work suggests that the optimized values in this study provide a more suitable setup of the QM/MM link bonds compared to the use of a single global setting applied to every amino acid fragmented by the QM/MM interface. The results also imply that a transfer of the ideal link bond settings between different levels of theory is not advised. In contrast, virtually identical parameters were obtained in calculations employing different variants of the AMBER force field. Considering the increasing success of tight binding based approaches being inter alia a results of their exceptional accuracy/effort ratio the provided collection of link atoms parameters provides a valuable resource for QM/MM studies of biomacromolecular systems as demonstrated in an exemplary QM/MM MD simulation of the β‐amyloid/Zn2+ complex.

sciences, a large number of studies is still aimed at the initial applications focused on biomacromolecular systems. 10 Hybrid QM/MM methods exploit the accuracy of quantum mechanical (QM) methods 11,12 for the description of the chemical most relevant part while on the other hand less-accurate but at the same time less-demanding molecular mechanical (MM) approaches 13,14 are considered sufficient to represent the remaining part of the system (e.g., the bulk of a liquid or the structure of an entire biomolecule). Thus, in addition to methodical developments associated to the QM/MM hybrid approach such as advanced embedding techniques 7,15,16 and improved adaptive frameworks, 17,18 progress achieved in both QM and MM techniques for the description of chemical systems directly contributes to the increasing success of this versatile method.
In particular, research associated with the demanding QM approaches focused on improving the accuracy while keeping the computational demand manageable resulted in a hierarchy of increasingly complex methods. While high-level QM approaches such as density functional theory (DFT) 19,20 and even post-Hartree Fock methods 11,12 can be routinely applied in QM/MM simulations, the associated computational effort imposes limitations in both the treatable system size as well as the achievable number of simulations steps, for example, when applying the QM/MM framework in the context of Monte-Carlo and molecular dynamics (MD) simulations. [21][22][23] A possible alternative enjoying widespread application in QM/MM studies are semi-empirical QM methods. 24,25 By introducing various approximations based on DFT or/and Hartree-Fock (HF) theory a more efficient yet oftentimes less accurate description of molecular interactions is achieved. One increasingly successful family of semiempirical approaches is density functional tight binding (DFTB). [26][27][28][29][30] These methods exploit the efficiency of tight binding (TB) theory 31 but maintain their accuracy via a parametrization against more demanding DFT approaches. Despite their semiempirical character DFTB methods have been applied with large success in QM/MM simulations of various chemical systems. [32][33][34][35] One particular challenge in a QM/MM simulation is linked to the partitioning of the system into a QM and MM region which in the case of macromolecular systems involves the fragmentation of chemical bonds separated by the QM/MM interface. A variety of approaches for the treatment of QM/MM frontier bonds such as pseudobond 36,37 and quantum capping potential approaches, 38,39 adjusted connection atoms, 40 effective core potential techniques, 41,42 effective group potentials 43 and the structure-dependent effective Hamiltonian 44 method have been developed, with the link-atom technique 45,46 being one of the most widely employed approaches. This is inter alia due to the fact that no modification of the employed quantum mechanical routines is required, making this method particularly flexible when changing the theoretical level of theory which may be associated with a change of the applied QM calculation package, for example, when comparing results obtain via perturbation theory to those calculated via DFTB as done in this work.
In the link-atom framework, the valence introduced by fragmenting the QM/MM frontier bond L is compensated via the introduction of a suitable capping atom L C (see Figure 1). In the majority of cases, a hydrogen atom is employed, which is a suitable choice if the bond in question displays an apolar character. Although this capping atom introduces additional degrees of freedom, its location depends strictly on the positions of the QM and MM parent atoms L Q and L M (see Figure 1). While some approaches consider a fixed distance between L Q and L C irrespective of the actual L Q L M bond length, a more adequate approach 47,48 is the use of a distance ratio ρ Link according to with r being the position of the respective atom. This ensures that any variation in the parent L Q L M bond is replicated proportionally by the L Q L C bond. In addition, this linear relationship enables a redistribution of any force contribution F acting on L C to the atoms of the host bond via thereby eliminating any additional degrees of freedom arising due to the introduction of the capping atom L C .
While this implementation of the link-bond enables a more seamless integration of the QM/MM boundary into the QM treatment, it is F I G U R E 1 Link bond definition in the ACE-ARG-NME system for (A) the full QM reference and (B) the QM/MM setup. In the latter case, only the side chain of ARG was included in the QM-treatment employing hydrogen as capping atom L C . Typically, the partial charge of L M is excluded when applying electrostatic embedding in the QM calculation due to its vicinity to L C . However, L M is fully considered when evaluating all other contributions in the system including also the non-coulombic QM/MM coupling potential vital to place the capping atom at a chemically reasonable distance from L Q . 47 This is demonstrated in Figure 2, comparing the potential observed for an energy-minimized ACE-ARG-NME model system  of F LC although the system should be at its equilibrium. However, when employing an optimized ratio of ρ Link = 0.723, the equilibrium distance r eq coincides with that of the all-QM reference calculation.
Considering that a C H bond displays a weaker bond force constant compared to its C C counterpart, it is required to add an additional harmonic potential to the parent L Q -L M bond. However, the associated force constant k Link should not correspond to that of a C C bond but only compensate for the observed difference in the bond strength between the C C parent bond and its C H analogue. In this particu- framework. [55][56][57] The results have been compared against high-level QM reference data obtained at the correlated ab initio level RIMP2.
To assess the impact of the nature of the MM model all calculations have been performed using the AMBER 99SB, 51 AMBER 14SB, 49 and AMBER 19SB 52 parameterizations. Although the considered force fields are all from the same family, their parameterization differs in the F I G U R E 2 Total energy obtained via a potential energy scan in case of the ACE-ARG-NME model system depicted in Figure 1 about the equilibrium distance of the C α C β bond at RIMP2/cc-pVTZ level.
In order to achieve a correct representation of the full QM reference (black) in a QM/MM calculation employing the AMBER 14SB force field an ideal placement of the capping atom using a ρ Link -value of 0.723 (red) is required. To compensate the difference in bond strength between the C H link bond and the C α C β parent bond, a harmonic potential with reduced force constant of 200 kcal mol À1 Å À2 has to be applied (green) resulting in a nearperfect representation of the QM reference close to the equilibrium. The dashed lines demonstrate the impact of employing non-ideal ρ Link -values of 0.700 (purple) and 0.750 (blue), respectively assigned atomic partial charges, the Lennard-Jones parameters as well as the treatment of the bonding interactions, in particular the dihedral degrees of freedom. Where applicable the considered amino acid residues also take different protonation states into account. An overview of the individual steps of the fully automated parametrization strategy is given in Figure 3. In the first step the initial structure of a particular model system containing the aminoacid capped by N-terminal acetyl (ACE) and C-terminal N-methyl amide (NME) was generated using the program tleap, which is part of the AMBER program package. 58 This initial configuration was then subjected to an energy minimization using the respective QM method. This step already provides access to the equilibrium distance of the link bond r eq . Following the structure optimization, a potential energy scan along the C α C β bond has been carried out using 10 steps in increments of 0.01 Å in each direction. The resulting potential serves as the reference for the link atom parametrization.

| METHODOLOGY
Next, the QM/MM treatment was invoked to determine the ideal placement of the capping atom L C from the C β atom yielding the ideal distance ratio ρ Link . By variation of ρ Link in the range from 0.

| Link bond parameters
A comparison of the link bond parameters determined at the 7 different levels of theory in conjunction with the AMBER 19SB parametrization is provided in Figure 4A and B. A summary of the smallest and largest obtained values for {r eq , ρ Link , k Link } along with the respective averages and standard deviations is given in Table 1. Considering that RIMP2/cc-pVTZ is the most accurate level of theory employed in this study, it serves as the reference in the following discussion. Already The simpler GFN0-xTB formulation consistently yields shorter equilibrium distances than the higher-order xTB methods, which is also reflected by the respective average values listed in Table 1. However, comparison to the RIMP2/cc-pVTZ data reveals an inconsistent pattern, that is, for some systems GFN0-xTB displays the best agreement while in other cases the higher-ordered xTB flavors provide the best match.
On the other hand, the xTB methods show the smallest standard deviation in the equilibrium distance, indicating that the variation in r eq is the smallest of all investigated methods and thus even smaller than those observed in both RIMP2 cases.
In the case of ρ Link all three methods tend towards larger values  of the residues in the MM zone as well as their interaction with the QM atoms are to some extent different. In order to provide a consistent analysis, this comparison has also been carried out at RIMP2/cc-pVDZ level. However, due to its increased computational demand the triple-zeta valence basis set cc-pVTZ was not considered. In addition, only the GFN2-xTB method was employed to represent the xTB approach, since the different orders resulted in quite consistent trends (see Figure 4B).
It can be seen from Figure  These findings imply that it appears adequate to transfer link bond parameters between different force fields of the same family.
Although it would be of particular interest to also assess whether this conclusion still holds true when transferring parameters between different force field families, this step was not considered in the present work. This is due to the fact that related force fields such OPLS-AA 77 employ effectively the same potential energy expressions. In this case, it can be expected that a similar finding as for the different AMBER subtypes can be obtained. More difficult, however, is the extension towards other force fields such as CHARMM22 78 or GROMOS 79,80 that often employ a united atom (UA) approach combining aliphatic carbon atoms and their bound hydrogen atoms into a single meta-particle. In this case, additional steps during the QM/MM initialization are required to re-introduce the missing hydrogen atoms in the QM zone, which make these force field approaches more challenging to implement in the automated link atom parametrization. Moreover, the united atom approach is less effective within the context of an electrostatic embedding framework, which makes UA-based force fields to some extent less attractive for QM/MM applications compared to their all-atom counterparts.

| Impact of non-ideal link bond settings
In order to demonstrate the benefit in pre-optimizing the link bond parameters a short exemplary QM/MM MD simulation of a hydrated peptide has been carried out. The comparably short 42 amino acid β-amyloid peptide coordinated to a Zn 2+ ion derived from pdbstructure 1ZE9 82 has been considered. In this example the ion and the directly coordinated side chains of one glutamate and three histidine residues were included in the QM zone treated at GFN2-xTB level. 56 The AMBER 14SB force field 49

ACKNOWLEDGMENT
The computational results presented have been achieved (in part) using the HPC infrastructure LEO of the University of Innsbruck. The authors are grateful to S. Tarique Moin for his insightful comments regarding the setup and execution of the β-amyloid/Zn 2+ system.

DATA AVAILABILITY STATEMENT
The data that supports the findings of this study are available in the supplementary material of this article.