Unraveling individual host–guest interactions in molecular recognition from first principles quantum mechanics: Insights into the nature of nicotinic acetylcholine receptor agonist binding

Drug binding to a protein target is governed by a complex pattern of noncovalent interactions between the ligand and the residues in the protein's binding pocket. Here we introduce a generally applicable, parameter‐free, computational method that allows for the identification, quantification, and analysis of the key ligand–residue interactions responsible for molecular recognition. Our strategy relies on Local Energy Decomposition analysis at the “gold‐standard” coupled cluster DLPNO‐CCSD(T) level. In the study case shown in this paper, nicotine and imidacloprid binding to the nicotinic acetylcholine receptor, our approach provides new insights into how individual amino acids in the active site determine sensitivity and selectivity of the ligands, extending and refining classical pharmacophore hypotheses. By inference, the method is applicable to any kind of host/guest interactions with potential applications in industrial biocatalysis and protein engineering.


| INTRODUCTION
Understanding the nature of protein-ligand interactions underlies any attempt to design safe and efficient small molecule drugs as well as agrochemicals. Due to the advances in methodology as well as computing power, computer aided drug discovery has developed from a niche to rather standard in academic as well as industrial life science research. While in former days discovery and optimization of small molecules were focused in the first place toward efficacy against a single target only, since a few years the paradigm is shifting toward a multi-parameter approach, in which adverse effects, for example, affinity to off-targets, species selectivity, resistance development, or environmental effects, are taken into account from very early on. 1,2 This trend is further adding on the need for generally applicable methods that lead to a deep understanding of how small molecule drugs and agrochemicals bind to their biological target, for example, proteins and RNAs. Nowadays, a large number of computational strategies are routinely applied to the study of protein-ligand interactions.
For example, free energy perturbation methods have proven instrumental for predicting free energy differences of binding for large numbers of ligands, [3][4][5] while machine learning and artificial intelligence techniques have shown their potential to allow highly promising ways to navigate vast chemical biology spaces. 6,7 In addition, many quantum mechanical (QM) methods can be used to obtain an insight into the nature of protein-ligand interactions, 8 either via calculations on large cluster models or on entire protein-ligand complexes within a hybrid quantum mechanical/molecular mechanical (QM/MM) approach. 9,10 QM methods also provide valuable information into the intricate pattern of noncovalent interactions involved in the binding of a ligand to a protein. For example, they give access to molecular properties like electron densities, whose topology can be analyzed in the framework of the Quantum Theory of Atoms in Molecules (QTAIM 11 ) to identify the most significant ligand-residue interactions. Unfortunately, although interpretative tools like QTAIM have proven useful in many (bio)chemical applications, 12 they only provide qualitative insights, which significantly limits their usefulness in the rational design and optimization of new drugs. Alternatively, fragmentation methods, 13 in which the energy of a large system is approximated as the sum of contributions from smaller fragments, can be used to obtain more quantitative information, 14 although their accuracy is limited when many-body effects become important. 13 In this work, we introduce a computational strategy able to quantify accurately the individual ligand-residue and residue-residue interactions that contribute to drug binding. Our approach relies on a local variant [15][16][17] of the coupled cluster method with singles, doubles, and perturbatively included triples excitations, CCSD(T), which is generally regarded as the "gold standard" of computational chemistry for its well documented high accuracy and reliability. 18 In particular, we employ the linear-scaling domain-based local pair natural orbital CCSD(T) method, DLPNO-CCSD(T), 17,19 which allows for the calculation of accurate energies for systems with hundreds of atoms and thousands of basis functions. 17 Importantly, the DLPNO-CCSD(T) energy can be decomposed into fragment-pairwise contributions using the recently introduced Local Energy Decomposition (LED) scheme, [20][21][22] which has already found widespread application in the analysis of intermolecular interactions throughout chemistry. 21 Herein, we demonstrate that the DLPNO-CCSD(T)/LED methodology can be used to exactly dissecting ligand-protein binding energies into ligand-residue and residue-residue contributions, thus providing a quantitative and (bio)chemically readily interpretable framework in which one can discuss ligand-target sensitivity and selectivity. As a prototype case study, this approach is applied to the binding modes of two, biologically very different agonists of the F I G U R E 1 Structure of the nicotinic acetylcholine receptor (nAChR) and homology models used in the present work. (a) Top and side-view of the structure of a full, heteropentameric nAChR, exemplified by the first such resolved structure of Torpedo marmorata. 23 (b) Homopentameric acetylcholin binding proteins (AChBPs), like the one from Lymnea stagnalis shown here, 35 resemble the structure of the ligand binding domain of nAChRs. (c) Architecture of homology models of insect binding sites, derived from Lymnea s. AChBP. (d) Detailed view of these homology models highlighting an arginine residue from loop D in wild type insects (top, with green carbon atoms) and an important resistance inducing mutation featuring a Threonine residue replacing arginine (bottom, with yellow carbon atoms) nicotinic Acetylcholine Receptor (nAChR) of insects and mammals, that is, (S)-nicotine and imidacloprid.
2 | COMPUTATIONAL ASPECTS 2.1 | Cluster models nAChRs are ligand-gated ion channels that are involved in the rapid neuronal signal transduction in animal nervous systems. The overall architecture is sketched in Figure 1(A); for the scope of this study it is sufficient to realize that nAChRs are homo-or heteromeric pentamers of structurally related subunits that comprise an extracellular Nterminal domain involved in ligand binding, four transmembrane domains forming the actual cation-channel, and an intracellular region. 23 While the exact composition and 3D structure of nAChRs is very well known for many vertebrates, functional architecture, diversity, and three-dimensional (3D) tertiary structure of native insect nAChRs are still poorly understood. 24 Homopentameric acetylcholine binding proteins (AChBPs) found in snails (e.g., marine Lymnea stagnalis) resemble nAChRs' ligand binding domain, Figure 1(B). 25 Based on available AChB 3D-structures, numerous homology models of the binding sites in insect nAChRs were constructed [26][27][28][29][30][31][32] as sketched in Figure 1(C). The calculations in this work are based on an already existing model 24,33 of the active site of Myzus persicae nAChR. M. persicae is a useful model, as the active site of nAChR is highly conserved across insect species. 34 Nicotine and imidacloprid investigated in this work represent the "nicotinoid" and "neo-nicotinoid" classes of nAChR agonists. 36 They exhibit reverse patterns of selectivity. (S)-(−)-nicotine is a natural product with only very limited efficacy as an insecticide, reflected by an IC50~4000 nM at insect nAChR, but shows high mammalian toxicity with an IC50 of 7 nM. 37 The synthetic insecticide imidacloprid, selectively agonizes insect nAChR (IC50 4.6 nM), binding weaker to vertebrate nAChR by a factor of more than 500. 37 The observed F I G U R E 2 Overlay of starting geometries, as derived from the homology models (pink wireframe), and constrained geometry optimizations at the HF-3c level of theory. In the optimized geometries, the ligands imidacloprid (panels a and b) and nicotine (panels c and d) are shown as ball and stick models, with cyan carbon atoms. Panels (a) and (c) show the models of the native insect nAChR binding pocket, featuring an arginine residue with green atoms. In vertebrate as well as resistant insects, this position is occupied by a Threonin, shown with orange carbon atoms in panels (b) and (d). RMSD values comparing starting geometry and relaxed geometry are 0.99, 0.68, 0.75, and 0.66 Å in panels (a), (b), (c), and (d), respectively. The nAChR -ligand models shown in panels (a), (b), (c), and (d) contain 233, 223, 233, and 223 atoms, respectively selectivity pattern can be attributed to a single mutation in the binding niche of the respective nAChRs of sensitive insects as well as vertebrates and mammals, that is, the exchange of an arginine-residue by a threonine. Notably, a well-known mutation, commonly referred to as the R81T-mutation, in which a basic aminoacid (that is protonated under physiological condition) from loop D is replaced by an uncharged residue, is found in insects developing target site resistance against neo-nicotinoid insecticides like imidacloprid. 28,29,34,38,39 As the numbering of aminoacids is different in our model, we will refer to this mutation as "R55T" in the following, to avoid confusion. The "empty" protein pockets as well as the isolated ligands were also optimized at the HF-3c level of theory. For the protein, geometry optimizations were carried out using the same constraints used for the protein-ligand adduct.

| Binding energy calculations and LED analysis
The LED scheme provides an exact decomposition of the DLPNO-CCSD(T) energy into a series of additive contributions corresponding to the interaction of pairs of user-defined fragments. In this work, the LED scheme is used to identify the key noncovalent interactions that contribute to the binding of a ligand L (nicotine and imidacloprid in the present example) to a protein P (insect nAChR and R55T mutated nAChR).
Within a supramolecular approach, the P-L binding energy (ΔE bind ) can be computed at the DLPNO-CCSD(T) level as the difference between the energy of the P-L adduct (E LP tot ) and that of the isolated L (E L tot ) and P (E P tot ) fragments frozen in their in-adduct geometry: In the present case, in order to illustrate our LED approach for the analysis of ΔE bind , we use the cluster models introduced in Section 2.1 (see Section 2.2 for deformation energy, solvation and free energies corrections).
The fragments used in the LED decomposition are the ligand and the key residues in the active site (see Tables S4-S7 for Similarly, E P tot can be decomposed into intraresidue E P R ð Þ tot and residue-residue E P R,R ð Þ tot contributions: Using Equations (2) and (3), ΔE bind can be expressed as: Which can be rearranged in a more compact form as: Where ΔE L tot and ΔE R tot represent the change in the energy of the ligand and of the residues upon ligand-protein binding; ΔE Single point DLPNO-CCSD(T) calculations on the model structures described in Section 2.1 were carried out using the def2-TZVPP basis set with matching def2/C auxiliary basis set. NormalPNO settings were used by setting the TCutPairs threshold to the "TightPNO" value, that is, 10 −5 . The RIJCOSX approximation was used in the HF part (the def2/JK basis set was used) with a fine integration grid (gridx6).
For the sake of comparison, binding energies were also computed with the same basis set at various levels of theory, including Hartree-Fock (HF), second-order Møller-Plesset perturbation theory using the resolution of identity approximation (RI-MP2), HF plus London dispersion (HFLD), 41 DLPNO-CCSD and B3LYP-D3. 42,43 In HFLD calculations, all intraresidue and intraligand excitations were neglected.
Default HFLD settings were used, that is, NormalPNO settings in conjunction with TCutPairs 10 −5 .

| Solvation, entropy and deformation energy
Thermal corrections, entropy and environmental effects also contribute to the affinity of a ligand toward a protein. Moreover, the binding process of the ligand to the receptor might involve a significant structural rearrangement of the protein that also contributes to the binding free energy (the so called "deformation free energy"). 44 The accurate modeling of all these effects has never been attempted before for the systems considered here. Quantitatively modeling these important but very complicated contributions is beyond the scope of the present work. Herein, our aim is to introduce a method able to quantify the key ligand-residues interactions at the DLPNO-CCSD(T) level (see Section 2.2), and to discuss how these results correlate with available experimental data for the case study considered. For the sake of completeness, estimates for the solvation-(ΔG solv ) and deformation energy (ΔE def ) as well as the thermal and entropy (ΔG corr ) corrections were computed and added to ΔE bind in order to provide an approximate estimate for the overall free binding energy in solution: ΔE def is the energy required to distort the protein and the ligand from their equilibrium geometry to that they have in the adduct. It was computed at the same level of theory used for the calculation of The dielectric constant ε was set to 4, which is usually considered to be a good representation of the dielectric properties of the protein interior. [47][48][49] Note that the trend observed for ΔG bind is the same irrespective of the specific implicit solvation method used, as detailed in Supporting Information (see Table S8).

| RESULTS AND DISCUSSION
In this section, we start by discussing the most important structural features of the homology models just described and the results obtained are compared with those found in previously published computational and experimental works (Section 3.1). Then, various electronic structure methods are used to compute the binding energy between imidacloprid and sensitive and resistant nAChR and the results are compared with those found at the DLPNO-CCSD(T) level (Section 3.2). Finally, the DLPNO-CCSD(T)/LED approach is used to study the protein-ligand interactions for all the systems studied in the present work (Section 3.3).

| Structural analysis and previous studies
Our atomistic models for the binding of imidacloprid and nicotine to insect (sensitive model) and "R55T" nAChR (resistant model) were obtained as outlined in Figure 1 For both models, imidacloprid shows a significant interaction with a water molecule and with a tryptophan residue (W143), while no significant interaction was detected with the tyrosine Y185 and Y192 residues.
In contrast to imidacloprid, nicotine is a positively charged ligand and is expected to interact strongly with polarizable residues. Consistent with this picture, nicotine seems to interact significantly with W143 in both sensitive (Figure 3(B)) and resistant (Figure 3(D)) models. However, neither the geometries nor the QTAIM analysis provides any direct evidence for relevant interactions to either T55 (in the resistant model) or R55 (in the sensitive model). Moreover, only a very weak interaction was found with C187-C188 for both models.
A third key interaction for nicotinoids as well as neonicotinoids is a hydrogen bridge formed by the pyridyl-nitrogen of nicotine as well as imidacloprid to a highly conserved water molecule, which is observable in X-ray structures of AChBPs co-crystallized with different agonists. 25,50,51 As the water interaction does not allow any discrimination of nicotinoid versus neonicotinoid binding modes and is also well understood, 24,25,50,51 it will not be discussed further.
These results appear to be consistent with the widely-accepted selectivity mechanism proposed in seminal works by the groups of Tomizawa and Casida, 36,52,53 as well as Sattelle, 54 and are also consistent with previous three-dimensional quantitative structure-activity relationships (3D-QSAR) and pharmacophore models developed for neonicotinoid binding to nAChRs. 55 For sensitive insect nAChRs, it is commonly argued that the nitro-moiety of imidacloprid is capable of binding strongly to protonated residues like lysin or, as in our case, arginine. In contrast to imidacloprid, nicotine is lacking an electron-rich group analogous to the nitro-group and it is furthermore protonated under physiological conditions. These two factors are expected to have negative impact on the ligand binding to sensitive insect nAChR. For vertebrate nAChR, the lack of protonated residues is thus advocated as a major reason for the higher vertebrate toxicity of nicotine compared to its rather moderate insecticidal activity, in analogy to acetylcholine. 30 Moreover, the Tomizawa and Casida model predicts that the positively charged center in protonated nicotine can undergo a cation-π interaction with a key tryptophan. If and how the partially positively charged guanidine-center of imidacloprid is able to "mimic" the protonated nitrogen of nicotine, thus giving rise to a cation-π-like interaction to tryptophan as well, is still under debate. Based on atomic charge calculations, the Sattelle model of imidacloprid binding assumes that binding of the guanidinium-moiety to tryptophane is synergistically enhanced by the binding of the nitro group to a basic aminoacid. In any model, the importance of the tryptophan to neonicotinoid-binding to nAChRs is out of question. 27,30,32,56

| Comparison between different electronic structure methods
Before starting our discussion on the nature of the ligand-nAChR interaction, we calculated the binding energies ΔE bind (Equation (1)) at various levels of theory in order to compare the performance of different electronic structure methods in this context. This preliminary benchmark study was conducted on the systems containing imidacloprid. The resulting ΔE bind values are reported in Table 1.
Interestingly, dynamic electron correlation is found to play a fun-  Table 2.
Although deformation, free energy and solvation corrections pro-  and "(C)-(D)" (the changes in interaction upon mutating a residue in the active site), which provide a decomposition of the differential binding energies discussed above. The numerical values obtained from this analysis are given in Supporting Information. In particular, Figure S1 illustrates how the LED values are broken down into electrostatic, exchange and correlation energy contributions (where the latter, for the systems studied here, is dominated by the London dispersion energy).
The first eye-catching feature of Figure 4 is that the magnitude of the binding energy is dominated by the ligand-residue interactions and by the diagonal elements, while the cooperative effect between ligand-residue and residue-residue interactions is much smaller. This is an important finding, as it demonstrates that the binding energy is essentially determined by a limited number of ligand-residue interactions, which can be easily quantified with our approach.
Let us start by analyzing the binding of imidacloprid to sensitive nAChR (panel A). The strongest interaction partner for imidacloprid in sensitive nAChR is R55. As expected for an H-bonded system, the interaction is predominantly electrostatic in nature (81%, see Figure S1 and Table S1 in Supporting Information for the individual contributions to the interaction energies). In addition, and consistent with the Sattelle as well as the Tomizawa and Casida model, a noticeable attractive interaction between imidacloprid's guanidine-moiety and the tryptophan residue W143 is also found, which is stronger compared to the one that is occurring with all the other aromatic residues. Finally, the imidacloprid's chlorine atom also exhibits a non-negligible interaction with L112, which is composed of 55% electrostatics, 16% exchange, and 29% correlation contribution. Similar interactions of chlorine with the L112-M114 fragment have been discussed for Chloro-thiazole-neonicotinoids. 60 Consistent with these findings, the electronic structure of imidacloprid, R55, W143, and L112-M114 is significantly perturbed by the ligand-protein binding, as shown by the intense blue color in the corresponding diagonal elements of the LED interaction maps.
The interaction of imidacloprid with "R55T" mutated nAChR (panel C) retains some of the key features just discussed. For instance, a strong interaction between the ligand and W153 is observed.
Sattelle suggested that, in sensitive nAChR, such an interaction is further enhanced by the interaction of the nitro head group with R55.   Figure 4, panel B.
First, a very strong nicotine-W143 interaction, contributing to 67% of the total interaction energy of nicotine with all active site residues, was detected. Importantly, our analysis reveals that this strong interaction cannot be considered as a pure cation-π interaction, as would be possibly expected from the classical binding mode models. In fact, the protonated nitrogen of nicotine forms a very strong H-bond to a backbone carbonyl of residue W143, with electrostatics, exchange and correlation contributions being 75%, 11%, and 13%, respectively (Table S1). These results are in line with structural biology results on the binding of nicotine to human nAChR 61 as well as earlier theoretical models. 24,33 A comparison to the values discussed above for the respective interaction of imidacloprid (62%, 13%, and 25%) underlies the severe difference of the nature of how imidacloprid and nicotine interact with this key tryptophan residue.
Second, a purely electrostatic repulsion between nicotine and the basic residue R55 was detected, thus providing quantitative evidence to the assumptions on the electrostatic repulsion of nicotinoids discussed above. In absolute values, the strength of this repulsion is comparable to the remaining attractive nicotine-protein interactions and amounts to +34.1 kcal/mol. In the resistance mutant, the positively charged arginine is replaced by a neutral threonine and thus the repulsive interaction is not observed. Interestingly, the interaction with W143 is weakened, while there is a slight increase in interaction with the water molecule and the L112-M114 fragment (LED mutation map "(B-D)" in Figure 4).
Finally, as nicotine is lacking a feature analogous to the nitrogroup of imidacloprid, R55 is lacking an interaction partner in the nicotine-nAChR complex and finds weak compensation in an interaction with the cysteine-bridge C187-C188. This is reflected in the LED ligand perturbation map (panel "(A-B)") by the respective, prominently blue off-diagonal element.

| CONCLUSIONS
We have demonstrated here that the DLPNO-CCSD(T)/LED scheme is an extremely powerful tool for the analysis of any kind of host/ guest interaction. The methodology is capable to deliver in-depth, chemically intuitive insights into the nature and mechanisms of ligand binding, irrespective of whether it is electrostatic or dispersive in origin or a combination of both. Our LED maps appear particularly powerful for an in-depth understanding of hot spots of binding between ligands and the individual residues of the receptors, concerning: (i) the differences between different ligands binding to the same receptor; (ii) the differences between binding of one ligand to different mutations of a receptor. We have illustrated the power of this approach by analyzing the differential binding of nicotine and imidacloprid to insect and mammalian nACHRs. Our calculations did not only predict trends in binding strengths that correlate well with experimental selectivity differences, but also provide a quantitative analysis of which specific residues are most important for the interaction. We believe that the DLPNO-CCSD(T)/LED methodology opens up new avenues in the analysis of drug/receptor interactions and we hope that it will ultimately aid the design of new, highly specific drugs.
ACKNOWLEDGMENT GB and Frank Neese gratefully acknowledge the Priority Program "Con-

CONFLICT OF INTEREST
Michael Edmund Beck is an employee of Bayer AG, which is marketing products containing imidacloprid as active ingredient; Christoph Riplinger and Frank Neese are founders of FAccTs GmbH, which is marketing the ORCA software suite, for commercial usage (https:// www.faccts.de). For academic use, ORCA remains free (https:// orcaforum.kofo.mpg.de/app.php/portal).

DATA AVAILABILITY STATEMENT
The data generated during this study are available within the article or its supplementary materials.