Combined docking and machine learning identify key molecular determinants of ligand pharmacological activity on β2 adrenoceptor

Abstract G protein‐coupled receptors (GPCRs) are valuable therapeutic targets for many diseases. A central question of GPCR drug discovery is to understand what determines the agonism or antagonism of ligands that bind them. Ligands exert their action via the interactions in the ligand binding pocket. We hypothesized that there is a common set of receptor interactions made by ligands of diverse structures that mediate their action and that among a large dataset of different ligands, the functionally important interactions will be over‐represented. We computationally docked ~2700 known β2AR ligands to multiple β2AR structures, generating ca 75 000 docking poses and predicted all atomic interactions between the receptor and the ligand. We used machine learning (ML) techniques to identify specific interactions that correlate with the agonist or antagonist activity of these ligands. We demonstrate with the application of ML methods that it is possible to identify the key interactions associated with agonism or antagonism of ligands. The most representative interactions for agonist ligands involve K972.68×67, F194ECL2, S2035.42×43, S2045.43×44, S2075.46×641, H2966.58×58, and K3057.32×31. Meanwhile, the antagonist ligands made interactions with W2866.48×48 and Y3167.43×42, both residues considered to be important in GPCR activation. The interpretation of ML analysis in human understandable form allowed us to construct an exquisitely detailed structure‐activity relationship that identifies small changes to the ligands that invert their pharmacological activity and thus helps to guide the drug discovery process. This approach can be readily applied to any drug target.

of receptor interactions made by ligands of diverse structures that mediate their action and that among a large dataset of different ligands, the functionally important interactions will be over-represented. We computationally docked ~2700 known β2AR ligands to multiple β2AR structures, generating ca 75 000 docking poses and predicted all atomic interactions between the receptor and the ligand. We used machine learning (ML) techniques to identify specific interactions that correlate with the agonist or antagonist activity of these ligands. We demonstrate with the application of ML methods that it is possible to identify the key interactions associated with agonism or antagonism of ligands. The most representative interactions for agonist ligands involve K97 2.68×67 , F194 ECL2 , S203 5.42×43 , S204 5.43×44 , S207 5.46×641 , H296 6.58×58 , and

| INTRODUC TI ON
G-protein-coupled receptors (GPCRs) remain a therapeutically important family of proteins with over 100 receptors targeted by 500 drugs approved for clinical use. 1 The human β2-adrenoceptor (β2AR) 2,3 responds to stimulation by the endogenous agonist ligands adrenaline and noradrenaline by inducing Gs-mediated cAMP signaling and is a valuable target for small molecule smooth muscle relaxants used to treat asthma and other pulmonary diseases. 4,5 Endogenous agonist activity can be readily inhibited by so-called antagonist drugs that prevent receptor activation by occupying the binding pocket without activation and blocking agonist access. A large number of ligands have been developed to target β-adrenoceptors (βAR) over the last 60 years since the pioneering discovery of beta-blockers by Sir James Black. [5][6][7][8][9] All GPCRs share a common architecture of a bundle of seven transmembrane helices (TMs), with the ligand binding pocket accessible from the extracellular space and an intracellular effector binding site that becomes available following transition into an active receptor conformation. 10 One of the key features of GPCRs is that they are highly dynamic and adopt many distinct conformations that are important for the engagement of signaling partners, e.g., activation of the Gs protein or arrestins. 11 It is generally thought that ligands control GPCR activity by preferentially stabilizing active or inactive conformations. 12 With 35 reported structures with 13 diverse ligands in inactive and active states reported, β2AR is one of the best-studied GPCRs from a structural perspective.
Structure-based drug design has become an integral part of the modern drug discovery process. Approaches to link ligand structure to its activity are generally based on the ligand chemical structure (similar chemical structures have similar activity paradigms) or by considering the interactions between the ligand and the receptor.
Structural Interaction Fingerprints that describe the interactions of ligands with proteins [13][14][15] have proven to be a very successful approach to score binding poses of ligands. A number of different interaction fingerprints have been developed, with more complex ones that incorporate atomic interactions and different types of non-covalent interactions having superior performance. 16 Several studies have attempted to link structural properties of the ligands and the interactions they make to the receptor to their functionality, based on available crystallographic structures and complemented with ligand docking 17,18 or MD simulations. 19 These studies show significant promise in using interaction fingerprints to rationalize the link between structure and function, however, the results of these studies were limited to the experimentally available structural data that cover only a very small fraction of known β2AR ligands. This limited their general ability to generate the new chemical knowledge needed to answer the key question in the drug discovery pipelinewhat is the next molecule to make?
Ligands exert their action on GPCRs via the interactions they make in the ligand binding pocket. We hypothesized that despite the observed structural diversity of ligands targeting a particular receptor, there should be common interacting atoms within the ligand binding pocket that mediate their action. Unfortunately, the 35 experimentally determined structures is a very small dataset to obtain a comprehensive representation of the interaction pattern between ligands and the receptor. We reasoned that among a large dataset of different ligands and their respective binding poses, the functionally important atomic interactions the ligands make with a particular receptor will be over-represented. To investigate this hypothesis, we assembled a database of ~2700 known β2AR ligands and computationally docked them to multiple experimentally determined β2AR structures, generating ca 75 000 docking poses ( Figure 1A,B). This produced a large synthetic dataset suitable for Machine Learning applications. For each of the docking poses, we generated a detailed Atomic Interaction Fingerprint (AIF), which comprises a list of all the pairs of atoms involved in the interaction between a receptor and a ligand and a classification of each pairwise interaction as one of fifteen types of bond. In total, there were ca 1100 possible interaction descriptors that we interchangeably call features ( Figure 1C) in our dataset. Using pairwise correlation and Machine Learning (ML) approaches, we identified specific interactions between the ligands and the β2AR that correlated with their reported agonist or antagonist activity at the receptor (Table S1). In addition to a common set of interactions that were present for both ligand types, agonists make specific contacts with the amino acid residues H93 2.64×63 , K97 2.68×67 , S203 5.42×43 , S204 5.43×44 , S207 5.46×461 , H296 6.58×58 and K305 7.32×31 in transmembrane helices TM2, TM5, TM6, and TM7 while antagonists make specific interactions with W286 6.48×48 and Y316 7.43×42 in TM6 and TM7. This approach successfully identifies the key features of K305 7.32×31 . Meanwhile, the antagonist ligands made interactions with W286 6.48×48 and Y316 7.43×42 , both residues considered to be important in GPCR activation. The interpretation of ML analysis in human understandable form allowed us to construct an exquisitely detailed structure-activity relationship that identifies small changes to the ligands that invert their pharmacological activity and thus helps to guide the drug discovery process. This approach can be readily applied to any drug target.

K E Y W O R D S
adrenoceptor, docking, drug discovery, GPCRs, machine learning, structure-activity relationship the ligands in terms of the individual interactions they make with the receptor to exert their pharmacological action. Importantly, we were able to discover more subtle relationships where small changes to the ligand result in significant changes to their pharmacology, the so-called activity cliffs encountered in every drug discovery program. This method represents a novel strategy for understanding the molecular mechanism of drug action on receptors and provides a valuable tool to guide the drug design process. (C) Interatomic interaction fingerprint (AIF) calculations were made using Arpeggio. In the case of the "filtered dataset", the generated AIFs were filtered based on the presence of ionic interactions with D113 3.32×32 and N312 7.39×38 .

| Dataset preparation
A dataset was compiled using the primary open access repositories GPCRdb, 20,21 ChEMBL, 22 ZINC, 23 DrugBank, 24 and Guide to Pharmacology. 25 This dataset yielded a total of 2643 unique β2AR ligands, of which 1317 have reported pharmacological action, while 1326 compounds are binders with undetermined activity profiles. We classify ligands with known activity as either agonists (including partial and full agonists) or antagonists (including inverse agonists).
Each ligand was assigned an internal ID (ranging from 1 to 2643), and its corresponding SMILES string (line notation encoding its molecular structure) and pharmacological action (agonist/ antagonist/binder) were retrieved from the relevant databases.
The International Chemical Identifier key (InChIKey) was used as a unique identifier to distinguish between ligands across the dataset. 26 Both InChIKey and physicochemical properties appended for all compounds were acquired using the software Open Babel v3.1.1. 27

| Protein structures and ligand preparation and docking
The active-state protein coordinates were extracted from two crystal structures of human β2AR bound to an ultrahigh-affinity agonist (BI-167107) coupled with the Gs protein 28 or/and with a G proteinmimicking nanobody (Nb6B9) 29 from the Protein Data Bank (PDB code: 3SN6 and 4LDE, respectively). The inactive-state protein coordinates were extracted from the human β2AR bound to the inverse agonist carazolol (PDB code: 5JQH). 30 Receptor structures were aligned to use the same grid box of 22 × 22 × 32 Å at the orthosteric binding site, protonated, and charged, yielding a protein input file for subsequent docking experiments using UCSF Chimera. 31 The SMILES representation of ligands along with their internal ID were protonated and converted to a spatial data file (SDF) and pdbqt formats using Obabel.
The semi-flexible molecular docking was carried out using the software AutoDock Vina 32 and generated up to 10 poses for every compound. In total, 2643 compounds were docked in three βadrenoceptor structures, yielding almost 27 000 docking poses.

| Interaction fingerprint calculations and filtering
The inter-atomic receptor-ligand interaction fingerprints (AIFs) were calculated for all docking poses generated for each compound using the software Arpeggio 33 executed in Docker environment, 34 a software container platform. This method accounts for the presence of up to 15 subtypes of interatomic interactions, classified by atom type, distance, and angle constraints. The output was presented as binary values, with a 1 denoting the presence of a particular defined interaction and 0 indicating an absence.
A Python script was written to filter the Arpeggio results (to generate the "filtered dataset") by imposing minimum constraints that enforced certain features deemed essential for β2AR ligand binding, which eliminated all irrelevant binding poses (around 50% rows).
Criteria important for binding were based on prior knowledge derived from the literature, in particular the presence of the ionic/polar interaction between D113 3.32 and N312 7.39 with the ethanolamine moiety of the ligands.

| Generation of interaction matrix
A Python script was written to process each docking pose to gen-

| Descriptive statistical analysis
A descriptive statistical analysis of the frequency of the interatomic interactions was performed using a Python script. In this manner, the most frequently occurring features (those observed in at least 10% of all docking poses) contributing to agonism and antagonism were clearly identified across all interaction types and collated for further analysis. This resulted in the reduction of the number of features from ca 1100 to ca 100.
Subsequently, we computed the pairwise correlation between the columns representing atomic interactions and the column representing pharmacological action using Pearson's correlation coefficient (r) method. The resulting value r for each interaction (feature) reflects how well it is correlated with the pharmacological action (agonism or antagonism). Plots, graphs, and tables were generated with Excel, and statistical significance was determined using an unpaired t-test using Prism 8.

| Machine learning dataset preparation
The dataset was then randomly shuffled and split, via stratification, into cross-validation and final hold-out datasets. The cross-validation set was used for training and validation during hyperparameter optimization. The hold-out dataset comprised 20% of the original dataset and allowed us to gauge whether the validation scores were good estimations of model performance when generalizing to unseen data. The hold-out set was not used during any training or optimization procedures.

| Performance metric
For the filtered dataset the Random Forest classifier and for the unfiltered dataset XGBoost classifier were used. Matthews correlation coefficient (MCC) was used as the performance metric for all models. 35 The MCC metric is defined as follows: where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives. The MCC for binary classification weights both positive and negative classes equally, while also being robust to severe class imbalances. A value of +1 indicates a perfect positive correlation, that is a total agreement between prediction and observation. An MCC score of 0 indicates no correlation, that is the classifier performs no better than a random coin flip. Finally, −1 indicates a perfect negative correlation, that is a total disagreement between prediction and observation.

| Model performance estimation
Model performance was validated using repeated-stratified-k-fold cross-validation. Cross-validation entails splitting the dataset into k equally sized partitions, termed folds. One of the folds is extracted and used for validating a model on unseen data. The remaining folds are then used to train the model. This process is then repeated using each of the k folds as the validation set. The optimal model is the one that has the best performance on average across all k-folds. Crossvalidation generally provides a less optimistic estimation of model generalizability on unseen data, which is finally tested on the holdout set. Due to class imbalances in the data, stratification is used to ensure that the original distribution of classes is maintained in each fold, thus preventing any fold from being populated by a single class. 36 Model estimation can be noisy and so by performing crossvalidation over many repeats one obtains a more precise estimation of true model performance. Bootstrap resampling was used to estimate model uncertainty. 37 Confidence intervals were calculated with respect to a 99% confidence level. Bayesian hyperparameter optimization (BHO) was utilized to determine high-performing model parameter configurations when tested on unseen data. BHO was set to maximize the mean MCC across K-folds and repeats, model uncertainty was then calculated using optimized models only.  Mathematically this can be formalized as: A precision of 1% (Marginal Error = 0.01) was selected for all resampling methods ( Figure S1). Therefore, a minimum of 13 repeats for both the RFC and XGBoost were used during cross-validation and bootstrap resampling. binders" with no assigned pharmacological activity ( Figure 1A).

| Nomenclature of targets and ligands
To understand if there are any obvious differences between agonists and antagonists, we compared their physicochemical (PC) properties predicted using OpenBabel software. 27 We found that We observed an identical linear correlation between the molecular weight and lipophilicity for both agonists and antagonists (Figure 2), suggesting that bigger compounds are more lipophilic. The likely explanation is that drug discovery efforts have focused on developing β2AR agonists formulated for the treatment of asthma. They are delivered to the lungs via inhalation with higher hydrophobicity increasing their duration of action at the target tissue. Therefore, although the observed differences in size and hydrophobicity are present in our data set, they are unlikely to have a functional role.

| Generating atomic interaction fingerprints based on molecular docking poses
To obtain structural information on how ligands in the curated dataset interact with the receptor (i.e., ligand binding poses), we We improved the signal-to-noise ratio within our dataset by excluding irrelevant binding poses using prior knowledge based on crystallographic data ( Figure 1C, filtering panel). The majority (ca 97%) of β2AR ligands have a prevalent β-hydroxy-amine motif that makes specific interactions with the receptor. We, therefore, excluded poses that did not display this ionic interaction between the oxygen of D113 3.32×32 and the nitrogen atom of ethanolamine of the ligands and the hydrogen bond between the oxygen atom of N312 7.39×38 and either the NH or beta-hydroxyl groups in the ligand scaffold; these have been observed in every experimental crystallographic structure of the β2AR. After applying this filter, we obtained ~31 500 atomic interaction files (~10 500 poses and AIF files for each PDB), reducing the size of the original dataset by ~55%. We refer to this as the "filtered dataset." As the filtering step also removed ~3% of ligands in our dataset that did not contain the β-hydroxy-amine motif or did not produce suitable poses, we have also included in our analysis the "full dataset" consisting of ~75 000 AIF files with no filtering for comparison.

| Data-driven analysis reveals key interactions that drive agonism and antagonism of ligands
We constructed a ligand-receptor interaction matrix, organizing the atom-atom interactions and their types in the columns and each binding pose in rows for each PDB. We defined the ligand binding site as all residues that interact with at least one ligand binding pose in the dataset resulting in 30 residues in total ( The antagonists made specific hydrophobic/aromatic contacts with W286 6.48×48 and Y316 7.43×42 and polar/ionic/hydrogen bond contacts with Y316 7.43×42 ( Figure 3A and Table S3).
While  Figure S2A). For the full dataset (cut-off = 0.51), the MCC and accuracy decreased to 0.29 and 67%, respectively ( Figure S2B). An important consideration for interpretation of the prediction accuracy is that the training dataset may contain errors: compounds that are "wrongly" assigned to a particular class (e.g., agonist or antagonist). Therefore, we would not expect the predictors to be 100% accurate during the validation step.
As the pairwise correlation approach identifies the relative importance of individual interactions, we applied ML strategies (see methods for details) that can detect more complex patterns in the data than pairwise correlation analysis. We trained a Random Forest Classifier (RFC) 42  In most cases, the presence of a particular interaction is predictive of agonism or antagonism. However, in a minority of cases, the absence of the interaction was more important for predictions (e.g., 193/CB--1/C hydrophobic).
Overall, while the relative order of importance of individual features varied depending on the model, we observed the same set of interactions that were predictive of agonism or antagonism for both models ( Table S4). The application of pairwise correlation analysis and ML methods allowed us to identify the key interac-

| DISCUSS ION
While an observation that on average agonists are larger and more hydrophobic could potentially be used to distinguish them from antagonists in the βAR ligand dataset, the pharmacological action of ligands on GPCRs is far more specific than a simple function of their size or hydrophobicity.

| Specific ligand-receptor interactions determine their pharmacological activity
While ML algorithms can successfully classify compounds into ago- databases allowed us to identify several "hot spots" mediating the agonism or antagonism of ligands acting on β2AR. Agonism was mediated by residues in TM2 and TM5 and further facilitated by residues in TM6 and TM7. It is entirely plausible that certain ligands can successfully pull these TM regions together causing receptor activation in the process. In contrast, our data suggest that antagonism is mediated by the interaction of ligands with W286 6.48×48 , the so-called toggle switch, that has long been proposed to play a key role in the activation of GPCRs. 45,46 The second mediator of antagonism is Y316 7.43×42 which is involved in the so-called 3-7 lock that has previously been identified as important for GPCR activation. 47 Engaging these key residues in the ligand binding pocket likely prevent the conformational rearrangements necessary for activation of the receptor.

| Potential for developing more fine-grained models of ligand activity
While the assembled data classify compounds as agonist or antagonist, the pharmacological activity of compounds covers a spectrum from a very strong antagonist (aka inverse agonist) to that of a very strong agonist (aka full agonist). Another class of GPCR ligands, socalled biased ligands, changes the balance between activating G protein and arrestin signaling pathways, with a potential to increase their therapeutic benefits. 11,48 It is likely that such partial and biased ligands would also show a distinct AIF that is somewhat different from the all-inclusive agonist AIF we have identified in the current work. However, a large experimental dataset of partial or biased agonists would be needed to explore this hypothesis, ideally collected in a uniform screen to minimize experimental and interpretational bias.

| Limitations on the ability to correctly predict ligand activity
Our data strongly support the hypothesis that individual atomic interactions are correlated with ligand pharmacological activity.
This is learned from a large dataset of ligand binding poses, where "correct" binding poses are a minority but the machine learning methods we used identified the structure-activity relationship because "wrong" binding poses averaged themselves out. Prediction of pharmacological activity, in contrast, is 100% dependent on having a correct binding pose for the ligand. This is a problem that has not yet been solved in a satisfactory manner, and it limits the performance of any structure-based activity prediction method. It is clear that the future progress in our ability to predict the pharmacological activity of novel ligands will be closely correlated with our ability to correctly predict their ligand binding poses. The analysis of the structural properties of ligands ( Figure S6) with "correct" and "wrong" predicted activity did not identify any specific clusters of ligands for which the prediction failed. This observation supports the idea that the quality of prediction is determined by the quality of the binding pose prediction. To summarize, the observed correlations are informative and potentially useful to design novel ligands with desired pharmacological, their application in a completely automated pipeline needs further optimization of docking algorithms.

| CON CLUS IONS
These results strongly support the hypothesis that the interatomic interactions between the receptor and its ligands are central to differentiate between their agonist and antagonist effects at the β2AR. The overview obtained of the interatomic interactions between receptor and ligand which correlate with an action will help the synthesis of new previously unseen compounds with a specific pharmacological activity. While the specific interatomic interactions between β2AR and its ligands that we describe are unlikely to be generalizable to other GPCRs (with the exception of closely related receptors such as β1AR), the same hypothesis and ML approach can be applied to other targets. The growth of GPCR ligand databases provides a rich data source to facilitate the application of this approach to other GPCRs, while conceptually this approach could be applied to any drug target.
The ability to predict the pharmacological action of a ligand based on its ligand binding pose will significantly advance drug discovery projects contributing to a reduction of attrition during drug

CO N FLI C T O F I NTE R E S T
DAS and DBV are founders of Z7 Biotech Ltd, an early-stage drug discovery company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be constructed as a potential conflict of interest.

E TH I C A L S TATEM ENT
This study did not involved animals, human tissue, human participants or identifiable personal information.