Engineering G protein‐coupled receptors for stabilization

Abstract G protein‐coupled receptors (GPCRs) are one of the most important families of targets for drug discovery. One of the limiting steps in the study of GPCRs has been their stability, with significant and time‐consuming protein engineering often used to stabilize GPCRs for structural characterization and drug screening. Unfortunately, computational methods developed using globular soluble proteins have translated poorly to the rational engineering of GPCRs. To fill this gap, we propose GPCR‐tm, a novel and personalized structurally driven web‐based machine learning tool to study the impacts of mutations on GPCR stability. We show that GPCR‐tm performs as well as or better than alternative methods, and that it can accurately rank the stability changes of a wide range of mutations occurring in various types of class A GPCRs. GPCR‐tm achieved Pearson's correlation coefficients of 0.74 and 0.46 on 10‐fold cross‐validation and blind test sets, respectively. We observed that the (structural) graph‐based signatures were the most important set of features for predicting destabilizing mutations, which points out that these signatures properly describe the changes in the environment where the mutations occur. More specifically, GPCR‐tm was able to accurately rank mutations based on their effect on protein stability, guiding their rational stabilization. GPCR‐tm is available through a user‐friendly web server at https://biosig.lab.uq.edu.au/gpcr_tm/.


| INTRODUCTION
G protein-coupled receptors (GPCRs) are members of one of the most medicine-relevant human protein families (Hauser et al., 2017).They possess the ability to selectively bind to a diverse range of ligands, including light-sensitive compounds, pheromones, hormones, ions, and neurotransmitters.These receptors then transmit these signals from the external environment of the cell to its intracellular side.In 2017, it was estimated that around 34% of all drugs approved by the United States Food and Drug Administration targeted GPCRs.This corresponds to approximately 700 approved drugs (Hauser et al., 2017).Nevertheless, the majority of GPCRs were unexplored, and there are approximately 227 non-olfactory GPCRs that are yet to be analyzed by drug discovery processes (Hauser et al., 2017).Despite all the recent discoveries around GPCRs, we still have many structures to be discovered.According to the GPCRdb (Pandy-Szekeres et al., 2023), 187 unique structures were elucidated, out of 800-1000 GPCRs (Alhosaini et al., 2021;Congreve et al., 2020).In this context, the emergence of cryo-electron microscopy (cryo-EM) has revolutionized GPCR structural biology.cryo-EM enables the visualization of GPCR structures in their native states, circumventing the need for crystallization and providing insights into diverse ligand-bound conformations (Cheng, 2015;Frank, 2002).
Complementary to cryo-EM, x-ray crystallography can provide higher-resolution structural information, which can be of great value for structure-based drug design.Nevertheless, obtaining diffraction-quality crystals for highresolution structure determination of GPCR is not a trivial task.This struggle is linked to the diversity of ligands and unique features (highly flexible and dynamic).To overcome this barrier, engineering is usually required to minimize conformational heterogeneity and maximize crystal contacts and stability (Kobilka & Deupi, 2007).For this purpose, several methodologies have been developed, which include recombinant overexpression, purification strategies (Errey & Fiez-Vandal, 2020), crystallization platforms (Parker & Newstead, 2012), and detergent studies (Lee et al., 2020).However, these studies are primarily experimental (i.e., human-dependent), with complex and time-consuming methodologies, resulting in low scalability while dealing with new GPCRs.
In addition, it is worth noting that it is well known that GPCRs do not work in a simple turn-on or off state.Instead, it is understood that GPCRs work in a spectrum of states, from which either completely active or completely inactive is almost always not the case (Park et al., 2008).It is also important to mention that the interactions between GPCR ligands and receptors can unfold in a variety of relationships.For example, they can unfold under positive and negative allosterism (May et al., 2004), inverse agonism (May et al., 2004), ligand-biased signaling (Wootten et al., 2018), and receptor oligomerization (Milligan et al., 2019).The discovery of new GPCR structures in different states is, therefore, of paramount importance.The differences between the states can shed light on our understanding of cell signaling mechanisms and support structure-based drug design.
A strategy deployed to engineer new stable GPCR structures is the use of point mutations.Several studies involving mutations in GPCR have already been developed.For instance, more stable neurotensin receptor mutants were obtained using a systematic mutational approach coupled with activity assays (Shibata et al., 2013).
A second example of it involved the application of mutagenesis in the search for more stable adenosine A2a receptor mutants (Lebon et al., 2011).Following a different idea, mutagenesis was used to obtain detergent-solubilized thermostable mutants of β1-adrenergic receptors (Magnani et al., 2008;Shibata et al., 2009).We also had the application of mutagenesis to generate thermostabilized free fatty acid receptor 1 (Hirozane et al., 2014).Additionally, to aid x-ray crystallography, thermostabilization can support the study of dynamic functional aspects, such as ligand binding kinetics and receptor activation under physiological conditions.Finally, thermostabilization facilitates highthroughput screening assays for drug discovery efforts, complementing cryo-EM studies in identifying novel ligands and allosteric modulators (Lee et al., 2020).
However, all these studies involving mutagenesis were complex, expensive, and time-consuming because the number of possible sequences generated by mutagenesis (sequence space) was too high, considering the number of residues in the GPCRs.Furthermore, it is important to emphasize that these strategies have been successfully used for only a few GPCRs (Vaidehi et al., 2016).
Based on the lack of stability predictors for GPCR engineering, we developed GPCR-tm to speed up and reduce the costs of engineering-based mutagenesis studies within the GPCR scope.GPCR-tm relies on currently available mutation GPCR data, leading to a robust and accurate ΔTm predictor capable of ranking and stabilizing mutations tailored for GPCRs.GPCR-tm integrates graph-based signatures with a range of auxiliary features, providing a straightforward, explainable, and complete analysis of the impact of mutations on a protein's dynamics and stability resulting from stability changes caused by point mutations.Our method is available as a tool through an easy-to-use and reliable web interface at https://biosig.lab.uq.edu.au/gpcr_tm/.

| Data set analysis
We compiled experimental ΔTm data from various databases, including FireProtDB (Stourac et al., 2021), Thermomutdb (Xavier et al., 2021), and MPTherm-pred (Kulandaisamy et al., 2021).All datasets were combined, and just one mutation was selected from repeated ones (same position, same UniProt [Coudert et al., 2022] identification, mutated to the same residue).In total, we ended up with 97 non-redundant mutations.The distributions of the stability value in the 97 single-point mutations show that 60 mutations cause an increase in stability higher than 0. Figure S1 depicts the distribution of ΔTm values for the collected GPCRs.
Our training dataset comprises data from 11 distinct receptors (Uniprot IDs: O14842, P08172, P21554, P24530, P25024, P28335, P29274, P35408, P48039, P49286, and P51677), with 6 receptors included in both training and test sets (Uniprot IDs: O14842, P25024, P28335, P29274, P48039, and P49286).The selection of mutations for each set was conducted entirely at random, ensuring an unbiased approach to dataset construction.This meticulous approach underscores the integrity and reliability of our data, facilitating robust analysis and interpretation of the results.In addition, although training and test sets share receptors, it is important to reiterate that they do not have repeated mutations (i.e., nonredundant sets in terms of mutations).
We also analyzed the distribution of different amino acid (AA) types on wild-type and mutated residues.Most of the single-point mutations determined through experiments involve hydrophobic residues, namely alanine, phenylalanine, leucine, and valine (53, 6, 7, and 3, respectively), followed by charged residues, namely arginine, and lysine (both 5 and 4, respectively).Most of the residues involved are mutated to Ala (Ryu et al., 2023), as a reflection of alanine scanning efforts (Munk et al., 2019).
Moreover, we observed mutations on all helices of the receptor by investigating the topology involved in those mutations.Helix 3 has the highest number of mutations, 28.Next, we have 18 mutations on Helix 2, 10 on Helix 5, 11 on Helix 6, and 10 on Helix 7. Helices 4 and 1 have the least number of mutations, 9 and 5, respectively.This trend is related to the fact that Helix 3 is central to the receptor, showing more interactions with other helices, whereas Helices 4 and 1 are more outermost, interacting much less with the other helices (Munk et al., 2019).The remaining six mutations occurred outside the helices.

| Predictive performance on singlepoint mutations
In an effort to build a robust and reliable model in GPCR-tm for predicting the effects of single-point mutations on GPCR structure stability, we used structures in both states (inactive and active).The selection of the best state and ML algorithm was based on a 10-fold crossvalidation approach using Pearson's correlation coefficient as the performance metric.We concluded that the best scenario was using structures on active state and Random Forest (with 300 predictive decision trees) as the ML algorithm.

| ΔTm GPCR-tm's performance
We generated two sets of features that are diverse and complementary (refer to Table 3).This detailed characterization served as the foundation for training, validating, and evaluating predictive supervised models aimed at forecasting stabilization in GPCR proteins induced by these mutations.Utilizing all features during crossvalidation yielded a Pearson's coefficient of 0.19 and a mean squared error (MSE) of 19.92.These results suggest that additional refinement and feature selection are required to enhance the model's predictive capabilities.
To enhance the model's performance, we implemented a bottom-up greedy feature selection method (see Figure S2).Through this approach, we evaluated our feature set and the model's performance notably improved during cross-validation.Specifically, the Pearson's coefficient increased to 0.74, and the MSE was 29.00, achieved with the utilization of 14 selected features (see Figure S3, 10-fold cross-validation scatter plot).When evaluating the ranking predictive performance, the Kendall's tau metric was 0.51, and the Spearman's rank-order correlation coefficient was 0.67.
Subsequently, we established a correlation between predicted and actual ΔTm values.We subjected our proposed model to a blind test, as detailed in Section 4.During this evaluation, our model yielded a Pearson coefficient of 0.46 (refer to Figure 1) and an MSE of 16.85.When evaluating the ranking performance in the blind test, the Kendall's tau metric was 0.27, and the Spearman's rank-order correlation coefficient was 0.41.Notably, this performance stands out as the highest in the benchmark for stability upon mutations, showcasing its robust and reliable capabilities.The outcomes of the blind test further affirm the model's effectiveness in handling new data scenarios.We have also included scatter plots in Figures S4 and S5 to demonstrate the correlation between experimentally measured ΔTm values and the rank ordering of mutations predicted by our model.These scatter plots depict the relationship observed during both 10-fold cross-validation and the blind test.This additional analysis provides further insight into the consistency and reliability of our predictor in estimating the impact of mutations on ΔTm values across different validation scenarios.
Subsequently, we assessed our model via a second blind test set, in which mutations are stabilizing according to GPCRdb (Pandy-Szekeres et al., 2023).The model predicted 38 mutations (76%) properly, increasing the stability of the receptor, which provides additional confidence in the generalizability and robustness of the model in GPCR-tm.
Nevertheless, we are aware that GPCR-tm was modeled using a small structural GPCR dataset, which constrains its ability to accurately represent the underlying GPCR mutations, thereby increasing the risk of overfitting in terms of predicting the effect of new GPCR mutations in terms of thermostability.In the context of our study, the scarcity of available GPCR data exacerbates this challenge.Despite our efforts to increase the GPCR mutation sample size, the limited dataset has led to an observed decrease in predictive performances between cross-validation and blind testing.However, it is noteworthy that our model still demonstrates considerable reliability, as evidenced by its performance on both blind sets applied.Additionally, in comparative benchmarking against other methods (see the following subsection), our approach outperforms alternative methods, further underscoring its efficacy in predicting GPCR thermostabilizing mutations.

| (De)stabilization model's predictive performance
Furthermore, we applied the same dataset used during blind testing of our model (i.e., considering 19 independent mutations from the training set) to benchmark against other available tools using classification by regression.We employed the performance metrics accuracy, Matthew's correlation coefficient (MCC), and weighted F1 score to compare the predictive performance of GPCR-tm against alternative methods.Methods in Supporting Information S1 detail all these performance metrics.
GPCR-tm significantly performed as well as or better than all alternative methods (Table 1).Nevertheless, it is worth noting that we could not directly compare GPCRtm with a well-established method, MPTherm-pred (Kulandaisamy et al., 2021).For predicting using the tool MPTherm-pred, the user needs to select as input a Protein Data Bank (PDB) file available at https://www.rcsb.org/.MPTherm-pred does not accept as input a PDB file to be uploaded.When selecting a structure available at https://www.rcsb.org/, the structure would contain potential structural modifications (like engineered mutations).Predictions made using these structures would not be comparable with the ones used in this study.Therefore, the comparison with this alternative method was not feasible.
Although mCSM-membrane performed as well as GPCR-tm on the blind test set, we believe that is because part of this data were used to build mCSM-membrane's model, not allowing a true comparison to this alternative method.In fact, when the whole dataset with 97 proteins is set as input, the predictive performance of the mCSMmembrane method drastically decreased, achieving an accuracy of 0.32, an MCC of À0.03, and a weighted F1 score of 0.27.These results highlight why personalizing a ML-based tool for GPCR stabilization prediction is crucial, which is an aspect delivered by GPCR-tm.

| Interpretation of the selected features for predicting stabilization for GPCR mutations
During feature selection of GPCR-tm (regressor tool), we found that out of 14 features, 9 graph-based signatures were important for ranking mutations (see Figure 2; Table S1).Graph-based signatures represent the quantification of pairs of pharmacophoric regions within a defined distance threshold surrounding the mutation site.For example, the feature denoted as Hydro:Hydro-4.00 indicates the count of pairs of hydrophobic atoms within a maximum distance of 4 Å from F I G U R E 1 Regression analysis on G protein-coupled receptors (GPCR)-tm.By using the predictions on the selected (blind) test set, we externally analyzed the performance of the GPCR-tm model.The plot demonstrates the correlation between experimental and predicted values.the mutation site.Refer to Section 4 for more information about graph-based signatures.Six out of the nine graphbased signatures are related to hydrophobic interactions: Hydro:Hydro-4.00,Acc:Hydro-2.50,Hydro:Pos-5.00,Hydro:Sul-3.50,Hydro:Sul-6.00,and Hydro:Pos-3.50(where Hydro = hydrophobic, Acc = hydrogen bond acceptor, Pos = positive, Sul = sulfur group, and the given number represents the distance cutoff in angstroms).This selection can be related to the fact that membrane proteins, like GPCRs, are embedded in a hydrophobic environment (membrane).Therefore, the hydrophobic interactions play an important role in membrane insertion and folding (Ballesteros & Weinstein, 1995).Any changes in the hydrophobic core can cause the protein to lose stability.According to the Shapley Additive Explanations (SHAP) (Lundberg & Lee, 2017) feature importance plot in Figure 2, when the values of the features Hydro:Pos-5.00,Acc:Hydro-2.50, and Hydro:Sul-6.00are high, the predictions tend toward not stabilizing.For the feature Hydro:Hydro- 4.00, there is no visible pattern.In this case, high and low values are correlated to stabilizing and destabilizing.Another important aspect that can be observed is related to the feature mem (stands for membrane; if its value is high, the residue where the mutation occurred resides inside the membrane; if its value is low, the residue where the mutation occurred resides outside the membrane).According to the SHAP plot, mutations inside the membrane are correlated to an increase in stability, and mutations outside the membrane are correlated to a decrease in stability.We also found four features related to charged residues (Don:Pos-5.50,Hydro:Pos-5.00,Hydro:Pos-3.50, and Aro:Neg-4.50,where Don indicates a hydrogen bond donor), which reflects the importance of electrostatic interactions for protein stability (Hassani, 2012;Matthews, 1993;Pace et al., 2000) (refer to Table S1, for more information about each feature).
The features BENS940104 (i.e., based on the genetic code matrix) (Benner et al., 1994) and LUTR910108 (i.e., based on the structure-based comparison table for alpha helix class) (Luthy et al., 1991) have also been considered important in our findings (see Table S1).Their importance is related to the fact that they help in identifying regions of proteins that are prone to mutation and those that are evolutionarily conserved, which can provide insights into functional regions and structural stability.BENS940104 feature is based on a genetic code matrix.The matrix is calculated by assuming that the genetic code is the only constraint on AA divergence.A higher value of this feature indicates that the observed AA substitution occurs more frequently than expected by chance, while a lower value indicates that the substitution occurs less frequently than expected by chance.When analyzing the SHAP feature importance plot (Figure 2), relating the value to the impact of it on the prediction is unclear.Nevertheless, higher values are apparently more related to a decrease in stability.This indicates that these destabilizing mutations have a higher frequency, according to this matrix.Conversely, the LUTR910108 feature is based on a secondary structurebased profile made for mutations occurring in alpha helices.Lower values meaning that the mutation tends to occur less often when happening in an alpha helix.Higher values mean the opposite.According to the SHAP feature importance plot (Figure 2), the higher values of this feature tend to indicate an increase in stability.
The feature non_cytosol (i.e., a mutation not inserted into the cytosol) provides information about the location of the mutation.This is critical for the prediction because the cytosolic environment is completely different from the regions outside of the cytosol in terms of lipophilicity and substances concentrations (Kulandaisamy et al., 2019).
The feature FromPro refers to mutations in which the original AA is a proline.This is biologically relevant because this is an AA that has a distinctive cyclic side chain.This special side chain gives proline an exceptional conformational rigidity compared to other AAs.Proline is often found in turns, loops, and bends within protein structures.The change of proline to another AA commonly leads to a decrease in structural rigidity (Choi & Mayo, 2006) (see Table S1 for more information).

| GPCR-tm's web server
GPCR-tm is available as a user-friendly web server.To perform a prediction, users need to provide a PDB file or a PDB code of the GPCR.They can upload a list of mutations.The point mutation should contain the single-letter code of the wild-type residue, the corresponding mutation residue number, and the single-letter code of the mutant residue.The chain identifier of the wild-type residue in the protein should also be specified in its singleletter code.The single-letter code and the chain identifier should be separated by a space.
GPCR-tm predicts ΔTm, which is a metric related to how a single-point mutation will affect protein stability.According to the ΔTm, the mutations are ranked following a descending order of ΔTm on GPCR-tm's web server.The higher the result, the higher the probability of the mutation increasing the stability of the receptor.The results can be downloaded in a tabular comma-separated value format (Figure S6).

| CONCLUSIONS
Here, we present GPCR-tm, a ML web-based method, GPCR-tm, which relies on the concepts of graph-based signatures and auxiliary features to predict and rank the effects of single-point missense mutations on the stability of GPCRs.This is the first approach designed exclusively for GPCRs, which incorporates a user-friendly web server for seamless interaction.
GPCR-tm could accurately predict the effects of a variety of mutations on many different types of GPCRs on two non-redundant sets, which guarantees the robustness and reliability of our method.One downside related to the availability of data is that it was trained and assessed using class A GPCR information only.Because of the structural differences between other classes, there is no guarantee that the performance demonstrated through our work is transferable to other classes.
It is also crucial to acknowledge that the limited data availability posed challenges during the GPCR-tm model's development.Nevertheless, we benchmarked our proposed method against alternative methods, showing that GPCR-tm outperformed or performed similarly to these methods.Hence, GPCR-tm represents a significant advance over current ranking platforms, which have been built for GPCRs.
We would also like to stress the relationships between GPCR-tm and GPCR structural elucidation studies.As cryo-EM caused a revolution in the GPCR structural biology field, it offered an alternative approach to traditional crystallography methods that often require protein engineering through point mutations to develop more stable proteins.Additionally, it is also important to point out that thermostabilizing mutations may be needed to just facilitate purification and structure determination but are not usually required once the structure has been determined.Despite that, thermostabilizing GPCRs using point mutations is still beneficial for several reasons.For example, it supports the study of dynamics and functions, such as ligand binding kinetics and receptor activation.Additionally, thermostabilization facilitates highthroughput screening assays for drug discovery efforts, complementing cryo-EM studies in identifying novel ligands and allosteric modulators.Moreover, GPCR-tm benefits from cryo-EM GPCR structures and any other tool that supports structure elucidation, as the model relies on a structure for ranking mutations based on their potential to enhance structural stability.
GPCR-tm is intended to support mutagenesis studies for GPCRs, decreasing the time and expenses of those studies.The model was built and validated using AlphaFold-Multistate models, displaying its reliability even when utilizing AlphaFold models for mutation prediction.GPCR-tm offers a robust ΔTm predictor, ranking stabilizing mutations tailored for GPCRs.This proposed tool is freely available as a scalable, user-friendly, and easy-to-use web server at https://biosig.lab.uq.edu.au/gpcr_tm/.

| METHODS
The general workflow of GPCR-tm is shown in Figure 3. GPCR-tm was trained using data sets of experimentally characterized mutations in GPCR proteins, for which structures were available.It is composed of four main steps, including: (i) data collection, which refers to collecting experimental data about phenotype-changing GPCR mutations; (ii) feature generation, which encompasses the feature engineering to model different aspects and particularities involved in GPCR data coming from sequences and structures, and (iii) machine learning, which highlights the development of the supervised learning models to predict and rank GPCR stabilization based on the computed features and experimental classification of thermal stability upon mutation; (iv) web server, which delivers a deployed GPCR-tm web server for easy and scalable access to the developed ML models from the previous step, providing both GPCR predictions and interpretability via a web platform.

| Data collection
We retrieved and gathered experimental ΔTm data from diverse databases, including FireProtDB (Stourac et al., 2021) and Thermomutdb (Xavier et al., 2021).These databases are thorough, manually compiled repositories of protein stability information concerning point mutations.Additionally, we also collected data from MPTherm-pred (Kulandaisamy et al., 2021), which is a web server that hosts a range of topologically specific models for forecasting the thermal stability of membrane proteins following missense mutations.All datasets were combined, and just one mutation was selected from repeated ones (repeated mutation means same position, same UniProt [Coudert et al., 2022] identification, mutated to the same residue).The mutations across these three databases are summarized in Table 2.The data used to develop GPCR-tm are freely available to download at https://biosig.lab.uq.edu.au/gpcr_tm/data.
The generation of features requires wild-type-like structures of the receptors represented in the data.We utilized inactive and active states GPCR structure models, which were built using AlphaFold-Multistate.These GPCR structure models are available in the GPCRdb (Pandy-Szekeres et al., 2022).All generated models are based on the wild-type sequence (Pandy-Szekeres et al., 2022).This step is crucial considering that most of the GPCR structures available on PDB (Berman et al., 2000) contain mutations that are essential for increasing stability and permitting experimental structure elucidation, and it is possible to download models without these mutations on GPCRdb.It is important to note that we selected models without loops.This was done to avoid using regions with low AlphaFold pLDDT scores and reduce structural bias in our proposed ML model (Jumper et al., 2021).We also did a preliminary structure manipulation, removing water molecules, additional chains, and ligands.

| Feature engineering
Single-point mutations have the potential to induce a variety of structural and functional alterations in the protein.Our research focuses on comprehensively capturing and investigating the impact of single-point mutations on GPCR proteins.To achieve this, we generated both sequence-and structure-based features to provide GPCRtm with diverse and complementary variables describing GPCRs (see Table 3).This characterization was subsequently employed to train, validate, and assess the predictive supervised models for predicting stabilization of GPCR proteins.

| Graph-based structural signatures
One of the main components of our method is a structure-based feature derived from the concept of graph-based signatures, which is based on the Cutoff Scanning Matrix algorithm (Pires et al., 2014), which was originally proposed to represent biological systems using network topology by distance patterns.
GPCR-tm uses a graph-based representation of the residue environments to extract geometric and physicochemical T A B L E 2 Data collection of G protein-coupled receptors mutations and their influence on structure stability.

Sequencebased
Amino acid substitution scoring matrices (Blosum62, PAM30, and special amino acids) Relative solvent accessibility patterns (the last represented in terms of pharmacophores).
The wild-type residue environment, which here is defined as the set of atoms within a distance r from its geometric center, can be modeled as a contact graph, where the atoms are the nodes, and the edges are interactions defined by a cutoff distance.By varying the distance cutoff, different graphs are generated, and cumulative distributions of distances for different interactions are generated, composing a concise and effective representation of the residue environment.To account for the atom changes induced by the mutation, we introduce a pharmacophore count vector.Wild-type and mutant residues are represented as pharmacophore frequency vectors.The frequency of each type of pharmacophore in a residue is then summarized in a vector p.The difference (PChange) between pharmacophore count for mutant (pmt) and wild-type (pwt) residues is calculated and appended to the signature.PChange formula is described in Equation ( 1): The atom pharmacophores are characteristics belonging to eight possible classes: hydrophobic, positive, negative, hydrogen acceptor, hydrogen donor, aromatic, sulfur, and neutral.These signatures have shown to be an effective and efficient method to model protein residue environment, its geometry and physicochemical properties, information that has been used to predict the effects of mutations on protein stability and affinity to its partners (Myung, Pires, & Ascher, 2020;Myung, Rodrigues, et al., 2020;Nguyen et al., 2021;Pires et al., 2014Pires et al., , 2016;;Pires & Ascher, 2016, 2017;Rodrigues et al., 2019Rodrigues et al., , 2021a;;2021b, 2024;Rodrigues & Ascher, 2022, 2023;Ryu et al., 2023), pharmacodynamic andpharmacokinetics (Al-Jarf et al., 2021;de Sa et al., 2022;Iftkhar et al., 2022;Morozov et al., 2023;Pires et al., 2015Pires et al., , 2022;;Pires & Ascher, 2020;Rodrigues et al., 2021cRodrigues et al., , 2022;;Velloso et al., 2021), and identify drug resistance (Hawkey et al., 2018;Karmakar et al., 2018Karmakar et al., , 2019Karmakar et al., , 2020;;Portelli et al., 2020;Portelli, Heaton, & Ascher, 2023;Zhan et al., 2021;Zhou et al., 2021) and disease mutations (Jessen- Howard et al., 2023;Karmakar et al., 2022;Lai et al., 2021;Portelli et al., 2021;Portelli, Albanaz, et al., 2023).

| Auxiliary features
Apart from using the graph-based signatures to map the structural GPCR protein information, we also employed other complementary features.Inside this set, we have both structure-based and sequence-based features.
The sequence-based feature includes AA substitution scoring matrices (Blosum62, PAM30, and special AAs), relative solvent accessibility (RSA), residue depth (RD), AA index, and potential function energy calculations.More specifically, the AA substitution scoring matrices (Blosum62, PAM30, and special AAs) were used to include information regarding rates at which various AA residues in proteins are substituted by other AA residues over time (Trivedi & Nagarajaram, 2019).RSA of a protein residue, in turn, is a measure of residue solvent exposure (Shrake & Rupley, 1973).Alternatively, RD describes how buried a residue is in the protein structure space (Chakravarty & Varadarajan, 1999).Finally, the AA index (Kawashima et al., 2008) comprehends a set of 20 numerical values representing any of the different physicochemical and biological properties of AAs.

| Model building and feature selection
After the generation of features, we aimed to find the best set of features and the best ML algorithm for the building of a reliable regressor and ranking model for GPCR stabilization.We tested four different algorithms (Raschka, 2015): Random Forest, Extremely Randomized Trees, Gradient Boosting, and Extreme Gradient Boosting (XGBOOST).The Scikit-learn toolkit (Pedregosa et al., 2011) was used for training, (cross-)validating, and testing the models (Raschka, 2020).All these models used 300 as the number of predictive (decision trees) in their resultant ensemble.Other hyperparameters were set to their respective default Scikit-learn values.
To avoid overfitting, increase performance, and reduce noise in the data, we attempted to find the best set of features and the best ML.For this task, we used a bottom-up greedy feature selection algorithm, which is a heuristic algorithm that locally selects the feature with the best feature in terms of a performance metric at each stage.The adopted algorithm employs a forward selection approach.This means it initially starts with zero selected features.Next, it evaluates all features individually and fixes the one with the best score (e.g., Pearson's correlation coefficient), using a 10-fold cross-validation procedure on a ML model.Thereafter, all remaining features are tested together with the first one previously selected.Subsequently, this feature selection process continues until the predictive performance stops improving with the inclusion of new features.
The proposed model is a regressor for predicting ΔTm, a continuous value.The blind test selected was 13% of the entire dataset.For testing the predicted values against the actual values, we used the proper metrics for regression models described in the next section.We have also used classification by regression as a means of comparing our tool (predicts ΔTm) with other available tools that predict ΔΔG.Subsequently, we converted the predictions and the actual values for two classes only: mutations that increase stability and mutations that decrease stability.We have also applied classification by regression during our second blind test, when evaluating our model against mutations available at GPCRdb, all characterized as increasing stability (Pandy-Szekeres et al., 2022).To avoid bias, we just selected mutations that were not included in our training data set.Utilizing a second blind test provides a crucial step for validating the model's generalization and reliability.This was a crucial step in evaluating the prediction ability of GPCR-tm to capture stabilizing mutations from a broader and generalized mutational landscape.This step was done by converting our prediction values to 0 or 1, meaning destabilizing or stabilizing, respectively.In this approach, zero was considered for negative values, one for positive values, and then the prediction was compared with the actual results.GPCRdb data consisted of 50 stabilizing non-redundant mutations (245 prior to redundancy removal when compared to the dataset used to train the model) from 16 different GPCRs.The test set contains all types of mutations and proteins from family A.

| Predictive performance evaluation
To assess the predictive performance of models, we gauged the effectiveness of our regression predictions by comparing them against both experimental and predicted ΔTm values.To achieve this, we employed Pearson's correlation coefficient and MSE, quantifying the relationships and deviations between our predictions and the actual values.We have also included Kendall's tau metric and the Spearman's rank-order correlation coefficient to measure the ranking precision of GPCR-tm.All the evaluation metrics for regression are detailed in Supporting Information S1.
Furthermore, as part of our classification through a regression approach, we evaluated the overall predictive performance of GPCR-tm using the metrics MCC, accuracy, and weighted F1 score.All the evaluation metrics for classification are also detailed in Supporting Information S1.
Additionally, GPCR-tm was compared to wellestablished tools designed to predict the effects of mutations on protein stability.Because these alternative tools for predicting protein stability are based on ΔΔG values and our tool is based on ΔTm, we compared them using classification by regression.For this purpose, we considered three possible outcomes: mutations that have a destabilizing, neutral, and stabilizing effect.For each one of the assessed predictors, we explored different threshold levels for defining the classification outcomes.We started evaluating neutral as being between a minimum of À0.1 and a maximum of 0.1 (values below À0.1 were considered destabilizing, and values above 0.1 were considered stabilizing).We altered these threshold levels to 0.05 (to the minimum, we subtracted 0.05, and to the maximum, we summed 0.05) until a maximum of À3.0 for the minimum and 3.0 for the maximum.

| Model interpretability
In our investigation, we opted to utilize SHAP (Lundberg & Lee, 2017) summary plots to delve into the significance of various features.SHAP assigns an important value to each feature concerning a particular prediction, providing a nuanced understanding of the factors influencing the model's output.These summary plots act as an intuitive and accessible tool, allowing us to comprehend the primary influences shaping the predictions made by GPCR-tm's model.
Within the SHAP summary plot, the visualization showcases the relationship between feature values and their impact on predictions.It illustrates how both low and high values of features are associated with either stabilization or destabilization effects, contributing to understanding the model's predictions.This visual representation enhances interpretability, making it easier to grasp the intricate dynamics between input features and the model's decision-making process.

F
I G U R E 3 The methodological steps followed in this work: data collection; feature engineering-two main classes of features are generated: (i) graph-based signatures of the wild-type residue environment and (ii) auxiliary features, comprehending substitutions matrix, interatomic interactions, and general properties of amino acids; machine learning, where training, validation, optimization and testing of the proposed supervised model are performed; and, web server deployment, which provides easy and robust access to the proposed predictive model (and its interpretation) via a web-based interface.GPCR, G protein-coupled receptors.
Comparative performance of GPCR-tm (in bold) across testing data sets with alternative stability predictive methods.
T A B L E 1