Molecular Determinant Underlying Selective Coupling of Primary G‐Protein by Class A GPCRs

Abstract
G-protein-coupled receptors (GPCRs) transmit downstream signals predominantly via G-protein pathways. However, the conformational basis of selective coupling of primary G-protein remains elusive. Histamine receptors H2R and H3R couple with Gs- or Gi-proteins, respectively. Here, three cryo-EM structures of H2R-Gs and H3R-Gi complexes are presented at a global resolution of 2.6-2.7 Å. These structures reveal the unique binding pose for endogenous histamine in H3R, wherein the amino group interacts with E206 5.46 of H3R instead of the conserved D114 3.32 of other aminergic receptors. Furthermore, comparative analysis of the H2R-Gs and H3R-Gi complexes reveals that the structural geometry of TM5/TM6 determines the primary G-protein selectivity in histamine receptors. Machine learning (ML)-based structuromic profiling and functional analysis of class A GPCR-G-protein complexes illustrate that TM5 length, TM5 tilt, and TM6 outward movement are key determinants of the Gs and Gi/o selectivity among the whole class A family. Collectively, the findings uncover the common structural geometry within class A GPCRs that determines the primary Gs- and Gi/o-coupling selectivity.


Introduction
DOI: 10.1002/advs.202310120
Comprising over 800 members, G-protein-coupled receptors (GPCRs) constitute the largest family of membrane proteins in the human genome. [1] GPCR-mediated signaling pathways are involved in virtually every physiological function and many pathologies, and GPCRs are thus the targets of approximately one-third of all medications, including antihistamines, antipsychotics, and opioid painkillers. 2b] While GPCR activation by agonists typically triggers multiple G-protein signaling pathways, many GPCRs exhibit pronounced bias and can preferentially activate certain downstream responses. [3] In many circumstances, the beneficial effects of a drug come from one signaling pathway and the adverse effects from other pathways. [4] Understanding the molecular mechanism underlying the selective coupling of an agonist-stimulated GPCR to an appropriate G-protein is therefore crucial for transmembrane signal transduction. The primary G-protein-coupling selectivity of GPCRs at the receptor side remains intricate. Most investigations suggest that GPCRs coupled with different G-proteins generally adopt nearly identical conformations, [10] with the exception of 5-HT4, [11] resulting in widespread doubt about the existence of complementary determinants at the GPCR level. [12] 9d,14] However, the limitation is that these studies often focused on specific receptors or receptor subfamilies, and the observed characteristics are not universal across class A GPCR families, leaving the mechanism of primary G-protein selectivity obscure and challenging.
Histamine, a biogenic amine, plays a pivotal role in various physiological and pathophysiological processes, such as allergic and inflammatory reactions, gastric acid secretion, wake-sleep disorders, and neurotransmission in the central nervous system. [15] The histamine receptor system comprises four members (H1R, H2R, H3R, and H4R), all of which belong to the class A GPCR family. [16] Drugs targeting H1R, H2R, and H3R have been approved to treat allergic disorders, [17] gastroduodenal ulcers, [18] and narcolepsy, [19] respectively. The four subtypes of histamine receptors deliver downstream signaling by coupling with different heterotrimeric G-proteins: H1R mainly couples with Gq-protein; H2R primarily couples with Gs-protein and can alternatively couple with Gq-protein; H3R and H4R both mediate downstream signaling primarily via Gi/o-protein activation. Given their diverse coupling profiles, histamine receptors serve as an excellent model system for investigating the structural basis of primary G-protein-coupling and the preferential profile of GPCRs.
To date, structures of the inactive states of H1R, H2R, and H3R, as well as the active states of H1R and H4R, are available. [20] However, the absence of active-state structures of H2R and H3R has impeded the understanding of the molecular basis of H2R and H3R signaling and the structure-based drug design of histamine receptors. In this study, we report three cryo-EM structures of H2R-Gs and H3R-Gi complexes at a global resolution of 2.6-2.7 Å. These structures reveal the structural basis for agonist recognition and G-protein selectivity of H2R and H3R. A comparative analysis of the H2R-Gs and H3R-Gi complexes reveals that the conformation of TM5 and TM6 determines G-protein selectivity in histamine receptors. Furthermore, machine learning (ML)-based structural profiling and functional analysis of publicly available and home-generated class A GPCR-G-protein complexes indicate that TM5 length, TM5 tilt, and TM6 outward movement determine Gs and Gi/o selectivity. To further enhance our understanding, we develop an ML-derived architecture classifier for Gs- and Gi/o-coupled receptors using homology models from GPCRdb as a training dataset. Remarkably, the classifier achieves an accuracy of up to 91%, as validated by experimentally determined GPCR-G complex structures. Collectively, our results reveal that the common structural geometries of class A GPCRs determine primary Gs- and Gi/o-coupling selectivity.

Cryo-EM Structures of the Agonist-Bound H2R-Gs and H3R-Gi Complexes
To investigate the structure of the H2R-Gs complex, we coexpressed full-length H2R with the human heterotrimeric Gs protein in Sf9 insect cells. A FLAG tag was fused to the N-terminus of H2R to aid purification. The H2R-Gs complex was assembled in the presence of a highly selective agonist, amthamine, and further stabilized by a camelid antibody, Nb35. [21] Using single-particle cryo-EM analysis, the purified H2R-Gs complex was solved at a nominal resolution of 2.7 Å (Figure 1B; Figures S1, S2 and Table S1, Supporting Information). The H3R-Gi complex was obtained through a co-expression strategy similar to that of the H2R-Gs complex. In addition to the N-terminal FLAG tag, a C-terminal tandem maltose-binding protein (MBP) tag was introduced to H3R. A NanoBiT tethering strategy was used to improve the stability of the H3R-Gi complex. [22] The endogenous agonist histamine and the synthetic agonist immepip were used separately to generate the agonist-bound H3R-Gi complexes, and the complexes were further stabilized by a single-chain antibody, scFv16. [23] The histamine-bound H3R-Gi complex and the immepip-bound H3R-Gi complex were solved at a global resolution of 2.7 Å and 3.0 Å, respectively (Figure 1C; Figures S1, S2 and Table S1, Supporting Information).
All three GPCR-G-protein complex structures were resolved at near-atomic resolution, and the EM density maps were clear enough to build the models of the receptors, the heterotrimeric G-protein, the antibody, and the bound ligand in the receptor orthosteric pockets (Figure S2 and Table S1, Supporting Information). The N- and C-termini of both receptors, as well as the ICL3 of H3R and the α-helical domains of Gα, exhibit poor resolution in the cryo-EM density maps owing to the high flexibility of these regions, consistent with previously reported cryo-EM structures of GPCR-G-protein complexes. Notably, the density of cholesterol between TM1 and TM7 is observed in both the H2R-Gs and H3R-Gi complexes, highlighting the critical role of cholesterol molecules in maintaining the conformational stability and signaling activity of GPCRs.

Ligand Recognition of Histamine Receptors
In the H2R-Gs complex, the agonist amthamine occupies the conventional orthosteric pocket of H2R. As depicted in Figure S3A (Supporting Information), this pocket is defined by TM3, TM5, TM6, and TM7 and is capped by extracellular loop 2 (ECL2) of H2R. The binding of amthamine is stabilized through a combination of polar and hydrophobic interactions (Figure 2A). 20a,25] Additionally, the two amines present in the thiazolamine moiety of amthamine establish polar interactions with T103 3.37, D186 5.42, and T190 5.46 of H2R (Figure 2A). Consequently, the polar network effectively restrains both the aminoethyl group (head) and the thiazolamine group (tail) of amthamine. Consistent with these observations, alanine mutagenesis of the majority of these residues markedly attenuated the cAMP accumulation response of H2R to amthamine (Figure 2B; Figure S4A and Table S2, Supporting Information). Notably, the T190 5.46 A mutant exhibits anomalous behavior, possibly attributable to compensatory effects involving T103 3.37. The thiazole ring of amthamine is sandwiched between V99 3.33 on one side and F251 6.52/F254 6.55 on the other side, facilitating crucial hydrophobic interactions (Figure 2A). The position of the methyl group on the thiazole ring intensifies the hydrophobic interaction with F254 6.55. Moreover, amthamine establishes van der Waals interactions with C102 3.36 and L274 7.39 (Figure 2A). Our mutational studies targeting these residues corroborate the importance of these hydrophobic interactions (Figure 2B; Figure S4A and Table S2, Supporting Information).
H2R displays the greatest similarity to H1R among the four histamine receptors, exhibiting approximately 73% sequence homology in the 7TM region. As anticipated, the residues within the binding pockets of H1R and H2R show a high degree of conservation. Among the 12 residues within the H2R binding pocket, seven are identical and three are homologous to their counterparts in H1R (Figure 2D; Figure S3F, Supporting Information). Notably, despite this conservation, discernible differences emerge in the ligand pocket shapes of H1R and H2R (Figure S3B,C, Supporting Information). Y108 3.33 and N198 5.46, which possess substantial steric bulk, constrict the ligand pocket in H1R (Figure 2C). In contrast, the structurally equivalent residues V99 3.33 and T190 5.46 in H2R do not introduce such steric hindrance. As a consequence, docking amthamine into H1R would inevitably give rise to significant steric clashes, particularly of the thiazole ring and the primary amine moiety of the ring with Y108 3.33 and N198 5.46 (Figure 2C). Furthermore, substitution of D186 5.42 in H2R with the corresponding T194 5.42 residue of H1R would diminish the polar interactions with amthamine (Figure 2C). Collectively, our structures provide a clear explanation for the high selectivity of amthamine as an H2R agonist. Mutations of these three residues of H2R to the corresponding residues in H1R significantly decreased the potency and efficacy of amthamine, further supporting our conclusion (Figure 2E; Table S2, Supporting Information).
The H3R-Gi complexes bound with histamine and immepip exhibit highly similar overall structures, with a low root-mean-square deviation (RMSD) of 0.3 Å for the H3R Cα atoms. Both ligands reside within the orthosteric pocket, a characteristic shared with H2R and other aminergic receptors, as depicted in Figure 2. In the histamine-bound H3R-Gi complex, the amino group of histamine forms a pivotal hydrogen bond with E206 5.46 (Figure 2F). Simultaneously, the N  atom within the imidazole ring makes van der Waals contacts with C118 3.36 while also forming hydrogen bonds with D114 3.32 and W402 7.43 (Figure 2F). Furthermore, the N  atom forms a hydrogen bond with the backbone of F398 7.39 (Figure 2F). Alanine mutagenesis confirmed that these interactions are indispensable for histamine binding and the ensuing receptor activation. Notably, alanine mutations of E206 5.46, D114 3.32, and W402 7.43 abolish the capacity for Gi protein activation (Figure 2H; Figure S4B and Table S2, Supporting Information). Additionally, the EC50 value of the C118 3.36 A mutant shows an approximately 150-fold reduction in comparison to wild-type H3R. Furthermore, the imidazole ring of histamine is embedded within the hydrophobic core formed by Y374 6.51, L401 7.42, W402 7.43, and F398 7.39 (Figure 2F). This interaction is particularly evident in the face-to-face π-π stacking with F398 7.39. These observations are supported by alanine substitutions of the residues forming this hydrophobic core, which underscore their pivotal role in histamine recognition (Figure 2H; Figure S4B and Table S2, Supporting Information).
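The Cα RMSD quoted above can be illustrated with a minimal sketch. It assumes the two structures have already been superimposed and that the Cα atoms are paired in the same order; the coordinates below are hypothetical and are not taken from the deposited models.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD (same unit as the input, e.g. Å) between two paired coordinate lists.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical Cα positions: every atom is displaced by 0.3 Å along one axis.
ca_histamine_bound = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
ca_immepip_bound = [(0.3, 0.0, 0.0), (1.0, 0.3, 0.0), (2.0, 0.0, 0.3)]
value = rmsd(ca_histamine_bound, ca_immepip_bound)
print(round(value, 2))
```

In practice the superposition step (for example a least-squares fit on the receptor Cα atoms) precedes this calculation and is omitted here.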
Within the H3R orthosteric pocket, the imidazole rings of histamine and immepip overlap in a nearly identical position and share many interactions with H3R (Figure 2G). Immepip, a synthetic agonist derived from histamine, replaces the aminoethyl moiety with a piperidine ring. A notable difference is the position of the amine within the piperidine ring, which lies 1.3 Å closer to E206 5.46 than the primary amine of histamine, thus strengthening its polar interactions with E206 5.46. The bulky piperidine ring also establishes hydrophobic and van der Waals interactions with Y115 3.33 and T119 3.37, making immepip more compatible with the dumbbell-shaped pocket characteristic of H3R (Figure S3D,E, Supporting Information). Consequently, the binding interface of immepip with H3R spans ≈290 Å², which is 20 Å² larger than that of histamine. Notably, mutations of Y115 3.33 and T119 3.37 have a negligible effect on the pEC50 values for immepip, potentially attributable to the compensatory strengthening of polar interactions between the piperidine moiety and E206 5.46 (Figure 2H; Figure S4C and Table S2, Supporting Information).

Unique Histamine Binding Mode of H3R
Recent reports have detailed the structures of aminergic receptors bound to their endogenous ligands, including H1R with histamine, [20a] D1R with dopamine, [25c] 5-HT1A with 5-HT, [25a] and β1AR with noradrenaline. [26] Comparison of these structures reveals that the biogenic monoamine ligands adopt a nearly identical binding pose, situated at the bottom of the receptors' orthosteric binding pockets. Notably, the amino group of these monoamine ligands forms a salt bridge with residue D 3.32, which is highly conserved among aminergic receptors. Intriguingly, our high-resolution cryo-EM map reveals a distinctive binding pose of histamine in H3R, in which the amino group of histamine forms a salt bridge with E206 5.46 rather than with D114 3.32, the residue corresponding to the conserved D 3.32 of other aminergic receptors (Figure 3A-F; Figure S5, Supporting Information). To further validate this surprising observation, we constructed an alternative model with histamine adopting the classical pose observed in other aminergic receptors. This alternative model, along with the non-classical model revealed by the cryo-EM map, was subjected to molecular dynamics (MD) simulation. As anticipated, histamine in our non-classical pose for H3R remained stable throughout a 200 ns simulation (Figure S5B, Supporting Information). In contrast, histamine in the classical pose of aminergic receptors, as seen in H1R, was highly unstable over a 200 ns simulation, with two considerably distinct conformations in the trajectory (Figure S5C, Supporting Information). These findings establish that histamine adopts a unique pose in H3R compared to other aminergic receptors.
Sequence alignment of all aminergic receptors reveals that the 5.46 position is mainly occupied by residues with small side chains that are either uncharged polar or hydrophobic, such as threonine, serine, and alanine (Table S3, Supporting Information). This allows sufficient space to accommodate the bulky conjugated group of monoamine ligands. Consequently, the relatively compact imidazole group of histamine enables the inclusion of residues with bulkier side chains in the orthosteric binding pocket, as exemplified by the presence of asparagine in H1R. However, compared to A 5.46 in 5-HT1A and S 5.46 in D1R, the elongated glutamate residue in H3R (E206 5.46) causes severe steric hindrance with the imidazole group in the classical pose. Consequently, histamine rotates 180° along the bilayer plane, resulting in the amino group of histamine interacting with E206 5.46 instead of D114 3.32 (Figure 3A-F; Figure S5, Supporting Information). A comparison of the orthosteric ligand binding pockets of H1R and H3R reveals that five of the eight residues constituting the H3R ligand pocket are not conserved in H1R (Figure 2D; Figure S3F, Supporting Information). This indicates that the shape and size of the histamine binding pockets differ significantly between the two histamine receptors. The unique histamine binding mode of H3R is further confirmed by the differential effects of mutations at the 5.46 position of H1R and H3R (Figure 3G,H; Table S4, Supporting Information). This position contributes to the different binding capacities for the endogenous ligand histamine, enabling the discovery of selective agonists and antagonists targeting H3R.

Structural Differences Reveal the Basis of G-Protein Selectivity in Histamine Receptors
Histamine receptors H1R, H2R, and H3R primarily couple with Gq-, Gs-, or Gi-proteins, respectively. Our structures of the H2R-Gs and H3R-Gi complexes, along with the recently determined H1R-Gq structures, [20a] enable structural comparisons among histamine receptor-G-protein signaling complexes. Such comparative analysis has the potential to reveal the molecular determinants underlying G-protein-coupling preferences within this receptor subfamily, offering insights that could extend to other GPCRs. Upon superimposing the three structures on the receptor, we observe that TM1, TM2, TM3, TM7, and ICL1 align well on the cytoplasmic side. Conversely, disparities emerge in the configuration of TM4-6 and ICL2-3 (Figure S6A,B, Supporting Information). This observation highlights that while the α5 helix of Gα inserts into the intracellular cavity of the receptors, contributing to the classical α5-insertion interface within the complex structures, the specific manner in which the G-protein binds diverges significantly across the three structures.
In both the H1R-Gq and H3R-Gi structures, TM6 exhibits comparable outward displacements at the cytoplasmic end. However, in the H2R-Gs complex, TM6 moves 4.7 Å further outward than in H3R, as measured at the Cα atoms of residue 6.30 (Figure S6C, Supporting Information). Notably, this represents the most significant TM6 outward displacement within the histamine receptor subfamily. Consequently, the C-terminus of the α5 helix in Gs shifts by 4.0 Å toward TM6 in comparison to Gi within the H3R-Gi complex, as measured at the conserved Cα atoms of residue H5.25 (common Gα numbering system) [27] in Gα (Figure S6C, Supporting Information). Beyond the dynamic TM6, we also observed significant variations in the conformation of TM4 and the corresponding ICL2 regions on the cytoplasmic side. Notably, the intracellular tip of TM4 in H3R is shifted by 4.9 Å toward TM2 in comparison to H2R, as determined by measuring the Cα atoms of residue 4.40 (Figure S6B, Supporting Information). This movement causes the ICL2 of H3R to rotate by approximately 64° relative to that of H2R (Figure S6B, Supporting Information). In addition, our H2R-Gs and H3R-Gi structures reveal that the ICL2s form polar and hydrophobic interactions with the αN helix, the β2-β3 loop, and the α5 helix of Gα, which are critical for G-protein coupling.
The most notable distinctions among the three GPCR-G-protein complexes are found within the TM5/TM6 regions (Figure 4A,B). In the H2R-Gs complex, TM5 of H2R is ≈2.5 helical turns longer (from position 5.67 to 5.75, encompassing nine residues) than that of H1R (Figure 4B; Figure S2, Supporting Information). Furthermore, the extended TM5 in H2R tilts toward TM6 by 18° from residue Y202 5.58 (Figure 4B). As a result, the cytosolic tip of TM5 inserts into the cavity of Gs, which is formed by the αG-α4 loop, the α4 helix, and the α4-β6 turn, resulting in additional interactions with the α4 helix of Gs (Figure 4C). In the H3R-Gi structure, both TM5 and TM6 of H3R exhibit unexpected elongations of nine and eight residues (2.5 and 2.2 helical turns), respectively, compared to H1R (Figure 4B). This elongation of the TM5/TM6 regions in H3R establishes close contacts with the α4-β6 turn of Gi, thereby strengthening the H3R-Gi coupling (Figure 4D). Consequently, the interfaces of H2R-Gs (1610 Å²) and H3R-Gi (1381 Å²) are both significantly larger than that of H1R-Gq (1144 Å²). In conclusion, the elongations observed in the TM5/TM6 regions of H2R and H3R constitute additional yet distinctive interfaces between the receptor and G-proteins, hereafter referred to as the non-canonical receptor-G interface. We anticipated that the differential non-canonical receptor-G interactions resulting from the extended TM5/TM6 could be the key determinants of primary Gs or Gi/o selectivity. To validate our hypothesis, we analyzed the structural geometries of TM5/TM6 among available GPCR-G-protein structures. This exploration aimed to uncover potential correlations between TM5/TM6 geometry and Gs- or Gi/o-coupling selectivity within class A receptors.
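The conversion between residue counts and helical turns used above can be checked with a short sketch. It assumes an ideal α-helix with 3.6 residues per turn and counts both end positions of a generic-number range; the helper names are hypothetical.

```python
RESIDUES_PER_TURN = 3.6  # ideal alpha-helix (assumption for this estimate)

def residues_in_range(start, end):
    """Number of residues between two generic positions on the same helix, inclusive.

    Positions are strings such as "5.67"; only the part after the dot is used.
    """
    helix_a, pos_a = start.split(".")
    helix_b, pos_b = end.split(".")
    if helix_a != helix_b:
        raise ValueError("positions must lie on the same helix")
    return abs(int(pos_b) - int(pos_a)) + 1

def helical_turns(n_residues):
    """Approximate number of helical turns spanned by n_residues."""
    return n_residues / RESIDUES_PER_TURN

tm5_extension = residues_in_range("5.67", "5.75")
print(tm5_extension, round(helical_turns(tm5_extension), 1))  # 9 2.5
print(round(helical_turns(8), 1))                             # 2.2
```

Under these assumptions, nine residues correspond to about 2.5 turns and eight residues to about 2.2 turns, in line with the values quoted for H2R and H3R.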

TM5 and TM6 are Responsible for Gs and Gi/o Selectivity
9a,b] Currently, ≈170 structures of class A GPCRs coupled to their corresponding G-proteins are available (as of July 2022). Among them, 13 Gs- and 30 Gi/o-coupled signaling complex structures (Table S5, Supporting Information) of unique class A receptors, whose structures have been resolved with their corresponding primary G-protein, were chosen for further analysis. We found that the extended TM5/TM6 structural geometries observed in the H2R-Gs and H3R-Gi complexes are widespread in these structures. In more than half of the receptors in the Gs-coupled state, TM5 is remarkably longer than TM6 (the median lengths of TM5 and TM6 are 25 and 21 residues, respectively) (Table S5, Supporting Information). 14c,28] In addition, similar to H2R, the cytosolic half of TM5 in Gs-coupled receptors exhibits a curved conformation toward TM6, with tilt angles ranging from 8° to 26° (Figure 5A; Table S5, Supporting Information). Nevertheless, the situation for Gi/o-coupled structures is more complex. 14e,25a,29] Notably, for 5-HT1A, 5-HT1B, and 5-HT1D, the length of TM6 is 35, 31, and 31 residues, respectively. A significant proportion of GPCR-Gi/o structures, including MT1, M2R, 5-HT1F, chemokine receptors, etc., exhibit short TM5 and TM6 helices, with TM6 similar in length to, or only slightly longer than, TM5 (Figure 5B). [30] It is noteworthy that in the majority of Gi/o-coupled receptor structures, TM5 is straight, as in H3R; that is, the cytosolic end of TM5 is not curved toward TM6 as it is in H2R and other Gs-coupled structures (Figure 5A,B). Therefore, the lengths of TM5 and TM6, together with the tilt of the TM5 cytosolic end, differ significantly between Gs- and Gi/o-coupled receptors and are likely responsible for Gs and Gi/o selectivity.
Figure 5 (caption fragment). E) Replacement of I 5.56-F 6.44 of H2R with F 5.56-F 6.44 of H3R did not confer the ability to dissociate Gi-protein (left), but resulted in loss of the ability to activate the Gs signal pathway (right). F) Replacement of P 5.50-P 6.50 of H3R with P 5.50-P 6.50 of H2R resulted in gain of the ability to activate the Gs signal pathway (left), but loss of the ability to dissociate Gi-protein (right).
Previous research has established that the outward movement of TM6 is the hallmark of receptor activation and thus plays a crucial role in receptor Gs/Gi/o-coupling selectivity. 9c] To evaluate the significance of four structural geometries of TM5 and TM6 in Gs and Gi/o selectivity (the lengths of TM5 and TM6, the tilt of the TM5 cytosolic end, and the outward movement of TM6), we analyzed currently available Gs- and Gi/o-coupled structures of class A receptors, along with our unpublished structures (a total of 20 Gs- and 34 Gi/o-coupled receptors) (Table S5, Supporting Information). The lengths of TM5 and TM6 were measured from the cytosolic tip to 5.50 and 6.50, respectively, and the lengths and angles of TM5/TM6 were extracted as illustrated in Figure 6A (see Experimental Section). Consistent with our speculation, significant differences were observed in the geometries of TM5 and TM6 between Gs- and Gi/o-coupled receptors. The average length of TM5 in Gs-coupled receptors is four residues longer than that in Gi/o-coupled receptors (23.1 ± 4.6 and 18.9 ± 3.1, respectively) (Figure 5C; Table S5, Supporting Information). The tilt of TM5 in Gs-coupled structures was distributed into two major clusters (12.3 ± 7.1°), while the majority of the Gi/o-coupled receptors exhibit smaller tilts (5.90 ± 2.9°) (Figure 5C; Table S5, Supporting Information). The average outward movement of TM6 in Gs-coupled receptors was slightly larger than that in Gi/o-coupled structures (23.8 ± 8.4° and 19.7 ± 7.6°, respectively), consistent with previous findings (Figure 5C; Table S5, Supporting Information). However, the TM6 lengths of Gs-coupled and Gi/o-coupled receptors were indistinguishable, with comparable length distributions and medians (Figure 5C; Table S5, Supporting Information). In conclusion, the structural geometries of TM5/TM6, particularly the length of TM5, the tilt of cytosolic TM5, and the outward movement of TM6, may be responsible for primary Gs and Gi/o selectivity.
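As an illustration of how such geometries can be extracted, the sketch below computes a helix length relative to the x.50 position and a tilt as the angle between two helix-axis vectors. The exact protocol is given in the Experimental Section and is not reproduced here; the axis vectors, the residue numbers, and the choice not to count the x.50 residue itself are all assumptions made for this example.

```python
import math

def angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard against rounding
    return math.degrees(math.acos(cos_theta))

def helix_length(tip_resnum, x50_resnum):
    """Residues between the x.50 position and the cytosolic tip (x.50 not counted)."""
    return abs(tip_resnum - x50_resnum)

# Hypothetical axes: the cytosolic half of TM5 bends 18 degrees away from the upper half.
upper_axis = (0.0, 0.0, 1.0)
lower_axis = (0.0, math.sin(math.radians(18.0)), math.cos(math.radians(18.0)))
tm5_tilt = angle_deg(upper_axis, lower_axis)

# Hypothetical sequence numbers for position 5.50 and the cytosolic tip of TM5.
tm5_length = helix_length(tip_resnum=233, x50_resnum=210)
print(round(tm5_tilt, 1), tm5_length)
```

The same `angle_deg` helper could be applied to TM6 axes from active and reference structures to express the outward movement as an angle, again under whatever axis definition is chosen.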
To validate the hypothesis that the preferential selection of G-proteins by class A receptors is determined by TM5/TM6, we conducted "TM5/TM6 swap" experiments to reverse the Gs and Gi/o selectivity of receptors (Figure 5D). In the first set of experiments, we replaced the intracellular halves of TM5 and TM6 together with ICL3 (P 5.50 to P 6.50) of Gi/o-coupled receptors with those of Gs-coupled receptors. As expected, these chimeric receptors, including H3R(H2R_P 5.50-P 6.50), H3R(D1R_P 5.50-P 6.50), H3R(MC4R_P 5.50-P 6.50), H3R(β2AR_P 5.50-P 6.50), and D2R(H2R_P 5.50-P 6.50), induced cAMP accumulation (Figure 5E; Figure S7A and Table S6, Supporting Information). In the second set of experiments, we attempted to switch Gs-coupled receptors to Gi/o-coupled receptors. Although these engineered receptors could not recruit Gi proteins, their ability to stimulate cAMP accumulation was almost lost (Figure 5F; Figure S7B and Table S6, Supporting Information). In conclusion, these "TM5/TM6 swap" experiments could alter the Gs- or Gi/o-coupling selectivity of receptors, supporting the notion that the conformational architecture of TM5/TM6 determines the Gs and Gi/o selectivity of a GPCR.
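The design of such a chimera can be sketched at the sequence level as a simple segment replacement. The sequences and index positions below are toy placeholders, not receptor sequences, and the function name is hypothetical; in a real design the indices of P 5.50 and P 6.50 would come from a generic-numbering alignment.

```python
def swap_segment(host, donor, host_span, donor_span):
    """Replace host[h0..h1] with donor[d0..d1]; both spans are inclusive, 0-based."""
    h0, h1 = host_span
    d0, d1 = donor_span
    if not (0 <= h0 <= h1 < len(host)) or not (0 <= d0 <= d1 < len(donor)):
        raise ValueError("span outside sequence")
    return host[:h0] + donor[d0:d1 + 1] + host[h1 + 1:]

# Toy sequences: 'P' marks the assumed P 5.50 and P 6.50 positions.
host_seq = "AAAAPHHHHHPAAAA"   # stands in for the Gi/o-coupled host receptor
donor_seq = "GGPDDDDDDDPGG"    # stands in for the Gs-coupled donor receptor
chimera = swap_segment(host_seq, donor_seq, host_span=(4, 10), donor_span=(2, 10))
print(chimera)  # AAAAPDDDDDDDPAAAA
```

Because the donor segment may differ in length from the host segment, the chimera length changes accordingly, which is how an extended TM5/ICL3/TM6 region would be transferred in this toy picture.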

Classification of Gs- and Gi/o-Coupled Receptors by Machine Learning
Can the structural determinants of Gs and Gi/o selectivity identified in this study be generalized to all class A GPCRs? To address this question, a machine learning approach was used to categorize primary Gs- and Gi/o-coupled receptors based on the structural geometries of TM5/TM6 (Figure 6A). To improve the generalization of the predictive model, we incorporated predicted structures in the structural datasets (see Experimental Section). The training set comprised 98 GPCRdb-predicted homology models (24 Gs- and 74 Gi/o-coupled receptors) (Table S7, Supporting Information), [31] while the test set consisted of 54 true (experimental) structures (20 Gs- and 34 Gi/o-coupled receptors) (Table S5, Supporting Information). In light of potential correlations among the four structural geometries of TM5/TM6, we applied Principal Component Analysis (PCA) [32] to decorrelate and compress these characteristics, thereby reducing the TM5/TM6 geometry space from 4D to 2D (Figure 6B,C). The first two principal components (PC1 and PC2), which together account for more than 86% of the variance, were retained for further analysis (Figure 6B,C). The individual biplots of the training and true datasets revealed that the TM5 length, TM5 tilt, and TM6 outward movement are the primary contributors to the principal components (the contributions of each variable to the principal components are depicted as vectors on the plot, where the vertical component of a vector on a given PC illustrates the respective contribution of that variable to the PC) (Figure 6B,C). Specifically, the TM6 outward movement significantly influences PC1, whereas the length and tilt of TM5 play a critical role in PC2 (Figure 6B,C). Moreover, these three vectors are similar in the principal component plots of the training and true datasets, particularly in terms of TM5 tilt and TM6 outward movement (Figure 6B,C). These findings suggest the significance of TM5 length, TM5 tilt, and TM6 outward movement in determining the selectivity of Gs- and Gi/o-protein coupling in class A receptors.
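The dimensionality reduction described above can be sketched with a small, dependency-free PCA. The feature values below are hypothetical and serve only to show the steps (standardization, correlation matrix, leading eigenvectors, projection); power iteration with deflation is used here for brevity and is not necessarily the routine used in the study.

```python
import math

def standardize(rows):
    """Z-score each column (sample standard deviation)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(z):
    """Covariance matrix of already centered data (rows are samples)."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)] for i in range(d)]

def top_eigenpairs(matrix, k, iterations=1000):
    """Leading k eigenpairs of a symmetric matrix via power iteration and deflation."""
    d = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((lam, v))
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs

# Hypothetical rows: (TM5 length, TM5 tilt, TM6 length, TM6 outward movement).
rows = [
    (25, 14, 21, 26), (24, 12, 20, 25), (27, 18, 22, 30), (22, 10, 24, 22),
    (18, 5, 22, 18), (19, 6, 19, 20), (17, 4, 23, 16), (20, 7, 21, 19),
]
z = standardize(rows)
cov = covariance(z)
pairs = top_eigenpairs(cov, k=2)
total_variance = sum(cov[i][i] for i in range(len(cov)))
ratios = [lam / total_variance for lam, _ in pairs]          # explained variance of PC1, PC2
scores = [[sum(r[j] * vec[j] for j in range(len(r))) for _, vec in pairs] for r in z]
print([round(x, 2) for x in ratios])
```

The components of each eigenvector are the variable loadings that a biplot draws as arrows, and `scores` holds the 2D coordinates of each receptor.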
Remarkably, the biplots indicate that, between the two datasets, the TM6 length shows the largest discrepancy: its eigenvector directions are almost opposite in the two biplots, whereas the remaining vectors have similar directions (Figure 6B,C). In the training dataset, the impact of the TM6 length vector on each principal component appears to be minimal and positively correlated with both PCs (Figure 6B,C). In contrast, in the true dataset it exerts a negligible effect on PC1 and displays a negative correlation with PC2 (Figure 6B,C). Subsequent violin plots reveal a symmetry of the distributions of TM5 tilt and TM6 outward movement in the two datasets (Figure 6D). Nevertheless, there is a notable disparity in the distribution pattern of TM6 length between the two datasets (Figure 6D). Further correlation analysis of the four geometries of TM5/TM6 reveals a similar inconsistency. The TM6 length from the training dataset shows a mildly positive correlation of 0.20 with the TM5 tilt (Figure S8, Supporting Information). Conversely, the TM6 length from the true dataset shows a negative correlation of −0.29 with the TM5 tilt (Figure S8, Supporting Information). In conclusion, these results imply that the TM6 length poses a challenge for model training, and they also indicate that TM6 length contributes only marginally to G-protein selectivity.
To derive a more precise conclusion about the underlying mechanism, we trained a classifier for primarily Gs- and Gi/o-coupled receptors through a machine-learning approach directly on the three standardized geometries of TM5/TM6 (TM6 length excluded). We employed a variant of the Random Forest Classifier [33] with constraints on the number of estimators and their maximum depth to decrease the risk of overfitting. Validation of the model using the true dataset achieved an accuracy of 90.74%, an F1-score of 0.8943, an Area Under Curve (AUC) of 0.8162, and a Matthews Correlation Coefficient (MCC) of 0.8086 (Figure 6E; Tables S5, S8, Supporting Information). This further illustrates the effectiveness of these three geometries for selectivity. Interestingly, certain misclassified receptors (PE2R2, PE2R4, V2R) exhibited comparable Gs and Gi/o-protein selectivity, and LSHR had limited evidence to support its primary coupling according to the GPCRdb/GproteinDb. [31] In brief, the predictive model performs well.
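The reported accuracy, F1-score, and MCC can be related to one another through a confusion matrix. The split used below (15 Gs receptors predicted correctly, 5 Gs receptors predicted as Gi/o, and all 34 Gi/o receptors predicted correctly) is not stated in the text; it is an inference that happens to reproduce the three reported values when the F1-score is macro-averaged over the two classes. The AUC requires per-receptor scores and is not reproduced here.

```python
import math

def metrics(tp, fn, fp, tn):
    """Accuracy, macro-averaged F1 and Matthews correlation from confusion counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    f1_positive = 2 * tp / (2 * tp + fp + fn)   # class treated as positive (here Gs)
    f1_negative = 2 * tn / (2 * tn + fn + fp)   # the other class (here Gi/o)
    macro_f1 = (f1_positive + f1_negative) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, macro_f1, mcc

# Assumed confusion counts for the 20 Gs and 34 Gi/o test structures (see lead-in).
accuracy, macro_f1, mcc = metrics(tp=15, fn=5, fp=0, tn=34)
print(f"{accuracy:.4f} {macro_f1:.4f} {mcc:.4f}")  # 0.9074 0.8943 0.8086
```

Under this assumed split, every misclassification is a Gs-coupled receptor assigned to the Gi/o class, which would be in keeping with the misclassified receptors named above.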
In addition, with the aid of PCA visualization, we can depict a probable decision boundary given by our best classifier to derive a simpler formula (Figure 6F). The decision boundary indicates that GPCRs with extended TM5 length, significant TM5 tilt, and prominent TM6 outward movement are likely to be Gs-coupling receptors. In contrast, GPCRs featuring shorter TM5 length, minimal TM5 tilt, and constrained TM6 outward movement are more likely to be Gi/o-coupling receptors (Figure 6F).

TM6 Length is not Significant for Gs/Gi/o Selectivity
Structuromic analysis and ML analysis of GPCR-G-protein complexes suggested that the length of TM6 was not responsible for Gs and Gi/o selectivity. Therefore, further investigation was required to elucidate the mechanisms underlying the differential effects of TM5 and TM6 length in G-protein selectivity. In the GPCR-Gs complex, TM5 extends to insert into the groove formed by the αG-α4 loop, the α4 helix, and the α4-β6 loop of Gs (Figure 7A), leading to increased interactions with Gs. To achieve this, the extended TM5 tilts toward TM6 to fit into the groove in Gs. Failure to accomplish this could result in severe steric hindrance with the αG-α4 loop of Gs, as observed in D1R (Figure 7A), [28d] indicating a correlation between the length of TM5 and its tilt in the Gs-coupling receptors. Consistent with this, biplot analyses of GPCRdb-predicted structures and true structures both reveal a high correlation between the length and tilt of TM5 in Gs-coupling receptors, with coefficients of 0.54 and 0.63, respectively (Figure S8, Supporting Information). However, the groove in Gs is absent in Gi/o due to the absence of 13 residues in the αG-α4 loop (Figure 7B). Consequently, in the GPCR-Gi/o complex, the extended TM5 and TM6 can only rest on the flattened surface of Gi/o, resulting in smaller contact areas and fewer interactions compared to the Gs-coupled structure.
To validate this hypothesis, we investigated whether the extended TM5 in Gs-coupled structures or the extended TM6 in Gi/o-coupled structures contributed equivalently to the increase in the contact area. Among the 54 true GPCR-G-protein structures, the median length of the entire TM5 is 23 residues in the 20 Gs-coupled structures and 18 residues in the 34 Gi/o-coupled structures (Table S5, Supporting Information). The median length of TM6 is 20.5 residues in Gs-coupled receptors and 22 residues in Gi/o-coupled structures. Consequently, we considered TM5 longer than 18 (below 5.69) in Gs-coupled receptors and TM6 longer than 20.5 (below 6.29) in Gi/o-coupled receptors as extended TMs. Interestingly, we found that the additional contact area between the extended TM5 and Gs exhibited a linear correlation with the number of extended residues, ≈40 ± 3.0 Å² per residue (R² = 0.91) (Figure 7C; Figure S9 and Table S9, Supporting Information). For example, the contact area between the 15-residue-extended TM5 of D1R and Gs reached 561 Å² (Figure 7A). In contrast, the increased contact area between TM6 and Gi/o did not show a significant correlation with the length of the extended TM6. For instance, the contact area between the 15-residue-extended TM6 of 5-HT1A and Gi was only 192.9 Å² (Figure 7B).
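As a rough consistency check of the quoted slope, the following sketch compares the per-residue estimate with the D1R value. It uses only numbers stated in the text and assumes, for illustration, a fit through the origin; the intercept of the fit in Figure 7C is not given here.

```python
# Values quoted in the text.
slope = 40.0          # Å^2 of additional contact area per extended TM5 residue
slope_err = 3.0       # Å^2 per residue (reported uncertainty of the slope)
n_extended = 15       # extended residues in TM5 of D1R and in TM6 of 5-HT1A
observed_tm5 = 561.0  # Å^2, D1R TM5 with Gs
observed_tm6 = 192.9  # Å^2, 5-HT1A TM6 with Gi

# ASSUMPTION: zero intercept, so area = slope * number of extended residues.
predicted = slope * n_extended
low = (slope - slope_err) * n_extended
high = (slope + slope_err) * n_extended
within_range = low <= observed_tm5 <= high

# Per-residue contact area actually observed for each extended helix.
tm5_per_residue = observed_tm5 / n_extended
tm6_per_residue = observed_tm6 / n_extended

print(f"predicted TM5 area: {predicted:.0f} Å^2 (range {low:.0f}-{high:.0f})")
print(f"observed per residue: TM5 {tm5_per_residue:.1f}, TM6 {tm6_per_residue:.2f} Å^2")
```

On these numbers, the D1R value falls inside the range implied by the slope, while the extended TM6 of 5-HT1A contributes roughly a third as much contact area per residue.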
Visualization of the cryo-EM density maps of Gi/o-coupled complexes further revealed that the extended TM6 is not significant for G-protein coupling. [25a,29b] Consistently, the removal of the extended TM6 of H3R had a negligible effect on its Gi activity, even when nine residues were removed from the cytosolic extremity of TM6 (Figure 7D,E; Table S10, Supporting Information). In contrast, cAMP accumulation experiments confirmed that the removal of the extended TM5 of H2R significantly reduced its Gs-protein activity (Figure 7D,E; Table S10, Supporting Information). In conclusion, our findings indicate that a long TM5 is crucial for Gs-coupled receptors, whereas the TM6 length is not significant for Gs and Gi/o selectivity.

Discussion
The four histamine receptors serve essential roles in pathophysiological and signaling events, rendering them significant drug targets. In this study, we presented the structures of one H2R-Gs and two H3R-Gi complexes, offering profound insights into the orthosteric pocket of histamine receptors and G-protein selectivity. Through our investigation of the amthamine-bound H2R structure and mutagenesis studies, we revealed the discrepancy in the ligand pocket of H1R and H2R that contributed to H2R selectivity for the agonist amthamine. Notably, our findings uncovered a unique binding pose for endogenous histamine in H3R, deviating from other monoamine receptors, as the amino group interacts with E206 5.46 in H3R instead of the conserved D114 3.32 found in other aminergic receptors. This observation coincided with the behavior of the antipsychotic drug quetiapine, which exhibits high or modest affinities for many aminergic receptors with the exception of H3R and H4R. [34] Importantly, the structural geometry differences observed in TM5/TM6 between H2R-Gs and H3R-Gi structures led us to propose the determinants of primary Gs- and Gi/o-coupling selectivity.
GPCRs predominantly regulate downstream signaling through the engagement and activation of four major G-protein subtypes. [2a] The selective coupling of GPCRs to the G-proteins is crucial for effective signal transduction, especially for those primarily coupled with Gs- and Gi/o-proteins with remarkable coupling efficiency and rapid kinetics. The explosion of GPCR-G-protein structures in recent years has significantly enhanced our understanding of GPCR coupling mechanisms. [1b] However, the selectivity signatures of receptors are more complicated due to the great divergence in the amino acid sequences and structural conformations of GPCRs. In contrast to the 16 members of G proteins, which adopt similar conformations, the intracellular parts of GPCRs are more diverse, particularly in the regions of TM5, TM6, H8, intracellular loops, and the receptor C-terminal tail. By combining reported GPCR-G-protein structures with unpublished structures from our laboratory, we obtained a number of non-redundant Gs-coupled or Gi/o-coupled structures of class A receptors, totaling 20 and 34, respectively. These structures revealed that the selectivity of Gs and Gi/o is determined by TM5/TM6 geometries, including the length and tilt of TM5, and the outward movement of TM6. The length and tilt of TM5 exhibit a high correlation in true GPCR-Gs complexes, consistent with the extended and tilted TM5 (e.g., H2R) inserting into the groove formed by the αG-α4 loop, the α4 helix, and the α4-β6 loop of Gs. However, the αG-α4 loop is 13 residues shorter in Gi/o than in Gs, so Gi/o cannot form the same groove as Gs. Hence, the extended TM6 (e.g., H3R) leads to less contact with Gi in Gi/o-coupled structures. This phenomenon indicates that the interactions of GPCR-Gs complexes are stronger than those of GPCR-Gi/o complexes, which is consistent with Matic et al.'s report that the binding energy of Gs complexes is stronger than that of Gi/o complexes. [13] Furthermore, our "TM5/6-breaking" experiments also confirmed that TM6 length is not significant for Gs and Gi/o selectivity.
The molecular mechanism underlying primary G-protein coupling selectivity in GPCRs remains complex and not fully elucidated. The structures of GPCR-G-protein complexes indicate that the ICL2, ICL3, TM5, and TM6 of receptors are associated with G-protein-coupling selectivity. However, these cytoplasmic regions of receptors rarely exhibit common patterns in terms of the sequence or amino-acid properties related to G-protein selectivity. In our prior study, we proposed that the TM5 and TM6 helices synergistically alternate lengths to determine the selectivity between Gs- and Gi/o-proteins, defined as "a macro-switch". [11] Comparative sequence analyses revealed that specific amino acids function as "micro-switches" uniquely localized in the complementary pocket. However, these two switches only partially determine the selectivity between Gs and Gi/o. Aided by the rapidly increasing number and improved quality of the structures of GPCRs in complexes with different G-proteins, both in-house and in public databases, in the present study we conducted a systematic evaluation of the relationship between the four geometries of TM5/TM6 and their impact on primary G-protein selectivity by using machine learning-based structuromic profiling. Using the GPCRdb homology models as the training dataset, we generated a Gs- or Gi/o-coupling classifier that achieved 91% accuracy in the true dataset based on the TM5/TM6 geometries. Based on our classifier, we find that GPCRs with a long TM5 length and large TM5 tilt and TM6 outward movement are Gs-coupling receptors, whilst GPCRs with a short TM5 length and small TM5 tilt and TM6 outward movement are more likely to be Gi/o-coupling receptors. We anticipate that GPCRs that fail to satisfy the aforementioned conditions probably have comparable or promiscuous activity in stimulating Gs and Gi/o signaling (Figure 7F). Interestingly, class B1 receptors, primarily coupled to Gs-protein, exhibit large TM5 tilt and TM6 outward movement. These structural geometries also conform to the G-protein selectivity features observed in Class A receptors, further demonstrating the universality of our proposed G-protein selectivity mechanism (Table S11, Supporting Information). In summary, our studies revealed that the structural geometry of TM5/TM6 serves as the determinant of the primary G-protein signaling selectivity in GPCRs. These findings have the potential to aid efforts in engineering GPCR selectivity, especially by modifying promiscuous GPCRs to improve affinity to specific G-proteins.

Experimental Section
Constructs: The wild-type human H2R gene was cloned into the pFastBac1 vector. The N-terminus of H2R was fused with the hemagglutinin (HA) signal peptide to enhance receptor expression, followed by a Flag tag (DYKDDDK) to facilitate complex purification. The wild-type human H3R 445 isoform was cloned into a modified pFastBac1 vector with a hemagglutinin (HA) signal sequence at the N-terminus and a PreScission protease site followed by a Flag tag. The C-terminus of H3R was fused with the LgBiT, [22] followed by a TEV protease site and a double MBP tag to facilitate expression and purification. The dominant-negative bovine Gs (DNGs) and human Gi1 (DNGi1) were generated by site-directed mutagenesis, as previously described, to stabilize the interaction with the βγ subunits, [36] and were cloned into the pFastBac1 vector. Human Gβ1 was fused with a 10× His tag at the N-terminus and HiBiT at the C-terminus, and together with Gγ2, was cloned into the pFastBac dual vector.
Cryo-Grid Preparation and EM Data Collection: Three microliters of the purified GPCR-G-protein complexes were applied onto a glow-discharged holey carbon grid (Quantifoil R1.2/1.3) at ≈20 mg mL⁻¹. The grids were plunge-frozen in liquid ethane using Vitrobot Mark IV (Thermo Fisher Scientific). The frozen grids were then transferred to liquid nitrogen and stored for data collection.
Cryo-EM imaging was performed on a Titan Krios at 300 kV using the Gatan K2 Summit detector at the Center of Cryo-Electron Microscopy, Zhejiang University (Hangzhou, China). Movies were recorded for each of the three receptor-G-protein complexes at a dose rate of approximately 8.0 e/Å²/s with a defocus ranging from −1.0 to −2.2 μm using the SerialEM software in counting mode. The total exposure time was 8 s, and 40 frames were recorded per micrograph. A total of 3122, 3008, and 3079 movies were collected for the amthamine-H2R-Gs, histamine-H3R-Gi, and immepip-H3R-Gi complexes, respectively, for structure reconstruction.
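The accumulated dose implied by these settings follows from simple arithmetic. The short sketch below uses only the values stated above and treats the approximate dose rate as exact for illustration.

```python
# Acquisition settings quoted in the text.
dose_rate = 8.0    # electrons per Å^2 per second (stated as approximate)
exposure_s = 8.0   # total exposure time per movie, in seconds
n_frames = 40      # frames recorded per movie

total_dose = dose_rate * exposure_s     # electrons per Å^2 per movie
dose_per_frame = total_dose / n_frames  # electrons per Å^2 per frame
frame_time_s = exposure_s / n_frames    # seconds per frame

print(f"total dose: {total_dose:.0f} e/Å^2")
print(f"per frame: {dose_per_frame:.1f} e/Å^2 over {frame_time_s:.1f} s")
```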
For the amthamine-H2R-Gs complex, automated particle picking yielded 3361930 particles. The particles extracted from the dataset were downscaled two times and subjected to 2D classification. The map of the GPBAR-Gs complex (EMD-30344), [14c] low-pass filtered to 40 Å, was used as an initial reference model for two rounds of 3D classification, resulting in a well-defined subset of 857906 particles. The particles were then re-extracted with the original pixel size and subjected to further 3D classification focusing on the receptor-Gs complex, which produced a good subset accounting for 368447 particles. After 3D refinement, CTF refinement, and Bayesian polishing of the final particles, the final refinement map was generated with a global resolution of 2.7 Å at a Fourier shell correlation of 0.143.
For the histamine-H3R-Gi complex, template-based particle selection produced 2487329 particles. After 2D classification and two rounds of 3D classification using the low-pass filtered map of the 5-HT1D-Gi complex (EMD-30974) [25a] as an initial reference model, a well-defined subset with 742110 particles was selected. Further 3D classifications focusing on the alignment of the receptor-Gi complex produced a good subset accounting for 234203 particles, which were subsequently subjected to 3D refinement, CTF refinement, and Bayesian polishing. The final refinement generated a map with an indicated global resolution of 2.7 Å at a Fourier shell correlation of 0.143.
For the immepip-H3R-Gi complex, template-based particle selection produced 2050983 particles. After 2D classification and two rounds of 3D classification using the low-pass filtered map of the histamine-H3R-Gi complex as an initial reference model, two well-defined subsets with 697323 particles were selected. Further 3D classifications focusing on alignment of the receptor produced a good subset accounting for 330449 particles, which were subsequently subjected to 3D refinement, CTF refinement, and Bayesian polishing. The final refinement generated a map with an indicated global resolution of 3.0 Å at a Fourier shell correlation of 0.143.
Model Building and Refinement: For the H2R-Gs complexes, the initial model of H2R was downloaded from the activated homology models of H2R from GPCRdb. [31] [25a] Agonist and lipid coordinates and geometry restraints were generated using phenix.elbow. Then, the models were docked into the cryo-EM density map using Chimera. After the initial docked models were refined using Rosetta, the models were subjected to iterative rounds of manual adjustment and auto-refinement in Coot and Phenix, respectively. The final refinement scores were validated by the module "comprehensive validation (cryo-EM)" in Phenix. Structure figures were prepared using PyMOL, Chimera, and ChimeraX.
Molecular Dynamics Simulations: First, the histamine-bound H3R model was extracted from the histamine-H3R-Gi complex. Second, the histamine was also positioned in a similar direction to that in the histamine-H1R structure and fitted to the cryo-EM density. The orientations of the receptors were calculated by the Positioning of Proteins in Membranes (PPM) Web Server. [39] Following these steps, the whole systems were prepared using the CHARMM-GUI and embedded in a bilayer consisting of 200 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) lipids using replacement methods. [40] The membrane systems were then solvated into a periodic TIP3P water box supplemented with 0.15 M NaCl. The CHARMM36m Force Field was used to model protein molecules, and the CHARMM General Force Field (CGenFF) was used for the agonist histamine.
Subsequently, the two systems were subjected to minimization for 10 000 steps using the conjugate gradient algorithm. They were then heated and equilibrated at 310.13 K and 1 atm for 200 ps with 10.0 kcal mol⁻¹ Å⁻² harmonic restraints in the NAMD 2.13 software. This was followed by 5 cycles of equilibration for 2 ns each at 310.13 K and 1 atm, during which the harmonic restraints were sequentially reduced to 5.0, 2.5, 1.0, 0.5, and 0.1 kcal mol⁻¹ Å⁻². Production simulations were run at 310.13 K and 1 atm in the NPT ensemble using the Langevin thermostat and Nosé-Hoover method for 250 ns. Electrostatic interactions were calculated using the particle mesh Ewald (PME) method with a cutoff of 12 Å. Throughout the final stages of equilibration and production, 5.0 kcal mol⁻¹ Å⁻² harmonic restraints were placed on the backbone of the receptors due to the flexibility of the second extracellular loop (ICL2) of H3R. Trajectories were visualized and analyzed using Visual Molecular Dynamics (VMD) version 1.9.3.
GloSensor cAMP Assay: The agonist-induced cAMP accumulation was measured using the GloSensor™ cAMP assay kit (Promega). HEK293T cells were co-transfected with a 3:1 ratio of WT or mutant receptor and the pGloSensor™-22F plasmid by using LIPO (YEASON). After a minimum of 6 h, transfected cells were seeded onto cell adherent reagent (Applygen) coated 384-well culture plates and incubated for more than 12 h at 37 °C in 5% CO2. The culture medium was removed and PBS was added to starve the cells. After 20 min, the PBS was removed, and the cells were incubated with the equilibration medium (CO2-Independent Medium (Gibco) with 10% FBS) containing a 4% dilution of the GloSensor™ cAMP reagent stock solution for 30 min at 37 °C and then 10 min at RT. To obtain the dose-response curves, serially diluted agonists were added to each well to stimulate the cells. The luminance signal was then measured using 200 ms intervals (TECAN, 25 °C). Dose-responses were generated from the peak response. cAMP accumulation was analyzed by a standard dose-response curve using GraphPad Prism 8.0 (GraphPad Software). EC50 and pEC50 ± SEM were calculated using nonlinear regression (curve fit). Data are means ± SEM from at least three independent experiments performed in technical triplicates.
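For readers less familiar with dose-response analysis, the following minimal sketch shows the sigmoidal model that is commonly fitted in such analyses and how pEC50 relates to EC50. It is an illustration only: the exact Prism model used is not stated in the text, the four-parameter logistic form is an assumption, and the function names and numerical values are hypothetical.

```python
import math


def dose_response(log_conc, bottom, top, log_ec50, hill=1.0):
    """Four-parameter logistic response at a log10 molar agonist concentration.

    ASSUMED model form; the text only states that a standard dose-response
    curve was fitted by nonlinear regression.
    """
    return bottom + (top - bottom) / (1.0 + 10 ** ((log_ec50 - log_conc) * hill))


def pec50(ec50_molar):
    """pEC50 is the negative base-10 logarithm of the EC50 in mol/L."""
    return -math.log10(ec50_molar)


# Hypothetical example: EC50 of 100 nM with responses normalized to 0-100%.
example_ec50 = 1e-7
log_ec50 = math.log10(example_ec50)
half_maximal = dose_response(log_ec50, bottom=0.0, top=100.0, log_ec50=log_ec50)

print(f"pEC50 = {pec50(example_ec50):.2f}")
print(f"response at EC50 = {half_maximal:.1f}% of maximum")
```

By construction, the response at the EC50 lies midway between the bottom and top plateaus, which is why a rightward shift of the curve for a mutant appears as a lower pEC50.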
NanoLuc Binary Technology (NanoBiT) Assay: For the measurement of G-protein dissociation, HEK293T cells were co-transfected with WT or mutant receptor, Gαi-LgBit, Gβ, and Gγ-SmBit plasmids at a 6:2:5:5 ratio. Following transfection, cells were seeded onto 96-well culture plates and incubated for over 12 h at 37 °C in 5% CO2. Then, the cells were rinsed twice with D-PBS and incubated with 4 nM coelenterazine-400a (Maokangbio) in HBSS supplemented with 5 mM HEPES pH 7.4 and 0.1% BSA for 1 h. The baseline luminance signal was read immediately for 5 cycles (TECAN, 25 °C).
To obtain the dose-response curves, serially diluted agonists were added to each well to stimulate the cells. The luminance signal was then measured for another 15 cycles using 500 ms intervals (TECAN, 25 °C). Dose-responses were generated from the 5th response, and the raw data were standardized by the baseline and the vehicle group. G-protein activation was analyzed by a standard dose-response curve using GraphPad Prism 8.0 (GraphPad Software). EC50 and pEC50 ± SEM were calculated using nonlinear regression (curve fit). Data are means ± SEM from at least three independent experiments performed in technical triplicates.
Enzyme-Linked Immunosorbent Assay (ELISA): To confirm the cell surface expression of H3R and its mutants, the ELISA was performed 24 h after transfection using cells plated on white 96-well plates. The cells were fixed with 4% formaldehyde for 5 min at RT and then washed once with PBS. Following fixation, the cells were blocked with blocking buffer (1% FBS in PBS) for 2 h at RT. Afterwards, the plates were incubated with a 1:10000 dilution of monoclonal ANTI-FLAG M2-Peroxidase (HRP) antibody (Sigma Aldrich) in blocking buffer for another 0.5 h at RT. After careful washing, 80 μL per well of SuperSignal ELISA Femto Maximum Sensitivity Substrate (ThermoFisher Scientific) solution was added. The luminance signal was measured using 500 ms intervals. The data are presented as means ± SEM from at least three independent experiments performed in technical triplicates.
Collections of Activated GPCR Data Sets: For the true GPCR datasets, structures were collected from reported distinct class A GPCR-G-protein complex structures of receptors that are primarily coupled to Gs- or Gi/o-proteins. Several receptors with reported Gs- or Gi/o-coupled structures that are primarily coupled to Gq protein, such as CCK1R, CCK2R, GASR, NK1R, and NTSR1, were excluded. The true datasets were enlarged to 20 Gs- and 34 Gi/o-coupled structures by adding four Gs-coupled complexes and one Gi-coupled complex from our unpublished data.
For the GPCRdb homology model datasets, the primarily Gs- or Gi/o-coupled activated class A GPCR homology models were collected from GPCRdb. However, the accuracy of homology models for orphan class A receptors is poor due to the lack of true homologous structures. After removing the orphan receptors, 98 (24 Gs- and 74 Gi/o-coupled receptors) GPCRdb homology models were used as training datasets.
Extractions of Length and Angle: To conveniently extract the length and angle geometries of receptors, the reference H3R model was positioned on the lipid bilayer using the PPM server. The 7TMs of H3R were perpendicular to the membrane plane. Then, the structures of the true GPCR dataset and the GPCRdb homology model dataset were superimposed onto H3R. The Z-axis of all receptor models was perpendicular to the membrane plane. The lengths of TM5 and TM6 were measured from the cytosolic tip to 5.50 and 6.50, respectively. The angle of TM5 and TM6 was calculated between the orientation of a helix and the reference direction. The orientation of a helix was obtained by averaging the C=O bond orientation of each amino acid on the helix segment. Considering that an alpha helix turn consists of 3.6 amino acids, we selected 11 amino acids (approximately three turns) from the cytoplasmic tip of TM5 or TM6 to calculate the average orientation and eliminate the component perpendicular to the helix axis. The reference orientation for the TM5 angle calculation is the TM5 orientation of H3R, for which the C=O bond orientation is averaged over 22 amino acids instead of 11. As the TM6 angle indicates the TM6 outward movement during receptor activation, the reference orientation for the TM6 angle is the central axis of the receptors, which is the Z-axis that is perpendicular to the membrane plane.
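The orientation averaging described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' script: the function names are hypothetical, and the toy helix (carbonyl bonds leaning 15° away from a Z-aligned axis and rotating 100° per residue) stands in for real backbone coordinates.

```python
import math


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def helix_orientation(co_bonds):
    """Average the unit C->O bond vectors of a helix segment.

    co_bonds is a list of (C_xyz, O_xyz) coordinate pairs, one per residue.
    """
    units = [
        unit(tuple(o - c for c, o in zip(c_xyz, o_xyz)))
        for c_xyz, o_xyz in co_bonds
    ]
    summed = tuple(sum(u[i] for u in units) for i in range(3))
    return unit(summed)


def angle_deg(a, b):
    """Angle between two vectors in degrees."""
    dot = sum(x * y for x, y in zip(unit(a), unit(b)))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def toy_co_bonds(n_res=11, lean_deg=15.0, twist_deg=100.0):
    """HYPOTHETICAL idealized helix along Z: 100 degrees of twist per residue."""
    lean = math.radians(lean_deg)
    bonds = []
    for k in range(n_res):
        phi = math.radians(twist_deg * k)
        c_xyz = (0.0, 0.0, 1.5 * k)
        direction = (
            math.sin(lean) * math.cos(phi),
            math.sin(lean) * math.sin(phi),
            math.cos(lean),
        )
        o_xyz = tuple(ci + 1.23 * di for ci, di in zip(c_xyz, direction))
        bonds.append((c_xyz, o_xyz))
    return bonds


z_axis = (0.0, 0.0, 1.0)  # membrane normal, used as the TM6 reference in the text
axis = helix_orientation(toy_co_bonds())
tilt = angle_deg(axis, z_axis)
print(f"tilt of the averaged orientation from Z: {tilt:.2f} degrees")
```

Because 11 residues span close to three full turns, the components perpendicular to the helix axis nearly cancel in the average, so each individual bond leans 15° while the averaged orientation of the toy helix stays within a fraction of a degree of its true axis.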
Data Pre-Processing and Machine Learning Classification: We utilized the scikit-learn (sklearn) package [41] to perform the data pre-processing and machine learning tasks. First, the TM6 length was excluded for the statistical reason stated previously, and the three-geometries dataset was then standardized, namely, centered and scaled. From this, we constructed a training set of (3,n) dimensions, where n = 98 (Gs: 24, Gi: 74) is the number of data points in the training set.
For the cross-validation, stratified 10-fold validation is used on the GPCRdb set. After that, we train the Random Forest classifier on the whole training set with the following parameters (unspecified parameters set as default): RandomForestClassifier(max_depth = 1, n_estimators = 15, max_features = 1, class_weight = {0:74, 1:24}), in which the bagging strategy is used with a sample ratio of 0.7. Note that the standardization fitted on the training dataset is applied to the true dataset as well. The best model is trained with a random seed of 2784 for both bagging and Random Forest.
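A minimal sketch of this training setup with scikit-learn is shown below. Several points are assumptions rather than details confirmed by the text: the bagging step is implemented here as a BaggingClassifier wrapped around the forest (with its default number of bagged estimators), label 0 is taken to mean Gs and label 1 to mean Gi/o, and the feature arrays are random placeholders standing in for the three measured geometries.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 2784  # random seed quoted in the text

# PLACEHOLDER features (TM5 length, TM5 tilt, TM6 outward movement); not real data.
rng = np.random.default_rng(SEED)
X_train = np.vstack([
    rng.normal(loc=1.0, size=(24, 3)),   # stand-in for 24 Gs homology models
    rng.normal(loc=-1.0, size=(74, 3)),  # stand-in for 74 Gi/o homology models
])
y_train = np.array([0] * 24 + [1] * 74)  # ASSUMPTION: 0 = Gs, 1 = Gi/o
X_true = rng.normal(size=(54, 3))        # stand-in for the 54 true structures

# Shallow forest with the parameters given in the text.
forest = RandomForestClassifier(
    max_depth=1, n_estimators=15, max_features=1,
    class_weight={0: 74, 1: 24}, random_state=SEED,
)
# ASSUMPTION: "bagging with sample ratio 0.7" expressed as a bagging wrapper.
bagged = BaggingClassifier(forest, max_samples=0.7, random_state=SEED)

# The pipeline fits the standardization on training data only and reuses it
# unchanged when predicting on the true dataset.
model = make_pipeline(StandardScaler(), bagged)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=SEED)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv)

model.fit(X_train, y_train)
y_pred = model.predict(X_true)
print(f"mean 10-fold CV accuracy on placeholder data: {cv_scores.mean():.2f}")
```

Placing the scaler inside the pipeline is a design choice that keeps each cross-validation fold from seeing statistics of its held-out samples; the scores printed here describe the placeholder data only and say nothing about the reported model.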
Subsequently, we fit PCA to the training dataset to obtain a PCA transformation for visualization. Principal Component Analysis (PCA) is a linear dimensionality reduction technique that employs Singular Value Decomposition to project the data into a subspace of reduced dimensions. We use the PCA implementation in sklearn.decomposition with the default parameters. The first two principal components are retained, which explain over 87% of the variance of the data.
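The projection step can be sketched as follows, assuming that the PCA is fitted on the standardized training set and then applied unchanged to the true dataset. The arrays are random placeholders, so the explained variance printed here is unrelated to the 87% reported for the real geometries.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# PLACEHOLDER data standing in for the three TM5/TM6 geometries; not real data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(98, 3))  # GPCRdb homology models
X_true = rng.normal(size=(54, 3))   # true structures

# Fit standardization and PCA on the training set only.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=2).fit(scaler.transform(X_train))

# Apply the same fitted transformations to both datasets for plotting.
train_2d = pca.transform(scaler.transform(X_train))
true_2d = pca.transform(scaler.transform(X_true))

explained = pca.explained_variance_ratio_.sum()
print(f"variance explained by PC1 + PC2 (placeholder data): {explained:.2f}")
```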
The resultant principle can be distilled from the decision boundary for approximate categorization. When X > 0.7, it is highly probable that the GPCR will couple with Gs, indicative of an elongated TM5 length, a substantial TM5 tilt, and a significant outward shift in TM6 orientation. Conversely, when X is less than or equal to 0.7, there is an increased likelihood of the GPCR coupling with Gi, which comprises instances that are either Y.
Statistical Analysis: Data are presented as the mean ± SEM values of ≥3 independent experiments. The raw data of the NanoBiT assay were standardized by the baseline and the vehicle group, while the raw data of the cAMP and ELISA assays were standardized by the corresponding WT group. The sigmoidal curves of the reporter luciferase assay were analyzed using a standard dose-response curve in GraphPad Prism version 8.02 (GraphPad Software, San Diego, CA, USA). Data sets with two groups were analyzed by Student's t-tests. Data with ≥3 groups were analyzed by one-way analysis of variance (ANOVA), while the violin plots depicting four distinct features of TM5/TM6 in Gs- and Gi/o-coupled receptors were compared using the Mann-Whitney U test with the scikit-learn Python package. A significance level of p < 0.05 was considered, and the details are provided in the figure legends.

Figure 1. Cryo-EM structures of H2R-Gs and H3R-Gi complexes. A) (Top) Proportion of Gs- or Gi/o-coupled receptors in the non-olfactory GPCRs. (Bottom) Number of receptors primarily coupling with Gs- or Gi/o-proteins. B) The cryo-EM density map (left) and atomic model (right) of the amthamine-bound H2R-Gs complex. The amthamine is depicted as sticks within a transparent EM density map. C) The cryo-EM density map (left) and atomic model (right) of the H3R-Gi complex. The ligands histamine and immepip are depicted as sticks within a transparent EM density map.

Figure 2. Ligand recognition of histamine receptors. A) Detailed interaction of amthamine with H2R. Dashed lines represent hydrogen bonds. B) Amthamine-induced cAMP accumulation in HEK293 cells expressing H2R mutants of the residues in the orthosteric pocket (n = 3, ordinary one-way ANOVA, ****P < 0.0001; ND, not determinable, i.e., could not be established over the tested concentration range; ns, no significant difference between the WT and mutant). C) Structural comparisons of ligand recognition between histamine-H1R and amthamine-H2R structures. D) Sequence alignment of the orthosteric pocket of histamine receptors. Residues that interact with ligands are highlighted with colored circles. E) Dose-dependent curves for amthamine-induced cAMP accumulation in HEK293 cells expressing the H2R mutants (V99 3.33 Y, D186 5.42 T, T190 5.46 N) of residues that are not conserved in H1R (n = 3). F,G) Detailed interactions of histamine (F) and immepip (G) with H3R. H) Agonist-induced Gi dissociation in HEK293 cells expressing H3R mutants of the residues in the orthosteric pocket, measured by NanoBiT assays (n = 3, ordinary one-way ANOVA, *P = 0.0456, ****P < 0.0001; NR, no response to the ligand; ND, not determinable, i.e., could not be established over the tested concentration range; ns, no significant difference between the WT and mutant).

Figure 4. Structural comparison of histamine receptor-G-protein complexes. A) Structural superposition of the active models of H1R, H2R, and H3R. B) Comparison of TM5 and TM6 in H1R-Gq, H2R-Gs, and H3R-Gi complexes. C,D) Surface (left) and cartoon (right) representation of the binding area of H2R (C) and H3R (D) with their respective coupled G subunit. Residues within 4 Å of H2R or H3R in Gs or Gi are highlighted in yellow or orange, respectively.

Figure 5. TM5 and TM6 are responsible for Gs and Gi selectivity. A) Representation of the TM5/TM6 structural geometry of Gs-coupled receptors. B) Representation of the TM5/TM6 structural geometry of Gi-coupled receptors. C) Violin plots depicting four distinct features of TM5/TM6 in Gs- and Gi/o-coupled receptors. The white dot denotes the median. The interquartile range is shown by the broad black bar in the middle. Except for points considered to be "outliers" using an interquartile range-based technique, the thin line reflects the remainder of the distribution. n = 54, ns P > 0.05, *P < 0.05, **P < 0.01 by Mann-Whitney U test. D) Schematic representation of the "TM5/TM6 swap" experiments. E,F) Dose-dependent curves for histamine-induced G-protein dissociation and cAMP accumulation in HEK293 cells expressing chimera receptors (n = 3). E) Replacement of I 5.56 -F 6.44 of H2R with F 5.56 -F 6.44 of H3R did not confer the ability to dissociate Gi-protein (left), but resulted in loss of the ability to activate the Gs signal pathway (right). F) Replacement of P 5.50 -P 6.50 of H3R with P 5.50 -P 6.50 of H2R resulted in gain of the ability to activate the Gs signal pathway (left), but loss of the ability to dissociate Gi-protein (right).

Figure 6. Machine learning-based classification of GPCRs into Gs and Gi signaling pathways. A) Workflow for GPCR classification utilizing machine learning and feature pre-processing. The length and tilt of TM5, as well as the length and outward movement of TM6, are extracted from active GPCR structures obtained from GPCRdb homology models and True structures. The resulting data of these four geometries are standardized, and then subjected to PCA. Subsequently, a Random Forest classifier is employed to classify GPCRs based on the PCA results. B,C) PCA biplots were generated for the four geometries of GPCRdb homology models (B) and True GPCR structures (C) individually. The contributions of each variable to the principal components are depicted as vectors on the plot, where the vertical component of a vector on a given PC illustrates the respective contribution of that variable to the PC. The angle between two vectors reflects the correlation between the corresponding features, while the length of a vector represents the significance of the corresponding feature. D) Comparison of violin plots depicting four geometries of TM5/TM6 in GPCRdb structures and True GPCR structures. Here the median and interquartile range are depicted as dashes. E) Confusion matrix for the True GPCR structures given by the best classifier. F) Decision boundary visualization via PCA given by the best classifier and scatter plot of GPCRdb homology models (left) and True GPCR structures (right) based on GPCRdb homology models.

Figure 7. Contributions to G-protein signaling of extended TM5/TM6. A) (Top) Surface and cartoon representations of Gs are shown, with the surface of Gs depicted as transparent. (Bottom) The binding surface of TM5 of D1R with Gs is shown, with TM5 beyond I 5.69 noted as the extended helix. Residues that interact with the G-protein are highlighted in green. B) (Top) Surface and cartoon representations of Gi are shown, with the surface of Gi depicted as transparent. (Bottom) The binding surface of TM6 of 5-HT1A with Gi/o. TM6 beyond R 6.29 was considered as the extended helix. C) The additional contact areas between the extended TM5/TM6 of Gs- or Gi/o-coupled receptors and Gs/Gi/o, respectively. D) Schematic representation of the TM5- (top) or TM6- (bottom) breaking mutants. E) Dose-dependent curves for histamine-induced cAMP accumulation or G-protein dissociation in HEK293 cells expressing the truncated receptors (n = 3). (Top) The TM5 of H2R was truncated by four and by eight amino acids, respectively. The cAMP assay validated that the length of TM5 was closely related to the activation of the Gs signal pathway. (Bottom) The TM6 of H3R was truncated by eight and by nine amino acids, respectively. The NanoBiT assay validated that the length of TM6 did not correlate with the dissociation of the Gi-protein. F) Schematic representation of the geometry determinants for G-protein selectivity of GPCRs.