The role of leucine and isoleucine in tuning the hydropathy of class A GPCRs

Leucine and isoleucine are two amino acids that differ only by the positioning of one methyl group. This small difference can have important consequences in α‐helices, as the β‐branching of Ile results in helix destabilization. We set out to investigate whether there are general trends for the occurrences of Leu and Ile residues in the structures and sequences of class A GPCRs (G protein‐coupled receptors). GPCRs are integral membrane proteins in which α‐helices span the plasma membrane seven times and which play a crucial role in signal transmission. We found that Leu side chains are generally more exposed at the protein surface than Ile side chains. We explored whether this difference might be attributed to different functions of the two amino acids and tested whether Leu tunes the hydrophobicity of the transmembrane domain based on the Wimley‐White whole‐residue hydrophobicity scales. Leu content decreases the variation in hydropathy between receptors and correlates with the non‐Leu receptor hydropathy. Both measures indicate that hydropathy is tuned by Leu. To test this idea further, we generated protein sequences with random amino acid compositions using a simple numerical model, in which hydropathy was tuned by adjusting the number of Leu residues. The model was able to replicate the observations made with class A GPCR sequences. We speculate that the hydropathy of transmembrane domains of class A GPCRs is tuned by Leu (and to a lesser degree by Lys and Val) to facilitate correct insertion into membranes and/or to stably anchor the receptors within membranes.


KEYWORDS
G protein-coupled receptor, hydrophobicity, lipid, lipophilicity, membrane insertion, membrane protein, transmembrane

| INTRODUCTION
Leucine and isoleucine are two amino acids that are identical except for the position of one methyl group, which is attached to the γ-carbon in Leu and the β-carbon in Ile. This high similarity raises the question of the degree to which nature uses these two amino acids differently in proteins. The best-known difference is that Ile has a lower propensity to occur within α-helices because of steric clashes caused by the β-branching [1,2]. In addition, another study showed differences in how the two amino acids interact with lipids [3]. However, little is known about whether additional general trends distinguish the two amino acids within proteins. We were interested in whether differences exist between these two amino acids within the structures and sequences of class A G protein-coupled receptors (GPCRs).
GPCRs are eukaryotic membrane proteins that possess a transmembrane domain (TMD) consisting of seven plasma-membrane-spanning α-helices (TM1-7). These proteins are receptors that detect a variety of GPCR subtype-specific stimuli, including photons, small organic molecules, and proteins. Absorption of a photon or binding of a molecule leads to conformational rearrangements that activate the receptor. Active GPCRs transmit the received signal further to cellular transducers such as G proteins and β-arrestins, which in turn initiate specific signaling cascades [5,6]. Class A (or rhodopsin-like) GPCRs are the most abundant and diverse receptors and include the most thoroughly studied GPCRs [7]. Since GPCRs span the hydrophobic environment of the plasma membrane, there is no partitioning into hydrophobic core and hydrophilic shell as present in soluble proteins.
[…][10,11] These factors add different restraints on the primary sequence and lead to a general increase in the hydrophobicity of GPCRs and other membrane proteins in comparison to soluble proteins.
One way to determine the overall hydrophobicity of an entire protein or a stretch of an amino acid sequence is by applying the Wimley-White whole-residue hydrophobicity scales [12,13]. These scales are based on the change in free energy (ΔG) for the transfer of amino acids from water to a lipid (POPC) bilayer interface (ΔG_wif) and from water to octanol (ΔG_woct). Negative values for either indicate that an amino acid is hydrophobic in the sense that residing in water is energetically unfavorable for it. The difference between both values (ΔG_woct − ΔG_wif) captures the change in free energy for the insertion into a membrane. Negative values indicate that an amino acid favors the aliphatic environment of octanol over the membrane interface and thus favors the insertion into a membrane. For conciseness, we refer to the difference between the octanol and interface scales (ΔG_woct − ΔG_wif) as hydropathy. Amino acid sequence stretches with negative hydropathy typically indicate transmembrane elements. This is used to predict membrane-spanning elements within membrane protein sequences based on hydropathy plots [12].
We found differences between Leu and Ile within class A GPCR structures with respect to packing density and protein-surface exposure of the side chains. Leu residues are more commonly found at the receptor surface and in less densely packed areas of the receptor. We explored the idea that Leu adopts a role in tuning TMD hydropathy and thus shows differences in these structural properties compared to Ile. Leu appears in specific patterns within the amino acid compositions of these GPCR TMDs that would match this putative role. We further assessed the extent to which the observed patterns could be expected based on a simple numerical model for amino acid frequencies within TMDs.

| METHODS
A total of 214 GPCR structures were downloaded from the GPCRdb [14], selecting all TMD helices, helix 8, and the loops between these elements. Of these structures, 119 were annotated as active and 95 as inactive. Structures of active-state GPCRs are typically determined in the agonist- and G protein-bound form, whereas inactive-state structures are determined in complex with antagonists or inverse agonists. Protons were added using PyMOL v2.4.2 [15]. PDB identification codes of the structures are listed in Table S1 in the supplementary information (SI). Packing densities were calculated by counting the number of atoms within a 5 Å radius around the δ-methyl carbons of Leu and Ile and by dividing this count by the spherical volume (523.6 Å³). Atoms that belong to the same residue as the probed methyl group were not included in the calculation. Side chains are thus referred to as densely packed when a high number of protein atoms is present in the vicinity of the δ-methyl carbons. We used the δ-methyl carbons to quantify the packing densities at the maximum extension of the side chains. Both Leu δ-methyl carbons were included in the calculation of average packing densities. The exposure of side chains at the protein surface was quantified by determining the solvent-accessible surface area (SASA) using GETAREA [16] with default settings, including a water probe radius of 1.4 Å. The relative area of side chains at the protein surfaces was calculated by dividing the SASA by the mean SASA values obtained for side chains of the free amino acids (174.2 Å² for Ile and 174.0 Å² for Leu). Therefore, protein-surface exposure indicates either the absolute area accessible to the water probe (in Å²) or the percentage of the side-chain surface that is accessible to it.
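The two structural measures described above can be sketched in a few lines of Python. This is only an illustration of the stated definitions, not the analysis code used in this work (which was written in R): the function names, the `(residue_id, (x, y, z))` atom layout, and the choice to count atoms lying exactly on the 5 Å boundary are assumptions.

```python
import math

RADIUS = 5.0  # Å, sphere radius around the probed δ-methyl carbon (from the text)
SPHERE_VOLUME = 4.0 / 3.0 * math.pi * RADIUS ** 3  # about 523.6 Å^3

# Mean side-chain SASA of the free amino acids in Å^2, as quoted in the text
FREE_SASA = {"ILE": 174.2, "LEU": 174.0}


def packing_density(probe, atoms, probe_residue_id):
    """Atoms per Å^3 within RADIUS of `probe`, skipping the probe's own residue.

    `probe` is an (x, y, z) tuple and `atoms` an iterable of
    (residue_id, (x, y, z)) pairs; both layouts are assumed for this sketch.
    """
    count = sum(
        1
        for residue_id, xyz in atoms
        if residue_id != probe_residue_id and math.dist(probe, xyz) <= RADIUS
    )
    return count / SPHERE_VOLUME


def relative_exposure(sasa, residue_name):
    """Side-chain SASA as a percentage of the free-amino-acid reference value."""
    return 100.0 * sasa / FREE_SASA[residue_name]
```

For a Leu residue, the function would be called once per δ-methyl carbon and the two densities averaged, in line with the statement that both Leu δ-methyl carbons were included.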
Class A GPCR sequences (1580 in total from 325 targets, i.e., receptors) were downloaded from GPCRdb, selecting only sequences of TMD helices. Hydropathies were calculated based on the differences of Wimley-White whole-residue hydrophobicity scales for the transfer of an amino acid from water to octanol and from water to a lipid bilayer interface (ΔG_woct − ΔG_wif) [12,13]. The sum of all amino acid hydropathies was taken as the hydropathy of the TMDs. Only the hydropathies for protonation states of amino acids at pH 7 were considered. Spearman's rank correlation coefficients (ρ) were calculated based on amino acid content, that is, the number of residues of an amino acid within the TMD divided by the sequence length of the TMD, unless mentioned otherwise. All calculations, statistical analyses, and plots were done using R v4.3.0 [17] with RStudio 2023.03.1+446 [18] and the packages bio3d [19] and stringr [20].

| RESULTS

| Leu & Ile in class A GPCRs
We compared Leu and Ile residues based on how densely packed their side chains are and how strongly these side chains are exposed on the protein surface. Since GPCRs are membrane proteins, protein-surface exposure captures contact with the lipid bilayer or the solvent, depending on the location of the side chain. Further, side-chain packing density and protein-surface exposure measure overlapping properties of residues within protein structures, that is, a high level of protein-surface exposure will lead to a small packing density for a given residue. A sample of 214 experimental structures from 93 unique GPCRs indicates that Ile tends to occur in more densely packed regions than Leu (Figure 1A) and that Leu tends to be more protein-surface exposed than Ile (Figure 1B). This is true for the majority of receptor structures, as 88.3% of them display more densely packed Ile residues and 81.3% display more protein-surface exposed Leu residues. This general difference between the two amino acids suggests that Leu and Ile residues tend to occur within different structural contexts within class A GPCRs, with Leu likely being more prone to interact with the lipids in the membrane bilayer.
Interestingly, significant differences in side-chain packing between active- and inactive-state GPCRs are present for Leu and Ile (Figure 1C). For both residues, packing density is smaller in active- than in inactive-state structures (Ile: −8.5%, Leu: −4.7%). This is further accompanied by a less pronounced and statistically non-significant increase in protein-surface exposure of the two amino acids (Ile: +4.6%, Leu: +2.6%; Figure S1). Similar trends are present for other amino acids as well (Figure S2). GPCR activation leads to conformational changes that allow G proteins to bind. These conformational changes include the outward movement of TM6 and the subsequent opening of a cytosolic crevice that accommodates the C-terminal helix of a G protein [7,21]. Active-state GPCR structures are generally determined in the presence of G proteins or other intracellular binding partners [22,23], which were not included in the calculation of packing densities. It is therefore likely that the decrease in packing density upon activation reflects the opening-up of the G protein-binding pocket. This further matches the more pronounced decrease in packing density for Ile than for Leu, since this conformational change can be expected to have a stronger impact on more buried residues.

FIGURE 1 (continued)
[…] based on Spearman's rank correlation coefficient (ρ). Spearman's ρ ranges from −1 (perfect negative correlation) to +1 (perfect positive correlation), while 0 indicates the absence of any correlation. Negative correlations show that TMD hydropathy decreases (i.e., TMD is more hydrophobic) when the amino acid content increases. Thus, the higher the Ile content, the more hydrophobic the TMD. (F) Correlations between amino acid content (of the indicated amino acid) and TMD hydropathy calculated without the indicated amino acid («Hy(−AA)»). Positive correlations show that TMD hydropathy calculated without the indicated amino acid increases (i.e., TMD is more hydrophilic) when the amino acid content increases. Thus, the higher the Leu content, the more hydrophilic the non-Leu content of the TMD. Only correlations for the five most hydrophobic amino acids are shown in E and F. Correlations for all amino acids are shown in Figure S3.

Several hypotheses, which are not necessarily mutually exclusive, can be formulated to explain the observed differences in packing densities and protein-surface exposure between Leu and Ile side chains in class A GPCRs. For example, the destabilizing effects due to the β-branching of Ile might require additional structural restraints for adequately accommodating Ile within α-helices, preventing it from being more protein-surface exposed. Another underlying rationale could be that Leu might form better interactions with lipids than Ile and thus occurs more often on the protein surface. The hypothesis that intrigued us the most was that Leu is more protein-surface exposed because it adjusts the hydropathy of class A GPCRs for optimal insertion into membranes and/or for anchoring the receptors within membranes. If this is indeed the case, then we expect that, for example, a GPCR with many polar residues in the TMD should have an increased number of Leu residues to compensate for the excess hydrophilicity. We refer to this compensation as hydropathy tuning by Leu, as it likely indicates that the Leu content complements the hydropathy of non-Leu residues. This makes hydropathy tuning an evolutionary process that might be detected by comparing TMDs that share a common ancestor, as is the case for class A GPCRs.
We hypothesized that hydropathy tuning by Leu, if present in class A GPCRs, could be detected through two patterns in the way this amino acid occurs in the overall amino acid composition of the TMDs. The first pattern is based on the spread of values of a property within a population: A property that needs to adopt a defined optimal value will display little variation between different members of the population. If mainly one factor optimizes the value of such a property, then the removal of that factor will lead to a larger variation in the resulting values, since they are no longer optimized. Hence, if Leu is responsible for optimizing TMD hydropathy, then hydropathies calculated without Leu (i.e., the hydropathy of TMDs calculated using all non-Leu residues) should show a larger variation than when calculated including Leu. This was indeed the case for the sequences of 1580 class A GPCR TMDs (Figure 1D). Among all amino acids, Leu displays the strongest impact on TMD hydropathy variation.
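The variance comparison behind this first pattern can be sketched as follows. The code is a minimal Python illustration rather than the R code used for the analysis. The per-residue values are ΔG_woct − ΔG_wif differences transcribed here as an assumption and should be checked against the published scales [12,13]; the Ile and Leu entries agree with the values quoted in this text. The function names and the toy sequences are hypothetical.

```python
from statistics import variance

# Hydropathy per residue, ΔG_woct − ΔG_wif in kcal/mol (pH 7 protonation states).
# ASSUMPTION: transcribed for illustration; verify against the published scales.
HYDROPATHY = {
    "A": 0.33, "R": 1.00, "N": 0.43, "D": 2.41, "C": 0.22,
    "Q": 0.19, "E": 1.61, "G": 1.14, "H": -0.06, "I": -0.81,
    "L": -0.69, "K": 1.81, "M": -0.44, "F": -0.58, "P": -0.31,
    "S": 0.33, "T": 0.11, "W": -0.24, "Y": 0.23, "V": -0.53,
}


def tmd_hydropathy(seq, without=None):
    """Sum of residue hydropathies, optionally leaving out one amino acid."""
    return sum(HYDROPATHY[aa] for aa in seq if aa != without)


def relative_variance(seqs, aa):
    """Var(Hy(-aa)) / Var(Hy(all)); values above 1 match the tuning pattern."""
    full = [tmd_hydropathy(s) for s in seqs]
    reduced = [tmd_hydropathy(s, without=aa) for s in seqs]
    return variance(reduced) / variance(full)


# Toy sequences in which the number of Leu residues compensates for Lys
toy = ["KK" + "L" * 5, "K" + "L" * 2, "KKK" + "L" * 8]
print(relative_variance(toy, "L") > 1)
```

In the toy set, the hydropathy without Leu varies widely while the full hydropathy stays close to zero, which is the situation the relative variance is meant to detect.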
The second expected pattern is related to correlations between Leu content and TMD hydropathy as calculated using all residues and all non-Leu residues. If Leu tunes hydropathy, then a positive correlation between Leu content and TMD hydropathy of all non-Leu residues is expected because more Leu residues are required to compensate for a more hydrophilic sequence, that is, the more hydrophilic the TMD of a GPCR is (without Leu), the more Leu residues are required to make this TMD sufficiently hydrophobic (Figure 1F). This, however, means that the Leu content and the overall TMD hydropathies should be mostly uncorrelated, that is, the TMD of a GPCR does not become more hydrophobic the more Leu it contains (Figure 1E).
Overall, the correlations between Leu content and TMD hydropathy in the sample of class A GPCR sequences match these predictions (Figure 1E,F).
Interestingly, the patterns that are observed for Leu are absent or weaker for Ile, suggesting that Ile is not (or much less) involved in tuning the hydropathy of the TMDs. This is surprising insofar as both amino acids share similar side-chain structures and hydropathies, with Ile (−0.81 kcal/mol) being even slightly more hydrophobic than Leu (−0.69 kcal/mol). In addition, Lys and Val show patterns similar to, albeit mostly weaker than, those found for Leu (Figure 1D-F, S3), suggesting that they might be involved in hydropathy tuning as well, but to a smaller extent than Leu. The effects of these two amino acids are shown in more detail at the end of the results section.
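The two correlations discussed above (amino acid content against «Hy(all)» and against «Hy(−AA)») can be sketched with a small, dependency-free Spearman implementation. This is an illustrative Python sketch, not the R code used in this work; the function names, the `scale` argument (a mapping from residue letter to hydropathy), and the tie handling by average ranks are assumptions.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def tuning_correlations(seqs, aa, scale):
    """Return (rho with Hy(all), rho with Hy(-aa)) for the content of `aa`."""
    content = [s.count(aa) / len(s) for s in seqs]
    hy_all = [sum(scale[r] for r in s) for s in seqs]
    hy_without = [sum(scale[r] for r in s if r != aa) for s in seqs]
    return spearman(content, hy_all), spearman(content, hy_without)
```

For a tuning amino acid, the second returned value is expected to be clearly positive, while the first should be close to zero.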
The patterns for Leu and Ile are also present in the TMD sequences of the analyzed structures, which represent a sub-sample (6% of the TMDs) of the set of 1580 class A GPCR TMD sequences (Figure S4). Interestingly, the effects for Val are more pronounced in this sub-sample, which might either be due to chance, given the small sequence sample, or suggest that Val is important for hydropathy tuning in a subset of receptors that are better represented by the structure sample. The structural information for these TMD sequences allows residues that are buried within the TMD to be distinguished from residues that are exposed on the protein surface and thus more likely to interact with lipid tails (Figures S5 and S6). This division reveals that the patterns indicative of hydropathy tuning by Leu are present for protein-surface exposed residues and almost absent for buried residues. This matches the observation that Leu residues are more protein-surface exposed than Ile and thus are likely more involved in protein-lipid interactions. Interestingly, the effects observed for Val mostly disappear with the division of residues into buried and protein-surface exposed, suggesting that the hydropathy tuning by Val, if present, is considerably weaker than the tuning by Leu (Figures S5 and S6). However, such a strict division into buried and protein-surface exposed residues might overlook potential effects from protein-lipid interactions of transient folding intermediates.
Taken together, Leu appears to be the most important amino acid for hydropathy tuning in class A GPCR TMDs. However, this is likely not the only function of this amino acid. To quantify the extent to which the above-described effects are present when only a part of all Leu residues is involved in hydropathy tuning, we performed numerical simulations based on a simplified model for amino acid compositions.

| A numerical model for hydropathy tuning
The hydropathy of Ile was assigned to A (h_A = −0.81 kcal/mol), and that of Leu was assigned to B (h_B = −0.69 kcal/mol). C and D, and their counts «c» and «d», were modeled to reflect all other amino acids with hydropathies smaller than zero and amino acids with hydropathies larger than zero, respectively. C and D are thus generic amino acids that represent the averages of all hydrophobic (except Ile and Leu) and all hydrophilic amino acids, respectively. For C and D, the average hydropathies of the amino acids they represent were used (h_C: −0.36 kcal/mol, h_D: 0.8175 kcal/mol). Amino acid compositions for simulated sequences were created by generating Gaussian distributed numbers for «a» to «d» based on the amino acid occurrences in the class A GPCR TMD sequences (A: 8.8% ± 3.0% (SD), B: 15.2% ± 3.4%, C: 28.0% ± 3.0%, D: 48.0% ± 2.8%). The generated random numbers «a» to «d» were then multiplied by 220 and rounded to the nearest integer to obtain sequence lengths that are comparable to the lengths of the GPCR TMD sequences. To test for statistical features, 1500 sequences were generated in every run, and 10,000 runs were performed. Driver residues were introduced to drive hydropathies toward a defined optimum value h_opt, which was set to −1.5 kcal/mol to resemble the mean hydropathy of the class A GPCR TMD sequences (−1.47 kcal/mol). With B as the driver, «a», «c», and «d» were randomly determined by a Gaussian distribution as described above. Then «b» was determined as shown by the equation below, with g(B) being a random, Gaussian-distributed number and h_B being the hydropathy of B. The first term calculates the difference between the optimal and the already present hydropathy and divides it by the hydropathy of B, yielding the value of «b» needed to reach the optimal hydropathy. A defined degree of noise was introduced using f_drive, which determines the amount of drive towards the optimum value h_opt, with the rest (1 − f_drive) being determined randomly by the Gaussian distribution g(B). The value of f_drive used was 0.25, which, however, does not mean that 25% of the final number «b» drives the hydropathy towards the desired value, since this fraction additionally depends on the value of h_opt. Interestingly, the variances and correlations were essentially identical between runs with different values for h_opt, indicating that the actual value of h_opt is not important for observing the effects of tuning towards it.
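Written out from the verbal description above, the rule for «b» can be sketched as follows. This is a reconstruction under assumptions, not necessarily the authors' exact notation: it assumes that the "already present hydropathy" is the sum over A, C, and D, and that g(B) is expressed on the count scale (i.e., after multiplication by 220).

```latex
b = f_{\mathrm{drive}} \cdot
    \frac{h_{\mathrm{opt}} - \left( a\,h_A + c\,h_C + d\,h_D \right)}{h_B}
  + \left( 1 - f_{\mathrm{drive}} \right) \cdot g(B)

% Consequence for the total hydropathy H = a h_A + b h_B + c h_C + d h_D:
H = \left( 1 - f_{\mathrm{drive}} \right)
    \left( a\,h_A + g(B)\,h_B + c\,h_C + d\,h_D \right)
  + f_{\mathrm{drive}} \cdot h_{\mathrm{opt}}
```

Under this reading, the tuned hydropathy is a weighted mixture of the untuned hydropathy and h_opt. With f_drive = 0.25 and h_opt = −1.5 kcal/mol, an untuned mean of 25.4 kcal/mol gives 0.75 × 25.4 + 0.25 × (−1.5) ≈ 18.7 kcal/mol, which agrees with the means reported for the two models.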
Two different models were tested (Figure 2). In the first model, all amino acids were modeled independently of the resulting hydropathies by generating «a» to «d» based on Gaussian distributions alone. This simulates a case in which TMD hydropathy is not optimized and resulted in a mean hydropathy of 25.4 kcal/mol (Figure 2A-C). In the second model, «a», «c», and «d» were generated based on Gaussian distributions, resulting in a mean hydropathy of 48.5 kcal/mol before B is added. Subsequently, «b» was chosen based on the equation shown above to drive the mean towards h_opt, resulting in a mean hydropathy of 18.7 kcal/mol. This simulates the case in which Leu would be the driving force for tuning the hydropathy of the TMDs (Figure 2D-F).
The results confirm the anticipated effects for hydropathy tuning: The variation between hydropathies is larger when calculated without the tuning amino acid than with all amino acids (Figure 2D), and a positive correlation exists between the content of the tuning amino acid and the hydropathy calculated without this amino acid (Figure 2F).
Further, the simulation shows the degree to which the effects are present when only a fraction of the tuning amino acid is driving the hydropathy toward the optimum value. The f_drive of 0.25 leads to patterns similar to those observed within the sequences of class A GPCR TMDs, supporting the idea that Leu is responsible for tuning the hydropathy of these TMDs. Interestingly, the numerical model also captures the overall patterns of Ile within the TMD sequences. In the simulation, «amino acid» A was modeled to correspond to Ile, and its content was determined by a Gaussian distribution alone, suggesting that Ile is not necessarily involved in tuning TMD hydropathy in class A GPCRs despite showing a moderate positive correlation with non-Ile hydropathy (Figures 1F and 2F).
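The two models can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the authors' R implementation: the drive rule is the form reconstructed from the prose (a fraction f_drive of «b» closes the gap to h_opt, the remainder is Gaussian), g(B) is taken on the count scale, negative counts are clamped to zero, and the function names and seed handling are hypothetical.

```python
import random
from statistics import mean, variance

# Values quoted in the text: hydropathies (kcal/mol) and content mean/SD
H = {"A": -0.81, "B": -0.69, "C": -0.36, "D": 0.8175}
FRAC = {"A": (0.088, 0.030), "B": (0.152, 0.034),
        "C": (0.280, 0.030), "D": (0.480, 0.028)}
LENGTH = 220      # scaling factor for the Gaussian fractions
H_OPT = -1.5      # optimum hydropathy, kcal/mol
F_DRIVE = 0.25    # fraction of drive towards H_OPT


def draw(rng, aa):
    """Gaussian-distributed count for one generic amino acid (unrounded)."""
    mu, sd = FRAC[aa]
    return LENGTH * rng.gauss(mu, sd)


def simulate(n, tuned, seed=0):
    """Return (Hy(all), Hy(-B)) lists for n simulated compositions."""
    rng = random.Random(seed)
    hy_all, hy_without_b = [], []
    for _ in range(n):
        # ASSUMPTION: negative draws are clamped to zero counts
        counts = {aa: max(0, round(draw(rng, aa))) for aa in "ACD"}
        rest = sum(counts[aa] * H[aa] for aa in "ACD")
        b = draw(rng, "B")
        if tuned:
            # ASSUMED form of the drive rule, reconstructed from the prose
            b = F_DRIVE * (H_OPT - rest) / H["B"] + (1 - F_DRIVE) * b
        b = max(0, round(b))
        hy_without_b.append(rest)
        hy_all.append(rest + b * H["B"])
    return hy_all, hy_without_b


for tuned in (False, True):
    hy_all, hy_no_b = simulate(1500, tuned)
    ratio = variance(hy_no_b) / variance(hy_all)
    print(tuned, round(mean(hy_all), 1), round(ratio, 2))
```

With these parameters, the expected mean hydropathies are close to the reported 25.4 kcal/mol (untuned) and 18.7 kcal/mol (tuned), and the variance ratio Var(Hy(−B)) / Var(Hy(all)) is expected to lie below 1 without tuning and above 1 with tuning. A single run of 1500 sequences is shown; repeating it with different seeds would mimic the multiple runs described above.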

| The role of other amino acids in hydropathy tuning
Other amino acids (foremost Lys and Val) display patterns similar to, albeit mostly weaker than, those found for Leu in variances (Figure 1D) and correlations (Figure S3) and thus appear to tune TMD hydropathy to some extent as well. Since tuning results in a reduction of hydropathy variation between receptors, the absolute impact of each amino acid can be measured using the change in hydropathy standard deviation (Figure 3A). This is similar to the approach used above (Figure 1D, 2A,D), where the variance instead of the standard deviation was used to measure the change in hydropathy variation. The larger the reduction in standard deviation, the stronger the contribution to hydropathy tuning. This indicates that Leu contributes the most (1.16 kcal/mol), followed by Lys (0.69 kcal/mol), Val (0.35 kcal/mol), and Asn (0.21 kcal/mol). Lys in particular therefore contributes considerably to hydropathy tuning besides Leu. This is further confirmed by similarly strong correlations between the numbers of Leu and Lys residues and the TMD hydropathies calculated using all non-Leu and all non-Lys residues, respectively (Figure 3B,C). Lys has a smaller impact on the standard deviation than Leu despite similar correlations because Lys is less abundant in the amino acid composition of TMDs (mean number per TMD of Lys: 5.9, of Leu: 35.3).
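The standard-deviation measure described above can be sketched as follows. This Python snippet is an illustration only: it assumes that the contribution of an amino acid is SD(Hy(−AA)) − SD(Hy(all)), so that positive values indicate a reduction in variation when the amino acid is included. The function name, the `scale` argument, and the toy data are hypothetical.

```python
from statistics import stdev


def tuning_contributions(seqs, scale):
    """Reduction in hydropathy SD (kcal/mol) attributed to each amino acid.

    ASSUMED definition: SD(Hy without aa) minus SD(Hy of the full sequence).
    `scale` maps one-letter residue codes to hydropathies.
    """
    hy_all = [sum(scale[r] for r in s) for s in seqs]
    contributions = {}
    for aa in scale:
        hy_without = [sum(scale[r] for r in s if r != aa) for s in seqs]
        contributions[aa] = stdev(hy_without) - stdev(hy_all)
    # Largest reduction first, mirroring a ranking of tuners
    return dict(sorted(contributions.items(), key=lambda kv: kv[1], reverse=True))


# Toy example with two residue types only
toy_scale = {"L": -0.69, "K": 1.81}
toy = ["KK" + "L" * 5, "K" + "L" * 2, "KKK" + "L" * 8]
print(tuning_contributions(toy, toy_scale))
```

Because the measure is an absolute difference in kcal/mol, abundant residues such as Leu can show a larger contribution than rarer residues with similar correlations, as noted for Lys.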
Class A GPCRs are divided into subclasses based on functional similarities. This subdivision indicates that while Leu remains the most pronounced tuner overall, hydropathy tuning becomes more complex within subclasses (Figure 3D-G, Figure S7). Leu is a very strong tuner in peptide-, protein-, and lipid-binding receptors but only a moderate tuner in aminergic receptors. Interestingly, aminergic receptors are the only subclass in which Val contributes significantly to hydropathy tuning, suggesting that Val assumes the role of hydropathy tuner to a large extent in this subclass. Aminergic receptors are also overrepresented in the structure sample analyzed here (28.5% of the 214 structures compared to 15.0% of the 1580 TMD sequences), which might explain why Val appears to be a considerable tuner when the TMD sequences of the structures are analyzed (Figure S4).
Two other interesting class A GPCR subclasses are the sensory and the lipid receptors. In sensory GPCRs, hydropathy tuning appears to be provided by multiple amino acids, including Leu, Lys, Ile, and Glu (Figure S7), which makes this subclass the only instance of hydropathy tuning by Ile. Even though all class A GPCRs share a common ancestor, TMD sequences show an average pairwise identity of only 22.8%. Sensory receptor TMDs show the highest average pairwise identity (50.7%), which could suggest that different amino acids tune hydropathies at different phylogenetic distances. In lipid receptors, Leu and Lys are both very strong hydropathy tuners, which seems to counteract the strong negative influences on the hydropathy variance by Ile (Figure 3G) and Gly (Figure S7). Notably, lipid receptors are the subclass with the fewest Ile and the most Leu residues on average per receptor.
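An average pairwise identity such as the values quoted above can be computed along the following lines. The text does not state how identities were calculated, so this Python sketch rests on assumptions: sequences are taken to be pre-aligned to equal length, identity is the fraction of matching positions over the full alignment length, and gap handling is left out. The function names are hypothetical.

```python
from itertools import combinations


def identity(a, b):
    """Percentage of identical positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_identity(seqs):
    """Average identity over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(identity(a, b) for a, b in pairs) / len(pairs)
```

Applied per subclass, such a function would give one value per group of TMD sequences, which can then be compared with the value for all class A TMDs.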

It is unclear how generalizable the observations made on class A
GPCRs are.Outside of class A, only class B2 GPCRs show similar patterns as present in class A, whereas, for example, class C GPCRs hydropathy appears to be tuned exclusively by Ile (Figure S8).A preliminary analysis of other membrane proteins suggests that for some groups of proteins, such as microbial rhodopsins and potassium channels, it is also Leu that contributes most to hydropathy tuning (Figure S9).However, in secY translocons, Lys contributes most to A simple numerical model was able to reproduce the overall magnitudes of the two patterns when the number of Leu residues was adjusted to drive the hydropathy toward an optimal value.Overall, our observations suggest that the hydropathy of class A GPCR TMDs is foremost tuned by Leu, with additional contributions especially from Lys and to some degree from Val, followed by other amino acids.
Since hydropathy is a measure for the energetics of membrane insertion, the appropriate content of these amino acids appears to ensure that class A GPCRs are inserted into membranes and/or are stably anchored within them.Therefore, we present a potential functional difference between Leu and Ile despite their almost interchangeable side-chain architecture.The average number of residues per TMD in each subclass is given at the bottom of each plot with the standard deviation in parentheses (e.g., there are 32 ± 6 (SD) Leu residues per TMD in aminergic receptors).The same plots for all amino acids are shown in Figure S7.
Leu content and protein hydrophobicity have previously been linked in proteins of thermophiles.In thermophilic organisms, an increased hydrophobicity in the protein core improves thermostability, which keeps these proteins functional at elevated temperatures. 24The comparison between 110 pairs of homologous proteins from thermophilic and mesophilic organisms indicated that the Leu content is significantly higher in thermophilic proteins and accounts for a significant change in the aliphatic index. 25The aliphatic index quantifies hydrophobicity based on the Ala, Val, Ile, and Leu content of a protein. 26Interestingly, the authors of that study used the correlation between the aliphatic index and Leu content to question the validity of the aliphatic index, whereas we would interpret it in a way that the increase in Leu content is the reason for the increased hydrophobicity of these proteins.
One underlying rationale why Leu but not Ile preferentially tunes hydropathy could be that a mutation of any α-helical residue to Leu is less destabilizing than a mutation to the β-branched Ile.Therefore, if an increased protein hydrophobicity is beneficial, then a mutation to Leu might be preserved more commonly than a mutation to Ile, despite their comparable hydrophobicity.In the case of GPCRs, such a stability-driven effect could be further amplified due to the lower intrinsic stability of GPCRs compared to other proteins. 27However, it is unclear to what degree such an effect exists in a membrane environment since α-helix destabilization by β-branching appears absent within membranes, at least for single-span α-helices. 28other possibility is that Leu provides better interactions with lipids than Ile and is therefore responsible for hydropathy tuning.Due to the difference in branching position, surface-exposed Leu residues possess higher hydrophobic side chain densities remote to the helix backbone thereby providing more surface for forming interactions with lipid tails compared to Ile. 3 Thus, Leu provides increased proteinlipid interactions, which likely becomes advantageous for proteins within less densely packed, fluid membranes, as has been shown by Deber and Stone using a peptide model system. 3Differences in how the two amino acids interact with lipids suggest that membrane properties (such as lipid composition) might influence the degree to which Leu or Ile (or potentially other amino acids) tune TMD hydropathy.
While Leu displayed the strongest impact on hydropathy, other amino acids, foremost Lys, also contribute to some extent.Lys are known to be important for ensuring correct membrane topology and are predominantly found together with Arg at cytosolic sites in transmembrane proteins as described by the positive-inside rule. 29,30Lys and Arg are among the most surface-exposed amino acids within class A GPCRs, indicating that they provide interactions between TMDs and the environment (Figure S2).However, while Lys and Arg occur in similar average numbers per receptor (Arg: 8.3, Lys: 5.9), Arg never shows patterns that would suggest its involvement in hydropathy tuning.Interestingly, there are relatively strong positive correlations between the number of Lys and Ile residues (Spearman's ρ = 0.51) and between the number of Arg and Leu residues (Spearman's ρ = 0.36; Figure S10).Combined with the roles of Leu and Lys in hydropathy tuning, these correlations suggest that the number of Lys varies to complement the number of Ile residues, whereas the number of Leu varies to complement the number of Arg residues.
The whole system becomes even more complex as soon as all correlations between different pairs of amino acids are considered, indicating that there is likely more to the overarching theme of Leu as the major hydropathy tuner (Figure S10).Further, even though we assumed that Leu content is adjusted to match the non-Leu hydropathy (as tested by the numeric simulation), we cannot entirely exclude that the causal link of the correlation is different.It might be possible that the patterns could also be produced when the hydropathy of all non-Leu residues complements the Leu content.Whereas the reason for this relation is not obvious for Leu, it might be more intuitive for Lys, as a higher number of Lys might be better at ensuring proper membrane topology.Thus, Lys content would not tune TMD hydropathy, but rather TMD hydropathy would adapt to the Lys content in this scenario.However, such arguments raise the question why similar patterns as present for Lys are absent for Arg.To further support the hypothesis that hydropathy is indeed tuned by Leu with contributions from other amino acids, and to rule out potential statistical anomalies and alternative explanations, more sophisticated models and alternative approaches need to be explored.

FIGURE 1
Differences between Leu and Ile in class A GPCRs. (A) Packing density and (B) protein-surface exposure of Leu and Ile side chains in 214 GPCR structures. Each point represents a single GPCR structure. Mean values are the averages for all Leu and all Ile side chains within one structure. The diagonals indicate positions where, on average, Leu and Ile side chains are equally packed (A) or protein-surface exposed (B). Values above the diagonal in panel A indicate that Ile side chains are on average more densely packed than Leu side chains. Values below the diagonal in panel B indicate that Leu side chains are on average more protein-surface exposed than Ile side chains. Structures of GPCRs in active and inactive states are highlighted in purple and orange, respectively. (C) Differences in mean side-chain packing density between inactive- and active-state structures. Error bars indicate 95% confidence intervals. p-values were determined using Welch's t-tests and were <0.001 for Ile and <0.01 for Leu. (D) Relative variances of TMD hydropathies. Variances were calculated for hydropathies of complete TMD sequences («All») and for hydropathies of TMD sequences without the indicated amino acid. Amino acids are ordered from low to high hydropathy. The dashed line indicates the variance of hydropathies of complete sequences as a visual reference. (E) Correlations between amino acid content (of the indicated amino acid) and TMD hydropathy («Hy(all)»)

To investigate hydropathy tuning by Leu, we used a simple model with sequences composed of only four «amino acids»: A, B, C, and D. These «amino acids» form sequences of the type $A_a B_b C_c D_d$, where the lowercase subscripts indicate the count of the corresponding amino acid within the sequence. A and B were modeled according to Ile and Leu, respectively, with «a» and «b» corresponding to the occurrences of these two amino acids within the TMD sequences of class A GPCRs.
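The four-letter model described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the per-residue hydropathy values, the count ranges, the target hydropathy, and the noise level are placeholders chosen for this example, and the function names `simulate` and `relative_variance` are hypothetical. None of these are taken from the study itself.

```python
import random
import statistics

# Placeholder hydropathies in kcal/mol (negative = hydrophobic).
# Assumed for illustration only; not the values used in the study.
HYDROPATHY = {"A": -1.1, "B": -1.25, "C": -0.5, "D": 1.0}


def total_hydropathy(counts, skip=None):
    """Sum hydropathy over all residues, optionally leaving one type out."""
    return sum(HYDROPATHY[aa] * n for aa, n in counts.items() if aa != skip)


def simulate(n_seq, tuned, target=-40.0, noise_sd=5.0, seed=0):
    """Return n_seq composition dicts, with or without tuning by B."""
    rng = random.Random(seed)
    sequences = []
    for _ in range(n_seq):
        # Assumed count ranges for the three untuned residue types.
        counts = {
            "A": rng.randint(10, 30),
            "C": rng.randint(20, 60),
            "D": rng.randint(40, 80),
        }
        if tuned:
            # Pick the number of B so the total lands near a noisy target.
            rest = total_hydropathy(counts)
            goal = target + rng.gauss(0.0, noise_sd)
            counts["B"] = max(round((goal - rest) / HYDROPATHY["B"]), 0)
        else:
            # Without tuning, B is drawn independently of the other residues.
            counts["B"] = rng.randint(20, 50)
        sequences.append(counts)
    return sequences


def relative_variance(sequences, skip):
    """Variance of Hy(-skip) divided by the variance of Hy(all)."""
    var_all = statistics.pvariance([total_hydropathy(s) for s in sequences])
    var_skip = statistics.pvariance(
        [total_hydropathy(s, skip) for s in sequences]
    )
    return var_skip / var_all


if __name__ == "__main__":
    for tuned in (False, True):
        ratio = relative_variance(simulate(2000, tuned), "B")
        print(f"tuned={tuned}: relative variance without B = {ratio:.2f}")
```

With these placeholder settings, leaving B out of the hydropathy calculation raises the variance between sequences only when tuning is switched on, which is the qualitative signature that the relative-variance comparison is meant to detect.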

FIGURE 2
Numerical simulation of hydropathy tuning. (A, B, C) Simulated sequences in the absence of hydropathy tuning and (D, E, F) when hydropathy is tuned by B. Error bars indicate the 95% confidence interval within which values were obtained among all runs. (A, D) Relative variances of hydropathies. Variances were calculated for hydropathies of complete sequences («All») and for hydropathies of sequences without the indicated amino acid. The dashed line indicates the hydropathy variance of complete sequences as a visual reference. (B, E) Correlations between amino acid content and hydropathy («Hy(all)») based on Spearman's rank correlation coefficient (ρ). (C, F) Correlations between amino acid content and hydropathy calculated without the indicated amino acid («Hy(−AA)»). Only correlations for the hydrophobic amino acids A, B, and C are shown in the bar plots B, C, E, and F. The generic hydrophilic amino acid D displayed strong positive correlations in each case.

hydropathy tuning, whereas no tuning appears to be present in OmpA proteins.

4 | DISCUSSION

To summarize, Leu side chains tend to occur in less densely packed regions and are more protein-surface exposed than Ile side chains in the structures of class A GPCRs analyzed in this work, indicating that Leu generally interacts more with lipids. Within the TMD sequences of class A GPCRs, the Leu content decreases the variation in hydropathy between receptors and correlates with TMD hydropathies when calculated using only non-Leu residues. Except for the subclass of sensory GPCRs, corresponding sequence patterns are absent for Ile in class A GPCRs, indicating that Ile is not (or is much less) involved in tuning TMD hydropathy despite its very similar side-chain structure.
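The correlation between Leu content and non-Leu hydropathy mentioned above can be illustrated with a short, dependency-free Python sketch. The hydropathy scale and the three sequences are toy values invented for this example (positive = hydrophilic, in kcal/mol), and the helper names `ranks`, `spearman`, and `leu_vs_non_leu` are hypothetical. In practice, a library routine such as `scipy.stats.spearmanr` would normally be used instead.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def leu_vs_non_leu(sequences, scale):
    """Correlate the Leu count with the hydropathy of all non-Leu residues."""
    leu_counts = [seq.count("L") for seq in sequences]
    hy_non_leu = [
        sum(scale[aa] for aa in seq if aa != "L") for seq in sequences
    ]
    return spearman(leu_counts, hy_non_leu)


# Toy scale and sequences, assumed for illustration only.
TOY_SCALE = {"L": -1.0, "I": -1.0, "K": 2.0, "S": 0.5}
TOY_SEQUENCES = ["LLLLKKSS", "LLKSSS", "LLLLLLKKKS"]

if __name__ == "__main__":
    print(leu_vs_non_leu(TOY_SEQUENCES, TOY_SCALE))
```

In the toy data, sequences whose non-Leu residues are more hydrophilic also contain more Leu, so the rank correlation is strongly positive. A positive ρ of this kind is consistent with tuning by Leu but, being symmetric, does not by itself establish the causal direction.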

FIGURE 3
Absolute effects of amino acids on hydropathy variation in class A GPCRs. (A) Tuning effect on TMD hydropathies by amino acid based on the change in standard deviation (Std. Dev.). Standard deviations were calculated for the hydropathies of complete TMD sequences and for the hydropathies of TMD sequences without the indicated amino acid. Bars display the difference of the two standard deviations, that is, SD(Hy[−AA]) − SD(Hy[all]). Positive values mean that the indicated amino acid lowers the variation between receptors, that is, tunes hydropathy. Amino acids are ordered according to their effect on the standard deviation. Dashed lines at 0.5 and 1.0 kcal/mol are included as visual aids. (B) Correlation between the number of Leu and the TMD hydropathy calculated using all non-Leu residues and (C) correlation between the number of Lys and the TMD hydropathy calculated using all non-Lys residues. Each point represents one TMD. Positive hydropathy values indicate a hydrophilic character of the TMD. Spearman's rank correlation coefficients (ρ) are included at the top right of the plots. (D-G) Change in standard deviation within TMDs of class A subclasses for Leu (D), Lys (E), Val (F), and Ile (G). Standard deviations are calculated the same way as in (A). Only subclasses with more than 100 sequences were included to minimize statistical uncertainties. All other subclasses are binned together under "remaining."
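The quantity plotted in panel A, SD(Hy[−AA]) − SD(Hy[all]), can be written out as a short Python sketch. The scale and sequences below are toy values invented for this example rather than the Wimley-White values or GPCR sequences used in the study, the function names are hypothetical, and the use of the population standard deviation is an assumption (the caption does not say which estimator was used).

```python
import statistics

# Toy hydropathy scale in kcal/mol (positive = hydrophilic).
# Assumed for illustration; substitute the real whole-residue scale.
TOY_SCALE = {"L": -1.0, "I": -1.0, "K": 2.0, "S": 0.5}

# Toy sequences standing in for TMD sequences.
TOY_SEQUENCES = ["LLLLKKSS", "LLKSSS", "LLLLLLKKKS"]


def hydropathy(sequence, scale, skip=None):
    """Summed hydropathy of a sequence, optionally ignoring one residue type."""
    return sum(scale[aa] for aa in sequence if aa != skip)


def tuning_effect(sequences, scale, aa):
    """Return SD(Hy[-AA]) - SD(Hy[all]) across the given sequences.

    A positive value means the spread of hydropathies grows when `aa` is
    left out, i.e. `aa` reduces the variation between sequences.
    """
    sd_all = statistics.pstdev([hydropathy(s, scale) for s in sequences])
    sd_without = statistics.pstdev(
        [hydropathy(s, scale, skip=aa) for s in sequences]
    )
    return sd_without - sd_all


if __name__ == "__main__":
    for aa in TOY_SCALE:
        effect = tuning_effect(TOY_SEQUENCES, TOY_SCALE, aa)
        print(f"{aa}: {effect:+.3f} kcal/mol")
```

In the toy data, Leu compensates for the hydrophilic residues, so removing it widens the spread and the effect is positive. A residue that does not occur in any sequence (here Ile) has an effect of exactly zero.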