Characterization of early and late transition states of the folding pathway of a SH2 domain

Abstract Although SH2 domains are abundant protein–protein interaction modules with fundamental roles in the regulation of several physiological and molecular pathways in the cell, the available information about the determinants of their thermodynamic stability and folding properties is still very limited. In this work, we provide a quantitative characterization of the folding pathway of the C‐terminal SH2 domain of SHP2, conducted through a combination of site‐directed mutagenesis and kinetic (un)folding experiments (Φ‐value analysis). The energetic profile of the folding reaction of the C‐SH2 domain is described by a three‐state mechanism characterized by the presence of two transition states and a high‐energy intermediate. The production of 29 site‐directed variants allowed us to calculate the degree of native‐like interactions occurring in the early and late events of the folding reaction. Data analysis highlights the presence of a hydrophobic folding nucleus surrounded by a lower degree of structure in the early events of folding, which is further consolidated as the reaction proceeds towards the native state. Interestingly, residues physically located in the functional region of the domain displayed unusual Φ‐values, a hallmark of transient misfolding. We compared our results with those previously obtained for the N‐terminal SH2 domain of SHP2. Notably, a conserved complex folding mechanism implying the presence of a folding intermediate arises from the comparison, and the relative stability of this intermediate appears to be highly sequence dependent. Data are discussed in the light of previous works on SH2 domains.


| INTRODUCTION
The protein folding problem is one of the central themes of biochemistry and molecular biology. An unfolded protein must undergo a myriad of conformational rearrangements to properly fold to the native state. However, from an experimental perspective, small globular proteins often show an intrinsic ability to self-assemble into their native state in a two-state fashion, in which only the native and denatured states are populated and no intermediate states accumulate along the folding pathway. 1 Intermediate states, when present, are generally very limited in number and particularly elusive to experimental detection and characterization, mostly because of their inherently transient nature.
Determining the folding mechanism of a protein relies on pinpointing the presence of intermediates and transition states along the reaction pathway and on characterizing their degree of structure. [2][3][4][5] Transition states, by definition, never accumulate, so that they must be characterized indirectly. A powerful strategy widely used to reach this goal is the Φ-value analysis, a methodology based on a combination of mutagenesis and kinetics, which normalizes the effect of single mutations on the transition state to their effect on the native state and allows mapping of the interactions occurring in the transition state(s) of the reaction. 6 The strategy can also be used to structurally characterize intermediate states. Intermediates can escape detection in classical kinetic (un)folding experiments, and their presence along the folding reaction can be inferred with different strategies. For example, nonlinear dependences of the logarithm of the observed rate constants of (un)folding experiments as function of denaturant concentration (i.e., in chevron plots) are a well-recognized signature of the existence of folding intermediates, 7,8 although data interpretation is not usually straightforward and a rigorous analysis of different kinetic parameters must be taken into account. 9 In previous studies we characterized the folding mechanism of the N-terminal SH2 domain of SHP2, an important protein phosphatase with a key role in several molecular pathways in the cell. Our results were compatible with a three-state folding scenario, accounting for the presence of a low-energy, transiently populated on-pathway intermediate revealed by a pronounced roll-over effect in the refolding arm of the chevron plot. By performing a Φ-value analysis we were able to characterize the structure of the intermediate and second transition state at nearly atomic resolution.
Interestingly, our data showed rather high Φ-values for the early intermediate state, which were compatible with a surprisingly native-like structure, further increased and consolidated in the late transition state. 10 A powerful strategy to infer the mechanism of folding of a given protein relies on comparing its folding properties with those of proteins belonging to the same family, that is, sharing the same overall topology but different primary structures. 3,11,12 In a recent work, we demonstrated that the C-SH2 domain folds through a three-state mechanism characterized by the presence of a high-energy intermediate. 13 In this paper, we report a Φ-value analysis of the C-terminal SH2 domain of SHP2. By performing extensive site-directed mutagenesis and a kinetic analysis we were able to characterize the structure of the early and late transition states of the folding reaction. Overall, the analysis of kinetic and thermodynamic data highlights a nucleation-condensation mechanism of folding, with the presence of a structured folding nucleus surrounded by a lower degree of native-like interactions, and subsequent locking in place in the late events of the reaction. Our results are discussed in comparison with what was previously observed for the folding of the N-SH2 domain of SHP2 and in the light of the general folding properties of SH2 domains.

| Φ-value and Linear Free Energy Relationship analysis of the C-SH2 domain of SHP2
In a recent work, by employing a combination of equilibrium and kinetic (un)folding experiments conducted at different experimental conditions, we analyzed the folding mechanism of the C-SH2 domain of SHP2. 13 An analysis of initial and final fluorescence of the unfolding reaction of the wild-type C-SH2 domain as function of [UREA] showed the dependence of the initial fluorescence (which resembles the signal of native protein) to be linear, denoting the absence of burst-phase unfolding events. This aspect allowed us to conclude that the rollover in the unfolding arm of the chevron plot could be ascribable to a folding mechanism implying a change in the rate-limiting step at high denaturant concentrations, with the presence of two transition states along the reaction pathway, and a high-energy intermediate.
To infer the structural features of the early and late transition states of the C-SH2 folding reaction we conducted a Φ-value analysis. The Φ-value analysis is a technique based on a combination of kinetics and extensive mutagenesis, which allows the role of single amino acid residues in the folding of a given protein to be determined. By producing a number of site-directed variants based on conservative truncation of residue side chains (Val to Ala, Leu to Ala, Ile to Val, Thr to Ser), that is, variants that may have a destabilizing effect on the protein without disrupting its native conformation, 6,14 and by monitoring the effect of such mutations on the thermodynamics of the transition states and of the native state, it is possible to determine the degree of native-like contacts formed by that residue in the transition states. These contacts are then reported as a normalized value (namely the Φ-value), which measures the change in free energy in the transition state and normalizes it to the change in free energy between the ground states. As a result, the Φ-value tends to 1 if the mutated residue experiences a native-like environment and is therefore as fully "structured" in the probed state as it is in the native state, and it tends to 0 if it is as "unstructured" as it is in the denatured state.
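The normalization described above can be illustrated with a minimal sketch. It assumes the conventional definition of a folding Φ-value, Φ = ΔΔG(TS−D)/ΔΔG(N−D), with the activation term obtained from a ratio of rate constants; the function names, the energy units (kcal/mol) and the numbers in the usage note are hypothetical and are not taken from Table 1.

```python
import math

R = 1.987e-3   # gas constant, kcal mol^-1 K^-1
T = 298.15     # 25 °C expressed in kelvin

def ddg_from_rates(k_wt: float, k_mut: float) -> float:
    """Change in activation free energy (kcal/mol) upon mutation,
    computed as RT ln(k_wt / k_mut). Positive when the mutant is slower."""
    return R * T * math.log(k_wt / k_mut)

def phi_value(ddg_ts: float, ddg_ground: float) -> float:
    """Phi-value as the ratio between the free energy change measured for
    the transition state and the one measured between the ground states.
    Both arguments must use the same units and sign convention."""
    if ddg_ground == 0:
        raise ValueError("ground-state ddG is zero; Phi is undefined")
    return ddg_ts / ddg_ground

# Hypothetical example: a mutation that halves the folding rate constant
# and destabilizes the native state by 1.0 kcal/mol.
ddg_ts = ddg_from_rates(k_wt=100.0, k_mut=50.0)
phi = phi_value(ddg_ts, 1.0)
```

A value of `phi` near 1 would indicate a native-like environment for the mutated residue in the probed state, and a value near 0 a denatured-like one. When the ground-state change is small, the ratio becomes numerically unstable, which is why such positions call for cautious interpretation.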
By following the general rules of Φ-value analysis we designed 32 variants of the C-SH2 domain (reported in Table 1). Of the 32 designed variants, three were too poorly expressed and could not be characterized. Experiments were performed by stopped-flow methodology, monitoring the change in intrinsic fluorescence of the protein as function of time. In unfolding experiments the protein was diluted 11-fold into buffer containing urea as denaturing agent, whereas in refolding experiments the protein in urea was diluted 11-fold into buffer at different final urea concentrations. The buffer used was Tris-HCl 50 mM, DTT 2 mM, pH 8.0, and experiments were performed at 25 °C. In all the experiments conducted, the unfolding and refolding time courses were satisfactorily fitted with a single exponential equation. The dependences of the logarithm of the observed rate constants (k obs ) as function of the concentration of denaturant (chevron plots) of all the variants explored are reported in Figure 1. In analogy to our previous work on the C-SH2 domain, 13 chevron plots displayed a deviation from linearity in their unfolding arm and were analyzed by using an equation taking into account the presence of a high-energy intermediate along the reaction pathway. To obtain a more reliable measurement of the kinetic parameters, all the chevron plots were analyzed by sharing kinetic m-values. 15 According to the model used to analyze kinetic data, we could calculate thermodynamic parameters for two energetic barriers, obtaining two Φ-values, one for the early transition state (Φ TS1 ) and one for the late transition state (Φ TS2 ).
TABLE 1 Folding and thermodynamic parameters of C-SH2 domain variants.
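As an illustration of the kind of equation referred to above, the following sketch implements one commonly used steady-state form for a chevron plot with a high-energy on-pathway intermediate, k obs = (k f + k u)/(1 + K part). This form is an assumption, since the exact equation is not reproduced in the text. The function and parameter names are hypothetical, and the kinetic m-values are assumed to be expressed in M^-1 on a natural-logarithm scale.

```python
import math

def k_obs_two_state(urea, kf0, ku0, mf, mu):
    """Two-state chevron: folding rate decreases and unfolding rate
    increases exponentially with denaturant concentration (molar)."""
    return kf0 * math.exp(-mf * urea) + ku0 * math.exp(mu * urea)

def k_obs_three_state(urea, kf0, ku0, mf, mu, kpart0, mpart):
    """Assumed steady-state form with a high-energy intermediate.
    kpart0 is the partitioning constant in the absence of denaturant and
    mpart its denaturant dependence. As K_part grows with urea, the
    unfolding arm curves downward (change in rate-limiting step)."""
    k_part = kpart0 * math.exp(mpart * urea)
    return k_obs_two_state(urea, kf0, ku0, mf, mu) / (1.0 + k_part)

# Hypothetical parameters, chosen only to show the shape of the curve.
params = dict(kf0=10.0, ku0=0.1, mf=1.0, mu=0.8, kpart0=0.01, mpart=0.6)
chevron = [(c, math.log(k_obs_three_state(c, **params)))
           for c in (0.0, 2.0, 4.0, 6.0, 8.0)]
```

With `kpart0` set to zero the function reduces to the two-state case, which is a convenient sanity check before attempting a global fit with shared m-values.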
Following a generally accepted approach, to infer the structure of the two transition states we divided Φ-values into four categories (low, intermediate, high, and noncanonical Φ-values, i.e., below 0 or higher than 1) and mapped them on the three-dimensional structure of the C-SH2 domain following a well-defined color-code (Figure 2). Inspection of Figure 2a indicates the presence of a folding nucleus highlighted by high Φ-values of residues V148 and L149, surrounded by a lower degree of native-like structure, with the prevalence of low and intermediate Φ-values in the early transition state. As the folding reaction proceeds, the degree of native-like contacts increases, as depicted in Figure 2b. In fact, in the late transition state the folding nucleus appears to be more consolidated and surrounded by higher Φ-values.
Interestingly, Figure 2 shows anomalous Φ-values (i.e., Φ > 1 and Φ < 0) for residues V170 and I172, in both early and late events of folding. Non-canonical Φ-values may indicate that those residues are involved in non-native interactions and can represent transient misfolding on the path to the native state.
To further investigate the structural features of the early and late transition states of the folding reaction of the C-SH2 domain we performed a Linear Free Energy Relationship (LFER) analysis. 16 This method is based on relating the change in free energy observed in the transition states to the one calculated for the native state. The slope of the observed correlation, denoted as the α value and ranging between 0 and 1, indicates the degree of native-like structure in the transition state. By following this methodology, we performed a LFER analysis for both transition states of the folding reaction of the C-SH2 domain, and the corresponding LFER plots are reported in Figure 3. The LFER analysis returned α TS1 = 0.46 ± 0.10 and α TS2 = 0.74 ± 0.10. Altogether, our results indicate that the C-SH2 domain follows a stepwise folding mechanism with an extended native-like late transition state.
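The LFER slope can be obtained with an ordinary least-squares fit, as in the sketch below. The helper name and the data points are hypothetical and serve only to show the calculation; whether the original fit included an intercept is not stated in the text, so the free intercept here is an assumption.

```python
def linear_fit(x, y):
    """Ordinary least-squares line through (x, y); returns (slope, intercept)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical free energy changes (kcal/mol), not taken from Table 1:
# x = change measured for the native state, y = change in the transition state.
ddg_native = [0.5, 1.0, 1.5, 2.0]
ddg_ts = [0.25, 0.5, 0.75, 1.0]
alpha, intercept = linear_fit(ddg_native, ddg_ts)
```

In this illustrative data set the transition state senses half of each native-state perturbation, so the fitted slope corresponds to an α value of 0.5.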

| Detecting possible Hammond effect on the transition states of C-SH2
Global analysis of chevron plots obtained from several site-directed variants of a given protein, or from different experimental conditions, is a very useful methodology to calculate reliable microscopic folding and unfolding rate constants. 15 However, such a process relies on the assumption that the perturbation deriving from site-directed mutagenesis or from a change in pH, ionic strength, etc., affects only the stability of the native state, without affecting the thermodynamics of the transition state(s). In particular, when kinetic m-values are shared in the fitting of chevron plots, it is not possible to monitor changes in the stability of the transition states relative to the ground states, which, following the Hammond postulate, could shift their position along the reaction pathway (the so-called Hammond effect). 17 To obtain information about possible Hammond effects on the transition states of the C-SH2 domain, we performed a new fit of the chevron plots reported in Figure 1 by removing constraints on all kinetic m-values. Unfortunately, due to the complexity of the data, the analysis software (GraphPad Prism) could not compute a reliable fit (data not shown). Thus, we fitted the chevron plots by removing constraints only from the m f and m u values, keeping a shared m part value for all data sets. The kinetic and thermodynamic parameters obtained are reported in Table S1. This approximation, which implies TS1 and TS2 positions on the reaction coordinate to be The obtained β-Tanford values for TS1 and TS2 were plotted versus the thermodynamic stability of the corresponding site-directed variant, and fitted with a linear equation (Figure 4). It is of interest to note that the β TS1 and β TS2 values obtained display a clear dependence on ΔG D-N values, with a negative slope.
This result is compatible with a Hammond effect, highlighting that both transition states approach the native state on the reaction coordinate as the native state becomes less stable. 17 To test the robustness of our results we globally analyzed kinetic data by removing constraints from the kinetic m f value only and monitoring the effect of the stability of the domain on the position of the early transition state on the reaction coordinate. The results of the fitting process are reported in Table S2, and the calculated β TS1 values as function of protein stability in Figure S1. Data were fitted with the same linear equation used to fit the data reported in Figure 4, supporting the robustness of the model proposed.
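The position of each transition state on the reaction coordinate can be sketched from the kinetic m-values as below. The expressions are an assumption based on a steady-state model of the form k obs = (k f + k u)/(1 + K part), in which the total denaturant dependence is m f + m u and the late barrier lies a further m part beyond the early one; the exact expressions used for Figure 4 are not reproduced in the text. The function name and the numbers are hypothetical.

```python
def beta_tanford(m_f: float, m_u: float, m_part: float):
    """Return (beta_TS1, beta_TS2): fractional positions of the early and
    late transition states between denatured (0) and native (1) states,
    under the assumed three-state steady-state model."""
    m_total = m_f + m_u          # assumed equilibrium m-value
    if m_total <= 0:
        raise ValueError("m_f + m_u must be positive")
    beta_ts1 = m_f / m_total
    beta_ts2 = (m_f + m_part) / m_total
    return beta_ts1, beta_ts2

# Hypothetical m-values for two variants sharing the same m_part.
variants = {"variant A": (1.0, 1.0, 0.5), "variant B": (1.2, 0.8, 0.5)}
positions = {name: beta_tanford(*m) for name, m in variants.items()}
```

Plotting such β values against the stability of each variant and fitting a straight line is then enough to check for the trend expected from a Hammond effect: as the text states, a negative slope means the transition states move towards the native state as the native state is destabilized.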

| DISCUSSION
Despite their fundamental role in the physiology of the cell and their involvement in several molecular pathways, only a few experimental works have characterized the folding properties of SH2 domains. 10,13,[18][19][20][21] Understanding the determinants of thermodynamic stability as well as characterizing the folding pathway of a protein is of fundamental importance to depict the molecular basis of its biochemical function. Moreover, a structural characterization of the transition state(s) and of possible intermediates along the folding reaction allows potential aberrant misfolding events, which may result in protein malfunction, to be pinpointed. The analysis of Φ-values represents a very powerful methodology to obtain such mechanistic information.
In the study of protein folding, it is particularly useful to compare the folding pathways of homologous proteins or domains. By analyzing the folding pathway of proteins belonging to the same family, sharing a similar three-dimensional structure but displaying different primary structures, it is possible to characterize the role of sequence in determining folding and function, with the possibility to pinpoint interactions fundamental for proper folding and native state stability. To do so we compared kinetic and thermodynamic data obtained in a recent work for the N-SH2 domain of SHP2 with the data presented in this paper. The N-SH2 domain is characterized by a three-state folding mechanism implying the presence of an on-pathway low-energy intermediate. Interestingly the analysis of Φ-values described a rather highly structured intermediate state, ture, an aspect that may suggest the tendency to retain different degrees of residual structure in the denatured states of the two SH2 domains. In fact, it is worth noticing that for the SH2 domain of Src a considerable degree of residual structure in the denatured state could be detected. 18 Since the free energy of the denatured state is usually assumed to be unaffected in Φ-value analysis, it is tempting to assume that an undetected difference in the residual structure of the denatured states of the N-SH2 or C-SH2 domain might dictate the degree of native-like structure characterizing the early transient states of the two SH2 domains, possibly influencing the presence (and the relative stability) of intermediate states.
To further compare the folding pathway of the C-SH2 domain with that of the N-SH2 domain we performed a Φ vs Φ analysis for residues conserved between the two domains. Such analysis allows the degree of similarity in native-like structure between probed states to be determined, providing useful information about whether folding pathways are conserved in homologous proteins. 22 To obtain an informative Φ vs Φ plot, a sequence alignment of the two domains is mandatory. In fact, in such analysis, only residues in the same structural position in the two proteins can be compared. To do so, we performed a sequence alignment between the N-SH2 and C-SH2 domains of SHP2 by using the ClustalW online tool (https://www.genome.jp/tools-bin/clustalw). In Figure 5 we show Φ vs Φ plots of the early and late events of the folding reaction of the N-SH2 vs the C-SH2 domain, together with the alignment result, highlighting those residues for which a comparison of Φ-values was possible. For the early events of folding, despite the clear linear correlation between the Φ-values of the two domains, the Φ-values of N-SH2 are higher than the ones calculated for C-SH2, indicating a conserved folding mechanism with a higher degree of native-like structure for the N-SH2 domain. The analysis of the late events of folding, on the other hand, appears less clear, due to a more scattered correlation between Φ-values. Interestingly, a comparison of the change in stability of the early and late events of folding, as well as of the native state of the C-SH2 and N-SH2 domains upon mutation (Figure S2), resembles what is reported in Figure 5, further supporting the validity of our model and the hypothesis of a generally conserved folding mechanism between the C-SH2 and N-SH2 domains.
The analysis of the Φ-values of the C-SH2 domain shows the presence of anomalous Φ-values in both early and late events of folding, at residues V170 and I172. A structural explanation of unusual Φ-values demands careful analysis, given the nature of Φ-values as thermodynamic measures of native-like structure. The presence of Φ-values higher than 1 and/or lower than 0 is, in fact, usually interpreted as the presence of non-native interactions, which may lead to misfolding events. It is worth noticing that, as reported in Table 1, there are other positions characterized by an unusual Φ-value. However, since those mutations caused only a minor change in activation and/or equilibrium free energies, which may jeopardize Φ-value calculation, we considered those Φ-values as low (close to 0) or high (close to 1), to avoid misinterpretation of thermodynamic parameters, and reported them accordingly in Figure 2. Interestingly, both V170 and I172 are physically located close to the binding pocket of the C-SH2 domain. Thus, it is not surprising that a perturbation occurring in the functional site of the protein leads to the occurrence of transient misfolding events. In fact, whilst globular proteins are generally characterized by a status of minimal frustration, 23,24 that is, residues have been selected by evolution to best fulfill the interactions needed for the protein to fold, the contrasting demands of folding and function lead to the presence of residues that are sub-optimal for folding in the functional sites of proteins. [23][24][25][26] Given these premises, we conclude that the unusual Φ-values measured for V170 and I172 indicate that these residues are part of a frustration pattern related to the functional binding site of the C-SH2 domain.
In summary, in this work we provide a comprehensive characterization of the folding mechanism of the C-terminal SH2 domain of the SHP2 protein, with a detailed characterization of the early and late events of folding. This kind of analysis is of particular interest to understand the determinants of stability of functional proteins and domains, such as SH2. In fact, although deep learning and artificial intelligence (AI) approaches, such as AlphaFold 27 and RoseTTAFold, 28 are currently paving the way to an increasingly accurate determination of the three-dimensional structure of proteins starting only from their amino acid sequences, the characterization of the mechanism by which the native functional shape is acquired still strongly relies on experimental data. Moreover, given the enormous number of physiological pathways that are regulated by protein-protein interactions mediated by these types of domains, determining such mechanisms allows possible misfolding events, which are so far elusive for AI approaches, to be pinpointed. The data reported in this study, in fact, show that residues structurally located in the functional portion of the domain display transient misfolding, as reported by their unusual Φ-values. The comparison of kinetic and thermodynamic folding data with the ones obtained for the N-SH2 domain suggests a conserved complex folding mechanism with the formation of at least one intermediate along the folding reaction. However, the relative stability of such an intermediate appears to be highly sequence dependent. In this light, we cannot exclude that the folding mechanism of SH2 domains showing a two-state behavior 18,20 may involve intermediate state(s) too unstable to be experimentally detected. Moreover, by systematically comparing the Φ-values obtained for residues conserved between the N-SH2 and C-SH2 domains we could highlight a conserved and robust mechanism, in both early and late events of folding.

| Protein expression and purification
The wild-type C-SH2 domain and all the site-directed mutants produced were purified as described previously. 13 Site-directed mutagenesis was performed using the QuikChange mutagenesis kit (Agilent Technologies Inc., Santa Clara, CA), according to the manufacturer's instructions.

| Kinetic (un)folding experiments
Kinetic (un)folding experiments were performed on an Applied Photophysics Pi-star 180 stopped-flow apparatus. Samples were excited at 280 nm and the change in fluorescence emission was recorded by using a 360 nm cutoff glass filter. The experiments were performed at 25 °C in buffer Tris-HCl 50 mM, DTT 2 mM, pH 8.0, by using urea as denaturing agent. For each denaturant concentration, at least five individual traces were averaged. The final protein concentration was typically 2 μM.

| Data analysis
In a two-state scenario the logarithm of the folding and unfolding rate constants follows a linear dependence on [UREA], and a typical two-state chevron plot is fitted with an equation in which the parameters m f and m u represent the slopes of the dependence of k f and k u on [UREA]. In the case of the C-SH2 domain, a kink in the unfolding arm of the chevron plot is clearly appreciable. The three-state model used to describe kinetic data implied the presence of an on-pathway high-energy intermediate, with K part representing the partitioning constant between the early (TS1) and late (TS2) transition states and m part the associated kinetic m-value. β-Tanford values for TS1 and TS2 and Φ-values for TS1 were calculated from the fitted kinetic parameters. Φ-values for TS2 were calculated from ΔΔG TS2−N, where

$$\Delta\Delta G_{TS2-N} = \Delta G^{wt}_{TS2-N} - \Delta G^{mut}_{TS2-N} = RT \ln \frac{k_u^{mut}/K_{part}^{mut}}{k_u^{wt}/K_{part}^{wt}}$$
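For reference, the role of the ratio k u /K part in the calculation for TS2 can be rationalized with a short steady-state derivation. This is a sketch under assumptions and not necessarily the formulation adopted in the original analysis: the scheme D ⇌ I ⇌ N, the microscopic rate constants k DI, k ID, k IN and k NI, and the exponential denaturant dependence of each parameter are all introduced here for illustration.

```latex
% Assumed scheme: D <=> I <=> N, with I a high-energy (never populated) intermediate.
% Applying the steady-state approximation to I gives
\[
k_{obs} = \frac{k_{DI}\,k_{IN} + k_{NI}\,k_{ID}}{k_{ID} + k_{IN}} .
\]
% Defining K_part = k_ID / k_IN, k_f = k_DI and k_u = k_NI K_part:
\[
k_{obs} = \frac{k_f + k_u}{1 + K_{part}},
\qquad
k_f = k_f^{0}\, e^{-m_f[\mathrm{urea}]},\quad
k_u = k_u^{0}\, e^{\,m_u[\mathrm{urea}]},\quad
K_{part} = K_{part}^{0}\, e^{\,m_{part}[\mathrm{urea}]} .
\]
% Limiting cases:
%   K_part << 1 :  k_obs ~ k_f + k_u          (two-state-like; TS1 is rate limiting)
%   K_part >> 1, unfolding conditions (k_u >> k_f):
\[
k_{obs} \approx \frac{k_u}{K_{part}} = k_{NI}
\qquad (\text{TS2 is rate limiting}) .
\]
```

Under these assumptions the ratio k u /K part is the microscopic rate constant for crossing the late barrier from the native state, so that its change upon mutation reports on the free energy difference between TS2 and N.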