Functional site profiling and electrostatic analysis of cysteines modifiable to cysteine sulfenic acid


  • Freddie R. Salsbury Jr,

    1. Department of Computer Science, Wake Forest University, Winston-Salem, North Carolina 27109, USA
  • Stacy T. Knutson,

    1. Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27109, USA
    2. Department of Computer Science, Wake Forest University, Winston-Salem, North Carolina 27109, USA
  • Leslie B. Poole,

    1. Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, North Carolina 27157, USA
  • Jacquelyn S. Fetrow

    Corresponding author
    1. Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27109, USA
    2. Department of Computer Science, Wake Forest University, Winston-Salem, North Carolina 27109, USA
    • 100 Olin Physical Laboratory, 7507 Reynolda Station, Wake Forest University, Winston-Salem, NC 27109-7507, USA; fax: (336) 758-6142.


Cysteine sulfenic acid (Cys-SOH), a reversible modification, is a catalytic intermediate at enzyme active sites, a sensor for oxidative stress, a regulator of some transcription factors, and a redox-signaling intermediate. This post-translational modification is not random: specific features near the cysteine control its reactivity. To identify features responsible for the propensity of cysteines to be modified to sulfenic acid, a list of 47 proteins (containing 49 known Cys-SOH sites) was compiled. Modifiable cysteines are found in proteins from most structural classes and many functional classes, but have no propensity for any one type of protein secondary structure. To identify features affecting cysteine reactivity, these sites were analyzed using both functional site profiling and electrostatic analysis. Overall, the solvent exposure of modifiable cysteines is not different from the average cysteine. The combined sequence, structure, and electrostatic approaches reveal mechanistic determinants not obvious from overall sequence comparison, including: (1) pKas of some modifiable cysteines are affected by backbone features only; (2) charged residues are underrepresented in the structure near modifiable sites; (3) threonine and other polar residues can exert a large influence on the cysteine pKa; and (4) hydrogen bonding patterns are suggested to be important. This compilation of Cys-SOH modification sites and their features provides a quantitative assessment of previous observations and a basis for further analysis and prediction of these sites. Agreement with known experimental data indicates the utility of this combined approach for identifying mechanistic determinants at protein functional sites.

Abbreviations: Prx, peroxiredoxin; Msr, methionine sulfoxide reductase; Cys-SOH, cysteine sulfenic acid; GAPDH, glyceraldehyde-3-phosphate dehydrogenase; PTP, protein tyrosine phosphatase.

Protein post-translational modifications are well known to play important biological roles by rapidly modifying the structure and function of proteins. The most common and well-known example is the involvement of protein phosphorylation in signal transduction. Analysis of phosphorylation sites has led to a better understanding of kinase substrate specificity (Brinkworth et al. 2002; Kobe et al. 2005), methods for site prediction (Koenig and Grabe 2004; Huang et al. 2005; Plewczynski et al. 2005; Xue et al. 2005), and a combined experimental/computational approach that has led to a better understanding of the yeast phosphoproteome (Brinkworth et al. 2006; Molina et al. 2007).

The reversible oxidation of cysteine side chains to cysteine sulfenic acid (Cys-SOH) has been recognized as a post-translational modification for several decades (Allison 1976; Poole and Claiborne 1989); however, the broader significance of its biological roles has been emerging only in the last decade (Claiborne et al. 1999). Cys-SOH plays roles at enzyme catalytic sites, senses oxidative and nitrosative stress, and regulates some transcriptional regulators (for review, see Poole et al. 2004). For example, cysteine is a catalytic residue in protein tyrosine phosphatases (PTPs): Cys-SOH modification at that active site residue is responsible for reversible inhibition of some PTPs (Denu and Tanner 1998; Cho et al. 2004; Tonks 2005). The transcriptional regulators OxyR and OhrR are also reversibly regulated through Cys-SOH formation (Zheng et al. 1998; Fuangthong and Helmann 2002; Kim et al. 2002; Panmanee et al. 2006). Cys-SOH can also be hyperoxidized to sulfinic and sulfonic acid (Cys-SO2H and Cys-SO3H, respectively), modifications that may play roles in signaling, aging, or disease processes (for review, see Berlett and Stadtman 1997; Jacob et al. 2004; Cross and Templeton 2006).

The biological relevance of Cys-SOH post-translational modification suggests that this modification is not random, but rather facilitated by specific features at or near the functional site that influence the modification and its stability. Previous observation of protein structure has suggested that the microenvironment that stabilizes Cys-SOH is characterized by three features: (1) lack of solvent accessibility to the modified cysteine; (2) lack of nearby reduced cysteines; and (3) local hydrogen-bonding residues that stabilize the sulfenate form (Claiborne et al. 1993, 1999). Until now, however, these observations have been general and qualitative. Given their significance, we aimed to better characterize known Cys-SOH sites in proteins.

Functional site profiling is a method that allows the analysis and comparison of sequence and structure features at a functional site, as outlined in Figure 1 (Cammer et al. 2003). We describe here the functional site profiling of Cys-SOH modification sites in proteins of known structure, the characterization of the sequence and structural features near the modifiable cysteine, and clustering of the sites by common sequence features.

Figure 1.

An example of construction of functional site signatures and a profile. (A) The three-dimensional structure of 1vhrA, a dual-specificity phosphatase, is shown as a ribbon. The location of the modifiable (and, in this protein, the active site) cysteine residue is indicated as a light-gray sphere. The structural segments that contain residues located within 10 Å of the key cysteine residue are shown in colors, each color representing a different segment. (B) The segments are extracted from the global structure. (C) The sequence fragments corresponding to the structural segments are combined, from N to C terminus, into a single contiguous sequence that is called the functional site signature. The color of each residue is indicative of the segment in which it was originally located in the structure, as shown in A and B. (D) Related signatures can be aligned to create a functional site profile, as shown for three example proteins. The modified cysteine is underlined. A scoring function for a given profile rewards both identities and similarities, but penalizes gaps. Functional site profile scores greater than 0.25 indicate a significant relationship between the signatures in the profile for a closely related family (Cammer et al. 2003); however, some proteins with scores slightly less than 0.25 are known to be related (Baxter et al. 2004). Protein structures were prepared in VMD v1.8.3 (Humphrey et al. 1996).
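The signature construction described in this legend can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation (Cammer et al. 2003): the function name, the use of one representative point per residue, the rule that only consecutive residues form a segment, and the toy coordinates are all assumptions made here for illustration.

```python
import math

def functional_site_signature(residues, key_index, radius=10.0):
    """Return (segments, signature) for residues near the key residue.

    residues:  list of (one_letter_code, (x, y, z)) in N-to-C order, with one
               representative point per residue (an assumption of this sketch).
    key_index: position of the key (modifiable) cysteine in that list.
    radius:    distance cutoff in angstroms (10 A in the text).
    """
    key_xyz = residues[key_index][1]
    # Indices of residues whose representative point lies within the cutoff.
    near = [i for i, (_, xyz) in enumerate(residues)
            if math.dist(xyz, key_xyz) <= radius]
    # Group consecutive indices into contiguous structural segments.
    segments = []
    for i in near:
        if segments and i == segments[-1][-1] + 1:
            segments[-1].append(i)
        else:
            segments.append([i])
    # Concatenate the segment sequences from N to C terminus.
    seg_strings = ["".join(residues[i][0] for i in seg) for seg in segments]
    return seg_strings, "".join(seg_strings)

# Hypothetical toy chain; coordinates are invented for illustration only.
toy_chain = [
    ("M", (-20.0, 0.0, 0.0)), ("A", (-12.0, 0.0, 0.0)), ("H", (-4.0, 0.0, 0.0)),
    ("C", (0.0, 0.0, 0.0)),   ("T", (4.0, 0.0, 0.0)),   ("G", (12.0, 0.0, 0.0)),
    ("K", (20.0, 0.0, 0.0)),  ("L", (15.0, 0.0, 0.0)),  ("S", (6.0, 0.0, 0.0)),
    ("W", (14.0, 0.0, 0.0)),
]
toy_segments, toy_signature = functional_site_signature(toy_chain, key_index=3)
print(toy_segments, toy_signature)
```

In this toy chain, two separate stretches of sequence fall within the cutoff and are joined into one signature, which mirrors panels B and C of the figure.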

Functional site profiling, with a sequence-based scoring function, has previously been applied to enzyme active sites (Cammer et al. 2003; Baxter et al. 2004; Huff et al. 2005). Such sites are more definitively related than are Cys-SOH modification sites; consequently, identification of features other than just sequence and structure is likely necessary to identify relationships between these sites. The accepted mechanism for Cys-SOH generation by hydrogen peroxide-mediated oxidation involves initial cysteine deprotonation (Denu and Tanner 1998; Wood et al. 2003b), thus suggesting a lowered sulfhydryl pKa for the modifiable cysteine. This observation indicates that analysis of the cysteine pKa and its electrostatic environment should provide important clues to its modifiability, an idea similar to that suggested for enzyme active sites (Ondrechen et al. 2001; Jones et al. 2003). Thus, in this work, we apply the commonly used semi-macroscopic electrostatic methods or simple continuum models (Bashford and Gerwert 1992; Antosiewicz et al. 1994; Sham et al. 1997) to the modifiable cysteine sites in proteins and compare the electrostatic features with the sequence features identified by profiling.

Our long-term goals are to develop methods to predict Cys-SOH modification sites in protein sequences and to create a database of modification sites, and potential sites, for use by biological researchers. Toward this long-term goal, we have compiled a list of proteins known to contain Cys-SOH modification sites and report that list here. We characterize site features using a combination of functional site profiling and electrostatic analysis, which allows us to more quantitatively cluster Cys-SOH sites, compare features between the sites, and identify similarities and differences related to cysteine reactivity.


Overview of structure and function of proteins that contain Cys-SOH sites

The modifiable protein set contains 47 proteins with 49 distinct Cys-SOH sites (Table 1). These proteins either exhibit Cys-SOH in the crystal structure (27 proteins), indicating a Cys-SOH that is sufficiently stable to allow crystallographic observation, or are known from biochemical studies to generate a Cys-SOH during exposure to oxidants (20 proteins). Each site, therefore, is located in an environment in which the cysteine can be modified by some mechanism to sulfenic acid. Most of the proteins in the modifiable data set exhibit little sequence identity to one another. Sequence identities range from insignificant (<10% identity) to 99% (1kyg and 1n8j share the same protein sequence apart from a single mutation). Most proteins are unrelated, with ∼99% (1116 of 1128) of all global pairwise sequence identities <30%.

Table 1. Protein structures from the PDB with cysteines modifiable to sulfenic acid and strongly interacting residues

As expected, cysteine oxidation to sulfenic acid is not limited to proteins of specific biochemical function or structure. The biochemical functions include metabolic enzymes (e.g., GAPDH, malate synthase), DNA-binding and transcriptional regulatory proteins (e.g., papillomavirus E2 protein, NF-κB p50/p65), inhibitors (e.g., α-1 antitrypsin), oxygen carriers (e.g., hemoglobin), and proteins involved in the immune system (e.g., H-2 Class I histocompatibility antigen). A few are involved in cellular redox regulation and/or signaling (e.g., peroxiredoxin, glutathione reductase, thioredoxin), but most are not. Comparison of SCOP classifications (Murzin et al. 1995) for these proteins indicates that they are not limited to a single structural class: All structural classes and some multidomain proteins are represented. Additional analyses (see Supplemental material and Supplemental Fig. S1) indicate that modifiable cysteines have no propensity for any particular secondary structure; thus, while the helix macrodipole might contribute to some modification sites, it is not the only mechanism by which the cysteine reactivity is modified.

Modifiable cysteines might be more reactive due to greater solvent exposure; however, to stabilize the reactive sulfenic acid intermediate, one might expect the cysteine to be more buried. To determine whether modifiable cysteines were more or less accessible than average, solvent accessibility was calculated for some of the side chains. Conformational change occurs in some of these proteins when the cysteine is modified; thus, we did not calculate the solvent accessibility for cysteines that were actually modified in the crystal structure. The average accessibility for 15 modifiable (but not modified) cysteines was 94.7 Å², while the side-chain surface area for the average cysteine was 90.7 Å² (with average defined as all free cysteines in the modifiable protein set). The 4 Å² difference is too small to be physically relevant in determining reactivity, though the calculation should be repeated when more structures of modifiable (but not modified) sites are known.

Features of the functional site signatures and profiles for modifiable cysteine sites

The modifiable cysteine was selected as the “key residue,” and signatures were identified for each cysteine modification site (procedure shown in Fig. 1) (Cammer et al. 2003). These signatures were aligned (see Materials and Methods) to create a functional site profile for these sites (Fig. 2). The signatures are highly diverse, a characteristic different from profiles previously created for enzyme active sites (Cammer et al. 2003; Baxter et al. 2004). In fact, the sequence-based profile score is −0.46, an insignificant score (Cammer et al. 2003). This result is expected given the diversity of proteins (Table 1) and local structures (Supplemental Fig. S1) in which this post-translational modification occurs.

Figure 2.

All signatures for each Cys-modifiable site, organized and aligned to show the functional site profile. For each protein in the modifiable protein set (Table 1), the functional site signature was extracted using the process shown in Figure 1. The signatures were aligned using ClustalW (Thompson et al. 1994; Higgins et al. 1996), and then the functional site profile score was calculated for each pair of signatures. The signatures are organized by these pairwise scores, with the dendrogram to the right indicating the relationships based on these pairwise scores (note that the branch length in the dendrogram is meaningless and does not indicate a distance). The modifiable cysteine is shown as a white “C” on a black background and the structural fragment that contains the modifiable cysteine is highlighted in light blue. Yellow and red shading indicates strongly and weakly interacting residues, respectively. (As described in the Materials and Methods, strong interacters, yellow, are identified when the interaction energy [measured in pK units] is >1.0 pK units, and weak interacters, red, when the interaction energy is between 0.5 and 1.0 pK units.) The identity of the strong interacters is given in Table 1. Clusters of signatures discussed in the text are indicated by gray shading; these are also shaded in Table 1.

Frequencies for residues within the signature containing the modifiable cysteine were calculated (Table 2). One observation is immediately surprising: Three of the four charged residues, Asp, Glu, and Lys, are found in active site signatures significantly less often than would be expected by chance. The frequency of occurrence of Arg in the signatures is about the same as its occurrence in the entire modifiable protein set. If charged residues were a key feature in determining cysteine reactivity, one would expect these residues to be overrepresented. Residues that are overrepresented in the signatures include the hydrogen-bonding residues—particularly Thr, Ser, and His—suggesting that hydrogen bonding might play an important role in cysteine reactivity.

Table 2. Frequencies of residue occurrence in signatures and adjacent to the modifiable Cys

Frequencies for specific residues directly adjacent to the modifiable cysteine in the sequence were also calculated (Table 2). Given the caveat of small numbers, three residues are overrepresented in this analysis with frequencies of adjacent residues greater than 2.7 in each case: His and Met N-terminal and Trp C-terminal to the modifiable cysteine. His was observed as an N-terminal adjacent residue seven times in mostly unrelated signatures (Fig. 3) and, thus, its overrepresentation in this position is statistically significant, suggesting His-Cys might be a predictive motif. Adjacent Met and Trp residues were only observed two times each, an insufficient number of observations to draw conclusions. Seven residues were never observed directly N-terminal to the modifiable cysteine (Glu, Arg, Trp, Phe, Pro, Cys, and Ile), and two were never observed directly C-terminal (Cys and Asn) (Table 2). Because of the small numbers, drawing conclusions may be premature, but in one case the difference in occurrence between the directly N- and C-terminal residues is statistically significant: Pro occurs nine times on the C-terminal side, but never on the N-terminal side. These observations, and the data provided in Table 2, provide initial clues into potentially important features for the prediction of modifiable cysteines.
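The over- and underrepresentation comparison can be illustrated with a short sketch. It assumes that representation is measured as the ratio of a residue's frequency in the signatures to its frequency in the full protein set; the exact statistic behind Table 2 is not restated here, and the function name and toy sequences below are hypothetical.

```python
from collections import Counter

def enrichment(signature_seqs, background_seqs):
    """Ratio of each residue's frequency in signatures to its background frequency.

    A ratio above 1 suggests overrepresentation in the signatures and a ratio
    below 1 suggests underrepresentation (assumed definition for this sketch).
    """
    sig = Counter("".join(signature_seqs))
    bg = Counter("".join(background_seqs))
    sig_total = sum(sig.values())
    bg_total = sum(bg.values())
    ratios = {}
    for aa, bg_count in bg.items():
        sig_freq = sig.get(aa, 0) / sig_total
        bg_freq = bg_count / bg_total
        ratios[aa] = sig_freq / bg_freq
    return ratios

# Invented toy data, not taken from Table 1 or Table 2.
toy_signatures = ["HCT", "TCS"]
toy_proteins = ["HCTDDEEK", "TCSKKDEA"]
toy_ratios = enrichment(toy_signatures, toy_proteins)
for aa in sorted(toy_ratios):
    print(aa, round(toy_ratios[aa], 2))
```

With so few observations per residue, as in the adjacent-position counts above, such ratios should be read alongside the raw counts rather than on their own.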

Figure 3.

An example of structural polymorphism, one cause for differences between functional site signatures of otherwise related proteins. (A) Proteins 1n8jA (cyan ribbon) and 1kygA (yellow ribbon) are both alkyl hydroperoxide reductase AhpC proteins. These two proteins are identical in sequence and largely similar in structure, with the exception of the protein C terminus and the region around the Cys-SOH site (areas indicated by arrows). The modifiable cysteine residue in 1n8jA is shown as blue ball-and-stick (in this case the Cys46 has been mutated to a Ser), while in 1kygA is shown in red (in this case, disulfide bonded to the resolving cysteine, though that subunit is not shown in this figure). The weakly interacting Thr residue common to both signatures is shown in purple (1kygA) and pink (1n8jA). (B) Alignment of the functional site signatures of 1n8j (top) and 1kyg (bottom) shows the different sequence fragments identified as the signatures for these proteins. Although the proteins are mostly identical, the signatures are different due to the structural polymorphisms shown in A. Strong interacters are highlighted in yellow and weak interacters are highlighted in red. The modified cysteine is shown as a red character. Protein structures were prepared in VMD v1.8.3 (Humphrey et al. 1996).

Despite the profile diversity, groups of signatures within the profile appear to be related. If those similarities are significant, they would indicate common features necessary for cysteine modification within these clusters. A first step in identifying such common features is to explore the relationship between the overall protein sequence and the modifiable cysteine site. A second step is to explore other methods, such as inclusion of biophysical parameters, which would be useful in identifying similarities beyond simple sequence comparison of the signatures. The first step will be discussed in the next paragraphs, and the second will be explored by inclusion of electrostatics in the profile analysis, as described in subsequent sections.

To explore the relationship between the overall sequences and the signatures, we compared the pairwise sequence identity between signatures with the pairwise sequence identity between complete sequences. Exploration of this relationship will indicate whether the fragments proximal to the modifiable cysteines are more or less similar than the overall sequences. This comparison shows that the signature sequence identity is largely uncorrelated with the overall sequence identity (data not shown). There are three possible explanations for the lack of correlation between global sequence identity and signature identity for the same pair of proteins: (1) the signatures are trivially short, so sequence comparison between signatures is not significant; (2) structural polymorphism (differences in conformation) at the cysteine modification sites creates different signatures; and (3) the actual mechanism for cysteine modification is more or less similar than the overall identity between the two proteins would suggest. Prior knowledge of protein structure and function allows us to identify examples of each explanation, as discussed in the next three paragraphs.

The small size of some signatures results in a lack of information with which to create an accurate alignment; consequently, two signatures might appear more or less related than they actually are. An example includes signatures [1kyg, 1vkx] and [9pap, 1mem]: The 1kyg signature is 36% identical to signatures for 9pap and 1mem, while the 1vkx signature is 30% identical to the 9pap and 1mem signatures. The 1kyg and 1vkx signatures are only 10 and 11 residues in length, respectively (Fig. 2). Sequence identity of 30%–36% between such short fragments is not significant. Conclusions about mechanistic similarity for cysteine modification at such sites would be unwarranted.

Signatures are identified by structural criteria (Fig. 1); consequently, conformational differences (or structural polymorphism) at the modifiable cysteine site can make the signatures appear less related than they might actually be. Several examples of this effect are observed in the modifiable protein set. The most obvious example is 1n8j and 1kyg, both the bacterial peroxiredoxin, AhpC. These structures differ by a single residue change at the modifiable cysteine, and thus exhibit 99% global sequence identity; however, the two proteins exhibit different conformations near the modifiable cysteine site (red arrow and red/blue side chains, Fig. 3A). The structural polymorphism exists because the structures represent analogs of different conformations seen during the enzymatic reaction: 1kyg contains a disulfide bond between the modifiable cysteine and another mechanistically important cysteine in another subunit, while 1n8j is a mutant with a serine replacing the modifiable cysteine and, thus, no disulfide bond (Wood et al. 2003a). The structural polymorphism produces different functional site signatures with 81% pairwise sequence identity (Fig. 3B). Even though one of the “key” residues is an engineered serine rather than a cysteine and these residues are not aligned in superposition of the structures (Fig. 3A, cf. red and blue side chains), these residues are properly aligned in the profile (Fig. 3B). In addition, both signatures contain a threonine that is important for cysteine reactivity (discussed subsequently) and this threonine is aligned in the profile (Fig. 3B). This observation indicates that a profile can identify and align mechanistically important residues, even in cases where structural changes are observed; however, if conformational differences are observed in identical proteins, the resulting signatures will appear less similar than they actually are.

Similarity in the cysteine deprotonation mechanism, or lack thereof, is the third and most interesting explanation for observing or not observing a correlation between the sequence identities of full protein and signature. In this case, two signatures would exhibit more (or less) similarity than the overall sequence comparison would suggest, indicating that mechanisms by which the reactive cysteine is modified to Cys-SOH are similar (or are not similar). Such observations would potentially allow identification of common features of the modification mechanism that would not be obvious from the overall sequence comparison. The proteins 1hd2 and 1prx provide an example. Overall sequence identity between these two proteins is 8%, an insignificant value that would suggest no relationship; however, the sequence identity and profile score between the signatures are both significant at 41% and 0.38, respectively. The sequence similarity at the modifiable cysteine site is apparent from comparison of the signatures (Fig. 2, light-gray shading) and structures (Fig. 4). This observation makes sense: These proteins are peroxiredoxins (Prxs) V and VI, which exhibit a common mechanism of cysteine modification (Poole 2007). (Their electrostatic similarity extends and supports this observation and is discussed subsequently.) While this result is not new, the ability to identify previously known mechanistic similarities by comparison of functional site signatures demonstrates the method's utility and its potential for identifying novel similarities.

Figure 4.

Comparison of the functional sites of 1prx, 1hd2, and 1j0x shows similarities in the structural location of key functional residues, despite the lack of overall sequence similarity. The backbone structure for 1prx, Prx VI (A), 1hd2, Prx V (B), and 1j0x, GAPDH (C) are shown as white ribbons; segments that comprise the functional site signature are colored cyan. (The signatures themselves are aligned in Fig. 2, with 1prx and 1hd2 shaded in light gray and 1j0x shaded in dark gray.) Consistent with the coloring in Figure 2, side chains of strongly interacting residues are colored yellow, and weakly interacting residues are colored red. The modifiable Cys at these functional sites are shown as blue van der Waals side chains (Cys 47, Cys 47, and Cys 149, respectively). Strongly interacting residues Arg 132 and Thr 44 (1prx) and Arg 127 and Thr 44 (1hd2) are shown as yellow side chains; these align nicely in both the signature and the structure of the two proteins. Although Tyr 317 (red side chain) of 1j0x does not align with Arg 132 and Arg 127 in the signatures, it does share a similar position in structure. The strong interacter Glu 50 (1prx, yellow) and aligned weak interacters His 51 (1hd2, red) and Cys 153 (1j0x, red) are also located in structurally similar positions. The strong interacters Ser 72 (1prx, yellow) and His 176 (1j0x, yellow), and weak interacter Cys 72 (1hd2, red) align within the sequence signature (Fig. 2), are not in exactly the same position, but are all found in a β strand adjacent to the active site. Overall, the proteins exhibit low sequence identity (1prx and 1hd2: 8%; 1prx and 1j0x: 7%; and 1hd2 and 1j0x: 11%), but the structural similarity between the proteins, particularly at the functional site identified in the signature comparison in Figure 2, can be observed in the structures.

Functional site profiling focuses on sequence similarities. Identification of mechanistic similarities at diverse sites, such as the Cys-SOH modification sites, requires additional information. As part of the mechanism of Cys-SOH formation through reaction with substrates such as hydrogen peroxide, the cysteine is likely to be deprotonated (Wood et al. 2003b); thus, decreased cysteine pKa facilitates the reaction. The pKa, and residues that cause its shift, are thus ideal biophysical features with which to further characterize these sites. As a first approach to including additional features in profiles, we explore the pKa of the modifiable cysteine and the residues that cause its pKa to be shifted using newly validated cysteine parameters and methods (F.R. Salsbury Jr., L.B. Poole, and J.S. Fetrow, in prep.).

Electrostatic characterization of Cys-SOH modification sites

A pKa was calculated for the reduced cysteine at each Cys-SOH modification site in each structure (Table 1). As expected, the calculated pKa distribution for modifiable cysteines is shifted downward relative to that of a control set, from 8.14 for the control set to 6.9 for the modifiable set (F.R. Salsbury Jr., L.B. Poole, and J.S. Fetrow, in prep.). This result is consistent with the molecular function of these sites, in which the first step in the typical oxidation mechanism is cysteine deprotonation (Wood et al. 2003b).
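To give a sense of what such a shift means for deprotonation, the sketch below converts the two pKa values quoted above into the fraction of cysteine present as thiolate. It assumes simple Henderson-Hasselbalch behavior for an isolated titrating group and a pH of 7.0; both are illustrative assumptions rather than part of the calculations reported here.

```python
def thiolate_fraction(pka, ph=7.0):
    """Fraction of a thiol in the deprotonated (thiolate) form at a given pH.

    Uses the Henderson-Hasselbalch relation for a single titrating group;
    pH 7.0 is an assumed, illustrative value.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

control_fraction = thiolate_fraction(8.14)    # control-set value from the text
modifiable_fraction = thiolate_fraction(6.9)  # modifiable-set value from the text
print(f"control: {control_fraction:.2f}, modifiable: {modifiable_fraction:.2f}")
```

Under these assumptions, a pKa of 8.14 corresponds to roughly 7% thiolate and a pKa of 6.9 to roughly 56%, so a shift of about 1.2 pK units changes the deprotonated population several-fold.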

All but eight modifiable cysteines exhibit pKas lower than the mean of a control protein set (8.14) and nearly half, 20 cysteines, exhibit pKas shifted more than 1.5 below the mean. The eight with increased pKas ranging from 8.2 to 10.4 are: 1fzj (H-2 class-1 histocompatibility antigen K-B), 1q79 (polynucleotide adenylyltransferase), 1g55 (DNA cytosine methyl transferase), 1qvz (gene product of YDR533C, function unknown), 1hku (C-terminal binding protein 3), 1d8c (malate synthase G), 1vhq (enhancing lycopene biosynthesis protein 2), and 1fva (methionine sulfoxide reductase, Msr) (Table 1). Only the last two exhibit pKas greater than one standard deviation from the mean for a control cysteine data set. There are two potential explanations for increased cysteine pKas. First, protein conformational changes and local flexible loops modulate the pKa of the modifiable cysteine, an aspect not captured using static structures. Second, Cys-SOH generation could be a result of modification in the synchrotron and, thus, would likely occur by a mechanism (e.g., hydroxyl radical attack) (Xu and Chance 2005) different from a typical biological oxidation mechanism (e.g., oxidation by hydrogen peroxide) (Wood et al. 2003b). Seven of these eight protein structures were solved at synchrotrons, increasing the possibility of oxidative modification during data collection. Flexibility is likely the explanation for the eighth protein, Msr, as a nearby loop is found in multiple conformations in different crystal structures.

Again, we can use prior knowledge of protein structure and function to illustrate the utility of electrostatic analysis for characterizing modifiable cysteine microenvironments. The modifiable cysteines in protein tyrosine phosphatase 1B (1oet) and the RNA triphosphatase domain of mRNA capping enzyme (1i9t) exhibit calculated pKas of less than one (Table 1). While this pKa is nonbiological (F.R. Salsbury Jr., L.B. Poole, and J.S. Fetrow, in prep.), the result indicates the extremely low probability of cysteine protonation in a physiological environment, consistent with the mechanism of Cys-SOH formation. The intrinsic pKa (not including titratable residues) of both phosphatase cysteine microenvironments is low, 6.1 and 5.1 for 1oet and 1i9t, respectively. The intrinsic pKa is a result of significant electrostatic interactions with nontitrating groups, including dipoles generated by the partial charges of the backbone atoms in an adjacent loop. Interactions with titrating residues lower the pKa even further, by 5.3 and 4.9 pH units for 1oet and 1i9t, respectively (Table 1). We identify Arg 221 and Ser 222 as significant interacting residues in 1oet and Thr 133 in 1i9t (Table 1). The identification of Arg 221 is in agreement with previous work, which compared three phosphatases and determined that the only ionizable side chain significantly influencing the cysteine pKa was this conserved arginine (Peters et al. 1998); however, the previous work did not determine the interaction of Ser or Thr with the active site Cys by treating them as titratable in the calculations. This is a second indication of the importance of serine and threonine and is consistent with the residue frequencies (Table 2).
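The arithmetic behind the two phosphatase values can be checked directly. The sketch assumes that the final calculated pKa is the intrinsic pKa minus the lowering contributed by titrating residues, which is our reading of the numbers quoted above; the variable names are illustrative.

```python
# (intrinsic pKa, lowering due to titrating residues), values from the text.
phosphatase_sites = {
    "1oet": (6.1, 5.3),
    "1i9t": (5.1, 4.9),
}

# Assumed relation: net pKa = intrinsic pKa - shift from titrating residues.
net_pka = {
    pdb_id: round(intrinsic - shift, 1)
    for pdb_id, (intrinsic, shift) in phosphatase_sites.items()
}
print(net_pka)
```

Both net values fall below one, in line with the calculated pKas reported for these two sites.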

Qualitatively, the electrostatic calculations on the modifiable cysteine sites are consistent with the expected: an overall decrease in the modifiable cysteine pKa. While the calculations cannot exactly pinpoint the cysteine pKa, they can help us to identify residues important to cysteine reactivity.

Effect of threonine residues on the pKa of Cys-SOH modification sites

The calculations were first performed without considering Thr as an ionizable residue, a standard assumption given its high pKa (given in standard tables as 15.0). Residue frequency calculations indicate that Thr is overrepresented in the signatures compared with its overall presence in these proteins (Table 2), suggesting the importance of hydrogen bonding. To quantify the interaction and effect of Thr on the modifiable cysteine, electrostatics were recalculated including Thr as an ionizable residue.

The inclusion of Thr significantly lowers the calculated pKas for the modifiable cysteine in three proteins and lowers it somewhat for 10 more (Table 1). Altogether, the Thr is important for maintaining the modifiable cysteine pKa in six Prx crystal structures. This observation suggests that this Thr, which is essentially invariant across all classes of Prxs (Hofmann et al. 2002), is of considerable importance in determining the electrostatic properties of these functional sites. In fact, mutations at this position indicate the importance of the Thr hydroxyl group, as replacement by serine or valine yields an active or inactive enzyme, respectively, in studies of a Leishmania donovani Prx (Flohe et al. 2002). Again, this post hoc “prediction” supported by experimental evidence indicates the utility of the method for analysis of functional site features.

The pKas of seven other proteins are also affected by inclusion of Thr: 1qvz (product of gene YDR533C, an unknown protein from Saccharomyces cerevisiae), 1j0x (glyceraldehyde-3-phosphate dehydrogenase), 1d8c (malate synthase G), 1qwi (OsmC hydrogen peroxide reductase), 1gsn (glutathione reductase), 1ekf (branched chain amino transferase), and 1i9t (RNA triphosphatase). There is no obvious commonality between these proteins, except that all contain sites of Cys-SOH modification. Thr, its hydroxyl group, and hydrogen-bonding capabilities are thus predicted to play an important role in these protein functions and cysteine deprotonation.

Comparison of functional site profiling and electrostatic analysis

To accomplish our long-term goals of developing methods to predict Cys-SOH modification sites, we must more fully identify the sequence, structure, and electrostatic features that are required for propensity toward cysteine modification. Toward this end, we combined functional site profiling, which is sequence based, with information about electrostatics. First, the complete functional site profile (Fig. 2) was clustered based on pairwise active site profile scores (Cammer et al. 2003). Second, as part of the electrostatic calculations, residues that affect the pKa of the modifiable cysteine were also identified (see Materials and Methods). Residues were divided into two classes: strong interacters, where the interaction energy is >1.0 pK units; and weak interacters, where the interaction energy is between 0.5 and 1.0 pK units. Strong interacters are listed in Table 1 and colored yellow in Figure 2. Weak interacters are indicated in red in the functional site profile (Fig. 2). We then compared sequence-based similarities with similarities in the location of interacting residues.
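The two-class division of interacting residues can be sketched as a simple threshold rule. The residue labels and energies below are hypothetical. Taking the magnitude of the interaction energy, and assigning the boundary values 0.5 and 1.0 to the weak class, are assumptions made for illustration only.

```python
from typing import Optional

# Cutoffs in pK units, as stated in the text.
STRONG_CUTOFF = 1.0
WEAK_CUTOFF = 0.5


def classify_interacter(energy_pk: float) -> Optional[str]:
    """Return 'strong', 'weak', or None for a site-site interaction energy.

    Assumption: the magnitude of the energy is compared with the cutoffs,
    and the boundary values 0.5 and 1.0 count as weak.
    """
    magnitude = abs(energy_pk)
    if magnitude > STRONG_CUTOFF:
        return "strong"
    if magnitude >= WEAK_CUTOFF:
        return "weak"
    return None


# Hypothetical interaction energies (pK units) for four made-up residues.
example_energies = {"site_a": 3.2, "site_b": -1.4, "site_c": 0.7, "site_d": 0.1}

classes = {name: classify_interacter(e) for name, e in example_energies.items()}
for name, label in classes.items():
    print(name, label)
```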

Of the 210 interacting residues identified by the electrostatic calculations, 169 are found in the functional site signatures. Thus, as expected, most (80%) of the residues that shift the modifiable cysteine pKa are located within 10 Å of that cysteine. However, the converse is not true—not all titratable residues within 10 Å of the cysteine influence its pKa. A total of 166 titratable residues are found in the functional site signatures that are not interacting residues, compared with 169 titratable residues that are; thus, only about half of all titratable residues within 10 Å of the modifiable cysteine are important for its pKa shift. Proximity of side chain centers of mass of these residues is therefore not the only parameter affecting the cysteine pKa. We observed a number of causes for this noninteraction, including hydrogen-bonding residues pointing away from the modifiable cysteine and other protein atoms located between the titratable residue and the modified cysteine.
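The two proportions quoted in this paragraph follow directly from the residue counts. The short check below uses only the numbers given in the text; the variable names are illustrative.

```python
# Counts quoted in the text.
interacting_total = 210                        # all interacting residues
interacting_in_signature = 169                 # interacters found in signatures
noninteracting_titratable_in_signature = 166   # titratable, but not interacting

# Share of interacting residues that lie within the signatures.
fraction_interacters_in_signature = interacting_in_signature / interacting_total

# Share of titratable signature residues that actually interact.
titratable_in_signature = (
    interacting_in_signature + noninteracting_titratable_in_signature
)
fraction_titratable_interacting = interacting_in_signature / titratable_in_signature

print(f"{fraction_interacters_in_signature:.1%} of interacters are in signatures")
print(f"{fraction_titratable_interacting:.1%} of titratable signature residues interact")
```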

Comparison of the interacting residues with the functional site profile reveals several commonalities. Recall that Prxs V and VI (1hd2 and 1prx, respectively) exhibit an overall sequence identity of only 8%, whereas their functional site signatures share 41% sequence identity and a significant profile score of 0.38, indicating that the cysteine modification sites are similar. The electrostatic analysis identifies four aligned interacting residues: Thr; Arg; His/Glu; and Cys/Thr (Table 1 and Fig. 2, light-gray shading). The mechanism for shifting pKa thus appears to be shared among these proteins. The similarity in the structure of their functional sites is apparent (Fig. 4). These potential common mechanistic determinants were revealed by comparison of the functional site signatures and the pKa analysis, and not by overall sequence analysis (overall sequence identity is only 8%). The result is consistent with known mechanisms of these Prxs (Choi et al. 1998; Declercq et al. 2001).

Other Prxs provide further examples, including 1n8j (AhpC), 1qmv (thioredoxin peroxidase B, also known as PrxII), and 1e2y (tryparedoxin peroxidase) (Fig. 2, light-gray shading). Positions of two of the four interacting residues are conserved in all of these proteins: the Thr located three residues N-terminal to the modifiable cysteine, and the His/Glu located several residues C-terminal to the modifiable cysteine. Other interacting residues are common only to a subset of Prxs; for example, 1n8j (AhpC) and 1qmv (PrxII) contain six conserved interacting residue positions (Fig. 2), suggesting a common mechanism for cysteine deprotonation in these two Prxs.

Interestingly, GAPDH (1j0x), which is not a Prx, is found amidst this Prx cluster and exhibits several common interacting residue positions (cf. functional site signatures, shaded dark and medium gray, Fig. 2). The most obvious similarities exist within the helix containing the modifiable cysteine (Fig. 4). All three proteins have an interacting residue approximately one helical turn from the cysteine on the C-terminal side (Glu 50 in 1prx, His 51 in 1hd2, and Cys 153 in 1j0x). Also, 1prx Thr 48 shares a similar position in the structure to 1j0x Thr 150. Two other notable structurally similar interacting residues are observed (Fig. 4). First, all three proteins contain an interacting residue on a β strand directly behind the cysteine: Ser 72 of 1prx, Cys 72 of 1hd2, and His 176 of 1j0x. Second, residue Tyr 317 in 1j0x is located near the cysteine in a similar manner as Arg 132 in 1prx and Arg 127 in 1hd2, a residue thought to be important in stabilizing the deprotonated cysteine in these Prxs (Wood et al. 2003b; Copley et al. 2004). These observations suggest that these common features might play similar roles in cysteine deprotonation in these proteins.


Functional site profiling has previously been applied to enzyme active sites (Cammer et al. 2003; Baxter et al. 2004), sites that are closely related compared with post-translational modification sites. Analysis of post-translational modification sites is a more difficult problem because of the diversity of the modifiable sites. Here, we have applied functional site profiling to Cys-SOH post-translational modification sites. As expected, the individual signatures are diverse and the known data set is currently small. Attempts to align the entire profile (Fig. 2) illustrate the difficulties in using a sequence-only based method to align these diverse signatures. Pairwise analysis of the functional site profile scores can be used to identify potentially related subgroups, but sequence-only methods are not powerful enough to elucidate similarities at these sites. Additional features must be included in the scoring function to better identify similarities between the signatures.

Previous observation of protein structure has suggested that the microenvironment that stabilizes Cys-SOH is characterized by three features: (1) lack of solvent accessibility to the modified cysteine; (2) lack of nearby reduced cysteines; and (3) local hydrogen-bonding residues that stabilize the sulfenate form (Claiborne et al. 1993, 1999). Others have suggested the helix dipole (Hol et al. 1978; Hol 1985; Iqbalsyah et al. 2006), interaction with histidine (Polgar 1974; Lo Bello et al. 1993), or ion pair formation (Griffiths et al. 2002) as important features in cysteine reactivity. Integration of profiling with electrostatic analysis across the entire modifiable protein set allows us to comment generally on these observations and to identify possible mechanistic determinants for cysteine deprotonation. First, solvent accessibility does not appear to be a key feature of these sites. Second, lack of other nearby cysteines does not appear important for reactivity. If the modifiable cysteine is excluded from the counts, the frequency of occurrence of cysteines in signatures is 1.2, just slightly above its average occurrence in this set of proteins overall. (The absence of proximal cysteines could still account for Cys-SOH stability, which is not addressed by the current studies.) Third, titratable, polar, but not necessarily charged, residues are important. Polar residues are overrepresented, but three of four charged residues are significantly underrepresented in the signatures (Table 2). In addition, the cysteine pKa can be decreased without nearby charged or titratable residues (for example, 1fnjA and 1qq2A signatures in Fig. 2). Subtle polar interactions, influenced by local backbone conformation and local hydrogen-bonding patterns, thus appear to be important in decreasing the cysteine pKa. Fourth, histidines, in particular, may be a key feature of many modifiable sites. They are overrepresented in both functional site signatures generally and N-terminally adjacent to the modifiable cysteine specifically. Fifth, we have observed important interactions with Thr that shift pKas in some of these proteins (Table 1). Thr is overrepresented in these signatures overall (Table 2). We suggest that Thr can play an important role in cysteine modification and should be included in future calculations, not because Thr can ionize in physiological environments, but because the calculation can identify and quantify important interactions between the Thr hydroxyl and pKas of other residues.


This study provides a first general analysis of known Cys-SOH modification sites in proteins. From the structure database and experimental evidence, we have identified a set of proteins containing modifiable cysteines. We have reported here a detailed sequence, structural, and electrostatic analysis of these sites, and have calculated the cysteine pKas. With this study as a baseline, we can now begin to include and compare sequence and electrostatic features of other cysteine modifications such as those being identified by specific chemical probes (Moos et al. 2003; Poole et al. 2005; Dennehy et al. 2006; Greco et al. 2006), with an aim toward understanding how specificity of cysteine modification is determined. The features identified in this study will aid in prediction of protein sequences that contain modifiable cysteine sites, an important step in the identification of “candidate proteins” for the design of experiments characterizing redox signaling pathways. Analysis of the signatures and the interactions with the modifiable cysteines will aid researchers in understanding the modification mechanisms important in redox signaling and disease.

Materials and Methods

Protein data sets

The modifiable protein set consists of proteins containing one or more cysteines that are known to be modifiable to Cys-SOH: a total of 47 proteins and 49 modifiable sites (Table 1). All of these proteins are of known structure and were taken from PDB release Jan 2005 (Berman et al. 2002). Some are proteins for which Cys-SOH is generated as part of their biological function and/or there is significant biochemical evidence supporting Cys-SOH formation; others actually exhibit a Cys-SOH in their structure. Additional details about this data set can be found in the Supplemental material.

Protein sequence and structure analysis

Calculations of secondary structure, cysteine side-chain solvent accessibility, and amino acid frequency are described in the Supplemental material.

Identification of functional site signatures and profiles

Functional site signatures and profiles were created and clustered for modifiable cysteine sites in the modifiable protein set using the procedures previously described (Cammer et al. 2003; Fetrow 2006) and outlined in Figure 1. Additional details can be found in the Supplemental material.

Calculation of cysteine pKas

pKas were calculated using the MEAD multiflex package (Bashford 1997), essentially as previously described (F.R. Salsbury Jr., L.B. Poole, and J.S. Fetrow, in prep.). Ser, Asp, Arg, Glu, His, Cys, Lys, Tyr, and Thr residues were considered titratable. Parameters for the cysteine were taken as previously validated (F.R. Salsbury Jr., L.B. Poole, and J.S. Fetrow, in prep.). An intrinsic pKa was determined first for each titratable residue; this value includes the effects on the electrostatic free energies due to solvent accessibility of the titratable site and to its interactions with the partial charges of the backbone and nontitrating residues. The full pKa, which includes the effects of titratable groups, was then determined, either with Monte Carlo (Bashford 1997) or the reduced-site titration method (Bashford and Gerwert 1992; Antosiewicz et al. 1994; Bashford 1997; van Vlijmen et al. 1998). No model parameters existed for Thr, so they were developed as recently described for cysteines (F.R. Salsbury Jr., L.B. Poole, and J.S. Fetrow, in prep.). Details of the cysteine and threonine charge models can be found in Supplemental Table S1. Additional methodological details can be found in the Supplemental material.
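The step from an intrinsic pKa to a full pKa can be illustrated with a toy multi-site titration model. This sketch is not MEAD and makes several assumptions. Each site carries a unit charge in its ionized form. Site-site interactions are single numbers in pK units rather than Poisson-Boltzmann energies. Exhaustive enumeration of protonation states stands in for the Monte Carlo and reduced-site methods, which is feasible only for a handful of sites. The partner pKa of 12.0 and the interaction of 2.0 pK units are hypothetical values, and the function names are illustrative.

```python
from itertools import product


def state_energy(x, ph, pk_intr, q_deprot, w):
    """Energy of protonation state x in pK units (multiples of kT ln 10).

    x[i] is 1 if site i is protonated, else 0. q_deprot[i] is the charge of
    the deprotonated form (-1 for acids such as Cys, 0 for bases such as Arg).
    w[i][j] is the interaction between unit charges on sites i and j.
    """
    n = len(x)
    charges = [q_deprot[i] + x[i] for i in range(n)]
    # Cost of protonating each site on its own at this pH.
    energy = sum(x[i] * (ph - pk_intr[i]) for i in range(n))
    # Pairwise charge-charge interactions between titrating sites.
    for i in range(n):
        for j in range(i + 1, n):
            energy += w[i][j] * charges[i] * charges[j]
    return energy


def protonated_fractions(ph, pk_intr, q_deprot, w):
    """Boltzmann-averaged protonated fraction of each site at a given pH."""
    n = len(pk_intr)
    weights = {
        x: 10.0 ** (-state_energy(x, ph, pk_intr, q_deprot, w))
        for x in product((0, 1), repeat=n)
    }
    total = sum(weights.values())
    return [sum(wt for x, wt in weights.items() if x[i]) / total for i in range(n)]


# Toy system: a cysteine with the intrinsic pKa quoted for 1oet (6.1) and one
# hypothetical basic partner (pKa 12.0) interacting by 2.0 pK units.
pk_intr = [6.1, 12.0]
q_deprot = [-1, 0]
w = [[0.0, 2.0], [2.0, 0.0]]

for ph in (4.1, 6.1, 7.0):
    cys_protonated = protonated_fractions(ph, pk_intr, q_deprot, w)[0]
    print(f"pH {ph}: cysteine protonated fraction = {cys_protonated:.3f}")
```

In this toy system the cysteine is half protonated near pH 4.1 rather than at its intrinsic pKa of 6.1, because the protonated basic partner stabilizes the thiolate by the full interaction energy.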

Interaction between modifiable Cys residues and other titratable residues

The initial step in identifying residues that affect the modifiable Cys pKa is construction of a fully protonated protein reference state. The pKa calculations were performed using MEAD (Bashford 1997). To calculate interaction energies, the electrostatics were calculated with each titratable residue (one at a time) in the protonated and deprotonated forms. These site–site interaction energies were used to identify interacting residues, or interacters: those side chains that interact with the residue of interest and participate in its pKa shift. Residues interacting with the modifiable Cys are identified as strong interacters when the interaction energy is >1.0 pK units, and as weak interacters when the interaction energy is between 0.5 and 1.0 pK units.


We thank Mick Knaggs and Todd Lowther for helpful discussions. We acknowledge support from both the NIH (R21 CA112145) to L.B.P. and the NSF (MCB-0517343) to J.S.F. These calculations were performed on Wake Forest University's DEAC cluster, including a SUR grant from IBM for storage hardware; the support of the Wake Forest IS department is gratefully acknowledged.