Mapping of disulfide bonds with the aid of virtualmslab
Disulfide bonds in proteins can be mapped by mass spectrometric identification of the corresponding digest peptides. This requires efficient cleavage between the cysteine-containing sections of the protein while leaving the disulfide bridges intact. However, disulfide-bonded proteins often have a rigid structure that renders the native protein resistant to cleavage by proteases. In that case, chemical cleavage may be considered, such as the use of cyanogen bromide to cleave at methionine residues, or treatment at pH 2 and elevated temperature to cleave peptide bonds at the C- or N-terminal side of aspartate residues. RNase A was used as a model protein to illustrate how virtualmslab can aid the development of a procedure for mapping disulfide bonds in a rigid, protease-resistant protein. Virtual experiments with the virtualmslab program showed that MALDI-MS-detectable fragments, with masses ranging from ∼800 to ∼4000 atomic mass units, could be generated by initial specific acid cleavage in front of and behind aspartate residues [17,18] to break up the rigid protein, followed by tryptic cleavage, which takes place behind lysine and arginine residues.
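The virtual digestion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the virtualmslab implementation: the function names are hypothetical, acid cleavage is assumed to occur on both sides of every D and trypsin after every K/R (no proline rule), partial cleavage is approximated by allowing missed cleavage sites, and the ∼800–4000 window is applied to the monoisotopic [M+H]+ of the unmodified linear fragments, ignoring the disulfide bonds themselves.

```python
# Standard monoisotopic residue masses (Da)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per linear peptide
PROTON = 1.00728   # for the [M+H]+ ion observed in MALDI


def cleavage_sites(seq: str) -> list[int]:
    """Bond positions i (cut between seq[i-1] and seq[i]) for the assumed chemistry."""
    sites = set()
    for i, aa in enumerate(seq):
        if aa == "D":
            if i > 0:
                sites.add(i)          # acid cleavage N-terminal to Asp
            if i + 1 < len(seq):
                sites.add(i + 1)      # acid cleavage C-terminal to Asp
        elif aa in "KR" and i + 1 < len(seq):
            sites.add(i + 1)          # trypsin: C-terminal to Lys/Arg
    return sorted(sites)


def digest(seq: str, max_missed: int = 1) -> list[tuple[int, str]]:
    """All (start, fragment) pairs that span at most max_missed uncut sites."""
    bounds = [0] + cleavage_sites(seq) + [len(seq)]
    fragments = []
    for a in range(len(bounds) - 1):
        for b in range(a + 1, min(a + 2 + max_missed, len(bounds))):
            fragments.append((bounds[a], seq[bounds[a]:bounds[b]]))
    return fragments


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def detectable(seq: str, max_missed: int = 2, lo: float = 800.0, hi: float = 4000.0):
    """Fragments whose [M+H]+ falls inside the assumed MALDI window."""
    return [(start, pep) for start, pep in digest(seq, max_missed)
            if lo <= peptide_mass(pep) + PROTON <= hi]
```

Calling `detectable(sequence)` on a full protein sequence lists the fragments expected in the window; raising `max_missed` mimics more incomplete cleavage and produces more overlapping peptides.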
Experimentally, RNase A was cleaved by treatment at pH 2, followed by trypsin digestion and mass analysis of the resulting peptide mixture. Based on a single MALDI-FTICR mass spectrum, virtualmslab assigned 42 fragments within a mass window of 4 p.p.m., corresponding to a sequence coverage of > 90%. Figure 1 shows part of the output sheet for the assignment, which was carried out in three quests. The first quest matches the masses of all unmodified peptides (specified by the question mark) to the experimental masses. The second quest matches the combined masses of all pairs of cysteine-containing peptides, minus the mass of two hydrogen atoms, and thereby assigns the disulfide-linked peptides. The third quest matches the masses of all cysteine-containing peptides minus the mass of two hydrogen atoms, and thereby assigns the peptides with an internal disulfide link (which requires at least two cysteines in the same peptide).
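The three quests can be sketched as a simple mass-matching loop. This is a hypothetical illustration, not virtualmslab code: it assumes the observed values are neutral monoisotopic masses (i.e. [M+H]+ minus a proton), pairs only distinct peptides in quest 2, restricts quest 3 to peptides with at least two cysteines, and does not exclude overlapping peptides that share the same cysteine, which a real tool would have to do.

```python
from itertools import combinations

# Standard monoisotopic residue masses (Da)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
H_ATOM = 1.007825  # two H atoms are lost when two thiols form one disulfide


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def run_quests(peptides, observed, tol_ppm=4.0):
    """Return (quest, label, observed mass, error in p.p.m.) for every match."""
    # Quest 1: unmodified peptides
    candidates = [("1", p, peptide_mass(p)) for p in peptides]
    # Quest 2: two distinct Cys-containing peptides joined by one S-S bond
    cys = [p for p in peptides if "C" in p]
    for a, b in combinations(cys, 2):
        candidates.append(("2", f"{a}+{b}", peptide_mass(a) + peptide_mass(b) - 2 * H_ATOM))
    # Quest 3: internal S-S bond, which needs at least two Cys in one peptide
    for p in peptides:
        if p.count("C") >= 2:
            candidates.append(("3", p, peptide_mass(p) - 2 * H_ATOM))

    hits = []
    for m in observed:
        for quest, label, theo in candidates:
            err = (m - theo) / theo * 1e6
            if abs(err) <= tol_ppm:
                hits.append((quest, label, m, round(err, 2)))
    return hits
```

In practice the peptide list would come from a virtual digest and the observed masses from the peak list; widening `tol_ppm` shows how quickly ambiguous assignments appear.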
From the assignments, a peptide map was constructed, as shown in Fig. 2. Because cleavage at both D and R/K was partial, many overlapping peptides were observed. About 80% of all peaks in the MALDI-FTICR mass spectrum with an intensity above 5% of the base peak could be assigned assuming cleavage at D or K/R, which demonstrates the high specificity of chemical cleavage at aspartate residues. We were aware that asparagine residues might deamidate and that the resulting aspartate residues might subsequently be partially cleaved. However, a virtualmslab analysis allowing partial conversion of N to D, followed by partial cleavage at the newly formed D residues, found no matches for the resulting peptides, indicating the absence of significant deamidation under our experimental conditions. Of the 42 assigned fragments, 23 were unambiguously attributed to peptides with a correct disulfide bridge, given the four disulfide linkages in RNase A: three to the C26–C84 linkage, 12 to C40–C95, four to C58–C110 and four to C65–C72. Several disulfide-linked peptides were also present as free SH-containing peptides, indicating partial in-source reduction of disulfides. Notably, this phenomenon allows pairs of in-source reduction products to be assigned to the corresponding disulfide-linked peptides: because two H atoms are incorporated, the sum of the masses of the two products is 2 atomic mass units higher than the mass of the parent compound. This information can be used to confirm the results of the virtualmslab analysis.
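The in-source reduction check (two product masses summing to the parent mass plus two H atoms) can be sketched as a search over the peak list. The function name is hypothetical, the masses are assumed to be neutral monoisotopic values, the 4 p.p.m. tolerance is borrowed from the assignment window mentioned earlier, and the brute-force triple loop is chosen for clarity rather than speed.

```python
H_ATOM = 1.007825  # monoisotopic mass of one hydrogen atom (Da)


def reduction_pairs(masses, tol_ppm=4.0):
    """Return (parent, a, b) triples where peaks a and b could be the two
    in-source reduction products of a disulfide-linked parent: a + b = parent + 2 H."""
    ms = sorted(masses)
    found = []
    for parent in ms:
        target = parent + 2 * H_ATOM
        for i, a in enumerate(ms):
            for b in ms[i + 1:]:  # distinct peaks only; symmetric dimers are skipped
                if abs(a + b - target) / target * 1e6 <= tol_ppm:
                    found.append((parent, a, b))
    return found
```

Each triple returned is independent support for a disulfide-linked assignment made by the mass-matching quests.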
Figure 2. Peptide map constructed from the virtualmslab assignments of the MALDI-FTICR-MS data of the RNase A digest peptide mixture. Column 1 shows the sequence with the four well-established disulfide links. Column 2 shows the peptides resulting from in-source MALDI reduction of S–S-linked peptides. Column 3 shows the linked peptides, clearly confirming all four established disulfide links. Column 4 shows the peptides associated with conflicting internal disulfide bridges.
Despite the overwhelming evidence for the correct disulfide linkages, virtualmslab assigned three minor peaks to peptides with conflicting disulfide linkages: two correspond to a peptide with an internal C40–C58 linkage, and the third to a peptide with an internal C84–C95 linkage. These species could be naturally occurring disulfide-bridge variants, or they could result from disulfide interchange reactions during the experiment. Disulfide interchange can in principle be catalysed by free thiols at neutral or high pH; if this were the case, a thiol scavenger should be able to prevent it. To investigate this possibility, we added N-ethylmaleimide (NEM) to the acid-cleaved RNase A preparation before the start of trypsin digestion at pH 8.0. At this pH, NEM reacts not only with SH groups but, to a lesser extent, also with amines. The presence of NEM during trypsin digestion therefore yields complex peptide mixtures, both because of partial modification at the amino terminus and at lysine residues, and because modification of lysine residues prevents cleavage by trypsin. Accordingly, a virtualmslab analysis that included NEM modifications in the match quests assigned no fewer than 84 peptides. Of these, 31 were free SH-containing peptides resulting from in-source decay, and 53 were correctly disulfide-linked species. No unambiguous evidence was found for peptides with internal C40–C58, C84–C95 or any other conflicting linkages under these conditions, indicating that these species, present in minor amounts in the absence of NEM, must have resulted from disulfide interchange reactions. A possible explanation is β-elimination under the alkaline conditions of trypsin digestion, which creates the catalyst needed for the interchange reaction.
Even trace amounts of free sulfhydryl groups can trigger a cascade of disulfide reshuffling, which may explain the minor formation of the detected peptides with an internal C40–C58 or C84–C95 disulfide bond. Ambiguities caused by these interchange reactions can be resolved by adding NEM before and during digestion.
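Why NEM inflates the number of assignable species can be illustrated by enumerating the mass variants of a single peptide. This is a hypothetical sketch: the adduct mass of 125.04768 Da (C6H7NO2) is an assumed standard value, every free thiol, the α-amine and every lysine are treated as equally modifiable, and the knock-on effect of modified lysines on tryptic cleavage (extra missed-cleavage peptides) is not modelled.

```python
NEM = 125.04768  # mass added per N-ethylmaleimide adduct, C6H7NO2 (assumed value)


def nem_sites(seq: str, n_free_cys: int) -> int:
    """Sites NEM may modify at pH 8: free thiols, the alpha-amine and Lys side chains.
    Cys residues tied up in disulfides are not counted; the caller supplies n_free_cys."""
    return n_free_cys + 1 + seq.count("K")


def nem_variants(neutral_mass: float, n_sites: int) -> list[tuple[int, float]]:
    """Mass of the peptide carrying 0..n_sites adducts; positional isomers share a mass."""
    return [(k, neutral_mass + k * NEM) for k in range(n_sites + 1)]


# Hypothetical peptide ACKLK with one free Cys and an illustrative mass of 1000 Da
for k, m in nem_variants(1000.0, nem_sites("ACKLK", n_free_cys=1)):
    print(k, round(m, 5))
```

Even this small peptide yields five distinct masses, which is consistent with the much larger number of assignments obtained when NEM modifications are included in the match quests.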
In conclusion, it appears that well-controlled acidic cleavage followed by tryptic digestion effectively breaks up the rigid RNase A molecule into MALDI-MS-detectable fragments while leaving the vulnerable disulfide bonds intact. The virtualmslab analysis of the data from a single MALDI mass spectrum, acquired with a high-performance FTICR mass spectrometer, unambiguously reveals the connectivity of all disulfide bonds.
Identification of cross-links in the NK1 domain of HGF/SF
HGF/SF and its receptor Met stimulate cell growth, cell differentiation and migration during embryogenesis. In cancer, they promote invasive growth into surrounding tissues and metastasis of the tumour. Both proteins are produced as inactive single-chain precursors, which upon cleavage form an active disulfide-linked α/β heterodimer. The structures of several individual domains of both Met and HGF/SF have been elucidated, but the 3-D structures of the full-length proteins have not yet been resolved. The NK1 domain of the α-chain of HGF/SF has been found to be the main site of interaction with Met, while the β-chain might make additional interactions. To obtain a model of the interaction between Met and HGF/SF, the complex has been subjected to solution-phase small-angle X-ray scattering (SAXS) (Gherardi, E., Sandin, S., Petoukhov, M. V., Finch, J., Öfverstedt, L.-G., Nunez, R., Blundell, T. L., Vande Woude, G. F., Skoglund, U. & Svergun, D. I., unpublished data). Experimentally determined constraints on the distances between amino acid residues should help to discard or confirm the solutions obtained by SAXS. Identifying the sites of artificially induced cross-links can provide such distance constraints, and with these constraints a detailed model of the interaction between the two proteins can be designed, based on the SAXS data and the known 3-D structures of the individual protein domains. Such a model will be of great value both for understanding how HGF/SF interaction with Met leads to receptor dimerization and signal transduction, and for designing Met inhibitors as anticancer drugs.
Mass spectrometric analysis of digests of cross-linked proteins is known to be a powerful way to identify sites of cross-linking [3,6,8]. However, identification of cross-linked sites in biological assemblies as complicated as the HGF/SF-Met complex is unprecedented. We used the amine-specific homobifunctional cross-linker bis(sulfosuccinimidyl)suberate (BS3). Besides reacting with amines, its activated ester groups are also susceptible to hydrolysis, which may lead to single labelling, i.e. modification of an amine without actual cross-linking. Clearly, the above analysis of the naturally occurring disulfide linkages in RNase A must be taken a step further for this complex, which is built from four peptide chains arranged as two disulfide-linked αβ heterodimers. The complex has more than 1600 amino acid residues, adding up to a mass of over 180 kDa, and it contains 98 lysine residues that can be heterogeneously cross-linked or singly labelled by the cross-linking reagent.
To anticipate the limitations of a mass spectrometric analysis of this complicated system, the analysis was first simulated with the virtualmslab program. The HGF/SF-Met complex was subjected in silico to reduction and alkylation of cysteine residues with iodoacetamide, followed by digestion with trypsin, allowing a maximum of three missed cleavages. Mass filtering between 200 and 4500 Da resulted in a digest mixture of 534 peptides. From this, a database was generated of all possible realistic peptide pairs linked by BS3 via their lysine residues, excluding C-terminal lysines at which trypsin had cleaved. The mass list was extracted from this database of 16 554 BS3-linked peptide pairs and taken as our virtual mass spectrum. Figure 3 shows the mass distribution of the cross-linked peptide pairs, illustrating that most of them have masses > 3000 Da. Each entry of this mass spectrum was then matched against the complete theoretical set of peptides, including unmodified peptides, peptides modified by a partially hydrolysed cross-linker, intrapeptide cross-linking products and interpeptide cross-linking products. The match lists all peptide candidates for the assignment of each mass in the spectrum as a function of the match mass window. It thus shows how many alternative candidate assignments can be anticipated if the experimental mass spectrum is searched for cross-linked peptides at a given instrumental mass accuracy. In Fig. 4, the results are summarized as the average number of peptide candidates for all 16 554 masses in the virtual spectrum, divided into four mass ranges, vs. the mass window. As expected, the number of candidates comes down to almost 1 for all mass ranges as the mass window is narrowed to 0 p.p.m. Still, for about 15% of the mass entries, an alternative candidate besides the authentic cross-linked peptide pair is given.
Most of these alternatives arise from shifted tryptic cleavage sites in peptides containing RK, RR, KK and KR sequence elements, which yield peptides with identical elemental compositions. Nevertheless, these alternative assignments pinpoint the same cross-link. When the mass window is widened, the number of candidate peptides gradually increases to 2.5 for the low-mass segment (1000–2000 Da) and appears to level off once the window becomes broader than ±60 p.p.m. The gradual increase indicates that in this mass range the density of candidate peptide masses is relatively low. The levelling-off shows that the alternative candidate peptide masses are distributed over a range about ±60 p.p.m. wide around the mass of the authentic cross-linked peptide pair. This limited width is a consequence of the known discontinuous mass distribution of peptides. For comparison, the gap between m/z 2000 and m/z 2001 is 500 p.p.m. For the highest mass segment (4000–5000 Da), which covers most of the cross-linked peptides (see Fig. 3), the number of peptide candidates increases rapidly with increasing mass window and only begins to level off, at about 10, beyond ±90 p.p.m. This indicates that in the higher mass range the density of candidate peptide masses is much higher and the width of the mass distribution has increased to over ±90 p.p.m. For comparison, the gap between m/z 5000 and m/z 5001 is 200 p.p.m.
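The virtual cross-linking experiment and the candidate count behind Fig. 4 can be sketched as follows. This is a hypothetical illustration, not the virtualmslab code: the BS3 mass increments (138.06808 Da for a complete link, 156.07864 Da for a hydrolysed mono-link) are assumed standard values, peptide masses are supplied precomputed, only zero or one BS3 modification per peptide is enumerated, and pairs of a peptide with itself are not generated.

```python
from bisect import bisect_left, bisect_right
from itertools import combinations

BS3_LINK = 138.06808  # C8H10O2 added by one complete BS3 cross-link (assumed value)
BS3_MONO = 156.07864  # C8H12O3 added by a hydrolysed, singly attached BS3 (assumed value)


def linkable_lysines(pep: str) -> int:
    """Internal Lys residues; a C-terminal K is excluded because trypsin is
    assumed not to cleave after a BS3-modified lysine."""
    return pep.count("K") - (1 if pep.endswith("K") else 0)


def theoretical_set(peptides):
    """peptides: list of (label, neutral_mass, n_linkable_K).
    Returns sorted (mass, label) for each candidate class named in the text."""
    out = []
    for label, m, nk in peptides:
        out.append((m, label))                              # unmodified
        if nk >= 1:
            out.append((m + BS3_MONO, label + "+mono"))     # single labelling
        if nk >= 2:
            out.append((m + BS3_LINK, label + "+intra"))    # intrapeptide link
    linkable = [p for p in peptides if p[2] >= 1]
    for (la, ma, _), (lb, mb, _) in combinations(linkable, 2):
        out.append((ma + mb + BS3_LINK, f"{la}x{lb}"))      # interpeptide link
    return sorted(out)


def mean_candidates(virtual_masses, theory, window_ppm):
    """Average number of theoretical candidates within ±window_ppm of each virtual mass."""
    if not virtual_masses:
        return 0.0
    masses = [m for m, _ in theory]
    total = 0
    for x in virtual_masses:
        half = x * window_ppm / 1e6
        total += bisect_right(masses, x + half) - bisect_left(masses, x - half)
    return total / len(virtual_masses)
```

Using the interpeptide entries as the virtual spectrum, sweeping `window_ppm` and binning the virtual masses by mass range would give curves of the kind summarized in Fig. 4; the sorted list plus `bisect` keeps each window query logarithmic in the size of the theoretical set.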
Figure 3. Calculated mass distribution of the BS3 cross-linked peptide pairs in the tryptic digest of the HGF/SF-Met protein complex, allowing cross-links between all lysine residues.
Figure 4. Calculated average number of peptide candidates within a mass window for different mass ranges in a tryptic digest mixture of the BS3 cross-linked HGF/SF-Met protein complex (for details, see text).
The above virtual analysis reveals that instrumental mass accuracy is crucial. For mass accuracies better than 2 p.p.m., such as can be obtained with a high-performance Fourier transform mass spectrometer, most identifications can be based on accurate mass, with additional tandem mass spectrometric validation. For mass accuracies better than 20 p.p.m., the identification is narrowed down to three or four possible candidates (see Fig. 4). This moderate number of alternative candidates should still allow unambiguous identification based on additional tandem mass spectrometric validation. It thus appears that a cross-linking approach to obtaining structural information on an assembly as complicated as the HGF/SF-Met complex is feasible, especially with adequate fractionation of the peptide mixture, e.g. by reversed-phase HPLC.
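The p.p.m. figures used in this discussion translate into absolute masses by simple arithmetic. The helper names below are hypothetical; the numbers reproduce the nominal 1 Da gaps quoted for m/z 2000 and 5000 and show how narrow 2 and 20 p.p.m. windows are at 4500 Da, the upper end of the mass filter used in the virtual digest.

```python
def ppm(delta_da: float, mass_da: float) -> float:
    """Mass difference expressed in parts per million of the reference mass."""
    return delta_da * 1e6 / mass_da


def window_da(mass_da: float, ppm_window: float) -> float:
    """Half-width in Da of a ±ppm_window match window centred on mass_da."""
    return mass_da * ppm_window / 1e6


print(ppm(1.0, 2000.0))        # 500.0, nominal 1 Da gap at m/z 2000
print(ppm(1.0, 5000.0))        # 200.0, nominal 1 Da gap at m/z 5000
print(window_da(4500.0, 2))    # 0.009 Da half-width at 2 p.p.m.
print(window_da(4500.0, 20))   # 0.09 Da half-width at 20 p.p.m.
```

Both windows are far smaller than the 1 Da nominal spacing, which is why the number of candidates depends on how densely peptide masses cluster around each nominal mass rather than on the nominal spacing itself.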
To test this finding experimentally, we carried out the mass spectrometric analysis of a cross-linked peptide mixture with a complexity at least equal to that of a reversed-phase HPLC fraction of a peptide mixture derived from cross-linked HGF/SF-Met. We chose the NK1 domain of HGF/SF as the test protein for these experiments. NK1, with 183 residues adding up to almost 22 kDa, is roughly one-tenth the size of the entire HGF/SF-Met complex and is therefore of similar complexity to an average reversed-phase HPLC fraction from the complex, assuming that the peptides are sorted into at least 10 fractions. Moreover, a 3-D structure of the NK1 domain is available, so that cross-link identifications can be validated. BS3 was used to covalently cross-link amines within the NK1 domain. Cross-linked and control preparations were subjected to SDS/PAGE. Subsequently, the protein bands corresponding to monomeric NK1 were treated with trypsin, and the resulting peptide mixtures were mass analysed. The processed MS data were loaded into the virtualmslab program and matched against the results of the corresponding virtual experiment. A total of 13 peaks in the MALDI-TOF mass spectrum of the cross-linked NK1 digest could be related to cross-linking products. Some of these peaks were matched with one or two alternative assignments within a mass window of ±30 p.p.m., corresponding to the mass accuracy of our MALDI-TOF instrument. As anticipated, the average number of possible peptide assignments found for cross-linked NK1 is smaller than the average of three candidate assignments found by the virtualmslab program for the entire HGF/SF-Met complex at a mass window of ±30 p.p.m. (Fig. 4).
Based on the peptide assignments, a list of candidate cross-links is given in Table 1. Four of these candidate cross-links were confirmed by tandem mass spectrometric analysis of the corresponding cross-linked peptides using either ESI-QTOF or MALDI-TOF/TOF (Fig. 5). These validated cross-links were mapped onto an available crystal structure of the protein (PDB: 1BHT). The measured distances between the amino groups were found to be compatible with the calculated distance of 11.4 Å that can be spanned by the BS3 cross-linker (Fig. 6). Another candidate cross-linked peptide pair, connecting K44 and K91, was assigned by virtualmslab. The tandem MS data allowed neither confirmation nor rejection of this assignment, leaving open the possibility that the peak corresponds to an unknown species. Nevertheless, this candidate cross-link also fits well into the 3-D structure of NK1 (Fig. 6).
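The structural check on candidate cross-links can be sketched as a distance test between the reacting amine nitrogens. This is a hypothetical illustration: it assumes the nitrogen coordinates (Lys NZ, or the backbone N of the N-terminal residue) have already been extracted from the structure file, the `slack` parameter for side-chain flexibility is an assumption not taken from the text, and residues missing from the structure, such as unresolved termini, are reported rather than scored.

```python
import math

BS3_SPAN = 11.4  # Å, the distance that can be spanned by BS3 as quoted in the text


def check_links(coords, links, max_span=BS3_SPAN, slack=0.0):
    """coords: residue label -> (x, y, z) of the reacting amine nitrogen.
    Returns (res1, res2, distance in Å, within span) for each link; residues
    absent from the structure give (res1, res2, None, None)."""
    results = []
    for r1, r2 in links:
        if r1 not in coords or r2 not in coords:
            results.append((r1, r2, None, None))
            continue
        d = math.dist(coords[r1], coords[r2])
        results.append((r1, r2, round(d, 1), d <= max_span + slack))
    return results


# Hypothetical coordinates; Y28 is deliberately absent, as for an unresolved terminus
print(check_links({"K137": (0.0, 0.0, 0.0), "K170": (6.0, 8.0, 0.0)},
                  [("K137", "K170"), ("Y28", "K132")]))
```

Returning `None` for unresolved residues keeps such candidates visible for separate interpretation instead of silently discarding them.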
Table 1. Candidate cross-links found in NK1 using BS3 as the cross-linking agent. The cross-link candidates were proposed by the virtualmslab program by assigning peaks in the MALDI-TOF mass spectrum of the tryptic digest of cross-linked NK1 to the corresponding cross-linked peptides. Residue Y28 is the N-terminal residue of the construct used.
| Residue 1 | Residue 2 | Assigned peaks (m/z) | Experimental mass discrepancy (p.p.m.) |
|---|---|---|---|
Figure 5. MALDI-TOF/TOF MS/MS analysis of an NK1 cross-linked peptide with m/z 2503.3, in which NK1 K137 is linked to NK1 K170 (see Table 1). (A) Structure of the cross-linked peptide; observed fragment ions are indicated. (B) MALDI-TOF/TOF MS/MS data; fragment ion annotations correspond to those in (A).
Figure 6. Space-filling model of the NK1 domain of HGF/SF (PDB: 1BHT). Four confirmed cross-links (solid lines) and one candidate cross-link (dashed line) are shown in this model. Measured distances between the linked amino acids are indicated. The arrows indicate the rotation between views A and B. The model was visualized using pymol (http://www.pymol.org).
The candidate cross-links in Table 1 suggest cross-linking between the N-terminal part of the protein [Y28 (N-terminus) and K34] and the region including K132, K137 and K170, which are close together. However, the first seven residues of the N-terminal region, amino acids 28–34, are not resolved in the crystal structure, so links to their amine groups cannot be drawn. This can be explained by assuming that these seven N-terminal residues are flexible but localize preferentially near this region. Alternatively, K132, K137 and K170 may have a relatively high reactivity towards the cross-linking agent, enabling them to trap the flexible amino terminus.
The results imply that a single MALDI-TOF mass spectrum of an unfractionated proteolytic digest of a cross-linked protein, acquired with moderate mass accuracy, can disclose significant information on the protein structure. This opens new avenues for the computer-assisted analysis of more complex biological assemblies, by combining advanced peptide separation techniques with mass analysis and by taking advantage of the high mass accuracy of FTICR-MS.
In conclusion, it appears that advanced mass spectrometric studies of proteins can benefit significantly from software tools, like the virtualmslab program, that merge and tune mass spectrometric analysis with biochemical experiments. In contrast to other available software, such as asap, ms2assign and searchxlinks, our program has a unique multistage experiment editor, which is a convenient tool for predicting and optimizing possible outcomes beforehand and saves time in finding successful experimental strategies. asap and searchxlinks have the order of events hard-coded into the program and do not allow multipass experiments. ms2assign has the unique ability to handle MS/MS data, which none of the other programs, including virtualmslab, can do. virtualmslab also allows a large number of candidate proteins to be entered in a single analysis. The recently described program cplm is flawed in that it proposes only the match with the smallest mass deviation for a given observed mass, thus bypassing critical assessment and verification.
The potential of our software program has been shown for the cross-link studies presented in this paper. However, its applications can be extended to other studies, including studies of entire cellular proteomes.