To identify a panel of candidate protein biomarkers of rheumatoid arthritis (RA) that can predict which patients will develop erosive, disabling disease.
A 2-step proteomic approach was used for biomarker discovery and verification. In the first step, 2-dimensional liquid chromatography–coupled tandem mass spectrometry was used to generate protein profiles of synovial fluid (SF) from patients with either erosive RA (n = 5) or nonerosive RA (n = 5). In the second step, the selected candidate markers were verified using quantitative multiple reaction monitoring mass spectrometry in sera of patients with erosive RA (n = 15) or nonerosive RA (n = 15) and of healthy controls (n = 15).
Through differential profiling of proteins in the <40-kd portion of the SF proteome, we selected 33 prospective candidate biomarkers from a total of 418 identified proteins. Among the proteins that were elevated in the SF of patients with erosive RA were C-reactive protein (CRP) and 6 members of the S100 protein family of calcium-binding proteins. Significantly, levels of CRP, S100A8 (calgranulin A), S100A9 (calgranulin B), and S100A12 (calgranulin C) proteins were also elevated in the serum of patients with erosive disease compared with patients with nonerosive RA or healthy individuals.
Several potential protein marker candidates have been identified for prognosis of the erosive form of RA. This study demonstrates the facility of using protein mass spectrometry in SF and serum for global discovery and verification of clinically relevant sets of disease biomarkers.
An ability to discern persistent disease activity in rheumatoid arthritis (RA) would be advantageous for the identification of patients who are likely to develop the most severe form of the disease. However, in most patients, there is no early biomarker of persistently active disease, and this leads to high risk of joint destruction and disability. Currently available biomarkers, such as C-reactive protein (CRP) or erythrocyte sedimentation rate, offer good correlation with concurrent disease activity, but do not predict subsequent severity (1). Rheumatoid factor, which is present in approximately two-thirds of patients with RA, does predict more severe disease, but it may be absent early in the disease, is not very sensitive, and does not vary dynamically with treatment (2). Radiographic erosions are strong predictors of disability, but often do not manifest for 1–3 years, and rarely resolve with treatment (1). A number of individual proteins have been evaluated on a case-by-case basis as potential biomarkers of RA (3–9). To facilitate a more global discovery of biomarkers in RA, we used mass spectrometry (MS) to characterize the protein profiles of synovial fluid (SF) samples from 10 patients with either the nonerosive or the erosive form of the disease.
Tandem mass spectrometry (MS/MS), coupled with multidimensional liquid chromatography (LC) and database searching, has emerged as a powerful technique for protein identification and characterization (10). It has been utilized for large-scale analysis of complex protein mixtures such as yeast cell lysate and has resulted in the identification of 1,484 proteins from the yeast proteome (11). Two-dimensional LC-MS/MS (LC/LC-MS/MS) in the targeted analysis mode can be used to profile low-level proteins in human plasma (12). Several recent studies have demonstrated the utility of this approach for global protein profiling of human body fluids and its potential for protein marker discovery (13–16). Attempts at quantitation in data-dependent LC-MS/MS protein profiles have essentially been limited to studies using isotope-coded affinity tag reagents (17). However, Bondarenko et al (18) and Chelius and Bondarenko (19) have recently shown that chromatographic peak areas of peptide precursor ions calculated in the intervening MS scans from LC-MS/MS experiments were closely correlated with protein concentrations, even in a complex mixture such as human serum. We have implemented this capability in our SpectrumMill software, and in this report we demonstrate that the ion intensity information can be compared at least semiquantitatively to identify differentially expressed protein markers.
Both the identification and the quantification of protein biomarkers in either SF or serum using MS are challenging because of the large dynamic range that is required to detect proteins present at ng/ml to μg/ml levels, versus abundant proteins such as human serum albumin (HSA), IgG, and haptoglobin, which are present at mg/ml concentrations. Therefore, to enrich for lower-abundance proteins and to simplify the SF proteome, in this report we describe the processing of SF samples to remove the abundant proteins HSA and IgG, followed by size-exclusion chromatography (SEC) of intact proteins prior to preparation of samples representing the <40-kd portion of the SF proteome for analysis by LC/LC-MS/MS.
We conducted a 2-step exploratory study for biomarker discovery. In the first step, we differentially profiled SF samples from 5 patients with erosive RA and 5 patients with nonerosive RA to delineate protein marker candidates that are present at the site of the destructive process and thus may ultimately serve as prognostic markers of the most aggressive form of this disease. In the second step, a subset of the marker candidates was analyzed using multiple reaction monitoring (MRM) MS and 13C-labeled peptide internal standards in sera from 15 patients with erosive RA, 15 patients with nonerosive RA, and 15 healthy controls, in order to provide a quantitative measure (20). We present data demonstrating a link between the discovery of protein markers in SF by MS and their expression in the serum of patients with RA. In addition to the discovery of several promising marker candidates, this study also demonstrated the utility of the 2-step proteomic approach in biomarker discovery.
Patients with erosive and nonerosive RA were recruited from a single tertiary referral center in Switzerland. All patients were examined by a rheumatologist (FH) who confirmed their diagnosis and obtained blood and SF samples from clinically inflamed knee joints. Clinical inflammation was defined as both joint swelling and pain on physical examination. Serum and SF were collected simultaneously. The serum and SF samples were aliquoted and stored at −80°C until assayed. All patients gave informed consent, and the Medical Ethics Committee approved the study protocol. Control serum samples were obtained from 15 healthy subjects (7 men and 8 women) whose ages ranged from 24 to 40 years.
Chromatography columns (HiTrap Blue, HiTrap Protein G, HiTrap NHS-activated HP, and Superdex 200 HiLoad 16/60) were purchased from Amersham Biosciences (Uppsala, Sweden). Solid-phase extraction C-18 Sep-Pak Light cartridges were purchased from Waters (Milford, MA). Molecular weight centrifugal filters (Centriplus and Centricon) were purchased from Millipore (Bedford, MA). Sequencing grade–modified trypsin was obtained from Roche Diagnostics (Indianapolis, IN). Ultrapure urea was purchased from USB (Cleveland, OH). All other chemicals and hyaluronidase (HSE) from Streptomyces hyalurolyticus were purchased from Sigma (St. Louis, MO) and used without further purification. Synthetic peptides representing trypsin cleavage products of candidate marker proteins were synthesized by New England Peptide (Gardner, MA), where purity and molecular weight were assessed by reverse-phase chromatography and matrix-assisted laser desorption ionization–time-of-flight MS, respectively. A Protein Assay kit was purchased from Bio-Rad (Hercules, CA).
In order to facilitate the chromatography of SF samples during the depletion of abundant proteins and size fractionation, we used a highly active HSE from S hyalurolyticus to digest hyaluronic acid and thereby reduce viscosity. Only microgram quantities of the enzyme were needed to treat 1 milliliter of SF; therefore, very small amounts of exogenous protein were introduced during digestion. Unlike mammalian hyaluronidases, this enzyme catalyzes an elimination reaction, rather than hydrolysis, resulting in the generation of unsaturated oligosaccharides that can be detected at A232 (21).
To prepare enzyme stock solution, SHSE buffer (60 mM NaOAc, 1 mM EDTA [pH 6.0]) was added to 1 vial of HSE enzyme to a final concentration of 1,300 units/ml. Hyaluronic acid digestion was carried out by mixing 1 ml of SF with 40 μl of 25× SHSE buffer and 100 μl of the HSE stock solution. The reaction mixture was incubated at 37°C for 4 hours, after which significant reduction in sample viscosity was observed. Under the experimental conditions, we determined that >50% digestion was achieved after 4 hours of incubation at 37°C (data not shown).
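The final enzyme and buffer concentrations in the digestion mixture follow directly from the stated volumes. The short calculation below is only an arithmetic sketch based on the volumes and stock concentration given above; the variable names are ours and it assumes the volumes are simply additive.

```python
# Volumes and stock concentration as stated in the text.
SF_UL = 1000.0               # synovial fluid, microliters
BUFFER_UL = 40.0             # 25x SHSE buffer, microliters
STOCK_UL = 100.0             # HSE stock solution, microliters
STOCK_UNITS_PER_ML = 1300.0  # HSE stock concentration

# Assumption: volumes are additive on mixing.
total_ul = SF_UL + BUFFER_UL + STOCK_UL

# Units of enzyme added, and the resulting concentration in the mixture.
enzyme_units = STOCK_UNITS_PER_ML * STOCK_UL / 1000.0
final_units_per_ml = enzyme_units / (total_ul / 1000.0)

# Final buffer strength relative to 1x.
buffer_fold = 25.0 * BUFFER_UL / total_ul

print(f"enzyme added: {enzyme_units:.0f} units")
print(f"final enzyme concentration: {final_units_per_ml:.0f} units/ml")
print(f"final buffer strength: {buffer_fold:.2f}x")
```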
Hyaluronidase-treated SF samples containing 50 mg of total protein were centrifuged at 15,000g for 15 minutes to pellet insoluble material. Pellets were washed twice with 1 ml of depletion buffer (200 mM ammonium bicarbonate [pH 7.8]). The washing solution was then combined with the supernatant. To remove IgG and HSA, the combined solution was loaded onto one 1-ml HiTrap Protein G HP column and three 1-ml HiTrap Blue HP columns that had been coupled in tandem and equilibrated with the depletion buffer. Fractions containing unbound protein, as monitored by A280, were collected and subsequently pooled, freeze-dried, and stored at −80°C.
Depleted and lyophilized samples were reconstituted in 1.5 ml of running buffer (8M urea, 200 mM NH4HCO3 [pH 7.8]), then reduced with 20 mM dithiothreitol at 60°C for 1 hour and alkylated with 50 mM iodoacetamide at room temperature in the dark for 30 minutes. The sample was fractionated according to size on a Superdex 200 HiLoad 16/60 column pre-equilibrated with 300 ml of running buffer. Five-milliliter fractions were collected at a flow rate of 0.5 ml/minute, beginning 76 minutes after injection. Fractions 2–7 were concentrated using Centriplus-10 followed by Centricon-10 centrifugal filters, to a final volume of 100 μl. The urea concentration was reduced to 2M by adding 3 volumes of water, and fractions were further concentrated to ∼50 μl. Fractions 8–11 were concentrated as described above, using Centriplus-3 and Centricon-3 filters, respectively. Fractions 12–16 were concentrated using C-18 Sep-Pak Light cartridges. Protein was eluted from the cartridges using 800 μl of 0.1% trifluoroacetic acid/10% isopropyl alcohol/90% acetonitrile, and the eluate was evaporated to dryness in a benchtop Speedvac.
HSA is the most abundant protein in SF and accounts for ∼64% of total protein in normal SF (22). Although IgG levels in normal SF are low, in patients with RA their levels are similar to those of serum, with an average of 9.5 mg/ml (23). Hyaluronidase-treated SF samples were subjected to affinity depletion of abundant proteins to remove HSA and IgG. This process is estimated to enrich the remaining SF components by effectively removing up to 90% of the abundant proteins, as assessed by sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE) (Figure 1). Fractionation of proteins by SEC was then performed on the reduced and alkylated samples under denaturing conditions in 8M urea to minimize the formation of protein complexes. SDS-PAGE analysis of the SEC fractions revealed a uniform molecular weight distribution of proteins in the sample (Figure 1). Based on this distribution, it was determined that proteins and polypeptides in the <40-kd range were collected in fractions 5–16.
SDS-PAGE analysis was performed on an aliquot of the concentrated SEC fractions. The remaining material was then digested overnight with trypsin (1:50 [weight/weight]) at 37°C. The digestion was quenched by adding 2 μl of formic acid. The digestion solution was diluted with 2 volumes of loading buffer (5% acetonitrile, 0.1% formic acid) prior to LC/LC-MS/MS analysis.
Tryptic digests of proteins were analyzed with an automated nanoLC/LC-MS/MS system, using a Famos autosampler (Dionex, Sunnyvale, CA) and an 1100 high-performance liquid chromatography (HPLC) binary pump (Agilent Technologies, Wilmington, DE) coupled to an LCQ Deca ion trap mass spectrometer (ThermoFinnigan, San Jose, CA) equipped with a custom-made nanospray ionization source (James A. Hill Instrument Services, Arlington, MA). The built-in divert valve on the LCQ Deca was used as a column switch valve. HPLC vials containing the peptide solutions and vials containing 0, 75 mM, 150 mM, or 500 mM ammonium formate for the step gradient elution were placed onto the autosampler plate and maintained at 4°C. Ten microliters of each peptide sample was loaded onto a column (0.3 mm × 50 mm) packed with PolySulfoethyl aspartamide strong cation exchange material (SCX column; PolyLC, Columbia, MD) and washed at 1.5 μl/minute for 15 minutes with loading buffer (5% acetonitrile/0.1% formic acid [pH 3.0]) with the divert valve in the “load” position. Peptides eluted from the SCX column by each step gradient salt injection were trapped on a 75 μm × 20 cm reverse-phase PicoFrit column (New Objective, Woburn, MA) packed with C-18 (Vydac, Hesperia, CA) (RP column) (10-μm particle, 300Å pore size) and desalted. With the divert valve switched to the “inject” position, peptides from each step were then eluted from the RP column pre-equilibrated with 0.1% formic acid aqueous solution, using a 150-minute linear gradient from 5% to 60% acetonitrile in 0.1% formic acid at a flow rate of 0.2 μl/minute post split. Spray voltage was 2.0 kV, the heated capillary temperature was maintained at 160°C, and the collision energy for MS/MS was 35 units.
Automated data-dependent MS analysis was carried out using the dynamic exclusion feature built into the MS acquisition software (Xcalibur 1.2; ThermoFinnigan). Each MS full scan (mass/charge [m/z] 400–2,000) was followed by 4 MS/MS scans of the 4 most intense peaks, to obtain as many MS/MS spectra as possible.
The tryptic peptide mixture derived from each SEC fraction was separated by online multidimensional LC/LC using strong cation exchange as the first dimension followed by reverse-phase separation in the second dimension. Utilization of multidimensional LC significantly improves the dynamic range of protein detection and provides increased confidence in the identification of lower-abundance proteins. A volatile ammonium formate solution was used for peptide step-elution to minimize possible nanospray source contamination that might have been caused by nonvolatile salts. A gradient of 4 salt steps was utilized in order to reduce carryover of peptides between elution steps. Peptide separation was enhanced when the eluate of each salt step was captured on a 20-cm–long RP column and was analyzed throughout a 2.5-hour acetonitrile gradient. Thus, each of the 10 patient samples yielded 27 LC-MS/MS runs.
From all 10 patients, the 376,678 MS/MS spectra passing the quality filter of having a sequence tag length >1 were interpreted to determine protein identities and relative abundances using a software package that was developed by us and is now commercially available (SpectrumMill version 2.5; Agilent Technologies, Santa Clara, CA). Peptide sequences were interpreted from the MS/MS spectra by searching the mammalian subset of the NCBInr protein database (September 2001; ∼143,000 entries). Search parameters included carbamidomethylation of cysteines, 50% minimum matched peak intensity, ±2.5-dalton and ±0.7-dalton tolerance on precursor and product ion masses, respectively, 1 missed tryptic cleavage, and electrospray ionization trap scoring parameters.
Identities interpreted for individual spectra were automatically designated as valid by applying the following scoring threshold criteria to all spectra derived from a single patient's SF in 4 rounds of validation: (a) protein details mode: protein score >25, peptide score (scored percent intensity [SPI]) charge +1 (>7, >70%), peptide charge +2 (>8, >70%), peptide charge +3 (>9, >70%); (b) protein details mode: protein score >20, peptide SPI charge +1 (>9, >70%), peptide charge +2 (>9, >70%), peptide charge +3 (>9, >70%); (c) peptide mode: SPI charge +1 (>13, >70%), peptide charge +2 (>13, >70%), peptide charge +3 (>13, >70%); (d) results from select additional spectra with lower scores (>10, >70%) were accepted as valid only after manual inspection by an expert. The above parameters result in a protein being considered identified when either multiple spectra of moderate quality or better (a and b), or at least 1 spectrum of excellent quality (c) or good quality (d) have been obtained. The false-positivity rate for automated interpretation of the MS/MS spectra using the above criteria (a–c) was estimated by searching all spectra passing the quality filter from 1 patient with erosive disease (32,100 spectra) and 1 patient with nonerosive disease (38,786 spectra) against a comparable database with all of the protein sequences reversed (mammalian subset of NCBInr July 2003; ∼243,000 entries) (24).
After application of criteria a and b to the search results from the reversed database, no spectra passed the thresholds, while with the forward database the erosive RA sample yielded 6,591 spectra for 95 proteins and the nonerosive RA sample yielded 6,513 spectra for 75 proteins passing the thresholds. After application of criteria a–c with the reversed database, the erosive RA sample yielded 6 spectra for 5 proteins and the nonerosive RA sample yielded 3 spectra for 3 proteins, while with the forward database the erosive RA sample cumulatively yielded 6,638 spectra for 130 proteins and the nonerosive RA sample yielded 6,532 spectra for 93 proteins passing the thresholds. Thus, the estimated cumulative false-positivity rates following application of criteria a–c were 0.09% spectra, 3.8% proteins for the erosive RA sample and 0.04% spectra, 3.2% proteins for the nonerosive RA sample. In practice, with the goal of achieving a near-zero false-positivity rate at the protein level while still including proteins identified by a single unique peptide via criteria c and d, we have applied a reversed database scoring measure to each individual spectrum using SpectrumMill version 3.1. Consequently, each spectrum must have a Δ(forward score − reverse score) >1. (A list of proteins identified by a single unique peptide across all 10 patient samples is available upon request from the corresponding author.)
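The false-positivity estimates above can be read as simple ratios of reversed-database hits to forward-database hits. The sketch below assumes that this ratio is how the rates were derived (the function and variable names are ours, for illustration only) and recomputes them from the counts reported in the text; the results agree approximately with the reported values.

```python
def false_positive_rate(reversed_hits: int, forward_hits: int) -> float:
    """Percentage of forward-database hits expected to be false,
    estimated from the number of hits against the reversed database."""
    return 100.0 * reversed_hits / forward_hits

# (reversed hits, forward hits) after criteria a-c, as reported in the text.
samples = {
    "erosive RA": {"spectra": (6, 6638), "proteins": (5, 130)},
    "nonerosive RA": {"spectra": (3, 6532), "proteins": (3, 93)},
}

rates = {
    name: {level: false_positive_rate(*counts) for level, counts in levels.items()}
    for name, levels in samples.items()
}

for name, levels in rates.items():
    print(f"{name}: {levels['spectra']:.2f}% spectra, {levels['proteins']:.1f}% proteins")
```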
In calculating scores at the protein level and reporting the identified proteins, redundancy is addressed in the following manner: the protein score is the sum of the scores of unique peptides. A unique peptide is the single highest-scoring instance of a peptide detected through an MS/MS spectrum. MS/MS spectra for a particular peptide may have been recorded multiple times (i.e., as different precursor charge states, isolated from adjacent protein size-exclusion and/or peptide strong cation-exchange fractions). When a peptide sequence is contained in multiple protein entries in the sequence database, the proteins are grouped together and only the highest scoring one and its gene identification are reported. Proteins grouped in this manner are unlikely to be functionally distinct. Rather, they represent isoforms with minor sequence differences or orthologs.
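The redundancy rule for protein scores can be illustrated with a minimal sketch. It is not the SpectrumMill implementation; the function name, the input layout, and the peptide labels and scores are hypothetical. For each protein, only the highest-scoring spectrum of each peptide is retained, and the retained scores are summed.

```python
from collections import defaultdict

def protein_scores(spectrum_hits):
    """Sum, per protein, the best score of each unique peptide.

    spectrum_hits: iterable of (protein, peptide, score) tuples, one per
    interpreted MS/MS spectrum (hypothetical input layout).
    """
    best = {}
    for protein, peptide, score in spectrum_hits:
        key = (protein, peptide)
        # Keep only the single highest-scoring instance of each peptide.
        if score > best.get(key, float("-inf")):
            best[key] = score
    totals = defaultdict(float)
    for (protein, _peptide), score in best.items():
        totals[protein] += score
    return dict(totals)

# Hypothetical hits: the same peptide recorded twice (e.g., in two charge states).
hits = [
    ("S100A8", "PEPTIDE_A", 12.0),
    ("S100A8", "PEPTIDE_A", 14.5),
    ("S100A8", "PEPTIDE_B", 9.0),
    ("CRP", "PEPTIDE_C", 13.2),
]
scores = protein_scores(hits)
print(scores)
```

In this example the lower-scoring duplicate of PEPTIDE_A does not contribute, so the S100A8 score is the sum of 14.5 and 9.0.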
Protein relative abundances were determined using the ion current measured for each peptide precursor ion in the intervening MS scans of the LC-MS/MS chromatogram. The chromatographic peak area of each precursor ion was calculated in the region ±1.4 m/z and ±20 scans (approximately ±30 seconds). An individual protein's abundance was calculated as the mean ion current measured for all peptide precursor ions derived from that protein.
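The abundance calculation can be sketched as follows. This is an illustration under stated assumptions, not the SpectrumMill code: the scan data layout and function names are hypothetical, and the peak area is approximated by summing the ion intensities that fall inside the ±1.4 m/z and ±20 scan window rather than by formal peak integration.

```python
from statistics import mean

MZ_WINDOW = 1.4    # +/- m/z tolerance, from the text
SCAN_WINDOW = 20   # +/- number of scans, from the text

def precursor_peak_area(ms_scans, precursor_mz, precursor_scan):
    """Sum ion intensity near a precursor in the surrounding MS scans.

    ms_scans: list of (scan_number, [(mz, intensity), ...]) tuples
    (hypothetical layout for the intervening full MS scans).
    """
    area = 0.0
    for scan_number, peaks in ms_scans:
        if abs(scan_number - precursor_scan) > SCAN_WINDOW:
            continue
        area += sum(i for mz, i in peaks if abs(mz - precursor_mz) <= MZ_WINDOW)
    return area

def protein_abundance(peptide_areas):
    """Mean ion current over all peptide precursor ions of one protein."""
    return mean(peptide_areas)

# Hypothetical scans around a precursor at m/z 650.3 selected in scan 120.
ms_scans = [
    (95, [(650.3, 100.0)]),                    # too early: outside scan window
    (110, [(650.3, 200.0), (700.0, 999.0)]),   # second peak outside m/z window
    (120, [(650.9, 400.0)]),
    (135, [(651.5, 300.0)]),
    (150, [(650.3, 50.0)]),                    # too late: outside scan window
]
area = precursor_peak_area(ms_scans, 650.3, 120)
abundance = protein_abundance([area, 300.0])   # second peptide area is hypothetical
print(area, abundance)
```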
Proteins from SF were categorized by their Gene Ontology (GO) biologic process terms (25). The terms were assigned by the GO annotations present in the LocusLink records for each protein (26). LocusLink identifications were assigned to 52% of the proteins. The unassigned proteins included 52% immunoglobulin and major histocompatibility complex proteins, 25% hypothetical, predicted, or genome-annotated RefSeq proteins, and 16% nonhuman proteins (single amino acid substitution/peptide to a mammalian ortholog, mostly IgG variable region). Of those proteins that could be assigned LocusLink identification numbers, 70% had GO process terms associated with those LocusLink entries. Statistical analysis of the likelihood of each category to be overrepresented among the SF proteins was carried out as outlined below. The categorization calculation was performed using Fisher's exact test for associations between the GO biologic processes represented by the LocusLink entries in the protein list and all of the human LocusLink entries. The P values were adjusted by the Bonferroni correction to compensate for multiple comparisons.
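For readers who wish to see the form of such a test, the following is a minimal sketch and not the Category Server implementation. It assumes a one-sided test for overrepresentation (the hypergeometric upper tail) and treats the protein list as a subset of the reference set; neither detail is stated in the text. All counts in the example are hypothetical.

```python
from math import comb

def fisher_overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test P value for overrepresentation.

    k: proteins in the list annotated with the GO term
    n: proteins in the list
    K: reference entries annotated with the GO term
    N: all reference entries
    """
    upper = min(n, K)
    total = comb(N, n)
    # Probability of observing k or more annotated proteins by chance.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical counts: 3 of 4 listed proteins carry a term held by 5 of 20 entries.
p = fisher_overrepresentation_p(k=3, n=4, K=5, N=20)
adjusted = bonferroni([0.01, 0.2, 0.6])
print(p, adjusted)
```

With ∼44,000 GO terms tested, the Bonferroni adjustment is severe, so only strongly overrepresented terms would remain significant after correction.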
The selection of marker candidates was based on the relative MS signal intensity of peptides (relative abundance), as well as the frequency at which a protein was detected among 5 erosive RA and 5 nonerosive RA samples. Differences in relative MS signal of <2-fold were not considered significant because several experimental factors contribute to variability in the average intensity for a protein, including incomplete digestion, differential ionization of equimolar peptides of different amino acid composition, variable instrument sensitivity over mass range, variable flow rate during LC gradient, adequacy of sampling of the chromatographic peak between MS/MS scans, multiple charge states, and automatic gain control on the ion trap.
Quantification of candidate proteins in patient serum was determined by LC-MRM MS analysis on a triple quadrupole–MS, using 13C-labeled peptide standards as described (20). Thirty patients with RA and 15 healthy volunteers donated blood for serum samples. Because each subject's analysis requires a month of a technician's time to process and analyze, we pooled 5 patients' or controls' samples prior to analysis. This reduced the time of analysis from ∼30 months to 5 months. By mixing samples in this way, we expected to bias the results toward the null hypothesis, since individual extreme values would be muted. Each pool consisted of equal volumes from 5 individuals from the same category. The S100 proteins are relatively small in size (Mr 10,000–13,000), and based on the molecular weight distribution of the SEC fractions by SDS-PAGE analysis, we determined that these proteins would span fractions 7–9, and thus these fractions were combined (Figure 1). CRP, a 25-kd protein monomer, was estimated to reside in fractions 5–7 of the SEC profile, and those fractions were combined for analysis. Synthetic peptide standards (13C-peptides) with favorable ionization properties were identified for use in MRM quantification analyses.
Briefly, pooled serum samples (1 ml) were diluted with 2 volumes of 200 mM ammonium bicarbonate buffer solution and loaded onto affinity columns aligned in tandem to deplete haptoglobin (hemoglobin), IgG (HiTrap Protein G), and HSA (HiTrap Blue), respectively. Further fractionation into molecular weight ranges was accomplished using a Superdex 200 HiLoad 16/60 column as described for SF. Mixtures of peptides, obtained upon treatment of selected molecular weight fractions with trypsin, were pooled and spiked with synthetic 13C-labeled peptide standards to 250 or 500 fmoles/μl final concentration. Synthetic peptides representing the endogenous trypsin cleavage products of the candidate marker proteins were selected, based on criteria previously described (20), in order to optimize the triple quadrupole–MS (TQ-MS) by establishing a unique MRM signature for each peptide. Each peptide contained a naturally occurring leucine residue that was replaced in the synthetic 13C-peptide with a uniformly labeled [13C6]-leucine residue (designated by an asterisk in the sequences below). The peptide sequences used, including numeric subscripts to designate the amino- and carboxyl-terminal amino acid positions in the native protein sequence, were as follows: S100A8, L37*LETECPQYIR47; S100A9, L26GHPDT*LNQGEFK38; S100A12, E39*LANTIK45 and G22HFDT*LSK29; S100P, E40LPGF*LQSGK49; S100A11, D28GYNYT*LSK36; and S100A4, E41LPSF*LGK48.
MRM analysis was performed by injecting 1 μl of a pooled fraction solution from an HPLC vial onto an online capillary 75 μm × 12 mm C-18 Magic AQ PicoFrit column coupled to an API-3000 TQ-MS programmed to acquire MRM throughout a 40-minute reverse-phase acetonitrile gradient in formic acid. The ratio of detected endogenous peptide peak area to the 13C-peptide standard peak area was used in conjunction with the fraction volumes and the target protein molecular weight to calculate the serum protein concentration.
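The conversion from peak area ratio to serum concentration can be sketched as below. This is an illustrative calculation rather than the exact procedure used (see reference 20). It assumes 1 mole of peptide per mole of protein, complete digestion and recovery, and that the whole pooled fraction derives from a known serum volume. The function name and all numeric inputs in the example are hypothetical, apart from the 250 fmoles/μl standard concentration and the 1-ml serum volume stated above.

```python
def serum_concentration_ng_per_ml(
    endogenous_area: float,
    standard_area: float,
    standard_fmol_per_ul: float,
    fraction_volume_ul: float,
    protein_mw_da: float,
    serum_volume_ml: float,
) -> float:
    """Estimate serum protein concentration from an MRM peak area ratio."""
    # Ratio of endogenous peptide signal to the spiked 13C-peptide signal.
    ratio = endogenous_area / standard_area
    # Total endogenous peptide (fmol) in the pooled fraction.
    peptide_fmol = ratio * standard_fmol_per_ul * fraction_volume_ul
    # fmol x g/mol = femtograms; 1 fg = 1e-6 ng.
    protein_ng = peptide_fmol * protein_mw_da * 1e-6
    return protein_ng / serum_volume_ml

# Hypothetical example: area ratio 0.2, 50-ul pooled fraction, Mr 10,800 protein.
conc = serum_concentration_ng_per_ml(
    endogenous_area=2.0e5,
    standard_area=1.0e6,
    standard_fmol_per_ul=250.0,
    fraction_volume_ul=50.0,
    protein_mw_da=10800.0,
    serum_volume_ml=1.0,
)
print(f"{conc:.1f} ng/ml")
```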
Because of the exploratory nature of this study, only limited statistical analyses were performed. Functional categorization of identified proteins was carried out as described above, using LocusLink. We compared the number of SF proteins in each of the GO categories, expressed as a percentage of all 418 SF proteins found, with the percentage of all LocusLink proteins falling into that particular GO category. The category set used for the test included all of the GO biologic process terms present in LocusLink (∼44,000). We used Fisher's exact test to determine the significance of representation of each of the GO terms in the data. The test returns a P value indicating the significance of the intersection of GO terms in the data set and in the category set. Fisher's exact test was implemented as a part of the Category Server application.
Marker candidates were selected on the basis of biologic plausibility, using a panel of experts in mass spectrometry and RA. We selected markers that were 1) detected in more erosive samples than nonerosive samples or vice versa with a ≥2-fold frequency, or 2) were expressed at ≥2-fold intensity in erosive samples compared with nonerosive samples or vice versa. Given the small sample size involved (5 patients in each group), we did not rely on formal statistical tests for the marker selection.
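The numeric part of this selection rule can be expressed as a short sketch. It covers only the ≥2-fold thresholds, not the expert judgment of biologic plausibility. The function names are ours, and the treatment of a zero denominator as an infinite fold difference is an assumption. The example values for S100A8, S100A12, and vitamin K–dependent protein C are taken from Table 2; the last example is hypothetical.

```python
def fold(a: float, b: float) -> float:
    """Fold difference between two nonnegative values, in either direction."""
    hi, lo = max(a, b), min(a, b)
    if hi == 0:
        return 1.0               # both zero: no difference
    if lo == 0:
        return float("inf")      # assumption: detected in one group only
    return hi / lo

def meets_selection_rule(freq_erosive, freq_nonerosive,
                         intensity_erosive, intensity_nonerosive,
                         threshold=2.0):
    """True if detection frequency or relative intensity differs >= threshold-fold."""
    return (fold(freq_erosive, freq_nonerosive) >= threshold
            or fold(intensity_erosive, intensity_nonerosive) >= threshold)

# (frequency erosive, frequency nonerosive, intensity erosive, intensity nonerosive)
examples = {
    "S100A8": (5, 5, 0.85, 0.15),
    "S100A12": (5, 2, 0.93, 0.07),
    "Vitamin K-dependent protein C": (0, 2, 0.00, 1.00),
    "hypothetical unchanged protein": (5, 5, 0.50, 0.50),
}
selected = {name: meets_selection_rule(*values) for name, values in examples.items()}
print(selected)
```

S100A8 illustrates the second criterion: it was detected in all 10 samples, so it qualifies on relative intensity alone.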
Because we pooled serum samples for the serum study as noted above, the unit of analysis became singular for each of the study groups (erosive RA, nonerosive RA, and healthy controls). Therefore, no statistical analysis was feasible for the serum biomarker evaluation.
Clinical and demographic characteristics of the study subjects are shown in Table 1. Of the 10 patients included in the biomarker discovery portion of the study, 5 patients with severe erosive and destructive disease were classified as having erosive RA (Larsen class IV–V) and 5 patients with only mild disease were classified as having nonerosive RA (Larsen class <II) (27). Clinical assessment of the patients with RA was performed on the day of joint aspiration. The mean SF leukocyte count was 11,983/μl (range 5,400–23,400) in the erosive RA group and 6,233/μl (range 400–12,600) in the nonerosive RA group. Most patients were taking disease-modifying antirheumatic drugs, low-dose steroids, and nonsteroidal antiinflammatory drugs.
|Characteristic||SF for discovery, erosive RA (n = 5)||SF for discovery, nonerosive RA (n = 5)||Serum for verification, erosive RA (n = 15)||Serum for verification, nonerosive RA (n = 15)||Serum for verification, healthy control (n = 15)|
|Age, years||62 ± 12||61 ± 16||59 ± 17||52 ± 12||34 ± 5|
|Duration of RA, years||12 ± 7||10 ± 4||12 ± 5||6 ± 4||NA|
|No. of swollen joints||19 ± 13||16 ± 21||24 ± 14||6 ± 7||0|
|Corticosteroid use, yes/no||4/1||2/3||10/5||8/7||NA|
|Serum CRP, mg/dl||80 ± 51||18 ± 16||52 ± 36||5 ± 7||ND|
In the verification portion of the study, blood was obtained from 30 patients with RA and 15 healthy control subjects. The patients were similar to the discovery patients in demographic and clinical characteristics, while the control subjects were younger (Table 1). All patients met the American College of Rheumatology (formerly, the American Rheumatism Association) 1987 classification criteria for RA (28).
Analysis of SF samples by LC/LC-MS/MS resulted in lists of proteins identified from each sample. In addition, the information about the number of peptide spectra and total peptide MS ion intensity for each protein in each sample was generated. (A representative portion of the protein profiles is available upon request from the corresponding author.) Some proteins were identified with high ion intensity from all 10 samples, while others appeared in only a subset of samples with low ion intensity. A subset of proteins was identified from just 1 or a few spectra that were detected in very few samples. Some proteins were found to be equally represented across all samples, while others tended to predominate in either erosive or nonerosive RA samples. The differences in total ion intensity of a particular protein, as well as the frequency of its appearance, between erosive and nonerosive RA samples are the basis for selection of candidate markers. In total, 418 proteins representing a broad range of functional classes were identified with a high level of confidence from 10 SF samples (5 erosive RA, 5 nonerosive RA). (Complete lists of identified proteins are available upon request from the corresponding author.)
Proteins identified in SF were categorized by their GO biologic process terms (25). Using a simple occurrence count, terms representing certain biologic processes, e.g., blood coagulation, immune response, acute-phase response, and cell adhesion and transport, predominated. However, as shown in Figure 2, when proteins were categorized according to their representation within a particular class of biologic processes, in comparison with the total number of proteins that exist for that class, up-regulation of inflammatory proteins became apparent. The most significantly represented GO terms included response to biotic stimulus, immune response, defense response, and response to pest/pathogen/parasite.
Differentially expressed proteins were considered for selection as biomarker candidates as described in Patients and Methods. By comparing protein profiles of 5 erosive RA SF samples and 5 nonerosive RA samples, we selected 33 proteins that were differentially elevated in either erosive or nonerosive RA (Table 2). For example, CRP was detected in 4 of 5 erosive RA SF samples and 4 of 5 nonerosive RA SF samples, and the intensity ratio between erosive RA and nonerosive RA samples was 0.98:0.02, representing a 49-fold difference in total intensity. In another example, S100A12 was detected in all erosive RA samples but in only 2 of the nonerosive RA samples, with an erosive RA:nonerosive RA total intensity ratio of 0.93:0.07.
|Protein name||NCBI gene identification no.||Frequency, erosive RA||Frequency, nonerosive RA||Relative intensity, erosive RA||Relative intensity, nonerosive RA|
|Phosphoglycerate mutase 1||112128||5||3||0.93||0.07|
|S100A8 (calgranulin A)||225541||5||5||0.85||0.15|
|Neutrophil gelatinase–associated lipocalin||631308||2||0||1.00||0.00|
|Similar to coactosin–like protein||1196417||3||1||0.97||0.03|
|S100A12 (calgranulin C)||2146972||5||2||0.93||0.07|
|S100A9 (calgranulin B)||4506773||5||5||0.92||0.08|
|14-3-3 protein β/α||4507949||4||3||0.94||0.06|
|Neutrophil defensin α3||4758146||5||3||1.00||0.00|
|S100A11 protein (calgizzarin)||5032057||5||1||1.00||0.00|
|Epididymal secretory protein E1||5453678||4||1||0.81||0.19|
|Fcγ receptor IIIA (CD16A)||12056967||5||2||0.92||0.08|
|Vitamin K–dependent protein C||4506115||0||2||0.00||1.00|
The selected marker candidates included 30 proteins that were more abundant in the SF of patients with erosive RA and 3 that were more abundant in nonerosive RA SF. These candidate markers represented a range of biologic categories including serum proteins such as CRP, α2-plasmin inhibitor, and glutathione transferase; metabolic enzymes such as triosephosphate isomerase and G3PDH; calcium-binding S100 proteins such as S100A4, S100A8, S100A9, S100A11, S100A12, and S100P; matrix-degrading cysteine proteinase cathepsin B and its inhibitor cystatin B; cellular signaling proteins such as 14-3-3 protein and RhoGDI2; as well as a number of proteins whose properties and functions have not been well characterized.
We measured a subset of biomarker candidates including CRP and 6 members of the S100 protein family in serum in order to determine whether any of these proteins were present in the peripheral circulation of patients with RA. We chose CRP as a control because it could be easily measured by an independent quantitative method, enzyme-linked immunosorbent assay, to confirm the results obtained with the MRM method. We focused on the family of S100 proteins because they are homologous in sequence and structure, but diverse in biologic function (29). In addition, we measured each candidate biomarker in healthy subjects' serum as a control for establishing whether any of the proposed candidate markers were differentially expressed at increased levels in the serum of patients with RA regardless of erosion status. Isotope-labeled synthetic peptides were used as standards for the relative quantification of proteins in serum by MRM MS (20). The MRM method is used to rapidly determine the relative abundance of a protein analyte across several different biologic samples without the requirement for antibodies and immunoassays.
Among the 6 members of the S100 protein family measured in one set of pooled healthy, nonerosive RA, or erosive RA serum, only S100A8 (calgranulin A), S100A9 (calgranulin B), and S100A12 (calgranulin C/extracellular newly identified receptor for advanced glycation end products–binding protein [ENRAGE]) were increased in erosive RA serum versus nonerosive RA serum (4.8-, 19.5-, and 8.4-fold, respectively), or erosive RA serum versus serum from healthy individuals (3.4-, 14.1-, and 15.1-fold, respectively) (Table 3). The amounts of these 3 proteins were not significantly different between healthy and nonerosive RA samples. Concentrations of S100A11 (calgizzarin) and S100P were similar in healthy and nonerosive or erosive RA samples, while S100A4 (metastasin) appeared reduced in both erosive RA and nonerosive RA serum (both 6.1 ng/ml) compared with healthy serum (43.8 ng/ml).
|Serum concentration, ng/ml|
|Ratio, erosive RA/nonerosive RA||4.8||19.5||8.4||1.0||1.3||1.4|
|Ratio, erosive RA/healthy controls||3.4||14.1||15.1||0.1||0.8||0.7|
To substantiate the finding that S100A8, S100A9, and S100A12 were increased in the sera of patients with erosive RA, these 3 proteins were quantified in 2 additional sets of pooled patient sera (Table 4, experiments 2 and 3) and compared with the values obtained in the previous set of samples (Table 4, experiment 1). Across the 3 sets of serum, the relative abundance of all 3 proteins in nonerosive RA serum was similar to or only modestly higher than that in serum from healthy individuals (S100A8 0.7–1.5-fold, S100A9 0.7–2.2-fold, and S100A12 [peptide 1] 0.8–4.8-fold). However, in patients with erosive RA, the relative abundance of these proteins was substantially higher than in healthy control samples (S100A8 3.0–6.0-fold, S100A9 9.4–14.1-fold, and S100A12 [peptide 1] 4.4–111-fold). Furthermore, the values obtained in erosive RA samples were again higher than those in nonerosive RA samples. Because the 111-fold difference noted for S100A12 in erosive RA serum versus healthy serum in experiment 2 seemed inordinately high, we confirmed this finding using a second peptide standard in experiment 1, and found that the relative quantification by MRM was virtually identical to the result obtained using peptide 1.
|Serum source||Experiment||S100A8, mean ± SD ng/ml||S100A8, ratio to healthy||S100A9, mean ± SD ng/ml||S100A9, ratio to healthy||S100A12 (peptide 1), mean ± SD ng/ml||S100A12 (peptide 1), ratio to healthy||S100A12 (peptide 2), mean ± SD ng/ml||S100A12 (peptide 2), ratio to healthy||CRP MRM, mean ± SD ng/ml||CRP MRM, ratio to healthy||CRP immunoassay, ng/ml||CRP immunoassay, ratio to healthy|
|Healthy||1||8.4 ± 2.9||–||8.7 ± 1.7||–||5.3 ± 0.2||–||ND||–||400 ± 80||–||0.0|| |
|Healthy||2||10.3 ± 1.6||–||26.5 ± 3.1||–||0.5 ± 0.3||–|| || ||980 ± 50||–||700|| |
|Healthy||3||2.4 ± 0.2||–||5.0 ± 0.7||–||8.6 ± 3.2||–|| || ||170 ± 20||–||2,600|| |
|Nonerosive RA||1||5.9 ± 1.0||0.7||6.3 ± 3.1||0.7||9.5 ± 0.6||1.8||9.3 ± 2.1||NA||1,850 ± 70||4.6||9,900||NA|
|Nonerosive RA||2||15.9 ± 4.6||1.5||58.6 ± 6.7||2.2||2.4 ± 1.7||4.8†|| || ||2,060 ± 90||2.1||2,800||4.0|
|Nonerosive RA||3||2.5 ± 0.2||1.0||6.4 ± 0.8||1.3||7.2 ± 2.1||0.8|| || ||2,080 ± 60||12.2†||8,800||3.4|
|Erosive RA||1||28.4 ± 1.5||3.4||123.0 ± 9.3||14.1||80.1 ± 1.4||15.1||85.1 ± 13.4||NA||30,100 ± 600||75.3||69,300||NA|
|Erosive RA||2||61.4 ± 10.3||6.0||267.3 ± 15.2||10.1||55.5 ± 3.1||111†|| || ||46,000 ± 3,500||46.9||36,100||51.5|
|Erosive RA||3||7.3 ± 0.1||3.0||46.8 ± 2.2||9.4||38.0 ± 2.3||4.4|| || ||24,200 ± 1,000||142.3†||73,300||28.2|
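The fold ranges quoted in the text for S100A8, S100A9, and S100A12 (peptide 1) can be recomputed from the mean concentrations in Table 4. The sketch below is illustrative only and is not the authors' analysis code. It assumes that the three row groups of Table 4 correspond, in order, to healthy, nonerosive RA, and erosive RA serum, a reading that is consistent with the ratios reported in the text. The names `TABLE_4_MEANS` and `fold_range` are hypothetical.

```python
from typing import Tuple

# Mean serum concentrations (ng/ml) from Table 4, keyed by experiment number.
# Each tuple is ordered (healthy, nonerosive RA, erosive RA); this grouping is
# an assumption based on the ratios quoted in the text.
TABLE_4_MEANS = {
    "S100A8": {1: (8.4, 5.9, 28.4), 2: (10.3, 15.9, 61.4), 3: (2.4, 2.5, 7.3)},
    "S100A9": {1: (8.7, 6.3, 123.0), 2: (26.5, 58.6, 267.3), 3: (5.0, 6.4, 46.8)},
    "S100A12 (peptide 1)": {1: (5.3, 9.5, 80.1), 2: (0.5, 2.4, 55.5), 3: (8.6, 7.2, 38.0)},
}

NONEROSIVE, EROSIVE = 1, 2  # positions within each tuple


def fold_range(protein: str, group_index: int) -> Tuple[float, float]:
    """Return the (min, max) ratio to healthy across experiments, rounded to 1 decimal."""
    ratios = [
        round(values[group_index] / values[0], 1)
        for values in TABLE_4_MEANS[protein].values()
    ]
    return min(ratios), max(ratios)


for protein in TABLE_4_MEANS:
    low_n, high_n = fold_range(protein, NONEROSIVE)
    low_e, high_e = fold_range(protein, EROSIVE)
    print(f"{protein}: nonerosive {low_n}-{high_n}-fold, erosive {low_e}-{high_e}-fold")
```

Because the ratios are derived from the rounded means shown in the table, small differences from published ratios are possible for other analytes whose ratios were presumably calculated from unrounded data.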
CRP, an acute-phase protein associated with inflammatory processes including RA, was also measured using the MRM method in experiments 1–3 (Table 4). CRP levels in nonerosive RA samples were 2–12-fold higher than in samples from healthy individuals, while in patients with erosive RA, CRP levels were 47–142-fold higher than in healthy control samples. Measurement of CRP by immunoassay confirmed these differences (Table 4). While significant discrepancies were observed between measurements of CRP concentration by immunoassay and MRM analysis, the concentration ratios between erosive RA, nonerosive RA, and healthy control samples were similar with the 2 methods. These results demonstrated the utility of MRM analysis as a relative quantification method for fast screening of protein markers in complex biologic samples (20). The limit of detection of the method is protein dependent due to differences in factors such as peptide ionization efficiency, yield of protein digestion, and complexity of matrix. Under the conditions described in this report, the detection limit was estimated at 10–100-ng/ml concentrations in serum for most proteins.
There is a clear need for sensitive and specific biomarkers to identify RA patients who are at high risk for erosive, disabling disease. Ideally, such biomarkers would be present at diagnosis, would predict clinically important outcomes, and would vary with successful treatment of the disease. S100A8, S100A9, S100A12, and CRP, all of which showed increased abundance in the serum of patients with erosive RA, might one day comprise a constellation of biomarkers for use in a multianalyte approach to the diagnosis of aggressive RA.
One premise for screening the SF proteome of patients with RA was the notion that prospective biomarkers of RA would be found in high concentrations at the site of joint destruction, thereby improving the chance that they would be detected by MS. After differentially screening a relatively small number of patient SF samples representing different clinical states of RA, we were able to identify a large number of proteins from which a list of biomarker candidates was selected based on differential profiling. Many of the candidate biomarkers play defined roles in inflammation processes. These candidates include some secreted proteins such as the S100 family, as well as some intracellular molecules such as 14-3-3 and RhoGDI2, which are linked to signaling pathways. The finding that CRP, a marker of active RA, showed increased abundance in both serum and SF samples from patients with erosive RA supports the hypothesis that the relative abundance of other prospective biomarkers of erosive RA can be discerned in body fluids using differential profiling through LC/LC-MS/MS.
Among the 6 S100 proteins on the selected marker candidate list, S100A8, S100A9, and S100A12 displayed differential expression levels between serum samples from patients with erosive RA, patients with nonerosive RA, and healthy individuals, as demonstrated by the MRM analysis. However, 3 other members of this family, S100A4, S100A11, and S100P, were not found at increased levels in erosive RA serum despite an apparent increased abundance in erosive RA SF. While the concentrations of these 3 proteins in SF have not been verified by the quantitative MRM method, the differences observed between erosive RA SF and nonerosive RA SF in ion trap discovery are very substantial (Table 2). Additionally, the overall sequence identity among the 6 identified S100 proteins is <50%. Of all the peptides found for the 6 proteins, none are shared by 2 or more proteins. Therefore, misidentification among members of the S100 protein family is extremely unlikely. Interestingly, S100A8, S100A9, and S100A12 are proinflammatory molecules that are primarily expressed and secreted by activated phagocytes and act during the recruitment of leukocytes (30), while S100A4, S100A11, and S100P are thought to be localized to tumor cells, muscle, and placenta, respectively (31). Thus, in the present study, the patterns of differential expression of a subset of proteins in SF samples were not fully mirrored in the peripheral circulation of patients when quantified in serum by the MRM method.
S100A8, S100A9, and S100A12 belong to a newly recognized class of mediators of inflammation (30). S100A8 and S100A9 were first found in macrophages in infiltrates in patients with RA (32). They function as a heterodimer (myeloid-related protein 8 [MRP-8]/MRP-14) and are released by activated monocytes upon interaction with activated endothelial cells under conditions of inflammation. The concentrations of S100A8 and S100A9 in SF and serum of patients with active juvenile RA (JRA) are significantly higher than those in healthy controls or JRA patients who have experienced remission after therapy (33). When compared with samples from patients with osteoarthritis, SF and serum from RA patients exhibit higher levels of S100A8 and S100A9 (34, 35). One of the functions of the S100A8–S100A9 complex is to mediate leukocyte migration and adhesion to vascular endothelium (36, 37).
S100A12 is a proinflammatory chemoattractant that induces monocyte migration (38). The interaction of S100A12 and RAGE leads to activation of the NF-κB pathway in macrophages, lymphocytes, and endothelial cells, which in turn results in production of other proinflammatory mediators (39). Concentrations of S100A12 in SF and serum of patients with RA are higher than in patients with osteoarthritis (35). The average concentration of S100A12 in SF of RA patients is ∼2 μg/ml, while in serum it is 0.35 μg/ml in patients versus 0.05 μg/ml in healthy controls (3). Previously reported serum concentrations established for S100A12 (3) closely match the serum concentrations of S100A12 that were obtained in the present study using the MRM method for quantification. The concentration of S100A12 in SF is 5–10 times higher than in corresponding serum samples. Thus, there is an advantage for marker discovery in this medium due to high concentrations of analyte.
In addition to CRP and the S100 proteins, many of the marker candidates discovered in SF are directly or indirectly involved in biologic processes of RA or other inflammatory diseases. Osteopontin mediates attachment and invasion of synovial fibroblasts to cartilage and stimulates the release of collagenase 1 for matrix degradation (40). It is also implicated as a proinflammatory factor in multiple sclerosis (41). Peptidylprolyl isomerase, or cyclophilin, is a binding protein for the potent immunosuppressant cyclosporin A. A role for it as a proinflammatory mediator in arthritis and systemic lupus erythematosus has been suggested (42, 43). G3PDH and phosphoglycerate mutase have been found to bind the active metabolite of leflunomide, an immunoregulatory and antiinflammatory drug, implying that they may be molecular targets of the drug (44). Plasma concentrations of the low-affinity Ig Fcγ receptor IIIA (CD16A) and CD14 are increased in patients with RA versus healthy controls (6, 45). Cathepsin B is expressed at higher levels in synovial lining cells of RA patients than in those of osteoarthritis patients (46). Glutathione transferase activity is 3-fold higher in the serum of patients with RA versus healthy controls (47). RhoGDI2 is an inhibitor of the small G protein Rho, which plays an important role in human leukocyte signal transduction pathways (48, 49). The 14-3-3 protein binds and is affected by mizoribine, a novel immunosuppressant (50).
This study had several limitations. First, the sample size was very small, due to the resource- and labor-intensive nature of the MS approach at the present time. The small sample size limits the generalizability of the results to patients with similar duration, treatment, joint distribution, and severity of RA. However, it is likely that proteomic methods will become more streamlined and efficient in the future, making further application and validation of this method practical. The small sample size also limited our ability to carry out statistical analyses with even modest power. Nevertheless, the main goal of this study was to demonstrate that proteomic techniques can be used to discover novel biomarkers in RA, and to confirm observations in SF with measurements in serum in a second group of patients.
By using an effective 2-step proteomic approach, in which biomarker discovery using semiquantitative protein profiling of diseased tissues was followed by candidate verification using quantitative MRM analysis in peripheral blood, we were able to identify at least 33 biomarker candidates for RA and to verify a subset of very promising biomarkers that are indicative of disease severity. This approach could be used for biomarker identification in many other diseases in which body fluids including serum are the discovery medium. As more efficient sample enrichment/separation techniques, as well as more sensitive and accurate mass spectrometers, become available, a wealth of additional information on disease-associated biomarkers at even lower protein concentrations will be revealed. By offering several promising biomarkers in SF and serum of patients with erosive RA, this investigation contributes to meeting the need for prognostic markers of aggressive RA. The clinical application of such biomarkers, while still requiring several levels of validation, could for the first time arm clinicians with a tool for rational selection of patients whose conditions warrant high-cost, high-risk treatment early enough in the disease course to prevent unnecessary joint destruction and disability.
We would like to thank Geoff Ginsburg, MD, PhD for critical reading of the manuscript and Steve Lewitzky and Mike Morrissey for their input in the statistical analysis.