Classification of cancer types by measuring variants of host response proteins using SELDI serum assays




Protein expression profiling has been increasingly used to discover and characterize biomarkers that can be used for diagnostic, prognostic or therapeutic purposes. Most proteomic studies published to date have identified relatively abundant host response proteins as candidate biomarkers, which are often dismissed because of an apparent lack of specificity. We demonstrate that 2 host response proteins previously identified as candidate markers for early stage ovarian cancer, transthyretin and inter-alpha trypsin inhibitor heavy chain 4 (ITIH4), are posttranslationally modified. These modifications include proteolytic truncation, cysteinylation and glutathionylation. Assays using Surface Enhanced Laser Desorption/Ionization Time of Flight Mass Spectrometry (SELDI-TOF-MS) may provide a means to confer specificity to these proteins because of their ability to detect and quantitate multiple posttranslationally modified forms of these proteins in a single assay. Quantitative measurements of these modifications using chromatographic and antibody-based ProteinChip® array assays reveal that these posttranslational modifications occur to different extents in different cancers and that multivariate analysis permits the derivation of algorithms to improve the classification of these cancers. We have termed this process host response protein amplification cascade (HRPAC), since the process of synthesis, posttranslational modification and metabolism of host response proteins amplifies the signal of potentially low-abundant biologically active disease markers such as enzymes. © 2005 Wiley-Liss, Inc.

Although the analytical tools used for protein expression profiling have become increasingly powerful, a fundamental challenge remains: the broad dynamic range of the human proteome.1, 2 Serum is of particular interest because biomarkers found in it may lead directly to diagnostic applications, but its dynamic range of protein expression is estimated at 9–12 orders of magnitude, far wider than any analytical technique can currently discern. Despite many efforts to overcome this problem, most proteomic biomarker discovery studies have described relatively abundant host response proteins as candidate biomarkers.3, 4 The host response comprises a cascade of inflammatory signals that can be triggered by very small inciting events, e.g., a localized infection or a small tumor, and that leads to up- and downregulation of a group of circulating proteins often called acute phase reactants.5, 6, 7 These proteins are mostly synthesized by the liver and include albumin, transthyretin, retinol binding proteins, clotting proteins, lipoproteins, c-reactive protein and immune mediators, among others. Since this host response occurs in response to virtually any insult, including inflammation, infection, vascular disease and malignancy, measurement of these analytes is generally not thought to be diagnostically useful.

We recently completed a 503-patient study to identify biomarkers that could distinguish patients with early stage ovarian cancer from control individuals (women with benign disease or healthy women). Three markers, transthyretin, apolipoprotein A1 and a fragment of inter-alpha trypsin inhibitor heavy chain 4 (ITIH4), together formed a multimarker panel with higher diagnostic accuracy than CA-125.8 Although these are abundant host response proteins, of particular interest to us was the fact that 2 components of the 3-marker panel, transthyretin and ITIH4, were fragments of their mature counterparts. This raised the possibility that diagnostic specificity could be conferred by the relative amounts of the modified and unmodified forms of these proteins, in addition to their use in a multimarker panel. We developed SELDI-TOF-MS-based immunologic and chromatographic assays that detect and quantify these variants and used multivariate analysis to distinguish ovarian cancer from other cancers.

Material and methods

Patient samples

A total of 142 archived serum specimens collected for routine clinical laboratory testing at the Johns Hopkins Medical Institutions were tested. The samples came from 41 healthy women, 41 patients with late-stage ovarian cancer and groups of 20 patients each with breast, colon and prostate cancers. All patients with breast and colon cancer were female. All samples were processed promptly after collection and stored at 2–8°C for a maximum of 48 hr prior to freezing at −70°C.


Rabbit anti-human transthyretin antibody was purchased from Dako (Glostrup, Denmark; catalog number A0002). To generate antibodies to ITIH4, a peptide corresponding to the ITIH4-derived biomarker for ovarian cancer was chemically synthesized with an additional cysteine at the N-terminus (SynPep, Dublin, CA). The sequence is as follows: CMNFRPGVLSSRQLGLPGPPDVPDHAAYHPF. The peptide was conjugated to KLH and injected into rabbits following a standard 69-day protocol used by SynPep. Rabbit bleeds were collected, and ELISA tests to determine titer were performed by SynPep. Antibodies were both Protein-A purified and affinity purified.

ITIH4 stability test

Ten microliters of sample serum were diluted in 90 μl of binding buffer with or without protease inhibitor cocktail (Roche, Mannheim, Germany). The incubation was done at either room temperature or 4°C for 2 hr or 4 hr. The beads were washed once with binding buffer with or without the protease inhibitor, then twice with the binding buffer and once with water. The remainder of the assay was performed as described below.


Antibodies were cross-linked to AminoLink® Plus beads (Pierce Biotechnology, Rockford, IL). Beads were washed 2 times with 200 μl of binding buffer (PBS pH 7.2, 0.1% Tween). Patient serum samples were diluted in the same buffer (1:50 for transthyretin assays or 1:5 for ITIH4 assays) to a final volume of 100 μl per assay well. Reactions were performed in 96-well filter plates (Silent Screen Plate, 96 well, 1.2 μm, Nalge Nunc #256065) at room temperature for 2 hr on a MicroMix shaker (DPC; program 15, amplitude 6). At the end of the incubation, the beads were washed 3 times with 200 μl of PBS pH 7.2, 0.1% Tween, for 3 min each with shaking. Captured material was eluted 3 times by adding 50 μl of an organic elution buffer (33.3% isopropanol/16.7% acetonitrile/0.1% trifluoroacetic acid + 0.1% CHAPS) and incubating for 5 min each time with shaking. The 3 eluates were pooled in a 96-well v-bottom plate by centrifugation at 671g for 1 min in a desktop Sorvall centrifuge. Array binding was performed immediately, or the eluates were stored at −80°C prior to additional processing.
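The serum and buffer volumes implied by these dilutions can be worked out with a short helper. This is a minimal sketch that assumes "1:N" means 1 part serum in N parts total volume (the same convention as 10 μl of serum in 90 μl of buffer giving a 1:10 dilution); the function name is hypothetical and not part of the published protocol.

```python
def dilution_volumes(final_volume_ul, dilution_factor):
    """Return (serum_ul, buffer_ul) for a 1:dilution_factor dilution.

    Assumption: 1:N is read as 1 part serum in N parts total volume.
    """
    if dilution_factor < 1:
        raise ValueError("dilution_factor must be >= 1")
    serum_ul = final_volume_ul / dilution_factor
    buffer_ul = final_volume_ul - serum_ul
    return serum_ul, buffer_ul


# Transthyretin assay: 1:50 in a 100 ul well
ttr_serum, ttr_buffer = dilution_volumes(100, 50)
# ITIH4 assay: 1:5 in a 100 ul well
itih4_serum, itih4_buffer = dilution_volumes(100, 5)
print(ttr_serum, ttr_buffer, itih4_serum, itih4_buffer)
```

Under this reading, the ITIH4 assay consumes ten times more serum per well than the transthyretin assay.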

Transthyretin chromatographic assay

All assays were performed in 96-well filter plates. Five microliters of patient serum were denatured with 7.5 μl of 9 M urea, 2% CHAPS, 50 mM Tris-HCl, pH 9, in a v-bottom 96-well plate and incubated at room temperature for 2 min with vortexing. The denatured serum was diluted with 150 μl of 0.02% Triton X-100 in PBS (2×) containing protease inhibitor cocktail without EDTA (Roche, 1 tablet per 50 ml). For each assay, 50 μl of a 50% suspension of IDA-Ni(II) (Biosepra IMAC Hypercel, charged with 0.1 M NiSO4) was washed 3 times with 200 μl of 0.02% Triton X-100 in PBS (2×). The diluted, denatured serum was added to the IDA-Ni-containing filter plate and incubated with shaking at room temperature for 30 min. The plates were then washed 8 times with 200 μl of PBS containing 0.02% Triton X-100. After the bottom of the filter plates was gently blotted dry on tissue paper (Kimwipe), the bound material was eluted into a 96-well plate 4 times with 75 μl of 10 mM imidazole, 1 M urea, 0.1% CHAPS, 0.3 M KCl, 100 mM Tris-HCl, pH 7.5, with mixing for 10 min each time. The 4 elutions were pooled.

ProteinChip array binding and reading

All ProteinChip array incubations were performed in 96-well bioprocessors. CM10 ProteinChip arrays (Ciphergen, Fremont, CA) were pre-equilibrated twice with 150 μl of 100 mM sodium acetate, pH 4.0, for 5 min each. Q10 ProteinChip arrays were pre-equilibrated twice with 200 μl of 0.1 M sodium phosphate buffer, pH 7.5, for 5 min each. For the transthyretin immunoassay, 40 μl of the immunoassay elution was mixed with 60 μl of 100 mM sodium acetate, pH 4.0 buffer in the wells. For the transthyretin chromatographic assay, 40 μl of the IDA-Ni elution was diluted with 150 μl of 0.1 M sodium phosphate buffer, pH 7.5. For the ITIH4 assay, 50 μl of the immunoassay elution was mixed with 50 μl of 100 mM sodium acetate, pH 4.0 buffer. Array binding was performed at room temperature for 30 min (CM10 arrays) or 60 min (Q10 arrays) with shaking. The CM10 arrays were washed 3 times for 5 min each with 200 μl of pH 4.0 binding buffer, with shaking. The Q10 arrays were washed 2 times with 200 μl of 0.1 M sodium phosphate, pH 7.5, with shaking. The arrays were rinsed with water once for 1 min and then briefly allowed to air dry. For transthyretin assays, 1 μl of 50% sinapinic acid (SPA) was added to each spot, dried and then reapplied. For ITIH4, 1 μl of 20% CHCA was added to each spot, dried and then reapplied. The arrays were read in PBSIIc ProteinChip readers, which are time-lag focusing, linear, laser desorption/ionization time-of-flight mass spectrometers. All spectra were acquired in the positive-ion mode. Time-lag focusing delay times were set at 400 ns for low-mass scans and 1900 ns for high-mass scans. Ions were extracted using a 3 kV ion extraction pulse and accelerated to a final velocity using 20 kV of acceleration potential. The system employed a pulsed nitrogen laser at repetition rates varying from 2–5 pulses per second. Typical laser fluence varied from 30–150 μJ/mm². An automated analytical protocol was used to control data acquisition for most of the sample analyses. Each spectrum was an average of at least 50 laser shots and was externally calibrated against a mixture of known peptides or proteins.
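For orientation, the idealized relation between flight time and m/z in a linear time-of-flight instrument can be written as follows. This is a textbook sketch, not a description of the PBSIIc reader's internal calibration: it ignores the extraction pulse and time-lag focusing, and the symbols $L$ (drift length), $U$ (acceleration potential, 20 kV here), $a$, $b$ and $t_0$ are introduced only for illustration.

```latex
% Energy gained by an ion of mass m and charge ze in the acceleration field:
\[
  z e U = \tfrac{1}{2} m v^{2}
  \quad\Longrightarrow\quad
  v = \sqrt{\frac{2 z e U}{m}}
\]
% Flight time over a field-free drift length L:
\[
  t = \frac{L}{v} = L \sqrt{\frac{m}{2 z e U}}
  \quad\Longrightarrow\quad
  \frac{m}{z} = \frac{2 e U}{L^{2}}\, t^{2}
\]
% In practice the constants are absorbed into an empirical calibration
% fitted to calibrants of known mass, for example
\[
  \frac{m}{z} \approx a\,(t - t_{0})^{2} + b
\]
```

Because m/z scales with the square of the flight time, external calibration against peptides or proteins of known mass fixes the constants without requiring the instrument geometry to be known.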

Data analysis

Data preprocessing was performed in CiphergenExpress version 2.1. Spectra were baseline subtracted using a fitting window of 8 times expected peak width. For ITIH4 assays, data were normalized using an external coefficient of 1, mass window of m/z 5,000–50,000. For transthyretin chromatographic assays, data were normalized using an external coefficient of 1, mass window of m/z 1,500–200,000. Univariate analysis was performed using the Mann-Whitney test for each pairwise comparison.
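The pairwise comparison can be illustrated with a small, dependency-free implementation of the Mann-Whitney test. This is a minimal sketch, assuming a two-sided test with the normal approximation and a tie correction; it is not the routine used in CiphergenExpress, and the intensity values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, approximate two-sided p-value).
    """
    pooled = sorted((value, group) for group, sample in enumerate((x, y))
                    for value in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical peak intensities for one protein form (not study data)
cancer = [4.1, 3.8, 5.0, 4.4, 3.9]
control = [6.2, 5.9, 7.1, 6.5, 5.7]
u, p = mann_whitney_u(cancer, control)
print(u, p)
```

For group sizes as small as those in this toy example an exact test would be preferable; the normal approximation is more appropriate for groups of 20 to 41 samples, as in this study.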

Results

We recently identified a 3-marker panel comprising apolipoprotein A1, transthyretin and ITIH4 that could distinguish women with early stage ovarian cancer from healthy women. To determine whether these markers were specific to ovarian cancer or were generically changed in a variety of cancers, we performed immunoassays for apolipoprotein A1 and transthyretin on samples from patients with ovarian cancer, breast cancer, colon cancer, prostate cancer and healthy women. Although both apolipoprotein A1 and transthyretin were changed in ovarian cancer, none of the other cancers tested demonstrated significant changes in both of these analytes, although transthyretin was decreased in patients with colon cancer (p < 0.01).8 Traditional immunoassays measure the sum of all antigens that bind the antibody and therefore do not reveal the relative contribution by isoforms or posttranslationally modified forms of the analyte. We therefore constructed ProteinChip assays for each of the 3 ovarian cancer biomarkers discovered in our previous study. Figure 1a shows an example of the ProteinChip immunoassay for transthyretin. Note that this immunoassay, in contrast to nephelometry, can simultaneously quantify and distinguish 4 forms of transthyretin: unmodified, cysteinylated, glutathionylated and truncated. All but the truncated form have been previously described.9 These 4 forms can also be visualized by direct serum profiling on metal affinity (IMAC30 coupled with copper), anionic (Q10) and cationic (CM10) ProteinChip arrays (Fig. 1b). We performed the transthyretin SELDI chromatographic assay on the same set of samples for which the traditional immunoassay was run (Fig. 1c–f). These results demonstrate that quantifying individual forms of transthyretin confers higher diagnostic accuracy than measuring total transthyretin alone. 
In particular, these results reveal that the significant change in transthyretin levels seen in colon cancer by traditional assay reflects downregulation of the truncated and unmodified forms, which comprise approximately 50% of total transthyretin. Among the cancers tested, the SELDI assays for transthyretin variants show that the cysteinylated and glutathionylated forms are significantly decreased only in ovarian cancer. Measuring these variants therefore offers greater specificity than measuring total transthyretin by traditional methods. Similarly, immunoassays directed against apolipoprotein A1 reveal 2 peaks, one at 28 kD and another at 29 kD (data not shown).
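The 4 transthyretin forms are resolved because each modification shifts the mass by a characteristic amount. The sketch below estimates those masses under stated assumptions that are not taken from this study: a nominal average mass of about 13,761 Da for the unmodified monomer, disulfide adducts of +119.14 Da (cysteine) and +305.31 Da (glutathione), and an N-terminal sequence of GPTGTGESKC for the 10 residues lost in the truncated form. All names are hypothetical.

```python
# Average residue masses in Da for the residues used below (standard values)
RESIDUE_MASS = {
    "G": 57.0519, "P": 97.1167, "T": 101.1051, "E": 129.1155,
    "S": 87.0782, "K": 128.1741, "C": 103.1388,
}

TTR_UNMODIFIED = 13761.0      # assumed nominal monomer mass, Da
CYSTEINYLATION = 119.14       # assumed shift for a cysteine disulfide adduct
GLUTATHIONYLATION = 305.31    # assumed shift for a glutathione adduct
N_TERMINAL_10 = "GPTGTGESKC"  # assumed first 10 residues of mature transthyretin
PROTON = 1.00728              # mass of a proton, Da


def expected_masses():
    """Approximate neutral average masses of the 4 transthyretin forms."""
    removed = sum(RESIDUE_MASS[aa] for aa in N_TERMINAL_10)
    return {
        "truncated": TTR_UNMODIFIED - removed,
        "unmodified": TTR_UNMODIFIED,
        "cysteinylated": TTR_UNMODIFIED + CYSTEINYLATION,
        "glutathionylated": TTR_UNMODIFIED + GLUTATHIONYLATION,
    }


def mz_singly_protonated(mass):
    """m/z of the [M+H]+ ion for a given neutral mass."""
    return mass + PROTON


for form, mass in expected_masses().items():
    print(f"{form:17s} {mz_singly_protonated(mass):10.1f}")
```

On these assumptions the forms are separated by roughly 120 to 920 Da, which a linear time-of-flight instrument can readily resolve in this mass range.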

Figure 1.

SELDI ProteinChip assays for transthyretin. (a, b) SELDI immunoassay using polyclonal antibody to transthyretin immobilized on affinity beads. Serum was incubated with the antibody-coated beads, washed and the bound material eluted onto cationic (CM10) ProteinChip arrays. (bottom two panels) SELDI chromatographic assay on anionic (Q10) ProteinChip array. Both assays reveal 4 forms of transthyretin: a truncated form lacking the first 10 amino acids (t); unmodified, full-length transthyretin (u); cysteinylated transthyretin (c); and glutathionylated transthyretin (g). In our experience, cysteinylated transthyretin is always the largest peak of these forms, and the truncated form is the smallest. (c–f) Scatter plots of intensity values obtained using the SELDI chromatographic assay for each of the transthyretin forms across samples from breast, ovarian, prostate and colon cancer as well as age-matched controls. *p < 0.01; **p < 0.001 using the 2-group Mann-Whitney test comparing each form of transthyretin for each cancer vs. healthy controls.

We developed a SELDI immunoassay to examine the importance of quantifying specific fragments of ITIH4, another marker for early stage ovarian cancer. ITIH4 is a 120 kD protein but is known to be extensively processed in vivo.10 The SELDI immunoassays are quantitative, showing a linear response over more than 2-fold (Fig. 2). In SELDI immunoassays of human serum, the antibody recognized both the antigenic peptide and a series of smaller peptides representing sequential N-terminal truncation (Fig. 3). These truncations may arise in vivo through processing by an aminopeptidase11 or ex vivo, depending on the stability of the peptides. We therefore wanted to determine the stability of these peptides during the assay process. We tested 3 parameters: temperature of incubation (room temperature vs. 4°C), duration of incubation (2 vs. 4 hr) and effect of protease inhibitors. Figure 4 reveals that the addition of protease inhibitors had no effect on the intensity or pattern of fragmentation. Longer incubation decreased the overall peak heights but did not alter the pattern of fragmentation. Incubation at room temperature gave slightly lower peak heights than at 4°C but, like the other 2 parameters, had no effect on the overall pattern of fragmentation. In addition, incubation of the serum at room temperature prior to processing did not alter the appearance of the peptide train, indicating that these changes occurred in vivo, not during the assay process (data not shown).
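A linear standard curve of the kind used to establish quantitation can be fitted and inverted with ordinary least squares. This is a minimal sketch: the concentrations and intensities are hypothetical placeholders, not the values behind Figure 2, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_intensity(intensity, slope, intercept):
    """Invert the standard curve to estimate concentration from peak intensity."""
    return (intensity - intercept) / slope


# Hypothetical spiked peptide concentrations and measured peak intensities
spiked = [1.0, 2.0, 4.0, 8.0]
intensity = [3.1, 5.0, 9.2, 16.9]
slope, intercept = fit_line(spiked, intensity)
print(concentration_from_intensity(7.0, slope, intercept))
```

The inversion is only meaningful within the concentration range over which the response was shown to be linear.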

Figure 2.

Quantitation of SELDI ProteinChip assay for ITIH4. ITIH4 peptide was spiked into phosphate-buffered saline at the indicated concentrations and a standard curve was generated.

Figure 3.

SELDI ProteinChip assay for ITIH4 in serum. Top, sequence of ITIH4 peptide discovered as a biomarker for ovarian cancer. Bottom, SELDI immunoassay for ITIH4, revealing amino-terminus train.

Figure 4.

Stability of ITIH4 fragments. The SELDI immunoassay for ITIH4 was performed in the absence or presence of protease inhibitors for 2 or 4 hr at room temperature or 4°C. Longer duration led to lower intensity peaks, but none of these parameters changed the overall fragmentation pattern.

This SELDI immunoassay was used to analyze the serum of the same set of patients described earlier. Table I demonstrates that ITIH4 is processed differently in the serum of patients with different types of cancer. For example, no significant difference in these ITIH4 peptides was found between breast cancer patients and control individuals. In contrast, levels of several, but not all, of the peptides were changed in the other 3 types of cancer examined (ovarian, colon and prostate). Although ovarian and colon cancer both demonstrated changes in the level of similar peptides, the changes in colon cancer were generally more marked.

Table I. Differential Modification of ITIH4 in Different Cancers1

1 Two-group Mann-Whitney test comparing intensity values of each of the forms of ITIH4 between the respective cancer and age-matched control individuals. ns, not significant; nd, not determined (the 2,486 m/z fragment was not visualized).


Because each cancer demonstrates a different combination of differentially expressed modified forms of the proteins, multivariate analysis may be used to classify the cancers. Use of protein markers and marker fragments in combination is likely to confer higher specificity than using any single one.12, 13, 14, 15 Singular value decomposition (SVD) was applied to the data of the 2 forms of ApoA1 and 4 forms of transthyretin to allow for visualization at reduced dimension.16 The groups of ovarian cancer samples, controls and the breast, colon and prostate cancer samples each formed overlapping but distinguishable clusters with moderately good separation from one another in the 2-dimensional space spanned by the first and third components of the SVD analysis (Fig. 5). Since SVD is an unsupervised process, this result serves as an independent validation of the cancer classification information carried by the individual protein variants. These results demonstrate that relatively abundant serum proteins can in fact be specifically associated with a disease, when their posttranslational modifications are taken into consideration and when used in combination. It is likely that as additional biomarkers are discovered and added to this analysis, greater separation may be achieved.
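The dimension reduction step can be sketched with NumPy. This is an illustration under assumptions, not the analysis pipeline used in the study: the per-feature standardization is our choice (the preprocessing applied before SVD is not specified here), the function name is hypothetical, and the intensity matrix is random placeholder data with 6 columns standing in for the 2 ApoA1 and 4 transthyretin forms.

```python
import numpy as np


def svd_project(intensities, components=(0, 2)):
    """Project samples (rows) onto selected SVD components.

    Columns are standardized first (an assumption, see lead-in). The
    default selection (0, 2) corresponds to the first and third components.
    Returns (sample coordinates, singular values).
    """
    x = np.asarray(intensities, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u * s  # coordinates of each sample on every component
    return scores[:, list(components)], s


# Placeholder data: 10 samples x 6 protein forms (not study data)
rng = np.random.default_rng(0)
demo = rng.normal(size=(10, 6))
coords, singular_values = svd_project(demo)
print(coords.shape, singular_values)
```

Because no class labels enter the decomposition, any grouping visible in the projected coordinates reflects structure in the protein-form intensities themselves.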

Figure 5.

Unsupervised cluster analysis by singular value decomposition (SVD) using variants of apolipoprotein A1 and transthyretin. Samples are displayed in a 2-dimensional space spanned by the first and third SVD components. The ovarian, breast, prostate and colon cancer samples and healthy controls were distributed into clusters using unsupervised learning, which serves as an independent validation of the cancer classification information carried by the protein variants.

Discussion

Serum proteomics is increasingly used to discover and characterize candidate biomarkers to answer a variety of clinical questions, including initial diagnosis, prognosis and treatment response. Perhaps the most important lesson learned from protein expression profiling is the difficulty of overcoming the dynamic range of the proteome, which is 8–9 orders of magnitude greater than the quantitative dynamic range of any commonly used analytical technique. Any untargeted approach will intrinsically favor the more abundant analytes. These include albumin, transferrin and immunoglobulin, which comprise more than 70% of the serum and plasma proteomes. Together with the next group of abundant proteins, which includes transport proteins such as haptoglobin, transthyretin and lipoproteins; proteases and protease inhibitors such as clotting factors and alpha-1-antitrypsin; and immune response proteins such as complement factors and c-reactive protein, more than 99% of the plasma proteome is represented.1, 2

Although each of these abundant proteins plays a normal physiologic role, most are also components of the host response that is elicited by any pathophysiologic process. Although this acute phase response has been surmised to be a general, stereotyped phenomenon, it has become clear that in fact different diseases elicit different individual components of the acute phase response and that they do so even in their earliest stages.6, 17 This host response is generally mediated by the innate immune system; for example, interleukins 6 and 8 are commonly associated with the increases in c-reactive protein. The inflammatory cascade includes a multitude of constituents with varying functions, including structural proteins, clotting factors, angiogenesis and transport proteins.7 In addition, proteases and protease inhibitors play integral roles in the host response. Whereas the activity of some of these factors can be directly measured, in other cases it is easier to measure their byproducts, i.e., measure the proteolytic fragments generated by their activity.

In addition to the proteases and protease inhibitors elicited by the inflammatory response, both proteases and protease inhibitors are expressed on the cell surface of the malignant cells.18, 19 Many of the proteases expressed by malignant cells themselves are thought to play a role in invasion, loss of contact inhibition and angiogenesis.20, 21 For example, the aminopeptidase CD13 is expressed on the surface of ovarian cancer cells and endothelial cells and is thought to play a role in invasion and angiogenesis.11 Most of these proteases do not reach generally detectable levels in the circulation until late-stage disease, presumably as a result of increased tumor bulk.22 Therefore, it is unlikely that relying on measuring these proteins will be useful in the diagnosis of early stage cancer, when there is the best chance of survival. Because host response proteins exist at substantially higher concentrations than the enzymes that process them within the tumor microenvironment, measurement of the modified forms of these analytes is more likely to serve that purpose.

We propose the following model, which we have called the host response protein amplification cascade (HRPAC), to describe how the host response provides a means to detect disease at its earliest stages. In this model, host response proteins that are exposed to the tumor microenvironment are processed and exit the tumor microenvironment substantially modified and at sufficiently high concentrations to be detected (Fig. 6a). The modifying proteins themselves are present at relatively high concentrations in the local microenvironment but at much lower concentrations in the systemic circulation; therefore, measuring the (dis)appearance of substrates and enzymatic products represents measuring an amplified signal of enzymes and other factors. Inflammatory proteins participate in cascades that have multiple sites of entry and exit, and therefore no individual host response protein can be expected to have adequate clinical diagnostic specificity. However, because different inciting events (inflammation, infection, cancer) lead to the expression of different subsets of proteases and protease inhibitors, different byproducts are observed, even of the same parent protein (Fig. 6b). Just as the acute phase response is not a generic entity, cancers do not express these proteases generically but express different subsets of proteases according to tumor type, as exemplified by the family of tissue kallikreins.23

Figure 6.

Host response protein amplification cascade concept. (a) Proteins are synthesized in the liver and enter the circulation. When these proteins are exposed to a localized disease area, they are processed by the local host response, and modified forms of the proteins reenter the general circulation. This is the source of amplification of a localized disease signal. (b) Specificity of this process is made possible by the fact that each disease generates a different type of local host response. This may be due to the fact that each disease expresses a different set of antigens (e.g., tumor markers) or that the recruitment of specific inflammatory mediators differs based on the inciting event.

Identification of the enzymes responsible for generating these posttranslational variants will provide both mechanistic insight and a potential source of additional diagnostic markers. We hypothesize that most of these proteases will be present at substantially lower concentrations than the analytes we are measuring. In cancer, a large number of proteases and protease inhibitors have been suggested to be candidate tumor markers, including the tissue kallikreins,23 matrix metalloproteases,24 prostasin,25 bikunin,26 urokinase plasminogen activator receptor,27 and tumor associated trypsin inhibitor.28 Most of these enzymes circulate in the serum at concentrations of pg/ml to ng/ml, but their local concentration at the site of the tumor is substantially higher. Determination of which proteases act on which substrates to generate which products will permit their inclusion in future diagnostic multimarker panels and may identify potential targets for therapeutic intervention.29

Protein expression profiling using proteomics techniques can be used to discover novel modified forms of proteins and to determine which combinations of proteins are most specifically associated with which conditions. For this to be valid, the protein expression profiling technique must be reproducible and have adequate throughput to assay enough samples across enough clinical conditions to determine clinical sensitivity and specificity. To optimally determine in vivo protein processing, protein expression profiling methodologies that do not artificially introduce cleavages are preferred (top-down approaches).

Another important component of the HRPAC approach to diagnostics is the use of multivariate analysis to classify cancer types as well as to distinguish cancer from other diseases. Our study examined only three markers, apolipoprotein A1, transthyretin and a specific fragment of ITIH4. Although a moderately high level of specificity was achieved in our study using these markers, even greater specificity should be achieved when a larger panel of markers is taken into consideration. This will be important in determining an optimal biomarker panel that encompasses a more diverse set of diseases that includes cancer, cardiovascular disease and inflammatory disease. For example, c-reactive protein has been reported to be increased in a variety of diseases, including colon cancer,30 cardiovascular disease,31 osteoarthritis32 and macular degeneration.33 We have developed SELDI immunoassays for c-reactive protein and are studying its various posttranslational modified forms for their potential inclusion in the next generation of HRPAC-based diagnostic tests (data not shown).

The ability to measure the individual products of the HRPAC and to determine which combinations of these products are specific and sensitive for a given disease condition is likely to enable the development of a novel multimarker panel diagnostic test for the disease condition. Because this paradigm can be used to distinguish between various cancers, this may even offer the possibility of providing a molecular solution to the problem of diagnosing epithelial tumors of unknown primary origin. Taken one step further, this paradigm of measuring the host response may eventually permit the diagnosis of cancer at its earliest stages, irrespective of primary source.