High predictive accuracy of an unbiased proteomic profile for sustained virologic response in chronic hepatitis C patients


  • This study was funded in part by a generous grant from the David H Murdock Institute for Business and Culture; this project was also supported in part by CTSA Grant No. 1 UL1 RR024128-01 from NCRR and the NIH Roadmap for Medical Research.

  • Disclosures: A.T. and J.G.M. are coinventors of a patent relating to the IL-28B discovery.


Chronic hepatitis C (CHC) infection is a leading cause of endstage liver disease. Current standard-of-care (SOC) interferon-based therapy results in sustained virological response (SVR) in only one-half of patients, and is associated with significant side effects. Accurate host predictors of virologic response are needed to individualize treatment regimens. We applied a label-free liquid chromatography mass spectrometry (LC-MS)-based proteomics discovery platform to pretreatment sera from a well-characterized and matched training cohort of 55 CHC patients, and an independent validation set of 41 CHC genotype 1 patients with characterized IL28B genotype. Accurate mass and retention time methods aligned samples to generate quantitative peptide data, with predictive modeling using Bayesian sparse latent factor regression. We identified 105 proteins of interest with two or more peptides, and a total of 3,768 peptides. Regression modeling selected three identified metaproteins, vitamin D binding protein, alpha 2 HS glycoprotein, and Complement C5, with a high predictive area under the receiver operator characteristic curve (AUROC) of 0.90 for SVR in the training cohort. A model averaging approach for identified peptides resulted in an AUROC of 0.86 in the validation cohort, and correctly identified virologic response in 71% of patients without the favorable IL28B “responder” genotype. Conclusion: Our preliminary data indicate that a serum-based protein signature can accurately predict treatment response to current SOC in most CHC patients. (HEPATOLOGY 2011)

Chronic hepatitis C infection (CHC) is a global health problem, with an estimated 120 to 130 million chronic hepatitis C virus (HCV) carriers worldwide. Current standard-of-care (SOC) treatment with pegylated interferon (PEG-IFN) in combination with ribavirin is costly, associated with significant side effects, and results in sustained virological response (SVR) in only about one-half of treated subjects in controlled clinical trial settings with carefully selected patients. In standard clinical practice, comorbid conditions and potential adverse events result in frequent dosage adjustments, reduced compliance, and early withdrawal from treatment, further reducing SVR.1

Identifying host-viral factors that predict the likelihood of SVR prior to initiating therapy would be a very useful clinical tool that could help reduce costs and avoid unnecessary exposure to therapy with significant side effects. Data from clinical registration trials have identified genotype and pretreatment HCV RNA levels as the most significant viral predictors of SVR.2 Baseline host characteristics that influence virologic response include race, gender, age, disease stage, insulin resistance, and body weight, but are unable to accurately predict SVR. Pretreatment serum biochemical markers that are often obtained routinely, such as liver transaminases, total cholesterol and γ-glutamyltransferase, are also unreliable predictors of SVR.3 As a result, predictive models for SVR that are based on pretreatment host-viral characteristics have not been developed for routine clinical use. Recent data indicate that a single nucleotide polymorphism in the host IL28B region on chromosome 19q13 identified in one-third of HCV genotype 1-infected patients is also strongly associated with SVR.4 The allele frequency for this SNP varies with genetic ancestry and, for example, the response variant is present in 15% of African-Americans, who have the lowest SVR rates of any ethnic group to current SOC therapy. Thus, alternative and more reliable baseline predictors of virologic response are still required for most CHC patients.

Most proteomic studies in HCV to date have used unbiased proteomic profiling to identify novel markers of HCV-related fibrogenesis or identify pathways of hepatocellular carcinoma (HCC) development. A prior study evaluating treatment response in CHC patients using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-ToF/MS) ProteinChip technology noted that two “peaks” of interest in combination with fibrosis stage and viral genotype could predict treatment response in 81% of a verification cohort with an area under the receiver operator characteristic curve (AUROC) of 0.75.5 However, these protein “peaks” have not yet been further characterized or validated in other CHC cohorts, and as such are unlikely to have further clinical application. In this study we used an unbiased proteomic discovery platform with high-resolution, accurate-mass liquid chromatography mass spectrometry (LC-MS) to analyze pretreatment sera from a well-characterized training cohort of 55 CHC patients, and validated the findings in a further 41 CHC genotype 1 patients with characterized IL28B genotype. We identified novel signature pathways of derived metaproteins that can predict SVR with an AUROC of 0.86-0.90 before commencing SOC antiviral therapy.


Abbreviations: AUROC, area under the receiver operator characteristic curve; CHC, chronic hepatitis C; HCC, hepatocellular carcinoma; HCV, hepatitis C virus; LC-MS, liquid chromatography mass spectrometry; PEG-IFN, pegylated interferon; SOC, standard-of-care; SVR, sustained virological response.

Patients and Methods

Patient Population.

Patients were selected from the Duke Hepatology Clinical Research (DHCR) database, an ongoing biorepository of greater than 3,000 HCV-infected patients initiated in 2002. Only sera from selected CHC patients with available treatment history and demographic data were included in this study. SVR was defined as undetectable HCV RNA at 6 months following end-of-treatment with current SOC therapy.

Pretreatment serum samples from 55 CHC patients in three different phenotypic groups were selected as the initial “training” group for this study: 19 nonresponder patients with HCV genotype 1; 17 SVR patients with HCV genotype 1; and 20 SVR patients with HCV genotype 2/3. An additional 41 genotype 1 CHC patients were selected for the validation study (n = 26 responders and n = 15 nonresponders). Responders and nonresponders were matched as closely as possible with respect to clinical and demographic variables known to affect treatment response, such as age, race, gender, and viral load (Supporting Table 1). All patients provided written informed consent and all study procedures were approved by the Duke University Institutional Review Board.

Sample Preparation and LC-MS/MS Analysis.

A detailed sample preparation protocol and LC-MS methods section are available in the Supporting Methods section. Serum samples were statistically randomized and immunodepleted using MARS14 columns in an LC format (Agilent Technologies, Santa Clara, CA) according to the manufacturer's recommendations. After buffer exchange, protein concentration was normalized across all samples and ≈25 μg of protein from each sample was subjected to in-solution digestion with trypsin. All samples were spiked with 50 fmol MassPREP ADH digestion standard per μg of total protein (Waters, Milford, MA). LC-MS/MS analysis was carried out on a nanoAcquity liquid chromatograph coupled to a QToF Premier mass spectrometer (Waters). The Rosetta Elucidator v. 3.3 software package (Rosetta Biosoftware, Rosetta Inpharmatics, Seattle, WA) was used to import and align all LC-MS raw data files and perform feature quantitation. Mascot v. 2.2 (Matrix Science, Boston, MA) and ProteinLynx Global Server v. 2.4 (Waters) database search engines were used to make peptide identifications. MS/MS identifications were curated using a forward/reverse database search, at a false discovery rate of 1%. With this data collection and informatics pipeline, the mass spectrometer measures expression of tryptic peptides, and these peptides are then used to infer expression of the parent proteins.

IL28B Polymorphism.

Genotyping was performed with the Illumina Human610-quad BeadChip (Illumina, San Diego, CA) as described.7 Genotype at the polymorphic site rs12979860 on chromosome 19 was suitable for analysis in 41 CHC patients. Genotypes were pooled for analysis (CC versus CT or TT) and we refer to an IL28B polymorphism, noting that the association SNP actually lies 3 kb upstream of the IL28B gene.

Predictive Model Based on Meta-Proteins.

To build predictive models based on the proteomic data, we used a “metaprotein” classification approach which aims to minimize the effects of several types of error on a quantitative proteomic dataset, including incorrect peptide identification and incorrect protein inference due to tryptic peptide sequence identity. The central difference to standard approaches is that we use the expression profile of a peptide or group of peptides to assist its grouping with similarly quantified peptides, in addition to the traditional grouping by common “parent” protein sequence. We generated 105 metaproteins from which to build a model predictive of response to therapy; however, we expect most of these to be nonpredictive either individually or within a linear model. We used shotgun stochastic search variable selection, as described,6 to obtain a set of parsimonious models. This procedure is similar to penalized regression schemes such as AIC or BIC, but allows for the use of model averaging in prediction. A weighted average of the set of fitted models is then computed (with models demonstrating good fit weighted more heavily). This model averaging scheme has been shown to outperform the single best model in predictive accuracy because it more accurately estimates the uncertainty associated with model choice.7 As subjects were selected to balance the most relevant clinical predictors (such as race, HCV RNA levels, and gender) we did not use these in our predictive model. This type of filtered sample selection would likely distort the relationships between relevant clinical variables and our outcome, thereby likely decreasing the predictive accuracy of our model. We observed a difference in our predictor between responders and nonresponders of 0.4 on our training data, and the within-group standard deviation was observed to be 0.23. With this magnitude of effect size, and assuming 15 nonresponders and 26 responders in our validation group, we expected a power of >97% to detect a statistically significant difference (P < 0.01) in our predictor in the validation dataset.
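The power figure quoted above can be checked approximately with a short calculation. The sketch below is not the authors' calculation: it assumes a two-sided, two-sample comparison of means and uses a normal approximation to the t test, so it slightly overstates power for small groups. The function name `two_sample_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(delta, sd, n1, n2, alpha):
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation to the t test (an assumption of this sketch).
    """
    z = NormalDist()
    effect = delta / sd                        # standardized difference
    ncp = effect * sqrt(n1 * n2 / (n1 + n2))   # noncentrality parameter
    z_crit = z.inv_cdf(1 - alpha / 2)          # two-sided critical value
    # Probability of landing in either rejection region
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# Values taken from the text: difference 0.4, SD 0.23, 15 vs 26 subjects
power = two_sample_power(delta=0.4, sd=0.23, n1=15, n2=26, alpha=0.01)
print(f"approximate power: {power:.3f}")
```

Under these assumptions the approximate power comes out above the 97% quoted in the text.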

See Supporting Methods, Tables, and Figures for additional information. MS/MS peptide identifications have been uploaded to the PRIDE database (http://www.ebi.ac.uk/pride/), accession numbers 10679 and 10680.


Results

Baseline Demographics

Responders and nonresponders from both the training (n = 55) and validation (n = 41) cohorts of CHC patients selected for this study were well matched in terms of host-viral characteristics and were typical of a U.S. tertiary referral center CHC cohort (Supporting Table 1).

Training Cohort

Metaprotein-Based Response Prediction.

The data collection approach used in this study, specifically data-independent MS/MS (or MSE), enables peptide identification as well as high-quality label-free quantification in the same analysis. Integrating database searches for single-dimension LC-MSE analyses of all 55 subjects in the training cohort with traditional LC-MS/MS analyses of a pooled plasma sample, we identified a total of 105 proteins with two or more peptides, with a total of 3,768 peptides (an average of 36 peptides per protein). Latent factor modeling was used to cluster identified peptides into 105 metaprotein factors, which were then used as independent variables in a regression model with SVR as the outcome.8

Three metaproteins were significantly associated with treatment response even after correction for multiple hypotheses (Benjamini-Hochberg, alpha level 0.05). These three included vitamin D binding protein (VTDB; P = 3.2 × 10^-4), Alpha 2 HS glycoprotein (FETUA; P = 4.4 × 10^-5), and Complement C5 (CO5; P = 4.9 × 10^-5), and when combined in a probit regression model, these provided a combined AUROC for SVR of 0.90. Inclusion of demographic variables (female sex, African-American [AA] race, and HCV RNA) in our model increased the AUROC for SVR to 0.94, but this marginal incremental difference was not significant (P = 0.98) (Fig. 1). This lack of improvement after the inclusion of demographic variables is likely due to subjects in the responder and nonresponder groups being matched for these variables. The three metaprotein regression model (including demographic variables) for predicting SVR is given as:

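[Equation image not reproduced: probit regression of P on the VTDB, FETUA, and CO5 metaprotein factors together with female sex, AA race, and HCV RNA]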

where P = probability of SVR. The Probit function is the inverse of the cumulative distribution function of a standard normal distribution. The model without demographic variables is given as:

[Equation image not reproduced: probit regression of P on the VTDB, FETUA, and CO5 metaprotein factors only]

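As an illustration of the probit link used in both models, the sketch below maps a linear combination of the three metaprotein factors to a probability of SVR through the standard normal cumulative distribution function. The coefficients are placeholders, not the fitted values from the study; their signs are chosen only to mirror the reported direction of association (lower VTDB and FETUA, higher CO5 with SVR). The function name `svr_probability` is hypothetical.

```python
from statistics import NormalDist


def svr_probability(vtdb, fetua, co5, coef):
    """Probit model: P(SVR) = Phi(b0 + b1*VTDB + b2*FETUA + b3*CO5)."""
    eta = (coef["intercept"]
           + coef["vtdb"] * vtdb
           + coef["fetua"] * fetua
           + coef["co5"] * co5)
    # Phi is the standard normal CDF, the inverse of the probit function
    return NormalDist().cdf(eta)


# Placeholder coefficients and factor scores, NOT values from the study
coef = {"intercept": 0.0, "vtdb": -1.0, "fetua": -1.0, "co5": 1.0}
p = svr_probability(vtdb=-0.5, fetua=-0.2, co5=0.3, coef=coef)
print(f"P(SVR) = {p:.3f}")
```

Applying `NormalDist().inv_cdf(p)` recovers the linear predictor, which is the sense in which the probit function is the inverse of the normal cumulative distribution function.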
Figure 1.

(A,D,G) Heatmaps of the peptides in each of the most predictive factors. They are split by responders (labeled 1 on the right side of the vertical white bar) and nonresponders (labeled 0 on the left side of the vertical white bar). Each of these sets of peptides makes up a single factor, and the expression of those factors is shown in scatterplots (B,E,H). These scatterplots show the average expression of the peptides for the three corresponding factors in the heatmap. (C,F,I,L) Histograms showing the makeup of the factor shown in that row. The majority of peptides labeled “other” were unidentified by the proteomics analysis. (J) The overall performance of the predictor based on the VTDB, FETUA, and CO5 factors. (K) The receiver operating characteristic curve for the predictor in (J) (in red), along with a predictor based on only clinical variables (black) and a predictor based on both factors and clinical variables (blue).

We also carried out a separate analysis of these data, using clustering on a subset of the isotope groups selected by false discovery rate. This analysis supports an association between these three proteins and sustained viral response (see Supporting peptide level analysis). In summary, decreased levels of VTDB and FETUA and increased levels of CO5 were associated with SVR, representing the single best model. However, for the purposes of prediction in our validation group, we use a model averaging approach, the results of which are presented below.
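The Benjamini-Hochberg correction reported for the three metaproteins can be checked with a short step-up procedure. The sketch below uses the three P values from the text; the remaining 102 P values are not given, so they are set to 1.0 as placeholders, which is the least favorable case for the three reported ones. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return indices of hypotheses rejected by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Largest rank whose p-value is at or below its BH threshold
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])


# P values reported in the text for the three metaproteins
reported = {"VTDB": 3.2e-4, "FETUA": 4.4e-5, "CO5": 4.9e-5}
# Placeholder: the other 102 of the 105 p-values are unknown, set to 1.0
p_values = list(reported.values()) + [1.0] * 102
rejected = benjamini_hochberg(p_values, alpha=0.05)
print([list(reported)[i] for i in rejected])
```

With 105 tests the thresholds for ranks 1 to 3 are roughly 4.8 × 10^-4, 9.5 × 10^-4, and 1.4 × 10^-3, so all three reported P values pass even under this conservative assumption.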

Validation Cohort

Metaprotein-Based Response Prediction.

An independent sample preparation and LC-MS/MS data collection were performed on 41 additional subjects, and these data were used as an initial verification of the predictive power of the metaprotein model in a blinded fashion. In the independent analysis of the validation CHC cohort (n = 41), 112 proteins were identified with two or more peptides, with a total of 3,211 peptides. The sample complexity results in less than 100% reproducibility in the identification of peptides from sample to sample, making alignment of two or more datasets challenging. For the purposes of prediction, we must rely on the subset of peptides that are identified with the same modifications and in the same charge state in both datasets. Figure 2 outlines the overall strategy by which the predictive model, built on the training cohort, is deployed for blinded prediction of a new cohort. The initial training dataset was generated from 55 individuals, yielding a training metaprotein model describing the dataset (Fig. 2A). The 3,966 peptides quantitated in this dataset are portrayed on the left side of the Venn diagram (Fig. 2B). A second proteomic expression dataset was collected on n = 41 blinded individuals (Fig. 2C), which independently contains expression data for 3,358 peptides (Fig. 2D). The peptide identifications in common between the two datasets, in this case 2,051 peptides (Fig. 2E), are then used to weight the metaprotein coefficients created in (A) based on which peptides have expression data available (Fig. 2F). The model is then used for independent treatment response prediction for the test set (Fig. 2G). Importantly, peptides belonging to the three metaproteins significantly associated with response in the discovery cohort were identified in all samples. The MATLAB code for these analyses is provided in the Supporting Material. It is important to note that expression data from the validation dataset are not used to build the predictive model. Evaluation of our validation cohort in this fashion, without setting a threshold level, resulted in a significant (t test P value 3.6 × 10^-5) differential between SVR and NR groups in the test set. Setting the threshold is challenging due to batch effects; however, at the optimal threshold the test resulted in an AUROC of 0.86 and sensitivity and specificity of 0.77 and 0.80, respectively (Fig. 3).
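The reweighting step described above can be pictured with a small sketch. It shows one plausible reading, not the authors' MATLAB implementation: each metaprotein score is assumed to be a weighted sum of peptide expression values, and the weights are restricted to peptides seen in both datasets and then renormalized. Peptide names, loadings, and expression values are hypothetical; the peptide counts are those reported for Fig. 2.

```python
def reweight_for_overlap(loadings, available):
    """Keep loadings for peptides present in both datasets and renormalize."""
    kept = {pep: w for pep, w in loadings.items() if pep in available}
    total = sum(abs(w) for w in kept.values())
    if total == 0:
        raise ValueError("no shared peptides for this metaprotein")
    return {pep: w / total for pep, w in kept.items()}


def factor_score(weights, expression):
    """Metaprotein score as a weighted sum of peptide expression values."""
    return sum(w * expression[pep] for pep, w in weights.items())


# Counts reported for Fig. 2: training-only, validation-only, shared
train_only, valid_only, shared = 1915, 1307, 2051
train_total = train_only + shared   # peptides quantitated in training data
valid_total = valid_only + shared   # peptides quantitated in validation data

# Hypothetical loadings from the training model and validation expression
loadings = {"pepA": 0.5, "pepB": 0.3, "pepC": 0.2}
validation_expression = {"pepA": 1.2, "pepC": 0.8, "pepD": 2.0}

weights = reweight_for_overlap(loadings, validation_expression.keys())
score = factor_score(weights, validation_expression)
print(train_total, valid_total, weights, round(score, 3))
```

The two totals reproduce the 3,966 and 3,358 peptides quoted for the training and validation datasets, and no validation expression value enters the loadings themselves.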

Figure 2.

The initial training dataset was generated from 55 individuals, yielding a training metaprotein model describing the dataset (A). The peptide identifications available from these data are portrayed on the left side of the Venn diagram (B). A second proteomic expression dataset was collected on n = 41 blinded individuals (C), from which a second subset of peptides and proteins was independently identified (D). The overlap in peptide identifications (E) is then used to weight the metaprotein model created in (A) based on which peptides have expression data available for use (F). This weighted model is then used to independently predict treatment response for the test dataset in a blinded fashion (G). For the study described herein, the numbers of peptide identifications for B, D, and E were 1,915, 1,307, and 2,051, respectively.

Figure 3.

The plot on the left shows the predicted scores of the validation samples and the ROC curve shows overall performance for selecting SVR.
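For readers less familiar with the AUROC, the sketch below computes it directly from predicted scores as the probability that a randomly chosen responder scores higher than a randomly chosen nonresponder (the Mann-Whitney formulation). The scores and labels are hypothetical and are not the study data; the function name `auroc` is illustrative.

```python
def auroc(scores, labels):
    """AUROC as the chance a responder outscores a nonresponder.

    Ties count one half (Mann-Whitney U divided by n_pos * n_neg).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample from each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores; label 1 = SVR, 0 = nonresponder
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 1, 0, 0, 0]
area = auroc(scores, labels)
print(f"AUROC = {area:.3f}")
```

Because this measure depends only on the ranking of scores, it does not require a threshold, which is why an AUROC can be reported even when choosing a cutoff is complicated by batch effects.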

IL28B Polymorphism.

We evaluated the IL28B rs12979860 genotype (CC versus non-CC) as a predictor of SVR in this validation cohort and found it to have very high specificity, suggesting that carriers of the CC genotype have a high predicted probability of SVR. The CC genotype was present in 20/41 (49%) of this validation cohort, with sensitivity and specificity for SVR of 0.73 and 0.93, respectively. In comparison, our proteomic signature had slightly better sensitivity, but lower specificity than the IL28B genotype. Among the IL28B non-CC patients (n = 21/41, 51%), identified metaproteins were able to identify SVR in 4/7 (57%) and NR in 11/14 (78%) of patients with an overall accuracy of 71% (Table 1).

Table 1. Performance of IL28B Genotype and Proteomic Signature in 41 HCV Genotype 1 Patients

                                              Sensitivity   Specificity   SVR (n)   NR (n)
IL28B genotype                                0.73          0.93
  rs12979860 CC+                                                          19        1
  rs12979860 non-CC                                                       7         14
Proteomic signature (PS)                      0.77          0.80
  PS+                                                                     20        3
  PS−                                                                     6         12
Proteomic signature among rs12979860 non-CC   0.57          0.78
  PS+                                                                     4         3
  PS−                                                                     3         11

rs12979860 CC+ represents the response predictive genotype.
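The sensitivity and specificity values in Table 1 follow directly from the patient counts. The sketch below recomputes them, reading the two count columns as SVR and NR patients (26 and 15 in total, as in the validation cohort). The function name `diagnostics` and the dictionary keys are illustrative.

```python
def diagnostics(tp, fp, fn, tn):
    """Sensitivity, specificity, and accuracy from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts from Table 1, in the order:
# (predicted SVR & SVR, predicted SVR & NR, predicted NR & SVR, predicted NR & NR)
table1 = {
    "IL28B CC vs non-CC": (19, 1, 7, 14),
    "Proteomic signature": (20, 3, 6, 12),
    "Proteomic signature, non-CC only": (4, 3, 3, 11),
}

results = {name: diagnostics(*counts) for name, counts in table1.items()}
for name, r in results.items():
    print(f"{name}: sens={r['sensitivity']:.3f} "
          f"spec={r['specificity']:.3f} acc={r['accuracy']:.3f}")
```

For the non-CC subgroup the accuracy is 15/21, which is the 71% quoted in the text.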

Pathway Analysis.

Ingenuity pathway analysis indicates that the differentially regulated metaproteins are involved in multiple host regulatory pathways including innate and adaptive immune responses, pro- and antiinflammatory signaling, coagulation cascade, fibrogenesis, and hepatocyte regeneration. Several of these metaproteins share a common pathway related to IFN response, and corroborate the observed association between IL28B signaling pathway and our response signature (Fig. 4). As the lower limit of quantitation in the mass spectrometer is somewhat better than the lower limit at which a peptide can be identified, there are some as-yet-unidentified peptides which make up a portion of the differentiating signal. These unidentified peptides are in the minority based on signal intensity for the sample and in the predictive metaprotein signature. Therefore, although in-depth profiling of the samples to obtain further peptide identification is not strictly necessary to improve upon the predictive power of our current predictive model for SVR, these additional peptides or proteins may provide improved insight into disease pathology.

Figure 4.

Ingenuity pathway analysis indicating direct (solid line) and indirect (dashed line) relationships between derived metaproteins in a training cohort and IFN-response signaling pathways. GC, vitamin D binding protein; AHSG, alpha-2-HS-glycoprotein/fetuin A; C5, complement C5; AKT1, protein kinase akt-1; ST6GAL1, beta-galactoside alpha-2,6-sialyltransferase; STAT3, signal transducer and activator of transcription 3; IFNG, interferon gamma; TNF, tumor necrosis factor; pkc (s), protein kinase C; IL28RA, interleukin-28 receptor subunit alpha.


Discussion

Using an unbiased proteomics discovery platform in a well-characterized cohort of CHC patients, we have described a three-metaprotein signature that is able to accurately predict sustained virologic response or nonresponse prior to current SOC therapy in over 90% of patients in the training cohort, 78% of our validation group, and in 71% of patients with the poor response variants for the IL28B polymorphism. This represents a significant advance over response predictors based on host and viral characteristics derived from clinical registration trials. Only one-half of CHC patients are expected to achieve sustained viral clearance from serum with prolonged IFN-based therapy, which is associated with significant side effects, cost, and detrimental effects on quality of life measures. Thus, approaches such as our three-metaprotein algorithm, which provide an accurate prediction of antiviral efficacy, should be a useful adjunctive clinical tool in the treatment decision-making process. This study is reported according to recent guidelines for scientific reporting of proteomic biomarker data,9 and represents the first application of a state-of-the-art “bottom-up”, unbiased differential proteomic expression platform to predict therapeutic response in chronic HCV infection.

To date, most of the efforts to apply genomic technologies to outcomes of HCV infection have focused on genetic approaches or studies of targeted or genome-wide gene expression.10 Proteomics has been used less frequently, but has several advantages over other “omics” platforms. Genetics does not address the dynamics of disease process, and the level of messenger RNA (mRNA) expression does not account for potential silencing of genes, for example, by methylation, and only partially correlates with protein expression. Gene expression profiling has been applied to CHC patient samples such as liver or tumor tissue, mostly to identify novel markers of HCV-related fibrogenesis, or identify pathways of HCC development. Recent studies have also evaluated hepatic gene expression in relation to virologic responses to therapy11 and IL-28B.12 Unbiased protein profiling of liver tissue has been applied to identify pathways associated with CHC fibrosis,13 and could improve the likelihood of identifying relevant, low-abundance proteins in relation to virologic response. However, obtaining adequate liver tissue samples is often difficult. Thus, recent proteomic profiling studies in CHC infection have mostly used serum or plasma samples to identify proteomic signatures of disease pathogenesis.14, 15 Several mass spectrometry-based proteomic discovery methods have been used to assess clinical outcomes in various disease states, including MALDI-ToF MS, SELDI-ToF, 2D gel electrophoresis with qualitative LC-MS/MS, and qualitative and quantitative gel-free LC-MS. A disadvantage of SELDI and MALDI-based approaches is the difficulty in identifying and quantitating the differential peaks of interest.16 A prior study evaluated protein “peaks” using SELDI-ToF in relation to virologic response in a CHC training cohort that included HCV genotypes 1-5. Two of six peaks were significant in the training set, and with fibrosis stage and HCV genotype were able to predict viral response with an AUROC of 0.75.5 However, these protein “peaks” have not been further identified or validated. In contrast, our study used identified peptides and protein profiles from the training cohort to develop a predictive model that could be applied to the validation set. Furthermore, our approach allows for development of a targeted MS platform for verification in larger patient cohorts. Our “bottom-up” approach uses enzymatic digestion of the proteins to create peptide “surrogates” for the proteins, followed by the LC-MS analysis of the peptide mixtures. Analysis of peptide “surrogates” has several advantages over analyzing parent proteins. These peptides are small enough to be very efficiently separated using liquid chromatographic separations, and sequenced using tandem mass spectrometry.17 This yields datasets in which several hundred thousand distinct features can be quantitated in a few hours. In addition, with this “bottom-up” approach, changes to individual protein epitopes, such as posttranslational modifications, proteolytic cleavage events, or splice variants can be probed. This approach may have applicability to identification of peptide surrogates of disease progression or therapeutic responses in other nonviral chronic liver diseases that rely on biopsy to assess clinical outcome measures.

We identified three metaproteins of interest that were able to predict virologic response to current SOC therapy in 55 patients with an AUROC of 0.90, and 0.94 when combined with demographic variables of gender, race, and HCV RNA levels. In clinical trials, HCV genotype is the most important predictor of virologic response, with expected SVR rates of 42%-46% in patients with HCV genotype 1 infection, and around 80% in patients with HCV genotype 2 and 3 infection. However, actual SVR rates in practice are likely to be lower compared with those observed from clinical registration trials with highly selected patient cohorts. Other host and viral factors, such as HCV RNA levels, race, gender, body weight, early stage disease, younger age, and absence of insulin resistance are also important variables in determining outcomes to therapy, but have a poor predictive value for SVR. Our cohort was controlled for these demographic variables, and on their own they gave a predictive AUROC of only 0.69. Once treatment has commenced, adherence to therapy, and early virologic responses in the first 4 to 12 weeks of treatment, are currently the most important factors that determine the likelihood of achieving SVR in HCV genotype 1-infected patients. Predictive algorithms have been developed based on combining baseline factors and on-treatment responses, but have not yet been validated or adopted into routine clinical practice. Moreover, on-treatment prediction still entails a several-week period of therapy for the patient, with associated cost and risk of adverse events.

Recent genome-wide association data in a large clinical study of an HCV genotype 1 population adherent to therapy indicated the presence of single nucleotide polymorphisms in the host IL28B region on chromosome 19q13 that encodes type III interferon-λ3, to be strongly associated with SVR.4 The frequency of the favorable genotype varies by population, and is present in less than 40% of patients of European ancestry and less than 20% of African-American patients. The presence of the good response IL28B variant alone appears to have modest baseline predictive performance for SVR in HCV genotype 1 patients, with predictive values below 0.7 in larger cohorts, and its future clinical utility may depend on combination with other baseline or on-treatment predictors of virologic response. We evaluated the IL28B genotype in our validation cohort, and despite the small number of patients, we noted that our metaprotein model was able to determine virologic responses in 71% of patients with poor response (non-CC) IL28B genotype. In clinical practice, there may be the potential for adjunctive use of metaproteins, along with IL28B genotype determination, to increase sensitivity and provide accurate determination of virologic response in the majority of HCV genotype 1 patients. Of clinical relevance for non-CC genotype CHC patients is that a secondary predictive measure of virologic outcome allows the option of further individualizing therapeutic regimens.

Our pretreatment predictive metaprotein model is based on a tertiary center referral cohort that completed assigned duration of therapy, controlled for baseline host and viral variables known to affect response. There was one patient (responder) in the training set and five patients (three responders and two nonresponders) in the validation cohort with METAVIR stage 4. However, the overall rate of response among cirrhotic patients was similar to those without cirrhosis. Additionally, when the five subjects with cirrhosis were removed from the validation dataset, the overall AUROC for SVR did not change. Although we did not include any HCV genotype non-1 nonresponders in our training cohort, latent factor expression patterns were similar between HCV genotype 1 and non-1-infected patients that achieved SVR, indicating a specific virologic response expression phenotype, and not a pattern reflecting differences in viral genotype. Restriction of the model fit on the training data to just those patients in the study with HCV genotype 1 disease results in an AUROC of 0.89, indicating no significant association between the metaprotein predictors and viral genotype.

Pathway analysis provides biological plausibility in that the identified metaproteins are associated with various immunoregulatory functions related to the inflammatory process in CHC infection. Vitamin D binding protein is a polymorphic multifunctional 52.9-kDa protein that belongs to the albumin gene family, is synthesized in the liver, and is found in plasma, urine, cerebrospinal fluid, and on the surface of many cells. It carries the vitamin D sterols, associates with membrane-bound immunoglobulin on the surface of B-lymphocytes, and is involved in several biological functions, such as fatty acid transport and macrophage activation.18 VTDB levels may provide prognostic information in acute liver failure, and VTDB augments complement C5a-mediated chemotaxis during inflammation. Alpha-2-HS glycoprotein (also termed Fetuin-A, FETUA) is a 45-kDa plasma protein synthesized by hepatocytes and regulated as a negative acute phase reactant.19 FETUA promotes opsonization, inhibits insulin receptor tyrosine kinase, acts as an antagonist of TGF-β, and regulates cytokine-dependent bone mineralization. Fetuin-A has been shown to decline in CHC patients with increasing fibrosis stage.15 However, both SVR and nonresponder patients were matched for fibrosis severity in our training and validation cohorts. Complement C5 is a 180-kDa protein and a key component of the innate immune response. It is cleaved into two active peptides: C5a, which mediates local inflammatory responses, and C5b, which initiates the formation of the complement membrane attack complex.

Further validation of this proteomic signature will be required in external CHC cohorts. The determination of IL28B genotype variants will also be important in this regard, and will likely provide adjunctive data to improve baseline predictive indices of response and the potential for individualized targeted therapy. However, any association between these proteins and immunomodulatory pathways linked to IFN-based therapies remains hypothetical at this stage. Several proteins implicated by metaprotein analysis have immunoassays (e.g., enzyme-linked immunosorbent assay [ELISA]) available, so this provides a potential avenue for proteomic signature validation and future clinical assay development. However, because many of the differentiating signals included in the metaprotein model may be peptide-specific, a better option for validation and initial clinical implementation may be quantitative multiple reaction monitoring (MRM) mass spectrometry. The importance of this approach in biomarker validation was recently highlighted by the Clinical Proteomics Technology Assessment for Cancer, and we have planned further targeted MRM studies in this regard.20

The power of modern mass spectrometry approaches is to be able to investigate the samples in an unbiased manner (i.e., without a priori knowledge of what might be changing), while gaining high confidence identifications and very accurate quantitative data on the species measured in the study. Splice variants and posttranslational modifications are just some of the protein isoforms that are present with a frequency of several orders of magnitude, and with proper data treatment some of these changes to high abundance proteins can be monitored within the dataset as well. Still, a significant challenge to proteome biomarker discovery is that the blood proteome remains a complex biological system to evaluate with significantly more features compared to the genome. The quantitative dynamic range of identified plasma proteins alone is 10^10 (from IL-6 at 0-5 pg/mL to albumin at 35-50 mg/mL). Although our methodology employed current high-affinity depletion techniques, low-abundance markers of antiviral response were almost certainly missed in our cohort, due to the limited dynamic range of the technique (≈3 orders of magnitude). Additional immunodepletion or subproteome enrichment techniques, in conjunction with high-resolution LC-MS, are powerful tools for lower-level biomarker discovery that have distinct promise in future HCV research.21, 22

Disease variability may not be as important in a chronic infection such as HCV with a prolonged natural history, but accounting for inherent patient variability in the plasma proteome will continue to provide a significant challenge to discovery efforts.23 Likewise, adapting unbiased discovery methods to the rapidly changing therapeutic landscape in CHC infection will provide further ongoing challenges in the future. In summary, this preliminary study provides the first description of the potential clinical utility of unbiased proteomics, to identify metaproteins of interest that can accurately predict virologic responses to current SOC therapy in the majority of hepatitis C patients.


K.P., J.L., J.W.T., H.T., A.T., R.M.C., M.A.M., G.S.G., J.G.M., J.J.M. conceptualized and implemented the study design; K.P., J.L., J.W.T., J.G.M., and J.J.T. were responsible for data analysis and interpretation and drafted the initial article; J.W.T., M.A.M., and L.G.D. performed proteome analysis; D.U. implemented material transfer and article preparation. All authors had access to data and contributed to the final version of the article. We thank Martha Stapels and Scott Geromanos for assistance in LC-MS method development and data collection, Cindy Chepanoske and Andrey Bondarenko for assistance in raw data processing, and Crystal Cates and Melissa Spain for assistance with biorepository serum sample processing.