Chronic hepatitis C infection (CHC) is a global health problem, with an estimated 120 to 130 million chronic hepatitis C virus (HCV) carriers worldwide. Current standard-of-care (SOC) treatment with pegylated interferon (PEG-IFN) in combination with ribavirin is costly, associated with significant side effects, and results in sustained virological response (SVR) in only about one-half of treated subjects in controlled clinical trial settings with carefully selected patients. In standard clinical practice, comorbid conditions and potential adverse events result in frequent dosage adjustments, reduced compliance, and early withdrawal from treatment, further reducing SVR.1
Identifying host-viral factors that predict the likelihood of SVR prior to initiating therapy would be a very useful clinical tool that could help reduce costs and avoid unnecessary exposure to therapy with significant side effects. Data from clinical registration trials have identified genotype and pretreatment HCV RNA levels as the most significant viral predictors of SVR.2 Baseline host characteristics that influence virologic response include race, gender, age, disease stage, insulin resistance, and body weight, but are unable to accurately predict SVR. Pretreatment serum biochemical markers that are often obtained routinely, such as liver transaminases, total cholesterol and γ-glutamyltransferase, are also unreliable predictors of SVR.3 As a result, predictive models for SVR that are based on pretreatment host-viral characteristics have not been developed for routine clinical use. Recent data indicate that a single nucleotide polymorphism in the host IL28B region on chromosome 19q13 identified in one-third of HCV genotype 1-infected patients is also strongly associated with SVR.4 The allele frequency for this SNP varies with genetic ancestry and, for example, the response variant is present in 15% of African-Americans, who have the lowest SVR rates of any ethnic group to current SOC therapy. Thus, alternative and more reliable baseline predictors of virologic response are still required for most CHC patients.
Most proteomic studies in HCV to date have used unbiased proteomic profiling to identify novel markers of HCV-related fibrogenesis or identify pathways of hepatocellular carcinoma (HCC) development. A prior study evaluating treatment response in CHC patients using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-ToF/MS) ProteinChip technology noted that two “peaks” of interest in combination with fibrosis stage and viral genotype could predict treatment response in 81% of a verification cohort with an area under the receiver operating characteristic curve (AUROC) of 0.75.5 However, these protein “peaks” have not yet been further characterized or validated in other CHC cohorts, and as such are unlikely to have further clinical application. In this study we used an unbiased proteomic discovery platform with high-resolution, accurate-mass liquid chromatography mass spectrometry (LC-MS) to analyze pretreatment sera from a well-characterized training cohort of 55 CHC patients, and validated the findings in a further 41 CHC genotype 1 patients with characterized IL28B genotype. We identified novel signature pathways of derived metaproteins that can predict SVR with an AUROC of 0.86-0.90 before commencing SOC antiviral therapy.
Patients and Methods
Patients were selected from the Duke Hepatology Clinical Research (DHCR) database, an ongoing biorepository of greater than 3,000 HCV-infected patients initiated in 2002. Only sera from selected CHC patients with available treatment history and demographic data were included in this study. SVR was defined as undetectable HCV RNA at 6 months following end-of-treatment with current SOC therapy.
Pretreatment serum samples from 55 CHC patients in three different phenotypic groups were selected as the initial “test” group for this study: 19 nonresponder patients with HCV genotype 1; 17 SVR patients with HCV genotype 1; and 20 SVR patients with HCV genotype 2/3. An additional 41 genotype 1 CHC patients were selected for the validation study (n = 26 responders and n = 15 nonresponders). Responders and nonresponders were carefully matched as much as possible with respect to clinical and demographic variables, such as age, race, gender, and viral load, known to affect treatment response (Supporting Table 1). All patients provided written informed consent and all study procedures were approved by the Duke University Institutional Review Board.
Sample Preparation and LC-MS/MS Analysis.
A detailed sample preparation protocol and LC-MS methods section are available in the Supporting Methods section. Serum samples were statistically randomized and immunodepleted using MARS14 columns in an LC format (Agilent Technologies, Santa Clara, CA) according to the manufacturer's recommendations. After buffer exchange, protein concentration was normalized across all samples and ≈25 μg of protein from each sample was subjected to in-solution digestion with trypsin. All samples were spiked with 50 fmol MassPREP ADH digestion standard per μg of total protein (Waters, Milford, MA). LC-MS/MS analysis was carried out on a nanoAcquity liquid chromatograph coupled to a QToF Premier mass spectrometer (Waters). The Rosetta Elucidator v. 3.3 software package (Rosetta Biosoftware, Rosetta Inpharmatics, Seattle, WA) was used to import and align all LC-MS raw data files and perform feature quantitation. Mascot v. 2.2 (Matrix Sciences, Boston, MA) and Proteinlynx Global Server v. 2.4 (Waters) database search engines were used to make peptide identifications. MS/MS identifications were curated using a forward/reverse database search, at a false discovery rate of 1%. Using this data collection and informatics pipeline, the mass spectrometer is used to measure expression for tryptic peptides and these peptides can be used to infer expression of the parent proteins.
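The forward/reverse (target-decoy) curation step described above can be illustrated with a minimal sketch. This is not the pipeline used by the search engines named in this section; the function name, the (score, is_decoy) input layout, and the simple decoy/target ratio used to estimate the false discovery rate are all assumptions made for illustration.

```python
from typing import List, Tuple


def score_threshold_at_fdr(psms: List[Tuple[float, bool]],
                           max_fdr: float = 0.01) -> float:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: hypothetical (score, is_decoy) pairs, one per peptide-spectrum
    match; a higher score means a better match. The FDR at a cutoff is
    estimated as (decoy hits) / (target hits) at or above that cutoff.
    Returns +inf if no cutoff satisfies the requested FDR.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best_cutoff = float("inf")
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep extending the accepted list while the estimate stays in bounds.
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Illustrative data only: 50 target hits, one reverse-database hit, 10 more targets.
example = ([(float(s), False) for s in range(100, 50, -1)]
           + [(50.0, True)]
           + [(float(s), False) for s in range(49, 39, -1)])
cutoff = score_threshold_at_fdr(example, max_fdr=0.01)
accepted = [s for s, is_decoy in example if s >= cutoff and not is_decoy]
print(len(accepted), "target identifications retained")
```

In this sketch, identifications scoring at or above the returned cutoff are retained; real search engines typically apply more refined estimators and tie handling.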
Genotyping was performed with the Illumina Human610-quad BeadChip (Illumina, San Diego, CA) as described.7 Genotype at the polymorphic site rs12979860 on chromosome 19 was suitable for analysis in 41 CHC patients. Genotypes were pooled for analysis (CC versus CT or TT) and we refer to an IL28B polymorphism, noting that the association SNP actually lies 3 kb upstream of the IL28B gene.
Predictive Model Based on Metaproteins.
To build predictive models based on the proteomic data, we used a “metaprotein” classification approach which aims to minimize the effects of several types of error on a quantitative proteomic dataset, including incorrect peptide identification and incorrect protein inference due to tryptic peptide sequence identity. The central difference from standard approaches is that we use the expression profile of a peptide or group of peptides to assist its grouping with similarly quantified peptides, in addition to the traditional grouping by common “parent” protein sequence. We generated 105 metaproteins from which to build a model predictive of response to therapy; however, we expect most of these to be nonpredictive either individually or within a linear model. We used shotgun stochastic search variable selection, as described,6 to obtain a set of parsimonious models. This procedure is similar to penalized regression schemes such as AIC or BIC, but allows for the use of model averaging in prediction. A weighted average of the set of fitted models is then computed (with models demonstrating good fit weighted more heavily). This model averaging scheme has been shown to outperform the single best model in predictive accuracy because it more accurately estimates the uncertainty associated with model choice.7 As subjects were selected to balance the most relevant clinical predictors (such as race, HCV RNA levels, and gender) we did not use these in our predictive model. This type of filtered sample selection would likely distort the relationships between relevant clinical variables and our outcome, thereby likely decreasing the predictive accuracy of our model. We observed a difference in our predictor between responders and nonresponders of 0.4 on our training data, and the within-group standard deviation was observed to be 0.23. With this magnitude of effect size, and assuming 15 nonresponders and 26 responders in our validation group, we expected a power of >97% to detect a statistically significant difference (P < 0.01) in our predictor in the validation dataset.
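The power statement above can be checked approximately with a short calculation. The sketch below assumes a two-sided, two-sample comparison of means and uses a normal approximation; the source does not state which test or sidedness was used, so the function and its parameters are illustrative assumptions rather than the original calculation.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(delta: float, sd: float, n1: int, n2: int,
                     alpha: float = 0.01) -> float:
    """Approximate two-sided power for a difference in means.

    delta: assumed true difference between group means
    sd:    assumed common within-group standard deviation
    Uses the normal approximation to the two-sample t-test.
    """
    z = NormalDist()
    # Standardized difference scaled by the effective sample size.
    ncp = (delta / sd) * sqrt(n1 * n2 / (n1 + n2))
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in either tail.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# Values taken from the text: difference 0.4, SD 0.23, 15 vs 26 patients.
power = two_sample_power(delta=0.4, sd=0.23, n1=15, n2=26, alpha=0.01)
print(f"approximate power: {power:.3f}")
```

Under these assumptions the approximation gives a power above 0.97, in line with the expectation stated in the text; an exact noncentral t calculation would differ slightly.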
See Supporting Methods, Tables, and Figures for additional information. MS/MS peptide identifications have been uploaded to the PRIDE database (http://www.ebi.ac.uk/pride/), accession numbers 10679 and 10680.
Using an unbiased proteomics discovery platform in a well-characterized cohort of CHC patients, we have described a three-metaprotein signature that is able to accurately predict sustained virologic response or nonresponse prior to current SOC therapy in over 90% of patients in the training cohort, 88% of our validation group, and in 71% of patients with the poor response variants for the IL28B polymorphism. This represents a significant advance over response predictors based on host and viral characteristics derived from clinical registration trials. Only one-half of CHC patients are expected to achieve sustained viral clearance from serum with prolonged IFN-based therapy, which is associated with significant side effects, cost, and detrimental effects on quality of life measures. Thus, approaches such as our three-metaprotein algorithm, which provide an accurate prediction of antiviral efficacy, should be a useful adjunctive clinical tool in the treatment decision-making process. This study is reported according to recent guidelines for scientific reporting of proteomic biomarker data,9 and represents the first application of the state-of-the-art “bottom-up” approach to unbiased differential proteomic expression analysis to predict therapeutic response in chronic HCV infection.
To date, most of the efforts to apply genomic technologies to outcomes of HCV infection have focused on genetic approaches or studies of targeted or genome-wide gene expression.10 Proteomics has been used less frequently, but has several advantages over other “omics” platforms. Genetics does not address the dynamics of disease process, and the level of messenger RNA (mRNA) expression does not account for potential silencing of genes, for example, by methylation, and only partially correlates with protein expression. Gene expression profiling has been applied to CHC patient samples such as liver or tumor tissue, mostly to identify novel markers of HCV-related fibrogenesis, or identify pathways of HCC development. Recent studies have also evaluated hepatic gene expression in relation to virologic responses to therapy11 and IL-28B.12 Unbiased protein profiling of liver tissue has been applied to identify pathways associated with CHC fibrosis,13 and could improve the likelihood of identifying relevant, low-abundance proteins in relation to virologic response. However, obtaining adequate liver tissue samples is often difficult. Thus, recent proteomic profiling studies in CHC infection have mostly used serum or plasma samples to identify proteomic signatures of disease pathogenesis.14, 15 Several mass spectrometry-based proteomic discovery methods have been used to assess clinical outcomes in various disease states, including MALDI-ToF MS, SELDI-ToF, 2D gel electrophoresis with qualitative LC-MS/MS, and qualitative and quantitative gel-free LC-MS. A disadvantage of SELDI and MALDI-based approaches is the difficulty in identifying and quantitating the differential peaks of interest.16 A prior study evaluated protein “peaks” using SELDI-ToF in relation to virologic response in a CHC training cohort that included HCV genotypes 1-5. Two of six peaks were significant in the training set, and with fibrosis stage and HCV genotype were able to predict viral response with an AUROC of 0.75.5 However, these protein “peaks” have not been further identified or validated. In contrast, our study used identified peptides and protein profiles from the training cohort to develop a predictive model that could be applied to the validation set. Furthermore, our approach allows for development of a targeted MS platform for verification in larger patient cohorts. Our “bottom-up” approach uses enzymatic digestion of the proteins to create peptide “surrogates” for the proteins, followed by LC-MS analysis of the peptide mixtures. Analysis of peptide “surrogates” has several advantages over analyzing parent proteins. These peptides are small enough to be very efficiently separated using liquid chromatography, and sequenced using tandem mass spectrometry.17 This yields datasets in which several hundred thousand distinct features can be quantitated in a few hours. In addition, with this “bottom-up” approach, changes to individual protein epitopes, such as posttranslational modifications, proteolytic cleavage events, or splice variants can be probed. This approach may have applicability to identification of peptide surrogates of disease progression or therapeutic responses in other nonviral chronic liver diseases that rely on biopsy to assess clinical outcome measures.
We identified three metaproteins of interest that were able to predict virologic response to current SOC therapy in 55 patients with an AUROC of 0.90, and 0.94 when combined with demographic variables of gender, race, and HCV RNA levels. In clinical trials, HCV genotype is the most important predictor of virologic response, with expected SVR rates of 42%-46% in patients with HCV genotype 1 infection, and around 80% in patients with HCV genotype 2 and 3 infection. However, actual SVR rates in practice are likely to be lower compared with those observed from clinical registration trials with highly selected patient cohorts. Other host and viral factors, such as HCV RNA levels, race, gender, body weight, early stage disease, younger age, and absence of insulin resistance are also important variables in determining outcomes to therapy, but have a poor predictive value for SVR. Our cohort was controlled for these demographic variables, and we noted a predictive AUROC of only 0.69 for them. Once treatment has commenced, adherence to therapy, and early virologic responses in the first 4 to 12 weeks of treatment, are currently the most important factors that determine the likelihood of achieving SVR in HCV genotype 1-infected patients. Predictive algorithms have been developed based on combining baseline factors and on-treatment responses, but have not yet been validated or adopted into routine clinical practice. Moreover, assessing on-treatment responses still entails a several-week period of therapy for the patient with associated cost and risk of adverse events.
Recent genome-wide association data in a large clinical study of an HCV genotype 1 population adherent to therapy indicated that single nucleotide polymorphisms in the host IL28B region on chromosome 19q13, which encodes type III interferon-λ3, are strongly associated with SVR.4 The frequency of the favorable genotype varies by population, and is present in less than 40% of patients of European ancestry and less than 20% of African-American patients. The presence of the good response IL28B variant alone appears to have modest baseline predictive performance for SVR in HCV genotype 1 patients, with predictive values below 0.7 in larger cohorts, and its future clinical utility may depend on combination with other baseline or on-treatment predictors of virologic response. We evaluated the IL28B genotype in our validation cohort, and despite the small number of patients, we noted that our metaprotein model was able to determine virologic responses in 71% of patients with poor response (non-CC) IL28B genotype. In clinical practice, there may be the potential for adjunctive use of metaproteins, along with IL28B genotype determination, to increase sensitivity and provide accurate determination of virologic response in the majority of HCV genotype 1 patients. Of clinical relevance for non-CC genotype CHC patients is that a secondary predictive measure of virologic outcome allows the option of further individualizing therapeutic regimens.
Our pretreatment predictive metaprotein model is based on a tertiary center referral cohort that completed assigned duration of therapy, controlled for baseline host and viral variables known to affect response. There was one patient (responder) in the training set and five patients (three responders and two nonresponders) in the validation cohort with METAVIR stage 4. However, the overall rate of response among cirrhotic patients was similar to those without cirrhosis. Additionally, when the five subjects with cirrhosis were removed from the validation dataset, the overall AUROC for SVR did not change. Although we did not include any HCV genotype non-1 nonresponders in our training cohort, latent factor expression patterns were similar between HCV genotype 1 and non-1-infected patients that achieved SVR, indicating a specific virologic response expression phenotype, and not a pattern reflecting differences in viral genotype. Restriction of the model fit on the training data to just those patients in the study with HCV genotype 1 disease results in an AUROC of 0.89, indicating no significant association between the metaprotein predictors and viral genotype.
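For readers less familiar with the AUROC values quoted throughout, the statistic can be computed directly as the probability that a randomly chosen responder receives a higher predictor score than a randomly chosen nonresponder. The following minimal sketch uses this pairwise (Mann-Whitney) formulation; the scores are hypothetical and are not data from this study.

```python
from typing import Sequence


def auroc(responder_scores: Sequence[float],
          nonresponder_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (responder, nonresponder) pairs in which the
    responder has the higher score; ties count as one half.
    """
    wins = 0.0
    for r in responder_scores:
        for n in nonresponder_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(responder_scores) * len(nonresponder_scores))


# Hypothetical predictor scores, for illustration only.
responders = [0.9, 0.8, 0.4]
nonresponders = [0.5, 0.3]
print(f"AUROC = {auroc(responders, nonresponders):.3f}")
```

An AUROC of 0.5 corresponds to a predictor no better than chance, and 1.0 to perfect separation of the two groups.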
Pathway analysis provides biological plausibility in that the identified metaproteins are associated with various immunoregulatory functions related to the inflammatory process in CHC infection. Vitamin D binding protein (VTDB) is a polymorphic multifunctional 52.9-kDa protein encoded by the albumin gene family, synthesized in the liver and found in plasma, urine, cerebrospinal fluid, and on the surface of many cells. It carries the vitamin D sterols, associates with membrane-bound immunoglobulin on the surface of B-lymphocytes, and is involved in several biological functions, such as fatty acid transport and macrophage activation.18 VTDB levels may provide prognostic information in acute liver failure, and VTDB augments complement C5a-mediated chemotaxis during inflammation. Alpha-2-HS glycoprotein (also termed Fetuin-A, FETUA) is a 45-kDa plasma protein synthesized by hepatocytes and regulated as a negative acute phase reactant.19 FETUA promotes opsonization, inhibits insulin receptor tyrosine kinase, acts as an antagonist of TGF-β, and regulates cytokine-dependent bone mineralization. Fetuin-A has been shown to decline in CHC patients with increasing fibrosis stage.15 However, both SVR and nonresponder patients were matched for fibrosis severity in our training and validation cohorts. Complement C5 is a 180-kDa protein that is a key component of the innate immune response and is cleaved into active peptides: C5a, which mediates local inflammatory responses, and C5b, which initiates the formation of the complement membrane attack complex.
Further validation of this proteomic signature will be required in external CHC cohorts. The determination of IL28B genotype variants will also be important in this regard, and will likely provide adjunctive data to improve baseline predictive indices of response and the potential for individualized targeted therapy. However, any association between these proteins and immunomodulatory pathways linked to IFN-based therapies remains hypothetical at this stage. Several proteins implicated by metaprotein analysis have immunoassays (e.g., enzyme-linked immunosorbent assay [ELISA]) available, so this provides a potential avenue for proteomic signature validation and future clinical assay development. However, because many of the differentiating signals included in the metaprotein model may be peptide-specific, a better option for validation and initial clinical implementation may be quantitative multiple reaction monitoring (MRM) mass spectrometry. The importance of this approach in biomarker validation was recently highlighted by the Clinical Proteomics Technology Assessment for Cancer, and we have planned further targeted MRM studies in this regard.20
The power of modern mass spectrometry approaches lies in the ability to investigate samples in an unbiased manner (i.e., without a priori knowledge of what might be changing), while gaining high-confidence identifications and very accurate quantitative data on the species measured in the study. Splice variants and posttranslational modifications are just some of the protein isoforms that are present with a frequency of several orders of magnitude, and with proper data treatment some of these changes to high-abundance proteins can be monitored within the dataset as well. Still, a significant challenge to proteome biomarker discovery is that the blood proteome remains a complex biological system to evaluate, with significantly more features compared to the genome. The quantitative dynamic range of identified plasma proteins alone is 10^10 (from IL-6 at 0-5 pg/mL to albumin at 35-50 mg/mL). Although our methodology employed current high-affinity depletion techniques, low-abundance markers of antiviral response were almost certainly missed in our cohort, due to the limited dynamic range of the technique (≈3 orders of magnitude). Additional immunodepletion or subproteome enrichment techniques, in conjunction with high-resolution LC-MS, are powerful tools for lower-level biomarker discovery that have distinct promise in future HCV research.21, 22
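The dynamic range figure above can be reproduced with simple unit arithmetic. The sketch below takes the upper albumin concentration as 50 mg/mL (equivalently 5.0 g/dL, an assumption about the intended units) and the upper IL-6 concentration as 5 pg/mL, and compares the resulting span with the roughly 3 orders of magnitude covered by the technique; the variable names are illustrative.

```python
from math import log10

PG_PER_MG = 1e9  # 1 mg = 10^9 pg

# Upper ends of the quoted ranges, expressed in pg/mL.
albumin_pg_per_ml = 50 * PG_PER_MG  # assumed 50 mg/mL
il6_pg_per_ml = 5.0                 # 5 pg/mL

# Orders of magnitude separating the most and least abundant proteins.
plasma_orders = log10(albumin_pg_per_ml / il6_pg_per_ml)

# Approximate within-run dynamic range of the technique, from the text.
technique_orders = 3
uncovered_orders = plasma_orders - technique_orders

print(f"plasma dynamic range: ~10^{plasma_orders:.0f}")
print(f"orders of magnitude not covered in a single analysis: ~{uncovered_orders:.0f}")
```

This gap is the reason depletion and enrichment strategies are needed before low-abundance markers can be measured.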
Disease variability may not be as important in a chronic infection such as HCV with a prolonged natural history, but accounting for inherent patient variability in the plasma proteome will continue to provide a significant challenge to discovery efforts.23 Likewise, adapting unbiased discovery methods to the rapidly changing therapeutic landscape in CHC infection will provide further ongoing challenges in the future. In summary, this preliminary study provides the first description of the potential clinical utility of unbiased proteomics, to identify metaproteins of interest that can accurately predict virologic responses to current SOC therapy in the majority of hepatitis C patients.
K.P., J.L., J.W.T., H.T., A.T., R.M.C., M.A.M., G.S.G., J.G.M., J.J.M. conceptualized and implemented the study design; K.P., J.L., J.W.T., J.G.M., and J.J.T. were responsible for data analysis and interpretation and drafted the initial article; J.W.T., M.A.M., and L.G.D. performed proteome analysis; D.U. implemented material transfer and article preparation. All authors had access to data and contributed to the final version of the article. We thank Martha Stapels and Scott Geromanos for assistance in LC-MS method development and data collection, Cindy Chepanoske and Andrey Bondarenko for assistance in raw data processing, and Crystal Cates and Melissa Spain for assistance with biorepository serum sample processing.