A 7 gene signature identifies the risk of developing cirrhosis in patients with chronic hepatitis C


  • Potential conflict of interest: Dr. Shiffman is a consultant and received grants from Celera Diagnostics. Dr. Bzowej advises, is on the speakers' bureau of, and received grants from Idenix. He also advises Intarcia. He received grants from Gilead, Roche, Schering-Plough, Bristol-Myers Squibb, Vertex, Merck, and Novartis. Dr. Rowland owns stock in Celera Diagnostics and Applied Biosystems. Dr. Catanese owns stock in Celera Diagnostics. Dr. Cheung is a consultant for Celera Diagnostics.


Clinical factors such as age, gender, alcohol use, and age-at-infection influence the progression to cirrhosis but cannot accurately predict the risk of developing cirrhosis in patients with chronic hepatitis C (CHC). The aim of this study was to develop a predictive signature for cirrhosis in Caucasian patients. All patients had well-characterized liver histology and clinical factors; DNA was extracted from whole blood for genotyping. We validated all significant markers from a genome scan in the training cohort, and selected 361 markers for the signature building. Using a “machine learning” approach, a signature consisting of markers most predictive for cirrhosis risk in Caucasian patients was developed in the training set (N = 420). The Cirrhosis Risk Score (CRS) was calculated to estimate the risk of developing cirrhosis for each patient. The CRS performance was then tested in an independently enrolled validation cohort of 154 Caucasian patients. A CRS signature consisting of 7 markers was developed for Caucasian patients. The area-under-the-ROC curves (AUC) of the CRS was 0.75 in the training cohort. In the validation cohort, AUC was only 0.53 for clinical factors, increased to 0.73 for CRS, and 0.76 when CRS and clinical factors were combined. A low CRS cutoff of <0.50 to identify low-risk patients would misclassify only 10.3% of high-risk patients, while a high cutoff of >0.70 to identify high-risk patients would misclassify 22.3% of low-risk patients. Conclusion: CRS is a better predictor than clinical factors in differentiating high-risk versus low-risk for cirrhosis in Caucasian CHC patients. Prospective studies should be conducted to further validate these findings. (HEPATOLOGY 2007.)

Chronic hepatitis C (CHC) is the most common cause of cirrhosis and hepatocellular carcinoma (HCC), and the leading indication for liver transplantation in the United States and many Western countries. The progression rate to cirrhosis varies widely among CHC patients. Treatment with antiviral therapy would be most cost-effective in those patients with evidence of progressive liver disease.1, 2 Previously identified standard clinical risk factors for rapid progression include age, gender, alcohol use, age at infection, obesity and hepatic steatosis.3–5 However, due to individual variability, these factors are not sufficiently accurate to identify which patients with these factors will progress to cirrhosis.6 We hypothesized that host genetic factors, such as single nucleotide polymorphisms (SNPs) could play a primary role in determining fibrosis risk. Indeed, preliminary results from a previous study suggested that replicated SNPs identified by large association studies had several advantages over clinical risk factors, such as higher odds ratios, consistent percentages of risk population across different study cohorts, and objective genotyping calls that are available for all patients.7 However, we did not exhaustively identify all replicated markers in the genome scan in the previous preliminary report. More importantly, a key question remained as to how these markers could be utilized in clinical settings.

Following the previous report on the initial findings of 2 replicated SNPs,7 in the current study we completed the confirmation of all significant SNPs from a genomic scan, and selected 361 SNPs for signature building. The aim of this study was to build and validate a CRS signature, alone or combined with standard clinical factors, that could be utilized in clinical practice to assess the risk for cirrhosis in Caucasian CHC patients.


Abbreviations: CHC, chronic hepatitis C; CRS, Cirrhosis Risk Score; AUC, area under the ROC curve; HCC, hepatocellular carcinoma; SNP, single nucleotide polymorphism; HIV, human immunodeficiency virus; HBV, hepatitis B virus; UCSF, University of California at San Francisco; VCU, Virginia Commonwealth University; CPMC, California Pacific Medical Center; UIC, University of Illinois at Chicago; IDU, injection drug use; BT, blood transfusion; 10×CV, 10-fold cross-validation; NIH, National Institutes of Health; ODC, ornithine decarboxylase; LPS, lipopolysaccharide; APRI, aspartate aminotransferase-to-platelet ratio index.

Materials and Methods

Patients: Training and Validation.

All patients in this study were from a large cohort of 1,468 patients with well-documented CHC and known fibrosis stage. Patients were consecutively recruited between November 2002 and May 2005 from the liver clinics at 5 U.S. centers. The enrollment criteria were the same as previously described.7 In brief, all patients were adults (ages 18-75 years), were anti-HCV and HCV RNA positive, and had a baseline liver biopsy prior to any treatment for HCV. Patients were excluded if they had any other coexisting chronic liver disease, co-infection with human immunodeficiency virus (HIV) or hepatitis B virus (HBV), or hepatocellular carcinoma (HCC). The study was approved by the institutional review boards of all participating centers, and written informed consent was obtained from all patients.

Figure 1 illustrates the overall workflow and sample sources. A total of 1,020 patients were enrolled from the University of California at San Francisco (UCSF) and the Virginia Commonwealth University (VCU), of which 420 Caucasian patients meeting the clinical endpoints defined in the signature building (see Clinical Endpoint for details) constituted the Training cohort. Since the previous report,7 an additional 104 patients were enrolled from UCSF and included in the analysis. A total of 448 patients were enrolled from Stanford University (Stanford), California Pacific Medical Center (CPMC), and University of Illinois at Chicago (UIC), of which 154 Caucasian patients meeting the clinical endpoints defined in signature building constituted the Validation cohort.

Figure 1. Work flow.

Clinical Risk Factors.

All participants completed a questionnaire to record their risk factors (and year) for HCV infection (injection drug use [IDU], blood transfusion [BT] and others), and to assess lifetime alcohol consumption using standard technique. Medical records were reviewed regarding demographics (date of birth, gender, and ethnicity), virological features (HCV genotype and serum HCV RNA level), and liver histology. Liver biopsies were scored by local experienced pathologists using the system described by Knodell et al. (VCU) or its modification by Batts-Ludwig (UCSF, Stanford, UIC, CPMC).8 Due to the retrospective nature of the study, the liver biopsies were not reviewed by a central pathologist but only those patients with biopsies that were considered by the pathologists or hepatologists as adequately sized were included in the study. Histories of detailed alcohol use, including lifetime and daily alcohol consumption, were calculated based on a previously validated questionnaire.9 Year of infection was defined as the earlier time point of the first exposure to IDU or BT prior to 1992, as used in other studies.3, 4, 10 Estimated duration of infection was calculated from the year of infection to the year of liver biopsy. When the duration of infection could be estimated, a fibrosis rate was calculated as fibrosis stage divided by duration of infection. To calculate progression rate on VCU patients, stage as measured by Knodell score was converted into METAVIR units (similar to Batts-Ludwig score) as described in previous studies.3, 11
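The fibrosis rate defined above can be sketched in a few lines of Python. This is only an illustration of the stated formula (fibrosis stage divided by estimated duration of infection); the function name, the handling of unknown infection years, and the example patient are assumptions and do not come from the study.

```python
from typing import Optional


def fibrosis_rate(stage: float,
                  infection_year: Optional[int],
                  biopsy_year: int) -> Optional[float]:
    """Fibrosis units per year of infection.

    Returns None when the duration of infection cannot be estimated
    (unknown infection year, or a non-positive duration).
    """
    if infection_year is None:
        return None
    duration = biopsy_year - infection_year
    if duration <= 0:
        return None
    return stage / duration


# Hypothetical patient: stage 3 on a 2003 biopsy, first exposure in 1978.
print(fibrosis_rate(3, 1978, 2003))
```

Stages scored with the Knodell system would first need conversion to METAVIR units, as the text describes; that conversion is not shown here because the mapping is not given in this section.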

Clinical Endpoint.

The primary objective was to determine the risk of developing bridging fibrosis/cirrhosis in Caucasian patients with CHC. To increase the signal-to-noise ratio in the modeling process, only those patients with histological status at the two extremes were included. With a mean duration of infection of 25.1 years (25.1 ± 7.6), patients developing either bridging fibrosis or cirrhosis (stage 3-4) are considered as at “high-risk” of developing cirrhosis, and were defined as “Cases.” In contrast, patients with no-fibrosis (stage 0) with similar duration of infection (24.3 ± 8.5 years) are considered as at a “low-risk” and were defined as “Controls”. Using this endpoint, 420 patients from the Training and 154 patients from the Validation cohort were included in the signature building and testing.


Whole blood was collected from each patient. DNA extraction and genotyping reaction was performed as described previously.7 The DNAs of patients from both the UCSF and VCU cohort were pooled into the following groups: no-fibrosis (stage 0), portal/periportal fibrosis (stage 1 or 2 in UCSF, stage 1 in VCU), and bridging fibrosis/cirrhosis (stages 3 or 4). A gene-centric, genome-wide scan consisting of 24,823 putative functional SNPs was performed on the UCSF cohort. Of all the SNPs with significant association in the UCSF cohort, the first batch of 100 was genotyped in individual VCU DNAs as described previously.7 All the remaining SNPs were genotyped in pooled VCU DNAs. “Replicated SNPs” were defined as those SNPs with significant association in both the UCSF and VCU cohorts, with odds ratios (ORs) in the same direction and of similar magnitude, based on either pool or individual data7 (Fig. 1).

Marker Selection.

A total of 361 unique SNPs, based on several clinical endpoints, were selected for building the signature and individually genotyped in all sample sets. They can be divided into 7 categories: (1) 142 replicated SNPs associated with bridging-fibrosis/cirrhosis when compared with no-fibrosis (stage 3-4 versus 0); (2) 159 (94 additional) replicated SNPs associated with bridging-fibrosis/cirrhosis when compared with no to portal/periportal fibrosis (stage 3-4 versus 0-2 in UCSF; 3-4 versus 0-1 in VCU), including the SNPs in DDX5, CPT1A and POLG2 reported previously7; (3) 88 (21 additional) replicated SNPs associated with portal/periportal fibrosis to cirrhosis when compared with no/minimal fibrosis (stage 2-4 versus 0-1 in UCSF; 1-4 versus 0 in VCU); (4) 66 (12 additional) replicated SNPs associated with any fibrosis when compared with no-fibrosis (stage 1-4 versus 0); (5) 228 (26 additional) replicated SNPs in fibrosis trend analysis (stage 0 versus 1-2 versus 3-4 in UCSF; 0 versus 1 versus 3-4 in VCU); (6) 25 SNPs genotyped for the fine density mapping of DDX5, CPT1A and MTP that did not fall into the previous five categories; (7) 41 SNPs individually genotyped in all sample sets for DNA quality control. Counting only the SNPs that each category adds to the preceding ones gives the total of 361 (142 + 94 + 21 + 12 + 26 + 25 + 41). For the primary endpoints (categories 1 and 2), all replicated SNPs significant in Caucasian patients were included in the analysis. For those replicated SNPs based on the other endpoints (categories 3-5) or significant only when analyzed in all races, the coverage ranged from 50% to 100%.

Building CRS Signature.

Three major steps were involved in building the CRS signature: ranking markers, selecting the final signature and building the CRS algorithm, and validating the CRS (Supplementary Fig. 1A). The first two steps were carried out in the Training set; once the final signature was selected, it was then tested in the Validation set (Fig. 1).

As the first step, markers were ranked in their order of robustness by performing marker selection using a repeated 10-fold cross-validation (10×CV) experiment (Supplementary Fig. 1B). Before each 10×CV experiment, the training set was randomized and stratified sampling was used to create the 10 CV folds to ensure that, in each fold, the proportion of cases and controls was the same as in the original training set. In each of the 10×CV experiments, 9 folds of training samples were used. Each SNP was transformed into 3 binary markers according to the 3 genotypes. An entropy-based univariate analysis12 was used to remove those SNPs poorly associated with cirrhosis risk. Next, a subset of the remaining markers, which strongly discriminated high risk versus low risk for cirrhosis, was selected using the Consistency-based Subset Evaluator13–15 combined with Best First13 (a heuristic search) and a forward selection strategy. In this approach, the subset of markers was selected by sequentially adding markers that could increase the consistency score, a parameter to measure the performance.13–15 A stopping criterion was used to terminate the search if no improvement was possible after a finite number of steps. Fifty runs of such 10×CV experiments were performed, yielding 500 subsets of markers (50 runs × 10 folds). The markers were ranked by the number of folds in which they were selected, that is, the number of times they appeared among the 500 subsets. For each marker, “%CV-folds” was defined as the number of folds in which it was selected divided by 500.
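The ranking procedure can be illustrated with a minimal Python sketch of repeated stratified 10-fold cross-validation that counts how often each binary marker is selected. The study used WEKA's entropy filter and Consistency-based Subset Evaluator with Best First search; the `toy_selector` below is a deliberately simple, hypothetical stand-in (a frequency-difference threshold) and the data are synthetic, so only the fold construction and the %CV-folds bookkeeping mirror the text.

```python
import random
from collections import Counter


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds, keeping the case/control ratio in each."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def toy_selector(X, y, train_idx, min_diff=0.15):
    """Hypothetical stand-in for the WEKA selection step: keep markers whose
    carrier frequency differs between cases and controls by >= min_diff."""
    cases = [i for i in train_idx if y[i] == 1]
    controls = [i for i in train_idx if y[i] == 0]
    selected = []
    for m in range(len(X[0])):
        freq_cases = sum(X[i][m] for i in cases) / len(cases)
        freq_controls = sum(X[i][m] for i in controls) / len(controls)
        if abs(freq_cases - freq_controls) >= min_diff:
            selected.append(m)
    return selected


def cv_fold_percentages(X, y, runs=50, k=10, seed=0):
    """%CV-folds per marker: share of the runs*k training subsets selecting it."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(runs):
        folds = stratified_folds(y, k, rng)
        for held_out in range(k):
            # Use the 9 folds that are not held out.
            train_idx = [i for f, fold in enumerate(folds)
                         if f != held_out for i in fold]
            counts.update(toy_selector(X, y, train_idx))
    total = runs * k
    return {m: 100.0 * counts[m] / total for m in range(len(X[0]))}


# Synthetic data: marker 0 tracks case status exactly, marker 1 is uninformative.
y = [1] * 60 + [0] * 40
X = [[label, 1] for label in y]
pct = cv_fold_percentages(X, y)
ranking = sorted(pct, key=pct.get, reverse=True)
print(ranking, pct)
```

With 50 runs of 10 folds, each marker's count is divided by 500, which matches the definition of %CV-folds given above.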

The second step of selecting the final signature was an iterative process (Supplementary Fig. 1A: inset of “Select Signature”). Nine lists of markers were created based on their %CV-folds being above or equal to these cutoffs: 0% (full marker set), 0.2% (selected once in the 500 subsets), 10%, 20%, 30%, 40%, 50%, 60%, and 80%, respectively. For each list, the “marker selection” process was re-applied and a corresponding signature was generated. For each of the nine signatures generated, a Naïve Bayes13 classification algorithm (see Supplementary Materials for details) was implemented to classify samples in the Training set. Two parameters, AUC and “Robustness,” were calculated and plotted against each other for each signature (Supplementary Fig. 2). AUC was used to assess the overall performance of each signature; Robustness, calculated as the average %CV-folds of all markers in the signature, was used to assess its overall robustness. The final signature was a balance between an increase in Robustness (to reduce overfitting) and a loss of predictability (underfitting). Therefore, it was determined by picking the point with Training-AUC ≥ 0.75 and Robustness ≥ 0.5 that was closest to where the two curves intersected (Supplementary Fig. 2).
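As a rough illustration of the Robustness measure, the sketch below filters markers by a %CV-folds cutoff and averages the %CV-folds of the retained markers. The values are the per-SNP %CV-folds reported in Table 2; the function names are hypothetical, and the sketch omits the re-application of marker selection that the study performed on each list.

```python
# %CV-folds of the 7 signature SNPs, copied from Table 2.
CV_FOLD_PCT = {
    "SNP1": 83.8, "SNP2": 82.0, "SNP3": 65.0, "SNP4": 48.4,
    "SNP5": 45.8, "SNP6": 45.6, "SNP7": 40.2,
}


def markers_at_cutoff(cv_fold_pct, cutoff):
    """Markers whose %CV-folds is at or above the cutoff."""
    return [m for m, pct in cv_fold_pct.items() if pct >= cutoff]


def robustness(cv_fold_pct, markers):
    """Average %CV-folds of the markers in a signature."""
    return sum(cv_fold_pct[m] for m in markers) / len(markers)


signature = markers_at_cutoff(CV_FOLD_PCT, 40.0)
print(len(signature), round(robustness(CV_FOLD_PCT, signature), 1))
```

For the 7 SNPs in Table 2 the average is 410.8 / 7, or about 58.7%, which satisfies the Robustness ≥ 0.5 criterion.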

Marker selection and classification were performed using WEKA, an open-source Machine Learning Workbench.13 The ROC curve and the AUC were computed using the Mayo Clinic's ROC program.16 Whether the AUC was greater than 0.5 was tested using the variance estimate described by DeLong et al.16 AUCs were compared using a non-parametric method for comparing correlated ROC curves.16
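For readers unfamiliar with the AUC, a minimal Python sketch of the empirical (Mann-Whitney) estimate follows: the AUC is the fraction of case-control pairs in which the case has the higher score, with ties counted as one half. This is not the Mayo Clinic program used in the study, it omits the DeLong variance estimate, and the scores are invented for illustration.

```python
def empirical_auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores for 3 cases and 2 controls.
print(empirical_auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

An AUC of 0.5 corresponds to chance-level discrimination, which is why the study tests whether each AUC exceeds 0.5.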

CRS Algorithm.

The value of CRS based on a constellation of 7 SNPs was calculated using a Naïve Bayes formula.13 Given outcomes C = {cirrhosis, no cirrhosis} and a set of seven predictive SNPs X = {X1, X2, …, X7}, the probability of a patient S = {X1 = x1, X2 = x2, …, X7 = x7} having cirrhosis is computed as follows:

$$P(\text{cirrhosis} \mid S) = \frac{P(S \mid \text{cirrhosis})\,P(\text{cirrhosis})}{P(S \mid \text{cirrhosis})\,P(\text{cirrhosis}) + P(S \mid \text{no cirrhosis})\,P(\text{no cirrhosis})} \tag{1}$$

The conditional probabilities P(S|cirrhosis) and P(S|no cirrhosis) were estimated assuming each SNP was independent of all other SNPs.

$$P(S \mid \text{cirrhosis}) = \prod_{i=1}^{7} P(X_i = x_i \mid \text{cirrhosis}) \tag{2}$$
$$P(S \mid \text{no cirrhosis}) = \prod_{i=1}^{7} P(X_i = x_i \mid \text{no cirrhosis}) \tag{3}$$

The probabilities P(Xi = xi|cirrhosis) and P(Xi = xi|no cirrhosis) are the class conditional probabilities for the ith marker Xi with value xi (each SNP can take the value of ‘1’ or ‘0’ based on the genotypes), given that the patient has cirrhosis or no cirrhosis, respectively. For a more detailed description and examples, please see Supplementary Materials.
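The Naïve Bayes calculation can be sketched in Python using the class-conditional probabilities listed in Table 3. Two points are assumptions rather than statements from the text: the prior P(cirrhosis) is taken here as the Training-cohort case fraction (263/420), since the prior is not given in this section, and the function and variable names are hypothetical. Scores produced by this sketch may therefore differ from the published CRS.

```python
# P(SNP = 1 | class), copied from Table 3; P(SNP = 0 | class) is 1 minus these.
P1_CIRRHOSIS = {
    "SNP1": 0.928030303, "SNP2": 0.928301887, "SNP3": 0.318181818,
    "SNP4": 0.554716981, "SNP5": 0.784905660, "SNP6": 0.784905660,
    "SNP7": 0.747169811,
}
P1_NO_CIRRHOSIS = {
    "SNP1": 0.801282051, "SNP2": 0.810126582, "SNP3": 0.487341772,
    "SNP4": 0.696202532, "SNP5": 0.610062893, "SNP6": 0.905660377,
    "SNP7": 0.578616352,
}


def cirrhosis_risk_score(snp_values, prior_cirrhosis=263 / 420):
    """Posterior P(cirrhosis | genotypes) under the SNP-independence assumption.

    snp_values maps each SNP name to its 0/1 coding from Table 3.
    prior_cirrhosis is an ASSUMED prior (Training-cohort case fraction).
    """
    joint_cirrhosis = prior_cirrhosis
    joint_no_cirrhosis = 1.0 - prior_cirrhosis
    for snp, value in snp_values.items():
        p_c = P1_CIRRHOSIS[snp]
        p_n = P1_NO_CIRRHOSIS[snp]
        joint_cirrhosis *= p_c if value == 1 else 1.0 - p_c
        joint_no_cirrhosis *= p_n if value == 1 else 1.0 - p_n
    return joint_cirrhosis / (joint_cirrhosis + joint_no_cirrhosis)


# Codings that place the Table 2 risk genotype at every SNP, and the opposite.
all_risk = {"SNP1": 1, "SNP2": 1, "SNP3": 0, "SNP4": 0,
            "SNP5": 1, "SNP6": 0, "SNP7": 1}
no_risk = {snp: 1 - value for snp, value in all_risk.items()}
crs_high = cirrhosis_risk_score(all_risk)
crs_low = cirrhosis_risk_score(no_risk)
print(crs_high, crs_low)
```

Note that a value of 1 is not always the risk genotype: for SNP3, SNP4 and SNP6 the risk genotypes in Table 2 correspond to the value 0 in Table 3.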


Characteristics of the Patients.

As the first step in this report, all replicated SNPs significant in both the UCSF and VCU cohorts from the genomic scan were identified. Next, samples from UCSF and VCU were combined as the Training cohort for signature building (Fig. 1). Of a total of 1,020 patients from UCSF and VCU, 592 met the definition of clinical endpoints with either no fibrosis or bridging fibrosis/cirrhosis. Of those, there were 420 Caucasians, 119 African Americans, 20 Hispanics, 13 Asians, and 20 of other races. Due to the insufficient sample size in non-Caucasians, and to avoid population stratification in different races,17 this study focused on Caucasians. To ensure the independent validation of the signature, the Validation cohort (N = 154) was enrolled from 3 other centers. Previously reported risk factors, such as mean age, male sex, the percentage of patients with daily alcohol consumption over 50 g, and the percentage with age at infection of 40 or over, did not differ significantly between the two cohorts, nor did the fibrosis rate (Table 1). However, mean alcohol use, duration of infection, overall fibrosis score, and the percentage of patients with bridging fibrosis or cirrhosis were significantly higher in the Validation cohort, whereas mean age at infection was significantly lower. Similar characteristics were observed when patients with portal/periportal fibrosis were included in the analysis (data not shown).

Table 1. Patient Characteristics: Comparison Between the Training and Validation Cohorts

| Characteristic | Training (UCSF + VCU), N = 420 | Validation (Stanford + CPMC + UIC), N = 154 | P Value |
|---|---|---|---|
| Age: mean ± SD | 48.6 ± 8.2 | 49.0 ± 7.4 | 0.604 |
| % Male | 70.0% | 67.5% | 0.570 |
| Daily alcohol: % > 50 g/day | 30.2% | 36.4% | 0.164 |
| Daily alcohol: all, mean ± SD^a | 47.8 ± 70.6 | 66.7 ± 98.3 | 0.011 |
| Age at infection: % >= 40 | 6.0% | 2.0% | 0.061 |
| Age at infection: mean ± SD^a | 24.9 ± 9.0 | 21.6 ± 8.0 | <.0001 |
| Duration of infection: mean ± SD^a | 24.1 ± 7.9 | 26.9 ± 7.6 | 0.001 |
| Fibrosis score: mean ± SD^a | 2.2 ± 1.7 | 3.3 ± 1.1 | <.0001 |
| Fibrosis score: no fibrosis, n (%) | 157 (37.4) | 14 (9.1) | |
| Fibrosis score: portal/periportal fibrosis^b | | | |
| Fibrosis score: bridging fibrosis/cirrhosis, n (%) | 263 (62.6) | 140 (90.9) | |
| Fibrosis rate: mean ± SD | 0.11 ± 0.20 | 0.15 ± 0.15 | 0.075 |

^a Clinical factors significantly different between the Training and Validation cohorts.
^b Patients with portal/periportal fibrosis were not included in the analysis.

Individual Predictors of Cirrhosis Risk.

The CRS signature consists of the 7 SNPs with the highest %CV-fold. It was derived from the list of markers meeting the cutoff of 40% CV-fold, the point with Training-AUC ≥ 0.75 and Robustness ≥ 0.5 that was closest to where the 2 curves intersected (Supplementary Fig. 2). The %CV-fold for the 7 SNPs ranged from 40.2% to 83.8%, indicating they were highly robust in their predictability. Table 2 lists the 7 SNPs based on their %CV-folds from high to low. Of the 7 SNPs, SNP1 was not in a public database (see Supplementary Materials for sequence), and SNPs 2-7 have public identifications. Four SNPs were located in known genes; 3 were located in chromosomal regions not fully characterized. All 7 SNPs were highly significant in their associations with risk for cirrhosis, with ORs ranging from 1.86 to 3.23. The AUC of each SNP was 0.56-0.59, indicating only modest predictability when used individually. The risk genotypes of these 7 SNPs had medium-to-high frequencies (18.5%-87.3%) in Caucasian patients with CHC, suggesting that a combination could lead to even higher risk yet still be applicable to a large proportion of patients. Of the 361 source SNPs, the 7 with the highest predictive value were selected into the final signature. The remaining 354 SNPs, including the 4 previously reported markers in DDX5, CPT1A and POLG2,7 were less predictive than the final 7 SNPs in CRS. Consistent with the literature,3 clinical risk factors such as male gender and older age were significantly associated with the risk of cirrhosis. However, excess alcohol use was not a risk factor in our study (Table 2).

Table 2. Individual Genetic and Clinical Predictors in the Training Cohort

| Predictors | Public ID | Gene (Chr) | Robustness (%CV-fold) | Risk Genotype or Phenotype | P value^a (Univariate) | OR (95% CI) | AUC (95% CI) | Frequency (Caucasian Patients) |
|---|---|---|---|---|---|---|---|---|
| SNP1^b | | AZIN1 (Chr8) | 419 (83.8%) | GG | 0.0002 | 3.23 (1.76–6.11) | 0.57 (0.53–0.60) | 86.9% |
| SNP2 | rs4986791 | TLR4 (Chr9) | 410 (82.0%) | CC | 0.0004 | 3.11 (1.66–5.81) | 0.56 (0.53–0.60) | 87.3% |
| SNP3 | rs886277 | TRPM5 (Chr11) | 325 (65.0%) | CT, CC | 0.0006 | 2.05 (1.36–3.08) | 0.59 (0.54–0.63) | 63.6% |
| SNP4 | rs2290351 | none (Chr15) | 242 (48.4%) | AG, AA | 0.0038 | 1.86 (1.22–2.82) | 0.57 (0.53–0.62) | 41.6% |
| SNP5 | rs4290029 | none (Chr1) | 229 (45.8%) | GG | 0.0001 | 2.35 (1.52–3.63) | 0.59 (0.54–0.63) | 72.3% |
| SNP6 | rs17740066 | none (Chr3) | 228 (45.6%) | AG, AA | 0.0014 | 2.76 (1.48–5.15) | 0.56 (0.53–0.60) | 18.5% |
| SNP7 | rs2878771 | AQP2 (Chr12) | 201 (40.2%) | GG | 0.0003 | 2.17 (1.42–3.30) | 0.58 (0.54–0.63) | 66.7% |
| Sex | | | | Male | 0.0024 | 1.94 (1.26–2.96) | 0.56 (0.42–0.70) | 69.3% |
| Age | | | | Older | 0.0097 | N/A | 0.57 (0.39–0.75) | N/A |
| Alcohol | | | | >= 50 g/day | 0.5871 | 1.13 (0.73–1.74) | 0.58 (0.46–0.70) | 32.9% |

^a Cases were defined as High-Risk patients who had Bridging-fibrosis or Cirrhosis (stage 3–4), and Controls were defined as Low-Risk patients who had No-fibrosis (stage 0).
^b See Supplementary Material Section II for sequence information on SNP1.

Overall Performance of CRS.

CRS was calculated based on the genotypes of 7 SNPs in each patient, hence reflecting the combined impact of all 7 SNPs (Table 3). The value of CRS ranged from 0 to 1; the higher the CRS value, the higher the risk. In the Training set, the AUC of CRS was 0.75 (95% CI: 0.70-0.80, P < 0.001) for predicting the risk of developing cirrhosis (Fig. 2A). Importantly, similar performance was observed in the Validation set, where the AUC of CRS was 0.73 (95% CI: 0.56-0.89, P < 0.001). In contrast, the AUC of clinical risk factors (age, gender, and daily alcohol) for predicting the risk for cirrhosis was only 0.53 (95% CI: 0.35-0.72, P = 0.36). Combining CRS and clinical risk factors resulted in an AUC of 0.76 (95% CI: 0.60-0.92, P < 0.001), a value very close to that obtained from CRS alone (Fig. 2B). Consistently, when each clinical risk factor was added separately to the CRS, the AUC did not improve significantly in either the Training or the Validation set. The AUC of the CRS improved only marginally, by 0.016 (Training) and 0.036 (Validation), when combined with all 3 clinical risk factors (Table 4A). Taken together, the results indicated that the CRS was a much better predictor of cirrhosis risk than clinical risk factors.

Table 3. Cirrhosis Risk Score Algorithm

| Marker | Gene | SNP value = 1 | SNP value = 0 | P(SNP=1 \| cirrhosis) | P(SNP=0 \| cirrhosis) | P(SNP=1 \| no cirrhosis) | P(SNP=0 \| no cirrhosis) |
|---|---|---|---|---|---|---|---|
| SNP1 | AZIN1 (Chr8) | GG | GA, AA | 0.928030303 | 0.071969697 | 0.801282051 | 0.198717949 |
| SNP2 | TLR4 (Chr9) | CC | CT, TT | 0.928301887 | 0.071698113 | 0.810126582 | 0.189873418 |
| SNP3 | TRPM5 (Chr11) | TT | TC, CC | 0.318181818 | 0.681818182 | 0.487341772 | 0.512658228 |
| SNP4 | none (Chr15) | GG | GA, AA | 0.554716981 | 0.445283019 | 0.696202532 | 0.303797468 |
| SNP5 | none (Chr1) | GG | GC, CC | 0.784905660 | 0.215094340 | 0.610062893 | 0.389937107 |
| SNP6 | none (Chr3) | GG | GA, AA | 0.784905660 | 0.215094340 | 0.905660377 | 0.094339623 |
| SNP7 | AQP2 (Chr12) | GG | GC, CC | 0.747169811 | 0.252830189 | 0.578616352 | 0.421383648 |

Figure 2. Performance of CRS and clinical factors. (A) Training AUC = 0.75 (95% CI: 0.70-0.80, P < 0.001). (B) Validation AUC = 0.73 (95% CI: 0.56-0.89, P < 0.001).

Table 4A. Effects of Clinical Risk Factors: Individual and Combined Clinical Risk Factors

| Predictor | Training AUC | Training ΔAUC^a | Training P value^b | Validation AUC | Validation ΔAUC^a | Validation P value^b |
|---|---|---|---|---|---|---|
| CRS | 0.753 | | | 0.726 | | |
| CRS + Sex | 0.762 | 0.009 | 0.28 | 0.754 | 0.028 | 0.28 |
| CRS + Age | 0.768 | 0.014 | 0.14 | 0.725 | −0.001 | 0.96 |
| CRS + Alcohol | 0.751 | −0.002 | 0.49 | 0.734 | 0.007 | 0.21 |
| CRS + All 3 clinicals | 0.769 | 0.016 | 0.25 | 0.762 | 0.036 | 0.20 |

^a ΔAUC was calculated as the AUC of CRS combined with different clinical risk factors minus the AUC of CRS only.
^b AUCs were compared using the non-parametric method for comparing correlated ROC curves.16 The P value indicates whether the ΔAUC is significantly different from 0.

Table 4B. Effects of Clinical Risk Factors: Predictability of CRS in Different Alcohol Segments

| Daily Alcohol (g/day) | Training + Validation: N | Training + Validation: AUC (95% CI) | P value^a |
|---|---|---|---|
| =0 | 138 | 0.75 (0.66–0.84) | 1 |
| >0 | 436 | 0.75 (0.70–0.80) | |
| <25 | 308 | 0.74 (0.68–0.82) | 0.83 |
| >=25 | 266 | 0.75 (0.68–0.80) | |
| <50 | 391 | 0.75 (0.66–0.83) | 0.85 |
| >=50 | 183 | 0.74 (0.69–0.80) | |
| <80 | 456 | 0.75 (0.65–0.86) | 1 |
| >=80 | 118 | 0.75 (0.70–0.80) | |

^a AUCs were compared using the non-parametric method.16 The P value indicates whether the two AUCs in each pair of rows are significantly different.

CRS and Alcohol Consumption.

The role of alcohol consumption in fibrosis risk has been controversial. In this report, alcohol history was obtained using a previously validated questionnaire.9 Patients in our study cohorts had a wide range of alcohol consumption; 24% did not consume any alcohol, and more than 30% consumed >50 g daily (Table 1), a cutoff previously reported to be associated with an increased risk of fibrosis progression.4 In our data sets, alcohol was not significantly associated with the risk of developing cirrhosis in the univariate analysis (Table 2), and was a poor predictor by the AUC test (Table 4A). To further investigate whether CRS was dependent on alcohol consumption, we compared the performance of CRS in patients grouped by alcohol consumption at cutoffs of 0, 25, 50 and 80 g/day. Table 4B shows that CRS retained consistently good predictability for cirrhosis risk regardless of the degree of alcohol consumption. Although we could not exclude the possibility that patients might over-report alcohol consumption, especially those with excessive alcohol use (>80 g/day), the results indicated that the performance of CRS was independent of alcohol consumption.

Predictive Value of CRS.

Table 5A shows two cutoff values to identify CHC patients with a low risk (<0.50) versus high risk (>0.70) of developing cirrhosis in the Training set. Additional cutoffs and their diagnostic values are listed in Supplementary Table 2. A low cutoff value of <0.50 to identify low-risk patients would correctly classify 71 of 157 low-risk patients (specificity = 45.2%). More importantly, the presence of high-risk patients could be excluded with great certainty because only 27 of 263 of the high-risk patients would fall into this category (misclassification rate = 10.3%). Of the 98 patients with CRS <0.50, 71 low-risk patients were correctly identified (NPV = 72.4%). A high cutoff value of >0.70 to identify high-risk patients would correctly classify 158 of 263 high-risk patients (sensitivity = 60.1%), and misclassify 35 of 157 low-risk patients (misclassification rate = 22.3%). Of the 193 patients with CRS >0.70, 158 high-risk patients were correctly identified (PPV = 81.9%). Also, 129 (30.7%) of the patients fell between 0.50 and 0.70, and hence could not be classified accurately. A more extreme high cutoff of >0.80 would misclassify only 7.0% of low-risk patients but would increase the indeterminate group to 60.7% (Supplementary Table 2A).
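The diagnostic values quoted above follow directly from the patient counts. The Python sketch below recomputes them from the Training-cohort counts given in the text; the function name and the true/false positive bookkeeping are illustrative choices, with "positive" meaning a high-risk (stage 3–4) patient classified at or above the cutoff in question.

```python
def diagnostic_values(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table of counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


LOW_RISK_TOTAL, HIGH_RISK_TOTAL = 157, 263  # stage 0 and stage 3-4 patients

# Low cutoff: CRS < 0.50 is a "low risk" call (71 low-risk, 27 high-risk patients).
low_cutoff = diagnostic_values(tp=HIGH_RISK_TOTAL - 27, fp=LOW_RISK_TOTAL - 71,
                               fn=27, tn=71)
# High cutoff: CRS > 0.70 is a "high risk" call (158 high-risk, 35 low-risk patients).
high_cutoff = diagnostic_values(tp=158, fp=35,
                                fn=HIGH_RISK_TOTAL - 158, tn=LOW_RISK_TOTAL - 35)

# Misclassification rates as defined in the text.
misclassified_high_risk = 27 / HIGH_RISK_TOTAL   # high-risk patients called low risk
misclassified_low_risk = 35 / LOW_RISK_TOTAL     # low-risk patients called high risk

for name, values in (("CRS < 0.50", low_cutoff), ("CRS > 0.70", high_cutoff)):
    print(name, {key: round(100 * value, 1) for key, value in values.items()})
```

The misclassification rate for each cutoff is the complement of the corresponding sensitivity (low cutoff) or specificity (high cutoff).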

Table 5A. Predictive Values of CRS: Training Cohort

| CRS Values | Low Risk Stage 0 (N = 157) | High Risk Stage 3–4 (N = 263) | Sen | Sp | PPV | NPV | Misclassifying Rate | No. (%) Patients | Interpretation |
|---|---|---|---|---|---|---|---|---|---|
| <0.50 | 71 | 27 | 89.7% | 45.2% | | 72.4% | 10.3% | 98 (23.3) | Low risk for cirrhosis |
| 0.50–0.70 | 51 | 78 | | | | | | 129 (30.7) | Indeterminate |
| >0.70 | 35 | 158 | 60.1% | 77.7% | 81.9% | | 22.3% | 193 (46.0) | High risk for cirrhosis |

Table 5B. Predictive Values of CRS: Validation Cohort

| CRS Values | Low Risk Stage 0 (N = 14) | High Risk Stage 3–4 (N = 140) | Sen | Sp | PPV | NPV | Misclassifying Rate | No. (%) Patients | Interpretation |
|---|---|---|---|---|---|---|---|---|---|
| <0.50 | 6 | 17 | 87.9% | 42.9% | | 26.1% | 12.1% | 23 (14.9%) | Low risk for cirrhosis |
| 0.50–0.70 | 5 | 48 | | | | | | 53 (34.4%) | Indeterminate |
| >0.70 | 3 | 75 | 53.6% | 78.6% | 96.2% | | 21.4% | 78 (50.7%) | High risk for cirrhosis |

  1. Abbreviations: Sen, sensitivity; Sp, specificity; PPV, positive predictive value; NPV, negative predictive value. Sensitivity and specificity were calculated for a cutoff of < compared with >=.

  2. A lower misclassifying rate was used for the low cutoff than for the high cutoff because misclassifying high-risk patients into the low-risk category carries a higher risk than the reverse.

In the Validation cohort, similar values of sensitivity, specificity and misclassification rate were observed when applying the same cutoffs (Table 5B). Using a low cutoff of <0.50 to identify low-risk patients, 12.1% (17 of 140) of high-risk patients would have been misclassified. Due to the high prevalence of high-risk patients in the Validation cohort (90.9%), NPV was only 26.1% in the low-risk group, but PPV was 96.2% in the high-risk group, which was higher than that of the Training cohort. Similar to the Training cohort, changing the high cutoff from >0.70 to >0.80 would decrease the misclassification rate from 21.4% to 7.1% but increase the indeterminate group from 34.4% to 65% (Supplementary Table 2B).
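The dependence of NPV and PPV on prevalence can be made explicit with Bayes' rule. The Python sketch below uses the Validation-cohort sensitivity and specificity implied by the counts in Table 5B and compares the NPV at the Validation prevalence (140/154) with the NPV the same test would have at the Training prevalence (263/420). The function names are illustrative, and the second figure is a hypothetical what-if, not a result reported in the study.

```python
def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from sensitivity, specificity and prevalence."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


VALIDATION_PREVALENCE = 140 / 154   # share of high-risk patients, Table 5B
TRAINING_PREVALENCE = 263 / 420     # share of high-risk patients, Table 5A

# Low cutoff (<0.50) in the Validation cohort: Sen = 123/140, Sp = 6/14.
npv_validation = npv(123 / 140, 6 / 14, VALIDATION_PREVALENCE)
# Same sensitivity and specificity, evaluated at the Training prevalence (what-if).
npv_at_training_prevalence = npv(123 / 140, 6 / 14, TRAINING_PREVALENCE)
# High cutoff (>0.70) in the Validation cohort: Sen = 75/140, Sp = 11/14.
ppv_validation = ppv(75 / 140, 11 / 14, VALIDATION_PREVALENCE)

print(round(npv_validation, 3), round(npv_at_training_prevalence, 3),
      round(ppv_validation, 3))
```

With nearly unchanged sensitivity and specificity, the lower NPV and higher PPV in the Validation cohort are what one would expect from its much higher proportion of high-risk patients.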


The natural history of chronic hepatitis C is highly variable.1, 3, 18–23 Antiviral therapy is costly, associated with side-effects, and has an overall response rate of about 50% for HCV genotype 1.18 As a result, treatment of CHC should be individualized. According to the National Institutes of Health (NIH) consensus statement, “treatment is recommended for patients with an increased risk of developing cirrhosis.” The lack of a prognostic test to identify patients at high-risk for cirrhosis resulted in the current recommendation of treatment candidates as those on “a liver biopsy with portal or bridging-fibrosis, and at least moderate inflammation and necrosis.”1 While this approach will identify those with significant disease, it does not indicate the likelihood of developing cirrhosis in those patients with less severe histology.

The CRS reported here stratifies the cirrhosis risk in Caucasian patients with CHC. For the first time, one can estimate the risk of cirrhosis rather than using findings of the liver biopsy to project the future course of disease. Liver biopsy represents only one time point in the long natural history of CHC, whereas genetic markers are intrinsic and “life-long.” Potentially, CRS could be used to stratify patients' cirrhosis risk prior to liver biopsy (Fig. 3). Consistent with the NIH recommendations of treating those “with an increased risk of cirrhosis,” one can perhaps make the argument that patients at high risk should be treated regardless of their fibrosis stage, especially since the overall response rate using current therapy is approximately 50%. Patients with low or indeterminate risk could then undergo liver biopsy, and those with stage 2-4 would also be treated (Fig. 3). Among those with fibrosis stage 0-1, treatment could probably be deferred in those at low risk and individualized in those with indeterminate risk. Such a strategy would improve current management in two respects. First, liver biopsy could perhaps be avoided in patients at high risk of developing cirrhosis. In our study population, 45.7% of Caucasian patients could be classified as high risk for cirrhosis (PPV = 81.4%) at a high cutoff of >0.70. Second, those Caucasian patients with fibrosis stage 0-1 but at high risk for cirrhosis, whose treatment is currently deferred, might benefit from immediate treatment as their response to antiviral therapy would be higher.22, 23 Among the high-risk Caucasian patients in our study population, 21.1% had fibrosis stage 0-1. On the other hand, those patients at high risk who decided to defer antiviral therapy or failed a prior course of antiviral therapy may need closer monitoring and more aggressive management compared to low-risk patients.
If the CRS were to be used to screen patients in either manner, a physician education program would be essential to understanding how to interpret the high, low, and indeterminate CRS and how to use the information to optimize patient care.

Figure 3. Potential clinical implications.

Similar to other complex human diseases, liver cirrhosis is caused by the interactions among multiple genetic and environmental factors. Yang et al. estimated that, for genes with very common genotype frequencies (>30%) and moderate ORs (1.2-1.5), 10-15 markers are needed to achieve appreciable population attributable fraction (PAF) for disease occurrence.24 Consistent with this finding, the CRS signature is comprised of seven markers with high frequencies (18.5%-87.3%) and significant ORs (1.86-3.23). In contrast, only two clinical factors, sex and age, had significant associations with cirrhosis risk (Table 2). Moreover, the CRS measurements are objective, whereas clinical factors such as alcohol use are subjective and suffer from recall bias and inaccuracy. Taken together, these reasons may explain why CRS is a better predictor than the clinical factors studied here (Fig. 2B). However, we cannot exclude the possibility that other clinical factors, such as hepatic steatosis, body mass index, diabetes, and iron overload, might be useful predictors for cirrhosis risk, as suggested in some studies.6 Their interactions with CRS should be evaluated in future studies.

The goal of this study was to build a signature with a minimal set of highly predictive SNPs based on an exhaustive list of significant markers. Most of the 361 source SNPs had confirmed associations with certain clinical endpoints (Fig. 1), but only 7 formed the final CRS signature; the remaining 354 SNPs, including the previously reported 4 markers in DDX5, CPT1A, and POLG2,7 were not selected. Possible reasons include lower ORs and lower frequencies of risk genotypes and, more importantly, decreased robustness and accuracy in the multivariate signature compared with the markers that were chosen. For example, of those 4 reported SNPs,7 only one of the POLG2 SNPs had a %CV-fold of 27.8%; the other 3 had a %CV-fold of ≤1%. Therefore, these 4 markers, separately or combined, had little impact on the Training or Validation AUC of CRS (Supplementary Table 1). Nonetheless, this finding does not conflict with the association of these markers with the disease, nor does it diminish their potential roles in disease pathogenesis.

It is important to emphasize that the entire signature-building process, including selecting source SNPs, ranking SNPs, and building the final signature, was performed strictly within the Training cohort. The Validation cohort was used only to validate the final signature once it was established. In addition, unlike many studies that divide the same sample pool into Training and Validation sets, the Validation set in this study was enrolled from 3 different sites. Furthermore, all patients were consecutively enrolled; thus, there was no control over their clinical characteristics. As expected, differences existed between the Training and Validation cohorts, such as average alcohol use, age at infection, duration of infection, fibrosis stage (2.2 versus 3.3), and case–control ratio (263/157 versus 140/14) (Table 1). Despite these differences, excellent performance was achieved in the Validation set, and its AUC was nearly identical to that obtained in the Training cohort (0.73 versus 0.75). Taken together, we expect that the CRS results will be applicable to other independent sample sets, and that the signature-building approach described here can be generalized to build predictive signatures in other diseases.
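The AUC compared across cohorts here can be read as the probability that a randomly chosen case has a higher score than a randomly chosen control. The minimal sketch below computes it by direct pairwise comparison (the Mann-Whitney formulation); it is not the software used in the study, and the function name and the scores in the usage line are made up for illustration.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of cases and controls.

    Each (case, control) pair counts 1 if the case scores higher,
    0.5 if the scores tie, and 0 otherwise.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical scores: three of the four case-control pairs are ordered correctly.
print(auc([0.8, 0.4], [0.6, 0.2]))  # prints 0.75
```

Because the statistic is an average over case-control pairs, it does not depend on the ratio of cases to controls, which is one reason AUCs from cohorts with different case–control ratios can be compared.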

Unlike that of rare Mendelian diseases, the genetic risk of common, complex diseases such as fibrosis associated with CHC is likely polygenic. Indeed, each of the 7 most predictive markers provided only moderate predictability (Table 2), whereas the combination of these 7 was more robust and predictive. This finding is consistent with the multiple biological pathways known to be involved in hepatic fibrogenesis.25 Of the 7 genes, antizyme inhibitor 1 (AZIN1) and Toll-like receptor 4 (TLR4) have an identified role in hepatic fibrosis. AZIN1 binds to ornithine decarboxylase (ODC) antizyme and stabilizes ODC, thus inhibiting antizyme-mediated ODC degradation.26 Interestingly, ODC activity in noncancerous hepatic tissue from patients with HCC correlates with the severity of active hepatitis and the degree of fibrosis.27 In addition, ODC mRNA was elevated in liver tissue with HCC.28 TLR4 is expressed in all hepatic cell types in response to its ligand, lipopolysaccharide (LPS).29 In patients with CHC, NS5A induces a 3- to 7-fold elevation of TLR4 expression in B cells.30 Moreover, the LPS-TLR4 pathway plays an important role in hepatic fibrogenesis.31 We are in the process of determining the functional mechanisms of the other 5 genes in fibrogenesis and the applicability of the CRS to other liver diseases.

The current study has several limitations. First, this study would ideally be longitudinal, with serial biopsies from all patients, rather than cross-sectional. Unfortunately, such a study is not practical because only a small percentage of patients have repeat biopsies prior to any antiviral therapy. Such patients usually have persistently normal liver tests and/or mild histology, a group known to experience very slow progression. In addition, the interval between the two biopsies is generally short (3-5 years) in current practice. Second, all biopsies would ideally have been scored by a single pathologist. We addressed this issue by defining “controls” as those with no fibrosis and “cases” as those with bridging fibrosis/cirrhosis. Multiple studies have shown that inter-observer agreement is greatest for cirrhosis or when the number of stages is reduced.32 In addition, no fibrosis (stage 0) and cirrhosis (stage 4) are defined identically under the Knodell score and the Batt-Ludwig classification.8 Third, there is a spectrum bias in fibrosis distribution between the Training and Validation cohorts. However, this bias reflects a strength rather than a weakness of our study. The key point is that the Training cohort, with 157 controls and 263 cases, was well powered to generate a robust signature. If the CRS signature is valid, it should survive validation in other cohorts regardless of their fibrosis distribution, and this is what we observed. Nevertheless, additional cohorts with a more balanced case–control ratio should be used to further validate the CRS performance. Fourth, the CRS signature needs to be validated and optimized in non-Caucasian populations.

In conclusion, we have demonstrated that a CRS signature containing 7 predictive SNPs can identify Caucasian patients with CHC at high risk for cirrhosis. Relying on cutoff values such as <0.50 and >0.70, we could distinguish between the absence and presence of high cirrhosis risk with sufficient reliability. Application of CRS in clinical practice could help to reduce the rate of liver biopsy in Caucasian patients with CHC, identify additional candidates for treatment at an early disease stage, and assist re-treatment decisions. In addition, CRS would complement the current and evolving non-invasive fibrosis staging tests33, 34 such as the aspartate aminotransferase/platelet ratio (APRI), Fibrotest, and Fibroscan, by adding the much-needed dimension of risk assessment to the management of CHC. The added value of using the CRS to assess genetic risk, in combination with these other tests, which assess the extent of fibrosis, will require prospective validation.


We thank Dr. David Brenner and Dr. Samuel Broder for their extensive review and helpful comments.