Pathologic criteria for nonalcoholic steatohepatitis: Interprotocol agreement and ability to predict liver-related mortality§



  • This study was supported in part by the Liver Disease Outcomes Fund of the Center for Liver Diseases at Inova Fairfax Hospital (Falls Church, VA).

  • Potential conflict of interest: Nothing to report.

  • §

    See Editorial on page 1792

Abstract

Since the initial description of nonalcoholic steatohepatitis (NASH), several sets of pathologic criteria for its diagnosis have been proposed. However, their interprotocol agreement and ability to predict long-term liver-related mortality (LRM) have not been demonstrated. In this study, we examined patients with biopsy-proven nonalcoholic fatty liver disease (NAFLD) for whom liver biopsy slides and clinical and mortality data were available. Liver biopsy samples were evaluated for a number of pathologic features and were classified according to the presence or absence of NASH by (1) the original criteria for NAFLD subtypes, (2) the nonalcoholic fatty liver disease activity score (NAS), (3) the Brunt criteria, and (4) the current study's criteria. All NASH diagnostic criteria and individual pathologic features were tested for agreement and for their independent associations with LRM, which were determined with a Cox proportional hazards model. Two hundred fifty-seven NAFLD patients with complete data were included. The diagnoses of NASH by the original NAFLD subtypes and by the current study's definition of NASH were in almost perfect agreement (κ = 0.896). However, their agreement was moderate with NAS (κ = 0.470 and κ = 0.511, respectively) and only fair to moderate with the Brunt criteria (κ = 0.365 and κ = 0.441, respectively). Furthermore, the agreement of the Brunt criteria with NAS was relatively poor (κ = 0.178). During the follow-up (median = 146 months), 31% of the patients died (9% were LRM). After we controlled for confounders, a diagnosis of NASH by the original criteria for NAFLD subtypes [adjusted hazard ratio = 9.94 (95% confidence interval = 1.28-77.08)] demonstrated the best independent association with LRM. Among the individual pathologic features, advanced fibrosis showed the best independent association with LRM [adjusted hazard ratio = 5.68 (95% confidence interval = 1.50-21.45)]. 
Conclusion: The original criteria for NAFLD subtypes and the current study's criteria for NASH were in almost perfect agreement, but their level of agreement with the NAS and Brunt criteria was lower. A diagnosis of NASH by the original criteria for NAFLD subtypes demonstrated the best predictability for LRM in NAFLD patients. (HEPATOLOGY 2011;)

Nonalcoholic fatty liver disease (NAFLD) is a clinicopathologic spectrum that ranges from simple steatosis to nonalcoholic steatohepatitis (NASH).1-3 Although the prevalence of NAFLD in the US population has been estimated to be 15% to 30%, only 2% to 3% have NASH, the potentially progressive subtype of NAFLD.3-5 A number of natural history studies have convincingly shown that among patients on the NAFLD spectrum, only those with NASH are at risk for progression.1, 6-14 Because of this differential progression of NAFLD subtypes, establishing the diagnosis of NASH is important both for prognosis and for the identification of potential candidates for future treatment protocols.

In order to establish the diagnosis of NASH, a number of pathologic criteria have been used. Among these, the original criteria for NAFLD subtypes were developed to histologically categorize NAFLD into four subtypes. Specifically, NAFLD subtypes 3 and 4 are now considered to represent NASH.2, 6, 15 Subsequently, the Brunt criteria were developed to grade NASH, and they have been used for clinical research in patients with NAFLD.16 More recently, the nonalcoholic fatty liver disease activity score (NAS) was developed to provide a numerical pathologic score for patients who most likely have NASH.17

Over the past decade, these different pathologic criteria have been used to carry out epidemiologic studies or to assess the efficacy of different medications in clinical trials of patients with NASH. Despite their increasing use, the interprotocol agreements of these pathologic criteria have not been assessed. Additionally, the ability of these NASH pathologic criteria to predict adverse outcomes such as liver-related mortality (LRM) has not been assessed. The aims of this study were (1) to assess the agreement of three commonly used pathologic criteria for NASH along with our own pathologic criteria (used in the current study) and (2) to determine their ability to predict long-term LRM in a well-defined cohort of patients with NAFLD for whom liver biopsy slides and mortality data were available.

Abbreviations

aHR, adjusted hazard ratio; CI, confidence interval; HR, hazard ratio; ICD-9, International Classification of Diseases, 9th revision; ICD-10, International Classification of Diseases, 10th revision; LRM, liver-related mortality; NAFLD, nonalcoholic fatty liver disease; NAS, nonalcoholic fatty liver disease activity score; NASH, nonalcoholic steatohepatitis.

Patients and Methods

Patient Population.

Patients with histologically proven NAFLD, available liver biopsy slides, and adequate clinical information were selected from our fatty liver databases. This NAFLD cohort included patients with available clinical data and liver biopsy slides from the Armed Forces Institute of Pathology (Washington DC) as well as the original NAFLD patients whom we previously reported.6 For each patient, clinical and demographic data were available (age, sex, race, height, weight, alcohol consumption, medications, presence of diabetes, presence of hyperlipidemia, and results of laboratory tests measuring liver enzymes). The height and the weight were used to calculate the body mass index. To be included in the study, a patient had to have been diagnosed with biopsy-proven NAFLD with a minimum of 5 years of follow-up. Patients were excluded for the following reasons: (1) a daily alcohol intake greater than 20 g in men and greater than 10 g in women; (2) another form of chronic liver disease such as viral hepatitis, autoimmune hepatitis, or medication-induced liver disease; (3) the use of medications associated with fatty liver disease; (4) bariatric surgery or small bowel resection; (5) total parenteral nutrition; and (6) an active or recent malignancy.

The study was approved by the institutional review boards of Inova Health System and the Armed Forces Institute of Pathology.

Pathologic Assessment.

For the purpose of this study, all liver biopsy slides were reread at the same time by two hepatopathologists (Z.G. and H.M.) who were blinded to the clinical data. For each liver biopsy, slides stained with hematoxylin-eosin and Masson's trichrome were reviewed in conference by both hepatopathologists (Z.G. and H.M.), and decisions about each pathologic feature and the diagnosis of NASH were made by consensus. Steatosis was scored as an estimate of the percentage of parenchyma replaced by fat: (0) 0%, (1) up to 5%, (2) 6% to 33%, (3) 34% to 66%, or (4) more than 66%. Lobular inflammation, portal inflammation, hepatocellular ballooning, pericellular/perisinusoidal fibrosis, and portal fibrosis were graded on a scale of 0 to 3: (0) none, (1) mild or few, (2) moderate, or (3) marked or many. Bridging fibrosis was scored as (0) none, (1) few bridges, or (2) many bridges. Cirrhosis was scored as (0) absent, (1) incomplete, or (2) established.

Four pathologic protocols or sets of criteria were used to assess each liver biopsy sample. First, we categorized each liver biopsy into one of the following four groups according to the original criteria for NAFLD subtypes2, 15: (1) steatosis alone (NAFLD type 1), (2) steatosis with lobular inflammation only (NAFLD type 2), (3) steatosis with hepatocellular ballooning (NAFLD type 3), or (4) steatosis with Mallory-Denk bodies or fibrosis (NAFLD type 4). As previously reported, NAFLD types 3 and 4 were considered to be NASH.6 Furthermore, each liver biopsy sample with at least fat and lobular inflammation was further graded as mild (grade 1), moderate (grade 2), or marked (grade 3) as described by Brunt et al.16 For the purpose of this study, patients with Brunt grades of 1 to 3 were combined and were considered to have NASH. Next, we used the current study's pathologic criteria for NASH.18 According to these criteria, NASH was diagnosed for (1) any degree of steatosis along with centrilobular ballooning and/or Mallory-Denk bodies or (2) any degree of steatosis along with centrilobular pericellular/perisinusoidal fibrosis or bridging fibrosis in the absence of another identifiable cause. Finally, for all liver biopsy samples, the elements of NAS and the stage of fibrosis were scored as described by Kleiner et al.17 with separate scores for steatosis (0-3), hepatocellular ballooning (0-2), lobular inflammation (0-3), and fibrosis (0-4). As recommended, NAS was the sum of the first three features. Fibrosis according to the NAS was scored from 0 to 4 [(0) none, (1) centrilobular/perisinusoidal, (2) centrilobular plus periportal, (3) bridging, and (4) cirrhosis]. Each biopsy sample was examined separately according to these four pathologic criteria, and the readings were recorded into the database.
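The four NASH definitions described above can be sketched in code. This is a minimal illustration under stated assumptions, not the study's implementation: the `Biopsy` record and its field names are hypothetical, the Brunt rule is reduced to the study's working definition (fat plus lobular inflammation, any grade), the zonal (centrilobular) qualifiers and the exclusion of other identifiable causes are not modeled, and the order in which the NAFLD subtypes are checked is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Biopsy:
    """Consensus findings for one biopsy (hypothetical field names)."""
    steatosis: bool             # any degree of steatosis
    lobular_inflammation: bool  # any lobular inflammation
    ballooning: bool            # hepatocellular ballooning present
    mallory_denk: bool          # Mallory-Denk bodies present
    fibrosis: bool              # pericellular/perisinusoidal or bridging fibrosis
    nas_steatosis: int          # NAS component, 0-3
    nas_ballooning: int         # NAS component, 0-2
    nas_lobular: int            # NAS component, 0-3


def nafld_subtype(b: Biopsy) -> int:
    """Original NAFLD subtype (1-4) for a biopsy that shows steatosis.

    Checking type 4 before type 3 is an assumption of this sketch.
    """
    if b.mallory_denk or b.fibrosis:
        return 4
    if b.ballooning:
        return 3
    if b.lobular_inflammation:
        return 2
    return 1


def nas(b: Biopsy) -> int:
    """NAS is the sum of steatosis, ballooning, and lobular inflammation."""
    return b.nas_steatosis + b.nas_ballooning + b.nas_lobular


def nash_calls(b: Biopsy, nas_threshold: int = 5) -> dict:
    """NASH yes/no under each of the four protocols."""
    return {
        "original_subtypes": b.steatosis and nafld_subtype(b) >= 3,
        "brunt_any_grade": b.steatosis and b.lobular_inflammation,
        "current_study": b.steatosis
        and (b.ballooning or b.mallory_denk or b.fibrosis),
        "nas": nas(b) >= nas_threshold,
    }


# Steatosis with mild lobular inflammation only: NASH by Brunt, not by the others.
mild = Biopsy(True, True, False, False, False, 1, 0, 1)
print(nafld_subtype(mild), nas(mild), nash_calls(mild))
```

A biopsy with steatosis and lobular inflammation but no ballooning, Mallory-Denk bodies, or fibrosis is NAFLD type 2 under the original criteria, yet it counts as NASH under the any-grade Brunt rule, which is the kind of case that drives disagreement between protocols.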

Mortality Follow-Up.

For each patient, the long-term mortality status at the time of the study and the cause of death were obtained from the National Death Index Plus. Maintained by the Centers for Disease Control and Prevention, the National Death Index is a computerized database of all certified deaths in the United States since 1979. In addition to the mortality status, the mortality files contain the dates and causes of death. According to the National Death Index database, people who died in the United States before 1998 were classified according to the guidelines of the International Classification of Diseases, 9th revision (ICD-9), whereas those who died during or after 1998 were classified according to the guidelines of the International Classification of Diseases, 10th revision (ICD-10).19 In the current study, the causes of death classified as LRM included liver fibrosis and cirrhosis (ICD-10 code K74), chronic liver disease and sequelae of chronic liver disease (ICD-9 codes 571-572), liver cell carcinoma (ICD-9 code 155.0 and ICD-10 code C22.0), and hepatic failure (ICD-10 code K72).
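The code-based definition of LRM given above can be expressed as a small lookup. The sketch below assumes that each death record carries a cause-of-death code and its ICD revision; the function name and the handling of the decimal point are illustrative choices, not part of the National Death Index file format.

```python
def is_liver_related(code: str, revision: int) -> bool:
    """Return True if a cause-of-death code falls in the study's LRM list.

    ICD-10: K74 (fibrosis and cirrhosis), K72 (hepatic failure), C22.0.
    ICD-9: 571-572 (chronic liver disease and sequelae), 155.0.
    Dots are stripped so "K74.6" and "K746" are treated alike (an assumption).
    """
    normalized = code.strip().upper().replace(".", "")
    if revision == 10:
        return normalized.startswith(("K74", "K72")) or normalized == "C220"
    if revision == 9:
        return normalized.startswith(("571", "572")) or normalized == "1550"
    raise ValueError(f"unsupported ICD revision: {revision}")


print(is_liver_related("K74.6", 10))  # cirrhosis code under ICD-10
print(is_liver_related("I25.1", 10))  # coronary artery disease, not LRM
```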

Statistical Analysis.

The main long-term outcome for this study was LRM. Demographic, clinical, and laboratory characteristics were compared between NAFLD subjects who were dead and those who were alive at the time of follow-up and between LRM and non-LRM groups. The chi-square test or Fisher's exact test was used to compare categorical variables, and nonparametric Mann-Whitney tests were used to compare continuous variables. P values below the level of 0.05 were considered significant.
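As an illustration of the categorical comparison, the following sketch runs an uncorrected Pearson chi-square test on a 2 x 2 table using only the standard library. The counts are the LRM counts reported for this cohort (17 of 131 patients with NASH and 1 of 78 without NASH by the original criteria); whether the authors used the chi-square test or Fisher's exact test for this particular row is not stated, so the choice here is an assumption.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, P value). With 1 degree of freedom the chi-square
    survival function equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: NASH / non-NASH; columns: liver-related death / no liver-related death.
statistic, p_value = chi_square_2x2(17, 131 - 17, 1, 78 - 1)
print(f"chi-square = {statistic:.2f}, P = {p_value:.4f}")
```

Under these assumptions, the P value comes out at roughly 0.004, in line with the value reported for LRM in the cohort table.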

As described previously, four different sets of pathologic criteria were used to establish the diagnosis of NASH. After the agreement between these different pathologic criteria for NASH was assessed with κ statistics20 (Table 1), each diagnostic criterion for NASH was tested separately for its ability to independently predict LRM (after adjustments for relevant demographic and clinical confounders). Cox proportional hazards models were used to calculate adjusted hazard ratios (aHRs) and to identify independent predictors of LRM, and aHRs with P values ≤ 0.05 were considered potentially significant. Furthermore, two different schemes for grading fibrosis (NAS and the current study's criteria) were tested for agreement with Spearman's correlation coefficient.

Table 1. κ Scores and Strength of Agreement
κ | Interpretation
<0 | Poor agreement
0.0-0.20 | Slight agreement
0.21-0.40 | Fair agreement
0.41-0.60 | Moderate agreement
0.61-0.80 | Substantial agreement
0.81-1.00 | Almost perfect agreement
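For readers who want to reproduce this kind of agreement analysis, the sketch below computes Cohen's κ for two binary NASH calls and maps the score to the categories of Table 1. The four cell counts in the usage example are hypothetical; the paper reports κ values but not the underlying cross-tabulations.

```python
def cohen_kappa(both_pos: int, only_a: int, only_b: int, both_neg: int) -> float:
    """Cohen's kappa for two raters (protocols A and B) with yes/no calls."""
    n = both_pos + only_a + only_b + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = (both_pos + only_a) / n
    b_pos = (both_pos + only_b) / n
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa: float) -> str:
    """Strength-of-agreement label following the bands of Table 1."""
    if kappa < 0:
        return "Poor agreement"
    if kappa <= 0.20:
        return "Slight agreement"
    if kappa <= 0.40:
        return "Fair agreement"
    if kappa <= 0.60:
        return "Moderate agreement"
    if kappa <= 0.80:
        return "Substantial agreement"
    return "Almost perfect agreement"


# Hypothetical cross-tabulation of 100 biopsies read by two protocols.
kappa = cohen_kappa(both_pos=40, only_a=10, only_b=5, both_neg=45)
print(round(kappa, 3), interpret_kappa(kappa))

# Labels for two κ values reported in this study.
print(interpret_kappa(0.896), "|", interpret_kappa(0.178))
```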

Next, the individual pathologic features that constituted each of the four pathologic protocols were tested individually for their ability to independently predict LRM. Because in the original criteria for NAFLD these features were described in the form of ordinal variables, a series of tests was performed to identify the thresholds for transforming each pathologic feature into a binary classifier. The threshold for each pathologic feature was selected so that the log-rank test for LRM associated with the transformed binary pathologic feature returned the lowest P value (i.e., the strongest association) in comparison with other possible thresholds for that feature. The transformation of ordinal features into binary features was intended to eliminate the effect of nonequidistant grades, that is, unequal steps in the degree of the pathologic process between consecutive grades of a feature. The transformed binary pathologic features were further used for building a multivariate survival model for LRM with the aims of (1) identifying those features independently predicting LRM and (2) establishing potential ways of improving existing pathologic protocols.
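The threshold search described above can be sketched as follows. This is an illustration under stated assumptions rather than the SAS procedure used in the study: the two-group log-rank test is implemented directly with the standard library, "best" is taken to mean the smallest log-rank P value, and the follow-up data in the usage example are invented. Scanning several cutoffs and keeping the most significant one is known to inflate the type I error, so a threshold found this way is exploratory.

```python
import math


def logrank_p(times, events, groups):
    """Two-group log-rank test; returns the P value (chi-square, 1 df).

    times: follow-up in months; events: 1 = liver-related death, 0 = censored;
    groups: True if the binary feature is present.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [(u, e, g) for u, e, g in zip(times, events, groups) if u >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g)
        d = sum(1 for u, e, _ in at_risk if e and u == t)
        d1 = sum(1 for u, e, g in at_risk if e and u == t and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi_square = observed_minus_expected ** 2 / variance
    return math.erfc(math.sqrt(chi_square / 2))


def best_threshold(times, events, grades):
    """Try every cutoff 'grade >= cut' and return (best cut, all P values)."""
    p_values = {}
    for cut in sorted(set(grades))[1:]:
        p_values[cut] = logrank_p(times, events, [g >= cut for g in grades])
    return min(p_values, key=p_values.get), p_values


# Invented toy cohort: higher grades die earlier.
times = [120, 110, 100, 95, 90, 80, 60, 50, 30, 20, 15, 10]
events = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
grades = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
cut, p_values = best_threshold(times, events, grades)
print(cut, {k: round(v, 4) for k, v in p_values.items()})
```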

All analyses were performed with SAS 9.1 (SAS Institute, Inc., Cary, NC).

Results

Liver biopsy slides and clinical data were available for 257 NAFLD patients (67% with NASH and 33% with non-NASH NAFLD); 142 of these patients were from the Armed Forces Institute of Pathology, 72 were from our previously reported NAFLD cohort, and 43 were from Inova Fairfax Hospital. For 209 patients (81%), both liver biopsy slides and mortality data were available, and they were used to calculate pathologic predictors of LRM. Demographic and laboratory data for the studied cohort are summarized in Table 2. The median follow-up length was 146 months (maximum = 342 months, interquartile range = 59-186 months); during follow-up, 31% of the subjects died. Of those deceased at the time of follow-up, 28% died of liver-related causes.

Table 2. Clinical and Demographic Data for the Cohort
Characteristic | NASH (n = 131) | Non-NASH (n = 78) | P Value
Prevalence, % | 62.68 | 37.32 |
Deceased, n (%) | 40 (30.5) | 24 (30.8) | 0.97
LRM, n (%) | 17 (13.0) | 1 (1.3) | 0.004
Age, years, mean ± SD | 49.09 ± 14.39 | 48.05 ± 15.32 | 0.76
Age at death, years, mean ± SD | 68.9 ± 12.1 | 71.1 ± 11.7 | 0.26
Male gender, n (%) | 40 (30.5) | 39 (50.0) | 0.005
Caucasian, n (%) | 80 (69.0) | 62 (84.9) | 0.013
Obesity, n (%) | 62 (47.3) | 40 (51.3) | 0.58
Type 2 diabetes, n (%) | 31 (23.7) | 12 (15.4) | 0.15
Hyperlipidemia, n (%) | 22 (16.9) | 25 (35.7) | 0.003
Body mass index, kg/m2, mean ± SD | 37.38 ± 10.31 | 34.87 ± 10.83 | 0.13
Weight, kg, mean ± SD | 103.36 ± 30.00 | 99.52 ± 31.91 | 0.37
Alanine aminotransferase, U/L, mean ± SD | 98.18 ± 83.71 | 72.89 ± 69.95 | 0.089
Aspartate aminotransferase, U/L, mean ± SD | 78.39 ± 76.55 | 40.90 ± 51.59 | <0.00017
Glucose, mg/dL, mean ± SD | 130.59 ± 48.86 | 113.71 ± 40.93 | 0.029
Total bilirubin, mg/dL, mean ± SD | 0.98 ± 1.64 | 0.80 ± 0.96 | 0.33
Total cholesterol, mg/dL, mean ± SD | 203.59 ± 49.18 | 207.79 ± 49.65 | 0.48
Note: NASH was diagnosed according to the original NAFLD pathologic protocol.
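The mortality percentages quoted in the text follow directly from the counts in Table 2. The short check below recomputes them; the function name is illustrative.

```python
def mortality_summary(n_total: int, n_deceased: int, n_lrm: int) -> dict:
    """Overall mortality, LRM share of deaths, and LRM share of the cohort."""
    return {
        "died_pct": 100 * n_deceased / n_total,
        "lrm_pct_of_deaths": 100 * n_lrm / n_deceased,
        "lrm_pct_of_cohort": 100 * n_lrm / n_total,
    }


# Counts from Table 2: 131 NASH + 78 non-NASH, 40 + 24 deaths, 17 + 1 LRM.
summary = mortality_summary(n_total=131 + 78, n_deceased=40 + 24, n_lrm=17 + 1)
for name, value in summary.items():
    print(f"{name}: {value:.1f}%")
```

These values round to the 31% overall mortality, the 28% of deaths that were liver related, and the 9% LRM reported in the abstract.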

Agreement Between Different Pathologic Protocols for the Diagnosis of NASH.

The diagnosis of NASH was established separately with the four different pathologic protocols. κ scores reflecting agreement between the four sets of NASH pathologic criteria are summarized in Table 3. Specifically, the diagnoses of NASH by the original criteria for NAFLD subtypes and the diagnoses of NASH by the current study's criteria were in almost perfect agreement {κ = 0.896 [95% confidence interval (CI) = 0.838-0.953]}. The agreement of NASH diagnoses by the original criteria for NAFLD subtypes and by the current study's NASH protocol with NASH diagnoses by NAS ≥ 5 (the threshold for diagnosing NASH) was moderate [κ = 0.470 (95% CI = 0.367-0.574) and κ = 0.511 (95% CI = 0.409-0.613), respectively]. However, the agreement of the Brunt criteria (any grade of NASH) with the current study's NASH criteria [κ = 0.365 (95% CI = 0.257-0.474)] and with the original criteria for NAFLD subtypes [κ = 0.441 (95% CI = 0.329-0.552)] was fair to moderate, and its agreement with NAS ≥ 5 was relatively poor [κ = 0.178 (95% CI = 0.117-0.240)].

Table 3. κ Scores for the Comparison of Four Different Pathologic Protocols Used for the Diagnosis of NASH
κ Score (95% CI)
Protocol | Original NAFLD Subtypes (Reference 2, 6, 15) | Brunt Criteria (Reference 16) | NAS (Reference 17)
Current study's NASH protocol | 0.896 (0.838-0.953) | 0.365 (0.257-0.474) | 0.511 (0.409-0.613)
Original NAFLD subtypes | - | 0.441 (0.329-0.552) | 0.470 (0.367-0.574)
Brunt criteria | 0.441 (0.329-0.552) | - | 0.178 (0.117-0.240)

Our data also show that using NAS ≥ 5 for establishing the diagnosis of NASH missed 40% to 45% of the NASH patients diagnosed by the current study's NASH criteria and by the original criteria for NAFLD subtypes. In fact, only 72 of 131 patients diagnosed with NASH by the original criteria for NAFLD subtypes and 75 of 123 patients diagnosed by the current study's NASH criteria were also diagnosed with NASH by an NAS value of 5 or higher.
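The share of missed diagnoses can be recomputed from the counts given in this paragraph. A minimal check, with an illustrative function name:

```python
def missed_fraction(also_positive: int, reference_positive: int) -> float:
    """Fraction of reference-positive patients not flagged by the second test."""
    return 1 - also_positive / reference_positive


# NASH by the original NAFLD subtypes: 72 of 131 also had NAS >= 5.
print(f"{missed_fraction(72, 131):.1%}")
# NASH by the current study's criteria: 75 of 123 also had NAS >= 5.
print(f"{missed_fraction(75, 123):.1%}")
```

The two fractions come to about 45% and 39%, respectively, which is the range summarized in the text as 40% to 45%.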

On the other hand, in comparison with the current study's NASH criteria and the original criteria for NAFLD subtypes, another 30% of NAFLD patients were considered to have NASH according to the Brunt criteria. In fact, all these patients were diagnosed to have grade 1 NASH by the Brunt criteria.

In order to test whether a better agreement could be achieved with a different NAS threshold, NAS values ≥ 3 and ≥ 4 were separately considered as definitions of NASH. Lowering the NAS threshold improved the agreement of the NAS criteria with the original criteria for NAFLD subtypes and with the current study's NASH criteria [κ = 0.645 (95% CI = 0.544-0.746) and κ = 0.564 (95% CI = 0.457-0.672) for the NAS threshold of 3 and κ = 0.600 (95% CI = 0.502-0.698) and κ = 0.602 (95% CI = 0.504-0.701) for the NAS threshold of 4, respectively]. Despite this improvement in κ scores, the agreement remained moderate. On the other hand, assessing the agreement between different protocols for the fibrosis stage, we were able to show that the NAS fibrosis scores and the current study's fibrosis scores were in excellent agreement [nonparametric correlation coefficient = 0.74 (P < 0.0001) for pericellular fibrosis and nonparametric correlation coefficient = 0.83 (P < 0.0001) for portal fibrosis].

Different NASH Pathologic Criteria and LRM.

Regardless of which criteria were used to establish the diagnosis of NASH (with the exception of the Brunt criteria), patients with the pathologic diagnosis of NASH had higher LRM than those with non-NASH NAFLD (Table 4). However, assuming that the diagnosis of NASH was required for LRM, we found that the proportion of correctly included NASH diagnoses was higher (89%-95%) when the diagnosis of NASH was established by the original criteria for NAFLD subtypes or the current study's criteria for NASH rather than the NAS threshold of 5. In fact, with the recommended NAS threshold of 5, only 61% of NAFLD patients who died of liver-related causes were diagnosed as having NASH. On the other hand, the Brunt criteria demonstrated the lowest specificity of the four pathologic protocols. In fact, our data indicated that 85% of individuals with NAFLD who did not die from liver-related causes were still diagnosed with NASH by the Brunt criteria, whereas the other protocols diagnosed NASH in 33% to 60% of these individuals. However, when patients with Brunt's grade 1 NASH were excluded from the NASH group, the proportion of patients with NASH diagnoses among those who did not die from liver-related causes decreased to 41%, and the proportion of correctly included NASH diagnoses (those resulting in LRM) remained at the level of 89%. These data again suggest that Brunt grade 1 for NASH leads to an overdiagnosis of NASH in patients who do not develop progressive liver disease causing liver-related deaths.
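The proportions discussed above are the usual quantities of a 2 x 2 diagnostic table with LRM as the outcome. As a sketch, the counts reported in Table 2 for the original criteria for NAFLD subtypes (17 liver-related deaths among 131 patients with NASH and 1 among 78 without NASH) can be used to recompute them; the function and key names are illustrative.

```python
def diagnostic_summary(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Proportions for a NASH call evaluated against liver-related death.

    tp: NASH and LRM; fp: NASH without LRM; fn: no NASH but LRM; tn: neither.
    """
    return {
        "lrm_cases_called_nash": tp / (tp + fn),       # sensitivity
        "non_lrm_cases_called_nash": fp / (fp + tn),   # 1 - specificity
        "lrm_among_nash": tp / (tp + fp),              # positive predictive value
    }


# Original criteria for NAFLD subtypes, counts from Table 2.
summary = diagnostic_summary(tp=17, fp=131 - 17, fn=1, tn=78 - 1)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

With these counts, about 94% of patients who died of liver-related causes had been called NASH, and about 60% of those who did not die of liver-related causes had also been called NASH, which matches the upper ends of the ranges quoted in the text.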

Table 4. HRs Predicting LRM for the Diagnosis of NASH According to Each Pathologic Protocol
NASH Protocol | Test | HR (95% CI) | P Value
Current study's NASH protocol | Log-rank test | 5.13 (1.18-22.45) | 0.03
 | Cox aHR | 4.43 (0.97-20.20) | 0.05
Original NAFLD subtypes | Log-rank test | 10.50 (1.39-78.93) | 0.02
 | Cox aHR | 9.94 (1.28-77.08) | 0.03
Brunt criteria | Log-rank test | 2.99 (0.82-10.92) | 0.26
 | Cox aHR | 2.54 (0.34-19.15) | 0.37
NAS scoring scheme | Log-rank test | 2.52 (0.97-6.55) | 0.06
 | Cox aHR | 2.64 (0.92-7.59) | 0.07

In an attempt to assess the predictability of NASH diagnoses made by the four separate pathologic protocols, we ran a multivariate analysis. After we controlled for important confounders (age, gender, ethnicity, and presence of obesity and diabetes), a diagnosis of NASH made by the original criteria for NAFLD subtypes [aHR = 9.94 (95% CI = 1.28-77.08)] was an independent predictor of LRM, and a diagnosis of NASH made by the current study's criteria for NASH [aHR = 4.43 (95% CI = 0.97-20.20)] was of borderline significance (P = 0.05; Table 4).
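The relation between a hazard ratio, its 95% CI, and its P value can be checked with the standard Wald approximation on the log scale. The sketch below assumes that the reported intervals are symmetric on the log scale with a critical value of 1.96, which is the usual construction for Cox models but is not stated explicitly in the text.

```python
import math


def wald_p_from_ci(hr: float, lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Two-sided P value recovered from a hazard ratio and its 95% CI."""
    standard_error = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


# Cox aHRs from Table 4.
print(f"original NAFLD subtypes: P = {wald_p_from_ci(9.94, 1.28, 77.08):.3f}")
print(f"current study's criteria: P = {wald_p_from_ci(4.43, 0.97, 20.20):.3f}")
```

Under this approximation, the two P values come to roughly 0.03 and 0.05, which agrees with the rounded values in Table 4 and shows why an interval whose lower bound sits just below 1 corresponds to borderline significance.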

Again, similarly to the interprotocol agreement analysis, we attempted to find the optimal NAS value for predicting LRM. In our analysis, for an association with LRM, the best NAS threshold was 4 (i.e., a patient with NAS ≥ 4 was presumed to have NASH). Nevertheless, the association of this NAS threshold with LRM remained nonsignificant [log-rank P = 0.098, aHR = 2.92 (95% CI = 0.95-8.95)]. Additionally, other thresholds for NAS (both higher and lower) did not return any significant association with LRM.

Association of Individual Pathologic Features of NASH With LRM (Table 5).

Next, we assessed individual pathologic features used in the original criteria for NAFLD subtypes for their ability to predict LRM. In a Cox proportional hazards model consisting of all the originally independent pathologic features (i.e., bridging fibrosis and cirrhosis were not included because they could be described as linear combinations of other features) and using each feature as an ordinal parameter, only fibrosis independently predicted LRM.

Table 5. Baseline Pathologic Features Associated With LRM
Histologic Feature | Biopsy Samples With Specific Features (%) | Liver-Related Deaths (%)
Steatosis | |
 1 (≤5%) | 22 | 0
 2 (6%-33%) | 44 | 83
 3 (34%-66%) | 27 | 17
 4 (>66%) | 7 | 0
Lobular inflammation | |
 0 (none) | 8 | 6
 1 (mild) | 44 | 28
 2 (moderate) | 35 | 44
 3 (severe) | 13 | 22
Portal inflammation | |
 0 (none) | 19 | 6
 1 (mild) | 44 | 17
 2 (moderate) | 35 | 71
 3 (severe) | 2 | 6
Hepatocellular ballooning | |
 0 (none) | 43 | 11
 1 (rare) | 21 | 17
 2 (frequent) | 27 | 61
 3 (severe/numerous) | 9 | 11
Mallory-Denk bodies | |
 0 (none) | 52 | 11
 1 (rare) | 25 | 33
 2 (frequent) | 17 | 39
 3 (severe/numerous) | 6 | 17
Pericellular/perisinusoidal fibrosis | |
 0 (none) | 38 | 17
 1 (mild) | 23 | 17
 2 (moderate) | 27 | 38
 3 (severe) | 12 | 28
Portal fibrosis | |
 0 (none) | 26 | 6
 1 (mild) | 30 | 11
 2 (moderate) | 33 | 22
 3 (severe) | 11 | 61
Bridging fibrosis | |
 0 (none) | 65 | 17
 1 (few) | 11 | 0
 2 (numerous) | 24 | 83
Cirrhosis | |
 0 (absent) | 88 | 50
 1 (incomplete) | 6 | 17
 2 (established) | 6 | 33
NAS and Brunt fibrosis stage | |
 0 (none) | 23 | 0
 1 (perisinusoidal only) | 28 | 11
 2 (perisinusoidal/periportal) | 14 | 6
 3 (bridging) | 22 | 33
 4 (cirrhosis) | 13 | 50
NAFLD subtype | |
 1 | 8 | 6
 2 | 25 | 0
 3 | 4 | 0
 4 | 63 | 94
Brunt NASH grade | |
 0 | 13 | 6
 2 | 31 | 56
 3 | 17 | 38
NAS value | |
 0-2 | 29 | 6
 3 | 14 | 17
 4 | 15 | 17
 ≥5 | 42 | 60
Current study's NASH definition | |
 No | 35 | 11
 Yes | 65 | 89

After each pathologic feature was transformed into a binary parameter, univariate survival analyses showed that portal inflammation [grade ≥ 2; HR = 6.68 (95% CI = 2.20-20.3), P = 0.0008], ballooning degeneration [grade ≥ 2; HR = 5.32 (95% CI = 1.89-14.9), P = 0.0015], Mallory-Denk bodies [grade ≥ 2; HR = 4.21 (95% CI = 1.66-10.7), P = 0.0024], portal fibrosis [grade > 2; HR = 14.1 (95% CI = 5.47-36.5), P < 0.0001], and pericellular fibrosis [grade > 2; HR = 4.86 (95% CI = 1.73-13.7), P = 0.0027] on the liver biopsy samples were all associated with LRM. Additionally, histologic documentation of advanced fibrosis grade ≥ 2 [HR = 20.4 (95% CI = 5.9-70.5), P < 0.0001] or any cirrhosis [HR = 10.6 (95% CI = 4.19-26.7), P < 0.0001] was also associated with LRM. In these univariate analyses, the grades of steatosis and lobular inflammation were not associated with LRM.

Using multivariate analyses, we further tested these pathologic features as independent predictors of LRM. As a result, using these binary parameters in the Cox proportional hazards model, we found that portal fibrosis graded as ≥3 (with the current study's fibrosis grading, this included all patients with bridging fibrosis and cirrhosis) remained independently associated with LRM [aHR = 5.68 (95% CI = 1.50-21.45)]. Similarly, when we tested pathologic features graded with NAS, advanced fibrosis (stage 4) was independently associated with LRM [aHR = 5.62 (95% CI = 1.92-6.46)]. The LRM survival curves for individuals with different grades of fibrosis graded according to the current study's criteria (Fig. 1A) and the NAS criteria (Fig. 1B) are shown.

Figure 1.

Liver-related survival curves for individuals with different grades of fibrosis graded according to (A) the current study's criteria and (B) the NAS criteria.

Discussion

This is the first study providing the interprotocol agreement and predictability values for LRM of four sets of pathologic criteria for diagnosing NASH. This study used a well-defined cohort of NAFLD patients for whom clinical data, liver biopsy slides, and long-term mortality data were available. We have confirmed the findings of previous studies2, 6-18 reporting that, regardless of the specific pathologic criteria for NASH, patients with a histologic diagnosis of NASH have higher LRM.

Ours is the first study to assess interprotocol agreement between different pathologic criteria for NASH. Our data show that diagnoses of NASH by the original criteria for NAFLD subtypes15 and diagnoses of NASH by the current study's NASH criteria18 were in almost perfect agreement. On the other hand, diagnoses of NASH made by NAS17 with a threshold of 5 showed only moderate agreement with diagnoses of NASH made by the other pathologic criteria for NASH. Nevertheless, an NAS threshold of 5 returned an almost 100% positive predictive value for NASH as a cause of liver-related death. However, this NAS threshold missed every third NAFLD patient who would ultimately die of liver-related causes, whereas the other NASH pathologic criteria used in this study successfully predicted 90% to 100% of cases dying from liver-related causes. These findings are consistent with the conclusions of a recent article by the authors of NAS that discusses the role of NAS thresholds.21 In fact, our data indicate that an NAS value ≥5 seems to be a strict criterion that could potentially lead to underdiagnosis of NASH in patients who will ultimately die of liver-related causes. Changing the threshold to lower values improves its ability to predict LRM but only in a limited way. It is important to remember that the original article reporting the development of NAS for use in clinical trials noted that most patients with NAS values ≥5 had NASH according to subjective criteria, but some with lower NAS values also had NASH; consequently, NAS values were not recommended for establishing the diagnosis of NASH.17 The same point is reiterated in a recent publication on the role of NAS, which was not developed as a criterion for establishing the diagnosis of NASH.21 Despite these precautions, a number of studies have used an NAS value ≥5 as a diagnostic criterion for NASH.
Our data confirm the conclusion by Kleiner et al.17 and Brunt et al.21 that the diagnosis of NASH by NAS (regardless of its threshold) is inappropriate. In fact, our study shows that NAS does not offer any advantage over the other NASH pathologic protocols in its ability to predict an important long-term outcome of NAFLD patients, that is, LRM. Nevertheless, NAS does offer some advantage by providing a numerical score for each pathologic component of NAFLD that can be followed over time during clinical trials of different agents, which generally occur over relatively short periods of time (12-18 months).

In addition, our data show that the original Brunt grading criteria for NASH seem to overdiagnose NASH. This issue affects the agreement of this pathologic protocol with other NASH protocols as well as its ability to predict LRM. However, if patients with Brunt grade 1 NASH are no longer considered to have NASH, the agreement and predictability of the Brunt criteria become very similar to those of the other protocols (Table 5).

Considering these issues, we can return to the results of the assessment of the predictability of each individual pathologic feature for LRM. Although a number of pathologic features (e.g., ballooning, portal inflammation, and Mallory-Denk bodies) seemed to be associated with LRM in the univariate analysis, fibrosis (any grade) remained an independent predictor of LRM in the multivariate analysis. In fact, advanced stages of fibrosis demonstrated the best independent association (aHR) with liver deaths. Furthermore, this independent association of advanced fibrosis with LRM was confirmed for both fibrosis-grading systems: the one used in the NAS protocol and the other used in the current study's NASH protocol. This issue has an important prognostic implication. Although NASH is considered the potentially progressive type of NAFLD, NASH patients with fibrosis are at highest risk for LRM. Although treatment modalities for NASH patients should be developed to prevent NASH-related fibrosis, patients who already have NASH and fibrosis should also become the subjects of careful clinical monitoring and future treatment protocols.

One of the main limitations of our study was the use of LRM as an outcome for validating the diagnosis of NASH. It is important to understand that any cause-specific mortality (e.g., LRM) lacks true negative controls: deaths other than liver-related deaths did not mean the absence of life-threatening liver disease, so those who had been diagnosed with NASH but eventually did not die from liver-related causes were not genuine false positives. In our study, patients with NASH often had other chronic conditions, and as can be seen from the summary of our mortality data, half of those who had NASH and might well have been on their way to a liver-related death died earlier from other causes (mostly from coronary artery disease). Indeed, such individuals who had been diagnosed with NASH and died later from other causes were approximately 9 years older and were more frequently diabetic than those with NASH who succumbed to liver-related deaths (Table 6). In fact, in older subjects and in subjects with multiple chronic diseases, NASH is probably not the top risk factor for adverse outcomes. Additionally, we observed that NASH subjects dying from liver-related causes died on average 6 years earlier than subjects without NASH. Therefore, we assume that NASH, once it has progressed, could be responsible for approximately 6 fewer years of life, and the rate of its progression to liver death over 10 years is approximately 16%.

Table 6. Mortality Follow-Up for Patients With NASH (Diagnosed by the Original Criteria for NAFLD Subtypes) Who Had at Least 10 Years of Follow-Up
Characteristic | Alive (n = 68) | Died From Other Causes (n = 23) | Died From Liver-Related Causes (n = 17) | P Value
Rate, % | 62.96 | 21.3 | 15.74 |
Age, years, mean ± SD | 46.22 ± 11.71 | 65.09 ± 9.36 | 56.44 ± 12.54 | <0.0001
Male gender, n (%) | 29 (42.6) | 4 (17.4) | 4 (23.5) | 0.017
Caucasian, n (%) | 38 (67.9) | 17 (81.0) | 13 (81.3) | 0.16
Obesity, n (%) | 25 (36.8) | 8 (34.8) | 6 (35.3) | 0.85
Type 2 diabetes, n (%) | 13 (19.1) | 9 (39.1) | 5 (29.4) | 0.066
Hyperlipidemia, n (%) | 11 (16.2) | 5 (22.7) | 1 (5.9) | 0.091
Body mass index, kg/m2, mean ± SD | 31.72 ± 6.20 | 29.68 ± 6.57 | 31.65 ± 6.40 | 0.38

Other important limitations of our study were its relatively small sample size and the lack of external validation with another similar cohort of NAFLD patients. Unfortunately, we are not aware of another relatively large cohort of biopsy-proven NAFLD subjects with available liver biopsy slides, extensive clinical data, and long-term mortality follow-up data. Nevertheless, the in-depth analysis of our data and the availability of long-term mortality data make this study unique.

In summary, our data confirm that patients with NASH are at high risk for LRM.1-14, 21-30 We have also shown that a certain degree of agreement exists between all four sets of pathologic criteria for NASH. Nevertheless, only two have been found to have the best interprotocol agreement as well as the best independent predictability for LRM in patients with NAFLD.
