Potential conflict of interest: Nothing to report.
A list of members of the Nonalcoholic Steatohepatitis Clinical Research Network is located in the Appendix.
Nonalcoholic fatty liver disease (NAFLD) is characterized by hepatic steatosis in the absence of a history of significant alcohol use or other known liver disease. Nonalcoholic steatohepatitis (NASH) is the progressive form of NAFLD. The Pathology Committee of the NASH Clinical Research Network designed and validated a histological feature scoring system that addresses the full spectrum of lesions of NAFLD and proposed a NAFLD activity score (NAS) for use in clinical trials. The scoring system comprised 14 histological features, 4 of which were evaluated semi-quantitatively: steatosis (0-3), lobular inflammation (0-3), hepatocellular ballooning (0-2), and fibrosis (0-4). Another nine features were recorded as present or absent. An anonymized study set of 50 cases (32 from adult hepatology services, 18 from pediatric hepatology services) was assembled, coded, and circulated. For the validation study, agreement on scoring and a diagnostic categorization (“NASH,” “borderline,” or “not NASH”) were evaluated by using weighted kappa statistics. Inter-rater agreement on adult cases was: 0.84 for fibrosis, 0.79 for steatosis, 0.56 for injury, and 0.45 for lobular inflammation. Agreement on diagnostic category was 0.61. Using multiple logistic regression, five features were independently associated with the diagnosis of NASH in adult biopsies: steatosis (P = .009), hepatocellular ballooning (P = .0001), lobular inflammation (P = .0001), fibrosis (P = .0001), and the absence of lipogranulomas (P = .001). The proposed NAS is the unweighted sum of steatosis, lobular inflammation, and hepatocellular ballooning scores. In conclusion, we present a strong scoring system and NAS for NAFLD and NASH with reasonable inter-rater reproducibility that should be useful for studies of both adults and children with any degree of NAFLD. NAS of ≥5 correlated with a diagnosis of NASH, and biopsies with scores of less than 3 were diagnosed as “not NASH.” (HEPATOLOGY 2005;41:1313–1321.)
Nonalcoholic fatty liver disease (NAFLD) is increasingly recognized as the hepatic manifestation of insulin resistance and the systemic complex known as metabolic syndrome.1–4 Prevalence estimates of NAFLD have used a variety of laboratory and imaging assessments and suggest that NAFLD may be the most common form of chronic liver disease in adults in the United States, Australia, Asia, and Europe, paralleling the “epidemic” of obesity in developed countries.5–9
Ludwig et al.10 are credited with solidifying the nomenclature and pathological findings of nonalcoholic steatohepatitis (NASH) in a seminal manuscript published in 1980. Now recognized as a progressive form of fatty liver disease, NASH has been documented to have the potential to progress to cirrhosis and hepatocellular carcinoma.3, 5 NASH may be a leading cause of “cryptogenic cirrhosis” in which etiologically specific clinical laboratory or pathological features can no longer be identified.11 NAFLD is also gaining recognition as a significant form of liver disease in pediatric populations, which in some patients may progress to cirrhosis in adulthood.12, 13
Whereas laboratory test abnormalities and radiographic findings may be suggestive of NAFLD, histological evaluation remains the only means of accurately assessing the degree of steatosis, the distinct necroinflammatory lesions and fibrosis of NASH, and distinguishing NASH from “simple” steatosis, or steatosis with inflammation.14 Matteoni et al.15 showed that cirrhosis developed in 21% to 28% of patients whose index biopsies had shown the combination of lesions of steatosis, inflammation, ballooning, and Mallory's hyaline or fibrosis, whereas only 4% of patients with simple steatosis and none of the patients with steatosis and inflammation alone had evidence of cirrhosis during the 10 years of follow-up.
A system for semiquantitative evaluation for the unique lesions recognized for NASH proposed by Brunt et al.16 in 1999 was developed to parallel the concepts and terminology used in chronic hepatitis for semiquantitative evaluation, commonly referred to as “grading” and “staging.”17 The proposed system was based on the concept that the histological diagnosis of NASH rests on a constellation of features rather than any individual feature. However, it was developed for NASH and was not developed to encompass the entire spectrum of NAFLD as defined by Matteoni et al.15 A different semiquantitative feature-based scoring system for NAFLD was used in a recently published treatment trial from Promrat et al.18 Neither of these systems was designed to evaluate pediatric NAFLD, which may show different histological features than adult NASH.12
Beginning in 2002, the National Institute of Diabetes & Digestive & Kidney Diseases (NIDDK) sponsored the development of a multicenter cooperative Clinical Research Network for NASH.19 Among the goals of the network were: (1) to form a database for long-term natural history observations of patients with NAFLD; (2) to collect clinical samples for metabolic, immunological, molecular, and genetic studies of NAFLD focusing on defining the etiology of this disease; and (3) to evaluate promising therapies for NASH in both adult and pediatric populations. One of the charges to the NASH Clinical Research Network was to develop and validate a system of histological evaluation that would encompass the spectrum of NAFLD, that could be applied to pediatric NAFLD, and that would allow for assessment of changes with therapy. This report describes the multi-center effort to accomplish this task.
NAFLD, nonalcoholic fatty liver disease; NASH, nonalcoholic steatohepatitis; NIDDK, National Institute of Diabetes & Digestive & Kidney Diseases; NIH, National Institutes of Health; H&E, hematoxylin and eosin; NAS, NAFLD activity score.
Materials and Methods
The Pathology Subcommittee, composed of pathologists from each Clinical Research Network Clinical Center (total of eight), a pathologist from the National Institutes of Health (NIH), two principal investigators from Clinical Centers, the NIDDK project scientist, and the principal investigator from the Data Coordinating Center, met to devise a scoring system and to discuss plans for case review. A study set was assembled from cases contributed by each center; the cases provided were drawn from locally evaluated patients who underwent biopsies to “rule out” NAFLD or NASH. Cases were specifically included to cover the range of possible diagnoses: (1) diagnostic of steatohepatitis, (2) borderline or possible steatohepatitis, and (3) not diagnostic of steatohepatitis. Each case had hematoxylin and eosin (H&E) and Masson trichrome stains submitted. The study set cases were stripped of all clinical information except for the designation as “adult” or “pediatric,” and the pathologists were blinded to even this piece of information. The Institutional Review Boards at each Clinical Center, the Data Coordinating Center, and the Office of Human Subjects Research of the NIH approved the research plan.
Scoring System Definition.
The Committee reviewed the study set cases as a group and then proposed an evaluation method for each of the recognized features of NAFLD. The Committee agreed that only H&E and Masson's trichrome stains should be necessary to perform the evaluation. The histological features were grouped into five broad categories: steatosis, inflammation, hepatocellular injury, fibrosis, and miscellaneous features. The system of evaluation is detailed in Table 1, and examples of ballooning, microvesicular steatosis, and megamitochondria are shown in Figs. 1 and 2.
Table 1. NASH Clinical Research Network Scoring System Definitions and Scores in Study Set (% responses in category for study set cases; adult, n = 576; pediatric, n = 162)
Ballooning classification: few indicates rare but definite ballooned hepatocytes as well as cases that are diagnostically borderline; examples are shown in Fig. 1. Examples of patches of microvesicular steatosis and megamitochondria are shown in Fig. 2.
The “None to rare” category is meant to alleviate the need for time-consuming searches for rare examples or deliberation over diagnostically borderline changes. If the feature is identified after a reasonable search, it should be coded as “many.”
Diagnostic classification was not available on 2 sets of adult biopsy observations, reducing the total of such observations from 576 to 574.
Low- to medium-power evaluation of parenchymal involvement by steatosis
For the validation study, the pathologists were additionally asked to provide an overall diagnosis for each case as “NASH,” “borderline,” or “not NASH.” The biopsy specimens were made anonymous and randomized by an NIH employee not involved in the study. The adult cases were reviewed by each pathologist at his or her home institution twice, at separate times; the pediatric cases were circulated once.
Pediatric and adult cases were analyzed separately. Weighted kappa scores were used to measure the degree of inter-rater and intra-rater agreement between and within multiple pathologists. Weighted kappa scores were estimated by using intra-class correlation coefficients derived from components of variance models.20 Chi-square tests were used to compare univariate associations of histological features with diagnosis of steatohepatitis. Multivariate associations with diagnosis of steatohepatitis were assessed by using multiple logistic regression models using generalized estimating equations with robust variance estimation and exchangeable correlation to account for correlations due to multiple readings within and between raters.21 Adjusted percentage distributions for each histological feature, holding other features constant and retaining the univariate marginal distributions, were calculated from the logistic regression model coefficients for comparison with the unadjusted univariate distributions of features. All statistical analyses were carried out by the Data Coordinating Center (M. Van N.) using SAS 8.0 (SAS Institute Inc., Cary, NC) and Stata 7.0 (Stata Corp., College Station, TX).
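As a hedged illustration of what a weighted kappa measures, the sketch below computes Cohen's weighted kappa for two raters scoring the same cases on an ordinal scale. This is a deliberate simplification and not the method used in the study, which estimated weighted kappa for nine raters from intra-class correlation coefficients. The function name `weighted_kappa`, its parameters, and the example scores are hypothetical.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, n_categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale 0..n_categories-1.

    Illustrative two-rater simplification; the study itself used an
    ICC-based estimate across nine raters.
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    if any(not 0 <= s < n_categories for s in list(rater_a) + list(rater_b)):
        raise ValueError("scores must lie in 0..n_categories-1")

    n = len(rater_a)
    power = 1 if weights == "linear" else 2
    max_distance = (n_categories - 1) ** power

    def disagreement(i, j):
        # 0 for identical scores, 1 for scores at opposite ends of the scale.
        return abs(i - j) ** power / max_distance

    # Mean weighted disagreement actually observed between the two raters.
    observed = sum(disagreement(a, b) for a, b in zip(rater_a, rater_b)) / n

    # Mean weighted disagreement expected if the raters scored independently
    # with their own marginal score frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        count_a[i] * count_b[j] * disagreement(i, j)
        for i in range(n_categories)
        for j in range(n_categories)
    ) / (n * n)

    if expected == 0:
        raise ValueError("kappa is undefined when no disagreement is possible")
    return 1.0 - observed / expected


# Hypothetical fibrosis stages (0-4) assigned by two raters to five cases.
kappa = weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 3], n_categories=5)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; weighting gives partial credit when two scores differ by only one step.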
A total of 50 anonymous liver biopsy specimens formed the validation study set. The study set was chosen to sample the range of pathological conditions seen in pediatric and adult NAFLD. The 32 cases from adults were each read twice by the nine pathologists, for a total of 576 sets of observations; the 18 pediatric cases were each read once, for a total of 162 sets of observations. Each possible score was used at least once during the course of the validation. The distribution of recorded scores is shown in Table 1.
Analysis of agreement on features in the adult cases showed reasonable agreement on the scores for the major scoring categories of steatosis grade, fibrosis, ballooning injury, and Mallory's hyaline, with weighted kappa values for the adult study set of 0.5 and above (Table 2). Agreement was not as strong for the inflammatory changes or for steatosis location; both lobular and portal inflammation had interrater kappa values of 0.45. The interrater agreement on the diagnosis of steatohepatitis was 0.61. All nine pathologists agreed on the diagnosis in six cases, and eight of the nine agreed in a further five cases. The intra-rater agreement was higher in all categories than the interrater agreement.
Table 2. Inter- and Intra-rater Variability. Agreement (kappa score) for adult cases (32 cases, 9 raters) and pediatric cases (18 cases, 9 raters)
NOTE. All values represent weighted kappa statistics where appropriate.
Feature scoring agreement on the pediatric cases in the study set was not as robust as in the adult study set, with lower weighted kappa scores in all categories except steatosis (0.64). Some of the results were statistically no better than chance (microvesicular steatosis, pigmented macrophages, lipogranulomas, megamitochondria, and glycogenated nuclei); this may be caused in part by the low level of agreement on the presence or absence of the feature as seen in the adult cases and in part by the low frequency of observation of one of the two scores. Many of the pediatric cases appeared to have more zone 1 steatosis, more “periportal only” fibrosis, less ballooning, and rare Mallory's hyaline. In particular, there was disagreement between pathologists on the diagnosis of steatohepatitis when cases had fibrosis and steatosis but little or no ballooning or lobular inflammation.
Although it is accepted that steatohepatitis is a pattern of injury composed of several features, it has been difficult to define exact diagnostic criteria that all pathologists agree on to precisely distinguish NAFLD cases with steatohepatitis from those with only steatosis and inflammation.14 Data generated by the NASH Network study on how each pathologist categorized each case along with the assigned feature scores allowed statistical examination of which individual features were useful in discriminating definite steatohepatitis from the other two categories. Both crude and adjusted analyses were performed to address these questions; the results are shown in Table 3. Several features showed significant association with the diagnosis of steatohepatitis on both analyses and for both adults and children, including lobular inflammation, ballooning degeneration, and fibrosis. In the adult cases, the degree of steatosis showed a trend toward significance in the crude analysis but was clearly associated with the diagnosis of NASH in the adjusted analysis. In children, the degree of steatosis above the 5% cutoff was not associated with the diagnosis of NASH in either analysis.
Table 3. Logistic Regression Analysis of Features of NAFLD With Respect to the Diagnosis of NASH
Adjusted percent with NASH diagnosis calculated from a logistic regression model for correlated data with NASH diagnosis as the outcome and indicator variables for each histological feature. Marginal totals for adjusted percentages were fixed to equal the unadjusted marginal totals, and odds ratios from the adjusted percentages equal those from the logistic regression model.
Combined 5 observations in the <5% category with 5%-33% category because there were no events in the <5% category.
P values were derived from comparisons of percentages with diagnosis of NASH across categories of each histological feature and were calculated from chi-square tests (unadjusted percentages) or from Wald's tests from the logistic regression model (adjusted percentages).
Not calculable because 2 of 2 observations in large lipogranulomas-present category had the event.
Portal inflammation, location of steatosis, Mallory's hyaline, and megamitochondria were associated with steatohepatitis on crude analysis but, when analyzed in the adjusted model, were not significantly associated. In this series, 81% of cases with Mallory's hyaline were also given a ballooning score of 2; this may be why this feature was not independently associated with steatohepatitis.
The proportion of adult cases categorized as steatohepatitis for the most significant features is shown in Fig. 3. Although no single feature or score was absolutely associated with a diagnosis of steatohepatitis, the severity of some individual findings did show association. Pathologists had somewhat varied criteria for steatohepatitis and tended to emphasize or require some features more than others. Even so, the mere presence of a combination of features was not sufficient for a diagnosis of steatohepatitis. For example, only 68% of adult cases with steatosis, ballooning, and lobular inflammation (a common set of minimal criteria14) were diagnosed as NASH. Among the individual pathologists, this percentage varied from 56% to 79%. Adding fibrosis increased this fraction to 82%, but at the cost of increasing the fraction of cases that failed to meet these criteria yet still were diagnosed as steatohepatitis. These results emphasize how difficult it is to reduce the histopathological diagnosis of steatohepatitis to the presence or absence of specific features, independent of the variability that is part of any observation.
Based on both the agreement data and the multiple regression analysis, we propose a NAFLD Activity Score (NAS), which specifically includes only features of active injury that are potentially reversible in the short term. The score is defined as the unweighted sum of the scores for steatosis (0-3), lobular inflammation (0-3), and ballooning (0-2), and thus ranges from 0 to 8. Fibrosis, which is both less reversible and generally thought to be a result of disease activity, is not included as a component of the activity score. The separation of fibrosis from other features of activity is an accepted paradigm for staging and grading for both NASH16 and chronic hepatitis.17
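The definition of the NAS can be written as a short function. This is a minimal sketch assuming integer component scores within the ranges stated above; the function name `nafld_activity_score` and its parameter names are hypothetical and are not part of the scoring system itself.

```python
def nafld_activity_score(steatosis, lobular_inflammation, ballooning):
    """Return the NAS: the unweighted sum of three component scores.

    Ranges follow the text: steatosis 0-3, lobular inflammation 0-3,
    ballooning 0-2. Fibrosis is deliberately not an input.
    """
    components = {
        "steatosis": (steatosis, 3),
        "lobular_inflammation": (lobular_inflammation, 3),
        "ballooning": (ballooning, 2),
    }
    for name, (value, upper) in components.items():
        if not isinstance(value, int) or not 0 <= value <= upper:
            raise ValueError(f"{name} must be an integer from 0 to {upper}")
    return steatosis + lobular_inflammation + ballooning


# Hypothetical biopsy: moderate steatosis, mild inflammation, few ballooned cells.
example_nas = nafld_activity_score(steatosis=2, lobular_inflammation=1, ballooning=1)
```

Because no component is weighted, the maximum score of 8 is reached only when all three features are scored at their highest category.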
The relationship of the NAS to the diagnosis of steatohepatitis is shown in Fig. 4. Cases with NAS of 0 to 2 were largely considered not diagnostic of steatohepatitis; on the other hand, most cases with scores of ≥5 were diagnosed as steatohepatitis. Cases with activity scores of 3 and 4 were divided almost evenly among the 3 diagnostic categories. A similar analysis of pediatric case scores showed nearly identical results (data not shown).
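The three score ranges described above can be sketched as a lookup. This only restates how the scores correlated with the pathologists' diagnoses in this study set; it is not a diagnostic rule, and the function name `nas_band` and the band labels are hypothetical.

```python
def nas_band(nas):
    """Map a NAS (0-8) to the range in which it fell in the study set.

    The labels summarize the observed correlation with diagnosis only;
    they do not replace a pathologist's diagnostic determination.
    """
    if not isinstance(nas, int) or not 0 <= nas <= 8:
        raise ValueError("nas must be an integer from 0 to 8")
    if nas <= 2:
        return "largely not diagnostic of steatohepatitis"
    if nas <= 4:
        return "divided among the three diagnostic categories"
    return "mostly diagnosed as steatohepatitis"


# Tabulate the band for every possible score.
bands = {score: nas_band(score) for score in range(9)}
```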
The aims of this study were to devise and validate a feature-based semiquantitative scoring system to be used in clinical trials and natural history studies of NAFLD. The feature scoring system described used the recognized lesions of NAFLD and NASH10, 14, 15 and identified a core group of histological features for evaluation. This system was based on and further refined the grading proposal of Brunt et al.16 In particular, the fibrosis staging system was subdivided, and the scoring of steatosis was altered. Fibrosis scores for stage 1 were extended to include a distinction between delicate (1A) and dense (1B) perisinusoidal fibrosis, and to detect portal-only fibrosis, without perisinusoidal fibrosis (stage 1C). Minimal steatosis (score 0) (under 5%) was separated from mild (5%-33%) steatosis (score 1) to avoid giving weight to this feature when very little steatosis is present. A minimum of 5% steatosis was used for the operational minimal definition of histological NAFLD in biopsy specimens from both adults and children. Evaluation of ballooning was limited to three categories (none, few, and many) after discussion among the Committee pathologists as to what might be the most reproducible cutoff points. Lobular inflammation was assessed semiquantitatively on a scale that is the same as the method of Brunt et al.16 The proposed system differed from the previous methods in that the histologically distinct lesions of NAFLD were assessed and summed to provide an NAFLD Activity Score (NAS), in contrast with methods that depended more on an aggregate assessment to grade severity16 or to make disease categories.22 None of the features was weighted in this analysis; future studies may identify elements that indeed deserve greater weight.
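The steatosis cutoffs can be illustrated with a small helper. The text above states only the boundary between score 0 (under 5%) and score 1 (5%-33%); the upper cutoffs used here (score 2 for more than 33% up to 66%, score 3 for more than 66%) are an assumption taken from the commonly cited form of the scoring table. The function name `steatosis_grade` is hypothetical, and in practice the percentage is a visual estimate by the pathologist rather than a measured value.

```python
def steatosis_grade(percent_involved):
    """Convert estimated parenchymal involvement by steatosis (%) to a 0-3 score.

    Score 0 (<5%) and score 1 (5%-33%) follow the text; the cutoffs for
    scores 2 and 3 are assumed, not stated in the surrounding text.
    """
    if not 0 <= percent_involved <= 100:
        raise ValueError("percent_involved must be between 0 and 100")
    if percent_involved < 5:
        return 0  # minimal steatosis, below the operational definition of NAFLD
    if percent_involved <= 33:
        return 1  # mild steatosis
    if percent_involved <= 66:
        return 2  # assumed cutoff
    return 3      # assumed cutoff


# Hypothetical estimates of parenchymal involvement.
grades = [steatosis_grade(p) for p in (2, 5, 33, 50, 80)]
```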
With respect to these major disease features, the current system had fair to good interobserver reproducibility in both adult and pediatric cases and was equivalent to previously reported interobserver reproducibility studies in fatty liver disease.22, 23 Each of the major features (steatosis grade, lobular inflammation, ballooning, and fibrosis) showed independent correlation with a diagnosis of steatohepatitis. Based on this observation and the reproducibility studies, we defined a NAS for evaluating histological changes after therapeutic intervention trials. It is important to note that the primary purpose of the NAS is to assess overall histological change; it is not intended that numeric values replace the pathologist's diagnostic determination of steatohepatitis. The NAS has also not been studied as a measure of the rapidity of disease progression, nor should it be taken as an absolute severity scale.
With respect to other features assessed in this scoring system, some were more reproducible than others. The less reproducible features were considered an optional part of the scoring system, and for the purposes of the NASH Clinical Research Network, will only be evaluated in group review sessions, where differences in observations between pathologists can be immediately addressed.
One of the findings of this study was the lesser degree of interobserver agreement in scoring the features of the pediatric cases of NAFLD submitted for review. As has been previously reported in pediatric NAFLD,12, 13, 24, 25 pediatric study cases included a significant proportion with only periportal fibrosis. This feature is considered unusual in adults, although it has been reported.26 Pediatric biopsy specimens also showed less lobular inflammation, less ballooning, and only rare Mallory's hyaline when compared with adult cases. The only pediatric case that all 9 pathologists agreed was diagnostic of steatohepatitis resembled “typical” adult steatohepatitis, with prominent ballooning, moderate lobular inflammation, and Mallory's hyaline. We postulate that the increased variability of scores was attributable to different patterns of disease in pediatric NAFLD, and we will continue to study this in the cases accrued by the NASH Clinical Research Network. Nevertheless, the scoring system that was developed covers the range of features in pediatric NAFLD.
One concern for any new scoring system is how it applies in actual clinical trials. Although the data are not presented here, this method for scoring and the activity score (NAS) were used in a blinded re-review of biopsy specimens from two recently published therapeutic trials.18, 27 The conclusions that could be drawn regarding histological changes were essentially the same as those in the original publications. These results were useful in protocol design for the clinical trials that have been developed by the NASH Network.
In summary, we have designed and validated a semiquantitative scoring system that is useful for assessing the range of histological features of NAFLD. The system is simple and requires only routine histochemical stains (H&E and Masson trichrome stains), so that the system can be used by practicing pathologists. Of course, other special stains for additional analyses may be used at the discretion of each pathologist. The method proposed showed reasonable interrater agreement among experienced hepatopathologists similar to other studies of variability in fatty liver disease.22, 23 Multiple regression analysis of the scores with respect to the diagnosis of NASH confirmed previous observations that the diagnosis of steatohepatitis is not dependent on a single histological feature, but rather involves assessment of multiple independent features. As a reflection of this fact, the NAS was able to discriminate between NASH and non-NASH fatty liver disease in this patient population. To make it easy for other pathologists to make use of this system, it is our plan to show examples of the lesions on the NASH Clinical Research Network web site.
Appendix: Members of the Nonalcoholic Steatohepatitis Clinical Research Network
Case Western Reserve University, Cleveland, OH: Yao-Chang Liu, M.D.; Arthur J. McCullough, M.D. (Principal Investigator); Duke University Medical Center, Durham, NC: Anna Mae Diehl, M.D. (Principal Investigator); Marcia Gottfried, M.D.; Michael S. Torbenson, M.D.; Indiana University School of Medicine, Indianapolis, IN: Naga Chalasani, M.D. (Principal Investigator); Oscar W. Cummings, M.D.; Johns Hopkins University Center for Clinical Trials (Data Coordinating Center), Baltimore, MD: Pat Belt, B.S.; Aynur Ünalp-Arida, M.D., Ph.D.; Mark Van Natta, M.H.S.; James Tonascia, Ph.D. (Principal Investigator); National Cancer Institute (NCI), Bethesda, MD: David E. Kleiner, M.D., Ph.D. (Co-Lead Pathologist); National Institute of Child Health and Human Development (NICHD), Bethesda, MD: Gilman D. Grave, M.D. (Project Scientist); National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), Bethesda, MD: Jay Hoofnagle, M.D. (Project Scientist); Patricia R. Robuck, Ph.D. (Project Scientist); St Louis University Hospital, St. Louis, MO: Elizabeth M. Brunt, M.D. (Co-Lead Pathologist); Brent A. Tetri, M.D. (Principal Investigator); University of California San Diego, San Diego, CA: Cynthia Behling, M.D.; Joel E. Lavine, M.D., Ph.D. (Principal Investigator); University of California San Francisco, San Francisco, CA: Nathan M. Bass, M.D., Ph.D. (Principal Investigator); Linda D. Ferrell, M.D.; University of Washington, Seattle, WA: Kris V. Kowdley, M.D. (Principal Investigator); Matthew Yeh, M.D., Ph.D.; Virginia Commonwealth University, Richmond, VA: Melissa J. Contos, M.D.; Arun J. Sanyal, M.D. (Principal Investigator).