Systematic review with meta-analysis: non-invasive assessment of non-alcoholic fatty liver disease – the role of transient elastography and plasma cytokeratin-18 fragments

Authors

  • R. Kwok,

    1. Institute of Digestive Disease, The Chinese University of Hong Kong, Hong Kong, China
    2. Department of Gastroenterology and Hepatology, Concord Repatriation Hospital, Sydney, Australia
  • Y.-K. Tse,

    1. Institute of Digestive Disease, The Chinese University of Hong Kong, Hong Kong, China
    2. Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong, China
  • G. L.-H. Wong,

    1. Institute of Digestive Disease, The Chinese University of Hong Kong, Hong Kong, China
    2. Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong, China
  • Y. Ha,

    1. Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
  • A. U. Lee,

    1. Department of Gastroenterology and Hepatology, Concord Repatriation Hospital, Sydney, Australia
  • M. C. Ngu,

    1. Department of Gastroenterology and Hepatology, Concord Repatriation Hospital, Sydney, Australia
  • H. L.-Y. Chan,

    1. Institute of Digestive Disease, The Chinese University of Hong Kong, Hong Kong, China
    2. Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong, China
  • V. W.-S. Wong

    1. Institute of Digestive Disease, The Chinese University of Hong Kong, Hong Kong, China
    2. Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong, China

  • This commissioned systematic review was subject to full peer-review and the authors received an honorarium from Wiley, on behalf of AP&T.

Correspondence to:

Dr V. W.-S. Wong, Department of Medicine and Therapeutics, 9/F, Prince of Wales Hospital, 30-32 Ngan Shing Street, Shatin, Hong Kong, China.

E-mail: wongv@cuhk.edu.hk

Summary

Background

Non-alcoholic fatty liver disease (NAFLD) affects 15–40% of the general population. Some patients have non-alcoholic steatohepatitis (NASH) and progressive fibrosis, and would be candidates for monitoring and treatment.

Aim

To review current literature on the use of non-invasive tests to assess the severity of NAFLD.

Methods

Systematic literature searching identified studies evaluating non-invasive tests of NASH and fibrosis using liver biopsy as the reference standard. Meta-analysis was performed for areas with an adequate number of publications.

Results

Serum tests and physical measurements like transient elastography (TE) have high negative predictive value (NPV) in excluding advanced fibrosis in NAFLD patients. The NAFLD fibrosis score comprises six routine clinical parameters and has been endorsed by current American guidelines as a screening test to exclude low-risk individuals. The pooled sensitivities and specificities for TE to diagnose F ≥ 2, F ≥ 3 and F4 disease were 79% and 75%, 85% and 85%, and 92% and 92% respectively. Liver stiffness measurement often fails in obese patients, but the success rate can be improved with the use of the XL probe. A number of biomarkers have been developed for the diagnosis of NASH, but few were independently validated. Serum/plasma cytokeratin-18 fragments have been most extensively evaluated and have a pooled sensitivity of 66% and specificity of 82% in diagnosing NASH.

Conclusions

Current non-invasive tests are accurate in excluding advanced fibrosis in NAFLD patients, and may be used for initial assessment. Further development and evaluation of NASH biomarkers are needed.

Introduction

Non-alcoholic fatty liver disease (NAFLD) is currently the most common chronic liver disease worldwide, affecting 15–40% of the general population.[1, 2] Depending on the presence of necroinflammation and hepatocyte ballooning, NAFLD is further divided into non-alcoholic fatty liver (NAFL) and non-alcoholic steatohepatitis (NASH).[3] NASH is the active form of NAFLD. It occurs in 10–20% of NAFLD patients and may progress to cirrhosis and hepatocellular carcinoma (HCC).[4-6] As NAFLD is highly prevalent, but the majority of patients only have NAFL and run a benign course, it is important to identify patients with active disease efficiently.

Traditionally, liver biopsy was the primary method to assess the severity of NAFLD. However, it is an invasive procedure and carries a small, but definite, risk of complications. Furthermore, it is unrealistic to perform liver biopsy for 15–40% of the general population. In recent years, a multitude of blood tests and physical assessments have been developed to aid evaluation of NAFLD patients. Therefore, it is timely to appraise the diagnostic performance of these non-invasive tests.

Recently, a review by Festi et al. provided a synopsis of the main non-invasive markers and suggested a diagnostic algorithm for evaluation of steatosis using SteatoTest, Fatty Liver Index or ultrasonography, followed by evaluation of fibrosis using FibroMeter and transient elastography (TE).[7] Our review explores a wider range of non-invasive tools and provides an update on this field. We have sought to cover the majority of serum biomarkers, prediction models and physical measurements for NASH and fibrosis. In addition, we conducted a meta-analysis on plasma cytokeratin-18 fragments (CK-18) and TE, which are the most widely studied modalities for NASH and fibrosis respectively.

Methods

Literature search

A systematic web-based literature search of all publications in MEDLINE (via OvidSP), PUBMED (NLM) and EMBASE was conducted on 13 June 2013, covering each database from its date of inception. Our primary search strategy used the free-text words [fatty liver, NAFLD, NASH, TE, Fibroscan, liver stiffness measurement (LSM), elastography imaging techniques, acoustic radiation force impulse, keratin 18, cytokeratin 18]. Two reviewers (RK and VWSW) performed the literature search separately and agreed upon the final selection of studies. Search limits included English language, abstracts, and publication in peer-reviewed journals. A secondary search was performed to locate any potential studies missed by electronic search strategies. A comprehensive search of MEDLINE was performed to locate any existing systematic reviews on TE, acoustic radiation force impulse (ARFI) and CK18 in the diagnosis of NAFLD. Manual searching of reference lists from relevant reviews and primary studies was performed. No additional suitable studies were found.

Meta-analysis

All candidate articles from our primary search had their abstract or full text scrutinised to determine whether they were primary studies. Subsequently, the full text was further assessed to check for fulfilment of the inclusion/exclusion criteria. Disagreements were resolved through consensus. Inclusion/exclusion criteria for primary studies required the following features:

  1. Detailed description of adult human subjects under study.
  2. Description of TE, ARFI or CK18 as an index test.
  3. Description of liver biopsy as the reference standard. NASH was defined as an NAFLD activity score ≥5. Fibrosis staging was based on Brunt's or Kleiner's system: F0 = no fibrosis; F1 = perisinusoidal or portal; F2 = perisinusoidal and portal/periportal; F3 = septal or bridging fibrosis; and F4 = cirrhosis.[8, 9]
  4. A minimum number of NAFLD subjects ≥20.
  5. Results describe number of cases with NASH or different fibrosis stages using liver biopsy, the sensitivity, specificity and nominated cut-off values of the index test so that a 2 × 2 table could be created. Corresponding authors were asked to provide study-level data if adequate information could not be extracted from the published article.
  6. Different articles from a primary study that contained overlapping data cohorts were only counted once. The most suitable article to use was determined by seeking clarification from the authors, or by using the most updated manuscript that contained all the required data.

Both prospective and retrospective studies were acceptable. Studies in which subjects had other causes of chronic liver disease apart from NAFLD were included so long as discrete data for NAFLD population could be extracted. Studies that reported other non-invasive comparators were also allowed if the discrete data for TE, ARFI and CK18 could be extracted.

A final number of 9 articles for TE,[10-18] 11 articles for CK18[19-29] and 2 articles for ARFI[15, 30] were assessed to be suitable for inclusion in the meta-analysis. However, statistical analysis was not possible on the ARFI data due to there being too few studies. Figure 1 outlines the stepwise evaluation and selection process for all the candidate studies.

Figure 1.

Summary of literature search and selection.

Quality assessment

Each study's quality was analysed by two independent reviewers (RK, YKT). A modified version of the QUADAS[31] was used to assess the quality of the studies included for meta-analysis (Table S1). Disagreements were resolved by referral to a third reviewer (VWSW).

Transient elastography studies overall scored highly on the QUADAS assessment (Figure S1). Two studies scored 12/13, whereas the rest scored 13/13. CK18 studies had a mean QUADAS score of 11.2 (range 9–13) (Figure S2). The most common components in which studies lost points were an unclear description of the quality of liver biopsies (36% studies had high-quality data), whether the histopathologist was blinded to other results (45%) and unclear descriptions of when serum was obtained for CK18 analysis in relation to the timing of liver biopsy (64%).

Data extraction

Two reviewers (RK, YJH) independently extracted the required information from primary studies. A data extraction pro forma was created and variables included for collection were patient age, sex, ethnicity, body mass index (BMI), transaminase levels, results of the index and reference tests and accompanying diagnostic thresholds (cut-offs). Where available, other biochemical and blood parameters, presence of metabolic syndrome components and risk factors (other anthropometric measures, diabetes, hypertension, hypercholesterolaemia and hypertriglyceridaemia) were recorded. A 2 × 2 table was created for each modality and its reported cut-off for diagnosing each category.

Data synthesis and statistical analysis

From the 2 × 2 tables, we calculated sensitivity and specificity. The estimates of sensitivity and specificity and their associated 95% confidence intervals (CIs) were presented graphically in paired forest plots. Summary estimates of sensitivity and specificity, along with 95% CIs, were obtained by using the bivariate random-effects modelling approach (minimum four studies).[32] Besides accounting for study size and between-study heterogeneity through a random-effects model, the bivariate analysis correctly handles any negative correlation that might arise between sensitivity and specificity. Moreover, we constructed a hierarchical summary receiver-operating characteristic (HSROC) curve plotting sensitivity vs. specificity.[33] The HSROC curve illustrates the summary trade-off between sensitivity and specificity across the included studies.
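As a minimal sketch of the first step described above, the following Python snippet computes sensitivity and specificity from a single 2 × 2 table, each with a 95% CI. The Wilson score interval is an assumption made for illustration, since the text does not state which interval was used for the forest plots, and the counts in the example are hypothetical. The sketch does not reproduce the bivariate random-effects pooling, which was performed with metandi in STATA.

```python
from math import sqrt


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def sens_spec(tp, fp, fn, tn):
    """Sensitivity and specificity, each paired with a Wilson 95% CI."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
    }


# Hypothetical study: 50 biopsy-positive and 50 biopsy-negative patients.
result = sens_spec(tp=40, fp=9, fn=10, tn=41)
for name, (estimate, (low, high)) in result.items():
    print(f"{name}: {estimate:.2f} (95% CI {low:.2f}-{high:.2f})")
```

Each study in a paired forest plot contributes one such pair of estimates; the pooled summary additionally models the correlation between them across studies.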

To examine the potential sources of heterogeneity, we pre-defined the following covariates: BMI (<30 kg/m² vs. ≥30 kg/m², for TE), and study quality factors (yes vs. unclear vs. no, for individual QUADAS items as described above). Because sufficient data (at least 10 studies) were not available to add covariates to the hierarchical model by meta-regression, separate bivariate models were instead fitted to the different subgroups of studies. The two studies by Yoneda et al. had much higher cut-offs for F4 than the other studies included in the meta-analysis (Figure 3). To assess their effect on the pooled results, post hoc sensitivity analyses were conducted in which pooled estimates of sensitivity and specificity were recalculated in the bivariate model after excluding these two studies. Statistical analyses were performed using STATA 10.0 (StataCorp, College Station, TX, USA), particularly the metandi[34] commands, and Review Manager[35] software. All statistical tests were two-sided, with a P value <0.05 indicating statistical significance.

Non-invasive diagnosis of NASH

Non-alcoholic steatohepatitis is the active form of NAFLD with necroinflammation and hepatocyte ballooning. With ongoing liver injury, NASH may progress to cirrhosis and HCC. In long-term follow-up studies, histological features of NASH predict future liver complications.[36, 37] Previously, NAFL and NASH were considered distinct entities. However, recent longitudinal studies with paired liver biopsies suggest that some patients with NAFL may progress to NASH.[6, 38] In any case, assessment of disease severity is important for prognostication and treatment monitoring.

Serum biomarkers

Cytokeratin-18 fragments

Cytokeratins are keratin-containing proteins that form intermediate filaments and make up the cytoskeleton of epithelial cells. CK18 is found predominantly in glandular epithelia of the digestive, respiratory and urogenital tracts. It is the major intermediate filament protein of the liver. During apoptosis of hepatocytes, caspases cleave CK18, generating fragments that can be detected using immunoassays.[39] CK18 is one of the most widely investigated biomarkers for NASH, as a stand-alone test or as part of prediction models. The two main enzyme assays of CK18 that have been studied are M30 and M65, which supposedly measure hepatocyte apoptosis and total cell death respectively.

Meta-analysis on CK18

We performed a meta-analysis of 11 studies with a total pool of 822 patients, of whom 389 had histological NASH (Table S2). As M30 and M65 had similar performance and M30 was more widely studied, we decided to focus on M30. The studies were further grouped according to whether separate ‘high-sensitivity’ and ‘high-specificity’ cut-offs (six studies) were chosen, and/or a single ‘best’ overall cut-off level (seven studies) was used to diagnose NASH. In the six studies that chose separate cut-offs, the ‘high sensitivity’ CK18 cut-off ranged from 111.6 to 380.0 U/L (77–90% sensitivity and 34–94% specificity) (Figure 2). For ‘high specificity’, the cut-offs ranged from 261.4 to 670 U/L (24–86% sensitivity and 91–100% specificity). The areas under the receiver-operating curve (AUROC) for these six studies ranged from 0.71 to 0.93. For the seven studies that reported a single ‘best’ overall cut-off, the chosen cut-offs ranged from 121.6 to 338.0 U/L, with 60–88% sensitivity, 66–97% specificity and AUROC 0.70–0.87.

Figure 2.

Forest plot from meta-analysis of sensitivities and specificities for CK18 to diagnose NASH using a random-effects model. Cut-offs with the best overall accuracy, sensitivity and specificity in individual studies were adopted.

In the pooled estimates of diagnostic accuracy, the seven studies that used a single ‘best’ overall cut-off level showed 66% sensitivity and 82% specificity. In the six studies using separate cut-offs, the pooled estimates were 82% sensitivity and 65% specificity at the ‘high sensitivity’ cut-off, and 58% sensitivity and 98% specificity at the ‘high specificity’ cut-off. Pooled estimates of diagnostic accuracy remained stable when only high-quality studies were analysed (Table S3). Figure S3 shows the HSROC plots of CK18.

Discussion on CK18

Our findings suggest that CK18 has moderate accuracy overall for diagnosing NASH (66% sensitivity, 82% specificity). When separate cut-offs are chosen, sensitivity improves to 82% at the ‘high sensitivity’ cut-off and specificity reaches 98% at the ‘high specificity’ cut-off, but the two are not achieved at the same threshold. Moreover, there is considerable variability in the suggested cut-offs and their respective diagnostic accuracy among studies. In clinical practice, this makes it very difficult to choose which threshold to use. The variability may be partly explained by the fact that choosing a threshold to maximise either sensitivity or specificity greatly sacrifices the other. Other possible causes of heterogeneity include intervals between blood tests and liver biopsy, inadequate description of liver biopsy assessment and blinding, and inadequate reference test description. However, none of these was found to be significant, with only small differences in overall sensitivities and specificities in these subgroups (Table S3).
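To illustrate what the pooled single cut-off estimates (66% sensitivity, 82% specificity) could mean at the bedside, the sketch below converts them into likelihood ratios and predictive values. The two prevalence figures are illustrative assumptions: 389/822 is the proportion with NASH in the pooled biopsy cohorts, and 20% is the upper end of the 10–20% range quoted in the Introduction. The calculation also assumes that sensitivity and specificity carry over unchanged between settings, which may not hold in practice.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) via Bayes' rule for a given disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


SENS, SPEC = 0.66, 0.82  # pooled CK18 estimates, single 'best' cut-off

lr_pos, lr_neg = likelihood_ratios(SENS, SPEC)
print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")

# Prevalence values are assumptions chosen for illustration only.
for label, prevalence in [("pooled biopsy cohorts", 389 / 822),
                          ("hypothetical 20% setting", 0.20)]:
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    print(f"{label}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

Under these assumptions the same test gives a noticeably lower PPV when NASH is less common, which is one reason results from biopsy cohorts may not transfer directly to screening populations.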

Other biomarkers

Soluble Fas (sFas) is a death receptor from the TNFR family that has been implicated in apoptosis and is upregulated in NASH in animal models. An apoptosis panel combining CK18 with sFas was found to have a greater AUROC than either marker alone.[40]

Tumour necrosis factor-alpha (TNF-α) is a proinflammatory cytokine that has been shown to play an important role in the pathogenesis of NAFLD. Several studies suggested that TNF-α contributes to NASH development, in that NASH patients and animal models exhibit elevated serum TNF-α.[41-44] However, its diagnostic performance in differentiating NASH from NAFL has not been fully elucidated.

Another cytokine, interleukin-6 (IL-6), was elevated or upregulated in the serum or liver tissue of NASH patients in some independent studies,[41, 45, 46] but did not differ between NASH and NAFL in other studies.[47, 48] Grigorescu et al. evaluated the accuracy of IL-6 as a non-invasive test for discriminating 59 NASH patients from 20 patients without NASH.[46] At a cut-off of 6 pg/mL, the sensitivity and specificity were reported as 64% and 80% respectively. However, measuring IL-6 alone is probably of little value for NASH diagnosis because of the discrepancies mentioned above.

Concerning insulin resistance (IR), which characterises NASH,[49] Shimada et al. conducted accuracy analyses of homeostasis model assessment of insulin resistance (HOMA-IR). In accordance with the fact that HOMA-IR could be normal in the early-stage NASH, they reported that HOMA-IR differentiated early-stage NASH from NAFL with a sensitivity of 51% at a cut-off of 3 (specificity of 95%, positive predictive value, PPV 98%, NPV 31% and AUROC 0.76). In another study, HOMA-IR was found to be significantly associated with NASH and was an independent predictor.[50] However, there was no baseline difference in HOMA-IR between normal vs. NAFLD and NAFL vs. NASH; only between normal subjects and subjects with NASH was there a significant difference.

High-sensitivity C-reactive protein (hsCRP) is an acute-phase reactant that can detect low-grade inflammation. Yoneda et al. were the first to show the usefulness of elevated hsCRP in distinguishing biopsy-proven NASH patients from nonprogressive steatosis subjects, with an AUROC of 0.83.[51] However, the results were not reproduced by others.[48, 52] In particular, Haukeland et al. demonstrated that CC-chemokine ligand-2 (CCL2), but not hsCRP, was elevated in NAFLD and was significantly higher in NASH than in NAFL.[48]

CC-chemokine ligand-2, also known as monocyte chemoattractant protein-1 (MCP-1), is a potent chemokine that is responsible for hepatic recruitment of macrophages during liver inflammation.[53] In another study of 104 subjects, a high CCL2 level was associated with elevated alanine aminotransferase (ALT).[54] In addition, the CCL2 level was significantly higher in patients diagnosed with NAFLD by ultrasound. However, the subjects included in the study did not undergo liver biopsy, hence additional evaluation will be needed before this biomarker can be applied in clinical practice.

In a series of 70 patients with biopsy-proven NAFLD and 10 healthy controls, a significantly higher pentraxin-3 level was found in NASH than in non-NASH cases.[55] The AUROC for separating NASH from non-NASH with pentraxin-3 was 0.76. The sensitivity, specificity, PPV and NPV were 66.7%, 78.6%, 82.4% and 61.1%, respectively, at the cut-off of 1.6 ng/mL. Pentraxin-3 might be useful not only for differentiating NASH from non-NASH but also for assessing the degree of fibrosis, as its level increased stepwise with the histological stage of fibrosis. However, because pentraxin-3 is primarily an acute-phase reactant responding to inflammation, measuring this marker alone is unlikely to be of diagnostic value.

Serum prolidase enzyme activity (SPEA) reflects hepatic prolidase enzyme activity.[56] Kayadibi et al. reported that SPEA was significantly higher in patients with NASH than in those with NAFL, with an AUROC of 0.85, a sensitivity of 84%, a specificity of 82%, a PPV of 82% and an NPV of 84% (cut-off 1134 U/L).[57] A potential advantage is that SPEA could predict fibrosis as well as steatohepatitis. However, as for other biomarkers, further investigation and validation are needed.

Soluble receptor for advanced glycation endproducts (sRAGE) is known to be associated with some components of metabolic syndrome.[58, 59] A case–control study involving 57 NAFLD patients and 14 healthy controls showed a significantly decreased level of sRAGE in the NASH group.[60] In differentiating NASH from NAFL, the AUROC of sRAGE was 0.77. The sensitivity was 75.0% and specificity was 71.4% at a cut-off of 1309 pg/mL. Although the level of sRAGE might be decreased in NASH, this finding is not unique to NASH.[61] Hence, sRAGE might be useful as an addition to NASH diagnostic panels, pending further investigation.

Oxidative stress has been recognised as an important mechanism in the pathogenesis of NASH. Markers from different oxidation pathways were investigated for use in NASH diagnosis, but failed to show solid and consistent results.[62-65] In addition, the serum or plasma measurement of oxidative markers may not necessarily reflect the activity of different oxidation pathways in the liver. Therefore, the use of oxidative stress markers in clinical practice is still questionable.

Clinical models

A thorough medical history to assess for metabolic syndrome risk factors and exclude alcohol and secondary causes of fatty liver is crucial in establishing NAFLD. However, in discerning which patients have NASH, symptoms are not helpful, as patients remain asymptomatic until a considerable degree of cirrhosis develops.[66] As for physical examination, a specific pattern of fat distribution, dorsocervical lipohypertrophy, was shown to be associated with severity of steatohepatitis, but lacks objectivity and generalisability.[67] In addition, the performance of routine laboratory parameters has not reached satisfactory levels of sensitivity and specificity.[68]

However, diagnostic performance can be improved when clinical and laboratory parameters are incorporated into prediction models (Table 1). Poynard's NashTest consists of 13 parameters and includes several metabolic syndrome risk factors.[69] From a cohort of patients diagnosed with NAFLD via the SteatoTest (also developed by Poynard), NashTest was assessed in its ability to differentiate NASH from simple steatosis. A specificity of 94% was reported, but the sensitivity only reached 33%. A later attempt was performed to validate this test in another French cohort.[70] However, there were only 15 NashTest-positive cases and 19 biopsy-confirmed NASH cases among more than 250 patients, hence further study is warranted.

Table 1. Clinical models for predicting NASH

| Study | Name | Component/formula | Study population | Results | Comment |
| --- | --- | --- | --- | --- | --- |
| Poynard et al.[69] | NashTest | 1. Age; 2. Sex; 3. Height; 4. Weight; 5. Triglyceride; 6. Cholesterol; 7. α2-MG; 8. Apolipoprotein A1; 9. Haptoglobin; 10. GGT; 11. ALT; 12. AST; 13. Total bilirubin; undisclosed formula | 160 – training group; 97 – validation group; 383 – controls | AUROC 0.79; Se 33%, Sp 94%; PPV 66%, NPV 81% | Validated in 274 patients with morbid obesity – Se 21%, Sp 96%, PPV 27%, NPV 94% (calculated) |
| Younossi et al.[22] | NASH Diagnostics | 1. Cleaved CK-18; 2. CK-18 minus cleaved CK-18; 3. Adiponectin; 4. Resistin; undisclosed formula | 69 – training group; 32 – validation group | AUROC 0.85; Se 72%, Sp 91% (threshold 0.4320) | Re-evaluated in 79 patients by same group – AUROC 0.70, Se 61%, Sp 69%, PPV 68%, NPV 63% (threshold 0.389) |
| Younossi et al.[71] | NASH Model of NAFLD Diagnostic Panel | 1. Type 2 diabetes mellitus; 2. Gender; 3. BMI; 4. Triglyceride; 5. Cleaved CK-18; 6. CK-18 minus cleaved CK-18 | 79 NAFLD patients | AUROC 0.81; Se 91%, Sp 47%, PPV 61%, NPV 86% (threshold 0.2210); Se 44%, Sp 92%, PPV 83%, NPV 65% (threshold 0.6183) | |
| Anty et al.[72] | Nice Model | 1. ALT; 2. CK-18; 3. Metabolic syndrome | 464 morbidly obese patients; 310 – training group; 154 – validation group | AUROC 0.83–0.88; Se 84%, Sp 86%, PPV 44%, NPV 98% (logarithmic transformation, threshold 0.1400) | Model = −5.654 + 3.780E−02 × ALT + 2.215E−03 × CK-18 + 1.825 × (presence of metabolic syndrome = 1); logarithmic transformation = 1/{1 + Exp(−Nice Model)} |
| Feldstein et al.[73] | oxNASH | 1. 13-HODE/LA ratio; 2. Age; 3. BMI; 4. AST | 73 – training group; 49 – validation group | AUROC 0.74–0.83; Se 81–84% (threshold 55); Sp 63–97% (threshold 73) | Model = 100 × exp(z)/{1 + exp(z)}, where z = −10.051 + 0.0463 × age (years) + 0.147 × BMI + 0.0293 × AST + 2.658 × 13-HODE/LA ratio |
| Dixon et al.[75] | HAIR | 1. Hypertension; 2. (Increased) ALT; 3. IR | 105 morbidly obese patients | AUROC 0.90; Se 80%, Sp 89% (threshold 2) | Hypertension = 1; ALT > 40 IU/L = 1; IR index > 5.0 = 1 |

NASH, non-alcoholic steatohepatitis; α2-MG, alpha2-macroglobulin; GGT, gamma-glutamyl transpeptidase; ALT, alanine aminotransferase; AST, aspartate aminotransferase; AUROC, area under the receiver-operating curve; Se, sensitivity; Sp, specificity; PPV, positive predictive value; NPV, negative predictive value; CK-18, cytokeratin-18; BMI, body mass index; NAFLD, non-alcoholic fatty liver disease; 13-HODE, 13-hydroxyl-octadecadienoic acid; LA, linoleic acid; IR, insulin resistance.

NASH Diagnostics, which incorporates CK18, adiponectin and resistin, yielded a sensitivity of 72.1%, specificity of 91.4% and overall AUROC of 0.85,[22] but a later study conducted by the same group demonstrated a lower AUROC of 0.70.[71] In that same study, the authors constructed a new diagnostic algorithm called the NASH model as part of the NAFLD Diagnostic Panel. It consists of six clinical and cell death-related parameters: type 2 diabetes mellitus, gender (male sex having a negative impact), BMI, triglyceride, cleaved CK18 and CK18 minus cleaved CK18 (Table 1). In a set of 79 NAFLD patients, the authors found an AUROC of 0.81, which was superior to the AUROC of 0.70 obtained with NASH Diagnostics. The discrepancy in results, along with the small sample sizes, calls for external validation.

The Nice Model is a scoring system incorporating three independent variables that predict an NAFLD activity score (NAS) ≥5: ALT, CK-18 and the presence of metabolic syndrome.[72] Used alone, ALT, CK-18 and the presence of metabolic syndrome gave AUROCs of 0.78, 0.74 and 0.74, respectively, for detection of definitive NASH. Combining these three variables increased the AUROC to 0.88 in the training group and 0.83 in the validation group. The reported sensitivity of the logarithmic transformation of this scoring system was 84%, with a specificity of 86% and NPV of 98%. Yet the PPV of this model is quite low (44%; Table 1).
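The Nice Model calculation can be sketched as follows, using the coefficients and the 0.14 threshold listed in Table 1. This is an illustration rather than the authors' implementation: it assumes that the three terms are added to the intercept, as in a standard logistic model, that ALT is entered in IU/L and CK-18 in U/L, and the patient values are hypothetical.

```python
from math import exp

NICE_THRESHOLD = 0.14  # threshold for the transformed score (Table 1)


def nice_model(alt, ck18, metabolic_syndrome):
    """Transformed Nice Model score between 0 and 1.

    alt: ALT, assumed to be in IU/L.
    ck18: CK-18 fragments, assumed to be in U/L.
    metabolic_syndrome: True if metabolic syndrome is present.
    """
    linear = (-5.654
              + 3.780e-2 * alt
              + 2.215e-3 * ck18
              + 1.825 * (1 if metabolic_syndrome else 0))
    return 1 / (1 + exp(-linear))


# Hypothetical patient: ALT 60 IU/L, CK-18 300 U/L, metabolic syndrome present.
score = nice_model(alt=60, ck18=300, metabolic_syndrome=True)
print(f"score {score:.3f}, above threshold: {score > NICE_THRESHOLD}")
```

Because the threshold was chosen to favour sensitivity in a morbidly obese cohort, a score above 0.14 would be read as "NASH not excluded" rather than as a confirmed diagnosis.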

OxNASH is a risk score model, which incorporates 13-hydroxyl-octadecadienoic acid (13-HODE)/linoleic acid (LA) ratio, age, BMI and aspartate aminotransferase (AST).[73] In addition to the variables that were included in other models, such as age, BMI and AST, the rationale for oxNASH in clinical diagnosis of NASH is based on the finding that oxidative stress is an important mechanism of pathogenesis in NAFLD.[74] Although this model showed an acceptable AUROC, it has not been externally validated and blood markers for oxidation products are not easy to perform in most centres.
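A short sketch of the oxNASH calculation, based on the formula and thresholds given in Table 1, is shown below. It assumes that the linear expression in the table is the exponent z of the logistic term, and that the 13-HODE/LA ratio is supplied in the units used by the original authors, which this section does not state. The input values are hypothetical.

```python
from math import exp


def oxnash(age, bmi, ast, hode_la_ratio):
    """oxNASH score on a 0-100 scale.

    age in years, bmi in kg/m2, ast in IU/L (assumed unit);
    hode_la_ratio is the 13-HODE/LA ratio in the original assay units.
    """
    z = (-10.051
         + 0.0463 * age
         + 0.147 * bmi
         + 0.0293 * ast
         + 2.658 * hode_la_ratio)
    # 100 * exp(z) / (1 + exp(z)), written in an equivalent stable form.
    return 100 / (1 + exp(-z))


# Hypothetical patient; thresholds 55 and 73 are those listed in Table 1.
score = oxnash(age=50, bmi=32, ast=45, hode_la_ratio=2.0)
print(f"oxNASH {score:.1f}; >55: {score > 55}; >73: {score > 73}")
```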

HAIR (hypertension, increased ALT and IR) was introduced in 2001, and its reported performance for NASH is relatively high.[75] However, this scoring system was derived in a highly selected group of patients with severe obesity (BMI > 35 kg/m²) and, to date, no external validation has been carried out.
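The HAIR score is simple enough to express in a few lines. The sketch below uses the point allocation from Table 1 (one point each for hypertension, ALT > 40 IU/L and IR index > 5.0) and assumes that "threshold 2" means a total of at least 2 points; the function names and the example patient are hypothetical.

```python
def hair_score(hypertension, alt, ir_index):
    """Sum of the three HAIR components (0 to 3 points)."""
    return (int(bool(hypertension))
            + int(alt > 40)        # ALT in IU/L
            + int(ir_index > 5.0)) # insulin resistance index


def hair_positive(score, threshold=2):
    """Assumes the reported threshold of 2 means 'score >= 2'."""
    return score >= threshold


# Hypothetical patient: hypertensive, ALT 55 IU/L, IR index 3.8.
score = hair_score(hypertension=True, alt=55, ir_index=3.8)
print(score, hair_positive(score))
```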

Non-invasive diagnosis of fibrosis and cirrhosis

Fibrosis and cirrhosis are the common pathway of chronic liver diseases. Fibrosis is a natural response to tissue injury. With ongoing liver injury, however, there is accumulation of fibrous tissue. Eventually, the liver architecture is disrupted, and multiple nodules are formed and separated by thick fibrous septa. This marks the development of cirrhosis. Although HCC has been reported in patients with noncirrhotic NAFLD,[76, 77] cirrhosis is still the most important risk factor for HCC.[4, 5] In addition, cirrhotic patients may develop various complications, including ascites, spontaneous bacterial peritonitis, variceal bleeding, hepatic encephalopathy and hepatorenal syndrome. Therefore, it is important to diagnose fibrosis and cirrhosis.

Biomarkers and prediction scores

Biomarkers of fibrosis are divided into two types. Class I biomarkers measure fibrogenesis and fibrolysis directly. Class II biomarkers do not measure fibrosis directly, but are clinical parameters associated with fibrosis. For example, patients with higher aminotransferases are more likely to have active disease and therefore fibrosis, but aminotransferases are not a measurement of fibrosis and the association is not absolute.[78] Moreover, it is important to note that fibrosis and cirrhosis are the results of years of disease activity. Thus, a single measurement of markers of disease activity would not correlate well with the severity of fibrosis. In fact, when NAFLD reaches the stage of cirrhosis, steatosis and necroinflammation typically regress.[79] NASH is currently believed to be the most important aetiology underlying cryptogenic cirrhosis.[80, 81]

As none of the available biomarkers is sufficiently accurate in diagnosing fibrosis as a stand-alone test, a number of prediction scores have been developed (Table 2). In general, the scores were derived using liver histology as the reference standard. Clinical parameters and biomarkers associated with different fibrosis stages were identified, and a score was constructed based on the relative importance of each factor. Some of the scores were developed and validated in NAFLD patients only, while the majority were first developed for patients with other liver diseases such as chronic hepatitis C and later adopted for NAFLD.

Table 2. Biomarkers and prediction scores of liver fibrosis in NAFLD

| Score | Components | Class I or II biomarkers | Sensitivity, specificity (F2 and F3 shown separately where both are reported) |
| --- | --- | --- | --- |
| Specific for NAFLD | | | |
| NAFLD fibrosis score[82] | Age, hyperglycaemia, BMI, platelet, albumin, AST/ALT ratio (dual cut-offs) | II | 0.77, 0.96 |
| BARD score[111] | BMI, AST/ALT ratio, diabetes | II | 0.62, 0.66 |
| FibroMeter NAFLD[112] | Glucose, AST, ferritin, platelet, ALT, body weight, age | II | 0.79, 0.96 |
| Not specific for NAFLD | | | |
| AST/ALT ratio[113] | AST, ALT | II | 0.21, 0.90 |
| APRI[114] | AST, platelets (dual cut-offs) | II | 0.65, 0.97 |
| ELF[115] | Hyaluronic acid, TIMP1, PIIINP (dual cut-offs) | I | F2: 0.80, 0.67; F3: 0.80, 0.90 |
| FIB-4[116] | Age, AST, platelet, ALT (dual cut-offs) | II | 0.74, 0.98 |
| FibroTest[117] | Total bilirubin, GGT, α2-macroglobulin, ApoA1, haptoglobin (dual cut-offs) | I and II | F2: 0.71, 0.98; F3: 0.88, 0.99 |

ALT, alanine aminotransferase; ApoA1, apolipoprotein A1; APRI, AST-to-platelet ratio index; AST, aspartate aminotransferase; BMI, body mass index; ELF, enhanced liver fibrosis panel; GGT, gamma-glutamyl transpeptidase; NAFLD, non-alcoholic fatty liver disease; PIIINP, procollagen III amino-terminal peptide; TIMP1, tissue inhibitor of matrix metalloproteinase 1.

The NAFLD fibrosis score is one of the most extensively tested prediction scores.[82] It comprises age, hyperglycaemia, BMI, platelet count, albumin and the AST/ALT ratio. The score was derived from 480 patients in the training cohort and further tested in 253 patients in the validation cohort. Using a pair of high and low cut-offs, the score had 82% PPV and 88% NPV in diagnosing F3 disease. Around 30% of patients had scores between the two cut-offs and thus indeterminate results. The latest American guideline supports the use of the NAFLD fibrosis score to risk-stratify NAFLD patients.[3] As 90% of the original cohort for the development of the NAFLD fibrosis score were Caucasians,[82] the score has been independently validated in the Chinese population.[83] The score still had a high NPV of 91% in Chinese patients, but few patients had high scores suggestive of advanced fibrosis. This may be partly because Asian patients tend to develop metabolic complications at a lower BMI. Recently, long-term data showed that the baseline NAFLD fibrosis score correlated with increased overall mortality, mostly due to cardiovascular causes.[84, 85]
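
As an illustration of how such a dual cut-off score is applied, the sketch below computes the NAFLD fibrosis score and sorts a patient into one of three bands. The coefficients and the cut-offs (−1.455 and 0.676) are not given in this review; they are quoted here as the commonly cited values from the original derivation study[82] and should be treated as assumptions to be checked against that source. The function and parameter names are hypothetical.

```python
def nafld_fibrosis_score(age_years, bmi, ifg_or_diabetes, ast, alt,
                         platelets_10e9_per_l, albumin_g_per_dl):
    """NAFLD fibrosis score; coefficients are assumed from the derivation study."""
    return (-1.675
            + 0.037 * age_years
            + 0.094 * bmi
            + 1.13 * (1 if ifg_or_diabetes else 0)  # impaired fasting glucose or diabetes
            + 0.99 * (ast / alt)                    # AST/ALT ratio
            - 0.013 * platelets_10e9_per_l
            - 0.66 * albumin_g_per_dl)


def classify_nfs(score, low_cutoff=-1.455, high_cutoff=0.676):
    """Apply the pair of cut-offs; values in between are indeterminate."""
    if score < low_cutoff:
        return "advanced fibrosis unlikely"
    if score > high_cutoff:
        return "advanced fibrosis likely"
    return "indeterminate"


# Hypothetical patient, for illustration only.
example_score = nafld_fibrosis_score(50, 30.0, True, 40, 40, 200, 4.0)
example_band = classify_nfs(example_score)
```

The middle band corresponds to the roughly 30% of patients with indeterminate results described above, who would need a second test or a biopsy.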

Although the other scores have not been as extensively studied, the FIB-4 index appears to have the highest accuracy in diagnosing fibrosis in NAFLD patients when compared with other prediction scores. The FIB-4 index comprises age, platelet count, AST and ALT. In three separate validation studies in America, Europe and Asia, the FIB-4 index had an area under the receiver-operating characteristic curve of over 0.80 in diagnosing F3–4 disease.[14, 86, 87] The components and performance of other prediction scores are shown in Table 2.
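
For readers unfamiliar with the index, a minimal sketch of the FIB-4 calculation follows. The formula is not stated in this review; it is given here in its commonly published form (age × AST divided by platelet count × √ALT), with the units listed in the comments as assumptions. The dual cut-offs mentioned in Table 2 are deliberately not reproduced, as they are not reported in the text.

```python
import math


def fib4_index(age_years, ast_u_per_l, alt_u_per_l, platelets_10e9_per_l):
    """FIB-4 = (age x AST) / (platelets x sqrt(ALT)).

    Assumed units: age in years, AST and ALT in U/L, platelets in 10^9/L.
    """
    return (age_years * ast_u_per_l) / (
        platelets_10e9_per_l * math.sqrt(alt_u_per_l)
    )


# Hypothetical patient, for illustration only.
example_fib4 = fib4_index(age_years=60, ast_u_per_l=50,
                          alt_u_per_l=36, platelets_10e9_per_l=150)
```

Because all four inputs are routine clinical data, the index can be computed from existing laboratory results without additional testing.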

It is important to note that the prediction scores were validated against liver histology. As liver histology is an imperfect reference standard with sampling variability and intraobserver and interobserver bias, there is a ceiling for the perceived accuracy in such validation studies.[88] Put simply, even if a score has 100% accuracy, assuming that the accuracy of liver biopsy is 90%, the score will still disagree with histology in 10% of cases and those results will be classified as inaccurate. In reality, however, the prediction scores are modelled against histology and therefore would suffer from a similar degree of case misclassification.
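
The ceiling effect can be illustrated with a small calculation. The sketch below assumes a binary outcome and, as a simplification that is not made in the text, that the errors of the score and of the biopsy are independent; the function name is hypothetical.

```python
def apparent_agreement(test_accuracy, reference_accuracy):
    """Probability that a test agrees with an imperfect reference standard.

    Assumes a binary outcome and independent errors: the two agree when
    both are right or when both are wrong.
    """
    both_right = test_accuracy * reference_accuracy
    both_wrong = (1 - test_accuracy) * (1 - reference_accuracy)
    return both_right + both_wrong


# A perfect score judged against a biopsy that is right 90% of the time.
perfect_score_vs_biopsy = apparent_agreement(1.0, 0.9)

# A score that is itself 90% accurate, judged against the same biopsy.
good_score_vs_biopsy = apparent_agreement(0.9, 0.9)
```

Under these assumptions a perfect score appears 90% accurate, matching the example in the paragraph above. Scores modelled against histology will tend to share its errors, so the independence assumption is optimistic in one direction and pessimistic in the other, and the figures are illustrative only.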

Physical measurements

Ultrasound, computed tomography and magnetic resonance imaging

Ultrasound is the most commonly performed imaging test in patients with liver disease. A recent meta-analysis found that ultrasound is able to diagnose NAFLD with good accuracy when hepatic steatosis exceeds 33% (84.8% sensitivity and 93.4% specificity).[89] The drawbacks include being a qualitative measure, poor ability to detect minor steatosis, and intraobserver and interobserver variability (κ = 0.54–0.92 and 0.44–1.00 respectively). Ultrasonography cannot distinguish NASH from simple steatosis.[89, 90] Cirrhosis can be diagnosed in advanced cases when the liver is small and shrunken, or when there are signs of portal hypertension, such as ascites, splenomegaly, varices and recanalisation of the umbilical vein. However, the diagnosis can be difficult in early cirrhosis when signs of portal hypertension are absent. It follows that the degree of fibrosis cannot be reliably assessed with ultrasound.[90] Furthermore, hepatomegaly and increased liver echogenicity in patients with NAFLD would make ultrasonographic features of cirrhosis inconspicuous. Various quantitative ultrasound measures, based on the greater echogenicity of the liver in NAFLD compared with other organs, have been studied. The Ultrasonographic Fatty Liver Indicator and the Hepatorenal index are two such methods,[91, 92] but they require further evaluation as only small studies have been performed.

Computed tomography is superior to ultrasound in detecting focal steatosis, but otherwise has a similar diagnostic performance (82% sensitivity, 100% specificity)[93] to ultrasound and can only assess moderate steatosis or more. Noncontrast CTs should be performed as contrast affects attenuation. Misdiagnoses can occur when there are other diffuse liver conditions, such as haemochromatosis.[94] Although CT can evaluate features such as nodular liver, ascites and varices that may suggest cirrhosis, it cannot detect early cirrhosis or the degree of fibrosis, and it also cannot distinguish NASH from simple steatosis.[95] There is also the additional drawback of radiation exposure. Thus, it is not an appropriate modality for routine diagnosis.

Conventional MRI is superior to ultrasound in detecting minor steatosis,[96] but is poor at diagnosing NASH and assessing fibrosis.[90] Many variants of MRI technique have been developed to improve its performance across the diagnostic spectrum of NAFLD. Magnetic resonance spectroscopy (MRS) is emerging as a very promising modality. It directly measures the signal from hydrogen atoms and can distinguish between their different molecular bonds. The spectra pertaining to methyl groups in triglyceride molecules can be detected, and hence MRS is able to directly diagnose and quantify hepatic triglyceride content. MRS shows good diagnostic accuracy for all grades of steatosis (AUROC 0.87–0.89).[97] MRS also has the advantage of being able to assess the entire volume of the liver. As more refined software and technique algorithms are being developed, it is challenging liver biopsy as a possible new gold standard in diagnosing steatosis.[98] It is not a feasible modality for screening, however, because of limited availability and expense.

Magnetic resonance elastography (MRE) is an MRI modality with promising results for diagnosing fibrosis. MRE is a phase-contrast-based MRI technique that produces an image of a propagating shear wave. In the Mayo Clinic protocol, a constant mechanical wave is produced from a disc-shaped driver that is attached to the patient's anterior right chest wall.[99] The data acquired allow the MRI to generate an image map of the liver that depicts the quantitative tissue elasticity. Early studies of MRE suggest that it is superior to TE in diagnosing each stage of fibrosis[100] and has good accuracy for diagnosing NASH.[101] The disadvantages of MRI techniques are that they are expensive and not widely available. Further external validation is also required.

Transient elastography

Transient elastography (Fibroscan; Echosens, Paris, France) enables non-invasive assessment of liver fibrosis using ultrasonic elastography principles. The Fibroscan probe consists of an ultrasound transducer fitted on the axis of an electrodynamic transducer. The probe is placed on the skin overlying the liver and generates a low-amplitude 50 Hz mechanical pulse, which creates a shear wave. The velocity of the shear wave is directly related to the stiffness of the liver. Low-energy 3.5 MHz ultrasound signals emitted from the probe measure the shear wave velocity, from which the elastic modulus is calculated directly. This is expressed in kilopascals and is known as LSM. TE has been validated as a measure of fibrosis across a wide spectrum of chronic liver diseases and has good overall accuracy. It has the advantages of being quick, easy to learn and well tolerated by patients, and it assesses a volume of liver around 100–200 times the size of a liver biopsy. There have been many studies examining its use in NAFLD patients, and there is ongoing debate regarding its diagnostic accuracy and feasibility, especially in obese patients.
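
The link between shear wave velocity and stiffness can be sketched numerically. The review does not state the formula; the sketch below assumes the standard relation for a homogeneous, isotropic, purely elastic and nearly incompressible medium, E = 3ρv², with a tissue density of about 1000 kg/m³. Both the relation and the density are assumptions here, and the function names are hypothetical.

```python
import math

# Assumed soft-tissue density in kg/m^3 (close to that of water).
TISSUE_DENSITY = 1000.0


def stiffness_kpa(shear_wave_speed_m_per_s, density=TISSUE_DENSITY):
    """Elastic (Young's) modulus in kPa from shear wave speed, E = 3 * rho * v^2."""
    modulus_pa = 3.0 * density * shear_wave_speed_m_per_s ** 2
    return modulus_pa / 1000.0


def shear_wave_speed(stiffness_in_kpa, density=TISSUE_DENSITY):
    """Inverse relation: shear wave speed in m/s for a given stiffness in kPa."""
    return math.sqrt(stiffness_in_kpa * 1000.0 / (3.0 * density))


# A shear wave travelling at 1.5 m/s corresponds to 6.75 kPa under these assumptions.
example_lsm = stiffness_kpa(1.5)
```

Because stiffness scales with the square of velocity, a modest increase in shear wave speed translates into a much larger increase in the reported kilopascal value.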

Meta-analysis on TE

Nine studies including a total pool of 1047 NAFLD patients from different ethnic backgrounds were identified as suitable for meta-analysis (Table S4). Data on the M probe included 854 NAFLD patients. Groups of data were formed according to whether the M probe or the XL probe was used, and then further subgrouped according to the fibrosis stage that was being compared. Eight studies had suitable data for the M probe, whereas one study had suitable data only for the XL probe. There were seven, eight and six TE studies that reported its performance compared with liver biopsy for F ≥ 2, F ≥ 3 and F4 respectively (Figure 3). For F ≥ 2, the LSM cut-off ranged from 6.7 to 7.7 kPa, with 67–94% sensitivity, 61–84% specificity and AUROC 0.79–0.87. For F ≥ 3, the LSM cut-off was 8.0–10.4 kPa, with 65–100% sensitivity, 75–97% specificity and AUROC 0.76–0.98. For F4, the LSM cut-off was 10.3–17.5 kPa, with 78–100% sensitivity, 82–98% specificity and AUROC 0.91–0.99. The overall pooled estimates of the diagnostic accuracy of TE are: F ≥ 2 (79% sensitivity, 75% specificity); F ≥ 3 (85% sensitivity, 85% specificity) and F4 (92% sensitivity, 92% specificity; Table S3). Figure S4 shows the HSROC plots of TE.
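
To show how pooled sensitivity and specificity translate into clinical terms, the sketch below converts them into likelihood ratios and then into a post-test probability using the standard odds form of Bayes' theorem. The pooled figures are those quoted above; the 10% pre-test probability of cirrhosis is a hypothetical value chosen for illustration and is not taken from this review.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a binary test."""
    positive_lr = sensitivity / (1 - specificity)
    negative_lr = (1 - sensitivity) / specificity
    return positive_lr, negative_lr


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Pooled TE estimates quoted in the text (sensitivity, specificity).
pooled = {"F>=2": (0.79, 0.75), "F>=3": (0.85, 0.85), "F4": (0.92, 0.92)}

# Hypothetical pre-test probability of cirrhosis, for illustration only.
assumed_pre_test = 0.10
lr_pos_f4, lr_neg_f4 = likelihood_ratios(*pooled["F4"])
prob_after_positive = post_test_probability(assumed_pre_test, lr_pos_f4)
prob_after_negative = post_test_probability(assumed_pre_test, lr_neg_f4)
```

With a low assumed pre-test probability, a negative result lowers the probability of cirrhosis far more decisively than a positive result raises it, which is consistent with using these tests mainly to exclude advanced disease.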

Figure 3.

Forest plot from meta-analysis of sensitivities and specificities for TE to diagnose different fibrosis stages using a random-effects model.

Discussion on TE

The overall results suggest that TE is excellent in diagnosing F ≥ 3 (85% sensitivity, 85% specificity) and F4 (92% sensitivity, 92% specificity) and has moderate accuracy for F ≥ 2 (79% sensitivity, 75% specificity). Our analysis of 854 NAFLD patients in eight studies is the largest and most up to date so far. The quality of data in the included studies was excellent, with all studies obtaining at least 12/13 on the modified QUADAS (Figure S1), and hence no subgroup analysis between high- and low-quality studies was performed. In addition, analysis of whether BMI and ALT were factors in heterogeneity could not be performed because of the wide range of these factors in each of the included studies.

Obesity is the main reason for failed LSM, and the problem can be largely overcome using the XL probe.[102, 103] The largest study, of 193 patients, reported the ability to obtain 10 measurements in 93% of patients with BMI >30 kg/m², with AUROCs of 0.80, 0.85 and 0.91 for F ≥ 2, F ≥ 3 and F4 respectively, although lower LSM cut-offs need to be used.[12] Pooled statistical analysis could not be performed for the XL probe owing to an insufficient number of studies.[104, 105] All TE studies had similar baseline characteristics and used similar cut-offs, and no heterogeneity factors were identified. The TE studies had high-quality data, and subgroup and post hoc sensitivity analyses did not show any effect on the overall summary estimates.

Acoustic radiation force impulse

Acoustic radiation force impulse (ARFI) imaging is a form of tissue elastography that is integrated into a conventional high-end ultrasound machine (Siemens S2000, Siemens, Erlangen, Germany). A region of interest (ROI) in the liver is targeted using short-duration acoustic pulses with a fixed frequency of 2.67 MHz. These generate shear waves that propagate away from the region of excitation and are tracked using an ultrasonic, correlation-based method. The shear wave speed of the tissue within the ROI is measured and can be used to calculate the elasticity of the liver. Like TE, the result is expressed in kilopascals. ARFI has the advantage of being a feature of an existing ultrasonography machine. This allows structural abnormalities, steatosis and fibrosis to be assessed in a single sitting.

Summary estimates for ARFI were not possible in this review because insufficient data were available. Only two studies met our selection criteria,[15, 30] although a further two articles[106, 107] could have been included if attempts to contact the study authors had been successful. The AUROCs reported in our candidate studies[15, 30, 106, 107] ranged from 0.74 to 0.97 for the diagnosis of F ≥ 3 in NAFLD. In a recent meta-analysis on the performance of ARFI across a heterogeneous range of liver diseases, the mean AUROCs were 0.87, 0.91 and 0.93 for the diagnosis of F ≥ 2, F ≥ 3 and F4 respectively.[108] ARFI appears to be a promising modality for NAFLD, but availability of this feature on ultrasound devices is currently limited.

Liver scintigraphy

Technetium-99m-2-methoxy-isobutyl-isonitrile (Tc99m-MIBI) is a lipophilic cationic agent that was initially designed for myocardial perfusion imaging, exploiting the fact that Tc99m-MIBI uptake and retention are related to mitochondrial function. In NASH, the precise mechanism is unclear, but it has been observed that the liver:heart and liver:spleen uptake ratios of Tc99m-MIBI are decreased in NASH compared with simple steatosis.[109, 110] Further studies are needed.

Conclusions

Non-alcoholic fatty liver disease affects 15–40% of the general population. Accurate identification of patients with active or advanced disease is one of the most urgent clinical needs. At present, serum tests and physical measurements such as TE come close to being highly accurate non-invasive tests for excluding advanced fibrosis and cirrhosis in NAFLD patients. CK18 has moderate accuracy in diagnosing NASH, while other biomarkers have not been extensively studied. Further studies are needed to explore the optimal test combinations and the role of these tests in prognostication and treatment monitoring.

Authorship

Guarantor of the article: Dr Vincent Wong.

Author contributions: Dr Raymond Kwok performed the research, collected and analysed the data and wrote the manuscript. Dr Yee-Kit Tse performed the research, analysed the data and wrote the manuscript. Dr Yeonjung Ha performed the research, analysed the data and wrote the manuscript. Dr Grace Wong, Dr Alice Lee, Dr Meng Ngu and Prof Henry Chan gave critical comments to the manuscript. Dr Vincent Wong conceived and designed the study, assisted with data interpretation and wrote the manuscript. All authors approved the final version of the manuscript.

Acknowledgements

Declaration of personal interests: Dr Grace Wong has received paid lecture fees from Echosens. Prof Henry Chan has been a consultant for Echosens.

Declaration of funding interests: This study is supported, in part, by the General Research Fund of the Research Grant Council, the Government of Hong Kong SAR (Project reference 476512).
