Systematic review of performance of non-invasive biomarkers in the evaluation of non-alcoholic fatty liver disease

Authors


Correspondence
Dr Michael H. Miller, Biomedical Research Institute, University of Dundee, Ninewells Hospital, Dundee DD1 9SY, UK. Tel: +01382 632307
Fax: +01382 425504
e-mail: m.miller@dundee.ac.uk

Abstract

This systematic review evaluates the many studies carried out to discover and evaluate non-invasive markers of non-alcoholic fatty liver disease (NAFLD). Many different strategies and methods have been used in this task, from the discovery of new markers by global ‘shotgun’ studies to hypothesis-driven approaches, to the development of algorithm tests based on routinely available clinical and biochemical parameters. We examined the different approaches and summarised the findings to give an overview of the field of non-invasive markers in NAFLD, encompassing markers of steatosis, necro-inflammation and fibrosis. The body of literature surrounding this topic is complex and varied, encompassing not only different methodologies but also different patient characteristics, different disease definitions and different end points. This reflects the heterogeneity of NAFLD, but it introduces considerable difficulty when trying to draw conclusions across studies. We have divided this review into three main chapters based on the characteristics of the studies. The Genomics/Proteomics chapter reviews studies using a non-hypothesis-driven approach to biomarker discovery. Thereafter, we evaluate studies of association – studies that target specific markers, comparing levels between disease and control groups. Finally, we examine the algorithm tests – mathematical systems developed on the basis of previously described markers and assessed, usually, by receiver operating characteristic curve analysis. While radiological examination and investigations offer important diagnostic information, such studies are not discussed in this review – the body of literature surrounding blood and anthropometric markers is complex and varied, demanding close attention.

Non-alcoholic fatty liver disease (NAFLD) is the hepatic manifestation of the metabolic syndrome and as such is commonly associated with type 2 diabetes mellitus (T2DM), hypertension and hypercholesterolaemia (1–3). It was first described as a distinct clinical entity in 1980 (4) and since then interest and research in this group of disorders have exploded. NAFLD represents a histological spectrum of disease, ranging from simple steatosis (SS) to non-alcoholic steatohepatitis (NASH). NASH can progress to fibrosis, cirrhosis and, in some instances, hepatocellular carcinoma (HCC) (5). The pathogenesis of NAFLD remains unclear, although improper handling and transport of fat, insulin resistance, inflammation and oxidative stress are all implicated in the development and progression of these conditions. The exact interplay between these complex pathways remains unknown. Additionally, the relationship between SS and NASH is not well defined. It is believed that the presence of SS is a sine qua non for the development of NASH; however, most NAFLD patients with SS are not thought to progress to NASH. Why some patients with SS should progress to NASH is not yet fully understood.

Non-alcoholic fatty liver disease is an extremely common condition, affecting up to 1/3 of the US population (6). There are close links between the incidence of NAFLD and that of obesity; thus, NAFLD is more common in the western world. One of the main challenges facing epidemiologists is the lack of clinical symptoms; if reported, fatigue and vague right upper quadrant pain are the most common symptoms. However, most cases of NAFLD are discovered incidentally. Thus, the majority of NAFLD cases are ‘silent’.

Liver biopsy remains the gold standard investigative tool for NAFLD. There are several limitations to liver biopsy. It is an invasive, costly procedure with associated morbidity and mortality concerns (7, 8). Sampling error and interobserver variability also hinder the performance of this test (9).

Discovery and validation of biomarkers for NAFLD have several potential benefits, including increased ease of diagnosis, reduced risk, reduced cost, as well as insights into disease mechanisms and pathogenesis. Additionally, markers that can accurately diagnose and characterise NAFLD will help focus service provision for what is already a very common condition. The European Association for the Study of the Liver has produced a position paper commenting that ‘there is a significant need for the non-invasive quantification of fibrosis in order to facilitate screening of the large number of patients at risk’ (10). This highlights the growing and urgent need for such non-invasive markers.

What is a biomarker?

Biomarkers are biological markers of disease presence or progression. There are many examples of biomarkers in everyday clinical use, including prostate-specific antigen (11) and troponin (12). An ideal biomarker would detect the presence of disease with high accuracy, be specific for the disease in question, be measurable non-invasively and be cost effective, both in terms of the assay and in the impact on the disease.

The Early Detection Research Network, a working group set up by the National Cancer Institute, developed a 5-phase plan for the development of screening biomarkers (13). Phase 1 concerns the preclinical discovery phase – numerous technologies and strategies are used in the discovery of biomarkers. Phase 2 involves the development of a clinical assay and the validation of the discovered biomarkers, before retrospective longitudinal validation (Phase 3). Phase 4 of biomarker development is prospective screening followed by Phase 5, cancer control. Although specifically derived for the development of cancer biomarkers, Phases 1–4 are applicable to any disease state and provide a robust, standardised framework for biomarker studies.

Methods

We searched three mainstream public-domain databases, namely PubMed, Scopus and Web of Knowledge. Four categories were devised: (i) conditions (SS, NASH, NAFLD, FLD), (ii) markers [(invasive/non-invasive) serum, plasma, liver, tissue], (iii) method (micro-array, proteomics, global proteomics, shotgun proteomics) and (iv) subjects (human). Each possible combination was searched in all three databases. A total of 11 891 abstracts were retrieved. Duplicates were removed along with abstracts concerning non-related conditions. Non-English publications, abstract-only publications and non-human studies were excluded from the final review. Paediatric studies were excluded owing to the potential alternate pathogenesis associated with paediatric NAFLD. Studies relating to radiological assessment of NAFLD, including transient elastography, were excluded. A total of 50 papers fulfilled all inclusion criteria. For the purposes of this review, the studies were further divided into three ‘chapters’ – those dealing with non-hypothesis-driven, shotgun studies (proteomic and genomic studies), those dealing with targeted markers and those dealing with algorithm tests. We review each chapter in turn.
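As an illustration of what ‘each possible combination’ implies, the sketch below enumerates the search terms listed above. The AND-joined query string, the omission of the invasive/non-invasive qualifier and the function names are assumptions made for illustration; the exact query syntax submitted to each database is not stated here.

```python
from itertools import product

# Category terms as listed in the Methods. How the "(invasive/non-invasive)"
# qualifier was handled is not stated, so it is left out here (assumption).
CONDITIONS = ["SS", "NASH", "NAFLD", "FLD"]
MARKERS = ["serum", "plasma", "liver", "tissue"]
METHODS = ["micro-array", "proteomics", "global proteomics", "shotgun proteomics"]
SUBJECTS = ["human"]
DATABASES = ["PubMed", "Scopus", "Web of Knowledge"]


def build_queries():
    """Return one query string per combination of one term from each category."""
    combos = product(CONDITIONS, MARKERS, METHODS, SUBJECTS)
    # Joining with AND is a hypothetical syntax, not the authors' documented one.
    return [" AND ".join(terms) for terms in combos]


def build_search_plan():
    """Pair every query with every database."""
    return [(db, query) for db in DATABASES for query in build_queries()]


queries = build_queries()
plan = build_search_plan()
print(len(queries), "queries per database;", len(plan), "searches in total")
```

Under these assumptions the four categories give 4 × 4 × 4 × 1 = 64 term combinations, or 192 individual searches across the three databases.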

Markers derived from non-targeted approaches

Non-hypothesis-driven biomarker discovery studies have several advantages over more focused experiments. By their very nature, they do not require a ‘target’, i.e. they are true discovery studies. Additionally, they avoid elements of bias inasmuch as they offer the potential to look at all markers from many different pathways. There are several limitations, however. These studies require large, powerful and often cutting-edge technologies. They can be expensive and, paradoxically, can ‘miss’ markers owing to the limitations in sample preparation and processing. There are two main approaches to this type of study – gene expression studies and proteomics studies.

Gene expression studies

Hepatic gene expression studies provide an insight into possible mechanisms of pathogenesis as well as potential biomarkers of disease. Because these studies use liver tissue, the differentially expressed features they discover are not in themselves non-invasive markers. However, gene expression data may identify candidate markers measurable in serum, making this a potentially powerful tool in the hunt for non-invasive markers. The experiments can be tailored, focusing on a specific target gene set. Younossi et al. (14) used a customised set of cDNA micro-arrays, containing 5220 genes, to investigate differential hepatic gene expression in patients with NASH. They found a total of 34 differentially expressed genes when comparing NASH with the control, 19 of which were shown not to be attributable to obesity. Four of these gene changes were confirmed by reverse transcription polymerase chain reaction (rtPCR). Sreekumar et al. (15) compared the gene expression of patients with NASH-related cirrhosis with that of patients with other causes of cirrhosis. They found 16 genes showing differential expression in the NASH cirrhosis group. Several genes involved in the anti-oxidant response were found to be underexpressed, along with genes involved in fatty acid and glucose metabolism. This study surveyed over 6000 genes.

Greco et al. (16) studied a larger selection of genes, over 17 500, as well as a different classification of NAFLD. They examined patients who had either extreme steatosis (liver fat>60%), or patients with almost no liver fat (<6%). No patients had evidence of inflammation or fibrosis; all were obese undergoing bariatric surgery. A total of 1060 genes were found to be significantly associated with liver fat content, of which 419 were positively correlated. However, only a small number of genes had fold changes greater than 2 (14 genes). Genes noted to be overexpressed included those involved in carbohydrate metabolism, lipid metabolism, insulin signalling and inflammation.
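The fold-change filter mentioned above can be sketched as follows. The gene names and expression values are hypothetical, and applying the threshold symmetrically (to both over- and underexpression, on a log2 scale) is an assumption rather than the documented procedure of Greco et al.

```python
import math


def log2_fold_change(mean_high_fat, mean_low_fat):
    """log2 ratio of mean expression in the high-fat group to the low-fat group."""
    return math.log2(mean_high_fat / mean_low_fat)


def filter_by_fold_change(expression, threshold=2.0):
    """Keep genes whose fold change exceeds `threshold` in either direction.

    `expression` maps a gene name to (mean in high-fat group, mean in low-fat group).
    """
    cutoff = math.log2(threshold)
    selected = {}
    for gene, (high_fat, low_fat) in expression.items():
        lfc = log2_fold_change(high_fat, low_fat)
        if abs(lfc) > cutoff:
            selected[gene] = lfc
    return selected


# Hypothetical mean expression values, for illustration only.
expression = {
    "GENE_A": (8.0, 2.0),  # 4-fold higher in high-fat livers
    "GENE_B": (3.0, 2.5),  # 1.2-fold, below the threshold
    "GENE_C": (1.0, 4.0),  # 4-fold lower in high-fat livers
    "GENE_D": (5.0, 5.0),  # unchanged
}
selected = filter_by_fold_change(expression)
print(sorted(selected))
```

A filter of this kind is independent of statistical significance, which is why a study can report many significantly associated genes but only a handful with large fold changes.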

Chiappini et al. (17) also examined hepatic gene expression in steatosis, analysing over 22 000 genes. All samples were from organ donors. One hundred and ten gene expression profiles were found to be differentially expressed and included genes involved in glucose and fatty acid metabolism, insulin signalling and nuclear transcription regulation. However, in the context of NAFLD markers, this study should be interpreted with caution as no record of alcohol intake was made, a point that the authors concede in their discussion.

A slightly different approach was taken by Yoneda et al. (18). They performed a micro-array analysis of over 54 000 genes. The results were analysed by a specific software package to look for groups of related genes. They reported on the over- or underexpression of these gene sets in NASH compared with non-NASH and found 27 gene sets significantly higher in the NASH cohort. These sets included reactive oxygen species-scavenging genes, cell adhesion genes and various transcription factor gene sets.

All of the studies included within this section give important insight into the possible pathogenesis of this group of disorders. All of the mentioned studies made efforts to confirm the findings of the gene expression experiments, in all cases by rtPCR on a selected group of candidate genes (see Table 1). While this is an essential confirmation of the gene expression experiment, it does not translate to confirming the product of the gene as a potential marker of disease.

Table 1.   Studies of gene expression in non-alcoholic fatty liver disease

Reference | Year | Description | Number of patients | Number of potential markers* | Validated (method)
Yoneda et al. (18) | 2008 | GSEA | SS=9; NASH=9 | 52 gene sets | 6 (rtPCR)
Younossi et al. (30) | 2005 | MA | SS=12; NASH=27; SS+non-specific inflammation=52 | 24 | 7 (rtPCR)
Younossi et al. (14) | 2005 | MA | SS=12; NASH=29 | 34 | 4 (rtPCR)
Chiappini et al. (17) | 2006 | MA | Steatosis=9† | 110 | 3 (rtPCR)
Greco et al. (16) | 2008 | MA | Low fat SS=5; High fat SS=5 | 1060 | 6 (rtPCR)
Sreekumar et al. (15) | 2003 | MA | NASH cirrhosis=6 | 14 | 2 (rtPCR)
Westerbacka et al. (20) | 2007 | rtPCR | 24 (eight normal, 16 NAFLD) | 21 |

*Number of significantly differentially expressed gene changes (up/down).
†Steatotic cohort did not have alcohol history recorded.
GSEA, gene set enrichment analysis; MA, micro-array; NAFLD, non-alcoholic fatty liver disease; NASH, non-alcoholic steatohepatitis; rtPCR, reverse transcription-polymerase chain reaction; SS, simple steatosis.

Although not strictly a global study, Bragoszewski et al. (19) studied gene expression, by rtPCR, on a selected group of genes encoding mitochondrial proteins. They used liver tissue from patients with SS and NASH, and found 6 of the 16 selected genes to be differentially expressed between the two groups. Unfortunately, the degree of variability in gene expression between patients was high, such that none of the genes analysed could be used as markers of disease progression. Nonetheless, the differential gene expression between the cohorts highlighted potential markers of disease presence.

In another more targeted study, Westerbacka et al. (20) selected a group of hypothesis-driven genes, mainly focused on fatty acid binding and partitioning as well as inflammation within the liver. They measured the expression levels of these genes in liver tissue of patients with or without steatosis (patients with NASH were excluded). Several genes were significantly overexpressed in fatty liver including fatty acid binding protein (FABP) 4 and 5, monocyte chemo-attractant protein-1 (MCP1) and PPARγ2.

Genome-wide association studies (GWAS) provide a method for evaluating a large number of single nucleotide polymorphisms (SNPs) within the same experiment. Several GWAS have been carried out on NAFLD populations, most recently by Chalasani et al. (21). In this unique pilot study, the authors performed the GWAS analysis on a cohort of NAFLD patients characterised by histological criteria. Several SNPs were associated with the different histological parameters, including an SNP in farnesyl diphosphate farnesyl transferase 1 that was associated with the total NAFLD activity score. In an earlier GWAS, Romeo et al. (22) described an SNP in PNPLA3 strongly associated with both hepatic fat content and hepatic inflammation. Supporting this finding, Speliotes et al. (23) have recently published data describing an odds ratio of 3.26 (confidence interval 2.11–7.21) for a PNPLA3 SNP relating to histological scoring of NAFLD. This description was especially interesting as there was no association between the SNP and the metabolic syndrome, suggesting that the SNP may be directly linked to histological NAFLD and not the constellation of pathology that is the metabolic syndrome.
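For readers less familiar with the odds ratio as a measure of association, the following minimal sketch computes an odds ratio and an approximate 95% confidence interval from a 2×2 table using the log (Woolf) method. The counts are hypothetical, and this is not the analysis used in the studies cited above, which would typically rely on regression models with covariate adjustment.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-method) confidence interval for a 2x2 table.

    a: cases carrying the variant      b: controls carrying the variant
    c: cases without the variant       d: controls without the variant
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR); requires all four cells to be non-zero.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
odds_ratio, lower, upper = odds_ratio_ci(a=40, b=20, c=60, d=80)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1, as in this hypothetical example, indicates an association that is unlikely to be due to chance alone at the chosen confidence level.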

All of the gene expression studies highlight many potential mechanisms of disease, each adding an extra piece to the complicated jigsaw that is NAFLD pathogenesis. Furthermore, these studies also reveal several promising non-invasive markers of disease. They all, however, share several limitations, mainly concerning sample size. It is hardly surprising that sample size was relatively small in these types of studies given the large expense involved.

Table 1 summarises the main findings from the gene expression studies reviewed.

Proteomic studies

Proteomic techniques look specifically at protein expression patterns and profiles. There are several different approaches to proteomic studies, some providing a pattern recognition and subtraction approach [surface-enhanced laser desorption ionisation-time of flight (SELDI-TOF)], while others provide an identification-only endpoint [two-dimensional (2D) gel] (24). Other more sophisticated methods provide quantification data, both relative and absolute [ICAT, isobaric tags for relative and absolute quantification (iTRAQ) and Label Free] (25). It is also possible to use complex proteomic methods to validate protein expression data (multiple reaction monitoring), although this can be complicated, time consuming and expensive (26).

Five studies are included in this review, each adopting a different approach, either in terms of the proteomic platform used, the material used (liver tissue vs serum) or the cohorts of patients examined.

Surface-enhanced laser desorption ionisation-time of flight is a useful proteomic profiling tool. Protein fingerprint profiles are generated for a sample set using this technique. This pattern comparison/subtraction technique is used to highlight differences in the protein peak profile between disease and control groups. One significant limitation of this technology is the limited ability to identify the protein peaks as individual proteins. Whereas other proteomic platforms provide protein identification (iTRAQ, Label-Free systems), SELDI-TOF uses peak patterns based on molecular weight. Several clinical studies have utilised this technology for the discovery of biomarker profiles, including studies of lung (27), colon (28) and gastric cancers (29).

Younossi et al. (30) took a novel approach, performing micro-array and proteomic experiments on the same patient groups. From their micro-array work, performed on liver tissue, 22 genes were shown to be differentially expressed across the various groups analysed. Thereafter, SELDI-TOF analysis of paired serum samples was performed, revealing 12 significantly different protein peaks across the groups. This is the first such study to take a dual genomic/proteomic approach and shows the potential for this combination of studies. However, the patient cohorts were generalised – the NASH cohort included patients with all and any grade of fibrosis. Additionally, the authors did not comment on the potential overlap between the two sets of results, which is the obvious advantage of their approach; thus, we do not know if any of the proteomic markers were also observed in the micro-array data or vice versa.

Trak-Smayra et al. (31) also used SELDI-TOF to analyse a cohort of patients undergoing bariatric surgery. Eighty such patients were recruited and subdivided based on liver histology. Serum was collected at the time of surgery and 6 months post-operatively in a subset of the group. Three protein peaks were identified whose intensity increased as the severity of liver disease increased (SS to NASH). These peaks returned to normal when measured 6 months post-operatively. Moreover, these peaks were identified as α- and β-haemoglobin subunits. Not only does this work suggest markers of disease severity in this subpopulation of NAFLD patients, but it also reveals potential markers of prognosis. These assertions require independent, large-scale verification; nonetheless, they highlight the potential of proteomic studies.

Charlton et al. (32) used a more recently developed proteomic tool – iTRAQ. This is a method of chemically labelling peptides with a tag of known molecular weight. The major advantage of iTRAQ over more traditional methods is the ability not only to identify parent proteins from peptide residues but also to provide quantification data. The relative abundance of each identified peptide, and hence protein, can be calculated based on the fold change between tags in different groups. Using this technique, Charlton and colleagues identified two markers of particular interest. In their study, lumican was found to be increased in a progressive manner across the spectrum of NAFLD, whereas FABP1 was found to be highest in SS and decreased across the spectrum of disease. These findings are particularly noteworthy not only as markers of disease but also as markers of disease progression.

Whereas iTRAQ technology requires the addition of a chemical tag, Label-Free proteomics provides a platform for identification and quantification of protein expression without the need for additional tagging. Bell et al. (33) used a Label-Free strategy to identify a group of potential biomarkers in serum. They further refined their findings, producing a 6-protein panel (fibrinogen β chain, retinol binding protein 4, serum amyloid P component, lumican, transgelin 2 and CD5-like antigen) and a 3-protein panel of markers (complement component 7, transgelin 2 and insulin-like growth factor acid labile subunit). Both panels performed well in diagnosing the different stages of NAFLD, with area under the receiver operating characteristic curve (AUROC) figures ranging from 0.83 to 0.91. Additionally, they compared the performance of the protein panels with alanine aminotransferase (ALT), finding ALT to be significantly inferior, especially when diagnosing NASH and NASH with advanced fibrosis. Gray et al. (34) performed 2D gel electrophoresis followed by MALDI-TOF MS to identify five potential markers of NAFLD-associated HCC and cirrhosis. Four of the markers were apolipoproteins. The fifth marker was CD5-like (CD5L) antigen, which was validated in a larger cohort by ELISA. CD5L antigen was dramatically increased in the serum of cirrhotic patients compared with precirrhotic individuals, performing reasonably well with an AUROC of 0.719. Patients with both alcoholic liver disease (ALD) and NAFLD were included in this analysis.
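Because AUROC figures are quoted throughout the remainder of this review, a minimal sketch of how such a figure can be obtained for a single marker may be helpful. It uses the rank-based interpretation of the AUROC: the probability that a randomly chosen diseased subject has a higher marker value than a randomly chosen control, with ties counted as one half. The marker values are hypothetical and the function name is our own; the studies cited would have used standard statistical software.

```python
def auroc(case_values, control_values):
    """AUROC as the proportion of case/control pairs ranked correctly.

    Equivalent to the Mann-Whitney U statistic divided by the number of pairs.
    A value of 0.5 means no discrimination; 1.0 means perfect separation.
    """
    correctly_ranked = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                correctly_ranked += 1.0
            elif case == control:
                correctly_ranked += 0.5  # ties contribute one half
    return correctly_ranked / (len(case_values) * len(control_values))


# Hypothetical marker values (arbitrary units), for illustration only.
cases = [55, 72, 48, 90, 61]
controls = [30, 52, 41, 48, 35]
print(round(auroc(cases, controls), 2))
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for cohorts of the size discussed here; a rank-based implementation would be preferable for very large data sets.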

The array of proteomic studies in the NAFLD field reflects the rapidly improving technologies available for this type of discovery strategy. Quantitative proteomic studies (i.e. iTRAQ, Label Free) now provide a means of identifying proteins while providing fold-change data. The different proteomic platforms support the use of either liver tissue or blood as a starting material. In addition to providing discovery data, several studies have gone on to validate their findings, usually by ELISA – validation is a critical step in the development of biomarkers. As well as providing discovery and validation information, proteomics studies also generate vast amounts of raw data, which can be mined to attempt to understand the pathogenesis of NAFLD.

Targeted studies – ‘association’ studies

Many studies have taken a targeted, hypothesis-driven approach to the discovery and validation of biomarkers. Often these studies measure markers in isolation and report on their relative performance by comparing the marker(s) between a disease cohort and a control cohort. We have reviewed these studies and reported on them depending on the type of marker or pathway they implicate. To avoid repetition, where possible, we have included markers under only one heading (supporting information Table S1).

Routinely available parameters

Several studies have looked at the usefulness of routinely available biochemical markers, either alone or in combination with other markers or indeed clinical parameters.

Alanine aminotransferase (ALT) is a commonly measured indicator of hepatocellular damage and disease. The role of ALT in the diagnosis and prognosis of NAFLD is unclear. In many instances, NAFLD is considered and ultimately diagnosed owing to an isolated and mildly elevated ALT in an otherwise asymptomatic individual. One of the earliest studies assessing ALT performance as a biomarker of NAFLD evaluated 441 Japanese patients with ultrasound-diagnosed NAFLD (35). ALT achieved an AUROC of 0.69 in this cohort. Body mass index (BMI) performed similarly, with an AUROC of 0.63. Both parameters show suboptimal performance. Moreover, Mofrad et al. (36) showed that the entire histological spectrum of NAFLD can be seen in patients with normal ALT values. Unexplained hepatomegaly and evaluation for potential living organ donation were the main indications for biopsy in the cohort with normal ALT.

A Korean study (37) of over 5000 apparently healthy male volunteers revealed that elevated ALT, even within the reference range, was an independent predictor of incident NAFLD. Elevated ALT has also been associated with features of the metabolic syndrome, including T2DM (38).

From a population of patients with chronic hepatitis, Tanaka et al. (39) identified a group of patients who had NASH and a group in whom no diagnosis was found. They measured several routine clinical and biochemical markers and found serum ferritin to be significantly higher in the NASH group. Serum ALT was found to be similar in both groups, an unsurprising observation as the original population suffered from chronic hepatitis. BMI was the only clinical parameter shown to be significantly different between the groups.

γ-Glutamyltransferase (GGT) has been associated with features of the metabolic syndrome, including T2DM and obesity. In a large Italian study, Marchesini et al. (40) described the association of the metabolic syndrome, and in particular insulin resistance, with elevated liver enzymes, including GGT. Tahan et al. (41) assessed the performance of GGT in predicting the presence of advanced fibrosis in a small cohort of 50 patients with biopsy-proven NAFLD. GGT performed well with an AUROC of 0.74.

Aspartate transaminase (AST) is considered less specific than ALT in detecting hepatocellular damage and as such has received less attention as a standalone biomarker of NAFLD. Nonetheless, AST forms an integral part of many of the algorithm tests discussed later.

A study (42) of 18 NASH patients (any stage/grade) found significantly higher levels of serum ferritin, C-reactive protein (CRP) and α2 macroglobulin when compared with the control. Yoneda et al. (43) measured serum ferritin in 100 patients with NAFLD (29 SS and 71 NASH), finding significantly higher levels in the NASH cohort. They also measured high-sensitivity CRP (hsCRP), showing significantly higher levels in the NASH cohort. Additionally, they demonstrated significantly higher levels of hsCRP in patients with advanced fibrosis compared with those with mild fibrosis.

Targher et al. (44) measured serum 25-hydroxyvitamin D3 concentrations in a well-described cohort of NAFLD patients. Not only was the serum concentration significantly lower in NAFLD patients compared with the control, but the levels were also inversely related to the severity of steatosis, necro-inflammation and fibrosis.

Adipokines

The association between obesity and NAFLD is well described. Adipose tissue is known to secrete several bioactive proteins, considered to be implicated in the pathogenesis of NAFLD. Adiponectin is the most abundant protein secreted by adipose tissue and is negatively correlated with adiposity, insulin resistance, T2DM and the metabolic syndrome (45). Several studies have shown significantly lower levels of serum adiponectin in NASH compared with SS and control (46, 47). Hui et al. (48) and Targher et al. (49) confirmed this observation while noting a negative correlation between adiponectin level and the severity of NASH, suggesting that adiponectin is not only a useful marker to distinguish SS from NASH but also a useful prognostic marker.

Leptin is a hormone secreted by adipose tissue under neurological control and is important in the regulation of food intake (45). The validity of serum leptin as a potential non-invasive marker is yet to be established. Lemoine et al. (46) measured serum leptin levels in 74 patients with biopsy-proven disease (57 NASH, 17 SS) and found significantly higher levels in the NAFLD group compared with the control. Le et al. (50) studied a cohort of patients undergoing bariatric surgery. Twenty-one patients had NASH and 10 had SS. They found no difference in serum leptin levels between disease and control. Interestingly, Haukeland et al. (51) found serum leptin levels highest in the SS group compared with both NASH and control, which if validated may be useful in an algorithm-based biomarker panel.

The role of resistin is even less clear. Unlike adiponectin and leptin, it is produced mainly by peripheral blood mononuclear cells. Although the role of resistin is well described in mice, where it is known to mediate hepatic and skeletal insulin resistance, more human studies are needed to better understand its function (45). Jarrar et al. (47) found no difference in serum resistin between NAFLD and control. Zou et al. (52) found no significant difference in serum resistin when measured in obese children with radiologically and biochemically diagnosed NAFLD. Charlton et al. (53) measured resistin in patients defined as having mild or advanced NAFLD, mild being SS and NASH with fibrosis score 0–2, and advanced being NASH with fibrosis score 3–4. They found significantly higher levels of resistin in the advanced group. They also noted higher levels (P=0.06) of leptin in the advanced stage group, but no difference in adiponectin.

It is clear that adipokines play a role in the development and pathogenesis of NAFLD; however, at present there is no clear evidence that these hormones are useful markers of disease.

Inflammatory markers

As well as the acute phase proteins discussed previously (ferritin, CRP, ALT), other inflammatory markers have been studied in NAFLD, both mechanistically and as potential non-invasive markers. Tumour necrosis factor α (TNF-α) is implicated in the pathogenesis of NASH. Crespo et al. (54) measured mRNA levels for TNF-α in liver tissue from patients with and without NASH, finding significantly higher levels in the NASH group. Several other studies have found significantly higher levels of soluble serum TNF-α in NASH patients compared with SS and controls (51, 55, 56). Furthermore, serum levels of both sTNF-α1 (56) and sTNF-α2 (57) receptors have been found to be significantly higher in NASH compared with SS and controls.

Interleukin 6 (IL6) has been shown to be significantly higher in NAFLD compared with control (51, 56); however, conflicting results are seen when comparing IL6 levels between SS and NASH. Haukeland et al. (51) found no significant difference in IL6 between SS and NASH, whereas Abiru et al. (56) concluded that IL6 was able to distinguish SS from NASH. Thus, while IL6 may be a useful diagnostic marker, its potential role as a staging marker is less clear.

Additional inflammatory markers investigated include CC-chemokine ligand 2/monocyte chemo-attractant protein (CCL2/MCP) (51) and intercellular adhesion molecule-1 (ICAM1) (58). CCL2/MCP levels were significantly higher in the NASH group compared with the SS and controls. ICAM1 performed similarly. It was also positively correlated with the stage of both inflammation and fibrosis. Interestingly, ICAM1 levels were negatively correlated with the degree of steatosis, although this did not reach statistical significance.

Lipid peroxidation and oxidative stress markers

Oxidative stress has been established as a key factor in the development of NASH, although the exact interactions of this complex process with disease development and progression remain unclear (59). Several studies have successfully measured markers of both increased oxidative stress and reduced anti-oxidant capacity in NAFLD patients. Chalasani et al. (60) measured systemic lipid peroxidation in subjects with NASH. Oxidised LDL and thiobarbituric acid-reacting substances (TBARS) were compared between patients with NASH and age- and sex-matched controls; diabetic subjects were excluded. Both oxidised LDL and TBARS were significantly higher in the NASH group. Furthermore, both markers were independently associated with insulin resistance. Videla et al. (61) showed a progressive reduction in the levels of superoxide dismutase (SOD), catalase and glutathione peroxidase (GSH-Px) across the histological spectrum of disease, when measured in liver tissue. They also measured the ferric reducing ability of plasma, a marker of the anti-oxidant capacity, and found this to be reduced in patients with NASH. A Turkish group measured levels of erythrocyte GSH, SOD and catalase in biopsy-proven NAFLD, finding significantly lower levels of all three anti-oxidant enzymes (62). Plasma malondialdehyde levels were higher in the NAFLD cohort, an observation supported by Baskol et al. (63). They investigated the anti-oxidant enzyme paraoxonase 1, finding significantly lower levels of the enzyme in the NASH cohort. They did not find any correlation between the level of this marker and the stage/grade of disease.

Cytochrome P450 2E1 (CYP2E1) has been implicated in NAFLD pathogenesis by several studies (61, 64, 65) via the development of oxidative stress. CYP2E1 activity can be measured non-invasively by the oral administration of chlorzoxazone (CHZ). Serial blood samples are taken at various time intervals post-ingestion and the ratio of CHZ to 6-hydroxychlorzoxazone is measured (the CHZ test). Orellana et al. (66) measured both liver CYP2E1 (by western blot) and CHZ hydroxylation in patients undergoing bariatric surgery. Both the liver content of CYP2E1 and the hydroxylation of CHZ were significantly higher in the group found to have NASH compared with the control group. Furthermore, both parameters correlated positively with the severity of disease. Chtioui et al. (64) evaluated the CHZ test as a non-invasive tool for the diagnosis of NASH in an NAFLD population. They found no difference in the CHZ test between patients with NASH and those with simple steatosis. Both studies suffered from relatively small sample sizes – larger scale review and validation are required before conclusive remarks can be made.

Fibrosis markers

Dehydroepiandrosterone, an abundant steroid hormone known to influence sensitivity to oxidative stress, is measurable in serum. Circulating levels have been shown to correlate negatively with the stage of fibrosis in NASH (53). Hyaluronic acid (HA), a component of the extracellular matrix in most tissues, has been shown in several studies to be associated with advanced stages of fibrosis in NASH (67–69). Both plasma pentraxin 3 (70) and serum endothelin-1 (71) have been shown to correlate with the stage of fibrosis in NASH cohorts. The serum activity of prolidase, an enzyme that catalyses the final step of collagen breakdown, is significantly higher in NASH than in SS. Furthermore, it correlates positively and significantly with fibrosis score, lobular inflammation and NAFLD activity score (72).

Apoptosis markers

Hepatocyte apoptosis has been postulated as an important, if not critical, mechanism for the progression of NAFLD. Furthermore, phagocytosis of apoptotic bodies by hepatic stellate cells stimulates the fibrogenic activity of the cells (73). Feldstein et al. (74) assessed the role of hepatocyte apoptosis in human NASH, firstly semi-quantitatively by counting TUNEL-positive cells seen on histology. The number of TUNEL-positive cells was significantly greater in NASH compared with steatosis and control. The number of TUNEL-positive cells also correlated positively with the grade of fibrosis. The presence of apoptosis was confirmed in this study by immuno-histochemical staining for caspases 3 and 7 as well as Fas. Further evidence of the involvement of apoptosis in NASH was provided by Panasiuk et al. (75). They carried out immuno-histochemical staining for the pro-apoptotic proteins p53 and Bax and the anti-apoptotic protein Bcl-2 on steatotic and non-steatotic hepatocytes from patients with NAFLD. The expression of Bax was significantly higher in the steatotic group than the non-steatotic group, while the expression of p53 was higher in the NASH group (both steatotic and non-steatotic hepatocytes) than the SS group. The expression of the anti-apoptotic protein Bcl-2 was reduced in the steatotic hepatocytes vs the non-steatotic cells, suggesting possible inhibition of this protein.

With hepatocyte apoptosis established as an important feature of NASH, several markers of apoptosis have since been investigated. Cytokeratin 18 (CK-18) is a protein cleaved by caspases 3 and 7 during the apoptotic process. Plasma levels of this cleavage product are significantly higher in patients with NASH; moreover, CK-18 levels can be used to differentiate SS from NASH (76, 77). In a further multicentre validation study, Feldstein et al. (78) found that CK-18 fragment levels correlated with the degree of NASH and predicted its presence. This validation study included 139 NASH patients, and CK-18 achieved an AUROC of 0.83. These findings support the assertion that CK-18 may be a useful clinical marker in the diagnosis of NASH.

Other non-invasive apoptosis markers shown to differentiate NASH from SS and control subjects include tissue polypeptide-specific antigen (79), a keratin 18 serological mirror.

Algorithm tests

Algorithm tests often use complex mathematical equations to assess the utility of a combination of simple markers in diagnosing and staging NAFLD. Several such studies have been performed and are summarised in Table 2. The ultimate aim of these tests is to provide a robust and validated means of diagnosing and staging NAFLD non-invasively. Each algorithm test discussed varies in its aim – some are intended to differentiate SS from NASH, while others focus more on predicting the severity of fibrosis in a NASH population. This dichotomy adds a degree of difficulty when evaluating and comparing performance. The relatively small number of studies included also hinders comparative evaluation. Most algorithm studies, and indeed some of the previously described studies, use AUROC to assess the performance of the scoring system. AUROC values greater than 0.8 indicate good diagnostic performance, whereas values less than 0.8 suggest suboptimal performance. The closer the value to 1, the better performing the scoring system.
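As an illustration of what an AUROC value represents, the following minimal sketch computes it as the probability that a randomly chosen diseased subject has a higher score than a randomly chosen non-diseased subject, with ties counted as one half. The function name and the marker values are hypothetical and are not taken from any study reviewed here.

```python
def auroc(scores_diseased, scores_control):
    """Rank-based AUROC: fraction of diseased/control pairs in which the
    diseased subject has the higher score (ties count as half a pair)."""
    wins = 0.0
    for d in scores_diseased:
        for c in scores_control:
            if d > c:
                wins += 1.0
            elif d == c:
                wins += 0.5
    return wins / (len(scores_diseased) * len(scores_control))


# Hypothetical score values for illustration only (not study data)
nash_scores = [0.9, 0.8, 0.7, 0.4]
ss_scores = [0.6, 0.3, 0.2, 0.1]

print(auroc(nash_scores, ss_scores))  # 15 of 16 pairs correctly ordered: 0.9375
```

A value of 0.5 corresponds to a score that orders diseased and non-diseased subjects no better than chance, and a value of 1 to perfect separation.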

Table 2.   Summary of algorithm tests

| Reference | Markers | Test name (aim/purpose of test) | Cutoff value | AUROC (T) | AUROC (V) |
|---|---|---|---|---|---|
| Younossi et al. (90) | CK18 M30, CK18 M65, adiponectin, resistin; T=69, V=32 | NASH diagnostics (diagnose NASH) | 0.2772; 0.2085 | 0.908 | 0.732 |
| Poynard et al. (89) | T=160, V=97 | NashTest (diagnose NASH) | NR | 0.79 | 0.79 |
| Palekar et al. (67) | HA+clinical model*; T=80 | Differentiate SS from NASH | ≥3 risk factors | 0.763 | No validation group |
| Guha et al. (82) | HA, TIMP1, P3NP; V=192 | ELF (severe fibrosis, F0,1,2 vs F3,4) | 0.3576 | | 0.9 |
| Guha et al. (82) | Age, BMI, platelets, albumin, AST/ALT, hyperglycaemia | Simple panel (severe fibrosis) | −2.3824; −0.8325 | NR | 0.89 |
| Guha et al. (82) | V=91 | Simple panel+ELF (severe fibrosis) | −0.2826; 0.0033 | | 0.98 |
| Angulo et al. (83) | Age, BMI, platelets, albumin, AST/ALT, hyperglycaemia; T=480, V=253 | NAFLD fibrosis score (F0,1,2 vs F3,4) | −1.455; 0.676 | 0.88 | 0.82 |
| Ratziu et al. (80) | BMI, ALT, age, TG; T=93 | BAAT (F0,1 vs F2,3,4) | 1; 4 | 0.84 | No validation group |
| Harrison et al. (81) | BMI, AST/ALT, T2DM; T=827, V=160 | BARD (F0,1,2 vs F3,4) | ≥2 | 0.81 | 0.78 |
| Suzuki et al. (68) | HA+clinical model; T=79 | Predict severe fibrosis in NASH | NR | 0.92 | No validation group |
| Poynard et al. (86) | FibroTest+BMI, cholesterol, TG, glucose; T=310, V=434 | SteatoTest (liver steatosis) | 0.30 | 0.79 | 0.72–0.86 |
| Ratziu et al. (84) | | FibroTest (F0,1 vs F2,3,4) | 0.30 | 0.86 | 0.75 |
| Ratziu et al. (84) | | FibroTest (F0,1,2 vs F3,4) | 0.30 | 0.92 | 0.81 |
| Calés et al. (85) | Glucose, AST, platelets, ferritin, ALT, weight, age; T=121, V=114 | Fibrometre (F0,1 vs F2,3,4) | ≤0.611 | 0.936 | 0.952 |
| Calés et al. (85) | | Fibrometre (F0,1,2 vs F3,4) | | 0.928 | 0.950 |
| Calés et al. (85) | | Fibrometre (F0,1,2,3 vs F4) | | 0.929 | 0.888 |
| Kotronen et al. (87) | Met Synd, AST, T2DM, AST/ALT, insulin; T=313, V=157 | NAFLD liver fat score (liver fat/steatosis) | −0.640 | 0.87 | 0.86 |
| Lee et al. (88) | ALT, AST, BMI, T2DM, gender; T=2680, V=2682 | Hepatic Steatosis Index (liver steatosis) | >36 | 0.812 | 0.819 |

* The clinical model comprises of the following risk factors: age ≥50 years, female gender, AST ≥45 IU/l, BMI ≥30, AAR ≥0.80, HA ≥55 mcg/l.

Group 1 (n=170) was a single centre group. Group 2 (n=97) was a multicentre group.

ALT, alanine transaminase; AST, aspartate transaminase; AUROC, area under receiver operator curve; BMI, body mass index; CK18, cytokeratin 18; ELF, European Liver Fibrosis score; HA, hyaluronic acid; Met Synd, metabolic syndrome; P3NP, aminoterminal peptide of pro-collagen III; T, training set; TG, triglycerides; TIMP-1, tissue inhibitor of matrix metalloproteinase 1; T2DM, type 2 diabetes mellitus; NR, not reported; V, validation set.

Fibrosis algorithm tests

In this section, we review the algorithm tests whose primary outcome is the prediction of fibrosis in an NAFLD cohort. Where applicable, distinction is made as to the extent of fibrosis assessed.

One of the earliest such studies used a combination of BMI, age, ALT and TG to predict the presence of advanced fibrosis in a group of overweight patients with abnormal LFTs. Ratziu et al. (80) retrospectively examined 97 overweight patients, concluding that the combination of the mentioned markers, shortened to BAAT, could predict the presence of septal fibrosis with an AUROC of 0.84, indicating good diagnostic power. The BARD algorithm (81) was developed in a larger cohort of patients (897) and utilised a combination of BMI, AST/ALT ratio and the presence of T2DM to predict the presence of advanced fibrosis with an AUROC of 0.81.

Another comparatively simple panel study (68) used serum HA in combination with several clinical parameters (age, obesity, T2DM, ALT/AST) to predict the presence of severe fibrosis in NASH. This particular algorithm was devised based on a cohort of 79 patients and achieved an AUROC of 0.92 for the detection of advanced fibrosis. The European Liver Fibrosis (ELF) panel was originally developed to detect the presence of fibrosis in chronic liver disease. It has since been validated as a useful test to predict the presence of fibrosis in NAFLD. Furthermore, the removal of age from the original ELF panel did not affect the performance of this panel in NAFLD (82). The addition of simple parameters to the ELF panel, including BMI, albumin and platelets, further improved the diagnostic performance for the detection of severe fibrosis, raising the AUROC from 0.90 without the simple markers to 0.98 with them. Angulo et al. (83) developed a simple panel of markers, namely age, hyperglycaemia, BMI, platelets, albumin and AST/ALT ratio. The simple panel was used to diagnose severe fibrosis with a positive predictive value (PPV) of 90% and a negative predictive value of 82% (AUROC for validation set 0.82).

FibroTest was developed and validated in an NAFLD cohort with the intention of detecting advanced fibrosis. For the detection of F3/4 fibrosis, FibroTest achieved an AUROC of 0.92 in a single-centre cohort and 0.81 in a multicentre group (84). The Fibrometre panel (85), which includes seven variables (glucose, AST, age, body weight, ferritin, ALT and platelets), performed as well as the NAFLD fibrosis score (AUROC 0.928 vs 0.937) when predicting the presence of severe fibrosis (defined as F3/4 disease). However, Fibrometre performed better than the NAFLD fibrosis score when predicting F2,3,4 disease (AUROC 0.936 vs 0.9).

All of these studies aim to predict the presence of advanced fibrosis in NAFLD and it is tempting to draw comparisons between these studies in search of the ‘best’ algorithm. However, it should be remembered that the histological scoring systems and indeed the patient phenotypes differ between studies, making head-to-head comparisons difficult.

Steatosis algorithm tests

While the previously mentioned studies attempt to predict the presence of fibrosis, few algorithm studies are designed to differentiate SS from NASH. The SteatoTest (86) was developed and validated in a cohort of patients with chronic liver disease, including viral hepatitis, NAFLD and ALD. The primary outcome was the detection of steatosis. The test performed well in both the training set and the three validation cohorts included in the study, with AUROC values of 0.79 for the training set and 0.79–0.86 for the validation groups. Of particular note, SteatoTest performed consistently better than ALT and GGT at detecting all grades of steatosis, as reflected in both the box plot and AUROC analysis. Palekar et al. (67) constructed a panel, including HA, to differentiate SS from NASH, although patients with NASH and no fibrosis were excluded from the study. This panel performed with an AUROC value of 0.763.
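The clinical model of Palekar et al. (67) is listed in Table 2 as a count of six risk factors with a cutoff of three or more. A minimal sketch of such a count is given below, using the thresholds stated in the table footnote; the function and parameter names are hypothetical, BMI is assumed to be in kg/m², and the original publication should be consulted for the exact definitions.

```python
def clinical_model_risk_factors(age_years, is_female, ast_iu_l, bmi,
                                ast_alt_ratio, ha_mcg_l):
    """Count the risk factors listed in the Table 2 footnote.
    Parameter names are illustrative, not taken from the original study."""
    criteria = [
        age_years >= 50,        # age >= 50 years
        bool(is_female),        # female gender
        ast_iu_l >= 45,         # AST >= 45 IU/l
        bmi >= 30,              # BMI >= 30 (assumed kg/m^2)
        ast_alt_ratio >= 0.80,  # AST/ALT ratio (AAR) >= 0.80
        ha_mcg_l >= 55,         # hyaluronic acid >= 55 mcg/l
    ]
    return sum(criteria)


def meets_cutoff(risk_factor_count, cutoff=3):
    """Table 2 lists a cutoff of >= 3 risk factors."""
    return risk_factor_count >= cutoff


# Hypothetical patient, for illustration only
count = clinical_model_risk_factors(55, True, 50, 28, 0.7, 60)
print(count, meets_cutoff(count))  # 4 True
```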

In a large (n=470) Scandinavian study, Kotronen et al. (87) measured liver fat content by H-MRS. They constructed an NAFLD liver fat score to diagnose NAFLD – the score comprised AST, fasting insulin, presence or absence of metabolic syndrome and T2DM and AST/ALT. NAFLD was defined as a liver fat content >5.56% as measured by H-MRS. When applied to the cohort, the NAFLD liver fat score achieved an AUROC of 0.87 in the training and 0.86 in the validation groups. Interestingly, the authors assessed the effect of adding genetic information (presence/absence of SNP for PNPLA3 – a genetic variant associated with NAFLD) to the score. The sensitivity was improved by <1%.

Lee et al. (88) developed the Hepatic Steatosis Index (HSI) in a large cohort of ultrasound-diagnosed NAFLD patients (n=5362). The HSI includes ALT, AST, BMI, T2DM and sex in a relatively simple equation and achieved an AUROC of 0.812 in the training and 0.819 in the validation cohorts.
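The HSI equation is not reproduced in this review. The sketch below assumes the commonly cited form of the index, 8 × (ALT/AST) + BMI, with 2 points added for diabetes and 2 points added for female sex, and uses the cutoff of >36 listed in Table 2. The function and parameter names are hypothetical, and the formula should be checked against the original publication (88) before any use.

```python
def hepatic_steatosis_index(alt_iu_l, ast_iu_l, bmi, has_t2dm, is_female):
    """HSI in its commonly cited form (an assumption, not stated in this review):
    8 * ALT/AST + BMI, +2 if diabetic, +2 if female."""
    score = 8.0 * (alt_iu_l / ast_iu_l) + bmi
    if has_t2dm:
        score += 2.0
    if is_female:
        score += 2.0
    return score


# Hypothetical patient: ALT 40 IU/l, AST 20 IU/l, BMI 30, diabetic, male
hsi = hepatic_steatosis_index(40, 20, 30, True, False)
print(hsi, hsi > 36)  # 48.0 True (above the Table 2 cutoff of >36)
```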

Non-alcoholic steatohepatitis algorithm test

The NashTest (89) uses a combination of 13 different parameters, both biochemical and clinical, to diagnose NASH with an AUROC of 0.79 in both the training and validation sets. NashTest performed similarly when diagnosing borderline NASH and no NASH. Younossi et al. (90) also developed a biomarker panel designed to diagnose NASH from an NAFLD cohort. They used a combination of four ELISA tests [CK-18 M30 (cleaved CK-18), CK-18 M65, adiponectin and resistin], validating their findings in a separate, smaller cohort. In the validation set, this panel achieved an AUROC of 0.732.

Of those studies that reported AUROC figures for both the training and validation sets, regardless of the test aim, the test performed less well in the validation set than in the training set, with the exception of SteatoTest, NashTest and FibroTest (equal performance or slightly improved performance). Of all the studies reviewed, those that seek to diagnose NASH have a median AUROC value of 0.763, compared with 0.92 for studies predicting the presence of severe fibrosis in NASH. This suggests that algorithm tests are better developed and perform better when predicting the presence of severe fibrosis than when diagnosing NASH. However, AUROC results should be interpreted with caution, as a single AUROC value does not reflect the performance at each individual score value, across which a range of sensitivity and specificity values is seen (91). The performance of these tests requires more evaluation and validation in larger cohorts to ensure that they are reliable in wider use.
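The caution about AUROC can be illustrated with a minimal sketch that computes sensitivity and specificity at different cutoffs of the same score. The scores and disease labels are hypothetical, the function name is illustrative, and the sketch assumes that both diseased and non-diseased subjects are present.

```python
def sensitivity_specificity(scores, has_disease, cutoff):
    """Classify score >= cutoff as test-positive and return (sensitivity, specificity).
    Assumes at least one diseased and one non-diseased subject."""
    tp = sum(1 for s, d in zip(scores, has_disease) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, has_disease) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, has_disease) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, has_disease) if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores and biopsy results, for illustration only
scores = [0.2, 0.4, 0.5, 0.6, 0.7, 0.9]
has_disease = [False, False, True, False, True, True]

for cutoff in (0.5, 0.7):
    sens, spec = sensitivity_specificity(scores, has_disease, cutoff)
    print(cutoff, round(sens, 2), round(spec, 2))
```

In this toy example, the lower cutoff detects every diseased subject at the cost of a false positive, whereas the higher cutoff removes the false positive but misses one diseased subject; the AUROC of the score is the same in both cases.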

Several algorithm tests have been designed and developed within a severely obese population. Many of these studies recruited patients undergoing bariatric surgery. Dixon et al. (92) developed the HAIR algorithm. One hundred and five patients undergoing laparoscopic bariatric surgery were included in the study. Raised ALT, defined as ALT >40 IU/l, systemic hypertension and index of insulin resistance were independent predictors of NASH. The unweighted sum of all three parameters comprises the HAIR algorithm, with a score of 3 associated with an AUROC of 0.9. The presence of two or more parameters was associated with a sensitivity of 0.8 and a specificity of 0.89 for the detection of NASH. Gholam et al. (93) evaluated 97 severely obese subjects, of whom 35 patients had NASH and 25 had fibrosis. Several clinical parameters were independent predictors of NASH and fibrosis as distinct parameters, leading to the development of two separate algorithms. ALT and HbA1C were incorporated in a complex algorithm that performed well with an AUROC of 0.90 (sensitivity 0.83, specificity 0.82) for the detection of fibrosis, whereas AST and T2DM were incorporated into an equally complex algorithm for the diagnosis of NASH. This algorithm achieved an AUROC of 0.82 (sensitivity 0.76, specificity 0.66). Campos et al. (94) developed an NASH Clinical Scoring System in a cohort of patients undergoing bariatric surgery. The Scoring System was developed in 172 patients, of whom 58 had a diagnosis of NASH. Multivariate analysis revealed six independent predictive variables associated with NASH (hypertension, T2DM, AST≥27 IU/l, ALT≥27 IU/l, obstructive sleep apnoea and non-black race). Non-black race was assigned a score value of 2, and the other parameters were assigned a value of 1. The sum of the parameters was used to place patients into a low-, intermediate-, high- or very-high-risk group. For the detection of NASH, the very-high-risk category achieved a PPV of 93% in the full study group, dropping to 80% in a cross-validated cohort, although there was 85% concordance between the full study set and the cross-validation cohort. The scoring system achieved an overall AUROC of 0.75 in the cross-validation cohort. Most recently, Ulitsky et al. (95) constructed a scoring system based on 253 patients undergoing bariatric surgery. Four parameters were included in this model (T2DM, abnormal ALT, triglycerides >150 mg/dl, obstructive sleep apnoea), with T2DM attracting a point score of 2 and the remaining parameters a point score of 1. The model achieved an overall AUROC of 0.76 for the detection of NASH. All four studies described tackle an important group of patients, the severely obese. It is not appropriate, however, to use these systems indiscriminately. These systems are not necessarily transferable to a general NAFLD population who, although usually overweight, do not always achieve morbidly obese status.
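The point-based systems described above can be sketched as simple sums. The following is a minimal illustration using only the weights and thresholds stated in the text; the function and parameter names are hypothetical. The insulin resistance index threshold (HAIR) and the definition of abnormal ALT (Ulitsky et al.) are not given in this review, so both are passed in as yes/no inputs, and the mapping of the Campos et al. score to risk groups is likewise not stated and is therefore omitted.

```python
def hair_score(hypertension, alt_iu_l, insulin_resistance_raised):
    """HAIR (92): unweighted sum of hypertension, ALT > 40 IU/l and a raised
    insulin resistance index (the index threshold is not given in this review)."""
    return int(hypertension) + int(alt_iu_l > 40) + int(insulin_resistance_raised)


def campos_score(hypertension, t2dm, ast_iu_l, alt_iu_l, sleep_apnoea, non_black_race):
    """NASH Clinical Scoring System (94): non-black race scores 2, all others 1."""
    return (int(hypertension) + int(t2dm) + int(ast_iu_l >= 27)
            + int(alt_iu_l >= 27) + int(sleep_apnoea) + 2 * int(non_black_race))


def ulitsky_score(t2dm, abnormal_alt, triglycerides_mg_dl, sleep_apnoea):
    """Ulitsky et al. (95): T2DM scores 2, the remaining parameters 1."""
    return (2 * int(t2dm) + int(abnormal_alt)
            + int(triglycerides_mg_dl > 150) + int(sleep_apnoea))


# Hypothetical patients, for illustration only
print(hair_score(True, 55, True))                     # 3
print(campos_score(True, True, 30, 30, True, True))   # 7 (maximum score)
print(ulitsky_score(True, False, 200, False))         # 3
```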

Summary

The field of non-invasive markers of NAFLD is a complex and varied one. Many studies have been performed, each using differing techniques, adding to the heterogeneity in the field. The differing study designs, end points and disease groups studied make it difficult to draw comparisons between studies and as such each should be considered on its own merits. The proteomic and genomic studies provide fascinating insights into disease mechanisms and pathogenesis while also highlighting potential non-invasive markers of NAFLD. For the most part, these studies require larger scale validation of their findings, although they show exciting promise in the discovery of novel biomarkers.

Both the markers of association studies and the algorithm studies lend themselves more to clinical utility. They comprise mostly routinely available clinical and biochemical markers and as such have the potential to be used in everyday practice. Several of the algorithm systems, e.g. BARD and the NAFLD fibrosis score, have been validated in large populations, with encouraging results. Although it is not the intention of the authors to advocate the use of one algorithm system in particular, it is clear that there are a number of developed scoring systems with potential clinical utility. The exact role of the scoring systems remains unclear. Whether they are able to replace liver biopsy remains to be seen; more likely, they have a role in triaging patients for liver biopsy, providing a means to allocate this resource to patients most likely to benefit.
