Multiparametric magnetic resonance imaging for the assessment of non‐alcoholic fatty liver disease severity

Abstract
Background & Aims: The diagnosis of non‐alcoholic steatohepatitis and fibrosis staging are central to non‐alcoholic fatty liver disease assessment. We evaluated multiparametric magnetic resonance in the assessment of non‐alcoholic steatohepatitis and fibrosis using histology as standard in non‐alcoholic fatty liver disease.
Methods: Seventy‐one patients with suspected non‐alcoholic fatty liver disease were recruited within 1 month of liver biopsy. Magnetic resonance data were used to define the liver inflammation and fibrosis score (LIF 0‐4). Biopsies were assessed for steatosis, lobular inflammation, ballooning and fibrosis and classified as non‐alcoholic steatohepatitis or simple steatosis, and mild or significant (Activity ≥2 and/or Fibrosis ≥2 as defined by the Fatty Liver Inhibition of Progression consortium) non‐alcoholic fatty liver disease. Transient elastography was also performed.
Results: Magnetic resonance success rate was 95% vs 59% for transient elastography (P<.0001). Fibrosis stage on biopsy correlated with liver inflammation and fibrosis (r s =.51, P<.0001). The area under the receiver operating curve using liver inflammation and fibrosis for the diagnosis of cirrhosis was 0.85. Liver inflammation and fibrosis score for ballooning grades 0, 1 and 2 was 1.2, 2.7 and 3.5 respectively (P<.05) with an area under the receiver operating characteristic curve of 0.83 for the diagnosis of ballooning. Patients with steatosis had lower liver inflammation and fibrosis (1.3) compared to patients with non‐alcoholic steatohepatitis (3.0) (P<.0001); area under the receiver operating characteristic curve for the diagnosis of non‐alcoholic steatohepatitis was 0.80. Liver inflammation and fibrosis scores for patients with mild and significant non‐alcoholic fatty liver disease were 1.2 and 2.9 respectively (P<.0001). The area under the receiver operating characteristic curve of liver inflammation and fibrosis for the diagnosis of significant non‐alcoholic fatty liver disease was 0.89.
Conclusions: Multiparametric magnetic resonance is a promising technique with good diagnostic accuracy for non‐alcoholic fatty liver disease histological parameters, and can potentially identify patients with non‐alcoholic steatohepatitis and cirrhosis.


| INTRODUCTION
Non-alcoholic fatty liver disease (NAFLD) represents a disease spectrum ranging from accumulation of liver fat only (steatosis) to fat associated with inflammation (non-alcoholic steatohepatitis; NASH) and fibrosis. NAFLD has now reached epidemic levels in developed countries, affecting a third of the adult population. 1 NASH prevalence is estimated at 3%-12%, 2,3 and is expected to become the most common indication for liver transplantation in the near future. 4 Steatosis and NASH have been traditionally regarded as distinct disease entities with steatosis generally running a benign course and with NASH associated with disease progression. 5,6 However, some patients with simple steatosis can develop progressive disease, 7 suggesting that NAFLD may be more complex than previously thought.
The diagnosis and classification of NAFLD into different subtypes (steatosis, NASH) and staging of fibrosis often relies on liver biopsy, and this is problematic because of the inherent drawbacks of this technique (eg sampling and observer dependent variability). 8 Furthermore, the majority of patients with NAFLD have uncomplicated steatosis, where non-invasive diagnosis would be preferable. There is therefore a clinical need for reliable non-invasive biomarkers for the assessment of NAFLD.
Non-invasive biomarkers can be broadly divided into serum-based and imaging or elastography technologies. Serum biomarkers have yielded mixed results that have hindered widespread clinical application. For example, cytokeratin-18 has demonstrated moderate overall accuracy for diagnosing NASH in a meta-analysis (66% sensitivity, 82% specificity), 9 but was found to have only a limited sensitivity (58%) for the diagnosis of NASH in a large clinical study. 10 Measurement of liver stiffness (LS) using transient elastography (TE) 11 is increasingly used for the assessment of fibrosis in patients with viral hepatitis. However, it is associated with high failure rates, particularly in obese patients (BMI>30 kg/m²), 12 where reliable measures could only be obtained in 65% of patients in one study. 13 This limits the applicability of TE for the assessment of patients with NAFLD who are often obese.
Measuring liver stiffness using magnetic resonance elastography (MRE) has shown promise in the evaluation of fibrosis in patients with NAFLD, 14 outperforming serum-based tests and ultrasound-based elastography techniques. 15,16 More recently, a more advanced version of this technique (3D-MRE) has produced even better results than the commercially available 2D-MRE. 17 However, the accuracy of MRE for the diagnosis of NASH is limited, and this technique remains restricted to specialist centres with considerable obstacles to widespread use (eg need for additional hardware). MRI techniques that can be implemented using scanners available in routine practice offer an attractive alternative for NAFLD evaluation.
We have recently developed a multiparametric magnetic resonance (MR) technique that allows quantification of liver inflammation and fibrosis. 18-20 This technique has shown a high diagnostic accuracy compared to histology 18 and can also provide prognostic information 21 in patients with mixed liver disease aetiologies.
The primary aim of this study was to evaluate the diagnostic performance of multiparametric liver MRI specifically in the assessment of patients with NAFLD using liver histology as the reference standard.
We also compared this to TE in the assessment of fibrosis. The analysis was conducted using components of the steatosis, activity and fibrosis (SAF) score and the diagnostic categories of the Fatty Liver Inhibition of Progression (FLIP) consortium algorithm. 22

| Study design and patient population
This was a prospective pilot study conducted at a UK tertiary centre (John Radcliffe Hospital, Oxford, UK) between May 2011 and March 2015. Adult patients (≥18 years) with suspected or known NAFLD were invited to participate (see also Data S1). Patients attended for a single visit, for multiparametric MR examination, TE and blood sampling. The median (IQR) interval between the study visit and biopsy was 13 (5-27) days.
All the examinations were carried out after a fasting period of at least 4 hours. Patients were recruited from general hepatology and metabolic liver disease clinics and from the bariatric surgery service.

KEYWORDS
diagnostic accuracy, non-alcoholic steatohepatitis, non-invasive test, sensitivity and specificity
Key points
• Multiparametric magnetic resonance (MR) can be used to derive the liver inflammation and fibrosis score (LIF), a non-invasive, quantitative score that can be used to evaluate non-alcoholic fatty liver disease (NAFLD).
• In patients with NAFLD, LIF score had good diagnostic accuracy, both for the diagnosis of non-alcoholic steatohepatitis and ballooning.
• The LIF score also had good diagnostic accuracy for cirrhosis.
• This methodology has the potential to be used for risk

| Multiparametric MR examination
All MR scans were performed with the patient lying supine in a 3-Tesla scanner (Siemens, Tim Trio, Germany). The individual components of the multiparametric MR protocol were T1 mapping and T2* mapping, which were used to calculate the iron-corrected T1 and LIF score (see also Data S1).

| Iron-corrected T1 and the liver inflammation and fibrosis score
T1 relaxation time increases with increases in extracellular fluid and is characteristic of fibrosis and inflammation. However, the presence of iron, which can be accurately measured from T2* maps, has an opposing effect on the T1. An algorithm has been created that allows for the bias introduced by elevated iron to be removed from the T1 measurements, yielding the iron-corrected T1 (cT1). 18,20 Optimal cT1 cut-off points for the differentiation of no (Ishak fibrosis stage F0), mild (Ishak F1-2), moderate (Ishak F3-4) and severe (Ishak F5-6) liver fibrosis have been derived from the association of cT1 with histological fibrosis in our previous study. 18 These cut-offs were used to develop the liver inflammation and fibrosis (LIF) score, a standardised continuous score (0-4).
LiverMultiScan™ (LMS, Perspectum Diagnostics, Oxford, UK) is a software product that can be used to measure cT1 and LIF scores from T1 and T2* maps. For this study, LMS was used by a blinded investigator (MP) to analyse anonymised images. Interobserver agreement was assessed in a subset of consecutive scans (see Data S1 and Figure S1).
LIF scores were measured in two operator-defined regions of interest (ROI), one in each liver lobe, and the average value was used in the analysis. The coefficient of variation (CoV) for the measurement of cT1/LIF on two different occasions in the same patient (test-retest CoV) was previously found to be 1.8%. 18 Figure 2 illustrates typical MR data from patients with varying disease severity.

| Transient elastography
TE was performed using Fibroscan (Echosens, France) by operators (MP or RB) who were certified by the manufacturer to perform liver stiffness measurements. TE was performed with the patient lying supine and with the right arm fully extended. Both the medium (M) probe and extra-large (XL) probes were used. Ten measurements per patient were needed for a successful scan and the manufacturer's recommendations were used to assess the validity of each examination (10 valid measurements; 60% success rate; interquartile range to median ratio <0.3).
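The validity criteria quoted above can be expressed as a short check. The following is a minimal sketch rather than the manufacturer's implementation: the function name `te_scan_is_valid`, the reading of "60% success rate" as "at least 60% of all shots valid", and the quartile convention used for the interquartile range are all assumptions made for illustration.

```python
from statistics import median, quantiles


def te_scan_is_valid(valid_kpa, total_attempts):
    """Apply the three quoted criteria to one TE examination.

    valid_kpa: liver stiffness values (kPa) of the valid shots.
    total_attempts: number of shots fired, valid or not.
    """
    if total_attempts < len(valid_kpa):
        raise ValueError("total_attempts cannot be smaller than the valid shots")
    # Criterion 1: at least 10 valid measurements.
    if len(valid_kpa) < 10:
        return False
    # Criterion 2 (assumed reading): valid shots / all shots >= 60%.
    success_rate = len(valid_kpa) / total_attempts
    # Criterion 3: IQR-to-median ratio below 0.3.
    # The "inclusive" quartile method is an assumption; devices may differ.
    q1, _, q3 = quantiles(valid_kpa, n=4, method="inclusive")
    iqr_ratio = (q3 - q1) / median(valid_kpa)
    return success_rate >= 0.60 and iqr_ratio < 0.30


# Hypothetical readings, not study data.
tight = [5.0, 5.2, 5.1, 4.9, 5.3, 5.0, 5.1, 5.2, 4.8, 5.0]
print(te_scan_is_valid(tight, total_attempts=12))
```

An examination with tightly clustered readings still fails this check if too many additional shots were invalid, which mirrors how failure rates rise in patients in whom valid shots are hard to obtain.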

| Liver histology
Percutaneous biopsies (n=50) were performed under ultrasound guidance using 18G cutting biopsy needles and trans-jugular (n=9) biopsies were performed under fluoroscopic guidance using 19G needles. Patients who were having bariatric surgery (n=12) had wedge liver biopsies intra-operatively. The median (IQR) biopsy length in patients who had needle biopsies was 18 (14-25) mm, including a median (IQR) of 10 (7-13) portal tracts. All biopsies were included in the final analysis.
Biopsies were evaluated by two experienced liver pathologists and discussed in a clinico-pathological meeting before a final consensus report was issued, and this was used as the reference standard in this study. The reporting pathologists and clinicians attending the clinicopathological meeting were blinded to the MR data.

| Statistical analysis
All analyses were carried out using GraphPad Prism software (version 6.05, July 7, 2014). Statistical significance was set at P<.05.
Descriptive statistics were used to summarise baseline subject characteristics. Normality was determined using the Shapiro-Wilk test.
Associations were tested using Spearman's correlation coefficient (r s ). Differences between groups were assessed using the Mann-Whitney test. Fisher's exact test was used to test for the differences in proportions between two groups. Differences between multiple groups were assessed using the Kruskal-Wallis test with Dunn's correction for multiple comparisons.
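For readers who want to reproduce the rank correlation outside GraphPad Prism, Spearman's coefficient can be sketched as the Pearson correlation of the ranked data, with tied values sharing the mean of their ranks. This is an illustrative standard-library version; the function names and the example values are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's coefficient: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Raises ZeroDivisionError if either variable is constant.
    return cov / sqrt(var_x * var_y)


# Hypothetical fibrosis stages and LIF scores, for illustration only.
stages = [0, 1, 1, 2, 3, 4]
lif = [0.9, 1.4, 2.1, 1.8, 3.0, 3.6]
print(round(spearman_rs(stages, lif), 2))
```

Tie handling matters here because histological grades take only a few ordinal values, so many patients share the same rank.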
Receiver operating characteristic (ROC) curves were used to determine (a) the diagnostic accuracy of multiparametric MR for the assessment of NAFLD components (ballooning, lobular inflammation, activity, fibrosis, NASH vs steatosis, mild vs significant NAFLD), and (b) the diagnostic accuracy of TE in the assessment of NAFLD fibrosis. A cut-off to optimise sensitivity at 90% was reported. Ninety-five percent confidence intervals (95% CI) were calculated for the parameters of diagnostic accuracy.
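The two ROC quantities described above can be sketched in a few lines. The code below is illustrative only and assumes that a higher score indicates disease and that "a cut-off to optimise sensitivity at 90%" means the highest cut-off whose sensitivity is at least 90%; the function names and the example scores are hypothetical.

```python
def auroc(scores_pos, scores_neg):
    """Area under the ROC curve, computed as the probability that a diseased
    case scores higher than a non-diseased case (ties count as one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def cutoff_for_sensitivity(scores_pos, scores_neg, target=0.90):
    """Highest cut-off (test positive if score >= cut-off) whose sensitivity
    reaches the target. Returns (cut-off, sensitivity, specificity)."""
    # Sensitivity only changes at scores observed in diseased cases,
    # so those are the only candidate cut-offs that need checking.
    for c in sorted(set(scores_pos), reverse=True):
        sens = sum(s >= c for s in scores_pos) / len(scores_pos)
        if sens >= target:
            spec = sum(s < c for s in scores_neg) / len(scores_neg)
            return c, sens, spec
    raise ValueError("no scores supplied for the diseased group")


# Hypothetical LIF-like scores, not study data.
diseased = [3.0, 2.5, 2.8, 1.9, 3.4, 2.2, 3.1, 2.9, 3.6, 2.7]
healthy = [1.0, 1.4, 2.0, 0.8, 1.2]
print(auroc(diseased, healthy))
print(cutoff_for_sensitivity(diseased, healthy))
```

Choosing the highest cut-off that still meets the sensitivity target keeps specificity as high as possible for that target, which is one common way of fixing an operating point on the curve.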

| Baseline characteristics
A total of 78 patients consented to participate and biopsy data were available in 75. Of these, 71 (95%) had a successful MRI and were included in the final analysis (Figure 1). TE was attempted in 64 (90%)  Table 1, and for subpopulations within the study (suspected NAFLD, known NAFLD, patients undergoing bariatric surgery) in Tables S1-3.
The diagnostic accuracy of the two techniques for the diagnosis of significant fibrosis (F2-4), bridging fibrosis (F3-4) and cirrhosis (F4) is summarised in Table S4.
The median LIF scores of patients with no lobular inflammation and lobular inflammation grade >0 were 1.5 and 2.7 respectively (P=.024, Figure 3). There was an association between LIF and overall activity (sum of ballooning + lobular inflammation grades; r s =.58; P<.0001).
Overall, there was a strong association between the total SAF score and LIF score (r s =.70, P<.0001; Figure 4).

| DISCUSSION
This prospective pilot study has shown that multiparametric MRI can be used to assess the overall disease severity in patients with
Multiparametric MR had a significantly higher success rate (95%) than TE (59%, P<.0001), while there was a considerable overlap in the 95% confidence intervals for the diagnostic accuracy in the detection of fibrosis, indicating no significant differences between the two techniques (Table S4). TE is solely used for the assessment of fibrosis, so its diagnostic accuracy in the assessment of activity, NASH and overall disease severity was not examined in this study.
In the steatosis, activity and fibrosis (SAF) score, biopsies are reported for steatosis (0-3), activity (0-4; sum of ballooning (0-2) and lobulitis (0-2)) and fibrosis (0-4). The Fatty Liver Inhibition of Progression (FLIP) algorithm categorises patients as NASH if all of steatosis, ballooning and lobulitis are graded as 1 or higher and as steatosis if this criterion is not met. The overall disease severity is also classified as mild (fibrosis <2 and activity <2) or significant (fibrosis or activity ≥2).
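The categorisation rules described above translate directly into code. The sketch below follows only the wording given in this text; the function name `classify_flip` and the returned labels are illustrative choices, not part of the FLIP consortium's published material.

```python
def classify_flip(steatosis, ballooning, lobular_inflammation, fibrosis):
    """Return (diagnosis, severity) from the histological component grades.

    diagnosis: "NASH" if steatosis, ballooning and lobular inflammation are
               all graded 1 or higher, otherwise "steatosis".
    severity:  "significant" if activity >= 2 or fibrosis >= 2, else "mild",
               where activity = ballooning + lobular inflammation.
    """
    activity = ballooning + lobular_inflammation
    is_nash = steatosis >= 1 and ballooning >= 1 and lobular_inflammation >= 1
    diagnosis = "NASH" if is_nash else "steatosis"
    severity = "significant" if (activity >= 2 or fibrosis >= 2) else "mild"
    return diagnosis, severity


# Hypothetical biopsy grades, for illustration only.
print(classify_flip(steatosis=2, ballooning=1, lobular_inflammation=1, fibrosis=1))
print(classify_flip(steatosis=1, ballooning=0, lobular_inflammation=1, fibrosis=0))
```

Because NASH requires both ballooning and lobular inflammation of at least grade 1, every NASH case has activity of at least 2 and is therefore classified as significant under these rules, whereas a steatosis case can be either mild or significant.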
TABLE 2 Diagnostic parameters of liver inflammation and fibrosis score for NAFLD assessment
FIGURE 4 Liver inflammation and fibrosis score for the assessment of the total steatosis, activity and fibrosis score. There was a strong association between the liver inflammation and fibrosis score (LIF) and the overall histological severity scored by the steatosis, activity and fibrosis score (SAF; r s =.
fat. We have previously found good agreement between assessors in our centre for the assessment of fibrosis (weighted kappa 0.51) and steatosis (weighted kappa 0.72). 18 In addition, for the histological assessment of fibrosis, a minimum biopsy length of 25 mm or at least 11 portal tracts are needed for reliable scoring. 28 However, a biopsy with at least six portal tracts is generally considered adequate for routine diagnosis. 29 As our aim was to conduct a comparison with all the histological aspects of NAFLD, we have not excluded any biopsies based on quality criteria as the pathologists could assess all the histological parameters of interest in our study.
In conclusion, this study shows that multiparametric MR is a promising tool for the evaluation of patients with NAFLD. MR gave reliable data more frequently than TE, with no differences in the diagnostic accuracy for significant fibrosis or cirrhosis. Furthermore, multiparametric MR had good accuracy for the diagnosis of NASH and ballooning. The ability to assess both the necroinflammatory and fibrotic components of NASH in a single test is therefore a particular strength of the MR technique, allowing accurate evaluation of the overall disease severity. Further refinement and technical development of non-invasive biomarkers that enable separate quantification of the inflammatory and fibrotic components of NAFLD are likely to revolutionise this field. Long-term follow-up of patients with NAFLD will be required to determine the prognostic capabilities of multiparametric MR, and this should be the focus of future studies.