Potential conflict of interest: Dr. Chan advises and is on the speakers' bureau of Novartis and Bristol-Myers Squibb. He also advises Pharmasset and Schering-Plough. Dr. Sung advises and is on the speakers' bureau of AstraZeneca, GlaxoSmithKline, and Roche.
Nonalcoholic fatty liver disease (NAFLD) is one of the most common liver diseases in affluent countries. Accurate noninvasive tests for liver injury are urgently needed. The aim of this study was to evaluate the accuracy of transient elastography for the diagnosis of fibrosis and cirrhosis in patients with NAFLD and to study factors associated with discordance between transient elastography and histology. Two hundred forty-six consecutive patients from two ethnic groups had successful liver stiffness measurement and satisfactory liver biopsy specimens. The area under the receiver-operating characteristics curve (AUROC) of transient elastography for F3 or higher and F4 disease was 0.93 and 0.95, respectively, and was significantly higher than that of the aspartate aminotransferase–to–alanine aminotransferase ratio, aspartate aminotransferase–to–platelet ratio index, FIB-4, BARD, and NAFLD fibrosis scores (AUROC ranged from 0.62 to 0.81, P < 0.05 for all comparisons). At a cutoff value of 7.9 kPa, the sensitivity, specificity, and positive and negative predictive values for F3 or greater disease were 91%, 75%, 52%, and 97%, respectively. Liver stiffness was not affected by hepatic steatosis, necroinflammation, or body mass index. Discordance of at least two stages between transient elastography and histology was observed in 33 (13.4%) patients. By multivariate analysis, liver biopsy length less than 20 mm and F0-2 disease were associated with discordance. Conclusion: Transient elastography is accurate in most NAFLD patients. Unsatisfactory liver biopsy specimens rather than transient elastography technique account for most cases of discordance. With high negative predictive value and modest positive predictive value, transient elastography is useful as a screening test to exclude advanced fibrosis. Liver biopsy may be considered in NAFLD patients with liver stiffness of at least 7.9 kPa. (HEPATOLOGY 2010;51:454–462.)
Nonalcoholic fatty liver disease (NAFLD) is one of the most common chronic liver diseases worldwide.1 It is strongly associated with metabolic syndrome and obesity,2, 3 and may progress to cirrhosis and hepatocellular carcinoma.4, 5 The prognosis depends heavily on histological severity. Although patients with simple steatosis have excellent prognosis, those with nonalcoholic steatohepatitis tend to progress and have hepatic complications.6
Traditionally, liver biopsy is the gold standard for the assessment of hepatic necroinflammation and fibrosis. However, the procedure carries a small risk of complications and may not be acceptable to some patients. Because a standard liver biopsy sample only represents approximately 1/50,000 of the whole liver mass, sampling bias may occur. When both lobes of the liver underwent biopsy during bariatric surgery, fibrosis stage was discordant between the two samples in half of the cases.7 Noninvasive tests for NAFLD are urgently needed.8, 9
Transient elastography by Fibroscan is a noninvasive method for the diagnosis of liver fibrosis. It has a high degree of accuracy and reproducibility in predicting bridging fibrosis and cirrhosis in patients with viral hepatitis.10–13 Nevertheless, NAFLD patients are underrepresented in previous validation studies. Whether factors other than fibrosis, such as hepatic steatosis and prehepatic fat, may affect liver stiffness is uncertain. Factors associated with inaccurate measurements have not been evaluated.
In this study, we aimed to evaluate the accuracy of transient elastography and biochemical tests for the diagnosis of fibrosis and cirrhosis in a large cohort of NAFLD patients, and to test whether liver stiffness is altered by hepatic steatosis, inflammation, and obesity. We also aimed to identify factors associated with discordance between liver stiffness measurements (LSM) and histology.
ALT, alanine aminotransferase; AST, aspartate aminotransferase; AUROC, area under the receiver-operating characteristics curve; BMI, body mass index; CI, confidence interval; IQR, interquartile range; LSM, liver stiffness measurement; NAFLD, nonalcoholic fatty liver disease.
Patients and Methods
Consecutive patients with NAFLD undergoing liver biopsies at the University Hospital of Pessac, France, and Prince of Wales Hospital, Hong Kong, were prospectively recruited. We included patients age 18 years or older. Men who consumed more than 30 g alcohol per day and women who consumed more than 20 g alcohol per day were excluded. Patients with secondary causes of hepatic steatosis (such as chronic use of systemic corticosteroids), positive hepatitis B surface antigen, anti-hepatitis C virus antibody, or histological evidence of other concomitant chronic liver diseases were also excluded. Because the aim of transient elastography was to diagnose significant fibrosis and early cirrhosis, patients with clinical and radiological evidence of cirrhosis were excluded (for example, bilirubin ≥30 μmol/L, albumin <35 g/L, international normalized ratio >1.3, platelet count <150 × 10⁹/L, ascites, varices, splenomegaly). All patients gave informed written consent.
Comprehensive clinical assessment was performed. Co-morbid illness and drug/herb intake were recorded with a standard questionnaire. Anthropometric tests included body weight, body height, and waist circumference measurements. Body mass index (BMI) was calculated as weight (kg) divided by height (m) squared. Waist circumference was measured at a level midway between the lower rib margin and iliac crest with the tape all around the body in the horizontal position. On the day of liver biopsy, a fasting venous blood sample was taken for albumin, bilirubin, alanine aminotransferase (ALT), glucose, total cholesterol, and triglycerides.
In patients with complete biochemical data, the performance of transient elastography was compared with that of other prediction scores. The aspartate aminotransferase (AST)-to-platelet ratio index was calculated as AST (/upper limit of normal)/platelet count (×10⁹/L) × 100.14 FIB-4 was calculated as age (years) × AST (U/L)/[platelet count (×10⁹/L) × √ALT (U/L)].15 Cutoff values for NAFLD patients were adopted.16 The NAFLD fibrosis score was calculated according to the following formula: −1.675 + 0.037 × age (years) + 0.094 × BMI (kg/m2) + 1.13 × impaired fasting glycemia (IFG)/diabetes (yes = 1, no = 0) + 0.99 × AST/ALT ratio − 0.013 × platelet (×10⁹/L) − 0.66 × albumin (g/dL).17 The BARD score was the weighted sum of three variables (BMI ≥ 28 = 1 point, AST/ALT ratio ≥ 0.8 = 2 points, diabetes = 1 point).18
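The four biochemical scores described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function and parameter names are assumptions introduced here, the FIB-4 denominator uses the square root of ALT as in the standard definition of that index, and the example patient is hypothetical rather than taken from the cohort.

```python
import math


def apri(ast, ast_upper_limit, platelets):
    """AST-to-platelet ratio index: (AST / upper limit of normal) / platelets x 100."""
    return (ast / ast_upper_limit) / platelets * 100


def fib4(age, ast, alt, platelets):
    """FIB-4: age x AST / (platelets x sqrt(ALT))."""
    return age * ast / (platelets * math.sqrt(alt))


def nafld_fibrosis_score(age, bmi, ifg_or_diabetes, ast, alt, platelets, albumin_g_dl):
    """NAFLD fibrosis score using the coefficients quoted in the text."""
    return (-1.675
            + 0.037 * age
            + 0.094 * bmi
            + 1.13 * (1 if ifg_or_diabetes else 0)
            + 0.99 * (ast / alt)
            - 0.013 * platelets
            - 0.66 * albumin_g_dl)


def bard(bmi, ast, alt, diabetes):
    """BARD: BMI >= 28 (1 point), AST/ALT >= 0.8 (2 points), diabetes (1 point)."""
    return ((1 if bmi >= 28 else 0)
            + (2 if ast / alt >= 0.8 else 0)
            + (1 if diabetes else 0))


# Hypothetical patient (not from the study): age 50 years, AST 40 U/L,
# ALT 64 U/L, AST upper limit 40 U/L, platelets 200 x 10^9/L,
# BMI 30 kg/m2, diabetic, albumin 4.0 g/dL.
example_apri = apri(40, 40, 200)
example_fib4 = fib4(50, 40, 64, 200)
example_nfs = nafld_fibrosis_score(50, 30, True, 40, 64, 200, 4.0)
example_bard = bard(30, 40, 64, True)
```

Units follow the text: AST and ALT in U/L, platelets in ×10⁹/L, and albumin in g/dL for the NAFLD fibrosis score.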
In this study, liver histology served as the gold standard for evaluating the diagnostic accuracy of transient elastography. Percutaneous liver biopsy was performed using the 16G Temno or Menghini needle. Liver biopsy specimens were fixed in formalin and embedded in paraffin. Liver histology was assessed by experienced histopathologists (B.L.B., P.C.C.) who were blinded to the clinical data. Liver specimens shorter than 15 mm were excluded. Histological scoring was performed according to the system reported by Kleiner et al.19 Grade of steatosis was defined according to Kleiner et al.: 0 = steatosis < 5%, 1 = steatosis 5% to 33%, 2 = steatosis > 33% to 66%, 3 = steatosis > 66%. Fibrosis was staged from 0 to 4: stage 0 = absence of fibrosis; stage 1 = perisinusoidal or portal; stage 2 = perisinusoidal and portal/periportal; stage 3 = septal or bridging fibrosis; and stage 4 = cirrhosis.
LSM was performed within 1 week before liver biopsy by using transient elastography according to the instructions and training provided by the manufacturer. Measurements were performed on the right lobe of the liver through intercostal spaces with the patient lying in dorsal decubitus with the right arm in maximal abduction. Ten successful acquisitions were performed on each patient. The median value represented the liver elastic modulus. Only cases with 10 successful acquisitions were evaluated. The liver stiffness was expressed in kilopascals (kPa). The success rate was calculated as the number of successful measurements divided by the total number of measurements. The operators were blinded to all clinical data and the diagnoses of the patients.
Statistical tests were performed using the Statistical Package for Social Sciences version 16.0. Continuous variables were expressed as mean ± standard deviation or median (interquartile range [IQR]) as appropriate. Receiver-operating characteristics curves were constructed to assess the overall accuracy of LSM and to identify optimal cutoffs. The optimal cutoffs of LSM for F2, F3, and F4 disease were chosen at points with the highest Youden's index. The relationship between steatosis, NAFLD activity score, BMI, and LSMs was adjusted by fibrosis stage in a multiple linear regression model. Significant discordance between transient elastography and histology was defined as a difference in fibrosis stage by 2 points or more. In the assessment of discordance, both cutoff values identified in this study and those reported by Yoneda et al.20 were used. Quantitative variables between groups were compared by unpaired t test, Mann-Whitney U test, and one-way analysis of variance followed by Bonferroni test. Categorical variables were compared by chi-squared test or Fisher's exact test. The area under the receiver operating characteristics curves of different noninvasive tests was compared by the Delong test. All statistical tests were two-sided. Significance was taken as P < 0.05.
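Cutoff selection by Youden's index (J = sensitivity + specificity − 1) can be illustrated with a minimal Python sketch. The function name, the convention that a measurement at or above the cutoff counts as test-positive, and the ten stiffness values are all assumptions made for illustration; they are not the study data or the SPSS procedure the authors used.

```python
def youden_optimal_cutoff(values, has_disease):
    """Return (cutoff, J, sensitivity, specificity) with the highest Youden's index.

    A value >= cutoff is treated as test-positive (an assumption of this sketch).
    """
    n_pos = sum(1 for d in has_disease if d)
    n_neg = len(has_disease) - n_pos
    best = None
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, d in zip(values, has_disease) if d and v >= cutoff)
        true_neg = sum(1 for v, d in zip(values, has_disease) if not d and v < cutoff)
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        j = sensitivity + specificity - 1
        # Keep the first cutoff that reaches the maximum J.
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


# Hypothetical liver stiffness values (kPa) and advanced-fibrosis labels.
lsm_kpa = [4.8, 5.5, 6.1, 6.9, 7.4, 8.2, 9.1, 10.5, 12.0, 18.3]
advanced = [False, False, False, False, False, True, False, True, True, True]

best_cutoff, best_j, best_sens, best_spec = youden_optimal_cutoff(lsm_kpa, advanced)
```

Because Youden's index weights sensitivity and specificity equally, the cutoff it selects is not necessarily the one best suited to ruling disease out, which is why Table 2 also reports cutoffs fixed at greater than 90% sensitivity or specificity.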
From May 2003 to April 2009, 309 consecutive patients with NAFLD underwent transient elastography and liver biopsies. A total of 35 patients were excluded because of liver biopsy length less than 15 mm. Twenty-eight (10.2%) patients were excluded because of failure to obtain 10 valid LSM acquisitions. Two hundred forty-six patients with valid LSM acquisitions and satisfactory liver biopsy specimens were included in the analysis. Patients who failed LSM acquisitions had higher BMI (35.6 ± 6.3 versus 28.0 ± 4.5 kg/m2, P < 0.001) and waist circumference (114 ± 14 versus 94 ± 12 cm, P < 0.001). Valid LSM acquisitions were obtained in 62 of 63 (98.4%) patients with BMI less than 25 kg/m2, 114 of 117 (97.4%) patients with BMI 25 to 30 kg/m2, and 70 of 94 (74.5%) patients with BMI of 30 kg/m2 or higher. The rate of successful acquisitions at the same BMI was similar in whites and Chinese. Thirty-one (12.6%) and 25 (10.2%) patients had advanced fibrosis and cirrhosis, respectively (Table 1).
Table 1. Clinical Characteristics of NAFLD Patients With and Without Discordant Results
The LSMs of patients with F0, F1, F2, F3, and F4 disease were 5.7 ± 1.8, 6.8 ± 2.4, 7.8 ± 2.4, 11.8 ± 5.2, and 25.1 ± 17.1 kPa, respectively (P < 0.0001 by analysis of variance). Patients with F3 and F4 disease had significantly higher LSM than those with less fibrosis (Fig. 1). Overall, the accuracy of transient elastography to detect F2 or higher, F3 or higher, and F4 disease was good, with areas under the receiver operating curve (AUROCs) of 0.84, 0.93, and 0.95, respectively (Table 2). The corresponding AUROCs were 0.87, 0.94, and 0.94, respectively, in the French cohort, and 0.84, 0.92, and 0.97, respectively, in the Chinese cohort.
Table 2. Accuracy of Transient Elastography
For each fibrosis stage, cutoffs with sensitivity >90%, highest overall accuracy and specificity >90% were presented.
AUROC, area under the receiver-operating characteristics curve; LR, likelihood ratio; NPV, negative predictive value; PPV, positive predictive value.
The best LSM cutoff for F2 or greater disease was 7.0 kPa (Table 2). The negative predictive value to exclude F2 or greater disease was 84% (95% confidence interval [CI], 78%–90%). Cutoff values of 5.8 kPa and 9.0 kPa had greater than 90% sensitivity and specificity to rule out and rule in F2 disease, respectively.
The best cutoff for F3 or greater disease was 8.7 kPa (Table 2). The negative predictive value to exclude F3 or greater disease was 95% (95% CI, 91%–98%). Cutoff values of 7.9 and 9.6 kPa had greater than 90% sensitivity and specificity to rule out and rule in F3 disease, respectively.
The best cutoff for F4 disease was 10.3 kPa (Table 2). The negative predictive value to exclude cirrhosis was 99% (95% CI, 98%–100%). The same cutoff value also had greater than 90% sensitivity to rule out cirrhosis. A cutoff value of 11.5 kPa had greater than 90% specificity to detect cirrhosis.
Factors Affecting LSM.
Steatosis grade (P = 0.31), NAFLD activity score (P = 0.31), serum ALT (P = 0.39), and BMI (P = 0.29) did not influence LSM after adjusting for fibrosis stage (Fig. 2). Similarly, whites and Chinese had similar LSMs at the same fibrosis stage (P = 0.22).
Prevalence of Discordance and Risk Factors.
Discordance of at least two stages between transient elastography and histology was observed in 33 (13.4%) patients according to the cutoffs derived in this study. Transient elastography predicted a higher fibrosis stage in 30 cases and a lower fibrosis stage in three cases. Using cutoffs reported by Yoneda et al.,20 discordance was also observed in 33 (13.4%) patients. Transient elastography predicted a higher fibrosis stage in 23 cases and a lower fibrosis stage in 10 cases. Discordances occurred in seven (10.0%) patients with BMI of at least 30 kg/m2 and 26 (14.8%) patients with BMI less than 30 kg/m2. Thus, among 94 obese patients, 24 failed to have valid LSM acquisitions, and seven had discordant results. The overall success rate of transient elastography (valid measurements plus correct classification) in patients with BMI of at least 30 kg/m2 was 67.0%. Among 37 patients with BMI of 35 kg/m2 or higher, 15 (41%) failed LSM acquisition, and two (5%) had discordance between LSM and histology. To avoid overfitting, Yoneda's cutoffs were used to identify risk factors for discordance.
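The 67.0% figure for obese patients follows directly from the counts given above. A short Python sketch of the arithmetic is shown here; the function name is illustrative, and it assumes that failed acquisitions and discordant results do not overlap, as the text implies.

```python
def overall_success_rate(n_patients, n_failed_acquisition, n_discordant):
    """Share of patients with a valid measurement that was also concordant with histology."""
    n_success = n_patients - n_failed_acquisition - n_discordant
    return n_success / n_patients


# Patients with BMI of at least 30 kg/m2: 94 in total, 24 failed LSM, 7 discordant.
obese_success_rate = overall_success_rate(94, 24, 7)
obese_success_percent = round(obese_success_rate * 100, 1)
```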
By univariate analysis, younger age, Chinese ethnicity, lower fibrosis stage, and shorter liver biopsy lengths were associated with discordance (Table 1). Discordance occurred in 25 of 144 (17.4%) patients with liver biopsy lengths smaller than 20 mm, compared with 8 of 102 (7.8%) patients with liver biopsy lengths 20 mm or greater (P = 0.031). Similarly, discordance occurred in 32 of 190 (16.8%) patients with F0 to F2 disease, but only 1 of 56 (1.8%) patients with F3 or higher disease (P = 0.002). Conversely, performance indices of transient elastography were not associated with discordance. Discordance occurred in 30 of 231 (13.0%) patients with at least 60% valid LSM acquisitions out of all measurements and 3 of 15 (20.0%) patients with valid LSM acquisitions below 60% of all measurements (P = 0.43). Discordance occurred in 3 of 25 (12.0%) patients with IQR/LSM ratio above 0.3 and 30 of 221 (13.6%) patients with ratio below 0.3 (P = 1.0). By multivariate analysis, only liver biopsy length less than 20 mm (odds ratio, 2.7; 95% CI, 1.1–6.3; P = 0.024) and F3 or greater disease (odds ratio, 0.084; 95% CI, 0.011–0.63; P = 0.016) remained independent factors associated with discordance.
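The univariate comparison for biopsy length can be reproduced from the counts above with a short Python sketch. The text does not state which test generated each individual P value, so the uncorrected Pearson chi-squared test used here is an assumption, and the function names are illustrative. The crude odds ratio it computes is unadjusted and is therefore not expected to equal the multivariate estimate of 2.7.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-squared statistic and P value (1 df) for a 2 x 2 table.

    Table layout: [[a, b], [c, d]].
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def crude_odds_ratio(a, b, c, d):
    """Unadjusted odds ratio (a / b) / (c / d)."""
    return (a * d) / (b * c)


# Rows: biopsy length < 20 mm, biopsy length >= 20 mm.
# Columns: discordant, not discordant.
short_discordant, short_concordant = 25, 144 - 25
long_discordant, long_concordant = 8, 102 - 8

chi2_stat, p_value = pearson_chi2_2x2(
    short_discordant, short_concordant, long_discordant, long_concordant)
unadjusted_or = crude_odds_ratio(
    short_discordant, short_concordant, long_discordant, long_concordant)
```

Under that assumption the P value rounds to the reported 0.031, and the crude odds ratio of about 2.5 lies close to the adjusted value.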
Comparison Between Transient Elastography and Other Biochemical Tests.
As shown in Table 3, AUROC of transient elastography was significantly higher than that of AST/ALT ratio, AST-to-platelet ratio index, FIB-4, NAFLD fibrosis score, and BARD score in the diagnosis of both advanced fibrosis and cirrhosis. Among the biochemical tests, the FIB-4 index was superior to AST/ALT ratio (P = 0.0008), AST-to-platelet ratio index (P = 0.017), and BARD score (P = 0.021) for the detection of F3 or greater disease, and superior to AST/ALT ratio (P = 0.0061) and BARD score (P = 0.0031) for the detection of cirrhosis.
Table 3. Area Under the Receiver Operating Characteristics Curves for Transient Elastography and Noninvasive Markers for the Diagnosis of Advanced Fibrosis and Cirrhosis
P values refer to the comparison between transient elastography and other noninvasive tests.
ALT, alanine aminotransferase; APRI, AST-to-platelet ratio index; AST, aspartate aminotransferase; AUROC, area under the receiver-operating characteristics curve; CI, confidence interval.
In an ‘intention-to-treat’ analysis, all 274 patients who underwent transient elastography and liver biopsy were analyzed, and the 28 patients in whom LSM could not be obtained were considered as not correctly classified. At the cutoff of 8.7 kPa, the negative predictive value of transient elastography in excluding F3 or greater disease remained high at 89.3% (Table 4). However, the positive predictive value was modest at 48.5%. Although the combined sensitivities and specificities of the biochemical tests were generally lower than those of transient elastography, they also had negative predictive values over 80% at the low cutoff values (Table 4).
Table 4. Comparative Performance of Noninvasive Tests for the Diagnosis of Advanced Fibrosis in 274 NAFLD Patients
In this “intention to treat” analysis, patients in whom ten valid liver stiffness measurements could not be acquired were included and counted as incorrect classification by transient elastography.
Two strategies using transient elastography to diagnose and exclude advanced fibrosis in NAFLD patients were evaluated: (1) use a single cutoff with the best overall sensitivity and specificity; (2) use two cutoffs with high sensitivity and specificity.
If liver biopsies were reserved for patients with LSM of at least 8.7 kPa, only 79 (32.1%) would require the procedure. Eight (4.8%) patients with F3 disease and one (0.6%) patient with cirrhosis would be missed (Table 2).
When two cutoffs were used, 148 (60.2%) patients had LSM below the low cutoff of 7.9 kPa, and the negative predictive value was 96.6% (95% CI, 93.7%–99.5%) (Table 2, Fig. 3). Fifty-eight (23.6%) patients had LSM above the high cutoff of 9.6 kPa, and the positive predictive value was 72.4% (95% CI, 60.9%–83.9%). If liver biopsies were performed only in patients with LSMs between 7.9 and 9.6 kPa, 40 (16.3%) patients required the procedure. Five (3.4%) patients with F3 disease would be missed, and 16 (27.6%) patients without advanced fibrosis would be misclassified. Alternatively, if liver biopsies were performed in all patients with LSM of 7.9 kPa or above, 98 (39.8%) would require the procedure.
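The counts reported above are sufficient to rebuild the 2 × 2 table at the 7.9 kPa cutoff and to check the quoted predictive values. The Python sketch below assumes that the 56 patients with advanced fibrosis are the 31 with F3 and 25 with F4 disease reported earlier, and that the confidence intervals are normal-approximation (Wald) intervals, which is an inference from the numbers rather than something the text states. The function names and the handling of values exactly at a cutoff are illustrative assumptions.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def wald_interval(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def triage(lsm_kpa, low_cutoff=7.9, high_cutoff=9.6):
    """Two-cutoff strategy; boundary handling is an assumption of this sketch."""
    if lsm_kpa < low_cutoff:
        return "advanced fibrosis unlikely"
    if lsm_kpa > high_cutoff:
        return "advanced fibrosis likely"
    return "indeterminate, consider liver biopsy"


# Reconstructed counts at the 7.9 kPa cutoff (246 patients, 56 with F3 or F4).
n_below, n_missed, n_advanced, n_total = 148, 5, 56, 246
tn = n_below - n_missed               # 143 without advanced fibrosis below cutoff
fn = n_missed                         # 5 with F3 disease below cutoff
tp = n_advanced - fn                  # 51 with advanced fibrosis at or above cutoff
fp = (n_total - n_below) - tp         # 47 without advanced fibrosis at or above cutoff

metrics_low_cutoff = diagnostic_metrics(tp, fp, fn, tn)
npv_ci = wald_interval(tn, n_below)
```

The reconstructed table gives values that round to the 91% sensitivity, 75% specificity, 52% positive predictive value, and 97% negative predictive value quoted for the 7.9 kPa cutoff.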
In this large prospective cohort of NAFLD patients, transient elastography had high accuracy in detecting advanced fibrosis and cirrhosis. Successful measurement could be obtained in more than 97% of patients with BMI below 30 kg/m2 and 75% of obese patients. LSM was not affected by hepatic steatosis, necroinflammation, or obesity. Most discordance between transient elastography and histology occurred in patients with short liver biopsy lengths and mild or no fibrosis. In addition, transient elastography had superior performance to other noninvasive biochemical tests in diagnosing advanced fibrosis and cirrhosis.
Transient elastography has been validated in chronic viral hepatitis, cholestatic liver disease, and alcoholic liver disease.11, 21–24 According to a meta-analysis of nine studies, the pooled estimates of sensitivity and specificity are 87% and 91%, respectively, for cirrhosis and 70% and 84%, respectively, for F2 or higher disease.10 In a study of 97 Japanese patients with NAFLD, transient elastography had good overall accuracy, with AUROCs of 0.87, 0.90, and 0.99 for F2 or greater, F3 or greater, and F4 diseases, respectively.20 Similar levels of accuracy and reproducibility were observed in 50 pediatric NAFLD patients in Italy.25 However, in that study, only 10 patients had F3 or higher fibrosis. Our study confirms that transient elastography works well in both white and Asian NAFLD patients. At cutoff values of 8.7 and 10.3 kPa, the negative predictive values to exclude F3 or greater disease and cirrhosis were 95% and 99%, respectively. The adoption of transient elastography could potentially spare two thirds of NAFLD patients from liver biopsies. Because the prevalence of NAFLD is high in many affluent countries, this approach would be cost saving.
Conversely, the positive predictive value of transient elastography and other noninvasive tests to diagnose advanced fibrosis in NAFLD patients remains modest. Therefore, the main value of these tests is to exclude advanced fibrosis as screening tests. Based on our data, it is reasonable to consider liver biopsy in patients whose LSM is 7.9 kPa or above. When transient elastography is not available, the biochemical tests reported in this study are reasonable screening tests despite a lower overall accuracy.
LSM has been shown to be spuriously increased in patients with acute hepatitis and extrahepatic cholestasis, indicating that the stiffness of the liver is not attributable to fibrosis alone.26–29 One unique feature of NAFLD patients is the accumulation of subcutaneous, prehepatic, and hepatic fat. Whether this would affect LSM has major clinical implications. In patients with chronic hepatitis C, hepatic steatosis does not appear to influence LSM, although patients with severe steatosis were underrepresented.26 In this study, we clearly showed that hepatic steatosis did not increase LSM in NAFLD subjects. Although subcutaneous and prehepatic fat thickness was not measured, patients with high BMI also did not have increased LSM after adjusting for fibrosis stage. Moreover, ALT level and the NAFLD activity score did not influence LSM. This is likely because severe necroinflammation is rare in NAFLD subjects, and a milder degree of necroinflammation has no major impact on LSM. Besides, ALT level in NAFLD subjects mainly reflects the degree of hepatic steatosis and correlates poorly with necroinflammation.30
In patients with chronic hepatitis C, discordance between transient elastography and histology occurs more commonly if the IQR/LSM ratio is high.31 In our study, discordance occurred mainly in patients with shorter liver biopsy lengths and lower fibrosis stages. Both factors indicate that the discordance was attributable to understaging by histology as a result of sampling bias. One possible explanation of the phenomenon is that the distribution of fibrous tissue may be less even in NAFLD patients. In a study of 41 subjects undergoing right-lobe and left-lobe liver biopsies during bariatric surgery, the kappa coefficient for fibrosis staging was only 0.53.7 Although we cannot recommend relying on transient elastography regardless of the IQR/LSM ratio because most of our patients had IQR/LSM ratio less than 0.3 at inclusion, our study serves as a reminder that when a noninvasive test disagrees with histological results, the latter may be inaccurate. By mathematical modeling, the AUROC of a noninvasive test is limited by the biopsy sensitivity and specificity even if the test has perfect accuracy.32
Our study has several limitations. First, liver biopsy was used as the gold standard, and liver biopsy specimens were assessed by two pathologists. Sampling bias could not be excluded. However, liver biopsy is currently the only reference standard, and biopsy specimens were assessed by experienced pathologists. Second, patients recruited at referral centers likely had more advanced disease. However, the negative predictive values to exclude advanced fibrosis and cirrhosis would be even higher in the primary care setting. The inclusion of both whites and Chinese further increases the external validity of this study. Third, a significant proportion of obese subjects were not analyzed because of failed LSM. The problem may be solved in the future with the development of probes for obese subjects. In a study of 84 obese subjects, at least five measurements could be acquired in over 90% by using the new obese probe, compared with less than 80% by using the standard probe.33
In conclusion, transient elastography can be performed in most NAFLD patients and is accurate. The measurement and accuracy are not affected by hepatic steatosis, necroinflammation, or obesity. Unsatisfactory liver biopsy specimens rather than transient elastography technique account for most cases of discordance. With high negative predictive value and modest positive predictive value, transient elastography is useful as a screening test to exclude advanced fibrosis.