Aliment Pharmacol Ther 31, 1095–1103
Background Non-invasive assessments of liver fibrosis in chronic hepatitis B were not well established.
Aim To develop a combined algorithm of liver stiffness measurement (LSM) and serum test formula to predict advanced liver fibrosis in chronic hepatitis B.
Methods We previously reported an alanine aminotransferase (ALT)-based LSM algorithm for liver fibrosis in 156 chronic hepatitis B patients, who formed the training cohort to evaluate the performance of the aspartate aminotransferase (AST)-to-platelet ratio index (APRI), Forns index, FIB-4 and Fibroindex against liver histology. The best combined LSM–serum formula algorithm was then validated in another cohort of 82 chronic hepatitis B patients.
Results In the training cohort, LSM had the best performance in diagnosing advanced (≥F3) fibrosis [area under the receiver operating characteristics curve (AUROC) 0.88, 95% confidence interval (CI) 0.85–0.91], while Forns index had the best performance among the various serum test formulae (AUROC 0.70, 95% CI 0.62–0.78). In the combined algorithm, low LSM or low Forns index could be used to exclude advanced fibrosis as both of them had high sensitivity (>90%). To confirm advanced fibrosis, agreement between high LSM and high Forns index could improve the specificity (from 99% to 100% and from 87% to 100% in the training and validation cohorts respectively).
Conclusion A combined LSM–Forns algorithm can improve the accuracy to predict advanced liver fibrosis in chronic hepatitis B.
Chronic hepatitis B (CHB) is the most common cause of liver cirrhosis and hepatocellular carcinoma in most Asian countries.1 Early diagnosis of advanced liver fibrosis and prompt anti-viral treatment can potentially reverse the fibrosis and reduce the risk of cirrhotic complications.2, 3 Recent treatment guidelines have clearly stated the importance of liver fibrosis as a pre-treatment assessment of CHB.4–6 Furthermore, determination of advanced liver fibrosis and early cirrhosis is important to stratify the risk of patients for surveillance of hepatocellular carcinoma.7, 8 Liver biopsy has long been the gold standard of liver fibrosis assessment. Unfortunately, it is an invasive procedure that carries a definite, albeit very low, risk of bleeding and other complications.
Various serum test-based formulae have been developed to predict liver fibrosis. Some of these formulae (for example, Fibrotest and Zeng index) require the measurement of uncommon and expensive laboratory parameters, such as α2-macroglobulin, apolipoprotein A1 and hyaluronic acid, which are not readily available in most laboratories.9, 10 Other investigators have developed mathematical formulae in chronic hepatitis C patients on the basis of common laboratory tests, including aspartate aminotransferase (AST)-to-platelet-ratio-index (APRI),11 Forns index,12 FIB-4,13 and Fibroindex14 (Table S1). We have derived and validated an index composed of body mass index, albumin, platelet and alanine aminotransferase (ALT) level (Hui index) based on 235 treatment-naïve CHB patients with a high negative predictive value (NPV, 0.92) to exclude significant (Ishak F3 or above) liver fibrosis.15 Indexes have also been derived in China and Korea combining ultrasonic features such as spleen size with clinical and laboratory parameters to predict CHB cirrhosis (METAVIR F4 fibrosis).16, 17
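For orientation, three of the formulae named above can be sketched in a few lines. Table S1 is not reproduced here, so the coefficients below are the commonly cited published forms rather than values taken from this text, and the function and parameter names are illustrative assumptions. The Forns index is conventionally computed with cholesterol in mg/dL (1 mmol/L is about 38.67 mg/dL).

```python
import math


def apri(ast_iu_l: float, ast_uln_iu_l: float, platelets_10e9_l: float) -> float:
    """AST-to-platelet ratio index: AST as a multiple of ULN, divided by platelets, times 100."""
    return (ast_iu_l / ast_uln_iu_l) / platelets_10e9_l * 100


def fib4(age_years: float, ast_iu_l: float, alt_iu_l: float, platelets_10e9_l: float) -> float:
    """FIB-4: (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_iu_l) / (platelets_10e9_l * math.sqrt(alt_iu_l))


def forns_index(platelets_10e9_l: float, ggt_iu_l: float,
                age_years: float, cholesterol_mg_dl: float) -> float:
    """Forns index in its commonly cited form (natural logarithms, cholesterol in mg/dL)."""
    return (7.811
            - 3.131 * math.log(platelets_10e9_l)
            + 0.781 * math.log(ggt_iu_l)
            + 3.467 * math.log(age_years)
            - 0.014 * cholesterol_mg_dl)
```

None of these three formulae requires the special assays mentioned above; all inputs come from routine blood tests and the patient's age.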
Transient elastography (Fibroscan, Echosens, Paris, France) is a non-invasive method based on shear wave technology to measure liver stiffness. It has been proven by numerous investigators to be a reproducible and accurate measurement of liver fibrosis, particularly for advanced fibrosis and early cirrhosis.18–20 One limitation of transient elastography is the increase in liver stiffness measurement (LSM) with higher ALT levels regardless of the fibrosis staging.20, 21 Owing to the very different mechanisms and limitations of serum test formulae and transient elastography, there is increasing enthusiasm to combine these two test modalities so as to increase the accuracy of fibrosis prediction.22 Recently, a sequential algorithm has been proposed using a trans-abdominal ultrasonic examination for hepatic nodularity and APRI to predict advanced liver fibrosis in chronic hepatitis C.23 Unfortunately, similar investigation in CHB is lacking.
In the current study, we aimed to investigate and validate the serum test formulae and transient elastography to predict advanced liver fibrosis in CHB. As we did not have detailed ultrasonic assessment at the time of liver biopsy, we could not assess the accuracy of formulae involving ultrasonic parameters. We also aimed to develop a combined algorithm using the best performing serum test formula and transient elastography for advanced liver fibrosis in CHB.
Patients and methods
Since 2006, we have been pursuing a prospective study evaluating the use of transient elastography against histological liver fibrosis in CHB patients.21 These patients formed the training cohort to evaluate the performance of non-invasive assessments of liver fibrosis in the current study. Patients who had a serum ALT level above 5 times the upper limit of normal (ULN), co-infection with hepatitis C virus, other liver diseases, decompensated liver cirrhosis or hepatocellular carcinoma were excluded. All patients were treatment-naïve at the time of assessment. All patients gave written informed consent for the study and the protocol was approved by the local ethics committee. All patients received comprehensive clinical and laboratory assessment, including liver biochemistry and fasting cholesterol level, at the time of liver biopsy.
Liver biopsy examination
Percutaneous liver biopsy was performed using the 16G Temno needle. Each liver biopsy specimen was assessed independently by two histopathologists (P.C.L.C., A.W.H.C.) without knowledge of the clinical data. A liver sample was considered adequate if it was longer than 15 mm and contained at least six portal tracts. Liver fibrosis was evaluated semi-quantitatively according to the METAVIR scoring system as follows: F0, no fibrosis; F1, portal fibrosis without septa; F2, portal fibrosis and few septa; F3, numerous septa without cirrhosis; and F4, cirrhosis.24 Advanced fibrosis was defined as METAVIR fibrosis score ≥3.
Liver stiffness measurement
Transient elastography was performed within 1 week from the liver biopsy examination according to the instructions and training provided by the manufacturer.25 Ten successful acquisitions were performed on each patient. The success rate was calculated as the ratio of the number of successful acquisitions to the total number of acquisitions. The median value of LSM (in kilopascal, kPa) was kept as representative of the liver elastic modulus. The LSM was considered reliable only if 10 successful acquisitions were obtained, with an interquartile range ≤30% of the median LSM and a success rate >60%.
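The reliability criteria above can be expressed as a short check. This is a minimal sketch, not the device's own software: the names `LsmSummary` and `summarise_lsm` are hypothetical, the quartile convention (`method="inclusive"`) is an assumption, and "10 successful acquisitions" is read as "at least 10".

```python
from statistics import median, quantiles
from typing import NamedTuple, Sequence


class LsmSummary(NamedTuple):
    median_kpa: float    # reported liver stiffness (median of successful acquisitions)
    iqr_ratio: float     # interquartile range divided by the median
    success_rate: float  # successful acquisitions / total acquisitions
    reliable: bool


def summarise_lsm(successful_kpa: Sequence[float], total_attempts: int) -> LsmSummary:
    """Summarise one examination and apply the three reliability criteria."""
    n_ok = len(successful_kpa)
    if n_ok == 0 or total_attempts < n_ok:
        raise ValueError("need at least one success and total_attempts >= successes")
    if min(successful_kpa) <= 0:
        raise ValueError("stiffness readings must be positive")
    med = median(successful_kpa)
    if n_ok >= 2:
        # Quartile convention is an assumption; other conventions differ slightly.
        q1, _, q3 = quantiles(successful_kpa, n=4, method="inclusive")
        iqr = q3 - q1
    else:
        iqr = 0.0
    iqr_ratio = iqr / med
    success_rate = n_ok / total_attempts
    reliable = n_ok >= 10 and iqr_ratio <= 0.30 and success_rate > 0.60
    return LsmSummary(med, iqr_ratio, success_rate, reliable)
```

For example, ten tightly clustered readings obtained in 12 attempts would pass, whereas the same readings obtained in 20 attempts would fail on success rate (50%).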
Development of combined algorithm of serum test formulae and transient elastography
We have developed an ALT-based algorithm of LSM to predict advanced liver fibrosis and cirrhosis in 156 CHB patients.21 In the LSM algorithm, an LSM of ≤6.0 and ≤7.5 kPa could exclude advanced (F3) fibrosis in patients with normal and elevated (1–5 times ULN) ALT respectively. And an LSM of >9.0 and >12.0 kPa could diagnose advanced fibrosis in patients with normal ALT and elevated ALT respectively. This cohort of 156 patients formed the training set of the current study to evaluate the performance of the various serum test formulae. Eighty-two newly recruited CHB patients who had liver biopsy performed formed the validation set. The performance of the ALT-based transient elastography algorithm and the various serum test formulae would be validated by this validation cohort.
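The ALT-based cutoffs just described can be written as a small decision function. This is a sketch under stated assumptions: ALT is passed as a multiple of ULN, "normal" is taken to mean ALT ≤ 1 × ULN, and the function name and return labels are illustrative only.

```python
def classify_by_lsm(lsm_kpa: float, alt_ratio_uln: float) -> str:
    """Apply the ALT-based LSM cutoffs for advanced (>=F3) fibrosis.

    alt_ratio_uln is ALT expressed as a multiple of the upper limit of normal.
    """
    if alt_ratio_uln > 5:
        # Patients with ALT above 5 times ULN were outside the scope of the algorithm.
        raise ValueError("algorithm not defined for ALT > 5 x ULN")
    if alt_ratio_uln <= 1:          # normal ALT (assumed to mean <= ULN)
        low, high = 6.0, 9.0
    else:                           # elevated ALT, >1 to 5 times ULN
        low, high = 7.5, 12.0
    if lsm_kpa <= low:
        return "excluded"
    if lsm_kpa > high:
        return "confirmed"
    return "indeterminate"
```

The same reading can fall into different categories depending on ALT: 7.0 kPa is indeterminate with normal ALT but excludes advanced fibrosis when ALT is elevated.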
The best-performing serum test formula and the ALT-based transient elastography algorithm would be sequentially applied in a diagnostic model. The low-risk patients among whom advanced fibrosis could be confidently excluded and the high-risk patients among whom advanced fibrosis could be confidently confirmed by either test would be identified. For the patients with intermediate nondiagnostic results of either test, we would assess whether the other test could assist in the classification of the fibrosis stage. The remaining patients with inadequate diagnostic certainty by the combined algorithm would be classified into the grey zone potentially requiring liver biopsy for staging.
The diagnostic performance of the different serum test formulae was assessed by the area under the receiver operating characteristics curve (AUROC). The cutoff values of the best-performing serum test formula, defined as the formula with the highest AUROC, would be computed based on a >90% sensitivity to exclude and >90% specificity to confirm advanced liver fibrosis in the training and the validation cohorts. Sensitivity, specificity, positive predictive values (PPVs), NPVs, and positive and negative likelihood ratios for these cutoff values of the combined LSM and serum test formula algorithm would be calculated for the training and validation cohorts. A confirmatory strategy for advanced fibrosis was defined as accurate if the post-test probability was above 90%, and an exclusion strategy was defined as accurate if the post-test probability was <10%.
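The measures listed above all follow from a 2 × 2 table of test result against histology. The sketch below uses the standard definitions; the function names and the example counts are illustrative assumptions and are not taken from the study data.

```python
import math


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard accuracy measures from true/false positives and negatives."""
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("each row and column of the 2 x 2 table must be non-empty")
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Likelihood ratios become infinite when the denominator is zero.
        "lr_pos": sens / (1 - spec) if spec < 1 else math.inf,
        "lr_neg": (1 - sens) / spec if spec > 0 else math.inf,
    }


def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0 < pre_test_prob < 1:
        raise ValueError("pre_test_prob must lie strictly between 0 and 1")
    if math.isinf(likelihood_ratio):
        return 1.0
    post_odds = pre_test_prob / (1 - pre_test_prob) * likelihood_ratio
    return post_odds / (1 + post_odds)
```

With a hypothetical table of 45 true positives, 5 false positives, 5 false negatives and 45 true negatives and a pre-test probability of 50%, a positive result gives a post-test probability of about 90% and a negative result about 10%, which are exactly the two accuracy thresholds defined above.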
The clinical characteristics, disease distribution and laboratory parameters of the validation cohort were generally well matched with those of the training cohort, except that there were more patients with elevated ALT and more portal tracts were present in the liver biopsy samples in the validation cohort (Table 1). In both the training and validation cohorts, most patients had elevated ALT levels. There was no statistical difference in the severity of liver fibrosis between the training and validation cohorts.
Table 1. Clinical and laboratory characteristics of the training and validation cohorts
|Characteristic|Training cohort|Validation cohort|P-value|
|Number of patients|156|82||
|Male gender|119 (76%)|71 (87%)|0.06|
|Age (year)|45 ± 11|42 ± 12|0.21|
|Body mass index (kg/m²)|24 ± 3|24 ± 4|0.40|
|Platelet (×10^9/L)|210 ± 56|209 ± 45|0.06|
|Albumin (g/L)|43 ± 5|43 ± 3|0.05|
|Gamma globulin (g/L)|37 ± 6|37 ± 4|0.009|
|Total bilirubin (μmol/L)|15 ± 13|16 ± 8|0.54|
|Alanine aminotransferase (IU/L)|83 ± 53|123 ± 67|0.01|
|Normal ALT|58 (37%)|5 (6%)||
|ALT >1–5 times ULN|98 (63%)|77 (94%)||
|Aspartate aminotransferase (IU/L)|54 ± 39|75 ± 42|0.22|
|Gamma-glutamyl transpeptidase (IU/L)|52 ± 75|59 ± 41|0.43|
|Total cholesterol (mmol/L)|5.0 ± 1.1|5.0 ± 1.0|0.89|
|Length of liver biopsy (mm)|19 ± 4|20 ± 4|0.36|
|Number of portal tracts|10 ± 5|16 ± 8|<0.001|
|METAVIR fibrosis score|2 (0–4)|2 (0–4)|0.88|
|F0|10 (6%)|5 (6%)||
|F1|29 (19%)|29 (35%)||
|F2|43 (27%)|27 (33%)||
|F3|34 (22%)|5 (6%)||
|F4|40 (26%)|16 (20%)||
Performance of LSM algorithm and serum test formulae
LSM algorithm had better performance to predict advanced fibrosis than any of the serum test formulae (Figure 1). The AUROC was 0.88 [95% confidence interval (CI) 0.85–0.91, P < 0.001] in the training cohort. The sensitivity of LSM algorithm to exclude advanced fibrosis (at LSM ≤6.0 kPa for normal ALT and ≤7.5 kPa for elevated ALT) was 95%, with specificity, PPV and NPV of 58%, 70% and 92% respectively. At the high cutoff of the LSM algorithm (LSM >9.0 kPa for normal ALT and >12.0 kPa for elevated ALT), the specificity to confirm advanced fibrosis was 99%, with sensitivity, PPV and NPV of 54%, 98% and 67% respectively. The superior performance of LSM algorithm to other serum test formulae was confirmed in the validation cohort (AUROC 0.80; 95% CI 0.68–0.92, P < 0.001).
Among the various serum test formulae, Forns index had the best performance to diagnose advanced fibrosis. The AUROC was 0.70 (95% CI 0.62–0.78, P < 0.001) in the training cohort and 0.72 (95% CI 0.60–0.85, P < 0.001) in the validation cohort (Table S2). The optimal cutoff values of Forns index in the training cohort were 5.2 to exclude advanced fibrosis and 8.4 to confirm advanced fibrosis. At a Forns index of ≤5.2, the sensitivity to exclude advanced fibrosis in the training cohort was 99%, with specificity, PPV and NPV of 26%, 59% and 95% respectively. At a Forns index of >8.4, the specificity to confirm advanced fibrosis in the training cohort was 91%, with sensitivity, PPV and NPV of 28%, 76% and 55% respectively (Table 2).
Table 2. Performance of the serum test formulae and liver stiffness measurement
|Test|Measure|Training cohort, exclusion strategy|Training cohort, confirmatory strategy|Validation cohort, exclusion strategy|Validation cohort, confirmatory strategy|
||Positive predictive value|52|62|26|27|
||Negative predictive value|67|60|100|79|
|Forns index|Positive predictive value|59|76|29|69|
||Negative predictive value|95|55|100|83|
||Positive predictive value|66|50|41|50|
||Negative predictive value|56|49|84|76|
||Positive predictive value|55|54|27|28|
||Negative predictive value|66|58|86|85|
||Positive predictive value|68|67|41|43|
||Negative predictive value|59|50|84|76|
|Liver stiffness measurement|Sensitivity|95|54|81|43|
||Positive predictive value|70|98|71|53|
||Negative predictive value|92|67|61|82|
Combined LSM–Forns algorithm for advanced liver fibrosis
As the Forns index had the best diagnostic performance for advanced fibrosis among the serum test formulae, a diagnostic algorithm combining LSM and Forns index was developed. As both LSM and Forns index had high sensitivity to exclude advanced fibrosis in the training cohort (95% and 99% respectively) and the high sensitivity could be confirmed in the validation cohort (91% and 99% respectively), either low LSM or low Forns index was used to exclude advanced fibrosis in the combined algorithm (Table S3). Using this combined LSM–Forns algorithm, the sensitivity and NPV remained high (>90%). In patients without advanced fibrosis (F < 3), the number of biopsies correctly avoided, compared with the LSM algorithm alone, increased from 44/82 (54%) to 49/82 (60%) in the training cohort, and from 37/61 (61%) to 40/61 (66%) in the validation cohort. On the other hand, the number of incorrect diagnoses made by the combined LSM–Forns algorithm remained the same as with the LSM algorithm alone [4/82 (5%) in the training cohort, and 4/61 (7%) in the validation cohort; Table 3].
Table 3. Performance of the LSM algorithm alone and the combined LSM–Forns algorithm
|Cohort|Measure|Exclusion strategy, LSM algorithm|Exclusion strategy, LSM–Forns algorithm|Confirmatory strategy, LSM algorithm|Confirmatory strategy, LSM–Forns algorithm|
|Training cohort|Sensitivity (%)|95|95|54|24|
||Positive predictive value (%)|70|68|98|100|
||Negative predictive value (%)|92|92|67|59|
||Positive likelihood ratio|2.3|2.4|54.0|∞|
||Negative likelihood ratio|0.1|0.1|0.5|0.8|
||No. biopsy correctly avoided (%)|44/82 (54)|49/82 (60)|43/74 (58)|18/74 (24)|
||No. incorrect diagnosis (%)|4/82 (5)|4/82 (5)|1/74 (1)|0/74 (0)|
|Validation cohort|Sensitivity (%)|81|95|43|29|
||Positive predictive value (%)|71|67|53|100|
||Negative predictive value (%)|61|92|82|80|
||Positive likelihood ratio|2.1|2.2|3.3|∞|
||Negative likelihood ratio|0.3|0.1|0.7|0.7|
||No. biopsy correctly avoided (%)|37/61 (61)|40/61 (66)|9/21 (43)|6/21 (29)|
||No. incorrect diagnosis (%)|4/61 (7)|4/61 (7)|8/21 (38)|0/21 (0)|
To confirm advanced fibrosis, the specificity of the LSM algorithm did not remain high in the validation cohort (87%), while the specificity of Forns index alone was reasonably high (91%). Therefore, agreement between high LSM and high Forns index was required to achieve a good diagnostic performance in the combined LSM–Forns algorithm. The specificity/PPV of the LSM–Forns algorithm increased from 99%/98% to 100%/100% in the training cohort and from 87%/53% to 100%/100% in the validation cohort as compared with the LSM algorithm alone. In patients with advanced fibrosis, the number of biopsies correctly avoided, compared with the LSM algorithm alone, decreased from 43/74 (58%) to 18/74 (24%) patients in the training cohort and from 9/21 (43%) to 6/21 (29%) patients in the validation cohort. On the other hand, the number of incorrect diagnoses decreased from 1/74 (1%) to 0/74 (0%) patients in the training cohort and from 8/21 (38%) to 0/21 (0%) patients in the validation cohort (Table 3).
Overall, using this combined LSM–Forns algorithm, liver biopsy could be correctly avoided in 67/156 (43%) and 46/82 (56%) patients in the training and validation cohorts respectively, as compared with 87/156 (56%) and 46/82 (56%) patients with the LSM algorithm alone. On the other hand, incorrect diagnosis was made by the LSM–Forns algorithm in 4/156 (3%) and 4/82 (5%) patients in the training and validation cohorts respectively, as compared with 5/156 (3%) and 12/82 (15%) patients by the LSM algorithm alone.
In this study, we demonstrated that LSM with transient elastography had the best diagnostic performance for advanced fibrosis among all non-invasive assessments of liver fibrosis. The performance of various serum test formulae was less satisfactory, with Forns index being the best among them. The diagnostic performance of LSM to confirm advanced fibrosis could be further improved in the combined LSM–Forns algorithm, and its performance was probably superior to other non-invasive assessments of liver fibrosis in CHB patients reported so far.26
Liver stiffness measurement alone has good, but not excellent, diagnostic performance. The diagnostic performance was particularly affected in patients with elevated serum ALT levels.20 Hence, a second non-invasive test independent of the serum ALT or AST levels would be a good supplementary test to LSM. Among various serum test formulae, Forns index12 and Hui index15 are composed of clinical parameters other than ALT or AST levels. While the performance of these two formulae was similar in the training cohort, Forns index had better performance in the validation cohort and was therefore chosen as a supplementary index in the combined algorithm (Table 2). Although the performance of APRI in CHB was reported to be good in a Korean report and modest in a Chinese and two Singaporean series,27–30 its performance in the current study was unsatisfactory. This might be explained by the majority of patients having elevated ALT and AST levels in the current study, such that the performance of APRI was affected. After all, ALT and AST are surrogates of hepatic necro-inflammation, which tend to fluctuate significantly in CHB and might not reflect the fibrosis staging as accurately as in other liver diseases.
During the development of the combined algorithm, we aimed to derive a diagnostic model of approximately 90% accuracy. This was because liver biopsy, which served as the gold standard, is imperfect and has its own limitations such as sampling error31 and intra-observer and inter-observer variability.32 Therefore, even a perfect non-invasive marker can only achieve an AUROC of approximately 0.90 with reference to liver biopsy.33 As both LSM and Forns index had high sensitivity of above 90% to exclude advanced fibrosis at the low cutoff in both the training and validation cohorts, either low LSM or low Forns index was used to exclude advanced fibrosis in the combined algorithm. This ‘EITHER-OR’ approach increased the number of biopsies correctly avoided without increasing the number of incorrect diagnoses. On the other hand, the specificity of LSM tended to fall in the validation cohort (from 99% to 87%) when it was used to confirm advanced fibrosis at the high cutoff value. Agreement between high LSM and high Forns index improved the specificity and PPV. This ‘AND’ approach could decrease the number of incorrect diagnoses of advanced fibrosis at the expense of fewer patients avoiding liver biopsy (Figure 2).
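The ‘EITHER-OR’ and ‘AND’ rules can be summarised in a few lines of code. This is a sketch of the logic as described in the text, not a reproduction of Figure 2: the function names and return labels are hypothetical, ALT is assumed to be supplied as a multiple of ULN, and normal ALT is assumed to mean ≤ 1 × ULN.

```python
FORNS_LOW, FORNS_HIGH = 5.2, 8.4  # Forns cutoffs derived in the training cohort


def lsm_cutoffs(alt_ratio_uln: float) -> tuple:
    """Return the (low, high) LSM cutoffs in kPa for the given ALT level."""
    if alt_ratio_uln > 5:
        raise ValueError("algorithm not defined for ALT > 5 x ULN")
    return (6.0, 9.0) if alt_ratio_uln <= 1 else (7.5, 12.0)


def combined_lsm_forns(lsm_kpa: float, forns: float, alt_ratio_uln: float) -> str:
    """Classify a patient with the combined LSM-Forns rules."""
    lsm_low, lsm_high = lsm_cutoffs(alt_ratio_uln)
    # EITHER-OR: a low result on either test excludes advanced fibrosis.
    if lsm_kpa <= lsm_low or forns <= FORNS_LOW:
        return "advanced fibrosis excluded"
    # AND: both tests must be high to confirm advanced fibrosis.
    if lsm_kpa > lsm_high and forns > FORNS_HIGH:
        return "advanced fibrosis confirmed"
    # Everything else is the grey zone, where liver biopsy may still be needed.
    return "grey zone"
```

Because exclusion needs at least one low result and confirmation needs two high results, the two rules cannot both apply to the same patient, so the order of the checks does not affect the outcome.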
Although Forns index had the best diagnostic performance for advanced fibrosis among various serum test formulae, the optimal cutoff values in the current study (5.2 and 8.4) were higher than those proposed in the original study among chronic hepatitis C patients (4.2 and 6.9).12 This implied that findings from studies in chronic hepatitis C patients could not be directly extrapolated to CHB patients. Our findings echoed a report by Wai et al.,28 in which none of the serum test formulae derived from chronic hepatitis C patients could accurately predict liver cirrhosis in patients with CHB.
There were a few limitations in our study. First, most of our patients had elevated ALT levels, particularly in the validation cohort. This was because most of the patients who underwent liver biopsy had elevated ALT levels and required assessment of the need for anti-viral therapy. The small number of patients with normal ALT rendered our study insufficient to make concrete recommendations for these patients. We believed that the performance of the combined LSM–Forns algorithm would be even better in patients with normal ALT, as the performance of LSM should be better in these patients without the influence of ALT.20 Second, as in all studies evaluating non-invasive markers of liver fibrosis, liver biopsy was used as the gold standard. The mean length of liver biopsy was 19–20 mm in the two cohorts, which was still shorter than the length of 25 mm suggested by Bedossa et al.31 This might limit the accuracy of histological assessment of liver fibrosis. As transient elastography measures a volume of liver tissue at least 100 times bigger than a biopsy sample, we could not exclude the possibility that the performance of LSM has been underestimated.34 Third, patients with ALT higher than 5 times ULN were excluded from this study because of the inaccurate performance of LSM in patients with very high ALT levels.35 Our combined LSM–Forns algorithm cannot be recommended for patients with very high ALT levels, such as patients with severe acute exacerbation of CHB.36 Nonetheless, patients with very high ALT levels are less equivocal for treatment decision and liver biopsy is less commonly required. Fourth, more patients in the validation cohort had elevated ALT (94% vs. 63%), while more patients in the training cohort had advanced fibrosis (48% vs. 26%). This imbalance in patient characteristics might lead to discrepancy in the performance of the non-invasive assessments of liver fibrosis between the two cohorts.
Lastly, ideally it would be more useful to confirm or exclude significant (F ≥ 2) instead of advanced (F ≥ 3) fibrosis, as mentioned in the current treatment guidelines.4, 6 Nonetheless, the performance of transient elastography was known to be much less satisfactory for F2 than F3 fibrosis.18, 19 In the current study, the AUROC of LSM for F ≥ 2 fibrosis was only 0.81 (95% CI 0.73–0.89) in the training cohort, and only 0.71 (95% CI 0.63–0.79) in the validation cohort. Hence, it would be difficult to find the optimal cutoff values to confirm or exclude F2 fibrosis.
In conclusion, we have developed a combined algorithm using LSM and Forns index to predict advanced liver fibrosis in CHB. This combined LSM–Forns algorithm could improve the accuracy of prediction compared with LSM alone and liver biopsy could be correctly avoided in approximately 50% of patients. Future studies are warranted to validate this algorithm in patients with normal ALT levels.
Declaration of personal interests: Henry L. Y. Chan has served as an advisory board member for Novartis Pharmaceuticals, Bristol-Myers Squibb, Roche and Schering-Plough. Vincent W. S. Wong has served as a consultant for Novartis Pharmaceuticals. Declaration of funding interests: This study was funded in part by Research Fund for the Control of Infectious Diseases (RFCID) grant 09080292 to Henry L. Y. Chan.