Determination of reliability criteria for liver stiffness evaluation by transient elastography†
Article first published online: 4 FEB 2013
Copyright © 2012 American Association for the Study of Liver Diseases
Volume 57, Issue 3, pages 1182–1191, March 2013
How to Cite
Boursier, J., Zarski, J.-P., de Ledinghen, V., Rousselet, M.-C., Sturm, N., Lebail, B., Fouchard-Hubert, I., Gallois, Y., Oberti, F., Bertrais, S., Calès, P. and the Multicentric Group from ANRS/HC/EP23 FIBROSTAR Studies (2013), Determination of reliability criteria for liver stiffness evaluation by transient elastography. Hepatology, 57: 1182–1191. doi: 10.1002/hep.25993
Potential conflict of interest: Dr. Zarski consults for and is on the speakers' bureau of Roche, MSD, Bristol-Myers Squibb, Gilead, Janssen, and Summit.
- Issue published online: 28 FEB 2013
- Accepted manuscript online: 16 AUG 2012 12:00AM EST
- Manuscript Accepted: 17 JUN 2012
- Manuscript Received: 16 FEB 2012
- ANRS (French National Agency for AIDS and Viral Hepatitis) for HC/EP23 Fibrostar
Liver stiffness evaluation (LSE) is usually considered reliable when it fulfills all of the following criteria: ≥10 valid measurements, ≥60% success rate, and interquartile range / median ratio (IQR/M) ≤0.30. However, such reliable LSE have never been shown to be more accurate than unreliable LSE. We therefore aimed to evaluate the relevance of the usual definition for LSE reliability, and to improve the reliability criteria by using diagnostic accuracy as the primary outcome in a large population. A total of 1,165 patients with chronic liver disease from 19 French centers were included. All patients had liver biopsy and LSE. According to the usual definition, 75.7% of LSE were reliable. However, these reliable LSE were not significantly more accurate than unreliable LSE: 85.8% versus 81.5% well-classified patients, respectively, for the diagnosis of cirrhosis (P = 0.082). In multivariate analyses with different diagnostic targets, LSE median and IQR/M were independent predictors of fibrosis staging, with no significant influence of ≥10 valid measurements or LSE success rate. These two reliability criteria determined three LSE groups: “very reliable” (IQR/M ≤0.10), “reliable” (0.10< IQR/M ≤0.30, or IQR/M >0.30 with LSE median <7.1 kPa), and “poorly reliable” (IQR/M >0.30 with LSE median ≥7.1 kPa). The rates of well-classified patients for the diagnosis of cirrhosis were, respectively, 90.4%, 85.8%, and 69.5% (P < 10⁻³). According to these new reliability criteria, 9.1% of LSE were poorly reliable (versus 24.3% unreliable LSE with the usual definition, P < 10⁻³), 74.3% were reliable, and 16.6% were very reliable. Conclusion: The usual definition for LSE reliability is not relevant. LSE reliability depends on IQR/M according to the liver stiffness median level, thus defining three reliability categories: very reliable, reliable, and poorly reliable LSE. (HEPATOLOGY 2013)
Liver stiffness evaluation (LSE) by Fibroscan is now widely used in several countries for the assessment of liver fibrosis in chronic liver diseases. According to the usual definition, all of the following criteria have to be met for an LSE to be considered reliable: ≥10 valid measurements, LSE success rate ≥60%, and LSE interquartile range / median (IQR/M) ≤0.30.1-3 Reliability criteria for LSE are of great importance, first in clinical practice, because a reliable LSE result is useful for patient management, and also in clinical research, because unreliable LSE are very often excluded from statistical analyses. When the usual definition is applied in clinical practice, 15% of LSE are considered unreliable.4 However, the relevance of the usual definition for LSE reliability has never been demonstrated: no study has yet shown that LSE with at least 10 valid measurements, a success rate ≥60%, and IQR/M ≤0.30 provide better diagnostic accuracy than those not fulfilling these three criteria.
Two recent studies focused on determining the reliability criteria of LSE.5, 6 In the study by Lucidarme et al.,5 including 254 patients with chronic hepatitis C (CHC), neither the number of valid measurements nor the LSE success rate was an independent predictor of discrepancy between LSE median and fibrosis stage as determined on liver biopsy. The independent predictors were pathological fibrosis stage and IQR/M, with the most discriminating IQR/M cutoff calculated at 0.21. In the study by Myers et al.,6 including 251 patients with various causes of chronic liver disease, the independent predictors of discrepancy between LSE median and liver biopsy were IQR/M, body mass index, and low pathological fibrosis stages, with no influence of LSE success rate or ≥10 valid measurements. The most discriminating IQR/M cutoff for discrepancy was ≥0.17.
However, those studies had several limitations. First, they included pathological predictors, which makes their reliability criteria for LSE inapplicable in clinical practice. Second, their primary outcome was the discrepancy rate. To evaluate discrepancies between liver biopsy and LSE median, both studies categorized the latter into estimated Metavir F stages (called FFS stages in the present study) according to several diagnostic cutoffs derived from binary diagnoses such as significant fibrosis or cirrhosis. We have previously shown that combining such diagnostic cutoffs accumulates the diagnostic errors of each, resulting in a loss of accuracy.7 Consequently, studying discrepancies between histological fibrosis stages and such poorly accurate LSE classifications seems inadequate and calls into question the relevance of the resulting IQR/M cutoffs. This may explain why the IQR/M cutoffs calculated in the Lucidarme et al. and Myers et al. studies failed to identify subgroups of LSE with significantly different diagnostic accuracies. Third, the sample sizes may have been too small given the low prevalence of putative discrepancies. Finally, diagnostic accuracy may be a better study outcome than discrepancy rate for determining the reliability criteria of LSE.
The aims of the present study were to evaluate the diagnostic relevance of the usual definition for LSE reliability and to precisely determine the noninvasive reliability criteria of LSE by using diagnostic accuracy as a primary outcome in a large population.
Patients and Methods
Two populations with liver biopsy and LSE were included in the present study. The first population was composed of patients with chronic liver disease recruited in three French centers between 2004 and 2009 (Angers: n = 383; Bordeaux: n = 309; and Grenoble: n = 142). Patients included in the Angers and Bordeaux centers had various causes of chronic liver disease, whereas those from Grenoble had CHC. CHC patients from the three centers (n = 467) have been included in previous studies.8, 9 The second population was that of the multicenter ANRS/HC/EP23 Fibrostar study sponsored by the French National Agency for Research in AIDS and Hepatitis.3 Patients included in both populations were identified and counted only once in the statistical analyses. All patients gave written informed consent. The study protocol conformed to the ethical guidelines of the current Declaration of Helsinki and received approval from the local Ethics Committees.
Liver fibrosis was evaluated according to Metavir fibrosis (FM) staging. Significant fibrosis was defined as Metavir FM≥2, severe fibrosis as Metavir FM≥3, and cirrhosis as Metavir FM4. In the first population, histological evaluations were performed in each center by blinded senior pathologists specialized in hepatology. In the Fibrostar study, histological lesions were centrally evaluated by two senior experts, with a consensus reading in cases of discordance. Fibrosis staging was considered reliable when the liver specimen length was ≥15 mm and/or the number of portal tracts was ≥8.10
Liver Stiffness Evaluation
Precise definitions are provided in the Glossary in the Supporting Material.
LSE by Fibroscan (Echosens, Paris, France) was performed with the M probe by an experienced observer (>50 examinations before the study) blinded to patient data. A time interval of ≤3 months between liver biopsy and LSE was considered acceptable for the purposes of the study. Examination conditions were those recommended by the manufacturer,11 with the objective of obtaining at least 10 valid measurements. Results were expressed as the median and the IQR (kPa) of all valid measurements. According to the usual definition, LSE was considered reliable when it included ≥10 valid measurements with a success rate ≥60% and IQR/M ≤0.30.
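The summary statistics and the usual reliability rule described above can be sketched in Python as follows. This is a minimal illustration, not the Fibroscan software: the quartile convention (`statistics.quantiles` with `method="inclusive"`) and the definition of success rate as valid measurements divided by total acquisitions are assumptions, and the function names and the example session are hypothetical.

```python
from statistics import median, quantiles


def summarize_lse(valid_kpa, total_acquisitions):
    """Summarize one LSE session from its valid measurements (kPa).

    Assumptions: quartiles use the 'inclusive' interpolation method, and the
    success rate is valid measurements / total acquisitions.
    """
    m = median(valid_kpa)
    q1, _, q3 = quantiles(valid_kpa, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "n_valid": len(valid_kpa),
        "median": m,
        "iqr": iqr,
        "iqr_m": iqr / m,
        "success_rate": len(valid_kpa) / total_acquisitions,
    }


def reliable_usual_definition(summary):
    """Usual definition: >=10 valid, success rate >=60%, and IQR/M <=0.30."""
    return (summary["n_valid"] >= 10
            and summary["success_rate"] >= 0.60
            and summary["iqr_m"] <= 0.30)


# Hypothetical session: 10 valid measurements out of 12 acquisitions.
valid = [5.0, 5.5, 6.0, 6.2, 6.5, 7.0, 7.3, 7.5, 8.0, 9.0]
s = summarize_lse(valid, total_acquisitions=12)
print(round(s["median"], 2), round(s["iqr"], 2), round(s["iqr_m"], 3))  # 6.75 1.4 0.207
print(reliable_usual_definition(s))  # True
```

Because all three conditions are combined with `and`, failing any single one (for example, 10 valid measurements out of 20 acquisitions, a 50% success rate) makes the examination "unreliable" under the usual definition.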
Interpretation of LSE Result.
LSE median was interpreted according to diagnostic cutoffs published in previous studies. As CHC was the main cause of liver disease in our study population (68%), we tested the cutoffs published by Castera et al.12 (≥7.1 kPa for FM≥2 and ≥12.5 kPa for FM4), those by Ziol et al.13 (≥8.8 kPa for FM≥2 and ≥14.6 kPa for FM4), and those specifically calculated for CHC in the meta-analysis of Stebbing et al.14 (≥8.5 kPa for FM≥2 and ≥16.2 kPa for FM4). As there were various causes of chronic liver disease in our study population, we also tested the cutoffs published in the meta-analysis of Friedrich-Rust et al.15 (≥7.7 kPa for FM≥2 and ≥13.1 kPa for FM4). Using these diagnostic cutoffs, LSE median was categorized into estimated FFS stages corresponding to the most probable Metavir F stage(s), giving the following LSE classification: LSE median < cutoff for FM≥2: FFS0/1; ≥ cutoff for FM≥2 and < cutoff for FM4: FFS2/3; ≥ cutoff for FM4: FFS4.
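The three-class LSE classification can be expressed as a small lookup. This sketch only encodes the cutoffs quoted above; the dictionary and function names are hypothetical, and counting a patient as well classified when the biopsy Metavir stage lies within the estimated FFS class is our reading of the "rate of well-classified patients" used in the Results.

```python
# Cutoffs (kPa) quoted in the text, as (FM>=2 cutoff, FM4 cutoff).
CUTOFFS_KPA = {
    "Castera": (7.1, 12.5),
    "Ziol": (8.8, 14.6),
    "Stebbing": (8.5, 16.2),
    "Friedrich-Rust": (7.7, 13.1),
}

# Metavir FM stages compatible with each estimated FFS class.
FFS_TO_FM = {"FFS0/1": {0, 1}, "FFS2/3": {2, 3}, "FFS4": {4}}


def ffs_class(lse_median_kpa, source="Castera"):
    """Map an LSE median onto the three-class FFS classification."""
    f2_cut, f4_cut = CUTOFFS_KPA[source]
    if lse_median_kpa < f2_cut:
        return "FFS0/1"
    if lse_median_kpa < f4_cut:
        return "FFS2/3"
    return "FFS4"


def well_classified(lse_median_kpa, biopsy_fm_stage, source="Castera"):
    """True when the biopsy Metavir stage falls inside the estimated class."""
    return biopsy_fm_stage in FFS_TO_FM[ffs_class(lse_median_kpa, source)]


print(ffs_class(6.8), ffs_class(9.4), ffs_class(12.5))  # FFS0/1 FFS2/3 FFS4
print(ffs_class(12.5, source="Ziol"))                    # FFS2/3
print(well_classified(9.4, biopsy_fm_stage=4))           # False
```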
Because the distribution was skewed for most quantitative variables, they were expressed as median with 1st and 3rd quartiles in parentheses. Diagnostic accuracy was mainly expressed as the area under the receiver operating characteristic curve (AUROC) (for the binary diagnoses of significant fibrosis, severe fibrosis, or cirrhosis) or as the rate of well-classified patients by the LSE classification. AUROCs were compared according to DeLong et al.16 for paired groups, and Hanley and McNeil17 for unpaired groups.
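For unpaired groups, AUROCs estimated on independent samples can be compared with a z statistic, z = (A1 − A2) / sqrt(SE1² + SE2²), in the spirit of Hanley and McNeil. The sketch below assumes that the ± values reported in the tables can be used as standard errors. With the two reliability subgroups for significant fibrosis in all patients from Table 2 (0.835 ± 0.014 and 0.794 ± 0.026), it gives P ≈ 0.165, the value printed in that table. The paired DeLong comparison is not reproduced here, and the function name is hypothetical.

```python
import math


def compare_independent_aurocs(auc1, se1, auc2, se2):
    """Two-sided z-test for AUROCs estimated on independent (unpaired) samples."""
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided P from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


z, p = compare_independent_aurocs(0.835, 0.014, 0.794, 0.026)
print(f"z = {z:.2f}, P = {p:.3f}")  # z = 1.39, P = 0.165
```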
To identify the factors influencing LSE accuracy, we determined the variables independently associated with each of the following diagnostic targets (significant fibrosis, severe fibrosis, and cirrhosis) by stepwise forward binary logistic regression. By definition, each variable selected by a multivariate analysis is an independent predictor of the diagnostic target studied. In other words, when selected together with LSE median, an independent predictor influences the outcome (diagnostic target) at any fixed level of liver stiffness. The multivariate analysis therefore identified the predictors influencing LSE accuracy. The dependent variable was the diagnostic target; LSE median was tested together with the following independent variables: age, sex, body mass index, cause of chronic liver disease (CHC versus other), ≥10 LSE valid measurements, LSE success rate, IQR/M, and biopsy length as a putative confounding variable. Statistical analyses were performed using SPSS v. 18.0 software (IBM, Armonk, NY) and SAS 9.1 (SAS Institute, Cary, NC).
The main characteristics of the 1,165 patients included in the study are presented in Table 1. The cause of chronic liver disease was CHC in 68.5% of patients, hepatitis B monoinfection in 5.7%, alcohol in 12.4%, nonalcoholic fatty liver disease (NAFLD) in 3.3%, and other causes in 10.1%. Overweight (body mass index ≥25.0 kg/m2) was present in 44.0% of patients. Liver biopsies were considered reliable in 92.0% of cases. The prevalence of significant fibrosis, severe fibrosis, and cirrhosis was, respectively, 63.3%, 38.9%, and 21.0%.
Table 1. Main characteristics of the study population, overall and by cause of liver disease

| | All patients | CHC | Other causes | P |
|---|---|---|---|---|
| Age (years) | 51.1 (43.9-60.5) | 50.1 (43.9-59.7) | 54.2 (43.9-63.3) | 0.084 |
| Body mass index (kg/m2) | 24.5 (22.2-27.6) | 24.2 (22.1-26.7) | 25.1 (22.5-29.4) | <10⁻³ |
| Body mass index ≥25 kg/m2 (%) | 44.0 | 40.1 | 50.9 | 10⁻³ |
| Metavir FM stage (%) | | | | <10⁻³ |
| Biopsy length (mm) | 25 (18-30) | 24 (18-30) | 25 (17-32) | 0.093 |
| Reliable biopsy (%) | 92.0 | 93.8 | 88.0 | 10⁻³ |
| LSE median (kPa) | 8.1 (5.8-14.0) | 7.8 (5.6-11.1) | 11.0 (6.6-25.1) | <10⁻³ |
| Valid measurements (n) | 9.8 ± 1.5 † | 9.8 ± 1.3 † | 9.7 ± 1.9 † | 0.227 |
| ≥10 LSE valid measurements (%) | 92.8 | 93.3 | 91.6 | 0.291 |
| LSE success rate (%) | 100 (83-100) | 100 (83-100) | 91 (77-100) | 10⁻³ |
| LSE success rate ≥60% (%) | 89.8 | 91.9 | 85.1 | <10⁻³ |
| IQR/M | 0.17 (0.12-0.25) | 0.17 (0.12-0.24) | 0.18 (0.11-0.25) | 0.211 |
| IQR/M ≤0.30 (%) | 85.5 | 86.1 | 84.3 | 0.416 |
| Reliable LSE (%) ‡ | 75.7 | 77.6 | 71.6 | 0.027 |
The AUROCs (±standard deviation [SD]) of LSE for the diagnosis of significant fibrosis, severe fibrosis, and cirrhosis were, respectively, 0.822 ± 0.012, 0.872 ± 0.010, and 0.910 ± 0.011 (Table 2). AUROCs of LSE in unreliable biopsies were not significantly different from those in reliable biopsies (details not shown). The rates of well-classified patients according to the various diagnostic cutoffs tested are presented in Table S1 in the Supporting Material. Cutoffs published by Castera et al.12 provided the highest accuracy for significant fibrosis and LSE classification, and were thus used for further statistical analysis.
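An AUROC such as those reported here equals the probability that a randomly chosen patient with the target lesion has a higher LSE median than a randomly chosen patient without it (the Mann-Whitney interpretation), with ties counted as one half. The values below are invented for illustration and are not study data; the function name is hypothetical.

```python
def auroc(scores_positive, scores_negative):
    """Mann-Whitney estimate of the AUROC; ties count as one half."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical LSE medians (kPa) with and without cirrhosis on biopsy.
cirrhosis = [14.0, 18.5, 9.0, 25.1]
no_cirrhosis = [5.2, 7.8, 9.0, 11.0, 6.1]
print(auroc(cirrhosis, no_cirrhosis))  # 0.925
```

The double loop is quadratic in the number of patients, which is fine for a sketch; rank-based formulas give the same value more efficiently on large samples.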
Table 2. LSE AUROCs (± SD) according to cause of liver disease and LSE reliability (usual definition)

| Cause of liver disease | Diagnostic target | All LSE | Reliable LSE | Unreliable LSE | P |
|---|---|---|---|---|---|
| All | FM≥2 | 0.822 ± 0.012 | 0.835 ± 0.014 | 0.794 ± 0.026 | 0.165 |
| | FM≥3 | 0.872 ± 0.010 | 0.881 ± 0.012 | 0.856 ± 0.023 | 0.344 |
| | FM4 | 0.910 ± 0.011 | 0.913 ± 0.012 | 0.906 ± 0.022 | 0.780 |
| CHC | FM≥2 | 0.787 ± 0.016 | 0.805 ± 0.018 | 0.733 ± 0.037 | 0.080 |
| | FM≥3 | 0.843 ± 0.015 | 0.856 ± 0.016 | 0.811 ± 0.035 | 0.242 |
| | FM4 | 0.897 ± 0.016 | 0.900 ± 0.018 | 0.918 ± 0.038 | 0.669 |
| Other | FM≥2 | 0.883 ± 0.019 ‡ | 0.888 ± 0.024 § | 0.889 ± 0.032 ‡ | 0.980 |
| | FM≥3 | 0.905 ± 0.016 § | 0.913 ± 0.018 ‖ | 0.888 ± 0.034 | 0.516 |
| | FM4 | 0.908 ± 0.016 | 0.920 ± 0.018 | 0.862 ± 0.037 | 0.159 |
Usual Definition for LSE Reliability
92.8% of LSE included at least 10 valid measurements, 89.8% achieved a ≥60% success rate, and 85.5% had an IQR/M ≤0.30 (Table 1). None of these conditions led to a significant increase in LSE AUROC (Table S2).
75.7% of LSE fulfilled these three criteria and were consequently considered reliable according to the usual definition for LSE reliability. AUROCs for significant fibrosis, severe fibrosis, or cirrhosis were not significantly different between reliable and unreliable LSE (Table 2). Using Castera et al.12 cutoffs (≥7.1 kPa for FM≥2 and ≥12.5 kPa for FM4), LSE accuracy was not significantly different between reliable and unreliable LSE for the diagnosis of significant fibrosis (75.5% versus 72.1%, respectively, P = 0.255) or cirrhosis (85.8% versus 81.5%, P = 0.082). Similarly, the rate of well-classified patients by the LSE classification (FFS0/1, FFS2/3, FFS4) derived from Castera et al. cutoffs was not significantly different between reliable and unreliable LSE (63.5% versus 57.2%, respectively, P = 0.064).
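The text does not name the test used to compare rates of well-classified patients. As an illustration only, the sketch below applies a pooled two-proportion z-test (equivalent to an uncorrected 2×2 chi-square test) to counts reconstructed from the reported percentages: about 882 reliable and 283 unreliable LSE (75.7% and 24.3% of 1,165), of which about 757 and 231 were well classified for cirrhosis (85.8% and 81.5%). Because these counts are rounded reconstructions, not raw study data, the result (P of about 0.087) is close to, but not identical with, the published P = 0.082. The function name is hypothetical.

```python
import math


def two_proportion_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided P)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided P from the standard normal distribution.
    return z, math.erfc(abs(z) / math.sqrt(2))


# Counts reconstructed from rounded percentages (approximate).
z, p = two_proportion_test(757, 882, 231, 283)
print(f"z = {z:.2f}, P = {p:.3f}")
```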
Independent Predictors of Fibrosis Staging
Independent predictors of significant fibrosis, severe fibrosis, or cirrhosis are detailed in Table 3. Briefly, in addition to LSE median, IQR/M was the only LSE characteristic independently associated with all three diagnostic targets, with no significant influence of the number of LSE valid measurements, LSE success rate, or the cause of liver disease. There was no collinearity between LSE median and IQR/M (Spearman correlation coefficient = 0.047, P = 0.109). Independent predictors were the same when variables were entered as dichotomous results (IQR/M ≤0.30, LSE success rate ≥60%, reliable versus unreliable biopsy) in the multivariate analyses (details not shown).
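Table 3. Independent predictors of fibrosis staging (stepwise forward binary logistic regression)

| Diagnostic target | Step | Variable | P | Odds ratio (95% CI) |
|---|---|---|---|---|
| FM≥2 | 1st | LSE median | <10⁻³ | 1.323 (1.262-1.387) |
| FM≥3 | 1st | LSE median | <10⁻³ | 1.278 (1.234-1.324) |
| FM4 | 1st | LSE median | <10⁻³ | 1.201 (1.168-1.234) |
| | 2nd | Biopsy length | 0.002 | 0.965 (0.944-0.987) |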
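Odds ratios from a logistic model act multiplicatively on the odds. Assuming, since Table 3 does not state the units, that the odds ratios for LSE median and biopsy length are per 1 kPa and per 1 mm respectively, the sketch below shows what they imply while the other covariates in the model are held fixed: an LSE median 5 kPa higher multiplies the odds of FM≥2 by about 1.323⁵ ≈ 4.05, and a biopsy 10 mm longer multiplies the odds of FM4 by about 0.965¹⁰ ≈ 0.70. The helper names and the 50% baseline probability are hypothetical.

```python
def odds_multiplier(odds_ratio_per_unit, delta_units):
    """Multiplicative change in odds for a covariate change of delta_units."""
    return odds_ratio_per_unit ** delta_units


def apply_to_probability(baseline_prob, multiplier):
    """Convert a baseline probability to odds, scale the odds, and convert back."""
    odds = baseline_prob / (1 - baseline_prob) * multiplier
    return odds / (1 + odds)


m_lse = odds_multiplier(1.323, 5)   # FM>=2, LSE median +5 kPa (unit assumed)
m_len = odds_multiplier(0.965, 10)  # FM4, biopsy length +10 mm (unit assumed)
print(round(m_lse, 2), round(m_len, 2))            # 4.05 0.7
print(round(apply_to_probability(0.5, m_lse), 2))  # 0.8
```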
Classification of LSE Accuracy
Here we develop a classification of LSE accuracy based on the independent predictors identified above.
LSE accuracy as a function of increasing intervals of IQR/M is depicted in Table S3. Briefly, LSE accuracy decreased when IQR/M increased and three subgroups of LSE were identified: IQR/M ≤0.10 (16.6% of patients); 0.10< IQR/M ≤0.30 (69.0%); IQR/M >0.30 (14.5%). LSE with IQR/M ≤0.10 had significantly higher accuracy than LSE with IQR/M >0.10 (Table 4). LSE with 0.10< IQR/M ≤0.30 had higher accuracy than LSE with IQR/M >0.30, but the difference did not reach statistical significance.
Table 4. LSE accuracy according to IQR/M

| IQR/M | AUROC FM≥2 | AUROC FM≥3 | AUROC FM4 | Diagnostic accuracy (%)* FM≥2 | Diagnostic accuracy (%)* FM4 | Diagnostic accuracy (%)* LSE classification |
|---|---|---|---|---|---|---|
| ≤0.10 | 0.886 ± 0.024 | 0.937 ± 0.018 | 0.970 ± 0.011 | 77.1 | 90.4 | 69.1 |
| 0.10< and ≤0.30 | 0.822 ± 0.015 | 0.868 ± 0.013 | 0.895 ± 0.015 | 75.6 | 84.7 | 62.6 |
| >0.30 | 0.785 ± 0.035 | 0.842 ± 0.032 | 0.898 ± 0.031 | 69.1 | 80.6 | 53.9 |
| P: ≤0.10 vs. 0.10< and ≤0.30 | 0.024 | 0.002 | <10⁻³ | 0.661 | 0.043 | 0.092 |
| P: ≤0.10 vs. >0.30 | 0.017 | 0.010 | 0.029 | 0.088 | 0.008 | 0.003 |
| P: 0.10< and ≤0.30 vs. >0.30 | 0.331 | 0.451 | 0.931 | 0.081 | 0.196 | 0.039 |
| P: linear trend † | − | − | − | 0.091 | 0.009 | 0.003 |
By using 7.1 kPa as a diagnostic cutoff,12 the rate of well-classified patients for significant fibrosis was very good in LSE medians ≥7.1 kPa, but only fair in LSE medians <7.1 kPa: 81.5% versus 64.5%, respectively (P < 10⁻³). By using 12.5 kPa as a diagnostic cutoff,12 the rate of well-classified patients for cirrhosis was excellent in LSE medians <12.5 kPa, but only fair in LSE medians ≥12.5 kPa: 94.3% versus 60.4%, respectively (P < 10⁻³). LSE thus demonstrated excellent negative predictive value for cirrhosis and very good positive predictive value for significant fibrosis. Conversely, it had insufficient positive predictive value for cirrhosis and insufficient negative predictive value for significant fibrosis. Finally, the rate of well-classified patients by the LSE classification derived from Castera et al. cutoffs was not significantly different among its three classes, FFS0/1: 64.5%, FFS2/3: 60.4%, and FFS4: 60.4% (P = 0.379).
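The link between these rates and predictive values can be made explicit: among patients with an LSE median at or above a cutoff, the proportion with the target lesion on biopsy is the positive predictive value (PPV), and among those below it, the proportion without the lesion is the negative predictive value (NPV). The 2×2 counts below are hypothetical and chosen only for round numbers; the function name is also hypothetical.

```python
def predictive_values(tp, fp, tn, fn):
    """PPV and NPV from a 2x2 table of LSE result versus biopsy."""
    ppv = tp / (tp + fp)  # well classified among patients at or above the cutoff
    npv = tn / (tn + fn)  # well classified among patients below the cutoff
    return ppv, npv


# Hypothetical counts: 100 patients above the cutoff, 100 below it.
ppv, npv = predictive_values(tp=80, fp=20, tn=90, fn=10)
print(ppv, npv)  # 0.8 0.9
```

Unlike sensitivity and specificity, both values depend on the prevalence of the target lesion in the population studied.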
IQR/M and LSE Median.
In patients with LSE median <7.1 kPa, the diagnostic accuracy of the LSE classification derived from Castera et al. cutoffs was not significantly different among the three IQR/M subgroups (P = 0.458; Fig. 1). Conversely, in patients with LSE median ≥7.1 kPa the diagnostic accuracy of the LSE classification was significantly lower in LSE with IQR/M >0.30 compared to LSE with IQR/M ≤0.30 (43.8% versus 64.1%, P < 10⁻³; Fig. 1). The rates of well-classified patients for the binary diagnoses of significant fibrosis or cirrhosis as a function of IQR/M and LSE median are detailed in Supporting Fig. S1. Briefly, in patients with LSE median ≥7.1 kPa, LSE with IQR/M >0.30 had lower accuracy for significant fibrosis than LSE with IQR/M ≤0.30 (67.6% versus 84.3%, P < 10⁻³). In patients with LSE median ≥12.5 kPa, LSE with IQR/M >0.30 had lower accuracy for cirrhosis than LSE with IQR/M ≤0.30 (45.1% versus 64.0%, P = 0.011).
Proposal for New Reliability Criteria in LSE
The previous findings led us to develop new criteria for the interpretation of LSE results (Table 5). LSE accuracy in the subgroup of LSE with IQR/M ≤0.10 was higher than in the whole population (Table 6). LSEs in this subgroup were thus considered “very reliable.” LSE with 0.10< IQR/M ≤0.30 or with IQR/M >0.30 and LSE median <7.1 kPa provided accuracy similar to that of the whole population and were thus considered “reliable.” Finally, LSE with IQR/M >0.30 and LSE median ≥7.1 kPa provided accuracy lower than that of the whole population and were thus considered “poorly reliable.”
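The proposed criteria can be written as a single decision function. This is a sketch of the rules stated above, not software released with the study; the function name is hypothetical, and boundary handling follows the inequalities as written (IQR/M ≤0.10, IQR/M ≤0.30, LSE median <7.1 kPa).

```python
def lse_reliability(iqr_m, lse_median_kpa):
    """Classify an LSE with the three-category reliability criteria."""
    if iqr_m <= 0.10:
        return "very reliable"
    # 0.10 < IQR/M <= 0.30, or IQR/M > 0.30 with LSE median < 7.1 kPa
    if iqr_m <= 0.30 or lse_median_kpa < 7.1:
        return "reliable"
    # IQR/M > 0.30 with LSE median >= 7.1 kPa
    return "poorly reliable"


for iqr_m, med in [(0.08, 18.0), (0.25, 10.2), (0.40, 5.6), (0.40, 9.0)]:
    print(iqr_m, med, lse_reliability(iqr_m, med))
```

Unlike the usual definition, neither the number of valid measurements nor the success rate enters the decision; only IQR/M and the LSE median do.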
Table 6. LSE accuracy according to the new reliability criteria

| LSE | AUROC FM≥2 | AUROC FM≥3 | AUROC FM4 | Diagnostic accuracy (%) * FM≥2 | Diagnostic accuracy (%) * FM4 | Diagnostic accuracy (%) * LSE classification |
|---|---|---|---|---|---|---|
| All † | 0.822 ± 0.012 | 0.872 ± 0.010 | 0.910 ± 0.011 | 74.9 | 85.0 | 62.4 |
| Very reliable | 0.886 ± 0.024 | 0.937 ± 0.018 | 0.970 ± 0.011 | 77.1 | 90.4 | 69.1 |
| Reliable | 0.823 ± 0.014 | 0.876 ± 0.012 | 0.904 ± 0.014 | 75.3 | 85.8 | 63.2 |
| Poorly reliable | 0.773 ± 0.045 | 0.745 ± 0.049 | 0.819 ± 0.052 | 67.6 | 69.5 | 43.8 |
| P: very reliable vs. reliable | 0.023 | 0.005 | <10⁻³ | 0.603 | 0.090 | 0.125 |
| P: very reliable vs. poorly reliable | 0.027 | <10⁻³ | 0.004 | 0.076 | <10⁻³ | <10⁻³ |
| P: reliable vs. poorly reliable | 0.289 | 0.009 | 0.115 | 0.088 | <10⁻³ | <10⁻³ |
According to these new criteria, 16.6% of LSE were considered “very reliable,” 74.3% “reliable,” and 9.1% “poorly reliable.” Importantly, LSE AUROCs and diagnostic accuracies were significantly different among these three subgroups (Table 6). Finally, the rate of poorly reliable LSE according to the new criteria was significantly lower than that of unreliable LSE according to the usual definition (9.1% versus 24.3%, P < 10⁻³).
We evaluated our new criteria for LSE reliability as a function of several potential influencing characteristics: cause of liver disease (CHC versus others), diagnostic indexes (AUROC, binary diagnosis of significant fibrosis or cirrhosis, LSE classification), and diagnostic cutoffs published by Ziol et al.,13 Stebbing et al.,14 and Friedrich-Rust et al.15 The detailed results are presented in Tables S4 and S5. Briefly, whatever the factor considered, a decrease in LSE reliability according to our new criteria was associated with a decrease in LSE accuracy. Body mass index (<25 versus ≥25 kg/m2) did not influence LSE accuracy in any of the three new categories of LSE reliability (details not shown). Because of the small numbers of patients with hepatitis B, alcohol abuse, or NAFLD, it was not possible to perform a sensitivity analysis for these causes of chronic liver disease.
There is currently a critical need in clinical practice and in clinical research to precisely define the reliability criteria of LSE. Fibroscan is now widely used, and physicians must determine daily whether an LSE is reliable enough to support an accurate diagnosis. Moreover, in clinical research the reliability criteria of LSE directly influence study results, because unreliable LSE are usually excluded from statistical analyses.
Relevance of the Usual Definition for LSE Reliability
To our knowledge, the present study is the first to evaluate the relevance of the usual definition for LSE reliability. The strengths of our work include the large number of included patients, the high rate of reliable liver biopsies (92.0%), and a thorough analysis of accuracy including both global indexes of performance, such as AUROC, and indexes useful in daily clinical practice, such as the rate of well-classified patients. Our results show that LSE considered reliable according to the usual definition had only slightly higher diagnostic accuracy than unreliable LSE, and the difference was not statistically significant (Table 2). The usual definition for LSE reliability, including the number of valid measurements, LSE success rate, and IQR/M, is thus not relevant for clinical practice or clinical research.
New Reliability Criteria for LSE
Multivariate analyses showed that liver fibrosis staging was independently linked to IQR/M, with no influence of the number of LSE valid measurements or LSE success rate (Table 3). These results confirm the key role of IQR/M, as suggested in the Lucidarme et al. and Myers et al. studies.5, 6 However, these two studies were based on a discrepancy analysis between FM stages by liver biopsy and FFS stages (defined by LSE median categorized into equivalent Metavir fibrosis stages). IQR/M cutoffs were thus calculated to predict the discrepancy, but they failed to delineate subgroups of LSE where accuracies for liver fibrosis diagnosis were significantly different. In the present study, we used diagnosis of fibrosis stages as the main outcome. This allowed us to determine the thresholds of IQR/M that define subgroups of LSE with significantly different diagnostic accuracies, and thus the precise reliability criteria for LSE.
LSE with IQR/M ≤0.10 (i.e., with minimal signal variability) provided significantly higher AUROCs, a higher rate of well-classified patients for the diagnosis of cirrhosis, and a higher rate of well-classified patients by LSE classification (Table 4). LSE with IQR/M ≤0.10 may thus be considered “very reliable,” especially when the LSE median is ≥12.5 kPa (Fig. 1).
LSE with IQR/M >0.30 (i.e., with large variability) provided lower AUROCs and a lower rate of well-classified patients than LSE with 0.10< IQR/M ≤0.30, but the difference was not statistically significant (Table 4). Because multivariate analyses showed a significant interaction between IQR/M and LSE median, we evaluated the influence of IQR/M according to LSE median. The deleterious effect of IQR/M >0.30 on LSE accuracy was amplified by the liver stiffness level: accuracy for cirrhosis decreased even more in patients with LSE median ≥12.5 kPa, and accuracy for significant fibrosis significantly decreased in patients with LSE median ≥7.1 kPa. Thus, LSE with IQR/M >0.30 may be considered “poorly reliable” in patients with LSE median ≥7.1 kPa and “reliable” in patients with LSE median <7.1 kPa (Fig. 1).
The interaction between IQR/M and liver stiffness level is not surprising. The IQR is the interval between the 25th and 75th percentiles of the valid measurements, which contains the central 50% of them, and it is usually expressed relative to the median as the ratio IQR/M. A high IQR/M implies a wide distribution of valid measurements and thus a higher risk of an aberrant LSE median. However, for a given IQR/M, the absolute interval is smaller at low liver stiffness than at high liver stiffness. For example, an IQR/M of 0.30 represents a 1.5 kPa interval when liver stiffness is 5.0 kPa, but a 4.5 kPa interval when liver stiffness is 15.0 kPa. Consequently, IQR/M has little impact on the LSE median at low liver stiffness, which explains why LSE with IQR/M >0.30 may be considered “reliable” when the LSE median is <7.1 kPa (Fig. 1). Because increasing liver stiffness amplifies the deleterious effect of IQR/M >0.30, with a significant decrease in diagnostic accuracy, LSE with IQR/M >0.30 and median ≥7.1 kPa may be considered “poorly reliable” (Table 6; Fig. 1). Conversely, the same reasoning explains why LSE with IQR/M ≤0.10 are very accurate at high liver stiffness values (Fig. 1).
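The arithmetic behind this argument is simply IQR = (IQR/M) × median. The sketch below reproduces the two examples given in the text (IQR/M of 0.30 at 5.0 and 15.0 kPa) and adds one for IQR/M = 0.10 at 20 kPa purely as an illustration; the function name is hypothetical.

```python
def absolute_iqr(iqr_m, lse_median_kpa):
    """Absolute interquartile range (kPa) implied by a given IQR/M and median."""
    return iqr_m * lse_median_kpa


for iqr_m, med in [(0.30, 5.0), (0.30, 15.0), (0.10, 20.0)]:
    print(f"IQR/M {iqr_m:.2f} at {med:.1f} kPa -> IQR {absolute_iqr(iqr_m, med):.1f} kPa")
```

The same relative spread therefore corresponds to a threefold wider absolute spread at 15 kPa than at 5 kPa, which is the core of the interaction described above.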
Finally, our results permitted the establishment of new reliability criteria identifying three LSE subgroups according to IQR/M and liver stiffness level (Table 5). The accuracy of LSE for fibrosis staging was significantly different among these three subgroups, demonstrating the relevance of the new criteria (Table 6). Moreover, the rate of poorly reliable LSE according to the new criteria (9.1%) was significantly lower than the rate of unreliable LSE according to the usual definition (24.3%).
How Many Valid Measurements Are Needed for LSE?
In our study, as in those of Lucidarme et al. and Myers et al.,5, 6 the ≥10 valid measurements variable had no influence on LSE accuracy (Table 3). This raises the question: how many valid measurements are required for LSE? Kettaneh et al.18 have shown in 935 patients with CHC that AUROCs for the diagnosis of significant fibrosis or cirrhosis barely differed among LSE medians obtained from the first three, first five, and first ten valid measurements. We found similar results in our population (details not shown). However, the analysis by Kettaneh et al. was performed in a subgroup of patients whose LSE included at least 10 valid measurements; their results probably do not reflect the accuracy of LSE for which only three or five valid measurements are genuinely available because of examination difficulties. In our study, 92.8% of LSE had at least 10 valid measurements, and this rate was 96.9% in the large series of Castera et al.4 Given the current state of knowledge, and because LSE is a quick and easy procedure, the pragmatic goal of operators should be to obtain 10 valid measurements, whatever the success rate.19
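The comparison discussed by Kettaneh et al. amounts to recomputing the LSE median from the first k valid measurements of the same examination. A minimal sketch, using a hypothetical sequence of 10 valid measurements in acquisition order (not study data); the function name is hypothetical:

```python
from statistics import median


def medians_of_first_k(valid_kpa_in_order, ks=(3, 5, 10)):
    """LSE median computed from the first k valid measurements, for each available k."""
    return {k: median(valid_kpa_in_order[:k])
            for k in ks if len(valid_kpa_in_order) >= k}


sequence = [6.1, 7.4, 5.9, 6.8, 8.2, 6.5, 7.0, 6.3, 9.1, 6.6]
print(medians_of_first_k(sequence))
```

As the text points out, such a truncation analysis only applies to examinations that reached 10 valid measurements; it cannot show how an examination that genuinely stopped at three or five measurements would perform.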
Key Role of IQR/M in LSE
Several recent longitudinal studies have shown that LSE median was linked to clinical events such as liver decompensation,20, 21 hepatocellular carcinoma,22, 23 or death.24 This suggests that liver stiffness may be used as a prognostic index in chronic liver diseases. Reliability criteria of LSE are thus important to correctly compare LSE repeated over time and accurately evaluate the course of liver stiffness in patients. We have previously shown that interobserver reproducibility of LSE median depends on IQR/M and liver stiffness level.25, 26 Interobserver agreement decreased in LSE with IQR/M >0.25,25 confirming the key role of this index for the interpretation of LSE median in the management of patients with chronic liver diseases.
Our results suggest that LSE is less accurate in CHC patients than in patients with other causes of chronic liver disease (Table 2). However, the cause of liver disease was not an independent predictor of fibrosis (Table 3). Moreover, the characteristics of CHC and non-CHC patients were significantly different, especially for fibrosis stage, with a significantly higher prevalence of FM≥2, FM≥3, and FM4 in non-CHC patients (Table 1). It has been previously shown that a higher prevalence of the diagnostic target is associated with an increase in the accuracy of fibrosis tests.27 Thus, the difference in LSE accuracy observed between CHC and non-CHC patients is probably explained by the significantly different characteristics of these two subgroups.
LSE diagnostic cutoffs calculated in published studies are very heterogeneous.28 We tested several cutoffs, some calculated for CHC12-14 and others determined in a large meta-analysis including patients with various causes of chronic liver disease.15 Interestingly, we found significant but slight differences in diagnostic accuracy, both in CHC patients and in patients with other causes of chronic liver disease. This supports evaluating the influence of the cause of chronic liver disease on LSE accuracy and on the determination of diagnostic cutoffs in well-matched populations of patients with alcoholic liver disease, NAFLD, CHC, or chronic hepatitis B.
Finally, we evaluated in a sensitivity analysis the influence of several characteristics on our new criteria for LSE reliability. Regardless of the characteristic tested (cause of chronic liver disease, diagnostic cutoffs used, diagnostic index, body mass index), a decrease in LSE reliability according to our new criteria was associated with a decrease in LSE accuracy, reinforcing the relevance of these new criteria for the interpretation of LSE results in daily clinical practice.
Relevance of the New Reliability Criteria for LSE
Our new reliability criteria for LSE represent a significant improvement for the interpretation of LSE in clinical practice. First, we have shown that the usual definition of LSE reliability is not relevant and that the criterion “success rate ≥60%” is unnecessary. Second, we have defined a new category of “very reliable” LSE, which provides very good positive predictive value for the diagnosis of cirrhosis. As a complement to diagnostic accuracy, which is useful for individual diagnosis in clinical practice, AUROC, which is based on sensitivity and specificity, is another important index, especially for fibrosis screening in the general population.29 In this setting, “very reliable” LSE provided the highest AUROCs, significantly different from those of the other two new reliability classes. Third, we have restricted the usual definition of unreliable LSE (IQR/M >0.30) to patients with LSE median ≥7.1 kPa. Consequently, the rate of “poorly reliable” LSE as defined by our new criteria was nearly three times lower than the rate of unreliable LSE according to the usual definition (9.1% versus 24.3%). Compared with “reliable” LSE, “poorly reliable” LSE had significantly lower diagnostic accuracy for cirrhosis and for the LSE classification. For the diagnosis of significant fibrosis, the difference reached borderline significance in the whole population and was significant in the subgroup of CHC patients.
It is now well documented that several conditions influence LSE accuracy for the noninvasive evaluation of liver fibrosis: liver inflammation,30 cholestasis,31 central venous pressure,32 food intake,33 and probably liver steatosis.34 Our results show that an intrinsic characteristic of LSE (IQR/M) also influences its accuracy. Our new reliability criteria are thus an additional factor that physicians must take into account for an accurate evaluation of liver fibrosis by LSE.
In conclusion, the usual definition for LSE reliability is not relevant. LSE median must be interpreted according to IQR/M and liver stiffness level. Using these two characteristics, we defined new reliability criteria for LSE resulting in three categories: “very reliable,” “reliable,” and “poorly reliable” with significantly different diagnostic accuracies.
Angers: Sophie Michalak, Anselme Konaté, Catherine Ternisien, Alain Chevailler, Françoise Lunel, Wael Mansour; Grenoble: Vincent Leroy, Marie-Noelle Hilleret, Patrice Faure, Jean-Charles Renversez, Francoise Morel, Candice Trocme; Bordeaux: Juliette Foucher, Laurent Castéra, Patrice Couzigou, Pierre-Henri Bernard, Wassil Merrouche, Paulette Bioulac-Sage. FIBROSTAR study: Hepatologists: R. Poupon, A. Poujol, Saint-Antoine, Paris; A. Abergel, Clermont-Ferrand; J.P. Bronowicki, Nancy; J.P. Vinel, S. Metivier, Toulouse; V. De Ledinghen, J. Foucher, Bordeaux; O. Goria, Rouen; M. Maynard-Muet, C. Trepo, Lyon; Ph. Mathurin, Lille; D. Guyader, H. Danielou, Rennes; O. Rogeaux, Chambéry; S. Pol, Ph. Sogni, Cochin, Paris; A. Tran, Nice; P. Calès, Angers; P. Marcellin, T. Asselah, Clichy; M. Bourliere, V. Oulès, Saint Joseph, Marseille; D. Larrey, Montpellier; F. Habersetzer, Strasbourg; M. Beaugrand, Bondy; V Leroy, MN Hilleret, Grenoble. Biologists: R-C. Boisson, Lyon Sud; M-C. Gelineau, B. Poggi, Hôtel Dieu, Lyon; J-C. Renversez, Candice Trocmé, Grenoble; J. Guéchot, R. Lasnier, M. Vaubourdolle, Paris; H. Voitot, Beaujon, Paris; A. Vassault, Necker, Paris; A. Rosenthal-Allieri, Nice; A. Lavoinne, F. Ziegler, Rouen; M. Bartoli, C. Lebrun, Chambéry; A. Myara, Paris Saint-Joseph; F. Guerber, A. Pottier, Elibio, Vizille. Pathologists: E-S. Zafrani, Créteil; N. Sturm, Grenoble. Methodologists: A. Bechet, J-L Bosson, A. Paris, S. Royannais, CIC, Grenoble; A. Plages, Grenoble. We also thank the following contributors: Gilles Hunault, Pascal Veillon, Gwenaëlle Soulard; and Kevin L. Erwin (for English proofreading).
- 4. Castéra L, et al. Pitfalls of liver stiffness measurement: a 5-year prospective study of 13,369 examinations. HEPATOLOGY 2010;51:828-835.
Additional Supporting Information may be found in the online version of this article.