Variability of the area under the receiver operating characteristic curves in the diagnostic evaluation of liver fibrosis markers: impact of biopsy length and fragmentation

Authors


Professor T. Poynard, APHP Groupe Hospitalier Pitié-Salpêtrière, 47-83 Boulevard de l’Hôpital, 75651 Paris Cedex 13, France.
E-mail: tpoynard@teaser.fr

Abstract

Summary

Background

The area under the receiver operating characteristic (ROC) curve is widely used to estimate the diagnostic value of fibrosis markers. Biopsy length and fragmentation are known risk factors for false-positive and false-negative biopsy results, but their quantitative impact on the variability of the area under the ROC curve has not been assessed.

Aim

To assess these relationships to better compare the fibrosis markers.

Methods

The area under the ROC curves of FibroTest for the diagnosis of fibrosis was estimated in patients with chronic hepatitis C using an integrated database including 1312 patients with FibroTest and biopsy. To take into account the biopsy length, we used two adjustment factors: one in which an observed area under the ROC curve could be adjusted according to the relative area under the receiver operating characteristic curve of a biopsy of a given length vs. the entire liver and one taking into account the prevalence of each fibrosis stage defining advanced and non-advanced fibrosis.

Results

The mean biopsy length was smaller for cirrhosis (F4, 16 mm) vs. F3 (18 mm, P = 0.01) and F0 (19 mm, P = 0.01). The mean number of fragments was higher for cirrhosis (F4 = 4.1 fragments) vs. all the other stages (F0 = 1.9, F1 = 1.9, F2 = 1.9, F3 = 2.3; P < 0.001 vs. F4). The FibroTest area under the ROC curves for the diagnosis of advanced fibrosis, adjusted for stages’ prevalence, ranged from 0.80 to 0.98 depending on biopsy length and fragmentation.

Conclusion

The comparison of the area under the ROC curves of fibrosis markers should take into account the biopsy length and fragmentation.

Background

A major clinical challenge is finding the best means of evaluating and managing the increasing numbers of patients infected with the hepatitis C virus (HCV).1 Because of its limitations and risks, liver biopsy is no longer considered mandatory as the first line estimate of liver injury, and many markers have been developed as non-invasive alternatives.2–4

The diagnostic value of these new markers is mainly estimated using the area under the receiver operating characteristic curve (AUROC).2–4 The AUROC combines the sensitivity (Se) and specificity (Sp) of a given quantitative marker for the diagnosis of a specific definition of fibrosis. Se is usually assessed in patients with advanced fibrosis (i.e. in the METAVIR scoring system stages F2, F3 and F4)4, 5 and Sp in non-advanced fibrosis (stages F0 and F1).

Since 1991, we have identified parameters associated with fibrosis and fibrosis progression and constructed several panels combining these parameters, these being the PGA index,6 age/platelets index7 and FibroTest.8 We and others have observed that three factors are associated with the AUROCs: the prevalence of fibrosis stages defining advanced and non-advanced fibrosis,9, 10 biopsy length and biopsy fragmentation.4, 9, 11, 12

The aim was to assess the relationships between biopsy length, biopsy fragmentation and the AUROCs in a large number of patients, in order to better standardize the expression of fibrosis markers and allow more accurate comparisons to be made between them. The relationships between the prevalence of fibrosis stages defining advanced fibrosis and the AUROCs have been studied in a separate study.

Methods

Standardization of AUROCs according to biopsy length

For years, liver biopsy was considered a gold standard for the diagnosis of fibrosis, but it has been shown to have a high percentage of false positives and false negatives when compared with the entire liver, mainly because of sampling error.13 For example, a biopsy 15 mm in length, which is the usual median liver biopsy size,14 had an AUROC of 0.82 for the diagnosis of two adjacent stages, F2 vs. F1, when the entire liver was used as the gold standard.13 This implies around a 20% error rate of the biopsy (false positives and false negatives) compared with the entire liver; therefore, discordance in the staging of fibrosis between a biomarker and liver biopsy may be due to an error of the biomarker, as well as to an error of the biopsy.14

To take into account the biopsy length, we used an adjustment factor in which an observed AUROC could be adjusted according to the relative AUROC of a biopsy of a given length vs. the entire liver as published by Bedossa et al.13 The adjusted AUROC (AdAUROC) is the relative expression of the AUROC taking into account the maximum possible AUROC of a liver biopsy of this length. The AdAUROC was calculated as the observed AUROC (ObAUROC) divided by the AUROC of a biopsy of the given length vs. the entire liver (gold standard = GsAUROC), with an upper limit of 0.99: AdAUROC = ObAUROC/GsAUROC. Three length classes were used: 0–10 mm (median 5 mm), 10–20 mm (median 15 mm) and >20 mm, as previously described.12 For example, if the ObAUROC is 0.69 for F1 vs. F2 and the mean biopsy length is 15 mm, the biopsy GsAUROC for 15 mm is 0.82 (ref. 12) and the AdAUROC = 0.69/0.82 = 0.84.
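The length adjustment described above can be sketched in a few lines of Python. This is only an illustration under the definitions given in the text (ratio of ObAUROC to GsAUROC, truncated at 0.99); the function and constant names are hypothetical and are not taken from the authors' analysis.

```python
# Illustrative sketch only; names are hypothetical, not the authors' code.
ADAUROC_CAP = 0.99  # upper limit stated in the Methods


def length_adjusted_auroc(ob_auroc: float, gs_auroc: float,
                          cap: float = ADAUROC_CAP) -> float:
    """Return AdAUROC = ObAUROC / GsAUROC, truncated at `cap`.

    ob_auroc: observed AUROC of the marker vs. biopsy.
    gs_auroc: AUROC of a biopsy of the given length vs. the entire liver.
    """
    if not 0.0 < gs_auroc <= 1.0:
        raise ValueError("gs_auroc must lie in (0, 1]")
    return min(ob_auroc / gs_auroc, cap)


# Worked example from the text: ObAUROC 0.69, GsAUROC 0.82 for a 15 mm biopsy.
print(round(length_adjusted_auroc(0.69, 0.82), 2))  # 0.84

# When the observed AUROC exceeds the biopsy's own AUROC, the cap applies.
print(length_adjusted_auroc(0.76, 0.75))  # 0.99
```

The second call shows why a ceiling is needed: a ratio above 1 has no meaning as an AUROC, so the sketch truncates it at the stated limit.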

Impact of biopsy length and biopsy fragmentation

Both biopsy length13 and biopsy fragmentation have been associated with false positives or false negatives of fibrosis diagnosis.14, 15 No previous study has looked at the specific association between fibrosis marker AUROCs and both the presence of biopsy fragmentation and biopsy length. We hypothesized that both biopsy length and fragmentation could be significantly different between fibrosis stages, particularly in patients with cirrhosis. We looked first at the association between biopsy length and fragmentation and then compared the FibroTest (FT) AUROCs between the three length classes, between fragmented and non-fragmented biopsies, and between their six combinations. Biopsy fragmentation was defined as more than two fragments.

Standardization of AUROCs according to the prevalence of fibrosis stages

The AUROC for the diagnosis of advanced fibrosis is not dependent on the prevalence of advanced fibrosis. However, the variability of the AUROC because of a change in the prevalence of each fibrosis stage among advanced fibrosis (i.e. F2, F3 and F4) and among non-advanced fibrosis (i.e. F0 and F1) is unknown. In a separate study, we have assessed the relationship between the prevalence of each stage defining advanced and non-advanced fibrosis and the AUROC of a fibrosis marker for advanced fibrosis, and we have constructed an index for standardizing comparisons.10 The standardization was constructed to transform any prevalence profile to a homogeneous distribution of fibrosis stages from F0 to F4, defined by a prevalence of 0.20 for each of the five stages (standard prevalence). In this case, the mean fibrosis stage in METAVIR units is 3 for advanced fibrosis [mean of F2, F3 and F4 = (2 + 3 + 4)/3 = 3] vs. 0.5 for non-advanced fibrosis [mean of F0 and F1 = (0 + 1)/2 = 0.5]. In this standard prevalence distribution, the difference between the mean fibrosis stage of advanced fibrosis and the mean fibrosis stage of non-advanced fibrosis is 2.5 (3 − 0.5). This difference between advanced and non-advanced fibrosis stages (DANA) has been used for the standardization of fibrosis marker AUROCs. Therefore, the adjusted AUROC (AdAUROC), which takes into account the observed DANA vs. a standard DANA of 2.5, can be calculated from the regression formula linking the observed AUROC (ObAUROC) to DANA. The regression formula for standardizing AUROCs estimated from different stage prevalences was AUROC = 0.582 + 0.1056 × (DANA).10
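As a minimal sketch of the DANA definition and of the regression formula quoted above, the following Python computes DANA for the uniform (standard) prevalence profile, where the advanced group has mean stage 3 and the non-advanced group mean stage 0.5. The function names are hypothetical; only the stage groupings and the coefficients 0.582 and 0.1056 come from the text.

```python
from typing import Mapping, Sequence

ADVANCED = (2, 3, 4)      # METAVIR F2, F3, F4
NON_ADVANCED = (0, 1)     # METAVIR F0, F1


def mean_stage(prevalence: Mapping[int, float], stages: Sequence[int]) -> float:
    """Prevalence-weighted mean METAVIR stage within one group of stages."""
    weight = sum(prevalence.get(s, 0.0) for s in stages)
    if weight <= 0:
        raise ValueError("group has zero prevalence")
    return sum(s * prevalence.get(s, 0.0) for s in stages) / weight


def dana(prevalence: Mapping[int, float]) -> float:
    """Difference between mean stage of advanced and non-advanced fibrosis."""
    return mean_stage(prevalence, ADVANCED) - mean_stage(prevalence, NON_ADVANCED)


def regression_auroc(dana_value: float) -> float:
    """AUROC predicted by the published regression: 0.582 + 0.1056 x DANA."""
    return 0.582 + 0.1056 * dana_value


# Standard prevalence: 0.20 for each of the five stages.
uniform = {stage: 0.20 for stage in range(5)}
standard_dana = dana(uniform)
print(round(standard_dana, 2))                    # 2.5
print(round(regression_auroc(standard_dana), 3))  # 0.846
```

The same `dana` function accepts raw counts instead of proportions, because the weights are normalized within each group.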

Patients

Integrated database

The integrated database included published prospective studies of patients with chronic hepatitis C which had concomitant FT measurement and liver biopsy, had liver biopsy scored using the METAVIR scoring system,5 had FT assessed on fresh serum using the recommended pre-analytical and analytical procedures and had individual data sent by the principal investigator.

Liver biopsies

In the integrated database, liver biopsies were processed by using standard techniques. A pathologist who was unaware of the biochemical markers evaluated the fibrosis stage and necrosis grade according to the METAVIR scoring system.5 Fibrosis was staged on a scale of 0–4: F0 = no fibrosis, F1 = portal fibrosis without septa, F2 = few septa, F3 = numerous septa without cirrhosis, F4 = cirrhosis. Biopsies were performed with a 16-gauge Hepafix Luer Lock needle (Braun Melsungen, Melsungen, Germany) in the Paris centre and the Bordeaux centre, and with various needles in the multicenter study from Marseille.

Biochemical markers

The previously validated FT-AT was used.8 FibroTest (FT; Biopredictive, Paris, France; HCV-Fibrosure, Labcorp, Burlington, USA) has been validated for the assessment of liver fibrosis in patients with chronic hepatitis C and B, and in patients with alcoholic and non-alcoholic steatosis. FT is a non-invasive blood test that combines the quantitative results of five serum biochemical markers (alpha2macroglobulin, haptoglobin, gamma glutamyl transpeptidase, total bilirubin and apolipoprotein A1) with the patient’s age and gender in a patented artificial intelligence algorithm (USPTO 6 631 330), which generates a measurement of fibrosis stage in the liver.

Statistical analysis

The AUROC was used as a measurement of discrimination; it was estimated by using the empirical (non-parametric) method of DeLong et al.16 and was compared by using the paired method of Zhou et al.17 All analyses were performed with NCSS software (Kaysville, UT, USA).
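For readers unfamiliar with the empirical estimate, the point estimate of a non-parametric AUROC is the proportion of (advanced, non-advanced) patient pairs in which the marker ranks the advanced patient higher, with ties counted as one half. The sketch below illustrates only that point estimate; it does not reproduce the standard errors or the paired comparison used in the study, and the marker values in the example are invented for illustration.

```python
from typing import Sequence


def empirical_auroc(scores_pos: Sequence[float],
                    scores_neg: Sequence[float]) -> float:
    """Empirical AUROC: fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 if the positive score is higher, 0.5 if tied, 0 otherwise.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented marker values, for illustration only (not study data).
advanced = [0.80, 0.60, 0.55]
non_advanced = [0.30, 0.60, 0.20]
print(round(empirical_auroc(advanced, non_advanced), 3))  # 0.833
```

The double loop is quadratic in the number of patients, which is acceptable for a sketch; rank-based formulations give the same value more efficiently.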

Results

Integrated database

A total of 1312 subjects from three centres (Paris n = 537, Marseille n = 601, Bordeaux n = 174) were included in the integrated database. These were patients with chronic hepatitis C who were PCR positive before treatment and who had had contemporaneous FT and liver biopsy with METAVIR staging performed. The prevalence of stages was 11% for F0 (n = 141), 40% for F1 (n = 520), 22% for F2 (n = 295), 16% for F3 (n = 208) and 11% for F4 (n = 148); there was a 50% prevalence of advanced fibrosis (n = 651). The mean age was 48 years and 58% were male. Biopsy length was detailed in 1292 patients (up to 10 mm: n = 254, 20%, median 9 mm; 10–20 mm: n = 683, 53%, median 15 mm; over 20 mm: n = 355, 27%, median 25 mm) and the number of biopsy fragments was detailed in 853 patients (28% with more than two fragments). The main characteristics (age, gender, ethnicity, body mass index, genotype and fibrosis stages) did not differ between the 479 patients with missing data (20 patients with missing biopsy length and 459 patients with missing fragmentation) and the 853 patients with non-missing data (data not shown).
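As a rough consistency check, the observed DANA of the whole database can be recomputed from the stage counts reported above. The sketch below uses only those counts; the variable and function names are hypothetical. The result, about 1.99, is close to the DANA of 1.98 listed for all patients in Table 2 (the small gap may reflect rounding or details not given in the text).

```python
# Stage counts reported for the integrated database (METAVIR stage -> n).
stage_counts = {0: 141, 1: 520, 2: 295, 3: 208, 4: 148}


def group_mean_stage(counts, stages):
    """Mean METAVIR stage among patients whose stage is in `stages`."""
    n = sum(counts[s] for s in stages)
    return sum(s * counts[s] for s in stages) / n


total = sum(stage_counts.values())
mean_advanced = group_mean_stage(stage_counts, (2, 3, 4))
mean_non_advanced = group_mean_stage(stage_counts, (0, 1))
observed_dana = mean_advanced - mean_non_advanced

print(total)                    # 1312
print(round(observed_dana, 2))  # 1.99
```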

Adjusted AUROCs according to biopsy length

Standardized AUROCs adjusted with the relative AUROC of biopsy vs. entire liver are presented in Table 1. For adjacent stages, the adjusted FT AUROCs ranged from 0.75 to 0.87. The adjusted FT AUROCs for advanced fibrosis and cirrhosis were 0.89 and 0.96, respectively.

Table 2.   Diagnostic value (AUROCs) of FibroTest for the diagnosis of fibrosis stages observed in the integrated database according to biopsy length and fragmentation

| Group | Advanced vs. non-advanced (F2F3F4 vs. F0F1) | DANA | AdAUROC | Cirrhosis vs. non-cirrhosis (F4 vs. F0F1F2F3) | F1 vs. F0 | F2 vs. F1 | F3 vs. F2 | F4 vs. F3 |
|---|---|---|---|---|---|---|---|---|
| All§ | 0.80 ± 0.01 (1312) | 1.98 | 0.85 | 0.88 ± 0.01 (1312) | 0.60 ± 0.03 (661) | 0.67 ± 0.01 (815) | 0.71 ± 0.03 (503) | 0.71 ± 0.03 (356) |
| Biopsy length: 1–10 mm | 0.85 ± 0.02$ (254) | 2.00 | 0.90 | 0.89 ± 0.03 (254) | 0.54 ± 0.08 (127) | 0.76 ± 0.04 (164) | 0.65 ± 0.06 (90) | 0.75 ± 0.06 (71) |
| Biopsy length: 10–20 mm | 0.78 ± 0.02 (683) | 1.96 | 0.84 | 0.87 ± 0.02 (683) | 0.61 ± 0.04 (356) | 0.66 ± 0.03* (437) | 0.71 ± 0.03 (252) | 0.67 ± 0.04 (172) |
| Biopsy length: >20 mm | 0.79 ± 0.02 (355) | 2.01 | 0.84 | 0.90 ± 0.02 (355) | 0.61 ± 0.04 (171) | 0.63 ± 0.04** (205) | 0.76 ± 0.04 (152) | 0.73 ± 0.05 (104) |
| Fragmentation: 1–2 fragments | 0.79 ± 0.02 (617) | 1.79 | 0.86 | 0.88 ± 0.02 (617) | 0.62 ± 0.04 (322) | 0.68 ± 0.03 (421) | 0.71 ± 0.03 (249) | 0.66 ± 0.04 (133) |
| Fragmentation: >2 fragments | 0.79 ± 0.02 (236) | 2.24 | 0.83 | 0.89 ± 0.02 (236) | 0.60 ± 0.06 (99) | 0.56 ± 0.08*** (126) | 0.75 ± 0.06 (83) | 0.79 ± 0.05£ (93) |
| 1–10 mm, 1–2 fragments | 0.91 ± 0.04$$ (55) | 1.87 | 0.98 | 0.90 ± 0.05 (55) | 0.53 ± 0.19 (19) | 0.81 ± 0.09 (24) | 0.67 ± 0.13 (16) | 0.83 ± 0.09 (27) |
| 1–10 mm, >2 fragments | 0.80 ± 0.04 (140) | 2.52 | 0.80 | 0.87 ± 0.05 (140) | 0.56 ± 0.11 (74) | 0.72 ± 0.06 (93) | 0.64 ± 0.08 (54) | 0.70 ± 0.10 (35) |
| 10–20 mm, 1–2 fragments | 0.75 ± 0.05 (111) | 1.76 | 0.83 | 0.88 ± 0.04 (111) | 0.58 ± 0.08 (51) | 0.42 ± 0.08^ (57) | 0.82 ± 0.07 (34) | 0.78 ± 0.07 (45) |
| 10–20 mm, >2 fragments | 0.78 ± 0.02 (384) | 2.36 | 0.79 | 0.85 ± 0.02 (384) | 0.66 ± 0.05 (201) | 0.67 ± 0.03 (268) | 0.70 ± 0.04 (156) | 0.60 ± 0.06 (76) |
| >20 mm, 1–2 fragments | 0.77 ± 0.06 (70) | 1.88 | 0.83 | 0.90 ± 0.04 (70) | 0.73 ± 0.11 (29) | 0.59 ± 0.08 (45) | 0.77 ± 0.08 (33) | 0.74 ± 0.12 (21) |
| >20 mm, >2 fragments | 0.81 ± 0.05 (93) | 1.85 | 0.88 | 0.96 ± 0.02 (93) | 0.53 ± 0.11 (46) | 0.65 ± 0.07 (60) | 0.85 ± 0.06 (39) | 0.79 ± 0.10 (22) |

§ AUROC ± s.e. of the mean (number of patients).
$ P = 0.03 vs. 10–20 mm.
$$ P = 0.03 vs. 1–10 mm >2 fragments and vs. >20 mm 1–2 fragments; P = 0.005 vs. 10–20 mm 1–2 fragments; P = 0.002 vs. 10–20 mm >2 fragments.
* P = 0.03 and ** P = 0.01 vs. 1–10 mm; *** P = 0.04 vs. 1–2 fragments.
^ P = 0.002 vs. 1–10 mm 1–2 fragments; P = 0.005 vs. >2 fragments 1–10 and 10–20 mm; P = 0.03 vs. >20 mm >2 fragments.
£ P = 0.047 vs. 1–2 fragments.
Number of patients included was 1312, with detailed biopsy length 1292, with detailed number of fragments 853. DANA is the difference between mean fibrosis stages in the advanced fibrosis group and the non-advanced fibrosis group; AdAUROC is the adjusted AUROC taking into account the observed DANA vs. a standard DANA of 2.5; all the AUROCs have been adjusted to a DANA of 2.5 using the formula: AdAUROC (2.5) = ObAUROC + 0.1056 × (2.5 − ObDANA).

Relationships between biopsy length, fragmentation and fibrosis stages

The mean biopsy length was smaller for cirrhosis (F4, 16 mm) vs. F3 (18 mm, P = 0.01) and F0 (19 mm, P = 0.01), but not vs. F1 (17 mm) and F2 (18 mm). The mean number of fragments was higher for cirrhosis (F4 = 4 fragments) vs. all the other stages (F0 = 1.9, F1 = 1.9, F2 = 1.9, F3 = 2.3; P < 0.001 vs. F4); 54% of cirrhosis biopsies had more than two fragments vs. 24% of non-cirrhosis biopsies (P < 0.0001).

AUROCs according to the prevalence of fibrosis stages, defining advanced and non-advanced fibrosis

According to stage prevalence, the FT AUROCs varied (P < 0.001) from 0.67 to 0.98 for DANA ranging from 1 (the lowest difference, with only stage F2 as advanced fibrosis and only F1 as non-advanced fibrosis) to 4 (the highest difference, with only F4 as advanced fibrosis and only F0 as non-advanced fibrosis). There was a highly significant correlation between the FT AUROC and DANA [Spearman’s r coefficient = 0.95 (P < 0.0001)]. The regression formula for standardizing AUROCs estimated from different stage prevalences was AUROC = 0.582 + 0.1056 × (DANA). The FT AUROC standardized at a DANA value of 2.5 was 0.85. Therefore the adjusted AUROC (AdAUROC), which took into account the observed DANA (ObDANA) vs. a standard DANA of 2.5, was calculated by using the formula: AdAUROC = ObAUROC + 0.1056 × (2.5 − ObDANA).
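The prevalence adjustment can be illustrated with a short Python sketch that applies the formula, with its standard DANA of 2.5 and slope of 0.1056, to the observed AUROC and DANA values listed for advanced vs. non-advanced fibrosis in Table 2. The function name and the dictionary layout are hypothetical; the input values are copied from the table.

```python
SLOPE = 0.1056        # regression slope linking AUROC to DANA
STANDARD_DANA = 2.5   # DANA of the uniform (0.20 per stage) distribution


def dana_adjusted_auroc(ob_auroc: float, ob_dana: float,
                        standard_dana: float = STANDARD_DANA,
                        slope: float = SLOPE) -> float:
    """AdAUROC = ObAUROC + slope x (standard DANA - observed DANA)."""
    return ob_auroc + slope * (standard_dana - ob_dana)


# (ObAUROC, ObDANA) for advanced vs. non-advanced fibrosis, from Table 2.
observed = {
    "All": (0.80, 1.98),
    "1-10 mm": (0.85, 2.00),
    "10-20 mm": (0.78, 1.96),
    ">20 mm": (0.79, 2.01),
}

adjusted = {label: round(dana_adjusted_auroc(auroc, d), 2)
            for label, (auroc, d) in observed.items()}
print(adjusted)
# {'All': 0.85, '1-10 mm': 0.9, '10-20 mm': 0.84, '>20 mm': 0.84}
```

These four values agree with the AdAUROC column of Table 2 for the same rows. An observed DANA below 2.5 raises the AUROC, and one above 2.5 lowers it.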

AUROCs and adjusted AUROCs according to biopsy length and fragmentation

The FT AUROCs for adjacent stages ranged from 0.54 to 0.76 and from 0.56 to 0.79 depending on biopsy length and fragmentation, respectively (Table 2).

Significant AUROC differences were observed for advanced vs. non-advanced fibrosis, as well as for F2 vs. F1 according to biopsy length, with surprisingly higher AUROCs for 1–10-mm long biopsies.

There was no significant difference in the FT AUROC for the diagnosis of advanced vs. non-advanced fibrosis between non-fragmented (0.79 ± 0.02) and fragmented biopsies (0.79 ± 0.03; P = 0.91). However, when the FT AUROCs for adjacent stages were compared according to the presence or absence of biopsy fragmentation (Table 2), there were two significant differences in opposite directions: lower FT AUROCs for fragmented biopsies for F2 vs. F1 and higher FT AUROCs for fragmented biopsies for F4 vs. F3.

Discussion

This study allowed the relative roles of biopsy length and fragmentation in AUROC variability to be estimated.

Liver biopsy is not a true gold standard and the diagnostic studies of fibrosis markers must take into account all the factors associated with the risk of biopsy failure. These factors include the intra- and inter-pathologist variability for fibrosis staging, and more importantly, the sampling error.

The sampling error related to biopsy length has been clearly demonstrated by using the entire liver as a gold standard,13 and we suggest using this information to standardize AUROCs by adjusting them for mean biopsy length. The three proposed classes (0–10 mm, 10–20 mm and over 20 mm) have the advantage of covering the usual range of diagnostic studies and have been validated vs. the entire liver.13 Indeed, the variability of the FT AUROCs observed between stages was reduced after standardization for most stages (Table 1).

Table 1.   Diagnostic value of FibroTest (AUROC) for the diagnosis of stage combinations of hepatic fibrosis after standardization on the diagnostic value of biopsy according to the biopsy length

Each cell gives Ob / Gs / Al.

| Biopsy length | F1 vs. F0 | F2 vs. F1 | F3 vs. F2 | F4 vs. F3 | Advanced fibrosis vs. non-advanced (all cases) | Cirrhosis vs. non-cirrhosis (all cases) |
|---|---|---|---|---|---|---|
| All (median = 16 mm) | 0.60 / 0.80 / 0.75 | 0.67 / 0.82 / 0.82 | 0.71 / 0.86 / 0.83 | 0.71 / 0.82 / 0.87 | 0.80 / 0.92 / 0.89 | 0.88 / 0.92 / 0.96 |
| 1–10 mm | 0.54 / 0.70 / 0.77 | 0.76 / 0.75 / 1.00 | 0.65 / 0.75 / 0.87 | 0.75 / 0.75 / 1.00 | 0.85 / 0.81 / 1.00 | 0.89 / 0.81 / 1.00 |
| 10–20 mm | 0.61 / 0.80 / 0.76 | 0.66 / 0.82 / 0.80 | 0.71 / 0.86 / 0.83 | 0.67 / 0.82 / 0.84 | 0.78 / 0.92 / 0.85 | 0.87 / 0.92 / 0.95 |
| >20 mm | 0.61 / 0.85 / 0.72 | 0.63 / 0.89 / 1.00 | 0.76 / 0.95 / 0.80 | 0.73 / 0.89 / 0.82 | 0.79 / 0.95 / 0.83 | 0.90 / 0.95 / 0.95 |

Ob, observed FibroTest AUROC; Gs, biopsy AUROC adjusted on the gold standard (entire liver) for the given biopsy length, adapted from Bedossa et al.(12); Al, length-adjusted FibroTest AUROC = observed AUROC/gold standard AUROC.

In a separate study, we demonstrated that the prevalence of liver fibrosis stages defining advanced and non-advanced fibrosis is a major factor of variability in assessing the diagnostic value of a fibrosis marker.10 FT AUROCs varied from 0.67 to 0.98 according to this prevalence. We suggested a standardization using the same prevalence for each stage. Without this standardization, the indirect comparisons between biomarkers are impossible.

However, the variability of FT AUROCs observed between stages was not fully corrected after the length (Table 1) and DANA (Table 2) standardizations, suggesting other variability factors. For example, for F2 vs. F1, the higher AUROC observed for biopsies smaller than 10 mm (0.76) vs. longer biopsies (0.66 and 0.63) was not expected (Table 2). This difference could be explained in part by the impact of fragmentation, as the AUROCs for the F2 vs. F1 comparison were lower in patients with fragmented (0.56) vs. non-fragmented biopsies (0.68) (Table 2). This suggests more false positives or more false negatives related to biopsy failure among fragmented biopsies without cirrhosis and therefore more discordance with FT. Conversely, the higher FT AUROCs observed for fragmented (0.79) vs. non-fragmented biopsies (0.66) in the F4 vs. F3 comparison (Table 2) could suggest fewer failures of fragmented biopsies and less discordance with FT, as fragmentation is more specific for true F4, i.e. there are fewer false positives for the diagnosis of F4. These data indirectly confirm that fragmentation is a sign of cirrhosis, with better concordance between biopsy and FT in fragmented than in non-fragmented biopsies, independent of biopsy length. In F4 patients, the biopsies were significantly shorter, with twice as many fragments as in F3. Therefore, biopsy length and fragmentation must be discussed together when interpreting the AUROC and discordances between a marker and biopsy. Even when there is no apparent effect on the AUROCs for advanced fibrosis, a significant impact can exist between adjacent stages.

Limitations of the study

This study has several limitations.

We analysed the biopsy length but not the biopsy width, although this has also been associated with sampling error.18 We also looked at the portal tract number (data not shown) which added no significant improvement to the variability analysis according to length and fragmentation. The details concerning fragmentation were missing in a part of our population, but when we compared patients with missing data to the others, there were no significant differences which could suggest a bias.

We used a simple standardization for the DANA, giving the same weight to each fibrosis stage.10 There is controversy concerning the linear association between the METAVIR scoring system and the quantity of fibrosis, and concerning the linear progression of fibrosis. However, even if the exact model is unknown, the METAVIR scoring system is one of the best validated scoring systems, and no better alternative exists.

Diagnostic studies comparing different markers need direct comparisons in the same patients to avoid bias related to indirect comparisons. With this type of internal control, the standardization of the AUROCs according to the prevalence of fibrosis stages, biopsy length and fragmentation is not mandatory. However, because the applicability of some markers such as elastometry may differ from that of blood biomarkers, both types of marker cannot be obtained in all patients, and standardization can be useful. The number of new fibrosis markers is increasing rapidly and it will be increasingly difficult to compare all of them in the same patients.4

Conclusions

Details of biopsy length and fragmentation, as well as the prevalence of each stage defining advanced and non-advanced fibrosis, must be given and discussed in studies assessing or comparing fibrosis markers. Without knowing the impact of these three factors on the AUROC estimates, published results of diagnostic studies and meta-analyses of fibrosis markers can be misinterpreted by non-specialists.

Authors’ Contributions

TP conceived the study, performed the statistical analysis, and wrote the manuscript. MM, FIM, PH, LC, VR, YB, MB and VdL participated in the co-ordination of the study, data monitoring and drafted the manuscript. All authors read and approved the final manuscript.

Acknowledgements

Authors’ declaration of personal interests: Thierry Poynard has grants from the Association pour la Recherche sur le Cancer (ARECA) and from the Association de Recherche sur les Maladies Virales Hépatiques. Special thanks to Pierre Bedossa for furnishing details of his previous publication.
