Biomarkers of liver fibrosis: What lies beneath the receiver operating characteristic curve?


  • Potential conflict of interest: Dr. Guha is a consultant for and received grants from Pfizer.


Noninvasive biomarkers of liver fibrosis represent an intense area of research with the goals of improving patient care, disease stratification, and aiding the development of future antifibrotic therapies. Despite the rapid progress in recent years, there remain questions about how diagnostic studies are designed, statistical methods to account for spectrum bias, clinically relevant thresholds of fibrosis that should be delineated, how diagnostics can be improved, and strengthening the reference test to judge emerging biomarkers. This review discusses the current methods to address these issues and where further progress is needed. (HEPATOLOGY 2011;)

The quest for noninvasive markers of liver fibrosis represents an area of intense research and controversy in hepatology. The growing global burden of liver disease, patient preference for noninvasive tests, limited tools for assessing the efficacy of emerging antifibrotic medication, and the necessity of healthcare to be cost-effective are the key driving forces in this area.

There has been undoubted progress in the area of noninvasive biomarkers of liver fibrosis. The evolving understanding of the biology of fibrosis progression and regression has underpinned many of the candidate markers used in emerging biomarker tests. Coupled with markers of hepatic function, liver enzymes, routine blood tests, and anthropometry, these candidate markers have led to the creation of a large number of panel tests that are able to distinguish severe fibrosis/cirrhosis with acceptable diagnostic accuracy.1, 2 The emergence of novel technologies, both imaging and the “omics,” holds further promise for improving diagnostic accuracy. However, there are fundamental aspects of how we conduct diagnostic studies, compare different biomarkers, and define endpoints that have not been clearly defined or agreed upon within the hepatology field. This article will discuss some of these important issues.


α-SMA, α-smooth muscle actin; AUC, area under the receiver operating characteristic curve; CHC, chronic hepatitis C; DANA, difference between the mean of advanced and nonadvanced fibrosis stages; DWI, diffusion-weighted imaging; HVTT, hepatic vein transit time; IFN, interferon; MRE, magnetic resonance elastography; MRI, magnetic resonance imaging; STARD, standards for the reporting of diagnostic accuracy; TGF-β, transforming growth factor-β.

What Are the Goals of Noninvasive Markers of Liver Fibrosis?

The majority of studies in noninvasive markers focus on cross-sectional endpoints of liver fibrosis. Although this remains an important endpoint to enable stratification of disease severity, there are other endpoints that also need to be considered. Histology represents a surrogate for the development of future liver-related outcomes; finding biomarkers that directly correlate with clinical outcomes has obvious merit. The ability to test such markers in a serial and noninvasive manner, and the confidence that these changes will alter future prognosis, will have a major impact on clinical practice. A current barrier to the development of antifibrotic therapy is the lack of robust tools that enable the dynamic processes of fibrogenesis and fibrolysis to be measured at early and regular intervals. Importantly, this will influence regulatory guidelines for industry developing antifibrotic therapies. The current reliance on histological staging of fibrosis using categorical scores on liver biopsy is a suboptimal method to assess efficacy. Thus, the evolution and diversity of endpoints should be expected and encouraged. Importantly, one biomarker test may not achieve all of these diverse objectives.

Design and Quality of Studies

A common theme that emerges from the systematic reviews of noninvasive markers3-7 is the heterogeneity of studies. Some issues in conducting and reporting diagnostic studies are common to many areas of clinical research, whereas others are specific to hepatology.

The CONSORT guidelines were established in the mid-1990s for intervention studies. Based on similar principles the STAndards for the Reporting of Diagnostic accuracy studies (STARD) guidelines (Table 1) were developed by a working group based on the 1999 Cochrane colloquium meeting in Rome and first published in 2003.8 The difference these guidelines have made on quality is limited by the time lag in influencing the design of diagnostic studies and the variability in adoption across medical journals. A review of the quality of diagnostic studies after STARD was introduced had a number of interesting observations relevant to biomarkers of liver fibrosis. Approximately 50% of studies did not report inclusion/exclusion criteria or how patients were sampled (e.g., random selection or consecutive series) and 25% did not report if the diagnostic study was prospective or retrospective.9-11

Table 1. STARD Items
Section and Topic | Item | Description
TITLE/ABSTRACT/KEYWORDS | 1 | Identify the article as a study of diagnostic accuracy (recommend MeSH heading 'sensitivity and specificity').
INTRODUCTION | 2 | State the research questions or study aims, such as estimating diagnostic accuracy or comparing accuracy between tests or across participant groups.
Participants | 3 | Describe the study population: The inclusion and exclusion criteria, setting and locations where the data were collected.
 | 4 | Describe participant recruitment: Was recruitment based on presenting symptoms, results from previous tests, or the fact that the participants had received the (evaluated) index tests or the (golden) reference standard?
 | 5 | Describe participant sampling: Was the study population a consecutive series of participants defined by the selection criteria in items 3 and 4? If not, specify how participants were further selected.
 | 6 | Describe data collection: Was data collection planned before the index test and reference standard were performed (prospective study) or after (retrospective study)?
Test methods | 7 | Describe the reference standard and its rationale.
 | 8 | Describe technical specifications of material and methods involved including how and when measurements were taken, and/or cite references for index tests and reference standard.
 | 9 | Describe definition of and rationale for the units, cut-offs and/or categories of the results of the index tests and the reference standard.
 | 10 | Describe the number, training and expertise of the persons executing and reading the index tests and the reference standard.
 | 11 | Describe whether or not the readers of the index tests and reference standard were blind (masked) to the results of the other test and describe any other clinical information available to the readers.
Statistical methods | 12 | Describe methods for calculating or comparing measures of diagnostic accuracy, and the statistical methods used to quantify uncertainty (e.g. 95% confidence intervals).
 | 13 | Describe methods for calculating test reproducibility, if done.
Participants | 14 | Report when study was done, including beginning and ending dates of recruitment.
 | 15 | Report clinical and demographic characteristics of the study population (e.g. age, sex, spectrum of presenting symptoms, co morbidity, current treatments, recruitment centers).
 | 16 | Report the number of participants satisfying the criteria for inclusion that did or did not undergo the index tests and/or the reference standard; describe why participants failed to receive either test (a flow diagram is strongly recommended).
Test results | 17 | Report time interval from the index tests to the reference standard, and any treatment administered between.
 | 18 | Report distribution of severity of disease (define criteria) in those with the target condition; other diagnoses in participants without the target condition.
 | 19 | Report a cross tabulation of the results of the index tests (including indeterminate and missing results) by the results of the reference standard; for continuous results, the distribution of the test results by the results of the reference standard.
 | 20 | Report any adverse events from performing the index tests or the reference standard.
Estimates | 21 | Report estimates of diagnostic accuracy and measures of statistical uncertainty (e.g. 95% confidence intervals).
 | 22 | Report how indeterminate results, missing responses and outliers of the index tests were handled.
 | 23 | Report estimates of variability of diagnostic accuracy between subgroups of participants, readers or centers, if done.
 | 24 | Report estimates of test reproducibility, if done.
DISCUSSION | 25 | Discuss the clinical applicability of the study findings.

To assess the quality of diagnostic studies, tools such as the QUality Assessment of Diagnostic Accuracy Studies (QUADAS) have been formulated. Although broadly useful, the QUADAS is not specific for liver fibrosis. As Table 2 shows, there are certain aspects of assessment that deserve particular attention for liver biomarker studies. The selection of patients into diagnostic studies of liver disease has predetermined spectrum bias because of the requirement of an invasive reference test, often performed only in specialist care settings. Bias is also introduced if the studies are retrospective (rather than prospective) and recruitment is nonconsecutive. The “doubling up” of therapeutic/intervention trials as diagnostic studies has obvious advantages, but such studies need to be treated with caution: they impose additional inclusion/exclusion criteria compared with a priori diagnostic studies, and their results may therefore not be applicable to the future population to be tested.

Table 2. QUADAS Criteria
QUADAS Criteria | Relevance to Liver Biomarker Studies
1. Was the spectrum of patients representative of the patients who will receive the test in practice? | Most current studies are conducted in hospital settings (secondary or tertiary care) as the reference standard is an invasive liver biopsy. Care in extrapolating data to community settings is needed.
2. Were selection criteria clearly described? | Important to distinguish prospective and retrospective studies and also “de novo” diagnostic studies from a priori therapeutic studies.
3. Is the reference standard likely to correctly classify the target condition? | A contentious issue for liver fibrosis. Is liver biopsy a true gold standard?
4. Is the time period between reference standard and index test short enough to be reasonably sure that the target condition did not change between the two tests? | Ideally the biomarker should be taken at the same time as the reference test (liver biopsy). However, the progression of liver fibrosis is not generally rapid, so there may be flexibility, but there is currently no consensus.
5. Did the whole sample or a random selection of the sample, receive verification using a reference standard of diagnosis? | Most studies comply with this.
6. Did patients receive the same reference standard regardless of the index test result? | Most studies comply with this.
7. Was the reference standard independent of the index test (the index test did not form part of the reference standard)? | Not a major issue for the non-invasive biomarker studies that use liver biopsy as the reference test.
8. Was the execution of the index test described in sufficient detail to permit replication of the test? | Most studies give details of processing. Algorithms of panel tests are not always published. Of particular relevance to the imaging modalities and “omics” technologies. Allied to this question is also the issue of reliability and reproducibility of the test.
9. Was the execution of the reference standard described in sufficient detail to permit replication of the test? | Important to give details of pathological analysis; size of biopsy, portal tracts, number of pathologists and scoring systems.
10. Were the index test results interpreted without knowledge of the results of the reference standard? | Most studies comply with this.
11. Were the reference standard results interpreted without knowledge of the results of the index test? | Most studies comply with this.
12. Were the same clinical data available when test results were interpreted as would be available when the test is used in practice? | Pertinent for the diagnosis of cirrhosis; routine blood tests, radiology and clinical findings have diagnostic potential; unclear to the extent that biomarkers show “additive” diagnostic discrimination for this stage of disease.
13. Were uninterpretable/intermediate test results reported? | Biomarkers are often reported in the context of dichotomous outcomes. This is because the reference test is reported as a categorical variable. It may limit the potential utility of biomarkers.
14. Were withdrawals from the study explained? | Most studies are cross sectional. Greater issue is lack of transparency in selection (see point 2).

Diagnostic tests are assessed for both accuracy and reliability. Whereas accuracy depends on agreement between the index test (e.g., biomarker) and the reference test (e.g., liver biopsy), reliability is the agreement between multiple observations of the same index test (e.g., biomarker). The assessment of reliability differs between biomarker modalities. With serum/urine biomarkers, analytical variation needs to be assessed within the laboratory (e.g., coefficient of variation measures) and, if there are multiple sites, between laboratories. With imaging modalities, studies have shown that variation can exist depending on the training and experience of the operator.12 The “omics” technologies rely on the power of the technology to discriminate endpoints, but this can be limited by accompanying variability in both acquisition and analysis.
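As a minimal illustration of the within-laboratory and between-laboratory variation mentioned above, the coefficient of variation can be computed as the sample standard deviation divided by the mean. The replicate values and laboratory names below are hypothetical and serve only to show the arithmetic; they are not data from any cited study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample coefficient of variation (SD / mean) as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; the coefficient of variation is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical replicate measurements (arbitrary units) of one serum marker
# on the same sample, run at two laboratories.
lab_a = [10.0, 10.4, 9.8, 10.2]
lab_b = [11.0, 11.3, 10.9, 11.2]

# Within-laboratory variation: CV of the replicates at each site.
intra_lab_cv = {
    "lab_a": coefficient_of_variation(lab_a),
    "lab_b": coefficient_of_variation(lab_b),
}

# Between-laboratory variation: CV of the per-laboratory means.
inter_lab_cv = coefficient_of_variation([mean(lab_a), mean(lab_b)])
```

In practice a formal reliability study would use more sites and replicates, and often a variance-components model rather than this simple summary.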

Spectrum Bias

Since it was originally described by Ransohoff and Feinstein over 30 years ago,13 it has been clear that the performance of a diagnostic test (i.e., its sensitivity and specificity) may vary depending on the spectrum of clinical, pathological, and comorbid characteristics of the patients to whom the test has been applied.14 This “spectrum bias” has important implications for the study of biomarkers of liver fibrosis, particularly in comparisons of markers across different study populations. Indeed, empiric evidence has shown that a biomarker's area under the receiver operating characteristic curve (AUC), the most frequently used measure of test performance in this field, may be biased if the fibrosis distribution in the study differs from that of the reference population to which it is applied.15, 16 For example, in a study of 1,000 simulated samples of hepatitis C virus (HCV) patients with different distributions of fibrosis, Lambert et al.16 showed an average bias of 0.063 (95% confidence interval [CI] 0.016-0.105) in the AUC of the FibroTest among samples with a predominance of extreme stages of fibrosis (largely F0 and F4 [AUC = 0.871]) compared with a more representative fibrosis distribution (largely F1 and F2 [AUC = 0.808]). Comparisons of the FibroTest's AUC between samples defined by intermediate and extreme fibrosis distributions were also highly prone to type I errors, which occurred in 43% of simulations. Similarly, in an analysis of a large HCV patient database, an important association between the AUC of the FibroTest and the prevalence of fibrosis stages used to define advanced (F2-F4) and nonadvanced fibrosis (F0-F1) was described.15 In this analysis the AUC for differentiating advanced from nonadvanced fibrosis was 0.80 in a reference population with a 50% prevalence of F2-F4 fibrosis. However, in situations with very divergent fibrosis distributions the AUCs ranged from 0.67 in a sample restricted to patients with F1 and F2 fibrosis to 0.98 in a sample including only patients with F0 and F4 fibrosis.
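The dependence of the AUC on the stage mix can be made concrete with a short sketch. The AUC for advanced versus nonadvanced fibrosis is the probability that a randomly chosen advanced patient has a higher marker value than a randomly chosen nonadvanced patient, so it is a prevalence-weighted average of the stage-versus-stage AUCs. The pairwise AUC values below are hypothetical (chosen only so that adjacent stages are harder to separate than distant ones) and are not the FibroTest figures quoted above.

```python
def binary_auc(prevalence, pairwise_auc, cutoff=2):
    """AUC for advanced (stage >= cutoff) versus nonadvanced fibrosis.

    prevalence:   dict mapping stage (0-4) to its proportion in the sample.
    pairwise_auc: function (lower_stage, higher_stage) -> AUC for that pair.
    """
    low = {s: p for s, p in prevalence.items() if s < cutoff and p > 0}
    high = {s: p for s, p in prevalence.items() if s >= cutoff and p > 0}
    if not low or not high:
        raise ValueError("sample must contain both nonadvanced and advanced stages")
    total = sum(low.values()) * sum(high.values())
    weighted = sum(
        p_low * p_high * pairwise_auc(i, j)
        for i, p_low in low.items()
        for j, p_high in high.items()
    )
    return weighted / total


# Hypothetical pairwise AUCs that depend only on the distance between stages.
AUC_BY_DISTANCE = {1: 0.65, 2: 0.80, 3: 0.90, 4: 0.97}


def hypothetical_pairwise_auc(i, j):
    return AUC_BY_DISTANCE[j - i]


uniform = {0: 0.2, 1: 0.2, 2: 0.2, 3: 0.2, 4: 0.2}
adjacent_only = {1: 0.5, 2: 0.5}   # only F1 and F2
extremes_only = {0: 0.5, 4: 0.5}   # only F0 and F4

auc_uniform = binary_auc(uniform, hypothetical_pairwise_auc)
auc_adjacent = binary_auc(adjacent_only, hypothetical_pairwise_auc)
auc_extremes = binary_auc(extremes_only, hypothetical_pairwise_auc)
```

With the same marker and the same pairwise discrimination, the reported AUC moves from the adjacent-stage value to the extreme-stage value purely because of the sample composition, which is the essence of spectrum bias.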

In light of these important observations, two methods of adjustment have been proposed to address spectrum bias related to differences in fibrosis stage distributions across study populations.15, 16 In the first, Poynard et al.15 proposed a formula for standardizing AUCs based on a regression equation linking the observed AUC with the difference between the mean of advanced and nonadvanced fibrosis stages in the study population (referred to as DANA). According to the authors:

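\[ \mathrm{DANA} = \frac{2\,p_{F2} + 3\,p_{F3} + 4\,p_{F4}}{p_{F2} + p_{F3} + p_{F4}} - \frac{p_{F1}}{p_{F0} + p_{F1}} \tag{1} \]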

where p equals the prevalence of each fibrosis stage in the study sample. For a hypothetical reference population including a 20% prevalence of each fibrosis stage, DANA equals 2.5. Based on data from 1,312 HCV patients in a multicenter database, the regression formula for standardizing an observed AUC based on this reference population is:

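\[ \mathrm{AUC}_{\text{adjusted}} = \mathrm{AUC}_{\text{observed}} + 0.1056 \times \left(2.5 - \mathrm{DANA}_{\text{observed}}\right) \tag{2} \]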

Based on this adjustment, the adjusted AUC of the FibroTest in the patient database was 0.85 compared with 0.80 without adjustment (at an observed DANA of 1.98). An advantage of this approach is that it facilitates AUC standardization to the fibrosis distribution of any reference population by replacement of 2.5 in Equation 2 with the DANA among that population. Moreover, the calculations required for this method do not require specialized software. The major disadvantage of this method is that it has not been externally validated for biomarkers other than the FibroTest or among patients with non-HCV liver disease. For example, in a systematic review of the performance of the FibroScan in patients with diverse liver disorders, DANA was not significantly associated with the reported AUCs.17 In addition, in the original publication the regression model linking DANA with observed AUCs in the individual patient database did not fit optimally when applied to 18 previously published studies of the FibroTest.15
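The DANA adjustment can be sketched in a few lines. DANA is computed here as the prevalence-weighted mean stage among advanced patients (F2-F4) minus that among nonadvanced patients (F0-F1), which gives 2.5 for a uniform distribution as stated above. The regression slope of 0.1056 is an assumption taken to be the coefficient of the published DANA regression; it should be checked against reference 15 before use. The function names are illustrative only.

```python
def dana(prevalence):
    """Difference between the mean advanced and mean nonadvanced fibrosis stage.

    prevalence: sequence of five proportions for stages F0, F1, F2, F3, F4.
    """
    p0, p1, p2, p3, p4 = prevalence
    advanced = p2 + p3 + p4
    nonadvanced = p0 + p1
    if advanced == 0 or nonadvanced == 0:
        raise ValueError("both advanced and nonadvanced stages must be present")
    mean_advanced = (2 * p2 + 3 * p3 + 4 * p4) / advanced
    mean_nonadvanced = (0 * p0 + 1 * p1) / nonadvanced
    return mean_advanced - mean_nonadvanced


# Assumed regression slope linking the observed AUC to DANA (see lead-in).
ASSUMED_SLOPE = 0.1056


def adjusted_auc(observed_auc, observed_dana, reference_dana=2.5, slope=ASSUMED_SLOPE):
    """Standardize an observed AUC to a reference fibrosis distribution."""
    return observed_auc + slope * (reference_dana - observed_dana)


uniform_dana = dana([0.2, 0.2, 0.2, 0.2, 0.2])          # reference population
example_adjusted = adjusted_auc(0.80, observed_dana=1.98)
```

Under the assumed slope, an observed AUC of 0.80 at a DANA of 1.98 standardizes to roughly 0.85, in line with the figures quoted above. Passing a different `reference_dana` standardizes to another reference population.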

The second proposed method for addressing spectrum bias adapted to the fibrosis biomarker literature is referred to as the “Obuchowski measure.” This measure, interpreted in a similar manner as the AUC, was designed for situations characterized by a nonbinary reference standard (e.g., as in liver fibrosis staging, which is ordinal). The Obuchowski measure represents a weighted average of the n(n-1)/2 different AUCs corresponding to all of the pairwise comparisons between two of the n categories. In the case of liver fibrosis, typically staged from F0 to F4 (i.e., n = 5 categories), this would represent 10 comparisons (5 × 4/2). Weighting can be based on the relative distribution of fibrosis stages in the study sample or on the fibrosis distribution of a reference population, as with DANA described above. In addition, a penalty function can be applied to adjust for the “distance” between fibrosis stages under comparison or the number of units on the ordinal scale. In this way a comparison of AUCs in patients with F0 and F4 fibrosis would be penalized more than a comparison between those with F1 and F2 fibrosis, which is a more difficult distinction. According to the simulation study of Lambert et al.,16 the Obuchowski measure is insensitive to the distribution of fibrosis stages in the study sample and associated with a low rate of type I errors. The authors suggested that by using the Obuchowski measure with the same weighting scheme, results from different studies (e.g., of different biomarkers) could be easily compared or combined in a meta-analysis to minimize the issue of spectrum bias across study populations. Although attractive from a statistical viewpoint, the Obuchowski measure is limited by its requirement for specialized statistical software and programming. Moreover, to combine the results of different studies in a meta-analysis using this measure would require individual patient data or the AUCs of each pairwise comparison of fibrosis stages, both of which are rarely available.
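The weighted pairwise structure described above can be sketched as follows. This is one possible formulation, written under stated assumptions rather than as the published estimator: each pairwise AUC is estimated with the Mann-Whitney statistic, pair weights are proportional to the product of the two stage prevalences, and the default penalty is linear in the distance between stages. With a constant penalty of 1 the result reduces to a plain weighted average of the pairwise AUCs.

```python
from itertools import combinations


def pairwise_auc(low_values, high_values):
    """Mann-Whitney estimate of P(high > low); ties count as one half."""
    wins = sum(
        1.0 if h > l else 0.5 if h == l else 0.0
        for l in low_values
        for h in high_values
    )
    return wins / (len(low_values) * len(high_values))


def obuchowski(values_by_stage, weights=None, penalty=None):
    """Penalized, weighted summary of all pairwise stage-versus-stage AUCs.

    values_by_stage: dict mapping stage (int) to a list of marker values.
    weights:         optional dict mapping stage to a reference prevalence;
                     defaults to the stage proportions in the sample.
    penalty:         optional function (i, j) -> value in [0, 1];
                     defaults to |j - i| divided by the range of stages.
    """
    stages = sorted(s for s, v in values_by_stage.items() if v)
    if len(stages) < 2:
        raise ValueError("at least two non-empty stages are required")
    if weights is None:
        n = sum(len(values_by_stage[s]) for s in stages)
        weights = {s: len(values_by_stage[s]) / n for s in stages}
    if penalty is None:
        span = stages[-1] - stages[0]
        penalty = lambda i, j: abs(j - i) / span
    pairs = list(combinations(stages, 2))
    norm = sum(weights[i] * weights[j] for i, j in pairs)
    if norm == 0:
        raise ValueError("weights must be positive for at least one pair of stages")
    loss = sum(
        weights[i] * weights[j] * penalty(i, j)
        * (1.0 - pairwise_auc(values_by_stage[i], values_by_stage[j]))
        for i, j in pairs
    )
    return 1.0 - loss / norm


# Hypothetical marker values for three stages, for illustration only.
example = {0: [1, 3], 1: [2, 4], 2: [5, 6]}
unpenalized = obuchowski(example, penalty=lambda i, j: 1.0)
penalized = obuchowski(example)
```

Supplying the same `weights` dictionary for every study is what would make results comparable across samples with different stage distributions, as the authors suggest.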

It is presently unclear whether either the DANA or the Obuchowski method is the ideal approach to account for spectrum bias in liver biomarker studies.

Delineating Fibrosis Thresholds

Recent American Association for the Study of Liver Diseases (AASLD) recommendations have indicated that liver histology remains an important adjunct in the management of liver disease, particularly where prognostic information about fibrosis stage may guide treatment.18 There have been significant advances in the treatment for viral hepatitis in the last two decades, and in particular the advent of interferon (IFN)-based therapies for chronic hepatitis C (CHC) infection has required reevaluation of the role of liver biopsy to help guide therapeutic decisions. The 1997 National Institutes of Health (NIH) Consensus guidelines on hepatitis C endorsed the use of pretreatment liver biopsy to select patients at greatest risk of progression to cirrhosis.19 At that time, due to the relatively low efficacy and a significant side effect profile of IFN monotherapy, recommendation for treatment was partly based on a threshold of portal or bridging fibrosis and at least moderate necroinflammation on liver biopsy. Other professional society guidelines also endorsed the need for biopsy to determine eligibility for IFN-based treatment.20 The 2002 NIH consensus statement on hepatitis C suggested that biopsy remained essential to determine fibrosis stage, but there was a need to better define the role of biopsy in therapeutic management and for CHC patients with normal liver enzymes.21 Recently updated practice guidelines for hepatitis C recommend biopsy if the healthcare provider and patient wish for information regarding fibrosis stage or to make a decision regarding treatment.22 The development of specifically targeted antiviral therapy for HCV (STAT-C) with increased efficacy and tolerability will result in a transition to individualized therapy in CHC, and further limit the diagnostic role of biopsy in treatment-naïve patients. Certainly, biopsy is not recommended for HCV genotype 2 or 3 infection prior to current standard-of-care therapy, and integration of newer genomic predictors of virologic response may also reduce the requirement for pretreatment assessment of fibrosis in a proportion of patients with HCV genotype 1 infection.23 Thus, for these patients with expected favorable responses to a finite course of therapy, accurate delineation between mild and moderate stage disease is less important than excluding advanced stage disease (bridging fibrosis and/or cirrhosis) that provides additional prognostic information. For chronic hepatitis B, although the non-IFN-based oral nucleos(t)ide therapies are well tolerated and demonstrate good antiviral efficacy, treatment duration may be prolonged and stopping rules are not as well defined as in CHC infection. Liver biopsy assessment for these patients requires a closer histologic evaluation of mild-to-moderate fibrosis stages and necroinflammation, as this will be incorporated into decisions regarding therapy.24 Differentiation between earlier fibrosis stages is also important in virologic nonresponders, patients with contraindications or poor tolerance to therapy, and other chronic liver disease. This allows for longitudinal evaluation of the natural history of disease progression, or assessment of the efficacy of potential antifibrotic or disease-modifying therapy.

Thus, the evolving treatment paradigm for CHC patients eligible for therapy indicates that histological assessment is required to exclude advanced-stage disease. For other patients with CHC and nonviral-mediated chronic liver injury, delineation between earlier fibrosis stages remains important, but has to be evaluated in the context of other histological features such as inflammation, steatohepatitis, or iron overload. In the absence of viable therapeutic options for nonalcoholic fatty liver disease (NAFLD), the main impetus for noninvasive diagnostic testing in this condition is to provide prognostic information by distinguishing the clinically significant nonalcoholic steatohepatitis (NASH)-associated moderate-advanced fibrosis (stage ≥F2) from earlier stage disease (F0-1) or simple steatosis. Histological scoring systems for NAFLD stages ≤F2 differ from METAVIR; for example, NAFLD fibrosis stage 1 also includes qualitative assessment of periportal and perisinusoidal fibrosis.25 Thus, noninvasive test thresholds developed and validated in CHC patients are likely to be different for mild-moderate stage disease in NAFLD. For alcoholic liver disease, a modified METAVIR staging approach has been used to validate noninvasive tests initially developed in CHC infection.26 There is a relatively consistent definition of bridging fibrosis and cirrhosis for chronic liver disease, thus providing a more reliable fibrosis threshold for noninvasive diagnosis (and prognosis).

Improving Diagnostic Accuracy

Sampling issues and observer variability continue to provide significant limitations to liver biopsy as an accurate diagnostic tool in clinical practice and in the development of sensitive noninvasive measures of fibrosis. Recent AASLD guidelines recommend a biopsy of at least 2-3 cm in length, obtained with a 16G needle, and the presence of greater than 11 complete portal tracts for adequate staging and grading of diffuse parenchymal disease.18 This aims to reduce the sampling error associated with disease heterogeneity that further impairs accuracy of histologic grading and staging.27 However, even in experienced tertiary referral centers, few percutaneous needle biopsies meet these criteria.28 Smaller specimens may be sufficient for the diagnosis of cirrhosis and consideration of other biopsy techniques that may not be routinely available, such as transvenous or laparoscopic approaches, need to be individualized.

Reducing observer variability appears as important as specimen quality for accurate disease staging. One approach has been the use of standardized scoring systems for fibrosis and necroinflammation. However, these do not provide a linear assessment of fibrosis deposition or matrix content, but are semiquantitative grading systems that were developed to provide an overall assessment of severity of hepatic injury and determine appropriate treatment thresholds. The experience of the reviewing histopathologist also remains important to reduce interobserver variability.29 Biopsy evaluation should be performed in conjunction with a trained liver histopathologist, preferably in a multidisciplinary setting that includes the healthcare providers. This is important, as fibrosis staging needs to be assessed in the context of other qualitative features or disease processes present in a biopsy that may provide further useful diagnostic or prognostic information. Furthermore, accurate delineation between fibrosis stages may not be important for all cases, and a qualitative assessment of hepatic injury such as mild or moderate-to-severe disease may be adequate for management decisions in many patients.

Computer-aided morphometric image analysis of hepatic collagen can provide an objective measure of fibrosis. Unfortunately, morphometry is also subject to sampling variability, and the coefficient of variation for image analysis remains high in relation to standard histologic staging for fibrosis.30 However, the ability of this methodology to provide a quantitative measure of collagen potentially allows for earlier detection of changes in fibrosis, for example, due to natural history of disease or efficacy of antifibrotic therapy, as compared to routine histologic assessment. A combination of collagen morphometry with other immunohistochemical measures of myofibroblast activation, such as α-smooth muscle actin (α-SMA) or transforming growth factor-β (TGF-β) staining,31 may provide a more accurate assessment of changes in fibrosis, and has recently been utilized as a coprimary efficacy measure in an antifibrotic clinical trial.32 Emerging bioimaging methods to improve fibrosis quantitation include multiphoton microscopy. This nonlinear optical technique allows for assessment of endogenous signals, such as two-photon excitation fluorescence and second harmonic generation, to provide a spatial assessment of fibrillar collagen (types I and III).33 Although fibrillar collagen deposition represents a relatively late event in fibrogenesis, application of this methodology to unstained biopsy specimens requires minimal additional sample preparation, and could potentially be used in conjunction with morphometry to provide a more accurate and quantitative measure of fibrosis.

Imaging and Novel Technologies

As with serum marker tests, there is growing evidence that novel imaging techniques can also characterize the extent of hepatic fibrosis defined by liver biopsy. One of the motivations for developing novel imaging techniques emanates from the limited sensitivity of conventional imaging strategies to identify early to intermediate stages of fibrosis.34 There continues to be great interest for refining existing technologies associated with ultrasound and magnetic resonance platforms.

One ultrasound-based approach, known as the hepatic vein transit time (HVTT), has been studied for detecting hepatic fibrosis in patients with chronic liver injury. HVTT is measured by Doppler ultrasound following a bolus injection of a microbubble contrast agent. Reduced HVTT suggests the presence of vascular dysregulation and intrahepatic shunting that is found with advanced hepatic fibrosis. Further investigations are required to define the performance of HVTT in detecting early to intermediate stages of hepatic fibrosis.35 A more commonly used approach is called ultrasound-based transient elastography (TE). The majority of published studies demonstrate excellent sensitivity and specificity for detecting cirrhosis (stage F4) with TE, whereas lower degrees of accuracy are noted for identifying patients with stages F2-F4 hepatic fibrosis.36, 37 Acoustic radiation force impulse imaging (ARFI) is a promising method that involves the mechanical excitation of tissue using short-duration (≈262 μs) acoustic pulses. Tissue stimulation generates micron-scale displacements that are then used to calculate wave velocity. Of note, the ARFI method can be embedded into a conventional ultrasound scanner to allow formal examination of the hepatic parenchyma as well. Preliminary results suggest very similar performance to TE although further validation is warranted.38

Several magnetic resonance imaging (MRI) techniques have also been proposed for assessing hepatic fibrosis. The most standardized approach to date is magnetic resonance elastography (MRE). As with TE, MRE directly visualizes and quantitatively measures propagating acoustic shear waves progressing through liver tissue.39 Notably, initial studies with MRE document higher sensitivity and specificity values for detecting stages F2-F4 hepatic fibrosis when compared to TE.21 It is also expected that MRE will become widely available in the near future. There are several differences between TE and MRE that should be recognized. Although reproducibility of liver stiffness measurement by TE and MRE is excellent within experienced centers,40, 41 the degree of variability in accurate measurement may be higher following diffusion of TE into community-based settings.15 The examination of larger hepatic parenchymal areas with MRE is also responsible, in part, for higher diagnostic accuracy when detecting stages F2-F4 hepatic fibrosis as compared to TE.17, 42 For both TE and MRE, however, the presence of necroinflammation, cholestasis, and venous congestion from heart failure may interfere with accurate liver stiffness measurement.36, 42

Diffusion-weighted MR imaging (DWI) is a technique that assesses the degree of molecular diffusion in tissues. With the development of hepatic fibrosis, it is thought that water diffusion is reduced and thus could be assessed as a marker of fibrosis deposition. Results from recent studies using DWI, however, are mixed with respect to demonstrating a specific relationship between the extent of water diffusion (measured as the apparent diffusion coefficient [ADC]) and fibrosis stage on liver biopsy.43 Phosphorus-31 (31P)-based MR spectroscopy (MRS) has been used for assessing hepatic fibrosis in a limited number of studies.44 Because the absolute quantitation of metabolites is difficult to calculate, metabolite ratios for assessing spectral profiles are used for predicting stage of fibrosis. As with DWI, the sensitivity and specificity of MRS are reduced compared to MRE, as substantial overlap in metabolite ratio values is seen between patients with differing fibrosis stages on liver histology.

The refinement of imaging technologies within tertiary care facilities has traditionally led to the diffusion of these methods into the community. The initial period of community use may then be affected by operator inexperience and unfamiliarity with hardware and software components. This is expected to be less of an issue with MR-based approaches, given that these tests are incremental advances based on existing algorithms already used in clinical MRI. The development of combined ultrasound devices capable of performing both conventional real-time sonographic imaging and tissue stiffness assessment will likely increase the chance of uptake into the community.

An emerging issue will be how imaging strategies for detecting hepatic fibrosis should best be used for clinical evaluation of patients. Although both serum markers and imaging studies perform better at detecting higher stages of fibrosis (F2-F4), the severity of hepatic fibrosis can be expected to be milder (i.e., stages F0-F1) in the community. Both serum and imaging tests have limitations in identifying this subgroup of patients, although recent data suggest MRE has better accuracy given its ability to consistently identify patients with normal liver and simple steatosis alone. Whatever the choice, it remains essential to assess the preliminary diagnostic performance of any noninvasive test in a community-based population before determining which approach is more suitable. Ideally, this could include head-to-head comparisons of new technologies so that their comparative effectiveness can be assessed. Finally, it remains uncertain whether these tests should be used alone or in various combinations in practice. In published studies to date, the initial use of serum markers has typically been followed by imaging (or more advanced serum marker panels) as a step-wise approach for increasing diagnostic accuracy while minimizing resource utilization. It is possible, however, that upfront imaging may be as cost-effective as combined testing in certain populations, although this will also need to be demonstrated prospectively.
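The effect of moving a test from a tertiary referral population to a community population with milder disease can be illustrated with a short calculation. The following Python sketch applies Bayes' theorem to convert sensitivity, specificity, and prevalence into positive and negative predictive values. The function name and all numeric inputs are hypothetical and are not drawn from any study cited here. The sketch also assumes that sensitivity and specificity are identical in both settings, which may not hold when the spectrum of disease differs.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' theorem.

    All inputs are proportions strictly between 0 and 1.
    Sensitivity and specificity are assumed constant across settings.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1")

    # Expected proportions of the tested population in each 2x2 cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical test characteristics; prevalence values chosen only
    # to contrast a referral-like setting with a community-like setting.
    for prevalence in (0.50, 0.10):
        ppv, npv = predictive_values(0.80, 0.80, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With fixed test characteristics, lowering the prevalence reduces the positive predictive value and raises the negative predictive value, which is one reason performance estimated in tertiary centers may not transfer directly to the community.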

Alternative Gold Standards

Despite the several limitations described for liver biopsy, this technique will still be required when estimating the performance of noninvasive techniques unless alternate “gold standards” are recognized as reliable and valid. For example, there are emerging data on the ability of hepatic venous pressure gradient (HVPG) measurement to predict overall liver-related outcome as well as risk of variceal hemorrhage.45 However, several issues remain with HVPG too, given that it is an invasive technique whose performance, even within tertiary medical centers, remains variable.

Recent studies have highlighted the beneficial role of serum and imaging noninvasive tests compared to liver biopsy in the prediction of long-term clinical outcomes.26, 46, 47 This prognostic information also applies to nonviral chronic liver disease and posttransplant patients, potentially allowing identification of patients at greatest risk of disease progression several years before development of clinical sequelae.22, 23, 48, 49 However, developing alternative surrogate markers that replace current noninvasive tests of fibrosis will be difficult, given that the slow rates of disease progression in chronic liver disease require long-term follow-up of large patient cohorts. The associated incremental cost, together with limited therapeutic options for reversing disease at advanced stages in many chronic liver diseases, further limits development of such markers. To facilitate comparisons between approaches, emphasis needs to be placed on justifying the rationale for the endpoints used to assess comparative effectiveness and on how this choice affects power and sample size calculations overall. For example, a binary endpoint (such as the detection of advanced versus early fibrosis) will be useful when this knowledge can facilitate a change in clinical management strategy (i.e., institution of surveillance procedures associated with cirrhosis when detected). Greater statistical complexity arises when designing investigations to examine multiple categorical stages of fibrosis, should this be required for assessing treatment candidacy. It is already being recognized, however, that clinical decision-making relies less on each individual fibrosis stage than on the category of fibrosis involvement, which could then be addressed using categorical endpoints.50 It is generally recognized that measurement of a continuous outcome would be the ideal means of assessing prognosis and treatment response. Substantial work will be required, however, to conduct longitudinal validation studies of sufficient length to confirm the association between a quantitative surrogate biomarker and clinical outcomes such as death or hepatic decompensation.
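The dependence of sample size on the chosen endpoint can be made concrete for the simplest case discussed above, a binary endpoint. The Python sketch below uses a standard normal-approximation formula for estimating a proportion to a chosen confidence interval half-width, then scales the number of diseased patients required by the expected prevalence to obtain the total number to recruit. The function name and the example inputs are hypothetical, and the approach is only one of several ways such a study might be sized. It addresses sensitivity alone and ignores specificity, dropout, and indeterminate results.

```python
import math
from statistics import NormalDist


def sample_size_for_sensitivity(expected_sensitivity, half_width,
                                prevalence, confidence=0.95):
    """Total patients needed to estimate sensitivity to +/- half_width.

    Uses the normal approximation n = z^2 * p * (1 - p) / d^2 for the
    number of diseased patients, then divides by prevalence because
    only that fraction of recruited patients has the target condition.
    """
    if not 0.0 < expected_sensitivity < 1.0:
        raise ValueError("expected_sensitivity must be between 0 and 1")
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must be in (0, 1]")
    if half_width <= 0.0:
        raise ValueError("half_width must be positive")

    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

    diseased_needed = (z ** 2 * expected_sensitivity
                       * (1.0 - expected_sensitivity) / half_width ** 2)
    return math.ceil(diseased_needed / prevalence)


if __name__ == "__main__":
    # Hypothetical design inputs, for illustration only.
    for prevalence in (0.25, 0.10):
        n = sample_size_for_sensitivity(0.90, 0.05, prevalence)
        print(f"prevalence={prevalence:.2f}  total patients={n}")
```

The required total rises sharply as the prevalence of the target condition falls, which illustrates why studies in community populations, where advanced fibrosis is uncommon, need to be considerably larger than those in referral centers.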


Noninvasive markers of liver fibrosis have a role in current clinical practice. The continuing evolution of noninvasive markers is dependent on open discussion, improvement, and research in key areas. Prospective, large, and methodologically sound diagnostic studies that are reported in a standardized format across journals will continue to improve the quality of evidence available. Further discussion is needed on how to account for spectrum bias and varying prevalence of liver fibrosis; specifically, on whether statistical correction is desirable or justified. Novel technologies will continue to improve accuracy but implementation into clinical practice will be dependent on incremental value, cost, reproducibility, and external validity. Finally, defining the question that the noninvasive marker is required to address will lead to different diagnostic and prognostic tools. It is logical that different tests will be utilized depending on whether the objective is to stratify fibrosis in the community at one end of the spectrum or define robust endpoints for novel antifibrotic compounds at the other end of the spectrum.