Potential conflict of interest: Nothing to report.
This study was supported by the American Diabetes Association Mentor-Based Postdoctoral Fellowship Program (7-07-MN-08; to R.H. and M.L.), National Institute of Diabetes and Digestive and Kidney Diseases grant 1RO1DK083393-01A1 (to J.M.C.), and K24-DK62222 P60 DK079637 (to F.L.B.).
Ultrasonography is a widely accessible imaging technique for the detection of fatty liver, but the reported accuracy and reliability have been inconsistent across studies. We aimed to perform a systematic review and meta-analysis of the diagnostic accuracy and reliability of ultrasonography for the detection of fatty liver. We searched MEDLINE and Embase from October 1967 to March 2010. Studies that provided cross-tabulations of ultrasonography versus histology or standard imaging techniques, or that provided reliability data for ultrasonography, were included. Study variables were independently abstracted by three reviewers and double checked by one reviewer. Forty-nine studies (4720 participants) were included in the meta-analysis of diagnostic accuracy. The overall sensitivity, specificity, positive likelihood ratio, and negative likelihood ratio of ultrasound for the detection of moderate-severe fatty liver, compared to histology (gold standard), were 84.8% (95% confidence interval: 79.5-88.9), 93.6% (87.2-97.0), 13.3 (6.4-27.6), and 0.16 (0.12-0.22), respectively. The area under the summary receiver operating characteristics curve was 0.93 (0.91-0.95). Reliability of ultrasound for the detection of fatty liver showed kappa statistics ranging from 0.54 to 0.92 for intrarater reliability and from 0.44 to 1.00 for interrater reliability. Sensitivity and specificity of ultrasound were similar to those of other imaging techniques (i.e., computed tomography or magnetic resonance imaging). Statistical heterogeneity was present even after stratification for multiple clinically relevant characteristics. Conclusion: Ultrasonography allows for reliable and accurate detection of moderate-severe fatty liver, compared to histology. Because of its low cost, safety, and accessibility, ultrasound is likely the imaging technique of choice for screening for fatty liver in clinical and population settings. (HEPATOLOGY 2011; 54:1082–1090)
Fatty liver is the accumulation of fat (i.e., macrovesicular steatosis) within the hepatic parenchyma. Nonalcoholic fatty liver disease (NAFLD), the presence of fat infiltration in the liver in the absence of excessive alcohol consumption and other causes of liver disease, is the most common cause of fatty liver, with a prevalence as high as 30% in many populations.1 NAFLD may lead to fibrosis,2 cirrhosis,3 liver cancer,4, 5 liver failure requiring liver transplant,6 and mortality,7 and it is associated with type 2 diabetes, metabolic syndrome, and other cardiovascular risk factors.8, 9 Although NAFLD represents a major public health challenge, its natural history and determinants are incompletely understood because of limitations in diagnostic technologies and because this condition is often asymptomatic until late, severe complications occur. In addition, because of the risk of progression to more advanced stages, early noninvasive detection of fatty liver disease is clinically important.
Conventional B-mode ultrasonography is the most common technique used to assess the presence of fatty liver in clinical settings and population studies. However, several limitations of ultrasonography, including operator dependency, subjective evaluation, and limited ability to quantify the amount of fatty infiltration, have raised concerns. Indeed, some qualitative reviews10, 11 have questioned the ability of ultrasound to reliably identify fatty liver, although no systematic review has performed a quantitative summary of available data on the diagnostic ability and reliability of ultrasound to identify fatty liver, compared to histology, the gold standard.
The main aim of this meta-analysis was thus to systematically review and summarize the available literature on the diagnostic accuracy (i.e., sensitivity and specificity) and reliability of ultrasound to distinguish patients with and without fatty liver, defined as the presence of moderate to severe steatosis on liver biopsy (gold standard). As secondary aims, we sought to systematically review and summarize the diagnostic accuracy and reliability of different ultrasonographic parameters or criteria used to diagnose fatty liver (e.g., presence of liver-to-kidney contrast or scores summing a variety of parameters). Finally, we planned to analyze the available literature on the diagnostic accuracy (i.e., sensitivity and specificity) of ultrasound to detect fatty liver, compared to other imaging techniques (i.e., magnetic resonance imaging [MRI] and computed tomography [CT]).
Patients and Methods
Data Sources and Search.
Our search of PubMed and Embase included the term ultrasound and different combinations of fatty liver using free text and key words (Supporting Table 1). The period of the electronic search extended from October 1967 through March 17, 2010, with no language restrictions. We also searched the reference lists of identified reviews and abstracted articles.
We included all studies that presented the following: (1) estimates of diagnostic accuracy (such as sensitivity or specificity), cross-tabulations, or correlations of B-mode ultrasonography to identify fatty liver against histology as the gold standard; (2) estimates of intra- or interrater reliability (such as kappa statistics or intraclass correlation coefficients) of ultrasound to identify fatty liver; and (3) comparisons of ultrasound to other imaging modalities (i.e., CT or MRI) to identify fatty liver.
We excluded studies that did not use ultrasound for evaluating fatty liver, studies that used ultrasound but did not study fatty liver (e.g., cirrhosis exclusively), and studies that evaluated ultrasound techniques not commonly used (e.g., Doppler, transient elastography, contrast-enhanced ultrasound, artificial neural networks, or computer-aided readings, including histogram evaluation and fat quantification, using regions of interest). We also excluded studies using experimental conditions, studies performed in the operating room, studies performed in nonhumans, in vitro or in vivo, and articles that did not report original data (e.g., editorials, news, comments, guidelines, and reviews).
Data Extraction and Quality Assessment.
Three investigators (R.H., M.L., and S.B.) independently reviewed the search results to determine article inclusion and perform data abstraction. Discrepancies were resolved by consensus. For each selected publication, we abstracted year of publication, country, inclusion criteria, histological definition of fatty liver (i.e., simple steatosis and steatohepatitis), number of participants undergoing ultrasound and comparison tests (if applicable), definitions of fatty liver used in the study, ultrasonographic parameters evaluated, and reported measures of accuracy and reliability. For articles with no reported measure of accuracy, we estimated the sensitivity and specificity from the available data. We evaluated the quality of each article by applying modified Quality Assessment of Diagnostic Accuracy Studies (QUADAS)12 and STAndards for the Reporting of Diagnostic accuracy studies (STARD) criteria.13
Study outcome was the presence of fatty liver as a dichotomous variable, using the specific criteria and definitions used in each study. For ultrasound, a few studies reported four categories, and we combined the normal/mild categories as absence of fatty liver, and the moderate/severe categories as presence of fatty liver. For histology, we used the presence of greater than or equal to 20%-30% fat infiltration to define fatty liver, except for Nagata et al. (≥10%), Guajardo-Salinas (>0%), and Soresi (>5%). We conducted secondary analyses on the diagnostic accuracy using lower levels of fat infiltration on histology as diagnostic criteria (i.e., <5%, ≥10%, and ≥20%-30%).
Because a number of ultrasonographic parameters have been used alone or in combination to diagnose fatty liver, we evaluated, where data were available, the diagnostic accuracy of the following parameters: (1) parenchymal brightness, (2) liver-to-kidney contrast, (3) deep beam attenuation, (4) bright vessel walls, and (5) gallbladder wall definition. Given that some studies reported or combined different histological findings, such as inflammation and fibrosis, we performed secondary analyses to study how accurate ultrasound was in identifying fatty infiltration with or without inflammation or fibrosis.
Data Synthesis and Analysis.
Sensitivity and specificity of each study were summarized using the hierarchical summary receiver operating characteristics (ROC) curve approach.14 In this method, the relationship between logit-transformed sensitivity and specificity in each study is quantified by the log diagnostic odds ratio (OR) and the results are used to estimate a summary ROC curve.15 This method provides summary estimates of sensitivity and specificity, 95% confidence and prediction regions, and summary ROC curves, and it allows for multivariate analysis of between-study heterogeneity. Between-study heterogeneity was assessed by plots of the standardized logarithm of the diagnostic OR versus the inverse of the standard error and by the I² statistic, a parameter that describes the percentage of total variation across studies attributable to heterogeneity, rather than chance.16 We used clinically important variables to assess between-study heterogeneity and fit metaregression models. Publication bias was assessed visually using the effective sample size funnel plot and associated regression test of asymmetry.17 Statistical analyses were performed using the Stata commands METANDI and MIDAS (StataCorp 2007, Stata Statistical Software, Release 10; StataCorp LP, College Station, TX).
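The two quantities named above, the log diagnostic OR of a single study and the I² statistic, can be illustrated with a minimal Python sketch. This is not the hierarchical model fitted by METANDI or MIDAS. It assumes a simple inverse-variance (fixed-effect) pooling of log diagnostic ORs and a 0.5 continuity correction for empty cells, and the function names and example counts are hypothetical.

```python
import math


def log_diagnostic_odds_ratio(tp, fp, fn, tn, correction=0.5):
    """Return (log DOR, standard error) for one study's 2x2 table.

    Assumption: if any cell is zero, `correction` is added to every cell.
    """
    if min(tp, fp, fn, tn) == 0:
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    log_dor = math.log((tp * tn) / (fp * fn))
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return log_dor, se


def i_squared(effects, standard_errors):
    """I² (%) from Cochran's Q using inverse-variance weights."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    # Share of total variation beyond what chance (df) would explain.
    return max(0.0, (q - df) / q) * 100


# Hypothetical 2x2 tables (tp, fp, fn, tn), for illustration only.
studies = [(90, 10, 10, 90), (40, 5, 20, 60), (70, 30, 5, 45)]
pairs = [log_diagnostic_odds_ratio(*s) for s in studies]
print(i_squared([p[0] for p in pairs], [p[1] for p in pairs]))
```

An I² near 0% indicates that the observed spread is compatible with chance alone, whereas values approaching 100% indicate that nearly all variation reflects between-study differences.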
Abbreviations: CI, confidence interval; CT, computed tomography; MRI, magnetic resonance imaging; MRS, magnetic resonance spectroscopy; NAFLD, nonalcoholic fatty liver disease; N/R, not reported; OR, odds ratio; QUADAS, Quality Assessment of Diagnostic Accuracy Studies; ROC, receiver operating characteristics; STARD, STAndards for the Reporting of Diagnostic accuracy studies.
Our review included 49 studies of diagnostic accuracy comparing ultrasound to histology (Table 1; Supporting Fig. 1)18-66 and five studies comparing ultrasound to other radiological techniques (including three studies that reported three-way comparisons between ultrasonography, another imaging technique, and histology) (Table 2).67-71 Nine of the 49 studies comparing ultrasound to histology also included data comparing each ultrasonographic parameter (e.g., liver-to-kidney contrast, deep beam attenuation, etc.) and histology.25, 26, 31, 34, 38, 40, 49, 61, 62 Finally, 22 studies provided data on intra- or interrater reliability (Supporting Table 2).S1-S22
Table 1. Characteristics of the 49 Studies of Diagnostic Accuracy Comparing Ultrasound to Histology, Sorted by Publication Year
Author, year (reference)
* These studies provided data on the accuracy of individual ultrasound parameters compared to histology.
Table 2. Characteristics of the Five Studies of Diagnostic Accuracy Comparing Ultrasound to Another Imaging Technique, Sorted by Publication Year
Author, year (reference)
Scatarige, 1984 (67)
Known liver disease
Pacifico, 2007 (68)
Suspicion liver disease
Pozzato, 2008 (69)
Edens, 2009 (70)
Mancini, 2009 (71)
Meta-Analysis of Diagnostic Accuracy of Ultrasonography Versus Histology.
Forty-nine studies, including 4720 participants, provided data on the diagnostic accuracy of ultrasound compared to histology as the gold standard. The weighted prevalence of histologically defined fatty liver across all studies was 31.8%, but the studies varied with respect to study population and location. Twenty-seven studies (55%) were conducted in a hospital setting or included a mixture of inpatients and outpatients. The indication for testing was suspicion of liver disease in 17 studies and known liver disease in 16 studies. The underlying liver disease was a combination of NAFLD and other pathologies in 36 studies and NAFLD only in eight studies. All studies included a representative spectrum of patients. Seventeen (35%) of the 49 studies did not report the method of ascertainment or used a different method of ascertainment in controls. Fewer than 50% of studies reported whether the interpretation of the ultrasound had been done without knowledge of the results of the biopsy.
Overall sensitivity of ultrasound to distinguish moderate to severe histologically defined fatty liver from the absence of steatosis (n = 34 studies, 2815 participants) was 84.8% (95% confidence interval [CI]: 79.5-88.9), specificity was 93.6% (87.2-97.0), the positive likelihood ratio was 13.3 (6.4-27.6), the negative likelihood ratio was 0.16 (0.12-0.22), and the summary area under the ROC curve was 0.93 (0.91-0.95) (Figs. 1 and 2A). We further examined lower cutoffs for the detection of histologically defined fat and found that, for the detection of ≥10% steatosis, ultrasound has a diagnostic accuracy between 0.91 and 0.93 and a specificity between 0.88 and 0.99 (Supporting Table 3).
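The link between these pooled estimates can be checked with a short Python sketch. It uses the textbook definitions (positive likelihood ratio = sensitivity / (1 − specificity); negative likelihood ratio = (1 − sensitivity) / specificity), which reproduce the reported point estimates of about 13.3 and 0.16, although the published values and their confidence intervals come from the hierarchical model rather than from this arithmetic. The post-test probabilities are an illustration only: they assume the 31.8% weighted prevalence of the included biopsy studies as the pretest probability, which may not apply to other populations. Function names are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Pooled estimates reported in the text.
lr_pos, lr_neg = likelihood_ratios(0.848, 0.936)
print(round(lr_pos, 1), round(lr_neg, 2))

# Assumption: pretest probability equal to the weighted prevalence (31.8%).
print(round(post_test_probability(0.318, lr_pos), 2))  # after a positive scan
print(round(post_test_probability(0.318, lr_neg), 2))  # after a negative scan
```

Under that assumed pretest probability, a positive ultrasound raises the probability of moderate to severe steatosis to roughly 86%, and a negative ultrasound lowers it to roughly 7%.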
Heterogeneity for the area under the summary ROC curve was substantial (I², 98%; 95% CI: 97-99). In subgroup analyses, clinically relevant categories only explained a minor proportion of between-study heterogeneity (Supporting Fig. 2). There was no indication of publication or related biases (data not shown).
When ultrasound was used to differentiate the presence of histologically based fatty liver alone versus other pathological findings, such as hepatitis or fibrosis or normal liver (n = 29 studies), overall sensitivity was similar (87.2%; 95% CI: 77.8-93.0), but specificity was substantially lower (79.2%; 95% CI: 72.8-84.4). Correspondingly, the positive likelihood ratio was lower (4.2; 95% CI: 3.3-5.4), but the negative likelihood ratio was unchanged (0.16; 95% CI: 0.09-0.28). Overall, the summary area under the ROC curve was the same as that for determining fatty liver versus not (0.93; 95% CI: 0.91-0.95) (Fig. 2B).
Meta-Analysis of Diagnostic Accuracy of Ultrasonography Components Versus Histology.
There was a wide variation in ultrasound parameters evaluated for assessing fatty liver (data not shown). Of the 49 studies with histology as a gold standard, parenchymal brightness was used as an ultrasound diagnostic criterion in 43 (88%) studies, deep beam attenuation in 30 (61%), vessels in 28 (57%), liver-to-kidney contrast in 27 (55%), and gallbladder wall definition in 4 (8%) studies.
In studies where the accuracy of individual ultrasonographic parameters used to define fatty liver was evaluated, sensitivities of liver-to-kidney contrast, vessel wall brightness, and deep beam attenuation were 98% (75%-100%), 81% (70%-89%), and 59% (45%-72%), respectively. Specificity was similar for all components (range, 93%-95%) (Supporting Table 4).
Systematic Review of the Reliability of Ultrasonography.
Twenty-two studies reported the reliability of ultrasound findings: kappa statistics (17 studies), coefficients of variation (three studies), percent disagreement (one study), and intraclass correlation coefficient (one study).S1-S22 Among studies reporting kappa statistics, the number of readers ranged from 1 to 15. The range of kappa values for intrarater evaluation was 0.54-0.92 (six studies) and for the interrater evaluation was 0.44-1.00 (14 studies). Studies reporting reliability measures for individual components reported similar results across components (Supporting Table 5).S1-S22
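For readers less familiar with the kappa statistic, the following minimal Python sketch computes the unweighted Cohen's kappa for two readers, that is, observed agreement corrected for the agreement expected by chance. It is an illustration only: the included studies may have used weighted or multi-rater variants, and the ratings below are hypothetical (1 = fatty liver present, 0 = absent).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions.
    expected = sum(
        counts_a[c] * counts_b[c] for c in set(counts_a) | set(counts_b)
    ) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical readings of ten scans by two readers.
reader_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
reader_2 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(round(cohens_kappa(reader_1, reader_2), 2))
```

In this example the readers agree on 8 of 10 scans, chance agreement is 0.5, and kappa is therefore 0.6, which falls within the interrater range reported above.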
Meta-Analysis of Diagnostic Accuracy of Ultrasonography Versus Other Imaging Techniques.
We found five studies comparing ultrasound data to CT, MRI, or magnetic resonance spectroscopy (MRS) without histology, including a total of 215 adults. Ultrasound had an overall sensitivity of 93.6% (60.5-99.3), specificity of 80.1% (53.3-93.4), positive likelihood ratio of 4.71 (1.89-11.71), and negative likelihood ratio of 0.08 (0.01-0.56). Only three studies had ultrasound,56, 65, 66 another imaging technique, and histology (Supporting Table 6),S23-S25 and ultrasound had slightly better overall accuracy for detecting fatty liver, compared to other techniques.
Our meta-analysis shows that ultrasound is an accurate, reliable imaging technique for the detection of fatty liver, as compared with histology, with a pooled sensitivity of 84.8%, a pooled specificity of 93.6% for detecting ≥20%-30% steatosis, and a summary area under the ROC curve of 0.93. Because ultrasound is relatively inexpensive and accessible, compared to other diagnostic techniques, our results suggest that ultrasound may be the imaging technique of choice for screening for the presence of fatty liver in clinical settings and, especially, population studies. The widespread use of ultrasound to detect fatty liver may help better identify the determinants and natural history of fatty liver disease in the general population and may help target interventions directed to reducing the complications associated with fatty liver. Indeed, though no U.S. Food and Drug Administration–approved therapy exists for fatty liver, lifestyle changes,72 vitamin E, and pioglitazone73 have shown some efficacy.
We found a relatively large number of studies using ultrasound as the diagnostic method and liver biopsy as a gold standard, with a wide range of sensitivities (55%-100%) and specificities (26%-100%). These differences could be the result of a number of factors. First, technical quality and performance of the ultrasound varied across studies. We included studies conducted from 1979 to 2010; during this time, technological advances in ultrasound equipment have occurred and could potentially explain part of this variation. Second, the ultrasound criteria used to define fatty liver differed across studies. Third, although the majority of the studies included patients who underwent liver biopsy with some suspicion of liver disease, there was a wide range in severity of the underlying disease. Finally, the composition of the comparison group (i.e., normal liver or other liver disease, such as inflammation, fibrosis, or a combination of these) also differed across studies, adding to the heterogeneity. Despite these differences, our sensitivity analyses, stratified by publication year, setting, degree of steatosis, and diagnosis, among others, showed similar results and, therefore, allow the use of the pooled accuracy estimates. Similar factors may have also contributed to the variation of reliability estimates between studies, including prevalence of cases with steatosis in the study population, lack of standard protocol to perform the evaluation, and the use of different criteria.
The potential role of ultrasound in clinical settings and in population research is very important. In the current obesity epidemic, the prevalence of fatty liver disease, in particular NAFLD, is likely to increase, making it necessary to use practical tools for measuring the burden of disease and tracking time trends. In the clinical context, the number of patients at risk for fatty liver disease is also increasing. There is thus a pressing need to have readily available, accurate methods to assess the presence of fatty liver, and ultrasound compares favorably to alternative noninvasive techniques. Liver enzymes, indirect markers of liver injury, have lower sensitivity (0.30-0.63) and specificity (0.38-0.63) than ultrasound.74 Indeed, compared to liver enzymes, the use of ultrasound as a triage test, applied early on to determine which patients should undergo further testing, would likely reduce the number of false-positive results and thus decrease the burden of subsequent testing. Other imaging techniques (i.e., CT or MRI/MRS) have similar operating characteristics, but are more expensive, and CT involves radiation, and therefore, their widespread usefulness is limited.
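The triage argument can be made concrete with a small Python sketch that counts expected false-positive results per 1000 patients tested. It is a back-of-the-envelope illustration, not an analysis from the included studies: it assumes the 31.8% weighted prevalence of the biopsy studies applies to the tested population, and it uses the pooled ultrasound specificity (93.6%) alongside the two ends of the specificity range cited for liver enzymes (0.38-0.63). The function name is hypothetical.

```python
def false_positives_per_1000(prevalence, specificity):
    """Expected false positives among 1000 tested patients."""
    disease_free = 1000 * (1 - prevalence)
    return disease_free * (1 - specificity)


PREVALENCE = 0.318  # assumption: weighted prevalence in the included studies

ultrasound = false_positives_per_1000(PREVALENCE, 0.936)
enzymes_best = false_positives_per_1000(PREVALENCE, 0.63)
enzymes_worst = false_positives_per_1000(PREVALENCE, 0.38)

print(round(ultrasound), round(enzymes_best), round(enzymes_worst))
```

Under these assumptions, ultrasound would yield roughly 44 false positives per 1000 patients, compared with roughly 252 to 423 for liver enzymes.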
Our systematic review had certain limitations. We did not include other ultrasound techniques (e.g., Doppler and histogram evaluation) that would have allowed a more objective quantification of fat. Also, we could not assess the accuracy of ultrasound across the whole range of fat accumulation and could not evaluate the performance of an ultrasound-based four-grade scale (i.e., normal, intermediate, moderate, and severe) in the detection of fatty liver. We did not have individual patient data, so we were not able to evaluate the performance of ultrasound in key patient subgroups (e.g., by body mass index or subcutaneous fat thickness). Although we found significant statistical heterogeneity, our inferences remained unchanged in multiple secondary analyses of the key clinical variables.
Our review shows that though ultrasound is useful for identifying fatty liver, additional research is needed to better assess the performance of specific ultrasound criteria of individual parameters, in particular gallbladder and vessel wall definition, to accurately and reliably detect fatty liver. Some parameters may be more reliable and justify the use of a more focused ultrasound examination. In addition, future studies assessing the accuracy of ultrasound should aim to refine the ultrasound protocol and assess the accuracy of a scoring system to improve its reliability.
We also identified relatively few studies comparing the accuracy of ultrasound against other noninvasive imaging techniques and alternative testing strategies (e.g., including a combination of imaging and liver enzymes), and, therefore, could not conduct comparative analyses of different techniques and/or different testing strategies. Future studies are warranted to answer those questions, including the accuracy of MRS, which is extensively used in epidemiological studies but for which no large comparison with histology is available. However, the existing data do not provide evidence that other techniques are superior to ultrasound to detect the presence of fatty liver, although they may be useful in experimental settings, where a more precise quantification of liver fat is needed. Further studies are also needed that compare the diagnostic performance of ultrasound with histology in persons with different degrees of adiposity.
We also found that there is a need for improved study quality and reporting. Only five of the studies evaluating the diagnostic accuracy of ultrasound also assessed its reliability. These few studies suggest that ultrasound has good intra- and interrater reliability, comparable to the reliability of biopsy data,75 but future studies using ultrasound should include detailed reliability data. In addition, a number of studies did not provide details of the ultrasound protocol and/or assessment. This information is important to ensure proper replication and comparability between studies. Finally, few studies clearly reported whether the ultrasound reviewers were masked to participants' characteristics and histological findings; therefore, there was a risk of bias.
In conclusion, our meta-analysis shows that liver ultrasonography is an accurate, reliable tool to detect moderate to severe fatty liver, with sensitivity and specificity of 84.8% and 93.6%, respectively. These findings, together with the relatively low cost and lack of radiation exposure, support the use of ultrasound as the imaging technique of choice for screening for fatty liver in clinical settings and population studies. More research is needed to assess the long-term prognostic significance of ultrasound findings as well as the diagnostic implications of improvements in ultrasound technology and of more detailed quantification of liver fat.