Accurate prediction scores for liver steatosis are needed to enable clinicians to noninvasively screen for nonalcoholic fatty liver disease (NAFLD). Several prediction scores have been developed; however, external validation is lacking.
The aim was to determine the diagnostic accuracy of four existing prediction scores in severely obese children, to develop a new prediction score using novel biomarkers and to compare these results to the performance of ultrasonography.
Liver steatosis was measured using proton magnetic resonance spectroscopy in 119 severely obese children (mean age 14.3 ± 2.1 years, BMI z-score 3.35 ± 0.35). Prevalence of steatosis was 47%. The four existing prediction scores (“NAFLD liver fat score,” “fatty liver index,” “hepatic steatosis index,” and the pediatric prediction score) had only moderate diagnostic accuracy in this cohort (positive predictive value (PPV): 70, 61, 61, 69% and negative predictive value (NPV): 77, 69, 68, 75%, respectively). A new prediction score was built using anthropometry, routine biochemistry and novel biomarkers (leptin, adiponectin, TNF-alpha, IL-6, CK-18, FGF-21, and adiponutrin polymorphisms). The final model included ALT, HOMA, sex, and leptin. This equation (PPV 79% and NPV 80%) did not perform substantially better than the four other equations and did not outperform ultrasonography for excluding NAFLD (NPV 82%).
In conclusion, in severely obese children and adolescents, existing prediction scores and the tested novel biomarkers have insufficient diagnostic accuracy for diagnosing or excluding NAFLD.
Nonalcoholic fatty liver disease (NAFLD) is well established as one of the complications of obesity. Concomitant with the rise in obesity, NAFLD has become the most common chronic liver disease in children and adults in the industrialized world (1). In obese children it has a reported prevalence ranging from 22% in school-based populations up to 52% in those referred to obesity centers (2, 3). The spectrum of NAFLD ranges from simple steatosis, to steatohepatitis, to fibrosis and cirrhosis. Significant fibrosis and even cirrhosis can already develop in childhood (4). Furthermore, although still disputed, several studies found that NAFLD is an independent risk factor for type 2 diabetes and atherosclerosis in this age group (5, 6). In view of these risks, screening for this chronic and mostly asymptomatic disorder in obese children is important.
Serum aminotransferases and liver ultrasonography (US) have been shown to be imperfect screening tools for NAFLD (1, 7, 8). Proton magnetic resonance spectroscopy (1H MRS) has been recognized as an accurate noninvasive tool to determine liver steatosis (1, 9). Liver steatosis is a prerequisite for the presence of all stages of NAFLD. The relevance of 1H MRS as a diagnostic tool is underscored by a recent guidance document which defined it as one of the end points for clinical trials in NAFLD when liver biopsy is not feasible (10). However, 1H MRS has drawbacks: it is expensive and not widely available. Therefore, biomarkers for NAFLD remain in high demand for daily clinical practice. Several prediction scores using noninvasive markers have been developed for determining the presence of liver steatosis in adults and in children (11-15). Fair to good predictive values of these equations were reported; however, external validation of their diagnostic accuracy in different populations is lacking. Besides these equations, novel biomarkers that strongly correlate with the presence of NAFLD continue to be reported. Most of these have not been evaluated for their diagnostic value in noninvasive prediction models.
In this study, we aimed (1) to prospectively evaluate the diagnostic performance of four freely available prediction scores for diagnosing NAFLD in an unselected cohort of severely obese children and adolescents using 1H MRS as the reference standard; (2) to build a new prediction model in this same cohort using anthropometric, routine biochemistry and an extensive set of novel biomarkers; and (3) to compare the performance of these equations to the diagnostic accuracy of US in this cohort.
Children and adolescents referred to a Dutch obesity center between February 2008 and October 2010 were eligible. Inclusion criteria were age from 8 to 18 years and primary obesity. Exclusion criteria were concomitant liver disease, (past) use of steatogenic medication or oral anti-diabetic drugs, alcohol use ≥7 U week−1, history of jejunal-ileal surgery or parenteral feeding and contra-indications for MR scanning (magnetic or radiofrequency sensitive implants or claustrophobia). The study was conducted according to the Declaration of Helsinki. The study protocol was approved by the Medical Ethics Committee of the Academic Medical Center of the University of Amsterdam. Each participant and/or their guardian provided written informed consent.
Measurements were performed at the start of the obesity program. Three trained pediatricians in the obesity center conducted anthropometric measurements using a standardized protocol (all blinded; 11-28 years of experience). Weight and height were measured and used to calculate the age-adjusted BMI standard deviation score, the BMI-z-score (16). Waist circumference was defined as the smallest torso circumference measured between spina iliaca superior and the lower rib margin (17). Resting blood pressure was measured using an oscillometric sphygmomanometer. The pubertal stages were determined by visual inspection, using Tanner's criteria.
After an overnight fast, venous blood was sampled for serum biochemistry studies. Alanine aminotransferase (ALT), aspartate aminotransferase (AST), gamma glutamyl transferase (γGT), lipid profile, ferritin, and high-sensitivity C-reactive protein were measured directly after blood sampling using standard laboratory methods by blinded and certified laboratory staff in an adjacent local hospital. Fasting insulin and glucose were used to calculate the homeostatic model assessment of insulin resistance (HOMA-IR) as previously described (18). An oral glucose tolerance test was performed to exclude the presence of type 2 diabetes mellitus. Hepatitis B and C, autoimmune hepatitis, alpha-1 antitrypsin deficiency, abetalipoproteinemia, hemochromatosis and Wilson disease were excluded using the appropriate diagnostic tests.
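The HOMA-IR calculation can be illustrated with a minimal sketch. The source cites reference (18) without restating the formula, so the commonly used form (fasting insulin in mU/L multiplied by fasting glucose in mmol/L, divided by 22.5) is assumed here; the function name and the example values are hypothetical.

```python
def homa_ir(fasting_insulin_mu_per_l: float, fasting_glucose_mmol_per_l: float) -> float:
    """Return HOMA-IR, assuming the widely used formula
    insulin (mU/L) x glucose (mmol/L) / 22.5.
    The study itself only refers to reference (18) for the method."""
    return fasting_insulin_mu_per_l * fasting_glucose_mmol_per_l / 22.5


# Hypothetical fasting values, not taken from the study cohort.
example = homa_ir(fasting_insulin_mu_per_l=18.0, fasting_glucose_mmol_per_l=5.0)
```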
Serum and plasma samples plus lymphocytes were stored at −80°C until analyses were conducted for novel biomarkers. Circulating levels of leptin, adiponectin, tumor necrosis factor-alpha (TNF-alpha), interleukin-6 (IL-6), caspase-cleaved cytokeratin 18 (CK-18), fibroblast growth factor 21 (FGF-21) and single nucleotide polymorphism in the rs738409 region of the patatin-like phospholipase domain-containing protein-3 (“adiponutrin”) gene (PNPLA3 SNP) were selected based on previously reported correlations with the presence of NAFLD (19-28). Leptin and adiponectin were measured in plasma using a human Leptin RIA kit (Millipore, St. Charles, MO). CK-18, TNF-alpha and IL-6 were measured in serum by ELISA kits (CK-18: M30-Apoptosense ELISA kit, PEVIVA, Bromma, Sweden; IL-6 and TNF-alpha: Pelikine compact™, Sanquin, Amsterdam, The Netherlands). Serum FGF-21 levels were determined by an in-house developed ELISA using goat anti-human FGF-21 antibody (AF2539, R&D Systems, Minneapolis, MN) (details available on request). PNPLA3 SNP analysis was performed by PCR amplification of the region surrounding rs738409 in the PNPLA3 gene. PCR fragments were bidirectionally sequenced using the Bigdye kit v1.1 (Applied Biosystems, Carlsbad, CA). Reactions were run on an ABI3700 genetic analyzer (Applied Biosystems, Carlsbad, CA) and sequences were analyzed using CodonCode aligner (CodonCode Corporation, Dedham, MA).
US examinations were performed as part of the daily workload of a local, medium-sized hospital by one of three radiologists (5-20 years of experience, >600 liver US examinations per year). All radiologists were blinded to clinical and 1H MRS data. Four widely accepted scoring items for liver steatosis were considered (29): (1) echogenicity of liver parenchyma; (2) visualization of diaphragm; (3) visualization of intrahepatic vessels and (4) visualization of posterior part of the right hepatic lobe. Standardized views of the liver were obtained to enable scoring of these four items on a four-point scale: score 0 for no steatosis, score 1 for mild steatosis, score 2 for moderate steatosis and score 3 for severe steatosis. The “US steatosis score” was defined as the average score of the four items. A score ≥1 was defined as liver steatosis. Diagnostic accuracy results and observer variability data of US in this cohort were previously reported (8).
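The US scoring rule described above (four items scored 0 to 3, averaged, with a mean of 1 or more defined as steatosis) can be sketched as follows. Function and parameter names are hypothetical and only mirror the four items listed in the text.

```python
from statistics import mean


def us_steatosis_score(echogenicity: int, diaphragm: int,
                       intrahepatic_vessels: int, posterior_right_lobe: int) -> float:
    """Average of the four item scores, each on the 0-3 scale described in the text."""
    items = (echogenicity, diaphragm, intrahepatic_vessels, posterior_right_lobe)
    for item in items:
        if item not in (0, 1, 2, 3):
            raise ValueError("each item must be scored 0, 1, 2 or 3")
    return mean(items)


def steatosis_on_us(score: float, threshold: float = 1.0) -> bool:
    """Apply the study's definition: a US steatosis score >= 1 counts as steatosis."""
    return score >= threshold
```

For example, item scores of 1, 1, 0 and 2 give a mean of 1 and are classified as steatosis, whereas a single mildly abnormal item (1, 0, 0, 0) gives 0.25 and is not.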
Magnetic resonance (MR) scanning was performed in the Academic Medical Center of the University of Amsterdam from 1 month before to 2 weeks after the start of the obesity program. 1H MR spectra were acquired using a point-resolved spectroscopy sequence (TE/TR = 38/2,000 ms) in a voxel of 20 × 20 × 20 mm3 during free breathing on a 3.0 Tesla MR system (Philips Healthcare, Best, The Netherlands). If body habitus did not permit scanning on this system, an open bore 1.0 Tesla MR scanner (Philips Healthcare, Best, The Netherlands) was used. The absolute mass concentration of liver fat was calculated as previously described by a research fellow (blinded, 3 years of experience) under supervision of an experienced MR physicist (blinded, 8 years of experience) (8). Liver steatosis was defined as >1.8% absolute mass concentration of liver fat measured with 1H MRS. This cutoff has been shown to correspond with >5% fat containing hepatocytes on liver histology in a validation study which showed an excellent linear correlation and accuracy of the MRI setting in our hospital (30).
The diagnostic accuracy of “fatty liver index” (FLI), “NAFLD liver fat score” (NAFLD score), “hepatic steatosis index” (HSI) and a recently published equation for the prediction of NAFLD in children (ped-NAFLD score) was determined in this cohort. All scores use simple clinical and laboratory parameters, have good to fair accuracy in their derivation population and are freely available (12, 13, 15). For the FLI and HSI, it should be noted that these were developed using US as an (imperfect) reference standard. The respective equations are shown in Table 1. The optimum cutoff points reported for the derivation populations were not adopted but instead the optimum cutoff was calculated for this cohort, to aim for maximum performance.
Table 1
|Prediction score|Equation|
|NAFLD liver fat score (12)|Score^a = −2.89 + 1.18 × metabolic syndrome (yes = 1/no = 0) + 0.45 × type 2 diabetes (yes = 2/no = 0) + 0.15 × insulin (mU L−1) + 0.04 × AST (U L−1) − 0.94 × AST/ALT ratio|
|Fatty liver index score (13)||
|Hepatic steatosis index (14)||
|Paediatric NAFLD score^b (15)||
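As an illustration of how the one equation that is written out in Table 1 is applied, the NAFLD liver fat score can be computed as in the sketch below. The coefficients are copied from the table; the function name, argument names and example values are hypothetical. No cutoff is applied, because the study derived its own optimum cutoff for this cohort rather than adopting a published one.

```python
def nafld_liver_fat_score(metabolic_syndrome: bool, type2_diabetes: bool,
                          insulin_mu_per_l: float, ast_u_per_l: float,
                          alt_u_per_l: float) -> float:
    """NAFLD liver fat score using the coefficients listed in Table 1."""
    ms_term = 1 if metabolic_syndrome else 0   # yes = 1 / no = 0
    dm_term = 2 if type2_diabetes else 0       # yes = 2 / no = 0, as coded in Table 1
    return (-2.89
            + 1.18 * ms_term
            + 0.45 * dm_term
            + 0.15 * insulin_mu_per_l
            + 0.04 * ast_u_per_l
            - 0.94 * (ast_u_per_l / alt_u_per_l))


# Hypothetical patient values, for illustration only.
score = nafld_liver_fat_score(True, False, insulin_mu_per_l=20.0,
                              ast_u_per_l=30.0, alt_u_per_l=40.0)
```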
Descriptive results were expressed using standard descriptive statistics. Diagnostic accuracy of the equations was assessed by performing receiver operating characteristics (ROC) curve analysis. The optimal cut-off point was determined using the Youden index. Performance was expressed as sensitivity, specificity, positive and negative predictive value (PPV and NPV, respectively), positive and negative likelihood ratio (LR) and their corresponding 95% confidence intervals (95% CI).
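The accuracy measures named above follow directly from a 2 × 2 table of test result against reference standard, and the Youden index selects the cutoff that maximizes sensitivity + specificity − 1. The sketch below uses only the standard definitions; it is not the SPSS procedure used in the study, the function names are hypothetical, and confidence intervals are omitted.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard accuracy measures from a 2x2 table.
    Assumes every denominator is nonzero (no guard for perfect specificity)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
        "youden_j": sensitivity + specificity - 1,
    }


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing Youden's J, calling score >= cutoff positive.
    labels: 1 = steatosis on the reference standard, 0 = no steatosis.
    Ties are resolved in favor of the lowest cutoff."""
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical counts and scores, not study data.
metrics = diagnostic_metrics(tp=40, fp=10, fn=10, tn=40)
cutoff, j = youden_cutoff([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9],
                          [0, 0, 0, 1, 1, 0, 1, 1])
```

The same definitions reproduce the likelihood ratios reported for the new equation from its rounded sensitivity (77%) and specificity (81%): LR+ = 0.77/0.19 ≈ 4.1 and LR− = 0.23/0.81 ≈ 0.28.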
Predictive parameters for steatosis in the study cohort were identified using multivariate logistic regression analysis. Anthropometric and routine biochemistry parameters with P ≤ 0.10 in univariate logistic regression analysis were entered in a multivariate model. Variables were excluded using backward selection with a significance level set at P < 0.05. Novel markers were then entered into this model and included if they significantly contributed to the model (forward selection). This approach was chosen to prevent model over-fitting. Effect modification for sex, age, and pubertal stage on the selected parameters was studied and added to the model if P < 0.10. Shrinkage was applied to correct the coefficients for overoptimism (31). Calibration of the model was evaluated by the Hosmer–Lemeshow goodness of fit test. Diagnostic accuracy of the model was determined by ROC curve analysis as described above.
All analyses were performed with PASW Statistics 18 (SPSS, Chicago, IL) and Microsoft Office Excel (Microsoft, Redmond, WA). Results were presented according to the STAndards for the Reporting of Diagnostic accuracy studies (STARD) criteria (32).
Out of 134 eligible subjects, 119 severely obese children and adolescents were consecutively included. Eight subjects could not be included because no informed consent was obtained, two subjects withdrew before study procedures were finished and five patients met exclusion criteria: magnetic sensitive implants (n = 1), past use of steatogenic or oral antidiabetic drugs (n = 3) and alcohol use ≥7 U week−1 (n = 1).
Characteristics of the 119 included subjects are depicted in Table 2. None of the subjects had diabetes type 2. Prevalence of steatosis in this cohort was 47% (56/119) (95% CI: 38-56 %). The distribution of the degree of steatosis is shown in Figure 1.
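The reported prevalence and its confidence interval can be checked with a short calculation. The source does not state which interval method was used; the sketch below assumes a simple normal-approximation (Wald) interval, which happens to reproduce the reported 38-56% for 56 of 119 subjects. The function name is hypothetical.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96):
    """Proportion with a normal-approximation (Wald) 95% CI.
    The interval method is an assumption; the paper does not specify one."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


prevalence, lower, upper = wald_ci(56, 119)
```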
Table 2
|Characteristic|N = 119|
|Age (yrs)|14.3 (± 2.1)|
|Gender (♂)|48 (41%)|
|BMI-z-score|3.35 (± 0.35)|
|Waist (cm)|104.5 (± 12.7)|
|Diastolic BP (mmHg)|80 (± 10)|
|Systolic BP (mmHg)|121 (± 15)|
|ALT (IU L−1)|31.8 (± 19)|
|γGT (IU L−1)|23.4 (± 11.4)|
|Ferritin (μg L−1)|58.5 (± 34.3)|
|CRP (mg L−1)|4.3 (± 3.9)|
|HOMA-IR|3.9 (± 2.5)|
|Triglycerides (mmol L−1)|0.98 (± 0.50)|
|HDL-cholesterol (mmol L−1)|1.08 (± 0.25)|
|LDL-cholesterol (mmol L−1)|2.50 (± 0.68)|
|Adiponectin (μg mL−1)|7.8 (± 2.8)|
|Leptin (ng mL−1)|33.9 (± 32.4)|
|Interleukin-6 (pg mL−1)|1.28 (0.33–4.86)|
|TNF-α (pg mL−1)|1.12 (0.10–24.93)|
|CK-18 (U L−1)|134.0 (75.8–564.4)|
|FGF-21 (ng mL−1)|0.112 (0.190–0.845)|
|PNPLA-3 SNP, n (%)||
The ROC curves of the NAFLD score, FLI, HSI, and Ped-NAFLD score in this cohort are shown in Figure 2. The diagnostic accuracy of these equations using the optimum cutoff is shown in Table 4. The PPV of these four equations for predicting liver steatosis ranged from 61% to 70% and the NPV ranged from 68% to 77%, with overlapping 95% CI.
Table 3
|Variable|Univariate OR (95% CI)|Univariate P value|Multivariate adjusted^a OR (95% CI)|Multivariate P value|
|Age (per year)|1.05 (0.89-1.25)|0.55|||
|Gender (♂)|1.86 (0.89-3.90)|0.10|2.12 (0.76-5.89)|0.132|
|Waist (per cm)|1.05 (1.01-1.08)|0.007|||
|Diastolic BP (per mmHg)|1.05 (1.01-1.10)|0.02|||
|Systolic BP (per mmHg)|1.02 (0.99-1.05)|0.11|||
|ALT (per IU L−1)|1.06 (1.03-1.10)|<0.001|1.06 (1.02-1.10)|0.001|
|γGT (per IU L−1)|1.08 (1.03-1.14)|0.002|||
|Ferritin (per μg L−1)|1.01 (0.99-1.02)|0.09|||
|CRP (per mg L−1)|0.95 (0.86-1.05)|0.32|||
|HOMA-IR|1.52 (1.22-1.90)|<0.001|♀ 1.76 (1.19-2.60)|0.03|
||||♂ 1.13 (0.85-1.50)|0.40|
|Triglycerides (per mmol L−1)|3.72 (1.52-9.10)|0.004|||
|HDL-chol (per 0.1 mmol L−1)|1.13 (0.97-1.31)|0.12|||
|LDL-chol (per mmol L−1)|1.6 (0.91-2.85)|0.10|||
|Ultrasound steatosis score||0.09|||
|Score 1|0.8 (0.69-7.14)|0.18|||
|Score 2|3.52 (1.19-16.26)|0.02|||
|Score 3|3.36 (2.98-88.09)|0.004|||
|Adiponectin (per μg mL−1)|0.96 (0.84-1.09)|0.52|||
|Leptin (per ng mL−1)|1.02 (0.99-1.05)|0.10|1.04 (1.01-1.09)|0.03|
|FGF-21 (per ng mL−1)|1.67 (0.87-32.22)|0.73|||
|PNPLA-3 SNP rs738409|||||
Table 4
|Group|Test|AUC-ROC|Sensitivity (%)|Specificity (%)|PPV (%)|NPV (%)|LR+|LR−|
|Existing equations|NAFLD liver fat score|0.75 (0.66-0.84)|77 (64-86)|71 (58-81)|70 (57-80)|77 (64-87)|2.62 (1.71-4.02)|0.33 (0.19-0.55)|
||Fatty liver index|0.71 (0.61-0.80)|70 (56-80)|60 (47-71)|61 (48-72)|69 (56-80)|1.76 (1.22-2.49)|0.51 (0.32-0.80)|
||Hepatic steatosis index|0.68 (0.59-0.78)|67 (53-78)|62 (50-73)|61 (48-72)|68 (55-79)|1.76 (1.22-2.57)|0.53 (0.35-0.81)|
||Ped-NAFLD score|0.76 (0.67-0.85)|75 (62-85)|68 (55-79)|69 (56-79)|75 (61-84)|2.34 (1.56-3.54)|0.36 (0.22-0.60)|
|New equation|ALT-HOMA-leptin^a|0.83 (0.76-0.91)|77 (64-87)|81 (69-89)|79 (66-88)|80 (68-89)|4.1 (2.4-7.1)|0.28 (0.17-0.47)|
|Ultrasound|Ultrasound (11)|NA|85 (75-95)|55 (42-68)|62 (50-74)|82 (69-94)|1.9 (1.4-2.6)|0.26 (0.13-0.45)|
Results of the logistic regression analysis are shown in Table 3. The following variables were associated with presence of steatosis in univariate analysis (P ≤ 0.10) and accordingly presented to the multivariate logistic regression model in two steps: gender, BMI z-score, waist, diastolic blood pressure, ALT, γGT, ferritin, HOMA-IR, triglycerides, LDL-cholesterol, US steatosis score (Step 1: backward selection) and the novel biomarkers leptin, TNF-alpha, CK-18, PNPLA-3 SNP (Step 2: forward selection). Only HOMA, ALT and leptin were identified as independent predictors of steatosis. Gender was identified to be an effect modifier for the effect of HOMA on steatosis; the effect of HOMA in females being 1.6 times higher as compared to males (OR = 1.81, P = 0.003 vs. 1.13, P = 0.396) (P = 0.06). No collinearity between the independently correlated variables was detected. The model-issued equation to predict NAFLD is:
For sex: male = 1 and female = 0.
The model was fairly well calibrated (Hosmer–Lemeshow Chi Square = 12.39, P = 0.14). The AUC-ROC of this equation was 0.83 (95% CI 0.76-0.91; Figure 2). The ROC curve and diagnostic performance of this equation as well as the performance of US in this cohort, as previously published (8), are shown in Figure 2 and Table 4.
This study shows that in a high-risk population of severely obese children and adolescents (prevalence of steatosis 47%, 95% CI: 38-56%), the currently freely available prediction scores and presently derived formula using a set of novel biomarkers only have moderate diagnostic accuracy for predicting NAFLD and, therefore, are insufficiently accurate to be used in daily clinical practice in this group.
Noninvasive prediction scores for NAFLD are in high demand to enable clinicians to screen for NAFLD easily and rapidly. In particular, accurate exclusion of NAFLD in high-risk groups would allow clinicians to select patients needing more expensive and invasive diagnostics (i.e., MR scanning or liver biopsy). Although a perfect diagnostic test for NAFLD remains the final aim (33), widely adopted criteria for a clinically useful test when aiming to exclude a disorder are sensitivity >95% (i.e., <5% false negatives) and negative likelihood ratio <0.10 (34). None of the evaluated equations meets this degree of diagnostic accuracy in this study population, as shown in Table 4. Even in their derivation population, the four previously published prediction scores did not meet this degree of accuracy (sensitivity varied between 82% and 85% and negative likelihood ratio varied between 0.17 and 0.25) (12, 13, 15). External validation of these equations is scarce. FLI and HSI were evaluated once in a cohort of obese and diabetic adults and lacked diagnostic accuracy (AUC-ROC 0.65 and 0.64, respectively) (35). The only report on external validation of the NAFLD score was published in a letter: in a selected overweight adult population it almost met the proposed criteria of a clinically useful test (sensitivity 93% and a negative likelihood ratio 0.09) (36). No other reports of external validation of these equations using an accurate reference standard (liver histology or MR spectroscopy) have been published to our knowledge.
As expected, since it is derived from this study population, the newly built equation performed better than the four previously published equations. However, its diagnostic accuracy for excluding NAFLD is still insufficient for use in daily clinical practice (sensitivity 77%; negative likelihood ratio 0.28). Moreover, when externally validated, this model is unlikely to perform substantially better than the other four equations. Applying lower cut-off values to these prediction scores (i.e., increasing sensitivity and lowering the false negative rate) can improve the diagnostic accuracy for excluding NAFLD, but a smaller proportion of patients is then identified as free from disease, which limits the clinical usefulness of these tests. Using a lower cut-off for the prediction score derived from this cohort, a sensitivity >95% and a negative likelihood ratio <0.10 can be obtained. However, using this cut-off, only 15% of the cohort has a negative test result, while in fact 53% of this cohort is disease free. In other words, less than one third of the disease-free subjects are identified as free from disease. For clinical practice this would mean that a large number of patients would still require further diagnostic testing to exclude NAFLD.
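The "less than one third" statement follows from the two percentages given above. A minimal sketch of the arithmetic, with a hypothetical function name: even if every negative test result were a true negative, the share of disease-free subjects identified cannot exceed the test-negative rate divided by the disease-free rate.

```python
def max_share_of_disease_free_identified(test_negative_rate: float,
                                         disease_free_rate: float) -> float:
    """Upper bound on the fraction of disease-free subjects with a negative test,
    reached only if all negative results are true negatives."""
    return min(1.0, test_negative_rate / disease_free_rate)


# 15% of the cohort tests negative at the lowered cut-off; 53% is disease free.
upper_bound = max_share_of_disease_free_identified(0.15, 0.53)
```

With the reported figures the bound is about 0.28, which is below one third.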
The prediction rules did not outperform US for excluding NAFLD in this study despite the only moderate diagnostic accuracy of US (NPV 82%; 95% CI 69-94). US is widely available and is simple to use in comparison to the equations. Therefore, until new high-performance diagnostic tools to screen for NAFLD are developed, liver US remains the most suitable tool to exclude NAFLD in severely obese children. US cannot be used to diagnose NAFLD in this patient group because the positive predictive value was low (63%; 95% CI 50-74). In the literature, the reported predictive values of US vary. However, we previously showed that the observed predictive values in this cohort are comparable to those observed in other studies when correcting for differences in disease prevalence (8). To our knowledge, no previous study has compared the diagnostic accuracy of noninvasive prediction scores head-to-head with US in one population.
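The dependence of predictive values on disease prevalence can be made concrete with Bayes' rule. The sketch below is an illustration under stated assumptions, not the correction procedure of reference (8): it takes the rounded sensitivity (85%) and specificity (55%) of US from Table 4, and the 22% comparison prevalence is simply the school-based figure quoted in the introduction. The function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a given prevalence using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Rounded US sensitivity and specificity from Table 4.
ppv_cohort, npv_cohort = predictive_values(0.85, 0.55, prevalence=0.47)
ppv_low, npv_low = predictive_values(0.85, 0.55, prevalence=0.22)
```

At the cohort prevalence of 47% this gives a PPV of about 63% and an NPV of about 81%, close to the reported values given the rounded inputs. At a prevalence of 22% the same test would have a higher NPV and a lower PPV, which is why predictive values from different populations are not directly comparable.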
The diagnostic value of the extensive set of novel biomarkers for NAFLD evaluated in this study was disappointing. All selected markers have been reported to individually correlate strongly with the presence of NAFLD and have been marked as promising biomarkers, but most have not been evaluated in diagnostic models (19-28). In this cohort, none of the novel biomarkers, except for leptin, independently correlated with NAFLD when added to the model containing the simple parameters ALT, HOMA and effect modification of sex on HOMA. By adding leptin the model only marginally improved (PPV unchanged, NPV increased from 76% to 80%). PNPLA3 SNPs have previously been shown not to improve diagnostic accuracy when added to the NAFLD score in obese adults. Adiponectin only marginally improved the ped-NAFLD score in its initial publication (12, 15). Leptin, IL-6, TNF-alpha, FGF-21 and CK-18 have not been previously evaluated in diagnostic models for NAFLD. Leptin and FGF-21 have been shown to correlate specifically with steatosis (22, 25, 27). IL-6, TNF-alpha and CK-18 have been shown to correlate with steatohepatitis and fibrosis (23, 25, 26). Despite the lack of diagnostic value observed in this population, these novel biomarkers remain to be evaluated in populations with different age and degree of obesity.
A strong point of this study is that the MR scanner used has been validated by comparison with the reference standard of histologically determined liver fat, as described in the Methods section (30). In most studies steatosis is defined as 5.0 or 5.5% MR-determined liver fat without taking into account the scanner parameters and the mode of fat fraction calculation, both of which strongly influence MR-determined fat values. Moreover, the cohort in this study was not selected based on liver abnormalities and other causes of liver disease were rigorously excluded. A limitation of this study is that the study population was severely obese, so results cannot simply be extrapolated to less severely obese children. Yet, the negative predictive value of US is likely to be even better in less severely obese children, as has been reported in adults (37). The newly developed prediction rule derived from this study cohort was not tested in a validation cohort. However, as its accuracy was already considered insufficient, validation is less relevant. Finally, novel candidate biomarkers for presence of NAFLD continue to be reported, such as white blood cell count, vitamin D, Apolipoprotein C3 gene variants, markers of oxidative stress and molecular species identified through metabolomics (38-40). The diagnostic value of these parameters in prediction models needs to be evaluated.
In conclusion, the currently freely available, noninvasive prediction scores lack sufficient diagnostic accuracy for diagnosing or excluding NAFLD in severely obese children and adolescents. The set of novel biomarkers for NAFLD evaluated in this study has little additional diagnostic value when combined with routinely available biomarkers in this group. New high-performance prediction scores for NAFLD remain to be developed.
We gratefully acknowledge the pediatricians at The Obesity Center Heideheuvel for recruiting patients, the CBSL laboratory of the Ter Gooi Hospital for helping in collecting and storing samples and the department of Clinical Genetics and DNA Diagnostics of the AMC hospital for aiding in genetic analysis.