Letter to the Editor
Single slice vs. volumetric MR assessment of visceral adipose tissue: Reliability and validity among the overweight and obese
Version of Record online: 12 MAR 2013
Copyright © 2013 The Obesity Society
Volume 21, Issue 1, pages 6–7, January 2013
How to Cite
Sabour, S. (2013), Single slice vs. volumetric MR assessment of visceral adipose tissue: Reliability and validity among the overweight and obese. Obesity, 21: 6–7. doi: 10.1002/oby.20093
- Issue online: 12 MAR 2013
- Accepted manuscript online: 24 OCT 2012 09:13AM EST
- Manuscript Accepted: 29 AUG 2012
- Manuscript Revised: 27 JUL 2012
- Manuscript Received: 21 MAY 2012
TO THE EDITOR:
I was interested to read the paper by Maislin et al. published in the May 2012 issue of Obesity. The authors assessed the reliability and validity of single-slice vs. volumetric MR assessment of visceral adipose tissue (VAT) among the overweight and obese and reported that the correlation with VAT volume was significantly larger for L2-L3 VAT area (r = 0.96) than for L4-L5 VAT area (r = 0.83) (1). Such correlation computations do not constitute a reliability and validity analysis; relying on them is a common mistake in reliability analysis (2-4). I therefore found the title of the manuscript by Maislin et al. incorrect and misleading. Moreover, the authors reported a strong positive correlation in both areas; a 0.13 difference in r means nothing clinically, even though it was statistically significant. As a rule of thumb in clinical epidemiology, clinical importance should take priority over statistical significance. A P value can easily shift between significant and non-significant depending on the sample size, the magnitude of the mean difference, and, most importantly, the standard deviation of the variable in the study population (2-4). As the authors point out in their conclusion, linear regression analyses demonstrated that the L2-L3 area alone was sufficient for predicting total VAT volume. The common practice is to employ two different cohort data sets for developing and validating a prediction model, and it is unclear why the authors did not do so. The authors also did not report an area under the curve (AUC) analysis, which would have added diagnostic value to the study (2-4).
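The sample-size dependence of the P value noted above can be illustrated with a quick sketch. Fisher's z transform gives an approximate test for the difference between two correlations; the version below assumes two independent samples of equal size n (the correlations in the paper share a single sample, so in practice a dependent-correlation test would be needed), and the sample sizes are hypothetical:

```python
import math

def fisher_z_test(r1, r2, n):
    """Approximate two-sided p-value for the difference between two
    correlations via Fisher's z transform. Assumes two independent
    samples of equal size n; an illustration only, not the exact
    test required for correlations sharing one sample."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n - 3) + 1.0 / (n - 3))
    z = abs(z1 - z2) / se
    return math.erfc(z / math.sqrt(2))  # two-sided p from the normal tail

# The same 0.13 gap in r (0.96 vs. 0.83) is non-significant for a small
# hypothetical n and highly significant for a large one, while its
# clinical meaning is unchanged.
for n in (10, 200):
    print(n, fisher_z_test(0.96, 0.83, n))
```

The point of the sketch is only that significance is a function of n, not of clinical importance.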
Reliability and validity are two completely different methodological issues in research. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), the positive likelihood ratio (LR+: the true-positive rate divided by the false-positive rate, i.e., sensitivity / (1 − specificity)), and the negative likelihood ratio (LR−: the false-negative rate divided by the true-negative rate, i.e., (1 − sensitivity) / specificity), as well as the diagnostic odds ratio (true results/false results; preferably more than 50), are among the measures used to evaluate the validity (accuracy) of a single test against a gold standard (2-4). Reliability (repeatability or reproducibility) is often assessed with statistical tests such as the Pearson r, least squares, and the paired t test, all of which are common mistakes in reliability analysis (5). Briefly, for quantitative variables the intraclass correlation coefficient (ICC) should be used, and for qualitative variables the weighted kappa, applied with caution because kappa has its own limitations. Regarding reliability or agreement, it is worth knowing that the simple kappa credits only the concordant cells, whereas partial agreement in the discordant cells should also be taken into account to reach a correct estimate of agreement (weighted kappa) (2-4). It is crucial to know that there is no value of kappa that can be regarded universally as an indication of good agreement; statistics cannot provide a simple substitute for clinical judgment. Two important weaknesses of the kappa value as a measure of agreement for a qualitative variable are that it depends on the prevalence in each category and on the number of categories: the fewer the categories, the higher the kappa value, which can easily lead to misinterpretation (2-4).
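The distinction between correlation and agreement can be made concrete. In the sketch below (hypothetical readings; `icc_a1` is an illustrative pure-Python implementation of the two-way, absolute-agreement ICC for two raters, not code from the paper), a constant bias between two measurement methods leaves the Pearson r at essentially 1 while the ICC is low:

```python
import statistics

def pearson_r(x, y):
    """Plain Pearson correlation (sensitive to linear association only)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def icc_a1(x, y):
    """ICC(A,1): two-way, single-measure, absolute-agreement intraclass
    correlation for two raters, from the standard ANOVA decomposition."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / 2 for a, b in zip(x, y)]
    col_means = [statistics.mean(x), statistics.mean(y)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)             # between-subjects mean square
    msc = ss_cols / (k - 1)             # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

x = [10, 12, 14, 16, 18, 20]   # method A (hypothetical readings)
y = [v + 10 for v in x]        # method B reads 10 units higher
print(pearson_r(x, y))         # ~1.0 despite the bias
print(icc_a1(x, y))            # far lower: the methods do not agree
```

This is exactly why Pearson r overstates agreement: it is blind to systematic differences that the absolute-agreement ICC penalizes.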
The area under the curve (AUC) is usually reported for the diagnostic rather than the prognostic value of a model. The receiver operating characteristic (ROC) curve for a model is comparable with the LR+ of a test, because both use sensitivity and 1 − specificity; in the LR+ the two are divided, whereas in the ROC curve sensitivity is plotted against 1 − specificity. As a take-home message, appropriate tests should be applied for reliability and validity analyses.
- 1. Maislin et al. Single slice vs. volumetric MR assessment of visceral adipose tissue: Reliability and validity among the overweight and obese. Obesity (Silver Spring), in press.
- 2. Epidemiology, Biostatistics and Preventive Medicine, 3rd ed. Philadelphia, PA, United States: Saunders, Elsevier; 2007.
- 3. Modern Epidemiology, 3rd ed. Baltimore, MD, United States: Lippincott Williams & Wilkins; 2008.
- 4. Epidemiology: Beyond the Basics, 2nd ed. New York, NY, United States: Jones and Bartlett Publishers; 2007.
- 5. Agreed statistics: Measurement method comparison. Anesthesiology 2012; 116: 182-185.
Siamak Sabour*
*Department of Clinical Epidemiology, Shahid Beheshti University of Medical Sciences, Tehran, Iran.