Letter to the Editor
Risk Assessment and Fracture Discrimination by Ultrasound: The Debate Continues
Version of Record online: 29 NOV 2004
Copyright © 2005 ASBMR
Journal of Bone and Mineral Research
Volume 20, Issue 3, pages 536–538, March 2005
How to Cite
Nguyen, T. V., Nguyen, N. D. and Ahlborg, H. G. (2005), Risk Assessment and Fracture Discrimination by Ultrasound: The Debate Continues. J Bone Miner Res, 20: 536–538. doi: 10.1359/JBMR.041130
- Issue online: 4 DEC 2009
It is a truism infrequently acknowledged that few things are more difficult than predicting a fracture event. Indeed, the greater the accuracy of prediction required, the greater the difficulty. In considering the associations between various quantitative ultrasound (QUS) measurements and asymptomatic vertebral fracture reported by Glüer et al.,(1) we see no reason to challenge their carefully worded conclusion and judgment, but here we would like to put forward a few heuristic points that may yield some alternative and supplementary interpretations.
First, in many clinical and epidemiologic settings, the odds ratio (OR) can be used as an estimate of the risk ratio (RR). If p1 is the prevalence of fracture among individuals exposed to a risk factor (i.e., low speed of sound [SOS]) and p0 is the prevalence of fracture among those unexposed to the risk factor, then RR = p1/p0 and OR = [p1/(1 − p1)]/[p0/(1 − p0)]. Because RR is a ratio of probabilities, its meaning is easier to grasp than that of OR. For example, an RR of 1.7 readily informs us that the risk of fracture in the exposed group is 70% higher than the risk in the unexposed group. For an OR of 1.7, however, no equally direct statement about risk is available, because the unit of interpretation is now odds, not probability (i.e., the odds of sustaining a fracture in the exposed group are 70% higher than the odds of sustaining a fracture in the unexposed group). Because of this awkward interpretation, some authors suggest that OR is meaningless, whereas others suggest that OR should not be used as a measure of strength of association.(2–6)
Second, OR usually overestimates RR. For events with low frequency in the population (e.g., hip fracture), where 1 − p1 and 1 − p0 are close to 1, OR is approximately equal to RR; however, when the prevalence of fracture is high, OR overestimates RR,(6–8) and in practice such overestimates have caused the magnitude of association to be misunderstood in the scientific media as well as in the popular press.(9) The study by Glüer et al. is a case in point: they report that the unadjusted OR for the association between Achilles + SOS and fracture was 1.7, but with a fracture prevalence of 16%, the corresponding RR is only around 1.5. Therefore, all of the ORs reported by Glüer et al. in their Table 2 overestimate the strength of association between QUS (and, for that matter, DXA measurement of BMD) and fracture risk.
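The conversion behind this figure can be sketched as follows. This is a minimal illustration rather than the computation used in the letter: it assumes that the 16% prevalence can stand in for the fracture prevalence in the unexposed group (p0), and the function names are hypothetical. Solving the definitions RR = p1/p0 and OR = [p1/(1 − p1)]/[p0/(1 − p0)] for RR gives RR = OR/(1 − p0 + p0 × OR).

```python
def odds_ratio(p1: float, p0: float) -> float:
    """Odds ratio from the fracture prevalence in exposed (p1) and unexposed (p0)."""
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


def risk_ratio_from_or(or_value: float, p0: float) -> float:
    """Risk ratio implied by an odds ratio, given the unexposed prevalence p0.

    Algebraic rearrangement of the two definitions:
        RR = OR / (1 - p0 + p0 * OR)
    """
    return or_value / (1.0 - p0 + p0 * or_value)


# Assumption: p0 = 0.16 (the reported overall prevalence used as the baseline).
rr = risk_ratio_from_or(1.7, 0.16)
print(round(rr, 2))  # 1.53

# Rare outcome: OR and RR nearly coincide.
print(round(risk_ratio_from_or(1.7, 0.01), 2))  # 1.69
```

With a baseline prevalence of 1% the same OR of 1.7 corresponds to an RR of about 1.69, which illustrates why the two measures can be used interchangeably only for rare events.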
Third, RR (or its estimate, OR) is primarily a measure of association between two variables, not a measure of discrimination between two events. Consequently, a risk factor with a high OR is not necessarily an accurate discriminator of fracture cases from nonfracture cases. In simulations using parameters similar to those of the study of Glüer et al., we show that, for an OR of 1.7, the distributions of SOS values in fracture and nonfracture cases overlap by ∼80%. This overlap, as expected, decreases progressively as the OR increases. However, even with an OR of 10, the overlap is still high (∼20%). To achieve complete discrimination between fracture and nonfracture cases, an OR of 30 is required (Fig. 1). Therefore, it can be argued that none of the QUS or DXA BMD measurements qualifies as an accurate discriminator of fracture.
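One way to approximate these overlap figures without simulation is sketched below. It rests on assumptions that the letter does not state: the measurement is taken to be normally distributed with the same SD in fracture and nonfracture cases, and the OR is taken to be expressed per 1 SD. Under those assumptions the standardized difference between the group means is ln(OR), and the shared area under the two density curves is 2Φ(−ln(OR)/2). The function names are hypothetical, and the results need not match Fig. 1 exactly.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def overlap_for_or_per_sd(or_per_sd: float) -> float:
    """Overlapping proportion of two equal-variance normal distributions.

    Assumption: the group means differ by d = ln(OR per SD) standard
    deviations, so the overlapping area is 2 * Phi(-|d| / 2).
    """
    d = math.log(or_per_sd)
    return 2.0 * normal_cdf(-abs(d) / 2.0)


for or_value in (1.7, 2.0, 5.0, 10.0, 30.0):
    print(f"OR = {or_value:>4}: overlap = {overlap_for_or_per_sd(or_value):.2f}")
```

Under these assumptions an OR of 1.7 gives an overlap of about 0.79 and an OR of 10 gives about 0.25, broadly in line with the ∼80% and ∼20% quoted above.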
Fourth, in addition to OR, Glüer et al. used the area under the receiver operating characteristic (ROC) curve to gauge the usefulness of QUS in the identification of fracture. In this study, the area under the ROC curve, although in the modest range (between 0.6 and 0.7), was likely optimistic, because the regression-based discrimination rule was derived from and applied to the same data set. The modus operandi of logistic regression is that its parameters are estimated from the observed data such that they maximize the likelihood of those data. Thus, when the logistic regression equation is used to generate a discrimination rule and the rule is applied to the same data set (which is the case in the study of Glüer et al.), the rule will naturally yield the best concordance with the observed data. Therefore, the area under the ROC curve is only meaningful when the discrimination rule is applied to an independent sample.
Moreover, the ROC curve is a trade-off between sensitivity and the complement of specificity (i.e., the false-positive rate). Sensitivity is the proportion of fracture cases who are identified as “high risk” by, in this case, the QUS measurement; specificity is the proportion of nonfracture individuals who are identified by the QUS as “low risk.” Neither sensitivity nor specificity answers the question of interest: given a group of individuals with “high risk” (or “low risk”) values of QUS, how many of them will sustain a fracture? To answer this question, we need to take into account the prevalence of fracture and estimate the positive predictive value (PPV), which can be shown to be a function of fracture prevalence, OR, and sensitivity or specificity (Fig. 2). (A proof can be obtained by e-mailing the authors.) In the study of Glüer et al., all ORs were <2, and for a fracture prevalence of 16%, the PPV was likely <0.2, which is very low for identification of fracture cases. Even with the combination of QUS and BMD, the PPV is still <0.3, suggesting that such a combination does not provide an accurate discrimination.
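The dependence of PPV on prevalence can be sketched with Bayes' rule. The code below is an illustration under stated assumptions, not the proof mentioned above: the function names are hypothetical, the OR is treated as the odds ratio of a dichotomized (“high risk” versus “low risk”) test, in which case OR = [sensitivity/(1 − sensitivity)] × [specificity/(1 − specificity)], and the sensitivity values in the loop are arbitrary examples rather than estimates from Glüer et al.

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def specificity_from_or(odds_ratio: float, sensitivity: float) -> float:
    """Specificity implied by the OR of a dichotomized test and its sensitivity.

    Assumption: OR = [sens / (1 - sens)] * [spec / (1 - spec)], hence
    spec / (1 - spec) = OR * (1 - sens) / sens.
    """
    spec_odds = odds_ratio * (1.0 - sensitivity) / sensitivity
    return spec_odds / (1.0 + spec_odds)


def ppv_from_or(prevalence: float, odds_ratio: float, sensitivity: float) -> float:
    """PPV as a function of prevalence, OR, and sensitivity."""
    specificity = specificity_from_or(odds_ratio, sensitivity)
    return ppv(prevalence, sensitivity, specificity)


# Hypothetical sensitivities at the reported fracture prevalence of 16%.
for sens in (0.3, 0.5, 0.7):
    print(f"sensitivity = {sens}: PPV = {ppv_from_or(0.16, 1.7, sens):.3f}")
```

When the OR equals 1, the PPV returned is simply the prevalence, which is the baseline against which any “high risk” classification has to be judged.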
Finally, we would like to comment on the interpretation offered by Glüer et al. Based on the estimates of the logistic regression parameters, Glüer et al. state, as an example of interpretation, that “For example, two women with an age difference of 15 years and an Achilles + SOS difference of 39.5 m/s (1.5 SEE or 1.4 SD of older women) would have odds of having a vertebral fracture that differed by a factor of OR = (1.51)^2 × (1.49)^1.5 = 3.36.” We consider this interpretation too simplistic and even inappropriate, because (1) the estimates of the logistic regression parameters are averages and, as a result, can only logically be applied to a group of individuals, not to any particular individual, because an “average individual” does not exist in any population; and (2) risk is a population measure with a denominator, whereas an individual does not have a denominator. Indeed, results of epidemiologic and clinical studies, under some strict assumptions and conditions, can be generalized to a group of subjects but cannot be applied to any particular individual. An individual's risk of fracture is a dichotomous measure: it is either 1 (to sustain a fracture) or 0 (not to sustain a fracture); any continuous risk prediction that falls between these two limits is simply a theoretical conjecture for a group of individuals and does little to inform the individual.
The very nature of fracture prediction is to use a limited amount of information to discriminate accurately between fracture and nonfracture cases. Because of the inherent uncertainty of interaction among risk factors, the discrimination rule must inevitably be a frugal heuristic, but it must possess a high positive predictive value. Given the low positive predictive values shown above, however, it does not seem justified to use QUS and BMD in the identification of asymptomatic vertebral fracture at the individual level.
- 1. (2004) Association of five quantitative ultrasound devices and bone densitometry with osteoporotic vertebral fractures in a population-based sample: the OPUS Study. J Bone Miner Res 19: 782–793.
- 2. (1990) The use of risk factors in medical diagnosis: opportunities and cautions. J Clin Epidemiol 43: 851–858.
- 3. (1995) To use or not to use the odds ratio in epidemiologic analyses. Eur J Epidemiol 11: 365–371.
- 4. (1998) Odds ratios should be avoided when events are common. BMJ 317: 1318.
- 5. (1996) Down with odds ratios! Evidence Based Med 1: 164–166.
- 6. (1998) When can odds ratios mislead? BMJ 316: 989–991.
- 7. (2002) What's the relative risk? A method to directly estimate risk ratios in cohort studies of common outcomes. Ann Epidemiol 12: 452–454.
- 8. (2003) Estimating the relative risk in cohort studies and clinical trials of common outcomes. Am J Epidemiol 157: 940–943.
- 9. (1999) Misunderstandings about the effects of race and sex on physicians' referrals for cardiac catheterization. N Engl J Med 341: 279–283.