Risk Assessment and Fracture Discrimination by Ultrasound: The Debate Continues
Article first published online: 29 NOV 2004
Copyright © 2005 ASBMR
Journal of Bone and Mineral Research
Volume 20, Issue 3, pages 539–540, March 2005
How to Cite
Glüer, C. C., Eastell, R., Reid, D. M., Felsenberg, D., Roux, C., Barkmann, R., Timm, W., Blenk, T., Armbrecht, G., Stewart, A., Clowes, J., Thomasius, F. E. and Kolta, S. (2005), Risk Assessment and Fracture Discrimination by Ultrasound: The Debate Continues. J Bone Miner Res, 20: 539–540. doi: 10.1359/JBMR.041135
- Issue published online: 4 DEC 2009
In their letter, Nguyen et al. review some of the well-established limitations of using odds ratios (ORs). However, we believe that their critical statements regarding our manuscript, which we address below, rest on an inaccurate representation of the position that we put forward in our paper,(1) in part because they disregard key statements in the paper. We also disagree with their main conclusion that risk assessment is of little use for the individual.
At the outset, Nguyen et al. explain some of the limitations of using ORs. Notwithstanding these limitations, ORs represent the method of choice for expressing associations with disease prevalence in cross-sectional studies, as is explained in statistics textbooks.(2) In the osteoporosis literature, the majority of cross-sectional fracture studies published in leading bone journals, including JBMR, present results based on logistic regression analysis and ORs, and rightfully so. The Osteoporosis and Ultrasound (OPUS) study on fracture prevalence represents one such study. ORs facilitate the comparison of results of cross-sectional and prospective studies that used different sampling schemes. Risk ratios (RRs) for a disorder are strongly dependent on the prevalence of that disorder, and thus, it would not be advisable to use this measure for most cross-sectional studies. Even for prospective vertebral fracture studies, many researchers use logistic regression instead of the Cox proportional hazards model and report ORs because the time of fracture is not known. The degree to which the ORs for the quantitative ultrasound (QUS) methods in our study overestimate the corresponding RRs is limited, and it does not affect a key point of the publication, that is, the comparison of QUS with DXA. Interestingly, Nguyen et al. have recently published a study in which they also used this statistical tool.(3)
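The relationship between ORs and RRs discussed above can be illustrated with a short sketch. The prevalence figures are hypothetical and are not taken from OPUS; the function names are ours. The conversion uses the standard algebraic identity RR = OR / (1 − p0 + p0 × OR), where p0 is the prevalence in the reference group.

```python
def odds_ratio(p_exposed, p_reference):
    """Odds ratio comparing an exposed group with a reference group."""
    odds_exposed = p_exposed / (1.0 - p_exposed)
    odds_reference = p_reference / (1.0 - p_reference)
    return odds_exposed / odds_reference


def risk_ratio(p_exposed, p_reference):
    """Risk (prevalence) ratio for the same two groups."""
    return p_exposed / p_reference


def rr_from_or(or_value, p_reference):
    """Recover the risk ratio from an OR and the reference-group prevalence."""
    return or_value / (1.0 - p_reference + p_reference * or_value)


# Hypothetical prevalences, chosen only for illustration.
p_ref, p_exp = 0.10, 0.20
or_value = odds_ratio(p_exp, p_ref)
print(f"OR = {or_value:.2f}")                       # OR = 2.25
print(f"RR = {risk_ratio(p_exp, p_ref):.2f}")       # RR = 2.00
print(f"RR from OR = {rr_from_or(or_value, p_ref):.2f}")
```

With a reference prevalence of 10%, the OR exceeds the RR only moderately; the gap widens as the prevalence rises and vanishes as it approaches zero.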
Nguyen et al. correctly state that ORs represent a measure of association. However, ORs can also be used as a measure to characterize how well a variable is able to discriminate between two groups (e.g., with and without fractures). Just as with linear discriminant analysis, logistic regression analysis is a statistical method to find the best function of multiple risk factors to determine the group memberships of subjects. The β coefficients in the respective discriminant or logistic functions are optimized in a way to best separate the groups. The logit transformation that distinguishes logistic regression from linear discriminant analysis results in β coefficients that aid clinical interpretation, which is an added advantage. As a secondary step of analysis, receiver operating characteristic (ROC) curves are appropriate to test whether two multivariate models show significantly different discriminatory power; it is difficult to test this analytically for two logistic models unless iterative bootstrapping methods are used. In their recent publication, Nguyen et al. not only use ORs and ROC analysis, but they explicitly judge “discriminatory power” based on the area under the curve.(3)
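As a rough illustration of the bootstrap approach mentioned above, the following sketch compares the area under the ROC curve (AUC) of two markers measured in the same subjects. It is not the analysis used in our paper: the function names, the paired percentile bootstrap, and the toy data are assumptions made for illustration only. Scores are assumed to be oriented so that higher values indicate fracture; measurements such as BMD, where lower values indicate risk, would need to be negated first.

```python
import random


def auc(scores, labels):
    """AUC via the Mann-Whitney statistic (labels: 1 = fracture, 0 = none)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(scores_a, scores_b, labels, n_boot=1000, seed=0):
    """95% percentile interval for AUC(a) - AUC(b), resampling subjects."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that lack one of the two groups
            a = [scores_a[i] for i in idx]
            b = [scores_b[i] for i in idx]
            diffs.append(auc(a, y) - auc(b, y))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return lower, upper


# Toy data, invented for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
marker_a = [0.9, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
marker_b = [0.6, 0.2, 0.7, 0.5, 0.8, 0.3, 0.4, 0.1]
print(auc(marker_a, labels), auc(marker_b, labels))
print(bootstrap_auc_difference(marker_a, marker_b, labels, n_boot=500))
```

If the resulting interval excludes zero, the two markers differ in discriminatory power at roughly the 5% level; because subjects are resampled jointly, the correlation between the two markers is respected.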
The large degree of overlap between fractured and nonfractured populations is well known, but this does not invalidate the use of ORs as measures of the discriminatory power of the techniques. We did not claim that we achieved anything close to complete discrimination, as Nguyen et al. seem to imply. Unlike Nguyen et al., we also did not use the term “accurate discrimination,” which is ill defined. Instead, we used statistical criteria based on ROC analysis. This is meaningful, and because our study is population based, the results should be applicable to the general population. As usual, verification of results in an independent sample would further improve the level of evidence.
Accepting the use of logistic regression analysis for discrimination does not mean that QUS methods could be used to identify or diagnose fractures. Nguyen et al. state that we used OR-based ROC analysis “to gauge the use of QUS in the identification of fracture.” We did not do this. Instead, in our paper, we very clearly point out that “DXA and QUS methods are not suited for diagnosing fractures. Therefore, the modest sensitivity to identify women with prevalent fractures in a population-based sample is not surprising…” In the remainder of the paragraph that begins with this statement we contrast such inappropriate use with a different, more appropriate role for QUS: as a technique that is helpful “to identify subjects at highest risk for having a vertebral fracture,” that is, as a technique helpful to decide whether additional diagnostic assessments are warranted (e.g., taking a spinal radiograph). Only such an additional assessment would allow the identification of fractures.
Nguyen et al. propose to express the performance of the techniques in terms of positive predictive values (PPVs). They may not have realized that we in fact did analyze our data in this way. NNX, the numbers needed to X-ray, is simply the inverse of the PPV. NNX has the added advantage that it is easy to interpret and that it clearly documents that we are not claiming that we can diagnose vertebral fractures—we suggest taking radiographs for this purpose. And, unlike the term PPV, there is little risk that our data could be misinterpreted as having been derived from prospective data.
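The reciprocal relationship between PPV and NNX can be sketched as follows. The sensitivity, specificity, and prevalence values are hypothetical and are not results from our study; the function names are ours.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of test-positive subjects who truly have a fracture."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


def numbers_needed_to_xray(ppv):
    """NNX: test-positive subjects radiographed per fracture found."""
    return 1.0 / ppv


# Hypothetical test characteristics, chosen only for illustration.
ppv = positive_predictive_value(sensitivity=0.5, specificity=0.8, prevalence=0.1)
print(f"PPV = {ppv:.3f}")                          # PPV = 0.217
print(f"NNX = {numbers_needed_to_xray(ppv):.1f}")  # NNX = 4.6
```

Under these assumed values, between four and five test-positive women would need a spinal radiograph to identify one with a vertebral fracture.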
Finally, Nguyen et al. deny that fracture risk assessment would be useful in the individual. In our view, both of their statements “An individual risk of fracture is a dichotomous measure: it is either 1 (to sustain a fracture) or 0 (not sustain a fracture)” and “The very nature of fracture prediction is to use a limited amount of information to make an accurate discrimination of fracture from nonfracture cases” are wrong. We believe that they confuse the concept of risk assessment (which, because the fracture event has not yet happened, by definition is never going to be exact) with fracture discrimination (which is appropriate for cross-sectional studies); here they make the very mistake of which they inappropriately accuse us. Prospective data from groups of subjects should, in our view, be used to quantify the likelihood of future fracture, preferably in terms of absolute fracture risk. The current move toward intervention thresholds for individual patients that are based on absolute fracture risk derived from cohort studies is a good example of this strategy. In OPUS, we are currently re-examining the participants to contribute prospective data that could be used for such clinical purposes. Hopefully, such data will eventually convince Nguyen et al. that risk assessment for the individual patient is most important for case-finding strategies, as well as for therapeutic decision-making in osteoporosis.
- 1. 2004. Association of five quantitative ultrasound devices and bone densitometry with osteoporotic vertebral fractures in a population-based sample: The OPUS Study. J Bone Miner Res 19: 782–793.
- 2. 1982. Case-Control Studies. Oxford University Press, New York, NY, USA.
- 3. 2004. Bone mineral density-independent association of quantitative ultrasound measurements and fracture risk in women. Osteoporosis Int 15: 942–947.