Dr. Leslie receives honoraria or speaker's fees from Merck Frosst Canada, Sanofi-Aventis, and Genzyme Canada Ltd., research support from Merck Frosst Canada, and unrestricted educational grants from Procter & Gamble Pharmaceuticals, Novartis Pharmaceuticals Canada, and Amgen, Canada. Dr. Hans receives honoraria from Perceptive Informatics and Cardea Technology and research support from Hologic, Med-imaps, GE Healthcare, and DAL Foundation. He has stock options or a patent relationship with Synarc, Ascendys, and Med-Imaps.
Commentary
On Lumpers and Splitters: The FRAX Debate Continues
Article first published online: 28 SEP 2009
DOI: 10.1359/jbmr.090902
Copyright © 2009 ASBMR
How to Cite
Leslie, W. D. and Hans, D. (2009), On Lumpers and Splitters: The FRAX Debate Continues. Journal of Bone and Mineral Research, 24: 1789–1792. doi: 10.1359/jbmr.090902
Publication History
- Issue published online: 4 DEC 2009
“The stone age was marked by man's clever use of crude tools; the information age, to date, has been marked by man's crude use of clever tools.” Anonymous
Social anthropologists have long recognized a fundamental human tendency to dichotomize. This can be said for the response to the World Health Organization Collaborating Center's Fracture Risk Assessment tool (FRAX), which is barely 1 yr old.(1) FRAX enthusiasts and FRAX detractors are easy to identify, and few remain neutral. Although the principles underlying the development of FRAX are appealing, the actual gain in fracture prediction from using this multivariable instrument over simpler instruments (possibly BMD alone) is still not clear.
The ability to accurately gauge fracture risk is crucial in identifying cost-effective thresholds for intervention.(2,3) Working with population-based cohort data from Europe, North America, Asia, and Australia, the WHO Collaborating Center identified seven clinical risk factors (prior fragility fracture, a parental history of hip fracture, smoking, use of systemic corticosteroids, excess alcohol intake, body mass index, and rheumatoid arthritis), which, in addition to age and sex, contribute to fracture risk independently of BMD.(3,4) Kanis et al.(5) have reported that the inclusion of clinical risk factors with BMD increases the average gradient of risk for osteoporotic fracture prediction.
The principal steps leading to adoption of a clinical prediction tool such as FRAX have recently been reviewed(6–9) and can be briefly summarized as follows: (1) development: identifying the important predictors, assigning relative weights to each predictor, and estimating the model's predictive performance through discrimination (the ability of a model to distinguish those with from those without the outcome of interest) and calibration (the degree of correspondence between predicted and observed probabilities); (2) validation: testing the model's predictive performance in new participants (ideally with a different case mix or using slightly different definitions and measurements of predictors and outcomes); and (3) clinical impact: verifying that use of the prognostic model by practicing doctors truly improves their decision making and ultimately patient outcome.
A model's performance is likely to be overestimated when it is developed and assessed on the same dataset. This is known as optimism and is the bias caused by overfitting of the data.(10) Therefore, it is important to look for additional independent cohorts in which the predictions from FRAX can be objectively tested. To their credit, the FRAX developers conducted independent validation studies in multiple large cohorts that were not used in the derivation process. For the nine validation cohorts with available data, eight showed average gradients of risk lower than in the original derivation cohorts for prediction of osteoporotic fractures with BMD, and seven showed lower average gradients of risk for prediction of hip fractures with BMD.(5) On a chance basis alone, one would expect approximately one half of these gradients of risk (four or five) to exceed the gradient of risk for the original derivation cohorts and the remainder to have lower values. The fact that this is not the case suggests that there is optimism in the FRAX model, and it is instructive to determine the robustness of the model's performance when applied by new investigators.(6,7)
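The chance argument above can be made concrete with a one-sided sign test. The sketch below is illustrative only: it assumes that the nine validation cohorts are independent and that, absent optimism, each is equally likely to fall above or below the derivation value. The helper name `upper_tail` is ours and does not come from the cited analysis.

```python
from math import comb

def upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumption: 9 independent cohorts, each with a 50% chance of a lower gradient.
p_osteoporotic = upper_tail(8, 9)  # at least 8 of 9 lower (osteoporotic fractures)
p_hip = upper_tail(7, 9)           # at least 7 of 9 lower (hip fractures)

print(f"P(>=8 of 9 lower) = {p_osteoporotic:.3f}")
print(f"P(>=7 of 9 lower) = {p_hip:.3f}")
```

Under these assumptions the two probabilities are 10/512 and 46/512 (about 0.02 and 0.09). The two outcomes were assessed in the same cohorts, so these should not be read as two independent tests.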
This issue of JBMR contains a provocative study based on an analysis of vertebral fracture outcomes in untreated (placebo) women from the Fracture Intervention Trial (FIT).(11) During 3.8 yr of follow-up, 223 (7.3%) of the 3221 women had at least one new radiographic vertebral fracture. That older age, lower BMD (femoral neck), and prior vertebral fractures were major risk factors for new vertebral fractures is not surprising. That 10-yr osteoporotic fracture risk estimates from FRAX, denoted FRAX (osteoporotic), were predictive of incident vertebral fractures is also not surprising, because they include components (especially age) that are univariate predictors. What is challenging in this report is that FRAX (osteoporotic) did not improve fracture discrimination over a simpler prediction model based on age, femoral neck BMD, and baseline vertebral fracture. Similar results were recently reported from the Canadian Multicentre Osteoporosis Study (one of the FRAX derivation cohorts) using a population-based community cohort of 2761 noninstitutionalized men and women ≥50 yr of age with fracture outcome data to 5 yr (343 morphometric vertebral and 200 nonvertebral fractures).(12) A model considering age, BMD, and baseline radiographic spine fracture status provided greater predictive information for any incident fragility fracture than a model considering the WHO FRAX risk factors alone.
At first glance, these studies seem to strike a blow against FRAX. What is the point in using a more complex tool when a simpler tool will do the same job just as well? Digging deeper, however, raises questions about the design and interpretation of this new report from the FIT study. Notably, this analysis conflates radiographically confirmed fractures with self-report of fracture and global fracture risk with site-specific fracture risk,(11) clouding rather than illuminating some of the fundamental questions around FRAX and its performance:
1. FRAX (osteoporotic) was originally optimized for prediction of a composite measure of osteoporotic fractures (hip, clinical spine, forearm, humerus). Clinical spine fractures are a subset of all morphometric fractures and typically represent only a minority of major clinical fractures (approximately one quarter).(13) The definition of prior fracture used in the FRAX calculation was narrowly limited to self-reported history of prior fragility fracture, which has an average confirmation rate of 71% for all single-site fractures but only 51% for clinical spine fractures.(14) Although the prevalence of prior fragility fracture in the participants was quite high (43%), this may have included a large number of fractures that were poorly documented or unrelated to osteoporosis (e.g., ankle fractures). In contrast, the model fitted to the FIT data included an objective radiographic determination of baseline vertebral fracture, which, as the authors fairly admit, increases the risk for further fractures 4- to 5-fold, a considerably steeper gradient of risk than seen with any prior fragility fracture for predicting any osteoporotic fracture.(15,16) Therefore, FRAX was disadvantaged in the comparison because nonclinical morphometric vertebral fractures were not used in the FRAX calculation, thereby depriving FRAX of the strongest risk factor for future vertebral fractures, namely a past vertebral fracture.
2. Other variables that may have enhanced the performance of FRAX were exclusion criteria from FIT (corticosteroid use), had too low a prevalence to affect results (>2 units of alcohol/d, 2%; rheumatoid arthritis, 4%), or may be better suited for prediction of nonvertebral fractures (maternal history of hip fracture, 11%). Is it any wonder that there would be performance differences between a FRAX model optimized for all osteoporotic fractures and a FIT model optimized for vertebral fractures? Results might be very different if this study were repeated in a cohort of rheumatology patients.
3. How much better is the simple FIT model (baseline vertebral fracture plus femoral neck BMD plus age) than FRAX (osteoporotic with femoral neck BMD)? Ignoring the question of optimism in the FIT model (derived and tested within the same study population, whereas the FRAX estimates used an existing prediction model), the c-statistic (equivalent to the receiver operating characteristic [ROC] area under the curve) was 0.76 for the former and 0.71 for the latter. However, we do not know how much better FRAX would have been if baseline vertebral fracture had been included in the calculation. The authors state that “as defined previously, the FRAX 10-yr estimate of major osteoporotic fractures only includes clinical spine fractures.” This is a very strict interpretation of how to use FRAX. The University of Sheffield website indicates “a prior morphometric fracture has the same significance as any other prior fragility fracture and can be entered into the FRAX model.”
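For readers less familiar with the c-statistic, the sketch below computes it directly from its definition: the proportion of case/non-case pairs in which the case received the higher predicted risk, with ties counted as one half. The scores are made-up numbers for illustration, not FIT or FRAX data, and `c_statistic` is a hypothetical helper.

```python
from itertools import product

def c_statistic(case_scores, noncase_scores):
    """Concordance: P(random case scores higher than random non-case), ties = 0.5."""
    pairs = list(product(case_scores, noncase_scores))
    concordant = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c, n in pairs)
    return concordant / len(pairs)

# Hypothetical predicted risks (illustrative only, not from any cited cohort).
fractured = [0.30, 0.22, 0.15, 0.12]
not_fractured = [0.25, 0.14, 0.12, 0.08, 0.05]

print(round(c_statistic(fractured, not_fractured), 3))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, which gives a sense of scale for the reported difference between 0.76 and 0.71.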
Comments thus far have highlighted issues of validating fracture discrimination with these prediction models. The FIT authors briefly allude to the question of calibration, pointing out that “FRAX estimates used in this analysis are from the first release and have been shown to overestimate fracture risk in the U.S. white female population.” A model can show good discrimination but still fail as a clinical tool if it is not well calibrated to the population. Achieving good model calibration may actually be more difficult than is widely appreciated. Ten-year risk of cardiovascular disease was recently evaluated in a large UK cohort of patients from general practice using the Anderson-Framingham model (1.07 million patients with 43,990 cardiovascular events).(17,18) Although Anderson-Framingham gave good discrimination (ROC area under the curve 0.737 for men and 0.761 for women), it overpredicted risk by 23% overall. This degree of miscalibration has large implications in terms of overtreatment (if calibrated too high) or undertreatment (if calibrated too low). Although FRAX can be calibrated to a population/country of interest based on local fracture data, independent (external) validation of the predictions is still prudent.
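The distinction between discrimination and calibration can be illustrated with a small sketch. It uses invented risks and outcomes, not data from any cited cohort, and the helpers `calibration_ratio` and `ranking` are hypothetical. Treating "overpredicted by 23%" as every predicted risk being scaled by 1.23 is our simplifying assumption.

```python
def calibration_ratio(predicted_risks, outcomes):
    """Expected events (sum of predicted risks) divided by observed events."""
    return sum(predicted_risks) / sum(outcomes)

def ranking(scores):
    """Indices of patients ordered from lowest to highest predicted risk."""
    return sorted(range(len(scores)), key=lambda i: scores[i])

# Invented cohort (illustrative only): predicted risks and outcomes (1 = event).
risks = [0.05, 0.05, 0.10, 0.10, 0.10, 0.20, 0.20, 0.30, 0.40, 0.50]
outcomes = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]

# Assumption: a 23% overprediction modelled as every risk scaled by 1.23.
inflated = [r * 1.23 for r in risks]

print(round(calibration_ratio(risks, outcomes), 2))     # well calibrated
print(round(calibration_ratio(inflated, outcomes), 2))  # overpredicts
print(ranking(risks) == ranking(inflated))              # patient ordering unchanged
```

Because the scaling preserves the ordering of patients, any rank-based measure of discrimination is unaffected, while the expected-to-observed ratio moves from 1.0 to 1.23 in this toy example.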
The value of the current JBMR report lies in highlighting these important issues, and for that the FIT investigators should be applauded. As FRAX evolves (as it has already done and will undoubtedly continue to do), these are exactly the kinds of questions and analyses that need to be undertaken in more cohorts of varying case-mix. Many would like to see a FRAX model that is more responsive to vertebral fracture risk (possibly but not necessarily including spine BMD). This is particularly important when examining the clinical armamentarium for osteoporotic fracture prevention, because the vertebral antifracture effect is consistently greater than the nonvertebral antifracture effect.(19) In recent meta-analyses from Wells et al.,(20,21) the nonvertebral fracture risk reduction from aminobisphosphonate therapy was 16–20% versus 37–45% for vertebral fractures. The value in risk prediction is to identify individuals in whom intervention can reduce that risk, because there is no value to the individual in identifying them as high risk for a condition that has no effective treatment.
Ultimately FRAX is a tool, and like all tools, must be wielded skillfully to achieve the best results. It is unclear whether its performance in the FIT study reflects a weakness of FRAX or simply failure to use the tool to its optimum. Pushing the analogy further, any craftsman would recognize that a single tool cannot do all jobs equally well; you must pick the right tool for the right job. If the job in question is assessing radiographic vertebral fracture risk, the FRAX (osteoporotic) tool is not optimal. Conceivably, FRAX could be improved for predicting vertebral fracture risk. The exact structure of this model could be quite different from FRAX (osteoporotic). For example, it would likely include spine BMD (rather than femoral neck BMD)(22) and prior vertebral fracture (rather than prior nonvertebral fracture). Perhaps new clinical risk factors would emerge as important, such as parental history of vertebral fracture (assuming this could be accurately ascertained, which is questionable) or other quantitative measures of bone strength (based on bone geometry, turnover, or microstructure). Regardless, FRAX (spine) would presumably outperform FRAX (osteoporotic) for vertebral fracture prediction just as FRAX (hip) is presumed to be optimized for hip fractures. Will FRAX (forearm) and FRAX (humerus) follow? Even assuming that a family of FRAX tools could be developed, each optimized for a specific fracture site, how would this benefit the clinician? If experts are confused about how to interpret site discordance in BMD measurements, imagine the confusion arising from a multitude of site-specific fracture prediction tools when they disagree (as they surely would).
Enter the lumpers and the splitters, a concept that was first introduced by the medical geneticist Victor McKusick in 1969.(23) A “lumper” takes a gestalt view, looks for broad similarities, and is generally satisfied when a classification system works most of the time. In contrast, a “splitter” looks for precise definitions and seeks to create new categories and classification systems to deal with the exceptions. In the context of FRAX, the “lumper” seeks one risk assessment system that works for the majority. FRAX (osteoporotic) would be seen as a good global risk assessment system: not perfect, but good enough. Whether splitting off radiographic vertebral fractures and creating a vertebral fracture prediction system, FRAX (spine), is beneficial or a distraction becomes a matter of personal taste. Will it outperform FRAX (osteoporotic) in this narrow domain? Likely. Will the public be better off with site-specific fracture assessment systems? We are skeptical and side with the “lumpers” on this.
REFERENCES
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20 (2008) Alendronate for the primary and secondary prevention of osteoporotic fractures in postmenopausal women. Cochrane Database Syst Rev CD001155.
- 21 (2008) Risedronate for the primary and secondary prevention of osteoporotic fractures in postmenopausal women. Cochrane Database Syst Rev CD004523.
- 22
- 23

