Although obesity has been widely believed to be protective against fracture, recent studies have challenged this perception. In an audit of postmenopausal women presenting with low trauma fracture to a Fracture Liaison Service, the prevalence of obesity was 28%,1 whereas in the Global Study of Osteoporosis in Women (GLOW) the incidence of low trauma fractures was similar in obese and nonobese postmenopausal women.2 The distribution of fracture sites differs between obese and nonobese women, with fractures of the leg, ankle, and humerus being reported more commonly in obese women, whereas fractures of the hip, wrist, and pelvis are less common.2–6
FRAX is a computer-based algorithm that is widely used in clinical practice to calculate the 10-year probability of major osteoporotic fractures (hip, clinical spine, humerus, or wrist fracture) and hip fractures.7, 8 Clinical risk factors (age, body mass index [BMI], previous fracture, parental history of hip fracture, glucocorticoid therapy, smoking, alcohol intake, rheumatoid arthritis, and secondary causes of osteoporosis) are used alone or with hip bone mineral density (BMD) to predict 10-year fracture probability. A number of studies have investigated its use in populations of postmenopausal women and have generally shown moderately good discrimination between fracture and nonfracture cases and reasonably close agreement between predicted and observed fracture frequency, particularly for hip fracture.9–20 However, its utility in fracture prediction in obese women has not been reported; higher BMI, BMD, and a greater frequency of falls in obese women with fracture1, 2 might be expected to affect its performance. In addition, the prevalence of obesity in the populations used to develop FRAX was 18.3%, considerably lower than the current prevalence in women of 34% and 23% in the United States and Europe, respectively.21, 22 In this study, we have compared the prediction of low-trauma clinical fractures using FRAX with and without BMD in obese and nonobese older postmenopausal women.
Materials and Methods
For this analysis, we used data from the Study of Osteoporotic Fractures (SOF). SOF is a multicenter study of risk factors for fracture in women aged 65 years and older. The participants were community-based ambulatory women recruited between September 1986 and October 1988 from population-based listings at four clinical centers in Portland, OR; Minneapolis, MN; Baltimore, MD; and the Monongahela Valley near Pittsburgh, PA.23 Women unable to walk without assistance and women with bilateral hip replacements were excluded. In addition, black women were excluded because of their low incidence of hip fracture. All participants provided informed consent, and the protocol was approved by the institutional review boards of the participating sites.
Baseline examinations took place from 1986 to 1988. From January 1989 to December 1990, all participants were invited to undergo a second evaluation. A total of 9704 women attended the first visit, and 8098 women attended the second visit. A total of 1241 women provided questionnaire data by mail and telephone without attending the clinic. For the current analysis, we used the second visit for baseline data because measurement of hip BMD was first made at this time.
The second visit included a self-administered questionnaire, questions administered by an interviewer, and BMD measurement.
The self-reported questionnaire included demographics and risk factors for fractures including age, smoking habits, alcohol consumption, family history of fractures, and personal history of fracture after the age of 50 years. Women were also asked about medical conditions such as diabetes mellitus, rheumatoid arthritis, and glucocorticoid use.
Weight was measured in indoor clothing with shoes removed using a balance beam scale, and height was measured using a standard held-expiration technique with a wall-mounted Harpenden stadiometer (Holtain Ltd., Dyfed, UK). BMI was calculated as weight in kilograms divided by the square of height in meters. In addition, waist and hip circumference were measured and waist to hip ratio was calculated.
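The BMI calculation can be sketched as follows. This is a minimal illustration, not the study's code: the function names and the example participant are hypothetical, and the 30 kg/m2 cut-off is the obesity definition used later in the analysis.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def is_obese(bmi_value: float, cutoff: float = 30.0) -> bool:
    """Obesity as defined in this analysis: BMI >= 30 kg/m2."""
    return bmi_value >= cutoff


# Hypothetical participant (not SOF data): 80 kg and 1.60 m.
example_bmi = bmi(80.0, 1.60)
print(round(example_bmi, 2), is_obese(example_bmi))
```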
BMD of the proximal femur (total hip and its subregions) was measured between 1988 and 1990 (visit 2) using Hologic QDR 1000 scanners (Hologic, Bedford, MA). The coefficient of variation was 1.2% for both sites.24, 25 This was performed on 84% (7959) of the initial cohort.
WHO and FRAX 10-year absolute fracture risk
Ten-year probability of hip fracture and major osteoporotic fracture (hip, clinical spine, wrist, or humerus) was calculated for each SOF participant using the FRAX algorithm for US white women (FRAX version 3.0). For the calculated probabilities, the risk factors included were age, BMI, parental history of hip fracture, patient history of previous fracture, presence of rheumatoid arthritis, smoking status, consumption of three or more alcoholic beverages per day, and current use of glucocorticoids.26, 27 The FRAX 10-year probabilities were estimated both with and without femoral neck (FN) BMD.
Ascertainment of fractures
Women were contacted every 4 months to determine their fracture status; more than 98% of these contacts were completed. All reported fractures were confirmed by radiographic report. Women who reported fractures were interviewed to determine the circumstances. Pathologic fractures (including periprosthetic) and fractures secondary to extreme trauma were excluded. Incident fracture outcomes include hip fracture, major osteoporotic fracture, and any clinical fracture. As FRAX generates 10-year probabilities, for this analysis the follow-up was truncated to 10 years in women with ≥10 years follow-up.
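The truncation of follow-up at 10 years might be expressed as in the sketch below. It assumes that a fracture counts as an outcome only if it occurred within the truncated window, which is our reading of the text rather than a stated rule; the function and variable names are hypothetical.

```python
HORIZON_YEARS = 10.0  # FRAX probabilities refer to a 10-year window


def truncate_follow_up(follow_up_years, fracture_time_years=None,
                       horizon=HORIZON_YEARS):
    """Return (observed_years, had_fracture) within the horizon.

    fracture_time_years is None when no fracture was recorded.
    Assumption: fractures after the horizon are not counted as outcomes.
    """
    observed = min(follow_up_years, horizon)
    had_fracture = (fracture_time_years is not None
                    and fracture_time_years <= observed)
    return observed, had_fracture


# Hypothetical participants (not SOF data).
print(truncate_follow_up(12.5, 11.0))  # fracture after the 10-year window
print(truncate_follow_up(12.5, 4.0))   # fracture within the window
print(truncate_follow_up(7.0))         # shorter follow-up, no fracture
```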
Statistical analysis
Only women with data on risk factors for the calculation of FRAX 10-year probabilities and FN BMD were included in this analysis. Obese and nonobese women were defined as women with BMI ≥30 kg/m2 and <30 kg/m2, respectively. Receiver operating characteristic (ROC) curve analysis was used to evaluate the ability of FRAX to predict fractures in obese and nonobese women. Further analysis was also performed to assess model calibration (ie, how closely the observed rates agree with the predicted risks). This involved comparing the predicted frequencies of fractures with overall observed frequencies of fractures in both groups (obese and nonobese) and stratifying the data into categories defined by quartiles of FRAX 10-year probabilities and comparing the observed and predicted counts within each risk category. Ratios of observed to predicted counts were calculated to aid comparisons. The utility of the FRAX models in clinical practice was also assessed by calculating estimates of sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV) according to the National Osteoporosis Foundation (NOF) recommended intervention thresholds of 3% for hip fracture and 20% for major osteoporotic fracture.28 In addition, the Hosmer-Lemeshow goodness-of-fit statistic was calculated for logistic regression models including all risk factors used to calculate FRAX 10-year probabilities. A p value <0.05 indicates a lack of good fit for the model.
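The threshold-based measures and the observed-to-predicted ratios described above can be illustrated with the following sketch. It is not the SPSS procedure used in the study: the function names are hypothetical, probabilities are expressed as fractions (0.03 for 3%), the predicted count is assumed to be the sum of individual probabilities, and the example data are invented for illustration.

```python
def threshold_metrics(probs, events, threshold):
    """Sensitivity, specificity, PPV and NPV when high risk means prob >= threshold."""
    tp = fp = tn = fn = 0
    for p, e in zip(probs, events):
        high = p >= threshold
        if high and e:
            tp += 1
        elif high:
            fp += 1
        elif e:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def observed_to_predicted(probs, events):
    """Observed fracture count divided by the sum of predicted probabilities."""
    predicted = sum(probs)
    return sum(events) / predicted if predicted else float("nan")


def ratios_by_quartile(probs, events):
    """Observed-to-predicted ratio within each quartile of predicted probability."""
    pairs = sorted(zip(probs, events))
    n = len(pairs)
    bounds = [round(i * n / 4) for i in range(5)]
    return [
        observed_to_predicted([p for p, _ in pairs[lo:hi]],
                              [e for _, e in pairs[lo:hi]])
        for lo, hi in zip(bounds, bounds[1:])
    ]


# Invented example data: predicted 10-year hip fracture probabilities and outcomes.
probs = [0.01, 0.02, 0.04, 0.05, 0.10, 0.20, 0.02, 0.08]
events = [0, 0, 0, 1, 0, 1, 0, 1]
print(threshold_metrics(probs, events, 0.03))  # NOF hip fracture threshold of 3%
print(observed_to_predicted(probs, events))
print(ratios_by_quartile(probs, events))
```

A ratio above 1 in a risk category indicates more observed fractures than predicted (underprediction), and a ratio below 1 indicates overprediction.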
A decision curve analysis method was performed to evaluate the clinical usefulness of the FRAX models.29–31 Decision curves were constructed for the FRAX model with and without BMD for the outcomes of (1) hip fracture, (2) major osteoporotic fracture, and (3) all clinical fractures. This consisted of plotting the net benefit of the FRAX models compared with the strategy of treating no women at various threshold probabilities. The threshold probabilities are the predicted risks of fracture where the clinical consequences are uncertain (ie, there is uncertainty about whether the women would be classified as a high risk of fracture or not).
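For readers unfamiliar with decision curves, the net benefit at a given threshold probability can be sketched as below. The formula is the standard decision-curve definition (true positives minus false positives weighted by the odds of the threshold, per person); it is not spelled out in this paper, and the function names and example data are hypothetical. The analysis in the study was run in R, not with this code.

```python
def net_benefit(probs, events, threshold):
    """Net benefit of treating women whose predicted risk is >= threshold.

    Standard definition: TP/n - FP/n * pt / (1 - pt), where pt is the
    threshold probability. Treating no women has a net benefit of 0.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(probs)
    tp = sum(1 for p, e in zip(probs, events) if p >= threshold and e)
    fp = sum(1 for p, e in zip(probs, events) if p >= threshold and not e)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(events, threshold):
    """Net benefit of treating every woman regardless of predicted risk."""
    prevalence = sum(events) / len(events)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Invented example data: predicted probabilities (as fractions) and outcomes.
probs = [0.01, 0.02, 0.04, 0.05, 0.10, 0.20, 0.02, 0.08]
events = [0, 0, 0, 1, 0, 1, 0, 1]
for pt in (0.03, 0.10, 0.20):
    print(pt, net_benefit(probs, events, pt), net_benefit_treat_all(events, pt))
```

Plotting net benefit against a range of threshold probabilities, alongside the treat-none line at zero, gives the decision curve.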
Differences were considered significant when the two-tailed p value was <0.05. The decision curve analysis was performed using R statistics software, version 2.14.1 (R Foundation for Statistical Computing, Vienna, Austria). All other analyses were performed using version 18 of the SPSS statistics package for Windows (SPSS Inc., Chicago, IL, USA).
Results
Data for FRAX scores and FN BMD were available in 6049 women. Of those, 18.5% were obese. Incident clinical fractures occurred during the follow-up period in 26.9% and 32.7% of obese and nonobese women, respectively. Nonobese women with fracture had a significantly lower FN BMD than obese women (Table 1). The mean (SD) duration of follow-up was 9.03 (2.22) years (obese women 9.12 [2.09], nonobese women 9.0 [2.26] years, p = 0.121); 10 years follow-up was available in 72.5% of women. During the follow-up period, 252 (22.5%) of obese women died versus 1215 (24.6%) of nonobese women (p = 0.142).
Table 1. Characteristics of Obese and Nonobese Women With Incident Fracture
Columns: Obese women (n = 285); Nonobese women (n = 1509)
Rows: Age (years), mean (SD); Oral glucocorticoid therapy; Parental history of hip fracture; Alcohol intake ≥3 units/day; Femoral neck BMD (g/cm2), mean (SD)
FRAX-derived probabilities in women with incident hip or major osteoporotic fractures were significantly lower in obese than in nonobese women (without BMD: 5.8% versus 11.4% for hip and 17.6% versus 23.6% for major osteoporotic fracture, p < 0.0001; with BMD: 7.1% versus 10.9% for hip and 18.2% versus 23.3% for major osteoporotic fracture, p < 0.0001). Nevertheless, ROC analysis showed no significant differences in the ability of FRAX models with or without BMD to predict fractures in obese and nonobese women (Table 2).
Table 2. Comparison of the Area Under the Curve (AUC [95% Confidence Interval]) From Receiver Operating Characteristic (ROC) Curve Analysis for the FRAX Algorithm Between Obese and Nonobese Women
Models: FRAX algorithm including BMD; FRAX algorithm not including BMD
Rows (for each model): Women with hip fractures; Women with any major osteoporotic fracture (hip, clinical vertebral, wrist, and humerus); Women with any clinical fracture (nonvertebral and clinical vertebral)
The sensitivity and specificity of the FRAX scores according to the NOF risk thresholds are displayed in Table 3. The sensitivity for hip fractures was lower in obese women, but the specificity was higher when compared with nonobese women. There was a lower sensitivity in both obese and nonobese women for major osteoporotic fractures, particularly in obese women, but the specificity was somewhat better in obese women. Interestingly, the NPVs and PPVs for hip fracture were almost identical in obese and nonobese women and closely similar for major osteoporotic fracture. In the stratified comparison of observed and expected counts, the FRAX performance for hip fracture was not as good in obese as in nonobese women in the lower two risk quartiles (Table 4). Both models performed well for major osteoporotic fractures in obese and nonobese women, particularly in the three highest quartiles of risk.
Table 3. Comparison of Expected and Observed Frequencies of Fractures According to NOF Thresholds
FRAX + BMD
FRAX + BMD
NOF = National Osteoporosis Foundation; MOP = major osteoporotic fracture; BMD = bone mineral density; PPV = positive predictive value; NPV = negative predictive value.
The figures are shown as predicted counts/observed counts.
For hip fracture, high risk is classified as a 10-year fracture probability ≥3%.
For MOP fracture, high risk is classified as a 10-year fracture probability ≥20%.
Table 4. Comparison of Expected and Observed Frequencies of Fractures in Obese and Nonobese Women Using Quartiles of the Predicted Probabilities
Models: FRAX using BMD; FRAX without BMD
Rows (for each model): Hip fracture (%); Major osteoporotic fracture (%)
Observed counts are shown as observed counts/total number of patients in the risk category.
The Hosmer-Lemeshow goodness-of-fit analysis provided insufficient evidence of poor fit for the logistic regression models, with two exceptions: the model for hip fracture not including BMD in obese women (p = 0.03) and the model for major osteoporotic fracture not including BMD in nonobese women (p = 0.02).
Fig. 1 shows the decision analysis curves for FRAX with and without BMD in obese and nonobese women for hip fracture, major osteoporotic fracture, and all clinical fractures. For hip fracture, the FRAX models were useful in women for threshold probabilities in the range of 4% to 15%. For nonobese women, there was very little difference between the FRAX models, but for obese women, the net benefit of the FRAX model including BMD was clearly superior. For major osteoporotic fracture, the FRAX models were useful in women with predicted probabilities in the range of 10% to 30%, with very little difference between the net benefit of the two FRAX models in either obese or nonobese women. In the case of all clinical fractures, the FRAX models were not useful at threshold probabilities below around 30%, with only a small net benefit at probabilities in the range of 30% to 40%.
Discussion
In this study, we used several approaches to compare the performance of FRAX with and without BMD in predicting fracture in obese and nonobese women. We hypothesized that because of higher BMI and BMD in obese women, the predictive value of FRAX, with or without BMD, might be inferior to that in nonobese women. However, ROC analysis and calibration did not reveal any clear differences in the utility of the models between obese and nonobese women, although decision curve analysis indicated a greater net benefit in nonobese than in obese women.
Differences in the results obtained by the various approaches used in this study are not unexpected because they provide different information about prediction models. ROC analysis is widely used in evaluating prediction models as a test of discrimination, with comparison of the area under the curve (AUC) expressed as the c statistic or c index. This compares how well models separate cases and noncases but does not provide information about whether a case has an accurate risk probability or about the value of the model in clinical practice. On its own, the c statistic does not consider the range of cut-point values used to compute the ROC curve or the clinical usefulness of these cut points. In the current study, AUC values ranged from 0.66 to 0.76 for hip fracture and 0.63 to 0.70 for major osteoporotic fracture, indicating only modest discrimination between fracture and nonfracture cases in both nonobese and obese women. Furthermore, the substantial differences in sensitivity and specificity shown in Table 3 were not captured by AUC data. Calibration provides information about the agreement between predicted and observed risk but not about the clinical relevance of miscalibration. A test of statistical significance, the Hosmer-Lemeshow test, can be applied, but it tests whether there is adequate evidence for miscalibration rather than whether calibration is good. Classification and reclassification are useful in the comparison of two models in the same group of patients but not in assessing the utility of an individual model in clinical practice. Decision analytic methods are based on the different weighting of false positives and false negatives and aim to determine the net benefit of implementing prediction models in clinical practice.29–31
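The point that the c statistic measures discrimination but says nothing about the accuracy of individual risk probabilities can be illustrated with a small sketch. The function name and data are hypothetical; the c statistic is computed here as the proportion of case/noncase pairs in which the case has the higher predicted risk, with ties counted as one half.

```python
def c_statistic(probs, events):
    """Probability that a randomly chosen case has a higher predicted risk
    than a randomly chosen noncase (ties count one half)."""
    cases = [p for p, e in zip(probs, events) if e]
    noncases = [p for p, e in zip(probs, events) if not e]
    if not cases or not noncases:
        raise ValueError("need at least one case and one noncase")
    concordant = 0.0
    for c in cases:
        for nc in noncases:
            if c > nc:
                concordant += 1.0
            elif c == nc:
                concordant += 0.5
    return concordant / (len(cases) * len(noncases))


# Invented example data: predicted probabilities (as fractions) and outcomes.
probs = [0.01, 0.02, 0.04, 0.05, 0.10, 0.20, 0.02, 0.08]
events = [0, 0, 0, 1, 0, 1, 0, 1]

# Halving every probability changes calibration but not the ranking of women,
# so the c statistic is unchanged.
halved = [p / 2 for p in probs]
print(c_statistic(probs, events), c_statistic(halved, events))
```

Because any order-preserving change to the predicted risks leaves the c statistic untouched, a model can discriminate equally well in two groups while systematically under- or overpredicting risk in one of them.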
Comparison of the performance of FRAX with and without BMD in the different analyses did not reveal any consistent differences in either obese or nonobese women. ROC analysis indicated slightly better discrimination using FRAX + BMD rather than FRAX alone, but the calibration data showed a closer agreement between predicted and observed frequencies of fractures when FRAX without BMD was used. Decision curve analysis demonstrated a higher net benefit associated with use of FRAX with BMD than FRAX alone in obese women for prediction of hip fracture at clinically relevant thresholds and a smaller advantage of FRAX with BMD for prediction of major osteoporotic fracture in both obese and nonobese women. We have recently reported that obese women with fracture have significantly lower BMD than obese women without fracture, despite closely similar BMI in the two groups, indicating that BMI may be a poor surrogate for BMD in obese women with fracture.32
The finding that both discrimination and calibration measures were generally worse for all clinical fractures than for hip and major osteoporotic fractures is not unexpected, given that FRAX was developed to predict hip, wrist, humerus, and spine fractures only. The results of the decision curve analysis also indicated that the FRAX models were not useful in predicting all clinical fractures over clinically relevant thresholds. However, although the sites at which fracture is predicted are clearly stated on the FRAX website, it may be wrongly assumed by some that the results can be extrapolated to low-trauma fracture at any site. Our results suggest that this is not the case, although we did not have high enough numbers of the other fractures to analyze individual sites separately.
Strengths of our study include the prospective design and inclusion of community-based, ambulatory women. In addition, all fractures were adjudicated, and BMD measurements were subjected to rigorous quality control. Although SOF is not a population-based sample, characteristics of the SOF participants are similar to, or healthier than, those of the population-based NHANES III. The prevalence of obesity in SOF (18.3%) was somewhat lower than that of white women in the general population (22.4%), although closely similar to that of women and men studied in the cohorts from which FRAX is derived (18%).9, 33 Limitations are that only white postmenopausal women aged 65 years and older were included in this analysis, and it remains to be established whether our results can be generalized to men, younger women, and individuals of different races. Secondly, morphometric vertebral fractures were not included in the analyses. Thirdly, not all women in this study had 10 years of follow-up. Finally, fracture probability computed by FRAX incorporates death hazard, whereas the estimation of fracture incidence in the SOF cohort did not take into account the fracture status of women who did not survive. However, the mean duration of follow-up and the mortality were similar in obese and nonobese women.
Overall, our results indicate that FRAX with and without BMD is of similar value in predicting hip and major osteoporotic fractures in obese and nonobese women. However, the net benefit values for both FRAX models over clinically relevant thresholds are lower in obese women than in nonobese women, most likely in part because of the lower number of true positives for obese women relative to the total sample size. The lower fracture probabilities in obese women make them more likely to be below a given intervention threshold, and the lower sensitivities suggest that these women are less likely to receive treatment, a contention supported by our recent finding of significantly lower rates of treatment in obese women than in nonobese women with fracture.2 However, the positive and negative predictive values, which are more clinically meaningful quantities, are very similar between obese and nonobese women. The negative predictive values for obese women are at least as high as the corresponding values for nonobese women, which means that obese women below the NOF thresholds are no more likely to have a fracture than nonobese women.
The significantly lower rates of treatment in obese women than in nonobese women may reflect the perception that obese women do not suffer “osteoporotic” fractures because of their higher BMD.2 However, the lower BMD in obese women with fractures when compared with women of a similar BMI without fracture indicates that BMD may be inappropriately low in those who fracture. The better performance of FRAX with than without BMD in the decision curve analysis, particularly for hip fracture, supports this contention. Whether bone-protective therapy is effective in obese individuals at increased risk of fracture has not been rigorously tested, and further randomized trials are required to evaluate efficacy of treatments among obese women selected by level of BMD and/or FRAX in view of the growing prevalence of obesity and the substantial contribution of obese individuals to the overall burden of fractures in the aging population.
All authors state that they have no conflicts of interest.
JEC is supported by the National Health Service, National Institute of Health Research, and the Cambridge Biomedical Research Centre. The Study of Osteoporotic Fractures (SOF) is supported by National Institutes of Health funding. The National Institute on Aging (NIA) provides support under the following grant numbers: R01 AG005407, R01 AR35582, R01 AR35583, R01 AR35584, R01 AG005394, R01 AG027574, and R01 AG027576.
Authors' roles: Study design: All the authors. Data management: MP, RAP, LL. Data interpretation: All the authors. Drafting the manuscript: MP, RAP, JC. Revising the manuscript content: All the authors. Approving the final version of the manuscript: All the authors.