Keywords:

  • BONE DENSITY;
  • BONE QUALITY;
  • QCT;
  • VERTEBRAL FRACTURE;
  • DISTAL FOREARM FRACTURE;
  • GRADIENT BOOSTING

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Methods
  5. Results
  6. Discussion
  7. Disclosures
  8. Acknowledgements
  9. References
  10. Supporting Information

Advanced bone imaging with quantitative computed tomography (QCT) has had limited success in significantly improving fracture prediction beyond standard areal bone mineral density (aBMD) measurements. Thus, we examined whether a machine learning paradigm, gradient boosting machine (GBM) modeling, which can incorporate diverse measurements of bone density and geometry from central QCT imaging and of bone microstructure from high-resolution peripheral QCT imaging, can improve fracture prediction. We studied two cohorts of postmenopausal women: 105 with and 99 without distal forearm fractures (Distal Forearm Cohort) and 40 with at least one grade 2 or 3 vertebral deformity and 78 with no vertebral fracture (Vertebral Cohort). Within each cohort, individual bone density, structure, or strength variables had areas under receiver operating characteristic curves (AUCs) ranging from 0.50 to 0.84 (median 0.61) for discriminating women with and without fracture. Using all possible variables in the GBM model, the AUCs were close to 1.0. Fracture predictions in the Vertebral Cohort using the GBM models built with the Distal Forearm Cohort had AUCs of 0.82–0.95, whereas predictions in the Distal Forearm Cohort using models built with the Vertebral Cohort had AUCs of 0.80–0.83. Attempts at capturing a comparable parametric model using the top variables from the Distal Forearm Cohort resulted in an AUC of 0.81. Relatively high AUCs for differing fracture types suggest that an underlying fracture propensity is being captured by this modeling approach. More complex modeling, such as with GBM, creates stronger fracture predictions and may allow deeper insights into information provided by advanced bone imaging techniques. © 2012 American Society for Bone and Mineral Research.


Introduction

Advanced bone imaging methodologies, such as quantitative computed tomography (QCT) and high-resolution peripheral QCT (HRpQCT), can measure numerous bone macro- and microstructural properties, along with volumetric bone mineral density (vBMD) of cortical and trabecular bone separately.1, 2 Recent publications examining the relative ability of these different measurements to assess fracture risk have focused primarily on each measurement individually, which is helpful to better understand whether certain attributes of bone can discriminate between those with and without fractures.3–11 The focus has generally been on bone imaging measurements that are better understood, such as vBMD or cortical thickness. However, there is a potential for increased predictive ability when all available measurements are used in a multivariable approach, including measurements produced by these scanners that are less understood but which also may relate to the structural and biomechanical properties of bone. Additionally, more complex modeling allows for nonlinear relationships and interactions between variables. Statistical learning is a framework used extensively in finance and industry to predict outcomes, such as the price of a stock in 6 months.12 Many of these approaches, of which gradient boosting machines (GBM) are a particular instance, focus on improved prediction by combining information from many variables that individually may not be significant but together are very informative; of less concern is the functional form of any one variable. Indeed, these methods have often been successful even when the predictors are highly related. The goal of our study was to use GBM to determine whether prediction of specific fractures can be improved by incorporating additional information available from the scanners and to assess whether the resulting models that are useful for one kind of fracture are equally robust for predicting fractures of another type. 
Moreover, we hoped to evaluate the potential usefulness of underutilized measurements available from newer bone imaging devices.

Methods

Study subjects

As previously described,13 we identified 99 postmenopausal community women ≥50 years of age who were newly diagnosed with a distal forearm (Colles') fracture in 2001 to 2008. The fracture cases were frequency matched to 105 postmenopausal controls recruited from an age-stratified random sample of Olmsted County, MN, women (Distal Forearm Cohort). None of the controls had a history of an osteoporotic fracture, that is, a hip, spine or forearm fracture that occurred after age 35 years.

Similarly, we recruited 40 postmenopausal Olmsted County women ≥50 years of age who had a moderate-to-severe vertebral fracture that was clinically diagnosed within the past 5 years.14 They were compared with 78 controls with no vertebral fracture who were recruited from the same age-stratified random sample of community women (Vertebral Cohort). The 78 controls were also controls in the Distal Forearm Cohort. Seven subjects were cases in both the Distal Forearm Cohort and in the Vertebral Cohort. Thoracic and lumbar vertebral body fractures were assessed according to the semiquantitative method15 from the QCT lateral localizer images, which have no projection distortion and a nominal resolution of 0.5 mm. Deformities were classified as mild (grade 1), moderate (grade 2), or severe (grade 3), although only the latter two groups were included in this analysis.

Women with distal forearm or vertebral fractures because of severe trauma or a specific pathological process were excluded, as was anyone who had undergone vertebroplasty or intermittent parathyroid hormone therapy. Women treated with antiresorptive drugs [bisphosphonates, hormone therapy, or selective estrogen receptor modulators (SERM)] were included, however, as these agents do not appear to greatly alter bone structure.16 Each subject at the time of study visit also underwent anthropometric assessment, which included measurement of height to the nearest 0.1 cm and weight in light clothes without shoes to the nearest 0.1 kg. Written informed consent was obtained from all subjects.

Bone density and structure measurements

Hip, forearm, and total-body areal BMD (aBMD) measurements were made by dual-energy X-ray absorptiometry (DXA) using the Lunar Prodigy system (GE Healthcare, Madison, WI, USA), and evaluated according to technical criteria from the International Society for Clinical Densitometry.17 Osteoporosis and osteopenia were defined by World Health Organization criteria,18 using femoral neck (FN) T-scores from the Lunar device. In addition to aBMD measurements, approximately 60 other bone and soft tissue parameters were available from DXA scans, as documented in the Supplemental Appendix.
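As an illustration of the T-score grouping used here, the following minimal Python sketch applies the conventional World Health Organization cut points (osteoporosis at a T-score of −2.5 or below, osteopenia between −2.5 and −1.0). The function names and the young-adult reference mean and standard deviation are hypothetical placeholders, not values taken from the Lunar device or from this study.

```python
def t_score(abmd, young_mean, young_sd):
    """T-score: aBMD expressed in young-adult reference SD units."""
    return (abmd - young_mean) / young_sd


def who_category(t):
    """Classify a T-score with the conventional WHO cut points."""
    if t <= -2.5:
        return "osteoporosis"
    if t < -1.0:
        return "osteopenia"
    return "normal"


# Hypothetical reference values (g/cm^2), for illustration only.
YOUNG_MEAN = 1.00
YOUNG_SD = 0.12

example_t = t_score(0.85, YOUNG_MEAN, YOUNG_SD)
example_group = who_category(example_t)
```

In practice the reference mean and SD would come from the densitometer's normative database rather than from constants such as these.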

FN and lumbar spine (LS) vBMD and geometry were assessed by single-energy spiral QCT using a 64-channel system (Somatom Sensation 64, Siemens Healthcare, Forchheim, Germany). In addition to total vBMD, we also measured trabecular vBMD in the central 70% of the midportion of the vertebral bodies and nondominant FN. A number of bone macrostructure measurements were derived, including total cross-sectional area, moment of inertia, section modulus, and cortical thickness, recognizing that thickness of the cortical shell is overestimated in the vertebrae because of volume averaging artifacts.19, 20 In addition to overall summaries for a midportion of the FN and vertebrae, many of the measurements were summarized within quadrants [posterior (P), superior (S), anterior (A), and inferior (I)]. Finally, we included in the analysis a large number of additional variables as defined in the Supplemental Appendix.

In lieu of detailed trabecular microstructure data for the spine or hip, we evaluated the nondominant distal radius and tibia by HRpQCT (XtremeCT, Scanco Medical AG, Brüttisellen, Switzerland). As described elsewhere,21 distal radius or tibia trabecular bone volume/total volume fraction (BV/TV) was derived from trabecular vBMD. A thickness-independent structure extraction was used to identify three-dimensional ridges (centers of the trabeculae), and trabecular number (Tb.N) was then taken as the inverse of the mean spacing of the ridges.22 Analogous with standard histomorphometry,23 trabecular thickness (Tb.Th) was calculated as BV/TV ÷ Tb.N, and trabecular separation (Tb.Sp) as (1 − BV/TV) ÷ Tb.N. Tb.Sp.SD, the standard deviation of Tb.Sp, is a measure of trabecular variation.24 Validation studies show excellent correlation (R ≥ 0.96) of these parameters with gold standard ex vivo microcomputed tomography (µCT).25 Trabecular architectural disruption was also assessed by connectivity density (Conn.D), and the structure model index (SMI) indicated whether trabeculae were more plate-like (lower values) or more rod-like (higher values). We recognize that there may be significant limitations to measuring SMI using HRpQCT. For example, MacNeil and Boyd26 found relatively poor correlations (R² = 0.075) for SMI measured by HRpQCT versus µCT. By contrast, unpublished data from Scanco Medical AG (Brüttisellen, Switzerland) suggest that SMI measured by HRpQCT correlates well with that measured using µCT. For this, 15 different 1 cm × 1 cm × 1 cm radius cubes (BV/TV range = 0.04–0.19) from human donors were scanned with µCT (20 microns) and then with HRpQCT using the standard patient protocol resolution (82 microns); SMI showed an R² of 0.94 between the results from the two scanners.
The distal radius or tibia cortex was segmented from the gray scale image with a Gaussian filter and threshold.22 Cortical vBMD and area were measured directly and the periosteal circumference calculated from the contour; cortical thickness (Ct.Th) was then calculated as Area ÷ Circumference. Excellent correlation (R = 0.98) has also been shown with Ct.Th measurements by µCT.26 Total and cortical section modulus, as well as components of these measurements, were also included. Again, we also included numerous additional variables produced by the device as defined in the Supplemental Appendix.
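The derived indices described above follow directly from the stated ratios. A minimal Python sketch, with hypothetical function names and made-up input values, is given below; it assumes BV/TV is expressed as a fraction, Tb.N in 1/mm, cortical area in mm², and periosteal circumference in mm.

```python
def trabecular_indices(bv_tv, tb_n):
    """Return (Tb.Th, Tb.Sp) in mm from BV/TV (fraction) and Tb.N (1/mm)."""
    tb_th = bv_tv / tb_n            # Tb.Th = BV/TV / Tb.N
    tb_sp = (1.0 - bv_tv) / tb_n    # Tb.Sp = (1 - BV/TV) / Tb.N
    return tb_th, tb_sp


def cortical_thickness(cortical_area, periosteal_circumference):
    """Ct.Th (mm) = cortical area (mm^2) / periosteal circumference (mm)."""
    return cortical_area / periosteal_circumference


# Made-up example values, for illustration only.
tb_th, tb_sp = trabecular_indices(bv_tv=0.12, tb_n=1.5)
ct_th = cortical_thickness(cortical_area=60.0, periosteal_circumference=75.0)
```

Note that, under these definitions, Tb.Th and Tb.Sp always sum to 1/Tb.N, which is one reason these derived variables are strongly correlated with one another.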

Statistical analysis

Before fitting the GBM models, each bone variable was age-standardized by fitting a linear regression model using all subjects in both study cohorts, extracting the residuals, then adding to that the overall mean, that is, presenting the variables as if they were all measured on 68-year-old women (overall mean age of the cohorts). We used the R package GBM27 to build separate prediction models for distal forearm and for vertebral fractures. The shrinkage penalization, which controls the rate of optimization in the model, was set at 0.01 (values closer to 1 are computationally faster but less accurate). Tree complexity controls the maximum number of interactions, and in these models, was set at three (ie, two- and three-way interactions were allowed) for the main analysis. The number of steps or terms in the fit was determined by cross-validation to prevent overfitting. Note that the GBM program utilizes a stochastic (random) component in the fitting process; stochastic methods are well established, but are normally only necessary for the most difficult maximization problems. One consequence of this is that the final solution will differ slightly from one run to another on the data. Multiple runs were done to verify that this had only minor impact on the results. Resulting models were further evaluated by exploration of functional form plots (ie, looking for indications of nonlinearity or interactions).
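The age-standardization step can be sketched as follows. This is a minimal Python illustration, not the R code used in the study: it assumes a simple linear regression of one bone variable on age, and the function name and data are hypothetical. Because a least-squares line passes through the point of means, adding the overall mean of the variable to each residual is equivalent to reporting each value as if it had been measured at the mean age.

```python
from statistics import mean


def age_standardize(values, ages):
    """Residuals from a linear regression on age, shifted by the overall mean."""
    age_bar = mean(ages)
    val_bar = mean(values)
    sxx = sum((a - age_bar) ** 2 for a in ages)
    sxy = sum((a - age_bar) * (v - val_bar) for a, v in zip(ages, values))
    slope = sxy / sxx
    intercept = val_bar - slope * age_bar
    # residual = observed - fitted; then add back the overall mean
    return [v - (intercept + slope * a) + val_bar for v, a in zip(values, ages)]


# Hypothetical aBMD values (g/cm^2) and ages (years), for illustration only.
ages = [55, 62, 68, 74, 81]
abmd = [0.95, 0.90, 0.84, 0.83, 0.76]
abmd_std = age_standardize(abmd, ages)
```

After this transformation the variable keeps its original mean but is no longer linearly related to age.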

Models were fit predicting fracture status (case versus control) using both the Distal Forearm Cohort and the Vertebral Cohort. Separate models were fit using the HRpQCT variables, the spiral QCT variables, the DXA variables, or all three sets of variables as indicated in Figure 1. All models included height, weight, body mass index (BMI), and FN aBMD because these are standard measurements used to assess fracture risk. The GBM model developed to predict distal forearm fractures in the Distal Forearm Cohort was then used to predict vertebral fracture status in the Vertebral Cohort and vice versa. As a secondary analysis, the top 10 variables from the model using all three sets of variables were used to create a logistic regression model using interactions and splines to determine whether the predictive ability of these variables could be captured in a more standard model. Logistic regression and stepwise model selection were used when attempting to build this model. As an expression of fracture discrimination, the area under a receiver operating characteristic curve (AUC) was assessed using the predictive values from the various GBM and logistic models.28 Analyses were performed using R version 2.11.0 (R Foundation for Statistical Computing, Vienna, Austria) and SAS 9.2 (SAS Institute Inc., Cary, NC).
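For readers less familiar with the AUC, it can be computed as the proportion of case–control pairs in which the case receives the higher predicted value, with ties counted as one half. The short Python sketch below illustrates this pairwise definition; the function name and the predicted probabilities are hypothetical and are not output from the models in this study.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case-control pairs ranked correctly (ties = 0.5)."""
    concordant = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical predicted fracture probabilities, for illustration only.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
example_auc = auc(cases, controls)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to complete separation of cases from controls. The same function can be applied to predictions from a model built in one cohort and evaluated in the other.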

Figure 1. Design of the analysis whereby a model predicting distal forearm fractures in the Distal Forearm Cohort was also used to predict fractures in the Vertebral Cohort and vice versa.

Results

Table 1 summarizes the AUCs when the various distal forearm fracture models were used to extract predictions for the Distal Forearm Cohort and for the Vertebral Cohort. For the Distal Forearm Cohort there were 267 bone density, structure, or strength variables available for use in the analysis (Supplemental Appendix); individually they had AUCs for predicting forearm fracture outcomes ranging from 0.50 to 0.71 (median 0.61). FN aBMD, a standard measurement used clinically, had an AUC of 0.68. When only HRpQCT variables along with FN aBMD, height, weight, and BMI were used in the GBM modeling, the AUC was substantially higher at 0.96. Similarly, when only DXA variables or only spiral QCT variables were used instead of HRpQCT variables, the resulting AUCs were quite high (0.95 and 1.00, respectively). Using all 267 variables (DXA, HRpQCT, spiral QCT) also produced an AUC of 1.00.

Table 1. Area Under the Curve (AUC) as Obtained for Each of the 10 Fracture Prediction Models, as Derived Either From the Distal Forearm Cohort or the Vertebral Cohorts and Then Applied to Both Cohorts

Endpoint used to create prediction model          Predictors used in models               Models applied to      Models applied to
                                                                                          Distal Forearm Cohort  Vertebral Cohort
Distal forearm fracture in Distal Forearm Cohort  DXA FN aBMD(a)                          0.68                   0.68
                                                  DXA variables                           0.95                   0.88
                                                  HRpQCT variables                        0.96                   0.82
                                                  Spiral QCT variables                    1.00                   0.94
                                                  HRpQCT, spiral QCT, and DXA variables   1.00                   0.95
Vertebral fracture in Vertebral Cohort            DXA FN aBMD(a)                          0.69                   0.69
                                                  DXA variables                           0.78                   0.99
                                                  HRpQCT variables                        0.80                   0.95
                                                  Spiral QCT variables                    0.80                   1.00
                                                  HRpQCT, Spiral QCT, and DXA variables   0.83                   1.00

(a) The model including only femoral neck aBMD was fit using univariate logistic regression; all other models were fit using GBM. The GBM models all included femoral neck aBMD, BMI, height, and weight in the list of candidate variables.

Because AUC values are artificially high when models and predictions are based on the same data, we also applied the prediction models developed from the Distal Forearm Cohort to data from the Vertebral Cohort. When the distal forearm fracture GBM model with only HRpQCT variables was applied to the Vertebral Cohort, the AUC for predicting vertebral fractures dropped considerably, to 0.82. The AUC was 0.88 when the distal forearm GBM model using only DXA variables was applied to the Vertebral Cohort. The distal forearm GBM model using all three sets of variables provided the best overall prediction of vertebral fractures, with an AUC of 0.95.

Table 1 also summarizes the AUCs when the various vertebral fracture models were used to generate predictions for the Vertebral Cohort and the Distal Forearm Cohort. For the Vertebral Cohort, the 267 available variables used in the GBM models had AUCs ranging from 0.50 to 0.84 (median = 0.61) for discriminating those with and without vertebral fractures when used individually. For predicting vertebral fracture, the AUC for FN aBMD was 0.69. Using the HRpQCT measurements alone in the GBM model improved the AUC markedly to 0.95. Using the DXA measurements alone in the GBM model resulted in an AUC of 0.99. Using the spiral QCT variables alone or using the combination of DXA, HRpQCT, and spiral QCT variables, the AUCs were 1.00.

As shown in Table 1, when the vertebral fracture GBM models derived from the Vertebral Cohort were applied to the Distal Forearm Cohort, the AUCs for predicting a distal forearm fracture ranged from 0.78 to 0.83.

Figure 2 shows the predicted probability of a distal forearm fracture for subjects, grouped by the traditional FN aBMD T-score classification (osteoporotic, osteopenic, normal). The predictions come from the GBM model using the HRpQCT variables plus BMI, height, weight, and FN aBMD as applied to the Distal Forearm Cohort. Higher probabilities of fracture were observed for those subjects who experienced a distal forearm fracture and, as might be expected, the subset of osteoporotic subjects (as defined by FN aBMD T-score) had the highest predicted probability of fracture. Similarly, women defined as osteoporotic who had not yet developed a fracture were identified as being at higher risk for fracture compared with women who were defined as osteopenic or having normal FN aBMD but had not yet developed a fracture. Interestingly, women with normal FN aBMD who had experienced a distal forearm fracture were clearly identified as being at high risk for fracture and were distinguished from women with normal FN aBMD but no forearm fracture. Based on the fracture predictions derived from the GBM model using the spiral QCT variables plus BMI, height, weight, and FN aBMD, and applied to the Vertebral Cohort, Figure 3 shows an even stronger separation between vertebral fracture and nonfractured subjects, irrespective of FN aBMD T-score group.

Figure 2. Boxplots of the probability of a distal forearm fracture as predicted by the model fit using the Distal Forearm Cohort using the HRpQCT variables, plus BMI, height, weight, and femoral neck aBMD. The box boundaries show the 25th and 75th percentiles of the values and the middle line is drawn at the median value.

Figure 3. Boxplots of the probability of a moderate-to-severe vertebral fracture as predicted by the model fit using the Vertebral Cohort using the spiral QCT variables, plus BMI, height, weight, and femoral neck aBMD.

Table 2 lists the top twenty variables included in each of the four GBM models fit using the Distal Forearm Cohort. Given the stochastic approach used in this modeling, the variables chosen differed somewhat each time the model was fit; however, the predictive ability was consistent when the modeling process was repeated 100 times. Additionally, for the model fit using the HRpQCT variables, the tibia SMI was consistently the top variable listed when the models were run multiple times, while radius cortical density and SMI were consistently among the top variables. Less familiar variables such as Radius Imin/Cmin (mm3), the radius total section modulus relative to the larger main axis of inertia, and Radius Imax (mm4), also appeared in the top 20 variables. When the models were fit using only these 20 variables, the resulting AUC values were quite similar to those using all variables, due in part to the strong correlation between all of these measurements. When only the 9 top variables were used, the AUC dropped slightly to 0.94, suggesting that there may be a minimum number of variables necessary to create these models. When these reduced distal forearm models were applied to the Vertebral Cohort, the AUC similarly dropped slightly to 0.80. When all HRpQCT, spiral QCT, and DXA variables were included in the modeling process, the spiral QCT variables dominated the top 20 list.

Table 2. Top 20 Variables From the GBM Models Predicting Distal Forearm Fractures That Were Fit Using the Distal Forearm Cohort

67 HRpQCT variables                        144 Spiral QCT variables        62 DXA variables               267 HRpQCT, spiral QCT, and DXA variables
Variable               Relative influence  Variable    Relative influence  Variable   Relative influence  Source of variable  Variable    Relative influence
smi_tib                10.7                vmncdist2   5.5                 duubmc     8.7                 SpiralCT            vcentx      5.7
cdens_rad              6.2                 vcentx      5.3                 duubmd     7.2                 SpiralCT            vmncdist2   4.7
smi_rad                6.2                 vsdcdist3   4.5                 ftarea     4.8                 SpiralCT            vsdcdist3   4.2
weight                 4.4                 vtmaxbmdv   4.1                 fhipbmd    4.3                 HRpQCT              smi_tib     3.7
d100_tib               4.0                 nlength     3.8                 du13area   4.2                 SpiralCT            vtmaxbmdv   3.0
bmi                    3.8                 vsdcdist1   3.6                 fnbmd      4.0                 SpiralCT            vsdcth2     2.5
fnbmd                  3.5                 vwtcentx    3.4                 fhiparea   3.8                 SpiralCT            nsdcdist2   2.4
dtrab_tib              3.2                 vsdcth2     3.1                 blegsbmd   3.5                 SpiralCT            vwtcentx    2.4
cdens_tib              2.8                 nsdcdist2   3.0                 blegsfat   2.5                 SpiralCT            nlength     2.3
moi_cort_mn_tib        2.6                 vmndist2    2.8                 barmsarea  2.5                 SpiralCT            ncstdbmdv   2.2
ttbn_rad               2.5                 ncstdbmdv   2.6                 fwbmd      2.4                 HRpQCT              smi_rad     2.2
ttbsp_tib              2.4                 fnbmd       2.0                 bodybmc    2.3                 SpiralCT            vsdcdist1   2.1
ttbth_rad              2.3                 nhentr      1.8                 du13bmd    2.2                 SpiralCT            nsdcdist3   1.9
connd_tib              2.3                 nsdcdist3   1.8                 ftbmd      2.2                 SpiralCT            vmndist2    1.9
ttb1_nsd_rad           2.1                 ncbmdv      1.7                 dulnaarea  2.2                 SpiralCT            vcminbmdv   1.5
imin_cmin_tot_mn_rad   2.1                 nctmaxbndv  1.7                 bodybmd    2.0                 DXA                 duubmc      1.4
imax_tot_mn_rad        2.0                 ncminbmdv   1.5                 dulnabmd   1.8                 DXA                 duubmd      1.3
ttbn_tib               2.0                 vsdcdist4   1.5                 drubmd     1.8                 SpiralCT            vsdcdist2   1.2
imax_cmax_tot_mn_tib   1.8                 nctstdbmdv  1.4                 barmsfat   1.7                 SpiralCT            vmncdist4   1.2
ctpm_rad               1.7                 vmncdist    1.4                 btotln     1.7                 SpiralCT            vctminbmdv  1.1
AUC                    0.96                            1.0                            0.95                                                1.0

All models included femoral neck aBMD (fnbmd), BMI, height, and weight in the list of candidate variables; these four variables are included in the variable count. See Supplemental Appendix for a definition of each of the variables.

Table 3 shows the top 20 variables included in each of the four GBM models fit using the Vertebral Cohort. As was true with the Distal Forearm Cohort, variables such as cortical density and SMI were among the top listed HRpQCT variables. The strongest predictor among the models fit using only the spiral QCT variables was a measure of variability within the FN cortical vBMD measurement, followed by vertebral histogram entropy and overall vertebral trabecular vBMD. When the model building was limited to these three variables, the AUC for predicting a vertebral fracture was essentially unchanged. The strongest predictor among the models fit using only the DXA variables was ultradistal ulna aBMD followed by the area of the femoral shaft and total fat measured in the arms. Although the GBM model fit using only DXA variables produced a high AUC, the spiral QCT variables dominated the top 20 list when all HRpQCT, spiral QCT, and DXA variables were included in the modeling process.

Table 3. Top 20 Variables (Supplemental Appendix) From the GBM Models Predicting Vertebral Fractures That Were Fit Using Vertebral Cohort

67 HRpQCT variables                        144 Spiral QCT variables        62 DXA variables               267 HRpQCT, Spiral QCT, and DXA variables
Variable               Relative influence  Variable    Relative influence  Variable   Relative influence  Source of variable  Variable    Relative influence
d100_tib               11.0                ncstdbmdv   14.5                duubmd     9.7                 SpiralCT            ncstdbmdv   14.2
smi_rad                7.5                 vhent       10.8                fsarea     6.8                 SpiralCT            vtbmdv      8.4
ttbth_rad              5.3                 vtbmdv      9.3                 barmsfat   5.9                 SpiralCT            vhent       8.3
cdens_tib              5.3                 vsdcdist3   5.1                 fnarea     5.4                 SpiralCT            vsdcdist3   4.8
fnbmd                  5.0                 vbmdv       4.7                 ftbmd      4.2                 SpiralCT            vbmdv       4.4
imax_cmax_tot_mn_rad   4.2                 ntminbmdv   2.8                 drubmd     4.0                 SpiralCT            vsdcth4     2.2
ttb1_nsd_tib           4.2                 vsdcth4     2.7                 barmsarea  3.9                 SpiralCT            ntminbmdv   2.0
d100_rad               3.9                 ncmaxbmdv   2.3                 blegsarea  3.5                 HRpQCT              cdens_tib   2.0
imax_cmax_cort_mn_tib  3.2                 ntbmdv      2.2                 du13area   3.2                 SpiralCT            ncmaxbmdv   1.8
imin_cmin_tot_mn_tib   3.2                 vsdcdist2   2.1                 bodyarea   2.9                 SpiralCT            vsdcdist2   1.7
dtrab_rad              3.0                 nsdcth3     1.7                 duubmc     2.7                 DXA                 drubmd      1.5
trabarea_rad           2.8                 ntstdbmdv   1.6                 dr13area   2.6                 DXA                 fsarea      1.5
cth_tib                2.5                 vmncdist2   1.5                 dtubmd     2.5                 HRpQCT              d100_tib    1.4
cortarea_rad           2.1                 vmncth1     1.5                 fhipbmd    2.3                 DXA                 duubmd      1.4
imax_cort_mn_rad       1.9                 vsdcth      1.3                 dulnaarea  2.1                 SpiralCT            nsdcth3     1.3
dtrab_tib              1.8                 vwtcentx    1.3                 blegsbmd   2.1                 SpiralCT            vmncdist2   1.3
ctpm_tib               1.7                 vtrac       1.2                 blegsfat   1.9                 DXA                 fnarea      1.3
connd_tib              1.7                 nhent       1.1                 bspibmc    1.8                 SpiralCT            vcentx      1.2
weight                 1.5                 nmncth3     1.0                 duuarea    1.8                 HRpQCT              smi_rad     1.1
height                 1.5                 vcenty      1.0                 fsbmd      1.8                 DXA                 dr13bmc     1.0
AUC                    0.95                            1.00                           0.99                                                1.00

All models included femoral neck aBMD (fnbmd), BMI, height, and weight in the list of candidate variables. See Supplemental Appendix for a definition of each of the variables.

As a secondary analysis, an attempt was made to mimic the distal forearm prediction model using a traditional logistic regression approach, with interactions and quadratic and cubic terms to capture the nonlinearity, using the top ten HRpQCT variables listed in Table 2 (first column). As applied to the Distal Forearm Cohort, the model it was built with, the AUC was 0.81 compared with 0.96 using the GBM approach. When this alternative Distal Forearm model was applied to data from the Vertebral Cohort, the AUC dropped to 0.73. Additionally, the need for interactions within the GBM framework was investigated by fitting a GBM model with a tree complexity setting of 1. This resulted in an AUC of 0.88 in the Distal Forearm Cohort, indicating that interactions play an important role in the GBM model.

Discussion

Using the GBM modeling approach and taking advantage of all of the variables produced by the DXA, HRpQCT, and spiral QCT scanners, we were able to differentiate fracture and nonfracture subjects with surprisingly high predictive ability, with AUCs near 1.0 for predicting distal forearm fractures in the Distal Forearm Cohort and vertebral fractures in the Vertebral Cohort. There is, of course, a possibility that the models may be overfitted, thereby producing results better than would be obtained applying these models to new data; however, prediction was still strong when each model was applied to the other fracture type, suggesting that these models may be capturing underlying fracture susceptibility attributes, regardless of fracture type. Of particular note, using parameters derived from these advanced bone imaging measurements, and no clinical information, GBM models predicted the increased fracture risk among women who were considered osteopenic or had normal bone density by DXA (Figs. 2, 3). These results suggest that there are structural parameters assessed by QCT and/or HRpQCT, which have unique biomechanical interactions that are contributing to diminished bone strength and predisposing these women to fracture; these are clearly not being captured by aBMD T-scores. On the other hand, use of all DXA measurements together may prove more useful than relying on only a handful of standard DXA values.

These results further illustrate the utility of such novel modeling approaches to help better identify previously understudied measurements currently being captured by DXA, QCT, and HRpQCT scanners that may allow us to improve our understanding of the factors contributing to bone fragility. Indeed, the data presented in Tables 2 and 3 strongly suggest that potentially important information is being ignored when we focus only on the well characterized skeletal parameters such as aBMD, vBMD, and cortical thickness. Although it is unclear at present what attributes some of the HRpQCT variables are capturing, our results provide the rationale for additional biomechanical analyses that will be needed to better understand the implications of relatively understudied skeletal parameters such as radius mean Imax. Moreover, one of the important concepts in data mining is that small contributions from many variables can lead to high quality predictions.29 Thus, rather than devise ever newer imaging techniques, there may be opportunities for better analysis of currently available data.

There are many possible machine learning methods available, such as neural networks and support vector machines,12 but we chose GBM for three primary reasons: there is evidence that boosting methods are among the approaches least affected by overfitting; the models can accommodate both continuous and categorical variables; and software is readily available in the R statistical package.27 Moreover, GBM models have the advantage over logistic regression in that nonlinearity and interactions between variables can be captured without prior specification, which is of obvious importance in the search for new fracture prediction parameters. In addition, GBM incorporates the stochastic component, for example, falling, that is so important in fracture pathogenesis. No intimation is implied or intended that ours is the “best” method: the point made here is that important information is contained within the currently collected variables that analytic methods such as this may be able to extract.

“Boosting” is a process that combines many separate prediction rules, some of which may be quite weak on their own, to produce a more powerful combined classifier. It is an important concept that has been discussed in the machine learning literature for the past 20 years.29 Gradient boosting, which combines ideas of boosting with classification trees, was introduced by Friedman in 1999, who clarified its relation to several other important statistical methods including lasso, bagging, and stage-wise models.30–32 Applications of the GBM approach to deal with complex sets of variables can be found in the ecology literature,33–35 but this approach has rarely been applied in the analysis of medical data.
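To make the boosting idea concrete, the following is a minimal Python sketch of gradient boosting for a binary outcome. It is not the R package used in this study and should be read as an illustration under several simplifying assumptions: each weak learner is a single-split tree (equivalent to a tree complexity of 1, so no interactions are modeled), the stochastic subsampling and the cross-validated choice of the number of trees are omitted, a larger shrinkage than the 0.01 used in our analysis is applied so that the toy example converges quickly, and the data and function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals, weights):
    """Best single split by squared error on residuals; Newton-step leaf values."""
    n = len(X)
    best = None
    for j in range(len(X[0])):
        cuts = sorted(set(row[j] for row in X))
        for lo, hi in zip(cuts, cuts[1:]):
            thr = (lo + hi) / 2.0
            left = [i for i in range(n) if X[i][j] <= thr]
            right = [i for i in range(n) if X[i][j] > thr]
            sse = 0.0
            for idx in (left, right):
                m = sum(residuals[i] for i in idx) / len(idx)
                sse += sum((residuals[i] - m) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, j, thr, left, right)
    _, j, thr, left, right = best

    def leaf_value(idx):
        # One Newton step for the logistic loss within the leaf
        return sum(residuals[i] for i in idx) / max(sum(weights[i] for i in idx), 1e-12)

    return j, thr, leaf_value(left), leaf_value(right)


def fit_gbm(X, y, n_trees=200, shrinkage=0.1):
    """Additive model on the log-odds scale, built one shrunken stump at a time."""
    prior = sum(y) / len(y)            # assumes both outcome classes are present
    f0 = math.log(prior / (1.0 - prior))
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        p = [sigmoid(f) for f in F]
        residuals = [yi - pi for yi, pi in zip(y, p)]   # negative gradient
        weights = [pi * (1.0 - pi) for pi in p]
        j, thr, lv, rv = fit_stump(X, residuals, weights)
        lv, rv = shrinkage * lv, shrinkage * rv
        stumps.append((j, thr, lv, rv))
        F = [f + (lv if row[j] <= thr else rv) for f, row in zip(F, X)]
    return f0, stumps


def predict_proba(model, row):
    f0, stumps = model
    return sigmoid(f0 + sum(lv if row[j] <= thr else rv for j, thr, lv, rv in stumps))


# Toy data: two hypothetical predictors, outcome 1 = fracture, 0 = no fracture.
X = [[0.2, 1.0], [0.4, 0.8], [0.5, 1.2], [0.9, 0.9], [1.1, 1.1], [1.3, 0.7]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gbm(X, y, n_trees=50, shrinkage=0.1)
```

Each individual stump is a weak rule, yet the shrunken sum of many such rules can separate the two groups in the training data completely, which also illustrates why predictions need to be checked on data not used to build the model.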

In several of our models, SMI derived from the HRpQCT parameters appeared as a significant predictive parameter for fracture. As noted earlier in the Methods, HRpQCT may not accurately measure SMI.26 Despite this limitation, we chose to include SMI in our models as it likely does reflect some “quality” of trabecular bone that is being assessed by HRpQCT, even if that quality is not the true SMI as assessed by µCT.

In this report, we have demonstrated the potential power of the GBM approach to provide better fracture prediction models by reanalyzing existing information. It must be recognized, of course, that the resulting models were derived from our own specific data sets and need to be validated by others both in a case–control study such as this and using longitudinal cohort data. As such, our work represents mathematical prediction models that require further validation for the prospective clinical prediction of fractures. Nonetheless, our results provide some sense of an upper bound on how well we might expect to do with a given set of variables. Moreover, by including heretofore underutilized information provided not only by newer imaging devices but also existing DXA scanners, we were able to identify new variables for exploration. Although it is highly unlikely that DXA, HRpQCT, and spiral QCT assessments would all be combined in routine clinical practice, the goal of this preliminary study was instead to illustrate the potential of a novel statistical approach for obtaining deeper insights into predictor variables that might improve fracture risk assessment. Ultimately, the hope is that an approach such as this would be used by researchers to incorporate new prediction algorithms into scanners in order to provide increased predictive ability of fractures within the clinical setting.

Acknowledgements

This work was supported by research grants R01-AR027065 and UL1-RR024150 (Center for Translational Science Activities) from the National Institutes of Health, U.S. Public Health Service. The authors would like to thank Margaret Holets for the HRpQCT, DXA, and Spiral QCT measurements, Lisa McDaniel, RN, and Louise McCready, RN, for their assistance in recruitment and management of the study subjects, and James Peterson for assistance with data management and file storage.

Authors' roles: Study design: EJA, SK, TMT. Study conduct: EJA, SK, TMT. Data collection: SJA, JJC, EJA. Data analysis: EJA, TMT. Data interpretation: EJA, TMT, SK. Drafting manuscript: EJA. Revising manuscript: SK, TMT, LJM, SA. Approving final version: EJA, TMT, LJM, JJC, SJA, SA, SK. Responsibility for integrity of data analysis: EJA.

References

Supporting Information

Additional Supporting Information may be found in the online version of this article.

Filename                        Format  Size  Description
jbmr_1577_sm_SupplAppendix.doc          377K  Supplementary Appendix