Classification of Osteoporosis Based on Bone Mineral Densities

Authors

  • Ying Lu,

    1. Osteoporosis and Arthritis Research Group, Department of Radiology, University of California San Francisco, San Francisco, California, USA
  • Harry K. Genant,

    Corresponding author
    1. Osteoporosis and Arthritis Research Group, Department of Radiology, University of California San Francisco, San Francisco, California, USA
    • Address reprint requests to: Harry K. Genant, M.D., Department of Radiology, University of California, San Francisco, San Francisco, CA 94143-0628, USA
  • John Shepherd,

    1. Osteoporosis and Arthritis Research Group, Department of Radiology, University of California San Francisco, San Francisco, California, USA
  • Shoujun Zhao,

    1. Osteoporosis and Arthritis Research Group, Department of Radiology, University of California San Francisco, San Francisco, California, USA
  • Ashwini Mathur,

    1. Statistical Sciences, SmithKline Beecham Pharmaceuticals, King of Prussia, Pennsylvania, USA
  • Thomas P. Fuerst,

    1. Osteoporosis and Arthritis Research Group, Department of Radiology, University of California San Francisco, San Francisco, California, USA
  • Steven R. Cummings

    1. Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, USA

Abstract

In this article we examine the role of bone mineral density (BMD) in the diagnosis of osteoporosis. Using information from 7671 women in the Study of Osteoporotic Fractures (SOF) with BMD measurements at the proximal femur, lumbar spine, forearm, and calcaneus, we examine three models with differing criteria for the diagnosis of osteoporosis. Model 1 is based on the World Health Organization (WHO) criteria using a T score of −2.5 relative to the manufacturers' normative data for young women aged 20-29 years, with modifications using information from the Third National Health and Nutrition Examination Survey (NHANES). Model 2 uses a T score of −1 relative to women aged 65 years at the baseline of the SOF population. Model 3 classifies women as osteoporotic if their estimated osteoporotic fracture risk (spine and/or hip) based on age and BMD is above 14.6%. We compare the agreement in osteoporosis classification according to the different BMD measurements for the three models. We also consider whether reporting additional BMD parameters at the femur or forearm improves risk assessment for osteoporotic fractures. We observe that using the WHO criteria with the manufacturers' normative data results in very inconsistent diagnoses. Only 25% of subjects are consistently diagnosed by all of the eight BMD variables. Such inconsistency is reduced by using a common elderly normative population as in model 2, in which case 50% of the subjects are consistently diagnosed by all of the eight diagnostic methods. Risk-based diagnostic criteria as in model 3 improve consistency substantially to 68%. Combining the results of BMD assessments at more than one region of interest (ROI) from a single scan significantly increases prediction of hip and/or spine fracture risk and elevates the relative risk with increasing number of low BMD subregions.
We conclude that standardization of normative data, perhaps referenced to an older population, may be necessary when applying T scores as diagnostic criteria in patient management. A risk-based osteoporosis classification does not depend on the manufacturers' reference data and may be more consistent and efficient for patient diagnosis.

INTRODUCTION

Bone mineral density (BMD) and bone mineral content (BMC) values are used to diagnose osteoporosis.(1) According to the World Health Organization (WHO) working group,(2) white women whose BMD or BMC values have a T score below −2.5 relative to young normal adults are considered to have osteoporosis. Although based on the properties of dual X-ray absorptiometry (DXA) scans of spine and hip, this diagnostic approach does not specify the anatomic measurement sites and/or techniques and, therefore, as the WHO working group notes, “individuals will be categorized differently according to the site and technique of measurements and the equipment and the reference population used.”(2) The choice of bone measurement technology, the corresponding reference population, and the measurement sites will have a direct impact on the management of individual patients in clinical practice.

Although there is a wide array of bone mineral measurement techniques available or under development, the most commonly used is X-ray absorptiometry, either single X-ray absorptiometry (SXA) or dual X-ray absorptiometry (DXA). DXA and/or SXA scanners can measure many anatomic sites.(3) The most common sites are the lumbar spine, proximal femur, forearm, and calcaneus. Each provides a reasonable capacity for generic fracture risk prediction,(1) and each correlates moderately well to the others.(4, 5) However, this modest correlation does not necessarily lead to consistent diagnostic classification of individual patients when threshold-based criteria are used.(5–13) These inconsistencies may be caused by differences in reference data, differences in peak bone levels and rates of bone loss, in technique-related sources of error, and in individual anatomic interrelationships.(3) These inconsistencies also make it unclear whether combining the results of several measurement techniques and/or sites reduces misclassification or improves fracture risk assessment.(7, 9-11, 14) Furthermore, inconsistencies in patient diagnosis make it difficult to develop treatment and reimbursement standards.(15)

The use of bone densitometry in patient management differs from its use in mass screening or epidemiological study. Under managed care, physicians are obligated to reduce the cost of examinations by eliminating unnecessary scans. At the same time, they are responsible for accurately diagnosing their patients; a truly osteoporotic patient must not go undiagnosed, and a healthy patient must not be falsely diagnosed as osteoporotic. Using only one diagnostic technique and/or one BMD parameter may fail to detect patients who might be diagnosed with another method. Measuring multiple sites reduces the likelihood of missing an osteoporotic patient but increases expense; false-positive rates will increase, resulting in some normal subjects receiving treatment. The choice of technique and site will affect the diagnostic conclusions as well as the cost, directly affecting the choice of intervention strategy for an individual patient.

Given these conditions, we attempted to distinguish the differences in the diagnosis of osteoporosis based on different BMD sites and to examine their clinical ramifications. Using a prospective cohort from the Study of Osteoporotic Fractures (SOF), we applied three approaches to investigate the magnitude of discordance in the diagnosis of osteoporosis. Model 1 was based on the WHO criteria and the manufacturers' young normative data for different measurement sites. Model 2 used a standardized older reference population to derive a T score-like measure. Model 3 used a cut-off threshold based on osteoporotic fracture risk. Finally, we considered whether assessing more than one BMD region of interest (ROI) from a single scan site helps predict hip and/or spinal fractures in a clinical setting.

MATERIALS AND METHODS

Subjects

From 1986 to 1988, the SOF recruited 9704 white women aged 65 years or older from population-based listings in four areas of the United States. At baseline, BMD was measured at the calcaneus, distal radius, and proximal radius, using single photon absorptiometry (SPA). At the second visit (1988-1989), surviving participants had BMD measurements of the postero-anterior (PA) spine (L1-L4) and proximal femur (neck, trochanter, Ward's triangle, and total hip ROIs) using DXA. Fractures of the hip and spine were recorded for each subject at each visit. Details of the study design and the data have been published.(16, 17)

We included 7671 women from the study for whom all the BMD measurements were available. All analyses of the classification of osteoporosis according to BMD measurements in this article were based on this study population.

Fractures

All women who had fractures of the hip and/or spine were included, whether they were followed for 5 years or not. We also included all women who were followed for the full 5 years, whether they had fractures in that time or not. Women who were followed for less than 5 years and had no fractures were excluded. These criteria yielded the records of 5568 women suitable for analyses related to hip fracture.

Lateral spine radiographs were obtained at baseline and about 3 years later. Incident vertebral deformities were defined by quantitative morphometry as a 20% reduction and at least 4-mm loss in the height of anterior, posterior, or midheight of any vertebra between L4 and T4, during the time between baseline and follow-up visits.(18, 19) The average time between the baseline and follow-up radiographs was about 3.7 years. An experienced radiologist provided visual confirmation of the diagnoses. The records of 6561 women were suitable for the analyses related to spinal fractures.

By combining hip and spine fractures, 5066 women met our inclusion criteria of at least one fracture or completion of 5 years follow-up. We considered hip and/or spine fractures in the SOF because they are probably the most relevant osteoporotic fractures. Unless specified otherwise, in this article “fracture” will refer to osteoporotic fractures of the hip and/or the spine.

BMD-based classification of patients

The DXA scanners used in the SOF were the Hologic QDR 1000 (Hologic, Inc., Waltham, MA, USA). The SPA scanners were OsteoAnalyzers (Siemens-Osteon, Wahiawa, HI, USA). BMD was measured in grams per square centimeter. Lumbar spine BMD was measured at L1-L4. The reference parameters (peak mean and SD of young women aged 20-29 years) for hip, spine, and calcaneus were provided by the manufacturers. The hip BMD normative data represent the new version based on results from the National Health and Nutrition Examination Survey (NHANES) study.(20, 21) The forearm reference data were derived from Davis et al.(22) These reference values are listed in the first column of Table 1.

Table 1. Means and SDs of Normative BMD Values (g/cm²) From Manufacturers (Model 1), SOF 65 (Model 2), and Overall SOF Data

We used the WHO criteria to classify women as osteoporotic based on the young normative values.(2) We considered a woman to be osteoporotic if her BMD T score was less than −2.5. (A T score is defined as the difference between the BMD value and the sex-matched young reference mean BMD, divided by the sex-matched young reference SD at the corresponding anatomical site.) We refer to this classification as model 1.
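
The T score rule described above can be illustrated with a minimal Python sketch. The function names and the numeric values in the usage note are hypothetical and chosen only for illustration; they are not the reference values of Table 1.

```python
def t_score(bmd, ref_mean, ref_sd):
    """T score: (BMD - reference mean) / reference SD, all in g/cm^2."""
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    return (bmd - ref_mean) / ref_sd


def is_osteoporotic(bmd, ref_mean, ref_sd, cutoff=-2.5):
    """Classify as osteoporotic when the T score is strictly below the cutoff.

    The default cutoff of -2.5 is the WHO threshold quoted in the text.
    """
    return t_score(bmd, ref_mean, ref_sd) < cutoff
```

For example, with an assumed young reference mean of 1.0 g/cm² and SD of 0.1 g/cm², a measured BMD of 0.7 g/cm² gives a T score of about −3 and is classified as osteoporotic, whereas a BMD of 0.9 g/cm² (T score of about −1) is not.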

In model 2, we replaced the peak young reference values in the T score with reference values from the SOF women aged 65 years at baseline. We refer to this alternative T score definition as T′ (“T prime”). The middle column of Table 1 lists reference values for this cohort. To make classifications based on this reference cohort comparable with model 1, we used a T′ score of −1 as the cut-off point for the SOF 65 group, which yielded a prevalence of osteoporosis based on femoral neck BMD comparable with that of model 1.

Model 3 was based on the estimated risk of osteoporotic fracture of the hip and/or spine for a given BMD measurement. The estimated risk was determined with a logistic regression equation derived from the SOF fracture data. Because age is always available without cost, we used both age and BMD values in the regression equations. In this model, classifications of osteoporosis were based on a threshold of the estimated fracture probability as illustrated in the Appendix; that is, patients with a fracture risk above the cut-off value were designated as osteoporotic. The cut-off value for fracture risk in model 3 was 14.6%, based on follow-up periods of 5 years for the hip and 3.5 years for the spine. The fracture risk value of 14.6% was chosen to generate a prevalence of osteoporosis similar to models 1 and 2 according to neck BMD so that comparisons among models would be possible.
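
As a rough illustration of a risk-based rule of this kind, the Python sketch below assumes a logistic model that is linear in age and BMD. The coefficients are hypothetical placeholders, not the SOF estimates given in the Appendix; only the 14.6% threshold comes from the text.

```python
import math

# Hypothetical coefficients for logit(risk) = B0 + B_AGE * age + B_BMD * bmd.
# These are illustrative only and are NOT the fitted SOF values.
B0 = -1.0
B_AGE = 0.05    # risk rises with age (years)
B_BMD = -6.0    # risk falls as BMD (g/cm^2) rises
RISK_CUTOFF = 0.146  # model 3 threshold quoted in the text


def fracture_risk(age, bmd, b0=B0, b_age=B_AGE, b_bmd=B_BMD):
    """Estimated fracture probability from the assumed logistic model."""
    z = b0 + b_age * age + b_bmd * bmd
    return 1.0 / (1.0 + math.exp(-z))


def bmd_cutoff(age, risk=RISK_CUTOFF, b0=B0, b_age=B_AGE, b_bmd=B_BMD):
    """BMD at which the estimated risk equals `risk` for a given age."""
    logit = math.log(risk / (1.0 - risk))
    return (logit - b0 - b_age * age) / b_bmd


def is_osteoporotic_by_risk(age, bmd, risk=RISK_CUTOFF):
    """Classify as osteoporotic when the estimated risk exceeds the cutoff."""
    return fracture_risk(age, bmd) > risk
```

Under these assumed coefficients the BMD cut-off rises with age, so an older woman is classified as osteoporotic at a higher BMD than a younger woman. This is the sense in which a fixed risk threshold implies age-dependent BMD thresholds.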

Statistics

Descriptive statistics were used to summarize the study population. Pearson's correlation coefficients were used to evaluate the association of BMD values from different anatomic measurement sites. κ-Statistics and percentage agreements were used to study the agreement in classifications.
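
The two agreement measures can be sketched as follows, assuming the κ-statistic is Cohen's kappa computed for a pair of binary classifications (osteoporotic or not). The function name and input layout are illustrative assumptions.

```python
def agreement_and_kappa(a, b):
    """Return (percentage agreement, Cohen's kappa) for two binary ratings.

    `a` and `b` are equal-length sequences of truthy/falsy values, one pair
    per subject (for example, osteoporotic by site A and by site B).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if bool(x) == bool(y)) / n
    p_a = sum(1 for x in a if x) / n   # prevalence by the first measurement
    p_b = sum(1 for y in b if y) / n   # prevalence by the second measurement
    # Agreement expected by chance if the two classifications were independent.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return 100.0 * observed, kappa
```

Because κ subtracts the agreement expected by chance, two sites can show a high percentage agreement and still a low κ when most women fall in the same category by both measurements.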

To evaluate the diagnostic agreement for model 3, we used a 10-fold cross-prediction method to classify the women. We randomly divided the SOF data into 10 subsets. We derived cut-off values from nine of the subsets and used those values to classify women in the remaining subset. We rotated the target subset among all subsets so that all patients in the SOF study were classified based on other women. This was done to reduce the bias that arises when the classification rules and the results to be compared are derived from the same data.
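
The rotation described above can be sketched generically in Python. The `fit` and `classify` callables are hypothetical placeholders for whatever rule is derived from the nine training subsets (in model 3, a logistic regression and its cut-off); the sketch shows only the fold rotation, not the original SAS analysis.

```python
import random


def cross_predict(records, fit, classify, k=10, seed=0):
    """Classify every record with a rule fitted on the other k - 1 subsets.

    fit(training_records) -> rule
    classify(rule, record) -> label
    """
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)          # random split into subsets
    folds = [indices[i::k] for i in range(k)]     # k roughly equal subsets
    labels = [None] * len(records)
    for held_out in folds:
        held = set(held_out)
        training = [records[i] for i in indices if i not in held]
        rule = fit(training)                      # rule never sees held-out data
        for i in held_out:
            labels[i] = classify(rule, records[i])
    return labels
```

Each record receives exactly one label, and that label always comes from a rule derived without the record itself.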

Logistic regression analysis was used to assess the risk of fractures. All statistical analyses were carried out using a statistical software package (SAS 6.09; SAS Institute, Cary, NC, USA). The significance level was set at 0.05 throughout the study.

RESULTS

Classification of osteoporosis

Descriptive statistics of the study participants used in this article are shown in the last column of Table 1. Correlation coefficients between BMD parameters measured at different sites are shown in Table 2. The correlation coefficients ranged from 0.51 between proximal radius and Ward's triangle BMD to 0.90 between total hip and trochanteric BMD. Measurements from the same anatomic sites tended to have higher correlations than those from different sites. Correlation coefficients between two different sites were moderate (around 0.6).

Table 2. Correlation Coefficients Among BMD Measurements

In model 1, using the WHO criteria and the manufacturers' reference values, the prevalence of osteoporosis ranged from 3% based on calcaneal BMD to 60% based on the BMD of the proximal radius (Fig. 1). Examination of the cross-tables of classifications of osteoporosis, specifically the upper triangle of Table 3, shows that the percentages of agreement in the diagnosis of osteoporosis ranged from 43% between proximal radius and calcaneal BMD to 92% between total hip and trochanter BMD. Overall, the percentages of agreement varied greatly, indicating poor agreement in diagnosis. This is also evident in the very low κ-statistics given in the lower triangle of Table 3. The percentage of women who were consistently classified by all eight measurement sites was only 25%: 24% were classified as normal and 1% as osteoporotic (Fig. 2). These results suggest that using the WHO criteria and the manufacturers' reference data could result in very different diagnoses depending on the anatomic site and method used.

Table 3. Percentage Agreement (Upper Triangle) and κ-Statistics (Lower Triangle in Percentage) in Osteoporosis Diagnosis for Model 1
FIG. 1. Prevalence of osteoporosis based on eight BMD measurements.

FIG. 2. Pie chart for the distribution of the number of low BMD sites. The pie chart consists of SOF participants in nine categories according to their number of osteoporotic sites. The area of each category represents the percentage of participants in it. Participants who are classified consistently by eight BMD measurement sites had either no low BMD sites or eight low BMD sites.

Model 2 further examined whether the differences were caused by the use of younger reference values or by the use of threshold-based diagnostic criteria in general. The prevalence of osteoporosis classified in model 2 was substantially more consistent than in model 1 (Fig. 1). The most noticeable changes were for osteoporosis at Ward's triangle, calcaneus, and forearm. The upper triangle of Table 4 shows the percentages of agreement in classifications based on two BMD measurements. The percentages ranged from 75% between femoral neck and proximal radius to 89% between total hip and trochanter BMD. The κ-statistics for pairwise agreement in diagnosis ranged from 0.33 to 0.70, given in the lower triangle of Table 4. Overall, model 2 showed substantially improved agreement in classifications, particularly among classifications based on different femoral BMD measurements and when comparing calcaneal BMD to other locations. The percentage of women who were classified consistently by all eight BMD measurements was 49%: 45% as normal and 4% as osteoporotic (Fig. 2). Even with this improvement over model 1, more than 50% of the women were diagnosed differently depending on the anatomic sites measured. Most of the percentage agreements between two sites were above 80%, indicating that up to a fifth of the women would be classified differently if they were diagnosed based on two different BMD measurement sites. Except for measurements of hip BMD sites, the κ-statistics of other combinations were mostly less than 0.5, suggesting only moderate agreement among different sites after adjusting for agreement by chance.

Table 4. Percentage of Agreement (Upper Triangle) and κ-Statistics (Lower Triangle in Percentage) in Osteoporosis Diagnosis for Model 2

Model 3 used a risk-based approach and further improved the agreement in classifications. The prevalence of osteoporosis by this model ranged from 22% to 26%, as shown in Fig. 1. The model substantially improved the κ-statistics (lower triangle of Table 5) as well as the percentage agreement (upper triangle of Table 5), although moderate discrepancies remained. The percentage of women who were classified consistently by all eight BMD measurements was 69%: 59% as normal and 10% as osteoporotic (Fig. 2) based on our 10-fold cross-prediction. We found that low hip BMD missed only 6% of women diagnosed as low by other BMD measurements in model 3.

Table 5. Percentage of Agreement (Upper Triangle) and κ-Statistics (Lower Triangle in Percentage) in Osteoporosis Diagnosis for Model 3

Fracture outcome in the diagnostic classification of osteoporotic patients

Agreement in diagnostic classification is one aspect of the comparison of diagnostic criteria. Equally important is whether a diagnostic method reflects patient prognosis.(23)

Figure 3 shows the sensitivity, specificity, and the positive and negative predictive values for hip and/or spine fracture for each measurement in the three classification models. Although sensitivities and specificities varied most in model 1 and were most consistent in model 3, the variations in positive and negative predictive values in model 1 and the improvement in model 3 were not substantial. This is because both positive and negative predictive values were governed by the low prevalence of fracture, which was 7%. Still, model 1 had the most variation, with positive predictive values ranging from 12% for Ward's triangle (WARDS) to 34% for the calcaneus (CALC). In contrast, the positive predictive values ranged from 17% (both forearm BMDs) to 24% (WARDS) for model 2 and from 22% for the proximal radius (PRAD) to 25% (all hip BMDs) for model 3. Negative predictive values were all above 90%.
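
The dependence of predictive values on prevalence can be made concrete with a short Python sketch. The 2×2 counts in the usage note are hypothetical, chosen only so that the fracture prevalence is 7% as in the text; they are not SOF data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 table of counts.

    Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)
```

With hypothetical counts of 40 true positives, 160 false positives, 30 false negatives, and 770 true negatives (70 fractures among 1000 women), the positive predictive value is 40/200 = 20% and the negative predictive value is 770/800, about 96%. Holding sensitivity and specificity fixed, `ppv_from_prevalence` returns a higher value as prevalence rises, which is why a low fracture prevalence keeps positive predictive values modest in all three models.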

FIG. 3. Sensitivity, specificity, and positive and negative predictive values for osteoporotic fracture based on eight BMD measurements.

Figure 4 shows the relative risks for fracture for osteoporotic women (those with lower BMD) compared with all the nonosteoporotic women (those with higher BMD) in the study, according to the three different diagnostic models. Relative risk here is for 5-year hip and/or spine fracture and compares two groups, rather than the traditional relative risk per 1 SD change in BMD. Osteoporotic women had significantly elevated risks of fracture in each model. Although model 2 provided significantly better agreements in classification and homogeneity of sensitivity and specificity, the relative risks were not all higher than those in model 1. The relative risks for classification by Ward's triangle, calcaneal, and distal radius BMD in model 1 were higher than the relative risks in model 2. However, given the very high prevalence of Ward's triangle osteoporosis and very low prevalence of calcaneus and distal radius osteoporosis in model 1, such advantages in relative risk were of no practical use. Model 3 had consistently higher relative risks across all eight measurement sites, compared with both models 1 and 2. Thus, model 3 was most relevant to the prognosis of the classified subjects.

FIG. 4. Relative fracture risk for hip and/or spine fracture based on eight BMD measurements. Relative risks for hip and/or spine fracture for osteoporotic women (lower BMD) compared with all the nonosteoporotic women (higher BMD) in the study.

Combination of measurements

The disagreement in patient classification and the multiple BMD parameters measured in hip and forearm scans lead to the question of whether additional BMD parameters derived from different ROIs of the same scan would be beneficial in patient management, without additional measurement cost. We examined the benefits of combining multiple BMD results from the hip or from the forearm scan measurements.

The relative risk of osteoporotic fracture as a function of the number (>0) of osteoporotic ROIs in a hip or forearm scan, in reference to women without osteoporotic ROIs in the corresponding site, is listed in Table 6. The risk increased with the number of osteoporotic ROIs assessed for either hip or forearm scans in all three models (Armitage trend test, p < 0.001). This suggests that the combination of osteoporotic status of all BMD ROIs from a hip or a forearm scan could help physicians better assess the prognosis of a patient without increasing diagnostic expense. Therefore, it may be beneficial to combine BMD information from different ROIs in the same hip or forearm scan in reporting patient bone status.
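
The tabulation behind this kind of comparison can be sketched in Python as follows. The function names and record layout are illustrative assumptions, and the sketch computes only crude relative risks against the zero-ROI reference group; it does not implement the Armitage trend test.

```python
from collections import defaultdict


def count_low_rois(t_scores, cutoff=-2.5):
    """Number of ROIs in one scan whose score falls below the cutoff."""
    return sum(1 for t in t_scores if t < cutoff)


def relative_risk_by_count(records):
    """Relative fracture risk by number of osteoporotic ROIs.

    `records` is an iterable of (n_low_rois, fractured) pairs. Returns a dict
    {count: risk in that group / risk in the zero-ROI group} for count > 0.
    """
    totals = defaultdict(int)
    events = defaultdict(int)
    for n_low, fractured in records:
        totals[n_low] += 1
        if fractured:
            events[n_low] += 1
    if totals[0] == 0 or events[0] == 0:
        raise ValueError("reference group needs subjects and at least one fracture")
    baseline = events[0] / totals[0]
    return {c: (events[c] / totals[c]) / baseline for c in sorted(totals) if c > 0}
```

A relative risk that rises steadily with the count, as reported in Table 6, is the pattern a trend test is designed to detect.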

Table 6. Relative Osteoporotic Fracture Risk as a Function of Number of Osteoporotic ROIs in Hip or Forearm Scans

DISCUSSION

A number of important conclusions can be drawn from this study. The correlation between BMD measures is good, but the agreement in threshold-based classifications is only modest. Threshold-based classifications depend heavily on the reference data used. Standardization of the reference data will reduce but not eliminate inconsistencies in osteoporosis diagnosis. Risk-based classifications further improve the consistency among different measurement sites and result in nearly 70% agreement among the eight BMD sites studied. Reporting the status of all ROIs at the hip or forearm enhances fracture risk prediction.

The ideas and models in this article are not new. The conclusions are not surprising. Many authors have reported inconsistency in WHO classifications.(5–10, 12) There has been considerable effort to determine new definitions that will lead to more consistent classification.(23, 24) Our article is the first to evaluate the impact of diagnostic classification using data from a large prospective study. Our results support previous observations based on small data sets and reinforce the suggestions that classification of osteoporosis should focus on risk of fracture rather than on T scores selected to provide a certain prevalence.(24–27)

Although consensus is forming that the diagnosis and treatment of osteoporosis should not be based solely on BMD but should include other considerations and risk factors,(1, 24) BMD values are still the most frequently used and important factor. The WHO criteria produce inconsistent diagnoses of osteoporosis and are better applied in epidemiological and population-based studies than in individual diagnostic procedures or as thresholds for therapeutic decisions. This inconsistency has more consequences in clinical practice than in epidemiological studies. Physicians face the dilemma of deciding which patients should receive long-term treatments. Our study suggests that many of the inconsistencies are methodological and can be reduced by standardization to an older reference population and by using risk-based criteria. With increasing therapeutic choices in patient management, such information also can be used in choosing the optimal treatment strategies based on patient prognosis.

Because of logistical and proprietary concerns, manufacturers do not codevelop or share their reference populations. Device manufacturers typically use different reference populations for different machines. Our study supports the notion that different normative populations within and among manufacturers are a source of inconsistencies in classification based on T scores.(8, 12, 13) However, it seems that age may play a more dominant role in the disagreement. The disagreement in classifications could be reduced by using a unified elderly normative population in which the differences in age-related bone losses have been negated. Unfortunately, the current study cannot explicitly separate the effect of a standardized reference population and of differential aging effects because we do not have complete manufacturer-based reference data for age 65 years for all measured sites. The greatest disagreements in model 1 were between peripheral BMD and central BMD measures. Because of changes in technology, we are not able to obtain reliable references for 65-year-old white women from manufacturers for forearm and calcaneal BMD. The International Committee for Standards in Bone Measurement has recommended the adoption of hip BMD measurements from the NHANES(20, 21) as the standardized reference population for white women. Developing a universal reference population for all bone measurement sites and technologies would be a formidable task. However, there are efforts underway as proposed by Miller.(25) As diagnostic and treatment options increase, the importance of studying methodologies to calibrate future BMD measurements to the same reference population increases also.

Currently, the T score-based threshold approach for defining osteoporosis is widely applied. However, there are many advantages to a risk-based approach rather than T scores. We have previously underscored some of the pitfalls of using T scores for patient diagnosis.(9, 10, 26) Black et al. suggested avoiding the use of T scores as classification tools altogether and proposed including the relative risk of fractures as a standard tool in reporting bone densitometry results.(27) This article supports these arguments. Because relative risk depends on the definition of baseline fracture risk, it is much easier to define osteoporosis based on absolute fracture risks within a defined time window.

The risk-based diagnostic criteria used in this study showed strong and consistent separation of osteoporotic women from normal women. The better agreement, to a certain degree, could be attributed to the use of age as a variable in estimating risk and perhaps because the logistic regression model used information of fracture risks over the whole age range. Because age is a known risk factor for osteoporosis and costs nothing to obtain, it is beneficial to include it in our diagnostic criteria. Dropping age in the estimation of fracture risks in model 3 would result in a classification scheme very similar to model 2. Using age in the model to estimate the fracture risks causes the BMD cut-off values to vary with age. This should not be a problem in practice because the manufacturers could program the logistic regression equations and report such risks to physicians. It is already being used in a limited way. It is important to point out that model 3 offers significantly better prediction of osteoporotic fractures than does age alone, and may be more relevant to classification. An additional advantage of the risk-based diagnostic criteria is that risk information from multiple sources can be integrated into the estimation of global risk of fractures.

Using an older rather than younger reference age in T score calculations reduces age-related loss discrepancies between sites for postmenopausal women. Prevalence is still not equal because each manufacturer's T score is normalized by unique population SDs. To compensate, site and technology-specific T score thresholds for osteopenia and osteoporosis could be developed that, for large populations, would produce equivalent prevalence to a reference site such as the femoral neck. This approach is being considered by the joint committees of the NOF and ISCD and is outlined in Faulkner et al.(13) Although this approach could be implemented with existing reference curves, prevalence-based approaches generally are intended for epidemiological study. Also, this type of approach does not explicitly take into account fracture risk estimates that can differ by more than a factor of 2. It is not clear how well this approach would work to reduce inconsistent diagnoses in individual patients when different measurement techniques are used.

As expected, the risk-based diagnostic criteria used in this study were shown to be preferable based on the consistency among different BMD measurements. This supports the proposal by Kanis and Glüer for the Committee of Scientific Advisors of IOF.(24) The risk-based approach of model 3 is still dependent on the population used to derive the estimates of fracture probability. Although we do not have data to confirm it, we can speculate that estimation equations derived from different populations will result in somewhat inconsistent osteoporosis diagnoses. The use of risk-based diagnostic criteria does not reduce the need for a standard reference population.

A substantial disadvantage of a risk-based approach in clinical use is that it requires prospective clinical studies of fracture risk, which are expensive and time consuming. Almost all of the prospective data available to link BMD and fracture pertain to elderly white women; yet interest is growing in assessing osteoporosis risk in women under the age of 65 years, in nonwhite women, and in men. Thus, it will be difficult to establish clinical thresholds for new techniques and populations other than white women. Also, risk estimates based on BMD may not be very accurate because BMD is only one of many factors that influence fracture risks.(17) The use of cross-sectional studies has been proposed to evaluate diagnostic cut-off values based on observed odds ratios (ORs). Possible limitations of such an approach are that selection of fracture cases can produce biases and, further, that the observed BMD already may be affected by the fracture and may differ from that observed in prospective studies.

The selection of hip and/or spine fracture as the “osteoporotic fractures” in this study was based on their relative clinical importance and abundance. In unpublished work and other conference presentations,(9, 10) we have performed similar analyses using only hip or spine fracture as the endpoints. The results were very similar to the results in this article and are available through the corresponding author. The cut-off value of 14.6% for osteoporotic fracture in model 3 was derived to make the prevalence of osteoporosis at the femoral neck comparable with the other two models so that comparison among models would be possible. It was not optimally selected and was not based on any clinical validation. Such a cut-off value requires further careful study and should not be used directly for clinical applications without further validation. In the Appendix, we provide a formula and corresponding parameters for calculating the corresponding BMD value for given percentages of cut-off points. This allows the reader to select the risk levels and derive BMD cut-off values accordingly.

Regardless of which definition of osteoporosis is used, we should always expect inconsistencies when measuring different body sites because of biological and technical variations. However, it is important to examine whether the proportion of misdiagnosed patients is small enough to offset the cost of identifying them and whether this additional knowledge enhances the prognostic information and changes the treatment strategy. Although there have been limited studies of the cost-effectiveness of the use of BMD,(1, 28) Black et al.(14) compared the risk of hip fractures using BMD of the femoral neck and lumbar spine. They found that by altering the cut-off values of the T score to −2.2 for femoral neck BMD, they could get a classification with the same risk prediction power as using either a neck or spine BMD with a T score less than −2.5. They concluded that the additional spine scan was unjustified.

Our approach of combining BMD parameters from a given site has several differences from that study. First, we used a combination of a scan of the hip or forearm, which gives more information than assessment of a single parameter. Second, our comparison was based on individual patients, not the similarity in statistical properties of populations. Although diagnosis and risk assessment are two different concepts with different uses, we only wish to emphasize that combining available information on diagnosis from the same scan can provide prognostic information and should not be ignored.

This study has several limitations. First, our conclusions are limited to women over the age of 65 years, to BMD measured by DXA/SPA, and to hip and/or spinal fractures, and may not apply to other bone measures, other osteoporotic fractures, or other populations. With the rapid development of new bone mineral measurement technologies, including quantitative ultrasound, it is important to determine the appropriate selection of single and/or multiple technologies and to integrate this information and its appropriate use into patient management.

Second, the hip and spinal fractures used in this article had different time windows. Hip fractures occurred within 5 years after all BMD measurements. However, spinal fractures could have occurred before the hip and spine DXA scans. Thus, the risk of incident spinal fractures for some participants may not be evaluated prospectively, as were the hip fractures. The window of spinal fractures averaged about 3.7 years, which is different from hip fractures. Thus, fracture outcomes may be weighted more toward hip fractures than spine fractures. Given the relative clinical importance of hip fractures, a slight overemphasis on hip fractures may be reasonable. Our osteoporotic fracture risk estimation only serves as a surrogate measure of the risk of future fractures. However, separate analyses indicated that our conclusions remained the same regardless of whether we studied only hip fracture or only spine fracture based on the same population.(9, 10)

Third, the calcaneal and forearm BMD measures were obtained 2 years before the hip and spine BMD. Although it seems unlikely, we do not know whether the difference in measurement time would affect the conclusions.

Fourth, the statistical method used in this article may introduce some biases in favor of model 2 and model 3 over model 1. Model 2 used women aged 65 years at baseline of the SOF study as reference. Thus, the reference data used in model 2 is part of the SOF data to be evaluated and is more consistent for other age groups than the reference data used in model 1. Using the 10-fold cross-prediction, model 3 built classification rules using data different from the data to be evaluated. Nevertheless, it still used SOF data of all age ranges. Such biases are not thought to be substantial enough to change the conclusions. The observed differences among the three models can be attributed mainly to the methodological differences in patient classifications.
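
The cross-prediction idea mentioned above can be illustrated with a minimal sketch. It assumes only that each subject is classified by a rule fitted on the other nine folds. The function name `cross_predict`, the toy records, and the simple mean-BMD threshold rule are all hypothetical stand-ins for illustration; they are not the logistic model or the data used in this study.

```python
import random

def cross_predict(records, fit, predict, k=10, seed=0):
    """Return one out-of-fold prediction per record.

    Each record is predicted by a rule fitted on the other k-1 folds,
    so no record contributes to the rule that classifies it.
    """
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    predictions = [None] * len(records)
    for held_out in folds:
        held = set(held_out)
        training = [records[i] for i in indices if i not in held]
        rule = fit(training)
        for i in held_out:
            predictions[i] = predict(rule, records[i])
    return predictions

# Hypothetical toy data: (BMD, fractured) pairs, not SOF measurements.
records = [(0.40 + 0.01 * i, i % 3 == 0) for i in range(50)]

# Hypothetical rule: flag a subject whose BMD is below the training-set mean.
def fit_mean_threshold(training):
    return sum(bmd for bmd, _ in training) / len(training)

def predict_low_bmd(threshold, record):
    return record[0] < threshold

predictions = cross_predict(records, fit_mean_threshold, predict_low_bmd)
```

In practice the `fit` and `predict` arguments would wrap whatever risk model is being evaluated; the fold logic stays the same.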

In summary, we conclude that standardization of normative data, perhaps referenced to an older population, may be necessary for applying T scores as diagnostic criteria in patient management. A risk-based osteoporosis classification does not depend on the manufacturer's reference data and may be more consistent and efficient in patient diagnosis.

Acknowledgements

We thank David Breazeale for editorial assistance and manuscript preparation. We also thank both reviewers and editors for their comments and suggestions.

APPENDIX

Because model 3 is based on a logistic regression model, for a given risk of fracture (p) and age (a), the corresponding BMD cut-off value can be derived from the following formula:

$$\mathrm{BMD} = \frac{\ln\left(\dfrac{p}{1-p}\right) - \beta_0 - \beta_1 a}{\beta_2} \qquad (1)$$

where β0, β1, and β2 are the intercept, the regression coefficient for age, and the regression coefficient for BMD, respectively, from the logistic regression equation. The corresponding parameters are given in Table A1. For example, if we are interested in a risk of fracture of 14.6% for 67-year-old women based on their total hip BMD measurement, we can use the parameters in Table A1 and formula (1) to calculate the cut-off value for total hip BMD as

$$\mathrm{BMD} = \frac{\ln\left(\dfrac{0.146}{1-0.146}\right) - \beta_0 - 67\,\beta_1}{\beta_2} = 0.5445$$

Thus, if we use a 14.6% fracture risk as our criterion, 67-year-old women who have a total hip BMD of less than 0.5445 g/cm² on a Hologic scanner will be classified as osteoporotic.
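
As a hedged illustration of this calculation, the sketch below inverts a logistic model of the form logit(p) = β0 + β1·age + β2·BMD to obtain a BMD cut-off. The function names and the coefficient values are hypothetical and chosen for illustration only; they are not the Table A1 parameters, so the resulting number will not match the 0.5445 value above.

```python
import math

def bmd_cutoff(p, age, b0, b_age, b_bmd):
    """BMD at which the modelled fracture risk equals p for a given age.

    Inverts logit(p) = b0 + b_age * age + b_bmd * BMD.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if b_bmd == 0:
        raise ValueError("the BMD coefficient must be non-zero")
    logit = math.log(p / (1.0 - p))
    return (logit - b0 - b_age * age) / b_bmd

def fracture_risk(bmd, age, b0, b_age, b_bmd):
    """Forward model: predicted fracture probability for a BMD and age."""
    z = b0 + b_age * age + b_bmd * bmd
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for illustration only; NOT the Table A1 values.
B0, B_AGE, B_BMD = -3.0, 0.05, -4.0

# Cut-off for a 14.6% risk in a 67-year-old woman under these assumed values.
cutoff = bmd_cutoff(0.146, 67, B0, B_AGE, B_BMD)
```

Feeding the cut-off back into `fracture_risk` recovers the chosen risk level, which is a quick check that the inversion is consistent. With a negative BMD coefficient, any BMD below the cut-off corresponds to a risk above the chosen level.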

Table A1. Parameters Used to Derive BMD Cut-Off Values for Model 3(9)
