Discordance Between Changes in Bone Mineral Density Measured at Different Skeletal Sites in Perimenopausal Women—Implications for Assessment of Bone Loss and Response to Therapy: The Danish Osteoporosis Prevention Study
Department of Endocrinology, Odense University Hospital, Odense, Denmark
Address reprint requests to: B. Abrahamsen, M.D., Ph.D., Department of Endocrinology, Odense University Hospital, DK-5000 Odense C, Denmark
Assessing bone loss and gain is important in clinical decision-making, both in evaluating treatment and in following untreated patients. The aim of this study was to correlate changes in bone mineral density (BMD) at different skeletal sites during the first 5 years after menopause and determine if forearm measurements can substitute for dual-energy X-ray absorptiometry (DXA) of the spine and hip. BMD was measured at 0, 1, 2, 3, and 5 years using Hologic 1000/W and 2000 densitometers in 2016 perimenopausal women participating in a national cohort study. This analysis comprises 1422 women remaining in the study after 5 years without changes to their initial treatment (hormone-replacement therapy [HRT], n = 497, or none, n = 925). Despite correlated rates of change between forearm and spine (r² = 0.11; p < 0.01), one-half of those who experienced a significant decrease in spine BMD at 5 years showed no significant fall in forearm BMD (sensitivity, 50%; specificity, 85%; κ = 0.25). The total hip had significantly better agreement with spine (sensitivity, 63%; specificity, 85%; κ = 0.37; p < 0.01). Analysis of quartiles of change also showed significantly better agreement with spine and whole body for the total hip than for the femoral neck or ultradistal (UD) forearm. In a logistic regression analysis for identification of group (HRT or control), the prediction was best for whole body (82.6%) and spine (80.9%), followed by total hip (78.5%) and forearm (74.7%). In conclusion, changes at the commonly measured sites are discordant, and DXA of the forearm is less useful than DXA of the hip or spine in determining the overall skeletal response to therapy or assessing bone loss in untreated women.
ASSESSING BONE loss and gain is important, both in the evaluation of treatment and in the follow-up of patients without therapy. Although the bone mineral density (BMD) at different anatomic regions is correlated, the agreement between sites is low when it comes to classifying individual subjects as osteoporotic or not.(1–4) Moreover, the anatomic regions accessible to dual-energy X-ray absorptiometry (DXA) have different characteristics relevant to follow-up studies. Some of this variability is biological and can be explained by differences in cortical and cancellous bone content and differences in mechanical loading, but measurement errors are also unevenly distributed across the anatomic sites. The lumbar spine shows high reproducibility but can be misleading in the elderly due to aortic calcification and spondylarthrosis.(5, 6) In contrast, the hip has a higher coefficient of variation (CV), but it is less affected by degenerative changes. The ultradistal forearm can be measured on cheaper, portable equipment and may be more responsive to therapy than other sites because of a high proportion of trabecular bone.(7, 8) The purpose of this study was to compare changes in BMD at different skeletal sites and determine if forearm DXA can substitute for spine and hip measurements in the assessment of bone loss rates and the response to therapy. The correlation and, more importantly, agreement between changes in BMD were assessed in a large national cohort of healthy women as they progressed through the first 5 years after menopause. To address the question of responsiveness, a comparison also was made of the ability of each measured region to distinguish between women receiving or not receiving hormone-replacement therapy (HRT).
MATERIALS AND METHODS
From 1990 to 1993, a total of 2016 healthy perimenopausal women were included in the Danish Osteoporosis Prevention Study (DOPS), a nationwide multicenter study of risk factors for osteoporosis. This is an open study with a randomized arm (HRT or no treatment) and a nonrandomized arm (HRT or not by personal choice) and a planned duration of 20 years.(9) Women were eligible for inclusion provided they were 45-58 years of age and either (a) 3-24 months past last menstrual bleeding or (b) still menstruating but exhibiting perimenopausal symptoms, including menstrual irregularities, with a serum follicle-stimulating hormone (FSH) more than 2 SD above the premenopausal mean. All participants gave their verbal and written informed consent before entry in the study, which was conducted in accordance with the Helsinki II Declaration and approved by the local ethics committees (reference 90/119). At the initial visit, a blood sample was drawn for biochemical screening, including a blood count, serum levels of calcium, thyroid-stimulating hormone (TSH), creatinine, aspartate amino transferase, and fasting blood glucose. Exclusion criteria were: (1) metabolic bone disease, including osteoporosis defined as nontraumatic vertebral fractures on X-ray; (2) current estrogen use or estrogen use within the past 3 months; (3) current or past treatment with glucocorticoids for >6 months; (4) current or past malignancy; (5) newly diagnosed or uncontrolled chronic disease; and (6) alcohol or drug addiction.
The effects of HRT on fracture incidence and BMD in the parent study have been reported previously.(10) In the present subset, 71% of untreated women had a significant decrease in spine BMD, while 53% of participants in the HRT group had a stable BMD and 33% a significant increase over 5 years. The untreated group had a 6.5% mean decrease and the HRT group had a 2.2% mean increase in BMD from baseline to 5 years. To compare the ability of different regions to identify women who received HRT or not, participants who changed treatment status (began or stopped HRT) during the study were excluded from the data set presented here. Of the 723 women who were started on HRT, 497 (68%) completed 5 years of follow-up while taking HRT throughout. Similarly, of 1293 women in the control group, 925 (72%) were available for analysis after 5 years without changes to their treatment status. Thus, this presentation covers 1422 women (497 HRT and 925 untreated), 668 from the randomized arm and 754 from the open arm.
Measurement of BMD
BMD of the spine, hip, forearm, and whole body was measured using cross-calibrated QDR-1000/W and QDR-2000 densitometers (Hologic Inc., Waltham, MA, USA), as previously described.(3) Whole-body (WB) scans were done at inclusion and after 1, 2, and 5 years. All other measurements were done at inclusion and after 1, 2, 3, and 5 years. The in vivo precision errors for BMD in the participating clinics were 1.5% (spine and total hip), 2.1% (femoral neck), 1.9% (UD radius), and 0.7% (whole body). Long-term stability of the equipment was assessed by daily scans of an anthropomorphic phantom at each center. Changes were below 0.2% per year. A standardized procedure for scan acquisition and data analysis was established and followed for all scans. The scanned images were stored on optical disks for subsequent analysis. Intercenter concordance in data analysis was checked during the inclusion period by circulation of unanalyzed scans from all centers. Intercenter calibration differences were investigated by scanning one common anthropomorphic phantom at the start of the study and once a year.
Baseline demographics were compared using Student's t-test for unpaired data. Log transformation was done for variables that did not follow the normal distribution (weight, age, and years since menopause). The absolute change in BMD from baseline to each time point was used in a paired comparison scoring each follow-up scan as a significant gain (an increase exceeding 1.96√2 × CV), a significant loss (a decrease exceeding 1.96√2 × CV), or no significant change from baseline. Agreement between regions in classifying participants was then analyzed using κ-statistics. In the interpretation according to Landis and Koch, κ-coefficients below 0.2 indicate “poor agreement,” between 0.2 and 0.4 indicate “fair agreement,” and above 0.4 indicate “moderate agreement.”(11) The total hip and the neck of the femur were not compared because these regions are acquired from the same scanned image and are mathematically dependent. Second, the rates of bone loss (ΔBMD) were derived using linear regression analysis on all scans from each patient. ΔBMD at different measurement sites was compared using Pearson correlation analysis. A separate analysis of κ-scores for allocation into quartiles of BMD change also was done. Logistic regression analysis was used to compare the ability of DXA to discriminate between treated and untreated participants across anatomic sites, and χ² tests were used to compare proportions in patient referral scenarios. Values of p < 0.05 were considered significant throughout.
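The significance scoring and the κ-statistic described above can be sketched in a few lines. This is only an illustrative sketch, not the study's analysis code: it assumes that changes are expressed as a percentage of baseline so that they are comparable with the CV in percent, it uses an unweighted Cohen's κ over the three categories, and the function names and the five example values are hypothetical.

```python
import math


def least_significant_change(cv_percent):
    """Smallest change between two scans (in %) that is significant at the 95% level."""
    return 1.96 * math.sqrt(2) * cv_percent


def classify_change(change_percent, cv_percent):
    """Score one follow-up scan against baseline as 'gain', 'loss', or 'stable'."""
    lsc = least_significant_change(cv_percent)
    if change_percent > lsc:
        return "gain"
    if change_percent < -lsc:
        return "loss"
    return "stable"


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical 5-year changes (% of baseline) for five women; not study data.
spine_changes = [-6.5, -5.0, 2.2, -1.0, 5.1]
forearm_changes = [-7.0, -2.0, 1.0, -6.0, 6.0]
spine_labels = [classify_change(c, 1.5) for c in spine_changes]      # spine CV 1.5%
forearm_labels = [classify_change(c, 1.9) for c in forearm_changes]  # UD radius CV 1.9%
kappa = cohen_kappa(spine_labels, forearm_labels)
```

With the precision errors reported for this study, the threshold works out to roughly 4.2% for the spine and total hip and 5.3% for the UD radius, which is one reason a site with poorer precision needs a larger change before it registers as significant.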
Women in the HRT group were slightly but significantly younger than the untreated group (p < 0.01, Table 1). On average, their body weight was 1.8 kg lower (p < 0.01) and their last menstrual period was 5 months (p < 0.01) earlier relative to the inclusion date. The two groups did not differ in BMD at any site.
Table 1. Baseline Demographics
Correlation between changes in BMD at different sites
There was a significant correlation (p < 0.001) between ΔBMD across all anatomic sites (Table 2), both in treated and untreated participants. Between 10% and 30% of the variation in ΔBMD at one site could be predicted by ΔBMD at another measurement site. The strongest correlation with spine ΔBMD was seen for whole-body ΔBMD and the weakest correlation was seen for forearm ΔBMD.
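As a reading aid, the two quantities behind these figures can be sketched as follows: the per-participant rate of change (the slope of a least-squares line through all of her scans) and the squared Pearson correlation between the rates at two sites. The function names and the example values are hypothetical and are not taken from the study data.

```python
def rate_of_change(years, bmd_values):
    """Least-squares slope of BMD against time for one participant (BMD units per year)."""
    n = len(years)
    mean_t = sum(years) / n
    mean_b = sum(bmd_values) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (b - mean_b) for t, b in zip(years, bmd_values))
    return sxy / sxx


def r_squared(rates_site_a, rates_site_b):
    """Squared Pearson correlation between rates of change at two sites."""
    n = len(rates_site_a)
    mean_a = sum(rates_site_a) / n
    mean_b = sum(rates_site_b) / n
    saa = sum((a - mean_a) ** 2 for a in rates_site_a)
    sbb = sum((b - mean_b) ** 2 for b in rates_site_b)
    sab = sum((a - mean_a) * (b - mean_b) for a, b in zip(rates_site_a, rates_site_b))
    return sab ** 2 / (saa * sbb)


# Hypothetical participant scanned at 0, 1, 2, 3, and 5 years (g/cm2); not study data.
example_slope = rate_of_change([0, 1, 2, 3, 5], [1.00, 0.99, 0.98, 0.97, 0.95])
```

An r² of 0.11, as reported for forearm against spine, means that only about one-tenth of the between-woman variance in the spinal rate of change is shared with the forearm rate.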
Table 2. ΔBMD Correlation (r²) Across Anatomic Regions
Agreement in classification based on significance of BMD change
The discordance of BMD changes depended on observation period, anatomic site, and treatment status. A significant increase in total hip BMD from baseline to 5 years was associated with an overall 62% probability (positive predictive value [PPV]) of a concomitant significant gain in spine BMD (Table 3) and an 80% probability of a significant gain in whole-body BMD (not shown). At 2 years, the corresponding probabilities were 38% and 40%, respectively. For the UD forearm, the corresponding probabilities were 40% and 62% at 5 years and 21% and 33% at 2 years. Treatment status influenced concordance; thus, a significant gain in total hip BMD over 5 years in an untreated woman was associated with a 7% chance of an increase in spine BMD, whereas the same observation made in a woman receiving HRT indicated a 70% probability that spine BMD had increased. Although specificity was high (80-97%), sensitivity was low. Thus, at 5 years the sensitivity of the forearm for detecting significant changes at the spine was 49.5%, while it was 62.8% if measured at the total hip. When subjected to κ-analysis, none of the comparisons indicated good or moderate agreement (Fig. 1 and Table 4). Fair agreement was seen for all ΔBMD comparisons after 5 years, though agreement was significantly poorer for combinations that included the UD forearm. In addition, agreement with spine ΔBMD was significantly better for the total hip site (κ = 0.37) than for the femoral neck (κ = 0.31). For all regions, agreement increased with follow-up time (Fig. 2). Thus, κ-coefficients for spine against total hip and spine against WB went from poor to fair at the 2-year visit, whereas spine against femoral neck and spine against forearm required a period of 3 years to reach this level. Agreement between forearm and the hip sites went from poor to fair only at the 5-year visit.
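The sensitivity, specificity, and PPV quoted in this section all derive from a two-by-two comparison of one site against the spine. The sketch below shows the arithmetic; the function name is hypothetical, and the counts are invented purely so that sensitivity and specificity come out at 50% and 85%. They are not the study's counts, and the resulting PPV is therefore not a study result.

```python
def diagnostic_performance(test_positive, reference_positive):
    """Sensitivity, specificity, and PPV of one site (test) against another (reference).

    Both arguments are equally long lists of booleans, one entry per participant,
    e.g. 'significant loss at the forearm' against 'significant loss at the spine'.
    """
    pairs = list(zip(test_positive, reference_positive))
    tp = sum(t and r for t, r in pairs)          # both sites show the change
    fp = sum(t and not r for t, r in pairs)      # test site only
    fn = sum(not t and r for t, r in pairs)      # reference site only
    tn = sum(not t and not r for t, r in pairs)  # neither site
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Invented counts: 10 women with spinal loss (5 detected), 20 without (3 false alarms).
forearm_loss = [True] * 5 + [False] * 5 + [True] * 3 + [False] * 17
spine_loss = [True] * 10 + [False] * 20
performance = diagnostic_performance(forearm_loss, spine_loss)
```

Because PPV depends on how common the reference finding is, the same sensitivity and specificity give very different PPVs in treated and untreated women, which is consistent with the 7% and 70% figures reported above for a gain at the total hip.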
Table 3. Concordance Between Significant Gains or Losses at Peripheral Sites and the Spine
Table 4. Agreement Between Anatomic Regions
Agreement in classification based on quartiles of BMD change
This analysis was done separately for each of the two treatment groups because the limits of ΔBMD quartiles differ between women who receive HRT and those who do not. First, in untreated participants, the quartiles of loss at the forearm over 5 years showed poor agreement with the other measured sites (κ = 0.12). The total hip or femoral neck yielded a significantly higher (p < 0.01) κ with spine (κ = 0.17). Whole-body ΔBMD showed fair agreement (κ = 0.21-0.22) with all regions apart from the femoral neck. Second, in women receiving HRT, ΔBMD agreement between forearm and spine was poor (κ = 0.13). Agreement between total hip and spine (κ = 0.32) was significantly (p < 0.01) better and ranked as fair. The femoral neck showed better agreement with spine than was the case for the forearm, but poorer than that provided by the total hip (p < 0.05). Whole-body ΔBMD showed fair agreement with spine and total hip (κ = 0.22-0.23).
Logistic regression analysis revealed that the ability to discriminate between women receiving HRT and untreated women was highest (p < 0.05) using the spine or whole body. It was significantly lower for the total hip and lowest for the forearm and the femoral neck (Table 5). About 80% of the population could be identified correctly as treated or untreated using whole body, spine, or total hip and 75% using UD forearm or femoral neck.
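The "percentage correctly identified" reported for each site can be illustrated with a one-predictor logistic model: the change in BMD at a site is used to predict treatment status, and the share of correct predictions is counted. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure: it fits the model by plain gradient descent rather than a statistics package, classifies at a predicted probability of 0.5, and uses hypothetical names and invented example values.

```python
import math


def fit_logistic(changes, treated, learning_rate=0.1, steps=2000):
    """Fit P(treated) = 1 / (1 + exp(-(a + b*z))) on standardized BMD change z."""
    n = len(changes)
    mean = sum(changes) / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in changes) / n)
    z = [(c - mean) / sd for c in changes]
    intercept = slope = 0.0
    for _ in range(steps):
        grad_intercept = grad_slope = 0.0
        for zi, yi in zip(z, treated):
            p = 1.0 / (1.0 + math.exp(-(intercept + slope * zi)))
            grad_intercept += p - yi
            grad_slope += (p - yi) * zi
        intercept -= learning_rate * grad_intercept / n
        slope -= learning_rate * grad_slope / n

    def predict(change):
        # Predicted probability >= 0.5 is equivalent to a non-negative linear term.
        return int(intercept + slope * (change - mean) / sd >= 0)

    return predict


def percent_correct(predict, changes, treated):
    """Percentage of participants whose treatment status is predicted correctly."""
    hits = sum(predict(c) == y for c, y in zip(changes, treated))
    return 100.0 * hits / len(changes)


# Invented 5-year changes (% of baseline): four treated (1) and four untreated (0) women.
changes = [2.0, 3.0, 1.0, 2.5, -6.0, -5.0, -7.0, -4.0]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
predict = fit_logistic(changes, treated)
accuracy = percent_correct(predict, changes, treated)
```

In this toy example the two groups do not overlap, so every woman is classified correctly; in the study the distributions overlapped at every site, which is why even the best-performing sites identified only about 80% correctly.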
Table 5. Logistic Regression for Prediction of Treatment Status Based on Observed Change in BMD
Using forearm or hip loss rates to select participants for referral to DXA of the spine
Finally, a scenario in which forearm loss rates limited the referral of untreated participants to follow-up DXA of the spine was considered. If the aim was to identify 90% of participants belonging to the highest quartile of lumbar bone loss, 81% of those examined would have to be referred to a clinic capable of spine densitometry, leading to a PPV of only 27% (Table 6). At this sensitivity level, the total hip did not perform significantly better than the forearm. If a sensitivity of 80% rather than 90% was accepted, 69% would require referral. The PPV in this scenario was 28%. The number referred within the 80% sensitivity scenario would be significantly lower (p < 0.001) if the total hip were to be used, requiring referral of 53% and offering a slightly better PPV of 35%.
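The referral scenario can be made concrete with a short sketch: participants are ranked by loss at the screening site, and referral proceeds from the fastest losers downward until the target share of the highest quartile of spinal loss has been captured. This is an assumed reading of the scenario rather than the authors' code; the function name and the eight example values are hypothetical, and ties in the ranking are broken arbitrarily.

```python
def referral_scenario(screen_change, spine_change, target_sensitivity):
    """Fraction referred and PPV when screening-site loss selects women for spine DXA.

    Changes are per participant, with more negative values meaning faster loss.
    'Fast losers' are the quarter of participants with the most negative spine change.
    """
    n = len(spine_change)
    if n < 4:
        raise ValueError("need at least four participants to form a quartile")
    by_spine = sorted(range(n), key=lambda i: spine_change[i])
    fast_losers = set(by_spine[: n // 4])
    by_screen = sorted(range(n), key=lambda i: screen_change[i])
    found = 0
    for referred, i in enumerate(by_screen, start=1):
        if i in fast_losers:
            found += 1
        sensitivity = found / len(fast_losers)
        if sensitivity >= target_sensitivity:
            return {
                "referred_fraction": referred / n,
                "ppv": found / referred,
                "sensitivity": sensitivity,
            }
    raise ValueError("target sensitivity must not exceed 1")


# Invented changes (% of baseline) for eight untreated women; not study data.
spine = [-8, -7, -3, -2, -1, 0, 1, 2]
forearm = [-5, -1, -4, -3, 0, 1, 2, 3]
all_fast_losers = referral_scenario(forearm, spine, 1.0)
half_fast_losers = referral_scenario(forearm, spine, 0.5)
```

The trade-off seen in the study appears even in this toy example: raising the sensitivity target forces referral of a larger share of the clinic population and dilutes the PPV.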
Table 6. Diagnostic Performance of the UD Forearm and Total Hip in a Selection Scenario for Bone Loss
Our study shows that the agreement between bone loss rates at different anatomic sites is incomplete and that DXA of the forearm cannot adequately substitute for axial DXA measurements in monitoring bone loss rates in early postmenopausal women. The agreement between changes was best between the spine and whole body and between the spine and the total hip. Concordance increased with longer observation times, but the changes at the forearm consistently showed lower agreement than the other sites. Although the UD forearm generally is considered very responsive to HRT,(7, 8) the best discrimination between women receiving or not receiving HRT was provided by measurements of whole body and spine and, to a lesser degree, the total hip. Identification of treatment status was significantly less efficient using the forearm or the femoral neck. Moreover, loss rates at the forearm performed poorly as a diagnostic test for selecting participants for additional spine densitometry, requiring referral of two-thirds of the patients to obtain a sensitivity of 80% for fast loss (highest quartile) at the spine. The discordance between rates of change across the anatomic regions was particularly pronounced in women who were not receiving treatment.
BMD is a strong risk factor for fracture. Thus, in prospective studies, the relative risk of fracture for a 1 SD decrease in BMD ranges between 1.4 and 2.6, depending on fracture location and measurement site.(12) The most important osteoporotic fractures, both in terms of health expenditures(13) and excess mortality,(14) are those of the hip and the spine. The best prediction of hip fracture is provided by DXA of the hip or spine, whereas the forearm is a poorer predictor.(12, 15, 16) Age, falls, and preexisting fractures influence the risk of fracture independently of BMD. For diagnostic purposes, the World Health Organization (WHO) working group criteria in 1994 proposed a fixed, not age-adjusted, BMD threshold for the diagnosis of osteoporosis in women, defined as a BMD 2.5 SD or more below the young adult mean (T score ≤ −2.5).(17) However, T scores vary greatly between anatomic sites. This is particularly pronounced in perimenopausal women,(1, 2) in whom the choice of anatomic region determines the apparent prevalence of osteoporosis. Thus, the prevalence of osteoporosis increases by 50% when all femoral sites are considered as opposed to the spine alone.(3) Again, the correlation between BMD of the forearm and spine or hip is particularly low, and we have previously found that one-third of perimenopausal women classified as normal based on forearm DXA have osteoporosis or osteopenia of the spine or hip.(3)
Two important points can be made regarding this discordance. First, the population classified as having osteoporosis when one site is measured is not the same population as the one found to be osteoporotic when another site is chosen. Second, rates of bone loss differ substantially between the anatomic regions in the same individual. Therefore, the purpose of this study was to evaluate the use of DXA in patient follow-up, both in the assessment of response to therapy and in the monitoring of bone loss in untreated women. DXA of the forearm has particularly good precision(7) and accuracy(18) and can be measured on relatively inexpensive and portable equipment. Forearm BMD—albeit measured by single-photon absorptiometry—also has been shown to predict long-term fracture risk.(19) Nevertheless, agreement with BMD at other sites is low,(3) and the discrimination between patients with established osteoporosis and healthy subjects is poorer than that provided by spine or hip.(20) In this study, we found that the agreement between loss rates required longer follow-up times at the forearm to obtain fair agreement (defined as a κ-coefficient above 0.2) than for the other anatomic regions examined. The agreement was poorer in untreated women than in those receiving HRT. This indicates that the sites vary less in their response to HRT than in their spontaneous bone loss rates after menopause. Even so, the rates of change overlapped and no region offered a complete separation of treated and untreated participants even after 5 years. Whole-body DXA performed well, both in the identification of treatment status and in the overall correlation with spine and hip. The agreement with spine and total hip was significantly better than that provided by the forearm. It can be argued that the spine and hip contribute to the whole-body DXA measurement, so these results are not strictly independent. 
Although this reservation is mathematically valid, it is of little practical consequence because the total hip and spine each contribute <2% to the whole-body bone mineral content (BMC).
Based on the present findings, the use of forearm densitometry as a cheaper means for follow-up of untreated women cannot be recommended. Although observing a significant loss in UD forearm BMD at 5 years indicates an 80% probability of a simultaneous significant loss in spine BMD, this information is not really clinically useful because the a priori probability is approximately 70%. Because of low sensitivity (Table 3), the decline in forearm BMD falls short of statistical significance in one-half of those who have a significant loss in BMD at the spine over 5 years. Using the total hip instead of the forearm reduces the number of missed cases to one in three, which is better but still not acceptable. The results are similar if the aim is to screen not for significance of bone loss (ΔBMD exceeding 1.96√2 × CV) but for particularly accelerated bone loss (highest quartile of ΔBMD). Thus, two-thirds of untreated perimenopausal women followed in a forearm-only clinic would need to be reassessed using DXA of the spine to identify 80% of those in the highest quartile of spinal bone loss. Further, the remaining one-third would still have a 16% risk of belonging to the highest quartile, and in those referred only 28% would indeed belong to this quartile. A significantly lower number of patients would need referral if the changes at the total hip were used in selection at the same level of sensitivity. In addition, the forearm proved slightly less efficient at detecting the presence of HRT than the other regions. We recommend the use of the spine and total hip both for monitoring the response to HRT and for decision-making in untreated patients. DXA of the forearm is not a good surrogate measurement for this purpose. The role of whole-body BMD is not yet established, but the current study indicates that this measurement modality is not only responsive but also shows fair agreement with spine and hip.
There are some limitations to this study. First, the only treatment evaluated was HRT. Anatomic sites exhibit differences in their response to growth hormone,(21) PTH,(22, 23) and bisphosphonates.(24) Thus, for other treatments, additional work certainly is required to determine if the best separation of treated and untreated patients would be offered by the same sites as for HRT. Second, the change in BMD during antiresorptive therapy only explains about 20% of the decrease in fracture rate in large clinical trials. This can be accounted for only in part by measurement errors of DXA, which become particularly important in studies of short duration. It is possible but not proven that the reduction in bone remodeling per se leads to decreased fracture risk, perhaps because of prevention of trabecular perforations. The low explanatory power of ΔBMD on reduced fracture risk also may be caused by nonskeletal effects of therapy, as well as baseline differences in skeletal fragility and the possibility that the relationship between BMD and fracture risk may not be truly bidirectional.(25) In theory, antiresorptive effects would be easier to detect at sites with a large proportion of trabecular bone, such as the spine, trochanter, and UD forearm. Whole-body BMD is largely made up of cortical bone; yet it provided superior discrimination between treated and untreated participants in the current study.
In conclusion, the forearm is not an acceptable substitute for the spine and hip when repeated DXA measurements are required for clinical decision-making in peri- or early postmenopausal women. Bone loss at the forearm has low sensitivity for bone loss at other sites, and the forearm is less efficient than the spine or the total hip at detecting the presence of HRT.
The authors are grateful to all technicians and secretarial staff who contributed to the study. The DOPS is financially supported by grants from the Karen Elise Jensen Foundation and from Novo Nordisk Farmaka (Lyngby, Denmark). Participating centers include the Aarhus University Hospital, Professor L. Mosekilde (center leader) and Dr. P. Charles; Copenhagen Municipal Hospital, Dr. O.H. Sørensen (center leader); Hillerød Central Hospital, Dr. S.P. Nielsen (center leader); and Odense University Hospital, Professor H. Beck-Nielsen (center leader).
Presented in part as an abstract at the 7th Bath Conference on Osteoporosis, Bath, UK, April 2000.