Association of Five Quantitative Ultrasound Devices and Bone Densitometry With Osteoporotic Vertebral Fractures in a Population-Based Sample: The OPUS Study

Authors

  • Dr Claus C Glüer,

    Corresponding author
    1. Medical Physics, Department of Diagnostic Radiology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
    • Medizinische Physik, Klinik für Diagnostische Radiologie, UKSH, Campus Kiel, Michaelisstr. 9, D-24105 Kiel, Germany
    • Dr Barkmann serves as a consultant. Dr Eastell received research funding from IGEA. Dr Glüer served as a consultant for IGEA. Dr Reid received grants from Procter & Gamble and served as a consultant for Eli Lilly and Company, Procter & Gamble, Novartis, and Roche. All other authors have no conflict of interest.

  • Richard Eastell,

    1. University of Sheffield Clinical Sciences Centre, Sheffield, United Kingdom

  • David M Reid,

    1. Department of Medicine and Therapeutics, University of Aberdeen, Aberdeen, United Kingdom

  • Dieter Felsenberg,

    1. Diagnostic Radiology, Free University Berlin, Berlin, Germany
  • Christian Roux,

    1. Centre d'Evaluation des Maladies Osseuses, Service de Rhumatologie, Assistance-Publique, Hopitaux de Paris, René Descartes University, Paris, France
  • Reinhard Barkmann,

    1. Medical Physics, Department of Diagnostic Radiology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany

  • Wolfram Timm,

    1. Medical Physics, Department of Diagnostic Radiology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
  • Tilo Blenk,

    1. Diagnostic Radiology, Free University Berlin, Berlin, Germany
  • Gabi Armbrecht,

    1. Diagnostic Radiology, Free University Berlin, Berlin, Germany
  • Alison Stewart,

    1. Department of Medicine and Therapeutics, University of Aberdeen, Aberdeen, United Kingdom
  • Jackie Clowes,

    1. University of Sheffield Clinical Sciences Centre, Sheffield, United Kingdom
  • Friederike E Thomasius,

    1. Diagnostic Radiology, Free University Berlin, Berlin, Germany
  • Sami Kolta

    1. Centre d'Evaluation des Maladies Osseuses, Service de Rhumatologie, Assistance-Publique, Hopitaux de Paris, René Descartes University, Paris, France

Abstract

We compared the performance of five QUS devices with DXA in a population-based sample of 2837 women. All QUS approaches discriminated women with and without osteoporotic vertebral fractures. QUS of the calcaneus performed as well as central DXA.

Introduction: Quantitative ultrasound (QUS) methods have found widespread use for the assessment of bone status in osteoporosis, but their optimal use remains to be established. To determine QUS performance for current devices in direct comparison with central DXA, we initiated a large population-based investigation, the Osteoporosis and Ultrasound Study (OPUS).

Materials and Methods: A total of 463 women 20–39 years of age and 2374 women 55–79 years of age were measured on five different QUS devices along with DXA of the spine and the proximal femur. Their vertebral fracture status was evaluated radiographically. The association of QUS and DXA with vertebral fracture status was evaluated using logistic regression.

Results: All QUS approaches tested discriminated women with and without osteoporotic vertebral fractures (20% height reduction). Age-adjusted standardized odds ratios ranged 1.2–1.3 for amplitude-dependent speed of sound (AD-SOS) at the finger phalanges, 1.2–1.4 for broadband ultrasound attenuation (BUA) at the calcaneus, 1.4–1.5 for speed of sound (SOS) at the calcaneus, 1.4–1.6 for DXA of the total femur, and 1.5–1.6 for DXA of the spine. For more severe fractures (40% height reduction), age-adjusted standardized odds ratios increased to as much as 1.9 for DXA of the spine and 2.3 for SOS of the calcaneus.

Conclusions: All five QUS devices tested showed significant age-adjusted differences between subjects with and without vertebral fracture. When selecting the strongest variable, QUS of the calcaneus worked as well as central DXA for identification of women at high risk for prevalent osteoporotic vertebral fractures. QUS-based case-finding strategies would allow halving the number of radiographs in high-risk populations, and this strategy works increasingly well for women with more severe vertebral fractures. It is likely that the good performance of QUS was in part achieved by rigorous quality assurance measures that should also be used in clinical practice.

INTRODUCTION

Quantitative ultrasound (QUS) methods have found widespread use in the assessment of bone status in osteoporosis.(1–4) Lower cost and lack of ionizing radiation have facilitated dissemination and enhanced acceptance by patients and physicians. Prospective studies have demonstrated that risk of fracture of the proximal femur,(5–8) the vertebrae,(9–11) and other sites(10,12-16) can be predicted by QUS, with standardized risk ratios at least comparable with other peripheral measurement approaches, and in some studies, even similar to central bone densitometry methods.(6) Effective prevention programs for osteoporosis require quick, inexpensive diagnostic methods suited for widespread use. Validating QUS-based assessment of osteoporosis thus offers the opportunity to reduce the medical and economic burden of this debilitating disease, provided that cost-effective strategies for identifying patients at high risk for fracture could be developed.(17,18)

Despite its proven advantages, the use of QUS remains controversial. While it is undisputed that QUS methods can be used to assess fracture risk, it is unclear how they can be used for the diagnosis of osteoporosis and how patients who would benefit most from treatment could be selected based on QUS. Technological diversity among QUS approaches complicates the validation process. Many new devices have been introduced that differ from those machines used in the early prospective studies. Hence, it is not clear whether the current technologies meet or exceed performance standards of those early approaches. Monitoring performance has been inconsistent because of lower longitudinal sensitivity and the lack of validated quality assurance methods to control equipment stability.

To gather the data required to address the issues raised above and to provide clear guidelines for appropriate clinical use of QUS methods, we initiated the Osteoporosis and Ultrasound (OPUS) study. The performance of QUS methods can only be judged if they are directly compared with the performance of competing diagnostic methods. Therefore, those diagnostic methods, for example, laboratory markers of bone turnover, genetic markers, X-ray-based bone densitometry techniques, functional performance tests, and clinical risk factor questionnaires, were also incorporated in the OPUS study protocol. The OPUS participants were selected from random population samples to collect reference data and to allow conclusions to be drawn regarding the use of QUS and the other methods in the general population or in defined subsets thereof.

In this study, we report the baseline visit of OPUS. To assess the performance of QUS methods, we studied the association with prevalent vertebral fractures and compared it with bone densitometry.

MATERIALS AND METHODS

Recruitment

Five European centers participated in the OPUS study: Aberdeen (Department of Medicine and Therapeutics, University of Aberdeen, Aberdeen, UK), Berlin (Diagnostische Radiologie, Klinikums Benjamin Franklin der Freien Universität Berlin, Berlin, Germany), Kiel (Medizinische Physik, Klinik für Diagnostische Radiologie, Universitätsklinikum Schleswig-Holstein, Campus Kiel, Kiel, Germany), Paris (Centre d'Evaluation des Maladies Osseuses, Service de Rhumatologie, Assistance-Publique, Hôpital Cochin, Université René Descartes, Paris, France), and Sheffield (Division of Clinical Sciences, Northern General Hospital, Sheffield, UK). The study was coordinated by the “Medizinische Physik” in Kiel, Germany. All investigations were conducted in accordance with the Declaration of Helsinki and were approved by the appropriate institutional human research committee at each participating center.

Participants of the OPUS study were recruited from random population samples between April 1999 and April 2001. In Germany, subjects were randomly selected from government-provided registers (“Einwohnermeldeamtslisten”). Subjects were initially contacted by mail. A similar procedure was followed in France, using registers of a complementary health insurance system. In Sheffield, we worked with several general practices and sent out letters of invitation to all women on their lists who met our inclusion criteria. In Aberdeen, women were selected randomly from a population health register of patients living within a 25-km radius of the city.

In this first contact, subjects were asked to fill out a short questionnaire and to state whether they would be interested in participating in the examination at the local hospital. Subjects who expressed interest were contacted by phone, and the study visit was scheduled. “No-shows” were re-contacted by phone to arrange a new appointment; subjects who again did not show up were excluded from the study.

As recruitment progressed, the response rates stratified by 5-year age groups were monitored. To achieve a homogeneous distribution across the age range to be covered, the age distribution of remaining mail contacts was adjusted to enhance recruitment from age groups that were initially under-represented.

We included women of two different age segments: 20-39 years of age (“younger women”) or 55-79 years of age (“older women”). Exclusion criteria were limited to disorders that precluded valid QUS measurements (i.e., bilateral fractures of the calcaneus, bilateral hip prostheses, disorder of the hand), general inability to undergo the specified exams, and cognitive limitations that preclude filling out self-administered questionnaires. Pregnant women were excluded because of potential risks associated with X-ray exposure.

Examinations and questionnaires

The visit involved a large number of examinations and questionnaires and took ∼5 h per participant.

Ultrasound measurements

QUS was obtained on five different QUS devices. For measurements at the calcaneus, we used the following: Achilles+ (GE Lunar, Madison, WI, USA), UBIS 5000 (Diagnostic Medical Systems, Montpellier, France), DTU-one (OSI/Osteometer Meditech, Hawthorn, CA, USA), and QUS-2 (Quidel/Metra, San Diego, CA, USA). For measurements at the finger phalanges, we used the DBM Sonic BP (IGEA, Carpi, Italy).

The following QUS variables were evaluated: speed of sound (SOS) on the Achilles+, UBIS 5000, and DTU-one; broadband ultrasound attenuation (BUA) on all four calcaneus devices; stiffness index as a linear combination of BUA and SOS on the Achilles+; amplitude-dependent SOS (AD-SOS); and as secondary variables, bone transmission time (BTT) and ultrasound bone profile index (UBPI) on the DBM Sonic BP device.

Each measurement was performed twice on each device, with interim repositioning of the subject. If the two measurements deviated by more than a predefined device-specific threshold, a third measurement was obtained. The threshold was set to three times the estimated precision error (1.5% for stiffness of the Achilles+; 1.7 and 4 m/s for SOS of the DTU-one and UBIS 5000, respectively; 1, 0.5, and 2 dB/MHz for BUA of the DTU-one, UBIS 5000, and QUS-2, respectively; and 10 m/s for AD-SOS of the DBM Sonic BP). This resulted in the following thresholds: 4.5 for stiffness of the Achilles+; 5 and 12 m/s for SOS of the DTU-one and UBIS 5000, respectively; 3, 1.5, and 6 dB/MHz for BUA of the DTU-one, UBIS 5000, and QUS-2, respectively; and 30 m/s for AD-SOS of the DBM Sonic BP. Please note that the unstandardized precision errors should not be used to judge technique performance: they cannot be compared without standardization according to the response rate of the variables, a topic that is beyond the scope of this paper. To obtain the final result for any variable of a given patient, we averaged two results; if a third measurement had been taken, the two closest results were averaged.
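The repeat-measurement rule described above can be sketched in a few lines of Python. This is an illustration only, not the study software: the function names and the dictionary layout are hypothetical, the thresholds are the values quoted in the text, and the tie-breaking rule (the first closest pair wins) is an assumption the text does not specify.

```python
# Thresholds quoted in the text (three times the estimated precision error).
# The dictionary layout and all function names are hypothetical.
THRESHOLDS = {
    ("Achilles+", "stiffness"): 4.5,
    ("DTU-one", "SOS"): 5.0,            # m/s
    ("UBIS 5000", "SOS"): 12.0,         # m/s
    ("DTU-one", "BUA"): 3.0,            # dB/MHz
    ("UBIS 5000", "BUA"): 1.5,          # dB/MHz
    ("QUS-2", "BUA"): 6.0,              # dB/MHz
    ("DBM Sonic BP", "AD-SOS"): 30.0,   # m/s
}


def needs_third(first, second, threshold):
    """True if the two readings differ by more than the device threshold."""
    return abs(first - second) > threshold


def final_result(readings):
    """Average two readings, or the two closest of three readings."""
    if len(readings) == 2:
        return sum(readings) / 2.0
    if len(readings) != 3:
        raise ValueError("expected two or three readings")
    a, b, c = readings
    pairs = [(a, b), (a, c), (b, c)]
    # Assumption: if two pairs are equally close, the first one listed is used.
    x, y = min(pairs, key=lambda p: abs(p[0] - p[1]))
    return (x + y) / 2.0
```

For example, two DTU-one SOS readings of 1540 and 1548 m/s differ by more than 5 m/s, so a third reading would be taken; with a third reading of 1546 m/s, the two closest values (1548 and 1546) are averaged.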

Bone densitometry

Bone densitometry was performed using DXA of the spine and the proximal femur in postero-anterior projection (Hologic QDR-4500; Hologic, Bedford, MA, USA in the Kiel, Paris, and Sheffield centers) or in antero-posterior projection (Lunar Expert devices; GE Lunar, Madison, WI, USA in the Aberdeen and Berlin centers).

Radiography

Vertebral fracture status was determined for all women in the group of older women. Lateral spinal radiographs of the thoracic (breathing technique) and the lumbar spine were obtained using standardized procedures and were centrally evaluated in the center in Berlin.

Other examinations

A number of other tests were performed that are not described in detail here. These included lateral imaging of vertebral deformities on the DXA devices, laboratory blood and urine assessments, and functional tests of muscle status, balance, and pulse rate.

Questionnaires

Each participant filled out a number of questionnaires. The “OPUS risk factor questionnaire,” a modified version of the EVOS risk factor questionnaire of the European Vertebral Osteoporosis Study,(19) was administered in interview fashion. It includes biographical questions, aspects of family history of osteoporosis, medical history (with a focus on fractures and falls), medications known to affect skeletal metabolism, nutrition and lifestyle aspects, etc. To assess health-related quality of life, a number of validated instruments and additional standardized questionnaires were used in self-administered fashion. Validated instruments included the “Qualeffo”(20) and the symptoms domain of the “OQLQ”(21) as osteoporosis-specific questionnaires and the “EuroQol”(22) and the “SF12”(23) as generic quality of life questionnaires. Additionally, a Generalized Anxiety Questionnaire(24) was used in three of the centers (Berlin, Kiel, and Paris). Because little data from these questionnaires are used here, no further details are provided.

Quality assurance and standardization

Great care was taken to obtain results in standardized fashion and according to the manufacturers' specifications. Standard operating procedures were defined in written form for QUS, DXA, and radiography. Detailed guidelines on how to administer the questionnaires were developed and discussed. To train the local study coordinators and key personnel, a central start-up meeting was held in Kiel, Germany, in March 1998 (before study start). In addition, a number of technical quality control and standardization measures were implemented.

Ultrasound measurements

The quality of the study procedures was assured by rigid quality control measures. These include the following components:

• Stability of the QUS equipment was monitored at the local centers by daily measurement of the manufacturer-provided phantom plus weekly measurements of bone-equivalent QUS standards (the Leeds phantoms(25)) according to predefined routines.(26)

• Performance of the operators was assessed by the QUS coordinator of the study (RB). As a follow-up to the start-up training, he visited the centers twice during the course of the study. During these visits, he performed cross-calibration measurements using a set of CIRS cross-calibration phantoms.(26) Measurements on RB were also obtained.

Because QUS standardization measures have not been validated so far, the results reported here were not adjusted according to cross-calibration data. Assessment of the validity of such standardization measures is currently ongoing and will be reported independently. For this study, only descriptive observations regarding equipment performance will be given to show that the devices were in proper working condition.

Bone densitometry

For DXA, quality assurance concepts and quality control procedures are well established.(27,28) DXA results were corrected for longitudinal changes (based on daily measurements of the European Spine Phantom(29)) and differences among centers according to published methods.(28) Results of the same brand were adjusted according to the cross-calibration phantom data, whereas results of different brands were standardized by expressing DXA results as standardized BMD (sBMD).(30,31) For standardized BMD of the lumbar spine (sBMDspi), the total BMD of vertebrae L2-L4 was evaluated. Subjects in whom fewer than two vertebrae could be evaluated were excluded from the analyses. For standardized BMD of the hip (sBMDhip), the total BMD of the proximal femur was evaluated.

Fracture assessment

Radiographs were taken according to the specifications of the standard operating procedures and were evaluated centrally by two radiologists (TB and GA). The procedure to assess fracture status combined morphometric measurements of vertebral height ratios and the qualitative interpretation of fracture status by a radiologist. In this way, osteoporotic fractures could be distinguished from deformities due to other causes, using established criteria.(32–34) For all vertebrae considered deformed in a fashion typical of osteoporosis, the shape and the magnitude of deformation (>20% height reduction) were noted. While the morphometric and the qualitative evaluations were performed at the same time, the grading was established independently. Discrepant results were observed in only two cases, and here the results of the quantitative morphometry were used; that is, the two cases were not considered fractured despite their status of “fractured” based on qualitative radiological reading.

Degenerative changes in the spine

In addition to the fracture assessment, all patients were also evaluated with regard to presence of degenerative changes in the thoracic or lumbar spine. Grading was performed according to the Kellgren score,(35) ranging from 1 (no degenerative changes) to 4 (severe degenerative changes). Separate scoring was done for the lumbar and the thoracic spine. In patients with Kellgren scores of 3 or 4 for the lumbar spine, it may not have been possible to obtain accurate spinal bone density results using DXA because of the overlying ossifications.

Questionnaires

For the questionnaires, the accuracy of the translation had to be assured. The questionnaires used in OPUS either consisted of instruments previously validated for the English, German, and French languages,(19,36,37) or they were translated and back-translated by the OPUS team.

Statistical methods

All statistical evaluations, unless otherwise noted, were carried out using JMP software (SAS Institute, Cary, NC, USA). Goodness of fit tests for normal distributions were based on χ2 tests. Tests for differences among multiple groups were based on the Tukey-Kramer HSD test. The association of DXA or QUS variables with vertebral fracture status was analyzed using logistic regression analysis. For most variables, the distribution was close to normal, and for consistency, we have treated all statistics parametrically here. Results were expressed as standardized odds ratios (sORs), that is, the increase in the odds of fracture per 1 population SD decrease in the respective DXA or QUS variable. For age-adjusted ORs, the SE of the estimate (SEE) of linear age-related decreases was used for standardization. Differences in the discriminatory power of techniques were analyzed by receiver operating characteristics (ROC) analysis using the ROCKIT software(38) and were based on a two-sided test comparing the areas under the curves (AUC). Using multivariate logistic regression analysis, we also investigated whether a combination of several QUS or DXA variables would improve the ability to identify subjects at high risk of prevalent vertebral fractures. A p value of less than 0.05 was considered significant, and a trend was defined as 0.05 ≤ p < 0.20.
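As an illustration of the standardization described above, the following Python sketch converts a logistic regression coefficient into an odds ratio per one-unit-of-scale decrease, where the scale is either the population SD (unadjusted) or the SEE of a linear regression of the variable on age (age-adjusted). It is a minimal sketch under assumptions, not the JMP procedure used in the study: the function names are hypothetical, the coefficient is assumed to be expressed per measurement unit of the variable, and the conversion sOR = exp(-beta × scale) is the usual one for a "per decrease" odds ratio.

```python
import math


def see_of_linear_fit(ages, values):
    """Standard error of the estimate (residual SD) of `values` regressed on `ages`."""
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ages, values))
    # n - 2 degrees of freedom: slope and intercept were estimated.
    return math.sqrt(rss / (n - 2))


def standardized_odds_ratio(beta, scale):
    """Odds ratio per one `scale` DECREASE in the variable.

    `beta` is the logistic regression coefficient per measurement unit;
    `scale` is the population SD or, for age-adjusted models, the SEE.
    """
    return math.exp(-beta * scale)
```

With a hypothetical coefficient of -0.05 per unit and a scale of 10 units, the sketch returns exp(0.5), or about 1.65, meaning the odds of fracture rise by roughly 65% per scale-unit decrease.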

RESULTS

Baseline characteristics and confounders

For this report, we present results from both the group of younger and older women. For the group of younger women recruited, we report data from those 463 who were 20-39 years of age at the time of the study visit, had a valid DXA measurement of the femur, and had filled out the questionnaires. For the group of older women, we report results from those 2374 who were 55-79 years of age at the time of the study visit, had a valid DXA measurement of the femur, had radiographs of sufficient quality to allow assessment of vertebral fracture status, and had filled out the questionnaires. Ninety-nine percent of the women were of white ethnicity. A total of 379 (16.0%) of the older women had one or more vertebral osteoporotic fractures, and 147 (6.2%) of them had multiple osteoporotic fractures. Among the 379 women with osteoporotic vertebral fractures (i.e., a vertebral height reduction of more than 20%), the height reduction of the most severely affected vertebra exceeded 25% in 270 participants (11.4%), 30% in 214 participants (9.0%), 35% in 174 participants (7.3%), and 40% in 135 participants (5.7%). Lumbar Kellgren scores showed the following distribution: 170 women (7.1%) grade 1, 946 (39.8%) grade 2, 894 (37.7%) grade 3, and 364 (15.3%) grade 4. Table 1 lists the baseline characteristics of the two groups. Numbers of examinations differ among variables because two devices were not installed in Paris (Achilles+ and QUS-2) and because of temporary equipment breakdown.

Table 1. Population Characteristics of Participants of the OPUS Study

In both the younger and the older group, we noted significant differences among centers in height, weight, and body mass index (BMI). Height was greatest in Kiel and Berlin; the younger women at the German sites were 3.6-5.6 cm taller than women at the three other sites. Among younger women, weight was highest in Sheffield and lowest in Paris, with a significant difference of 7.3 kg between these sites; among older women, weight was lowest in Paris, significantly lower by 5.9-7.5 kg than at any other site. BMI was highest in Sheffield in both younger and older women, significantly higher by 1.4-2.0 kg/m2 in younger women and 1.2-2.9 kg/m2 in older women than at the three sites outside the UK; in older women, BMI was lowest in Paris, significantly lower by 1.4-2.9 kg/m2 than at any other site.

Fracture discrimination

When relating DXA and QUS results with prevalence of vertebral fractures, all methods showed significant associations. Table 2 (column 3) shows unadjusted sORs (per 1 SD decrease, using the SD of the group of older women). For the relationship of DXA and QUS with fracture prevalence, age was a significant confounder for all variables tested. Results for age-adjusted logistic regression are also summarized in Table 2, separating data for the maximum size set of subjects for any given DXA or QUS variable (columns 2 and 4-6) and data obtained on the subset of the 1265 women with complete information on all DXA and QUS variables assessed (columns 7-9). The ranking of techniques was virtually identical for age-adjusted and unadjusted models (comparing columns 3 and 6). However, age-adjusted models showed better discrimination of fractured and unfractured individuals than unadjusted models, reflecting the independent contribution of age and the respective BMD or QUS variable.

Table 2. Unadjusted and Age-adjusted Prediction of Vertebral Fracture Status

After adjusting for age, the following results were observed for additional confounding variables. Neither height nor weight nor BMI by themselves showed any age-adjusted association with fracture status. As a confounder in the association of DXA or QUS variables with fracture status, height again did not contribute independently. Weight, however, was a significant confounder for the association of age-adjusted DXA of the spine (p < 0.02) and age-adjusted DXA of the total hip (p < 0.003) with fracture prevalence. After adjusting for age and the respective BMD variable, fracture risk increased, with an OR of 1.2 per SD (12.4 kg) increase in weight. BMI also was a significant confounder for fracture status assessed by age-adjusted DXA of the spine and DXA of the total hip (p < 0.03 and p < 0.001, respectively): fracture risk increased with ORs of 1.15 and 1.25 per SD (4.5 kg/m2) increase in BMI, respectively. Neither height, weight, nor BMI showed a significant effect for the association of the QUS variables studied with vertebral fracture status. Differences among centers also did not affect any of the associations of DXA or QUS with vertebral fracture status. A separate analysis of women with different levels of degenerative change in the lumbar spine revealed slightly higher ORs for DXA of the spine if those 1258 women with lumbar Kellgren scores greater than 2 were excluded. The standardized age-adjusted OR increased to 1.60 but only if the population SD of the entire data set was still used. If the SD was calculated only for the subset with lumbar Kellgren scores of 2 or less, the standardized age-adjusted OR returned to the value of 1.55 listed in Table 2. Consequently, all further analyses were performed without excluding subjects with higher lumbar Kellgren scores.

The age-adjusted associations of the various variables with vertebral fracture prevalence were also analyzed by ROC analysis in the 1265 women with complete data sets (areas under the ROC curves listed in Table 2). Only small differences were observed (range AUC, 0.65-0.67). Compared with DXA of the hip or spine, no significant differences were observed for any of the QUS variables. SOS of the Achilles+ had a significantly higher AUC compared with BUA of the UBIS 5000, and SOS of the UBIS 5000 had a significantly higher AUC compared with BUA of the Achilles+. No other significant differences were observed among QUS variables. However, when comparing SOS and BUA obtained on the same device, SOS always showed somewhat better results, with AUCs at least as good as those observed for DXA (see also data on combinations of variables reported below). The secondary QUS variables available on the DBM BP (BTT and UBPI) did not offer any improvement over AD-SOS. The stiffness index provided on the Achilles+ showed performance levels between those observed for BUA and SOS on this device.
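To make the AUC figures above concrete, the following Python sketch computes an empirical (nonparametric) area under the ROC curve. Note that this is a swapped-in technique for illustration: the study used the ROCKIT software, and the function name and the use of model-estimated risk as the score are assumptions.

```python
def empirical_auc(scores_fractured, scores_unfractured):
    """Empirical AUC: the probability that a randomly chosen fractured subject
    has a higher risk score than a randomly chosen unfractured subject.
    Ties count as one half (Mann-Whitney formulation)."""
    wins = 0.0
    for f in scores_fractured:
        for u in scores_unfractured:
            if f > u:
                wins += 1.0
            elif f == u:
                wins += 0.5
    return wins / (len(scores_fractured) * len(scores_unfractured))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation; on this scale, the reported range of 0.65-0.67 indicates that the methods differed little from one another.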

To improve the statistical power for detecting differences in performance among QUS variables, AUCs were also calculated for the data set of maximal size for each pair of variables. Again, equivalent performance compared with DXA of the spine and hip was observed for SOS of the Achilles+, SOS of the UBIS 5000, and BUA of the QUS-2 (with differences in AUC ranging between 0.01 and 0.018). For the other variables, trends or even significant differences in AUC compared with DXA of the spine and the hip were found: SOS of the DTU-one (p < 0.05 and p < 0.04), BUA of the Achilles+ (p < 0.02 and p < 0.06), BUA of the DTU-one (p < 0.003 and p < 0.002), BUA of the UBIS 5000 (p < 0.11 and p < 0.12), and AD-SOS of the DBM BP (p < 0.002 and p < 0.001). For these variables, AUC differences between DXA and QUS ranged between 0.018 and 0.053. AUC differences among QUS variables were generally smaller and not significant. SOS again showed somewhat better performance compared with BUA obtained on the same device, but significantly better performance was observed only for SOS of the Achilles+ compared with BUA of the DTU-one and for SOS of the UBIS 5000 compared with AD-SOS of the DBM BP.

We also analyzed the discriminatory power of the various techniques for vertebral deformities considered to be solely of degenerative origin. Eighty cases showed only degenerative deformities and no osteoporotic fractures, and 95% of them had Kellgren scores greater than 2 in either the thoracic or in the lumbar region. When we compared those 80 cases with cases without deformities, no significant age-adjusted difference was observed for any of the DXA or QUS variables.

Fracture detection efficiency

Bone densitometry and QUS methods could potentially be used to identify women at highest risk of having a vertebral fracture—most likely unknown to them. For these women, spinal radiography might be justified. To study whether the observed differences in the gradients of risk would result in substantial performance differences among devices, we calculated the sensitivity of the techniques to identify women with prevalent osteoporotic vertebral fractures. Sensitivity was plotted as a function of the X-ray referral rate ranging from 0% to 100%. An X-ray referral rate of 20%, for example, would mean that those 20% of the subjects with the highest estimated vertebral fracture risk (calculated from the age-adjusted logistic regression analysis of the respective DXA or QUS variable under study) would be referred for further radiographic fracture examination. Based on the diagnostic result of the radiographic assessment, the sensitivity and specificity of BMD or QUS variables for predicting a patient's true fracture status can be calculated.

Results for two BMD variables and three selected QUS variables are displayed in Fig. 1. The presentation mode is similar to ROC curves, but for ease of interpretation here, the “X-ray referral rate” was directly used as an independent variable instead of “1 − specificity”; the maximal differences between X-ray referral rate and 1 − specificity are ∼4%. As for ROC curves, the performance of a purely random selection of participants to be referred to an X-ray exam would be represented by the diagonal. The vertical distance between any of the curves and the diagonal represents the increase in the rate of vertebral fractures detected. As seen in Fig. 1, the increase in sensitivity typically ranges around 20%.
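The referral-rate analysis can be sketched as follows in Python. This is an illustrative reading of the procedure rather than the authors' code: the function name is hypothetical, the risks are assumed to come from an age-adjusted logistic regression model, and rounding the number of referred subjects to the nearest integer is an assumption.

```python
def sensitivity_at_referral_rate(risks, fractured, referral_rate):
    """Share of all fractured subjects found among the `referral_rate`
    fraction of subjects with the highest estimated fracture risk."""
    total_fractured = sum(1 for has_fx in fractured if has_fx)
    if total_fractured == 0:
        raise ValueError("no fractured subjects in the sample")
    # Assumption: the number referred is rounded to the nearest integer.
    n_referred = int(round(referral_rate * len(risks)))
    ranked = sorted(zip(risks, fractured), key=lambda p: p[0], reverse=True)
    detected = sum(1 for _, has_fx in ranked[:n_referred] if has_fx)
    return detected / total_fractured
```

Under random selection the expected sensitivity equals the referral rate, so the gain over the diagonal is `sensitivity_at_referral_rate(...) - referral_rate`.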

FIG. 1. Sensitivity of QUS variables compared with bone densitometry for identifying subjects with prevalent vertebral fractures. Sample results shown for two BMD and three QUS variables: BMD of the lumbar spine, BMD of the total proximal femur, SOS of the Achilles+, BUA of the Quidel/Metra QUS-2, and AD-SOS of the IGEA DBM Sonic BP. Results are based on a subset of 1265 women for whom all variables were obtained completely and were calculated from age-adjusted logistic regression analysis of the respective DXA or QUS variable.

Another way to represent the performance is depicted in Fig. 2, where it is expressed as the number of cases to be X-rayed (NNX) to detect one vertebral fracture, as calculated from age-adjusted logistic regression models. In the general population of the older group, NNX could be cut roughly in half, from 6.3 to ∼3, if only the 10% of women at highest risk were targeted based on calcaneal QUS or central DXA, whereas for AD-SOS of the finger phalanges, a corresponding reduction to ∼4 could be achieved. A reduction to ∼4-5 would be the limit for selection based on age alone.

FIG. 2.

The number of women to be X-rayed (NNX) to detect one additional vertebral fracture, displayed as a function of the X-ray referral rate. In the highest risk group of 5-20% of the general population of 55- to 80-year-olds, the number can be reduced from 6.3 to ∼2.5-3.5 if preselection is based on age-adjusted BMD or calcaneal QUS results and to ∼4 for QUS finger measurements. Sample results shown for two BMD and three QUS variables: BMD of the lumbar spine, BMD of the total proximal femur, SOS of the Achilles+, BUA of the Quidel/Metra QUS-2, and AD-SOS of the IGEA DBMSonic BP. For comparison, results for selection based on age alone are also shown. Results are based on a subset of 1265 women for whom all variables were obtained completely.
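The NNX plotted against the referral rate can be related to sensitivity and prevalence by simple bookkeeping: the women X-rayed divided by the fractures found among them. The following sketch illustrates this. The function name is hypothetical, the prevalence of 1/6.3 is inferred from the NNX at full referral, and the sensitivity of 21% at a 10% referral rate is a back-calculated illustrative value, not a figure reported in the study.

```python
def number_needed_to_xray(referral_rate, sensitivity, prevalence):
    """Women X-rayed per vertebral fracture detected.

    All three arguments are fractions between 0 and 1; `sensitivity` is the
    fraction of all fracture cases that fall inside the referred group.
    """
    # Fraction of the whole population that is referred AND has a fracture.
    detected_fraction = prevalence * sensitivity
    if detected_fraction <= 0.0:
        raise ValueError("no fractures would be detected")
    return referral_rate / detected_fraction


prevalence = 1 / 6.3  # assumption inferred from NNX = 6.3 when everyone is X-rayed

# Referring everyone: every fracture is found, so NNX is 1 / prevalence.
nnx_all = number_needed_to_xray(1.0, 1.0, prevalence)

# Hypothetical: the 10% at highest risk contain 21% of all fracture cases.
nnx_top10 = number_needed_to_xray(0.10, 0.21, prevalence)

print(round(nnx_all, 2), round(nnx_top10, 2))
```

Read the other way round, an NNX of about 3 at a 10% referral rate corresponds to roughly one-fifth of all fracture cases being captured in the referred group, under the prevalence assumed here.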

When selecting the 10% of women at highest risk in age-adjusted models, DXA and QUS showed agreement in ∼90% of the cases. Comparing the various QUS devices with DXA of the spine, the κ score ranged from 0.49 to 0.52 for BUA, from 0.43 to 0.48 for SOS, and from 0.34 to 0.44 for DBM variables, and it was 0.49 for the stiffness index of the Achilles+. Comparing them with DXA of the total hip, the κ score ranged from 0.54 to 0.56 for BUA, from 0.50 to 0.51 for SOS, and from 0.36 to 0.45 for DBM variables, and it was 0.52 for the stiffness index of the Achilles+.
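To see how an agreement of about 90% translates into κ scores of about 0.5, it helps to recall that two methods that each select 10% of the sample would already agree in 82% of cases by chance. The sketch below computes Cohen's κ from a 2 × 2 agreement table. The counts are hypothetical (1000 women, 100 selected by each method) and are chosen only to reproduce the order of magnitude reported here; the function name is likewise an assumption.

```python
def cohen_kappa_2x2(both, only_a, only_b, neither):
    """Cohen's kappa for two binary classifications of the same subjects."""
    n = both + only_a + only_b + neither
    if n == 0:
        raise ValueError("the table is empty")

    observed = (both + neither) / n          # raw agreement
    a_positive = (both + only_a) / n         # fraction selected by method A
    b_positive = (both + only_b) / n         # fraction selected by method B

    # Agreement expected if the two selections were independent.
    expected = a_positive * b_positive + (1 - a_positive) * (1 - b_positive)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts: each method flags 100 of 1000 women; 55 are flagged by both.
kappa = cohen_kappa_2x2(both=55, only_a=45, only_b=45, neither=855)
print(round(kappa, 2))
```

With these counts the raw agreement is 91% and the chance agreement 82%, so only about half of the possible agreement beyond chance is realized.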

The magnitude of the age-adjusted standardized ORs was also studied for vertebral deformities of different severity. Figure 3 shows ORs for the variables tested as a function of the maximum reduction in vertebral height for a given participant. The analysis was performed on the subset of 1265 women with complete DXA and QUS data. With increasing severity of the fracture, the ORs increase substantially for most variables, but not for AD-SOS at the finger phalanges. Although the differences were not significant, the techniques ranked as follows for the most severe fractures (40% height reduction): the strongest risk ratios were calculated for SOS of the DMS UBIS 5000 with sRR = 2.3 (1.6-3.1), SOS of the GE Lunar Achilles+ with sRR = 2.2 (1.6-2.9), and SOS of the Osteometer DTU-one with sRR = 2.0 (1.5-2.7), followed by DXA of the spine with sRR = 1.9 (1.4-2.5) and the total femur with sRR = 1.7 (1.3-2.3). Somewhat lower ORs were achieved for BUA of the calcaneus, and the standardized age-adjusted ORs for the QUS variables of the finger phalanges were no different from the levels observed for less severe vertebral fractures (Fig. 3). Because of the small number of fractures per group, the above differences between variables did not reach significance in ROC analyses.

FIG. 3.

Discriminatory power for DXA and QUS variables for osteoporotic vertebral fractures of increasing severity (maximum height reduction ranging from >20% to >40% in any of the vertebrae).

Combinations of variables

We also investigated whether a combination of several QUS or DXA variables would improve the ability to identify subjects at high risk of prevalent vertebral fractures. When combining two QUS variables in an age-adjusted model, no significant improvement over single age-adjusted QUS variables could be achieved. In the data set of the 1265 older women for whom complete information on all DXA and QUS variables was available, this was tested for SOS and BUA of the same device, several SOS (or BUA) variables of different calcaneal QUS devices, and SOS of the calcaneus in combination with AD-SOS of the finger phalanges. The stronger performance of SOS was confirmed in multivariate logistic regression models: if both BUA and SOS of a given device were included in an age-adjusted model of vertebral fracture status, only SOS, and not BUA, was highly significant. Although independent contributions were observed for BUA of any of the QUS devices in combination with AD-SOS, the combined predictive power was still smaller compared with age-adjusted SOS of any device.

BMD of the spine and the total femur did not show independent age-adjusted associations with fracture status in the data set of the 1265 women (only in the larger data set of 2340 women). All QUS variables except for BUA of the Achilles+ and BUA of the DTU-one showed significant associations with fracture status independent of either BMD of the spine or the hip (large data set). For SOS of the Achilles+ and SOS of the UBIS 5000, the strongest independent contributions were observed with p < 0.001 for SOS in models with BMD of the spine and p < 0.005 for SOS in models with BMD of the hip. The p values for the BMD variables in these combined models were always smaller than those for the QUS variables at p < 0.0002. The strongest combination was obtained for BMD of the spine and SOS of the Achilles+ with an AUC of 0.68 (and 0.69 once adjusted for weight), up from AUC = 0.67 for either of the two variables alone (difference in AUC not significant). The resulting improvement (e.g., expressed as a reduction of NNX for BMD of the spine and SOS of the Achilles+ compared with NNX based on age-adjusted BMD of the spine alone) was minimal (an average reduction of 3%, e.g., from NNX = 3 to NNX = 2.91).
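The AUC values quoted above can be read as the probability that a randomly chosen woman with a vertebral fracture has a higher estimated risk than a randomly chosen woman without one. A minimal sketch of that interpretation follows. It uses plain pairwise counting (equivalent to the Mann-Whitney statistic), which is not necessarily how the study software computed the AUC, and the function name and scores are hypothetical.

```python
def auc_from_scores(case_scores, control_scores):
    """Area under the ROC curve by pairwise comparison; ties count as one-half."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0      # case correctly ranked above control
            elif case == control:
                wins += 0.5      # tie
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk estimates for 3 women with and 7 women without fracture.
cases = [0.40, 0.30, 0.15]
controls = [0.35, 0.22, 0.18, 0.12, 0.10, 0.08, 0.05]

print(round(auc_from_scores(cases, controls), 3))
```

A variable with no discriminatory power gives an AUC of 0.5 in this scheme, which puts values of 0.67 to 0.69 in context: a change of 0.01 alters the ranking of only about one in a hundred case-control pairs.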

DISCUSSION

A broad range of diagnostic techniques is available for the assessment of osteoporosis. These include radiological imaging techniques (both X-ray- and ultrasound-based), laboratory tests for the evaluation of biochemical or genetic markers, functional tests, and instruments for the assessment of risk factors. Many of them have been developed or updated in the past few years, and therefore, only limited information about their performance can be obtained from the large epidemiological osteoporosis studies started 10-15 years ago such as SOF,(39) EPIDOS,(40) or the Rotterdam Study.(41) Even fewer data are available for a comparative assessment or for evaluation of their combined use.

To develop a multidisciplinary assessment strategy for osteoporosis based on a comprehensive range of state-of-the-art diagnostic measurements, we initiated the OPUS study. A population-based sample of Western European younger and older women underwent a comprehensive range of diagnostic tests at the five participating centers. In this first report from the OPUS study, we focus on the cross-sectional performance of five different QUS devices, specifically comparing their ability to identify subjects at risk for prevalent vertebral fractures; other variables are currently being evaluated, and the first prospective data are being collected. The need to present comprehensive, well-controlled QUS data is pressing, because to date, reports have either focused on a few isolated or discontinued QUS devices(6,7,12) or have not compared QUS performance directly head to head with gold standard methods such as DXA.(8,14) Smaller studies have provided partly discrepant and therefore confusing results. As a consequence, substantial criticism regarding QUS has been voiced. Moreover, because of the lack of consensus about appropriate use and interpretation of QUS methods and results, misuse is increasingly being reported, not so much in the scientific literature but at many osteoporosis meetings that cover diagnostic strategies for fighting osteoporosis. The OPUS study is powered to provide evidence-based data for a variety of different QUS approaches, in direct comparison with competing techniques, to address the questions about the strengths and limitations of ultrasound approaches.

As the first main performance test, we report the association of nine QUS variables obtained on five different QUS devices with the prevalence of radiologically defined vertebral fractures. All devices, both those measuring at the calcaneus and the one measuring at the finger phalanges, showed significant discriminatory power even after adjusting for age. The highest ORs were achieved for SOS measurements at the calcaneus, and the fracture association reached levels equivalent to the values observed for central DXA measurements on the same subjects. In other words, the vertebral fracture discrimination observed for a direct measurement at the spine was no stronger than for the best peripheral ultrasound-based measurement at the calcaneus. Unlike DXA, fracture risk associations based on QUS variables were not significantly affected by height, weight, or BMI, facilitating the interpretation of results.

We observed some significant differences in performance among DXA and QUS techniques, but the small differences in AUCs from ROC analysis indicate that few of them may be clinically important. For each of the three devices that allowed measuring of BUA and SOS, SOS proved to be superior in performance with regard to the association with vertebral fractures, most notably in multivariate models including BUA and SOS. Consequently, for the purpose of fracture risk assessment, a clinical user may be well advised to concentrate on the SOS result if available. The additional information on risk provided by BUA is not likely to be clinically relevant. The situation is different for the Quidel/Metra QUS-2, which only provides BUA data and which showed fracture discrimination equivalent to central DXA. AD-SOS measurements at the finger phalanges also showed significant age-adjusted associations with fracture status, but the AUCs calculated by ROC analysis, while not significantly different from calcaneal ultrasound results (except for SOS of the UBIS 5000 evaluated on the large sample), were significantly lower compared with the level observed for central DXA methods (p < 0.05); the secondary QUS variables BTT and UBPI available on the DBM Sonic BP did not offer improvements compared with AD-SOS. The sample size differed quite substantially among QUS variables, ranging from 1552 to 2322. Consequently, the power to detect significant differences also varied in the analyses that were based on pairwise data sets of maximum size, making a comparison of levels of significance difficult. However, the differences in AUCs paralleled the p values observed, and thus the ranking of techniques should be valid.

Age was an independent predictor of fracture status for all variables tested. A combined model that included age and the respective BMD or QUS variable showed stronger discrimination compared with the univariate model of unadjusted BMD or QUS variables. The data provided in Table 2 can be used to calculate the combined effect of age and the QUS or BMD variable according to the formula

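$$\mathrm{OR} = \mathrm{OR(age)}^{\Delta\mathrm{age}/\text{age increment}} \times \mathrm{OR(method)}^{\Delta\mathrm{method}/\mathrm{SEE}}$$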

where OR(age) is specified in column 4, OR(method) in column 6, and SEE in column 5 of Table 2. For example, two women with an age difference of 15 years and an Achilles+ SOS difference of 39.5 m/s (1.5 SEE or 1.4 SD of older women) would have odds of having a vertebral fracture that differed by a factor of OR = $1.51^{2} \times 1.49^{1.5}$ = 3.36. If only judged by their unadjusted ORs, the factor would only be OR = $1.72^{1.4}$ = 2.12.
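The way ORs scale with the size of a difference and combine across variables can be made explicit with a short sketch. It assumes only that an OR quoted per unit is raised to the power of the difference expressed in those units and that the factors for independent terms of a logistic model multiply. The function names are hypothetical, the SEE for SOS is back-calculated from the statement that 39.5 m/s corresponds to 1.5 SEE, and the final combined example uses invented inputs, because the age increment underlying Table 2 is not restated in this section.

```python
def odds_ratio_for_difference(or_per_unit, difference, unit):
    """OR for an arbitrary difference, given the OR per `unit` of the variable."""
    if or_per_unit <= 0.0 or unit <= 0.0:
        raise ValueError("or_per_unit and unit must be positive")
    return or_per_unit ** (difference / unit)


def combined_odds_ratio(or_age, age_difference, age_increment,
                        or_method, method_difference, see):
    """Product of the age factor and the method factor of a two-variable model."""
    age_factor = odds_ratio_for_difference(or_age, age_difference, age_increment)
    method_factor = odds_ratio_for_difference(or_method, method_difference, see)
    return age_factor * method_factor


# Method factor from the text: OR 1.49 per SEE, difference of 39.5 m/s = 1.5 SEE.
see_sos = 39.5 / 1.5                                  # back-calculated assumption
sos_factor = odds_ratio_for_difference(1.49, 39.5, see_sos)

# Unadjusted comparison from the text: OR 1.72 per SD, difference of 1.4 SD.
unadjusted_factor = odds_ratio_for_difference(1.72, 1.4, 1.0)

# Invented inputs, only to show that the two factors multiply: 2.0**1 * 3.0**2.
illustration = combined_odds_ratio(2.0, 10.0, 10.0, 3.0, 2.0, 1.0)

print(round(sos_factor, 2), round(unadjusted_factor, 1), illustration)
```

With the rounded inputs quoted in the text, the SOS factor alone comes to about 1.8 and the unadjusted factor to roughly 2.1.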

The strong performance compared with central DXA observed for the state-of-the-art calcaneal QUS devices compares favorably with the data reported for earlier devices. In their meta-analysis, Marshall et al.(42) reported standardized risk ratios (sRRs) for predicting vertebral fractures for calcaneal ultrasound of sRR = 1.8 (1.5-2.2), compared with sRR = 2.4 (1.8-3.2) for X-ray-based calcaneal measurements and sRR = 2.3 (1.9-2.8) for spinal DXA measurements. Their ultrasound data, however, were based on only two retrospective studies(43,44) that both used an ultrasound device that has long been discontinued (Walker Sonix UBA 575). The Hawaii group afterward published a prospective study reporting an sRR = 1.5 (1.1-2.2) for calcaneal QUS on that device, again lower than BMD, in this case measured at the hand. The only other prospective study relating ultrasound and DXA with vertebral fracture incidence showed similar results, although the sample size was small and the CIs were correspondingly large.(45) In the time since the meta-analysis of Marshall et al., a number of cross-sectional studies investigating the power of QUS for discriminating vertebral fracture in direct comparison with central DXA have been reported with mixed results. It is interesting to note that the performance of QUS, specifically SOS of the calcaneus, was equivalent to spinal DXA in both of the recent population-based studies: the Basel Osteoporosis Study(46) and the OPUS study reported here.

When comparing discriminatory power for vertebral fractures for DXA versus QUS, a steep increase in the ORs with increasing degree of deformation was observed for calcaneal QUS variables, somewhat stronger and more consistent than the increases observed for DXA of the spine or the total femur (see Fig. 3). The difference in areas under the ROC curves, however, did not reach statistical significance. This may be explained by the small sample size per fracture group and the limited power of ROC analysis in general. Our observation of strong performance of QUS for more severe fractures confirms a similar report stating that QUS associations with multiple vertebral fractures were somewhat stronger than spinal DXA measurements obtained in the same study group.(46)

The standardized age-adjusted ORs observed in our study for DXA of the spine and the hip were somewhat lower than in most other studies (on average sRR = 1.8 and sRR = 2.3 for DXA hip and spine, respectively, in Marshall's meta-analysis(42)). Differences in the SEE used for standardization and in the definition of vertebral fractures may have contributed. As seen in Fig. 3, the age-adjusted sORs generally increased with increasing severity of the fractures.

We investigated whether the discriminatory power of spinal DXA (and perhaps other techniques) was potentially impeded by the presence of degenerative changes. We observed virtually identical age-adjusted sORs if we excluded those 1258 cases that had Kellgren scores >2 in the lumbar spine region. The discriminatory power of DXA of the total femur was not affected either. Thus, the good performance of QUS compared with DXA in assessing vertebral fracture status cannot be attributed to DXA problems caused by degenerative changes. Interestingly, we noted that age-adjusted sORs in subjects with Kellgren score <3 were 3-16% higher for the QUS results obtained at the calcaneus but not at the finger phalanges. With an increase of 13-16%, this difference was most pronounced for the Achilles+ and the QUS-2. Because the 1258 cases with higher Kellgren scores were included in all other analyses presented here, the reported performance of QUS can be considered to be a conservative estimate.

The limited impact of high Kellgren scores on discriminatory power for spinal DXA should not be used as an argument to disregard degenerative changes in the process of diagnostic assessment of individual patients. Here the bias introduced by degenerative calcifications needs to be considered, and affected vertebrae may need to be excluded from the analysis. Even more important is the related topic of degenerative deformities. Our observation that for any of the DXA or QUS variables no significant difference could be observed between cases with degenerative deformities only on the one hand and subjects without deformities on the other hand supports our assumption that those deformities had not been caused by low bone mass. These findings also emphasize the need to perform a careful radiological differential diagnosis to distinguish osteoporotic from non-osteoporotic vertebral deformities. Well-defined and validated criteria, such as those used in this study, may be helpful to employ in clinical practice. The results also provide further evidence that the low QUS readings are related to low bone mass and not to some other property that would be similarly reduced in subjects with degenerative disease.

Except for age, which showed the expected consistent association with fracture risk in univariate (OR of 1.7/10-year age increment) and multivariate models, other potential confounders like body height, weight, or BMI had no effect on the association of QUS with vertebral fracture risk. The association of DXA variables with fracture risk, on the other hand, was significantly confounded by weight, or alternatively, BMI. The observed positive relationship between DXA-adjusted weight or BMI and vertebral fracture risk was unexpected. In the EPOS study low weight and low BMI (not adjusted for BMD) showed nonsignificant trends with increased osteoporotic vertebral fracture risk.(47) A different study reported that such associations were eliminated once weight was adjusted for BMD.(48) Whether our observation indicates that the relationship of weight and BMI with vertebral fractures is different from that observed for hip fractures (where low weight and BMI are commonly reported as independent risk factors) remains to be tested in a prospective study. Potentially, the relationship may also have been affected by the fat error of DXA.(49,50)

Combinations of DXA and QUS showed only limited benefits with regard to fracture discrimination. This was observed for combinations of several QUS variables, combinations of several DXA variables, and combinations of QUS and DXA variables, whether the combination was based on linear combinations of the variables or on the minimum Z-score. However, the fact that statistically significant independent contributions were observed for some QUS and DXA variables, with the most powerful combination being based on spinal DXA and calcaneal SOS, indicates that combined assessments of these two variables may provide important information for some individuals. It is only for a population-based sample like the one investigated here that a general combination of (any) two approaches tested across all patients does not provide substantial benefits for fracture discrimination. For similar reasons, ROC analysis in general has limited power to detect smaller differences among diagnostic techniques, whereas logistic models more readily allow detection of significant independent associations.

DXA and QUS methods are not suited for diagnosing fractures. Therefore, the modest sensitivity to identify women with prevalent fractures in a population-based sample is not surprising and is in agreement with previous reports; for example, the cut-off values of the WHO criteria(51) have been set to achieve high specificity at the expense of low sensitivity.(52) However, it is relevant to develop a strategy to identify subjects at highest risk for having a vertebral fracture, unknown to themselves. Even under health-economic constraints, subjects with prevalent osteoporotic vertebral fracture are in undisputed need of effective treatment because they not only suffer from pain and limitations in daily living,(53) but they are at high risk for additional fractures within a short time period.(54) However, only about one-third of all women with prevalent vertebral deformities know of their health problem,(55,56) and even those who are not aware of their fracture already suffer from back pain(57) that they do not attribute to osteoporotic fracture. Therefore, low-cost strategies to identify women with prevalent vertebral fracture can be valuable. Our findings indicate that QUS and DXA methods are equally well suited for this task, but DXA has the disadvantages of higher cost and ionizing radiation. High-risk patients could also be selected based on age, but QUS or BMD is required to obtain additional benefits in the high-risk groups. The NNX data displayed in Fig. 2 give an indication of how much can be gained in a population-based sample, which could be relevant for cost-effectiveness assessments. However, for the assessment of the individual patient, strategies based on age alone have obvious limitations because risk profiles of younger individuals also need to be assessable. The performance of SOS is particularly strong for women with more severe fractures, for which the risk for future fractures is higher than for mild fractures.
For population-based QUS (or DXA) screening, the number of women to be referred to radiography of the spine can be reduced by one-half in the higher-risk group. However, the most likely strategy would not be a population-based QUS (or DXA) screening but evaluation of a targeted high-risk group by means of one of those two methods. For this purpose the identification of risk factors for low QUS needs to be pursued. If one were able to identify the group corresponding to those 10-15% of our older groups at highest risk, about one in three women would have a prevalent osteoporotic vertebral fracture. Interestingly, the agreement between QUS- and DXA-based identification of the high-risk group was quite good, with κ scores ranging up to 0.56 for DXA of the hip and 0.52 for DXA of the spine. Alternatively, as treatment with a bisphosphonate significantly reduces further vertebral fracture rates by ∼50%, regardless of whether or not the women have BMD-defined osteoporosis,(58) strategies might be developed where treatment of women with very low QUS with a potent bisphosphonate could reduce vertebral fracture rates without the need for central DXA or even radiographs. The effectiveness of strategies such as those described here would have to be tested in a prospective randomized study.

In conclusion, we have demonstrated in a population-based sample that all of the five QUS approaches tested allow identification of women at high risk for prevalent osteoporotic vertebral fractures. Performance differences among QUS variables were modest, but SOS of the calcaneus showed the best performance. Using the strongest variable available for a given device, three of the four calcaneal QUS devices discriminated women with and without vertebral fractures as well as central DXA measurements. Targeted QUS-based case finding strategies would halve the number of radiographs needed in high-risk populations. Compared with SOS of the calcaneus or central DXA, BUA results obtained on the same device and all variables at the finger phalanges showed somewhat less strong performance. The good performance of BUA on the device that does not measure SOS demonstrated that BUA-based results can also show fracture discrimination equivalent to central DXA. SOS of the calcaneus and central DXA measurements showed statistically independent association with fracture prevalence, but the increase over single age-adjusted predictors was small for our population-based sample. The statistically independent association indicates that SOS and DXA results both have relevance for fracture risk of individuals. Targeting subgroups at highest risk enhances the efficiency of QUS-based case finding, and this strategy works increasingly well for women with more severe vertebral fractures. It is likely that the good performance of QUS reported here was in part achieved by rigorous quality assurance measures, and these should also be used in clinical practice. Whether the tested state-of-the-art QUS methods would also work as well as central DXA in identifying women with incident vertebral or other fractures remains to be investigated.

Acknowledgements

We thank the following members of the OPUS teams at the five participating centers for their contributions: Rosie Reid and Lana Gibson (Aberdeen); the members of the Zentrum für Muskel und Knochenforschung Berlin; Antonia Gerwinn, Dr Maren Glüer, Roswitha John, Roswitha Marunde-Ott, Monika Mohr, Regina Schlenger, Pia Zschoche, Dr Carsten Liess, and Carsten Rose (Kiel); Therese Kolta and Nathalie Delfau (Paris); and Margaret Paggiosi, Nicky Peel, Diane Shutt, Anne Stapleton, and Debbie Swindell (Sheffield). This project was supported by Aventis, Eli Lilly, Novartis, Procter & Gamble Pharmaceuticals, and Roche. We also thank the equipment manufacturers for support: DMS, IGEA, OSI/Osteometer Meditech, and Quidel/Metra.
