This work was presented in part at the Second Annual Scientific Meeting of the Society for Clinical Densitometry, January 18–21, 1996, Colorado Springs, Colorado, U.S.A.
To determine if measuring skeletal status at the calcaneus is a potentially valuable technique for diagnosing osteoporosis, we examined five calcaneal assessment techniques in 53 young normal women and 108 postmenopausal women with osteoporosis and compared these measurements to dual-energy X-ray absorptiometry (DEXA) at the calcaneus, hip, and spine. The five instruments, including single-energy X-ray absorptiometry (SEXA) and four quantitative ultrasound (QUS) instruments, were evaluated for precision, ability to discriminate osteoporotic from young normal subjects, and correlation to the other instruments. The coefficient of variation (%CV) for instrument, positioning, interobserver, and short-term precision of the five calcaneal instruments ranged from 1.34–7.76%, 1.63–7.00%, 1.84–9.44%, and 1.99–7.04%, respectively. The %CVs for positioning, interobserver, and short-term precision were similar for calcaneal DEXA, calcaneal SEXA, and stiffness (as measured by Achilles). The %CVs for instrument precision were similar between calcaneal DEXA and SEXA. The ability of the five calcaneal instruments to discriminate osteoporotic from young normal subjects was similar based on the analysis of area under the receiver operating characteristic curves (range 0.88–0.93) and equivalent to DEXA of the calcaneus and hip (0.88–0.93). The correlations between the measurements of five calcaneal instruments were strong (0.80 ≤ r ≤ 0.91, p < 0.001). These data suggest that although the precision is variable, the calcaneal QUS and SEXA instruments can discriminate between osteoporotic patients and young normal controls and appear to be a useful technique for assessment of osteoporosis.
Osteoporosis is a growing concern due to the increased aging of our population. In the United States, there are more than 250,000 hip fractures annually at a cost of over $10 billion.1,2 Because women over the age of 65 are the fastest growing segment of the U.S. population,3 it is expected that the number of hip fractures alone will double or triple within the next quarter century.4 Along with established agents, such as estrogen replacement therapy, additional therapeutic alternatives are available to treat osteoporosis.5 It has therefore become imperative to evaluate and diagnose patients with osteoporosis so that preventive or therapeutic measures can be instituted as soon as possible.
Although there are several techniques to assess bone mineral density (BMD) or content (BMC), including quantitative computerized tomography, dual-photon absorptiometry, single-photon absorptiometry, and radiographic absorptiometry, dual-energy X-ray absorptiometry (DEXA) of the hip and spine has become the most widely accepted technique for evaluation of skeletal status.6 DEXA measurements have good precision and accuracy, low radiation exposure, and are associated with hip and vertebral fracture risk.6 However, because these machines require dedicated office space and can be expensive, they are not always accessible and tend to be located mainly in urban areas.
Quantitative ultrasound (QUS) and single-energy X-ray absorptiometry (SEXA) are potentially useful techniques to evaluate skeletal integrity at peripheral sites such as the calcaneus, patella, tibia, finger, and forearm. While SEXA measures BMC (g/cm), QUS uses sound waves to assess skeletal status. Two parameters are typically measured by QUS: ultrasound velocity or speed of sound (SOS), which reflects the speed of the ultrasound wave; and broadband ultrasound attenuation (BUA), which reflects the frequency dependence of ultrasound attenuation.7 An additional parameter, stiffness, is calculated as a linear combination of SOS and BUA. Both QUS and SEXA are associated with fracture risk.8–12 Although calcaneal devices have been developed that are small, portable, and relatively inexpensive compared with larger, stationary DEXA machines, these techniques have not received widespread clinical acceptance in the U.S.
This study was undertaken to evaluate five calcaneal bone assessment instruments for precision, correlation, and ability to discriminate osteoporotic from normal subjects. The investigation allowed for comparisons between the various calcaneal instruments and comparisons with standard DEXA measurements.
MATERIALS AND METHODS
Two U.S. centers participated in the study. Both centers received Institutional Review Board approval, and each subject gave written informed consent prior to participation. The study population consisted of two groups, young normal and osteoporotic. The subjects were recruited either from the study sites' current patient population or through community-based advertising. The young normal group (n = 53) consisted of non-Hispanic, Caucasian females aged 25–35 years who were not pregnant or lactating and had no history of osteoporosis or medical disorders associated with low bone mass. The osteoporotic group (n = 108) consisted of non-Hispanic, Caucasian, postmenopausal females at least 55 years of age with a BMD more than 2.5 standard deviations (SD) below the mean young normal bone mass at the femoral neck or trochanter (<0.645 or <0.497 g/cm², respectively) as determined by the Hologic normative database. The osteoporotic group consisted of two subgroups: subjects with no history of osteoporotic fracture (osteo non-Fx) (n = 52) and subjects with osteoporotic fracture (osteo Fx) (n = 56). An osteoporotic fracture was defined as a fracture that (1) occurred after menopause, (2) was due to no more than a moderate trauma (i.e., energy less than or equal to a fall from standing height), and (3) involved 1 of 15 fracture sites associated with low bone mass or mild/moderate trauma.13
Excluded from the osteoporotic group were subjects with a history or evidence of metabolic bone disease (other than postmenopausal bone loss), those ever treated with fluoride, or those who took high doses of calcium (>1500 mg/day) or vitamin D (>800 IU/day) within the past year. Subjects who initiated use of the following medications within the prior year were excluded: estrogen, glucocorticoids, anticonvulsants, anticoagulants, bisphosphonates, calcitonin, androgen, or vitamin D. Subjects on these medications at a stable dose during the entire year prior to the study and who planned to continue use throughout the study were eligible.
Bone assessment technologies
Four QUS instruments were examined and included: Achilles, Lunar Corporation (Madison, WI, U.S.A.); CUBA, McCue Ultrasonics, Ltd. (Hampshire, U.K.); QUS-1X, Osteo Sciences Corp. (Beaverton, OR, U.S.A.); and Ultrasonic Bone Analyzer (UBA575+), Hologic, Inc. (Waltham, MA, U.S.A.) (Table 1). A single calcaneal SEXA instrument was used: OsteoAnalyzer, Dove Medical Systems, Inc. (Newbury Park, CA, U.S.A.). DEXA was used to assess calcaneal, hip, and spine BMD (QDR-1500 and QDR-2000; Hologic, Inc.). All of the QUS instruments provided multiple measurement parameters; however, the single measurement parameter considered by the corresponding instrument manufacturer as the most clinically relevant was used in the primary analyses. Secondary analyses with the other QUS measurement parameters are presented in Appendix 1.
Table 1. Calcaneal Instruments
Measurements were conducted by two technicians at each study site. One technician at each site was identified as the primary technician for the Achilles, OsteoAnalyzer, and UBA575+, and the other technician was the primary technician for the CUBA, calcaneal DEXA, and QUS-1X. The technicians received training on each machine and were instructed to conduct all measurements according to the manufacturers' operating manuals.
This cross-sectional study was conducted during a 7-month period. Qualification of each subject was determined by obtaining a medical history including prior medications and a disease-directed physical examination on the subjects who reported any ailments. All subjects of child-bearing potential were required to have a negative urine pregnancy test. Vital signs, height, and weight were collected. Subjects who met all inclusion and exclusion criteria underwent a DEXA scan of the hip and posteroanterior spine.
Within each group (young normal, osteo non-Fx, osteo Fx), qualified subjects were randomly assigned to one of four groups. All measurements were conducted by the primary technician for the instrument with the exception of the positioning/interobserver precision group. The baseline group underwent one measurement on each of the five calcaneal instruments plus calcaneal DEXA within a single day. The instrument precision group underwent a total of five measurements on each instrument without repositioning; measurements for all instruments were completed within a 1-week period. The positioning/interobserver precision group underwent a total of four measurements on each instrument within a 1-week period. The first three measurements were conducted with repositioning by the primary technician; the fourth measurement was conducted by the secondary technician. The short-term precision group underwent one measurement by each of the instruments on 5 separate days within a 2-week period.
Throughput time was defined as the time inclusive of preparing the subject for the measurement (e.g., swabbing the heel with alcohol, positioning the foot), entering the subject demographics into the computer, performing the measurement, and analyzing the measurement.
An in vivo cross calibration of the two units of each of the four QUS instruments was conducted at the beginning of the study (at one study site) and at the end of the study (at the other study site). A total of 10 subjects were selected at each study site with T scores at the femoral neck or trochanter ranging from −3.0 to +3.0 on a recent DEXA scan (within the past 6 months). Each subject underwent five measurements (without repositioning) on each instrument by the primary technician. In vitro cross calibration was performed at the beginning and end of the study using instrument-specific phantoms supplied by the manufacturers.
Precision was assessed using two parameters, the coefficient of variation (%CV) and the standardized coefficient of variation (%SCV), for the five calcaneal instruments compared with the precision of calcaneal DEXA. For testing and estimation purposes of the %CVs for instrument, positioning, interobserver, and short-term precision, the logarithmic transformation was used. The SD of the logarithmic transformed data was computed for each subject on each instrument. The estimated %CVs were pooled across subjects to obtain estimates for each instrument. The estimated %CVs of the instruments were compared by taking the square root of the 95% confidence intervals on the ratios of the squared %CVs for all pairs of instruments using the F-distribution. The %SCV was calculated using the ratio of the pooled within-individual variation to the 95% range of the data. Ninety-five percent confidence intervals were generated for all precision estimates. For both parameters, the lowest value indicates the best precision.
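The pooling of per-subject precision estimates described above can be illustrated with a minimal sketch. It assumes that the SD of the natural-log-transformed repeats is taken as an approximation of the CV (reasonable when the CV is small) and that per-subject variances are pooled weighted by their degrees of freedom; the function name `pooled_cv_percent` and the example values are hypothetical and are not taken from the study.

```python
import math
from statistics import stdev


def pooled_cv_percent(repeats_by_subject):
    """Pooled %CV from repeated measurements, via the log transformation.

    repeats_by_subject: list of lists; each inner list holds the positive
    repeat measurements for one subject on one instrument.
    Assumption: SD of ln(x) approximates the CV when the CV is small.
    """
    sum_sq = 0.0  # sum of (n_i - 1) * SD_i^2 on the log scale
    dof = 0       # total degrees of freedom across subjects
    for repeats in repeats_by_subject:
        logs = [math.log(x) for x in repeats]
        sd_log = stdev(logs)  # within-subject SD of the log data
        sum_sq += (len(logs) - 1) * sd_log ** 2
        dof += len(logs) - 1
    pooled_sd_log = math.sqrt(sum_sq / dof)
    return 100.0 * pooled_sd_log


# Hypothetical repeats for two subjects (illustrative values only)
example = [[100.0, 102.0, 98.0], [80.0, 81.0, 79.5]]
print(round(pooled_cv_percent(example), 2))
```

Because the calculation is done on the log scale, the result does not depend on the measurement units, which is one reason a log transformation is convenient when instruments report different quantities.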
Measurements, T scores, and throughput time
Descriptive statistics were calculated on the first measurement of each instrument and subject. For each instrument, the mean measurements from the young normal, osteo non-Fx, and osteo Fx groups were compared by F-test from two-way analysis of variance (ANOVA) tests with effects for center and cohort. T scores were calculated for each subject and calcaneal instrument using the young normal group from the study as the reference database:
T score = (subject's measurement − mean of the young normal group) / SD of the young normal group
In addition, with the young normal group as the reference database, T scores were calculated for DEXA femoral neck and trochanter. Statistical similarity of T scores was assessed by Duncan's multiple range test.
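The T score calculation against a study-specific young normal reference can be sketched as follows. This is an illustration under the standard definition of a T score (difference from the young normal mean, in units of the young normal SD); the function name `t_score` and the reference values are hypothetical.

```python
from statistics import mean, stdev


def t_score(value, young_normal_values):
    """T score of one measurement relative to a young normal reference group."""
    ref_mean = mean(young_normal_values)
    ref_sd = stdev(young_normal_values)  # sample SD of the reference group
    return (value - ref_mean) / ref_sd


# Hypothetical reference group with mean 100 and SD 10
reference = [90.0, 100.0, 110.0]
print(t_score(75.0, reference))  # 2.5 SD below the young normal mean
```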
Receiver operating characteristic (ROC) curves and area under the curve (AUC) were used to evaluate the sensitivity and specificity of each calcaneal instrument, DEXA femoral neck, and trochanter to (1) identify osteoporotic subjects from all study subjects and (2) discriminate between osteoporotic subjects with and without fracture, using the first measurement by each instrument and subject.
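As an illustration of how an area under the ROC curve summarizes discriminatory ability, the sketch below computes the AUC as the probability that a randomly chosen osteoporotic subject has a lower measurement than a randomly chosen young normal subject, with ties counted as one half. This rank-based formulation is an assumption made for the example; the software and method used for the study's ROC analysis are not stated here, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(cases, controls):
    """AUC for a measurement where lower values indicate disease.

    cases: measurements from osteoporotic subjects
    controls: measurements from young normal subjects
    Equivalent to the Mann-Whitney U statistic divided by the number of pairs.
    """
    wins = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value < control_value:
                wins += 1.0   # correctly ordered pair
            elif case_value == control_value:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(controls))


# Hypothetical values: complete separation gives an AUC of 1.0
print(roc_auc([1, 2, 3], [4, 5, 6]))
```

An AUC of 0.5 corresponds to no discrimination, which is the situation reported when a technique cannot separate two groups.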
Correlations of the measurements from all calcaneal instruments and hip and spine DEXA were assessed using Pearson's product moment correlation coefficients. The first measurement by each instrument and subject was used to calculate the correlation coefficients.
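A minimal sketch of the correlation calculation follows. The Pearson coefficient is computed directly from its definition; the 95% confidence interval uses the Fisher z transformation, which is an assumption made for illustration because the method used to derive the study's confidence intervals is not described. The function names and the example sample size are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for r via the Fisher z transformation (assumed method)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical inputs: r = 0.80 observed in 156 subjects
low, high = fisher_ci(0.80, 156)
print(round(low, 2), round(high, 2))
```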
Analyses were conducted using both an “all scans” and a “per protocol” approach. The all scans approach included the measurements from all subjects who qualified for the study and underwent the protocol specified procedures. The per protocol approach was the result of a post hoc quality assurance audit of all calcaneal measurements conducted for the study (3005 measurements). This quality assurance audit identified 72 measurements (2.4%) that did not meet the manufacturers' specifications (OsteoAnalyzer, n = 29/502 measurements [5.8%]; calcaneal DEXA, n = 6/501 [1.2%]; QUS-1X, n = 37/500 [7.4%]). These measurements were excluded from the per protocol analyses. Reasons for exclusion included incorrect positioning, movement, artifacts, and wrong foot scanned (left or right). The 72 individual measurements that were excluded from the per protocol analyses were distributed among all of the four subject groups. In addition, the measurements of the subjects initially recruited into the osteoporotic group who did not qualify as osteoporotic (n = 3) and the measurements of the subjects who did not have all of their measurements completed in 31 days as specified by the protocol (n = 2) were excluded from the all scans and per protocol analyses of measurements, T scores, and discriminatory ability. The measurements from the latter two subjects were also excluded from the correlation calculations. Unless otherwise noted, data presented are from the per protocol analyses.
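The exclusion counts and percentages reported for the quality assurance audit can be checked with a few lines of arithmetic. The counts below are those stated in the text; the variable and function names are illustrative.

```python
# Excluded / total measurements per instrument, as reported in the audit
audited = {
    "OsteoAnalyzer": (29, 502),
    "calcaneal DEXA": (6, 501),
    "QUS-1X": (37, 500),
}
total_measurements = 3005  # all calcaneal measurements in the study


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


excluded_total = sum(excluded for excluded, _ in audited.values())
per_instrument = {name: percent(e, n) for name, (e, n) in audited.items()}
overall = percent(excluded_total, total_measurements)

print(excluded_total, overall, per_instrument)
```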
For each study site and QUS instrument, the data were modeled using a repeated measures ANOVA, and 95% confidence intervals on the mean differences between the units were constructed. In addition, descriptive statistics were used to summarize measurements using the phantoms for each unit, and in cases where the same phantom was used at both sites, the difference between sites was also summarized.
RESULTS
A total of 161 subjects were enrolled in the study (young normal, n = 53; osteo non-Fx, n = 52; osteo Fx, n = 56). In the osteo Fx group, 32 women (57%) had fractures of the hip, vertebrae, and/or wrist and 24 women (43%) had other osteoporosis-associated fractures. Demographics for both of the osteoporotic subgroups were similar (Table 2) with the exception that, on average, the subjects in the osteo Fx group were 4 years older and postmenopausal 2.5 years longer than those in the osteo non-Fx group. A comparison of the subject demographics between the two study sites revealed that the osteo non-Fx group subjects at one site were on average 6.5 years older and 6.1 cm shorter than those at the other study site. These differences were not considered clinically meaningful.
Table 2. Demographics of Study Population
Table 3 summarizes instrument, position, interobserver, and short-term precision. The %CV and %SCV for instrument precision (i.e., repeat measurements with no repositioning) of the five calcaneal instruments ranged from 1.34 to 7.76% and 1.16 to 8.13%, respectively, as compared with precision of the calcaneal DEXA (%CV, 1.23%; %SCV, 1.07%). The OsteoAnalyzer had the lowest %CV, which was statistically similar to calcaneal DEXA.
Table 3. Precision
The positioning precision (i.e., repeat measurements on same day with repositioning; %CV and %SCV) of the five calcaneal instruments ranged from 1.63 to 7.00% and 1.68 to 6.84%, respectively, as compared with precision of the calcaneal DEXA (1.63 and 1.68%). The Achilles and OsteoAnalyzer had the lowest %CVs, which were statistically similar to calcaneal DEXA.
The %CV and %SCV for interobserver precision (i.e., repeat measurements on same day by different technicians) ranged from 1.84 to 9.44% and 1.72 to 9.16%, respectively, as compared with precision of the calcaneal DEXA (1.91 and 1.96%). The Achilles and OsteoAnalyzer had the lowest %CVs, which were statistically similar to calcaneal DEXA.
The short-term precision (i.e., repeat measurements within 2 weeks) ranged from 1.99 to 7.04% and 1.76 to 6.81% (%CV and %SCV, respectively) as compared with precision of the calcaneal DEXA (2.02 and 2.35%). The Achilles and OsteoAnalyzer had the lowest %CVs, which were statistically similar to calcaneal DEXA.
When the analyses were performed including all scans (i.e., adding the 72 measurements that had been excluded during the quality assurance audit), the %CV and %SCV were similar to the per protocol results for all instruments except the OsteoAnalyzer. For the OsteoAnalyzer, the %CV and %SCV for instrument precision increased (i.e., precision worsened) from 1.34 to 6.71% and from 1.16 to 5.75%, respectively; for positioning precision from 1.63 to 8.02% and from 1.68 to 7.54%; for interobserver precision from 1.84 to 5.79% and from 1.72 to 5.38%; and for short-term precision from 1.99 to 5.86% and from 1.76 to 5.16%.
Measurements, T scores and throughput time
Higher mean measurements (p < 0.001) were obtained in the young normal group than in either of the osteoporotic groups for all instruments and body sites (Table 4). Similarly, higher mean measurements were obtained in the osteo non-Fx group than in the osteo Fx group for all technologies (p ≤ 0.024) except DEXA of the spine (p = 0.450).
Table 4. Measurements by Instrument
Mean T scores for the osteoporotic subjects ranged from −1.79 to −2.43 (Table 5). T scores from DEXA femoral neck (−2.43), OsteoAnalyzer (−2.36), QUS-1X (−2.31), and Achilles (−2.30) were statistically similar.
Table 5. T Scores* from All Osteoporotic Subjects
Throughput time (mean minutes ± SD) was shortest for the CUBA (4.09 ± 1.38), followed by the QUS-1X (5.69 ± 1.28), OsteoAnalyzer (6.04 ± 1.47), Achilles (8.55 ± 3.14), UBA575+ (10.89 ± 1.48), and calcaneal DEXA (11.40 ± 1.81). Throughput time was assessed using the all scans analysis.
The data from all five calcaneal instruments and DEXA calcaneal, femoral neck, and trochanter were examined with ROC curves to assess the discriminatory ability between the osteoporotic and young normal subjects. The eight curves were similar and exhibited discriminatory ability (AUC ranged from 0.88 to 0.93) (Fig. 1). ROC curves for all techniques could not discriminate between osteo Fx subjects and osteo non-Fx subjects (Fig. 2).
The correlations between the five calcaneal measurements (QUS and SEXA) were strong (0.80 ≤ r ≤ 0.91, p < 0.001), and correlations between the six calcaneal measurements (including calcaneal DEXA) and the DEXA of the femoral neck, trochanter, and spine were generally weaker (0.58 ≤ r ≤ 0.82, p < 0.001) (Table 6).
Table 6. 95% Confidence Intervals for Correlation of Measurements Between Instruments for All Subjects*
Review of the initial and final cross calibration results from the QUS instruments did not indicate any clinically significant differences among the instruments (data not shown).
DISCUSSION
In this study, designed to examine the precision and discriminatory ability of five calcaneal bone assessment technologies, we found that all calcaneal instruments were able to discriminate osteoporotic patients from young normal controls in a manner similar to DEXA of the hip. In general, all of the calcaneal instruments appeared to have adequate precision for identifying patients with osteoporosis, since the precision errors were much smaller than the difference between young normals and osteoporotics (approximately 24–34%) (Table 4).
For all types of precision studied, SEXA performed in a similar manner to calcaneal DEXA with %CV approximately 2% or less. The most clinically relevant precision categories are likely the positioning and short-term precision. For these categories, the %CV for Achilles was statistically similar to that of calcaneal DEXA and SEXA. For positioning precision, CUBA and UBA575+ were intermediate and QUS-1X was the highest (Table 3). Although we were unable to perform hypothesis testing for %SCV, the performance trends were similar to the %CV. Comparisons with the literature are problematic because many studies have not clearly defined the type of precision assessed. Nonetheless, the %CVs that we obtained were similar to those previously reported.14–24 The precision results reported here are specific for the instruments studied. These results cannot be applied to other peripheral instruments measuring other sites (e.g., patella, tibia, or phalanx).
For instrument and interobserver precision, the best precision among the ultrasound machines was seen for those using a water bath. This was not true for positioning precision, where the CUBA and UBA575+ were similar, and short-term precision, where the CUBA performed better than the UBA575+. In choosing an appropriate ultrasound device for a particular application, many factors need to be taken into consideration, such as precision, size, cost, ease of use, patient comfort, and portability. For diagnostic use, all instruments demonstrate adequate precision. However, if the goal is to monitor changes over time, then an instrument with good short-term precision would be preferable.
It should be noted that the precision error calculated as %CV is influenced by the QUS variable used. Techniques that report SOS or stiffness often have lower %CVs than those that report BUA. Therefore, using stiffness (defined as a combination of BUA and SOS) as the most clinically relevant measurement for the Achilles instrument may explain why its %CV was lower than that of the other QUS techniques which used BUA as the most clinically relevant measurement (Table 3 and Appendix 1). Because of difficulty in comparing %CVs, some investigators23,25 have suggested using a %CV adjusted for the range of clinical values known as the standardized %CV (%SCV). This eliminates the favorable bias on instruments that offer a small range of clinical values when compared with instruments that offer a large range of clinical values. We found that whereas the %SCV for calcaneal SEXA and calcaneal DEXA were nearly equivalent to %CV for each precision studied, the %SCV for QUS measurements ranged from nearly equivalent to the calcaneal DEXA precision values to nearly eight times greater than the calcaneal DEXA precision values (Table 3).
There were no significant differences for the precision determinations using an “all scans” analysis versus a “per protocol” analysis, with the exception of the calcaneal SEXA, for which precision improved significantly when only properly performed scans were analyzed. The poorer all scans precision was primarily due to misplacement of the heel and heel movement during the scan. Per protocol analyses provide reliable information because they reflect the performance of the instruments when operated as specified by the manufacturer. In contrast, the all scans analysis reflects “real world” use of the machines when quality assurance of each scan cannot be performed routinely. Since both analyses provide information that could be pertinent to several different medical settings and the results were relatively similar, with the exception of the SEXA instrument (OsteoAnalyzer), the per protocol analysis is provided in the body of this paper.
For all calcaneal measurements, the ability to discriminate patients with osteoporosis from young normal controls was similar to that of DEXA of the calcaneus and hip. As expected, the discriminatory ability at the femoral neck was good since DEXA of the femoral neck was a criterion used to classify subjects as osteoporotic. Other studies have demonstrated that QUS techniques can discriminate between controls and patients with hip fracture,14,16,26–28 vertebral fracture,26,28–30 wrist fracture,31 and osteopenia.32,33 In practice, this would be a clinically relevant use for the calcaneal instruments, in that a diagnosis of osteoporosis can be established and appropriate therapy initiated. Although there were significant differences in mean values between osteoporotic subjects with and without fractures by ANOVA, using ROC analysis we could not distinguish between the two groups. This may be because all of our patients were enrolled with the primary inclusion criterion of a low hip BMD. In addition, it is well established that factors independent of BMD (e.g., fall type, body mass index, and use of medications associated with falls) are associated with hip fractures8,34–36; these factors were not assessed.
We found a strong correlation among all calcaneal measures (0.79 ≤ r ≤ 0.93). We observed a moderately strong correlation between calcaneal measurements and DEXA measurements of the hip or spine. These correlations are similar to what has been reported when comparing measurements at peripheral sites to central site measurements.37
This study has several strengths. It is one of the few to compare the precision and discriminatory abilities of five different calcaneal instruments. In addition, we compared their performance to DEXA of the calcaneus, hip, and spine. Furthermore, we used our own reference group of young normal controls, basing our T scores and ROC curves on this group rather than the manufacturer's normative database.
However, our study also has several limitations. Because we defined our study population (osteoporotic versus young normal) based on hip BMD measurements provided by the manufacturer's database, it would be anticipated that we would be able to discriminate between osteoporotic and nonosteoporotic patients based on hip BMD measurements. Although we used the manufacturer's database for classification of osteoporosis based on WHO criteria, use of other young normal databases such as NHANES III38,39 may have resulted in a slightly different study population. In fact, the mean and standard deviation values for the hip femoral neck and trochanter derived from the young normal population used in this study are different from the values in use by the manufacturer. This may be explained by differences in the selection of the normative population, geographic differences, and the age ranges used. We only included non-Hispanic, Caucasian, osteoporotic women in the postmenopausal group. Precision values among the technologies may be different for men, patients with metabolic bone disease, and non-Caucasians. In addition, we only examined precision over a short period of time. We did not examine whether these techniques would be appropriate for long-term follow-up with or without therapy. Because therapy today is primarily directed at maintaining or improving BMD at the hip and spine, follow-up of calcaneal measurements could potentially be misleading if therapies have differential effects at various skeletal sites.40,41
In summary, we found that the five calcaneal instruments can discriminate osteoporotic subjects from young normal controls. The techniques have variable precision but appear to be adequate for detection of osteoporosis. Because these calcaneal instruments provide rapid measurements, do not require dedicated office space, and are more affordable than DEXA of the hip and spine, they offer a reasonable alternative for the initial evaluation of patients with osteoporosis. However, future studies are needed to determine whether these techniques can be used to monitor disease progression or response to therapeutic intervention.
We thank Lauren Ferguson and Michelle MacCallum (Osteoporosis Prevention and Treatment Center, Beth Israel Deaconess Medical Center, Boston, MA, U.S.A.), and Elizabeth Allen and Sandra Veith (Bone and Mineral Research Unit, Oregon Health Sciences University, Portland, OR, U.S.A.) for performing and analyzing the subject measurements. This study was funded and conducted by Merck & Co., Inc.
Appendix 1. Secondary Analyses with Other Measurement Parameters of the Quantitative Ultrasound Instruments