As part of an effort to quantify device-dependent differences in forearm bone density, 101 women, aged 20-80 years (∼16 women in each age decade), were scanned on six forearm bone densitometers: the Aloka DCS-600EX, the Hologic QDR-4500A, the Lunar PIXI, the Norland pDEXA, the Osteometer DTX-200, and the Pronosco X-posure System. Regression statistics are reported for all similar regions of interest (ROIs). However, comparisons were confounded by large differences in ROI size and placement. The number of ROIs reported for a single scan by each device varied from 1 to 12. The correlation coefficients ranged from r = 0.70 to r = 0.97, with the highest correlation coefficients and lowest SEs for comparisons between the most similar ROIs. Standardized units of bone mineral density are derived for distal (sdBMD), mid- (smBMD), and proximal (spBMD) ROIs that allow for comparable mean bone densities to be derived for patient populations. Five phantoms were scanned and characterized on five of the devices and the precision and mean values were reported. These phantom values will aid in the in vitro cross-calibration between manufacturers to recreate the presented in vivo relationships. Care should be exercised when using these equations for cross-calibrating patient databases or pooling clinical data from different devices because the least significant differences detectable from measurements taken on two different machines can be increased substantially.
Bone density is the single most predictive measure of fracture risk. Bone density at virtually any available measurement site (spine, proximal femur, forearm, whole body, calcaneus, and tibia) can be used to predict the risk of all fractures. Of those bone density measurement sites, the forearm has been used the longest in clinical practice, applying quantitative methods such as dual-energy X-ray absorptiometry (DXA), single-energy X-ray absorptiometry (SXA), or the radioisotope equivalents dual photon absorptiometry (DPA) and single photon absorptiometry (SPA). Düppe et al. showed that a single SPA measurement predicted hip fractures 25 years later with a relative risk of 1.66/SD decrease in bone mineral density (BMD).(1)
The purpose of this study is to quantify the interrelationships of currently marketed devices that measure forearm bone density. The International Committee of Standards in Bone Measurement (ICSBM) commissioned the study. This committee was formed to address issues of accuracy, standardization, and comparability of densitometers from different manufacturers. All manufacturers were invited to participate with the committee, as were representatives from academic institutions. The ICSBM has commissioned two previous studies, one for the standardization of the anteroposterior (AP) spine BMD as measured by DXA(2) and the second for the standardization of the proximal femur BMD.(3) The standardization equations were reported as letters to the editor of selected journals(4–7) and standardized BMD units are now available on the applicable machines. The committee also has recommended standardized projectional density units of milligrams per square centimeter (sometimes referred to as areal density) to distinguish them from the manufacturer-specific BMD units typically reported in grams per square centimeter.
In the previous spine standardization study, regions of interest (ROIs) were found to be very similar. The agreement between the three participating manufacturers (Hologic, Inc., Bedford, MA, USA; Lunar, Madison, WI, USA; and Norland Medical Systems, White Plains, NY, USA) was better than r = 0.99 with only slight differences attributable to bone edge detection. The T scores also compared well. In the proximal femur, the intermanufacturer differences in the definitions of the neck and total femur ROIs are larger than in the spine. The BMD agreement between manufacturers ranged from 0.97 < r < 0.98 for the total femur and 0.94 < r < 0.96 for the femur neck.(3) In addition, the femur reference data used by the three manufacturers differed considerably.(8) It was the suggestion of the ICSBM to define one reference data set, the Third National Health and Nutrition Examination Survey (NHANES III) total hip BMD,(9) to be provided in standardized BMD by all manufacturers to remove the diagnostic differences.(6) To date, reference data have not been standardized for any sites other than the femur.
Different forearm densitometers from a single manufacturer tend to agree well by design with a unity regression slope, no significant mean value offsets, and a small amount of scatter around the fit line. Intergenerational comparisons have been published for Hologic(10,11) forearm devices. Faulkner et al.(12) found that although there was high correlation (r ∼ 0.99) between the Hologic QDR-1000 and Lunar DPX-L at the ultradistal and ⅓ distal sites, the regression slope was up to 12% different from unity with offsets of 0.07-0.08 g/cm2. There are no universal standards for the measurement of forearm bone density. Consequently, each of the commercially available densitometers has different ROIs (size, location, and number), different calibration standards, and unique racial and ethnic reference populations. Yet, practitioners commonly apply the standardized World Health Organization (WHO) or National Osteoporosis Foundation (NOF) diagnostic criteria equally to all devices. Thus, there is a strong possibility of treatment disparities and disagreement in clinical studies. We report the absolute BMD differences and regression relationships, and we define standardized equations to compare BMD results of similar ROIs between most of the commercially available forearm densitometers.
MATERIALS AND METHODS
The study goal was to recruit 100 women, aged 20-80 years, with ∼16 women in each age decade. One hundred and one subjects were actually recruited, with 13-19 subjects in each 10-year period. This population was chosen to provide a broad range of clinically observed BMD values. The self-declared racial breakdown was 74% white/Hispanic, 18% Asian, 4% black, 4% Pacific Islander/Indian. The following exclusion criteria were used: women known to be pregnant; women with a history of fracture at the distal radius of either arm; the known presence of generalized bone diseases other than osteoporosis, including hyperparathyroidism, hypoparathyroidism, Paget's disease, renal osteodystrophy, Cushing's disease, and steroid-induced osteoporosis or other metabolic diseases; a history of malignant diseases localized to bone or treatment by local resection; the presence of rheumatoid arthritis or other arthritic processes that severely limit patient mobility; and the presence of senile dementia severe enough to hinder adequate compliance and understanding of the study. The University of California San Francisco's (UCSF's) local institutional review board approved the study protocol.
Each manufacturer was asked to identify the device they would like to include in the study and how each device was to be used. Some of the manufacturers produce several devices that measure forearm densitometry. However, it was out of the scope of this study to include all of a manufacturer's offerings or previous scanner generations. Table 1 is a list of the devices used in this study along with the associated manufacturer. The Hologic QDR-4500A and the Osteometer DTX-200 (Osteometer MediTech, Hawthorne, CA, USA) were already on site. The other devices were obtained on loan for the course of the study from the manufacturers. Each device acquired an image of both the radius and ulna. All devices except the Pronosco (Vedbaek, Denmark) collected a DXA image from the carpals to a specified proximal distance on the forearm. The Pronosco X-posure acquires a single-energy radiograph of the hand and proximal forearm. When a forearm length was needed for the calculation of ROI placement, consistent lengths were used for all devices. ROIs as configured by each manufacturer for their specific device were used. These ROIs are shown graphically in Fig. 1. No attempt was made in this study to alter the default ROIs because this would amount to nonstandard analysis and the reference data would not apply.
Table 1. List of Devices With Version of the Scan Protocol Included in this Study
The Aloka DCS-600EX (Aloka Co., Ltd., Tokyo, Japan) algorithm automatically placed three 10-mm-long regions on the radius and ulna centered at 1/10, ⅙, and ⅓ of the forearm length measured from the styloid process on the ulna.(13) Both the radius and ulna were included in each of the BMD calculations.
In the Hologic QDR-4500A, first the user had to place a global ROI in which three individual regions were defined: ⅓ distal, middistal, and ultradistal ROIs.(14) The ⅓ distal ROI was defined as 20 mm long centered at a distance equal to ⅓ of the forearm length measured from the distal tip of the ulna. The ultradistal ROI is a region nominally 15 mm in length positioned proximal to the end plate of the radius. The middistal is the region between the ⅓ and the ultradistal regions. These regions were defined for both the radius and the ulna. A total ulna, total radius, and total forearm BMD is reported that is a sum of the radius and ulna for each of the three ROIs.
The Lunar PIXI forearm ROI was positioned automatically to span 30 mm from the radius and ulna radiographic junction proximally up the forearm.(8,15) Both the radius and the ulna were analyzed and single BMD, bone mineral content (BMC), and projected bone area values were reported.
The Norland pDEXA has a global distal site (radius + ulna), global ⅓ proximal site (radius + ulna), and a ⅓ proximal radius site.(16) The distal ROI spans 10 mm of the lowest BMD region in the distal forearm and is found using an automated search routine. The proximal site spans 10 mm starting at the ⅓ forearm length and continuing proximally.
The Osteometer DTX-200 ROI was positioned automatically 24 mm proximal to the position where the radius and ulna are separated by 8 mm.(17)
The Pronosco X-posure System automatically defines a single BMD calculated from five ROIs.(18) Three of the ROIs are centered on the shafts of the three inner metacarpals, one ROI is on the distal ulna cortical shaft, and the last ROI on the distal shaft of the radius. Unlike the other five devices, the X-posure derives a BMD estimate from radiogrammetry applied to the cortical part of the bone (i.e., no reference bone material is used.) The BMD estimate is called digital X-ray radiogrammetry-BMD (DXR).
Unique BMD variable names were defined for the ROIs described previously and listed in Table 2. Each scanner was maintained per manufacturer's instructions. Data were collected over a 4-month period beginning in February 1999. There were no signs of calibration drifts of >0.5% for any of the devices during the study period.
Table 2. Variable Names for the Scan Protocol Defined in the Text
Interdevice BMD comparison statistics
Because there are a total of 21 reported BMD values, reporting the correlation relationships between all ROIs was not practical for this publication. We chose to present the BMD correlations in three groupings most commonly used for diagnostic and clinical trial purposes: distal (i.e., the ultradistal), mid- (i.e., middistal), and proximal (i.e., ⅓ distal) ROIs. The Lunar PIXI, Osteometer DTX-200, and Pronosco X-posure each report one BMD value exclusively and are compared in all three groups. The Norland pDEXA scans and reports a BMD for distal and proximal ROIs. Thus, the distal ROI for the Norland pDEXA is included in the distal as well as the mid-ROI analysis. Simple linear regression using the General Linear Model procedure (PROC GLM) in SAS (SAS Institute, Cary, NC, USA) was used to determine a correlation coefficient, slope, intercept, and an SE of the estimate (SEE) for reported combinations. The significance of the intercept was determined using a p value.
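The regression quantities named above (slope, intercept, r, and SEE) can be illustrated with a short calculation. The following is a minimal sketch in plain Python, not the SAS procedure used in the study. The function name, the choice to express the SEE as a percentage of the mean of the dependent variable, and the sample values are all assumptions made for illustration.

```python
import math

def linear_regression_stats(x, y):
    """Ordinary least squares of y on x: slope, intercept, r, and SEE."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    # Residual sum of squares; the SEE uses n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    see = math.sqrt(sse / (n - 2))
    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "see": see,
        # Assumption: SEE expressed relative to the mean of y.
        "see_percent_of_mean": 100.0 * see / mean_y,
    }

# Made-up BMD values (g/cm2) for two hypothetical devices; not study data.
device_a = [0.30, 0.35, 0.42, 0.48, 0.55, 0.61]
device_b = [0.26, 0.31, 0.35, 0.41, 0.44, 0.50]
stats = linear_regression_stats(device_a, device_b)
print(stats)
```

Note that this simple regression treats x as error-free, which is one reason the standardization equations in the next section are derived differently.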
Universal standardization equations
Universal standardization equations were calculated using the “optimal conversion” method by Hui et al.(19) This method minimizes a common entropy equation [Eq. (2) in Hui et al.] to simultaneously solve for the best-fit solution of all devices at once. The result is a set of equations that converts each device's BMD into a BMD in standardized units. In addition, the optimal conversion method differs from simple linear regressions in that it creates standardized relationships that are invertible and it assumes that errors exist in all variables. For this reason, the standardized equations are more appropriate for cross-calibrating devices than the linear regressions. The optimal conversion standardization equations were solved using the SAS proc NONLIN. Because it was not possible to scan a single phantom on all devices, a common average BMD was picked by averaging the population means for each device. Standardization equations were calculated for the distal, mid-, and the proximal ROIs with the equations referred to as the sdBMD, smBMD, and the spBMD equations, respectively. As in the linear regression, devices with only one ROI were included in all three standardization equations. After applying the standardized BMD equations to the study subject data, we compared each device using the Bland-Altman analysis to look for systematic residual differences.(20)
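The Bland-Altman check mentioned at the end of this paragraph can be sketched as follows. This is an illustrative outline only, assuming the analysis regresses the paired difference on the paired mean and reports the bias with 95% limits of agreement. The function name and the example values are hypothetical, and the optimal conversion method of Hui et al. is not reproduced here.

```python
import math

def bland_altman(a, b):
    """Bias, limits of agreement, and trend of (a - b) against (a + b) / 2."""
    if len(a) != len(b) or len(a) < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    diffs = [ai - bi for ai, bi in zip(a, b)]
    means = [(ai + bi) / 2.0 for ai, bi in zip(a, b)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd_diff = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    mean_of_means = sum(means) / n
    sxx = sum((m - mean_of_means) ** 2 for m in means)
    if sxx == 0:
        raise ValueError("paired means are all identical; trend is undefined")
    sxy = sum((m - mean_of_means) * (d - bias) for m, d in zip(means, diffs))
    return {
        "bias": bias,
        "lower_limit": bias - 1.96 * sd_diff,
        "upper_limit": bias + 1.96 * sd_diff,
        # A slope near zero suggests no systematic, level-dependent difference.
        "trend_slope": sxy / sxx,
    }

# Hypothetical standardized BMD values (mg/cm2) from two devices; not study data.
sbmd_a = [410.0, 455.0, 502.0, 548.0, 601.0]
sbmd_b = [405.0, 460.0, 498.0, 553.0, 596.0]
print(bland_altman(sbmd_a, sbmd_b))
```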
Relative BMD value
To look at the equivalence of the standardized BMD as a function of age, we plotted standardized BMD values versus age for each ROI grouping. The SD for each variable was calculated independently for each decade. A p value was used to test for significant differences between similar ROIs for different age decades. In addition, the manufacturer-specific peak reference BMDs and population SDs were converted to standardized BMD units. T-score values of −2.5 were calculated and compared to determine diagnostic equivalence.
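The T-score comparison described above reduces to simple arithmetic once a standardization equation is available. The sketch below assumes a linear standardization equation of the form sBMD = slope × BMD + intercept. The coefficients, peak value, and SD are hypothetical placeholders, not values from Table 8 or from any manufacturer's reference data.

```python
def to_standardized(bmd, slope, intercept):
    """Convert a device BMD (g/cm2) to standardized units (mg/cm2).

    Assumes a linear standardization equation; coefficients are hypothetical.
    """
    return slope * bmd + intercept

def t_score(bmd, peak, sd):
    """T score relative to the young-adult peak reference mean and SD."""
    return (bmd - peak) / sd

def bmd_at_t_score(peak, sd, t):
    """BMD value corresponding to a given T score."""
    return peak + t * sd

# Hypothetical device reference values in native units (g/cm2).
peak_native, sd_native = 0.50, 0.05
slope, intercept = 1000.0, 0.0  # placeholder coefficients

# A linear map shifts and scales the peak but only scales the SD.
peak_s = to_standardized(peak_native, slope, intercept)
sd_s = abs(slope) * sd_native
threshold_s = bmd_at_t_score(peak_s, sd_s, -2.5)
print(peak_s, sd_s, threshold_s)
```

Comparing `threshold_s` across devices shows whether a T score of −2.5 corresponds to the same standardized BMD, which is the diagnostic-equivalence question posed in the text.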
The phantoms in Table 3 were identified for potential use in quality control and cross-calibration procedures. Two additional phantoms were made available and considered: the Aloka PHA-8158 and the Leeds phantom used by the Pronosco System. The Aloka and Pronosco phantoms were designed specifically for the manufacturer-specific machine and could not be analyzed using any of the other manufacturer's forearm algorithms. The European Forearm Phantom (EFP) I is not anthropomorphic and could not be analyzed using the default analysis instructions provided by the manufacturers and would have required special analysis. The EFP II could not be scanned in the default mode on the DCS-600EX. The three CIRS phantoms were designed specially and built for this study. The volumetric bone density of 400, 600, and 800 mg/cm3 were chosen by CIRS. Because of their anthropomorphic shape and full forearm length, we had to prop the CIRS phantoms with foam pads to keep the phantoms from rotating during the scan. None of the phantoms could be analyzed by the X-posure in vivo algorithm. Each phantom in Table 3 was scanned on the remaining five devices with repositioning between each scan. The Aloka DCS-600EX scans were analyzed for radial BMD only. The Aloka device was shipped with radius-only analysis software and upgraded only after the phantom scans were acquired.
Table 3. Forearm Phantoms Identified for Use in Either Quality Control or Cross-Calibration Studies
Intermanufacturer BMD correlations
The population mean, SD, and CV for each of the chosen BMD variables are shown in Table 4. For qualitative visualization, Figs. 2, 3, and 4 are compilations of the scatterplots for all the distal, mid-, and proximal ROIs, respectively. Each plot contains the scatter data, the least squares fit line (solid) with 95% confidence limits (dotted), and a line of identity (diagonal). The x variables are in columns and the y variables are in rows. To simplify visualization, we used the same x and y axis scales for each plot compilation. The x and y axes in Fig. 2 range from 0.2 to 0.7 g/cm2. For Figs. 3 and 4, the axes range from 0.2 to 0.8 and 0.2 to 1.1, respectively. Tables 5, 6, and 7 show the linear regression statistics for the distal, mid-, and proximal sites, respectively. The r values ranged from 0.70 to 0.97. The SEE ranged from 2.9% to 13.1% of the mean.
Table 4. Population Statistics for Each Device and Selected ROIs
Table 5. Linear Regression Statistics for the Distal Sites
Table 6. Linear Regression Statistics for the Midsites
Table 7. Linear Regression Statistics for the Proximal Sites
Standardization relationships between devices
Table 8 shows the standardized BMD equations for each of the devices for the ultradistal, mid-, and proximal ROIs. Regression of the difference in sBMD values to the mean sBMD values for each device (Bland-Altman analysis) showed no systematic differences between mean values.
Table 8. Standardization Equations for the suBMD, smBMD, and spBMD ROIs
Standardization BMD values versus age
The standardized BMD versus the average decade age for similar ROIs across manufacturers is plotted in Figs. 5, 6, and 7. The error bars show the SD of the mean for each decade mean value independently. For this study population and presentation method, there was a trend for all BMD values to increase relative to the young BMD values before a perimenopausal drop, with this trend becoming more pronounced for the more distal ROIs. In general, there was a similar age-related change in BMD for the ultradistal, mid-, and ⅓ proximal sites. For the proximal ROIs, none of the spBMD values of the young (20- to 30-year-old) subjects were significantly different (p > 0.05) except those between PBMD and DXR-BMD, which differed by 30 mg/cm2 (p = 0.03). For the mid-ROIs, all smBMDs for the 20- to 30-year-old subjects were equivalent except for a 30-mg/cm2 (p = 0.04) difference between ARU6BMD and NDBMD. For the distal ROIs, no age decade was significantly different except for a 0.01-g/cm2 (p = 0.044) difference between PBMD and ARU3BMD. However, none of the 20- to 30-year-old sdBMD values were significantly different for p < 0.01.
Figure 8 is a comparison of the peak reference values and population SDs used to calculate T scores after being converted to sBMD. The sBMD values are shown for each ROI grouping. Note that some of the devices are represented in each ROI grouping and thus the peak sBMD values change relative to the standardization grouping used. T scores calculated using sBMD values can now be compared and are shown in Fig. 8. Note that although there is no relative difference between the sBMD values in each grouping, the peaks, SDs, and thus the T score are different most likely because of the use of different reference populations.
As stated in the methods, six of the available phantoms could be scanned on five machines in the manufacturer's default scan mode. A forearm length of 25 cm was defined when the software asked for it. The mean values and CVs are given for the remaining phantoms on each device in Table 9. One Osteometer phantom scan was excluded from the DCS-600EX data because of poor positioning. The PIXI phantom was too short to have a distal site available for the QDR-4500A, the pDEXA, and the DCS-600EX devices. The CIRS and Lunar phantoms did not contain a density gradient in the ultradistal region for the pDEXA to use to find the NDBMD ROI. The Osteometer phantom's handle interfered with its ability to lay flat on the horizontal scanners like the QDR-4500A, PIXI, DCS-600EX, and the pDEXA. All Osteometer phantom scans were done with the phantom propped and a horizontal bone plane.
Table 9. Phantom Mean BMD Values (g/cm2) for Each ROI
Several principal points can be derived from this study. First, baseline and follow-up examinations must be acquired on the same make and model of densitometer. Monitoring the same patient on a different machine comes up in a variety of situations, including device upgrades and when a patient moves and/or changes primary caregiver. One must be certain that the two measurements are comparable and must know how the change of devices affects the least significant change (LSC)(21) in BMD. When no change of devices has occurred, the LSC is defined as
$$\mathrm{LSC} = 1.96\sqrt{2}\,\mathrm{PE} \approx 2.8\,\mathrm{PE}$$
where PE is the largest precision error of the technique for a 95% statistical confidence that the change in BMD is significant. However, if two devices are used, the LSC has to be modified to include the SE of the estimate, the precision of the two devices, and the uncertainty of the slope and intercept relating the two devices. If precision drove the SEE, we would expect the SEE around the fit line to be on the order of the root-sum-of-squares combination of their precision values. For equal precision on both devices, the SEE would be √2 times the precision. If the precision on two devices to be compared is determined by patient positioning and is ∼2%, the expected SEE would be ∼3%. The actual SEEs varied from ∼3% to over 13%, with the ROIs most closely matched in position having the lowest SEE. This implies that biological variations are mostly responsible for the large SEEs on dissimilar ROIs. Unfortunately, the manufacturers do not reveal their specific algorithms, for example, their method of bone segmentation. Thus, quantifying the impact of ROI and scanner variations is difficult. Better comparability of forearm densitometry would require, in particular, a standardization of the ROIs to be used. In addition, the ratio of cortical to trabecular bone mass is known to increase when moving proximally up the forearm.(22) So, one would expect mismatched ROIs to have different rates of loss or gain. Needless to say, if one is following a subject of a clinical trial over time, a change of devices during the trial should be strongly avoided.
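The precision argument in this paragraph can be checked numerically. The sketch below assumes the common convention that the single-device LSC at 95% confidence is 1.96 × √2 × PE, and that independent precision errors from two devices combine as the square root of the sum of their squares. The function names are illustrative, and no formula is given here for the two-device LSC because the text does not specify one.

```python
import math

def least_significant_change(precision_error, z=1.96):
    """Single-device LSC: z * sqrt(2) * PE (z = 1.96 for 95% confidence)."""
    return z * math.sqrt(2.0) * precision_error

def expected_see_from_precision(pe_a, pe_b):
    """SEE expected if only the two devices' precision errors contributed."""
    return math.sqrt(pe_a ** 2 + pe_b ** 2)

pe = 2.0  # percent; the illustrative value used in the text
print(round(expected_see_from_precision(pe, pe), 2))  # about 2.83, i.e. ~3%
print(round(least_significant_change(pe), 2))         # about 5.54
```

An observed SEE well above the precision-only expectation, such as the 13% reported for the least similar ROIs, therefore points to sources of scatter other than measurement precision.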
Another principal point is that monitoring changes in BMD for populations may be possible on cross-calibrated devices for similar ROIs and some devices. It is common practice to increase recruitment numbers in clinical trials by drawing subjects from more than one clinical site. To pool the results from different machines to look for treatment effects and/or to describe patient populations, it is imperative to know their in vivo cross-calibration relationships. All measures would be converted into the measure from one device.
There are no guidelines on what r value and SE are appropriate when converting one device's BMD into another's for a clinical trial. This situation can be contrasted to predicting the total femur BMD from the posteroanterior (PA) spine BMD. The r value between comparative spine and hip measures on the same subjects is ∼0.7 because of biological variation, and it is never recommended to omit the hip and use only the spine to predict hip BMD. To the contrary, it has been shown that one can increase the ability to predict fracture risk by using a combination of hip and spine BMD.(23) Thus, we conclude that an r value of 0.70 between devices suggests too large an uncertainty to predict data of one device from another for individual patients. On the other hand, it is generally accepted that devices that have been compared with r values above 0.97 using a clinically broad range of values can reasonably be pooled and cross-calibrated. An example of this is the pooling of spine and hip BMD values from across manufacturers. What is not clear is the threshold of acceptability for correlation relationships between the 0.70 and the 0.97 values. One could argue that an r value of 0.90 could be used as a threshold because this would mean that ∼80% of the variance (r2 = 0.81) is explained by the linear relationship between the two scanners. In addition, an SEE similar to the LSC of a single device could be used as an additional criterion. In this study, ∼20% of all correlations resulted in r values of >0.9 and 50% in r values of >0.85. Therefore, excellent comparability exists for some devices and for certain ROIs.
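The "80%" figure attached to r = 0.90 corresponds to the coefficient of determination, r squared. As a brief illustration (the helper name is ours), the share of variance explained at the r values discussed in this paragraph is:

```python
def variance_explained(r):
    """Coefficient of determination for a simple linear regression."""
    return r * r

for r in (0.70, 0.85, 0.90, 0.97):
    print(f"r = {r:.2f} -> r^2 = {variance_explained(r):.2f}")
```

At r = 0.70, roughly half of the variance in one device's BMD remains unexplained by the other device, which is consistent with the conclusion that such a relationship is too uncertain for individual patients.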
It is interesting to look at mistaken conclusions one could draw from uncalibrated BMD comparisons. For example, nonunity slopes between devices could cause confounding conclusions to be drawn depending on the device used. One would expect that a patient scanned on the DTX-200 (OBMD), for example, with a change between baseline and follow-up measurements of say 0.10 g/cm2, would see only a 0.06-g/cm2 change on the Norland pDEXA (NDBMD). However, even when the sBMD units are used, there still exist differences in the sBMD value for a T score of −2.5 because different manufacturers use different reference populations as shown in Fig. 8.
Standard ROIs would simplify interpretation of forearm results. Early postmenopausal bone loss is monitored most sensitively using sites high in trabecular bone, like the ultradistal forearm. Global loss in skeletal mass is best monitored with cortical bone sites like the proximal forearm. Standardized ROIs would eliminate biological variation between measures on different devices, reduce the SEE, and increase confidence in cross-calibrating devices, pooling measurements, and monitoring drug therapy.
Because the phantoms are distributed through the clinical range, it may be useful to use two or three together when cross-calibrating scanners to get a reasonable slope. The CIRS phantoms were designed to give three reference points for all ROIs but were limited by their difficult positioning and high densities. Based on the experience of this study, an ideal cross-calibration phantom set could be constructed in several ways. The CIRS phantoms could be improved by shortening them to half the forearm length, giving the proximal end a square shape to limit rotation and add stability, eliminating the 800-mg phantom, and providing a 200-mg phantom. Alternatively, two additional Norland or Lunar phantoms could be made of half and double their present densities to cover the clinical range. The EFP II covers all densities required in a single phantom but is not anthropomorphic enough to be analyzed on all devices using the default mode. A more complete anthropomorphic phantom of the hand as well as the forearm that included the medullary canals would be needed for Pronosco X-posure longitudinal QC. Because the Pronosco X-posure uses radiogrammetry instead of mass attenuation to derive bone density, it is not obvious how one could create a phantom set that could cross-calibrate the X-posure to the DXA devices. The fact that none of the phantoms could be scanned on all devices underscores the challenges in cross-calibrating different technologies to each other.
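The suggestion to use two or three phantoms together amounts to fitting a line through phantom values measured on both scanners. The following is a minimal sketch under that assumption. The phantom readings are invented for illustration and are not taken from Table 9, and a phantom-derived line is only an in vitro approximation of the in vivo relationships reported in this study.

```python
def fit_line(x, y):
    """Least squares line y = intercept + slope * x through phantom readings."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired phantom readings")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("phantom densities must differ to define a slope")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def convert(bmd, slope, intercept):
    """Map a reading from scanner A onto scanner B's scale."""
    return intercept + slope * bmd

# Hypothetical mean BMD (g/cm2) of three phantoms on two scanners.
phantoms_on_a = [0.25, 0.45, 0.65]
phantoms_on_b = [0.22, 0.38, 0.54]
slope, intercept = fit_line(phantoms_on_a, phantoms_on_b)
print(slope, intercept, convert(0.50, slope, intercept))
```

With a single phantom only an offset or a ratio can be estimated, so at least two well-separated densities are needed before a slope is defined, which is the practical reason for spanning the clinical range.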
This study has several shortcomings. First, not all devices from all manufacturers were included in this study. GE/Lunar makes four different devices for forearm densitometry: Prodigy, DPX-IQ, Expert, and PIXI. The Prodigy, Expert, and IQ have the same X-ray technique but differ from the PIXI. The Prodigy, IQ, and Expert have multiple ROIs and the PIXI has only one. Thus, it will be difficult to apply the results of this study to those other devices. Hologic makes pencil and fan beam devices that have identically defined ROIs but different X-ray techniques. Osteometer has several algorithms for analyzing their scans that differ significantly from the analysis we used. Also, no peripheral quantitative computed tomography (pQCT) devices were included. Because in vivo precision was not quantified on this patient population, we could not use regression models that use weighted error corrections such as the model proposed by Mandel.(24) Instead, we assumed that the base precision of each device was similar. Although the precision reported in the literature for each device varies, this is a reasonable estimate because patient repositioning typically limits the precision of most clinical BMD measurements.
This study describes the regression relationship between six forearm densitometers. In general, the agreement between forearm densitometers is high. Standardized units have been defined such that the mean BMD values for each device would, on average, agree with one another. Further standardization is necessary in the ROIs to improve the agreement and reduce the SE of the estimate.
We thank Vesta March and Fay Wong for acquiring the scans and David Breazeale for editing. The ICSBM commissioned this study. Each of the manufacturers that participated contributed financially in equal amounts to cover the expense of recruitment, scanning, and data analysis. The participating members of the Forearm Subcommittee of the ICSBM at the time of this study were as follows: Klaus Engelke, Institute of Medical Physics (IMP), Erlangen, Germany; Harry Genant, Osteoporosis & Arthritis Research Group/UCSF, San Francisco, CA, USA; Nicole Hamilton, Schick Technologies, Inc., Long Island City, NY, USA; Thomas Hangartner, Wright State University, Dayton, OH, USA; Lewis Harrold, Norland Medical Systems, Fort Atkinson, WI, USA; Willi Kalender, Institute of Medical Physics, University of Erlangen/Nürnberg, Erlangen, Germany; Russ Nord, Lunar/GE, Madison, WI, USA; Svenn Poulsen, Pronosco, Vedbaek, Denmark; P. Rüegsegger, Institut für Biomedizinische Technik, Zürich, Switzerland; John Shepherd, OARG/Department of Radiology, San Francisco, CA, USA; Toshiaki Tamegai, Aloka Co., Ltd., Tokyo, Japan; George Tysarczyk-Neimeyer, Stratec Medizintechnik GmbH, Pforzheim, Germany; Eric von Stetten, Hologic, Inc., Bedford, MA, USA.