Opportunistic Screening With CT: Comparison of Phantomless BMD Calibration Methods

Opportunistic screening is a promising new technique to identify individuals at high risk for osteoporotic fracture using computed tomography (CT) scans originally acquired for a clinical purpose unrelated to osteoporosis. In these CT scans, the calibration phantom traditionally required to convert measured CT values to bone mineral density (BMD) is missing. As an alternative, phantomless calibration has been developed. This study aimed to review the principles of four existing phantomless calibration methods and to compare their performance, expressed as the BMD difference (ΔBMD), against the gold standard of simultaneous calibration. All methods were applied to a dataset of 350 females scanned with a highly standardized CT protocol (DS1) and to a second dataset of 114 patients (38 female) from clinical routine covering a large range of CT acquisition and reconstruction parameters (DS2). Three of the phantomless calibration methods must be precalibrated with a reference dataset containing a calibration phantom. Sixty scans from DS1 and 57 from DS2 were randomly selected for this precalibration. For each phantomless calibration method, the best combination of internal reference materials (IMs) was selected first. These were either air and blood or subcutaneous adipose tissue, blood, and cortical bone. In addition, a fifth method based on average calibration parameters derived from the reference dataset was applied. For DS1, ΔBMD results (mean ± standard deviation) for the phantomless calibration methods requiring a precalibration ranged from 0.1 ± 2.7 mg/cm³ to 2.4 ± 3.5 mg/cm³, with similar means but significantly higher standard deviations for DS2. Performance of the phantomless calibration method that does not require a precalibration was worse (ΔBMD DS1: 12.6 ± 13.2 mg/cm³, DS2: 0.5 ± 8.8 mg/cm³). In conclusion, phantomless BMD calibration performs well if precalibrated with a reference dataset.
© 2023 The Authors. Journal of Bone and Mineral Research published by Wiley Periodicals LLC on behalf of American Society for Bone and Mineral Research (ASBMR).


Introduction
In conventional quantitative computed tomography (QCT), the subject is scanned on top of a so-called calibration phantom, a standard that contains inserts of known values of hydroxyapatite (HA) or similar substances characteristic of bone material. Based on the CT values measured in these inserts, CT values measured in the bone of interest can be converted to bone mineral density (BMD). This procedure, termed simultaneous calibration, has the advantage of eliminating the majority of CT scanner instabilities that equally affect the bone of interest and the inserts of the calibration phantom. (1) However, the stability of CT scanners has improved significantly since the early days of QCT, and asynchronous calibration, (2,3) where the calibration phantom is measured separately from the subject at defined intervals, such as monthly, has been proposed to streamline clinical processes. In opportunistic screening, that is, the use of routine clinical CT scans for determining risk of osteoporotic fracture, a phantomless or internal BMD calibration is even more desirable. It is indispensable for research analyses of BMD using historic CT data for which measurements of a BMD calibration phantom are not available.
Instead of using CT values of the HA inserts of a BMD calibration phantom, phantomless calibration uses CT values of certain tissues within the body, termed internal calibration materials (IMs), such as muscle or subcutaneous adipose tissue, along with air. It is assumed that the IM tissue composition does not vary among subjects, which is obviously a source of error of phantomless calibration. Four different techniques have been published for internal calibration. (4-7) In the present study, basic assumptions of these approaches will be reviewed.
The first aim of this study was to compare BMD differences between phantom and phantomless calibrations among the phantomless calibration methods by applying them to identical datasets. The second aim was to compare BMD differences of phantomless calibration methods to a common BMD calibration applied to all subjects, a scenario comparable to a one-time asynchronous calibration, which would be suitable for a stable CT scanner.

BMD calibration
The relation between BMD and measured CT values is linear:

$$\mathrm{BMD} = a \cdot \mathrm{CT} + b \qquad (1)$$

Thus the task of any calibration technique is to determine the constants a and b. In standard single energy QCT they are determined with the help of a calibration phantom measured with (simultaneous) or separately from (asynchronous) the subject. A variety of calibration phantoms exist but all consist of several inserts containing bone equivalent materials (usually hydroxyapatite [HA] or K2HPO4) of different concentrations, from which slope a and intercept b can be determined. (8) Both constants depend on the composition of the x-ray spectrum emitted by the x-ray tube and thus on tube design, age, and voltage, but also on additional filters, on table height, and a number of other factors. They are therefore scanner and acquisition protocol specific. If the CT scanner is correctly calibrated to water then b = 0 and BMD is directly proportional to the measured CT value, but this is rarely the case at the time of the measurement. In real world conditions b should be within ±5 mg/cm³.
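As an illustration of how slope a and intercept b might be obtained from phantom inserts, the following minimal sketch fits the linear relation BMD = a · CT + b by ordinary least squares. The insert densities and CT values are hypothetical and serve only as an example; the helper names `fit_line` and `ct_to_bmd` are not taken from any QCT software.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical phantom: nominal insert densities (mg/cm3) and the
# CT values (HU) measured in those inserts. Values are illustrative only.
insert_bmd = [0.0, 100.0, 200.0]
insert_ct = [2.0, 127.0, 252.0]

# Regress BMD on CT so that BMD = a * CT + b.
a, b = fit_line(insert_ct, insert_bmd)


def ct_to_bmd(ct_value, slope, intercept):
    """Convert a CT value measured in bone to BMD with the linear calibration."""
    return slope * ct_value + intercept


trabecular_bmd = ct_to_bmd(150.0, a, b)
print(f"a = {a:.3f}, b = {b:.2f} mg/cm3, BMD = {trabecular_bmd:.1f} mg/cm3")
```

In a simultaneous calibration this fit would be repeated for every scan, whereas an asynchronous calibration would reuse a and b from a separate phantom measurement.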

Phantomless calibration
It is also the task of any phantomless technique to determine a and b. (9-19) In the following, a and b obtained by phantomless calibration are denoted as a* and b*. Some of the phantomless calibration methods themselves require an a priori calibration with a set of n constants a_i and b_i (with i = 1, …, n) known from a reference dataset of n CT scans analyzed with standard QCT. This dataset must be scanner and kV specific.
Four different phantomless calibration methods and a fifth method that uses a mean value instead of an internal calibration are described below.

Phantomless calibration method 1: Equivalent BMD reference values
First, the reference dataset is used to calculate an 'equivalent' BMD value for each of the k selected IMs, where k is typically 2 or 3. (4,11,13,16,17,19) For each scan i of the n CT scans of the reference dataset and for each of the k IMs, a BMD_{i,k} value is determined according to Eq. (1). The IM-specific equivalent BMD_k is determined by taking the average over all scans of the reference dataset.
In the scan to be calibrated, the k CT values of the IMs are measured and plotted against the BMD_k values. A linear fit results in two constants a* and b* that can be used in Eq. (1) to calculate BMD of the bone volume of interest (VOI).
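The two steps of the equivalent BMD reference values method can be sketched as follows. This is a minimal illustration under stated assumptions: the reference constants and IM CT values are invented, air and blood are used as the two IMs, and the function names are hypothetical rather than part of any published implementation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical reference scans: phantom-based a_i, b_i and measured IM CT values.
reference_scans = [
    {"a": 0.80, "b": -2.0, "ct": {"air": -1000.0, "blood": 40.0}},
    {"a": 0.82, "b": 1.0, "ct": {"air": -1002.0, "blood": 45.0}},
    {"a": 0.78, "b": 0.5, "ct": {"air": -998.0, "blood": 38.0}},
]
materials = ["air", "blood"]


def equivalent_bmd(scans, ims):
    """Step 1: average BMD_{i,k} = a_i * CT_{i,k} + b_i over all reference scans."""
    result = {}
    for im in ims:
        values = [s["a"] * s["ct"][im] + s["b"] for s in scans]
        result[im] = sum(values) / len(values)
    return result


def calibrate_scan(im_ct, equiv):
    """Step 2: fit equivalent BMD_k against the IM CT values of the new scan."""
    x = [im_ct[im] for im in equiv]
    y = [equiv[im] for im in equiv]
    return fit_line(x, y)  # returns (a*, b*)


equiv = equivalent_bmd(reference_scans, materials)
a_star, b_star = calibrate_scan({"air": -1001.0, "blood": 42.0}, equiv)
bone_bmd = a_star * 150.0 + b_star  # BMD of a bone VOI with CT value 150 HU
```

With only two IMs the line passes exactly through both points; with three IMs the same function returns the least-squares line.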

Phantomless calibration method 2: Multiple regression
This method (5,18) also requires an a priori calibration with a reference dataset. a* and b* are expressed as linear combinations of the CT values of the k IMs:

$$a^{*} = m_0 + m_1 \cdot \mathrm{CT}_1 + \dots + m_k \cdot \mathrm{CT}_k \qquad (3)$$

$$b^{*} = n_0 + n_1 \cdot \mathrm{CT}_1 + \dots + n_k \cdot \mathrm{CT}_k \qquad (4)$$

The constants m_0, m_1, …, m_k are determined by a multiple linear regression analysis using the known a_i of the reference dataset as dependent and the measured CT_{i,k} values of the k IMs as independent variables (Fig. 1). With a corresponding analysis using the known b_i values, constants n_0, n_1, …, n_k are determined.
For a given CT scan of a phantomless CT dataset, the CT values of the IMs are measured and then a* and b* are calculated using Eqs. (3) and (4). a* and b* can be used in Eq. (1) to calculate BMD of the bone of interest.
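The regression step for slope a* might be sketched as below for two IMs (air and blood). The data are synthetic and noise-free, generated from an assumed linear model so that the fit can be checked; the function names are hypothetical. The predictors are centered before solving the normal equations, which is a numerical convenience chosen here and not a detail taken from the cited publications.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_multilinear(im_ct_rows, targets):
    """Least-squares fit of target = m0 + m1*CT_1 + ... + mk*CT_k."""
    n = len(targets)
    k = len(im_ct_rows[0])
    ct_means = [sum(row[j] for row in im_ct_rows) / n for j in range(k)]
    t_mean = sum(targets) / n
    centered = [[row[j] - ct_means[j] for j in range(k)] for row in im_ct_rows]
    xtx = [[sum(c[i] * c[j] for c in centered) for j in range(k)] for i in range(k)]
    xty = [sum(c[i] * (t - t_mean) for c, t in zip(centered, targets)) for i in range(k)]
    slopes = solve_linear_system(xtx, xty)
    intercept = t_mean - sum(m * mu for m, mu in zip(slopes, ct_means))
    return [intercept] + slopes


# Hypothetical reference dataset: (CT_air, CT_blood) per scan, in HU.
im_ct_ref = [(-1000.0, 40.0), (-1003.0, 44.0), (-998.0, 37.0),
             (-1001.0, 42.0), (-996.0, 39.0)]
# Synthetic phantom-based slopes a_i, generated from an assumed model.
a_ref = [1.0 + 0.0002 * air + 0.001 * blood for air, blood in im_ct_ref]

m = fit_multilinear(im_ct_ref, a_ref)  # [m0, m1, m2]

# Phantomless scan: measure the IM CT values and evaluate the regression.
ct_air, ct_blood = -1001.0, 42.0
a_star = m[0] + m[1] * ct_air + m[2] * ct_blood
```

The intercept b* would be obtained the same way by passing the known b_i values as targets.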

Phantomless calibration method 3: CT number correction
Instead of modifying constants a and b to calculate BMD, the CT value measured in the bone VOI of the scan without a phantom is linearly modified in this approach (6,9):

$$\mathrm{CT}' = s \cdot \mathrm{CT} + t$$

where CT is the measured and CT' the modified value. BMD in the scan without phantom is then calculated according to Eq. (1) using a and b obtained from one or multiple CT scans of the reference dataset and using CT' instead of CT. The correction factors s and t are calculated for each phantomless CT scan, assuming that for a stable scanner s = 1 and t = 0. Scanner instabilities that require a modification of the measured CT values can be assessed with the help of two IMs (k = 1, 2). CT values of both IMs are measured in the reference CT dataset, CT^R_1 and CT^R_2, and in the phantomless CT dataset, CT_1 and CT_2, which results in a system of two linear equations with two unknowns s and t:

$$\mathrm{CT}^{R}_{1} = s \cdot \mathrm{CT}_{1} + t, \qquad \mathrm{CT}^{R}_{2} = s \cdot \mathrm{CT}_{2} + t$$

Phantomless calibration method 4: Voxel-specific calibration
In Eq. (6), ρ and (μ/ρ) are the physical density and the mass absorption coefficient of the corresponding bone VOI, respectively. Using their elemental composition, (21) mass absorption coefficients of marrow (M) and HA can be calculated from the NIST database (National Institute of Standards and Technology, NISTIR 4999 (22)) as a function of photon energy. Thus, in principle the spectral distribution of the polychromatic x-ray beam emitted by the CT tube must be known and Eq. (6) must be integrated over the full spectrum. However, in practice the absorption characteristics of such a spectrum can be approximated by an effective energy. (23,24) In order to solve Eq. (6), two or more IMs are required. The known density values are plotted over their measured CT values (Fig. 2A), and slope and intercept of a linear regression are used to obtain ρ of the bone VOI. The same method is used to determine (μ/ρ) of the bone VOI from a linear regression of the known mass absorption coefficients of the IMs and their measured CT values (Fig. 2B). Further details are given in the Supporting Information (Voxel-specific calibration technique).
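For the CT number correction (method 3), the correction factors s and t might be computed as in the following sketch. It assumes that the correction maps the IM CT values of the phantomless scan onto those of the reference scan, that is, CT^R_k = s · CT_k + t for the two IMs. All numerical values and function names are hypothetical.

```python
def correction_factors(ct_ref, ct_scan):
    """Solve CT_ref_k = s * CT_scan_k + t for two internal materials.

    ct_ref  -- (CT^R_1, CT^R_2) measured in the reference scan
    ct_scan -- (CT_1, CT_2) measured in the phantomless scan
    """
    (r1, r2), (c1, c2) = ct_ref, ct_scan
    s = (r1 - r2) / (c1 - c2)
    t = r1 - s * c1
    return s, t


def corrected_bmd(ct_bone, s, t, a_ref, b_ref):
    """Apply CT' = s * CT + t, then the reference calibration BMD = a * CT' + b."""
    ct_corrected = s * ct_bone + t
    return a_ref * ct_corrected + b_ref


# Hypothetical values: air and blood in the reference and the phantomless scan (HU).
ct_reference = (-1000.0, 40.0)
ct_phantomless = (-990.0, 45.0)
s, t = correction_factors(ct_reference, ct_phantomless)

# Reference calibration constants a, b are assumed known from a phantom scan.
bmd = corrected_bmd(150.0, s, t, a_ref=0.8, b_ref=-1.6)
```

If the IM CT values of both scans agree, the function returns s = 1 and t = 0, which corresponds to the stable scanner case.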

Phantomless calibration method 5: Mean phantom calibration
In order to compare the methods above with a non-subject-specific calibration, we calculated a "mean" calibration of n CT scans from the reference dataset, simply by averaging a and b:

$$a^{*} = \frac{1}{n}\sum_{i=1}^{n} a_i, \qquad b^{*} = \frac{1}{n}\sum_{i=1}^{n} b_i \qquad (7)$$

The concept of this technique is similar to an asynchronous calibration; it does not use CT values of IMs.
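A minimal sketch of the mean phantom calibration, assuming a small hypothetical set of phantom-based constants (a_i, b_i) from the reference dataset:

```python
# Hypothetical phantom-based calibration constants (a_i, b_i) of the reference scans.
reference_constants = [(0.80, -2.0), (0.82, 1.0), (0.78, 0.4)]

n = len(reference_constants)
a_mean = sum(a_i for a_i, _ in reference_constants) / n
b_mean = sum(b_i for _, b_i in reference_constants) / n


def bmd_mean_calibration(ct_bone):
    """Apply the same averaged constants to every scan; no IM values are used."""
    return a_mean * ct_bone + b_mean


bmd = bmd_mean_calibration(150.0)
```

Because the same constants are applied to all subjects, any scan-to-scan variation of the true a and b appears directly in ΔBMD.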

Datasets
Two existing CT datasets (DS1 and DS2) of the lumbar spine were used. All scans were obtained following the appropriate Institutional Review Board approvals, were fully anonymized, and had been analyzed earlier. No CT scan was specifically obtained for this study. In both datasets subjects were scanned on top of a calibration phantom, that is, constants a and b were available for each CT scan. No contrast agents were used for either dataset.
DS1, obtained in Rochester, MN, USA, consisted of 350 CT scans from female residents from Minnesota, USA (mean age 56.3 ± 17.4 years, range: 21-97 years). The scans covered L1 to L3 and were acquired on a GE LightSpeed QX/i (GE Healthcare, Milwaukee, WI, USA) using a standardized acquisition and reconstruction protocol (Table 1). A Mindways model 2 calibration phantom (Mindways Software Inc., Austin, TX, USA) was used for simultaneous calibration (Fig. 1). Further details of DS1 were described earlier. (25,26)
DS2, obtained at the University Hospital Erlangen, Germany, consisted of 114 CT scans from clinical routine, although for research purposes a Siemens calibration phantom was used. All scans (76 male, 38 female; mean age 57.9 ± 16.2 years, range: 18-88 years) were acquired on a Siemens SOMATOM Definition AS (Siemens Healthineers, Erlangen, Germany). The consent form includes agreement for the use of these scans for clinical research. In DS2, apart from tube voltage, CT acquisition and reconstruction parameters differed among scans (Table 1).

Image analysis
For all scans, average CT values of the IMs were measured in the VOIs depicted in Fig. 3. Air and subcutaneous adipose tissue were measured in an anterior position, skeletal muscle in the left psoas, blood in the aorta (Fig. 3B), and cortical bone in the pedicles, the articular or spinous processes, depending on which CT value was higher (Fig. 3C). For each IM, cylindrical VOIs were placed at the level of L2 with approximately the same height as L2. The VOI locations were comparable in all CT scans; however, if the scan did not cover L2, L3 was selected instead. MIAF (Medical Image Analysis Framework, University Erlangen, Germany, version 6.0.1) was used for segmentation and measurement of trabecular CT values (Fig. 3), which were averaged over L1 to L3 (27) and subsequently converted to BMD.

Comparison of phantomless calibration methods
Sixty scans of DS1 and 57 scans of DS2 were randomly selected for the required a priori calibration for the first three phantomless calibration techniques (equivalent BMD reference values, multiple regression, and CT number correction). The remaining 290 scans of DS1 and 57 scans of DS2 were used to evaluate BMD differences (ΔBMD) between phantomless and simultaneous calibration, which served as gold standard. Randomization of DS2 ensured an equal distribution of table height settings in both groups.
Abbreviations: C1 = air/blood; C2 = air/subcutaneous adipose tissue; C3 = skeletal muscle/subcutaneous adipose tissue; C4 = air/subcutaneous adipose tissue/blood/cortical bone; C5 = subcutaneous adipose tissue/blood/cortical bone; IM = internal calibration materials; SD = standard deviation.
For the first three phantomless calibration techniques, three combinations (C1 to C3) were used as phantomless calibration materials: air and blood (C1), air and subcutaneous adipose tissue (C2), and muscle and subcutaneous adipose tissue (C3). Obviously, IMs were not used for the mean phantom calibration. For the voxel-specific calibration the calculation of BMD is a two-step process: first the determination of the effective energy E_eff and then two linear regressions to determine ρ and (μ/ρ) of the bone volume of interest, as described in Phantomless Calibration Method 4: 'Voxel-Specific Calibration'. For the determination of E_eff, at least a combination of four IMs is required (Table S2). Here we used the combination of air, subcutaneous adipose tissue, blood (replacing water, which cannot be used in clinical CT data), and cortical bone (CB) (C4). For the second step, in addition to C4, we used the same combinations (C1 to C3) as for the first three phantomless calibration techniques and a fifth combination of subcutaneous adipose tissue, blood, and CB (C5).
As described in the Supporting Information: Appendix S1, the voxel-specific calibration technique assumes that the local density of CB used as IM is known and that the measured CT value is not affected by partial volume artifacts. Both assumptions are violated in clinical CT scans of the spine. Thus, in this study two methods to determine the CB CT value were compared. First, the CB CT value was measured in each CT scan as described above (local CB CT value). In a second analysis, in agreement with the ICRU report, (21,28) a physical density of cortical bone of 1920 mg/cm³ corresponding to an HA density of 820 mg/cm³ was assumed. Then, the density was converted back to obtain a global CT value of cortical bone used for all subjects (see Supporting Information, 'Determination of the CT value of cortical bone').
The voxel-specific calibration uses yellow marrow and HA as base materials (Phantomless Calibration Method 4: 'Voxel-Specific Calibration'). This is appropriate for the hip, but at younger age the bone marrow of the spine is mostly blood-forming red marrow. With increasing age there is a gradual conversion of red to yellow marrow. (29,30) Therefore, we also used red marrow as second base material. We used DS1 to investigate the age dependency of BMD differences between the use of red or yellow marrow.
Finally, DS2 was used to determine the dependence of ΔBMD on table height for all investigated calibration methods.
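The random selection of reference scans and the evaluation of ΔBMD on the remaining scans, as described in this section, can be sketched as follows. The sketch assumes ΔBMD is defined as phantomless minus simultaneous BMD; the seed, scan identifiers, and BMD values are hypothetical, and the stratification by table height used for DS2 is not reproduced here.

```python
import random
import statistics


def split_reference_evaluation(scan_ids, n_reference, seed=0):
    """Randomly pick n_reference scans for precalibration; the rest are evaluated."""
    rng = random.Random(seed)
    reference = set(rng.sample(scan_ids, n_reference))
    evaluation = [s for s in scan_ids if s not in reference]
    return sorted(reference), evaluation


def delta_bmd_summary(bmd_phantomless, bmd_simultaneous):
    """Mean, SD, minimum and maximum of ΔBMD (assumed: phantomless - simultaneous)."""
    deltas = [p - q for p, q in zip(bmd_phantomless, bmd_simultaneous)]
    return {
        "mean": statistics.mean(deltas),
        "sd": statistics.stdev(deltas),
        "min": min(deltas),
        "max": max(deltas),
    }


# DS1-like split: 350 scans, 60 of them used as reference dataset.
reference_ids, evaluation_ids = split_reference_evaluation(list(range(350)), 60)

# Hypothetical BMD values (mg/cm3) of four evaluation scans.
summary = delta_bmd_summary([101.0, 118.0, 142.0, 160.0],
                            [100.0, 120.0, 140.0, 161.0])
```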

Statistical analysis
For all calibration methods and for each IM combination, means, standard deviations (SD) and maximum and minimum values of ΔBMD were calculated.
For each phantomless calibration method, one-way analysis of variance (ANOVA) with a Games-Howell post-hoc test was used to compare ΔBMD results among the IM combinations. In the absence of homogeneity of variance, a Welch-ANOVA was used instead.
Based on these results, for each calibration method the best IM combination was selected. These five combinations were again compared with a one-way ANOVA. Bland-Altman analyses were used to compare BMD values obtained from phantom and phantomless calibrations.
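A Bland-Altman analysis of this kind might be sketched as below. The sketch assumes the conventional definition (differences plotted against the mean of both methods, limits of agreement at mean ± 1.96 · SD) and adds a simple regression slope to indicate a dependence of the difference on BMD. The data are hypothetical and the function name is illustrative.

```python
import statistics


def bland_altman(bmd_phantomless, bmd_phantom):
    """Return bias, limits of agreement and slope of difference versus mean."""
    diffs = [p - q for p, q in zip(bmd_phantomless, bmd_phantom)]
    means = [(p + q) / 2.0 for p, q in zip(bmd_phantomless, bmd_phantom)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    # Least-squares slope of the differences regressed on the pairwise means.
    mean_x = statistics.mean(means)
    sxx = sum((x - mean_x) ** 2 for x in means)
    sxy = sum((x - mean_x) * (d - bias) for x, d in zip(means, diffs))
    return {
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        "slope": sxy / sxx,
    }


# Hypothetical BMD values (mg/cm3): overestimation at low, underestimation at high BMD.
phantom = [60.0, 100.0, 140.0, 180.0]
phantomless = [66.0, 102.0, 138.0, 174.0]
result = bland_altman(phantomless, phantom)
```

A negative slope, as in this invented example, would correspond to a phantomless method that overestimates low and underestimates high BMD values.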
Paired t tests were used to compare the ΔBMD results of C1 to C5 between the two methods for determining the CB CT value and the results using red or yellow marrow as base material for the voxel-specific calibration technique. Standard deviations were compared with Levene's tests. Independent sample t tests were used to compare the ΔBMD values between DS1 and DS2. Linear regression analyses were performed to compare the effect of table height on ΔBMD.
A p value of <0.05 was considered significant. All statistical analyses were carried out with SPSS (IBM SPSS Statistics for Windows, version 26; IBM Corp., Armonk, NY, USA).

Results
ΔBMD results of all calibration methods for DS1 and DS2 are listed in Table 2 as absolute and percentage values for the different IM combinations C1 to C5. For the voxel-specific method, ΔBMD results are listed separately for the local and the global CB CT value.
For the equivalent BMD reference values method, in DS1 ΔBMD was significantly lower for C1 and C3 than for C2. In DS2, ΔBMD was significantly lower for C1 than for C2 and C3. Thus, for the equivalent BMD reference values method, C1 was selected as optimum combination. For the multiple regression technique, ΔBMD did not significantly differ among C1 to C3 in either DS1 or DS2, but to be consistent with the equivalent BMD reference values method, C1 was also selected as optimum combination. For the CT number correction method, in DS1 ΔBMD did not significantly differ among C1 to C3, but in DS2 ΔBMD was significantly lower for C1 than for C2 and C3. Thus, C1 was also selected as optimum combination for the CT number correction technique.
For the voxel-specific calibration using a local CB CT value, ΔBMD results were significantly lower for C4 and C5 compared to the other three combinations in DS1 and DS2. Using a global CB CT value, ΔBMD was significantly lower for C5 compared to the other four combinations in DS1 and DS2. For the voxel-specific calibration, the comparison between a global and a local CB CT value depended on the particular IM combination. However, for C5, ΔBMD was significantly lower for the global compared to the local CT value. Thus, for subsequent analyses, ΔBMD results obtained with a global CB CT value and for C5 were used.
Figure 4 shows the comparison of absolute ΔBMD results for all calibration methods, using the best IM combinations. The overall pattern was similar in DS1 (Fig. 4A) and DS2 (Fig. 4B), but SDs in DS2 were numerically about 50% higher compared to those in DS1. In DS1, ΔBMD was significantly higher for the voxel-specific calibration than for all other methods. For the equivalent BMD reference values method, ΔBMD was significantly higher than for the multiple regression, the CT number correction, and the mean phantom calibration technique. In DS2 no ΔBMD differences were observed between calibration methods.
In accordance with these results, limits of agreement (LOA) between simultaneous and phantomless calibrations shown in the Bland-Altman plots (Fig. 5 for DS1 and Fig. S3 for DS2) were also higher for DS2 compared to DS1. LOAs were lowest for the multiple regression and the mean phantom calibration technique. For the voxel-specific calibration, the Bland-Altman plots also show a significant dependence of ΔBMD on BMD. For low BMD values the phantomless approach overestimated and for high BMD values it underestimated BMD by up to 40 mg/cm³ in DS1 and by up to 18 mg/cm³ in DS2 when compared to the phantom based calibration. For the equivalent BMD reference values method, a significant slope was detected in DS1 but not in DS2.
For the voxel-specific calibration using a local CB CT value, the calculated E_eff values for DS1 and DS2 varied widely with mean values of 87.6 keV (SD: 8.9 keV, range: 75-126 keV) and 77.9 keV (SD: 6.5 keV, range: 67-95 keV), respectively. With a global CB CT value, E_eff was significantly higher and the variation was significantly lower for DS1 and DS2 with mean values of 104.6 keV (SD: 1.2 keV, range: 100-110 keV) and 93.2 keV (SD: 1.1 keV, range: 91-96 keV), respectively.
Using red instead of yellow marrow as second base material increased BMD values of the voxel-specific method (Fig. 6A). If red marrow was used for patients below and yellow marrow above an approximate age of menopause of 50 years (Fig. 6B), the BMD differences of the voxel-specific calibration with a global CB CT value were significantly reduced (mean: 6.4 mg/cm³, 2.6%; SD: 9.8 mg/cm³, 8.0%; min: −20.0 mg/cm³, max: 30.1 mg/cm³) but still significantly higher than results for the other methods.
[Correction added on 5 October 2023, after first online publication: the numbers in the first sentence of last paragraph in Result section has been revised]

Discussion
In the present study BMD differences between phantom based and phantomless calibration were compared for four different techniques. With one exception, average ΔBMD was below 3 mg/cm³, which is an excellent result. As opportunistic screening primarily targets fracture risk assessment in individual subjects, standard deviations of ΔBMD are also relevant. For the three phantomless methods that are calibrated with a reference dataset, ΔBMD SDs were smaller than 3.5 mg/cm³ in DS1 and smaller than 5.6 mg/cm³ in DS2, again excellent results, given that BMD differences between normal and osteopenic and between osteopenic and osteoporotic subjects are 20 mg/cm³ each. (31) It is important to note that in our study ΔBMD means and SDs in Fig. 4 and Table 2 can be used as performance measures to compare phantomless calibration techniques because segmentation of the vertebrae, of the in-scan calibration phantoms, and of the VOIs used to extract the CT values of the IMs was performed only once.
It is further interesting that a simple "average calibration" also performed well. In fact, for DS1 and DS2, SDs of ΔBMD for the multiple regression and mean phantom calibration technique were slightly but significantly lower than for the equivalent BMD reference values and CT number correction methods. The multiple regression and the mean phantom calibration technique used the reference dataset to obtain average values of a* and b* for BMD calibration, either by applying a linear regression (Eqs. (3) and (4)) or just by averaging (Eq. (7)). In contrast, the equivalent BMD reference values and CT number correction techniques are more strongly affected by the population variance of the IM CT values that is neglected in all phantomless calibration methods. This is evident from Table 2, where mean and SD of ΔBMD using the multiple regression technique are independent of the selected IM pair. Best results for the equivalent BMD reference values and CT number correction method were obtained for the combination of blood and air (C1), both of which are not (air) or almost not (blood) affected by population differences, in contrast to the combinations air/subcutaneous adipose tissue (C2) or skeletal muscle/subcutaneous adipose tissue (C3). (17)
All methods except the voxel-specific calibration showed significantly higher ΔBMD SDs in DS2 compared to DS1, which can be explained by the broader range of CT acquisition and reconstruction parameters and the slight dependence of ΔBMD on table height in DS2. Table height was constant for all scans in DS1, but varied in DS2.
Performance of the voxel-specific calibration differed from the other techniques. As explained in the Supporting Information: Appendix S1, the first step of the voxel-specific calibration technique is the determination of E_eff based on the measurement of the CT values of a number of reference materials with known concentrations. The results in Table S2 demonstrate that without CT values of air and cortical bone E_eff cannot be determined accurately. However, this is problematic for CT datasets of the spine. For a given VOI of cortical bone the mineralization is usually unknown. Fully mineralized cortical bone may only be present in the pedicles or the spinous processes (Fig. 3), but CT values of these small VOIs will be affected by partial volume artifacts. Thus, E_eff values will be inaccurate, which indeed was reflected by the wide range of E_eff values observed when applying the voxel-specific calibration method in DS1 and DS2, although 120 kV was used for all scans of DS1 and 100 kV for all scans of DS2.
As a consequence, ΔBMD means and in particular SDs for the voxel-specific calibration method (Table 2) using the local CB method were higher compared to the other calibration methods. The use of a global CB CT value significantly reduced means and SDs of ΔBMD, at least for the preferred IM combination C5 of subcutaneous adipose tissue/blood/CB. Nevertheless, even with a global CB CT value SDs of ΔBMD were significantly higher for the voxel-specific calibration compared to the other methods (Fig. 4). It is not fully clear why ΔBMD was 12.6 mg/cm³ for DS1 and only 0.5 mg/cm³ for DS2. However, it should be noted that the voxel-specific calibration technique was the only method that showed a significant slope in the Bland-Altman plots of both datasets (Fig. 5 and Fig. S3), indicating larger ΔBMD results for low and high BMD values. Thus, for the voxel-specific calibration technique, BMD distributions may have affected ΔBMD differently for cohorts of DS1 and DS2.
The higher BMD differences of the voxel-specific calibration are at least in part caused by a calibration offset. Standard single energy QCT converts CT to BMD values using water and, for example, HA as base materials. In the publication describing this method, (7) the authors used yellow marrow instead of water. As seen from Fig. 6A, a change from yellow to red marrow, i.e., just a change in the calibration, caused a BMD offset of more than 10 mg/cm³. The use of yellow marrow is certainly not adequate for spinal BMD measurements in younger subjects, but even in elderly subjects there is still some red marrow in the vertebra. The neglect of fat in standard QCT, known as fat error, causes artificially lower BMD values. It cannot be fully resolved because the marrow composition of a given subject at a given age is unknown; thus any base material will never exactly fit the marrow composition. Dual-energy techniques reduce the fat error, (32) but CT scans used for opportunistic screening are still largely acquired with single energy CT.
Which is the most appropriate phantomless calibration method? If a reference dataset exists, the multiple regression or mean phantom calibration methods showed the best performance. However, as reference datasets are scanner- and kV-specific, multiple reference datasets are required even for a given scanner to cover all possible kV settings. Therefore, an alternative approach such as the CT number correction, which needs only a single reference CT dataset, e.g., a phantom scan from an asynchronous calibration, may be more practical for clinical use, although it requires a long-term stable CT scanner.
For the retrospective analysis of historic data, for which no reference data and not even a single phantom dataset is available, the voxel-specific calibration seems to be the best choice. However, the use of the global option, which showed better performance, requires the knowledge of the CT value of cortical bone or the availability of a reference standard scanned on the CT scanner. Both options may not be available for historic CT data. In this case, for the voxel-specific calibration the local determination of the CT value of CB is the only remaining choice, at the expense of higher means and SDs of ΔBMD.
The first three phantomless calibration methods can only be used for internal calibration of historic scans under the assumption that the particular scanner used for the reference dataset is equivalent to the scanner used for the historic scans, obviously an assumption difficult to prove. During the last two decades, scanner stability has significantly improved and different models from the same manufacturer seem to be comparable, but to our knowledge a detailed comparison has not been published so far.
A major strength of the current study is that the performance of phantomless calibration methods was compared using identical datasets and identical image processing procedures. Published performance data of phantomless calibration methods, which mostly agree well with our results, were all obtained with their own study-specific datasets and procedures. Nevertheless, it must be considered that our particular implementation of the different calibration methods may not exactly match those reported in the literature, as algorithmic details or optimization steps are usually omitted from the publications. Thus, the generalization of our results has some limitations. However, it is unlikely that specific details of an implementation that are unreported in a scientific publication have a major impact on the performance of the internal calibration; therefore, our results should be representative.
For the equivalent BMD reference values method and standardized data our ΔBMD results of 2.4 mg/cm³ were slightly higher than those reported by Lee and colleagues (4) (ΔBMD = 1 mg/cm³). For the multiple regression method and clinical routine data our ΔBMD results of −0.2 mg/cm³ were lower than those reported by Lee and colleagues (18) (mean: 4.3 mg/cm³). Consistent with our findings, Prado and colleagues (5) reported comparable performance of the multiple regression independent of the selected IM combination. For the CT number correction technique and standardized data our ΔBMD results of 0.5 mg/cm³ were comparable with 0.9 mg/cm³ reported by Mueller and colleagues. (10) Other studies also reported an underestimation of the BMD for the phantomless compared to the simultaneous calibration, (9,33) although in clinical routine data we found an overestimation of BMD of 2.5 mg/cm³. For the voxel-specific method, discrepancies between our and other results were higher. In their original publication, Michalski and colleagues (7) reported a ΔBMD value of 3 mg/cm³ using the local CB CT technique, whereas ΔBMD results in our study were larger than 18 mg/cm³. One explanation of this discrepancy may be the fact that results in Michalski's study were based on just 10 cadaveric spines.
This study has a few limitations. First, it was limited to the spine, but opportunistic screening to determine hip BMD may actually be more relevant for hip fractures. Second, an automated segmentation of VOIs used to determine the CT values of the IMs was not performed. However, the same CT values were used to calculate BMD for all phantomless calibration techniques. Third, for the first three phantomless calibration techniques and the mean phantom calibration method we did not investigate the potential impact of a different reference dataset. Fourth, the effect of CT contrast agents was not considered; only native scans were included in the study. However, the majority of routine clinical CT scans are acquired with administration of CT contrast agents. Fifth, dual-energy CT, which will provide new options for phantomless calibration, (34,35) was not considered; its use should be further investigated as most high-end CT scanners nowadays have dual-energy capabilities. Sixth, all scans of DS1 and DS2 and the corresponding reference datasets were taken from a single scanner, respectively. In theory, accuracy errors of internal calibration could differ among scanner manufacturers and models. However, the only methodological study (4) on internal calibration (using the equivalent BMD reference values method) that included multiple types of CT scanners did not report accuracy differences among them, although scanner-specific reference datasets were used.
In conclusion, BMD differences between phantom and phantomless calibration were small (<3 mg/cm³) for methods that are precalibrated with a reference dataset. Interestingly, the use of average calibration constants obtained from the reference dataset resulted in performance comparable to that of the phantomless calibration methods. However, a reference dataset may not be available, for example in case of a retrospective analysis of historic CT data. For such a scenario, one phantomless calibration method has been developed, but its performance is inferior compared to the other methods because one input parameter is the CT value of cortical bone, a very inaccurate measurement in the spine. Phantomless calibration is a valid option to calculate BMD in opportunistic screening and when the use of phantoms is very complicated, such as in large observational studies. Nevertheless, for preplanned studies, standard phantom based QCT is still the method of choice.

Fig. 1. Illustration of the multiple regression technique: Each dot represents the known calibration slope a of one scan of the reference dataset. The regression plane (blue) is used to determine the slope a* required for the phantomless calibration. Points below the plane are shown in black and those above it in red. For a given CT dataset scanned without a phantom, the CT values of the selected internal calibration materials (here air, CT_1 = −1000, and blood, CT_2 = 40) are measured. Then slope a* (purple) required for the BMD calibration of the particular dataset is determined. The same procedure is used to determine intercept b*.

Fig. 2. (A) Density and (B) mass absorption coefficients of the selected IMs plotted against their measured CT values for the effective energy values determined earlier. Linear regression curves are generated from these plots to obtain the values of ρ and (μ/ρ) for the CT value of the bone being examined. IM = internal calibration material.

Fig. 3. (A) Axial slice of an abdominal CT scan at level L2. Red overlays indicate location of volumes of interest used to measure CT values of internal calibration materials. Segmentation of trabecular VOI is shown in blue. The location used to determine CT values of cortical bone is shown in (B) and (C). SAT = subcutaneous adipose tissue.

Fig. 4. Whisker plots comparing BMD differences among the different phantomless calibration techniques for (A) DS1 and (B) DS2. For details please refer to the results section. For the voxel-specific calibration method, a global CT value for cortical bone was used. Top and bottom horizontal borders of the blue boxes indicate 25th and 75th percentiles with their distance representing the interquartile range (IR), the green asterisk the mean, and the red horizontal line the median. Red dots outside the black horizontal lines (whiskers) are outliers with values >1.5 × IR.

Fig. 5. Bland-Altman plots comparing BMD values of phantomless and phantom based calibration. Data are shown for DS1. The solid line indicates the mean difference X̄ and the dashed lines X̄ ± 1.96 · SD(X).

Fig. 6. Age-dependent BMD differences of the voxel-specific calibration method (global cortical bone CT value) when using (A) yellow or red marrow as second base material and (B) red marrow instead of yellow marrow for individuals younger than 50 years in DS1.

Table 1. Scanning and Reconstruction Parameters Used in DS1 and DS2

Table 2. Absolute and Percentage Mean and SD and Absolute Minimum and Maximum Values of BMD Differences Between Simultaneous Calibration and Phantomless Approaches Using Different IM Combinations C1-C5
Note: For the voxel-specific calibration, the results are given for the use of both a local and a global cortical bone CT value. Best combinations are displayed in bold.