Dose reduction in digital radiography based on the significance of marginal contrast detectability

Abstract The performance of three digital detectors was measured at two exposure index (EI) levels in terms of the effect on features at the borderline of detectability. The null hypothesis was that there would be no statistically significant difference in the contrast-to-noise ratio (CNR) of marginally visible features between baseline (2.2 µGy) and reduced-dose (1.4 µGy) images. The experiment used three digital detectors and a phantom composed of an aluminum contrast-recovery plate, with features of varying diameters and hole depths, which was placed between the detector/grid and 5–20 cm of Lucite. Exposures were made using a kVp between 55 and 110 corresponding to the Lucite thickness and a mAs producing an EI of approximately 220 or 140. Images were acquired for all detectors, EI values, and Lucite thicknesses, then scored by a team of physicists and technologists in terms of feature visibility for each feature size. The CNR was calculated for each feature using an ROI over the feature and a local background annulus. The uncertainty in the CNR was determined by sampling the background at each feature size, finding residuals from an overall background fit, and then calculating a standard deviation in the noise for each size. The marginal feature pair for each feature size bracketed the reader score. The difference between the CNR values of corresponding marginal features in EI-paired images was significant (P < 0.05) for one detector and not significant (P > 0.05) for marginal features of the other two. Based on both reader scoring and CNR measurements of phantoms, patient doses can be lowered by 30% for those two detectors without a statistically significant difference in lesion perceptibility of the marginally visible feature, while for the other detector there was a statistically significant change in marginal feature detectability and dose reduction was not recommended.


1 | INTRODUCTION
The transition in imaging media from screen-film to digital detectors decoupled dose and image quality. Screen-film exhibits a loss of contrast when the exposure lies on the shoulder and toe regions of the Hurter-Driffield (H-D) curve, meaning that acceptable image quality corresponds to a limited range of doses to the film. 1 In comparison, digital detectors have a wide dynamic range and it is no longer obvious how much radiation exposure is needed to produce a diagnostically acceptable image. Although this has historically raised concerns of "dose creep," 2 digital detectors have led to dose reduction while maintaining acceptable image quality. 3 Following the principle of minimizing dose (As Low As Reasonably Achievable, the ALARA principle) 4 for radiography, it is important to define an image quality metric so that the minimum dose meeting the metric can be used.
Although the clinical diagnostic quality of an image can be subjective, its physical characterization can be described in terms of noise and contrast between features. The relevant figure of merit for optimizing image quality is therefore the contrast-to-noise ratio (CNR), also called the signal-difference-to-noise ratio (SdNR). 5 One prominent strategy for optimizing the image quality with respect to dose levels is to use a figure of merit (FOM) of CNR²/E, where E is the exposure to the detector. This FOM can be used to optimize the detector energy response, such as finding the optimal kVp per Lucite thickness, since the tube current-time product (mAs) cancels out. It also somewhat balances the image quality improvement with a penalty for increasing patient dose; however, it does not optimize the dose itself for a diagnostic imaging task.
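The cancellation of mAs can be made explicit with a short derivation. It rests on an assumption not stated above: that the detector response is linear and quantum-noise limited, so the signal difference scales with E and the noise with its square root. The symbols k_ΔS and k_σ are illustrative proportionality constants introduced here; they depend on beam quality (kVp, filtration, attenuator) but not on mAs.

```latex
\Delta S = k_{\Delta S}\,E, \qquad
\sigma = k_{\sigma}\,\sqrt{E}
\quad\Longrightarrow\quad
\mathrm{CNR} = \frac{\Delta S}{\sigma} = \frac{k_{\Delta S}}{k_{\sigma}}\,\sqrt{E},
\qquad
\frac{\mathrm{CNR}^{2}}{E} = \left(\frac{k_{\Delta S}}{k_{\sigma}}\right)^{2}
```

Under this assumption the FOM depends only on beam quality, while at a fixed beam quality the CNR itself falls as the square root of E when the exposure is lowered.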
The determination of marginal feature visibility across a range of features has been studied using a contrast-detail curve, often employing the CDRAD phantom (Artinis Medical Systems, Elst, the Netherlands). 6,7 This phantom has an array of cylindrical holes with different depths and diameters, as described in its user manual. 8 The image quality FOM used with this phantom is a software calculation that is inversely related to the sum of the products of the marginal features' diameters and depths. 9 This FOM increases as the features identified as marginal by the software become more subtle. However, the clinical significance of this FOM is not clear. It is dose dependent, so decreasing the dose will reduce the FOM, but there is little guidance on how much change in the FOM can be tolerated clinically; the Institute of Physics and Engineering in Medicine (IPEM) recommended a maximum 30% deviation from baseline. 10 One recent paper 11 looked at the significance of change to this FOM using the standard deviation of the results and found that changes were unlikely to be reliably detected at less than a 50% dose change. One question remained unanswered: how meaningful is the reduction in image quality when the dose is lowered by a certain amount?
Consider the situation where, instead of optimizing the FOM of CNR²/E, the lowering of patient dose is emphasized while requiring acceptable diagnostic quality. The choice of features to represent acceptable diagnostic quality is important, as some hypothetical features will have high enough CNR that they will be visible over the clinically relevant range of detector exposures, and some features will have low enough CNR that they will never be visible over a clinically relevant exposure range. It is reasonable, therefore, to establish a clinical baseline dose and resultant image quality where the features at the borderline of visibility (henceforth called "marginal features") are defined and further dose optimization should be compared to this baseline. This scenario may be more clinically relevant, where patient radiation safety is touted but image quality only needs to meet a minimum threshold that is task dependent. In this situation, there is a feature-specific baseline CNR (CNR_B) and reference patient dose resulting in detector exposure (E_B) producing acceptable image quality. If the patient dose is lowered so that E < E_B and CNR < CNR_B, will the ability to make a clinical diagnosis based on low-contrast features be lost?
One approach to addressing the significance of the change in image quality between baseline and reduced dose is to utilize a statistical method proposed in computed tomography (CT) for determining low-contrast detectability. 12,13 This method uses background sampling with a large number of matrix elements the size of the feature of interest to obtain the mean pixel value from each element at that spatial scale. If the mean values follow a normal distribution, a quantitative measure of the visibility of a low-contrast feature over background can be defined, using the criterion that the feature contrast should be 3.29 times the standard deviation of the background values (representing a 90% confidence limit, with 5% probability tails beyond this). Conceptually, this can be applied to the visibility of a change in feature contrast relative to the background as well.
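The background-sampling idea can be illustrated with a minimal sketch. It assumes a uniform synthetic background with uncorrelated Gaussian noise, which is not the study's data; the function names and the noise parameters are hypothetical. It tiles the background into feature-sized elements, takes the mean of each, and multiplies the standard deviation of those means by 3.29.

```python
import random
import statistics

def element_means(image, size):
    """Mean pixel value of each non-overlapping size x size element of a 2D list."""
    rows, cols = len(image), len(image[0])
    means = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            block = [image[rr][cc]
                     for rr in range(r, r + size)
                     for cc in range(c, c + size)]
            means.append(statistics.fmean(block))
    return means

def detectability_threshold(image, size, k=3.29):
    """Smallest contrast called visible at this spatial scale:
    k times the standard deviation of the element means."""
    return k * statistics.stdev(element_means(image, size))

# ASSUMPTION: synthetic flat background, mean 100, pixel noise SD 5.
rng = random.Random(0)
background = [[rng.gauss(100.0, 5.0) for _ in range(120)] for _ in range(120)]
for size in (4, 8, 12):
    print(size, round(detectability_threshold(background, size), 3))
```

For uncorrelated noise the threshold shrinks roughly in proportion to 1/size, which is why larger features can be detected at lower contrast.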
The goal of this research was to determine if there was a statistically significant difference in the low-contrast detectability of reader-determined marginal features at very different exposure indices. The null hypothesis was that lowering the EI by 30% from the institutional baseline would produce no statistically significant difference in the CNR of the marginal features of an image quality phantom.

2 | MATERIALS AND METHODS
The overarching goal was to determine if a change in CNR for marginal features of a contrast-detail phantom was statistically significant when the dose was lowered by 30% from baseline. The analysis of the images, post-acquisition, required some distinct steps. First, the marginal features were determined by individuals associated with x-ray imaging who scored the low-contrast features for visibility. Second, the contrast was determined by drawing regions-of-interest (ROIs), at a spatial scale corresponding to the feature size, over and around the feature. The noise was the average pixel noise from sampling over the image. Then, the variance in the background at different spatial scales was determined from the variance in background means from size-specific ROIs. Finally, a chi-squared statistical test was applied to see if the difference in CNR for the corresponding marginal features in baseline- and low-dose images was significant compared to the ratio of variance in background means to pixel noise at that spatial scale. Each of these analysis steps is described below.

2.A | Image acquisition
The experiment was conducted using a Siemens Multix x-ray room (Siemens Medical Solutions USA, Malvern, PA) retrofitted with both a Carestream DRX-1c detector and a Carestream DRX-Plus detector, and also a portable x-ray unit from a third-party vendor with a CXDI-710c digital detector manufactured by Canon. All three were cesium iodide (CsI)-based digital radiography detectors. Both x-ray generating devices were acceptance-tested and had annual physics QC tests performed, and the exposure index calibration for all three detectors was tested at acceptance in accordance with AAPM TG 116 and the manufacturer's recommendations for creating RQA5 beam conditions (Fig. 1 Table 2); a diagram of the experimental setup is shown in Fig. 2. Two images were produced for each phantom build using a manual technique: a kVp between 55 and 110 as appropriate to the Lucite thickness 14 (see Table 4) and a mAs producing an EI of approximately 220 or 140. The two target EI values were chosen based on the baseline EI and the experimental target EI, but the actual EI values had to be approximate because of the granularity of the mAs stations. Acquisition parameters and the resultant exposure index values are provided in Table 5. A total of 24 images were produced (three detectors × four Lucite thicknesses × two EI levels). The images were acquired using a minimal processing algorithm such as "Pattern." Gain and offset calibrations were applied, but the image was not processed based on assumptions of anatomy.

2.B | Reader scoring
In order to identify the marginal features used in the analysis, the images were scored for the last resolvable feature of each feature size, which was the lowest contrast feature for each diameter (column) that appeared circular. The low-contrast features were defined as f_{i,j}, where i ∈ {1, 2, ..., 10} beginning with the largest diameter and

TABLE 1 Vendor information on exposure index (calibration beam quality and EI definition) for the three digital detectors.

The difference in scoring team composition between Canon and Carestream is due to the fact that images were acquired at separate times and there was personnel turnover in between. The image scores were reported in the form of f_{1,j} − f_{2,j} − ... − f_{10,j}, where j represents the lowest contrast feature that was distinguishable. A feature score mean (average j) and standard deviation for each i were calculated from the collected reader data. The marginal features for each size were then determined so as to bracket the mean reader score; for example, if the mean reader score for a group of 10 features in column i were 7.1, then the marginal features for that column i would be f_{i,7} and f_{i,8}. To serve as an additional check of the quantitative results, the differences in scores for the baseline- and low-EI image pairs for each reader were calculated and the standard deviation among readers for the score difference was found.
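The bracketing rule and the per-reader score difference can be sketched as follows. The function names and reader scores are hypothetical, and the handling of a mean that falls exactly on an integer or at the edge of the phantom is an assumption, since the text does not cover those cases.

```python
import math
import statistics

def marginal_pair(mean_score, n_rows=10):
    """Row indices (j, j+1) that bracket the mean reader score for one column.
    ASSUMPTION: an exact integer mean j maps to (j, j+1), and the pair is
    clamped to the phantom's rows 1..n_rows."""
    lower = math.floor(mean_score)
    upper = lower + 1
    return max(lower, 1), min(upper, n_rows)

def score_difference_stats(baseline_scores, low_scores):
    """Mean and standard deviation, across readers, of (baseline - low EI) scores."""
    diffs = [b - l for b, l in zip(baseline_scores, low_scores)]
    return statistics.fmean(diffs), statistics.stdev(diffs)

# Hypothetical scores from five readers for one column of one image pair.
baseline = [7, 8, 7, 6, 7]
low_ei = [7, 7, 6, 6, 7]
print(marginal_pair(statistics.fmean(baseline)))
print(score_difference_stats(baseline, low_ei))
```

A mean difference that is small compared with its standard deviation among readers suggests the dose change did not shift the scores beyond reader variability.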

2.C | Feature contrast measurement
In addition to reader scoring, the images were quantitatively evaluated to measure the CNR of each feature using the ImageJ software (National Institutes of Health, Bethesda, MD). To determine the contrast for each f_{i,j}, the pixel intensity S_{i,j} was measured using an ROI drawn over the feature using an ImageJ macro, designed so that the

TABLE 2 Gammex model 1151 aluminum contrast-detail recovery phantom has a matrix of holes, arrayed 10 × 10. The phantom was imaged so that depth (contrast) varied by rows and diameter varied by columns. Hole depth and diameter are given in mm.

For illustration purposes, a representative image of the feature ROIs and local background ROIs is shown in Fig. 4. However, the background pixel intensity varied with a gradient across the image and the gradient was sufficiently steep in places to prevent the use of the annular ring pixel noise N_{i,j} as the noise when calculating the CNR. It was ultimately decided to use a global but size-dependent value of the noise for the CNR of each feature in the image, as described in the next subsection.
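A feature ROI with a surrounding annular background ROI can be sketched in plain Python as below. This is not the authors' ImageJ macro: the function names, the gap between ROI and ring, the ring width, and the synthetic image are all assumptions made for illustration.

```python
import statistics

def roi_values(image, cx, cy, r_inner, r_outer):
    """Pixel values at distance d from (cx, cy) with r_inner <= d < r_outer."""
    vals = []
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if r_inner ** 2 <= d2 < r_outer ** 2:
                vals.append(v)
    return vals

def feature_contrast(image, cx, cy, radius, gap=2, ring_width=4):
    """Return (feature mean - ring mean, ring pixel SD) for one feature.
    ASSUMPTION: the ring starts `gap` pixels outside the feature ROI."""
    feature = roi_values(image, cx, cy, 0, radius)
    ring = roi_values(image, cx, cy, radius + gap, radius + gap + ring_width)
    contrast = statistics.fmean(feature) - statistics.fmean(ring)
    return contrast, statistics.pstdev(ring)

# Synthetic 41 x 41 image: flat background of 100 with a disc of 110, radius 6.
image = [[110.0 if (x - 20) ** 2 + (y - 20) ** 2 < 36 else 100.0
          for x in range(41)] for y in range(41)]
print(feature_contrast(image, 20, 20, 6))
```

On an image with a background gradient the ring standard deviation is inflated by the trend itself, which is the reason given above for replacing it with a global, size-dependent noise value.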

2.D | Background variance determination
The process for determining the spatial scale-specific background variance utilized the distribution of the means of an ensemble of size-specific background samples, which was fitted to find the standard deviation of that distribution. 12,13 This standard deviation served as the statistical uncertainty in the contrast at that spatial size. However, it was noted that there was a DC component of the background which consisted of a signal gradient across the images (see Fig. 5a). The σ_i from the Gaussian fits of samples of size i was used to determine the uncertainty in the CNR calculation for each f_{i,j}, as it described the statistical variance in the background at size i.
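One way to separate a background gradient from the statistical scatter is sketched below. Two choices here are assumptions that differ from or go beyond the text: the overall background is modeled as a plane fitted by least squares, and the standard deviation of the residuals is computed directly rather than from a Gaussian fit to their distribution. All names and the synthetic samples are hypothetical.

```python
import math
import random
import statistics

def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def detrended_sigma(samples):
    """samples: (x, y, mean) for each size-specific background ROI.
    Fit mean ~ a + b*x + c*y, then return the SD of the residuals
    using n - 3 degrees of freedom for the three fitted parameters."""
    basis = [(1.0, x, y) for x, y, _ in samples]
    z = [s[2] for s in samples]
    ata = [[sum(p[i] * p[j] for p in basis) for j in range(3)] for i in range(3)]
    atz = [sum(p[i] * zi for p, zi in zip(basis, z)) for i in range(3)]
    a, b, c = solve3(ata, atz)
    residuals = [zi - (a + b * x + c * y) for (x, y, _), zi in zip(samples, z)]
    return math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 3))

# ASSUMPTION: synthetic ROI means with a linear gradient plus unit Gaussian scatter.
rng = random.Random(1)
samples = [(x, y, 100.0 + 0.5 * x + 0.2 * y + rng.gauss(0.0, 1.0))
           for x in range(0, 200, 10) for y in range(0, 200, 10)]
raw_sd = statistics.stdev([s[2] for s in samples])
print(round(raw_sd, 2), round(detrended_sigma(samples), 2))
```

The raw standard deviation is dominated by the gradient, while the detrended value recovers the scatter that is relevant to the uncertainty at that spatial scale.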

2.E | Calculating CNR change and uncertainty
The final calculation was to find the CNR and CNR difference (ΔCNR_{i,j}) for corresponding marginal features in the EI image pair (images from same detector and phantom but at baseline and lower dose). The CNR_{i,j} was calculated as the difference in feature ROI and annular ring ROI values divided by the size-specific pixel noise.
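The CNR definition and a chi-squared comparison of paired CNR values can be sketched as follows. Several parts are assumptions rather than the authors' stated method: the uncertainties of the two CNR values are taken as independent and added in quadrature, the degrees of freedom equal the number of features tested, and the p-value is the upper tail of the chi-squared distribution. The function names and example numbers are hypothetical.

```python
import math

def cnr(feature_mean, ring_mean, noise):
    """(feature ROI mean - annular ROI mean) / size-specific pixel noise."""
    return (feature_mean - ring_mean) / noise

def chi2_sf(x, dof):
    """Upper-tail probability of a chi-squared variable, computed from the
    series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, h = dof / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= h / (a + n)
        total += term
    lower = total * math.exp(-h + a * math.log(h) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def delta_cnr_test(cnr_base, cnr_low, u_base, u_low):
    """Test the null hypothesis that every CNR difference is zero.
    Returns (chi-squared, reduced chi-squared, p-value).
    ASSUMPTION: independent uncertainties combined in quadrature."""
    chi2 = sum((b - l) ** 2 / (ub ** 2 + ul ** 2)
               for b, l, ub, ul in zip(cnr_base, cnr_low, u_base, u_low))
    dof = len(cnr_base)
    return chi2, chi2 / dof, chi2_sf(chi2, dof)

# Hypothetical CNR values for five marginal features (not data from the study).
base = [2.5, 2.1, 1.8, 1.6, 1.4]
low = [2.1, 1.9, 1.5, 1.5, 1.2]
unc = [0.3] * 5
print(delta_cnr_test(base, low, unc, unc))
```

A p-value above 0.05 would mean the null hypothesis of no CNR change is not rejected for that detector and phantom combination.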

3 | RESULTS
The difference in reader scores for EI image pairs was compared across the three detectors with results displayed in Figs. 8(a)-8(d).
The scoring difference between EI image pairs for the five largest diameter features was less than or comparable to the standard deviation of the scoring difference. The exception was for the DRX-Plus detector, 5 cm phantom, feature size 5, where the scoring difference was three times the standard deviation. Of 60 mean scoring differences among the three detectors, only two features (including the aforementioned one) had a ratio relative to the standard deviation greater than 2.0 and no single reader was responsible for the score difference.
To evaluate the readers relative to each other, a single-factor analysis of variance (ANOVA) was performed to determine if the readers' scores were consistent at the 5% level. For all three detectors, the individual readers were not consistent with each other at the 5% level, but the mean technologist group score was consistent with the overall mean and the mean physicist group score was also consistent with the overall mean.
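The F statistic of a single-factor ANOVA can be computed as in the sketch below. It assumes each reader forms one group containing that reader's scores; the exact grouping used in the study is not given, and the scores shown are hypothetical. The p-value would follow from the F distribution with the returned degrees of freedom and is not computed here.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean([v for g in groups for v in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2
                     for g in groups)
    # Variation of the individual scores around their own group mean.
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical scores: one list per reader, one entry per scored column.
reader_scores = [[7, 8, 6, 7, 5], [6, 7, 6, 6, 5], [8, 8, 7, 7, 6]]
print(one_way_anova_f(reader_scores))
```

A large F relative to the critical value at the 5% level indicates that the readers' mean scores differ by more than their internal scatter would explain.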
The difference in the CNR between corresponding features of EI-paired images was found for the marginal features of each feature size. Figs. 9(a)-9(d) show the difference in CNR between high and low EI for the five largest features. The p-value and reduced chi-squared are included in Table 6 for each detector and phantom combination; only the data from the five largest feature sizes were used to calculate the p-value and chi-squared.

Although this may seem complicated, at a large institution we have technique charts tailored to the individual imaging device and techs assigned to a particular room, so that equipment-specific requirements have been adopted easily by the majority of users.

ACKNOWLEDGMENTS
The authors thank the Quality Assurance and Performance Improvement team for Digital Radiography, who assisted in reviewing images and optimizing detector doses for image quality and radiation safety: