A method for quantitative evaluations of scanning‐proton dose distributions

Abstract Purpose Patient‐Specific Quality Assurance (PSQA) measurement analysis depends on generating metrics representative of calculation and measurement agreement. Considering the heightened capability of discrete spot scanning protons to modulate individual dose voxels, a dose plane comparison approach that maintained all of the capabilities of the well‐established γ test, but that also provided a more intuitive error parameterization, was desired. Methods Analysis was performed for 300 dose planes compared by searching all calculated points within a fixed radius around each measured pixel to determine the dose deviation. Dose plane agreement is reported as the dose difference minimum (DDM) within an empirically established search radius: ΔDmin(r). This per‐pixel metric is aggregated into a histogram binned by dose deviation. Search‐radius criteria were based on a weighted‐beamlet 3σ spatial deviation from imaging isocenter. Equipment setup error was mitigated during analysis using tracked image registration, ensuring beamlet deviations to be the dominant source of spatial error. The percentage of comparison points with <3% dose difference determined pass rate. Results The mean beamlet radial deviation was 0.38mm from x‐ray isocenter, with a standard deviation of 0.19mm, such that 99.9% of relevant pencil beams were within 1 mm of nominal. The dose‐plane comparison data showed no change in passing rate between a 3%/1mm ΔDmin(r) analysis (97.6 +/‐ 3.6%) and a 3%/2mm γ test (97.7 +/‐ 3.2%). Conclusions PSQA dose‐comparison agreements corresponding to a search radius outside of machine performance limits are likely false positives. However, the elliptical shape of the γ test is too dose‐restrictive with a spatial‐error threshold set at 1 mm. This work introduces a cylindrical search shape, proposed herein as more relevant to plan quality, as part of the new DDM planar‐dose comparison algorithm. 
DDM accepts all pixels within a given dose threshold inside the search radius, and carries forward plan‐quality metrics in a straightforward manner for evaluation.


1 | INTRODUCTION
Patient-specific quality assurance (PSQA) is common practice for intensity-modulated radiation therapy with protons (IMPT) and x rays (IMRT), where it verifies agreement between the calculated and delivered doses. The measurement methods have evolved in step with the proliferation and increasing complexity of IMRT. In current practice, the planned delivery is measured on a planar or cylindrical array and compared with a treatment planning system (TPS) calculation, with the level of agreement described by a γ index (see Eq. 1) as proposed by Low et al. 1 Two parameters define the γ index: distance to agreement (DTA, or Δd_M) and dose difference (ΔD_M). These criteria define the passable agreement tolerances. The DTA is classically defined as the distance between a measured point and the nearest calculated pixel with a dose value that agrees within a set threshold. 2 In the γ index, the DTA is conflated with a dose agreement criterion to define a passing (γ < 1) ellipse; with this search shape, the allowed dose difference shrinks as the distance from the measured point increases. The spatial axis of the ellipse is defined by taking the three-dimensional (3D) distance between the locations of the measured and calculated points.
A resulting γ index <1 indicates that the measured point has passed the correlation test. TG 119 recommended a 90% passing rate for Δd_M = 3 mm and ΔD_M = 3%, considering points receiving >10% relative dose, 3 though PSQA was a secondary consideration of the report.
The more recent PSQA-dedicated TG 218 4 suggested a passing rate tolerance of 95% and a passing rate action level of 90%, assuming test thresholds of ΔD_M = 3% and Δd_M = 2 mm for pixels receiving >10% of the intended dose.
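For reference, the γ evaluation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the analysis software used in this work. The function names, the brute-force search on a shared pixel grid, the global normalization to the maximum calculated dose, and the 10% low-dose cut-off taken relative to that maximum are all assumptions made for the example.

```python
import math

def gamma_map(measured, calculated, pixel_mm, dose_tol_pct=3.0, dta_mm=2.0,
              low_dose_cut_pct=10.0):
    """Brute-force global gamma index for two equally sized 2D dose planes.

    Planes are lists of rows on the same grid. Returns {(row, col): gamma}
    for measured pixels at or above the (assumed) low-dose cut-off.
    """
    max_dose = max(max(row) for row in calculated)
    dose_tol = dose_tol_pct / 100.0 * max_dose
    cutoff = low_dose_cut_pct / 100.0 * max_dose
    # Points farther away than the DTA always give gamma > 1, so the search
    # window is limited to the DTA; reported values above 1 are upper bounds.
    reach = math.ceil(dta_mm / pixel_mm)
    rows, cols = len(measured), len(measured[0])
    gammas = {}
    for i in range(rows):
        for j in range(cols):
            if measured[i][j] < cutoff:
                continue
            best = math.inf
            for di in range(-reach, reach + 1):
                for dj in range(-reach, reach + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        dist = pixel_mm * math.hypot(di, dj)
                        diff = measured[i][j] - calculated[ii][jj]
                        # Dose and distance are each divided by their
                        # criterion and combined in quadrature.
                        best = min(best, math.hypot(dist / dta_mm,
                                                    diff / dose_tol))
            gammas[(i, j)] = best
    return gammas

def gamma_pass_rate(gammas):
    """Percentage of evaluated pixels with gamma < 1."""
    return 100.0 * sum(g < 1.0 for g in gammas.values()) / len(gammas)
```

Because the dose and distance terms are combined into one unitless number, the returned value alone does not reveal whether a failing pixel failed on dose or on distance.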
The γ test gained widespread acceptance relatively soon after being introduced; test results provide a simple summary of the somewhat complex interplay between beam alignment and dose accuracy. The now widespread availability of intensity-modulated beam delivery, including IMRT and volumetric modulated arc therapy (VMAT), spurred the demand for increasingly sophisticated PSQA diagnostics. [5][6][7][8][9] Specifically, previous work has noted γ-test shortcomings in several key parameters: spatial sensitivity, 6,9-12 dose sensitivity, 5,8,10,[13][14][15] and specificity. 7,16,17 While insightful and formative to this work, these reports did not address some challenges unique to evaluating IMPT dose distributions.
In an effort to test the sensitivity of our institution's PSQA process for IMPT delivery, dose deviations of up to 15% were inserted into a 3 × 3 cm² plane within a 10 × 10 × 10 cm³
A spot-scanning proton beam provided the opportunity to measure parameters, such as beamlet position, directly from the delivery system. Beamlet position accuracy was continuously monitored as part of our synchrotron delivery system. The synchrotron safety system tracks systematic and random beamlet position deviation to ensure that the centroid and tail of the beamlet superposition remain within tolerance.

2.B | Search distance vs distance to agreement
To combine the dose and distance dimensions into a γ index, units are removed through division. This conceals the responsible parameter (dose or DTA), so that the user must back-calculate via the γ angle and index values to tease out the per-pixel dose or DTA. To achieve a fixed dose threshold across the search shape and a more straightforward output, we elected to report per-pixel dose deviations directly as a histogram for a given search distance threshold.
To support the proposed DDM method, a new parameter, named the Search Distance (r), is necessary. Unlike DTA, which is a variable in the γ index equation, r is a fixed value defined per clinic from an empirical determination of the spatial accuracy of the modulated beamlet.
r defines a radius that outlines a circular area (or spherical volume)
Alignment, or registration, of the fields negated the error components associated with setup and couch uncertainties, ensuring that the positions of the individual pencil beams were the dominant contributor to the spatial uncertainty of the PSQA system. Relative plane shifts required during alignment were tracked to ensure that all adjustments fell within the error tolerances of setting up equipment to the lasers, and were consistent in magnitude and direction for each data collection session. Laser and x-ray isocenter coincidence, and beam and x-ray isocenter coincidence, were checked daily prior to clinical use per institutional policy.

2.C | Dose difference minimum vs dose difference
The γ test uses a dose difference to define one axis of the elliptical space which defines acceptable agreement. In contrast, the proposed method reports a spatially limited Dose Difference Minimum, ΔDmin(r) or DDM, where N is the number of test points within the area (volume) bounded by the empirically determined r, ΔD_n^(meas−TPS) is the difference in dose between the measurement and the TPS calculation for the nth test point, and "Max Dose" is the global maximum of the dose plane. Similar to the available γ analysis algorithms, this algorithm correlates each measurement to a matched location within the calculated dose volume and then compares with multiple (N) dose points surrounding that location. The difference is that the dose threshold is invariant over the full search space. The minimum result from the N comparisons is still the accepted output, as with γ analysis, but it may now be expressed directly as either absolute or relative dose for a given search distance. This function may be thought of simply as reporting the best dose agreement within a statistically probable search distance, per pixel.
The data that support the findings of this study are available from the corresponding author upon reasonable request.
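The per-pixel ΔDmin(r) evaluation described in this section can be sketched as follows. The sketch rests on stated assumptions rather than on the authors' code: both planes share one pixel grid, the difference is taken as measurement minus TPS, the signed difference of smallest magnitude among the N points within r is kept, and relative values are normalized to the global maximum of the calculated plane. The function names are hypothetical.

```python
import math

def ddm_map(measured, calculated, pixel_mm, r_mm=1.0, relative=True):
    """Per-pixel dose difference minimum within a fixed search radius.

    For each measured pixel, every calculated pixel whose centre lies within
    r_mm is compared, and the signed difference (measured - calculated) of
    smallest magnitude is kept. The dose threshold plays no role in the
    search, so the search shape is a cylinder rather than an ellipse.
    """
    max_dose = max(max(row) for row in calculated)
    reach = math.floor(r_mm / pixel_mm + 1e-9)
    # Offsets of the N test points inside the circular search area.
    offsets = [(di, dj)
               for di in range(-reach, reach + 1)
               for dj in range(-reach, reach + 1)
               if pixel_mm * math.hypot(di, dj) <= r_mm + 1e-9]
    rows, cols = len(measured), len(measured[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            best = None
            for di, dj in offsets:
                ii, jj = i + di, j + dj
                if 0 <= ii < rows and 0 <= jj < cols:
                    diff = measured[i][j] - calculated[ii][jj]
                    if best is None or abs(diff) < abs(best):
                        best = diff
            out[i][j] = 100.0 * best / max_dose if relative else best
    return out

def ddm_pass_rate(ddm, dose_tol_pct=3.0):
    """Percentage of pixels whose relative DDM magnitude is below tolerance."""
    values = [v for row in ddm for v in row]
    return 100.0 * sum(abs(v) < dose_tol_pct for v in values) / len(values)
```

Keeping the sign of the retained difference means the output can later be binned into a histogram that shows the direction of a dose shift as well as its size.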

3.A | Search distance
Average beamlet position deviation from x-ray isocenter was determined from high-resolution (0.275 mm/pixel) scintillator measurements over a 6-month period across all four treatment gantries. 19 The resulting deviation histogram was weighted according to clinical use frequency, based on delivery logs collected over a 1-month period, to determine the probability weighted distribution.
The mean beamlet radial deviation was 0.38 mm from nominal (Table 1), with a standard deviation of 0.19 mm. We determined that 99.9% of relevant pencil beams were delivered within 1 mm of the desired location (Fig. 1).
While the underlying distribution of deviations is non-normal due to the restriction that deviations must be positive, large sample theory and the Central Limit Theorem allow for testing based on the normal distribution. Both the one-sided Student's t-test and the Wilcoxon signed-rank test showed high statistical significance (P < 0.0001) for the probability that the average beamlet position deviation was less than 1 mm. Any agreement with test points beyond the determined range from the measured point is most likely coincidental, or a false positive, since beam position accuracy renders correlation statistically improbable (i.e., <0.1% for the observed distribution). Based on this characterization of our system, and in line with the TG-224 standard of 1-mm agreement between beam and x-ray isocenter, 18 r = 1.0 mm was used.
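The way a clinic might derive its own search distance from spot-position data can be sketched as below. This is a hedged illustration only: the function name, the input layout (one radial deviation and one clinical-use weight per spot position), and the reading of the "weighted 3σ" criterion as weighted mean plus three weighted standard deviations are assumptions, not details given in the text.

```python
import math

def weighted_spot_stats(deviations_mm, weights, r_mm=1.0):
    """Use-weighted summary of radial spot deviations.

    Returns (mean, standard deviation, fraction of weight within r_mm,
    mean + 3 * standard deviation).
    """
    total = sum(weights)
    pairs = list(zip(deviations_mm, weights))
    mean = sum(w * d for d, w in pairs) / total
    var = sum(w * (d - mean) ** 2 for d, w in pairs) / total
    sd = math.sqrt(var)
    within = sum(w for d, w in pairs if d < r_mm) / total
    return mean, sd, within, mean + 3.0 * sd
```

With the values reported here (mean 0.38 mm, standard deviation 0.19 mm), mean + 3σ is 0.95 mm, which is consistent with the choice of r = 1.0 mm.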

3.B | Relative sensitivities: ΔDmin(r) vs γ index
FIG. 1. Probability-weighted radial distance deviations of discrete proton spots from x-ray isocenter. 99.9% of delivered spots probabilistically deviate from nominal by <1.0 mm. Data were measured over a 30 × 40 cm² field size grid in four gantries over 6 months; the probability density of spot deviations was determined based on all plans delivered over a 1-month treatment period.
We employed an area under the curve (AUC) analysis to demonstrate the relative sensitivities of the ΔDmin(r) and γ index approaches. As discussed in Jiang et al., 20 an acceptance region can be delineated in two dimensions for AUC analysis by plotting the function for each test with dose on the vertical axis and distance on the horizontal axis (Fig. 2). The ellipse for the γ test is obtained by reflecting the curved line over the x-axis and rotating about the y-axis; the ΔDmin(r) cylinder is visualized using the same reflection and rotation process. The relative dose and beam position sensitivities for the defined acceptance region can be correlated to the AUC for each line in Fig. 2. The greater the AUC, the more variation is allowed for passing a measurement. However, when characterizing the spatial accuracy of our proton accelerator, we determined with 99.9% confidence that each beamlet would be delivered within 1 mm of the expected location. Therefore, we anticipate that dose congruence beyond 1-mm radial deviation from the intended location (indicated by single hash marks in Fig. 2) is highly indicative of a false positive, rather than true dosimetric agreement.
To avoid these false positives, passing criteria should ideally be set based on the spatial tolerances of each machine, leading to test metrics better correlated to the quality of beam delivery.
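The AUC values quoted for Fig. 2 follow from elementary geometry: one quadrant of an ellipse with semi-axes equal to the dose and distance criteria for the γ test, and a rectangle for ΔDmin(r). The short sketch below reproduces them; the function names are illustrative, and the units are simply % × mm.

```python
import math

def gamma_auc(dose_tol_pct, dta_mm):
    """Area of one quadrant of the gamma acceptance ellipse."""
    return math.pi * dose_tol_pct * dta_mm / 4.0

def ddm_auc(dose_tol_pct, r_mm):
    """Area of the rectangular DDM acceptance region."""
    return dose_tol_pct * r_mm

def gamma_dose_limit(distance_mm, dose_tol_pct, dta_mm):
    """Largest dose difference that still gives gamma <= 1 at a distance."""
    if distance_mm >= dta_mm:
        return 0.0
    return dose_tol_pct * math.sqrt(1.0 - (distance_mm / dta_mm) ** 2)

print(round(gamma_auc(3.0, 2.0), 2))  # 4.71
print(round(gamma_auc(3.0, 1.0), 2))  # 2.36
print(ddm_auc(3.0, 1.0))              # 3.0
```

The last helper makes the dose restriction of the ellipse explicit: for a 3%/2 mm γ test, a point 1 mm away must agree to within about 2.6% rather than 3%, whereas the DDM threshold stays at 3% across the whole search radius.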

3.C | Example output
The top row of Fig. 3 shows the measured and calculated dose planes used for comparison; the x and y values represent the image registration shifts used to align the fields. Registration was performed to provide a consistent comparison, to isolate spatial deviations to the beamlets, and to track systematic shifts, such as laser offset. The bottom row of Fig. 3 provides additional information as a heat map, with asterisks overlaying the heat map representing failing pixels from a binary γ test. Because the search area is non-elliptical, r can be smaller than the DTA for a given specificity, that is, the ability to convey reasonable similarity between planes (based on the AUC value, see Fig. 2). Even so, the pattern of failing pixels for a 3%/1 mm DDM still resembles that of the 3%/2 mm γ test, as indicated by the congruence of the asterisks with the out-of-tolerance dose scale on the heat map in Fig. 3.
The passing-rate agreement of the 3D DDM (90% of pixels with ΔDmin(r = 1 mm) <3%, µ < 1%) with the 3%/2 mm γ analysis for 75 IMPT patients (300 planar measurements, 1-5 fields per patient) is displayed in Fig. 4. The mean passing rate showed no change between a 3%/2 mm γ test (97.7 +/- 3.2%) and a 3%/1 mm ΔDmin(r) analysis (97.6 +/- 3.2%), largely because the γ test search pixels between 1 mm and 2 mm are irrelevant due to the overwhelmingly high probability of a proton spot falling within a 1-mm deviation.
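The field-level criterion quoted above can be written as a simple check on the per-pixel deviations. In this sketch the function name is hypothetical, and µ is assumed to denote the mean per-pixel relative deviation of the field, which the text does not define explicitly.

```python
def ddm_field_passes(deviations_pct, dose_tol_pct=3.0, min_fraction=0.90,
                     mean_tol_pct=1.0):
    """Apply the 90% / 3% / mean < 1% criterion to per-pixel DDM values (%)."""
    n = len(deviations_pct)
    fraction_ok = sum(abs(d) < dose_tol_pct for d in deviations_pct) / n
    mean_dev = sum(deviations_pct) / n
    return fraction_ok >= min_fraction and abs(mean_dev) < mean_tol_pct
```

The second condition catches a case that a pass rate alone misses: a field in which every pixel is within 3% can still fail if all pixels are shifted in the same direction by more than 1% on average.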
The minimum γ index for each pixel was computed for 10 patients; Figure 5 shows histograms from two representative fields.
An average of 97.8% (σ = 4.5%) of the γ indices were found within 1 mm of the correlated location.

4 | DISCUSSION
This paper has presented a technique for dose distribution comparison. The method accepts the same inputs as the conventional γ test, and so may be used alternatively to or supplementarily with existing γ-analysis workflows.
The proposed method carries forward pixel-by-pixel dose information that is masked by the dose and distance conflation inherent in the γ test. The newly proposed DDM method allows dose deviations to be read out on a continuum and analyzed potentially more intuitively, and can be run in parallel with the γ test for historical tracking, as both tests operate on the same input data.
FIG. 2. Area under the curve (AUC) displaying the relative sensitivity of distance to agreement (DTA) and dose difference for two sets of γ index criteria: 3%/2 mm (AUC = 4.71) and 3%/1 mm (AUC = 2.36). The γ sensitivities are represented by a single quadrant of each ellipse, which vary in size. These are compared to the relative sensitivity of finding the dose difference minimum (DDM) within a fixed search distance: 3%/1 mm (AUC = 3.0), represented by a rectangle. For a given distance threshold, the γ test is significantly more dose restrictive than the proposed DDM method, rendering use of the 3%/1 mm γ test clinically intractable. The single-hashed area represents distances outside of machine performance limits for our system, where γ agreements would have a high likelihood of being false positives; the double-hashed area represents where the 3%/2 mm γ test is too dose restrictive, and reported failures would have a high likelihood of being false negatives.
The per-pixel dose variation presented in the histogram is a key element of the proposed method. This histogram enables both statistical and enhanced intuitive evaluations which are not available in the γ test.
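As a rough illustration of the histogram readout, the sketch below bins signed per-pixel deviations and extracts the two quantities a physicist would typically look for first: the mean shift and the largest single deviation. The bin width, function names, and returned keys are assumptions made for the example.

```python
import math
from collections import Counter

def deviation_histogram(deviations_pct, bin_width_pct=1.0):
    """Count signed per-pixel deviations in bins [k*w, (k+1)*w).

    Returns {lower bin edge: count}, ordered from most negative to most
    positive deviation.
    """
    counts = Counter(math.floor(d / bin_width_pct) for d in deviations_pct)
    return {k * bin_width_pct: counts[k] for k in sorted(counts)}

def summarize(deviations_pct):
    """Mean dose shift and the deviation of largest magnitude (signed)."""
    mean_shift = sum(deviations_pct) / len(deviations_pct)
    worst = max(deviations_pct, key=abs)
    return {"mean_shift_pct": mean_shift, "max_deviation_pct": worst}
```

A histogram that is narrow but centred away from zero points to a small systematic dose shift over a large area, while a histogram centred on zero with a few distant bins points to localized point-dose errors.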
The statistical similarity observed between the two methods suggests that PSQA passing rates will not change drastically for institutions currently using a 3%/2 mm γ test. However, the hope is that having more data available for analysis, presented in an intuitive, straightforward manner, will improve the user's ability to diagnose delivery errors and reduce the potential for errors to go unnoticed.

4.A | Sensitivity
The γ test has been rigorously analyzed and inadequacies have been noted in the literature, specifically with spatial insensitivity, 6,9-12 and dose insensitivity. 5,8,10,[13][14][15] Our objective is to optimize the AUC Although the plot in Fig. 4
Expanding the DTA to 2 mm allows users to capture just over 90% of relevant points, but doing so with an elliptical shape simultaneously encompasses numerous, irrelevant points outside of machine performance limits. Tracking the γ-index DTA for 10 patients showed that about 98% of passing points fell within 1 mm (Fig. 5).
The DDM-proposed shape preserves specificity without adding irrelevant search points, as demonstrated by the percentage of plans passed being similar to that achieved by others 5,17 and to our own comparison with γ-test results (Fig. 5). The histogram produced enables simple and accurate determination of the maximum error and of dose shift trends present in the measurement; this heightened access to data ensures that either large point-dose errors or small dose shifts on large volumes will be recognized and appropriately evaluated by a responsible physicist. Neither of these potentially concerning scenarios is discernible with standard Boolean γ-test procedures.

5 | CONCLUSIONS
Field-delivery spatial accuracy was well within 1 mm, consistent with TG-224 recommendations. 18 Together with extensive QA and delivery logs, this suggests that pixel-comparison agreement outside of 1 mm is highly unlikely to occur, and is likely a false positive when it does.
However, the elliptical shape of the γ test is too dose restrictive with a spatial error threshold set at 1 mm. The cylindrical search shape of the new DDM algorithm, proposed herein as more relevant to the quality of beam delivery, accepts all pixels within a given dose threshold inside the search radius. In addition, using a fixed dose threshold within an empirically defined search radius allows DDM to present the magnitude and direction of per-pixel dose deviations in a straightforward manner to the user.

CONFLICT OF INTEREST
The authors have no relevant conflicts of interest to disclose.

AUTHOR CONTRIBUTIONS
Bryce Allred performed principal data collection, analysis and manuscript composition and was a co-creator of the method described in the manuscript. Jie Shan and Wei Liu authored essential portions of the method code, and contributed to manuscript composition. Todd A. DeWees provided statistical support and analysis for the duration of the project, and composed portions of the manuscript. Jiajian Shen and Daniel Robertson collected data supportive to the method, and contributed to manuscript composition. Joshua Stoker oversaw the efforts of all individuals on the project, finalized the manuscript and was also co-creator of the proposed method.