Geometric and dosimetric accuracy of deformable image registration between average‐intensity images for 4DCT‐based adaptive radiotherapy for non‐small cell lung cancer

Abstract Purpose Re‐planning for four‐dimensional computed tomography (4DCT)‐based lung adaptive radiotherapy commonly requires deformable dose mapping between the planning average‐intensity image (AVG) and the newly acquired AVG. However, such AVG‐AVG deformable image registration (DIR) lacks accuracy assessment. The current work quantified and compared geometric accuracies of AVG‐AVG DIR and corresponding phase‐phase DIRs, and subsequently investigated the clinical impact of such AVG‐AVG DIR on deformable dose mapping. Methods and Materials Hybrid intensity‐based AVG‐AVG and phase‐phase DIRs were performed between the planning and mid‐treatment 4DCTs of 28 non‐small cell lung cancer patients. An automated landmark identification algorithm detected vessel bifurcation pairs in both lungs. Target registration error (TRE) of these landmark pairs was calculated for both DIR types. The correlation between TRE and respiratory‐induced landmark motion in the planning 4DCT was analyzed. Global and local dose metrics were used to assess the clinical implications of AVG‐AVG deformable dose mapping with both DIR types. Results TRE of AVG‐AVG and phase‐phase DIRs averaged 3.2 ± 1.0 and 2.6 ± 0.8 mm respectively (p < 0.001). Using AVG‐AVG DIR, TREs for landmarks with <10 mm motion averaged 2.9 ± 2.0 mm, compared to 3.1 ± 1.9 mm for the remaining landmarks (p < 0.01). Comparatively, no significant difference was demonstrated for phase‐phase DIRs. Dosimetrically, no significant difference in global dose metrics was observed between doses mapped with AVG‐AVG DIR and the phase‐phase DIR, but a positive linear relationship existed (p = 0.04) between the TRE of AVG‐AVG DIR and local dose difference. Conclusions When the region of interest experiences <10 mm respiratory‐induced motion, AVG‐AVG DIR may provide sufficient geometric accuracy; conversely, extra attention is warranted, and phase‐phase DIR is recommended. Dosimetrically, the differences in geometric accuracy between AVG‐AVG and phase‐phase DIRs did not impact global lung‐based metrics. However, as more localized dose metrics are needed for toxicity assessment, phase‐phase DIR may be required as its lower mean TRE improved voxel‐based dosimetry.


| INTRODUCTION
Recent advances in radiotherapy (RT) have enabled the use of highly conformal treatment plans. 1,2 However, over the treatment course, patient's breathing pattern can change, 3 and the tumor and normal tissue can change in volume, shape, and position in response to the treatment. 4 As a result, the original treatment plan may not be optimal to deliver the prescribed dose, potentially leading to tumor underdosing and healthy tissue overdosing. To account for these changes over the treatment course, the treatment plan must be adapted to the new anatomy as previous studies have demonstrated improved clinical outcomes with lung adaptive RT. 5,6 The goal of adaptive RT is to maintain target coverage and normal tissue sparing by re-optimizing the treatment plan, based on the most recent imaging data that reflect setup differences, patient anatomy changes, and tumor response. 7,8 An essential part of adaptive RT workflow is to consider the dose received in the initially completed fractions in the dose optimization based on the newly acquired planning image. 9 In many RT practices for non-small cell lung cancer (NSCLC), four-dimensional computed tomography (4DCT) is used for treatment planning to account for respiratory-induced motion during beam delivery. 10 Composed of 3-dimensional CT images acquired over full breathing cycles, a 4DCT is usually binned into 10 breathing phases, ranging from end-inhalation phase (T0) to mid-ventilation phase (T3) to end-exhalation phase (T5). 11 An average-intensity image (AVG) can also be created by averaging pixel intensity values of all breathing phases of a 4DCT, and many lung RT practices calculate the radiation dose for treatment planning using AVG because it is composed of the average density seen at each voxel location, [12][13][14] which allows the delineation of the internal target volume, comprising of individual target volumes at all phases. [15][16][17] The current NRG lung template mandates that dose calculation is performed on AVG for free breathing lung cases. 18 During plan adaptation, calculation of the composite dose requires the dose distribution on the AVG of the original planning 4DCT to be mapped onto the dose distribution on the AVG of the newly acquired 4DCT. 19 This dose mapping can be achieved with deformable image registration (DIR) between the associated AVGs (AVG-AVG DIR).
Intuitively speaking, accurate DIR is the prerequisite for accurate dose deformation for plan adaptation, and for 4DCT-based NSCLC treatment planning, it is important to quantify the uncertainty in DIR between the primary and adaptive 4DCTs. Although numerous studies have evaluated the accuracy of intra-4DCT DIR (e.g. between T0 and T5), 20-24 very limited number of studies investigated the accuracy of inter-4DCT (phase-phase) DIR to analyze the effect of longitudinal anatomic changes over the course of RT. 25 The RTOG 1106 trial implemented AVG-based planning for free-breathing treatments and rigidly registered the original and mid-treatment AVGs for dose propagation because the uncertainty of AVG-AVG DIR was unknown. 26 Therefore, the current work evaluated the accuracy of AVG-AVG DIR to provide the uncertainty within this adaptive treatment planning workflow for NSCLC.
In this study, the geometric accuracy of AVG-AVG DIR was quantified using corresponding anatomic landmarks and was subsequently compared with the geometric accuracy of DIR between corresponding phase pairs of the 4DCTs. The accuracy of AVG-AVG DIR and phase-phase DIR was also evaluated with respect to landmark motion observed for each patient during standard respiratory motion at the planning stage. Finally, by evaluating the dosimetric impact of such geometric accuracy, implications of AVG-AVG deformable dose mapping for adaptive re-planning were investigated and the associated clinical guidelines were provided.

| Patient data
The original planning 4DCT and a mid-treatment 4DCT (approximately four weeks into treatment to mimic an adaptive RT setting) were retrospectively evaluated for 28 randomly selected patients, who were previously treated with intensity-modulated RT under an Institutional Review Board-approved randomized clinical trial for locally advanced NSCLC. 13 Weekly 4DCTs, 6-8 per patient, were acquired conversely, extra attention is warranted, and phase-phase DIR is recommended.
Dosimetrically, the differences in geometric accuracy between AVG-AVG and phase-phase DIRs did not impact global lung-based metrics. However, as more localized dose metrics are needed for toxicity assessment, phase-phase DIR may be required as its lower mean TRE improved voxel-based dosimetry.

K E Y W O R D S
4DCT, adaptive radiotherapy, deformable image registration accuracy, non-small cell lung cancer during the treatment course for motion and target reassessment. These 4DCTs were acquired with voxel size of 0.98 × 0.98 × 2.5 mm 3 using the Discovery CT system (General Electric Healthcare, Waukesha, WI) with 120 kVp, operated in cine mode. Patient surface motion was monitored by Varian's Real-time Position Management Respiratory Gating (Varian Medical Systems, Palo Alto, CA). 27

| Hybrid intensity-based DIR
The DIR algorithm used for this evaluation was ANACONDA (ANAtomically CONstrained Deformation Algorithm), a hybrid intensity-based DIR algorithm commercially available in RayStation v9 (RaySearch Laboratories, Stockholm, Sweden). 20 The image registration process is as follows. Prior to image deformation, reference and target images are rigidly registered based on image similarity measured over all voxels that are enclosed by the external body contours in both images. The optimization process of ANACONDA is based on image similarity as measured by a correlation coefficient and is solved by a quasi-Newton algorithm. 20 This process is regularized by minimizing the Dirichlet energy of the generated deformation vector field (DVF). DVF smoothness is maintained by penalizing large shape deviations of regions of interest (ROIs) defined in the reference image, and invertibility is checked by the determinant of the Jacobian. When the optimization process is constrained by user-delineated organ contours in both reference and target images, such geometric information makes the ANACONDA method hybrid.
In the current work, rigid registration was first established between the planning AVG and the midtreatment AVG, focusing on bone and tumor regions and discarding rotations to mimic the daily kilovolt alignment. With ANACONDA, T0, T3, T5, and AVG of the mid-treatment 4DCT were registered to their counterparts of the planning 4DCT (as the reference image). Left and right lung boundaries were manually contoured in each image pair to guide the DVF as controlling ROIs. The rigid and deformable registrations were systematically performed through RayStation scripting. For each registration, the results were qualitatively assessed via image fusion of the deformed target image and the reference image with a focus on the bronchial and vascular alignment, per recommendation of the American Association of Physicists in Medicine (AAPM) Task Group 132. 28

| Dice similarity coefficient
American Association of Physicists in Medicine Task Group 132 recommends metrics including Dice similarity coefficient (DSC) and target registration error (TRE) for validation of DIR accuracy. DSC is a measure of overlap between the ROI in the reference image and the same ROI in the deformed target image: where v deformed is the volume of the deformed ROI and v reference is the volume of the reference ROI. 29 DSCs of lung contours across all DIRs were obtained.

| Target registration error
Unlike DSC, which focuses on the alignment of the organ contour, TRE addresses the internal alignment of images. TRE is defined as the three-dimensional Euclidian distance between a landmark's position in the deformed target image and its location in the reference image: where x, y, and z are the Cartesian coordinates in the reference image space. In the current work, vessel bifurcations were used as landmarks because of their abundance in the lungs and high contrast against air-like lung tissue. Landmark pairs were identified on corresponding phases (T0, T3, and T5) of the planning 4DCT and the mid-treatment 4DCT. These landmark pairs were overlaid onto the AVGs of the corresponding 4DCTs to assess the accuracy of AVG-AVG DIR, given that the AVG comprises all phases and exists in the same image space with phases of the same 4DCT. As a result, TRE of each landmark pair was computed for AVG-AVG DIR and phase-phase DIRs (T0-T0 DIR, T3-T3 DIR, or T5-T5 DIR).

| Automatic landmark identification method
Manual landmark identification is a cumbersome process that could introduce uncertainties such as interobserver variability. For the current work, the following in-house fully-automatic landmark identification workflow was used. 30 By thresholding the reference and target images, vessels on these images were automatically segmented, through which the centerlines could be extracted and the bifurcations on the centerlines detected using the neighbors' count. The segmented vessels on the reference and target images were then registered via a separate intensity-based Demons DIR algorithm, 31 after which bifurcations <4 mm apart after the deformation were considered landmark pairs. Landmarks in both lungs were automatically identified on T0 pairs, T3 pairs, and T5 pairs. This workflow has been previously validated against 10 pairs of T0/T5 lung 4DCTs, each with 300 manually identified landmark pairs (DIR-Lab, http//www.dir-lab.com) and 10 pre-/post-RT pairs of liver contrast-enhanced CTs with five manual landmarks each.
In the current work, the automatic workflow was also validated against manually identified landmarks in the ipsilateral lung from a random subset of 10 patients. Compared with the contralateral lung, the ipsilateral lung could inherently exhibit larger anatomic changes between the two 4DCTs owing to tumor response. For the validation, 16 anatomic landmark pairs were manually identified at vessel bifurcations in the ipsilateral lung in T0 pairs. These landmarks were uniformly distributed along the superior-inferior direction of the lung to potentially cover a wide range of respiratory motion exhibited by different regions of the lung. 3,32 The same 16 landmark pairs were then re-identified in the T3 pairs and T5 pairs. Each landmark (position) was then directly copied to the AVG pairs, the same way as in the automatic workflow. In total, for each 4DCT set, 16 landmarks were identified in each of the three phases, and three variations of these 16 landmark positions (48 total) were identified in the AVG. The above process is described in Figure 1. Linear regression was used to compare mean TRE values from each phase-phase DIR of manual landmarks against the mean TRE values of automatic landmarks.

| Comparison of geometric accuracy between AVG-AVG DIR and phasephase DIRs
The TRE of each landmark pair from T0-T0 DIR, T3-T3 DIR, T5-T5 DIR, and AVG-AVG DIR was obtained for all 28 patients. Paired Student's t-test was used to compare TRE of AVG-AVG DIR and TRE of T0-T0 DIR, based on T0-T0 landmark pairs, and such comparison was repeated for T3-T3 landmark pairs and T5-T5 landmark pairs. The same statistical testing was also used for comparisons among the phasephase DIRs.
F I G U R E 1 Manual landmark identification process to validate the automatic landmark workflow. The planning four-dimensional computed tomography (4DCT) and the midtreatment 4DCT are shown. Coronal views of the end-inhalation phase (T0), mid-ventilation phase (T3), and endexhalation phase (T5), as well as the average-intensity image (AVG), are shown in the left column. Simplified cartoons of the two 4DCTs are shown on the right. Deformable image registrations (DIRs) were established between corresponding phases and AVGs of the two 4DCTs. A landmark at the lower vessel bifurcation in the right lung was identified with different colors in different phases. The landmark's locations in these phases were transferred onto the AVG of each corresponding 4DCT

| Effect of breathing motion
For each patient, respiratory-induced landmark breathing motion was represented by the displacement of landmarks under hybrid intensity-based DIR between T5 (reference exam) and T0 of the planning 4DCT, with both lungs as controlling ROI. Assuming TREs of landmark pairs in the AVG pair and corresponding phase pairs follow a Gaussian distribution, we used linear regression to quantify the effect of landmark breathing motion on TRE for AVG-AVG DIR and T5-T5 DIR. A more simplified comparison was made in which these landmarks were divided using a cutoff of motion magnitude of 10 mm, and a two-sample unequal variance t-test was used to compare the average TRE values for landmarks that showed at least 10 mm motion and those that showed <10 mm motion, for both AVG-AVG DIR and T5-T5 DIR.

| Clinical impact of DIR geometric accuracy on dose mapping
Hybrid intensity-based DIR was established between AVGs and between T5s (from planning to mid-treatment) with both lungs as controlling ROIs. Accordingly, the original planned dose on AVG, calculated with RayStation's default uniform dose grid of 3 mm, was deformed to the mid-treatment 4DCT to simulate adaptive RT with both AVG-AVG DIR and T5-T5 DIR (warranted by AVG and phase existing in the same image space). For the ipsilateral lung, multiple linear regression was performed to correlate AVG-AVG DIR TRE, along with several planning metrics, to the absolute difference in mean lung dose (MLD) between the planned dose deformed with AVG-AVG DIR versus with T5-T5 DIR, and the absolute difference in the volume of lung receiving at least 20 Gy (V 20 ) between the planned dose deformed with AVG-AVG DIR versus with T5-T5 DIR, as clinical endpoints. The planning metrics include gross tumor volume (GTV) motion in the superior-inferior direction, GTV volume inside lung, GTV center of gravity to diaphragm, diaphragm breathing motion in the superior-inferior direction, GTV dose homogeneity index (DHI) (concept proposed by Ding et al. 33 ), DHI of normal lung (lung excluding GTV), and percent of primary tumor volume (PTV) in normal lung. In addition, the dose discrepancy on a voxel/subregional level was evaluated using the percentage of landmarks on T5 that had at least 2 Gy absolute dose difference when the dose distribution was mapped using AVG-AVG DIR versus T5-T5 DIR.

| R ESULTS
AVG-AVG DIR and phase-phase DIR were successfully performed for all patients. On average, 654 ± 162, 603 ± 186, and 606 ± 194 landmarks were identified on each patient's T0, T3, and T5 pairs respectively. The average DSC of left and right lung contours combined was 0.95 ± 0.02 for AVG-AVG DIR and 0.93 ± 0.03 for all phase-phase DIRs. AAPM Task Group 132 considers DSC of 0.80-0.90 satisfactory within the contouring uncertainty of the structures. 28 Therefore, according to this metric, the DIRs were deemed successful.

| Validation of automatic landmark identification workflow
For 10 randomly selected patients, 16 landmarks were manually identified where they were uniformly distributed along the superior-inferior direction within the ipsilateral lung. TREs of these landmarks were compared against those of the automatic landmarks for each phase-phase DIR (Figure 2). With a slope of greater than 1.0, the automatic method consistently reported a larger TRE compared with the manual method (average TRE 2.5 ± 0.9 mm for automatic compared with 2.4 ± 1.7 mm for manual). However, the coefficient of determination was 0.8, and the averaged difference between the two methods was 0.1 mm, substantially smaller than the largest voxel dimension (2.5 mm). In addition, the paired Student's t-test showed no statistical significance (p = 0.55), indicating that the automatic F I G U R E 2 Target registration error (TRE) differences between deformable image registration (DIR) across average-intensity images and DIR across phases for each patient. Each data point represents a labeled patient. The result of linear regression is TRE (auto)=0.5 × TRE (manual) +1.4. Coefficient of determination is 0.81 method was an acceptable substitute for the manual method.

| AVG-AVG DIR compared with phase-phase DIRs
As shown in Figure 3, the mean TRE of T0 landmark pairs was 3.6 ± 1.1 mm for AVG-AVG DIR compared with 2.8 ± 0.8 mm for T0-T0 DIR (p < 0.001). The mean TRE of T3 landmark pairs was 3.0 ± 0.9 mm for AVG-AVG DIR compared with 2.6 ± 0.8 mm for T3-T3 DIR (p < 0.001). The mean TRE of T5 landmark pairs was 3.0 ± 0.8 mm for AVG-AVG DIR compared with 2.5 ± 0.8 mm for T5-T5 DIR (p < 0.001). In total, AVG-AVG DIR resulted in a mean TRE of 3.2 ± 1.0 mm compared with 2.6 ± 0.8 mm for phase-phase DIRs (p < 0.001). For all patients except one, TRE for AVG-AVG DIR was higher than TRE for phase-phase DIRs (for such patient, the TRE for AVG-AVG DIR was 0.07 mm lower than that for T0-T0 DIR and 0.03 mm lower than that for T3-T3 DIR).

| Geometric impact of landmark motion on DIR TRE
Respiratory-induced landmark motion was represented by the displacement of T5 landmarks under T5-T0 DIR of the planning 4DCT. Linear regressions between landmark motion and TRE for AVG-AVG DIR as well as for T5-T5 DIR both yielded coefficient of determination values of <0.1. When comparing landmarks grouped using a cutoff of 10 mm motion, for AVG-AVG DIR, the mean TRE of T5 landmarks with <10 mm motion was 2.9 ± 2.0 mm compared with 3.1 ± 1.9 mm for T5 landmarks with at least 10 mm motion (p < 0.001), whereas for phase-phase DIR, the mean TRE of T5 landmarks with <10 mm motion was 2.5 ± 2.0 mm compared with 2.5 ± 1.9 mm for T5 landmarks with at least 10 mm motion (p = 0.30). Therefore, landmark pairs with at least 10 mm motion had significantly larger TREs than those with <10 mm motion in AVG-AVG DIR, which was not observed for phase-phase DIR. Figure 4 shows for patient #14 the comparison of the deformed planned dose with AVG-AVG DIR versus T5-T5 DIR. This patient carried the largest absolute difference in MLD between the two deformed doses.

| Clinical impact of DIR geometric accuracy on dose mapping
Ipsilateral lung planning information for all 28 patients is shown in Table 1. Multi-linear regression resulted in non-significant correlations (p > 0.05) when correlating AVG-AVG DIR TRE as well as clinical planning information (columns 3-9) to the difference in global clinical metrics of MLD and V 20 between doses mapped with AVG-AVG DIR versus T5-T5 DIR (columns 11 and 12). However, when comparing these two mapped doses on a sub-regional level, both TRE of AVG-AVG DIR and F I G U R E 3 Boxplots of mean target registration error (TRE) of the 28 patients. Each color pair of boxplots represents the mean TRE of phase pairs for the corresponding phase-phase DIR and for AVG-AVG DIR. The standard deviation of TRE ranged from 1.0 to 3.4 mm (not shown). *AVG, average-intensity image; DIR, deformable image registration; T0, end-inhalation phase; T3, mid-ventilation phase; T5, end-exhalation phase GTV motion reached statistical significance (p = 0.04 and p = 0.01, respectively), and percent of PTV in normal lung achieved p = 0.06, when correlated against the metric: portion of T5 landmarks that had at least 2 Gy absolute dose difference (≥2 Gy%) (column 10). For this local dose metric, 12 patients achieved at least 10% (at least 10% of total T5 landmarks had at least 2 Gy absolute dose difference). Eleven of these 12 patients met at least one of the following conditions: TRE larger than 3.0 mm (recommended dose grid size, per AAPM Task Group 132 28 ), GTV motion larger than 5.0 mm (AAPM Task Group 76 34 recommends that respiratory management techniques be considered when tumor motion is larger than 5.0 mm), or percent of PTV in normal lung larger than 20%, achieving sensitivity of 0.92. Only 3 of the 16 patients with <10% of ≥2 Gy% demonstrated any of these three criteria, achieving a specificity of 0.81.

| DISCUSSION
In the current work, we evaluated the accuracy of AVG-AVG DIR and compared it with that of DIR between corresponding phases (T0, T3, and T5) for 4DCT-based treatment planning in NSCLC. Based on imaging data and landmark pairs from 28 NSCLC patients, our findings indicated that AVG-AVG DIR has larger uncertainty, i.e., inferior accuracy, compared with phase-phase DIR, and that breathing-induced motion degrades the accuracy of AVG-AVG DIR more than that of phase-phase DIR.
AAPM Task Group 132 recommends that the target TRE should be less than the maximum image voxel size. 28 Therefore, given that 4DCTs used in the current work were 2.5 mm thick, T5-T5 DIR achieved the AAPM-recommended target TRE, with mean TRE of 2.5 ± 0.8 mm, and thus can be deemed clinically acceptable. For phases typically subject to larger breathing artifacts, 35 T3-T3 DIR and T0-T0 DIR almost met the target TRE, with 2.6 ± 0.8 mm and 2.8 ± 0.8 mm respectively. However, with an overall mean TRE of 3.2 ± 1.0 mm, AVG-AVG DIR did not meet the uncertainty recommendation, with more than half of the lateral resolution (0.98 mm) over the recommended threshold. In addition, our analysis of the effect of motion showed that large landmark respiratory motion (i.e., at least 10 mm) affects AVG-AVG DIR more than phase-phase DIR. In summary, phase-phase DIR was more robust against breathing motion, which was not achieved by the AVG-AVG DIR. Therefore, per AAPM Task Group 132, additional uncertainty should be assumed, depending on the clinical protocol, if AVG-AVG DIR is used for dose mapping. This is especially true for patients who breathe with a relatively large magnitude, which can potentially introduce a large amount of uncertainty, as well as for anatomic structures that typically have a large motion, such as tumors close to the diaphragm.
Geometrically robust points, which are necessary for TRE, require a geometrically robust image. By definition, AVG is constructed from averaging the pixel intensities of all phases over the breathing cycle. As a result, anatomy on AVGs is blurred due to breathing motion and thus carries inherent uncertainty in representing the true anatomic shapes, which makes AVG not geometrically robust. Practically speaking, AVG is commonly used as the planning image for dose calculation because it captures the entire tumor movement under respiratory motion. 12,[36][37][38] This helps to avoid overdosing normal tissue near the tumor and under-dosing the tumor itself, which can happen with a smooth dose intensity map around the tumor. Although AVG-AVG DIR is the more straightforward choice for clinicians, extra care must be taken when using AVG-AVG DIR to perform dose mapping when adaptive planning is required, given the small but potentially clinically significant uncertainty. Furthermore, considering the effect of TRE on dose metrics, a TRE reduction of 1.6 mm for phase-phase DIR used to deform and propagate the planning dose has been shown to clinically affect decision-making in stereotactic body RT treatment planning. 39 These findings support the potential clinical benefit of using phasephase DIR rather than AVG-AVG DIR. The major strength of the current work was the large number of landmark pairs made available by the automatic landmark identification workflow (average of 600+ per DIR). As a result, this work contributed to closing the gap on evaluating inter-4DCT DIR in the presence of anatomy changes through internal landmark points. In addition to using landmarks to determine TRE as the internal metric, we also used DSC of the left and right lung contours as the external metric. Because the hybrid intensity-based DIR method was constrained by matching the lung contours on the reference and target images, DSC values would naturally be large. In addition, the DSC can quantify how well the organ boundary matches, but it does not guarantee accurate modeling of internal volume. 40 Therefore, using DSC as the sole metric to represent DIR accuracy would be inadequate, and consequently DSC was reported to supplement TRE.
In terms of landmark identification on AVG, the same anatomic landmarks were detected on the three phase images on both 4DCTs and then were directly used for AVG-AVG DIR. The presented method that transfers landmarks defined in the breathing phases onto the AVG of the same 4DCT enabled representation of the actual position of these landmarks (the average of a landmark's locations in breathing phases) in AVG, which is difficult to identify directly, especially in regions that experience large respiratory-induced motion.
In our validation of automated landmarks, automatic landmarks achieved larger TREs than manual landmarks. This is potentially due to landmarks identified by automatic workflows systematically produce worse DIR, so identifying landmarks throughout the entire lung would on average produce larger TREs. This result could also be attributed to a lack of one-one correspondence between automatic landmarks and manual landmarks and the limited number of manual landmark pairs identified. A small set of landmark points has been shown to be insufficient for calculating TRE because this misrepresents the actual spatial accuracy. 41 In fact, only 16 landmark pairs were chosen manually per patient due to the cumbersome nature of manual landmark identification, thus potentially limiting the TRE accuracy based on these landmarks.
The difference in DIR accuracy among the three phases (T0, T3, and T5) could be attributed to differences in the magnitude of breathing artifacts. At T0, the full inhale phase, the patient changes breathing direction, so this phase is most prone to breathing artifacts due to diaphragm motion. In addition, the inhale position tends to vary from breathing cycle to breathing cycle. In contrast, in T5, the patient spends a longer proportion of the breathing phase in the exhale position, thus leading to the fewest motion artifacts; vessels especially near the diaphragm are not "washed out" and can be more accurately identified. Therefore, T5-T5 DIR has the potential to provide the highest registration accuracy among phase-phase DIRs. This can be applied in four-dimensional dose calculation, with phase dose mapped and summed on T5 as the gold-standard dose calculation method for a 4DCT. 42 Therefore, our findings can support the investigation of the differences between T5-based four-dimensional planned dose and accumulated dose using T5-based four-dimensional weekly doses with T5-T5 DIR. Retrospective toxicity studies on a voxel or regional level are also warranted, to achieve the most accurate representation possible for the delivered dose.
In the dosimetric study, as the landmark TRE increased, the probability of the dose to an image voxel in the planning AVG being mapped to a different anatomical location in the re-planning AVG also increased. Such effect was pronounced in the high dose-gradient region near the GTV where adjacent image voxels shared large dose differences. It was magnified when the GTV motion was large, which further increased TRE of AVG-AVG DIR. However, their effect was lessened on global clinical metrics such as mean lung dose, where these voxel-level dose differences cancel each other out. For instance, patient 24 had AVG-AVG DIR TRE of 5.4 mm (largest among all patients) and achieved 21.6% in the previously defined metric ≥2 Gy%, but only achieved 21 cGy and 0.2 in absolute MLD difference and V 20 difference respectively. On the contrary, patient 8 carried AVG-AVG DIR TRE of 5.7 mm but only achieved 2.1% in ≥2 Gy% due to low GTV motion (0.9 mm) and low PTV in normal lung (5.5%). These results are similar to those concluded by Hardcastle et al., 43 where larger registration error in high dose gradient regions caused larger dosimetric uncertainty. These observations demonstrated that the clinical impact of DIR geometric accuracy, as represented by TRE, must be interpreted in the context of the patient and dosimetric parameters. Therefore, when dose is mapped with AVG-AVG DIR versus phase-phase (T5-T5) DIR, global clinical metrics are unlikely to differ significantly. However, if local doses are evaluated based on the mapped dose, phase-based mapping should be considered.
Lastly, to provide context in the clinical adaptive workflow, the following three simplified workflows in the order of increasing complexity can be considered.
(1) Three-dimensional treatment planning using AVG-AVG DIR: original planned dose is calculated on AVG of planning 4DCT; at adaptation, the fraction-corrected planned dose is deformed to the AVG of the newly acquired 4DCT with AVG-AVG DIR; the adapted dose is created based on the deformed dose on the new AVG.
(2) Three-dimensional treatment planning using phasephase DIR: this is an alternative to (1) where a specific phase-phase DIR is performed and used to map the dose calculated on the original AVG onto the secondary AVG for better accuracy. (3) Full four-dimensional treatment planning using phase-phase DIR: dose is calculated on each phase of the planning 4DCT, deformed to a reference phase, and summed over the breathing phases; the fraction-corrected 4D planned dose on a specified phase is deformed to the corresponding specified phase of the newly acquired 4DCT with phase-phase DIR; the adapted dose is created based on the deformed dose. 44

| CONCLUSION
In the current work, TRE was used to quantify the geometric uncertainty of DIRs between corresponding AVGs and corresponding phases for 4DCT-based lung adaptive RT for 28 NSCLC patients. When the region of interest has respiratory-induced motion <10 mm, AVG-AVG DIR may provide sufficient clinical accuracy; however, when motion is at least 10 mm, extra attention is warranted and phase-phase DIR, especially T5-T5 DIR, is recommended. Dosimetrically, the geometric accuracy of AVG-AVG DIR has not been shown to significantly impact global lung-based clinical metrics. However, phase-phase DIR may be required for advanced toxicity correlation studies that utilize local dose metrics (e.g., dose used in voxel-based analyses).

A C K N O W L E D G M E N T S
The authors thank the Image Guided Cancer Therapy Research (IGCTR) Program at MD Anderson Cancer Center, Houston, TX, USA. This research was supported in part by RaySearch Laboratories, Stockholm, Sweden and the Helen Black Image-Guided Radiotherapy Fund.

A U T H O R C O N T R I B U T I O N S
Yulun He has made substantial contributions to conception and design, and/or acquisition of data, and/or analysis and interpretation of data, drafted the article and revised it critically for important intellectual content, and given final approval of the version to be submitted and any revised version. Dr. Cazoulat has made substantial contributions to conception and design, and/or acquisition of data, and/ or analysis and interpretation of data, has participated in drafting the article or revising it critically for important intellectual content, and given final approval of the version to be submitted and any revised version. Dr. Wu has made substantial contributions to conception and design, and/or acquisition of data, and/or analysis and interpretation of data, participated in drafting the article or revising it critically for important intellectual content, and given final approval of the version to be submitted and any revised version. Dr. Peterson has made substantial contributions to conception and design, and/or acquisition of data, and/ or analysis and interpretation of data, participated in drafting the article or revising it critically for important intellectual content, and given final approval of the version to be submitted and any revised version. Dr. McCulloch has made substantial contributions to conception and design, and/or acquisition of data, and/or analysis and interpretation of data, participated in drafting the article or revising it critically for important intellectual content, and given final approval of the version to be submitted and any revised version. Brian Anderson has made substantial contributions to conception and design, and/or acquisition of data, and/or analysis and interpretation of data, participated in drafting the article or revising it critically for important intellectual content, and given final approval of the version to be submitted and any revised version. Dr. Pollard-Larkin has made substantial contributions to conception and design, and/or acquisition of data, and/or analysis and interpretation of data, participated in drafting the article or revising it critically for important intellectual content, and given final approval of the version to be submitted and any revised version. Dr. Balter has made substantial contributions to conception and design, and/ or acquisition of data, and/or analysis and interpretation of data, participated in drafting the article or revising it critically for important intellectual content, and given final approval of the version to be submitted and any revised version. Dr. Liao has made substantial contributions to conception and design, and/or acquisition of data, and/or analysis and interpretation of data, participated in drafting the article or revising it critically for important intellectual content, and given final approval of the version to be submitted and any revised version. Dr. Mohan has made substantial contributions to conception and design, and/ or acquisition of data, and/or analysis and interpretation of data, participated in drafting the article or revising it critically for important intellectual content, and given final approval of the version to be submitted and any revised version. Dr. Brock has made substantial contributions to conception and design, and/or acquisition of data, and/or analysis and interpretation of data, participated in drafting the article or revising it critically for important intellectual content, and given final approval of the version to be submitted and any revised version.

C L I N I C A L T R I A L S TAT E M E N T
Data used in the paper was acquired under an Institutional Review Board-approved clinical trial.

D ATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.