Development of a physical geometric phantom for deformable image registration credentialing of radiotherapy centers for a clinical trial

Abstract
Purpose: This study aimed to develop a physical geometric phantom for the deformable image registration (DIR) credentialing of radiotherapy centers for a clinical trial and to test the feasibility of the proposed phantom at multiple domestic and international institutions.
Methods and materials: The phantom reproduced tumor shrinkage, rectum shape change, and body shrinkage using several physical phantoms with custom inserts. We tested the feasibility of the proposed phantom using five DIR patterns at 17 domestic and 2 international institutions (21 datasets). Eight datasets were generated with MIM (MIM Software Inc, Cleveland, OH), seven with Velocity (Varian Medical Systems, Palo Alto, CA), and six with RayStation (RaySearch Laboratories, Stockholm, Sweden). DIR accuracy was evaluated using the Dice similarity coefficient (DSC) and Hausdorff distance (HD).
Results: The mean ± one standard deviation (SD) (range) of the DSC was 0.909 ± 0.088 (0.434–0.984) for the tumor proxy and 0.909 ± 0.048 (0.726–0.972) for the rectum proxy. The corresponding HD values were 5.02 ± 3.32 mm (1.53–20.35 mm) and 5.79 ± 3.47 mm (1.22–21.48 mm). In the three patterns evaluating DIR accuracy over the entire phantom, 61.9% of the data had a DSC above 0.8 for both the tumor and rectum proxies. In the two patterns focusing on the tumor and rectum proxies, all data had a DSC above 0.8 for both proxies.
Conclusions: The wide range of DIR performance highlights the importance of optimizing the DIR process. Thus, the proposed method has considerable potential as an evaluation tool for DIR credentialing and quality assurance.

the tumor deformation using the inflated and deflated catheter, it could not reproduce the deformation patterns. 19,20 This phantom showed great potential for DIR credentialing; however, owing to its simplicity, it could not simulate complicated anatomical changes.
To the best of our knowledge, no studies have been conducted on DIR credentialing (i.e., an end-to-end test of DIR performance) for any clinical trial. Performing DIR credentialing requires a physical geometric phantom designed for that purpose.
In this study, we developed a physical geometric phantom for the DIR credentialing of radiotherapy centers for a clinical trial and tested the feasibility of implementing the proposed phantom at multiple domestic and international institutions.

2.A | Development of the physical phantom
The proposed DIR phantom comprised a base phantom and six custom inserts [Fig. 1(a)]. The base phantom was made of a "tough water" phantom material (WD, Kyoto Kagaku Co. Ltd., Kyoto, Japan) and had six holes (slots 1–6). The density and effective atomic number of tough water were 1.017 g/cm³ and 7.42, respectively. The base phantom was designed to simulate various clinical situations (e.g., tumor shrinkage and rectal filling) by exchanging custom inserts.
Each custom insert contained an internal object of a different shape and material. This phantom is not anthropomorphic; the aim was to make it resemble an actual patient as closely as possible without losing the geometric simplicity needed for quantitative analysis. Because the phantom must be shipped between institutions, it should be as small and light as possible. Therefore, the thickness of the entire phantom was set to 10 cm, minimizing its weight while still allowing DIR evaluation. Herein, we selected six custom inserts that reproduce features of the pelvic region in patients [Fig. 1(b)]. Slots 1 and 3 held inserts containing an octagonal "tough bone" (BE-H, Kyoto Kagaku Co. Ltd) to simulate the right and left femoral heads. The octagonal tough bone had a length, width, and height of 8.0, 4.0, and 3.5 cm, respectively, and its density and effective atomic number were 1.50 g/cm³ and 11.70, respectively. In actual patients, the femoral head is closer to a sphere than to an octagon; the octagonal shape was adopted because its corners allow a more detailed evaluation of DIR accuracy. For slot 2, two inserts of different sizes were used to simulate tumor shrinkage, namely, large and small trigonal polymethyl methacrylate (PMMA) objects. The large trigonal PMMA had a length, width, and height of 6.0, 6.0, and 4.0 cm, respectively, and the small trigonal PMMA had a length, width, and height of 6.0, 4.5, and 3.0 cm, respectively. The density of PMMA was 1.190 g/cm³. In addition, these inserts contained 40 fiducial markers (PMMA, 4 mm in diameter) evenly placed outside the trigonal PMMA object; the detailed marker positions are shown in Fig. 2. In actual patients, the shape of the tumor is closer to a sphere than to a trigonal prism.
As with the femoral heads, the trigonal shape was adopted because its corners allow a more detailed evaluation of DIR accuracy. For slot 5, three custom inserts with different shapes were used to simulate rectum deformation, namely, S-shaped, inverted S-shaped, and trapezoidal air cavities. These inserts had a length, base width, top width, and height of 6.0, 3.0, 4.0, and 2.0 cm, respectively. Blank inserts (i.e., tough water inserts) were placed in slots 4 and 6. Moreover, a three-quarter-scaled base phantom was used to simulate overall body shrinkage. This phantom was designed to simulate weight loss; however, owing to technical issues in production, everything, including the inserts, was scaled down to 3/4. To increase the sensitivity of the evaluation of DIR technical skills, we fabricated the phantom with more deformation than is likely to occur in actual clinical practice.
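To give a rough sense of how large the simulated changes are, the sketch below computes nominal insert volumes. It rests on an assumption not stated in the text: each trigonal insert is treated as a triangular prism whose cross-section has base = width and altitude = height, extruded along the length, and the trapezoidal cavity is treated as a trapezoidal prism. The function names are hypothetical. Under these assumptions, uniform scaling to 3/4 reduces every volume to (3/4)³ ≈ 42% of its original value.

```python
# Nominal insert volumes under an ASSUMED geometry (not stated in the text):
# "length, width, height" of a trigonal insert = a triangular prism whose
# triangle has base = width and altitude = height, extruded along the length.


def triangular_prism_volume(length_cm, width_cm, height_cm):
    """Volume (cm^3) of a prism with a triangular cross-section."""
    return 0.5 * width_cm * height_cm * length_cm


def trapezoidal_prism_volume(length_cm, base_cm, top_cm, height_cm):
    """Volume (cm^3) of a prism with a trapezoidal cross-section."""
    return 0.5 * (base_cm + top_cm) * height_cm * length_cm


def uniformly_scaled_volume(volume_cm3, scale):
    """Scaling every linear dimension by `scale` multiplies volume by scale**3."""
    return volume_cm3 * scale ** 3


large_tumor = triangular_prism_volume(6.0, 6.0, 4.0)             # 72.0 cm^3
small_tumor = triangular_prism_volume(6.0, 4.5, 3.0)             # 40.5 cm^3
trapezoid_cavity = trapezoidal_prism_volume(6.0, 3.0, 4.0, 2.0)  # 42.0 cm^3

print(f"tumor proxy shrinkage: {large_tumor} -> {small_tumor} cm^3 "
      f"(ratio {small_tumor / large_tumor:.4f})")
print(f"large tumor proxy in 3/4-scaled phantom: "
      f"{uniformly_scaled_volume(large_tumor, 0.75)} cm^3")
```

With these assumptions the small tumor proxy is about 56% of the large one, while the scaled phantom shrinks the large proxy to about 42%, so the two scenarios stress DIR differently.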

2.B | DIR at various phantom settings
We developed five DIR patterns by combining four phantom settings (Fig. 3). All four settings shared octagonal tough bone inserts (femoral heads) in slots 1 and 3 and blank inserts in slots 4 and 6, and only phantom setting 4 used the three-quarter-scaled base phantom. The inserts in slots 2 and 5 were as follows.
• Phantom setting 1: large trigonal PMMA (acrylic) object (slot 2) and S-shaped air cavity (slot 5).
• Phantom setting 2: large trigonal PMMA object (slot 2) and trapezoidal air cavity (slot 5).
• Phantom setting 3: small trigonal PMMA object (slot 2) and inverted S-shaped air cavity (slot 5).
• Phantom setting 4 (three-quarter-scaled base phantom): large trigonal PMMA object (slot 2) and trapezoidal air cavity (slot 5).
Based on these four phantom settings, we determined five DIR patterns as follows.
• DIR pattern 1: phantom settings 1 (fixed image) and 2 (moving image). The area to match was the entire phantom.
• DIR pattern 2: phantom settings 3 (fixed image) and 1 (moving image). The area to match was the entire phantom.
• DIR pattern 3: phantom settings 4 (fixed image) and 2 (moving image). The area to match was the entire phantom.
• DIR pattern 4: The area to match was focused only on the tumor proxy and rectum proxy regions (i.e., body shape and femoral heads were ignored).
• DIR pattern 5: phantom settings 4 (fixed image) and 2 (moving image). The area to match was focused only on the tumor proxy and rectum proxy regions.
DIR aims to determine the spatial transformation that warps the moving image to match the fixed image as closely as possible.
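A displacement vector field (DVF) is one common way to represent the transformation that DIR produces. The sketch below assumes a "pull-back" DVF defined on the fixed-image grid in voxel units, so that the warped image at fixed voxel x is the moving image sampled at x + u(x). This convention is an assumption for illustration, not a description of how MIM, Velocity, or RayStation store their DVFs, and `warp_with_dvf` is a hypothetical helper built on SciPy's `map_coordinates`.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def warp_with_dvf(moving, dvf, order=1):
    """Resample `moving` onto the fixed-image grid with a displacement vector field.

    Assumed convention (hypothetical; vendor formats differ): `dvf` has shape
    (ndim, *fixed_shape) and stores, for each fixed-image voxel x, a displacement
    u(x) in voxel units, so that warped(x) = moving(x + u(x)).
    """
    moving = np.asarray(moving)
    dvf = np.asarray(dvf, dtype=float)
    grid = np.indices(dvf.shape[1:], dtype=float)  # voxel coordinates of the fixed grid
    return map_coordinates(moving, grid + dvf, order=order, mode="nearest")


# Toy example: a 3x3 "contour" mask and a DVF that shifts it by two rows.
moving_mask = np.zeros((10, 10), dtype=np.uint8)
moving_mask[2:5, 2:5] = 1
dvf = np.zeros((2, 10, 10))
dvf[0] = -2.0  # fixed voxel (r, c) samples moving voxel (r - 2, c)
warped = warp_with_dvf(moving_mask, dvf, order=0)  # order=0 keeps the mask binary
print(np.argwhere(warped).min(axis=0), np.argwhere(warped).max(axis=0))  # [4 2] [6 4]
```

Nearest-neighbour interpolation (`order=0`) keeps a warped contour mask binary; for CT intensities, linear interpolation (`order=1`) is the more usual choice.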
In clinical practice, the area to match in the DIR software is chosen according to the request of the radiation oncologist. For example, if the radiation oncologist wants the cumulative dose distribution of the initial and boost plans, DIR is performed to maximize accuracy across the patient's body. If the radiation oncologist wants only the cumulative dose-volume histogram parameters of the rectum, DIR is performed to maximize accuracy particularly in the rectum. Reflecting these clinical purposes, we used two matching strategies in our credentialing workflow: matching the entire phantom (patterns 1–3) and matching only the tumor and rectum proxies (patterns 4 and 5).

2.C | DIR instructions
At each institution, the phantom was scanned in each of the four settings using the clinical computed tomography (CT) scan protocol for the pelvic region. The acquired CT images were transferred to the DIR software, and DIR was performed for the five DIR patterns.

2.D | Evaluation
To create the reference contours used for contour-based quantitative DIR evaluation, two experienced medical physicists and one radiation therapist delineated the tumor, rectum, right femoral head, left femoral head, and body proxies on the CT images of all four phantom settings using the MIM software (76 image sets: 4 phantom settings × 19 institutions). Deformed contours were created by applying the DVF submitted by each institution to each of the three reference contours. Finally, the average Dice similarity coefficient (DSC) and Hausdorff distance (HD) for the tumor, rectum, right femoral head, left femoral head, and body proxies were calculated from the three reference contours and the corresponding deformed contours on the fixed image for each pattern. 21 DSC is a common measure of the spatial overlap between contours and is defined as
$\mathrm{DSC} = \dfrac{2\,|V_d \cap V_s|}{|V_d| + |V_s|}$,
where $V_d$ is the volume of the contour deformed using DIR and $V_s$ is the volume of the contour manually delineated on the reference CT image. HD is defined as the maximum closest distance between two volumes, where the closest distance is computed for each vertex of both volumes:
$\mathrm{HD}(A, B) = \max\left\{\max_{a \in A} \min_{b \in B} \|a - b\|,\ \max_{b \in B} \min_{a \in A} \|a - b\|\right\}$.
Averaging DSC and HD over the three reference contours reduces the uncertainty associated with any single reference contour.
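The DSC and HD definitions in this section can be sketched with NumPy and SciPy as follows. This is not the software used in the study. Assumptions: contours are available as binary masks on the fixed-image grid, HD is computed on boundary voxels converted to millimetres with a user-supplied voxel spacing, and each observer's deformed contour is paired with the same observer's reference contour before averaging. All function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import binary_erosion
from scipy.spatial.distance import directed_hausdorff


def dice(deformed, reference):
    """DSC = 2|Vd ∩ Vs| / (|Vd| + |Vs|) for two binary masks on the same grid."""
    d = np.asarray(deformed, dtype=bool)
    s = np.asarray(reference, dtype=bool)
    total = d.sum() + s.sum()
    if total == 0:
        return 1.0  # both empty: treated as perfect agreement (a convention, not from the text)
    return 2.0 * np.logical_and(d, s).sum() / total


def surface_points_mm(mask, spacing_mm):
    """Boundary-voxel coordinates of a binary mask, scaled to millimetres."""
    m = np.asarray(mask, dtype=bool)
    boundary = m & ~binary_erosion(m)
    return np.argwhere(boundary) * np.asarray(spacing_mm, dtype=float)


def hausdorff(points_a, points_b):
    """Symmetric HD: the larger of the two directed Hausdorff distances."""
    return max(directed_hausdorff(points_a, points_b)[0],
               directed_hausdorff(points_b, points_a)[0])


def mean_dsc_hd(deformed_masks, reference_masks, spacing_mm):
    """Average DSC and HD over observers, pairing deformed/reference contours by observer."""
    dscs, hds = [], []
    for d, s in zip(deformed_masks, reference_masks):
        dscs.append(dice(d, s))
        hds.append(hausdorff(surface_points_mm(d, spacing_mm),
                             surface_points_mm(s, spacing_mm)))
    return float(np.mean(dscs)), float(np.mean(hds))


# Toy example: two 4x4 squares offset by one voxel in each direction.
a = np.zeros((10, 10), dtype=bool)
a[2:6, 2:6] = True
b = np.zeros((10, 10), dtype=bool)
b[3:7, 3:7] = True
print(dice(a, b))  # 2 * 9 / (16 + 16) = 0.5625
```

Computing HD on boundary points rather than on all voxels matters: for solid shapes the two can differ, and the text defines HD in terms of contour vertices.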
To calculate the fiducial marker-based target registration error (TRE), one medical physicist identified the center of mass of each of the 40 fiducial markers on the CT images of all four phantom settings, establishing a point-to-point correspondence between the moving and fixed images. TRE is defined as the three-dimensional Euclidean distance between the fiducial marker coordinates on the fixed image and the corresponding marker coordinates on the deformed image. Further details of the TRE calculation are described in the literature. 11 To evaluate differences in CT numbers among institutions, the mean Hounsfield unit (HU) value was measured in the right and left femoral head proxies.
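Once corresponding marker centers are available, the TRE defined above reduces to a per-marker Euclidean norm. The sketch below assumes the fixed-image marker centers and the corresponding deformed centers are supplied as (N, 3) arrays in millimetres in a common coordinate system; how the deformed positions are derived from each vendor's DVF is not shown. It also includes a hypothetical `mean_hu` helper for the femoral-head CT-number check.

```python
import numpy as np


def target_registration_error(fixed_mm, deformed_mm):
    """Per-marker TRE (mm): 3D Euclidean distance between corresponding marker centers."""
    fixed = np.asarray(fixed_mm, dtype=float)
    deformed = np.asarray(deformed_mm, dtype=float)
    if fixed.shape != deformed.shape or fixed.ndim != 2 or fixed.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of corresponding points")
    return np.linalg.norm(fixed - deformed, axis=1)


def mean_hu(ct_hu, roi_mask):
    """Mean CT number (HU) inside a binary region of interest, e.g., a femoral head proxy."""
    return float(np.asarray(ct_hu, dtype=float)[np.asarray(roi_mask, dtype=bool)].mean())


# Toy example with two markers: one displaced by (3, 4, 0) mm, one exactly registered.
tre = target_registration_error([[0, 0, 0], [10, 0, 0]], [[3, 4, 0], [10, 0, 0]])
print(tre)         # [5. 0.]
print(tre.mean())  # 2.5
```

In practice the 40 per-marker values would be summarized (e.g., mean and maximum) per pattern and per institution.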
The authors will not share research data.

| RESULTS
End-to-end DIR credentialing was performed for 21 DIR software datasets at 19 participating institutions in 2019 and 2020. … and hybrid DIR with specific organ structures (R2).

| DISCUSSION
The proposed physical DIR phantom was developed so that end-to-end tests can be easily performed for various clinical scenarios involving tumor, rectum, and body changes. We tested the feasibility of DIR credentialing using the proposed phantom at multiple domestic and international institutions. The results provide feedback to institutions that performed poorly, indicating where they may improve; these institutions were allowed to resubmit their results to demonstrate that they can meet the standards required for the trial. … and pelvis sites) generated by specific DVF to test six commercial DIR software. 25 These studies showed that the DIR performance … Because the physical phantom has a simpler structure than an actual patient, DIR accuracy tends to be high. In the future, we plan to determine specific acceptance criteria for the proposed phantom by analyzing the relationship between qualitative results (e.g., visual inspection on a five-point scale) and quantitative results (e.g., DSC).
The limitations of this study are as follows. First, the proposed phantom was designed to assess the overall DIR accuracy at each institution. Herein, we created a simple representation of the pelvic region by inserting octagonal tough bone proxies, trigonal PMMA, and various air cavities. By rearranging the inserts or including hollow inserts, a simple thoracic region could also be simulated with the proposed phantom. Although this phantom may be used for DIR evaluation at other clinical sites, we evaluated DIR accuracy at only one site (i.e., the pelvis). Second, the deformations in this geometric phantom are simple and linear (reflections and linear expansions). For example, the rectum proxy changes from a consistent zigzag shape to a straight line, whereas real pelvic deformations around the prostate are driven by bladder and rectum motion (e.g., filling and gas passage). These aspects could be addressed using an anthropomorphic phantom and/or more complicated inserts within the geometry of this phantom. Third, the three-quarter-scaled phantom used in patterns 3 and 5 was created by uniformly scaling down the entire phantom, including the custom inserts. Although this represents an unlikely clinical scenario, we used this design to increase the sensitivity for evaluating DIR technical skills. Fourth, our results may include residual errors caused by variability in insert placement between phantom settings; when the proposed phantom is used for the DIR credentialing of clinical trials, careful attention must be paid to the position of the custom inserts. Fifth, the inserts comprise geometric shapes with distinct features (i.e., many corners), which may be sensitive to CT scanning parameters (e.g., CT resolution and slice thickness); however, this influence may not be prominent in clinical situations. Sixth, only three types of DIR software were evaluated in this study.

| CONCLUSIONS
We developed a physical phantom for the DIR credentialing of a clinical trial. We revealed a wide range of DIR evaluation parameters among the participating institutions. Thus, the proposed method shows considerable potential as an evaluation tool for DIR credentialing and QA.