Magnetic resonance imaging organ at risk delineation for nasopharyngeal radiotherapy: Measuring the effectiveness of an educational intervention

Abstract

Introduction: Magnetic resonance imaging (MRI) demonstrates superior soft tissue contrast and is increasingly being used in radiotherapy planning. This study evaluated the impact of an education workshop in minimising inter‐observer variation (IOV) for nasopharyngeal organs at risk (OAR) delineation on MRI.

Methods: Ten observers delineated 14 OARs on 4 retrospective nasopharyngeal MRI data sets. Standard contouring guidelines were provided pre‐workshop. Following an education workshop on MRI OAR delineation, observers blinded to their original contours repeated the 14 OAR delineations. For comparison, reference volumes were delineated by two head and neck radiation oncologists. IOV was evaluated using the Dice similarity coefficient (DSC), Hausdorff distance (HD) and relative volume. The location of the largest deviations was evaluated with centroid values. Observer confidence pre‐ and post‐workshop was also recorded using a 6‐point Likert scale. The workshop was deemed beneficial for an OAR if ≥50% of observers' mean scores improved in any metric and ≥50% of observers' confidence improved.

Results: All OARs had ≥50% of observers improve in at least one metric. Base of tongue, larynx, spinal cord and right temporal lobe were the only OARs achieving a mean DSC score of ≥0.7. Base of tongue, left and right lacrimal glands, larynx, left optic nerve and right parotid gland all exhibited statistically significant HD improvements post‐workshop (P < 0.05). Brainstem and left and right temporal lobes all had statistically significant relative volume improvements post‐workshop (P < 0.05). Post‐workshop observer confidence improvement was observed for all OARs (P < 0.001).

Conclusions: The educational workshop reduced IOV and improved observers' confidence when delineating nasopharyngeal OARs on MRI.


Introduction
Radiotherapy (RT) for nasopharyngeal carcinoma is constrained by the proximity of organs at risk (OARs) to tumour volumes. Modern RT techniques allow high doses of radiation to be delivered to target volumes while sparing nearby OARs. 1 This significantly improves local control while reducing treatment-related side effects, which in turn improves patients' quality of life. 1 To ensure that the dose delivered to the tumour volume is precise and the dose to OARs is within tolerance, accurate delineation of OARs is necessary. 2 Inter-observer variability (IOV) contributes to delineation error in RT and affects the dose delivered to OARs. 3,4 IOV is a measure of the difference between contours produced by two or more observers examining the same material. 2 Measures designed to minimise IOV include guidelines and atlases, multi-modality imaging, standard protocols and autocontouring tools. 2 The inclusion of magnetic resonance imaging (MRI) in nasopharyngeal RT has been shown to reduce IOV. 5 Compared with computed tomography (CT), MRI provides superior soft-tissue visualisation, which improves target volume and OAR delineation. 6 Because the soft-tissue structures of the head and neck (H&N) region are complex, MRI improves nasopharyngeal OAR delineation. 7 These OARs display similar soft-tissue contrast to surrounding structures on CT data sets but are more discernible, and thus better delineated, on MRI. [8][9][10][11] MRI is often combined with CT to optimise delineation of target volumes and OARs while maintaining accurate dose calculation. 12 The existing CT-based treatment planning workflow relies on defining targets and OARs on MRI and transferring the contours to CT via image registration. 12 MRI-CT co-registration requires two separate imaging sessions and has fundamental and logistical drawbacks, as dual-modality workflows may introduce misregistration and geometric uncertainties. 12
MRI-only workflows are feasible; however, radiation therapists (RTs) also need to be educated on how to identify and delineate OARs on MRI. 12 Education courses conducted in person, online or in a combination of both have been shown to be effective teaching modalities. [13][14][15] Davis et al. 13 demonstrated that combined didactic and hands-on interventions were more effective in facilitating change than didactic sessions alone. In this meta-analysis 13 studies with only didactic interventions had no statistically significant impact on participants' behaviour or healthcare outcomes, while studies with both didactic sessions and interactive interventions did. In a study by Awan et al., 14 seven resident observers contoured 26 H&N OARs on a CT scan. After a teaching intervention, the observers contoured the same 26 OARs on another CT scan. The teaching intervention involved an atlas and real-time software-based feedback to help residents contour OARs in the H&N region. The mean Dice similarity coefficient (DSC) scores across all structures improved between phases, and each resident observer demonstrated statistically significant improvements in overall OAR contouring (P < 0.01). These findings indicate that educational interventions can improve the contouring of H&N OARs. Informed by these studies, we adopted an interactive approach: a didactic lesson on radiological anatomy delivered in an interactive environment. The lack of published clinical findings on the effects of a teaching intervention for MRI-only nasopharyngeal OAR delineation emphasises the need for this research. 2 This study evaluated whether participation in an education workshop minimises IOV when delineating nasopharyngeal OARs on MRI only.

Methodology

Context
The South Western Sydney Local Health District Human Research Ethics Committee approved this study (HREC/16/LPOOL/603), which used retrospective imaging data obtained during routine clinical radiotherapy planning. Retrospective MRI data sets of five patients diagnosed with nasopharyngeal carcinoma were used. Inclusion criteria were: treatment for nasopharyngeal carcinoma with curative intent; a prescribed dose ≥60 Gy; any tumour and nodal stage but no metastasis; and MRI scans extending from the temporal lobe superiorly to the clavicles inferiorly.
This was a single-centre study comparing OAR IOV before and after a contouring education workshop. Eleven radiation therapist observers were asked to contour 14 H&N OARs on five patient MRI data sets before and after the workshop. The 14 OARs delineated were: base of tongue, brainstem, left and right lacrimal glands, larynx, optic chiasm, left and right optic nerves, left and right parotid glands, pharyngeal constrictors, spinal cord, and left and right temporal lobes. These OARs were selected because they display similar soft-tissue contrast to surrounding structures on a CT data set but are more discernible on MRI scans.

Imaging
Images were acquired on a 3 T MRI scanner (Magnetom Skyra; Siemens Healthcare, Erlangen, Germany) with two 18-channel receiver array coils. The surface coils were placed over the H&N region, abutting each other, using coil bridges on top of an H&N mask to cover the area of interest. Images were acquired in the transverse plane using a T2-weighted turbo spin echo Dixon sequence (T2_tse_DIXON). Dixon in-phase images, which combine the water and fat signals, were used for delineation in this study. The voxel size was 1 × 1 × 3 mm (slice thickness 3 mm with no gap between slices), the base resolution was 256 and the flip angle was 140°. The 250 mm field of view extended from the superior aspect of the orbital bone to the suprasternal notch.

Pre-workshop delineation
Before the workshop, observers delineated 14 OARs using MIM version 6.9.5 (MIM Software Inc., Cleveland, Ohio). The observers were provided with H&N contouring work instructions based on consensus delineation guidelines for H&N OARs. 8 The instructions described the borders of each OAR and included an example image of the OAR contoured on a CT scan. The observers were blinded to each other's volumes. Observers were asked to indicate their experience level as no previous experience (never rostered for head/neck planning), some experience (≤3 months of consistently contouring a minimum of 5 plans a week) or experienced (≥6 months of consistently contouring a minimum of 5 plans a week). Observers were also asked to rate their confidence when contouring each OAR on a 6-point Likert scale from 0 to 5: not confident, slightly confident, somewhat confident, fairly confident, confident and very confident.

Intervention
The education workshop was conducted after observers finalised their initial volumes. The workshop covered the basic concepts of T1- and T2-weighted MRI, OAR anatomy in relation to other structures and its typical appearance on the T2_DIXON in-phase acquisition, and how to recognise water, fat and bone on this acquisition. The appearance of each of the 14 OARs on the T2_DIXON in-phase sequence was discussed in detail, illustrated with example images. The education workshop was developed collaboratively by a senior MRI radiographer employed in a radiotherapy department, with more than 12 years of MRI experience, and a senior radiation therapist with expertise in H&N planning and MRI-guided radiotherapy. As COVID-19 restrictions prohibited in-person workshops, the education intervention was held virtually. Observers were encouraged to ask questions and received real-time feedback.

Post-workshop delineation
To minimise recall bias, all observers were given a four-week break between pre- and post-workshop delineations, data sets were re-labelled, and observers were blinded to their pre-workshop volumes. Observers were given the same instructions as pre-workshop, along with the new information from the education workshop. Observers were again asked to rate their confidence for each OAR after delineation. In addition, observers completed a 5-point Likert scale survey indicating whether they strongly disagreed, disagreed, were neutral, agreed or strongly agreed that they were comfortable with H&N OAR contouring pre- and post-workshop, and that the education workshop was a worthwhile experience.

Reference volumes
Reference volumes were generated by consensus of two H&N radiation oncologists (ROs) with extensive experience in using MRI for H&N radiotherapy. Each RO has contoured 40-50 cases over the last 5 years, and their contours are routinely audited weekly by other H&N ROs and radiologists at the hospital. For this study, each RO contoured seven OARs and then audited the seven contoured by the other RO. If a contour was not accepted by the auditing RO, edits were made after discussion and agreement. The ROs followed international consensus guidelines to aid contouring. 8

Analysis
Contour analysis was performed in MIM version 6.9.5 (MIM Software Inc., Cleveland, Ohio). The data obtained for each OAR included DSC, HD (mm), absolute volume (cc) and centroid X, Y and Z (cm). The relative volume (%) and centroid difference (Δ) between each observer's OAR and the reference OAR were reported. DSC measures the overlap between observers' contours and the reference contours, ranging from 0 (no overlap) to 1 (identical contours). 4 A DSC ≥0.7 indicated 'good' agreement between observer and reference contours. 2 HD is the greatest distance from a point on one contour to the closest point on the other contour. 4 A high HD indicates greater dissimilarity between the observer's contour and the reference contour, while an HD of zero indicates identical contours. 4 Relative volume is a volumetric comparison between observer and reference volumes.
To calculate relative volume:
Centroid ΔX, ΔY and ΔZ describe the difference in the geometric centre of a volume in three planes relative to the reference volume. The X-axis represents left/right, the Y-axis anterior/posterior, and the Z-axis superior/inferior.

To calculate centroid ΔX:

$$\text{Centroid } \Delta X = X_{\text{observer centroid}} - X_{\text{reference centroid}}$$

The same equation was used to calculate centroid ΔY and ΔZ.
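The comparison metrics above can be sketched on binary voxel masks. This is a minimal NumPy illustration, not MIM's implementation, and it rests on several assumptions: contours are already rasterised to boolean arrays on the 1 × 1 × 3 mm grid; HD is taken as the symmetric form (the larger of the two directed distances) between voxel centres rather than contour surfaces; and array axes (slice, row, column) map to (Z, Y, X), although the true patient-axis signs depend on image orientation. The relative volume formula is not reproduced because the source equation is missing; `volume_cc` only gives the absolute volume such a formula would use. All function names are hypothetical.

```python
import numpy as np

# Assumed voxel spacing in mm for array axes (slice, row, column),
# matching the 1 x 1 x 3 mm voxels described under Imaging.
SPACING_MM = (3.0, 1.0, 1.0)

def dice(a, b):
    """Dice similarity coefficient: 2|A and B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else float(2.0 * np.logical_and(a, b).sum() / total)

def hausdorff_mm(a, b, spacing_mm=SPACING_MM):
    """Symmetric Hausdorff distance (mm) between voxel centres of two non-empty masks."""
    pa = np.argwhere(a) * np.asarray(spacing_mm, dtype=float)
    pb = np.argwhere(b) * np.asarray(spacing_mm, dtype=float)
    # Pairwise Euclidean distances, shape (len(pa), len(pb)).
    d = np.sqrt(((pa[:, None, :] - pb[None, :, :]) ** 2).sum(axis=-1))
    # Worst nearest-neighbour distance in each direction; keep the larger.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

def volume_cc(mask, spacing_mm=SPACING_MM):
    """Voxel count times voxel volume, converted from mm^3 to cc."""
    return float(mask.sum()) * float(np.prod(spacing_mm)) / 1000.0

def centroid_delta_cm(obs, ref, spacing_mm=SPACING_MM):
    """Observer centroid minus reference centroid as (dX, dY, dZ) in cm,
    assuming array axes (slice, row, column) correspond to (Z, Y, X)."""
    def centroid_mm(m):
        return np.argwhere(m).mean(axis=0) * np.asarray(spacing_mm, dtype=float)
    dz, dy, dx = (centroid_mm(obs) - centroid_mm(ref)) / 10.0
    return float(dx), float(dy), float(dz)

# Toy example: a reference box and an observer box shifted one column (1 mm).
ref = np.zeros((5, 10, 10), dtype=bool)
ref[1:4, 2:7, 2:7] = True
obs = np.zeros_like(ref)
obs[1:4, 2:7, 3:8] = True
print(dice(obs, ref))               # 0.8
print(hausdorff_mm(obs, ref))       # 1.0
print(volume_cc(ref))               # 0.225
print(centroid_delta_cm(obs, ref))  # (0.1, 0.0, 0.0)
```

The brute-force distance matrix keeps the sketch short; for realistic OAR sizes a distance transform or surface-only point sets would be far cheaper.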
Pre- and post-workshop differences in DSC, HD, relative volume, centroid ΔX, ΔY and ΔZ, and confidence level were analysed quantitatively for each OAR across all observers. Statistical analysis was performed using the Mann-Whitney U test in SPSS (IBM Corp. Released 2021. IBM SPSS Statistics for Macintosh, Version 28.0. Armonk, NY: IBM Corp); a P-value < 0.05 was considered significant.

Analysis of Education Program
The education workshop was deemed beneficial for an OAR if ≥50% of observers' contours demonstrated improvement in at least one IOV comparison metric and ≥50% of observers' confidence improved.
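The decision rule can be expressed as a small function. This sketch is an illustration under stated assumptions: it takes each observer's mean score for one OAR, treats a higher DSC, a higher confidence rating (0-5 scale) and a lower HD as improvements, counts unchanged scores as no improvement, and omits relative volume because the text does not define its direction of improvement. The function names, observer IDs and scores are hypothetical.

```python
# Hypothetical metric directions: True means a higher score is better.
DIRECTIONS = {"DSC": True, "HD": False}

def fraction_improved(pre, post, higher_is_better=True):
    """Fraction of observers whose post-workshop score beats their pre-workshop score.
    pre/post map observer ID -> mean score for one OAR."""
    improved = 0
    for observer, before in pre.items():
        delta = post[observer] - before
        better = delta > 0 if higher_is_better else delta < 0
        if better:
            improved += 1
    return improved / len(pre)

def workshop_beneficial(metrics_pre, metrics_post, conf_pre, conf_post,
                        directions=DIRECTIONS, threshold=0.5):
    """Beneficial if >= threshold of observers improved in any metric
    AND >= threshold of observers' confidence improved."""
    any_metric = any(
        fraction_improved(metrics_pre[m], metrics_post[m], hib) >= threshold
        for m, hib in directions.items()
    )
    confidence = fraction_improved(conf_pre, conf_post, True) >= threshold
    return any_metric and confidence

# Invented scores for four observers on a single OAR.
dsc_pre = {"A": 0.60, "B": 0.65, "C": 0.70, "D": 0.50}
dsc_post = {"A": 0.68, "B": 0.66, "C": 0.69, "D": 0.55}
hd_pre = {"A": 10, "B": 8, "C": 9, "D": 12}
hd_post = {"A": 11, "B": 9, "C": 9, "D": 13}
conf_pre = {"A": 1, "B": 2, "C": 2, "D": 0}
conf_post = {"A": 3, "B": 3, "C": 2, "D": 2}
pre_metrics = {"DSC": dsc_pre, "HD": hd_pre}
post_metrics = {"DSC": dsc_post, "HD": hd_post}
print(workshop_beneficial(pre_metrics, post_metrics, conf_pre, conf_post))  # True
```

Here DSC improved for three of four observers (75%) and confidence for three of four, so the rule is met even though no observer improved in HD.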

Results
The study was conducted from May to August 2021. We began with 11 observers and five patient data sets; however, because a COVID-19 outbreak put extra strain on staff, one observer dropped out and one patient data set was removed due to time constraints. COVID-19 restrictions prohibited face-to-face workshops, so the education intervention had to be presented virtually. Observers' experience in H&N radiotherapy varied: four of the 10 observers had no previous experience, three had some experience, and three were experienced. The four data sets used were from patients with the following staging: T4N2M0, T2N2M0, T1N2M0 and T3N3bM0.
As shown in Table 1, all OARs except the right optic nerve, pharyngeal constrictors and spinal cord had statistically significant improvements in at least one of the metrics measuring inter-observer variation. Interestingly, the left optic nerve and right parotid gland demonstrated statistically significant HD improvement across observers, whereas their contralateral counterparts (the right optic nerve and left parotid gland) did not. All OARs had statistically significant improvements in observers' confidence levels when contouring each OAR.
Mean DSC scores increased for ≥50% of observers on eight OARs (base of tongue, left and right lacrimal glands, larynx, optic chiasm, spinal cord, and left and right temporal lobes). However, only the base of tongue, larynx, spinal cord and right temporal lobe also achieved a mean DSC score of ≥0.7. The right and left temporal lobes each had an outlier observer, with post-workshop DSC scores of 0.4 and 0.3, respectively (Fig. 1 and Figure S1). These outliers, however, had minimal impact on the mean post-workshop DSC scores: the mean DSC for the right temporal lobe was 0.66 with the outlier included and 0.69 without it, and the mean DSC for the left temporal lobe was 0.62 with the outlier included and 0.66 without it. Observers without previous experience showed improvements in mean DSC and relative volume scores for a higher percentage of OARs than observers with previous experience. However, the percentage of OARs with Hausdorff distance improvements was similar between the groups (Table 2). Confidence also increased for all 14 OARs post-workshop (Fig. 4). All observers reported improved comfort with MRI OAR delineation after the education workshop (Fig. 5). Of the 10 observers, four agreed and six strongly agreed that the education workshop was a worthwhile experience.
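As an arithmetic check on the outlier figures, removing one value $x_k$ from the mean $\bar{x}$ of $n$ values leaves a mean of $(n\bar{x} - x_k)/(n-1)$. The sketch below assumes each of the 10 observers contributes one post-workshop mean DSC per temporal lobe; because the published means are rounded to two decimals, this is only a consistency check, not a recomputation from the study data. The function name is hypothetical.

```python
def mean_without(mean_all, n, outlier):
    """Mean of the remaining n - 1 values after dropping one value from n."""
    return (n * mean_all - outlier) / (n - 1)

# Assumes n = 10 observers, one post-workshop mean DSC each.
right = mean_without(0.66, 10, 0.4)  # right temporal lobe, outlier 0.4
left = mean_without(0.62, 10, 0.3)   # left temporal lobe, outlier 0.3
print(f"{right:.2f} {left:.2f}")     # 0.69 0.66
```

Both results round to the reported values, which is consistent with a single low score per lobe shifting the mean by only a few hundredths.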

Discussion
Contouring variability is a well-known source of geometric error in radiotherapy planning. 2,16 Contour deviation has been associated with poorer clinical outcomes. 2 RTs are highly skilled in CT image interpretation, as CT imaging is routinely used in both training and clinical practice. With the increased use of MRI in radiotherapy, including MRI-only workflows, it is important that RTs are also skilled in identifying and delineating OARs on MRI. No MRI-based guidelines exist for radiotherapy H&N contouring, and therefore there is no guidance specifying optimal sequences for OAR delineation. Current international guidelines are CT-based and only include CT-MRI fused data sets, signalling a need to consider including MRI-only data sets in future versions of these guidelines. 8 MRI delineation skills are also valuable as automated workflows become more routine, because RTs will be required to make a qualitative assessment of whether the generated delineation is correct. 17 This is a novel study investigating the benefits of an educational workshop in reducing IOV when contouring in the H&N region specifically on MRI.
The results showed that the education workshop was beneficial in reducing IOV and improving observer confidence when delineating nasopharyngeal OARs on MRI scans. There have been previous CT-based studies in this area, and our study supports their findings that teaching interventions can reduce IOV when contouring H&N OARs. 1,2,14,15,[18][19][20] In the Bekelman et al. 18 study, 11 radiation oncology residents contoured three H&N clinical target volumes (CTVs) on CT scans before and after a teaching intervention. Six observers did not have prior experience in H&N contouring, and five did. The teaching intervention consisted of a didactic session on identifying anatomic landmarks on cross-sectional images and a hands-on practical session on CTV delineation. Observers' contours were rated as adequate or inadequate based on current radiotherapy protocols. For the group without prior experience, 60%, 0% and 20% of baseline contours were deemed adequate, compared with 100%, 40% and 80% of follow-up contours. Likewise, for the group with prior experience, 83%, 33% and 67% of baseline contours were deemed adequate, compared with 100%, 67% and 100% of follow-up contours. Similar to our study, these findings suggest that incorporating a teaching intervention improved delineation for participants, especially for those with no prior experience.
Our pilot study assessed only the short-term impact of the education; with sustained clinical use, the effect could be different. While observer confidence is a simplistic measure of the workshop's impact, it is still relevant to understand how observers perceived the material delivered and whether they felt able to identify OARs on MRI. In our study, the education workshop improved observers' contouring confidence, as all observers' confidence levels increased for all OARs post-workshop. This increase in confidence was associated with an improvement in contouring consistency, as all OARs also showed a decrease in IOV. Similarly, Jaswal et al. 20 demonstrated an increase in confidence scores as well as an improvement in contouring: across all contoured H&N structures, there was a 0.20 median improvement in students' average DSC score (P < 0.001). In contrast, Stanley et al. 21 found that observer confidence was not reflected in contouring consistency: the ratio of smallest to largest contour volumes for each brain metastasis contoured by eight physicians varied from 1.25 to 4.47, indicating a high degree of IOV, whereas the average observer confidence was relatively high, with a mean score of 3.2 on a scale where 4 indicated very high confidence.
In our study, observers' level of experience was related to improvements in DSC and relative volume scores: a higher percentage of OARs improved post-workshop when contoured by observers with no previous experience than when contoured by observers with previous experience. This is consistent with other studies 14,18 reporting that a teaching intervention improved contouring particularly for participants without previous experience. Level of experience did not appear to influence HD scores, as observers improved at a similar rate regardless of experience.
Our study also found that the largest centroid ΔX and ΔY variations occurred for the temporal lobes (Fig. 2A,B). Since an OAR's volume scales faster than its surface area, the larger centroid ΔX and ΔY variations may be a consequence of the temporal lobes' large volume. 4 The right and left temporal lobes had outlier post-workshop DSC scores of 0.4 and 0.3, respectively (Fig. 1 and Figure S1). A single observer was responsible for both outliers. This observer reported no previous head/neck planning experience, which may have contributed to the outliers. The brainstem had the largest centroid ΔZ variation (Fig. 2C), predominantly due to variability in defining the boundary between the brainstem and spinal cord. Brainstem DSC (Fig. S1) and HD scores were also better before the education workshop. Similarly, DSCs for the spinal cord, optic nerves, parotid glands and pharyngeal constrictors remained equivalent post-workshop (Fig. S1). These structures are commonly delineated on CT by RTs, and this familiarity with CT-based delineation may have produced better concordance before the workshop than after it. However, better concordance does not necessarily mean the contours are accurate. Despite this lack of improvement, the post-workshop mean DSCs for the brainstem, spinal cord and parotid glands were 0.7, which is considered 'good' agreement. In contrast, the left and right optic nerves had low post-workshop mean DSC scores of 0.4 and mean HD values of 1 cm (Fig. 2F). This HD value is clinically significant given the small size of the optic nerves. Because of their small size and tubular geometry, optic nerves are difficult to delineate, and small absolute differences can lead to poor scores when the volume of the organ is small. 22 Our results also showed statistically significant HD improvements for only one side of each bilateral pair of optic nerves and parotid glands.
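The sensitivity of DSC to structure size can be illustrated with a toy geometry. This hypothetical sketch compares a square rod 3 voxels wide (an illustrative stand-in for a thin tubular OAR at 1 mm in-plane resolution) with one 30 voxels wide, each displaced sideways by a single voxel. For a one-voxel shift the overlap gives DSC = (w - 1)/w, so the same 1 mm error costs the thin rod far more. The geometry and function names are invented for illustration and are not derived from the study data.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient of two boolean masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def rod_dice_after_shift(width, length=20, shift=1):
    """DSC between a square rod `width` voxels wide and a copy shifted
    `shift` voxels sideways (valid for shift < width)."""
    size = width + shift + 2  # room for both rods without wrap-around
    ref = np.zeros((length, size, size), dtype=bool)
    obs = np.zeros_like(ref)
    ref[:, 1:1 + width, 1:1 + width] = True
    obs[:, 1 + shift:1 + shift + width, 1:1 + width] = True
    return float(dice(ref, obs))

for w in (3, 30):
    print(w, round(rod_dice_after_shift(w), 3))
# 3 0.667
# 30 0.967
```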
This may reflect the small sample size and outliers in the data; moreover, statistical significance does not imply clinical significance. The pharyngeal constrictors had a post-workshop mean DSC score of 0.5. This low DSC score may have been due to RTs' familiarity with CT. The pharyngeal constrictors are difficult to visualise on CT, so their contouring requires accurate interpretation of guidelines based on anatomical landmarks. 22 The observers may have contoured the pharyngeal constrictors based on perceived CT boundaries, which may have contributed to the higher degree of variation observed on MRI in this study. Interestingly, post-workshop DSC scores for the optic chiasm (Fig. 2D) improved but remained under the 0.7 threshold (Fig. S2). This may have been due to the T2_DIXON sequence affecting its visibility, or to learnt behaviour from CT delineation. Likewise, the lacrimal glands (Fig. 2E) improved across all metrics but still had low DSC scores (Fig. S2). This may have been due to distortion at the edge of the MRI field of view and possible motion artefacts from eye movement. 22 The orbital area is prone to distortion and signal loss on MRI scans due to interfaces among air, bone and soft tissue, which result in an inhomogeneous magnetic field and susceptibility artefacts. 22
This study had some limitations. The small sample size of observers and patient data sets may have limited the ability to test the effectiveness of the education workshop. Also, COVID-19 restrictions required the education workshop to be delivered virtually, thereby limiting observers' hands-on and interactive experience. Due to a COVID-19 outbreak putting additional strain on staff, one observer dropped out and one patient data set was removed from the study due to incomplete observer data. Observers may also have been influenced by CT recall bias, since they may have delineated organs based on perceived CT boundaries.
In addition, the reference contours created by consensus of two ROs may themselves be subject to IOV. In future work, consensus volumes from ROs and radiologists may provide more accurate reference volumes. Contours were also evaluated only quantitatively, not qualitatively, so future studies should incorporate qualitative RO assessments of the clinical significance of volume changes. Further research is also needed to determine the most appropriate imaging sequence for each organ.

Conclusion
This study demonstrated educational workshops can potentially reduce inter-observer variability and improve observer confidence when delineating nasopharyngeal OARs on MRI scans. This study provides important insights concerning the emerging trend of MRI-only radiotherapy planning workflows. It is a pilot study adding to a body of evidence regarding the emerging role of MRI in radiotherapy. All observers found the teaching intervention a valuable experience and all reported improvements in confidence levels post-workshop. This is consistent with other studies of teaching interventions for contouring in radiation oncology and calls for investment and additional research in this area.