Validation of clinical acceptability of an atlas-based segmentation algorithm for the delineation of organs at risk in head and neck cancer




The aim of this study was to assess whether clinically acceptable segmentations of organs at risk (OARs) in head and neck cancer can be obtained automatically and efficiently using the novel “similarity and truth estimation for propagated segmentations” (STEPS) compared to the traditional “simultaneous truth and performance level estimation” (STAPLE) algorithm.


First, 6 OARs were contoured by 2 radiation oncologists in a dataset of 100 patients with head and neck cancer on planning computed tomography images. Each image in the dataset was then automatically segmented with STAPLE and STEPS using those manual contours. Dice similarity coefficient (DSC) was then used to compare the accuracy of these automatic methods. Second, in a blind experiment, three separate and distinct trained physicians graded manual and automatic segmentations into one of the following three grades: clinically acceptable as determined by universal delineation guidelines (grade A), reasonably acceptable for clinical practice upon manual editing (grade B), and not acceptable (grade C). Finally, STEPS segmentations graded B were selected and one of the physicians manually edited them to grade A. Editing time was recorded.


Significant improvements in DSC can be seen when using the STEPS algorithm on large structures such as the brainstem, spinal canal, and left/right parotid compared to the STAPLE algorithm (all p < 0.001). In addition, across all three trained physicians, manual and STEPS segmentation grades were not significantly different for the brainstem, spinal canal, parotid (right/left), and optic chiasm (all p > 0.100). In contrast, STEPS segmentation grades were lower for the eyes (p < 0.001). Across all OARs and all physicians, STEPS produced segmentations graded as well as manual contouring at a rate of 83%, giving a lower bound on this rate of 80% with 95% confidence. Reduction in manual interaction time was on average 61% and 93% when automatic segmentations did and did not, respectively, require manual editing.


The STEPS algorithm showed better performance than the STAPLE algorithm in segmenting OARs for radiotherapy of the head and neck. It can automatically produce clinically acceptable segmentation of OARs, with results as relevant as manual contouring for the brainstem, spinal canal, the parotids (left/right), and optic chiasm. A substantial reduction in manual labor was achieved when using STEPS even when manual editing was necessary.