Assessment and improvement of radiation oncology trainee contouring ability utilizing consensus-based penalty metrics


  • A Hallock MD; G Bauman MD; N Read MD; D D'Souza MD; F Perera MD; I Aivas MD; L Best MD; J Cao MD; AV Louie MD; E Wiebe MD; T Sexton MD; S Gaede PhD; J Battista PhD; G Rodrigues MD.
  • Conflict of interest: George Rodrigues, Jerry Battista and Glenn Bauman have a non-financial academic research agreement with Standard Imaging Inc. for access to StructSure software.


Dr George Rodrigues, A3-808 London Regional Cancer Program, 790 Commissioners Rd E, London, ON, Canada N6A4L6.




The objective of this study was to develop consensus-based penalty metrics and assess the feasibility of using them for quality assurance and improvement of critical structure and organ at risk (OAR) contouring.


A Delphi study was conducted to obtain consensus on contouring penalty metrics to assess trainee-generated OAR contours. Voxel-based penalty metric equations were used to score regions of discordance between trainee and expert contour sets. The utility of these penalty metric scores for objective feedback on contouring quality was assessed by using cases prepared for weekly radiation oncology trainee treatment planning rounds.


In two Delphi rounds, six radiation oncology specialists reached agreement on clinical importance/impact and organ radiosensitivity as the two primary criteria for the creation of the Critical Structure Inter-comparison of Segmentation (CriSIS) penalty functions. Linear/quadratic penalty scoring functions (for over- and under-contouring) with one of four levels of severity (none, low, moderate and high) were assigned for each of 20 OARs in order to generate a CriSIS score when new OAR contours are compared with reference/expert standards. Six cases (central nervous system, head and neck, gastrointestinal, genitourinary, gynaecological and thoracic) were then used to validate 18 OAR metrics through comparison of trainee and expert contour sets using the consensus-derived CriSIS functions. For 14 OARs, there was an improvement in CriSIS score post-educational intervention.
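
To illustrate the general idea of a voxel-based penalty score with separate over- and under-contouring terms, the following is a minimal sketch in Python. It is not the CriSIS implementation: the abstract does not give the actual equations, so the severity weights, the normalization by expert volume, and the function name `penalty_score` are all assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical weights: the abstract names the four severity levels
# but does not report the numeric values attached to them.
SEVERITY_WEIGHT = {"none": 0.0, "low": 1.0, "moderate": 2.0, "high": 3.0}


def penalty_score(trainee, expert, over_severity, under_severity, form="linear"):
    """Score discordance between a trainee and an expert binary mask.

    Assumed (illustrative) form: each discordant volume is expressed as a
    fraction of the expert volume, raised to the power 1 (linear) or
    2 (quadratic), and multiplied by a severity weight.
    """
    trainee = np.asarray(trainee, dtype=bool)
    expert = np.asarray(expert, dtype=bool)
    if trainee.shape != expert.shape:
        raise ValueError("masks must have the same shape")
    if form not in ("linear", "quadratic"):
        raise ValueError("form must be 'linear' or 'quadratic'")

    reference = np.count_nonzero(expert)
    if reference == 0:
        raise ValueError("expert mask is empty")

    # Voxels the trainee included but the expert did not (over-contouring),
    # and voxels the expert included but the trainee missed (under-contouring).
    over_fraction = np.count_nonzero(trainee & ~expert) / reference
    under_fraction = np.count_nonzero(~trainee & expert) / reference

    power = 1 if form == "linear" else 2
    return (
        SEVERITY_WEIGHT[over_severity] * over_fraction**power
        + SEVERITY_WEIGHT[under_severity] * under_fraction**power
    )


# Toy 2D example: a 4-voxel expert contour and a trainee contour
# that extends 2 voxels beyond it.
expert_mask = np.zeros((4, 4), dtype=bool)
expert_mask[1:3, 1:3] = True
trainee_mask = np.zeros((4, 4), dtype=bool)
trainee_mask[1:3, 1:4] = True

score = penalty_score(trainee_mask, expert_mask, "low", "high", form="linear")
```

Under these assumptions, an organ for which under-contouring is judged clinically serious would receive a high under-contouring severity, so a missed region is penalized more heavily than an equally sized excess region.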


The use of consensus-based contouring penalty metrics to provide quantitative information for contouring improvement is feasible.