Development of an assessment strategy in preclinical fixed prosthodontics course using virtual assessment software—Part 2

Abstract The purpose of this study was to evaluate interrater agreement between faculty and virtual assessments of preparations for complete coverage restorations in preclinical fixed prosthodontics. Teeth prepared during preclinical fixed prosthodontics practical exams at the University at Buffalo School of Dental Medicine were used in this study. Teeth were prepared for fabrication of complete cast, metal ceramic, and all ceramic crowns. The specimens were digitized using an intraoral scanner. Then, they were virtually superimposed on the corresponding standard preparations using Compare software. The software was used to quantify comparison percentages, average finish line widths, and average axial wall heights. Two calibrated faculty members assessed preparations for occlusal/incisal reduction, finish line location, axial wall height, and finish line width using traditional assessment forms. Cohen's kappa coefficient was used to measure interrater agreement between faculty and virtual assessments. Kappa interrater agreement scores ranged between 0.83 and 0.88 for virtually assessed comparison percentages and sums of faculty‐assessed occlusal/incisal reduction and finish line location. Kappa interrater agreement score ranges were 0.64–0.94 and 0.74–0.89 for comparisons of virtual and faculty assessments for axial wall height and finish line width, respectively. Virtual assessments are similar to faculty assessments for occlusal/incisal reduction, finish line location, axial wall height, and finish line width in fixed prosthodontics and can be used as equivalent evaluations of student performance for these criteria.

the development of contemporary manikins and stations. This newer setting is believed to enhance the learning experience of students prior to applying techniques in the patient care environment (Perry, Bridges, & Burrow, 2015). In addition, over the last decade, advances in technology have facilitated the incorporation of virtual reality and three-dimensional haptic systems into medical and dental training (Bongers, van Hove, Stassen, Dankelman, & Schreuder, 2015; Jasinevicius, Landers, Nelson, & Urbankova, 2004; Larsen, Oestergaard, Ottesen, & Soerensen, 2012).
However, the validity and value of virtual reality-based education in dentistry has not yet been fully assessed (Buchanan, 2004).
In addition to the importance of the preclinical simulation environment in dental education, accurate and consistent feedback from faculty is a critical aspect of the educational experience. It is crucial that students receive consistent feedback so that they can use the assessment to improve their performance. However, variations in grading scales, faculty calibration, and subjective faculty assessment can diminish the consistency and value of feedback (Feil & Gatti, 1993).
In order to promote more reliable and accurate faculty assessment, the Commission on Dental Accreditation mandates incorporation of assessment forms and faculty calibration for U.S. dental schools (American Dental Association, 2006). However, despite these improvements, multiple studies have shown that faculty interrater and intrarater assessments are not consistent when evaluating dental student performance (Lilley, Bruggen Cate, Holloway, Holt, & Start, 1968; Fuller, 1972; Salvendy, Hinton, Ferguson, & Cunningham, 1973; Sharaf, AbdelAziz, & El Meligy, 2007).
Virtual assessment software has been proposed as a mechanism to remove faculty-based subjective error from dental student assessments by providing an objective means of evaluation (Schiff, Salvendy, Root, Ferguson, & Cunningham, 1975; Renne et al., 2013). In support of this idea, calculation of a comparison percentage (Comparison%) by virtual assessment software was shown to increase the objectivity and reliability of student assessment in the simulated laboratory setting (Renne et al., 2013). However, Comparison% does not take into consideration the principles of tooth preparation, such as axial wall height (AWH) and total occlusal convergence, when evaluating student performance (Renne et al., 2013). In addition, the validity of using Comparison% to assess preparations for complete coverage restorations has been questioned (Callan, Haywood, Cooper, Furness, & Looney, 2015).
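Conceptually, Comparison% can be pictured as a point-wise tolerance test between the digitized student preparation and a reference preparation. The sketch below is a minimal illustration of that idea only, not the proprietary algorithm of the Compare software; the deviation values and the `comparison_percent` helper are hypothetical.

```python
# Illustrative sketch (not the Compare software's actual algorithm):
# Comparison% as the fraction of scanned surface points whose deviation
# from the reference (standard) preparation falls within a tolerance,
# e.g., 400 um as used in this study.

def comparison_percent(deviations_um, tolerance_um=400.0):
    """Percentage of points within +/- tolerance of the reference."""
    within = sum(1 for d in deviations_um if abs(d) <= tolerance_um)
    return 100.0 * within / len(deviations_um)

# Hypothetical point-wise deviations (micrometers) between a student
# preparation and the standard, as a mesh-comparison tool might report.
devs = [120.0, -350.0, 410.0, 90.0, -500.0, 200.0, 380.0, -60.0]
print(comparison_percent(devs))  # 6 of 8 points within 400 um -> 75.0
```

In practice the software evaluates many thousands of surface points per preparation, but the pass/fail-within-tolerance logic is the same.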
In Part 1 of this study, rubrics were developed for evaluating preparations for complete coverage restorations in the preclinical fixed prosthodontics course. For the virtual quantitative assessment, students utilize Compare software (E4D Technologies, Richardson, TX, USA) to assess their preparations against standard tooth preparations for average AWH, average finish line width (FLW), occlusal/incisal reduction (O/IR), and finish line location (FLL).
Presently, there is no consensus regarding the correlation of virtual quantitative assessments with evaluations from highly trained professionals in the field. Careful evaluation of these correlations is needed to universally establish computerized evaluation as a viable educational tool. The purpose of this study was to verify the virtual assessment rubrics developed in Part 1 of this study. We aimed to evaluate the level of concordance between faculty and virtual assessments for O/IR, FLL, AWH, and FLW in fixed prosthodontics. Teeth prepared during practical exams for complete cast, metal ceramic, and all ceramic crowns were collected (Table 1). Then, standard preparations were recorded using an intraoral scanner.

| Assessment techniques
For the purpose of this study, one operator digitized collected teeth from the above-mentioned practical exams using an intraoral scanner. In addition to virtual assessments, two independent and calibrated faculty members quantified the amount of O/IR, the FLL, AWH, and FLW using traditional assessment forms. The faculty members were not aware of the results of the virtual assessment.
Then, each preparation was scored as E, S, or N for the stated criteria. For discordant scores, the faculty members reviewed the preparations following the traditional rubrics until reaching a unified decision. O/IR was quantified using a reduction guide and a periodontal probe. Reduction guides were fabricated on corresponding unprepared teeth using polyvinyl siloxane (Virtual XD, Ivoclar Vivadent, Amherst, NY, USA) and sectioned vertically into one or three slices. Molar reduction guides were sectioned in three locations: at the distolingual cusp tip, the lingual groove, and the mesiolingual cusp tip. Premolar and anterior reduction guides were vertically sectioned at the cusp tip and the mid-incisal edge, respectively. The amount of O/IR was then measured at each slice using a periodontal probe. FLL and AWH were also assessed using a periodontal probe. FLW was quantified using the corresponding bur for the finish line design and a periodontal probe.

| Statistical analysis
Cohen's kappa coefficient (Viera & Garrett, 2005) was used to measure interrater agreement between faculty and virtual assessments for each preparation design (ACC = all ceramic crown; CCC = complete cast crown; FLW = finish line width; MCC = metal ceramic crown).

Virtual assessment software applies the same criteria to every specimen and therefore returns reproducible scores (Renne et al., 2013). In contrast, when a calibrated faculty member evaluates the same work on separate occasions, they may assign different scores each time (Lilley et al., 1968; Fuller, 1972; Salvendy et al., 1973). Furthermore, interproximal axial walls are less accessible to direct measurement than facial and lingual walls (Goodacre, Campagni, & Aquilino, 2001). As a result, faculty assessments may focus primarily on the facial and lingual AWH, causing them to ignore or miss measurement of the interproximal AWH.
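Cohen's kappa corrects the observed agreement between two raters for the agreement expected by chance from each rater's marginal score frequencies. The snippet below sketches the standard formula; the E/S/N scores shown are hypothetical illustrations, not data from this study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same specimens."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed proportion of agreement
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical E/S/N scores for 10 preparations (faculty vs. virtual)
faculty = ["E", "S", "S", "N", "E", "S", "E", "N", "S", "E"]
virtual = ["E", "S", "E", "N", "E", "S", "E", "N", "S", "S"]
print(cohens_kappa(faculty, virtual))
```

Here the raters agree on 8 of 10 specimens (p_o = 0.80) while chance alone would predict p_e = 0.36, giving kappa ≈ 0.69, i.e., substantial agreement.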

| CONCLUSIONS
Within the limitations of this study, the following conclusions can be drawn:

1. Virtual assessment of the Comparison% at a tolerance of 400 μm can be used to evaluate O/IR and FLL.
2. Interrater agreement between the Comparison% at a 400-μm tolerance and the sum of faculty-assessed O/IR and FLL is almost perfect (kappa > 0.81) for all preparation designs. However, virtual assessment may be associated with slight inflation in grading.
3. Interrater agreement between virtual and faculty assessment of FLW was almost perfect or substantial (kappa ≥ 0.61) for all preparation designs. However, virtual assessment of FLW may be associated with grade inflation.
4. Interrater agreement between virtual and faculty assessment of AWH was almost perfect or substantial (kappa ≥ 0.61) for all preparation designs. However, virtual assessment of AWH was associated with a lower grade in 1.8% of student preparations.
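The descriptive bands used in the conclusions follow the interpretation scale reported by Viera and Garrett (2005): kappa values of 0.61–0.80 indicate substantial agreement and values above 0.80 almost perfect agreement. The small helper below (hypothetical, for illustration only) makes that mapping explicit:

```python
def kappa_label(kappa):
    """Map a kappa value to the Viera & Garrett (2005) descriptive band."""
    if kappa < 0:
        return "less than chance"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"

print(kappa_label(0.83))  # almost perfect (Comparison% vs. O/IR + FLL)
print(kappa_label(0.64))  # substantial (lower end of AWH agreement)
```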