A multi‐year evaluation of medical student performance on and perceptions of collaborative gross anatomy laboratory examinations

Collaborative testing and its benefits have been reported in diverse disciplines across different types of academic institutions. However, there has been minimal research conducted on collaborative assessments in medical schools, particularly in the gross anatomy laboratory. The objectives of this study were to explore the effect of collaborative anatomy laboratory examinations on student performance and to gauge student perceptions of this assessment format. This study examined five academic years of medical students' performance on a two‐stage, collaborative anatomy laboratory examination wherein each student's overall score was a weighted combination of scores from the individual and team examination. Analyses of a descriptive survey capturing students' perceptions of the assessment method were also performed. Individual examination averages increased since implementing the collaborative assessment (p < 0.001), and team examination averages were higher than individual examination averages (p < 0.001). Teams outperformed each of their team members 98% of the time. Teams had a greater than 0.90 incidence of answering a question correctly if more than one person in the group got the answer correct on the individual portion, and a 0.66 incidence of answering correctly if only one person in their group answered correctly on the individual portion. Student feedback identified the discussions and learning that took place during the team portion to be a beneficial feature of this assessment format. Students also reported that this collaborative assessment made them feel a higher level of responsibility to perform well, and that it improved their understanding of gross anatomy.


KEYWORDS
assessment methodology, collaborative assessments, collaborative testing, gross anatomy education, laboratory examinations, medical education, practical examinations

INTRODUCTION
In the gross anatomy laboratory, students actively learn in a group setting where student-led acquisition of knowledge is encouraged, and learning from peers is noted as beneficial and favorable (Böckers et al., 2010; Huitt et al., 2015; Laakkonen & Muukkonen, 2019; Singh et al., 2019). Despite the tradition of students working and learning in teams during gross anatomy laboratory sessions, the typical assessment approach has almost always been through an examination that students take individually.
Educational scholars would suggest revisiting the type of assessment used in the gross anatomy laboratory to better match the style of instruction (Dochy, 2001).
One such assessment format is collaborative testing, which allows students to work in small groups to take an examination as a team, using the assessment itself as a learning experience (Sainsbury & Walker, 2008; LoGiudice et al., 2015; Efu, 2019).
These studies showed that collaborative testing increased student performance and enhanced student learning. However, collaborative testing with teams larger than two people showed an increase in the likelihood of social loafing, the concept that people exert less effort when working in a group than when working alone (Kapitanoff, 2009; Pandey & Kapitanoff, 2011). Social loafing has even been seen in collaborative testing in a virtual setting (Blaskovich, 2008; Robert, 2020).
In an anatomy laboratory setting, there has been one published study about the use of collaborative laboratory practical tests at a Health Sciences Institute that caters to various undergraduate majors, most of which are described as general cohorts for which anatomy is not a core discipline; however, the study was limited by the lack of inclusion of a survey or questionnaire to obtain student feedback (Green et al., 2016).
In medical education, a recent study reported that collaborative testing improved performance on examinations and long-term knowledge retention; however, student feedback suggested some concerns about the group testing format, indicating additional studies are needed to investigate this further (Eastwood et al., 2020).
Since the opening of a partnership campus between Augusta University and the University of Georgia in 2010, gross anatomy laboratory examinations have had noticeably lower performance than other assessments in the curriculum, and in the academic years between 2013 and 2016, the laboratory examination averages decreased each year (Figure S1). An alternative assessment approach was considered to increase student performance on laboratory examinations without compromising the rigor of the examination itself. Given the collaborative nature of the learning environment in the laboratory, an assessment style matching this type of instruction was desired. The principles of team-based learning through the use of individual and group readiness assurance tests (iRAT and gRAT, respectively) in gross anatomy courses and laboratories have been shown to increase examination scores and improve students' understanding of the material, and were overall strongly supported by both students and faculty (Nieder et al., 2005; Vasan & DeFouw, 2005; Vasan et al., 2008, 2009, 2011; Huitt et al., 2015; Vijayalakshmi et al., 2016). They therefore served as inspiration for how a collaborative assessment in a medical school gross anatomy laboratory could be implemented. This was a timely study given that a recent report suggested that team laboratory assessments (i.e., "TAILS") would be a viable approach at medical schools as it modeled what the students would be doing in the clinic and facilitated collaborative learning (Barremkala et al., 2019).
The goals of this study were to implement a two-stage, collaborative anatomy laboratory examination, to determine whether it increased student performance, and to understand how students perceive this type of assessment.

Ethics statement
Ethical approval for this research project was granted by the Institutional Review Board (IRB) at the University of Georgia. The IRB reference number is 00004068.

Study participants
Participants in the study were first-year medical students at the students) were also obtained and used to compare examination scores before implementing the collaborative assessment. There was little difference in matriculation data between the cohorts of students during the study period (Table 1).

Gross anatomy laboratory structure
Gross anatomy laboratory sessions took place throughout the entire 37 weeks of the first year in an integrated, systems-based undergraduate medical education curriculum that is divided into five sequential modules. The curriculum and instructional approaches did not change significantly throughout the study period. Each student was assigned to a weekly three-hour laboratory period with mandatory attendance. Students within the period were randomly placed into a four-person dissection team, with the only consideration for team constitution being attempted gender parity within each team.
Depending on class size and withdrawals, there were sometimes three students per team and one instance in which there were five students in a team. Team composition remained the same throughout the year and each team of students dissected one of the 6-7 cadavers in the laboratory all year (two teams worked on each cadaver, one from each of the three-hour laboratory periods). Students took a laboratory examination at the end of each module; however, due to the quantity of anatomy content in the Musculoskeletal module, there was an additional laboratory examination at the halfway point, yielding six laboratory examinations per academic year. Students received a checklist of structures they were responsible for knowing for each examination. These checklists remained the same from year-to-year. While students must earn an overall score of 70% to pass a module, there was no minimum score requirement on the anatomy laboratory examinations.

Gross anatomy laboratory examination format
Each laboratory examination consisted of 25 stations containing two questions each, for a total of 50 questions. All questions were fill-in-the-blank. There were 12-14 cadaver-based stations and 11-13 bone-, organ-, or radiograph-based stations per examination.
The first question at each station always asked students to identify a tagged, pinned, or labeled structure. The second question at the station could be another identification question, or it could ask a subsequent question about the indicated structure (i.e., action, innervation, blood supply, venous drainage, embryonic origin, nerve roots, vertebral level, function). Throughout the study period, each laboratory examination contained 55%-70% identification questions (Bloom level 1) and 30%-45% comprehension or application questions (Bloom levels 2 and 3). The laboratory examinations each year consisted of 90% established questions and 10% new questions.

Administration of the collaborative anatomy laboratory examinations
Students first took the laboratory examination individually, sequentially rotating through the 25 stations. They spent 90 s at each station plus 5 min at the conclusion to revisit any station(s) prior to handing in their answer sheet. After completing the examination individually, students then grouped into their dissection teams and retook the examination collaboratively. For this team portion, they rotated through the same stations, but completed two stations (four questions) per 90-s timeframe. Due to the odd number of stations, once during each team examination, the teams completed three stations (six questions) in 90 s. When possible, teams were spaced out within the laboratory to limit interactions with other teams. The teams did not get to revisit any stations at the end. Each team submitted one answer sheet.

TABLE 1 Matriculation data by academic year for the study period (2016-2021) and the 3 years prior to implementation of the team-based examination format (2013-2015)

Question-by-question analyses of individual versus team performance
Since no partial credit was given, all answers on the deidentified individual and team examinations were coded with either a 1 or a 0 to indicate correct or incorrect, respectively (N = 15,700 questions).
Each individual's performance per question was then compared to their team's performance on the same question, resulting in 12 possible scenarios of correctness/incorrectness (an example scenario: three team members answered a particular question correctly and one member answered incorrectly as they took the examination individually, and the team subsequently answered the same question correctly during the team portion). The incidence and occurrence frequency of each of the 12 possible scenarios were counted. The scenarios were based on teams of four students, but the values in each scenario have been corrected to account for teams of three or five students.
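The question-by-question comparison described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: it groups questions only by how many team members answered correctly on the individual portion, and it does not reproduce the 12 scenario labels or the correction for teams of three or five students. The function name `team_incidence` and the example records are hypothetical.

```python
from collections import defaultdict

def team_incidence(records):
    """Summarize team correctness by number of individually correct members.

    records: iterable of (individual_answers, team_answer), where
    individual_answers is a list of 0/1 codes (one per team member) and
    team_answer is the team's 0/1 code for the same question.
    Returns {n_correct: (incidence of a correct team answer, n_observations)}.
    """
    totals = defaultdict(int)
    team_correct = defaultdict(int)
    for individual_answers, team_answer in records:
        n_correct = sum(individual_answers)
        totals[n_correct] += 1
        team_correct[n_correct] += team_answer
    return {k: (team_correct[k] / totals[k], totals[k]) for k in sorted(totals)}

# Made-up records for illustration only (not study data).
records = [
    ([1, 1, 1, 0], 1),  # three members correct, team correct
    ([1, 0, 0, 0], 1),  # one member correct, team correct
    ([1, 0, 0, 0], 0),  # one member correct, team incorrect
    ([0, 1, 0, 0], 1),  # one member correct, team correct
    ([0, 0, 0, 0], 0),  # no member correct, team incorrect
]
result = team_incidence(records)
for n_correct, (incidence, n) in result.items():
    print(f"{n_correct} correct individually: incidence {incidence:.2f} (n = {n})")
```

Here, "incidence" is the proportion of questions in a group that the team answered correctly, which is how the term is used in this sketch; the exact definition behind the values in Table 2 may differ.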

Survey administration and data analysis
Student perception of this collaborative assessment method was evaluated using a descriptive survey (Online Appendix 1) consisting of two free-response questions, and seven statements with which students rated their level of agreement on a four-point Likert scale, where 1 = strongly disagree, 2 = disagree, 3 = agree, and 4 = strongly agree. The survey was anonymous and optional, and no incentive was given for participation. In  (73.5 ± 13.3; after implementation). In AY 2016-2021, the team average (90.8 ± 6.1) was 17.3 points higher than the individual average, resulting in an overall average (80.0 ± 9.5) that was 6.5 points higher than the individual average.

Student examination performance
Additionally, teams have achieved a perfect score 14 times, and three individuals have achieved a perfect score since the onset of the collaborative assessment, compared to none in the years prior. On only 24 out of 1205 occasions (2.0% occurrence) was a student's individual score higher than their team's score, resulting in an overall score lower than their individual score.
Students who scored below 60% on their individual examinations had an average score increase of more than 12 points with the inclusion of a collaborative assessment, with lower deciles seeing larger increases. Students in the upper three deciles had an average score increase of 6 points or less (Figure 2). There was no significant difference in performance on the collaborative laboratory assessment between male and female students; both groups saw equal improvements in their scores with the inclusion of a collaborative assessment (Table S1).

FIGURE 2 Average difference ± standard deviation (SD) in points between students' individual and overall laboratory examination scores (N = 1205) by decile.
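The comparison by score group summarized in Figure 2 can be sketched as follows. This is an illustrative sketch only: it assumes that "decile" refers to 10-point bands of the individual score (a rank-based split into ten equal groups would need different grouping logic), and the function name `gain_by_band` and the score pairs are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def gain_by_band(scores):
    """Average (overall - individual) score difference per 10-point band.

    scores: iterable of (individual_score, overall_score) on a 0-100 scale.
    Bands are keyed by their lower bound (0, 10, ..., 90); a score of 100
    is placed in the 90 band. The banding rule is an assumption.
    """
    bands = defaultdict(list)
    for individual, overall in scores:
        band = min(int(individual // 10), 9) * 10
        bands[band].append(overall - individual)
    return {band: mean(gains) for band, gains in sorted(bands.items())}

# Made-up (individual, overall) score pairs, not study data.
scores = [(45, 62), (48, 63), (72, 78), (95, 96), (100, 99)]
gains = gain_by_band(scores)
for band, gain in gains.items():
    print(f"{band}-{band + 9 if band < 90 else 100}: mean change {gain:+.1f} points")
```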

Student feedback via survey
When prompted to provide two features of the collaborative laboratory examination that they found beneficial to their learning, more than half of respondents' answers were coded as learning as a team/discussion. For example, one participant's response said, "the ability to combine knowledge for better learning." The other five codes that appeared at a lower frequency in the responses included  Respondents also reported wanting to have collaborative assessments for other classes (3.10 ± 0.32). Overall, respondents reported that they benefited from the collaborative laboratory examination (3.67 ± 0.46).
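As an illustration of how a mean ± SD on the four-point Likert scale can be computed from response counts, the following minimal sketch weights each rating by the number of respondents who chose it. The use of the population (divide-by-N) standard deviation is an assumption, since the exact formula used for the reported values is not stated here, and the counts are made up.

```python
from math import sqrt

def likert_summary(counts):
    """Weighted mean and SD of Likert ratings.

    counts: {rating: number of respondents choosing that rating}.
    Uses the population SD (divides by N); this choice is an assumption.
    """
    n = sum(counts.values())
    mean = sum(rating * c for rating, c in counts.items()) / n
    variance = sum(c * (rating - mean) ** 2 for rating, c in counts.items()) / n
    return mean, sqrt(variance)

# Hypothetical counts for one statement (1 = strongly disagree ... 4 = strongly agree).
counts = {1: 1, 2: 4, 3: 25, 4: 30}
m, sd = likert_summary(counts)
print(f"{m:.2f} ± {sd:.2f}")
```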

DISCUSSION
The findings in this study show that collaborative assessments in a medical school gross anatomy laboratory setting have distinct, yet somewhat overlapping impacts on teams and individual students.

TABLE 2 Scenarios detailing the correctness of students' individual answers to a question and the subsequent correctness of their team's answer to the same question

[Table body not recovered. Column headings: Correctness of team's answer to question; Incidence of team's answer (b); Occurrence of scenario, n (%) (c). Row group: Scenarios in which team answered question correctly (✓).]

Note: Each scenario describes how the members of a team each answered an examination question (✓, correctly; X, incorrectly) while taking the individual portion of the examination, and how their team subsequently answered the same question on the team portion of the examination.
(a) All members ✓, all team members correctly answered the question during the individual portion of the examination; X, only one team member answered incorrectly during individual portion (all other team members answered correctly); XX, two team members answered incorrectly; XXX, three team members answered incorrectly; ✓, only one team member answered correctly; All members X, all team members answered incorrectly. While most teams consisted of four students, due to variations in class size, some teams had only three students, so, for example, scenarios 4 and 5 are not necessarily the same.
(b) Whether a team answered a question correctly (✓) or incorrectly (X) given the scenario's unique combination of team members' individual answers to the question.
(c) n, number of times the scenario was observed to occur; %, number of times scenario was observed out of the total number of questions analyzed, represented as a percentage [= 100 * (n/15,700)].

Team dynamics with collaborative anatomy laboratory examinations
One of the most important considerations on a collaborative assessment is the composition and proficiency of a student team. Previous collaborative learning studies reported no statistically significant differences in individual or team performance on novel tasks based upon whether teams were formed randomly, by student preference, or by the instructor (Haberyan & Barnett, 2010; Pociask et al., 2017).
Similarly, there was no difference in performance between newly constructed or previously established teams (Levine et al., 2018).
Given those findings, the method of team creation used in this study is not a notable concern.
Some reports in the literature have suggested that team performance simply improves over time (Nowak & Miller, 1996). The study described here showed an increase in the average team scores between the laboratory examinations in Modules 1 and 2, but the average team scores then fluctuated for the remaining examinations (Figure S2). This pattern suggested that there could be an early improvement in team dynamics, efficiency, and communication, but factors such as module content could affect team performance. This is supported in the literature by an undergraduate science course that reported the effect of course content on team performance in collaborative testing (Siegel et al., 2015). It should be noted that Module 3 and Module 5 corresponded to the end of the Fall and Spring semesters, respectively, so fatigue may also have played a role in team performance. In summary, the team-making process was unlikely to negatively affect team performance and, in this study, team performance did not improve based solely on team dynamics as the academic year progressed.

Team performance on collaborative anatomy laboratory examinations
One of the most compelling features of a collaborative assessment is the increase in performance on a given assessment when it is taken collaboratively. Previous studies in business education (Nowak & Miller, 1996), nursing education (Rivaz et al., 2015), physiology education (Rao et al., 2002; Cortright et al., 2003; Rathner & Byrne, 2014), and anatomy education have shown that collaborative assessments increased student performance (Nieder et al., 2005; Vasan et al., 2008; Huitt et al., 2015; Green et al., 2016; Vijayalakshmi et al., 2016). This study's finding of increased performance on collaborative anatomy laboratory examinations (Figure 1) is consistent with these reports and with those describing how collaborative testing increases performance on fill-in-the-blank questions (Rao et al., 2002; Newton et al., 2019).
The occurrence of teams making perfect scores in this study was also seen with collaborative testing in other disciplines at the undergraduate level (Bloom, 2009).
This study's most compelling and novel finding is in the question-by-question analyses of individual and team performance (Table 2).
Even if only one student on the team answered correctly on the individual component, the incidence of their team answering correctly was 0.66 (Table 2, Scenario 5), highlighting the benefit of a collaborative laboratory examination. This finding is also consistent with results from a study conducted in a veterinary physiology course that reported that the student who answered a question correctly was much more likely to convince their partner to change an incorrect answer to a correct answer (Giuliodori et al., 2009). This same study also showed that a pair of students who both answered incorrectly changed their response to the correct answer 12% of the time (Giuliodori et al., 2009). This is consistent with the finding from the current study that even if everyone in the team answered incorrectly as individuals, they answered correctly as a team with a 0.15 incidence (Table 2, Scenario 6). In summary, the findings in this study suggested that collaboration between students greatly increased performance on the team portion of the anatomy laboratory examinations.

FIGURE 3 Student level of agreement with seven statements about the collaborative laboratory examination reported as weighted mean responses ± weighted standard deviation (±SD). Number of participants (N = 68); responses are reported on a four-point Likert scale corresponding to: 1 = strongly disagree, 2 = disagree, 3 = agree, 4 = strongly agree; * = one student responded both agree (3) and disagree (2) and left a note stating "it depended on which practical it was," so these responses were excluded from calculation.

Impact of collaborative anatomy laboratory examinations on students
An unanticipated effect of collaborative exams was their impact not just on team performance, but on the performance of the individuals in those teams. Even though the examination questions have not appreciably changed since switching to a collaborative laboratory assessment format, the average individual scores have increased significantly compared to the average individual scores prior to implementation of a collaborative laboratory examination (Figure 1).
This and other studies on collaborative assessments have reported high levels of student responsibility and accountability when taking assessments as a team, which might explain much or all of the increase in individual performance on the collaborative laboratory examination (Meseke et al., 2010; Vasan et al., 2011).
Previous studies have shown that both low- and high-performing students' scores increased with collaborative assessments, and rarely did someone score lower on the team component than the individual component of the examination (Giuliodori et al., 2008; Kapitanoff, 2009; Siegel et al., 2015). Consistent with these findings, the results from the present study showed that lower performing students had larger increases in their score, anywhere between 12 and 24 points, after inclusion of the collaborative assessment, and students who performed in the upper three deciles still benefited, just to a lesser extent (Figure 2). Furthermore, there was only a 2% occurrence of a student's overall score being lowered due to their individual score being higher than their team's score. In summary, the findings in this study suggest that collaborative assessments benefit nearly all students, irrespective of gender or performance level.

Student-perceived benefits of collaborative anatomy laboratory examinations
In addition to increased student performance, there are other positive effects that collaborative assessments have on students.
Previous studies have described the effect of collaborative assessments on an increased student-reported understanding of content (Hickey, 2006; Kapitanoff, 2009). Consistent with those findings, participants in this study reported that the group component during the laboratory examination improved their understanding of anatomy (Figure 3). There have been reports of students benefiting from the chance to learn from others and discuss the answers together (Kapitanoff, 2009; Meseke et al., 2010). The opportunity to get immediate feedback from their peers and come to a decision promptly were other important features of collaborative assessments that have been described previously (Böckers et al., 2010; Meseke et al., 2010; Barremkala et al., 2019). Anxiety reduction was another known benefit of collaborative assessments (Zimbardo et al., 2003; Böckers et al., 2010; Pandey & Kapitanoff, 2011).
Furthermore, students report using collaborative testing outside of traditional assessment settings, such as when studying (Wissman & Rawson, 2016). The student responses in this study supported all of those findings as well (Table S2). The literature also supports the notion that most students positively perceive collaborative assessments despite their grades (Hickey, 2006; Vasan et al., 2009), and the students in this study overwhelmingly reported this to be true (Figure 3). In summary, the student-perceived advantages of a collaborative assessment are clear and consistent among this and other studies.

Future directions
Future qualitative studies could also explore student opinions and perceptions of interpersonal relationships and team dynamics in the setting of collaborative anatomy laboratory assessments. Similarly, efforts could be made to redesign the collaborative laboratory examinations to better decrease social loafing and increase student accountability (Revere et al., 2008; Cooper, 2017). This could include collecting peer evaluations that would count for a portion of a student's grade.
There are conflicting reports of the effects of collaborative assessments on short- and long-term content retention, wherein some disciplines report no change in long-term retention (Woody et al., 2008; Meseke et al., 2010; Leight et al., 2012; Green et al., 2016; LoGiudice et al., 2021), while others report improved long-term retention of subject matter (Cortright et al., 2003; Rivaz et al., 2015; Vazquez-Garcia, 2018; Eastwood et al., 2020). Given that there are few data on this topic in the context of medical school anatomy laboratories, a future study could involve evaluating performance on gross anatomy and embryology questions on the Comprehensive Basic Science Exam practice examination offered by the National Board of Medical Examiners before and after the implementation of the collaborative laboratory examination.

Limitations of the study
Implementation time for both students and faculty is a substantial consideration and potentially limiting factor for instituting a collaborative laboratory examination. The examination format used in this study nearly doubled the time required for both students and faculty to complete the examination. The time commitment would be exacerbated in instances where student testing accommodations afford additional time to complete an assessment. For institutions with large class sizes, this time commitment may disallow a two-stage, collaborative assessment format. Finally, evaluation of long-term content retention via a cumulative anatomy laboratory examination taken at a later date was not possible given the institutional curricular design.

CONCLUSIONS
This was the first study to examine collaborative laboratory assessments in a high-stakes medical school gross anatomy laboratory and analyze question-by-question performance. The data showed that collaborative laboratory examinations significantly increased student performance; individual students scored higher on average than prior to starting the collaborative assessment format, and teams scored higher than any of their individual members 98% of the time. The incidence of teams answering a question correctly was considerably higher than their incidence of answering a question incorrectly regardless of individual team member performance, and the discussions and learning that took place during the team portion were the most highly ranked benefits of this assessment format. Student feedback also suggested that the collaborative assessment made them feel accountable and responsible for performing well for themselves and for their team. In conclusion, the findings reported in this study show that collaborative assessments in a medical school gross anatomy laboratory, both in terms of increased student performance and student-reported benefits to their understanding of content and the opportunity to work together to discuss and better learn the material, are a worthwhile avenue for anatomy educators to consider implementing at other institutions.