Implementation of virtual OSCE in health professions education: A systematic review

The Objective Structured Clinical Examination (OSCE) has been widely used in health professions education since the 1970s. The global disruption caused by the COVID‐19 pandemic restricted in‐person assessments and medical educators globally sought alternative means to assess and certify students and trainees to meet the acute demand for health‐care workers. One such solution was through virtual OSCE (vOSCE), which modified traditional in‐person OSCE using videoconference platforms. This meta‐ethnography sought to synthesise qualitative literature on candidates' and assessors' experiences of vOSCE to evaluate whether it may have a role in future assessment practices.


| INTRODUCTION
The global disruption caused by the COVID-19 pandemic has no parallel in modern times and extended across all major sectors of life. 1 It was clear from early in the crisis that it would have a profound impact on health professions education. The high-stakes nature of medical practice means that assessments have always been a crucial component of medical education, 2 and the acute demand for health-care workers meant that certification was a particularly important policy focus. 3 Educators were quick to respond to, reflect on, and evaluate the pandemic's widespread impacts. [4][5][6] They had to find alternative means to assess students in a way that would not pose a risk to them, their teachers and examiners, and any patients or simulated patient actors. Such changes in assessments would require fresh and radical thinking prompted by a disaster response mindset. [7][8][9]
First described in the 1970s, the Objective Structured Clinical Examination (OSCE) is a form of in-person practical assessment that includes structured stations with standardised candidate tasks and examiner marking schemes. 10 It has become a popular assessment in health professions education and beyond for many reasons. 11 It resonated with the dominant ideas of its time, including a preoccupation with competence and psychometry. 12 Despite some critiques, it has been ubiquitously incorporated into modern 'systems' 13 and 'programmes' 14 of assessment.
During the COVID-19 pandemic, one of the key restrictions imposed was minimising in-person engagements and encounters, and shifting to virtual communications, wherever possible. 15 A virtual OSCE (vOSCE), which applies the same approach to the OSCE but through an online videoconference platform, emerged as a temporary replacement. 16 It has been praised for minimising travel and improving performance. 17 However, clear and regular communication with students has been emphasised as critical to its implementation. 18 There have also been concerns raised about its potential to allow cheating 19 and the possibility of disadvantaging students with more challenging home circumstances. 7
The most fundamental aspect of establishing validity of OSCEs is authenticity of the content. 11,20 It has been noted, for example, that there are challenges in assessing non-technical competencies such as professionalism through OSCEs, which limits how well the test performance extrapolates to real-world performance. 21 It has also been suggested that the use of standardised encounters and patients in OSCEs is too 'artificial', causing trainees to 'pretend empathy' in order to make the grade, 22 pursuing what Bleakley 23 described as 'a compulsive focus on the medical agenda'. Much of the work that has sought to refine and improve OSCEs has therefore focussed on making them more realistic representations of true clinical practice. 24,25 This has been particularly important in relational specialties such as psychiatry 26 and acute specialties such as emergency medicine. 27 Attempts to enhance authenticity in OSCEs have shown positive results. A Swiss study showed that makeup artistry helped enhance the visual realism of simulated patients as octogenarians in a geriatrics OSCE. 28 Likewise, a Korean study showed that an OSCE station with a higher degree of authenticity better detected medical student level of patient centredness. 29 As such, the major change in format from traditional OSCE to vOSCE represents a fundamental threat to its effectiveness as an assessment tool if assessors and candidates do not find it to be authentic.
Despite an explosion of research about the impacts of COVID-19, there is a lack of coherent synthesis about lessons that can be learnt as the world emerges from the pandemic and grapples with important questions about which innovations should be retained and which should be dropped. 30,31 This study therefore took a broad view to identify and synthesise experiences of vOSCEs from candidates and assessors in health professions education.

| AIM
Given that vOSCE represents a fundamental threat to the authenticity of the OSCE, and that many schools and programmes around the world are reflecting on the extent to which it may have a role in future assessment practices, this study seeks to evaluate experiences with this assessment approach in a systematic, rigorous and interpretive manner.
The research question guiding this study is: What are candidates' and assessors' (including faculty members) experiences of virtual OSCE in health professions education?

| METHODOLOGY
Although quantitative evidence synthesis approaches such as meta-analysis have been widely used and revered, qualitative evidence synthesis approaches have also been recognised as important for advancing interpretation, as they make a "key contribution [of] deepening understanding". 32 Just like qualitative research methodologies, these exist on a continuum between objectivist and subjectivist orientations to provide broad insights in health professions education. 33 Meta-ethnography is one such form that can help to organise and synthesise findings from qualitative studies. It is a method first described by Noblit and Hare in the context of educational research and seeks to translate studies into one another. 34 […] involved checking the reference lists of eligible articles. 40 Forty-one records were identified from snowballing (Figure 1). Ten additional records were identified from manual searching. The article selection process is summarised in a flowchart (Figure 1) based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). 41 After the removal of duplicates, all 1069 identified records were screened using titles and abstracts by two reviewers (SCCC and GC).
The discrepancies in selection were discussed with a third reviewer (MAR). There were no limitations in terms of publication year. Studies excluded at this stage mostly did not use qualitative methodologies or focus on vOSCE. Full texts were obtained for 79 selected abstracts and assessed for inclusion by three reviewers (SCCC, GC and MAR).
Seventeen articles met the defined inclusion criteria and were included in the meta-ethnography. The final inclusion and exclusion criteria are detailed in Table 2. Studies were included if they described their methods as qualitative and involved the collection, analysis or interpretation of non-numerical data. 42 Studies examining any implementation of vOSCE, including for formative or summative purposes, were included. However, so that the review examined exclusively candidates' and assessors' virtual experiences, studies were excluded if they used a hybrid approach, such as having in-person candidates with remote examiners. Furthermore, studies were excluded if the examination process was asynchronous, as participants' experiences may differ significantly; for example, we excluded studies in which candidates self-recorded and uploaded videos of clinical assessments.

| Critical appraisal
There has been a debate on the value of appraisal in qualitative syntheses, with some authors opting to judge articles exclusively on their conceptual contribution. 43 […] This checklist aims to ensure that any articles with poor methodology are excluded from the synthesis. As all 17 articles scored between 55% and 95% on the CASP checklist, 44 no articles were excluded on the grounds of poor quality (<50%).
Using the criteria set out by Dixon-Woods et al., 45 […] Table 3.

| Synthesis
The 17 included studies were synthesised using a meta-ethnographic approach. Firstly, the studies were independently evaluated by four researchers (SCCC, GC, MAR, JK) to extract direct quotations from research participants, known as 'first-order constructs' by Noblit and Hare. 34 Subsequently, the researchers compiled the 'second-order constructs', which were the authors' interpretations of these quotations from the original studies' results and discussion sections. The researchers then came together to formulate their interpretations of first- and second-order constructs, known as the 'third-order constructs'. 36,63 These were developed through the 'line of argument synthesis', which involved identifying similarities and differences between the themes to develop an overall argument that accounts for the range and diversity of the 17 studies. 34 This collaborative approach challenged researchers' individual interpretations of constructs, decreased the possibility of biases and enabled a more comprehensive understanding of these experiences. 37,64

| RESULTS
A total of 13 second-order constructs were identified across the 17 articles. These are detailed in Table 4, along with the articles from which they arise, and representative first-order constructs. These second-order constructs were then synthesised by the research teams into four third-order constructs:
• strengthening confidence in a virtual environment,
• understanding scope of use as an assessment,
• refining the operational processes and
• envisioning its future role.
These third-order constructs are evaluated in turn below.

TABLE 2 Selection criteria used to guide screening of articles.

TABLE 4
Diminished rapport
• 'It was harder to understand the patient, and show empathy over a computer screen.' -Student (54)
• 'I think … you're limited in your assessment of rapport building, because it's difficult to build rapport over an online platform. And it's more difficult for an examiner to then see that body language interaction.' -Examiner (50)
Articles: 50,53,55,59,60,62
Standardisation of examination process
• 'In fact, the use of staff as actors was considered a bonus by the staff as "standardisation of interaction" was better.' -Staff (51)
• 'I suggest using real actors in the future […] I like working with the actors because they make the entire encounter feel more realistic …' -Student (53)
Articles: 47,52,60
Refining the operational processes
Improved accessibility
• 'I was able to examine from the home, students were able to sit from their place of choosing' -Examiner (54)
• 'Virtual OSCE is pretty good because it saves a lot of time like on traffic. That's one of the best parts.' -Student (54)
Impact on resources
• 'I believe that the virtual exam, as I said, is an optimization of resources. It saves time for evaluators and students, and it can be a more objective method to evaluate our knowledge' -Student (55)
• 'The cost was not greatly increased, because in most cases, we used permanent staff, although there was a significant workload increase for those. And it really did rely heavily on technology, but everyone these days seems to have their own laptop. So that seemed to be okay.

| Strengthening confidence in a virtual environment
Transitioning from an in-person to a virtual OSCE platform created anxiety and uncertainty for both students and assessors prior to the examination. Students had concerns about technology-related disruptions to their assessment experiences, such as the dependability of the assessment platform and the stability of network connectivity. 47,51,59 Similarly, assessors were worried about their technological proficiency as well as glitches that might affect students' performances and grades. 62 To address these technical and logistical concerns, both students and assessors valued the provision of additional support or training prior to the examination, such as a mock vOSCE, Q&A webinars and guidance documents. 47,62 These familiarisation approaches enabled students and assessors to learn the technical requirements for this exam and to understand the procedure for reporting incidents during the examination.
Overall, students felt the virtual environment was less intimidating and stressful both before and during the examination. Prior to the examination, they were not situated in an environment with other nervous students who were waiting for their exams and were able to have a 'peace of mind' at home. 55,58,60 During the examination, they could 'focus on [their] own thing' without being distracted by other students who were simultaneously completing OSCE stations. 47,58 Students commented that the surveillance in these remote assessments was less explicit and tangible, as examiners had 'turned off their camera', 52 and there was not 'somebody standing over you'. 48 In contrast to the 'confronting' physical OSCE environment, 55 the virtual environment enabled students to be more collected, 55 […]
For other competencies, such as data interpretation, prescribing skills and communication skills, the virtual format was considered a suitable alternative. 46,47,59 Some students felt the set-up was very similar to in-person consultations, and they were able to communicate with standardised patients effectively. 47,62 Others struggled to develop rapport because of the reduction in non-verbal communication cues over the screen. One student commented it was 'difficult to maintain eye contact and generate rapport', 50 and another found it 'difficult to be empathetic' because of a 'disconnection'. 53
Faculty members were initially concerned about the standardisation and fairness of the virtual examination process but were more confident after examiner training and station calibration sessions. 47 The additional recruitment of exam assistants to manage timing, technology and transitions was favoured by examiners. This also contributed to standardisation by reducing 'cognitive overload' experienced by examiners who would otherwise be expected to simultaneously examine candidates and manage assessment operations. 62 Some institutions recruited faculty members as simulated patients to promote consistency and calibrate patient behaviour, thereby further promoting standardisation of the virtual assessment. 47 However, this was not always welcomed, with one student commenting that having actors as standardised patients instead 'make[s] the entire encounter feel more realistic'. 60 Whilst the breakout rooms were used effectively and smoothly, students were also given extra time in case of any delay in transitioning between breakout rooms, 47,52 which 'brought respite' and 'breathing space' for examiners. 62

| Refining the operational processes
Students and faculty commented on operational difficulties during the running of vOSCE. These included the accessibility of the assessment for the parties involved, unpredictable logistical challenges, resource-related impacts, and challenges with assessment security.
Students and faculty alike found the shift to a virtual platform to be flexible, convenient, time-saving and without 'the burden of the costs associated with travel', especially when candidates were previously required to travel to or from remote sites. 52,53,55,61,62 However, the comfort and convenience offered by vOSCE introduced issues with exam security. Faculty members questioned whether appropriate invigilation and sequestering were possible through a virtual platform, with one assessor commenting that they had noticed students referring to extra resources at their sides during their assessment. 55
Accessibility in terms of an institution's ability to operate assessments virtually was also raised. Students thought the new platform was 'an optimization of resources' and 'achieve[d] the same goals' as face-to-face assessments. 50 Faculty members commented on 'new resource requirements', such as technology and additional time needed to organise this novel assessment, but found these to be an initial hurdle that once overcome was 'relatively inexpensive'. 59,62 One assessor was also not in favour of virtual assessment, commenting 'there is far more preparation for examiners … compared to face-to-face where the centres have prepped everything for the examiner to just turn up'. 62
As with all operational changes, the shift to a virtual platform was met with limitations in engagement with the platform. Several participants referred to unstable internet connections as a source of issues during the assessment phase. An examiner was concerned that they 'may have made the resident nervous' 52 as a result of such operational difficulties. A few students 37,45 mentioned incompatible computer programmes for their assessment and issues with connection, 56,58 with one student explaining the vOSCE technological demands were ultimately 'too much … for [their computer] to pull off all at once' to run their assessment effectively. 56

| Envisioning its future role
The current and future roles of vOSCE were discussed in almost all studies by both students and faculty members. There was support for the transition of OSCE to a virtual setting in the extraordinary circumstances of the COVID-19 global pandemic, in particular as it was seen to be the 'safest' 50 alternative for all parties. Although vOSCE was favoured during the pandemic, views were divided on whether it was appropriate outwith the pandemic. In-person assessments were thought to be 'essential' and 'superior' by some examiners. 55 Most comments that favoured in-person assessments considered the virtual platform to impede appropriate assessment of physical examination skills. One examiner was concerned that abandoning in-person OSCE would stop students learning physical examination skills in groups. 59 Overall, there was a collective belief that the virtual platform was 'not a perfect replacement', 46 […]
Previous studies have found, as we did in this study, that students across different disciplines and training stages have a mixed response to OSCEs. Health-care professional students are consistently stressed, nervous and anxious with traditional oral examinations. [65][66][67] As such, the findings in this study that vOSCE can be intimidating and stressful may not reflect factors unique to the virtual environment. Nonetheless, students and trainees were assured about the authenticity of OSCEs 68,69 and the extent to which they mirrored 'real-life practice'. 70 This mirrors our finding that the most positive aspect of vOSCE was its ability to simulate telehealth practice, which is growing in importance across the health-care sector.
Existing literature on OSCEs highlights important differences between 'high stakes' and 'low stakes' uses of this assessment tool, 21,71 although this comparison was not apparent in this study as the use of OSCEs in this review was generally low stakes and at a local, rather than regional or national, level.
Revisiting our conceptualisation of vOSCE as a potential threat to OSCE validity because of its divergence from real-world practice, this review reaffirms the centrality of authenticity as a fundamental tenet of OSCE validity, both through the clear focus on enhancing the realism of vOSCE itself and also through the recognition that it mirrors an important shift in professional practice in the health-care sector. A wide-ranging sociohistorical review and critique of OSCEs identified various problematic areas of disconnect between the education-assessment axis and authentic clinical practice. 72 In particular, it noted that there have been dramatic changes to the clinical context in recent decades, linked to workforce, teamwork, technologies and 'unofficial rules', which OSCE has struggled to keep up with. The rise of telehealth practice was rapid and explosive in response to the COVID-19 global health crisis. 73 We propose that the necessarily rapid rise of vOSCE in response to this crisis provided a mechanism for this change in clinical practice to be reflected unusually quickly in assessment practices. Finally, given the growing interest in and evolving nature of telemedicine, further research to understand how vOSCE can contribute to this in an authentic and valid way would also be worthwhile.
Overall, meta-ethnography is a widely used and effective synthesis method for qualitative studies. The use of systematic searches, snowballing and backward snowballing, and critical appraisal using the CASP framework to screen for poor quality studies contributed to a rigorous research approach. Although the wide range of CASP scores may indicate that the included studies vary in methodological quality, the CASP framework does not capture relevant nuances fully and is therefore not equally applicable to all studies, nor is it a definitive indicator. 45,64 Only studies in English were included, which may have limited the range of experiences to predominantly Western countries. Hybrid OSCE formats were excluded, as were studies in which the examination process was asynchronous. Such studies may have yielded valuable insights into students' and examiners' perceptions of vOSCE, and further research in this area would be warranted. […] whether its potential contribution to authentically assess telehealth competencies is fully realised.

AUTHOR CONTRIBUTIONS
All authors were involved in developing the research question and methodology. Data clinics were organised where SCCC, GC, JK and MAR made significant contributions to collecting and synthesising the results. SCCC, JK, DM and MAR jointly drafted the discussion section. All authors contributed to manuscript revision and gave final approval to this submitted paper. All authors agreed to be accountable for all aspects of the work.