Abstract

Context  Today’s formal medical school admission systems often include only cognitively oriented tests, although most medical school curricula emphasise both cognitive and non-cognitive factors. Situational judgement tests (SJTs) may represent an innovative approach to the formal measurement of interpersonal skills in large groups of candidates in medical school admission processes. This study examined the validity of interpersonal video-based SJTs in relation to a variety of outcome measures.

Methods  This study used a longitudinal and multiple-cohort design to examine anonymised medical school admissions and medical education data. It focused on data for the Flemish medical school admission examination between 1999 and 2002. Participants were 5444 candidates taking the medical school admission examination. Outcome measures were first-year grade point average (GPA), GPA in interpersonal communication courses, GPA in non-interpersonal courses, Bachelor’s degree GPA, Master’s degree GPA and final-year GPA (after 7 years). For students pursuing careers in general practice, additional outcome measures (9 years after sitting examinations) included supervisor ratings and the results of an interpersonal objective structured clinical examination (OSCE), a general practice knowledge test and a case-based interview.

Results  Interpersonal skills assessment carried out using SJTs had significant added value over cognitive tests for predicting interpersonal GPA throughout the curriculum, doctor performance, and performance on an OSCE and in a case-based interview. For the other outcomes, cognitive tests emerged as the better predictors. Females significantly outperformed males on the SJT (d = −0.26). The interpersonal SJT was perceived as significantly more job-related than the cognitive tests (d = 0.55).

Conclusions  Video-based SJTs as measures of procedural knowledge about interpersonal behaviour show promise as complements to cognitive examination components. The interpersonal skills training received during medical education does not negate the selection of students on the basis of interpersonal skills. Future research is needed to examine the use of SJTs in other cultures and student populations.


Introduction

In many countries, there is a striking discrepancy between medical school objectives and the admission process. Although the objectives of curricula acknowledge the importance of interpersonal skills and personal characteristics (‘soft’ skills), most formal medical school admission systems tend to primarily assess academic achievement in science domains and cognitive abilities such as verbal reasoning.1–6 Therefore, a ‘more holistic and sophisticated approach to selection – based on predictors of care that are both valid and patient-relevant – needs to be developed and applied’.7

Over the years, a variety of approaches for measuring soft skills (e.g. interviews) have been proposed and examined. However, it is often difficult to reliably and formally apply them in the large-scale and high-stakes context of student admissions. Therefore, this study investigated the use of situational judgement tests (SJTs)8 as an innovative tool for measuring interpersonal skills in large groups of candidates in the formal medical school admission process. In this study, the SJTs provided applicants with multiple video-based descriptions of doctor–patient scenarios (i.e. situations related to ‘building and maintaining relationships’ and ‘exchanging information’9–11) and asked them to indicate how they would react by choosing an option from a list of responses. Situational judgement tests were expected to be useful additions to the admission process for at least three reasons. Firstly, although candidates at the time of admission may not have experience in doctor–patient interaction from the doctor’s perspective, measurement of their interpersonal procedural knowledge (i.e. knowledge of how to act as a doctor in an interpersonal situation) through the use of video-based scenarios may provide precursory insight into their interpersonal behaviour in actual future interactions with patients, as observed and rated during internships and on the job many years into the future.12 Secondly, interpersonal skills training during medical education may build on students’ initial interpersonal procedural knowledge as measured using SJTs at the time of admission and thereby underscore the value of selecting candidates on the basis of their prior interpersonal procedural knowledge. Thirdly, video-based interpersonal SJTs may result in more favourable perceptions of the admission process because they present realistic job-related situations.

This study aimed to test these claims and to examine the validity of interpersonal video-based SJTs in predicting a variety of outcome measures using a longitudinal and multiple-cohort design. The study investigated whether SJTs predict academic performance in medical and interpersonal courses along a 7-year medical curriculum. It also examined the added value of SJTs to predict performance as a doctor (trainee) and performance on the certification examination administered 9 years later. Finally, the study scrutinised subgroup (e.g. gender) differences and candidate responses to SJTs.

Methods

Context

This study was situated in the Flemish-speaking part of Belgium, where candidates are required to pass an examination in order to gain access to medical education. In Belgium, candidates who pass the admission examination and start medical education are typically younger than in other countries as pre-medical education programmes do not exist (i.e. candidates can begin medical education directly after high school). The medical curriculum lasts for 7 years. Graduates who wish to pursue a career in general practice undertake an additional 2 years of training. Each year the admission examination is centrally administered in the capital city. Candidates who pass the examination receive a certificate that warrants entry to any university of their preference. There is no further selection by the universities.

Despite these differences, there are many similarities between the Flemish medical school admission examination and medical school admission examinations across the world. For instance, cognitive tests comprise the key part of the examination. Moreover, there have been repeated calls to complement these cognitive tests with interpersonal skills assessments.

Study population

This was a longitudinal, multiple-cohort study. Archival admission examination scores and demographic information were obtained for the entire candidate population of 5444 individuals (36.7% men, 63.3% women; average age: 18 years, 10 months; 99.5% White) who sat the examination between 1999 and 2002 (Fig. 1). This time span was used because all students had completed the full curriculum (7 years of medical education), as well as the certification examination (conducted after an additional 2 years of general practice education).

Figure 1.  Flowchart showing how the study sample was derived. SJT = situational judgement test; GPA = grade point average

Of those 5444 individuals, 2161 (39.7%) passed the examination. The proportions of candidates passing the examination across the years in the study period were virtually identical (38–40%). This is not surprising because each cohort represents the entire population of candidates participating in the examination at that time. Of these 2161 individuals, 1788 began the first year of medical school. These two sample sizes differ because some candidates (n = 373, 17.3%) who passed the examination eventually chose not to study medicine. This study used academic performance data for the 1432 students at the three largest universities in the Flemish part of Belgium as only these universities provided a full 7-year medical curriculum. Figure 1 shows how many students successfully passed each of the 7 years. Student dropout as a result of failure (especially in the first year, in which 1432 students began and 1176 successfully concluded the year [82.1%]) was the main cause of sample attrition. Of these 1432 students, 37.0% were male and 63.0% were female.
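The sample-flow percentages above can be re-derived from the reported counts. The following is a minimal arithmetic check; the variable names and the `pct` helper are illustrative and not part of the study.

```python
# Counts as reported in the text (names are illustrative).
candidates = 5444
passed = 2161
enrolled = 1788
year1_started = 1432
year1_completed = 1176


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Candidates who passed but did not start medical school.
not_enrolled = passed - enrolled

print(pct(passed, candidates))              # pass rate
print(not_enrolled, pct(not_enrolled, passed))
print(pct(year1_completed, year1_started))  # first-year completion rate
```

Run as written, this reproduces the 39.7%, 373 (17.3%) and 82.1% figures quoted in the paragraph.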

Admission examination measures

Cognitive tests

Cognitive tests included a combination of four science knowledge tests (biology, chemistry, mathematics, physics) and a general mental ability test. Each science test included 10 items with four possible answers (i.e. 40 items to be answered within a time limit of 180 minutes). The cognitive ability test consisted of 50 items (verbal, numeric or figural), each with five possible answers (to be answered within 50 minutes). Prior research confirmed the adequate reliability (0.78) and validity (0.36) of this test for medical students.13,14

Situational judgement tests The SJT was developed by collecting realistic situations that referred to two key interpersonal skills domains (‘building and maintaining relationships’ and ‘exchanging information’9) from experienced doctors. Vignettes nesting incidents in which these critical dimensions of interpersonal skills were relevant were then written. For example, the SJT included situations that involved showing consideration and interest, conveying bad news to a patient, responding to a patient’s refusal to take the prescribed medicine, and the use of appropriate language for explaining technical terms. No medical knowledge was necessary to complete the items. Questions and response options were derived using a similar approach. Next, semi-professional actors were videotaped in a recording studio. Finally, a scoring key was developed by a panel of experienced doctors. Agreement among the experts was satisfactory and discrepancies were resolved upon discussion until the scoring key was complete. The scoring key indicated which response alternative was correct (+ 1 point) for each item. In its final version, the SJT consisted of 30 multiple-choice questions with four possible answers to each. After each scene froze, candidates were given 25 seconds to answer the question (‘What is the most effective response?’) related to the scene. Extensive research attests to the reliability (0.66) and validity (0.13–0.22) of the SJTs developed.14,15 Although the SJT was a formal part of the admission system, it carried less weight (30%) than the cognitive tests (70%) in the admission decision.
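The scoring rule (+1 point per keyed response on 30 four-option items) and the 30%/70% weighting can be sketched as follows. This is an illustration under stated assumptions: the text does not say how the components were combined, so the weighted sum of standardised scores below is hypothetical, as are the function names and the answer key.

```python
def score_sjt(responses, key):
    """Count items on which the candidate chose the keyed option (+1 each)."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 for chosen, correct in zip(responses, key) if chosen == correct)


def weighted_composite(cognitive_z, sjt_z, w_cognitive=0.7, w_sjt=0.3):
    """Hypothetical composite: weighted sum of standardised component scores."""
    return w_cognitive * cognitive_z + w_sjt * sjt_z


# Made-up 30-item key with options A-D; not the real scoring key.
key = list("ABCD" * 7 + "AB")
responses = key.copy()
responses[0], responses[5], responses[10] = "D", "D", "D"  # three wrong answers

print(score_sjt(responses, key))      # number of keyed responses chosen
print(weighted_composite(1.0, -1.0))  # strong cognitive, weak SJT profile
```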

Outcome measures

A comprehensive set of outcome measures was used. Firstly, five measures of academic performance were gathered: (i) Year 1 grade point average (GPA); (ii) GPA in non-interpersonal courses; (iii) GPA in interpersonal (communication) courses; (iv) GPA at, respectively, the end of the 3-year Bachelor’s degree period and the end of the 4-year Master’s degree period, and (v) GPA across the 7 years of the curriculum. None of the universities was familiar with their students’ performance on the admission examination as this information was not sent to them.

To decide whether a course was interpersonal rather than non-interpersonal, two coders independently inspected the descriptions of courses within the medical curricula. To be coded as ‘interpersonal’, a course was required to deal with communication with patients in the context of an internship (either short- or long-term). Inter-rater agreement between the coders was adequate and discrepancies were resolved upon discussion. A composite score (interpersonal GPA) was obtained by averaging scores on interpersonal courses across years. The non-interpersonal GPA composite (made up of scores on courses without an interpersonal component) was computed in a similar way. In one university, performance on interpersonal courses was not rated. Instead, students were required to pass these courses in order to be able to take the examinations in that respective year. Therefore, the sample for which interpersonal GPA data were available is smaller (n = 607) than that for which non-interpersonal GPA data were available (n = 927).

Secondly, a supervisory rating of job performance as a doctor was included. After 7 years, 261 students (28.2%) entered a 2-year general practice training programme in which they worked under the supervision of a registered general practitioner (GP) in a practice placement (Fig. 1). While being supervised and evaluated, these trainees were fully responsible for patients. Their supervisors were unfamiliar with the students’ academic records.

Finally, students’ general practice certification examination results (n = 261) were obtained from the database of the centralised general practice programme. Students sat this certification examination 9 years after entering medical education. Data for three key examinations were retrieved: (i) an interpersonal skills assessment conducted using an objective structured clinical examination (OSCE); (ii) a knowledge test about general practice, and (iii) a case-based panel interview.

Control measure

High school GPA served as a control measure. When applying to sit the medical school admission examination, candidates were asked to provide their rank in high school (quartile position). Thus, this control measure denoted self-reported high school GPA. Prior research found high convergence between self-reported GPA and actual GPA.16 A total of 3049 (56.0%) candidates provided information on their high school GPA. This variable was reverse-scored so that higher scores denoted better school performance.

Perceptions of the admission examination

At the end of the 1999 and 2000 examinations, students anonymously completed a questionnaire in which they rated the components of the examination for face validity (‘relevance to profession’) and perceived difficulty using a 5-point scale (1 = strongly disagree, 5 = strongly agree). Extant validated scales, with adequate internal consistency reliabilities,17,18 were used. Because all data were anonymised, students’ perceptions could not be linked to their actual scores. The response rate was 61.8%.

Results

Predictive validity of interpersonal SJTs

Scores on the SJT were correlated with each of the outcome measures. For comparison purposes, scores on the cognitive tests were correlated in the same way. Table 1 presents the correlation coefficients (validity coefficients) of the SJTs and cognitive tests and shows how the SJTs and cognitive tests complement one another. The measurement of interpersonal skills using video-based SJTs at the time of admission predicted interpersonal GPA (a composite of scores on interpersonal courses throughout the curriculum). Moreover, the SJT score predicted the score on an OSCE on interpersonal and communication skills, performance on a case-based panel interview, and job performance as a doctor 9 years later. Scores on the cognitive tests were not good predictors of these outcome measures, but were significantly related to the other outcomes (Year 1 GPA, non-interpersonal GPA, Bachelor’s degree GPA, Master’s degree GPA, final-year GPA, and the certification knowledge test) that the SJT did not predict. The validity of the cognitive tests was higher than that of the SJT.

Table 1.   Uncorrected and corrected validity coefficients of cognitive tests and video-based interpersonal situational judgement tests (SJTs) for predicting outcome measures

                                                Cognitive tests           Video-based interpersonal SJTs
Outcome measure                                 r       r§      r¶        r       r§      r¶
Year 1 GPA (n = 1386)                           0.34    0.46    0.52      0.04    0.07    0.09
Medical GPA (n = 927)                           0.32    0.44    0.49      0.08*   0.11    0.13
Interpersonal GPA (n = 607)                     0.09*   0.13    0.15      0.21    0.22    0.27
Bachelor GPA (n = 1120)                         0.37*   0.49    0.55      0.07*   0.09    0.12
Master GPA (n = 927)                            0.25    0.35    0.39      0.10    0.12    0.14
Final GPA (n = 927)                             0.33    0.44    0.49      0.09    0.12    0.14
Knowledge test (general practice) (n = 261)     0.15*   0.21    0.24      0.04    0.05    0.06
OSCE (interpersonal skills) (n = 261)           0.00    −0.01   −0.01     0.12*   0.12    0.15
Doctor performance (n = 261)                    0.05    0.07    0.08      0.15*   0.15    0.19
Case panel interview (n = 261)                  0.04    0.05    0.06      0.19    0.19    0.23

* p < 0.05; † p < 0.01; ‡ p < 0.001

§ To establish the validity of the tests for the total candidate group, validity coefficients were corrected for multivariate range restriction19 on the basis of the admission examination data for the entire candidate population (N = 5444)

¶ This column reports the validity of the tests corrected for multivariate range restriction19 and for unreliability in the predictor. To this end, the reliabilities used for the cognitive test and SJT were 0.78 and 0.66, respectively14,21

GPA = grade point average; OSCE = objective structured clinical examination
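The final column of Table 1 for each test appears to follow from the range-restriction-corrected coefficient by the standard correction for unreliability in the predictor, i.e. dividing by the square root of the predictor reliability (0.78 for the cognitive tests, 0.66 for the SJT). The sketch below applies that formula to a few rounded table entries; the function name is illustrative. Because the published values were presumably computed from unrounded inputs, recomputing from the rounded entries can differ by 0.01 on some rows.

```python
from math import sqrt


def correct_for_predictor_unreliability(r, reliability):
    """Disattenuate a validity coefficient for unreliability in the predictor."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must lie in (0, 1]")
    return r / sqrt(reliability)


COGNITIVE_RELIABILITY = 0.78  # reported in the text
SJT_RELIABILITY = 0.66        # reported in the text

# Year 1 GPA, cognitive tests: range-restriction-corrected r = 0.46
print(round(correct_for_predictor_unreliability(0.46, COGNITIVE_RELIABILITY), 2))

# Interpersonal GPA, SJT: range-restriction-corrected r = 0.22
print(round(correct_for_predictor_unreliability(0.22, SJT_RELIABILITY), 2))
```

These two calls give 0.52 and 0.27, matching the corresponding entries in the table.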

To eliminate possible biases, several precautions were taken and additional analyses conducted. Firstly, in terms of prediction (admission scores), this study combined data for multiple cohorts of medical students to increase its sample size. In terms of outcome, the analyses also combined GPA (sometimes based on different courses and derived from scores given by different professors) across multiple (three) large universities. Therefore, both predictor and outcome scores were standardised to ensure that aggregated scores carried the same implications. High school GPA could not be standardised because students’ records in the various high schools were unavailable.
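Standardising scores within each cohort or university before pooling can be sketched as below. This is a minimal illustration, assuming z-scores computed with the sample standard deviation within each group; the text does not specify the exact standardisation used, and the group labels and scores are made up.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardise_within_groups(records):
    """Convert (group, score) pairs to z-scores computed within each group.

    The output preserves the input order. Every group needs at least two
    scores that are not all identical, otherwise the SD is undefined or zero.
    """
    by_group = defaultdict(list)
    for group, score in records:
        by_group[group].append(score)
    stats = {g: (mean(v), stdev(v)) for g, v in by_group.items()}
    return [(score - stats[group][0]) / stats[group][1] for group, score in records]


# Two hypothetical universities grading on different scales.
records = [("A", 10), ("A", 12), ("A", 14), ("B", 60), ("B", 70), ("B", 80)]
print(standardise_within_groups(records))
```

After this step a score of 14 at university A and a score of 80 at university B both map to the same standardised value, which is the property that makes aggregation across groups meaningful.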

Secondly, in the process of selecting students from the candidate population, the variability in test scores is reduced because only those students who are admitted pursue medical studies. In subsequent years, these students may drop out of medical school. This reduced variability (also known as restriction of range) might artificially reduce the magnitude of the correlation coefficients obtained. In order to establish the validity of the tests for the total candidate group, the validity coefficients in Table 1 were corrected for multivariate range restriction19 on the basis of admission examination data for the entire candidate population (N = 5444). Generally, correcting for range restriction had a greater effect on the validity of the cognitive tests (an increase of about 0.10; see Table 1) than on that of the SJTs because the cognitive tests carried more weight (70%) in the admission decision than the SJT (30%).
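To make the idea of a range-restriction correction concrete, the sketch below uses the simpler univariate formula for direct restriction on the predictor (often called Thorndike's Case II). It is not the multivariate correction applied in this study, and the standard deviations in the example are hypothetical, chosen only to show the direction of the effect.

```python
from math import sqrt


def correct_range_restriction(r, sd_unrestricted, sd_restricted):
    """Univariate correction for direct range restriction on the predictor.

    r               : correlation observed in the selected (restricted) group
    sd_unrestricted : predictor SD in the full candidate population
    sd_restricted   : predictor SD in the selected group
    """
    u = sd_unrestricted / sd_restricted
    return r * u / sqrt(1 - r ** 2 + (r ** 2) * (u ** 2))


# Hypothetical SDs: the admitted group is less variable than all candidates.
observed_r = 0.34
corrected = correct_range_restriction(observed_r, sd_unrestricted=1.5, sd_restricted=1.0)
print(corrected > observed_r)  # the correction raises the estimate
```

When the two standard deviations are equal the function returns the observed correlation unchanged, and the larger the ratio, the larger the upward adjustment. This matches the pattern noted above: the more heavily a test drives selection, the more its scores are restricted and the more its validity is underestimated.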

Thirdly, this study focused on four cohorts of students who had completed the full curriculum. However, as a result of dropout and choices of medical study (e.g. specialty versus general practice), the sample sizes differed for the various criteria. This raises the question of whether the same results were found in different groups. To investigate this, the present analyses were re-run for participants for whom final-year GPA (n = 927), interpersonal GPA (n = 607) and certification results (n = 261) were available. There was no substantive change in findings.

Fourthly, analyses to deal with outliers were conducted. For instance, a robust regression in which the data were analysed with weighted least squares (rather than ordinary least squares) regression was conducted. In this analysis, cases are weighted by the inverse of their leverage value. Results were very similar.
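The weighting scheme described here can be sketched for the single-predictor case. This is an illustration only: it assumes a simple regression with an intercept, for which the leverage of case i is 1/n + (x_i − x̄)²/Σ(x_j − x̄)², and it uses made-up data. The study's actual models and software are not described in the text.

```python
def leverage_values(x):
    """Leverage h_i of each case in a simple regression with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def weighted_least_squares(x, y, w):
    """Return (intercept, slope) minimising sum_i w_i * (y_i - a - b * x_i)^2."""
    sw = sum(w)
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_w = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - x_w) * (yi - y_w) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_w) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_w - slope * x_w, slope


# Made-up data; the last case lies far from the others on x (high leverage).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 15.0]

h = leverage_values(x)
weights = [1 / hi for hi in h]  # inverse-leverage weights, as described above
intercept, slope = weighted_least_squares(x, y, weights)
print(intercept, slope)
```

Cases with extreme predictor values receive small weights, so a single unusual candidate has less influence on the fitted line than under ordinary least squares.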

A fifth possible bias is that the medical GPA composite may provide a more reliable measurement of students’ capabilities than the interpersonal GPA composite because it is based on outcomes in a greater number of courses. Therefore, care should be taken when comparing the validity of the SJT for predicting interpersonal GPA with the validity of the cognitive tests for predicting medical GPA. To provide a fair comparison between the correlations obtained for these two outcome measures, the procedure described by Berry and Sackett20 was used to compute the validity of the SJT for predicting the outcome on a single interpersonal course and to compare it with the validity of the cognitive tests for predicting the outcome on a single medical course. The mean validity of the SJT for predicting a single interpersonal course outcome was 0.31 and the mean validity of the cognitive tests for predicting a single medical course outcome was 0.44. Thus, the cognitive tests remained more valid than the SJT, although the difference was less substantial than in the main analyses.

Finally, issues pertaining to retest and coaching effects were considered. If students participated in the examination more than once, this study used their entry-gaining scores. Previous studies have shown that candidates who retake SJTs score on average 0.32 standard deviation (SD) better than one-time test-takers.21 This effect size was in the same range as that associated with the cognitive tests within the examination. Similarly, coaching has been found to raise SJT scores by a maximum of 0.24 SD.22 This value is similar to that associated with coaching effects in cognitive tests.23 Thus, potential retest and coaching effects do not bias SJT results any more than they do cognitive test results.

Added value of interpersonal SJTs

Whether the video-based SJT had additional validity over the cognitive tests for each of the 10 outcome measures was examined by conducting hierarchical regressions. As noted, high school GPA was first entered as a control in this regression equation. Next, the cognitive tests were entered as the second block, followed by the SJT. The SJT had significant added value for predicting four outcomes (interpersonal GPA, OSCE performance, doctor performance, performance on the case-based panel interview), with additional portions of variance of 4.4%, 1.4%, 2.2% and 3.4%, respectively. For the other outcomes, the added value of the SJT was negligible.
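The logic of the hierarchical regression, in which each predictor is credited only with the variance it explains beyond the predictors entered before it, can be sketched as follows. This is a minimal illustration on made-up data with one predictor per block; it obtains each increment in R² by orthogonalising the new predictor against the earlier ones (Gram-Schmidt), which is algebraically equivalent to differencing the R² of nested ordinary least squares models. The function names and data are hypothetical.

```python
def centre(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def incremental_r2(y, predictors):
    """Return the gain in R^2 from each predictor, entered in the given order."""
    y_c = centre(y)
    ss_y = dot(y_c, y_c)
    basis, gains = [], []
    for x in predictors:
        e = centre(x)
        # Remove the part of x already explained by earlier predictors.
        for b in basis:
            coef = dot(e, b) / dot(b, b)
            e = [ei - coef * bi for ei, bi in zip(e, b)]
        gains.append(dot(e, y_c) ** 2 / (dot(e, e) * ss_y))
        basis.append(e)
    return gains


# Made-up scores for eight students (not study data).
high_school = [1, 2, 3, 4, 2, 3, 4, 1]
cognitive = [2, 1, 4, 3, 5, 2, 6, 3]
sjt = [3, 5, 2, 6, 4, 1, 5, 2]
outcome = [10, 14, 12, 19, 15, 8, 20, 9]

# Entry order as in the text: control first, then cognitive tests, then SJT.
print(incremental_r2(outcome, [high_school, cognitive, sjt]))
```

The last element of the returned list plays the role of the "additional portion of variance" reported for the SJT in the paragraph above.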

SJTs and gender differences

On the cognitive tests, males (n = 1996) slightly outscored females (n = 3448) in this candidate population (t(5442) = 8.97, p < 0.01, d = 0.25). The opposite result was found for the interpersonal SJT, in which females slightly outperformed males (t(5442) = −9.13, p < 0.01, d = −0.26). Therefore, complementing the cognitive tests with SJTs may lead to roughly equal proportions of male and female candidates passing the admission process (provided the tests are given equal weighting).
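The reported effect sizes are consistent with the usual conversion from an independent-samples t statistic to a standardised mean difference, d = t × √(1/n₁ + 1/n₂). The sketch below applies that conversion to the reported values; the function name is illustrative, and the text does not state which formula the authors used.

```python
from math import sqrt


def cohens_d_from_t(t, n1, n2):
    """Standardised mean difference implied by an independent-samples t statistic."""
    return t * sqrt(1 / n1 + 1 / n2)


N_MALES, N_FEMALES = 1996, 3448  # group sizes reported in the text

print(round(cohens_d_from_t(8.97, N_MALES, N_FEMALES), 2))   # cognitive tests
print(round(cohens_d_from_t(-9.13, N_MALES, N_FEMALES), 2))  # interpersonal SJT
```

Both results round to the values given in the text (0.25 and −0.26).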

SJTs and candidate perceptions

As an admission examination is a high-stakes procedure conducted in a highly visible setting, it is also important to examine the perceptions of candidates.24 In this admission examination, candidates who completed the post-examination questionnaire perceived the interpersonal SJT (mean ± SD: 3.19 ± 0.88) as having significantly more face validity than the cognitive tests (mean ± SD: 2.76 ± 0.68) (t(1470) = 20.50, p < 0.01, d = 0.55), which suggests that the SJT was seen as more related to the profession than the cognitive tests. Candidates also perceived the SJT as significantly less difficult than the cognitive tests (d = −0.98). (However, note that the actual mean ± SD score on the SJT was only 18.36 ± 3.10.)

Discussion

This study leads to the following key conclusions. Firstly, SJTs as formal admission assessments of interpersonal procedural knowledge in large volumes of candidates are useful complements to a cognitive examination component as they predict other outcome measures. That is, they predict interpersonal GPA during medical education. Accordingly, the formal inclusion of SJTs into medical school admission procedures ensures that these procedures are congruent with medical curriculum objectives that emphasise both cognitive and non-cognitive aspects. Secondly, interpersonal SJTs maintain their validity for predicting OSCE results (7 years later) and doctor and certification performance (9 years later), indicating that interpersonal skills training during medical education does not negate the value of selecting students on the basis of their interpersonal procedural knowledge in the first place. Thirdly, interpersonal SJTs are positively perceived by applicants and might increase the gender diversity of the candidates selected.

A strength of this study is that it is the first large-scale and long-term examination of the use of video-based interpersonal SJTs in medical school admissions. Prior research examined the use of SJTs primarily for advanced level medical selection (i.e. selection of GPs).25 Another strength is that the SJT was implemented in an actual admissions context. Although this means that there is range restriction (outcome measures are available only for admitted students), the validity of the SJT was determined for the full candidate group by correcting the validity coefficients for multivariate range restriction using the total candidate population (N = 5444).

Despite these strengths, some caveats are in order. Situational judgement tests are not good predictors of medical course grades. Therefore, they are not intended to replace cognitive tests. In addition, the present results were obtained in the Flemish-speaking part of Belgium in a medical school programme that takes place immediately after high school completion, which potentially limits the generalisability of the findings. However, it should be noted that the search for formal approaches to the assessment of interpersonal skills is a long-standing endeavour in many countries worldwide.

As this study was conducted in Belgium, 99.5% of the study sample was White. As a result, subgroup differences according to ethnicity could not be examined. Hence, the question of whether the use of SJTs in admission systems has the potential to increase the ethnic diversity of the selected student pool remains outstanding. This is a crucial question as widening access for various subgroups (e.g. in terms of ethnicity) to medical education has become a key consideration in evaluating approaches to medical school admission.4,26 To date, prior research on SJTs in employment and educational contexts reveals that SJTs may enable institutions to attract more ethnic minority candidates, thereby broadening the demographic make-up of the student pool, with only small decreases in GPA.27,28 However, this research needs to be replicated in a medical school admissions context. Overall, examinations of the generalisability of the present results in other student populations and cultures are encouraged before colleges use SJTs in the admission process. Within the international scientific community, such research should increase current knowledge and understanding of SJTs as formal measures of interpersonal skills in medical school admission.

In this study, the validity of interpersonal video-based SJTs for measuring interpersonal outcomes is at best moderate and thus the search for additional formal procedures to evaluate intra- or interpersonal skills should be continued. Future research might also compare the validity of different approaches to measuring such soft skills. For instance, the video-based SJT approach might be compared with the multiple mini-interview approach.29–31 A final avenue of future research might investigate how videotaped SJT scenes can be used and integrated into interpersonal skills training during the medical curriculum.

Funding:  this research was supported by the Flemish Government, Ministry of Education. The views expressed here are those of the author and do not reflect an opinion of the funder.

Conflicts of interest:  none.

Ethical approval:  this study was approved by the Human Subject Committee of the Department of Psychology, Ghent University.

References
