Abstract


Medical Education 2011: 45: 1190–1198

Objectives  Workplace-based assessment (WPBA) is an increasingly important part of postgraduate medical training and its results may be used as evidence of professional competence. This study evaluates the ability of WPBA to distinguish UK Foundation Programme (FP) doctors with training difficulties and its effectiveness as a surrogate marker for deficiencies in professional competence.

Methods  We conducted a retrospective observational study using anonymised records for 1646 trainees in a single UK postgraduate deanery. Data for WPBAs conducted from August 2005 to April 2009 were extracted from the e-portfolio database. These data included all scores submitted by trainees in FP years 1 and 2 on mini-clinical evaluation exercise (mini-CEX), case-based discussion (CbD), direct observation of procedural skills (DOPS) and mini-peer assessment tool (mini-PAT) assessments. Records of trainees in difficulty, as identified by their educational supervisors, were tagged as index cases. Main outcome measures were odds ratios (ORs) for associations between mean WPBA scores and training difficulties. Further analyses by the reported aetiology of the training difficulty (health-, conduct- or performance-related) were performed.

Results  Of the 1646 trainees, 92 had been identified as being in difficulty. Mean CbD and mini-CEX scores were lower for trainees in difficulty and an association was found between identified training difficulties and average scores on the mini-CEX (OR = 0.54; p = 0.034) and CbD (OR = 0.39; p = 0.002). A receiver operator characteristic curve analysis of mean WPBA scores for diagnosing ‘in difficulty’ status yielded an area under the curve of 0.64, indicating weak predictive value. There was no statistical evidence that mean scores on DOPS and mini-PAT assessments differed between the two groups.

Conclusions  Analysis of a large dataset of WPBA scores revealed significant associations between training difficulties and lower mean scores on both the mini-CEX and CbD. Models show that using WPBA scores is, however, not a valid way of screening for trainees in difficulty. Workplace-based assessments have value as formative assessments that prompt supervision, feedback and reflection. They should not be relied upon to certify competence and their use for such ends may reduce their effectiveness in training. Their results should be interpreted in the context of multiple other methods of assessment, with the aim of achieving a genuinely holistic and representative assessment of professional competence.


Introduction


Procedures to ensure the assessment and maintenance of a doctor’s competence have received extensive attention from policymakers and academics.1 The numerous reasons for this include concerns about performance,2 patient safety3 and increased accountability to patients and funding agencies.1–3 Desired outcomes for medical training have been agreed and codified across many countries. In Canada and the Netherlands, the Canadian Medical Education Directives for Specialists (CanMEDS) framework is used to direct training and assessment towards the achievement of competency within seven key generic roles (scholar, collaborator, professional, communicator, health advocate, manager and medical expert). In the USA, the Accreditation Council for Graduate Medical Education (ACGME) identifies six core outcomes, similar to those of CanMEDS, and the UK Foundation Programme (FP) specifies another seven curriculum foci for junior doctors (good clinical care, maintaining good medical practice, teaching and training, relationships with patients, working with colleagues, probity and professionalism, and recognition of the sick patient). The intention to make the key aspects of good medical training explicit is laudable, but measuring the achievement of competence within these areas of training has proven challenging.

The concept of competencies, which represent individual elements of professional competence, provides a framework for evaluating competence in a particular skill, ideally in a real-world setting and in the wider context of the professional roles or attributes identified by CanMEDS or the UK Foundation Programme. Over the last decade, many institutions with responsibility for postgraduate training have embraced workplace-based assessment (WPBA) as a response to the challenge of increasing the authenticity of competency assessments and avoiding the difficulties of context for which more traditional approaches, such as objective structured clinical examinations or multiple-choice question examinations, are criticised.4

In 2001, the ACGME identified WPBA as a component of its ‘toolbox’ to assess competencies and outcomes for resident training in the USA. Workplace-based assessment remains a key part of the ACGME Learning Portfolio, an electronic system, currently in development, which allows for the recording and sharing of assessments, achievements and personal reflection. Training authorities in Canada and the Netherlands continue to promote the use of WPBA to demonstrate the achievement of their training requirements. Further, since 2005 in the UK, WPBA has formed a significant part of the assessment of junior trainees within the FP, in which trainees are mandated to complete minimum numbers of four types of WPBA (Fig. 1) in order to progress to higher levels of training. In summary, WPBA has become a major part of the assessment and certification of competence in postgraduate medical training.

Figure 1. Workplace-based assessments used in the UK National Health Service Foundation Programme: a primer

In the UK, the FP consists of a 2-year general training programme forming the bridge between medical school and specialist or general practice training. Foundation Programme training aims to develop ‘demonstrably competent doctors’ and aspires to be ‘trainee-centred’ (FP trainees are responsible for initiating assessments and returning all completed paperwork). Throughout the FP, trainees keep records of their learning and WPBAs using an e-portfolio, which allows them to plan and manage their training. The e-portfolio contains guidance and suggestions for presenting the evidence gathered, in order to demonstrate developing competence.

Some limitations of WPBA are already well recognised. In a trainee-led system, assessors will be chosen for reasons other than their ability to perform the assessment. Thus the relationship between observer and trainee can influence the validity of the assessment.5,6 Further, there is concern that competency-based assessment does not adequately assess the additional attributes, beyond clinical competence, that make a good doctor. The expert advisory panel on selection and assessment found that the FP assessment processes did not effectively discriminate between adequate and excellent doctors,7 but these processes are still relied upon to discriminate competent from non-competent trainees. As a key part of assessment in the FP, the reliability of the multiple WPBA methods is thus highly relevant because they are frequently used to provide evidence of competence.8 As many other national postgraduate training schemes use similar WPBA and e-portfolio systems for both formative and summative assessment, the question of their reliability takes on even wider significance. Thus far, evidence of the effects of WPBA implementation on doctor performance is limited.9 Further, it is not known how WPBA results or scores are associated with professional excellence, competence or lack of competence.

In the UK, the accepted standard for defining professional competence is Good Medical Practice (GMP),10 published by the General Medical Council (GMC). This sets standards for professional behaviour across several domains of personal and professional conduct. For the purposes of this study, ‘competence’ is an operational definition equivalent to adherence to GMP. However, there is no reference standard test for detecting the presence or absence of competence. A surrogate for the outcome measure – professional competence – must be used. Foundation Programme trainees in the UK are typically supervised by multiple senior doctors whose traditional responsibilities include the identification of problems affecting professional competence. Because of the complex and subjective nature of the concept of competence, it cannot be said that such identification is a perfect marker of a lack of competence, or vice versa. Given this limitation, the use of multiple senior colleagues’ judgements as a method of competence assessment has the virtues of good face validity, wide professional acceptance and demonstrated ability to withstand the test of time. Further, aside from the rare and extreme instances of GMC involvement and censure of problem doctors, it is also the only marker available for identifying postgraduate trainees in professional difficulties and, for these reasons, more novel methods of assessing competence must be validated against it. This study evaluates the ability of WPBA to distinguish FP doctors with identified training difficulties (as a surrogate marker for lack of competence) from their peers. The study question was identified before any data analysis was performed.

Methods


Data extraction and index case identification

Data for 11 placement periods (August 2005 to April 2009), which included all scores submitted by trainees on the four types of assessment, were extracted from the e-portfolio database. Records of trainees in difficulty, as identified by their educational supervisors and reported to the deanery, were tagged as index cases. Following involvement of the deanery, the underlying aetiology of training difficulties was also recorded as relating to one or more of health, conduct and performance. The definition of ‘Foundation trainees in difficulty’ (FTiDs), which the North Western Foundation School has used for 4 years, is a pragmatic and functional one:

‘...any trainee who has caused concern to his/her educational supervisor(s) about the ability to carry out their duties, which has required unusual measures. This would mean anything outside the normal trainer–trainee processes where the Foundation Programme Director has been called upon to take or recommend action. This includes health, conduct or performance but not normally maternity.’11

The definition, therefore, selects trainees for whom the normal trainer–trainee interaction is insufficient to resolve the identified problems in professional competence and for whom the problems, irrespective of their underlying causes, have affected their ability to work as junior doctors. This process has consistently returned 4–6% of the school population per year. The postgraduate deanery also records the ‘subtype(s)’ of difficulty in order to measure the prevalences of underlying aetiologies. This categorisation is based on related GMP domains (Table 1) and, despite such operationalisation, is somewhat subjective. It should be noted that more than one difficulty subtype may be recorded for an FTiD.

Table 1. Frequencies of types of difficulty and combinations

Conduct-related*   Health-related†   Performance-related‡   Frequency (%)
–                  –                 –                      1554 (94.4)
X                  –                 –                      8 (0.5)
–                  X                 –                      31 (1.8)
–                  –                 X                      36 (2.2)
X                  X                 –                      4 (0.2)
X                  –                 X                      4 (0.2)
–                  X                 X                      7 (0.4)
X                  X                 X                      2 (0.1)

* Related Good Medical Practice (GMP) domains: relationships with colleagues and patients; probity
† Health is an individual domain in GMP. In practice it typically refers to problems related to mental health and substance abuse
‡ Related GMP domains: good clinical care; maintaining good practice; relationships with colleagues and patients; teaching and training

These data were anonymised during extraction, using unique identifiers per trainee, before undergoing statistical analysis. All statistical analyses were performed in Stata IC/10 (StataCorp LP, College Station, TX, USA). The evaluation methods were reviewed by the chair of the local research ethics committee and ethical approval was not required. The methods were also approved by the postgraduate dean.

Summary measures

Protocols for the direct observation of procedural skills (DOPS), mini-clinical evaluation exercise (mini-CEX), case-based discussion (CbD) and mini-peer assessment tool (mini-PAT) contain 11, seven, seven and 16 items respectively. As trainees carried out assessments over up to 11 placement periods, and could complete multiple assessments of the same type within one period, there were many repeated measurements per assessment type per trainee. To simplify the analyses, summary measures were used to represent each trainee for each assessment type. For example, the summary measure to represent the DOPS result for an individual trainee was calculated as the mean score given over all DOPS assessments for that particular trainee. In all WPBAs, the possible score per question in each assessment was 1–6, where a score of 1 represented ‘Below expectations for F1 [FP year 1] completion’ and a score of 6 represented ‘Above expectations for F1 completion’. A score of 3 was defined as ‘Meeting expectations for F1/F2 completion’, which, given the FP’s aim of producing ‘demonstrably competent’ doctors, can be taken as analogous to ‘competent’. Therefore, four mean scores ranging from 1 to 6 were calculated for each trainee. Because of the size of the dataset, it was not feasible to verify these summary scores manually. Thirty trainees were therefore chosen at random from the trainee pool and the distributions of their scores were inspected using simple histograms to assess skew. These distributions were judged sufficiently symmetric for the mean to provide a valid summary.
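As an illustration of this step, the minimal sketch below computes one summary measure per trainee per assessment type from a long-format extract of item scores. It is written in Python/pandas rather than the Stata used in the study; the column names (trainee_id, assessment, score) and the values are placeholders, not the actual e-portfolio schema.

```python
import pandas as pd

# Hypothetical long-format extract of WPBA scores: one row per item score.
# Column names and values are illustrative only, not the e-portfolio schema.
records = pd.DataFrame({
    "trainee_id": ["T001", "T001", "T001", "T002", "T002", "T002"],
    "assessment": ["DOPS", "DOPS", "CbD", "mini-CEX", "mini-CEX", "CbD"],
    "score":      [4, 5, 4, 3, 4, 5],        # item scores on the 1-6 scale
})

# One summary measure per trainee per assessment type: the mean of all
# scores of that type submitted by that trainee, across placement periods.
summary = (records
           .groupby(["trainee_id", "assessment"])["score"]
           .mean()
           .unstack("assessment"))           # wide table: one row per trainee

print(summary.round(2))
```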

Statistical analysis

Logistic regression was used to assess any association between assessment scores and whether a trainee was in difficulty. A receiver operator characteristic (ROC) curve analysis was then carried out to assess the diagnostic capability of the best-fitting logistic model. A ROC curve plots sensitivity against 1 − specificity across all possible cut-offs. As the cut-off level of a diagnostic test changes (in this case, a cut-off assessment score for diagnosing FTiD status), its sensitivity and specificity alter, so each new cut-off can be regarded as a different ‘test’. Increasing sensitivity typically comes at the expense of specificity. A ROC curve displays this trade-off and helps in deciding which cut-off to use.

Two or more variables can be compared as ROC curves in a single plot to assess their diagnostic capability. The area under the ROC curve (AUC) summarises this capability: an AUC of 0.5 indicates discrimination no better than tossing a coin, whereas an AUC of 1 indicates perfect discrimination. The logistic regression and ROC curve analyses were repeated using the same predictor variables to examine associations with each of the three subtypes of trainee difficulty.
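A minimal sketch of this analysis pipeline is given below, assuming Python with statsmodels and scikit-learn (the study itself used Stata). The per-trainee scores and the ‘in difficulty’ flag are simulated purely for illustration; none of the variable names, prevalences or effect sizes are taken from the study dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

# Simulated per-trainee summary table: four mean WPBA scores plus a 0/1
# 'in difficulty' flag.  All values and effect sizes are invented.
rng = np.random.default_rng(0)
n = 1500
scores = pd.DataFrame({
    "cbd":      rng.normal(4.1, 0.36, n),
    "dops":     rng.normal(4.0, 0.34, n),
    "mini_cex": rng.normal(3.9, 0.38, n),
    "mini_pat": rng.normal(4.8, 0.30, n),
})
# Lower CbD and mini-CEX means slightly raise the simulated odds of difficulty.
logit = -3.0 - 0.9 * (scores["cbd"] - 4.1) - 0.6 * (scores["mini_cex"] - 3.9)
in_difficulty = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Logistic regression: mean scores as predictors, difficulty status as outcome.
X = sm.add_constant(scores)
model = sm.Logit(in_difficulty, X).fit(disp=False)
print(np.exp(model.params))      # odds ratios per one-point score increase
print(np.exp(model.conf_int()))  # 95% confidence intervals for the ORs

# ROC analysis of the fitted model's predicted probabilities.
print(roc_auc_score(in_difficulty, model.predict(X)))   # area under the curve
```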

Results


Altogether, data representing 75 580 assessments, completed by 1646 different trainees, were analysed. Of these trainees, 92 had been identified as being in difficulty, with 18 instances of conduct-related difficulties, 44 of health-related difficulties and 49 of performance-related difficulties (some trainees had more than one type of difficulty). The exact combinations of difficulties are summarised in Table 1.

Average WPBA scores

Table 2 gives descriptive statistics for the mean scores per trainee for the four types of assessment; a graphic representation is shown in Fig. S1 (online). These statistics are given separately for trainees who were not identified as being in difficulty and for index cases with training difficulties, whether of one type (e.g. conduct-related) or any combination. The table shows that the distributions of data do not differ greatly between the two groups. However, the ranges of scores on CbDs and mini-CEXs are somewhat broader in the ‘No difficulties’ group and the mean scores of FTiDs are lower. All mean scores and range limits for FTiDs were at or above 3 (meeting expectations for F1/F2 completion).

Table 2. Descriptive statistics for mean assessment scores per trainee

                   CbD                       DOPS                      Mini-CEX                  Mini-PAT
                   Mean (SD)     Range       Mean (SD)     Range       Mean (SD)     Range       Mean (SD)    Range
No difficulties    4.10 (0.36)   2.64–5.29   4.04 (0.34)   3.05–5.25   3.91 (0.38)   1.71–5.00   4.8 (0.29)   3.30–5.53
Any difficulty     3.95 (0.38)   3.18–5.29   4.04 (0.34)   3.07–5.09   3.79 (0.35)   3.00–4.84   4.8 (0.31)   3.81–5.46

CbD = case-based discussion; DOPS = direct observation of procedural skills; mini-CEX = mini-clinical evaluation exercise; mini-PAT = mini-peer assessment tool; SD = standard deviation

Scores as predictors of training difficulties

Table 3 shows the output for the logistic regression when mean CbD, DOPS, mini-CEX and mini-PAT scores are inserted as predictor variables and the outcome is classification as a doctor in difficulty or not. The output of a logistic regression provides an odds ratio (OR) for each predictor variable inserted. This OR indicates whether an increase in the predictor variable (e.g. mean CbD score) decreases the odds of being identified as an FTiD (an OR of < 1), has no association in this sample of data (an OR of 1) or increases the odds of being identified as an FTiD (an OR of > 1). The 95% confidence interval (CI) of the OR conveys the precision of this estimate for the wider population; an OR whose 95% CI does not contain 1 is regarded as statistically significant. The ORs in Table 3 suggest a moderate and significant association between average score and the outcome of being identified as an FTiD for two assessment types, the mini-CEX (OR = 0.54, 95% CI 0.30–0.95) and the CbD (OR = 0.39, 95% CI 0.21–0.72).

Table 3. Logistic regression output for the outcome of any difficulties

Covariate              OR     95% CI for OR   p-value
Mean CbD score         0.39   0.21–0.72       0.002
Mean DOPS score        1.26   0.66–2.40       0.486
Mean mini-CEX score    0.54   0.30–0.95       0.034
Mean mini-PAT score    1.43   0.67–3.04       0.357

Data in bold are significant at p < 0.05 (mean CbD and mean mini-CEX scores)
OR = odds ratio; 95% CI = 95% confidence interval; CbD = case-based discussion; DOPS = direct observation of procedural skills; mini-CEX = mini-clinical evaluation exercise; mini-PAT = mini-peer assessment tool
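To make the link between the regression output and the figures in Table 3 concrete, the short sketch below converts a logistic regression coefficient and its standard error into an odds ratio with a 95% CI. The coefficient and standard error shown are not taken from the study output; they are back-calculated from the published OR for the mean CbD score purely to illustrate the arithmetic.

```python
import math

# Odds ratio and 95% CI from a logistic regression coefficient (log-odds
# scale): OR = exp(beta), CI = exp(beta +/- 1.96 * SE).
# beta and se are illustrative values, back-calculated from the published
# OR of 0.39 (95% CI 0.21-0.72) for the mean CbD score.
beta, se = -0.94, 0.31

or_point = math.exp(beta)
ci_low = math.exp(beta - 1.96 * se)
ci_high = math.exp(beta + 1.96 * se)

print(f"OR = {or_point:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
# -> OR = 0.39, 95% CI 0.21-0.72
```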

The ROC curve of the model in Table 3 is shown in Fig. 2, demonstrating the trade-off between sensitivity and specificity. It can be seen from this that the logistic model that best fits these data would not be very useful for diagnosing whether a doctor was in any difficulty or not. With an area under the curve (AUC) of 0.64, WPBA scores are only marginally better than coin-tossing (0.5) for predicting ‘in difficulty’ status.

Figure 2. Receiver operator characteristic (ROC) curve for the logistic model in Table 3

Analyses by difficulty subtype

The logistic regression was performed again, this time for separate outcomes of conduct-, health- and performance-related difficulties. The results of these regressions are shown in Table 4. These results show that there are no statistically significant associations between mean scores of any type and the outcomes of conduct- and health-related difficulties. However, it is evident that the significant associations found in Table 3 can be mainly attributed to the association of mean CbD and mini-CEX scores with performance-related difficulties specifically (mean CbD score, OR = 0.16, 95% CI 0.07–0.36; mean mini-CEX score, OR = 0.31, 95% CI 0.14–0.66).

Table 4. Logistic regression output for the outcomes of conduct-, health- and performance-related difficulties

Problem type          Covariate              OR     95% CI for OR   p-value
Conduct-related       Mean CbD score         0.98   0.26–3.73       0.99
                      Mean DOPS score        1.66   0.40–6.99       0.49
                      Mean mini-CEX score    0.99   0.27–3.60       0.98
                      Mean mini-PAT score    1.83   0.33–10.07      0.49
Health-related        Mean CbD score         0.61   0.26–1.44       0.28
                      Mean DOPS score        1.05   0.42–2.63       0.94
                      Mean mini-CEX score    0.65   0.29–1.45       0.29
                      Mean mini-PAT score    1.60   0.54–4.70       0.40
Performance-related   Mean CbD score         0.16   0.07–0.36       < 0.001
                      Mean DOPS score        1.29   0.53–3.13       0.77
                      Mean mini-CEX score    0.31   0.14–0.66       0.003
                      Mean mini-PAT score    1.68   0.59–4.78       0.34

Data in bold are significant at p < 0.005 (mean CbD and mean mini-CEX scores for performance-related difficulties)
OR = odds ratio; 95% CI = 95% confidence interval; CbD = case-based discussion; DOPS = direct observation of procedural skills; mini-CEX = mini-clinical evaluation exercise; mini-PAT = mini-peer assessment tool

The ROC curve for the logistic model in which only mean CbD and mini-CEX scores are inserted as predictor variables, with performance-related difficulties as outcome, is shown in Fig. 3. This has a higher AUC than Fig. 2, at 0.76. The ROC curve demonstrates the trade-off between sensitivity and specificity for using the score as a marker of performance-related difficulties. This curve indicates that, if sensitivity and specificity were equally weighted in importance, the optimum sensitivity and specificity that could be achieved would be around 60–75% for each (by observing the central portion of the plot).
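The ‘balanced’ operating point described above can also be read off the ROC data programmatically: the sketch below, assuming scikit-learn's roc_curve output for a fitted model's predicted probabilities (toy values shown here), picks the threshold at which sensitivity and specificity are closest to one another.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy stand-ins for the outcome flag and the model's predicted probabilities;
# in practice these would come from the logistic model described in Methods.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 0, 1, 1])
p_hat  = np.array([0.05, 0.10, 0.15, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70, 0.90])

fpr, tpr, thresholds = roc_curve(y_true, p_hat)
specificity = 1 - fpr

# Operating point where sensitivity and specificity are (roughly) equal,
# i.e. the two are weighted equally in importance.
i = np.argmin(np.abs(tpr - specificity))
print(f"cut-off {thresholds[i]:.2f}: "
      f"sensitivity {tpr[i]:.2f}, specificity {specificity[i]:.2f}")
```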

Figure 3. Receiver operator characteristic (ROC) curve for the logistic model containing mean case-based discussion and mean mini-clinical evaluation exercise scores for predicting performance difficulties

Discussion


This study analysed a large dataset of WPBA results for any evidence of association between scores and known training difficulties as determined by educational supervisors, whose assessments represent the only available marker of notable problems in training. After multiple analyses, significant associations were found between known training difficulties and average scores on both the mini-CEX and CbD, although the relationships were not sufficiently strong to have useful predictive value. Selecting the optimal relationship by analysing only for association with performance-related difficulty yielded moderate sensitivity and specificity.

Limitations of the study include the fact that its reference standard was represented by the current practice of having trainers detect trainees who have training, health-, conduct- or performance-related problems that are significant enough to affect their ability to perform their role and require special attention. This is the best available indicator of a lack of competence in FP trainees, but it is likely that this system fails to identify some trainees in difficulty, although these are likely to represent a small proportion of the total trainee population and to have less severe difficulties. There is also inevitable heterogeneity in the standards by which FP trainees are judged, as each struggling trainee may be identified by a different supervisor. Further, the categorisation of training difficulties into performance-, conduct- and health-related problems is even more subjective, and any conclusions based on these subtypes should be interpreted extremely cautiously. It is also quite probable that a trainee who is aware of his or her own status as ‘in difficulty’ might alter his or her behaviour to limit exposure to further evidence of incompetence. This might include avoiding difficult situations or assessors for WPBAs and thereby achieving globally higher scores. This would also influence the present analysis, as trainees who are genuinely struggling but are not identified as being ‘in difficulty’ by their supervisors will be counted in the ‘No difficulties’ group. Finally, this study is retrospective and any results should be corroborated with prospective work.

Unfortunately for the statistical analysis, the prevalence of doctors identified as being in difficulty was low (5.6%). This did not appear to cause a problem when regressions on the outcome of any training difficulty were performed, although it resulted in some relatively wide CIs when specific types of difficulty were examined. Given the weak associations found, estimates derived from a larger sample containing more doctors in difficulty would be unlikely to alter the conclusions.

The study shows that using scores in WPBAs is not a valid way of detecting trainees in difficulty. There does appear to be some evidence of association between performance-related difficulty and lower mean scores on both the CbD and mini-CEX. However, this is not strong enough to suggest that these scores could be a useful diagnostic tool for identifying this type of problem. It is also notable that trainees identified as having difficulties demonstrated a narrower range of average scores on WPBAs, the distribution of which counter-intuitively showed fewer low scores for mini-PAT, mini-CEX and CbD assessments. The range of DOPS scores was comparable. We are not aware of other work that addresses the question of WPBA validity as a predictor of training difficulties or other markers of real-world competence, although some association between performance in other forms of summative assessment and WPBA results has been established.12

Assessments of competence do not reliably predict performance.13 The present results suggest that WPBA scores do not appear to predict lack of competence. An initial consideration indicated that WPBAs might serve to provide early warning of any difficulty, but, even with large datasets such as that reported here, this appears not to be possible and, indeed, may not be desirable. The consequences of assessments may have very important implications for poorly performing trainees. An expert advisory panel for the UK’s Independent Inquiry into Modernising Medical Careers considered the assessment methods used during the FP and for selection to specialty training. The resulting report acknowledged that: ‘The tools recommended for use in the Foundation Years are all recognised formative instruments capable of providing adequate reliability and validity. Their particular strength is in the rich variety of feedback they offer and their validity in terms of offering information about actual performance in practice.’ The panel did, however, highlight the fact that it is unclear when assessment is ‘aimed at supporting self-improvement and remediation (lower stakes) and when is it regulatory and summative (high stakes)’ during the FP.7 Similar debates on the utility of WPBAs as formative or summative assessments are occurring in other territories. In a notable example, WPBAs are currently being trialled in extremely high-stakes assessments in Australia, where the Australian Medical Council is evaluating a pilot of a WPBA-based alternative to its examination-based system for accrediting foreign medical graduates.

Traditionally, in medicine, the individual doctor or practitioner has been responsible for ensuring that he or she does whatever is necessary to become and remain competent. With increasing awareness of institutional responsibility for the competence of medical practitioners comes an understandable need for robust methods of delivering and measuring competence. The goal of the WPBA, as a relatively new method of assessment, is to assess real-world aspects of a doctor’s competence and to provide a focus for training and self-improvement. The primary intent of these methods should be to define individual strengths and weaknesses, and to involve trainees in the reflection process. Their quantitative results should not be relied upon as certifying the elusive concept of competence and their primary use for such ends may diminish their effectiveness in training. It should be restated that, when used appropriately, WPBAs are a valuable part of the FP and of postgraduate training generally. Their results should be interpreted in the context of multiple other methods of assessment, with the aim of achieving a genuinely holistic and representative assessment of professional competence.

Contributors:  SB and PB proposed the evaluation described in this study. CM and AH devised the analyses. AH performed the analyses. All authors contributed to the interpretation of the results and to the writing and editing of the article. All authors approved the final manuscript for publication.

Acknowledgements:  the authors thank Professor Tim Dornan, Department of Educational Development and Research, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht for critiquing a draft of this paper, and Dan Powley, North Western Deanery, Manchester for conducting the necessary data extraction.

Funding:  funding for statistical analysis from North Western Deanery.

Conflicts of interest:  none.

Ethical approval:  the protocol for this study was examined by the Manchester Local Research Ethics Committee and deemed exempt from full review.

References

1. Southgate L, Hays RB, Norcini J et al. Setting performance standards for medical practice: a theoretical framework. Med Educ 2001;35:474–81.
2. Southgate L, Cox J, David T et al. The assessment of poorly performing doctors: the development of the assessment programmes for the General Medical Council’s Performance Procedures. Med Educ 2001;35 (Suppl 1):2–8.
3. Kohn LT, Corrigan JM, Donaldson MS. To Err is Human: Building A Safer Health System. A Report of the Committee on Quality of Health Care. Washington, DC: National Academy Press 2000.
4. Wass V, van der Vleuten C, Shatzer J, Jones R. Assessment of clinical competence. Lancet 2001;357:945–8.
5. Norcini JJ. Peer assessment of competence. Med Educ 2003;37:539–43.
6. Holmboe ES. Faculty and the observation of trainees’ clinical skills: problems and opportunities. Acad Med 2004;79:16–22.
7. Independent Inquiry into Modernising Medical Careers. Aspiring to Excellence. London: MMC Inquiry 2008.
8. Collins J. Foundation for Excellence – An Evaluation of the Foundation Programme. London: Medical Education England 2010.
9. Miller A, Archer J. Impact of workplace-based assessment on doctors’ education and performance: a systematic review. BMJ 2010;341:c5064.
10. General Medical Council. Good Medical Practice: providing good medical care. London: GMC 2006.
11. Bhat S, Grue RB, Baker P. Ill health in Foundation trainees. British Medical Association, American Medical Association, Canadian Medical Association International Conference on Doctors’ Health: Doctors’ Health Matters – Finding the Balance, 17–19 November 2008, London. London: BMA 2008;37.
12. Durning SJ, Cation LJ, Markert RJ, Pangaro LN. Assessing the reliability and validity of the mini-clinical evaluation exercise for internal medicine residency training. Acad Med 2002;77:900–4.
13. Rethans JJ, Norcini JJ, Baron-Maldonado M, Blackmore D, Jolly BC, LaDuca T, Lew S, Page GG, Southgate LH. The relationship between competence and performance: implications for assessing practice performance. Med Educ 2002;36 (10):901–9.
14. Archer JC, Norcini JJ, Davies HA. Peer review of paediatricians in training using SPRAT. BMJ 2005;330:1251–3.
15. Evans R, Elwyn G, Edwards A. Review of instruments for peer assessment of physicians. BMJ 2004;328 (7450):1240.
16. Sargeant JM, Mann KV, Ferrier S. Exploring family physicians’ reactions to multi-source feedback: perceptions of credibility and usefulness. Med Educ 2005;39:497–504.
17. Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (clinical evaluation exercise): a preliminary investigation. Ann Intern Med 1995;123:795–9.
18. Wragg A, Wade W, Fuller G, Cowan G, Mills P. Assessing the performance of specialist registrars. Clin Med 2003;3 (2):131–4.
19. Norcini JJ. Workplace-based assessment in clinical training. ASME Publication No. 31. Med Teach 2007;29 (9):855–71.
20. Norman GR, Davis D, Painvin A, Lindsay E, Rath D, Ragbeer M. Comprehensive assessment of clinical competence of family/general physicians using multiple measures. Proceedings of the Annual Conference on Research in Medical Education. Washington, DC: Association of American Medical Colleges 1989;75–9.
21. Maatsch JL, Huang R, Downing S, Barker B. Predictive validity of medical specialty examinations – Final report for Grant HS 02038-04. East Lansing, MI: National Center of Health Services Research, Office of Medical Education and Research and Development, Michigan State University 1983.

Supporting Information


Figure S1. Mean score distributions for the four WPBAs.

