Short communication
A validity study of the national UK colposcopy objective structured clinical examination—is it a test fit for purpose?
Article first published online: 11 NOV 2009
DOI: 10.1111/j.1471-0528.2009.02389.x
© 2009 The Authors Journal compilation © RCOG 2009 BJOG An International Journal of Obstetrics and Gynaecology
Issue

BJOG: An International Journal of Obstetrics & Gynaecology
Volume 116, Issue 13, pages 1796–1800, December 2009
Additional Information
How to Cite
Shehmar, M., Cruikshank, M., Finn, C., Redman, C., Fraser, I. and Peile, E. (2009), A validity study of the national UK colposcopy objective structured clinical examination—is it a test fit for purpose?. BJOG: An International Journal of Obstetrics & Gynaecology, 116: 1796–1800. doi: 10.1111/j.1471-0528.2009.02389.x
Publication History
- Issue published online: 11 NOV 2009
- Article first published online: 11 NOV 2009
- Accepted 7 July 2009.
Keywords:
- BSCCP OSCE;
- reliability;
- validity
Abstract
- Top of page
- Abstract
- Introduction
- Methods
- Results
- Discussion
- Conclusion
- Future research
- Disclosure of interests
- Contribution to authorship
- Details of ethics approval
- Funding
- Acknowledgements
- References
- Commentary on ‘A validity study of the National UK colposcopy objective structured clinical examination – is it a test fit for purpose?’
This study examines the validity and reliability of the British Society for Colposcopy and Cervical Pathology (BSCCP) objective structured clinical examination (OSCE). The BSCCP OSCE results obtained over eight OSCE circuits were analysed using SPSS 15. Face validity and content validity were established from expert opinion and blueprinting. Construct validity through level of experience was not demonstrated: the P values for the six patient-interaction stations were 0.867, 0.822, 0.59, 0.74, 0.12 and 0.01, and at the one station with a statistically significant difference the most experienced group had the lowest mean score. Concurrent validity was established against the gynaecology mini-CEX (sensitivity = 1, specificity = 0.8, positive predictive value = 0.947, negative predictive value = 1). The reliability of the OSCEs ranged from a Cronbach’s alpha of 0.617 to 0.775. The OSCE has face, content and concurrent validity.
Introduction
The Postgraduate Medical Education and Training Board (PMETB) is the independent regulatory body governing postgraduate medical education in the UK. In their publication on the principles for an assessment system,1 PMETB state ‘methods will be chosen on the basis of validity, reliability, feasibility, cost effectiveness, opportunities for feedback, and impact on learning.’ They also state that ‘studies to establish the validity of new methods will be undertaken.’ The British Society for Colposcopy and Cervical Pathology (BSCCP) changed its summative assessment method in 2006 to include an objective structured clinical examination (OSCE). Joint certification in colposcopy by the BSCCP and The Royal College of Obstetricians and Gynaecologists (RCOG) is a pre-requisite to practising colposcopy within the UK national screening programmes. Previously, colposcopic competence on completion of training was assessed by the submission of a logbook as a record of clinical experience and ten written case reports incorporating a critical analysis. The demonstration of clinical knowledge in this method was dependent on case selection, on the standard of written English and on the ability to identify and reference supporting evidence from clinical practice. Blueprinting of cases against the competencies in the training manual was limited.
The aim of this study was to establish the validity and reliability of this new OSCE in line with PMETB recommendations. Validity is the extent to which the test measures what it is intended to measure. Validity itself has many components, and one can place more confidence in the validity of an assessment if more of these validities are shown to be true. Reliability is whether a test gives the same results over different samples and time. It is a measure of the test results rather than of the test itself. Unlike reliability, validity cannot be expressed as a single coefficient. We investigated face, content, construct and concurrent validity as well as reliability.
Face validity is whether the assessment ‘feels right on the face of it’. That is, does it look as if it assesses what it should? Face validity can be demonstrated by expert opinion. Content validity refers to how much of the required area of competency the OSCE covers. Construct validity is whether the assessment produces the expected results. For example, candidates with no experience of colposcopy would be expected to score significantly lower than those with considerable experience. Concurrent validity is how well the assessment tool agrees with the best existing tool that assesses the same skill. The reliability of a test is whether it measures consistently over time and with different sample items. Reliability was measured by internal consistency, which is the extent of the correlation between different items on the same test that purport to measure the same general construct.
Most OSCEs, including the BSCCP OSCE, use both a checklist and a global marking score. The checklist score is awarded when particular items of competency are demonstrated by the candidate and the global score awards a mark for the general performance at a station.
Methods
The study population comprised OSCE candidates for all BSCCP OSCEs run between May 2007 and November 2007. This comprised three OSCE cohorts (May 2007, June 2007, November 2007). A total of eight separate OSCE circuits were run during this period. All those candidates who agreed to participate were included in the study, and there were no exclusions. All OSCE circuits were run at Birmingham Women’s Hospital NHS Trust by the BSCCP. The results were collated and statistical analysis was carried out using SPSS 15 (SPSS Inc., Chicago, IL, USA).
Face validity
Expert opinion was used to gauge face validity.
Content validity
A method of sampling over a wider range of competencies is to draw up a blueprint. This is a grid listing the content areas against the skills required within each area. To ensure that these samples are aligned with the learning outcomes of the curriculum, a blueprint is used with domains along the x-axis, such as high-grade cervical intraepithelial neoplasia, and skills down the y-axis, such as communicating results. The OSCE stations can then be mapped out against the blueprint to ensure adequate sampling.2
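The mapping of stations against a blueprint can be sketched as a small grid check. This is an illustration only: apart from the high-grade cervical intraepithelial neoplasia domain and the skills named in this paper, the domains, skills and station assignments below are hypothetical and are not the BSCCP blueprint.

```python
# Hypothetical blueprint grid: domains on one axis, skills on the other.
# Only "high-grade CIN" and the three skills are mentioned in the paper;
# the other domains are invented for illustration.
domains = ["high-grade CIN", "low-grade CIN", "invasive disease"]
skills = ["communicating results", "image recognition", "history taking"]

# Hypothetical assignment of OSCE stations to (domain, skill) cells.
stations = {
    "station 1": ("high-grade CIN", "communicating results"),
    "station 2": ("low-grade CIN", "image recognition"),
    "station 3": ("invasive disease", "history taking"),
}


def uncovered_cells(domains, skills, stations):
    """Return the blueprint cells that no station samples."""
    covered = set(stations.values())
    return [(d, s) for d in domains for s in skills if (d, s) not in covered]


for domain, skill in uncovered_cells(domains, skills, stations):
    print(f"not sampled: {skill} in {domain}")
```

Listing the unsampled cells makes gaps in coverage visible when a circuit is being assembled; in practice a circuit would not be expected to fill every cell.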
Construct validity
The construct validity was tested by comparing the checklist results of those people who have more experience (in years) in the field of gynaecology with those of candidates with less experience.
The BSCCP database of OSCE results was used to obtain data on performance in the patient-interaction stations. The checklist scores were analysed using analysis of variance with a P < 0.05 used to denote statistical significance.
Concurrent validity
Concurrent validity was assessed at the history-taking station by comparing the OSCE checklist and global marking score with the RCOG gynaecology mini-CEX (mini-clinical evaluation exercise), which has been approved by PMETB. We looked for concordance in similar skills in the November OSCE. The scores given using the mini-CEX and the BSCCP assessment tool for the patient-interaction station were analysed. The mini-CEX was used as the validated gold standard tool. The examiner for this station was asked to mark each candidate using both tools. The specificity, sensitivity and negative and positive predictive values were determined for passing the station using the modified borderline method. A two-by-two table was constructed to analyse the results.
Reliability
The reliability of each of the OSCE sessions (May 2007, June 2007 and November 2007) was measured by internal consistency and analysis was by Cronbach’s alpha.
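As a rough illustration of the internal-consistency calculation, Cronbach’s alpha for k stations is k/(k − 1) multiplied by (1 − the sum of the per-station score variances divided by the variance of candidates’ total scores). The sketch below assumes one row per candidate and one column per station; the marks are made up and are not study data, and the study itself used SPSS rather than this code.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (candidates) by columns (stations)."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two stations and two candidates")
    # Sample variance of each station's scores across candidates.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each candidate's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical marks for five candidates across four stations (not study data).
example_scores = [
    [16, 17, 7, 15],
    [18, 19, 8, 16],
    [14, 15, 6, 12],
    [20, 21, 9, 18],
    [15, 18, 7, 13],
]
print(round(cronbach_alpha(example_scores), 3))
```

Alpha rises when candidates who do well at one station tend to do well at the others, which is why it is read as a measure of internal consistency.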
Results
Participants represented all training regions nationally. There were six who were post-certificate of completion of training level, three staff grades, 56 at registrar level, three at senior house officer level and seven nurses. There was a range of experience in gynaecology: 38 had between 2 and 6 years of experience, 28 had between 7 and 10 years of experience and nine had 11 years or more of experience.
Face validity
Before the new assessment came into use, the face validity had been reviewed and accepted by an expert panel of colposcopists. In addition the BSCCP OSCE assessment process has been reviewed by clinical experts on behalf of the RCOG and by PMETB. The face validity was therefore deemed to be at an acceptable standard to trial the new assessment.
Content validity
This was established by alignment between the curriculum and the OSCE using blueprints. They list the clinical knowledge that needs to be assessed and the skills involved.
Construct validity
There was no statistically significant difference in scores by experience (as measured by years in gynaecology) at five of the six patient-interaction OSCE stations, using a significance threshold of 0.05 (P = 0.867, P = 0.822, P = 0.59, P = 0.74, P = 0.12). There was one station where a statistically significant difference was seen (P = 0.01): patient-interaction station 2 in the May 2007 OSCE, where the most experienced group had the lowest mean score (Table 1).
| OSCE | Years of experience | n | Station 1 mean | Station 1 SD | Station 1 P-value | Station 2 mean | Station 2 SD | Station 2 P-value |
|---|---|---|---|---|---|---|---|---|
| May | 2–6 | 16 | 16.8 | 2.9 | 0.867 | 17.1 | 2.0 | 0.01 |
| | 7–11 | 8 | 16.2 | 3.7 | | 18.3 | 1.4 | |
| | 11+ | 3 | 16.0 | 1.0 | | 14.3 | 0.6 | |
| June | 0–6 | 8 | 19.2 | 5.8 | 0.822 | 22.1 | 2.0 | 0.74 |
| | 7–11 | 11 | 20.5 | 3.2 | | 21.6 | 4.7 | |
| | 11+ | 1 | 21.0 | – | | 19.0 | – | |
| November | 2–6 | 14 | 7.3 | 2.2 | 0.59 | 15.7 | 2.6 | 0.12 |
| | 7–11 | 9 | 7.1 | 1.8 | | 13.0 | 3.9 | |
| | 11+ | 5 | 8.2 | 1.3 | | 15.4 | 2.1 | |
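
The one-way analysis of variance behind each P value can be approximated from the group sizes, means and standard deviations in Table 1. The sketch below does this for station 2 of the May OSCE. It is not the authors’ analysis, which was run in SPSS on the individual checklist scores, and because it starts from rounded summary figures the F statistic is only approximate.

```python
def anova_from_summary(groups):
    """One-way ANOVA F statistic from (n, mean, sd) tuples, one per group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group sum of squares, recovered from each group's sample SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# May OSCE, station 2, as (n, mean, SD) for the three experience bands in Table 1.
may_station_2 = [(16, 17.1, 2.0), (8, 18.3, 1.4), (3, 14.3, 0.6)]
f_stat, df_between, df_within = anova_from_summary(may_station_2)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

Converting F to a P value needs the F distribution, which is not in the Python standard library; if SciPy is available, `scipy.stats.f.sf(f_stat, df_between, df_within)` gives the upper-tail probability.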
Concurrent validity
Twenty-three candidates participated in this part of the study. Using the mini-CEX tool, 19 people passed the station and four failed. Using the BSCCP assessment tool, 18 passed and five failed. By constructing a two-by-two table, the sensitivity of the BSCCP tool was 1 and the specificity was 0.8. The positive predictive value was 0.947 and the negative predictive value was 1.
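The arithmetic of a two-by-two table can be sketched as follows. The text reports only the pass and fail totals for each tool, not the four cells, so the cell counts below are an assumption: 18 candidates passing on both tools, four failing on both, and one passing on the mini-CEX but failing on the BSCCP tool. These counts fit the reported totals. Which tool is treated as the reference, and whether passing or failing counts as the positive result, determines which value falls under which label; the orientation used here is chosen only because it reproduces the four reported figures.

```python
def two_by_two_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Assumed cell counts (not reported in the text), with "pass" as the positive
# result: 18 pass on both tools, 1 passes on one tool only, 4 fail on both.
metrics = two_by_two_metrics(tp=18, fp=1, fn=0, tn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```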
Reliability
Cronbach’s alpha for the May OSCE was 0.775, for the June OSCE was 0.617 and for the November OSCE was 0.696.
Discussion
The BSCCP have established stringent validity for their new assessment. Earlier studies in medical education have established face and content validity;3 however, these types of validity on their own are not robust enough, particularly for high-stakes examinations.4 This study established that the BSCCP OSCE has face, content and concurrent validity as well as being reliable.
PMETB has published an acceptable reliability score for a 2-hour OSCE of a Cronbach’s alpha of 0.64 (reference 1). Our results show that the BSCCP OSCE exceeded this standard in the May (0.775) and November (0.696) sessions and fell slightly below it in the June session (0.617). The patient-interaction stations were chosen to test for concurrent validity; they examined similar constructs to those examined by the mini-CEX (history taking, communication, professionalism). The mini-CEX was chosen as the gold standard because it has itself been tested in many studies and has been shown to have the expected relationship with other assessments of competency.5 In addition, PMETB chose the mini-CEX as one of the work-based assessment tools after evaluating it against the nine principles set out by PMETB for an assessment system for postgraduate medical training,6 and so it is a relevant tool against which to measure other assessment tools. The BSCCP assessment tool for the interactive stations has good sensitivity and specificity, as well as excellent positive and negative predictive values. In particular, the BSCCP tool is discriminating and fails all those people who should fail according to the gold standard (negative predictive value = 1). This is of particular importance because passing the OSCE will contribute to certification and independent practice.
We cannot assess all the desired competencies; therefore we need to take a sample of these. The problems or conditions the candidate should be familiar with are broadly identified in the training guide. The skills within the problems or conditions are set, such as communicating results or image recognition of colpophotographs or micrographs of cytology or pathology specimens. By blueprinting, the BSCCP can ensure that all the relevant domains are covered, that the clinical content reflects reality and that there is a balance across them. Assessing across a wide sample improves the content validity and reliability of the OSCE.
The patient-interaction stations were also used for determining construct validity. The hypothesis was that practitioners who were more experienced in gynaecology would perform better at the patient-interaction station. Our study was unable to establish construct validity. This may have been because of the small numbers in some of the years-of-experience groups. However, as the BSCCP OSCE sets its standard for minimum competency, we would not expect to see an effect of experience. Generally, a person can become minimally competent at low experience levels as determined by their case exposure, and experience does not equate to competency. If the standard were set towards excellence or mastery of a skill rather than minimal competency, then experience would be more likely to discriminate.
A limitation of the study was that examiner perception of the OSCE was not explored. A questionnaire to examiners would have investigated their perception of its acceptability.
Conclusion
This study has established the validity of the BSCCP OSCE in line with PMETB recommendations. In particular, the OSCE has been shown to fail those candidates who should fail, providing for quality assurance of the assessment. We have shown the importance and feasibility of performing validity studies on all new assessment methods in postgraduate medical education.
Future research
Validity studies should be performed for all new assessment tools, particularly in high-stakes assessments. Further validation could be carried out by measuring predictive validity through work-based assessments of performance. This component of validity would enhance the overall validity of the OSCE.
Disclosure of interests
None.
Contribution to authorship
All authors have made a substantial contribution to the conception and design, or acquisition of data, or analysis and interpretation of data; and contributed to drafting the paper and approving the final version of the draft.
Details of ethics approval
Ethics approval was granted by Coventry Research Ethics Committee 11/01/08 (REC ref: 07/H1210/124).
Funding
There was no funding for this paper.
Acknowledgements
We are grateful to Prof. N. Stallard for statistical input, to L. Wood for advice on construct validity, and to L. Dollery, D. Lewis and S. Parisi for access to BSCCP databases.
References
- 1 PMETB, The Assessment Working Group. Developing and maintaining an assessment system—a PMETB guide to good practice. 2007. [http://www.pmetb.org.uk/fileadmin/user/QA/Assessment/Assessment_good_practice_v0207.pdf]. Accessed January 2007.
- 2
- 3
- 4
- 5
- 6 PMETB. Principles for an assessment system for postgraduate medical training. 2004. [http://www.pmetb.org.uk/media/pdf/4/9/PMETB_principles_for_an_assessment_system_for_postgraduate_medical_training_(September_2004).pdf]. Accessed September 2009.
Commentary on ‘A validity study of the National UK colposcopy objective structured clinical examination – is it a test fit for purpose?’
Shehmar et al. have applied various measures of statistical test theory to analyse the credibility, overall quality and performance of the British Society for Colposcopy and Cervical Pathology’s (BSCCP) Objective Structured Clinical Examination (OSCE) (Shehmar M, et al. BJOG 2009;116:1796–800). The paper was written from a testing theorist’s and statistician’s point of view. Their findings seem reasonable and reassuring; the OSCE seems to appropriately stratify candidates into those who should and should not pass the examination. However, how reassuring are these findings for the practitioner or the patient?
Certainly a valid test should select against those who are really not qualified to practice. However, we are not informed as to what actually occurs in this test. Do they actually take biopsies? If biopsies are taken, how are they correlated with outcome for the test patient(s) or is it all based on images and computer simulation? These real-world types of analyses seem to be lacking from this theoretical evaluation. Indeed, it seems clear that performance on this OSCE is really set, as the authors imply, at a level of minimal competence rather than true clinical proficiency. The authors spend a good part of the discussion attempting to explain why the test was unable to demonstrate the hypothetical construct validity that would stratify performance based on clinical experience. Yes, the numbers were small and, in a sense, clinical experience may not be germane to this skill because one cannot practice without passing the examination and most candidates taking the examination have, by definition, limited experience. Potentially, it would have been interesting to see the data stratified by the number of colposcopies each candidate had performed, either supervised or outside the system, before taking the OSCE. More to the point, perhaps experience is less critical than one might expect, as recent literature has shown that a significant fraction of high-grade lesions are present but missed by the colposcopist during the initial biopsy assessment. For example, in the US National Cancer Institute ASCUS LSIL Triage Study (ALTS) approximately 50% of lesions were missed by what was presupposed to be the gold standard of cervical assessment, namely immediate colposcopy by a cadre of colposcopists all of whom were trained and certified by the American counterpart of the BSCCP (The ASCUS-LSIL Triage Study (ALTS) Group, Am J Obstet Gynecol 2003;188:1383–92). Clearly, there are a number of variables that contribute to the sensitivity of colposcopy. 
Besides experience, the criteria used to develop one’s colposcopic impression, the age of the patient, and the size of the lesion are all important variables in determining colposcopic accuracy (Massad LS, et al. J Low Genit Tract Dis 2009;13:137–44). In addition, the number of biopsies taken by the colposcopist is emerging as a crucial variable in determining colposcopic sensitivity. In ALTS, as well as other recent studies, the number of biopsies seems to trump most of the other variables. Indeed when most other factors are controlled, nurse practitioners in ALTS performed just as well as experienced gynaecological oncologists as long as the number of biopsies taken was at least three during a colposcopic session (Gage JC, et al. Obstet Gynecol 2006;108:264–72).
In summary, while the analysis by Shehmar et al. is a necessary part of test validation, it seems that the BSCCP certifying test, like that of the ASCCP, sets a floor for minimal competence, but this floor does not fix the problems inherent in colposcopy. In fact, the problems may or may not be fixable given the number of variables that impact colposcopic sensitivity.
Disclosure of interest
There were no conflicts of interest or relevant financial disclosures.
MH Stoler Department of Pathology, University of Virginia Health System, Charlottesville, VA, USA
