The mind’s scalpel in surgical education: a randomised controlled trial of mental imagery


R Geoffrion, St Paul’s Hospital, 1190 Hornby Street, 4th floor, Vancouver, BC V6Z 2K5, Canada.


Please cite this paper as: Geoffrion R, Gebhart J, Dooley Y, Bent A, Dandolu V, Meeks R, Baker K, Tang S, Ross S, Robert M. The mind’s scalpel in surgical education: a randomised controlled trial of mental imagery. BJOG 2012;119:1040–1048.

Objective  To evaluate the role of mental imagery (MI) in resident training for a complex surgical procedure.

Design  Randomised controlled trial.

Setting  Eight centres across Canada and the USA.

Population  Junior gynaecology residents who had performed fewer than five vaginal hysterectomies (VH).

Methods  After performing a pretest VH, junior gynaecology residents were randomised to standard MI versus textbook reading (No MI) and then performed a test VH. Surgeons blinded to group evaluated resident performance on the pretest and test VH via global rating scales (GRS), procedure-specific scales and intraoperative parameters. Residents evaluated their own performance.

Main outcome measure  Change in surgeon GRS score from pretest to test VH. The study was powered to detect a 20% difference in score change.

Results  Fifty residents completed the trial (24 MI, 26 No MI). There was no difference in GRS score change via blinded assessment from pretest to test evaluation between groups (mean change 13% [SD 17] versus 7% [SD 14], P = 0.192). There was no difference in procedure-specific score change. There was a significant difference in self-scored GRS score change between groups (mean change 19% [SD 12] versus 9% [SD 11], P = 0.005). Residents also felt more confident performing a VH (mean change 19% [SD 16] MI versus 11% [SD 10] No MI, P = 0.033).

Conclusions  No difference was observed in the surgical performance of residents after MI. Improved resident self-confidence may be attributable to MI or the effect of unblinding on trial participants.


Mental imagery (MI) is the ‘cognitive rehearsal of a task in the absence of overt physical movement’.1 When applied to learning of a motor task, it activates synaptic connections in the motor frontal cortex that are normally activated by actual physical practice.2 Neuroplasticity has been shown in other traditionally unimodal cortical areas such as the auditory and visual cortices.3,4 Kosslyn et al.4 demonstrated that the primary visual cortex of blindfolded participants was activated during mental visualisation, similar to actual visualisation. Mental imagery has proven applications in high-performance athletic training and post-stroke rehabilitation programmes.5–8 It has been shown to improve perceptual learning on visual tasks such as spatial discrimination and contrast detection.9 In surgery, MI as a technique of motor learning and practice has the theoretical advantages of being safe and inexpensive when compared with traditional learning in the operating theatre or in a simulation environment.1 In addition, it has been shown to improve nontechnical aspects of surgical performance such as stress10 and confidence.1 Mental imagery is usually performed one on one with the learner and according to a script that contains visual, cognitive and kinaesthetic cues.1 Repetitive individual MI rehearsal can then be undertaken.11 More than a simple enumeration of procedural steps, imagery of one or more surgical tasks encourages the learner to imagine the actual movements involved in the execution of each step.1,12

Our multicentre randomised controlled trial tested the hypothesis that MI enhances the surgical skill of novice surgeons for the performance of a vaginal hysterectomy (VH) in the operating theatre when compared with a control group of novice surgeons reading about a VH in a standard gynaecological textbook.


The use of MI for learning VH was investigated in a multicentre randomised controlled trial with a parallel design and an allocation ratio of 1:1. Learners were randomly allocated to training via standard MI (MI group) or reading a surgical gynaecology textbook (No MI). The main outcome was change in surgical performance from pretest VH to test evaluation after intervention (MI or standard textbook). Data were collected between 2008 and 2011. Local research ethics board approval was obtained from each participating centre.


Eight academic centres in Canada and the USA participated. Surgical training differs between the two countries in that Canadian gynaecology residencies are 5 years in duration, whereas US programmes are 4 years. In Canada, vaginal surgical skill acquisition usually starts towards the end of the second and the beginning of the third year of residency, and often extends to the fourth and fifth years. This can vary from programme to programme depending on available surgical volume. In the USA, on the other hand, vaginal surgery skill acquisition is typically complete by the third year.

Recruitment and eligibility

Residents were recruited, by principal investigators at each centre, at the start of their learning curve for VH. For MI to work as a teaching intervention, experts recommend some prior knowledge of the procedure to be learned.13 We recruited residents who had observed at least one but performed fewer than five VH on their own because the learning curve for laparoscopic-assisted vaginal hysterectomy is five to 30 such procedures.14,15 To avoid possible resident coercion, the residency programme director was not eligible to participate as an evaluator of resident surgical performance.


Residents were allocated to one of two groups via a computer-generated list of random numbers, stratified by centre. Five sequentially numbered sealed opaque envelopes were initially distributed to centres. As the envelopes were used for new recruits, they were replaced by mail with further sequentially numbered envelopes.
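As an illustration only, the allocation procedure described above could be sketched as follows. The trial reports a computer-generated random list stratified by centre and sequentially numbered envelopes; the use of permuted blocks, the block size of four, the seed and the function name `allocation_list` are assumptions made for this sketch and are not details reported by the trial.

```python
import random


def allocation_list(centres, n_per_centre, block_size=4, seed=2008):
    """Return {centre: [(envelope_number, arm), ...]} with 1:1 allocation.

    Assumption: permuted blocks within each centre keep the two arms
    balanced; the paper only states that the list was stratified by centre.
    """
    if block_size % 2:
        raise ValueError("block_size must be even for a 1:1 ratio")
    rng = random.Random(seed)  # fixed seed so the list is reproducible
    schedule = {}
    for centre in centres:
        sequence = []
        while len(sequence) < n_per_centre:
            block = ["MI", "No MI"] * (block_size // 2)
            rng.shuffle(block)  # random order within each block
            sequence.extend(block)
        # Envelopes are numbered sequentially within each centre.
        schedule[centre] = [
            (number, arm)
            for number, arm in enumerate(sequence[:n_per_centre], start=1)
        ]
    return schedule


# Hypothetical centre names, eight envelopes each.
schedule = allocation_list(["Centre A", "Centre B"], n_per_centre=8)
```

Each centre's list would then be sealed into opaque envelopes in envelope-number order, so that the next allocation cannot be predicted at the time of recruitment.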


All residents underwent the usual surgical training, with the addition of the trial intervention as allocated by randomisation.

MI group

The MI teaching intervention was standardised via an MI script developed by the principal investigator (RG). The standard MI script enumerated the procedure steps based on a reference surgical gynaecology textbook,16 with the addition of visual, cognitive and kinaesthetic performance details. An example of a visual detail is: ‘Identify the orientation of the uterosacral ligaments in three dimensions, 45 degrees downward and lateral from the posterior cervicovaginal junction.’ An example of a cognitive detail is: ‘Local anaesthetic infiltration facilitates dissection in the proper plane and acts as an internal tourniquet.’ An example of a kinaesthetic detail is: ‘Use your scalpel perpendicular to the tissues and with brief strokes.’ A DVD of an MI session was recorded at the coordinating site and mailed to each participating site to standardise the intervention and train a dedicated MI educator at each site. Following the pretest VH, each resident randomised to the MI arm performed the MI sessions one on one with the designated MI educator at each site. The MI residents then continued individual performance of MI for VH until they felt comfortable with the procedure ‘in their mind’s eye’ and before the test VH. The day before the test VH, residents met with the dedicated MI educator for another standard MI session.

Control group (No MI)

After performing an initial pretest VH, residents in the control group were encouraged to read the textbook chapter describing VH,16 but were given no additional training for the study. They then performed a test VH.

Data collection

In both groups, demographic data such as age, gender, resident level, number of VH observed and number of VH performed were recorded.

Evaluation of surgical skills and confidence

Evaluation of surgical skills was carried out by the gynaecological surgeons supervising the residents in the performance of VH in the operating theatre. Evaluators did not participate as MI educators, and were blinded to the resident’s allocated group.

All performance evaluations were administered with each VH, that is an evaluation during the VH at baseline (pretest), and again during the test VH after the intervention. Outcomes were calculated as a change in scores from performance on the pretest baseline VH to performance on the test VH.

The primary outcome was the change in validated global rating scale score (GRS) of surgical skill.17,18 The GRS scored seven components of surgical performance using five-point Likert scales. Maximum attainable score was 35. Evaluators also recorded resident pretest and test performances on a rating scale of procedure-specific steps.19 This scale was developed by expert consensus specifically for VH. It served to evaluate knowledge of and adherence to ten necessary steps of the procedure. For each step, the possible answers (‘No’, ‘Prompted’ and ‘Yes’) were equivalent to 0, 1 and 2 points, respectively. Maximum score attainable was 20.
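The scoring rules above can be expressed as a short sketch. The point values and maxima are taken from the text; the function names and the assumption that each GRS item is rated from 1 to 5 are illustrative and are not stated in the paper.

```python
# Points for each procedure-specific step, as described in the text.
STEP_POINTS = {"No": 0, "Prompted": 1, "Yes": 2}

GRS_ITEMS = 7         # seven components of surgical performance
GRS_MAX = 35          # 7 items x 5 points
PROCEDURE_STEPS = 10  # ten necessary steps of the procedure
PROCEDURE_MAX = 20    # 10 steps x 2 points


def grs_total(ratings):
    """Sum seven five-point Likert ratings (assumed to range from 1 to 5)."""
    if len(ratings) != GRS_ITEMS:
        raise ValueError("expected seven GRS ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")
    return sum(ratings)


def procedure_total(answers):
    """Sum the points for the ten procedure-specific steps."""
    if len(answers) != PROCEDURE_STEPS:
        raise ValueError("expected ten step answers")
    return sum(STEP_POINTS[a] for a in answers)


def percent(score, maximum):
    """Express a raw score as a percentage of the maximum attainable."""
    return 100 * score / maximum


def change_score(pretest, test):
    """Outcome used in the trial: test score minus pretest score."""
    return test - pretest
```

For example, a resident rated 3 on every GRS item would have a raw score of 21, or 60% of the maximum of 35.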

Residents in each group also scored their own performance using the GRS on pretest and test VH. In addition, residents used a self-confidence scale to rate their level of confidence with pretest and test VH.20 This scale was partially validated in a randomised controlled trial of MI in medical students learning basic surgical skills.20 Seven Likert-scale-type questions were used with a maximum attainable score of 35.

In addition to the attending surgeon evaluations and the resident self-evaluations, intraoperative parameters of time in operating theatre, blood loss and complications were recorded for pretest and test VH.

Sample size

A priori sample size calculation showed that 50 learners would be required to find a 20% improvement in GRS from pretest to test VH, with 25% standard deviation (to account for the variation in surgical skill of residents), 80% power and a significance level of 0.05. Using the GRS, Goff et al.18 established significant differences in scores between gynaecology residents of different levels performing complex pelvic operations on live pig models, and so demonstrated the construct validity of the scale. Based on these studies and on expert opinion, we determined that an improvement in score of 20% on the GRS would warrant acceptance of MI for resident teaching.
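The arithmetic behind this figure can be checked with a minimal sketch. It assumes the usual normal-approximation formula for comparing two means with a two-sided test; the paper does not state which formula or software was used, and the function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Learners needed per arm to detect a mean difference `delta`.

    Assumption: normal-approximation formula
    n = 2 * ((z_(1 - alpha/2) + z_power) * sd / delta) ** 2,
    rounded up to the next whole learner.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Values from the text: 20% difference, 25% standard deviation.
per_arm = n_per_group(delta=20, sd=25)
total = 2 * per_arm
```

Under these assumptions the formula gives 25 learners per arm, which matches the reported total of 50.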

Statistical analyses

Per protocol analysis included only those residents who obtained both the pretest and test VH performance assessments and received the allocated intervention (standard MI or No MI). Baseline characteristics were summarised using descriptive statistics.

Results from the GRS and the self-confidence rating scale were converted to percentage scores out of 35, and procedure-specific rating scale results were converted to percentage scores out of 20. Pretest to test VH change scores for the primary and secondary outcomes were compared between the MI and No MI groups using the Student's t test. Change score differences between groups, along with their 95% confidence intervals, were reported. The Satterthwaite approximation for P-values and 95% confidence intervals (95% CI) were reported if the folded F-test indicated unequal variances.
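The choice between the pooled Student's t-test and the Satterthwaite approximation can be illustrated from summary statistics alone. This is a sketch of the standard formulae, not the SAS procedure used in the trial; the function names are hypothetical, and P-values are omitted because they require the t distribution, which is not in the Python standard library.

```python
import math


def folded_f(sd1, sd2):
    """Folded F statistic: larger sample variance over the smaller."""
    larger, smaller = max(sd1, sd2), min(sd1, sd2)
    return (larger / smaller) ** 2


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom (equal variances)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic with Satterthwaite degrees of freedom (unequal variances)."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df
```

When the two groups have equal size and equal standard deviation, both functions return the same t statistic and the Satterthwaite degrees of freedom reduce to n1 + n2 − 2. Applying them to the rounded summary values published for this trial would only approximate the reported results.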

Additional analyses were undertaken to examine the change in raw scores on individual questions on the GRS by attending physicians, along with total scores for manual dexterity measures (questions 1, 2 and 3 on the scale) and cognitive measures (questions 4, 5, 6 and 7). These were compared between groups using the t-test. Two-way analysis of variance on change scores for the GRS, procedure-specific rating scale and self-confidence rating scale was used to examine the effects of MI after controlling for resident level. Results from the type III sum of squares were reported. Data were analysed using SAS version 9.2 (SAS Institute Inc., Cary, NC, USA). Missing values were excluded from statistical tests.


Between April 2008 and March 2011, 82 residents were recruited from the eight academic centres and assessed for eligibility. The CONSORT21 flow diagram of participant recruitment and follow up is shown in Figure 1. Of the 79 eligible residents who consented to join the study, 39 were randomised to the MI intervention group (MI), and 40 to the control group (No MI). Fifteen in the MI group and 14 in the No MI group were lost to follow up or did not follow through to obtain the assessments or intervention (Figure 1). Results from a total of 24 in the MI group and 26 in the No MI group were analysed.

Figure 1. CONSORT flow diagram.

Baseline characteristics of residents in the MI group and the No MI group are presented in Table 1. Median age of residents was 28 years for both groups, with 13 (54%) in the MI group and 16 (62%) in the No MI group in their first or second years of residency. Most residents had observed or assisted in fewer than ten vaginal hysterectomies: 15 (63%) in the MI group and 14 (54%) in the No MI group. One half of the residents had not previously performed any vaginal hysterectomies: 13 (54%) in the MI group and 12 (46%) in the No MI group.

Table 1. Baseline characteristics of residents

| Characteristics | MI group (n = 24) | No MI group (n = 26) |
|---|---|---|
| Age (years); median (IQR) | 28 (3) | 28 (2) |
| Gender; n (%) | | |
| Male | 6 (25) | 7 (27) |
| Female | 18 (75) | 19 (73) |
| Resident level; n (%) | | |
| 1 | 5 (21) | 6 (23) |
| 2 | 8 (33) | 10 (38) |
| 3 | 10 (42) | 9 (35) |
| 4 | 1 (4) | 1 (4) |
| No. of VH observed/assisted; n (%) | | |
| Fewer than 10 | 15 (63) | 14 (54) |
| 10–49 | 9 (38) | 11 (42) |
| 50–100 | 0 (0) | 1 (4) |
| No. of VH performed; n (%) | | |
| 0 | 13 (54) | 12 (46) |
| 1 | 2 (8) | 6 (23) |
| 2 | 3 (13) | 2 (8) |
| 3 | 2 (8) | 4 (15) |
| 4 | 4 (17) | 2 (8) |
| Days between pretest and test VH; n (%) | | |
| 0–14 | 10 (42) | 17 (65) |
| 15–30 | 6 (25) | 4 (15) |
| 31–90 | 4 (17) | 2 (8) |
| 91 or more | 4 (17) | 3 (12) |

IQR, interquartile range.

Table 2 shows the results of the pretest and test VH assessments, along with the change in scores. This small trial showed only a small, statistically non-significant improvement in objective surgical performance of novice gynaecological surgeons after MI: the mean improvement in the primary outcome (change in GRS score recorded by the attending physician) was 13% (SD 17) in the MI group versus 7% (SD 14) in the No MI group (mean difference 6%; 95% CI −3% to 15%; P = 0.192). Table 3 also shows no difference in change on individual items of the GRS between groups. There was no statistically significant difference between groups for the secondary outcome of change in procedure-specific scale score as assessed by the attending physician (mean 14% improvement for the MI group compared with 10% for the No MI group, P = 0.320).

Table 2. Vaginal hysterectomy assessments

| Outcomes | MI group (n = 24), mean (SD)*: raw score | MI group: % score | No MI group (n = 26), mean (SD): raw score | No MI group: % score | Difference in change scores, MI − No MI (95% CI): raw score | Difference in change scores: % score | P-value** |
|---|---|---|---|---|---|---|---|
| Primary outcome | | | | | | | |
| GRS by attending physician*** | | | | | | | |
| Pretest | 19.8 (6.0) | 56 (17) | 21.7 (6.1) | 62 (17) | | | |
| Test | 23.9 (6.0) | 68 (17) | 23.8 (5.6) | 68 (16) | | | |
| Test–Pretest change | 4.4 (5.8) | 13 (17) | 2.4 (4.8) | 7 (14) | 2.0 (−1.1 to 5.1) | 6 (−3 to 15) | 0.192 |
| Secondary outcomes | | | | | | | |
| Procedure-specific score by attending physician**** | | | | | | | |
| Pretest | 15.7 (3.2) | 79 (16) | 16.9 (3.4) | 84 (17) | | | |
| Test | 18.4 (1.9) | 92 (10) | 18.8 (1.5) | 94 (10) | | | |
| Test–Pretest change | 2.7 (2.7) | 14 (13) | 1.9 (2.8) | 10 (14) | 0.8 (−0.8 to 2.4) | 4 (−4 to 12) | 0.320 |
| Global rating score by resident | | | | | | | |
| Pretest | 17.9 (4.4) | 51 (13) | 18.8 (5.6) | 54 (16) | | | |
| Test | 24.5 (4.3) | 70 (12) | 22.0 (4.9) | 63 (14) | | | |
| Test–Pretest change | 6.6 (4.3) | 19 (12) | 3.2 (3.7) | 9 (11) | 3.4 (1.1 to 5.6) | 10 (3 to 16) | 0.005 |
| Self-confidence rating score by resident***** | | | | | | | |
| Pretest | 18.5 (4.3) | 53 (12) | 19.6 (4.2) | 56 (12) | | | |
| Test | 25.2 (4.0) | 72 (11) | 23.2 (4.2) | 66 (12) | | | |
| Test–Pretest change | 6.7 (5.6) | 19 (16) | 3.7 (3.6) | 11 (10) | 3.0 (0.3 to 5.7) | 9 (1 to 16) | 0.033 |
| Time in operating theatre (minutes)****** | | | | | | | |
| Pretest | 82 (80) | | 74 (45) | | | | |
| Test | 67 (30) | | 79 (53) | | | | |
| Test–Pretest change | −18 (66) | | 3 (36) | | −21 (−54 to 12) | | 0.202 |
| Blood loss (ml) | | | | | | | |
| Pretest | 219 (167) | | 208 (133) | | | | |
| Test | 228 (165) | | 188 (134) | | | | |
| Test–Pretest change | 9 (235) | | −19 (185) | | 29 (−91 to 148) | | 0.633 |

*Percentage scores out of 35 for the global rating scale and self-confidence rating scale, and out of 20 for the procedure-specific scale.
**t-test comparing change in measurements from pretest to test.
***Excludes n = 1 in the MI group and n = 1 in the No MI group where the test scale was incomplete.
****Excludes n = 2 in the No MI group where the pretest scale was incomplete.
*****Excludes n = 1 in the No MI group where the test scale was incomplete.
******Excludes n = 1 in the MI group where both the pretest and test times are missing, n = 2 in the MI group where the test time is missing, and n = 1 in the No MI group where the test time is missing.

Table 3. Change in GRS scores by attending physician

| Individual question | MI group, mean (SD) (n = 24) | No MI group, mean (SD) (n = 26) | Difference in change scores, MI − No MI (95% CI) | P-value* |
|---|---|---|---|---|
| (1) Respect for tissue | 0.4 (0.8) | 0.0 (0.9) | 0.3 (−0.2 to 0.8) | 0.180 |
| (2) Time and motion | 0.6 (0.8) | 0.4 (0.9) | 0.2 (−0.3 to 0.7) | 0.343 |
| (3) Instrument handling | 0.6 (1.0) | 0.2 (0.8) | 0.4 (−0.1 to 0.9) | 0.135 |
| (4) Knowledge of instrument | 0.6 (1.1) | 0.5 (1.1) | 0.2 (−0.5 to 0.8) | 0.670 |
| (5) Use of assistants | 0.7 (1.0) | 0.4 (0.9) | 0.3 (−0.3 to 0.8) | 0.312 |
| (6) Flow of operation | 0.6 (1.4) | 0.4 (0.9) | 0.2 (−0.4 to 0.9) | 0.502 |
| (7) Knowledge of specific procedure | 1.2 (1.3) | 0.5 (1.3) | 0.6 (−0.1 to 1.4) | 0.100 |
| Manual dexterity measures (1) + (2) + (3) | 1.6 (2.2) | 0.6 (2.2) | 1.0 (−0.3 to 2.2) | 0.125 |
| Cognitive measures (4) + (5) + (6) + (7) | 2.8 (3.9) | 1.7 (3.1) | 1.1 (−1.0 to 3.1) | 0.302 |

*t-test comparing change in measurements from preintervention to postintervention.

Resident self-assessed changes in GRS score and self-confidence rating scale score were both significantly greater in the MI group than in the No MI group. The GRS score increased by a mean of 19% (SD 12) for the MI group compared with 9% (SD 11) for the No MI group (P = 0.005), and the self-confidence rating scale score improved by a mean of 19% (SD 16) for the MI group and 11% (SD 10) for the No MI group (P = 0.033). There were no differences between groups for time in the operating theatre, blood loss or intraoperative complications.

Two-way analysis of variance showed that there were no differences in change scores of the assessments after controlling for resident level (levels 1 and 2, junior, versus levels 3 and 4, senior). The significant improvements in self-rated GRS score and self-confidence rating scale score persisted after controlling for resident level (F(1,46) = 8.93, P = 0.005; and F(1,45) = 6.31, P = 0.016 for change in GRS and self-confidence scale scores, respectively). Resident level had an effect only on the procedure-specific scale score by attending physician, where junior residents improved by a mean of 17% (SD 13) compared with 4% (SD 10) for more senior residents (F(1,44) = 13.67, P = 0.001). There were no significant interactions between resident level and any of the scale scores.

Discussion and Conclusion

Our randomised controlled trial of MI versus usual teaching techniques for novice surgeons learning to perform a VH was the first to evaluate this teaching technique in complex gynaecological surgery. Our study did not demonstrate a significant improvement in surgical skills as rated via a validated scale. Residents in both groups had similar but nonsignificant improvement from the pretest to the test VH performed. On the other hand, resident self-assessment of surgical skill demonstrated a significant improvement in the MI group, along with increased self-confidence.

This trial was designed to provide the best available level of evidence and, as such, has notable strengths. Our findings are likely to be generalisable because we involved participation of eight centres across Canada and the USA. Most residents allocated to a group completed the study and their baseline demographics were similar. An extensively validated assessment tool with good inter-rater reliability was used in the evaluation of the primary outcome. The attending surgeons evaluating the residents were blinded to group allocation, so limiting bias in performance rating. The MI instruction was performed according to a standard script and we tried to minimise the variability in the instruction at different centres by providing a DVD to train the instructors.

This trial also has limitations. During planning of this trial, the sample size calculation was challenging, as previous studies using GRS did not estimate what change in score would represent clinically relevant learning for surgical trainees. Our sample size was therefore based on the change scores observed in previous studies.17–19 In our study, residents in the control group had similar improvement to the MI group residents. Actual practice on one VH in the operating theatre, with usual intraoperative teaching and supervision, may be as valuable as the performance of MI and may have accounted for the noted improvement in the control group performance. Despite efforts at standardisation, MI instructors at different centres may have provided heterogeneous training. The assessors were blinded to group and residents were instructed not to discuss the group allocation and the intervention, yet residents may have given clues to their assigned group and so influenced their assessment. In addition, MI may be a type of preparation automatically performed by all, even by the residents without standard MI instruction in the control group. These control residents may have been performing MI unknowingly while reading the assigned textbook chapter on VH. We instructed the MI residents to perform MI on their own until they felt comfortable with the procedure because we did not know how many MI sessions are needed to improve performance. For some residents, the individual (unsupervised) MI sessions may not have been sufficient or adequately performed to improve surgical skill. We recognise that our resident groups were heterogeneous, with different levels of residents taking part in the trial. On the other hand, residents in our trial were at similar points on the VH learning curve. It is impossible to account for all individual variations in baseline technical skills. We also did not want to compromise the generalisability of results by excluding certain residencies where VH is taught earlier or later than average. Our randomised trial design ensured that residents from various levels were equally represented between the two groups.

In a randomised controlled trial by Komesu et al.,22 junior residents improved in the performance of a cystoscopy after MI when compared with a control group. Scores were 15.9% higher than controls (P = 0.03). Cystoscopy is a simpler operation, with significantly fewer steps than a VH, requiring less manual dexterity and cognitive involvement over a shorter time. It may be that MI is beneficial for improvement in skill on simpler procedures. After performance of one cystoscopy by both groups, the advantage of the MI did not persist for the performance of a second cystoscopy. It was felt that the actual execution of the cystoscopy contributed to surgical learning more than the MI session. This finding was consistent with our own trial findings of no improvement on the second performed procedure despite MI.

Another randomised controlled trial of novice surgeons performing basic laparoscopic training for cholecystectomy on a simulator did not show any advantage of MI as assessed via global rating scale.12 Their trial design was similar to ours, with testing before and after the intervention and learning of a complex surgical procedure. A procedure-specific checklist score did improve with the performance of MI over no instruction, suggesting that MI may benefit the more cognitive aspects of surgical performance (knowing the steps of the operation).12 Our trial did not show any improvement in the procedure rating scale scores. This may be because all our participants had access to the rating scales used in the study and may have read or rehearsed the required operative steps on their own before performing the VH.

Although the scores from assessment tools were ordinal in nature, we chose to report and analyse the data as though they were continuous scores, in keeping with other studies using similar scales.18,20 As a sensitivity analysis, we repeated the analyses comparing the change scores between groups using the Wilcoxon rank sum test, and the conclusions did not change.
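A sensitivity analysis of this kind can be sketched with the standard library alone. The version below uses midranks for ties and a normal approximation without continuity correction; these choices and the function name `rank_sum_test` are assumptions, and statistical packages may apply a continuity correction or exact methods and so give slightly different P-values.

```python
import math
from statistics import NormalDist


def rank_sum_test(x, y):
    """Wilcoxon rank sum test for two independent samples.

    Returns (W, z, two-sided P), where W is the rank sum of sample x.
    Assumption: normal approximation with tie-corrected variance.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        ties = j - i
        tie_term += ties**3 - ties
        i = j
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    expected = n1 * (n + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # every observation tied: no evidence either way
        return w, 0.0, 1.0
    z = (w - expected) / math.sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p
```

Calling `rank_sum_test(mi_changes, no_mi_changes)` on two lists of per-resident change scores (hypothetical variable names) would return the rank sum for the first group, the z statistic and a two-sided P-value.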

There was clear improvement in assessment of the residents’ own performance and self-confidence after MI. The MI may be a tool to reduce learners’ stress and surgical performance anxiety. On the other hand, residents in the MI group were aware of the intervention and knew they would probably perform better. The significant improvement in their self-assessed performance may therefore be just an effect of unblinding. This finding is also in keeping with previous research suggesting that physician self-assessment can be discrepant with assessment by other observers and that physicians of lesser skill have the worst accuracy of self-assessment.23 Inducing an expectation of positive performance in cognitively complex tasks in undergraduate students produced overconfidence but did not increase effort or attention to strategy.24 On the other hand, one is unlikely to take action if one doubts one’s own capabilities.25 Mental practice is an effective stress management tool for novice surgeons.10 Whether this translates into significant learning benefits long-term remains to be further investigated. As for the role of MI in the enhancement of surgical skill, perhaps the surgical task needs to be broken down into individual simpler steps, such as the loading of a needle on a driver. Each step can then be rehearsed repetitively via MI and further research can determine how simple the step needs to be and how many times it needs to be rehearsed before an improvement can be demonstrated.

In conclusion, this small trial did not show an improvement in objective surgical performance of novice gynaecological surgeons after MI. The improvements in self-assessment of surgical skills and confidence suggest that MI may have an impact that could translate later into improved surgical performance. In its current form, MI cannot be recommended as a teaching tool for VH despite being an easily implementable strategy. Surgical educators may consider using mental imagery to improve learners’ self-confidence while performing complex surgical procedures. With modification and further evaluation, MI could yet prove to be a useful addition to the surgical teaching armamentarium.

Disclosure of interests

RG, VD, RM, KB, ST, SR and MR have no conflicts of interest. JG is a consultant for Ethicon, Coloplast, CP Medical and Boston Scientific and researcher with American Medical Systems Inc.; YD has links with Pfizer Speaker Bureau and the US Armed Forces; and AB is preceptor and proctor for Bard Canada Inc. and Ethicon.

Contribution to authorship

RG, SR and MR designed this trial and secured funding for its conduct; they also prepared and revised the final manuscript for publication. ST provided invaluable assistance with statistical concepts, data analysis and drafting of the Methods and Results sections. JG, YD, AB, VD, RM and KB contributed to the design of the trial by reviewing assessment tools, training to become educators in mental imagery and reviewing the final manuscript for publication. All authors accept responsibility for this version of the manuscript.

Details of ethics approval

This trial received ethics approval from the main site in Calgary as well as from the other Ethics Boards at the participating institutions. The Calgary Research Ethics Board approval number was 21464 and approval date was 3 April 2008.


An American Urogynecologic Society (AUGS) seed grant of $5,000 was awarded in 2007.



Commentary on ‘The mind’s scalpel in surgical education: a randomised controlled trial of mental imagery’

In the UK and other countries, changes in the working hours of doctors in postgraduate training and increasing use of less invasive treatment modalities have reduced the surgical experience available during training in gynaecology. An educational technique that could improve surgical competence at minimal cost would be welcome. In this paper the authors used a randomised controlled trial to investigate the role of mental imagery to improve competence at vaginal hysterectomy. The use of randomised controlled trials in evaluating educational interventions is relatively recent. Five years ago Cook et al. (Med Educ 2007;41:737–45) evaluated the quality of reporting of a selection of experimental studies in the medical education literature and noted that overall the quality was poor. Only 18 of the 105 studies they analysed reported randomisation and fewer than half of the studies mentioned ethical committee approval or participant consent. Although simulation training is widely promoted to improve team working in obstetric emergencies, Merien et al. (Obstet Gynecol 2010;115:1021–31) were only able to identify eight studies, of which four were randomised controlled trials. There has been greater use of this methodology in assessing virtual reality training in laparoscopic surgery, with Gurusamy et al. (Cochrane Database Syst Rev 2009) identifying 23 trials with 612 participants, but only three trials had a low risk of bias.

The use of an appropriate outcome measure is challenging. The authors have used a well-validated score to assess the performance of trainees before and after the intervention. Clinical outcome measures such as complication rates are more important, but impossible to use in studies evaluating supervised operators because of the likely intervention of the supervisor. As it took 3 years for eight centres to recruit and follow up 50 residents, a larger sample size would not be feasible. Although residents were instructed not to discuss their group allocation, it is possible that over the course of the study period residents heard from each other the technique of mental imaging. A way of reducing this risk and reducing the risk of introducing bias into the assessment would have been with the use of cluster randomisation. However, this would require the recruitment of a greater number of centres to achieve a reasonable power and may have created more problems with the heterogeneity of instruction of mental imaging.

Ethical considerations are as important in educational interventions as in clinical studies. The authors of this paper obtained ethical committee approval from all the recruiting centres and obtained consent from the residents for participation. It was important that the authors ensured that the residents felt no coercion to be in the study by not allowing the Training Programme Directors to recruit; despite this, only three of 82 residents declined consent, which minimised the risk of volunteer bias. By contrast, Callahan et al. (Med Educ 2007;41:746–53) had 25% of residents decline permission to be involved in an educational research project and found a significant difference between the characteristics of volunteers and nonvolunteers.

Disclosure of interest

I have no conflict of interest to declare.

E Hawkins

STC Chair NE Thames, Queen’s Hospital, Romford, UK