Test-enhanced learning may be a gender-related phenomenon explained by changes in cortisol level


Charles Kromann, Centre for Clinical Education, Rigshospitalet 5404, Blegdamsvej 9, 2100 Copenhagen E, Denmark. Tel: 00 45 42 44 29 05; Fax: 00 45 35 45 44 37; E-mail: charles.kromann@gmail.com


Medical Education 2011: 45: 192–199

Context  Testing increases memory of a topic studied more than additional study or training. The mechanisms by which this occurs are not clearly understood. Testing can be stressful and studies suggest that the stress hormone cortisol has modulating effects on memory, predominantly in men. The aim of this study was to investigate whether cardiopulmonary resuscitation (CPR) skills testing induces a cortisol increase, whether the cortisol increase enhances retention of CPR skills, and how this relates to gender.

Methods  We randomised a convenience sample of medical students attending a mandatory course to one intervention and one control group. Students received a 4-hour course on CPR skills. During the final half-hour of the intervention course, participants were tested in CPR scenarios, whereas the control group underwent additional training. We assessed learning outcomes 2 weeks later by rating student performance in a CPR scenario using a checklist and a single blinded assessor. We measured salivary cortisol pre-course, half an hour before the end of the course and post-course, and compared learning outcomes and cortisol responses between groups and genders.

Results  In total, 146 of 202 (72%) students completed the study. We found a significant difference in learning outcome between the intervention and control groups for both genders (mean ± standard deviation, 5.0 ± 3.5; p = 0.006). We found a significant effect of increase in cortisol on learning outcome in men. The correlation between learning outcome and cortisol increase was medium to large for men (r = 0.38), but not for women (r = − 0.05).

Conclusions  Cardiopulmonary resuscitation skills testing induces a rise in cortisol in men, which is related to the better retention of skills in men. Cortisol modulates test-enhanced learning in men.


Test-enhanced learning relates to the fact that testing increases memory of a topic studied more than additional study or training. This phenomenon pertains to both knowledge1 and skills2 learning. Although test-enhanced learning is a well-studied and robust phenomenon, the mechanism behind it is not clearly understood.

Roediger and Karpicke suggest that complex mechanisms known from psychonomics, such as transfer-appropriate processing and elaborate retrieval, are possible explanations. Transfer-appropriate processing suggests that if the testing situation and thus processing are sufficiently similar to a later transfer situation, retrieval is facilitated. Elaborate retrieval suggests that the harder it is to retrieve a specific element from memory in the intervention situation, the easier it is to retrieve in an assessment or transfer situation later.1 However, it is possible that these mechanisms do not entirely explain test-enhanced learning. In our earlier studies of the testing effect on skills learning,2,3 we assumed that the acute stress that pertains in the testing situation could be a factor of importance.

A well-known physiological response to acute stress is the release of cortisol from the adrenal cortex. The literature on the effect of cortisol on learning is extensive, in both animal models and human studies. Researchers in neurobiology, learning and cognition describe the effect of the stress hormone cortisol on memory.4–6 An extensive meta-analysis by Dickerson and Kemeny concluded that the social-evaluative threat and lack of control experienced by the testee when being tested will cause medium to large increases in cortisol.7 Therefore, it is possible that an elicited cortisol response might contribute to test-enhanced learning.

However, the effects of increased cortisol levels can be both positive and negative in terms of learning.8,9 The positive effect of stress-level cortisol is that it facilitates memory consolidation, the process by which a memory is stabilised after its initial acquisition. The negative effect of stress-level cortisol is impairment of memory retrieval10 (i.e. the ability to access stored memories). It has been suggested that, as a result of cortisol, the brain is in learning mode and thus retrieval is impaired.11,12

Testing has been demonstrated to enhance learning outcomes1,2 and to increase cortisol levels.13 However, until now any possible link between test-enhanced learning and cortisol response has not been studied. Hence, the aim of this study was to explore the connection between test-enhanced learning and cortisol response.

It is well known that there are differences between men and women with regard to the effects of cortisol and that men are more susceptible to the effects of cortisol on memory and learning.4–6,14 Given that men and women may have different responses to stress and thus to cortisol levels, we also aimed to investigate differences according to gender.

This study was set in the context of a cardiopulmonary resuscitation (CPR) skills training course for medical students. Our research questions were:

  1. What are the effects of CPR skills testing on learning outcome and cortisol response in men versus women?
  2. Is the testing effect on learning outcome related to cortisol response?



The study was a single-blinded, randomised, controlled study of the effect of CPR skills testing on learning outcome and cortisol response.

Main outcome measures were learning outcome in terms of students’ CPR performance assessed 2 weeks after the course and stress response in terms of changes in cortisol levels before and after testing.


A convenience sample of 202 medical students in semester 7 attending a mandatory course in CPR skills was invited to take part in the study. Students who accepted the invitation were randomised to either the intervention or control group.

Both groups received a 4-hour, small-group practical course on CPR skills, involving basic life support, automatic external defibrillation (AED) and safe defibrillation. The simulations were based on six similar basic cardiac arrest scenarios. However, the last half-hour of the course differed between the groups.

In the intervention group, the last half-hour of the course was used for testing the participants in one of six similar CPR scenarios. They were tested individually for 5 minutes each, while the other participants observed. The instructor conducted the tests using a standardised checklist. Participants were not informed about the test results. As these test scores were not recorded for research use, score sheets were destroyed after the course. When all six 5-minute tests had been completed, the group was given plenary feedback on three or four important issues related to the standard procedures involved in CPR.

In the control group the last half-hour was spent on additional instructor-assisted hands-on training. The students were trained in scenarios similar to the test scenarios used in the intervention group and were given short plenary feedback after each scenario. The classrooms and simulation facilities were the same for both groups during courses, testing and assessments.


Participants were given a random ID number and randomised into 34 groups of six students per group and allocated to either the intervention or the control course. Five minutes before the course began, a research assistant told the instructor and the group whether the group was to be tested. All randomisation sequences and tables were generated using http://www.random.org.
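The allocation step described above can be illustrated with a minimal sketch. It assumes details the text does not state: the study generated its sequences at random.org, whereas this example swaps in Python's pseudo-random generator, and the balanced assignment of groups to arms is an assumption. The function name `allocate` and the seed are hypothetical.

```python
import random


def allocate(participant_ids, group_size=6, seed=None):
    """Shuffle participants, split them into course groups, and assign arms.

    Assumption: arms are balanced across groups; the paper does not say
    how groups were assigned to the intervention or control course.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Chunk the shuffled list into groups of at most `group_size` students.
    groups = [ids[i:i + group_size] for i in range(0, len(ids), group_size)]
    # Build a balanced list of arm labels, then shuffle it.
    arms = (["intervention", "control"] * ((len(groups) + 1) // 2))[:len(groups)]
    rng.shuffle(arms)
    return list(zip(arms, groups))


# 202 invited students yield 34 groups (33 of six and one of four).
allocation = allocate(range(1, 203), seed=2011)
```

With 202 participants and groups of six, the sketch produces 34 groups, which matches the number reported above.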

Assessment of learning outcome

Learning outcome was defined as CPR performance assessed 2 weeks after the course. A single blinded assessor conducted all assessments. The assessor was an experienced CPR instructor, who had not taught any of the participants and who had ample prior experience in assessing CPR performance. Before the assessment started, the assessor ran the participant through a set of standardised instructions regarding utensils, drugs and the degree of assistance available during the scenario. Each participant was assessed individually in a simulated cardiac arrest scenario without observers. The assessor introduced the simulated case by stating: ‘You are about to establish an i.v. access when your patient, a 75-year-old man, becomes unresponsive. You are now required to manage this patient.’ Each participant’s CPR performance was assessed using a standard 25-item checklist. Each item was scored on a scale of 0–5 points and summarised scores were converted to percentages. Items were focused on skills such as securing free airways while checking for a pulse and respiration, quality of chest compressions and ventilations, and correct and safe use of a defibrillator. For further details please refer to Kromann et al.,2 in which the checklist appears in the appendix. The checklist was converted from a validated European Resuscitation Council (ERC) Advanced Life Support (ALS) Cardiac Arrest Scenario checklist15 and the content of the checklist was validated by an expert (i.e. an ERC ALS instructor). We examined the internal consistency of the checklist using Cronbach’s alpha.
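As a rough illustration of the scoring and reliability calculations mentioned above, the following sketch converts a 25-item checklist scored 0–5 to a percentage and computes Cronbach's alpha using the standard formula. The function names and the tiny example matrix are hypothetical and are not study data.

```python
from statistics import variance


def percent_score(item_scores, max_per_item=5):
    """Convert one participant's item scores to a percentage of the maximum."""
    return 100 * sum(item_scores) / (max_per_item * len(item_scores))


def cronbach_alpha(score_matrix):
    """Cronbach's alpha for rows of participants and columns of items.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(score_matrix[0])
    item_variances = sum(variance(column) for column in zip(*score_matrix))
    total_variance = variance([sum(row) for row in score_matrix])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Invented scores for four participants on three items (illustration only).
example = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 5, 4]]
alpha = cronbach_alpha(example)
```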

Before the performance assessment all participants were asked if they had rehearsed in the retention period and their data were excluded from analyses if they confirmed that they had. Participants had no access to the simulation laboratory facilities, thus making the rehearsal of skills difficult. Performance assessment of CPR skills was not part of the formal assessment protocol or of examinations in this semester.

Measurement of cortisol

Stress was measured by salivary cortisol levels (SCLs), which correlate well with serum cortisol levels and are easily obtained and analysed.16 Sarstedt’s Salivette sampler is a simple and reliable saliva-sampling device.17 For saliva collection, the participant placed a cotton swab, from a Salivette, in his or her mouth for 30 seconds and then replaced the swab in the Salivette test tube. Samples were labelled with barcodes, centrifuged and frozen at −20 °C until assayed.

In our study, SCL was measured three times in each individual at the following time-points: before the course began (SCL1); half an hour before the end of the course (SCL2), and after the course (SCL3). Post-course SCL (SCL3) was obtained 20–40 minutes after the onset of testing because the cortisol response normally peaks in this interval after a stressful event.16


Saliva samples were assayed in a laboratory at Copenhagen University Hospital ‘Rigshospitalet’. Saliva cortisol concentrations were measured by electrochemiluminescence immunoassay (ECLIA), using a kit sourced from Roche Diagnostics GmbH (Mannheim, Germany). The limit of sensitivity of the assay was 0.5 nmol/L. The inter- and intra-assay coefficients of variation on the assay are < 10% and 5%, respectively, according to the laboratory’s validation report.

Data analysis

Total checklist scores from the assessment of CPR performance were converted to percentages of the maximum possible score.

Cortisol response was calculated as the increase from SCL2 to SCL3. The calculated variable, named deltaSCL, was compared between the intervention and control groups and between men and women using an independent samples t-test and is reported as the mean (standard deviation [SD]). To estimate the magnitude of differences between groups, effect size (ES) calculations were performed using Cohen’s d, with ES values of 0.2, 0.5 and 0.8 designated as small, medium and large ESs, respectively.18
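The effect size calculation can be sketched from summary statistics alone. The example assumes the pooled-standard-deviation form of Cohen's d, which the text does not specify, and applies it to the learning outcome means, SDs and group sizes reported for the whole sample in Table 1. The function names and the "negligible" label are hypothetical additions.

```python
from math import sqrt


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d using the pooled standard deviation (an assumed variant)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var)


def effect_size_label(d):
    """Label |d| against the 0.2 / 0.5 / 0.8 thresholds cited in the text."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"  # label for values below 0.2 is our own addition


# Learning outcome, whole sample: intervention (n = 66) vs control (n = 72).
d_all = cohens_d(70.91, 10.52, 66, 65.43, 9.47, 72)
```

Under this assumption the whole-sample value rounds to the ES of 0.55 reported in Table 1.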

Additionally, to examine the effect of testing in both men and women, and whether deltaSCL had an effect on learning outcome, an analysis of covariance (ANCOVA) was performed. The ANCOVA used learning outcome as the dependent variable, group and gender as independent variables and deltaSCL as a covariate. Results are reported as mean ± standard deviation. After the initial analysis, outliers were removed and the analysis was run again to ensure that the model was robust with regard to outliers. Values of either learning outcome or deltaSCL lying ≥ 3 SD from the mean were considered to represent outliers.
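The derived variable and the outlier rule can be sketched as follows. The example assumes that the sample standard deviation was used and that the rule was applied to each variable separately; neither detail is stated in the text. The function names and cortisol values are invented for illustration.

```python
from statistics import mean, stdev


def delta_scl(scl2, scl3):
    """Cortisol response: increase from SCL2 to SCL3 (nmol/L)."""
    return scl3 - scl2


def flag_outliers(values, threshold=3.0):
    """Return True for each value lying >= `threshold` SDs from the mean."""
    m, s = mean(values), stdev(values)
    return [abs(v - m) >= threshold * s for v in values]


# Invented measurements for 20 participants (illustration only).
scl2 = [6.0] * 20
scl3 = [6.0] * 19 + [16.0]
deltas = [delta_scl(a, b) for a, b in zip(scl2, scl3)]
flags = flag_outliers(deltas)
kept = [d for d, is_outlier in zip(deltas, flags) if not is_outlier]
```

Running the analysis on both the full data and `kept` mirrors the robustness check described above.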

To evaluate the association between learning outcome and change in cortisol level, we calculated Pearson’s r and used the ES indication for correlations, where r-values of 0.1, 0.3 and 0.5 were designated as small, medium and large correlations, respectively.18 Correlations are reported in scatterplots of changes in learning outcome scores by deltaSCL.
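A minimal sketch of the correlation step is given below. It computes Pearson's r from paired observations and labels its magnitude against the 0.1, 0.3 and 0.5 thresholds cited above. The function names and the "negligible" label are hypothetical, and no study data are reproduced.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_label(r):
    """Label |r| against the 0.1 / 0.3 / 0.5 thresholds cited in the text."""
    r = abs(r)
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "negligible"  # label for values below 0.1 is our own addition


def r_squared(x, y):
    """Share of variance explained by a simple linear fit (r squared)."""
    return pearson_r(x, y) ** 2
```

For a straight-line fit with one predictor, the R2 shown on a scatterplot trend line equals the square of Pearson's r for the same data.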

All calculations were performed in SPSS Version 17.0 (SPSS, Inc., Chicago, IL, USA).


All participants received both verbal and written information about the research project and gave written informed consent to participation.

The Biomedical Research Ethics Committee in the Capital Region of Denmark waived the need for full ethical approval. Saliva samples were either maintained under supervision or kept in a securely locked freezer. The data were split into two sets and kept on two separate dedicated restricted access file folders on secure servers at Rigshospitalet. One dataset contained the identity of participants and the other contained the results.

Participants were assured that their personal data would remain anonymous and that no results would be passed on to any third party. Participants were registered in the database with an ID number. Only the project secretary had access to information about individuals and ID numbers. The participants were offered the opportunity to join a mailing list by which they could receive information on the outcome of the study.


All course participants (n = 202) were invited to enrol in the study. A total of 146 (72%) students completed the study. Dropouts included participants who were unable to attend the assessment procedures 2 weeks after the course for personal reasons. None of the participants reported having rehearsed for the performance assessment. The median age of participants was 24 years (range 21–36 years). Data for eight participants were excluded because their datasets included outliers. The resulting study sample included 66 participants in the intervention group (24 men, 42 women) and 72 in the control group (24 men, 48 women).


The internal consistency of the checklist, Cronbach’s alpha, was calculated to 0.74, which is considered adequate for research purposes.19

Learning outcome, deltaSCL and differences between genders

The effects of testing on learning outcome and cortisol are depicted in Table 1. There was a significant effect of testing on learning outcome and deltaSCL in men, but not in women.

Table 1.   Differences between learning outcome and cortisol levels

Measure | Sample (intervention n, control n) | Intervention mean (SD) | Control mean (SD) | p-value | ES
Learning outcome | All (66, 72) | 70.91 (10.52) | 65.43 (9.47) | 0.002 | 0.55
Learning outcome | Men (24, 24) | 73.61 (8.34) | 64.72 (9.32) | 0.001 | 0.99
Learning outcome | Women (42, 48) | 69.37 (11.38) | 65.78 (9.62) | 0.11 | 0.34
DeltaSCL | All (66, 72) | 0.72 (3.54) | −0.08 (2.42) | … | …
DeltaSCL | Men (24, 24) | … (3.06) | 0.08 (2.62) | 0.007 | 0.80
DeltaSCL | Women (42, 48) | −0.18 (3.50) | −0.08 (2.27) | 0.88 | −0.03

SD = standard deviation; ES = effect size

The ANCOVA showed a significant difference in learning outcome between the intervention and control groups (mean difference 5.0 ± 3.5; F[1,137], p = 0.006). With regard to the entire sample, there was no significant interaction between gender and deltaSCL (F[2,137], p = 0.064). However, there was a significant effect of deltaSCL on learning outcome in males (1.0 ± 0.9; p = 0.047). The model was robust with regard to outliers.

For the entire study sample, the correlation between learning outcome and deltaSCL was not significant (r = 0.11, p = 0.18). However, the male group showed a significant positive correlation between learning outcome and deltaSCL (r = 0.38, p = 0.008; ES, medium to large).18 The female group showed no significant correlation (r = −0.05, p = 0.65). The correlations are reported in scatterplots (Fig. 1) showing changes in cortisol levels by learning outcome scores. Separate plots for males and females, and intervention and control groups, demonstrate that the relationship between learning outcome and rise in cortisol is credible only among males in the intervention group. The R² statistic (0.148) of this linear relationship indicates that about 15% of the variance in learning outcome in males in the intervention group is explained by a rise in cortisol level.

Figure 1.

 Scatterplots showing learning outcome for men and women in test and no-test conditions as a function of cortisol elevation (change in salivary cortisol level, deltaSCL). Lines indicate trends. Women: no test, linear R² = 0.003; men: no test, linear R² = 0.007; women: test, linear R² = 0.024; men: test, linear R² = 0.148


Our results demonstrate a positive effect of testing on learning outcome. In men this effect was in part related to an increase in cortisol.

A wealth of studies have demonstrated test-enhanced learning1 and some have shown the effect of cortisol on learning.20 However, to our knowledge this study is the first to combine these two concepts to explore whether an increase in cortisol might explain test-enhanced learning. Our results show that there may be such a connection. The following discussion relates firstly to the results pertaining to test-enhanced learning, secondly, to the effect of testing on cortisol and, thirdly, to the connection between test-enhanced learning and the increase in cortisol. Finally, we address the practical implications of the findings of this study.

The testing effect

The demonstration of test-enhanced learning in this study is in accordance with our earlier studies on skills learning2,3 and findings on knowledge learning.1 However, other factors, such as intention to learn, preparation for a retention test and observational learning, may have influenced the results. Although students in the intervention group knew they were about to be tested at the end of the course, we doubt that this influenced their intention to learn and hence increased their learning outcome. According to Crocker and Dickenson, there is no difference in learning outcome between intentional and unintentional learning if the participants are not allowed to rehearse in the retention period.21 The tight course schedule and standardised format were similar in both groups and participants in the intervention group did not have an opportunity to engage in self-directed training. Moreover, none of the participants reported having rehearsed in the retention period. Hence we do not find that intention to learn and subsequent rehearsal threaten the results.

Observational learning is known to be an important part of skills learning.22 However, in this study the amounts of observational learning taking place in the intervention and control groups were probably equal as the last half-hour in both groups was dedicated to basically the same scenarios and the same modus of feedback. The only difference was that the control group was undergoing training, whereas students in the intervention group were being tested individually, without being told the results. Hence we do not find observational learning to threaten the results.

Test-induced cortisol elevation

Diurnal variations and considerable interpersonal variations in SCLs represent a threat to any analysis of SCL measurements. A wide range of factors such as time of day, female menstrual cycle, use of oral contraceptives and hormone replacement therapy, pregnancy, lactation and breast-feeding, smoking, alcohol consumption, dietary energy supply and possibly caffeine may affect salivary cortisol responses.13,16 It was not feasible to standardise the design of the current study according to all these factors. However, we anticipated that these possible biases would be evenly distributed between the groups following the randomisation procedure and chose the deltaSCL measured over a narrow time span (20–40 minutes) as outcome variable. Given the great variability in SCL, we expected some results to be measurement errors and therefore removed outliers that were ≥ 3 SDs from the mean. However, post hoc analysis demonstrated that the ANCOVA model obtained was robust with regard to outliers. Saliva samples were analysed in multiple runs and thus inter-test variability (< 10% according to Rigshospitalet’s validation report) may have had some impact on the results. However, interassay variability of 10% must be considered acceptable in comparison with other saliva cortisol studies.23–25 Hence, although we acknowledge the possibility of measurement errors, we feel confident in our results regarding the effect of testing on cortisol.

Referring to neurobiological studies on psychosocial stress and acute cortisol response, we theorised that the test situation used in our study would induce an acute cortisol response comparable with that provoked by a standardised psychosocial test (e.g. the mental challenge component of the Trier Social Stress Test [TSST]).16 We found a significant cortisol response to the testing situation, but only in men, which is in accordance with the conclusions proposed in a major review by Kirschbaum and Hellhammer.16 However, the test-induced elevation of cortisol in our study was on average equivalent to only 10% of the elevations mentioned in the review article.16 This may be explained by the fact that the test situation was of no consequence to our participants and perhaps not as stressful as the highly standardised TSST.

Test-enhanced learning, cortisol and gender

Our results indicate the presence of a connection between learning outcome and acute cortisol elevation in men, but not in women. A thorough review by Kudielka et al.13 showed that healthy males elicit a stronger and more uniform cortisol response to stress, whereas the response in females depends on menstrual cycle and use of oral contraceptives.

However, when we examined the extensive literature on test-enhanced learning, we found no studies on differences between genders in learning outcome.1 It seems unlikely that such a gender difference has passed unnoticed through decades of research in test-enhanced learning. With regard to the current study, we can only state that the literature supports our results indicating that test-enhanced learning in males is modulated by cortisol to some extent. However, whether test-enhanced learning is a male phenomenon must be explored further, either by re-examining older studies or by future research.

Educational implications

Test-enhanced learning has been recommended for inclusion in instructional strategies.1 This may be particularly relevant for simulation-based skills training. Medical simulation facilities are expensive and the retention of CPR skills is known to be poor.26 Therefore, whatever can be done to make teaching in simulation facilities more effective should be implemented in order to save money and lives. Against this background, we have earlier argued that tests should be implemented in CPR courses.3 However, the gender-related results of this study disturb this picture. Medical universities all over the world have male : female student ratios of approximately 40 : 60.27 We consider that it is vital to report differences between genders in future studies of test-enhanced learning. A possible gender difference may undermine the educational value of the testing effect and this issue requires further investigation.

Implications for research in medical education

Issues related to test-enhanced learning have important implications for research designs. When administering both an immediate post-test and a later retention test to the same group, researchers should be aware that the immediate post-test will interfere with performance on the later retention test and thus a true learning outcome will be difficult to estimate from this design.1,2 Moreover, an immediate post-test does not necessarily correlate to long-term learning outcome.1 Finally, any cortisol elevation during an intervention or control situation may obscure the true results by impairing retrieval for about an hour in most males and some women.16 Hence, when choosing a research design, it is important to observe that testing may influence learning and therefore may bias the results.


The participants in this study were medical students. The sample was relatively homogeneous with regard to age and educational level. Consequently, the findings do not necessarily apply to other groups at higher levels of training, different age groups or other groups of medical professionals.

The convenience sample of 146 of a population of 202 medical students at exactly the same educational level can be considered a weakness, but was necessary for feasibility reasons. We would have preferred to include more than 80% of the population, but, as we prioritised the testing of all participants within 24 hours and at 2 weeks after the course to avoid any time-related bias, we had to leave out 56 participants who were unable or unwilling to participate in follow-up.

A simulation scenario in CPR skills learning is an activity that potentially affects the emotional state of participants. The effects of stress and cortisol on memory mainly pertain to elements that carry emotional valence.28,29 There is probably an emotional valence in CPR training and hence we caution against the generalisation of our findings to other skills or pure knowledge learning. To our knowledge, we are currently the only group reporting on the testing effect in skills learning and, until our results are reproduced elsewhere, our findings suffer the disadvantage of having no direct comparison.


This study demonstrates a significant effect of CPR skills testing on learning outcome and cortisol response in men. Test-enhanced learning was in part explained by a test-induced increase in cortisol levels.

Contributors:  CBK contributed to the study conception and design, the acquisition, analysis and interpretation of data, and to securing funding for the study. MLJ contributed to the study conception and design, and to the acquisition and interpretation of data. CR contributed to the study conception and design, the analysis and interpretation of data, and to securing funding for the study. All authors contributed to the drafting and revision of the article and approved the final manuscript for publication.


Acknowledgements:  the authors wish to thank the cardiopulmonary resuscitation instructors at the Centre for Clinical Education (Center for Klinisk Uddannelse [CEKU]), Kristoffer Andresen for his meticulous work in collecting, labelling and caring for saliva samples in the classrooms, Elisabeth Anne Wreford Andersen of the Department of Biostatistics, Copenhagen University, for assistance with the ANCOVA analysis, and Rigshospitalet’s laboratories for assaying the saliva cortisol samples.

Funding:  the study was financed by Centre for Clinical Education and in part funded by Trygfonden.

Conflicts of interest:  none.

Ethical approval:  all participants gave written informed consent to participate. The Biomedical Research Ethics Committee in the Capital Region of Denmark waived the need for full ethical approval. Saliva samples were either kept under supervision or stored in a securely locked freezer. Participants were assured that their personal data would remain anonymous and that no results would be passed on to a third party.