Evaluating information skills training in health libraries: a systematic review


Alison Brettle, Research Fellow (Information), Salford Centre for Nursing, Midwifery and Collaborative Research, Institute of Health and Social Care, University of Salford, Allerton Building, Frederick Road, Salford, M6 6PU, UK. E-mail: a.brettle@salford.ac.uk


Introduction:  Systematic reviews have shown that there is limited evidence to demonstrate that the information literacy training health librarians provide is effective in improving clinicians’ information skills or has an impact on patient care. Studies lack measures which demonstrate validity and reliability in evaluating the impact of training.

Aim:  To determine what measures have been used to evaluate information skills training and the extent to which they are valid and reliable, and to provide guidance for health librarians who wish to evaluate the impact of their information skills training.

Methods:  Data sources: Systematic review methodology involved searching seven databases, and personal files. Study selection: Studies were included if they were about information skills training, used an objective measure to assess outcomes, and occurred in a health setting.

Results:  Fifty-four studies were included in the review. Most outcome measures used in the studies were not tested for the key criteria of validity and reliability. Three measures that were tested for validity and reliability are described in more detail.

Conclusions:  Selecting an appropriate measure to evaluate the impact of training is a key factor in carrying out any evaluation. This systematic review provides guidance to health librarians by highlighting measures used in various circumstances, and those that demonstrate validity and reliability.


Health librarians devote a large amount of time to training activities, yet there is limited evidence to demonstrate that the training they provide is effective,1,2 or whether the services provided by health librarians make a difference. Health librarians may wish to evaluate the impact of their training for a number of reasons, including: to decide whether the techniques and methods they use are worthwhile (are those being trained learning anything?); to understand whether they are making the best use of resources; to demonstrate a need for funding; to improve the service; to help redesign materials and methods; or to contribute towards the overall assessment of library performance. One of the key requisites for undertaking a good quality study or evaluation of effectiveness is the selection of an appropriate measure with which to evaluate the service of interest. However, it has been suggested that there is a lack of validated measures with which to evaluate information skills training.1,3

Examining the ‘outcomes’ when carrying out an evaluation of the impact of a training programme allows the researcher to measure how far a programme or intervention has met its stated objectives or goals.4 Outcomes are the results (effects) of processes: the part of the situation pertaining after a process which can be attributed to that process.5 Outcome measures or methods, therefore, are the means by which outcomes can be measured, and outcome criteria are the criteria against which one can measure to determine whether the stated aims and objectives have been met. To improve the trustworthiness of the results obtained, instruments used to measure outcomes should conform to some basic methodological criteria: will the instrument produce the same results if reapplied to the same situation (if so, it is said to be reliable)? Is it measuring what it purports to measure (if so, it is said to be valid)?5 These concepts are discussed more fully in the results section.

Library training sessions are often evaluated by means of a survey that participants complete following a training session. This can include questions on the accommodation, facilities or whether a session was interesting or met the participants’ needs. Some go further and ask whether the training has improved knowledge or skills. Although these types of surveys may provide some useful information, they do not help the health librarian evaluate the impact of the training. They cannot determine whether participants have learned anything or will change their practice as a result of the training.

According to Gratch-Lindauer, assessing student learning is difficult because learning is complex and multidimensional.6 Students have different backgrounds and abilities and learn in different ways at different rates. In the case of information skills training, some courses are teaching knowledge (e.g. resources available or principles of literature searching), whilst others are teaching skills such as database searching or critical appraisal. The methods and measures used to evaluate training need to reflect these differences. These complexities are beyond the scope of this review but are covered by Avery and Cohen et al.7,8 To determine whether training has made a difference, it is necessary to go beyond how students perceive or value the training (what are sometimes referred to as affective domains) and use more objective methods to evaluate outcomes. To do this, it is necessary to evaluate what students know (cognitive domains or knowledge) or what they do (behavioural domains, e.g. skills) following teaching or training. The main focus of this review, therefore, will be on measures and methods that examine knowledge, skills or behaviour following teaching or training by health librarians.

The remainder of this paper describes the review and then concludes by putting forward suggestions of how librarians can evaluate their training and what is needed to advance the evidence base.


The systematic review aims to:

  1. determine what measures have been used to evaluate the effectiveness of information skills training for health professionals;
  2. determine the extent to which these measures are valid and reliable.


Data sources

A range of sources were searched in order to identify relevant studies. These included a personal collection of articles and references collected whilst undertaking research in this topic area; electronic database searches (Australian Education Index, British Education Index, cinahl, eric, lisa, lista and medline); and citation tracking of references of relevant articles. Electronic searches covering 1990 onwards were carried out in Spring 2003 and updated in Spring 2006. Combinations of free-text and subject heading terms appropriate to each database were combined using Boolean operators. Examples of the search terms used can be found in Table 1; full search strategies are available on request from the author.

Table 1.  Search terms used to locate studies on electronic databases
Evaluation/Outcome related terms (combined using the OR Boolean operator)
Evaluation; performance measures; outcome; outcome measur*; feedback; assessment; measure; test; score; checklist; evaluation; assess*; precision; outcome assessment; program evaluation; educational measurement
Literature searching, information skills, information literacy related terms
Information storage and retrieval; Cd roms; Information literacy; searching; medline; cinahl; evidence based*; information skills; user education; user instruction; database; computer literacy; online searching; library skills; library instruction; information retrieval; professional education; education, medical; medical informatics; medlars; libraries medical; library science; internet

All potentially relevant citations (n = 693) were screened by the author using the titles and abstracts to determine whether they met the inclusion criteria (Table 2). Full articles were obtained for those studies that appeared to meet the criteria (n = 350) and screened further. These papers were read once again and 63 met the inclusion criteria and went forward to the data extraction stage.

Table 2.  Inclusion/exclusion criteria for study selection
Inclusion criteria
1. Study about information literacy, information skills training, bibliographic instruction or evidence-based practice, OR
2. Study assessing the quality of a literature search
3. Using an objective measure or method (i.e. measuring a change in knowledge, skills or behaviour)
4. Occurring in a health/clinical setting (hospital library, academic library serving health students, or professional development for clinical staff)
Exclusion criteria
1. User perceptions/surveys of training
2. Studies of training in evidence-based practice with no literature searching/finding evidence element
3. Non-health settings, e.g. academic libraries serving other students

At the data extraction stage, a further seven were excluded, as they did not fit the criteria and two papers were describing duplicate studies, therefore a total of 54 studies were included in the review. A flowchart of the progress of studies through the search, screening and data extraction process is presented in Fig. 1. Data for the review was extracted using a tool adapted from two other critical appraisal tools9,10 (Appendix 1) and then entered into a summary table in an Excel spreadsheet. Microsoft Excel was used to perform simple counts of relevant features, which are presented in the results section below, and to compile the summary table in Appendix 2.

Figure 1.

Overview of literature search and retrieval

To facilitate the calculation of simple descriptive statistics presented in the tables below, data in some fields of the data extraction sheet were categorized; for example, setting, topic taught and outcome measures. The categories were created after the initial round of data extraction. Data templates were read several times, themes noted and categories created where similarities existed. For example, in relation to setting, any study describing physiotherapists, occupational therapists, pharmacists, etc. was categorized into one group: academic professions allied to medicine (PAM).
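The categorize-and-count step can be sketched in a few lines. In this illustration the setting categories and counts are taken from Table 3 of this review; using Python's `collections.Counter` in place of the Excel counts described above is purely illustrative.

```python
from collections import Counter

# One setting category per included study, as assigned during the
# categorization step; counts reproduce Table 3 of the review.
settings = (
    ["Academic-medical"] * 29
    + ["Academic-nursing"] * 7
    + ["Academic-PAM"] * 2
    + ["Clinical practice"] * 12
    + ["Library"] * 4
)

counts = Counter(settings)
total = sum(counts.values())  # 54 studies in the review
for category, n in counts.most_common():
    print(f"{category:20s} {n:3d} {100 * n / total:6.2f}%")
```

The same pattern serves for any of the categorized fields (topic taught, outcome measure, and so on).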

Validity and reliability were determined from reading which tests had been used in the paper. In some cases, judgements were made; for example, a test may be described which tested for inter-rater reliability but the paper did not actually use the term ‘inter-rater reliability’. In a case such as this, the paper would be categorized as having tested for inter-rater reliability. ‘Not clear’ was used where tests for reliability or validity were not described in the paper.


Fifty-four studies were included in the review. A full list of the studies included in the review and a summary table of the features of each study can be found in Appendix 2. It should be noted that, as the aim of the review was to examine the validity and reliability of the measures that have been used to evaluate the impact of training, the quality of the studies has not been addressed here. Issues about study quality in this field have been addressed in other reviews by Brettle and Garg and Turtle.1,2,11 Issues regarding the quality of the measures used in the studies will be addressed within this section and the discussion.

Settings, locations and population

Tables 3–5 provide a breakdown of the settings, locations and study populations of the studies included in the review. These details were well described in the papers and, as can be seen from the tables, the majority of studies took place in US academic institutions on medical students. Undergraduate students were tested in varying years. Thirty-eight studies took place in academic settings; however, almost a quarter (n = 12) took place in clinical settings (clinical practice) and four were library based (either academic or hospital libraries).

Table 3.  Setting of studies
Setting                                            n      %
Academic–medical                                  29  53.70
Academic–nursing                                   7  12.96
Academic–professions allied to medicine (PAM)      2   3.70
Clinical practice                                 12  22.22
Library                                            4   7.41
Other                                              0   0.00
Not clear                                          0   0.00

Table 4.  Location of studies
Location       n      %
USA           35  64.81
UK             6  11.11
Australia      2   3.70
Europe         5   9.26
Canada         3   5.56
Other          3   5.56
Not clear      0   0.00

Table 5.  Recipients of training
Recipient        n      %
Undergraduates  30  55.56
Postgraduates    1   1.85
Residents        9  16.67
Practice         9  16.67
Other            4   7.41
Not clear        1   1.85

The majority (n = 30) were carried out with undergraduate students (medical, nursing or professions allied to medicine students completing their first degree) and only nine on practising (qualified) clinicians or health professionals. A further nine took place on residents, that is, doctors who have gained their degree but are still undertaking clinical training.

Study design

Table 6 shows a breakdown of included studies by study design. The majority of measures were used in pre-experimental (one group, post-test only; one group with pre- and post-test; or two groups, post-test only), quasi-experimental (control group, pre- and post-testing, no randomization) or randomized controlled designs (n = 34). A small number of qualitative (n = 2) and descriptive (n = 4) studies were also included. The study design categories were based on the descriptions given by Robson.4

Table 6.  Breakdown by study design
Study type                          n      %
Experimental                        0   0.00
Randomized controlled trial (RCT)  13  24.07
Quasi-experimental                  6  11.11
Pre-experimental                   15  27.78
Longitudinal                        8  14.81
Qualitative                         2   3.70
Survey                              1   1.85
Cross-sectional                     3   5.56
Case-study analysis                 2   3.70
Descriptive                         4   7.41

Topics taught and teaching methods

A range of topics relating to information skills were taught and were categorized to aid analysis. A full breakdown can be found in Table 7. Several studies included more than one topic—hence the number of topics does not correspond with the number of studies in the review.

Table 7.  Breakdown of topics covered by information skills training
Topic taught                         n      %
Database searching                  41  33.33
Question formulation                23  18.70
Critical appraisal                  11   8.94
Referencing                          1   0.81
Applying evidence to practice        3   2.44
Evaluating practice                  1   0.81
Selecting articles                   4   3.25
Critical Appraisal Topics (CATs)     1   0.81
Library orientation                  4   3.25
Internet                             0   0.00
History of medline                   3   2.44
Research methods                     4   3.25
Computer literacy                    3   2.44
Not clear/not applicable             9   7.32

It can be seen that database searching, question formulation and sources of information are the most popular topics covered—a result that should not be surprising considering the topic area of the review. In eight of the studies,12–19 the focus of the study was to develop a measure or to evaluate the quality of end-user searches rather than the training given, therefore the topic taught was not clearly described.

Teaching methods were categorized into those noted in Table 8. Hands-on practical sessions (n = 28) or didactic (lecture)-type sessions (n = 21) were the most common. Some studies used a combination of more than one method (a blended approach). In 20 studies the teaching method was unclear: the eight studies noted above fell into this category, and in the remaining cases the method could not be determined because of poor reporting.

Table 8.  Teaching methods
Teaching method             n      %
Hands on                   28  51.85
Didactic (lecture)         21  38.89
One-to-one                  6  11.11
Distance                    1   1.85
Web tutorial                4   7.41
Outreach                    0   0.00
Clinical librarian          0   0.00
Not clear/not applicable   20  37.04


The studies aimed to measure achievement against a wide range of different outcome criteria (Table 9). A range of different methods or measures were used to evaluate the outcomes of the studies (Table 10). Studies that used measures relying solely on users' perceptions of the training (affective domains) were excluded from the review; however, those that included perceptions as part of a package of measures have been included, as can be seen in Table 9. The majority of studies measured multiple outcome criteria and used multiple methods. In line with the topics taught, most studies examined whether a difference occurred in database search skills as a result of training (n = 39). Search question development (n = 14) and knowledge of library or information resources (n = 11) were also examined by a number of studies. Just over a quarter of studies (n = 16) examined whether training impacted on participants' behaviour (e.g. did they search more frequently or more quickly post-training, or did they have a more positive attitude to carrying out a literature search?). Some of the outcome criteria may seem similar; for example, evidence-based medicine (EBM) principles and knowledge, and EBM skills; these represent the measurement of different domains of learning (cognitive and behavioural) and so have been classified separately. Search quality was used when database searching skills were measured but not described or broken down in detail; in comparison, database search skills was applied when a range of skills were examined. Search value was used when a value was placed on the results of the searches to the user, and search knowledge measured the cognitive domain of searching rather than the application of search skills.

Table 9.  Outcome criteria
Outcome criteria                                     n      %
Search question development                         14  25.93
Knowledge of resources                              11  20.37
Database search skills                              39  72.22
Critical reasoning/appraisal skills                  9  16.67
Evaluation of search results                         8  14.81
Evidence-based medicine (EBM) principles/knowledge  11  20.37
EBM skills                                           3   5.56
Application of evidence (to practice)                3   5.56
Computer literacy skills                             3   5.56
Search behaviour/attitude to searching              16  29.63
Library orientation                                  1   1.85
Citations                                            0   0.00
Research methods                                     3   5.56
Search quality                                       7  12.96
Search knowledge                                     3   5.56
Search value                                         6  11.11
Confidence                                           3   5.56
Not clear/not applicable                             1   1.85

Table 10.  Outcome measures and methods
Outcome measure or method             n      %
Test                                  9  16.67
Transaction log                       8  14.81
Search scale/score/checklist         20  37.04
Graded assessment                     9  16.67
Vignette/scenario                     6  11.11
Recall precision                      8  14.81
Multiple-choice questionnaire (MCQ)  11  20.37
Relevance                             8  14.81
Usage rates                           4   7.41
OSCE                                  5   9.26
Interview                             5   9.26
Gold standard                         3   5.56
Citation analysis                     1   1.85
Not clear/not applicable              0   0.00

The most frequently used method of measuring the outcome of training or teaching was by means of a search score/scale or checklist (n = 20). In these studies, participants or students carried out a literature search that was then marked using a tool that gave a mark or score for features of a search; for example, ability to use Boolean operators or subject headings.
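A search score or checklist of this kind can be illustrated with a minimal sketch. The checklist features and the Ovid-style search history below are hypothetical illustrations, not taken from any of the reviewed instruments.

```python
# A toy checklist marker: the score is the number of checklist features
# that the submitted search history demonstrates.

def score_search(search_lines):
    """Mark a search history (a list of search statements) against a checklist."""
    text = " ".join(search_lines).upper()
    checklist = {
        "uses Boolean AND": " AND " in text,
        "uses Boolean OR": " OR " in text,
        "uses subject headings": "/" in text,   # e.g. exp Neoplasms/
        "uses truncation": "*" in text or "$" in text,
        "applies limits": "LIMIT" in text,
    }
    return sum(checklist.values()), checklist

search_history = ["exp Neoplasms/", "cancer*.ti,ab", "1 OR 2", "nursing.ti,ab", "3 AND 4"]
score, detail = score_search(search_history)  # scores 4 of 5 (no limits applied)
```

A real instrument would, of course, weight features and allow for examiner judgement; the point is only that a checklist reduces a search to a numeric score that can be compared pre- and post-training.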

Surveys were also popular (n = 18) and ranged from perceptions of the training, including confidence, to questions that tested the knowledge of participants post-training. Multiple-choice questionnaires (MCQs) were used in 11 studies and, although some of these claimed to be measuring skills (behavioural domain), this has to be questioned, as the ability to answer a multiple-choice question about literature searching correctly does not necessarily mean that the participant is able to carry out a literature search. It is more likely that these types of tests are measuring knowledge rather than skills. Tests (n = 9) covered a wide range, as did graded assessments (n = 9); these could include true/false questions, yes/no questions or questions requiring a specific answer. Vignettes/scenarios (n = 6), graded assessments and transaction logs (n = 8) were often combined with search scales/scores or checklists: participants were given a scenario that involved conducting a literature search on a given database, and this was marked with a standard score or checklist. An alternative method was to compare the student's literature search with a ‘gold standard’ search that had been performed by a librarian (n = 3). Perceptions about skills, confidence or other aspects of searching were used in 15 studies, in conjunction with other measures examining skills or knowledge. The Objective Structured Clinical Examination (OSCE) is a method most commonly used to examine the clinical skills of medical students: a series of ‘stations’ is set up where students perform tasks in a short space of time on which they are examined, for example taking a history from a patient. In these cases (n = 5), literature searching was incorporated into the examination by having a ‘medline station’ where students were given details about a clinical case and needed to perform a medline search in order to answer the question.
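The recall and precision measures listed in Table 10, and the comparison against a librarian's ‘gold standard’ search, reduce to simple set arithmetic. The record identifiers below are invented for illustration.

```python
# Recall and precision of a student's search relative to a 'gold standard'
# set of relevant records retrieved by a librarian.

def recall_precision(retrieved, gold_standard):
    retrieved, gold = set(retrieved), set(gold_standard)
    hits = retrieved & gold  # relevant records the student actually found
    recall = len(hits) / len(gold) if gold else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision

student_search = ["rec1", "rec2", "rec3", "rec4", "rec5"]
librarian_search = ["rec2", "rec3", "rec6", "rec7"]
recall, precision = recall_precision(student_search, librarian_search)
# recall = 2/4 = 0.5, precision = 2/5 = 0.4
```

Recall rewards completeness (finding what the librarian found) while precision rewards focus (avoiding irrelevant records), which is why the two are usually reported together.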

Validity and reliability

According to Robson,4 validity and reliability are fundamental in establishing the trustworthiness of research. Validity refers to whether the measure establishes the ‘true’ findings or whether it is measuring something else. Reliability refers to the stability or consistency of the measure: does it measure the same thing each time? For example, a test that measures whether a student's literature searching skills have improved post-training will only be reliable if it is consistent: it must pick up changes over time (e.g. before and after training) and differences between students, and perform the same each time the student takes the test. For the test to be valid, it must measure changes in the skills themselves (not whether knowledge has changed). If outcome measures are used that are not valid or reliable in the context in which they are being used, then the results of the study can be called into question. For example, a test designed to measure whether doctors have acquired skills following a short course in evidence-based practice may not necessarily be valid for use with nurses who have undertaken a longer course in the same topic area. Although the courses may appear to be similar, the same measure can only be valid if the courses are teaching, and wishing to evaluate, the same learning criteria in a similar context. Robson and Streiner and Norman discuss validity and reliability more fully in their texts;4,20 in brief, it is essential that the outcome measure selected measures the outcomes of interest consistently.

To determine whether measures are valid or reliable, various tests can be performed. In the case of validity, content validity compares the test with the scope of the subject matter being tested and face validity is when experts decide if the test is reasonable. Other types of validity include discriminant validity (ensures that the instrument does not correlate with unrelated variables), criterion validity (comparison with another measure or gold standard) and construct validity (ongoing testing to ensure the test covers all relevant aspects). Reliability can also be described in various ways. Test–retest reliability measures whether the instrument performs the same over time and inter-rater reliability refers to whether the test performs the same when marked by different examiners. Internal consistency measures the extent to which items within a test measure the same thing.
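Two of the reliability tests described above have standard formulas, sketched below in plain Python: Cohen's kappa, a common statistic for inter-rater reliability, and Cronbach's alpha, the usual statistic for internal consistency. The marking data are invented for illustration; neither statistic is tied to any particular instrument in this review.

```python
# Sketches of two standard reliability statistics.

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two markers' category labels."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_chance = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n)
        for c in set(rater_a) | set(rater_b)
    )
    return (p_observed - p_chance) / (1 - p_chance)

def cronbachs_alpha(items):
    """items: one list of scores per test item, aligned across respondents."""
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    return (k / (k - 1)) * (1 - sum(var(item) for item in items) / var(totals))

# Two markers grading six searches, and a two-item test taken by four students
kappa = cohens_kappa(
    ["pass", "pass", "fail", "pass", "fail", "fail"],
    ["pass", "fail", "fail", "pass", "fail", "pass"],
)
alpha = cronbachs_alpha([[1, 2, 3, 4], [2, 3, 4, 5]])
```

Kappa near 1 indicates markers agree well beyond chance; alpha near 1 indicates the test items are measuring the same underlying construct.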

The majority (n = 42, 78%) of measures described in the studies included in the review were developed for the purpose of that particular study, and their validity was neither described nor tested (Table 11). In those studies that did assess validity, face and content validity were the most common elements to be tested. In six papers, validity was discussed but not tested or proven.

Table 11.  Validity
Validity tested              n      %
Content                      6  11.11
Face                         5   9.26
Discriminant                 2   3.70
Criterion                    1   1.85
Discussed – not tested       6  11.11
Not clear/not applicable    42  77.78

The majority of studies (n = 46, 85%) did not discuss or test reliability. Where reliability was tested, inter-rater reliability was the most common element to be established (n = 5) or internal consistency (n = 4). A full breakdown can be found in Table 12.

Table 12.  Reliability
Reliability tested           n      %
Inter-rater                  5   9.26
Internal consistency         4   7.41
Test–retest                  1   1.85
Discussed – not tested       0   0.00
Not clear/not applicable    46  85.19

Measures with proven validity and reliability

Two measures demonstrated validity and reliability, and one proved valid and reliable in one study but not in a second. These were described or used in five papers relating to teaching evidence-based practice (which includes an information skills element).15–19 The measure described by Johnston et al.19 is a questionnaire that aims to measure knowledge, attitude, practice, use of and anticipated future use of evidence-based practice in order to evaluate evidence-based practice teaching in an undergraduate medical setting. The paper states that the questionnaire comprises a series of Likert-scale, open-ended and multiple-choice questions and demonstrates face, content, criterion and construct validity. One hundred and fifty-nine medical students were recruited for the development part of the questionnaire and 338 participated in the validation. The measure developed by Bradley et al.16,17 comprises a multiple-choice questionnaire to measure knowledge about searching for and critically appraising scientific articles for evidence-based practice (not included in the paper). The instrument was validated on delegates at an international conference on evidence-based practice and on medical students participating in a randomized controlled trial. Validity was established in both studies; however, reliability was established in the first study but not the second, suggesting that the reliability of the measure is still in question. Robson,4 however, suggests that, unless a measure is reliable, it cannot be valid. Finally, the Fresno test, developed by Ramos et al.15 and used by Gardois et al.,18 uses open-ended questions on question formulation, searching and research design to examine the effects of instruction in evidence-based medicine. A copy of the instrument and scoring rubric is provided and shows that the instrument measures knowledge and, to some extent, skills in these areas.
Validity (content, discriminant, construct) and reliability (inter-rater and internal) were demonstrated after testing on 115 residents, experts in evidence-based medicine and family practice teachers. All these tests have demonstrated validity in relation to evidence-based medicine teaching in academic situations. A fourth test, developed by Ross and Verdieck, which also examines evidence-based practice skills, demonstrated content validity.21 It comprises a multiple-choice questionnaire covering question formulation and critical appraisal.

Although the tests may be useful in other situations, their validity and reliability have not been demonstrated elsewhere or with other groups. They may not be suitable for the average health librarian who wishes to evaluate the impact of a training session in literature searching. Unfortunately, studies measuring purely information skills or information literacy tended not to describe the validation of their instruments. Berner et al.22 established inter-rater reliability in a test measuring knowledge of resources, database search skills and critical appraisal, and utilized a scenario and search score as part of an OSCE examination. Hersh and Hickham23 also established inter-rater reliability in a test that measured database search skills of residents.


The review has highlighted that a range of measures and methods have been used in studies of health-related information skills training. The most frequent topics covered are database searching, sources of information, and question formulation.

The majority of the measures have been used in academic settings rather than in hospital libraries or on practising clinicians. Moreover, with the exception of three measures that have been used in relation to searching within evidence-based practice, their reliability and validity remain unproven. Validated tests that can be used by the ‘average’ health librarian (particularly those in hospital libraries) are sadly lacking. There is clearly scope for more measures to be developed and, more importantly, tested for validity and reliability.

In a research study (or indeed any type of evaluation), using outcome measures that are not valid or reliable for the situation under study calls into question the results of the evaluation. This is because it is not possible to guarantee that the study is measuring what it intends to measure. This was the case in one randomized controlled trial,24 when the checklist used as an outcome measure appeared to be useful only for searches requiring more complex commands and was not reliable for more simple searches.

It is important to note that a number of the studies tested knowledge about searching—that is they were examining the impact of knowledge about searching not search skills.17,19,25–29 This is an important distinction when evaluating or considering the impact of training. Demonstrating knowledge or understanding about literature searching or database searching is not the same as being able to put those skills into practice. There are a number of examples of studies that measure actual skills in practice.22,26,30–35 A number of studies purported to measure skills, but a closer examination of the outcome criteria and measures used (e.g. using a multiple choice test or true/false to assess literature searching skills) suggests that they are more likely to be measuring knowledge or cognitive domains of learning rather than the behavioural domains.17,36 A limitation of a large number of the studies is the failure to include the whole instrument that was used to measure outcomes, even when describing its development and validation,16,19 leaving the reader unable to draw accurate conclusions about the validity and reliability of the study or fully assess the usefulness of the measure.

This systematic review is limited in that it was carried out by one individual; commonly, systematic reviews are carried out by teams of people, with papers being double screened and data independently extracted, thus reducing the possibility of bias in the results and conclusions drawn. The decisions about reliability and validity were determined from information reported in the papers. It is possible that measures have been tested but that this was not reported; contacting the authors would provide a truer picture, but this was outside the scope of this review. Furthermore, the review was limited to the health library literature. Information literacy is a large topic area, and the wider literature (for example, on information literacy across academic settings), where some measures may have proven validity or reliability, may be useful. Additionally, the literature on health outcome measurement (or educational assessment) could provide further ideas, methods and measures for evaluating information skills training.

So where does this leave health librarians who wish to evaluate or assess their training? When selecting a measure, it is essential to bear in mind the reasons for service evaluation and to choose a measure or method that fits in with this. Straus et al.37 suggest a framework that involves considering:

  1. Who is the learner?
  2. What is the intervention?
  3. What are the outcomes?

The summary table in Appendix 2 should help librarians to locate tools relevant to their situation (by considering the framework of Straus et al. in relation to their situation and matching it to the items in the table). This is not a perfect approach, but it will save some reinventing of the wheel.

For example, if you work in a nursing library and want to evaluate your training, you could apply the framework of Straus et al. to establish that your learners are nursing students, your intervention is the information skills/information literacy sessions you provided, and you want to know whether their database search skills have improved (outcomes). Appendix 2, column 4 shows that a number of studies are relevant to academic nursing libraries (Fox, Francis and Fisher, Martin, Jacobs et al. and Martindale), and these can be used as a starting point. A check on the outcome criteria in column 7 shows that Jacobs et al. measured database search skills using a test and a multiple-choice questionnaire. This paper is most likely to warrant further investigation, as it measures the outcomes of relevance. Once potential tools or studies have been identified, the points set out in a toolkit published by the National Library for Health38 can be used to narrow down the selection.

  • Does the test that you have selected cover the work contained in the training session?
  • Does the test focus on the most important information that you want participants to know, rather than small irrelevant facts?
  • Is the measure or tool reliable?
  • Is the measure valid?
  • Is the test feasible for your use?

The first two points can be checked by examining the paper, but Appendix 2 notes that this paper does not describe any tests for validity or reliability. Before using the instrument, it should therefore be piloted and, ideally, tested for validity and reliability in the context in which it will be used.

Feasibility has not been addressed in this review, but it is important to consider when looking at the practical issues relating to evaluation. Some measures are more feasible for use in an academic setting, as they can be used on predetermined groups or involve a graded assessment; this would not be feasible with practising health professionals. Some instruments take considerable time to administer, which may be feasible for a research project or one-off evaluation, but not on a routine basis. For an instrument to be used routinely, it should be relatively easy to incorporate into a training session, and the resources to obtain and interpret the results should be available. Piloting the tool to find out how feasible it is to use (particularly if it has not been tested for validity or reliability) is likely to be a worthwhile exercise.


The review has highlighted that a range of measures and methods have been used in studies of health-related information skills training. However, few have been shown to be valid or reliable, key requisites when selecting measures for evaluating an intervention. There is clearly scope for more measures to be developed (particularly in non-medical academic and practice settings) and, more importantly, tested for validity and reliability. Without valid and reliable measures, it is difficult to see how high-quality studies of information skills training can be carried out, or how impact can truly be measured. In the meantime, this review offers practical tips and suggestions to help health librarians select tools to evaluate the impact of their own practice.


The author would like to thank the University of Salford Research Investment Fund for providing funding to update and complete this systematic review. Grateful thanks are also owed to David Brettle for providing expertise and guidance in using Excel. This review was initially conceived and presented at the Evidence-Based Librarianship Conference, Alberta, Canada in 2003. An updated review was presented at the UK Health Libraries Group Conference, Eastbourne in July 2006.

Key Messages

Implications for Policy

  • There are various measures available to assess the impact of information skills training or information literacy.
  • Few of these have been tested for validity and reliability, which are basic methodological considerations.
  • There is scope for developing and, more importantly, validating measures to demonstrate the impact of information skills training, particularly in non-medical academic settings and for practising health professionals.

Implications for Practice

  • Various measures have been used in research studies.
  • These may be suitable for evaluating the impact of practice, but care is needed to select the most appropriate ones.
  • Appendix 2 offers guidance for selecting measures relevant to specific settings.
Appendix 1. Data extraction template
Study number:
Aims of study:
Aims of paper:
Key findings:
Summary evaluative comments:
Study type:
Comparison intervention:
Care setting (academic/practice):
Teaching method used:
Mode of delivery (e.g. lecture, practical):
Topic taught:
Instructional contact time:
Beginning and duration of study:
Source population/sampling frame:
Sample selection:
Number of groups:
Study size:
Methods of measurement:
Outcome criteria (e.g. what learning outcomes measured—skills/knowledge):
Outcome measures:
Length of follow-up:
Ethical committee approval:
Informed consent:
Other ethical issues:
Total number of references:
Other noteworthy features:
Appendix 2. Summary table of features of included studies/measures
Author | Title | Location | Setting | Participant | Intervention | Outcome Criteria | Outcome Measures | Validity | Reliability
KEY: DB—database, EBM—evidence based medicine, MCQ—multiple choice questionnaire, NC—not clear or not applicable, PG—postgraduate, Search q—search question, UG—undergraduate, US—United States.

Barnett et al. Mount Sinai J Med 2000; 67(2):163–8 | An integrated program for evidence based medicine in medical school | US | Academic-medical | UG | 4 year integrated EBM curriculum | Critical reasoning/appraisal skills, Search quality, DB search skills | Search scale/score/checklist, Graded assessment, OSCE | NC | NC
Berner ES et al. Academic Medicine 2002; 77(6):547–51 | A model for assessing information retrieval and application skills of medical students | US | Academic-medical | UG | Instruction in applied medical informatics | Knowledge resources, DB search skills, Critical reasoning/appraisal skills, EBM principles/knowledge | Search scale/score/checklist, Vignette/scenario, MCQ, OSCE | NC | Inter rater
Blumenthal JL et al. Medical Reference Services Quarterly 2005; 24(2):95–102 | Defining and assessing medical informatics competencies | US | Academic-medical | UG | Large group sessions facilitated by faculty librarian team to year 1 students, or individual consultations with librarian (not mandatory); two hour hands on EBM workshop for 3rd year students | NC | Survey, Graded assessment, Perceptions | NC | NC
Bradley DR et al. JMLA 2002; 90(2):194–201 | Real-time, evidence based medicine instruction: a randomised controlled trial in a neonatal intensive care unit | US | Clinical Practice | Resident | Observation of search + real time 1 to 1 librarian instruction (i.e. at the point of need on a neonatal intensive care unit) and feedback during search in OVID MEDLINE | Search q development, DB search skills, Search behaviour/attitude | Survey, Search scale/score/checklist, Recall precision | NC | NC
Bradley P and Herrin J Med Educ Online 2004; 9(15): http://www.med-ed-online.org/res00096.htm | Development and validation of an instrument to measure knowledge of evidence based practice and searching skills | Other-Europe | Academic-medical | UG | EBP conference for experimental study, and medical students undertaking an EBP course (similar to that occurring at conference) | EBM principles/knowledge, EBM skills, Search q development, DB search skills, Critical reasoning/appraisal skills | MCQ | Content, Face, Discriminant | Internal Consistency
Bradley P et al. Medical Education 2005; 39(10):1027–35 | Comparison of directed and self directed learning in evidence based medicine: a randomised controlled trial | Other-Europe | Academic-medical | UG | Computer assisted, self directed learning about EBM | Search q development, DB search skills, Critical reasoning/appraisal skills, EBM principles/knowledge | MCQ | Content, Face, Discriminant | Internal Consistency
Burrows SC and Tylman V BMLA 1999; 87(4):471–6 | Evaluating medical student searches of MEDLINE for evidence based information: process and application of results | US | Academic-medical | UG | Information skills instruction and evidence based medicine instruction incorporated into first year students' orientation and the medical school curriculum, examined in the 3rd year | DB search skills, Evaluation search results | OSCE, Search scale/score/checklist | NC | NC
Burrows SC et al. JMLA 2003; 91(1):34–41 | Developing an evidence based medicine and use of the biomedical literature component as a longitudinal theme of an outcomes based medical school curriculum—year 1 | US | Academic-medical | UG | Use of the EBM literature component on medical school curriculum | Search q development, DB search skills, Evaluation search results | Search scale/score/checklist, Graded assessment, Interview | NC | NC
Cheng GYT HILJ 2003; 20(supplement 1, June):22–33 | Educational workshop improved information-seeking skills, knowledge, attitudes and the search outcome of hospital clinicians: a randomised controlled trial | Other | Clinical Practice | Practice | 3 hour educational workshop with supervised hands on practice | Search q development, DB search skills, Evaluation search results, Search behaviour/attitude | Survey, Perceptions, MCQ, Usage rates | NC | NC
Davidson RA et al. Academic Med 2004; 79(3):272–5 | Evaluating evidence based medicine skills during a performance based examination | US | Academic-medical | UG | EBM teaching | Search q development, DB search skills, Critical reasoning/appraisal skills, EBM principles/knowledge | Search scale/score/checklist, Vignette/scenario, OSCE | NC | NC
Dorsch JL et al. JMLA 2004; 92(4):397–406 | Impact of an evidence based medicine curriculum on medical students' attitudes and skills | US | Academic-medical | UG | EBM seminar series covering 5 steps of EBM, supported by pre seminar reading and worksheets | Search q development, DB search skills, Evaluation search results, EBM principles/knowledge, Research methods | Survey, Graded assessment, Perceptions | NC | NC
Dorsch JL BMLA 1997; 85(2):147–53 | Bioethicsline use by medical students: curriculum integrated instruction and collection development implications | US | Academic-medical | NC | Bibliographic instruction, with an introduction to library resources with an emphasis on epidemiology and bioethics | DB search skills, Knowledge resources | Transaction log, Usage rates, Citation analysis | NC | NC
Dyer H and Bouchet ML HLR 1995; 12(1):39–52 | A comparison between the perceived value of information retrieved via end user searching of CD Roms and mediated online searching | UK | Library | Other | End user search on CD Rom | Search value, Search quality | Relevance, Interview, Survey | NC | NC
Dyer H and Buckle P. London: British Library; 1995 | Who's been using my CD-ROM? Results of a study on the value of CD-Rom searching to users in a teaching hospital library | UK | Library | Other | No course—not a study of training, but some participants had received training; assessed quality of searches rather than the impact of teaching on quality of searching | Search quality, Search value | Relevance, Gold standard | NC | NC
Erickson S and Warner ER Medical Education 1998; 32(3):269–73 | The impact of an individual tutorial session on MEDLINE use among obstetrics and gynaecology residents in an academic training programme: a randomized trial | US | Clinical Practice | Resident | 1 hour individual tutorial on MEDLINE | DB search skills, Search behaviour/attitude | Recall precision, Perceptions, Usage rates, Relevance | Discussed, not tested | NC
Fox LM et al. BMLA 1996; 84(2):182–90 | A multidimensional evaluation of a nursing information-literacy program | US | Academic-nursing | UG | Introduction of a nursing information literacy program integrated into the nursing curriculum | Knowledge resources, Confidence, Search behaviour/attitude | Graded assessment, Perceptions, MCQ | NC | NC
Francis BW and Fisher CC BMLA 1995; 83(4):492–8 | Multilevel library instruction for emerging nursing roles | US | Academic-nursing | UG | Course integrated bibliographic instruction for all nursing programs | Knowledge resources, Search knowledge | Test, Survey | NC | NC
Gardois P et al. Hypothesis 2005 Spring; 19(1):1 | Assessing the efficacy of EBM teaching in a clinical setting | Other-Europe | Academic-medical | Practice | Educational program in medical information retrieval and EBM | Search q development, Knowledge resources, DB search skills, EBM skills, Search behaviour/attitude | Survey, Perceptions, MCQ | Content, Face | Inter rater, Internal Consistency
Gibson KE and Silverberg M BMLA 2000; 88(2):157–64 | A two year experience teaching computer literacy to first year medical students using skill based cohorts | US | Academic-medical | UG | Computer literacy course | Computer literacy skills | Test, Perceptions | Discussed, not tested | NC
Grant KL et al. American Journal of Pharmaceutical Education 1996; 60:281–286 | Teaching a systematic search strategy improves literature retrieval skills of pharmacy students | US | Academic-PAM | UG | Lecture/computerised demonstration of a systematic approach to computerised literature retrieval | DB search skills | Search scale/score/checklist | NC | NC
Graves RS Boston: Scarecrow Press Inc; 2001. pp. 288–292 | Partnering with occupational therapy: the evolution of an information literacy program | US | Academic-PAM | UG | Range of information/library courses building up into an information literacy curriculum | DB search skills | Search scale/score/checklist | NC | NC
Haynes RB et al. BMLA 1991; 79(4):377–81 | Online access to MEDLINE in clinical settings: impact of user fees | Canada | Clinical Practice | Practice | Training and free use of MEDLINE, then allocation to fee group | DB search skills, Search quality, Search value | Recall precision, Relevance | Discussed, not tested | NC
Haynes RB et al. Annals of Internal Medicine 1990; 112(1):78–84 | Online access to MEDLINE in clinical settings: a study of use and usefulness | Canada | Clinical Practice | Practice | Provision of free online access to MEDLINE (Grateful Med); participants offered an introduction to online searching and 2 hours free search time | DB search skills, Search quality, Search value | Recall precision, Relevance | Discussed, not tested | NC
Hersh W and Hickam D BMLA 1994; 82(4):382–9 | Use of a multi-application computer workstation in a clinical setting | US | Clinical Practice | Resident | Availability of MEDLINE and other resources in practice setting + brief training and manual | DB search skills, Search behaviour/attitude | Recall precision, Relevance, Perceptions, Usage rates | NC | Inter rater
Hersh W et al. JASIS 1996; 47(1):50–56 and Proceedings of the 58th Annual Meeting of the American Society for Information Science 1995:112–116 | A task-oriented approach to information retrieval evaluation | US | Academic-medical | UG | Brief MEDLINE training session + searching test (automated, natural language system) | DB search skills, Evaluation search results | Test, Transaction log, Survey, Search scale/score/checklist | NC | NC
Holloway R et al. Medical Education 2004; 38(8):868–78 | Teaching and evaluating first and second year medical students' practice of evidence-based medicine | US | Academic-medical | UG | EBM curriculum | Search q development, DB search skills, Critical reasoning/appraisal skills, Application of evidence, Search behaviour/attitude | Search scale/score/checklist, Vignette/scenario | Discussed, not tested | Test retest
Jacobs SK et al. Journal of Professional Nursing 2003; 19(5):320–8 | Information literacy as the foundation for evidence based practice in graduate nursing education: a curriculum integrated approach | US | Academic-nursing | PG | Introduction of modular information literacy components into nursing program | Knowledge resources, DB search skills, Library orientation | Test, MCQ | NC | NC
Johnston JM et al. Medical Education 2003; 37(11):992–1000 | The development and validation of a knowledge, attitude and behaviour questionnaire to assess undergraduate evidence-based practice teaching and learning | Other | Academic-medical | UG | Undergraduate EBM teaching | EBM principles/knowledge, Search behaviour/attitude | Survey | Content, Face, Criterion | NC
Jwayyed S et al. Academic Emergency Medicine 2002; 9(2):138–45 | Assessment of emergency medicine residents' computer knowledge and computer skills: time for an upgrade? | US | Clinical Practice | Resident | Questionnaire about PC use and ability to perform 23 informatics related tasks; 3 part test designed to test the skills mentioned in the questionnaire | Computer literacy skills, DB search skills | Test | NC | NC
King NS BMLA 1993; 81(4):439–41 | End-user errors: a content analysis of PaperChase transaction logs | US | Library | Other | Analysis of search errors | DB search skills | Transaction log, Search scale/score/checklist | NC | NC
Koufogiannakis D et al. HILJ 2005; 22(3):189–95 | Impact of librarians in first year medical and dental student problem based learning (PBL) groups: a controlled study | Canada | Academic-medical | UG | Addition of librarian to PBL groups | Search q development, DB search skills, EBM principles/knowledge, Search behaviour/attitude | Perceptions, MCQ | NC | NC
Linton AM et al. Medical Reference Services Quarterly 2004; 23(2):21–31 | Evaluation of evidence based medicine search skills in the clinical years | US | Academic-medical | UG | EBM workshop | Knowledge resources, DB search skills, EBM principles/knowledge, EBM skills, Search knowledge | Test, Survey, Perceptions | NC | NC
Martin S HLR 1998; 15:111–116 | Reflections on a user education session with nursing students | US | Academic-nursing | UG | Library skills session + reflective essay (marked as part of coursework) | Evaluation search results, Search behaviour/attitude | Graded assessment | NC | NC
Martindale K Learning Resources Journal 1995; 11(2):37–40 | Teaching information skills on CD Rom: a conceptual approach | UK | Academic-nursing | UG | Teaching session—few details, apart from taught concepts of literature searching rather than practicalities | DB search skills, Search behaviour/attitude | Interview | NC | NC
McKibbon KA et al. AMIA 1992:73–77 | A study to enhance clinical end-user MEDLINE search skills: design and baseline findings | Other-Europe | Clinical Practice | Practice | 1 hour training + 1 hour individualised searching + preceptor to give individualised feedback on first 10 searches | Search quality, Search behaviour/attitude, Evaluation search results, Search value | Recall precision, Perceptions, Survey | Discussed, not tested | NC
Nelson JL Medical Reference Services Quarterly 1992; 11(4):11–21 | An analysis of transaction logs to evaluate the educational needs of end users | US | Library | Other | None | DB search skills, Search behaviour/attitude | Transaction log, Search scale/score/checklist | NC | NC
Pao ML et al. Academic Medicine 1994; 69(11):914–20 | Factors affecting students' use of MEDLINE | US | Academic-medical | UG | None | DB search skills | Graded assessment, Recall precision, Transaction log, Search scale/score/checklist | NC | NC
Pearce-Smith N INFORM 2005–2006; 16(3):17–18 | Randomised controlled trial comparing the effect of e-learning with a taught workshop on the knowledge and skills of health professionals | UK | Clinical Practice | Practice | Workshop in search skills taught by a librarian | DB search skills, Evaluation search results, Research methods | Search scale/score/checklist, Vignette/scenario, MCQ | NC | NC
Poyner et al. HILJ 2002; 19(2):84–9 | Distance learning project—information skills training: supporting flexible trainees in psychiatry | UK | Academic-medical | Resident | 4 individual intensive (2–3 hour) home based visits, supplemented by 5 half day workshops covering Microsoft Office (various aspects), Cochrane Library, PubMed and Internet sources of information; tailored to individual needs and skills, but ensuring that all had covered the same syllabus by the end of the project | DB search skills, Computer literacy skills | Test | NC | NC
Ramos KD et al. BMJ 2003; 326(7384):319–21 | Validation of the Fresno test of competence in evidence based medicine | US | Academic-medical | Resident | Test development | EBM principles/knowledge, Knowledge resources, Search q development, Critical reasoning/appraisal skills | Test | Face, Content | Inter rater, Internal Consistency
Rosenberg WM et al. Journal of the Royal College of Physicians of London 1998; 32(6):557–63 | Improving search skills and evidence retrieval | UK | Academic-medical | UG | 3 hour interactive training session in search question formulation and MEDLINE searching, delivered by 2 librarians to groups of 4–7 students | DB search skills | Search scale/score/checklist, Relevance, Perceptions | NC | NC
Rosenfeld P et al. CIN: Computers, Informatics, Nursing 2002; 20(6):236–41 | Piloting an information literacy program for staff nurses | US | Clinical Practice | Practice | Information literacy/searching for EBP training | DB search skills | Transaction log, Search scale/score/checklist | NC | Inter rater
Ross R and Verdieck A Academic Medicine 2003; 78(4):412–7 | Introducing an evidence based medicine curriculum into a family practice residency—is it effective? | US | Academic-medical | Resident | 10 session EBM workshop series | Search q development, Critical reasoning/appraisal skills, EBM principles/knowledge | MCQ, Interview | Content | NC
Sathe NA et al. JMLA 2004; 92(4):459–64 | A power information user (PIU) model to promote information integration in Tennessee's public health community | US | Clinical Practice | Practice | Observation of workplace and information needs; development and delivery of tailored curriculum | Application of evidence, Search behaviour/attitude | Survey, Interview | NC | NC
Shelstad KR and Clevenger FW Journal of Surgical Research 1994; 56(4):338–44 | On-line search strategies of third year medical students: perception vs. fact | US | Academic-medical | UG | Grateful Med training | DB search skills, Search behaviour/attitude | Survey, Gold standard | NC | NC
Shorten A et al. International Nursing Review 2001; 48(2):86–92 | Developing information literacy: a key to evidence based nursing | Australia | Academic-nursing | UG | A series of information literacy learning activities integrated into the nursing curriculum | Knowledge resources, Search knowledge, Confidence | Perceptions, Survey | NC | NC
Smith CA et al. Journal of General Internal Medicine 2000; 15(10):710–5 | Teaching residents evidence based medicine skills | US | Clinical Practice | Resident | 7 week EBM course, twice per week, focusing on 4 essential EBM skills | Search q development, DB search skills, Application of evidence, Research methods | Vignette/scenario, Survey, Search scale/score/checklist | NC | NC
Srinivasan M et al. Journal of General Internal Medicine 2002; 17(1):58–65 | Early introduction of an evidence based medicine course to preclinical medical students | US | Academic-medical | UG | 1 month problem based EBM course | EBM principles/knowledge | Survey, Transaction log, Perceptions, MCQ | NC | NC
Steele GA and Greenidge E HILJ 2002; 19(4):206–13 | Integrating medical communication skills with library skills curricula among first year medical students at the University of the West Indies, St Augustine | Other | Academic-medical | UG | Integration of communication and library skills into the medical school curriculum | DB search skills, Knowledge resources | Test, Graded assessment | NC | NC
Verhoeven AA et al. Family Practice 2000; 17(1):30–5 | Which literature retrieval method is most effective for GPs? | Other-Europe | Academic-medical | Practice | 2 hour introduction to literature retrieval methods and on site training session on Index Medicus | Search quality, Search value | Search scale/score/checklist, Recall precision, Survey | NC | NC
Vogel EW et al. JMLA 2002; 90(3):327–30 | Finding the evidence: teaching medical residents to search MEDLINE | US | Academic-medical | Resident | 3 hour workshop: pre workshop scenario, 30 minute didactic presentation on MEDLINE searching, remainder of session practical on PCs; guided step by step searching for two scenarios, independent practice of a third scenario; lecture notes and MEDLINE instructions provided | DB search skills | Search scale/score/checklist, Relevance | NC | NC
Wallace MC et al. Nurse Education Today 2000; 20(6):485–9 and Nurse Education Today 1999; 19(2):136–41 | Teaching information literacy skills: an evaluation, and Integrating information literacies into an undergraduate nursing programme | Australia | Academic-nursing | UG | Curriculum integrated information literacy programme | Search q development, Knowledge resources, DB search skills, Critical reasoning/appraisal skills, Confidence | Graded assessment, Survey | NC | NC
Wildemuth BM and Moore ME BMLA 1995; 83(3):294–304 | End-user search behaviours and their relationship to search effectiveness | US | Academic-medical | UG | None | DB search skills, Search behaviour/attitude | Transaction log, Search scale/score/checklist, Gold standard, Perceptions | NC | NC
Woods SE and Francis BW BMLA 1996; 84(1):108–9 | MEDLINE as a component of the objective structured clinical examination: the next step in curriculum integration | US | Academic-medical | UG | Introduction of a MEDLINE station into OSCE | DB search skills | Vignette/scenario, OSCE | NC | NC