Search strategies to identify observational studies in MEDLINE and EMBASE

  • Protocol
  • Methodology



This is the protocol for a review and there is no abstract. The objectives are as follows:

To assess the sensitivity and precision of methodological search filters for the detection of observational studies in MEDLINE and EMBASE.


Description of the problem or issue

Systematic reviews of the literature have become vital decision-making aids for clinicians, researchers, policy makers and patients (Gough 2012a; Ligthelm 2007; Manchikanti 2009; Wilczynski 2007). They provide a formal synthesis of a large and ever-increasing body of research literature. Systematic reviews typically address specified questions and can, as a result, help to (1) establish links between available information and potentially beneficial (or harmful) interventions, (2) compare and contrast conflicting results, and (3) identify gaps in medical knowledge (Manchikanti 2009; Wilczynski 2007).

To achieve their objectives, systematic reviews rely on explicit strategies to search for relevant evidence and on methodological criteria against which to evaluate that evidence (Wilczynski 2007). Researchers searching for relevant studies, including those conducting a systematic review, can use structured search strategies to facilitate this process (Wilczynski 2007). However, finding studies on a given topic can be challenging, particularly when the search targets a specific study design. Wilczynski 2007 attributes this difficulty to the dispersion of relevant papers across numerous scientific journals, inherent limitations in indexing, and a lack of search skills amongst database users.

Systematic reviews vary in many respects, including the types of research questions they ask (Gough 2012b). The review question, in turn, determines the methods and the types of data that are most appropriate for answering it (Gough 2012a; Gough 2012b). Questions about the effectiveness of interventions are typically best answered by data from randomised controlled trials (RCTs) (Glasziou 2001; Ligthelm 2007), whereas questions of aetiology and risk, prediction and prognosis, or the frequency of rare outcomes or complications are usually best answered by data from observational studies (Furlan 2006; Glasziou 2001).

However, there are circumstances in which evidence from observational studies is needed to assess the effectiveness of interventions or safety outcomes: when data from RCTs are insufficient, or when the findings of RCTs appear to be contradictory (Fraser 2006; Furlan 2006; Manchikanti 2009). Improvements in observational-study methods and statistical analyses have made observational studies an important source of evidence, particularly with regard to the side effects or adverse events associated with health interventions (Ligthelm 2007; Manchikanti 2009; Wieland 2005). As Ligthelm 2007 argues, observational studies can complement data from RCTs to provide an evidence base for clinical decision-making or for policy-making. While searching for RCTs has become a relatively simple task since the 1990s (Lefebvre 2013), limitations in indexing practices can make the identification of observational studies particularly challenging.

Description of the methods being investigated

Searching health-related bibliographic databases with a search strategy is the method that The Cochrane Collaboration and other evidence-based healthcare organisations require for identifying relevant study reports for a systematic review. MEDLINE and EMBASE are the principal databases of biomedical scientific literature. Together, they contain abstracts for many millions of published articles in this field, although the extent of their coverage varies with the topic of interest. Records in these databases can be searched electronically for words in the title or abstract, and for assigned index terms. Index terms are controlled vocabulary terms that indexers assign to each record after reviewing it (Higgins 2011). Searching both databases is usually the minimum requirement for anyone wishing to conduct a systematic review, even though their contents often overlap (by between 10% and 87% of indexed records, depending on the topic under consideration) (Manchikanti 2009).

Search strategies can be complemented by search filters: predefined combinations of terms designed to retrieve a selection of records on the basis of a particular concept (CRD 2012). Filters that retrieve records on the basis of study design are often referred to as methodological filters. The combination of search filters with content terms in turn determines the performance properties of a search strategy, namely its sensitivity, precision (also called positive predictive value, PPV) and specificity (Doust 2005; Fraser 2006).

How these methods might work

Evaluating a search strategy relies on the availability of a reference standard against which to compare its performance; in this case, the reference standard is the set of studies included in a systematic review (Sampson 2006). By comparing the records retrieved by a search strategy with a methodological filter with those retrieved by the same strategy without the filter, and checking both sets against the reference standard, it is possible to calculate the performance properties of the filter. In the context of systematic reviews, sensitivity and precision are the most relevant performance properties of a search filter (Sampson 2006). Sensitivity, also referred to as recall, is the number of relevant records in a database identified by the search strategy as a proportion of the total number of relevant records in the database (Sampson 2006). Precision is the number of relevant records identified by the search strategy as a proportion of the total number of records the search retrieves (Doust 2005; Furlan 2006; Sampson 2006).
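The two definitions can be illustrated with a minimal Python sketch. It assumes that each record is represented by a hypothetical identifier string and that the reference standard is the set of studies included in a review; the identifiers and counts are invented.

```python
def sensitivity(retrieved: set[str], reference: set[str]) -> float:
    """Share of reference-standard records that the search retrieved (recall)."""
    if not reference:
        raise ValueError("reference standard is empty")
    return len(retrieved & reference) / len(reference)


def precision(retrieved: set[str], reference: set[str]) -> float:
    """Share of retrieved records that belong to the reference standard."""
    if not retrieved:
        raise ValueError("search retrieved no records")
    return len(retrieved & reference) / len(retrieved)


# Hypothetical toy data: record identifiers are invented for illustration only.
reference_standard = {"r1", "r2", "r3", "r4"}           # studies included in a review
search_results = {"r1", "r2", "r3", "x1", "x2", "x3"}   # records a strategy returned

print(sensitivity(search_results, reference_standard))  # 0.75
print(precision(search_results, reference_standard))    # 0.5
```

Here three of the four reference studies are found (sensitivity 0.75), and three of the six retrieved records are relevant (precision 0.5).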

Review authors should aim for search strategies that have both high sensitivity and high precision (Sampson 2006). On the one hand, authors should identify and include all possibly relevant reports (high sensitivity) to reduce the likelihood of bias in their systematic reviews and to reduce random error in meta-analyses (Edwards 2002; Robinson 2002). On the other hand, they should retrieve as few irrelevant records as possible (high precision) to minimise the burden on available resources (Gough 2012a; Gough 2012b; Sampson 2006). In practice, however, there is a trade-off between these two properties.

An ideal methodological filter could help review authors to achieve this balance by maintaining the sensitivity of a content-only search strategy while increasing its precision (Doust 2005; Fraser 2006). In theory, applying a methodological filter to a search strategy would reduce the number of records retrieved, and therefore the number that need to be evaluated for inclusion in the review, without excluding relevant papers. In practice, however, by reducing the number of hits, a methodological filter could also increase the likelihood of missing relevant records that would otherwise be included in a systematic review.
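The trade-off can be sketched in Python, assuming that a filter is applied by combining it with the content terms using AND, so that the filtered results are the intersection of the two sets of records. Because the filtered set is always a subset of the unfiltered one, sensitivity can stay the same or fall but never rise; precision rises only if the filter removes a larger proportion of irrelevant records than of relevant ones. All record identifiers are invented.

```python
def evaluate(retrieved: set[str], reference: set[str]) -> tuple[float, float]:
    """Return (sensitivity, precision); assumes both sets are non-empty."""
    relevant_found = len(retrieved & reference)
    return relevant_found / len(reference), relevant_found / len(retrieved)


# Hypothetical toy data, invented for illustration only.
reference = {"r1", "r2", "r3", "r4"}
content_hits = {"r1", "r2", "r3", "r4", "x1", "x2", "x3", "x4"}  # content terms only
filter_hits = {"r1", "r2", "r3", "x1", "y1", "y2"}               # methodological filter alone

# Combining the filter with the content search (content AND filter) keeps only
# records matched by both, so the filtered set is a subset of content_hits.
filtered = content_hits & filter_hits
assert filtered <= content_hits

print(evaluate(content_hits, reference))  # (1.0, 0.5)
print(evaluate(filtered, reference))      # (0.75, 0.75)
```

In this toy case the filter raises precision from 0.5 to 0.75 but loses one relevant record, lowering sensitivity from 1.0 to 0.75.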

Why it is important to do this review

Searching health-related literature by study design can identify the study type of primary interest efficiently and save time (Littleton 2004). Specifying the study design in a search is relatively easy for RCTs, owing to initiatives such as the Cochrane Central Register of Controlled Trials (CENTRAL), the introduction of the Consolidated Standards of Reporting Trials (CONSORT) statement (which is linked to better reporting of RCTs in titles and abstracts), appropriate indexing terms in MEDLINE and EMBASE, and the publication of highly sensitive filters (Fraser 2006; Lefebvre 2013).

The situation is different for observational studies. Indexing using Medical Subject Headings (MeSH) intervention terms is limited and, when such terms are used, they are usually applied inconsistently (Fraser 2006; Wieland 2005). Despite the introduction of statements such as the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines, reporting of methodological detail in observational studies is still poor, contributing to problems in indexing and searching (Fraser 2006; Manchikanti 2009). The lack of appropriate search terms for observational studies has greatly contributed to the exclusion of methodological components from search strategies (Fraser 2006). As a consequence, searches often yield a large number of irrelevant records, leading to inefficient use of resources and increasing the time needed to complete a review (Doust 2005). For this reason, it is necessary to explore the literature for recent developments in search approaches that can identify observational studies efficiently.

In addition, there appear to be no agreed, universal standard criteria for the creation of a search strategy (Lemeshow 2005), although guidelines are available to anyone thinking of undertaking a systematic review, particularly in relation to RCTs. In the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011), The Cochrane Collaboration presents its Highly Sensitive Search Strategy (HSSS) for identifying RCTs in MEDLINE. Similarly, The Cochrane Collaboration is working towards an objectively derived HSSS for identifying RCTs in EMBASE. Another group, the InterTASC Information Specialists' Sub-Group (ISSG), focuses on the identification, assessment and testing of search filters intended to select studies according to their design or focus (CRD 2012). The ISSG offers resources relating to various study designs, including RCTs, observational studies, diagnostic studies and economic evaluations.

Attempts have been made to appraise the evidence for search filters. A recent Cochrane systematic review (Leeflang 2013) evaluated the performance of search filters designed to retrieve diagnostic test accuracy (DTA) studies in MEDLINE and EMBASE. Similarly, a protocol by McDonald 2013, which has since been withdrawn, set out to assess search strategies for identifying RCTs in MEDLINE. However, we could not identify a similar protocol or review of search strategies for observational studies. A systematic review in this area could help to identify the specific features of a search strategy that improve the identification of observational studies and, as a result, contribute to the creation of evidence-based standards for formulating search strategies.


Objectives

To assess the sensitivity and precision of methodological search filters for the detection of observational studies in MEDLINE and EMBASE.


Criteria for considering studies for this review

Types of studies

We will include studies that compare search strategies containing methodological filters for identifying observational studies in MEDLINE and EMBASE against a reference standard. In this review, we use the term 'observational studies' to refer to the classic epidemiological designs (i.e. case-control, cohort and cross-sectional studies) as well as other study designs such as case series, controlled before-and-after (CBA) studies and interrupted time series (ITS), among others.

Types of data

We will include data from published, unpublished and grey literature comparing two or more search strategies for retrieving observational studies in MEDLINE and EMBASE.

We will exclude studies that compare the effectiveness of the same strategy across different bibliographic databases or search interfaces. We will not exclude studies on the basis of their language or time of publication. We will not consider for inclusion any studies that focus on the retrieval of observational studies from bibliographic databases other than MEDLINE or EMBASE.

Types of methods

We will include studies that compare a search strategy that contains a methodological filter with a search strategy without a methodological filter. We will also include studies that compare two or more different methodological filters. We will focus on methodological filters for searching observational studies in MEDLINE or EMBASE.

Types of outcome measures

Primary outcomes
  • Sensitivity: defined as the number of relevant reports in a database that were identified by the search strategy as a proportion of the total number of relevant reports identified by the reference standard; and

  • Precision: defined as the number of relevant reports identified by the search strategy as a proportion of the total number of records retrieved by the search strategy.

We will assess the performance properties of the methodological filters against a reference standard. For each included study, we will measure the change in sensitivity and precision between a search without a methodological filter and the same search with a filter added, or between searches that use different methodological filters.
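The change described above can be computed from the counts that an included study reports. This is a sketch under our own assumptions: the `SearchResult` dataclass and its field names are hypothetical, the change is expressed as an absolute difference in proportions (after minus before), and the counts are invented.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Counts a study might report for one search strategy (hypothetical fields)."""
    records_retrieved: int   # total hits returned by the strategy
    relevant_retrieved: int  # hits that belong to the reference standard
    reference_total: int     # size of the reference standard

    @property
    def sensitivity(self) -> float:
        return self.relevant_retrieved / self.reference_total

    @property
    def precision(self) -> float:
        return self.relevant_retrieved / self.records_retrieved


def change(before: SearchResult, after: SearchResult) -> tuple[float, float]:
    """Absolute change (after minus before) in sensitivity and precision."""
    return (after.sensitivity - before.sensitivity,
            after.precision - before.precision)


# Invented counts: the same content search without and with a filter.
no_filter = SearchResult(records_retrieved=2000, relevant_retrieved=50, reference_total=50)
with_filter = SearchResult(records_retrieved=500, relevant_retrieved=45, reference_total=50)

d_sens, d_prec = change(no_filter, with_filter)
print(f"change in sensitivity: {d_sens:+.3f}")  # -0.100
print(f"change in precision:   {d_prec:+.3f}")  # +0.065
```

Using the same `reference_total` for both strategies reflects the requirement that a study compare its filters against one reference standard.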

Search methods for identification of studies

Electronic searches

We will run the search strategy outlined in Appendix 1 in MEDLINE and adapt it for use in the Cochrane Methodology Register (CMR), EMBASE, CINAHL and Google Scholar.

Searching other resources

We will search the grey literature by adapting the search strategy outlined in Appendix 1 for use in OpenGrey.

We will also check the reference lists of all relevant primary studies and review articles for additional references (Horsley 2011). We will contact authors of included studies and ask them about additional published and unpublished studies (Young 2011).

Data collection and analysis

Selection of studies

TR will implement the search strategy described above. We will import the references into EndNote 2010 and remove duplicate records. Two authors (JMB and LTC) will independently screen the titles and abstracts, and JMB will obtain the full-text reports of potentially relevant records. JMB and LTC will then independently assess the full texts of these records against the inclusion and exclusion criteria. They will resolve any disagreement through discussion; if they cannot reach agreement, LG or JC will act as arbiter.

Data extraction and management

JMB and LTC will independently extract data from included studies using a structured form. They will compare their data extraction forms and follow up any discrepancies with reference to the original report.

Where possible, we will extract at least the following information from each included study:

  • general information about the study (e.g. study authors, journal of publication, original language of publication, year of publication);

  • study methods including aim of the study and study design;

  • number and type of methodological filters being compared;

  • number of records yielded by each of the methodological filters in MEDLINE, EMBASE or both;

  • sensitivity and precision of each of the methodological filters; and

  • reference standard against which each methodological filter was compared. Each study should use the same reference standard for all of the filters it assesses.

We will summarise this information in a Characteristics of included studies table.

Assessment of risk of bias in included studies

We will assess the included studies on their reporting of the development and implementation of the search filters being studied, across the following domains: (1) intended application; (2) reference standard, its appropriateness, and whether it was independent of the reference standards used to develop the methodological filters; (3) filter validation; (4) limitations; (5) requirements of the intended application; (6) accuracy of translation of filters between platforms (if applicable); and (7) ability to accurately reproduce historical searches when the included studies of a review serve as the reference standard.

Measures of the effect of the methods

We will evaluate the characteristics of included studies to determine whether a meta-analysis is feasible. We will pool the primary outcomes in a meta-analysis only if the studies in question evaluate the same methodological filter.

For each included study, we will attempt to calculate the differences between the search strategies being compared for each of the primary outcomes of interest (i.e. sensitivity and precision). We will perform all statistical analyses using Review Manager 5 (RevMan 2012). For comparisons across studies, we will use the sensitivity and precision of the search strategies being assessed.

Dealing with missing data

We will exclude studies that do not provide enough data to calculate the performance properties (i.e. sensitivity and precision) of the search strategies being evaluated. Therefore, we do not expect to find situations of missing data within our analyses.

Assessment of heterogeneity

We will compare the included studies across the following characteristics: (1) dates when the searches were conducted, (2) interfaces used to run the searches and (3) study design. For studies deemed similar across these domains, we will assess heterogeneity using the I² statistic. If I² is greater than 50%, indicating substantial heterogeneity, we will neither perform a meta-analysis nor assess publication bias (Higgins 2011). If a meta-analysis is indicated, we will calculate the mean change in sensitivity and precision for the filters, pooling only different assessments of the same filter.

Data synthesis

We will create tables to summarise the data extracted from the included studies with separate tables for those strategies implemented in MEDLINE and for those implemented in EMBASE. Within each of these databases, we will create separate tables according to the type of observational study for which the strategy was developed. In each table, we will include information about the date on which the strategy was implemented, the reference standard used to validate the methodological filter, the filter's sensitivity and precision, and the interface in which the strategies were run.

If suitable numerical data are available and a meta-analysis is appropriate, we will synthesise the data separately for each primary outcome. For each primary outcome, we will calculate the mean difference (MD) and its 95% confidence interval (CI).
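As an illustration only, a pooled mean difference with a 95% CI can be sketched as follows. The protocol does not state which pooling model will be used; the fixed-effect inverse-variance model, the normal approximation (z = 1.96) and the assumption that each study supplies a difference with its standard error are choices made for this sketch, and the numbers are invented.

```python
import math


def pooled_mean_difference(diffs: list[float], std_errors: list[float],
                           z: float = 1.96) -> tuple[float, float, float]:
    """Fixed-effect inverse-variance pooled MD with its confidence limits."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    # Standard error of the pooled estimate is the root of the inverse total weight.
    se_pooled = math.sqrt(1 / sum(weights))
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled


# Invented changes in precision from three assessments of the same filter.
md, lower, upper = pooled_mean_difference([0.06, 0.08, 0.05], [0.02, 0.02, 0.04])
print(f"MD = {md:.3f} (95% CI {lower:.3f} to {upper:.3f})")  # MD = 0.068 (95% CI 0.042 to 0.094)
```

The third assessment, with the largest standard error, receives the smallest weight and so pulls the pooled estimate only slightly towards its value.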

If appropriate numerical data are not available, or if a meta-analysis is not appropriate, we will adopt a narrative synthesis approach to the presentation of our results.

We will use the GRADE approach to assess the quality of the evidence, the magnitude of effect observed and the overall available data on our primary outcomes of interest, to produce a 'Summary of findings' table.

Subgroup analysis and investigation of heterogeneity

We intend to perform subgroup analyses according to the bibliographic database searched (i.e. MEDLINE and EMBASE). Additionally, if we identify more than one assessment of the same search filter, we will calculate the mean effect for that filter separately.

Sensitivity analysis

We will conduct a sensitivity analysis if (1) one or more included studies are dominant in terms of their size, (2) the results of one or more included studies differ significantly from those of the other included studies (based on assessing the overlap of 95% CIs), or (3) quality issues are identified when assessing the risk of bias of included studies.
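Criterion (2) can be operationalised in several ways. One simple reading, assumed here, is to flag any study whose 95% CI overlaps with none of the other studies' CIs; non-overlap is a crude screen rather than a formal statistical test. The study names and intervals are hypothetical.

```python
def cis_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two closed intervals (lower, upper) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def non_overlapping(cis: dict[str, tuple[float, float]]) -> list[str]:
    """Studies whose CI overlaps with none of the other studies' CIs."""
    return [name for name, ci in cis.items()
            if not any(cis_overlap(ci, other)
                       for other_name, other in cis.items() if other_name != name)]


# Invented 95% CIs for the change in sensitivity in three hypothetical studies.
intervals = {
    "Study A": (-0.12, -0.02),
    "Study B": (-0.10, 0.01),
    "Study C": (-0.40, -0.25),
}
print(non_overlapping(intervals))  # ['Study C']
```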




Appendix 1. MEDLINE search strategy

1. Epidemiologic Studies/

2. exp Case-Control Studies/

3. exp Cohort Studies/

4. Cross-Sectional Studies/

5. (epidemiologic adj (study or studies)).ab,ti.

6. case control.ab,ti.

7. (cohort adj (study or studies)).ab,ti.

8. cross sectional.ab,ti.

9. cohort analy$.ab,ti.

10. (follow up adj (study or studies)).ab,ti.

11. longitudinal.ab,ti.

12. retrospective$.ab,ti.

13. prospective$.ab,ti.

14. (observ$ adj3 (study or studies)).ab,ti.

15. adverse effect?.ab,ti.

16. 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15

17. medline.ti.

18. embase.ti.

19. pubmed.ti.

20. (database? and searching).ti.


22. *PubMed/

23. *Databases, Bibliographic/

24. 17 or 18 or 19 or 20 or 21 or 22 or 23

25. 16 and 24

26. ((identify$ or develop$ or design$ or test$ or assess$ or evaluat$ or robust$ or optim$ or effic$ or effect$ or sensitiv$ or simpl$ or specific$ or precis$) adj3 ("search strat$" or "search filter?")).ab,ti.

27. 16 and 26

28. 25 or 27
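The numbered lines combine as set operations on the records each line retrieves: "or" takes the union and "and" the intersection. The following Python sketch mirrors the final steps of the strategy (lines 25 to 28) using invented record identifiers; it illustrates the Boolean logic only and does not query MEDLINE.

```python
# Each numbered line retrieves a set of records; "or" is set union and
# "and" is set intersection. Record identifiers are invented for illustration.
lines: dict[int, set[str]] = {
    16: {"a", "b", "c", "d"},  # observational-study concepts (lines 1 to 15 combined)
    24: {"b", "e"},            # database-related terms (lines 17 to 23 combined)
    26: {"c", "f"},            # search strategy or search filter terms in title/abstract
}
lines[25] = lines[16] & lines[24]  # 25. 16 and 24
lines[27] = lines[16] & lines[26]  # 27. 16 and 26
lines[28] = lines[25] | lines[27]  # 28. 25 or 27

print(sorted(lines[28]))  # ['b', 'c']
```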

Contributions of authors

All authors contributed substantially to the development of this systematic review protocol.

Declarations of interest