Clinical Evidence diagnosis: developing a sensitive search strategy to retrieve diagnostic studies on deep vein thrombosis: a pragmatic approach


Sam Vincent, Bazian, 138 Upper Street, London N1 1QP, UK. E-mail:


Objectives: To devise and evaluate a sensitive search strategy to retrieve diagnostic studies on specific diagnostic tests for deep vein thrombosis (DVT).

Methods: Systematic reviews on diagnostic tests for DVT were identified and the studies cited by them used to produce a reference set of search results (to be used to evaluate different search strategies). Five existing diagnosis search filters were combined to produce a sensitive search. This combined search was then refined to produce a more specific strategy, which was run on medline and the results were checked against the reference set. This search was too specific and was modified to produce a more balanced final strategy, which was again tested and the results compared with the reference set. The sensitivity of this newly created strategy was compared with the existing diagnosis searches already found. Finally, studies identified by the final search strategy were critically appraised for validity and relevance and the selected articles were compared with those found in the reference set.

Results: The final filter retrieved 124 out of 126 references from the reference set. From the search result, 227 cohort studies were selected and 147 of these were not cited in any of the systematic reviews on diagnostic tests for DVT.

Conclusions: The search strategy had 98.4% sensitivity. The precision of 8.8%, although low, compares well with other strategies with high sensitivity. Most of the systematic reviews on diagnosing a DVT have omitted a number of high quality articles.


Clinical Evidence1 presents the best available evidence on interventions for common clinical conditions. The medical literature is searched using validated search strategies for systematic reviews and randomised controlled trials,2 which are then critically appraised, and the resulting evidence is summarised. Because it is an evidence-based publication, searches are regularly re-run and each chapter is updated with new evidence as it is found. Clinical Evidence currently answers questions relating to prevention and treatment only; however, there is great interest in assessing whether the same approach could be used to answer diagnostic questions. Our aim was to develop a diagnostic search strategy that could be used to retrieve good-quality diagnostic studies for this purpose.

Although there are many validated search strategies available to retrieve randomised controlled trials,3–6 there are fewer evaluated search strategies to retrieve diagnostic studies.7,8 When used, many of the strategies have a very low specificity, especially for studies published before 1991, owing to the non-use of methodological MeSH headings in medline articles.9 As there were a number of strategies to choose from, it was felt important to test existing strategies and, if necessary, develop a different strategy that would improve on the 92% sensitivity of the PubMed Clinical Queries5 but be specific enough to use regularly.

It was decided to use the references cited in systematic reviews as a reference set of search results (a surrogate ‘gold standard’) against which the results of a diagnosis search strategy could be tested. To do this, a subject with several systematic reviews on diagnostic tests needed to be selected.

Diagnosing a deep vein thrombosis (DVT) was chosen as the first Clinical Evidence diagnostic topic; the question being ‘What are the effects of ultrasound, d-dimer, clinical prediction, magnetic resonance imaging, computed tomography or plethysmography in diagnosing a deep venous thrombosis in people with recent onset calf pain/swelling?’ The gold standard test was a venogram against which all other tests would be compared.

This topic was chosen, as there were 16 systematic reviews10–25 examining diagnostic tests for DVT. Much work has been done on defining criteria to identify high quality diagnostic studies. Essential features include: a representative patient sample; the test compared against a valid reference standard; participants given both the experimental and control tests; the test results measured blindly; the test being adequately described and its reliability assessed.26–30

Methods


Sensitivity was calculated as the proportion of citations in the reference set that were retrieved by the search. Precision was calculated as the proportion of all articles retrieved that were relevant and met the appraisal criteria.31
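
The two measures can be made concrete with a short sketch. The function names are illustrative and not part of the original study; the counts are those reported in this paper for the final strategy (124 of 126 reference-set citations retrieved, and 227 articles meeting the appraisal criteria out of 2578 retrieved).

```python
def sensitivity(retrieved_from_reference_set: int, reference_set_size: int) -> float:
    """Percentage of reference-set citations that the search retrieved."""
    return 100.0 * retrieved_from_reference_set / reference_set_size


def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Percentage of all retrieved articles that are relevant and pass appraisal."""
    return 100.0 * relevant_retrieved / total_retrieved


# Counts reported in this paper for the final strategy.
print(f"sensitivity = {sensitivity(124, 126):.1f}%")  # sensitivity = 98.4%
print(f"precision   = {precision(227, 2578):.1f}%")   # precision   = 8.8%
```

Note that sensitivity is measured against the reference set only, whereas precision needs every retrieved article to be appraised, which is why it is far more expensive to estimate.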

Defining the reference set

Systematic reviews were retrieved from medline (1966–2000) and embase (1980–2000) on OVID, using a validated systematic review search filter4 on diagnostic tests for DVT. Forty-two papers were found. These were critically appraised on the following criteria: specific inclusion and exclusion criteria stated; quality assessment of included articles; systematic search strategy. Sixteen systematic reviews10–25 were selected.

The references were examined, and articles that compared one of the diagnostic tests specified in our question against a venogram were listed. A search was done on medline 1966–2000 to ensure that the reference set contained only papers that were indexed on medline. The search strategy results would be tested against this reference set. The earliest reference in the reference set was dated 1969, so searches were run for papers published 1969–2000.

Although hand searching selected journals is the standard approach for defining a reference set, the cited references approach resulted in a set with a broader range of journals and publication years than could have been achieved by hand searching in the time available.

It is acknowledged that this assumes that the selected systematic reviews had adequate literature searches. Because of this, additional checks would be made—see Comparing Reference Set with Search Result.
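
The construction of the reference set amounts to pooling the citations of the selected reviews and keeping those indexed on medline. The following is a minimal sketch of that idea only; the review names, citation identifiers and the indexing check are all invented for illustration and do not come from the study data.

```python
# Hypothetical data: each review maps to the citations it lists that compare
# one of the index tests against venography. Identifiers are invented.
cited_by_review = {
    "review_1": {"ref_a", "ref_b", "ref_c"},
    "review_2": {"ref_c", "ref_d"},
    "review_3": {"ref_e"},
}
# Hypothetical result of checking each citation against medline.
indexed_in_medline = {"ref_a", "ref_b", "ref_c", "ref_e"}

# Pool citations across reviews; a set removes duplicates automatically.
pooled = set().union(*cited_by_review.values())

# Keep only citations that a medline search could possibly retrieve.
reference_set = pooled & indexed_in_medline

print(sorted(reference_set))  # ['ref_a', 'ref_b', 'ref_c', 'ref_e']
print(len(pooled) - len(reference_set), "citation(s) dropped as not indexed")
```

Restricting the set to indexed records matters because a citation that is absent from the database would count as a miss for every strategy and would lower all sensitivities equally.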

Identifying search strategies

A broad search (see Appendix 1) was performed in February 2002 on medline and evidence-based medical sites on the Internet such as CASP32 and Netting the Evidence33 to find existing diagnosis search strategies. Most of the strategies seem to have been derived from work done at McMaster University by Brian Haynes and Ann McKibbon.5

See Appendix 1 for details of strategies.

Devising a search strategy: strategy A

The terms in the selected strategies were combined. The text words (roc adj curve$) and (observer adj variation$) were added, as these are terms used in diagnostic assessment, and the MeSH terms ‘likelihood function’ and ‘diagnosis, differential’ were added, as these were common indexing terms found in articles on diagnostic tests. The assumption was that this would produce a very sensitive search strategy (strategy A in Table 1). The filter was combined with the subject terms for DVT and run on medline 1969–2000, and the result was checked to see whether the studies in the reference set were retrieved. Owing to the high cost of translation, only English language articles were included.

Table 1.  Details of search strategies for medline using OVID. 1966–2000.

Strategy A
 1. exp ‘sensitivity and specificity’/
 2. (sensitivity or specificity or accuracy).tw.
 3. ((predictive adj3 value$) or (roc adj curve$)).tw.
 4. ((false adj positiv$) or (false adj negativ$)).tw.
 5. ((observer adj variation$) or (likelihood adj3 ratio$)).tw.
 6. likelihood function/
 7. exp mass screening/
 8. diagnosis, differential/ or exp Diagnostic errors/
 9. di.xs or du.fs.
10. or/1–9

Strategy B
 1. exp ‘sensitivity and specificity’/
 2. (sensitivity or specificity or accuracy).tw.
 3. ((predictive adj3 value$) or (roc adj curve$)).tw.
 4. ((false adj positiv$) or (false adj negativ$)).tw.
 5. (observer adj variation$)
 6. likelihood function/
 7. exp Diagnostic errors/
 8. (likelihood adj3 ratio$).tw.
 9. or/1–8

Strategy C
 1. exp ‘sensitivity and specificity’/
 2. (sensitivity or specificity).tw.
 3. (predictive adj3 value$).tw.
 4. exp Diagnostic errors/
 5. ((false adj positiv$) or (false adj negativ$)).tw.
 6. (observer adj variation$).tw.
 7. (roc adj curve$).tw.
 8. (likelihood adj3 ratio$).tw.
 9. likelihood function/
10. exp *venous thrombosis/di, ra, ri, us
11. exp *thrombophlebitis/di, ra, ri, us
12. or/1–11

Devising a search strategy: strategy B

This strategy was then altered to produce a more specific strategy by removing all the general diagnostic terms, shown below, and re-run on medline with the subject terms for DVT:

  • exp mass screening/
  • diagnosis, differential/
  • di.xs
  • du.fs

The general terms were removed to see if relying on the MeSH heading ‘exp sensitivity and specificity’ and text words describing statistical terms would produce a sufficiently sensitive search.

However, the result was too specific and too many relevant papers from the reference set were not retrieved.
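
The derivation of strategy B from strategy A can be sketched as a filter over a list of terms. This is an illustration only: the terms are taken from Table 1 but split one per line for clarity (so the numbering differs from the table), and the helper `render_filter` is a hypothetical name, not something used in the study.

```python
# Strategy A as a list of atomic OVID terms (taken from Table 1, split one per line).
STRATEGY_A = [
    "exp 'sensitivity and specificity'/",
    "(sensitivity or specificity or accuracy).tw.",
    "(predictive adj3 value$).tw.",
    "(roc adj curve$).tw.",
    "(false adj positiv$).tw.",
    "(false adj negativ$).tw.",
    "(observer adj variation$).tw.",
    "(likelihood adj3 ratio$).tw.",
    "likelihood function/",
    "exp mass screening/",
    "diagnosis, differential/",
    "exp Diagnostic errors/",
    "di.xs.",
    "du.fs.",
]

# General diagnostic terms removed to obtain strategy B.
GENERAL_TERMS = {"exp mass screening/", "diagnosis, differential/", "di.xs.", "du.fs."}


def render_filter(terms):
    """Number the terms and append the OVID line that ORs them together."""
    lines = [f"{i}. {term}" for i, term in enumerate(terms, start=1)]
    lines.append(f"{len(terms) + 1}. or/1-{len(terms)}")
    return lines


strategy_b = [t for t in STRATEGY_A if t not in GENERAL_TERMS]
print("\n".join(render_filter(strategy_b)))
```

Keeping the filter as data makes it easy to test variants such as this one against the same reference set without retyping the whole strategy.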

Devising a search strategy: strategy C

The MeSH headings, titles and abstracts of the papers that had not been retrieved with strategy B were examined and the strategy altered. Most of the papers that had not been retrieved were published before 1991 and did not have an abstract, but did have the diagnosis MeSH subheadings. The subject terms combined with diagnosis subheadings were added to the diagnostic filter.

  • exp *venous thrombosis/di, ra, ri, us

  • exp *thrombophlebitis/di, ra, ri, us

The use of major MeSH headings in papers that had not been retrieved by strategy B was noted. Restricting the subject terms to major MeSH headings (MJMH) reduced the number of references by 1583:

  • 4418 references without MJMH

  • 2835 references with MJMH
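
As a rough illustration of the major-heading restriction, the sketch below adds the `*` focus marker to a subject term in the way the strategy C terms are written, and works through the reduction reported above. The function name `restrict_to_major` is hypothetical and simply manipulates the search string; it does not query any database.

```python
def restrict_to_major(term: str) -> str:
    """Add the '*' focus marker so the heading must be a major MeSH heading."""
    prefix = "exp "
    if term.startswith(prefix):
        return f"{prefix}*{term[len(prefix):]}"
    return f"*{term}"


without_major = 4418  # references retrieved without the major-heading restriction
with_major = 2835     # references retrieved with the restriction

print(restrict_to_major("exp venous thrombosis/di, ra, ri, us"))
# exp *venous thrombosis/di, ra, ri, us

reduction = without_major - with_major
print(reduction, f"({100 * reduction / without_major:.1f}% fewer)")
# 1583 (35.8% fewer)
```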

The final strategy is shown in Table 1 and Appendix 2.

The strategy was run on medline 1969–2000 and the results checked to see if all the references in the reference set were retrieved.

Comparing results of A with final strategy C

Strategy A was then run on medline alongside the final strategy C to produce a set of references that were retrieved by strategy A but not by strategy C.

It was necessary to perform this additional check on references omitted by the final strategy C, to allow for variable searching quality in the systematic reviews (most of the selected systematic reviews had very specific searches on medline, or did not specify the keywords of the strategy they had used). There was also little overlap in references between the systematic reviews, even when the date of the search was taken into account, which again implied that the searches were specific.

The abstracts were then critically appraised to see if the final more-specific strategy C missed any relevant papers. Full copies of papers were examined if no abstract was available, or if the abstract contained limited information.
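
This check is a set difference: the records found by the sensitive strategy A minus those found by strategy C. A minimal sketch follows, using invented record identifiers; on OVID itself the same result would come from combining two result sets with `not`.

```python
def only_in_sensitive(sensitive_ids: set, specific_ids: set) -> set:
    """Records retrieved by the sensitive search but missed by the specific one."""
    return sensitive_ids - specific_ids


# Hypothetical record identifiers, invented for illustration.
strategy_a_ids = {"rec1", "rec2", "rec3", "rec4", "rec5"}
strategy_c_ids = {"rec1", "rec3"}

to_appraise = only_in_sensitive(strategy_a_ids, strategy_c_ids)
print(sorted(to_appraise))  # ['rec2', 'rec4', 'rec5']
```

Only the records in this difference need appraising, since everything strategy C retrieved is assessed separately.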

Comparing with other diagnostic strategies

The final strategy C and the existing diagnosis search filters were each run on medline 1969–2000, and their sensitivities were compared (see Table 2).

Table 2.  Sensitivity of diagnosis searches using the reference set (126 papers) as a comparison. medline searched 1969–2000, limited to English language papers.

Search          Articles retrieved   Reference-set articles retrieved   Sensitivity (%)
Strategy A      5013                 126                                100.0
Strategy C      2578                 124                                 98.4
PubMed          3182                 121                                 96.0
Strategy B       992                 100                                 79.4
Rochester        930                  99                                 78.6
Devillé          878                  95                                 75.4
North Thames     701                  67                                 53.2

Comparing reference set with search result

The references retrieved by the final strategy C were critically appraised using criteria listed in Appendix 3. The number of relevant high quality articles that were not cited in the systematic reviews was identified.

Results


Defining the reference set

In the selected systematic reviews, 126 English-language references were cited, indexed in medline and used as the reference set. References to other systematic reviews were omitted from the list as these were retrieved in the systematic review search.

Selecting search strategies

Five search strategies5,32,34–36 were found, all slightly different. The McMaster strategy used on PubMed5 had a sensitivity of 92.0% and a specificity of 73.0%. The Devillé strategy36 had a sensitivity of 80.0% and a specificity of 97.3% when tested by the authors against a hand-searched reference set. The Devillé strategy was developed in a different way, as the aim was to increase specificity by devising a strategy to reduce the number of false positives retrieved.

Devising a search strategy: strategy A

This sensitive strategy retrieved 5013 articles. All the papers in the reference set were retrieved; however, many irrelevant articles were also identified (see Table 2).

Devising a search strategy: strategy B

This strategy retrieved 992 articles. It was too specific, as only 100 (79.4%) of the papers from the reference set were retrieved (see Table 2).

Devising a search strategy: strategy C

The final strategy C retrieved 2578 articles in total, and 124 references from the reference set of 126 (98%), so two papers were not retrieved.37,38 The paper by Whitaker had no terms matching the strategy—the article was comparing an antibody immune assay to d-dimer so was irrelevant. The paper by Elms was indexed Thrombophlebitis/di, but this was not a major MeSH heading, so was not retrieved.

However, neither of these papers fulfilled the appraisal criteria for a high-quality diagnostic study, as not all participants had been given a venogram as well as the experimental test. This indicates that papers in the reference set should have been critically appraised to avoid irrelevant articles being listed.

When examining the MeSH headings of papers that had not been retrieved by strategy B, it was noticed that major MeSH headings had been used in the majority of articles, so using major MeSH headings seemed a logical approach to take, as only references comparing diagnostic tests were relevant to the question. For example, references about DVT where a test had been used could be indexed venous thrombosis/di, yet the article could mainly be about treatment, rather than focusing on the diagnostic test. Use of major MeSH headings was not analysed systematically—analysis of this would be valuable.

Comparing results of strategy A with final strategy C

Strategy A retrieved 5013 articles in total, including 2435 that were not retrieved by the final strategy C. The full text of a paper was examined when no abstract was available, and two relevant papers were identified. One paper should have been retrieved, as its MeSH headings and text words were covered by the final strategy; for an unknown reason it was not retrieved on the medline OVID interface that was used. One possibly relevant 1975 paper39 was not retrieved because of limited indexing and the lack of an abstract; however, the paper was assessing an irrelevant test (not listed in our question). Even if it had been relevant, there is always a conflict between pragmatism and quality when searching. Appraising an extra 2435 articles to find one additional paper is not a viable option for the Clinical Evidence search process.

Comparing with other diagnostic strategies

As Table 2 shows, the final strategy C performs well against other diagnosis search strategies. Its sensitivity was 98.4%, fewer articles were retrieved than with the CASP32 or PubMed5 searches, and only two articles from the reference set were not retrieved. The Devillé36 strategy had a sensitivity of 75.4% and failed to retrieve 31 references from the reference set. The PubMed strategy had a sensitivity of 96%, failed to retrieve five references from the reference set and also retrieved an extra 604 references.
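
The comparison with strategy C can be reproduced from the counts in Table 2. The sketch below is illustrative: the dictionary layout and the function name `compare_with_c` are assumptions made for the example, while the counts themselves are copied from Table 2.

```python
REFERENCE_SET_SIZE = 126

# (articles retrieved, reference-set articles retrieved), copied from Table 2.
TABLE_2 = {
    "Strategy A": (5013, 126),
    "Strategy C": (2578, 124),
    "PubMed": (3182, 121),
    "Strategy B": (992, 100),
    "Rochester": (930, 99),
    "Devillé": (878, 95),
    "North Thames": (701, 67),
}


def compare_with_c(name: str) -> tuple:
    """Return (reference-set papers missed, articles retrieved relative to strategy C)."""
    retrieved, hits = TABLE_2[name]
    c_retrieved, _ = TABLE_2["Strategy C"]
    return REFERENCE_SET_SIZE - hits, retrieved - c_retrieved


for name in TABLE_2:
    missed, extra = compare_with_c(name)
    print(f"{name:<12}  missed={missed:>2}  articles vs C={extra:+d}")
```

For the PubMed row this gives five missed references and 604 additional articles to appraise, which is the trade-off described above.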

Comparing reference set with search result

Most of the systematic reviews appeared to be based on very specific searches, and there was little overlap in references between them. Most of the authors of these reviews had searched only medline, and only two had searched Current Contents as well.19,24 Two information specialists and a doctor independently appraised the final set of 2578 cohort studies. Two hundred and twenty-seven cohort studies were selected, and 147 studies fulfilling the appraisal criteria had not been cited in any of the systematic reviews. This gave a precision of 8.8% for the final search strategy C.

Discussion


The final search strategy C was designed to be sensitive so it consequently retrieved a high number of false hits. For Clinical Evidence, it is important to retrieve as many relevant studies as possible. For other researchers, this may not always be necessary, in which case the more specific search strategies might be more appropriate to use.

The main difference between strategy C and the other available strategies was the addition of the MeSH subheadings di, ri, ra, us linked to the subject MeSH heading of venous thrombosis or thrombophlebitis, and limiting these to major MeSH headings.

Some of the existing diagnostic strategies have been compared against a reference set of articles from a sample of journals that were hand searched. The Devillé strategy36 had a sensitivity of 80.0% and a specificity of 97.3% when a reference set of articles from nine journals published between 1992 and 1995 was used. The PubMed strategy5 had a sensitivity of 92.0% and a specificity of 73.0%.

It is interesting that when these strategies were used on medline, with a reference set from many different journals and including papers published between 1969 and 2000, the sensitivity and precision of these searches were lower. This is possibly because older studies are less likely to contain abstracts so text word searching is less effective. In the combined reference set of cited and selected articles, 38 out of 256 papers had no abstract, and were only retrieved by the *thrombophlebitis/di, ra, ri, us MeSH headings and subheadings.

The existing systematic reviews on diagnosing deep vein thrombosis missed 147 relevant articles. This is possibly because much of the work on retrieving diagnostic studies was developed by Ann McKibbon and Brian Haynes5,7 in the 1990s, so earlier reviewers may not have been aware of effective search strategies. Reviewers may also not have sought advice from librarians or information specialists on how to search effectively. Because of the limited strategies used in the reviews, it is not surprising that additional relevant articles were found by strategy C.

A further area of research would be to see whether adding the extra papers to the existing systematic reviews would affect the overall conclusions on the sensitivity and specificity of a diagnostic test.

A more rigorous research-based approach to filter design would undoubtedly have improved the methods used during the development of the strategy; however, a strategy had to be found quickly, so a very pragmatic approach was taken. The reference set should have been critically appraised. Critical appraisal of the search results from the other strategies would also be worthwhile, so that their precision could be calculated.

Other databases, such as amed, should have been searched for articles on DVT, and lisa should have been searched for diagnostic search strategies. However, the authors did not have access to these databases, so were unable to search them. A decision was also made to exclude non-English-language articles, as translation costs would have been prohibitive.

Although this search was more specific than existing strategies, it is still very sensitive, and many irrelevant papers were retrieved. Further research needs to be done on improving the specificity of this strategy. The strategy also needs to be validated by using the same approach for other diagnostic questions; at present, it is not clear whether this approach to searching is generalisable to other diagnostic tests.


Appendix 1

Search strategy to identify existing strategies, and details of existing search strategies for retrieving diagnostic studies

Search strategy to identify existing diagnostic strategies. medline

  1. (literature adj5 search$).tw.
  2. (search$ adj5 strateg$).tw.
  4. (1 or 2) and 3
  5. diagnos$ or di.xs
  6. 4 and 5

Existing search strategies. Most of the strategies have been derived from work done at McMaster University by Brian Haynes and Ann McKibbon.

Source: PubMed version of McMaster strategy5. Note: this is the sensitive search strategy; a more specific strategy is available on PubMed.

  1. exp ‘sensitivity and specificity’/
  3. di.fs.
  4. du.fs.
  6. or/1–5

  Specificity = 73%, Sensitivity = 92.0%

Source: CASP32. Note: very similar to the PubMed search, except that di.xs includes the MeSH terms pathology/, radiography/, radionuclide imaging/ or ultrasonography/

  1. exp ‘sensitivity and specificity’/
  3. di.xs.
  4. du.fs.
  6. or/1–5

Source: North Thames34.

  1. exp ‘sensitivity and specificity’
  2. exp diagnostic errors
  3. exp mass screening
  4. or/1–3

Source: University of Rochester Medical Library35.

  1. exp ‘sensitivity and specificity’/
  2. false negative reactions/ or false positive reactions/
  3. (sensitivity or specificity).ti,ab.
  4. (predictive adj value$1).ti,ab.
  5. (likelihood adj ratio$1).ti,ab.
  6. (false adj (negative$1 or positive$1)).ti,ab.
  7. or/1–6

Source: Devillé36.

  1. exp sensitivity and specificity/
  2. specificit$.tw.
  3. false negative$.tw.
  6. or/1–5

  Specificity = 91.9%, Sensitivity = 89.33%

Appendix 2: strategy C

Database: medline <1966 to January Week 3 2002>

Search strategy: DVT diagnosis.

 1. exp venous thrombosis/ (24150)
 2. exp thrombophlebitis/ (18186)
 3. deep vein
 4. (thrombos$ or thrombophleb$ or dvt).ti. (22937)
 5. (thrombos$ or thrombophleb$ or dvt).ab./freq = 2 (15043)
 7. 4 or 5 (30544)
 8. exp leg/ (75341)
 9. (calf$ or calv$ or leg$ or limb$ or dvt or (low$ adj3 extremet$) or knee$).tw. (235407)
10. 7 and (8 or 9) (5869) [DVT in leg/calf]
11. 6 or 10 (26750) [DVT SET]
12. exp Physical Examination/ or Algorithms/ or (clinical$ adj2 predict$).tw. (328124)
15. exp Fibrin Fibrinogen Degradation Products/ (3796)
17. exp Ultrasonography/ (112370)
18. exp ULTRASONICS/di, du [Diagnosis, Diagnostic Use] (18949)
19. (ultrasound$ or ultrasonic$ or ultrasonog$).ti. (46377)
20. (ultrasound$ or ultrasonic$ or ultrasonog$).ab. (85085)
21. exp PLETHYSMOGRAPHY/ or plethysmog$.tw. (16559)
22. exp Tomography, X-ray Computed/ or exp Tomography, X-ray/ (135291)
23. (ct adj3 scan$).tw. (30053)
24. exp Magnetic Resonance Imaging/ (95249)
25. mri.ti. or mri.ab./freq = 2 (23202)
26. exp Phlebography/ (7827)
27. (venogram$ or venograph$ or phlebograph$).tw. (6688)
28. (clin$ adj5 predict$ adj5 (rule$ or test$ or strateg$ or algo$)).tw. (1154)
29. or/12–28 (721948) [Diagnostic Tests]
30. 29 and 11 (7110) [DVT and Tests]
31. exp ‘Sensitivity and Specificity’/ (115256)
32. (sensitivity or specificity).tw. (314227)
33. (predictive adj3 value$).tw. (24177)
34. exp Diagnostic Errors/ (50692)
35. ((false adj positiv$) or (false adj negativ$)).tw. (26388)
36. (observer adj variation$).tw. (534)
37. (roc adj curve).tw. (1355)
38. (likelihood adj3 ratio$).tw. (1953)
39. Likelihood Functions/ (4537)
41. exp Venous Thrombosis/di, ra, ri (7316)
42. exp THROMBOPHLEBITIS/di, ra, ri, us (5395)
44. 40 or 43 (459049) [Subhead or filter (too many)]
45. 30 and 44 (4510)
46. exp *venous thrombosis/di, ra, ri, us (3734)
47. exp *thrombophlebitis/di, ra, ri, us (2862)
48. 46 or 47 (3734) [Subheads/MJMH]
49. 40 or 48 (455670) [Subheads/MJMH OR Filter]
50. 30 and 49 (2901) [DVT & TESTS & Filter]
51. (review or review,tutorial or review, academic).pt. (870121)
52. (medline or medlars or embase).tw,sh. (10137)
53. (scisearch or psychinfo or psycinfo).tw,sh. (216)
54. (psychlit or psyclit).tw,sh. (335)
55. ….tw,sh. (373)
56. ((hand adj2 search$) or (manual$ adj2 search$)).tw,sh. (1094)
57. (electronic database$ or bibliographic database$ or computeri?ed database$ or online database$).tw,sh. (1577)
58. (pooling or pooled or mantel haenszel).tw,sh. (16790)
59. (peto or dersimonian or der simonian or fixed effect).tw,sh. (406)
61. 51 and 60 (6623)
64. (meta-analys$ or meta analys$ or metaanalys$).tw,sh. (9595)
65. (systematic$ adj5 review$).tw,sh. (3553)
66. (systematic$ adj5 overview$).tw,sh. (215)
67. (quantitativ$ adj5 review$).tw,sh. (905)
68. (quantitativ$ adj5 overview$).tw,sh. (84)
69. (quantitativ$ adj5 synthesis$).tw,sh. (1011)
70. (methodologic$ adj5 review$).tw,sh. (1215)
71. (methodologic$ adj5 overview$).tw,sh. (96)
72. (integrative research review$ or research integration).tw. (59)
74. 61 or 73 (22656) [SR filter]
75. 50 and 74 (23) [SRS on DVT Diagnosis]
76. 50 not 75 (2657)
77. limit 76 to year = 1969–2000 (2578) [Diagnostic Cohort Studies]

Appendix 3

Appraisal criteria used for diagnostic studies

  • Was the test compared with a valid reference standard?

  • Were the Test and Reference Standard measured blindly?

  • Was the patient sample representative?

  • Were participants given both experimental and control tests?

  • Was the reproducibility of experimental test assessed?

  • Was follow up more than 80%?

  • Was the sample size sufficient?

  • Was the study prospective?

Criteria taken from references.26–30