The development of search filters for adverse effects of medical devices in medline and embase

Abstract Background Objectively derived search filters for adverse drug effects and complications in surgery have been developed but not for medical device adverse effects. Objective To develop and validate search filters to retrieve evidence on medical device adverse effects from ovid medline and embase. Methods We identified systematic reviews from Epistemonikos and the Health Technology Assessment (hta) database. Included studies within these reviews that reported on medical device adverse effects were randomly divided into three test sets and one validation set of records. Using word frequency analysis from one test set, we constructed a sensitivity maximising search strategy. This strategy was refined using two other test sets, then validated. Results From 186 systematic reviews which met our inclusion criteria, 1984 unique included studies were available from medline and 1986 from embase. Generic adverse effects searches in medline and embase achieved 84% and 83% sensitivity. Recall was improved to over 90%, however, when specific adverse effects terms were added. Conclusion We have derived and validated novel search filters that retrieve over 80% of records with medical device adverse effects data in medline and embase. The addition of specific adverse effects terms is required to achieve higher levels of sensitivity.


Introduction
Systematic reviews usually employ highly sensitive search strategies that aim to identify as many relevant papers as possible. However, retrieving a complete data set of studies on adverse effects is challenging due to inconsistent terminology and poor reporting (Golder, McIntosh, Duffy & Glanville, 2006). Medical devices are equipment, instruments, software or related articles intended for use in health care; they include stents, the contraceptive coil, breast implants and hip replacements. Retrieving studies on non-drug interventions such as medical devices is particularly challenging because the primary studies are less likely to have incorporated adverse effects data and may be smaller than studies of drug interventions, making event data more sparse and their retrieval more difficult (Golder, Wright & Loke, 2017). For medical devices, in particular, adverse effects are more likely to be overlooked or not considered important. Even when they are considered they are likely to be secondary or tertiary outcomes. This may be due to the regulatory requirements for research evidence on the safety of new devices being universally less stringent than those for medicines (Golder & Loke, 2012a,b,c). The reporting and terminology surrounding adverse effects in medical devices have also been notoriously inconsistent, and this is reflected in the indexing of database records. In addition, as with other interventions, not all adverse effects may be known at the time of searching and it is common to include study designs beyond randomised controlled trials (RCTs) for identifying the adverse effects of medical devices. Whilst search filters for RCTs have been proven to perform well, searching for non-RCT study designs is more problematic (Higgins & Green, 2011).
One way to help enable efficient searching for adverse effects could be through the development of search filters. Search filters are combinations of search terms which are designed to improve the efficiency and effectiveness of searching. Search filter development for adverse effects has tended to concentrate on identifying studies that report on adverse drug effects (Badgett, Chiquette, Anagnostelis & Mulrow, 1999;Golder & Loke, 2012a,b,c;Golder et al., 2006;Wieland & Dickersin, 2005). However, a different approach is required for the adverse effects of medical devices (Farrah, Mierzwinski-Urban & Cimon, 2016;Golder, Wright & Rodgers, 2014;Golder et al., 2017). The different search strategies required for medical devices as opposed to drug adverse effects has been demonstrated by the poor retrieval obtained when our adverse drug effect search filter (which obtains between 89% and 97% of the relevant drug literature) (Golder & Loke, 2012b,c), identified only 54% of the literature on the adverse effects of medical devices (Farrah et al., 2016).
Search filters may be useful not only for librarians and information professionals but also for clinicians, researchers, guideline producers and policymakers. A relatively efficient method of retrieving useful information would benefit all searchers not just expert searchers. Information is required to enable decision-making in clinical practice to generate appropriate advice on the benefit:harm of medical devices.
The creation of a medical device adverse effect search filter would be particularly timely given the current developments in EMBASE. Elsevier (who produce EMBASE) have been improving the indexing for adverse effects of medical devices in a number of ways. In 2014, they introduced the subheading 'adverse device effect', and by April 2018, this had been used in the indexing of 30 000 records. In addition, Elsevier have added further EMTREE indexing terms for medical devicesfor example, endoscopes, catheters and prostheses and now have over 3000 specific terms.
We aimed to create highly sensitive validated search filters for OVID MEDLINE and EMBASE to identify studies on medical device adverse effects.

Systematic review identification
Systematic reviews of adverse effects were identified by searching Epistemonikos (https:// www.epistemonikos.org/) and the Health Technology Assessment (HTA) database via OVID. Epistemonikos was chosen as it is currently the largest source of systematic reviews still being updated. Similarly, the HTA database is the largest source of technology assessments from around the world.
Due to the large volume of systematic reviews published in the years 2015-2017, we were unable to simply sift the records available in Epistemonikos. We therefore conducted a series of searches for named 'medical devices' in combination with terms relating to 'safety'. Searches were conducted on the 20 and 21 June 2017 and Publication Type: Systematic Reviews. A limit was placed of 'Publication Date: 2015 to 2017' in order to retrieve a recent cohort of systematic reviews. Additionally, the size of the sample needed to be restricted because of resource constraints. The safety terms were derived from previous research (Golder et al., 2006) and the medical device terms from a list of device terms provided by Elsevier (Box A1). The HTA database was searched with the search strategy ('2015' or '2016' or '2017').di on the 23 June 2017.
A systematic review was considered eligible for inclusion if: • Adverse effect(s) for a medical device were the primary or secondary outcome. The device was required to be the main focus of the review. If the review focused more heavily on the surgical procedure needed to implant the device or the drug component of the device (such as anticoagulation after stenting) or was focused on prevention of adverse effects, it was excluded on this basis. The World Health Organisation (WHO) definition of a medical device was used: '"Medical device" means any instrument, apparatus, implement, machine, appliance, implant, reagent for in vitro use, software, material or other similar or related article, intended by the manufacturer to be used. . ..for specific medical purpose(s)' http://www.who. int/medical_devices/full_deffinition/en/ • The search strategy was reported in the published paper, and no adverse effects search terms (either generic, such as 'adverse effects' or 'side-effects' or named, such as 'fatigue' or 'insomnia') had been used. Typically, such reviews rely on terms for the population or condition and intervention only. This enabled us to construct an unbiased cohort which did not include articles that had been retrieved because they already contained adverse effects terms.
• The search included either handsearching or reference checking in addition to database searches. This was in an attempt to compensate for potential deficiencies in the search strategies.
• At least one included study was related to safety. This was because some reviews were unsuccessful in retrieving any relevant studies. We excluded reviews that (a) were in a non-English languagewhich we were unable to obtain a translation for and (b) where the full text was unavailable.
Two researchers independently screened titles and abstracts using Distiller and selected systematic reviews for potential inclusion. Any discrepancies between the researchers were resolved by discussion and consensus or by a third reviewer. The full text of potentially relevant systematic reviews was also independently screened, with discrepancies resolved by discussion and consensus.

Included primary studies
The full text of the included articles within these systematic reviews was checked to confirm the presence of adverse effects data. The use of included papers from systematic reviews has been shown to be an effective alternative to handsearching to identify a reference standard set of records for developing and evaluating search strategies (Sampson et al., 2006).
The first stage of the analysis was to check whether each paper was contained in MEDLINE or EMBASE. We used several search iterations as necessary of the author names or words from the paper to identify each record. The records available on MEDLINE and EMBASE were then divided into three test sets and one validation set of records using random numbers generated by RANDOM.ORG.
Individual word and multiple-word frequency analysis on the first test set of records was undertaken using WriteWords to identify commonly occurring terms related to adverse effects. WriteWords is freely available on the Internet and allows frequency counting of the usage of words or phrases (http://www.writewords. org.uk/phrase_count.asp). We calculated relative recall as a measure of the percentage of known records retrieved using the filter because it provides an estimate of sensitivity (Sampson et al., 2006). The relative recall of the relevant search terms was calculated using the following formula:

Relative recall calculation.
No of relevant records retrieved No of relevant records available Â 100 ¼ Relative recall as a percentage ð%Þ: A draft filter was created with the first test set. We started with the search term that had the highest recall and then tested all other potentially relevant terms to ascertain the incremental increase in recall when added to the first search term. This process continued until no more new records were being identified by additional search terms.
The filter created with the first test set was next applied to the second test set, then after any additional modifications, such as additional search terms, the filter was applied to the third test set. After any further modifications from applying the filter to the third test set, the retrieval performance of the search filter was tested in the validation set.
We also examined those records not retrieved by our generic search term filters to ascertain whether specific adverse effects search terms (such as 'infection' or 'mortality') would have been successful in the retrieval of additional records. We noted any database records with no indication that the full text contained information on adverse effects.
In order to give a relative or rank estimate of the precision of the search terms, we also identified the total number of records retrieved from MEDLINE or EMBASE at the time of conducting the present research using each search term. We then calculated an approximation of the relative precision of the term in comparison with the other terms we identified.
This whole process was first undertaken in MEDLINE and then repeated in EMBASE.

Results
From 6433 records screened, 1422 full-text reports were retrieved of which 423 met our inclusion criteria. Of these 423 reviews, 93 were systematic reviews where the primary outcome was an adverse effect(s) of a medical device and 330 systematic reviews had adverse effects as secondary outcomes. Due to constraints on time and resources, we limited the analysis to the 93 reviews with adverse effects as a primary outcome and a random selection of 93 of the 330 reviews with adverse effects as a secondary outcomegiving a total of 186 reviews ( Figure 1). These 186 reviews included 2130 studies (2278 studies before deduplication) and of these included studies -1984 unique records were available on MEDLINE and 1986 on EMBASE.

MEDLINE
The gold standard set of 1984 records in MEDLINE were randomly allocated into three test sets of 496 records each and one validation set of 496 records.
First test set for the development of the MEDLINE search filter. Of the search terms identified in the first test set -'complicat*' in the title and abstract had the highest recall and was searched first. This was followed by the floating subheading 'adverse effects (ae)' which gave the highest incremental increase in recall when added to 'complicat*' in the title and abstract (Table A1 and Box 1). The addition of further terms resulted in a search strategy (Box 1) which retrieved 89% (439/496) of records. Of the 57 records not retrieved -25 contained terms for specific adverse effects (Table A2) whereas 32 records gave no indication that the full paper contained information on adverse effects. The specific adverse effects terms (such as sore throat and dysphagia) were not added to the search as they tended to only apply to specific medical devices. A search strategy which incorporates both generic and specific adverse effects terms could therefore potentially achieve 94% (464/496) recall.
The search terms which gave the highest precision in MEDLINE (Tables 1 and A2)  Second test set for the development of the MEDLINE search filter. The search strategy from the first test set (Box 1) was tested on the second test set of records and retrieved 87% (432/496). On inspection of the records that had not been retrieved, we found three additional generic adverse effects terms -'intraoperative complications/' [MeSH], 'migration' in the abstract, and 'breakag*' in the abstract. These additional terms were added to the search strategy, and 88% (438/496) of records were retrieved.
Of the 58 records that had not been retrieved by this search strategy, 28 contained specific adverse effects terms (Table A2). A search strategy which incorporates both generic and specific adverse effects terms could therefore potentially achieve 94% (466/496) recall in the second test set of records.
Third test set for the development of the MEDLINE search filter. The search strategy from the second test set (Box 1) was tested on the third test set of records and retrieved 89% (443/496) of records. On inspection of the records that had not been retrieved, we found additional generic adverse effects terms in the abstract, 'detrimental adj2 'discomfort', 'displacement' and 'untoward effects'. These terms were added to the search strategy, and 91% (450/496) records were retrieved.
Of the 46 records that had not been retrieved by this search strategy -18 contained specific adverse effects terms (Table A2). A search strategy which incorporates both generic and specific adverse effects terms could therefore potentially achieve 94% (468/496) recall in the third test set of records.
Validation of the MEDLINE search filter. The revised search strategy (Box 1) performed less well on the validation set of records then in the test sets and retrieved 83% (414/496) of records. We conducted post hoc analysis to identify factors that may have affected the recall. There was one additional record that could have been retrieved if 'post-operative morbidity' in the abstract was added to the search strategy.
Of the 82 records not retrieved, 40 contained terms related to specific adverse effects (Table A2). A search strategy which incorporates both generic and specific adverse effects terms could therefore potentially achieve 92% (454/496) recall in the validation set of records.

EMBASE
The gold standard set of 1986 records in EMBASE were randomly divided into three test sets of 496 records each and a validation set of 498 records.
First test set of records for the development of the EMBASE search filter. The floating subheading 'complication (co)' had the highest recall and was searched first. This was followed by 'complicat*' in the title and abstract which gave the highest incremental increase in recall when added to the floating subheading 'complication (co)' (Table A3 and Box 2). The addition of further terms resulted in a search strategy (Box 2) which retrieved 89% (439/ 496) records. Of the 57 records not retrieved by the search strategy, 30 had terms related to specific adverse effects (Table A4) whereas 27 gave no indication that the full paper contained information on adverse effects. A search strategy which incorporates both generic and specific adverse effects terms could therefore potentially achieve 95% (469/496) recall.
The terms which gave the highest precision in EMBASE ( Second test set of records for the development of the EMBASE search filter. The search strategy from the first test set (Box 2) was tested on the second test set of records and retrieved 87% (431/496). There were four additional records that could have been retrieved if 'device safety/' or 'equipment safety' as a keyword, 'peroperative complication/', 'safety/' and 'tolerated' in the abstract were added to the search strategy. After adding these terms to the search strategythe revised strategy retrieved 88% (435/496) of the records in this second test set.
Of the 61 records not retrieved, 24 had terms related to specific adverse effects (Table A4). A search strategy which incorporates both generic and specific adverse effects terms could therefore potentially achieve 93% (459/496) recall in the second test set of records.
Third test set of records for the development of the EMBASE search filter. The search strategy from the second test set (Box 2) was then tested on the third test set of records and retrieved 85% (423/ 496) of records. There were two additional records in this test set that could have been retrieved if 'failing' in the abstract was added to the search strategy. Hence, after adding the term 'failing'the revised strategy retrieved 86% (425/496) of records in this third test set.
Of the 72 records not retrieved by the search strategy, 37 had terms related to specific adverse effects (Table A4). A search strategy which incorporates both generic and specific adverse effects terms could therefore potentially achieve 93% (462/496) recall in the third test set of records.
Validation of the EMBASE search filter. The revised search strategy in Box 2 was then tested on the validation set of records and retrieved 410/498 (83%) of the records. We conducted post hoc analysis to identify factors that may have affected the recall. When we explored the records that had not been retrieved from the validation set, 'postoperative complications/' and 'adverse drug reaction/' and 'high risk device' in the abstract were in three records not retrieved. These terms are indicative of generic adverse effects.
However, adverse effects specific to the individual paper were present in 32 of the 83 records not captured (Table A4). A search strategy which incorporates both generic and specific adverse effects terms could therefore potentially achieve 90% (447/498) recall in the validation set of records.

Discussion
We have used a cohort of included studies from systematic reviews on medical devices to derive and validate a novel search filter for the adverse effects of medical devices. The results here give an indication of performance in terms of relative recall of individual search terms and their combinations. The filters will also inevitably increase the precision of searches for adverse effects, although we were unable to quantify this. We were able to compile a list of some of the specific terms commonly used in the databases and we recommend that searchers look to augment the search filter with these specific named adverse effects where appropriate. However, it is very apparent that the 'specific' terms are very narrow in scope and relevant only to a particular intervention, anatomical site and method of deploying the device. Unlike pharmaceutical preparations which typically are pill, potions, creams and injections, there is far greater diversity in how and where the device is fitted. Hence, the 'specific' AE are a mishmash that cannot easily be addressed by search filter terms. Therefore, reviewers could look at the physical characteristics and scientific development of the device, and pick out the most relevant specific adverse effects rather than rely on the specific terms listed in this paper. This would be best done by using our generic search filter and then adding those specific to site and device (e.g. cardiac tamponade for devices in the heart). Search filters vary in the level of sensitivity and precision that can be achieved. Whilst we strive for 100%, generally lower levels of sensitivity are deemed acceptable and we adopted Benyon 2013's 90% or above threshold (Beynon et al., 2013). Perfect sensitivity is unachievable because some relevant records will always not contain any terms in the title, abstract or indexing to indicate they met certain criteria or present relevant data and examination of the full text will always be required. In addition, there is always a trade-off between sensitivity and precision. The recall of searches using solely generic adverse effects terms was 84% in MEDLINE and 83% in EMBASE. With the addition of specific adverse effects terms (to the generic adverse effects terms), the recall could be raised to 92% in MEDLINE and 90% in EMBASE. The results for medical device searches here are less favourable compared with search filters for drug intervention adverse effects whereby sensitivity approaching 90% in both MEDLINE and EMBASE was achieved without specific named adverse effects and 93% in MEDLINE and 96% in EMBASE when specific adverse effects terms were added (Golder & Loke, 2012b). And also less favourable than searches for adverse effects of surgical interventions whereby sensitivity of 87% in MEDLINE and 92% in EMBASE was achieved with generic adverse effects terms and 93% in MEDLINE and 95% in EMBASE with the addition of specific adverse effects terms (Golder 2008). This is likely to be as a result of the more diverse adverse effects being associated with medical devices rather than for drug interventions and surgical procedures. Hence, there may be fewer generic terms useful for searching for general medical device adverse effects.
It should also be noted that the performance of the search filters for medical device adverse effects in the validation set in both MEDLINE and EMBASE was poor in comparison with the test sets. However, when searching with only generic adverse effects terms, the sensitivity did not meet the 90% or higher target in the validation sets of records and five of the six test sets. The 90% target was however met for the test sets when generic and named adverse effects were searched in the validation set and all the test sets (Table 2). We anticipate that these search filters will assist searchers when devising search strategies to identify relevant studies for a systematic review of the adverse effects of medical devices. In addition, we demonstrate the value of the addition of specific adverse effects terms where possible. However, we do not recommend these adverse effects filters for medical devices be used without due consideration, particularly as some of the search terms may only apply to certain types of medical device and that recent changes in indexing may impact on the performance. For instance, the recently introduced subheading in EMBASE is 'adverse medical effect (am)' in March 2014.
Whilst the floating subheading adverse device effect (am.fs) is not currently included in our search filter, this is likely to be a result of the year of publication of many of our studies. This subheading was introduced in March 2014. Future research may see the value of this subheading for searching for adverse effects improve as it is more widely accepted and used.

Limitations
A major limitation of the methodology used in this study is the lack of a true measurement of precision. We would need a large set of non-relevant records in order to identify not just the most frequently occurring relevant terms but also the most discriminating terms and to measure precision. The current study simply indicates the relative rank precision of terms in relation to one another.
Our sample of records was obtained using search terms for both devices and safety in Epistemonikos. Although we included many synonyms and different devices, this may have limited the generalisability of our findings. The next steps in this area need to be the testing and validation on systematic review case studies (in which precision can be measured) and further research with larger sample sizes of relevant papers.
Medical devices have an added complexity in that they are often used in conjunction with another intervention. For instance, many medical devices require a surgical procedure for their placement such as breast implants and hip prosthesis. Other medical devices have a drug component embedded in them such as drug-eluting stents. The diversity of types of medical devices and the common use of medical devices in conjunction with another type of intervention (such as pharmaceutical or surgical) meant that we employed a loose definition of 'generic' adverse effects terms. Some of the generic terms therefore are more specific to one type of device than another and may even be irrelevant to others.

Conclusions
This is the first search filter for adverse effects of medical devices. The filter can be used where unmanageable numbers of records would otherwise be retrieved. Additional specific terms can be added to the filter to increase its sensitivity.
Further research on larger data sets is required in order to measure the precision of searching for adverse effects of medical devices and to test the suggested search filters with more rigour. In time with improvements in indexing and the adoption of subheadings such as 'adverse device effects' in EMBASE, the sensitivity of future filters is likely to improve. Different categories of medical devices may require more individualised search filters.

Funding
This paper presents independent research funded by a post-doctoral fellowship PDF-2014-07-041 through the National Institute for Health Research (NIHR). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.