Processes for evidence summarization for patient decision aids: A Delphi consensus study

Abstract
Background: Patient decision aids (PDAs) should provide evidence‐based information so patients can make informed decisions. Yet, PDA developers do not have an agreed‐upon process to select, synthesize and present evidence in PDAs.
Objective: To reach consensus on an evidence summarization process for PDAs.
Design: A two‐round modified Delphi survey.
Setting and participants: A group of international experts in PDA development invited developers, scientific networks, patient groups and listservs to complete Delphi surveys.
Data collection: We emailed participants the study description and a link to the online survey. Participants were asked to rate each potential criterion (omit, possible, desirable, essential) and provide qualitative feedback.
Analysis: Criteria in each round were retained if rated by >80% of participants as desirable or essential. If two or more participants suggested rewording, reordering or merging, the steering group considered the suggestion.
Results: Following two Delphi survey rounds, the evidence summarization process included defining the decision; reporting the processes and policies of the evidence summarization process; assembling the editorial team and collecting, managing and reporting their conflicts of interest; conducting a systematic search; selecting and appraising the evidence; presenting the harms and benefits in plain language; and describing the method of seeking external review and the plan for updating the evidence (search, selection and appraisal of new evidence).
Conclusion: A multidisciplinary stakeholder group reached consensus on an evidence summarization process to guide the creation of high‐quality PDAs.
Patient contribution: A patient partner was part of the steering group and involved in the development of the Delphi survey.

guidance about recommended methods for evidence selection and summarization for PDAs. 9 A 2013 review of the literature conducted by the IPDAS working group on the synthesis of scientific evidence highlighted the importance of rigorously selecting and summarizing evidence used to populate a PDA. However, the working group did not provide clear practical guidance on how to conduct evidence summarization for the development of PDAs, beyond recommending that developers apply the Grading of Recommendations Assessment, Development and Evaluation (GRADE) methodology. 10 Furthermore, the IPDAS instrument and the IPDAS minimum standards do not offer additional information or guidance on the steps required to select and summarize evidence-based information for PDAs. 11,12 As part of a recent update of IPDAS, researchers analysed the evidence characteristics of 471 publicly available PDAs, and only 14% of them reported 'at least one of the steps used to select the evidence included in the decision aid, such as how it was searched for, appraised, or summarized'. 13 Other efforts to evaluate or certify the quality of PDAs have emerged, 14 but none of those standards or certification bodies describes recommended methods and criteria that PDA producers should follow when selecting and summarizing evidence for patient-facing tools.
Although consensual and validated methods for evidence summarization exist for other evidence resources, such as clinical practice guidelines, there is no agreed-upon process for the selection and summarization of evidence for PDAs. Evidence summarization processes for other resources have become increasingly standardized, which promotes transparency and rigour, while minimizing the risk of bias in the end product. 2,15-23 The same level of scrutiny is justified when developing PDAs, as they may directly influence patient care and decision making. The selection and identification of patient-relevant outcomes, analysis of patient concerns and priorities, description of the quality of evidence and communication of uncertainty in ways that patients understand warrant the development of an agreed-upon process and related steps and criteria that are specific to PDAs. The target audience, scope and content differ substantially enough from clinical practice guidelines to require a tailored evidence summarization process.
Additionally, the IPDAS Collaboration imposes some prerequisites on the evidence summarization process on which the PDA will be based.
For example, the IPDAS Collaboration requires that developers summarize the evidence regarding all health options available to a patient facing a specific health problem, with the positive and negative features of each option presented in equal amounts of detail. 24 Efforts to develop an agreed-upon evidence summarization process for PDAs should incorporate the substantial body of related evidence summarization guidance previously developed by other groups. 17 The aim of this study was to reach consensus, using a modified Delphi survey, on a process, related steps and criteria for selecting and summarizing evidence for PDAs. This will, in turn, assist PDA developers in improving transparency and rigour, while minimizing the risk of bias, in the evidence summarization processes used in PDA development.

| Design
Our study protocol was published, and a summary is presented here. 25 The modified Delphi method has been previously used in

| Participants
To maximize the generalizability and applicability of the criteria, we invited the following groups to complete the surveys: (1) the Global Inventory of Patient Decision Support Developers, which includes all known developers of PDAs who created or updated a tool within the last five calendar years. Membership in one of the groups listed above was the only criterion for inclusion in our study.

| Steering group
We convened a steering group to oversee the project and make strategic decisions about the study design, data collection and analysis processes, as well as agree on a final set of steps and criteria. The steering group also generated an initial set of criteria for the Delphi process and managed the distribution of the survey. Members were based in the United States (n = 6), Canada (n = 1), Australia (n = 1) and Spain (n = 1), all of whom are authors of this study. The steering group also included one patient representative. To avoid contaminating the Delphi results or duplicating their views, the steering group members unanimously decided not to complete the Delphi surveys.

| Survey development
At its first meeting, held at the ISDM conference in Lyon, France, the steering group developed a spreadsheet that detailed the evidence summarization process inherent to PDA development.

| Round 1 survey
The round 1 survey invitation (see Appendix S2) was sent by email and provided a brief outline of the study and the link to the online survey. Consent was inferred from participants' completion of the survey. The first page of the survey was a brief participant information sheet. Following the information sheet, participants were asked to complete demographic questions and provide their email address so that they could be contacted for round 2 of the survey. Next, participants were asked to provide their input on the phases, steps and criteria (including inclusion, wording, grouping, order and any other comments). Specifically, they rated each criterion on a 4-point Likert scale (omit, possible, desirable, essential). Participants were also given the opportunity to provide rewording suggestions, suggest additional phases, steps or criteria, or provide additional comments or questions. Participants were not required to provide qualitative feedback but had to select a response on the Likert scale for each criterion to progress through the survey. Participants could exit the survey at any time. The survey was open from 16 July to 1 September 2018. During that time period, two automated email reminders were sent to participants to complete the survey.

| Round 2 survey
Round 1 participants were invited to complete the round 2 survey via email. The round 2 survey included a summary of the round 1 results: the percentage of participants who thought each criterion should be retained or excluded and the changes made based on the qualitative feedback. Participants rated each criterion on the same 4-point Likert scale as in round 1 (omit, possible, desirable, essential). They could also provide additional rewording suggestions, comments or questions for criteria that did not reach consensus in round 1 and for new criteria proposed by participants during the first round. The survey was available from 24 April to 31 May 2019. During that time frame, two automated email reminders were sent to participants to complete the survey.

| Data analysis
Following round 1, the ratings were summarized using percentages.
If >80% of participants rated a criterion in the lower two categories (omit, possible) or in the higher two categories (desirable, essential), we considered consensus to have been reached, and the criterion was removed or retained accordingly. Following the round 1 survey, a consensus meeting involving the steering group was held. The steering group reviewed and discussed the ratings and qualitative feedback, including rewording suggestions, suggestions to add new phases, steps or criteria, and more general comments or questions. The wording or order of a phase, step or criterion was revised if two or more respondents suggested it or if the steering group members agreed that the phase, step or criterion would benefit from rewording, reordering or merging. The same process was conducted following the round 2 survey.
Only fully completed surveys were included in the analysis.
Based on the round 2 results and feedback, the steering group deemed it unnecessary to conduct a third Delphi round. This method of determining the number of survey rounds has been used successfully in the past to develop a measure of organizational readiness for patient engagement. 28
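The retention rule described above can be sketched as a small illustrative function. This is a hypothetical re-expression of the analysis logic, not code from the study (the function name and sample ratings are invented for illustration); the study reported its analysis as simple percentages.

```python
from collections import Counter

# The two "higher" categories of the 4-point Likert scale.
POSITIVE = {"desirable", "essential"}


def consensus_decision(ratings, threshold=0.80):
    """Classify a criterion from its Delphi ratings.

    ratings: responses drawn from the 4-point scale
             (omit, possible, desirable, essential).
    Returns 'retain' if >80% of ratings fall in the higher two
    categories, 'remove' if >80% fall in the lower two, and
    'no consensus' otherwise (carried into the next round).
    """
    counts = Counter(ratings)
    total = sum(counts.values())
    positive_share = sum(counts[r] for r in POSITIVE) / total
    negative_share = 1 - positive_share
    if positive_share > threshold:
        return "retain"
    if negative_share > threshold:
        return "remove"
    return "no consensus"
```

For example, a criterion rated desirable or essential by 9 of 10 participants (90%) would be retained, while a 6-of-10 split would carry over to the next round.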

| Participants
In the first Delphi round, 50% (n = 131/260) of participants who started the survey completed it. The majority, 58% (n = 76), of respondents were female. Overall, 26 countries were represented (see Table 1 for details). A total of 49% (64/131) of participants selected multiple roles (see Appendix S3 to view the roles of participants in the first Delphi round).
All 131 participants who completed the round 1 survey were invited to complete the second-round Delphi survey. Of the participants who started the round 2 survey, 95% (n = 114/120) completed it. Similar to round 1, the majority, 59% (n = 67), were female.
Overall, 18 countries were represented in this round (see Table 1 for details). A total of 50% (57/114) of participants selected multiple roles (see Appendix S3 to view the roles of participants in the second Delphi round).

| Round 2 (3 steps)
Between 86% and 100% of participants rated the nine criteria in Phase 2 as desirable or essential. To avoid redundancy, participants suggested merging two of the criteria in step 2 so that it read: 'Systematically select evidence about benefits and harms of each option'. Overall, the qualitative feedback indicated that participants wanted more detail on the hierarchy, grading and sources of evidence. Evidence embedded in PDAs should be derived from systematic reviews, and GRADE should be used to rate the quality of that evidence. The certainty or quality of the evidence should also be critically appraised. Grey literature and social media sources should be avoided unless patient-relevant harms, benefits and concerns are not sufficiently covered in biomedical databases of published literature.

| Round 1 (4 steps)
Between 74% and 98% of participants rated the 14 criteria in Phase 3 as desirable or essential. The majority of the qualitative comments focused on the redundancy of the criteria in this phase.
We therefore reworded the first four criteria in step 1 to be more concise, and we removed the last two criteria for the round 2 survey to avoid redundancy. Furthermore, the three criteria in step 2 ('Manage COI') were considered by participants to be too similar to the criteria in step 3 of Phase 1, so we removed them for the round 2 survey. Based on participant feedback, two of the four criteria in step 3 ('Report') were removed to avoid redundancy, and the remaining criteria were reworded for the round 2 survey: (a) report the methods used to represent the evidence, and (b) report the evidence summarization process publicly and in a way that is easy to understand. Examples, where suggested by participants, were inserted to help clarify criteria. For instance, participants wanted some guidance on how to present evidence in a way that is easy to understand, so we provided an example in the third criterion of step 8: for example, the IPDAS chapter on communicating evidence may be used.

| Round 2 (3 steps)
Between 84% and 96% of participants rated the seven criteria in Phase 3 as desirable or essential. Based on qualitative feedback, participants wanted more information in step 1 on how to present risk information in a balanced fashion, given the difficulty of this task. PDAs should use absolute risks instead of relative risks.
Also, participants sought more detail on the external group in step 3.
The criteria should be explicit on the make-up of the external group, which should include patients to assess how to best present the evidence.

| Round 2 (1 step)
The criterion was rated as desirable or essential by 90% of participants. Qualitative feedback indicated that participants were concerned about the practicality of updating the PDA every time new evidence becomes available. Updating the evidence is a time- and resource-intensive process that requires funding. Guidelines should also be developed to specify when the evidence should be updated and by whom.
Relatedly, to increase feasibility, participants felt that a benchmark should be established for updating the evidence in PDAs (e.g., updates required every three years). Participants also suggested including publication dates on PDAs.

| Final process
Overall, the process for selecting and summarizing evidence for PDAs is presented in Table 2 (see Appendix S5 to view the full evidence summarization process).

| Principal findings
Based on two rounds of a modified Delphi survey, we have developed a process for selecting and summarizing evidence for PDAs that now consists of four phases, 11 steps and 31 criteria. Based on the stakeholders' feedback, the number of steps was reduced from 13 to 11, and the criteria from 48 to 31 by merging or deleting redundant criteria. The qualitative feedback also informed the rewording of criteria and the addition of examples to improve the clarity of the process for PDA developers. The final 31 criteria were rated as desirable or essential by 84% to 100% of participants.

| Strengths and limitations
Our use of the Delphi method to obtain feedback from a multidisciplinary stakeholder group to develop the PDA evidence summarization process is a strength, and the use of this method is supported when evidence is weak or uncertain. There is a dearth of empirical evidence on evidence summarization processes specific to PDA development. The stakeholder group included patients in each round of the Delphi survey, ensuring that the patient perspective is reflected in the evidence summarization process. We believe that a major strength of our evidence summarization process is that it encourages the involvement of patients throughout the process and can be used to develop tools for all patient populations, including those who are underrepresented. In terms of limitations, the high attrition rate in the round 1 survey decreased the size of our sample and may have led to non-response bias.
Also, we do not know the total number of participants invited to the round 1 survey because we are unable to determine the total number of potential participants on each listserv. The consistent 'desirable or essential' ratings suggest a ceiling effect, which could be reduced in future Delphi studies by replacing the 4-point scale with a 5-point scale that includes three positive and two negative response options. 28,29 The evidence summarization process has also yet to be piloted. Lastly, any criterion omitted from the Delphi surveys could not receive participant feedback on whether it is desirable or essential to the evidence summarization process.

| CONCLUSION
Ratings and qualitative feedback from over 100 multidisciplinary stakeholders across 28 countries in two Delphi survey rounds led to the development of a set of criteria for selecting, summarizing, reporting and updating evidence in PDAs. PDAs are promoted as tools that provide evidence-based, trustworthy information. Widespread adherence to the proposed criteria will help ensure that these tools fulfil that promise.

ACKNOWLEDGEMENTS
We would like to thank all of the participants who took the time to complete the Delphi surveys. We would like to thank our patient partner, Stephen T. Campbell, for being part of the steering group during the development of the protocol and first Delphi survey, and wish to acknowledge Jaclyn A. Engel for conducting a review of the final draft of the manuscript prior to submission.

CONFLICTS OF INTEREST
Professor Glyn Elwyn has edited and published books that pro-

DATA AVAILABILITY STATEMENT
The data sets used and/or analysed during the current study are available from the corresponding author on reasonable request.