Collaborative learning framework for online stakeholder engagement

Abstract
Background Public and stakeholder engagement can improve the quality of both research and policy decision making. However, such engagement poses significant methodological challenges in terms of collecting and analysing input from large, diverse groups.
Objective To explain how online approaches can facilitate iterative stakeholder engagement, to describe how input from large and diverse stakeholder groups can be analysed and to propose a collaborative learning framework (CLF) to interpret stakeholder engagement results.
Methods We use ‘A National Conversation on Reducing the Burden of Suicide in the United States’ as a case study of online stakeholder engagement and employ a Bayesian data modelling approach to develop a CLF.
Results Our data modelling identified six distinct stakeholder clusters that varied in the degree of individual articulation and group agreement and exhibited one of three learning styles: learning towards consensus, learning by contrast and groupthink. Learning by contrast was the most common, or dominant, learning style in this study.
Conclusion Study results were used to develop a CLF, which helps explore a multitude of stakeholder perspectives; identifies clusters of participants with similar shifts in beliefs; offers an empirically derived indicator of engagement quality; and helps determine the dominant learning style. The ability to detect learning by contrast helps illustrate differences in stakeholder perspectives, which may help policymakers, including the Patient‐Centered Outcomes Research Institute, make better decisions by soliciting and incorporating input from patients, caregivers, health‐care providers and researchers. Study results have important implications for soliciting and incorporating input from stakeholders with different interests and perspectives.


Introduction
Engaging patients, providers, policymakers and other relevant stakeholders can improve the quality of research, especially in health services and public health research. [1][2][3] For example, stakeholder engagement can enhance the cultural sensitivity of the research process, 4 make science more transparent, 5 improve the relevance of interventions to patient and community needs, 6,7 boost public use of research 8,9 and facilitate policy efforts to reduce health disparities. 10 Similarly, public and stakeholder involvement in health policy decision making fosters legitimacy of policy processes, 11 expands norms and values that are taken into account during decision making 12 and promotes a more careful consideration of alternatives. 13 More generally, involvement of large and diverse stakeholder groups in decision-making processes may foster deliberation 12 and promote collaborative learning, 14 which may help stakeholders understand alternative perspectives, clarify their own positions and participate in an open dialogue with those who may disagree with them. Better individual and group judgments on a range of health-related topics may result from large-scale stakeholder and public engagement. 12,15 Although previous research explains the reasons for, and the value of, public and stakeholder participation in research and health policy deliberations, it is not clear how large-scale engagement efforts should be designed and how their results should be analysed. 16 In this study, we argue that large, diverse groups of experts and ordinary citizens can be effectively engaged using an online, Delphi-based system, 17 and their input can be analysed with Bayesian data modelling techniques. We draw upon a recent large-scale study on assessing suicide prevention research priorities 18 to propose a new conceptual and analytic framework for conducting online stakeholder engagement panels called 'collaborative learning framework' (CLF).
We argue that the CLF offers a conceptual and analytic structure for online stakeholder engagement panels.

Methods and modes of stakeholder engagement
While public engagement usually involves asking citizens to participate in surveys, 13 focus groups, 19 or citizens' juries, 20 expert input is typically collected using Delphi-based approaches, 21 which offer participants an opportunity to independently and anonymously provide responses to a series of questions, receive feedback on how their responses compare to those of other participants and revise their original answers. 17 However, input from ordinary citizens and experts is rarely solicited simultaneously because expanding the panel to include both subject matter experts and ordinary citizens can be problematic, especially if panellists meet face-to-face. The diversity of expertise and comprehension of technical concepts can reduce the panel's ability to reach a common understanding. 22 Socio-economic differences may prevent panellists from sharing ideas, considering other perspectives and understanding consequences of proposed actions. 23 Online panel formats that provide complete or partial participant anonymity have been used to engage large and diverse groups of individuals around health-care issues effectively and cost-efficiently. 24,25 Like face-to-face expert panels, 26 online panels typically use a modified Delphi structure that adds a discussion round between the rating rounds. [27][28][29] Because online discussions are anonymous, they allow non-collocated stakeholders to share their positions, learn from each other and judge other participants' arguments on their soundness rather than on participants' personalities. 28 Such 'interactive participation' 30 of relevant stakeholders can promote collaborative or deliberative learning, and it can help participants articulate their own perspectives and learn about different viewpoints. 12,14

Methods of analysing data collected from large and diverse groups
Several approaches are available for analysing the data collected from large and diverse groups.
One approach relies on a simple aggregation of individual judgments. An aggregate judgment is often superior to the judgment of any individual group member, 31 including the most knowledgeable individuals. 22,32 Simple aggregation seems particularly relevant in situations where the correct answer is not known (or where there is no correct answer, as with many complex policy issues), for interaction among participants helps reduce 'the error or bias in individual judgments, deriving from incomplete knowledge or misunderstanding.' 21 Nonetheless, because competence and expertise in large and diverse panels are not equally distributed, some researchers argue that differential weighting of panellists' judgment is advantageous for producing a high-quality group judgment. 33 The bases for differential weighting in stakeholder engagement panels, however, are unclear and may be ethically and politically unacceptable, especially in panels that include patients and clinicians. It is often difficult to identify a priori the exact competence of each stakeholder 21 and to know what combination of expertise will be needed to address the complex and multifaceted problems typically presented to expert panels. 34 Perhaps most important, differential weighting of stakeholder input based on competence violates the underlying principles of community-engaged research, 23 which promote the democratic legitimacy of the policymaking process.
An alternative approach is to require panellists to develop consensus. However, diverse groups may fail to reach consensus on all relevant issues. 35 Even if consensus is reached, it is typically calculated based on the data from the final round of questions 36 and may make those participants with a minority perspective feel underappreciated. Furthermore, 'forced' consensus in groups with truly different perspectives may be meaningless and may distract from efforts to understand areas of stakeholder disagreement, which can be very important. 37 Because the goal of stakeholder panels is to engage large and diverse groups of individuals, we argue that the analytic methods used to analyse their input should (i) incorporate all the data collected throughout the study; (ii) identify the points of agreement and disagreement among stakeholders; (iii) determine the final group judgment in a way that is sensitive to the existence of conflicting or contradictory perspectives; (iv) use differential weighting of participants' responses, because they are likely to be 'noisy' and of variable quality; and (v) prioritize the input of panellists not based on their competence, but rather based on the quality of stakeholder participation and the extent of their learning during online engagement. This last criterion is arguably the only empirical information about the behaviour of participating stakeholders that can be objectively collected throughout the engagement process itself. These five statements form the foundation of the CLF.

Theoretical framework
The CLF is inspired by the literature on public deliberation, which suggests that public engagement of experts (e.g. clinicians) and ordinary citizens (e.g. patients) maximizes mutual learning and helps sharpen their perspectives. 12 It is also motivated by computer-mediated communication, which defines collaborative learning as two or more people engaging in learning activities together using online tools. 38,39 We argue that participants in online stakeholder engagement processes engage in collaborative learning by understanding how their individual answers fit within the overall group response, discussing the group's responses via anonymous online discussion boards and having an opportunity to revise their answers throughout the study. Collaborative learning is evidenced by changes in individual responses as well as shifts in the overall group judgment between rounds. By looking at the patterns of these changes, we identify clusters of participants that experience similar shifts in their latent or underlying beliefs (as expressed by their answers to study questions), develop a typology of collaborative learning and explain how it can be used to differentially weight stakeholder input during data analysis.
Following the social choice approach to expert panels, 34 we argue that the quality of stakeholder engagement can be judged based on panellists' ability to divulge their latent beliefs to other participants. Panellists, regardless of their individual characteristics, are expected to contribute expertise to the final group judgment by casting informative votes, that is by answering questions and contributing to the online discussion in a way that is consistent with their personal convictions, which are grounded in their prior experiences and interpretation of the available evidence. As a result, participants who are actively engaged in the online process are more likely to learn from other stakeholders than their more passive counterparts.
To detect the presence of collaborative learning, we look at the change in both individual and group judgments between rounds. Being exposed to and engaged by the opinions of other participants may help stakeholders better understand alternative views, potentially change their own perspectives and ultimately affect the group's judgment. 12 Conducting two to three question rounds is typically enough to increase within-group agreement, that is, the relative concentration of participants' answers around a particular response. 21 A reduction in how much participant answers vary between rounds indicates that their opinions are moving closer to each other. 40 Therefore, we consider shifts in the group's judgment between rounds towards agreement, or the relative concentration of individual judgments around the group mean, to be the first indicator of learning.
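As a rough illustration of this first indicator, the spread of ratings for a single question can be compared across rounds. The sketch below is a deliberately simplified, non-Bayesian proxy: it uses the sample standard deviation of observed ratings rather than the model-based concentration of latent beliefs, and the function name and example ratings are hypothetical.

```python
from statistics import stdev

def agreement_shift(first_round, last_round):
    """Drop in the spread of ratings between rounds.

    A positive value means ratings moved closer together (more agreement);
    a negative value means they moved apart. This is an illustrative proxy,
    not the model-based measure described in the study.
    """
    return stdev(first_round) - stdev(last_round)

# Hypothetical ratings of one goal on one criterion (10-point scale).
r1 = [2, 4, 6, 8, 10]
r3 = [4, 5, 6, 7, 8]
shift = agreement_shift(r1, r3)  # positive: the group moved towards agreement
```

A positive shift here only signals that observed answers became more concentrated; it says nothing about whether individual articulation improved.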
Although changes in individual responses may not lead to an increase in group agreement, they may still be a desirable sign of learning. These shifts may be associated with participants' improved abilities to understand and/or answer study questions. 41,42 Indeed, participants may better differentiate between response categories, learn from the group's responses and improve their ability to express their latent beliefs by answering given questions, which can happen when participants are exposed to group results and answer the same questions more than once. We call the ability to express one's latent beliefs 'articulation' (the relative concentration of participants' answers around their own latent beliefs) and treat it as the second indicator of learning.
Because there may be multiple perspectives within large and diverse groups, the CLF is not based on the assumption that stakeholders should reach consensus. Instead, the CLF focuses on exploration of shifts towards agreement or disagreement and assumes the existence of clusters, that is, subgroups of participants who express similar degrees of change in their underlying beliefs and abilities to articulate them throughout the online engagement process. Such clusters can be determined empirically based on the changes in the relative concentration of stakeholder beliefs (group agreement) and stakeholders' abilities to express their own latent beliefs (articulation).
A key CLF characteristic is the empirical identification of clusters to categorize participants into a typology of collaborative learning, which is based on the presence and direction of changes in group agreement and individual articulation between rounds within clusters, relative to participants' respective latent beliefs. For example, participants with relatively large change in group agreement and low change in articulation are assigned to one cluster, whereas those with relatively low change in agreement and high change in articulation are assigned to another. One of the main benefits of clustering is the ability to recognize agreement or disagreement among participants, which helps ensure that the engagement process does not encourage the development of false consensus that is not reflective of underlying participant beliefs.

Methods
To evaluate the nature of collaborative learning and to present a new approach to analysing large-scale stakeholder engagement data, we use 'A National Conversation on Reducing the Burden of Suicide in the United States' project as a case study. 18,43 The goal of this project was to generate and prioritize aspirational research goals that can reduce suicidal attempts and suicides in the United States by 20% within five years and by 40% or more within 10 years, if this research agenda were fully implemented.
As is common in Delphi panels that solicit input from individuals with relevant knowledge and expertise who can represent a diversity of perspectives that may exist on an issue, 26,44 recruited stakeholders were a purposefully selected sample of adults whose professional or personal lives have been affected by the state of suicide prevention research. The list of potential invitees (individuals personally affected by suicide, researchers, health-care and other treatment providers, policymakers) was obtained by searching the websites of relevant professional associations, academic departments, research funding institutions and asking research team members to nominate relevant stakeholders. Interested stakeholders were asked to register for the study online; registered participants received email notifications with login information and instructions on how to complete each study round. For additional details on recruitment and study methodology, see ref. 18.
The project used ExpertLens (EL), a previously evaluated modified-Delphi system designed specifically for conducting online panels for research purposes, 40,45,46 to solicit input from 511 stakeholders in a three-round iterative online engagement process. Although participants were not required to reach consensus, they were told that the study would consist of three rounds. In Round 1 (R1) of the EL process, participants rated 12 proposed suicide prevention goals (e.g. prevent repeat suicide attempts by improving follow-up care after a suicide attempt) on four criteria (e.g. potential of this goal to prevent fatal and non-fatal suicide attempts) using 10-point Likert-type scales. In Round 2 (R2), participants saw how their own answers compared to those of other participants; they were presented with distributions of group responses, group medians and quartiles. Participants also discussed group responses using partially anonymous, moderated, online discussion boards. In Round 3 (R3), they re-answered R1 questions and responded to a series of questions about their satisfaction with the EL process. 43
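The kind of statistical feedback shown in R2 can be sketched as follows. This is not ExpertLens code; the function name, the returned fields and the example ratings are assumptions made for illustration, and the choice of the 'inclusive' quartile method is arbitrary.

```python
from collections import Counter
from statistics import quantiles

def round2_feedback(group_ratings, own_rating):
    """Summarise group ratings for one question and locate one participant's rating."""
    # Quartiles of the observed ratings; "inclusive" treats the data as the whole group.
    q1, med, q3 = quantiles(group_ratings, n=4, method="inclusive")
    if own_rating < q1:
        position = "below"   # below the interquartile range
    elif own_rating > q3:
        position = "above"   # above the interquartile range
    else:
        position = "within"  # inside the interquartile range
    return {
        "distribution": dict(sorted(Counter(group_ratings).items())),
        "q1": q1,
        "median": med,
        "q3": q3,
        "position": position,
    }

# Hypothetical R1 ratings of one goal on a 10-point scale.
ratings = [3, 5, 6, 7, 7, 8, 8, 9, 10]
feedback = round2_feedback(ratings, own_rating=4)
```

In this example the participant's rating of 4 falls below the group's interquartile range, which is the type of contrast a panellist might weigh before re-answering in R3.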

Data analysis
We employed Bayesian data modelling to uncover points of agreement and disagreement between stakeholders' latent beliefs using their responses to study questions, tracking changes in individual and group judgments between rounds and identifying patterns of changes in individual articulation and group agreement. Statistical details of our analytic approach are presented in Appendix S1.
We choose a Bayesian approach for two reasons. First, it allows us to introduce a latent continuous response that generates observed ordered scores, an intuitive formulation that facilitates rich inference of latent individual and group beliefs in situations where the 'correct' answer to a question is unknown. We argue that each rated suicide prevention research goal possesses an intrinsic value that we do not observe. We 'de-noise' the data by uncovering the unobserved latent participant beliefs, which are intrinsic properties of individual stakeholders, and then use them to discover stakeholders' intrinsic scores for each goal. While we may never know the actual intrinsic values with certainty because they are unobserved, we estimate them from participants' answers to study questions.
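The idea of a latent continuous response generating an observed ordered score can be illustrated with a generic cut-point mapping. The sketch below is not the model specified in Appendix S1: the evenly spaced cut-points, the Gaussian noise and all names are assumptions chosen only to show how a noisy latent belief could be turned into a rating on a 10-point scale.

```python
import random
from bisect import bisect_right

# Hypothetical cut-points dividing the latent scale into 10 ordered categories.
CUTPOINTS = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]

def observed_score(latent_value, cutpoints=CUTPOINTS):
    """Map a latent continuous value to an ordered score from 1 to len(cutpoints) + 1."""
    # The score is one plus the number of cut-points at or below the latent value.
    return bisect_right(cutpoints, latent_value) + 1

def noisy_rating(latent_belief, noise_sd, rng):
    """Simulate one rating: latent belief plus Gaussian noise, then discretised."""
    return observed_score(latent_belief + rng.gauss(0.0, noise_sd))

rng = random.Random(0)
# A less 'articulate' participant (larger noise_sd) gives more scattered ratings
# around the same latent belief than a more articulate one.
scattered = [noisy_rating(6.7, noise_sd=2.0, rng=rng) for _ in range(5)]
focused = [noisy_rating(6.7, noise_sd=0.2, rng=rng) for _ in range(5)]
```

Under this reading, 'de-noising' amounts to inferring the latent belief and the noise level from the observed scores, which the study does within its Bayesian model.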
Second, our Bayesian approach allows the data to reveal clusters of participants who express similar types of learning, determined by changes in the level of individual articulation and group agreement. Identification of clusters is particularly important when participants are diverse and when the group composition may affect the group judgment. Clusters can help us better understand the differences and similarities in the ways stakeholders' beliefs and abilities to articulate them change throughout the engagement process and identify potential coalitions among participants based on the changes in their judgments.

Results
The analytic sample for this study consists of 207 participants who answered questions in both R1 and R3 (41% of 511 participants). The majority of sampled participants were White (94%) females (67%) between 45 and 64 years of age (66%) with a Master's (39%) or a Doctorate (36%) degree. 1 The sample was diverse in terms of the represented stakeholder groups: 33% of stakeholders were survivors (e.g. people touched by suicide), 27% were suicide researchers, 22% were health-care and other treatment providers and 18% were policymakers and administrative decision-makers. Ten per cent did not answer one or more questions exclusively in R1, 15% did not do so exclusively in R3, and 3% did not do so in both rounds. We imputed missing scores from their posterior predictive distributions based on our model formulation (see Appendix S1).

Judgment change between rounds
Results of our study suggest that individual judgments changed throughout the engagement process. On average, participants changed 20 of 48 answers. One hundred and ninety-four of 207 (94%) stakeholders changed at least one of their answers, and 94 (45%) changed at least half of their R1 answers. While the largest number of participants (n = 16) changed 21 of their 48 R1 responses, two participants changed all their answers. Although the vast majority of participants changed their judgments throughout the engagement process, the average magnitude of this change was not large. An average change in stakeholder judgments was very close to 1 on a 10-point scale. Similarly, average change in the mean group ratings of the twelve goals was very small (0.06 on a 10-point scale). Nonetheless, the standard deviations for all goals decreased between rounds (average decrease across all goals between rounds was 0.2), suggesting that there was an increase in group agreement (data not shown).
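Descriptive statistics of this kind can be computed directly from the R1 and R3 answers. The sketch below uses hypothetical data and names, and it assumes that the average magnitude of change is taken over all answers (changed or not), which the text does not specify.

```python
from statistics import mean

def change_summary(r1, r3):
    """Summarise judgment change between two rounds.

    r1, r3: dicts mapping a participant id to a list of ratings,
    with questions in the same order in both rounds.
    """
    changed_counts = []
    abs_changes = []
    for pid, first in r1.items():
        diffs = [abs(a - b) for a, b in zip(first, r3[pid])]
        changed_counts.append(sum(d > 0 for d in diffs))  # answers this participant changed
        abs_changes.extend(diffs)
    return {
        "mean_answers_changed": mean(changed_counts),
        "share_changed_any": sum(c > 0 for c in changed_counts) / len(changed_counts),
        "mean_abs_change": mean(abs_changes),
    }

# Hypothetical answers from two participants to three questions.
r1_answers = {"a": [5, 7, 9], "b": [3, 3, 8]}
r3_answers = {"a": [5, 8, 9], "b": [4, 5, 8]}
summary = change_summary(r1_answers, r3_answers)
```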

Clusters
Data modelling revealed six distinct clusters, 2 which included participants with similar degrees of changes in their underlying beliefs and abilities to articulate them throughout the online process. Clusters varied in terms of size, composition, degree and nature of changes in stakeholders' responses, but did not differ in terms of gender, race/ethnicity or education of their members. While some clusters saw improvements in levels of stakeholders' articulation or experienced movement towards group agreement, others experienced both types of changes. All clusters saw changes in either articulation or agreement. To illustrate, Cluster 1 was the largest cluster (n = 50), whereas Cluster 6 was the smallest (n = 16). Cluster 1 was the most diverse because it had roughly the same number of researchers, providers and administrators, with a slightly smaller number of survivors, whereas Cluster 2 was the least diverse cluster and was dominated by researchers (see Fig. 1). While Cluster 1 participants increased their level of articulation the most, as judged by the relative concentration of ratings across all goals for each stakeholder within each round, Cluster 6 members experienced the smallest increase in the levels of individual articulation (see Fig. 2). Although they did not gain in individual articulation, Cluster 6 participants experienced the largest move towards group agreement, as measured by the largest reduction in the variance of scores for a given goal across all stakeholders between the rounds (see Fig. 3). Members of clusters 1, 2 and 3, however, experienced rather trivial improvements in the level of group agreement.
Finally, there was variation in satisfaction with the online engagement process between clusters. Cluster 1 members were the most engaged in the online process, as measured by the satisfaction survey questions (Table 1). For example, they agreed that participation in this exercise was interesting, that divergent views were expressed during the online discussion and that participants debated each other's views. At the same time, Cluster 6 participants felt least engaged, as they only had a neutral opinion about the extent to which the online engagement tool was easy to use and divergent views were expressed during the online discussion.

1 These percentages are based on the total sample size of 199 participants who provided answers to demographic questions.
2 Note that only 172 of 207 participants belong to these six clusters. The remaining 35 participants belong to a number of much smaller clusters, which we do not discuss in this study due to their limited sizes.

Types of collaborative learning
By looking at clusters, we identified certain patterns in the level of changes in individual and group judgments. We propose a typology of collaborative learning based on the presence and direction of changes in individual articulation and group agreement within clusters, relative to their respective latent beliefs (see Table 2). The most expected type of learning in Delphi panels takes place when statistical feedback and group discussion help increase the articulation of individual responses and move the group towards agreement. We have named this situation learning towards consensus. Exposure to the opinions of other participants may improve stakeholders' ability to express their latent beliefs by answering the study questions and may also encourage them to change their answers and reach agreement based on the new information that they received during R2. This was the case in clusters 4 and 6. If online discussion is either anonymous or partially anonymous (i.e. where only a participant's stakeholder group is revealed to others), changes in participants' judgments are more likely to be attributed to the quality of arguments made by a particular stakeholder than to his/her social status or characteristics. 40
Improved articulation not accompanied by increased group agreement illustrates learning by contrast, which may be explained by an anchoring effect. 47 Assuming that individual R1 responses serve as anchors, or bases for comparison, 48 exposure to different perspectives and the group response in R2 may be seen as anchor-inconsistent information that encourages stakeholders to clarify their own position and improve their ability to express their individual beliefs in an attempt to better differentiate their position from that of other participants.
At the same time, receiving anchor-consistent information, such as seeing that your own answers are similar to the group averages or reading discussion comments that you agree with, may help reinforce participants' original positions, but not affect the overall group response. Furthermore, stakeholders may experience less agreement as a result of R2 feedback and discussion.
Learning by contrast may happen when a diverse group of stakeholders with strong and well-established opinions (e.g. opinion leaders) provide input on an issue of great concern, such as suicide prevention strategies; exposure to alternative perspectives may help them clarify their own beliefs and may improve their ability to express them, but does not improve the group agreement. Although learning by contrast may not be a desirable outcome of an expert panel, it is a welcomed result of a stakeholder engagement panel when reaching consensus may not be expected or desired. Clusters 1, 2 and 3 illustrate learning by contrast.
Participants' responses may also become more concentrated around a particular value, but the articulation of individual responses may not improve between rounds. In such a situation, the group may suffer from groupthink, as illustrated with Cluster 5. Participants may conform to the majority opinion, and those in minority may be unwilling to voice opinions that do not align with the majority view. 49,50 Although anonymity of online stakeholder engagement processes is intended to facilitate honest and open discussion, stakeholders may find it more difficult to debate with anonymous individuals online. Partial anonymity may still make some participants uncomfortable sharing their perspectives with members of a more powerful group. Moreover, individual articulation may not increase after R2 because participants may not want to spend time thinking about their answers, knowing that their original position was outside of the group's typical response range.
If participants' ability to answer questions does not increase between rounds and there is no movement towards agreement, it is likely that stakeholders did not pay enough attention to questions, were not very interested in providing high-quality input, may not have had sufficient knowledge to participate in the study or were distracted by the online nature or complexity of the study. We call this situation no learning. Judgments of those participants who have not increased their level of articulation or whose answers did not affect group agreement could be down-weighted or potentially ignored in determining the final group judgment. Because none of our clusters belongs to this group, one can argue that participants in this study were engaged in the online process.
While all clusters experienced some learning, learning by contrast was the dominant learning style: three of six clusters, as well as 120 of 174 stakeholders, experienced it. Because our approach is not based on forcing consensus, it can detect improvements in individual articulation not accompanied by increased group agreement. Indeed, the underlying beliefs of the majority of our panellists did not shift towards agreement as a result of their engagement, which illustrates the importance of considering the plurality of perspectives on suicide prevention research strategies that exist. Therefore, the ability of the CLF to detect learning by contrast can help better illustrate differences in stakeholder perspectives, which may help policymakers make more informed decisions. 12
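The typology described in this section can be summarised as a simple decision rule over the two indicators. The sketch below is illustrative only: in the study, clusters and their changes are estimated within the Bayesian model, whereas here the gains are taken as given numbers and the threshold that separates a meaningful change from a trivial one is a hypothetical parameter.

```python
def learning_type(articulation_gain, agreement_gain, threshold=0.0):
    """Classify a cluster's learning style from changes between rounds.

    articulation_gain: change in individual articulation (positive = improved).
    agreement_gain: change in group agreement (positive = improved).
    threshold: hypothetical cut-off below which a change is treated as trivial.
    """
    more_articulate = articulation_gain > threshold
    more_agreement = agreement_gain > threshold
    if more_articulate and more_agreement:
        return "learning towards consensus"
    if more_articulate:
        return "learning by contrast"
    if more_agreement:
        return "groupthink"
    return "no learning"

# Hypothetical gains for three clusters, with changes under 0.05 treated as trivial.
styles = [
    learning_type(0.40, 0.01, threshold=0.05),
    learning_type(0.02, 0.30, threshold=0.05),
    learning_type(0.20, 0.25, threshold=0.05),
]
```

The threshold matters in practice: a cluster with a large gain in articulation and only a trivial gain in agreement is classified as learning by contrast rather than learning towards consensus.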

Discussion
We presented a novel approach for collecting, analysing and interpreting the online data collected from large and diverse groups. Instead of requiring participants to reach consensus, our approach helps explore both agreement and disagreement among diverse stakeholder groups, which is important for understanding the plurality of perspectives that may exist on a given issue. The online engagement process helps solicit input from a large number of stakeholders with different perspectives who can contribute using an internet-connected computer at a time convenient to them. Stakeholders do not have to travel to a centralized location. Participant anonymity can help stakeholders evaluate the quality of other participants' arguments without being negatively affected by their personalities or demographic characteristics. Although the increased panel size does not significantly increase the data collection costs, it allows for inviting stakeholders with different types and areas of expertise, some of whom may not have been considered traditional 'experts' (e.g. suicide survivors). 40 To better explain the nature of learning that may have taken place throughout the online engagement process, our collaborative learning framework relies on Bayesian data modelling techniques and can detect movements in group judgments towards agreement or disagreement. The CLF is based on the assumption that the quality of online engagement depends on stakeholders' ability to divulge their latent or underlying beliefs to other participants via responses to study questions and their ability to learn from one another. Therefore, it is important to encourage active participant engagement during all rounds of the online panel, to analyse changes in stakeholders' responses to study questions and to detect shifts in the overall group judgment between rounds of engagement. 
By looking at the patterns of these changes, the CLF can (i) identify clusters of participants that experience similar shifts in their latent beliefs as expressed by their answers to study questions and (ii) help empirically determine weights for the input of different stakeholders, based on the type of their collaborative learning, during data analysis.
Below we discuss some methodological, practical and policy implications that our approach has for conducting large-scale stakeholder engagement panels on health-related and other topics.

Methodological implications
Analysing the composition of participant clusters offers a useful approach for identifying similarities and differences between stakeholders that are based on their engagement in an online process rather than on demographic characteristics. Importantly, the process of identifying relevant stakeholder clusters is data driven and uses the extent of stakeholder learning as an indicator of the quality of their input. The measures of collaborative learning are collected as part of the online process, and the final group judgment, if desired, may be estimated by aggregating individual judgments weighted by the prevalence of their cluster's learning style, cluster size and/or the level of online engagement.
Looking at the contributions of only those individuals who experienced some collaborative learning may improve the quality of the final group judgment because it would discount the judgments of those participants who were not sufficiently engaged. Indeed, results of our study suggest that participants with the most favourable attitudes towards the online stakeholder engagement process, as measured by the satisfaction questions, belong to either Cluster 1 or Cluster 2, both of which experienced learning by contrast. This finding suggests that level of engagement may be a promising predictor of the learning style that best characterizes a particular study and therefore could be used to define the weights assigned to each participant's judgments.
Similarly, cluster size may also be used to develop weights for data analysis in situations when accuracy of judgments is impossible to determine because the correct answer does not exist. The largest number of participants and clusters in our study also experienced learning by contrast. Therefore, because stakeholders vary in how they view priorities for suicide prevention research, the judgment of individuals who belong to clusters that illustrate learning by contrast might be given more weight than the judgments provided by participants who exhibited other learning styles.
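The weighting idea discussed above can be illustrated with a minimal sketch. It is not the Bayesian model used in this study; it only shows how a group judgment could be computed as a weighted mean of individual ratings once cluster membership is known. The function name, the example ratings and the choice of cluster size as the default weight are all assumptions made for illustration.

```python
from collections import Counter


def weighted_group_judgment(ratings, clusters, cluster_weights=None):
    """Return the weighted mean of individual ratings.

    ratings         -- one numeric rating per participant
    clusters        -- one cluster label per participant (same order)
    cluster_weights -- optional mapping from cluster label to weight;
                       if omitted, each participant is weighted by the
                       size of their cluster (an illustrative assumption)
    """
    if len(ratings) != len(clusters):
        raise ValueError("ratings and clusters must have the same length")
    if cluster_weights is None:
        # Hypothetical default: larger clusters carry more weight.
        cluster_weights = Counter(clusters)
    weights = [cluster_weights[label] for label in clusters]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one participant needs a non-zero weight")
    return sum(w * r for w, r in zip(weights, ratings)) / total


# Made-up data: three participants in cluster "A", one in cluster "B".
example_ratings = [7, 8, 6, 3]
example_clusters = ["A", "A", "A", "B"]

size_weighted = weighted_group_judgment(example_ratings, example_clusters)
# Weights derived from learning style or engagement could be passed
# explicitly; here cluster "A" is excluded purely as an example.
only_b = weighted_group_judgment(
    example_ratings, example_clusters, cluster_weights={"A": 0, "B": 1}
)
```

In this toy example the unweighted mean is 6.0, whereas weighting by cluster size pulls the group judgment towards the larger cluster. Weights based on learning style or level of engagement could be supplied through the same `cluster_weights` argument.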

Practical implications
Results of our study offer two implications for conducting stakeholder panels, including those focused on determining health-care research priorities, developing new guidelines or developing health interventions. First, they illustrate the importance of recruiting large and truly diverse samples of participants with relevant knowledge. Having enough participants with different backgrounds (e.g. patients, clinicians, researchers, policymakers) is a prerequisite for ensuring that the 'truth' is distributed among stakeholders' perspectives.
Second, our study results highlight the value of maximizing participants' level of online engagement. It is very important to recruit an experienced online discussion moderator who can encourage active participation in the panel by summarizing differences in expressed opinions; asking interesting, discussion-provoking questions; and encouraging open, respectful and active discussion. Moderators can help expose participants to different perspectives, which is important for facilitating collaborative learning.

Policy implications
Our approach has direct practical implications for engaging stakeholders around a number of health-care policy areas, including the process of priority setting for health services research 51 and the conduct of patient-centred outcomes research (PCOR). 52 To improve health-care delivery and outcomes and to help patients make informed decisions about their health, PCOR needs to be informed by the perspectives of patients, caregivers, researchers, clinicians and the broader health-care community. Online stakeholder panels and the CLF can give all relevant stakeholders a fair voice throughout the engagement process by identifying points of agreement and disagreement within and between clusters of stakeholders and by synthesizing stakeholder input based on the quality of their participation as judged by the objective measures of engagement. Moreover, post hoc identification of the participant characteristics that illustrate cluster differences is a useful application of this method for policymakers. For example, they might use such information to determine which stakeholder groups can work with each other on a particular issue moving forward. The online approach can be a valuable and cost-efficient supplement to face-to-face meetings, round-table discussions, town hall meetings and surveys that are conducted to engage stakeholders, identify the national priorities for health services research and evaluate its impact on patients and communities. 51,53 It is important to note, however, that online engagement is not a substitute for forming personal and trustworthy relationships within local communities, which is very important for community-based participatory research. 54 Rather, it is a useful adjunct that allows for large-scale (e.g. national) stakeholder engagement that may not be possible otherwise. 
As such, it may help build ties between stakeholders across the nation and consequently enhance collaborative learning, capacity building and stakeholder ability to affect policy change, which are the main tenets of community-engaged research 55 and deliberative democracy. 12

Study limitations
This study has several limitations that we plan to address in the future. First, the sample used to develop the CLF was not necessarily representative of different stakeholder groups. Although expert panel participants are typically purposefully selected to guarantee diversity of perspectives, 44 future studies should explore ways of ensuring sample representativeness on the most relevant criteria.
Second, only a small number of survey questions that measure satisfaction and participant characteristics were asked during the study, which limits our ability to identify relevant cluster characteristics. We plan to add questions about participant values and interests to explore the differences between clusters. Third, our model is based solely on two rounds of rating data. We plan to augment our statistical model by incorporating qualitative discussion data that show the number of comments each stakeholder made and the topics discussed, as well as the data illustrating the amount of time spent in each round.
Fourth, given that participation rates decrease with the addition of a new round in all Delphi studies, 56,57 it is important to develop effective participant engagement and retention strategies, including use of periodic study reminders and discussion status updates via email. Although participants in our study received periodic reminders, the participation rate in all rounds was 41%, which is low yet comparable to other online Delphi processes. 58 In the future, we plan to focus on identifying the characteristics of online tools that facilitate stakeholder retention, make it easier for participants to engage using online discussion boards and promote two-way communication.
Finally, although our approach to data analysis and the resultant CLF have relatively high face validity, we did not have an opportunity to test their robustness. To explore the extent to which our approach can reveal true participant beliefs and group agreement and improve the quality of group judgment, we plan to include in future studies a series of questions that have 'correct' answers (for example, an historical policy task that is not well known to participants but has 'known correct' answers).

Conclusions
Despite these limitations, our study proposed a new conceptual and analytic framework for conducting online stakeholder engagement panels. We illustrated a new approach for analysing the input collected from large and diverse stakeholder groups and used a Bayesian approach to develop the CLF, which identifies different styles of learning and empirically determines clusters of participants with similar changes in latent beliefs and in their ability to express them by answering study questions. We believe that our study findings can help design health interventions and implement guidelines that are more likely to be accepted by different stakeholders and, more broadly, improve policymakers' ability to identify national research priorities that are informed by the input of a wide range of stakeholders.