Blurring the boundaries between synthesis and evaluation. A customized realist evaluative synthesis into adolescent risk behavior prevention

Realist methodologies have been increasingly advocated for the investigation of complex social issues. Public health programs, such as those designed to prevent adolescent risk behavior, are typically considered complex. In conducting a realist review of the empirical literature relating to such programs, we encountered several challenges, including (a) an overabundance of empirical evidence, (b) a problematic level of heterogeneity within and between methodological approaches, (c) discrepancies between theoretical underpinnings and program operationalization, (d) homogeneity of program outcomes, with very little variation in program effectiveness, and (e) a paucity of description relating to content and process. To overcome these challenges, we developed a customized approach to realist evidence synthesis, drawing on the VICTORE (Volition, Implementation, Contexts, Time, Outcomes, Rivalry, and Emergence) complexity checklist and incorporating stakeholder engagement as primary data to achieve greater depth of understanding relating to contextual and mechanistic factors, and the complex interactions between them. Here we discuss the benefits of this adapted methodology alongside an overview of the research through which the methodology was developed. A key finding from this research was that combining the complexity checklist with primary data from stakeholder engagement enabled us to systematically interrogate the data across data sources, uncovering and evidencing mechanisms which may otherwise have remained hidden, giving greater ontological depth to our research findings. This paper builds on key methodological developments in realist research, demonstrating how realist methodologies can be customized to overcome challenges in developing and refining program theory from the literature, and contributes to the broader literature of innovative approaches to realist research.

recommended to be used in a prescriptive sense, and customization of the methodology to account for potential idiosyncrasies within a specific evidence base is accepted.
• A small number of papers within the existing literature have used each of the two key adaptions discussed within this study, though reasons for doing so have not been considered in any great depth. Furthermore, combining both of these adaptions to take an evaluative approach to realist synthesis is novel to this work and lends greater ontological depth to the research findings than may otherwise have been achieved.
• This study builds on key methodological developments in realist research, demonstrating how realist methodologies can be customized to overcome challenges in developing and refining program theory from the literature, and contributes to the broader literature of innovative approaches to realist research.

KEYWORDS
adolescent risk behavior, customization, evaluative synthesis, methodology, realist research, synthesis

| INTRODUCTION
Realist methodologies have been used increasingly to investigate complex social issues, and the interventions designed to address them, through the development and evidencing of program theories in the form of context-mechanism-outcome configurations. 1 Public health programs are typically considered complex, going beyond the level of multi-layered, multi-component interventions, which can be complicated, to take into account the role of emergent and unpredictable interactions between program elements, proximal and distal contextual factors, and human action in generating program outcomes. 2,3 Public health interventions and their effects develop and emerge over time, producing different effects in different contexts. 4 Unpicking these nonlinear causal pathways to develop an understanding of how programs work, for whom, in what circumstances, and why can be a challenging undertaking. While several key texts set out the core constructs and guiding principles of realist approaches, 1,5 strict adherence to these guiding principles is not a requirement, and indeed the principles may need to be tailored to specific project requirements. [6][7][8] The purpose of this research was to investigate programs to prevent or reduce the adoption of multiple health risk behaviors in adolescents. Within the empirical literature, programs are typically evaluated with a view to providing a conclusive and generalizable approach to prevention, often failing to take into account complex interactions between components or the role of context and human agency. The research reported here aimed to conduct a realist review of existing interventions to understand what works, for whom, in what circumstances, and why.
As we engaged with the review, we encountered a number of challenges, including: (a) an overabundance of empirical evidence, (b) a problematic level of heterogeneity both within and between approaches to adolescent risk behavior prevention, (c) discrepancies between the theoretical underpinnings upon which programs are based and the methods through which programs were operationalized, (d) homogeneity of program outcomes, with very little variation in program effectiveness across programs, making it difficult to ascertain "what works," and (e) a paucity of description relating to content and process. This led to difficulty in collecting, interpreting, and synthesizing the evidence to formulate program theories or the context-mechanism-outcome configurations on which they are based.
To overcome these challenges, a customized approach, blurring the boundaries between realist evaluation and realist synthesis, was developed. This allowed us to draw equally from primary and secondary sources of evidence to organize findings from the disparate empirical sources, and to ask questions of these groups of evidence to support the development and refinement of program theory. To differentiate between typical methodological approaches used in realist synthesis and the methods used within this project, the approach detailed here will be referred to throughout this study as an evaluative synthesis.

| Purpose and aim
The purpose of this study is to set out the challenges faced in conducting the review, outline the steps taken to customize the realist methodology, and to discuss how this customized methodology was used to overcome these challenges. This study aims to contribute to the realist methodological literature, to provide an example of methodological customization, and to consider the potential usefulness of using an evaluative synthesis approach in conducting future realist research.
The purpose of this evaluative synthesis was to gain a deeper understanding of how, why, for whom, and in what circumstances complex multiple risk behavior prevention programs are most successful in preventing or reducing adolescent risk behavior. The evaluative synthesis, conducted as part of a doctoral thesis, 9 will be published as a series of papers detailing the rationale for conducting the research and research protocol, the theoretical framework and program theory development, and key findings from the review. However, a brief overview is provided here in order to contextualize the methodology used, and the specific challenges faced in applying realist methodologies.
Adolescence, defined as falling between the ages of 10 and 19, 10 has historically been considered one of the healthiest life phases, with the lowest rates of morbidity and mortality across the life-course. 11 However, there is increasing recognition that investment in adolescent health programs is pivotal in improving health and wellbeing globally and there has recently been a shift in policy, practice, and research to focus on this critical life phase. 12 Evidence suggests that health risk behaviors do not occur in isolation, but cluster, with adolescents engaging in patterns of health and risk behaviors, signifying potential for shared underlying causal factors. [13][14][15][16][17][18] Within this study, specific attention was paid to tobacco use, alcohol consumption and drug use (often referred to collectively as substance use, and covering an array of different substances), and sexual health and risky sexual practices, as evidence suggests that these are the behaviors which most commonly co-occur. Despite this clustering, and emerging empirical evidence which supports addressing multiple risk behaviors simultaneously, policy recommendations continue to address each behavior separately.
For example, the National Institute for Health and Care Excellence has separate guidelines addressing harmful sexual behaviors in adolescence, 19 prevention of sexually transmitted infections, 20 smoking, 21 and substance use. 22 Hale and Viner 23 attribute this to policy developers taking a downstream approach, with a focus on prohibition and reducing accessibility, rather than upstream approaches, which focus on social determinants of health.
The aim of this research was to go beyond efficacy testing approaches, which typically produce moderate outcomes at best, often failing when replicated at scale, to consider the impact of broader health determinants and to understand interactions between these factors and program outcomes.

| Methodology
The evaluative synthesis consisted of four stages: Building the theoretical framework, formulating initial program theories, evidencing and adjudicating between program theories, and testing program theories. Each of these four phases is described briefly below, including literature searching techniques and stakeholder engagement activities.

| Building the theoretical framework
Early screening of the literature investigated the range of approaches employed to prevent adolescent risk behavior to map out the conceptual landscape. The theoretical framework considered how programs are supposed to work, detailing underpinning theory, theory of change, and outcome measures. Literature searching in this phase was conducted using intuitive search terms and focused on retrieval of empirical and theoretical literature (see Table 1).

| Formulating the initial program theories
Guidance on conducting realist research suggests that the starting point is the formulation of early program theories arising from demi-regularities or patterns in program outcomes. 5,41 These early program theories then become the focus of the enquiry, providing a framework for the examination and synthesis of data. 1,42 Set out as context-mechanism-outcome configurations, these early program theories are revisited throughout the enquiry as evidence is sought to develop, refine, adjudicate between, or refute them. 43 Formulation of the initial program theories was guided by the VICTORE complexity checklist 43,44 and supported through collection and interpretation of primary data from professional stakeholder interviews (n = 6). Stakeholders, recruited using purposive sampling and snowballing techniques, included those involved in the development and delivery of adolescent risk behavior prevention programs, such as researchers, teachers, community youth workers, Personal, Social, Health and Economic education (PSHE) leaders, and peer coordinators. This recruitment strategy was used in this phase to ensure that stakeholders' knowledge and experience aligned with the contextual and mechanistic factors we were seeking to understand.

| Evidencing and adjudicating between program theories
Here the review moves beyond the exploration of program implementation to identify weak points in the chain, seeking to elicit evidence relating to underpinning causal mechanisms and contextual factors which influence the degree to which those mechanisms are activated. This evidence is synthesized in order to refine, adjudicate between, or refute developing program theories.
Literature searching in this phase utilized reference and citation searches of key empirical papers captured within the theoretical framework to identify parent, sibling, and follow up papers and targeted searching of relevant editorials, systematic reviews, and discussion pieces to source additional evidence and to further understanding.
To assist us in understanding how specific elements of the program and/or certain contextual factors may impact on program success, primary data was collected from young people (n = 28) and school nurses (n = 22).
Five focus groups were conducted with young people, recruited using purposive sampling, to facilitate comparisons between groups, including male and female single gender groups, a mixed gender group, those with experience of involvement in a targeted health behavior program, and those from lower socioeconomic backgrounds to ensure maximum variation in the data collected. These focus groups explored young people's perceptions of health and risk behavior, relevant social and cultural factors, and engagement with risk behavior prevention programs.
Two focus groups were conducted with school nurses, recruited using opportunity sampling based on availability during the data collection period, to capture school nurses' knowledge, understanding, and experience of adolescent healthy lifestyle promotion, and risk reduction programs, and to consider how future policy and practice could be improved.

| Testing the program theories
To enable us to gather the opinions of as many young people as possible, a series of vignettes was designed to investigate key overarching themes emerging from program theories. Vignettes were used to provide a common context around which discussion may be shaped, reducing the need to rely on a personal frame of reference and allowing young people to talk openly, without judgment. 45 Youth leaders (n = 2) who had an existing, trusting relationship with the young people were recruited purposively as facilitators for dissemination, data collection, and discussion. This was designed to reduce the impact of perceived power imbalances between researcher and participant, and the risk of researcher bias, 46 increasing the likelihood that the data gathered would be as representative of young people's opinions as possible. This phase was designed to address gaps in knowledge which had not been addressed in the three previous stages.
While the methods of carrying out the evaluative synthesis are described here in four distinct phases for the sake of clarity, the synthesis process is much more iterative, cycling between empirical literature searching and data collection, and constant refinement of, adjudication between, and evidencing of emerging program theories. This is represented in Figure 1, which was developed from descriptions by Emmel. 47

| Key findings of the evaluative synthesis
Using a realist approach to analysis, in which evidence is sought and synthesized from the entire range of data sources, 24 program theories were developed across six broad themes: implementation fidelity; program design, content, and delivery; school ethos; family; broader social influences; and personal factors. Substantive theory was then sought to explain the pattern of findings observed within the data. As a result of this, three overarching areas for further consideration were identified: (a) relationships, (b) program ethos, quality, and behavior change, and (c) community, culture, and health inequalities.

| THE PRINCIPLES OF REALIST REVIEW
The purpose here is not to provide a comprehensive guide to conducting realist research, but to give a brief overview of the guiding principles of realist methodologies with a focus on carrying out a realist review.
Realism is a theory-driven methodological paradigm, rooted in philosophy, which sits between positivism and constructivism, 48,49 developed in response to the limitations of empirical science in explaining outcomes. In realist methodologies, outcomes are not seen as a direct result of the intervention or program being delivered, but as a result of activation of causal mechanisms, and the specific contexts in which they occur. 6 These combinations of context, mechanism, and outcomes, often referred to as program theories, provide a framework around which evidence can be sought to explain why observed outcomes, both intended and unintended, may be occurring. Building on this central tenet, realist research moves away from the empirical idea of exploring whether an intervention works, to ask what works, for whom, in what circumstances, how, and why. Pawson et al 1 postulate that the first step in conducting a realist review is to understand the nature of the programs or interventions being examined, in order to match methodology to the phenomena in question. To do this, Pawson et al set out several core underpinning principles, detailed below.
FIGURE 1 Zigzagging-realist synthesis data collection processes
They begin by suggesting that all programs are theories. Prevention programs are based on a hypothesis, which assumes that if a program provides a set of resources, manipulates key factors, or delivers services in a particular way, then it will bring about a predictable change in outcomes. In this way programs consider what factors contribute to the uptake and maintenance of the target behavior(s), then theorize about how these factors can be changed or manipulated to facilitate change. Improvements in outcomes, then, occur as a result of changes made to the social system into which the program is introduced.
Following this, Pawson et al 1 posit that programs are active, and that program effects are brought about through the involvement of human action. As a result of this, prevention strategies delivered within the program may be enacted and heeded, or they may be left out, forgotten, ignored, or overlooked in some way, or they may be rejected as unsuitable or overly paternalistic or moralistic by either those delivering or those receiving the program. This can lead to a range of issues in program evaluation and interpretation. Furthermore, knowledge of stakeholder reasoning, Pawson et al 1 state, is integral to understanding program outcomes.
An extension of this principle implies that program implementation chains are long and densely populated. 1 Programs begin in the minds of the developers, pass through management and those implementing the program, program deliverers, and hopefully finally into the hearts and minds of program recipients. At any of these points, programs are susceptible to misinterpretation or failure, leading to possible unintended outcomes. Reviews therefore should inspect the integrity of the implementation chain, investigating what needs to occur for program success, and where the blockages and contentions occur which act as a barrier to success. 1
Up to this point, programs are described as being populated by individuals, and activated through engagement with resources, reasoning behind engagement, and human volition in the choices that are made about health. 1 However, programs are not delivered into a vacuum, but within complex social systems which may shape the way in which they are understood, delivered, and received. Rarely is the same program equally effective when delivered in new or differing contexts. Even when the same strategies and resources are delivered, differences in the layers that make up the social context, such as commitment from management to accommodate the program, staff training, availability, and willingness to engage with the program, socioeconomic status of the area or community into which the program is introduced, and availability of local resources, could all change the way in which the program operates. These contextual complexities represent one of the greatest difficulties in empirical evaluation of prevention programs and are often a key focus in realist review.
The remainder of this paper covers the challenges experienced in the process of conducting this review, including the expansiveness and heterogeneity of the evidence base, challenges in identifying, extracting, and synthesizing evidence relating to processes within and surrounding the program, and the relationship between approaches to behavior change employed within the empirical literature and operationalization of the theoretical concepts which underpin those programs. Customization of the realist synthesis methodology, incorporating methods typically associated with realist evaluation, is discussed and its contribution to the broader methodological literature considered.

| Challenges encountered
Initial searching of the existing empirical literature produced a large and varied body of evidence. While programs were typically based on shared theoretical underpinnings, most commonly comprising constructs drawn from social learning theory 50 and the social development model of behavior change, 51 the approaches through which these theoretical underpinnings were operationalized differed significantly. This was apparent across a range of factors, such as research design, health risk behaviors targeted, agent for and method of delivery, timing and duration of the program, and age of target population. As described within the research protocol overview in the first phase of the review, to provide greater conceptual clarity, programs were grouped into six broad approaches or models based on descriptions provided within the literature. Interventions falling within these six broad domains share conceptual characteristics, such as behavioral change techniques or constructs on which the program was built, delivery methods, and key outcome measures. However, there remained key methodological differences between programs, even within these domains.
This heterogeneity in program design and implementation proved to be a key challenge in developing the initial theoretical framework, matching methodologies used to operationalize adolescent risk behavior prevention strategies, and in subsequent attempts to unpick the "black box" to understand why observed outcomes may be occurring, for whom, and in what contexts. As Pawson et al 1 state in defining the guiding principles of realist review, programs should be considered theories, whereby providing a particular set of resources, manipulating particular mechanisms or sets of mechanisms for change, or delivering services in a particular way should bring about predictable change. However, given the diversity of evidence captured within the literature, identification, extraction, and synthesis of evidence pertaining to key influential contextual and mechanistic factors and their impact on program outcomes proved difficult. Differences in approaches, such as who is delivering the program, the platform through which it is delivered, or the context in which it is delivered, make it problematic to understand what it is about that program that contributes to the changes observed in outcomes.
The purpose of underpinning a program with theory is threefold; to guide the targeting of the behaviors for change with the correct, evidenced, behavior change techniques, to allow for development or adaptation of the program in line with and guided by the underpinning theory, and in evaluation to inform not just what works, but how and why. 52 However, most programs designed to change or prevent adolescent risk behaviors tend to focus on individual capabilities, such as knowledge, skills, and motivation and often fail to consider the broader or deeper influences, such as interpersonal relationships, which can lead a program to succeed or fail. The poor application of theory, both in program design and evaluation, underpins another challenge faced within this review which led to the need to customize the methodology.
Finally, within the available empirical literature there tended to be very little difference in outcomes, regardless of approach, with many programs having a small to moderate effect (Cohen's d = 2.5) in comparison to no treatment or treatment as usual controls. 53 Attempts to explain these poor or unexpected outcomes within the published literature tended to fall back on discussions centered on fidelity and adherence, or lack thereof, to program protocol. While this may indeed be an influencing factor, overreliance on its explanatory power in relation to observed outcomes, both intended and unintended, runs the risk of ignoring the influence of other explanatory or causal factors, which are very rarely discussed in any detail, making the formulation of context-mechanism-outcome configurations difficult. This homogeneity, or lack of any significant difference in outcomes from one program to another, or indeed one approach to another, made it difficult to formulate hypotheses based on what works. With many programs producing a similar outcome, it was difficult to answer questions relating to what works best. As it was not possible to separate out differences in outcomes, and investigate the causal mechanisms and contextual factors which lead to those differences, this review became focused more heavily on exploring aspects relating to "for whom, in what circumstances, and why," while aiming to identify factors which may impact on program success or failure (outcome).

| CUSTOMIZING THE REALIST REVIEW METHODOLOGY
In order to overcome these challenges, two custom approaches were drawn on, alongside more typical realist synthesis methodologies, to source, extract, and synthesize the data in a meaningful and informative way. First, the VICTORE checklist was used to cut through complexity and to further unpick the guiding principles of realist research, providing a clear starting point for the formulation of early program theories. Following this, primary data was collected from stakeholders to allow us to begin to unpack the black box surrounding adolescent risk behavior prevention programs, and to inform theories relating to stakeholder reasoning or volitions and the contextual factors which influence participant decision making. Each of these novel adaptions is described in greater detail below.

| The VICTORE checklist
It is a core assumption of the realist approach that programs, such as those designed to prevent or reduce risk behaviors in adolescence, are complex interventions delivered into complex social systems. Pawson provides a checklist, more typically associated with realist evaluation, to aid in the identification of key aspects of complexity within a program. 43,44 This checklist, set out under the acronym VICTORE (Volition, Implementation, Contexts, Time, Outcomes, Rivalry, and Emergence), provides a tool by which all complex programs or interventions can be explored, allowing realist researchers to map a program, or family of programs, to identify areas where further exploration is needed. Each of these seven characteristics is defined below, followed by consideration of how we applied them in exploring and mapping complex issues in adolescent multiple risk behavior prevention programs.
Volition is typically defined as the way in which program participants engage with, and respond to, programs or program elements. Though rarely covered in depth within the empirical literature, consideration of the points at which participant reasoning or decision making may influence program outcomes is key to formulating early program theories and was a key contributing factor in the decision to include stakeholder consultation as primary data. It should be noted that the use of "participants" here refers not only to those receiving the program, but to all those involved with the program.
Implementation addresses the processes through which the program is operationalized, including mode of delivery, training and resources, dose and duration, and fidelity and adaption. Within the empirical literature, issues in implementation, most commonly fidelity to program protocol, were frequently cited as a reason for limited success, or indeed program failure. 54,55 However, none of the programs included within this evaluative review looked beyond this to ask how or why these issues were arising, or how they were impacting on outcomes.
Context, as described in the complexity checklist, can be understood on four levels: individual characteristics, interpersonal relationships, institutional settings (the rules, norms, and customs which surround the program), and infrastructure (the wider social, economic, and cultural settings in which the program is embedded). While some programs were designed to address some of these contextual factors, such as peer and familial relationships, 35,37 social norms, 29 school ethos, 56,57 and the exploration of adolescent use of free/leisure time, 38 very little consideration was given to the impact of these stratified and interacting contextual layers.
Time refers to the history and timing of an intervention and variation in these factors. The history of a program describes the learning which occurs through involvement with earlier iterations of implementation (such as a pilot), or other programs which may or may not have been successful. This learning leads to preformed expectations about an intervention which may impact on program outcomes. Drawing on definitions of adolescence and evidence relating to trends and patterns of risk behavior within the wider literature, this aspect of the complexity checklist facilitated interrogation of the empirical evidence in relation to the age appropriateness of interventions, as well as more common factors such as the historical development of the program.
Outcomes of complex intervention programs move away from the scientific approach of clearly defined variables and before-and-after measures, mapping a wide array of measures which monitor a range of outcomes at numerous levels or time points. As previously stated in setting out the challenges encountered within this review, adolescent risk behavior prevention programs tend to show little difference in outcomes, demonstrating small to moderate effect sizes in comparison to controls. Therefore, this characteristic was not as influential in understanding the evidence as it may typically be in more traditional realist evaluations.
Rivalry refers to the potential impact of other programs delivered in the field. Social programs are delivered into a world populated with other programs. The way in which programs follow, sit alongside, or even within other programs can have a significant impact on their success and can greatly add to difficulties in examining where effects are coming from. For example, learning may occur as a result of previous experience and be retained both by those delivering and receiving the program. While these experiences may impact on young people's future receptiveness to and engagement with programs of a similar nature, Pawson et al 1 suggest that the greatest impact may be on those delivering the program, impacting on attitudes towards and belief in the program (program buy in).
Emergence, the final characteristic of complexity set out in Pawson's checklist, is defined as the combining of program components to produce novel or unexpected outcomes, such that the systems under investigation continually evolve and adapt. Understanding complexity requires us to map these adaptions, societal changes, and unintended consequences, and note the impact on program effectiveness. This final characteristic proves more difficult to explore in realist synthesis, as emergent or novel adaptions, though often alluded to, are rarely reported in any detail in the literature.
While there was no new intervention or trial program to be evaluated, using realist methods of enquiry to go beyond what works, to explore for whom, in what circumstances, how, and why, is a novel approach to the investigation of adolescent risk behavior prevention. Building on the work of King et al, 58 utilization of the complexity checklist within this study helped us to take a step back from the programs themselves to develop a series of questions (see Box ) which allowed us to interrogate the data in a systematic way in order to develop a set of early program theories.
Using these seven characteristics, reviewing the evidence within the theoretical framework began with a unique case analysis of each of the six approaches identified, to explore potential contextual and mechanistic factors which may be specific to each approach. Following this, a cross case analysis was conducted, seeking to explore patterns in findings which may be common across some or all of the approaches. Through this process a set of early program theories was developed.
Evidence was then sought, using increasingly focused literature searching in combination with primary data collection, to develop, refine, adjudicate between, or refute theories. While the VICTORE checklist aided us in developing these early program theories, the information available within the published literature was rarely enough on its own to develop and refine these theories beyond this initial stage. To address this issue, stakeholder consultation was sought at each stage of the review process. Further to this, the data collected were incorporated as primary data to ensure transparency and to give greater strength to stakeholder voices. This process is described in greater detail below.

| Incorporating stakeholder voices
With growing interest in implementation science and knowledge exchange, stakeholder engagement is increasingly recognized as good practice in the development and testing of public health programs. 59 Close stakeholder engagement is recommended in conducting realist research; however, within a realist synthesis this typically takes the form of consultation. 1,42,60 Conducted in this way, evidence from stakeholder participation is used throughout the synthesis to develop, refine, adjudicate between or refute program theories and to validate emergent explanatory theories.
However, Goodman and Sanders Thompson 59 argue that this approach to stakeholder inclusion has limitations, particularly in relation to giving weight and power to stakeholder evidence through meaningful engagement and in demonstrating transparency in the presentation of research findings. Meaningful participation, Goodman and Sanders Thompson 59 state, goes beyond the informative capacity of asking stakeholders for advice, moving towards cooperation and collaboration in which stakeholder evidence has a direct and clear impact on research outcomes, and in which that impact is clearly demonstrated in any relevant outputs produced.
The purpose of realist synthesis, as previously discussed, is that of refining theory. 1 Beginning with the development of a theoretical framework, based on evidence from the existing empirical literature, the realist synthesis goes on to draw evidence from a range of sources including empirical studies, editorials, systematic reviews, discussion pieces and grey literature. Given that evidence can be drawn from such a broad range of sources to develop, refine, adjudicate between or refute program theories, we argue in this study that it is possible to include primary data from stakeholders within that data set, and that there are clear advantages to doing so, not only in relation to transparency and power of stakeholder voices as highlighted by Goodman and Sanders Thompson 59 but also in overcoming some of the challenges encountered.

BOX 1 MAPPING APPROACHES TO ADOLESCENT RISK BEHAVIOR PREVENTION
Volitions: How does decision making at various levels throughout the program (program management, senior leadership, program deliverers, learners) impact on implementation, engagement and program outcomes? What are the factors (external and internal) which influence decision making? How do stakeholders feel about the program?
Implementation: Have programs been implemented with high fidelity to the program protocol? How do changes or adaptions, such as changing those delivering the program, impact on program outcomes? What are the contextual or broader sociocultural factors which impact on program fidelity? How are changes or adaptions documented, evidenced, and accounted for in evaluation?
Context: How does the program fit with national/local/institutional policy? How does the program fit within current school ethos? How are the social and cultural needs of the participants considered within the program? What are the social and cultural determinants that impact young people/those involved with the program? What impact do these contextual factors have on uptake, engagement, implementation, and/or program outcomes?
Time: Do programs differ in the age and developmental stage of target learners? Do age or developmental stage at the time of delivery impact on program outcomes? Do programs differ in duration and dose? What are the factors which impact on program timing (workload, timetabling, sickness, etc.)?
Outcomes: What were the tangible outcomes of the program both positive and negative? Planned and unplanned? Were any changes in behavior or attitudes observed (intermediate outcomes)? What happens to program outcomes over time/in replication/at scale?
Rivalry: How does previous experience impact on attitudes or behaviors during implementation? Are learners receiving conflicting messages, for example, in school or at home? Is there conflict between program messages and lived experience of those involved in the program? How are conflicts resolved within the program? What impact does this have on the program?
Emergence: How do changes in policy or governance impact on the program? Are programs responsive to change? How are emergent outcomes captured and accounted for in evaluation?

For the sake of clarity, it is important to note here that stakeholders recruited within this research had not had direct experience of the programs captured within the theoretical framework. Empirical studies were historical, spanning from 1980 to 2016, across several countries; conducting a realist evaluation that drew on evidence directly from program participants was therefore not feasible. However, given the widespread universality of health promotion and health risk behavior prevention programs for young people within the United Kingdom, typically delivered in school as part of personal, social, health and economic education (PSHE) classes, it was possible to draw on the knowledge and experience of those who have had some involvement in such programs. As previously stated, the formulation of program theories is an iterative process in which we return to data sources again and again as program theories change and develop, seeking increasingly specific evidence to produce informed hypotheses about what works, for whom, in what circumstances and why. On this basis, stakeholder engagement was conducted throughout the research, becoming increasingly focused as mechanistic and contextual factors, and the relationships between them, were revealed.
Both interviews and focus groups, as detailed above, took a semi-structured approach to data collection, combining realist interviewing principles 61 with more typical qualitative interviewing techniques. 62 Participants were not asked directly about context-mechanism-outcome configurations but were asked questions which had been formulated to elicit evidence pertaining to certain contextual and mechanistic factors and/or the relationships between them.
As discussed, the VICTORE checklist 43 provided a means by which to begin to ask questions of the literature in a systematic way, in order to understand and draw comparisons between the large bodies of heterogeneous evidence captured within the theoretical framework. As the research progressed, moving away from the more tangible factors, such as program resources, implementation, and the immediate surrounding contexts into which programs were introduced, to understand how and why programs succeed or fail, it became increasingly difficult to source and extract evidence which could contribute to the continued development and refinement of program theories. Central to understanding the realist approach to research is the assertion that underpinning causal mechanisms are hidden, operate at different levels within the system, and are dependent on interactions between program components which may or may not be visible themselves. 63 Drawing on the work of Maidment et al, 64 engaging professional stakeholders in the initial program theory development stage enabled us to begin to unpick the role of human volition and decision making in program selection and delivery. Questions here covered topics such as who delivers the program, support and training typically received in carrying out the role, how national, local and institutional policies are accounted for in delivery plans, potential barriers to successful implementation, how sociocultural factors are accounted for within this type of program, and how any adaptions are documented. In addition to this, inclusion of stakeholders with professional experience of program design and delivery allowed us to consider how and why programs deviate from underpinning theory in operationalization, as well as any impact this may have on program outcomes.
School nurses and young people engaged in discussions focused on young people's health and wellbeing needs, what young people want from such programs, who they would want to deliver risk behavior prevention programs and the impact different program deliverers have on engagement, how and why participants may or may not engage with program resources, and the influence of contextual factors both within the program and outside of it.
As demonstrated in Figure 1, data analysis was an iterative process which involved frequently returning to the data set throughout analysis in order to develop, refine or refute program theories based on the evidence collected. Within this process, evidence from stakeholders was synthesized in two ways. The first used existing program theories as a priori themes, purposefully seeking to elicit evidence relating to mechanistic and contextual factors in order to refine or refute those theories. The second strategy used approaches more typically associated with thematic analysis of qualitative data, constructing new themes based on evidence arising from wider discussions about adolescent health risk behavior and risk behavior prevention. Here the evidence was used to develop new tentative program theories, which in turn led to further interrogation of the literature and, where necessary, additional literature searching. This zigzagging between data sets continued until a clearly defined and well evidenced set of program theories had been developed which explained what works, for whom, in what circumstances and why in the prevention of multiple risk behaviors in adolescents.
Key program theories were then presented to young people as a series of vignettes, either to check a whole theory, or specific elements contained within a theory where further clarity was needed. To reduce discomfort for young people, and to ensure responses were as open and honest as possible, this stage was facilitated by youth group leaders and was therefore not recorded as primary data. This stage, along with the identification and application of substantiating theory in the formulation of more widely generalizable middle range theory, will be discussed in more detail in a follow-up paper.

| DISCUSSION
Public health programs are by nature complex, and realist methodologies have recently asserted themselves as ideally placed to study and explain such complexity through the medium of realist program theories. 1 Others in the literature have reported the challenges of sourcing the necessary data to undertake a thorough realist analysis, particularly when the evidence base is heterogeneous or the intervention ill-defined. 65 This was echoed in our experience, particularly in relation to sourcing evidence pertaining to deeper level mechanisms and/or more distal contextual factors. In this study, we have used our work on programs designed to prevent or reduce multiple risk behaviors in adolescence to exemplify the customization of the realist review processes in what we have termed a "realist evaluative synthesis" to reflect the blurring of boundaries between using primary and secondary data in realist work. Reflecting on the processes involved in conducting the research, we have considered the challenges encountered, and the practical steps taken to help us overcome those challenges. We explain how drawing directly on the VICTORE complexity checklist to aid understanding and incorporating stakeholder engagement as primary data were two key tools to operationalize this. This process enabled us to gain meaningful understanding of the mechanistic and contextual factors which contribute to program success or failure, and to ensure transparency in the development of program theories.
In particular, the VICTORE complexity checklist 44 provided a tool through which we were able to systematically interrogate the empirical literature, which was both expansive and heterogeneous, where differences both within and between methodological approaches made identification of commonly occurring themes difficult. While the use of the tool has been recommended in realist work, 44 its explicit and detailed use has seldom been reported in the literature. 58,66 Similarly to the challenges reported here, Rogers et al, 66 in their realist review exploring community accountability and empowerment initiatives, found that there was a discrepancy between underpinning theory on which programs were based, operationalization of those theoretical underpinnings, and reporting of the processes involved. Seeking to identify and understand potential causal pathways in this under-theorized area, Rogers et al drew on the VICTORE checklist 43,44 to cut through complexity both within and surrounding the program in question. Here, the checklist was used to systematically identify mechanistic and contextual factors of note to aid in the development and refinement of program theories, though the process undertaken to do this is not detailed.
Facing a slightly different challenge to those reported here, in conducting a realist evaluation of educational technology (Edtech), King et al 58 drew on the VICTORE complexity checklist 44 as a means of sourcing and extracting evidence from a broader literature base, where there was a dearth of empirical literature which was directly relevant. Here the researchers developed a series of questions addressing potential areas of investigation within the wider literature to identify mechanistic and contextual factors of interest.
As with the research described above, drawing on the complexity checklist within our own research aided us in making sense of a large and heterogeneous data set where relationships between underpinning theory and program implementation lacked clarity or description of processes undertaken. Building on these examples, we argue that regardless of whether the research undertaken is a realist review or a realist evaluation, the VICTORE checklist provides a useful tool through which empirical literature may be systematically interrogated, particularly where there have been challenges in conducting a review using more typical methods.
Our second key contribution relates to the consideration of stakeholder engagement as primary data collection. This was key to the development of meaningful and relevant program theories, aiding in both the elicitation of key contextual and mechanistic factors, and in understanding the complex relationships between them. Mindful of the challenges we faced in eliciting data from a large and heterogeneous data set, we drew on the work of Maidment et al, 64 incorporating primary data from stakeholders as a means to develop and refine program theories which remained unclear based on evidence drawn from the literature. Going a step further to combine this with the use of the complexity checklist further aided us in following up on questions left unanswered by the literature, adding greater ontological depth to our final set of refined program theories.
Further to this, inclusion of stakeholders as primary data was vital in maintaining transparency in program theory development, as we were able to include direct quotes to evidence the process. Inclusion of quotes from stakeholders ensured that stakeholder voices were well represented within program theories, acknowledging the importance of their views in understanding what works, for whom, in what circumstances and why, and reducing potential for researcher bias in evidencing and refining program theories. Empowering stakeholder voices is particularly important when it comes to those of young people, who are typically underrepresented within a literature base which often lacks the depth of exploration required to explore generative causal factors. 4 Including stakeholders in this way required full ethical approval, including informed consent both from stakeholders and, where stakeholders were under 16 years of age, from caregivers, which added a level of administration often not required in realist syntheses. However, there were clear benefits to this, not least the freedom to recruit a more diverse stakeholder group including young people, and the potential to use a range of data collection methods, providing us with a greater breadth and depth of data than we could have achieved through consultation.
As stated by Jagosh et al, 7 while quality standards and guidelines for realist synthesis provide valuable insight into "how to" undertake such work, it is difficult to know before engaging in the review what methodology, or indeed what adaptions to the methodology, may be needed. In this article, we demonstrate how realist approaches are sufficiently flexible to enable a blurring of boundaries between what is traditionally considered secondary data (the sole focus of literature synthesis methods) and what is considered primary data (the collection of which is the sole purpose of primary research). By taking a realist logic of analysis, guided by the VICTORE complexity checklist, we demonstrate how, in this project, the source of the data mattered less than what they brought to the process of theory development, refinement and testing. In this, we are explicitly building on key methodological developments in realist research, such as the work of Westhorp 3 on using complexity consistent theory in realist work, and that of Jagosh 4 on developing ontologically deep understanding of how public health programs work.