Citizen engagement in public services in low‐ and middle‐income countries: A mixed‐methods systematic review of participation, inclusion, transparency and accountability (PITA) initiatives

Background How do governance interventions that engage citizens in public service delivery planning, management and oversight impact the quality of and access to services and citizens’ quality of life? This systematic review examined high‐quality evidence from 35 citizen engagement programmes in low‐ and middle‐income countries that promote the engagement of citizens in service delivery through four routes: participation (participatory priority setting); inclusion of marginalised groups; transparency (information on rights and public service performance); and/or citizen efforts to ensure public service accountability (citizen feedback and monitoring); collectively, PITA mechanisms. We collected quantitative and qualitative data from the included studies and used statistical meta‐analysis and realist‐informed framework synthesis to analyse the findings.
Results The findings suggest that interventions promoting citizen engagement by improving direct engagement between service users and service providers are often effective in stimulating active citizen engagement in service delivery and realising improvements in access to services and quality of service provision, particularly for services that involve direct interaction between citizens and providers. However, in the absence of complementary interventions to address bottlenecks around service provider supply chains and service use, citizen engagement interventions alone may not improve key wellbeing outcomes for target communities or state‐society relations. In addition, interventions promoting citizen engagement by increasing citizen pressures on politicians to hold providers to account are not usually able to influence service delivery.
Conclusions The citizen engagement interventions studied were more likely to be successful: (1) where the programme targeted a service that citizens access directly from front‐line staff, such as healthcare, as opposed to services accessed independently of service provider staff, such as roads; (2) where implementers were able to generate active support and buy‐in for the intervention from both citizens and front‐line public service staff and officials; and (3) where the implementation approach drew on and/or stimulated local capacity for collective action. From a research perspective, the review found few studies that investigated the impact of these interventions on women or other vulnerable groups within communities, and that rigorous impact evaluations often lack adequately transparent reporting, particularly of information on what interventions actually did and how conditions compared to those in comparison communities.

1 | PLAIN LANGUAGE SUMMARY

1.1 | Citizen engagement improves access to public services in low- and middle-income countries, but evidence on development outcomes is limited
Interventions promoting citizen engagement in public service management involve participation, inclusion, transparency and accountability (PITA) mechanisms. In low- and middle-income countries (LMICs), these interventions are effective in improving active citizenship and service delivery, and may improve the responsiveness of service provider staff for services provided directly by public servants (for example, in health).
In contrast, interventions providing information to stimulate pressure on politicians are not usually effective in improving provider response or service delivery. There is insufficient evidence to conclude whether these interventions are effective in improving wellbeing or the relationship between citizens and the state.

1.2 | What is this review about?
Failures in governance lead to the exclusion of large portions of society from public services and to waste, fraud and corruption. This review assesses evidence for interventions promoting better governance of public services: participation (participatory planning), inclusion (involvement of marginalised groups), transparency (information about citizen rights or performance of public officials), and accountability (citizen feedback) mechanisms, known collectively as PITA mechanisms.

1.3 | What is the aim of this review?
This Campbell systematic review examines the effects of interventions to promote citizen engagement in public service management.
The review synthesises evidence from 35 impact evaluations and 36 related studies of interventions promoting participation, inclusion, transparency and accountability (PITA) mechanisms.

1.4 | What studies are included?
The review includes impact evaluations relating to 35 PITA programmes from 20 LMICs. In addition, 36 qualitative and programmatic documents were included to strengthen understanding of implementation context and programme mechanisms.
1.5 | What are the main findings of this review?
Citizen engagement interventions (i) are usually effective in improving intermediate user engagement outcomes, for example, meeting attendance and contributions to community funds; (ii) improve access to and quality of services but not service use outcomes; (iii) can lead to improvements in some wellbeing outcomes such as health and productive outcomes; (iv) may improve tax collection; but (v) do not usually lead to changes in provider action outcomes such as public spending, staff motivation and corruption. There may be an exception where there is direct interaction between citizens and service providers in the regular delivery of services. Interventions providing performance information do not generally improve access or lead to improvements in service quality.
Only interventions focused on services delivered by front-line staff (e.g., in health) achieve positive outcomes. Those delivered without public interaction (e.g., roads) do not. However, engagement with civil society organisations and interest groups may lead to better outcomes for services accessed independently of providers.
Inclusive citizen engagement programmes have at least as big an effect on user engagement and access to services as less inclusive approaches.
Many interventions experienced challenges stemming from a lack of positive engagement with supply-side actors, whose power the interventions often sought to diminish. Interventions implemented with the strong support of the targeted service providers were better able to realise positive impacts.
Approaches to citizen-service provider engagement appear to work more effectively when implemented through phased, facilitated collaborative processes rather than one-off accountability meetings that are seen as confrontational.
Only four studies present any data on intervention costs. This limited the potential for any analysis of comparisons across programmes and settings.
In interpreting the findings, it must be noted that each individual outcome is reported in only a few studies and that included studies have important methodological weaknesses with risks of bias arising from weak design, analysis and reporting.
1.6 | What do the findings of this review mean?

1.6.1 | For policy and programme managers
A collaborative rather than confrontational approach with the service providers whose services are under scrutiny is more likely to be effective. Engaging communities may require using civil society organisations to facilitate the community's participation. Programme design should ensure positive engagement with supply-side actors within the intervention setting.

1.6.2 | For researchers
More high-quality studies are needed, comparing different approaches to improving service delivery, paying attention to complete description of the different approaches being compared. Since implementation is a crucial factor, mixed methods studies should be the norm, and will help focus on equity considerations which have between the state and service providers; and between citizens and service providers.
In this review, we examined interventions that strengthened governance through the "short route" between citizens and service providers, and interventions that improved governance by shortening the "long route" by providing information about the performance of elected officials to improve service provision. Following an Evidence Gap Map study on interventions to promote State-Society Relations (Phillips et al., 2017), the Centre of Excellence on Democracy, Human Rights and Governance (DRG) at USAID commissioned this systematic review to answer the question, "to what extent are programmes that incorporate PITA characteristics into their design effective, as compared to otherwise similar programmes that do not?"

2.2 | Objectives
This systematic review includes projects and programmes that aim to change the ways citizens engage in the planning, running and oversight of public services, and investigates the subsequent impact of these efforts on the quality of and access to public services, and ultimately on people's quality of life and satisfaction with the State.
The review applied an innovative approach that sought to understand the mechanisms and processes through which change happens, and to systematically identify the key factors that influence whether an intervention may be effective in a given context.
The review aimed to answer the following five questions:
1. What are the effects of interventions that aim to strengthen PITA mechanisms on social and economic wellbeing of participants (intermediate and final outcomes)?
2. What are the effects of interventions that aim to strengthen PITA mechanisms on participatory, inclusive, transparent or accountable processes (immediate outcomes)?
3. To what extent do effects vary by population group and location?
4. What factors relating to programme design, implementation, context, and mechanism are associated with better or worse outcomes along the causal chain?
5. What evidence is available on programme costs and incremental cost effectiveness in included studies of effects?

2.3 | Search methods
The systematic review was carried out according to a protocol that was peer reviewed and published in the Campbell Collaboration library. To identify all potential relevant published and unpublished evaluations to include in the review, the authors carried out a systematic search of key academic databases, donor and practitioner websites, including potential results in all languages, and from any low- or middle-income country, drawing also on an evidence gap map on state-society relations (Phillips et al., 2017). The searches were carried out between February and April 2018.

2.4 | Selection criteria
To identify the direct contribution of interventions promoting citizen engagement on service delivery improvements, the review included evaluations in low- and middle-income countries (L&MICs) that compared the impact on service delivery access and quality in participating communities against similar communities where citizens received "standard public services," which did not have access to the same opportunities or support for citizen engagement in the planning or oversight of those services. The review included quantitative causal studies (randomised and non-randomised impact evaluations) and also drew on information about mechanisms from the programmes contained in the included impact evaluations (e.g. programme documents, qualitative studies).

2.5 | Data collection and analysis
Authors conducted a detailed critical appraisal (risk of bias) and external validity assessment of the included studies, to assess the credibility of the findings. To answer Review Questions 1-3, the authors extracted effect size data measuring the change in outcomes in consistent units from each included impact evaluation, and used statistical meta-analysis to synthesise the findings. To structure the meta-analysis, authors used a conceptual model for outcomes along the results chain relating to immediate outcomes (citizen and service provider engagement), intermediate outcomes (access to services and service uptake), and final outcomes (citizen welfare and state-society relations).
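To illustrate what pooling effect sizes "in consistent units" can look like, the following is a minimal sketch of an inverse-variance random-effects meta-analysis using the DerSimonian-Laird estimator. This summary does not state which pooling model the review used, so the choice of estimator is an assumption, and the effect sizes and standard errors below are hypothetical values, not data from the included studies.

```python
import math

def pool_random_effects(effects, std_errors):
    """Pool standardised effect sizes with a DerSimonian-Laird random-effects model.

    Returns (pooled_effect, (ci_lower, ci_upper), tau_squared).
    The estimator is an illustrative assumption, not the review's stated method.
    """
    # Inverse-variance (fixed-effect) weights
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures heterogeneity around the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's sampling variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, ci, tau2

# Hypothetical standardised mean differences and standard errors
hypothetical_effects = [0.05, 0.12, 0.20, 0.08]
hypothetical_ses = [0.04, 0.05, 0.08, 0.03]

pooled, ci, tau2 = pool_random_effects(hypothetical_effects, hypothetical_ses)
print(f"pooled = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), tau^2 = {tau2:.4f}")
```

Studies with smaller standard errors receive more weight, and a larger between-study variance pulls the weights closer to equal across studies.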
To answer Review Question 4, a framework synthesis of all included studies plus supplemental qualitative and programmatic documents was conducted, to systematically identify the key barriers, facilitators and moderating factors that could explain why an intervention was more likely to achieve its expected results in a given context. Authors identified five intervention groups for the analysis: interventions promoting rights information, performance information, participatory planning, community feedback mechanisms and community-based natural resource management. Finally, evidence on costs from the included impact evaluations and supplemental documentation was collected to answer Review Question 5.

2.6 | Results
The search returned over 10,000 papers. From these, the authors identified 50 impact evaluation reports, corresponding to 35 programmes, that met the inclusion criteria, alongside an additional 11 ongoing studies. For the 35 programmes identified, authors undertook a targeted search and identified 36 qualitative and programmatic documents that were used to strengthen understanding of the context and implementation of the programmes.
Authors identified five specific intervention types across the 35 programmes.
Sixteen citizen engagement programmes evaluated citizen participation in the design and implementation of public services, grouped into two intervention sub-groups:
• nine participatory priority setting, planning or budgeting interventions, wherein citizens participated in setting the priorities for and/or planning of local services. These include support for participatory budgeting in municipal governments in Brazil, Mexico and Russia, and support for participatory planning in India, Pakistan, Guinea and Kenya. They also include requirements for inclusive participation in two fragile contexts, Afghanistan and DRC.
• seven community-based natural resource management (CBNRM) interventions, wherein citizens form local collectives and take over the management of a shared resource, for forest management in Nepal, Madagascar and Tanzania, and water user associations in Brazil, China and the Philippines, and Namibia.
Eleven citizen engagement programmes evaluated transparency mechanisms, which specifically aimed to disclose and/or disseminate information that would shift the power balance between service providers and users, comprising two intervention sub-groups:
• five evaluations of rights information interventions, which enable users to demand minimum standards for access to services, such as for social protection services in Indonesia (food subsidies) and India (public works), maternal and child health care in India and freedom of information in Pakistan.
• six evaluations of public official or service provider performance information interventions, such as the dissemination of municipal government performance scorecards in Afghanistan, Brazil, the Philippines and Uganda, and monitoring information provided in police stations in India.
Ten evaluations of accountability mechanisms were included, which specifically comprised citizen feedback or monitoring mechanism interventions, i.e. those that solicited feedback regarding and/or actively engaged citizens in the monitoring of service delivery, to hold public service providers and institutions respon-

2.6.1 | Review question 1
Authors found that, on average, citizen engagement interventions improved access to and quality of services compared to standard service delivery, with a pooled effect size of 0.10 standard deviations (95% confidence interval = 0.04, 0.16).
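As a worked piece of arithmetic, the reported confidence interval can be used to back out an approximate standard error and z-statistic for the pooled effect. The sketch below assumes the interval is a symmetric normal-approximation interval (critical value 1.96); the review does not report these derived quantities, so they are illustrative only.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error implied by a symmetric normal-theory CI."""
    return (upper - lower) / (2 * z_crit)

def two_sided_p(z):
    """Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))

# Values reported in the text: 0.10 SD, 95% CI = 0.04, 0.16
pooled_effect = 0.10
se = se_from_ci(0.04, 0.16)   # roughly 0.031
z = pooled_effect / se        # roughly 3.27
p = two_sided_p(z)            # roughly 0.001 under the normality assumption

print(f"SE = {se:.4f}, z = {z:.2f}, p = {p:.4f}")
```

Because the interval excludes zero, the implied z-statistic exceeds 1.96, which is consistent with the text describing an average improvement.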
Outcomes tended to be of similar magnitude when service access was measured in physical terms or quality of service. Only outcomes relating to reducing staff absenteeism and embezzlement were not systematically different for citizen engagement interventions, as compared to standard public service delivery. When disaggregating by intervention sub-groups, the pooled effects for rights information and citizen feedback mechanisms were of similar magnitude.
However, interventions providing performance information about public officials or service providers did not tend to lead to changes in access to services or improvements in service quality.
Turning to the rest of the causal chain, the results indicated that citizen engagement interventions incorporating PITA mechanisms did not systematically improve service use, whether measured as use of health services (e.g. immunisation, antenatal care), social protection services (employment services), or attitudes to services (user satisfaction and complaints). These findings were consistent for intervention sub-groups (participatory priority setting, CBNRM, performance information, rights information, and citizen feedback mechanisms).
The findings also indicated that citizen engagement interventions can lead to improvements in some wellbeing outcomes, where wellbeing is measured using health outcomes (morbidity, mortality, nutrition) or productive outcomes (agricultural yields, income/expenditure, asset ownership). However, these overall changes tended to be small in magnitude (around 0.10 standard deviations increase in the outcome) and were not observable consistently across all outcomes analysed. The outcomes measured were diverse, and sample sizes for each outcome were small, hence it is not possible to draw strong conclusions. In addition, interventions providing performance information about public officials or service provision did not increase wellbeing outcomes.
Outcome measures of state-society relations included public confidence in institutions, institutional sustainability and taxes paid. Some study results suggested citizen engagement interventions may improve tax collection. There were no improvements for the other state-society relations outcomes (corruption perceptions or confidence in institutions), although only two studies were identified for each. It is therefore not possible to draw strong conclusions.

2.6.2 | Review question 2
In order to examine effects on immediate outcomes, service user and service provider engagement were analysed separately. The main finding from the analysis of user engagement outcomes is that interventions incorporating PITA mechanisms are usually effective in engaging service users, for example by improving meeting attendance, contributions to community funds and general knowledge about services. The average pooled effect on user engagement was an increase of 0.23 standard deviations across all outcome measures.
When the findings were disaggregated by citizen engagement intervention, the results indicated that pooled effect magnitudes were similar and statistically significant for some intervention sub-groups (participatory priority setting, CBNRM, rights information, citizen feedback mechanisms). Hence, in general, this review found that interventions usually lead to improved citizen engagement outcomes, as compared to standard public service delivery.
However, the effects of interventions promoting citizen engagement on provider action outcomes are very limited. Overall, provider responsiveness to the interventions tended to be small: the pooled effect was small and not statistically significant. No significant pooled effects were found for specific outcome measures such as public spending, staff motivation, corruption or responses as perceived by service users. Nor were there significant pooled effects for intervention sub-groups (participatory priority setting, CBNRM, performance information, rights information, or citizen feedback mechanisms). In sum, this review found that interventions promoting citizen engagement in public services do not usually trigger service provider actions.

2.6.3 | Review question 3
Diversity and equity of impacts differ across population groups in three ways. Overall, few of the studies reported disaggregated intervention approaches and/or analysis of results for different population groups. Nine programmes incorporated specific measures within the intervention to extend the engagement to vulnerable groups. These inclusive citizen engagement programmes tended to have as big or bigger effects on user engagement and access to services as other citizen engagement programmes. Across the whole pool of included studies, 12 conducted sub-group analysis to differentiate impacts for different population groups, most commonly by socio-economic status and by sex of participant, yet these were spread widely across intervention type and geography. This review identified only one mixed-methods study that conducted equity-oriented causal chain analysis to differentiate impacts for women. Analysis by global region was not able to find consistent differences by intervention or outcomes along the results chain. Ultimately, due to the small sample of studies across a wide range of interventions and outcomes, it was difficult to conclude anything systematically for different population or geographic groups.

2.6.4 | Review question 4
participatory process, particularly where communities self-selected into the intervention may encourage more active and effective engagement. Second, the incorporation of specific, culturally appropriate measures that address local barriers to the participation of vulnerable groups may be key to ensuring that decisions taken reflect pro-poor values and outcomes. Finally, participatory planning processes that engaged and/or stimulated the growth of local civil society and capacity for collective action may be more sustainable and more likely to achieve long-term results.
Four contextual factors were identified that mediated results chains amongst community-based natural resource management (CBNRM) interventions. Where interventions required large shifts in control over the resource, representing a relinquishment of power from local officials to community groups, lack of engagement and buy-in from local officials was a frequent barrier to the full implementation of the CBNRM policy. Critically, this barrier often resulted in situations in which community groups took on additional responsibilities for resource management, but did not gain access to the corresponding promised benefits. A related factor is the clarity of the national CBNRM policy context; where there were multiple vague and overlapping policies governing natural resource use, officials were more able to adjust or block full implementation of CBNRM in a way that preserved their power and control over resource benefit access. External support to change resource use was a key facilitating factor: even in the absence of full policy implementation, access to alternative livelihoods such as tourism may still enable communities to realise the joint socio-economic and environmental objectives of CBNRM. Finally, the type and intensity of local resource use were key moderating factors influencing the effectiveness of CBNRM; community management may not be appropriate in contexts prone to illegal logging or poaching, where attempts to enforce regulations may endanger community members.
Across citizen feedback and monitoring mechanism interventions, four key facilitating factors were identified. First, there was a strong distinction between projects targeting services delivered directly to citizens by front-line providers, such as healthcare, versus those that citizens access independently of staff who implement and manage the services, such as infrastructure. The social sanction threat of individual citizens' voices was not strong enough to spur improvements amongst providers who do not interact with users on a regular basis. Building on the criticality of sustained, direct engagement, the findings suggest that interventions that took a phased, facilitated approach to engaging citizens and service providers jointly in the monitoring of service delivery may be able to trigger intrinsic motivation within service providers and create a sense of working towards a common goal, which may be more effective compared to more confrontational town-hall-style meetings. The incorporation of performance benchmarks was also a frequent facilitating factor to enable community monitors to identify realistic opportunities for local improvements in service delivery. Finally, ensuring the creation of common knowledge around feedback or monitoring results, and working through local community organisations were identified as further facilitating factors that strengthened the weight of citizens' voices and their power to hold service providers to account.
Rights information interventions were more likely to be successful where they targeted the provision of services that citizens access through interactions with service provider staff; created a sense of common knowledge about people's rights to the service amongst both citizens and providers; and created an appropriate level of social sanction risk for providers. An initial critical factor is whether citizens' lack of knowledge of their rights was the key barrier preventing them from accessing services, as opposed to an issue on the "supply" side of service delivery, such as a lack of capacity amongst service providers to deliver the service. Because rights information interventions rarely engage with service providers, even where service use may change as citizens effectively bargain for access, improvements in service delivery quality are unlikely.
Finally, amongst performance information interventions, a key facilitating factor was the extent to which implementers secured the support of and buy-in from the individuals whose performance was being analysed and disseminated. Without such support, the findings suggest that the targeted individuals may be able to avoid accountability by either preventing full implementation of the intervention, or by successfully undermining the credibility of the performance information disseminated. Most of these interventions targeted political actors' performance (as opposed to sector-specific public services), in attempts to "shorten the long route" of citizen-state accountability by increasing citizen engagement with politicians outside of elections. While interventions were at times successful in eliciting some improvements in politician performance, the findings suggest that, ultimately, this route remains too long to identify short-term effects on service delivery. Politicians may claim plausible deniability of their individual capacity to influence service delivery change, and such interventions do not engage many key actors involved along the public service delivery supply chain.
A key factor influencing progression along the causal chains for accountability and transparency-for-accountability interventions in the framework synthesis was found to be whether interventions targeted public services that were delivered to citizens directly by front-line providers, typically merit good services such as healthcare, versus those that targeted purely public good services delivered indirectly to citizens, such as roads. Disaggregating the meta-analysis amongst accountability interventions targeting merit versus pure public good services suggested that citizen engagement improved across all services. However, interventions targeting directly delivered merit-good services were better able to elicit positive responses amongst service providers, a difference which appeared to trigger a break in the causal chain for interventions targeting indirectly-delivered, pure public good services. The findings showed positive effects for outcomes of service quality and access amongst directly delivered services, but insignificant findings on outcomes amongst interventions targeting indirectly delivered pure public goods.
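The disaggregation described above compares pooled effects between two subgroups of interventions (directly delivered merit-good services versus indirectly delivered pure public goods). One common way to check whether two subgroup estimates differ is a z-test on their difference, sketched below. This assumes the two subgroup estimates are independent and approximately normal; the review does not state that it used this test, and the example numbers are hypothetical.

```python
import math

def subgroup_difference(est_a, se_a, est_b, se_b):
    """z-test for the difference between two independent pooled estimates.

    Returns (difference, z_statistic, two_sided_p_value).
    """
    diff = est_a - est_b
    # Variances add for the difference of independent estimates
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))
    return diff, z, p

# Hypothetical pooled effects (in standard deviations) for two subgroups
diff, z, p = subgroup_difference(est_a=0.15, se_a=0.04, est_b=0.02, se_b=0.05)
print(f"difference = {diff:.2f}, z = {z:.2f}, p = {p:.3f}")
```

A small p-value would suggest that the subgroup effects differ by more than sampling error alone would explain, although with few studies per subgroup such a test has limited power.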

2.6.5 | Review question 5
Cost effectiveness is a key question for decision making, yet it is rarely incorporated into impact evaluations. Only four studies presented any data on intervention costs, usually at a highly aggregated level (e.g. total cost of intervention), and only the study of report cards in health in Uganda presented cost effectiveness information (cost per under-5 death averted). This limited the potential for any analysis of comparisons across programmes and settings.
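For readers unfamiliar with the measures named in Review Question 5, the sketch below shows how a cost-per-outcome figure and an incremental cost-effectiveness ratio are conventionally computed. The function names and all numbers are hypothetical illustrations; none of them are taken from the Uganda study or any other included evaluation.

```python
def cost_per_outcome(total_cost, outcomes_achieved):
    """Average cost per unit of outcome, e.g. cost per death averted."""
    if outcomes_achieved <= 0:
        raise ValueError("outcomes_achieved must be positive")
    return total_cost / outcomes_achieved

def incremental_cost_effectiveness(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("the two options have identical effects")
    return (cost_new - cost_old) / delta_effect

# Hypothetical programme: 100,000 currency units spent, 50 deaths averted
print(cost_per_outcome(100000.0, 50))

# Hypothetical comparison of an intervention against standard service delivery
print(incremental_cost_effectiveness(150000.0, 100000.0, 80, 60))
```

Both measures require cost data disaggregated enough to attribute spending to the intervention, which is the information the text notes is usually missing.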
2.7 | Authors' conclusions

2.7.1 | Implications for policy makers and practitioners
The findings suggest interventions to improve governance through citizen engagement in public services may be effective in stimulating active citizen involvement and improving access to and the quality of public services. Sustained, direct engagement between citizens and service provider staff appears to be key: the biggest effects were seen for interventions that targeted public service governance through the "short route" of direct engagement between citizens and service providers, and which targeted services that citizens access directly from front-line providers, typically merit goods such as healthcare.
However, citizen engagement interventions alone do not typically improve use of services and may also not lead to better wellbeing outcomes for citizens or state-society relations. The authors hypothesise that this may be due to the absence of complementary interventions to address bottlenecks over which citizens can have limited access, such as service provider budgets and supply chains or technical capacities.
Citizen engagement interventions were less successful where there was less engagement between citizens and front-line providers. This occurred where interventions aimed to shorten the "long route" of improving governance by increasing citizen pressures on politicians to improve public services through politician performance information, or where interventions targeted services such as infrastructure, which citizens access independently of service providers.
Interventions that work through local civil society groups and stimulate capacity for collective action, particularly amongst vulnerable groups, may be more effective than those that rely on engaging unorganised citizens. This is particularly critical for public good services such as infrastructure, wherein citizens must overcome the collective action issue.
Interventions that obtain and sustain buy-in from local public service providers at the point of citizen engagement may be more effective at creating appropriate threats of social sanctions and stimulating intrinsic motivation to stimulate behaviour changes amongst service providers to improve service delivery quality. This is particularly critical in CBNRM, to ensure interventions do not do unintentional harm by increasing the burden on communities for resource management without enabling them to realise full access to the benefits in return.
Interventions that do not incorporate specific measures to facilitate the inclusion of vulnerable groups may not realise equitable outcomes for those groups in the short-term. Barriers to vulnerable groups' inclusion vary widely by context, and inclusion components should be adapted in response to local contexts and needs.

2.7.2 | Implications for research
Impact evaluations need to "open intervention black boxes" by being more transparent in reporting of intervention design and implementation fidelity, as standard. This also includes clearer reporting of the comparison conditions received by groups outside of the intervention. Authors may draw on frameworks for intervention reporting guidelines, such as TIDieR in health sector research. Impact studies also need to engage more consistently with equity issues, either by evaluating intervention components specifically targeting equity, collecting outcomes relevant for certain vulnerable groups, or at the very least reporting outcomes by subgroup for vulnerable groups. Most studies collected outcomes data shortly after intervention, usually within 5 years. There may be opportunities to examine outcomes over longer periods cost-effectively, for example by conducting more follow-up studies of existing trials, or by conducting ex post evaluations using natural experiments.
In this review, the authors used theory-based mixed-methods approaches to examine a wide range of interventions promoting citizen engagement in public services governance, taking the PITA mechanism as the unit of analysis. Further synthesis research adopting this broader approach may focus on interventions to improve other domains of governance (e.g. the compact between state and service provider), combinations of domains (e.g. citizen engagement plus compact), or by comparing citizen engagement with other approaches to increase state capacity such as through better monitoring of public service delivery agents. The authors also note that systematic reviews usually focus on the effectiveness of particular interventions, and new systematic reviews of specific interventions (e.g. participatory budgeting, water user associations) and updates of existing reviews (e.g. community monitoring, education sector govnernance) are needed. 3 | BACKGROUND 3.1 | The problem: unaccountable government systems and poor service delivery inclusive, participatory and representative decision-making at all levels (UNDP, 2016). In the Paris Declaration on Aid Effectiveness, donor and partner countries committed to improving their mutual accountability and transparency in the use of development resources, with partner countries further committing to systematically involve diverse stakeholders in national development priority setting processes (OECD/DAC, 2005). Many development challenges, such as poor service delivery, corruption and slow growth, persist because of the political context around them; they are as much about power dynamics as they are technical challenges.
Improving the governance of public institutions and service delivery has long been a central tenet of strategies for achieving or supporting development; World Bank World Development Reports since the late 1990s have included elements of improving governance as central to their theories of change (Grindle, 2004). In the decades since, mainstream approaches to realising good governance have shifted in focus, away from privatisation of service delivery and towards increasing the engagement of constituents, particularly vulnerable groups, with public institutions and service providers in ways that increase the effectiveness, appropriateness, and quality of service delivery. The 2004 World Development Report (WDR) highlighted the insight that public spending on service delivery in developing countries often primarily reached the better-off minority of citizens; for example, in India, curative health subsidies were primarily going to the richest 20 per cent of the population, who received three times the subsidies of the poorest 20 per cent (World Bank, 2004). This insight remains pertinent. For example, a recent evaluation of an e-governance intervention in India that aimed to improve transparency in a fiscal transfer system for a social benefits programme suggested that while the intervention was successful at reducing leakages, the savings did not translate into improved outcomes for beneficiaries (Banerjee, Duflo, Imbert, Mathew, & Pande, 2017). One of the authors later posited that this may have been because the intervention did not empower the ultimate beneficiaries to ensure that financial gains from reduced corruption were converted into improved outcomes for poor people (Page & Pande, 2018).
There are many definitions of governance. For the purposes of this review, we use the recent definition employed by the World Bank, where governance is defined as "the process through which state and non-state actors interact to design and implement policies within a given set of formal and informal rules that shape and are shaped by power" (World Bank, 2017). Where characteristics of good governance are weak or absent from public processes and service delivery, the effectiveness and sustainability of development interventions are likely to suffer (World Bank, 2016). Barriers to access to public services for vulnerable groups exacerbate inequality, with potential long-term repercussions for a society's development (Easterly, 2007). Fraud and corruption are pervasive across low- and middle-income countries, and the negative consequences for quality of life and core development outcomes are well documented (Molina, Carella, Pacheco, Cruces, & Gasparini, 2016; Svensson, 2005). Where state and public actors cannot be effectively held accountable, a culture of impunity develops that normalises fraud and rent-seeking practices. The World Bank's World Development Report 2017 highlighted key repercussions of power asymmetries, including: exclusion of large portions of society from services, institutions or resources, which is correlated with violent conflict; elite and/or interest-group capture of policies in order to serve interests, resulting in poor targeting and ineffective or inappropriate policies, which can lead to poor or stagnant growth, condemning economies to an underdeveloped state; and clientelism, which often leads to rent-seeking and poor service delivery, which have long-term repercussions on societies' growth (World Bank, 2017).
Despite decades of acknowledgement of the importance of good governance, progress has been slow; the Worldwide Governance Indicators show limited, no, or even negative progress on key governance indicators amongst aggregates of low and lower-middle income countries from 2006 to 2016 (World Bank, 2018). The repercussions of continued governance failures are high, and well documented; for example, in Nigeria, unabated corruption led to the squandering of billions of dollars by the National Petroleum Company, jeopardizing the country's long-term growth potential and financial stability (World Bank, 2017).
Approaches to improve governance have generally either focused on mechanisms to strengthen the effectiveness and institutionalisation of public institutions, or on external pressures to improve service delivery despite weak institutions. While each approach has yielded valuable insights, translating insights from theory into practice has been challenging. There is some evidence that at times, failures could be due to an over-emphasis on the demand side of governance by service users, citizens and civil society, which ignores the constraints faced on the supply side by politicians, bureaucrats and service providers (Brinkerhoff & Wetterberg, 2015), or on the power of information (Wibbels & Keohane, 2018). More recently, insights are emerging into the value of system-based approaches that look at both the supply and demand sides of governance as actors in a single system, drawing on power analyses and social network theories (Fox, 2014; Halloran, 2015; Mcloughlin & Batley, 2012; Wibbels & Keohane, 2018).

USAID's Democracy, Human Rights and Governance (DRG) Center identified participation, accountability, transparency and inclusion (PITA) as critical principles that could be incorporated into interventions within and across sectors to improve development outcomes, in line with the Doing Development Differently global initiative (USAID, 2016). We define participation as efforts to involve citizens in the design, monitoring and delivery of policy and programmes upstream (Quick & Feldman, 2011). Transparency is a "characteristic of governments, companies, organisations and individuals of being open in the clear disclosure of information rules, plans, processes and actions" (Transparency International, 2009: 44).
Accountability is the concept that individuals, agencies and organisations are held responsible for executing their powers according to a certain standard downstream (McGee & Gaventa, 2011). Finally, inclusion means a particular focus on marginalised and vulnerable citizens in policy and programming upstream or downstream (Quick and Feldman, ibid).

| Interventions to strengthen good governance
A recent evidence gap map (EGM) on interventions to improve "state-society relations" highlighted a number of interventions to improve governance (Phillips et al., 2017). These were broadly grouped into interventions for inclusive political processes and leadership (e.g. community-driven development, electoral monitoring, and quotas for women and minority representation in political institutions), and interventions for responsive and accountable institutions and service delivery (e.g. audits, land reform and public servant performance incentives).
Drawing on Phillips et al. (2017), and also insights from the literature, we theorised that good governance can come about through sustained improvements across three domains: within the political system; within the management and administration of public sector offices and institutions; and in the ways in which public officials and service providers engage with service users (external engagement) (Waddington, Stevenson, Sonnenfeld, & Gaarder, 2018). In this framing, good governance interventions attempt to influence the social contract that mediates the relationships between government and citizens, regarding who has access to what power and in return for what accountability for service provision, through three accountability domains:
• Influencing how the broader political system functions: The broader political system dictates access to and contestability of the policy arena (World Bank, 2017). This primarily comprises the checks and balances, or "horizontal accountability" between institutions, yet also includes political representation systems and thus, as an extension, elements of "vertical accountability" that are exercised through electoral systems (Transparency and Accountability Initiative [TAI], 2017). Increasingly, good governance interventions seek to influence how this system functions, rather than the specific form it takes (World Bank, 2017).
• Influencing how a specific public service or institution's system functions internally: Many good governance interventions aim to improve service delivery through the institutionalisation of public services and institutions. These interventions foster "internal accountability" of institutions, and include, but are not limited to, strengthening human resources management, systems of upwards accountability of staff and management or between different levels of government, and supply chains for infrastructure, goods, and financial flows (Finan, Olken, & Pande, 2015).
• Influencing how a specific public service or institution engages externally with constituents: These interventions aim to mediate the ways that citizens engage with government and public service providers outside of the "long route" of electoral processes (World Bank, 2004). They work to improve service delivery through "external accountability", by increasing the engagement between service providers and service users to improve the responsiveness and effectiveness of public services. This comprises the informal processes of vertical accountability, through which citizens, CSOs and the media may attempt to influence political and public service actors directly, as well as efforts towards "diagonal accountability," formalised processes in which citizens are engaged in horizontal accountability efforts (Transparency and Accountability Initiative [TAI], 2017). In addition, it may include approaches which aim to "shorten the long route" by providing information on performance of public servants.
Many good governance interventions are designed to improve service delivery for citizens. This is often done through interventions that embody one or multiple PITA characteristics, which seek to address power dynamics between the state, civil society and citizens to make service delivery more effective and equitable (USAID, 2016).
PITA characteristics influence the functioning of the social contract and its systems throughout each of the three accountability domains, and thus, good governance interventions may target one or more of these (Figure 1). For example, within the political system domain, the PITA characteristics have a direct impact on who has access to the electoral systems and who can contest the policy arena. Elected officials must exercise some basic level of downwards accountability towards the constituents who elected them (or, in non-democratic states, who grant them legitimacy), and sideways accountability to their fellow statesmen through the checks and balances built into the system. Interventions targeting PITA mechanisms in this domain tend to focus on creating a fair system. Within the internal system domain, interventions tend to focus on creating an efficient system, such as through improving the upwards accountability of officials and service providers to management, or through improving the relevance of service provision at local levels through decentralisation. Finally, in the external engagement domain, the PITA characteristics of a service or institution mediate the means through which it engages with citizens, civil society, and business/interest groups. These interventions aim to address a more diverse set of system attributes, primarily the relevance, effectiveness and inclusivity of the service delivery system, and are further differentiated from those in the previous domains through their reliance on soft power.
The effectiveness of interventions that target the PITA characteristics within one domain will be mediated by the context of the other domains as well, the power relations and constraints, and also by other interventions aiming to improve good governance and service delivery, particularly those that target service delivery supply chains.
There is increasing scholarship that suggests that while interventions improving the PITA characteristics of public services and institutions, particularly in the external engagement domain, may be necessary for achieving sustainable improvements in service delivery and a stable social contract, they may not be sufficient (e-Pact Consortium, 2016). On the other hand, while interventions that target strengthening PITA characteristics within internal institutional systems may be sufficient for improving governance within the system, the impact of those governance improvements may not reach the ultimate beneficiaries (citizens/service users) without the incorporation of interventions strengthening the system's external PITA characteristics (Page & Pande, 2018).

| The focus of this review on citizen engagement interventions
While recognising the interactions of interventions promoting PITA mechanisms across each domain and with complementary good governance and service delivery initiatives, it has been pointed out that to attempt to cover the entirety of good governance interventions in a single review would be "exceedingly ambitious" (Sáez, 2013). Thus, this review analysed the value-addition of interventions in the third domain, external PITA interventions targeting public service and institution engagement with citizens.
Interventions promoting PITA mechanisms can be implemented as stand-alone interventions or as part of a larger programme working to strengthen governance and service delivery. They may be implemented either on the supply or demand side of service delivery, or may target both simultaneously, such as a public audit process that trains community members on tools to hold public officials accountable, and works with public officials to increase their understanding of the importance of downwards accountability. An intervention may strengthen one or multiple PITA characteristics of the ways public services and institutions engage with their constituents.
For the purposes of this review, the definitions of PITA were operationalised as follows:
• Participation: The intervention promotes or formalises continuous citizen input in the design and implementation of public services, processes or policies. Participation interventions create specific opportunities or processes for citizens to provide meaningful input into public policy or strategy design and planning. An example of a participation intervention is the introduction of participatory budgeting so that citizens may directly contribute to the development of a budget proposal.
A community-level example could be the creation and capacity building of a representative community-based natural resource management committee that is mandated to develop and monitor locally agreed standards and regulations for the use of common property.
• Accountability: The intervention encompasses monitoring and soft/social accountability mechanisms to encourage or actively hold individuals, public service providers and institutions responsible for executing their powers and mandates according to a certain standard. Accountability interventions create opportunities or processes for constituents to monitor the government and public service providers. An example is a project to encourage and build the capacity of civil society to hold government accountable for the sustainable and equitable management of natural resources (USAID, 2016), or a citizen report card intervention, in which a community group is taught the quality standards to which they are entitled and how to monitor the quality and performance of service delivery, and then works with the service providers to address any identified issues through a mutually agreed action plan.
• Transparency: The intervention involves the disclosure and/or dissemination of information about rights of public service users, to promote participation, and/or performance of public service providers, to promote accountability. Transparency interventions included in our review have the explicit aim of changing the way that citizens and service providers or public officials interact and the power relations between service providers and users. An example is local clinics posting information about patient rights, service fees and standards, and budget execution (USAID, 2016), which restricts the scope for service providers to charge bribes.
• Inclusion: The intervention includes particular strategies to promote the opportunities and capacities of marginalised and vulnerable groups such as women, ethnic minorities or lesbian, gay, bisexual, transgender and intersex (LGBTI) people to engage with the management of public institutions and service providers. Hence, we define inclusion specifically as a component of an intervention that targets a change in participation, transparency or accountability. An example of an intervention to promote inclusion is ensuring that a certain proportion of places in a community governance group are reserved for women (Humphreys, de la Sierra, & van der Windt, 2012).
FIGURE 1 PITA throughout the three domains of good governance. Notes: P: Participation | I: Inclusion | T: Transparency | A: Accountability. Source: Authors
The intervention categories are described in more detail below (see Table 4 in the Methodology section). While most interventions contribute primarily to a single PITA mechanism as described above, there is often significant interplay between the PITA characteristics to which an intervention contributes. Though efforts have been made to make the definitions mutually exclusive, a single intervention may contribute to strengthening multiple PITA characteristics. The obvious cases here are interventions for transparency and inclusion. For example, a transparency intervention that improves access to information about users' rights may aim ultimately to improve user participation, while one aiming to improve information about public service performance ultimately aims to improve accountability. Further, interventions included in the review that are designed to improve the access of a marginalised group of citizens (inclusion) to a decision-making process aim, at an intermediate outcome level, to improve the group's input into the process by providing increased opportunities for consultation (participation), or service delivery monitoring (accountability).

| How citizen engagement interventions might work
At the protocol stage, we developed a stylised model showing an indicative theory of change for how the interventions may work (Figure 2). The theory of change is represented as a series of "blocks," though the authors recognise that change is not always linear and may be multi-directional. The numbers represent typical hypothesised progression, and enable signposting to the key stages of the change process in the text. Circles are used to represent underlying assumptions and key factors that facilitate, moderate or create bottlenecks along the causal chain. This preliminary theory of change developed in the systematic review protocol drew on insights from the literature and programmatic best practices. In particular, the framework built on the 2004 World Development Report (World Bank, 2004) theory of change, which articulated the importance of pro-poor governance practices that actively engage end users for effective outcomes, and Rahman and Robinson (2006) who articulated the importance of local ownership and long-term support. The assumptions and moderating factors drew on insights from Fox (2014), Page and Pande (2018), and the 2017 WDR (World Bank, 2017), among others. We have not taken a "rights-based approach" that views improvements in PITA characteristics as the end objective. While recognising the value of PITA characteristics in and of themselves, the focus of this review is on the value-add they bring to improving development outcomes through improved service delivery.
We note here a useful distinction between the demand and supply sides of governance. Implementers may target stakeholders on the demand side of governance, such as through efforts to improve the capacity of civil society to monitor government service delivery, or the supply side, such as by training public officials on pro-poor development planning. Other interventions may be geared to affecting both demand and supply sides, such as a participatory budgeting process in which government officials are trained on the value of participatory budgeting, while community members are trained and supported to participate in the process.
The indicative theory of change presents the hypothesised causal chain for citizen engagement interventions, from changed opportunities for and capacities of citizens, followed by behavioural changes

TABLE 1 Summary of criteria for inclusion and exclusion of studies

| Criteria | Inclusion definition |
|---|---|
| Population | Programme participants in LMICs were included. Programme participants in high-income countries were excluded. |
| Interventions | Interventions with PITA components that targeted the means and mechanisms through which public institutions and services engage with constituents (service users) were included. Interventions that bundled PITA components alongside other programme components such as block grants (e.g. community-driven development), or that aimed to strengthen internal or sideways PITA, or those in the education sector were excluded. |
| Comparisons | Populations that received "business as usual" service access, or an intervention with a different type or degree of PITA were included. |
| Outcomes | Intermediate and endpoint, intended or unintended outcomes at participant and project level were included. Outcomes relating to political processes (e.g. voting) were excluded. Immediate outcomes relating to citizen engagement (e.g. participation in meetings) or public service response (e.g. public spending) were eligible for the review provided that outcomes relating to access to services (e.g. facilities construction) or intermediate outcomes (e.g. service use) or final outcomes (e.g. health, nutrition, state-society relations) were also reported. |
| Study designs | Counterfactual studies (review questions 1-4), including relevant programme and project documents providing information on design and implementation (review question 4) and cost evidence provided in counterfactual studies (review question 5) were included. |
TABLE 2 Types of outcomes along the causal chain

the impact of decentralisation from state-level government to local government. Thus, though the intervention was designed to reduce corruption, it does not engage citizens in the process or create specific opportunities for them to engage.

Include: Intervention: Citizen Report Cards. Country: Uganda. PITA: A. Summary: This study looks at the impacts of an intervention in which "report cards" of health service provision were disseminated amongst communities (T), and a series of interface meetings between service providers and citizens were organised to review the reports and identify an action plan for improvements (A).
Exclude: Intervention: MIRA Makwanpur, Manandhar et al. (2004). Country: Nepal. PITA: P. Summary: This intervention formed community-based, participatory women's health groups with the aim of identifying key local challenges and potential solutions (P), with the ultimate goal of improving birth outcomes.
Rationale: Though both of these health-sector interventions work to identify challenges and develop action plans to improve outcomes, the Citizen Report Cards study enables citizens to hold public health providers accountable for delivering services, and jointly develops strategies for improvement to which the health providers are accountable. In Manandhar et al. (2004), the women's groups are empowered to take responsibility for their own healthy practices; there are no requirements on the health service providers to take responsibility for addressing challenges the women identify. The intervention aims to change health outcomes outside the sphere of public service delivery.

Include: Intervention: Raskin subsidy identification cards, Banerjee, Hanna, Kyle, Olken, and Sumarto (2018). Summary: … right to the subsidy; an alternative intervention in which lists of eligible households in communities were publicly displayed; and a control set where there were no changes in publication of eligibility for the subsidy. The aim was to test the effect of these different transparency initiatives on reducing corruption in subsidy provision.
Exclude: Intervention: Ciudad Mujer (Women's City), Bustelo, Martinez, Millard, and Silva (2016). Summary: … health facility. When women arrived, they would take part in an orientation that explained all of the different services they could access at the facility, improving their knowledge of their rights to services (T).
Rationale: Both of these interventions aim to increase citizens' knowledge of their rights to access services (or public subsidies). However, the intervention in Bustelo et al. (2016) was purely about access to services; it did not aim to change the way that women engaged with public service providers, except to encourage them to take advantage of the services. In contrast, the experiment in Banerjee et al. (2018) had the explicit aim of attempting to reduce corruption in the subsidy programme by limiting service providers' ability to direct who received the subsidy and who didn't. Thus, in this latter case, the change in knowledge changes the power relations between service provider and user.

Include: Intervention: random federal government audits of transfers to sub-national government and publication of results to citizens, Timmons and Garfias (2015).
Exclude: Intervention: increase in the number of government audits (audit arm).
Rationale: Both interventions use a "top-down" audit to improve accountability. In the case of the excluded study, the audit is undertaken by the government auditor (which constitutes an "internal accountability" intervention by our definitions) and is presented to communities (which constitutes a transparency intervention for "external accountability"). The probability of being audited is known to be very low in control arms, whereas it is known to be 100 per cent in treatment arms. Hence the study is not able to disentangle the effect of the internal and external accountability interventions and is therefore excluded from the review. In contrast, the probability of audit in Timmons and Garfias (2015) is randomly determined; the threat is equal in all municipalities. We therefore consider that the main mechanism being evaluated is the publication of the results of the audit to citizens. The study thus evaluates the effect of providing performance information to enable citizens to hold public officials accountable, with the aim of changing power relations between public officials and citizens.

Interventions to allow citizens to feed back concerns or priorities around service delivery to providers, and/or to monitor the delivery of public services. This includes community scorecards and social audits.
On the supply side, either in addition to demand-side efforts or independently, interventions aim to strengthen openness from and active engagement with supply-side stakeholders in efforts to improve service delivery. These may target the actors implementing or managing the service in question, but also other key stakeholders in the community and throughout the system. Seeking and attaining community acceptance prior to implementation is a widely-applied best practice for ensuring that development projects do no harm and that they will have sufficient buy-in from the community to be successful. There is some evidence that suggests that this may be particularly critical for PITA mechanisms. Securing buy-in from stakeholders at the point of intervention, upstream and downstream can increase his or her own personal power through framing an improvement in service delivery as a personal "win," then she or he may be more motivated to work for its improvement (e-Pact, 2016).
The first immediate outcome (Block 2) posits that through engaging in the interventions, citizens will increase their engagement with State and public service officials. This is often an explicit aim of citizen engagement interventions, as it is a critical precursor to the higher-level outcomes. Through the increased engagement, the next level of change (Block 3) posits that citizens develop a better understanding of processes, services, and the constraints faced by service providers, while simultaneously, service providers gain a deeper understanding of the needs of their constituents and appreciation for the engagement process.
In subtle ways, these changes reflect renegotiations of power relations between the State, civil society and citizens, mitigating the power imbalances. This happens as the citizen engagement interventions shift the dynamics of power by drawing on collective and representative voice.
• Participation interventions address power relations by building in meaningful opportunities for citizens to provide input over the direction of policies that affect them and the supply of services they rely on.
• Inclusion interventions address power relations by bringing marginalised voices to the table.
• Transparency interventions address power relations by limiting the government and public service providers' capacities to use their positions for personal gain, and addressing the power difference caused by knowledge gaps.
• Accountability interventions address power relations by increasing the risk and severity of informal social sanctions against poorly performing bureaucrats and service providers.
Power relations are dynamic; they can change quickly, both for the better and worse, and gains are not necessarily secure. A key assumption here is that supply-side actors are fully engaged throughout the process; otherwise, the attempts to increase soft power by citizens may be seen as confrontational rather than collaborative, which could disincentivise service providers from engaging in the process, to avoid being seen to give up any of their power (World Bank, 2004). Where PITA processes are seen as collaborative, they can be mutually empowering, creating changes in the interactions between state and society that simultaneously give citizens greater input into the provision of the services they rely on, and strengthen the standing of the service providers in the community (Fox, 2014).
Where interventions are unsuccessful at building coalitions to facilitate an enabling environment for change, they may not be successful at changing power relations, as actors may adapt to new systems (Halloran, 2015). For example, though advancements in the field of information and communications technology (ICT) offer exciting possibilities for strengthening external PITA characteristics, a change in technology that is not complemented by supporting interventions that create an enabling environment may fall flat (Hogge, 2010).
As the power relations are shifting and engagement is increasing, a core intermediate outcome of the interventions will emerge (Block 4): public service delivery will improve in efficiency, effectiveness, and equity. Once public officials and service providers are taking into account the input of community members, the selection and targeting of services will improve. This will improve the effectiveness and appropriateness of public service delivery. Inclusion interventions improve the equality of service provision, as they increase access to services and processes for the most vulnerable community members. Transparency initiatives increase the efficiency of public service delivery, as they streamline costs and processing times, and make it harder for politicians and officials to demand inflated payments for services. Finally, accountability initiatives can have direct benefits to the performance of public service delivery, as citizen feedback mechanisms such as Public Audits end with joint workshops between the service provider, citizen representatives, and other key stakeholders to come up with an actionable plan to which all parties can be held to account for how they will address the major issues identified and improve service delivery.
The key assumption here is that institutions have the capacity to respond to priorities requested and issues raised by constituents.
This is a critical assumption, because in its absence, the interventions risk doing harm by having a negative consequence on perceptions of State effectiveness resulting from raised and then unmet expectations.
In some cases, citizen engagement interventions, particularly those that focus on improving access to services for marginalised groups (inclusion), may not lead to the active, empowered engagement between citizens and service providers that leads to mitigated power differences and improved services. However, they could still lead to increased access to public services, particularly amongst vulnerable populations (Block 5). This comes about as a direct result of citizen engagement interventions relying on inclusion and transparency (information dissemination) mechanisms, but also through the other interventions; as communities are mobilised to engage with their local government and services, they become more invested in the services that they are attempting to improve. They are thus more likely to take advantage of those services, as they understand the importance of ensuring high quality service provision for themselves and their families. However, increased access for marginalised groups is not a given outcome of citizen engagement interventions; there likely needs to be concerted, targeted efforts to reach and engage these groups in order for the impacts to reach them (E-Pact Consortium, 2016). Similarly, interventions targeting services where changes are relatively immediate and visible may be more likely to encourage buy-in and support from supply-side actors (ibid.).
The joint effects of changes from Blocks 4 and 5 lead to improved use of public services and user satisfaction (Block 6). Further along the causal chain, wellbeing outcomes may also improve (Block 7).
Wellbeing outcomes will vary by intervention sector (e.g. health, social protection, justice, natural resource management), and are more likely to improve in complementary enabling environments. In the majority of citizen engagement programmes, the PITA characteristics interventions are add-ons to core interventions and outcomes in a public service sector. In the long run, all three intermediate outcomes contribute to wellbeing outcomes. A key assumption is that sustained support is provided to the institutions or service providers charged with maintaining the implementation of the intervention, such that it becomes institutionalised. As noted above, power differences are dynamic and constantly evolving. Thus, a short-term project may well change outcomes in the short-term, but without proper support those gains may easily be lost. Target communities are frequently difficult to access, either due to remoteness and extreme weather, or to conflict and insecurity. It is precisely because of these challenges that governance interventions are so strongly needed in such areas, but they must be taken into account during the design phase to ensure risks are appropriately mitigated. These factors breed vicious cycles of weak public service supply, which leads to weak demand, which in turn facilitates weaker public financial management, and so on. In an ideal world, the citizen engagement interventions would create a virtuous circle of active community engagement in their government and service provision.
Interventions tailored to the specific context in which they are implemented, that target both the demand and supply sides of good governance, are more likely to be successful, particularly when the interventions are supplemented by complementary ones that target the technical side of service delivery and/or service delivery supply chains. For example, in the Philippines, a project focusing on improving access and quality of maternal and child health and family planning included social accountability mechanisms in the form of Quality Assurance Partnership Committees, which Brinkerhoff and Wetterberg (2015) argue led to more effective service delivery that improved the client-focus of providers and increased service use.

Additional factors that may influence an intervention's results include top-down political will. Support from the higher levels of government, which can ensure that funds are appropriately allocated, is key to ensuring that local government officials and service providers have the capacity to implement the changes they agree to with their constituents. Political will further influences the sustainability of the results, and the possibility of a change in administration poses a risk to programmes that may be cut due to high association with the outgoing regime.
It is important at this point to also highlight two broad issues which determine the effectiveness of programmes, relating to intervention design and implementation fidelity. There are two main reasons why we might not expect to see the intended impacts of a programme implemented in the "real world" (Bamberger et al., 2010).
The first is that the programme design is inappropriate, that is, the underlying mechanisms that drive change are not appropriate for the context in which the programme is based, or for particular groups of participants in that context (Pawson, 2006). According to van der Knaap et al. (2007: 3), "mechanisms are the engines behind behavior, which are often not immediately recognizable… They [include] people's efforts to give way to group pressure (groupthink), people's efforts to be status-congruent with others or to avoid or reduce cognitive dissonances, or people's desire to be an early adopter of an innovation.
[T]he action of mechanisms to some extent depends on the context in which they are used… Behavioral change is achieved through this context".
An example would be a community driven development programme that is supposed to rely on community participation to foster social cohesion, but is unable to support the appropriate level of participation, and therefore cohesion, because people are not comfortable speaking in public meetings due to elite capture (White, Menon, & Waddington, 2018). Similarly, interventions to decentralise decision making in schools are less likely to be effective in low income, low education contexts where communities have low status relative to school staff (Carr-Hill et al., 2018). Another example would be a women's empowerment programme which is ineffective in reaching a particular group of participants (e.g. women from Muslim households) because it does not take into consideration the need to involve community leaders in design of the programme targeting strategy.
Such "failure mechanisms" will vary based on intervention design and targets; for example, in some cultures, traditional community leaders may be critical stakeholders to engage in interventions seeking to change the equity of or access to services, despite the disconnect between their de facto and de jure power, but only depending on the service targeted. Baldwin and Raffler (2016) argued that traditional leaders are often highly socially accountable for public services such as conflict resolution or natural resource management, but less so for services such as education or health care. In that case, an intervention targeting equitable access to and use of public land may fail if it does not engage traditional leaders, but a similar intervention simply targeting equitable access to and use of health services may still be successful. Failures may also come in the form of unintended consequences; for example, Chong et al. (2014) found that increasing the dissemination of corruption information to voters in Mexico decreased support not only for exposed corrupt politicians, but also for all political parties, and led to a decrease in voter turnout.
The second reason is due to implementation failures for a programme that otherwise (in theory) would be effective in the implementation context. Examples would be technical and logistical problems relating to project delivery (e.g. inadequate training and support to practitioners); weaknesses in implementer systems (e.g. human resource, financial or monitoring); or due to external factors (e.g. conflict, natural disasters).

| Why it is important to do this review
The 2017 World Development Report (World Bank, 2017) (OECD, 2017). Therefore, it appears that the share of aid to governance and civil society also fell from around 14 per cent to 10 per cent, or an increasing share that was traditionally counted under governance is instead being incorporated into sector programming (health, education, agriculture, infrastructure, etc).
Governance programmes are implemented in complex sociopolitical contexts, and involve many challenges in realising, demonstrating, and attributing improvements towards key outcomes. In addition, prominent single study evidence has questioned the viability of bottom-up, community-based approaches, as compared to top-down government accountability. However, it is
The second main contribution of the review is to undertake the systematic review and meta-analysis to Campbell Collaboration standards while also aiming to extract the mechanisms underlying programmes and to report those systematically. We did so by including certain types of comparison groups that would enable us to extract the effect of the PITA mechanism over standard access to public services (or a different PITA mechanism). We also systematically extracted information about the contextual factors and mechanisms through which programmes operate, based on the included studies and related programme and project documents, and synthesised those using a framework synthesis approach.
As policy makers and implementers work to ensure the sustainability of their investments and interventions, institutionalising good governance practices will become increasingly important. This systematic review assesses the effectiveness of interventions that target participation, inclusion, transparency and accountability in the design and delivery of public services and institutions on development outcomes. Analysis of causal pathways and mechanisms will shed light on the contexts in which these interventions can be successful and corresponding enabling factors. The review aims to provide evidence on what is generalisable, what is context specific, in what ways, and for whom in external accountability governance programming.

| OBJECTIVES
The primary objective of this review was to identify, appraise and synthesise evidence that answers the question: to what extent are programmes in low- and middle-income countries targeting effective and responsive public services and institutions that incorporate PITA characteristics into their design effective in achieving their objectives, as compared to otherwise similar programmes that do not?
Authors compared the effectiveness of different types of programmes that incorporate PITA characteristics, both by intervention sub-group and by which PITA mechanism(s) the intervention incorporates, using an innovative, integrated mixed-methods approach that drew on both quantitative meta-analysis (Review Questions 1-3) and qualitative realist-informed framework synthesis approaches that were then reintegrated with the meta-analysis (Review Question 4).
The secondary objectives were to assess how effects varied by population group and location, to identify the factors relating to programme design, implementation, context, and mechanism that are associated with better or worse outcomes along the causal chain and assess the evidence on programme costs. To address these last two objectives, the review included additional programme design and implementation documents as well as cost data where possible. The review aimed to answer the following specific questions:
Primary review questions
1) What are the effects of interventions that aim to strengthen the PITA characteristics of public services or institutions on social and economic wellbeing for participants? (Review Question 1).
2) What are the effects of interventions that aim to strengthen the PITA characteristics on participatory, inclusive, transparent or accountable processes? (Review Question 2).
Secondary review questions
3) To what extent do effects vary by population group and location? (Review Question 3).
4) What factors relating to programme design, implementation, context, and mechanism are associated with better or worse outcomes along the causal chain? (Review Question 4).
5) What evidence is available on programme costs and incremental cost effectiveness in included studies of effects? (Review Question 5).

| METHODS
As described in the protocol published in the Campbell Library characteristics. The review also presents cost data from included impact studies (question 5).

| Criteria for considering studies for this review
The criteria determining eligibility of studies in the review are summarised in Table 1.

| Types of studies
To answer Review Questions 1, 2 and 3 the review included counterfactual studies that used an experimental or quasi-experimental design and/or analysis method to measure the net change in outcomes that were attributed to an intervention or policy. The review included randomised and non-randomised studies that were able to take into account confounding and selection bias (Reeves, Wells, & Waddington, 2017; Waddington et al., 2017). Specifically, the following study types were includable:
• Randomised controlled trials (RCTs), with assignment at individual, household, community or other cluster level, and quasi-RCTs using prospective methods of assignment such as alternation.
• Non-randomised studies with selection on unobservables:
o Regression discontinuity designs, where assignment was done on a threshold measured at pre-test, and the study used prospective or retrospective approaches of analysis to control for unobservable confounding.
o Studies using design or methods to control for unobservable confounding, such as natural experiments with clearly defined intervention and comparison groups, which exploit natural randomness in implementation assignment by decision makers (e.g. public lottery) or random errors in implementation, and instrumental variables estimation.
• Non-randomised studies with pre-intervention and post-intervention outcomes data in intervention and comparison groups, where data were individual level panel or pseudo-panels (repeated cross-sections), which used the following methods to control for confounding:
o Studies controlling for time-invariant unobservable confounding, including difference-in-differences, or fixed- or random-effects models with an interaction term between time and intervention for pre-intervention and post-intervention observations;
o Studies assessing changes in trends in outcomes over a series of time points (interrupted time series, ITS), with or without contemporaneous comparison (controlled ITS), with sufficient observations to establish a trend and control for effects on outcomes due to factors other than the intervention (e.g. seasonality).
• Non-randomised studies with control for observable confounding, including non-parametric approaches (e.g. statistical matching, covariate matching, coarsened-exact matching, propensity score matching) and parametric approaches (e.g. propensity-weighted multiple regression analysis).
Eligible comparators for review questions 1-3 included groups that received normal service delivery ("business as usual") without improved PITA characteristics, or groups that received an intervention testing the inclusion of different PITA design characteristics or weaker or less intensive implementation of PITA design characteristics.

| Types of participants
This review included any participants from low- and middle-income countries (L&MICs), including participants from the general population and those from specific population sub-groups. Authors collected data on differential effects and experiences for sub-populations where available and coded information according to the PROGRESS-plus criteria, where PROGRESS stands for place of residence, race/ethnicity, occupation, gender, religion, education, socioeconomic status, and social capital, and "plus" represents additional categories such as age, disability, and sexual orientation (O'Neil et al., 2014).
o Citizen feedback mechanisms, which allow citizens to feed back concerns or priorities around service delivery to providers, and/or to monitor the delivery of public services. This category also includes social audits, whereby public forums bring together a service provider with local authorities, neighbours, and representatives, to monitor the delivery of a specific project.

| Types of outcome measures
This review included studies that reported outcomes measuring improvement in access to services, service behaviours, attitudes towards services, including user satisfaction, social and economic quality of life improvements for the proposed intervention, and "state legitimacy" (state-society relations). The inclusion criteria for outcomes were broad in order to be able to provide a full picture of the effects of the included interventions along the causal chain, described in

| Secondary outcomes
"immediate outcomes" measuring citizen engagement with public institutions and services, such as participation in decision-making, inclusion, transparency and accountability, and responsiveness of public services and public service delivery agents, such as public spending, leakages and corruption.

| Duration of follow-up
The review included any follow-up duration, coding multiple outcomes where studies report multiple follow-ups. Several studies presented multiple follow-ups, which are reported in the descriptive results section.

| Types of settings
Interventions could be implemented in any low- or middle-income country, as defined by the World Bank at the time the intervention was implemented.

| Other
The review included both completed and ongoing studies, including protocols of ongoing studies that met all other inclusion criteria and/or studies listed in registries of ongoing impact evaluations.
The review included studies published in any language, although all included studies were in English. The review was limited to included studies published in 2000 or after, following Phillips et al. (2017) and because authors did not expect to identify any impact evaluations that met the criteria from before this date.

| Other searches
The review used the evidence gap map of state-society relations as a primary source of potential studies (Phillips et al., 2017). In addition, authors screened the bibliographies of existing systematic reviews and literature reviews, including Molina et al. (2016), Lynch et al. (2013) and Hanna et al. (2011). Authors also screened the reference lists of included studies and undertook forward citation-tracking for those studies using Google Scholar. Authors contacted the review's advisory group and the funder of the systematic review, USAID, to identify additional studies.

| Targeted searches for studies to address Review Question 4
In order to answer question 4 relating to programme design,

| Studies to address Review Question 5
The review aimed to incorporate and synthesise economic evaluations and cost data that were presented in the included studies. Only four studies presented any cost data. These are presented in the results section.

| Selection of studies
screening stage to prioritise the items most likely to be "includes" based on previously included documents. This involved independent double screening of a random test set of citations to train the priority screening function, which learned to identify relevant records based on keywords in the title and abstract of the included and excluded studies. All team members were involved at this stage of screening.
The function continues to learn as screening progresses. Using priority screening in this way allows for the identification of includable records at an earlier stage in the review process so that work can begin earlier on full-text screening and data extraction. This review also used the priority screening function to classify studies into groups based on their probability of inclusion in the review. Screening of studies intended to address Review Question 4 took place in a second stage of screening. Studies were assessed for relevance by one author to determine whether they covered one of the programmes included to answer Review Questions 1-3. Each of these studies was then assessed for relevance by at least one other author.

| Data extraction and management
Authors extracted the following descriptive, methodological, qualitative and quantitative data from each included study using a standardised data extraction form (data extraction form provided in Appendix 3):
• Descriptive data including authors, publication date and status as well as other information to characterise the study including country, type of intervention and outcome, population, and context.
• Methodological information on study design, analysis method, type of comparison (if relevant) and external validity.
• Quantitative data for outcome measures, including outcome descriptive information, sample size in each intervention group, outcomes means and standard deviations, test statistics (e.g. t-test, F-test, p-values, 95% confidence intervals), cost data, and so on.
• Information on intervention design, including how the intervention incorporates participation, inclusion, transparency and accountability characteristics, participant adherence, contextual factors and programme mechanisms.
Authors extracted quantitative data for outcomes analysis using Excel. Two authors independently calculated effect sizes for a random sample of 20 per cent of the included studies, reaching agreement in all except two cases, which the lead author resolved. Disagreements on inclusion or exclusion were resolved by discussion and the input of a third author if necessary. The rest of the quantitative data was extracted by one author only. Authors extracted descriptive, methodological and qualitative data using KoBo Toolbox. Descriptive and qualitative data were single coded by one author and checked by a second author. One author also checked the coding of intervention characteristics and mechanisms coded by others.

| Assessment of risk of bias in included studies
The critical appraisal results for each included study are reported (Critical appraisal of included studies).

Assessment of risk of bias in experimental and quasi-experimental studies (Review Questions 1-3)
Authors assessed the risk of bias in the included quantitative counterfactual studies (impact evaluations) drawing on the signalling questions in the 3ie risk of bias tool, which covers both internal validity and statistical conclusion validity of experimental and quasi-experimental designs and the bias domains and extensions to Cochrane's ROBINS-I tool and RoB2.0 (Sterne et al., 2016; Higgins et al., 2016). The risk of bias tool developed for this review can be found in Appendix 3. This review noted any potential differences in methods and risk of bias for different outcomes reported in each paper.
The review assessed risk of bias of included studies based on the following criteria, coding each paper as "Yes," "Probably Yes," "Probably No," "No" and "No Information" according to sub-questions relating to the following bias domains:
• Causal inference: Factors relating to baseline confounding and biases arising from differential selection into and out of the study (attrition);
• Deviation from intended intervention: Factors relating to biases due to performance bias (e.g. cross-overs, contamination and survey effects) and motivation bias (Hawthorne effects);
• Outcomes data collection: Factors relating to biases in outcomes data collection (e.g. social desirability or courtesy bias, recall bias);
• Analysis reporting: Factors relating to biases in methods of analysis and reporting.
We used the following decision rule to assign a risk of bias rating for each domain:
• "High risk of bias": if any of the criteria within that domain were assessed as "No" or "Probably No".
• "Some concerns": if one or several criteria within that domain were "Unclear" and none were "No" or "Probably No".
• "Low risk of bias": if all of the criteria within that domain were "Yes" or "Probably Yes".
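The domain-level decision rule can be expressed as a short function. The sketch below is illustrative only: the function name `rate_domain` is hypothetical, and it assumes that "Unclear" in the rule corresponds to the "No Information" answer code, which the text does not state explicitly. The overall judgment applies the same worst-rating-wins pattern across domains.

```python
from typing import Iterable

# Answer codes for the signalling questions. Treating "Unclear" and
# "No Information" as the same category is an assumption of this sketch.
NEGATIVE = {"No", "Probably No"}
UNCLEAR = {"No Information", "Unclear"}
POSITIVE = {"Yes", "Probably Yes"}


def rate_domain(answers: Iterable[str]) -> str:
    """Apply the domain-level decision rule to one domain's answers."""
    answers = list(answers)
    if not answers:
        raise ValueError("a domain needs at least one answered criterion")
    unknown = set(answers) - NEGATIVE - UNCLEAR - POSITIVE
    if unknown:
        raise ValueError(f"unrecognised answer codes: {sorted(unknown)}")
    # Any negative answer dominates everything else in the domain.
    if any(a in NEGATIVE for a in answers):
        return "High risk of bias"
    # No negatives, but at least one unclear answer.
    if any(a in UNCLEAR for a in answers):
        return "Some concerns"
    # Only "Yes" / "Probably Yes" answers remain.
    return "Low risk of bias"
```

For instance, a domain answered "Yes", "Probably Yes" and "No Information" would be rated "Some concerns" under this reading of the rule.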
Finally, we used the decision rule of RoB2.0 (Higgins et al., 2016) to reach an overall risk of bias judgment:
• "High risk of bias": if any of the bias domains were assessed as being "high risk".
• "Some concerns": if any of the bias domains were "some concerns" and none were "high risk".
• "Low risk of bias": if all of the bias domains were assessed as "low risk".

Critical appraisal of project design and implementation (Review Question 4)
It was not necessary to critically appraise the information extracted on programme design, implementation and context from the project documents as this information was descriptive.

Critical appraisal of cost evidence (Review Question 5)
The review identified cost data in four studies, most of which only presented intervention cost per beneficiary, and as some authors of included studies acknowledged, unit cost estimates were "back of the envelope" calculations. Authors assessed the quality of the cost evidence, using the tool provided by Evers, Goossens, de Vet, van Tulder, and Ament (2005)
Formulas for effect size calculations were used depending on data provided in included studies. For example, for studies reporting means ($\bar{X}$) and pooled standard deviation (SD) for treatment (T) and control or comparison (C) at follow up (p+1) only:
If the study did not report the pooled standard deviation, but reported the standard deviations of outcome in each group, SD was calculated as follows:
For studies reporting means ($\bar{X}$) and standard deviations (SD) for treatment and control or comparison groups at baseline (p) and follow up (p+1):
For studies reporting mean differences ($\Delta\bar{X}$) between treatment and control and standard deviation (SD) at follow up (p+1):
For studies reporting mean differences between treatment and control, standard error (SE) and sample size (n):
For studies reporting regression results, authors intended to follow the approach suggested by Keef & Roberts (2004) and used the regression coefficient and the pooled standard deviation of the outcome. However, in most cases, the pooled standard deviation of the outcome was unavailable, and so regression coefficients and standard errors or t-statistics were used to do the following, where sample size information was available in each group:
where n denotes the sample size of treatment group and control. The following was used where total sample size information (N) was available only (as suggested in Polanin, 2016):
The t-statistic (t) was calculated by dividing the regression coefficient by the standard error.
If the study authors only reported confidence intervals and no standard error, the review team calculated the standard error from the confidence intervals. If the study did not report the standard error, but reported t, this was extracted and used as reported by the authors. In cases in which 1 per cent, 5 per cent and 10 per cent significance levels were reported rather than t or se(b), then t was imputed approximately, using information about sample size, as follows:
Where studies reported (log-) odds ratios, we transformed them into d using the following (Higgins and Green, 2011):
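Several of the conversions described above follow standard meta-analytic practice. The sketch below assumes the common textbook form of each formula, which may differ in detail from what the review applied; all function names are hypothetical and chosen for illustration only.

```python
import math


def pooled_sd(sd_t: float, n_t: int, sd_c: float, n_c: int) -> float:
    """Pooled standard deviation from the two group standard deviations."""
    return math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )


def smd_from_means(mean_t: float, mean_c: float, sd_pooled: float) -> float:
    """Standardised mean difference d from follow-up means and pooled SD."""
    return (mean_t - mean_c) / sd_pooled


def smd_from_t_groups(t: float, n_t: int, n_c: int) -> float:
    """d from a t-statistic when the sample size of each group is known."""
    return t * math.sqrt(1 / n_t + 1 / n_c)


def smd_from_t_total(t: float, n_total: int) -> float:
    """d from a t-statistic when only the total N is known.

    Assumes roughly equal group sizes (an assumption of this sketch).
    """
    return 2 * t / math.sqrt(n_total)


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error recovered from a confidence interval (95% by default)."""
    return (upper - lower) / (2 * z)


def smd_from_log_odds_ratio(log_or: float) -> float:
    """d from a log odds ratio using the logistic approximation."""
    return log_or * math.sqrt(3) / math.pi
```

As a usage example, a regression coefficient of 0.5 with a standard error of 0.25 gives t = 2; with 50 observations in each group, `smd_from_t_groups(2, 50, 50)` gives d = 0.4.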

| Criteria for determination of independent findings and effect sizes
In this review, data are reported according to the intervention that the evidence was based on. Estimation of a standard meta-analytic effect size relies on the statistical assumption of independence of each included estimation of effect (Gleser & Olkin, 2007). Dependent effect sizes arise when one study provides multiple results for the same outcome of interest, or multiple outcomes for the same outcome construct, when a study has multiple treatment arms compared to the same control group, or multiple studies use the same dataset and report on the same outcome. This review therefore used rules to ensure that only statistically independent effect sizes were included in any one meta-analysis.
Where a study reported multiple effect size estimates using different specifications for the same outcome, the review team chose the one with the lower likelihood of bias, for example the most appropriately specified outcomes equation (e.g. covariate adjusted specifications over unadjusted specifications in non-randomised studies). Where information was collected on the same programme for different periods of time, information on the full range of outcomes over time was extracted. However, the review team calculated an average synthetic effect size for use in any overarching analysis. There was also one case where the findings of an included study (Björkman & Jakob, 2009) were replicated by authors using the same data (Donato & Garcia Mosqueira, 2016). In this case, the review team used critical appraisal to determine which outcomes to include from which study.
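One common way to form an average synthetic effect size from dependent estimates is to take their simple mean and compute its variance under an assumed correlation between the estimates. The sketch below illustrates that general approach; the function name `synthetic_effect` and the default correlation `r = 1.0` (a conservative choice) are assumptions of this example, not details reported by the review.

```python
import math
from typing import Sequence, Tuple


def synthetic_effect(
    effects: Sequence[float],
    variances: Sequence[float],
    r: float = 1.0,
) -> Tuple[float, float]:
    """Mean of dependent effect sizes and the variance of that mean.

    r is the assumed common correlation between estimates; r = 1.0
    treats them as perfectly correlated (hypothetical default).
    """
    m = len(effects)
    if m == 0 or m != len(variances):
        raise ValueError("effects and variances must be non-empty and equal length")
    mean = sum(effects) / m
    # Sum of variances plus all pairwise covariance terms (i != j).
    total = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * math.sqrt(variances[i] * variances[j])
    return mean, total / m ** 2
```

With r = 1 the synthetic variance equals that of a single estimate when all variances are equal, so averaging does not overstate precision; with r = 0 the variance is divided by the number of estimates.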

| Unit of analysis issues
Authors assessed studies for unit of analysis errors (The Campbell Collaboration, 2014), which arise when the unit of allocation of a study or treatment unit is different to the unit of analysis of outcomes data collection. Where unit of analysis errors existed, they were corrected for by calculating the effective sample size ($N_e$) using the following adjustment (Higgins and Green, 2011):
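The standard design-effect adjustment for clustered data takes the form below. It is given here as an illustration of the usual approach, with $N$ the total sample size, $m$ the average cluster size and $\rho$ the intra-cluster correlation coefficient; the symbols $m$ and $\rho$ are notation assumed for this sketch.

```latex
\[
  \mathit{DEFF} = 1 + (m - 1)\,\rho,
  \qquad
  N_e = \frac{N}{\mathit{DEFF}} = \frac{N}{1 + (m - 1)\,\rho}
\]
```

As a worked example with hypothetical values, a study with $N = 1000$ participants in clusters of average size $m = 21$ and $\rho = 0.05$ has a design effect of $1 + 20 \times 0.05 = 2$, so $N_e = 500$.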

| Dealing with missing data
In cases of missing or incomplete data, this review reported the characteristics of the study but stated that it could not be included in the analysis due to missing data. Data were missing or incomplete for some of the outcomes in one study (Palladium, 2015

| Assessment of heterogeneity
This review assessed heterogeneity by calculating the Q-statistic, Isquared, and Tau-squared to provide an estimate of the amount of variability in the distribution of the true effect sizes (Borenstein et al., 2009). This was complemented with assessment of heterogeneity of effect sizes graphically using forest plots. The review explored heterogeneity using moderator analysis to correlate intervention characteristics with outcomes using bivariate meta-analysis rather than meta-regression. The review conducted separate analyses by primary outcome (Review Question 1):

| Assessment of reporting biases
• service delivery and access (quantity and quality)
• service use
• attitudes to services
• wellbeing outcomes
• state-society relations.
The review also analysed the intervention mechanisms by analysing secondary outcomes by intervention type (Review Question 2):
• service user and citizen engagement (demand-side behaviours)
• service provider and public servant response (supply-side behaviours).
Finally, the review explored heterogeneity in effects by intervention type, as well as global region and effects for particular subgroups of participants (Review Question 3).
As heterogeneity exists in theory due to the variety of interventions and contexts included, this review used inverse-variance weighted, random effects meta-analytic models (Higgins & Green, 2011). The review team used Stata's metan command (Sterne et al., 2008) to generate the meta-analyses and forest plots.
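The heterogeneity statistics and the random effects model described above can be sketched as follows. This is an illustration in Python, not the Stata code used in the review, and it assumes the DerSimonian-Laird estimator of Tau-squared, which the text does not specify. The function name and the input values are hypothetical.

```python
import math


def random_effects_meta(effects, variances):
    """Inverse-variance random effects meta-analysis (DerSimonian-Laird, ASSUMED).

    Returns the pooled effect, its 95% confidence interval, and the
    Q-statistic, I-squared (%) and Tau-squared heterogeneity measures.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")

    # Fixed effect (inverse-variance) estimate, needed to compute Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random effects weights add Tau-squared to each study's variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se,
        "ci_high": pooled + 1.96 * se,
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


if __name__ == "__main__":
    # Hypothetical SMDs and variances from three studies.
    result = random_effects_meta([0.10, 0.30, 0.05], [0.01, 0.02, 0.015])
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
```

When Q does not exceed its degrees of freedom, Tau-squared is truncated at zero and the result coincides with the fixed effect estimate.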
Sensitivity analysis was undertaken by reporting findings by study design and risk of bias assessment.

| Methods of synthesis: review question 4
In the context of "real world" programmes, project design and implementation fidelity are often the principal reasons why findings from programme evaluations differ between contexts. This is partly why advocates of mixed-methods evaluation approaches recommend collecting implementation process data (e.g. White, 2009; Bamberger et al., 2010). This review used a realist-informed framework synthesis approach to extract information from project design and implementation documents and included impact studies on context, implementation and mechanisms.
Framework synthesis starts with the identification or development of a framework to guide the analysis that highlights key factors that help understand or predict heterogeneity across results, which is built out through in-depth reading of included studies to include additional relevant themes against which studies are coded and reviewed to identify patterns (Oliver et al., 2008). Framework synthesis is well-placed to handle complexity across interventions and contexts and is amenable to the use of a wide range of potential sources of data, including evidence based on surveys and quantitative data, and more detailed evidence collected using qualitative methods, policies and implementation documents (such as proposals or monitoring reports) (Snilstveit, 2012).
Realist synthesis highlights variation in programme design in explaining differences in outcomes across contexts (Pawson, 2006).
Realists argue that the effectiveness of a programme depends on the that "The focus in such a classification can be on behavioral and social "cogs and wheels" of the intervention… but could also include administrative or legal mechanisms." (p.6).
In the present study, the review team searched included studies for information about how or why the intervention is supposed to work from descriptive information provided in the studies, author analysis (e.g. tests for "mechanisms" using statistical mediator analysis) and authors' own hypotheses about why the intervention was effective (or not). The information collected on contextual factors was partly contained in the detailed information about the comparison condition, co-interventions and background information about participants collected from included studies and project and programme design and implementation documents, and key contextual information collected from international datasets. As noted in more detail below, this review then identified and coded mechanisms associated with particular intervention sub-groups and PITA elements.
CMO is largely an iterative process, and thus the full list of CMO codes for analysis was developed as part of the synthesis. Initially, the review team drew on potential codes identified in the protocol. Where key enabling conditions are already in place, an effectively designed intervention may be successfully implemented in isolation; where key conditions are missing, the intervention design may need to be adjusted or expanded to include complementary interventions that seek to strengthen the enabling environment. For example, an intervention seeking to build transparency and accountability through open data interventions may need to build a coalition of support that engages people at the point in the system targeted for data release, upstream, downstream, and externally to create an environment in which data are provided, demanded, and used (Hogge, 2010). These enabling conditions may change depending on context factors such as the target level of the intervention, whether it targets service delivery at community, sub-national, or national level (E-Pact Consortium, 2016), or whether the external stakeholders it seeks to engage are organised civil society or interest groups, marginalised or vulnerable groups, or citizens and service users more broadly (McGee and Gaventa, 2010). The review team further conducted more detailed analysis of whether the bottleneck for good governance was likely to be properly identified as resting with citizens (e.g. lack of organization, lack of knowledge/capacity), with the system (e.g. lack of opportunities for citizens to engage), or with individual service providers (e.g. power relations, corruption).
The combination of realist-informed framework synthesis that moved towards "best fit" framework synthesis was selected as the most appropriate method to link the meta-analysis with context and mechanism information given the complexity and heterogeneity of included interventions.
In the analysis, the theory of change developed during the

Moderator analysis
The following moderator variables were collected, as indicated in the protocol: • Methodology: study design, risk of bias status, timing of evaluation (follow-up length).
• Context variables: region, country income level, democracy policy index score.

| Methods of synthesis: review question 5
This review aimed to draw on standard approaches to synthesise economic appraisal evidence (Shemilt et al., 2008; Shemilt et al., 2011).

However, only four of the included studies reported cost data, and therefore this review simply reports the cost data that was identified in a  These 35 studies assess the effect of 41 unique policies or trial arms.
The systematic search also identified 11 ongoing studies, a list of which is presented in references to ongoing studies. Reasons for exclusion are discussed in more detail below. Overall, of the 35 unique studies included in this review, 16 had been included in the state-society relations EGM, three of them as ongoing studies with registered trials (Phillips et al., 2017).
Following the search for impact evaluations, authors undertook a targeted search for qualitative and project documents associated with the programmes evaluated in the included impact evaluations. In total, 76 additional documents were identified, of which 36 contributed to the qualitative synthesis. These are discussed in more depth in the section on framework synthesis.

| Excluded studies
Studies were often excludable for more than one reason, but we did not search for all possible reasons for exclusion once a study met one exclusion criterion. We excluded 98 papers at full-text for not meeting our criteria on intervention. With regards to those excluded on intervention, we excluded five as they were classified as informal sector, that is, the programme was implemented independently of government. We excluded six as they only addressed service access for marginalised populations through the delivery of a new service.
We excluded 24 as they were unable to isolate the PITA element of the intervention, that is, the evaluation measured the effect of a PITA mechanism packaged with other interventions. We excluded a further 63 papers for evaluating other irrelevant interventions.
One of these studies excluded on intervention was of an ineligible We excluded an additional six because they evaluated a study of education or a participatory planning intervention alongside a block grant (CDD), 12 because they were not a primary study, 13 because the study did not address questions of effects, five because they were qualitative, five because they did not account for confounding in design or analysis, and 17 for not using a contemporaneous comparison group (e.g. before versus after design). In addition, we were unable to access one paper.
A further seven studies were eligible for inclusion based on population, intervention and comparison but only examined the effects of a PITA mechanism on one or more secondary outcomes of interest, that is, citizen engagement and/or provider response, without extending the analysis to primary outcomes of interest. After the full-text screening stage, we excluded a further two papers that appeared to be evaluations of eligible interventions, but that we discovered to be PITA mechanisms implemented alongside co-interventions that were not reported clearly in the original evaluation (Alderwish & Dottridge, 2013; Andres et al., 2017). We discovered the presence of the additional co-interventions in the additional documentation we identified through our targeted searches. Both papers evaluated community-driven water provision.
For Andres et al. (2017), we identified a 2009 World Bank Implementation Completion and Results Report associated with the project evaluated in the paper, the Jalanidhi project. The report described co-interventions that would likely have impacted the outcomes covered by the evaluation, including significant technical engineering assistance, infrastructure, and capacity building. The impact evaluation does not acknowledge these co-interventions, but rather presents the study as isolating the impact of the institutional form the water management system takes on the outcomes. Thus, due to the co-interventions, the study did not isolate the effect of the PITA mechanism and was excluded from the review. Alderwish & Dottridge (2013) was a similar case in that a project document identified significant infrastructure interventions combined with the community water provision intervention.

| Studies awaiting classification
We identified one eligible study towards the end of the review process that we were unable to include due to time constraints, Tohari, Parsons, and Rammohan (2017). It is unlikely that the inclusion of this study would substantively change the results of our synthesis, partly as the study evaluates an intervention already included in the review (Banerjee et al., 2018). The results of that study should be included in updates of this review.

| Description of included studies
Here we describe the characteristics of the 35 included studies. Key characteristics of each included study are presented in Appendix 4.  , Malawi (Gullo et al., 2017), Namibia (Bandyopadhyay, Humavindu, Shyamsundar, & Wang, 2004) and a study that took place in both Kenya and Guinea (Bradley & Igras, 2005).
Finally, we identified one study in Russia, a study of support for participatory budgeting (Beuermann & Amelina, 2014).

| Interventions and PITA mechanisms
We grouped the identified studies by five main intervention areas, presented in Table 4. Eleven studies provided information to citizens, either about citizen rights to access services or to participate in participatory processes (n = 5), or information about performance of politicians or public service providers, including report cards (n=7).
We consider the main design mechanism for these categories to be in Afghanistan. It should be noted that we did not include the findings from these studies that evaluate the impact of the CDD programmes themselves which was outside the scope of this review, only the comparison between those groups that mandated participation of women and those that did not. We consider these sub-sets of the participatory planning intervention category.
Finally, we identified seven studies evaluating community management of natural resources, whereby there is some devolution of the management of a natural resource to a community group, but where the government retains some powers. These fell into two groups: those that involved management of water (Bandyopadhyay et al., 2010; Barde, 2017; Huang, 2014) and of forests or conservancies (Rasolofoson et al., 2015; Bandyopadhyay et al., 2004; Tachibana & Adhikari, 2009). This intervention category differs substantively from the others in that communities are equipped with considerably more power to make decisions and implement public services than in the other intervention areas.

| Intervention funders
We attempted to capture information on the funders of the

| Equity
For each study, we captured information about if, and how, it addresses equity concerns, either through the design of the intervention or through the evaluation design and analysis methods.
We considered an intervention to address equity if it targeted a marginalised or vulnerable group or was designed in a way to overcome local barriers to incorporate these groups into the programme. We considered an evaluation design and analysis method to incorporate equity if it undertook sub-group analysis for the marginalised group or reported on how those groups were able to participate in the programme.
Eighteen of the included studies did not explicitly address equity concerns. 4 Nine of the included studies evaluated an intervention

| Critical appraisal of included studies
We assessed the risk of bias for all studies included in this review. The criteria related to the assignment mechanism, analysis reporting and blinding are assessed at the study level whereas all the other criteria are assessed at the outcome level. While selection bias and risks of confounding are usually assessed at the study level, it can be the case that some outcomes are more exposed to bias than others, depending on the data source or the analysis method (e.g. where outcomes data are collected based on participant self-reports rather than direct observation in non-blinded studies).
We found that out of the 166 outcomes assessed separately from non-randomised studies, 146 had high risk of bias, 19 had some concerns, and one had low risk of bias. Out of 386 outcomes assessed separately for randomised studies, 161 had high risk of bias, 83 had some concerns and 142 had low risk of bias. A detailed and overall assessment by study and group of outcomes is presented in Appendix 5.

| Findings by risk of bias domain
Assignment mechanism in randomised studies
As Figure 7 illustrates, for a large majority of the studies (73%), the assignment of clusters into the different study arms was random or probably random. In one study (Kasim, 2016), although the assignment mechanism was reported as random and the sample was relatively large, significant imbalances at baseline suggest that there might have been a problem in the random allocation. While assignment seems to have indeed been random for 73 per cent of randomised studies (and is reported as such), 47 per cent lacked detailed information about the exact randomisation method, such as whether the sequence was generated by a computer or whether a paper-based lottery was organised. In one study, important information on the number of units of programme implementation within each cluster was missing (Berman et al., 2017).
Reporting of a baseline balance
In Ananthpur et al. (2017), the baseline data were collected after the start of the intervention in some villages, yet the analysis method used the difference-in-difference technique.
The extent to which this undermines the results will depend on the proportion of observations affected, but the authors did not report the information required to assess the scale of the issue.

Selection bias
Randomisation ensures that the risk of selection bias into the study is relatively small. A majority of outcomes measured in randomised studies were considered free or probably free from selection bias (70%). However, the sampling method used to collect survey data and differential attrition at the end of the study represent threats for both RCTs and non-randomised studies. Given that tracking survey respondents over long time periods or preventing dropouts can be challenging, attrition is common to some extent across almost all studies. It is only a threat to validity if it represents a large proportion of the sample and is systematically larger for some study groups than others (and correlated with outcomes). This might be the case for eight per cent of outcomes and is unclear for 21 per cent of outcomes. Unfortunately, the lack of information reported on the reasons for attrition makes it hard to identify risks of selection bias out of the study. Authors do not tend to make attrition information very accessible. In three studies where attrition rates were particularly high (greater than 20 per cent of the baseline sample), authors do not report attrition rates across different treatment and control groups or test the relationship between covariates and treatment status; four studies neglect to comment on varying sample sizes between the initial sample and the results tables; and two do not provide enough information to calculate attrition.
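As an illustration of the attrition checks discussed above, the sketch below computes overall and differential attrition from baseline and endline sample sizes by study arm. The 20 per cent threshold follows the text; the function name, data structure and example figures are hypothetical. The sketch does not test whether attrition is correlated with outcomes, which would require individual-level data.

```python
def attrition_summary(baseline, endline, high_threshold=0.20):
    """Summarise attrition by study arm.

    baseline, endline -- dicts mapping arm name to number of respondents
    high_threshold    -- overall attrition above this share is flagged as high
    """
    if set(baseline) != set(endline):
        raise ValueError("baseline and endline must cover the same arms")

    # Share of the baseline sample lost in each arm.
    by_arm = {
        arm: (baseline[arm] - endline[arm]) / baseline[arm] for arm in baseline
    }
    total_baseline = sum(baseline.values())
    overall = (total_baseline - sum(endline.values())) / total_baseline

    # Differential attrition: gap between the highest and lowest arm rates.
    differential = max(by_arm.values()) - min(by_arm.values())

    return {
        "overall": overall,
        "by_arm": by_arm,
        "differential": differential,
        "high_overall": overall > high_threshold,
    }


if __name__ == "__main__":
    summary = attrition_summary(
        {"treatment": 500, "control": 500},
        {"treatment": 400, "control": 300},
    )
    print(summary)
```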
An example of an unclear case is Giné et al. (2018) analysis of the selection criteria and convincingly argue that all characteristics that might affect outcomes were controlled for in the analysis. For these outcomes, the presence of unobservable characteristics that might affect the outcomes is unlikely, therefore these outcomes were rated as probably free from selection bias.

Deviations from intended interventions
Any spillovers from one study group to the other, contamination of the study by another programme, or non-compliance with the assigned intervention status were assessed under deviations from intended interventions. Only two randomised studies have outcomes that had high risks of deviations. One of the outcomes in Giné et al.

Performance bias
Another potential bias occurring during the data collection process is performance bias: the fact that monitoring participants influences their behaviours because they are aware of being watched (Hawthorne effect). A majority of randomised studies are protected from this bias (56%). When a process evaluation of the intervention was conducted  it was done on a subsample of the treatment group. Banerjee et al. (2014), which was also at risk of motivation bias due to the decoy visits used as a monitoring technique, overcame this risk by adding a pure control study arm (placebo group), free from monitoring visits.

Outcome measurement bias
With regards to outcome measurement bias, which refers to cases where the way the outcome is being measured differs between treatment and control participants as a result of the intervention, it is worth noting that around 65 per cent of the primary outcomes in these studies are self-reported, increasing their exposure to bias. The bias could also come from the outcome assessors, if they know the respondent's treatment status. This could still be a risk for all studies because none of them blinded outcome assessors except one (Pandey et al., 2007).

Analysis and reporting bias
The randomised study designs ensured comparability of groups for the analysis of almost all outcomes. As a result, 70 per cent of all outcomes in randomised studies were free or probably free from confounding. However, depending on the sample size and the randomisation procedures, some imbalances can occur by chance.
The majority of authors identified these imbalances and controlled for relevant variables in the analysis method, whereas in 26 per cent of the cases, it was not clear whether imbalanced variables were controlled in adjusted analysis.
Although 12 out of the 14 non-randomised studies used the appropriate method to control for group differences given the data available, the existing selection bias into the programme and the lack of baseline data explains why more than 60 per cent of studies did Out of all studies, only one blinded data analysts to the treatment .
Overall, for randomised and non-randomised studies alike, there is a lack of transparency and reporting. Non-randomised studies do not systematically report results using different analysis methods and specifications, which is often key to assessing the robustness of their model. Three studies out of eight using statistical matching reported estimation from different matching techniques. The existence of a pre-analysis plan, published before the start of the analysis, or a trial registration is rare across all types of studies. None of the non-randomised studies and only three randomised studies reported having registered the trial or a list of outcomes (Banerjee et al., 2018; Pandey et al., 2007). Only three studies reported having published a pre-analysis plan (Beath et al., 2013; Grossman et al., 2017). The 42 per cent of randomised studies rated as probably free from analysis reporting bias are studies that were reported transparently but did not register a trial, outcomes or a pre-analysis plan; we therefore cannot be certain that all relevant analyses are reported. Finally, two randomised studies failed to report analysis differentiating treatment arms (Alhassan et al., 2015; Kasim, 2016).
More generally, as Figure 7 and Figure 8 illustrate, there is, for all criteria, a share of studies and outcomes which could not be assessed because of a lack of information (grey areas). Overall, it is sometimes the case that there is some doubt about a risk of bias, which could have been eliminated if more information on the issue was provided.
These issues were particularly problematic for method of assignment (randomisation procedures), reporting of baseline data and attrition.

| Research ethics
We also captured information on whether the paper explicitly stated that the authors had ethical clearance to undertake the study. Of the 35 included studies, the majority (28) collected primary data for analysis. However, just three of the included studies reported that they had sought and received ethical clearance for their studies. The rest did not report whether ethical clearance to undertake the research was sought or granted; they may well have done, but they simply do not indicate whether this was the case in the country where the data were being collected and (if different) where the research team was based. In addition, we looked for declarations of interest in the included studies, to capture for example whether any of the authors were related in any way to the funding or implementing institution. We found that only two studies included conflict of interest statements. In 18 of the studies, the authors did not include a statement or did not present a statement that clearly reported on possible conflicts (known or unknown) for all authors.

| External validity
Several factors need to be taken into account when assessing the external validity of studies such as the approach used by researchers to select the study population, whether the programme implemented was a small scale pilot or a large scale established program, and the characteristics of the population and setting of the study. We captured information on the sampling strategies, as well as authors' discussions of generalisability of their findings.

Selection of the study population
We identified nine studies in which random sampling was used to either select the study's geographical areas such as regions and districts, or select the clusters or units of treatment such as communities, facilities and villages. Twenty-one used purposive sampling and four did not provide enough information on their method or the origin of the data set used. Table 6 shows which studies have used each of the sampling strategies, and separates the results by treatment assignment mechanism and whether survey respondents were randomly sampled.
Knowledge of the sampling method is not sufficient on its own, and more attention to each study is needed to be able to conclude on the representativeness of the populations selected. Of the studies which used random sampling, three did not include randomly where survey or administrative data was already available from previous studies were selected to be part of the evaluation (Banerjee et al., 2018; Pandey et al., 2007; Goncalves, 2013). Finally, three studies evaluating the impact of an established programme were restricted to the area or communities where the NGO or the government was implementing or had had the program (Beath et al., 2013; Giné et al., 2018; Molina, 2014).

Author discussion of external validity
We found 11 studies where authors specifically discussed external

| Summary of findings from critical appraisal
The quality of evidence from randomised studies is relatively high compared to non-randomised studies, and easier to assess due to standards of reporting for those studies. Prospective randomised study design helped ensure comparability of intervention and control groups according to observable characteristics, and protected against threats from selection bias into the study in 70 per cent of the cases.
For these studies, threats to internal validity are therefore more or participants) is not attempted or impossible. A majority of the non-randomised studies did not provide enough information on the selection process into the programme to reject the risk of selection bias, or failed to overcome the selection bias and confounding that was identified. Transparency in reporting is an issue for randomised and non-randomised studies alike given the limited pre-registrations of trial, outcomes or analysis plans. The use of methods such as placebo outcomes or groups, and blinding for outcome assessors or data analysts, is not common, though it seems relatively easy to implement and could reduce risks of biases. With regards to external validity, four studies still do not report their sampling strategies clearly, and a surprisingly small share of all studies specifically discuss the extent or limits to generalisability of their findings.

| RESULTS OF META-ANALYSIS (REVIEW QUESTIONS 1-3)
In this section, we describe the quantitative dataset and outcome variables classification. We present the results of meta-analysis across all included studies, by primary outcomes along the causal chain (review question 1). We then examine findings for secondary outcomes (review question 2). In both instances, we assess the extent to which findings are homogeneous for groups of interventions that aim to address different participation, inclusion, transparency and accountability mechanisms. Finally, we further examine heterogeneity according to context and implementation factors, as well as differential effects for sub-groups of participants such as poor people (review question 3).
As discussed, we collected all effect estimates from each included study, on any eligible outcome, population sub-group or specification.
Hence for some studies we collected large numbers of effects. Figure 9 presents the number of effect estimates collected from each study that we were able to incorporate in meta-analysis.
In total the 35 studies yielded 618 estimates of programme impacts that we incorporated in meta-analysis. All studies provided usable data for effect size calculations. In cases where pooled standard deviations were not available, we had to rely on t-statistic However, the majority of studies presented far fewer effect estimates, usually less than 20.
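Where only a t-statistic and group sample sizes are reported, a standardised mean difference can still be recovered. The sketch below assumes the standard conversion for a comparison of two independent groups, d = t × sqrt(1/n1 + 1/n2), followed by the Hedges small-sample correction to obtain g. It ignores clustering and covariate adjustment, and the function name is illustrative rather than the review's own procedure.

```python
import math


def hedges_g_from_t(t, n_treat, n_control):
    """Convert a two-group t-statistic into Hedges' g and its variance.

    Assumes two independent groups and no clustering (illustrative only).
    """
    if n_treat < 2 or n_control < 2:
        raise ValueError("each group needs at least two observations")

    # Cohen's d implied by the t-statistic.
    d = t * math.sqrt(1.0 / n_treat + 1.0 / n_control)

    # Small-sample correction factor J.
    df = n_treat + n_control - 2
    j = 1.0 - 3.0 / (4.0 * df - 1.0)

    n_total = n_treat + n_control
    var_d = n_total / (n_treat * n_control) + d ** 2 / (2.0 * n_total)

    return j * d, j ** 2 * var_d


if __name__ == "__main__":
    g, var_g = hedges_g_from_t(2.0, 50, 50)
    print(f"g = {g:.4f}, variance = {var_g:.5f}")
```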
We assigned specific sub-categories of outcomes (e.g. participation in meetings) to causal chain outcome groupings: intermediate outcomes (service access, service use and attitudes to services), final outcomes (wellbeing and state-society relations) (review question 1), and immediate outcomes (user engagement and provider response) (review question 2). Figure 10 presents the number of effect sizes collected for each outcome, together with the distribution of effect size estimates, showing the mean, minimum and maximum values of g.
We drew on a recent review of community-driven development (White et al., 2018) in informing the outcome groupings along the causal chain, as presented here. Table 7 presents the detailed description of variables included under each outcome area. As these may differ by projects, these are presented by main sector (health, social protection, justice and security, local infrastructure and economy, and natural resources). The full list of variables collected under each outcome category is presented in Appendix 6.

| Meta-analysis of intermediate and final outcomes (review question 1)
We present findings by primary outcome group and subgroups along the results chain (intermediate and final outcomes). In each subsection, we first present an overview of the different outcome metrics used in each study included in meta-analysis (for the full list, see Appendix 6) and then present the subsequent meta-analysis results including forest plots. When presenting the meta-analysis, we present sensitivity analyses to disaggregate findings by study design (whether randomised or non-randomised) and risk of bias status.
Owing to the large number of outcomes collected, we present all effect sizes as standardised mean differences for ease of presentation. 5 The total number of study participants across all studies included in the analysis is 62,500.
In general, the findings suggest that the interventions can be effective ways of boosting citizen engagement in service delivery governance and access to public services. But the evidence does not suggest that outcomes further along the results chain typically improve as a result of interventions to promote citizen engagement.
In a few cases, particularly in health and infrastructure, there may be increases in service use and some wellbeing outcomes. For state-society relations, payment of taxes may increase. workers (Giné et al., 2018). Access is also measured through costs to consumers in two studies: subsidies received (Banerjee et al., 2018) and user fees paid in health (Giné et al., 2018).

| Service access
Quality of service provision was assessed through measures of service provision performance such as whether there are employees in the Anganwadi or agricultural extension visits occur (Ananthpur 5 As requested by the methods reviewer, we also present odds ratios for dichotomous outcomes in Appendix 6 Figure  which we report separately. The final measure of quality in service delivery was measured by leakages of public goods from road construction  and food aid (Beath et al., 2013).
The overall findings suggest some improvement in access for some measures of service delivery (Figure 11). This is demonstrated by an increase in average effects of physical access (SMD=0.08, 95% confidence interval (CI)=−0.11, 0.24; 2 studies).
There was significant heterogeneity which we explored in sensitivity analysis (Table 8; forest plots presented in Appendix 6).
The results indicate that the findings for non-randomised studies tend to be bigger than those for RCTs, while results for risk of bias such as use of antenatal and postnatal care (Grossman, 2017;Gullo et al., 2017). In one social protection study, the authors measured participation in employment services (Ravallion et al., 2013).
The results of the meta-analysis ( Figure 12 There also appeared to be significant heterogeneity in the findings although this was not related to study design or risk of bias (Table 9).

| State-society relations
A few studies also measured the category of variable we have referred to ( (Table 11).

| Meta-analysis of immediate outcomes (review question 2)
We grouped immediate outcomes into user engagement and provider response, in order to break down the mechanisms through which interventions operate. In general, the findings suggest that citizen engagement interventions can be effective ways of boosting user engagement in service delivery governance, but not typically provider responsiveness. We conclude that we are able to go some way to explaining intervention mechanisms on demand and supply sides, articulating that the interventions are mainly successful in improving demand (user engagement) and not supply (provider engagement). However, heterogeneity in findings needs further explanation, which we return to below in moderator meta-analysis and framework synthesis. It is worth noting that because these are secondary outcomes, which are reported in studies that also measure primary outcomes, the findings for immediate outcomes are only generalisable to the population of studies that also report intermediate and final outcomes.

Figure 11: Forest plots showing service access outcomes. Note: * effect sizes for negative outcomes are inverted for comparability.

| User engagement
We first present overall findings for user engagement (Figure 15). The two studies measuring knowledge about intervention processes did not find significant effects (SMD=0.01, 95%CI=−0.11, 0.11; 2 studies).
There was some heterogeneity in the findings, which we explored in sensitivity analysis (Table 12). Most of the studies are RCTs, so exploring differences by design was not especially useful. The findings suggested that low risk of bias studies tended to have bigger effects than higher risk of bias studies.

| Provider response
We categorised provider response variables into groups of related outcomes. A number of studies measured changes in public spending in health (Björkman Nyqvist et al., 2017; Grossman, 2017; Touchton & Wampler, 2015) or more generally (Beuermann & Amelina, 2014; Goncalves, 2013; Grossman & Michelitch, 2018). We also defined other provider actions relating to the citizen engagement intervention, such as holding meetings (Pandey et al., 2007) or adopting processes like participatory budgeting (Timmons et al., 2015); or resulting from the engagement, such as activities carried out by staff (Ananthpur et al., 2014; Björkman Nyqvist et al., 2017; Diaz-Cayeros, 2014) and projects selected (Humphreys et al., 2014). Two studies further measured variables relating to self-motivation of staff governing the intervention (Alhassan et al., 2016; Bradley et al., 2005) Ravallion et al., 2013, and (Figure 17).
Overall, the interventions do not necessarily improve provider responsiveness, although there is a significant improvement in the case of food subsidies in Indonesia (Banerjee et al., 2018). Service access also improves (SMD=0.11, 95%CI=0.05, 0.17; 2 studies) and costs fall (SMD=0.14, 95%CI=0.08, 0.21; one study). Only one study (Kasim, 2016) measured any wellbeing or state-society relations outcomes, and was not able to report any significant changes (Figure 18).
(Figure 19, Figure 20). There is a partial exception in the case of one study (Capuno and Garcia, 2010).
Table 9. Service use and satisfaction by study design and intervention.

| Participatory planning
For participatory planning interventions, where seven studies measured a range of interventions, the story is mixed but largely not a positive one.
Physical access to services improves on average (SMD=0.10, 95% CI=0.03, 0.18; 3 studies) (Figure 21). A few other outcomes are positive but not statistically significant, for example quality of service delivery (SMD=0.08, 95%CI=−0.02, 0.18; 2 studies) and use of health services and morbidity in Giné et al. (2018). In general, however, the evidence does not support increases in outcomes for other intermediate and final outcomes, for any low risk of bias study groups.
Only one study was able to measure user engagement outcomes (Ananthpur et al., 2014). However, it is noteworthy that a relatively large number of studies that measured service access and wellbeing outcomes also measured provider response outcomes (Figure 22).

| Citizen feedback mechanisms
The story for citizen feedback mechanisms is more positive, although there is significant heterogeneity in the findings. For evaluations that also measure primary outcomes, citizen engagement improves for active participation (SMD=0.14, 95%CI=0.05, 0.24; four studies) and in one study that measured meeting attendance. The meta-analyses also did not suggest positive improvements in provider responsiveness on average, although some individual studies reported positive effects for provider actions and staff motivation (Bradley et al., 2005) (Figure 23). Several service access and use outcomes were assessed as having increased on average but not statistically significantly, including service quality (SMD=0.19, 95%CI=−0.01, 0.39; 7 studies) and user satisfaction (SMD=0.13, 95%CI=−0.04, 0.30; six studies). Finally, a few single studies reported positive wellbeing outcomes for reducing illness (Duku et al., 2018) and crime (Palladium, 2015), and improving empowerment and assets (Figure 24). Only one study measured state-society relations outcomes and was not able to detect significant changes due to citizen feedback mechanisms.
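The distinction drawn above between pooled effects that are and are not statistically significant can be checked directly from the reported intervals. The following is a minimal sketch, assuming the intervals are symmetric normal-theory 95% intervals (an assumption; the review's software may have used a different method), with the function names `se_from_ci` and `two_sided_p` introduced here purely for illustration.

```python
from math import erf, sqrt

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def se_from_ci(lower, upper, z_crit=Z_95):
    """Standard error implied by a symmetric normal-theory confidence interval."""
    return (upper - lower) / (2.0 * z_crit)


def two_sided_p(smd, se):
    """Two-sided p-value for H0: SMD = 0 under a normal approximation."""
    z = abs(smd) / se
    normal_cdf = 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return 2.0 * (1.0 - normal_cdf)


# Pooled estimates as reported in the text: (SMD, CI lower, CI upper).
reported = {
    "active participation": (0.14, 0.05, 0.24),
    "service quality": (0.19, -0.01, 0.39),
    "user satisfaction": (0.13, -0.04, 0.30),
}

for outcome, (smd, lower, upper) in reported.items():
    se = se_from_ci(lower, upper)
    p = two_sided_p(smd, se)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{outcome}: SE={se:.3f}, p={p:.3f} ({verdict} at the 5% level)")
```

Because the reported values are rounded to two decimals, the implied standard errors are approximate and should be read as a consistency check rather than a re-analysis.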

| Community based natural resources management
To some extent the findings for CBNRM are less convincing than those for other interventions, because in the main the included studies were assessed as being of high risk of bias, largely on design grounds (the exception is the RCT by Barde et al., 2017). The findings from meta-analysis are presented in Figure 25 and Figure 26.

| Impacts by population group (review question 3)
This section presents results of sub-group analysis for studies that report outcomes measured among different groups, including men, women and poor households. In addition, it presents further moderator analysis for whether interventions had an inclusiveness component by design and reporting outcomes by global region.
Three studies collected outcomes data measured separately among women and men (Ananthpur et al., 2014; Palladium, 2015; Ravallion et al., 2013) and a further five studies reported sub-group outcomes solely for women (Beath et al., 2013; Diaz-Cayeros et al., 2014; Humphreys et al., 2014). The results, presented in Table 14 (see Appendix 6 for forest plots), do not suggest there are differences in outcomes by sex, where outcomes for both men and women are reported in the same studies. There are differences in magnitude in a few cases, such as the two studies of employment (Ravallion et al., 2013) and local governance (Ananthpur et al., 2014) in India. Indeed, Ananthpur et al. (2014) suggest that the positive wellbeing outcomes measured among men are not seen among women. However, there are very few observations where studies report sex-disaggregated effects, and when they do the confidence intervals overlap, so any differences cannot be interpreted as statistically significant.
Three studies reported outcomes for poor households (Banerjee et al., 2018; Pandey et al., 2007; Persha & Meshack, 2017). In the case of Banerjee et al. (2018), the intervention targeted the poorest decile. In the case of Pandey et al. (2007) and Persha and Meshack (2017), outcomes are presented separately for lower-caste communities and poor households. The findings suggest that outcomes for poor households are often positive and statistically significant (Table 15).
However, there are too few observations to draw conclusions, other than that studies must more consistently present results of sub-group analysis. Even where significant effects are not reported due to underpowered analyses, statistical synthesis (meta-analysis) can be undertaken to detect possible effects across studies.
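To illustrate why synthesis can detect effects that individually underpowered sub-group analyses miss, the sketch below pools three effect sizes by fixed-effect inverse-variance weighting. The numbers are hypothetical and are not taken from the included studies, and fixed-effect pooling is used only because it is the simplest variant; it is not necessarily the model used in this review.

```python
from math import sqrt


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return pooled, pooled_se


def ci95(estimate, se):
    """Normal-theory 95% confidence interval."""
    return estimate - 1.96 * se, estimate + 1.96 * se


# Hypothetical sub-group SMDs and standard errors (illustration only).
effects = [0.15, 0.12, 0.18]
std_errors = [0.09, 0.08, 0.10]

for e, se in zip(effects, std_errors):
    lower, upper = ci95(e, se)
    print(f"single study: SMD={e:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")

pooled, pooled_se = pool_fixed_effect(effects, std_errors)
lower, upper = ci95(pooled, pooled_se)
print(f"pooled: SMD={pooled:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

In this constructed example every single-study interval includes zero, while the pooled interval does not, because pooling reduces the standard error.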
Finally, we conducted analysis by global region (Appendix 6).
Bearing in mind that the analyses are likely to be confounded by other characteristics such as intervention type, we note simply that the analysis suggests that interventions conducted in East Asia and Pacific and in South Asia are more likely to have significant effects than those conducted elsewhere.

| Publication bias analysis
This section presents results of the analysis of small study effects. Figure 27 presents contour-enhanced funnel graphs for all study designs (part a) and for RCTs only (part b). There does appear to be asymmetry in the plot, which is markedly less for RCTs than for all study designs. This may support Peters et al.'s (2008) contention that bias may confound attribution of small study effects to publication bias. Egger et al.'s (1997) test also did not find significant evidence for publication bias (Table 16).
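For readers unfamiliar with the regression test for funnel-plot asymmetry, the following is a minimal sketch of the standard Egger-type regression: the standardised effect (effect divided by its standard error) is regressed on precision (the inverse of the standard error), and an intercept far from zero indicates small study effects. The data are hypothetical, the function name `egger_regression` is introduced here for illustration, and this is not necessarily the exact specification behind Table 16.

```python
from math import sqrt


def egger_regression(effects, std_errors):
    """OLS of standardised effect (effect/SE) on precision (1/SE).

    Returns (intercept, slope, intercept_se). The intercept captures
    funnel-plot asymmetry; the slope estimates the underlying effect.
    """
    y = [e / s for e, s in zip(effects, std_errors)]
    x = [1.0 / s for s in std_errors]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = sse / (n - 2)
    intercept_se = sqrt(resid_var * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, slope, intercept_se


# Hypothetical effect sizes and standard errors, not data from the review.
effects = [0.30, 0.22, 0.15, 0.12, 0.08, 0.10]
std_errors = [0.20, 0.16, 0.12, 0.09, 0.06, 0.05]

intercept, slope, intercept_se = egger_regression(effects, std_errors)
t_stat = intercept / intercept_se  # compare with a t distribution, n - 2 df
print(f"intercept={intercept:.3f}, SE={intercept_se:.3f}, t={t_stat:.2f}")
```

The test has low power when few studies are available, which is one reason it is usually read alongside the funnel plot rather than on its own.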

| RESULTS OF FRAMEWORK SYNTHESIS (REVIEW QUESTION 4)
The following section presents the analysis of context and mechanisms that may contribute to findings along the causal chain (review question 4). We present the findings of a qualitative, realist-informed framework synthesis that moves toward "best fit" framework synthesis, focusing on the key mechanisms and

| Rights information provision
Five studies comprised or included study arms of interventions that aimed to improve citizens' access to information about their rights to services (Table 17).
These studies look at the provision of information on rights to services that cover both merit goods (such as rice subsidies) and public goods (such as construction monitoring). Through providing citizens with information on their rights to services, including entitlements to both quality and quantity, these interventions aim to increase their realisation of their rights. The data extracted in the qualitative synthesis for these studies were reviewed to identify patterns of movements along the causal chain. Within this intervention group, three key factors emerged that helped explain the heterogeneity of results: whether the bottleneck to service access was correctly identified as demand-driven lack of information; whether the intervention targeted a collective or individual good; and whether the bottleneck was due to demand-driven lack of information about existing services or supply-driven rationing of service allocation or corruption. Case comparisons using the included studies are provided to illustrate the importance of these factors.
The first stage in the causal chain thus assumes that the underlying bottleneck to citizens' access to services is a lack of information.
Table 13. Provider response outcomes by study design and intervention.
In comparison, in Ravallion et al. (2013), though the researchers conducted qualitative research during the design phase to ensure their video would be salient to the rural, poor population targeted, and identified low levels of knowledge of their rights to the labour subsidy service, the intervention ultimately had limited impacts on use of the rural guaranteed labour scheme amongst the targeted population. Subsequent research on the jobs programme suggests that the key barrier to citizens' access to the labour programme was actually rationing of access to jobs by administrators, triggering discouragement amongst potential workers (Narayanan, Das, Liu, & Barrett, 2017).
Figure 19. Forest plots showing immediate and intermediate outcomes for performance information. Note: * effect sizes for negative outcomes are inverted for comparability.
Figure 20. Forest plots showing final outcomes for performance information. Note: * effect sizes for negative outcomes are inverted for comparability.
A key theme throughout the transparency and accountability-related studies is the difference in mechanisms triggered by interventions depending on the nature of the service they were targeting, which related to how citizens accessed the service.
Broadly, the services could be split into two groups: "direct delivery" services and "indirect delivery" services. The first, "direct delivery," refers to those services that citizens access from individual service providers, such as the healthcare one receives from a clinician or the food subsidies one collects from the distributor. In these cases, citizens engage with the service provider staff on a regular basis as part of their normal service use. The second, "indirect delivery," refers to services that citizens access independently of the providers, such as public infrastructure that one uses without engaging with the contractors who built it. In this latter group, citizen engagement in service delivery tends to be limited to transparency/accountability interactions; in the absence of such processes, citizens may not otherwise interact with the providers at all.
Where the intervention targets a directly delivered service, such as the provision of rice subsidies, and the bottleneck is correctly identified as pertaining to lack of information on the demand-side, then the provision of information may suffice to improve the delivery of services to citizens. In Indonesia, Banerjee et al. (2018) present evidence suggesting that disseminating cards with information on citizens' rights to rice subsidies and standard costs was sufficient to change citizens' bargaining power with the service provider to increase the amount of subsidised rice they received. The authors highlight facilitating factors that triggered a significant change in response to a relatively small intervention, including:
• The salience of the information provided: rice is a staple of the Indonesian government, and the subsidised rice is significantly
villages, and the authors present qualitative evidence suggesting that government officials were cautious of sanctioning incomplete compliance too forcefully (Banerjee et al., 2018).
Conversely, in the case of indirectly delivered services, the ability of citizens to influence service providers appears much weaker. In , though the bottleneck was likely correctly identified as actors. This thus triggers demand-driven responses, and may explain the lack of evidence regarding service provider response that led to breaks in the causal chain. However, in cases where service use has a direct effect on wellbeing outcomes, the provision of rights-information may be able to achieve results further along the causal chain directly through inspiring changes in citizen use of services, despite failing to influence the quality of service provision.
For example, in India, Pandey et al. (2007) find that an information campaign on access to health services was successful in increasing citizens' knowledge of existing services that they could choose to access; unlike the video campaign for the guaranteed labour scheme, service allocation rationing was not an issue. However, though the campaign informed citizens of their rights and how to complain when service delivery did not meet quality standards, the authors present qualitative evidence suggesting that the lack of engagement with the supply-side actors throughout the intervention may have triggered a break in the causal chain for service provider response and service quality improvements (Pandey et al., 2007).
Following the synthesis process, the original framework was adapted to create a "best fit" framework that highlights the above-mentioned key mechanisms and moderating factors (Figure 28). Though the included studies within this intervention group did not include any instances in which an intervention targeting an indirectly delivered service was able to have an effect on service

| Performance information provision
Six studies comprised or included study arms of interventions that improved citizens' access to information about the performance of public service providers (Table 18).
These studies include interventions that provided performance information about both individual service providers in the form of elected politicians (Capuno & Garcia, 2010; Grossman & Michelitch, 2018; Humphreys & Weinstein, 2012) and service provider institutions (Timmons & Garfias, 2015; Banerjee et al., 2014). Through providing citizens with performance information, these interventions aim to trigger mechanisms in which service providers respond to a change in motivated citizens' efforts to hold them accountable to performance improvements. The data extracted in the qualitative synthesis for these studies were reviewed to identify patterns of movements along the causal chain. Within this intervention group, four key factors emerged that
The studies that evaluate the provision of performance information of elected politicians are included in this review because they attempt to make an explicit link between politician performance and service delivery and report the results on service delivery quality accordingly. As noted in the background section, the different spheres of governance interact, and in the case of these studies, the underlying theory is that changes to politician performance can be realised via informal processes of vertical accountability through a
The first key moderator identified through the synthesis along the causal chain for politician performance interventions is the influence of competition within an electoral constituency on politicians' behaviours. This mechanism is specifically tested in Grossman & Michelitch (2018), wherein they find that the intense dissemination of scorecards for politician performance only triggered an improvement in politician performance in electorally competitive constituencies.
Grossman & Michelitch (2018) provide contextual information suggesting that in Uganda, while the national-level politics are dominated by a single party, locally there is variation in relative competition for elected seats, which enabled them to test this mechanism. The findings in Humphreys & Weinstein (2012) support this theory; they find that while voters were strongly receptive to the disseminated performance information, it did not trigger improved performance amongst national-level MPs, who face minimal electoral competition.
This leads to the next key assumption: that the information provided is salient to citizens' decision-making. As noted, Humphreys & Weinstein (2012) found that while the information was salient to citizens' interests, it did not translate into changes in politicians' chances for re-election, thus suggesting that citizens' electoral decisions were based on factors other than politician performance. Grossman and Michelitch (2018) suggest that the salience of performance information to voters' decision-making depends on the political culture; in a context where voting is primarily along party, ethnic or religious lines, politician performance is unlikely to have a large impact on voters' actions. Given the Ugandan context of limited national-level electoral competition, this factor could also help explain the null results.
In determining whether the performance information provided is likely to be salient to constituents, the extent to which it changes their priors appears to be influential. This mechanism is tested by Timmons & Garfias (2015), who find that the publication of the results of a municipal government audit influenced the willingness to pay taxes for those constituents whose priors were changed by the audit results. This mechanism may also help explain the dissipation in results over time that Capuno & Garcia (2010) observe; in their intervention, performance information was regularly disseminated to constituents over two years, and while the intervention
Another potential explanatory factor between these two studies is the relative power difference between targeted supply-side actors (i.e. the politicians) and demand-side actors (constituents). It is reasonable to expect that there is a larger power difference between national-level MPs and their primarily rural constituents, compared to rural constituents and district-level councillors.
Thus, in the absence of the potential for electoral sanctions, politicians who enjoy a greater level of power difference compared to their constituents are more able to ignore increased transparency without fear of credible social sanctions.
Timmons & Garfias (2015) present some evidence that suggests that while elections are not the only mechanism at play in determining whether performance information dissemination triggers improvements in performance, the timing of information dissemination relative to elections does have some effect. The authors of all included studies evaluating elected politician performance note that the reactions to the dissemination of performance information for elected politicians are likely to be affected by whether they are up for re-election and the time until the next election. Grossman & Michelitch (2018) argue that performance information should be disseminated at such a time that the politicians have the scope to improve their performance before the next election, yet not so close to the election that a negative response (e.g. vote buying or intimidation) is potentially triggered.

Figure 27. Funnel graphs: (a) all study designs; (b) RCTs only.
Even where the information provided is salient to constituents' decision-making, the politicians may still manage to subvert the efforts to hold them accountable, either through preventing the dissemination of information or discrediting the messenger and/or the message. Across the included studies, whether this disruption occurred tended to depend on the extent to which the targeted supply-side actors were engaged in the intervention design; their support or "buy in" for the intervention; and the relative local credibility of the messenger of the performance information compared to the targeted actor or institution.
In Banerjee et al. (2014), the only included study which looked at non-elected service providers, the break in the causal chain occurred extremely early on, as the actors charged with implementing the intervention were the very ones whose performance was being measured, and they were able to successfully prevent effective implementation. The purpose of the community observer intervention was to increase citizens' understanding of the police performance and improve their perceptions, and it had been
The importance of the local credibility of the messenger can be understood by comparing the results of Capuno & Garcia (2010) with Humphreys & Weinstein (2012). In the latter, the information was developed and disseminated by a national-level NGO that did not necessarily have strong ties across all of the treatment constituencies. The authors present qualitative evidence from town hall meetings where the MP was effectively able to discredit the information presented by the NGO staff and undermine the message to such an extent that participants in the meetings had a worse estimation of their MP's performance compared to comparison groups (Humphreys and Weinstein, 2012). Conversely, in Capuno & Garcia (2010), the information was disseminated through local partners in each municipality, who were engaged in the process of gathering and analysing the performance data as well. In some cases, the researchers actually worked through the LGUs to present the data, yet even in those where the local partner presented the results, the local partners' strong ties to the community reduced the politicians' ability to "shoot the messenger" (Capuno & Garcia, 2010).
Incorporating these insights into the framework, the following refined theory of change presents an improved "fit" framework for performance information interventions (Figure 29). While the only included study to investigate performance information dissemination on service delivery through non-elected actors failed at the first stage of the causal chain, as described above, we nonetheless suspect that should the support of targeted service providers be secured for an intervention, the causal chain for these interventions would likely mimic that of rights information provision. Note that the results chain from interventions targeting elected politicians through to service delivery is quite long. The final barrier to move from changes in politician performance to improvements in service delivery was not reached in any of the included studies. Grossman and Michelitch (2018) suggest that this may be because improvements in service delivery cannot be the result of changes to a single actor (the politician); rather, they rely on multiple actors who may have limited to no direct accountability to the targeted politician. This suggests the relative weakness of interventions that aim to affect service delivery through changes to politician performance.

| Citizen feedback and monitoring
Ten studies covered interventions that created or strengthened citizens' access to feedback and monitoring processes for public services (Table 19).
This intervention group included the largest sample of included studies, though there were key differences in the intervention components that influenced the causal chains, particularly related to the nature of the public service that was targeted. Of the included interventions, four targeted healthcare, a directly delivered service, three targeted infrastructure, an indirectly delivered service, one targeted the security services, and two targeted a mixture of services. Regarding the nature of the intervention approach, two studies offered only community feedback opportunities: Grossman et al. (2017) and Bradley and Igras (2005).
Figure 29. Theory of change for interventions providing information on individual and institutional service provider performance.
The rest comprised a version of community-based monitoring, yet differed as to whether the monitoring comprised a collaborative process engaging both citizens and service providers or provided support only to citizens; whether the accountability or "interface" meetings between providers and citizens were facilitated; whether performance information was provided, and if so, if it was generated by the community or provided by external researchers; and whether technical training on monitoring for the particular service was provided to communities. The ramifications of these differences are discussed in depth below.
The framework synthesis identified two key moderating factors that influenced the causal chain and five common facilitators.
Moderating factors included: 1) the type of service targeted, as above, and whether for indirectly delivered services, some additional support was provided to shift the power difference between service providers and citizens, either through well-respected civil society or government engagement; and 2) collaborative versus confrontational approaches. The common facilitators included the provision of technical monitoring skills; access to contracts and other key information; the inclusion of provider performance information; the incorporation of a dedicated community monitoring group; and the creation of common knowledge of provider performance.
As with the other accountability and transparency-for-accountability interventions, the nature of the service being targeted appeared to be a key moderating factor within the causal chains. Similarly to , the interventions evaluated in Molina (2014) and Grossman (2017), both of which rely on engagement with unorganised citizens, were unable to realise significant improvements in public service delivery, despite achievements in triggering citizen engagement with the respective platforms. Conversely, even in a challenging context such as the DRC, in Palladium (2015), the implementer's work with local civil society led to greater-than-expected project success in organising and hosting well-attended community fora to encourage citizen engagement with the security sector. The evaluation presented qualitative evidence that suggested that participation in these fora had positively impacted people's perceptions of security and the security sector (Palladium, 2015). The role of civil society support to communities may be critical not only for encouraging engagement in monitoring and accountability processes, but also for shifting the balance of power between citizens and public service providers of indirectly delivered services.
In Berman et al. (2017), the authors present evidence from qualitative research to test the underlying mechanisms, which found
Similarly, in analysing citizen audits of construction projects in Colombia, Molina (2014) presents evidence that suggests that low participation in monitoring opportunities prevented the creation of common knowledge about the projects, which in turn discouraged politicians and service providers from adhering to quality standards, which he refers to as the "self-fulfilling prophecy" phenomenon. The qualitative evidence further stresses the importance of the technical training to enable the monitors to effectively identify whether the construction was of sufficient quality or not (Berman et al., 2017); such technical training was absent from the intervention studied in Molina (2014). Thus, it may be that a dedicated monitoring group, with a mandate from the community and technical training in monitoring the service targeted, could have a greater impact than an open-forum type of intervention as in Molina (2014) and, and as noted above, the intervention studied in  ultimately adopted the approach of establishing and training a dedicated group of community monitors (World Bank, 2011).

However, in Fiala and Premand
Amongst the sample of community feedback and monitoring interventions, a unique feature of those targeting healthcare services was a focus on a collaborative process that engaged both supply and demand-side actors, i.e. both community members and frontline health centre staff. This set the group apart from the other interventions, which focused on training and/or creating opportunities for citizens to hold providers accountable through dedicated accountability meetings. This included both public Town-Hall style meetings, as in Molina (2014),  and Palladium (2015),
To attempt to explain the black box of intervention and outcome, Alhassan et al. (2016) tested the underlying mechanisms for service provider motivation, and found that the service providers working in rural health clinics were highly intrinsically motivated, and through the collaborative engagement with the community, increased their intrinsic motivation. This suggests that monitoring interventions that rely on the "soft" power of social sanctions may be more effective when they focus on identifying mutually empowering "win-win" opportunities and ways for citizens and service providers to work together. This theory is also supported by qualitative evidence presented in Bradley and Igras (2005), wherein healthcare staff reported that the empowering process of local problem identification and solving had a strong impact on their attitudes, and led to changes in the way they engaged with each other and with community members. The increased sense of self-efficacy built through this type of approach may extend to the community members, who see the responses to their efforts enacted by service providers, as suggested by Gullo et al. (2017).
The relationship between service providers and users may also be strengthened through the facilitated, collaborative approach because while learning about their service entitlements and identifying opportunities for improvement, citizens also learn more of the intricacies and challenges in service delivery, which may enable them to mitigate their expectations and be more understanding of the frontline staff. Gullo et al. (2017) suggest that the more realistic expectations held by households in treatment communities may account for their increased satisfaction with the health services, despite the context in which there were serious issues in health service supply chains due to a national-level scandal, which led to decreasing satisfaction with health services in control communities.
A final key facilitator in community monitoring interventions is the benefits wrought by including performance measurement information into the intervention. In Björkman Nyqvist et al. (2017) this was done by external researchers and research assistants, who gathered the data and presented it to communities in a digestible and locally appropriate way. This was a very thorough approach, but it has made replication challenging, an issue the authors identify (Björkman Nyqvist et al. 2017). In Alhassan et al. (2016), the implementers worked with the community groups to support them to undertake the performance assessment, which they then used to identify the key opportunities for improvement. CARE's Community Scorecard methodology takes this further, working with communities to create a localised scorecard in which communities develop their own list of priorities and indicators (Gullo et al., 2017). In comparing their two treatment arms, wherein the difference was access to performance information , Björkman Nyqvist et al. (2017) present evidence suggesting that having information on performance and benchmarks was critical for enabling communities to identify realistic opportunities for service improvements. Conversely, in Palladium (2015), which didn't include any performance information, though perceptions of security rose amongst participants, the study did not find evidence of improved service delivery outcomes, and conclude that changes in perceptions may occur more quickly than changes in service delivery (2015). Fiala and Premand, in a study arm comprising only interventions in livestock provision, also find that the inclusion of both community monitoring support and performance information is critical to achieving positive impacts on household assets (2017).
Through the framework synthesis, the key mechanisms, barriers and facilitators were collected and used to refine the theory of change for this group of interventions (Figure 30).

| Participatory planning
Eight studies evaluated seven interventions or policies that created or strengthened citizens' access to participatory planning processes (  (Ananthpur et al., 2014; Beuermann & Amelina, 2014); two studies measure the effect of mandating women's inclusion in participatory planning (Beath et al., 2013; Humphreys et al., 2014); and one study measures the effect of participatory planning training on citizens' empowerment to demand services (Giné et al., 2018). Grouped differently, five of the studies look at interventions wherein citizens engage in government planning processes (Gonçalves, 2013; Ananthpur et al., 2014; Beuermann & Amelina, 2014; and Diaz-Cayeros et al., 2014), and three of them pertain to interventions wherein citizens are engaged in community-driven development (CDD) types of deliberations (Beath et al., 2013; Humphreys et al., 2014; and Giné et al., 2018). Through engaging citizens in the identification of priorities and allocation of resources, these interventions aim to improve the responsiveness of service delivery to citizens' prioritised needs, particularly for vulnerable groups.
The data extracted in the qualitative synthesis for these studies were reviewed to identify patterns of movement along the causal chain. Within this intervention group, four key factors emerged that helped explain the heterogeneity of results: the extent to which the intervention correctly identified and adequately addressed barriers to participation for vulnerable groups; the extent to which the intervention process was designed to encourage the growth of local social capital and capacity for collective action; the extent to which the local government or decision-making body supported the process and had the capacity to implement it; and the incorporation of explicit measures to facilitate the inclusion of vulnerable groups.
Case comparisons using the included studies are provided to illustrate the importance of these factors.
As noted above, a key goal of participatory planning processes is frequently to ensure the priorities of vulnerable and marginalised members of society are incorporated into decision-making. As  understanding of local practices is required in selecting an appropriate intervention to address an identified barrier.
Understanding and adequately addressing the power gap between "status quo" participants in decision-making processes and those excluded may be key to addressing participation barriers. In Giné et al. (2018), communities that received community-driven development training were evaluated to ascertain the effects of this sector-non-specific training on citizens' capacity to demand public service provision. The training covered elements of participatory development planning, and communities were organised and mobilised to prepare for project implementation. They found that the intervention had a significant effect on the provision of health services by the local "Lady Health Workers" (LHWs), which they attribute to the growth in collective action capabilities amongst women participants, who had indicated at baseline that healthcare was a priority concern (Giné et al., 2018). However, LHWs are local women from the village in a conservative area wherein women are frequently disempowered; the relative difference in power, therefore, between the LHWs and the other villagers is extremely small.
Thus, in this context, an intervention that was designed to be empowering but did not specifically address people's capacity to demand health services nonetheless had an effect, given that the women had indicated that health was their priority area of focus and the relative power difference between village women and LHWs was minimal. It is telling that the study found limited to no effects on health services at the health centre level (ibid.). This is in stark suggested that the explicit measures adopted by some municipalities were critical to opening up the process to diverse disadvantaged groups (ibid.).
In Diaz-Cayeros et al. (2014), the authors note mixed effects of the intervention on women's participation in local governance.
On the one hand, quantitative evidence suggested that the switch from political-party based to traditional governance systems led to a decrease in the number of women in senior municipal government positions, yet the authors also found qualitative evidence that women's participation in traditional governance processes was slowly increasing (Diaz-Cayeros et al., 2014). In this last case, the intervention (the shift to traditional governance) was not imposed by an external party but rather chosen by the community. While the externally-imposed processes evaluated in Humphreys et al. (2014) and Beath et al. (2013) misidentified the local barrier and appropriate response, respec- Incorporating these insights into the framework, the following refined theory of change presents an improved "fit" framework for performance information interventions (Figure 31).

| Community-based natural resource management
Seven studies covered interventions that created or strengthened citizens' capacity to manage full or close-to-full decentralisation of service delivery (Table 21).
The included studies in this intervention group are quite different from those in the previous groups, as the service provision has been decentralised to such an extent that communities themselves are both the user and the provider. This fundamentally shifts the power dynamics at play, complicating the delineation between supply-side and demand-side actors. Community-based natural resource management (CBNRM) interventions aim to improve communities' sustainable access to resources through increasing their control over resource management and maintenance. The complexities and tensions involved in marrying the dual goals of resource use and preservation are evident throughout the interventions, which cover wildlife conservancy (Bandyopadhyay et al. 2004); participatory forestry management (Rasolofoson et al. 2015; and Tachibana & Adhikari 2009); and irrigation or water use (Bandyopadhyay et al. 2010; Barde 2017; and Huang 2014). Each of these studies evaluates the implementation of a national-level policy; such policies tend to show smaller results than pilots or experiments, wherein the quality and uniformity of implementation are more easily managed.
A key moderator identified early in the causal chain for these interventions is the extent to which the policy constitutes a relinquishment of government control over the productive resource.
FIGURE 31 Theory of change for participatory planning and priority setting interventions

present qualitative data suggesting that local officials selected a mixture of the policies that best suited their interests, rather than the interests of communities (2015). This sensitivity to capture by government officials substantially decreases the potential benefits communities may realise through CBNRM.
The success of CBNRM further depends on the type of resource use in which communities engage, and their capacity to enforce the rules. In a qualitative study of the community conservancies evaluated by Bandyopadhyay et al. (2004), participants highlighted the challenge of preventing poaching in areas frequented by migrants. Such high-stakes monitoring may be beyond the capacity of communities to enforce, particularly without resorting to violence. Conversely, Tachibana  The synthesis of included studies and additional texts suggests that key factors for success in CBNRM interventions may rest on full legalisation of the communities' ownership of resource benefits; the injection of donor funds to catalyse the change in resource use (Barnes, MacGregor, & Weaver, 2002); sustained external support to enable the community groups to institutionalise slowly over years; and the presence of tourism opportunities for communities to undertake alternative livelihoods (Barnes et al., 2002). As a result of the synthesis process, the theory of change was refined for CBNRM interventions. While the causal chain appears relatively linear, the large number of moderators, assumptions and identified barriers and bottlenecks, combined with the often weak results from the evaluations, suggests that these interventions are extremely difficult to carry out at national scale (Figure 32).

| Common cross-cutting factors and integrated synthesis
Many interventions experienced challenges stemming from a lack of positive engagement with supply-side actors at the intervention target level, whose relative power the interventions often sought to diminish. Interventions implemented within the good governance domain of external engagement generally operate within a context of an imbalance of power in favour of the service provider, who controls the quality of and access to resources and services. Interventions that seek to change this balance of power without engagement with and buy-in from these actors may trigger response mechanisms in which the service providers attempt to block, discredit or co-opt the intervention to maintain their relative power. For example, Humphreys and Weinstein report evidence that some politicians whose performance scorecards were due to be disseminated successfully blocked implementation of the intervention in their constituencies, threatening violence (2012). Banerjee et al. (2014) identify this triggering of a negative response by the service providers at the targeted level (police station chiefs, in this case, who successfully prevented the implementation of community observers in most areas) as the key mechanism leading to a break in the causal chain.
Similarly,  and Rasolofoson et al. (2015) present evidence that government forestry staff members are able to exploit lack of clarity in national-level policies or top-down enforcement of complete implementation such that the officials are able to maintain their control over the resource benefits despite having devolved the responsibilities of management to the communities. Conversely, interventions that were designed and implemented with the support of key power brokers at the level targeted by the intervention, as in the case of municipal governments that chose to implement participatory budgeting in Brazil (Gonçalves, 2013) or structured community engagement in the health sector that aimed to strengthen service providers' intrinsic motivation (Alhassan et al. 2016), were able to realise positive impacts across the causal chain.
It is important to note that while in the majority of included broad intervention groups, a break in the causal chain at this stage may at best prevent outcomes tied to service provider response or lead to null effects, in the case of community-based natural resource management there is a risk of causing negative effects on well-being outcomes. As noted in the  and Rasolofoson et al. (2015) cases, this may happen where a lack of full intervention implementation leads to a context in which resource- and time-poor communities increase their burden of natural resource management, have less access to the resource due to sustainability restrictions, and are not afforded adequate compensation in the form of resource benefits ownership or alternative livelihoods support. The risk that an intervention may do harm to a community should be seriously considered during project design, and locally appropriate mitigation measures should be developed to lessen the likelihood of negative impacts.

FIGURE 32 Theory of change for community-based natural resource management interventions
Building on the above, the findings of this review lend some support to the theory that citizens' attempts to increase their relative power through means seen as confrontational by service providers often disincentivise the service provider from participating (World Bank 2004). The findings suggest that approaches to citizen-service provider engagement in the realm of accountability, including transparency for accountability, appear to work more effectively when implemented through phased, facilitated processes that are framed as collaborative, as opposed to one-off accountability meetings that tend to be interpreted as confrontational. Interventions that promote transparency with the aim of triggering mechanisms that motivate citizens to demand greater accountability often fall closer to the confrontational end of the spectrum, and their limited success in realising outcomes along the causal chain is evident throughout the included studies. Those that promote an explicitly collaborative process may be more effective, particularly when they incorporate measures to improve citizens' understanding of performance benchmarks, such as in Björkman and Svensson (2010) and Alhassan et al. (2016). In these two programmes, though citizens were provided with or supported to gather information on service provider performance quality, respectively, the process of applying that knowledge to service improvements was done in a collaborative way that was mutually empowering, in line with the theory suggested by Fox (2014).
We note, however, a difference between interventions targeting individuals versus service provider institutions, and caution that it may be more difficult to engage in collaborative approaches to performance improvements with individuals, such as politicians, who are understandably more likely to feel personally targeted. In these situations, the synthesis suggests that ensuring the engagement of a locally credible messenger to disseminate performance information reduces the ability of the targeted individual to undermine, co-opt or discredit the information.
One potential limitation of interventions relying on accountability and transparency-for-accountability through community engagement, however, is that while such interventions often met with some success in realising improvements at a local level regarding service delivery quality, there are many service delivery bottlenecks that cannot be dealt with through community engagement. This was a barrier highlighted in Bradley and Igras (2005) and Gullo et al. (2017): in both these evaluations, the authors identified improvements only among indicators that could be addressed without changes in resources or support. This provides some support to an assumption identified in the initial theory of change, which identified a risk that improvements would be limited to those that were within the purview of the service providers targeted for support. Bottlenecks such as issues in service supply chains or those requiring the approval and engagement of more senior management, particularly at provincial level and above, are unlikely to be successfully addressed through community engagement efforts. This reinforces the need for proper bottleneck identification during project design, to ensure the proper tools are applied.
Following the completion of the initial framework synthesis, we added codes to the meta-analysis data to test the strength of some of the mechanisms identified. We first tested the strength of the influence from the different types of service delivery. Initially, the distinction was theorised to be between pure public goods (services provided by the state which are non-rival and non-excludable, e.g. public roads) and merit goods (public services which are rival and excludable, usually because they are provided by front-line public servants, e.g. health services, or are subject to rationing, e.g. food subsidies). We expected to see stronger results around citizen engagement in merit goods provision, in which accountability to service users is more direct, leading to differential effects on access and possibly use and wellbeing further along the causal chain. Note that this distinction relates only to the three accountability and transparency interventions (rights information, performance information, and community feedback and monitoring); it did not emerge as a strong explanatory factor in participation interventions (participatory planning and CBNRM). Based on the results of the integration with the meta-analysis, we revised the theory, including the theory of change best-fit frameworks, to hypothesise that the break in the causal chain at provider response for services such as infrastructure or municipal government is more likely to be due to the nature of the interaction between citizens and those they are attempting to hold accountable. In what we initially conceptualised as merit good services, such as food subsidies, citizens collect the subsidies directly from the service provider staff member; thus, the citizens and providers interact in the provision of services, and thereby have a relationship that extends beyond the accountability measures. This is in contrast to a service such as a road, which is built by service providers but accessed by citizens independently of the providers; once the road is built, the providers are no longer engaged in its day-to-day management and use. As a result, the relationship between the citizens and service providers is constrained to the accountability initiatives. Upon revisiting the framework synthesis, we extracted further evidence in support of this theory, which is described above.
In addition to the moderating variable regarding the nature of service provision, we further attempted to test the strength of the facilitator identified around service provider engagement by coding interventions according to whether, in the design and/or implementation of the intervention, they a) engaged with service providers at the point targeted by the intervention; b) engaged with different public service officials whose behaviour was not targeted; or c) did not engage with the supply side. However, the results were inconclusive, likely due to the small sample of studies within each group and additional key factors that made it difficult to statistically isolate the potential impact of service provider engagement.
FIGURE 33 Immediate outcomes for pure public and merit goods

9 | COST EVIDENCE (REVIEW QUESTION 5)

Cost effectiveness is a key question for decision makers, and one that is rarely incorporated into systematic reviews. Unfortunately, few of the included studies reported cost information, and none did so systematically. We present here the data drawn directly from the study reports. Table 22 presents the types of programme costs analysed and key findings.
Two studies (Björkman Nyqvist et al. 2017; Pandey et al. 2007) presented some description of cost per outcome. The measures used to define costs and expenditures varied across these studies.
None of the studies presented tables with detailed breakdown of costs by any kind of category or intervention. This limited the potential for any kind of comparisons across programme settings and intervention designs.
Programme costs were reported in four studies (Alhassan et al. 2015; Ananthpur et al. 2014; Björkman Nyqvist et al. 2017; Pandey et al. 2007). Only total costs were presented across these studies, and the costing methodology used in arriving at these cost values was described as "back of the envelope" and was not detailed. One study was assessed as a full economic evaluation (cost-effectiveness) (Björkman Nyqvist et al. 2017) and the remaining three were assessed as partial economic evaluations. The methodological quality of all the economic studies was found to be low (Table 22).
Three studies reported factors influencing implementation costs. Alhassan et al. (2015) suggested that factors such as champions being community members and resources being mobilised from within the community helped keep implementation costs low. Björkman Nyqvist et al. (2017) highlighted the cost of data collection for the report card as the main cost item influencing implementation costs. Pandey et al. (2007)

FIGURE 35 Final outcomes for pure public and merit goods

8 Cost data may be considered as incomplete due to loss of follow-up. MNAR or nonignorably missing censoring occurs when the mechanism that generates the censored observations is correlated with the mechanism that generates cost (Glick, Doshi, Sonnad, & Polsky, 2015).

impact on the quality of and access to public services, including health care, social protection, justice and physical infrastructure, and social and economic wellbeing of citizens (review question 1). We also considered the impact on intermediate outcomes in the causal chain, including citizen engagement and provider response (review question 2), and how results vary by participants and location (review question 3). In addition, we aimed to understand the mechanisms and processes through which change happens, by identifying programme design, implementation, context, and mechanism factors associated with programme effectiveness along the causal chain (review question 4). Due to insufficient cost data, we were unable to address review question 5 on the cost-effectiveness of interventions incorporating PITA characteristics.
We used quantitative meta-analysis to combine the results of the impact evaluations, including sub-group analysis to explore heterogeneity by intervention, study location and other moderators. We conducted a detailed critical appraisal of the included impact evaluations to assess the credibility of the results. From the included programmes, we identified 36 associated qualitative and programmatic documents that we used to address review question 4. We used framework synthesis to synthesise the data.
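As an illustration of what combining effect sizes and running sub-group analysis involves, the following is a minimal sketch in Python. It assumes a DerSimonian-Laird random-effects estimator, which is a common choice but is not confirmed by this section as the estimator used in the review. The function names (`random_effects_pool`, `pool_by_subgroup`) and the example effect sizes are hypothetical and are not drawn from the included studies.

```python
import math


def random_effects_pool(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model.

    Assumption: this estimator is used purely for illustration.
    """
    k = len(effects)
    df = k - 1
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures dispersion of study effects around the pooled value.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # Between-study variance (tau^2), truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    # I^2: share of total variation attributable to heterogeneity (percent).
    i2 = 100.0 * (q - df) / q if (k > 1 and q > df) else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "q": q, "i2": i2}


def pool_by_subgroup(studies):
    """Pool (label, effect, variance) records separately for each sub-group."""
    groups = {}
    for label, effect, variance in studies:
        effects, variances = groups.setdefault(label, ([], []))
        effects.append(effect)
        variances.append(variance)
    return {label: random_effects_pool(e, v) for label, (e, v) in groups.items()}


# Hypothetical standardised mean differences, for illustration only.
example = [
    ("feedback", 0.30, 0.02),
    ("feedback", 0.10, 0.03),
    ("rights information", 0.05, 0.01),
]
summary = pool_by_subgroup(example)
```

With few studies per sub-group, as this review reports, the between-study variance is poorly estimated, which is one reason sub-group comparisons of this kind tend to be inconclusive.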
We reported our quantitative results in this section along the causal chain to address review questions 1-3, supported by the results from the qualitative framework synthesis to address review question 4. We start by presenting the results of the overall synthesis, followed by the individual results for the five intervention areas.

| Effectiveness of citizen engagement interventions
The meta-analysis found that citizen engagement interventions are usually effective in increasing the engagement of service users, for

Heterogeneity of impacts across populations
We considered diversity and equity of impacts across different population groups in three ways. Overall, few of the studies reported disaggregated intervention approaches and/or analysis of results for different population groups. We identified five studies that incorporated specific measures within the intervention to extend the engagement to vulnerable groups, which comprised three participatory planning interventions and one each of rights information provision and citizen feedback or monitoring. These programmes tended to have smaller effects on citizen engagement and access to services than other programmes, but it is unclear whether this was due to many of the programmes being implemented in challenging contexts (e.g. Afghanistan, Pakistan and DRC) rather than problems inherent in targeting vulnerable community members. Further, we identified nine studies that conducted sub-group analysis to differentiate impacts for different population groups, most commonly by socio-economic status and by gender, yet these were spread widely across intervention type and geography. Finally, we looked for studies that conducted equity-oriented causal chain analysis, and identified only one study that conducted a detailed qualitative assessment that incorporated consideration of differentiated impacts for women. We also examined overall differences by global region, but were not able to find consistent differences by intervention or outcomes along the results chain. Ultimately, due to the small sample of studies across a wide range of interventions and outcomes, it is difficult to conclude anything systematically for different population or geographic groups.

| Performance information provision
We identified six evaluations of public official or service provider performance information interventions, such as the dissemination of municipal government performance scorecards in Afghanistan, Brazil, the Philippines and Uganda, and monitoring information provided in police stations in India.
The framework synthesis identified that amongst performance information interventions, a key facilitating factor was the extent to which implementers secured the support of and buy-in from the actors whose performance was being analysed and disseminated.
Without such support, the findings suggest that the targeted actors may be able to avoid accountability by either preventing full implementation of the intervention, or by successfully undermining the credibility of the performance information disseminated. Most of these interventions targeted political actors' performance (as opposed to specific public services), in an attempt to "shorten the long route" of citizen-state accountability by increasing citizen engagement with politicians outside of elections. While interventions were at times successful in eliciting some improvements in politician performance, the findings suggest that, ultimately, this route remains too long to identify short-term effects on service delivery. Politicians may claim plausible deniability of their individual capacity to influence service delivery change, and such interventions do not engage many key actors involved along the public service delivery supply chain.

| Citizen feedback mechanisms
We identified 10 evaluations of accountability interventions, which specifically comprised citizen feedback or monitoring mechanism interventions, that is, those that solicited feedback regarding and/or actively engaged citizens in the monitoring of service delivery, to hold public service providers and institutions responsible for executing their powers and mandates according to appropriate standards. These include community report cards in infrastructure (Afghanistan, Indonesia and Colombia), health (Ghana, Malawi and Uganda), agriculture (Uganda) and the security sector (DRC), and individual citizen "feedback loops" in Guinea, Kenya and Uganda.
The framework synthesis suggested that citizen feedback and monitoring interventions were more successful at achieving results where some or all of the following factors were present:
• Interventions targeted a service that citizens accessed through interactions with front-line providers;
• A phased, facilitated approach jointly engaged citizens and service providers in monitoring;
• Performance benchmarks;
• Creation of common knowledge of feedback or monitoring results; and
• Working through local community organizations to strengthen community members' voices.

| Rights information provision
We identified five evaluations of rights information interventions, which enable users to demand minimum standards for access to services, such as for social protection services in Indonesia (food subsidies) and India (public works), maternal and child health care in India and freedom of information in Pakistan.
The results from the framework synthesis suggested that interventions informing citizens of their rights were more likely to succeed where they targeted the provision of a service citizens access directly from front-line providers; created a sense of common knowledge about people's rights to the service among citizens and providers; and built an appropriate level of social sanction risk for providers.

| Participatory planning interventions
We identified nine participatory priority setting, planning or budgeting interventions, wherein citizens participated in setting the priorities for and/or planning of local services. These include support for participatory budgeting in municipal governments in Brazil, Mexico and Russia, and support for participatory planning in India, Pakistan, Guinea and Kenya. It also included requirements for inclusive participation in two fragile contexts, Afghanistan and DRC.
The framework synthesis suggested three factors improved the likelihood of achieving results along the causal chain:
• Strong local buy-in from front-line service providers for the intervention;
• Incorporating specific, culturally appropriate measures that address local barriers to the participation of vulnerable groups; and
• Interventions designed to spur the growth of local civil society and capacity for collective action.

While we identified seven studies of community-based natural resource management committees, these were all rated as having a high risk of bias or having some concerns, with the exception of Barde's (2017) evaluation of water user associations in Brazil.

Inclusive participation interventions
We also undertook a formal assessment of the external validity of the included studies. A number of studies still do not report their sampling strategy clearly, and a surprisingly small share of studies specifically discuss the generalisability of their findings to other contexts.
Only 11 studies explicitly discussed external validity. Among those studies, five acknowledged the limits to the generalisability of their findings, due to the small scale of the study or the sampling strategy.

| Quality of the evidence
Overall, the quality of evidence from randomised studies is relatively high, with studies for the most part ensuring comparability of intervention and control groups and protecting them from selection bias. The risk of bias assessment is therefore more relevant at the outcome level. We identified concerns related to the way some outcomes are measured in the majority of studies. This is due to the use of self-report measures that are often biased by the intervention itself. A majority of the non-randomised studies are natural experiments, which in most cases did not provide enough information on the selection process into the programme to reject the risk of selection bias, or failed to overcome the selection bias and confounding that was identified. Transparency in reporting is an issue for randomised and non-randomised studies alike, given how few pre-registered their trials, outcomes or analysis plans. The use of methods such as placebo outcomes or groups, and blinding for outcome assessors or data analysts, is not common, though these seem relatively easy to implement and could reduce risks of bias.

| Limitations and potential biases in the review process
There are several limitations of this review related to both the existing evidence base in this area and the synthesis approach.
10.4.1 | Limitations of the existing evidence base

1. Statistical power for the meta-analyses and heterogeneity analysis: Our ability to draw strong conclusions on the effectiveness of the PITA mechanisms and interventions was limited by the number of studies looking at each intervention and outcome area. This was despite using a fairly high level of aggregation for mechanisms, intervention areas and outcomes. In addition, we were unable to undertake the full moderator analyses that we specified in the protocol to explore heterogeneity quantitatively, due to the limited number of included studies in each mechanism and intervention category.

2. Reporting in primary studies: We were limited in our ability to quantitatively test key mechanisms that we identified through the framework synthesis, due to limited reporting of design and contextual characteristics in the impact evaluations. For example, our framework synthesis and previous reviews have suggested that the extent to which interventions engaged with or were strongly supported by national or local governments would be an important determining factor for effectiveness. However, primary studies rarely reported on this in detail.
3. Cost-effectiveness analysis: We aimed to undertake an analysis of the cost-effectiveness of the included set of interventions (review question 5); however, we were limited by the available cost data.
10.4.2 | Limitations of the review scope and synthesis process
1. The focus of our review questions was on the value added of incorporating PITA characteristics into existing service delivery, and therefore we did not include studies that examined the impact of combining PITA-based interventions with co-interventions to improve resources or capacity for service delivery. One of the hypotheses emerging from our review is that citizen engagement interventions that do not incorporate complementary interventions along the service provider supply chain may be insufficient to improve key wellbeing outcomes for target communities. However, we are unable to say this conclusively without comparing to the results of interventions that do combine PITA mechanisms and supply-side interventions. We believe that this would be a valuable subject for future synthesis.

2. We did not include studies of education related to PITA mechanisms in our review due to overlap with existing systematic reviews and time and resource limitations. However, the inclusion of this evidence base may have increased the power of our quantitative analysis and the generalisability of our results to this sector.
3. Due to time and resource limitations, we did not undertake independent double coding of effect size information or the qualitative data extraction. In addition, we only undertook double coding for the risk of bias assessments for a sample of 20 per cent of studies rather than the full set. However, the results of the independent double coding of risk of bias demonstrated a high level of agreement between the two authors.
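As an illustration of how agreement between two coders on a double-coded sample can be quantified, the sketch below computes Cohen's kappa, which corrects raw agreement for agreement expected by chance. The review does not report which agreement statistic was used, so the choice of kappa is an assumption, and the ratings shown are hypothetical rather than data from the included studies.

```python
from collections import Counter


def cohens_kappa(ratings_1, ratings_2):
    """Chance-corrected agreement between two coders on the same items."""
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("Both coders must rate the same non-empty set of items")
    n = len(ratings_1)
    # Proportion of items on which the two coders gave the same rating.
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    counts_1, counts_2 = Counter(ratings_1), Counter(ratings_2)
    categories = set(counts_1) | set(counts_2)
    expected = sum(counts_1[c] * counts_2[c] for c in categories) / n ** 2
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical risk of bias ratings for ten double-coded studies.
coder_1 = ["low", "low", "high", "some", "high", "low", "high", "some", "low", "high"]
coder_2 = ["low", "low", "high", "some", "high", "low", "some", "some", "low", "high"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this made-up sample the coders disagree on one study out of ten, so raw agreement is 0.9, while kappa is somewhat lower because part of that agreement would be expected by chance.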

| Deviations from protocol
This review largely followed the approach described in the associated protocol published in the Campbell Library.
However, we note several deviations.
1. Upon identifying the included studies, we mapped the characteristics of each intervention and produced a framework of five subinterventions that shared similar design characteristics. These categories were not pre-specified in the protocol as we defined our intervention inclusion criteria using PITA design characteristics and were unsure what the final set of included interventions would look like. We used these categories to undertake sub-group analysis by intervention area.
2. As noted in the previous section, we did not undertake full independent double coding of effect size information or the qualitative data extraction, although categorisation of all effect sizes into outcome groups for every study was done by two authors.
3. We discussed exploring the possibility of applying alternative methods to link the meta-analysis with context and mechanism information, such as qualitative comparative analysis (QCA) (Befani, 2016). QCA articulates the associations between empirical effects and context and mechanism conditions, drawing on "truth-tables" which articulate all possible instances of conditions and show which cases share the same combination of conditions. We noted that the application of QCA is limited by the number of included studies, their comparability and the completeness of reporting within them; for these reasons, applying QCA was not feasible in this review. Instead we used realist-informed framework synthesis that moved towards "best fit" framework synthesis to explore context and mechanism information.
4. In addition, we identified potential programme mechanisms and a moderator variable (merit versus pure public goods) in the qualitative framework synthesis that we subsequently tested in the meta-analysis through sub-group analysis. This moderator analysis was not described in the protocol.
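To make the "truth-table" idea behind QCA more concrete, the sketch below enumerates every combination of a small set of binary conditions and groups cases by the combination they share. It is a minimal illustration only: the condition names, study labels and outcome codings are hypothetical and are not taken from the included studies, and the sketch stops at building the table rather than performing the logical minimisation that a full QCA would involve.

```python
from itertools import product

# Hypothetical binary conditions (1 = present, 0 = absent); illustrative only.
CONDITIONS = ("gov_support", "cso_engaged", "direct_service")

# Hypothetical cases: (condition values, outcome), where outcome 1 = improvement.
cases = {
    "study_A": ({"gov_support": 1, "cso_engaged": 1, "direct_service": 1}, 1),
    "study_B": ({"gov_support": 1, "cso_engaged": 1, "direct_service": 1}, 1),
    "study_C": ({"gov_support": 0, "cso_engaged": 1, "direct_service": 0}, 1),
    "study_D": ({"gov_support": 0, "cso_engaged": 0, "direct_service": 0}, 0),
    "study_E": ({"gov_support": 1, "cso_engaged": 0, "direct_service": 1}, 1),
    "study_F": ({"gov_support": 1, "cso_engaged": 0, "direct_service": 1}, 0),
}


def build_truth_table(cases, conditions):
    """Map every possible combination of conditions to the cases showing it."""
    rows = {combo: [] for combo in product((0, 1), repeat=len(conditions))}
    for name, (values, outcome) in cases.items():
        combo = tuple(values[c] for c in conditions)
        rows[combo].append((name, outcome))
    return rows


def summarise(rows):
    """Share of cases with a positive outcome per row; None if no case observed."""
    summary = {}
    for combo, members in rows.items():
        if members:
            outcomes = [outcome for _, outcome in members]
            summary[combo] = sum(outcomes) / len(outcomes)
        else:
            summary[combo] = None  # combination not observed in any case
    return summary


table = build_truth_table(cases, CONDITIONS)
summary = summarise(table)

for combo, share in summary.items():
    names = [name for name, _ in table[combo]]
    print(dict(zip(CONDITIONS, combo)), names, share)
```

With three conditions there are already eight possible rows, and the six hypothetical cases fill only four of them. The number of rows doubles with each added condition, which illustrates why a small and incompletely reported set of studies limits the feasibility of QCA.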

| Agreements and disagreements with other studies or reviews
This systematic review is the first that we are aware of to consider the effects of a range of interventions with PITA characteristics across a range of sectors. The findings from the review are broadly consistent with reviews that have examined governance interventions and/or have examined demand and supply in service delivery. For example, the recent review of community driven development programmes by White et al. (2018) found that effects tended to diminish further along the causal chain, such that programmes were often ineffective in improving wellbeing outcomes, apart from in the special case of water and sanitation.
Several high quality systematic reviews exist focusing specifically on the impact of community-based monitoring and information interventions (Molina et al., 2016; Snilstveit et al., 2015).

| Implications for policy and programming
This section presents the main conclusions for policy and programmes from the synthesis of impact evidence on interventions promoting external participation and accountability in low- and middle-income countries. As might be expected for a review of broad interventions and even broader scope of outcomes, there is significant heterogeneity in findings. In order to manage the anticipated heterogeneity, we developed a framework which enabled sensible grouping of interventions and outcomes. The results from analysis according to this grouping suggested significant heterogeneity in findings across intervention groupings and outcomes.
The first conclusion is that, regardless of intervention type, interventions are usually effective in improving engagement of citizens in service delivery and improving access to services and quality of service provision. However, external participation and accountability interventions are not often able to elicit strong responses from public services.
Secondly, evidence suggests some interventions may be more effective in improving service delivery outcomes, including those with stronger accountability components, and those providing rights information. The findings about relative effectiveness across interventions are tentative in light of the heterogeneity in evidence included in the review. More promising evidence, however, was found in the effectiveness of accountability interventions, including transparency for accountability, that targeted the provision of merit good-type public services, as opposed to those that targeted pure public good-type services. For merit good services, such as health care, citizens typically already came into contact with service delivery agents in order to access the service, and simply built on those relationships to advocate for improvements in service provision management; this multidimensional and ongoing personal engagement between providers and users, comprising both everyday service delivery and accountability engagement, was better able to elicit improvements in service provider actions, leading to greater impacts in quality of and access to services. In contrast, for pure public good-type services, such as roads, citizens typically accessed or used the service independently of front-line providers, and thus their relationship with service providers through citizen engagement efforts was more one-dimensional, focused solely on the accountability efforts. The social sanctions threat of local civic engagement was not strong enough to overcome the power difference between providers and users, and thus interventions often failed to elicit responses in service provider actions, leading to a break in the causal chain. However, there is some evidence from Afghanistan that suggests that where interventions targeting pure public goods incorporate the engagement of strong, locally well-respected civil society groups, the additional social capital provided by the CSO enables citizens to overcome this bottleneck and realise improvements in service delivery quality through citizen engagement, yet there is a caveat that effects may only hold so long as the active CSO engagement continues (Berman et al., 2018).
The third main conclusion is that outcomes tend to get smaller along the causal chain, to the extent that we do not expect participation and accountability interventions of themselves to improve wellbeing. This finding should not be surprising, partly because the deteriorating causal chain is a common occurrence, called elsewhere the "funnel of attrition" (see White, 2014). The other reason is that the systematic review inclusion criteria were limited to studies examining the marginal effect of a participation or accountability intervention on top of standard public service delivery.
Hence, any study (or trial arm) that incorporated any co-interventions, including increased resource delivery, was excluded. It is highly possible that participation and accountability interventions, when provided alongside other services that can relieve important bottlenecks, can act to improve behavioural responses and wellbeing.
The results suggest particular attention should be paid to the following areas when designing and implementing interventions:

Ensuring positive engagement with supply-side actors at the intervention target level
Many interventions experienced challenges stemming from a lack of positive engagement with supply-side actors at the intervention target level, whose relative power the interventions often sought to diminish. Interventions seeking to change this balance of power with engagement and buy-in from these actors are likely to be more effective in improving service delivery outcomes and state-society relations. Interventions implemented with the strong support of the targeted supply-side actors, such as the case of municipal governments that chose to implement participatory budgeting in Brazil or structured community engagement in the health sector, have been able to realise positive impacts across the causal chain. In contrast, in Rajasthan, India, only national police leadership were involved in the design of the intervention; local police chiefs, whose behaviour was targeted, were not engaged, and were subsequently able to undermine or effectively block implementation of the intervention (Banerjee et al., 2014).

Particular consideration for natural resource management committees
In the majority of included intervention sub-groups, a limited response on behalf of the service provider may at worst prevent outcomes tied to service provider response or lead to null effects. In the case of CBNRM, however, there is a risk of causing negative effects on well-being outcomes, where a lack of full intervention implementation leads to a context in which resource- and time-poor communities increase their burden of natural resource management, have less access to the resource due to sustainability restrictions, and are not afforded adequate compensation in the form of resource benefits ownership or alternative livelihoods support. For example, in Madagascar, Rasolofoson et al. (2015) reviewed the set of policies, laws and regulations for natural resource management in the country, and identified numerous inconsistencies and contradictions.
They presented qualitative evidence suggesting that this complicated and contradictory policy and legal framework was exploited by frontline forestry staff, who were able to manipulate implementation of the CBNRM forestry policy to suit their purposes and retain power and effective control over the resources, thus causing a break in the causal chain as implementation of the policy was neither complete nor consistent.

Collaborative versus confrontational approaches to service provider engagement
The findings of this review lend some support to the theory that citizens' attempts to increase their relative power through means seen as confrontational by service providers often disincentivise service provider participation (World Bank 2004). The findings of this review suggest that approaches to citizen-service provider engagement in the realm of accountability, including transparency for accountability, appear to work more effectively when implemented through phased, facilitated processes that are framed as collaborative, as opposed to one-off accountability meetings that tend to be interpreted as confrontational. Interventions that promote transparency with the aim of triggering mechanisms that motivate citizens to demand greater accountability often fall closer to the confrontational end of the spectrum, and their limited success in realizing outcomes along the causal chain is evident throughout the included studies. Those that promote an explicitly collaborative process may be more effective, particularly when they incorporate measures to improve citizens' understanding of performance benchmarks. This was the case in Uganda, where Björkman Nyqvist et al. (2017) found that communities were better able to identify locally-solvable problems within healthcare service provision and advocate for their improvement when they had access to performance benchmarks and training on healthcare monitoring; when local performance information was not provided, service providers were better able to skirt accountability by identifying for community monitors key constraints over which they had no control. We note, however, a difference between interventions targeting individuals versus service provider institutions, and caution that it may be more difficult to engage in collaborative approaches to performance improvements with individuals, who are understandably more likely to feel personally targeted.
In these situations, the synthesis suggests that ensuring the engagement of a locally credible messenger to disseminate performance information reduces the ability of the targeted individual to undermine, co-opt or discredit the information. This was the case in the Philippines (Capuno and Garcia, 2010).

Facilitating engagement by building local social capital and capacity for collective action
Across included interventions, a key facilitator identified in the framework synthesis was the value-add of incorporating into intervention design active engagement with local organized community groups, such as CSOs or interest groups, or the inclusion of measures that explicitly sought to build local social capital and capacity for collective action. The role of civil society support to communities may be critical not only for encouraging engagement in monitoring and accountability processes, but also for shifting the balance of power between citizens and public service providers of indirectly delivered services. There is some evidence that CSO engagement is particularly critical for interventions targeting indirectly-delivered, pure public goods. Engaging CSOs in the intervention may strengthen the social capital of individual citizens: the stronger voice may increase citizens' ability to access the information needed to hold service providers accountable; help bring key stakeholders together in interface meetings; and increase citizens' bargaining power with service providers, thus strengthening their capacity to realize improvements in service delivery quality. As above, this was found to be the case in Afghanistan, where Berman et al. (2018) presented qualitative evidence suggesting that the strength of the name of the highly respected CSO in the intervention enabled community monitors to access key documents such as contracts that had previously been denied. The CSO was also able to engage key actors from government in the monitoring meetings, strengthening the risk for service providers, yet when the CSO disengaged, the effects petered out. This suggests the importance both of long-term engagement and of long-term follow-up, as outcomes are frequently not static.

| Implications for research
The results suggested significant heterogeneity according to study design and implementation characteristics. In particular, RCTs tended to have smaller effects than non-randomised studies. Although this finding is consistent across different literatures, and is indicative of the types of effect estimand that RCTs produce, it is important to note that well-conducted RCTs are considered to provide the most reliable estimates of outcome changes and, as a study design, are highly amenable to the types of interventions contained in this review. The risk of bias analysis has shown that the overall quality of evidence from the randomised studies is relatively high: risks of confounding and selection bias are low; however, researchers should rely less on self-reported outcome measures, which are more susceptible to biases. A majority of non-randomised studies were at high risk of selection bias and confounding, due to unclear selection or self-selection of communities into the programme and the lack of baseline data. When baseline data are available and the appropriate analysis method is used, authors may overcome these biases. There are concerns related to reporting; in particular, there is a lack of transparency with regard to how analyses were conducted, how authors responded to implementation problems (e.g. attrition), and approaches to selecting groups for inclusion in the study (external validity). We also anticipate that there are more opportunities to conduct rigorous natural experiments evaluating "real world" national policy or reform over the longer term than have been taken so far, including through use of regression discontinuity design (as indicated by the study awaiting classification in this review - Tohari et al., 2017). Such studies may be done particularly cost-effectively where existing survey or administrative data can be used.
Researchers should consider the following when undertaking impact evaluations in this area:
We have attempted in this review to demonstrate that it is possible to undertake higher-level synthesis work to articulate the broader mechanisms at play, with the aim of informing centralised strategic planning.
However, we note that systematic reviews are usually most effective, especially in communicating findings to programmers, when they examine a particular intervention, such as "community-driven development". Hence our attempt in this study to provide both broader-level analysis of empirical results across studies and within-study findings for particular interventions. In addition, our study identified several potential areas for future synthesis work:
1. We focused in this review on interventions that isolated the PITA component, and therefore did not incorporate co-interventions to target the resource base or capacity of the public service providers. It would be useful for a future systematic review to compare the findings of interventions that introduce only PITA mechanisms with those of interventions that combine PITA mechanisms with co-interventions. Future research could also explore the comparative effectiveness of interventions instigating PITA mechanisms within the external engagement domain of governance versus those aiming to strengthen PITA mechanisms within the internal institutional systems of public service provision. Any synthesis work would likely need to focus on particular aspects of participation and accountability, or intervention groups, in order to be both manageable and policy-relevant.

2. We excluded studies of interventions from the education sector, as they have been synthesised by several previous reviews. However, we note that a similar mechanisms synthesis could be undertaken of studies in the education sector, which constitute a substantial body of research in this area.
3. Fully mixed-methods systematic reviews examining the effectiveness of particular intervention types (e.g. participatory budgeting, water user associations) would also be valuable.

| ROLES AND RESPONSIBILITIES
The study protocol was developed by Hugh Waddington (HW), Ada Sonnenfeld (AS) and Jennifer Stevenson (JS). The search strategy was designed with John Eyers, and carried out by AS, JS and HW. Juliette Finetti and JS did the critical appraisal with inputs from HW. JS and HW collected the effect size data with inputs from AS, and HW did the meta-analysis. AS collected the qualitative data and did the framework synthesis with inputs from HW. HW, AS and JS wrote the report. Denny John did the cost analysis with inputs from HW, who provided additional inputs on intervention design and outcomes. The team were supported by an advisory group of academics and policy makers with specific expertise in governance.
• Information retrieval: AS and JS designed the electronic search terms, which were developed in platform search protocols by John Eyers. AS and JS undertook the screening of titles and abstracts and online sources. AS, JS and HW screened studies at full text.
• Data collection: AS, JS and HW collected descriptive data from included studies. JS and HW extracted effect size data. Juliette Finetti and JS did the risk of bias assessment. Denny John from the Campbell Collaboration extracted information on unit costs.
• Statistical analysis and framework synthesis: HW did the meta-analysis. AS did the framework synthesis. All authors contributed to the discussion and implications.

| Sources of support
We are grateful for financial support from the United States Agency for International Development (USAID) via NORC project number 7554.070.01.
Raj Popat provided research assistance on searching.
We thank Daniel Phillips (NatCen) for coordinating the peer review. Helpful inputs were given at various stages of the review process by the Campbell Collaboration Methods Group, three anonymous referees, USAID and the following stakeholder advisory group members:
• Andrew Greer, USAID
• Annette Brown, FHI360
• Courtney Tolmey, Results for Development
Finally, we are also grateful to Rohini Pande for helpful advice at the scoping stage of the review.

| DECLARATIONS OF INTEREST
None of the team members have any financial interests in the review or have worked on primary research covering the interventions covered by the review.