Interventions to improve the labour market outcomes of youth: A systematic review of training, entrepreneurship promotion, employment services and subsidized employment interventions

EXECUTIVE SUMMARY
Background
Objectives
Search methods
Selection criteria
Data collection and analysis
Results
Conclusions
1 BACKGROUND
1.1 The research problem: why youth employment?
1.2 The intervention: ALMPs for youth
1.3 How the ALMPs are supposed to work
1.3.1 Training and skills development
1.3.2 Entrepreneurship promotion
1.3.3 Employment services
1.3.4 Subsidized employment
1.4 Why the review is needed

List of tables:
Table 15: Summary of results for measures within business performance outcomes
Table 16: Summary of results for employment outcomes by main category of intervention
Table 17: Summary of results for earnings outcomes by main category of intervention
Table 18: Summary of results for business performance outcomes by main category of intervention
Table 19: Summary of results on employment outcomes across main categories of intervention in high-income countries
Table 20: Summary of results on employment outcomes across main categories of intervention in low- and middle-income countries
Table 21: Summary of results on earnings outcomes across main categories of intervention in high-income countries
Table 22: Summary of results on earnings outcomes across main categories of intervention in low- and middle-income countries
Table 23: Summary of results on employment outcomes by duration
Table 24: Summary of results on earnings outcomes by duration
Table 25: Summary of results on employment outcomes reported by a specific gender or both
Table 26: Summary of results on earnings outcomes reported by a specific gender or both
Table 27: Summary of results on employment outcomes by participant income group (where yes is low-income, disadvantaged, at-risk or vulnerable youth)
Table 28: Summary of results on earnings outcomes by participant income group (where yes is low-income, disadvantaged, at-risk or vulnerable youth)
Table 29: Summary of results on employment outcomes by participant income group (low-income participants also includes disadvantaged, at-risk or vulnerable youth)
Table 30: Summary of results on earnings outcomes by participant income group (low-income participants also includes disadvantaged, at-risk or vulnerable youth)
Table 31: Summary of results on employment outcomes by scale of the programme
Table 32: Summary of results on earnings outcomes by scale of the programme
Table 33: Summary of results on employment outcomes by implementer
Table 34: Summary of results on earnings outcomes by implementer

List of figures:
Figure 11: Summary forest plot of earnings outcomes (full sample) by main category of intervention
Figure 12: Summary forest plot of business performance outcomes (full sample) by main category of intervention
Figure 13: Summary forest plot of employment outcomes by main category of intervention for high-income countries
Figure 14: Summary forest plot of employment outcomes by main category of intervention for low- and middle-income countries
Figure 15: Summary forest plot of earnings outcomes by main category of intervention for high-income countries
Figure 16: Summary forest plot of earnings outcomes by main category of intervention for low- and middle-income countries
Figure 17: Summary forest plot of employment outcomes by duration of period between individual exiting the intervention and data measurement (short, medium and long term)
Figure 18: Summary forest plot of earnings outcomes by duration of period between individual exiting the intervention and data measurement (short, medium and long term)
Figure 19: Summary forest plot of employment outcomes (full sample) by gender
Figure 20: Summary forest plot of earnings outcomes (full sample) by gender
Figure 21: Summary forest plot of employment outcomes by participant income group (where yes is low-income, disadvantaged, at-risk or vulnerable youth)
Figure 22: Summary forest plot of earnings outcomes by participant income group (where yes is low-income, disadvantaged, at-risk or vulnerable youth)
Figure 23: Summary forest plot of employment outcomes by main category of intervention for low-income and disadvantaged participants
Figure 24: Summary forest plot of employment outcomes by main category of intervention for non-low-income/non-disadvantaged participants
Figure 25: Summary forest plot of earnings outcomes by main category of intervention for low-income/disadvantaged participants
Figure 26: Summary forest plot of income outcomes by main category of intervention for non-low-income/non-disadvantaged participants
Figure 27: Summary forest plot of employment outcomes by scale of the programme

What are the main findings of this review?
Included studies had to: (1) evaluate an active labour market programme (ALMP) that was designed for, or primarily targeted at, young women and men aged between 15 and 35; (2) have an experimental or quasi-experimental design; and (3) report at least one eligible outcome variable measuring employment, earnings or business performance.
The evidence base covers 107 interventions in 31 countries, including 55 using skills training, 15 using entrepreneurship promotion, 10 using employment services and 21 using subsidized employment.
Overall, youth employment interventions increase the employment and earnings of the youth who participate in them, but the effect is small, with substantial variation between programmes. There are significant effects for entrepreneurship promotion and skills training, but not for employment services and subsidized employment.
Impacts on earnings were also positive but small and highly variable across programmes. Entrepreneurship promotion and skills training were effective in increasing earnings, while the effects of employment services and subsidized employment were negligible or statistically insignificant. There is limited evidence on the effects of youth employment programmes on business performance outcomes, and the pooled effect size was not statistically significant.
In addition to the variation in impact across different types of programmes, some variation can be explained by country context, intervention design, and the profile and characteristics of programme beneficiaries. The impacts of ALMPs are greater in magnitude in low- or middle-income countries than in high-income countries. Programmes targeting the most disadvantaged youth were associated with larger programme effects, particularly for earnings outcomes, and effects were slightly larger for women than for men.

What do the findings of this review mean?
The evidence suggests that investing in youth through active labour market measures may pay off. Skills training and entrepreneurship promotion interventions appear to yield positive results on average, pointing to potential benefits from combining supply- and demand-side interventions to support youth in the labour market.
The evidence indicates the need for careful design of youth employment interventions. The "how" seems to be more important than the "what" and, in this regard, targeting disadvantaged youth may act as a key factor for success.
There is a need to strengthen the evidence base with more studies of promising programmes, especially in sub-Saharan Africa. Further research should investigate intermediate outcomes and soft skills, and should collect cost data.
How up-to-date is this review?
The review authors searched for studies published up to January 2015. This Campbell systematic review was published in November 2017.
potential to increase human capital and employment prospects in the long term. The evidence suggested that programmes targeting disadvantaged youth are particularly effective. Entrepreneurship promotion and skills training programmes appear particularly promising for improving employment, earnings and business performance, but the evidence base is still relatively small. More rigorous impact evidence is needed for youth employment programmes more generally, including employment services, subsidized employment and entrepreneurship promotion.

EXECUTIVE SUMMARY

BACKGROUND
The youth of today represent a vast potential for inclusive growth and development. If youth are given the opportunity to build appropriate skills and access decent employment, they can help to accelerate progress on the 2030 Agenda for Sustainable Development and engage in meaningful work that benefits them, their families and society as a whole.
Unfortunately, decent jobs are not a feasible prospect for all young women and men. Today, over 73 million young people are unemployed worldwide. Youth unemployment stands at a much higher level than the average unemployment rate for adults, in some cases over three times as high. Moreover, two out of five young people in the labour force are either working but poor or unemployed. The youth employment challenge is therefore not only about job creation but also, and especially, about enhancing the quality of jobs for youth.
Youth's gloomy prospects in the labour market embody a massive waste of potential and a threat to social cohesion. Understanding what works to improve their labour market outcomes is therefore of paramount importance and a development priority for all countries and regions.

OBJECTIVES
The aim of this systematic review was to investigate the impact of youth employment interventions on the labour market outcomes of young people. The interventions under review comprised training and skills development, entrepreneurship promotion, employment services and subsidized employment. Outcomes of interest included employment, earnings and business performance outcomes.

SEARCH METHODS
The review relied on a comprehensive systematic search across more than 70 sources, including literature databases and a large number of websites, which allowed the identification of both published and unpublished studies. The search process included both a primary search (i.e., searching a wide range of general and specialized databases) and a complementary search (i.e., hand-searching relevant websites; searching dissertation, thesis and grey literature databases; citation tracking; screening of reference lists; and contacting authors and experts). The in-depth complementary search allowed the identification of several unpublished studies. The process included search terms in English, French, German, Portuguese and Spanish. The search was completed in January 2015.

SELECTION CRITERIA
Eligible studies are those that:
1. evaluated an active labour market programme (ALMP) that included at least one of the following categories of intervention: training and skills development (such as technical and non-technical skills), entrepreneurship promotion (providing access to capital alongside entrepreneurial skills that enhance human capital), employment services (providing job placement and job-search assistance, among other services) and/or subsidized employment (providing wage subsidies or public employment programmes);
2. investigated programmes that were designed for, or primarily targeted at, young women and men aged between 15 and 35;
3. reflected completed experimental and quasi-experimental evaluations measuring impacts on eligible labour market outcomes; and
4. reported at least one eligible outcome variable measuring employment (e.g., probability of employment, hours worked, duration in unemployment), earnings (e.g., reported earnings, wages, consumption) or business performance (e.g., profits, sales).
In addition to the above inclusion criteria, the review focused on studies with a publication date between 1990 and 2014. No language restrictions were applied.

DATA COLLECTION AND ANALYSIS
A coding tool and manual were developed to guide a harmonised data extraction process. Treatment effect estimates were coded across all studies that met the inclusion criteria, along with other parameters and intervention characteristics deemed relevant for the analysis. Additional, non-reported information was retrieved from the authors of the primary studies, supporting the computation of standardized mean difference (SMD) effect sizes. The SMD captures the relative magnitude of a treatment effect in a dimensionless way, making it comparable across outcomes and studies. Effect sizes were summarized within and across reports to one effect size per outcome for each study.
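To illustrate the effect size computation described above, the sketch below implements Hedges' g, a common bias-corrected SMD. This is an illustrative implementation under standard formulas, not the review's actual code, and the programme figures in the example are hypothetical:

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference (Hedges' g) with small-sample correction."""
    # Pooled standard deviation across treatment and comparison groups
    sp = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sp                 # Cohen's d
    j = 1 - 3 / (4 * (n_t + n_c) - 9)          # Hedges' small-sample correction
    g = j * d
    # Approximate sampling variance of g, used later as the meta-analytic weight
    var_g = j**2 * ((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return g, var_g

# Hypothetical programme: treated youth earn 120 (SD 40) vs. 100 (SD 38) for controls
g, v = hedges_g(120, 100, 40, 38, 150, 150)
print(round(g, 3), round(v, 4))
```

Because the SMD divides the raw mean difference by the pooled standard deviation, an earnings effect and an hours-worked effect end up on the same dimensionless scale, which is what allows pooling across heterogeneous outcome measures.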
Random-effects meta-analysis methods were employed to synthesize and compare effect sizes reported in the primary studies. Subsequently, multivariate meta-regression models were estimated, with information about intervention-level, study-level and country-level characteristics included to assess the factors associated with the magnitude of reported effect size estimates.
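The random-effects synthesis step can be sketched as follows, using the DerSimonian-Laird method-of-moments estimator for the between-study variance. The choice of estimator and the three effect sizes below are illustrative assumptions, not values taken from the review:

```python
import math

def random_effects_meta(effects, variances):
    """DerSimonian-Laird random-effects pooled effect with a 95% CI."""
    k = len(effects)
    w = [1 / v for v in variances]                       # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    # Cochran's Q and the method-of-moments between-study variance tau^2
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Re-weight each study by 1 / (within-study variance + tau^2)
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2

# Three hypothetical programme-level SMDs and their sampling variances
est, ci, tau2 = random_effects_meta([0.05, 0.45, 0.10], [0.01, 0.02, 0.015])
print(round(est, 3), [round(x, 3) for x in ci], round(tau2, 4))
```

Under a random-effects model, a positive tau^2 widens the confidence interval and pulls study weights closer together, reflecting the assumption that the programmes estimate different true effects rather than one common effect.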

4. The high degree of inconsistency across interventions suggested that programme impacts concealed major contextual differences. The meta-analysis showed important differences in the magnitude of impact across outcomes and interventions.
Despite strong similarities across the included studies, the differences in impact were not driven by chance alone. Tests for heterogeneity demonstrated substantial variation in effect size magnitude attributable to country context, intervention design, and the profile and characteristics of programme beneficiaries.
8. Looking at differences in effects by gender, the findings suggested that employment and earnings outcomes for women were marginally larger than those for men.
9. The systematic review captured information about the type of skills delivered to young people and found no particular connection between soft skills and better labour market outcomes. Similarly, there was no systematic evidence about the role of public, private or civil entities in the implementation of youth employment programmes.

CONCLUSIONS
The extent and urgency of the youth employment challenge and the level of global attention currently being given to this topic calls for more and better evidence-based action. Accordingly, this systematic review sought to examine the empirical evidence in order to understand what drives the success (or failure) of youth employment interventions. Investments in youth employment will continue, and even increase, as countries embark on the implementation of the 2030 Agenda for Sustainable Development; therefore, this review focused on identifying "what works" and, as far as possible, "how".
This systematic review builds on a growing base of studies measuring the impact of youth employment interventions and offers a rigorous synthesis and overall balance of the empirical evidence, taking into account the quality of the underlying research. The review is systematic in its clearly defined and transparent inclusion and exclusion criteria, objective and extensive search, meticulous data extraction process, standardized statistical testing and analysis, and thorough reporting of findings. These elements and the underlying methods and tools were laid out and reviewed in the protocol.
The evidence suggests that investing in youth through active labour market measures may pay off. The evidence also shows a significant impact gap across country income levels. Being unemployed or unskilled in a high-income country, where labour demand is skill intensive, puts youth at a distinct disadvantage in comparison to a cohort that is, on average, well educated. While ALMPs in high-income countries can integrate disadvantaged young people into the labour market, they cannot fully compensate for a lack of skills or for other areas where youth failed to gain sufficient benefit from the education system. On the other hand, in lower-income countries with large cohorts of disadvantaged youth, marginal investments in skills and employment opportunities are likely to lead to larger changes in outcomes. Youth-targeted ALMPs in low- and middle-income countries do lead to impacts on both employment and earnings outcomes. Specifically, skills training and entrepreneurship promotion interventions appear to yield positive results on average. This is an important finding, which points to the potential benefits of combining supply- and demand-side interventions to support youth in the labour market.
The evidence also calls for careful design of youth employment interventions. The "how" seems to be more important than the "what" and, in this regard, targeting disadvantaged youth may act as a key factor for success.
The findings from this review need to be discussed vis-à-vis the local and national context and should be complemented by a long-term and holistic commitment towards youth development.
Achieving an understanding of the "how" element is not an easy task. Although the systematic review excluded studies that only reported relative effects, it is also the case that, frequently, impact evaluations do not assess relative effectiveness. Even more often, reports and papers fail to describe the underlying theory of change and observed transmission mechanisms behind an intervention. In some other cases, there is limited information about the characteristics of programme participants in the evaluation sample and their comparison group. Much remains to be done to improve reporting standards and advocate for more and better evidence examining the impact of youth employment interventions. The quality of the primary studies determines the quality of the systematic review and any subsequent synthesis of the evidence.
The review supported the identification of important evidence gaps:
• It is important to note that, despite the large and significant magnitude of effect of entrepreneurship promotion interventions in low- and middle-income countries, the evidence base is still limited and exhibits high variance, calling for more primary studies on this promising intervention type. Similarly, more and better evidence is needed on employment services, wage subsidies and public employment programmes for youth, particularly in low- and middle-income countries.
• While the review highlighted growing evaluation evidence from youth employment programmes implemented in sub-Saharan Africa, it also reported very limited information from the Middle East and North Africa, South Asia, and East Asia and the Pacific. These are regions where more targeted action to expand the evidence base should be considered.
• Similarly, more research is needed on intermediate outcomes in primary studies and evidence synthesis work. This is linked to the importance of improving research-reporting standards and expanding the scope of outcomes of interest in order to better synthesize evidence about how interventions affect knowledge, skills, attitudes and behaviours. More and better information on these intermediate outcomes will improve overall understanding of the causal pathways between the intervention and the final outcomes.
• Soft skills are in high demand among employers today. Their role in generating better outcomes has yet to be corroborated, and more inquiry is required to understand their place in the causal chain as well as their interaction with more technical skill sets.
• Lastly, future primary studies and evidence syntheses should engage with cost information. The applicability of the evidence hinges not only on its internal and external validity but also on its feasibility. More information is needed on programme costs as well as systematic comparisons against programme effects. What may look highly effective may in fact be too expensive to replicate or scale up.

THE RESEARCH PROBLEM: WHY YOUTH EMPLOYMENT?
The economic crisis brought about a swift reversal of the gradually declining trend in global youth unemployment rates observed between 2002 and 2007. The rapid increase in youth unemployment between 2007 and 2010 led to discouragement among youth and their withdrawal in significant numbers from the labour force. It is estimated that nearly 6.4 million youth worldwide moved into inactivity in response to the crisis, while many others continue to work yet live in poverty (ILO, 2012). The youth employment crisis has become a stubbornly persistent reality in all regions and in nearly every country. Of the estimated 200 million unemployed people today, about 37 per cent, more than 73 million, are between the ages of 15 and 24. This translates into a global youth unemployment rate that settled at 13.0 per cent during the period 2012 to 2014. While it is expected to remain relatively constant in the near future, this rate is still well above its pre-crisis level of 11.7 per cent (see Figure 1).
[Figure 1: Youth unemployment (millions) and youth unemployment rate (%)]

According to the 2015 Global Employment Trends for Youth report of the International Labour Organization, youth remain overrepresented among the unemployed and shaken by the changing patterns in the labour market. Two-fifths (42.6 per cent) of the global youth labour force were reported as being unemployed or in working poverty in 2013. Regional youth unemployment trends remain fairly mixed. Most notably, the youth unemployment rates in the Middle East and North Africa (MENA) continue to be the highest worldwide, at 28.2 and 30.5 per cent for 2014, respectively. These figures stand out in comparison to other regions, where the unemployment rate ranges from 10 to 20 per cent. In spite of the important achievements in boosting access to education and levels of educational attainment in the MENA region, today more than one in four active youth do not have a job (ILO, 2015a).
After being hit hard by the economic crisis, youth unemployment levels in Developed Economies and the European Union have seen some recent regional improvements, with the youth unemployment rate decreasing from 18.0 to 16.6 per cent between 2012 and 2014. However, these improvements mask difficult macroeconomic dynamics in certain countries, which are currently being further aggravated by conflict-driven migration. Six countries stand out in this respect, with unemployment rates of over 30 per cent, namely Croatia, Cyprus, Greece, Italy, Portugal and Spain.
Asian regions and sub-Saharan Africa continue to present relatively low unemployment rates among youth, although these statistics are all too often a reflection of the fact that youth cannot afford not to work and, as a matter of necessity, engage in poor quality and insecure jobs.
The challenge is not trivial since the "demographic dividend" can become a source of instability if young people around the world continue to face disappointing prospects in their job search. Unemployment depreciates human capital and has a significant negative influence on health, happiness, crime levels and socio-political stability (Bell and Blanchflower, 2009). Failing to address unemployment and underemployment among youth may contribute to the loss of human capital and an increase in social discontent.
Addressing the youth employment challenge continues to rank high in both international and local development priorities. The 2030 Agenda for Sustainable Development has placed the importance and urgency of achieving full and productive employment and decent work for all squarely at the centre of the new development vision, with youth explicitly identified as a key target group (Box 1).
It is therefore crucial to gather evidence to support the implementation of the 2030 Agenda. Yet very few rigorous overview and cross-country studies review and analyse the impact of youth employment programmes and what determines their success in different contexts. Even though the number of single-programme evaluations providing rigorous evidence on the effectiveness of active labour market programmes (ALMPs) has increased over the past decade, many fundamental questions remain unaddressed, particularly the key issues: Which programmes work for a given target group, and under what circumstances? What are the crucial design features necessary for youth employment programmes to be effective?
Box 1: Selected targets of the 2030 Agenda for Sustainable Development relating to youth employment

• 4.4: "By 2030, substantially increase the number of youth and adults who have relevant skills, including technical and vocational skills, for employment, decent jobs and entrepreneurship";
• 8.3: "Promote development-oriented policies that support productive activities, decent job creation, entrepreneurship, creativity and innovation, and encourage the formalization and growth of micro-, small- and medium-sized enterprises, including through access to financial services";
• 8.5: "By 2030, achieve full and productive employment and decent work for all women and men, including for young people and persons with disabilities, and equal pay for work of equal value";
• 8.6: "By 2020, substantially reduce the proportion of youth not in employment, education or training";
• 8.b: "By 2020, develop and operationalize a global strategy for youth employment and implement the Global Jobs Pact of the International Labour Organization"; and
• 9.3: "Increase the access of small-scale industrial and other enterprises, in particular in developing countries, to financial services, including affordable credit, and their integration into value chains and markets".

THE INTERVENTION: ALMPS FOR YOUTH
In support of more and better programmes and policies for the promotion of youth employment, this systematic review examines labour market interventions that fall into the category of ALMPs, defined as all social expenditure (other than education) aimed at improving beneficiaries' prospects of finding gainful employment or otherwise increasing their earnings capacity. This category includes spending on public employment services and administration, labour market training, special programmes for youth in transition from school to work, labour market programmes to provide or promote employment for unemployed and other persons (excluding young and disabled persons), and special programmes for the disabled (OECD, 2013).
ALMPs require active participation in programmes that enhance labour market integration, a requirement which differentiates them from other labour market and social protection policies, such as unemployment insurance schemes and non-conditional transfers. In the case of ALMPs, the economic rationale relies on market clearing (i.e., achieving a match between labour demand and supply) and market efficiency (for instance, through job-search assistance, provision of labour market information and pre-screening of programme applicants). ALMPs can also enhance labour supply by providing training, foster labour demand through labour-intensive public employment programmes and entrepreneurship and self-employment measures, or alter the structure of demand by offering employment subsidies (Auer et al., 2008).
ALMPs considered in the systematic review are clustered in the following typology of interventions:
1. Training and skills development, including providing trade- or job-specific technical skills, business skills training, literacy and numeracy programmes, and programmes that improve non-technical skills, such as the core work skills, behavioural skills, life skills or soft skills of jobseekers.
2. Entrepreneurship promotion, aiming to provide entrepreneurial skills as well as access to capital. Interventions may provide or facilitate access to credit (including through microfinance programmes), provide start-up grants and technical support, or foster micro-franchising mechanisms.
3. Employment services, delivering job counselling, job-search assistance, and/or mentoring services, which are often complemented by job placements and technical or financial assistance.
4. Subsidized employment, including wage subsidies and labour-intensive public employment programmes aiming to reduce the labour cost for employers and provide employment to youth in infrastructure or social development and community projects, respectively.
Although the focus of ALMPs tends to be on economic relevance, they can have important social and political dimensions (Betcherman, Dar & Olivas, 2004). ALMPs can foster the social inclusion of disadvantaged groups while signalling a willingness on the part of politicians to engage with their specific problems.

HOW THE ALMPS ARE SUPPOSED TO WORK
This section offers some theoretical underpinning to the ways in which the interventions included in this systematic review may improve the labour market outcomes of youth. The underlying assumption of programmes is that participation in ALMPs will ultimately improve the employment and earnings outcomes of participants, as well as the performance of those businesses that programme participants start or already own.
Exposure to ALMPs is expected to create a spillover effect among non-participants, as well as general equilibrium effects throughout the economy. While some of these spillovers may positively affect overall employment outcomes, in certain cases ALMPs can have a negative impact on the performance of non-participants. For example, there is evidence that wage subsidy programmes can lead to substitution effects (with subsidized workers replacing non-subsidized workers) and windfall effects (when part of the subsidies go to workers who would have been hired in any case), thereby decreasing the overall employment impact of the programme. To address this issue, increased attention must be given to programme design features such as the establishment of conditionalities for employers (Almeida, Orr & Robalino, 2014).
This section summarizes the theories of change behind ALMPs for youth, aiming to map out the relationship between: (i) the resources that are invested ("Inputs"); (ii) the intervention that takes place, including the different activities that may be part of the intervention ("Activities"); (iii) the individual-level competencies and constraints (such as knowledge, attitudes and behaviours) which are directly affected by the intervention ("Outputs"); and, finally, (iv) the individual labour market outcomes that can be measured as part of an impact evaluation study ("Outcomes"). Key assumptions are also made to determine whether any given event in the sequence actually yields the expected changes in labour market outcomes.
Once the theories of change are clear, the systematic review examines whether the evidence supports the expected causality and impact across the selected intervention types, namely: training and skills development, entrepreneurship promotion, employment services and subsidized employment.
Building on existing literature, operational manuals and programme information, this section describes each intervention and its underlying theory of change. Even though labour market programmes often combine interventions from different categories, the results chains for each category have been separated to provide further transparency in the assumptions and support the interpretation of results to reveal potential causal mechanisms.
In the interests of a well-defined intervention description, those activities and outputs that are not strictly linked to labour market effects have been omitted. Similarly, a narrow focus has been adopted on individual-level labour market outcomes, leaving aside other potential side-effects, such as increased psychosocial well-being. For simplicity, higher level or "longer term" outcomes -such as poverty reduction, economic growth or democratization -are not explicitly shown in the chain of effects, nor are potential general equilibrium effects that may reduce the macroeconomic effectiveness of an intervention. Nonetheless, most of the programmes under scrutiny have broader macroeconomic effects, which will play an important role when scaling up or replicating the programme. In fact, some of the interventions may explicitly target higher-level (economy-wide) macroeconomic outcomes, such as social protection aspects (e.g., public employment programmes may be designed to smooth consumption during recessions or crises).

Training and skills development
Education and skills are considered a core factor in determining young people's opportunities in the labour market (Biavaschi et al., 2012). Skills training programmes are therefore the most widely used labour market intervention for young people worldwide and are increasingly delivered as a complement to other labour market measures (Betcherman, Godfrey, Puerto, Rother & Stavreska, 2007; Fares & Puerto, 2009). Training and skills development comprises programmes outside the formal education system that offer skills training to young people in order to improve their employability and facilitate their transition into the labour market. 1 The objective of skills training programmes is to develop the employment-relevant skills of jobseekers. Broadly speaking, these skills refer to a set of job-specific technical skills, but also include non-technical soft skills, such as self-management, teamwork and communication. Increasingly, employers across the world are placing higher value on these non-technical skills than on technical competencies (Manpower Group, 2013; Cunningham, Sanchez-Puerta & Wuermli, 2010; Youth Employment Network & International Youth Foundation, 2009).
This analysis classifies training programmes according to the skill set which they target (Table 1): 1. First, training programmes that address a lack of trade-or job-specific technical skills demanded by employers. Such skills range from manual skills to computer literacy. Technical skills training programmes often include an on-the-job training component in order to increase practical work experience (i.e., by placing participants in internships, workplace training or apprenticeship schemes).
2. Second, business skills training, which is often provided as an element of programmes that aim to increase entrepreneurial activities among youth. Such entrepreneurial training programmes cover a wide variety of factors that are believed to determine business success (ranging from financial skills to problem-solving skills).
3. Third, literacy and numeracy programmes, which are designed to teach basic skills or cognitive abilities to youth who had not acquired them by the time they left school (sometimes called "second-chance programmes").
4. Finally, programmes that improve non-technical skills, such as behavioural skills, life skills or soft skills of jobseekers.
Technical training programmes are popular in development cooperation because many developing countries experience a skills mismatch between their labour force and emerging segments of their economies. However, pure training programmes have not proven to be particularly successful in many contexts (Betcherman et al., 2004). Therefore, most recent programmes tend to combine skills training with other types of interventions; for example, on-the-job training or employment services (Cunningham et al., 2010; Fares & Puerto, 2009). An example of a skills training programme is provided in Box 2.
A number of conditions determine whether skills training programmes succeed in bringing additional youth into work, most notably the alignment between the skills offered by a training programme and those demanded by the market. To this end, some programmes introduce a market-based (or bottom-up) approach in programme design. This approach enables training curricula and programme components to respond much more effectively, in a demand-driven fashion, to the needs of employers (in both the private and public sectors) and communities.
Furthermore, the success of all these interventions relies on the assumption that the (correct) target group participates in the training and that the training is appropriate and conducted in a way that actually augments the skill sets that are relevant to the labour market. Finally, a crucial element may be the award of a legitimate certificate on successful completion of a programme to prove the acquisition of increased knowledge and skills to potential employers in the job market.

Box 2: Training and skills development: Juventud y Empleo in the Dominican Republic
The Youth and Employment Programme, Juventud y Empleo (JE), in the Dominican Republic represents an innovative model of an ALMP to improve employability and human capital of young people between the ages of 16 and 29 who did not complete high school. The programme provided young people with vocational training (150 hours) and basic or life skills training (75 hours) combined with internships in private sector firms (240 hours). The programme was managed by the Ministry of Labour in cooperation with the National Institute of Technical and Vocational Training (Instituto Nacional de Formación Técnico Profesional) and with financial support from the Inter-American Development Bank. Training services were provided by private training institutions.
The programme came into operation in 2001 and was the first job-training programme in Latin America and the Caribbean to incorporate a randomized evaluation component in the project design. The first impact evaluation showed limited impacts on employment and wages, which led to changes in the programme to focus on working more closely with the private sector and providing a stronger life skills component. Further evaluation results showed that the programme had a positive impact on job formality for men and a positive effect on monthly earnings among those who were employed. In addition, the programme was effective in reducing teenage pregnancy and showed a positive impact in various measures of non-cognitive skills.

Entrepreneurship promotion
Innovative entrepreneurial activities can promote job-rich growth and accelerate economic diversification through gains in productivity and competitiveness. The returns of entrepreneurship to economic development are greatest within business environments that are amenable to innovation and creativity and that provide appropriate regulation, access to infrastructure services and finance (ILO, 2015b). However, entrepreneurship also carries substantial risks of failure and has the potential to contribute to job losses if increased productivity and competition lead to layoffs in existing enterprises (Kritikos, 2014).
Entrepreneurs are important income providers and job creators. They benefit booming economies by challenging existing enterprises to innovate and compete in order to keep up with rapidly changing technologies and global markets. They also benefit economies that are suffering from slow job growth or stagnation by boosting labour demand, developing innovative goods and services and stimulating competition.
Depending on the context, entrepreneurs can be driven by choice or by necessity. Entrepreneurs by choice select entrepreneurship over other employment options in order to increase their income or become more independent. Entrepreneurs by necessity, also known as subsistence entrepreneurs, face a market situation with insufficient labour demand and therefore lack formal employment opportunities, exposing their entrepreneurial ventures to the low productivity and precarious working conditions that prevail in the informal economy.
The enterprise size and its corresponding ability to grow and to create jobs also help to identify the rare "transformation entrepreneurs" or "gazelles". These are the few entrepreneurs whose enterprises grow to become larger enterprises and generate most of the new jobs. Their high-growth enterprises create jobs and income for others, beyond the scope of an individual's subsistence needs (Cho, Robalino & Watson, 2014). In contrast, the enterprises of subsistence entrepreneurs usually do not grow, but provide income and employment for the owner of the micro-enterprise and their immediate family.
Entrepreneurship promotion programmes considered for this systematic review aim to lower the barriers and costs faced by young unemployed and underemployed people who plan to establish or maintain a business. Since the scope of formal wage employment is often limited in developing countries, increasing (formal) self-employment among the labour force is considered an important anti-poverty strategy (Gindling & Newhouse, 2012). Because self-employed and small-scale entrepreneurs often face numerous internal and external constraints, a multitude of measures exist to support the process.
Access to capital is often a primary constraint for young entrepreneurs. Schoof (2006) identifies a number of constraints to accessing start-up finance, ranging from inadequate personal savings and resources to a lack of securities and credibility, insufficient business experience and skills, and strict credit-scoring methodologies and regulations, among others. Accordingly, many entrepreneurship programmes address the lack of access to (affordable) finance faced by young entrepreneurs. The review team disaggregated such programmes into three types:
1. Those facilitating access to credit
2. Those providing start-up grants
3. Those fostering microfranchising mechanisms.
ALMPs that facilitate access to finance often provide technical training and advice and support setting up partnerships and capacity-building schemes with (and for) microfinance institutions (MFIs) and banks.
In addition to access to finance, some programmes offer training on business and management skills as well as business advisory services and mentoring for soon-to-be or already self-employed youth. Finally, some interventions aim to reduce the barriers to business creation by assisting prospective entrepreneurs to enter established markets or existing value chains. The abovementioned interventions and their results chain are shown in Table 2. Some skills training programmes (as described in Section 1.3.1 above) incorporate features of entrepreneurship training and specific skills relevant for starting or maintaining a business.
Many entrepreneurship programmes take a multi-component approach; for example, combining access to credit with business skills training or the provision of post-programme consultation (i.e., mentoring and coaching).
Primarily, entrepreneurship programmes increase employment through their direct effect on the soon-to-be self-employed participant. The assumption is that beneficiaries actually plan to set up a new business after receiving credit and/or training (i.e., that targeted and trained individuals have been appropriately selected for the programme) and that they would not have done so without the intervention.
In order to generate additional jobs, entrepreneurship programmes have to assume that the intervention leads to either (i) increased marginal productivity of the input labour or (ii) increased output and profits resulting in additional investments and labour demand. To achieve this end, the training must suit the context and knowledge of the participants. Beneficiaries then have to apply the training or credit to their business and thereby increase performance and competitiveness. 3 Whether or not an entrepreneur will finally hire additional workers may also depend on the macroeconomic and labour market environment.
Box 3 describes the programme Start and Improve Your Business (SIYB), a widely used and adapted entrepreneurship training package designed by the International Labour Organization (ILO) and tailored for youth.

Box 3: Entrepreneurship promotion: Start and Improve Your Business
The Start and Improve Your Business (SIYB) programme is a management-training programme with a focus on starting and improving small businesses as a strategy for creating more and better employment in developing and transitional economies. The SIYB programme is a system of interrelated training packages and supporting materials for small-scale entrepreneurs. The programme is designed by the ILO and implemented with support from certified trainers in partner institutions in more than 100 countries with an estimated outreach of 6 million trainees. Initially developed in the 1980s, it has now been translated into more than 40 languages. The Start Your Business (SYB) package provides a five-day training course for potential entrepreneurs with concrete and feasible business ideas and proposes a follow-up programme including counselling sessions. SYB assists participants to develop a business plan with a marketing strategy, a staffing plan and a cost plan.
The 2011 SIYB Global Tracer Study found that new businesses started after the training generated, on average, three jobs each. In Uganda, a randomized controlled trial (Fiala, 2014) that provided mainly young business owners with loans, cash grants, the SYB training module or a combination of these components showed that, six and nine months after the interventions, men with access to loans combined with business skills training reported 54 per cent greater profits.

Employment services
Employment services programmes are generally based on the (matching and) intermediation approach to active labour market policy. Interventions within employment services are shown in Table 3. Job-placement programmes acknowledge the existence of information asymmetries and, particularly, incompleteness of information in the labour market. Hence, these programmes aim to improve the job-matching process by providing information and support to both sides of the labour market. On the one hand, they inform young jobseekers about suitable job opportunities (a service which is of particular relevance to youth who have only recently entered the labour market and are experiencing difficulties in marketing themselves or lack the knowledge, information and networks to find job openings). On the other hand, they provide information to potential employers about unemployed youth. The underlying idea is to facilitate the matching of employment opportunities with jobseekers while reducing the costs and risks to employers connected with recruiting young people.
The second type of intervention, job-search assistance services, includes job-search training, educational or career guidance, counselling and monitoring programmes. Such programmes primarily target disadvantaged or demotivated youth who are disconnected from the labour market. Their primary aim is to improve the intensity, motivation and effectiveness of participants' job-searches.
Mentoring programmes are also provided to youth who are not currently unemployed but are in education or have just entered the labour market (post-placement support). Accordingly, in some circumstances, mentors encourage mentees to stay in education or in on-the-job training. In many countries, employment agencies adopt a case-management approach (identifying barriers to employment, designing individual action plans, referring jobseekers to appropriate interventions and monitoring job-search activity), which has been argued to be the most effective method of providing these services (Walther & Pohl, 2005).
While in some countries public employment agencies continue to be the main providers of employment services, other countries have moved into subcontracting, opening an important role for private employment agencies to address mismatches and information failures in the labour market. Box 4 illustrates a subcontracting model applied by a French public employment agency to facilitate counselling and job-placement for educated youth.

Box 4: Employment services: Counselling and job placement for young graduate jobseekers in France
In France, the government agency Pôle Emploi matches jobseekers with potential employers and provides benefits and job counselling to the unemployed. In 2007, the French Government decided to experiment with subcontracting employment services for young graduates who had been unemployed for at least six months to private providers. The jobseeker assistance programme aimed to help jobseekers find work and to support the former jobseeker in retaining that job or finding a new job. For the first six months of the programme, the private employment agency counselled the jobseeker and helped to find a job with a contract duration of at least six months. During the first six months of employment, the client continued to be supported and advised by the agency.
A randomized experiment measured the direct and indirect (displacement) impacts of job-placement assistance on the labour market outcomes of young people. The evaluation found that the reinforced counselling programme had a positive impact on the employment status of young jobseekers eight months after assignment to the treatment group, compared to untreated jobseekers. However, these positive effects appeared to have come partly at the expense of eligible workers who did not benefit from the programme, particularly in labour markets where they were competing mainly with other educated workers and in weak labour markets.
There are indications that involvement in employment services (and in ALMPs in general) has a stigmatizing effect on participants (Boone & van Ours, 2004; Kluve, Lehmann & Schmidt, 1999). Addressing this adverse effect is a precondition for successful implementation. To this end, job-placement and job-search assistance programmes are often connected to financial incentives for jobseekers and/or employers. For example, such schemes may involve the imposition of sanctions on the unemployed for failure to comply with the terms of the intervention. Similarly, the marketing of unemployed youth may be combined with the offer of short-term subsidies to employers.

Subsidized employment
Insufficient labour demand is one of the main constraints faced by young job market entrants, particularly in developing economies. Subsidized employment interventions comprise two main areas: wage subsidies and labour-intensive public employment programmes (Table 4), both of which are designed to increase the job and training opportunities available to unemployed youth. The main aim of both types of intervention is to ensure that individuals who do not find a job on the regular labour market remain integrated and connected to economic and social life. To that end, such programmes offer short-term interventions but primarily work towards longer-term labour market impacts.
Wage subsidies are transfers to employers or employees in order to fully or partially cover eligible individuals' wage or non-wage employment costs. Most often, the measures aim to incentivize employers to hire members of a specific target group. Wage subsidies come in numerous forms and can be offered through various mechanisms, ranging from direct transfers to firms or workers to reductions in social security contributions or payroll taxes or tax credits.
Employer-side subsidies reduce the financial costs or risks associated with not knowing the productivity of the person to be employed. As with employment services, this is a scheme which is particularly relevant to youth entering the labour market for the first time, and whose (perceived) marginal productivity may be below market wages. Employer-side subsidies may also serve to lower the costs to employers of providing on-the-job youth training. Such training subsidies offer the possibility of expanding the number of work-based training places for disadvantaged young people.
Employee-side subsidies promote labour supply through increasing the returns from employment and hence increasing incentives to seek and retain employment. While it is believed that employer-side subsidies may also encourage more active job-search (because youths believe they will be able to find work), providing employee-side earning supplements may permit more effective targeting of specific socio-demographic groups. Furthermore, whereas employer-side subsidies tackle a lack of labour demand, employee-side subsidies may be more appropriate in countries that face labour supply constraints, for example due to reservation wages.

It is important to acknowledge the limited use of, and evidence on, wage subsidies in developing countries. Almeida et al. (2014) detail the results of experimental and quasi-experimental impact evaluations around the world. Most evidence comes from the United States, with rather mixed results concerning the effectiveness of wage subsidies as tools for fostering job creation.
Evidence on the impact of youth-targeted wage subsidies in developing countries is limited and results are mixed. Evaluations of wage subsidies in Jordan (Groh, Krishnan, McKenzie & Vishwanath, 2012) and South Africa (Levinsohn, Rankin, Roberts & Schoer, 2014) show positive though rather short-lived effects and narrow participation from firms. Details of the Jordan New Opportunities for Women pilot are shown in Box 5. A recent review of wage subsidies for youth argues that, if well targeted, the interventions can be effective in improving employment outcomes of disadvantaged youth (Bördős, Csillag & Scharle, 2016).

Box 5: Subsidized employment: Jordan New Opportunities for Women (Jordan NOW)
The Jordan New Opportunities for Women (Jordan NOW) pilot aims to increase employment of female community college graduates in Jordan by offering wage subsidies and training to graduating students. Groh, Krishnan, McKenzie and Vishwanath (2012) examined the impact of the pilot in a randomized experiment. Female graduating students were randomly allocated into four groups: a treatment group which received a job voucher; a treatment group which was invited to attend an employability skills training course designed to provide key soft skills demanded by employers; a treatment group which received both the voucher and the training; and a comparison group.
The pilot targeted young female graduates, who could take the job voucher to a firm while searching for jobs. The job voucher paid the employer an amount equal to the mandatory minimum monthly wage of 150 JD (US$210) per month for a maximum of six months within an 11-month period if the firm hired the worker, thereby acting as a wage subsidy.
The analysis finds that the job voucher led to an increase in employment in the short term, but that most of this employment was not in the formal sector, and the average effect was much smaller and no longer statistically significant four months after the voucher period had ended. The voucher does appear to have had persistent impacts outside the capital, where it almost doubled the employment rate of graduates. However, the analysis suggests that employment gains may have resulted from displacement effects.
Source: Groh, Krishnan, McKenzie and Vishwanath, 2012. Note: The description above focuses on the wage subsidy intervention of the pilot.
The second type of labour market intervention analysed in this category is labour-intensive public employment programmes, also known as public works. These programmes are commonly used to increase aggregate demand for labour in contexts where markets are unable to create productive employment on the required scale. In addition to their ability to create direct jobs, public employment programmes also generate income and deliver public assets and services. Despite the strong association of these programmes with infrastructure and construction works, they can be quite versatile, with works and projects in the social sector, environmental services and multi-sectoral, community-driven programmes (Lieuw-Kie-Song, Philip, Tsukamoto & Van Imschoot, 2010; Lieuw-Kie-Song, Puerto & Tsukamoto, forthcoming).
In this type of intervention, basic social income recipients are recruited for public jobs and receive a small earning supplement to their unemployment assistance. Programmes usually target unskilled, disadvantaged or long-term unemployed workers with the aim of keeping them in contact with the labour market and mitigating the depreciation of human capital during periods of unemployment.
While public employment programmes have often been recommended as a measure in times of crises (such as seasonal shocks or economic recession), 7 they are increasingly used as a regular component of wider employment policies (Lieuw-Kie-Song et al., 2010). In addition, they have become popular as a mechanism for addressing youth unemployment (Grosh, del Ninno, Tesliuc & Ouerghi, 2008), serving both as an introduction to the world of employment and as a tool to maintain social integration. This is particularly relevant for youth service programmes, in which youth can "play an active role in community and national development while learning new skills, increasing their employability, and contributing to their overall personal development" (Cunningham, McGinnis, Verdú, Tesliuc & Verner, 2008).
Most wage subsidies and public employment programmes are designed to support employment only in the short or medium term. A positive effect on final outcomes is only attainable if the work experience and training received during the period of subsidized work also improve the longer-term employment prospects of participants. For this reason, (i) wage subsidies are often granted to firms that agree to provide additional training to subsidized employees (i.e., in connection with apprenticeship schemes) 8 and (ii) public employment programmes are often paired with exit strategies, such as skills training or entrepreneurship promotion.

7 The programmes' potential to yield stabilization benefits is higher when they are implemented at the right time. Some programmes, particularly in South Asia, are implemented seasonally to ensure that employment is available during the agricultural slack seasons. Others, such as Argentina's Trabajar Program, are implemented during sharp economic crises as a means of increasing the incomes of poor families and those badly affected by recessions.
8 The ability to retain work following the expiration of the wage subsidy period also serves as a signal of the acquisition of certain work-related behavioural skills to potential future employers.

WHY THE REVIEW IS NEEDED
Policymakers and practitioners are seeking answers to the youth employment challenge; looking for ideas and guidance on what works best and why, in order to improve the labour market conditions of young people. Their objectives require a solid evidence base. During the 2012 International Labour Conference, governments and social partners recognized the need for more rigorous evaluation of youth employment interventions in order to review their effectiveness and, in particular, asked the International Labour Office to strengthen the evidence base on youth entrepreneurship interventions (ILO, 2012). Similar requests for information, technical and financial assistance are often made to the World Bank by client countries. Donors, NGOs and employment practitioners in general are also intent on identifying success factors to support youth.
Youth employment interventions, such as entrepreneurship promotion, training and skills development, employment services, mentoring and subsidized employment are considered common measures to improve youth labour market outcomes. Even though the number of studies contributing to rigorous evidence on the effectiveness of ALMPs has increased over the past decade, many fundamental questions remain unanswered, particularly with regard to context, programme type, design features and target groups.
• The role of context: Evidence on youth employment programmes is most common among developed countries and is particularly scarce in the Middle East and North Africa, Asia and sub-Saharan Africa. While contextual variables, such as levels of income and development, seem to play a role in shaping the probability of positive outcomes from youth ALMPs (Betcherman et al., 2007), more information is needed to understand how similar intervention models may affect youth differently in developed as opposed to developing contexts. Moreover, further evidence is required on the interventions and design features that are better suited to rural than to urban contexts, to informal rather than formal settings, and to post-conflict and fragile-state environments.
• The question of programme focus: The majority of evaluations focus on the area of training and skills development, while evidence on other types of youth employment interventions, such as subsidized employment, employment services and entrepreneurship promotion, is relatively scarce. There is a significant knowledge gap regarding the effectiveness of combining different types of programme; for example, bundling skills training, job-search assistance and mentoring.
• The efficacy of various design features: Little is known about the effectiveness of programme alternatives. There are several areas where policy choices can make a significant difference: design of the interventions; targeting mechanisms; length of exposure to the interventions; pedagogy; governance, management and administration; delivery channel (public, private, partnerships); delivery setting (classroom, on-the-job); and contracting, auditing and payment systems to providers of services. More evidence needs to be gathered on these design aspects.
• The range of beneficiaries: More evidence is needed to provide clarity on how different types of programmes affect individuals differently by age cohort, gender, level of education, ethnicity and socio-economic background.
Focusing on youth employment and understanding what works in terms of improving the labour market outcomes of youth is therefore of significant practical relevance. With the aim of informing policymaking and programming with evidence-based recommendations, this systematic review takes stock of the available evidence and examines changes in labour market outcomes prompted by labour market interventions for youth.
Assessing the impact of ALMPs has been a major focus of social welfare policies for decades, particularly in developed economies. It has also become a regular feature of recent public programmes in developing and transitional economies, given the increased budget constraints and need for policy decisions that are based on rigorous evidence of programme benefits and losses.
Such assessments have been regularly undertaken through social experiments that allow the estimation of programme impact by comparing observed changes in outcomes against what would have happened in the absence of a programme. In these experiments, random assignment is used to allocate the intervention among members of an eligible population. Differences in outcomes between the programme participants and their comparison group counterparts can be attributed solely to the programme since, according to the design parameters, there should be no correlation between participant characteristics and assignment to the programme (3ie, 2013). 9 Experimental evaluation evidence is growing in the field of youth employment, although most available evidence still relies on quasi- or non-experimental methods. The Youth Employment Inventory (YEI), 10 an online global repository of information on labour market programmes for youth, offers records of impact evaluation studies of youth employment interventions worldwide. While rigour varies between studies, there is a clearly observed transition towards randomized experiments and stylized methods of evaluating impact.
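As a purely illustrative sketch (the numbers are simulated, not data from any study in this review), the difference-in-means logic of random assignment described above can be expressed as follows:

```python
import random
import statistics

# Illustrative only: simulated outcomes, not data from any evaluated programme.
# With random assignment, participant characteristics are uncorrelated with
# treatment status, so the average treatment effect (ATE) can be estimated as
# the simple difference in mean outcomes between the two groups.
random.seed(42)

n = 1000
# Employment indicators (1 = employed): an assumed 40% baseline employment
# probability for the comparison group, plus an assumed 10-percentage-point
# true programme effect for the treatment group.
control = [1 if random.random() < 0.40 else 0 for _ in range(n)]
treated = [1 if random.random() < 0.50 else 0 for _ in range(n)]

ate = statistics.mean(treated) - statistics.mean(control)

# Standard error of a difference in proportions, for a rough precision check.
p_t, p_c = statistics.mean(treated), statistics.mean(control)
se = (p_t * (1 - p_t) / n + p_c * (1 - p_c) / n) ** 0.5

print(f"Estimated ATE: {ate:.3f} (SE {se:.3f})")
```

The estimate recovers the assumed effect up to sampling error; in quasi- or non-experimental settings, by contrast, a simple difference in means would confound the programme effect with selection into participation, which is why those designs require additional identification strategies.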
This systematic review examined experimental and quasi-experimental evaluations of ALMPs that target youth. It looked at the available evidence in order to fill the knowledge gap on the impact and effectiveness of these interventions in a systematic and rigorous manner. Section 3 provides further information on the methodology adopted for the review's analysis.

9 In a review of evaluation methods used in ALMPs, Heckman, LaLonde and Smith (1999) identified a number of methodological lessons, ranging from recognition of the multiplicity of parameters and heterogeneous impacts intrinsic to ALMPs to the need for appropriate comparison groups and the importance of addressing selection bias. Experimental evaluations can effectively learn from these lessons by providing a framework that relies on credible comparison groups and minimizes selection bias.
10 Available at: www.youth-employment-inventory.org [20 Feb. 2016].
Other reviews have looked at impact evaluations of youth employment programmes from different angles and at varied levels of depth. Table 5 presents the available evidence on completed reviews, identifying key differences between them and this review and summarizing its added value.
While some previous studies synthesize the evidence base on the effectiveness of ALMPs (e.g., Card, Kluve & Weber, 2010), very few reviews specifically focus on programmes and outcomes for youth. The most relevant review of labour market interventions for youth to date, Betcherman et al. (2007), has served as the basis for technical assistance and policy advice worldwide. Since then, a vast amount of research has been published, using experimental or quasi-experimental methods to determine the impact of new and innovative employment programmes. While some recent reviews cover this new evidence, they either do not synthesize the existing empirical evidence using empirical methods such as meta-analysis (J-PAL, 2013) or they only look at (potentially selective) subsets of the available evidence (IEG, 2012; Eichhorst & Rinne, 2015). Other studies only include specific types of intervention or outcomes (Tripney et al., 2013; Grimm & Paffhausen, 2015; Piza et al., 2016; Valerio et al., 2014), with the implication that some of the evaluations included in these studies were also included in this systematic review.
To the best of the review team's knowledge, this is the first systematic review of the impact of employment interventions on youth labour market outcomes to collate global evidence from youth ALMPs, examine employment, income and business performance outcomes and identify study effect sizes through a rigorous meta-analysis.

J-PAL (2013)
A 2013 review paper produced by the Abdul Latif Jameel Poverty Action Lab covered an array of youth interventions from education and health to labour market programmes. The paper discusses existing knowledge about and gaps in policies focused on youth. It identifies unanswered questions and sets a research agenda that will be updated periodically. In the area of ALMPs for youth, the review considers open questions on the effectiveness of employment services, training, subsidized employment and public works programmes. There is very limited information about the search methodology behind the review but it is clear that it builds on results from cross-country reviews and impact evaluations to identify and discuss knowledge gaps. The review does not rely on a statistical meta-analysis or study effect sizes.

Eichhorst and Rinne (2015)
In contrast to that review, this systematic review covers countries from all levels of development and disregards any training programme that is delivered in a formal education setting.

ALMPs in general, not focused on youth
There is a series of cross-country studies that reviewed the impact of ALMPs with specific findings from youth employment programmes, including Betcherman et al. (2004), Dar and Tzannatos (1999), Card et al. (2015) and Filges et al. (2015). The sample of programmes specifically targeting youth is limited in these studies, as are the corresponding findings. The systematic review and meta-analysis of ALMPs by Filges et al. (2015) covers programmes for those receiving unemployment insurance. Card et al. (2015) offer a relatively rigorous search and quantitative analysis of impact based on study significance. Other studies with similar limitations looked at programmes implemented in Organisation for Economic Co-operation and Development (OECD) countries only, e.g., Heckman et al. (1999) and Kluve and Schmidt (2002).

Valerio et al. (2014) review entrepreneurship education and training programmes that aim to provide individuals with the entrepreneurial mindsets and skills to enable them to participate in entrepreneurial activities. In addition to evaluations based on experimental and quasi-experimental designs, their study also includes tracer studies as well as monitoring and evaluation reports that rely largely on administrative data. The study does not include a statistical meta-analysis or study effect sizes.

Piza et al. (2016) review the impacts of business support services for small and medium-sized enterprises (SMEs) on firm performance indicators, employment generation and labour productivity in low- and middle-income countries. The review examines interventions including tax simplification; boosting exports and facilitating access to external markets; support for innovation policies; support for local production systems; training and technical assistance; and SME financing and credit guarantee programmes. The review relies on a systematic search strategy and provides a statistical meta-analysis.

Objectives
This systematic review aims to provide policymakers and practitioners with evidence-based recommendations on what works to effectively support youth in the labour market by summarizing and integrating empirical research to investigate the impact of labour market interventions on labour market outcomes of young people. The review also examined whether the evidence supports the underlying assumptions about what active labour market policies (ALMPs) for youth are designed to achieve.
The following research questions framed the analysis, with the aim of establishing what constitutes effective measures and ultimately helping decision-makers to allocate their resources and determine their level and portfolio of investment in youth employment:

1. What is the impact of youth employment interventions on labour market outcomes of youth? In particular, the review investigates skills training, entrepreneurship promotion, employment services and subsidized employment interventions.
2. Which of these interventions are the most effective on average?
By synthesizing the evidence on the relative effectiveness of different labour market interventions for youth, this systematic review has contributed to closing the knowledge gap in this field, which will have a real impact on the 73 million young men and women who are currently actively looking for a job.

TITLE REGISTRATION AND PROTOCOL OF THE SYSTEMATIC REVIEW
The title registration for this systematic review was published in The Campbell Collaboration Library of Systematic Reviews on 1 November 2013. The protocol of this review was published on 3 November 2014. 11

INCLUSION CRITERIA
This systematic review focused on studies that investigated the impact of interventions on labour market outcomes of young people. The selection of studies was based on the following inclusion criteria (also outlined by the screening questionnaire presented in Section 9.6.2 of the Appendix).

Population and context
The review was global in coverage and considered interventions from all countries, regardless of their level of development. Studies investigated active labour market policies (ALMPs) that were designed for -or targeted primarily -young women and men aged 15 to 35, in consideration of varying national definitions of youth.

Intervention
The ALMPs examined in the study (i) targeted the unemployed or those with low levels of skills or limited work experience or who were generally disadvantaged in the labour market and (ii) aimed to promote employment and/or earnings/wage growth among the target population, rather than simply providing income support (Heckman et al., 1999). Eligible studies evaluated an ALMP that provided at least one of the following categories of intervention (also shown in Section 1.3): training and skills development, entrepreneurship promotion, employment services and/or subsidized employment. An overview of the categories of intervention is presented in Table 6.

Training and skills development
Comprised programmes outside the formal education system (and therefore did not include Technical and Vocational Education (TVE) programmes) that offered skills training to young people in order to improve their employability and facilitate their transition into the labour market.

Entrepreneurship promotion
Aimed to provide entrepreneurial skills as well as physical, financial and social capital for youth becoming self-employed and starting a business and for those seeking to expand and grow their businesses.

Employment services
Delivered job counselling, job-search assistance and/or mentoring services, which were often complemented by job placements and technical or financial assistance.
Subsidized employment
Considered mainly those programmes which provided wage subsidies or interventions that aimed to reduce labour costs for employers taking on young workers, as well as labour-intensive programmes or public works which provided short-term employment to youth in infrastructure or social development and community projects.
As discussed in more detail in the protocol, this review made an important distinction between programmes, interventions and components of an intervention: A youth employment programme was considered to be a single entity that might consist of one or several interventions. In addition, each of these interventions could have different components: It was possible to find a comprehensive intervention that offered, for instance, both skills training and employment services (to the same participant). Some examples of such multi-component interventions included the Job Corps programme in the United States, the Economic Empowerment for Adolescent Girls programme in Liberia, the Projoven programme in Peru and the Employment Fund in Nepal.
Interventions were therefore specific tracks or sub-programmes of an overall programme that were offered to different samples of participants. They were defined based on their characteristics, such as the category of intervention or the population targeted. For example, if a programme had a training track and an employment services track and participants took one or the other, they were considered to be two interventions within the same programme. Note that, according to this definition of track, it was assumed that each intervention within a programme had separate groups of participants which did not overlap. In order to provide evidence on which interventions and combinations were shown to work best, these different types were evaluated separately in the meta-analysis in the empirical Section 4.3 on Synthesis of results.
Additional consideration was given to identifying primary intervention types among multicomponent designs. The review defined "main category of intervention" as the largest and predominant intervention type within a programme. If several intervention types were equally distributed across the target population (i.e., an individual was exposed to more than one intervention type with the same level of intensity), the main category of intervention was classified as unspecified.

Comparison
The systematic review included studies that measured change in at least one outcome of interest among intervention participants and relative to non-intervention participants based on a counterfactual analysis. Eligible comparison groups (counterfactual) included those which received no intervention or were due to receive the intervention in a pipeline or waitlist study. Note that the comparison group of some studies might have been exposed to interventions other than the evaluated intervention. The review excluded studies that only measured the relative effects of two alternate interventions, without reference to a non-intervention comparator.

Outcomes
Eligible studies reported at least one selected outcome variable measuring the following primary outcomes of interest presented in Table 7: Employment outcomes, earnings outcomes and business performance outcomes. The review also captured outcomes which were measured conditional on other outcomes.

Study designs
The review focused on completed experimental and quasi-experimental evaluations and considered the following research design categories and impact evaluation methods to estimate quantitatively the causal effect of the intervention on the outcome it intended to influence: (i) randomized experiments, (ii) methods for causal inference under unconfoundedness (classical regression methods, statistical matching, propensity score matching) and (iii) selection on unobservables (instrumental variables, regression discontinuity design, difference-in-differences).

Randomized experiments:
The most straightforward case for analysis occurred when assignment to treatment was randomized (in controlled conditions) and, therefore, independent of covariates X as well as the potential outcomes Y. In such classical randomized control trials (RCTs) it was relatively easy to obtain estimators for the average effect of the treatment using, for example, the simple difference-in-means by treatment status. Randomized experiments have been used in the evaluation of labour market programmes since the 1970s (starting in the United States), with an increasing trend over the past decade. The descriptive analysis by research design included in this review in Figure 3 confirmed that RCTs have increasingly been used to assess the impact of youth employment interventions in recent years.
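The difference-in-means estimator described above can be sketched as follows. The data are simulated purely for illustration (the sample size, baseline employment rate and treatment effect are invented) and do not come from any study in the review:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Simulated RCT: random assignment makes treatment independent of
# covariates and potential outcomes (illustrative data only).
n = 2000
treated = rng.integers(0, 2, size=n).astype(bool)
baseline = rng.normal(0.45, 0.2, size=n)                 # employment propensity
outcome = (rng.random(n) < baseline + 0.08 * treated).astype(float)

# Simple difference in means by treatment status estimates the ATE
ate = outcome[treated].mean() - outcome[~treated].mean()

# Standard error of the difference in means (unpooled variances)
se = np.sqrt(outcome[treated].var(ddof=1) / treated.sum()
             + outcome[~treated].var(ddof=1) / (~treated).sum())

print(f"ATE estimate: {ate:.3f} (SE {se:.3f})")
```

Because assignment is randomized, no covariate adjustment is needed for the estimate to be unbiased, which is exactly the simplicity the paragraph above refers to.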

Methods for causal inference under unconfoundedness:
In this case, researchers analysed data from non-experimental (also called "observational") studies. Non-experimental data generally created challenges in estimating causal effects but, in one important special case, variously referred to as unconfoundedness, exogeneity, ignorability or selection on observables, questions regarding identification and estimation of the policy effects were fairly well understood (Imbens & Wooldridge, 2009). All these labels referred to some variant of the assumption that adjusting treatment and comparison groups for differences in observed covariates X (i.e., pretreatment variables) removed all biases in comparisons between treated and comparison units (Imbens & Wooldridge, 2009).

This case was of great practical relevance, with many impact evaluation studies relying on some form of this assumption: specifically, this category comprised classical regression methods (e.g., adjusting for covariates in a linear regression). Another method that was based on the unconfoundedness assumption and has been applied with increasing frequency is statistical matching, generating samples of treated and comparison units that are balanced in X and thus mimicking an experiment ex post. In practice, in recent years the most frequently used version of a selection-on-observables design has been propensity score matching, which adjusts for a scalar, the (estimated) conditional probability of receiving the treatment given the covariate vector X.
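The matching logic can be sketched as follows. This is a deliberately minimal nearest-neighbour match on a single observed covariate with simulated data (the covariate, outcome equation and effect size are invented for the illustration), rather than the full propensity score machinery used in the primary studies:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Selection on observables: programme take-up depends on an observed
# covariate x (e.g., years of schooling), which also affects earnings y.
n = 1000
x = rng.normal(12, 2, size=n)
treated = rng.random(n) < 1 / (1 + np.exp(-(x - 12)))    # take-up rises with x
y = 50 + 5 * x + 3 * treated + rng.normal(0, 4, size=n)  # true effect = 3

# A naive comparison of means is biased upward, because treated
# units have higher x on average.
naive = y[treated].mean() - y[~treated].mean()

# Nearest-neighbour matching on x: pair each treated unit with the
# comparison unit closest in x, then average the outcome differences
# to obtain the average treatment effect on the treated (ATET).
xc, yc = x[~treated], y[~treated]
matches = np.abs(x[treated][:, None] - xc[None, :]).argmin(axis=1)
atet = (y[treated] - yc[matches]).mean()

print(f"naive: {naive:.2f}, matched ATET: {atet:.2f} (true effect 3)")
```

Conditioning on the covariate removes the selection bias that contaminates the naive comparison; propensity score matching applies the same idea after collapsing many covariates into a single estimated score.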
Selection on unobservables:
Without unconfoundedness, there is no general approach to estimating treatment effects, although various methods have been proposed for special cases (see Imbens & Wooldridge, 2009), and three of them were important for this systematic review.

One such method is the instrumental variables (IV) approach, which relies on the presence of instruments that satisfy specific exogeneity assumptions. Essentially, in the case in which treatment assignment is endogenous (i.e., confounded with the potential outcomes), researchers look for instrumental variables that satisfy two assumptions. First, the instrument is correlated with the treatment (testable assumption) and, second, the instrument does not exert a direct impact on observed outcomes, but operates only through the treatment (maintained hypothesis).

A second method is the regression discontinuity (RD) design, which applies to settings in which (in its pure form, the so-called "sharp" RD) overlap is completely absent because the assignment is a deterministic function of one or more covariates, but causal comparisons can be made by exploiting continuity of average outcomes as a function of the covariates. (In the "fuzzy" RD design, the assignment probability does not switch from 0 to 1 as in the sharp design; only a (sufficiently large) discontinuity in the probability of treatment assignment is required at the threshold determined by the forcing covariate(s).) Regression discontinuity methods have received increasing attention in the economic impact evaluation literature in recent years.

Finally, the third method, difference-in-differences (DiD), relies on the presence of additional data in the form of samples of treated and comparison units before and after the treatment (these can be panel data or repeated cross-sections). In the simplest setting, outcomes are observed for units in one of two groups, in one of two time periods.
Then the average gain over time in the comparison group is subtracted from the gain over time in the treatment group. This double differencing removes biases in second-period comparisons between the treatment and comparison group resulting from permanent differences between the groups, as well as biases from comparisons over time in the treatment group resulting from time trends unrelated to the treatment. The intuitive way in which the DiD design can remove important biases, coupled with its broad applicability in many different contexts, has made this method one of the most frequently applied designs for estimating causal effects. Nonetheless, in practical applications attention must be paid to challenges to the design (e.g., sensitivity of estimates to the timing of measuring outcomes; time trends differentially affecting treatment and comparison groups; etc.). Finally, note that the approaches presented in this third category are often associated with the concept of "natural experiments", in which policy changes (or other "exogenous shocks") can be used to effectively define (randomly assigned, though not in a controlled way) treatment and comparison groups.
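The double differencing can be illustrated with a minimal numerical example; the employment rates below are invented for the illustration:

```python
import numpy as np

# Two-group, two-period difference-in-differences (illustrative numbers).
# Rows = (treatment, comparison) group; columns = (before, after).
means = np.array([
    [0.40, 0.55],   # treatment group: employment rate before / after
    [0.45, 0.50],   # comparison group
])

gain_treat = means[0, 1] - means[0, 0]   # 0.15: common trend + programme effect
gain_comp = means[1, 1] - means[1, 0]    # 0.05: common time trend only
did = gain_treat - gain_comp             # 0.10: double difference

print(f"DiD estimate: {did:.2f}")        # prints "DiD estimate: 0.10"
```

Subtracting the comparison group's gain removes the common time trend, while differencing within each group removes permanent level differences between groups, exactly the two biases described above.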

Other inclusion criteria
The form of publication of eligible studies included peer-reviewed journal, working paper, mimeo, book, policy or position paper, evaluation or technical report and dissertation or thesis. Eligible studies could be published in any language. The date of publication or reporting of the study had to fall between 1990 and 2014.

SEARCH METHODS
The search for relevant literature was based on a variety of sources in order to ensure that published and unpublished studies ("grey literature") relevant to the research question were included in the search process. The search process included (i) a primary search, covering a wide range of general and specialized databases, and (ii) a complementary search, comprising hand-searching of relevant websites, searching of dissertations, theses and grey literature databases, citation tracking, screening of reference lists and contacting authors and experts. The search included search terms in English, Spanish, French, German and Portuguese, but no language restrictions were applied in the selection process. Country restrictions were not applied to the search and selection process. The search and selection process was restricted to the period 1990 to 2014 with regard to date of publication or reporting of the study. Searches were carried out by six researchers, who worked in pairs, cross-checked included and excluded studies, and resolved discrepancies collaboratively. Detailed information about the search methods can be found in the protocol of the systematic review.

Scoping search
Prior to implementing the primary and complementary search, the review team conducted a scoping search of potentially relevant sources to determine their relevance and to develop customized search strategies which would yield relevant results. The scoping search entailed an iterative process of testing and documenting several search strategies and identifying one or more preferred search strategies and search strings for each source in order to yield a comprehensive and precise set of potentially relevant results. The relevance of sources was determined by screening the results obtained from implementing each customized search strategy.
Based on a review of preferred search strategies and the results obtained during the scoping search, selected databases and websites were not included in the final search strategy if the review team did not have access to the source (e.g., SocIndex), the results obtained from the source were of low relevance (e.g., African Economic Outlook) or the source was covered by another source (e.g., ILO working papers are included in Labordoc). The final primary and complementary search strategy covered more than 70 sources, which included general databases, specialized databases, institutional websites, conference websites, dissertations and theses databases and grey literature databases. Section 9.6.1 in the Appendix presents the list of sources included in the final primary and complementary search.

Primary search
The primary search included 11 general databases and 12 databases that specialize in literature relevant to development economics and labour market issues. The search terms used for the primary search were based on the inclusion criteria and tried to strike a balance between sensitivity (e.g., finding all available articles in a topic area) and specificity (e.g., finding only relevant articles). For electronic databases with advanced search functions, the preferred search was based on a search of exposure, outcome and subject terms using Boolean operators in title and abstract from 2000 onwards. Highly relevant databases, such as EconLit, were searched for studies published since 1990 in order to include potentially relevant studies between 1990 and 2000. The search terms for electronic databases and examples for RePEc/IDEAS, EconLit and ERIC are presented in Sections 9.6.3 and 9.6.4 of the Appendix. The search strategy was modified according to the specifications of each database. Wherever possible, synonyms as well as wildcards and truncation symbols were applied as appropriate. The use of synonyms also accounted for British or American English spelling. To account for terminology differences across disciplines, database thesauri were consulted to ensure that all appropriate synonyms were included. Where available, the team also relied on the database's index terms and/or free-text terms. For databases or websites with basic search functions, the review team adjusted the search terms to accommodate the limited functionality of search functions and adapted these customized search strategies to relevant keyword searches and/or topic/theme searches based on the test results of keyword combinations of search terms. The search of electronic databases was completed in February 2014. 
From November 2014 to January 2015, the review team contacted experts and authors of included studies, screened reference lists of included studies and conducted citation tracking in order to identify additional studies. The search dates for each source used during the primary and complementary search process are presented in Section 9.6.1 in the Appendix.

Complementary search
The primary search was complemented by hand-searching and screening of 35 websites, such as institutional and conference websites, five dissertation, thesis and grey literature databases, nine other reviews and meta-analyses and literature snowballing as well as contacting experts and relevant institutions. The hand-searching strategy was customized for each relevant institutional website. Search terms were used for websites that included a search facility. Otherwise, relevant sections (for example, "documents" or "publications") were searched. Websites of conferences that were deemed relevant to the research question were searched for potentially relevant studies. To include potentially relevant dissertations and theses that were not indexed in bibliographic databases, the review team searched national and international dissertation and thesis databases. The review team also conducted citation tracking and screened reference lists of included studies and relevant existing reviews and meta-analyses to identify further studies for inclusion. The review team contacted authors of previous reviews and included studies, as well as experts and individuals coordinating youth employment related topics in relevant institutions, to ask whether they knew of any studies that might be applicable in addition to the studies that were included after the full text review of full reports. Ongoing and unpublished studies within the grey literature were identified through the screening and hand-searching of relevant websites/gateways and conference websites, citation tracking and contacting experts and relevant institutions. In addition, a keyword search was undertaken for the grey literature databases.

Data extraction
Relevant information from included studies was systematically extracted using a coding tool and coding manual. The coding tool, which is presented in Section 9.7 of the Appendix, included information about variables related to study methods, the characteristics of the intervention and its implementation, the characteristics of the subject samples of analysis, the outcome variables and statistical findings, and contextual features.
At effect size level, the coding tool captured sub-group analysis of employment, earnings and business performance outcomes and estimated treatment effects by age cohorts, gender, educational level, income level and location, among other dimensions. Types of outcomes were further disaggregated by occupation category (dependent vs. self-employment), status of occupation (formal vs. informal) and conditional on other outcomes. To describe the data and empirical methods, the coding tool included information about the research design, statistical methodology, type of significance test, type and method of measurement, date of data measurement and data source. The coding tool also captured the form and year of publication.
For each category of intervention (i.e., skills training, entrepreneurship promotion, employment services and subsidized employment), the coding tool extracted information about the type of intervention, targeting and delivery mechanism, payment system and provider, duration of specific interventions, selection of participants and conditionality of eligibility. General programme characteristics recorded details of the target group by age, gender, educational level, income level, location and employment status as well as the type of organizations involved in designing, financing and implementing the programme. The coding tool kept a record of region, country, scale and average duration of the programme. In addition to any awareness-raising efforts and gender considerations integrated into programme design and implementation, it also captured the incentives, monitoring mechanisms and sanctions for non-compliance connected with the programme. 12 A separate section of the coding tool was used to record information when the study reported intermediary outcomes or outcomes other than the ones considered in this review. This section also captured additional sub-group analyses, relative treatment effects, general equilibrium effects, costs of the programme or cost-benefit analysis, as well as any implementation problems or empirical identification problems described by the author. 12 To minimize the number of missing values in programme-related variables considered relevant for the analysis, additional information was gathered from sources beyond the study (which is the core unit of analysis), including project reports and project websites. The variables coded from these sources were: monitoring mechanisms, participant profiling, incentives to participants (for programme participation and/or performance), and incentives to service providers (payments conditional on outcomes of programme participants), and are presented in Section 9.7 of the Appendix.

Box 6: Understanding effect sizes
Effect size is a generic term used to describe the estimated treatment effect for a study. This treatment effect is the observed relationship between an intervention and an outcome. In order to compare effect sizes across studies and outcome constructs, this systematic review used a meta-analysis to synthesize the data extracted from primary studies.
The standardized mean difference (SMD) was used as a summary statistic in the meta-analysis to combine results from studies which measured the same outcome (e.g., income) in different ways. The SMD is a dimensionless measure of the relative magnitude of the treatment effect, which allowed estimated treatment effects to be compared across studies and different outcome constructs. The direction and magnitude of the effect of the intervention on reported outcomes of interest were essential data elements in assessing the effectiveness of active labour market programmes for youth.
For the analysis in this systematic review, estimated treatment effects were extracted from the primary studies and SMDs were computed. An SMD of zero indicates that the intervention, on average, resulted in an equivalent effect for the treatment group and the (comparison) group which did not receive the treatment; an SMD greater than zero indicates the degree to which, on average, the treatment group had a better outcome. Source: Authors, adapted from the Cochrane handbook for systematic reviews of interventions (Version 5).

A coding manual provided detailed instructions for coders in order to ensure consistency in extracting and interpreting relevant information, in particular with regard to the selection of appropriate treatment effect estimates. Guidelines were provided to identify the treatment effect estimates with the lowest risk of bias when studies reported multiple estimates for the same types of outcomes. Coders selected the preferred method of estimating the effect and their choices were verified by a second reviewer. For example, estimates based on experimental designs were considered to provide the lowest risk of bias, followed by natural experiments and quasi-experimental designs. Other considerations outlined in the manual to mitigate the effects of potential bias included the use of covariates, the type of data used and the statistical methodology applied for the estimation.
Information extracted from included studies was discussed with a second reviewer and coding decisions involving assumptions were documented by each researcher. Further information about the selection of studies and data extraction can be found in the protocol of the systematic review.

Standardizing effect size estimates
To compare estimated treatment effects across studies, the standardized mean difference (SMD) was computed for both continuous outcome variables (e.g., income) and dichotomous outcome variables (e.g., employment probability) reported in the primary studies. In addition, researchers computed a binary variable holding the value of one if a treatment effect was positive and statistically significant (PSS). This report focuses on the SMD-based findings. The analysis of PSS indicators can be found in Kluve et al. (2016).
The SMD captured the relative magnitude of the treatment effect in a way that is dimensionless and hence comparable across outcomes and studies. It was the ratio of the treatment effect for a specific outcome to the standard deviation of that outcome within the evaluation sample used to estimate the treatment effect. Most studies reported either matching- or regression-based estimates of the treatment effect (even for RCT-based designs). 13 Hence, SMDs in most cases were computed using the formulae given by Waddington et al. (2004, p. 372f). For studies using parallel-group or matching-based strategies, Hedges' g and its standard error were computed as

$$ g = \left(1 - \frac{3}{4(n_T + n_C) - 9}\right) \frac{\bar{Y}_T - \bar{Y}_C}{S_p} $$

$$ SE(g) = \sqrt{\frac{n_T + n_C}{n_T n_C} + \frac{g^2}{2(n_T + n_C)}} $$

where $\bar{Y}_T$ and $\bar{Y}_C$ are the mean outcomes in the treatment and comparison groups, respectively, and $n_T$ and $n_C$ are the respective sample sizes. The term in brackets represents the small-sample correction procedure developed by Hedges and Olkin (1985). The numerator of $g$ represents the raw causal impact of the programme on the outcome. In matching-based studies, $\bar{Y}_T - \bar{Y}_C$ is reflected by the average treatment effect on the treated (ATET). $S_p$ is the pooled standard deviation of the outcome after treatment and is computed as

$$ S_p = \sqrt{\frac{(n_T - 1)\, s_T^2 + (n_C - 1)\, s_C^2}{n_T + n_C - 2}} $$

with $s_T$ and $s_C$ as the standard deviations in the treatment and comparison group, respectively (Hedges' approach). If either the comparison or treatment group's standard deviation was not reported, the standard deviation of the total sample or the comparison group standard deviation was used to compute $S_p$. In the case of dichotomous outcome variables, $s_T$ and $s_C$ were computed based on the number of observations and the proportion in the respective group, if available.
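A minimal sketch of these computations, assuming the group means, standard deviations and sample sizes are reported by the primary study (the numbers in the example are hypothetical):

```python
import numpy as np

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Hedges' g with the Hedges & Olkin small-sample correction,
    plus its standard error (illustrative implementation)."""
    # Pooled standard deviation of the outcome
    s_p = np.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                  / (n_t + n_c - 2))
    # Small-sample correction factor
    j = 1 - 3 / (4 * (n_t + n_c) - 9)
    g = j * (mean_t - mean_c) / s_p
    se = np.sqrt((n_t + n_c) / (n_t * n_c) + g**2 / (2 * (n_t + n_c)))
    return g, se

# Hypothetical study: treated youth earn 120 vs 100 for the comparison
# group, with a standard deviation of about 50 in both groups.
g, se = hedges_g(120, 100, 50, 50, n_t=300, n_c=300)
print(f"g = {g:.3f}, SE = {se:.3f}")
```

The standardization by the pooled standard deviation is what makes effect sizes from studies measuring income in different currencies, or employment on different scales, comparable in the meta-analysis.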
For partial effect sizes estimated using multivariate analysis, $g$ and its standard error were estimated based on the formula described in Keef and Roberts (2004):

$$g = \frac{\Gamma(v/2)}{\sqrt{v/2}\,\Gamma\left((v-1)/2\right)} \cdot \frac{\hat{\beta}}{\hat{\sigma}_{pooled}}$$

where $\hat{\beta}$ refers to the coefficient of the treatment variable in the regression, $\hat{\sigma}_{pooled}$ is the pooled standard deviation of the outcome, $v = n - k$ is the degrees of freedom and $\Gamma(\cdot)$ is the gamma function. 14 There are two approaches to calculating the pooled standard deviation from regression-based studies. In Hedges' approach, $\hat{\sigma}_{pooled}$ is the standard deviation of the error term in the regression. As this was rarely reported, the team followed Cohen's approach and computed $\hat{\sigma}_{pooled}$ from the standard deviation of the dependent variable across all observations, $S_Y$ (cf. Lipsey & Wilson, 2001). If the information needed to calculate the standard error of $\hat{\beta}$ was not available, it was approximated by $SE_{\hat{\beta}} = \hat{\beta}/t$, where $t$ is the t-value associated with a t-test on the treatment effect of the regression.
If none of the values for $S_Y$, $S_T$ or $S_C$ could be obtained from the report (or by contacting the authors), the standard deviation of the outcome variable was approximated using the formula from Borenstein, Cooper, Hedges and Valentine (2009):

$$S_Y = SE \cdot \sqrt{\frac{n_T\, n_C}{n_T + n_C}}$$

where $SE$ is the standard error of a means test (e.g., a regression coefficient). Since this formula is technically only correct for bivariate effect sizes, a sensitivity analysis was performed on the sample without these imputations.
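The fallback approximation of the outcome's standard deviation from a reported standard error can be sketched as follows (illustrative only; the function name is ours):

```python
import math

def sd_from_se(se_effect, n_t, n_c):
    """Approximate the outcome SD from the standard error of a
    mean-difference estimate (Borenstein et al., 2009). Technically
    correct only for bivariate effect sizes."""
    return se_effect * math.sqrt(n_t * n_c / (n_t + n_c))
```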
For some studies, the review team transformed reported effect size statistics (often t, F, p or z-values) prior to calculating effect sizes following the procedures suggested in Lipsey and Wilson (2001).
Prior to synthesizing computed effect sizes, checks were made for outliers which could have been a result of erroneous coding or misleading assumptions in the computation of the SMD. In cases where SMDs or their standard errors seemed implausibly large, the original reports were revisited to check whether these were in accordance with the findings stated by the authors. In cases where the effect sizes were correctly coded and computed but still appeared implausible, the authors were contacted for clarification. As it was not possible to resolve all outlier issues following this approach, the data from remaining outliers was censored, initially by winsorizing the data and then finally by dropping any remaining outliers. Winsorizing limits extreme values by setting all observations beyond a given percentile to that percentile's value, thereby reducing the effect of possibly spurious outliers. In this case, the top 1 per cent and bottom 1 per cent of observations were set to the value of the 99th and 1st percentile, respectively. The robustness of the results was tested with respect to the level of winsorizing and the cut-off ranges for trimming outliers.
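The winsorizing step can be sketched as follows. This is a minimal illustration; the percentile convention shown (nearest rank) is our assumption, as the report does not specify one.

```python
def winsorize(values, lower_pct=1, upper_pct=99):
    """Limit extreme values by setting observations outside the given
    percentiles to the percentile values (nearest-rank convention)."""
    ordered = sorted(values)
    n = len(ordered)

    def pct(p):
        # Nearest-rank percentile on the sorted data
        idx = max(0, min(n - 1, round(p / 100 * (n - 1))))
        return ordered[idx]

    lo, hi = pct(lower_pct), pct(upper_pct)
    return [min(max(v, lo), hi) for v in values]
```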

Unit of analysis issues
Originally, it had been planned to correct for a possible unit of analysis error by adjusting the standard errors according to the formula suggested in Higgins and Green (2011, p. 502ff). A unit of analysis error typically arises if a study conducts analysis and programme placement at different levels and the analysis does not adequately account for this clustering (e.g., by using cluster-robust standard errors or variance components analysis). In such cases, the analysis would yield narrower confidence intervals than the true confidence intervals, increasing the risk of Type-I error. This can be a problem in cluster randomized trials or in quasi-experimental studies in which treatment allocation is clustered. However, no studies were identified where there was a suspicion that the unit of analysis was not adequately addressed in the statistical analysis.
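Had such an adjustment been required, the standard design-effect correction could be sketched as follows (illustrative; assumes the usual formula with average cluster size m and intraclass correlation ICC):

```python
import math

def cluster_adjusted_se(se, avg_cluster_size, icc):
    """Inflate a naive standard error by the square root of the design
    effect, 1 + (m - 1) * ICC, to account for clustered assignment."""
    return se * math.sqrt(1 + (avg_cluster_size - 1) * icc)
```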

Dealing with missing data
In several instances, primary reports did not supply sufficient information to compute standardized effect sizes from reported treatment effect estimates. Most often, the post-intervention mean and/or standard deviation of the outcome variable could not be obtained.
The frequency with which missing information was encountered indicates that better reporting standards are required for impact evaluation studies.
As a first step, authors of included papers were contacted to provide missing information and to clarify discrepancies. This was an important and time-consuming measure, carried out via standardized letters and missing information forms, into which authors or research assistants could easily insert the results and data requested. 15 Initially, the review team reached out to the authors of 100 included reports (note that this number represents almost the entire sample of included reports in the systematic review), requesting additional information to facilitate the computation of the effect sizes or to achieve clarity on the quantitative results or intervention details. 16 In the event that an author did not reply, the same request was sent two more times. In total, the authors of 34 papers replied, while no response was received from the authors of 63 reports. In the remaining cases, no valid, up-to-date email addresses could be found for the authors.
In several instances, information was missing about, for instance, standard deviations, sample sizes or average outcomes in the comparison group follow-up data collection. In these cases, the missing data were imputed from available information based on specific assumptions. For instance, when the overall sample size was provided but not the sample sizes for the treatment and comparison groups separately, an assumption of equal sample sizes was made (splitting the overall sample size in half). The same assumption was applied in cases in which only the treatment or comparison group sample size was reported. Results from a meta-analysis were reported based on the more conservative sample (without imputing missing information) as part of the sensitivity analysis.
In cases where the information necessary to compute an effect size (e.g., sample size, mean outcomes and/or standard deviation) could not be derived from the available information, the effect size was excluded from the analysis. 17

Dealing with dependent effect sizes
In a meta-analysis, the unit of analysis is the study. Section 3.2.2 clarified that a single programme could include more than one intervention, which was regarded as the review's primary unit of interest (instead of the overall impact of one programme, the team was interested in the impact of each specific intervention). Each intervention may have been evaluated by more than one study (e.g., evaluation), each of which may have been published in multiple reports (e.g., working papers, technical reports or journal publications). Two reports were treated as part of the same study if they were based on the same data and hence could not be treated as independent, even if they were written by different authors. Therefore, an intervention population (all participants) might be different from the study population (all in one data set), which might itself differ from the sample population for a specific treatment effect estimate on a specific outcome construct.
Estimated treatment effects may be regarded as independent from each other when the underlying data were derived from different sample populations. To maintain the independence assumption, it was important that only one effect size per outcome construct and study was included in the analysis (Borenstein, Hedges, Higgins and Rothstein, 2009). However, each report might present different treatment effect estimates for the same outcome construct and the same sample population, for example for different sub-group analyses or employing different statistical methods. This implied that different estimates within each study (sometimes across reports) had to be combined into one effect size per sub-group.
Creating effect size aggregates and summary effect sizes (e.g., at the intervention or study level or across different sub-groups as part of the moderator analysis) required careful estimation to avoid the situation where a single group of participants influenced the summary effect size disproportionately. For example, a treatment effect might be reported in a study for the entire (pooled) sample and subsequently reported for sub-groups of the same sample, such as males and females. 18 The median number of treatment effect estimates per study in the sample was 12, with some reports providing more than 100 estimates. In such instances, a multitude of treatment effects could be reported for the same group where there was no a priori reason to give preference to one measure over another.
In these scenarios it was possible to mitigate the disproportionate influence on the aggregate effect sizes by applying the following steps. First, by identifying a set of effect sizes that were derived from the same independent group of participants and then, where applicable, selecting the effect sizes for this group where it was possible to establish a preference (for example, keeping only pooled estimates and discarding sub-group estimates except when needed in the analysis). By dropping some of the effect sizes derived from the sample, this redundancy was removed from the analysis as far as possible. 19 This method provided a better approach to the data than averaging effect sizes across all overlapping sub-groups. 20 Second, in cases where multiple effect sizes were reported for each independent group without clear justification for dropping some rather than others (e.g., where the same outcomes were reported at several points in time for the same group), aggregate ("synthetic") effect sizes were estimated for each independent group, based on the method for combining effect sizes from the same independent population suggested by Borenstein, Hedges, Higgins and Rothstein (2009). Let $g_{ij}$ and $SE_{ij}$ be the $i$-th effect size, $i = 1, \ldots, m$, and its standard error, respectively, for the sample population identified by $j$. To arrive at a single combined (aggregate) effect size for group $j$, the team took the simple average

$$\bar{g}_j = \frac{1}{m}\sum_{i=1}^{m} g_{ij}$$

with the standard error of $\bar{g}_j$ given by

$$SE_{\bar{g}_j} = \sqrt{\frac{1}{m^2}\left(\sum_{i=1}^{m} SE_{ij}^2 + \sum_{i \neq k} \rho_{ikj}\, SE_{ij}\, SE_{kj}\right)}$$

where $\rho_{ikj}$ is the correlation coefficient between $g_{ij}$ and $g_{kj}$ in study $j$. 21

Hence, the independent group aggregates were assembled at the relevant unit of analysis, such as at the intervention or study level (depending on the assumed correlation addressed in the procedure). Then the random-effects meta-analysis was applied to the aggregated data to estimate summary effect sizes.

18 No studies were encountered in the sample which assessed different treatments using the same group of individuals as the comparison group (multi-arm studies with pooled comparison).
19 Here, redundancy indicates providing additional information about a group that is not needed for the desired level of aggregation. For example, if the goal is to create programme aggregates for all participants, then male and female sub-group estimates may be dropped. On the other hand, if the goal is to create an aggregate for females for each programme, then pooled estimates would be dropped.
20 For the purpose of brevity the guidelines used to drop effect sizes within each group are not included here. This information is available upon request.
21 The first-best option is to estimate $\rho$ from the data. However, where there was an insufficient number of observations, an assumption about $\rho$ had to be made. Assuming that $\rho = 0$ is likely to overestimate precision, while assuming that $\rho = 1$ is likely to underestimate it; hence the more conservative assumption, $\rho = 1$, was adopted.
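The aggregation of dependent effect sizes can be sketched as follows (illustrative; the conservative choice of a correlation of 1 between overlapping estimates is shown as the default):

```python
import math

def aggregate_dependent(effects, ses, rho=1.0):
    """Combine m effect sizes from the same group of participants into one
    aggregate (Borenstein et al., 2009). rho is the assumed correlation
    between overlapping estimates; rho = 1 is the conservative choice."""
    m = len(effects)
    g_bar = sum(effects) / m
    # Variance of the mean of correlated estimates
    var = sum(se**2 for se in ses)
    for i in range(m):
        for k in range(m):
            if i != k:
                var += rho * ses[i] * ses[k]
    return g_bar, math.sqrt(var) / m
```

With rho = 1 the aggregate standard error equals the simple average of the individual standard errors, so aggregation does not spuriously inflate precision.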

Synthesis methods
Summary effect sizes are provided for the three outcome categories: (1) employment outcomes, (2) earnings outcomes and (3) business performance outcomes. The summary effect sizes were estimated via a random-effects meta-analysis based on the intervention-outcome level aggregates using the -metan- command in Stata. 22 Random-effects meta-analysis is recommended in settings which present significant contextual heterogeneity in terms of study population, intervention and implementation. To account for differences in individual studies' sample sizes, effect sizes were averaged across studies by using inverse-variance weighting of the individual effect sizes. This weighting resulted in the individual effect sizes from studies with larger sample sizes being given more weight in the combined effect size. The summary effect sizes generated in this manner are presented alongside the 95 per cent confidence intervals in forest plots (Section 4.3). In addition to the aggregate effect size, these forest plots display the weight each intervention carries towards the summary effect size.
Heterogeneity tests were used to examine whether the variation in effect size estimates within outcome categories was larger than expected from sampling error alone (Deeks, Altman & Bradburn, 2001). To test for heterogeneity, the team employed the I² statistic and the Q-statistic, which estimate the share of variability in effect estimates that is due to heterogeneity rather than to chance. A significant Q (p-value < 0.05) and an I² value of at least 50 per cent were considered to be indicators of heterogeneity.
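The random-effects pooling and heterogeneity statistics described above can be sketched as follows. This is an illustrative DerSimonian-Laird implementation of the kind of computation the -metan- command performs, not the review team's code.

```python
import math

def random_effects_summary(effects, ses):
    """Inverse-variance random-effects pooling (DerSimonian-Laird)
    with Q and I^2 heterogeneity statistics."""
    w = [1 / se**2 for se in ses]                 # fixed-effect weights
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed)**2 for wi, g in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    w_re = [1 / (se**2 + tau2) for se in ses]     # random-effects weights
    summary = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    se_sum = math.sqrt(1 / sum(w_re))
    ci = (summary - 1.96 * se_sum, summary + 1.96 * se_sum)
    return summary, ci, q, i2
```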

Moderator analysis
Moderator analyses were performed when there was evidence of heterogeneity. The analyses tested hypotheses about whether variation in the (average) effect sizes reported in studies was associated with differences in study, participant and intervention characteristics (moderators). These moderator analyses also served as a test for correlations of effect size magnitude with specific characteristics of interventions and population groups. They therefore formed the basis for the answers to the research questions regarding factors of intervention effectiveness.
In a first step, a univariate approach was implemented, analogous to an analysis of variance (ANOVA), again via a random-effects meta-analysis based on the intervention-outcome level aggregates. Specifically, the review team investigated heterogeneity within outcome categories by (i) main intervention category, (ii) country income level, (iii) gender, (iv) participant income status and (v) time elapsed after programme completion. Results from these models are presented in the form of forest plots in Section 4.3.3.
Ideally, moderator analysis should be conducted with a minimum of ten studies for each individual moderator variable (Borenstein, Hedges, Higgins & Rothstein, 2009). A decision was made to present forest plots for sub-groups (e.g., intervention types) that had at least four individual interventions. The number of effect size estimates and individual interventions for each sub-group are displayed in the respective forest plots to provide the reader with an indication of the size of the evidence base.
A large array of study-level, intervention-level and contextual variables was identified and coded that were assumed to be potentially correlated with the reported effect sizes. The code description in Section 9.7 in the Appendix provides an overview of all variables included in the multivariate meta-regression. In addition to these, the team tested the influence of various other moderator variables but decided to exclude any that were deemed non-significant.

Supporting interpretation of effect sizes: The percentage change
In addition to the SMD, the review team computed the simple percentage change of the intervention over the comparison-group mean as a more intuitive indicator of the intervention's impact. The percentage change was calculated by dividing the raw effect size (i.e., the mean difference between treatment and comparison group) by the mean value of the outcome variable for the comparison group. As a consequence, the percentage change indicates the direction of change for the treatment group, with negative values meaning that the treatment group's outcome was lower than the comparison group's. This percentage change was then averaged by independent effect size group (i.e., by a grouped combination of intervention and study level). Subsequently, the team weighted the group-wise percentage changes using the inverse-variance weights (as throughout the analysis) and computed the final percentage changes. The computation of average percentage changes, however, comes with some caveats (cf. Lipsey et al., 2012), and in some cases the reported average percentage changes are at odds with the average SMDs computed across the same sample of (independent) effect sizes. For this reason, we do not emphasize the findings based on percentage changes but rather see them as a complement to the reported SMDs, which are the main focus of our analysis.
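The percentage-change computation can be sketched as follows (illustrative; the function names are ours):

```python
def percentage_change(mean_treat, mean_comp):
    """Raw mean difference as a share of the comparison-group mean."""
    return 100 * (mean_treat - mean_comp) / mean_comp

def weighted_percentage_change(pct_changes, ses):
    """Inverse-variance weighted average of group-wise percentage changes."""
    w = [1 / se**2 for se in ses]
    return sum(wi * p for wi, p in zip(w, pct_changes)) / sum(w)
```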

Sensitivity analysis
A range of sensitivity checks were conducted to test the robustness of the results. Sensitivity analysis was carried out by restricting the meta-analysis to a subset of all studies included in the original meta-analysis. First, following guidance from the Campbell Collaboration (2014, p. 9), an examination was carried out to establish whether findings were influenced by the rigour of the evidence. Specifically, the team tested heterogeneity across study design (randomized vs. quasi-experimental) and publication status (published vs. unpublished studies). Second, the sensitivity of the results was tested with regard to the assumptions made for computing the SMD effect size in the presence of missing information. Third, the team tested the validity of the method of dealing with statistical outliers (dropping observations vs. winsorizing the data).

Risk of bias and study design assessment
During the research and coding process, the team found that impact studies often lacked important details that would allow a confident appraisal of the plausibility of the identifying assumptions on which the empirical analyses were based. This lack of detailed reporting in many publications limited the extent to which a full risk of bias assessment, for example, based on Waddington and Hombrados (2012), was possible. As a consequence, an alternative framework was adopted (proposed in Duvendack, Palmer-Jones, Copestake, Hooper, Loke & Rao (2011) and Duvendack, Hombrados, Palmer-Jones & Waddington (2012)) in order to assess the statistical rigour of primary studies. This approach combined an assessment of both research design and the method of statistical analysis. It did not incorporate detailed assessment of aspects of bias usually recommended in systematic reviews (see e.g. Higgins and Green, 2011) such as allocation method, confounding, selection bias (including attrition), performance bias, biases in outcomes data collection, and bias in analysis and reporting. In addition to the original approach, the assessment was further disaggregated by the statistical method (DiD, statistical matching, etc.) used for addressing potential confounders of the original research design (randomized experiment, natural experiment, etc.). By placing RCTs at one end of the spectrum and cross-section designs at the other, the tool aimed to reflect the potential capacity of different empirical identification strategies to control for possible confounding. 23 In addition to the sensitivity analysis, the team therefore tested whether different research designs and empirical approaches yielded different effect sizes on average.

Assessment of reporting bias
Publication bias or "file drawer effects" refers to the underreporting of studies which establish a non-significant, negative or mixed evaluation finding (Franco, Malhotra & Simonovits, 2014). The review team assessed the danger of publication bias in the sample of included studies by several means. First, by testing the influence of study design and publication status as part of the sensitivity analysis. Second, by performing standard tests for publication bias: plotting the effect size against standard errors (funnel plots) using the -metafunnel- and -metacum- commands in Stata. Moreover, the team also implemented Egger, Davey Smith, Schneider and Minder's (1997) meta-regression test using the -metabias- command in Stata. The idea underlying the small-sample assessment to detect publication bias is that "researchers who have small samples and low precision will be forced to search more intensely across model specifications, data, and econometric techniques until they find larger estimates", hence "such considerations suggest that the magnitude of the reported estimate will depend on its standard error" (Doucouliagos & Stanley, 2012).
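The Egger regression behind the -metabias- command can be sketched as follows (illustrative; a plain OLS of standardized effects on precision, where a non-zero intercept signals funnel-plot asymmetry):

```python
import math

def egger_test(effects, ses):
    """Egger et al. (1997): regress g/SE on 1/SE; return the intercept
    and its standard error. A large intercept suggests small-study bias."""
    y = [g / se for g, se in zip(effects, ses)]   # standardized effects
    x = [1 / se for se in ses]                    # precisions
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(r**2 for r in resid) / (n - 2)       # residual variance
    se_intercept = math.sqrt(s2 * (1 / n + mx**2 / sxx))
    return intercept, se_intercept
```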
Tests were also made to establish whether there were observable differences in reported effect sizes between peer-reviewed and unpublished studies. For example, it was possible that estimates reported in journal articles might be more likely to be positive and significant (Stanley, 2013).

DEVIATIONS FROM THE PROTOCOL
The protocol of the systematic review was published in November 2014 and was followed by the implementation of the search and selection process outlined in the protocol. The primary and complementary search process benefitted from the extensive scoping search and development of tailored search strategies for each source prior to the publication of the protocol, allowing the review team to follow the planned search process closely. The main search in electronic databases was completed in February 2014. The systematic search resulted in a high number of studies to be screened, classified and coded in 2014. While the selection and data extraction process was ongoing, the review team decided to consider additional sources that were made available in 2014 (e.g., studies presented at the Doha Evidence Symposium in March 2014). Following the selection and data extraction process, the review team contacted experts and authors of included studies, screened reference lists of included studies and conducted citation tracking in order to identify additional studies from November 2014 to January 2015.
During the process of data collection and synthesis, the team made changes to the coding tool and empirical methodology which represent deviations from the protocol published by the Campbell Collaboration Group.
• In addition to the variables proposed in the protocol, three additional intervention-level variables were coded. The variables relate to the design of the intervention and were deemed relevant and a priori strongly correlated with reported effect sizes:
o Participant profiling for services provided: This variable captured whether the intervention (i) identified individual factors or characteristics that implied a risk in the labour market and (ii) relied on such information to assign youth to specific services. Examples include caseworker discretion, screening or specific eligibility rules.
o Incentives to participants: This variable captured whether participants received payments conditional on (monitored) programme participation or success. This also included participants' eligibility for welfare or unemployment benefits.
o Incentives to service providers: This variable captured whether payments (or bonuses) to the implementing agency were conditional on outcomes of intervention participants.
• The protocol had proposed to review specific cases where evaluations measured general equilibrium or spillover effects. However, the frequency of such analyses and measures was low. The review team therefore focused on studies looking into partial equilibrium effects on programme participants.
• Given its relevance in policymaking, the protocol had proposed the coding and analysis of Intention-to-Treat (ITT) estimates. The plan was to approximate ITT estimates from studies which reported only Average Treatment Effect on the Treated (ATET) estimates, using the formula suggested in Bloom (2006). However, of those studies estimating the ATET, very few reported the share of individuals who were originally assigned to the treatment group but did not take up treatment (i.e., non-compliers, defiers or no-shows). The approximation proved to be especially difficult for quasi-experimental studies, as the distinction between ITT and ATET estimates was not always clear. Instead of converting treatment effect estimates, the team decided to test differences between ITT and ATET estimates as part of the sensitivity analysis.
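The Bloom (2006) relationship between ITT and ATET mentioned above can be sketched as follows (illustrative; assumes no treatment effect on those who did not take up treatment):

```python
def itt_from_atet(atet, takeup_rate):
    """Bloom no-show adjustment: ITT = ATET * share of the assigned
    group that actually took up treatment (0 < takeup_rate <= 1)."""
    assert 0 < takeup_rate <= 1
    return atet * takeup_rate
```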
• A decision was taken during the analysis stage to present findings only for intervention sub-groups that had at least four individual interventions. Intervention sub-groups with fewer than four interventions are not reported in the main text. However, all forest plots containing sub-group analyses by intervention type are presented in Appendix 10.1.

Search results and selection of studies
The primary and complementary search identified 32,117 records, based on a search of over 70 sources, including 12 specialized databases, 11 general databases, 35 websites, such as institutional and conference websites, five dissertation, thesis and grey literature databases, and nine other reviews and meta-analyses. The search in electronic databases was completed in February 2014. From November 2014 to January 2015, the review team contacted experts and authors of included studies, screened reference lists of included studies and conducted citation tracking in order to identify additional studies. The list of included sources as well as the search dates for each source used during the primary and complementary search process are presented in Section 9.6.1 in the Appendix. After removing duplicates in the reference management software EndNote, screening of 28,375 records by title and abstract was carried out by individual reviewers, applying the inclusion criteria of the screening questionnaire (see Section 9.6.2 of the Appendix). A total of 1,141 records were identified for full text screening.
In order to minimize bias, included and excluded results were cross-checked by a second researcher and discrepancies were resolved by both researchers. This systematic screening process led to the identification of 86 reports which were considered to be of adequate content and methodological rigour to inform the systematic review. The main reasons for excluding reports at full-text stage were the following criteria: study design, target group and intervention. In addition, several reports were excluded because a more recent or updated version of the same report was available, the report only focused on relative effects, the impact evaluation study was ongoing or the report did not examine any of the outcomes of interest considered in this review. Examples of excluded reports and the reasons for excluding them are presented in Section 7.2. 24 After extracting data from the preliminary set of 86 included reports, the review team screened 6,782 additional records that were identified through reference lists and citation tracking of included studies, hand searching of key journals in which a large number of included studies were found and contacting authors and experts. This search process led to the selection of 27 additional reports. Overall, this comprehensive search and selection process identified 113 reports which were considered eligible for inclusion in this review. The search and screening process is illustrated in Figure 2.

Characteristics of included reports
The systematic screening process led to the identification of 113 reports that met all criteria for inclusion (Section 3.2). 25 As shown in Table 8, panel A, more than half of the impact evaluations of youth employment interventions were conducted in high-income countries, where there is an established practice of results measurement, particularly with regard to government employment measures. The large share of reports from high-income countries in this systematic review (65 out of 113 reports, representing eleven of the 31 countries in the sample) is an important feature that justifiably suggests that some caution should be exercised when interpreting the results in global terms. The number of reports assessing the impact of youth employment interventions has increased steadily over the past few years (panel B), with nearly half of the sample published after 2010 and 21 reports published in 2014 alone. 26 Interestingly, this surge in evaluation has benefitted developing countries by providing a greater quantity of better quality evidence about what works to support youth in the labour market. There were 48 reports of interventions implemented in low- and middle-income countries, with a particular prevalence of impact evaluations in Latin America and the Caribbean.

25 Note: Each intervention may have been evaluated by more than one study (e.g., evaluation), each of which may have been published in multiple reports (e.g., working papers, technical reports or journal publications). Further information about the relation between study and report is provided in Section 3.4.5.
The search process identified a variety of publications from the grey literature (panel C). Only around one-third of the reports come from peer-reviewed journals, with the remainder split between working papers, technical reports from implementing organizations and others, such as books or dissertations. Most of the reports published in 2014 were working papers, identified through the complementary search process.

26 In contrast, the 2007 synthesis of the Youth Employment Inventory reported 73 studies with a counterfactual-based impact evaluation of youth employment programmes implemented between 1950 and 2006 (Betcherman et al., 2007). Notably, most impact evaluations recorded in the inventory and implemented prior to 1990 took place in high-income countries (mainly the United Kingdom and the United States).
While the review focused on counterfactual impact evaluations, the search process uncovered a large variety of evaluation designs, namely experimental designs, natural experiments and quasi-experimental designs (as discussed in Section 3.2.5). In contrast to other systematic reviews, this review contained a significant share of randomized experiments (53 reports, as shown in Table 8, panel D). Many of the results from these randomized controlled trials (RCTs) have been published recently (66 per cent after 2010) and hence were not included in previous reviews. Figure 3 shows the recent surge in rigorous evidence. Prior to 2011 most RCTs in the sample were conducted in high-income countries (Figure 4), while the past five years have seen a remarkable increase in RCTs in developing countries. Most notably, in 2014, 12 out of 15 RCTs included in this review were from low- and middle-income countries, seven of them evaluating youth employment programmes in Africa (Box 7).
Quasi-experimental designs, such as panel and cross-sectional evaluations, were the second most common study design (50 reports). In relation to the evaluation features, 39 reports provided impact estimates at multiple time points. In addition, 71 reports measured changes in outcomes of interest over 12 months after treatment exposure (panel E). These longer-term effects were estimated primarily across skills training interventions. Few studies provided a sub-group analysis in addition to the overall analysis (panel F). In particular, only half of the reports in the sample provided separate results for males and females (excluding those that evaluated gender-targeted programmes). Very few reports in the sample provided separate treatment effects for disadvantaged, low-income or low-educated youth. Table 8 also provides an overview of the types of outcomes measured across the included reports (panel G). Three-quarters of the reports in the sample reported results for more than one type of outcome. Employment and earnings outcomes were extensively reported. Employment probability was by far the most commonly measured and reported outcome within the set of reports: more than 88 reports provided an estimate of the programme impact on employment probability. Another 35 reports estimated the effect of an intervention on hours worked.

This review included 13 reports of impact evaluations carried out in African countries. None of these 13 reports predated 2010. Most (nine studies) were published as working papers, with only two reports published in peer-reviewed journals (by January 2015). With only one exception, all quantitative results came from RCTs, which often reported the intention-to-treat estimator as well as the effect of the intervention on the average participant who completed the programme; this was due to compliance problems, which are common across evaluated interventions in the region.
Only six reports measured changes in outcomes of interest over a year after the young person's exposure to the intervention. This is an important aspect, as labour market impacts often materialize only over the long term.
Studies focused mainly on assessing changes in employment (13 reports) and earnings outcomes (12 reports), and to a lesser extent on understanding changes in business performance, survival or expansion (six reports). A sizable number of entrepreneurship promotion interventions were implemented in Africa and included in the review (eight out of 17).
Source: Based on a background report on African studies (Pasali, 2015).

Table 8 also displays the limited number of reports (ten out of 113) measuring changes in business performance outcomes. Nine of these related to RCTs. They were most commonly found among interventions aiming to promote entrepreneurship among young people.

Characteristics of evaluated interventions
As shown in Figure 2, the search process led to 113 reports that assessed impacts of 87 youth employment programmes. The review drew a key conceptual distinction between programmes, interventions and components (Section 3.2.2). Youth employment programmes can consist of one or more interventions. These are exclusive tracks offered to discrete samples of participants. For example, in the New Deal for Young People programme, implemented in the United Kingdom and described in Box 8, youth had to choose one of four different tracks, namely, (i) education or training; (ii) a job with a voluntary sector employer; (iii) a job on the environmental task force; or (iv) employment in a wage subsidy programme.
Interventions, on the other hand, have one or several components, which were classified as skills training, entrepreneurship promotion, employment services or subsidized employment measures. Table 9 provides an overview of the 107 interventions in the review. Main category (panel A) refers to the interventions where it was possible to identify a primary component. In line with previous reviews (e.g., Betcherman et al., 2007), skills training proved to be the most common type of main intervention category, followed by subsidized employment, entrepreneurship promotion and employment services. There were six interventions for which no main category of intervention could be identified, and these were therefore classified as unspecified. Their components were bundled in such a way that it was impossible to identify one type of intervention as predominant over the others. They were truly multi-dimensional in nature and formed part of programmes such as the active labour market programme for disadvantaged youth in Germany (Ehlert, Kluve & Schaffner, 2012) and the New Deal for Young People in the United Kingdom (Jackson et al., 2007). Details of the New Deal for Young People in the United Kingdom are presented in Box 8.

Box 8: New Deal for Young People (NDYP) in the United Kingdom
The New Deal for Young People (NDYP) was introduced in the United Kingdom in 1998 and aimed to help the young unemployed into work and to increase their employability by combining different types of interventions, especially job-search assistance and subsidized employment. Participation was mandatory for all people aged 18-24 who had claimed unemployment benefit (Jobseeker's Allowance) for a period of six months or more. Participants entered a "gateway" period of intensive job-search under the supervision of a personal adviser, intended to last no longer than four months. Those who were still receiving the Jobseeker's Allowance at the end of the gateway period were obliged to take one of four options: (i) entry into full-time education or training for those without basic qualifications; (ii) a job with a voluntary sector employer; (iii) a job on the environmental task force; (iv) employment in a wage subsidy programme. In addition, under the terms of the scheme, employers were obliged to offer education or training on at least one day per week.
Evaluations showed that the programme appeared to have generated an increase in the probability of young men (who had been unemployed for six months) finding a job within the next four months (Blundell, Costa Dias, Meghir & Van Reenen, 2004) and suggested that a period of subsidized employment was a more effective means of exiting unemployment and securing unsubsidized employment than the other options available under NDYP.
While the remaining interventions had one main component to address the labour market constraints of youth, more than one-third extended the intervention's scope with one or more additional measures. As panel B shows, some 64 per cent of interventions in the review incorporated a skills training component; but almost half of these combined skills training with some other measure. The most common combination was skills training and employment services, observed in 27 interventions.
Entrepreneurship promotion interventions that focused on youth were comparatively scarce. Entrepreneurship-related components were only reported in 17 interventions, and these components often seemed to be delivered in a way that was disconnected from other active labour market measures. It is important to highlight that the results chain for entrepreneurship promotion (Table 2) already incorporates the delivery of training services in relation to entrepreneurial and business development and management skills, avoiding potential overlaps between the skills training and entrepreneurship promotion categories. 27 As discussed above, the majority of the reports included in this review assessed impacts of youth employment programmes implemented in high-income countries, which translated into a sample of 60 interventions (panel D). There were 56 interventions (52 per cent) from OECD countries alone (panel E), a proportion comparable to those seen in previous reviews (e.g., Card et al., 2010, and Betcherman et al., 2007).
The second largest share of impact evaluations stemmed from interventions in Latin America and the Caribbean, where many countries have experimented with active labour market policies (ALMPs) since the early 1990s, particularly through quasi-experimental designs embedded in the Jóvenes Programmes, a series of skills training interventions implemented throughout the region 28 (see Box 2 for an example).
The review captured 17 interventions evaluated in Africa (15 in …). A close examination of programme targeting (panel G) led to the identification of 16 interventions (15 per cent) designed to serve only young women, 48 interventions (45 per cent) targeting youth who were unemployed prior to joining the intervention and 45 (42 per cent) that focused exclusively on low-income and disadvantaged youth.
Public and private sector actors were the most common implementing entities. Their implementing role was more prevalent among high-income countries, while evaluated interventions with an implementation role for non-governmental organizations (NGOs) and non-profit organizations tended to be more common in low-income countries.
Detailed descriptions of the intervention features and overall treatment effects are presented in the Appendix in Sections 9.1 to 9.5.

ASSESSMENT OF INCLUDED STUDY DESIGNS
Impact studies often lacked important details that would allow a confident assessment of the plausibility of the identifying assumptions on which the empirical analyses were based. In order to assess the rigour of the designs in the included primary studies, the review team used the framework proposed in Duvendack et al. (2011, 2012). The approach combined an assessment of both the research design and the method of statistical analysis, leading to an implicit hierarchy of study designs, with RCTs as the most rigorous design and cross-section designs at the bottom. Given the study design, the rigour of the statistical analysis was also a function of the statistical methods, ranging from more advanced methods, such as difference-in-differences (DiD), propensity score matching (PSM), instrumental variables (IV) or regression discontinuity designs (RDD), to multivariate regressions and simple (means) tabulations. Table 10 shows the classification of evaluation reports that were included in the systematic review. Almost half of the cases (67, or 47 per cent; see Table 10) were based on RCTs, meaning that the studies were assessed as potentially high quality. Table 10 counts the number of cases in which a particular report relied on a particular statistical method. It was possible, for example, for the same RCT to rely on more than one method, which explains why the total number of RCTs in Table 10 surpasses that reported in Table 8.

Notes: Based on Duvendack et al. (2012). One research design could rely on more than one statistical method of analysis.
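As a rough illustration, the design-by-method grading described above could be encoded as follows. The function name and tier boundaries are hypothetical simplifications of the Duvendack et al. (2012) framework, not its exact coding rules:

```python
# Illustrative sketch of a design-and-method quality grading.
# Tier boundaries are assumptions, not the exact Duvendack et al. (2012) rules.

ADVANCED_METHODS = {"DiD", "PSM", "IV", "RDD"}

def grade_case(design: str, method: str) -> str:
    """Assign a tentative quality tier to one report-method case."""
    if design in {"RCT", "natural experiment"}:
        # Randomization or a natural experiment: potentially high quality,
        # provided the analysis goes beyond a simple tabulation of means.
        return "high" if method != "tabulation" else "medium"
    if design in {"pipeline", "panel", "cross-section"}:
        # Quasi-experimental designs: quality hinges on the statistical method.
        return "high-medium" if method in ADVANCED_METHODS else "low"
    return "unclassified"
```

Applying such a rule case by case (rather than report by report) mirrors how Table 10 counts report-method combinations.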

"Other" includes 11 cases (nine reports) that could not be readily classified within the other statistical methods of analysis.
They comprised non-parametric statistical approaches (three reports), a combination of matching and IV (two reports) and principal stratification approaches (two reports). Given that these are rather sophisticated methods (going well beyond a simple tabulation of means), their occurrence with RCTs or natural experiments was considered potentially high quality.
A further 11 per cent of reports (12 cases reported in Table 10) were based on natural experiments, combined with sophisticated statistical methods that went beyond simple tabulation of means. Accordingly, these studies can also be categorized as potentially of high quality.
There were a total of 60 reports with pipeline, panel-only or cross-section-only designs (Table 8, under quasi-experimental designs). In 46 cases they used and/or combined DiD, PSM, IV or RDD methods, associated with high- to medium-quality evidence. There were only 12 instances (11 per cent) of low statistical rigour, in which the above-mentioned designs relied on multivariate analysis or tabulation methods. There was only one unclassified report, which combined panel and multivariate analysis.
In summary, the analysis showed that the included reports generally used rigorous designs, with almost 48 per cent of cases presenting potentially high quality evidence, 42 per cent high-medium quality evidence, and only 9 per cent potentially low quality evidence (Figure 5). This finding somewhat alleviated concerns about prevalent biases to the internal validity of included reports. However, it was clear that the design approach could only provide a first approximation of potential factors affecting the internal validity of empirical research designs, which should include examination of methods of treatment assignment, confounding, selection bias (including attrition), performance biases, biases in outcomes data collection, and biases in reporting (see e.g. Higgins and Green, 2011). Section 4.3.4 conducts sensitivity analysis by testing whether studies classified as having a potentially low level of statistical and analytical rigour contained statistically significantly different effect sizes in comparison to studies that used potentially more rigorous methods.

Descriptive analysis of effect size estimates
To synthesize the results of the 113 empirical reports of youth employment interventions, the review relied on the reported treatment effect as a measure of impact. The search and screening process led to the identification and coding of 3,629 treatment effects. Based on the reported (or acquired) information, it was possible to compute the direction and statistical significance for 3,105 treatment effect estimates. The computation of standardized mean difference (SMD) required further information (the minimum requirement being the number of observations in treatment and/or comparison groups). Even after using the methods of imputing missing information described above (Section 3.4.4), it was only possible to compute the SMD from 2,259 reported treatment effect estimates, as shown in the third column of Table 11.
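For reference, the SMD and its sampling variance can be computed from group means, standard deviations and sample sizes. The sketch below uses the standard Cohen's d formulas; it is an illustration rather than the review's exact imputation procedure (described in Section 3.4.4):

```python
import math

def smd(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference (Cohen's d) and its sampling variance."""
    # Pooled standard deviation across treatment and comparison groups.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sd_pooled
    # Large-sample variance of d; this is why the group sample sizes are
    # the minimum information required to compute an SMD.
    var = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return d, var
```

The variance formula makes clear why the number of observations in the treatment and comparison groups is the minimum requirement mentioned above: without it, neither the SMD's precision nor its meta-analytic weight can be derived.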

Figure 5: Potential quality of evidence across cases (high, 48%; medium-high, 42%; low, 9%; N/A, 1%)

It was possible to compute a substantially higher number of effect sizes due to efforts to acquire missing information from authors. There were 121 independent samples to account for dependencies within studies due to overlap of the study population across effect estimates, as described in Section 3.4.5, Dealing with dependent effect sizes.

Univariate random-effects meta-analysis
The following sections discuss results from the univariate meta-analysis approach to explore the differences in average effect size estimates across interventions in the sample. The analysis built on forest plots, which are commonly used to graphically describe the results of a meta-analysis. Forest plots are based on an inverse-variance weighted least squares random-effects meta-analysis model (see Box 9).

Box 9: Reading a forest plot
This review presents effect size estimates and confidence intervals for the respective outcomes of interest of an intervention. This information is displayed in forest plots, which can be read as follows: • Each sub-group (for summary plots) or intervention (for full plots) is represented by one line in the plot.
• The SMD is reported under effect size (ES), along with its corresponding confidence interval. The same information is represented graphically through the diamonds. An SMD greater than zero indicates that, on average, the treatment group had a better outcome than the comparison group, which did not receive the treatment. This is considered a positive effect.
• The vertical, unbroken line represents no effect from the interventions on the outcomes of interest.
• The edges of the diamonds represent the confidence interval (CI). In the summary forest plots shown below, the width of each diamond thus corresponds to the confidence interval of the sub-group analysed in the respective plot.
• The weight is the inverse of the variance of that particular sub-group or intervention. It shows the contribution or strength that each particular sub-group (for summary plots) or intervention (for full plots) gives to the overall summary effect size.
• The overall effect estimate is reported at the bottom of the plot. The SMD value is further marked by a vertical dotted line, making it easier to compare where sub-group SMDs fall in relation to the overall SMD.
• The level of heterogeneity is captured in the I 2 statistic.
• Notes below each aggregate forest plot provide the number of SMDs and the number of independent studies that form the basis for each computed summary SMD.
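The inverse-variance weighted random-effects model behind these plots can be sketched as follows. This minimal implementation uses the common DerSimonian-Laird estimator of the between-study variance, which is an assumption, since the review does not name its exact estimator:

```python
import math

def random_effects_meta(effects, variances):
    """DerSimonian-Laird random-effects summary of effect sizes.

    Returns the summary SMD, its 95% CI and the I^2 heterogeneity statistic.
    """
    w = [1.0 / v for v in variances]                  # fixed-effect weights
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q and the between-study variance tau^2.
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights incorporate tau^2, so imprecise studies
    # are down-weighted less than under a fixed-effect model.
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    # I^2: share of total variation attributable to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return mu, (mu - 1.96 * se, mu + 1.96 * se), i2
```

The per-study weights `w_re` correspond to the "weight" column shown in the forest plots, and `i2` corresponds to the I 2 statistic reported beneath each plot.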
To improve the readability of this report, only "summary" forest plots are included in the main text. These provide the summary estimate for each sub-group in the respective analysis and, where appropriate, the respective overall summary SMD. 29 Fifteen "disaggregated" forest plots, with study-level SMDs for each outcome category and main intervention category, are provided in the Appendix. 30 Results presented in these forest plots were based on the sample using all available imputations and winsorizing the top 1 per cent of statistical outliers. Results from the restricted sample and/or obtained under different assumptions regarding outliers are presented as part of the sensitivity analysis in Section 3.4.9.

Figure 6 and Table 12 present the overall summary effect sizes for each selected outcome category of interest, namely employment outcomes, earnings or income outcomes and business performance outcomes. 31 The total sample size is calculated by aggregating the number of observations coded from individual studies, while avoiding double-counting of effect sizes measured for the same sample of participants. The aggregate sample sizes throughout the report are often strikingly large. The reason is a number of quasi-experimental impact evaluations based on administrative data. One example is the paper by Webb et al. (2014), who study a targeted employment subsidy using Canada's Labour Force Survey (LFS) and hence reach a sample size of more than 480,000 individuals.

Note that an individual study may have contributed to multiple outcome categories, and hence the individual sub-groups may not be independent (in other words, the same sample of participants may have provided an estimate for both earnings and employment outcomes, in which case the two estimates are not independent). In addition, employment, earnings and business performance are different constructs. Consequently, an overall effect size is not reported.
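The winsorizing step mentioned above can be illustrated as follows. This is a generic sketch that caps the top 1 per cent of values at the corresponding quantile, not necessarily the review's exact procedure:

```python
import math

def winsorize_top(values, pct=0.01):
    """Cap the largest `pct` share of values at the (1 - pct) quantile.

    Unlike trimming, winsorizing keeps every observation; extreme values
    are replaced by the cutoff rather than dropped.
    """
    xs = sorted(values)
    # Index of the cutoff value; everything above it is replaced by it.
    cut = xs[max(0, math.ceil(len(xs) * (1 - pct)) - 1)]
    return [min(v, cut) for v in values]
```

Winsorizing rather than dropping outliers preserves the sample size while limiting the influence of implausibly large effect sizes on the summary estimates.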

Synthesis of the overall evidence by outcome
Employment and earnings outcomes were the largest contributors to the overall meta-analysis: 105 of 119 independent studies estimated an employment outcome and 92 estimated an earnings outcome. 32 The overall effect on earnings outcomes across all intervention categories was 0.05 SMDs (CI = 0.03, 0.06; I 2 = 82 per cent; number of interventions = 92) and statistically significant at the 5 per cent level. The summary effect on employment outcomes was similar and also statistically significant (0.04 SMD; CI = 0.03, 0.06; I 2 = 64 per cent; number of interventions = 105). Only impact estimates from studies that measured business performance outcomes exhibited a relatively large confidence interval, and the summary effect was not statistically significant (0.03 SMD; CI = -0.05, 0.12; I 2 = 49 per cent; number of interventions = 14).
At the same time, the plot also exposed high heterogeneity (represented by the I 2 statistic) within each outcome category, suggesting that a large share of the variation in effect sizes is explained by inter-study heterogeneity. Earnings outcomes displayed the highest I 2 value at 82 per cent, suggesting that more than three-quarters of the variation in effect sizes is due not to chance but to heterogeneity between interventions.
In order to explore the factors driving such differences, the remainder of the report explored effect sizes within each outcome category through moderator and sensitivity analyses. Since the number of independent studies measuring specific outcomes was small for some moderators, the review team only assessed those outcomes for which at least four interventions were available. 31 See the corresponding full forest plot in Appendix Section 10.1, Figure 49. 32 Discrepancies with Table 11 are due to the treatment of outliers prior to analysis.

Univariate moderator analysis
As a first step, the team tested whether summarizing effect sizes within the three outcome categories presented a viable procedure or whether significant heterogeneity was already detectable across outcome constructs in each outcome category. Following this, tests for heterogeneity were carried out by investigating the influence of several factors as part of the moderator analysis: (i) main intervention type; (ii) country income level; (iii) time after exposure to treatment; (iv) study-level summaries of participant characteristics, including gender and participant's income status; (v) programme characteristics, including scale of the programme and implementing organization. The moderator analyses generally provided results that were stratified by main category of intervention in order to avoid "comparing the incomparable".

Outcome measure
To factor in the diverse nature of each outcome, the team assessed the effect size of each outcome measure separately by outcome category (see Figure 7, Figure 8 and Figure 9). The significantly smaller sample of effect sizes for business performance outcomes presented greater variability, with overall negative effects on profits (see Table 13 for further details). Additional parameters are reported in Table 14 and Table 15.
While there was heterogeneity across the different outcome measures within each outcome category, this was not statistically significant based on the random-effects meta-analysis model within each outcome category (the 95 per cent confidence interval of the sub-group average included the overall mean represented by the dotted red line), except for the cases of unemployment duration and capital and investment measures. Based on these results, the team was confident that it was viable to pool results across outcome measures in the subsequent analysis.
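The pooling criterion applied here, checking whether each sub-group's 95 per cent confidence interval contains the overall mean, amounts to a simple comparison. A sketch, with a hypothetical function name:

```python
def subgroup_consistent(sub_mean, sub_se, overall_mean, z=1.96):
    """True if the sub-group's 95% CI contains the overall summary SMD.

    This mirrors the visual check in the forest plots: does the sub-group
    diamond overlap the dotted line marking the overall mean?
    """
    lo, hi = sub_mean - z * sub_se, sub_mean + z * sub_se
    return lo <= overall_mean <= hi
```

Sub-groups failing this check (such as unemployment duration and capital and investment measures here) signal heterogeneity that pooling would obscure.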

Main category of intervention
After restricting the analysis to cases where employment outcomes were reported (Figure 10 and Table 16), interventions providing mainly employment services to youth were the least successful (0.01 SMD; CI = -0.02, 0.04; I 2 = 0 per cent; number of interventions = 10). In agreement with the descriptive analysis of interventions, interventions with skills training as the main category had the greatest weight within the overall employment-related effect size. In most cases, confidence intervals overlapped with the overall mean SMD, suggesting that there were no significant differences in average effect size across types of interventions. The I 2 tests, however, reported statistically significant heterogeneity within the sub-groups for skills training, entrepreneurship promotion and subsidized employment interventions. There was no evidence of heterogeneity across cases where it was not possible to identify a main category of intervention (i.e., in the unspecified category). Such cases reported an SMD of 0.03 (CI = -0.04, 0.10; I 2 = 0 per cent; number of interventions = 5) on employment outcomes. The category was dropped from the earnings outcome analysis due to insufficient sample size.
Qualitatively, the results of effect sizes from earnings-or income-related outcomes, across main intervention types, mimicked those from employment outcomes, though in this case subsidized employment interventions offered the lowest (and negative) effect size (-0.01 SMD; CI = -0.05, 0.03; I 2 = 61 per cent; number of interventions = 9).
The computed effect sizes (displayed in Figure 11) suggested that skills training (0.07 SMD; CI = 0.05, 0.08; I 2 = 86 per cent; number of interventions = 60) and entrepreneurship interventions (0.09 SMD; CI = 0.01, 0.18; I 2 = 64 per cent; number of interventions = 12) positively and consistently impacted both the employment and earnings prospects of young people, while evidence from other intervention types showed rather lower impacts on both outcome categories. However, significant heterogeneity was detected within all categories of intervention except for employment services (0.01 SMD; CI = 0.00, 0.02; I 2 = 0 per cent; number of interventions = 8). See Table 17 for further information on the results and parameters.

The summary forest plot for business performance outcomes (Figure 12 and Table 18) relies on a sample of 169 effect sizes, computed from treatment effects reported in 14 studies.
Notably, the impact of skills training interventions, which measured impacts on business performance outcomes (four cases), was negative with an average SMD of -0.09 (CI = -0.18,

Country income level
This section explores differential impacts across country income levels. The analysis recognized (i) the differences in labour market barriers facing youth in the context of different country income levels; (ii) the role of context on the ability of youth employment interventions to shape labour market outcomes of youth (Betcherman et al., 2007); and (iii) the intrinsic and differentiated characteristics of labour markets and institutions across middle- and low-income countries in comparison to high-income countries (Fields, 2011; Cho, Margolis, Newhouse & Robalino, 2012).
The analysis capitalized on the sizable number of studies under each country income group. There were 65 and 48 reports of interventions implemented in high-income countries and low-and middle-income countries, respectively.
Interventions in high-income countries were typically national programmes, implemented and designed by government agencies. Evidence from local or pilot interventions was scarce (only 15 per cent of the total sample). In low-and middle-income countries, more than 40 per cent of the evidence was generated from small-scale local programmes. These programmes often targeted specific groups, such as young women. While only 5 per cent of interventions in high-income countries targeted young women, they were the focus of 27 per cent of the interventions evaluated in low-income countries.
Evaluated interventions also varied across country income levels. 33 While, in high-income countries, evaluations of employment services, subsidized employment and skills training were common, only a negligible number of entrepreneurship promotion interventions were evaluated. In contrast, both entrepreneurship and skills training interventions were relatively frequently reported in countries outside the high-income economies, but there were few cases of evaluated interventions providing mainly employment services or subsidized employment interventions.
Research designs also varied across country income groups. A significant proportion (>50 per cent) of the recent evidence from middle-and low-income countries had been generated from relatively small-scale experimental evaluation designs. In contrast, quasi-experimental approaches using administrative data made up a large share (60 per cent) of the studies from high-income countries.
The review team also observed that many of the interventions in high-income countries were designed and implemented with the participation of government agencies. However, in some cases other stakeholders were involved, in particular the private sector (for example, in the form of private firms providing training or employment services). 33 Results did not necessarily reflect the intervention types which were predominately implemented as these may not have been evaluated.
Summary forest plots are provided for high-income and low- and middle-income countries. 34 Effect sizes reported on both employment and earnings outcomes were generally higher among low- and middle-income countries. Skills training interventions generated the highest magnitude of impacts in high-income countries, at 0.04 SMD (CI = 0.01, 0.07; I 2 = 68 per cent; number of interventions = 29) for employment (Figure 13, Table 19) and 0.02 SMD (CI = 0.00, 0.04; I 2 = 72 per cent; number of interventions = 21) for earnings (Figure 15, Table 21) outcomes. In low- and middle-income countries, skills training interventions showed a lower effect size on employment outcomes (0.06 SMD; CI = 0.02, 0.10; I 2 = 63 per cent; number of interventions = 38) than entrepreneurship promotion interventions (0.18 SMD; CI = 0.06, 0.29; I 2 = 68 per cent; number of interventions = 5).
While the effect size displayed wide variance, entrepreneurship promotion interventions offered positive prospects for stimulating the labour market outcomes of youth in the developing world. The limited number of effect sizes from entrepreneurship interventions in high-income countries caused the category to drop out of the analysis.
Evidence from subsidized employment interventions in low-and middle-income countries was rather limited but still presented positive impacts (on average) and non-heterogeneity within the sub-group for both outcome types. For high-income countries, the interpretation of the findings was less encouraging, as the studies reported no impact on employment and negative impacts on earnings.
Employment services interventions held an important position in terms of the effect of youth-targeted ALMPs in high-income countries. While their impact on employment was negligible, they tended to yield positive income gains among participating youth. The category dropped out of the low-and middle-income countries analysis due to its reduced sample size.
The generally larger effects observed in low- and middle-income countries echo Betcherman et al. (2007), which demonstrated that the probability that a programme has a positive impact on labour market outcomes declines as the country's income level rises. In conclusion, observing both outcome categories, the team ruled out statistically significant differences between intervention types for both high-income and low- and middle-income countries.

Duration after treatment
Not all research studies reported information on the time lag between exposure to treatment and measurement of changes in outcomes. After imputations, only 72 per cent of the SMDs could be classified according to study timing after treatment into short- (data collected less than 12 months after the end of the treatment), medium- (12-24 months) and long-term (more than 24 months) studies. Longer term outcomes were most common in evaluations from high-income countries.
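The duration categories above map directly onto a simple classification rule. A sketch (the function name is illustrative):

```python
def duration_category(months_after_treatment):
    """Classify study timing relative to the end of treatment (in months).

    Thresholds follow the review's scheme: short < 12 months,
    medium 12-24 months, long > 24 months.
    """
    if months_after_treatment < 12:
        return "short-term"
    if months_after_treatment <= 24:
        return "medium-term"
    return "long-term"
```
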
In the restricted sample, the overall effect size for employment and earnings outcomes was roughly the same. In both cases, short- and medium-term studies had a similar weight on the overall effect size of the entire meta-analysis (around 45 per cent and 35 per cent, respectively). While there was unaccounted heterogeneity within the different duration terms, effect size estimates from longer term evaluations (more than one year) were relatively larger than short- and medium-term estimates in the case of studies measuring employment outcomes. As displayed in Figure 17 and Table 23, effect sizes for the medium and long term were 0.05 SMD (CI = 0.03, 0.07; I 2 = 51 per cent; number of interventions = 43) and 0.06 SMD (CI = 0.02, 0.09; I 2 = 64 per cent; number of interventions = 21), respectively. This suggested a certain time lag before outcomes materialize.

Earnings outcomes showed a reversed pattern. Figure 18 (Table 24) shows impacts that decrease as the duration between exposure to treatment and measurement increases. Medium- and long-term effect sizes were 0.06 SMD (CI = 0.03, 0.09; I 2 = 67 per cent; number of interventions = 38) and 0.05 SMD (CI = 0.02, 0.09; I 2 = 80 per cent; number of interventions = 20), respectively, while the effect size for short-term duration was 0.07 SMD (CI = 0.04, 0.10; I 2 = 84 per cent; number of interventions = 54).

At the same time, for both outcome types, the confidence intervals for each sub-group contained the mean overall SMD and, hence, differences were not statistically significant. As the team suspected that other study-level characteristics might have confounded the analysis, this question was explored in more detail through multivariate meta-regression (Kluve et al., 2016).

Gender
The analysis by gender relied on whether an effect size was reported for female or male participants only. Pooled results (meaning those estimated on data that could not be disaggregated by gender) did not form part of this sub-group analysis. A large body of literature focuses on differences in the effectiveness of labour market interventions by gender. These differences were reflected in the interventions and studies in the sample. There were 39 interventions that reported male-only outcomes, of which more than half were located in high-income countries. Conversely, of the 54 interventions reporting outcomes separately for females, only 44 per cent were in high-income economies. More than one-quarter of these 54 interventions (15) specifically targeted only female participants (the vast majority of which (12) were located in low-and middle-income countries). In contrast, many other intervention characteristics were distributed relatively evenly between interventions that reported male-only and/or female-only estimates, in particular the main category of the intervention, age, education and income status of the target population, or programme implementers.
Summary forest plots showed greater effect sizes for young women compared to young men across employment and earnings outcomes. This suggested that interventions which specifically measured changes in outcomes by gender tended to have higher returns for women.

Participant income status
This subsection looks at differential effects by participant sub-group, focusing on low-income/disadvantaged/at-risk/vulnerable youth. Estimates for this sub-group (labelled "disadvantaged youth" for ease of reference) existed for almost half of the interventions (47 per cent), or more than half of the programmes (51 per cent), equally distributed across gender. The share of interventions containing separate estimates for the disadvantaged youth sub-group was considerably higher in low- and middle-income countries (57 per cent) than in high-income countries (38 per cent).

Figure 22: Summary forest plot of earnings outcomes by participant income group
(where yes is low-income, disadvantaged, at risk or vulnerable youth)

Notes: Entrepreneurship promotion, employment services, subsidized employment and unspecified categories were dropped from the analysis for the group of low-income participants due to the small number of independent studies. Entrepreneurship promotion and unspecified categories were dropped from the analysis for the group of non-low-income participants due to the small number of independent studies.

Notes: Employment services, subsidized employment and unspecified categories were dropped from the analysis for the group of low-income participants due to the small number of independent studies. The unspecified category was dropped from the analysis for the group of non-low-income participants due to the small number of independent studies.

Programme characteristics
This section analyses effect size heterogeneity across studies that evaluated programmes of different scales or programmes implemented by different actors. In the sample, these two characteristics did not differ across interventions within the same programme; they are therefore referred to as programme-level characteristics.
The review team coded the scale of the programme using four categories, which generally referred to the level at which the programme was implemented:
1. National level, comprising programmes that were implemented across several regions in a country.
2. Regional level, referring to programmes that had clear geographical targeting of selected administrative regions.
3. Local level, when multiple areas across the entire country were selected (e.g., cities).
4. Pilot level, capturing programmes that were implemented as a trial, with relatively low scope and the expectation of future scale-up.
Note that the variable was coded at the intervention rather than the study (sample) level: the classification did not reflect whether the evaluation was conducted on a subsample of the entire programme; rather, the main objective was to test the difference between (small-scale) local or pilot programmes and national-level policies.
Results are presented in Figure 27 (Table 31) and Figure 28 (Table 32). Studies of national-level programmes generally reported somewhat smaller effect sizes for both earnings (0.03 SMD; CI = 0.02, 0.05; I² = 76 per cent; number of interventions = 47) and employment outcomes (0.03 SMD; CI = 0.01, 0.05; I² = 59 per cent; number of interventions = 55). However, the difference relative to smaller-scale programmes was not statistically significant. In addition, there was large unexplained heterogeneity within all sub-groups except the sample of pilot programmes.

Figure 29 (Table 33) and Figure 30 (Table 34) provide summary SMDs for studies that analysed programmes implemented by different agencies. Implementers were categorized into public institutions, i.e., governments or multilateral organizations, and private entities, i.e., private sector firms or NGOs. In the analysis, the review team looked at the differential impact of programmes implemented by (i) governments and/or multilaterals, (ii) private sector firms and/or NGOs, or (iii) a combination of public and private sector (i.e., governments and/or multilaterals combined with private sector firms and/or NGOs). Any programmes that could not be classified according to these three groups were labelled "other", for example when the implementing agency remained unknown to the reviewers. Programmes implemented by the private sector only (i.e., by private sector firms and/or NGOs) led to moderate gains for both employment and income: around 0.04 SMD for employment (CI = -0.01, 0.10; I² = 80 per cent; number of interventions = 23) and 0.05 SMD for earnings (CI = 0.00, 0.10; I² = 75 per cent; number of interventions = 21), with the summary SMD barely reaching significance at the 5 per cent level.
The summary SMD of studies of programmes implemented by the public sector only (i.e., with a government and/or multilateral agency as implementer) was statistically insignificant for both employment and earnings outcomes. However, as the analysis could have been confounded by other intervention- or study-level characteristics that were correlated with programme scale (e.g., country income level), this difference was explored in more detail as part of the multivariate meta-analysis in Kluve et al. (2016).

Sensitivity analysis
This section tests the robustness of the results. For the sake of brevity, it discusses only the sensitivity of the results from the overall synthesis of the evidence (pooled sample) and from the moderator analysis by main intervention category. The robustness of each moderator analysis is not discussed separately, as these generally reflected findings in the pooled sample.
This section focuses on three types of decisions that may have affected the overall results. First, different assumptions in computing (or imputing) effect sizes are tested. Second, the robustness of some of the decisions made in the data synthesis (e.g., regarding outliers) is checked. Third, the question of whether the variance in effect sizes might be driven by factors related to the applied evaluation design (i.e., study type, risk of bias) is investigated.
For the univariate analysis, the respective summary forest plots are again included in the main text while, for this section, forest plots showing each intervention SMD separately have not been appended. 35 In addition to the sensitivity analysis discussed in this section, the review team performed various other checks on the analysis (e.g., using Cohen's d instead of Hedges' g; testing for differences between Intention-to-Treat (ITT) and Average Treatment Effect on the Treated (ATT) estimates; additional methods of data imputation). Since none of these checks significantly altered the main results, they are not discussed in detail.

Imputation of missing information
As discussed in Section 3.4.4, in some cases it was necessary to impute information and make assumptions in order to compute SMDs for specific studies.
• First, the review team had to make certain assumptions regarding the sample size of the treatment and/or comparison group if either of these was not reported.
• Second, the team approximated SMDs using the formula provided by Borenstein, Cooper, Hedges and Valentine (2009) where information on the pooled standard deviation could not be obtained otherwise.
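Both steps can be illustrated with a small sketch (illustrative only; the function and variable names are assumptions, not taken from the report). Hedges' g is the small-sample bias-corrected SMD computed from group means and standard deviations; when the pooled standard deviation is unavailable, an SMD can be approximated from a reported t-statistic and the group sizes:

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Bias-corrected standardized mean difference (Hedges' g)."""
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sd_pooled        # Cohen's d
    j = 1 - 3 / (4 * (n_t + n_c) - 9)        # small-sample correction factor
    return j * d

def smd_from_t(t_stat, n_t, n_c):
    """Approximate Cohen's d from a t-statistic when SDs are not reported."""
    return t_stat * math.sqrt(1 / n_t + 1 / n_c)
```

The second function is one of the standard conversion formulas collected in Borenstein et al. (2009); with unequal or unreported group sizes, further assumptions (such as equal splits) would be needed, which is precisely the kind of imputation the sensitivity check below probes.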
This section compares results using the entire sample of studies (including all imputed values) with a restricted sample excluding all studies where SMDs (or their standard errors) could not be computed without these assumptions. 36 Figure 31 replicates the forest plot displayed in Figure 6, displaying the summary SMDs by outcome category. Not imputing any missing information reduced the overall sample by almost half, from 2,169 SMDs and 119 studies to 1,116 SMDs (82 studies). The average SMD for employment outcomes increased, though the increase was not statistically significant. At the same time, the summary effect size for earnings outcomes was significantly reduced, to 0.01, leading to an (insignificant) reduction in the overall SMD of youth employment interventions across outcomes.

Figure 33: Summary forest plot of earnings outcomes by main category of intervention without imputations
Despite the significant reduction in sample size, the basic results regarding the effectiveness of different intervention types held in the smaller sample. In fact, results and average effect sizes for employment and business performance outcomes were very similar to the main results that included all imputed values. Only in the case of earnings/income outcomes was the average impact of skills training significantly reduced (the confidence intervals for skills training in the upper and lower panel did not overlap). The precision of the estimate for the entrepreneurship promotion category was also somewhat reduced (i.e., it had a larger confidence interval).

Assumptions in the analysis
In the main meta-analysis, the team applied a procedure to remove implausibly large or influential estimates (cf. Section 3.4.2). Although meta-analysts would not want to erroneously exclude relevant estimates, balance demands that no single estimate, especially one among hundreds, should determine how an entire research literature is viewed or understood (Stanley & Doucouliagos, 2015).
As described in the respective sections, the team first winsorized the highest and lowest 1 per cent of coded SMD effect size estimates. Generally, this affected the (unweighted) mean SMD and its standard deviation only marginally. Subsequently, any observations with an SMD or an SMD standard error of more than 0.75 were dropped. In the full sample, roughly 30 SMDs from 4 studies were excluded; since most of these stemmed from sub-group analyses, all but one study was retained in the sample.

This section tests whether the results are robust to these assumptions. The full results for all tests are not displayed; only those that appeared to be of major importance are highlighted. For example, the team also tested whether winsorizing at the 5 per cent level (instead of 1 per cent) altered the results, but found no evidence that it did, and this issue is therefore not discussed further. The upper panels of Figure 35, Figure 36 and Figure 37 replicate the results of the main moderator analysis by main category of intervention, this time without winsorizing the data and dropping only outliers with an SMD or standard error above three (effectively not dropping any observations).
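The two-step trimming procedure described above can be sketched as follows (an illustrative reconstruction, not the team's code; the 1 per cent and 0.75 thresholds are taken from the text, while the treatment of the SMD sign is an assumption):

```python
def winsorize(values, pct=0.01):
    """Clip values below the pct and above the (1 - pct) empirical quantiles."""
    s = sorted(values)
    n = len(s)
    lo = s[int(pct * (n - 1))]
    hi = s[int((1 - pct) * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

def drop_outliers(effects, max_abs=0.75):
    """Drop estimates whose SMD (in absolute value) or standard error
    exceeds the threshold. effects: list of (smd, se) pairs."""
    return [(d, se) for d, se in effects if abs(d) <= max_abs and se <= max_abs]
```

Winsorizing keeps the number of observations constant and only dampens extreme values, which is why it barely moved the unweighted mean, whereas the subsequent outlier rule actually removes observations.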
In fact, since very few observations were dropped, the results changed marginally and even confidence intervals did not increase as much as one might have expected. Correspondingly, the team found that results from the meta-regression model were not affected by the decisions to censor the specified data and so this robustness check is not discussed further in the current report.
Tests were also conducted on whether the results were robust to the level of aggregation of effect sizes before synthesizing results in the random-effects meta-analysis. Specifically, the team checked whether aggregating effect sizes across all studies of one intervention, rather than aggregating only studies using the same data set, made a difference. The latter increased the number of observations in the full analysis (based on all outcome variables) from 100 interventions to 120 individual studies. Similarly, the reviewers tested whether the level of clustering in the meta-regression model significantly affected results, but found no evidence that the level of first-step aggregation or clustering significantly altered results.

In summary, the team tested the robustness of the results to the different decisions made in the process of compiling and analysing the data, such as the imputation of missing information and the handling of statistical outliers. Since all the sensitivity analyses yielded results similar to the main analysis, the team can be quite confident that the findings reported in Sections 4.3.2 to 4.3.4 were not influenced by the method of analysis.

Research design
This section tests whether the results depended on the applied evaluation design. In the combined meta-analysis, randomized controlled trials and quasi-experimental evaluation approaches were pooled, but more rigorous evaluation designs might have systematically yielded different effect sizes than less robust ones. Figure 38 and Figure 39 provide a moderator analysis by research design for both employment and earnings outcomes. Contrary to expectations, experimental studies actually produced larger effect sizes in both cases, but the difference between experimental and quasi-experimental studies was not statistically significant at the 5 per cent level in either case. Based on the one-way random-effects ANOVA model, the team was able to rule out a systematic difference, on average, between effect sizes generated from experimental and quasi-experimental studies.
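The sub-group comparison underlying this kind of moderator analysis amounts to a Q-between test: pool each sub-group separately, then ask whether the sub-group means differ by more than their standard errors allow. A minimal sketch, assuming each sub-group's pooled SMD and its standard error have already been computed (e.g., by a random-effects pooling step):

```python
def q_between(pooled):
    """Q-between statistic for sub-group differences in a meta-analysis.

    pooled: list of (pooled_smd, se) pairs, one per sub-group.
    Under the null of equal sub-group means, Q follows a chi-square
    distribution with k - 1 degrees of freedom (k = number of sub-groups).
    """
    w = [1.0 / se**2 for _, se in pooled]
    grand = sum(wi * d for wi, (d, _) in zip(w, pooled)) / sum(w)
    return sum(wi * (d - grand) ** 2 for wi, (d, _) in zip(w, pooled))
```

With only two sub-groups (experimental vs. quasi-experimental), this is equivalent to a z-test on the difference between the two pooled SMDs.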

Figure 43: Summary forest plot of income outcomes by main category of intervention for quasi-experiments
However, this may differ according to the category of intervention. It is, for example, plausible that a particular intervention type consistently displays higher results when evaluated through experimental studies than through quasi-experimental ones. Therefore, the team tested whether the basic results on the effectiveness of different intervention types held when considering only evidence from experimental studies, which was arguably more reliable than quasi-experimental results. In a first descriptive step, Figure 40, Figure 41, Figure 42 and Figure 43 replicate the moderator analysis by main category of intervention for employment and earnings outcomes separately for experimental and quasi-experimental studies.
The evidence on entrepreneurship interventions was based entirely on experimental research and, hence, the review's main results carry over directly. In contrast, all studies of subsidized employment interventions were derived from quasi-experimental approaches. To some degree, this correlation between intervention type and research design may have confounded the analysis. The spectrum of research designs employed for evaluating skills training interventions was more mixed. But regardless of evaluation design, skills training interventions appear to have been the most successful intervention type, along with entrepreneurship programmes. (The difference in SMDs between entrepreneurship and skills training interventions was still not statistically significant.) Studies of interventions classified as unspecified had to be dropped from the analysis, since fewer than four such interventions fell within either sub-group (experimental vs. quasi-experimental).
As with all univariate analyses, one issue was that the difference between experimental and quasi-experimental studies observed in the forest plots might also have been driven by other factors (such as the fact that the majority of experiments were conducted in low- and middle-income countries, which generally yielded larger effect sizes). Based on the univariate analysis, it is not possible to state with certainty that the aggregate effect size was actually biased downward by including evidence from quasi-experimental studies.
The meta-analysis results appeared robust to the type of evaluation design and the main findings were corroborated by rigorous evidence from experimental studies.

Analysis of small-sample bias and publication bias
This section uses funnel plots and Egger's tests to check whether there was any indication of publication bias in the sample of studies. Figure 44 and Figure 45 present funnel plots for the entire sample (including all outcomes and all sub-groups). Each plot displays the effect size (SMD) on the horizontal axis and the standard error of the effect size (SE SMD) on the vertical axis. In Figure 44, effect sizes are aggregated at the study level, and each dot represents an individual study. In Figure 45, the data is entirely disaggregated, meaning that each dot represents one effect size estimate. The solid line crosses the horizontal axis at the overall average fixed-effect estimate.

Although most of the dots (studies) are spread around the solid line and within the triangular area (indicating the 95 per cent confidence interval), a certain tendency towards the right is observable. These dots represent studies that reported positive effects (with a medium level of precision, as measured by the standard error). This slight asymmetry may be an indicator of publication bias. The results of Egger's test for publication bias in Table 35 confirmed the visual indication: the coefficient of the bias variable was positive and statistically significant at the 5 per cent level. (Note: *, ** and *** denote statistical significance at the 10 per cent, 5 per cent and 1 per cent levels, respectively.)

As in the previous sections, the analysis is further disaggregated by outcome category. Figure 61, Figure 62 and Figure 63 in Section 10.2 of the Appendix show funnel plots (aggregated at the study level) for the three outcome variables separately. 37 As can be seen from these figures, effect sizes for earnings and income outcomes appeared most strongly skewed to the right. This was also confirmed when performing Egger's test for each outcome category separately. For business performance outcomes, Egger's test was not significant at the 5 per cent level.
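Egger's test regresses the standardized effect (SMD divided by its standard error) on precision (1/SE); a significant intercept, the "bias" coefficient reported in Table 35, signals funnel-plot asymmetry. A minimal sketch using ordinary least squares on plain lists (illustrative; not the review's actual implementation or data):

```python
import math

def egger_test(smds, ses):
    """Egger's regression test: regress SMD/SE on 1/SE.

    Returns (bias, se_bias, t_stat): the regression intercept, its
    standard error, and the t-statistic. A significant intercept
    indicates funnel-plot asymmetry (possible publication bias).
    """
    y = [d / se for d, se in zip(smds, ses)]   # standardized effects
    x = [1.0 / se for se in ses]               # precision
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    bias = my - slope * mx                     # intercept = bias term
    resid = [yi - (bias + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r**2 for r in resid) / (n - 2)    # residual variance
    se_bias = math.sqrt(s2 * (1.0 / n + mx**2 / sxx))
    return bias, se_bias, bias / se_bias if se_bias > 0 else float("inf")
```

In a symmetric funnel, small (imprecise) studies scatter evenly around the pooled effect and the intercept is close to zero; a positive, significant intercept matches the rightward skew visible in the funnel plots.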

This potential publication bias was accounted for in the multivariate meta-regression model using the procedure described in Doucouliagos and Stanley (2009): the authors argued that including the standard error of the SMD in the random-effects model would account for the potential effect of publication bias, and the resulting coefficient estimate would provide an indication of the magnitude (and significance) of that effect. Following this approach, the team found a clear indication of selection for statistically positive results: the point estimate was consistently positive and statistically significant. In addition, the summary effect estimate (represented by the constant) in the model pooling all outcomes became non-significant when accounting for publication bias in this way. This appeared to be largely driven by the negative (insignificant) results on business performance outcomes, while the summary effects on employment and earnings remained significant even after accounting for publication bias.
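The Doucouliagos and Stanley (2009) correction amounts to adding the standard error of the SMD as a regressor in the meta-regression: the coefficient on the SE captures selection for significant results, and the constant is the bias-adjusted summary effect. Reduced to its simplest form (the review's actual model includes further moderators and a random-effects structure), this is a weighted least-squares regression of SMD on SE:

```python
def pet_model(smds, ses):
    """Precision-effect-style model: WLS of SMD on its standard error,
    with weights 1/SE^2.

    Returns (alpha, beta): alpha is the bias-adjusted summary effect
    (the constant), beta the publication-bias coefficient on the SE.
    """
    w = [1.0 / se**2 for se in ses]
    sw = sum(w)
    mx = sum(wi * se for wi, se in zip(w, ses)) / sw      # weighted mean of SE
    my = sum(wi * d for wi, d in zip(w, smds)) / sw       # weighted mean of SMD
    sxx = sum(wi * (se - mx) ** 2 for wi, se in zip(w, ses))
    sxy = sum(wi * (se - mx) * (d - my) for wi, se, d in zip(w, ses, smds))
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta
```

A positive, significant beta mirrors the finding above: reported effects grow with their standard errors, the signature of selection for positive results, while alpha approximates what the summary effect would be in the absence of that selection.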
In addition to the above tests for publication bias, the team also tested whether reported effect sizes differed between peer-reviewed articles, working papers (some of which were unpublished at the time of the search), evaluation/technical reports and other types of publications (such as books and dissertations). No statistically significant differences in average effect sizes by publication status were found, as can be seen in the forest plot shown in Figure 46. These results held when the analysis was disaggregated by intervention type or outcome category (not reported). Similarly, the dummy for publication status (peer-reviewed) in the multivariate results did not provide a clear picture.
Funnel plots and Egger's test indicated some publication bias towards studies showing positive effects of youth employment interventions on labour market outcomes. Using the procedure proposed in Doucouliagos and Stanley (2009), the reviewers accounted for publication bias in the multivariate meta-regression model. While the overall effect was significantly reduced, youth employment interventions still showed a significant positive effect on employment and earnings outcomes. Nonetheless, the team concluded that the summary effect size of youth employment outcomes probably represents an upper bound for the true impact of these interventions. At the same time, no correlation of reported effect sizes with publication status was detected.

SUMMARY OF MAIN RESULTS

To support informed decision making, the systematic review examined the existing evidence on the effectiveness of interventions that aimed to improve the labour market outcomes of youth. The review relied on a structured and comprehensive search that allowed the identification and assessment of all relevant impact evaluation studies carried out worldwide between 1990 and 2014 across the following intervention types:

• Training and skills development, which comprises programmes outside the formal education system (and therefore does not consider Technical and Vocational Education (TVE) programmes) that offer skills training to young people in order to improve their employability and facilitate their transition into the labour market.
• Entrepreneurship promotion, aiming to provide entrepreneurial skills as well as physical, financial and social capital for youth becoming self-employed and starting a business and for those seeking to expand and grow their businesses.
• Employment services, delivering job counselling, job-search assistance and/or mentoring services, which are often complemented by job placement and technical or financial assistance.
• Subsidized employment, which comprises government efforts to boost labour demand and incentivize hiring and meaningful work experience for young women and men. This type of intervention includes wage subsidy programmes and labour-intensive public employment programmes.
The key labour market outcomes considered were the post-treatment measures of employment, earnings, and business performance.
• Employment outcomes include employment and/or unemployment probabilities, participation rates, hours worked, unemployment duration and quality of employment.
• Earnings outcomes include reported earnings and income, household income, consumption, and salary and/or wage.
• Business performance outcomes include profits, sales, number of employees and jobs created, capital and investment, business creation and business survival.
In the process of understanding what works, the review also focused on the way in which interventions work, relying on prior theories of change for the selected intervention types as well as on observed programme characteristics reported in the studies.

1.
The systematic review showed that investing in youth through active labour market measures can improve outcomes. Interventions to support young women and men in the labour market may lead to positive outcomes, increasing their chances of finding or staying in employment and improving their income. The positive effects on employment and earnings were statistically significant, with effect sizes of 0.04 and 0.05 SMDs, respectively, demonstrating the responsiveness of these outcomes to youth's exposure to active labour market programmes (ALMPs). With substantially less evidence, the effect on business performance outcomes was not statistically significant, at 0.03 SMD; however, when only entrepreneurship promotion interventions were considered, the impact was larger and significant, at 0.10 SMD.

2.
Programme impacts conceal major contextual differences. Even after factoring in differences across interventions, effects on the labour market outcomes of youth were highly inconsistent across studies. The review assessed the factors that linked reported effect sizes to different intervention results, from country context to programme and participant characteristics. Tests for heterogeneity showed substantial variation in effect size magnitude due to country income level, the design and implementation of the interventions and the profile of programme beneficiaries.

3.
The underlying evidence base varies by country income level. Intervention characteristics and research designs differ significantly between high-income and low- or middle-income countries. A large proportion of the evidence from high-income countries derived from quasi-experimental studies of national programmes, implemented in collaboration with government organizations. In contrast, the evidence from low- and middle-income countries was predominantly based on experimental impact evaluations of rather small-scale, targeted interventions, which were often implemented by NGOs or international organizations.

4.
Impact is higher in low- or middle-income countries than in high-income countries, on average. Evaluation studies from low- or middle-income countries produced larger effect size estimates, on average, than studies conducted in high-income countries. The result holds for employment and earnings outcomes and after controlling for differences in research design and intervention characteristics. The studies pointed to a real difference across country contexts: being unemployed or unskilled in a high-income country, where labour demand is skill intensive, puts youth in a highly disadvantaged position vis-à-vis a cohort that is on average well educated. While ALMPs help these youth to reconnect to the labour market, they do not fully compensate for knowledge or skills not acquired earlier, in the education system. In lower income countries, with large cohorts of disadvantaged youth, marginal investments in skills and employment opportunities lead to larger changes in outcomes. This finding coincides with earlier reviews by Betcherman et al. (2007) and Fares and Puerto (2009).

5.
In low- and middle-income countries, entrepreneurship and skills training interventions offer the greatest impacts. The evidence from low- and middle-income countries showed that youth employment interventions lead to a meaningful impact on both the employment and the earnings of youth. In particular, entrepreneurship and skills training interventions yield positive results, on average, especially in terms of income gains. This is an important finding, which points to the merits of combining both supply- and demand-side interventions to support youth. It also provides tangible evidence of the effect of human capital investment. The effect of entrepreneurship promotion interventions should be interpreted with care because, despite the large magnitude of impact, this intervention category also reports the largest confidence intervals across outcome measures. More primary studies are therefore needed to increase the accuracy of this finding. The large effects of skills training and entrepreneurship do not imply that other intervention types should be avoided, as much depends on the specificities of the youth employment challenge, the needs of beneficiaries, and the design of the programmes.

6.
In high-income countries, the role of intervention type is less tangible. No single type of intervention provided clear evidence of a significant effect on the employment or earnings of youth in high-income countries. Skills training appeared slightly more likely to effectuate some (albeit small) impact on employment or earnings, but the difference in comparison to other intervention types was generally not significant. Longer term employment and income estimates in high-income countries were higher than estimates that considered only the short term (less than one year after treatment exposure).

7.
Programmes lead to better outcomes when they target low-income and disadvantaged youth. Across measures of targeting, a focus on low-income youth or youth with low levels of education yields higher employment and earnings gains across all country income levels. The analysis by gender is less conclusive. While the overall effect size for employment and earnings appears to be larger for young women than for young men, we find no strong patterns in the multivariate regression analysis to suggest that targeting women only will lead to better outcomes.

8.
There is no clear indication about the impact that public, private or civil society implementers bring to the equation. While the involvement of public and private entities in the implementation of a youth employment programme led to positive impacts in high-income countries, the relationship in low-and middle-income countries was non-significant or negative. More impact research is needed to account for implementation agents and mechanisms.

9.
The results appear robust in terms of the quality of the underlying evidence, as well as across different assumptions and model specifications. Most importantly, they held up under a restricted sample of experimental impact evaluations. There was some evidence for small study effects suggesting that publication bias is present in the literature.

UNPACKING THE CAUSAL CHAIN ACROSS YOUTH EMPLOYMENT INTERVENTIONS
Section 1.3 proposed a series of causal chains connecting youth-targeted ALMPs to expected outputs such as direct job creation or changes in skills, knowledge, attitudes, or behaviours, and ultimately linking programme delivery to projected labour market outcomes as well as other closely correlated outcomes such as accumulation of human capital.
Some of these anticipated connections were confirmed by the results of the systematic review, shedding light on the impacts of skills training, entrepreneurship, employment services and subsidized employment on labour market outcomes of youth.
This section re-examines the proposed result chains, reflecting on the transmission channels that lead from activities to outcomes. It relies on the findings from the meta-analysis and digs deeper into the individual studies, unpacking features of programme design and implementation that triggered success in the achievement of intermediate and final outcomes.

Skills training programmes
Education and training are key determinants of success in the labour market and strong predictors of non-vulnerable jobs among youth (Sparreboom & Staneva, 2014). While time spent on education and training certainly pays off, returns are far more likely to be realized if there are strong, explicit links between education and training policies and the world of work.
Youth training programmes seek to develop skills that enhance human capital and lead to long-term gains in employment. A simplified results chain depicted in Table 37 draws a road map of how exposure to a training programme and the skills acquired through it can lead to improvements in employment, earnings and business performance. The causal hypothesis relies on a series of assumptions and the achievement of some intermediate results, such as positive changes in knowledge, skills, attitudes and behaviours, which are expected to occur in the short term and lead to changes in labour market outcomes such as the probability of employment after programme participation.
The road map is complex, as there are a number of parameters to consider in the design and delivery of training, including (i) the curriculum; (ii) the skills or combination of skills embedded in the curriculum (technical, soft); (iii) the training provider's experience and quality; (iv) the participation of employers (as well as workers' associations) in programme design and implementation; (v) the setting (in-classroom, on-the-job, mixed); (vi) financial and non-financial incentives for the participation of both youth and employers; (vii) targeting mechanisms; (viii) mechanisms for the selection of training providers; (ix) monitoring and reporting; and (x) alignment with other ALMPs.

Skills training interventions are the most widely used youth employment intervention worldwide and are increasingly combined with other measures to boost employability (Betcherman et al., 2007; Fares & Puerto, 2009). A total of 55 out of the 107 evaluated interventions (51 per cent, as shown in Table 9) examined by this review fell within the main category of skills training interventions, with 53 per cent of these being conducted in high-income countries, 35 per cent in middle-income countries and 13 per cent in low-income countries. 39 On average, skills training interventions improved employment outcomes among young women and men by 0.05 SMDs (CI = 0.02, 0.07; I² = 65 per cent; number of interventions = 67) and also led to higher earnings (0.07 SMDs; CI = 0.05, 0.08; I² = 86 per cent; number of interventions = 60).

Some key results emerged from the meta-review:

1) Skills training programmes lead to positive changes in labour market outcomes. With a sizable evidence base, effect size estimates across all country income types were positive (Table 38).
The result supports the economic rationale for active labour market training programmes, which aim to help youth enter the labour market and accumulate the skills needed to compete for jobs and improve their productivity, with subsequent positive impacts on wages provided that there is no depreciation in skills (Heckman, Lochner & Cossa, 2002).

2) The effect of training is higher for youth in low- and middle-income countries than for youth in high-income countries (Table 38). The result echoed the findings of Betcherman et al. (2007) and highlighted the role of contextual variables, such as access to basic and technical vocational education and training and to social protection systems. It suggested that, while training programmes led to positive outcomes in high-income countries, they were unable there to compensate for skills that were not acquired at school.
The multifaceted nature and evolution of skills training interventions was also observed in the evidence from single studies:

3) Comprehensive, multi-service training interventions were more prevalent and worked best in low- and middle-income countries. Skills training interventions have evolved into holistic measures (Fares & Puerto, 2009). Some 36 out of the 107 interventions (34 per cent) examined by this review combined skills training with one or more additional intervention types: 24 interventions combined training with employment services only, eight with subsidized employment only, and one with entrepreneurship promotion only. In three cases, skills training interventions were combined with more than one additional intervention type.
The combination of skills training and entrepreneurship promotion (and potentially further intervention types) was particularly prevalent in low- and middle-income countries, reflecting youth's scant opportunities in (formal) employment and the limited ability of the public and private sectors to absorb the growing youth labour force. Examples of evaluated interventions that consisted, at a minimum, of a skills training and an entrepreneurship component included the Empowerment and Livelihood for Adolescents (ELA) programme in Uganda, the Economic Empowerment of Adolescent Girls (EPAG) programme in Liberia and the Livelihoods Training for Adolescent Living Programme in India.

4) Recent evidence points to the relevance of incentives and profiling mechanisms within the design of the interventions.
As shown by Kluve et al. (2016), incentives and profiling measures were correlated with better employment and earnings outcomes. The Adolescent Girls Employment Initiative (AGEI) of the Employment Fund in Nepal provided technical and life skills training with a comprehensive incentive scheme. Training providers, selected through a competitive bidding process, were offered a bonus payment based on the number of trainees who had obtained "gainful" employment six months after completing the training, and a second bonus for the share of participants who met pre-specified vulnerability criteria and were successfully placed in employment (Ahmed, Chakravarty, Lundberg & Nikolov, 2014).

5) Despite the growing awareness of and demand for soft skills, aggregated results did not imply that they systematically led to better outcomes. While most interventions covered by this review offered technical skills, soft or non-technical skills were increasingly embedded in training packages (28 out of 55 skills training interventions), reflecting employers' demand for these abilities (Cunningham et al., 2010; Youth Employment Network & International Youth Foundation, 2009).
The meta-regression results did not suggest that the inclusion of a soft skills component was significantly correlated with larger effect size estimates. In fact, when the sample was restricted to high-income countries, the availability of soft skills in the programme curriculum was correlated with lower employment effects, particularly among the younger cohort. An example from a high-income country is the JOBSTART programme in the United States. The programme applied an intensive exposure model for school dropouts and economically disadvantaged youth that combined basic education, occupational skills training, training-related support services and job development and placement assistance, including work-readiness, life and communication skills. While impacts were not disaggregated by the skills set delivered, the evaluation showed overall meagre impacts on employment outcomes (Cave, Bos, Doolittle & Toussaint, 1993).
Evidence from single studies in low-income countries offered more promising results. The combination of life and vocational skills provided to adolescent girls by the ELA programme in Uganda led to large and significant changes in behaviours and an increased probability of employment and self-employment (Bandiera, Buehren, Burgess, Goldstein, Gulesci, Rasul & Sulaiman, 2014). These mixed results called for further investigation into the role of soft skills in the causal chain from intervention to final outcomes.
6) Multi-setting approaches enhanced the acquisition of relevant skills and led to better labour market outcomes. Skills training interventions expanded the exposure of trainees to different environments, particularly by combining in-classroom with on-the-job training (Fares & Puerto, 2009). This combination featured in almost half of the evaluated skills training interventions (25 out of 55).
Where the settings were not combined, classroom training alone was the more frequent choice; overall, 45 out of 55 skills training interventions included classroom training, compared with 32 that delivered training at the workplace.
The Jóvenes Programmes in Latin America and the Caribbean were well represented in this systematic review, with (often several) impact evaluation studies for programmes implemented in Argentina, Chile, Colombia, the Dominican Republic, Panama and Peru. The model, piloted in the 1990s, combined in-classroom and on-the-job training in a demand-driven fashion. On the one hand, the design of the programme ensured private sector involvement in the definition of training content, securing the correspondence between the skills taught and those demanded by the productive sector. On the other hand, implementation was demand driven through stringent, competitive bidding processes for the selection of training providers and incentive payment schemes based on trainees' outcomes. The first and most successful of the Jóvenes Programmes in terms of impact on employment was Chile Joven, with an effect size for employment outcomes of 0.35 SMD (CI = 0.13, 0.58) and for income outcomes of 0.23 SMD (measured with less precision, CI = -0.16, 0.60). The employment effect sizes of other Jóvenes Programmes were lower but still positive and close to the sample mean for skills training interventions (0.05 SMD; CI = 0.02, 0.07).
It is important to note that, while it was not possible for the systematic review to assess treatment effects on intermediate outcomes, such as knowledge, skills acquisition, attitudes and behaviours, some single studies did find (i) positive impacts of youth employment programmes on educational outcomes (in the United States) and (ii) noticeable changes in behaviours, expectations and non-cognitive skills (in the Dominican Republic).
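The pooled estimates reported throughout this section (an average SMD with a 95 per cent confidence interval and an I² heterogeneity statistic) can be illustrated with a standard random-effects calculation. The sketch below assumes the common DerSimonian-Laird estimator and uses invented per-intervention SMDs and standard errors; the review's actual aggregation procedure may differ in its details.

```python
import math

def pool_random_effects(effects, ses):
    """DerSimonian-Laird random-effects pooling of standardized mean
    differences (SMDs). Returns the pooled SMD, its 95% CI, and the
    I^2 heterogeneity statistic (per cent)."""
    k = len(effects)
    w = [1.0 / se**2 for se in ses]                       # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q and the between-study variance tau^2
    q = sum(wi * (yi - fixed)**2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # random-effects weights incorporate the between-study variance
    w_re = [1.0 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, ci, i2

# Illustrative (invented) SMDs and standard errors for four interventions
pooled, ci, i2 = pool_random_effects([0.00, 0.20, 0.10, 0.05],
                                     [0.02, 0.02, 0.02, 0.02])
```

With these made-up inputs the pooled SMD sits near the simple mean of the four estimates while I² is high, signalling the kind of between-intervention heterogeneity that the I² values quoted alongside Tables 38 and onwards summarize.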
In conclusion, Table 39 provides an evidence check against the expected outcomes for skills training interventions outlined in the results chain.

Increased probability of employment
There was ample evidence demonstrating the ability of skills training to increase the probability of employment among youth after programme exposure. The evidence applied to all country income levels and across wage employment (France's Contrat de Qualification) and self-employment (Uganda's ELA programme).

Reduced time to find job/ shorter unemployment duration/ greater efficiency in the job search
Few studies reported on the job search or time spent looking for a job after the programme. Measurements of unemployment duration or unemployment probability were less common and offered mixed results. For example, young men who benefited from the Juventud y Empleo programme in the Dominican Republic saw an increase in formalization (written contracts) coupled with an increase in the duration (weeks) of unemployment and in hours spent job-seeking on the last working day (Ibarrarán et al., 2014). Comprehensive measures that combined training with counselling and job-search assistance offered the potential to improve the job search; however, more evidence is needed to support this proposed causality.

Increased ability to retain job/longer job duration (hours worked)
While the evidence was clear about the positive impact of training on the probability of employment, it was less so about its impact on employment duration. Furthermore, employment probabilities and hours of work did not necessarily react in the same way to the same intervention; e.g., the evaluation of Galpão in Brazil reported positive and negative average SMDs for employment probability and hours worked, respectively (Calero et al., 2014). In contrast, programmes in Nepal and India reported both high employment probabilities and high hours of work (Maitra & Mani, 2014).

Better quality of employment (contract type, job type)

Skills training increased job quality. The evidence was more common among programmes in low- and middle-income countries and was correlated with better wages or earnings; e.g., Colombia's Formación Técnica y Tecnológica and Jóvenes en Acción Programmes, the Ninaweza Youth Empowerment Programme in Kenya, Procajoven in Panama, and Projoven in Peru.

Increased earnings
Skills training interventions led to higher earnings among youth, supporting the argument that investments in human capital lead to higher wages and therefore better employment outcomes in the long term. A review of impact evaluations of the Jóvenes Programmes in Latin America and the Caribbean showed positive short-term impacts on earnings, slightly larger for young women (in Colombia and Panama) than for young men; the data were, however, less reliable, as retrospective evaluations had to rely on retrospective income data (Ibarrarán & Rosas-Shady, 2009). Effects on business performance outcomes were also reported (e.g., Cho et al., 2013, for the Apprenticeship programme in Malawi). However, it was important to emphasize that starting a business was not always the primary goal of those interventions.

Entrepreneurship promotion interventions
Entrepreneurship promotion interventions are designed to address the individual and external constraints that young people encounter in starting or growing a business by providing entrepreneurial skills and facilitating access to capital for self-employment, including physical, financial and social capital.
The systematic review examined 15 entrepreneurship interventions that offered mainly business skills training, business advisory services and/or access to credit or grants. Table 40 presents a simplified version of the results chain in Section 1.3.2 to outline the outcomes expected from entrepreneurship interventions, including (i) employment outcomes, such as increased probability of employment, (ii) earnings outcomes and (iii) business performance outcomes, such as increased sales.

1) In line with Cho and Honorati (2013), this review observed a wide variation of effects depending on the services provided within the intervention package and the context.

2) Most of the evidence originated from interventions set in low-income countries (Liberia and Uganda) and middle-income countries (Bosnia and Herzegovina, Colombia, Peru and Tunisia), and the evidence was notably recent. Ten of the 15 entrepreneurship interventions were evaluated between 2012 and 2014, with evidence predominantly coming from Africa. Only two interventions were implemented in high-income countries (France and the United Kingdom), too few to meet the review's minimum sample requirement, so they were dropped from the analysis. Table 41 therefore presents the effects of entrepreneurship interventions in low- and middle-income countries only. Detailed characteristics of these entrepreneurship interventions are presented in Section 9.2.
3) The effects were intensified in low- and middle-income countries, where entrepreneurship interventions reported larger effects on employment outcomes (0.18 SMD; CI = 0.06, 0.29; I² = 68 per cent; number of interventions = 5), earnings outcomes (0.14 SMD; CI = 0.06, 0.22; I² = 49 per cent; number of interventions = 10) and business performance outcomes (0.15 SMD; CI = 0.07, 0.23; I² = 0 per cent; number of interventions = 9). Entrepreneurship interventions appeared to work well when they addressed specific constraints: in Uganda, the evaluation of the Youth Opportunities Programme (YOP) showed that grants for non-agricultural vocational training and business start-up had substantial economic impacts on earnings for young people in the capital-constrained environment of a conflict-affected region. This finding is in line with Section 4.3.3.6, which highlighted similar positive effects of entrepreneurship interventions for disadvantaged youth.

4) Entrepreneurship interventions followed a trend towards multicomponent services.
About two-thirds of the evaluated interventions offered a combination of business skills training, business advisory services (including mentoring) and/or access to finance. One intervention that adopted this multipronged approach was the Women's Income Generation Support (WINGS) programme in Uganda, which showed the largest effect size for employment outcomes across all evaluated interventions examined by the review. The programme combined business skills training, cash grants and follow-up support for young women, leading to an increase in working hours from 14 to 25 hours per week. This programme appeared to be the main driver of the overall positive impact of entrepreneurship interventions.

5) Similarly, interventions providing both entrepreneurship training and business advisory services, irrespective of grants provision, showed strong, positive evaluation results on employment outcomes in low- and middle-income countries. For example, the Economic Empowerment of Adolescent Girls (EPAG) programme in Liberia provided classroom-based training followed by six months of follow-up support and reported a 47 per cent increase in employment. In addition to changes in labour market outcomes, the evaluation showed improvements in the self-confidence of participating girls.

6) Positive business performance outcomes (e.g., an increase in profits) were reported for interventions that provided start-up grants, either alone or in combination with training and advisory services. These results were driven by interventions that specifically aimed at mitigating capital constraints for poor and vulnerable young people, as in the case of the YOP and WINGS programmes, both implemented in northern Uganda.
7) The evidence on grants was, however, not conclusive when it came to supporting existing young entrepreneurs in growing and expanding their businesses. A recent randomized experiment with the Start and Improve Your Business (SIYB) programme in Uganda showed that limited access to finance was a real constraint for young business owners, one that could be addressed through the combination of business training and loans. This programme effect, however, held only for the subsample of young men who had expressed an interest in growing their business. Evidence suggested that, in developing countries, family pressure on women can divert grants or credit to non-business purposes (Fiala, 2014).
The evaluation of the Partner Microcredit Foundation experiment, a business and financial literacy programme in Bosnia and Herzegovina, highlighted the fact that the programme led to improvements in business practices and entrepreneurial impetus but did not directly translate into improved chances of business survival. In Peru, three entrepreneurship interventions adopting a multicomponent approach of business training, business advisory services and access to finance also improved the business performance outcomes of low-income youth and youth living in rural areas. The programmes Calificación de Jóvenes Creadores de Microempresas, Formación de Líderes Empresariales and Formación Empresarial de la Juventud relied on business plan competitions to determine eligibility for programme participation or start-up funding.

8) Developing a business plan was a component of more than one-third of entrepreneurship interventions and was a common means of determining eligibility for participation in the programme and/or access to finance. Additional examples included CréaJeunes in France and Turning Theses into Enterprises in Tunisia.
In conclusion, Table 42 provides an evidence check against the expected outcomes for entrepreneurship promotion interventions outlined in the results chain.

Increased probability of employment and hours worked

There was strong evidence that entrepreneurship promotion interventions in low- and middle-income countries led to increased employment probability and number of hours worked. For example, the Economic Empowerment of Adolescent Girls (EPAG) programme in Liberia reported a large increase in employment, and the Women's Income Generation Support (WINGS) programme in Uganda showed the largest effect size for employment outcomes across all evaluated interventions examined by the review (Blattman et al., 2014).

Increased earnings or consumption among young entrepreneurs
Entrepreneurship promotion interventions tended to show positive effects on earnings and consumption for young people. These effects were intensified in low- and middle-income countries, where entrepreneurship interventions proved particularly effective for disadvantaged youth and in capital-constrained environments, as in the context of the Youth Opportunities Programme in Uganda (Blattman, Fiala & Martinez, 2013).

Business started
There was good evidence that entrepreneurship promotion is an effective approach to support business creation by young people. For example, Formación de Líderes Empresariales in Peru improved business creation by providing business training, business advisory services, business plan competitions and access to finance (Jaramillo & Parodi, 2005).
Increased business investment, performance and competitiveness (e.g. profits, sales, capital and investment, business survival)

Overall, the impact of entrepreneurship interventions on business performance outcomes was positive. For example, the Youth Opportunities Programme in Uganda led to positive results on capital and investment (Blattman, Fiala & Martinez, 2013), and the WINGS programme in Uganda reported positive effects on business survival (Blattman et al., 2014). However, the evidence was inconclusive on means to support existing young entrepreneurs in growing and expanding their businesses. The Start and Improve Your Business (SIYB) programme in Uganda showed differential impacts across gender, with young men benefiting more from combined training and loans than young women (Fiala, 2014).

Additional jobs created
There was insufficient evidence to validate the causality between youth entrepreneurship promotion interventions and the creation of jobs through newly created or expanded businesses. Blattman, Fiala & Martinez's evaluation of the Youth Opportunities Programme in Uganda offered an example of a study that captured positive effects on additional jobs created.

Employment services
Employment services generally comprise interventions focusing on labour intermediation, i.e. programmes optimizing the process that matches jobseekers with vacancies. They deliver job counselling, job-search assistance and/or mentoring services for (re)activation purposes, often complemented by job placements and technical or financial assistance. The basic idea behind providing employment services to youth is that young workers have difficulty signalling their skills and credentials and/or lack the networks or knowledge to search effectively for vacancies and connect with employers. Hence, these programmes often focus on improving job-seeking skills and the efficiency of the matching process (Table 43).

The review identified a sample of ten employment services interventions, a majority of which combined job counselling, job-search assistance and mentoring services. In fewer cases, the interventions provided job-placement services and/or financial assistance. The only intervention that focused solely on financial assistance for job search was a subsidized transportation experiment in Ethiopia (Franklin, 2014). Interventions were typically of short duration (three months on average), and their intensity ranged from one-off afternoon visits to job information centres for secondary students in Germany to 12 months of support in the Counselling and Job Placement for Young Graduate Job Seekers programme in France. Importantly, the review highlighted the increasing reliance on employment services as supplementary measures within other ALMPs, mainly training and wage subsidies.
Most evaluations took place in high-income countries (Finland, France, Germany, Portugal and the United States) where they were typically implemented by public employment agencies and operated on a national scale. In developing countries, evaluated interventions were implemented in Ethiopia, India and Jordan. They were characterized by their small scale or pilot nature and the common aim to reduce job-search costs for jobseekers, either via job screening and matching, recruiting services or transport subsidies.
The evidence pointed to several key patterns:

1) Most employment services programmes tended to specialize in specific services. In contrast to other main intervention types, employment services interventions exhibited a trend towards single-pronged approaches, mainly the provision of job counselling, job-search assistance and/or mentoring services. A relatively successful example of this single-service approach was the programme of mandatory visits to job information centres for German secondary students, whereas a less effective example was the "Job Shadowing" component of the School-to-Work Opportunities Act (STWOA) in the United States.
2) The evidence on employment services programmes from low- and middle-income countries was very thin. This is likely in line with the fact that this programme type originates in the idea of assisting registered jobseekers within an unemployment insurance (UI) system in a high-income country, and it is thus an uncommon main intervention type in low- and middle-income countries. In fact, the number of studies in the sample of low- and middle-income countries was too small to comply with the review's minimum requirement (four) and was therefore dropped from the effect size analysis (Table 44). An examination of the individual studies in Jordan (Groh, McKenzie, Shammout & Vishwanath, 2014), India (Jensen, 2012) and Ethiopia (Franklin, 2014) showed rather positive impacts on the employment outcomes of young participants.

3) Aggregate empirical evidence for high-income countries showed positive effect sizes for employment and earnings outcomes, though these were relatively smaller than the SMDs for other intervention types. Single studies in high-income countries typically found small or often non-significant effects on employment. The study by Caliendo, Künn and Schmidl (2011) was the only one that detected positive long-term effects on youth labour market outcomes, from the Job Search Assistance track of the German ALMP measures.

4) In most studies, the changes in labour market outcomes were transitory and there was no sign of a stepping-stone effect. Evidence for this was provided, for instance, by the impact evaluations of the Counselling and Job Placement for Young Graduate Job Seekers programme in France (Crépon et al., 2013), the transport subsidies intervention in Addis Ababa (Franklin, 2014), and the mandatory visits to job information centres in Germany (Saniter, 2014).
In conclusion, Table 45 provides an evidence check against the expected outcomes for employment services outlined in the results chain.

Increased labour-market participation
There was no evidence to validate the causality between employment services for youth and this outcome construct.

Increased probability of employment
Current evidence is very thin in this regard and insufficient to establish causality in the aggregate. Positive changes in employment probability were reported more often among low- and middle-income countries (Ethiopia (Franklin, 2014); India (Jensen, 2012)) than among high-income ones (France (Crépon et al., 2013); Germany (Caliendo, Künn & Schmidl, 2011)). It is, however, important to note that positive changes were generally not accompanied by positive changes in other outcome types, such as earnings.

Reduced time to find job/shorter unemployment duration
The limited evidence showed that employment services increased the probability of unemployment (instead of reducing it, as expected). This effect was captured in Finland (Hämäläinen, Hämäläinen & Tuomala, 2014) and Germany (Saniter, 2014). Unemployment duration was seldom measured and, when it was, results showed limited gains.

Increased ability to keep a job/longer job duration/ increase in hours worked
There was no evidence to validate the causality between employment services for youth and this outcome construct.

Better quality of employment (contract type)
There was no evidence to validate the causality between employment services for youth and this outcome construct.

Increased earnings or consumption
Impacts on earnings were rather small, with some negative reports in Jordan (Groh, McKenzie, Shammout & Vishwanath, 2014) and France (Crépon et al., 2013). Consumption changes were measured in only one study (India (Jensen, 2012)), which is not sufficient to support the proposed causality.

Subsidized employment interventions
Overall, subsidized employment interventions reported larger effects on employment outcomes (0.02 SMDs; CI = -0.01, 0.06; I² = 50 per cent; number of interventions = 105) than on earnings (-0.01 SMDs; CI = -0.05, 0.03; I² = 61 per cent; number of interventions = 89). They also appeared less successful in higher-income countries (Table 46). Before delving further into these findings, the analysis below differentiates between results and evidence from interventions delivering wage subsidies and from public employment programmes, two subsidized employment measures with very distinct characteristics in design and implementation.
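The effect sizes above are standardized mean differences. As a rough illustration of how a single evaluation's SMD and confidence interval are obtained from treatment and control group summary statistics, the sketch below assumes the common Hedges' g formulation with its small-sample correction; the figures are invented and not drawn from any study in the review.

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference (Hedges' g) with an approximate
    95% CI, computed from treatment/control summary statistics."""
    # pooled standard deviation across the two groups
    sp = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                   / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sp          # Cohen's d
    j = 1 - 3 / (4 * (n_t + n_c) - 9)   # small-sample correction factor
    g = j * d
    # approximate standard error of g
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + g**2 / (2 * (n_t + n_c)))
    return g, (g - 1.96 * se, g + 1.96 * se)

# Invented example: employment rates of 0.55 (treated) vs 0.50 (control),
# 200 youth per arm
g, ci = hedges_g(0.55, 0.50, 0.5, 0.5, 200, 200)
```

With these illustrative numbers, g is about 0.1 and its confidence interval spans zero, mirroring how several of the subsidized employment estimates above are reported with CIs crossing zero.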

Wage subsidy interventions
Low levels of skills, limited or no work experience, signalling barriers, and economic crises and downturns all hamper labour demand for youth. Employers may have limited scope for hiring or may suspect that youth come to the market with low productivity levels, lower than the market wage for a given job. To compensate for possible low productivity and to incentivize the hiring (and training) of young people, wage subsidy programmes offer a risk discount to employers that offsets certain wage and non-wage costs. The results chain lists (i) more and better employment outcomes (from increased probability of employment to higher job quality and more efficient job searches), (ii) higher earnings, and (iii) long-term effects on youth's human capital and employability among the expected outcomes of wage subsidy programmes.

1) Wage subsidy programmes for youth performed better in middle-income countries than in high-income countries (Table 48). Effect sizes for employment and earnings were, respectively, close to zero and negative in high-income countries.

2) Employment outcomes were highly responsive to young people's exposure to wage subsidies, especially in comparison to earnings outcomes. To explain these effects, the review pointed to the role of design features in determining programme effectiveness, echoing similar claims by Neumark and Grijalva (2013), Almeida et al. (2014) and Bördős et al. (2016). Kluve et al. (2016) showed that, once design features such as participant profiling, supervision and incentives were accounted for, subsidized employment interventions, heavily influenced by the wage subsidy programmes in the sample, appeared to be more successful than skills training interventions.
The design of wage subsidy programmes implied numerous decisions on: (i) targeting: general subsidies vs. hiring subsidies, or the decision to focus on specific target groups; (ii) the payment vehicle: direct payment, or a reduction in payroll taxes or social security contributions; (iii) the payee: employer or employee; (iv) the size of the subsidy and the basis for its computation; (v) the duration of the subsidy or of the intervention as a whole; (vi) the offer: a job, a job with training, or a job with training and other services; (vii) conditionalities, reporting requirements and programme monitoring.
While there was no clear evidence on relative effectiveness across design options, some messages from single studies were apparent:

3) Fine-tuning conditionalities, securing the feasibility of claims, and proper information and dissemination were critical to incentivize take-up by firms. Conditionalities were set to curb unintended behaviours and ensure connections across the underlying theory of change. Their establishment implied appropriate monitoring, which was often linked to well-developed public employment services. Stringent conditionalities, however, have the potential to deter employers' participation, as shown by the French national programme Contrat Jeune en Entreprise, aimed at promoting long-term contracts among disadvantaged youth. The programme offered a hiring subsidy, paid directly to the employer, and targeted youth aged 22 and younger who had dropped out of school before passing the secondary school examination that would qualify them for entry to university. The subsidy was proportional to the part-time ratio for part-time workers and was offered in full for two years, then reduced to half during the third year. In return, employers had to commit to not dismissing a participant, except for professional misconduct, during the three-year term of the contract. The programme led to very low take-up by employers, who argued that the conditions were too strict in comparison to the perceived benefit (Roger & Zamora, 2011).
In contrast, conditionalities that were compensated with relatively high subsidies seemed to cover the employer's opportunity cost adequately and enhance their participation. The national German programme JUMP offered direct payments to employers of 40 per cent of the wage value on the hiring of unemployed youth with secondary education. The relatively generous subsidy was paired with strict conditions for no early dismissal and a guaranteed period of post-subsidy employment, equivalent in duration to half the subsidized period. An impact evaluation of the programme showed positive impacts on the probability of employment in the short and long terms, with higher effects among the more skilled youth and in regions with relatively low labour demand (Caliendo, Künn & Schmidl, 2011).
The lack of internal mechanisms at the firm level and of adequate information reduced the incentives to claim the subsidies. A controlled experiment that provided employment vouchers to unemployed young South Africans in order to reduce wage costs for employing firms yielded an average SMD for employment outcomes of 0.13 (CI = 0.01, 0.26). The evaluation study reported an increased probability of wage employment that declined slightly over the longer term. However, the experiment suffered from low take-up of the employment vouchers by eligible employers, which seemed to be partially correlated with the administrative burden of claiming the subsidy (firms did not have internal processes in place to deal with this aspect) and the perception by employers that the vouchers were not legitimate.

4) Profiling was key to avoiding deadweight and substitution effects. The Stage d'Initiation à la Vie Professionelle (SIVP) in Tunisia provided an employment subsidy for university graduates by reducing the employer's hiring costs and exempting employers from social security contributions, resulting in an average programme SMD of 0.16 (CI = -0.03, 0.34). The programme decreased joblessness, increased the probability of employment in the private sector and reduced the chances of permanent contracts among young programme participants. The first-come, first-served nature of the programme and its non-reliance on profiling mechanisms are argued to have led to large deadweight effects (Broecke, 2013).
In general, single studies hardly account for deadweight, substitution or displacement effects. This is a significant drawback that restricts the interpretation and applicability of evaluation findings (Almeida et al. 2014).

5) What matters in the offer is the ability of programmes to enhance skills formation among youth.
A programme that only offers "a job" has the potential to lead to positive outcomes if the exposure to employment is sufficiently relevant to facilitate learning-by-doing, which will lead to higher employment in the long run (Heckman et al., 2002). Relevant exposure could imply subsidized employment of extended duration, as in the case of the JUMP wage subsidies in Germany (Caliendo, Künn & Schmidl, 2011), or exposure to a job that facilitated the acquisition of new, job-relevant skills or delivered on-the-job training. Comprehensive designs that combined wage subsidies with skills training measures shed some light on mechanisms to boost skills gains and employability among youth. Although classified under "Unspecified main category", the New Deal for Young People programme, implemented in the United Kingdom, demonstrated the success of combining job-search assistance, a wage subsidy, on-the-job training and sanctions to boost labour market outcomes of registered unemployed youth. The programme, introduced in 1998 to help the young unemployed into work and to increase their employability, offered multi-staged job-search assistance, followed by a menu of four tracks: training, education, wage subsidy or reinstatement in the labour market through voluntary work or environmental services. Analyses of the wage subsidy measures showed positive transitions to employment (Blundell et al., 2004) and a lower probability of unemployment among programme participants.
In conclusion, Table 49 provides an evidence check against the expected outcomes for wage subsidy interventions outlined in the results chain:

- There was also evidence to back up a consequential decrease in the probability of unemployment among youth, suggesting some efficiency gains from demonstrating high/higher productivity to employers or improvements in the job search.
- Increased ability to retain a job/longer job duration: There was no evidence to demonstrate an increased ability to retain a job or secure longer job duration after exposure to a wage subsidy programme. The few evaluations that reported on hours worked (e.g., Webb, Sweetman & Warman, 2014) showed negative to no impact.
- Better quality of employment: Evidence on quality of employment is mixed. Some long-duration subsidies led to positive employment outcomes in the long term (Caliendo, Künn & Schmidl, 2011), as well as to long-term contracts (Roger & Zamora, 2011) or fixed-term contracts (Brodaty, 2007). Other schemes of shorter duration led to temporary and often unregistered jobs (e.g. Jordan NOW).
- Increased earnings or consumption: Overall effect sizes of earnings outcomes were smaller than those for employment, particularly among high-income countries.
- Increased returns from employment, including long-lasting human capital accumulation: There was evidence of skills formation, particularly among interventions that offered relevant jobs, sufficient exposure or the opportunity to learn at the workplace, including through on-the-job training (Wilkinson, 2003; Blundell et al., 2004; De Giorgi, 2005).

Public employment interventions
Public employment programmes seek to stimulate labour demand in contexts where markets are unable to create productive employment on the required scale. In the context of youth, public employment programmes can facilitate first-time jobseekers' entry into the labour market and keep unskilled or disadvantaged youth connected to the labour market, thus mitigating skills depreciation or the negative, scarring effects of long-term unemployment.
The multi-dimensional nature of the included programmes offered scope for multiple objectives. Their connection to social protection policies also allowed the formulation of expected outcomes beyond those related to the labour market, such as consumption smoothing. Table 50 (a shortened version of Table 4), however, focuses on a list of labour market-related outcomes that included (i) more and better employment measures (probability of employment, hours worked, job quality), (ii) higher earnings and (iii) human capital accumulation (when the programme led to skills formation). Public employment programmes are complex, entailing a number of design and implementation parameters, from the selection of works and services to targeting mechanisms, wage setting, determination of benefits, work conditions and labour intensity, incentives for participation and monitoring and reporting requirements.
The evidence to support the proposed theory of change was unfortunately very sparse. During the search period, the systematic review was able to identify only two studies with public employment programmes as their main category that complied with the review's inclusion criteria. Both studies reported zero to negative treatment effects on the probability of employment after programme participation, suggesting that public employment programmes have not effectively facilitated improvements in the labour market outcomes of youth. One study assessed the impact of the German Job Creation Schemes Programme, which provided unemployed youth with secondary education the opportunity to work in infrastructure or social projects for a maximum of 12 months. The study found negative impacts on the employment probability of young participants in both the short and long term. Brodaty (2007) examined the French programme Travaux d'Utilité Collective (TUC), a social development and community public works project for unemployed youth. The job duration varied from three to 24 months, with contributions by both Government and employers. The study found no significant changes in employment probability compared to youth in the comparison group.
The effect sizes of these two studies fell below the overall effect size for subsidized employment interventions as a whole, and below that for wage subsidy programmes in particular.
Furthermore, one of the four arms during the "option" stage of the above-mentioned UK New Deal for Young People programme acted as a public employment programme. The environmental services track within the programme provided jobs for youth in housing projects, forest and park management, and reclamation of derelict or waste land. Evaluations of the New Deal showed that this particular component had limited to no impact on post-programme employment, particularly in comparison to wage subsidies, which provided a more effective means of exiting unemployment and securing unsubsidized employment. Similar results were found by Card et al. (2010, 2015), where evidence that was not specifically focused on youth showed public employment programmes to be generally less successful than other types of ALMPs.
The meagre evidence on youth-targeted public employment programmes limited the discussion about what works or which design features matter most. This finding calls for further impact research on this type of intervention, particularly in low-and middle-income contexts where programme exposure may have diverse effects on youth and their families.
The search window of the systematic review closed before it could capture a recent impact evaluation of a public employment programme implemented in Côte d'Ivoire by Premand, Marguerie, Crépon and Bertrand (2015). The evaluation of the Emergency Youth Employment and Skills Development project, established in 2012 to support the economic recovery following the post-electoral crisis, showed large short-term positive impacts on the probability of employment and hours worked in wage occupations, and positive impacts on earnings while youth were still participating in the programme, in contrast to the comparison group. While results for long-term effects are not yet available, the promising short-term results support the call for more and better evidence-gathering in developing economies.

AGREEMENTS AND DISAGREEMENTS WITH OTHER STUDIES OR REVIEWS
The effort to undertake this systematic review was initially motivated (see Section 1.4) by the statement that new evidence was needed to support decision-making on youth employment. Specifically, similar systematic reviews and studies either required urgent updating (Betcherman et al., 2007) or simply posed related, yet distinct, research questions (e.g., Card et al., 2010, 2015; Tripney et al., 2013; Grimm & Paffhausen, 2015; Filges et al., 2015).
The findings presented in the previous section are aligned with that motivation: the results of the empirical analysis are generally congruent with previous and related literature, but they (i) add much more depth given the rigour of the analysis and the comprehensive nature of the data, (ii) complement and carve out much more clearly the patterns indicated by previous studies, and (iii) add genuinely novel insights. So, in essence, the current study found few points of variance with related studies, but agreed on major lines, strengthening the existing knowledge base and identifying many new, detailed aspects:

1) Agreement: The main result agreed with related studies -youth interventions are effective tools for improving labour market outcomes.
With a broader sample of target population, i.e. not only youth, Card et al. (2010, 2015) found that ALMPs have smaller effects for older workers and youth in comparison to women and the long-term unemployed. The ability of those studies to factor in other groups offers an important insight into the results of this review: while the impact of ALMPs on youth was positive, the magnitude of reported effect sizes is smaller than for ALMPs targeting all individuals, without an age target. The systematic review and meta-analysis by Filges et al. (2015) also found a small increase in employment probability for unemployment insurance recipients who participated in ALMP programmes.
2) Agreement: There was heterogeneity by programme type, as indicated by every systematic assessment of the literature, and entrepreneurship promotion programmes and skills training programmes were effective interventions, particularly in low- and middle-income countries. The importance of human capital-based programmes and their dynamic time horizon (with increasing effect sizes observed over longer durations post-programme) has also recently been found by Card et al. (2015), a pattern which was replicated in this review of youth-only interventions for employment outcomes, although results were not statistically significant.
3) Agreement: Another pattern indicated by the previous literature (e.g., Card et al., 2010; Betcherman et al., 2007) is that youth labour market interventions tend to be less effective in high-income than in low- and middle-income countries. Confirming this pattern was another important result of this review's meta-analysis, which substantially reinforced the corresponding conjectures of previous studies on the basis of more comprehensive data and deeper empirical analysis.

4) More nuanced:
While related studies also conjectured that more comprehensive types of intervention tended to be more successful, this result was only confirmed by the current review among low- and middle-income countries, not in high-income countries (agreeing, perhaps coincidentally, with Eichhorst and Rinne (2015) based on the limited information provided in the YEI data).

5) Agreement:
Moreover, as the literature using systematic analyses of labour market programmes has grown, some indicative evidence has pointed to female participants benefitting more than male participants (echoing Card et al., 2015), although differences between outcomes for men and women were not statistically significant.
6) A novel finding of this meta-analysis and of the examination of the theory of change was that intervention design and implementation features tended to drive results more strongly than did the type of intervention (phrased in the Main Results as: the "how" seeming to be more important than the "what").

COMPLETENESS AND APPLICABILITY OF EVIDENCE
The evidence base on youth employment is growing and improving. While better study designs should lead to lower risks of bias, limitations in the evidence mean that it still sheds only partial light on what works.

1) Insufficient consideration of spillovers and general equilibrium effects.
Exposure to ALMPs is expected to create a spillover effect among non-participants, as well as causing general equilibrium effects throughout the economy. While some of these spillovers may positively affect overall employment outcomes, in certain cases they can hamper the performance of programme non-participants. This is true of the substitution effects and windfall effects that can arise from wage subsidy programmes, which are rarely addressed in the empirical literature.

2) Box 10 below describes a study of an entrepreneurship programme in Tunisia that examined partial equilibrium effects. Accordingly, in the absence of systematic consideration of general or partial equilibrium effects, the review's findings necessarily exhibit a degree of incompleteness and questionable external validity.

Box 10: Partial equilibrium effects: Entrepreneurship training and self-employment among university graduates in Tunisia
In Tunisia, an entrepreneurship track was introduced into the applied undergraduate (licence appliquée) curriculum in 2009. University students enrolled in the last year of their licence appliquée were invited to apply to the entrepreneurship track, which provided students with: (i) entrepreneurship courses organized by the public employment office; (ii) external private sector coaches in an industry relevant to the student's business idea; and (iii) supervision from university professors in development and finalization of the business plan. The entrepreneurship track offered students the opportunity to graduate by writing a business plan instead of a traditional undergraduate thesis. On graduation, participants were invited to submit their business plans to a competition and the competition winners became eligible to receive seed capital to establish their business.
A randomized trial aimed to identify the impact of the entrepreneurship track on beneficiaries' labour market outcomes. The study showed that the entrepreneurship track significantly increased the rate of self-employment among university graduates approximately one year after graduation, but that the effects were small in absolute terms. The employment rate among beneficiaries remained unchanged, which in partial equilibrium indicates a substitution from wage employment to self-employment. However, Almeida et al. (2012) note that the shift from wage employment into self-employment may free up job opportunities for non-participants, therefore potentially leading to higher employment overall in general equilibrium. The study design did not allow such potential general equilibrium effects to be identified.
3) Limited reporting on the transmission channels. The theory of change is what allows the exploration of how empirical findings in context A can be useful to decision-makers in context B. Unfortunately, studies often focus on the final outcomes and provide limited information dealing with effects on intermediate outcomes, such as changes in knowledge, skills, behaviours or attitudes. Assessing impacts on intermediate outcomes was beyond the scope of this systematic review, and coding and analysing these outcomes may be an area for future research.

4) Insufficient consideration of cost effectiveness.
The applicability of the evidence hinges not only on its internal and external validity but also on its feasibility. Detailed analyses of costs are very limited and methods to compute net benefits and cost-benefit ratios have not yet been standardized.

QUALITY OF THE EVIDENCE
This systematic review did not undertake a full risk of bias assessment as recommended in the systematic review methodology literature. However, it relied on a framework by Duvendack et al. (2012) to assess the analytical and statistical rigour of the included studies based on the studies' design and statistical methodology. This framework makes some assessment of study design and confounding, but excludes other domains of bias such as performance bias, selection bias (including attrition) and biases in outcomes data collection. A future update of this systematic review would need to take these factors into account.

LIMITATIONS AND POTENTIAL BIASES IN THE REVIEW PROCESS
The review relied on evidence from indirect comparisons of programmes in different contexts. It excluded studies which only examined the relative effects of two or more interventions, limiting the extent to which review question 2, namely "Which of these interventions are the most effective on average?", could be answered using direct comparisons of programmes in the same context. In addition, the review did not assess net outcomes resulting from the included interventions, such as whether employment creation of ALMP participants displaced non-participants thereby decreasing the overall employment impact of the programme. The summary effect size of youth employment outcomes may therefore represent an upper bound for the true impact of these interventions.
The review team made an extensive effort to collect missing information by contacting authors using a standardized template to solicit the data required for inclusion of the study. In addition, the team employed several methods to impute missing information where possible and extensively tested the adequacy of these procedures. This allowed effect sizes for a large share of the included studies to be computed. However, the main empirical analysis was based on 2,259 of the 3,629 coded treatment effect estimates, for which it was possible to compute the SMD. While this sample is much larger than in most other systematic reviews, it remains difficult to assess the degree to which missing information may impact the empirical findings (i.e., whether reporting quality is correlated with effect size magnitudes).
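To illustrate the effect size metric referred to here, the standardized mean difference and an approximate confidence interval can be computed from group means, standard deviations and sample sizes as sketched below. This is a minimal illustration with hypothetical numbers, not the review's actual computation code.

```python
import math

def smd(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference (Cohen's d) using the pooled SD,
    with a large-sample 95% confidence interval."""
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
                          / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sd_pooled
    # Approximate standard error of d for two independent groups
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c)))
    return d, (d - 1.96 * se, d + 1.96 * se)

# Hypothetical example: a 5 percentage point employment gain
d, ci = smd(mean_t=0.55, mean_c=0.50, sd_t=0.5, sd_c=0.5, n_t=400, n_c=400)
```

The follow-up mean and standard deviation of the comparison group are exactly the quantities whose frequent omission from primary studies forced the imputation steps described above.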
Relatedly, it was not possible to conduct a detailed assessment of the risk of bias in included studies. The main reason was that most reports did not provide the information needed to code it objectively. This is irrespective of publication status and study design: many experimental impact evaluations do not provide basic information on the randomization approach (allocation method and concealment from participants) or the participant flow through the study. For quasi-experimental studies, there are as yet no common standards for reporting and analysing bias that unify the various econometric approaches. It is therefore very difficult to objectively assess bias in studies without contacting the original authors, who may not be inclined to respond to such queries. We therefore chose to classify studies mainly on the basis of study design.
Echoing other reviews in the social sciences (e.g., Tripney et al., 2013), the review found that the methods for calculating comparable effect sizes from studies using more complex multivariate econometric methods are underdeveloped and require further research. However, the review benefitted from the experience of the principal investigators and was carried out with frequent guidance from the Methods Coordinating Group of the Campbell Collaboration.
Finally, the search and selection of studies focused specifically on quantitative impact evaluations using a rigorous (quasi-) experimental design. While the team believes that this is a strength of the review, the method may have disregarded important findings from studies that were rather more qualitative in nature or did not attempt to provide causal effect estimates.

IMPLICATIONS FOR POLICY AND PRACTICE
The extent and urgency of the youth employment challenge and the level of global attention currently being given to this topic calls for more and better evidence-based action. Accordingly, this systematic review sought to examine the empirical evidence in order to understand what drives the success (or failure) of youth employment interventions. Investments in youth employment will continue, and even increase, as countries embark on the implementation of the 2030 Agenda for Sustainable Development; therefore, this review focused on identifying "what works" and, as far as possible, "how".
This systematic review builds on a growing base of studies measuring the impact of youth employment interventions and offers a rigorous synthesis and overall balance of the empirical evidence, taking into account the quality of the underlying research. The review is systematic in its use of clearly defined and transparent inclusion and exclusion criteria, an objective and extensive search, a meticulous data extraction process, standardized statistical testing and analysis, and thorough reporting of findings. These elements and the underlying methods and tools were laid out and reviewed in the protocol.
The evidence suggests that investing in youth through active labour market measures may pay off. The evidence also shows a significant impact gap across country income levels. Being unemployed or unskilled in a high-income country, where labour demand is skill intensive, puts youth at a distinct disadvantage in comparison to a cohort that is, on average, well educated. While ALMPs in high-income countries can integrate disadvantaged young people into the labour market, they are not able to fully compensate for a lack of skills or other areas where youth failed to gain sufficient benefit from the education system. On the other hand, in lower income countries, with large cohorts of disadvantaged youth, marginal investments in skills and employment opportunities are likely to lead to larger changes in outcomes. Youth-targeted ALMPs in low- and middle-income countries do lead to impacts on both employment and earnings outcomes. Specifically, skills training and entrepreneurship promotion interventions appear to yield positive results on average. This is an important finding, which points to the potential benefits of combining supply- and demand-side interventions to support youth in the labour market.
The evidence also calls for careful design of youth employment interventions. The "how" seems to be more important than the "what" and, in this regard, targeting disadvantaged youth as well as providing incentives for participation of youth, appropriate profiling mechanisms and schemes to motivate service providers to perform effectively may act as key factors of success.
The latter emphasises the ability of specific design features within employment interventions to affect individual behaviours -in this case among both young people and service providers. It also implies -and calls for -sensible interpretation of the results. The findings from this review need to be discussed vis-à-vis the local and national context and should be complemented by a long-term and holistic commitment towards youth development.
Achieving an understanding of the "how" element is not an easy task. Although the systematic review excluded studies which only reported relative effects, it is also the case that, frequently, impact evaluations do not assess relative effectiveness. Even more often, reports and papers fail to describe the underlying theory of change and observed transmission mechanisms behind an intervention. In some other cases, there is limited information about the characteristics of programme participants in the evaluation sample and their comparison group. Much remains to be done to improve reporting standards and advocate for more and better evidence examining the impact of youth employment interventions. The quality of the primary studies determines the quality of the systematic review and any subsequent synthesis of the evidence.

IMPLICATIONS FOR RESEARCH
Counterfactual studies examining youth employment interventions are comparatively well designed, with an increasing share of experimental evaluations conducted in recent years. While this assessment of study design could only provide a partial picture, the analysis showed a relatively high overall level of rigour in the included studies: only 9 per cent of studies were judged to have a low level of rigour, based on their research design and empirical methodology. However, evidence of small-study effects suggested that publication bias was present in the sample of included studies.
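Small-study effects of the kind mentioned above are commonly probed with Egger's regression test, which regresses each standardized effect on its precision; an intercept far from zero signals funnel-plot asymmetry. The sketch below uses constructed, illustrative data and is not the review's implementation.

```python
def egger_intercept(effects, ses):
    """Egger test statistic: OLS intercept from regressing the
    standardized effect z_i = d_i / se_i on precision p_i = 1 / se_i.
    An intercept far from zero signals funnel-plot asymmetry."""
    z = [d / s for d, s in zip(effects, ses)]
    p = [1.0 / s for s in ses]
    n = len(z)
    pbar, zbar = sum(p) / n, sum(z) / n
    slope = (sum((pi - pbar) * (zi - zbar) for pi, zi in zip(p, z))
             / sum((pi - pbar) ** 2 for pi in p))
    return zbar - slope * pbar  # the intercept

# Constructed data with a built-in asymmetry (intercept = 0.5)
intercept = egger_intercept([0.6, 0.35, 0.225], [1.0, 0.5, 0.25])
```

In practice the intercept would be tested against zero with its standard error; the sketch keeps only the point estimate for brevity.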
A number of issues which placed limitations on this review could be mitigated with additional or improved primary research on youth employment interventions.
1) Existing research is spread unevenly across the globe. While the evidence gathered was global in nature, capturing 31 countries and all regions of the world, slightly more than half of the evidence derived from interventions in high-income countries. While it was possible to include a number of recent experimental studies from middle- and low-income countries, notably sub-Saharan Africa and Latin America and the Caribbean, there was a distinct lack of evidence from Asia, Central Europe, the Pacific, the Middle East and North Africa. Furthermore, the evaluations of youth employment interventions in low- and middle-income countries were concentrated on rather small-scale, NGO-implemented interventions, and there was a lack of evidence for larger, nationwide governmental programmes.

2) A notable observation regarding the quality of impact assessment reports is that too few studies provided evidence about heterogeneous treatment effects for different sub-groups of the interventions, such as female or low-income youth. Similarly, as significant differences in effect size magnitude by length of time since programme exit were observed, it is clear that more research is needed to (re-)assess the effectiveness of youth employment interventions in the long run.

3) More evidence and comparative analyses are needed to assess relative effectiveness across intervention components and between intervention types. The review team believes that practitioners would greatly benefit from more evidence from interventions with multiple treatment arms which compare the effectiveness of combining different intervention design features.

4) To gain a better understanding of the employment effects on young people, it is important to further observe their transitions from the informal economy to the formal economy. The extent of informality among youth calls for further research into successful approaches to facilitate an effective transition into formal sector jobs and formalized businesses.

5) Authors of primary studies should report all information required to calculate effect sizes across different outcome measures in a more detailed, complete and standardized way. This relates, in particular, to the follow-up mean of the outcome variable in the control group, as well as pooled (or comparison group) standard deviations. Only 13 of the 113 reports in the initial sample provided all the information needed to compute standardized mean differences without having to contact the authors or, in a second step, impute the missing information. For another 13 reports (representing seven interventions), it was not possible to compute SMDs even after taking these steps, and their findings therefore could not be included in the effect size-based quantitative meta-analysis.

6) Authors should also provide more detail when reporting their study design and empirical identification strategy, as well as the occurrence and potential causes of attrition. Based on the reported details, it was often difficult to judge the internal validity (or risk of bias) of studies due to a lack of information about potential biases, such as attrition, selection or mismeasurement.

7) Finally, the review originally set out to compare the cost-effectiveness of different intervention types. This was not possible, as very few studies indicated the cost of implementation in published reports. Much remains to be done to improve research and reporting standards and generate more and better evidence about the impact of youth employment interventions.
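When a primary study reports only a regression coefficient for the treatment effect, meta-analyses of this kind often approximate the SMD by standardizing the coefficient with the comparison group's follow-up standard deviation, which is one reason that statistic matters so much for reporting. The sketch below uses hypothetical numbers and is not necessarily the imputation procedure used in this review.

```python
def smd_from_regression(beta, sd_control, se_beta=None):
    """Approximate SMD: divide the reported treatment coefficient by the
    comparison group's follow-up standard deviation. The coefficient's
    standard error, if reported, scales the same way."""
    d = beta / sd_control
    se_d = se_beta / sd_control if se_beta is not None else None
    return d, se_d

# Hypothetical: a 4 percentage point effect, comparison-group SD of 0.5
d, se_d = smd_from_regression(beta=0.04, sd_control=0.5, se_beta=0.02)
```

Without the comparison-group standard deviation, neither the point estimate nor its precision can be placed on the common SMD scale, which is exactly the gap that forced imputation for a large share of estimates here.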

Search terms for electronic databases
The search terms for electronic databases comprise the most frequent and relevant exposure, outcome and subject terms identified during the scoping search through a frequency test of 107 keywords in a group of 32 preselected, potentially relevant studies from the Youth Employment Inventory (available at: www.youth-employment-inventory.org/), based on a first draft of the inclusion and exclusion criteria.

CODE DESCRIPTION
This section contains a description of the variables that were coded at the study level. Each variable name is followed by a description of the variable and of how it should be coded. Effect size-specific information was also coded at the effect size/outcome level. A single study may analyse more than one outcome or group, so there may be multiple effect size observations for a single study. In addition, information was collected about programme-related variables considered relevant for the analysis. To minimize the number of missing values for these variables, information was extracted both from the study itself (the core unit of analysis) and from outside sources, including project reports and project websites.

Notes to the forest plots:
- No. of SMDs/Studies: 638/89 (imputation: full; SMDs limit = .75; SMD_SE limit = .75). The Galpão programme in Brazil (0.82 SMD; 95% CI = -0.03, 1.66) exceeds the SMDs limit of .75 and was therefore not included in the corresponding forest plot.
- No. of SMDs/Studies: 670/92 (imputation: full; SMDs limit = .75; SMD_SE limit = .75).
- No. of SMDs/Studies: 169/14 (imputation: full; SMDs limit = .75; SMD_SE limit = .75). The WINGS programme in Uganda (Blattman et al., 2014) (0.82 SMD; 95% CI = 0.13, 1.50) exceeds the SMDs limit of .75 and was therefore not included in the corresponding forest plot.
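The exclusion rule applied in the forest plots (dropping estimates whose SMD or standard error exceeds .75 before pooling) can be sketched alongside a standard DerSimonian-Laird random-effects pooling step. The inputs below are illustrative, and the code is a sketch rather than the review's actual analysis script.

```python
import math

def pool_random_effects(effects, ses, smd_limit=0.75):
    """DerSimonian-Laird random-effects pooling, after dropping estimates
    whose |SMD| or standard error exceeds the review's .75 cap."""
    kept = [(d, s) for d, s in zip(effects, ses)
            if abs(d) <= smd_limit and s <= smd_limit]
    w = [1.0 / s ** 2 for _, s in kept]
    d_fixed = sum(wi * di for wi, (di, _) in zip(w, kept)) / sum(w)
    q = sum(wi * (di - d_fixed) ** 2 for wi, (di, _) in zip(w, kept))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(kept) - 1)) / c)  # between-study variance
    w_re = [1.0 / (s ** 2 + tau2) for _, s in kept]
    d_re = sum(wi * di for wi, (di, _) in zip(w_re, kept)) / sum(w_re)
    return d_re, math.sqrt(1.0 / sum(w_re))

# Illustrative inputs; the 0.82 estimate is dropped by the .75 cap
d, se = pool_random_effects([0.05, 0.12, 0.20, 0.82],
                            [0.04, 0.06, 0.08, 0.43])
```

The cap keeps implausibly large outliers (such as the excluded Galpão and WINGS estimates) from dominating the pooled average, at the cost of discarding some information.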

Business outcomes
The disaggregated forest plots for business outcomes of main categories employment services, subsidized employment and unspecified are not displayed due to lack of observations.

About this review
Youth unemployment is much greater than the average unemployment rate for adults, in some cases over three times as high. Today, over 73 million young people are unemployed worldwide. Moreover, two out of five young people in the labour force are either working but poor or are underemployed. The youth employment challenge is not only about job creation, but especially about enhancing the quality of jobs for youth.
This systematic review assesses the impact of youth employment interventions on the labour market outcomes of young people. The included interventions are training and skills development, entrepreneurship promotion, employment services and subsidized employment. Outcomes of interest include employment, earnings and business performance outcomes.