Do Apprenticeships Pay? Evidence for England

The importance of apprenticeships for early labour market transitions varies across countries and over time. In recent times, there has been a policy drive to increase the number of people undertaking apprenticeships in England. This is regarded as important for addressing poor productivity. We investigate whether there is a positive return to undertaking an apprenticeship for young people. We use detailed administrative data to track recent cohorts of young school leavers as they transition to the labour market. Our results suggest that apprenticeships lead to a positive average earnings return (at least in the short run), although there is stark variation between sectors. This is an important driver of the gender gap in earnings.


I. Introduction
Apprenticeships feature in the vocational education systems of many countries, although their popularity varies widely: they are especially prevalent in countries like Germany, Austria and Switzerland and much less common in countries like the US, Sweden and Italy, which rely more on classroom-based learning or put less emphasis on vocational education. 1 England is somewhere in between, although there is recent policy interest in increasing the number of apprenticeships, with a government commitment to increase them to 3 million over 5 years, alongside substantial reforms to the regulation and funding of the apprenticeship system. The purpose is to help address two important problems in the UK: poor productivity and a significant fall in employers' investment in training over recent decades (National Audit Office, 2019). The apprenticeship programme aims to allow people to develop the knowledge, skills and behaviours required for their occupation. Apprenticeships may also be important for improving social mobility because they tend to be undertaken by students in the low-middle part of the educational distribution and by students who would not otherwise have the credentials for university (at least in England). 2 They are an important part of the government's plan to improve and reform post-16 education in England. 3 Does investment in apprenticeships lead to a positive return? This has been considered from the perspective of both firms and individuals. Acemoglu and Pischke (1999) use the example of the German apprenticeship system to show that firms often engage in general training, despite a large training cost (and consistent with a model of imperfect labour markets). In Germany and Switzerland, detailed cost-benefit assessments have been made (as summarized by Muehlemann and Wolter, 2014) and help to explain how firms recoup their initial investment.
From the individual's perspective, in theory, apprenticeships should offer an excellent environment to acquire general employability skills (such as teamworking, communication skills etc.) as well as specific occupational skills acquired 'on the job'. Furthermore, they may ease the school-to-work transition by establishing a better match of workers' skills to firm needs (Ryan, 2001). However, the empirical evidence on whether there are employment and wage returns to apprenticeship programmes for workers varies by country. For example, positive wage returns have been found for Austria (Fersterer, Pischke and Winter-Ebmer, 2008) and France (Bonnal, Mendes and Sofer, 2002) but not for the Netherlands (Plug and Groot, 1998) or Germany (Parey, 2016).
Whether apprenticeships generate a net positive return is likely to depend on many institutional aspects of the country, such as the education system, training regulations and labour market institutions (Muehlemann and Wolter, 2014). Furthermore, the apprenticeship model in England is very different from that in other European countries. Steedman (2010) documents differences for England, which include far fewer hours of off-the-job training, shorter duration and many apprenticeships at a lower level of skill. Thus, it is difficult to make any generic claim about whether apprenticeships are a good investment for either firms or individuals. However, unless apprenticeships are seen to generate positive returns for individuals, it may be that they fail to attract enough suitable applicants, thus jeopardizing their potential for improving productivity. In this paper, we ask whether apprenticeships lead to positive returns in the labour market. Specifically, we compare individuals with a similar level and type of education, and investigate whether those who undertake an apprenticeship have better employment and earnings outcomes in their early career compared to those who complete their education entirely within a classroom-based setting. Although there is an existing literature on the returns to apprenticeships in the UK (e.g. McIntosh and Morris, 2016), we contribute to this literature with a more detailed analysis of the returns to apprenticeships for young people than has been previously possible in England and indeed not typically possible in other countries. This is because we have access to detailed administrative data for cohorts of young people as they move through the education system and into the labour market. This includes individuals' prior attainment in national tests at the end of primary and secondary school. This is one of the first papers to use the linked administrative education-earnings data for England ('the Longitudinal Educational Outcomes' database).
One of the main advantages of this data set is that we are able to analyse returns to apprenticeships in a much more refined way, for example by looking within particular vocational sectors.
Even with the availability of detailed data, unobservable variables (such as various non-cognitive abilities or motivation) might influence both the probability of obtaining an apprenticeship and labour market outcomes. To overcome these issues, we make use of an Instrumental Variable strategy which uses cohort-to-cohort variation in the extent to which peers of young people (within the same year group and school) access apprenticeships. Our results suggest that within our population of interest selection effects are unlikely to be strong for earnings (with the possible exception of women who access lower-level apprenticeships). We find an average earnings return to undertaking an apprenticeship that persists (at least in the early phase of a young person's career), but that this return is highly variable by sector of specialization. This is the main driving force behind the finding that the return is much higher for men than for women. The findings on gender are in line with previous research in other countries. For example, studies in Germany find that occupational segregation explains a large fraction of the gender gap of young workers with an apprenticeship (Kunze, 2003, 2005). Fitzenberger and Kunze (2005) find a large wage gap at the bottom of the income distribution and that occupational mobility is lower for women than for men. This is the first study (to our knowledge) that investigates the gender gap amongst those who have previously undertaken an apprenticeship in England.
The remainder of this paper is structured as follows. Firstly, we briefly discuss existing literature on whether apprenticeships have added value in the labour market (section II). We then give brief details of the English education system and how apprenticeships fit in (section III). We discuss the data and methodology (section IV) before discussing results (section V). We then discuss our conclusions (section VI).

II. A brief literature review on individual returns to apprenticeships
Apprenticeships may offer opportunities for acquiring job-related skills that are not present to the same extent in a classroom-based education (Wolter and Ryan, 2011). For instance, they offer an opportunity for students to put their skills to use immediately in practical situations, thus helping with motivation to learn (especially for less academically-inclined students). They increase familiarity with the work environment and expose students to 'work ethic' present in the workplace. In addition, firms may be better informed about the skills required in a specific job and are more aware of how these requirements change over time due to innovation in the production technology. If it is true that apprenticeships are more suited to providing specific skills than full-time classroom-based vocational education, students who complete an apprenticeship would be expected to have higher productivity. Accordingly, we should observe relatively higher wages in the longer-term. Another prominent argument in support of apprenticeships is that they ease the school-to-work transition by facilitating a better match between workers' skills and firms' needs and by acting as a substitute for job-search.
Cross-country evidence suggests that smoother transitions and lower youth unemployment characterize countries with a more developed apprenticeship system (Quintini and Manfredi, 2009; Quintini, Martin and Martin, 2007). But it can be difficult to fully control for other differences between countries (apart from the apprenticeship system). A common concern of this literature is that students who pursue an apprenticeship track are likely to differ from the comparison group for reasons that are not fully observable to the researcher but that are valued in the labour market. This is why the choice of the comparison group and of the methodology used to deal with selection are very important.
There are studies comparing individuals with the same level of academic and vocational education. For example, Hanushek et al. (2017) compare employment rates across different ages with a difference-in-differences approach. Their findings suggest that gains in youth employment arising from vocational programmes may be offset by diminished employment later in life, especially in countries that have large apprenticeship programmes. However, an analysis based on cohort studies for Britain by Brunello and Rocco (2015) suggests that such a tradeoff between academic and vocational education is only evident for a particular sub-group.
Other studies, more closely related to our paper, compare individuals in the same country who either undertake apprenticeships or school-based vocational education. They then compare the later labour market outcomes of apprentices and school-based vocational students. One of the most convincing studies is for Austria by Fersterer et al. (2008) who make use of firm failures to provide exogenous variation in the length of the apprenticeship completed by individuals. They find that a year of apprenticeship training generates an increase in pay of slightly more than 5 per cent. Parey (2016) studies the effect of following the apprenticeship track on labour market outcomes at age 23 and 26 in Germany. He attempts to circumvent the issue of selection by exploiting exogenous variation in the availability of apprenticeship places. He finds no effect of apprenticeships on wages but finds that apprentices are less likely to experience unemployment than their school-based counterparts (with this advantage fading away over time). In France, Bonnal et al. (2002) study wage differentials between apprentices and graduates from vocational school in their first job. Results show that apprentices, notably males, benefit from higher wages regardless of whether they are employed by the training firm, suggesting that skills acquired on the apprenticeship are transferable. Plug and Groot (1998) find that in the Netherlands there is no long-run advantage of apprenticeships compared to school-based vocational education in terms of earnings and employment. Finally, Bentolila, Cabrales and Jansen (2018) show that in Spain graduates from dual vocational education work more days and earn higher labour income in the first twelve months after graduating than their peers in full-time vocational education. However, their instrumental variable strategy (which involves use of commuting time to schools offering either track) suggests that this does not have a causal interpretation.
Overall, these studies highlight that while apprenticeships generally seem to ease the transition from school to work, they do not necessarily increase workers' productivity in the medium term. They also indicate that evidence on whether apprenticeships lead to wage returns is mixed and dependent on the country.
In England, there is a literature estimating the returns to vocational qualifications, which generally suggests positive returns for apprenticeships. This includes Dearden et al. (2002), Robinson (1997), Bibby et al. (2014), and McIntosh and Morris (2016). They estimate value added models which attempt to deal with selection by including controls for many observable characteristics. McIntosh and Morris (2018) compare earnings of completers and non-completers of apprenticeships amongst older and younger age groups (conditional on starting an apprenticeship over the age of 19), showing that the earnings differential is higher for the latter group. We contribute to this literature with a more detailed study of apprenticeships for young people in England than has previously been possible and because we make use of plausibly exogenous variation in access to apprenticeships for identification (which has been difficult to do convincingly in this literature).

III. The English education system and the role of apprenticeships
The compulsory phase of English education ends at age 16, when students sit exams for the General Certificate of Secondary Education (GCSEs). Typically, students who have done well at GCSE progress to academic or vocational education from age 16-18 and pursue what are known as 'Level 3' qualifications: A-levels (academic qualifications) and/or vocational qualifications (such as BTECs). About 40% of students opt for A-levels, which are the traditional pathway to higher education and are most often undertaken in schools or Sixth Form Colleges. Other students typically enrol in colleges of further education and undertake vocational qualifications. Students who do not qualify for progression to Level 3 study vocational qualifications at Level 2 or at a lower level. 4 Finally, a minority of students undertake an apprenticeship which is either intermediate (equivalent to Level 2) or advanced (equivalent to Level 3). Although apprenticeships may commence directly upon completion of GCSEs, it is more usual for students to spend time in full-time college-based education before starting an apprenticeship. Further details on the different qualifications students undertake and where they lead to are discussed in Hupkau et al. (2017).
In England an apprenticeship is defined as follows: '...a job that requires substantial and sustained training, leading to the achievement of an Apprenticeship standard and the development of transferable skills' (BIS, 2013). As in the other systems, the firm commits to train the apprentice and to pay an apprentice wage. The apprentice in turn commits to 'off the job' training, most often provided by a college or an independent training provider. The Specification of Apprenticeship Standards for England sets out the minimum requirements to be included in a recognized English Apprenticeship framework. The content of each apprenticeship occupation is defined with the contribution of employers' associations (in England there is no role for other social partners) and upon completion an apprenticeship leads to a recognised associated certificate. 5 Apprenticeships are expected to provide students with a mix of sector-specific and more general skills like numeracy, literacy, IT and personal skills which are all taught in the classroom. Apprenticeships are available in most sectors. Up to 2010, most intermediate apprenticeships took between 9 months and a year to complete while most advanced apprenticeships took between 18 months and two years (Steedman, 2010). Since 2012, a minimum duration of 12 months has been imposed.
Amongst those who finished their compulsory education between the academic year 2002/03 and 2007/08 (the cohorts considered in our analysis) about 19% started an apprenticeship at some stage between the age of 16 and 22 (i.e. no new starts between age 23 and 28). Almost all apprenticeships are either intermediate, corresponding to a Level 2 qualification, or advanced, corresponding to a Level 3 qualification, for this age group (with higher apprenticeships a new phenomenon). 6 Table 1 shows characteristics of apprenticeships for men and women respectively for the cohorts of students finishing their GCSEs between 2003 and 2008. Apprenticeships are made up of a number of different components (or aims). At this time, only half of people starting an advanced apprenticeship achieved most or all of their aims. This is lower for those starting an intermediate apprenticeship (and continues to be important for recent cohorts - Bursnall, Nafilyan and Speckesser, 2017). We are interested in the earnings return to starting an apprenticeship (as opposed to completing it) since the benefit of an apprenticeship is not necessarily primarily related to certification. It might also include the benefits of being trained on-the-job, completing some (even if not all) of the aims, and the contacts through which another job might be obtained. Furthermore, there is a compounding of selection issues if we estimate the earnings differential from completing an apprenticeship rather than the earnings differential from having started one. 7 The next panel of Table 1 shows the highest qualification achieved for those who started an apprenticeship. Although there is variability in the highest level of qualification achieved by those doing intermediate and advanced apprenticeships, the biggest group in each case are those who achieved a Level 2 vocational qualification (for intermediate apprenticeships) and Level 3 vocational (for advanced apprenticeships). 
Specifically, for advanced apprentices, most people obtained a Level 3 vocational qualification: 65% for men and 60% for women. The second biggest category is Level 2 vocational: 16% for men and 20% for women.
The average duration of an apprenticeship is about 11 and 10 months for men and women respectively in the case of intermediate apprenticeships. For advanced apprenticeships, the average duration is 17 months and 12 months for men and women respectively. The gender difference in duration is likely to reflect different sectors of specialism. The lower part of Table 1 shows the spread of men and women across sectors for intermediate and advanced apprenticeships respectively. With regard to intermediate apprenticeships, the most popular sectors for men are Construction, Planning and the Built Environment, Engineering and Manufacturing Technologies, Retail and Commercial Enterprise and Business, Administration and Law. There is even more concentration within advanced apprenticeships (where most do Construction / Engineering). For women, the most important sectors are
6 From these cohorts, we could find only 2,959 people starting an apprenticeship at Level 4 by the age of 28 (0.1%) and only 45 people starting an apprenticeship at Level 5.
7 In the context of our report for the Sutton Trust, we show that the earnings differential is higher for those completing an apprenticeship than for those who started one (Cavaglia et al., 2017). It is plausible to suggest that the estimate on the return to starting an apprenticeship is a lower bound for the return on completed training.

Data sources
We use administrative data on entire cohorts of students in England as they move through the education system and into the labour market. Specifically, we combine the National Pupil Database (NPD), the Individualized Learner Record (ILR), and the Higher Education Statistics Agency (HESA) data. The NPD includes information on demographics, schools and test outcomes of students. We use national tests at age 11 and age 16 (GCSE). The information on demographics includes ethnicity, whether English is spoken at home, gender, eligibility to receive free school meals and the deprivation score of the neighbourhood of residence. The data does not contain more detailed information on family background. Linking these to other data sets enables us to capture all students in state education who move from school (in the NPD) to further education (ILR) and higher education (HESA). From the cohort finishing their compulsory education in 2002, it has been possible to link students' records with information on earnings and employment from tax records. 8 The linked education-labour market data ('Longitudinal Educational Outcomes') has only recently become available in England and this is one of the first papers to use it. We focus on cohorts of students who left compulsory schooling (at age 16) between the academic years 2002/03 to 2007/08 for which we have good information on educational outcomes and for whom tax records can be linked. The match between education and administrative data on earnings/employment ranges from 80% for the 2002/03 cohort to 85% for the 2007/08 cohort. 9 We classify students based on their highest level of education achieved throughout the whole period. The tax records contain information about earnings and days of employment. Unfortunately, there is no information about hours of work. 
We focus on earnings and employment outcomes when individuals are age 23, as we can observe this for several cohorts and it is at a fairly early stage in their working life (but after they have completed their education). For the 2002/03 cohort, we consider these outcomes at age 28 (i.e. in 2015), which is the most recent year available to us. Further information about data construction as well as detailed descriptive statistics are shown in the Data Appendix.

Description of treatment and comparison groups
Almost everyone pursuing apprenticeships during this time was either qualified up to Level 2 or Level 3. To select a suitable treatment and comparison group, we compare people with the same highest level of vocational qualification (Level 2 or Level 3), some of whom started an apprenticeship (the treatment group) and others who did not (i.e. the comparison group achieved their vocational qualification only within a classroom setting). We make these comparisons for men and women separately. Appendix Tables A.1a and A.1b show that most individuals in this sample obtained their highest qualification by the age of 22. The upper panel of Tables 2a (for men) and 2b (for women) summarizes the main baseline characteristics for the four groups used in our analysis as well as for the cohort as a whole (column 5). 10 The groups of interest comprise 35 per cent of the cohort for men and 28 per cent for women. Tables 2a and 2b show that those educated up to Level 2 are much lower achieving compared to the average in the cohort (in terms of GCSE grades). Those educated up to Level 3 are closer to the average.
8 The combined education and labour market data is known as the Longitudinal Educational Outcomes data set or LEO.
9 The match rate is a little higher for young people with apprenticeships than for our comparison group: for those educated up to Level 2 (Level 3), the match rate is 84% (85%) for apprentices and 82% (80%) for non-apprentices.
10 Appendix Tables A.1a and A.1b show the age at which the highest qualification was attained for each group of men and women: those educated up to Level 2 (with and without an apprenticeship) and those educated up to Level 3 (with and without an apprenticeship). The Data Appendix gives further details of baseline characteristics by group as well as further detail on how the sample has been constructed.
With regard to GCSE grades, it would appear that there is positive selection into apprenticeships, with the exception of women educated up to Level 3 (where we observe the opposite). The positive selection is starker for Level 2 than for Level 3. In other words, those with an apprenticeship have better prior attainment than those who do not at Level 2, whereas at Level 3 there is a better match between the two groups. In all cases (for both levels of qualification and for men and women), those with an apprenticeship are more likely to be White British, speak English as a first language and are less likely to come from disadvantaged backgrounds (as indicated by eligibility to receive free school meals). Differences are bigger for men educated up to Level 3 than for other groups.

Methodological design
Our data facilitates a comparison of outcomes between those with and without an apprenticeship for individuals in similarly defined groups and then controlling for observable characteristics. Thus, we can estimate an OLS regression as follows (suppressing the time subscript):

$$Y_i = \alpha + \beta a_i + X_i'\gamma + \delta u_{LA} + \theta_c + \mu_s + \varepsilon_i \qquad (1)$$

where $Y_i$ represents the outcome of interest for individual i, observed at a particular age. The variable $a_i$ is a dummy variable equal to 1 if the individual had started an apprenticeship at some point previously. 11 We control for a vector of individual characteristics $X_i$, namely demographic characteristics and prior attainment at age 11 (KS2) and age 16 (GCSE). Demographic characteristics are ethnicity, whether the student was eligible to receive free school meals when in secondary school and whether English is the main language spoken at home. We also include the Index of Multiple Deprivation of the place of residence (IDACI score). Measures of prior attainment are the test scores obtained in national tests in English, maths and science at age 11 as well as the score obtained at age 16 in GCSEs. 12 Cohort ($\theta_c$) and school ($\mu_s$) dummies capture time-invariant cohort-specific and school-specific characteristics respectively. The local unemployment rate ($u_{LA}$) captures local labour market conditions in the individual's locality (Local Authority) between the age of 16 and 18. 13 Controlling for cohort effects helps account for the fact that apprenticeships have increased over time and also for the different years in which outcomes are measured.
We are interested in labour market outcomes, namely the probability of employment, the number of days worked and log annual earnings. Our analysis is mainly conducted when individuals are aged 23 because almost everyone in the sample had obtained their highest qualification before the age of 22 and is unlikely to be in full-time education thereafter (as shown in Appendix Tables A.1a and A.1b). Furthermore, we want to estimate returns before many people have started to have children. However, we also estimate a regression at age 28 for the cohort that we observe for longest (the 2003 cohort).
11 All the individuals in our data have finished their apprenticeship (and their education generally) by the time we observe labour market outcomes.
12 Key Stage 2 marks the end of primary education when students are aged 11. At this point, they sit national tests in English, Maths and Science. We convert marks from these tests to standardized test scores for each cohort. In the analysis, we drop observations with missing values on these test scores. We have checked that our results are not sensitive to an alternative approach of retaining such observations but including missing variable dummies.
13 For each cohort, we use the average local unemployment rate for when they were aged between 16 and 18. As the data are only available from 2004, we only use two years for the 2003 cohort (2004-05). For 5 local authorities the average rate is approximated because not all years are available.
Although this data enables us to control for very important characteristics of students that potentially influence both whether he/she gains access to an apprenticeship and labour market outcomes, there are potentially important omitted characteristics. For example, one would expect employers to screen students on qualities that are not available in these data, such as motivation and non-cognitive abilities. To the extent that these omitted variables both positively influence the probability of getting on to an apprenticeship and labour market earnings, the association between starting an apprenticeship and earnings will not reflect the true return (and it will be upwardly biased). From the discussion of observable characteristics (Table 2), it would appear that apprenticeships are positively selected on the basis of prior attainment, with the exception of women educated up to Level 3 where the selection is in the opposite direction. To the extent that prior attainment is positively correlated with omitted characteristics (such as non-cognitive abilities), the selection on omitted variables will be in the same direction.
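The direction of this bias can be made explicit with the textbook omitted-variable formula. The following is only a sketch under simplifying assumptions that are not part of the paper's specification: a single unobserved trait $q_i$ (a hypothetical symbol standing for, say, motivation) enters the outcome linearly with coefficient $\lambda$, and the other controls are ignored.

```latex
% Illustrative only: q_i and \lambda are hypothetical, not the paper's notation.
\begin{align*}
Y_i &= \alpha + \beta a_i + \lambda q_i + \varepsilon_i, \\
\operatorname{plim}\ \hat{\beta}_{OLS} &= \beta + \lambda\,
  \frac{\operatorname{Cov}(a_i, q_i)}{\operatorname{Var}(a_i)} .
\end{align*}
```

If the trait raises earnings ($\lambda > 0$) and apprentices have more of it ($\operatorname{Cov}(a_i, q_i) > 0$), the OLS coefficient overstates $\beta$. If selection runs the other way, as the observables suggest for women educated up to Level 3, the covariance is negative and the OLS coefficient understates it.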
One way to address causality is to make use of variation in the probability of starting an apprenticeship that is not otherwise correlated with earnings. A plausible source of variation is (within school) cohort-to-cohort variation in the extent to which peers take up apprenticeships between the age of 16 and 18. This variation might exist because of increased exposure to information about apprenticeships via peers in the same grade. In an English context, this source of variation is plausible because careers information in schools is known to be very patchy, especially in relation to vocational education and apprenticeships (Ofsted, 2013). The apprenticeship route is not firmly embedded in the post-16 education system (as it is in many other countries). There is no unified system for applying to apprenticeships. Learners interested in an apprenticeship need to apply either directly to the employer or through a governmental portal where employers can post their apprenticeship vacancies (although this is not obligatory). 14 In this context, having a friend who starts an apprenticeship might provide additional information on opportunities, the application process and on future prospects.
Thus, our hypothesis is that school peers who start an apprenticeship between the age of 16 and 18 may influence the probability of a given individual starting an apprenticeship because of the information they impart. This IV approach is inspired by Hoxby (2000) who investigates the effect of peer group composition in primary schools on achievement. In a robustness check, we test whether this peer measure is sensitive to the inclusion of observable peer characteristics (demographics and prior attainment) and we find that it is not. This suggests that the variation being exploited is not attributable to some cohorts being more equipped to pursue apprenticeships than others (e.g. because of better exam results), at least based on observable characteristics. As with other applications using this approach, the 'reflection problem' is still of potential concern. The reflection problem arises because an individual may influence his/her peers and not just the other way around (Manski, 1993). Unfortunately, and similarly to other studies in this literature, there is no convincing way to rule this out as something that may be pertinent here. Summary statistics for the instrument are provided in the Data Appendix. These at least suggest that there are a large number of peers by whom an individual may be influenced (and whom the individual may in turn influence).
To implement the IV approach, for each individual i we compute the share of students from the same cohort c, the same secondary school s and the same gender g that started an apprenticeship between the age of 16 and 18.
The share, $Z_{csg-1}$, excludes student i from the calculation. In other words,

$$Z_{csg-1} = \frac{\sum_{j \neq i} a_{jcsg}}{N_{csg} - 1},$$

where $N_{csg}$ is the number of students of gender g in cohort c of school s. The Data Appendix shows summary statistics for the instrument. The percentage of each cohort undertaking an apprenticeship between the ages of 16 and 18 is about 11 per cent for males and about 8 per cent for females. An obvious problem is that cohort-to-cohort (within school) variation in the number of peers undertaking an apprenticeship might be influenced by local labour market conditions which may persist over time and influence apprentices' future labour market outcomes. To some extent, we deal with this by including a measure of the local unemployment rate. We also test whether this is likely to be a problem by estimating the relationship between the number of apprenticeship starts in the closest neighbouring school and the probability of starting an apprenticeship. If two neighbouring schools are part of the same labour market, a change in local labour market conditions should affect students in both schools. If there is no relationship between the share of apprentices in a given school and the probability of starting an apprenticeship in the neighbouring school (as our results suggest), then it is much less likely that the within-school relationship is driven by omitted local labour market conditions.
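The leave-one-out peer share described above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the record layout and the field names (`cohort`, `school`, `gender`, `apprentice`) are assumptions, and the toy data are invented.

```python
from collections import defaultdict


def leave_one_out_share(records):
    """For each student, return the share of OTHER students in the same
    (cohort, school, gender) cell who started an apprenticeship at 16-18."""
    starts = defaultdict(int)  # apprenticeship starters per cell
    sizes = defaultdict(int)   # students per cell
    for r in records:
        cell = (r["cohort"], r["school"], r["gender"])
        starts[cell] += r["apprentice"]
        sizes[cell] += 1

    shares = []
    for r in records:
        cell = (r["cohort"], r["school"], r["gender"])
        n_others = sizes[cell] - 1
        if n_others == 0:
            shares.append(None)  # no peers in the cell: share is undefined
        else:
            # Remove the student's own status from the numerator.
            shares.append((starts[cell] - r["apprentice"]) / n_others)
    return shares


# Hypothetical toy data: one school and cohort, four boys and one girl.
records = [
    {"cohort": 2003, "school": "A", "gender": "M", "apprentice": 1},
    {"cohort": 2003, "school": "A", "gender": "M", "apprentice": 0},
    {"cohort": 2003, "school": "A", "gender": "M", "apprentice": 0},
    {"cohort": 2003, "school": "A", "gender": "M", "apprentice": 1},
    {"cohort": 2003, "school": "A", "gender": "F", "apprentice": 0},
]
shares = leave_one_out_share(records)
```

Excluding the student's own status matters because otherwise the instrument would be mechanically correlated with the treatment it is meant to predict. Here the two boys who started an apprenticeship see a peer share of 1/3 and the two who did not see 2/3.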
In the first stage and the structural equations, $a_{icsg}$ is a dummy indicating whether the individual has started an apprenticeship; $X_{icsg}$ contains the same predetermined characteristics as equation (1), such as socioeconomic background, ethnicity and prior attainment (scores at KS2 and KS4 exams); and both equations include cohort (c) and secondary school (s) fixed effects. Finally, $u_{LA}$ denotes the unemployment rate at the local authority level in the years in which the cohort is between 16 and 18 years of age. The standard errors are clustered at the local authority level. We estimate the OLS and 2SLS versions of equation (3) for cohorts that completed their GCSE exams between 2002/03 and 2007/08.
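For orientation, a first stage and a structural equation consistent with these definitions could be written as follows. This is only a sketch: the outcome symbol $y_{icsg}$, the Greek letters and the error terms are assumed placeholders rather than the paper's own notation; only $a_{icsg}$, $X_{icsg}$, $u_{LA}$ and the peer share are taken from the text.

```latex
% Assumed notation: Greek letters, y_{icsg} and the error terms are placeholders.
\begin{align}
a_{icsg} &= \alpha_0 + \alpha_1 Z_{csg-1} + X_{icsg}'\alpha_2 + \alpha_3 u_{LA}
            + \gamma^{a}_{c} + \mu^{a}_{s} + \nu_{icsg} \tag{2} \\
y_{icsg} &= \beta_0 + \beta_1 a_{icsg} + X_{icsg}'\beta_2 + \beta_3 u_{LA}
            + \gamma^{y}_{c} + \mu^{y}_{s} + \varepsilon_{icsg} \tag{3}
\end{align}
```

Under this sketch, $\beta_1$ is the return to starting an apprenticeship, and the peer share enters only the first stage, which is the exclusion restriction that the IV approach relies on.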

Exploring heterogeneity and the gender gap
It is evident from the descriptive statistics in Table 1 that apprenticeships are spread across different sectors and popular choices vary for men and women. To further understand our results, we estimate returns within sectors and also conduct a descriptive analysis of the gender earnings gap.

Returns within sectors
To estimate returns within sectors, we use an OLS approach to estimate equation (1) within vocational sectors. In this case, we cannot apply the IV approach to instrument the decision to start an apprenticeship in a specific sector because there are not enough observations if we further split the sample by sector.
However, we conduct bounding analysis as a way of dealing with possible selection on unobservables for an individual observed on an apprenticeship within a given sector. 15 The idea of bounding is to use the amount of 'selection on observables' as a guide to assess the possible amount of 'selection on unobservables'. Altonji, Elder and Taber (2005) develop this method in the context of assessing the effect of attending a Catholic high school on educational attainment. It involves estimating the ratio of selection on unobservables to observables such that the estimated effect of attending a Catholic high school could be 'explained away' in relation to the outcome of interest. This methodology has since been applied in other studies in the education literature, such as Gibbons and Silva (2011). More recently, Oster (2017) has extended this work by providing more precise conditions to estimate the coefficient bounds.
Oster (2017) proposes a method for creating an interval within which the true coefficient is likely to lie. Specifically, if there is positive selection into the treatment and the correlation between selection on observables and on unobservables is positive, the upper bound of the interval is the OLS coefficient. 16 In the simpler case of only one observable covariate, the estimated lower bound, $\breve{\beta}$, can be approximated as follows: 17

$$\breve{\beta} \approx \beta_1 - \delta \, (\dot{\beta} - \beta_1) \, \frac{R_{max} - R}{R - \dot{R}}$$

where $\beta_1$ and $R$ are the coefficient and R-squared of equation (1) (such as above), whereas $\dot{\beta}$ and $\dot{R}$ are the coefficient and the R-squared of a regression of earnings on apprenticeship status, with no additional controls. $\delta$ denotes the amount of selection on unobservables, which is defined as a proportion of selection on observables. $R_{max}$ is the R-squared from a hypothetical regression of the outcome on the treatment and on both observed and unobserved controls. Both these parameters are unknown and require assumptions on the values which they could take. We adopt the conservative assumption that $\delta = 1$, which implies that the amount of selection on unobservables and observables is equivalent. Altonji et al. (2005) set $R_{max} = 1$. Oster (2017) shows that this can be problematic in the case of measurement error in the outcome variable. We adopt a value based on her analysis of experimental estimates and thus set $R_{max} = 1.3R$.

15 While we can and do apply bounding to the overall estimates (excluding sector), the R-squared is quite low (and shown in Tables A.3a and A.3b). When the R-squared is low, the bounding analysis is not very informative. Note that the R-squared is relatively high within most sectors once explanatory variables are included.

16 Note that it is not always the case that apprentices are positively selected. Based on descriptive statistics in Table 2, and if cognitive and non-cognitive abilities are positively correlated, then we would expect the selection to be positive in the case of all groups except for women educated up to Level 3 (where the selection would be in the opposite direction).

17 Since in our case there are multiple observable covariates, we rely on what the author defines as the 'general' estimator; the general estimator is based on the same key parameters we described for the simple estimator, and, under certain conditions, the simple estimator approximates the general one. The derivation and the proofs for the general estimator are detailed in Oster (2017).
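The single-covariate approximation to the lower bound can be illustrated numerically. The sketch below is not the 'general' estimator relied on in the paper (see footnote 17); it only evaluates the simple approximation, with a hypothetical function name and made-up input values, using the defaults of selection on unobservables equal to selection on observables and a maximum R-squared of 1.3 times the controlled R-squared.

```python
def oster_lower_bound(beta_controlled, r_controlled, beta_raw, r_raw,
                      delta=1.0, r_max=None):
    """Approximate lower bound:
    beta_controlled - delta * (beta_raw - beta_controlled)
                      * (r_max - r_controlled) / (r_controlled - r_raw)

    beta_controlled, r_controlled: coefficient and R-squared with controls.
    beta_raw, r_raw: coefficient and R-squared without controls.
    delta: selection on unobservables relative to observables (default 1).
    r_max: hypothetical maximum R-squared (default 1.3 * r_controlled).
    """
    if r_max is None:
        r_max = 1.3 * r_controlled
    if r_controlled <= r_raw:
        raise ValueError("controls must raise the R-squared for the bound to be defined")
    coefficient_movement = beta_raw - beta_controlled
    r_squared_scale = (r_max - r_controlled) / (r_controlled - r_raw)
    return beta_controlled - delta * coefficient_movement * r_squared_scale


# Made-up inputs: the coefficient falls from 0.40 to 0.30 when controls are
# added, while the R-squared rises from 0.10 to 0.30.
print(oster_lower_bound(beta_controlled=0.30, r_controlled=0.30,
                        beta_raw=0.40, r_raw=0.10))
```

In this hypothetical case the interval runs from roughly 0.255 (the approximate lower bound) to 0.30 (the OLS coefficient). The bound tightens when controls move the coefficient little while raising the R-squared a lot.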

'Explaining' the gender gap
Finally, we estimate OLS regressions for the full sample except this time we estimate the gender gap for those who achieve the same highest level of education (Level 2 or Level 3) and the same type (apprenticeship or other vocational). We successively introduce controls to try to understand what might be the main drivers of the gender gap and the extent to which this can be fully explained. The regressions that apply to those who have started an apprenticeship allow us to include a broader range of apprenticeship-related characteristics than when we are comparing individuals with and without an apprenticeship. These include the level, duration and detailed sector of apprenticeship. In all regressions, we also examine the effect of including the limited job characteristics available. These include industry and industry interacted with the sector in which the individual's vocational qualification was obtained. 18 We do not observe whether individuals have children but we do observe this for the subsample of individuals who claimed unemployment benefits at a given point (and at that time). A limitation of the administrative data is that we do not know the individual's wages (only their earnings and their days employed). To better understand the gender gap, we use the Labour Force Survey (LFS), which contains information on the hourly wage as well as information on whether individuals have children. However, the sample sizes are much smaller. We undertake some descriptive analysis of data in the LFS and run comparable regressions with the LFS and the administrative data (LEO) to try to infer how important these characteristics are likely to be for explaining the gender earnings gap amongst these groups.

V. Results
Plotting the raw earnings differential
Figure 1 illustrates the raw earnings differential for men and women according to their highest level of qualification and whether or not they start an apprenticeship. This is shown for the cohort that completed their compulsory education in 2002/03. The plot is of median gross annual log earnings by gender and group from 2008 to 2015. 19 What is immediately striking is that median log earnings are much higher and on an upward trajectory for men at both education levels (and for those with or without an apprenticeship), at least up to the age of 28 (i.e. in 2015). On the other hand, median earnings for women trend slightly downwards for those with a low level of education (up to Level 2) and stay roughly stable for those whose highest level of education is (vocational) Level 3.

For both men and women, those educated up to Level 3 have higher median earnings than those educated up to Level 2. Within educational level, those who started an apprenticeship at some stage have higher median earnings. For men, this is particularly stark for those educated up to Level 3, with no sign of convergence with non-apprentices over time. For women, there is a smaller earnings gap between routes chosen and they converge over time for those educated up to Level 3.

18 Specifically, we use the trade class of the firm one works in, which indicates the industry of work and is available in the administrative data.

19 For ease of illustration, in Figure 1

First stage
As discussed in section IV, the instrument is the share of students from the same cohort, the same secondary school and the same gender that started an apprenticeship between the ages of 16 and 18. 20 The dependent variable is the probability of starting an apprenticeship. As argued previously, the instrument plausibly reflects exposure to information about apprenticeships that might vary within the same school for different cohorts. However, this strategy would be undermined if it simply reflected local labour market conditions for cohorts starting post-16 education at the same time. To test for this, we re-estimate the regression but replace the share of students within the same secondary school starting an apprenticeship with the share of students from the nearest neighbouring school (i.e. making use of schools' postcode information). The median distance from the nearest school is 1.3 km so they can be considered as belonging to the same local labour market.

The results of these regressions are reported in Table 3, with the full specification shown in Appendix Table A.2a. They show a strong influence of the share of students from the same cohort in the same secondary school on the probability of starting an apprenticeship. Specifically, the estimates suggest that a 1 percentage point increase in the share of male (female) apprentices in the same school and cohort increases the probability of undertaking an apprenticeship by 1.1 (1.5) percentage points for those educated up to Level 2 and by 1.2 (1.1) percentage points for those educated up to Level 3. To put this in context, over this time period, on average around 11% of boys and 8% of girls start an apprenticeship between 16 and 18 years of age. On the other hand, there is no relationship between the probability of starting an apprenticeship and the share of students from the same cohort in the closest neighbouring school. The point estimate is close to zero in all cases. These results suggest that the instrument is not reflecting the influence of labour market conditions.

As a final robustness check, we consider whether the first stage results are sensitive to the inclusion of observable peer characteristics (based on demographics and/or prior attainment). The results (reported in Table 4) show that first stage estimates are not sensitive to the inclusion of these additional controls. 21

20 Results are qualitatively similar when we do not distinguish between boys and girls with regard to the instrument. We prefer to separate them given the different preferences of boys and girls with regard to vocational sector (and therefore relevance of peer influence) and, more practically, because the IV estimates are more precise.

21 In other words, the share of peers taking up an apprenticeship does not simply reflect the observable characteristics of peers (e.g. higher attainment) that might influence individual outcomes for other reasons.

OLS and 2SLS estimates
In Table 5, we show OLS and 2SLS estimates for the second-stage regression (with the full specification for earnings shown in Table A.2b). There is very little relationship between having an apprenticeship and the probability of employment. This is not surprising as the percentage of men and women employed at age 23 is 97-98% for most groups. 22 There is a positive relationship between having an apprenticeship and the number of working days during the year. For those educated up to Level 2, point estimates suggest that an apprenticeship increases the number of working days by about 7% for men and 6% for women. However, the point estimates are not statistically different from zero in the 2SLS estimates. For those educated up to Level 3, the positive relationship estimated in the OLS regressions is reduced (to near zero) in the 2SLS regressions for both men and women. The most interesting results are for log earnings, which are reported in columns (5)-(6) and (11)-(12). In all but one case, the OLS and 2SLS estimates are close in magnitude. For women who are educated up to Level 2, the point estimate in the 2SLS regression is lower than in the OLS regression (even though it is still positive and not statistically different). In fact, the OLS and 2SLS estimates are never statistically different from each other. Overall, the findings suggest a very strong relationship between starting an apprenticeship and log earnings at age 23. For men, apprenticeships raise earnings by 30% and 46% for those educated up to Levels 2 and 3 respectively. For women, they raise earnings by 10% to 20% for the respective groups. Thus, an apprenticeship seems to be a good investment, although the magnitude of the return varies markedly by gender.
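To make the logic of comparing OLS and 2SLS concrete, the sketch below computes both estimators in the simplest possible setting: one endogenous regressor, one instrument and no controls, where 2SLS reduces to a ratio of two covariances. The data are made up, and the set-up is deliberately stylised (the regressor is not binary and there are no fixed effects), so it illustrates the estimators rather than reproducing the specification in Table 5.

```python
def mean(values):
    return sum(values) / len(values)


def cross_deviation_sum(a, b):
    """Sum of products of deviations from the mean (an unscaled covariance)."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def ols_slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    return cross_deviation_sum(x, y) / cross_deviation_sum(x, x)


def iv_slope(z, x, y):
    """Just-identified IV (2SLS) slope of y on x, using z as the instrument."""
    return cross_deviation_sum(z, y) / cross_deviation_sum(z, x)


# Hypothetical data: u is an unobserved confounder that raises both x and y,
# while the instrument z shifts x but is unrelated to u by construction.
z = [0, 0, 1, 1]
u = [0, 1, 0, 1]
x = [zi + ui for zi, ui in zip(z, u)]
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]  # true effect of x on y is 2

print(ols_slope(x, y))    # 3.5: biased upwards by the confounder
print(iv_slope(z, x, y))  # 2.0: recovers the true effect
```

In this toy case selection on the unobservable drives a wedge between the two estimators. When OLS and 2SLS estimates are close, that pattern is what one would expect if selection on unobservables were limited.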
The similarity of the results in the OLS and 2SLS regressions is consistent with a claim that selection bias is not very serious within the OLS regressions (given our controls and the choice of comparison group), 23 although we should be mindful that the 2SLS estimate may be biased on account of the 'reflection problem'.

Before moving on to understand heterogeneity, it is of interest to consider OLS estimates for the cohort that can be observed for the longest period in our data. This is shown in Table 6 for those finishing compulsory schooling in 2002/03. The outcome variables are the same as those considered above and results are shown when the cohort is at age 23 and 28. The OLS estimates at age 23 are qualitatively similar to those in Table 5 (where we use five cohorts). Coefficients on the employment variables are very small at the age of 28 for both men and women. The most interesting result is for earnings. For those educated up to Level 2, this shows that the earnings estimate stays similar for men but declines markedly for women (from 8% to 3%). For those educated up to Level 3, the earnings estimate stays very high for men (going from 43% to 28% between the ages of 23 and 28) but declines very markedly for women (from 15% at the age of 23 to 3% at the age of 28).

22 The percentage is slightly lower for men and women educated up to Level 2 but without an apprenticeship. The percentage employed is 96% and 95% for men and women respectively.

23 However, the 2SLS estimates represent a 'Local Average Treatment Effect' among compliers and are not necessarily representative of the causal effect of starting an apprenticeship at other points of the distribution. They are only consistent with non-bias in the OLS regression if the parameter is close to the 'Average Treatment Effect'.

Notes: Standard errors in parentheses. Significance levels: *P < 0.10, **P < 0.05, ***P < 0.01. All regressions include the following controls: demographic characteristics (White British, English as first language, Eligible for Free School Meals, IDACI score), prior attainment in Key Stage 2 (age 10), prior attainment in Key Stage 4 (age 16), secondary school and cohort dummies and the average unemployment rate in the local district. Standard errors are clustered at the Local Authority level.

Notes: Standard errors in parentheses. Significance levels: *P < 0.10, **P < 0.05, ***P < 0.01. For each education level, the first row reports the effects on outcomes measured at 23 years of age; the second row on outcomes measured at 28 years of age. All regressions include the following controls: demographic characteristics (White British, English as first language, Eligible for Free School Meals, IDACI score), prior attainment in Key Stage 2 (age 10), prior attainment in Key Stage 4 (age 16), secondary school fixed effects and the average unemployment rate in the local district. Standard errors are clustered at the Local Authority level.

Earnings estimates by sector
An interesting insight from the above analysis is that the relative payoff to undertaking an apprenticeship (compared to similar classroom-based vocational education) is much higher for men than women. We examine gender gaps explicitly in the section below. However, it appears that part of the explanation is that men specialize in vocational areas where having an apprenticeship is more beneficial for future earnings. In Tables 7a and 7b, we show the average earnings differential separately for the ten most popular sectors of vocational education for apprentices (ranking sectors in terms of highest to lowest average earnings differential). We show this at age 23 for Level 2 and Level 3 respectively. As described in section IV, we construct bounds around the coefficients to suggest a lower and upper limit to the OLS estimate under the assumption that unobservable characteristics have the same importance as observable characteristics in explaining the relationship between starting an apprenticeship and earnings. Appendix Tables A.3a and A.3b report how the R-squared and earnings coefficients are affected by the inclusion of observable characteristics.

Notes: Standard errors in parentheses. Significance levels: *P < 0.10, **P < 0.05, ***P < 0.01. The results are obtained pooling together five cohorts of students who did their GCSEs from 2002/03 to 2007/08. Earnings measured at age 23. The regressions include the following controls: demographic characteristics (White British, English as first language, Eligible for Free School Meals, IDACI score), prior attainment in Key Stage 2 (age 10), prior attainment in Key Stage 4 (age 16), secondary school and cohort fixed effects, amount of highest vocational studies (guided learning hours associated to the qualification) and local unemployment rate. To define the bounds, we follow the suggestions in Oster (2017): $\delta = 1$ and $R_{max} = 1.3R$. Standard errors are clustered at the secondary school level.
As discussed in section III and illustrated again here, the popularity of different sectors varies substantially according to gender. 24 For example, Building and Construction and Engineering are very important for men; Service Enterprises (i.e. hairdressing, beauty), Child Development and Health and Social Care are very important for women. The pattern of estimates shown in Tables 7a and 7b suggests that although there is a positive earnings differential from undertaking an apprenticeship within most sectors (at this age), the differential is often much larger within sectors that men specialize in. Indeed there are a number of popular sectors for women where the earnings differential is either low or non-existent. These include Child Development at Level 3 (although there is still a positive differential) and Business Management at Level 2. The estimated bounds suggest that there is no (popular) sector for which men would not have a positive earnings premium at age 23. For most sectors, the bounds are reasonably tightly defined. 25 At Level 2, the lower bound never falls below 5% whereas at Level 3, it never falls below 12%. More often, the lower bound is a lot higher than that. For women, there is also a positive lower bound in most cases. 26 However, there are some sectors where a negative earnings differential at age 23 cannot be ruled out. These include Nursing at Level 3 and the following sectors at Level 2: Animal Care and Veterinary Science, Business Management and Sport, Leisure and Recreation.
There is relatively little overlap in the most popular ten sectors for men and women. Where there is overlap, the earnings premium to having started an apprenticeship tends to be higher for men. At Level 2, this applies to Administration (20% v 6%), Retailing and Wholesaling (12% v 9%) and Sport, Leisure and Recreation (11% v 8%); at Level 3, it applies to Administration (21% v 5%), Business Management (25% v 14%) and Sport, Leisure and Recreation (24% v 11%). Since we observe a gender earnings gap within sectors that are popular among both men and women, the sector of vocational education cannot entirely explain why the earnings premium to having started an apprenticeship is higher for men than for women.

Understanding the gender gap
To look more explicitly at gender differences in earnings, we compare men and women who start an apprenticeship and estimate earnings differentials at age 23. We progressively include controls. Specifically, we include demographic characteristics, prior attainment and the sector of highest vocational education. We also include characteristics of the Level 2 and Level 3 apprenticeships undertaken by the individual, such as an indicator for achievement, duration and the detailed sector of apprenticeship. 27 We include labour market controls: the local unemployment rate, the number of years the individual has been observed in (post-education) employment and the industry of employment. Finally, we include an indicator for having a child for those who have received benefits of any type (available if an individual has claimed benefits at the same time as being a parent). While some of these variables are endogenous to the apprenticeship decision, the purpose here is simply to see the extent to which the gender earnings differential can be reduced (or removed) and what appears to drive this. As before, we run this regression separately for individuals whose highest vocational achievement is Level 2 or Level 3.

24 In this section, we use a more refined measure of sector than in Table 1.

25 For some sectors, apprentices and non-apprentices are very similar based on observable characteristics. Hence, bounding makes no difference to the earnings differential.

26 For women educated to Level 3, the OLS estimate is usually the lower bound. This is because of 'negative selection' into apprenticeships which is shown in Table 2b and discussed in Section IV. This applies to some sectors more than others.

27 The sector of apprenticeship is a separate variable to the sector of vocational learning. Only the latter is defined for those with and without an apprenticeship.
Table 8 reports the regression results, where the upper panel shows results for individuals educated up to Level 2 and the lower panel shows results for individuals educated up to Level 3. The first column reports the raw differential (in favour of men), which amounts to around 26% at Level 2 and 32% at Level 3. These raw estimates are large and very similar to the OECD estimates for the gender gap amongst graduates (OECD, 2017). The single most important factor making a difference to the gender differential is the vocational sector. In this respect, these results are very similar to those reported for young German apprentices (Kunze, 2005). After this is included (column 3), the differential goes down to 18% at Level 2 and 15% at Level 3. At Level 2, the differential further reduces when detailed employer characteristics are also included such as industry of work and this variable interacted with the sector within which the individual obtained his/her qualification. With all these controls, the differential goes to 9% at Level 2. In contrast, the differential at Level 3 does not change very much from column 3 to the most detailed specification in column 10, where it amounts to 14%.
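The role of sector composition in the raw gap can be illustrated with a toy calculation. The sketch below compares a raw male-female difference in mean log earnings with the coefficient on a male dummy from a regression that includes sector dummies, obtained by demeaning within sector. All numbers and sector labels are hypothetical, and the regressions behind Table 8 include many more controls than this.

```python
from collections import defaultdict


def raw_gap(rows):
    """Difference in mean log earnings, men minus women."""
    men = [y for male, _, y in rows if male == 1]
    women = [y for male, _, y in rows if male == 0]
    return sum(men) / len(men) - sum(women) / len(women)


def within_sector_gap(rows):
    """Male-dummy coefficient from OLS with sector dummies.

    Demeaning the dummy and the outcome within each sector and regressing one
    on the other gives the same coefficient as including sector dummies.
    """
    by_sector = defaultdict(list)
    for male, sector, y in rows:
        by_sector[sector].append((male, y))
    numerator = 0.0
    denominator = 0.0
    for members in by_sector.values():
        male_bar = sum(m for m, _ in members) / len(members)
        y_bar = sum(y for _, y in members) / len(members)
        for m, y in members:
            numerator += (m - male_bar) * (y - y_bar)
            denominator += (m - male_bar) ** 2
    if denominator == 0:
        raise ValueError("no sector contains both men and women")
    return numerator / denominator


# Hypothetical rows: (male dummy, sector, log earnings). Men are concentrated
# in the higher-paying sector; within each sector the gap is 0.2 log points.
rows = [
    (1, "engineering", 10.0), (1, "engineering", 10.0), (1, "engineering", 10.0),
    (0, "engineering", 9.8),
    (1, "childcare", 9.4),
    (0, "childcare", 9.2), (0, "childcare", 9.2), (0, "childcare", 9.2),
]
print(round(raw_gap(rows), 3), round(within_sector_gap(rows), 3))
```

In this made-up example the raw gap of about 0.5 log points falls to about 0.2 once sector is held fixed, which mirrors the direction (though not the magnitudes) of the change between the raw and sector-adjusted columns.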
For comparison, we run similar regressions for men and women with the same vocational level of education but no apprenticeship. Results are reported in Table 9. While the gender gap at Level 2 for non-apprenticeships is similar in the most detailed specification (at 10%), there is a much smaller gender earnings differential for those educated up to Level 3. The small gender gap that exists in the simple specification quickly disappears after controlling for the vocational sector.
What might explain the residual in the gender earnings differential for these groups? One plausible explanation might be hours of work, which are not available in administrative data. For example, women are much more likely to work part-time. Thus, we make use of the Labour Force Survey, which reports hours of work. Table 10 shows the average hours of work of 23-year-old men and women according to their level of education and whether they have children. It would appear that the 'hours gap' between men and women is around 5-6 hours per week (and higher for the very small sample of people with an apprenticeship). If we restrict the sample to only those without children, the hours differential is a lot smaller (at around 2-3 hours per week). It is not implausible that this could explain the residual earnings gap that appears in the administrative data, especially for those educated up to Level 2.
A more direct way to look at this is to estimate the gender pay gap using both sources of data. To make the sample in the Labour Force Survey as large as possible (whilst also comparable to our sample), we use the waves over the period 2009-15 and consider a sample of individuals aged between 21 and 29 years old. Our final sample consists of 452 workers with a vocational Level 2 qualification as their highest qualification and with an apprenticeship and 1121 individuals with an apprenticeship and a Level 3 vocational qualification. To understand how this exercise relates to the rest of our analysis, we also estimate the same model on a comparable sample of individuals from our administrative data set (20 to 29 year olds over the period 2006-15). Columns 1 and 2 of Table 11 report the gender differential in weekly pay in the administrative data (LEO) and in columns 3 and 4, we estimate this same regression using the Labour Force Survey. In columns 5 and 6, the dependent variable is changed to hourly earnings. For each outcome, we estimate two specifications: age, ethnicity, number of GCSEs and year dummies (columns 1, 3 and 5); and also industry of employment (columns 2, 4 and 6).

TABLE 8
Gender differential in earnings between male and female apprentices (measured at 23 years of age)
Notes: Standard errors in parentheses. Significance levels: *P < 0.10, **P < 0.05, ***P < 0.01. At level 2, N = 17,656. At level 3, N = 13,026. The controls include the following variables: demographic characteristics (White British, English as first language, Eligible for Free School Meals, IDACI score); prior attainment in Key Stage 2 (age 10); prior attainment in Key Stage 4 (age 16), secondary school fixed effects; detailed sector of highest vocational studies; amount of highest vocational studies (guided learning hours associated to the qualification); a dummy for level 2 and level 3 apprenticeships, the duration of level 2 and 3 apprenticeships and the detailed apprenticeship sector; the labour market controls are: the local unemployment rate, the detailed industry of work, being a parent for individuals in receipt of benefits, the number of years the individual is observed in employment.

TABLE 9
Gender differential in earnings between male and female non-apprentices (measured at 23 years of age)
The results show that coefficients on weekly earnings are qualitatively similar when using similar controls and similar samples in both the administrative (LEO) data and the Labour Force Survey. The gender earnings differential is larger than in our previous analysis because we are not using as many controls (and possibly also because we are using an older sample). The gender differential reduces markedly when hourly wages are used as the dependent variable. If we compare columns 4 and 6, the differential is 2-2.5 times smaller in terms of hourly pay rather than weekly pay. If we reduced the coefficients from our earlier analysis by this amount, it would make a substantial dent in the earnings differential at Level 2 (reducing it to around 3-4%) but less so for those educated up to Level 3 with an apprenticeship, where the gender earnings gap is higher in administrative data. 28

28 The LFS estimates show a slightly higher (and more precise) point estimate for the gender differential in hourly pay for those educated up to Level 3 with an apprenticeship compared to those educated up to Level 2 with an apprenticeship. However, they are in the same ballpark and not significantly different from each other.

Controls: year fixed effects; personal characteristics (age, ethnicity); number of GCSEs held (more or fewer than 5); industry of employment.

Notes: Standard errors in parentheses. Significance levels: *P < 0.05, **P < 0.01, ***P < 0.001. Columns (1)-(2) are estimated using administrative data for the sample of all apprentices aged between 20 and 29 over the period 2006-15. A comparable sample of 21-29-year-old apprentices is constructed from the Labour Force Survey (LFS) using the 2009 to 2015 waves of the survey.

VI. Conclusion
In this study, we have investigated whether there is a return to starting an apprenticeship over and above leaving education with at most classroom-based vocational qualifications at the same level. This question is especially policy relevant in the light of plans in England to increase the number of apprenticeships and to redesign post-16 vocational education with more of an explicit focus on apprenticeships.
We have access to exceptionally detailed administrative data to answer this question. We can look at a whole cohort of young people as they move from school into further education and into the labour market. We also focus specifically on young people, bearing in mind that returns might vary over time as the vocational system (and labour market) has changed in England.
Our findings show an average return to having an apprenticeship on earnings that persists (at least up to age 28). However, the earnings differential is much smaller for women (especially at the advanced level). This difference is driven strongly by the vocational sector. When we compare men and women who undertook the same level of apprenticeship, there is a residual earnings differential at age 23 that does not disappear even when we compare very similar workers. Comparisons with survey data suggest that longer hours of work by men are an important driver of this difference, though they do not entirely account for it among those educated to a more advanced level. It also stands in contrast to the gender gap among those with a vocational (non-apprenticeship) education at Level 3. In this case, the gender earnings gap is driven by a relatively small number of characteristics.
Overall, the results in this paper should give cause for optimism that apprenticeships really do generate a positive return in the labour market for young people. Our analysis suggests that this is unlikely to be driven by selection. Increasing opportunities for young people to access apprenticeships does seem to be a worthwhile policy, especially since these returns are experienced by individuals who leave school with low to medium qualifications. However, the paper also illustrates huge variability in the returns to apprenticeships. This is largely driven by the sectors in which people specialize and is a particularly important source of the gender earnings gap for those educated up to Level 3 (i.e. upper secondary education). A practical implication is that careers information for students should pay careful attention to the type of apprenticeships available rather than encouraging students to take any type of apprenticeship at all.

Description of data sets
We use a number of administrative data sets for England that may be linked. The linked data set is known as the Longitudinal Educational Outcomes (LEO) data set, which combines administrative data on education and tax records. This is one of the first papers to use this data set. The two sections below describe the datasets and the variables used.

Education data
We use the National Pupil Database (NPD), the Individualized Learner Record (ILR) and data from the Higher Education Statistics Agency (HESA). Linked together, these data provide information on the full education career of students in England in state schools, from primary school to university.
The NPD data set contains detailed information about the attainment of students throughout the years of full-time compulsory education (age 5-16). The curriculum in these years is divided into 'Key Stages'. Key Stage 2 ends at the end of primary school when students are aged 11 (Year 6). At this stage, students undertake national tests in maths, English and science. We standardize scores by year and use them in our analysis (described further below).
At the end of secondary school, students undertake examinations for the General Certificate of Secondary Education (GCSEs). This comes at the end of Key Stage 4 (KS4) when students are aged 16 (Year 11). In addition to examination results, the NPD data contains information from the pupil census. We use the secondary school attended and the following demographic characteristics: gender, ethnicity, whether English is spoken as a first language, eligibility to receive Free School Meals and the Income Deprivation Affecting Children Index, which is a measure of deprivation in the locality where the student lives (defined below).
For post-16 education, we use the ILR, which is a register of all state-funded vocational courses. It is from this data set that we gather all the information on the main qualification (level, achievement, sector, start and end dates) and on the characteristics of the apprenticeships. As already mentioned in the main text, we include in our sample only those students whose highest level of achieved education is a vocational qualification at Level 2 or at Level 3. HESA data provides information on students in higher education. We use this information to check the highest qualification of each learner.
Tables A.4a and A.4b show summary statistics for the variables used in the main analysis. Together with secondary school and cohort dummies, these are the variables used in most of the regressions (unless otherwise specified).
Some of them (White British, English at home, Eligible for FSM) are included in Tables 2a and 2b and discussed in the text. Here we describe three sets of variables that are not examined at length in the main text, namely the IDACI score, the KS2 scores and the GCSE score.
The Income Deprivation Affecting Children Index (IDACI) measures the proportion of all children aged 0 to 15 living in income deprived families. We use this in combination with eligibility for Free School Meals as a measure of socio-economic background.
The KS2 scores (in English, maths and science) are the scores obtained in the examinations at the end of primary school (at age 11). For ease of comparison over time, the scores are standardized by cohort (i.e. for each cohort the mean is 0 and the standard deviation is 1). Of all the baseline characteristics listed in this table, this is the only variable with some missing data. This variable is missing for 4% to 13% of male students and for 4% to 7% of females, depending on the group.
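As an illustration of the within-cohort standardization described above, the following is a minimal sketch rather than the procedure actually applied to the NPD. The record layout (`cohort`, `score`), the example values and the use of the population standard deviation are all assumptions made for the example. Missing scores are left as missing.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_cohort(records):
    """Return z-scores of 'score' computed separately within each 'cohort'.

    Records with a missing score (None) are excluded from the cohort mean
    and standard deviation and are returned as None.
    """
    by_cohort = defaultdict(list)
    for r in records:
        if r["score"] is not None:
            by_cohort[r["cohort"]].append(r["score"])

    # Assumption: population standard deviation within each cohort.
    stats = {c: (mean(v), pstdev(v)) for c, v in by_cohort.items()}

    z_scores = []
    for r in records:
        if r["score"] is None:
            z_scores.append(None)
        else:
            m, s = stats[r["cohort"]]
            z_scores.append((r["score"] - m) / s)
    return z_scores


# Hypothetical raw scores for two cohorts.
records = [
    {"cohort": 2003, "score": 20.0},
    {"cohort": 2003, "score": 30.0},
    {"cohort": 2003, "score": 40.0},
    {"cohort": 2004, "score": 50.0},
    {"cohort": 2004, "score": 70.0},
    {"cohort": 2004, "score": None},
]
z = standardize_by_cohort(records)
```

After this step, scores are expressed relative to the student's own cohort, so a value of 1 means one standard deviation above the cohort mean regardless of the year in which the test was taken.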
Our main analysis reports the estimates when observations with missing values are dropped. However, as a robustness check, we have estimated the main regressions (equations (2) and (3)) with imputed values, where we also added a dummy to indicate the observations with missing values.
The GCSE score is an overall score that takes into account the grades obtained in the different subjects (e.g. English and maths). As with the KS2 scores, the GCSE score has been standardized by cohort.
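The robustness check with imputed KS2 values can be sketched as follows. The text does not state the imputation method, so this example assumes that a missing standardized score is replaced by the cohort mean (which is 0 after standardization) and that an indicator flags the imputed observations. The function name and the data are hypothetical.

```python
def impute_with_flag(scores, fill_value=0.0):
    """Replace missing scores (None) with fill_value and flag them.

    Returns two lists of equal length: the imputed scores and a 0/1
    indicator that equals 1 where the original score was missing.
    """
    imputed = [fill_value if s is None else s for s in scores]
    missing = [1 if s is None else 0 for s in scores]
    return imputed, missing


# Hypothetical standardized KS2 scores with two missing values.
ks2 = [0.4, None, -1.1, None]
ks2_imputed, ks2_missing = impute_with_flag(ks2)
```

Including both the imputed score and the indicator in a regression keeps these observations in the sample while allowing their mean outcome to differ from that of students with observed scores.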
The summary statistics (below) show that those whose highest attainment is at Level 2 have worse prior attainment than those whose highest attainment is at Level 3. Within each level, those without an apprenticeship have lower prior attainment than those with an apprenticeship. The only exception is females educated up to Level 3, for whom the GCSE score is slightly higher for those without an apprenticeship.

Labour market outcomes
The labour market information comes from individual records held by Her Majesty's Revenue & Customs (HMRC). This is merged with the education data from administrative records (described above). Together these form the LEO ('Longitudinal Educational Outcomes') data set.
The P45 file contains the employment dates. We combine this information with information coming from other HMRC records, particularly from earnings and benefits to help us define the start and end dates of employment spells, when these are missing or inaccurate in the P45 file. Using this information, we compute an indicator for being in employment (which takes the value of 1 if the individual has worked at least one day during the tax year). We also compute the number of working days.
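The construction of the employment indicator and of the number of working days from employment spells can be sketched as below. This is an illustration under stated assumptions, not the code used on the HMRC records: spells are represented as hypothetical (start, end) date pairs, days are counted as calendar days in employment, and overlapping spells are counted once. The tax year is taken to run from 6 April to 5 April, as in the UK.

```python
from datetime import date, timedelta


def days_worked_in_tax_year(spells, start_year):
    """Count distinct calendar days in employment during one tax year.

    The tax year runs from 6 April of start_year to 5 April of the
    following year. Each spell is an inclusive (start_date, end_date)
    pair; days covered by more than one spell are counted once.
    """
    ty_start = date(start_year, 4, 6)
    ty_end = date(start_year + 1, 4, 5)
    worked = set()
    for spell_start, spell_end in spells:
        # Clip the spell to the tax year before enumerating its days.
        first = max(spell_start, ty_start)
        last = min(spell_end, ty_end)
        day = first
        while day <= last:
            worked.add(day)
            day += timedelta(days=1)
    return len(worked)


# Two hypothetical, partly overlapping employment spells.
spells = [
    (date(2010, 3, 1), date(2010, 4, 30)),
    (date(2010, 4, 20), date(2010, 5, 10)),
]
days = days_worked_in_tax_year(spells, 2010)

# Employment indicator: 1 if at least one day was worked in the tax year.
employed = 1 if days >= 1 else 0
```

Collecting days in a set is a simple way to avoid double counting when an individual holds two jobs at once; whether concurrent spells should be treated this way is a choice made for the example.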
Annual earnings of employees are declared by employers and collected in the P14 data set. This register contains all earnings of all employees who earn above the Lower Earnings Limit (LEL). From this file, we construct one of our dependent variables: the log of real earnings, at 2010 prices. For our analysis, we only consider positive earnings. As mentioned in the main text, we are able to match at least 80% of each cohort with HMRC data (ranging from 80% for the 2003 cohort to 85% for the 2008 cohort). Well over 90% of the matched individuals have positive earnings, with little difference across cohorts. Finally, we also derive the number of waves in which each individual has been observed with positive earnings, at least 2 years after the end of their education. This is a proxy for labour market experience and it is included as a control in Tables 8 and 9.
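The earnings variable can be illustrated with a short sketch: nominal annual earnings are deflated to 2010 prices and then logged, and non-positive earnings are excluded. The price index values below are hypothetical placeholders, not the deflator used in the paper, and the function name is an assumption made for the example.

```python
import math

# Hypothetical price index with 2010 as the base year (2010 = 100).
PRICE_INDEX = {2009: 97.0, 2010: 100.0, 2011: 104.0}


def log_real_earnings(nominal, year, price_index=PRICE_INDEX, base_year=2010):
    """Return log earnings expressed in base_year prices.

    Returns None when earnings are missing or not strictly positive,
    mirroring the restriction to positive earnings.
    """
    if nominal is None or nominal <= 0:
        return None
    real = nominal * price_index[base_year] / price_index[year]
    return math.log(real)


# 10,400 earned in 2011 corresponds to 10,000 at 2010 prices
# under the hypothetical index above.
value = log_real_earnings(10400.0, 2011)
```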
A possible limitation of this data is that low earners may not be adequately captured as employers have no obligation to record earnings below the LEL threshold. However, we do not think this is likely to be an important consideration for our analysis. Firstly, the threshold is reasonably low: in the period considered (2004 to 2015), the LEL ranged from £77 to £111 per week. To put this into context, in 2015, this is just below 41% of the weekly full-time minimum pay. Secondly, all employers having at least one employee earning above the threshold must report the earnings of all employees, including the low earners. Another limitation of this data is that it does not contain information on hours of work. This is a general feature of analysis based on tax records.
Tables A.5a and A.5b report the summary statistics for the variables described above for males and females, respectively. The tables indicate that within each Level, those with an apprenticeship work more days and have higher earnings than those without an apprenticeship. This is despite having less labour market experience (notice that the time spent working as an apprentice is excluded by definition). The latter is a consequence of apprentices having spent longer in education than non-apprentices (as shown in Tables A.1a and A.1b). With the exception of males educated up to Level 3, a larger share of apprentices is in employment compared to non-apprentices within each category, although the difference is very small. Finally, within each group, males earn more than females, despite females working more days than males on average. This could imply that males have higher hourly wages and/or that males work longer hours per day.

Summary statistics for the Instrument
In this section, we present detailed statistics for the instrument used in the main analysis. As discussed in the main text, our Instrumental Variable strategy relies on within-school, cross-cohort variation in peers' participation in apprenticeships. More specifically, we consider the share of students of the same gender, from the same secondary school and the same cohort who go on to start an apprenticeship in the two academic years following the end of KS4. Table A.6 reports summary statistics related to the instrument. The first line of Panel A reports the average number of peers (as defined above). On average, male students educated up to Level 2 (3) have 109 (112) peers of the same gender and cohort in their school. Female students have 107 and 110 peers, respectively. The next line reports the average share of apprenticeships amongst the peers of a given student. Among males educated up to either Level 2 or 3, about 11% of their peers start an apprenticeship within 2 years. For females this is lower at 8%. We also adopt a placebo instrument defined as the share of apprenticeships started by students of the same gender and from the same cohort who studied in the nearest school. At the bottom of Panel A, we can see that on average in the nearest school, 8% and 6% start an apprenticeship for males and females, respectively. Finally, Panel B of Table A.6 shows very limited variation in the average share of peers starting an apprenticeship across the six cohorts of the main analysis. The correlation between the share of apprentices for cohort t and cohort t − 1 suggests that within school, the share of apprenticeships is quite persistent over time, especially among males.
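To make the definition of the instrument concrete, the following sketch computes, for each student, the share of peers of the same gender, school and cohort who start an apprenticeship. It assumes a leave-one-out definition (the student's own choice is excluded from the share), which is one natural reading of "peers" but is not confirmed by the text. The field names and the data are hypothetical.

```python
from collections import defaultdict


def peer_apprenticeship_share(students):
    """Leave-one-out share of apprenticeship starters among peers.

    Peers are students in the same school, cohort and gender group.
    'apprentice' is 1 if the student starts an apprenticeship within
    the relevant window and 0 otherwise. Returns None for a student
    with no peers in their group.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [students, starters]
    for s in students:
        key = (s["school"], s["cohort"], s["gender"])
        totals[key][0] += 1
        totals[key][1] += s["apprentice"]

    shares = []
    for s in students:
        n, starters = totals[(s["school"], s["cohort"], s["gender"])]
        if n < 2:
            shares.append(None)
        else:
            # Exclude the student's own outcome from numerator and denominator.
            shares.append((starters - s["apprentice"]) / (n - 1))
    return shares


# Hypothetical students: five males in school A and one in school B.
students = [
    {"school": "A", "cohort": 2003, "gender": "M", "apprentice": 1},
    {"school": "A", "cohort": 2003, "gender": "M", "apprentice": 0},
    {"school": "A", "cohort": 2003, "gender": "M", "apprentice": 0},
    {"school": "A", "cohort": 2003, "gender": "M", "apprentice": 1},
    {"school": "A", "cohort": 2003, "gender": "M", "apprentice": 0},
    {"school": "B", "cohort": 2003, "gender": "M", "apprentice": 1},
]
shares = peer_apprenticeship_share(students)
```

The placebo instrument described above would follow the same logic, with the share taken over the corresponding group in the nearest school rather than in the student's own school.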