
Widening participation in higher education: analysis using linked administrative data

Authors


Address for correspondence: Anna Vignoles, Department of Quantitative Social Science, Institute of Education, 20 Bedford Way, London, WC1H 0AL, UK.
E-mail: A.Vignoles@ioe.ac.uk

Abstract

Summary.  The paper makes use of newly linked administrative education data from England to understand better the determinants of participation in higher education (HE) among individuals from low socio-economic backgrounds. The data are unique in being able to follow the population of two cohorts of pupils in England—those who might have entered HE between 2004–2005 and 2006–2007—from age 11 to age 20 years. The findings suggest that, although large differences in HE participation rates and participation rates at high status universities by socio-economic background remain, these differences are substantially reduced once prior achievement is included. Moreover, these findings hold for both state and private school pupils. This result suggests that poor achievement in secondary schools is more important in explaining lower HE participation rates among pupils from low socio-economic backgrounds than barriers arising at the point of entry to HE. These findings are consistent with the need for earlier policy intervention to raise HE participation rates among pupils from low socio-economic backgrounds.

1. Introduction

Participation in higher education (HE) has expanded substantially in England over the last half-century. Yet concerns about inequality of access to university for socio-economically disadvantaged pupils remain a major policy issue (Department for Education and Skills, 2003, 2006). This inequality is illustrated by the fact that only 14% of pupils who are eligible for free school meals (FSMs), which is a commonly used indicator of low socio-economic status (SES) in England, participate in HE at age 18–19 or 19–20 years, compared with 33% of pupils who are not eligible for FSMs. Furthermore, despite decades of policy designed to ‘widen’ participation, i.e. to increase the participation in HE of pupils from lower socio-economic backgrounds and other under-represented groups, socio-economic inequality in degree participation and achievement appears to have worsened in England during the 1980s and early 1990s by some measures (Blanden and Machin, 2004; Galindo-Rueda et al., 2004; Glennerster, 2001; Machin and Vignoles, 2004), and the prevailing policy discussion continues to stress the need to ‘widen’ participation in HE further. In this paper we seek to inform one aspect of this debate by addressing the specific question of when socio-economic inequalities in achievement in education emerge in England and the extent to which these differences in pre-university achievement can help to explain the sizable socio-economic differences in participation in HE that we observe.

We use a unique data set to carry out our analysis; unique in the sense that education data from various administrative sources have been linked to create a census of the population of secondary school pupils in England (with approximately half a million pupils per cohort). The data allow us to follow two cohorts of pupils through the English education system from age 11 years as they exit primary school, through secondary school and on to potential participation in HE anywhere in the UK (i.e. including Scotland, Wales and Northern Ireland) at age 18–19 years (when first eligible) or 19–20 years (after a single year out). These linked administrative data come from primary and secondary schools, as well as from colleges of further education and universities. Hence, unlike previous work using individual level administrative data from HE records alone, our analysis is based on population data for both participants and non-participants in HE.

Our approach to estimating the relationship between SES and participation in HE draws on other literature which has shown that differences in education achievement by socio-economic background emerge early (see, for example, Centre for Market and Public Organisation (2006) and Feinstein (2003) for the UK and Cunha and Heckman (2007) and Cunha et al. (2006) for the USA). A feature of our data is that they have extremely detailed information on pupils’ prior educational achievement in both primary and secondary school. By controlling for pupils’ prior achievement we can see whether the socio-economic differences in HE participation rates that are observed at age 18–19 or 19–20 years are reduced substantially, or even disappear, once we take into account differences in earlier achievement. Specifically, if young people with similar test scores at ages 11, 14, 16 and 18 years have a similar probability of being enrolled in university regardless of their socio-economic background, then we can conclude that the inequality in university participation across socio-economic groups is due to the poorer earlier educational achievement of lower SES children. If weak prior educational achievement is at the root of socio-economic inequalities in participation in HE, then policies that are designed to remove barriers at the point of entry to HE—e.g. bursaries—might not be particularly effective at raising participation in HE among disadvantaged youth.

The achievements of pupils in the same school, including whether or not they participate in HE, are likely to have some similarities owing to the influence of schools, peers and teachers within schools. We thus adopt a two-level nested structure with pupils at level 1 grouped within schools at level 2. If we denote by yij the outcome of individual i in school j (i=1,…,nj;j=1,…,J), then the two-level linear model can be written

$$y_{ij} = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + e_{ij} \qquad (1)$$

Here yij is a binary variable indicating whether the person enrolled in HE or not at age 18–19 years or 19–20 years (henceforth age 19 or 20 years). The parameter β0 is the regression intercept, xij represents covariates that can vary between individuals or schools, β represents the regression coefficients for these covariates; uj is the effect of school j and eij is an independently distributed individual level error term. We assume that inline image for p=1,…,P and inline image.

Our covariate of interest is our measure of SES, a vector of indicator variables for the SES quintile to which the ith person belongs. Another covariate of interest is a vector of measures of the individual's prior achievement at ages 11, 14, 16 and 18 years. The model is estimated sequentially: we start by estimating the SES differences in HE participation in a model including only the SES quintiles. We then examine the extent to which these differences can be explained by differences in other characteristics, by successively including individual covariates and school effects (see Section 4 for a more detailed discussion of the way in which we model these effects), and an individual's prior achievement, in the model.
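The logic of this sequential approach can be illustrated with a small stratified comparison. The sketch below is not the multilevel model of equation (1): it uses invented counts and a single binary achievement measure (both are assumptions made purely for illustration) to show how a raw SES gap in participation can disappear once pupils are compared within achievement strata.

```python
# Invented counts (an assumption, not the paper's data), keyed by
# (ses_group, achievement_group) -> (number of pupils, number entering HE).
COUNTS = {
    ("low", "high_achievers"): (20, 12),
    ("low", "low_achievers"): (80, 8),
    ("high", "high_achievers"): (60, 36),
    ("high", "low_achievers"): (40, 4),
}


def participation_rate(ses, achievement=None):
    """HE participation rate for an SES group, optionally within one achievement stratum."""
    pupils = entrants = 0
    for (cell_ses, cell_ach), (n, k) in COUNTS.items():
        if cell_ses == ses and achievement in (None, cell_ach):
            pupils += n
            entrants += k
    return entrants / pupils


def ses_gap(achievement=None):
    """Low-SES minus high-SES participation rate."""
    return participation_rate("low", achievement) - participation_rate("high", achievement)


raw_gap = ses_gap()                      # step 1: SES only
gap_high = ses_gap("high_achievers")     # step 2: within achievement strata
gap_low = ses_gap("low_achievers")

print(f"raw gap: {raw_gap:+.2f}")
print(f"gap among high achievers: {gap_high:+.2f}")
print(f"gap among low achievers: {gap_low:+.2f}")
```

In this constructed example the raw gap of 20 percentage points arises entirely because low SES pupils are less likely to be high achievers; within each stratum the two groups participate at the same rate.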

We know from existing evidence that not all university degrees have equal economic value and the type of university that is attended also makes a difference to a pupil's labour market outcomes. In the UK the wage benefit from a degree varies markedly according to both the degree subject studied and the type of institution attended (Chevalier and Conlon, 2003; Iftikhar et al., 2008). Previous research has suggested that low SES students in the UK are concentrated in modern ‘post-1992’ universities (Connor et al., 1999) and that degrees from these institutions attract lower labour market returns. Our data include information on the university that is attended by each HE participant, enabling us to model the type of HE institution that a student enrols in. In particular, we identify a group of ‘high status’ universities, as measured by the quality of research carried out by these institutions (research assessment exercise scores) and show that there is a large difference in attendance at these high status institutions according to SES. For example, only 2% of state school pupils who are entitled to FSMs (17% of FSM eligible HE participants) attend a high status institution at age 19 or 20 years compared with 10% of pupils who are not entitled to FSMs (32% of non-FSMs-eligible HE participants).

We model the probability of attending a high status institution with a model that is similar to equation (1), where the sample is restricted to those who enrol in HE. We denote by gij the outcome of individual i in school j (i=1,…,nj;j=1,…,J ) and use the model

$$g_{ij} = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + e_{ij} \qquad (2)$$

Here gij takes a value of 1 if the ith individual attends a high status university at age 19 or 20 years and 0 if they attend any other HE institution. The parameter β0 is the regression intercept, xij represents covariates that can vary between individuals or schools, β represents the regression coefficients for these covariates, uj is the effect of school j and eij is an independently distributed individual level error term. We assume that inline image for p=1,…,P, and inline image.

The rest of the paper is organized as follows: the relevant literature is discussed in Section 2 and some information about the English school system and the UK HE system is provided in Section 3. Section 4 provides a description of the data that we use, Section 5 discusses the methodology that we adopt and Section 6 presents our results. Section 7 concludes.

2. Previous research

Part of the motivation for this study is the observation that the socio-economic difference in participation in HE and achievement worsened in the UK during the 1980s and early 1990s (Blanden and Machin, 2004; Galindo-Rueda et al., 2004; Glennerster, 2001; Machin and Vignoles, 2004), although it appears to have narrowed since then (Raffe et al., 2006).

There is a literature that has examined the factors influencing educational achievement of different types of pupils, particularly in terms of the role of socio-economic background (Blanden and Gregg, 2004; Carneiro and Heckman, 2002, 2003; Gayle et al., 2002; Meghir and Palme, 2005; Haveman and Wolfe, 1995). Such studies have generally found that an individual's probability of participating in HE is significantly determined by their parents’ characteristics, particularly parental level of education and/or SES.

A related literature has focused on the timing of the emergence of differences in the cognitive development of different groups of children (see Centre for Market and Public Organisation (2006) and Feinstein (2003) for the UK and Cunha and Heckman (2007) and Cunha et al. (2006) for the USA). This literature suggests that differences in educational achievement emerge early in preschool and primary school (Cunha and Heckman, 2007; Demack et al., 2000), rather than later in life.

Evidence from the USA suggests that potential barriers at the point of entry to university, such as credit constraints (arising from low parental income and/or a lack of access to funds), do not play a large role in determining participation in HE among lower SES pupils (Cunha et al., 2006; Carneiro and Heckman, 2002). However, Belley and Lochner (2007) suggest that credit constraints have started to play a more important role in determining participation in HE in the USA in recent years.

The evidence for the UK is equally mixed. Gayle et al. (2002) found that differences in participation in HE across different socio-economic groups remained significant, even after allowing for educational achievement in secondary school, suggesting that choices at 18 years of age (and potentially credit constraints) play a role in explaining the inequalities in participation in HE that we observe. Dearden et al. (2004) also found limited evidence of credit constraints for members of the 1958 and 1970 British cohort studies. Bekhradnia (2003), in contrast, found that, for a given level of educational achievement at age 18 years (as measured by A-level point score), there is no significant difference by socio-economic background in university participation rates.

Even if prior achievement explains the majority of the difference in HE participation rates of different groups, other barriers exist at the point of entry to university (see Connor et al. (2001), Forsyth and Furlong (2003), Haggis and Pouget (2002) and Quinn (2004) for barriers facing students from lower socio-economic backgrounds). The evidence on the role of these factors was reviewed in Dearing (1997) and recently by Gorard et al. (2006), who made the case for further careful quantitative analysis of participation in HE by using data that include information on participants and non-participants, and measures of prior educational achievement, such as that presented in this paper.

3. Background on English school and UK higher education systems

Over the period that is covered by our study, children in England sat a series of nationally assessed standardized achievement tests at ages 7, 11 and 14 years, assessing their abilities in English, mathematics and science. (The tests at age 14 years have since been abandoned and the tests at age 11 years are now teacher assessed.) At age 16 years most pupils take national examinations known as General Certificates of Secondary Education (GCSEs). Pupils have some choice over the combinations of subjects that they take, and many take vocational qualifications that are distinct from but meant to be equivalent to GCSEs. Most pupils might expect to sit between eight and 10 examinations in different subjects, and the ‘expected’ level of achievement is five GCSEs (or equivalents) at grades A*–C. Around 60% of pupils achieved this benchmark in 2007 (Department for Education, 2010).

Among pupils who choose to remain in education past the compulsory school leaving age of 16 years, those pursuing the academic route tend to take advanced level examinations (A levels). These are nationally examined tests taken at age 18 years and are the majority qualification taken by those going on to university (although some pupils enter HE with vocational qualifications). To achieve the national qualifications framework level 3 threshold, pupils must obtain two A-level passes at grades A–E (or equivalent), although most pupils going on to university—particularly high status institutions—would expect to be achieving significantly more than that.

HE in the UK is largely (but not exclusively) carried out in 130 universities in England, 20 universities in Scotland, two universities in Northern Ireland and 11 universities in Wales. We can observe whether our cohort members enrol in any of these institutions. There is one private university in the UK and this institution is also included in our analysis. Some HE is also taught in colleges of further education but generally at sub-Bachelor degree level. We restrict our sample to those individuals who enrol in university to take a Bachelors degree, which is generally a 3-year degree programme.

There is a centralized admissions procedure for entry into HE which covers the whole of the UK, known as the Universities and Colleges Admissions Service, and most young entrants (age 18–20 years) use this system. Students apply through this system but the decision on whether they gain entry into a particular institution is normally made by the admissions tutor in the relevant university department. Offers are generally made on the basis of achievement only, although students do write a supporting statement which is considered by admissions tutors. Very few institutions interview candidates, though Oxford and Cambridge are notable exceptions.

Participation in HE in the UK has increased dramatically over time and in 2007, the period pertaining to our data, 43% of 17–30-year-olds went to university (including 21% of those aged 18–19 years and 10% of those aged 19–20 years) (Department of Business, Innovation and Skills, 2010). Some of this increase is attributable to the reclassification of polytechnics (which used to specialize in vocational education) into universities in 1992. Although overall participation has been rising, under-representation of certain groups of pupils in relation to their incidence in the population remains a major policy concern (Department for Education and Skills, 2003, 2006). This is reflected in the myriad initiatives designed to improve the rate of participation of ‘non-traditional’ students, such as those from low SES backgrounds.

Concerns about access to HE increased following the introduction of tuition fees in 1998. Before that time, university attendance was free at the point of use and there was means-tested state support for living costs. Although the 1998 fees were means tested and modest (around £1000 at that time), they had to be paid before starting a course and there were fears that the prospect of fees would create a barrier to participation in HE for poorer students (Callender, 2003). Some evidence suggests that poorer students in the UK leave university with more debt and appear more debt averse in the first place (Pennell and West, 2005). In spite of this, the introduction of fees in 1998 did not reduce the relative HE participation rate of poorer students (Universities UK, 2007; Wyness, 2009). This outcome may be because the fees were a relatively small proportion of the total cost of going to university or because students were forward looking and expected substantial returns on their investment.

The first cohort in our sample potentially entered university in 2004–2005 or 2005–2006, depending on whether the student took a year out between school and university. This cohort therefore experienced the 1998 fee regime. The second cohort potentially entered university in 2005–2006 or 2006–2007 and those students who entered university straight after school would have experienced the 1998 fee regime. Those who took a ‘gap year’ between school and university would have experienced different funding arrangements. Specifically, the 2004 Higher Education Act introduced higher fees that were no longer payable up front and could vary across institution and subject (though in practice they did not), accompanied by commensurately higher student support. These changes took effect in 2006–2007 (see Dearden et al. (2011) and Crawford and Dearden (2010) for an evaluation of the effect of these changes).

To allow for differences in the fee regime that were faced by these students, we include a cohort indicator variable in our model. However, evidence suggests that many students in our second cohort are likely to have decided to forgo a gap year to avoid the increase in tuition fee in 2006–2007 (Crawford and Dearden, 2010); hence most students in our sample will have faced the 1998 regime. (Chowdry et al. (2008) analysed the participation decisions of the first cohort of pupils only, whom we might reasonably expect to be unaffected by the 2006–2007 reforms, and found qualitatively similar results to those described in this paper.) Although differences in funding arrangements for HE across Wales, Scotland and England emerged during this period (see Wyness (2009)), both cohorts in our sample faced paying tuition fees (or a graduate endowment) in all the countries of the UK.

Further changes to student support were introduced in 2007–2008 and more recently the UK Government has announced substantial increases in tuition fees accompanied by higher student support following a review of the funding of HE in England by Lord Browne (see Browne (2010)). Our findings therefore apply to an era in which there were relatively small tuition fees and may not necessarily hold under the new fee regime that is due to be introduced in 2012–2013.

4. Data

We use linked individual level administrative data from the national pupil database, the National Information System for Vocational Qualifications and the Higher Education Statistics Agency for two cohorts of pupils, totalling approximately half a million children in each cohort, who sat GCSE examinations at age 16 years in 2001–2002 (cohort 1) and 2002–2003 (cohort 2). These data cover the population of pupils taking these tests and record their participation at HE institutions anywhere in the UK at age 19 or 20 years. Table 1 outlines the progression of these cohorts through the education system.

Table 1.   Progression of our cohorts through the education system

| Outcome | Results for cohort 1 | Results for cohort 2 |
|---|---|---|
| Born | 1985–1986 | 1986–1987 |
| Sat key stage 1 (age 7 years) | † | † |
| Sat key stage 2 (age 11 years) | 1996–1997 | 1997–1998 |
| Sat key stage 3 (age 14 years) | 1999–2000 | 2000–2001 |
| Sat GCSEs or key stage 4 (age 16 years) | 2001–2002 | 2002–2003 |
| Sat A levels or key stage 5 (age 18 years) | 2003–2004 | 2004–2005 |
| HE participation (age 19 years) | 2004–2005 | 2005–2006 |
| HE participation (age 20 years) | 2005–2006 | 2006–2007 |

†Not applicable.

4.1. Data linkage

The data linkage process was carried out by the Fischer Family Trust on behalf of the Department for Education. Unfortunately, only very limited information is available on the linkage process and we could not test the robustness of the linking algorithms. Broecke and Hamed (2008) reported that two linking algorithms were used. Firstly, administrative data from different schools and colleges (the national pupil database and National Information System for Vocational Qualifications data) were linked by using a unique administrative pupil identification number. This type of linkage is unlikely to produce any sizable biases, though more mobile pupils may be less likely to have a complete school record. Administrative data from the Higher Education Statistics Agency were then linked to the school or college data by using probabilistic matching on the basis of a set of identifying variables including name, gender, date of birth and postcode. Given the large number of variables that are used in the linkage algorithm and the use of date of birth and postcode, the linking process is likely to be of high quality. We cannot, however, provide estimates of the number of incorrectly linked records (where individuals in the schools data are matched to the wrong individual in the HE data) nor the number of incomplete records (where individuals in the schools data are not matched to an HE record even when the individual attended university or vice versa).
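To make the second linkage step concrete, the following sketch scores candidate pairs by weighted agreement on the four identifying variables named above. It is a simplified stand-in for probabilistic matching, not the algorithm that was actually used (which is not documented): the weights, the threshold, the record layout and the example records are all hypothetical.

```python
def normalise(text):
    """Lower-case and strip spaces so trivial formatting differences do not block a match."""
    return "".join(text.lower().split())


# Hypothetical agreement weights: agreement on more discriminating fields counts for more.
WEIGHTS = {"name": 3.0, "gender": 0.5, "dob": 4.0, "postcode": 2.5}
THRESHOLD = 7.0  # hypothetical cut-off for accepting a link


def match_score(school_rec, he_rec):
    """Sum the weights of the identifying fields on which two records agree."""
    return sum(
        weight
        for field, weight in WEIGHTS.items()
        if normalise(school_rec[field]) == normalise(he_rec[field])
    )


def link(school_rec, he_records):
    """Return the best-scoring HE record, or None if no candidate reaches the threshold."""
    best = max(he_records, key=lambda rec: match_score(school_rec, rec), default=None)
    if best is not None and match_score(school_rec, best) >= THRESHOLD:
        return best
    return None


# Invented records for illustration only.
pupil = {"name": "Jane Smith", "gender": "F", "dob": "1986-03-14", "postcode": "AB1 2CD"}
he_records = [
    {"he_id": "A1", "name": "jane smith", "gender": "F", "dob": "1986-03-14", "postcode": "XY9 8ZW"},
    {"he_id": "B2", "name": "Jane Smyth", "gender": "F", "dob": "1985-11-02", "postcode": "AB1 2CD"},
]

linked = link(pupil, he_records)
print(linked["he_id"] if linked else "no link")
```

Record A1 agrees on name, gender and date of birth and so clears the threshold despite a changed postcode, whereas B2 agrees only on gender and postcode. A threshold set too low would create incorrectly linked records and one set too high would leave records incomplete, which are the two kinds of error described above.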

For the first cohort in our linked data set, we know that, of those English domiciled 18-year-olds who were observed in HE in 2004–2005 in the Higher Education Statistics Agency standard registration population, 19% did not have a linked school record (Broecke and Hamed, 2008). Broecke and Hamed (2008) investigated this issue and reassuringly concluded that most of these non-linked pupils were not in state schools according to the previous institution field in the Higher Education Statistics Agency record (for example they may have been foreign students who came to the UK to go to university). This suggests that our analysis covers the vast majority of those in English state secondary schools, which is our population of interest.

The absence of information on the quality of the linkage process limits our ability to deal with any resultant biases, e.g. by modelling the errors that are produced by the data linkage process (see Chesher and Nesheim (2006) for a survey of these issues). As Chambers (2009) pointed out, however, models for dealing with linkage errors when using hierarchical models (such as fixed or random-effects models, as we use in our study) have in any case not yet been developed. This is an issue for future research.

4.2. Data coverage

The data provide a census of state school children in England and include academic outcomes in the form of achievement test scores at ages 11 and 14 years, and public examination results (GCSEs, A levels and equivalent vocational qualifications) at ages 16 and 18 years. The data also include a variety of pupil characteristics (such as date of birth, home postcode, ethnicity, special educational needs, entitlement to FSMs and whether English is an additional language), plus a school identifier. The data additionally contain public examination results at ages 16 and 18 years for children who were educated outside the state school sector (including private school pupils). For those educated in the private school sector (around 6.5% of each cohort) the only additional information that we have is gender and age.

Our data also contain information on whether or not each pupil enrolled in HE at age 19 or 20 years, though we do not know whether they subsequently dropped out (this issue is covered in Powdthavee and Vignoles (2009)). Ideally we would want to know whether an individual applied to an HE institution or not, rather than simply whether they were admitted. The acceptance rate for applications to UK higher education was 77% (Universities and Colleges Admissions Service) during the period that is covered by our data. Unfortunately we do not have information on whether individuals applied for HE. This means that we cannot determine whether the HE participation rate of a particular group, such as those from low SES backgrounds, can be explained by the fact that they do not apply to university in the first place.

4.3. Outcomes

For the purposes of this paper, participation in HE is defined as enrolling in a UK HE institution at age 19 or 20 years, as detailed in Section 3.

To derive our measure of HE status, we linked in institution level average research assessment exercise scores—a measure of research quality—from the 2001 exercise and included all Russell group institutions (a group of 20 well-regarded universities), plus any UK university with an average 2001 research assessment exercise score exceeding the lowest found among the Russell group universities. This gives a total of 41 ‘high status’ universities (which are listed in Table 2) out of a total of 163. Using this definition, 35% of HE participants in our data attend a high status university in their first year, which equates to 11% of our sample as a whole (including participants and non-participants).
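The selection rule described above can be written in a few lines. In this sketch the institution names and scores are placeholders rather than actual 2001 research assessment exercise values, and the variable names are our own.

```python
# Hypothetical average RAE scores (placeholders, not the 2001 values).
rae_score = {"Uni A": 5.8, "Uni B": 5.1, "Uni C": 5.4, "Uni D": 4.9, "Uni E": 5.1}
russell_group = {"Uni A", "Uni B"}  # hypothetical membership

# The bar is the lowest average score found among Russell group institutions.
threshold = min(rae_score[u] for u in russell_group)

# High status = every Russell group member, plus any institution strictly above the bar.
high_status = russell_group | {u for u, s in rae_score.items() if s > threshold}

print(sorted(high_status))
```

Here Uni C qualifies because its score exceeds the Russell group minimum, whereas Uni E, which only equals that minimum, does not.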

Table 2.   High status universities (on our definition)

| Russell group universities | Universities with 2001 research assessment exercise score better than score for lowest Russell group university |
|---|---|
| University of Birmingham | University of the Arts, London |
| University of Bristol | Aston University |
| University of Cambridge | University of Bath |
| Cardiff University | Birkbeck College |
| University of Edinburgh | Courtauld Institute of Art |
| University of Glasgow | University of Durham |
| Imperial College London | University of East Anglia |
| King's College London | University of Essex |
| University of Leeds | University of Exeter |
| University of Liverpool | Homerton College |
| London School of Economics and Political Science | University of Lancaster |
| University of Manchester | University of London (institutes and activities) |
| Newcastle University | Queen Mary and Westfield College |
| University of Nottingham | University of Reading |
| University of Oxford | Royal Holloway and Bedford New College |
| Queen's University Belfast | Royal Veterinary College |
| University of Sheffield | School of Oriental and African Studies |
| University of Southampton | School of Pharmacy |
| University College London | University of Surrey |
| University of Warwick | University of Sussex |
| | University of York |

We recognize that such definitions of institution status are, by their very nature, somewhat arbitrary. Different academic departments within universities will be of differing qualities and we ignore such subject differences here (although recent evidence (Chevalier, 2009) suggests that institutional quality matters more than departmental quality for future wages). Additionally, we have defined status according to research quality and membership of the Russell group, although these indicators of status are not necessarily important in determining the quality of undergraduates’ university experience. However, obtaining a degree from a Russell group institution and attending a university that scored highly in the research assessment exercise both lead to a higher wage return to a degree (see Iftikhar et al. (2008) and Chevalier and Conlon (2003)). We would thus argue that our indicator of status is an important proxy for the nature of HE being accessed, which in turn will have long-run economic implications for these pupils. (We also carried out our analysis by using indicators of whether students attended Oxford or Cambridge or not, and whether they attended a Russell group institution or not, with qualitatively similar results. These results can be found in appendices RA1 and RA2 of our on-line appendix: http://www.ifs.org.uk/publications/4665.)

4.4. Measuring socio-economic background

Ideally, we would want rich individual level data on pupils’ socio-economic background, such as parental education, income and social class. However, the administrative data are weak in this respect: the only information that we observe is the pupil's eligibility for FSMs at age 16 years (which is an indicator of being in receipt of state benefits) and their home postcode at the same age, which we use to link in detailed information about the area in which they live. Moreover, we observe this information for state school pupils only.

Although we could use the FSM indicator alone as our measure of SES, this would capture differences in participation only between those who are eligible (approximately 16% of the school population) and those who are not, i.e. it would distinguish only the lowest part of the SES distribution from the rest.

The inclusion of a range of additional information about the neighbourhood in which a pupil lives would enable us to differentiate pupils who are not eligible for FSMs further; however, the interpretation of these variables when they are included separately in our models is potentially problematic, owing to issues of multicollinearity. For example, to the extent that being on benefit or being a lone parent (which are both strongly correlated with FSM status) predicts where you live and the schools that you access, then some of the SES effect that is measured by FSM status is likely to be loaded onto the neighbourhood variables included.

Although we recognize that it is not ideal, we thus opt to combine individual and neighbourhood level data to create an index of socio-economic background, to provide a broader, more continuous measure of family circumstances. This index combines, by using principal components analysis, the pupil's eligibility for FSMs (measured at age 16 years) with the following neighbourhood-based measures of socio-economic circumstances (linked in on the basis of home postcode at age 16 years):

  (a) their index of multiple deprivation score (which was designed to capture lack of access to jobs or services in seven domains, including health and education, and is available for neighbourhoods containing approximately 700 households);
  (b) their classification of residential neighbourhoods type (which was constructed by using information on socio-economic characteristics, financial holdings and property details, and is available for neighbourhoods containing approximately 15 households);
  (c) three very local area-based measures from the 2001 census (which are available for neighbourhoods containing approximately 150 households); specifically, the proportion of individuals in each area
      (i) who work in higher or lower managerial or professional occupations,
      (ii) whose highest educational qualification is national qualification framework level 3 or above and
      (iii) who own (either outright or through a mortgage) their home.

We are aware of the problems of adopting principal components analysis with dichotomous variables (Kolenikov and Angeles, 2009); however, we are using only one binary variable (eligibility for FSMs)—the remainder are continuous—which should minimize the problem. We also tested the robustness of our results by using polychoric correlations (Olsson, 1979) and the Kaiser–Meyer–Olkin measure of sampling adequacy for this index is 0.8, which is regarded as ‘meritorious’ (Kaiser, 1974). Moreover, our substantive conclusions do not change if we use eligibility for FSMs alone as our measure of socio-economic background, nor if we enter each of our SES measures separately. (See appendices RA3–RA5 of our on-line appendix.)
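As a rough illustration of how such an index can be formed, the sketch below extracts the first principal component of the correlation matrix of one binary and three continuous measures by power iteration. It is a minimal example under stated assumptions: the six pupils and their values are invented, only a subset of the measures listed above is used, ordinary (not polychoric) correlations are computed, and the function names are our own.

```python
import math


def standardise(column):
    """Rescale a column to mean 0 and (population) standard deviation 1."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / sd for v in column]


def first_component(columns, iterations=500):
    """Loadings of the first principal component of the correlation matrix (power iteration)."""
    n = len(columns[0])
    corr = [[sum(a * b for a, b in zip(ci, cj)) / n for cj in columns] for ci in columns]
    vec = [1.0] + [0.0] * (len(columns) - 1)
    for _ in range(iterations):
        vec = [sum(r * v for r, v in zip(row, vec)) for row in corr]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


# Invented pupils, ordered from most to least deprived (illustrative values only).
fsm = [1, 1, 0, 0, 0, 0]                              # eligible for FSMs (binary)
imd = [62.0, 48.0, 35.0, 22.0, 15.0, 6.0]             # deprivation score (higher = more deprived)
professional = [0.10, 0.15, 0.25, 0.35, 0.50, 0.60]   # share in professional occupations
owners = [0.30, 0.35, 0.55, 0.70, 0.80, 0.90]         # share of owner-occupiers

columns = [standardise(c) for c in (fsm, imd, professional, owners)]
loadings = first_component(columns)
if loadings[0] > 0:  # orient the index so that FSM eligibility lowers it
    loadings = [-w for w in loadings]

# Each pupil's index is the loading-weighted sum of their standardised measures.
ses_index = [sum(w * col[i] for w, col in zip(loadings, columns)) for i in range(len(fsm))]
print([round(w, 2) for w in loadings])
print([round(s, 2) for s in ses_index])
```

Because the sign of a principal component is arbitrary, the sketch orients the index explicitly so that higher values correspond to higher SES; the resulting scores rise steadily from the most to the least deprived pupil.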

We split the population into quintiles on the basis of this index of SES, and include indicators for the four lowest quintiles in our models, such that the base case is individuals in the highest quintile. As we do not observe eligibility for FSMs or home postcode for private school pupils, we must make some assumptions about their SES to include them in our analysis. We therefore assume that private school pupils come from families of higher SES than most state school pupils; hence they are allocated to the highest (the first) quintile of the socio-economic distribution, where they make up 34% of the quintile. It is worth noting, however, that excluding private school pupils from our analysis, or making different assumptions about their SES relative to state school pupils, does not substantively change our results; see appendix RA6 of our on-line appendix for results for state school pupils only.
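The construction described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the authors' code: the function names `first_component_index` and `assign_quintiles`, the simulated variables and the choice of orienting the index by the home ownership column are all assumptions made for the example.

```python
import numpy as np

def first_component_index(X, positive_col):
    """Score each row on the first principal component of the standardized columns."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    w = eigvecs[:, np.argmax(eigvals)]      # loadings of the first component
    if w[positive_col] < 0:                 # the sign of a component is arbitrary;
        w = -w                              # orient it so higher means higher SES
    return Z @ w

def assign_quintiles(index):
    """Return 1 for the top fifth of the index through 5 for the bottom fifth."""
    rank = np.argsort(np.argsort(index))    # rank 0 = lowest index value
    return 5 - (rank * 5) // len(index)

# Synthetic pupils (illustrative only): one latent SES factor drives all measures.
rng = np.random.default_rng(0)
n = 1000
latent = rng.normal(size=n)
fsm = (latent + rng.normal(size=n) < -1.0).astype(float)   # binary, more likely at low SES
deprivation = -latent + rng.normal(scale=0.5, size=n)      # higher = more deprived
managerial = latent + rng.normal(scale=0.5, size=n)
owner = latent + rng.normal(scale=0.5, size=n)

X = np.column_stack([fsm, deprivation, managerial, owner])
index = first_component_index(X, positive_col=3)
quintile = assign_quintiles(index)
```

In a regression, indicators for quintiles 2 to 5 would then be entered, leaving quintile 1 (the top) as the base case.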

As our index of SES is based primarily on local area measures, we checked its validity by using a separate survey, the Longitudinal Study of Young People in England, which follows around 15000 young people who were aged 14 years in 2003–2004. This analysis shows that our index of SES successfully ranks pupils according to individual measures of SES, including household income, mother's education, father's occupational class and housing tenure (see Table 3 for more details). Moreover, it does so more successfully than other potential combinations of individual- and neighbourhood-based measures of SES that were available to us. For example, two-thirds of children who are eligible for FSMs end up in our lowest (the fifth) SES quintile.

Table 3. Characteristics of Longitudinal Study of Young People in England cohort members by socio-economic quintile†

| Characteristic | Bottom SES quintile | 4th quintile | Middle SES quintile | 2nd quintile | Top SES quintile |
|---|---|---|---|---|---|
| Household income (average wave 1–wave 3) | £11206 | £13946 | £17454 | £21591 | £27645 |
| Mother has a degree | 8% | 13% | 20% | 30% | 40% |
| Father has higher managerial or professional occupation | 11% | 20% | 29% | 43% | 60% |
| Family in financial difficulties | 15% | 10% | 7% | 5% | 3% |
| Family living in socially rented housing | 63% | 33% | 18% | 9% | 3% |

†We constructed our socio-economic quintiles (on the basis of individual FSM eligibility and other neighbourhood-based measures of SES) for Longitudinal Study of Young People in England cohort members, and then summarized the characteristics of individuals in each of these quintiles. Characteristics are reported in wave 1, at age 14 years, unless otherwise specified. Income is in 2003–2004 prices.

We also use the Longitudinal Study of Young People in England data to check the validity of our assumption that private school pupils belong at the top of the SES distribution. In fact, this analysis suggests that only around 35% of private school pupils belong in the top SES quintile (a further 30% are in the second SES quintile, and a further 25% are in the middle SES quintile). Assuming that, for a given level of prior achievement, private school pupils are more likely to participate in HE than state school pupils are, our estimates can thus be interpreted as an upper bound of the socio-economic difference in participation in HE among state and private school pupils. (We thank a referee for this suggestion.) Our analysis of state school pupils only (shown in appendix RA6 of our on-line appendix) confirms that the SES differences in participation in HE are smaller if we exclude private school pupils from our analysis.

Mean HE participation rates and participation rates at high status institutions among participants are shown in Table 4. Further descriptive statistics can be found in Tables 5 and 6. HE participation rates vary substantially across the quintiles of the SES distribution, ranging from 11% in the bottom quintile to 51% in the top quintile for males, and from 15% to 60% for females. The differences in high status participation among participants are of similar magnitude.

Table 4. HE participation rates by SES quintile and gender

| Gender | Top (1st) SES quintile (%) | 2nd SES quintile (%) | Middle SES quintile (%) | 4th SES quintile (%) | Bottom (5th) SES quintile (%) | Difference (1st − 5th) (percentage points) | Overall (%) |
|---|---|---|---|---|---|---|---|
| Participation overall | | | | | | | |
| Males | 50.9 | 34.9 | 25.8 | 17.1 | 10.8 | 40.1 | 28.0 |
| Females | 59.6 | 45.2 | 34.6 | 23.7 | 15.4 | 44.2 | 35.7 |
| Participation at high status institutions among participants | | | | | | | |
| Males | 49.3 | 34.2 | 27.1 | 22.4 | 18.1 | 31.2 | 35.7 |
| Females | 49.4 | 33.5 | 26.5 | 20.8 | 17.5 | 31.9 | 34.4 |

Table 5. Personal characteristics of HE participants and non-participants†

| Characteristic | HE participants | HE non-participants | Difference |
|---|---|---|---|
| Achieved 5 A*–C GCSE grades | 0.838 | 0.257 | 0.581‡ |
| Reached level 3 threshold by 18 years via any route | 0.885 | 0.161 | 0.725‡ |
| Male | 0.448 | 0.536 | −0.088‡ |
| Top SES quintile | 0.334 | 0.124 | 0.210‡ |
| 2nd SES quintile | 0.258 | 0.177 | 0.081‡ |
| Middle SES quintile | 0.194 | 0.207 | −0.013‡ |
| 4th SES quintile | 0.131 | 0.236 | −0.105‡ |
| Bottom SES quintile | 0.083 | 0.257 | −0.174‡ |
| Attended private school at age 16 years | 0.122 | 0.045 | 0.077‡ |
| Observations | 370021 | 793824 | |

†The numbers presented in each column are the mean values of each characteristic for HE participants at age 19 or 20 years (second column) and non-participants (third column), and the difference between these means (fourth column). For all those characteristics taking values either 0 or 1, the mean values in the second and third columns are interpretable as the proportion of participants or non-participants who take the value 1 for that characteristic.

‡Significance at the 1% level.

Table 6. Personal characteristics of HE participants who attend a high status institution and HE participants who do not†

| Characteristic | Attend a high status institution | In HE but do not attend a high status institution | Difference |
|---|---|---|---|
| Achieved 5 A*–C GCSE grades | 0.959 | 0.773 | 0.185‡ |
| Reached level 3 threshold by 18 years via any route | 0.967 | 0.841 | 0.126‡ |
| Male | 0.457 | 0.443 | 0.014‡ |
| Top SES quintile | 0.476 | 0.257 | 0.218‡ |
| 2nd SES quintile | 0.251 | 0.262 | −0.010‡ |
| Middle SES quintile | 0.150 | 0.218 | −0.068‡ |
| 4th SES quintile | 0.080 | 0.158 | −0.077‡ |
| Bottom SES quintile | 0.042 | 0.105 | −0.063‡ |
| Attended private school at age 16 years | 0.221 | 0.069 | 0.152‡ |
| Observations | 129560 | 240461 | |

†The numbers presented in each column are the mean values of each characteristic for HE participants who attend a high status institution (second column) and HE participants who do not attend a high status institution (third column), and the difference between these means (fourth column). For all those characteristics taking values either 0 or 1, the mean values in the second and third columns are interpretable as the proportion of HE participants at high status (and respectively other) institutions who take the value 1 for that characteristic.

‡Significance at the 1% level.

4.5. Other individual characteristics

In addition to SES quintile, our models controlling for other individual characteristics also include month of birth, ethnicity, whether English is an additional language for the pupil and whether they have statemented (more severe) or non-statemented (less severe) special educational needs (all recorded at age 16 years). Private school students (and others for whom these characteristics are missing for some reason) are included by using missing dummy variables where necessary.

We also include test scores from ages 11, 14, 16 and 18 years. At each age, we divide the population into five evenly sized groups (quintiles) according to their total point score on the relevant test or examination. Again, we include missing dummy variables to account for cases in which these test scores are missing. It has been suggested that the choice of subject as well as level of achievement at 16 and 18 years of age may be an important determinant of participation in HE. To account for this potential factor, at age 16 years, when pupils take GCSEs, we additionally include an indicator for whether the individual achieved five GCSEs at grades A*–C including English and mathematics. At age 18 years, we add indicators for whether the individual achieved passes in certain A-level subjects (including mathematics, biology, physics, chemistry and modern languages) and also make use of information identifying whether individuals had achieved the level 3 threshold via any route by age 18 years. In our analysis of the type of HE institution attended (for HE participants only), we also make use of the pupil's tariff score (which is the record of their total qualification achievement at entry).

5. Methodology

We outlined the basic models to be estimated in equations (1) and (2). There are two important methodological issues to consider. Firstly, in both our model of participation in HE and our model of participation in high status universities we have a binary dependent variable. This would normally prompt the use of a binary response model such as a logistic model. However, there is a second methodological issue, namely how we take account of the role of schools. The achievements of pupils in the same school are likely to have some similarities due to the influence of schools, peers and teachers within schools. This suggests a two-level nested structure with pupils at level 1 grouped within schools at level 2 as outlined in equations (1) and (2) above.

We first need to decide whether to treat the school effects u_j as fixed or random (see Clarke et al. (2010) for a full account of these issues). One consideration is the fact that we have limited data on schools and teachers. The use of a random-effects (or multilevel) model would require that u_j is uncorrelated with the individual and school characteristics represented by the covariates x_ij (i.e. cov(x_kij, u_j) = 0), the so-called random-effects assumption. This assumption requires that unobserved characteristics of the school u_j that influence achievement are not correlated with pupil or school characteristics that are included in the model. In our case we believe that the school effects are likely to be correlated with at least one of the independent variables that are included in the model; for example, parents may choose schools on the basis of their attitudes towards education and these attitudes are likely to be correlated with SES. Since we do not have data on parental attitudes this omitted variable will be correlated with both SES and the school effects included, preventing the use of a random-effects model. A global Hausman test rejected the random-effects model at the 1% level of significance for each of our specifications. (There has been some criticism of this test (see, for example, Fielding (2004)) but in the absence of alternatives we were guided by these results.)
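For reference, the Hausman test mentioned above is usually computed from the difference between the fixed effects and random-effects coefficient estimates. The paper does not print the statistic, so the textbook form is given here as an assumption about what was computed; the symbols \hat{\beta}_{FE}, \hat{\beta}_{RE} and K (the number of coefficients compared) are introduced only for this illustration.

```latex
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})^{\top}
    \left[ \widehat{\mathrm{Var}}(\hat{\beta}_{FE})
         - \widehat{\mathrm{Var}}(\hat{\beta}_{RE}) \right]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
    \;\overset{a}{\sim}\; \chi^{2}_{K}
    \quad \text{under } H_{0} : \operatorname{cov}(x_{kij}, u_{j}) = 0 .
```

A large value of H indicates that the two sets of estimates differ by more than sampling variation would allow, which is evidence against the random-effects assumption.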

We thus adopt a fixed effects approach, in which a dummy variable is included for each school (corresponding to a school-specific intercept term). Unlike random-effects models, fixed effects estimates may be unreliable when the size of cluster is small. In our case we have a census of pupils within schools and schools within the two cohorts of pupils, suggesting that the use of a fixed effects model is appropriate.
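With several thousand schools, including one dummy variable per school is usually implemented by subtracting school means from the outcome and from every covariate (the within transformation), which yields the same slope estimates. The sketch below illustrates that equivalence on synthetic data; the function name `within_ols` and all simulated quantities are assumptions for the example, not the authors' implementation.

```python
import numpy as np

def within_ols(y, X, school):
    """Slope estimates after removing school means from y and every column of X."""
    _, codes = np.unique(school, return_inverse=True)
    counts = np.bincount(codes)

    def demean(v):
        school_means = np.bincount(codes, weights=v) / counts
        return v - school_means[codes]

    y_w = demean(y)
    X_w = np.column_stack([demean(X[:, k]) for k in range(X.shape[1])])
    beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    return beta

# Synthetic data (illustrative only): 50 schools with 20 pupils each.
rng = np.random.default_rng(1)
school = np.repeat(np.arange(50), 20)
u = rng.normal(size=50)[school]                  # school effect
x = 0.5 * u + rng.normal(size=school.size)       # covariate correlated with the school effect
y = (0.3 * x + u + rng.normal(size=school.size) > 0).astype(float)  # binary outcome
X = x[:, None]

beta_within = within_ols(y, X, school)

# The explicit dummy-variable regression gives the same slope.
dummies = (school[:, None] == np.arange(50)).astype(float)
beta_dummy, *_ = np.linalg.lstsq(np.column_stack([X, dummies]), y, rcond=None)
```

Because the outcome is binary and the model is linear, this is a linear probability model with school-specific intercepts.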

The second issue for consideration is the fact that we have a binary outcome variable. To incorporate fixed effects in a binary response model one might estimate a fixed effect logit model. In our case the fixed effect logit model would not converge because there are nearly 4000 school fixed effects to estimate. There are, however, circumstances under which the linear probability model provides a close approximation to the logit model, namely where the probability of participation is between 0.25 and 0.75. In the case of equation (1), the HE participation model, the proportion of pupils enrolling in HE is around 30% (see Table 4). In the case of equation (2), the model of participation at high status universities, the participation rate (among those going to university) is around 35% (see Table 4). Hence in both cases we can be reasonably confident that the linear probability model with fixed effects will produce estimates that are close to those generated by a fixed effect logit model and thus we decided to use the former, although we recognize the limitations of such models (Aldrich and Nelson, 1984).
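One way to see why the approximation works in this range: in a logit model the effect of a covariate on the probability is the coefficient multiplied by p(1 − p), and that factor changes little when p lies between 0.25 and 0.75. The short calculation below is a heuristic illustration of this standard result rather than the authors' own argument; the function name is hypothetical.

```python
def logit_marginal_effect(p, beta):
    """Derivative of a logistic probability with respect to x, evaluated at probability p."""
    return beta * p * (1.0 - p)

# Scaling factor p(1 - p) at the edges of the range, at the two mean
# participation rates cited in the text, and at the midpoint.
factors = {p: logit_marginal_effect(p, beta=1.0) for p in (0.25, 0.30, 0.35, 0.50, 0.75)}
for p, f in factors.items():
    print(f"p = {p:.2f}: p(1 - p) = {f:.4f}")
```

Across the whole interval the factor stays between 0.1875 and 0.25, so a constant (linear) marginal effect is a reasonable approximation to the logit marginal effect there.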

As a check of robustness, we estimated a sparse fixed effect logistic model for participation in a high status institution (including only individual demographic variables, test scores at age 11 years and school fixed effects). We also estimated a logistic model for participation in HE with school random effects and school type variables included (as distinct from fixed effects for each school), which generated similar marginal effects, leading to the same substantive conclusions. The results from this analysis can be found in Tables 7–12.

Table 7. Gradients in HE participation for state and private school males: results from a logistic model with random effects†

| Statistic | No covariates | Individual covariates and school random effects | Plus age 11 years test results | Plus age 14 years test results | Plus age 16 years examination results | Plus age 18 years examination results |
|---|---|---|---|---|---|---|
| 2nd SES quintile | −0.120‡ | −0.080‡ | −0.062‡ | −0.051‡ | −0.034‡ | −0.017‡ |
| | (0.003) | (0.002) | (0.002) | (0.002) | (0.001) | (0.001) |
| Middle SES quintile | −0.199‡ | −0.143‡ | −0.109‡ | −0.086‡ | −0.053‡ | −0.025‡ |
| | (0.004) | (0.002) | (0.002) | (0.002) | (0.002) | (0.001) |
| 4th SES quintile | −0.294‡ | −0.222‡ | −0.167‡ | −0.129‡ | −0.076‡ | −0.034‡ |
| | (0.004) | (0.002) | (0.002) | (0.002) | (0.002) | (0.001) |
| Bottom SES quintile | −0.391‡ | −0.293‡ | −0.219‡ | −0.165‡ | −0.093‡ | −0.043‡ |
| | (0.005) | (0.003) | (0.002) | (0.002) | (0.002) | (0.002) |
| Observations | 590790 | 590790 | 590790 | 590790 | 590790 | 590790 |
| Number of clusters | | 4363 | 4363 | 4363 | 4363 | 4363 |
| F-test of additional covariates (p-value) | | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |

†All specifications include a cohort dummy variable. The individual covariates included from the third column onwards are ethnicity, whether English is an additional language for the pupil, whether they have special educational needs and month of birth. Random school effects and school type dummy variables are also included from the third column onwards. Standard errors are clustered at school level and reported in parentheses.

‡Significance at the 1% level.

Table 8. Gradients in HE participation for state and private school females: results from a logistic model with random effects†

| Statistic | No covariates | Individual covariates and school random effects | Plus age 11 years test results | Plus age 14 years test results | Plus age 16 years examination results | Plus age 18 years examination results |
|---|---|---|---|---|---|---|
| 2nd SES quintile | −0.120‡ | −0.095‡ | −0.071‡ | −0.057‡ | −0.039‡ | −0.019‡ |
| | (0.003) | (0.002) | (0.002) | (0.002) | (0.002) | (0.001) |
| Middle SES quintile | −0.210‡ | −0.175‡ | −0.129‡ | −0.102‡ | −0.065‡ | −0.031‡ |
| | (0.004) | (0.002) | (0.002) | (0.002) | (0.002) | (0.001) |
| 4th SES quintile | −0.320‡ | −0.271‡ | −0.198‡ | −0.154‡ | −0.094‡ | −0.043‡ |
| | (0.004) | (0.002) | (0.002) | (0.002) | (0.002) | (0.002) |
| Bottom SES quintile | −0.430‡ | −0.363‡ | −0.262‡ | −0.199‡ | −0.116‡ | −0.052‡ |
| | (0.005) | (0.003) | (0.002) | (0.002) | (0.002) | (0.002) |
| Observations | 572939 | 572939 | 572939 | 572939 | 572939 | 572939 |
| Number of clusters | | 4416 | 4416 | 4416 | 4416 | 4416 |
| F-test of additional covariates (p-value) | | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |

†All specifications include a cohort dummy variable. The individual covariates included from the third column onwards are ethnicity, whether English is an additional language for the pupil, whether they have special educational needs and month of birth. Random school effects and school type dummy variables are also included from the third column onwards. Standard errors are clustered at school level and reported in parentheses.

‡Significance at the 1% level.

Table 9. Gradients in probability of attending a high status HE institution among male participants from state and private schools: results from a logistic model with random effects†

| Statistic | No covariates | Individual covariates and school random effects | Plus age 11 years test results | Plus age 14 years test results | Plus age 16 years examination results | Plus age 18 years examination results |
|---|---|---|---|---|---|---|
| 2nd SES quintile | −0.136‡ | −0.044‡ | −0.038‡ | −0.034‡ | −0.028‡ | −0.016‡ |
| | (0.007) | (0.003) | (0.003) | (0.003) | (0.003) | (0.002) |
| Middle SES quintile | −0.208‡ | −0.091‡ | −0.077‡ | −0.066‡ | −0.049‡ | −0.027‡ |
| | (0.007) | (0.004) | (0.004) | (0.004) | (0.003) | (0.003) |
| 4th SES quintile | −0.265‡ | −0.125‡ | −0.104‡ | −0.085‡ | −0.057‡ | −0.028‡ |
| | (0.008) | (0.005) | (0.004) | (0.004) | (0.004) | (0.003) |
| Bottom SES quintile | −0.321‡ | −0.164‡ | −0.129‡ | −0.098‡ | −0.061‡ | −0.027‡ |
| | (0.009) | (0.006) | (0.006) | (0.006) | (0.005) | (0.004) |
| Observations | 165644 | 165644 | 165644 | 165644 | 165644 | 165644 |
| Number of clusters | | 3490 | 3490 | 3490 | 3490 | 3490 |
| F-test of additional covariates (p-value) | | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |

†All specifications include a cohort dummy variable. The individual covariates included from the third column onwards are ethnicity, whether English is an additional language for the pupil, whether they have special educational needs and month of birth. Random school effects and school type dummy variables are also included from the third column onwards. Standard errors are clustered at school level and reported in parentheses.

‡Significance at the 1% level.

Table 10. Gradients in probability of attending a high status HE institution among female participants from state and private schools: results from a logistic model with random effects†

| Statistic | No covariates | Individual covariates and school random effects | Plus age 11 years test results | Plus age 14 years test results | Plus age 16 years examination results | Plus age 18 years examination results |
|---|---|---|---|---|---|---|
| 2nd SES quintile | −0.140‡ | −0.050‡ | −0.042‡ | −0.038‡ | −0.033‡ | −0.020‡ |
| | (0.005) | (0.003) | (0.003) | (0.003) | (0.003) | (0.002) |
| Middle SES quintile | −0.211‡ | −0.097‡ | −0.081‡ | −0.071‡ | −0.059‡ | −0.034‡ |
| | (0.006) | (0.003) | (0.003) | (0.003) | (0.003) | (0.003) |
| 4th SES quintile | −0.278‡ | −0.144‡ | −0.120‡ | −0.102‡ | −0.081‡ | −0.048‡ |
| | (0.006) | (0.004) | (0.004) | (0.004) | (0.004) | (0.003) |
| Bottom SES quintile | −0.324‡ | −0.182‡ | −0.145‡ | −0.117‡ | −0.087‡ | −0.049‡ |
| | (0.007) | (0.005) | (0.005) | (0.005) | (0.005) | (0.004) |
| Observations | 204412 | 204412 | 204412 | 204412 | 204412 | 204412 |
| Number of clusters | | 3658 | 3658 | 3658 | 3658 | 3658 |
| F-test of additional covariates (p-value) | | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |

†All specifications include a cohort dummy variable. The individual covariates included from the third column onwards are ethnicity, whether English is an additional language for the pupil, whether they have special educational needs and month of birth. Random school effects and school type dummy variables are also included from the third column onwards. Standard errors are clustered at school level and reported in parentheses.

‡Significance at the 1% level.

Table 11. Gradients in probability of attending a high status HE institution among male participants from state and private schools: results from a logistic model with fixed effects†

| Statistic | No covariates | Individual covariates and school fixed effects | Plus age 11 years test results |
|---|---|---|---|
| 2nd SES quintile | −0.136‡ | −0.046‡ | −0.028‡ |
| | (0.007) | (0.004) | (0.003) |
| Middle SES quintile | −0.208‡ | −0.093‡ | −0.055‡ |
| | (0.007) | (0.004) | (0.004) |
| 4th SES quintile | −0.265‡ | −0.124‡ | −0.071‡ |
| | (0.008) | (0.005) | (0.005) |
| Bottom SES quintile | −0.321‡ | −0.163‡ | −0.088‡ |
| | (0.009) | (0.007) | (0.006) |
| Observations | 165644 | 163635 | 163635 |
| Number of clusters | | 3149 | 3149 |

†All specifications include a cohort dummy variable. The individual covariates included in the third and fourth columns are ethnicity, whether English is an additional language for the pupil, whether they have special educational needs and month of birth. School fixed effects are also included in the third and fourth columns. Standard errors are clustered at school level and reported in parentheses.

‡Significance at the 1% level.

Table 12. Gradients in probability of attending a high status HE institution among female participants from state and private schools: results from a logistic model with fixed effects†

| Statistic | No covariates | Individual covariates and school fixed effects | Plus age 11 years test results |
|---|---|---|---|
| 2nd SES quintile | −0.140‡ | −0.054‡ | −0.036‡ |
| | (0.005) | (0.003) | (0.003) |
| Middle SES quintile | −0.211‡ | −0.101‡ | −0.068‡ |
| | (0.006) | (0.004) | (0.004) |
| 4th SES quintile | −0.278‡ | −0.151‡ | −0.100‡ |
| | (0.006) | (0.005) | (0.005) |
| Bottom SES quintile | −0.324‡ | −0.190‡ | −0.120‡ |
| | (0.007) | (0.006) | (0.006) |
| Observations | 204412 | 202729 | 202729 |
| Number of clusters | | 3395 | 3395 |

†All specifications include a cohort dummy variable. The individual covariates included in the third and fourth columns are ethnicity, whether English is an additional language for the pupil, whether they have special educational needs and month of birth. School fixed effects are also included in the third and fourth columns. Standard errors are clustered at school level and reported in parentheses.

‡Significance at the 1% level.

We estimate our models sequentially, separately for males and females. First, we estimate the models with no covariates except our indicator variables for socio-economic background (and a cohort dummy variable). This provides an estimate of the underlying differences in participation in HE and participation in a high status institution by SES. We then estimate a model including a set of individual covariates—namely ethnicity, whether English is an additional language for the pupil, whether they have special educational needs and month of birth—as well as school fixed effects. We go on to examine the extent to which these SES differences can be explained by differences in earlier measures of achievement, from age 11 to age 18 years. We do this to understand better whether SES affects participation in HE directly, or through its effect on prior achievement (which in turn affects the likelihood of attending university) or both.

We note that the school fixed effects capture the additional association between school attended and participation in HE or participation in a high status institution, over any association between school attended and test scores at age 11, 14, 16 and 18 years. We also allow for clustering within schools and use Stata cluster standard errors derived from Huber–White sandwich estimators, where the clusters are the 3490 schools for boys and 3658 schools for girls.
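The clustered sandwich estimator referred to above can be written compactly. The following is a minimal sketch on synthetic data, assuming a plain linear regression; the function name `cluster_robust_se` is hypothetical, and the finite-sample adjustment that Stata applies to clustered variances is deliberately omitted, so the numbers would differ slightly from Stata output.

```python
import numpy as np

def cluster_robust_se(X, resid, cluster):
    """Huber-White sandwich standard errors with clustering (no finite-sample adjustment)."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster):
        rows = cluster == g
        score = X[rows].T @ resid[rows]     # sum of x_i * e_i within cluster g
        meat += np.outer(score, score)
    return np.sqrt(np.diag(bread @ meat @ bread))

# Synthetic data (illustrative only): 40 schools with 25 pupils each,
# with a shared school component in both the covariate and the error.
rng = np.random.default_rng(2)
n_schools, per_school = 40, 25
school = np.repeat(np.arange(n_schools), per_school)
x = rng.normal(size=school.size) + rng.normal(size=n_schools)[school]
e = rng.normal(size=school.size) + rng.normal(size=n_schools)[school]
y = 0.2 + 0.1 * x + e

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

se_cluster = cluster_robust_se(X, resid, school)
# Treating every pupil as a separate cluster reduces to the ordinary White (HC0) estimator.
se_white = cluster_robust_se(X, resid, np.arange(school.size))
```

Summing the scores within each school before forming the outer product is what allows errors of pupils in the same school to be correlated.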

Throughout the paper, we use the term ‘impact’ to describe the statistical association between SES and the probability of attending university at age 19 or 20 years. We would obviously like to uncover the causal effects of SES on participation in HE; however, in the absence of any experiment or quasi-experiment, it is possible that SES may be endogenous. Such endogeneity will arise if there are unobserved characteristics that are correlated with both SES and participation in HE. If this were so, then our estimates of the impact of SES on participation would be upward biased if the unobserved characteristics were positively correlated with both SES and the likelihood of participation, and downward biased otherwise.

To maximize our ability to recover the causal impact of SES on the likelihood of attending university or attending a high status institution, we thus need to ensure that we have controlled for as many other factors that influence HE participation as possible. The strength of our analysis is that we have unique longitudinal data on the educational performance and achievement of children from age 11 years onwards. By controlling for these rich measures of prior achievement, we are better able to allow for unobservable factors that influence educational achievement, assuming that such unobserved factors are likely to influence earlier achievement as well as the HE participation decision. The inclusion of school fixed effects in our linear probability model also allows us to capture unobserved differences across schools affecting HE participation decisions.

6. Results

6.1. Participation in higher education

Tables 13 and 14 present our estimates of the impact of SES on HE participation separately for males and females respectively. The second column of Tables 13 and 14 shows the ‘raw’ differences in HE participation rates by socio-economic quintile. The third column adds individual covariates and school fixed effects. The remaining columns show how the impact of SES is mediated by the successive inclusion of measures of prior achievement at ages 11, 14, 16 and 18 years. All coefficient estimates from the final model specification—i.e. up to and including age 18 years examination results—can be found in Table 15. Full details of all coefficient estimates from every model specification can be found in appendix RA7 of our on-line appendix.

Table 13. Gradients in HE participation for state and private school males†

| Statistic | No covariates | Individual covariates and school fixed effects | Plus age 11 years test results | Plus age 14 years test results | Plus age 16 years examination results | Plus age 18 years examination results |
|---|---|---|---|---|---|---|
| 2nd SES quintile | −0.160‡ | −0.105‡ | −0.083‡ | −0.068‡ | −0.046‡ | −0.024‡ |
| | (0.004) | (0.003) | (0.002) | (0.002) | (0.002) | (0.002) |
| Middle SES quintile | −0.251‡ | −0.173‡ | −0.134‡ | −0.105‡ | −0.066‡ | −0.031‡ |
| | (0.005) | (0.003) | (0.002) | (0.002) | (0.002) | (0.002) |
| 4th SES quintile | −0.339‡ | −0.239‡ | −0.181‡ | −0.139‡ | −0.082‡ | −0.037‡ |
| | (0.005) | (0.003) | (0.003) | (0.002) | (0.002) | (0.002) |
| Bottom SES quintile | −0.402‡ | −0.277‡ | −0.204‡ | −0.153‡ | −0.085‡ | −0.039‡ |
| | (0.005) | (0.003) | (0.003) | (0.002) | (0.002) | (0.002) |
| Observations | 590790 | 590790 | 590790 | 590790 | 590790 | 590790 |
| R² | 0.0992 | 0.119 | 0.254 | 0.29 | 0.42 | 0.581 |
| Number of schools | | 4363 | 4363 | 4363 | 4363 | 4363 |
| F-test of additional covariates (p-value) | | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| % with predicted values <0 or >1 | | | | | | 17% |

†All specifications include a cohort dummy variable. This cohort effect is small (less than 1 percentage point) but significant in the final specification. The individual covariates included from the third column onwards are ethnicity, whether English is an additional language for the pupil, whether they have special educational needs and month of birth. School fixed effects are also included from the third column onwards. Standard errors are clustered at school level and reported in parentheses.

‡Significance at the 1% level.

Table 14. Gradients in participation in HE for state and private school females†

| Statistic | No covariates | Individual covariates and school fixed effects | Plus age 11 years test results | Plus age 14 years test results | Plus age 16 years examination results | Plus age 18 years examination results |
|---|---|---|---|---|---|---|
| 2nd SES quintile | −0.145‡ | −0.110‡ | −0.085‡ | −0.069‡ | −0.048‡ | −0.024‡ |
| | (0.004) | (0.003) | (0.002) | (0.002) | (0.002) | (0.002) |
| Middle SES quintile | −0.250‡ | −0.197‡ | −0.151‡ | −0.120‡ | −0.079‡ | −0.038‡ |
| | (0.005) | (0.003) | (0.003) | (0.002) | (0.002) | (0.002) |
| 4th SES quintile | −0.360‡ | −0.285‡ | −0.213‡ | −0.166‡ | −0.102‡ | −0.047‡ |
| | (0.005) | (0.003) | (0.003) | (0.002) | (0.002) | (0.002) |
| Bottom SES quintile | −0.443‡ | −0.345‡ | −0.251‡ | −0.191‡ | −0.112‡ | −0.053‡ |
| | (0.005) | (0.003) | (0.003) | (0.003) | (0.002) | (0.002) |
| Observations | 572939 | 572939 | 572939 | 572939 | 572939 | 572939 |
| R² | 0.108 | 0.0956 | 0.221 | 0.283 | 0.423 | 0.574 |
| Number of schools | | 4416 | 4416 | 4416 | 4416 | 4416 |
| F-test of additional covariates (p-value) | | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| % with predicted values <0 or >1 | | | | | | 7% |

†All specifications include a cohort dummy variable. This cohort effect is small (less than 1 percentage point) but significant in the final specification. The individual covariates included from the third column onwards are ethnicity, whether English is an additional language for the pupil, whether they have special educational needs and month of birth. School fixed effects are also included from the third column onwards. Standard errors are clustered at school level and reported in parentheses.

‡Significance at the 1% level.

Table 15. Other coefficients from the final model specification (including all prior achievement up to age 18 years) for state and private school pupils†

| Characteristic | HE participation: males | HE participation: females | High status HE participation (among participants): males | High status HE participation (among participants): females |
|---|---|---|---|---|
| Individual characteristics | | | | |
| Cohort 2 | 0.005‡ | 0.004‡ | −0.012‡ | −0.009‡ |
| Other white | 0.013‡ | 0.020‡ | 0.017‡ | 0.004 |
| Black African | 0.105‡ | 0.128‡ | −0.013 | 0.002 |
| Black Caribbean | 0.058‡ | 0.085‡ | −0.021§ | −0.021‡ |
| Other black | 0.041‡ | 0.050‡ | 0.008 | −0.012 |
| Indian | 0.161‡ | 0.157‡ | 0.002 | 0.019‡ |
| Pakistani | 0.104‡ | 0.086‡ | 0.007 | 0.024‡ |
| Bangladeshi | 0.077‡ | 0.066‡ | 0.014 | 0.022 |
| Chinese | 0.099‡ | 0.077‡ | 0.048‡ | 0.045‡ |
| Other Asian | 0.121‡ | 0.107‡ | 0.010 | 0.036‡ |
| Mixed ethnicity | 0.060‡ | 0.057‡ | 0.013 | 0.022‡ |
| Other ethnicity | 0.051‡ | 0.047‡ | −0.003 | 0.026‡ |
| English additional language | 0.035‡ | 0.037‡ | −0.002 | −0.008 |
| Statemented special educational needs | 0.016‡ | 0.016‡ | 0.005 | −0.007 |
| Non-statemented special educational needs | 0.007‡ | 0.008‡ | 0.010§ | −0.001 |
| Born in October | −0.002 | 0.001 | 0.000 | −0.006 |
| Born in November | 0.001 | 0.000 | 0.003 | −0.002 |
| Born in December | −0.001 | 0.003 | 0.004 | 0.000 |
| Born in January | 0.001 | 0.002 | 0.005 | −0.007 |
| Born in February | 0.003 | 0.002 | 0.010§ | 0.001 |
| Born in March | 0.000 | 0.003 | 0.008§ | −0.001 |
| Born in April | 0.005‡ | 0.005§ | 0.008§ | −0.003 |
| Born in May | 0.004§ | 0.005§ | 0.008 | 0.003 |
| Born in June | 0.002 | 0.004§ | 0.001 | −0.001 |
| Born in July | 0.005‡ | 0.002 | 0.012‡ | 0.006 |
| Born in August | 0.009‡ | 0.006‡ | 0.007 | 0.004 |
| Age 11 years test results | | | | |
| 2nd quintile | −0.001 | −0.003§ | 0.001 | 0.002 |
| Middle quintile | −0.007‡ | −0.007‡ | 0.000 | −0.004 |
| 4th quintile | −0.012‡ | −0.011‡ | −0.006 | −0.010§ |
| Top quintile | −0.014‡ | −0.012‡ | 0.008 | 0.011§ |
| Age 14 years test results | | | | |
| 2nd quintile | 0.009‡ | 0.007‡ | 0.012§ | 0.020‡ |
| Middle quintile | 0.001 | −0.001 | 0.011§ | 0.026‡ |
| 4th quintile | −0.002 | 0.003 | 0.005 | 0.020‡ |
| Top quintile | 0.004 | 0.007§ | 0.023‡ | 0.039‡ |
| Age 16 years examination results | | | | |
| 2nd quintile | 0.023‡ | 0.011‡ | −0.051‡ | −0.030‡ |
| Middle quintile | 0.039‡ | 0.028‡ | −0.023§ | −0.001 |
| 4th quintile | 0.084‡ | 0.078‡ | −0.016 | 0.001 |
| Top quintile | 0.103‡ | 0.115‡ | 0.066‡ | 0.062‡ |
| 5 GCSEs at A*–C including English and mathematics | 0.014‡ | 0.024‡ | −0.001 | −0.003 |
| Age 18 years examination results | | | | |
| 2nd quintile | 0.187‡ | 0.176‡ | −0.015‡ | −0.031‡ |
| Middle quintile | 0.299‡ | 0.279‡ | 0.037‡ | 0.003 |
| 4th quintile | 0.359‡ | 0.346‡ | 0.165‡ | 0.103‡ |
| Top quintile | 0.387‡ | 0.380‡ | 0.267‡ | 0.198‡ |
| Level 3 qualification | 0.231‡ | 0.199‡ | 0.003 | 0.007 |
| Pass in A-level biology | 0.041‡ | 0.052‡ | 0.028‡ | 0.025‡ |
| Pass in A-level chemistry | 0.010‡ | 0.005 | 0.093‡ | 0.107‡ |
| Pass in A-level physics | 0.038‡ | 0.028‡ | 0.071‡ | 0.073‡ |
| Pass in A-level mathematics | 0.028‡ | 0.010‡ | 0.090‡ | 0.068‡ |
| Pass in A-level history | 0.046‡ | 0.051‡ | 0.052‡ | 0.074‡ |
| Pass in A-level economics | 0.011‡ | 0.008 | 0.063‡ | 0.077‡ |
| Pass in A-level English | 0.016‡ | 0.022‡ | −0.022‡ | −0.024‡ |
| Pass in A-level languages | 0.027‡ | 0.024‡ | 0.069‡ | 0.095‡ |
| Tariff scores | | | | |
| 1–100 AS-level points | | | 0.013‡ | 0.010‡ |
| 101–200 AS-level points | | | 0.016‡ | 0.003 |
| 201–300 AS-level points | | | 0.059‡ | 0.065‡ |
| ≥301 AS-level points | | | 0.096‡ | 0.102‡ |
| 1–200 A-level points | | | −0.011‡ | −0.047‡ |
| 201–300 A-level points | | | 0.115‡ | 0.049‡ |
| 301–400 A-level points | | | 0.264‡ | 0.235‡ |
| ≥401 A-level points | | | 0.319‡ | 0.323‡ |

†Standard errors are clustered at school level.

‡Significance at the 1% level.

§Significance at the 5% level.

The second columns of Tables 13 and 14 show that there is a large and significant raw socio-economic gradient in HE participation rates: for example, being in the bottom SES quintile (compared with the top SES quintile) reduces the likelihood of going to university at age 19 or 20 years by 40.2 percentage points for boys and 44.3 percentage points for girls. Once we take into account a variety of individual characteristics and school fixed effects, these differences fall by around 30% for boys and 20% for girls (the third column), suggesting that differences in individual characteristics and the types of schools that are attended by young people from different socio-economic backgrounds provide some explanation for why young people from lower SES families are less likely to go to university than young people from higher SES families.

The fourth to seventh columns show how university participation rates vary between pupils from different socio-economic backgrounds, but who otherwise have similar observable characteristics, attend the same schools and follow the same pattern of achievement from age 11 to age 18 years. As might be expected, the inclusion of prior educational achievement reduces the effect of SES on HE participation rates. For example, once we add in age 11 years test results, the impact (on the likelihood of going to university at age 19 or 20 years) of being in the bottom SES quintile (compared with the top SES quintile) falls from 27.7 to 20.4 percentage points for boys and from 34.5 to 25.1 percentage points for girls.

This result suggests that socio-economic disadvantage has already had an impact on academic outcomes at the age of 11 years and that this disadvantage explains a significant proportion of the difference in participation in HE at age 19 or 20 years. Crucially, however, the addition of age 11 years test scores does not explain the entire difference in HE participation rates between those from high and low SES backgrounds; this suggests that interventions to improve the educational achievement of disadvantaged pupils during secondary school may still make a substantial contribution to narrowing the socio-economic differences in participation in HE.

Once we have added in all available measures of prior achievement, boys and girls in the bottom SES quintile are now respectively 3.9 and 5.3 percentage points less likely to go to university at age 19 or 20 years than boys and girls in the top SES quintile; this is respectively around 10% and 12% of the raw differences observed in the second column. What is striking from these results is that the difference in participation rates between the fourth and bottom SES quintiles is not statistically significant for boys and is just 0.6 percentage points for girls. By contrast, the difference between the top and second SES quintiles is 2.4 percentage points for both boys and girls. Thus the socio-economic difference in HE participation rates is largest at the top of the SES distribution. Interestingly, this pattern persists if we exclude private school pupils from our analysis (see appendix RA6 in our on-line appendix), suggesting that the high participation rates among higher SES families are not entirely driven by the ability of private schools to get their pupils into university.
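
The proportions quoted above can be reproduced from the percentage point gaps given in the text. The following is a minimal sketch of that arithmetic only; the dictionary keys and the function names `share_of_raw_gap` and `summarise` are illustrative labels and are not taken from the estimation code behind the paper.

```python
# Bottom-versus-top SES quintile gaps in HE participation, in percentage
# points, as quoted in the text for Tables 13 and 14.
GAPS = {
    "boys": {"raw": 40.2, "with_controls": 27.7, "all_prior_achievement": 3.9},
    "girls": {"raw": 44.3, "with_controls": 34.5, "all_prior_achievement": 5.3},
}


def share_of_raw_gap(adjusted, raw):
    """Return the adjusted gap as a percentage of the raw gap."""
    return 100.0 * adjusted / raw


def summarise(gaps):
    """For each group, the % of the raw gap left at each later specification."""
    return {
        group: {
            spec: round(share_of_raw_gap(value, g["raw"]), 1)
            for spec, value in g.items()
            if spec != "raw"
        }
        for group, g in gaps.items()
    }


if __name__ == "__main__":
    for group, shares in summarise(GAPS).items():
        print(group, shares)
```

On these figures about 69% (boys) and 78% (girls) of the raw gap remains after adding individual characteristics and school fixed effects, i.e. reductions of roughly 30% and 20%, and about 10% and 12% remains once all measures of prior achievement are included.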

It is also worth noting that, if we control for individual characteristics, school fixed effects and age 18 years achievement only, the socio-economic differences fall by almost as much as if we include achievement at ages 11, 14 and 16 years as well. This result suggests that the vast majority of the socio-economic difference in participation in HE that we observe in the second column can be explained by differences in test scores at age 18 years (see appendix RA8 in our on-line appendix).

6.2. Status of higher education institution attended

In this section, we examine the relationship between SES and the likelihood of attending a high status institution at age 19 or 20 years, conditional on participating in HE, and we show how this relationship changes once we take into account individual characteristics (ethnicity, whether English is an additional language for the pupil, whether they have special educational needs and month of birth), school fixed effects and prior achievement measures. Tables 16 and 17 present our estimates of the impact of SES on the likelihood of attending a high status institution among HE participants for boys and girls respectively. All coefficient estimates from the final model specification—i.e. up to and including age 18 years examination results—can be found in Table 15. Full details of all coefficient estimates from every model specification can be found in appendix RA7 of the on-line appendix.

Table 16. Gradients in high status participation for state and private school males†

| Statistic | No covariates | Individual covariates and school fixed effects | Plus age 11 years test results | Plus age 14 years test results | Plus age 16 years examination results | Plus age 18 years examination results |
|---|---|---|---|---|---|---|
| 2nd SES quintile | −0.151‡ | −0.043‡ | −0.037‡ | −0.033‡ | −0.027‡ | −0.016‡ |
| | (0.008) | (0.004) | (0.003) | (0.003) | (0.003) | (0.003) |
| Middle SES quintile | −0.221‡ | −0.082‡ | −0.069‡ | −0.058‡ | −0.043‡ | −0.026‡ |
| | (0.008) | (0.004) | (0.004) | (0.004) | (0.004) | (0.003) |
| 4th SES quintile | −0.270‡ | −0.105‡ | −0.085‡ | −0.069‡ | −0.045‡ | −0.023‡ |
| | (0.008) | (0.004) | (0.004) | (0.004) | (0.004) | (0.004) |
| Bottom SES quintile | −0.312‡ | −0.128‡ | −0.097‡ | −0.075‡ | −0.046‡ | −0.025‡ |
| | (0.009) | (0.006) | (0.005) | (0.005) | (0.005) | (0.004) |
| Observations | 165644 | 165644 | 165644 | 165644 | 165644 | 165644 |
| R² | 0.0555 | 0.072 | 0.159 | 0.195 | 0.304 | 0.459 |
| Number of schools | | 3490 | 3490 | 3490 | 3490 | 3490 |
| F-test of additional covariates (p-value) | | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| % with predicted values < 0 or > 1 | | | | | | 10% |

†All specifications include a cohort dummy variable. This cohort effect is small (less than 1 percentage point) but significant in the final specification. The individual covariates included from the third column onwards are ethnicity, whether English is an additional language for the pupil, whether they have special educational needs and month of birth. School fixed effects are also included from the third column onwards. Standard errors are clustered at school level and reported in parentheses.

‡Significance at the 1% level.

Table 17. Gradients in high status participation for state and private school females†

| Statistic | No covariates | Individual covariates and school fixed effects | Plus age 11 years test results | Plus age 14 years test results | Plus age 16 years examination results | Plus age 18 years examination results |
|---|---|---|---|---|---|---|
| 2nd SES quintile | −0.159‡ | −0.051‡ | −0.042‡ | −0.038‡ | −0.034‡ | −0.022‡ |
| | (0.006) | (0.003) | (0.003) | (0.003) | (0.003) | (0.003) |
| Middle SES quintile | −0.229‡ | −0.092‡ | −0.075‡ | −0.066‡ | −0.054‡ | −0.034‡ |
| | (0.007) | (0.004) | (0.003) | (0.003) | (0.003) | (0.003) |
| 4th SES quintile | −0.286‡ | −0.127‡ | −0.102‡ | −0.087‡ | −0.068‡ | −0.043‡ |
| | (0.007) | (0.004) | (0.004) | (0.004) | (0.004) | (0.003) |
| Bottom SES quintile | −0.319‡ | −0.150‡ | −0.115‡ | −0.093‡ | −0.069‡ | −0.043‡ |
| | (0.007) | (0.005) | (0.004) | (0.004) | (0.004) | (0.004) |
| Observations | 204412 | 204412 | 204412 | 204412 | 204412 | 204412 |
| R² | 0.0607 | 0.0407 | 0.128 | 0.147 | 0.225 | 0.385 |
| Number of schools | | 3658 | 3658 | 3658 | 3658 | 3658 |
| F-test of additional covariates (p-value) | | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| % with predicted values < 0 or > 1 | | | | | | 3% |

†All specifications include a cohort dummy variable. This cohort effect is small (less than 1 percentage point) but significant in the final specification. The individual covariates included from the third column onwards are ethnicity, whether English is an additional language for the pupil, whether they have special educational needs and month of birth. School fixed effects are also included from the third column onwards. Standard errors are clustered at school level and reported in parentheses.

‡Significance at the 1% level.

The second column of Tables 16 and 17 presents raw estimates of the impact of SES on the likelihood of attending a high status institution for respectively boys and girls who participate in HE at age 19 or 20 years. These raw figures show that there are large socio-economic differences in the probability of attending a high status university. For example, males and females in the bottom SES quintile are respectively 31.2 and 31.9 percentage points less likely to attend a high status university than males and females in the top SES quintile (conditional on participating).

Interestingly, the inclusion of individual characteristics and school fixed effects (the third column) reduces the impact of SES on attendance at a high status HE institution by more than it reduced the impact of SES on participation in HE in Tables 13 and 14 earlier. The difference in the likelihood of attending a high status institution between participants from the top and bottom SES quintiles is reduced by around 50–60% for boys and girls by the inclusion of individual characteristics and school effects (from 31.2 to 12.8 percentage points for boys and from 31.9 to 15.0 percentage points for girls), compared with a reduction of around 20–30% for participation in HE. This implies that differences in individual characteristics and schools attended explain more of the socio-economic variation in the likelihood of attending an élite HE institution than of the variation in overall participation in HE.

To investigate the role of schools in more detail, we compared our main results (which include school fixed effects) with those from models excluding school fixed effects but including a specific school characteristic, namely a set of school type dummy variables (see appendix RA9 in our on-line appendix for these results). We found that excluding school fixed effects from our high status model significantly increased the socio-economic difference in our final specification, from 2.5 and 4.3 to respectively 3.6 and 6.0 percentage points for males and females, which is an increase of around 40% for both boys and girls. This result suggests that schools may have an important role to play in encouraging pupils from lower SES families to apply to high status universities.

This finding is in contrast with the results for overall participation in HE, in which the addition of fixed effects does not make much difference to our results, suggesting that schools mainly affect overall participation through their effect on prior achievement.
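
To make concrete what including school fixed effects does, the sketch below compares a pooled estimate of an SES gap with a within-school estimate obtained by subtracting school means from both variables. It is a toy illustration under stated assumptions: the twelve pupils, the two schools and the helper names `ols_slope` and `demean_within` are all hypothetical, and the models in the paper include many further covariates.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_within(groups, values):
    """Subtract each group's mean from the values of its members."""
    totals, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for g, v in zip(groups, values)]


# Hypothetical pupils: school A is mostly high SES with a high attendance
# rate at high status institutions, school B is mostly low SES with a low rate.
school = ["A"] * 6 + ["B"] * 6
low_ses = [0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1]
high_status = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0]

# Pooled gap: compares low and high SES pupils across all schools.
pooled_gap = ols_slope(low_ses, high_status)

# Within-school gap: compares pupils only with others in the same school,
# which is what a school fixed effects specification does.
within_gap = ols_slope(
    demean_within(school, low_ses), demean_within(school, high_status)
)

if __name__ == "__main__":
    print(round(pooled_gap, 3), round(within_gap, 3))
```

In this made-up example the pooled gap is about 33 percentage points and the within-school gap is 25 percentage points: the part of the gap that reflects which schools pupils attend is absorbed by the fixed effects, so dropping them makes the estimated SES difference larger.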

Thereafter, the inclusion of each of our measures of prior achievement makes a relatively smaller difference to the coefficients on SES. Nonetheless, once we have added in prior achievement at ages 11, 14, 16 and 18 years, we find that socio-economic background has a substantially reduced impact on the probability that HE participants will attend high status institutions. For example, boys and girls from the bottom SES quintile are now respectively only 2.5 and 4.3 percentage points less likely to attend a high status institution (conditional on participation) than boys and girls from the top SES quintile; this is just 8% and 13% of the raw differences of 31.2 and 31.9 percentage points respectively.
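
The relative size of each step can be read off the bottom quintile rows of Tables 16 and 17. The sketch below uses only those reported coefficients and expresses each successive change as a percentage of the raw gap; the names `SPECS`, `BOTTOM_QUINTILE` and `stepwise_shares` are illustrative. The size of each step depends on the order in which the controls are added, so the figures describe this particular sequence of specifications rather than a unique decomposition.

```python
SPECS = [
    "no covariates",
    "individual covariates + school fixed effects",
    "+ age 11 test results",
    "+ age 14 test results",
    "+ age 16 examination results",
    "+ age 18 examination results",
]

# Bottom SES quintile coefficients (relative to the top quintile),
# copied from Tables 16 and 17 in column order.
BOTTOM_QUINTILE = {
    "males": [-0.312, -0.128, -0.097, -0.075, -0.046, -0.025],
    "females": [-0.319, -0.150, -0.115, -0.093, -0.069, -0.043],
}


def stepwise_shares(coefs):
    """Return (% of raw gap closed at each step, % of raw gap remaining)."""
    raw = coefs[0]
    closed = [100.0 * (prev - cur) / raw for prev, cur in zip(coefs, coefs[1:])]
    remaining = 100.0 * coefs[-1] / raw
    return closed, remaining


if __name__ == "__main__":
    for group, coefs in BOTTOM_QUINTILE.items():
        closed, remaining = stepwise_shares(coefs)
        for label, share in zip(SPECS[1:], closed):
            print(f"{group}: {label}: {share:.1f}% of raw gap closed")
        print(f"{group}: remaining: {remaining:.1f}% of raw gap")
```

On these figures the first step closes roughly 59% (males) and 53% (females) of the raw gap, each later block of achievement measures closes between about 7% and 11%, and about 8% and 13% remains in the final specification.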

As with participation in HE, there is no significant difference in participation rates at high status institutions between the fourth and bottom quintiles, whereas boys and girls in the second SES quintile are respectively 1.6 and 2.2 percentage points less likely to attend a high status institution than those in the top SES quintile. Again, these results hold even if we exclude private school pupils from our analysis (see appendix RA6 of our on-line appendix). It is also worth noting that, as for participation in HE overall, age 18 years test scores alone explain the vast majority of the SES differences in high status participation (see appendix RA8 of our on-line appendix).

6.3. Checks for robustness

We highlighted above the potential problems with the use of the linear probability model instead of a binary outcome model and have included in the tables above the percentage of predicted values that exceed 1 or are less than 0 as a measure of the appropriateness of the model. For participation in HE, just 7% of predictions fall outside the 0–1 boundary for women; for men, the percentage is somewhat higher at 17%. For the high status institution model, 10% of predictions for men and 3% of predictions for women fall outside the 0–1 boundary.
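
The diagnostic reported in the tables can be illustrated with a toy example. The sketch below fits a linear probability model with a single regressor by ordinary least squares and computes the share of fitted values that fall outside the 0–1 interval. The data and the function names `fit_line` and `share_outside_unit_interval` are hypothetical, and the specification is far simpler than the models estimated in the paper.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def share_outside_unit_interval(fitted):
    """Proportion of fitted probabilities below 0 or above 1."""
    return sum(1 for p in fitted if p < 0.0 or p > 1.0) / len(fitted)


# Hypothetical data: an achievement score and a binary participation outcome.
score = list(range(10))
participates = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

intercept, slope = fit_line(score, participates)
fitted = [intercept + slope * s for s in score]
share = share_outside_unit_interval(fitted)

if __name__ == "__main__":
    print(f"{100 * share:.0f}% of fitted values lie outside [0, 1]")
```

In this example the two lowest and two highest scores receive fitted values below 0 and above 1 respectively, so the share is 40%. Out-of-range predictions arise where the outcome is almost perfectly predicted at the extremes of the covariate distribution, which is why the share is used here as a rough gauge of how well the linear specification approximates a binary outcome model.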

A second issue is the use of an index of SES that is constructed by using principal components analysis as our measure of socio-economic background. To see whether this affected our conclusions, we re-estimated the models including each of the measures of SES which we use to construct our index separately in the model (see appendix RA5 in our on-line appendix). Controlling for all measures of prior achievement, the coefficient on the only individual level measure of socio-economic disadvantage, namely whether the individual is eligible for FSMs or not, is statistically significant but extremely small: less than 1 percentage point difference in the HE participation model and 1.3 percentage points difference in the high status institution model. The various measures of neighbourhood SES are also jointly statistically significant as expected, i.e. when the variables are included in the model together, the null hypothesis that their coefficients are jointly equal to 0 is rejected, and hence there is evidence of socio-economic differences in participation in HE by neighbourhood. Hence for a given level of neighbourhood and area disadvantage, we find only a limited additional effect from the person themselves being eligible for FSMs. Interpreting this result is problematic, however, since low SES individuals tend to cluster in low SES neighbourhoods.
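
For readers unfamiliar with the approach, the sketch below shows how a one-dimensional index can be formed from several indicators by principal components analysis. It is illustrative only: the three indicators and their values are hypothetical stand-ins for the individual and neighbourhood measures, the function names are our own labels, and power iteration is used simply as a dependency-free way of obtaining the first component, not because it is known to be the routine used for the paper.

```python
import math


def standardise(column):
    """Rescale a column to mean 0 and (population) standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]


def first_component(columns, iterations=200):
    """Loadings of the first principal component of the correlation matrix."""
    z = [standardise(c) for c in columns]
    n, k = len(z[0]), len(z)
    corr = [
        [sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
        for i in range(k)
    ]
    v = [1.0] * k  # starting vector for the power iteration
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, z


def index_scores(columns):
    """Score each pupil on the first component (higher = more deprived here)."""
    loadings, z = first_component(columns)
    n = len(z[0])
    return [sum(l * col[i] for l, col in zip(loadings, z)) for i in range(n)]


# Hypothetical indicators for five pupils, each oriented so that a higher
# value means greater disadvantage.
INDICATORS = [
    [0, 0, 0, 1, 1],                   # eligible for FSMs (0/1)
    [10, 20, 30, 40, 50],              # neighbourhood deprivation score
    [0.10, 0.15, 0.30, 0.35, 0.50],    # local share of adults without qualifications
]

if __name__ == "__main__":
    print([round(s, 2) for s in index_scores(INDICATORS)])
```

Pupils would then be ranked on the resulting score and grouped into quintiles. Because the index pools the indicators into a single score, it cannot by itself show which component drives a result, which is the motivation for entering the measures separately as a check.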

Finally, we investigated interactions between the key SES quintiles and a number of variables, including ethnicity and quintile of achievement at various ages. We do not report the results here (the results are available on request) but instead simply note some key findings. For participation in HE, the interaction terms that were included in our models were all jointly significantly different from 0, suggesting that the magnitude of the socio-economic differences varies by ethnicity and prior achievement. In particular, SES differences were largest for white males in the bottom quintile of the age 18 years test score distribution, suggesting that the likelihood of securing university entrance is more strongly associated with SES for low achieving males than for higher achieving males. By contrast, SES differences in participation at high status institutions (among participants) do not vary by ethnicity, although they do vary by prior achievement, in a similar fashion to that described above for overall participation.

7. Conclusions

This paper has shown that pupils from lower SES backgrounds are much less likely to participate in HE than pupils from higher SES backgrounds are. However, our findings suggest that this socio-economic difference in university participation does not emerge at the point of entry to HE. In other words, the socio-economic difference in HE participation does not arise simply because lower SES pupils face the same choices at 18 years of age but choose not to go to university or are prevented from doing so. Instead, it comes about largely because lower SES pupils do not achieve as highly in secondary school as their more advantaged counterparts, confirming the general trend in the literature that socio-economic differences emerge relatively early in individuals’ lives.

It is important to note that a socio-economic difference in participation does remain on entry to university, even after allowing for prior achievement, but it is modest relative to the magnitude of the raw differences once we include A-level achievement, and to a lesser extent GCSE scores. The SES difference is also greatest between the top and second quintiles of the SES distribution. The implication is that focusing policy interventions on encouraging disadvantaged pupils at age 18 years to apply to university is unlikely to have a major impact on reducing the raw socio-economic difference in university participation and in particular is not likely to increase participation by the lowest SES pupils markedly. Our results controlling for age 16 years achievement suggest that there may be some gain from targeting lower SES pupils with good GCSE results but, again, this relatively late intervention is unlikely to result in a large increase in the HE participation rates of low SES pupils as compared with interventions to improve achievement at, say, age 11 years or earlier in primary school.

Our results highlight not only the importance of achievement in secondary schools, but also the potentially important role that schools seem to play in encouraging pupils from lower SES backgrounds to apply to high status institutions. For example, it might be beneficial for universities to encourage schools to be more supportive of low SES pupils, particularly with regard to getting them to apply to high status institutions. We are therefore not suggesting that universities should stop their outreach work, but simply noting that such activities are unlikely to tackle the larger problem underlying the socio-economic difference in university participation, namely the underachievement of disadvantaged pupils in primary and secondary school.

Of course some caution is required in the interpretation of our results. Pupils look forward when making decisions about what qualifications to attempt at ages 16 and 18 years, and indeed when deciding how much effort to put into school work. If disadvantaged pupils feel that HE is ‘not for people like them’, then it may be that their achievement in school simply reflects anticipated barriers to participation in HE, rather than the other way around. This suggests that outreach activities will still be required to raise pupils’ aspirations about university, but that they might perhaps be better targeted on younger children in secondary school (as indeed happened with ‘AimHigher’ and other widening participation interventions). A note of caution is warranted here though: Chowdry et al. (2011) showed that a far greater proportion of pupils in the Longitudinal Study of Young People in England—including those from poorer backgrounds—report (at age 14 years) that they are likely to apply to university than are ultimately likely to end up there. This suggests that simply raising aspirations among pupils from lower socio-economic backgrounds is unlikely to eliminate the socio-economic differences in participation in HE that we see.

Another issue is that we know children's social and emotional skills (e.g. their self-esteem) are also important in determining individuals’ lifetime outcomes, and such non-cognitive skills appear more malleable later in childhood (Carneiro and Heckman, 2003). Unfortunately, we cannot measure skills such as these in our data. It is possible that earlier measures of achievement may be partially capturing the fact that there is a positive relationship between cognitive and non-cognitive skills, and that the effect of prior achievement on participation in HE might be slightly lower if we had separate measures of non-cognitive skills.

Nonetheless, our results make clear that the majority of the socio-economic difference in participation in HE—including at high status institutions—arises as a result of substantial socio-economic differences in educational achievement earlier in life, and thus that policy makers who are interested in increasing participation among pupils from lower SES backgrounds need to intervene earlier to maximize their potential impact.

Acknowledgements

We are grateful for funding from the Economic and Social Research Council (grant RES-139-25-0234) via its ‘Teaching and learning research programme’. In addition, we thank the Department for Children, Schools and Families (now the Department for Education), and Anna Barker, Graham Knox and Ian Mitchell in particular, for facilitating access to the valuable data set we have used. Without their work on linking the data and facilitating our access, this work could never have come to fruition. We are also grateful for comments from the referees, Stijn Broecke, Joe Hamed, John Micklewright and participants at various seminars and conferences, particularly the British Educational Research Association annual conference, the Royal Economic Society conference, the Pupil Level School Census Users Group and those organized by the ‘Teaching and learning research programme’. Responsibility for interpretation of the data, as well as for any errors, is the authors’ alone.
