Is Improving Access to University Enough? Socio‐Economic Gaps in the Earnings of English Graduates

Much research and policy attention has focused on socio-economic gaps in participation at university, but less attention has been paid to socio-economic gaps in graduates' earnings. This paper addresses this shortfall using tax and student loan administrative data to investigate the variation in earnings of English graduates by socio-economic background. We find that graduates from higher income families (with median income of around £77,000) have average earnings which are 20% higher than those from lower income families (with median income of around £26,000). Once we condition on institution and subject choices, this premium roughly halves, to around 10%. The premium grows with age and is larger for men, in particular for men at the most selective universities. We estimate the extent to which different institutions and subjects appear to deliver good earnings for relatively less well off students, highlighting the strong performance of medicine, economics, law, business, engineering, technology and computer science, as well as the prominent London-based universities.


I. Introduction
Higher education is seen as a potentially crucial tool for social mobility, providing a possible route for students from lower income family backgrounds to achieve labour market success and higher earnings. Consequently, there have been numerous government policies around the world focussed on improving access to university degrees for those from poorer households. However, there is relatively little evidence on whether this should be the primary focus of governments trying to improve social mobility.
Consistent with most countries around the world, in England educational achievement and higher education access vary substantially by the level of parental income, with many fewer students from poorer backgrounds attending university, particularly the highest status institutions (Chowdry et al., 2013; Ermisch, Jantti and Smeeding, 2012). However, little is known about the differences in earnings between graduates from poorer and richer family backgrounds. Further, primarily due to data limitations, the question of whether differences in earnings still exist conditional on university and subject choice has remained largely unanswered. 1 In this paper, we are able to address these shortfalls in the literature by making use of a unique administrative database that tracks the earnings of graduates into their mid-thirties.
We use a data set that consists of anonymized, individual-level administrative taxable earnings data supplied by Her Majesty's Revenue and Customs (HMRC), linked to information on students' higher education (university or college) from the English Student Loan Company (SLC). The SLC is a state-supported body that provides loans to students to fund their higher education. The HMRC and SLC data sets are hard linked using a national identification number (National Insurance number 2) and we have access to a 10% random sample. We study cohorts of students who entered higher education from 1999 to 2005, and focus on the same students' earnings between 2008/09 and 2013/14. This allows us to follow graduates through their most crucial career-developing years and well into their thirties. We also use Higher Education Statistics Agency (HESA) data, which we can match at the subject-institution (rather than individual) level. These data include the socio-economic background and prior academic achievement of the students studying the same subject at the same institution. This allows us to add further controls that capture differences in the demographics of students in a given university and subject, although we acknowledge that this does not eliminate ability bias in returns or deal with differential selection into courses across individuals from different socio-economic backgrounds.
A common problem with administrative data is the limited set of background characteristics available for individuals. 3 We also face these limitations and do not directly observe parental income for individuals in our sample. However, we are able to infer a simple binary measure of parental income based on a student's SLC record, which notes the amount each student borrowed in their first year of study. For English students starting university before 2006, the amount individuals were eligible to borrow was linked in a monotonic way to their parental income. We identify people as being from a higher income household if they borrowed exactly the maximum amount an individual from a higher income household was eligible for in their first year of study. 4 This group consists of approximately 20% of borrowers, which in the paper we refer to as the richer group. The remaining 80% of borrowers are of course relatively poorer, rather than poor in an absolute sense. Indeed, based on a sample of borrowers in the Family Resources Survey, we estimate that the median parental earnings of these groups are around £77,000 for the richer group and around £26,000 for the rest (2018 prices). Clearly our parental income measure is likely to suffer from measurement error: people from poorer households might borrow the rich maximum, people from richer households might not borrow the rich maximum, and we are unable to say anything at all about the roughly 15% of people who attend university but choose not to borrow, a group likely to be weighted towards those from higher income households. Despite these measurement issues, all of which are likely to bias our estimates downwards, we find considerable differences in earnings between graduates from richer and relatively less well off family backgrounds. These differences roughly halve once we condition on subject and institution choices but remain economically important, at around 10%, and statistically significant.

[...] by SLC. The research data sets used may not exactly reproduce HMRC or SLC aggregates. The use of HMRC or SLC statistical data in this work does not imply the endorsement of either HMRC or SLC in relation to the interpretation or analysis of the information.
1 The exceptions include a number of papers that investigate returns to private vs. state secondary school education in the UK, conditional on university education (e.g. Crawford et al., 2016), and Chetty et al. (2017), which investigates variation in returns to attending university by parental income in the US.
2 This is the key individual identifier for all taxes, social security and student loans.
3 The availability of linked administrative data has improved dramatically in the UK in recent years. The Longitudinal Educational Outcomes data (LEO) allows the linkage of entire education histories of individuals to their earnings records. However, these data are currently available only for government research.
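The classification rule above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a hypothetical lookup table of unassessed maxima keyed by cohort and by whether the student studied in London; the figures in `UNASSESSED_MAX` are placeholders, not the actual SLC limits (the real non-income-assessed maxima are reported in Table 2), and the separate cap for students living at home is ignored.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL placeholder caps in whole pounds, keyed by (cohort, studied_in_london).
# These are NOT the real SLC limits; they only illustrate the lookup.
UNASSESSED_MAX: Dict[Tuple[int, bool], int] = {
    (1999, False): 1000,
    (1999, True): 1250,
    (2000, False): 1050,
    (2000, True): 1300,
}

# Each borrower: (first-year amount borrowed in whole pounds, cohort, studied_in_london)
Borrower = Tuple[int, int, bool]


def is_richer(borrower: Borrower,
              caps: Dict[Tuple[int, bool], int] = UNASSESSED_MAX) -> bool:
    """Flag a borrower as 'richer' if they borrowed exactly the unassessed maximum."""
    amount, cohort, in_london = borrower
    return amount == caps[(cohort, in_london)]


def richer_share(borrowers: List[Borrower]) -> float:
    """Share of borrowers classified into the richer group (0.0 for an empty list)."""
    if not borrowers:
        return 0.0
    return sum(is_richer(b) for b in borrowers) / len(borrowers)


if __name__ == "__main__":
    sample = [
        (1000, 1999, False),  # exactly the non-London cap: richer group
        (1250, 1999, True),   # exactly the London cap: richer group
        (1600, 1999, False),  # above the cap (income-assessed): poorer group
        (900, 2000, False),   # below the cap: poorer group
    ]
    print([is_richer(b) for b in sample])  # [True, True, False, False]
    print(richer_share(sample))            # 0.5
```

Exact equality mirrors the "exactly the maximum" rule in the text, and storing amounts as whole pounds avoids floating-point comparison problems. Everyone not at the cap falls into the relatively poorer group, which is why misclassification of the kind described above is possible.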
These socio-economic differences also exist right through the earnings distribution and are larger at the bottom and top of the earnings distribution, suggesting family wealth is particularly good at both protecting graduates against very poor outcomes and providing them with opportunities for very high earnings. The conditional differences grow with age and are somewhat smaller for Science, Technology, Engineering and Mathematics (STEM) or Law, Economics and Management/Business (LEM) as compared to other subjects. They are also particularly pronounced for men from the most selective universities.
These findings are descriptive, but clearly important for policy. Data limitations mean we are unable to control for: individual-level qualifications; 5 degree outcomes, such as completion and degree classification (i.e. grades); progression onto (and timing of) postgraduate study; and early career occupation choices. These factors, along with differences in non-cognitive skills and in the networks of those from richer and poorer backgrounds, should be the subject of future research into the drivers of these earnings differences, and could have important implications for firms, universities and policy.
Finally, we follow Chetty et al. (2017) by estimating 'social mobility scorecards', which measure the extent to which different universities appear to help students from relatively poorer backgrounds get into the top fifth of graduate earners (specifically, the 'mobility score' is the probability of a course admitting a poorer student multiplied by the probability that the student goes on to enter the top fifth of the earnings distribution).
4 There were subsequent changes to both tuition fees and student support that took effect from 2006; see section III for more detail. These changes do not affect our results, however, as we focus on the first-year borrowing of people who entered university before 2006.
5 The period we are looking at was before the big increase in 'contextualised admissions' policies, whereby universities make lower offers to students who had attended certain schools, typically those in poorer neighbourhoods. This suggests it would be more important to control for individual qualifications for later cohorts.
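To make the mobility score concrete, for a course c it is P(poorer | c) × P(top fifth | poorer, c), i.e. the share of the course's students who are both from the poorer group and in the top fifth of earners. The Python sketch below is illustrative only: it assumes a hypothetical record layout of (course, is_poorer, earnings) and defines the top fifth by the 80th percentile of the pooled earnings of all graduates in the sample, which is one reasonable reading of "the earnings distribution" rather than the paper's confirmed definition.

```python
from statistics import quantiles
from typing import Dict, List, Tuple

# Hypothetical record layout: (course label, is_poorer, annual earnings)
Record = Tuple[str, bool, float]


def mobility_scores(records: List[Record]) -> Dict[str, float]:
    """Access rate times success rate for each course."""
    # Top-fifth cutoff: 80th percentile of pooled graduate earnings
    cutoff = quantiles([e for _, _, e in records], n=5)[-1]
    counts: Dict[str, Dict[str, int]] = {}
    for course, poorer, earnings in records:
        c = counts.setdefault(course, {"n": 0, "poor": 0, "poor_top": 0})
        c["n"] += 1
        if poorer:
            c["poor"] += 1
            if earnings >= cutoff:
                c["poor_top"] += 1
    scores = {}
    for course, c in counts.items():
        access = c["poor"] / c["n"]                               # P(poorer | course)
        success = c["poor_top"] / c["poor"] if c["poor"] else 0.0  # P(top fifth | poorer, course)
        scores[course] = access * success
    return scores


if __name__ == "__main__":
    # Made-up illustrative records, not data from the paper
    data = [
        ("A", True, 100), ("A", True, 20), ("A", False, 90), ("A", False, 30),
        ("B", True, 10), ("B", False, 40), ("B", False, 50),
        ("B", True, 60), ("B", False, 70), ("B", False, 80),
    ]
    print(mobility_scores(data))  # {'A': 0.25, 'B': 0.0}
```

With these made-up records the cutoff is 88, so course A scores 0.5 × 0.5 = 0.25, while course B admits poorer students but none of them reach the top fifth.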
Our parental income measure is less granular than that used by Chetty et al. (2017), who focus on the bottom 20% of the observed parental earnings distribution, and consequently our results are not directly comparable. However, unlike Chetty et al. (2017), we are able to estimate mobility scorecards for different subject disciplines. We find that medicine and economics are particularly good at delivering relatively poorer students into the top 20% of the graduate earnings distribution. However, it is not clear that all STEM subjects are effective at delivering this. On the other hand, we find that LEM subjects are effective. More broadly, professionally oriented subjects (e.g. LEM, computer science, engineering, technology, business) seem to deliver routes to social mobility. At the other end of the scale, biological sciences, mass communication and creative arts subjects do this to a much lesser extent. For institutions, the high-profile London universities (namely the LSE, Imperial College, King's College and UCL) do very well by this index, while outside London, Warwick and Manchester are two of the best performing universities from the set we have permission to name. 6 These results are necessarily descriptive only and come with several caveats. However, they represent the first descriptive evidence on which institutions and subjects are best for encouraging social mobility.
The paper is laid out as follows. In the next section we discuss our contribution to the existing literature. In section III we outline the institutional details of Higher Education in the UK. In section IV we describe our data and introduce our measure of parental income. In section V, we present results from our modelling. In section VI, we estimate social mobility scorecards by subject and institution. Section VII concludes.

II. Existing literature
This work contributes to an important literature that has suggested a major impact of higher education on individuals' earnings. We focus on the English graduate labour market (Blundell, Dearden and Sianesi, 2005), though our findings are also relevant to the large US literature which has looked at the heterogeneity in graduate earnings by subject and institution (see e.g. Dale and Krueger, 2014, as well as Webber, 2014 and Altonji, Arcidiacono and Maurel, 2015 for reviews). The evidence of a sizeable graduate wage premium for English graduates is convincing; see for example Walker and Zhu (2011). Yet although higher education in England appears to be a good investment for many, as is the case in the US, there is also a sizeable empirical literature that has shown substantial variation in graduate earnings, which has increased over time (Chevalier, 2011; Hussain, McNally and Telhaj, 2009; Sloane and O'Leary, 2005; Smith and Naylor, 2001; Walker and Zhu, 2011). A key question is therefore, given this increased diversity in graduates' earnings, whether students from poorer backgrounds achieve the same earnings gains as their similarly qualified counterparts from richer families.
Differences in earnings between graduates from poorer and richer family backgrounds may of course be attributable to differences in the institutions they attend and the subjects they study. Previous work has shown that graduate earnings vary considerably by subject of degree (Sloane and O'Leary, 2005; Chevalier, 2011; Zhu, 2011, 2013; Chowdry et al., 2013). Walker and Zhu (2013) suggested substantial differences in private returns by degree subject and insignificant differences in returns by institution type (the data were insufficiently granular to analyse at institution level). Britton et al. (2016) also found considerable variation in earnings by both subject and institution, though much of this difference is attributable to different prior achievement levels of the students taking different degree options. Since prior achievement levels are lower, on average, for poorer students, we would expect sorting by subject and institution to depress their earnings.
Even with similar subject field and institution choices, an individual's socio-economic background may have an effect on their labour market outcomes after graduation. This might be because students from more advantaged backgrounds have higher levels of (non-cognitive) skills (see e.g. Blanden, Gregg and Macmillan, 2007; Kassenboehmer, Leung and Schurer, 2018) that are not measured by their highest education level, or by their degree subject or institution. Related to this, performance in the degree could be important. Crawford et al. (2016) show that students from poorer socio-economic backgrounds are less likely to complete a degree and are less likely to graduate with a top grade than their wealthier peers. We do not observe any non-cognitive skills or degree outcomes (grades or whether a student completes the degree) in our data set.
Alternatively, advantaged graduates may earn more because they have greater levels of social capital and are able to use their networks to secure higher paid employment. The literature in the UK at least does suggest that graduates from more advantaged backgrounds, particularly privately educated students, achieve higher status occupations, and there is some evidence that privately educated students earn a higher return to their degree (Bukodi and Goldthorpe, 2011a,b; Macmillan, Tyler and Vignoles, 2013; Crawford and Vignoles, 2014). For example, Crawford and Vignoles (2014) found that graduates who attended private secondary schools earn around 7% more per year, on average, than state school students 3.5 years after graduation, even when comparing otherwise similar graduates and allowing for differences in degree subject, university attended and degree classification. This is consistent with earlier work using data from the 1970s and 1980s by Dolton and Vignoles (2000), which found that the earnings return for graduates varied according to whether the individual attended a private school or a state school. This research also found that the private school wage premium for graduates who left university in 1980 was 7% for males but there was no premium for females, conditional on subject of degree and institution. Similar results were found by Naylor (2002) for a cohort of 1993 graduates (3% wage premium) and Green et al. (2012) using the National Child Development Study 1958 cohort and the 1970 British Cohort Study. The latter found that the private school wage premium increased from 4% for the earlier cohort to 10% for the later one. By contrast, work on how graduates' earnings vary by parental income level or parental socio-economic status, rather than by whether they attended private school, is more limited. For example, using the British Cohort Study (BCS), Bratti, Naylor and Smith (2005) found little evidence of variation in the return to a degree by social class.
Beyond the UK, there is an impressive body of work that has drawn on administrative data, largely from Scandinavian countries (and some US states), to investigate the relationship between parental income and children's outcomes (see Figlio, Karbownik and Salvanes, 2015). Much of this work estimates causal impacts of parental income or education on children's educational outcomes (e.g. Black, Devereux and Salvanes, 2005). There is less work on the extent to which parental earnings affect graduates' earnings, conditional on the nature of the higher education achieved. Perhaps the most relevant paper in this body of literature is Chetty et al. (2017), which looks at this issue for the US using administrative tax data linked to data from the National Student Loan Data System for around 30 million individuals who were university students between 1999 and 2013. Their study has the advantage of granular information on both parent and child income (the former measured when the student was aged 15-19 and the latter when the student was 32-34). From this, they were able to construct intergenerational income correlations for graduates from different institutions. They found stark differences in the likelihood of poor students accessing elite institutions. For instance, a student with parents in the top 1% of the income distribution is 77 times more likely to go to an Ivy League university than one with parents in the bottom fifth of the income distribution. However, they also concluded that students from poorer and richer backgrounds did similarly well if they graduated from the same college. At least for those who are able to gain access, universities appear to be levelling the income playing field in the US. Our study has key differences from Chetty et al. (2017). First, our measure of parental income is binary, which is a clear limitation. Second, unlike Chetty et al. (2017), we are able to control for subject of study at the individual level, which is important given the evidence on variation in earnings by subject and the early subject specialization in the English system, which differs markedly from the broader curricula of the average US bachelor's degree. Third, in England it is less likely, on average, for wealthy but low achieving students to gain access to elite institutions. This is potentially due to differences in the HE admissions and funding systems, with the English system at this time arguably presenting fewer barriers to access than the US system. Admission in England is centralized and regulated, with the probability of entry into elite institutions closely correlated with students' prior achievement in national examinations taken at age 18 (A levels or equivalent). English tuition fees were also comparatively very low during this period and were income contingent, so students from poorer households could be exempt from paying. This point is reflected in evidence for England (Chowdry et al., 2013), which found that, conditional on prior achievement, there was no socio-economic gap on entry into HE and a gap of just a few percentage points on entry into elite universities. These different institutional arrangements may mean that socio-economic selectivity into HE, and particularly into elite institutions, is somewhat different in the two countries, which will affect graduates' earnings, especially given that both Chetty et al. (2017) and our own study are limited by not having individual-level measures of skill or IQ. Hence the analyses in both papers are necessarily descriptive.

III. Institutional background
During the period of study, the minimum school leaving age in England was 16, although comfortably more than half of students stayed in school until age 18. The majority of those that progress on to university do so within the first two years of leaving school. The vast majority of university degrees are in one subject (or sometimes two subjects combined) and take three to four years. Subject specialization therefore occurs relatively early by international standards (and, in particular, compared to the US). It is very common in England to move out of the family home for university, and the government has been loaning money to students to help with their living costs during study since the 1980s.
The English Student Loan Company (SLC) was introduced in 1990 to administer a reformed version of these 'mortgage-style' living cost loans for English students attending higher education institutions in the UK. There were no tuition fees at the time. The mortgage-style nature of the loans meant that repayments were made in equal instalments that were independent of students' subsequent income. 7 In 1998, means-tested tuition fees of up to £1,000 per year (1998 prices) were introduced for the first time, with fees payable up front. Alongside this, the living cost ('maintenance') loans became income contingent: 9% of a borrower's income above a threshold (initially £10,000, although it has been increased on several occasions since) was automatically deducted by the tax authority (HMRC). Any outstanding loans were written off when the individual turned 65. 8 Interest rates on student loans were set equal to the lower of the Bank of England base rate plus 1% and the RPI measure of inflation. This is the regime that all of the students in our estimation sample faced, namely the cohorts of students who started university between 1999 and 2005 and borrowed from the English SLC.
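The repayment and interest rules just described reduce to simple arithmetic. The sketch below is a simplified illustration: it assumes the initial £10,000 threshold and ignores later threshold changes, the write-off at age 65 and any within-year timing of deductions; the function names and example rates are hypothetical.

```python
def annual_repayment(income: float, threshold: float = 10_000.0,
                     rate: float = 0.09) -> float:
    """Income-contingent repayment: 9% of income above the threshold, zero below it."""
    return rate * max(0.0, income - threshold)


def loan_interest_rate(base_rate: float, rpi_inflation: float) -> float:
    """Interest rate: the lower of (Bank of England base rate + 1 point) and RPI inflation.

    Rates are expressed as decimals, e.g. 0.05 for 5%.
    """
    return min(base_rate + 0.01, rpi_inflation)


if __name__ == "__main__":
    print(annual_repayment(8_000))         # below the threshold, so nothing is repaid
    print(annual_repayment(20_000))        # 9% of the 10,000 earned above the threshold
    print(loan_interest_rate(0.05, 0.03))  # RPI is the lower of the two, so it applies
```

A graduate earning £20,000 would therefore repay about £900 a year under these assumptions, and nothing at all while earning below the threshold, which is what distinguishes this regime from the earlier mortgage-style loans.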
Crucially for our design, maintenance loan eligibility was dependent on parental income. 9 All individuals were eligible for some loan, but people from lower income households could borrow more. People who wanted to borrow more had to prove their parents' income in the previous year by submitting their end of year tax statement ('P60') to HMRC. This meant it was difficult for people from higher income households to gain access to the larger loans. People could also borrow more if they lived in London during their studies (due to the higher living costs), while there was a different cap for those living with their parents while studying. 10 See Table 2 in the following section for the non-income-assessed maximum loan amounts inside and outside London.
Subsequently, there have been further changes in England. In 2006, fees were increased to £3,000 per year (2006 prices), although students could now borrow this money from the SLC to add to their student loans. Alongside this, there were changes to the rules for maintenance loan eligibility. Prior to 2006, individuals from poorer backgrounds could borrow the most. From 2006, the relationship became non-monotonic, as maintenance loans were increased as grants were tapered, which resulted in students from middle-earning families borrowing the most.
Combined, these changes make it very difficult to identify poorer individuals in the data from 2006 onwards. This is primarily because the poorest students could now borrow almost exactly the same amount as the richest students (although they did receive additional living cost support through maintenance grants), but also because not everybody borrowed the tuition fee loans. Individuals who start under a given regime stay in that regime (so, for [...]).
7 Borrowers were eligible to start making repayments once they started earning more than a certain threshold (85% of average annual earnings for full-time workers). They could also defer payments if they earned less than that amount in a given year.
8 In 2006 the write-off period was reduced to 25 years from leaving HE. In 2012 it was lengthened again to 30 years from leaving HE.
9 Other forms of financial support, including cash bursaries and hardship loans, were available during this period. These are unobservable to us but, fortunately, did not affect loan eligibility, which means they are unimportant for our identification of richer and poorer individuals.

There were further considerable changes to the English system in 2012; 12 fees were trebled to £9,000 per year (2012 prices), interest rates were increased to RPI plus up to 3%, and there were a number of other changes to the repayment conditions. Subsequently, maintenance grants for poorer students were abolished again, which meant a return to the situation of the poorest students borrowing the most. Again, these changes do not affect our results directly, but it is worth keeping in mind that the system is now very different to the one in place during our period of analysis.
We were unable to gain access to equivalent student loan data for the rest of the UK, which are administered by separate bodies. We therefore do not observe students from Wales, Scotland or Northern Ireland. Higher Education is a devolved policy area, which means there is now considerable variation in policy across the UK. However, in the period we are interested in (1999-2005 starters), this is less true; at this point the different systems were quite similar. Although non-UK European Union residents were also eligible to borrow from the English SLC, we do not observe them in our data set either.

IV. Data
This is an exciting new data set for investigating graduate outcomes. Other UK surveys, such as the Labour Force Survey (LFS) and the Destinations of Leavers from Higher Education (DLHE) survey, have information on subject of study and institution. However, information on higher education institution has only recently been collected by the LFS, limiting the sample sizes available to researchers. The LFS also has only very limited data on the parents of graduates. Meanwhile, the DLHE does have information on graduates' earnings by subject and institution but has issues with sample selection (it is a voluntary online survey) and only captures full-time equivalent earnings just three and a half years after graduation. Our data, by contrast, provide insight into graduates' earnings up to more than a decade after graduation. More extensive detail on the data set is provided in [...] 2019, Shephard and Vignoles (2018). We have a 10% subsample of all borrowers from the English part of the SLC, which means they had to be domiciled in England upon application to university and attend a university in the UK. We have data on those who entered higher education between 1998 and 2008 but focus on the 1999-2005 entrants (henceforth, 'cohorts') because of the low uptake of loans in 1998 (driven by the slow transition into the income contingent loan system) and the availability of tuition fee loans and maintenance grants after 2005 (see discussion above).
11 Focussing on first year borrowing also means that we are unaffected by the fact that course length is variable (typically 3-5 years).
12 Another important change is the large increase in 'contextualised admissions', whereby universities make lower offers to students who had attended certain schools, typically those in poorer neighbourhoods. This was not highly prevalent during the period we are investigating, but it suggests both that earnings gaps might change for later cohorts and that conditioning on qualifications on entry might be a crucial addition for these cohorts.
These data provide us with information on gender, first year of study (cohort), 13 institution attended, 14 field of study, 15 region on application to higher education and a detailed measure of income from employment (Pay As You Earn taxable income) and from self-employment (Self Assessment income). We do not observe degree outcomes, which means we do not see degree classification or indeed completion. This could of course be important, although we note that dropout rates are low by international standards, at around 10%. 16 We focus on earnings data from the tax years 2008/09 through 2013/14. We use earnings from labour, meaning that employment income, profits from partnerships and profits from self-employment are included. We exclude trust income, profits on share transactions, profits from land and property, income from foreign employment, savings, UK dividends, pension income, life policy gains, 'other' income, and bank and building society interest. Clearly we are focussing on a period that follows the 2008 recession, which should be kept in mind when considering the results, as it may have implications for the magnitudes of the effects that we see. For example, wealthier students might have been more likely to undertake postgraduate study at the start of the recession, which may have boosted their subsequent income relative to their less well off peers. Unfortunately we are unable to use other years of earnings data.
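The earnings definition above amounts to summing the labour components and ignoring everything else. A minimal sketch, assuming a hypothetical per-person, per-tax-year record stored as a dictionary; the field names are invented for illustration and do not correspond to actual HMRC variable names.

```python
from typing import Dict

# Hypothetical field names for the components counted as labour earnings
INCLUDED = ("employment_income", "partnership_profits", "self_employment_profits")

# Hypothetical field names for excluded components (documentation only; never summed)
EXCLUDED = (
    "trust_income", "share_transaction_profits", "land_property_profits",
    "foreign_employment_income", "savings_income", "uk_dividends",
    "pension_income", "life_policy_gains", "other_income",
    "bank_building_society_interest",
)


def labour_earnings(record: Dict[str, float]) -> float:
    """Sum only the included labour-income components; missing fields count as zero."""
    return sum(record.get(field, 0.0) for field in INCLUDED)


if __name__ == "__main__":
    # Made-up example record: the dividend income is ignored
    person_year = {
        "employment_income": 25_000.0,
        "self_employment_profits": 3_000.0,
        "uk_dividends": 1_000.0,
    }
    print(labour_earnings(person_year))
```

Whitelisting the included components, rather than subtracting the excluded ones, means any unanticipated income field is left out of the measure by default.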
The sample sizes for our cohorts of interest are given in Table 1, which also shows the gender split. These samples reflect 10% of English borrowers at UK Higher Education Providers. These sample sizes align with overall numbers from the Higher Education Statistics Agency (HESA) for the same period. There are more women, reflecting the higher participation rates of women in the UK (rather than different borrowing behaviour). Note that we use up to six years of earnings data for each individual throughout the majority of this paper.
The administrative data described above are linked to data from HESA. Whilst we cannot link data at the individual level, we are able to do so at the institution and subject level. This provides a quantitative profile of the characteristics of students in each institution-subject combination. These data enable us to control more effectively for the characteristics of students attending different institutions and taking different subjects. This is important if we are trying to identify the residual correlation between socio-economic background and subsequent earnings after allowing for the fact that poorer students take different degree options. These data also allow us to control for the government region in which the student's institution is located, which is important since wages vary by region and we do not have data on the graduates' current location (current region is in any case endogenous, since graduates with degrees that are more highly valued in the labour market may be better able to secure high paying jobs in high paying regions). Since a high proportion of graduates remain near their university when they enter the labour market, controlling for region of institution goes some way to account for this issue. We use HESA data from 2002/03.
13 For people who switched degrees we observe their second degree course. The total debt figures include previous borrowing, but the 'first year borrowing' that we use is from the first year of the course we observe them studying.
14 Students in officially recognized UK higher education institutions are eligible for loans. The government defines these as either 'recognised' or 'listed'. The former can award degrees and the latter can offer courses that lead to a degree from a recognised institution. We observe students at both types of institution, meaning some Further Education Colleges will be included. Overall there are several hundred of these, although we observe 170 distinct institutions, with the rest classified as 'other' institutions.
15 We observe the first digit of the 'JACS' code, which is a broad subject level classification set by HESA. JACS codes at this level include a heterogeneous range of courses. For example, 'biological sciences' ranges from psychology to biology. Whilst we would ideally control for subject of study at a more granular level, this was not possible for disclosivity reasons. If there is lots of variation by background within the JACS code measures, this could affect our results (e.g. poorer students might choose courses within the JACS bands that have lower earnings potential).
16 See https://www.hesa.ac.uk/data-and-analysis/performance-indicators/noncontinuation.
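Because the HESA characteristics are only available as subject-institution averages, attaching them to individual borrowers is a many-to-one join on the (institution, subject) key. The sketch below assumes hypothetical record layouts, labels and variable names (`mean_tariff`, `share_state_school`), and simply leaves the HESA fields empty (`None`) where no cell matches; how unmatched cells are actually handled is not described here.

```python
from typing import Dict, List, Optional, Tuple

CellKey = Tuple[str, str]  # (institution, broad JACS subject); hypothetical labels


def attach_cell_controls(
    individuals: List[Dict[str, object]],
    hesa_cells: Dict[CellKey, Dict[str, float]],
    fields: Tuple[str, ...],
) -> List[Dict[str, object]]:
    """Copy subject-institution averages onto each individual record (many-to-one join)."""
    merged = []
    for person in individuals:
        key = (str(person["institution"]), str(person["subject"]))
        cell: Optional[Dict[str, float]] = hesa_cells.get(key)
        row = dict(person)  # leave the input record unchanged
        for field in fields:
            row[field] = cell.get(field) if cell is not None else None
        merged.append(row)
    return merged


if __name__ == "__main__":
    # Made-up cell averages and borrowers, for illustration only
    hesa = {("Inst A", "Law"): {"mean_tariff": 340.0, "share_state_school": 0.82}}
    people = [
        {"id": 1, "institution": "Inst A", "subject": "Law"},
        {"id": 2, "institution": "Inst B", "subject": "Law"},
    ]
    for row in attach_cell_controls(people, hesa, ("mean_tariff", "share_state_school")):
        print(row)
```

Every borrower in the same institution-subject cell receives identical values, which is why these controls capture the composition of a course rather than any individual's own background or prior attainment.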
The key characteristics which we can control for (all averaged at the subject-institution level) are: UCAS 'tariff score', 17 ethnic composition, gender composition and measures of students' socio-economic status. The latter include parental occupation, the percentage of students living at home whilst studying, the percentage of students who attended an English state school (i.e., non-private) and the 'Participation of Local Areas' (POLAR) classification (neighbourhood level participation in higher education by age 19).

Creating our measure of parental income
Our focus is on how graduates' earnings vary by the socio-economic background of the student. Unfortunately, the data do not include a direct measure of parental income. As discussed above, during this period the SLC loaned English-domiciled students at UK universities money to help with their living costs, and, crucially, the amount available varied by parental income. Our database includes the amount borrowed by each graduate for their student loan overall and in their first year of borrowing. We use this to infer the parental income of each individual, because the maximum amount the UK Government was willing to lend a student depended on their parents' income, with individuals from lower income households able to borrow more than their better-off peers. As discussed, for the 1999-2005 cohorts that we investigate, there was a monotonic relationship between how much individuals could borrow and their parental income, with students from the poorest households able to borrow the most.
There is a lot of noise in the observed amounts that individuals borrow. However, for each of the 1999-2005 cohorts we observe clear spikes at points in the distribution that we are able to exploit. To explain this, we provide an illustrative density plot in Figure 1. This shows the distribution of the amount individuals borrow in their first year of study, where x is the maximum an individual from a higher income household can borrow, or the 'unassessed maximum'. The plot is normalised so that x is set to 0, to allow for the fact that the maximum amount changes each year and differs for individuals studying inside and outside London, where the borrowing limits are higher. People borrowing more than x need to provide evidence of their parents' earnings from the previous tax year to the SLC.

Figure 1. Illustrative density plot of amount borrowed (vertical axis: density; horizontal axis: amount borrowed). x represents the higher income maximum, while y represents the lower income maximum. Amounts and densities deliberately excluded for disclosure reasons.

17 The tariff score is a single quantitative summary of the performance of students prior to entering university in national tests taken at Advanced level (A level) or equivalent at age 18.
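The normalisation can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce Figure 1: it assumes each borrower record carries the cohort, a flag for studying in London and the first-year amount, and the limits in the hypothetical `UNASSESSED_MAX` table are invented placeholders (the actual limits are reported in Table 2).

```python
from dataclasses import dataclass

# Hypothetical unassessed maxima x by (cohort, studying in London).
# These are invented placeholder values, not the limits in Table 2.
UNASSESSED_MAX = {
    (1999, False): 3000,
    (1999, True): 3600,
    (2000, False): 3100,
    (2000, True): 3700,
}


@dataclass
class Borrower:
    cohort: int
    in_london: bool
    first_year_amount: int  # pounds borrowed in the first year of study


def normalised_amount(b: Borrower) -> int:
    """Return the amount borrowed minus the applicable unassessed maximum x.

    Zero means the student borrowed exactly x; negative values lie below x
    and positive values above it.
    """
    x = UNASSESSED_MAX[(b.cohort, b.in_london)]
    return b.first_year_amount - x


borrowers = [
    Borrower(1999, False, 3000),
    Borrower(1999, True, 3600),
    Borrower(2000, False, 2500),
]
print([normalised_amount(b) for b in borrowers])  # [0, 0, -600]
```

Working with amounts relative to x lets the spike at exactly x line up at zero across cohorts and across London and non-London students, which is what allows the distributions to be pooled in a single plot.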
The exact loan amounts for each year, the minimum parental income threshold and the share of individuals at different points in the distribution relative to x are given in Table 2. The biggest spike in the distribution is at exactly £x: the share at this point increases from around 15% in the 1999 cohort to around 25% in the 2005 cohort, averaging around 20% across all cohorts. Although not shown here, the distributions are very similar when split by gender, with slightly more men borrowing exactly x. The next biggest spike is at exactly £y, the overall maximum, where between 10% and 15% of borrowers are found.
We also see from the table that around one-third of borrowers borrow less than x and around 20% borrow between £x and £y. Around 20% of borrowers borrow more than the official maximum; these are most likely lower income individuals studying courses with longer than standard term lengths.
Using this measure of borrowing, we construct a blunt indicator of parental income that we set equal to one (indicating high parental income) if the individual borrows exactly x in her first year, and zero otherwise (indicating low parental income). Based on data from the Family Resources Survey (FRS), we were able to approximate the average parental income of our two groups. Taking the set of 18-21 year olds living with their parents and taking out a student loan between 2002 and 2005, we observe that the average parental income of those above the parental income threshold for extra loans was around £77,400, while that of those below the threshold was £25,900, in 2018 prices. 18 Around 30% of borrowers are above the parental income threshold, which is not dramatically more than the 25% we observe borrowing exactly the higher income maximum in 2004.

Notes: *This is the minimum parental income someone can have to qualify to borrow more than x. **Loan amounts for people studying at a university in London are given in parentheses.
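A minimal Python sketch of how first-year amounts map into the bands discussed above and into the baseline indicator, assuming x and y are already known for the relevant cohort and location. The function names and toy amounts are hypothetical and are not drawn from the SLC data.

```python
from collections import Counter


def band(amount: float, x: float, y: float) -> str:
    """Place a first-year amount in one of the bands discussed in the text.

    x is the unassessed (higher income) maximum and y the overall
    (lower income) maximum for the student's cohort and location.
    """
    if amount < x:
        return "below x"
    if amount == x:
        return "exactly x"
    if amount < y:
        return "between x and y"
    if amount == y:
        return "exactly y"
    return "above y"


def higher_income_indicator(amount: float, x: float) -> int:
    """Baseline definition: 1 if exactly x was borrowed, otherwise 0."""
    return int(amount == x)


def band_shares(amounts, x, y):
    """Share of borrowers falling in each band; the shares sum to one."""
    counts = Counter(band(a, x, y) for a in amounts)
    return {name: count / len(amounts) for name, count in counts.items()}


# Toy amounts with x = 3000 and y = 4000 (illustrative values only).
amounts = [3000, 3000, 1500, 3500, 4000, 4200, 2800, 3000]
shares = band_shares(amounts, x=3000, y=4000)
print(shares["exactly x"])  # 0.375
print([higher_income_indicator(a, 3000) for a in amounts])  # [1, 1, 0, 0, 0, 0, 0, 1]
```

Because the indicator is defined by an exact match, it relies on amounts being recorded without rounding; in practice amounts would first be normalised to the relevant cohort and location limits.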
We acknowledge that this measure of parental income does not perfectly identify all students from higher income households, for a number of reasons. First, those from higher income households may borrow less than the maximum available. Second, individuals from lower income households may choose to borrow only the higher income maximum because they do not want to borrow more or are unable to provide evidence of their parents' income. Third, we miss altogether those individuals from the wealthiest households who did not borrow at all. While this group makes up around 15% of the overall student population, it is likely to represent considerable fractions of the student populations at some high-status institutions in particular. Fourth, there may be misreporting of parental income to the SLC, though official proof of income is required to gain access to additional loans. Whilst we cannot completely overcome these weaknesses in our measure, we provide indicative evidence below that it does indeed identify individuals from wealthier households. Further, we suggest that most of these issues are likely to bias our estimates towards zero.

18 Based on around 1,000 borrowers. These numbers include the parents' income from employment and any other private sources, including private pensions and investments. They do not include state benefits (which are not included in student loan assessment) or state pensions (which are, but only a small fraction of parents in the sample are old enough to be eligible). We thank Jonathan Cribb of the IFS for these calculations.

Validation of the parental income indicator
Here we investigate whether our simple indicator is indeed picking up higher income individuals by showing how it relates to university access and voluntary repayments. First, we show the share of higher income students in different types of institutions. We know that poorer students on average attend less selective universities, where the mean entry tariff score is lower. We divide all the universities in our database into deciles based on the mean entry scores of their students, taken from HESA data. We split the top decile into two groups to identify the most elite 5% of universities, since this group is of particular policy interest given their graduates' very high earnings (Britton et al., 2016) and their relatively low shares of poorer students. In Figure 2, we plot the share of higher income students (conditional on being borrowers) in each of these university groups, by gender. For both men and women, universities with higher entry criteria have much higher shares of individuals we define as being from a higher income household. In the most selective universities, more than half of students come from the 20% of individuals we define as being higher income.
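The institution grouping can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the text does not say how ties or boundary institutions are handled, so here institutions are simply ranked by mean entry tariff, integer arithmetic decides the cut-offs, and institutions without a score are kept in a separate 'missing' group. The input dictionary and institution names are hypothetical.

```python
def institution_groups(mean_tariff):
    """Assign institutions to entry-score groups.

    mean_tariff maps an institution name to the mean entry tariff of its
    students, or None when the score is missing. Scored institutions are
    ranked from lowest to highest tariff; the bottom nine deciles are kept
    as deciles 1-9 and the top decile is split into 'top 5-10%' and
    'top 5%'. Ties are broken by sort order, a simplifying assumption.
    """
    scored = sorted(
        (name for name, t in mean_tariff.items() if t is not None),
        key=lambda name: mean_tariff[name],
    )
    n = len(scored)
    groups = {name: "missing" for name, t in mean_tariff.items() if t is None}
    for i, name in enumerate(scored):
        # i / n is the share of scored institutions ranked below this one;
        # integer comparisons avoid floating-point edge cases at cut-offs.
        if 20 * i >= 19 * n:
            groups[name] = "top 5%"
        elif 10 * i >= 9 * n:
            groups[name] = "top 5-10%"
        else:
            groups[name] = f"decile {10 * i // n + 1}"
    return groups


# Twenty hypothetical institutions with tariffs 1..20, plus one without a score.
tariffs = {f"inst{k}": float(k) for k in range(1, 21)}
tariffs["inst_missing"] = None
groups = institution_groups(tariffs)
print(groups["inst20"], groups["inst19"], groups["inst1"], groups["inst_missing"])
# top 5% top 5-10% decile 1 missing
```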
Second, we examine the voluntary repayments of students. These are repayments of student loan amounts that are made directly to the SLC over and above the legally required repayments, which are determined by the graduates' income level. Given the loan forgiveness and the low real rate of interest faced by the cohorts we are investigating, voluntary overpayment does not appear to be an optimal strategy for graduates. However, as the summary of voluntary repayments by gender in Table 3 shows, a non-trivial number of repayments occur, possibly due to debt aversion 19 or to avoid overpayment. 20 We are interested in these repayments because, conditional on the graduates' own level of income, they may be more likely to be made by those from wealthier families who can afford such lump sum payments. From Table 3 we see that around 9% of students make voluntary repayments at some point between starting university and 2011, the final year for which we have data on voluntary repayments. The mean annual repayment amount (conditional on making a repayment) is around £2,500. A marginally higher share of women than men make repayments, and women on average make a larger number of voluntary repayments, with 34% of those making any repayments making more than one, vs. 29% for men. However, average repayments are typically smaller for women than for men.
In Table 4, we estimate the probability of individuals from higher income households making any voluntary repayments. We estimate a probit model with a dummy set equal to one if an individual makes any repayment in a given year. The results show that individuals we classify as being from a higher income household are significantly more likely to make voluntary repayments, even conditional on their current earnings: they are about one percentage point more likely to do so, on a baseline of 3.3%.

19 Recently some lenders, including mortgage lenders, do take account of the presence of student debt when making lending decisions, which might make it desirable to pay off student debt more rapidly, but this was not very common in the period we are investigating.

20 There have been incidents of overpayment that were widely reported in the British press. They occur because of slow communication about repayments and outstanding debt between HMRC and the SLC. People with variable income are the most vulnerable to this. In practice all overpayments are refunded by the SLC, although the process can be very slow.

Notes: ***Indicates significant at the 1% level; ** the 5% level. Controls for cohort, age and year are included in all columns.

Table 5 further investigates voluntary repayments by highlighting differences in the size of individual repayments. The table shows results from regressing the individual voluntary repayments made by students on demographic characteristics and the higher income household indicator. Individual repayments from those from higher income households are considerably larger than those from lower income households, and this holds true when controls for gender and current earnings are added. Among those who make voluntary repayments, those from higher income households make repayments that are around £1,000 larger on average.
When HESA controls describing the mix of students on the same subject-institution course are included, this estimate falls to around £600, but remains statistically significant. Finally, in column (5), we show results using a tobit rather than OLS, with the same specification as in column (4). The broad result, that coming from a higher income household is associated with higher voluntary repayments, is robust, and indeed considerably stronger than in column (4), with individuals from higher income households repaying around £1,200 more, conditional on gender, earnings and university characteristics. This strongly supports the view that individuals borrowing exactly x are indeed from more advantaged households than those who borrow different amounts.

Notes: Column (5) is run with the same set of controls as column (4).
Column (5) also shows that, conditional on making repayments, women make larger repayments, by around £460 on average. The sign is flipped compared to the OLS estimates, suggesting differential selection into repayment by gender. Meanwhile, the relationship between voluntary repayments and current earnings is economically negligible, despite being statistically significant: the earnings coefficient in column (5) implies that a £10,000 increase in earnings is associated with a reduction in voluntary repayments of just £12. Hence graduates' own income levels do not appear to influence whether they make voluntary repayments. 21
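Column (5) replaces OLS with a tobit, which treats the many zero repayments as censored rather than as ordinary observations. As a hedged sketch of what that model maximises, here is the tobit log-likelihood for an outcome left-censored at zero with a normal latent error. The function and argument names are illustrative; fitting would hand this function to a numerical optimiser over the coefficients and sigma, which is not shown.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def tobit_loglik(y, xb, sigma):
    """Log-likelihood of a tobit model left-censored at zero.

    y: observed voluntary repayments (0 for those who made none)
    xb: linear predictions x_i'b for the same individuals
    sigma: standard deviation of the latent normal error
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    ll = 0.0
    for yi, m in zip(y, xb):
        if yi <= 0:
            # Censored observation: P(latent outcome <= 0) = Phi(-m / sigma).
            ll += math.log(STD_NORMAL.cdf(-m / sigma))
        else:
            # Uncensored observation: normal density of the observed value.
            ll += math.log(STD_NORMAL.pdf((yi - m) / sigma) / sigma)
    return ll


# Two toy observations: one censored at zero, one positive repayment.
print(tobit_loglik([0.0, 2.0], [0.0, 2.0], sigma=1.0))
```

Production code would also guard against the censored term underflowing to zero for very large predictions, which this sketch does not do.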

Treatment of those borrowing below the unassessed maximum
We also investigate closely those who borrow less than the unassessed borrowing maximum (i.e. £x in Figure 1) to determine how they should be treated. We repeat the above analysis, splitting out those who borrow below x (Type A, or 'low borrowers') and above x (Type B, or 'high borrowers') from those who borrow exactly x (Type X, or higher income households). Figure 3 shows the distribution of university attendance for the three groups, split by gender. Note that this differs from Figure 2 by showing the density function for each of the three groups, so that the total for each group sums to one. The most notable feature is the high share of Type A 'low borrowers' in the group of universities with missing entry scores. This group of institutions typically consists of smaller, lower-status universities and Further Education colleges. Beyond that, it is clear that Type A 'low borrowers' look much more like Type B 'high borrowers' than they do typical Type X higher income household individuals. A very low share of Type A 'low borrowers' and Type B 'high borrowers' attend the top 30% of universities, with a tiny fraction going to the top 5%. This contrasts with Type X higher income household individuals, a high share of whom go to top institutions. In Table A1 in the Appendix, we investigate the voluntary repayments of Type A 'low borrowers' and Type B 'high borrowers' relative to Type X higher income household individuals.

21 This result is surprising, although it is important to keep in mind that involuntary repayments are by definition higher for anyone above the income threshold for repayment, meaning higher earning individuals are paying more on average overall. It is possible that higher earnings are associated with higher financial literacy and a better understanding of the system (because the interest rate is so low, it is not obvious why people would want to make voluntary repayments). However, we do not think this is important for our conclusions here.
Both make much smaller voluntary repayments than Type X individuals, with Type A 'low borrowers' making smaller voluntary repayments than Type B 'high borrowers'. Of course, Type A individuals have lower debt, which makes them less likely to make large repayments. However, the gap relative to Type X higher income household individuals is large, which again suggests that Type A individuals are more like Type B than Type X individuals. Based on this evidence, we treat Type A and B individuals as our 'lower income household' group. We investigate the robustness of our results to this assumption in our subsequent analysis.

We now move on to consider the raw earnings differences between individuals from the two groups. Figure 4 shows the earnings distribution for male and female graduates from higher income households (grey triangles), graduates from lower income households (black circles) and non-graduates (grey line), for the 1999 cohort in 2012/13. The non-graduate sample comes from the HMRC databases. More information is given in Britton et al. (2019), including a discussion of the relatively high proportion of graduates and non-graduates who have zero or low earnings. In that paper we argue this reflects a combination of higher earners who are working abroad and hence do not pay tax, lower earners with intermittent attachment to the labour market, and part-time and self-employed workers who fall below the tax threshold. Points to the right of each figure show the mean for each group. The results are striking: graduates from higher income households earn more right across the distribution, from the 20th percentile upwards, for both females and males. Whilst graduates from both lower and higher income households earn more than non-graduates, the gap between graduates from lower and higher income backgrounds is also sizeable, particularly at the very top of the distribution.
Indeed, whilst around 20% of the graduate population come from higher income households by our definition, of those in the top 1% of the earnings distribution, 45% (men) and 39% (women) come from higher income households.
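The over-representation at the top can be summarised as the ratio of the higher income share among top earners to the higher income share overall. The sketch below, with hypothetical function names, computes the top-group share from individual data and the ratio from the shares quoted above for men; ties at the cut-off are broken by sort order, which is a simplifying assumption.

```python
def high_income_share_at_top(earnings, high_income, top_fraction):
    """Share of graduates from higher income households among top earners.

    earnings: annual earnings for each graduate
    high_income: matching 0/1 indicators (1 = higher income household)
    top_fraction: size of the top group, e.g. 0.01 for the top 1%
    Ties at the cut-off are broken by sort order (a simplifying assumption).
    """
    k = max(1, round(len(earnings) * top_fraction))  # size of the top group
    ranked = sorted(zip(earnings, high_income), key=lambda p: p[0], reverse=True)
    return sum(h for _, h in ranked[:k]) / k


def representation_ratio(top_share, overall_share):
    """How many times more common the group is at the top than overall."""
    return top_share / overall_share


# Figures quoted in the text for men: 45% of the top 1% versus about 20% overall.
print(round(representation_ratio(0.45, 0.20), 2))  # 2.25
```

On these figures, graduates from higher income households are a little over twice as common among the top 1% of male earners as in the graduate population as a whole; the equivalent ratio for women is about 1.95.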

Descriptive earnings differences
As already discussed, students from different socio-economic backgrounds take different degrees, with students from higher income households more likely to attend high status universities. This sorting into universities could explain the raw earnings differences between those from high and lower income households. Figure 5 takes a first step to address this by plotting average earnings (conditional on earnings being positive) for graduates by the university groups defined above, by gender. Even within these institution groups, the differences in average earnings between graduates from high and lower income households are clear, suggesting that, broadly speaking, even when comparing graduates from similar institutions, those from a higher income background go on to do better in the labour market. This appears to be particularly pronounced for men from the most selective universities. Of course, these figures do not properly control for different degree choices between those from high and lower income backgrounds. Individuals from higher income households might attend the more selective institutions within our coarse university groupings, or might make subject choices that lead to higher earnings. In the next section we address this more formally by investigating earnings differences conditional on subject and institution, as well as some other demographic characteristics.

V. Estimation
In Table 6 we estimate the following model, conditional on individuals having positive earnings: 22

$$\ln y_{it} = \alpha + \beta H_i + X_{it}'\gamma + \varepsilon_{it}$$

where y_it is the earnings of individual i at time t, H_i is an indicator for whether an individual is from a higher income household and X_it is a vector of controls. We sequentially add additional controls into the vector X_it.
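When the dependent variable is log earnings, β is a log-point difference and the implied percentage premium is exp(β) - 1. For coefficients around 0.1 this is close to 100β, but the gap widens for larger coefficients. The short Python sketch below converts between the two; the coefficient values are illustrative and are not estimates from Table 6.

```python
import math


def pct_premium(beta: float) -> float:
    """Exact percentage premium implied by a log-earnings coefficient."""
    return 100 * (math.exp(beta) - 1)


def log_coefficient(pct: float) -> float:
    """Log-earnings coefficient corresponding to a percentage premium."""
    return math.log(1 + pct / 100)


for beta in (0.05, 0.10, 0.20):
    # Prints premia of 5.1%, 10.5% and 22.1% respectively.
    print(f"beta = {beta:.2f}: premium of {pct_premium(beta):.1f}%")
```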
Columns (1) and (2) show the raw differential in earnings between students from higher and lower income households, conditioning only on cohort and year. The differences in earnings are sizeable, at around 21% for men and 16% for women. Controlling for subject of degree in columns (3) and (4) reduces these premia by only 1-2 percentage points, suggesting that choice of subject explains very little of the differences in earnings. Despite the early specialisation of English degrees, the socio-economic gap in earnings does not appear to be primarily driven by subject choice or by the sorting of students from higher income households into particular subject areas. By contrast, adding variables which control for the characteristics of students attending a particular degree course (columns 5 and 6, labelled HESA controls in the table, i.e. variables describing the course participants from the HESA data) reduces the coefficients considerably. This implies that the nature of the degree course, particularly the entry tariff score, explains more of the variation in earnings between high and lower income students than does their choice of subject. In the final column we include university fixed effects (labelled HEI fixed effects in the table). This does not make an appreciable impact on the coefficients over and above controlling for the characteristics of the students attending a particular degree course. Overall, the results indicate that even allowing for both institution and subject, students from higher income households earn around 10% more than students from lower income households. This suggests that higher education does not fully level the playing field in terms of graduates' earnings.

22 Alongside this approach, we also estimated a probit model predicting employment. We find negligible differences in employment between individuals from high and low income households (see Appendix for more details). However, the data do not include an indicator of whether someone is employed. We infer employment from whether or not positive earnings are reported, which means we classify those with zero earnings as not employed. Unfortunately, this group will also include individuals who move abroad. Moving abroad could be more common for individuals from higher income households, again causing some bias and an underestimate of the socio-economic gap in employment.

We assess the robustness of our findings to different definitions of higher income in Table 7. 23 Defining a student as being from a higher income family in a number of different ways, we still obtain the same broad result: there remains a wage premium from coming from a higher income household of approximately 10%, even conditioning on degree subject and institution. The alternative definitions of a higher income student are as follows, where x and y are defined in Figure 1:
• Baseline definition: amount borrowed = x;
• Definition 1: amount borrowed ≤ x;
• Definition 2: amount borrowed < y;
• Definition 3: amount borrowed = x, but individuals with amount borrowed < x excluded.

23 We also tested the robustness of our results to including second HESA moments at the course level where possible and found that it made a negligible difference.
Another robustness check is presented in Table 8, which compares OLS regression estimates to those obtained using a nearest-neighbour propensity score matching estimator. To deal with convergence issues, we use the specification from columns (5) and (6) of Table 6 and match on the same set of variables as included in the OLS equation. Again, the results are very similar: for men, the coefficient increases marginally, by one percentage point, while for women it falls slightly. Hence, even with an alternative, arguably more flexible estimation approach, we find that the wage premium for students from higher income households is around 10%.
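The matching idea can be illustrated with a minimal one-nearest-neighbour sketch. It assumes propensity scores have already been estimated (for example from a probit of the higher income indicator on the controls) and matches with replacement on the score alone. Details such as the number of neighbours, calipers or common-support rules are not stated here, so this is a generic illustration rather than the estimator behind Table 8; the scores and log-earnings values are invented.

```python
def att_nearest_neighbour(treated, controls):
    """Average treatment effect on the treated via 1-nearest-neighbour matching.

    treated, controls: lists of (propensity_score, outcome) pairs, where the
    treated group is graduates from higher income households. Each treated
    unit is matched, with replacement, to the control with the closest
    propensity score; ties go to the first such control in the list.
    """
    if not treated or not controls:
        raise ValueError("need at least one treated and one control unit")
    diffs = []
    for p_t, y_t in treated:
        _, y_c = min(controls, key=lambda c: abs(c[0] - p_t))
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)


# Invented propensity scores and log earnings.
treated = [(0.30, 10.6), (0.70, 10.9)]
controls = [(0.28, 10.5), (0.65, 10.7), (0.90, 11.0)]
print(round(att_nearest_neighbour(treated, controls), 3))  # 0.15
```

Because each graduate from a higher income household is compared with the most similar lower income graduate, the estimate targets the average premium for the higher income group, without imposing the linear functional form of the OLS specification.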

Heterogeneity
We are interested in potential heterogeneous effects, particularly across different subject areas. It may be that the advantage of higher family income affects some subject-occupation trajectories more than others. Table 9 shows the preferred specification estimated separately for three subject areas: LEM (Law, Economics and Management courses), STEM (Science, Technology, Engineering and Mathematics courses) and Other (the rest, typically humanities, languages and the arts). The wage premium associated with coming from a higher income household is similar across all three subject areas, except for women who take STEM subjects, for whom the premium is somewhat lower, at around 7%. Another aspect of heterogeneity we are able to explore is how the wage premium associated with a higher income family varies with the institution attended. Figure 6 24 shows the wage premium for different groups of institutions, split by their average entry score. Strikingly, for men only, the wage premium for those from higher income backgrounds is considerably larger, at around 25%, if the student attended an institution in the top 5% of the institutional distribution. For women this effect is not evident.
We also explore heterogeneity by age. Because we cannot disentangle age from cohort and year effects, we show cohort effects holding year fixed in Figure 7, and year effects holding cohort fixed in Figure 8. It is evident that the wage premium from coming from a higher income background increases in both cases, suggesting that the impacts increase with age. Typically they appear to rise to around 14% for men and 12% for women by graduates' early thirties, starting at around half that in each case in graduates' mid twenties. There appear to be gender differences in how the effect changes with age; for women, there appears to be a dip in both figures at points that correspond to their early thirties, potentially due to family formation decisions. For men, this dip is not present, with the effects apparently continuing to rise. It should again be kept in mind that the time period we are investigating here coincides with the recovery from the 2008 recession. However, the fact that we see such similar patterns by both cohort and year suggests that the findings are not entirely driven by the recovery. Finally, in Table 10 we investigate the magnitude of the wage premium at different quantiles of the earnings distribution, motivated in part by the strong policy interest in England in effects through the distribution rather than just at the mean. Due to issues with convergence of the estimator, we use a restricted data set that includes only the 1999 cohort in 2011/12 and 2012/13. We provide raw earnings and conditional estimates at the 20th, 50th and 90th percentiles of the distribution, separately by gender. What is striking is that although at the median the conditional wage premium for men and women is around 10%, this rises to 16-20% for men at the bottom (20th percentile) and the top (90th percentile) of the distribution. A similar, though less stark pattern is present for women. 
This implies that those from higher income households are both better protected against low earnings and more likely to achieve high earnings.
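Table 10 relies on quantile regression. To show what that estimator minimises, here is a Python sketch of the check (pinball) loss and its intercept-only special case, in which the minimiser is a sample quantile. This is not the estimation code behind Table 10: with covariates the problem is usually solved by linear programming in a statistics package, and the toy earnings below are invented.

```python
def check_loss(u: float, tau: float) -> float:
    """Check (pinball) loss for residual u at quantile tau, 0 < tau < 1."""
    return u * (tau - (1 if u < 0 else 0))


def intercept_only_quantile(values, tau):
    """Constant that minimises the total check loss over the sample.

    With no covariates, quantile regression reduces to this problem and a
    sample tau-quantile solves it, so searching over the observed values is
    enough for this illustration.
    """
    return min(values, key=lambda q: sum(check_loss(v - q, tau) for v in values))


earnings = [10, 20, 30, 40, 100]  # toy earnings in £000s
print(intercept_only_quantile(earnings, 0.5))  # 30
print(intercept_only_quantile(earnings, 0.9))  # 100
```

The asymmetric weights (tau on positive residuals, 1 - tau on negative ones) are what let the estimator describe the 20th or 90th percentile rather than the mean, which is why it can reveal premia that differ across the earnings distribution.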

Summary
In summary, men from higher income households (with median household income of around £77,000, 2018 prices) earn around 21% more than men from lower income households (with median household income of around £26,000, 2018 prices), while the equivalent figure for women is 16%. These estimates roughly halve, to around 10%, once controls for university, subject and other demographics are included. The differences appear to increase with age, doubling between individuals' mid-twenties and their early thirties, before levelling off for women but continuing to rise for men. This suggests that previous work focussing on socio-economic differences in early career outcomes (e.g. Macmillan et al., 2013; Crawford et al., 2016) may underestimate earnings gaps. Given the limitations of our data on the career paths of these individuals, this calls for further research. In particular, it would be interesting to determine whether earnings differences by family background in graduates' early thirties are driven by greater participation in postgraduate study or perhaps by initial placement into careers with faster earnings trajectories. Whether differences in earnings by parental income exist even conditional on early career choices is also an important research question. The socio-economic gap in graduates' earnings is similar across broad subject groupings, with the exception of women doing STEM courses, for whom the earnings gap is considerably smaller. This latter result may be attributable to the types of occupations pursued by women in STEM, particularly those in medicine and the public sector, where salaries are more regulated and hence where coming from a more advantaged family may make less difference to earnings. It could also be a selection effect, if women pursuing STEM are somewhat atypical and if family background makes less of a difference to these atypical women's career prospects.
We find large variation in the premium by university type: men from higher income households attending universities with the most demanding entry requirements earn around 25% more than their relatively less well-off peers, even holding institution and subject choice constant. This is a stark finding, suggesting that being from a higher income household is particularly advantageous for men at the top institutions. For women we do not observe this result, a finding that aligns with Crawford and Vignoles (2014) and Dolton and Vignoles (2000), both of which find larger effects of private schooling on earnings for men than for women. While we cannot explain this gender difference, we know from other research that earnings from the top institutions are considerably higher than elsewhere (Britton et al., 2016) and that there are large differences in the occupational choices of men and women, even among those who take the same degree subject (e.g. Hakim, 2016). It is possible that some of the advantages of coming from a wealthy background are particularly pertinent right at the very top of the distribution. Our quantile regression results shed some light on this: they suggest that the advantages of coming from a higher income household are larger at the bottom and top than in the middle of the earnings distribution for both men and women. This suggests that there is still an advantage for women going into very high earning occupations.
The overall earnings differences we observe are large, particularly given the bluntness of our parental income measure. Indeed, we believe this bluntness is likely to result in an underestimate of the true difference. This is because some lower income individuals will borrow the higher income maximum, while some higher income individuals will borrow less than their full allocation. Further, some individuals will not borrow at all, and we would expect these to be disproportionately from higher income households. All of these issues are likely to bias our estimates downward.

Notes: High family income premium indicates the additional earnings for graduates from a higher income household. Low family income earnings indicate earnings of graduates from a lower income background. Percentage wage premium calculates the wage premium for those coming from a higher income household compared to the earnings of those from lower income households, assuming all controls are held constant across the two groups at their means. The first two columns of results show raw estimated earnings for graduates from high and low income households. The next two show the difference in earnings from low household income. The final two columns show the conditional difference from low household income, i.e. the difference once controls for region, age, subject and student characteristics are included. All figures are in £000's. Uses 2011/12 and 2012/13 data and the 1999 cohort (estimates are given for 2012/13). Standard errors are clustered at HEP level. *Indicates significantly different to the base (lower family income) at 10% level, ** 5% and *** 1%. HESA controls include variables describing the characteristics of students enrolled on the course. HEI fixed effects include fixed effects for each institution.

VI. Mobility scorecards
The results above suggest that attending university does not appear to level the playing field in terms of earnings. In this section, we follow Chetty et al. (2017) and estimate mobility scorecards to consider which institutions and courses are best at encouraging social mobility. This has potentially important implications for policymakers trying to reduce the earnings gaps that we have highlighted in this paper. Specifically, we investigate the extent to which different subjects and universities appear to help individuals from lower income backgrounds to become top earners, defined as having earnings in the top quintile of the earnings distribution. We split this analysis out by gender, although we consider the probability of getting to the top 20% of the overall earnings distribution, pooled across genders. 25 Chetty et al. (2017) define a mobility score for a given university as follows:

$$\text{Mobility score} = P(\text{Child in Q5} \mid \text{Parent in Q1}) \times P(\text{Parent in Q1})$$

where Q5 is the top quintile of the income distribution and Q1 is the lowest quintile. We are limited to a binary indicator of household income, and we therefore estimate the probability of a child making it to the top quintile of the earnings distribution given that they are from a lower income household. Figures 9 and 10 follow Chetty et al. (2017) by plotting, for men and women respectively, P(Child in Q5 | Lower income household) against P(Lower income household) for the 21 subject groups we observe in our data (see the Appendix for more information on the subject groupings and for the numbers behind the chart). For each subject, we give the rank of its overall scorecard, 26 and for a subset we show its mobility score. The figures give a sense of how good different subject groups are at moving individuals from lower income households to the top of the graduate earnings distribution.
We reiterate that care is needed in interpreting these findings, since our 'lower income household' individuals are lower income only in a relative sense and make up 80% of our population of students.
Medicine and economics are the highest performing subject groups by this measure (with scores of 0.4 and 0.273 respectively for men, and 0.33 and 0.353 for women). Although these subjects are amongst the worst performing in terms of the proportion of students enrolled from lower income backgrounds (65%), they are very good at delivering these students into the top 20% of the earnings distribution: at least 40% of lower income students taking these subjects reach the top 20% of the overall earnings distribution. Other high mobility subject groups are maths and computing, and engineering and technology. Miscellaneous law, economics and management subjects also do relatively well. On the other hand, languages and literature; history and philosophy; linguistics and classics; and biological and physical sciences all have a relatively low share of lower income students enrolled and are also very poor at delivering those students into the top of the earnings distribution. Although creative arts does far better at enrolling students from lower income backgrounds, it is the worst subject in terms of enabling students from lower income households to reach the top of the earnings distribution. This latter result arises because students taking creative arts are generally less likely to achieve very high earnings, rather than from any failure of poorer students in particular to thrive within this subject. Nonetheless, from a social mobility perspective, it is clear that some subjects are more likely than others to provide a pathway for poorer students to achieve very high earnings.

Figures 11 and 12 repeat the same exercise, but for universities. This gives an indication of how well each university delivers individuals from lower income households into the top quintile of the graduate earnings distribution.
The results show a clear negative relationship between the share of poorer students and the probability of those students reaching the top 20% of the earnings distribution. The best performing of the named institutions are clearly those based in London, with the prominent universities of LSE, King's, Imperial and UCL all performing well for both genders.
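The negative relationship described here is read off Figures 11 and 12. One hypothetical way to summarise it as a single number is the least-squares slope of the success rate on the access share across institutions. The sketch below is an illustration only, not a calculation reported in the paper, and its input values are made up rather than taken from the figures:

```python
from statistics import fmean


def ols_slope(x, y):
    """Least-squares slope of y on x, with one unweighted point per institution."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two or more paired observations")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical institution-level points, NOT figures from the paper.
access = [0.45, 0.60, 0.75, 0.90]   # share of students from lower income households
success = [0.35, 0.30, 0.20, 0.10]  # P(top quintile | lower income household)
slope = ols_slope(access, success)  # negative slope: higher access, lower success
```

Each institution is weighted equally here; weighting each point by its number of lower income students would be a natural alternative if institutions differ greatly in size.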
For men, around 60% of students at the prominent London based institutions are from lower income households, compared to less than 50% at Oxford, Cambridge and Bristol. The London institutions all deliver at least 40% of these individuals into the top 20% of the earnings distribution, with the LSE doing the best of the named institutions by delivering more than 50% into the top. Warwick is the highest performing of the named non-London institutions in terms of the mobility score (0.238). It accepts similar shares of poorer students to Durham, York, Exeter, Southampton and Cardiff, but is considerably more successful at delivering them to the top of the earnings distribution. The worst performing institutions have a delivery rate of under 10%. For women, the LSE and Imperial College appear to have relatively high rates of mobility, accepting similar shares of poorer students to Newcastle, Manchester, UCL, Southampton and Liverpool, but performing dramatically better in terms of delivery into the top. Oxford, Cambridge and Bristol are again similar, with amongst the lowest shares of students from poorer backgrounds and a delivery rate into the top of around 30%. Manchester is the best named non-London institution in terms of its overall mobility score (0.160).
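Because the mobility score is the product of access and the success rate, the rounded figures quoted above for men can be turned into rough lower bounds on the scores of the prominent London institutions. This is an illustrative calculation from the approximate shares in the text, not a reported estimate, and the LSE's own share of lower income students may differ somewhat from the 60% group figure:

$$
\text{Mobility score}_{\text{London}} \gtrsim 0.60 \times 0.40 = 0.24,
\qquad
\text{Mobility score}_{\text{LSE}} \gtrsim 0.60 \times 0.50 = 0.30
$$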
These mobility scorecards are purely descriptive and therefore do not reflect the causal impact of these institutions on students' earnings. For example, the scorecards do not account for the subject composition of institutions, or indeed the institutional composition of subjects, and do not adjust for proximity to the higher wage labour market in London, which is clearly an important factor. The data also refer to courses taken some years ago and hence may not reflect the outcomes of courses currently offered by these institutions. Not least, results are likely to have been affected by policy reforms since 2005 and by the increase in contextualized admissions. However, they do illustrate that, historically at least, some institutions admit a large number of lower income students who do not necessarily go on to have high earnings, whereas other institutions admit far fewer but are more successful in delivering such students into the top of the earnings distribution.

Figure 11. Institution mobility scorecard for men (horizontal axis: share of students from lower income households). Earnings rankings use the 2011/12-2013/14 tax years, treating individual observations as independent. The results are not very sensitive to this approach, however. We label a subset of universities we have been granted permission to name. Numbers behind this figure are available in the Appendix.

VII. Conclusions
Using an innovative administrative data set consisting of hard-linked individual-level tax and student loan data, as well as aggregate data on graduates' degree courses, we document how the earnings of graduates from relatively higher and lower income households vary, even after allowing for differences in subject taken and institution attended. The paper is the first of its kind to use such data in the English context to examine the correlation between a measure of higher parental income (i.e. being in the top fifth of the household income distribution of those borrowing from the SLC to attend HE) and graduates' earnings, while taking account in some detail of the type of higher education experienced.
The main finding of this paper is that graduates' family background, specifically whether they come from a relatively lower or higher income household, continues to influence graduates' earnings long after graduation. The socio-economic gap in graduates' earnings is by no means entirely explained by differences in the subjects studied or institutions attended, though it is approximately halved once we account for these factors. When we take account of different student characteristics, degree subject and institution attended, the gap between graduates from higher and lower income households is still sizeable, at around 10% at the mean and median. Further, we find that the gap is larger at the 20th and 90th percentiles of the graduate earnings distribution, suggesting that coming from a higher income household both protects against low earnings and provides greater opportunity for very high earnings. Men from higher income households who attend the most elite universities appear to do particularly well in terms of their earnings.
Clearly, there are caveats to these findings. First, our measure of parental income is blunt, and we miss the roughly 15% of students who are non-borrowers, who are likely to be from the highest income households. We therefore argue that our estimate is likely to understate the true earnings difference between graduates from the richest and the poorest households. Second, we are analysing earnings in the period after the 2008 financial crash, which may have affected the magnitude of our estimates. It is conceivable that the socio-economic gap in graduates' earnings depends on the state of the labour market, with students from the wealthiest families being better able to secure the good jobs that become scarcer during a recession. Third, changes to the higher education funding system in England in the intervening period may also mean that the socio-economic gap in earnings could differ going forward, for example due to the large subsequent increases in tuition fees and the increase in contextualized admissions policies.
However, the fact that we observe such robust effects is highly important, and suggests that simply focussing on getting poorer students into university is not enough. Perhaps most importantly, this paper encourages future research into the drivers of the earnings differences that we observe. Possible explanations include differences in attainment on entry to university, performance at university, progression onto postgraduate study, early career occupation and location decisions, career progression, networks, or non-cognitive skills. Uncovering the most important of these drivers could have significant implications for policy, universities and firms. Researchers seeking solutions for improving social mobility might also find inspiration in the subjects and institutions that perform best on the social mobility scorecards.

Notes: ***Indicates significant at the 1% level; ** the 5% level. Female is a dummy set equal to one for women. Controls for cohort, age and year are included in all columns.

Notes: ***Indicates significant at the 1% level. Standard errors are clustered at the individual level. The table provides the raw data behind Figure 6 in the main text.

Notes: ***Indicates significant at the 1% level. Standard errors are clustered at the individual level. The table provides the raw data behind Figure 6 in the main text.

Notes: ***Indicates significant at the 1% level. Standard errors are clustered at the individual level. The table provides the raw data behind Figure 7 in the main text.

Notes: ***Indicates significant at the 1% level. Standard errors are clustered at the individual level. The table provides the raw data behind Figure 7 in the main text.

Notes: ***Indicates significant at the 1% level. Standard errors are clustered at the individual level. The table provides the raw data behind Figure 8 in the main text.

Notes: ***Indicates significant at the 1% level. Standard errors are clustered at the individual level. The table provides the raw data behind Figure 8 in the main text.
Notes: ***Indicates significant at the 1% level; ** the 5% level. Controls for cohort, age and year are included in all columns, with sequential addition of subject and HESA controls as indicated. 'Other courses' covers individuals in classes that are too small for us to be given their fine subject grouping; in these cases, we observe only their broader subject grouping.