A new composite measure of ethnic diversity: Investigating the controversy over minority ethnic recruitment at Oxford and Cambridge universities

Measuring ethnic diversity currently amounts simply to counting ethnicities. This makes it impossible to correlate with achievement, to track changes over time or to compare institutions in a meaningful way. It is not clear, for example, whether it is more diverse to have many ethnicities with a large majority in one or two categories, or to have fewer ethnicities with a larger proportion in each. This article is not about race per se, but develops indices from cryptography and ecology to solve the problem of measuring diversity properly. Using data from freedom of information requests and university admissions offices, it analyses the ethnic diversity of undergraduate recruitment at Oxford and Cambridge universities over the past 10 years to resolve one of the most controversial issues in higher education today. It finds that both Oxford and Cambridge universities have increased ethnic diversity by more than 25% over the last decade, but that the problem of under-recruitment of black UK students remains. The article is an important contribution to research methodology, with clear applications in the field of school effectiveness, and informs the debate on social justice in education, particularly in a period of significant demographic change across Europe.


Introduction
Ethnicity is not the same as ethnic diversity, and looking to correlate the former against (say) attainment at the student level cannot be scaled up to correlating the latter against attainment at the level of the institution. Currently, in educational effectiveness research, there is no composite institution-level measure for ethnic diversity. At the student level, there are categorical ethnicity data and binary 'flags' such as (in schools in the UK) English as an Additional Language (EAL), but while this is sometimes up-scaled to the institutional level as a proxy measure for some undefined type of diversity in the same way as entitlement to free school meals (FSM) is a proxy measure in schools for socio-economic status, the relationship between ethnicity and EAL is at best problematic and nuanced, and at worst misleading (see Strand et al., 2014, pp. 6-7, 70). National and local government planners use census and revenue and customs data to forecast expenditure and enrolment in universities, but there is The reason that these 'difficult questions' about systemic bias have not been, and cannot be, answered authoritatively is because the current practice of simply counting the number of ethnicities in a university makes it impossible to track changes over time or compare institutions. It is clear, for example, that a student population with (say) 40 equally populated ethnic categories has twice the diversity of a student body with 20 equally populated ethnicities, but how is diversity to be measured when the categories are not equally populated? What is needed is a single index that does more than simply count how many ethnicities exist in a dataset, but instead takes account of the relative population size of those different ethnicities. This is what will be developed in this article.
In terms of nomenclature for what follows, the number of ethnic types in a dataset is called 'richness' and the relative abundance of these different types is called 'evenness'. The following example will illustrate the difference. Suppose the ethnic diversity in two different universities is being considered: the first consists of 30 students of Indian background, 20 of African background and 50 of European background; the second comprises 2 students of Indian background, 95 of African background and 3 of European background. Both institutions have the same total number of students (100) and have the same 'richness' (three categories), but the first institution has greater 'evenness' because the students are more evenly distributed across the three types (see Kelly, 2016). 2
Two groups of diversity indices, Shannon and Simpson, will now be adapted for use in the field of ethnicity in education before discussing the concept of true diversity as a means of comparing them. Part 2 of the article will apply the indices to Oxbridge undergraduate admissions. Readers who wish to avoid the mathematical derivation of the indices can note in passing Eqs (1), (2) and (3) and Table 1, and skip to Part 2 of the article.

Shannon-type indices
The Shannon diversity index (also known as the 'Shannon-Wiener index' and the 'Shannon-Weaver index') is based on an idea in cryptography, originally proposed by Claude Shannon to quantify the uncertainty of predicting letters contained in strings of text (Shannon & Weaver, 1948): the more different letters there are, and the more equal their proportional populations, the more difficult it is to predict which letter will be next in a string. It has applications in codebreaking. Adapting it here to the field of ethnic diversity in education, the Shannon index (H) quantifies the uncertainty in predicting the ethnic 'type' of the next student taken at random from a dataset. It is given by the formula:
$$H = -\sum_{i=1}^{R} p_i \ln p_i$$
where R is richness (i.e. the total number of types in the population), $p_i$ is the fraction of the population made up of the ith type in the dataset and $\ln p_i$ is the natural logarithm of $p_i$. Since the natural logarithm of any proper fraction ($0 < p_i < 1$) is negative, the purpose and effect of the negative sign in the formula is only to correct the sum to a positive total. 3 The Shannon index is sometimes called the Shannon entropy. Most non-parametric diversity indices are referred to in the literature as 'entropies' (see Ricotta, 2003), but 'entropy' is not used here in the same sense that it is used in thermodynamics. Here it is a measure of the unpredictability or uncertainty in the outcome of a sampling process (Jost, 2006); for example, when the Shannon index is calculated using log base 2 it is the average minimum number of 'yes/no' questions required to determine the ethnicity of the sampled student. Further details of the Shannon index are provided in Appendix A. When all categories in the dataset are equally common, $p_i = 1/R$ for all i and the Shannon index reaches its maximum value, $\ln R$.
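As a worked check on the maximum value quoted above, and assuming the standard form of the Shannon index, $H = -\sum_{i=1}^{R} p_i \ln p_i$, substituting equal shares $p_i = 1/R$ gives:

```latex
\begin{align*}
H_{\max} &= -\sum_{i=1}^{R} \frac{1}{R}\,\ln\frac{1}{R}
          = -R \cdot \frac{1}{R}\,\ln\frac{1}{R}
          = \ln R .
\end{align*}
```

With R = 3 equally populated categories, for instance, the maximum is ln 3, which is approximately 1.10.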
The more unequal the category populations, the larger the weighted geometric mean of the $p_i$ values, and therefore the smaller the Shannon index. If nearly everyone is concentrated in one category and other categories have near-zero populations, then the Shannon index approaches zero; in other words, there is very little uncertainty in predicting the ethnicity of the next randomly chosen student. So 'not-very-diverse' data has a low H and in extremis, when there is only one ethnic type in a dataset, H is zero.
A normalised version of the Shannon index is the Shannon equitability index, $E_H$, which is calculated by dividing H by its maximum value, $H_{\max} = \ln R$:
$$E_H = \frac{H}{H_{\max}} = \frac{H}{\ln R}$$
The advantage of $E_H$ is that its range is fixed from 0 to 1, with 1 representing a perfectly even distribution, whereas the range of the usual Shannon index is not fixed but depends on richness, R.
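The two indices can be illustrated with the two hypothetical universities from the Introduction (30/20/50 and 2/95/3 students across three categories). The following is a minimal sketch, not the article's own code: the function names are invented for illustration, the Shannon index is taken in its standard form H = -sum(p_i ln p_i), and returning 0 for a single-category dataset is an assumed convention.

```python
import math

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i); empty categories are skipped."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

def shannon_equitability(counts):
    """E_H = H / ln R, taking richness R as the number of non-empty categories."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        # Assumed convention: with one category there is no evenness to measure.
        return 0.0
    return shannon_index(counts) / math.log(richness)

# Counts from the worked example in the Introduction (illustrative only).
university_a = [30, 20, 50]
university_b = [2, 95, 3]

for name, counts in [("A", university_a), ("B", university_b)]:
    h = shannon_index(counts)
    e_h = shannon_equitability(counts)
    print(f"University {name}: H = {h:.3f}, E_H = {e_h:.3f}")
```

Both universities have the same richness (R = 3), but the first has H of roughly 1.03 against roughly 0.23 for the second, so the indices separate the two cases that simple counting cannot.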

Simpson-type indices
The Simpson index was invented to measure the degree of concentration of individuals by type (Simpson, 1949). Similar indices were proposed in 1945 by Hirschman and in 1950 by Herfindahl (see Hirschman, 1964; Lovett, 1988), so the metric that is known as the Simpson index in ecology is known as the Herfindahl-Hirschman index in economics. Adapting it here to the field of ethnic diversity in education, the Simpson index (k) is the probability that two students taken at random from a dataset have the same ethnicity. It is given by the formula:
$$k = \sum_{i=1}^{R} p_i^2$$
where $p_i$ is the fraction of the population made up of the ith type in the dataset. This is equivalent to the weighted arithmetic mean of the population fractions ($p_i$), with the fractions themselves being used as the weights. The bigger the Simpson index, the lower the diversity: k = 0 represents infinite diversity and k = 1 represents no diversity.
The interpretation of k as the probability that two individuals taken at random from a dataset turn out to have the same ethnicity assumes that the first individual is 'replaced' in the dataset before the second one is chosen. Appendix B describes the case where the individual is not assumed to have been replaced.
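The difference between sampling with and without replacement can be sketched numerically. The with-replacement function below follows the definition of k as the sum of squared population fractions; the without-replacement function uses the standard finite-sample form, sum of n_i(n_i - 1) divided by N(N - 1), which is an assumption here since Appendix B is not reproduced in this section. Function names are illustrative.

```python
def simpson_with_replacement(counts):
    """k = sum(p_i^2): chance that two draws WITH replacement share a type."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

def simpson_without_replacement(counts):
    """Chance that two draws WITHOUT replacement share a type.

    Uses sum(n_i * (n_i - 1)) / (N * (N - 1)), the usual finite-sample form
    (assumed here; it is not taken from the article's Appendix B).
    """
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two individuals to draw a pair")
    return sum(c * (c - 1) for c in counts) / (total * (total - 1))

# Counts from the worked example in the Introduction (illustrative only).
for counts in ([30, 20, 50], [2, 95, 3]):
    print(counts,
          round(simpson_with_replacement(counts), 4),
          round(simpson_without_replacement(counts), 4))
```

For datasets of this size the two versions differ only slightly, and the gap shrinks further as the number of students grows.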
The Simpson index is small for high diversity and large for low diversity, which is counterintuitive to a layperson, so various versions of the Simpson index can be found in the literature that use transformations to flip this around; that is to say, so that the index increases with greater diversity. I have found two such indices: the inverse

Part 2. Undergraduate admission to Oxford and Cambridge universities
The perceived failure of Oxbridge to admit more BAME students and the lack of diversity in its student body is raised annually by Labour MP, David Lammy. Although his concerns include an allegation of systematic 'apartheid' against poor socio-economic communities, most of his criticisms centre on the low admission rates for black UK students and as a result he has become (unfairly) something of a cheerleader for accusations of racial bias in higher education. David Lammy is not alone in his criticisms. In 2011, the then (Conservative) Prime Minister, David Cameron, himself an Oxford alumnus, said that he found it 'disgraceful that only one black UK student began a course at Oxford University in 2009'. Downing Street supported the PM's remarks but the data were demonstrably wrong. While it was true that only one UK Oxford undergraduate admitted in 2009 identified as 'Black-Caribbean', Cameron ignored a further 26 UK undergraduate students that year who identified as 'Black-African' and 'Black-Other' (BBC, 2011). And therein lies the problem. One of the reasons the topic is so hotly disputed is that the facts are unclear and open to many interpretations. Oxford, for example, 'refuses to publish a detailed breakdown of undergraduate offers by ethnicity' and 'instead publishes only a narrow set of data showing White and Black offers, ignoring Asian, mixed or other ethnic groups'. Ironically, given that Lammy, Cameron and others themselves aggregated the 2009 Black-BAME admissions, they express themselves 'disappointed that Cambridge university combines all black people together into one group' (Lammy, cited in Adams & Bengtsson, 2017). They are correct that granularity is desirable when trying to interrogate ethnicity data, but in the 2009 case above the admissions data were being used selectively and unfairly.
Samina Khan, director of undergraduate admissions and outreach at Oxford University, 'sees a very different picture':
If you look at the data correctly and properly, you'll find poor students who get three As or more are more likely to get into Oxford than if you're a more well-off student. It's a question of proportion more than looking at the raw numbers. (cited in Adams & Bengtsson, 2017)
Other spokespersons have suggested that the data needs to be interrogated at the level of academic subject:
Differences in success rates between ethnic groups are something we are continuing to examine carefully for possible explanations. We do know that a tendency by students from certain ethnic groups to apply disproportionately for the most competitive subjects reduces the success rate of those ethnic groups overall. (Paton, 2013)
This is particularly important in the case of black students at Oxbridge, a disproportionate number of whom apply for the most over-subscribed subjects:
Oxford's three most competitive courses (Economics & Management, Medicine and Maths) account for 44% of all black applicants, compared to just 17% of all white applicants. 28.8% of all black applicants for 2009 applied for Medicine, compared to just 7% of all white applicants. 10.4% of all black applicants for 2009 applied for Economics & Management, compared to just 3.6% of white applicants. (Collier & Wintersgill, 2013)
Black students apply disproportionately for the most over-subscribed subjects, contributing to a lower than average success rate for the group as a whole. That means nearly half of black applicants are applying for the same three subjects . . . the three toughest subjects to get places in. . . . This goes a very long way towards explaining the group's overall lower success rate. (cited in Collier & Wintersgill, 2013)
Stephen Tall (2011), former editor of Liberal Democrat Voice and development director for the Education Endowment Foundation, reiterated the point that the data needs to be looked at in terms of relative school attainment:
In 2009, 29,000 white students got the requisite grades for Oxford compared to just 452 black students. Knowsley in Merseyside, for instance, which Mr Lammy cites as failing to get students into Oxford and Cambridge, is the worst area in England for school achievement. In 2009 only 212 students in all of Knowsley took three A levels; of these, only three (1.4%) achieved AAA or better. Of those three, two got offers from Oxford. That's a pretty outstanding success rate. And the area of the country with the highest Oxford success rate is Darlington in the north-east. (Tall, 2011)
And Collier and Wintersgill (2013) reach a similar conclusion from their analysis of Universities and Colleges Admissions Service (UCAS) data:
In 2010, more than 32,000 UK white students got AAA or better at A-level (excluding General Studies) and around 29.2% of them applied to Oxford; 795 black students got AAA or better and more than 40% of them applied to Oxford.
David Lammy and other members of parliament are not swayed by these subtleties and maintain that it is 'not good enough' for universities to blame subject-bias or school performance (Heffer, 2017). They remain adamant that Oxford and Cambridge are 'fiefdoms of entrenched privilege and the last bastions of the old school tie' (Richardson, 2017), and more than 100 MPs have written to the heads of Oxford and Cambridge universities calling for urgent reforms in admissions. For these public representatives, the controversy is about the admission of UK/Home students (the data reveals, as Lammy rightly points out, a 'stark regional and socio-economic divide in intake' (Adams & Bengtsson, 2017; Burns, 2018)), but the underpinning problem of data interpretation comes from the fact that there is no single metric for gauging what is being discussed. Whether the population in question is UK/Home undergraduates or the entire Home and Overseas student body more broadly, the issue remains that we need a single measure of ethnic diversity before we can interpret the data. And we need to be able to measure trends in the data in light of claims made that Oxbridge is 'actually moving backwards in terms of elitism' (Richardson, 2017). The indices developed in Part 1 of this article can address these issues, although of course the fundamental reasons behind potential black or BAME under-achievement/under-recruitment are a problem of policy and politics, rather than of measurement.
The analysis below comes in two phases:
• Firstly, it examines data on the nationality 8 of undergraduates admitted to Oxford and Cambridge universities over the decade from 2007 to 2016. Diversity indices are then applied to that data to see whether the universities are making improvements in terms of recruitment.
• Secondly, we look at Oxford ethnicity data on UK/Home admissions (the same data is not available for Cambridge) to see whether Oxford is under-recruiting BAME undergraduates from the UK.
Together, the analyses give the most complete picture yet of Oxbridge admissions and go some way towards resolving the recurring controversy. Table 2a-e shows Oxford undergraduate offers by nationality, for the 10 years from 2007 to 2016, and Table 3a-e shows Cambridge undergraduate offers for the same period. The data refers to 'offers' rather than 'acceptances' because this captures the willingness of Oxbridge to accept applicants. Acceptances run at nearly 100% of offers at Oxbridge and the occasional case of an applicant failing to get the stipulated grades or refusing the offer does not negate the university's willingness to accept the applicant.
We have chosen to look at the entire undergraduate student body and not just students with UK/Home fee status, and for the sake of brevity, zero rows (for countries with no offers) have been removed.
Oxford numbers refer to the annual intake of undergraduates (e.g. 3,428 in 2007 and 3,648 in 2016), whereas Cambridge numbers refer to the total number of students in statu pupillari that year (e.g. 11,807 in 2007 and 11,811 in 2016). The data for Cambridge is therefore rolling trend data, but since the purpose of the analysis is not to compare the two universities, which would be problematic in any case for structural reasons, but rather to examine diversity trends in admissions over time for each university, this difference in the way the data are compiled is not important.
Data on UK/Home offers (the penultimate row) was compiled by adding up all the individual Local Authority numbers from UCAS data. Table 4 contains the calculation of the nationality indices for Oxford and Table 5 shows the calculation of the same nationality indices for Cambridge. Figure 1 shows Oxford's nationality indices trend in the period 2007-2016, and Figure 2 shows the trend for Cambridge. Figure 3 shows the true Shannon indices trend for both universities. They are shown in a separate figure for reasons of scale.
Bearing in mind that a decrease in the Simpson index represents an increase in diversity, it is clear from all three figures that diversity, as measured by nationality, has increased significantly over the 10-year period. For example, using true Shannon, Cambridge is 27% more diverse than it was 10 years ago, and Oxford is 32% more diverse. Using the Shannon index, the respective increases are 22% for Cambridge and 30% for Oxford. This is not to say, for reasons explained already, that Oxford is more diverse than Cambridge, but it is clear that both universities are much more diverse than they were a decade ago.
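Percentage changes in the Shannon index and in the true Shannon index are not interchangeable, which is why the two sets of figures above differ. The sketch below assumes that 'true Shannon' denotes exp(H), the true-diversity form associated with Jost (2006); that definition is not stated in this section, so it is labelled as an assumption. It uses the Introduction's own example of 20 versus 40 equally populated categories rather than the admissions data, and the function names are illustrative.

```python
import math

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i); empty categories are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def true_shannon(counts):
    """Assumed definition: true diversity of order 1, exp(H)."""
    return math.exp(shannon_index(counts))

def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage."""
    return 100.0 * (new - old) / old

# Toy populations: 20 and 40 equally populated categories.
twenty = [1] * 20
forty = [1] * 40

change_in_h = percent_change(shannon_index(twenty), shannon_index(forty))
change_in_true = percent_change(true_shannon(twenty), true_shannon(forty))
print(f"Shannon index change: {change_in_h:.1f}%")
print(f"True Shannon change:  {change_in_true:.1f}%")
```

Under this assumption the true Shannon doubles (a 100% increase) when the number of equally populated categories doubles, matching the intuition stated in the Introduction, whereas the raw Shannon index rises by only about 23%.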
Does this address the concerns and allegations of politicians? Well, not completely because, to be fair, David Lammy's point is that Oxbridge is under-recruiting UK/Home BAME students. So, we will now look at ethnicity data for the UK only and put that alongside the (whole undergraduate student cohort) 'nationality data' above to get a more complete picture of Oxbridge admissions diversity. Ethnicity data is not publicly available for Cambridge University, but is available for Oxford (Oxford Public Tableau, 2018b) and it is shown in Table 6a-e for undergraduate admissions in the decade 2007-2016. Table 7 shows the calculation of the indices for the dataset. The trends are shown in the usual way in Figure 4 and Figure 5. Once again, the true Shannon is shown on a separate figure for reasons of scaling.
As was the case with the 'nationality indices', the 'ethnicity indices' show a significant increase in diversity. That increase is not as marked for ethnicity as it is for nationality when measured by the true Shannon (22% for ethnicity vs 32% for nationality), but is slightly more marked in the case of the Shannon index (31% for ethnicity vs 30% for nationality). All the indices show a significant increase in diversity, especially in the years 2012-2016, a period of Conservative-Liberal (2010-2015) and Conservative-only (2015 et seq.) government.

Discussion and conclusion
Legitimate concerns have been expressed in the UK and elsewhere about widening participation at elite universities like Oxford and Cambridge, especially in the 'Black UK (Home)' category. Rightly so: white students are twice as likely to gain a place as their black counterparts; more than one in four Oxford colleges failed to admit a single black student between 2015 and 2017 (Horton, 2018); six of Cambridge's 29 undergraduate colleges admitted fewer than 10 black British students in 5 years (Diver, 2018). Concerns have also been expressed about the opaque attitude of the two universities to their admissions data. In 2018, for example, Cambridge revealed to the Financial Times, following a freedom of information request, that 'Magdalene College received 40 applications from black British students in the period 2012-2016 but only made between three and nine offers' [emphasis added]. The data were 'released as a range because otherwise the small numbers would mean that the anonymity of applicants would have been compromised' (Diver, 2018). Whose anonymity could possibly be compromised in this context is baffling and irritating to public representatives. As David Lammy rightly said:
We need transparency if we are going to have progress on access to our elite institutions for students from disadvantaged and under-represented backgrounds. (Diver, 2018)
The problem of analysing how much Oxbridge has done (or not done) to improve admissions from disadvantaged and under-represented backgrounds is partly a problem of nomenclature and partly a consequence of confused discourse. Conflating 'black' with 'BAME', for example, distracts from legitimate concerns about other (non-black) BAME and non-BAME ethnic categories, and from BAME overall.
The full spectrum of ethnicities needs to be analysed: only then can we be assured that Oxbridge is making genuine efforts to attract the best and brightest from all sections of society. And of course, the issue of ethnicity is anyway compounded by possible discrimination against applicants from low socio-economic backgrounds. In Oxford, for example, those who grow up in the richer south of England are much more likely to gain admission than their poorer northern counterparts (Horton, 2018). As a consequence, the discourse around Oxbridge admissions is confused and confusing for policy-makers and the public, and the core issues are seldom if ever teased out. For example, calls 'for parents and schools to help boost the number of under-represented minorities' (Diver, 2018) are frequently made, but the issue of quotas for underrepresented ethnic categories, which could only be set after alignment with school census data, is never discussed, although it is virtually impossible to have one without the other.
Overall, the analysis presented in this article suggests that both Oxford and Cambridge are making significant progress to widen BAME access overall, but not enough
More needs to be done to prepare high-achieving black students for applications to Cambridge and Oxford, which is why we have significantly increased funding to programmes like Target Oxbridge 9. (Diver, 2018)
Both universities clearly 'want to be more diverse' (Diver, 2018) and are making offers to an increasingly diverse pool of applicants (for example, nearly 50% of all black students who got the necessary grades applied to Oxford, compared to 28% of all white students with the same grades), but the actual raw number of students from the three UK Black categories remains stubbornly low, as David Lammy and other critics have pointed out. One likely fault line is low school attainment and teacher antipathy, rather than systemic racial bias within the universities, so the suggestion that Oxford and Cambridge should 'write to high-achieving BAME students to persuade them to apply, as the Ivy League colleges do in the US' (Bulman, 2017), is a good one and warranted by the analysis, even though, as university supporters suggest, 'it is not the purpose of universities to correct the failings of state schools' (Editorial, 2018). Part of the blame may also lie with an 'anti-aspiration' culture prevalent in some state schools, 'reinforced by populists' who perpetuate a Brideshead view of Oxbridge as a place of privilege and aquatint.
The real problem facing Oxford isn't the lack of diversity in its offers but the lack of diversity in its applicants. Not enough students from poorer or non-white backgrounds apply . . . which 'is not wholly the fault of [Oxford]. Some teachers actually put their pupils off applying'. (Editorial, 2018) Sam Gyimah MP, the Conservative government minister in charge of universities, agrees and suggests that for this reason, elite universities should 'start engaging early', even 'at primary school level' (cited in Horton, 2018): There are some schools that are schooling their pupils from the age of 12 or 13 so that when it gets to A-levels it's part of their DNA. What Oxford should be doing is helping those schools that do not have those in-built systems to actually develop those advantages. (cited in Yorke, 2018a) As for the views of BAME students themselves, it is important to record recent encouraging progress (Yorke, 2018a), not least because those already at Oxbridge have expressed concern that 'negative press' campaigns 'only serve to further alienate a proportion of the population who already doubt their ability to be accepted' (Gomes, 2017).
As a member of the university from inner-city northern England, I think Mr Lammy's constant bitter criticism of Oxford is bang out of order. (Horton, 2018) Clearly, for those most closely affected, the interpretation of the data is sensitive, especially given that the overall BAME picture is unclear relative to national background data. The last UK census showed that 18.3% of 17-24-year-olds are BAME. The corresponding figure at Oxford in 2017 was approximately the same at 17.9% (Office for National Statistics, 2011;Editorial, 2018), and the proportion of places Oxford gave to black applicants matched approximately the proportion of black students who achieved AAA or better at A-level at other universities (Editorial, 2018). So, it could be said that Oxford and Cambridge are being unfairly targeted (or that other universities are getting off the hook) and that critics are over-reaching in their claims of racial discrimination (Bulman, 2017). The issue is not so much the admission of BAME applicants per se, but the significant under-representation of categories within BAME, and there are dangers in continuing to misdirect the debate along these lines. Politicians who threaten that 'if Oxbridge can't improve, then there is no reason why the taxpayer should continue to fund them' (Richardson, 2017) only serve to hasten the day when these world-leading universities opt out of the public sector, like Ivy League universities in the USA, which the same critics hold up as exemplars of good practice. Yvette Cooper, Labour MP, regularly slams Oxford for making 'lame excuses' for its 'dismal performance' on diversity and universities minister Gyimah takes the same line (Horton, 2018). 
It is not clear what the endgame is for these critics on both sides of the political divide, although driving the two universities into the private sector in pale imitation of their Ivy League counterparts would certainly follow the trend of recent decades under both Labour and Conservative administrations. On the Left, David Lammy seems to be pushing for admission quotas and a legally binding system of positive discrimination (Editorial, 2018). On the Right, Sam Gyimah, himself an Oxford graduate and the first black president of the Oxford Union, seems to be pushing for Oxbridge to 'look beyond exam results' and to 'take in a broad range of factors to crack the issue of admissions' (cited in Yorke, 2018a), although it is clear from experience in schools that 'looking beyond examination results' is not to the advantage of low socio-economic status applicants. Any system of coursework, interviews  and personal statements favours students with high levels of cultural capital, agency and parental support (Machin & McNally, 2005;Felix et al., 2008;Ma, 2009). In short, Minister Gyimah favours a contextualised admission system that lowers the academic requirements for applicants from disadvantaged backgrounds. Other oversubscribed universities in the UK (e.g. University College London and Kings College London) have similar schemes in place (Yorke, 2018a), but Oxford and Cambridge have so far resisted the trend, opting instead for a flagging system that alerts tutors to disadvantaged applicants. Graham Virgo, Pro Vice-Chancellor for Education at Cambridge, is adamant that he will not give 'special treatment to BAME applicants' because he 'wants students to feel they have secured their place on merit rather than special treatment' (cited in Yorke, 2018b). 
Finally, beyond the students, schools and universities themselves, some responsibility must be accepted by policy-makers, none of whom have encouraged debate on related issues like norm-referenced entrance examinations, which would expose racial for Students, is a case in point. He has publicly threatened to 'fine universities' by slashing their tuition fees by a third if they 'fail to improve diversity' (Barber, 2018), but to his credit, he recognises that the whole issue of widening access generally, and in particular raising black-student achievement in schools and universities, is 'not just an Oxbridge challenge':  In doing so, Michael Barber has widened the debate from over-subscribed universities to over-subscribed courses in universities, thus closing the circle on the proposal advocated by his overseeing minister, Sam Gyimah, for contextualised admission.
Universities [should] recognise in [their] admission policies how much harder it can be for a young person at a tough inner-city school to get good A-levels, by reducing required grades a little. (Barber, 2018)
However, the Barber-Gyimah proposals are thin on detail: how to calculate the reduction in academic attainment that should be made for disadvantaged applicants and under what circumstances; what the proposed scale of 'toughness' for schools should be; how universities should compare one tough inner-city school with another; and whether 'being black' trumps 'being poor' when it comes to setting offers for admission. The irony is that Michael Barber was one of the architects of the last Labour government's policy on marketisation in education, which has helped to bring higher education to its current predicament. The powerful Public Accounts Committee (2018) agrees. While being clear that progress had been too 'incremental' and not 'transformational' enough, and that 'universities were not pulling their weight' on widening participation, the watchdog was unequivocal in its view that competition and marketisation in higher education had not resulted in 'the market working in students' best interests'. And this is clearly the fault of politicians, not universities. It is too convenient for policy-makers that when quasi-markets in education work, the improvement can be attributed to market freedom; and when they don't work, as in higher education in the UK, the failure must be corrected by greater regulation.

NOTES