Who does and doesn’t pay taxes?

We use administrative tax data from audits of self-assessment tax returns to understand what types of individuals are most likely to be non-compliant. Non-compliance is common, with one-third of taxpayers underpaying by some amount, although half of aggregate under-reporting is done by just 2% of taxpayers. Third party reporting reduces non-compliance, while working in a cash-prevalent industry increases it. However, compliance also varies significantly with individual characteristics: non-compliance is higher for men and younger people. These results matter for measuring inequality, for understanding taxpayer behaviour, and for targeting audit resources.


I Introduction
A large share of tax owed goes uncollected. In the UK, around 6% of tax liabilities go uncollected each year (HM Revenue and Customs, 2019), similar in scale to the value of corporation tax receipts. In the US this number is 16.4% (Internal Revenue Service, 2019); in Canada 11.6% (Canada Revenue Agency, 2019). This issue has become politically important, with strong public support for reducing both avoidance and evasion (Tax Justice UK, 2020). It has also captured increasing academic attention, with work on 'missing income' and 'missing wealth' (Zucman, 2015; Alstadsaeter et al., 2018, 2019; Guyton et al., 2020). A key input for this research is a clear measure of where in the income distribution under-reporting is taking place, and how much income is under-reported (by income source) across this distribution. This is needed to accurately assign missing income and create corrected income distributions (Alvaredo et al., 2016; Piketty et al., 2018).
In this paper we use data from random tax audits in the UK to study who is under-reporting income. We combine confidential administrative data on the universe of UK personal income tax filers with a randomised audit programme. We show three main results. First, non-compliance is relatively prevalent, though it is often small. One-third of tax filers under-report income to some extent, but more than half the aggregate under-reporting is done by the 2% most non-compliant taxpayers. This highlights the potential for policies like auditing to tackle the most extreme non-compliance. Second, the probability of non-compliance is relatively stable at around one-third across the income distribution. However, the share of tax that is under-reported declines across the distribution. Finally, individual characteristics including age and sex offer predictive power in estimating non-compliance. This is important for assigning missing incomes, particularly when estimated income distributions are then used to look at relative incomes of different groups, such as men and women (Garbinti et al., 2018; Burkhauser et al., 2020). It is also important for policy, since these characteristics can be used to target compliance resources such as audits.
Studies of tax enforcement sometimes compare the two main mechanisms observed from tax authorities: sending letters and conducting audits (Pomeranz and Vila-Belda, 2019; Slemrod, 2019; De Neve et al., 2020). Our results speak to why these approaches are not really comparable: they are attempting to solve different problems. 'Low-level' non-compliance is very prevalent, highlighting the need for scalable mass-market interventions like letters (Slemrod et al., 2001; Kleven et al., 2011; Mascagni et al., 2017; Bergolo et al., 2020). But extreme non-compliance is highly concentrated, with most of the revenue loss coming from a small fraction of taxpayers, so more costly interventions such as audits can be targeted here (DeBacker et al., 2018; Advani et al., 2019; Hebous et al., 2019; Sarin and Summers, 2019). The two approaches are complementary, aimed at different types of non-compliance. Given the cost of audit interventions, our results on what characteristics best predict non-compliance and help target audits are also particularly valuable.
In trying to understand heterogeneity in tax compliance behaviour, a large literature has grown up around the idea of non-financial motivations to pay, sometimes called tax morale (Andreoni et al., 1998; Luttmer and Singhal, 2014; Dwenger et al., 2016). Early work used surveys to understand reported opinions about tax, such as the extent to which tax evasion can be justified, and correlated these with individual characteristics (Alm and Torgler, 2006; Torgler, 2006; Lago-Peñas and Lago-Peñas, 2010; Doerrenberg and Peichl, 2013). More recently lab experiments have been used to relate observed lab compliance with individual characteristics (Cummings et al., 2009; Choo et al., 2016; Guerra and Harrington, 2018). Our results allow a mapping from individual characteristics to behaviour in real-world tax decisions (Alm et al., 2015). We find qualitatively similar patterns to those on reported morale and lab compliance: compliance is higher among women and increases with age. But we also show that some of the variation in compliance behaviour comes from characteristics of individuals' incomes: incomes that are easier to hide are more likely to be under-reported, and these are more common among men and the young. Evasion behaviour is thus the product of individual characteristics and opportunities to evade.
Beyond tax administration, our results speak to a broad literature in public economics that uses administrative data from tax authorities, rather than surveys, to understand patterns in inequality. Survey data are known to have issues with coverage, and the gap between surveys and national accounts has been rising over time (Atkinson et al., 2011; Ruiz and Woloszko, 2016; Burkhauser et al., 2018a,b; Webber et al., 2020). One reason is that income concepts are sometimes different between national accounts and what surveys are measuring (Corlett et al., 2020). 1 But even after defining a consistent measure of income, and weighting to account for non-response, there is substantial undercoverage. One proposed solution to ensure consistency with national accounts has been to implicitly assume that, by income source, all individuals under-report incomes proportionally, the so-called Distributional National Accounts (DINA) method (Alvaredo et al., 2016; Garbinti et al., 2018; Artola Blanco and Martínez-Toledano, 2019). Our results show that under-reporting is common among a large set of UK taxpayers (one in three filers, one in ten taxpayers) but the majority do not appear to under-report. There is also a very skewed distribution of under-reporting. Hence this under-reporting needs to be corrected directly, before applying something like a DINA approach, and audits are an effective tool in determining how to implement these corrections. This approach complements other recent work focusing on adjustments to the income distribution to capture evasion at the top (Guyton et al., 2020).
The remainder of the paper is organised as follows. Section II provides some context about the aggregate size of the UK tax gap, how it compares internationally, and the relative contributions of different taxes to the gap. Section III outlines the policy context and data sources. Section IV provides evidence on the distribution of non-compliance, both marginally and then by incomes and characteristics. Section V concludes.

II UK tax gap
In the UK the tax gap is defined as 'the difference between the amount of tax that should, in theory, be paid to HMRC, and what is actually paid' (HM Revenue and Customs, 2019). Measuring what is actually paid is straightforward. Defining the theoretical liability is more contentious. HMRC define it as 'the tax that would be paid if all individuals and companies complied with both the letter of the law and our interpretation of Parliament's intention in setting law.' This clearly incorporates (illegal) evasion behaviours, and also incorporates some avoidance where HMRC deem that to be 'unlawful', and this is the definition we will focus on. 2

II.1 Aggregate
The aggregate tax gap in the UK was £35 billion for the 2017-18 tax year, or 5.6% of total liabilities (HM Revenue and Customs, 2019). It has remained relatively stable, moving around between 5.3% and 6.5%, for more than a decade. To put these numbers into context, the five biggest taxes in the UK and (in parentheses) the share of revenue they raised in 2017-18 were: Income tax (25.8%), National insurance contributions (18.8%), Value added tax (17.9%), Corporation tax (7.9%) and Council tax (4.6%) (Office for Budget Responsibility, 2019). So the tax gap is equivalent in scale to between the fourth and fifth largest taxes in the UK. In terms of spending it is close to the amount spent on defence or central government spending on education.
Despite this, by international standards the UK tax gap is relatively low. The most recent tax gap estimates for the US, covering 2011-13, found a gap of 16.4% of revenue owed, triple the UK rate (Internal Revenue Service, 2016). In Canada, estimates for 2014 suggest the tax gap was around twice that in the UK, between 10.6% and 12.6% (Canada Revenue Agency, 2019). Few other countries produce comprehensive tax gap estimates, though some produce numbers for particular tax sources (and particular years) only (OECD, 2019, Table A146). Again the UK performs very well. So understanding how to reduce the tax gap, which is important in the UK, is even more pressing in other contexts.

II.2 Structure
To reduce the tax gap, it is useful to think about which areas of tax have the largest gap. Figure 1 shows the share of the tax gap that comes from various sources, and the share of revenue due from that source that is not brought in (the 'percentage gap'). Three features stand out. First, personal taxes (Income Tax, National Insurance Contributions, and Capital Gains Tax) and VAT each make up around a third of the total tax gap, at £12.9 billion and £12.5 billion respectively (HM Revenue and Customs, 2019). In one sense this is not surprising, since we have already seen that these are the biggest sources of tax revenue. More than half of the remaining gap is from corporation tax.
Turning to the share of revenue due that is not received, this is actually lowest for personal taxes, at only 3.9% (HM Revenue and Customs, 2019). By contrast VAT, which contributes the same share of the aggregate gap, has more than twice the percentage gap: one pound in every eleven (9.1%) is not paid. All other tax types fall somewhere in between. At first glance this might suggest relatively little scope for focusing on individual evasion. While the aggregate amount is large, it is only a small proportion of the revenue due from this source.

Figure 1: Structure of the UK tax gap by tax
Notes: 'Share of tax gap from source' shows the share of the overall tax gap that is attributable to that tax. 'Share of revenue not brought in' is the share of all revenue due from that tax that is never collected. Personal taxes comprise Income Tax, National Insurance Contributions and Capital Gains Tax. 'Other taxes' includes indirect taxes (Aggregates Levy, Air Passenger Duty, Climate Change Levy, Customs Duty, Insurance Premium Tax, Landfill Tax) and direct taxes (stamp duties, Inheritance Tax and Petroleum Revenue Tax). The share of the total gap from self assessment and PAYE is less than the total gap from personal taxes because the breakdown of personal taxes shown excludes avoidance (1.8%) and hidden economy (5.4%). Source: HM Revenue and Customs (2019).
However, within personal taxes there is substantial variation in the percentage tax gap, as seen in the right-hand panel. Almost all tax due through Pay As You Earn (PAYE), the UK's withholding tax system, is received, with only a 1% percentage gap. By contrast, more than one pound in six (17%) of all tax due through the self-assessment system is not collected. This is by far the largest percentage gap for any tax 'type'. 3 It is also substantial in terms of its contribution to the aggregate tax gap: £7.4 billion of self assessment tax was not paid in 2018, more than one-fifth (21.2%) of the total gap (HM Revenue and Customs, 2019).
The key driver of this difference in percentage tax gap is that PAYE income is 'third party reported' and also withheld at source (Kleven et al., 2011; Slemrod and Gillitzer, 2013; Adhikari et al., 2020). Non-compliance therefore needs to be done by or in concert with one's employer, which is relatively difficult (Bjørneby et al., 2018). By contrast self assessment has more scope for non-compliance, since it is self-reported and (partially) self-remitted. As we will show below, the level of this non-compliance also varies substantially by income source. To understand this better we next explain more about the structure of the UK income tax self-assessment system, and what verification and enforcement mechanisms are available.
III The self-assessment system

III.1 Who files?
Around one in four income taxpayers in the UK file taxes through the income tax self-assessment system (henceforth 'self assessment'): around 10 million people. 4 In contrast to the US, where essentially all taxpayers are required to file, the UK has an effective withholding system (PAYE) which is able to correctly collect tax for three-quarters of the taxpaying population. Individuals who work in employment (rather than self-employment or partnerships), or who have pension income, will typically have tax withheld at source. If any additional capital income (interest, dividends) is sufficiently small that it falls below the tax-free allowances, then they have no additional tax to pay. Self assessment therefore covers individuals who have income from self-employment or partnerships, who have substantial income from dividends (more than £5000 in 2018) or interest (more than £500 or £1000 depending on income tax bracket), or who have any income from property, since in all these cases correct withholding does not take place. It also covers all individuals with employment income above £100,000 (£40,000 before 2006). Individuals who are in self assessment are required to file a tax return each year reporting their income (and taxable capital gains). Data from these returns are captured and after processing are available via the HMRC Datalab in the Valid View dataset. 5 Here we obtain information on individuals' incomes and characteristics such as age, sex, region in which they live, and industry (for those who are self-employed or partners). Table 1 shows the share of taxpayers with income from each source.
Filers are more likely to be male (70%) than the population at large (49%). The mean age of filers is similar to that of the adult population, around 50, but filers are more concentrated in terms of age: there are fewer young adults and fewer filers a long way past retirement. Filers also have higher incomes than the average; although there are filers with lower incomes, they are more likely to be self-employed or pensioners than employees.
We combine these with data from the SA302 dataset, which is produced from individuals' tax calculations, and contains the tax due based on information about their incomes. We use these data sources over the period 1997 to 2012, and uprate all financial values to 2012 using the Consumer Price Index to account for inflation.
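The uprating step described above can be sketched as follows. This is a minimal illustration only: the function name `uprate_to_base` is ours, and the index values are placeholders chosen for easy arithmetic, not actual Consumer Price Index figures.

```python
# Hypothetical CPI index levels by year (illustrative placeholders,
# NOT actual published CPI values).
CPI_INDEX = {2008: 90.0, 2010: 95.0, 2012: 100.0}


def uprate_to_base(value, year, index=CPI_INDEX, base_year=2012):
    """Express a nominal amount from `year` in `base_year` prices."""
    return value * index[base_year] / index[year]


# A nominal amount of 1,800 reported in 2008, expressed in 2012 prices.
print(round(uprate_to_base(1800.0, 2008), 2))
```

Amounts already in the base year are left unchanged, since the ratio of index values is one.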

III.2 Verification and Enforcement
As described above, self assessment has proportionally by far the largest tax gap of any income type. To understand this, it is helpful to think of four broad ways in which compliance is achieved in the UK: (i) direct reporting, (ii) third party reporting, (iii) behavioural interventions, and (iv) audits. Understanding how each of these affects different parts of income reported on self assessment will help explain some of the patterns we see in non-compliance.
Direct reporting makes use of systems such as 'fiscal tills', so that transactions are recorded directly to the tax authority systems at the same point in time as they are recorded for the individual or business (Slemrod and Gillitzer, 2013). The UK is currently rolling out a more limited version of this (Making Tax Digital), which will send quarterly reports to the tax authority based on the entries in an organisation's accounts. This will apply to businesses, but also self-employed people and landlords who have high enough receipts, and the programme is explicitly intended to reduce the tax gap by improving accuracy and compliance (HM Revenue and Customs, 2020). Although this is less direct than live reporting when transactions occur, it is likely to have lower administrative costs for individuals than live reporting, and is more applicable for cases where transactions don't go through a till but are just direct transfers into bank accounts e.g. rental payments. Individuals can still keep two sets of accounts, but non-compliance becomes harder, and it becomes particularly difficult to be deliberately non-compliant and later claim it was in error. Although this system can be expected to reduce non-compliance among the self-employed and landlords, it was not at all in place during the period of our data.
Third-party reporting makes use of reports from counterparties to transactions. This can be done systematically, by automatic receipt of information and cross-checking with individual reports; or used in a non-systematic way, by contacting third parties only in cases where an investigation is taking place. The UK does both of these things. Employers directly report employment income to HMRC via the PAYE system, so HMRC can match this against self-reports in self assessment. 6 This reduces ex ante non-compliance since individuals are aware they are highly likely to have misreporting uncovered (Kleven et al., 2011). It also reduces ex post non-compliance, because mismatches on systematically reported items can increase the likelihood a tax report is selected for audit. HMRC can also access bank and credit card records, and non-systematic checking of these third party reports can be used to check the plausibility of returns that are being audited, again reducing ex post non-compliance. As will be shown below, variation in which income sources are covered by third party reporting is important in understanding the variation in who is most likely to be non-compliant.
Behavioural interventions are designed to discourage misreporting. These are typically relatively low cost, and are largely used to improve compliance before tax returns are filed. For example, when filing electronically, the computer system can check box-by-box reports against various benchmarks (previous returns, returns of similar individuals, ratio of reported values across boxes) to see whether the declaration seems plausible. If not, the system asks for confirmation. This helps pick up errors, as well as deliberate evasion. Since it makes use of benchmarks as well as third party reports, this can reduce non-compliance in areas where third party reports are currently not available, most notably dividend and property income in the UK. However, Fonseca and Grimshaw (2015) caution that interventions which pre-fill the form with information held by the tax authority can reduce compliance in some circumstances, if the pre-filled information is inaccurate.
Audits help to improve ex post compliance, by looking for erroneous tax returns and having them corrected, as well as improving ex ante compliance through the threat of audit. They are more costly than behavioural interventions, as they require time from a compliance officer to handle each case. 7 But they are able to look more directly at what is reported, and make use of multiple information sources, to try to get an accurate picture of a taxpayer's income. There is also evidence they change taxpayer behaviour in the years after audit (DeBacker et al., 2018; Advani et al., 2019).
Although most audits are targeted at taxpayers believed to be non-compliant, HMRC also performs a number of random audits ('random enquiries') to ensure all individuals have some chance of audit and to learn about characteristics that predict non-compliance. We make use of audit records from these random enquiries for tax years 1999 to 2009, available from CQI (Compliance Quality Initiative), an operational database that records audits of income tax self assessment returns. It includes operational information about the audits, such as start and end dates, and audit outcomes: whether non-compliance was found, and the size of any correction, penalties and interest. Over our sample period there were 38,000 random audits carried out, just below 3,200 a year. 8

IV Who is underpaying?

IV.1 Distribution of underpayment
The first key fact to understand in terms of compliance in self assessment is that around one-third (35.7%) of randomly audited individuals were found to be non-compliant. Most of the remainder had correct returns (53.2% of all taxpayers), while around a tenth (11.1%) were recorded 'correct with no underpayment': either the return had an error that didn't affect tax liability, such as a valid deduction amount being claimed but filed under the wrong deduction category, or in some cases errors that had led to overpayment. Non-compliance is therefore relatively prevalent in the UK self assessment system.
Despite the wide prevalence of some non-compliance, most of the under-reporting is due to a small minority. Figure 2 shows, among the non-compliant, who owes money and how much. Non-compliant taxpayers are divided into groups according to how much money the audit suggests that they owe: less than £100, between £100 and £1,000, between £1,000 and £10,000, or more than £10,000. 48% of taxpayers owe between £100 and £1,000, with a total of 60% owing less than £1,000. However, this 60% owe only 9% of the total missing tax. In contrast, 42% of the missing tax is owed by the 4% of people who owe more than £10,000. This highlights the enforcement value of audits in pursuing the small minority of individuals who owe extremely large sums of money. Just 200,000 people owe more than half of the self assessment tax gap (£3.7 billion) between them.
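The banding used for Figure 2 can be illustrated with a short sketch. It assumes a hypothetical list of audit corrections, one amount per non-compliant taxpayer; the values are invented for illustration and are not HMRC data. The band labels, the function name `band_shares`, and the treatment of amounts that fall exactly on a band edge (assigned to the higher band) are our assumptions.

```python
from bisect import bisect_right

# Band edges in pounds, matching the four groups described in the text.
BAND_EDGES = [100, 1_000, 10_000]
BAND_LABELS = ["under_100", "100_to_1000", "1000_to_10000", "over_10000"]


def band_shares(amounts_owed):
    """Return {band: (share of non-compliant taxpayers, share of missing tax)}."""
    counts = [0] * len(BAND_LABELS)
    totals = [0.0] * len(BAND_LABELS)
    for owed in amounts_owed:
        # Index of the band containing `owed`; an amount exactly on an edge
        # goes to the higher band (an assumption of this sketch).
        i = bisect_right(BAND_EDGES, owed)
        counts[i] += 1
        totals[i] += owed
    n = len(amounts_owed)
    total = sum(amounts_owed)
    return {
        label: (counts[i] / n, totals[i] / total)
        for i, label in enumerate(BAND_LABELS)
    }


# Hypothetical audit corrections in pounds (illustrative only).
example = [50, 400, 600, 900, 2_500, 4_000, 7_000, 30_000]
for label, (people, tax) in band_shares(example).items():
    print(f"{label}: {people:.1%} of taxpayers, {tax:.1%} of missing tax")
```

Comparing the two shares within each band is what shows the concentration: a band can hold a small share of taxpayers but a large share of the missing tax.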

IV.2 Incomes of non-compliers
We next study the income characteristics of non-compliers: their location in the income distribution, their income source, and their industry.
Evasion across the income distribution. To understand how tax compliance varies with reported income, in Figure 3 we divide taxpayers up according to the level of total income they declare the year before they are audited. Around 16% of taxpayers previously declared a zero income, whilst the remaining taxpayers are divided into five equal groups. Across all these groups, the probability of being non-compliant was relatively similar. The average amount of non-compliance in cash terms was also similar, at around £2,200, for all but the highest income group, for whom it was £3,600. However, since total tax owed is generally larger with higher previous incomes, the share of tax misreported is lower for higher income groups.
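The grouping behind Figure 3 can be sketched as below: a separate group for zero reported income, and quintiles defined only among those with positive reported income. This is an illustrative reconstruction, not the code used for the paper; the function name `income_groups`, the rank-based handling of ties, and the example incomes are all assumptions.

```python
def income_groups(incomes):
    """Assign group 0 (no positive reported income) or quintile 1-5.

    Quintiles are computed only among taxpayers with positive reported
    income in the year before audit. Ties are split by rank order, which
    is an assumption of this sketch.
    """
    groups = [0] * len(incomes)  # non-positive incomes stay in group 0
    # Indices of positive-income taxpayers, ordered from lowest to highest.
    ranked = sorted(
        (i for i, y in enumerate(incomes) if y > 0),
        key=lambda i: incomes[i],
    )
    n = len(ranked)
    for rank, i in enumerate(ranked):
        groups[i] = rank * 5 // n + 1  # quintile 1 (lowest) to 5 (highest)
    return groups


# Hypothetical prior-year reported incomes in pounds (illustrative only).
incomes = [0, 12_000, 0, 30_000, 8_000, 55_000,
           21_000, 15_000, 95_000, 40_000, 26_000, 18_000]
print(income_groups(incomes))
```

Non-compliance rates and average amounts owed can then be computed within each group.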
Studies of inequality often make use of tax data to get measures of income (Piketty and Saez, 2003; Piketty et al., 2018; Smith et al., 2019; Joyce et al., 2019; Advani and Summers, 2020a), and they are increasingly recognising the need to adjust measured inequality to account for evasion (Alstadsaeter et al., 2019). In the UK these results suggest that individual income inequality is slightly less bad than would be suggested by unadjusted fiscal income, since proportionally more income needs adding to the lower part of the distribution. 9 However, it is worth noting that evidence from the US suggests that some of the more complex types of tax avoidance behaviours are harder to pick up, and so understatement at the very top may be a larger issue (Guyton et al., 2020).
Evasion by income source. Comparing individuals who have income from only one income source other than bank interest gives an indication of how non-compliance varies by the type of income. 10 Figure 4 shows that 59% of taxpayers declaring only self-employment income are found to be non-compliant, double the rate of non-compliance among those declaring only employment income (29%) and almost four times the rate among those declaring only pension income (16%). This is not surprising since, as described previously, self-employment income is almost entirely self-reported, and income is often received from a large number of small transactions, making it easier both to make mistakes and to obscure income deliberately.

9 Note that since the self assessment income tax population contains proportionally more high income individuals than the population as a whole (though it does also have low income individuals), the percentage increases shown here cannot be directly applied to quintiles of the population income distribution.
10 Advani et al. (2019) study differences in dynamic response across income sources within individual. They highlight the importance of source volatility in the dynamic response across income sources. A similar comparison cannot be done here for non-compliance by source within individual, since audit data don't explain which source(s) was/were erroneous where multiple sources exist.

Figure 3: Non-compliance by reported income quintile
Notes: Audited taxpayers grouped into zero income or income quintiles conditional on positive reported income, based on income report the year before audit. Note that income quintiles are defined in the population of self assessment income tax filers, which contains more high income and fewer low income individuals than the UK as a whole. Source: Authors' calculations based on HMRC administrative datasets.
More surprising is that non-compliance is so high for those reporting only employment income, even though such income is also reported by the employer. This may be because some other source of income (employment or otherwise) was entirely unreported, or due to incorrectly claiming some relief or deduction (which are not third party reported) rather than because of errors in reporting the employment income. In some cases employment income will be incorrectly reported, but as noted above the tax gap for PAYE income is around 1% so this is likely to be rare. 11 However, it highlights the importance of audits even in cases where non-compliance seems more unlikely: almost a third of people filing self assessment and reporting only employment income still have some error.
In terms of revenue, the largest amount of tax is owed by non-compliant taxpayers declaring only property income. Although only 24% of those declaring only property income are found to be non-compliant, they owe an average of £3,630. This is almost 60% of the total tax they owe.
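The 'share of total tax owed' measure used here, additional revenue found at audit as a share of all tax owed including the under-report, can be written as a one-line calculation. In the sketch below the function name `underreport_share` is ours and the declared-tax figure in the example is hypothetical, chosen only to show the mechanics; it is not taken from the data.

```python
def underreport_share(tax_declared, additional_tax_found):
    """Additional tax found at audit as a share of total tax due.

    Total tax due is the tax originally declared plus the additional
    tax found, so the under-report is included in the denominator.
    """
    total_due = tax_declared + additional_tax_found
    if total_due <= 0:
        raise ValueError("total tax due must be positive")
    return additional_tax_found / total_due


# Hypothetical taxpayer: declared 2,500 of tax, audit finds 3,630 more.
print(f"{underreport_share(2500.0, 3630.0):.1%}")
```

Because the under-report appears in both numerator and denominator, the share is bounded between 0 and 1 for any non-negative amounts.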

Figure 4: Non-compliance by income source
Notes: Left hand panel shows the share of self assessment filers found to be under-reporting tax by income source. Right hand panel shows the additional revenue owed as a share of all tax owed (including the underreport), by income source. Based on self assessment filers who report only a single income source (other than bank interest), since audit data don't explain which source(s) was/were erroneous where multiple sources exist. Source: Authors' calculations based on HMRC administrative datasets.
For comparison, Kleven et al. (2011) find that 14.9% of self-employment income is under-reported, but only 2.3% of dividend income and 1.1% of employment income, not conditioning on compliance status. The comparable numbers from our context are 26.1%, 5.2%, and 3.5%. Under-reporting therefore seems to be a greater problem in the UK context than in Denmark, whether third party information is available or not. Similar results apply when looking at non-compliance status.
Together these results emphasise an understudied aspect of third party reporting: the extent to which it matters for the extensive margin (declaring at all) versus the intensive margin (correct declarations conditional on reporting). Figure 4 orders income sources from lowest to highest extent of third party reporting in the UK. 12 The right-hand panel shows clearly that third party information is important for the intensive margin, as it makes mismatches in reports easier to pick up. However, it appears to matter much less on the extensive margin, shown in the left-hand panel.
Evasion by industry. Industry data are available for taxpayers who have any self-employment income. Figure 5 shows that among those individuals (43% of all self assessment taxpayers) non-compliance is highest among those in the construction, transport, and hospitality sectors. In these sectors, around 60% of taxpayers were non-compliant. In revenue terms, non-compliant taxpayers in hospitality owe by far the most, at an average of almost £4,500. Across other industries, the amount owed varies between £2,500 and £3,200. Taxpayers in both hospitality and transport under-reported the largest shares of the total tax they owed, at 54% in both cases.
Again this underlines the importance of third party information: within self-employment this is hardest to get for individuals working in industries where cash payment is common. Construction, transport (taxi and minicab drivers) and hospitality (bed and breakfasts) are all industries where cash is prevalent. This observable variation in where non-compliance is common provides a means for the tax authority to target audits. However, this also highlights the value in understanding how audit effects spill over between individuals, since in a standard Beckerian framework a large 'campaign' that increased the probability of audit among taxpayers of a particular type would likely, over time, reduce non-compliance even among the non-targeted. It is also worth noting that as cash use becomes generally less common, under-reporting should be expected to fall.

IV.3 Characteristics of non-compliers
While income characteristics change the opportunity for non-compliance, individuals may also vary in their willingness to under-report for other reasons. The literature on tax morale (the intrinsic disutility from misreporting) has studied how this varies across individuals with different characteristics (Luttmer and Singhal, 2014; Besley et al., 2019). For the most part this has relied on reported preferences about tax behaviour (Alm and Torgler, 2006; Torgler, 2006; Lago-Peñas and Lago-Peñas, 2010; Doerrenberg and Peichl, 2013) or lab experiments that elicit willingness to misreport (Cummings et al., 2009; Choo et al., 2016; Guerra and Harrington, 2018). The use of audits lets us instead understand how this varies in a real-world environment.
Knowing how behaviour varies with these characteristics can also be useful for targeting audits, since they are not easily manipulable, unlike characteristics of income. It is important to recognise, however, that there may be ethical constraints on varying audit probabilities based on some of these characteristics.
Compliance by sex. Figure 6 shows that men are more likely to be non-compliant. 40% of men are found to owe additional tax revenue, compared with only 27% of women. Non-compliant men also owe more money (£2,470 on average) than non-compliant women (£1,790). However, this difference is driven by differences in male and female incomes. The additional amount owed is 32% of the total tax owed for both non-compliant men and non-compliant women.
Much is made of gender pay gap statistics, including those making use of tax data. These rely on tax data being an accurate measure of incomes, and previous research has noted that the gender pay gap is smaller in tax data than in surveys (Britton et al., 2019). Given the prevalence of non-compliance and gender-disparity in compliance behaviour, these results indicate that the administrative data underestimate the pay gap, driven by the differences in extensive margin behaviour, and this could be part of the reason for the patterns previously observed.
Compliance by age. Looking at non-compliance by age, Figure 7 shows that it is broadly similar for those below state pension age (SPA), with around 40% of individuals found to be non-compliant and, among these individuals, around a third of total tax being undeclared. Among those above SPA, non-compliance is much lower, at 21%, and less than a quarter of total tax due was undeclared among the non-compliant in this group. Part of this difference is driven by differences in the types of income received: it is harder to detect both mistakes and deliberate errors in declared self-employment income than in declared pension income. However, even controlling for income sources, non-compliance declines with age.
Our results on sex and age are consistent with Kleven et al. (2011), who find that in Denmark being older and being female are negatively associated with evasion. However, while they find that only the result on sex is statistically significant, these differences are all significant in our case.
Compliance by geography. Looking across the United Kingdom, Northern Ireland stands out as the region with the highest level of non-compliance. Figure 8 shows that 50% of self assessment taxpayers from Northern Ireland were found to be non-compliant, compared with between 35% and 39% across the rest of the UK. Northern Ireland is also unusual in having an average of 42% of total tax being unreported among those who are non-compliant, compared with 31% to 35% elsewhere. However, in cash terms, non-compliant taxpayers from London owe the largest amount (£3,420), driven by higher incomes in the capital. Within England excluding London, there is little difference across regions. In contrast to Kleven et al. (2011), being located in the capital city is not associated with lower evasion. These differences are all statistically significant, and hold when other characteristics are controlled for.
With our data we are not able to study further why this regional variation exists. However, if regional information about audits were ever made available, it would be interesting to study how regional enforcement policy relates to regional levels of non-compliance. This question is of particular policy relevance at the moment, as in recent years smaller HMRC compliance offices have been (and are being) closed and replaced with a small number of regional hubs. This policy will create further geographic variation in the cost of engaging in audit work, which can involve site visits, likely increasing the variation in compliance behaviour.
The role of tax preparers. Around 60% of taxpayers have a tax agent (an accountant or other tax preparer). Of these, about 40% are found to be non-compliant, compared with just 29% of those without. Among these non-compliant taxpayers, the average additional tax owed by those with agents is £2,740, around 35% of their total tax owed; for those without, it is only £1,400, or 27% of what they owe. Clearly people who have agents are likely to have more complex tax affairs than those without. This leaves more scope both for mistakes and for the use of deductions, reliefs and other tax reduction methods. The comparison made here is therefore not intended to imply any kind of causal relationship. However, these numbers show that the use of agents, who have experience of preparing tax returns and so can reduce mistakes, certainly does not solve the problem of non-compliance. Continued targeting of such taxpayers for audit is valuable.

V Conclusion
The UK tax gap is not large by international standards, but when compared with revenue from other taxes it is sizeable: it sits somewhere between corporation tax and council tax, the fourth and fifth largest sources of tax revenue. The largest single gap, both in aggregate value and as a share of tax owed, is from taxpayers who file self assessment. Using data from HMRC's 'random enquiry' audit programme, we are able to shed light on what types of individuals owe this money. The patterns we describe are not intended to be interpreted causally, since some of these characteristics are choices for the taxpayer. Nevertheless, the associations are interesting to understand for four reasons.
First, measures of inequality (whether by age, sex, or income shares across the distribution) often rely on tax data (Piketty and Saez, 2003; Piketty et al., 2018; Smith et al., 2019; Joyce et al., 2019; Advani and Summers, 2020a). Differential non-compliance based on these different characteristics will bias such measures, unless corrected for. Our results allow researchers to sign the bias, and to make corrections by imputing the missing income according to these characteristics. Given the importance of these statistics for both policy and public debate, such corrections are essential to provide an accurate understanding of the levels and trends in inequality.
Second, we generally want to understand the differences in individual behaviour. To the extent that this varies with observables, and those observables can be related to preference parameters such as risk aversion, one can study how those parameters relate to observed compliance behaviour in a real world setting. Lab settings typically let one elicit these preferences simultaneously with behaviours, but given the importance of framing effects it isn't clear that responses to tax experiments in a lab overlap well with real behaviour. For example, Advani et al. (2019) note that although a number of lab experiments find evidence of so-called 'bomb crater' effects in compliance (Maciejovsky et al., 2007; Kastlunger et al., 2009), with individuals being less compliant immediately after audit, such behaviours are not seen in administrative tax data; in fact the opposite is observed.
Third, these results underscore the value of having third party information in reducing non-compliance (Kleven et al., 2011). The UK does not have systematic asset registers, so, unlike in many other countries, income from shares and property cannot be systematically verified. This increases the scope for non-compliance. A policy move towards increasing access to information for the tax authority would reduce non-compliance in a broad way without having the high marginal costs of an audit.
Finally, our results are relevant for the current targeting of audits, though of course directly using them for targeting is subject to the Lucas critique. They highlight the value of using observable characteristics, even ones that do not directly relate to tax owed, in targeting compliance work. This naturally leads to important ethical and legal questions: is it reasonable to use information about sex in determining who to audit? Administrative datasets on tax do not include ethnicity information, but this seems a clear context in which people may feel uncomfortable about such targeting, if it were to be relevant. These are trade-offs that must be confronted in making policy. The alternative is not obviously less harmful: many people who are unlikely to be non-compliant receive tax audits, which are costly for the individual audited as well as for the taxpayer, because data that are informative about their likelihood of non-compliance were not used. In this sense we fit into a new and growing literature on using characteristics to predict legal outcomes (Skeem and Lowenkamp, 2016; Fagan and Ash, 2017; Amaranto et al., 2018).
To make further progress on policy implications, more needs to be understood about how taxpayers learn and respond to audit risk. Advani et al. (2019) and DeBacker et al. (2018) show that audits shift the behaviour of taxpayers who receive audits. But a simple Allingham-Sandmo model of rational non-compliance would imply that as the risk of being audited rises for particular groups, compliance by that group should increase even among the non-audited (Allingham and Sandmo, 1972; Yitzhaki, 1987). For policy this is a positive effect, since non-compliance falls faster as audit probabilities rise than it would if behaviour were static. Quantifying this behaviour is a natural next step in thinking about the number, as well as targeting, of audits.
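The deterrence argument can be made concrete with a minimal sketch of the standard Allingham-Sandmo set-up. The notation is assumed for illustration and does not appear elsewhere in this paper: a taxpayer with true income $W$ declares $X$, faces tax rate $\theta$, is audited with probability $p$, and pays a penalty rate $\pi > \theta$ on undeclared income if audited. The sketch also assumes an interior solution and a concave utility function $U$.

```latex
% Assumed notation (illustrative only): W true income, X declared income,
% \theta tax rate, \pi penalty rate on undeclared income (\pi > \theta),
% p audit probability, U increasing and strictly concave.
\begin{align*}
\max_{X}\; \mathbb{E}[U] &= (1-p)\,U(Y) + p\,U(Z), \\
Y &= W - \theta X, \qquad Z = W - \theta X - \pi\,(W - X), \\
0 &= -\theta\,(1-p)\,U'(Y) + (\pi - \theta)\,p\,U'(Z)
  && \text{(first-order condition)}, \\
\frac{\partial X}{\partial p} &= -\frac{\theta\,U'(Y) + (\pi - \theta)\,U'(Z)}{D} > 0,
  \qquad D = \theta^{2}(1-p)\,U''(Y) + (\pi - \theta)^{2}\,p\,U''(Z) < 0 .
\end{align*}
```

Because $D < 0$ by concavity and the numerator is positive, declared income rises with the audit probability for any taxpayer at an interior optimum, whether or not that taxpayer is actually audited. This is the channel through which a higher audit risk for a group could raise compliance among its non-audited members.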