The Effect of Education on Criminal Convictions and Incarceration: Causal Evidence from Micro-Data

This paper studies the causal effect of educational attainment on conviction and incarceration using Sweden's compulsory schooling reform as an instrument for years of schooling and a 25 percent random sample from Sweden's Multigenerational Register matched with more than 30 years of administrative crime records. The first stage of the analysis employs a differences-in-differences design to account for the non-random implementation of the reform across municipalities, and finds that exposure to the reform increased average educational attainment by 0.28 years for males and 0.16 years for females. Our 2SLS estimates indicate that more schooling has a significant negative effect on convictions and incarceration at both the extensive and intensive margins. These effects are generally seen for both males and females. Specifically, one additional year of schooling decreases the likelihood of incarceration by 16 percent for males and the likelihood of conviction by 7.5 and 11 percent for males and females, respectively. In addition, we find that the effect of education on crime persists across birth cohorts, throughout the life cycle, and across crime categories.

Matthew Lindquist SOFI Stockholm University

Introduction
There is an extensive body of research on the private economic returns to education. 1 In contrast, much less attention has been given to the impact of education on non-economic outcomes. 2 Knowing the size of the external (social) benefits of education is, however, necessary when evaluating policy initiatives. The current paper contributes to this literature by studying the impact of education on one such non-economic outcome, criminal activity, using register data from Sweden.
There is much evidence that criminals tend to be less educated than the rest of the population. For instance, according to a 2003 Bureau of Justice Statistics report, about 41 percent of inmates in U.S. prisons or jails in 1997 had not completed high school (or its equivalent), compared with just 18 percent of the general population aged 18 or older (Harlow, 2003). 3 This phenomenon is not isolated to the United States. Using Census data, Machin, Marie, and Vujić (2011) show that 2.57 percent of UK men aged 21-25 with no educational qualifications were in prison in 2001, compared with 0.3 percent of those with some qualifications. The data for the current project show the same pattern. Using a 25 percent random sample of Swedes born between 1943 and 1955 (more than 400,000 individuals in total), we see that the average years of schooling for males (females) with at least one conviction is 10.8 (11.4), while the average years of schooling for males (females) with no convictions is 11.5 (11.7).
1 See Card (1999) for a review.
2 One of the more studied non-pecuniary outcomes is health: Grossman and Kaestner (1997) and Lleras-Muney (2005) find a positive relationship between education and health outcomes. See Oreopoulos and Salvanes (2009) for a review of the literature; they argue that the non-pecuniary returns to education are at least as large as the pecuniary returns.
3 Pettit and Western (2004) report that 40 percent of state prisoners in 1997 lacked a high school diploma.
But are the relationships described above causal? Theoretically, there are a number of reasons to expect that an increase in education causes a decrease in crime. Two recent reviews of the economics of education and crime literature by Lochner (2008, 2010) highlight these potential underlying mechanisms. 4,5 First, education increases wages and, therefore, the opportunity cost of committing a crime.
Individuals cannot be engaged in the legitimate labor market during the time allocated to planning and executing a crime, or while detained at the police station upon arrest, jailed prior to conviction, or incarcerated post-conviction. Second, as highlighted by Lochner and Moretti (2004) and Lochner (2010), individuals may learn to be more patient through schooling. Such individuals place more weight on their potential future earnings and, thus, may be more likely to factor in the chances of getting caught and the expected prison sentence when deciding whether to commit a crime. Becker's (1968) economic model of crime implies that the increased opportunity costs and patience associated with higher education would decrease criminal activity. Third, increased schooling can decrease the chance that an individual engages in criminal activity by increasing his attachment to legitimate society. In addition, the more educated an individual is, the more educated his peers are likely to be. If education decreases crime through any of the above mechanisms, then there may also be a social multiplier or peer effect that further decreases criminal activity. 6 A number of empirical issues, however, make it difficult to interpret the education-crime relationships described above as causal. Perhaps the most important source of endogeneity is unobserved heterogeneity: the negative correlation may arise because unobserved individual characteristics, such as low risk aversion, lack of patience, or low ability, simultaneously place individuals at high risk of both crime and poor educational outcomes. Another potential source of endogeneity is reverse causality. How much of the observed negative correlation is driven by the effect of an individual's criminality on his educational outcomes?
Hjalmarsson (2008) provides some evidence that this is a valid concern, at least with regard to crimes committed during the teenage years. 7 Lochner and Moretti (2004) were the first to address these issues and provide convincing empirical estimates of the causal effect of education on crime. They study the causal effects of schooling on incarceration and arrests by instrumenting for educational attainment (years of education or high school graduation) with compulsory schooling laws in the state of birth and year when the individual was 14 years old. They find that one extra year of schooling results in 0.10 and 0.37 percentage point reductions in the chance of incarceration for whites and blacks, respectively. They also find that a one-year increase in average schooling reduces both property and violent crimes by 11 to 12 percent. 8 More recently, Machin, Marie and Vujić (2011) used changes in the minimum school leaving age in Britain, from 14 to 15 in 1947 and from 15 to 16 in 1973, as an exogenous source of variation in years of schooling. They find a significant, negative effect of education on property crimes and an insignificant effect on violent crimes. 9 In a recent working paper, perhaps the first to use micro-data to study the causal impact of education on crime, Meghir, Palme and Schnabel (2011) study the impact of Sweden's compulsory schooling reform on crime. The use of this reform as a source of exogenous variation in education was pioneered by Meghir and Palme (2005), who looked at the effect of the reform on educational attainment and earnings. 10 The Swedish compulsory school reform, which primarily extended compulsory schooling from seven to nine years, differs from the U.S. and U.K. reforms studied by Lochner and Moretti (2004) and Machin, Marie and Vujić (2011), respectively. The Swedish reform was implemented at different times across municipalities during the 1950s and 1960s, which allows Meghir, Palme and Schnabel (2011) to estimate the effect of the school reform net of any general equilibrium effects that the reform may have had on the Swedish labor market. In their experiment, they compare individuals from the same birth cohort who work in the same labor market but who happened to be exposed to two different school systems. Meghir, Palme and Schnabel (2011) study the effect of this reform on the crime of males directly affected by the reform and on the sons of those affected by the reform. This intergenerational perspective is, perhaps, the key factor that sets their work apart from the previous studies by Lochner and Moretti (2004) and Machin, Marie and Vujić (2011).
7 Another empirical issue is that many papers use administrative arrest and incarceration records. These measures are only observed, however, for individuals who are caught. To the extent that more educated individuals are 'better' criminals, i.e. they are less likely to be caught, convicted, and/or incarcerated conditional on arrest, this can bias estimates of the relationship between education and administrative crime measures.
8 Lochner (2004) uses a similar approach to look at the effect of schooling on arrest rates for white-collar crime and finds a positive, though insignificant, relationship.
Using a reduced-form specification, they find evidence of a negative effect of the reform on both the likelihood of conviction (a 5 percent reduction) and the number of convictions (a reduction of 0.25 crimes for those coming from low-SES backgrounds), but not on imprisonment. Perhaps more striking is their result that sons whose fathers were assigned to the school reform have a 2.5 percent lower probability of being convicted. They argue that these intergenerational effects operate through improved parenting and investments in children.
9 Buonanno and Leonida (2009) find that a 10 percentage point increase in high school graduation rates would reduce property crime rates by 4% in a study using panel data for 20 Italian regions from 1980 to 1995. Though they do not have an exogenous source of variation in education, they control for region and time fixed effects, region-specific quadratic time trends, and a vector of time-varying region-specific characteristics.
10 Earlier versions of this paper include Meghir and Palme (1999) and Meghir and Palme (2001).
In the current paper, we also study the effect of educational attainment on crime using Sweden's compulsory schooling reform. Our analysis uses a 25 percent random sample of those born between 1943 and 1955 from Sweden's Multigenerational Register (more than 400,000 individuals). Longitudinal data on income, education, parish and municipality of residence in 1960, and criminal convictions and sentences for the years 1973 to 2007 were matched to each individual in the sample.
Because municipalities were not randomly selected into the school reform experiment and evaluation program, the reform is likely to be correlated with municipality-specific characteristics. Thus, our analysis is based on a differences-in-differences design, which includes: (i) birth cohort fixed effects, (ii) municipality fixed effects to control for municipality-specific characteristics that are constant over time, and (iii) municipality-specific time trends to control for municipality characteristics that change over time. Though it is still possible that there are unobservables correlated with the reform, we provide some evidence that this is not the case: parental education does not predict reform participation once the above controls are included. In the first stage of our analysis, we find that the reform is a significant predictor of educational attainment: exposure to the reform increases average educational attainment by 0.28 years for males and 0.16 years for females. These estimates are roughly similar to Meghir and Palme's (2005) original estimates. 11 Though we are not the first to study the impact of the Swedish school reform on crime, we believe that our study still makes a number of important contributions. First, we study the effect of education on crime for both males and females; most previous studies examine only males. 12 Second, we have more than 30 years of crime data; for the youngest cohort in our analysis, we have crime records beginning at age 18. This allows us to look at the effect of education on crime at various stages of the life cycle. Third, our crime data are detailed enough to allow us to look at the effect of education on various types of crimes (violent, property, and other) as well as on crimes that are serious enough to warrant a prison sentence. 13
Our baseline estimates indicate that more schooling causes a significant decrease in criminal activity for both males and females at both the extensive and intensive margins. For males, one additional year of schooling decreases the likelihood of conviction by 2.4 percentage points (7.5 percent), the likelihood of incarceration by 1.1 percentage points (16 percent), the number of crimes by 0.40, and the number of days sentenced to prison by six. For females, one additional year of schooling significantly decreases the chance of conviction by one percentage point (11 percent) and the number of crimes by 0.09.
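The percent effects in parentheses can be reproduced, up to rounding, by dividing each percentage-point effect by the corresponding sample mean in Table 1 (32 percent of males and 9 percent of females with at least one conviction, 7 percent of males with a prison sentence). The short sketch below assumes that this is how the relative effects were computed; the function name is illustrative only.

```python
def relative_effect(pp_effect: float, baseline_pct: float) -> float:
    """Percent change implied by a percentage-point effect on a baseline rate given in percent."""
    return 100.0 * pp_effect / baseline_pct


# (outcome, percentage-point effect of one more year of schooling, sample mean in percent)
effects = [
    ("male conviction", 2.4, 32.0),
    ("male incarceration", 1.1, 7.0),
    ("female conviction", 1.0, 9.0),
]

for outcome, pp, base in effects:
    print(f"{outcome}: {relative_effect(pp, base):.1f} percent")
```

The male incarceration effect comes out at roughly 15.7 percent, which rounds to the 16 percent quoted above; the other two match 7.5 and 11 percent.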
We also find strong evidence that these relationships persist across a number of dimensions. First, a significant negative effect of schooling on crime is seen across birth cohorts.
Second, this negative effect of schooling on crime is observed across the life cycle. For instance, we see a significant negative effect on male convictions in each of the following age categories (18-29, 30-39, and 40-49) at both the extensive and intensive margins. Finally, for both males and females, schooling has a causal impact across crime categories. For instance, an additional year of schooling significantly decreases the likelihood of a property crime conviction by 10 percent and of a violent crime conviction by 13 percent for males.
12 Machin, Marie and Vujić (2011) present results for females in an appendix table; they do not find a significant effect of education on crime for females.
13 In contrast, Meghir, Palme and Schnabel (2011) only study males. They also have less crime data, such that: (i) the crime records for their youngest cohort begin at age 26, after crime typically peaks, and (ii) they do not observe crime type.
The remainder of this paper proceeds as follows. Section 2 provides institutional details regarding the Swedish compulsory school reform. Section 3 describes the data and presents descriptive statistics. Section 4 analyzes the relationship between the reform and educational attainment (i.e. the first stage of our analysis) and evaluates the extent to which the reform is a relevant and valid instrument for education. Section 5 presents our baseline instrumental variable results of the impact of schooling on convictions and imprisonment while Section 6 considers the heterogeneity of these results across birth cohorts, over the life cycle, and across crime type.

The Swedish Compulsory School Reform
A careful description of the Swedish compulsory school reform can be found in Marklund (1980, 1981). Detailed information can also be found in a report by the National Board of Education (1960) and in Holmlund (2007). The following brief description builds on these sources, which are recommended for further details on the topic.
In 1946, a parliamentary committee was given the task of analyzing the Swedish school system and developing proposals and guiding principles for the future compulsory school. At this time, pupils generally went through grades 1 to 4 or 1 to 6 in a common school called folkskolan.
In either fourth or sixth grade, more able students were selected (based on past performance) for the five-year or three- to four-year junior-secondary school called realskolan. The remaining students stayed in the common school until compulsory education was completed. In most cases, compulsory education consisted of seven years, but in some municipalities, mainly the big cities, the minimum was eight years.
In 1948, the committee released its proposals. Their main suggestion was to introduce a nine-year compulsory school, where pupils were kept together in common classes longer than in the earlier school system. As a compromise between the advocates and the opponents of early tracking, the committee proposed tracking in the 9th grade; pupils would follow either a vocational track, a general track, or a theoretical track preparing for upper-secondary school. Tracking in the 9th grade, however, was later abandoned in favor of a completely comprehensive system.
Among the purposes underlying the proposal were to postpone the tracking decision to higher grades, in an effort to increase equality of opportunity, and to meet the demand for junior-secondary education among the baby-boom cohorts of the mid-1940s. To evaluate whether the proposed nine-year comprehensive school would serve its purpose, the committee suggested an "experiment": during an assessment period, some municipalities and schools would implement the new school system so that the results could be scrutinized before further decisions were made.
The assessment program started in 1949/1950. The new comprehensive school was to be introduced throughout a whole municipality, or in certain schools within a municipality.
Following the 1948 proposal of the parliamentary committee, a number of municipalities had declared an interest in reforming their comprehensive schools. For this reason, 264 municipalities (out of around 1,000) were asked if they were willing to introduce the nine-year school immediately or within a few years. The municipalities that were approached had either shown interest in the reform or had previously expanded their junior-secondary school to four years.
In total, 144 municipalities expressed an interest in implementing the reform. Of these, 14 were selected for the first year of the assessment (1949/50); all of these were required to already have an eight-year comprehensive school.
The National Board of Education continued with the implementation of the reform in the following years.
Year by year, more municipalities joined the reform assessment program.
Municipalities that wanted to take part in the reform were asked to report on their population growth, the local demand for education, tax revenues and local school situation. For example, the availability of teachers, the number of required teachers for the nine-year comprehensive school, and the available school premises were explored. The National Board of Education took these municipality characteristics into account when deciding on their participation. In general, implementation of the reform started in grades 1 and 5, the following year covering grades 1, 2, 5 and 6 and so on. From 1958, the reform was introduced in grades 1-5 in the initial year.
Apart from extending compulsory education from seven to nine years and postponing tracking, the educational reform was also pedagogical and had some effect on the curriculum.
The main change to the curriculum was the introduction of English in 5th grade in the new comprehensive school; English was not necessarily a compulsory subject in the old school system. In both the old and the new system, children started school in the year they turned seven.
The assessment period was also accompanied by financial support to families and to municipalities that implemented the reform. A universal child allowance was introduced in 1948 and implied support for children until the age of 16. In reform municipalities, a means tested scholarship compensated families for foregone earnings from keeping their children longer in school, and municipalities were compensated with ear-marked money from the central government for the increased costs following the expansion of mandatory education.
In 1962, the parliament decided to permanently introduce the nine-year school throughout the country. At this point, implementation became a matter for each municipality; by 1969 they were obliged to have the new comprehensive school running. As the timing was much in the hands of each municipality, the implementation was far from a randomized experiment; nevertheless, it provides a source of variation in schooling laws that allows us to investigate the causal impact of education on crime.

Data Description
The data used in this paper were constructed as follows. Statistics Sweden began by drawing a 25 percent random sample of individuals born between 1943 and 1955 from Sweden's Multigenerational Register. Information on education level is taken from Sweden's Education Register and from the 1970 census. Education is recorded in seven levels. We assign years of schooling as follows: 7 for old primary school, 9 for new compulsory school, 11 for short high school, 12 for long high school, 14 for short university, 15.5 for long university, and 19 for a Ph.D.
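The mapping from education levels to years of schooling is a simple lookup. The sketch below assumes the seven levels are numbered 1 to 7 in the order listed (old primary school = 1, new compulsory school = 2, and so on), which is consistent with how the Reform Coding 2 footnote refers to education levels 1, 2 and 3; the dictionary and function names are hypothetical.

```python
# Assumed numbering of the seven register education levels (1 = lowest).
YEARS_BY_LEVEL = {
    1: 7.0,    # old primary school (folkskola)
    2: 9.0,    # new compulsory school (grundskola)
    3: 11.0,   # short high school
    4: 12.0,   # long high school
    5: 14.0,   # short university
    6: 15.5,   # long university
    7: 19.0,   # Ph.D.
}


def years_of_schooling(level: int) -> float:
    """Translate a register education level into years of schooling."""
    if level not in YEARS_BY_LEVEL:
        raise ValueError(f"unknown education level: {level}")
    return YEARS_BY_LEVEL[level]


# Average years of schooling for a small, made-up group of individuals.
levels = [1, 2, 2, 3, 4, 6]
average_years = sum(years_of_schooling(lvl) for lvl in levels) / len(levels)
print(round(average_years, 2))
```

Raising an error on an unknown level, rather than defaulting to some value, keeps coding mistakes from silently entering the schooling variable.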
Sweden's National Council for Crime Prevention then matched Sweden's official crime register to the sample using the same personal identification number. Thus, our data include a full record of criminal convictions for the years 1973 to 2007 for each individual in the data set; for the oldest birth cohort (born 1943), our crime data span ages 30 to 64, and for the youngest birth cohort (born 1955), they span ages 18 to 52.
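Because the crime register covers the fixed window 1973-2007, the ages at which crime is observed shift with birth cohort. The sketch below approximates age as calendar year minus birth year (an assumption; exact birth dates are ignored), reproduces the age spans quoted for the oldest and youngest cohorts, and shows which part of a given age band (here 18-29) each cohort contributes. Function names are illustrative.

```python
from typing import Optional, Tuple

FIRST_CRIME_YEAR, LAST_CRIME_YEAR = 1973, 2007


def observed_ages(birth_year: int) -> Tuple[int, int]:
    """First and last age covered by the crime records (age = year - birth year)."""
    return FIRST_CRIME_YEAR - birth_year, LAST_CRIME_YEAR - birth_year


def band_coverage(birth_year: int, lo: int, hi: int) -> Optional[Tuple[int, int]]:
    """Part of the age band [lo, hi] observed for a cohort, or None if none of it is."""
    first, last = observed_ages(birth_year)
    start, end = max(first, lo), min(last, hi)
    return (start, end) if start <= end else None


for cohort in (1943, 1949, 1955):
    print(cohort, observed_ages(cohort), band_coverage(cohort, 18, 29))
```

Under this approximation, the 1943 cohort contributes no observations to the 18-29 band, while the 1949 cohort is observed only from age 24 onward within it.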
We use these data to construct a number of crime variables. The first variable, crime, is a measure of crime at the extensive margin. That is, it is equal to one if a person has ever been convicted of a crime between 1973 and 2007 and zero if she/he has not. We also identify whether a person has been convicted of a violent crime, property crime, or other type of crime between 1973 and 2007. 14 At the intensive margin, we create a measure of the number of crimes that a person has been convicted of that we label crimesum. This variable is also broken down by crime type: violent, property and other.
One conviction may include several crimes. Our crime type variables are created by looking over all of the crimes within a single conviction. 15 Speeding tickets, parking tickets and other minor disturbances (ticketable offenses) are not included in our crime measure; an offense must be serious enough to be taken up in court and must result in an admission of guilt or a guilty verdict.
14 Violent crimes, or crimes against persons, are crimes covered by chapters 3-7 of the Swedish criminal code (brottsbalken). Property crimes are those included in chapters 8-12 of the criminal code. These are standard definitions used by Sweden's National Council for Crime Prevention. All remaining crimes are labeled as "other". The five most common violent crimes are (in order of frequency) assault, molestation, unlawful threat, aggravated assault and aggravated unlawful threat. The five most common property crimes are petty theft (mainly shoplifting), theft, vandalism, larceny and fraud. The five most common "other" crimes are dangerous driving, driving without a license, unlawful driving, smuggling and minor narcotic offenses.
15 Thus, if you steal a car, then commit an armed robbery and then get caught after a high-speed chase, you will have one trial and one sentence that include convictions for at least three crime types. In this case, the individual would receive violent = 1 (armed robbery), property = 1 (car theft), and other = 1 (serious traffic offense + resisting arrest).
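The construction of crime, crimesum and the crime-type indicators can be sketched as an aggregation over crime-level records. The sketch assumes each record carries a person identifier, a conviction identifier and the chapter of the criminal code, with None for offenses outside the criminal code (such as traffic offenses); this record layout and all names are hypothetical. Chapters 3-7 are classified as violent, chapters 8-12 as property, and everything else as other, following footnote 14.

```python
from typing import Dict, Iterable, Optional, Tuple

CRIME_TYPES = ("violent", "property", "other")


def crime_type(chapter: Optional[int]) -> str:
    """Classify a crime by criminal-code chapter (None = outside the criminal code)."""
    if chapter is not None and 3 <= chapter <= 7:
        return "violent"
    if chapter is not None and 8 <= chapter <= 12:
        return "property"
    return "other"


def crime_variables(
    person_ids: Iterable[int],
    crimes: Iterable[Tuple[int, int, Optional[int]]],  # (person_id, conviction_id, chapter)
) -> Dict[int, Dict[str, int]]:
    """Extensive (ever convicted) and intensive (number of crimes) measures per person."""
    empty = {"crime": 0, "crimesum": 0}
    empty.update({t: 0 for t in CRIME_TYPES})
    empty.update({f"crimesum_{t}": 0 for t in CRIME_TYPES})
    out = {pid: dict(empty) for pid in person_ids}
    for pid, _conviction_id, chapter in crimes:
        ctype = crime_type(chapter)
        row = out[pid]
        row["crime"] = 1               # convicted of at least one crime
        row["crimesum"] += 1           # every crime counts, even within one conviction
        row[ctype] = 1                 # ever convicted of this crime type
        row[f"crimesum_{ctype}"] += 1  # number of crimes of this type
    return out


# One conviction (id 10) covering an assault (ch. 3), a theft (ch. 8) and a
# traffic offense outside the criminal code; person 2 is never convicted.
records = [(1, 10, 3), (1, 10, 8), (1, 10, None)]
variables = crime_variables([1, 2], records)
print(variables[1])
```

As in the footnote's example, a single conviction can switch on all three type indicators at once, while crimesum counts each crime separately.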
Our data also include information on the type and severity of each sentence handed down by the court. We use this information to construct a dichotomous variable, prison, which indicates whether an individual has ever been sentenced to prison. We also create an intensive margin variable, prison sentence, which is equal to the total number of days a person has been sentenced to prison (i.e. we sum across all known prison sentences). 16 Those who died or emigrated from Sweden before 1974 are dropped from the sample, as they cannot show up in our crime data. This reduces our sample by 26,338 individuals to 455,318. We also remove those who immigrated to Sweden after 1955, since we know that these individuals did not take part in the compulsory school reform. This reduces our sample by 79,723 individuals to 375,595. 17
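The prison variables aggregate sentence records in the same way, and the sample restrictions can be checked arithmetically. The sentence-record layout (person, sentence type, days) and the label "prison" are assumptions for illustration; the sample counts are those reported in the text, with the starting sample size implied by them.

```python
from typing import Dict, Iterable, Tuple


def prison_variables(
    person_ids: Iterable[int],
    sentences: Iterable[Tuple[int, str, int]],  # (person_id, sentence_type, days)
) -> Dict[int, Dict[str, int]]:
    """prison = ever sentenced to prison; prison_sentence = total days across prison sentences."""
    out = {pid: {"prison": 0, "prison_sentence": 0} for pid in person_ids}
    for pid, sentence_type, days in sentences:
        if sentence_type == "prison":  # hypothetical label for a custodial sentence
            out[pid]["prison"] = 1
            out[pid]["prison_sentence"] += days
    return out


sentences = [(1, "prison", 60), (1, "fine", 0), (1, "prison", 30), (2, "fine", 0)]
prison = prison_variables([1, 2, 3], sentences)

# Sample flow reported in the text.
after_deaths_emigration = 455_318
implied_start = after_deaths_emigration + 26_338   # before dropping deaths and emigrants
final_sample = after_deaths_emigration - 79_723    # after dropping post-1955 immigrants
print(prison[1], implied_start, final_sample)
```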

Determining an individual's treatment status
The Swedish administrative registers do not contain information on whether individuals in the affected cohorts went through the old or the new school system. However, with help from other sources, it is possible to deduce when, and for which grades, each municipality introduced the new comprehensive school. Based on this information, one can assign a reform dummy to individuals in a data set extracted from the registers.
In the registers, birth year is known, and through the censuses it is feasible to track the municipality in which an individual lived at the time of compulsory schooling. It is then possible to attach a reform indicator to each individual based on year of birth and municipality of residence, maintaining the assumption that individuals are in the grade appropriate for their age. In some cases, it is also necessary to use more detailed information on the parish or school district in which the individual went to school, since the reform was sometimes introduced in different parts of a municipality in different years.
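Under the age-appropriate-grade assumption, assigning the reform dummy reduces to comparing an individual's birth cohort with the first treated cohort where he or she went to school, falling back from parish to municipality. The sketch below uses hypothetical lookup tables and names, treats every cohort from the first treated one onward as treated (the convention in the Reform Coding 2 footnote), and returns None where the timing is unknown, mirroring the missing reform codes mentioned in the descriptive statistics.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup tables: first birth cohort taught under the nine-year school.
FIRST_TREATED_BY_MUNICIPALITY: Dict[str, int] = {"municipality_A": 1947, "municipality_B": 1951}
# Where the reform reached parts of a municipality in different years, use parish-level timing.
FIRST_TREATED_BY_PARISH: Dict[Tuple[str, str], int] = {
    ("city_C", "parish_1"): 1946,
    ("city_C", "parish_2"): 1950,
}


def reform_indicator(birth_year: int, municipality: str,
                     parish: Optional[str] = None) -> Optional[int]:
    """1 if the cohort went through the new system where the person went to school,
    0 if through the old system, None if the reform timing is unknown."""
    if parish is not None and (municipality, parish) in FIRST_TREATED_BY_PARISH:
        first_treated = FIRST_TREATED_BY_PARISH[(municipality, parish)]
    elif municipality in FIRST_TREATED_BY_MUNICIPALITY:
        first_treated = FIRST_TREATED_BY_MUNICIPALITY[municipality]
    else:
        return None
    # Assumes pupils are in the grade appropriate for their age, so treatment
    # depends only on birth cohort and place of schooling.
    return int(birth_year >= first_treated)


print(reform_indicator(1948, "municipality_A"), reform_indicator(1948, "city_C", "parish_2"))
```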
There are two possible ways to construct a reform coding that can be matched to individual-level register data. One is based on existing public documents describing and evaluating the school reform. The second is deduced from register data. Following Holmlund (2007), we label them Reform Coding 1 (based on documentation) and Reform Coding 2 (deduced from register data).
The primary information sources necessary to construct Reform Coding 1 are The National Board of Education (1953-1962) and Marklund (1981), which include lists of which municipalities implemented the reform each year. From these records, it is also possible to see which grades were affected in a particular municipality. These two sources cover the assessment period and only allow coding of the cohorts born 1943-1949. The remaining cohorts are coded using municipality-level tables of the number of pupils in each grade in the old and new school systems, published by the Educational Bureau (Undervisningsbyrån) (1960-1964) and Statistics Sweden (1968-1969). Register data sets with large sample sizes also allow for another procedure to code the reform, Reform Coding 2. For each municipality/birth-year cell, it is possible to deduce the minimum level of education; if the minimum level jumps up from folkskola (the old compulsory minimum) to grundskola (the new minimum), this tells us when the reform was implemented. 19
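Reform Coding 2 can be sketched by following the steps in footnote 19: drop education levels 3 and above, take cell means by area and birth cohort, and mark the first cohort whose mean exceeds 1.75. The sketch assumes level 1 is the old compulsory school (folkskola) and level 2 the new one (grundskola), so a cell mean equals 1 plus the share of the cell with the new minimum; the record layout and names are hypothetical. Areas are plain strings, so parish-level coding for the largest cities can be handled by passing a parish label as the area.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def reform_coding_2(
    records: Iterable[Tuple[str, int, int]],  # (area, birth_year, education_level)
    threshold: float = 1.75,
) -> Dict[str, int]:
    """First treated birth cohort per area, deduced from register education levels."""
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for area, cohort, level in records:
        if level >= 3:  # keep only the two compulsory-school levels
            continue
        totals[(area, cohort)] += level
        counts[(area, cohort)] += 1
    first_treated: Dict[str, int] = {}
    for area, cohort in sorted(counts):  # cohorts in increasing order within each area
        mean_level = totals[(area, cohort)] / counts[(area, cohort)]
        if mean_level > threshold and area not in first_treated:
            first_treated[area] = cohort
    return first_treated


records = (
    [("A", 1945, lvl) for lvl in (1, 1, 1, 2)]
    + [("A", 1946, lvl) for lvl in (1, 2, 2, 2)]
    + [("A", 1947, lvl) for lvl in (2, 2, 2, 2, 3, 5)]
    + [("B", 1945, lvl) for lvl in (1, 1)]
    + [("B", 1946, lvl) for lvl in (1, 2)]
)
print(reform_coding_2(records), reform_coding_2(records, threshold=1.5))
```

With these made-up records, area A's 1946 cell has a mean of exactly 1.75 and so does not qualify under a strict "above 1.75" rule; lowering the threshold to 1.5 moves A's first treated cohort one year earlier, illustrating why the footnote reports how the coding's agreement with Reform Coding 1 depends on this cutoff.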

Descriptive Statistics
Descriptive statistics broken down by gender are reported in Table 1. The first three columns report statistics for the 25 percent random sample. We see that the average length of education for those born between 1943 and 1955 is 11.37 years for males and 11.72 years for females.
According to Reform Coding 1, 39 percent of individuals were subject to the reform. According to Reform Coding 2, 43 percent were subject to the reform. We also see that 32 percent of all males have at least one conviction and that males have (on average) been convicted of 1.95 crimes. 20 Only 9 percent of the females in our random sample have been convicted of a crime.
On average, females are convicted of 0.26 crimes. Lastly, we see that seven percent of males and less than one percent of females have been sentenced to prison at least once.
The last three columns in Table 1 report descriptive statistics for all individuals in the random sample who have non-missing values for Reform Coding 1. The descriptive statistics of this group are nearly identical along all dimensions to those of the purely random sample. 21
19 We use our sibling sample (see footnote 15), which is twice as large as our random sample, when creating Reform Coding 2, since the algorithm works best with large samples. To create Reform Coding 2, we start by throwing out all individuals with education level 3 or higher. We then collapse the data into cell means by birth year and municipality. We then find the first birth-year cohort within each municipality that has an average education level above 1.75. This cohort is designated as the first treated cohort. For the three largest cities (Malmö, Gothenburg and Stockholm), we create the reform code at the parish level instead. The correlation between Reform Coding 1 and Reform Coding 2 is 0.91. If we instead label the first treated cohort as the cohort that has an average education level above 1.50, then this correlation is 0.83. This correlation is maximized at the value of 1.75.
20 While these crime rates may seem high, they are consistent with those reported in other papers using administrative Swedish crime records (see Hjalmarsson and Lindquist (forthcoming), Meghir, Palme and Schnabel (2011), and Grönqvist (2011)). It is also important to note that a majority of convictions are in the 'other' crimes category, which includes, for instance, many alcohol-related offenses.
21 The same is true when we look at the descriptive statistics of those who have both Reform Coding 1 and Reform Coding 2 defined.
Figure 1 depicts the share of individuals in each birth cohort that have been classified as treated by the compulsory school reform. Figure 2 shows the average number of years of schooling for males and females by birth cohort.
Figure 3 illustrates the share of males and females in each birth cohort who have at least one conviction, as well as those with at least one prison sentence. It is important to understand that the upward trends seen in Figure 3 are driven by the fact that we have more years of crime data at younger (more crime prone) ages for the later cohorts. It is not the case that crime is simply trending upwards over time. 22

The Effect of the Reform on Educational Outcomes
We begin our analysis of Sweden's compulsory school reform by looking at the effect of the reform on educational outcomes. Did the reform raise average years of schooling and, if so, by how much? To answer these questions, we estimate the following differences-in-differences regression equation:
$$ S_{icm} = \alpha + \beta \, REFORM_{cm} + X_{icm}'\gamma + \eta_c + \mu_m + trend_m + \nu_{icm} \tag{1} $$
where S_icm is years of schooling for individual i in birth cohort c who goes to school in municipality m. REFORM_cm is an indicator that takes the value one if the individual belongs to a birth cohort that was subject to the reform in her particular municipality, and β is the effect of the reform on years of schooling. X_icm is a vector of observable characteristics, while η_c and μ_m represent birth cohort and municipality fixed effects, respectively. We also allow for municipality-specific time trends, trend_m; ν_icm is the regression error term.
22 If we create these figures using age-specific variables, e.g. whether an individual has a conviction between the ages of 30 and 39 or between the ages of 40 and 49, then all of the upward trends in crime disappear except for the trend in violent crime for men aged 30 to 39. In this figure, we see a one percentage point increase when comparing the oldest to the youngest cohorts.
The results for males and females are reported separately in Table 2. To better understand the role played by municipality fixed effects and municipality-specific trends, neither is included in the first two columns. When only birth year effects are included, we find that the average effect of the school reform on schooling is 0.64 years for males and 0.44 years for females.
Controlling for parental education lowers these effects to 0.48 years and 0.31 years, respectively.
Including municipality fixed effects in column (3) reduces the magnitude of these reform effects to 0.24 and 0.12 years of schooling, respectively. After adding municipality-specific linear time trends (in columns (5) and (6)), the effect of the school reform stabilizes at 0.28 years for men and 0.16 years for women. We obtain the same point estimate with and without controls for parental education. When we allow for quadratic (as opposed to linear) time trends, the estimates are slightly reduced to 0.24 for men and 0.15 for women and become less precise. 23 The gender difference in the reform effect is not surprising given the pattern of gender differences in years of schooling seen in Figure 2. Women born in these cohorts were already more likely to study beyond grade seven than men. Thus, fewer women were pushed into grade nine by the compulsory school reform. 24
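To see how municipality fixed effects and municipality-specific trends can shrink a naive reform effect, as when moving from the first columns of Table 2 to columns (5) and (6), the sketch below simulates years of schooling and estimates the reform coefficient by OLS with and without municipality controls. Everything about the data-generating process is an assumption: 20 municipalities, no X controls, a true reform effect of 0.28 years and, purely to mimic non-random selection, earlier adoption in municipalities with higher baseline schooling. The function names are hypothetical.

```python
import numpy as np


def simulate_schooling(n_per_cell=200, beta=0.28, seed=0):
    """Simulated years of schooling with non-random, staggered reform timing."""
    rng = np.random.default_rng(seed)
    cohorts = np.arange(1943, 1956)                   # birth cohorts 1943-1955
    n_muni = 20                                       # hypothetical municipalities
    c = np.repeat(np.tile(cohorts, n_muni), n_per_cell)
    m = np.repeat(np.repeat(np.arange(n_muni), cohorts.size), n_per_cell)
    t = c - cohorts[0]
    muni_level = rng.normal(0.0, 0.5, n_muni)         # plays the role of mu_m
    # Assumption: higher-schooling municipalities adopt the reform earlier.
    rank = np.argsort(np.argsort(-muni_level))        # 0 = highest baseline schooling
    first_treated = 1946 + rank * 8 // n_muni         # first treated cohorts 1946-1953
    reform = (c >= first_treated[m]).astype(float)
    muni_slope = rng.normal(0.05, 0.02, n_muni)       # slope of trend_m
    s = (10.0 + beta * reform + 0.1 * np.sin(t)       # 0.1*sin(t) plays the role of eta_c
         + muni_level[m] + muni_slope[m] * t + rng.normal(0.0, 1.0, c.size))
    return s, reform, c, m


def first_stage(s, reform, c, m, municipality_controls=True):
    """OLS coefficient on REFORM with cohort dummies and, optionally, mu_m and trend_m."""
    t = (c - c.min()).astype(float)
    cols = [reform, (c[:, None] == np.unique(c)[None, :]).astype(float)]  # eta_c
    if municipality_controls:
        muni_d = (m[:, None] == np.unique(m)[None, 1:]).astype(float)   # mu_m, one omitted
        cols += [muni_d, muni_d * t[:, None]]  # trend_m; omitted one is absorbed by eta_c
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), s, rcond=None)
    return coef[0]


s, reform, c, m = simulate_schooling()
print("cohort effects only:", round(first_stage(s, reform, c, m, False), 2))
print("with municipality effects and trends:", round(first_stage(s, reform, c, m), 2))
```

With cohort dummies only, the estimate absorbs the higher baseline schooling of early adopters and lies well above the simulated 0.28; adding municipality effects and linear trends brings it back close to that value.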

The Reform as an Instrument for Years of Schooling
The purpose of this paper is to assess the causal impact of years of schooling on crime. To this end, we estimate the following baseline model:

$$C_{icm} = \beta_0 + \beta_1 S_{icm} + \omega_c + \theta_m + \mathit{trend}_m + \varepsilon_{icm} \qquad (2)$$

where $C_{icm}$ is a measure of criminal activity for individual $i$ belonging to birth cohort $c$ and going to school in municipality $m$. Years of schooling are given by $S_{icm}$ and $\varepsilon_{icm}$ is the regression error term. Similar to equation (1), equation (2) also includes birth cohort effects, $\omega_c$, municipality fixed effects, $\theta_m$, and municipality-specific time trends, $\mathit{trend}_m$. Table 3 presents the results of estimating equation (2). We see a negative and significant relationship between years of schooling and criminal activity. This relationship is observed at both the extensive and intensive margins and for both males and females.

23 A visual inspection of the data on years of schooling at the aggregate level (see Figure 2) and at the municipality level shows that years of schooling tend to grow linearly. Any short-run accelerations or decelerations in the data are likely caused by the reform itself. Hence we prefer not to control for quadratic trends. We will, however, show how sensitive our results are to the inclusion of quadratic trends.

24 The effects reported in Table 2 are most likely downward biased due to the presence of measurement error in our coding of the reform variable. However, if we use our second (independent) measure of the reform, Reform coding 2, as an instrument for Reform coding 1, then the average effect of the school reform for males rises from 0.28 (column (5)) to 0.42 and for females it goes up from 0.16 (column (5)) to 0.22. Note, however, that these are not consistent measures of the effect of the school reform. They are upwardly biased measures (see Kane et al. (1999) and Holmlund (2007) for a discussion). Thus, the true effect of the school reform lies somewhere in between.
At the extensive margin, one additional year of schooling is associated with a 2.3 (0.5) percentage point lower likelihood that a male (female) has a criminal conviction, on average. This relationship also holds at both margins for each crime category and for prison sentences.
However, it is not necessarily the case that the coefficients presented in Table 3 can be interpreted as the causal impact of schooling on crime. The main threat to identification is that the error term includes unobservable, individual characteristics, such as cognitive and noncognitive abilities, that may be correlated with both years of schooling and criminal behavior.
This would bias our estimate of $\beta_1$, preventing us from giving it a causal interpretation.
To address this problem, we apply an instrumental variables (IV) approach. More specifically, we will use the Swedish compulsory school reform as an instrument for years of schooling to identify the causal impact of schooling on crime, where equations (1) and (2) are the first and second stages, respectively.
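In a just-identified model such as this one, 2SLS can be sketched in a few lines: partial the exogenous controls out of crime, schooling, and the reform dummy; regress schooling on the reform dummy (the first stage); then regress crime on predicted schooling (the second stage). The sketch below is a hypothetical illustration, not the authors' estimation code. It assumes simulated data in which an unobserved "ability" term raises schooling and lowers crime, uses only cohort dummies as controls for brevity (the paper's specification also includes municipality fixed effects and trends), and sets a true schooling effect of -0.02. OLS is then biased, while 2SLS should recover the assumed effect.

```python
import numpy as np

def residualize(v, W):
    """Frisch-Waugh-Lovell step: strip out the part of v explained by controls W."""
    coef, *_ = np.linalg.lstsq(W, v, rcond=None)
    return v - W @ coef

def tsls(C, S, Z, W):
    """Just-identified 2SLS of crime C on schooling S, instrumenting S with the
    reform dummy Z; W holds the exogenous controls. Returns (beta, first stage)."""
    C_r, S_r, Z_r = (residualize(v, W) for v in (C, S, Z))
    pi = (Z_r @ S_r) / (Z_r @ Z_r)           # first stage: schooling on reform
    S_hat = pi * Z_r                          # predicted schooling
    beta = (S_hat @ C_r) / (S_hat @ S_hat)    # second stage: crime on predicted schooling
    return beta, pi

def ols(C, S, W):
    C_r, S_r = residualize(C, W), residualize(S, W)
    return (S_r @ C_r) / (S_r @ S_r)

# Hypothetical data: 50 municipalities with staggered reform dates
rng = np.random.default_rng(1)
n, n_muni = 100_000, 50
muni = rng.integers(0, n_muni, n)
cohort = rng.integers(0, 13, n)                              # cohort index 0..12
Z = (cohort >= rng.integers(3, 10, n_muni)[muni]).astype(float)
ability = rng.normal(0, 1, n)                                # unobserved by the econometrician
S = 10 + 0.5 * Z + 0.8 * ability + rng.normal(0, 0.5, n)
C = 0.5 - 0.02 * S - 0.05 * ability + rng.normal(0, 0.1, n)
W = np.column_stack([np.ones(n)] + [(cohort == k).astype(float) for k in range(1, 13)])

beta_iv, pi = tsls(C, S, Z, W)
beta_ols = ols(C, S, W)
print(f"first stage {pi:.3f}, OLS {beta_ols:.4f}, 2SLS {beta_iv:.4f} (assumed -0.02)")
```

Standard errors from a manually run second stage are not valid, because they ignore that schooling is predicted; in practice one would use a dedicated IV routine with standard errors clustered by municipality.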
A high quality instrument should be highly correlated with the variable for which it is instrumenting. Although it is not perfectly clear how one should define "highly correlated", Staiger and Stock's (1997) rule of thumb has gained a large measure of acceptance in the literature. They argue that when the F-statistic on the instrument is below 10, the instrument should be considered weak. In Table 2, we see that we have only one F-statistic that is less than 10 (equal to 8). The F-statistic in our baseline model (column (5)) is 36 in the specification for females and 80 in the specification for males, implying that our instrument is, in fact, a strong predictor of years of schooling.
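With a single instrument, the first-stage F-statistic is the square of the t-statistic on the reform dummy. Because the reform varies at the municipality level, the sketch below uses a cluster-robust (CR1) variance with municipalities as clusters, in line with the clustered standard errors reported in the tables. The simulated data, effect sizes, and the use of cohort dummies as the only controls are hypothetical assumptions. A first stage of the size found for males comfortably clears the Staiger-Stock threshold of 10, while an instrument with no true effect yields a far smaller statistic.

```python
import numpy as np

def first_stage_f(d, z, W, cluster):
    """First-stage coefficient and F-statistic for one excluded instrument z,
    using a cluster-robust (CR1) variance. With a single instrument,
    F is the squared clustered t-statistic."""
    def resid(v):
        # Frisch-Waugh-Lovell: partial the controls W out of v
        coef, *_ = np.linalg.lstsq(W, v, rcond=None)
        return v - W @ coef
    d_r, z_r = resid(d), resid(z)
    pi = (z_r @ d_r) / (z_r @ z_r)
    e = d_r - pi * z_r                                 # first-stage residuals
    _, g = np.unique(cluster, return_inverse=True)
    scores = np.bincount(g, weights=z_r * e)           # score summed within each cluster
    G, n, k = scores.size, d.size, W.shape[1] + 1
    adj = G / (G - 1) * (n - 1) / (n - k)              # Stata-style small-sample factor
    var_pi = adj * (scores @ scores) / (z_r @ z_r) ** 2
    return pi, pi ** 2 / var_pi

# Hypothetical data: 60 municipalities, cohort index 0..12, staggered reform
rng = np.random.default_rng(2)
n_muni, n = 60, 30_000
muni = rng.integers(0, n_muni, n)
cohort = rng.integers(0, 13, n)
z = (cohort >= rng.integers(3, 10, n_muni)[muni]).astype(float)
W = np.column_stack([np.ones(n)] + [(cohort == k).astype(float) for k in range(1, 13)])
shock = rng.normal(0, 0.1, n_muni)[muni]              # common shock within a municipality

results = {}
for effect in (0.28, 0.0):
    S = 11 + effect * z + shock + rng.normal(0, 1.0, n)
    results[effect] = first_stage_f(S, z, W, muni)
    pi, F = results[effect]
    verdict = "weak" if F < 10 else "not weak"
    print(f"true effect {effect}: pi = {pi:.3f}, F = {F:.1f} ({verdict} by the rule of thumb)")
```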
To be a valid instrument, it is crucial that the reform is uncorrelated with unobserved characteristics that also determine crime. 25 As explained in Section 2, reform implementation was not random across the population or municipalities. Instead, municipalities with certain characteristics were invited to participate in the "experiment". Moreover, after the assessment period had come to an end, municipalities themselves decided when to join. This indeed makes it likely that the reform is correlated with characteristics specific to the municipalities. Worrisome as this may sound, it is taken care of by our differences-in-differences specification; municipality dummies control for all municipality-specific characteristics that are constant over time, while municipality time trends deal with all municipality-specific characteristics that trend over time.
Nevertheless, it is still possible that the reform correlates with other unobserved factors not addressed in our analysis thus far.
A first test of whether the reform is exogenous is already provided in Table 2. The differing results when excluding and including municipality effects clearly show that the reform is correlated with municipality-specific factors. Given the differences-in-differences specification, however, if the reform is uncorrelated with unobserved characteristics, the point estimate of the effect of the reform should remain constant once we include controls for further background characteristics. Compare, for example, columns (3) and (4) in Table 2. When parental education is added to the specification, the point estimates are somewhat reduced, which suggests that the reform is not entirely exogenous and that it is likely positively correlated with other factors that positively determine children's education (such as parental education). However, once we add municipality-specific linear time trends to the model, adding parental education no longer affects the reform coefficient (compare columns (5) and (6) in Table 2).

25 Furthermore, the interpretation of our results is based on the assumption that the primary effect of the school reform on crime is through an increase in years of schooling and not through changes in the curriculum or through changes in peer groups.
A more direct approach to assessing whether or not the reform is correlated with parental background is to check whether or not parental education can predict reform participation after controlling for birth year effects, municipality effects and municipality-specific trends. In Table 4, we see that parental education does not predict reform participation in our baseline models shown in columns (3) and (6), which include birth year effects, municipality effects, and municipality-specific trends.
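The balance check in Table 4 reverses the roles of the variables: the reform indicator is regressed on parental education together with the fixed effects and trends, and the question is whether the coefficient on parental education is zero. In the hypothetical simulation below, early-reforming municipalities are assumed to have more educated parents, so parental education predicts the reform when only cohort effects are included, but not once municipality fixed effects and municipality-specific trends are added. The data-generating process is an illustrative assumption, not a description of the Swedish data.

```python
import numpy as np

def coef_on_first(y, columns):
    """OLS of y on the stacked columns; return the coefficient on the first column."""
    coef, *_ = np.linalg.lstsq(np.column_stack(columns), y, rcond=None)
    return coef[0]

# Hypothetical data: 40 municipalities, cohorts 1943-1955
rng = np.random.default_rng(3)
cohorts = np.arange(1943, 1956)
n_muni, per_cell = 40, 30
first = rng.integers(1945, 1954, n_muni)                 # first reform cohort
muni_pared = 8 + 0.3 * (1954 - first)                     # early reformers: more educated parents
muni = np.repeat(np.arange(n_muni), cohorts.size * per_cell)
cohort = np.tile(np.repeat(cohorts, per_cell), n_muni)
reform = (cohort >= first[muni]).astype(float)
pared = muni_pared[muni] + rng.normal(0, 2, muni.size)   # parental years of schooling

t = (cohort - cohorts[0]).astype(float)
const = [np.ones(muni.size)]
cohort_fe = [(cohort == c).astype(float) for c in cohorts[1:]]
muni_fe = [(muni == m).astype(float) for m in range(1, n_muni)]
trends = [(muni == m) * t for m in range(1, n_muni)]

naive = coef_on_first(reform, [pared] + const + cohort_fe)
full = coef_on_first(reform, [pared] + const + cohort_fe + muni_fe + trends)
print(f"parental education -> reform: cohort FE only {naive:.4f}; "
      f"with municipality FE and trends {full:.4f}")
```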
If the reform is (conditionally) exogenous, it should not have any effect on cohorts that passed through the educational system before the reform was introduced. If there is such an effect, then it is a signal that the policy is actually correlated with some unobserved factors that are not captured by our differences-in-differences approach. In Table 5, we present results for the effect of the reform on cohorts that were one, two, three, and four years too old to have been subject to any changes. What we find is that the reform did, in fact, have a sizeable, statistically significant effect on years of schooling among men and women who were one year too old to have been affected by the school reform, even after controlling for birth year and municipality effects and municipality-specific trends.
The finding of a pre-reform effect for the cohort one year ahead is not entirely unexpected. First, there is some measurement error in the reform coding, since in some cases it is not possible to assign a clear-cut starting date to the reform. This introduces coding errors, but only for the cohort one year ahead of or behind the coded starting year. Second, one underlying assumption of the analysis is that individuals are in the expected grade for their age. Some of those who repeated a grade might actually have attended the new school, although in the data they are coded as non-participants. This should also produce a positive effect of the reform for the cohort one year ahead. Therefore, the fact that we observe an effect only for the cohort one year ahead, and not for those two, three, and four years ahead (see Table 5), is not problematic, given that we add one pre-reform dummy to the estimated model.
As demonstrated above, measuring the reform effect is sensitive to the inclusion of municipality-specific trends. Wolfers (2006) studies the sensitivity of differences-in-differences estimates to the inclusion of region-specific trends. He argues that adding region-specific trends to the regression may capture actual responses to a policy change, and not just control for pre-policy trends. If a policy implies both a level and a trend shift, then municipality-specific trends will partly absorb the policy responses that we want to estimate. This can lead to biased estimates, and the problem is aggravated when there are few observations before the policy change takes effect.
To examine this possibility, we re-estimated the reform effect on years of education, including municipality-specific, predicted pre-reform trends in education as a control. The trend is predicted using only observations prior to the introduction of the reform and, therefore, should not capture trend shifts induced by the policy. 26 The measured reform effect is, in fact, stable to the inclusion of pre-reform trends: the coefficients are close to those reported in column (3) of […].

We have also investigated post-reform dynamics. In column (1) of Table 6, we see that the effect of the school reform on years of schooling among males dies off as we examine younger cohorts born further and further away from when the reform was introduced in their municipality. Column (2) of Table 6 shows that the effect becomes insignificant (at the 5 percent level) for women born three or more years after the reform was implemented in their municipality. 27 A more general concern is that some of our "treated" individuals were born up to 17 years after the reform was introduced in their municipality (the median is four), while some of those in our "control" group were born as many as 15 years before the reform was implemented in their municipality (the median is five). 28

The fact that the compulsory school reform was implemented over such a long time period raises concerns about the comparability of the treatment and control groups. Should one really compare those born 10 years after the reform was implemented in their municipality to those born 10 years before it was implemented? It does not seem plausible that our limited set of controls is enough to deal with the amount of unobservable heterogeneity that could exist between these two groups.

26 We predict pre-reform trends in education using cohorts born […]. For individuals born 1932-1942, we use their mothers' municipality in 1960 as an indicator of where they went to school (the assumption being that mothers are less mobile than young people leaving the parental home). We first regress years of schooling on municipality and cohort dummies, and municipality-specific trends, using only the years prior to the implementation of the reform in each municipality. Then the predicted values from this regression are included as a control in a regression of years of schooling on the reform dummy and municipality and cohort controls, using the 1943-1955 cohorts.
27 A similar pattern is observed when using municipality specific quadratic trends rather than linear trends. Results available from the authors upon request. 28 Some of the observations in our data come from the earliest reforming municipalities. The first birth cohort affected by the reform in these early reforming municipalities was the cohort born in 1938. Some of the observations in our data lived in municipalities that were late reformers. The first birth cohort affected by the reform in these late reforming municipalities was the cohort born in 1958.
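The two-step procedure in footnote 26 can be sketched as follows. This is a simplified, hypothetical illustration: schooling is simulated, the first step fits a separate linear trend for each municipality on its pre-reform cohorts only (it omits the cohort dummies mentioned in footnote 26), and the reform is assumed to produce a pure level shift of 0.3 years. The extrapolated pre-reform trend then enters as a control in the 1943-1955 estimation sample alongside cohort and municipality dummies. The sketch shows the mechanics only; it does not reproduce the trend-shift bias discussed by Wolfers (2006).

```python
import numpy as np

rng = np.random.default_rng(4)
cohorts = np.arange(1932, 1956)                        # pre-1943 cohorts help pin down trends
n_muni, per_cell = 30, 20
first = rng.integers(1945, 1954, n_muni)               # hypothetical first reform cohort
slope = rng.normal(0.06, 0.03, n_muni)                 # municipality-specific trends
muni = np.repeat(np.arange(n_muni), cohorts.size * per_cell)
cohort = np.tile(np.repeat(cohorts, per_cell), n_muni)
t = (cohort - cohorts[0]).astype(float)
reform = (cohort >= first[muni]).astype(float)
S = 9 + slope[muni] * t + 0.3 * reform + rng.normal(0, 0.5, muni.size)

# Step 1: fit a linear trend per municipality on pre-reform cohorts only,
# then extrapolate it to every cohort in that municipality.
pre_trend = np.empty(S.size)
for m in range(n_muni):
    in_m = muni == m
    pre = in_m & (reform == 0)
    b, a = np.polyfit(t[pre], S[pre], 1)               # returns [slope, intercept]
    pre_trend[in_m] = a + b * t[in_m]

# Step 2: regress schooling on the reform dummy, the predicted pre-reform trend,
# cohort dummies and municipality dummies, using the 1943-1955 cohorts.
keep = cohort >= 1943
X = np.column_stack(
    [reform, pre_trend]
    + [(cohort == c).astype(float) for c in range(1944, 1956)]
    + [(muni == m).astype(float) for m in range(n_muni)]
)[keep]
coef, *_ = np.linalg.lstsq(X, S[keep], rcond=None)
print(f"reform effect controlling for pre-reform trends: {coef[0]:.3f} (assumed 0.3)")
```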
Thus, to make our implicit treatment and control groups more comparable, and to deal with the reform fade-out effect seen in Table 6, we limit our sample to those individuals born at most five years before or after the reform was implemented in their municipality. 29
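The reform dummy, the one-year pre-reform dummy, and the five-year sample window can all be derived from a single event-time variable: birth year minus the first birth cohort affected by the reform in the individual's municipality. The helper below is a hypothetical sketch; treating the window as inclusive (an absolute event time of at most five) is an assumption about how "within five years" is coded. The example uses first reform cohorts between 1938 and 1958, the range given in footnote 28.

```python
import numpy as np

def code_reform(birth_year, first_reform_cohort, window=5):
    """Return the reform dummy, the one-year pre-reform dummy, and a flag for
    the estimation sample (born within `window` years of the first reform cohort)."""
    event_time = birth_year - first_reform_cohort        # 0 = first affected cohort
    reform = (event_time >= 0).astype(int)
    pre_reform = (event_time == -1).astype(int)          # cohort one year too old
    in_window = np.abs(event_time) <= window             # inclusive window (assumption)
    return reform, pre_reform, in_window

# Hypothetical individuals: event times are 5, -2, -1, 0, -3 and 6
birth_year = np.array([1943, 1950, 1951, 1952, 1955, 1955])
first_cohort = np.array([1938, 1952, 1952, 1952, 1958, 1949])
reform, pre_reform, in_window = code_reform(birth_year, first_cohort)
print(reform, pre_reform, in_window)
```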

The Causal Effect of an Increase in Schooling on Crime and Incarceration
This section presents our baseline estimates of the causal effect of schooling on crime and incarceration found by estimating the two stage least squares model presented in equations (1) and (2), which uses exposure to the reform as an instrument for years of schooling. Taking the discussion from Section 4 into account, we restrict the sample to those born within five years of the first cohort affected by the reform in their own municipality. Table 7 presents the results for males (columns (1)-(3)) and females (columns (4)-(6)) for four dependent variables: crime at the extensive margin (crime), crime at the intensive margin (crimesum), prison at the extensive margin (prison), and prison at the intensive margin (prison sentence). For each dependent variable, we consider three specifications. Columns (1) and (4) include just municipality and birth cohort fixed effects, while columns (2) and (5) add municipality-specific linear time trends. Columns (3) and (6), our preferred specifications, also control for a dummy indicating whether a cohort was born one year prior to the reform.
The first thing to note is that controlling for municipality-specific time trends is clearly important, both qualitatively and quantitatively. In fact, without such time trends, a significant relationship is only observed between years of schooling and the number of days sentenced to […] prison sentences, it is important to recall that the proportion of females with a prison sentence is extremely small (less than one percent). Finally, the magnitude and significance of these point estimates change little when the one-year pre-reform dummy is included.

29 A final threat to identification that we would like to mention is selective mobility. Geographical mobility can be a potential problem when analyzing policy changes at the regional level. Individuals may respond to the new policy by moving away from it in order to avoid it, or by moving to it in order to benefit from it. Using comparable data sets to our own, Meghir and Palme (2003) and Holmlund (2007) […]
We will take the results presented in columns (3) and (6) […] Table 7 are very similar, particularly for males, to the non-causal OLS estimates in Table 3. 30 Overall, the results in Table 7 indicate that more schooling causes a significant decrease in criminal activity at both the extensive and intensive margins for males and females. For males, this effect is not limited to minor offenses; it has at least as large an impact on offenses serious enough to warrant a prison sentence. In addition, the magnitude of the effect of one additional year of schooling for males is comparable to that found previously in the literature; Lochner and Moretti (2004) find that one additional year of schooling decreases the likelihood of arrest and imprisonment by about 11 percent. 31 Given the lack of previous research, such a comparison cannot be made for females. The female effect, however, does seem to be just as large as the male effect when measured relative to the proportion of females who are convicted.

30 This finding is consistent with the results of Lochner and Moretti (2004), who also find a similar pattern.
We have also estimated a number of specifications to assess the sensitivity of the results to our choice of the baseline model. First, Appendix Table 1 replicates Table 7 using Reform Coding 2, as described previously, to measure exposure to the reform rather than Reform Coding 1. The qualitative and quantitative patterns of the results do not change. Appendix Table 2 demonstrates that our results are not sensitive to the inclusion of just one pre-reform dummy; allowing individuals born two, three, and four years before the reform to be impacted by the reform does not substantively change the impact of schooling on convictions and incarceration.
However, we see that significance is lost, primarily due to a large decrease in precision, when quadratic municipality-specific time trends are included rather than linear trends.

31 Meghir, Palme and Schnabel's (2011) estimates are consistently lower and less precise than our own. This may be partly due to the fact that we have eight more years of crime data available to us and can thus observe our individuals at younger, more crime-prone ages. But this difference may also be due to the fact that we use parish and municipality at school age to assign treatment (whereas they use parish and municipality at birth), to differences in our preferred specifications, to the fact that we use more narrowly defined treatment and control groups, and to the fact that they report reduced-form coefficients, while we report estimates from two-stage least squares. It is important to keep in mind, however, that the main focus of their paper is on the impact of parental education on children's crime. In this part of their analysis, the effects are both large and quite precisely estimated.

Results by Birth Cohort
Table 8 presents the results of estimating our baseline model for three sets of birth cohorts: 1943-1947, 1947-1951, and 1951-1955. 32 Are the baseline results presented above driven by any particular birth cohort? For males, we see a fairly consistent negative effect of education on conviction across birth cohorts. At the extensive margin, an additional year of education decreases the likelihood of a male's conviction by 2.5 percentage points if born between 1943 and 1947 or between 1947 and 1951. For the youngest cohorts (1951-1955), the effect appears somewhat larger: 3.2 percentage points. A significant negative effect of education on the number of crimes is also observed for each group of birth cohorts; however, once again, there is a slightly larger effect for the youngest cohorts (-0.231) compared to the oldest cohorts (-0.166). With regard to incarceration, we see a negative coefficient for all birth cohorts at both margins; once again, this effect tends to be larger in size and more significant for the younger cohorts.

32 These specifications still restrict the sample to individuals born within five years of the first reform birth cohort in their own municipalities. However, the results are quite similar if we eliminate this restriction and use all individuals born in these birth cohorts. By restricting the sample to just a few birth cohorts, we are already, to some extent, controlling for differences between treatment and control groups that may exist over time.
For females, we find a significant negative effect on the likelihood of conviction for cohorts born in 1943-1947 and 1951-1955. The negative effect observed at the intensive margin in Table 7 appears to be driven by the youngest birth cohorts. 33

Results Over the Life Cycle
The conviction and prison variables used in the analysis thus far consider whether an individual has a record (and the extent of the record) during the period 1973 to 2007, regardless of the individual's age. That is, for our oldest birth cohort, we are considering criminal records from age 30 to 64, but for our youngest birth cohort, we are considering criminal records from age 18 to 52. The use of these crime measures raises two potential concerns. First, are any differential effects observed across birth cohorts (as described in the previous subsection) driven by the fact that we are measuring crime at different ages for different birth cohorts? That is, do we observe a larger effect of education on conviction and incarceration for younger male birth cohorts than for older male birth cohorts in Table 8 because we can capture criminal behavior between the ages of 18 and 30 (when criminality generally peaks) for these younger cohorts? Second, and more generally, are there differential effects of education on crime across the life cycle?

33 Note that estimates are not provided for females for the prison variables given the low intensity with which females are sentenced to prison and the reduced sample sizes in the cohort analysis, when compared to the entire sample.
To address these questions, we create additional conviction and incarceration variables that are age-specific. We determine whether individuals have any convictions or prison sentences (and the number of crimes and days sentenced to prison) during four age ranges: 18-29, 30-39, 40-49, and 50-59. For all individuals in our sample, we have crime data for the middle two age categories; that is, if an individual has a record between the ages of 30 and 49, we can observe it.
However, the same can only be said for the 50-59 age range for individuals born between 1943 and 1948 and for the 18-29 age range for individuals born in 1955. Table 9 presents the results of estimating our baseline specification for each age-specific crime measure at the extensive margin (columns (1)-(4)) and the intensive margin (columns (5)-(8)). With the exception of the 18-29 crime variables, these specifications are restricted to the sample of birth cohorts for whom crime records are available for all of the relevant ages. Since this is just one birth cohort for the 18-29 variables, we use the three youngest birth cohorts, who have crime records available for all ages between 20 and 29.
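Which cohorts can be used for each age window follows from simple arithmetic on the 1973-2007 register period. The check below is a sketch that assumes age equals calendar year minus birth year; under that assumption it reproduces the coverage described above: ages 50-59 are fully observed only for the 1943-1948 cohorts, ages 18-29 only for the 1955 cohort, and ages 20-29 for the three youngest cohorts.

```python
FIRST_YEAR, LAST_YEAR = 1973, 2007    # crime-register period given in the text

def fully_observed(birth_year, age_lo, age_hi):
    """True if every age from age_lo to age_hi lies inside the register period,
    assuming age = calendar year - birth year."""
    return birth_year + age_lo >= FIRST_YEAR and birth_year + age_hi <= LAST_YEAR

def covered_cohorts(age_lo, age_hi, cohorts=range(1943, 1956)):
    """Birth cohorts (1943-1955 by default) observed over the whole age window."""
    return [b for b in cohorts if fully_observed(b, age_lo, age_hi)]

for lo, hi in [(18, 29), (20, 29), (30, 39), (40, 49), (50, 59)]:
    c = covered_cohorts(lo, hi)
    print(f"ages {lo}-{hi}: cohorts {c[0]}-{c[-1]}" if c else f"ages {lo}-{hi}: none")
```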
The results presented in the top panel of Table 9 indicate that one additional year of education significantly decreases the likelihood that a male is convicted between the ages of 18 […] (3)) translates into a 15 percent decrease. A significant effect on male convictions is not observed between the ages of 50 and 59 at either the extensive or the intensive margin. 34 These results indicate that the finding in Table 8 of a larger effect of schooling on male conviction rates for younger than older birth cohorts is not driven by the fact that we observe convictions for these cohorts between ages 18 and 29, as the estimated effect is actually smaller (in both percentage point and percent terms) for the 18-29 category than for the other categories. The top panel of Table 9 also indicates a significant negative effect of an additional year of education on the likelihood of a prison sentence between ages 40 and 49, and on the length of prison sentences between ages 30 and 39 and between ages 40 and 49.
For females, in the lower panel of Table 9, we observe a significant negative effect of education on convictions between the ages of 30 and 39 at both the extensive and intensive margins. If we include the whole sample rather than just those with records available between the ages of 20 and 29, then we also observe significant negative effects at both margins for the 18 to 29 conviction variables. Given that less than one percent of females have a prison record at any time (and at most 0.2 percent have a record during each age period), we again do not present results for female prison sentences.
Overall, the results from Table 9 provide evidence that additional education may have a long-term impact on criminal behavior throughout the life cycle. Specifically, they indicate that the effect of an additional year of education on the likelihood of conviction and the number of convictions is fairly homogeneous across the life cycle for males, but that there is a larger effect on prison sentences (or offenses serious enough to warrant a prison sentence) between the ages of 30 and 49. For females, the overall effects presented in Table 7 appear to be driven largely by convictions between the ages of 30 and 39.

Results by Crime Type
Table 10 considers the heterogeneity of the estimated effect across our three crime categories: property, violent, and other. As in our baseline model, all specifications include municipality and birth cohort fixed effects, municipality-specific linear time trends, and one pre-reform dummy.
For both males and females, a negative relationship between years of schooling and convictions is seen for all crime categories and at both the extensive and intensive margins; ten out of twelve of these estimates are significant.
More specifically, for males, we see that an additional year of schooling decreases the likelihood of a property crime conviction by 1.1 percentage points (or 10 percent), a violent crime conviction by 0.8 percentage points (or 13 percent), and a conviction of another type of crime by 1.5 percentage points (or five percent). These results are very much in line with the previous literature, as Lochner and Moretti (2004) find that an additional year of education decreases the likelihood of arrest for property and violent offenses by 11-12 percent. For females, an additional year of schooling significantly decreases the likelihood of conviction for a property offense by 1.1 percentage points (or 28 percent) and a violent offense by 0.5 percentage points (or 50 percent).
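Percentage-point effects and percent effects are linked by the mean of the outcome: a change of x percentage points equals 100·x divided by the mean rate (in percent). The sketch below backs out the conviction rates implied by the pairs reported above. Because the reported percentages are rounded, the implied rates are approximations derived from the text, not figures taken from the paper's tables.

```python
def percent_effect(pp_change, mean_rate_pct):
    """Express a percentage-point change as a percent of the outcome mean."""
    return 100 * pp_change / mean_rate_pct

def implied_mean_rate(pp_change, percent):
    """Mean outcome rate (in %) implied by a pp effect and its percent equivalent."""
    return 100 * pp_change / percent

reported = {  # (percentage points, percent) as reported in the text above
    "male property": (1.1, 10), "male violent": (0.8, 13), "male other": (1.5, 5),
    "female property": (1.1, 28), "female violent": (0.5, 50),
}
for name, (pp, pct) in reported.items():
    print(f"{name}: implied conviction rate of about {implied_mean_rate(pp, pct):.1f}%")
```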
These results indicate that the negative effect of education on crime is not driven by a particular crime category and is, in fact, fairly consistent across crime categories. If anything, a smaller effect is seen for the other crimes category for both males (when compared to the average amount of other crimes) and females (given the insignificance of the estimate). One potential explanation for this pattern is that the "other" crimes category includes numerous offenses that can be classified as white-collar offenses, such as tax evasion, insider trading, forgery, anti-competitive behavior, infringement of copyrights and patents, neglect of workplace safety rules, misconduct by public servants, etc. As suggested by Lochner (2004), a positive or, at least, less negative effect of education on crime may be expected for these offense categories. 35

Conclusion
This paper provides estimates of the causal effect of education on crime by using the change in Sweden's compulsory schooling laws as an instrument for years of education. We find a significant negative effect of an additional year of schooling on crime that is robust in a number of dimensions. Specifically, this effect is seen for: (i) males and females, (ii) convictions and prison sentences, (iii) extensive and intensive margin measures, (iv) young and old birth cohorts, (v) crimes committed across the life cycle, and (vi) violent offenses, property offenses, and other offenses.
The magnitude of the effect is also quite substantial, and in line with previous estimates for the U.S. (Lochner and Moretti, 2004) and the U.K. (Machin, Marie and Vujić, 2011).
Specifically, an additional year of education reduces the likelihood that a male is convicted or incarcerated by 7.5 and 16 percent, respectively. For females, one additional year of schooling decreases the likelihood of conviction by 11 percent. Importantly, these effects are just as large for violent crimes, which tend to be the most costly to society, as for property crimes.

35 In future research, we may be able to study the effect of education on finer sub-categories of crime. But it is uncertain whether these offenses occur often enough for effects to be precisely estimated.

[Figure panels: x-axis "year of birth", 1943-1955]

OLS regression coefficients. Robust standard errors in brackets, clustered by municipality; *** indicates significance at 1% level. All specifications include birth cohort and municipality fixed effects, as well as municipality-specific time trends.

Robust standard errors clustered at the municipality level are in brackets. *** indicates significance at 1% level; ** indicates significance at 5% level; * indicates significance at 10% level. All specifications are restricted to the sample of individuals born within five years of the reform in their own municipality.
Note that, given the extremely small percentage of females sentenced to prison within a single birth cohort, the results for females are not presented when using prison or prison sentence as the dependent variable.