Close Neighbours Matter: Neighbourhood Effects on Early Performance at School*
Previous versions of this article were presented at the Conference in memory of Zvi Griliches (Paris), at the CEPR Conference on Changing Condition in Education (Uppsala), at the University of Pompeu Fabra (Barcelona), at the University of Milan, at the CREST (Paris) and at the CEPR symposium on Public Policy (Paris). This research has benefited from a grant of the French Commissariat Général au Plan (no. 7–2002). We thank Esther Duflo, Steve Gibbons, Per Johansson, Thomas Piketty, Gerard Van den Berg and Louis-André Vallet for their comments on earlier versions of this article. Special thanks are due to Sandra McNally for her insightful comments and very helpful suggestions for improving the article.
Children's outcomes are strongly correlated with those of their neighbours. The extent to which this is causal is the subject of an extensive literature. There is an identification problem because people with similar characteristics are observed to live in close proximity. Another major difficulty is that neighbourhoods measured in available data are often considerably larger than those which matter for outcomes (i.e. close neighbours). Several institutional features of France enable us to address these problems. We find that an adolescent's outcomes at the end of junior high-school are strongly influenced by the performance of other adolescents in the neighbourhood.
The assumption that children's outcomes are influenced by the characteristics and outcomes of their neighbours forms the basis of a large and growing literature in the social sciences. Providing a convincing evaluation of neighbourhood effects has proven very difficult, however. The main difficulty is in isolating variation in neighbourhood attributes which are exogenous to children's outcomes. Children and families living in the same neighbourhood tend to have similar outcomes. It is unclear, however, whether this is because they influence each other or because they share the same unobserved characteristics (Manski, 1993; Ginther et al., 2000; Moffitt, 2001; Brock and Durlauf, 2001). Another issue is that neighbourhoods measured in available data sets are often considerably larger than those which matter for outcomes (i.e. close neighbours).
This article addresses these issues and identifies the causal impact of close neighbours’ characteristics on children's outcomes. The French Labour Force Survey enables us to consider the effect of close neighbours because of the nature of data collection: the basic sampling unit consists of groups of 20 to 30 adjacent households. It provides us with a large sample of 15-year-old adolescents and includes detailed information on the situation of all the other adolescents and adults living in the close neighbourhood, defined as the 20 to 30 adjacent houses. Existing studies typically proxy neighbourhoods with census tracts – relatively large groups of people (several thousand). Assuming that distant neighbours have less influence than close ones, this plausibly leads to an underestimate of the influence of close neighbours. Our data provide an interesting opportunity to overcome this difficulty and to analyse how persons living in adjacent houses actually influence each other.1
Our first identification strategy relies on variation across neighbourhoods in the proportion of adolescents born at the beginning (or at the end) of the year. As discussed below, the date of birth within the year is an important determinant of French children's early performance at school and is plausibly exogenous to the quality of the neighbourhood in which they live. In such a context, one simple way to identify the influence of neighbours is to test whether children's performance at school is affected by the distribution of dates of birth within the year of the other children living in the same neighbourhood. As shown below, the answer is positive. Regardless of their own date of birth, children living in a neighbourhood with a relatively high proportion of children born at the beginning of the year perform significantly better than children living in a neighbourhood with a relatively high proportion of children born at the end of the year. This result provides interesting evidence of the influence of social context on children's outcomes. It is possible to go a step further by focusing on children's late outcomes – at the end of junior high-school – and by assuming that neighbours’ dates of birth, as such, have no direct influence on these late outcomes, although they may influence early outcomes in primary school or at the beginning of junior high-school. In such a case, the distribution of neighbours’ dates of birth can be used as an instrumental variable to identify the effect of neighbours’ early educational advancement on an adolescent's performance at school. Our IV estimates suggest that a one standard-deviation (SD) increase in the proportion of neighbours who have already been held back a grade at age 15 increases an adolescent's probability of grade repetition between the age of 15 and 16 by about 10–15 percentage points (i.e. about 20% of a SD).
It is shown that these estimates do not depend on the specific characterisation of neighbours’ date of birth which is used for identification. Overidentification tests do not reject the identifying assumption nor the linear specification of the endogenous effect which is used in this article.
Following the terminology introduced by Manski (1993), our first strategy identifies the endogenous effect, i.e. the effect of neighbours’ outcomes on own outcomes. We have developed a second strategy which provides an evaluation of the reduced-form effect of neighbours’ family background in relatively poor neighbourhoods. This approach relies on available information on families living in public housing (Habitation à Loyer Modéré, hereafter HLM, about 20% of the population). In France, any family is eligible for an HLM provided that the income per unit of consumption is sufficiently low. The problem is that the number of eligible families is about three times as large as the available space in HLM. Rents are also considerably lower in HLM than in non-HLM. Given these facts, the turnover is very low and the waiting lists are very long. HLM managers have a very limited set of housing to offer each year to eligible families and very little control over the specific neighbourhoods to which families can be assigned. We provide various specification checks supporting the assumption that HLM assignment is quasi-random. Under this assumption, HLM neighbourhood membership can be considered as exogenous and neighbourhood effects can be identified through standard regressions. Interestingly, these regressions confirm the existence of significant contextual effects in HLM neighbourhoods.
The French Labour Force Survey provides information on adolescents’ outcomes only. To explore the influence of social context on French children further, we have used a longitudinal survey recently conducted by the French Ministry of Education. It provides detailed information on the early school career of a large representative sample of pupils. This dataset makes it possible to analyse the relationship between the scores in national tests at entry into 3rd grade and the characteristics of peers at entry into 1st grade, using exactly the same reduced-form and IV specifications as with the Labour Force Survey. Most interestingly, the reduced-form analysis confirms that individual scores obtained at entry into third grade decrease significantly with the proportion of first-grade peers who were born at the end of the year. Also, IV estimates suggest that a one SD increase in the average score of early peers leads to an increase of about 30% of a SD of a child's score at entry into the third grade. As it turns out, the influence of early peers does not seem less strong than that of neighbours with whom they interact later in life.
Generally speaking, this article contributes to the literature on the influence of peers on own educational achievement, where peers are defined as children of a similar age and likely to interact. There is no consensus on the importance of peer effects on own achievement in this literature. Some papers report significant effects (Ding and Lehrer, forthcoming; Hoxby, 2000; McEwan, 2003) whereas others find no impact at all (Angrist and Lang, 2004). One explanation for the lack of consensus is variation in how ‘neighbourhood’ is defined, as well as the variety of approaches used to identify peer effects. A branch of this literature identifies peers’ influence through the analysis of housing mobility programmes where some low income inner-city families are given assistance in moving to less segregated, randomly selected locations (Jacob, 2004; Leventhal and Brooks-Gunn, 2004; Kling et al., forthcoming; Ludwig et al., 2001; Rosenbaum, 1995; Sanbonmatsu et al., forthcoming). Another branch of the literature identifies contextual effects in the classroom or at university through exploiting either random variation in the peer group composition over time or random assignment of peers to individual students (Gibbons and Telhaj, 2005; Gould et al., 2004; Hanushek et al., 2003; Kremer and Levy, 2003; Ammermüller and Pischke, 2006; Sacerdote, 2001; Zimmerman, 2003). Using sibling data, Aaronson (1998) finds a significant correlation between variation in educational outcomes across siblings and variation in neighbourhood quality arising from family change of residence. Other researchers have tried to address endogeneity issues by developing instrumental variables strategies (Cutler and Glaeser, 1997; Evans et al., 1992). For example, Cutler and Glaeser (1997) use the topographical features of cities to identify the effect of spatial segregation on blacks’ outcomes.
This article is organised as follows. Section 1 provides a description of our data. Section 2 shows the results of the strategy using neighbours’ dates of birth within the year as an instrumental variable. Section 3 shows results of the strategies which build on the available information on public sector housing. Section 4 shows the findings with the panel of pupils. Section 5 concludes.
1. Data and Variables
The datasets used in this article come from 12 waves of the French Labour Force Survey (LFS), conducted each year between 1991 and 2002. One interesting feature of the French LFS is that the basic sampling unit consists of groups of about 20 adjacent households (aires). More specifically, a typical LFS consists of a representative sample of about 3,500 aires.2 We take ‘neighbourhood’ as equivalent to the LFS aire. Each year, within each aire, all the households are surveyed and, within each household, all persons aged 15 or more are interviewed. The French statistical office (INSEE) has chosen this sampling strategy so as to reduce the travelling expenses of those who administer the survey.
For each respondent, we have standard information on his date of birth, sex, nationality, family situation, place of birth, education, labour market situation (unemployed, out of the labour force, employed). Also we know whether the respondent has been living in his current residence for one year or whether he has just moved into the neighbourhood. For respondents who are still in the education system, we know their current grade. By comparing their age and grade, we know whether they have been held back a grade in primary or junior high-school. For example, respondents of year t born in t − 15 are in the ninth grade (at least) if they have not been held back a grade.3 In the French context, to repeat a grade in elementary school or junior high-school is a very direct indicator of early performance at school. The recent Program for International Student Assessment (PISA) conducted by the OECD shows that 15-year-old French adolescents who have repeated a grade obtain much lower scores in mathematics, reading or science than normal-age adolescents. The difference is about 1.14 standard deviations of the score in mathematics, 1.26 SD in reading, 1.17 SD in science (Murat and Rocher, 2003). By the end of junior high-school, about 42% of adolescents have been held back a grade.
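The age-grade comparison described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the names `held_back` and `normal_grade_at_15` are hypothetical, and the sketch assumes grades are coded as integers on the scale used in the text, with ninth grade as the normal grade in the year a respondent turns 15.

```python
def held_back(survey_year: int, birth_year: int, grade: int,
              normal_grade_at_15: int = 9) -> bool:
    """Return True if the respondent is below the normal grade for their age.

    Assumes one grade per school year, so the normal grade rises by one
    for each year of age beyond 15 (an assumption of this sketch).
    """
    age = survey_year - birth_year
    normal_grade = normal_grade_at_15 + (age - 15)
    return grade < normal_grade


# A respondent surveyed in 2000 and born in 1985 is normal-age in ninth grade.
print(held_back(2000, 1985, 9))   # False
print(held_back(2000, 1985, 8))   # True
```

Applying the same function one year later (survey year t + 1) to a respondent who is still in ninth grade returns `True`, which corresponds to a grade repetition between age 15 and 16.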
Another interesting feature of the French Labour Force Survey is that only one-third of the sample is renewed each year. For each t, we can construct a large representative sample of 15-year-old adolescents with information on their situation at t and at t + 1 (N = 13,100). This article will focus on the sample of 15-year-old respondents who were already living in their house one year before, who are still observed in the LFS at t + 1 and such that we observe at least one other 15-year-old adolescent in their aire. Appendix Tables A1 and A2 provide the distribution of the adolescents in our sample according to the number of other 15-year-old adolescents observed in their aires and provide the basic descriptive statistics for these adolescents.
Table A1. Distribution of LFS 15-year-old Respondents According to the Number of Other 15-year-old Children Living in their aire
|10 or more||1,139||8.7|
Table A2. Descriptive Statistics
|Held back a grade at 15||0.42||0.49|
|Held back a grade at 16||0.57||0.50|
|Same grade at 15 and 16||0.20||0.40|
|Father high-school dropout||0.32||0.46|
|Characteristics of the other 15-year-old respondents living in the aire|
|Proportion held back a grade||0.42||0.32|
|Proportion born January–May||0.42||0.30|
|Proportion born June–November||0.49||0.30|
|Proportion parents high-school dropout||0.38||0.27|
|Number of Observations|| ||13,116|
For each adolescent, it is possible to calculate the proportion of other adolescents in the aire who have been held back a grade in primary or junior high-school and who are not ‘normal-age’. The basic research question is whether an adolescent's probability of repeating a grade between the age of 15 and 16 is affected by the proportion of 15-year-old neighbours who are not normal-age. Does the variation in an adolescent's educational advancement between the age of 15 and 16 depend on the educational advancement of his/her close neighbours of the same age?
For each adolescent, we have also constructed several explanatory variables describing the average characteristics of other families living in the aire, namely the proportion of single-parent families, the proportion of families with 3 or more children, the proportion of non-French or unemployed workers among the adults in these families, the proportion of high-school dropouts and the proportion of college graduates. Let us emphasise that for each respondent the different aire-level indicators are constructed using only information on individuals who do not belong to the family of the respondent.4
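One way to picture the construction of these aire-level indicators is the sketch below, which averages a characteristic over respondents in the same aire while excluding the respondent's own family. The record layout and the names `share_among_other_families`, `aire`, `family` and `held_back` are hypothetical and only stand in for the corresponding LFS variables.

```python
from collections import defaultdict


def share_among_other_families(records, field):
    """For each record, average `field` over same-aire records from other families.

    Returns None where the aire contains no respondent from another family.
    """
    by_aire = defaultdict(list)
    for rec in records:
        by_aire[rec["aire"]].append(rec)

    shares = []
    for rec in records:
        others = [o[field] for o in by_aire[rec["aire"]]
                  if o["family"] != rec["family"]]
        shares.append(sum(others) / len(others) if others else None)
    return shares


# Toy data: three 15-year-olds in aire 1 (two are siblings), one in aire 2.
sample = [
    {"aire": 1, "family": "a", "held_back": 1},
    {"aire": 1, "family": "a", "held_back": 0},
    {"aire": 1, "family": "b", "held_back": 1},
    {"aire": 2, "family": "c", "held_back": 0},
]
print(share_among_other_families(sample, "held_back"))  # [1.0, 1.0, 0.5, None]
```

Excluding the whole family, rather than only the respondent, keeps siblings from entering each other's neighbourhood averages.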
2. Identification Using Information on Neighbours’ Months of Birth
The question is whether an adolescent's educational advancement at the end of junior high-school is affected by the characteristics of other adolescents in the neighbourhood. The first identification strategy builds on the use of a variable which determines children's performance at school but which is nonetheless exogenous to the quality of the neighbourhood in which they live. Specifically, our first approach relies on the fact that date of birth within the year is an important determinant of French children's early performance at school and that this is plausibly exogenous to the quality of their neighbourhood.
There is ample evidence showing that French children's date of birth within the year is an important determinant of their early school outcomes.5 The French school system is characterised by the full day character of both pre-primary and primary school, the heavy teaching load and the very high proportion of pupils who have to repeat one or two grades before the end of compulsory schooling. In such a context, the date of birth within the year is a very important determinant of early school performance – plausibly more important than in most other Western countries. The national evaluations conducted each year at entry into third grade show an average difference of about 1/2 of a standard deviation between the scores of children born in January and those of children born in December. The proportion of 15-year-old children held back a grade is about 15 percentage points higher for children born at the end of the year – the least mature of their class – than for children born at the beginning.
In contrast, there is no strong reason for children's date of birth within the year to be correlated with the quality of the neighbourhood in which they live. As discussed below, there is no specific residential concentration of children born at the beginning (or the end) of the year. In such a context, it is possible to develop a very simple test for the existence of neighbourhood effects.
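To understand better why this is the case, let us denote yk children's educational advancement at age k (k = 7, …, 16) and let us assume that yk is defined recursively by,
\[ y_k = \alpha_{k-1} E(y_{k-1} \mid n) + \beta_{k-1} E(v \mid n) + \gamma_{k-1} v + \theta_{k-1} y_{k-1} + \varepsilon_n + u \qquad (1) \]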
where v represents date of birth within the year, n the neighbourhood, εn a neighbourhood fixed effect (the quality of schools) and u the omitted individual characteristics, i.e. the resources that affect schooling and that an individual can bring from one neighbourhood to another. The γk parameters represent the effect of own date of birth within the year on own performance at school whereas the θk parameters capture the persistence of educational outcomes over time.6 Using Manski's terminology, the αk parameters represent endogenous effects, whereas the βk parameters capture exogenous effects.
The omitted resources u are likely to be correlated with εn, but the date of birth of the respondents (v) and the distribution of the date of birth of their neighbours (E(v |n)) are plausibly uncorrelated with the other determinants of performance at school, as defined by εn or u. Appendix Tables A3 and A4 show that there is no significant correlation between the basic observable determinants of an adolescent's performance at school (i.e. date of birth, gender, nationality or family background) and the proportion of neighbours born at the beginning (or at the end) of the year. In particular, there is no specific correlation between own date of birth and neighbours’ date of birth. There are good and bad neighbourhoods (i.e., high and low εn), but children and their neighbours do not seem to be sorted across good and bad neighbourhoods according to their date of birth.
Table A3. Relationships Between an Adolescent’s Characteristics and the Distribution of Dates of Birth of Other Adolescents in the Neighbourhood
|Date of birth (continuous specification)||–||–||0.005 (0.005)|
|Date of birth (dummies)|
|Born January–May||0.008 (0.010)||−0.006 (0.010)||–|
|Born June–November||0.001 (0.010)||0.001 (0.010)||–|
|Boy||−0.008 (0.005)||0.013 (0.05)||0.001 (0.036)|
|Non-French||0.002 (0.012)||0.001 (0.012)||−0.007 (0.008)|
|College grad.||−0.004 (0.009)||0.005 (0.009)||0.009 (0.063)|
|High-school grad.||0.010 (0.010)||−0.011 (0.011)||−0.051 (0.072)|
|No Dip.||−0.010 (0.009)||0.008 (0.009)||0.022 (0.063)|
|Missing||−0.003 (0.010)||0.007 (0.010)||0.028 (0.070)|
|College grad.||−0.000 (0.009)||0.004 (0.009)||0.009 (0.063)|
|High-school grad.||0.006 (0.009)||−0.003 (0.010)||−0.025 (0.066)|
|No Dip.||−0.002 (0.009)||−0.006 (0.009)||0.050 (0.059)|
|Missing||−0.002 (0.009)||0.016 (0.009)||0.077 (0.063)|
|Fisher (4 dummies father Educ. = 0)||0.93 (0.42)||0.83 (0.47)||0.46 (0.76)|
|Fisher (4 dummies mother Educ. = 0)||0.17 (0.91)||0.28 (0.83)||0.42 (0.74)|
Table A4. Adolescents’ Characteristics and Neighbours’ Dates of Birth
|Father college grad.||42.4 (0.7)||49.7 (0.7)|
|Father not college grad.||42.3 (0.3)||49.6 (0.3)|
|Born January–May||42.7 (0.4)||49.2 (0.4)|
|Born June–November||42.0 (0.4)||49.9 (0.4)|
|Boy||42.3 (0.6)||49.0 (0.6)|
|Girl||42.2 (0.6)||50.6 (0.6)|
|French||42.3 (0.2)||49.6 (0.2)|
|non-French||41.7 (1.1)||50.9 (1.1)|
After averaging (1) conditional on n and solving the recursive system of equations, we obtain a first-stage equation that can be written,
\[ E(y_k \mid n) = \beta_{1,k-1} E(v \mid n) + \omega_n \qquad (2) \]
where the new fixed effect ωn is a linear combination of εn and un = E(u |n) whereas the new parameter β1,k−1 is a linear combination of the βk−t + θk−t parameters, t = 1, …, k−6 (see Appendix). The β1,k−1 parameter captures the cumulative impact of the average maturity of children living in a neighbourhood on their average outcome at age k. This equation makes it clear why the different E(yk |n) are likely to be correlated with εn and, consequently, why αk−1 cannot be estimated in (1) through a standard linear regression of yk on E(yk−1|n).
Replacing E(yk−1|n) in (1) and solving the system, we obtain a reduced form equation that can be written,
\[ y_k = \beta_{2,k-1} E(v \mid n) + \gamma_{2,k-1} v + \mu_n + \eta \qquad (3) \]
where η is proportional to u, whereas the fixed effect μn is a linear combination of εn and un. Also, the reduced-form parameter β2,k−1 is a linear combination of the (αk−tβ1,k−t + βk−t) parameters whereas the γ2,k−1 parameter is a linear combination of γk−t parameters, t = 1, …, k−6 (see Appendix).
The β2,k−1 parameter captures the cumulative impact of neighbours’ average maturity on a child's outcome at age k. Given that v and E(v |n) are uncorrelated with u and εn, they are uncorrelated with the reduced-form residuals η and μn and (3) shows that β2,k−1 can be estimated through an OLS regression of yk on E(v |n). Intuitively, this reduced-form effect provides us with direct evidence of social effects. The β2,k−1 parameter is indeed positive if and only if there is a t such that either αk−t or βk−t is positive.
Hence, observing the distribution of dates of birth within small neighbourhoods provides us with evidence of the influence of social context. Endogenous effects cannot be separated from exogenous effects without an additional identifying assumption, however. One such additional assumption is that the date of birth within the year, as such, has no significant influence on own late school transitions and on that of neighbours. Using the notation of (1), this amounts to assuming that β15 and γ15 are negligible. Under this additional assumption, the distribution of neighbours’ date of birth is clearly a valid instrumental variable for identifying the effect α15 of neighbours’ early outcomes E(y15|n) on an adolescent's late outcome y16 whereas an adolescent's own date of birth is a valid instrument7 for identifying the effect θ15 of own early outcome y15 on own late outcome y16.
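The logic of using neighbours’ dates of birth as an instrument can be illustrated on simulated data. The sketch below is not the authors' estimation: the data-generating process, the variable names and the true effect of 0.3 are all assumptions chosen for illustration, outcomes are treated as continuous, and a single instrument and a single endogenous regressor are used instead of the full specification. In this toy design the OLS slope is biased upward by the unobserved neighbourhood quality; the direction of that bias is an artefact of the simulation and says nothing about the sign of the bias in the LFS data.

```python
import numpy as np


def two_stage_least_squares(y, endog, instruments):
    """2SLS with an intercept; returns [constant, endogenous coefficients...]."""
    const = np.ones((len(y), 1))
    Z = np.hstack([const, instruments])   # instruments plus constant
    X = np.hstack([const, endog])         # regressors plus constant
    # First stage: project the regressors on the instruments.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress the outcome on the projected regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


rng = np.random.default_rng(0)
n = 20_000
quality = rng.normal(size=n)      # unobserved neighbourhood quality
born_early = rng.normal(size=n)   # standardised share of neighbours born early
# Neighbours' share held back: lower where more were born early or quality is high.
neigh_held_back = -0.5 * born_early - quality + rng.normal(size=n)
true_effect = 0.3                 # assumed endogenous effect in this simulation
own_outcome = true_effect * neigh_held_back - quality + rng.normal(size=n)

iv_slope = two_stage_least_squares(
    own_outcome, neigh_held_back[:, None], born_early[:, None])[1]
ols_slope = np.polyfit(neigh_held_back, own_outcome, 1)[0]
print(f"IV: {iv_slope:.2f}  OLS: {ols_slope:.2f}  true: {true_effect}")
```

Because the simulated instrument is independent of neighbourhood quality, the IV slope should land close to the assumed true effect while the OLS slope does not.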
Assuming that respondents’ date of birth can be characterised by (say) two variables (v′ and v′′) rather than by just one (v), our identifying assumption can be tested as an overidentifying restriction. If the date of birth within the year has no effect on current outcome y16 on top of its effect on early outcome y15, then any characterisation v′ of the date of birth is a valid instrument for identifying the effect of y15 on y16 and any further characterisation v′′ should have no additional effect on y16. In what follows, we characterise date of birth by two dummies, ‘born between January and May’ (v′) and ‘born between June and November’ (v′′) (December being the reference). We show that the overidentifying restriction is not rejected. Also we check that the IV results remain unchanged when we characterise date of birth within the year by a single continuous variable rather than by one or two dummies (and the distribution of date of birth in the neighbourhood by its mean value rather than by one or two proportions).
Table 1 focuses on our basic sample of adolescents and analyses their educational advancement at the end of junior high-school as a function of their own date of birth and of the date of birth of other adolescents living in the same neighbourhood. The first column shows that an adolescent's probability of being held back a grade at the end of junior high-school is about 8 percentage points lower (about 16% of a SD) if the other adolescents living in the neighbourhood were born at the beginning rather than at the end of the year. Interestingly, the reduced-form effect of neighbours’ maturity is almost as strong as that of own maturity.
Table 1. Reduced-form Effect of Close Neighbours’ Date of Birth Within the Year on an Adolescent's Educational Advancement
|Characteristics of the other 15-year-olds living in the aire|
|Proportion born January–May||−0.082 (0.027)||−0.050 (0.022)|
|Proportion born June–November||−0.050 (0.027)||−0.032 (0.022)|
|Proportion born December||Ref.||Ref.|
|Born January–May||−0.085 (0.016)||0.032 (0.014)|
|Born June–November||−0.049 (0.016)||0.020 (0.013)|
Having been held back a grade at the end of junior high-school is a cumulative outcome.8 Hence, the effect estimated in column 1 represents the cumulative influence of close neighbours on both early and late educational outcomes. The second regression isolates the effect of close neighbours on late outcomes, as measured by the probability of grade repetition between the age of 15 and 16. It shows that an adolescent's probability of being in the same grade at age 16 as at age 15 is 5 percentage points lower (i.e., about 13% of a SD) if the other adolescents in the neighbourhood were born at the beginning rather than at the end of the year. As it turns out, the reduced-form effect of neighbours’ maturity remains strong at the end of junior high-school.
This reduced-form analysis does not separate endogenous from exogenous social effects. As discussed above, it is possible to further explore this issue by assuming that date of birth within the year mostly affects early educational advancement. Table 2 shows the result of a regression of an adolescent's educational advancement at age 16 on own educational advancement at age 15 and on that of neighbours, using own date of birth and that of neighbours as instrumental variables. It reveals a significant endogenous effect (i.e., α15 = 0.33) which suggests that a one standard deviation increase in the proportion of neighbours held back a grade at age 15 increases ceteris paribus the probability of being held back a grade at age 16 by about 11 percentage points, i.e., 20% of a SD (Table 2, Column 3).
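The conversion from the IV coefficient to percentage points and SD units can be checked directly. The short sketch below uses the coefficient quoted above (0.33) together with the standard deviations reported in Table A2 (0.32 for the proportion of neighbours held back a grade, 0.50 for being held back a grade at 16); the function name `effect_of_one_sd` is hypothetical.

```python
def effect_of_one_sd(coef, sd_regressor, sd_outcome):
    """Return the change in the outcome for a one-SD change in the regressor,
    in raw units and as a fraction of the outcome's SD."""
    change = coef * sd_regressor
    return change, change / sd_outcome


change, in_sd_units = effect_of_one_sd(coef=0.33, sd_regressor=0.32, sd_outcome=0.50)
print(round(change, 3), round(in_sd_units, 2))  # 0.106 0.21
```

A change of about 0.106 corresponds to the roughly 11 percentage points, or about one fifth of a SD, reported in the text.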
Table 2. The Effect of Close Neighbours’ Educational Advancement on an Adolescent's Educational Advancement: An Evaluation Using the Distribution of Close Neighbours’ Date of Birth as an Instrumental Variable
|Characteristics of the other 15-year-olds living in the aire|
|Prop. held back a grade at 15||–||0.08 (0.01)||0.33 (0.13)||0.36 (0.15)|
|Prop. born January–May||−0.14 (0.02)||–||–||–|
|Prop. born June–November||−0.07 (0.02)||–||–||−0.006 (0.012)|
|Prop. born December||Ref.||–||–||Ref.|
|(Held back a grade at 15 = 1)||–||0.70 (0.01)||0.57 (0.08)||0.57 (0.08)|
|Born January–May||−0.009 (0.010)||–||–||–|
|Born June–November||−0.002 (0.010)||–||–||−0.005 (0.007)|
|H0 = ‘Proportion held back a grade at 15’ exogenous||–||–||−0.25 (0.13) (reject. 0.5%)||−0.28 (0.15) (reject. 6%)|
|H0 = Instruments jointly valid||–||–||0.22 (0.80)|
It is plausible that an adolescent's educational advancement directly affects an intermediate mechanism (e.g. own studying behaviour), and not neighbours’ performance at school directly. Under this assumption, the endogenous effect estimated in this article reflects the effect of neighbours’ studying behaviour on own studying behaviour, the identifying assumption being that an adolescent's date of birth affects the studying behaviour of neighbours only insofar as it has affected own behaviour.
2.2. Overidentification and Exogeneity Tests
Standard overidentification tests do not reject the validity of our identifying assumptions.9 We have checked that the results of the IV regression remain almost exactly the same when we use only the first dummy (i.e., being born between January and May) and the first proportion (i.e., proportion of neighbours born between January and May) as instrumental variables, the second dummy (i.e., being born between June and November) and the second proportion (i.e., proportion of neighbours born between June and November) being used as additional control variables. These two additional control variables have no significant effect on the outcome under consideration (Table 2, Column 4). Also, the IV results are almost identical when we characterise the distribution of neighbours’ date of birth within the year by its mean value rather than by two proportions (see Appendix Table A5). As discussed above, these different tests are consistent with the assumption that date of birth, as such, has no direct effect on the probability of repeating a grade between age 15 and 16 on top of its effect on the probability of repeating a grade before the age of 15.10 The fact that the estimated endogenous effect remains the same regardless of whether we exclude one or two characterisations of the distribution of neighbours’ date of birth can also be interpreted as meaning that the linear specification of the endogenous effect is not rejected; see Kling et al. (forthcoming) for a similar argument. We have also checked that the result remains unchanged when we add various family background indicators as control variables, which is consistent with our instruments being uncorrelated with family background.
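For readers unfamiliar with overidentification tests, the sketch below computes one common form, a Sargan-type statistic: the sample size times the R² from a regression of the 2SLS residuals on the full set of instruments. Whether this is the exact test reported in Table 2 is not stated in the text, so this should be read as an assumption. The simulated data, the coefficient values and all names are hypothetical, and the example uses two instruments for one endogenous regressor, so that under valid instruments the statistic is approximately chi-squared with one degree of freedom.

```python
import numpy as np


def sargan_statistic(y, endog, instruments):
    """Return n * R^2 from regressing 2SLS residuals on the instruments."""
    n = len(y)
    const = np.ones((n, 1))
    Z = np.hstack([const, instruments])
    X = np.hstack([const, endog])
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    resid = y - X @ beta                      # residuals use the actual regressors
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    return n * r2


rng = np.random.default_rng(1)
n = 5_000
z = rng.normal(size=(n, 2))                   # two instruments
confounder = rng.normal(size=n)
x = z[:, 0] + z[:, 1] + confounder + rng.normal(size=n)
noise = rng.normal(size=n)

# Valid case: the instruments affect y only through x.
y_valid = 0.3 * x + confounder + noise
# Invalid case: the second instrument also enters y directly.
y_invalid = 0.3 * x + 0.5 * z[:, 1] + confounder + noise

stat_valid = sargan_statistic(y_valid, x[:, None], z)
stat_invalid = sargan_statistic(y_invalid, x[:, None], z)
print(f"valid: {stat_valid:.2f}  invalid: {stat_invalid:.2f}")
```

A small statistic, as in the valid case, is what "not rejecting the overidentifying restriction" means in practice; a direct effect of one instrument on the outcome inflates it sharply.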
Table A5. The Endogenous Contextual Effect: an Evaluation Using Average Month of Birth as an Instrumental Variable
|Characteristics of the other 15-year-olds living in the aire|
|Prop. held back a grade at 15||–||–||0.08 (0.01)||0.31 (0.12)|
|(Held back a grade at 15 = 1)||–||–||0.70 (0.01)||0.61 (0.07)|
|Average month of birth (continuous specification)||0.012 (0.01)||−0.006 (0.0020)||–||–|
|Month of Birth (continuous specification)||0.0002 (0.0008)||−0.008 (0.001)||–||–|
|Cohort, Gender, nation. dummies||yes||yes||yes||yes|
A Hausman test rejects (at the 5% level) the assumption that the proportion of neighbours held back a grade is exogenous. Column 2 of Table 2 confirms that the OLS estimate of the proportion of neighbours held back a grade is significantly lower than the IV estimate. This is something of a puzzle, since typically, endogenous neighbourhood selection is likely to lead to upward bias in the OLS coefficient.11 One potential explanation for the IV/OLS difference is that late grade repetition (i.e., between age 15 and 16) mainly affects relatively bad students living in good neighbourhoods on the one hand and good students living in bad neighbourhoods on the other. Good students in good neighbourhoods do not repeat grades whereas bad students living in bad neighbourhoods have already been held back one or two grades early in their school career and are not likely to be held back further in subsequent periods. Hence, holding past educational advancement constant, there is a negative correlation between the quality of a neighbourhood and adolescents’ propensities to repeat grades. Such a correlation can generate strong attenuation bias, i.e., the difference in late grade repetition across good and bad neighbourhoods may be very weak even though the true neighbourhood effect is very strong. Another potential source of attenuation bias arises from errors that affect our measure of respondents’ outcomes and, consequently, our measure of the distribution of outcomes in the neighbourhood. Given that the variance of the errors in the measure of average outcomes decreases with the number of individuals, we should observe a smaller attenuation bias for a sample of respondents with more neighbours. 
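The measurement-error argument can be made concrete with a small simulation. The sketch below is purely illustrative: the true effect of 0.3, the uniform distribution of neighbourhood rates, the continuous outcome and all names are assumptions. It shows that when the neighbourhood rate is measured on only a few sampled neighbours, the OLS slope is pulled towards zero, and less so when more neighbours are observed.

```python
import numpy as np

rng = np.random.default_rng(2)
n_aires = 20_000
true_effect = 0.3  # assumed effect of the true neighbourhood rate

# True (unobserved) probability of being held back in each neighbourhood.
true_rate = rng.uniform(0.2, 0.8, size=n_aires)
outcome = true_effect * true_rate + rng.normal(scale=0.1, size=n_aires)


def ols_slope_with_sampled_neighbours(n_neighbours):
    """OLS slope when the neighbourhood rate is measured on a few neighbours."""
    observed_share = rng.binomial(n_neighbours, true_rate) / n_neighbours
    return np.polyfit(observed_share, outcome, 1)[0]


slope_2 = ols_slope_with_sampled_neighbours(2)
slope_10 = ols_slope_with_sampled_neighbours(10)
print(f"2 neighbours: {slope_2:.2f}  10 neighbours: {slope_10:.2f}  true: {true_effect}")
```

The gap between the two slopes mirrors the prediction in the text that attenuation should be smaller for respondents with more observed neighbours.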
To test these different interpretations, Appendix Table A6 compares OLS estimates obtained using the full sample (column 1), with those obtained using the subsample of respondents with at least four neighbours (column 2) and those obtained on the subsample further restricted to the 15-year-old respondents who are still normal-age (i.e., neither ahead nor held back) and, consequently, the most exposed to late grade repetition (column 3). Interestingly, the OLS estimates are much larger in the more restricted sample (0.24 in the third sample) and no longer significantly different from the IV estimate. This seems consistent with our interpretation of the IV/OLS gap.
Table A6. Variation in OLS Estimates of the Endogenous Effect Across Sub-samples
| ||(1) Full sample||(2) At least four neighbours||(3) At least four neighbours, normal-age at 15|
|Prop. Held back at 15||0.08 (0.01)||0.13 (0.02)||0.25 (0.03)|
|Held back at 15||0.70 (0.01)||0.69 (0.01)||–|
2.3. Alternative Specifications
Table 3 provides alternative evaluations of the endogenous effect using a non-cumulative specification of the dependent variable (column 1) and an alternative specification of the model (columns 2 and 3). Specifically, the first column shows the regression of an adolescent's probability of being in the same grade at age 15 and 16 (i.e., non-cumulative outcome) on the proportion of neighbours held back a grade at age 15 and own educational advancement at age 15, using the same instruments as in Table 2 (i.e., own date of birth within the year and that of neighbours). The estimated endogenous effect is as significant with this specification as with the cumulative one. A one SD increase in the proportion of neighbours held back a grade at age 15 increases an adolescent's probability of grade repetition between age 15 and age 16 by about 14 percentage points.
Table 3. The Effect of Close Neighbours’ Educational Advancement on an Adolescent’s Educational Advancement: Alternative Dependent Variables and Alternative Specifications
|Proportion other adolescents in the aire held back a grade at age 15||0.43|
|(Held back a grade at age 15 = 1)||−0.24|
|Number of Observations||13,116||13,116||13,116|
As discussed above, an alternative strategy is to estimate the endogenous effect β15 conditional on various values of θ15 in [0,1], using the distribution of neighbours’ date of birth as the only instrument. If θ15 is assumed equal to 0 then β15 can be estimated through the IV regression of own educational advancement at age 16 on neighbours’ educational advancement at age 15. In contrast, if θ15 is assumed equal to 1 then β15 can be estimated through the IV regression of grade repetition between age 15 and 16 on neighbours’ educational advancement at age 15.12 Table 3 shows these different regressions. They confirm that a one SD increase in the proportion of neighbours held back a grade at age 15 has a strong effect on own educational advancement at age 16. The effect lies between 22 percentage points when θ15 is assumed equal to 0 (column 3) and 10 percentage points when it is assumed equal to 1 (column 2). The estimated endogenous effect remains strong even under the extreme assumption that past educational advancement does not affect the current probability of grade repetition.
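The bounding strategy can be sketched on simulated data. The example below is a stylised illustration, not the article's specification: variables are continuous, the data-generating values (`theta_true`, `beta_true` and the other coefficients) are invented, and a single instrument `z` stands in for the distribution of neighbours' dates of birth. For each assumed θ, the dependent variable is own outcome at 16 minus θ times own outcome at 15.

```python
import numpy as np

def iv_slope(z, x, d):
    """Just-identified IV estimate of the effect of x on d, with z as instrument."""
    z_c = z - z.mean()
    return float(z_c @ (d - d.mean()) / (z_c @ (x - x.mean())))

rng = np.random.default_rng(0)
n = 200_000
theta_true, beta_true = 0.6, 0.3           # hypothetical "true" parameters

z = rng.normal(size=n)                     # instrument (stand-in for neighbours' dates of birth)
u = rng.normal(size=n)                     # unobserved factor shared with neighbours
x = 0.5 * z + u + rng.normal(size=n)       # neighbours' outcome at 15 (endogenous)
y15 = 0.4 * x + u + rng.normal(size=n)     # own outcome at 15
y16 = theta_true * y15 + beta_true * x + u + rng.normal(size=n)

# IV estimate of beta for each assumed value of theta.
estimates = {theta: iv_slope(z, x, y16 - theta * y15) for theta in (0.0, 0.5, 1.0)}
```

Because the dependent variable is linear in θ, the estimate obtained with θ = 0.5 is the average of the two extreme estimates, and in this simulation the estimates at θ = 0 and θ = 1 bracket the assumed true value of β.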
3. Identification Using Information on Families Living in Public Housing
The previous Section has focused on one specific social effect – the effect of performance at school of the other adolescents living in the neighbourhood. This Section provides a broader evaluation of the influence of social context but without separating the endogenous from the exogenous dimensions. Specifically, we ask whether a child's performance at school is influenced by the level of human capital of families living in the neighbourhood, but will not address whether this is because it has a direct effect (exogenous channel) or because it affects the performance of other children in the neighbourhood (endogenous channel). This Section uses available information on families living in public housing (HLM, about 20% of the population).
In France, any family is eligible for an HLM provided that the head of the family is allowed to live in France and that income per unit of consumption is below a threshold (about 30,000 Euros for a four-person family in 2002) which depends on the region and which is updated at the beginning of each year. Eligible families can apply for an HLM in any city (commune) where such public programmes exist, regardless of their current place of residence or nationality.
Public housing is managed by several different types of administrative authority and – in general – eligible families apply simultaneously through the various possible channels. According to the Housing Survey conducted by the French Statistical Office in 2002, about 1.1 million households are waiting for public housing, whereas only about 400,000 such dwellings are made available each year. Hence, the waiting lists are very long and typically families have to wait for two or three years before a decision is made. Rents are considerably lower in public housing than in private-sector housing (−40% on average) which explains the high level of demand for public housing and the low level of turnover, especially in large cities (Le Blanc et al., 1999). Within this framework, HLM managers have a very limited set of dwellings to offer each year to HLM applicants and very limited control over the neighbourhoods where the supply of dwellings is located. Families have even less control over the specific location of the dwelling to which they are allocated. Given these facts, the sorting of families across HLM neighbourhoods is plausibly much more exogenous than across private sector neighbourhoods.
To test this assumption, the first two columns of Table 4 focus on children who have just moved into a neighbourhood and show the results of a regression of a dummy indicating whether they have been held back a grade on the proportion of children who have been held back a grade in the neighbourhood into which they move. The first regression focuses on non-HLM neighbourhoods and reveals a very significant correlation between the two variables (column 1). Families who choose (or who are constrained by housing prices) to live close to one another are similar with respect to some important individual determinants of performance at school. Further explorations of the data (not reported) reveal that this correlation is mostly due to the fact that families who move into a non-HLM neighbourhood and other families in this neighbourhood have similar levels of education and are likely to share the same nationality. When we add parental education and nationality as supplementary control variables, the effect found in the first column of Table 4 becomes very small and not significantly different from zero.13 The second column shows the results of the same regression but only for families moving into an HLM. Most interestingly, it shows that there is no correlation between the probability of being held back a grade for children moving into public housing and that for other children in the neighbourhood. The assignment of families across HLM neighbourhoods appears to be random with respect to children's educational performance.
Table 4. Endogenous Neighbourhood Membership in HLM and non-HLM Neighbourhoods
|Proportion other adolescents in the aire held back a grade at 15||0.15|
In theory, the composition of HLM neighbourhoods could be biased by selective out-migration, even if the initial assignment were perfectly random. If this assumption were true, however, the correlation between the outcomes of children who have been living in the same HLM neighbourhood for more than one year would be driven (at least in part) by the similarity of their family background. Columns 4 and 6 of Table 4 focus on HLM families who have been living in their neighbourhood for more than one year and show that the correlation between children's performance at school and that of their neighbours does not decrease significantly when we control for their family background. This result does not hold true in non-HLM neighbourhoods, where we observe a very significant decrease in the regression coefficients when we control for the same set of family characteristics (Columns 3 and 5).
The rate of migration out of HLM housing is actually very low, because of the very low level of rents. According to the French Housing Survey, families observed in an HLM neighbourhood in 2002 had already spent an average of 10 years in their current residence, whereas the non-HLM families had spent only 5 years on average.
The findings reported in Table 4 are consistent with the existence of significant neighbourhood effects in HLM neighbourhoods and with the assumption that the HLM population is not sorted across neighbourhoods according to factors affecting early performance at school. Under this assumption, the influence of social context can be evaluated in HLM neighbourhoods by standard OLS regressions. The first column of Table 5 focuses on HLM neighbourhoods and shows that an adolescent's advancement at school is negatively affected by the proportion of non-educated families in the neighbourhood. A one standard deviation increase in the proportion of non-educated neighbours generates a 6 percentage point increase in the probability of being held back a grade (12% of a SD). In contrast, children's performance at school does not seem to be affected by the proportion of non-French families living in the neighbourhood.14
Table 5. Exogenous Contextual Effects: a Reduced-form Evaluation Using Information on the Families Who Live in Public Housing
|Characteristics of the other families living in the aire|
|Proportion high-school dropouts||0.15|
|Proportion single-parent family||–||−0.07|
|Proportion large families (3 or more children)||–||−0.02|
|(Held back a grade at 15 = 1)||–||–||0.41|
The neighbourhoods with the highest proportion of non-educated families are also those with the highest proportion of single-parent families, the highest proportion of families with three or more children and also the highest proportion of unemployed adults. We have added these different neighbourhood characteristics as supplementary control variables in order to explore further the channels through which an adolescent's outcomes are influenced by the lack of education of other families in the neighbourhood. As it turns out, as shown by column 2 of Table 5, the proportion of single-parent families and the proportion of large families have no effect, whereas the proportion of unemployed adults living in the neighbourhood has a significant effect. Column 3 adds the adolescent's early outcome as a supplementary control variable in order to separate the effect of context on current outcomes from the effect on early outcomes. The proportion of unemployed adults still has a significant effect whereas the effect of the proportion of non-educated families becomes non-significant. One interpretation of this set of results is that the proportion of non-educated parents affects adolescents’ current outcomes mostly because it affects their early outcomes (plausibly through the endogenous channel) whereas the rate of unemployment in the neighbourhood also has a direct effect on current outcomes, maybe in part because it has a depressing effect on adolescents’ incentives to pursue education.
4. Extension: the Effect of Early Peers’ Dates of Birth Within the Year
The French Labour Force Survey provides us with information on adolescents’ outcomes only. To explore the influence of social interactions on French children further, we use a longitudinal survey recently conducted by the French Ministry of Education.15 This survey provides detailed information on the early school career of a representative sample of pupils who started primary school in 1997. The basic sampling unit is the school. Within each school, a class in the first grade is drawn at random and a random sample of one third of new entrants is surveyed. Their performance is followed up until third grade. For about 7,500 pupils, we have information on their gender, exact date of birth and social background. We know their performance in the national tests that took place at entry into third grade and their performance in specific tests that took place in September 1997 at entry into first grade. Finally, we know the code of the 1997 school. Hence, for each respondent, we can identify the characteristics of his/her early classmates, i.e. pupils who were in the same class at entry into first grade. On average, we observe six early peers per respondent. All in all, this dataset makes it possible to analyse the relationship between performance in tests at entry into third grade and the date of birth of first-grade peers, using exactly the same specification as in the second Section.
To begin with, we have checked that the observed individual characteristics of pupils are not correlated with the distribution of their peers’ date of birth. In particular, there is no correlation between a child's sex, family background or date of birth within the year, on the one hand, and the proportion of early peers born at the beginning (or at the end) of the year on the other (see Appendix Table A7). This result confirms that pupils are not sorted in any systematic way across first-grade classes according to their date of birth within the year, i.e., the distribution of early peers’ date of birth may be assumed exogenous to the respondent's own characteristics. Secondly, we have regressed pupils’ scores in the tests conducted at entry into third grade on the distribution of their early peers’ dates of birth, using the same specifications and control variables as in the LFS analysis (Table 6). The regressions have been performed for the global score and also separately for the scores obtained in mathematics and French. Most interestingly, this reduced-form analysis confirms that individual scores at entry into third grade decrease significantly with the proportion of first-grade peers who were born at the end of the year. A child's score at entry into third grade is 3 points lower (i.e., 20% of a SD, where one SD = 15 points) when his/her early peers were born at the end of the year rather than during the first months of the year. The effects are stronger and better estimated in mathematics than in French.
Table A7. Pupils’ Characteristics and First Grade Peers’ Dates of Birth
|Father college grad.||41.7 (0.5)||49.9 (0.5)|
|Father not college grad.||41.2 (0.2)||50.5 (0.5)|
|Born January–May||41.5 (0.3)||50.1 (0.3)|
|Born June–November||41.1 (0.3)||50.6 (0.3)|
|Boy||41.2 (0.3)||50.6 (0.3)|
|Girl||41.3 (0.3)||50.2 (0.3)|
|French||41.2 (0.2)||50.4 (0.2)|
|Non-French||43.7 (1.5)||48.8 (1.5)|
Table 6. The Effect of the Distribution of First Grade Peers’ Dates of Birth on a Pupil’s Performance at the Entry into the Third Grade
|Date of birth of 1st Grade peers|
One potential problem with this analysis is that our measure of the proportion of peers born at the beginning (or at the end) of the year is affected by sampling errors. The OLS estimates are affected by attenuation bias which increases with the variance of these errors. Given that early peers are randomly drawn among all the pupils in the class, the observed proportion of peers born at the beginning (or at the end) of the year is a consistent estimate of the true proportion but its variance decreases with the number of peers actually observed in the survey, n.16 Hence, to evaluate the importance of the attenuation bias linked to sampling errors, we have replicated the previous analysis excluding the 15% of observations with the lowest n (i.e., keeping only observations with n > 4; see columns 4, 5 and 6 of Table 6). Comfortingly, the effects are stronger and better estimated for this subsample.17 A child's score at entry into the third grade is about 26% of a SD smaller when his/her first-grade peers were born at the end of the year rather than during the first months of the year. The estimated effect remains larger in mathematics (33% of a SD) than in French (20%).
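The link between the number of observed peers and the attenuation bias can be written out with the textbook formulas for sampling without replacement and for classical errors in variables. The notation below is assumed for illustration and is not taken from the article: N is the number of pupils in the class, n the number of sampled peers, s = n/N the within-class sampling rate, p the true proportion of peers born at the end of the year, and b the true reduced-form coefficient.

```latex
% Assumed notation (not the article's): N, n, s = n/N, p, \hat{p}, b.
\begin{align*}
\hat{p} &= p + e, \qquad \mathrm{E}(e \mid p) = 0, \\
\operatorname{Var}(e \mid p) &= \frac{p(1-p)}{n}\cdot\frac{N-n}{N-1}
  \;\approx\; \frac{p(1-p)\,(1-s)}{n}, \\
\operatorname{plim}\,\hat{b}_{\mathrm{OLS}} &= b\,
  \frac{\operatorname{Var}(p)}{\operatorname{Var}(p)
  + \mathrm{E}\!\left[\operatorname{Var}(e \mid p)\right]} .
\end{align*}
```

For a given sampling rate, the error variance falls in proportion to 1/n, so the ratio in the last line moves towards one as n grows. This is why dropping the observations with the fewest observed peers would be expected to raise the OLS estimates if sampling error is the source of the attenuation.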
Test scores at entry into third grade represent a measure of the quality of the first two years at school. Assuming that a child's performance at school is affected by the date of birth within the year of his/her early peers only insofar as their date of birth within the year affects the quality of their own early schooling, we can use the distribution of peers’ date of birth as an instrument to identify the true effect of peers’ early school performance on a child's performance. Table 7 shows the OLS and IV regressions of a child's score at entry into third grade on the average score at entry into third grade of the 1st grade peers. The IV estimates are significant and large: a one SD increase in the average score of early peers increases a pupil's score by about 36% of a SD. Overidentification tests do not reject our identifying assumption. The IV estimate is not significantly different from the OLS estimate, however.
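The estimation steps described above (two-stage least squares with more instruments than endogenous regressors, followed by an overidentification check) can be sketched as follows. This is a minimal illustration on simulated data with invented coefficients, not a replication of Table 7: the helper names are hypothetical, the two instruments are independent normal draws rather than proportions of peers born early or late in the year, and the overidentification statistic is the standard Sargan form n × R², which may differ from the test used in the article.

```python
import numpy as np

def add_const(a):
    """Prepend a column of ones to a 1-D or 2-D array of regressors."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    return np.column_stack([np.ones(len(a)), a])

def ols(X, y):
    """OLS coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def tsls_with_sargan(y, x, Z):
    """2SLS slope of y on x with instruments Z, plus the Sargan statistic n * R^2."""
    Zc = add_const(Z)
    x_hat = Zc @ ols(Zc, x)                    # first stage: fitted peer score
    coef = ols(add_const(x_hat), y)            # second stage
    resid = y - add_const(x) @ coef            # residuals use the observed x
    fitted = Zc @ ols(Zc, resid)               # regress residuals on instruments
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    return float(coef[1]), float(len(y) * r2)

rng = np.random.default_rng(1)
n = 100_000
Z = rng.normal(size=(n, 2))                    # two instruments (assumed valid)
u = rng.normal(size=n)                         # unobserved common factor
x = 0.5 * Z[:, 0] - 0.5 * Z[:, 1] + u + rng.normal(size=n)   # peers' average score
y = 0.36 * x + u + rng.normal(size=n)          # own score; 0.36 is an assumed value

beta_ols = float(ols(add_const(x), y)[1])
beta_iv, sargan = tsls_with_sargan(y, x, Z)
```

With two instruments and one endogenous regressor, the Sargan statistic is compared with a chi-squared distribution with one degree of freedom. Note that OLS is biased upwards by construction in this simulation, which is a feature of the assumed data-generating process and not of the article's estimates.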
Table 7. The Endogenous Effect on Test Scores at Entry into Third Grade: an Evaluation Using the Distribution of Dates of Birth of First Grade Peers as an Instrumental Variable
|Average Score of 1st grade peers at entry into 3rd grade||–||0.27|
|Dates of birth of 1st grade peers:|
|H0 = Peers’ average score exogenous|| ||–||−0.23 (0.11)|
|H0 = Instruments Valid|| ||–||0.18 (0.83)|
The first Sections of this article show that an adolescent's outcomes at the end of junior high-school are strongly affected by the educational advancement of the other adolescents living in the same neighbourhood. This reflects interactions that mostly take place outside the classroom, since it is unlikely that two adolescents attend the same class within the same school, even when they are close neighbours.18 The data from the Ministry of Education suggest that early interactions within French primary schools have no less influence on a child's educational career than interactions between adolescents in the same neighbourhood. The influence of close neighbours on own educational outcomes seems significant from the beginning to the end of compulsory education, both inside and outside of the classroom.
Building on the specifics of French institutions, we analyse the influence of close neighbours’ characteristics on an adolescent's performance at school. Our first strategy builds on the fact that the date of birth within the year, as such, has a significant effect on early educational outcomes. We use the distribution of close neighbours’ month of birth as an instrumental variable to identify the influence of neighbour's early outcomes on an adolescent's educational advancement at the end of junior high-school. This approach suggests that the probability of repeating a grade at the end of junior high-school increases strongly when the other adolescents living in the same neighbourhood have already been held back a grade rather than when they have not. A second strategy uses the fact that the distribution of families across public housing (HLM) is not significantly different from quasi-random assignment. In such a context, the influence of close neighbours’ families can plausibly be identified through standard regressions. This strategy shows that an adolescent's educational advancement is negatively influenced by the proportion of non-educated families living in the neighbourhood.
Our article focuses on the influence of close neighbours on performance at school. Further research is needed, however, to explore the effects of close neighbours on other outcomes, such as the decision to drop out of school or the decision to participate in the labour market. We speculate that neighbourhood effects for such decisions are even stronger than the neighbourhood effects on school performance stricto sensu. Put differently, we speculate that close neighbours have more influence on own preferences than on own resources. Further research is also needed to explore the channels through which children living in the same neighbourhood influence each other. This is obviously a key issue for defining public policies. In particular, it would be useful to better identify the contribution of social interaction during extra-curricular activities. Generally speaking, similar evaluations need to be performed in other countries to explore whether (and why) the role of social interaction varies across societies.
There is a related literature which studies interactions among close neighbours, although the focus is not on educational attainment (Ioannides, 2002, 2003; Ioannides and Zabel, 2003; Case and Katz, 1991).
The Panel Study of Income Dynamics (PSID) has a similar cluster design. The PSID sample is much smaller than the LFS, however, and it is not possible to focus on neighbours of a similar age. Solon et al. (2000) use the PSID to analyse the correlation in educational outcomes between a small sample of adults (N = 687, aged 25–33) who were living in the same neighbourhood at some point in the past, when they were children.
Put differently, 15-year-old respondents who are in the ninth grade are ‘normal-age’, i.e., the expected age for entry to a given grade without repeating or skipping any grades up to that time.
For example, the proportion of adolescents in the neighbourhood born at the beginning (or at the end) of the year is constructed without using the date of birth of the respondent. This only uses the date of birth of other adolescents living in the aire.
There is a literature on the influence of date of birth within the year (called the relative age effect) going back a few decades (Barnsley et al., 1985; Allen and Barnsley, 1993). It shows that relative age has an impact on achievement in competitive activities (hockey, soccer), on achievement at school, on emotional development and even on the probability of committing suicide.
The initial condition of the recursive definition of yk (k = 7, …, 16) is obtained by setting θ6 and α6 equal to zero.
Notice that if θ15 were known, the identification of the endogenous effect β15 would only require an assumption that own date of birth, as such, does not affect neighbours’ late transitions. To check the robustness of our results, it is possible to estimate β15 conditional on various values of θ15 (varying from 0 to 1) using neighbours’ date of birth as the only instrumental variable. This strategy provides us with an upper and a lower bound for the parameter of interest β15 using a weaker exclusion restriction.
According to a survey conducted in 2003 by the French Statistical Office (i.e., the survey ‘Formation et Qualification Professionnelle’), about 20% of French individuals born in 1976–85 have repeated a grade in primary school, 17% at the beginning of junior high-school and 14% at the end of junior high-school.
For a presentation of the overidentification and endogeneity tests used in this article, see for example Wooldridge (2002, pp. 118–24).
It should be emphasised that the correlation between own relative age and own late grade repetition is the combination of a potentially positive direct effect (relatively old children are more mature and perform better) and a potentially negative indirect effect (relatively old children are more likely to have already repeated a grade and, because of that, less exposed to further grade repetition). Hence, under the assumption that the direct effect is zero, the correlation between relative age and late grade repetition is not necessarily zero but cannot be negative. Interestingly, the reduced-form effect of own date of birth in Table 2 column 1 shows that this correlation is actually positive, which means that our identifying assumption is not rejected.
Interestingly, comparing experimental and non-experimental estimates, Kling et al. (forthcoming) do not find evidence of upward bias from non-random sorting of households across neighbourhoods, as would occur under the assumption that persons with good unobservables also have good outcomes and live in good neighbourhoods.
If θ15 were assumed equal to (say) 0.5 then the dependent variable would be the mean of the two previous ones. We have checked that when we use this specification we obtain an evaluation of the endogenous effect which lies in between those obtained by setting θ15 = 0 or θ15 = 1.
It should be emphasised, however, that the similarity of parents’ education and nationality is not sufficient for explaining the correlation between the educational outcomes of adolescents who have been living in the same non-HLM neighbourhood for more than one year. As shown below in column 5 of Table 4, the correlation between the performance of adolescents who have been living in the same neighbourhood for more than one year remains significant and large even after controlling for the two main sources of endogenous neighbourhood membership, i.e., parental education and nationality. A significant part of the observed correlation between the performance of children and the performance of their neighbours is due to endogenous neighbourhood membership, but a significant part is not explained by this phenomenon and is consistent with the existence of significant neighbourhood effects.
Oreopoulos (2003) examines the labour market outcomes of adults who were assigned to different public housing projects in Toronto (when children). He does not find a very significant long-run effect of having been assigned to relatively poor neighbourhoods. In his paper, neighbourhoods correspond to census tracts and contain about 1,000 to 3,000 households. It is one potential explanation for the difference between his findings and ours. In the early 1980s, the French Statistical Office conducted a very interesting survey on interactions between neighbours which shows that French households interact on average with 2 or 3 very close neighbours only (Héran, 1986).
Piketty (2004) has used this dataset to explore the effect of class size on early school performance in France.
To be more specific, the variance of the errors is proportional to , where is the sampling rate used within classes.
We have checked that the estimated effects do not increase when we further restrict the sample.
To begin with, there is some flexibility in the choice of the junior high-school (called collège). As a consequence, two neighbouring adolescents do not necessarily attend the same collège: about 20% of adolescents attend a private collège and another 20% do not attend the nearest public collège. Given this fact, one can estimate that only about one third of neighbouring adolescents actually attend the same collège. And even if they do, the probability remains weak that they will be found in the same class. According to the Ministry of Education, a typical collège has on average 10 classes for the eighth and ninth grades. Hence, the probability of finding in the same class two 15-year-olds attending the same collège is not more than about 10%.
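One way to arrive at figures of this order is the back-of-the-envelope calculation below. It rests on simplifying assumptions that are not stated in the text: the two neighbours choose their collège independently, and they share a collège only when both attend the nearest public one. The variable names are hypothetical, and the combined probability in the last line is an illustrative implication rather than a figure reported in the article.

```python
# Shares taken from the text above.
share_private = 0.20          # attend a private collège
share_other_public = 0.20     # attend a public collège other than the nearest
share_nearest_public = 1 - share_private - share_other_public   # about 0.60

# Assumption: independent choices, and neighbours only meet in the nearest
# public collège. This gives roughly one third.
p_same_college = share_nearest_public ** 2

# Upper bound from the text: about 10% chance of sharing a class,
# conditional on attending the same collège.
p_same_class_given_same_college = 0.10

# Illustrative implication: chance that two neighbours share a class.
p_same_class = p_same_college * p_same_class_given_same_college
```

Under these assumptions the probability that two neighbouring 15-year-olds sit in the same class is below 4%.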
This Appendix expresses the parameters of (2) and (3) as a function of the parameters of (1). To begin with, after averaging, (1) yields,
which can be rewritten,
Using (A2), (1) implies,
which can be rewritten,