Sibling Spillover Effects in School Achievement

We provide the first empirical evidence on direct sibling spillover effects in school achievement using English administrative data. Our identification strategy exploits the variation in school test scores across three subjects observed at age 11 and 16 and the variation in the composition of school mates between siblings. These two sources of variation have been separately used to identify school peer effects, but never in combination. By combining them we are able to identify a sibling spillover effect that is net of unobserved child, family and school characteristics shared by siblings. We find a modest spillover effect from the older sibling to the younger but not vice versa. This effect is considerably higher for siblings from deprived backgrounds, where sibling sharing of school knowledge might compensate for the lack of parental information.


NON-TECHNICAL SUMMARY
In this paper we estimate how much a younger sibling's school achievement is affected by his/her older sibling's achievement at school ("sibling spillover effect"). This is an important question to answer as it helps us understand whether investments in children may have multiplier effects through their impact on younger children. We are the first to investigate this issue.
The older sibling's achievement may have a direct effect on the younger sibling's school grades if 1) the older sibling teaches the younger sibling or helps with homework; 2) the younger sibling imitates the older sibling, for example in their work style, or conversely tries to be different, for example to avoid competition; 3) the older sibling passes on important information about educational choices or school and teachers to the younger sibling.
When trying to assess the extent of any sibling spillover effects we need to be careful to distinguish the direct influence of the older sibling on the younger from any similarities in their exam grades that arise because they come from the same family and are likely to go to the same school. This paper does this by combining several techniques known to economists.
Our study shows that there is a small direct effect from the older sibling's test scores to the younger sibling's exam marks. More precisely, for each GCSE exam grade improvement of the older sibling (for example, from a B to an A), the younger sibling's exam marks would go up by just 4% of a grade. This effect is about equivalent to the impact of increasing yearly spending per pupil in the younger sibling's school by £670 (see Nicoletti and Rabe 2012).

Introduction
In this paper we study the extent to which school achievements of an older sibling directly improve the school outcomes of their younger sibling. Assessing the magnitude of sibling spillover effects is important to understand whether interactions between siblings are a relevant mechanism through which intergenerational transmission of disadvantage operates. It also helps us to understand whether sibling interactions are a mechanism through which the effect of investments in children may be amplified by the so-called social multiplier effect (see Manski 1993 and Glaeser et al. 2003). A large positive spillover effect would suggest that there are externalities of parental and public investments into children through their positive effects on both the children and their siblings.
While the economic literature recognizes the important role of parent-child interactions for child development, 1 the role of sibling interactions is yet to be clearly established. Previous economic papers concentrating on siblings have mainly focused on the intrafamily allocation of resources (Becker 1981), where parental investments into children's human capital depend on parental preferences regarding inequality between children, 2 on birth order and the number and gender composition of siblings. 3 It is only very recently that researchers have begun to look at the effect of interactions between siblings on educational outcomes (see Oettinger 2000; Qureshi 2011; Joensen and Nielsen 2013; Adermon 2013).
We provide empirical evidence on the extent to which cognitive abilities of a child are transmitted to his/her younger sibling. More precisely we estimate the direct sibling spillover effect of a child's school test scores at age 16 on her younger sibling's test scores at the same age. We are interested in the direct causal effect of a child on his/her younger sibling rather than the indirect effect that is mediated by common background characteristics or by intra-household allocation of investments between siblings (see Behrman et al. 1982, Solon et al. 2000, Björklund and Salvanes 2011). Such a direct link might exist for several reasons. For example, there may be productivity spillovers from the older to the younger sibling through help with homework and joint leisure. Another mechanism may be imitation or differentiation, which happens because a sibling gains utility from behaving similarly or opposite to their sibling (Joensen and Nielsen 2013). Here a sibling may be a role model for academic behavior and aspirations, but factors such as wanting to avoid competition may also lead to opposite behavior and/or specialization in different areas. Finally, a further mechanism is information sharing. A sibling may share information about the costs and benefits of educational choices and of exerting effort, as well as insider knowledge pertaining to a school or specific teachers.
Information sharing has been identified as an important driver of the sibling spillover effect by Dahl et al. (2013) and Joensen and Nielsen (2013), who look at sibling spillover effects on the decision to take paternity leave and on school subject choices respectively.
For school achievement, information sharing between siblings is likely to be more important in families where parental information on educational choices and school/teacher-specific knowledge is scarce and perceptions of costs and benefits are incomplete. We assume that this lack of parental information is a more serious problem in disadvantaged families and we investigate this in subgroup analysis.
Apart from providing new insights about the size of the sibling spillover effect and its potential role in promoting social mobility, this paper presents an estimation approach that allows us to improve on the previous literature in the field and to identify the spillover of school attainment between siblings while minimizing biases due to omitted variables, in particular those related to parental investments. More precisely we adopt an identification strategy that can be viewed as a combination of two different methods previously adopted to identify school and university peer effects. The first method exploits the variation of school test scores across subjects (see Lavy et al. 2012), while the second exploits the fact that siblings and school mates are two peer groups that are not perfectly overlapping (see Bramoullé et al. 2009; De Giorgi et al. 2010).
Simply regressing a child's test score on the older sibling's corresponding test score would not produce a consistent estimate of the sibling spillover effect because the estimated sibling association would be in part explained by similarities in inherited abilities, in school and family investments and characteristics, and in the environment they are exposed to.
To clean the sibling association in test scores of these confounding factors, we use school register data in England which provide information on test scores at the end of compulsory schooling, at about age 16, in Mathematics, English and Science for the full population of students in state schools.
We regress a child's test score on her older sibling's test score using within-pupil between-subject estimation, i.e. estimating student fixed effects across subjects. 4 The two main gains of this fixed effect estimation are that it allows us to (i) control for the younger child's unobservable average ability and other characteristics across the three subjects that might confound the spillover effect because they are similar to the corresponding characteristics of his/her sibling, (ii) clean the sibling spillover effect of the impact of the allocation of resources by schools and parents between siblings that do not vary across subjects. Differential parental investments can be related to factors such as sibling differences in abilities, number of siblings in the family or sibling sex composition, for example. Controlling for these differences is especially important in studies of child cognitive abilities because it has been found that parents invest differentially in two siblings in an attempt to either compensate for or reinforce differences in their abilities (see Behrman et al. 1982 and [...]).
Further we consider school and family factors that vary by subject. If siblings receive similar subject-specific investments in schools, for example from good Mathematics teachers, then the sibling spillover effect could be overestimated. This is likely to happen because siblings tend to go to the same school or to sort into similar schools. To take account of such subject-specific school characteristics we rely on school-by-cohort-by-subject fixed effects.
Finally, we need to take account of subject-specific skills acquired from parents through family investments and/or inheritance that are shared by siblings. By conditioning on the younger sibling's past subject-specific test scores, at age 11, we take account of subject-specific skills that are transmitted from parents to their children in the period that goes from birth to age 11. This is still not enough to identify a causal sibling spillover effect because there could be an intergenerational transmission of subject-specific skills between age 11 and 16, which is probably similar between siblings.
To tackle this last issue, we instrument the older sibling's test scores at age 16 using the average test scores of her school mates. 5 This peer identification strategy is similar to that adopted by Lee (2003), Bramoullé et al. (2009) and De Giorgi et al. (2010) and is based on the presence of some intransitivity in the network of peers. Intransitivity occurs if a person interacts with her peers but not with all of the peers of her peers. In our application we have intransitivity because we assume that the older sibling's school mates do not interact directly with the younger sibling. This implies that, while the older sibling's test scores can be affected directly by her school mates' results, there is no effect from the older sibling's school mates on the younger sibling if not indirectly through the older sibling. We test this assumption by performing a number of sensitivity checks on the data, for example by excluding school mates of the older sibling who live in the same neighbourhood and might therefore interact directly with the younger sibling. Note that we are not interested in estimating a causal school peer effect for the older sibling, as the validity of the instrument does not rely on this. In fact, the effect of the older sibling's school mates on the older sibling can be the consequence of a causal school peer effect but also of factors such as unobserved contextual variables, e.g. the quality of the teachers in the school-cohort of the older sibling, or of school composition effects.
Except for the use of instrumental variable estimation to identify the effect of school peers on children's abilities, our strategy is similar to that used by Lavy et al. (2012). The main difference is that, because we are estimating sibling rather than school peer effects, our identification is more challenging: it must account for confounding not only from similar school characteristics of the two siblings but also from intergenerational transmission, from differential parental investments that compensate for or reinforce differences in siblings' abilities, and from subject-specific parental investments. In contrast to Lavy et al. (2012), however, we are less concerned about the reflection problem.
Empirical researchers estimating a causal effect between individuals' outcomes are usually concerned with the reflection problem (see Manski 1993), i.e. simultaneity of the individuals' behaviour and potential reverse causality. We actually cannot exclude the existence of spillover effects going from the younger to the older sibling, 6 but in our application the younger sibling's age 16 exam is in the future with respect to the corresponding older sibling's exam at age 16. Therefore reverse causality seems unlikely. Nevertheless, if this were not enough to correct for the potential reflection issue, our instrumental variable estimation would be able to control for it because the average test scores of the older sibling's school peers cannot be affected by the younger sibling and therefore there cannot be any reverse causality.
[...] sibling's school mates. The variation in the instrument is caused by idiosyncratic changes in the average subject-specific test score across schools and/or across cohorts.
6 Ewin Smith (1990) and (1993) suggests that the cognitive abilities of older children might improve thanks to teaching younger siblings.
We find that an increase of a standard deviation in a child's test score at age 16 leads to a small increase in the corresponding test score observed for his/her younger sibling of about 2% of a standard deviation. This means that for each exam grade improvement of the older sibling (for example, from a B to an A) the younger sibling's exam marks increase by 4% of a grade on average, which is equivalent to the impact of increasing yearly per pupil school expenditure in the younger sibling's school by £670 (see Nicoletti and Rabe 2012).
Interestingly, if the two siblings attended the same school, the spillover effect almost doubles (to about 4%). This seems to suggest that there is a more effective transmission of abilities between siblings attending the same school, possibly because the older sibling has direct information relevant for succeeding in the shared school. Another striking result is that the spillover effect is significantly larger for children from disadvantaged than for children from more affluent families, and this result holds across different ways of measuring disadvantage.
This suggests that sibling sharing of school knowledge may compensate for the potential lack of parental information in disadvantaged households and that there are externalities from investing in learning of disadvantaged children which have so far been overlooked.
The remainder of this paper unfolds as follows. The next section discusses the related literature. Section 3 lays out our identification strategy and Section 4 introduces our data set. Section 5 presents our empirical results including heterogeneity analysis and robustness checks, and Section 6 concludes.

Related literature
Over many years, social scientists have used sibling correlations in socio-economic and educational outcomes to measure the importance of family background, where any sibling resemblance indicates that family background matters. Since Solon et al. (2000) introduced the variance decomposition approach to put bounds on the possible magnitude of family and neighborhood effects, using correlations between siblings and between unrelated neighbors, a large number of empirical papers have analyzed sibling correlations in different outcomes (see Raaum et al. 2006; Mazumder 2008; Björklund et al. 2002; Björklund et al. 2009; Björklund et al. 2010; Lindahl 2011; Björklund and Salvanes 2011; Nicoletti and Rabe 2013).
A method to estimate the part of the sibling correlation that is related to intergenerational transmission has been proposed by Corcoran et al. (1990) (see also Solon 1999), who show that under some assumptions the sibling correlation in a specific outcome (e.g. earnings and income) can be decomposed into the sum of the square of the intergenerational elasticity in the corresponding outcome and of a residual sibling correlation. Studies that have applied this decomposition have generally found that the part of sibling correlation related to intergenerational elasticity is quite small and well below 50%. For example, Björklund and Jäntti (2012) found that the share of sibling correlation in IQ (schooling years, earnings) related to the intergenerational elasticity is about 17% (48%, 9%). Bingley and Cappellari (2014) revisited this decomposition in the case of sibling correlation in earnings and showed that, allowing permanent earnings to change across the life-cycle and the intergenerational mobility in earnings to vary across families, the transmission of the father's earnings accounts for about 80% of the sibling correlation. However, only more recently have economists tried to identify the part of the sibling correlation or association which is explained by sibling interactions, i.e. the causal sibling spillover effect.
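The decomposition described above can be written compactly. The symbols are labels chosen here for illustration and are not taken from the cited papers: $\rho_{sib}$ is the sibling correlation in an outcome, $\beta_{IG}$ the intergenerational elasticity in the same outcome, and $\rho_{res}$ the residual sibling correlation.
$$\rho_{sib} = \beta_{IG}^{2} + \rho_{res}, \qquad \text{share related to intergenerational transmission} = \frac{\beta_{IG}^{2}}{\rho_{sib}}$$
Read this way, a share of about 17% for IQ would mean that the squared elasticity makes up roughly one sixth of the sibling correlation, with the remainder left to other shared factors and, possibly, sibling interactions.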
In Table 1 we summarize the results of previous papers on sibling spillover effects that have identified a causal spillover effect. In all papers, the reflection problem is dealt with by using instrumental variables that explain the outcome of one sibling but not the other. 7 These papers study a wide range of outcomes, including high school graduation (Oettinger 2000), years of schooling (Qureshi 2011; Adermon 2013), school subject choices (Joensen and Nielsen 2013), teenage motherhood (Monstad et al. 2011), and paternity leave take-up (Dahl et al. 2013). We focus our discussion on the papers looking at educational outcomes.
7 Another two papers looking at spillover effects of siblings, which are interesting even if they do not use instrumental variables for their estimation, are Kuziemko (2006) and Altonji et al. (2013). Both papers use panel data and estimate dynamic models to identify sibling spillover effects on fertility and teenage substance use, respectively.
Oettinger (2000) uses the US National Longitudinal Survey of Youth 1979 (NLSY79) to estimate the sibling spillover effect in the probability of a child graduating from high school by age 19 and uses as instrumental variables background characteristics that are sibling specific, such as whether the family was intact during childhood and the unemployment rate at age 18. He does not find any statistically significant spillover effect from the younger to the older sibling but finds some significant effects from the older to the younger sibling (see Table 1). Unfortunately, it is difficult to believe that the instrumental variables used by Oettinger (2000) are uncorrelated with unobserved family and neighbourhood characteristics that can explain both siblings' outcomes.
Qureshi (2011) focuses on years of schooling and estimates a spillover effect going from the oldest sister to younger brothers in rural Pakistan. Exploiting the fact that there is a strong gender segregation of schools in Pakistan, she instruments the oldest sister's years of schooling using the distance to the closest girls' school. Even if the school distance is usually not random for children living in developed countries, this instrumental variable seems plausible in the context of rural Pakistan. Qureshi (2011) finds that a one year increase in the schooling of the oldest sister leads to almost half a year increase of schooling for her younger brothers and to large and statistically significant effects on her younger sibling's literacy and school enrollment. These large effects are in part explained by the fact that, in developing countries, child care is not exclusively a parental responsibility and older sisters often have caring responsibilities for their younger siblings.
Adermon (2013) and Joensen and Nielsen (2013) both use the introduction of policy reforms that changed the conditional probability (cost) of a given outcome for a random portion of siblings to identify sibling spillover effects. Joensen and Nielsen (2013) look at a pilot school reform implemented in Denmark which reduced the cost for students of choosing advanced Mathematics and Science courses because of the introduction of a more flexible choice set for subject combinations. They are interested in the effect of choosing advanced Mathematics and Science courses on the younger sibling's probability of doing the same. The reform was adopted only by some schools and there do not seem to be any systematic differences between schools that introduced this reform and schools that did not. To avoid any potential bias caused by an endogenous selection of students into schools that implemented the reform, the authors consider only the first year of the implementation, i.e. they consider only children who could not anticipate the reform at the time of school enrollment.
The probability of choosing advanced Mathematics and Science increases by about 33.4 percentage points for children whose older sibling chose these advanced subjects and this spillover effect is statistically significant at the 10% level.
Adermon (2013) exploits the increase in the minimum school leaving age in Sweden. An increase of two years in the school leaving age was introduced at different times in different municipalities. The timing of the implementation by different municipalities is not completely random, but Holmlund (2008) suggests that it is exogenous after controlling for birth cohort and municipality fixed effects and municipality-specific trends. Using the school reform dummy for the older sibling as instrument and controlling for municipality fixed effects and trends, Adermon (2013) does not find any significant sibling spillover effect on years of schooling.
There is a concern that papers relying on policy reforms such as Adermon (2013) might estimate both the "direct" effect of sibling interactions and the "indirect" effect mediated by parental allocation of resources between siblings (see also Monstad et al. 2011 on teenage motherhood). This can happen if there are unobserved parental investments and the instrumental variables considered are not independent of these investments. For example, in the case of a school reform the sibling spillover effect might capture the effect caused by the parental reallocation of resources between siblings in reaction to the introduction of that reform. Reforms raising the school leaving age as used by Adermon (2013) were implemented over long time-periods, and parents might motivate the older sibling not affected by the reform to stay in school for longer and/or discourage the younger sibling from staying on after compulsory schooling ends in a bid to equalise between siblings. In this case, the sibling spillover effect would be underestimated. This paper is free of this problem because our identification is based on an instrumental variable that exploits group membership and idiosyncratic changes in the average subject-specific test score across cohorts rather than a reform that may allow parents to reallocate resources between siblings to attenuate the potential sibling differences caused by the reform. 8 Moreover, our identification strategy is not based on instrumental variables that are context specific, such as those in Joensen and Nielsen (2013) and Qureshi (2011).
While several of the previous papers on education and other outcomes referenced in Table 1 point to a positive spillover effect from the older to the younger sibling, there is not a lot of evidence on the possible channels through which these effects operate. The most thorough investigation of possible channels is presented by Dahl et al. (2013) in their study of spillover effects of taking up paternity leave on brothers and co-workers. The authors distinguish the strength of ties between peers, measured by the duration, intensity, and frequency of social interactions, and find that stronger ties are associated with higher spillover effects. In the sibling context we might expect interactions to be more intense and frequent between same-sex siblings, closely spaced siblings and sibling pairs with a smaller number of children in the family, and we investigate these factors in heterogeneity analysis. Moreover, the authors investigate the importance of information sharing as a channel for the spillover effect. They show that the spillover effect is higher for groups that benefit more from the information that is being transmitted in the interaction. For example, information about how employers react to workers taking paternity leave should be more valuable in firms where job security is low. Consistent with this hypothesis the authors find peer effects that are twice as large in low unionization than in high unionization workplaces. In our application we hypothesize that information on costs and benefits of educational choices will be more valuable to children from lower socio-economic status families where such information is not freely available from parents than for families where this information is a public good. We show that, in line with this hypothesis, the spillover effect is larger for siblings from disadvantaged families than for other siblings.
Moreover, school-specific information seems to be of particular importance, as the spillover effect between siblings going to the same school is considerably higher than between siblings going to different schools.
Other strands of the economic literature closely related to our paper on sibling spillover effects in test scores are the literature on educational production and child development (see Todd and Wolpin 2003; Cunha and Heckman 2007; Cunha and Heckman 2008; Hanushek and Woessmann 2011) and on school peer effects (see Sacerdote 2011 for a review). These research strands have provided a theoretical framework to model the production of children's cognitive abilities taking account of family and school inputs and of the possible school peer effects, but they have not focused on the potential effect of interactions between siblings.
In this paper, we extend the recent work on education production models and school peer effects by Nicoletti and Rabe (2012) and Lavy et al. (2012), who both use school register data for England as in our application, and we provide detail on how to identify the causal sibling spillover effect in school test scores at age 16, i.e. the effect of sibling interactions on child development during adolescence.
The only other papers we are aware of that focus on the direct effect of sibling interactions on child development belong to the psychology literature and usually focus on early child development. Cicirelli (1972) and Dunn (1983) provide evidence that young children are effective teachers for their younger siblings. Gregory and Williams (2001) emphasize the importance of older siblings in transmitting school values to their younger siblings, especially in immigrant households where parents have difficulty speaking the language used at school. Azmitia and Hesser (1993) compare sibling and peer influence on child cognitive development and find that older siblings are more effective in teaching their younger siblings than unrelated children of the same age.

Identification strategy
To identify the sibling spillover effect on test scores at the end of compulsory schooling (at about age 16) we consider the following value added model: 9
$$Y_{1,isqt,16} = \alpha + \rho\, Y_{1,isqt,11} + \gamma\, Y_{2,is'qt',16} + \beta_{1,F}\, I^{F}_{1,it} + \beta_{1,S}\, I^{S}_{1,ist} + X_{1,i}\,\beta_{1,X} + \mu_{sqt} + \mu_{1,i} + e_{1,isqt,16} \qquad (1)$$
where $Y_{1,isqt,16}$ is the age 16 test score of the younger child of the sibling-pair $i$ in school $s$ and subject $q$, who belongs to the cohort $t$; 10 $Y_{1,isqt,11}$ is the corresponding test score at age 11; $Y_{2,is'qt',16}$ is the test score at age 16 of the older sibling, who might have attended a different school $s'$ and belongs to a different cohort $t'$; 11 $I^{F}_{1,it}$ is the family investment in the younger child of the sibling-pair $i$ between age 11 and 16; $I^{S}_{1,ist}$ is the corresponding school investment that is not subject specific; $X_{1,i}$ is a row vector of other child, household and school characteristics, which are not direct investments in a child's cognitive skills but may affect them; $\mu_{sqt}$ are unobserved investments that vary by school, cohort and subject; $\mu_{1,i}$ is the younger child's unobservable ability; and $e_{1,isqt,16}$ is an error term which is assumed to be identically and independently distributed with mean zero and homoscedastic. In this model $\rho$ measures the persistence in test scores between age 11 and 16; $\gamma$ is our main parameter of interest, which measures the spillover effect from the older sibling to the younger; $\beta_{1,F}$ and $\beta_{1,S}$ are the productivity of family and school investments; $\beta_{1,X}$ is a column vector with the effects of the remaining explanatory variables $X_{1,i}$; and $\alpha$ is the intercept. We observe for each sibling-pair their test scores in Mathematics, English and Science so that $q$ takes value 1 for Mathematics, 2 for English and 3 for Science.
9 See Todd and Wolpin (2003) for a definition.
10 Two students belong to the same school cohort if they began school in the same year.
11 We do not consider twins or siblings whose age gap is such that they begin school in the same year.
Identifying the causal spillover effect in test scores from the older to the younger sibling, γ, is challenging because of two main issues: (i) unobserved correlated effects, i.e unobserved common characteristics of two siblings that may explain their similar test scores and (ii) the reflection problem.
We control for unobserved child specific endowments and characteristics that do not vary across subjects but that could be similar between siblings by transforming model (1) into deviations from the mean across subjects, i.e. we transform the dependent variable into $DevY_{1,isq,16} = Y_{1,isq,16} - \frac{1}{3}\sum_{k=1}^{3} Y_{1,isk,16}$ [...] school-by-cohort-by-subject fixed effects that control for $\mu_{sqt}$, i.e. for unobserved subject-specific school investments and characteristics for the cohort $t$. In our sample a high percentage of siblings, 83.5%, attend the same secondary school, but even if two siblings attend two different schools they might sort into schools with similar characteristics, e.g. similar quality of teachers in Mathematics or peers with similar subject-specific abilities. Controlling for school-by-cohort-by-subject fixed effects allows us to clean the sibling spillover effect from the confounding effect of such school similarities between siblings.
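As a rough illustration of the within-pupil between-subject transformation described above, the following Python sketch subtracts each pupil's mean score across subjects from that pupil's subject scores. The toy records, the pupil identifiers and the function name `demean_within_pupil` are hypothetical; they are not drawn from the National Pupil Database or from the authors' code.

```python
from collections import defaultdict

# Hypothetical toy records: (pupil_id, subject, standardized age-16 score).
scores = [
    ("p1", "maths", 1.0), ("p1", "english", 0.0), ("p1", "science", 2.0),
    ("p2", "maths", -1.0), ("p2", "english", -1.0), ("p2", "science", 0.5),
]


def demean_within_pupil(records):
    """Return each score as a deviation from the pupil's mean across subjects."""
    by_pupil = defaultdict(list)
    for pupil, _subject, score in records:
        by_pupil[pupil].append(score)
    # Mean over the subjects observed for each pupil (three in this toy data).
    means = {pupil: sum(vals) / len(vals) for pupil, vals in by_pupil.items()}
    return [(pupil, subject, score - means[pupil])
            for pupil, subject, score in records]


dev_scores = demean_within_pupil(scores)
for row in dev_scores:
    print(row)
```

Anything that is constant across a pupil's subjects, such as average ability or investments that are not subject specific, is removed by this subtraction, which is why only subject-specific confounders remain to be dealt with.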
The issue of unobserved subject-specific family investments and skills inheritance is more challenging. By controlling for the lagged test score, i.e. the test score in subject q at age 11, we estimate a spillover effect that is purged of the influence of such family characteristics up to the age of 11. To also control for the effect of these unobserved subject-specific characteristics between ages 11 and 16, we adopt instrumental variable estimation. We instrument the subject-specific test score of the older sibling at age 16 using the average of DevY js ′ qt ′ ,16 over the school-by-cohort peers of the older sibling, excluding the test score of the older sibling, which we call M DevY 2,s ′ qt ′ ,16 . We assume that a student can be affected by the test scores of the school peers of her sibling only through her sibling. This assumption could be invalid if there is direct interaction between the older sibling's school mates and the younger sibling, for example. We discuss this and other possible threats to identification in section 5.2 and present a number of robustness checks. For example, we exclude the older sibling's school peers who live in the same neighborhood from the computation of M DevY 2,s ′ qt ′ ,16 to assess whether possible interaction within a neighborhood may affect results. We conclude from these checks that our estimated sibling spillover effect holds across a number of specifications.
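The instrument described above is a leave-one-out mean: for each pupil, the average demeaned score of the other pupils in the same school, cohort and subject. A minimal Python sketch follows, assuming hypothetical records and a hypothetical function name `leave_one_out_peer_mean`; the handling of single-pupil groups (returning `None`) is also an assumption made here, not something stated in the paper.

```python
from collections import defaultdict

# Hypothetical records: (pupil_id, school, cohort, subject, demeaned score).
records = [
    ("a", "s1", 2005, "maths", 1.0),
    ("b", "s1", 2005, "maths", 0.0),
    ("c", "s1", 2005, "maths", -0.5),
    ("d", "s2", 2005, "maths", 0.5),
]


def leave_one_out_peer_mean(rows):
    """Average score of a pupil's school-cohort-subject peers, excluding the pupil."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _pid, school, cohort, subject, score in rows:
        key = (school, cohort, subject)
        totals[key] += score
        counts[key] += 1
    peer_mean = {}
    for pid, school, cohort, subject, score in rows:
        key = (school, cohort, subject)
        n = counts[key]
        # Assumption: a pupil with no peers in the group gets no instrument value.
        peer_mean[(pid, subject)] = (totals[key] - score) / (n - 1) if n > 1 else None
    return peer_mean


peer_means = leave_one_out_peer_mean(records)
print(peer_means)
```

Excluding the pupil's own score keeps the instrument from being mechanically correlated with the score it is meant to predict. The robustness check that drops same-neighborhood peers would amount to filtering `rows` before the totals are computed.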
To be a valid instrumental variable M DevY 2,s ′ qt ′ ,16 must be also uncorrelated with any The use of the instrumental variable estimation and the fact that the older sibling's test scores at age 16 are observed earlier in time than the younger sibling's test scores allows us to address potential reflection issues, i.e. it allows us to cancel any potential causal relationship that goes from the younger to the older sibling rather than vice versa.
We use the same type of instrumental variable estimation to compute the spillover effect from the younger to the older sibling. The model specification is identical to model (1) with the subscripts 1 and 2 exchanged to swap the role of the younger sibling with that of the older sibling.

Data
The empirical analysis is based on the National Pupil Database (NPD), which is available from the English Department for Education and has been widely used for education research.
The NPD is a longitudinal register dataset for all children in state schools in England, covering roughly 93% of English students. It combines student level attainment data with student characteristics as they progress through primary and secondary school.

Educational system in England
Full-time education is compulsory for all children aged between 5 and 16, with most children attending primary school from age 5 to 11 and secondary school from age 11 to 16. The education during these years is divided into four Key Stages. Students take externally marked National Curriculum Tests at the end of Key Stages 2 and 4. Until recently such national tests were also carried out at Key Stages 1 and 3 but today progress at these stages is examined via individual teacher assessment.
Key Stage 2 National Curriculum Tests are taken at the end of primary school, usually at age 11. Pupils take tests in the three core subjects of English, Mathematics and Science. Key Stage 4 tests are taken at age 16 at the end of compulsory schooling. Pupils enter General Certificate of Secondary Education (GCSE) or equivalent vocational or occupational exams at this stage. They decide which GCSE courses to take, and because English, Mathematics and Science are compulsory study subjects, virtually all students take GCSE examinations in these topics, plus others of their choice, with a total of ten different subjects normally taken.

In addition to GCSE examinations, a pupil's final grade may also incorporate coursework elements. Key Stage 2 and 4 test results receive a lot of attention nationally as they play a prominent role in the computation of so-called school league tables, which are used by policy makers to assess schools and by parents to inform school choice.

Outcome and observed background
We focus on GCSEs (Key Stage 4) because they mark the first major branching point in a young person's educational career, and lower levels of GCSE attainment are likely to have a longer term impact on experiences in the adult labour market. We consider results in the core subjects English, Mathematics and Science which are directly comparable to test results at the end of primary school. Students receive a grade for each GCSE course, where pass grades include A*, A, B, C, D, E, F, G. We use a scoring system developed by the Qualifications and Curriculum Authority to transform these grades into a continuous point score which we refer to as the Key Stage 4 score. 12 We control for lagged cognitive achievement using Key Stage 2 National Curriculum tests taken at the end of primary school, usually at age 11, in English, Mathematics and Science.
In the Key Stage 2 exams, pupils can usually attain a maximum of 36 points in each subject, but teachers will provide opportunities for very bright pupils to test to higher levels. All test scores are standardized to have a mean of zero and a standard deviation of one.
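The scoring rule in the footnote (16 points for a pass grade G, 6 more for each grade above it) and the standardization step can be sketched as follows. The helper names and example grades are hypothetical, fail grades are not covered, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

# Pass grades ordered from lowest to highest.
GRADES_LOW_TO_HIGH = ["G", "F", "E", "D", "C", "B", "A", "A*"]

def gcse_points(grade):
    """Map a GCSE pass grade to points: G = 16, plus 6 per grade above G."""
    return 16 + 6 * GRADES_LOW_TO_HIGH.index(grade)

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical grades in one subject.
points = [gcse_points(g) for g in ["C", "A", "E", "B"]]
z = standardize(points)
```

Under this rule a grade C maps to 40 points and an A* to 58 points.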
The NPD annual school census allows identification of a number of individual and family background variables. These include month and year of birth and gender of the student, ethnicity, whether or not the first language spoken at home is English, any special educational needs identified for the child, eligibility for free school meals (FSM) 13, area of residence and the number of siblings in the family. As we control for child fixed effects in all our models we do not use these variables as explanatory variables, but we use some of them for heterogeneity and sensitivity analysis.
12 A pass grade G receives 16 points, and 6 points are added for each unit improvement from grade G.
13 FSM eligibility is linked to parents' receipt of means-tested benefits such as income support and income-based job seeker's allowance and has been used in many studies as a low-income marker (see Hobbs and Vignoles 2007 for some shortcomings).

Sibling definition
The NPD includes address data, released under special conditions, which allows us to match siblings in the data set. The first year that full address details were collected in the NPD across all pupil cohorts was 2007. Siblings are therefore defined as pupils in state schools aged 4-16 and living together at the same address in January 2007. Siblings that are not school-age, those in independent schools and those living at different addresses in January 2007 are excluded from our sibling definition.
Step and half siblings are included if they live at the same address, and we are not able to distinguish them from biological siblings (see Nicoletti and Rabe 2013 for details).

Sample restrictions
The main sample for our analysis includes all sibling pairs taking their Key Stage 4 exams in 2007, 2008, 2009 or 2010. We remove from the data all twins and siblings in the same academic year. When we have multiple pairs of siblings from one family in the observation window we consider the two oldest students to avoid any multiplier spillover effects (what Dahl et al. 2013 call the snowball effect). 14 We also remove pupils with duplicate data entries or with missing data on background variables from the dataset. Moreover, we retain only pupils for whom we have non-missing test scores for all outcomes at both Key Stages 2 and 4, which leads to a reduction in sample size of 13.7%. Missing cases are concentrated among low-attaining students, who are more likely to be absent at the exams or, at Key Stage 4, to choose not to take exams in one or more of the core subjects. Comparing the original with the retained sample, the average test score increases by about 1%. We also exclude "special schools" that exclusively cater for children with specific needs, for example because of physical disabilities or learning difficulties, as well as schools specifically for children with emotional and/or behavioural difficulties. The final sample contains 435,890 siblings (217,945 sibling pairs). Table 2 reports the means and standard deviations of the unstandardized test scores at age 11 and 16 (Key Stages 2 and 4) respectively, but in all our estimated models we consider the standardized test scores by subject. The bottom panel of the Table also provides the main characteristics of siblings used in the heterogeneity and robustness analysis.
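The pair-selection rules can be sketched as below. This is an illustration under stated assumptions rather than the procedure actually run on the NPD: the record keys (`id`, `address`, `ks4_year`) are hypothetical, siblings are identified by a shared address, and an earlier Key Stage 4 exam year is taken to mean an older sibling.

```python
from collections import Counter, defaultdict

def select_sibling_pairs(pupils):
    """Return one (older_id, younger_id) pair per address.

    `pupils` is a list of dicts with hypothetical keys
    'id', 'address' and 'ks4_year' (year of the age-16 exams).
    """
    by_address = defaultdict(list)
    for p in pupils:
        by_address[p["address"]].append(p)

    pairs = []
    for household in by_address.values():
        year_counts = Counter(p["ks4_year"] for p in household)
        # Drop twins and siblings in the same academic year.
        distinct = [p for p in household if year_counts[p["ks4_year"]] == 1]
        if len(distinct) < 2:
            continue
        # An earlier exam year means an older sibling; keep the two oldest.
        distinct.sort(key=lambda p: p["ks4_year"])
        pairs.append((distinct[0]["id"], distinct[1]["id"]))
    return pairs

pupils = [
    {"id": 1, "address": "a", "ks4_year": 2007},
    {"id": 2, "address": "a", "ks4_year": 2009},
    {"id": 3, "address": "a", "ks4_year": 2010},
    {"id": 4, "address": "b", "ks4_year": 2008},
    {"id": 5, "address": "b", "ks4_year": 2008},  # same academic year
]
pairs = select_sibling_pairs(pupils)
```

In the toy data, household "a" contributes its two oldest pupils and household "b" is dropped because both pupils sit their exams in the same year.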

Main empirical results
We begin by reporting in Table 3 the correlations in siblings' test scores, which are a general measure of the importance of background shared between siblings for educational outcomes.
In column (1) of Table 3 we show the raw correlation in test scores (0.50) which is in line with previous papers (e.g. Nicoletti and Rabe 2013; Björklund et al. 2010). In column (2) we display the sibling correlation in test scores net of the effect of past test scores obtained by the younger sibling at the end of primary school, which we estimate by using a value added model, i.e. by regressing the test scores at 16 on the sibling's test scores at 16 and controlling for test scores at 11. 15 This sibling correlation captures the effect of shared family and environment characteristics which operate between ages 11 and 16. We can see that the net sibling correlation is 0.31. In column (3) we show the correlation estimated using the value added model with lagged test scores and controlling for the younger child fixed effects (0.13). This nets out the influence of all environment, family and child characteristics that are invariant across subjects, including the intra-household allocation of resources between siblings. Finally, in column (4) we show the sibling correlation estimated using both child fixed effects and school-by-cohort-by-subject fixed effects. The latter net out subject-specific school characteristics. This correlation (0.09) therefore comes closest to capturing a causal relationship, but it can still be overestimated because of unobserved subject-specific skills transmitted in the family that are similar between siblings. 16
In Table 4 we present our main estimates of the sibling spillover effect in school test scores from the older to the younger sibling at 16 (end of compulsory schooling) when controlling for individual fixed effects as well as for school-by-cohort-by-subject fixed effects and using instrumental variable estimation to eliminate the bias caused by omitted subject-specific family investments and characteristics (see column 1).
Furthermore, in column (2) we report the corresponding instrumental variable estimation for the sibling spillover effect going from the younger to the older sibling. For both estimations we consider a value added model (1) that controls for past test scores obtained at the end of primary school. We are not concerned about the endogeneity of the lagged test caused by the fact that child unobserved endowments influence both the test scores at ages 11 and 16 because all our estimations control for child fixed effects and therefore eliminate child unobserved endowments. 17 Our instrumental variable estimation is a two-stage least squares (2SLS) estimation with fixed effects and the instrument we use is the subject-specific average test score for the school-cohort peers of the older sibling (younger sibling in column 2). Because in equation (1) we control for both child fixed effects and school-by-cohort-by-subject fixed effects, the instrument captures whether the older (younger in column 2) sibling's school-cohort mates were relatively better in a specific subject than the younger sibling's school-cohort mates. The variation in the instrument is caused by idiosyncratic changes in the average subject-specific test score across schools or within the same school but across different cohorts. These changes can occur because of changes in the quality of teaching in a specific subject (e.g. because of teacher turnover) or in the composition of the school-cohort mates in terms of subject-specific abilities across schools or within schools across cohorts.
15 Since the test scores at ages 11 and 16 are standardized by subject to have mean 0 and variance 1, we can estimate the raw correlation in test scores by a simple regression of the test scores at 16 on the sibling's test score at 16 and the net correlation by estimating the value added model.
16 For more details on why measures of intraclass correlations do not generally capture a causal peer effect see Angrist (2013).
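The logic of the two-stage estimation can be illustrated with a deliberately simplified sketch: one endogenous regressor, one instrument, and no lagged score or fixed effects. The data are made up so that an unobserved shared factor `u` biases ordinary least squares upward while the instrument recovers the assumed effect of 0.5; none of the names or numbers come from the paper.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z."""
    # First stage: project the older sibling's score on the instrument.
    pi = ols_slope(z, x)
    mx, mz = mean(x), mean(z)
    x_hat = [mx + pi * (zi - mz) for zi in z]
    # Second stage: regress the younger sibling's score on the fitted values.
    return ols_slope(x_hat, y)

# Toy data: z is the peer-mean instrument, u an unobserved shared factor
# that is uncorrelated with z in this sample.
z = [-1.0, 0.0, 1.0, 2.0]
u = [1.0, -1.0, -1.0, 1.0]
x = [2 * zi + ui for zi, ui in zip(z, u)]    # older sibling's score
y = [0.5 * xi + ui for xi, ui in zip(x, u)]  # younger sibling's score

beta_ols = ols_slope(x, y)
beta_iv = two_stage_least_squares(y, x, z)
```

Because `u` enters both scores, the simple regression slope exceeds 0.5, whereas the two-stage estimate uses only the variation in `x` that is driven by `z`.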
The top panel of Table 4 shows the first stage results. We find that there is a strong relationship between our instrument and the older sibling's test scores. The coefficient is statistically significant at the 1% level and the F-statistic for the significance of the instrumental variable in the first stage is very large, leaving little doubt about the relevance of the instrument. Second stage results are displayed in the bottom panel of the Table. Looking first at the sibling spillover effect from the older to the younger sibling (column 1), we find that an increase of 1 standard deviation in the test score of the older sibling leads to an increase of 2.4% of a standard deviation in the corresponding test score of the younger sibling.
This spillover effect seems small, but it is strongly statistically significant. In contrast, there is no statistically significant spillover effect in test scores going from the younger to the older sibling (see column 2). This is in line with expectations, as we would not expect the age 16 test scores of the older siblings to be affected by their younger sibling's tests that take place in the future. The endogeneity test reported in Table 4 indicates that we can strongly reject the equality of the estimation with fixed effects and of the 2SLS estimation with fixed effects, therefore rejecting the exogeneity of the sibling test score. Comparing the sibling spillover effect reported in column (1) of Table 4 with the sibling correlation in test scores net of a child fixed effect (Table 3, column 3) we can see that subject-specific family and school investments that are similar between siblings and that explain the sibling association in test scores are quite important. After controlling for these subject-specific family and school investments the correlation is much reduced.

Threats to identification: Robustness checks
In this section we discuss threats to the validity of our identification strategy and probe the stability of our baseline estimates to alternative specifications. We conclude that our estimated sibling spillover effects are robust across a number of specifications.

Direct influence of older sibling's school mates on the younger sibling
We instrument the older sibling's school achievement using measures of his/her school mates' achievement. Therefore we need to assume that there is no direct influence from the older sibling's school mates on the younger sibling. However, there is a possibility that school mates could directly interact with the younger sibling, and this could violate the exogeneity assumption. Direct interactions between the older sibling's school mates and the younger sibling can take place in the neighborhood, in school or at home, and we look at each of these possibilities in turn.
It may be the case that children living in the same neighborhood interact and play with each other outdoors even if they do not belong to the same cohort, for example by meeting up in parks or hanging out near local shops. 18 Although previous evidence for England shows that there are no neighborhood peer effects in school achievement (Gibbons et al. 2013), we still want to allow for this possibility. In our data, we can define neighborhoods based on Lower Layer Super Output Areas, which are statistical geographies created to reflect proximity and social homogeneity and have an average of roughly 1,500 residents and 650 households. In our sample, an average of 9 peers from the same school and cohort live in a neighborhood defined in this way. This is only 5% of a school cohort, which comprises 181 pupils on average. So the interaction within a neighborhood is limited to a small fraction of the peers the older sibling is exposed to at school while learning. Taking into account that students may interact within a wider geographical area, we also look at Middle Layer Super Output Areas (with a minimum size of 5,000 residents and 3,000 households and an average population size of 7,500). An average of 33 peers from the same school and cohort live in an area thus defined, which is 18% of an average school cohort. We take this as the maximum proportion of the older sibling's school mates a (very sociable) younger sibling could be exposed to within the residential area.
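The two shares quoted above follow directly from the reported averages, as this short check shows (the variable names are illustrative only).

```python
cohort_size = 181        # average school cohort size reported in the text
peers_neighborhood = 9   # average peers living in the same neighborhood
peers_wider_area = 33    # average peers living in the same wider area

share_neighborhood = peers_neighborhood / cohort_size
share_wider_area = peers_wider_area / cohort_size
print(f"{share_neighborhood:.1%}, {share_wider_area:.1%}")  # prints "5.0%, 18.2%"
```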
To test the possibility of neighborhood interaction more formally, we exclude the older sibling's school peers living in the same neighborhood from the computation of the instrumental variable, with the aim of netting out the potential direct effects that go from the children living in the same neighborhood to the younger sibling. We also perform the same test by excluding the older sibling's school peers living in the same area, defined at the Middle Layer Super Output Area level. Table 5 displays the results of this exercise. Excluding the older sibling's school mates living in the same neighborhood from the calculation of the instrument changes the estimated sibling spillover effect by very little. Excluding the older sibling's school mates living in the same area again produces a result that is comparable to the benchmark estimate. This suggests that direct interaction within neighborhoods and wider areas does not threaten our identifying assumption.
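The modified instrument used in this check can be sketched as a peer mean that skips school mates living in the same area as the older sibling. The record keys (`id`, `area`, `score`) and the toy values are hypothetical.

```python
def peer_mean_excluding_neighbors(pupils, focal_id):
    """Mean score of school-cohort peers living outside the focal pupil's area.

    `pupils` is a list of dicts with hypothetical keys 'id', 'area' and
    'score', all belonging to one school-by-cohort cell.
    """
    focal = next(p for p in pupils if p["id"] == focal_id)
    others = [
        p["score"]
        for p in pupils
        if p["id"] != focal_id and p["area"] != focal["area"]
    ]
    if not others:
        return None  # no peers left after the exclusion
    return sum(others) / len(others)

cell = [
    {"id": 1, "area": "X", "score": 0.2},   # the older sibling
    {"id": 2, "area": "X", "score": 1.0},   # neighbor, excluded
    {"id": 3, "area": "Y", "score": -0.5},
    {"id": 4, "area": "Z", "score": 0.5},
]
iv_value = peer_mean_excluding_neighbors(cell, focal_id=1)
```

The same function covers both checks: only the definition of `area` changes between the neighborhood and the wider-area variant.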
The next possibility we want to consider is interaction of the younger sibling with her older sibling's peers at school, for those siblings who attend the same school. Teaching in English secondary schools is separated by cohort, so any interaction would have to take place during the lunch break, which falls between the morning and the afternoon sessions of the school day and is about 45 minutes long. While the scope for interaction during the lunch break is limited by the available time, there is also the possibility that students meet in after-school clubs organised by the school. To satisfy ourselves that interaction at school is not an issue, we estimate the sibling spillover separately for siblings attending schools that offer post-16 schooling (in so-called sixth forms working towards A-levels, the university entry exams) and for siblings attending schools that do not. Younger siblings in schools without post-16 schooling will not be exposed to older siblings' school mates at school in the last years leading up to their age 16 exam, as these will have left the school. 19 Depending on the age gap between older and younger sibling, there will be no school-based contact for 1-3 years. Table 5 shows the results of these estimates. The estimated sibling spillover effect is even larger at schools that do not offer post-16 schooling than at those that do, indicating that school-based interaction is not a problem for identification.
Finally, it is possible that interaction between younger siblings and older siblings' peers takes place at home, for example when friends visit. Survey evidence for England shows that about a third of children aged 11-15 had no friends round their house in a reference week, another third had friends round 1-2 times, and the remaining third three or more times (Jamieson and McKendrick 2005). This seems a low frequency of contact within the home, and again would comprise only a fraction of the relevant peer group of the older sibling (181 students on average). Research based on AddHealth data (The National Longitudinal Study of Adolescent Health in the US) shows that among siblings from grade 7 to 12 about half have no or few mutual friends, 30% have some mutual friends and 20% have mostly mutual friends (Rende et al. 2005). An older study based on the Arizona sibling study of 10-16 year olds finds similar proportions of mutual friends as in the AddHealth data. When asked how often children interact with the mutual friends shared with siblings, however, 50-70% of children say they never or rarely interact (Rowe et al. 1994). This evidence indicates that interaction in the home is unlikely to threaten our identification strategy.
Our IV could also fail because of the way our sample is constructed. We have data for four cohorts of students taking age-16 exams, and it is possible that an older sibling has school mates whose younger siblings are in the same cohort as that of the older sibling herself. In this case there could be a direct effect of the older sibling's school mates on the younger sibling through their younger siblings. However, even if there was a high proportion of such cases in our data, we control for the younger sibling's school-by-cohort-by-subject fixed effects, so that any link to older sibling's school mates going through the school mates' younger siblings is broken.

Exploring additional instruments
Next we check the validity of our instrument further by using additional instruments, which allows us to test the over-identifying restrictions. We consider as first additional instrument the proportion of the older sibling's school mates that had a particular subject as their best subject. This may reflect the selection of similarly talented students into the same school or the presence of better teachers in a specific subject within a school. As we can see in the first row of the bottom panel of Table 5, the F-test of the excluded instruments is very high, indicating that the instruments are relevant, and the estimated sibling spillover effect remains the same as before. Hansen's J test shows that the null that the instruments are exogenous cannot be rejected.
We choose our second additional instrumental variable to address the possibility that our results are affected by reverse causality (the reflection problem) between the older sibling and her peers. While we can be quite confident that this is unlikely to happen between younger siblings whose tests are in the future with respect to the older sibling, reverse causality between the older sibling and her peers could affect the validity of our instrument. To test this we adopt the identification strategy used by Lavy et al. (2012) who measure peers' ability by prior achievements at age 11 using end-of-primary-school national tests. These are immune to reflection problems because in the compulsory transition from primary to secondary school a major reshuffling of pupils takes place so that on average students meet more than 80% new peers. This means that the end-of-primary-school ability of new peers is predetermined and unaffected by reverse causality. In the second row of the bottom panel of Table 5 we show results of IV estimates including both our original instrument and the subject-specific average test scores attained in primary school by the older sibling's peers. As we can see, this does not change the results and suggests that our estimates are not affected by the reflection problem. This is also confirmed by Hansen's J test, which does not reject the exogeneity of our original instrumental variable. Finally, in the last row of the table we enter all three instruments in the regression and again results remain stable.

Mechanisms
In this section we investigate, as far as possible, the mechanisms that may drive the sibling spillover effects we find in this paper. We do this by performing the analysis for different subgroups of the population using interaction terms. We start off by estimating separate sibling spillover effects for sibling groups that we would expect to differ according to the intensity and frequency of interaction. Among the three possible mechanisms driving the sibling spillover effect (productivity spillovers; imitation/differentiation; information sharing) we would expect the strength of the sibling tie to be most important for productivity spillovers because learning from the older sibling through shared leisure or direct teaching will depend on time spent together. We argue that imitation/differentiation and information sharing depend on intensive and frequent interactions between siblings to a lesser extent. Table 6 shows sibling spillover effects estimated by sibling sex composition, age gap between the siblings, family size and level of urbanization of the area of residence (urban versus rural). We would expect interactions to be more intense and frequent between same-sex siblings, closely spaced siblings and siblings in families with fewer children. This is because we would expect same-sex siblings to share interests, closely spaced siblings to be most similar in terms of their developmental stage, and siblings with only two children in the family to have no outside option to interact with any other child at home. As we can see, the spillover effect between sibling pairs that have no further children in the family is the same as that between siblings that have the option to interact with other siblings, in families with 3 or more children. This seems to indicate that interaction intensity is not driving the spillover effect. This is also supported by the estimates by age gap between the siblings.
These show the highest effect for the more closely spaced siblings with an age gap of only one year, but the effect for a two year gap is zero and for a three year gap positive and statistically significant, so that no clear pattern by age gap emerges. 20 The estimates by sibling sex combination do not show a pattern consistent with expectations either. The spillover effect is largest from brother to brother, followed by an effect of an older sister on a younger brother, whereas the effect of older brothers on younger sisters and between sister-pairs is zero. At the bottom of Table 6 we also show results that distinguish sibling pairs by the urbanicity of their neighborhood. It could be argued that siblings in rural areas (21% of the sample) have fewer options to interact with friends after school because of travel distance. The lack of outside options could increase the intensity of the sibling interaction. The Table shows that the sibling spillover effects are very similar in rural and urban environments.
Taken together, these results seem to suggest that productivity spillovers between siblings are not the main factor driving the effect. However, the analysis by sex composition may indicate that imitation and differentiation processes play a role. Here a sibling derives utility from behaving similarly or opposite to their sibling. It is possible that the direction of the effect could depend on gender. For example, it may be that younger boys are happy to take their older brothers and sisters as role models and enter into direct competition with them, whereas younger sisters avoid competition, for example by specializing in other areas. We have no formal way of testing this further.
Next we explore the role of information transmission as a possible mechanism of sibling spillovers in attainment, drawing on the work of Dahl et al. (2013). In the absence of data on subjective expectations and individual information sets, they test whether the spillover effect is higher for groups that stand to benefit more from the information that is likely being transmitted in the interaction. If this is the case, it would support information transmission as the prevailing mechanism. We hypothesise that the value of information related to the costs and benefits of educational choices and to school-specific knowledge should be higher in families where this information is limited. This is likely to be the case in disadvantaged families where parents have less experience of their own in the education system and access to the relevant information is costly. In Table 7 we show sibling spillover effects separately for children coming from advantaged and disadvantaged families. We measure disadvantage in three different ways, by deprivation of neighborhood of residence, 21 eligibility for free school meals, and by whether the language spoken at home is English. These three measures of disadvantage each capture slightly different things, with neighborhood deprivation capturing income deprivation of the area (dividing neighborhoods into tertiles) and free school meal eligibility relating to low income in the student's household. Families that do not speak English at home are not necessarily income deprived, but they likely lack knowledge of the English education system as the parents in such families will in most cases not have been raised and educated in England. Table 7 shows that, across all three measures of disadvantage, the sibling spillover effect is higher in families where siblings are likely to benefit more from the information that can be transmitted between siblings.
It is more than twice as large in families living in the most deprived as opposed to the middle deprived neighborhood tertile, and almost four times higher in families eligible for free school meals than in other families. The spillover effect is almost double among siblings not speaking English at home compared to native English speakers. 22 This indicates that, consistent with our expectations, the spillover effect is higher in families where we assume that knowledge on education and school-specific factors is lower and hence the value of information sharing amongst siblings higher.
It would be good to learn more about the type of information that is beneficial to younger siblings. The only way to get at this with the available data is to compare siblings going to the same or different schools. If the information that is of value relates to general aspects of the education system, the spillover effect should be the same across siblings in same or different schools, as such information can be gained in any school. If the crucial information is school-specific, however, then the coefficient will be higher for siblings going to the same school. In Table 8 we show results separately for siblings going to the same and to different schools. We find that the spillover effect is sizeably larger than the benchmark estimate for siblings going to the same school. An increase of the test score of the older sibling attending the same school by one standard deviation increases the test score of the younger sibling by 3.6% of a standard deviation, compared to 2.4% when considering the whole sample. This seems to suggest that the effect of interactions between siblings is stronger when the older sibling is able to help the younger sibling by transmitting school-specific information e.g. on the specific rules and teachers in a school. For siblings going to a different school (a minority of 15% in our sample) we find no statistically significant spillover effect.
We further test whether school-specific information transmission is equally important for advantaged and disadvantaged students by interacting the school variables with our measures of disadvantage. The results, displayed further down in Table 8, are compelling. School-specific information seems to drive sibling spillover effects for students from all backgrounds, but the effect is larger for children from disadvantaged backgrounds, indicating that they stand to benefit more from the information. This would be because such information is less available in disadvantaged families through parents. One interesting result emerges for students that are eligible for free school meals. Here we see that there is a sizeable spillover effect also for siblings at different schools. We would interpret this as indicating that in this group, which captures family disadvantage more precisely than the other two, there is also a considerable value of sharing general information about education that is not school specific. 23
In summary, we do not find strong evidence that spillovers through joint leisure or direct teaching ("productivity spillovers") play a large role in explaining sibling spillovers. These should be larger for siblings that are likely to interact more frequently and intensely, but we find no corresponding pattern across various ways of proxying the strength of the sibling relationship (except perhaps for siblings that are only one year apart in age). Imitation and differentiation are channels which may play a role but we have no way of investigating them in detail. We find that information sharing seems to be a driver of the spillover effect, as the effect is largest for disadvantaged groups that would presumably benefit most from the information transmitted from the older sibling to the younger.
The useful information seems to pertain to knowledge about how to succeed in a particular school, with the exception of students from low income families who also benefit from more general knowledge shared between siblings.
in school achievement. Our paper adds to the economic literature on child development by highlighting the direct role of siblings, where the previous literature has mainly focused on parent-child interactions when investigating the role of family background for child outcomes.
Regarding methodology, we view our main contribution as proposing a new strategy to identify sibling spillover effects in education that is not context specific but can be universally applied. Moreover, it does not rely on the introduction of policy reforms where parents may reallocate resources between siblings in reaction to the policy, confounding the direct spillover effect. Our main concern when estimating the spillover effect is unobserved heterogeneity, in particular potential unobserved family and school investments that are shared by siblings and that can cause a spurious association between siblings. We use within-pupil between-subject estimation to control for child, school and family characteristics that are subject-invariant. Furthermore, we control for subject-specific school characteristics by applying school-by-cohort-by-subject fixed effects estimation. Finally, we account for subject-specific skills acquired from parents through inheritance or investments and shared by siblings by using instrumental variable estimation. We instrument the older sibling's test scores using the average test scores of her school mates, exploiting idiosyncratic changes in the average subject-specific test score across schools and/or cohorts. We make use of the fact that the older sibling's test scores can be affected directly by her school mates' results, whereas we assume there is no direct effect of the older sibling's school mates on the younger sibling.
We devote a long section of the paper to tests of this assumption, and these lend credibility to the causal interpretation of our results.
The large sample size available in our data allows us to explore potential mechanisms of the sibling spillover effect by performing subgroup analysis. We identify information transmission about the costs and benefits of educational choices, and importantly relating to school-specific knowledge, as the most likely mechanism, although we cannot rule out other interpretations. Productivity spillovers through direct teaching or joint leisure of siblings do not seem to drive the sibling spillover effect. There is little consistent heterogeneity in the results based on the potential intensity and frequency of the sibling interaction, proxied by family size and age gap, for example. In contrast, we find substantial heterogeneity by family background. The spillover effect is two to four times larger in families where we assume that knowledge on education and school-specific factors is low and hence the value of information sharing between siblings higher. These are families on low incomes, living in deprived neighborhoods or where English is not spoken at home, respectively. The spillover effect in these families ranges from 6% to 8% of a standard deviation for a one standard deviation increase in the test scores of the older sibling.
Taken together, our paper has important implications for policy that seeks to narrow the attainment gaps between children from different socio-economic backgrounds. Our results indicate that siblings play an important role in conveying information about the costs and benefits of educational choices in families where parents have less access to such information.
This suggests that investments into students from deprived families can have considerable externalities through their benefits on younger siblings.