SUMMARY
 SUMMARY
 INTRODUCTION
 PREVIOUS RESEARCH
 ECONOMETRIC MODEL AND ESTIMATION METHODS
 DATA
 EMPIRICAL RESULTS
 MONTE CARLO SIMULATIONS
 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 Supporting Information
We provide the first empirical application of a new approach proposed by Lee (Journal of Econometrics 2007; 140(2), 333–374) to estimate peer effects in a linear-in-means model when individuals interact in groups. Assuming sufficient group size variation, this approach allows one to control for correlated effects at the group level and to solve the simultaneity (reflection) problem. We clarify the intuition behind identification of peer effects in the model. We investigate peer effects in student achievement in French, Science, Mathematics and History in secondary schools in the Province of Québec (Canada). We estimate the model using conditional maximum likelihood and instrumental variables methods. We find some evidence of peer effects. The endogenous peer effect is large and significant in Mathematics but imprecisely estimated in the other subjects. Some contextual peer effects are also significant. In particular, for most subjects, the average age of peers has a negative effect on own test score. Using calibrated Monte Carlo simulations, we find that high dispersion in group sizes helps with potential issues of weak identification. Copyright © 2012 John Wiley & Sons, Ltd.
INTRODUCTION
Evaluating peer effects in academic achievement is important for parents, teachers and schools. These effects also play a prominent role in policy debates concerning ability tracking, racial integration and school vouchers (for a recent survey, see Epple and Romano, 2011). However, despite a growing literature on the subject, the evidence regarding the magnitude of peer effects on student achievement is mixed (e.g. Sacerdote, 2001; Hanushek et al., 2003; Stinebrickner and Stinebrickner, 2006; Ammermueller and Pischke, 2009). This lack of consensus partly reflects various econometric issues that any empirical study on peer effects must address. Identifying and estimating peer effects raises three basic challenges. First, the relevant peer groups must be determined. Who interacts with whom? Second, peer effects must be identified from confounding factors. In particular, spurious correlation between students' outcomes may arise from self-selection into groups and from common unobserved shocks. Third, identifying the precise type of peer effect at work may be hard. Simultaneity, also called the reflection problem by Manski (1993), may prevent separating contextual effects, i.e. the influence of peers' characteristics, from the endogenous effect, i.e. the influence of peers' outcomes. This issue is important since only the endogenous effect is the source of a social multiplier. Researchers have adopted various approaches to solve these three issues; we discuss the methods and results of previous studies in more detail in the next section. As will be clear, however, there is no simple methodological answer to these three challenges.
In this paper, we provide, to our knowledge, the first application of a novel approach developed by Lee (2007) for identifying and estimating peer effects. In principle, the approach is promising, as it allows one to solve the problem of correlated effects and the reflection problem with standard observational (non-experimental) data. Moreover, the exclusion restrictions imposed by the model are explicitly derived from its structural specification and provide natural instruments. The econometric model does rely on a number of crucial assumptions, however, which makes its confrontation with real data particularly important. We empirically assess the approach using original administrative data on test scores at the end of secondary school in the Canadian Province of Québec. We investigate the presence of peer effects in student achievement in Mathematics, Science, French, and History. In the process, we also provide new economic insights regarding the sources of identification in the model. This matters in particular in assessing its robustness to alternative (nonlinear) approaches.
The econometric model relies on three key assumptions. First, individuals interact in groups known to the modeler. This means that the population of students is partitioned into groups (e.g. classes, grade levels) and that students are affected by all their peers in their group but by none outside of it. This assumption is typical in studies of academic achievement but clearly arises from data constraints. Second, each individual's peer group is everyone in his group excluding himself. While this assumption seems innocuous and has been used in most empirical studies, it is a key source of identification in the model, as will become clear below. In fact, it is a main source of difference between Manski's (1993) and Lee's models. Manski's approach can be interpreted as one in which each individual's peer group includes himself.1 Third, individual outcome is determined by a linear-in-means model with group fixed effects. Thus the test score of a student is affected by his characteristics and by the average test score and characteristics in his peer group. In addition, it may be affected by any kind of correlated group-level unobservable.
Lee (2007) shows that peer effects are identified in such a framework when there are sufficient groups of different sizes. One important contribution of our paper is to clarify the economic intuition behind identification. One potentially important limitation of the method for estimation, however, is that the peer effect estimates may converge in distribution at low rates when the average group size is large relative to the number of groups in the sample (Lee, 2007). This is also intuitive: whether or not an individual is excluded from his peer group makes little difference when the group is relatively large.
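The role of group size can be made explicit with a short piece of algebra. The block below is a sketch in notation assumed here for illustration (group r of size m_r, a single characteristic x, group fixed effect α_r, endogenous effect β, individual effect γ, contextual effect δ); it is meant to be consistent with the linear-in-means model with self-exclusion described above, not to reproduce the exact statement in Lee (2007).

```latex
\begin{align*}
% Structural equation: own outcome depends on the leave-one-out group means
y_{ri} &= \alpha_r
        + \beta\,\frac{\sum_{j \neq i} y_{rj}}{m_r - 1}
        + \gamma\, x_{ri}
        + \delta\,\frac{\sum_{j \neq i} x_{rj}}{m_r - 1}
        + \varepsilon_{ri}, \\[4pt]
% Key identity: the leave-one-out mean deviates from the group mean in the
% opposite direction to the individual's own deviation
\frac{\sum_{j \neq i} y_{rj}}{m_r - 1} - \bar{y}_r
       &= -\,\frac{y_{ri} - \bar{y}_r}{m_r - 1}, \\[4pt]
% Within-group (deviation) form: the fixed effect drops out
y_{ri} - \bar{y}_r
       &= \frac{(m_r - 1)\gamma - \delta}{(m_r - 1) + \beta}\,(x_{ri} - \bar{x}_r)
        + \frac{m_r - 1}{(m_r - 1) + \beta}\,(\varepsilon_{ri} - \bar{\varepsilon}_r).
\end{align*}
```

In this form both the slope on x and the scale of the error depend on m_r, which is why variation in group sizes carries information about β, γ and δ separately. As m_r grows, the slope tends to γ and the error scale tends to 1, so large groups say little about β and δ.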
Here two remarks are in order. First, these results are to be distinguished from the idea that the group size is a factor in a school's production function (e.g. Krueger, 2003). In Lee's model, the effects of group sizes which are separable from the peer effects are controlled for by fixed effects in the structural model. Second, Lee's identification method differs from the variance contrast approach developed by Graham (2008). The basic idea in this approach is that peer effects will induce intragroup dependencies in behavior that introduce variance restrictions on the error terms. These restrictions are used to identify the composite (endogenous + contextual) social interaction effects under the assumption that the variance matrix parameters are independent of the reference group size.
We use administrative data on academic achievement for a large sample of secondary schools in the Province of Québec obtained from the Ministry of Education, Recreation and Sports (MERS). Our dependent variables are individual scores on four standardized tests taken in June 2005 (Mathematics, Science, French and History) by fourth- and fifth-grade secondary school students. All fourth- and fifth-grade students in the province must pass these tests to graduate. One advantage of these data is that all candidates in the province take the same exams, no matter what their school and location. This feature effectively allows us to consider test scores as draws from a common underlying distribution. Another advantage is that our sample is representative and quite large. We have the scores of all students for a 75% random sample of Québec schools which, over the four subjects, yields 194,553 test scores for 116,534 students. In terms of interaction patterns, the structure of the data leads us to make the following natural assumption. We assume that the peer group of a student contains all other students in the same school qualified to take the same test in June 2005. In practice, a small number of students postpone test-taking to August 2005. We extend Lee's methodology in the empirical modeling to address this issue. However, since the difference between observed group sizes and actual group sizes is small, the correction has little effect on the results. Following Lee (2007), we estimate the model in two ways: through generalized instrumental variables (IV) and, under stronger parametric conditions, through conditional maximum likelihood robust to non-normal disturbances (pseudo CML).
Our results are mixed though consistent with the model. We do provide evidence of some endogenous and contextual peer effects. Based on pseudo CML estimates, we find that the endogenous peer effect is positive, significant and quite high in Mathematics (0.83). Moreover, it is within the range of previous estimates (see Sacerdote, 2011, for a recent survey). However, the effect is smaller and nonsignificant in History (0.64), French (0.30) and Science (−0.23).2 Endogenous peer effect estimates obtained from IV methods are highly imprecise with our data, even in Mathematics. The higher precision of our pseudo CML estimates is consistent with results in Lee (2007) showing that CML estimators are asymptotically more efficient than IV estimators. As regards contextual peer effects, we find evidence that some of them matter, based on both pseudo CML and IV estimators. For instance, results from pseudo CML indicate that interacting with older students (a proxy for repeaters) has a negative effect on own test score in all subjects except Mathematics (not significant).
It is remarkable that even with large average group size relative to the number of groups we are able to identify some peer effects. However, there is also much dispersion in group sizes within our samples. We suspect that this helps identification. We study this issue systematically through Monte Carlo simulations. We find that indeed increasing group size dispersion has a positive impact on the precision of estimates.
The remainder of the paper is organized as follows. We discuss past research in Section 2 and present our econometric model and the estimation methods in Section 3. We describe our dataset in Section 4. We present our empirical results in Section 5 and run Monte Carlo experiments in Section 6. We conclude in Section 7.
PREVIOUS RESEARCH
In this section we give a brief overview of the recent literature on student achievement and peer effects, and we explain how our study complements and enhances current knowledge on peer interactions in academic outcomes.3
As discussed above, measuring peer effects is complex as it raises three basic interrelated problems: the determination of reference groups, the problem of correlated effects and the reflection problem. The choice of reference groups is often severely constrained by the availability of data. In particular, there are still few databases providing information on students' social networks; the Add Health dataset is an exception (see, for example, Calvó-Armengol et al., 2009; Lin, 2010).4 For this reason, many studies focus on the grade-within-school level (e.g. Hanushek et al., 2003; Angrist and Lang, 2004). Other studies analyze peer effects at the classroom level (e.g. Kang, 2007; Ammermueller and Pischke, 2009). The administrative data we use in this study do not provide information on classes or teachers. Therefore, we assume that for each subject the relevant reference group for a student taking the test contains all other students in the same school who have completed all courses in the subject matter by June 2005. Thus, given that the reference group is likely to include students from other classes, one should probably expect peer effects to be smaller than at the classroom level.5
Two main strategies have been used to handle the problem of correlated effects. A first strategy exploits data where students are randomly or quasi-randomly assigned within their groups (e.g. Sacerdote, 2001; Zimmerman, 2003; Kang, 2007). Estimated contextual effects using randomly assigned roommates as peers are usually small though significant. However, Stinebrickner and Stinebrickner (2006) have argued that these studies tend to underestimate true peer effects as the true influence of roommates is unclear. A second strategy uses observational data to estimate peer effects. This approach is usually based on two assumptions. First, fixed effects allow correlated effects to be taken into account. With cross-section data, these effects are usually defined at a level higher than peer groups. Otherwise, peer effects are absorbed by the fixed effects and cannot be identified. For instance, Ammermueller and Pischke (2009) introduce school fixed effects to estimate peer effects at the class level for fourth graders in six European countries. Contrary to this approach, our model allows inclusion of fixed effects at the peer group level even with cross-section data. This is so because each student within a group has his own reference group (since he is excluded from it). The second assumption is that one observes exogenous shocks to peer group composition which allow identification of a composite (endogenous + contextual) peer effect. The strategy uses either cross-section or panel data. With cross-section data, demographic variations across grades but within schools are usually exploited (see Bifulco et al., 2011). With panel data, demographic variations across cohorts but within school grades are usually exploited (see Hanushek et al., 2003).
The reflection problem is handled using two main strategies. In most papers, no solution for this difficult problem is provided. Rather, researchers estimate a reduced-form linear-in-means model, and no attempt is made to separate the contextual and endogenous peer effects. Only composite parameters are estimated (Sacerdote, 2001; Ammermueller and Pischke, 2009). Note, however, that a number of these papers (often implicitly) assume that there are no contextual effects. In this case, the composite parameters identify the endogenous peer effect. In a second strategy, one uses instruments to obtain consistent estimates of the endogenous peer effect (e.g. Evans et al., 1992; Gaviria and Raphael, 2001). The problem here is to choose suitable instruments. For instance, Rivkin (2001) argues that the use of metropolitan-wide aggregate variables as instruments in the Evans et al. (1992) study exacerbates the biases in peer effect estimates. In our paper, we provide some results based on instrumental methods. However, our instruments are naturally derived from the structure of the model.
In short, various strategies have been proposed to address the three basic issues that occur in the estimation of peer effects. But most rely on strong assumptions that are difficult to motivate and may not hold in practice. Some of them require panel data, while others rely on experiments that randomly allocate students within their peer group. This makes the results in Lee (2007) particularly interesting, as they show that both endogenous and contextual peer effects may be fully identified even with cross-sectional observational data.
DATA
For this analysis, we gathered original data from the Québec Government MERS. These administrative data provide detailed information on individual scores on standardized tests taken in June 2005 in four subjects (Mathematics, Science, French and History) by fourth- and fifth-grade secondary school students. They also include information on the age, gender, language spoken at home and socioeconomic status of students. Sampling was done in two steps. The population of interest is the set of all fourth- and fifth-grade secondary school students who were candidates for the MERS examinations in June 2005. This population consists of 152,580 students in total. In the first step, a 75% random sample of secondary schools offering fourth- and fifth-grade classes in the 2004–2005 school year was selected. In the second step, all fourth- and fifth-grade students in these schools were included. Overall, we have 194,553 individual test scores for 116,534 students.13
There are many advantages to the use of our data. First, all fourth- and fifth-grade students must take tests on these four subjects to qualify for secondary school graduation. This means that our results do not pertain to a selected sample of schools. In particular, both public and private school students have to take these tests. Another advantage is that the tests are standardized, i.e. designed and applied uniformly within the Province of Québec. We use test results gathered by the MERS, so there is less scope for measurement error with these data than with survey data on grades. Finally, although survey data might have provided information on a larger set of covariates, sample sizes in our study are larger than in typical school surveys.
Given the lack of information on the structure of relevant social interactions, we assume that the peer group for a student taking a test comprises all other students in the same school who are qualified to take the test in June 2005. Two test sessions are offered for those who completed coursework in the spring semester. We thus consider as belonging to the same group all those who belong to the same school and who take a subject test in one of the two consecutive sessions of June and August 2005. We know the number of students in each of these groups, but we only observe test scores for the students who took the test in June. Therefore we do not always observe the scores of all students within a group. We offered a correction for this problem in our discussion of the econometric model, and our empirical results below incorporate this correction. In any case, an overwhelming majority of the students do take the tests in June, so the correction has little effect on the results.
For this study, we use French, History, Science and Mathematics test results as reported in the MERS administrative data. Students in a regular track take History and Science tests in Secondary 4. The French test is commonly taken in Secondary 5. Note that the MERS administers a single test to all secondary school students in French, History and Science. In contrast, it administers different tests in Mathematics, depending on academic options chosen early on by the students. We focus on students following the regular mathematical training, who take the Mathematics test in Secondary 5 (Math 514). This test completes their mathematical training for secondary school.
We provide descriptive statistics in Table 1. For each subject, the dependent variable in our econometric model is the test score obtained in the provincial standardized test. The average score is between 70% and 75% in French, Science and History tests. It is lower, at about 62%, in Mathematics. In samples for which the regular track for the test is Secondary 5 (respectively Secondary 4), the average age of students is close to 16 (respectively 15). Most students taking French and Mathematics (98% and 96%) are enrolled in Secondary 5. Most of those taking Science and History are enrolled in Secondary 4 (92% and 96%). Between 52% and 55% of students are female, and between 11% and 13% of students speak a language at home which is different from the language of instruction (Foreign variable).14 Between 30% and 34% of students come from a relatively high socioeconomic background and between 40% and 42% from a medium one. We use an index of socioeconomic status provided by the MERS. This index is computed from data from the 2001 census. It uses information on the level of education of the mother (weight of 2/3) and the job status of parents (weight of 1/3). Low socioeconomic status corresponds to the three lowest deciles of the index, and high socioeconomic status to the three highest deciles.
Table 1. Descriptive statistics

Course            Variable                       Mean        SD
French (Sec. 5)   Score                        72.647    14.086
                  Age                          16.142     0.488
                  Socioecon. index:
                    Perc. high                  0.328     0.469
                    Perc. med.                  0.409     0.492
                  Gender (female = 1)           0.549     0.500
                  Foreign                       0.111     0.310
                  Secondary 5                   0.985     0.120
                  Number of observations       41,778
                  Number of groups                314
                  Size of true groups           133.4     115.7
                  Size of observed groups       133.1     115.4

Science (Sec. 4)  Score                        74.689    17.671
                  Age                          15.255     0.610
                  Socioecon. index:
                    Perc. high                  0.338     0.470
                    Perc. med.                  0.402     0.490
                  Gender (female = 1)           0.527     0.499
                  Foreign                       0.127     0.333
                  Secondary 5                   0.077     0.267
                  Number of observations       54,981
                  Number of groups                378
                  Size of true groups           146.0     134.2
                  Size of observed groups       145.5     133.7

Math^a (Sec. 5)   Score                        62.088     15.83
                  Age                          16.272     0.574
                  Socioecon. index:
                    Perc. high                  0.303     0.460
                    Perc. med.                  0.400     0.490
                  Gender (female = 1)           0.540     0.498
                  Foreign                       0.111     0.314
                  Secondary 5                   0.957     0.202
                  Number of observations       15,771
                  Number of groups                361
                  Size of true groups            50.7      49.9
                  Size of observed groups        49.9      49.7

History (Sec. 4)  Score                        70.156    17.280
                  Age                          15.230     0.580
                  Socioecon. index:
                    Perc. high                  0.337     0.473
                    Perc. med.                  0.403     0.491
                  Gender (female = 1)           0.533     0.499
                  Foreign                       0.127     0.333
                  Secondary 5                   0.044     0.205
                  Number of observations       55,057
                  Number of groups                382
                  Size of true groups           144.6     134.8
                  Size of observed groups       144.1     134.5
We observe test scores and characteristics of students taking the same test in June 2005. Sample sizes are 41,778 for French, 54,981 for Science, 15,771 for Mathematics and 55,057 for History. We also observe the number of students who completed coursework but postponed test-taking to August 2005. There are 118 students postponing French, 186 postponing History, 195 postponing Science and 160 postponing Mathematics. We observe between 314 and 382 peer groups depending on the subject matter considered. The average group size is between 50 (Mathematics) and 146 (Science). The ratio between the number of groups and the average group size varies between 2.36 (French) and 7.23 (Mathematics). These numbers are relatively small, which suggests that our estimates could be subject to weak identification problems. The group size standard deviation is quite large, however, varying between 50 (in Mathematics) and about 135 (in Science and History). We expect such dispersion in group sizes to help identification. We analyze these issues in more detail in Section 6.
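The ratios quoted above can be recomputed from Table 1. The snippet below is a small check, assuming the ratio is taken with respect to the mean size of observed groups (an assumption on our part, chosen because it reproduces the two reported endpoints); the dictionary and function names are illustrative only.

```python
# Number of groups and mean observed group size, copied from Table 1.
TABLE1 = {
    "French": (314, 133.1),
    "Science": (378, 145.5),
    "Mathematics": (361, 49.9),
    "History": (382, 144.1),
}


def groups_to_size_ratio(n_groups, mean_size):
    """Number of groups divided by average group size."""
    return n_groups / mean_size


# Round to two decimals, as in the text.
ratios = {
    subject: round(groups_to_size_ratio(n_groups, mean_size), 2)
    for subject, (n_groups, mean_size) in TABLE1.items()
}

for subject, ratio in ratios.items():
    print(f"{subject}: {ratio}")
```

The smallest and largest values correspond to French and Mathematics, matching the range reported in the text.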
MONTE CARLO SIMULATIONS
In this section we study through simulations the effect of group sizes and their distribution on the precision and bias of our estimates. Lee (2007) shows that the CML and IV estimators may converge in distribution at low rates when the ratio between the number of groups and the average group size is small. Since this ratio varies between 2.36 and 7.23 in our samples, a problem of weak identification could in principle emerge. However, the standard deviation of the distribution of group sizes is also relatively large (see Table 1), and we suspect that this may help identification. To study these issues, we conduct two simulation exercises. First, we vary group sizes in a systematic manner and study how this affects the bias and precision of estimators. To focus on the approach which provides the most reasonable findings in our empirical analysis, we report results on the model using CML.17 We look at uniform distributions, vary the size of the distribution's support and partly calibrate simulation parameters on our data. Second, we look at bias and precision of estimates for fully calibrated simulations, when group sizes are exactly the same as in the data. Overall, while our analysis confirms Lee's earlier results, we also find a strong positive impact of the dispersion in group sizes on the strength of identification. In particular, conditional maximum likelihood performs well on fully calibrated simulations. This suggests that the bias due to small sample issues is likely low in the results presented in Table 2.
For each simulation exercise, we keep the number of observations fixed around 42,000 and run 1000 replications. We first consider average sizes of 10, 20, 40, 80 and 120. We pick group sizes from the following intervals with decreasing length:
 average size of 10: [3, 17], [5, 15], [7, 13] and [9, 11];
 average size of 20: [3, 37], [8, 32], [13, 27] and [18, 22];
 average size of 40: [3, 77], [12, 68], [21, 59], [30, 50] and [39, 41];
 average size of 80: [3, 157], [18, 142], [33, 127], [48, 112] and [63, 97];
 average size of 120: [3, 237], [28, 212], [53, 187], [78, 162] and [103, 137].
For each of the intervals described above, we proceed in the following manner:
 1. pick a group size from a uniform distribution for which the support is defined by the minimum and maximum value of the interval;
 2. truncate this value by eliminating its decimal portion;
 3. repeat steps 1 and 2 as long as the total number of observations is below or equal to 42,000.
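The procedure above can be sketched in a few lines of Python. This is a literal reading of the three steps, not the authors' code: the function name, the seed argument and the use of Python's `random` module are assumptions made for illustration.

```python
import random


def draw_group_sizes(low, high, target_obs=42_000, seed=0):
    """Draw group sizes until the running total exceeds target_obs.

    Each size is a continuous uniform draw on [low, high] with its
    decimal part dropped. Read literally, this means the upper endpoint
    is almost never drawn; the text does not say whether the authors
    adjusted for that.
    """
    rng = random.Random(seed)
    sizes = []
    total = 0
    # Keep drawing as long as the total is below or equal to the target.
    while total <= target_obs:
        size = int(rng.uniform(low, high))  # int() truncates toward zero
        sizes.append(size)
        total += size
    return sizes


sizes = draw_group_sizes(3, 17)
print(len(sizes), sum(sizes), min(sizes), max(sizes))
```

Because the loop stops only once the total passes 42,000, the number of observations ends up slightly above that figure, which is consistent with the sample being described as fixed "around" 42,000.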
To reduce computing time, we assume that students have the same characteristics except for age and gender. We assume that age follows a normal distribution and gender follows a Bernoulli distribution. We calibrate the moments of these distributions on the sample of students taking the French test: average age is 16, variance of age is 0.25 and proportion of girls is 0.55. Values of the structural parameters β, γ and δ are set close to the estimated coefficients for the French test: β = 0.35, γ_{age} = −8, γ_{gender} = 3.8, δ_{age} = −40, δ_{gender} = −25.
We assume that the values of ε in the structural equation are drawn randomly from a normal distribution with mean zero and variance σ^{2} = 1. We generate the endogenous variable y from the reduced-form equation in deviation form.
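A minimal sketch of this data-generating step for a single group follows. It assumes that the reduced form in deviation form is y_dev = [((m − 1)γ − δ) / ((m − 1) + β)] x_dev + [(m − 1) / ((m − 1) + β)] ε_dev, summed over the two characteristics, which is what the linear-in-means model with self-exclusion implies once group means are subtracted. The function and variable names are hypothetical and the code is not the authors' implementation.

```python
import random

# Structural parameters as stated in the simulation set-up.
BETA = 0.35
GAMMA = {"age": -8.0, "gender": 3.8}     # individual effects
DELTA = {"age": -40.0, "gender": -25.0}  # contextual effects


def deviations(values):
    """Subtract the group mean from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def simulate_group(m, rng, sigma=1.0):
    """Simulate one group of size m; return (x_dev, eps_dev, y_dev)."""
    age = [rng.gauss(16.0, 0.5) for _ in range(m)]  # variance 0.25 -> sd 0.5
    girl = [1.0 if rng.random() < 0.55 else 0.0 for _ in range(m)]
    eps = [rng.gauss(0.0, sigma) for _ in range(m)]

    x_dev = {"age": deviations(age), "gender": deviations(girl)}
    eps_dev = deviations(eps)

    denom = (m - 1) + BETA
    y_dev = []
    for i in range(m):
        # Assumed reduced form in deviations; the group fixed effect drops out.
        systematic = sum(
            ((m - 1) * GAMMA[k] - DELTA[k]) / denom * x_dev[k][i] for k in x_dev
        )
        y_dev.append(systematic + (m - 1) / denom * eps_dev[i])
    return x_dev, eps_dev, y_dev


x_dev, eps_dev, y_dev = simulate_group(12, random.Random(1))
print(round(sum(y_dev), 10))  # deviations sum to zero up to rounding
```

Working in deviations means the group fixed effect never has to be specified, which suits a design where only the within-group variation enters estimation.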
Looking at Table 4, we first compare simulation results across average group sizes and then we examine how estimators perform for a given average group size as dispersion in group size decreases. Separate horizontal panels in Table 4 pertain to different values of average group size. We report the average estimated coefficient and standard error for the endogenous effect (first vertical panel), the contextual effect associated with age (second vertical panel) and the contextual effect associated with gender (third vertical panel). We find that even for the largest average group size (i.e. 120), CML may perform well in terms of bias and precision (first line in the last horizontal panel of Table 4). The biases of CML get larger, in general, as average group size increases. The CML estimate of the endogenous effect attains a plateau at the value 1. This is consistent with the fact that the CML estimator tends towards the naive OLS estimator as group sizes become larger. In general, peer effects are also less precisely estimated in large groups than in small groups.
Table 4. Group size variation: simulations using CML

                                    Endogenous effect (CML)   Contextual: age (CML)Contextual: gender (CML)
Avg. group size  Group size range   Avg. coeff.          SE Avg. coeff.          SE Avg. coeff.          SE
10               [3, 17]                   0.35        0.00      −40.01        0.25      −25.01        0.33
10               [5, 15]                   0.35        0.00      −40.00        0.35      −24.99        0.74
10               [7, 13]                   0.35        0.02      −40.00        0.53      −25.01        1.50
10               [9, 11]                   0.57        0.38      −40.27        1.79      −26.97        5.78
20               [3, 37]                   0.35        0.00      −40.01        0.27      −25.03        0.44
20               [8, 32]                   0.35        0.02      −40.00        0.50      −25.02        1.10
20               [13, 27]                  0.35        0.09      −39.95        1.23      −25.04        2.11
20               [18, 22]                  0.94        1.56      −37.98        5.37      −28.55        8.47
40               [3, 77]                   0.35        0.00      −39.98        0.41      −25.03        0.65
40               [12, 68]                  0.36        0.07      −39.97        1.42      −25.05        1.66
40               [21, 59]                  0.39        0.20      −39.85        2.76      −25.14        2.67
40               [30, 50]                  0.72        0.85      −37.92        5.82      −26.93        5.30
40               [39, 41]                  1.00      155.98      −36.25       78.67      −26.28       69.22
80               [3, 157]                  0.35        0.01      −39.99        0.68      −25.05        0.98
80               [18, 142]                 0.43        0.19      −39.55        2.93      −25.40        2.49
80               [33, 127]                 0.57        0.46      −38.54        4.87      −25.96        3.66
80               [48, 112]                 0.87        1.20      −36.47        8.02      −27.10        5.75
80               [63, 97]                  1.00        5.27      −35.75       17.05      −27.74       11.97
120              [3, 237]                  0.36        0.01      −39.99        0.99      −25.10        1.50
120              [28, 212]                 0.64        0.51      −38.14        5.22      −26.34        3.79
120              [53, 187]                 0.89        1.30      −35.99        8.69      −27.25        5.85
120              [78, 162]                 1.00        3.85      −35.34       15.16      −27.65        9.94
120              [103, 137]                1.00       25.15      −35.32       39.00      −28.20       25.29
Our main new result concerns the effect of group size dispersion. When we fix the value of the average group size and reduce the length of the interval from which group sizes are picked, we find that the bias of CML typically increases, while the precision typically decreases. In Table 4, this amounts to looking at each horizontal panel separately. Observe, however, that since we roughly pick group sizes from a uniform distribution holding average group size fixed, reducing the interval's length affects the two parameters of the size distribution (i.e. the minimum and maximum value of its support) and a number of its moments. In particular, this leads to a reduction in variance and to an increase in the size of the smallest groups. In general, both the variance and the size of smallest groups may matter and the strength of identification may depend on the size distribution in complex ways. We leave a deeper investigation of this issue to future research.
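One way to see why a narrower interval hurts is to look at how much the within-group slope of y on a characteristic differs between the smallest and largest groups of each interval. The sketch below assumes the reduced-form slope ((m − 1)γ − δ) / ((m − 1) + β) implied by the linear-in-means model with self-exclusion, and uses the age parameters of the simulation set-up. It illustrates a single channel only and is not a substitute for the simulations; the function name is hypothetical.

```python
# Age parameters from the simulation set-up.
BETA, GAMMA_AGE, DELTA_AGE = 0.35, -8.0, -40.0


def reduced_form_slope(m, beta=BETA, gamma=GAMMA_AGE, delta=DELTA_AGE):
    """Within-group slope of y on x in a group of size m (assumed formula)."""
    return ((m - 1) * gamma - delta) / ((m - 1) + beta)


# Intervals used for an average group size of 120.
intervals = [(3, 237), (28, 212), (53, 187), (78, 162), (103, 137)]

spreads = []
for low, high in intervals:
    # With these parameter values the slope falls as m rises, so the
    # gap between the two endpoints is the full range over the interval.
    spread = reduced_form_slope(low) - reduced_form_slope(high)
    spreads.append(spread)
    print(f"[{low}, {high}]: slope range = {spread:.3f}")
```

With these values the slope differs by roughly 18 between groups of size 3 and 237, but by only about 0.1 between groups of size 103 and 137. There is then very little variation across group sizes left to separate β and δ from γ, in line with the large standard errors in the bottom rows of Table 4.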
We next fully calibrate the simulations' parameters on the data. We use observed group sizes in the French sample, calibrate the model parameters {β, γ_{age}, γ_{gender}, δ_{age}, δ_{gender}} and moments of the explanatory variables as previously, and set the variance of the error term in the structural equation equal to the estimated variance in the French sample (). Simulation results, now for both CML and IV estimators, are reported in Table 5. The CML estimator has small bias and standard error, while the IV estimators are imprecise and strongly biased. These results confirm for CML what we obtained from picking group sizes at random; they show that dispersion in group sizes helps identification. Moreover, this suggests that small sample bias may be relatively high in the IV estimates of Table 3 and of Table 2 of the supplementary Appendix, but relatively low for the CML estimates of Table 2.
Table 5. Simulations calibrated on French sample (1000 replications)
Standard errors in parentheses.

                                 CML        2SLS       G2SLS         OLS
Endogenous effect              0.391      −0.873       0.495     −33.571
                             (0.101)     (0.852)   (167.702)     (3.688)
Individual effects
  Age                         −8.002      −7.920      −8.006      −5.758
                             (0.145)     (0.149)    (10.021)     (0.545)
  Gender (female = 1)          3.798       3.822       3.828       4.480
                             (0.147)     (0.139)     (1.693)     (0.554)
Contextual effects
  Age                        −39.996     −38.085     −39.540      17.373
                             (9.996)     (7.579)   (167.394)    (76.788)
  Gender (female = 1)        −25.329     −16.703     −21.857     210.526
                            (10.733)    (10.092)   (692.625)    (74.714)
CONCLUSION
This paper provides an analysis of social interactions in scholastic achievement when students interact through groups. Based on a linear-in-means approach with group fixed effects (Lee, 2007), we make two main contributions regarding the identification and estimation of peer effects. First, we provide a new intuition for identification. We show that full identification of the model relies on three key properties. (i) Since the individual is excluded from his peer group, above-average students have below-average peers (with respect to any attribute). Therefore, when individual and peer effects are positive, peer effects tend to reduce the dispersion in outcomes. (ii) This reduction is stronger in smaller groups, reflecting the larger effect of excluding one individual from the mean. (iii) Contextual and endogenous peer effects generate reductions of different shapes, which allows us to identify both of them.
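Properties (i) and (ii) can be checked numerically with a short sketch. The first function computes leave-one-out peer means. The second gives the coefficient on an individual's demeaned attribute after within-group demeaning, derived under the assumption that the structural equation is y_i = α_group + β·(peer mean of y) + γ·x_i + δ·(peer mean of x) + ε_i with the individual excluded from the peer means. The function names, the attribute values and the parameter values β = 0.4, γ = 1.0, δ = 0.5 are hypothetical and are not estimates from the paper.

```python
def peer_means(values):
    """Leave-one-out mean of `values` for each group member."""
    m = len(values)
    total = sum(values)
    return [(total - v) / (m - 1) for v in values]


def within_slope(m, beta, gamma, delta):
    """Coefficient on (x_i - group mean of x) after within-group demeaning.

    Derived under the ASSUMED structural form
        y_i = alpha_group + beta * peer_mean(y) + gamma * x_i
              + delta * peer_mean(x) + e_i,
    where peer means exclude individual i and m is the group size.
    """
    return ((m - 1) * gamma - delta) / (m - 1 + beta)


# Property (i): above-average members face below-average peers.
small_group = [50.0, 60.0, 70.0]
small_mean = sum(small_group) / len(small_group)
gaps_small = [p - small_mean for p in peer_means(small_group)]

# Property (ii): the same attribute gap is diluted in a larger group.
large_group = small_group * 10  # 30 members, same group mean
large_mean = sum(large_group) / len(large_group)
gaps_large = [p - large_mean for p in peer_means(large_group)]

# Hypothetical positive parameters, for illustration only.
beta, gamma, delta = 0.4, 1.0, 0.5
slopes = {m: within_slope(m, beta, gamma, delta) for m in (3, 10, 30)}

print(gaps_small)
print(gaps_large[:3])
print(slopes)
```

With these assumed values the within-group slope lies below γ for every group size and approaches γ as groups grow, so the compression of outcomes is strongest in small groups. In this expression β enters only the denominator and δ only the numerator, which is one way to read property (iii): the two effects change the slope differently as group size varies.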
Second, as regards the estimation of peer effects, the model is applied to original administrative data providing individual scores on standardized tests taken in June 2005 in four subjects by fourth- and fifth-grade secondary school students in the Province of Québec (Canada). Based on a pseudo conditional maximum likelihood approach, our results indicate that students significantly benefit from their peers' higher test scores in Mathematics but not in other subjects such as Science, History and French. Two reasons may explain these results. First, this is likely to reflect the fact that Mathematics provides more opportunities for interactions among students. Second, in our sample, the average group size (relative to the number of groups) is close to three times smaller in Mathematics than in other subjects. As suggested by Lee (2007), accurate estimation of peer effects requires relatively small groups. This is also confirmed by our Monte Carlo simulations. These results should caution applied researchers against using data in which the size of groups is too large. In addition, our simulations indicate that, for a given average group size, increasing group size dispersion improves the precision of peer effects estimates. In fact, our results suggest that, conditional on estimating on the whole sample, even data on larger groups may provide useful information for estimation purposes. The basic intuition is that data on very large groups can be used to provide more precise estimators of the individual effects. In turn, this indirectly provides more efficient estimates of the peer effects from data on smaller groups. Thus future estimations of Lee's model may benefit from data with relatively small average group size but relatively large group size dispersion, including both small and large groups.
In terms of public policy, the fact that the endogenous peer effects appear to be very large in Mathematics suggests that a reform that improves the amount and quality of Mathematics learning is likely to yield very high returns in terms of scholastic achievement. This is so because such a reform will not only have direct effects on student performance in Mathematics but also strong indirect effects through the additional external benefits generated by the social multiplier. Remarkably, our analysis also shows that the indirect peer effects of the reform will reduce performance inequalities in Mathematics across students. This is the case because low-ability students have better peers (since their peers exclude them) and high-ability students have worse peers (for the same reason). Moreover, the strong negative effects of the average age of peers on scholastic achievement (except in Mathematics) suggest that resources invested by the government to reduce the number of repeaters may have an important indirect positive impact on student performance. One limitation of Lee's linear-in-means approach is that it imposes that average test scores over all schools are not influenced by a reallocation of students across schools (see Sacerdote, 2011). Therefore, the model does not have much to say about issues such as optimal school composition by race or ability.
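The social multiplier argument can be illustrated with a minimal sketch. It assumes a stripped-down linear-in-means equation in which every student in a group receives the same direct improvement and each outcome also responds to the leave-one-out mean of peers' outcomes with coefficient β; individual characteristics and error terms are left out. The function name and the values β = 0.5, a group of five and a unit shock are hypothetical and are not the paper's estimates.

```python
def equilibrium_shift(group_size, beta, shock, rounds=200):
    """Iterate y_i = shock + beta * (leave-one-out mean of y) to a fixed point.

    Assumes |beta| < 1 so that the iteration converges.
    """
    y = [0.0] * group_size
    for _ in range(rounds):
        total = sum(y)
        y = [shock + beta * (total - yi) / (group_size - 1) for yi in y]
    return y


beta = 0.5  # hypothetical endogenous peer effect
shift = equilibrium_shift(group_size=5, beta=beta, shock=1.0)

# Closed-form multiplier for a common shock in this simplified model.
multiplier = 1.0 / (1.0 - beta)

print(shift[0], multiplier)
```

In this simplified setting a common unit improvement raises each outcome by 1/(1 − β) in equilibrium, so the indirect effect through peers grows quickly as β approaches one.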
Our research could be extended in many directions. It would be interesting to evaluate the validity of this approach by using data where group membership is experimentally manipulated and group sizes are heterogeneous (as in Sacerdote, 2001). One could also analyze how group size variations may help to identify peer effects when the outcome is a discrete variable (e.g. pass or fail). Brock and Durlauf (2007) have studied the identification of peer effects with discrete outcomes, but they ignore group size variations. A third potentially fruitful direction of research would be to analyze a nonlinear version of Lee's approach. For example, student achievement could depend on the mean and standard deviation of peers' attributes. Overall, we think that this first empirical application confirms the interest of the method. Many more applications in different settings are needed, however, in order to gain a thorough understanding of the method's advantages, limitations and applicability for public policy.