We provide the first empirical application of a new approach proposed by Lee (Journal of Econometrics 2007; 140(2), 333–374) to estimate peer effects in a linear-in-means model when individuals interact in groups. Assuming sufficient group size variation, this approach allows one to control for correlated effects at the group level and to solve the simultaneity (reflection) problem. We clarify the intuition behind identification of peer effects in the model. We investigate peer effects in student achievement in French, Science, Mathematics and History in secondary schools in the Province of Québec (Canada). We estimate the model using conditional maximum likelihood and instrumental variables methods. We find some evidence of peer effects. The endogenous peer effect is large and significant in Mathematics but imprecisely estimated in the other subjects. Some contextual peer effects are also significant. In particular, for most subjects, the average age of peers has a negative effect on own test score. Using calibrated Monte Carlo simulations, we find that high dispersion in group sizes helps with potential issues of weak identification. Copyright © 2012 John Wiley & Sons, Ltd.


Evaluating peer effects in academic achievement is important for parents, teachers and schools. These effects also play a prominent role in policy debates concerning ability tracking, racial integration and school vouchers (for a recent survey, see Epple and Romano, 2011). However, despite a growing literature on the subject, the evidence regarding the magnitude of peer effects on student achievement is mixed (e.g. Sacerdote, 2001; Hanushek et al., 2003; Stinebrickner and Stinebrickner, 2006; Ammermueller and Pischke, 2009). This lack of consensus partly reflects various econometric issues that any empirical study on peer effects must address. Identifying and estimating peer effects raises three basic challenges. First, the relevant peer groups must be determined. Who interacts with whom? Second, peer effects must be distinguished from confounding factors. In particular, spurious correlation between students' outcomes may arise from self-selection into groups and from common unobserved shocks. Third, identifying the precise type of peer effect at work may be hard. Simultaneity, also called the reflection problem by Manski (1993), may prevent separating contextual effects, i.e. the influence of peers' characteristics, from the endogenous effect, i.e. the influence of peers' outcome. This issue is important since only the endogenous effect generates a social multiplier. Researchers have adopted various approaches to solve these three issues; we discuss the methods and results of previous studies in more detail in the next section. As will be clear, however, there is no simple methodological answer to these three challenges.

In this paper, we provide, to our knowledge, the first application of a novel approach developed by Lee (2007) for identifying and estimating peer effects. In principle, the approach is promising, as it allows one to solve both the problem of correlated effects and the reflection problem with standard observational (non-experimental) data. Moreover, the exclusion restrictions imposed by the model are explicitly derived from its structural specification and provide natural instruments. The econometric model does, however, rely on a number of crucial assumptions, which makes its confrontation with real data particularly important. We empirically assess the approach using original administrative data on test scores at the end of secondary school in the Canadian Province of Québec. We investigate the presence of peer effects in student achievement in Mathematics, Science, French, and History. In the process, we also provide new economic insights regarding the sources of identification in the model. This matters in particular in assessing its robustness to alternative (nonlinear) approaches.

The econometric model relies on three key assumptions. First, individuals interact in groups known to the modeler. This means that the population of students is partitioned into groups (e.g. classes, grade levels) and that students are affected by all their peers in their group but by no one outside it. This assumption is typical in studies of academic achievement but clearly arises from data constraints. Second, each individual's peer group is everyone in his group excluding himself. While this assumption seems innocuous and has been used in most empirical studies, it is a key source of identification in the model, as will become clear below. In fact, it is a main source of difference between Manski's (1993) and Lee's models. Manski's approach can be interpreted as one in which each individual's peer group includes himself.1 Third, an individual's outcome is determined by a linear-in-means model with group fixed effects. Thus, the test score of a student is affected by his characteristics and by the average test score and characteristics in his peer group. In addition, it may be affected by any kind of correlated group-level unobservables.

Lee (2007) shows that peer effects are identified in such a framework when there are sufficient groups of different sizes. One important contribution of our paper is to clarify the economic intuition behind identification. Regarding the estimation of parameters, however, one potentially important limitation of the method is that the peer effect estimates may converge in distribution at low rates when the average group size is large relative to the number of groups in the sample (Lee, 2007). This is also intuitive: whether or not the individual is excluded from his peer group matters little when the group is large.

Here two remarks are in order. First, these results are to be distinguished from the idea that the group size is a factor in a school's production function (e.g. Krueger, 2003). In Lee's model, the effects of group sizes which are separable from the peer effects are controlled for by fixed effects in the structural model. Second, Lee's identification method differs from the variance contrast approach developed by Graham (2008). The basic idea in this approach is that peer effects will induce intra-group dependencies in behavior that introduce variance restrictions on the error terms. These restrictions are used to identify the composite (endogenous + contextual) social interaction effects under the assumption that the variance matrix parameters are independent of the reference group size.

We use administrative data on academic achievement for a large sample of secondary schools in the Province of Québec obtained from the Ministry of Education, Recreation and Sports (MERS). Our dependent variables are individual scores on four standardized tests taken in June 2005 (Mathematics, Science, French and History) by fourth- and fifth-grade secondary school students. All fourth- and fifth-grade students in the province must pass these tests to graduate. One advantage of these data is that all candidates in the province take the same exams, no matter what their school and location. This feature effectively allows us to consider test scores as draws from a common underlying distribution. Another advantage is that our sample is representative and quite large. We have the scores of all students for a 75% random sample of Québec schools which, over the four subjects, yields 194,553 test scores for 116,534 students. In terms of interaction patterns, the structure of the data leads us to make the following natural assumption. We assume that the peer group of a student contains all other students in the same school qualified to take the same test in June 2005. In practice, a small number of students postpone test-taking to August 2005. We extend Lee's methodology in the empirical modeling to address this issue. However, since the difference between observed group sizes and actual group sizes is small, the correction has little effect on the results. Following Lee (2007), we estimate the model in two ways: through generalized instrumental variables (IV) and, under stronger parametric conditions, through conditional maximum likelihood robust to non-normal disturbances (pseudo CML).

Our results are mixed though consistent with the model. We do provide evidence of some endogenous and contextual peer effects. Based on pseudo CML estimates, we find that the endogenous peer effect is positive, significant and quite high in Mathematics (0.83). Moreover, it is within the range of previous estimates (see Sacerdote, 2011, for a recent survey). However, the effect is smaller and non-significant in History (0.64), French (0.30) and Science (−0.23).2 Endogenous peer effect estimates obtained from IV methods are highly imprecise with our data, even in Mathematics. The higher precision of our pseudo CML estimates is consistent with results in Lee (2007) showing that CML estimators are asymptotically more efficient than IV estimators. As regards contextual peer effects, we find evidence that some of them matter, based on both pseudo CML and IV estimators. For instance, results from pseudo CML indicate that interacting with older students (a proxy for repeaters) has a negative effect on own test score in all subjects except Mathematics (not significant).

It is remarkable that we are able to identify some peer effects even though the average group size is large relative to the number of groups. However, there is also much dispersion in group sizes within our samples. We suspect that this helps identification. We study this issue systematically through Monte Carlo simulations. We find that increasing group size dispersion does improve the precision of estimates.

The remainder of the paper is organized as follows. We discuss past research in Section 2 and present our econometric model and the estimation methods in Section 3. We describe our dataset in Section 4. We present our empirical results in Section 5 and run Monte Carlo experiments in Section 6. We conclude in Section 7.


In this section we give a brief overview of the recent literature on student achievement and peer effects, and we explain how our study complements and enhances current knowledge on peer interactions in academic outcomes.3

As discussed above, measuring peer effects is complex as it raises three basic interrelated problems: the determination of reference groups, the problem of correlated effects and the reflection problem. The choice of reference groups is often severely constrained by the availability of data. In particular, there are still few databases providing information on the students' social networks; the Add Health dataset is an exception (see, for example, Calvó-Armengol et al., 2009; Lin, 2010).4 For this reason, many studies focus on the grade-within-school level (e.g. Hanushek et al., 2003; Angrist and Lang, 2004). Other studies analyze peer effects at the classroom level (e.g. Kang, 2007; Ammermueller and Pischke, 2009). The administrative data we use in this study do not provide information on classes or teachers. Therefore, we assume that for each subject the relevant reference group for a student taking the test contains all other students in the same school who have completed all courses in the subject matter by June 2005. Thus, given that the reference group is likely to include students from other classes, one should probably expect peer effects to be smaller than at the classroom level.5

Two main strategies have been used to handle the problem of correlated effects. A first strategy has been to exploit data where students are randomly or quasi-randomly assigned within their groups (e.g. Sacerdote, 2001; Zimmerman, 2003; Kang, 2007). Estimated contextual effects based on randomly assigned roommates are usually small though significant. However, Stinebrickner and Stinebrickner (2006) have argued that these studies tend to underestimate true peer effects, as the true influence of roommates is unclear. A second strategy uses observational data to estimate peer effects. This approach is usually based on two assumptions. First, fixed effects allow correlated effects to be taken into account. With cross-section data, these effects are usually defined at a level higher than peer groups; otherwise, peer effects are absorbed in the fixed effects and therefore cannot be identified. For instance, Ammermueller and Pischke (2009) introduce school fixed effects to estimate peer effects at the class level for fourth graders in six European countries. In contrast, our model allows the inclusion of fixed effects at the peer group level even with cross-section data. This is so because each student within a group has his own reference group (since he is excluded from it). The second assumption is that one observes exogenous shocks to peer group composition, which allow identification of a composite (endogenous + contextual) peer effect. The strategy uses either cross-section or panel data. With cross-section data, demographic variations across grades but within schools are usually exploited (see Bifulco et al., 2011). With panel data, demographic variations across cohorts but within school grades are usually exploited (see Hanushek et al., 2003).

The reflection problem is handled using two main strategies. In most papers, no solution for this difficult problem is provided. Rather, researchers estimate a reduced-form linear-in-means model, and no attempt is made to separate the contextual and endogenous peer effects. Only composite parameters are estimated (Sacerdote, 2001; Ammermueller and Pischke, 2009). Note, however, that a number of these papers (often implicitly) assume that there are no contextual effects. In this case, the composite parameter(s) identify the endogenous peer effect. In a second strategy, one uses instruments to obtain consistent estimates of the endogenous peer effect (e.g. Evans et al., 1992; Gaviria and Raphael, 2001). The problem here is to choose suitable instruments. For instance, Rivkin (2001) argues that the use of metropolitan-wide aggregate variables as instruments in the Evans et al. (1992) study exacerbates the biases in peer effect estimates. In our paper, we provide some results based on instrumental methods. However, our instruments are naturally derived from the structure of the model.

In short, various strategies have been proposed to address the three basic issues that occur in the estimation of peer effects. But most rely on strong assumptions that are difficult to motivate and may not hold in practice. Some of them require panel data, while others rely on experiments that randomly allocate students within their peer group. This makes the results in Lee (2007) particularly interesting, as they show that both endogenous and contextual peer effects may be fully identified even with observational data in cross-section.


3.1 Econometric Model

We review and adapt the structural model suggested by Lee in the context of our application. Lee's model builds on and extends the standard linear-in-means model of peer effects (Moffitt, 2001) to groups with various sizes. The set of students {i = 1, …, M} is assumed to be partitioned into groups of peers indexed by r = 1, …, R. Let Mr be the rth group of peers, of size mr. All students in the same group have the same number of peers since they interact with all others in the group. We assume that student i who belongs to group r is excluded from his own reference group. Let Mri be student i's group of peers, of size mr − 1. A peer is any fellow student whose academic performance and personal characteristics may affect i's performance. Let yri be the test score obtained by student i. Let xri be a 1 × K vector of characteristics of i and Xr be the mr × K matrix of individual characteristics. For expository purposes, the model is first presented with a single characteristic (K = 1), the student's family socioeconomic background. Another departure from the linear-in-means model is the inclusion of a term αr that captures all group-invariant unobserved variables (e.g. same learning environment, similar preferences of school or motivation towards education). The error term εri reflects other unobservable characteristics associated with i.

We do not change any other assumption of the linear-in-means model. In particular, we assume that a student's performance on the standardized test may be affected by the average performance in his group of reference, by his family socioeconomic background, and by the average socioeconomic background in his group. Formally, the basic structural equation is given by

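$$y_{ri} = \beta\,\frac{1}{m_r-1}\sum_{j\in M_{ri}} y_{rj} + \gamma\,x_{ri} + \delta\,\frac{1}{m_r-1}\sum_{j\in M_{ri}} x_{rj} + \alpha_r + \varepsilon_{ri}, \qquad i = 1,\dots,m_r;\; r = 1,\dots,R, \tag{1}$$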

where β captures the endogenous effect, γ the individual effect and δ the contextual effect. Observe that equation (1) can be derived from the first-order conditions of a choice-theoretic non-cooperative (Nash) model where each student's performance is obtained from the maximization of his quadratic utility function, which depends on his individual characteristics, his performance and his reference group's mean performance and mean characteristics.

Importantly, we assume strict exogeneity of $m_r$ and $\{x_{ri} : i = 1, \dots, m_r\}$ conditional on the unobserved effect $\alpha_r$, i.e. $E(\varepsilon_{ri} \mid X_r, m_r, \alpha_r) = 0$. This exogeneity assumption can notably accommodate situations where peer group size is endogenous. Suppose that, everything else equal, brighter students attend smaller schools, i.e. schools where the cohort of students eligible to take the province-wide test in the subject matter (our peer groups) is small. In this case, peer group size $m_r$ may well depend on unobserved common characteristics of the student's group: $E(\alpha_r \mid m_r) \neq E(\alpha_r)$. Our model allows for this type of correlation. However, conditional on these common characteristics, peer group size $m_r$ is assumed to be independent of the student's idiosyncratic unobserved characteristics: $E(\varepsilon_{ri} \mid m_r, \alpha_r) = 0$. We maintain this assumption throughout our analysis.

To eliminate group-invariant correlated effects, we next apply a within transformation to equation (1). In particular, as we noted above, when the effect of group size is separable from peer and individual effects, it is captured by αr. The model can also address the problem of selection or endogenous peer group formation. For instance, school choice may depend on some unobserved factors specific to a school (e.g. reputation, unobserved quality) and determine the type of students who are attracted by these schools. The advantage of the within transformation is that we compare students of the same type. This transformation also controls for common environment effects. Resources available at the school level (e.g. teaching, physical infrastructure) may affect the performance of all the students. Again, by comparing students within the same school, we can abstract from these effects. The within reduced-form equation for students in the rth group can be written as

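$$y_{ri} - \bar y_r = \frac{\gamma - \delta/(m_r-1)}{1+\beta/(m_r-1)}\,(x_{ri} - \bar x_r) + \frac{1}{1+\beta/(m_r-1)}\,(\varepsilon_{ri} - \bar\varepsilon_r), \tag{2}$$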

where the means $\bar y_r$, $\bar x_r$ and $\bar\varepsilon_r$ are computed over all students in the group. Now assume that γβ + δ ≠ 0. For each group size $m_r$, the reduced form identifies only one composite parameter, the slope $\frac{(m_r-1)\gamma-\delta}{m_r-1+\beta}$. At least three distinct group sizes are thus necessary to identify the three structural parameters β, γ and δ.6
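To make the counting argument concrete, the Python sketch below takes noise-free composite slopes, i.e. the coefficient ((m_r − 1)γ − δ)/(m_r − 1 + β) of equation (2), for a few group sizes and solves for (β, γ, δ). The parameter values and group sizes are hypothetical, and the helper names are ours. Multiplying out the slope gives one equation per group size that is linear in the unknowns, so three distinct sizes yield a square system that is nonsingular when γβ + δ ≠ 0, while two sizes leave it rank-deficient.

```python
import numpy as np

def composite_slope(m, beta, gamma, delta):
    """Within-group slope of y on x for a group of size m, from equation (2)."""
    return ((m - 1) * gamma - delta) / (m - 1 + beta)

def recover_structural(sizes, slopes):
    """Recover (beta, gamma, delta) from composite slopes at several group sizes.

    Each size m contributes one equation that is linear in the unknowns:
        (m - 1) * pi = (m - 1) * gamma - delta - beta * pi
    Also returns the rank of the system, which must be 3 for identification.
    """
    sizes = np.asarray(sizes, dtype=float)
    slopes = np.asarray(slopes, dtype=float)
    A = np.column_stack([sizes - 1, -np.ones_like(sizes), -slopes])
    rhs = (sizes - 1) * slopes
    (gamma, delta, beta), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return beta, gamma, delta, np.linalg.matrix_rank(A)

true = dict(beta=0.4, gamma=1.0, delta=0.3)   # hypothetical values

# Three distinct sizes: full rank, the true parameters are recovered
sizes = [5, 20, 100]
print(recover_structural(sizes, [composite_slope(m, **true) for m in sizes]))

# Two distinct sizes: rank 2, the three parameters are not identified
sizes = [5, 20]
print(recover_structural(sizes, [composite_slope(m, **true) for m in sizes]))
```

In actual data the slopes are estimated with noise, so precision depends on how different the group sizes are, which is the point examined in our Monte Carlo experiments.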

3.2 Interpretation of Identification

The fact that the parameters of the structural within equation (2) may be fully identified is quite surprising and deserves some elaboration. Indeed, under the alternative assumption that means are inclusive, i.e. i ∈ Mri, peers are the same for everyone in a group (Mri = Mr), and peer effects cannot be separated out from group fixed effects. Thus, the seemingly innocuous assumption that the individual is excluded from his own peer group allows us to solve two difficult identification problems: distinguishing true peer effects from correlated effects, and further distinguishing endogenous from contextual peer effects. Intuitively, where does identification come from?

Suppose first that the endogenous effect is absent (β = 0). Note that each individual has different peers: i ≠ k implies that Mri ≠ Mrk. A first key observation is that, within a group, individual attributes xi are perfectly negatively correlated with mean peer attributes $\frac{1}{m_r-1}\sum_{j\in M_{ri}} x_{rj}$.7 Thus students with an ability above average necessarily have peers with a mean ability below average, and vice versa. If the individual and the contextual effects γ and δ are positive, this negative correlation tends to reduce the dispersion in outcomes. In such a group setting, peer effects lower the difference in achievement between high- and low-ability students.8 Formally, when peer effects are introduced, the impact of the difference in attributes on the difference in outcomes changes from γ to γ − δ/(mr − 1) (see equation (2)). The second key observation is that this reduction is stronger in smaller groups. The variance in mean peer attributes is simply higher in smaller groups, reflecting the relatively larger effect of excluding one individual from the mean. As group size increases, mean peer attributes converge to the group mean, and peer effects have increasingly less bite on how differences in covariates affect differences in outcomes. Variation in group sizes can therefore be used to identify contextual peer effects.
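A quick numerical check of the two observations, using hypothetical normally distributed attributes: the leave-one-out peer mean (Σx − x_i)/(m − 1) is an exact decreasing linear function of own attributes, so the within-group correlation is −1, and its within-group standard deviation is 1/(m − 1) times that of own attributes, so it shrinks as the group grows.

```python
import numpy as np

def peer_means(x):
    """Leave-one-out mean of x over the other m - 1 members of one group."""
    x = np.asarray(x, dtype=float)
    return (x.sum() - x) / (x.size - 1)

rng = np.random.default_rng(0)
for m in (3, 10, 100):
    x = rng.normal(size=m)                      # hypothetical attributes
    xbar_peer = peer_means(x)
    corr = np.corrcoef(x, xbar_peer)[0, 1]
    ratio = xbar_peer.std() / x.std()
    # corr is -1 for every m; ratio equals 1/(m - 1) and shrinks with m
    print(m, round(corr, 6), round(ratio, 6), round(1 / (m - 1), 6))
```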

Next, consider the reflection problem. Observe that outcomes are subject to a similar negative correlation: within a group, students with grades above average necessarily have peers with mean grades below average. Thus, if β > 0, endogenous peer effects lead to a further reduction in outcome dispersion. However, simultaneity now implies that this reduction in impact is nonlinear in β: the impact goes from γ − δ/(mr − 1) to (γ − δ/(mr − 1))/(1 + β/(mr − 1)) (see equation (2)). The different ways in which the two reductions vary with group size can then be used to distinguish endogenous from contextual peer effects.

Finally, this understanding is useful in assessing the robustness of the identification strategy to changes in the econometric model. In particular, it is easy to see that if xi < xk, then the distribution of attributes in i's peer group Mri first-order stochastically dominates the distribution in Mrk. Thus identification is likely to hold, in general, if we replace the mean in equation (1) by the median, the variance, or many other statistics of the distribution.9
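The dominance claim can be checked mechanically. For two samples of equal size, first-order dominance of the empirical distributions amounts to every order statistic of one being at least as large as the matching order statistic of the other. The sketch below, on a hypothetical group of six students, verifies this for every pair with x_i < x_k and, as one example of a statistic other than the mean, compares peer medians.

```python
import numpy as np

def peer_sample(x, i):
    """Sorted attributes of student i's peers (everyone in the group except i)."""
    return np.sort(np.delete(x, i))

def dominates(a, b):
    """First-order dominance for two equal-size samples: every order
    statistic of a is at least as large as the matching one of b."""
    return bool(np.all(a >= b))

x = np.array([3.0, 1.0, 4.0, 1.5, 9.0, 2.6])   # hypothetical attributes
for i in range(x.size):
    for k in range(x.size):
        if x[i] < x[k]:
            # i's peers contain x_k where k's peers contain the smaller x_i
            assert dominates(peer_sample(x, i), peer_sample(x, k))
            assert np.median(peer_sample(x, i)) >= np.median(peer_sample(x, k))
print("dominance holds for all pairs")
```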

3.3 Treatment of Missing Values

One problem we face in our sample is that we do not always observe the scores of all students within a group. For instance, some students may postpone test-taking to the next session due to illness. We use a correction first developed by Davezies et al. (2009) to allow for this possibility. Our setting is one where the total number of students (including those who postpone test-taking) in each group is known, but we only observe the test scores of subsamples Nr of size nr of each group Mr, with nr ≤ mr and $\sum_{r=1}^{R} n_r = N$. We assume that a student's decision to postpone exam-taking is random or depends on the observable strictly exogenous variables, conditional on the fixed group effect. We show how to adapt Lee's analysis to this more general setting. Let Lr be the complement of Nr in Mr, i.e. Lr = Mr \ Nr.10 The structural equation becomes

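$$y_{ri} = \beta\,\frac{1}{m_r-1}\sum_{j\in N_r,\, j\neq i} y_{rj} + \gamma\,x_{ri} + \delta\,\frac{1}{m_r-1}\sum_{j\in N_r,\, j\neq i} x_{rj} + \alpha_r^* + \varepsilon_{ri}, \qquad i \in N_r, \tag{3}$$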

where i now denotes an observed individual in the sample (rather than any member of the rth group) and $\alpha_r^* = \alpha_r + \frac{1}{m_r-1}\sum_{j\in L_r}\left(\beta y_{rj} + \delta x_{rj}\right)$ is the new group fixed effect. Effects stemming from unobserved individuals are the same for all the individuals observed in the sample from the rth group, and they are therefore picked up by the group fixed effect. Hence, under our assumptions, estimators remain consistent even if we do not observe test scores for all students in each group. Using the within transformation, one obtains the same equation as (2), but where the means $\bar y_r$, $\bar x_r$ and $\bar\varepsilon_r$ are computed over the observed students only.
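The argument that unobserved students are absorbed by the fixed effect can be verified in a single simulated group. In this sketch (hypothetical parameters; two of twelve students treated as postponing the test), scores are generated from the full structural model (1) with all m_r students. We then keep only the observed students, demean over them, and check that the within relation (1 + β/(m_r − 1))(y_ri − ȳ_r) = (γ − δ/(m_r − 1))(x_ri − x̄_r) + (ε_ri − ε̄_r) holds exactly, with weights still based on the true size m_r.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, gamma, delta = 0.5, 1.0, 0.4      # hypothetical parameters
m = 12                                  # true group size (all test candidates)
observed = np.arange(m) >= 2            # hypothetical: the first 2 students postpone

# Generate scores from the full model (1): (I - beta G) y = gamma x + delta G x + alpha + eps
x = rng.normal(size=m)
eps = rng.normal(size=m)
alpha = 3.0                             # group fixed effect
G = (np.ones((m, m)) - np.eye(m)) / (m - 1)
y = np.linalg.solve(np.eye(m) - beta * G, gamma * x + delta * G @ x + alpha + eps)

# Keep only observed students and demean over them (J_r applied to N_r)
yo, xo, eo = y[observed], x[observed], eps[observed]
yt, xt, et = yo - yo.mean(), xo - xo.mean(), eo - eo.mean()
u = 1 / (m - 1)                         # weights use the true size m_r
lhs = (1 + beta * u) * yt
rhs = (gamma - delta * u) * xt + et
print(np.allclose(lhs, rhs))            # True: unobserved students drop out
```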

3.4 Estimation Methods

3.4.1 CML Estimator

We consider estimation under both pseudo conditional maximum likelihood (or CML) and instrumental variables (or IV) identification conditions.

To present pseudo CML and IV estimators, it is easier to express equation (3) in matrix notation. We now allow for any number of characteristics, so that γ is a K × 1 vector of individual effects and δ a K × 1 vector of contextual ones. Recall that in this setting students are affected by all others in their group and by none outside of it. This means that the observed social interactions can be modeled as an N × N block-diagonal matrix G = Diag(G1, …, GR), such that for all r, Gr has elements $g_{rij} = 1/(m_r-1)$ if i ≠ j and $g_{rii} = 0$. In other terms, $G_r = \frac{1}{m_r-1}\left(\iota_{n_r}\iota_{n_r}' - I_{n_r}\right)$, where $\iota_{n_r}$ is an nr × 1 vector of ones and $I_{n_r}$ is the identity matrix of dimension nr. Equation (3) can be rewritten in matrix form as follows:
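A minimal sketch of how G can be assembled, assuming students are stored contiguously by group and that the observed and true sizes n_r and m_r are known for each group; the helper names are hypothetical. Because the weights use the true size m_r, the rows of G_r sum to (n_r − 1)/(m_r − 1), which equals 1 only when no student in the group is missing.

```python
import numpy as np

def peer_matrix(n_r, m_r):
    """G_r for one group with n_r observed students and true size m_r:
    off-diagonal weights 1/(m_r - 1), zero diagonal."""
    iota = np.ones((n_r, 1))
    return (iota @ iota.T - np.eye(n_r)) / (m_r - 1)

def block_peer_matrix(n_sizes, m_sizes):
    """Block-diagonal G = Diag(G_1, ..., G_R) of dimension N = sum of n_r."""
    N = sum(n_sizes)
    G = np.zeros((N, N))
    start = 0
    for n_r, m_r in zip(n_sizes, m_sizes):
        G[start:start + n_r, start:start + n_r] = peer_matrix(n_r, m_r)
        start += n_r
    return G

# Three hypothetical groups with (observed, true) sizes (3, 3), (4, 5), (2, 4)
G = block_peer_matrix([3, 4, 2], [3, 5, 4])
print(G.shape)           # (9, 9)
print(G.sum(axis=1))     # each row sums to (n_r - 1)/(m_r - 1)
```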

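$$y_r = \beta G_r y_r + X_r\gamma + G_r X_r\delta + \alpha_r^*\,\iota_{n_r} + \varepsilon_r, \qquad r = 1,\dots,R, \tag{4}$$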

where $y_r = (y_{r1}, \dots, y_{rn_r})'$ and $\varepsilon_r = (\varepsilon_{r1}, \dots, \varepsilon_{rn_r})'$.

Applying the operator matrix $J_r = I_{n_r} - \frac{1}{n_r}\iota_{n_r}\iota_{n_r}'$ allows us to obtain deviations with respect to the mean for the observed group members. Pre-multiplying equation (4) by Jr eliminates the group fixed effect and yields

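$$J_r y_r = \beta J_r G_r y_r + J_r X_r\gamma + J_r G_r X_r\delta + J_r\varepsilon_r, \qquad r = 1,\dots,R. \tag{5}$$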

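Elementary linear algebra tells us that $J_r G_r = -\frac{1}{m_r-1}J_r$. Substituting this identity into equation (5), we obtain

$$\left(1+\frac{\beta}{m_r-1}\right)J_r y_r = J_r X_r\left(\gamma - \frac{\delta}{m_r-1}\right) + J_r\varepsilon_r,$$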

which is equivalent to equation (2).
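The identity J_r G_r = −J_r/(m_r − 1) follows from J_r ι_{n_r} = 0. It can also be checked numerically; the sketch below does so for a hypothetical group with six observed students out of eight, together with two properties of J_r used above (idempotence and elimination of a constant fixed effect).

```python
import numpy as np

def centering(n):
    """J_r = I - (1/n) iota iota': deviations from the observed-group mean."""
    return np.eye(n) - np.ones((n, n)) / n

n_r, m_r = 6, 8          # hypothetical: 6 observed students in a group of 8
G_r = (np.ones((n_r, n_r)) - np.eye(n_r)) / (m_r - 1)
J_r = centering(n_r)

print(np.allclose(J_r @ G_r, -J_r / (m_r - 1)))   # True: J_r G_r = -J_r/(m_r - 1)
print(np.allclose(J_r @ J_r, J_r))                # True: J_r is idempotent
print(np.allclose(J_r @ np.ones(n_r), 0))         # True: J_r removes the fixed effect
```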

To derive the pseudo CML estimator, we assume (possibly wrongly) that the εirs are i.i.d. N(0, σ2). It follows that, given Xr, mr, and nr, the pseudo density of $J_r y_r$ is a multivariate normal distribution with mean $\frac{m_r-1}{m_r-1+\beta}J_rX_r\left(\gamma-\frac{\delta}{m_r-1}\right)$ and variance $\sigma^2\left(\frac{m_r-1}{m_r-1+\beta}\right)^2 J_r$.11 The pseudo log-likelihood function to be maximized can then be expressed as follows:

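$$\ln L(\beta,\gamma,\delta,\sigma^2) = c - \frac{N-R}{2}\ln\sigma^2 + \sum_{r=1}^{R}(n_r-1)\ln\left(1+\frac{\beta}{m_r-1}\right) - \frac{1}{2\sigma^2}\sum_{r=1}^{R}\left\|\left(1+\frac{\beta}{m_r-1}\right)J_r y_r - J_r X_r\left(\gamma-\frac{\delta}{m_r-1}\right)\right\|^2$$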

where c is a constant. This log-likelihood function excludes any fixed effects. It is a conditional log-likelihood function as it is conditional on the sufficient statistics $\bar y_r$ (as well as on the Xrs, the mrs and the nrs), for r = 1, …, R. Under the assumption that the εirs are i.i.d. N(0, σ2), i.e. that the normal density is correctly specified, Lee (2007) shows that the CML estimators of β, γ, δ and σ are consistent and asymptotically efficient under regularity conditions, provided there is sufficient variation in group sizes.
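The following sketch illustrates pseudo CML on simulated data. It is an illustration under our own assumptions, not the estimation code behind the results reported here: a hypothetical design with many small groups (sizes 2 to 8), no missing students (n_r = m_r), a fixed effect correlated with group size and mean attributes, and normal errors. For a given β, maximizing the log-likelihood over γ, δ and σ² reduces to least squares of (1 + β/(m_r − 1))J_r y_r on J_r x_r and −J_r x_r/(m_r − 1), so the sketch concentrates these parameters out and searches over β on a grid.

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, gamma0, delta0 = 0.5, 1.0, 0.5           # hypothetical true values

# Hypothetical design: 4,000 groups of sizes 2 to 8, everyone observed (n_r = m_r)
sizes = rng.integers(2, 9, size=4000)
gid = np.repeat(np.arange(sizes.size), sizes)   # group index of each student
u = 1.0 / (sizes[gid] - 1)                      # 1/(m_r - 1) for each student
counts = np.bincount(gid)

def group_mean(v):
    """Mean of v over each student's group, repeated for every member."""
    return (np.bincount(gid, weights=v) / counts)[gid]

# Simulate equation (1). Within a group, (I - beta G_r)^{-1} scales the group
# mean by 1/(1 - beta) and deviations from it by 1/(1 + beta/(m_r - 1)).
x = 5.0 * rng.normal(size=gid.size)
Gx = (group_mean(x) * sizes[gid] - x) * u       # leave-one-out peer mean
alpha = 0.3 * sizes[gid] + 2.0 * group_mean(x) + rng.normal(size=sizes.size)[gid]
b = gamma0 * x + delta0 * Gx + alpha + rng.normal(size=gid.size)
y = group_mean(b) / (1 - beta0) + (b - group_mean(b)) / (1 + beta0 * u)

yt, xt = y - group_mean(y), x - group_mean(x)   # J_r y_r and J_r x_r
N, R = gid.size, sizes.size

def concentrated_loglik(beta):
    """Log-likelihood with gamma, delta and sigma^2 concentrated out (up to c)."""
    w = (1 + beta * u) * yt
    Z = np.column_stack([xt, -u * xt])          # coefficients: (gamma, delta)
    coef, *_ = np.linalg.lstsq(Z, w, rcond=None)
    ssr = np.sum((w - Z @ coef) ** 2)
    log_jacobian = np.sum((sizes - 1) * np.log(1 + beta / (sizes - 1)))
    return log_jacobian - 0.5 * (N - R) * np.log(ssr / (N - R)), coef

# Grid search over beta (1 + beta/(m_r - 1) > 0 requires beta > -1 here)
grid = np.arange(-0.9, 0.95, 0.002)
loglik = [concentrated_loglik(bt)[0] for bt in grid]
beta_hat = grid[int(np.argmax(loglik))]
gamma_hat, delta_hat = concentrated_loglik(beta_hat)[1]
print(beta_hat, gamma_hat, delta_hat)
```

A grid over β is used only for transparency; a numerical optimizer over all parameters would serve as well. With large groups the log-likelihood is expected to become much flatter in β, in line with the slow convergence discussed earlier.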

Even if the assumed density of $J_r y_r$ is misspecified, the pseudo CML estimator is consistent provided that the conditional mean of the $J_r y_r$'s is correctly specified. This is the case since the normal density belongs to the linear exponential family (see Gouriéroux et al., 1984). Of course, the estimator is no longer asymptotically efficient. Moreover, one has to compute the robust covariance matrix using the sandwich formula $J^{-1}IJ^{-1}$, where J is minus the expectation of the Hessian matrix and I the expectation of the outer product of the gradient. A further advantage of this computation is that it allows us to check whether the apparent precision of CML estimators is driven by the normality assumption used in Lee (2007).
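To show the sandwich formula at work in a case where the answer can be verified in closed form, the sketch below applies it to a normal linear regression estimated by pseudo ML with heteroskedastic errors, a simpler model than ours chosen purely for illustration. J and I are estimated by the summed sample Hessian and the outer product of per-observation scores; for this model the block for the slope coefficients reproduces White's heteroskedasticity-robust (HC0) covariance matrix. The same `sandwich` helper (our own name) could be applied to the CML objective given its scores and Hessian, obtained for instance numerically.

```python
import numpy as np

def sandwich(scores, hessian):
    """Robust covariance J^{-1} I J^{-1}.
    scores: (n, p) per-observation gradients at the estimate;
    hessian: (p, p) sum of per-observation Hessians at the estimate."""
    J_inv = np.linalg.inv(-hessian)
    I = scores.T @ scores                     # outer product of the gradient
    return J_inv @ I @ J_inv

# Normal linear model y = X b + e fitted by pseudo ML; errors are
# heteroskedastic, so the assumed normal density is misspecified.
rng = np.random.default_rng(5)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))

b = np.linalg.solve(X.T @ X, X.T @ y)         # ML estimate of b
e = y - X @ b
s2 = e @ e / n                                # ML estimate of sigma^2

# Per-observation scores of the normal log-density for (b, sigma^2)
scores = np.column_stack([X * (e / s2)[:, None], -0.5 / s2 + 0.5 * e**2 / s2**2])
# Summed Hessian
H = np.zeros((3, 3))
H[:2, :2] = -X.T @ X / s2
H[:2, 2] = H[2, :2] = -X.T @ e / s2**2
H[2, 2] = n / (2 * s2**2) - e @ e / s2**3

V = sandwich(scores, H)
print(np.sqrt(np.diag(V))[:2])                # robust standard errors of b
```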

3.4.2 2SLS and Generalized 2SLS Estimators

Alternatively, the structural equation (4) can be estimated by instrumental variables (IV) methods. To see how these methods work, define an N × N block-diagonal matrix J = Diag(J1, …, JR). Concatenating equation (5) over all groups yields

$$Jy = \beta JGy + JX\gamma + JGX\delta + J\varepsilon, \tag{6}$$

where y (respectively X) is obtained by stacking the vectors yr (respectively the matrices Xr), for r = 1, …, R.

The reduced form of the model is

$$Jy = J(I_N-\beta G)^{-1}(X\gamma + GX\delta) + J(I_N-\beta G)^{-1}\varepsilon. \tag{7}$$

Identification can be given a natural interpretation in terms of instrumental variables. If i ∉ Mri and there are at least three different group sizes, $E(JGy \mid X)$ is not perfectly collinear with (JX, JGX) and the model is identified (see Bramoullé et al., 2009, for more details). Moreover, $JG^2X$ can be used as a matrix of valid instruments for JGy.12
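The role of the three group sizes can be seen directly in the instrument matrix. In this sketch (K = 1, all students observed, hypothetical sizes and attributes), J_r G_r x_r and J_r G_r² x_r equal J_r x_r scaled by −1/(m_r − 1) and 1/(m_r − 1)², so the columns of [JX, JGX, JG²X] are linearly dependent when only two group sizes are present and linearly independent with three.

```python
import numpy as np

def instrument_rank(sizes, seed=0):
    """Column rank of [JX, JGX, JG^2X] for groups of the given sizes (K = 1)."""
    rng = np.random.default_rng(seed)
    blocks = []
    for m in sizes:
        x = rng.normal(size=m)                          # hypothetical attributes
        G = (np.ones((m, m)) - np.eye(m)) / (m - 1)
        J = np.eye(m) - np.ones((m, m)) / m
        blocks.append(np.column_stack([J @ x, J @ G @ x, J @ G @ G @ x]))
    return np.linalg.matrix_rank(np.vstack(blocks))

print(instrument_rank([4, 4, 9, 9]))    # two distinct sizes: rank 2
print(instrument_rank([4, 9, 15]))      # three distinct sizes: rank 3
```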

One advantage of an IV approach over pseudo CML is that it requires less structure. Specifically, we do not need to assume a (possibly misspecified) normal density for the yrs. Nor do we use the structure of the error terms for identification purposes. Thus identification in this case is semi-parametric, or ‘distribution-free’. Of course, this comes at a price: the IV estimator is asymptotically less efficient than the CML estimator when the normality assumption holds, since the latter imposes more structure on the distribution of error terms.

In addition, we can derive a generalized IV estimator as proposed in Kelejian and Prucha (1998) and discussed in Lee (2007). Assuming homoskedasticity, it yields an asymptotically optimal (best) IV estimator and reduces to a two-step estimation method in our case. More precisely, our first step consists of a 2SLS estimation as described above, using as instruments $[JX, JGX, JG^2X]$. The second step consists of a G2SLS estimation using as instruments $[\widehat{JGy}, JX, JGX]$, where $\widehat{JGy} = JG(I_N-\hat\beta G)^{-1}(X\hat\gamma + GX\hat\delta)$ is computed from the reduced form (7) pre-multiplied by G, using the first-step estimates.
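A minimal sketch of the two-step procedure on simulated data follows. The design (many small groups of sizes 2 to 8, all students observed, attributes with a large variance, fixed effects correlated with group size) is hypothetical and chosen so that the instruments are strong; the helper functions, whose names are ours, apply G, J and (I − βG)^{-1} group by group without forming N × N matrices. With one characteristic, each step has as many instruments as regressors, so 2SLS reduces to simple IV.

```python
import numpy as np

rng = np.random.default_rng(4)
beta0, gamma0, delta0 = 0.5, 1.0, 0.5           # hypothetical true values

# Hypothetical design: 6,000 groups of sizes 2 to 8, all students observed
sizes = rng.integers(2, 9, size=6000)
gid = np.repeat(np.arange(sizes.size), sizes)
m = sizes[gid].astype(float)
counts = np.bincount(gid)

def group_mean(v):
    return (np.bincount(gid, weights=v) / counts)[gid]

def G(v):
    """G v: mean of v over the other m_r - 1 group members."""
    return (group_mean(v) * m - v) / (m - 1)

def J(v):
    """J v: deviation from the group mean."""
    return v - group_mean(v)

def solve_structural(b, beta):
    """(I - beta G)^{-1} b, valid when n_r = m_r: the group mean is scaled by
    1/(1 - beta) and deviations by 1/(1 + beta/(m_r - 1))."""
    return group_mean(b) / (1 - beta) + J(b) / (1 + beta / (m - 1))

x = 5.0 * rng.normal(size=gid.size)
alpha = 0.3 * m + 2.0 * group_mean(x) + rng.normal(size=sizes.size)[gid]
y = solve_structural(gamma0 * x + delta0 * G(x) + alpha + rng.normal(size=gid.size), beta0)

Jy = J(y)
W = np.column_stack([J(G(y)), J(x), J(G(x))])   # regressors of equation (6)

def iv(Z):
    """Just-identified IV estimate of (beta, gamma, delta) with instruments Z."""
    return np.linalg.solve(Z.T @ W, Z.T @ Jy)

# Step 1: 2SLS with instruments [JX, JGX, JG^2X]
b1, g1, d1 = iv(np.column_stack([J(x), J(G(x)), J(G(G(x)))]))

# Step 2: replace JG^2X by the fitted JGy implied by the reduced form (7)
jgy_hat = J(G(solve_structural(g1 * x + d1 * G(x), b1)))
b2, g2, d2 = iv(np.column_stack([jgy_hat, J(x), J(G(x))]))
print((b1, g1, d1), (b2, g2, d2))
```

With large groups, as in the application, the instruments become much weaker and the same steps would be expected to yield far noisier estimates.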


We gathered for this analysis original data from the Québec Government MERS. These administrative data provide detailed information on individual scores on standardized tests taken in June 2005 on four subjects (Mathematics, Science, French and History) by fourth- and fifth-grade secondary school students. They also include information on the age, gender, language spoken at home and socioeconomic status of students. Sampling was done in two steps. The population of interest is the set of all fourth- and fifth-grade secondary school students who are candidates to the MERS examinations in June 2005. This population consists of 152,580 students in total. In the first step, a 75% random sample of secondary schools offering fourth- and fifth-grade classes in the 2004–2005 school year was selected. In the second step, all fourth- and fifth-grade students in these schools were included. Overall, we have 194,553 individual test scores for 116,534 students.13

There are many advantages to the use of our data. First, all fourth- and fifth-grade students must take tests on these four subjects to qualify for secondary school graduation. This means that our results do not pertain to a selected sample of schools. In particular, both public and private school students have to take these tests. Another advantage is that the tests are standardized, i.e. designed and applied uniformly within the Province of Québec. We use test results gathered by the MERS, so there is less scope for measurement error with these data than with survey data on grades. Finally, although survey data may have provided information on a larger set of covariates, sample sizes in our study are larger than in typical school surveys.

Given the lack of information on the structure of relevant social interactions, we assume that the peer group for a student taking a test consists of all other students in the same school who are qualified to take the test in June 2005. Two test sessions are offered for those who completed coursework in the spring semester. We thus consider as belonging to the same group all those who belong to the same school and who take a subject test in one of the two consecutive sessions of June and August 2005. We know the number of students in each of these groups. But we only observe test scores for the set of students who took the test in June. Therefore we do not always observe the scores of all students within a group. We offered a correction for this problem in our discussion of the econometric model, and our empirical results below incorporate this correction. In any case, an overwhelming majority of the students do take the tests in June, so the correction has little effect on the results.

We use for this study French, History, Science and Mathematics test results as reported in the MERS administrative data. Students in a regular track take the History and Science tests in Secondary 4 and commonly take the French test in Secondary 5. The MERS administers a unique test to all secondary school students in French, History and Science. In contrast, it administers different tests in Mathematics, depending on academic options chosen early on by the students. We focus on students following the regular mathematical training, who take the Mathematics test in Secondary 5 (Math 514); this test completes their mathematical training for secondary school.

We provide descriptive statistics in Table 1. For each subject, the dependent variable in our econometric model is the score obtained on the provincial standardized test. The average score is between 70 % and 75 % in French, Science and History, and lower, about 62 %, in Mathematics. In samples for which the regular track for the test is Secondary 5 (respectively Secondary 4), the average age of students is close to 16 (respectively 15). Most students taking French and Mathematics (98 % and 96 %) are enrolled in Secondary 5, and most of those taking Science and History (92 % and 96 %) are enrolled in Secondary 4. Between 52 % and 55 % of students are female, and between 11 % and 13 % speak a language at home that differs from the language of instruction (Foreign variable).¹⁴ We use an index of socioeconomic status provided by the MERS and computed from 2001 census data. It combines the level of education of the mother (weight 2/3) and the job status of parents (weight 1/3). Low socioeconomic status corresponds to the three lowest deciles of the index, and high socioeconomic status to the three highest deciles. Between 30 % and 34 % of students come from a relatively high socioeconomic background and between 40 % and 42 % from a medium one.
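The index construction and the three categories described above can be sketched as follows. This is a hypothetical illustration: the text gives only the weights (2/3 and 1/3) and the decile cut-offs, so we assume here that both components are already on a comparable standardized scale and that deciles are computed over whatever units are passed to the function. The function names are ours.

```python
import numpy as np


def ses_index(mother_educ, parents_job):
    """Weighted SES index: 2/3 on mother's education, 1/3 on parents' job status.

    Both inputs are assumed (hypothetically) to be standardized scores on a
    common scale; the MERS's actual scaling is not described in the text.
    """
    mother_educ = np.asarray(mother_educ, dtype=float)
    parents_job = np.asarray(parents_job, dtype=float)
    return (2.0 / 3.0) * mother_educ + (1.0 / 3.0) * parents_job


def ses_category(index):
    """Low = three lowest deciles, high = three highest deciles, medium = the rest."""
    index = np.asarray(index, dtype=float)
    q30, q70 = np.quantile(index, [0.3, 0.7])
    return np.where(index <= q30, "low", np.where(index > q70, "high", "medium"))


rng = np.random.default_rng(0)
idx = ses_index(rng.normal(size=1000), rng.normal(size=1000))
cats = ses_category(idx)
print({c: int((cats == c).sum()) for c in ("low", "medium", "high")})
```

Because the cut-offs are quantiles of the units passed in, the shares are exactly 30/40/30 only for that population; applied to another population (for instance students rather than census units), the shares can differ.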

Table 1. Descriptive statistics

| Course | Variable | Mean | SD |
|---|---|---|---|
| French (Sec. 5) | Score | 72.647 | 14.086 |
| | Age | 16.142 | 0.488 |
| | Socioecon. index: perc. high | 0.328 | 0.469 |
| | Socioecon. index: perc. med. | 0.409 | 0.492 |
| | Gender (female = 1) | 0.549 | 0.500 |
| | Foreign | 0.111 | 0.310 |
| | Secondary 5 | 0.985 | 0.120 |
| | Number of observations | 41,778 | |
| | Number of groups | 314 | |
| | Size of true groups | 133.4 | 115.7 |
| | Size of observed groups | 133.1 | 115.4 |
| Science (Sec. 4) | Score | 74.689 | 17.671 |
| | Age | 15.255 | 0.610 |
| | Socioecon. index: perc. high | 0.338 | 0.470 |
| | Socioecon. index: perc. med. | 0.402 | 0.490 |
| | Gender (female = 1) | 0.527 | 0.499 |
| | Foreign | 0.127 | 0.333 |
| | Secondary 5 | 0.077 | 0.267 |
| | Number of observations | 54,981 | |
| | Number of groups | 378 | |
| | Size of true groups | 146.0 | 134.2 |
| | Size of observed groups | 145.5 | 133.7 |
| Math^a (Sec. 5) | Score | 62.088 | 15.83 |
| | Age | 16.272 | 0.574 |
| | Socioecon. index: perc. high | 0.303 | 0.460 |
| | Socioecon. index: perc. med. | 0.400 | 0.490 |
| | Gender (female = 1) | 0.540 | 0.498 |
| | Foreign | 0.111 | 0.314 |
| | Secondary 5 | 0.957 | 0.202 |
| | Number of observations | 15,771 | |
| | Number of groups | 361 | |
| | Size of true groups | 50.7 | 49.9 |
| | Size of observed groups | 49.9 | 49.7 |
| History (Sec. 4) | Score | 70.156 | 17.280 |
| | Age | 15.230 | 0.580 |
| | Socioecon. index: perc. high | 0.337 | 0.473 |
| | Socioecon. index: perc. med. | 0.403 | 0.491 |
| | Gender (female = 1) | 0.533 | 0.499 |
| | Foreign | 0.127 | 0.333 |
| | Secondary 5 | 0.044 | 0.205 |
| | Number of observations | 55,057 | |
| | Number of groups | 382 | |
| | Size of true groups | 144.6 | 134.8 |
| | Size of observed groups | 144.1 | 134.5 |

^a ‘Math’ refers to Math 514 (Secondary 5 regular course).

We observe test scores and characteristics of students taking each test in June 2005. Sample sizes are 41,778 for French, 54,981 for Science, 15,771 for Mathematics and 55,057 for History. We also observe the number of students who completed coursework but postponed the test to August 2005: 118 for French, 186 for History, 195 for Science and 160 for Mathematics. We observe between 314 and 382 peer groups, depending on the subject. The average group size is between 50 (Mathematics) and 146 (Science). The ratio of the number of groups to the average group size varies between 2.36 (French) and 7.23 (Mathematics). These ratios are relatively small, which suggests that our estimates could be subject to weak identification problems. The standard deviation of group size is quite large, however, ranging from 50 (Mathematics) to about 135 (Science and History). We expect such dispersion in group sizes to help identification. We analyze these issues in more detail in Section 6.
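The ratios quoted above follow directly from Table 1. The short computation below reproduces them from the number of groups and the mean size of observed groups; using the size of true groups instead would give 2.35 for French and 7.12 for Mathematics, so the reported figures appear to be based on observed group sizes.

```python
# Number of groups and mean size of observed groups, from Table 1.
TABLE1 = {
    "French": (314, 133.1),
    "Science": (378, 145.5),
    "Math": (361, 49.9),
    "History": (382, 144.1),
}


def groups_to_size_ratio(n_groups: int, avg_size: float) -> float:
    """Number of groups divided by the average group size."""
    return n_groups / avg_size


ratios = {course: groups_to_size_ratio(*vals) for course, vals in TABLE1.items()}
for course, r in ratios.items():
    print(f"{course:8s} {r:5.2f}")
```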


5.1 CML and Pseudo CML Estimates

Table 2 reports the results of maximum likelihood estimation with non-robust (CML) and robust (pseudo CML) standard errors. The estimated model is the linear-in-means model with group fixed effects, individual impacts, and endogenous and contextual peer effects. The estimated endogenous peer effect lies between −0.23 and 0.83. Using non-robust standard errors (in brackets), the endogenous effect is positive and significantly different from zero for Mathematics (0.827) and History (0.641). It is not significant for French (0.296) or Science (−0.231). Based on robust standard errors, it is no longer significant for History (p-value = 10.82%) but remains significant for Mathematics. For History, inference on this peer effect thus appears to be driven by the normality assumption. In general, standard errors are larger with pseudo CML than with CML, but the differences are modest.
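The conditional likelihood idea can be illustrated on simulated data. The sketch below is ours, not the authors' code, and rests on simplifying assumptions: a single regressor x, homoskedastic normal errors, and our own notation y_i = β ȳ_{−i} + γ x_i + δ x̄_{−i} + α_r + ε_i, where ȳ_{−i} and x̄_{−i} are averages over the other m_r = n_r − 1 members of group r. Taking deviations from group means removes α_r and gives (1 + β/m_r) ỹ_i = (γ − δ/m_r) x̃_i + ε̃_i. For a given β, γ and δ follow from least squares, so the log-likelihood can be concentrated on β and maximized by grid search; the term Σ_r m_r log(1 + β/m_r) is the Jacobian that distinguishes it from least squares. No (pseudo CML) standard errors are computed.

```python
import numpy as np


def simulate_deviations(sizes, beta, gamma, delta, sigma, rng):
    """Simulate groups and return (x_dev, y_dev) per group, in deviations from group means.

    Uses the closed form (1 + beta/m) y_dev = (gamma - delta/m) x_dev + e_dev, m = n - 1,
    so group fixed effects never need to be drawn.
    """
    groups = []
    for n in sizes:
        m = n - 1
        x = rng.normal(16.0, 0.5, size=n)  # age-like regressor (mean 16, variance 0.25)
        e = rng.normal(0.0, sigma, size=n)
        x_dev, e_dev = x - x.mean(), e - e.mean()
        y_dev = ((gamma - delta / m) * x_dev + e_dev) / (1.0 + beta / m)
        groups.append((x_dev, y_dev))
    return groups


def cml_estimate(groups, grid=None):
    """Maximize the concentrated conditional log-likelihood over a grid of beta values.

    Returns (beta_hat, gamma_hat, delta_hat, sigma2_hat).
    """
    if grid is None:
        grid = np.linspace(-0.9, 0.99, 1891)  # step of 0.001
    x = np.concatenate([g[0] for g in groups])
    y = np.concatenate([g[1] for g in groups])
    m = np.concatenate([np.full(len(g[0]), len(g[0]) - 1.0) for g in groups])
    X = np.column_stack([x, -x / m])  # coefficients are (gamma, delta)

    # For fixed beta the dependent variable is y + beta * y / m, so the OLS
    # coefficients and residuals are linear in beta: r(beta) = r0 + beta * r1.
    c0 = np.linalg.lstsq(X, y, rcond=None)[0]
    c1 = np.linalg.lstsq(X, y / m, rcond=None)[0]
    r0, r1 = y - X @ c0, y / m - X @ c1
    ssr = r0 @ r0 + 2.0 * grid * (r0 @ r1) + grid**2 * (r1 @ r1)

    # Each group of size n contributes m = n - 1 independent deviations and a
    # Jacobian factor (1 + beta / m) ** m.
    m_groups = np.array([len(g[0]) - 1.0 for g in groups])
    m_vals, counts = np.unique(m_groups, return_counts=True)
    log_jac = (np.log1p(grid[:, None] / m_vals[None, :]) * (counts * m_vals)).sum(axis=1)
    dof = m_groups.sum()
    loglik = log_jac - 0.5 * dof * np.log(ssr / dof)  # additive constants dropped

    k = int(np.argmax(loglik))
    beta_hat = grid[k]
    gamma_hat, delta_hat = c0 + beta_hat * c1
    return beta_hat, gamma_hat, delta_hat, ssr[k] / dof


rng = np.random.default_rng(42)
sizes = rng.integers(3, 17, size=4000)  # group sizes 3 to 16
groups = simulate_deviations(sizes, beta=0.35, gamma=-8.0, delta=-40.0, sigma=1.0, rng=rng)
print(cml_estimate(groups))
```

Computing the residuals once for y and once for y/m avoids refitting at every grid point, since the least-squares fit is linear in β.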

Table 2. Peer effects on student achievement: conditional maximum likelihood and pseudo conditional maximum likelihood

| | French | Science | Math | History |
|---|---|---|---|---|
| Endogenous effect | 0.296 | -0.231 | 0.827** | 0.641 |
| Contextual effects | | | | |
| Socioecon. index (high) | 16.613 | 8.941 | 29.310* | -6.367 |
| Socioecon. index (medium) | -4.765 | 22.156 | 18.246 | -6.713 |
| Gender (female = 1) | -24.870 | 14.852 | 15.558* | -11.837 |
| Secondary 5 | 167.926** | -0.334 | -6.080 | 24.041 |
| Individual effects | | | | |
| Socioecon. index (high) | 1.423** | 1.609** | 2.112** | 2.019** |
| Socioecon. index (medium) | 0.670** | 0.785** | 1.189** | 0.795** |
| Gender (female = 1) | 3.807** | 0.319 | 1.018** | -1.641** |
| Secondary 5 | 10.519** | 1.653** | 6.474** | 3.126** |

  • Notes:

  • CML non-robust standard errors in brackets; pseudo CML robust standard errors in parentheses.

  • Asterisks (** and *) indicate significance levels based on robust SE.

  • The dependent variable is the score in the June 2005 provincial secondary exams.

Two reasons may explain why the endogenous peer effect in Mathematics is significant in our sample. First, the standard error of the estimate is smaller in Mathematics than in other subjects. This is consistent with the fact that the average group size relative to the number of groups is close to three times smaller in Mathematics than in other subjects. Second, our endogenous effect estimate is much larger in Mathematics (0.83). How does this result compare with other studies? Sacerdote (2011) has recently surveyed studies of endogenous peer effects in test scores in primary and secondary schools based on linear-in-means models (see his Table 4.2). Interestingly, in most of the reported studies (five of six) that analyze achievement in both Mathematics and Reading, the endogenous peer effect is larger in Mathematics. In addition, this effect is often very high and exceeds the value we have estimated. For example, Hoxby (2000) reports a 1.7- to 6.8-point increase in own score in relation to a 1-point increase in the mean score of peers in some specifications. Betts and Zau (2004) find a 1.9-point increase associated with a 1-point increase in the mean math score of peers. On the other hand, Hanushek et al. (2003) obtain a Mathematics peer effect of 0.4.¹⁵ Our estimate thus lies in the middle to upper part of the range of previous estimates. Observe finally that our results in Mathematics are larger than those usually obtained in studies based on randomized experiments (e.g. Sacerdote, 2001; Zimmerman, 2003). One possible explanation is that the peers used in these papers are often people from the same dorm, who do not necessarily exercise significant influence on students' scholarly achievement.

The relatively large endogenous peer effect in Mathematics may reflect the fact that mathematics provides more opportunities for interactions among students. It may also reflect, probably more than in other subjects, general effects such as disruption. For instance, success in Mathematics is likely to require a great deal of concentration in class from the average student. Suppose now that a student with a low grade in Mathematics tends to disrupt learning through bad behavior or poor questions. His behavior may have large negative effects on his peers' scholarly achievement (see, for example, Lazear, 2001) and thus generate strong endogenous peer effects.

Regarding individual characteristics, most of them have a significant effect on test scores, and the signs of these effects essentially conform to expectations. All test scores decrease significantly with age. Since older students have often repeated a grade, being younger is a natural proxy for ability. Test scores are significantly higher for female students than for male students, except in History, where male students perform significantly better than female students. This is broadly consistent with results from previous studies. For instance, results from the 2000 Programme for International Student Assessment (PISA) show that Québec female students perform better than males on reading literacy tests, but that the differences in performance on Mathematics and Science tests are smaller and not significant (see Québec Government, 2001). Similarly, in our analysis, the difference in performance is quantitatively large in French but much smaller in the other disciplines. The performance of foreign students is, not surprisingly, significantly lower than that of non-foreign students on the French test, but higher in Science and History and not significantly different in Mathematics. Secondary 5 students tend to perform significantly better on all tests than Secondary 4 students, which reflects the positive impact of an additional year of schooling on test scores. Finally, students from a higher socioeconomic category perform significantly better on all tests.

As far as contextual variables are concerned, a few of them have a significant impact on student performance. The average age of other students has a negative and significant effect on all test scores except Mathematics, where it is positive but not significant. These results also conform to our expectations: when the number of repeaters rises (as reflected by an increase in the mean age of one's peers at a given grade level), own test scores tend to fall. The proportion of other students enrolled in Secondary 5 has a large, positive and significant effect on own score in French. Peers' socioeconomic background has little effect on own performance. The proportion of female students among peers has a positive and significant effect in Mathematics. When significant, contextual effects are always larger in magnitude than individual effects. This is not surprising, since a contextual coefficient captures the effect of a unit change in the characteristics of every other student in the group.¹⁶
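The point about magnitudes can be made concrete. A contextual coefficient δ measures the effect of a one-unit change in the peer average; for a binary characteristic, that requires every peer to switch. Switching a single peer moves the average over the n − 1 peers by 1/(n − 1), so its direct effect on own score is δ/(n − 1). The sketch below ignores the feedback through the endogenous effect, and the group size of 50 is a hypothetical value chosen to be close to the average Mathematics group in Table 1.

```python
def one_peer_effect(delta: float, n: int) -> float:
    """Direct effect on own score when one of the n - 1 peers switches a binary
    trait (e.g. female = 1), holding peers' scores fixed.

    The peer average moves by 1 / (n - 1), so the effect is delta / (n - 1).
    """
    return delta / (n - 1)


# Hypothetical illustration: Mathematics contextual gender effect from Table 2
# (15.558) in a group of 50 students.
print(round(one_peer_effect(15.558, 50), 2))  # 0.32
```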

5.2 Reflection Problem

One way of addressing the simultaneity problem without exploiting group size variations is to exclude at least one contextual variable from the outcome equation and to use it as an instrument for the average test score. We estimate a model similar to the one presented in Table 2 but excluding contextual effects that are not individually significant in the pseudo CML specification (i.e. for which the null that δ = 0 is not rejected); see Table 1 of the supplementary data Appendix, available online as supporting information. Using likelihood ratio tests, we reject the null that these δs are jointly equal to zero for French but not for the other subjects. This suggests that the exclusion restrictions may be valid for the latter samples. Therefore, the pseudo CML estimators provided in Table 1 of the supplementary Appendix should be consistent and asymptotically more efficient than those provided in Table 2 of the main text for the Science, Mathematics and History tests. The results, however, appear robust to these alternative specifications. Note, finally, that we could not have known a priori which exclusion restrictions were valid without estimating the full model.

Overall, this illustrates the value of Lee's solution to the reflection problem: estimating a model with both endogenous and contextual peer effects is needed to recover the different types of peer effects at work.

5.3 2SLS and G2SLS Estimates

Table 3 provides the 2SLS estimates of the linear-in-means model of peer effects with group fixed effects, individual impacts, and endogenous and contextual peer effects. In contrast to the CML and pseudo CML estimates of Table 2, none of the endogenous effects is statistically significant. This is consistent with Lee's (2007, p. 345) result that IV estimators are asymptotically less efficient than the CML estimator. The estimated individual effects are quite similar to the corresponding CML estimates. Some contextual effects are similar, while others differ. For instance, the proportion of other students in Secondary 5 still has a large positive effect on own French score and no significant effect in the other subjects. In contrast, the average age of peers now has a positive and significant effect on own score for most subjects, rather than a negative one. This could be explained by differences in the small-sample properties of the two methods, possibly aggravated by the imprecise estimation of the endogenous peer effect.
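An instrumental-variables counterpart can be sketched on simulated data. This is a hypothetical illustration with our own notation and instrument choice, not the paper's 2SLS or G2SLS procedure. With one regressor x and m_r = n_r − 1, the model in deviations from group means can be written ỹ_i = −β ỹ_i/m_r + γ x̃_i − δ x̃_i/m_r + ε̃_i, where ỹ_i/m_r is endogenous. Because the reduced-form coefficient on x̃_i is a nonlinear function of m_r, terms such as x̃_i/m_r² and x̃_i/m_r³ can serve as excluded instruments, but only if group sizes vary: with a single group size they would be collinear with the included regressors.

```python
import numpy as np


def simulate_deviations(sizes, beta, gamma, delta, sigma, rng):
    """Groups of (x_dev, y_dev) in deviations from group means (closed-form reduced form)."""
    groups = []
    for n in sizes:
        m = n - 1
        x = rng.normal(16.0, 0.5, size=n)
        e = rng.normal(0.0, sigma, size=n)
        x_dev, e_dev = x - x.mean(), e - e.mean()
        y_dev = ((gamma - delta / m) * x_dev + e_dev) / (1.0 + beta / m)
        groups.append((x_dev, y_dev))
    return groups


def two_sls(y, X, Z):
    """Two-stage least squares: project X on Z, then regress y on the fitted values."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


def iv_peer_effects(groups):
    """2SLS for y = -beta*(y/m) + gamma*x - delta*(x/m) + e, all in deviations.

    y/m is endogenous; x/m**2 and x/m**3 are the (hypothetical) excluded instruments.
    Returns (beta_hat, gamma_hat, delta_hat).
    """
    x = np.concatenate([g[0] for g in groups])
    y = np.concatenate([g[1] for g in groups])
    m = np.concatenate([np.full(len(g[0]), len(g[0]) - 1.0) for g in groups])
    X = np.column_stack([y / m, x, x / m])
    Z = np.column_stack([x, x / m, x / m**2, x / m**3])
    theta = two_sls(y, X, Z)
    return -theta[0], theta[1], -theta[2]


rng = np.random.default_rng(7)
sizes = rng.integers(3, 17, size=4000)  # group sizes 3 to 16
groups = simulate_deviations(sizes, beta=0.35, gamma=-8.0, delta=-40.0, sigma=1.0, rng=rng)
print(iv_peer_effects(groups))
```

With small groups of varying size, this estimator recovers the parameters, but it ignores the information in the Jacobian term that a likelihood approach uses, which is one way to see why IV tends to be less precise than CML.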

Table 3. Peer effects on student achievement: 2SLS estimation with group fixed effects

| | French | Science | Math | History |
|---|---|---|---|---|
| Endogenous effect | 1.378 | -0.509 | -0.037 | 0.787 |
| Individual effects | | | | |
| Socioecon. index (high) | 1.373** | 1.754** | 1.836** | 2.041** |
| Socioecon. index (medium) | 0.661** | 0.826** | 1.069** | 0.803** |
| Gender (female = 1) | 3.871** | 0.333** | 0.965** | -1.553** |
| Secondary 5 | 9.516** | 1.415** | 6.674** | 2.910** |
| Contextual effects | | | | |
| Socioecon. index (high) | 7.364 | 30.997* | 15.962** | -6.246 |
| Socioecon. index (medium) | -7.103 | 26.344* | 13.501* | -8.047 |
| Gender (female = 1) | -21.310* | 15.637 | 13.237** | 0.567 |
| Secondary 5 | 40.184 | -17.370 | 7.825 | 2.537 |
| Sargan test | 23.52 | 0.54 | 1.40 | 5.35 |
| Stock and Yogo test | 706.84 | 1055.92 | 464.43 | 660.40 |
| [Critical value for b = 0.05 at sign. level of 5%] | [18.37] | [18.37] | [18.37] | [18.37] |

  • Notes:

  • Robust standard errors in parentheses.

  • Asterisks (** and *) indicate significance levels.

  • The dependent variable is the score in the June 2005 provincial secondary exams.

Table 3 also reports two standard tests of the properties of the instrumental variables. We first look at Sargan tests of the validity of the instruments and the over-identifying restrictions of the model. We do not reject the null for Science, Mathematics and History, but we reject it for French. While this may indicate a specification problem in this last case, the test should be interpreted with caution given the likely slow convergence of IV estimates of peer effects. We then compute Stock and Yogo test statistics for weak identification. Defining a group of instruments as weak when the bias of the IV estimator relative to the bias of ordinary least squares exceeds a threshold b, say 5%, we reject the null that the instruments are weak for all subjects. We have also performed Hausman tests of the equality of the pseudo CML and G2SLS estimators. Under the null, both estimators are consistent but pseudo CML is asymptotically more efficient; under the alternative, G2SLS is consistent whereas pseudo CML is not. For each subject, we cannot reject the null, so we find no evidence of specification error in the model. Finally, G2SLS estimates are provided in Table 2 of the supplementary data Appendix; the results are qualitatively similar to those of Table 3.
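For reference, the textbook Hausman statistic compares an estimator that is efficient under the null (b_e, with covariance V_e) with one that is consistent under both hypotheses (b_c, V_c): H = (b_c − b_e)′(V_c − V_e)⁻¹(b_c − b_e), asymptotically χ² with as many degrees of freedom as the rank of V_c − V_e. The sketch below is a generic implementation with hypothetical numbers; it does not reproduce the paper's tests, and it uses a pseudo-inverse because V_c − V_e need not be invertible in finite samples.

```python
import numpy as np


def hausman(b_eff, V_eff, b_cons, V_cons):
    """Hausman statistic H = d' (V_cons - V_eff)^+ d, with d = b_cons - b_eff.

    b_eff, V_eff: estimator efficient under H0 (here, e.g., pseudo CML).
    b_cons, V_cons: estimator consistent under H0 and H1 (here, e.g., G2SLS).
    Returns (H, df); compare H with a chi-square(df) critical value.
    """
    d = np.atleast_1d(np.asarray(b_cons, dtype=float) - np.asarray(b_eff, dtype=float))
    dV = np.atleast_2d(np.asarray(V_cons, dtype=float) - np.asarray(V_eff, dtype=float))
    H = float(d @ np.linalg.pinv(dV) @ d)
    df = int(np.linalg.matrix_rank(dV))
    return H, df


# Hypothetical scalar example: estimates 0.5 and 0.7, variances 0.01 and 0.05.
print(hausman(0.5, 0.01, 0.7, 0.05))
```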


In this section we use simulations to study the effect of group sizes and their distribution on the precision and bias of our estimates. Lee (2007) shows that the CML and IV estimators may converge in distribution at low rates when the ratio of the number of groups to the average group size is small. Since this ratio varies between 2.36 and 7.23 in our samples, weak identification could in principle be a problem. However, the standard deviation of group sizes is also relatively large (see Table 1), and we suspect that this may help identification. To study these issues, we conduct two simulation exercises. First, we vary group sizes systematically and study how this affects the bias and precision of the estimators. To focus on the approach that provides the most reasonable findings in our empirical analysis, we report results for the CML estimator.¹⁷ We use uniform distributions, vary the size of the distribution's support and partly calibrate the simulation parameters on our data. Second, we look at the bias and precision of estimates in fully calibrated simulations, in which group sizes are exactly the same as in the data. Overall, while our analysis confirms Lee's earlier results, we also find a strong positive impact of the dispersion in group sizes on the strength of identification. In particular, conditional maximum likelihood performs well in the fully calibrated simulations. This suggests that the bias due to small-sample issues is likely low in the results presented in Table 2.

For each simulation exercise, we keep the number of observations fixed at about 42,000 and run 1000 replications. We first consider average sizes of 10, 20, 40, 80 and 120, and pick group sizes from the following intervals of decreasing length:

  • average size of 10: [3, 17], [5, 15], [7, 13] and [9, 11];
  • average size of 20: [3, 37], [8, 32], [13, 27] and [18, 22];
  • average size of 40: [3, 77], [12, 68], [21, 59], [30, 50] and [39, 41];
  • average size of 80: [3, 157], [18, 142], [33, 127], [48, 112] and [63, 97];
  • average size of 120: [3, 237], [28, 212], [53, 187], [78, 162] and [103, 137].

For each of the intervals described above, we proceed as follows:

  1. pick a group size from a uniform distribution whose support is defined by the minimum and maximum values of the interval;
  2. truncate this value by eliminating its decimal part;
  3. repeat steps 1 and 2 as long as the total number of observations is below or equal to 42,000.
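The three steps above can be sketched as follows. This is a minimal sketch under our own reading of the stopping rule (keep drawing while the running total is at most 42,000, so the last group pushes the total just above it); the function name is ours and NumPy's generator is an implementation choice. Because truncation drops the decimal part and the upper bound of a continuous uniform draw is never reached, the mean group size is about half a unit below the interval midpoint (about 9.5 rather than 10 for [3, 17]), which is consistent with picking sizes only "roughly" from a uniform distribution.

```python
import numpy as np


def draw_group_sizes(low, high, target=42_000, rng=None):
    """Draw integer group sizes for one interval [low, high].

    Step 1: draw from a uniform distribution on [low, high).
    Step 2: truncate the draw by dropping its decimal part.
    Step 3: repeat while the running total of observations is <= target.
    """
    rng = np.random.default_rng() if rng is None else rng
    sizes = []
    total = 0
    while total <= target:
        size = int(rng.uniform(low, high))  # int() drops the decimal part
        sizes.append(size)
        total += size
    return np.array(sizes)


rng = np.random.default_rng(0)
for low, high in [(3, 17), (5, 15), (7, 13), (9, 11)]:
    s = draw_group_sizes(low, high, rng=rng)
    print(f"[{low}, {high}]: {len(s)} groups, {s.sum()} obs, "
          f"mean {s.mean():.2f}, sd {s.std():.2f}, min {s.min()}")
```

Narrowing the interval at a fixed midpoint lowers the standard deviation of group sizes and raises the size of the smallest groups at the same time, which is why the two effects are hard to separate in this design.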

To reduce computing time, we assume that students differ only in age and gender. Age follows a normal distribution and gender a Bernoulli distribution. We calibrate the moments of these distributions on the sample of students taking the French test: average age is 16, the variance of age is 0.25 and the proportion of girls is 0.55. The structural parameters β, γ and δ are set close to the estimated coefficients for the French test: β = 0.35, γ_age = −8, γ_gender = 3.8, δ_age = −40, δ_gender = −25.

We assume that the values of ε in the structural equation are drawn randomly from a normal distribution with mean zero and variance σ² = 1. We generate the endogenous variable y from the reduced-form equation in deviation form.
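The data-generating step can be written in closed form. In our own notation (an assumption, since the paper's equations are not reproduced here), the structural equation for student i in group r is y_i = β ȳ_{−i} + γ x_i + δ x̄_{−i} + α_r + ε_i, where ȳ_{−i} and x̄_{−i} are averages over the other m_r = n_r − 1 members. Because ȳ_{−i} − ȳ_r = −(y_i − ȳ_r)/m_r, deviations from group means satisfy (1 + β/m_r) ỹ_i = (γ − δ/m_r) x̃_i + ε̃_i, and the fixed effect α_r drops out. The sketch below simulates one group with a single age-like regressor and checks the closed form against a direct solve of the structural system.

```python
import numpy as np


def simulate_group_deviation(n, beta, gamma, delta, sigma, rng):
    """Draw one group of size n; return (x, eps, x_dev, y_dev).

    y_dev uses the closed form (1 + beta/m) y_dev = (gamma - delta/m) x_dev + eps_dev,
    with m = n - 1; the group fixed effect cancels in deviations.
    """
    m = n - 1
    x = rng.normal(16.0, 0.5, size=n)  # age-like regressor: mean 16, variance 0.25
    eps = rng.normal(0.0, sigma, size=n)
    x_dev, eps_dev = x - x.mean(), eps - eps.mean()
    y_dev = ((gamma - delta / m) * x_dev + eps_dev) / (1.0 + beta / m)
    return x, eps, x_dev, y_dev


def solve_structural(x, eps, alpha, beta, gamma, delta):
    """Solve y = beta*W y + gamma*x + delta*W x + alpha + eps for one group,
    where W averages over the other n - 1 members (zero diagonal)."""
    n = len(x)
    W = (np.ones((n, n)) - np.eye(n)) / (n - 1)
    rhs = gamma * x + delta * (W @ x) + alpha + eps
    return np.linalg.solve(np.eye(n) - beta * W, rhs)


rng = np.random.default_rng(0)
for n in (3, 12, 40):
    x, eps, x_dev, y_dev = simulate_group_deviation(n, 0.35, -8.0, -40.0, 1.0, rng)
    y = solve_structural(x, eps, alpha=5.0, beta=0.35, gamma=-8.0, delta=-40.0)
    print(n, np.allclose(y - y.mean(), y_dev))
```

The coefficient on x̃_i, ((n_r − 1)γ − δ)/(n_r − 1 + β), varies with group size unless γβ + δ = 0, in which case it equals γ for every group; this is one way to see footnote 6.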

Looking at Table 4, we first compare simulation results across average group sizes and then examine how the estimators perform, for a given average group size, as dispersion in group size decreases. Separate horizontal panels in Table 4 pertain to different values of the average group size. We report the average estimated coefficient and standard error for the endogenous effect (first vertical panel), the contextual effect associated with age (second vertical panel) and the contextual effect associated with gender (third vertical panel). We find that even for the largest average group size (i.e. 120), CML may perform well in terms of bias and precision (first line in the last horizontal panel of Table 4). The biases of CML generally get larger as the average group size increases. The CML estimate of the endogenous effect reaches a plateau at 1. This is consistent with the fact that the CML estimator tends towards the naive OLS estimator as group sizes become larger. In general, peer effects are also less precisely estimated in large groups than in small groups.

Table 4. Group size variation: simulations using CML

| Avg. group size | Group size range | Endogenous effect: avg. coeff. | SE | Contextual effect, age: avg. coeff. | SE | Contextual effect, gender: avg. coeff. | SE |
|---|---|---|---|---|---|---|---|
| 10 | [3; 17] | 0.35 | 0.00 | −40.01 | 0.25 | −25.01 | 0.33 |
| 10 | [5; 15] | 0.35 | 0.00 | −40.00 | 0.35 | −24.99 | 0.74 |
| 10 | [7; 13] | 0.35 | 0.02 | −40.00 | 0.53 | −25.01 | 1.50 |
| 10 | [9; 11] | 0.57 | 0.38 | −40.27 | 1.79 | −26.97 | 5.78 |
| 20 | [3; 37] | 0.35 | 0.00 | −40.01 | 0.27 | −25.03 | 0.44 |
| 20 | [8; 32] | 0.35 | 0.02 | −40.00 | 0.50 | −25.02 | 1.10 |
| 20 | [13; 27] | 0.35 | 0.09 | −39.95 | 1.23 | −25.04 | 2.11 |
| 20 | [18; 22] | 0.94 | 1.56 | −37.98 | 5.37 | −28.55 | 8.47 |
| 40 | [3; 77] | 0.35 | 0.00 | −39.98 | 0.41 | −25.03 | 0.65 |
| 40 | [12; 68] | 0.36 | 0.07 | −39.97 | 1.42 | −25.05 | 1.66 |
| 40 | [21; 59] | 0.39 | 0.20 | −39.85 | 2.76 | −25.14 | 2.67 |
| 40 | [30; 50] | 0.72 | 0.85 | −37.92 | 5.82 | −26.93 | 5.30 |
| 40 | [39; 41] | 1.00 | 155.98 | −36.25 | 78.67 | −26.28 | 69.22 |

Note: True value of parameters: endogenous effect, 0.35; contextual effects: age, −40; contextual effects: gender, −25.

Our main new result concerns the effect of group size dispersion. When we fix the value of the average group size and reduce the length of the interval from which group sizes are picked, we find that the bias of CML typically increases, while the precision typically decreases. In Table 4, this amounts to looking at each horizontal panel separately. Observe, however, that since we roughly pick group sizes from a uniform distribution holding average group size fixed, reducing the interval's length affects the two parameters of the size distribution (i.e. the minimum and maximum value of its support) and a number of its moments. In particular, this leads to a reduction in variance and to an increase in the size of the smallest groups. In general, both the variance and the size of smallest groups may matter and the strength of identification may depend on the size distribution in complex ways. We leave a deeper investigation of this issue to future research.

We next fully calibrate the simulation parameters on the data. We use the observed group sizes in the French sample, calibrate the model parameters {β, γ_age, γ_gender, δ_age, δ_gender} and the moments of the explanatory variables as before, and set the variance of the error term in the structural equation equal to its estimate in the French sample (154.704). Table 5 reports simulation results for both the CML and IV estimators. The CML estimator has small bias and standard error, while the IV estimator is imprecise and its bias is large. These results confirm for CML what we obtained when picking group sizes at random: dispersion in group sizes helps identification. They also suggest that small-sample bias may be relatively high in the IV estimates of Table 3 and of Table 2 of the supplementary Appendix, but relatively low in the CML estimates of Table 2.

Table 5. Simulations calibrated on French sample (1000 replications)

| Parameter | | | | |
|---|---|---|---|---|
| Endogenous effect | 0.391 | −0.873 | 0.495 | −33.571 |
| Individual effects | | | | |
| Gender (female = 1) | 3.798 | 3.822 | 3.828 | 4.480 |
| Contextual effects | | | | |
| Gender (female = 1) | −25.329 | −16.703 | −21.857 | 210.526 |

Notes: Average standard errors are in parentheses. Group sizes are calibrated on our French sample. Calibrated error variance: 154.704. True value of parameters: endogenous effect, 0.35; individual effects: age, −8; individual effects: gender, 3.8; contextual effects: age, −40; contextual effects: gender, −25.


This paper provides an analysis of social interactions in academic achievement when students interact in groups. Based on a linear-in-means approach with group fixed effects (Lee, 2007), we make two main contributions regarding the identification and estimation of peer effects. First, we provide a new intuition for identification. We show that full identification of the model relies on three key properties. (i) Since the individual is excluded from his peer group, above-average students have below-average peers (with respect to any attribute). Therefore, when individual and peer effects are positive, peer effects tend to reduce the dispersion in outcomes. (ii) This reduction is stronger in smaller groups, reflecting the larger effect of excluding one individual from the mean. (iii) Contextual and endogenous peer effects generate reductions of different shapes, which allows us to identify both of them.
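Properties (i) and (ii) follow from a one-line identity: the average of i's peers minus the group mean equals −(x_i − x̄)/(n − 1). The sketch below, with function names of our own, checks this identity numerically: the peer average falls as own value rises (property i), and the slope −1/(n − 1) is steeper in smaller groups (property ii).

```python
import numpy as np


def peer_means(x):
    """Average of each student's peers, excluding the student (leave-one-out mean)."""
    x = np.asarray(x, dtype=float)
    return (x.sum() - x) / (len(x) - 1)


def peer_mean_slope(n):
    """Slope of (peer mean - group mean) on (own value - group mean) in a group of size n."""
    return -1.0 / (n - 1)


rng = np.random.default_rng(0)
for n in (3, 10, 50):
    x = rng.normal(size=n)
    dev_own = x - x.mean()
    dev_peer = peer_means(x) - x.mean()
    # The relation is exact: dev_peer = -dev_own / (n - 1)
    print(n, peer_mean_slope(n), np.allclose(dev_peer, peer_mean_slope(n) * dev_own))
```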

Second, as regards the estimation of peer effects, the model is applied to original administrative data providing individual scores on standardized tests taken in June 2005 in four subjects by fourth- and fifth-grade secondary school students in the Province of Québec (Canada). Based on a pseudo conditional maximum likelihood approach, our results indicate that students benefit significantly from their peers' higher test scores in Mathematics but not in the other subjects (Science, History and French). Two reasons may explain these results. First, Mathematics is likely to provide more opportunities for interactions among students. Second, in our sample, the average group size (relative to the number of groups) is close to three times smaller in Mathematics than in the other subjects. As suggested by Lee (2007), accurate estimation of peer effects requires relatively small groups, a point also confirmed by our Monte Carlo simulations. These results should warn applied researchers against using data in which groups are too large. Moreover, our simulations indicate that, for a given average group size, greater dispersion in group sizes improves the precision of peer effect estimates. In fact, our results suggest that, when the whole sample is used in estimation, even data on larger groups may provide useful information. The basic intuition is that data on very large groups yield more precise estimates of the individual effects, which in turn make the peer effects estimated from data on smaller groups more precise. Future estimations of Lee's model may therefore benefit from data with a relatively small average group size but a relatively large dispersion in group sizes, including both small and large groups.

In terms of public policy, the fact that endogenous peer effects appear to be very large in Mathematics suggests that a reform that improves the amount and quality of Mathematics learning is likely to yield very high returns in terms of academic achievement. Such a reform would have not only direct effects on student performance in Mathematics but also strong indirect effects through the additional external benefits generated by the social multiplier. Remarkably, our analysis also shows that the indirect peer effects of the reform will reduce performance inequalities in Mathematics across students. This is because low-ability students have better peers (since their peers exclude them) and high-ability students have worse peers (for the same reason). Moreover, the strong negative effects of the average age of peers on academic achievement (except in Mathematics) suggest that resources invested by the government to reduce the number of repeaters may have an important indirect positive impact on student performance. One limitation of Lee's linear-in-means approach is that it implies that average test scores over all schools are not affected by a reallocation of students across schools (see Sacerdote, 2011). Therefore, the model has little to say about issues such as optimal school composition by race or ability.
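The social multiplier mentioned above can be quantified in the simplest case. Averaging the linear-in-means structural equation over a group (in our own notation, y_i = β ȳ_{−i} + γ x_i + δ x̄_{−i} + α_r + ε_i, where the average of the leave-one-out means equals the group mean) gives ȳ_r = β ȳ_r + (γ + δ) x̄_r + α_r + ε̄_r, so a common shock c to every student's outcome raises the group mean by c/(1 − β). The sketch below applies this to the Mathematics point estimate of Table 2; it is an illustration only, since that estimate is imprecise (the 2SLS estimate in Table 3, −0.037, would imply a multiplier of about 0.96).

```python
def social_multiplier(beta: float) -> float:
    """Change in the equilibrium group mean per unit of a common shock to all
    students' outcomes, in a linear-in-means model with endogenous effect beta."""
    if not -1.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between -1 and 1")
    return 1.0 / (1.0 - beta)


# Pseudo CML point estimate for Mathematics from Table 2.
print(round(social_multiplier(0.827), 2))  # 5.78
```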

Our research could be extended in many directions. It would be interesting to evaluate the validity of this approach using data in which group membership is experimentally manipulated and group sizes are heterogeneous (as in Sacerdote, 2001). One could also analyze how group size variations may help to identify peer effects when the outcome is a discrete variable (e.g. pass or fail). Brock and Durlauf (2007) have studied the identification of peer effects with discrete outcomes, but they ignore group size variations. A third potentially fruitful direction would be to analyze a nonlinear version of Lee's approach in which, for instance, student achievement could depend on the mean and standard deviation of peers' attributes. Overall, we think that this first empirical application confirms the value of the method. Many more applications in different settings are needed, however, to gain a thorough understanding of the method's advantages, limitations and applicability for public policy.


We thank Lung-fei Lee, seminar participants at the University of Toronto, Northwestern University, Université de Paris 1, and the CIRANO-CIREQ Conference on the Econometrics of interactions for insightful discussions, and three anonymous referees and the co-editor Thierry Magnac for very helpful comments. We are also grateful to the Québec Ministry of Education, Recreation and Sports (MERS) for providing the data, in particular Raymond Ouellette and Jeannette Ratté for their assistance in obtaining and interpreting the data used in this study. The views expressed in this paper are solely our own and do not necessarily reflect the opinions of the MERS. We received excellent research assistance from Steeve Marchand. Support for this work has been provided by the Canada Chair of Research in Economics of Social Policies and Human Resources, and le Fonds Québécois de Recherche sur la Société et la Culture and le Centre Interuniversitaire sur le Risque, les Politiques Économiques et l'Emploi.

  1. More precisely, Manski studies a social interactions model which, in terms of identification, has the same properties as a model where individuals interact in groups and each individual is included in his peer group (see Bramoullé et al., 2009).

  2. The effects of individual characteristics, such as gender, age and socioeconomic background, on test scores are precisely estimated by either method, and these estimates generally conform to expectations.

  3. For two recent comprehensive surveys on peer effects in education, see Sacerdote (2011) and Epple and Romano (2011).

  4. Bramoullé et al. (2009) determine conditions under which endogenous and contextual peer effects are identified when students interact through a social network known by the modeler and when correlated effects are fixed within subnetworks. See also Section 3.4.2 of this paper.

  5. In fact, at the end of secondary level, classes and teachers usually differ depending on the subject taught.

  6. It is easy to show that when γβ + δ = 0, only γ is identified.

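  7. To see this, observe that the average characteristic of student i's peers, (Σ_j x_j − x_i)/(n − 1), where the sum runs over the n members of the group, decreases with x_i. So if x_i < x_k, then the peers of i have a higher average characteristic than the peers of k.

  8. In contrast, if γ > 0 and δ < 0, peer effects amplify the dispersion in outcomes.

  9. Of course, one has to address a basic modeling question first, i.e. whether the implied model is coherent. A model has this property when a specific nonlinear structure generates a unique solution for outcomes.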

  10. If Nri denotes the group of peers of student i, we also have Lr = Mri − Nri.

  11. Note that only n_r − 1 elements of inline image are linearly independent.

  12. In fact, inline image and inline image, hence instruments are built here by pre-multiplying characteristics (in deviation) by group-dependent weights and by stacking them across groups.

  13. There are more individual test scores than students, as some students take tests in more than one subject area.

  14. The language of instruction is French in most schools, and English otherwise.

  15. Kang (2007, p. 475) also provides a survey of endogenous peer effects on achievement in Mathematics which is broadly consistent with results reported in Sacerdote (2011).

  16. We have also estimated a second-order pseudo CML in which restrictions are directly incorporated into the variance term and estimated. Results are quite similar to those presented in Table 2.

  17. In an earlier version of the paper, we also provided results for IV estimates. The results for IV are qualitatively the same as those for CML but, as expected, the bias and the loss in precision are always larger for IV than for CML.