A discrete-choice model with social interactions: with an application to high school teen behavior

Authors


Abstract

We develop an empirical discrete-choice interaction model with a finite number of agents. We characterize its equilibrium properties—in particular the correspondence between interaction strength, number of agents, and the set of equilibria—and propose to estimate the model by means of simulation methods. In an empirical application, we analyze the individual behavior of high school teenagers in almost 500 school classes from 70 schools. In our baseline model endogenous social interaction effects are strong for behavior closely related to school (truancy), somewhat weaker for behavior partly related to school (smoking, cell phone ownership, and moped ownership) and absent for behavior far away from school (asking parents' permission for purchases). Intra-gender interactions are generally much stronger than cross-gender interactions. In a model with school-specific fixed effects social interaction effects are insignificant, with the exception of intra-gender interactions for truancy. Copyright © 2007 John Wiley & Sons, Ltd.

1. INTRODUCTION

Early contributions by Veblen (1899), Duesenberry (1949), Leibenstein (1950), Pollak (1976), and others show that economists have recognized the potential importance of social interactions for a long time. The slow progress of empirical research in this area is to a large extent related to a number of methodological problems. As described by Manski (1993, 2000) and others, a major difficulty is to disentangle endogenous social interactions (which imply a social multiplier effect) from other types of social interactions (which do not imply a multiplier effect). Another problem is the endogeneity of reference groups. Recent years have shown an increasing number of empirical studies searching for credible empirical evidence on social interactions, in part by using data that are quasi-experimental in nature; see Sacerdote (2001), Durlauf and Moffitt (2003), and Duflo and Saez (2003) for examples.

The present paper focuses on methodological problems related to a specific but frequently encountered situation: social interactions in small groups when choice variables are discrete. In a discrete-choice model with endogenous social interactions, the choices of other individuals are explanatory variables in the equation describing the choice behavior of a given individual. For estimation and other purposes, the reduced form (or ‘social equilibrium’ or ‘solution’) of the model is required. While the reduced form is straightforwardly obtained in a linear model with continuous variables, its derivation is more complicated in the case of discrete variables. As already noted by authors analyzing the simultaneous probit model (see, for example, Heckman, 1978; Maddala, 1983), such models may not have a solution or may have multiple solutions. This in turn may yield problems regarding the statistical coherency of the model.

In Section 2 we present the model and characterize its equilibrium properties, in particular the correspondence between interaction strength, number of agents, and the set of equilibria. Section 3 proposes to estimate the model by means of simulation methods, assuming that observed choices represent an equilibrium of the static discrete game played by all interacting agents. Section 4 is devoted to an empirical application. We analyze a sample of 485 high school classes with detailed information on the individual behavior of the students within each class. As all students in a sampled class are interviewed in principle, the dataset has rich information on the behavior of potentially important peers of each respondent. We estimate the model for five types of discrete choices made by teenagers: smoking, truancy, moped ownership, cell phone ownership, and asking parents' permission for purchases. To control for sorting into schools and omitted variables that induce a positive correlation between peers, we also estimate versions that allow for within-class correlation of error terms and for school-specific fixed effects. We find strong social interaction effects for behavior closely related to school (truancy), somewhat weaker social interaction effects for behavior partly related to school (smoking, moped and cell phone ownership) and no social interaction effects for behavior far away from school (asking parents' permission for purchases). Intra-gender interactions are generally much stronger than cross-gender interactions. Once we control for school-specific fixed effects, social interaction effects become insignificant, with the exception of intra-gender interactions for truancy.

A number of recent papers have analyzed social interactions in a discrete-choice framework. Brock and Durlauf (2001a, 2006) use a random-fields approach to study aggregate behavioral outcomes in an economy in which social interactions are embedded in individual decisions. Equilibrium properties of this model are derived by imposing a rational expectations condition on the subjective choice probabilities of the agents and by assuming that the number of agents is sufficiently large that each agent ignores the effect of his own choice on the average choice level. In contrast, the present paper describes behavior in relatively small groups of a given size in which choices of other individuals can be assumed to be fully observable. For this reason, it is more appropriate to model the interactions as a non-cooperative game, by making an individual's pay-off dependent on the actual choice of others in his group. In the analysis, we will focus on the one-shot pure Nash equilibria of this game. In a recent paper Tamer (2003) proposes a semi-parametric estimator which allows, under certain conditions, for consistent point estimation of the model in the N = 2 case without making assumptions regarding non-unique outcomes. Its extension to N > 2 and the corresponding empirical implementation have not been fully developed as yet. Gaviria and Raphael (2001) analyze school-based peer effects in the individual discrete-choice behavior of tenth-graders. However, their econometric model ignores multiplicity of equilibria.

2. DISCRETE-CHOICE INTERACTIONS AND MULTIPLE EQUILIBRIA

2.1. Preliminaries

Consider a population of N individuals indexed by i, i = 1, 2, …, N. Each player i faces a binary choice and these choices are denoted by an indicator variable yi which has support Yi = {−1, 1}.1

Yi is the strategy set of player i and Y = Y1 × Y2 × … × YN. Elements of Y are called strategy profiles or choice patterns. A strategy profile is denoted by y = (yi, y−i), where y−i = (y1, y2, …, yi−1, yi+1, …, yN)′. Note that the number of elements in Y is 2^N. Each individual makes a choice in order to maximize a pay-off function V: Y → ℜ ∪ {−∞}. For ease of exposition we will sometimes refer to y = 1 as ‘smoking’ and to y = − 1 as ‘non-smoking’, although we will also consider other types of behavior in the empirical part of the paper.

In the standard economic approach, the pay-off function is dependent on individual characteristics. Following the notation in Brock and Durlauf (2001b), we assume that these characteristics can be divided into an observable vector xi and a random shock ϵi(yi) that is unobservable to the modeler but observable to agent i. Moreover, in interactions-based models explicit attention is given to the influence of the behavior of others on each individual's choice. Each choice is then described as

$$ y_i = \arg\max_{y_i \in \{-1, 1\}} V(y_i, x_i, y_{-i}, \epsilon_i(y_i)) \tag{1} $$

Similar to Brock and Durlauf (2001b), we assume that the pay-off function V can be additively decomposed into three terms:

$$ V(y_i, x_i, y_{-i}, \epsilon_i(y_i)) = u(y_i, x_i) + S(y_i, x_i, y_{-i}) + \epsilon_i(y_i) \tag{2} $$

where the first term u(yi, xi) denotes deterministic private utility, S(yi, xi, y−i) denotes deterministic social utility, and ϵi denotes random private utility. In this paper we assume the social utility term to have the following form:

$$ S(y_i, x_i, y_{-i}) = \frac{\gamma}{2(N-1)} \, y_i \sum_{j \neq i} y_j $$

Define y−ij = y\{yi, yj} so that (yi, xi, y−i) = (yi, xi, yj, y−ij). Note that

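$$ \left[ S(1, x_i, 1, y_{-ij}) - S(-1, x_i, 1, y_{-ij}) \right] - \left[ S(1, x_i, -1, y_{-ij}) - S(-1, x_i, -1, y_{-ij}) \right] = \frac{2\gamma}{N-1} \tag{3} $$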

Thus, for γ> 0 the utility of smoking (versus non-smoking) when another person smokes as well is larger than the utility of smoking (versus non-smoking) when another person does not smoke. In this case the parameter γ measures the strategic complementarity between the choice of any pair of individuals; for γ< 0 it measures the extent to which the choices are strategic substitutes.2 In fact, for γ> 0 (γ< 0), the model falls into the class of supermodular (submodular) games. Supermodular (submodular) games are games in which each player's strategy set is partially ordered and the marginal returns to increasing one's strategy (in this paper moving from y = − 1 to y = 1) rise (decrease) with increases in the competitors' strategies.3

Conditional on the choice by individual i, deterministic private utility is assumed to be a linear function of exogenous characteristics xi; i.e., u(1, xi) = β′1xi and u(−1, xi) = β′−1xi.

The best response function of individual i given the choices of the other individuals can now be represented as

$$ y_i = \begin{cases} 1 & \text{if } s_i > 0 \\ -1 & \text{if } s_i \le 0 \end{cases} \tag{4} $$

where si denotes the difference between the utility individual i derives from choosing yi = 1 and the utility he derives from choosing yi = − 1, conditional on y−i; that is,

$$ s_i = \beta' x_i + \frac{\gamma}{N-1} \sum_{j \neq i} y_j + \epsilon_i $$

with β≡β1 − β−1; ϵi≡ϵi(1)− ϵi(−1).

Define x≡(x′1, x′2, …, x′N)′ and ϵ≡(ϵ1, ϵ2, …, ϵN)′. A strategy profile y is a pure Nash equilibrium profile if and only if it is consistent with (4) for all i; i.e., if after substitution of these values of yi in si we have si > 0 for all i with yi = 1, and si ≤ 0 for all i with yi = − 1.

Let Q(β, γ, x, ϵ, N) denote the number of pure Nash equilibria given {β, γ, x, ϵ} and the population size N. That is, for N ≥ 2,

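$$ Q(\beta, \gamma, x, \epsilon, N) = \sum_{y \in Y} \prod_{i=1}^{N} \left[ I(y_i = 1)\, I(s_i > 0) + I(y_i = -1)\, I(s_i \le 0) \right] \tag{5} $$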

with I(·) an indicator function.4 In the model without social interactions (i.e., γ = 0) each combination of {β, γ = 0, x, ϵ} obviously defines a unique equilibrium, and thus Q(β, 0, x, ϵ, N) = 1.

An important feature of the model with social interactions is that, for a given combination of {β, γ≠ 0, x, ϵ}, several strategy profiles may be consistent with (4). For example, if N = 2, γ = 1, and −1 < β′xi + ϵi ≤ 1 for i = 1, 2, profiles y = (1, 1)′ and y = (−1, − 1)′ are both consistent with (4). In the left-hand panel of Figure 1, equilibrium profiles for this two-person game are drawn in ϵ-space. The shaded area is the area with multiple equilibria.
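The multiplicity in the two-person example can be checked by brute force. The following is a minimal sketch, not the authors' code: it assumes the best-response index takes the form s_i = z_i + γ Σ_{j≠i} y_j / (N − 1) with z_i = β′x_i + ϵ_i, and that ties (s_i = 0) are resolved in favor of y_i = −1. The function names and the value z_1 = z_2 = 0 are chosen here purely for illustration.

```python
from itertools import product


def is_equilibrium(y, z, gamma):
    """True if every y_i (coded -1 or 1) is a best response to the others' choices."""
    n = len(y)
    total = sum(y)
    for yi, zi in zip(y, z):
        s = zi + gamma * (total - yi) / (n - 1)  # assumed form of s_i
        if (yi == 1) != (s > 0):                 # y_i = 1 requires s_i > 0
            return False
    return True


def pure_nash_equilibria(z, gamma):
    """Enumerate all 2^N choice patterns and keep the pure Nash equilibria."""
    return [y for y in product((-1, 1), repeat=len(z))
            if is_equilibrium(y, z, gamma)]


# Two agents, gamma = 1, z_1 = z_2 = 0 (illustrative values).
equilibria = pure_nash_equilibria([0.0, 0.0], gamma=1.0)
print(equilibria)  # → [(-1, -1), (1, 1)]
```

Setting `gamma=0.0` in the same call returns a single profile for any z, in line with Q(β, 0, x, ϵ, N) = 1. Full enumeration is only practical for small N, since the number of profiles is 2^N.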

Figure 1. Multiple equilibria in ϵ-space (N = 2, γ > 0, β′x1 = β′x2 = 0)

2.2. Equilibrium Properties

This section provides three propositions on the equilibrium properties of model (4). Proposition 1 guarantees the existence of an equilibrium in pure strategies. It turns out that the situation with strategic complements (γ> 0) is characterized by fundamentally different equilibrium behavior from the one with strategic substitutes (γ< 0). Moreover, in the latter case it makes a difference whether the population has an even or an odd number of members. Propositions 2 and 3 provide strict upper bounds on the number of equilibria, for the case with strategic complements and for the case with strategic substitutes, respectively.

Define zi≡β′xi + ϵi and k ≡ Σi yi, with the sum running over i = 1, …, N; that is, k is the net number of agents choosing y = 1.5 Rank observations on the basis of the values of zi. Denote the ordered values as z[1] ≥ z[2] ≥ … ≥ z[N]. Denote the corresponding values of y for the agent with z[j] as y[j]. Note that the latter are not ordered, such that it is not precluded that e.g. y[j] < y[j+1].

Proposition 1: Existence of an equilibrium in pure strategies. For every combination {β, γ, x, ϵ} there exists at least one vector y≡(y1, y2, …, yN)′ for which (4) holds.

Proof See the Appendix.

Proposition 2: Maximum number of equilibria (strategic complements). For every combination {β, γ> 0, x, ϵ}, the discrete interaction model (4) with N agents can have at most d(N) distinct equilibria, with

$$ d(N) = \left\lfloor \frac{N}{2} \right\rfloor + 1 \tag{6} $$

Moreover, for every number N, there exists a combination of {β, γ> 0, x, ϵ} for which Q(β, γ, x, ϵ, N) = d(N).

Proof See the Appendix.

The first part of Proposition 2 states that in the case of strategic complements the maximal number of equilibria grows linearly in N. The second part ensures that the upper bound on the number of equilibria is strict.
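The linear growth of the bound can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than a proof: it assumes s_i = z_i + γ Σ_{j≠i} y_j / (N − 1) and takes the bound to be ⌊N/2⌋ + 1, which is what follows if equilibria are ordered by z and two profiles differing in one element cannot both be equilibria. The N = 4 configuration and the random search are constructed here for illustration.

```python
import random
from itertools import product


def count_equilibria(z, gamma):
    """Number of pure Nash equilibria, found by enumerating all 2^N profiles."""
    n = len(z)
    count = 0
    for y in product((-1, 1), repeat=n):
        total = sum(y)
        if all((yi == 1) == (zi + gamma * (total - yi) / (n - 1) > 0)
               for yi, zi in zip(y, z)):
            count += 1
    return count


# Illustrative N = 4 configuration with equilibria at M = 0, 2 and 4 smokers.
q_example = count_equilibria([0.5, 0.5, -0.5, -0.5], gamma=1.0)
print(q_example)  # → 3

# Random search: the largest count found never exceeds floor(N/2) + 1.
rng = random.Random(0)
max_found = {}
for n in range(2, 7):
    best = 0
    for _ in range(2000):
        gamma = rng.uniform(0.0, 3.0)
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        best = max(best, count_equilibria(z, gamma))
    max_found[n] = best
print(max_found)
```

The random search only shows that no counterexample turns up among the sampled configurations; whether it actually reaches the bound for each N depends on the draws.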

Proposition 3: Maximum number of equilibria (strategic substitutes). For every combination {β, γ< 0, x, ϵ}, the discrete interaction model (4) with N agents can have at most d(N) distinct equilibria, with

equation image

Moreover, for every even (odd) number N, there exists a combination of {β, γ< 0, x, ϵ} for which Q(β, γ, x, ϵ, N) = de(N) (Q(β, γ, x, ϵ, N) = do(N)).

Proof See the Appendix.

Proposition 3 states that for the situation with strategic substitutes the maximal number of equilibria grows exponentially in N. As in the case with strategic complements, the upper bound on the number of equilibria is strict. Note that equation image for all even N and equation image for N odd. That is, in the limit adding one agent to the population doubles the upper bound on the number of equilibria.

It is also worth mentioning that with strategic substitutes |k| decreases monotonically to 0 (1) as γ→− ∞ for N even (N odd). In fact, this result holds more generally: in equilibrium, the difference between the number of agents choosing y = 1 and the number of agents choosing y = − 1 is smaller when γ is more negative, other things equal.
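The rapid growth in the number of equilibria under strategic substitutes can be seen in a simple configuration. This sketch assumes s_i = z_i + γ Σ_{j≠i} y_j / (N − 1) with ties resolved toward y_i = −1, and sets all z_i = 0 and γ = −1 purely for illustration. It does not reproduce the closed-form bounds of Proposition 3; it only counts equilibria for this one configuration and compares the count with a binomial coefficient.

```python
import math
from itertools import product


def equilibria(z, gamma):
    """All pure Nash equilibria, found by enumerating the 2^N profiles."""
    n = len(z)
    found = []
    for y in product((-1, 1), repeat=n):
        total = sum(y)
        if all((yi == 1) == (zi + gamma * (total - yi) / (n - 1) > 0)
               for yi, zi in zip(y, z)):
            found.append(y)
    return found


counts = {}
net_choices = {}
for n in range(2, 7):
    eq = equilibria([0.0] * n, gamma=-1.0)
    counts[n] = len(eq)
    net_choices[n] = {abs(sum(y)) for y in eq}  # |k| across equilibria

print(counts)       # → {2: 2, 3: 3, 4: 6, 5: 10, 6: 20}
print(net_choices)  # → {2: {0}, 3: {1}, 4: {0}, 5: {1}, 6: {0}}

# In this configuration the counts coincide with "N choose floor(N/2)".
assert all(counts[n] == math.comb(n, n // 2) for n in counts)
```

In every equilibrium of this configuration the two choices are as evenly split as N allows, so |k| equals 0 for even N and 1 for odd N.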

2.3. Extension to More General Interactions

The model considered so far only allows for identical interactions between all individuals in the group. In the more general case the degree of interaction between two given individuals may depend, for example, on their socio-economic characteristics. In this section, we briefly discuss the consequences of one particular extension of the model given by (4) in which the degree of interaction is made gender-dependent. This leads to four different interaction parameters: γGB measures the effect of boys on girls, γBG the effect of girls on boys, and γGG and γBB the intra-gender effects among girls and among boys, respectively. Specify

equation image(7)

where

equation image

with equation image is a girl) and equation image is a boy), equation image.

Corollary 1 For every combination {β, γBB ≥ 0, γGG ≥ 0, γGB, γBG, x, ϵ} there exists at least one vector y≡(y1, y2, …, yN)′ for which (7) holds.

Proof See the Appendix.

The equivalent of Proposition 2 for the extended model follows automatically:

Corollary 2 For every combination {β, γBB > 0, γGG > 0, γGB, γBG, x, ϵ}, the discrete interaction model given by (7) with NG girls and NB boys can have at most d*(NB, NG) distinct equilibria, where

equation image

Moreover, for all NG and NB, there exists a combination of {β, γBB ≥ 0, γGG ≥ 0, γGB, γBG, x, ϵ} for which the maximum number of equilibria is obtained.

It is noteworthy that the values of the cross-gender interaction parameters γGB and γBG do not play a role in the derivation of the upper bounds for the number of equilibria.

3. ESTIMATION BY SIMULATION

To estimate the model by maximum likelihood we require the probability P(y) that we observe y, for any given set of parameter values.

A choice pattern y observed for a particular group is either a single equilibrium or one of multiple equilibria. The support in ϵ-space for choice pattern y is

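$$ s_i > 0 \ \text{ if } y_i = 1, \qquad s_i \le 0 \ \text{ if } y_i = -1, \tag{8} $$

where

$$ s_i = \beta' x_i + \frac{\gamma}{N-1} \sum_{j \neq i} y_j + \epsilon_i $$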

for all i, i = 1, …, N. Denote the region in ϵ-space defined in (8) by W(y, θ), with θ being the parameters to be estimated. Since W(y, θ) may also support equilibria other than y, we have P(ϵ ∈ W(y, θ)) ≥ P(y).

Following Bjorn and Vuong (1984) and Kooreman (1994) we make a randomization assumption in the case of multiple equilibria: whenever the model generates multiple equilibria we assume that one of them will occur with probability equal to one over the number of equilibria. To determine the number of equilibria in the various subregions of W(y, θ) we use a simulation-based method. Consider R1 random draws (indexed by r, r = 1, …, R1) from the joint distribution of (ϵ1, …, ϵN) on W(y, θ). For each draw, we calculate the number of equilibria. Recall that by construction of W(y, θ), y is either the single equilibrium or one of the multiple equilibria. Let Ωr be the set of equilibria corresponding to draw r and let Er denote the number of elements in Ωr (i.e., Er is the number of equilibria at draw r). Then the probability P(y) that choice pattern y will be observed is consistently estimated by the frequency simulator

$$ P_1(y) = P(\epsilon \in W(y, \theta)) \cdot \frac{1}{R_1} \sum_{r=1}^{R_1} \frac{1}{E_r} \tag{9} $$

This procedure guarantees the statistical coherency of the model; i.e., Σt P(yt) = 1, where yt, t = 1, …, 2^N, is the enumeration of all elements in Y. Note that since Er ≥ 1 we have P1(y) ≤ P(ϵ ∈ W(y, θ)).

Alternatively, P(y) could be estimated directly using

$$ P_2(y) = \frac{1}{R_2} \sum_{r=1}^{R_2} \frac{I(y \in \Omega_r)}{E_r} \tag{10} $$

with R2 the number of draws from the joint distribution of (ϵ1, …, ϵN) on ℜN. However, this would require the number of draws to be of a much larger magnitude to achieve the same precision as achieved when using (9).6
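The two simulation strategies can be compared in a small example. The following is a minimal sketch under several assumptions that are not taken from the paper: independent standard normal errors, s_i = β′x_i + γ Σ_{j≠i} y_j / (N − 1) + ϵ_i, brute-force enumeration of equilibria (feasible only for small N), and inverse-CDF sampling from the truncated normal. The names `simulate_p1`, `simulate_p2` and `xb` (the values of β′x_i) are hypothetical.

```python
import random
from itertools import product
from statistics import NormalDist

STD_NORMAL = NormalDist()


def equilibria(z, gamma):
    """All pure Nash equilibria for latent indices z (brute force, small N only)."""
    n = len(z)
    found = []
    for y in product((-1, 1), repeat=n):
        total = sum(y)
        if all((yi == 1) == (zi + gamma * (total - yi) / (n - 1) > 0)
               for yi, zi in zip(y, z)):
            found.append(y)
    return found


def simulate_p1(y, xb, gamma, draws, rng):
    """P(eps in W) times the average of 1/E_r over draws restricted to W."""
    n, total = len(y), sum(y)
    # y_i = 1 requires eps_i > -c_i; y_i = -1 requires eps_i <= -c_i.
    cuts = [a + gamma * (total - yi) / (n - 1) for yi, a in zip(y, xb)]
    p_w = 1.0
    for yi, c in zip(y, cuts):
        p_w *= STD_NORMAL.cdf(yi * c)
    acc = 0.0
    for _ in range(draws):
        eps = []
        for yi, c in zip(y, cuts):
            split = STD_NORMAL.cdf(-c)
            lo, hi = (split, 1.0) if yi == 1 else (0.0, split)
            u = lo + (hi - lo) * rng.random()
            u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside (0, 1)
            eps.append(STD_NORMAL.inv_cdf(u))
        n_eq = len(equilibria([a + e for a, e in zip(xb, eps)], gamma))
        acc += 1.0 / max(n_eq, 1)  # guard against boundary rounding
    return p_w * acc / draws, p_w


def simulate_p2(y, xb, gamma, draws, rng):
    """Crude alternative: unrestricted draws, weight 1/E_r when y is an equilibrium."""
    acc = 0.0
    for _ in range(draws):
        eq = equilibria([a + rng.gauss(0.0, 1.0) for a in xb], gamma)
        if tuple(y) in eq:
            acc += 1.0 / len(eq)
    return acc / draws


xb = [0.2, -0.1, 0.4]   # illustrative values of beta'x_i
y_obs = (1, 1, 1)
p1, p_w = simulate_p1(y_obs, xb, 0.8, 2000, random.Random(1))
p2 = simulate_p2(y_obs, xb, 0.8, 20000, random.Random(2))
print(p1, p2, p_w)
```

Both estimates target the same probability, and the restricted simulator never exceeds P(ϵ ∈ W(y, θ)). The unrestricted version spends most of its draws outside W(y, θ) when the observed pattern is unlikely, which is the source of its lower precision.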

We assume that (ϵ1, …, ϵN) is independently normally distributed (the independence assumption will be relaxed in Section 4.4). We have used values for R1 of 100 and 1000. These values may induce some simulation variance in both the estimated parameters and the estimated standard errors. However, the estimated probabilities are sufficiently precise as inputs in a likelihood maximization algorithm, and experiments in which we increased the value of R1, other things equal, suggest that using larger values for R1 would not substantially change our empirical results. The likelihood functions were maximized using a Newton–Raphson type of algorithm with numerically evaluated derivatives.7 The standard errors were obtained by inverting the approximate Hessian based on the outer products of class-specific score vectors.
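Standard errors based on the outer products of class-specific scores can be sketched generically. The code below is an illustration, not the authors' implementation: it assumes one log-likelihood function per class, evaluates scores by central differences, and inverts the summed outer products with a small Gauss-Jordan routine. All function names and the toy quadratic log-likelihood are hypothetical.

```python
import math


def numerical_score(loglik, theta, h=1e-5):
    """Central-difference gradient of one class's log-likelihood at theta."""
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        up[k] += h
        dn = list(theta)
        dn[k] -= h
        grad.append((loglik(up) - loglik(dn)) / (2.0 * h))
    return grad


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def opg_standard_errors(class_logliks, theta_hat):
    """Square roots of the diagonal of the inverse sum of score outer products."""
    k = len(theta_hat)
    opg = [[0.0] * k for _ in range(k)]
    for loglik in class_logliks:
        g = numerical_score(loglik, theta_hat)
        for a in range(k):
            for b in range(k):
                opg[a][b] += g[a] * g[b]
    cov = invert(opg)
    return [math.sqrt(cov[a][a]) for a in range(k)]


# Toy check: class log-likelihoods -0.5 * (d - theta)^2 with scores d - theta.
data = [1.0, 2.0, 3.0, 6.0]
class_logliks = [lambda th, d=d: -0.5 * (d - th[0]) ** 2 for d in data]
se = opg_standard_errors(class_logliks, [3.0])
print(se)  # approximately [1 / sqrt(14)]
```

In the toy check the scores at θ = 3 are −2, −1, 0 and 3, so the summed outer product is 14 and the standard error is 1/√14.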

The characterization of the equilibria in Propositions 1, 2, and 3 and their proofs turns out to be extremely helpful in developing an algorithm for estimation. Let M ≡ Σi I(yi = 1); i.e., M denotes the number of individuals choosing y = 1. Then M = m implies k = 2m − N. From the proof of Proposition 2 it follows that, with γ> 0, the M agents with yi = 1 are those with the M largest values of zi. To determine whether there exists an equilibrium with M = m, we therefore first rank observations on the basis of the values of zi, for a given draw of (ϵ1, …, ϵN). An equilibrium with M = m exists if and only if the inequalities

$$ z_{[m]} + \gamma \, \frac{2m - N - 1}{N - 1} > 0 \quad \text{and} \quad z_{[m+1]} + \gamma \, \frac{2m - N + 1}{N - 1} \le 0 \tag{11} $$

with 0 < m < N, are satisfied. An equilibrium with M = 0 occurs if and only if zi − γ≤ 0 for all i; an equilibrium with M = N occurs if and only if zi + γ> 0 for all i. The proof of Proposition 2 also shows that two vectors that differ in only one element cannot both be equilibria. As a result, we only have to check N + 1 out of the 2^N choice patterns as possible equilibria.
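The ranking argument can be sketched as a search over the N + 1 profiles in which the M agents with the largest z_i choose y = 1. This is a minimal illustration, assuming s_i = z_i + γ Σ_{j≠i} y_j / (N − 1), which matches the M = 0 and M = N conditions stated above. The function names are hypothetical, and a brute-force enumeration is included only as a cross-check.

```python
from itertools import product


def equilibria_by_ranking(z, gamma):
    """Check only the N + 1 'top-M' profiles; intended for gamma >= 0."""
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i], reverse=True)
    zs = [z[i] for i in order]  # z sorted in descending order
    found = []
    for m in range(n + 1):
        # The smallest z among those choosing 1 must still prefer y = 1.
        ok_in = m == 0 or zs[m - 1] + gamma * (2 * m - n - 1) / (n - 1) > 0
        # The largest z among those choosing -1 must still prefer y = -1.
        ok_out = m == n or zs[m] + gamma * (2 * m - n + 1) / (n - 1) <= 0
        if ok_in and ok_out:
            y = [-1] * n
            for i in order[:m]:
                y[i] = 1
            found.append(tuple(y))
    return found


def brute_force(z, gamma):
    """Reference implementation: enumerate all 2^N profiles."""
    n = len(z)
    found = []
    for y in product((-1, 1), repeat=n):
        total = sum(y)
        if all((yi == 1) == (zi + gamma * (total - yi) / (n - 1) > 0)
               for yi, zi in zip(y, z)):
            found.append(y)
    return found


z_example = [0.5, 0.5, -0.5, -0.5]
print(sorted(equilibria_by_ranking(z_example, gamma=1.0)))
# → [(-1, -1, -1, -1), (1, 1, -1, -1), (1, 1, 1, 1)]
```

The ranking search costs one sort plus N + 1 pairs of comparisons per draw, instead of 2^N profile checks, which matters because the number of equilibria has to be recomputed for every simulation draw.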

Suppose that model (7), with all γ's positive, has an equilibrium with MG smoking girls and MB smoking boys. It is straightforward to show that the smoking girls are those with the largest values of zi in the subset of girls, and that the smoking boys are those with the largest values of zi in the subset of boys. As a result, we only have to check (NG + 1)(NB + 1) out of the 2^N choice patterns as potential equilibria.

Proposition 3 implies that with one or more negative γ's estimation is computationally more demanding. In an analysis of a large number of teenage behaviors—based on the same data and using a continuous version of the model in the present paper—Kooreman (2006) did not find any significant negative γ's. In this paper we impose non-negativity of γ's in the estimation procedure.

From an empirical perspective it is important to note that in the estimated models the probability of a single equilibrium usually turns out to be larger than 80%; i.e., we usually have equation image. The estimation results in this paper's application also appear to be only moderately sensitive with respect to the assumptions regarding the treatment of multiple equilibria. For example, maximizing a quasi-log-likelihood based on P(ϵ ∈ W(y, θ)) yields estimates similar to those based on P1(y).

4. EMPIRICAL APPLICATION

4.1. The Data: The Dutch National School Youth Survey

We will estimate the model outlined in the previous sections using data from the Dutch National School Youth Survey (NSYS) from the year 2000.8

The dataset contains information on the teenagers' individual characteristics, time use, income and expenditures, subjective information on norms and values, and information on various behaviors and durable goods ownership. There is only limited information on the parents (including education and working hours) and no information on siblings.

Although in principle all pupils in a sampled class participate in the survey, some pupils are excluded from the data. In some cases this is because a pupil was absent when the questionnaires were filled out, in other cases because information on some of the variables is missing. We exclude classes with more than 24 students. The resulting data contains information on 7534 students in 485 classes in 70 schools.

All information is self-reported. Thus, strictly speaking, our analysis measures social interactions in how teenagers report on their behavior. The results for ‘asking parents' permission for purchases’ may provide some insight into potential differences between social interactions in reported behavior and in actual behavior. Asking parents for permission before making a purchase is an aspect of out-of-class behavior. Since this primarily concerns the relationship between a pupil and his or her parents, we expect very weak or no endogenous social interaction effects in this type of actual behavior. However, if pupils copy each other's responses to the survey questions when filling out the questionnaire, spurious social interaction effects might be found.9

4.2. Specification of the Empirical Model

Given the cross-sectional nature of the data, we will not be able to fully account for the identification problems that characterize the empirical analysis of social interactions. In order to provide a proper perspective for the interpretation of the empirical results to be presented, we briefly discuss the identification issues in relation to the present dataset: (i) the definition of the reference group; (ii) endogenous versus contextual effects; and (iii) non-random selection into reference groups.

The Definition of the Reference Group

As in any empirical analysis on social interaction we require an assumption regarding the definition of the reference group: Who interacts with whom? A number of empirical papers have defined the reference group of an individual as the group of all persons in the population within the same age group and with the same education level, using the sample analogues as an approximation (see, for example, Kapteyn et al., 1997; Aronsson et al., 1999). This is a crude definition, largely motivated by data limitations. A more attractive alternative is to use subjective information on an individual's reference group, as in Woittiez and Kapteyn (1998). However, the information on the reference group of a sampled individual is often limited as these reference group members are not themselves included in the sample. The data in the current analysis can be viewed as a reference group-based sample as all students within a sampled class are interviewed in principle. While teenage behavior is obviously also influenced by persons outside the class, classmates are likely to play a dominant role in shaping teenagers' preferences and behavior. On a weekday, the average student in the sample spends about 6 hours in his or her school class. The total time spent on school-related activities (including homework and commuting) is about 8 hours per weekday: more than 50% of the daily waking time. Teenagers within the same school or class therefore form social groups that are more clearly defined and delineated than in many other situations in which social interactions are likely to play a role. Obviously, the definition of the reference group could be extended to allow for interactions with students outside the class. Also, one could in principle refine the specification of social groups within the class beyond the boy–girl distinction, for example on the basis of ethnicity, or by allowing the effect of younger and of older classmates to be different. These extensions are left for future research.

Endogenous versus Contextual Effects

Gaviria and Raphael (2001) argue that students are less exposed to the family background of their school peers than they are exposed to the family background of peers residing in the same neighborhood. They conjecture that in an analysis of interactions through schools contextual effects are less important than in an analysis of interactions through neighborhoods. In their empirical analysis they assume that contextual effects are absent. Kawaguchi (2004) invokes subjective information about the perception of peer behaviors to achieve full identification.10 He finds that the absence of contextual effects cannot be rejected. The empirical results presented below are based on the assumption that there are no contextual effects. The estimates on the endogenous social interaction effects should therefore be interpreted as upper bounds on the true effects.

Non-random Selection into Reference Groups

To control for non-random selection into schools to some extent we will also estimate a version of the model including school-specific fixed effects. To control for selection into classes conditional on selection into schools we also allow for within-class correlation of error terms. This correlation coefficient is identified by the nonlinearity of the model and the implicit imposition of equal within-class correlation coefficients across classes.

The vector x includes age, and dummy variables for gender, for being non-Dutch (based on the question ‘Do you consider yourself to be Dutch?’), for the type of education (MAVO (lower level), HAVO (intermediate level), and VWO (higher level), with ‘vocational’ as reference category), for Catholic, for Protestant, and for living in a ‘single-parent family’ (based on the question ‘Do you live in a family with father and mother?’). Unfortunately, a large proportion of teenagers do not know their parents' education level (41% and 36% for father's and mother's education level, respectively). We therefore choose not to include parents' education levels as explanatory variables. However, we do include the father's working time and the mother's working time (for a pupil with a single parent the working time of the missing parent is set equal to the sample average).11 Table I provides sample statistics for both the endogenous and exogenous variables in the model.

Table I. Sample statistics at the individual level (7534 observations)

Variable                   Mean      Median    SD        Min.      Max.
Girl                       0.5157    1.0000    0.4998    0.0000    1.0000
Age                        14.21     14.0000   1.45      11.0000   21.0000
Non-Dutch                  0.091     0.0000    0.287     0.0000    1.0000
Single-parent household    0.084     0.0000    0.278     0.0000    1.0000
MAVO                       0.336     0.0000    0.473     0.0000    1.0000
HAVO                       0.182     0.0000    0.386     0.0000    1.0000
VWO                        0.152     0.0000    0.359     0.0000    1.0000
Working time, father       35.95     36.0000   12.76     0.0000    46.0000
Working time, mother       15.29     6.0000    15.13     0.0000    46.0000
Catholic                   0.234     0.0000    0.423     0.0000    1.0000
Protestant                 0.185     0.0000    0.388     0.0000    1.0000
Smoking                    0.090     0.0000    0.287     0.000     1.000
Truancy                    0.184     0.0000    0.387     0.000     1.000
Asking for permission      0.863     1.000     0.344     0.000     1.000
Moped                      0.064     0.0000    0.245     0.000     1.000
Cell phone                 0.211     0.000     0.408     0.000     1.000

Girls (3885 observations)
Smoking                    0.093     0.000     0.290     0.000     1.000
Truancy                    0.174     0.000     0.379     0.000     1.000
Asking for permission      0.854     1.000     0.353     0.000     1.000
Moped                      0.029     0.000     0.167     0.000     1.000
Cell phone                 0.203     0.000     0.402     0.000     1.000

Boys (3649 observations)
Smoking                    0.088     0.000     0.283     0.000     1.000
Truancy                    0.194     0.0000    0.396     0.000     1.000
Asking for permission      0.873     1.000     0.333     0.000     1.000
Moped                      0.102     0.000     0.303     0.000     1.000
Cell phone                 0.219     0.000     0.414     0.000     1.000

4.3. Estimation Results

Table II presents four versions of the estimated model for smoking. The first column contains estimation results for the model without social interactions (i.e., with γGG = γGB = γBB = γBG = 0). The probability of smoking strongly increases with age. The effect of gender is insignificant. The higher the level of education, the smaller the probability that a pupil smokes. We also find that pupils from single-parent households and pupils whose mothers have a paid job have a significantly larger probability of smoking. Being non-Dutch, Catholic, or Protestant reduces the probability of smoking. The effects are largely consonant with earlier empirical studies on smoking behavior; see, for example, Gruber and Zinman (2001) and Gruber (2001).

Table II. Estimation results: smoking (t-values in parentheses)

                                                 With fixed effects
                         No SI      With SI      No SI      With SI
Constant                 −4.21      −3.46        −4.74      −4.67
                         (−19.2)    (−12.2)      (−6.9)     (−6.5)
Girl                     0.044      −0.037       0.025      0.030
                         (1.0)      (−0.2)       (0.5)      (0.1)
Age                      0.188      0.167        0.176      0.174
                         (12.3)     (9.7)        (7.7)      (7.4)
Non-Dutch                −0.269     −0.264       −0.204     −0.202
                         (−3.3)     (−3.1)       (−1.9)     (−1.9)
Single-parent family     0.210      0.194        0.224      0.222
                         (3.2)      (2.9)        (3.0)      (2.9)
MAVO                     0.167      0.182        0.245      0.247
                         (3.6)      (3.4)        (2.9)      (2.9)
HAVO                     −0.052     −0.060       −0.102     −0.105
                         (−0.9)     (−1.0)       (−1.1)     (−1.1)
VWO                      −0.198     −0.145       −0.266     −0.263
                         (−3.2)     (−2.1)       (−2.6)     (−2.4)
Father's working time    0.002      0.002        0.002      0.002
                         (1.4)      (1.3)        (1.2)      (1.2)
Mother's working time    0.005      0.005        0.005      0.005
                         (3.5)      (3.3)        (3.2)      (3.1)
Catholic                 −0.211     −0.207       −0.200     −0.197
                         (−4.3)     (−4.1)       (−2.8)     (−2.7)
Protestant               −0.122     −0.154       −0.152     −0.148
                         (−2.2)     (−2.6)       (−1.7)     (−1.6)
γBB                                 0.722                   0.125
                                    (4.4)                   (0.7)
γBG                                 0.389                   0.015
                                    (1.6)                   (0.1)
γGB                                 0.336                   0.041
                                    (1.7)                   (0.2)
γGG                                 0.575                   0.113
                                    (3.6)                   (0.6)
Log-likelihood function  −2163.7    −2151.3      −2109.5    −2106.9

Column two presents results for the model with social interactions. All social interaction coefficients are positive, and the two intra-gender coefficients are highly significant. The largest one is γBB, measuring the boy–boy interaction, followed in size by γGG, measuring the interaction between girls. The coefficients γGB and γBG, measuring the cross-gender interactions, are smaller in size and not significant. Note that the inclusion of the social interaction coefficients hardly affects the other parameters.

We have also estimated the model for truancy, moped ownership, cell phone ownership, and asking parents' permission for purchases.12 Table III reports the results. (For ease of comparison, the first column in Table III repeats the second column from Table II.)

Table III. Estimation results (t-values in parentheses)

                         Smoking    Truancy    Moped      Cell phone  Permission
Constant                 −3.46      −2.90      −4.54      −2.71       4.39
                         (−12.2)    (−9.4)     (−13.3)    (−13.0)     (18.0)
Girl                     −0.037     −0.013     −0.978     0.027       −0.036
                         (−0.2)     (−0.2)     (−3.4)     (0.3)       (−0.2)
Age                      0.167      0.165      0.256      0.151       −0.211
                         (9.7)      (9.8)      (14.2)     (10.8)      (−14.9)
Non-Dutch                −0.264     0.131      −0.157     0.184       −0.193
                         (−3.1)     (2.0)      (−1.7)     (3.1)       (−3.1)
Single-parent family     0.194      0.054      −0.031     0.280       −0.266
                         (2.9)      (0.9)      (−0.4)     (5.0)       (−4.6)
MAVO                     0.182      0.120      −0.094     0.082       −0.104
                         (3.4)      (2.3)      (−1.4)     (1.8)       (−2.1)
HAVO                     −0.061     0.144      −0.200     −0.051      −0.159
                         (−1.0)     (2.3)      (−2.7)     (−0.9)      (−2.9)
VWO                      −0.145     0.059      −0.377     −0.249      −0.071
                         (−2.1)     (0.9)      (−4.1)     (−3.9)      (−1.1)
Father's working time    0.002      −0.001     0.002      −0.001      −0.005
                         (1.3)      (−0.7)     (1.1)      (−0.8)      (−3.4)
Mother's working time    0.005      0.002      0.002      0.002       −0.003
                         (3.3)      (1.3)      (1.1)      (2.1)       (−2.7)
Catholic                 −0.207     −0.156     0.004      −0.011      0.246
                         (−4.1)     (−3.3)     (0.1)      (−0.3)      (5.2)
Protestant               −0.154     −0.124     −0.074     −0.277      0.286
                         (−2.6)     (−2.4)     (−1.1)     (−5.0)      (5.3)
γBB                      0.722      0.826      0.544      0.495       0.168
                         (4.4)      (8.9)      (2.7)      (4.8)       (1.2)
γBG                      0.388      0.518      0.424      0.332       0.073
                         (1.6)      (3.7)      (1.8)      (2.2)       (0.4)
γGB                      0.336      0.442      0.238      0.400       0.000
                         (1.7)      (2.9)      (0.8)      (2.5)       (0.0)
γGG                      0.575      1.110      0.012      0.673       0.036
                         (3.6)      (11.5)     (0.0)      (7.3)       (0.3)
Log-likelihood function  −2151.3    −3236.7    −1562.0    −3626.3     −2776.6

For truancy, the intra-gender effects are stronger than for smoking. Moreover, we now also have significant cross-gender interactions. The probability of truancy increases sharply with age, is larger for non-Dutch pupils, and is lower for students in VWO, the highest education level in high school.

Moped ownership is the only type of behavior where we find a large gender effect: the probability of moped ownership is much larger for boys than for girls. It strongly increases in age (the legal minimum age for riding a moped in the Netherlands is 16) and decreases with the level of education. It is also the only type of behavior where we have a clear asymmetry in social interactions between genders. For a boy, the probability of moped ownership is strongly affected by moped ownership of other boys and, to a somewhat lesser extent, of girls. Moped ownership for girls, on the other hand, is not affected by social interactions.

For cell phone ownership we again find an increasing effect of age and a decreasing effect of education. Teenagers from a single-parent family have a much larger probability of owning a cell phone. All social interaction coefficients are significant, with the girl–girl effect being largest in magnitude.

The probability of asking parents' permission before purchasing something strongly decreases with age, and is smaller for non-Dutch pupils and for pupils in a single-parent household. It also significantly decreases with father's and mother's working time. The four social interaction coefficients are (jointly) insignificant. This suggests that students do not copy each other's responses when filling out the questionnaire.

4.4. Correlated Within-Class Error Terms

Clearly, a more flexible specification would be obtained by allowing for class-specific fixed effects. With the current data, the estimation of class-specific fixed effects is infeasible. Apart from the dependence of identification on functional form assumptions, estimation would have to be based on a much smaller number of observations since classes with non-smokers only (or with smokers only) cannot be used. However, we can estimate the model with class-specific random effects.13 We assume the covariance matrix Σ of (ϵ1, …, ϵN) to be a ‘one-factor’ matrix such that Σ = {ρij} with ρij = ρ if i ≠ j and ρij = 1 if i = j. To calculate the probabilities P(ϵ ∈ W(y, θ)) we use a decomposition simulator which effectively depends on only a one-dimensional random variable; cf. Stern (1992).14
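
The one-factor structure can be illustrated with a short numerical sketch. It builds Σ for a hypothetical class of four pupils with a hypothetical ρ = 0.3, draws errors as the sum of an idiosyncratic term and one common class-level term (the decomposition described in footnote 14, valid for 0 ≤ ρ < 1), and compares the sample covariance with Σ. The function names, class size, ρ and number of draws are illustrative assumptions, not values from the paper.

```python
import numpy as np

def one_factor_cov(n, rho):
    """Equicorrelated matrix: 1 on the diagonal, rho everywhere else."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

def draw_errors(n, rho, n_draws, rng):
    """Draw eps_i = u_i + v with var(u_i) = 1 - rho and var(v) = rho (0 <= rho < 1)."""
    u = rng.normal(0.0, np.sqrt(1.0 - rho), size=(n_draws, n))   # idiosyncratic part
    v = rng.normal(0.0, np.sqrt(rho), size=(n_draws, 1))         # one common draw per class
    return u + v

rng = np.random.default_rng(0)
sigma = one_factor_cov(4, 0.3)            # hypothetical class of 4 pupils, rho = 0.3
eps = draw_errors(4, 0.3, 200_000, rng)
sample_cov = np.cov(eps, rowvar=False)

print(np.round(sigma, 2))
print(np.round(sample_cov, 2))
```

Because only v is shared within a class, conditioning on v makes the errors independent, which is why the simulator needs just a one-dimensional integral.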

We first estimated this version of the model for smoking without social interaction effects. We found the estimated ρ to be small but significant (equation image, t-value 4.5), with the other parameters largely unaffected. When estimating the model with social interaction effects, the estimated ρ is virtually equal to zero and insignificant, with the other parameters being identical to those in the second column of Table II. This seems to indicate that the results in Table II are not driven by unobserved variables at the class level.

4.5. The Magnitude of the Social Interaction Effects

In order to gain some insight in the magnitude of the social interaction effects implied by the estimated γ's, consider a reference class (largely based on median values of exogenous variables). This is a hypothetical MAVO class composed of 8 girls and 8 boys; all of them are aged 14, Dutch, non-Protestant, non-Catholic, and come from a two-parent household with a father working 36 hours per week and a mother working 16 hours per week. Using the estimated parameters from Table III, we find that in equilibrium the expected number of truanters is 3.14 (the probability of truancy is 0.191 for girls and 0.201 for boys).15

Now suppose that a surely truanting girl is added to this class (i.e., we add a girl with characteristics such that her probability of truancy is virtually equal to 1, irrespective of the behavior of others). Without social interaction effects, the expected fraction of truanters would rise from 0.196 (3.14/16) to 0.244 (4.14/17): a 24% increase. Taking social interaction effects into account, the new equilibrium fraction of truanters rises to 0.278 (4.73/17): an increase of 41% compared to the original level. If a surely non-truanting girl is added to this class, the expected fraction decreases from 0.196 (3.14/16) to 0.185 (3.14/17) without social interaction effects (a 6% decrease), and to 0.169 (2.88/17) with social interaction effects (a 14% decrease).
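
The fractions in this thought experiment follow directly from the expected numbers of truanters quoted in the text. The sketch below recomputes them; the variable and function names are illustrative, and the percentage changes it prints are derived from the rounded figures in the text, so they may differ slightly from values based on unrounded simulation output.

```python
# Expected numbers of truanters as quoted in the text for the reference class.
base_expected = 3.14
n_base, n_new = 16, 17

def fraction(expected, n):
    """Expected fraction of truanters in a class of n pupils."""
    return expected / n

def pct_change(new, old):
    """Relative change in percent."""
    return 100.0 * (new - old) / old

base = fraction(base_expected, n_base)
scenarios = {
    "sure truant, no interaction": fraction(base_expected + 1.0, n_new),
    "sure truant, with interaction": fraction(4.73, n_new),
    "sure non-truant, no interaction": fraction(base_expected, n_new),
    "sure non-truant, with interaction": fraction(2.88, n_new),
}
for label, frac in scenarios.items():
    print(f"{label}: {frac:.3f} ({pct_change(frac, base):+.1f}%)")
```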

The model also implies that a change in the value of an exogenous variable of only one of the pupils in principle affects the behavior of all pupils in class. Suppose, for example, that the mother of one of the girls in the reference class increases her working hours to 46 per week. Then the equilibrium truancy probability of her daughter increases from 0.191 to 0.210. However, it also changes the equilibrium truancy probabilities of the other girls (from 0.1909 to 0.1915) and boys (from 0.2002 to 0.2012). As a result, the expected number of truanters in class increases not only by 0.019 (0.210 − 0.191), but by 0.031.
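
The total of 0.031 can be reproduced by adding the spillovers on the 7 other girls and the 8 boys of the reference class to the daughter's own change. A minimal sketch using only the probabilities quoted above (the variable names are illustrative):

```python
own_change = 0.210 - 0.191             # the daughter's own truancy probability
other_girls = 7 * (0.1915 - 0.1909)    # the 7 remaining girls in the class
boys = 8 * (0.2012 - 0.2002)           # the 8 boys in the class
total = own_change + other_girls + boys

print(f"direct: {own_change:.3f}, spillover: {other_girls + boys:.4f}, total: {total:.3f}")
```

The spillover on classmates accounts for roughly 0.012 of the total, which is the social multiplier at work in this example.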

4.6. School-Specific Fixed Effects

Smoking behavior in all classes of a given school is likely to be affected by a number of unobserved school-specific factors, like smoking behavior of teachers, the school's policy regarding smoking, and proximity of tobacco outlets. Similar effects are likely to be present for the other behaviors. Unobserved school-specific factors may also be related to a non-random assignment of pupils to schools. For example, parents who smoke themselves may be less likely to send their children to a school in which smoking is strictly prohibited. Significant social interaction coefficients may then merely reflect the failure to control for these unobserved effects. While the estimation of class-specific fixed effects is infeasible with the current data, we estimate in this section a version with school-specific fixed effects.

The inclusion of school-specific fixed effects amounts to estimating 69 additional parameters (one school is reference category) for truancy, cell phone ownership, and asking permission. For smoking two other schools are omitted because they have non-smokers only; for a similar reason five schools are omitted in the moped ownership model. For smoking the results are reported in the fourth column of Table II (the third column reports the results for the model without social interaction but with fixed effects). All social interaction coefficients are now smaller in magnitude, and none of them is significant; a χ²-test shows that the γ's are also jointly insignificant (p = 0.261).

Table IV reports the results for all discrete-choice behaviors. (For ease of comparison the first column in Table IV repeats the fourth column from Table II). In addition to smoking, the social interaction coefficients are also jointly insignificant for moped ownership, cell phone ownership, and asking permission, and several γ's—in particular, those related to cross-gender interactions—reached their lower bound.

Table IV. Estimation results, with school-specific fixed effects (t-values in parentheses)

                              Smoking    Truancy    Moped      Cell phone  Permission
Constant                      −4.67      −3.65      −5.51      −4.06       4.34
                              (−6.5)     (−5.5)     (−9.5)     (−10.5)     (13.5)
Girl                          0.030      −0.010     −0.724     −0.041      −0.116
                              (0.1)      (−0.1)     (−2.1)     (−0.9)      (−2.8)
Age                           0.174      0.165      0.303      0.190       −0.200
                              (7.4)      (7.9)      (11.7)     (10.6)      (−10.3)
Non-Dutch                     −0.202     0.172      −0.165     0.079       −0.200
                              (−1.9)     (2.2)      (−1.3)     (1.2)       (−2.7)
Single-parent family          0.222      0.080      −0.049     0.265       −0.246
                              (2.9)      (1.2)      (−0.5)     (4.3)       (−3.8)
MAVO                          0.247      0.344      −0.192     0.022       −0.135
                              (2.9)      (4.5)      (−1.7)     (0.0)       (−1.5)
HAVO                          −0.105     0.206      −0.292     −0.213      −0.121
                              (−1.1)     (2.6)      (−2.4)     (−2.5)      (−1.4)
VWO                           −0.263     0.105      −0.481     −0.511      (0.014)
                              (−2.4)     (1.3)      (−3.8)     (−5.3)      (0.1)
Father's working time         0.002      −0.000     0.003      −0.000      −0.006
                              (1.2)      (−0.2)     (1.2)      (−0.1)      (−3.2)
Mother's working time         0.005      0.002      0.002      0.002       −0.004
                              (3.1)      (1.5)      (1.2)      (1.9)       (−2.5)
Catholic                      −0.197     −0.140     −0.025     −0.041      0.221
                              (−2.7)     (−2.4)     (−0.3)     (−0.8)      (3.4)
Protestant                    −0.148     −0.185     −0.156     −0.217      0.249
                              (−1.6)     (−2.7)     (−1.3)     (−3.0)      (3.2)
γBB                           0.125      0.285      0.031      0.000       0.000
                              (0.7)      (2.7)      (0.1)      (–)         (–)
γBG                           0.015      0.089      0.004      0.000       0.000
                              (0.1)      (0.6)      (0.0)      (–)         (–)
γGB                           0.041      0.023      0.000      0.000       0.000
                              (0.2)      (1.6)      (–)        (–)         (–)
γGG                           0.113      0.542      0.000      0.032       0.000
                              (0.6)      (5.3)      (–)        (0.3)       (–)
Log-likelihood function       −2106.9    −3183.2    −1517.9    −3499.5     −2729.7
Significance γ's (p-values)   0.261      0.000      0.704      0.538       1.000
Number of students            7454       7510       7392       7513        7534
Number of classes             479        483        474        483         485
Number of schools             68         70         65         70          70

Truancy is the exception. Both intra-gender interaction coefficients are still highly significant. Although smaller than in Table III, the effects are still substantial in magnitude, in particular the girl–girl interaction.

In all cases, the fixed effects are jointly significant at the 5% significance level.

5. CONCLUSION

We derived a number of equilibrium properties for the binary choice interaction model with a finite number of agents. Both for the case with strategic complements and strategic substitutes, equilibrium existence was proved and tight upper bounds were derived for the size of the set of equilibria, given the number of agents and the degree of interaction between them. We also briefly discussed the consequences for the set of equilibria when the model is extended to allow for gender-dependent interactions. The main finding here is that the cross-gender parameters are irrelevant in the derivation of the upper bounds.

In our application to teenagers' discrete choices, we found that most of the social interaction coefficients become insignificant once the model allows for school-specific fixed effects. An exception is truancy, for which both intra-gender social interaction coefficients remain significant. The fact that we do find significant social interaction effects for a type of behavior closely related to school (truancy) and do not find such effects for behaviors farther away from school suggests that our model measures genuine endogenous social interaction effects rather than unobserved social group effects.

The work presented in this paper indicates various possible extensions for future research. An example is to allow for more general interaction structures, for example by making interaction parameters dependent on socio-economic characteristics. Another, more general issue—typically neglected in the empirical social interactions literature to date—is the question which type of equilibrium concept is appropriate. The fact that classmates interact daily, usually for many years, and often become friends suggests that non-cooperative Nash equilibria may not always be plausible.

While the present dataset has a number of important advantages in terms of information on reference group members, the empirical results are subject to the usual qualifications regarding inferences about social interactions based on cross-section data. Future steps toward increasing our understanding of social interactions will require more informative data and models characterized by a tight link between game theory and econometrics.

Acknowledgements

A large part of the work on this paper was done while we were both at the University of Groningen. We thank Rob Alessie, Ulf Bockenholt, Marco Haan, Yannis Ioannides, Joyce Jacobsen, Brian Krauth, Bertrand Melenberg, Bert Schoonbeek, Michel Wedel, and anonymous referees for helpful comments and discussions. In addition, we benefited from comments by seminar participants at ESEM2002, RAND Corporation, Tilburg University, the University of Amsterdam, and the University of California at Santa Barbara. Soetevent's research was supported by a grant from the MacArthur Research Network on Social Interactions and Economic Inequality, and by the Netherlands Organization for Scientific Research (NWO). Part of this paper was written while Soetevent was a visitor at the University of Wisconsin at Madison, whose hospitality he gratefully acknowledges.

APPENDIX: PROOFS

Proof of Proposition 1: Equilibrium Existence

Proof. The case for γ = 0 is obvious. We prove Proposition 1 for the game with strategic complements (γ> 0) and the game with strategic substitutes (γ< 0) separately. For the first case, existence can be readily proved by showing that the game belongs to the class of supermodular games. Existence then immediately follows from using Theorem 5 in Milgrom and Roberts (1990, p. 1265). In this Appendix however, we will follow for both cases the alternative route of proving equilibrium existence through finding an explicit equilibrium for all combinations of {β, γ, x, ϵ}. This procedure may give more insight into some of the peculiarities of the model.

Every possible combination of {β, γ > 0, x, ϵ} clearly falls into one of the following three categories:

  • (i) z[1] ≤ 0;
  • (ii) z[N] > 0;
  • (iii) z[1] > 0, z[N] ≤ 0.

We show that for each z in every category there is an associated y for which (4) holds, for all values γ > 0.

  • (i) z[1] ≤ 0: yi = −1, i = 1, 2, …, N (k = −N) is an equilibrium solution, since equation image. This implies that equation image since equation image is a constant and z[i] weakly decreases with i.
  • (ii) z[N] > 0: yi = 1, i = 1, 2, …, N (k = N) is an equilibrium solution, since equation image, equation image.
  • (iii) z[1] > 0, z[N] ≤ 0: Define M ≡ 0 if equation image, j ∈ {1, 2, …, N} and M ≡ arg max_i equation image otherwise. Five examples of sequences of z[i] with N = 6 and γ = 1 are plotted in Figure 2 together with the corresponding values of M. The solid line represents the equation equation image.
    If M = 0, y[i] = −1, i = 1, 2, …, N is an equilibrium solution, since equation image, equation image. (See the +-sequence in Figure 2.)
    If M > 0, y[i] = 1 for i = 1, 2, …, M and y[i] = −1 for i = M + 1, M + 2, …, N (k = M − [N − M] = 2M − N) is an equilibrium solution, since equation image for i = 1, 2, …, M and equation image for all j = M + 1, M + 2, …, N.
    Note that for sequences of z[i]'s for which M = N (like the sequences of circles and x's in Figure 2), y[i] = −1, i = 1, 2, …, N is another equilibrium solution iff z[1] ≤ γ. In Figure 2, this condition holds for the sequence of x's but not for the sequence of circles. □
Figure 2. Five examples of z[i]-sequences and the corresponding solutions for M ≡ arg max_i equation image for the case with N = 6 and γ = 1
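
Existence under strategic complements can also be checked numerically by brute force. The sketch below assumes the best-response rule y_i = 1 if z_i + γ/(N − 1) · Σ_{j≠i} y_j > 0 and y_i = −1 otherwise. Equation (4) is not reproduced in this section, so this form is an assumption, chosen to be consistent with the conditions z[N] > −γ and z[1] ≤ γ for the two uniform profiles. The function names, the random draws and the two example z-vectors are hypothetical.

```python
import itertools
import numpy as np

def is_equilibrium(y, z, gamma):
    """Assumed rule: y_i = 1 iff z_i + gamma/(N-1) * sum_{j != i} y_j > 0, else -1."""
    n = len(z)
    others = y.sum() - y                       # sum of the other agents' choices
    latent = z + gamma * others / (n - 1)
    return bool(np.all(np.where(latent > 0, 1, -1) == y))

def equilibria(z, gamma):
    """Brute-force search over all 2^N profiles in {-1, 1}^N."""
    profiles = (np.array(p) for p in itertools.product((1, -1), repeat=len(z)))
    return [y for y in profiles if is_equilibrium(y, z, gamma)]

rng = np.random.default_rng(1)
counts = []
for _ in range(200):
    z = rng.normal(size=6)                     # hypothetical private utilities, N = 6
    gamma = rng.uniform(0.0, 3.0)              # strategic complements
    counts.append(len(equilibria(z, gamma)))
print(min(counts), max(counts))

# Cases (i) and (ii) of the proof: a uniform profile is an equilibrium.
all_minus = is_equilibrium(-np.ones(6, dtype=int),
                           np.array([-0.1, -0.5, -1.0, -2.0, -3.0, -4.0]), 1.0)
all_plus = is_equilibrium(np.ones(6, dtype=int),
                          np.array([4.0, 3.0, 2.0, 1.0, 0.5, 0.1]), 1.0)
print(all_minus, all_plus)
```

Enumeration is feasible only for small N, since the number of profiles grows as 2^N; it serves as a check on the constructive argument, not as a replacement for it.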

Strategic Substitutes (γ < 0)

In this case, we distinguish between the case where the number of subjects N is even and the case where this number is odd.

N even. Let γ < 0. Define m ≡ arg max_i (z[i] > 0). Suppose that m > N/2; that is, the majority of the subjects have a value of z greater than zero. Define the non-overlapping non-empty intervals equation image; equation image and, if m > N/2 + 1, equation image, for r = 1, 2, …, m − N/2 − 1.

First consider the case m > N/2 + 1. Since the intervals are non-overlapping and since I_0 ∪ I_1 ∪ … ∪ I_{m−N/2} = [0, ∞), −γ is in one and only one of these intervals. If −γ ∈ I_0, y = (1, 1, …, 1_m, −1, …, −1)′ (k = 2m − N) is an equilibrium, since for this solution equation image and equation image. If −γ ∈ I_r, for r = 1, 2, …, m − N/2 − 1, y = (1, 1, …, 1_{m−r}, −1, …, −1)′ (k = 2(m − r) − N) is an equilibrium, since for this solution equation image and equation image. If −γ ∈ I_{m−N/2}, y = (1, 1, …, 1_{N/2}, −1, …, −1)′ (k = 0) is an equilibrium, since for this solution equation image and equation image.

If m = N/2 + 1, then I_0 ∪ I_{m−N/2} = I_0 ∪ I_1 = [0, ∞). Applying similar reasoning, one can verify that y = (1, 1, …, 1_{N/2+1}, −1, …, −1)′ (k = 2) is an equilibrium when −γ ∈ I_0 and that y = (1, 1, …, 1_{N/2}, −1, …, −1)′ (k = 0) is an equilibrium when −γ ∈ I_1.

If m = N/2, then y = (1, 1, …, 1_{N/2}, −1, …, −1)′ is an equilibrium for all −γ ∈ (0, ∞), since equation image and equation image.

Due to symmetry, the above argument can be applied for m < N/2 with m replaced by equation image ≡ N − m > N/2 and the roles of the outcomes +1 and −1 interchanged.

N odd. The above argument can also be applied for odd N. Suppose that m > (N + 1)/2 and define equation image, equation image and, if m > (N + 1)/2 + 1, equation image, for r = 1, 2, …, m − (N + 1)/2 − 1.

Taking the case that m > (N + 1)/2 + 1, it follows that for −γ ∈ I_0, y = (1, 1, …, 1_m, −1, …, −1)′ (k = 2m − N) is an equilibrium; for −γ ∈ I_r, r = 1, 2, …, m − (N + 1)/2 − 1, y = (1, 1, …, 1_{m−r}, −1, …, −1)′ (k = 2(m − r) − N) is an equilibrium; and for −γ ∈ I_{m−(N+1)/2}, y = (1, 1, …, 1_{(N+1)/2}, −1, …, −1)′ (k = 1) is an equilibrium.

If m = (N + 1)/2 + 1, then I_0 ∪ I_{m−(N+1)/2} = I_0 ∪ I_1 = [0, ∞). Applying similar reasoning, one can verify that y = (1, 1, …, 1_{(N+1)/2+1}, −1, …, −1)′ (k = 3) is an equilibrium when −γ ∈ I_0 and that y = (1, 1, …, 1_{(N+1)/2}, −1, …, −1)′ (k = 1) is an equilibrium when −γ ∈ I_1.

If m = (N + 1)/2, then y = (1, 1, …, 1_{(N+1)/2}, −1, …, −1)′ is an equilibrium for all −γ ∈ (0, ∞), since equation image and equation image equation image. Again, the case with m < (N + 1)/2 follows from symmetry. □

Proof of Proposition 2: Maximum Number of Equilibria (Strategic Complements)

The proof for strategic complements uses the following lemma:

Lemma 1. Let γ > 0. Suppose model (4) has an equilibrium y. Then

equation image

where zi ≡ β′xi + ϵi.

Proof of Lemma 1

Proof. Consider an agent i with yi = 1 and an agent j with yj = − 1. Suppose equation image. Then equation image. But since yi = 1 and yj = − 1 implies equation image, we have a contradiction. □

The lemma's effect is that it restricts the maximum number of potential equilibria to N + 1. The following observation is an immediate consequence of Lemma 1:

  • (1) In any equilibrium the agents with yi = 1 are those with the largest values for zi.

Now consider two vectors y and ỹ that differ in one element only. Without loss of generality, assume that yi = 1 and ỹi = −1 for some i. Define y−i ≡ (y1, y2, …, yi−1, yi+1, …, yN)′ and ỹ−i ≡ (ỹ1, ỹ2, …, ỹi−1, ỹi+1, …, ỹN)′. Since y−i = ỹ−i, it follows that equation image equation image given a combination of {β, γ, x, ϵ}. This implies that yi = ỹi and we arrive at a contradiction. Note that this result holds irrespective of γ being positive or negative. The following observation is thus obtained:

  • (2) Two vectors y and ỹ that differ in only one element cannot both belong to the set of equilibria.

From observations (1) and (2) it follows that the number of equilibria for a given combination of {β, γ > 0, x, ϵ} can be at most ⌊N/2⌋ + 1, where ⌊w⌋ denotes the largest integer not larger than w. To give an example: when the number of agents N = 8, the number of equilibria can be at most ⌊8/2⌋ + 1 = 5. Due to statements (1) and (2), the strategy profiles of these equilibria must be strictly ordered and differ in at least two elements. This leaves the following five strategy profiles as the only candidates:

equation image

This proves the first part of Proposition 2. The proof of the second part—the upper bound on the number of equilibria is strict—runs as follows. Denote the d equilibria that are to be sustained as16

equation image

First note that y^1 can be sustained as an equilibrium outcome if and only if z[N] > −γ and that y^d can be sustained as an equilibrium outcome if and only if z[1] ≤ γ. Further note that y^{d−i}, i = 1, …, d − 2, can be sustained as equilibria if and only if equation image and equation image. The fact that these necessary and sufficient conditions on the values of z can be satisfied simultaneously completes the proof. □
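
That the bound can be attained is easy to illustrate for N = 8, taking the bound to be ⌊N/2⌋ + 1, which matches the five candidate profiles mentioned above. The sketch below again assumes the best-response rule y_i = 1 if z_i + γ/(N − 1) · Σ_{j≠i} y_j > 0 and −1 otherwise (an assumption, since equation (4) is not reproduced in this section). The values of z and γ are hypothetical, picked by hand so that the profiles with 0, 2, 4, 6 and 8 agents choosing +1 are all sustained.

```python
import itertools
import numpy as np

def is_equilibrium(y, z, gamma):
    """Assumed rule: y_i = 1 iff z_i + gamma/(N-1) * sum_{j != i} y_j > 0, else -1."""
    n = len(z)
    others = y.sum() - y                       # sum of the other agents' choices
    latent = z + gamma * others / (n - 1)
    return bool(np.all(np.where(latent > 0, 1, -1) == y))

n = 8
gamma = 7.0                                    # hypothetical, so gamma/(N-1) = 1
z = np.array([6.5, 6.0, 2.5, 2.0, -1.5, -2.0, -5.5, -6.0])  # hypothetical, sorted descending

found = [np.array(p) for p in itertools.product((1, -1), repeat=n)
         if is_equilibrium(np.array(p), z, gamma)]
ones_per_equilibrium = sorted(int((y == 1).sum()) for y in found)
bound = n // 2 + 1

print(len(found), bound, ones_per_equilibrium)
```

In this example every equilibrium assigns +1 to the agents with the largest z values, and consecutive equilibria differ in exactly two elements.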

Proof of Proposition 3: Maximum Number of Equilibria (Strategic Substitutes)

In order to prove Proposition 3, we will use the following lemma.

Lemma 2. For a given combination {β, γ < 0, x, ϵ}, y and ỹ are both equilibria of (4) only if equation image.

The proof of Lemma 2 uses the following lemma.

Lemma 3. If for a given combination {β, γ < 0, x, ϵ} there exists an equilibrium y with y[j] = −1 and y[j+1] = 1, then there also exists an equilibrium ỹ with ỹ[j] = 1 and ỹ[j+1] = −1 and ỹ[i] = y[i] for i ≠ j, j + 1.

Proof of Lemma 3

Proof. From the fact that y is an equilibrium with y[j] = − 1 and y[j+1] = 1, it follows that

equation image

However, since γ< 0, we have

equation image

It then follows that ỹ with ỹ[i] = y[i] for i ≠ j, j + 1 and ỹ[j] = 1 and ỹ[j+1] = −1 is also an equilibrium. □

Having proved lemma 3 we can now prove lemma 2.

Proof of Lemma 2

Proof. Suppose that y with equation image and ỹ with equation image and k̃ ≠ k are both equilibria of (4), given a combination {β, γ < 0, x, ϵ}. From Lemma 3 it follows that this is true only if equation image and equation image are both equilibria given {β, γ < 0, x, ϵ}. Assume without loss of generality that k̃ > k, that is, k̃ − k ≥ 2. Let ν be the first subject whose choice is −1 in equilibrium y^k and +1 in equilibrium y^{k̃}. Then, for this subject

equation image

But also

equation image

and the contradiction follows. □

The message of Lemma 2 is that for a given value of γ < 0, two different equilibria y and ỹ can coexist only if equation image. That is, both equilibria must have the same number of subjects with outcome +1 and with outcome −1.
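
This property lends itself to a brute-force check. The sketch below assumes the best-response rule y_i = 1 if z_i + γ/(N − 1) · Σ_{j≠i} y_j > 0 and −1 otherwise (an assumption, since equation (4) is not reproduced in this section), draws hypothetical z-vectors and negative values of γ, and records whether an equilibrium always exists and whether all equilibria of a given draw share the same k = Σ y_i.

```python
import itertools
import numpy as np

def is_equilibrium(y, z, gamma):
    """Assumed rule: y_i = 1 iff z_i + gamma/(N-1) * sum_{j != i} y_j > 0, else -1."""
    n = len(z)
    others = y.sum() - y                       # sum of the other agents' choices
    latent = z + gamma * others / (n - 1)
    return bool(np.all(np.where(latent > 0, 1, -1) == y))

def equilibria(z, gamma):
    """Brute-force search over all 2^N profiles in {-1, 1}^N."""
    profiles = (np.array(p) for p in itertools.product((1, -1), repeat=len(z)))
    return [y for y in profiles if is_equilibrium(y, z, gamma)]

rng = np.random.default_rng(2)
always_exists = True
same_k = True
for _ in range(200):
    z = rng.normal(size=5)                     # hypothetical private utilities, N = 5
    gamma = -rng.uniform(0.1, 3.0)             # strategic substitutes
    ks = {int(y.sum()) for y in equilibria(z, gamma)}
    always_exists &= len(ks) >= 1              # at least one equilibrium found
    same_k &= len(ks) <= 1                     # all equilibria share one value of k

print(always_exists, same_k)
```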

Repeated application of Lemma 3 shows that a strategy profile y with equation image can only be an equilibrium if the ordered (with respect to the zi's) strategy profile y = (1_1, 1_2, …, 1_k, −1_{k+1}, …, −1_N)′ is an equilibrium. This result will prove to be useful later on in deriving upper bounds for the number of equilibria that may be sustained for a given value of γ.

To complete the proof of Proposition 3, note that the first part of lemma 2 implies that the maximum number of possible equilibria subject to the condition equation image is obtained when k is chosen to equal 0 (+1 or − 1) when N is even (odd). In that case, there are N/2 ((N + 1)/2 or (N − 1)/2) agents choosing + 1 and the others choosing − 1, giving the upper bounds on the number of possible equilibria as given by d(N) in Proposition 3.

What is left to show is that there exists a combination of {β, γ < 0, x, ϵ} for which the maximum number of equilibria is obtained. From Lemma 2 we know that, given a combination of {β, γ < 0, x, ϵ}, every element in the equilibrium set must have the same number of agents choosing y = 1. For N even, the set can thus only have d_e(N) elements when the set contains all strategy profiles for which the number of agents choosing y = 1 equals the number of agents choosing y = −1. For each of these profiles to be an equilibrium, it must be optimal for each agent i to choose yi = 1 given that ∑_{j≠i} yj = −1 and to choose yi = −1 given that ∑_{j≠i} yj = 1. In particular, it must hold that for each element the number of agents splits:

equation image

For γ negative enough, this condition is satisfied irrespective of the values of z[1], …, z[N].
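
For a small even N this can be verified by enumeration. The sketch below assumes the best-response rule y_i = 1 if z_i + γ/(N − 1) · Σ_{j≠i} y_j > 0 and −1 otherwise (an assumption, since equation (4) is not reproduced in this section). With N = 6 and a hypothetical γ = −10, every profile with three agents choosing +1 and three choosing −1 is an equilibrium as long as all |z_i| stay below −γ/(N − 1) = 2, so the count should equal the binomial coefficient C(6, 3).

```python
import itertools
import math
import numpy as np

def is_equilibrium(y, z, gamma):
    """Assumed rule: y_i = 1 iff z_i + gamma/(N-1) * sum_{j != i} y_j > 0, else -1."""
    n = len(z)
    others = y.sum() - y                       # sum of the other agents' choices
    latent = z + gamma * others / (n - 1)
    return bool(np.all(np.where(latent > 0, 1, -1) == y))

n = 6
gamma = -10.0                                  # hypothetical, so -gamma/(N-1) = 2
z = np.array([1.5, 0.8, 0.1, -0.3, -0.9, -1.7])  # hypothetical, all |z_i| < 2

found = [np.array(p) for p in itertools.product((1, -1), repeat=n)
         if is_equilibrium(np.array(p), z, gamma)]
balanced = all(int(y.sum()) == 0 for y in found)   # every equilibrium has k = 0

print(len(found), math.comb(n, n // 2), balanced)
```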

For N odd, the equilibrium set can only contain d_o(N) elements when the set contains all strategy profiles for which equation image or all strategy profiles for which equation image. The necessary and sufficient conditions for each of the profiles for which equation image to be an equilibrium are

equation image(A.1)

and the corresponding conditions for the strategy profiles with equation image are

equation image(A.2)

From these conditions it follows that the equilibrium set with d_o(N) elements for which equation image is only obtainable when all z values are positive (non-positive). Together this proves Proposition 3. □

Lemma 2 and the observation that for the equilibria in the proof of Proposition 1 equation image monotonically decreases as γ→− ∞, together lead to the following corollary17 that for all equilibria, |k| decreases monotonically to 0 (1) as γ→− ∞, given N even (odd). This result is consonant with intuition: variation in behavior increases when the utility derived from being different increases.

Corollary 3. For the equilibria y of the discrete-choice interaction model given by (4),

equation image
Proof of Corollary 1

Define equation image, equation image if i is a girl and equation image if i is a boy. Denote the ordered values of equation image as equation image such that equation image, with N_G (N_B) denoting the total number of girls (boys) in the sample.

The line of reasoning used in the proof of Proposition 1 can now be applied to the subset of girls (boys), with z[i] replaced by equation image and γ replaced by γGG (γBB). □

  • 1

    As long as the model has an intercept, the specific support used is immaterial: Working with Yi = {−1, 1} gives qualitatively the same answers as working with equation imagei = {0, 1}. When no intercept is added, the difference is that when using equation imagei = {0, 1}, one implicitly assumes that only positive choices have a social effect. We thank Brian Krauth for helpful discussions on this matter.

  • 2

    When γ = 0, the model reduces to the standard binary choice formulation without externalities.

  • 3

    Milgrom and Roberts (1990, p. 1255). See also Vives (1990) and the textbook treatments of Topkis (1998) and Vives (1999).

  • 4

    We follow the convention 0^0 = 1. Note that the expression between brackets checks for each entry yit of a given strategy profile yt whether the choice is consistent with the value of the latent variable. If yit = 1, the first indicator function contains the relevant comparison, and for yit = −1 the second. If consistency holds for all entries of a given profile yt, the expression between brackets has a value of 1 (and zero otherwise). By aggregating over all 2^N possible strategy profiles, one obtains the number of equilibria. If the disturbances are i.i.d. with cumulative distribution function F(·), the expected number of equilibria can be expressed as

    equation image

    See Soetevent (2004) for some properties of ∂E[Q(β, γ, x, N)]/∂γ.

  • 5

    Note that given N, only those values of k for which N + k is an even number are possible. This follows from the observation that k = a·1 − (N − a), a ∈ {0, 1, …, N} can be rewritten as N + k = 2a.

  • 6

    The reason is that in (9), (exact) analytical information on the probability of W(y, θ) is used. In (10), this probability is estimated together with the probability of actually observing y conditional on a draw of ϵ being in W(y, θ).

  • 7

    Write the log-likelihood function as log L(θ) = ∑_k log p_k(θ) + ∑_k log r_k(θ), where p_k(θ) and r_k(θ) are the first and second right-hand side terms in (9), respectively, for class k. To increase computational speed, we set ∂r_k(θ)/∂θ = 0 in the initial phase of likelihood maximization; i.e., we calculate the term equation image in (9) only once for a given iteration–class combination, and keep it constant when evaluating the numerical derivatives.

  • 8

    Previous surveys were conducted in 1984, 1990, 1992, 1994, and 1996. The NSYS is a joint effort of the Social and Cultural Planning Office of the Netherlands (SCP) and the Netherlands Institute for Family Finance Information (NIBUD). In each survey year a random sample of high schools in the Netherlands is drawn. A participating school is compensated by means of a report summarizing the survey results for that school. The series of surveys is not a panel, although some schools have participated more than once.

  • 9

    A US dataset which is comparable to the present one is the National Education Longitudinal Study (NELS) (see, for example, Gaviria and Raphael, 2001). Both the Dutch NSYS and the NELS focus on non-cognitive outcomes within schools. The NELS is a biennial survey, first held in 1988, and samples students within roughly 1000 schools. An important difference with the Dutch NSYS is that the NELS surveys only a relatively small group of students within each school. For example, in the 1990 sample used by Gaviria and Raphael, the mean sample size per school was 13.3 students. While the NELS contains information on school averages, these are not available per class, grade, or gender. This limits the possibilities for an analysis of interactions within schools (for example, it is impossible to allow for a school-specific fixed effect) and it precludes any analysis of social interactions within classes. Two other US datasets on teenagers with peer group information are the Teenage Attitudes and Practices Survey (TAPS) and the National Longitudinal Survey of Youth (NLSY). However, the TAPS only contains subjective information on a respondent's four best same-sex friends, whereas the NLSY only has subjective peer information based on questions of the type ‘What percentage of kids in your grade …?’

  • 10

    Identification is based on the problematic assumption that perceived behavior is not determined by actual behavior.

  • 11

    A number of studies have reported indicators for self-esteem to be important explanatory variables in the analysis of teenage behavior (see, for example, Smetters and Gravelle, 2001). We choose not to include such a variable because of its potential endogeneity.

  • 12

    The variable ‘truancy’ in the empirical analysis is based on the question ‘How often have you been playing truant during the last (school)month?’ As truanters have a larger probability of being absent when the questionnaire is being filled out, there is a potential selection bias. The effect on the estimated social interaction coefficients, however, is likely to be small. The absence of a group of truanters with strong mutual interactions might bias the estimated γ's towards zero, but the presence of a group of non-truanters with strong mutual interactions will have the opposite effect. Moreover, tentative calculations indicate that the probability of a student truanting on a random schoolday is of the order of 1%.

  • 13

    As we assume that the within-class correlation coefficient between error terms does not vary across classes, this random effects model calls for the estimation of only one additional parameter.

  • 14

    Let the random variables u1, …, uN, and v be independently normally distributed with zero means; var(ui) = 1 − ρ, i = 1, …, N and var(v) = ρ. (We require ρ > 0; the procedure for ρ < 0 is slightly different. Note, however, that the positive definiteness of Σ implies ρ > −1/(N − 1).) Let ϵi = ui + v, i = 1, …, N. Then cov(ϵ) = Σ, with Σ defined in the main text. Now equation image, with Φ(.) the standard normal cumulative distribution function and f(v) a N(0, ρ) density function. The integral is simulated by drawing v from f(.) and then evaluating equation image conditional on v.

  • 15

    All numbers are based on simulations with R = 100 000.

  • 16

    When N is odd, there has to be one equilibrium that differs in at least three elements when compared to any of the other equilibria. Without loss of generality we assume the last three elements of y to be the three elements that move together.

  • 17

    The corresponding result for positive interactions is that equation image as γ→∞. That is, in the limit all agents conform to y = 1 or to y = − 1 regardless of their private utility such that variation in behavior is minimized.
