Child Mental Health and Educational Attainment: Multiple Observers and the Measurement Error Problem

We examine the effect of survey measurement error on the empirical relationship between child mental health and personal and family characteristics, and between child mental health and educational progress. Our contribution is to use unique UK survey data that contain (potentially biased) assessments of each child's mental state from three observers (parent, teacher and child), together with expert (quasi-)diagnoses, using an assumption of optimal diagnostic behaviour to adjust for reporting bias. We use three alternative restrictions to identify the effect of mental disorders on educational progress. Maternal education and mental health, family income, and major adverse life events are all significant in explaining child mental health, and child mental health is found to have a large influence on educational progress. Our preferred estimate is that a 1-standard deviation reduction in 'true' latent child mental health leads to a loss of 2-5 months in educational progress. We also find a strong tendency for observers to understate the problems of older children and adolescents compared to expert diagnosis.


Introduction
Childhood has become the focus of a growing body of research in economics concerned with the closely-related concepts of children's wellbeing, mental health and non-cognitive skills. Much of this interest has been sparked by Heckman's model of life-cycle human capital accumulation, which contends that, independently of cognitive ability, a stock of 'non-cognitive skills' is built up by streams of investment over the life course and determines a wide range of life outcomes (Heckman, Stixrud and Urzua, 2006). A strong motivation for this line of research comes from the belief that IQ or cognitive ability is much less malleable than socio-emotional skills, particularly after the age of 10. From a policy perspective, this would suggest that the returns to interventions targeted at non-cognitive skills are potentially much higher than those focused on cognitive outcomes alone. For example, the Perry preschool intervention program in the 1960s did not raise the IQ of participating children in a lasting way, yet participants went on to have better adult outcomes than the control group in a variety of dimensions (Heckman et al., 2010). The inference that Perry succeeded because of its impact on attention skills or antisocial behaviours, rather than cognitive ability, is supported by evaluations of more recent childhood interventions, which tend to show much larger effects on behaviour (of both parents and children) than on cognitive achievement outcomes (Currie, 2009).
Mental health conditions are much more common in childhood than most physical conditions and a growing body of evidence suggests that prevalence is highest among children from low-income backgrounds. The relationship between non-cognitive skills and medical conceptions of mental health is unclear, even though in practice they are often measured using the same indicators (for example, Duncan and Magnuson, 2009). Whether these adverse childhood states are interpreted as a lack of non-cognitive skills or as the existence of a mental health problem, a central concern is their impact on the process of human capital accumulation and the implications for the intergenerational transmission of economic advantage. It has been recognised recently that mental health conditions are potentially an important channel through which parental socio-economic status influences the outcomes of the next generation. For example, Currie and Stabile (2006, 2007) and Currie et al. (2010) have found significant impacts of hyperactivity on a range of later educational outcomes in US and Canadian longitudinal data and have shown the persistence of these effects. Evidence from the medical literature is rather more mixed but also indicates the potential importance of mental health problems (Duncan and Magnuson, 2009; Breslau et al., 2008, 2009).
A key issue in the empirical study of the impact of child mental health on child outcomes is reliability of measurement. Two types of measure are common in the research literature.
Clinical diagnoses are used extensively in psychiatric research, but they have several drawbacks: they are often only available for small, endogenously-sampled groups of children; they identify relatively extreme and rare cases (affecting somewhere in the region of 5 to 10% of children); and they are sensitive to differences in diagnostic practice, which may produce surprising differences between apparently similar groups (for example, diagnosed attention deficit and hyperactivity disorder (ADHD) rates in the US are double those in Canada).
Alternative measures derive from 'screener' questionnaires which can be completed quickly by parents, teachers or the children themselves, in the context of large-scale sample surveys.
These screeners are designed specifically to identify the symptoms of clinical disorders and are often used as a first step in diagnosing suspected cases: a high screening score is suggestive of a recognised disorder, while lower scores reflect the incidence of symptoms among the 'normal' population. These screener questionnaires are typically used in surveys that also include measures of later outcomes and so can be used to assess the relationship between early mental health and later outcomes. Few data sources are available that give both screening and diagnostic-type information for large representative samples.
Whatever type of information is used, measurement error is an important concern, but it has received little attention in the literature on the consequences of child mental health.
There is a substantial body of research suggesting that adults' assessments of their physical health are prone to serious measurement error (for example, Butler et al., 1987). The availability of multiple measures is particularly helpful in dealing with measurement error problems, but there is a strong possibility of observer-specific reporting bias. There is evidence in the psychology and medical literatures of large disagreements between informants in their assessment of children's psychological well-being. For example, in a sample of US children aged between 5 and 10, Brown et al. (2006) found that parents failed to detect half of school-aged children considered to be seriously disturbed by their teachers. Youngstrom et al. (2003) found that prevalence rates of comorbidity in a clinical sample ranged from 5.4% to 74.1%, depending on whether ratings from parent, teacher, child or some combination are used to classify the child. Goodman et al. (2000) suggest that parents are slightly better at detecting emotional disorders than teachers but that the opposite is true for conduct and hyperactivity disorders, while the self-assessments of children have less explanatory power than those of parents or teachers. Johnston et al. (2010) show, also using data from the Survey of Mental Health of Children and Young People in Great Britain, that estimates of the income gradient in childhood mental health are sensitive to who provides the assessment, with the smallest gradients found when using children's own assessments of themselves rather than those of parents and teachers.
A clear implication of this limited body of evidence is that measurement error is substantial and unlikely to be the simple random noise which is assumed by the classical errors-in-variables model. If no observer can be assumed to be unbiased, standard methods (such as that of Hu and Schennach, 2008) cannot be used to identify the true mental health process.
In this paper we make two main contributions. First, we exploit data from a remarkable UK survey (see Section 2) that contains assessments of children's mental health from parents, teachers and the children themselves, to demonstrate the existence of significant biases in all three observers. We do this by using additional diagnostic-style assessments from a panel of expert psychiatric assessors, under the assumption that the experts are able to make the best possible use (in a rational expectations sense) of all available information, but with random variations in the threshold of seriousness they use for generating diagnoses. This model of expert behaviour, set out in Section 3, allows us to identify (up to scale) the parameters of a model representing the distribution of 'true' child mental health conditional on personal and family characteristics.
Second, we estimate the effect of mental health on educational progress. This requires us to overcome a second identification problem, discussed in Section 4, arising from the difficulty in distinguishing the indirect effect of influences on mental health from their direct effect on educational attainment. We use alternative identification strategies to provide, in Section 5, parallel estimates of the impact of mental health problems on educational progress, relative to an age-specific norm. We show that an orthodox multiple-indicator latent variable model, which assumes the existence of an unbiased observer, would lead to the conclusion that mental disorders have an adverse impact roughly twice as large as is suggested by a simple regression estimate based on the observable proxy for mental health. However, two alternative (and preferable) instrumental variable strategies, which do not impose the simple assumption of an unbiased observer, give rather smaller estimates. We find in this case that they are also similar to those obtained from simple proxy regressions.

Data, Definitions and Descriptive Statistics
The data we use come from the 2004 Survey of Mental Health of Children and Young People in Great Britain, commissioned by the Department of Health and Scottish Executive Health Department, and carried out by the Office for National Statistics. Its aim was to provide information about the prevalence of psychiatric problems among people living in Great Britain, with a particular focus on three main categories of mental disorder: conduct disorders, emotional disorders and hyperkinetic disorders. A sample of children aged between 5 and 16 years was randomly drawn using a stratified sample design (by postcode) from the Child Benefit register. At the time of sampling, Child Benefit was essentially a universal entitlement for parents of all children, so the register provides an excellent sampling frame.
Information was obtained in 76% (or 7,977) of sampled cases, yielding information gathered from the child's primary caregiver (the child's mother in 94% of cases), from the teacher and (if aged 11-16) the young person him/herself. Among co-operating families, almost all the parents and most of the children gave full responses, while teacher postal questionnaires were obtained for 78% of the children interviewed. We focus on a sub-sample of 6,808 white children who have information supplied by their mother, and who have non-missing information for key covariates and mental health measures. The reason for this sample restriction was that ethnic minority and paternal respondent cases were too few for reliable inferences to be drawn about ethnic differences. Inclusion of these groups with associated dummy variables as covariates makes no appreciable difference to the main results.

Child mental health is first assessed in the survey with the Strengths and Difficulties Questionnaire (SDQ). The SDQ is a 25-item instrument for assessing social, emotional and behavioural functioning, and has become the most widely used research instrument related to the mental health of children. The SDQ questions cover positive and negative attributes and respondents answer each with a response "not true" (0), "somewhat true" (1), or "certainly true" (2). Appendix Table A1 gives a complete list of the SDQ questions relating to conduct disorder, hyperactivity and emotional problems. In our empirical analyses we use parent, child and teacher SDQ scores that have been constructed in the standard way by summing responses. We carry out the analysis using two alternative indicators: (i) a sum of the fifteen responses relating to conduct disorder, emotional problems and hyperactivity; and (ii) a sum of the five items for hyperactivity alone. Each is normalised to a 0-1 scale. The former measure is intended to act as a general assessment of psychological distress, while the latter focuses exclusively on the hyperactivity component of ADHD, which has been studied extensively in the research literature and found to be particularly important in some studies.
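The construction of the two SDQ indicators can be illustrated with a minimal sketch. It assumes that item responses are already coded 0, 1 or 2 in the direction of greater difficulty (so any reverse-scoring of positively worded items has been applied beforehand) and that the 0-1 normalisation divides the sum by its maximum possible value; the function name `sdq_score` and the example responses are hypothetical.

```python
def sdq_score(responses):
    """Sum item responses (each coded 0, 1 or 2) and rescale to the 0-1 interval.

    Dividing by the maximum attainable sum is an assumption about how the
    normalisation is carried out.
    """
    if not responses:
        raise ValueError("at least one item response is required")
    if any(r not in (0, 1, 2) for r in responses):
        raise ValueError("responses must be coded 0, 1 or 2")
    return sum(responses) / (2 * len(responses))

# Made-up responses to the five hyperactivity items.
hyperactivity_items = [2, 1, 0, 1, 1]
print(sdq_score(hyperactivity_items))  # 5 / 10 = 0.5
```

The same function applies to the fifteen-item general measure, where the maximum attainable sum is 30.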
Following the SDQ is the Development and Well-Being Assessment (DAWBA), a structured interview administered to parents and older children. The DAWBA contains a series of sections, with each section exploring a different disorder; examples include social phobia, post traumatic stress disorder, eating disorder, generalised anxiety, and depression. Each disorder section begins with a screening question that determines whether the child has a problem in that domain. If the child passes the screening question and the relevant SDQ score is normal, the remainder of the section is omitted but, if parent or child indicates that there is a problem or the SDQ score is high, detailed information is collected, including a description of the problem in the informant's own words. The DAWBA parent and child interviews take around 50 and 30 minutes respectively to complete (Goodman et al., 2000). A shortened version of the DAWBA was also mailed to the child's teacher. Once all three DAWBA questionnaires were returned, a team of child and adolescent psychiatrists reviewed both the verbatim accounts and the answers to questions about children's symptoms and their resultant distress and social impairment, before assigning diagnoses using ICD-10 criteria. Importantly, no respondent was automatically prioritised.
Table 1 provides the sample means for the parent, child and teacher SDQ scores for all children, for the subset of children who were diagnosed with an ICD-10 mental disorder, and for the subset of children without a diagnosed mental illness. The sample means indicate that teachers report the fewest symptoms (0.167) and that children report the most (0.288). Table 1 also shows that the SDQ scores of children with a diagnosed mental illness are 2-3 times larger than the SDQ scores of children without a mental illness. Estimated kernel densities of parent, child and teacher SDQ scores are presented in Figure 1. They are positively skewed, with most children exhibiting few symptoms and only a small minority exhibiting many.
The final key variable for our analysis is educational attainment. The survey focuses very much on measurement of mental state and a consequence of this is that educational outcomes are not documented in detail. In particular, the dataset does not contain test score information, and we use instead the one available measure: the teacher's assessment of the child's scholastic ability relative to other children of the same age. We construct this measure by using teacher responses to the question "In terms of overall intellectual and scholastic ability, roughly what age level is he or she at?", from which we subtract the child's chronological age. This measure of educational progress is unusual in the economics literature, but the concept of a child's "mental age" has a long history in child educational psychology; indeed, Intelligence Quotient (IQ) tests are so named because they were originally constructed as the ratio of mental age to chronological age multiplied by 100. The concept also underlies the practice in many educational systems (but not the UK's) of holding a child back in a lower grade if he or she has made inadequate progress relative to the norm for that child's age.
For our sample of children, the average scholastic age gap is 0.034 years, or approximately 2 weeks ahead of actual age (see Table 1). The age gap is, however, significantly different from zero for the groups of children with and without mental health problems. For children without a diagnosed mental disorder, the mean gap is 0.128 years, and for those with any disorder the gap is -1.007 years, implying an average gap between the two groups of around 14 months (1.135 years). Non-parametric estimates of the relationships between parent, child and teacher SDQ scores and educational attainment are shown in Figure 2, which confirms the pattern shown in Table 1, but indicates that the relationship is continuous (and approximately linear), rather than a discrete distinction between the absence or presence of a disorder.
Appendix Table A2 presents sample means for the explanatory covariates used in our analysis. The continuous variables have been scaled to avoid extreme numerical values: age, number of children and log income are divided by 10; and mother's GHQ mental health score is scaled to lie in the [0, 1] interval. All other covariates are binary; consequently, the sample means indicate that children with a diagnosed disorder are more likely to: be male; live in social housing; have unmarried parents; have less educated, less employed and less healthy mothers; and have experienced serious adverse life events.

Our modelling approach rests on three main principles. The first is that there exists a 'true' state of psychological disorder, S, conceptualised as the (latent) assessment that would be made by experienced psychiatric assessors in possession of fully detailed, multi-source information on the child. This latent measure is the factor which we see as a potential influence on educational development.
Second, we accept that the child's true mental state S is not accurately observable by anyone: not by the parent, the child him/herself, the teacher, the psychiatric assessment team, nor, least of all, by us, the statistical analysts. We assume the SDQ responses from parents, children and teachers are all potentially subject to systematic distortion, which we see as arising either because certain observers (particularly parents and children) may be reluctant to admit the existence of a problem, or may exaggerate minor problems, or because certain aspects of the problem are less visible to certain types of observer, leading to understatement.
The third underlying assumption is that psychiatric assessors make the best use they can of the information available to them, exploiting their experience of observing children's mental health problems and the reactions of other untrained informants to those problems.
We assume that, in making these assessments, the psychiatric team is aware of the possibility of error (and bias) in the perceptions of parents, children and teachers, and takes that possibility into account. In general, fuller information leads to more precise diagnoses.
We implement these ideas through a latent variable structure with switching between observational regimes, to reflect the different information sets that may be available to psychiatric assessors under different circumstances. The sample consists of a set of observed children, indexed by $i = 1, \ldots, n$. Child $i$'s 'true' mental health state is $S_i$ and we observe three SDQ scores reported by the parent, child and teacher, $Y_{iP}$, $Y_{iC}$ and $Y_{iT}$. All are treated as continuously-variable measures. The scores resulting from parents', children's and teachers' responses to the SDQ are potentially biased readings of $S_i$:

$$Y_{ij} = \lambda_j S_i + X_i \alpha_j + V_{ij}, \qquad j = P, C, T, \tag{1}$$

where $X_i$ is a vector of variables, available to all observers, reflecting causal factors including the child's personal characteristics, family and social circumstances and the occurrence of past traumatic events. $(V_{iP}, V_{iC}, V_{iT})$ are jointly normal conditional on $S_i$ and $X_i$, with zero means and variance matrix $\Sigma_{YY}$. Here $\lambda_j - 1$ represents the degree of over- or under-reaction of the observer to the child's true state and $\alpha_j$ captures any measurement distortions linked to specific characteristics of the child and family circumstances. Consequently, an observer of type $j$ gives generally unbiased reports only if $\lambda_j = 1$ and $\alpha_j = 0$. In addition, parents and children are each asked a direct question about whether they perceive there to be a problem with respect to the specific aspect of mental health, yielding two binary indicators, $W_{iP}$ and $W_{iC}$. These indicators are important, since they play a role in triggering additional questionnaire content.
We assume them to be based on the same underlying opinion as revealed by the SDQ and to contain no additional information about $S_i$. The basic information which is always available to the psychiatric assessment process (apart from missing responses, which we treat as missing at random) is:

$$B_i = \{Y_{iP}, Y_{iC}, Y_{iT}, W_{iP}, W_{iC}, X_i\}.$$

If the parent's SDQ score exceeds a specific threshold ($Y_{iP} \geq K_P$) or the parent reports the child's state to be problematic ($W_{iP} = 1$), then a much more detailed set of questions is triggered, generating additional information $\Omega_{iP}$; similarly, if the child perceives there to be a problem or his or her SDQ responses exceed a threshold $K_C$, further information $\Omega_{iC}$ is elicited from him or her. Thus, the additional contingent information set available to assessors is:

$$C_i = \begin{cases} \emptyset & \text{if neither the parent's nor the child's responses trigger further questions,} \\ \Omega_{iP} & \text{if only the parent's responses do,} \\ \Omega_{iC} & \text{if only the child's responses do,} \\ (\Omega_{iP}, \Omega_{iC}) & \text{if both do.} \end{cases}$$

Psychiatric assessors are experienced in diagnosis in a multi-observer family setting, where the information reported to them by children and by parents and teachers may be subject to distortions and misinterpretation. We assume that they make the best use of whatever information is available, interpreting it in the light of their understanding of the mental health and reporting processes which generate that information. Their (approximately accurate) understanding of the relationship between the child's true mental state and his or her characteristics and circumstances is:

$$S_i = X_i \beta + U_i, \tag{3}$$

where $U_i$ has zero mean and variance $\sigma_u^2$ conditional on $X_i$. Since $S_i$ is unobservable, we can normalise $\beta$ and $\sigma_u^2$ arbitrarily to fix the origin and scale of $S_i$.
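To make the structure concrete, the following simulation sketch generates a latent state as a linear function of a single covariate plus noise, and an observer's report as a distorted reading of that state with over- or under-reaction parameter `lam` and covariate-linked distortion `alpha`. All numerical values are hypothetical and chosen only for illustration; with one covariate the regression of the report on the covariate should have slope close to `lam * beta + alpha`, which mixes the true effect with the reporting distortion.

```python
import random

def ols_slope(x, y):
    """Slope from a least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
beta, lam, alpha = 1.0, 0.6, -0.3   # hypothetical values, not estimates
n = 20000

x = [random.gauss(0, 1) for _ in range(n)]
# Latent state: S_i = X_i * beta + U_i
s = [beta * xi + random.gauss(0, 1) for xi in x]
# Observer report: Y_ij = lam * S_i + X_i * alpha + V_ij
y = [lam * si + alpha * xi + random.gauss(0, 0.5) for si, xi in zip(s, x)]

slope = ols_slope(x, y)
print(round(slope, 3), "compared with", lam * beta + alpha)
```

The regression recovers only the composite coefficient, so a biased observer's reports cannot by themselves reveal `beta`.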
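Given this structure, the assessor's best unbiased predictor of $S_i$ is the conditional expectation $\hat{S}_i = E(S_i \mid B_i, C_i)$ which, under our assumptions, takes the following form when no contingent information is available ($C_i = \emptyset$):

$$\hat{S}_i = X_i \beta + \sum_{j} b^j_{SY.X} \left[ Y_{ij} - \mu^j_Y(X_i) \right], \tag{4}$$

where $b^j_{SY.X}$ is the coefficient of $Y_{ij}$ in the population regression of $S_i$ on the SDQ scores and $X_i$, and $b^j_{SY.CX}$ is the analogous coefficient from a regression that also includes the contingent information $C_i$. The vector $b_{SC.YX}$ contains coefficients of the contingent information $C_i$ in the same extended regression. From (1) and (3), the conditional mean function $\mu^j_Y(X_i)$ for observer $j$ is $X_i(\lambda_j \beta + \alpha_j)$.
When contingent information is available, the coefficients $b^j_{SY.CX}$ replace $b^j_{SY.X}$ and the predictor gains a further term in $C_i$ with coefficient vector $b_{SC.YX}$. That further term represents the contribution of information available to the assessor but unobservable for the purposes of statistical analysis and thus, from the point of view of the external observer, merely inflates the residual error in $\hat{S}_i$.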
The observed assessment is a binary quasi-diagnosis $D_i$, which indicates a high predicted level of psychiatric disorder:

$$D_i = \mathbf{1}(\hat{S}_i \geq \tau_i),$$

where $\tau_i$ is the assessor's decision threshold, which may have a random element; we take it to be normally distributed with mean $\mu_\tau$ and variance $\sigma_\tau^2$. As an outside observer, the statistical analyst observes the diagnosis $D_i$ and the basic information $B_i$. When no contingent information is available, the probability of a diagnosed problem is:

$$\Pr(D_i = 1 \mid B_i, C_i = \emptyset) = \Phi\left( \frac{X_i \beta + \sum_j b^j_{SY.X} \left[ Y_{ij} - X_i(\lambda_j \beta + \alpha_j) \right] - \mu_\tau}{\sigma_\tau} \right). \tag{5}$$

If contingent information $C_i$ is available to the assessment process, the probability of a diagnosed mental health problem conditional on the information available to the analyst, equation (6), takes the same probit form but with regime-specific coefficients and with $\sigma_\tau$ replaced by a regime-specific scale factor $\sigma_\tau \omega$. Thus, conditional on all the observed information in $B_i$, we have a probit model for the psychiatric assessment, with regime switches in the coefficients of $W_{ij}$ and $X_i$ and in the normalising variance. However, conditional on $B_i$, these switches are exogenous, so there is no endogenous selection problem as there would be if we conditioned on $X_i$ but not the SDQ scores $Y_{ij}$. Note that, if item non-response makes one or more of the SDQ scores unavailable to us and to the assessors, the forms of (4) and (5) or (6) change to take account of the more limited information available.
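The probit structure of the diagnosis can be sketched as follows. The sketch assumes that the assessor's threshold is normally distributed with mean `mu_tau` and standard deviation `sigma_tau`, and that a diagnosis is recorded when the predicted state `s_hat` is at least as large as the threshold; the function names and numerical values are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def diagnosis_probability(s_hat, mu_tau, sigma_tau):
    """Pr(D = 1 | s_hat) when D = 1 if s_hat >= tau and tau ~ N(mu_tau, sigma_tau^2).

    The normally distributed threshold is an assumption of this illustration.
    """
    if sigma_tau <= 0:
        raise ValueError("sigma_tau must be positive")
    return norm_cdf((s_hat - mu_tau) / sigma_tau)

# A predicted state equal to the mean threshold gives probability one half;
# a higher predicted state raises the probability of diagnosis.
print(diagnosis_probability(1.0, 1.0, 2.0))
print(diagnosis_probability(3.0, 1.0, 2.0))
```

Only the ratios of the coefficients and of `mu_tau` to `sigma_tau` enter the probability, which is why the model parameters are identified only up to scale.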

Estimates of the measurement model
What can be identified from this measurement model? Equations (1) and (3) imply the following reduced form SDQ models:

$$Y_{ij} = X_i(\lambda_j \beta + \alpha_j) + \lambda_j U_i + V_{ij}, \qquad j = P, C, T. \tag{7}$$

Thus regression analysis of the SDQ scores conditional on $X_i$ identifies coefficient vectors $(\lambda_j \beta + \alpha_j)$ for each observer $j = P, C, T$. In the $C_i = \emptyset$ regime, the probit model (5) identifies $b^j_{SY.X}/\sigma_\tau$ for each $j = P, C, T$ and $\left[ \beta - \sum_j b^j_{SY.X}(\lambda_j \beta + \alpha_j) \right]/\sigma_\tau$. Consequently, $\beta/\sigma_\tau$ can be recovered, so that $\beta$ is identified up to scale. By similar reasoning, $\beta$ can be identified up to another regime-specific scale factor in any of the other informational regimes.
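The identification argument amounts to adding back, to the probit coefficients on the covariates, the sum over observers of each probit coefficient on the SDQ score multiplied by that observer's reduced form coefficient vector. The sketch below checks this arithmetic on made-up structural values; the names `gamma`, `theta` and `pi` for the identifiable quantities are hypothetical labels introduced here.

```python
def recover_scaled_beta(gamma, theta, pi):
    """Recover beta / sigma_tau from identifiable quantities.

    gamma : probit coefficients on X, equal to
            (beta - sum_j b_j * pi_j) / sigma_tau
    theta : dict mapping observer -> b_j / sigma_tau (probit coefficient on Y_j)
    pi    : dict mapping observer -> reduced form vector lambda_j * beta + alpha_j
    """
    return [g + sum(theta[j] * pi[j][k] for j in theta)
            for k, g in enumerate(gamma)]

# Hypothetical structural values, used only to build a consistency check.
beta, sigma_tau = [1.0, -0.5], 2.0
lam = {"P": 0.8, "C": 0.5, "T": 1.2}
alpha = {"P": [0.1, 0.0], "C": [-0.2, 0.3], "T": [0.0, -0.1]}
b = {"P": 0.3, "C": 0.1, "T": 0.4}

# Quantities an analyst could estimate from the SDQ regressions and the probit.
pi = {j: [lam[j] * bk + ak for bk, ak in zip(beta, alpha[j])] for j in lam}
theta = {j: b[j] / sigma_tau for j in b}
gamma = [(bk - sum(b[j] * pi[j][k] for j in b)) / sigma_tau
         for k, bk in enumerate(beta)]

beta_star = recover_scaled_beta(gamma, theta, pi)
print([round(v, 6) for v in beta_star])  # approximately beta / sigma_tau
```

The distortion parameters `lam` and `alpha` are used only to construct the example; the recovery step itself never needs them separately.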

Estimates of the measurement model can be computed using maximum likelihood estimation of a system comprising (5), (6) and (7), parameterised in terms of $\beta/\sigma_\tau$, $\mu_\tau/\sigma_\tau$ and regime-specific coefficients and scale factors for each configuration in $\mathcal{C}$, where $\mathcal{C}$ is the set of three possible non-empty configurations of contingent information, $\Omega_{iP}$, $\Omega_{iC}$ or $(\Omega_{iP}, \Omega_{iC})$. To allow for item or individual non-response in the SDQ for children or teachers, as well as the response-triggered contingent information, we consider four missing data regimes: (i) $Y_{iP}, Y_{iC}, Y_{iT}$ all observed, with coefficients $\theta_{P.PCT}, \theta_{C.PCT}, \theta_{T.PCT}$; (ii) $Y_{iP}, Y_{iC}$ observed, with coefficients $\theta_{P.PC}, \theta_{C.PC}$; (iii) $Y_{iP}, Y_{iT}$ observed, with coefficients $\theta_{P.PT}, \theta_{T.PT}$; (iv) only $Y_{iP}$ observed, with coefficient $\theta_{P.P}$. We parameterise the scale factors as $\sigma_\tau \omega = \exp(\psi_P \nu_{iP} + \psi_C \nu_{iC})$, where $\nu_{ij}$ is the amount of contingent information supplied by observer $j$, ranging from $\nu_{ij} = 0$ for no additional information to $\nu_{ij} = 3$ for contingent information on all three aspects of conduct, emotional disorder and hyperactivity.
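The exponential parameterisation of the scale factor can be illustrated with a short sketch. The function name `scale_factor` and the parameter values are hypothetical; the only structure taken from the text is that the scale factor is the exponential of a linear function of the amounts of contingent information, each an integer between 0 and 3.

```python
import math

def scale_factor(nu_p, nu_c, psi_p, psi_c):
    """Scale factor exp(psi_P * nu_P + psi_C * nu_C) for one child.

    nu_p and nu_c count the aspects (0 to 3) on which the parent and the
    child supplied contingent information.
    """
    for nu in (nu_p, nu_c):
        if nu not in (0, 1, 2, 3):
            raise ValueError("amount of contingent information must be 0, 1, 2 or 3")
    return math.exp(psi_p * nu_p + psi_c * nu_c)

# Hypothetical negative psi values: the scale factor is 1 with no contingent
# information and falls below 1 as more contingent information is supplied.
print(scale_factor(0, 0, -0.2, -0.1))
print(scale_factor(3, 1, -0.2, -0.1))
```

The exponential form keeps the scale factor positive for any values of the psi parameters, so no constraints are needed during estimation.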
Parameter estimates of the psychiatric assessment model are given in Table 2. The $\theta$-parameters indicate that, when available, assessors give greatest weight to the teacher's SDQ reports, slightly less to the parental report and considerably less to the child's own self-assessment. The $\psi$-parameters are negative, which is consistent with the theoretical prediction that $\sigma_\tau \omega < 1$ and indicates that additional contingent information has value in clarifying the circumstances which led to the problematic self-assessment.
The estimates of $\beta^*$ are shown in Table 3. Recall that these are estimates of the arbitrarily-scaled coefficient vector $\beta/\sigma_\tau$, so it is only the significance of each coefficient and their relative magnitudes that are meaningful here. Maternal education of any kind has a substantial positive influence on the child's mental health, comparable in size to the effect of major adverse life events including loss of a parent through death or divorce/separation and past experience of serious illness or injury. There is some evidence of inter-generational transmission of mental health problems, since the mother's own GHQ measure of mental (ill-)health is found to have a modest but significantly negative influence on the child's mental state. For example, if the GHQ score were to double from the mean level of 0.3 to 0.6, the predicted impact on the child's mental disorder would be around a third as great as the impact attributable to the absence of maternal educational attainment, or to the death of a friend or serious illness or injury during childhood. Indicators of social disadvantage do not have a large influence: housing type and tenure are statistically insignificant and, although log household income has a significant protective effect on child mental health, a very large income increase of around 170% would be required to produce an effect comparable to that of maternal education or adverse life events.
We find no statistically significant evidence of an effect for the child's age (for general mental health) and gender or for the parents' employment or partnership status, in contrast with the SDQ reduced form estimates presented in Appendix Table A3 (general health) and Table A4 (hyperactivity).

Bias in reporting error
The hypothesis of conditionally unbiased reporting by all observers is clearly rejected: a Wald test of the hypothesis of reduced form coefficients equal across observers gives a test statistic (distributed as $\chi^2(48)$ under $H_0$) of 2,201.4 and 1,344.9 for general mental health and hyperactivity respectively. There is also a highly significant difference between the reduced form coefficients for each pair of observers, $(P, C)$, $(P, T)$ and $(C, T)$. Thus we can definitely rule out the unbiasedness restrictions $\lambda_j = 1$ and $\alpha_j = 0$ for all $j$.
Although the distortion parameters $\lambda_j$, $\alpha_j$ are not identified, it is possible to draw some inferences about the nature of the distortions. If, for some observer $j$ and covariate $x_k$, the identifiable coefficients $\beta^*_k$ and $[\lambda_j \beta_k + \alpha_{jk}]$ are of opposite sign, then (given $\lambda_j > 0$) $\alpha_{jk}$ must have the opposite sign to $\beta_k$, implying that misreporting by observer $j$ has the effect of attenuating or even reversing the estimated impact of $x_k$ on mental health. We examine this by testing, for each observer and covariate, the null hypothesis $H_0$ that the two coefficients do not differ in sign. This test generates significant results only for age, where $H_0$ can be rejected for all three categories of observer at reasonable significance levels (P-values of 0.047, 0.044 and 0.086, for the parent, child and teacher respectively), implying a tendency for observers to understate the problems of older children and adolescents relative to younger children, by the standards of the fully-informed expert psychiatric assessment. This is perhaps unsurprising, since the early stages of the process of child development are often the focus of special attention, while the problems of older children and adolescents are often less visible to external observers and seem also to be under-acknowledged by young people themselves.

Mental health and educational attainment
Using the assumption that the informed expert assessment makes efficient and unbiased (but not necessarily perfectly accurate) use of available information, we have established that there exists substantial non-classical measurement error in at least two of the three assessments provided by the parent, child and teacher. We now turn to the consequences of this biased reporting for inferences about the causal impact of child mental health on educational development.
The degree of educational attainment relative to the child's age is denoted $A_i$ and assumed to be related to mental health $S_i$ and other covariates $X_i$ as follows:

$$A_i = \rho S_i + X_i \delta + \eta_i, \tag{8}$$

where $\eta_i$ is a normally-distributed regression residual, which may be correlated with some or all of the SDQ residuals $V_{ij}$. Our results are based on the model (8).

Scaling
Since the mental health variable is unobserved, its scale is arbitrary and the magnitude of $\rho$ cannot be interpreted without an appropriate scale normalisation. The identifiable vector $\beta^* = \beta/\sigma_\tau$ contains the coefficients relevant to $S_i/\sigma_\tau$, and the coefficient of this variable in the education equation would be $\rho \sigma_\tau$. This is not a helpful normalisation: one would like to be able to rescale the latent variable $S$ to have unit variance, so that its coefficient can be interpreted as the impact on educational performance of a 1-standard deviation change in the measure of mental disorder. However, $\operatorname{var}(S_i/\sigma_\tau)$ is equal to $\beta^{*\prime} V \beta^* + \sigma_u^2/\sigma_\tau^2$, where $V$ is the variance matrix of $X_i$, rather than 1. The scale parameters $\sigma_u$ and $\sigma_\tau$ are unknown and it is difficult to find convincing a priori information on them. We resolve this by using a range of normalisations based on alternative assumptions about the population $R^2$ of the relationship $S_i = X_i \beta + U_i$. Assume a particular value for $R^2$ and multiply $\beta^*$ by the factor $\kappa = \sqrt{R^2 / (\beta^{*\prime} V \beta^*)}$. Given an assumed $R^2$, and estimates of $\beta^*$ and the variance matrix $V$, $\kappa$ is a known constant and, because $R^2 = \beta^{*\prime} V \beta^* / \operatorname{var}(S_i/\sigma_\tau)$, the rescaling $S^*_i = \kappa S_i/\sigma_\tau$ implies $\operatorname{var}(S^*_i) \equiv 1$. The corresponding coefficient in the education equation is $r = \rho \sigma_\tau/\kappa$, and this is the parameter we aim to identify.
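The rescaling can be checked numerically. Because the population R-squared equals the explained variance of the scaled latent variable divided by its total variance, requiring unit variance after rescaling gives a factor equal to the square root of the assumed R-squared divided by the quadratic form in the scaled coefficients. The sketch below uses made-up values for the scaled coefficients, the covariate variance matrix and the assumed R-squared; the function names are hypothetical.

```python
import math

def quad_form(b, V):
    """Quadratic form b' V b for a vector b and a square matrix V (nested lists)."""
    n = len(b)
    return sum(b[i] * V[i][j] * b[j] for i in range(n) for j in range(n))

def kappa(beta_star, V, r_squared):
    """Rescaling factor that gives the latent variable unit variance."""
    if not 0.0 < r_squared <= 1.0:
        raise ValueError("r_squared must lie in (0, 1]")
    return math.sqrt(r_squared / quad_form(beta_star, V))

# Hypothetical inputs, not estimates from the paper.
beta_star = [0.5, -0.25]
V = [[1.0, 0.2], [0.2, 2.0]]
r2 = 0.25

k = kappa(beta_star, V, r2)
# Total variance of S/sigma_tau implied by the assumed R-squared.
total_var = quad_form(beta_star, V) / r2
print(round(k, 4), round(k ** 2 * total_var, 6))  # second value is var(S*) = 1
```

Repeating the calculation over a grid of assumed R-squared values gives the range of normalisations described in the text.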

Identification of the mental health-education effect
Consider the reduced form for educational attainment, which reveals the inherent identification problem we face: Even with β known, ρ cannot be uniquely recovered from knowledge of the reduced form coefficients (ρβ + δ). We explore three alternative identification strategies for the coefficient ρ. The first rests on the assumption that one of the observers (parent, child or teacher) is unbiased and that his or her reporting error is uncorrelated with educational attainment: essentially the classical measurement error assumptions. The second approach is to use an exclusion restriction on the coefficient vector δ, which we implement in two distinct ways.
The third alternative is to use prior information on the residual covariances to reveal the sign and significance of ρ.
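The non-identification of $\rho$ from the reduced form coefficients $(\rho\beta + \delta)$ can be seen in a few lines. This is an illustrative sketch with made-up numbers: `beta` and `b` are hypothetical, and `implied_delta` is a helper introduced here, not part of the paper.

```python
import numpy as np

# Hypothetical values: beta is treated as known, b is the reduced-form vector.
beta = np.array([0.5, -0.3, 0.8])
b = np.array([-1.0, 0.4, -2.0])


def implied_delta(rho, beta, b):
    """Direct-effect vector delta that reproduces b exactly for a given rho."""
    return b - rho * beta


# Every candidate rho is consistent with the same reduced form,
# because delta can always absorb the difference.
for rho in (-3.0, -1.0, 0.0):
    delta = implied_delta(rho, beta, b)
    print(f"rho={rho:+.1f}  delta={np.round(delta, 3)}")
    assert np.allclose(rho * beta + delta, b)
```

Each value of $\rho$ fits the reduced form equally well, so some additional restriction is needed to pin it down.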

Covariance restrictions
Residual covariances provide information on $\rho$, and this approach has previously been used by Kan and Pudney (2008) as a basis for identification in a study of time use involving a similar case of repeated-observation measurement error with biased observation. Our application differs from the Kan-Pudney study in that we do not impose the a priori assumption that a particular observer or mode of observation is unbiased and, consequently, point-identification is not possible here.
Let $c_j$ be the residual covariance $\mathrm{cov}(Y_{ij}, A_i \mid X_i)$ and $\sigma_{V_j\eta}$ be the covariance between the random component of the measurement error for observer $j$ and the random component of educational progress. Under our assumptions $c_j = \sigma_{V_j\eta} + \rho\lambda_j\sigma_u^2$, implying: $\rho = (c_j - \sigma_{V_j\eta})/(\lambda_j\sigma_u^2)$ (10). If we can rule out the possibility of a negative covariance between the random component of the SDQ measurement error ($V_{ij}$) and the error in the education outcome ($\eta_i$), then, for a positive loading $\lambda_j$, $c_j/(\lambda_j\sigma_u^2)$ is an upper bound on the true mental health impact $\rho$. For parents and children ($j = P, C$), it may be reasonable to assume that there is no correlation between the observer's error in reporting the child's mental state and the unobserved contributors to the teacher's report of educational attainment, so that $\sigma_{V_j\eta} = 0$ and therefore $\mathrm{sgn}(\rho) = \mathrm{sgn}(c_j)$. A one-sided test of the hypothesis $H_0: c_j = 0$ against $H_1: c_j < 0$ then establishes the sign of $\rho$. The test remains valid (but loses power) if $\sigma_{V_j\eta} \geq 0$. We implement the test by estimating the 4-equation model comprising the reduced form equations (7) for parent, child and teacher observers, together with the education reduced form (9). We then use one-sided single-parameter Lagrange Multiplier tests to test separately the null hypotheses of zero error covariance between the residuals in the education equation and each of the SDQ equations.
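A small simulation may help to fix ideas about the covariance decomposition $c_j = \sigma_{V_j\eta} + \rho\lambda_j\sigma_u^2$ and the resulting bound. This is a sketch under simplifying assumptions that are not taken from the paper: the errors are jointly normal, $V_{ij}$ and $\eta_i$ have unit variances, $U_i$ is independent of both, and $\lambda_j$ and $\sigma_u$ are treated as known. All parameter values are hypothetical.

```python
import numpy as np


def simulate_residual_cov(rho, lam, sigma_u, sigma_v_eta, n=200_000, seed=0):
    """Sample covariance between the SDQ and education reduced-form residuals."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma_u, n)  # latent mental health disturbance U_i
    # (V_ij, eta_i): unit variances (an assumption), covariance sigma_v_eta
    cov = np.array([[1.0, sigma_v_eta],
                    [sigma_v_eta, 1.0]])
    v, eta = rng.multivariate_normal([0.0, 0.0], cov, n).T
    sdq_resid = lam * u + v    # residual of observer j's SDQ reduced form
    edu_resid = rho * u + eta  # residual of the education reduced form
    return float(np.cov(sdq_resid, edu_resid)[0, 1])


def upper_bound(c_j, lam, sigma_u):
    """c_j / (lam * sigma_u^2): bounds rho from above when sigma_v_eta >= 0, lam > 0."""
    return c_j / (lam * sigma_u**2)


# Hypothetical parameter values.
rho, lam, sigma_u, sigma_v_eta = -0.4, 0.8, 1.2, 0.1
c_j = simulate_residual_cov(rho, lam, sigma_u, sigma_v_eta)
print("population c_j:", sigma_v_eta + rho * lam * sigma_u**2)
print("simulated  c_j:", round(c_j, 4))
print("upper bound on rho:", round(upper_bound(c_j, lam, sigma_u), 4), " true rho:", rho)
```

In the application itself $\sigma_u$ is not known, so the bound cannot be evaluated numerically in this way; what carries over is the sign argument, namely that a negative $c_j$ together with $\sigma_{V_j\eta} \geq 0$ implies a negative $\rho$.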
The results are given in Table 4. All correlations between the residuals from SDQ reduced forms and the education reduced form are negative and highly significant in one-sided tests (they would also be highly significant against 2-sided alternatives and if adjusted for multiple comparisons by using Bonferroni corrections). The conclusion from this pattern of residual covariances is that the impact of mental disorder on educational progress is negative.
For teachers, the assumption that $\sigma_{V_T\eta} \geq 0$ is questionable, since both SDQ and the measure of educational attainment are teacher-assessed. In this case, we might expect $\sigma_{V_T\eta} < 0$, since a tendency to underrate a child's educational achievement might accompany a tendency to overrate the same child's degree of mental disorder due to confounding factors relating to the 'quality' of the child-teacher match. Then (10) would only imply $\rho \geq c_T/(\lambda_T\sigma_u^2)$, which does not unambiguously fix the sign of $\rho$. The evidence from Table 4 is consistent with this idea of correlated educational and mental health assessments from teachers, since the (negative) correlation between SDQ and educational outcome is larger in magnitude for teachers than for parent or child and yields a more significant result.

Identification with an unbiased observer
The most common approach to estimation of models like (8) consists in using one of (or an average of) the SDQ scores as a proxy for the unobserved S i , but this fails to address either the classical measurement error problem or the additional problem of biased reporting by parents, children or teachers. The upper panel of Table 5 shows the estimates of the mental health-education impact that results from using one of the SDQ measures, scaled to have unit standard deviation, as a crude proxy for latent mental disorder; full parameter estimates are given in appendix Table A5. The estimates suggest that a 1-standard deviation increase in mental disorder has an average effect of retarding educational development by 3.1-5.7 months. Note that this is considerably smaller than the mean gap of 15 months between those with and without a diagnosed disorder (see Table 1).
A more sophisticated orthodox approach to the measurement error problem is to use a latent factor model, treating (1), (5), (6) and (8) as 'measurement equations' and (3) as the latent variable equation, assuming a priori that at least one of the SDQ measures is unbiased so that $\alpha_j = 0$ for some $j$, with the corresponding 'loading' $\lambda_j$ normalised at unity (see Bollen, 1989). Although we are reluctant to assume that parents, children and teachers are all unbiased observers, and have already rejected that hypothesis, it remains possible that one of the three types of observer is unbiased and we now explore the implications of this for the mental health-education parameter $\rho$. The lower panel of Table 5 reports the estimate of the impact of mental health on educational attainment which results from estimating a conventional latent factor model under the restrictions $\lambda_j = 1$, $\alpha_j = 0$ and $V_{ij} \perp U_i, \eta_i, \{V_{ik}, \text{all } k \neq j\}$ for a specific observer $j \in \{P, C, T\}$, giving three sets of estimates as we take each observer in turn to be the one who is unbiased. Note that $\rho$ is fully identifiable in this case, so there is no normalisation problem to be dealt with, and we are also able to infer the value of $R^2$ in the latent mental health equation. Table 5 presents the estimates of $\rho$ in the normalised form $\rho \times \sqrt{\beta' V \beta + \sigma_u^2}$, so that it represents the effect on the mean educational deficit of a 1-standard deviation increase in latent mental disorder. If accepted, the results would suggest a substantial causal effect in the range 7.9-8.6 months' educational deficit for a 1-standard deviation increase. These estimates imply an $R^2$ of around 0.2-0.3 for the latent mental health equation which, as one would expect, exceeds the $R^2$ statistics for the SDQ proxy regressions, which are depressed by the measurement noise they contain.
Table 5 The estimated mental health-education effect: unbiased observer (columns: General mental health; Hyperactivity; entries: $\rho \times \mathrm{sd}(S_i)$ and Std. err.)
Standard errors in parentheses; significance: * = 10%; ** = 5%; *** = 1%. All models include the covariates listed in Table 2.

Exclusion restrictions on δ
Now assume there is no observer known to be unbiased and consider the use of exclusion restrictions as a source of identification. Define $b$ to be the reduced-form coefficient vector, $\rho\beta + \delta$, for educational performance. A zero restriction on the $k$th coefficient in $\delta$ implies that the corresponding coefficient in $b$ is $\rho\beta_k = (\rho\sigma_\tau)(\beta_k/\sigma_\tau)$ and, since $\beta^* = \beta/\sigma_\tau$ is identified from the measurement model, the coefficient $(\rho\sigma_\tau)$ relevant to this normalisation is identified uniquely as the ratio of the $k$th elements of $b$ and $\beta^*$. The coefficient $\rho\sigma_\tau$ can then be rescaled in the form $r = \rho\sigma_\tau/\kappa$, which is interpretable as the impact of a 1-standard deviation change in mental health. The main problem with this approach is finding exclusion restrictions which can be strongly justified a priori: there are few factors influencing mental health which can confidently be asserted to have no direct causal influence on educational attainment.
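The ratio argument above can be sketched as follows. The point estimate is simply the ratio of the $k$th reduced-form coefficient to the $k$th element of $\beta^*$, divided by $\kappa$. The delta-method standard error is an addition made here for illustration only, and the paper's own inference procedure may differ. All function names and numbers are hypothetical.

```python
import math


def effect_from_exclusion(b_k, beta_star_k, kappa):
    """r = (b_k / beta*_k) / kappa, valid when delta_k = 0 is imposed."""
    rho_sigma_tau = b_k / beta_star_k
    return rho_sigma_tau / kappa


def delta_method_se(b_k, beta_star_k, var_b, var_beta, cov_b_beta, kappa):
    """Approximate s.e. of r, treating kappa as a fixed constant (an assumption)."""
    # Gradient of the ratio b_k / beta*_k with respect to (b_k, beta*_k).
    g_b = 1.0 / beta_star_k
    g_beta = -b_k / beta_star_k**2
    var_ratio = g_b**2 * var_b + g_beta**2 * var_beta + 2.0 * g_b * g_beta * cov_b_beta
    return math.sqrt(var_ratio) / kappa


# Hypothetical estimates for the excluded covariate.
b_k, beta_star_k, kappa = -0.6, 0.3, 2.0
r = effect_from_exclusion(b_k, beta_star_k, kappa)
se = delta_method_se(b_k, beta_star_k, var_b=0.04, var_beta=0.01,
                     cov_b_beta=0.0, kappa=kappa)
print(f"r = {r:.3f}, approximate s.e. = {se:.3f}")
```

The gradient terms show why precision can be poor: when the excluded covariate has only a modest effect on mental health (small $\beta^*_k$), both terms are inflated by division by $\beta^*_k$.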
Of the covariates available for the model, our view is that only one is a plausible candidate for exclusion from $\delta$. Some 6.8% of children in the sample have experienced the death of a friend and reduced form estimates clearly show that these events have an impact on reported measures of the child's mental health. Unlike the death of a parent (which may change the resources of parental time and interest invested in the child's education), or injury or illness experienced by the child him/herself (which may interrupt schooling and study time), it seems reasonable to argue that the loss of a friend has no direct impact on the child's education, but only an indirect one through his or her mental state. The estimates produced by imposing this exclusion are presented in Table 6, scaled to correspond to $R^2$ levels in the range 0.1-0.4 for the latent mental health equation. Although the standard errors are larger than we would like, so that the estimated impact is not significantly different from zero, it is still possible to reject unambiguously the hypothesis of an 8-9 month impact for a 1-standard deviation increase in mental disorder, as suggested by the conventional latent factor analysis.
Table 6 The estimated mental health-education effect: exclusion restrictions (columns: General; Hyperactivity)
As an alternative to this direct a priori restriction, we also exploit a restriction on the effect of age which is suggested by the age-referenced nature of our educational attainment variable, $A_i$, derived from teachers' responses to the following survey question: "In terms of overall intellectual and scholastic ability, roughly what age level is he or she at?" Let $e_i$, $a_i$ and $Z_i$ represent respectively: the absolute level of the child's achievement; his or her age; and other personal characteristics, and write the age-specific achievement norm used by teachers as $N(a)$, so that the child's educational age reported by the teacher is $N^{-1}(e_i)$.
Now make the further assumptions that: (i) teachers use the population average as their norm, so that $N(a) = E(e \mid a)$; and (ii) achievement is generated by a normal regression structure: $e \mid a, Z \sim N(\theta_1 a + \theta_2 Z, \omega_e^2)$. Then our education variable is the reported educational age relative to actual age, $A_i = N^{-1}(e_i) - a_i$, and its conditional distribution given $a_i$ and $Z_i$ implies that $A_i$ is independent of age if the covariates $Z_i$ are measured from age-specific means, implying an exclusion restriction on the education equation. The sample is large enough to permit the removal of age-specific means to be done non-parametrically, rather than modelling the relationship between $Z$ and age explicitly.
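The age-independence claim can be checked with a short derivation. This is a sketch under the stated assumptions (i) and (ii), plus the condition that $Z_i$ is measured from age-specific means so that $E(Z \mid a) = 0$, with $\theta_1 \neq 0$. It is a reconstruction for illustration and may not match the paper's own displayed equations.

```latex
\begin{align*}
N(a) &= E(e \mid a) = \theta_1 a + \theta_2 E(Z \mid a) = \theta_1 a
  && \text{since } E(Z \mid a) = 0, \\
N^{-1}(e_i) &= e_i / \theta_1, \\
A_i &= N^{-1}(e_i) - a_i = \frac{e_i}{\theta_1} - a_i, \\
A_i \mid a_i, Z_i &\sim N\!\left( \frac{\theta_1 a_i + \theta_2 Z_i}{\theta_1} - a_i,\;
  \frac{\omega_e^2}{\theta_1^2} \right)
  = N\!\left( \frac{\theta_2}{\theta_1} Z_i,\; \frac{\omega_e^2}{\theta_1^2} \right).
\end{align*}
```

Neither the mean nor the variance in the last line involves $a_i$, so age drops out of the education equation once $Z_i$ is centred on its age-specific mean.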

The lower panel of Table 6 shows the results from exploiting the age-referenced nature of the education variable in this way. It demonstrates that the classical measurement error analysis based on the assumption of an unbiased parent, child or teacher observer exaggerates the causal impact of mental health problems on the development of human capital through schooling. While the unbiased observer approach suggests that a 1-standard deviation increase in mental disorder causes on average an 8-9 month delay in educational development, the age restriction indicates an effect half that size or less, of around 2-5 months. Again, there is no evidence of any difference in impact between mental disorder as measured by a general index covering hyperactivity, emotional and conduct disorders, and mental disorder as measured by hyperactivity alone.
These estimates of the impact of mental health on educational progress are our preferred ones, since they do not rely on the suspect assumption that any particular type of observer is unbiased and they exploit the logical structure of our particular measure of educational attainment to generate an identifying restriction. It is striking that the estimated impact that results is very similar to the result obtained using SDQ variables as crude proxy variables (Table 5), while the conventional wisdom of the latent variable model with an unbiased observer produces considerably larger estimates. This underlines the proposition that, outside the unrealistic world of unbiased observation with classical measurement error, dealing with partial and error-prone observations can have consequences that differ greatly from the simple reversal of attenuation bias.

Conclusions
We have focused on the role of child mental health as an influence on educational attainment, addressing a set of problems related to the measurement of the child's state of mental health. These measurement difficulties generate two distinct identification problems. The first relates to estimation of the relationship between mental health and personal and family characteristics: the strong evidence of bias in the reports given by parents, children and teachers means that the classical conditions for irrelevance of measurement error in a regression dependent variable are not met. We have overcome this by using a unique dataset which includes a detailed psychiatric assessment, together with a theory (essentially rational expectations) of the behaviour of these assessors, to identify a latent mental health model.
However, a second identification problem arises when the educational process is introduced, since natural measures of mental health generated from this latent model are collinear with other explanatory covariates used in the education model. We use two alternative exclusion restrictions which can be argued to be valid theoretically and have sufficient empirical power to contribute useful identifying information. One is the experience of a death of a childhood friend, which is hypothesised to influence education only indirectly through its impact on the mental health of the child. The second is an age restriction which flows from the age-referenced nature of our educational attainment measure.
We have found that mental disorders are strongly influenced by family history and background, particularly by the mother's own mental health and education, and also by major adverse life events such as the death of a friend or serious illness or injury. The decision-making by expert assessors, which is the key to these conclusions, places greatest weight on the views of teachers, rather less on those of parents and little weight on the self-assessments by young people themselves. Diagnostic behaviour by psychiatric assessors reflects the configuration of information that is available to them.
The impact of mental disorder on educational attainment is significant and, using our preferred strategy based on exclusion restrictions, appears to be moderate: a loss of approximately 2-5 months of educational progress for a 1-standard deviation increase in 'true' latent mental disorder. This is closer to the estimate generated by a crude proxy-variable regression, which ignores the measurement error problem, than to the much larger estimate produced by a multi-indicator latent variable model based on the assumption that at least one of the non-expert observers is unbiased.
On a methodological level, this study exemplifies four important points. First, the measurement error in survey reports of children's mental state is large, not uniform across types of observer (parents, children and teachers), and far from the 'classical' measurement error assumptions embodied in standard latent factor models. The biases that result from the sort of measurement difficulty addressed in this paper can be complex and unexpected in structure and direction. Making allowance for this non-standard form of observation error makes a substantial difference to research findings on issues like the socio-economic gradient in child mental health.
Second, like many other important research issues in the social sciences, the link between child mental health and educational attainment is beset by identification difficulties, and the preferred strategy of using controlled (or 'natural' quasi-) experiments is unavailable because of the nature of the phenomena of interest. Despite this, it is possible to draw some important and strong conclusions without a full solution to the identification problem.
Third, we have shown that it cannot be taken for granted that a conventional 'solution' to a measurement problem is necessarily better than ignoring the problem. In this case, our preferred estimate of the impact of mental disorder on educational progress (which exploits the specific structure of our measure of educational achievement) is considerably smaller than the range of estimates produced by a conventional latent variable analysis based on the assumption of an unbiased observer, and it is much closer to estimates from crude proxy variable regressions. If we are interested primarily in the mental health-education effect, the extra sophistication of the latent variable approach would be positively harmful.
Finally, we have shown the value of evidence that combines standard survey self-reported information with deeper expert assessments, bringing us closer to the ideal situation where there exists an unbiased observer. The UK Survey of the Mental Health of Children and Young People provides a model for this sort of evidence and its potential is substantial, particularly if the design could be extended to give a longitudinal picture of child development.