SUMMARY
 SUMMARY
 1 INTRODUCTION
 2 DATA, DEFINITIONS AND DESCRIPTIVE STATISTICS
 3 MODEL-BASED MEASUREMENT OF MENTAL HEALTH
 4 MENTAL HEALTH AND EDUCATIONAL ATTAINMENT
 5 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 Supporting Information
We examine the effect of survey measurement error on the empirical relationship between child mental health and personal and family characteristics, and between child mental health and educational progress. Our contribution is to use unique UK survey data that contain (potentially biased) assessments of each child's mental state from three observers (parent, teacher and child), together with expert (quasi-)diagnoses, using an assumption of optimal diagnostic behaviour to adjust for reporting bias. We use three alternative restrictions to identify the effect of mental disorders on educational progress. Maternal education and mental health, family income and major adverse life events are all significant in explaining child mental health, and child mental health is found to have a large influence on educational progress. Our preferred estimate is that a one-standard-deviation reduction in ‘true’ latent child mental health leads to a 2- to 5-month loss in educational progress. We also find a strong tendency for observers to understate the problems of older children and adolescents compared to expert diagnosis. Copyright © 2013 John Wiley & Sons, Ltd.
1 INTRODUCTION
Childhood has become the focus of a growing body of research in economics concerned with the closely related concepts of children's wellbeing, mental health and noncognitive skills. Much of this interest has been sparked by Heckman's model of life cycle human capital accumulation, which contends that, distinct from cognitive ability, a stock of ‘noncognitive skills’ is built up by streams of investment over the life course and influences a wide range of life outcomes (Heckman et al., 2006; Cunha et al., 2010). A strong motivation for this line of research comes from the belief that IQ or cognitive ability is much less malleable than socioemotional skills, particularly after the age of 10. From a policy perspective, this would suggest that the returns to interventions targeted at noncognitive skills are potentially much higher than those focused on cognitive outcomes alone. For example, the Perry preschool intervention program in the 1960s did not raise the IQ of participating children in a lasting way, yet they went on to have better adult outcomes than the control group in a variety of dimensions (Heckman et al., 2010). The inference that Perry succeeded because of its impact on attention skills or antisocial behaviours, rather than cognitive ability, is one that is supported by evaluations of more recent childhood interventions which tend to show much larger effects on behaviour (of both parents and children) than on cognitive achievement outcomes (Currie, 2009).
Mental health conditions are much more common in childhood than most physical conditions. It has been estimated that half of all lifetime mental health disorders start by age 14 (Kessler et al., 2007), and a growing body of evidence suggests that prevalence is highest among children from low-income backgrounds. While the relationship between noncognitive skills and medical conceptions of mental health is unclear (even though in practice they are often measured using the same indicators; e.g. Duncan and Magnuson, 2009), whether interpreted as lack of noncognitive skills or the existence of a mental health problem, a central concern is the impact that these adverse childhood states have on the process of human capital accumulation and the implications for the intergenerational transmission of economic advantage. It has been recognised recently that mental health conditions are potentially an important channel through which parental socioeconomic status influences the outcomes of the next generation. For example, Currie and Stabile (2006, 2007) and Currie et al. (2010) found significant impacts of hyperactivity on a range of later educational outcomes in US and Canadian longitudinal data and showed the persistence of these effects. Evidence from the medical literature is rather more mixed but also indicates the importance of mental health problems (Duncan and Magnuson, 2009; Breslau et al., 2008, 2009).
A key issue in the empirical study of the impact of child mental health on child outcomes is reliability of measurement. Two types of measure are common in the research literature. Clinical diagnoses are used extensively in psychiatric research, but they have several drawbacks: they are often only available for small, endogenously sampled groups of children; they identify relatively extreme and rare cases (affecting somewhere in the region of 5–10% of children); and they are sensitive to differences in diagnostic practice, which may produce surprising differences between apparently similar groups: for example, diagnosed attention deficit and hyperactivity disorder (ADHD) rates in the USA are double those in Canada (Currie and Stabile, 2006). A second type of measure is derived from a ‘screener’ module which can be completed quickly by parents, teachers or the children themselves, in the context of large-scale sample surveys. These screeners are designed specifically to identify the symptoms of clinical disorders and are often used as a first step in diagnosing suspected cases: a high screening score is suggestive of a recognised disorder, while lower scores reflect the incidence of symptoms among the ‘normal’ population. Screener modules are often available in surveys that measure associated outcomes and so provide a way of assessing the relationship between early mental health problems and their consequences. Few data sources are available that give both screening and diagnostic-type information for large representative samples.
Whatever type of information is used, measurement error is an important concern, which has received too little attention in the literature on child mental health and its consequences. There is a substantial body of research suggesting that adults’ assessments of their physical health are prone to serious measurement error (e.g. Butler et al., 1987; Mackenbach et al., 1996; Baker et al., 2004; Lindeboom and van Doorslaer, 2004; Etilé and Milcent, 2006; Bago d'Uva et al., 2007; Jones and Wildman, 2008; Johnston et al., 2009), and this problem is likely to be magnified in the case of child mental health. Children may manifest symptoms differently in different settings, perhaps showing deviant behaviour at school but not at home (or vice versa). They may deny or minimise socially undesirable symptoms when asked by parents or teachers. Informants may also have very different thresholds or perceptions of what constitutes abnormal behaviour in children.
The availability of multiple measures is particularly helpful in dealing with measurement error problems, but there is a strong possibility of observer-specific reporting bias. Evidence in the psychology and medical literatures indicates large disagreements between informants in their assessment of children's psychological wellbeing. For example, in a sample of US children aged 5–10, Brown et al. (2006) found that parents failed to detect half of school-aged children considered to be seriously disturbed by their teachers. Youngstrom et al. (2003) found that prevalence rates of comorbidity in a clinical sample ranged from 5.4% to 74.1%, depending upon whether ratings from parent, teacher, child or some combination were used to classify the child. Goodman et al. (2000) suggest that parents are slightly better at detecting emotional disorders than teachers but that the opposite is true for conduct and hyperactivity disorders, while the self-assessments of children have less explanatory power than those of parents or teachers. Johnston et al. (2013) show, using data from the Survey of Mental Health of Children and Young People in Great Britain, that estimates of the income gradient in childhood mental health are sensitive to who provides the assessment, with the smallest gradients found when using children's own assessment of themselves rather than those of parents and teachers. A clear implication of this limited body of evidence is that measurement error is substantial and unlikely to be the simple random noise which is assumed by the classical errors-in-variables model. If no observer can be assumed to be unbiased, standard methods cannot be used to identify the true mental health process.
In this paper we make three main contributions. First, we exploit data from a remarkable UK survey (see Section 2) that contains assessments of children's mental health from parents, teachers and the children themselves, to demonstrate the existence of significant biases in all three observers. We do this by using additional diagnostic-style assessments from a panel of expert psychiatric assessors, under the assumption that the experts are able to make the best possible use (in a rational expectations sense) of all available information, but with random variations in the threshold of seriousness they use for generating diagnoses. This model of expert behaviour, set out in Section 3, allows us to identify (up to scale) the parameters of a model representing the distribution of ‘true’ child mental health conditional on personal and family characteristics.
Second, we estimate the effect of mental health on educational progress, which requires us to overcome a second identification problem (discussed in Section 4), arising from the difficulty in distinguishing the indirect effect of influences on mental health from their direct effect on educational attainment. We use alternative identification strategies to provide parallel estimates of the impact of mental health problems on educational progress, relative to an age-specific norm. The orthodox multiple-indicator latent variable model estimated under the standard assumption of an unbiased observer is not consistent when observers may be biased, and we develop an alternative approach which exploits an exclusion restriction derived from the age-referenced structure of our measure of educational progress. This novel method of instrumental variable (IV) construction does not impose the assumption of an unbiased observer.
Third, our empirical findings cast doubt on the robustness of some of the empirical literature on child mental health. In Section 3, we find strong evidence of different biases in the reports from different types of observers of the child (parents, teachers and children). The standard latent variable method of dealing with measurement error suggests an impact of mental disorders on educational progress much larger than that implied by a simple proxy variable regression; we find that the more appropriate IV method gives results much closer to the naive estimate. Unless we are very sure of our assumptions, it is clearly not enough to presume that an estimation method which allows in some way for the existence of response error is necessarily superior to a naive approach.
2 DATA, DEFINITIONS AND DESCRIPTIVE STATISTICS
The data we use come from the 2004 Survey of Mental Health of Children and Young People in Great Britain, commissioned by the Department of Health and Scottish Executive Health Department, and carried out by the Office for National Statistics. Its aim was to provide information about the prevalence of psychiatric problems among people living in Great Britain, with a particular focus on three main categories of mental disorder: conduct disorders, emotional disorders and hyperkinetic disorders. A sample of children aged between 5 and 16 years was randomly drawn using a stratified sample design (by postcode) from the Child Benefit register. At the time of sampling, Child Benefit was essentially a universal entitlement for parents of all children, so the register provides an excellent sampling frame. Information was obtained in 76% (or 7977) of sampled cases, yielding information gathered from the child's primary caregiver (the child's mother in 94% of cases), from the teacher and (if aged 11–16) the young person him/herself. Among cooperating families, almost all the parents and most of the children gave full responses, while teacher postal questionnaires were obtained for 78% of the children interviewed. We focus on a subsample of 6806 white children who have information supplied by their mother, and who have nonmissing information for key covariates and mental health measures. The reason for this sample restriction was that ethnic minority and paternal respondent cases were too few for reliable inferences to be drawn about ethnic differences. Inclusion of these groups with associated dummy variables as covariates makes no appreciable difference to the main results.
Child mental health is first assessed in the survey with the Strengths and Difficulties Questionnaire (SDQ). The SDQ is a 25-item instrument for assessing social, emotional and behavioural functioning, and has become very widely used as a measure of the mental health of children. The SDQ questions cover positive and negative attributes and respondents answer each with a response ‘not true’ (0), ‘somewhat true’ (1), or ‘certainly true’ (2). Tables A1 and A2 of the online Appendix (supporting information) give a complete list of the SDQ questions relating to conduct disorder, hyperactivity and emotional problems. In our empirical analyses we use parent, child and teacher SDQ scores that have been constructed in the standard way by summing responses. We carry out the analysis using two alternative indicators:
 (i) General Mental Health: sum of the 15 items for conduct, emotional and hyperactivity disorder.
 (ii) Hyperactivity: sum of the 5 items for hyperactivity alone.
Each is normalised to a 0–1 scale. Measure (i) is intended as a general assessment of psychological distress, while (ii) focuses exclusively on the hyperactivity component of ADHD, which has been studied extensively in the research literature and found to be particularly important in some studies. These measures have good internal consistency, with high Cronbach α for the general and hyperactivity measures (see Table 1), which are in line with external values reported by Smedje et al. (1999).
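The score construction described above can be sketched in a few lines. This is an illustrative example only, not the authors' code: the function names and the toy responses are hypothetical, and the sketch assumes every item has already been coded so that a higher value means more difficulty (any reverse-scored items are taken as already flipped).

```python
import numpy as np

def sdq_score(items):
    """Sum 0/1/2 item responses per child and rescale to the 0-1 range."""
    items = np.asarray(items, dtype=float)
    n_items = items.shape[1]
    return items.sum(axis=1) / (2 * n_items)  # maximum possible sum is 2 per item

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical responses of five children to the five hyperactivity items
responses = np.array([
    [0, 0, 1, 0, 0],
    [2, 2, 1, 2, 2],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [2, 1, 2, 2, 1],
])

scores = sdq_score(responses)      # one normalised score per child
alpha = cronbach_alpha(responses)  # internal consistency of the five items
print(scores, round(alpha, 3))
```

The same two functions apply unchanged to the 15-item general measure, since the normalisation divides by twice the number of items.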
Table 1. Sample mean scores for psychological disorders and educational attainment
Measure  Cronbach α  Mean: all children  Mean: no diagnosed disorder  Mean: diagnosed disorder 
Parent general SDQ score^a  0.82  0.218  0.194  0.470 
Child general SDQ score^a  0.79  0.288  0.272  0.443 
Teacher general SDQ score^a  0.86  0.167  0.146  0.411 
Parent hyperactivity SDQ score^b  0.78  0.321  0.293  0.615 
Child hyperactivity SDQ score^b  0.71  0.389  0.372  0.556 
Teacher hyperactivity SDQ score^b  0.88  0.270  0.241  0.596 
Educational attainment relative to age norm  —  0.034  0.128  −1.007 
Following the SDQ is the Development and Well-Being Assessment (DAWBA), a structured interview administered to parents and older children. Although it has limitations, the DAWBA has been found to be an effective diagnostic tool, especially for ADHD (Foreman et al., 2009). The DAWBA contains a series of sections, with each section exploring a different disorder; examples include social phobia, post-traumatic stress disorder, eating disorder, generalised anxiety and depression. Each disorder section begins with a screening question that determines whether the child has a problem in that domain. If the child passes the screening question and the relevant SDQ score is normal, the remainder of the section is omitted but, if parent or child indicates that there is a problem or the SDQ score is high, detailed information is collected, including a description of the problem in the informant's own words. The DAWBA parent and child interviews take around 50 and 30 minutes, respectively, to complete (Goodman et al., 2000). A shortened version of the DAWBA was also mailed to the child's teacher. Once all three DAWBA questionnaires were returned, a team of child and adolescent psychiatrists reviewed both the verbatim accounts and the answers to questions about children's symptoms and their resultant distress and social impairment, before assigning diagnoses using ICD-10 criteria. Importantly, no respondent was automatically prioritised.
Table 1 provides the sample means for the parent, child and teacher SDQ scores for all children, and for the subsets of children who were and were not diagnosed with an ICD-10 mental disorder. The sample means indicate that teachers report the fewest symptoms (0.167) and that children report the most (0.288). Table 1 also shows that the SDQ scores of children with a diagnosed mental disorder are roughly 1.5 to 3 times larger than the SDQ scores of children without a mental disorder, with the smallest ratios for the children's own reports. Estimated kernel densities of parent, child and teacher SDQ scores are presented in Figure 1. They are positively skewed, with most children exhibiting few symptoms and only a small minority exhibiting many.
The final key variable for our analysis is educational attainment. The survey focuses very much on measurement of mental state and a consequence of this is that educational outcomes are not documented in detail. In particular, the dataset does not contain test score information, and we use instead the one available quantitative measure of general educational progress: the teacher's assessment of the child's scholastic ability relative to other children of the same age. We construct this measure by using teacher responses to the question ‘In terms of overall intellectual and scholastic ability, roughly what age level is he or she at?’, from which we subtract the child's chronological age. This measure of educational progress is unusual in the economics literature, but the concept of a child's ‘mental age’ has a long history in child educational psychology; indeed, Intelligence Quotient (IQ) tests are so named because they were originally constructed as the ratio of mental age to chronological age multiplied by 100. The concept also underlies the practice in many educational systems (but not the UK's) of holding children back in a lower grade if they have made inadequate progress relative to the norm for their age. However, in the UK, the existence of a national school curriculum and associated testing programme means that there is a clear norm of age-specific achievement against which progress can be judged by teachers.
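As a small illustration of how this measure is built, the sketch below computes the scholastic age gap from a teacher-assessed age level and a chronological age. The function name and the example ages are hypothetical and are not taken from the survey.

```python
def scholastic_age_gap(teacher_assessed_age, chronological_age):
    """Educational progress relative to the age norm, in years.

    Positive values mean the child is judged to be ahead of his or her
    chronological age; negative values mean the child is behind.
    """
    return teacher_assessed_age - chronological_age

# Hypothetical child aged 10.5 whom the teacher places at a 9-year-old level
gap_years = scholastic_age_gap(9.0, 10.5)
gap_months = 12 * gap_years
print(gap_years, gap_months)
```

Expressing the gap in months, as in the last line, makes it directly comparable with effect sizes quoted in months of educational progress.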
For our sample of children, the average scholastic age gap is 0.034 years, or approximately 2 weeks ahead of actual age (see Table 1). The age gap is, however, significantly different from zero for the groups of children with and without mental health problems. For children without a diagnosed mental disorder, the mean gap is 0.128 years, and for those with any disorder the gap is −1.007 years, implying an average difference between the two groups of 1.135 years, or around 14 months. Nonparametric estimates of the relationships between parent, child and teacher SDQ scores and educational attainment are shown in Figure 2, which confirms the pattern shown in Table 1, but indicates that the relationship is continuous and approximately linear, rather than a discrete distinction between the absence or presence of a disorder (see also Currie and Stabile, 2006). This suggests that identification analysis based on the joint distribution of binary states (see Kreider and Pepper, 2007, for an excellent example) would miss an important feature of the relationship between mental health and education outcomes.
Table A3 of the online Appendix presents sample means for the explanatory covariates used in our analysis. The continuous variables have been scaled to avoid extreme numerical values: age, number of children and log income are divided by 10; and mother's GHQ mental health score is scaled to lie in the [0,1] interval. All other covariates are binary; consequently, the sample means indicate that children with a diagnosed disorder are more likely to be male; live in social housing; have experienced serious adverse life events; and have a parent who is unmarried, less educated, nonemployed or with a mental health problem.
3 MODEL-BASED MEASUREMENT OF MENTAL HEALTH
Our model has two components: a model of the complex measurement process for mental health and a relationship between the observed educational outcome and the child's (latent) mental health and other relevant characteristics. The measurement model is based on three main principles. The first is that there exists a ‘true’ state of mental disorder, S, conceptualised as the (latent) assessment that would be made by experienced psychiatric assessors in possession of fully detailed, multi-source information on the child. This latent measure is the factor which we see as a potential influence on educational development.
Second, we accept that the child's true mental state S is not accurately observable by anyone: not by the parent, the child him/herself, the teacher, the psychiatric assessment team, nor—least of all—by us, the statistical analysts. We assume the SDQ responses from parents, children and teachers are all potentially subject to systematic distortion, which we see as arising either because certain observers (particularly parents and children) may be reluctant to admit the existence of a problem, or may exaggerate minor problems, or because certain aspects of the problem are less visible to certain types of observer.
It is important to realise that any (finitely) biased measure can be reinterpreted as an unbiased measure of a different concept (although not necessarily a theoretically appealing one). Thus any single-factor model with biased multiple observers is logically equivalent to a multi-factor model in which each observer measures a different factor. Conti et al. (2011) is a recent example of a high-dimensional factor model where the use of observer-specific measures generates additional factors. We would argue that the measurement error approach, involving measurement of a common underlying concept, is a powerful one that has important advantages of parsimony and straightforward theoretical interpretation. It also matches the intention behind the SDQ instrument, which was explicitly designed to achieve comparability across observers (Goodman et al., 2000).
The third underlying assumption is that psychiatric assessors make the best use they can of the information available to them, exploiting their experience of diagnosis in a multi-observer setting, where the information reported to them by children and by parents and teachers may be subject to distortions and misinterpretation. Any analysis of measurement error requires an assumption which links some observed measure to the underlying concept that we seek to measure. Our rational expectations assumption for psychiatric assessors provides this link, but how plausible is it? The US Diagnostic and Statistical Manual of Mental Disorders (DSM) and the International Statistical Classification of Diseases and Related Health Problems (ICD) provide standardised frameworks for diagnosis which help to impose consistency on diagnostic practice and reduce bias of individual assessors relative to the norms set out in the DSM and ICD. Although the psychiatric assessors in our survey do not meet the families, the survey information they have at their disposal is similar to the diagnostic procedures used in connection with the DSM and ICD. There has been a debate in psychology about diagnostic norms and the possibility of bias linked particularly to ethnicity and culture (see USDHHS, 1999), and there is quite strong evidence of a greater readiness to diagnose disorder in black and other minority groups, particularly by white clinicians (Trierweiler et al., 2006). DSM-IV and ICD-10 (the versions in force at the time of the survey) both provide for cultural differences to be considered explicitly, but they have been criticised for their Western focus (Kleinman, 1997). Ethnic minorities form a very small proportion of our original survey sample and are not included in the subsample used for our analysis, so the main area of concern over biased psychiatric assessment is not relevant here.
We implement our approach through a latent variable structure with switching between observational regimes, to reflect the different information sets that may be available to psychiatric assessors under different circumstances. The sample consists of a set of n observed children. Child i’s ‘true’ mental health state is S_{i}, which is related to the child's characteristics and circumstances through a latent regression:
 S_{i} = X_{i}β + U_{i}  (1)
where U_{i} is N(0, σ_{u}^{2}). Since S_{i} is unobservable, we can normalise β and σ_{u} arbitrarily to fix the origin and scale of S_{i}. We observe three SDQ scores reported by the parent, child and teacher, Y_{iP}, Y_{iC} and Y_{iT}, all treated as continuously variable measures. The scores derived from parents’, children's and teachers’ SDQ responses are potentially biased readings of S_{i}:
 Y_{ij} = λ_{j}S_{i} + X_{i}α_{j} + V_{ij},  j = P, C, T  (2)
where X_{i} is a vector of variables, available to all observers, reflecting causal factors including the child's personal characteristics, family and social circumstances and the occurrence of past traumatic events. (V_{iP},V_{iC},V_{iT}) are jointly normal conditional on S_{i} and X_{i}, with zero means and variance matrix ∑_{YY}; λ_{j} − 1 represents the sensitivity of the observer to the child's true state and α_{j} captures any measurement distortions linked to specific characteristics of the child and family circumstances. Consequently, an observer of type j gives generally unbiased reports only if λ_{j} = 1 and α_{j} = 0. Note that the inclusion of covariates in measurement models is used frequently in psychology to allow for systematic differences (‘bias’) in the sensitivity of cognitive ability tests and has been used in this way by Carneiro et al. (2003) in the economics literature.
The reduced form of (1)–(2) for observer j is
 Y_{ij} = X_{i}[λ_{j}β + α_{j}] + λ_{j}U_{i} + V_{ij}  (3)
and the coefficients [λ_{j}β + α_{j}] describe the mean relationship between the distorted SDQ scores and the child's observable characteristics.
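The identification problem created by a biased observer can be illustrated with a short simulation. The sketch below assumes the linear reading of the model, S = Xβ + U for the latent state and Y = λS + Xα + V for one observer's SDQ score; all parameter values, the sample size and the variable names are hypothetical choices made for illustration and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Covariates: an intercept and one standardised characteristic (hypothetical)
X = np.column_stack([np.ones(n), rng.normal(size=n)])

beta = np.array([0.0, 0.5])    # effect of X on the true latent state S
lam = 0.8                      # observer sensitivity to the true state
alpha = np.array([0.1, -0.2])  # observer-specific reporting distortion

# Latent mental health state and one observer's (biased) SDQ score
S = X @ beta + rng.normal(size=n)
Y = lam * S + X @ alpha + rng.normal(scale=0.5, size=n)

# OLS of the observed score on X recovers lam*beta + alpha, not beta itself
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
reduced_form = lam * beta + alpha

print("OLS coefficients:   ", coef.round(3))
print("lam * beta + alpha: ", reduced_form.round(3))
print("true beta:          ", beta)
```

Because only the combination λβ + α is recovered from a single observer's score, the regression of an SDQ score on child characteristics cannot by itself separate the true influence β from the reporting distortion α. This is the reason additional information, here the expert assessments, is needed.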
In addition, parents and children are each asked a direct question about whether they perceive there to be a problem with respect to the specific aspect of mental health, yielding two binary indicators, W_{iP}, W_{iC}. These indicators are important, since they play a role in triggering additional questionnaire content, but they are based on the same underlying opinion as revealed by the SDQ and we assume them to contain no additional information, so that E(S_{i} | Y_{iP}, Y_{iC}, Y_{iT}, W_{iP}, W_{iC}, X_{i}) = E(S_{i} | Y_{iP}, Y_{iC}, Y_{iT}, X_{i}). Our rational expectations model of assessors’ behaviour has four components:
 1. Information. The basic information which is always available to the psychiatric assessment process is Ω_{i}^{0} = {Y_{iP}, Y_{iC}, Y_{iT}, W_{iP}, W_{iC}, X_{i}}. If the parent's SDQ score exceeds a specific threshold (Y_{iP} ≥ K_{P}) or the parent reports the child's state to be problematic (W_{iP} = 1), then a more detailed set of questions is triggered, generating additional information Ω_{iP}; similarly, if the child perceives there to be a problem or his or her SDQ responses exceed a threshold K_{C}, further information Ω_{iC} is elicited from him or her. Thus the additional contingent information set available to assessors is
 Ω_{i}^{+} = ∅  if neither the parent nor the child trigger applies
 Ω_{i}^{+} = Ω_{iP}  if only the parent trigger applies (Y_{iP} ≥ K_{P} or W_{iP} = 1)
 Ω_{i}^{+} = Ω_{iC}  if only the child trigger applies (Y_{iC} ≥ K_{C} or W_{iC} = 1)
 Ω_{i}^{+} = (Ω_{iP}, Ω_{iC})  if both triggers apply  (4)
 As external observers, we observe which of these four observational regimes occurs, but not the content of the information sets Ω_{iP} and Ω_{iC}.^1
 2. Knowledge. Psychiatric assessors’ knowledge and experience gives them the ability to ‘purge’ informational signals from parents, children and teachers of their bias. Although we do not claim that assessors think in terms of statistical models, this assumption is equivalent to assuming that they know the values of population parameters like β, λ_{j}, α_{j}.
 3. Conditionally unbiased expectations. Assessors make minimum-variance unbiased predictions of S_{i}, conditional on the available information (Ω_{i}^{0}, Ω_{i}^{+}). Standard properties of the multivariate normal distribution imply that this conditional expectation is
 Ŝ_{i} = X_{i}β + ∑_{j} γ_{j}(Y_{ij} − μ_{ij})  if Ω_{i}^{+} = ∅
 Ŝ_{i} = X_{i}β + ∑_{j} γ_{j}^{+}(Y_{ij} − μ_{ij}) + Ω_{i}^{+}δ  if Ω_{i}^{+} ≠ ∅  (5)
 where μ_{ij} = X_{i}[λ_{j}β + α_{j}] is the conditional mean from the reduced form (3). γ_{j} is the coefficient of Y_{ij} in a population regression of S_{i} on Y_{iP}, Y_{iC}, Y_{iT} and X_{i}, and γ_{j}^{+} and δ are the coefficients of Y_{ij} and the contingent information Ω_{i}^{+} from an extended regression on Y_{iP}, Y_{iC}, Y_{iT}, X_{i} and Ω_{i}^{+}. Structure (5) implies that assessors make predictions which are optimal linear combinations of the three observers’ information signals, Y_{ij} − μ_{ij}, after purging those signals of their bias components: consequently, the signal receiving the greatest weight is not necessarily the least biased. The term Ω_{i}^{+}δ represents the contribution of information available to the assessor but not to the statistical analysis and thus, from the point of view of the external observer, only inflates the residual error in Ŝ_{i}.
 4. Diagnosis. The observed assessment is a binary quasi-diagnosis D_{i}, indicating a high predicted level of disorder: D_{i} = 1(Ŝ_{i} ≥ τ), where τ is the assessor's decision threshold, assumed distributed as N(μ_{τ}, σ_{τ}^{2}).^2 As an outside observer, the statistical analyst observes the diagnosis D_{i} and the basic information Ω_{i}^{0}. The probability of a diagnosed problem is
 Pr(D_{i} = 1 | Ω_{i}^{0}) = Φ([X_{i}β + ∑_{j} γ_{j}E_{ij} − μ_{τ}] / σ_{τ})  (6)
 where E_{ij} = Y_{ij} − X_{i}(λ_{j}β + α_{j}) and Φ is the standard normal distribution function. If contingent information Ω_{i}^{+} is available to the assessment process, the probability of a diagnosed mental health problem conditional on the information available to the analyst is
 Pr(D_{i} = 1 | Ω_{i}^{0}, Ω_{i}^{+} ≠ ∅) = Φ([X_{i}β + ∑_{j} γ_{j}^{+}E_{ij} − μ_{τ}] / σ_{+})  (7)
 where σ_{+}^{2} = σ_{τ}^{2} + var(Ω_{i}^{+}δ | Ω_{i}^{0}). Thus, conditional on all the observed information in Ω_{i}^{0}, we have a probit model for the psychiatric assessment, with regime switches in the coefficients of E_{ij} and X_{i} and in the normalising variance. However, conditional on Ω_{i}^{0}, these switches are exogenous, so there is no endogenous selection problem as there would be if we conditioned on X_{i} but not on the SDQ scores Y_{ij}. Note that, if item non-response makes one or more of the SDQ scores unavailable to us and to the assessors, the forms of (5) and (6) or (7) change to take account of the more limited information available.
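The assessor's weighting of the three signals and the resulting probit probability can be sketched numerically. The example assumes the linear measurement model Y_{ij} = λ_{j}S_{i} + X_{i}α_{j} + V_{ij} with latent variance σ_{u}^{2} and noise variance matrix ∑_{YY}, for which the standard multivariate normal projection gives weights γ = (σ_{u}^{2}λλ′ + ∑_{YY})^{-1}σ_{u}^{2}λ. All numerical values, and the helper names, are hypothetical and are not estimates from the paper.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def optimal_weights(lam, sigma_u2, Sigma_YY):
    """Weights on the purged signals in the best linear prediction of S."""
    lam = np.asarray(lam, dtype=float)
    var_signals = sigma_u2 * np.outer(lam, lam) + Sigma_YY  # Var(Y | X)
    cov_state_signals = sigma_u2 * lam                      # Cov(S, Y | X)
    return np.linalg.solve(var_signals, cov_state_signals)

def diagnosis_probability(x_beta, purged_signals, gamma, mu_tau, sigma_tau):
    """Probit probability that the predicted state exceeds a random threshold."""
    s_hat = x_beta + float(np.dot(gamma, purged_signals))
    return norm_cdf((s_hat - mu_tau) / sigma_tau)

# Hypothetical values: order of observers is parent, child, teacher
lam = [1.0, 1.0, 1.0]
Sigma_YY = np.diag([0.5, 2.0, 0.4])  # child report assumed noisiest
gamma = optimal_weights(lam, sigma_u2=1.0, Sigma_YY=Sigma_YY)

E = np.array([1.0, 0.5, 1.5])        # signals net of their conditional means
p = diagnosis_probability(0.0, E, gamma, mu_tau=1.0, sigma_tau=1.0)
print(gamma.round(4), round(p, 4))
```

Note that the distortion parameters α_{j} do not enter the weights at all: once the signals are purged of bias, the weight an observer receives depends only on the sensitivity λ_{j} and the remaining noise, which is why the most heavily weighted observer need not be the least biased one.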
What can be identified from this measurement model? Equations (2) and (1) imply the following reduced-form SDQ models:
 Y_{ij} = X_{i}[λ_{j}β + α_{j}] + E_{ij},  E_{ij} = λ_{j}U_{i} + V_{ij},  j = P, C, T  (8)
Estimates were computed using maximum likelihood estimation of a system comprising (6), (7) and (8), parametrised in terms of β/σ_{τ}, μ_{τ}/σ_{τ}, the vector γ of weights on the SDQ signals and a scale factor for each configuration in 𝒪, where 𝒪 is the set of non-empty configurations of contingent information (Ω_{iP}, Ω_{iC} or (Ω_{iP}, Ω_{iC})). To allow for item or individual non-response in the SDQ for children or teachers, as well as the response-triggered contingent information, we allow for four missing data regimes with the following combinations of SDQ scores observed^3: (i) Y_{iP}, Y_{iC}, Y_{iT}; (ii) Y_{iP}, Y_{iC}; (iii) Y_{iP}, Y_{iT}; (iv) only Y_{iP}. The structure of the vector γ varies across these four regimes. The scale factors are parametrised as exp(ψ_{P}ν_{iP} + ψ_{C}ν_{iC}), where ν_{ij} is the amount of contingent information supplied by observer j, ranging from ν_{ij} = 0 for no additional information to ν_{ij} = 3 for contingent information on all three aspects of conduct, emotional disorder and hyperactivity.
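The regime classification and the scale-factor parametrisation described above are simple enough to write down directly. The following sketch is illustrative only: the function names are hypothetical and the ψ values are made up, not the estimates reported in the paper.

```python
import math

def sdq_regime(has_child_score, has_teacher_score):
    """Missing-data regime, given that the parent SDQ score is always observed."""
    if has_child_score and has_teacher_score:
        return "i"    # parent, child and teacher scores
    if has_child_score:
        return "ii"   # parent and child scores
    if has_teacher_score:
        return "iii"  # parent and teacher scores
    return "iv"       # parent score only

def scale_factor(psi_P, psi_C, nu_P, nu_C):
    """exp(psi_P * nu_P + psi_C * nu_C), with nu_j in {0, 1, 2, 3}."""
    if not (0 <= nu_P <= 3 and 0 <= nu_C <= 3):
        raise ValueError("nu_j counts aspects with contingent information (0 to 3)")
    return math.exp(psi_P * nu_P + psi_C * nu_C)

# Hypothetical negative psi values
psi_P, psi_C = -0.10, -0.20
no_extra_info = scale_factor(psi_P, psi_C, 0, 0)  # no contingent information
full_info = scale_factor(psi_P, psi_C, 3, 3)      # all three aspects, both observers
print(sdq_regime(True, False), no_extra_info, round(full_info, 4))
```

With no contingent information the scale factor equals 1, so the baseline probit is recovered; with negative ψ values the factor shrinks monotonically as more contingent information is supplied.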
Parameter estimates of the psychiatric assessment model are given in Table 2. The estimated weights on the three SDQ signals indicate that, when available, assessors give greatest weight to the teacher's SDQ report, slightly less to the parental report and considerably less to the child's own self-assessment. This relative weighting is a consequence of the different amounts of noise that remain in the parent, child and teacher signals, after they are purged of bias. Note that teachers’ assessments are the most informative, but not necessarily the least biased, since the parameters α_{T} may be large. Indeed, we report evidence below that estimates based on the assumption of zero bias in teachers’ assessments are themselves subject to substantial bias. The ψ-parameters are negative, which is consistent with the theoretical prediction that the scale factors exp(ψ_{P}ν_{iP} + ψ_{C}ν_{iC}) fall below 1 when contingent information is supplied, and indicates that additional contingent information has value in clarifying the circumstances which led to the problematic self-assessment.
The estimates of β/σ_{τ} are shown in Table 3. They give the influence of the characteristics X on the child's mental state S, using a normalisation of S which is dictated by the variability of psychiatric assessments. Since σ_{τ} is unknown, scaling is arbitrary and it is only the significance and relative magnitudes of the coefficients that are meaningful here. Maternal education of any kind has a substantial positive influence on the child's mental health, comparable in magnitude to that of major adverse life events, including loss of a parent through death or divorce/separation and past experience of serious illness or injury. There is some evidence of intergenerational transmission of mental health problems, since the mother's own GHQ measure of mental (ill-)health is found to have a significantly negative influence on the child's mental state. For example, if the GHQ score were to double from the mean level of 0.3 to 0.6, the predicted impact on the child's mental disorder would be around a third as great as the impact attributable to the absence of maternal educational attainment, or to the death of a friend or serious illness or injury during childhood. Indicators of social disadvantage do not have a large influence: housing type and tenure are statistically insignificant and, although log household income has a significant protective effect on child mental health, a very large income increase of around 170% would be required to produce an effect comparable to that of maternal education or adverse life events. We find no statistically significant evidence of an effect for the child's age (for general mental health) and gender or for the parents’ employment or partnership status, in contrast to the SDQ reduced-form estimates (see Tables A4 and A5 in the online Appendix).
Table 3. Estimated coefficients (β/σ_{τ}) for latent mental disorder equation
Covariate  General mental health: Estimate  General mental health: SE  Hyperactivity: Estimate  Hyperactivity: SE 
Age  0.276  (0.210)  0.428*  (0.244) 
Male  0.171  (0.111)  0.059  (0.132) 
No. children  0.296  (0.565)  0.097  (0.691) 
Social housing  0.043  (0.152)  0.001  (0.178) 
Apartment  –0.236  (0.236)  –0.281  (0.303) 
Cohabiting  0.340*  (0.183)  0.349  (0.222) 
Single  –0.366  (0.258)  –0.327  (0.305) 
Widowed/divorced  0.085  (0.248)  0.126  (0.307) 
Mother's GHQ  0.398***  (0.046)  0.380***  (0.053) 
Mother employed  –0.134  (0.122)  –0.151  (0.146) 
Father employed  –0.196  (0.213)  –0.210  (0.259) 
Degree  –0.404*  (0.207)  –0.479*  (0.246) 
Vocational  –0.342*  (0.193)  –0.422*  (0.222) 
Alevels  –0.191  (0.180)  –0.160  (0.216) 
Olevels  –0.523***  (0.141)  –0.540***  (0.160) 
ln(income)  –0.356***  (0.043)  –0.403***  (0.049) 
Parental split  0.230  (0.145)  0.164  (0.178) 
Death in family  0.364  (0.233)  0.457*  (0.268) 
Death of friend  0.395**  (0.198)  0.412*  (0.233) 
Illness  0.255*  (0.142)  0.324*  (0.170) 
Injury  0.441**  (0.199)  0.363  (0.257) 
Financial crisis  0.239  (0.147)  0.291*  (0.167) 
Police trouble  0.278  (0.216)  0.136  (0.265) 
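The two magnitude comparisons made in the text preceding Table 3 (the GHQ doubling and the 170% income increase) can be checked against the tabulated coefficients. The sketch below is only a back-of-the-envelope check using the general mental health column; the variable names are illustrative.

```python
import math

# Coefficients copied from Table 3, general mental health column.
GHQ_COEF = 0.398
LN_INCOME_COEF = -0.356
DEATH_OF_FRIEND_COEF = 0.395

# Doubling the mother's GHQ score from its mean of 0.3 to 0.6.
ghq_effect = GHQ_COEF * (0.6 - 0.3)
ghq_share = ghq_effect / DEATH_OF_FRIEND_COEF

# A 170% income increase multiplies income by 2.7, so ln(income)
# rises by ln(2.7).
income_effect = LN_INCOME_COEF * math.log(1 + 1.70)

print(f"GHQ doubling: {ghq_effect:.3f} "
      f"({ghq_share:.2f} of the death-of-friend coefficient)")
print(f"170% income rise: {income_effect:.3f}")
```

The GHQ effect comes out at roughly three-tenths of the death-of-friend coefficient, and the income effect is similar in absolute size to the education and life-event coefficients, which matches the wording in the text.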
If an observer j is unbiased (and thus has α_{j} = 0), then the reduced-form coefficient vector λ_{j}β + α_{j} is proportional to β/σ_{τ}. Since both are identified, we can rescale each estimate to have unit Euclidean length and carry out a Wald test of their equality (after dropping one redundant element of the difference vector). These tests give strong rejections for all three observers,^{4} so we can clearly reject the hypothesis that any of the observers is unbiased, relative to the judgements made by the psychiatric assessors.
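The logic of comparing unit-length vectors can be seen in a small numerical sketch. All numbers below are hypothetical, and λ_{j} is assumed positive; the sketch shows only the normalisation step, not the Wald test itself, which would also need the joint covariance matrix of the two estimates.

```python
import math


def unit_length(v):
    """Rescale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def max_gap(u, v):
    """Largest element-wise gap between two vectors after normalisation."""
    return max(abs(x - y) for x, y in zip(unit_length(u), unit_length(v)))


# Hypothetical values, for illustration only.
beta_over_sigma = [0.4, -0.5, -0.36]  # identified from the assessment model
sigma_tau, lam = 2.0, 0.7             # unknown in practice; lam assumed > 0
beta = [sigma_tau * b for b in beta_over_sigma]

unbiased_rf = [lam * b for b in beta]                   # alpha_j = 0
alpha = [0.1, 0.0, -0.2]                                # hypothetical bias
biased_rf = [lam * b + a for b, a in zip(beta, alpha)]  # alpha_j != 0

print(max_gap(unbiased_rf, beta_over_sigma))  # zero up to rounding error
print(max_gap(biased_rf, beta_over_sigma))    # clearly non-zero
```

Normalisation removes the unknown positive factor λ_{j}σ_{τ}, so any remaining difference between the two directions reflects a non-zero α_{j}.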
Although the distortion parameters λ_{j}, α_{j} are not separately identified, some inferences about the nature of the distortions are possible. If, for some observer j and covariate x_{k}, the identifiable effect of x_{k} on true mental health (β_{k}/σ_{τ}) and on the SDQ report by observer j ([λ_{j}β_{k} + α_{jk}]) are of opposite sign, then α_{jk} must have the opposite sign to β_{k}. This would imply that misreporting by observer j has the effect of attenuating or even reversing the apparent impact of x_{k} on mental health. We examine this by conducting tests of the hypothesis against the alternative for each variable x_{k} in turn, using Chen and Szroeter's (2009) test for multiple inequality restrictions. (Note that this is a very conservative procedure, since sign conflicts between β_{k}/σ_{τ} and α_{jk} need not generate a corresponding sign conflict between the identifiable coefficients β_{k}/σ_{τ} and [λ_{j}β_{k} + α_{jk}].) The test generates significant results only for age, where the joint non-negativity hypothesis can be rejected clearly for the hyperactivity measure (P = 0.048) and more marginally for general mental health (P = 0.106). This suggests a tendency for observers to understate the problems of older children and adolescents relative to younger children, by the standards of the fully informed expert psychiatric assessment. This is perhaps unsurprising, since the early stages of the process of child development are often the focus of special attention, while the problems of older children and adolescents may be less visible to external observers and possibly under-acknowledged by young people themselves.
4 MENTAL HEALTH AND EDUCATIONAL ATTAINMENT
We now turn to the consequences of biased reporting of the child's mental state for inferences about the causal impact of child mental health on educational development. Educational attainment relative to the child's age is denoted by A_{i} and is assumed to be related to mental health S_{i} and other covariates X_{i} as follows:
 A_{i} = ρS_{i} + X_{i}δ + η_{i}  (9)
where η_{i} is a normally distributed regression residual, which may be correlated with some or all of the SDQ residuals V_{ij}. Our results are based on model (9) with dependent variable A_{i} defined as the difference between the child's educational age and actual age.^{5}
Note two potentially limiting features of our analysis using model (9). First, we assume that mental health has a unidimensional impact on education. Identification of ρ is a difficult issue and identification becomes more demanding if the state of mental health S_{i} is treated as multi- rather than unidimensional. Our approach is to use a single variable to represent mental health, with alternative broad- and narrow-scope measures used to assess robustness. Thus we again report on two implementations: one where S_{i} represents a concept of general mental health corresponding to the overall emotional + conduct + hyperactivity SDQ score; the other representing hyperactivity alone. The close correspondence between the findings for these two specifications suggests that the assumption of uniform impact across dimensions of mental disorder is a reasonable approximation.
A second potential shortcoming is that we have only a single quantitative measure of general educational attainment, which is provided by the teacher and therefore subject to observational error just as the teacher's SDQ reports are. There are three reasons for believing that bias in teachers’ assessments of educational progress will be a less serious problem than bias in judgements made about mental health. First, teachers are professionals and thus the argument we used to motivate the assumption of unbiased assessment by psychiatrists carries over to teachers in relation to judgements about educational performance. Second, teachers in the UK operate within a tightly defined national curriculum with associated age-specific achievement norms and a rigorous external school monitoring regime. One can credibly argue that this system reduces the scope for bias and performs much the same function of imposing external validity as the DSM-IV and ICD-10 diagnostic frameworks do for psychiatric practice. Third, the measure A_{i} is a dependent variable, so that any independent random noise in A_{i} only has the effect of reducing precision by inflating var(η_{i}), rather than introducing bias. This contrasts with the measurement of mental health states, which are used as explanatory covariates and are therefore vulnerable to classical measurement error as well as biased measurement.
Nevertheless, there remains a possibility that teachers’ judgements about educational achievement have some element of bias related to children's mental states. There is very little evidence available on this issue, although Burgess and Greaves (2013), in a study focused mainly on ethnicity bias, find some tendency for teachers to under-predict the test scores achieved by children in groups with high rates of mental health disorders. If this is the case, then the parameter ρ identified by any of the methods considered here will be larger in magnitude than the true impact of mental health disorder on human capital formation. However, it will not affect the comparison of results from different modelling approaches. If, as we find below, methods using classical assumptions to adjust for measurement error overestimate the impact of mental disorder relative to more defensible methods, then this conclusion is unaffected by any tendency for teachers to confuse mental health problems with slow educational progress.
The reduced form for educational attainment reveals the identification problem we face:
 A_{i} = X_{i}(ρβ + δ) + ρU_{i} + η_{i}  (10)
Even with β known, ρ cannot be uniquely recovered from knowledge of the reduced-form coefficients (ρβ + δ). We explore three alternative identification strategies for the coefficient ρ. The first uses prior information on the residual covariances to reveal the sign of ρ. The second, which we include only to evaluate the performance of the standard latent variable method in this context of biased reporting, uses an assumption that one observer (either parent, child or teacher) is unbiased (and therefore, under our rational expectations assumption, also known to be unbiased by the psychiatric assessors), with reporting error uncorrelated with educational attainment: essentially the classical measurement error assumptions used in standard latent variable modelling. The third approach uses an exclusion restriction on the coefficient vector δ, which we implement in two distinct ways.
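The identification problem can be made concrete with a few lines of Python. The sketch below uses hypothetical values of β, ρ and δ and shows that, for any candidate ρ, there is a δ that reproduces exactly the same reduced-form coefficients ρβ + δ, so the reduced form alone cannot pin down ρ.

```python
# Hypothetical values, for illustration only.
beta = [0.5, -0.3, 0.2]


def reduced_form(rho, delta):
    """Reduced-form education coefficients b = rho * beta + delta."""
    return [rho * b + d for b, d in zip(beta, delta)]


def delta_given(rho, b_target):
    """The delta that reproduces b_target for a chosen rho."""
    return [bt - rho * b for bt, b in zip(b_target, beta)]


# "Observed" reduced form generated by one particular (rho, delta) pair.
b_obs = reduced_form(-0.4, [0.1, 0.0, -0.2])

# Every other rho is matched by some delta giving the same b_obs.
for rho in (-0.8, 0.0, 0.3):
    delta = delta_given(rho, b_obs)
    same = all(abs(x - y) < 1e-12
               for x, y in zip(reduced_form(rho, delta), b_obs))
    print(rho, [round(d, 3) for d in delta], same)
```

Each identification strategy discussed in this section works by ruling out most of these observationally equivalent (ρ, δ) pairs.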
4.1 Covariance Restrictions
Residual covariances provide information on ρ and this was exploited by Kan and Pudney (2008) in a study of time use also involving biased repeat observations. Our application differs from that study since we do not assume that a particular observer or mode of observation is unbiased and, consequently, point identification is not possible here. Let c_{j} be the residual covariance cov(Y_{ij}, A_{i} | X_{i}) and be the covariance between the random component of the measurement error for observer j and the random component of educational progress. Under our assumptions , implying . If we rule out any negative covariance between the random components of SDQ and educational attainment, then is an upper bound on ρ. For parent and child observers, it seems reasonable to assume no correlation between their error in reporting the child's mental state and the random component of the teacher's educational report, so that and therefore sgn(ρ) = sgn(c_{j}), j = P, C. A one-sided test of H_{0} : c_{j} = 0 against H_{1} : c_{j} < 0 then establishes the sign of ρ. The test remains valid (but loses power) if . In contrast, for teacher observers, we might expect , since a tendency to underrate a child's educational achievement may accompany a tendency to overrate the child's degree of mental disorder due to confounding factors reflecting the ‘quality’ of the child–teacher match. If so, , leaving the sign of ρ ambiguous. We implement the test by estimating simultaneously the reduced-form equations ((8) and (10)) for the SDQ scores and education variable, and using separate one-sided Lagrange Multiplier tests for the residual covariances between the education equation and each SDQ equation. The results are given in Table 4. All residual correlations are negative and significant; they would also be highly significant against two-sided alternatives if adjusted for multiple comparisons by using Bonferroni corrections.
Consequently, we have some evidence that the impact of poor mental health on educational progress is negative. Note that Table 4 is consistent with the idea of correlated educational and mental health assessments from teachers, since the (negative) residual correlation is larger in magnitude and more significant for teachers than for parent or child.
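The sign argument, and the reason it is weaker for teachers, can be illustrated by a small simulation. This is a stylised sketch, not the estimation procedure used here: covariates are omitted, the loading of the SDQ report on the latent state is set to 1, and all parameter values and names are hypothetical.

```python
import random


def sample_cov(x, y):
    """Unbiased sample covariance of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def simulate(rho, shared_sd, n=20000, seed=0):
    """Residual covariance between an SDQ report and the education measure.

    shared_sd > 0 adds a common component that raises the SDQ report and
    lowers the education report, mimicking a poor child-teacher match.
    """
    rng = random.Random(seed)
    y_res, a_res = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)          # latent mental disorder residual
        w = rng.gauss(0, shared_sd) if shared_sd > 0 else 0.0
        v = w + rng.gauss(0, 1)      # SDQ reporting error
        eta = -w + rng.gauss(0, 1)   # education residual
        y_res.append(u + v)          # SDQ residual (loading set to 1)
        a_res.append(rho * u + eta)  # education residual incl. rho * u
    return sample_cov(y_res, a_res)


# Parent-like case: errors uncorrelated, true effect negative.
c_parent = simulate(rho=-0.5, shared_sd=0.0)
# Teacher-like case: no true effect, but correlated reporting errors.
c_teacher = simulate(rho=0.0, shared_sd=0.7)

print(round(c_parent, 3), round(c_teacher, 3))
```

Both covariances come out negative, but only the first reflects a negative ρ; in the second, the negative covariance is produced entirely by the correlated reporting errors.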
Table 4. Tests of zero residual covariances between SDQ scores and school performance
Statistic  Parent  Child  Teacher 
General mental health 
Residual correlation  –0.248  –0.176  –0.332 
One-sided t-statistic^{a}  –17.32  –7.97  –22.96 
Hyperactivity 
Residual correlation  –0.273  –0.156  –0.343 
One-sided t-statistic^{a}  –19.10  –7.06  –23.71 
4.2 Identification with an Unbiased Observer
The most common approach to estimation of models like (9) consists in using a single SDQ score (usually from the parent) as a proxy for the unobserved S_{i}, which is equivalent to assuming α_{j} = 0 and var(V_{ij}) = 0 for some observer j. Examples include Salm and Schunk (2012) and Bartling et al. (2012), who use SDQ as a covariate, and Datta Gupta and Simonsen (2010) and von Hinke Kessler Scholder et al. (2013), who use it as a dependent variable. This approach fails to address either the classical measurement error problem or the additional problem of biased reporting by parents, children or teachers. The upper panel of Table 5 shows the estimates of the mental health–education impact that result from using one of the SDQ measures, scaled to have unit standard deviation, as a crude proxy for latent mental disorder (full parameter estimates are given in Table A6 of the online Appendix). The estimates suggest that a one-standard-deviation increase in mental disorder has an average effect of retarding educational development by 3.1–5.7 months or 2.7–6.0 months, respectively, for the general measure of mental health and for hyperactivity alone. Note that this is considerably smaller than the unconditional mean gap of 15 months between those with and without a diagnosed disorder (see Table 1).
Table 5. The estimated mental health–education effect: unbiased observer
Observer  General mental health: ρ × SD(S_{i})  SE  R^{2}  Hyperactivity: ρ × SD(S_{i})  SE  R^{2} 
SDQ proxy  Least-squares regression with SDQ proxy 
Parent  –0.367***  (0.021)  0.172  –0.395***  (0.020)  0.184 
Child  –0.258***  (0.032)  0.169  –0.224***  (0.032)  0.163 
Teacher  –0.472***  (0.020)  0.214  –0.497***  (0.020)  0.221 
Respondent assumed unbiased  Latent factor model with unbiased observer 
Parent  –0.718***  (0.031)  0.320  –0.704***  (0.030)  0.263 
Child  –0.660***  (0.034)  0.195  –0.683***  (0.036)  0.216 
Teacher  –0.676***  (0.032)  0.233  –0.708***  (0.032)  0.271 
A more sophisticated approach to the measurement error problem which is common in the social sciences is to use the ‘structural equation modelling’ (SEM) framework, combining a set of measurement equations (2), a latent health equation (1) and a ‘structural equation’ for education (9) (see Bollen, 1989, for a review).
A single unbiased observer is sufficient to give identification up to scale of the coefficients β, since the reduced-form coefficients in (3) are proportional to β if α_{j} = 0. But, in this framework, a repeat observation is required to identify ρ in addition to β. One possibility is to assume that another of the non-professional observers is also unbiased, but a more credible strategy is to retain the assumption of unbiased psychiatric assessments, so that we have two unbiased measures. The further assumptions required for identification are that the SDQ measurement error is independent of the true mental state and educational attainment: V_{ij} ⊥ U_{i}, η_{i} for a specific observer j ∈ {P,C,T}. This gives three sets of estimates as we take each observer in turn to be the one who is unbiased. Note that ρ is fully identifiable here, but we report it, in the lower panel of Table 5, in the normalised form ρ × SD(S_{i}), representing the effect on the mean educational deficit of a one-standard-deviation increase in latent mental disorder. We are also able to infer and report the value of R^{2} in the latent mental health equation. These estimates would suggest a substantial causal effect in the range 7.9–8.6 months’ educational deficit for a one-standard-deviation increase, using either the general mental health or hyperactivity measure, and an R^{2} of around 0.2–0.3 for the latent mental health equation which, as one would expect, exceeds the R^{2} statistics for the SDQ proxy regressions, which are depressed by the measurement noise they contain.^{6}
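The month ranges quoted for the two panels of Table 5 follow from the tabulated ρ × SD(S_{i}) estimates if A_{i} is measured in years, which is an assumption made here because it reproduces the quoted figures. The sketch below performs that conversion; the dictionary layout and function name are illustrative.

```python
# rho x SD(S_i) estimates copied from Table 5, stored as
# (general mental health, hyperactivity) for each observer.
TABLE5 = {
    "SDQ proxy": {
        "Parent": (-0.367, -0.395),
        "Child": (-0.258, -0.224),
        "Teacher": (-0.472, -0.497),
    },
    "Latent factor": {
        "Parent": (-0.718, -0.704),
        "Child": (-0.660, -0.683),
        "Teacher": (-0.676, -0.708),
    },
}

MONTHS_PER_YEAR = 12  # assumption: A_i is measured in years


def deficit_range_months(panel, column):
    """Smallest and largest implied deficit (months) across observers."""
    months = [abs(est[column]) * MONTHS_PER_YEAR
              for est in TABLE5[panel].values()]
    return round(min(months), 1), round(max(months), 1)


for panel in TABLE5:
    print(panel,
          "general:", deficit_range_months(panel, 0),
          "hyperactivity:", deficit_range_months(panel, 1))
```

Taken over both measures, the latent factor panel spans 7.9 to 8.6 months, compared with roughly 3 to 6 months for the crude proxy regressions.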
4.3 Exclusion Restrictions on δ
We now dispense again with the assumption of unbiased observation and consider exclusion restrictions as a source of identification. Define b to be the reduced-form coefficient vector, ρβ + δ, for educational performance. A zero restriction on the kth coefficient in δ implies that the corresponding coefficient in b is ρβ_{k} = (ρσ_{τ})(β_{k}/σ_{τ}) and, since β/σ_{τ} is identified from the measurement model, the coefficient (ρσ_{τ}) relevant to this normalisation is identified uniquely as the ratio of the kth elements of b and β/σ_{τ}. The coefficient ρσ_{τ} can then be rescaled in the form r = ρσ_{τ}/κ, which is interpretable as the impact of a one-standard-deviation change in mental health. The main problem with this approach is finding exclusion restrictions which can be strongly justified a priori: there are few factors influencing mental health which can confidently be asserted to have no direct causal influence on educational attainment.
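The ratio that identifies ρσ_{τ} under a single exclusion restriction is simple enough to write down directly. In this sketch the function name and both input numbers are hypothetical; in practice b_{k} would come from the reduced-form education equation and β_{k}/σ_{τ} from the measurement model.

```python
def rho_sigma_tau(b_k: float, beta_k_over_sigma: float) -> float:
    """Identify rho * sigma_tau from an excluded covariate k (delta_k = 0).

    With delta_k = 0, b_k = (rho * sigma_tau) * (beta_k / sigma_tau),
    so the product rho * sigma_tau is the ratio of the two coefficients.
    """
    if beta_k_over_sigma == 0:
        raise ZeroDivisionError(
            "the excluded covariate must affect mental health")
    return b_k / beta_k_over_sigma


# Hypothetical numbers, for illustration only.
beta_k_over_sigma = 0.4  # effect of x_k on latent mental disorder
b_k = -0.03              # reduced-form effect of x_k on education
estimate = rho_sigma_tau(b_k, beta_k_over_sigma)
print(round(estimate, 3))
```

Because the estimate is a ratio, a small or imprecisely estimated β_{k}/σ_{τ} in the denominator translates into a wide sampling distribution, which is the familiar weak-instrument concern.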
Only one of the covariates X_{i} is a plausible candidate for a direct zero restriction on δ. Some 6.8% of sampled children are reported by the mother to have experienced the death of a friend and the reduced-form coefficients confirm that these events have an impact on SDQ scores (Tables A4–A5 of the online Appendix). Unlike the loss of a parent (which may change the parental time and resources invested in the child's education), or injury or illness experienced by the child him/herself (which may interrupt schooling and study time), it is reasonable to argue that the loss of a friend has no direct impact on the child's education, but only an indirect one through his or her mental state. Two concerns have been raised about the exclusion of this variable: its high prevalence rate, which might indicate response bias; and the possibility of correlation with socially graded unobserved factors such as neighbourhood deprivation. Evidence on the first of these is very sparse, but Fletcher et al. (2013) report an 8% prevalence rate in the USA for the death of a sibling by age 25. Although US child death rates are higher than those in the UK,^{7} the network size suggested by most surveys of children's friendship relations is typically about 5 or 6 (see Conti et al., 2013, for example), which far exceeds the number of children per family with children (slightly under 2 for both the USA and UK). Consequently, the survey prevalence rate of 6.8% is broadly consistent with external evidence. There remains a possibility that child mortality may act as a proxy for unobserved factors, such as neighbourhood deprivation, imparting an upward bias to our estimate of ρ. While this cannot be settled definitively, there is some available evidence. Using data from UK neighbourhood statistics for 2010, we find that a regression of the mortality rate in the 5–14 age group on the official index of multiple deprivation gives an R^{2} of 0.024 for males and 0.010 for females. Using our survey data and regressing the death of a friend variable on observed family characteristics likely to be associated with neighbourhood deprivation (social tenancy, log income and degree-level education) gives an R^{2} ranging from 0.0006 to 0.0085; the overall multiple R^{2} is 0.01. These figures suggest only a modest social gradient in child mortality and thus limited scope for bias.
The estimates produced by imposing this exclusion are presented in Table 6, scaled to correspond to R^{2} levels in the range 0.1–0.4 for the latent mental health equation. Although the standard errors are larger than we would like, so that the estimated impact is not significantly different from zero, it is still possible to reject unambiguously the hypothesis of an 8- to 9-month impact for a one-standard-deviation increase in mental disorder, as suggested by the conventional latent factor analysis.
Table 6. The estimated mental health–education effect: exclusion restrictions
Restriction  General mental health: R^{2}=0.1  R^{2}=0.25  R^{2}=0.4  Hyperactivity: R^{2}=0.1  R^{2}=0.25  R^{2}=0.4 
Loss of friend 
Scaled estimate  –0.129  –0.082  –0.064  –0.137  –0.087  –0.069 
SE  (0.116)  (0.073)  (0.058)  (0.133)  (0.084)  (0.066) 
Age 
Scaled estimate  –0.383***  –0.242***  –0.191***  –0.332***  –0.210***  –0.166*** 
SE  (0.134)  (0.085)  (0.067)  (0.117)  (0.074)  (0.059) 
As an alternative to this direct a priori restriction, we also exploit a restriction on the effect of age which is suggested by the age-referenced nature of our educational attainment variable, A_{i}, derived from teachers’ assessments of the child's educational age. Let e_{i}, a_{i} and X_{i} represent, respectively, the absolute level of the child's achievement, his or her age, and other personal characteristics, and write the age-specific achievement norm used by teachers as T(a), so that the child's educational age reported by the teacher is T^{− 1}(e_{i}). Now make the further assumptions that: (i) teachers use the population average as their norm, so that T(a) = E(e | a); and (ii) achievement is generated by a normal regression structure: . Then our education variable is A_{i} = T^{− 1}(e_{i}) − a_{i} = [e_{i} − θ_{2}E(X_{i} | a_{i})]/θ_{1} − a_{i} and its conditional distribution is
 (11)
This implies that A_{i} is independent of age if the covariates X_{i} are measured from age-specific means, implying an exclusion restriction on the education equation. The strong age gradient in the onset of mental health disorders documented in the psychology literature (Kessler et al., 2007) gives this restriction identifying power. The sample is large enough to permit the removal of age-specific means to be done non-parametrically, rather than modelling the relationship between X and age explicitly.
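Measuring a covariate from its age-specific mean, without modelling its relationship with age, amounts to subtracting the within-age sample mean. The following sketch shows one way this could be done; the function name and the toy data are hypothetical and the survey's actual processing is not reproduced here.

```python
from collections import defaultdict


def demean_by_age(ages, values):
    """Subtract the age-specific sample mean from each value."""
    totals = defaultdict(lambda: [0.0, 0])  # age -> [sum, count]
    for a, x in zip(ages, values):
        totals[a][0] += x
        totals[a][1] += 1
    means = {a: s / n for a, (s, n) in totals.items()}
    return [x - means[a] for a, x in zip(ages, values)]


# Toy data: two children aged 5 and three aged 6 (hypothetical values).
ages = [5, 5, 6, 6, 6]
log_income = [10.0, 12.0, 9.0, 11.0, 13.0]
centred = demean_by_age(ages, log_income)
print(centred)
```

After this step each centred covariate has mean zero within every age group, so it carries no information about age by construction.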
The lower panel of Table 6 gives the results from exploiting the age-referenced nature of the education variable in this way. It shows that the classical measurement error analysis based on the assumption of an unbiased parent, child or teacher observer exaggerates the causal impact of mental health problems on the development of human capital through schooling. While the unbiased observer approach suggests that a one-standard-deviation increase in mental disorder causes on average an 8- to 9-month delay in educational development, the age restriction indicates a much smaller effect of around 2–5 months. Again, there is no evidence of any difference between the impact of general mental health (covering hyperactivity, emotional and conduct disorders), or hyperactivity alone.
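The three R^{2} columns in Table 6 appear to be related by a factor of 1/√R^{2}, as would follow if the explained component of S_{i} is held fixed so that SD(S_{i}) is proportional to 1/√R^{2}. This relationship is inferred from the tabulated numbers rather than stated in the text, so the sketch below should be read as a consistency check under that assumption, with A_{i} again assumed to be in years.

```python
import math


def rescale(estimate: float, r2_from: float, r2_to: float) -> float:
    """Rescale a one-SD effect between assumed R^2 values.

    Assumes SD(S_i) is proportional to 1/sqrt(R^2) for a fixed explained
    component, so the effect scales by sqrt(r2_from / r2_to).
    """
    return estimate * math.sqrt(r2_from / r2_to)


# Table 6, age restriction, general mental health, R^2 = 0.25 column.
AGE_GENERAL_R2_025 = -0.242

at_010 = rescale(AGE_GENERAL_R2_025, 0.25, 0.10)
at_040 = rescale(AGE_GENERAL_R2_025, 0.25, 0.40)

# Convert to months, assuming A_i is measured in years.
months = [round(abs(x) * 12, 1)
          for x in (at_010, AGE_GENERAL_R2_025, at_040)]
print(round(at_010, 3), round(at_040, 3), months)
```

The rescaled values agree with the neighbouring columns of the age row to three decimal places, and the implied deficits fall in the 2 to 5 month range quoted above.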
These estimates based on exclusion restrictions both suggest a considerably smaller impact of mental health on educational progress than would be suggested by methods based on the assumption of an unbiased parent, child or teacher observer of the child's mental state. The age exclusion, in particular, is a natural assumption to make, since it exploits the logical structure of our particular measure of educational attainment to generate an identifying restriction. It is striking that the estimated impact that results is broadly similar to the result obtained using SDQ variables as crude proxy variables (Table 5), while the more sophisticated latent variable model with an unbiased observer produces considerably larger estimates. Of course, there is no necessity for this to be a general result, but it underlines the proposition that, outside the unrealistic world of classical measurement error, the consequences of dealing with partial and error-prone observations can produce results that differ greatly from the simple reversal of attenuation bias.
5 CONCLUSIONS
We have focused on the role of child mental health as an influence on educational attainment, addressing a set of problems related to the measurement of the child's state of mental health. These measurement difficulties generate two distinct identification problems. The first relates to estimation of the relationship between mental health and personal and family characteristics: the strong evidence of bias in the reports given by parents, children and teachers means that the classical conditions for irrelevance of measurement error in a regression dependent variable are not met. We have overcome this by using a unique dataset which includes a detailed psychiatric assessment, together with a theory (essentially rational expectations) of the behaviour of these assessors, to identify a latent mental health model. However, a second identification problem arises when the educational process is introduced, since natural measures of mental health generated from this latent model are collinear with other explanatory covariates used in the education model. We use two alternative exclusion restrictions which can be argued to be valid theoretically and have sufficient empirical power to contribute useful identifying information. One is the experience of the death of a childhood friend, which is hypothesised to influence education only indirectly through its impact on the mental health of the child. The second is an age restriction which flows from the age-referenced nature of our educational attainment measure.
We have found that mental disorders are strongly influenced by family history and background, particularly by the mother's own mental health and education, and also by major adverse life events such as the death of a friend or serious illness or injury. The decision-making by expert assessors, which is the key to these conclusions, places greatest weight on the views of teachers, rather less on those of parents and little weight on the self-assessments by young people themselves. Diagnostic behaviour by psychiatric assessors reflects the configuration of information that is available to them and adjusts for the biases inherent in different types of observer.
The impact of mental disorder on educational attainment is significant and, using our preferred strategy based on exclusion restrictions, appears to be important: a loss of approximately 2–5 months’ educational progress for a one-standard-deviation increase in ‘true’ latent mental disorder. This is closer to the estimate generated by a crude proxy-variable regression which ignores the measurement error problem than to the much larger estimate produced by a multi-indicator latent variable model based on the assumption that at least one of the non-expert observers is unbiased.
On a methodological level, this study exemplifies four important points. First, the measurement error in survey reports of children's mental state is large, not uniform across types of observer (parents, children and teachers) and far from the ‘classical’ measurement error assumptions embodied in standard latent factor models. The biases that result from the sort of measurement difficulty addressed in this paper can be complex and unexpected in structure and direction. Making allowance for this nonstandard form of observation error makes a substantial difference to research findings on issues like the socioeconomic gradient in child mental health.
Second, like many other important research issues in the social sciences, the link between child mental health and educational attainment is beset by identification difficulties, and the preferred strategy of using controlled (or ‘natural’ quasi-)experiments is unavailable because of the nature of the phenomena of interest. Despite this, it has been possible to draw some important conclusions.
Third, this application shows that an attempt to address a measurement error problem inappropriately may make things worse rather than better. In this case, our preferred estimates of the impact of mental disorder on educational progress (which exploit credible a priori restrictions and the specific structure of our measure of educational achievement) are considerably smaller than the range of estimates produced by a conventional latent variable analysis based on the assumption of an unbiased observer—and are much closer to estimates from crude proxy variable regressions. If we are interested primarily in the mental health–education effect, the extra sophistication of the latent variable approach would be detrimental. One cannot, of course, rely on the superiority of naive estimates as a general proposition, but it is important to look carefully at the assumptions underlying more sophisticated approaches.
Finally, we have shown the value of evidence that combines standard survey self-reported information with deeper expert assessments, bringing us closer to the ideal situation where there exists an unbiased observer. The UK Survey of the Mental Health of Children and Young People provides a model for this sort of evidence and its potential is substantial, particularly if the design could be extended to give a longitudinal picture of the evolution of mental health and human capital accumulation over time.