Associations between theory of mind and conduct problems in autistic and nonautistic youth

Many autistic young people exhibit co‐occurring behavior difficulties, characterized by conduct problems and oppositional behavior. However, the causes of these co‐occurring difficulties are not well understood. Impairments in theory of mind (ToM) are often reported in autistic individuals and have been linked to conduct problems in nonautistic individuals. Whether an association between ToM ability and conduct problems exists in autistic populations, whether this association is similar between individuals who are autistic versus nonautistic, and whether these associations are specific to conduct problems (as opposed to other domains of psychopathology) remains unclear. ToM ability was assessed using the Frith–Happé Triangles task in a pooled sample of autistic (N = 128; mean age 14.78 years) and nonautistic youth (N = 50; mean age 15.48 years), along with parent‐rated psychiatric symptoms of conduct problems, hyperactivity/inattention and emotional problems. Analyses tested ToM ability between autistic versus nonautistic participants, and compared associations between ToM performance and conduct problems between the two groups. Where no significant group differences in associations were found, the pooled association between ToM and conduct problems was estimated in the combined sample. Results showed no evidence of moderation in associations by diagnostic status, and an association between poorer ToM ability and higher levels of conduct problems, hyperactivity/inattention and emotional problems across the total sample. However, these associations became nonsignificant when adjusting for verbal IQ. Results provide support for theoretical models of co‐occurring psychopathology in autistic populations, and suggest targets for intervention for conduct problems in autistic youth.


BACKGROUND
Autistic youth are at increased risk of exhibiting co-occurring behavior/conduct problems (Gjevik, Eldevik, Fjaeran-Granum, & Sponheim, 2011; Simonoff et al., 2008), characterized by aggression, temper outbursts, and severe noncompliance (referred to as conduct problems throughout the current manuscript for consistency, although we acknowledge different terms are used in the literature). One approach to understanding the heightened prevalence of co-occurring conduct problems in autistic populations is to focus upon the cognitive profile associated with a diagnosis of autism spectrum disorder (ASD). Traditionally these specific cognitive difficulties have been conceptualized as potential mechanisms that may underpin the core symptoms of social communication difficulties and restricted, repetitive behaviors. However, far less research has considered how these difficulties may also be important in understanding co-occurring psychopathology in autistic individuals. Identifying individual characteristics that predict poor mental health in autistic populations is an important first step in signposting appropriate targets for intervention.
One potentially relevant cognitive risk factor for conduct problems, which is often studied in autistic populations, is difficulty with theory of mind (ToM), characterized by problems understanding the mental states (e.g., beliefs, intentions) of others. While group differences are often present, in that autistic youth show poorer performance on ToM tasks as compared to matched nonautistic youths (see Happé & Conway, 2016 for a review; Salter, Seigal, Claxton, Lawrence, & Skuse, 2008), not all autistic individuals fail tests of ToM (Scheeren, de Rosnay, Koot, & Begeer, 2013). It has been argued that these difficulties in ToM underpin the core symptoms of social impairments (Jones et al., 2018); however, the role of ToM as a risk factor for conduct problems is a relatively understudied area in autistic populations. One study with autistic youth found poorer performance on computerized ToM tasks was associated with higher levels of self-reported aggression (Pouw, Rieffe, Oosterveld, Huskens, & Stockmann, 2013). In contrast to the limited research in autistic populations, more work has been done to examine the link between individual differences in ToM ability and conduct problems in nonautistic (or "typically developing") populations. Early studies found that although children with clinically high levels of conduct problems, as indicated by a diagnosis of conduct disorder, may pass simple tasks, questionnaire measures suggested subtle difficulties in (particularly prosocial) ToM were present (Happé & Frith, 1996). Others report that "difficult to manage" (but otherwise typically developing) 4-year-old children show impairments on ToM false belief tasks, but these differences were largely accounted for by variation in verbal IQ (Hughes, Dunn, & White, 1998). However, Hughes and colleagues have also reported associations between poorer ToM and aggressive behavior problems in typically developing 2-year-old children, even after adjusting for differences in age, sex, verbal ability, social disadvantage, and executive functioning ability (Hughes & Ensor, 2006). Studies of slightly older children (aged 3-7 years) also found those who performed worse on ToM tasks were rated as more aggressive by their teachers (Capage & Watson, 2001). More recently, an association between poorer ToM and higher levels of conduct problems has been found in youth community samples (aged 9-13 years), over and above the effects of age, sex and IQ (Sharp, 2008). Furthermore, poor ToM ability longitudinally predicts increases in teacher-rated physical and relational aggression in children aged 6-11 years, even when adjusting for differences in initial level of verbal ability (Holl, Kirsch, Rohlf, Krahé, & Elsner, 2018).
Some suggest the link between difficulties in ToM and conduct problems in nonautistic populations is due to misinterpretation of social situations leading to interpersonal aggression (Choe, Lane, Grabell, & Olson, 2013). In support of this proposal, aggressive children more frequently attribute hostile intentions to peers in ambiguous social situations (De Castro, Merk, Koops, Veerman, & Bosch, 2005; De Castro, Veerman, Koops, Bosch, & Monshouwer, 2002). It should be noted here that children with depression are also more likely to attribute hostile intentions (Quiggle, Garber, Panak, & Dodge, 1992); however, they can be differentiated by their behavioral response to the perceived negative intent. Another interpretation of the link between poor ToM and conduct problems is that the social signals that modulate ongoing interpersonal behaviors do not inhibit certain behaviors in the same manner in those with poor ToM. Others contest this link, and instead argue that better social cognition is associated with interpersonal aggression (e.g., the ability to bully successfully may rely on intact ToM) (Sutton, Smith, & Swettenham, 1999), although the empirical support for this stance is mixed (Monks, Smith, & Swettenham, 2005). Differences in the ages of samples, the nature of the ToM test (e.g., with or without an affective component, the level of demand on other areas of cognition such as executive function/inhibitory control), and the conceptualization of conduct problems (e.g., proactive vs. reactive aggression) likely contribute to the heterogeneity in findings. This last point is important to note when considering predictors of conduct problems in autistic populations, as proactive and reactive aggression are thought to have separable cognitive correlates (Dodge & Coie, 1987; Marsee & Frick, 2007), and there is evidence to suggest autistic individuals may be more likely to display reactive (as opposed to proactive) conduct problems, compared to other clinical groups (Farmer et al., 2015).

CURRENT STUDY
From the literature briefly reviewed above, there is strong evidence to suggest poor ToM is associated with increased conduct problems in nonautistic (or "typically developing") populations. However, few studies have tested whether a comparable association is found in autistic individuals, despite the high levels of conduct problems in autistic populations. A distinction can be made between explanations of increased conduct problems that posit higher rates/increased prevalence of risk factors (such as poor ToM), versus explanations that suggest the same risk factors more strongly predict conduct problems in autistic as compared to nonautistic groups. With regard to the latter hypothesis, theoretically it may be the case that some risk factors have a more negative effect in autistic individuals because of fewer buffering factors (e.g., due to a decreased social support system; Humphrey & Symes, 2010) or compensatory mechanisms (e.g., due to a diffuse profile of cognitive impairments; Brunsdon et al., 2015). Alternatively, some established risk factors may not be associated with conduct problems at all in autistic individuals (e.g., as with lower IQ; Simonoff et al., 2008; Kanne & Mazurek, 2011). Although the focus of the current study is on child-intrinsic factors and specifically ToM, there are many other risk factors for mental health problems which are thought to be more prevalent in autistic populations (e.g., bullying, Schroeder, Cappadocia, Bebko, Pepler, & Weiss, 2014; difficulties with emotion regulation, Mazefsky et al., 2013).
Thus, in the current study we test for differences in ToM ability between autistic and nonautistic individuals using a task-based assessment of ToM, and then compare the strength of associations between ToM ability and parent-reported conduct problems in both groups, to assess whether ToM difficulties are associated with conduct problems in a comparable way in autistic and nonautistic individuals. To the best of our knowledge, only one study has tested the association between ToM ability and conduct problems in autistic youth (Pouw et al., 2013), and no work thus far has compared the strength of association between ToM and conduct problems in autistic versus nonautistic populations. In addition, conduct problems frequently co-occur with attention-deficit hyperactivity disorder (ADHD) in the general population (Ford, Goodman, & Meltzer, 2003) and with emotional symptoms in autistic populations (Simonoff et al., 2008). Therefore, although our primary focus is the association between ToM ability and conduct problems, we also examine associations with these other domains of psychiatric symptoms (i.e., hyperactivity/inattention and emotional problems) known to be prevalent in autistic populations. Including these two domains as additional outcomes allows us to explore specificity of any associations to the conduct problems domain. Furthermore, given that many studies of ToM in autistic populations have used small to modest sized samples, which may contribute to issues with reproducibility in the field, we combine two well-characterized samples of autistic youth who completed the same measures of psychiatric symptoms and ToM ability to provide sufficient statistical power.

Sample
The full sample consisted of data pooled from the two cohorts described below. More details on sample ascertainment are given in the Supporting information.
QUEST (target cohort born September 2001-September 2004, South East London area of the UK). A total of 277 children were originally assessed as part of the QUEST study (see Salazar et al., 2015 for details), a longitudinal community autistic sample recruited at age 4-8 years, part of the wider IAM Health project (http://iamhealthkcl.net/). These children were split into an "intensively studied" group (hereafter intensive; n = 101) and an "extensively studied" group (hereafter extensive; n = 176). This sampling structure was maintained throughout all waves of data collection. All participating girls were invited into the intensive subsample in order to make sex comparisons possible. Although all participants had a clinical diagnosis of ASD upon entry to the study, the intensive group had their diagnosis confirmed at age 11-15 years (Wave 2 of data collection) with the Autism Diagnostic Observation Schedule-2 (ADOS-2; Lord et al., 2012), and a subset (66/83 cases) also completed the Autism Diagnostic Interview-Revised (ADI-R; Rutter, Le Couteur, & Lord, 2003). All participants were above threshold on one or both instruments. All participating families gave written informed consent and the study was approved by Camden and King's Cross Ethics Sub-Committee (14/LO/2098). The present study uses data from intensive participants at Wave 2 who had an IQ ≥ 50 (to ensure a similar IQ distribution to the SNAP cohort) (n = 44, mean age = 13.49 years, 57% male).
SNAP (target cohort born July 1990-December 1991, South Thames area of the UK). A total of 100 autistic adolescents and 57 nonautistic adolescents who had an IQ ≥ 50 were assessed as part of the Special Needs and Autism Project (SNAP) cohort (see Baird et al., 2006 for details). Participants were recruited at 12 years and followed up in adolescence and young adulthood (Simonoff et al., 2020). Here we report upon data from adolescence (mean age 16 years) (see Charman et al., 2011 for details). Upon entry to the study all individuals in the autistic group received a consensus clinical ASD diagnosis, made using the ADI-R and ADOS-Generic (ADOS-G; Lord et al., 2000). The nonautistic group comprised two subgroups. The first consisted of SNAP participants who did not have ASD but had a range of other primary ICD-10 diagnoses (n = 27; 15 mild intellectual disability; 4 moderate intellectual disability; 3 specific reading/spelling disorder; 2 ADHD; 1 expressive/receptive language disorder; 2 no diagnosis). To these were added nonautistic participants (n = 30) recruited from local mainstream schools. Parent and teacher report confirmed that all children in this subset of the nonautistic group were typically developing; none had a psychiatric diagnosis or a statement of special educational needs, and none were receiving medication. The Social Communication Questionnaire (Rutter, Bailey, & Lord, 2003) was collected from 27 of the 30 adolescents; no individual was above the cut-off for ASD (≥15). Written informed consent was obtained from all parents and, where appropriate, from the participants themselves, if their level of understanding was sufficient. The study was approved by the South East Multi-Centre Research Ethics Committee (05/MRE01/67). The present study uses data from participants at age 16 years (n = 84 in the autistic group, mean age = 15.45 years, 90% male; n = 50 in the nonautistic group, mean age = 15.45 years, 98% male).
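As a quick consistency check on the pooled autistic sample, the cohort figures reported above can be combined with sample-size weights. The sketch below is illustrative only: the dictionary layout and function name are assumptions, and because the inputs are rounded cohort summaries the results are approximate.

```python
# Cohort summaries for the autistic participants, copied from the text above.
# The data structure and names are illustrative, not taken from the study.
cohorts = {
    "QUEST": {"n": 44, "mean_age": 13.49, "prop_male": 0.57},
    "SNAP": {"n": 84, "mean_age": 15.45, "prop_male": 0.90},
}


def pooled(cohort_table, key):
    """Sample-size-weighted average of a per-cohort summary statistic."""
    total = sum(c["n"] for c in cohort_table.values())
    return sum(c["n"] * c[key] for c in cohort_table.values()) / total


total_n = sum(c["n"] for c in cohorts.values())
pooled_age = pooled(cohorts, "mean_age")
pooled_male = pooled(cohorts, "prop_male")

print(total_n)                    # 128
print(round(pooled_age, 2))       # 14.78
print(round(pooled_male * 100))   # 79
```

These values agree with the pooled autistic sample size and mean age given in the abstract and with the percentage of male participants reported for the autistic group in the results.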

Measures
Psychiatric symptoms. The Strengths and Difficulties Questionnaire (SDQ; Goodman, Ford, Simmons, Gatward, & Meltzer, 2000) is a 25-item questionnaire measuring psychiatric symptoms over the last 6 months, comprising subscales of conduct problems, hyperactivity/inattention (ADHD symptoms), emotional problems, peer-relationship problems, and prosocial behavior. Analyses focused upon the parent-rated subscales of conduct problems, hyperactivity/inattention, and emotional problems, each composed of five items that are scored as not true, somewhat true or certainly true. Acceptable internal consistency is found for all three subscales (α > 0.70) in typically developing (Goodman, 2001) and autistic samples (Findon et al., 2016). In the current sample of children who completed the ToM task, internal consistency was acceptable for the hyperactivity/inattention and emotional problems subscales (both α = 0.72), but only moderate for the conduct problems subscale (α = 0.53).
Autistic symptoms. The ADOS (Lord et al., 2000; Lord et al., 2012) is a semi-structured assessment that is considered a gold-standard instrument for assessing current autistic symptoms. A calibrated severity score (CSS) can be calculated, scored 0-10, which takes into account age and language level (Shumway et al., 2012). Although the ADOS-2 was used in QUEST and the ADOS-G was used in SNAP, the ADOS-G raw data used to generate CSS scores are equivalent to those used to create CSS scores from the ADOS-2, meaning that CSS scores from the ADOS-G and ADOS-2 are comparable. The ADOS was not completed with individuals in the nonautistic sample.
Verbal ability. Full scale and verbal IQ were predominantly estimated using the Wechsler Abbreviated Scale of Intelligence (WASI-I in SNAP, WASI-II in QUEST; Wechsler, 1999; Wechsler, 2012) (the Verbal Comprehension Index was used to calculate verbal IQ). The Wechsler Preschool and Primary Scale of Intelligence (WPPSI-IV; Wechsler, 2012) was also used with one participant in the QUEST cohort. As the WPPSI was used out of age range, an age-equivalent was calculated and a ratio full scale IQ/verbal IQ derived [ratio IQ = (age-equivalent/chronological age) × 100] (Terman & Maude, 1960).
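The ratio IQ formula above is simple enough to express directly. The following is a minimal sketch; the function name, the use of months as the unit, and the example values are assumptions made for illustration and do not describe any study participant.

```python
def ratio_iq(age_equivalent_months: float, chronological_age_months: float) -> float:
    """Ratio IQ = (age-equivalent / chronological age) x 100.

    Both ages must be expressed in the same unit (months here, by assumption).
    """
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return age_equivalent_months / chronological_age_months * 100


# Hypothetical example: an age-equivalent of 6 years for a 12-year-old.
example = ratio_iq(age_equivalent_months=72, chronological_age_months=144)
print(example)  # 50.0
```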
ToM ability. Participants completed the Frith-Happé Triangles animation task (Abell, Happé, & Frith, 2000; Castelli, Frith, Happé, & Frith, 2002), which involved viewing six short animations of two cartoon triangles interacting, and then describing what the two triangles were doing. All responses were recorded for later transcription and scoring. The task consisted of four ToM animations depicting complex social interactions (coaxing, mocking, seducing, surprising), and two goal-directed animations, in which the actions of one object show a simple dependency on those of the other (fighting, chasing). Accurate interpretation of the ToM animations requires understanding of complex mental states, whereas accurate interpretation of the goal-directed animations requires inference of dependencies but not complex mental states. Two average scores were calculated: intentionality (degree of mental state attribution: scored from 0 to 5) and appropriateness (degree to which participant correctly identified the intended content: scored from 0 to 2). In both cohorts stimuli were presented on a laptop in the presence of a trained experimenter, the same task administration and coding rules were followed, and the responses were double coded by two independent raters (all scripts in QUEST, 56% of scripts in SNAP). Reliability was high in both cohorts (range of intraclass correlations in QUEST = 0.87-0.98, SNAP = 0.82-0.98; Jones et al., 2011).
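The scoring scheme described above can be sketched as follows. This assumes that the two scores are averaged separately within the ToM and goal-directed conditions, which is our reading of the later references to condition-specific metrics; the ratings shown are hypothetical and the function and variable names are illustrative.

```python
# Animation names follow the task description; the ratings are hypothetical.
TOM_ANIMATIONS = ("coaxing", "mocking", "seducing", "surprising")
GOAL_DIRECTED_ANIMATIONS = ("fighting", "chasing")

# One participant's coded responses: animation -> (intentionality, appropriateness)
ratings = {
    "coaxing": (4, 2),
    "mocking": (3, 1),
    "seducing": (5, 2),
    "surprising": (4, 1),
    "fighting": (2, 2),
    "chasing": (3, 2),
}


def condition_scores(coded, animations):
    """Average intentionality (0-5) and appropriateness (0-2) over a condition."""
    for name in animations:
        intentionality, appropriateness = coded[name]
        if not 0 <= intentionality <= 5 or not 0 <= appropriateness <= 2:
            raise ValueError(f"rating out of range for {name}")
    n = len(animations)
    mean_intentionality = sum(coded[a][0] for a in animations) / n
    mean_appropriateness = sum(coded[a][1] for a in animations) / n
    return mean_intentionality, mean_appropriateness


tom_scores = condition_scores(ratings, TOM_ANIMATIONS)
goal_scores = condition_scores(ratings, GOAL_DIRECTED_ANIMATIONS)
print(tom_scores)   # (4.0, 1.5)
print(goal_scores)  # (2.5, 2.0)
```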

Statistical analysis
All analyses were completed in Stata 14 (StataCorp, 2015). Variables were assessed for normality; the goal-directed appropriateness variable was highly skewed due to ceiling effects, therefore scores of 0-1.5 were collapsed into a score of 0 and scores of 2 were recoded to 1 to give a binary variable. As there was some evidence of heteroscedasticity of residuals for the conduct problems subscale, all analyses were undertaken using robust standard errors (Hayes & Cai, 2007).
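The recoding of goal-directed appropriateness into a binary variable can be written as a one-line rule. The sketch below is in Python rather than the Stata used for the analyses, and the function name is an assumption.

```python
def dichotomize_goal_directed(score: float) -> int:
    """Recode averaged goal-directed appropriateness: 0-1.5 -> 0, 2 -> 1."""
    if not 0 <= score <= 2:
        raise ValueError("appropriateness scores range from 0 to 2")
    return 1 if score == 2 else 0


# Possible averages of two animations that are each scored 0, 1 or 2.
possible_scores = [0, 0.5, 1, 1.5, 2]
recoded = [dichotomize_goal_directed(s) for s in possible_scores]
print(recoded)  # [0, 0, 0, 0, 1]
```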
Combining participants from QUEST and SNAP cohorts into one group. The main aim of the study was to test the association between ToM ability and conduct problems in an appropriately powered sample. Hence, before beginning the main analyses we confirmed the validity of combining the QUEST and SNAP autistic participants into one group (and thereby increasing statistical power). We were aware that they would differ in basic demographic characteristics of sex and age due to differences in how the samples were ascertained (i.e., QUEST deliberately over-sampled females, and QUEST participants were assessed at 10-14 years whereas SNAP participants were assessed at age 16 years). For the current analyses what was key was whether the SNAP and QUEST autistic participants differed in overall ToM task performance, parent-rated psychiatric symptoms or verbal ability. We also report on cohort differences in parental education and employment to give fuller characterization of potential cohort differences. Both were assessed as binary variables (parental employment scored as 0 = neither parent in employment, 1 = one or more parents in employment; parental education scored as 0 = no qualifications up to GCSEs, 1 = A-Levels or higher). These were both compared using the χ² statistic. Cohort differences in all other variables were tested, adjusting for age and sex, using a series of one-way ANOVAs. Logistic regression was used for the binary variable of goal-directed appropriateness. We included age and sex as covariates here as we knew they differed between cohorts, and these were planned covariates in the main analysis; sex due to well-recognized sex differences in the prevalence of both ASD and conduct problems (Baird et al., 2006; Collishaw, Maughan, Goodman, & Pickles, 2004), age due to potential effects on task performance. The majority of comparisons between SNAP and QUEST were nonsignificant (see Table S1 for details), apart from differences in hyperactivity/inattention (SNAP had higher scores; p = 0.03) and goal-directed appropriateness (QUEST scoring lower; p < 0.01). In terms of unadjusted demographic characteristics, SNAP autistic participants also had a significantly higher rate of parental employment (99% vs. 66% in QUEST; p < 0.01), but no differences were found in parental education. However, as neither the goal-directed conditions nor hyperactivity/inattention were our primary predictor or outcome of interest, and given the number of comparisons made between the two cohorts (thereby inflating the likelihood of a chance finding), we did not see these two effects as sufficient evidence to suggest combining the samples was inappropriate.
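For readers unfamiliar with the χ² comparison of two binary variables, the sketch below computes a Pearson χ² statistic for a 2 × 2 table without a continuity correction. Whether the original Stata analysis applied a correction is not stated, so this is an assumption, and the counts are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table.

    Returns the statistic and its p value on 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, the upper tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = two cohorts, columns = binary variable coded 0 / 1.
toy_table = [[20, 30], [10, 40]]
chi2, p_value = chi_square_2x2(toy_table)
print(round(chi2, 2))  # 4.76
```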
Testing associations between ToM ability and psychiatric symptoms. Before running inferential statistics, we compared demographic factors and SDQ scores between the two groups, using a series of one-way ANOVAs. We also confirmed that the autistic group scored lower on the ToM task using the same approach, aside from for the binary variable of goal-directed appropriateness, where the χ² statistic was used. Next, we compared the strength of associations between ToM performance and psychiatric symptoms in the autistic versus nonautistic group. The critical comparison of the associations between the two groups was the significance of the diagnosis-by-task performance interaction term. A significant term indicates that the association between ToM ability and psychiatric symptoms is different in the autistic as compared to nonautistic group. As stated in the aims, our primary outcome of interest was conduct problems; however, associations with hyperactivity/inattention and emotional problems were also examined to assess whether there was any specificity in the type of psychiatric symptoms associated with ToM difficulties. We used a seemingly unrelated regression approach as it allowed us to test the association between multiple predictors and multiple outcomes using standard linear regression, while also allowing for correlations between the outcomes (here the SDQ subscales) (Zellner, 1962) (see Figure 1 for a schematic of the analytic model). Each metric of task performance was tested independently. Metrics of task performance were centered to aid interpretation of effects. Our primary hypotheses concerned performance in the ToM condition; the goal-directed metrics were included to provide a check that findings were specific to understanding of mental state terms, rather than inference in general. Results of these analyses are reported in Table S2; in summary, no significant associations were found between goal-directed abilities and psychiatric symptoms. Sex, age and cohort (SNAP = 0, QUEST = 1) were entered as covariates; the latter to account for any additional unmeasured ascertainment differences. Finally, where the diagnosis-by-task performance interaction term was nonsignificant, the model was rerun excluding the interaction term to estimate the pooled association between task performance and psychiatric symptoms across the two groups. Standardized estimates (β) were calculated for the pooled associations to aid interpretation (where 0.1 = small effect, 0.3 = moderate effect, and 0.5 = large effect; Cohen, 1988).
FIGURE 1. Model testing associations between task performance and psychiatric symptoms
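To make the role of the diagnosis-by-task performance interaction term concrete, the sketch below fits a single-outcome linear model with a centered task score, a group indicator and their product. It is a simplification of the analysis described above: it omits the covariates, the robust standard errors and the cross-outcome correlations of the seemingly unrelated regression (although when every equation contains the same predictors, the point estimates from that approach coincide with equation-by-equation least squares). The data are hypothetical and noise-free, generated from known coefficients so that the fit can be checked, and all names are illustrative.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical task scores for five nonautistic (0) and five autistic (1) youths.
tom_raw = [0.0, 0.5, 1.0, 1.5, 2.0] * 2
autistic = [0] * 5 + [1] * 5

# Center the task score so the group coefficient is evaluated at the mean score.
mean_tom = sum(tom_raw) / len(tom_raw)
tom_centered = [t - mean_tom for t in tom_raw]

# Columns: intercept, centered task score, group, group-by-task interaction.
X = [[1.0, t, g, t * g] for t, g in zip(tom_centered, autistic)]

# Noise-free outcome built from known (made-up) coefficients.
true_coefficients = [2.0, -1.0, 0.5, -0.5]
y = [sum(b * x for b, x in zip(true_coefficients, row)) for row in X]

estimates = ols(X, y)
b_intercept, b_tom, b_group, b_interaction = estimates
print([round(b, 6) for b in estimates])
```

In this toy example the interaction coefficient is nonzero by construction, meaning the slope for the task score differs between groups. In the analysis described above, a nonsignificant interaction term instead led to the model being rerun without it to estimate a single pooled slope.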
As the task required participants to verbally state their response, associations from the pooled sample were additionally adjusted for verbal IQ. We also report results of sensitivity analyses excluding participants with verbal IQ < 80 (Table S3). For completeness, bivariate correlations were calculated between all variables (Table S4).
As a final check, we tested for differences in ToM-psychiatric symptom associations between SNAP and QUEST in the autistic group only (see Table S5 for full results). To give an estimate of the power to detect associations between ToM performance and conduct problems, we conducted post hoc power calculations in the full sample on the two primary paths of interest (ToM intentionality/appropriateness − conduct problems).

Group differences in demographic variables, psychiatric symptoms and ToM task performance
See Table 1 for group means. The nonautistic group had a higher percentage of male participants (98% vs. 79% in the autistic group) and were older than the autistic group (both ps < 0.01), but there were no group differences in full scale or verbal IQ. The autistic and nonautistic group had comparable levels of conduct problems (p = 0.74), but the autistic group had higher levels of emotional problems and hyperactivity/inattention (both ps < 0.01). Confirming the first part of our hypothesis, the autistic group scored lower on the ToM task (all ps < 0.05), apart from in the goal-directed appropriateness metric, where group differences were at a trend level of significance (p = 0.05).

Associations between task performance and psychiatric symptoms
Certain associations remained constant across all analyses and are therefore summarized here to avoid repetition (see Table 2 for details). Sex was not a significant predictor of any outcomes (all ps > 0.10; Lines 5 and 11), and age was     2, Line 8, Columns 9-11).

Cohort differences in task performance-psychiatric symptoms associations
We found two instances of sample-specific effects: the association between ToM appropriateness and emotional problems, and the association between goal-directed appropriateness and emotional problems, where both cohort-by-task performance interaction terms were significant (both p < 0.05) (see Table S5 for full output). There was a significant association between ToM appropriateness and emotional problems in QUEST (b = −1.67, p < 0.01) but only a marginal association in SNAP (b = −0.63, p = 0.07), and a significant association between goal-directed appropriateness and emotional problems in SNAP (b = −1.17, p = 0.02) but no association in QUEST (b = 1.14, p = 0.11).

Post hoc power calculations
The calculations for the two paths of primary interest (ToM intentionality/appropriateness scores − SDQ conduct problems) suggested that the power to detect effects for the ToM intentionality-conduct problems association was questionable (51%), but satisfactory for the ToM appropriateness-conduct problems association (86%) (both calculated for a two-tailed test at the 5% significance level).
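The general logic of a power calculation for a bivariate association can be illustrated with the Fisher z approximation for a correlation coefficient. This is a common textbook approximation and not necessarily the procedure used in Stata for the figures above, so it should not be expected to reproduce the reported 51% and 86%. The function name and example inputs are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def power_correlation(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-tailed power to detect a correlation r with n cases.

    Uses the Fisher z transformation: atanh(r) is treated as approximately
    normal with standard error 1 / sqrt(n - 3).
    """
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must exceed 3")
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    shift = abs(atanh(r)) * sqrt(n - 3)
    # Probability of exceeding the critical value in either tail.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


# Hypothetical inputs: a moderate correlation in a sample of 100.
example_power = power_correlation(r=0.3, n=100)
print(round(example_power, 2))  # 0.86
```

With a true correlation of zero the function returns the significance level itself, which is a convenient sanity check on the formula.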

DISCUSSION
There is strong evidence to suggest difficulties in ToM are associated with increased conduct problems in nonautistic populations (Capage & Watson, 2001; Holl et al., 2018; Hughes & Ensor, 2006; Sharp, 2008). However, few studies have tested whether a comparable association is found in autistic individuals, despite the high levels of conduct problems in autistic populations. The current study sought to not only test the association between experimentally-assessed ToM difficulties and parent-reported conduct problems in autistic youth (thereby minimizing the impact of shared rater variance), but also to compare the strength of associations between ToM ability and conduct problems in autistic versus nonautistic youth, the latter to better understand how established risk factors function in populations of youth with and without a diagnosis of ASD. Analyses found negative associations between ToM performance and conduct problems, hyperactivity/inattention and emotional problems in the full sample, and no evidence of moderation of ToM-behavior associations by diagnosis, as evidenced by the lack of significance of all diagnosis-by-task performance interaction terms. Pooled associations became nonsignificant when verbal IQ was included as a covariate.
Notwithstanding the fact we found associations not only with conduct problems, but also with hyperactivity/inattention and emotional problems, our primary hypothesis that the association between ToM difficulties and conduct problems reported in nonautistic populations would also be found in autistic youth was confirmed. The lack of moderation by diagnosis and the significant unadjusted association with conduct problems suggest that in the current sample, both autistic and nonautistic youths with ToM difficulties had higher levels of conduct problems. This replicates and extends findings reported from nonautistic child and adolescent samples (Capage & Watson, 2001; Holl et al., 2018; Hughes & Ensor, 2006; Sharp, 2008). With regard to the lack of specificity of associations with poorer ToM, we note that there is some existing evidence for links between ToM impairment and emotional symptoms in nonautistic populations (Colonnesi, Nikolić, de Vente, & Bögels, 2017; Hezel & McNally, 2014; Lee, Harkness, Sabbagh, & Jacobson, 2005). The specificity of the association with hyperactivity/inattention remains unclear; this may simply reflect difficulties in paying attention to cognitive tasks or executive functioning impairments in youth with ADHD symptoms rather than evidence for a specific difficulty in ToM (e.g., Mary et al., 2016). We highlight that the statistical model we used accounted for cross-domain correlations; therefore, associations with emotional problems and hyperactivity cannot simply be due to overlap with conduct problems.
Results showed significant associations with performance in the ToM, but not the goal-directed, condition, suggesting the link with psychiatric symptoms is due to difficulties in higher-level understanding of mental state terms, rather than general inference. In nonautistic populations, the link between ToM and conduct problems is proposed to be due to poorer ToM leading to misinterpretation of social cues, which in turn prompts interpersonal aggression (Choe et al., 2013; De Castro et al., 2002; De Castro et al., 2005). Our results could be seen to support this suggestion, as stronger effects were found for the appropriateness score, reflecting the accuracy of the interpretation of the animated cartoons, rather than the intentionality score, which indexes the quality of mental state terms used (regardless of their appropriateness). Tasks specifically designed to measure variation in valence of mental state attributions in ambiguous social situations (e.g., De Castro et al., 2002; De Castro et al., 2005) could test this hypothesis.
In previous work (Hughes & Ensor, 2006; Sharp, 2008), adjusting for verbal ability or IQ did not alter the significance of associations between ToM and conduct problems, whereas in our sample inclusion of verbal IQ as a covariate caused associations to fall to nonsignificant trends or become fully nonsignificant. A similar drop in significance of the ToM-conduct problems association when accounting for verbal ability was also reported in one study of typically developing four-year-old children (Hughes et al., 1998). Inspection of the change in standardized estimates between task performance and the three domains of psychiatric symptoms when verbal IQ was included in the model suggests around one third of the association between ToM appropriateness and conduct problems (β = −0.21 fell to β = −0.14) and around 40% of the association between ToM appropriateness and hyperactivity/inattention (β = −0.19 fell to β = −0.08) was due to variation in verbal IQ. The change in estimates for emotional problems was negligible (β = −0.13 fell to β = −0.12). It should be noted that some have argued that controlling for verbal ability likely removes some "true" ToM effect, given how strongly interrelated ToM and language/verbal ability are in development (Happé, 2015), and that autistic groups appear to rely more strongly on verbal ability to pass ToM tests (Happé, 1995). Disagreement regarding the role of verbal ability between current results and previous literature may be in part because studies solely drawn from nonautistic samples generally have a narrower range of cognitive ability and do not include individuals with IQ < 70 (e.g., in the study by Sharp, 2008, the lower bound on their IQ range is similar to the mean IQ in the current study), but may also be due to differences in the ages of samples (e.g., toddlerhood vs. adolescence). Research in this area would benefit from comprehensive measurement of cognitive functioning across a range of domains combined with multivariate statistical approaches, to test which relevant factors are driving different associations, and if they are domain-specific (e.g., associations are only found with tasks tapping ToM) or general (e.g., Carter Leno et al., 2018).
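As an illustrative aside, the attenuation figures quoted above can be checked with simple arithmetic. The sketch below is not the analysis code used in the study: it assumes attenuation is defined as (unadjusted β − adjusted β) / unadjusted β, and it uses only the rounded standardized estimates reported in the text. The function name `proportion_attenuated` is hypothetical. On these rounded values the ratio is about one third for conduct problems, about 58% for hyperactivity/inattention, and under 10% for emotional problems. The hyperactivity/inattention figure differs from the "around 40%" stated in the text, which may rest on unrounded estimates or a different calculation.

```python
def proportion_attenuated(beta_unadjusted: float, beta_adjusted: float) -> float:
    """Share of the unadjusted standardized estimate that disappears after adjustment.

    Assumed definition (not taken from the study): (b0 - b1) / b0.
    """
    return (beta_unadjusted - beta_adjusted) / beta_unadjusted


# Rounded standardized estimates quoted in the text:
# (before adjusting for verbal IQ, after adjusting for verbal IQ).
estimates = {
    "conduct problems": (-0.21, -0.14),
    "hyperactivity/inattention": (-0.19, -0.08),
    "emotional problems": (-0.13, -0.12),
}

attenuation = {
    name: proportion_attenuated(b0, b1) for name, (b0, b1) in estimates.items()
}

for name, share in attenuation.items():
    print(f"{name}: {share:.0%} of the unadjusted estimate attenuated")
```

Because the inputs are rounded to two decimal places, the resulting percentages are approximate and should be read as a rough guide only.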
The lack of moderation by diagnostic status suggests that the strength of association between ToM ability and the three domains of psychiatric symptoms included at present was comparable in autistic and nonautistic participants (although we note there will be less power in the test of the interaction term as compared to the tests of associations in the pooled sample). Results suggest that certain risk factors (which are associated with psychopathology in nonautistic populations; here ToM difficulties) may function in a similar manner in the whole population (e.g., in both autistic and nonautistic individuals); however, the fact that they are more common in certain subgroups (e.g., autistic individuals) could partially explain the heightened prevalence of psychiatric symptoms in that subgroup. Thus we propose that the increased rates of ToM difficulties in autistic individuals (we found overall ToM performance was poorer in the autistic group, similar to that reported elsewhere; Salter et al., 2008) can be conceptualized not only as a driver of core autism symptoms, but also as a risk factor for additional psychiatric difficulties.
The current paper only tests one established predictor of conduct problems; there are many more risk factors for conduct problems that could also contribute to the increased rates of behavioral difficulties in autistic individuals (e.g., inconsistent parenting strategies; Dretzke et al., 2009; Gardner, 1989; bullying; Singham et al., 2017; Wolke, Woods, Bloomfield, & Karstadt, 2000). Additionally, it remains possible that these and other risk factors for conduct problems function differently in autistic populations, in that they may have a more negative impact or less influence (as appears to be the case for low IQ; Kanne & Mazurek, 2011; Simonoff et al., 2008) compared with other groups. Understanding how established risk factors function in autistic as compared to nonautistic populations is key to understanding whether existing interventions developed for nonautistic individuals should also be offered to autistic people.
The current study has a number of strengths: first, the use of objective metrics specifically designed to measure ToM, which were not confounded by other cognitive domains (e.g., face processing). Second, both cohorts included well-characterized autistic individuals with a wide range of verbal IQ, who had their ASD diagnoses confirmed with "gold-standard" instruments. Most studies of cognitive functioning (in both autistic and nonautistic populations) use samples with a narrow range of verbal/full scale IQ, which is not representative of autistic populations as a whole. Third, utilizing a well-powered sample allowed us to test for a moderating effect of diagnosis upon the association between ToM and psychiatric symptoms, although it should be noted that the nonautistic group was considerably smaller in size and nearly half had intellectual disability. However, as our hypothesis under question was whether autism moderated the association between ToM and conduct problems, what was crucial was that the nonautistic group did not meet diagnostic criteria for ASD. Furthermore, given that autistic individuals are substantially more likely to have intellectual or learning disabilities (Charman, Jones, et al., 2011; Charman, Pickles, et al., 2011), there is still debate over the validity of using a control group with a higher IQ than the autistic group when asking questions about cognitive functioning. We also note that full scale IQ and verbal ability were not significantly different between the autistic and nonautistic participants. Finally, we note that although we had a relatively modest sample size in the nonautistic group, links between poorer ToM and higher levels of conduct problems in nonautistic individuals are well-established by previous literature (e.g., Capage & Watson, 2001; Holl et al., 2018; Hughes & Ensor, 2006), including in studies that have used the same ToM task as used currently (Sharp, 2008).
In terms of limitations, the ToM task employed in the current study, although widely used, requires participants to verbally communicate their responses, so may have underestimated ToM abilities. Although not the key task metric of ToM ability (but rather a "control" condition to check findings were specific to mental state understanding), ceiling effects in the goal-directed appropriateness condition meant a binary variable was used in these analyses. This may have limited our power to detect effects, and thus results here should be interpreted with caution. Additionally, more controlled tasks such as the one included currently could be argued to be lower in ecological validity, as in real life contexts decoding social situations often involves an affective component and multiple competing inputs. Future studies should employ a battery of ToM measures, including those that do not rely on spoken communication, such as eye-tracking paradigms (Senju, Southgate, White, & Frith, 2009), and those with an affective component (Sebastian et al., 2012; Shamay-Tsoory, Tomer, Berger, Goldsher, & Aharon-Peretz, 2005) to test the impact of task content and presentation. Finally, the current analyses present cross-sectional associations in a sample of adolescents; whether similar associations are found at different developmental stages, and the directionality of associations between cognition and behavior, cannot be inferred from the current results.
Theoretically, as none of the diagnosis-by-task performance interaction terms were significant, current results suggest that ToM difficulties are associated with conduct problems in a similar manner in autistic and nonautistic populations, and therefore interventions focused on improving the accuracy of mental state attribution may be beneficial in improving conduct problems in general. Furthermore, as similar associations were found with hyperactivity/inattention and emotional problems, interventions may be beneficial for symptoms beyond those in the domain of conduct/oppositional defiant disorder. However, although meta-analyses report an overall improvement in ToM following targeted intervention in autistic and nonautistic samples (Hofmann et al., 2016), others contend that this is based on studies of low quality, and suggest there is little robust evidence that improvement of the knowledge acquired through training translates to changes in everyday functioning (Fletcher-Watson, McConnell, Manola, & McConachie, 2014). Instead of a sole focus on ToM, appreciation of the heterogeneity of cognitive profiles in autistic individuals, and consideration of the impact of particular individual characteristics in the context of evidence-based interventions may be of merit, although the added value of including a ToM component in existing interventions should be formally tested. Furthermore, the finding that verbal ability may account for a substantial proportion (but not all) of the association between ToM and psychiatric symptoms suggests that interventions should also consider language skills as a target of interest (in addition to specific domains of cognitive functioning). This is in line with previous work highlighting the role of poor communication in conduct problems in both autistic and nonautistic populations (McClintock, Hall, & Oliver, 2003; Moffitt & Silva, 1988). Conducting similar analyses in longitudinal cohorts, followed from early in childhood, along with high-quality randomized controlled trials, will be key to testing developmental hypotheses regarding the impact of impaired or delayed ToM and verbal abilities. Furthermore, as noted in the Introduction, conduct-disordered behavior such as aggression can be partitioned into both reactive and proactive behaviors, which are thought to have independent cognitive correlates (Dodge & Coie, 1987; Marsee & Frick, 2007). Whether a similar differentiation is found in autistic samples has not yet been well explored (although see Pouw et al., 2013). Finally, a comprehensive etiological model of conduct problems in autistic populations should not only consider child-level characteristics such as ToM abilities, but also how these may interact with other established environment-level risk factors (e.g., inconsistent parenting strategies, bullying).

CONCLUSION
The current study found that lower ToM ability was associated with higher levels of conduct problems, but also hyperactivity/inattention and emotional problems, in autistic and nonautistic youth. However, results also suggest that a substantial part of the pooled association between ToM and psychiatric symptoms was driven by individual differences in verbal IQ. Understanding the drivers of conduct problems in autistic individuals, and whether they are comparable to those found in nonautistic individuals, is important for accurate etiological models and will inform selection of appropriate targets for intervention.

TABLE 1. Demographic information, psychiatric symptoms and task performance in autistic and nonautistic groups
a Missing from one participant in the autistic group.