The Medical Research Council-Autism Imaging Multicentre Study (MRC-AIMS) is a UK-based multidisciplinary collaborative project to study brain anatomy and connectivity in male adults with autism spectrum disorder (ASD). As well as neuroimaging, participants completed a series of diagnostic assessments, neuropsychological tests, and questionnaires. The latter measures were selected to ensure that the samples were well described and to provide behavioral correlates for the brain analyses. This paper describes the battery of neuropsychological tests and questionnaires and explores whether cognitive measures can reliably distinguish between ASD and control groups, or ASD subtypes, and how cognitive test performance relates to ASD symptom profile or associated psychiatric symptoms.
Research into cognitive differences in ASD has been driven by three highly influential theories: the “Theory of Mind” account [ToM; Baron-Cohen, Leslie, & Frith, 1985; Lombardo & Baron-Cohen, 2010 ] proposes that people with ASD have a reduced ability to attribute independent mental states to self and others to predict and explain actions; the theory of “executive dysfunction” [Ozonoff, Pennington, & Rogers, 1991; Rumsey & Hamburger, 1988 ] posits that ASD symptoms are a result of impairments in executive functions, including planning, inhibition, flexibility, and working memory; and finally, the “weak central coherence” theory [Frith, 1989; Happé & Frith, 2006 ] suggests that people with ASD have a cognitive style that favors processing of local, detailed information over global, holistic information. These three leading cognitive theories have undergone several modifications, and it is generally accepted that none of the theories can explain all cognitive and behavioral symptoms of ASD, but that each has the capacity to account for a wide range of atypical behaviors common to ASD [Happé & Ronald, 2008 ]. A vast amount of research on cognitive skills in people with ASD has been generated, albeit with conflicting results.
For example, deficits in social cognition have been reported in ToM [Baron-Cohen, O'Riordan, Stone, Jones, & Plaisted, 1999; Baron-Cohen, Wheelwright, Hill, Raste, & Plumb, 2001a; Castelli, Frith, Happé, & Frith, 2002; Happé, 1994 ], and in emotion recognition in high-functioning adults with ASD [Bal et al., 2010; Golan, Baron-Cohen, & Hill, 2006; Golan, Baron-Cohen, Hill, & Rutherford, 2007; Wallace et al., 2011; see Uljarevic & Hamilton, 2013 for a review]. However, other studies have reported that high-functioning adolescents and children with ASD perform as well as controls on ToM tasks [Scheeren, Koot, Mundy, Mous, & Begeer, 2013 ], and some tasks of emotion recognition report that adults with ASD are not necessarily impaired [Adolphs, Sears, & Piven, 2001; Rutherford & Towns, 2008 ]. Executive function deficits, including problems with generating ideas [Boucher, 1988; Low, Goddard, & Melser, 2009 ] and selective inhibitory impairments [Adams & Jarrold, 2012 ], have been reported in ASD. However, it has been suggested that poor performance on such tasks may reflect difficulties understanding experimenters' expectations rather than any specific deficit in executive function [White, 2013 ]. One study tested multiple components of executive function in 30 young ASD adults, and results were variable even within this participant group: they reported impairments in spatial working memory, but no impairments in planning, cognitive flexibility, and inhibition [Sachse et al., 2013 ]. With respect to studies of central coherence in ASD, many studies have reported that children and adults with ASD outperform typical controls on tasks where a local processing bias is advantageous [Bonnel et al., 2010; Jolliffe & Baron-Cohen, 1997; Shah & Frith, 1983, 1993 ], but others have not replicated this [Lai et al., 2012a; White & Saldana, 2011; for a review, see Happé & Frith, 2006 ].
These inconsistencies may partially be explained by small sample sizes or poor task selection [Charman et al., 2011], or by heterogeneity within the autism spectrum [Brock, 2011]. This heterogeneity is reflected both in the variation in type and severity of autistic symptoms and in the differing degrees of comorbid psychopathology within the autistic spectrum. Therefore, if different studies sample differently from this heterogeneous group, conflicting findings might be predicted. In the present study, we aimed to address these limitations, and a number of others, in the following ways.
First, with respect to sample size, the current study includes 178 participants, which is sufficient to detect an effect size as small as Cohen's d = 0.3 at a power of 0.8. With the much smaller samples often used in studies of cognition in ASD, smaller effects may not reach significance and may not be reported.
Second, most studies have tested only a narrow selection of cognitive skills, making cross-task comparison difficult. The present study sampled a wide range of cognitive abilities including emotion recognition, theory of mind, specific executive functions, phonological memory, central coherence, and dexterity. In addition, it may be important to look at skills in combination, since cognitive skills do not operate in isolation (e.g., verbal fluency is dependent on general processing speed [Spek, Schatorjé, Scholte, & van Berckelaer-Onnes, 2009]; and some executive function tasks require a level of theory of mind, such as reflecting on one's own plans and goals; see White, 2013). In addition to the traditional method of group comparisons on individual measures, we therefore used support vector machine (SVM) algorithms, a supervised multivariate classification method that has proved useful, although not perfect, at distinguishing clinical groups using neuroimaging data [Ecker et al., 2010]. In the present study, SVM is applied to traditional neuropsychological data for the first time, and we test whether multivariate pattern information is also useful for distinguishing between the two groups.
Third, most previous studies compare average performance of an ASD group with average performance of a non-ASD group. This approach ignores heterogeneity within the ASD sample, yet it is well established that the condition is a “spectrum” and that presentation varies enormously within the spectrum. Therefore, we examined whether scores on cognitive tests were correlated with overall symptom severity, and with severity of symptoms on separate domains. Some studies have examined this—e.g., deficits in executive function have been associated with social [Happé & Frith, 2006; Ozonoff et al., 1991 ] and nonsocial behaviors in ASD [Hill, 2004 ]. Brunsdon and Happé [2014 ] have recently reviewed the literature on the relationship between symptoms and neuropsychological/cognitive test performance in ASD groups, most of which has concerned children.
Fourth, ASD adults often have high levels of comorbid symptomatology, particularly depression, anxiety, and obsessionality [Joshi et al., 2013; Russell, Mataix-Cols, Anson, & Murphy, 2005 ], yet these factors are generally not considered in studies of cognition in ASD. Reports on cognitive ability in adults with depression, anxiety, and obsessive–compulsive disorder (OCD) are mixed [Castaneda, Tuulio-Henriksson, Marttunen, Suvisaari, & Lönnqvist, 2008 ], although one well-replicated finding is that executive function deficits are evident in individuals with depression [Fossati, Amar, Raoux, Ergis, & Allilaire, 1999; Marazziti, Consoli, Picchetti, Carlini, & Faravelli, 2010; Smith, Muir, & Blackwood, 2006 ], anxiety [Airaksinen, Larsson, Lundberg, & Forsell, 2004 ], and OCD [Cavallaro et al., 2003 ]. In the current study, we anticipated that ASD participants would have elevated levels of depression, anxiety, and obsessionality and examined the relationship between cognitive performance and degree of comorbid psychopathology. Significant associations could be useful in two respects: cognitive tasks could be used to predict the development of psychopathology, or differing levels of comorbid symptomatology could account for variation in cognitive ability.
Finally, we compared cognitive profiles of two ASD diagnostic subtypes: Asperger syndrome (AS) and high-functioning autism (HFA). A diagnostic distinction between these groups has, until now, been made on the basis of the presence of a language delay in HFA individuals and no delay in AS. Differences in linguistic ability have been reported in children with HFA and AS [Noterdaeme, Wriedt, & Hohne, 2010; Sahyoun, Soulieres, Belliveau, Mottron, & Mody, 2009]; however, review papers have concluded that the subtypes cannot be reliably distinguished on the basis of diagnostic criteria and cognitive profile [Macintosh & Dissanayake, 2004; Witwer & Lecavalier, 2008]. In line with this view, the fifth revision of the Diagnostic and Statistical Manual (DSM-5) collapses these diagnostic categories (along with Pervasive Developmental Disorder-Not Otherwise Specified) into a single category of ASD. Nevertheless, confirming whether any differences do exist is of interest because distinct cognitive profiles may be useful for clinical intervention and prognosis.
To summarize, the present study measured cognitive functioning and symptom profiles in a group of 178 male adults of normal intelligence, where half the participants were on the autism spectrum and half were neurotypical. We aimed to assess the utility of cognitive measures to predict diagnostic group membership, and to indicate severity of symptoms of ASD or commonly associated conditions.
Eighty-nine male adults with ASD and eighty-nine matched neurotypical controls aged 18–43 years were recruited and assessed at one of the three AIMS-UK centers: the Institute of Psychiatry, London; the Autism Research Centre, University of Cambridge; and the Autism Research Group, University of Oxford. All participants were right-handed. Approximately equal ratios of cases to controls were recruited at each site: London, 41 ASD and 41 controls; Cambridge, 30 and 32; Oxford, 18 and 16, respectively.
Exclusion criteria for all participants included a history of major psychiatric disorder (with the exception of major depressive or anxiety disorders), head injury, genetic disorder associated with autism (e.g., fragile X syndrome, tuberous sclerosis), or any other medical condition affecting brain function (e.g., epilepsy). All ASD participants were diagnosed with ASD according to ICD-10 research criteria. If a language delay (no use of single words before 24 months, or no phrases before 33 months) was recorded on the Autism Diagnostic Interview-Revised [ADI-R; Lord, Rutter, & Couteur, 1994 ], HFA was diagnosed (N = 34). If no language delay was recorded on the ADI-R, AS was diagnosed (N = 55).
The study was given ethical approval by the National Research Ethics Committee, Suffolk, UK. All volunteers gave written informed consent.
All participants completed a series of background measures assessing symptomatology and intelligence, and a series of neuropsychological tests.
ASD diagnosis was confirmed using the ADI-R, which is a semi-structured interview conducted with parents or caregivers. Participants were permitted to score 1 point below the cutoff on one of the three ADI-R domains in the diagnostic algorithm. The Autism Diagnostic Observation Schedule Generic [ADOS-G; Lord et al., 2000], a semi-structured, standardized observational assessment, was used to assess current symptoms for all participants with ASD.
All participants completed three questionnaires assessing autistic traits (Autism Spectrum Quotient; AQ) [Baron-Cohen, Wheelwright, Skinner, Martin, & Clubley, 2001b ], empathy (Empathy Quotient; EQ) [Baron-Cohen & Wheelwright, 2004 ], and systemizing (Systemizing Quotient; SQ-R) [Wheelwright et al., 2006 ]. These instruments show association with common genetic polymorphisms [Chakrabarti et al., 2009 ] and are widely used both for screening for ASD and for measuring these traits dimensionally in the general population.
Participants also completed three questionnaires measuring symptoms of depression, anxiety, and obsession and compulsion. The Beck Depression Inventory [BDI; Beck, Steer, & Brown, 1996 ] and the Beck Anxiety Inventory [BAI; Beck & Steer, 1990 ] each includes 21 items and gives a maximum score of 63. The Obsessive–Compulsive Inventory-Revised [OCI-R; Abramowitz & Deacon, 2006 ] includes 18 items and gives a maximum score of 72.
The Wechsler Abbreviated Scale of Intelligence [WASI; Wechsler, 1999 ] was used to assess the general cognitive abilities of all participants. The WASI comprises four subtests, two verbal and two performance, and yields three standardized index scores: Verbal IQ (VIQ), Performance IQ (PIQ) and Full Scale IQ (FSIQ).
Tasks were selected to test the core domains considered abnormal in ASD based on the extant literature. Tests tapped emotion processing, theory of mind, language/phonological memory, executive functions, central coherence, and manual dexterity/handedness. For further details of tests, see supplementary materials.
- The Karolinska Directed Emotional Faces (KDEF): Emotion recognition [Lundqvist, Flykt, & Öhman, 1998 ]. Participants indicated which emotion (happy, sad, angry, disgust, fear, surprise, or neutral) was displayed by a color face shown on a computer screen, using a 7-alternative forced choice task [Sucksmith, Allison, Baron-Cohen, Chakrabarti, & Hoekstra, 2013 ]. There were 140 trials. Dependent variables were percentage accuracy and mean reaction time.
- The “Reading the Mind in the Eyes” Task (RMET): Emotion recognition [Baron-Cohen et al., 2001a ]. Participants were shown a black-and-white photograph of eyes and selected which word, from a choice of four, best described what the person in the photograph was thinking or feeling. There were 36 trials. Dependent variables were accuracy (total correct) and mean reaction time.
- The Frith-Happé Animations Test: Theory of mind [Abell, Happe, & Frith, 2000 ]. Participants viewed six silent animations featuring two triangles interacting in such a way as to convey intentions toward the other character's mental state (coaxing, mocking, seducing, surprising; ToM animations) or physical state (leading, fighting; “goal-directed” animations). Participants responded verbally to the question “What happened in the cartoon?” Responses were scored for complexity of mental state terms used (“intentionality”; 0–3) and accuracy of answer given (“appropriateness”; 0–2). Summed scores for intentionality and for appropriateness were the variables calculated for ToM and goal-directed animations.
- Story Test: Theory of mind. Participants were asked to read a story and answer second-order false belief and justification questions. Answers were scored 0 (don't know, incorrect), 1 (partially correct), or 2 (fully and explicitly correct), forming the dependent variable.
- FAS Task: Generativity. Participants were asked to produce orally as many words as possible beginning with a particular letter (F, then A, then S). They had 60 sec per letter. The dependent variable was the number of words produced.
- Nonword Repetition (NWR): Phonological memory [Adapted from Gathercole, Willis, Baddeley, & Emslie, 1994 ]. The participant heard a nonword aloud (e.g., “tirroge”) and then attempted to repeat it immediately. There were 28 trials. The dependent variable was the number of correct responses.
- Go/No Go Test: Attention/inhibition (executive function) [Adapted from Rubia et al., 2001]. Participants were asked to indicate whether each of a series of arrows was pointing left (press “1”), right (press “2”), or upward (no key press). There were 300 trials. Dependent variables were errors of omission (as % of trials), errors of commission (as % of trials), and beta, a summary measure indexed according to signal detection theory [Green & Swets, 1966].
- Embedded Figures Test (EFT): Central coherence [Witkin, Oltman, Raskin, & Karp, 1971 ]. Participants were asked to locate a nonmeaningful geometric figure (target) within a larger complex form. There were 12 items. The dependent variables were total correct and mean time to find the shape per trial (in seconds).
- Purdue Pegboard Test: Manual dexterity [Tiffin & Asher, 1948 ]. In the first three subtests, subjects had 30 seconds to fill holes with pegs with the right hand (right hand) then the left hand (left hand), and finally with both hands (both hands) alternatively. Dependent variables were number of holes filled for each subtest and for the sum of the three subtests. In a fourth subtest, participants assembled a peg, then a washer, then a collar, then another washer, as many times as they could in 60 sec. This last dependent variable was the number of parts correctly assembled.
Before participants attended a testing center, they completed some information (date of birth, ethnicity and level of education, details of any regular medication) on a secure web site. They indicated whether they had ever been diagnosed with any of the following: ASD, attention deficit/hyperactivity disorder (or hyperkinetic disorder), OCD, Tourette's syndrome, language delay, epilepsy, depression, schizophrenia, bipolar disorder, personality disorder, fragile X syndrome, tuberous sclerosis, or general learning disability.
The questionnaires and tasks available for completion before the day of the appointment were (as they were named on the web site, with their usual name in brackets): Your Personality Questionnaire (AQ), Your Feelings Questionnaire (EQ), Your Interests Questionnaire (SQ-R), The Eyes Test (RMET), Go/No Go Test. Participants were reminded that they should complete all the questionnaires and tests by themselves. Participants without access to the internet completed these tasks during their appointment.
On the day of the appointment, the ASD participants first had an ADOS-G module-4 assessment and then completed the WASI. The control participants started the day with the WASI. The remaining tasks (Faces Test (KDEF), Animations Test, Story Test, FAS, non-word repetition (NWR), embedded figures test (EFT), and Purdue pegboard) were completed in a randomized order. While the ASD participants were being assessed, an ADI-R was carried out with a parent.
Method of Analysis
Calculating the empathizing–systemizing discrepancy
E–S discrepancy, referred to as the “D-score” [Goldenfeld, Baron-Cohen, & Wheelwright, 2005; Lai et al., 2012b], was quantified as the difference between standardized EQ and SQ-R scores. The EQ and SQ-R scores were standardized by subtracting the population mean from the raw score and then dividing by the maximum possible score: S = (SQ-R − <SQ-R>)/150 and E = (EQ − <EQ>)/80, where <SQ-R> and <EQ> were the estimated population means (55.6 for SQ-R and 44.3 for EQ) derived from a previous large-scale UK study (N = 1761) [Wheelwright et al., 2006]. The discrepancy between systemizing and empathizing was then quantified as D = (S − E)/2. Larger D-scores indicate a stronger drive to systemize than to empathize; smaller D-scores indicate a stronger drive to empathize than to systemize.
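The D-score calculation described above can be sketched in a few lines. This is a minimal illustration rather than the study's own code: the function name and the example scores are hypothetical, while the population means and maximum scores are those quoted in the text.

```python
# Population means and maximum possible scores as quoted in the text.
SQR_MEAN, SQR_MAX = 55.6, 150
EQ_MEAN, EQ_MAX = 44.3, 80


def d_score(eq_raw: float, sqr_raw: float) -> float:
    """Return the E-S discrepancy D = (S - E) / 2 (hypothetical helper name)."""
    s = (sqr_raw - SQR_MEAN) / SQR_MAX  # standardized systemizing score
    e = (eq_raw - EQ_MEAN) / EQ_MAX     # standardized empathizing score
    return (s - e) / 2


# Hypothetical participant with a low EQ and a high SQ-R score:
# the result is positive, i.e., a stronger drive to systemize.
example = d_score(eq_raw=20, sqr_raw=90)
```

A participant scoring exactly at both population means would obtain D = 0.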
Comparison of participant groups
The ASD and control groups were compared on all neuropsychological measures and questionnaire scores using t-tests. AS and HFA groups were then compared in the same way. Although some measures were not normally distributed, parametric tests were used because the sample sizes in the current study are considered large enough to be robust to deviations from normality [Skovland & Fenstad, 2001]. P-values were Bonferroni adjusted to correct for multiple comparisons; therefore, a P-value of less than 0.002 was considered significant.
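As a rough illustration of the Bonferroni rule, the sketch below divides a family-wise alpha by the number of tests. The text does not state the number of comparisons or the family-wise alpha; a threshold of 0.002 would be consistent with an alpha of 0.05 spread over 25 tests, and both of those figures are assumptions made here for illustration only. The function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each raw p-value that survives correction for len(p_values) tests."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Assumed values, not taken from the paper: alpha = 0.05 and 25 comparisons.
threshold = bonferroni_threshold(0.05, 25)
```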
Classification using support vector machine (SVM)
A linear SVM was used to classify between individuals with ASD and controls, and between AS and HFA participants, on the basis of their task performance on a set of 12 variables. The 12 variables were VIQ, PIQ, and ten dependent variables from the neuropsychological/experimental tasks. Dependent variables for this analysis were selected so as not to be inherently interdependent; hence one variable was selected from each test (with the exception of the ToM animations, where intentionality and appropriateness scores are in principle orthogonal). Variables were chosen based on data distribution (e.g., no floor or ceiling effects) and/or conventional use in the research literature.
Classification using SVM has been described in detail elsewhere [Burges, 1998; Schoelkopf & Smola, 2002]. Briefly, SVM is a supervised multivariate classification method in which input data are assigned to one of two classes (e.g., individuals with ASD and neurotypicals) by identifying a separating hyperplane, or decision boundary. The algorithm is initially trained on a subset of the data to find the hyperplane that best separates the input space according to the class labels (e.g., −1 for cases, +1 for controls). This is achieved by maximizing the margin, i.e., the distance from the hyperplane to the closest data points [Vapnik, 1995]. Once the decision function has been learned from the training set, it can be used to predict the class of a new set of test examples.
Our implementation used LIBSVM software (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) in Matlab, with a linear kernel and the regularization parameter (C) set to its default of 1. Because the raw variables were on different scales, each variable was rescaled to lie between −1 and 1. This reduced the tendency of variables with large values and ranges to swamp the others. Scaling parameters were estimated on the training data within each fold of the cross-validation loop and were then used to transform the test data. We trained and tested the classifier using a leave-two-out cross-validation scheme: on each fold, one individual from each group was left out as a “test” case, and the remaining individuals were used as the training set. To evaluate the performance of the classifier, we used measures of accuracy, sensitivity, and specificity. Sensitivity and specificity are defined as:
sensitivity = TP/(TP + FN)
specificity = TN/(TN + FP)
where TP is the number of true positives (i.e., the number of ASD individuals correctly classified), TN is the number of true negatives (i.e., the number of neurotypical individuals correctly classified as controls), FP is the number of false positives (i.e., the number of controls classified as ASD individuals), and FN is the number of false negatives (i.e., the number of ASD individuals classified as controls). These performance metrics were also tested under conditions where the class labels (e.g., controls or ASD) were completely randomized (i.e., a permutation test with 10,000 permutations) in order to evaluate the probability of obtaining, by chance, specificity and sensitivity values higher than those obtained during the cross-validation procedure.
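The two metrics and the label-permutation test can be sketched as follows. This is an illustrative outline, not the study's Matlab code. The function names are hypothetical, and `score_fn` stands in for a full rerun of the cross-validated classifier on a given labelling. Counting permuted scores that are greater than or equal to the observed one, with the +1 correction, is a common convention assumed here; the text itself speaks of values "higher than" those observed.

```python
import random


def sensitivity_specificity(true_labels, predicted_labels, positive="ASD"):
    """Return (sensitivity, specificity), treating `positive` as the case label."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        elif guess == positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def permutation_p_value(labels, score_fn, n_permutations=10_000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones.

    `score_fn` is a hypothetical callable: in a full analysis it would rerun the
    whole cross-validation with the labels it is given and return one metric.
    """
    rng = random.Random(seed)
    observed = score_fn(labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if score_fn(shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling itself (assumed convention).
    return (at_least_as_good + 1) / (n_permutations + 1)
```

In practice the permutation test would be run once per metric (sensitivity and specificity), with each permutation retraining the classifier from scratch.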
The SVM analysis excludes participants with missing values. The remaining sample size was 58 ASD (35 AS, 23 HFA) and 66 controls.
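The fold-wise scaling and leave-two-out scheme described above can be sketched as follows. This is a simplified Python outline under stated assumptions, not the LIBSVM/Matlab implementation used in the study: the helper names and toy data are hypothetical, the classifier itself is omitted, and pairing the i-th member of each group is an assumed rule, since the text does not say how left-out pairs were chosen when group sizes differ.

```python
def fit_minmax(rows):
    """Estimate per-feature minimum and maximum from training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_minmax(row, mins, maxs):
    """Map features to [-1, 1] using training-set parameters.

    Test values outside the training range can fall outside [-1, 1].
    """
    scaled = []
    for x, lo, hi in zip(row, mins, maxs):
        if hi == lo:
            scaled.append(0.0)  # constant feature carries no information
        else:
            scaled.append(2.0 * (x - lo) / (hi - lo) - 1.0)
    return scaled


def leave_two_out_folds(group_a, group_b):
    """Yield (train_a, train_b, test_a, test_b), leaving out one row per group.

    Assumption: the i-th rows of the two groups are left out together.
    """
    for i in range(min(len(group_a), len(group_b))):
        yield (group_a[:i] + group_a[i + 1:],
               group_b[:i] + group_b[i + 1:],
               group_a[i], group_b[i])


# Toy data (hypothetical): two features per participant.
asd = [[95.0, 12.0], [110.0, 15.0], [102.0, 9.0]]
controls = [[105.0, 18.0], [120.0, 20.0], [98.0, 16.0]]

for train_a, train_b, test_a, test_b in leave_two_out_folds(asd, controls):
    train = train_a + train_b
    mins, maxs = fit_minmax(train)  # parameters come from training data only
    scaled_train = [apply_minmax(r, mins, maxs) for r in train]
    scaled_test = [apply_minmax(r, mins, maxs) for r in (test_a, test_b)]
    # A linear SVM would be fitted on scaled_train and evaluated on scaled_test.
```

Estimating the scaling parameters inside each fold keeps information from the held-out pair from leaking into the training step.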
Associations between cognitive measures and clinical symptoms
A correlation matrix was constructed to investigate associations between ASD symptom measures (ADI-R/ADOS-G/AQ/D-score) and cognitive measures, and between the comorbid symptom measures (BDI, BAI, OCI-R) and cognitive measures, for all the subjects together and for the ASD group and control group separately. Because it was predicted that scores on BDI, BAI, and OCI-R would be associated with executive function, we included all measures of the Go/No Go (attention/inhibition) task in the correlation matrix. Where significant associations were found, analyses of covariance (ANCOVAs) were conducted, using the symptom measure as a covariate, to establish whether differences on cognitive measures between cases and controls remained significant.
The cognitive function of adults on the autistic spectrum has undergone extensive investigation, but the results of previous studies have been inconsistent. This may be partially due to methodology (small sample sizes, poor task selection), or due to heterogeneity within the autistic spectrum. In the current study, we attempted to address some of the limitations of previous studies and tested a large sample of male adults on a range of cognitive tasks. Half of the participants were on the autistic spectrum and half were not. We had four aims: first, to determine whether reliable group differences existed on performance on individual cognitive tasks or on a combination of tasks between cases and controls, and therefore whether these might be useful in categorizing individuals; second, to establish whether performance on the tasks was correlated with degree of autistic symptom severity within diagnostic groups, and third, with degree of comorbid psychopathology. Last, we examined whether cognitive profile distinguished putative subgroups within the autism spectrum.
The use of multiple neuropsychological tasks was justified since the correlation matrix demonstrated that different tasks were associated with one another to varying degrees, thus likely tapping different cognitive domains. Our results suggest that some of these tasks distinguished an ASD group from a neurotypical group (with comparable IQ). The control group significantly outperformed the ASD group on tasks tapping social cognition (KDEF, Eyes Test, ToM animations), executive function (Go No-Go task), and motor performance (pegboard), even when Performance IQ, which differed significantly between ASD and control groups, was partialled out. These highly significant results suggest that there are certain cognitive deficits that are characteristic of male adults on the autistic spectrum. However, there was no clear deficit on tasks tapping generativity (FAS task), phonological memory (nonword repetition), or central coherence (EFT task).
We also investigated whether a cognitive profile across a combination of tasks could distinguish between individuals in the ASD and control groups. ASD is a complex and heterogeneous condition; therefore, it is unlikely that any single model will classify cases and controls 100% accurately when compared to the outcome of gold-standard diagnostic measures (i.e., ADI-R and ADOS-G). Nevertheless, results of the SVM analysis indicated that participants could be accurately classified as ASD or control at a level that was much better than chance (78% sensitivity and 85% specificity). SVM applied to magnetic resonance imaging (MRI) data from a 30-min structural scan in the same sample allowed group distinction to be achieved at a similar rate [90% sensitivity and 80% specificity; Ecker et al., 2010]. We suggest that using data sets from multiple models in conjunction (e.g., cognitive and MRI data), where each performs significantly better than chance, could provide valuable objective tools to aid the diagnostic process. This needs to be tested in “real-world” clinical situations where the comparison groups include people with other neurodevelopmental disorders, and/or those with complex personality structures seeking diagnosis relatively late in life.
Regarding our second aim, nonsignificant correlations indicated that ASD symptom severity was unrelated to the cognitive factors examined here. This highlights how variable the autistic spectrum can be, since an individual's symptom severity does not predict their skill level on any particular cognitive domain, and likewise, cognitive skill level is not indicative of ASD symptom severity. This supports the idea that underlying neuropsychological mechanisms may be the same even when clinical presentation is different, which has implications for clinical practice and genetic research.
As expected, the ASD group had elevated severity of comorbid psychopathology. However, the predicted association between measures of executive function and degree of depression, anxiety, or obsessionality was not significant. There were moderate associations indicating that increased depressive symptomatology was associated with poorer ToM, which is in line with previous reports of poor ToM in adults with depression [Fossati et al., 1999; Smith et al., 2006 ]. We suspected that elevated psychopathological symptoms in the ASD group might account for the poorer performance on the cognitive tasks when compared to the controls. However, the results did not support this, suggesting that the deficit in ToM was a factor of being on the autistic spectrum, not a factor of having comorbid symptoms of depression.
With respect to ASD diagnostic subtypes, there were no significant differences between AS and HFA groups on individual cognitive measures, and the multivariate SVM technique did not classify groups any better than chance. In terms of autistic symptom severity, we did find that the HFA group exhibited greater symptom severity than the AS group in childhood (as measured by the ADI-R), but that symptom severity had leveled out by adulthood (as measured by the ADOS-G). This suggests that a language delay, which distinguishes HFA from AS, is associated with greater severity and a wider range of autistic symptoms in childhood, but that these differences do not persist into adulthood with respect to behavioral or cognitive profiles. On balance, therefore, our data are consistent with the idea of collapsing subtypes within the autism spectrum in DSM-5.