Further developing the Frith–Happé animations: A quicker, more objective, and web‐based test of theory of mind for autistic and neurotypical adults

The Frith–Happé Animations Test, depicting interactions between triangles, is widely used to measure theory of mind (ToM) ability in autism spectrum disorder (ASD). This test began with recording, transcribing, and subjectively scoring participants' verbal descriptions, which consistently found ToM‐specific difficulties in ASD. More recently in 2011, White et al. created a more objective version of this ToM test using multiple‐choice questions. However, there has been surprisingly little uptake of this test, hence it is currently unclear if White et al.'s findings replicate. Further, the lack of an online version of the test may be hampering its use in large‐scale studies and outside of research settings. Addressing these issues, we report the development of a web‐based version of the Frith–Happé Animations Test for autistic and neurotypical adults. An online version of the test was developed in a large general population sample (study 1; N = 285) and online data were compared with those collected in a lab‐based setting (study 2; N = 339). The new online test was then administered to adults with a clinical diagnosis of ASD and matched neurotypical controls (study 3; N = 231). Results demonstrated that the test could successfully be administered online to autistic adults, who showed ToM difficulties compared to neurotypical adults, replicating White et al.'s findings. Overall, we have developed a quicker, more objective, and web‐based version of the Frith–Happé Animations Test that will be useful for social cognition research within and beyond the field of autism, with potential utility for clinical settings.


INTRODUCTION
There is considerable evidence that atypical 'theory of mind' (ToM), the ability to infer other people's mental states (Happé, 2015), is a cognitive feature of autism spectrum disorder (ASD; e.g., Cantio et al., 2018). A variety of tasks have been developed to measure ToM ability, which have provided evidence for ToM difficulties in autistic children and adults. Initially, ToM was measured in children using the classic false-belief task (e.g., Baron-Cohen et al., 1985), on which autistic children tend to show difficulty in representing a belief that does not correspond to their own view of the world. Following this, many more advanced ToM measures were developed in which participants are required to infer mental states of others from verbal vignettes (e.g., Happé, 1994), pictures of the eye region (e.g., Baron-Cohen, Wheelwright, Hill, et al., 2001) or video-clips of characters interacting (e.g., Dziobek et al., 2006; Murray et al., 2017). There is, however, growing awareness of the limitations of current ToM tasks in autism research, particularly for measuring ToM in adults. First, there are claims of poor validity and suboptimal and subjective scoring, which might be compounded by other cognitive (e.g., verbal, emotional) differences in ASD (e.g., Livingston, Carr, & Shah, 2019; Olderbak et al., 2019). Second, some tasks also produce ceiling effects when administered to neurotypical and autistic adults, and therefore do not capture sufficient variance in task performance (e.g., Happé, 1994). Finally, there are also practical issues with more ecologically valid ToM measures, which are lengthy to administer (e.g., Movie for the Assessment of Social Cognition takes ~40 min; Dziobek et al., 2006; Shah et al., 2017) and require a trained experimenter, limiting their use outside of research settings and in large-scale population-based studies. Together, this has led to suggestions that we should be moving towards abbreviated tasks, involving multiple-choice and automated scoring systems, which can be administered online and/or in clinical settings (Livingston, Carr, & Shah, 2019).
The present study therefore aims to develop a web-based version of a quick, objective test of ToM, the Frith-Happé Animations Test, as adapted by White et al. (2011). The Frith-Happé Animations Test consists of two triangles interacting in one of three ways: drifting or bouncing like objects (Random condition), responding to each other's behaviour (goal-directed; GD), or responding to each other's mental states (ToM). The original version (Abell et al., 2000; see also Castelli et al., 2000), widely used in autism research (e.g., Livingston et al., 2019), involves recording, transcribing, and subjectively scoring participants' verbal descriptions of the animations. White et al. (2011) adapted the task to be more objective by using multiple-choice questions, whereby participants select whether each animation depicts 'no interaction' (Random), 'physical interaction' (GD), or 'mental interaction' (ToM). In line with previous findings of atypical ToM in ASD, 16 autistic adults had greater difficulty than 15 neurotypical participants with accurately processing the ToM, but not the Random or GD, animations (White et al., 2011). It was suggested that the objective method was as sensitive as the traditional subjective method in demonstrating well-established ToM difficulties in ASD, making the multiple-choice animations test a more useful research tool.
Despite this progress, we note some areas of White et al.'s (2011) method that could be further developed. First, they did not examine the associations between objective and subjective scores, which we aimed to address to further validate the objective test. Second, there has been little uptake of their test. The objective task is potentially less sensitive to individual differences in ToM, such that researchers may have failed to detect and publish associations between autism and task performance. Therefore, we aimed to replicate White et al.'s (2011) results in larger, more heterogeneous samples. Third, a web-based version of the test could be more efficient. Given the 'replication crisis' in many areas of science, including clinical psychology (Tackett et al., 2017), this would enable collection of larger and more diverse datasets (for example, from autistic people who cannot attend labs) and reduce experimenter time. There is an increasing number of web-based platforms that facilitate programming of complex cognitive tasks for online data collection (see Anwyl-Irvine, Massonnié, et al., 2020, for an overview). However, it is currently unclear whether web-based (social) cognitive tasks are feasible and if they perform similarly online to in the lab, as very few studies directly compare online and lab performance (although see Germine et al., 2012). Therefore, we aimed to develop a web-based version of the Frith-Happé Animations Test and, critically, compare performance from online and lab participants. Finally, partly because of the aforementioned limitations, the test is rarely used outside research settings (e.g., clinics), which might become possible through development of a more accessible, online test. Such a task may also be useful for clinicians, where time is limited and a short, objective measure (potentially completed at home or in the clinic) would be advantageous. Overall, there is a need for a quicker, more objective, web-based version of the Frith-Happé Animations Test. Given the rapid increase in online and large-scale research, particularly in the era of COVID-19, this could prove to be a timely and useful task for rapidly measuring ToM in autistic and neurotypical adults outside of the lab. Across three studies, we aimed to develop such a task, and in the process, conduct a fresh empirical test of ToM in ASD.
Informed consent was obtained online from all participants, and all procedures were in line with the local ethics committees, British Psychological Society guidelines, and the 1964 Helsinki declaration and its amendments. Participants were free to withdraw from the study at any time. The study was accessed remotely via a web browser, starting with a definition of each type of animation. Following three practice trials with feedback (one of each animation type), 12 experimental trials were presented in a pseudo-randomised order. Each trial began with the animation auto-playing centrally onscreen (384 × 288 px). Whilst viewing the animations, participants were required to select if the animation depicted no interaction (Random), physical interaction (GD), or mental interaction (ToM) between the triangles. To prompt intuitive responding, they were instructed to respond as quickly and accurately as possible using on-screen buttons (via mouse press) located below the animation. Only the first response was accepted, with no feedback. The participants viewed the entire animation and then were required to provide a free-text response (via keyboard) to 'what happened in the animation?' before the next trial. Trials were interleaved with a 100 ms fixation cross. The order in which participants completed the AQ and the Animations Test was randomised. All participants completed the test via a web browser on their own computer, rather than a mobile phone or tablet. Recent research has suggested that Gorilla is validated for the selection of stimuli via mouse press and that there are minimal influences of browser, device type or operating system on remotely collected data that are not time-sensitive (Anwyl-Irvine, Dalmaijer, et al., 2020).
Following White et al. (2011), participants could score a maximum of 12 (4 for each animation type) for objective scores, which were converted into percentage accuracy (Table 1). The free-text descriptions of the animations were reliably scored for the correct inference by three coders (Krippendorff's α = 0.89) in line with Castelli et al.'s (2000) 'appropriateness' score. This generated subjective scores between 0 and 8 for each of the animation types, with higher scores indicating greater accuracy (for ToM animations, this means more accurate inference of the triangles' mental states).
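As an illustration of the objective scoring described above, the following is a minimal sketch rather than the scoring code used in the study. It assumes each trial is stored as a pair of labels (the animation's true condition and the participant's first response); the function name `percentage_accuracy` and the example responses are hypothetical.

```python
from collections import defaultdict

# Condition labels as used in the multiple-choice test.
CONDITIONS = ("Random", "GD", "ToM")


def percentage_accuracy(trials):
    """Return percentage accuracy per condition.

    trials: iterable of (true_condition, chosen_condition) pairs,
    one pair per experimental trial (hypothetical data layout).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_cond, chosen in trials:
        total[true_cond] += 1
        if chosen == true_cond:
            correct[true_cond] += 1
    # Convert raw scores (maximum 4 per condition) to percentages.
    return {c: 100.0 * correct[c] / total[c] for c in CONDITIONS if total[c]}


# Made-up participant: 12 trials, 4 per animation type.
example_trials = (
    [("Random", "Random")] * 4
    + [("GD", "GD")] * 3 + [("GD", "ToM")]
    + [("ToM", "ToM")] * 2 + [("ToM", "GD")] * 2
)
scores = percentage_accuracy(example_trials)
```

With four trials per condition, each correct response contributes 25 percentage points, so the made-up participant above would score lower on ToM than on Random animations.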
This pattern of results was in line with previous reports of specific autism-related difficulties in the ToM condition (e.g., Livingston, Carr, & Shah, 2019), providing convergent and divergent validity for our online test. More generally, our findings indicated that, unlike many other ToM tasks, the test is sensitive to individual differences in neurotypical individuals. Therefore, in appropriately large samples, as made possible using the internet, the task may be useful to quantify ToM in the general population.

STUDY 2
Although the results from study 1 suggested that the online version of the task was comparable to previous lab-based studies, there are concerns with the administration of psychological measures online. Whilst some suggest poorer validity (Ramsey et al., 2016), others have found cognitive tasks operate similarly when administered in the lab and online (Germine et al., 2012). To explore this issue, we compared online data from study 1 to lab-based data.

Methods
In addition to study 1's participants, 54 participants (aged 18-41 years, M age = 24.85, SD age = 4.96; 39 females) formed a convenience sample recruited using a local participant database. These participants undertook the same computerised procedure as study 1, but in a dimly lit, soundproofed laboratory, following experimenter instructions. Lab-based participants were, as expected, younger than study 1 participants, t(191.34) = 3.19, p = 0.002, d = 0.34, given that online data collection allows for more diverse samples (e.g., in age; Anwyl-Irvine, Massonnié, et al., 2020) and the local participant database we recruited from contained university students.
Footnote 1: The subjective scores only served to validate the objective measure and are not reported hereafter.
The data were scored following study 1 procedures for objective scoring; that is, participants could score a maximum of 12 (4 for each animation type), which was then converted into percentage accuracy.

Results and discussion
To assess whether online and lab groups differed on the three different conditions, we conducted pre-planned t test analyses. Given the group difference in age, we explored differences in task performance between the two samples with and without controlling for age. Lab-based participants were marginally more accurate in the GD condition than online participants (t[89.38] = −2.83, p = 0.006, d = 0.38), but there were small and nonsignificant differences in the Random condition (t[337] = −0.26, p = 0.80, d = 0.04) and critical ToM condition (t[337] = 0.83, p = 0.41, d = 0.12; Figure 1). This pattern of results held while controlling for participant age (ToM: F(1, 336) = 0.96, p = 0.33, ηp² = 0.003; Random: F(1, 336) = 0.00, p = 0.99, ηp² < 0.01). The small-to-medium difference in the GD condition remained when controlling for age (F(1, 336) = 5.65, p = 0.018, ηp² = 0.017). It is unclear why this was the case but, importantly, there were no group differences on the Random and critical ToM conditions, thus suggesting that the web-based version of the task overall operates similarly to its use in the lab.
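The group comparisons above can be sketched as follows. The fractional degrees of freedom reported for the GD condition are consistent with an unequal-variance (Welch-type) correction, although the exact procedure is not named here, so this is an assumption. The helper names and the accuracy values are hypothetical and purely illustrative; p values are omitted because they require the t distribution, which is not in the Python standard library.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean, group a
    vb = variance(b) / len(b)  # squared standard error of mean, group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(
        ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    )
    return (mean(a) - mean(b)) / pooled_sd


# Made-up percentage accuracy scores for two small groups.
online_gd = [50, 75, 75, 100]
lab_gd = [75, 100, 100, 100, 75]

t_stat, df = welch_t(online_gd, lab_gd)
d = cohens_d(online_gd, lab_gd)
```

In this sketch a negative t indicates that the first group (online) scored lower than the second (lab), which matches the sign convention of the GD result reported above.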

STUDY 3
Having developed the web-based Frith-Happé Animations Test in non-clinical samples, we administered the task to autistic adults and matched controls. Although the internet is widely used for questionnaire-based autism research, there is a paucity of knowledge about measuring (social) cognition in this way. Indeed, the current study reports one of the first social cognitive tasks administered to autistic people online (see also Russo-Ponsaran et al., 2019), therefore representing a methodological development of general interest. In line with White et al. (2011), it was predicted that, compared to neurotypical controls, autistic adults would show difficulties in the ToM, but not the GD or Random, condition.

Methods
Seventy-one participants (36 females) aged 18-67 with a formal autism diagnosis were recruited and compared with 160 participants (80 females) aged 18-80 from study 1, selected to ensure that the groups were closely matched in age, sex, and general mental ability (see Table 2 for group characteristics). Neurotypical participants from study 1 were randomly selected until the groups were matched. General mental ability was estimated using the Spot the Word Task (Baddeley et al., 1993), which has previously demonstrated convergent validity with the Wechsler Adult Intelligence Scale (Yuspeh & Vanderploeg, 2000). In this task, participants view 60 pairs of words comprising a real word (e.g., albatross) and non-word (e.g., zando) and are required to identify the real word. Task performance was measured as percentage accuracy. The procedure and objective data scoring were otherwise identical to study 1. Participants accessed the study via Gorilla and gave informed consent online, and each participant had a percentage accuracy score for each animation type.
We conducted multiple linear regressions to assess the unique contribution of ASD group status to ToM, GD and Random performance, whilst accounting for performance on the other two conditions. These analyses showed that the significant relationship between ASD group and ToM remained even after accounting for GD and Random performance (see Table 3). Further, although our groups were matched on age, sex and general mental ability, because these variables have previously been shown to be associated with ToM ability, we re-conducted the multiple regression analyses with them as additional predictors. We found the same pattern of results. Overall, these regression analyses, not previously undertaken by White et al. (2011), more robustly showed the specificity of ToM difficulties in ASD.
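To make the structure of these regression models concrete, the sketch below fits one such model (ToM accuracy predicted by group, GD accuracy and Random accuracy) by ordinary least squares. It is an illustration under stated assumptions, not the analysis code used in the study: the group coding (0 = neurotypical, 1 = ASD), the function name `ols`, and all data values are hypothetical, and the outcome is generated from known coefficients only so that the fit can be checked.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X: list of predictor rows (without intercept); y: list of outcomes.
    Returns [intercept, b1, b2, ...] (unstandardised coefficients).
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # add intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Made-up predictors: (group, GD accuracy, Random accuracy).
X = [
    (0, 100, 100), (0, 75, 100), (0, 100, 75),
    (1, 75, 75), (1, 100, 50), (1, 50, 100),
]
# Made-up ToM accuracy generated from known coefficients (no noise).
y = [10 - 5 * g + 0.5 * gd + 0.25 * rnd for g, gd, rnd in X]

coefs = ols(X, y)  # [intercept, group, GD, Random]
```

In a model of this form, the coefficient on group estimates the difference in ToM accuracy between groups after holding GD and Random accuracy constant, which is the sense in which group is a 'unique' predictor.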

GENERAL DISCUSSION
Across three studies, we found that our web-based version of the Frith-Happé Animations Test operates similarly online and in the lab, in both autistic and neurotypical adults. Additionally, we found the expected ToM difficulties in autistic compared to neurotypical adults using online administration. Our findings therefore replicate and extend White et al.'s (2011) finding that the objective version of this popular ToM test is comparable to the traditional version. Enabled by a large sample of the general population, and not directly tested by White et al. (2011), we established that objective and subjective scores collected online were significantly correlated. And importantly, we showed that higher autistic traits were specifically and more strongly linked with online performance on ToM, but not GD or Random, animations. Further, in line with White et al. (2011), we found significant differences between autistic and neurotypical people, but only in the ToM condition. This adds weight to ToM theories of autism and indicates that our online test is sufficiently sensitive to detect atypical ToM in intellectually able autistic, as well as neurotypical, adults. This is important as many other ToM tasks appear to be solved by autistic people using compensatory strategies (Livingston et al., 2021; Livingston & Happé, 2017) and/or yield ceiling effects for neurotypical adults. Therefore, we suggest that this test has important utility for future research on ToM in autistic and neurotypical adults. For example, moving forward, the test can now be used to investigate important relationships between ToM and other psychological (e.g., mental health) and social-cognitive (e.g., empathy) constructs within and beyond the field of autism, in large samples and with remote data collection.
TABLE 2 [table body not recovered; surviving fragment of final row: (1) = 0.01, p = 0.92, Φ = 0.01] Note: General mental ability was estimated using percentage accuracy on the Spot the Word Task (Baddeley et al., 1993). The AQ (autism-spectrum quotient; Baron-Cohen, Wheelwright, Skinner, et al., 2001) measured self-reported autistic traits (maximum score = 50) and has a clinical cut-off of 32+. Effect sizes are reported as Cohen's d for t tests and Phi (Φ) for chi-squared tests. Significant group differences are shown in bold font. Abbreviation: ASD, autism spectrum disorder.
FIGURE 2 Frith-Happé Animations Test: comparing neurotypical and ASD groups. ASD, autism spectrum disorder; GD, goal-directed; ToM, theory of mind. Error bars show ±1 SEM.
Our findings also support suggestions that (social) cognitive research is possible using the internet. Like Germine et al. (2012), we found that participants performed similarly online to in the lab. This finding mitigates concerns about online cognitive research, such as task performance being affected by distractions and/or the lack of experimenter oversight. More generally, this test is one of the first social cognitive tasks to be successfully and specifically developed for online use in both typical and autistic adults. This development of an objective, quick, online test of ToM will enable its future inclusion in large-scale studies that have traditionally been unable to incorporate lengthy social cognitive tasks (e.g., longitudinal studies, including behavioural genetic studies). This will enable statistically powerful investigations of ToM and its inter-relationships, including genetic correlations, with other psychological constructs and phenotypes across the lifespan. More broadly, this study highlights the opportunities of moving more cognitive autism research online to include 'hard-to-reach' autistic individuals, who may be unable to attend labs, thereby making research more representative of the population (although we note the need to develop ToM tests accessible to autistic people with language/intellectual impairment). Finally, the test can also now be adopted in clinical research to begin assessing its clinical utility (see also, Livingston, Carr, & Shah, 2019). For example, this objective test, which can feasibly be administered prior to a time-limited clinical session, may be useful for clinicians to aid understanding of autistic people's ToM abilities and thereby inform and tailor support, although this needs robust investigation.
Our findings should be considered in light of some limitations. First, we note that across the studies, although we did not formally test this, mean values suggest that neurotypical participants performed better on the Random compared to ToM animations. This differs from White et al. (2011) who found equivalent performance for neurotypical participants on these two animation types. However, we also note that Brewer et al. (2017) found a similar pattern of results to ours when using the task in a much larger lab-based study. Therefore, it is possible that the ToM animations are genuinely more difficult to solve than the Random animations, which is understandable given the increased complexity of the ToM animations, but that this was not revealed in White et al.'s (2011) small sample. Overall, the critical distinction to make may be between the GD and ToM conditions, given they are more closely matched on complexity and kinematics. Second, although our autistic and neurotypical participants were matched on general mental ability using an online task, future research should aim to replicate our findings using more in-depth measures of IQ. Finally, whilst the current research validated the web-based version of the task in autistic and neurotypical participants, we were not able to test whether performance on the task predicts performance on other ToM tasks, self-report measures of ToM (e.g., Clutterbuck et al., 2021), or everyday social abilities/differences. Future research should aim to investigate, for example, if autistic participants' ToM performance indexes performance on a range of other ToM tasks, as well as autistic behaviour (e.g., using the Autism Diagnostic Observation Schedule; Lord et al., 2000) and social difficulties in the real world.
TABLE 3 Multiple linear regression: Group as a unique predictor of (1) theory of mind (ToM), (2) goal-directed (GD), and (3) Random task performance in study 3. [table body not recovered] Note: All VIF values were <10, suggesting multicollinearity was not a concern. The residuals were normally distributed and there was no evidence of heteroscedasticity. Durbin-Watson values were all approximately 2, suggesting errors were independent. This pattern of results held when including age, sex, and general mental ability as additional predictors in all three regression models; these models are not reported as the autistic and non-autistic groups were already matched on these variables. Abbreviations: β, standardised regression coefficient; ASD, autism spectrum disorder; B, unstandardised regression coefficient; GD, goal-directed; ToM, theory of mind.
To conclude, we have developed a new web-based version of the Frith-Happé Animations Test using White et al.'s (2011) multiple-choice version. It performs just as well online as in the lab and shows sensitivity to the measurement of individual differences in ToM in both autistic and neurotypical adults. There is promise for this web-based test, which offers a fast and straightforward measure of ToM in autistic and neurotypical adults, to be used in future research and clinical work.

FIGURE 1 Frith-Happé Animations Test: comparing web and lab performance. ToM, theory of mind; GD, goal-directed. Error bars show ±1 SEM.