Age, not autism, influences multisensory integration of speech stimuli among adults in a McGurk/MacDonald paradigm

Differences between autistic and non‐autistic individuals in perception of the temporal relationships between sights and sounds are theorized to underlie difficulties in integrating relevant sensory information. These, in turn, are thought to contribute to problems with speech perception and higher level social behaviour. However, the literature establishing this connection often involves limited sample sizes and focuses almost entirely on children. To determine whether these differences persist into adulthood, we compared 496 autistic and 373 non‐autistic adults (aged 17 to 75 years). Participants completed an online version of the McGurk/MacDonald paradigm, a multisensory illusion indicative of the ability to integrate audiovisual speech stimuli. Audiovisual asynchrony was manipulated, and participants responded both to the syllable they perceived (revealing their susceptibility to the illusion) and to whether or not the audio and video were synchronized (allowing insight into temporal processing). In contrast with prior research with smaller, younger samples, we detected no evidence of impaired temporal or multisensory processing in autistic adults. Instead, we found that in both groups, multisensory integration correlated strongly with age. This contradicts prior presumptions that differences in multisensory perception persist and even increase in magnitude over the lifespan of autistic individuals. It also suggests that the compensatory role multisensory integration may play as the individual senses decline with age is intact. These findings challenge existing theories and provide an optimistic perspective on autistic development. They also underline the importance of expanding autism research to better reflect the age range of the autistic population.


| INTRODUCTION
The brain is immersed in a rich world of sensory signals.
To most reliably and efficiently interpret these myriad inputs, it must make use of different types of congruence across modalities to tie relevant signals together in a process known as multisensory integration (MSI). The most important and widely studied avenue for MSI is temporal proximity, upon which all multisensory interactions depend to some degree (Chen & Vroomen, 2013; Costantini et al., 2016; Meredith et al., 1987; Munhall et al., 1996; Shams et al., 2000). Although there are many different ways in which MSI can enhance perception (Green & Angelaki, 2010; Parise & Ernst, 2017; Sumby & Pollack, 1954; Van der Burg et al., 2008; van Ee et al., 2009; Vroomen & de Gelder, 2000), speech perception may be the most significant to daily interactions. The integration of visual signals with their auditory counterparts can greatly enhance our understanding of speech (Erber, 1969; Irwin & DiBlasi, 2017; Sumby & Pollack, 1954; Woodhouse et al., 2008) and even produce multisensory illusions (McGurk & MacDonald, 1976) when the stimuli are presented sufficiently close in time (Munhall et al., 1996). One can experience this influence by simply attempting to understand a speaker across a noisy room with or without one's eyes open.
It is in such environments, in which the reliability of relevant auditory signals is compromised by competing inputs, that the most benefit is generally afforded by the integration of visual information (Erber, 1969; MacLeod & Summerfield, 1987; Sumby & Pollack, 1954). It is notable, then, that it is precisely under these circumstances that autistic individuals struggle most with speech perception (Ruiz Callejo et al., 2023; Alcántara et al., 2004; Fadeev et al., 2023; Mamashli et al., 2017). Autism is of particular interest to our understanding of this intersection of MSI, speech perception, and temporal processing because it appears to involve differences on some level in each area (Feldman et al., 2018; Kwok et al., 2015; Rapin & Dunn, 2003; Sperdin & Schaer, 2016; van Laarhoven et al., 2019; Zhou et al., 2018). Our understanding of these issues is, conversely, of particular significance to those with autism because of the manner in which they may contribute to broader social and communication differences.
Many individuals with autism demonstrate impairments in speech processing (Kwok et al., 2015; Rapin & Dunn, 2003; Sperdin & Schaer, 2016) as well as attenuated multisensory effects, particularly when young (Feldman et al., 2018). In light of the crucial role temporal dynamics have been shown to play in MSI, and MSI in turn in speech perception, differences in temporal processing may be underlying factors in both of these disparities. It is worth noting that even though auditory and visual information may originate from the same source in the environment, these signals never arrive at the brain perfectly simultaneously. Light travels faster than sound, but auditory stimuli have a lower signal transduction latency (Jain et al., 2015; Kemp, 1973), so the brain must be both tolerant and adaptable to varying degrees of asynchrony between sensory streams to allow integration of relevant stimuli. Tolerance to asynchrony can be observed in the window of perceived synchrony (WPS), which is the range of stimulus onset asynchronies (SOAs) over which participants are still likely to perceive multisensory signals as simultaneous. Narrowing of this window, which can be seen as a refinement of temporal processing acuity, occurs during typical development (Hillock et al., 2011; Hillock-Dunn & Wallace, 2012; Lewkowicz & Flom, 2014) but is both delayed and diminished among those with autism (de Boer-Schellekens et al., 2013; Foss-Feig et al., 2010; Stevenson, Siemann, et al., 2014). However, recent research challenges the degree to which this applies to autistic adults (Ainsworth & Bertone, 2023; Weiland et al., 2022; Zhou et al., 2022).
Adaptability to asynchrony is seen in temporal recalibration (Fujisaki et al., 2004; Vroomen et al., 2004), an effect in which the point of subjective simultaneity (PSS), where participants are most likely to report audiovisual inputs as synchronized, shifts according to prior experience. For example, after hearing an auditory stimulus such as a beep leading a visual stimulus such as a flash, a participant will be more likely to perceive a similarly leading beep as simultaneous with a flash (Van der Burg et al., 2013). This effect also extends to more complex speech stimuli (Van der Burg & Goodbourn, 2015). Some studies have found that this rapid temporal recalibration effect is also diminished in those with autism (Noel et al., 2017; Turi et al., 2016), although the one with the largest adult sample did not (Weiland et al., 2022), again raising questions about the existence of temporal processing differences in adults.
Together, these differences in MSI and temporal processing have given rise to theories that posit that basic sensory factors may contribute to the higher-level social differences seen in autism via their influence on language and communication (Baum et al., 2015; Donohue et al., 2012; Stevenson et al., 2018; Stevenson, Segers, et al., 2014). Stevenson et al. (2018) found that autistic children's WPS width correlates negatively with the degree of audiovisual integration they experience, which in turn correlates positively with recognition of speech in noise. They took this as evidence that MSI mediates an influence of temporal processing acuity on speech perception in autism. Given the evidence of these cascading effects, understanding the relationship between temporal processing differences, MSI, and speech perception is crucial to illuminating the broader autistic behavioural profile.
Most prominent among the paradigms used to investigate MSI with speech stimuli is the McGurk/MacDonald effect (McGurk & MacDonald, 1976). This effect occurs when participants are presented with conflicting phonemes (the smallest auditory components of speech) and visemes (their visual counterparts), leading to an illusion in which what is heard is influenced by what is seen. For example, the presentation of a /ba/ phoneme with a /ga/ viseme tends to lead participants to report hearing the phoneme /da/. This phenomenon, dubbed a fusion, is highly dependent upon temporal alignment (Munhall et al., 1996) and has been shown to correlate negatively with the width of the WPS (Stevenson et al., 2012), which is, again, wider on average among autistic individuals. As such, it is unsurprising that autistic individuals have shown attenuated susceptibility to the McGurk/MacDonald illusion, at least as children.
In a meta-analysis focusing on the McGurk/MacDonald illusion (Zhang et al., 2019), it was found not only that autistic individuals show less susceptibility to the effect, but also that the magnitude of this between-group difference increases with age. This led the authors to conclude that non-autistic people continue to develop in their ability to integrate audiovisual speech stimuli, whereas autistic individuals' progress may be hampered by heightened attention to local details and reduced orientation to social information. However, it bears noting that eight of the nine studies included in their meta-analysis had child samples and that the only adult study found no difference between groups in the strength of the McGurk/MacDonald effect (Saalasti et al., 2012). Additionally, two studies not included in the meta-analysis (Keane et al., 2010; Stevenson et al., 2018) did not find a difference between groups in susceptibility to the illusion. Notably, these include the study with the largest previous sample size (Stevenson et al., 2018) and one of the few with an adult sample (Keane et al., 2010).
Such inconsistencies raise questions about the degree to which MSI findings with autistic children extend to adults. These are highlighted by findings that autistic children may catch up to their non-autistic peers in their ability to integrate audiovisual speech signals embedded in noise by early adolescence (Foxe et al., 2015). In fact, it has been found with several paradigms that the differences in MSI seen between autistic and non-autistic children are no longer apparent by adulthood (Beker et al., 2018; Crosse et al., 2022). Theories that posit that it is persistent MSI deficits that drive difficulties with speech perception and other higher order differences between autistic and non-autistic adults are challenged by these findings. Beyond theory, because multisensory training has been shown to be highly effective (Nava et al., 2020; O'Brien et al., 2023; Setti et al., 2014), understanding the ages at which these differences exist is essential to tailoring therapeutic interventions for autistic individuals.
In addition to age, a significant factor in the heterogeneity of findings may be sample size. In a review of McGurk/MacDonald studies, Magnotti and Beauchamp (2018) demonstrated that a publication bias towards significant results would produce a vast overestimation of real population differences given the small sample sizes conventional in this field of research. This led them to conclude that the published estimates of the differences between groups in MSI measured using the McGurk/MacDonald effect are inflated. They argued that to alleviate this effect-size inflation and enhance replicability, sample sizes must be increased considerably.
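The inflation argument can be illustrated with a small simulation. The sketch below is not Magnotti and Beauchamp's analysis; it is a minimal illustration under assumed values (a true standardized group difference of 0.2, 15 participants per group, unit variance, and a two-sided z-test at alpha = .05), all of which are hypothetical and chosen only for demonstration.

```python
import random
import statistics


def simulate_published_effects(true_diff=0.2, n_per_group=15,
                               n_studies=2000, seed=1):
    """Simulate many small two-group studies and keep only 'significant' ones.

    Illustrative assumptions: scores are normal with SD = 1 in both groups,
    and significance is judged with a two-sided z-test at alpha = .05.
    """
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5  # standard error of the mean difference
    all_effects, published = [], []
    for _ in range(n_studies):
        group_a = [rng.gauss(true_diff, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = statistics.fmean(group_a) - statistics.fmean(group_b)
        all_effects.append(diff)
        if abs(diff) / se > 1.96:  # only significant results get "published"
            published.append(diff)
    return statistics.fmean(all_effects), statistics.fmean(published)


mean_all, mean_published = simulate_published_effects()
print(f"mean effect across all studies:      {mean_all:.2f}")
print(f"mean effect among significant ones:  {mean_published:.2f}")
```

Because a small study can only reach significance when its observed difference is large, the average of the "published" subset lies well above the true difference, whereas the average over all simulated studies stays close to it.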
In order to examine the degree to which findings from previous studies with children, limited in both scope and age range, extend to autistic adults, we recruited the largest sample to date for a study investigating differences between autistic and non-autistic adults in temporal processing and audiovisual integration of speech stimuli. We measured these using a version of the McGurk/MacDonald task involving manipulation of SOA and both syllable and simultaneity judgments. This allowed us to compare the rate at which the illusion occurs as well as the likelihood for participants to perceive stimuli as synchronized, their WPS, and the effects of rapid temporal recalibration. We predicted diminished susceptibility to the McGurk/MacDonald effect, blunted temporal acuity (i.e., a wider WPS) and an attenuated effect of temporal recalibration in autistic versus non-autistic participants.

| Participants
We recruited 666 autistic participants via the Netherlands Autism Register (NAR, https://nar.vu.nl/) and 517 non-autistic participants via the NAR as well as Prolific Academic. The autistic participants reported a formal diagnosis by an independent, qualified clinician. The non-autistic participants reported no diagnosis of autism. All 1183 participants were fluent in Dutch. NAR participants received €15 gift cards, and Prolific Academic participants were paid £15. All participants were naïve to the purpose of the study and gave informed consent prior to the experiment. The experiment was approved by the ethical committee from the Vrije Universiteit Amsterdam (VCWE-2020-041R1) in accordance with all guidelines and regulations as specified in the Netherlands Code of Conduct for Research Integrity.
A total of 147 participants were excluded because either the demographic information, the AQ-28 data, or the ICAR data were missing. The data from another 167 participants were excluded from further analyses because their performance on the task matched one or multiple exclusion criteria, as pre-registered in As Predicted (#102341, see https://aspredicted.org/g93a3.pdf). For precise information on the number of participants from each group excluded (and the reasons for their exclusion), refer to Figure S1. The demographic information for the remaining 869 participants is depicted in Table 1, and the age distribution of the groups is shown in Figure 1. Note that the non-autistic sample used in this study comprises the entirety of that in our earlier study (Jertberg et al., 2023), in addition to later recruits.

| Apparatus and stimuli
The experiment was programmed and conducted online using the Neurotask platform (www.neurotask.com). Participants completed the task using their own hardware and the Google Chrome browser. The stimuli were taken from Hillock-Dunn et al. (2016) and included videos of an actress saying the syllable /ga/ with either the corresponding audio (for congruent trials) or the audio of the same actress saying the syllable /ba/ dubbed over the video (for incongruent trials). SOA was either −500, −260, 0, 260 or 500 ms. Here, negative signifies that audition was leading, and positive that vision was leading. All videos lasted 2000 ms.

| Procedure
Trials began with a black fixation cross in the centre of a white screen for 1000 ms. Subsequently, the video played for 2000 ms. After its conclusion, participants were prompted to make two self-paced responses. First, they were asked whether they heard /ba/, /da/, or /ga/ by pressing either the b, d, or g key, respectively. Second, they were asked whether the video and audio were synchronized or not by pressing the 1 or 0 key, respectively.
The following trial was initiated as soon as the synchrony judgement was given. The two congruency types and five SOAs produced a total of 10 unique trial types. After reading the written instructions and completing 10 practice trials (one of each stimulus type), participants completed 10 repetitions of each trial type, for a total of 100 experimental trials in randomized order. The experiment lasted approximately 7 min and was part of a larger battery of online tasks (not reported here). Figure 2 illustrates an example trial sequence.

[Figure caption fragment] This provides insight into the influence of age on the occurrence of the illusion and allows a comparison between groups that is not confounded by their disparity in age (see Table 1 and Figure 1).
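The factorial trial structure described in the procedure (2 congruency types × 5 SOAs × 10 repetitions, shuffled) can be sketched as follows. This is only an illustration of the design, not the Neurotask implementation; the names `SOAS_MS`, `CONGRUENCY`, and `build_trial_list` are hypothetical.

```python
import itertools
import random

SOAS_MS = [-500, -260, 0, 260, 500]        # negative = audition leads
CONGRUENCY = ["congruent", "incongruent"]  # /ga/ video with /ga/ or /ba/ audio
REPETITIONS = 10


def build_trial_list(seed=None):
    """Return 100 (congruency, soa_ms) trials in randomized order."""
    trial_types = list(itertools.product(CONGRUENCY, SOAS_MS))  # 10 unique types
    trials = trial_types * REPETITIONS
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trial_list(seed=0)
print(len(trials), trials[:3])
```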

| Syllable responses
FIGURE 2 Two example trials used in the present study. The participants viewed a movie of an actress mouthing the syllable /ga/. On half the trials, audio corresponded to the video (i.e., congruent trials), whereas on the remaining trials, the audio of the syllable /ba/ was played (i.e., incongruent trials). The onset asynchrony between the voice and the lip movement was manipulated, and the participants were instructed to make two judgments. First, participants reported whether they heard /ba/, /ga/, or /da/. Subsequently, they judged whether the voice was synchronized with the lip movements. Note: a computer-generated face overlays the actress for privacy reasons.

We conducted a repeated measures analysis of variance (ANOVA) on the mean proportion of /da/ responses with SOA and congruency as within-subjects variables and group as a between-subjects factor using Jeffreys's Amazing Statistics Program (JASP) (Love et al., 2019). Here, and elsewhere in the manuscript, alpha was set to .05, and p-values were Huynh-Feldt corrected for violations of sphericity. As seen in Figure 3a [...] However, because of the disparity in the age distributions between groups (see Table 1), we conducted an exploratory analysis with age as a covariate and found that the group effect was no longer significant, nor were any of the previously significant interactions including group. Instead, we discovered a strong effect of age (F[1, 866] = 54.194, p < .001), wherein older participants from both groups tended to experience the illusion more frequently than younger ones (see Figure 3b). Age also interacted with SOA (F[1, 866] = 37.815, p < .001), congruency (F[1, 866] = 50.695, p < .001), and SOA × congruency (F[1, 866] = 38.530, p < .001), augmenting all of their effects.
Note that this section focused on the /da/ responses, as they reflect the classic McGurk/MacDonald illusion, and rates of visual capture (/ga/ responses in incongruent trials) were extremely low (approximately 2% of trials). For transparency, information regarding the rates of /ba/ and /ga/ responses can be found in Tables S3-S6 and Figure S2.

| Simultaneity judgments
Figure 4a plots the mean proportion of simultaneity judgments as a function of SOA and congruency for both groups.Figure 4b shows the mean proportion of simultaneity judgments collapsed across SOAs for both groups and congruency conditions as a function of age, divided into bins of 10 years.
We conducted a repeated measures ANOVA on the mean proportion of synchrony responses with SOA and congruency as within-subjects variables, group as a between-subjects variable, and age as a continuous covariate. This yielded a significant main effect of SOA (F[1, 866] = 208.845, p < .001). The rate of synchrony responses across the SOAs formed a typical Gaussian distribution with a slight visual leading offset (see Figure 4a). Additionally, the proportion of synchrony responses was much higher when the stimuli were congruent than incongruent (F[1, 866] = 377.547, p < .001). Congruency also interacted with SOA, such that its effect was most pronounced when stimuli occurred simultaneously or with a slight visual lead (F[1, 866] = 48.803, p < .001), and with group (F[1, 866] = 17.004, p < .001). A follow-up t-test comparing the difference between mean simultaneity judgement response rates for congruent and incongruent trials according to group revealed that the effect of congruency was greater for non-autistic participants than autistic ones (t[867] = 6.76, p < .001; see Figure 4a). SOA also interacted with group, with the differences between autistic and non-autistic participants emerging at the mid-range SOAs (F[1, 866] = 7.204, p < .001), which is logical given that the longer SOAs were much more obvious to both groups. Both congruency (F[1, 866] = 31.101, p < .001) and SOA (F[1, 866] = 55.233, p < .001) also significantly interacted with age. Finally, we detected a significant three-way interaction between congruency × SOA × age (F[1, 866] = 6.792, p < .001). The difference between congruent and incongruent trials was greater at younger ages. Because SOA interacts with all significant factors due to the nature of simultaneity judgement tasks, these effects were not explored further.

| Window of perceived synchrony
We fitted a Gaussian distribution to the synchrony distribution for each individual by using the curve_fit function from the scipy Python module to estimate a WPS, amplitude and PSS. Figure 5 illustrates the mean WPS as a function of age (bin size = 10 years) for participants with and without autism. Note that for one participant, the fitting procedure was not successful, resulting in exclusion from further analyses.
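A minimal sketch of such a fit is given below. The exact parameterization, starting values, and WPS definition used in the study are not stated here, so the sketch assumes a three-parameter Gaussian whose mean is taken as the PSS and whose standard deviation is taken as the WPS width; the function names and the starting values in `p0` are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

SOAS_MS = np.array([-500.0, -260.0, 0.0, 260.0, 500.0])


def gaussian(soa, amplitude, pss, sigma):
    """Proportion of 'synchronous' responses as a Gaussian function of SOA."""
    return amplitude * np.exp(-((soa - pss) ** 2) / (2.0 * sigma ** 2))


def fit_synchrony_curve(prop_sync):
    """Fit one participant's synchrony proportions (one value per SOA).

    curve_fit raises RuntimeError if the optimization does not converge,
    which would correspond to an unsuccessful fit.
    """
    p0 = [float(np.max(prop_sync)), 0.0, 200.0]  # assumed starting values
    params, _ = curve_fit(gaussian, SOAS_MS, prop_sync, p0=p0, maxfev=10000)
    amplitude, pss, sigma = params
    # Assumption: the WPS is summarized by the Gaussian's standard deviation.
    return {"amplitude": amplitude, "pss": pss, "wps": abs(sigma)}


# Synthetic, noise-free example with known parameters.
example = gaussian(SOAS_MS, 0.9, 100.0, 250.0)
fit = fit_synchrony_curve(example)
print(fit)
```

With only five SOAs per participant, three free parameters leave little redundancy, so noisy or flat response profiles can easily produce failed or implausible fits.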
We conducted an analysis of covariance (ANCOVA) on the mean WPS with group as a between-subjects variable and age as a covariate. The ANCOVA yielded no significant effect of group (F[1, 865] = 0.227, p = .634) or age (F[1, 865], p = .053) on the WPS.

| Rapid temporal recalibration
To measure rapid temporal recalibration, we excluded the first trial and split the rest into two categories: those following trials with either a −500- or −260-ms SOA (audition leads) and those following trials with a 260- or 500-ms SOA (vision leads). We then fit Gaussian functions (as described previously) to each modality order condition (see Figure 6a) and calculated the mean PSS by identifying the SOA at which each function reaches its peak. Rapid temporal recalibration was quantified as the difference in mean PSS between categories (i.e., PSS_audition-leads − PSS_vision-leads; see also Van der Burg et al., 2013, 2018). Note that one participant was excluded because of fitting issues.
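The trial split by previous modality order can be sketched as below. This illustrates only the sorting step and the per-SOA synchrony proportions; the Gaussian fit that yields each PSS is omitted. The data layout (a list of `(soa_ms, judged_synchronous)` tuples in presentation order) and the function name are hypothetical, and dropping trials that follow a 0-ms SOA is an inference from the two categories described above.

```python
def split_by_previous_order(trials):
    """Group trials by the modality order of the preceding trial.

    trials: list of (soa_ms, judged_synchronous) tuples in presentation order.
    Returns, per category, the proportion of 'synchronous' responses per SOA.
    """
    counts = {"audition_led": {}, "vision_led": {}}
    # Pairing each trial with its predecessor skips the first trial,
    # which has no previous trial.
    for previous, current in zip(trials, trials[1:]):
        prev_soa = previous[0]
        if prev_soa < 0:
            key = "audition_led"
        elif prev_soa > 0:
            key = "vision_led"
        else:
            continue  # assumed: trials following a 0-ms SOA are in neither category
        soa, sync = current
        n_sync, n_total = counts[key].get(soa, (0, 0))
        counts[key][soa] = (n_sync + int(sync), n_total + 1)
    return {
        key: {soa: n_sync / n_total
              for soa, (n_sync, n_total) in sorted(by_soa.items())}
        for key, by_soa in counts.items()
    }


demo = [(-500, False), (0, True), (260, True), (0, True), (-260, False), (0, False)]
proportions = split_by_previous_order(demo)
print(proportions)
```

Fitting a Gaussian to each category's proportions and subtracting the two peak locations would then give the recalibration measure described in the text.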
Accordingly, Figure 6a reflects the mean proportion of synchrony responses as a function of SOA for each previous modality order and group (collapsed across congruency conditions). Figure 6b reflects the mean PSS derived from these synchrony distributions according to group and previous modality order. Figure 6c shows the ΔPSS (i.e., rapid temporal recalibration) as a function of age (in bins of 10 years) for each group.
We conducted a repeated measures ANOVA on the mean PSS with previous modality order as a within-subjects variable, group as a between-subjects variable, and age as a covariate. We found a significant main effect of modality order (F[1, 865] = 13.823, p < .001), such that the PSS was smaller when audition led (206 ms) in the previous trial than when vision led (231 ms), as Figure 6b illustrates. Rapid temporal recalibration did not differ between groups, as the modality order × group interaction failed to reach significance (F[1, 865] = 1.968, p = .161). Autistic participants showed a larger average PSS (240 ms) than non-autistic participants (190 ms) overall (F[1, 865] = 13.866, p < .001), reflecting a preference for a greater visual lead. Age did not significantly affect the magnitude of rapid temporal recalibration, as it did not interact with the previous modality order (F[1, 865] = 2.624, p = .106; see Figure 6c). However, older participants did have a higher overall PSS (F[1, 865] = 51.770, p < .001). The mean PSS for participants above the median age was 249 ms, compared with a mean of 188 ms for those below the median age.

| Age matched /da/ response analysis
Because of the significant effect of age on /da/ response rates and the skewness of the age distributions (see Figure 1), we conducted an exploratory follow-up analysis in which 10 participants were selected at random from each age bin depicted in Figure 3b for both groups. This allowed us to conduct another repeated measures ANOVA on the mean proportion of /da/ responses including SOA and congruency as within-subjects variables, group as a between-subjects variable, and age as a covariate, this time with a sub-group matched on age. The mean ages were 45.7 years for autistic participants and 45.5 years for non-autistic participants (t[118] = 0.066, p = .948), and each group comprised 60 participants. The results of this analysis confirmed all of those discussed in the main results with the full sample (see Table S1).
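The age-matched sub-sampling (10 random participants per age bin and group) can be sketched as follows. The bin edges are not given in the text, so the 10-year bins starting at age 17 used here are an assumption, as are the data layout and function names.

```python
import random


def age_bin(age, start=17, width=10):
    """Index of the 10-year bin an age falls into (assumed edges: 17-26, 27-36, ...)."""
    return (age - start) // width


def age_matched_sample(participants, per_bin=10, seed=None):
    """Draw `per_bin` participants at random from every (group, age bin) cell.

    participants: list of dicts with 'group' and 'age' keys (hypothetical layout).
    """
    rng = random.Random(seed)
    cells = {}
    for person in participants:
        cells.setdefault((person["group"], age_bin(person["age"])), []).append(person)
    sample = []
    for key in sorted(cells):
        members = cells[key]
        if len(members) < per_bin:
            raise ValueError(f"cell {key} has only {len(members)} participants")
        sample.extend(rng.sample(members, per_bin))
    return sample


# Synthetic participants: three per age (17 to 75) in each group.
synthetic = [{"group": g, "age": a}
             for g in ("autistic", "non-autistic")
             for a in range(17, 76)
             for _ in range(3)]
matched = age_matched_sample(synthetic, seed=0)
print(len(matched))
```

With six bins and 10 participants per bin, each group contributes 60 participants, matching the sub-group size reported above.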

| Bayesian analysis
To evaluate the evidence that group did not drive differences in susceptibility to the illusion, we conducted an ANCOVA focusing on the rate of /da/ responses in incongruent trials (collapsed across SOAs) with group as a between-subjects factor and age as a covariate. Here, we found that age (F[1, 866] = 53.104, p < .001) but not group (F[1, 866] = 0.034, p = .854) had a significant influence on /da/ response rates. We then repeated this ANCOVA using JASP's Bayesian statistics module. We used uniform model priors (assuming equal likelihood of the alternative models including age, age + group, group and the null hypothesis) and default priors on coefficients (r scale prior width = 0.5 for fixed effects and 0.354 for covariates). We found that the best model was provided by age, compared with which the BF01 of the model including age and group was 11.928, the BF01 of the model including group alone was 1.117 × 10^11 and the BF01 of the null model was 1.450 × 10^12. The full results of the Bayesian analysis can be seen in Table S2.
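Under uniform model priors, Bayes factors against the best model translate directly into posterior model probabilities. The short calculation below is a back-of-the-envelope sketch that assumes each BF01 expresses the evidence for the age-only model over the listed alternative; the values it prints are derived here from the reported Bayes factors and may differ slightly from those in Table S2.

```python
# Reported Bayes factors in favour of the best (age-only) model over each model.
bf01_vs_best = {
    "age": 1.0,             # the best model compared with itself
    "age + group": 11.928,
    "group": 1.117e11,
    "null": 1.450e12,
}

# With equal prior model probabilities, each posterior is proportional to 1 / BF01.
weights = {model: 1.0 / bf for model, bf in bf01_vs_best.items()}
total = sum(weights.values())
posterior = {model: weight / total for model, weight in weights.items()}

for model, prob in posterior.items():
    print(f"{model:12s} {prob:.4g}")
```

On these assumptions, roughly 92% of the posterior mass falls on the age-only model and about 8% on the model that adds group, with the remaining two models contributing a negligible share.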

| Gender analyses
Because of the imbalance in gender between groups, we conducted exploratory follow-up analyses with gender included as a variable. Although gender did interact significantly with some factors in our /da/ response and simultaneity judgement analyses (and women had a higher mean PSS than men), it did not change the significance of any of the aforementioned main effects or interactions. For transparency, the results of these analyses can be found in Tables S7-S9 and Figures S3-S5. Gender had no significant influence on the WPS analysis.

| DISCUSSION
Based on studies primarily with children and adolescents, it has been hypothesized that autistic individuals show attenuated MSI, particularly for speech stimuli. Our results provide compelling evidence that some differences found in children may not persist into adulthood. We found no significant difference between autistic and non-autistic individuals in susceptibility to the McGurk/MacDonald illusion once we accounted for age differences in our sample. Because this ran contrary to the findings of the largest meta-analysis on the topic (Zhang et al., 2019), we confirmed that group was not a significant factor in our results using both age-matched and Bayesian follow-up analyses (see the Supporting Information). While Zhang et al. (2019) concluded that the difference between groups actually increases in magnitude with age, their meta-analysis included only one study with adults (Saalasti et al., 2012), which the original authors did not take as evidence for a difference in the strength of the McGurk/MacDonald effect. Moreover, some findings suggest that differences between autistic and non-autistic individuals in MSI may be resolved during adolescence (Foxe et al., 2015; Taylor et al., 2010). Our findings with adults are consistent with the trajectory of improvement these results imply.
Instead of a difference between groups, we found evidence that the degree of MSI increases with age (with the average rate of the illusion nearly tripling from the youngest to oldest participants) for both autistic and non-autistic individuals. Although an increase in the rate of the McGurk/MacDonald effect between younger and older adults has been detected in non-autistic participants (McGurk & MacDonald, 1976; Sekiyama et al., 2014; Setti et al., 2013), this is the first study comparing them in both autistic and non-autistic samples. The near-perfect overlap of the correlations between age and MSI between groups serves as compelling evidence that although autistic children may not experience the development of visual influence on speech perception as early as their non-autistic peers, autistic adults do show comparable visual influence into their older years. These findings of similar age effects across adulthood resonate with recent longitudinal research suggesting similar cognitive ageing profiles between autistic and non-autistic individuals (Torenvliet et al., 2023).
The reason for such a strong effect of age on the rate of the illusion could be a reduced reliability of the auditory signal resulting from the progressive hearing loss common in ageing, which often goes uncorrected (Walling & Dickson, 2012). The comparative reliability of auditory and visual inputs has been shown to affect the rate at which the McGurk/MacDonald effect occurs, and their respective influence shifts during development (Hirst et al., 2018). Additionally, MSI may also serve a compensatory role in speech perception as hearing declines. Both notions are supported by research showing an increase in MSI and visual dominance later in life (Diaconescu et al., 2013), as well as enhanced susceptibility to the McGurk/MacDonald effect associated with age-related hearing loss (Rosemann & Thiel, 2018; Stropahl & Debener, 2017). Cortical reorganization leading to increased functional connectivity between auditory and visual regions may facilitate these effects in those with age-related hearing loss (Puschmann & Thiel, 2017). It is encouraging that MSI appears to serve this compensatory role as effectively in autistic adults as non-autistic ones.
Another potential factor in differences between our findings and others is the possibility of an attentional confound. Autistic children have been shown to demonstrate an atypical preference for non-social stimuli, viewing faces less frequently than their non-autistic peers (Gale et al., 2019; Vacas et al., 2021). Additionally, in two McGurk/MacDonald studies using eye-tracking, it was found that autistic children attended less to the pertinent areas of the face than non-autistic ones (Feng et al., 2021; J. R. Irwin et al., 2011), partially explaining differences in susceptibility to the illusion (although Foxe et al., 2015, found little influence of looking behaviour). Accordingly, studies that do not control for visual attention may overstate differences in MSI. A merit of our design is that although we do not directly measure eye movements, our simultaneity judgement task requires participants to attend to the mouth during trials. The performance of participants on this task, resembling a typical Gaussian distribution peaking near simultaneity, suggests that they were indeed attending to the faces. Although the addition of eye-tracking would help to confirm this, in online experiments such as ours, where it is not possible (due to privacy reasons), the addition of a simultaneity judgement task provides an excellent means of reducing the risk of attentional differences being conflated with differences in MSI.
Beyond our findings with regard to the McGurk/MacDonald illusion, our results speak to the nuances of temporal processing and how they compare between autistic and non-autistic individuals. In many ways, our results remained consistent with standard findings in temporal processing research. Synchrony distributions followed a typical Gaussian shape, peaking with a slight visual lead, as is consistently found with audiovisual stimuli (Dixon & Spitz, 1980; Slutsky & Recanzone, 2001; Zampini et al., 2005). Incongruent stimuli were perceived as synchronous significantly less frequently than congruent ones, as was shown in other studies measuring simultaneity judgments for McGurk/MacDonald stimuli (Jertberg et al., 2023; Van Wassenhove et al., 2007; Vroomen & Keetels, 2010). Rapid temporal recalibration was detected, with the PSS shifting according to the previous modality order (Van der Burg et al., 2013, 2015, 2018). However, our results also captured novel differences between groups.
First, with regard to synchrony distributions, we found differences in the magnitude of the effect of congruency according to group. Both groups were less likely to perceive incongruent stimuli as synchronized, but this effect was particularly pronounced for the non-autistic sample. This was even true at 0 ms, when participants dropped from recognizing the physical simultaneity of the stimuli on 91.8% to 46.2% of trials in the non-autistic group and on 89.7% to 50.8% of trials in the autistic group. This suggests a profound interference of phonetic incongruence on basic temporal processing. Van Wassenhove et al. (2007) attributed a similar finding to a weaker correlation between the facial kinematics (what is seen) and acoustic dynamic envelope (what is heard). But why does the magnitude of this difference vary between autistic and non-autistic individuals, when the disparity between these factors remains the same?
One interpretation might be that the autistic participants simply have a lower temporal resolution than the non-autistic ones, and therefore less room for interference in temporal processing. However, we did not replicate findings that the WPS, the common measure of temporal acuity, differs between groups, so this interpretation is not supported by our results. Alternatively, these differences could be due to impoverished lip reading ability, which has been found to account for some or all of the disparity in susceptibility to the McGurk/MacDonald effect in autistic children (Iarocci et al., 2010; E. G. Smith & Bennetto, 2007). Impoverished lip reading ability may be viewed as a weaker association between a viseme and its associated phoneme. This may translate into a diminished incongruence effect, as the autistic participants would be less sensitive to the difference driving it. That being said, were this the case, one might also expect an attenuated visual influence of the visemes, and hence a lower rate of the McGurk/MacDonald effect, in the autistic participants. An alternative explanation is that autistic participants may be less subject to a cognitive bias to judge the incongruent stimuli asynchronous because they seem unnatural, as described in Vroomen and Keetels (2010), and thereby better able to perform the task. However, as is discussed in Jertberg et al. (2023), the manner in which the effect scales with SOA (and occurs regardless of whether participants experience the illusion) suggests that a cognitive bias may not be the full explanation. As such, further research into the lip reading abilities of autistic adults and their potential influence on the temporal processing of audiovisual speech stimuli is necessary.
Delving deeper into the temporal dynamics at play, we did not detect the differences between groups in the WPS or rapid temporal recalibration formerly reported. With regard to the WPS, the largest meta-analysis to date examining potential differences between autistic and non-autistic participants found a consistent enlargement of its width among those with autism (Zhou et al., 2018), suggesting blunted temporal acuity. However, there was again a limited number of studies germane to the topic (with only four studies investigating the audiovisual WPS), most had small samples (ranging from 32 to 64 participants), and all of them focused on children. More recent research involving adults paints a different picture. Two studies (Weiland et al., 2022; Zhou et al., 2022) with larger samples of adults found no difference between autistic and non-autistic participants in the width of the WPS, suggesting that autistic individuals may also catch up in the honing of temporal processing by the time they reach adulthood. A very similar pattern emerges with rapid temporal recalibration, where smaller studies with younger participants found differences between autistic and non-autistic individuals (Noel et al., 2017; Turi et al., 2016), but the largest adult study did not (Weiland et al., 2022). However, the research here is more limited, and Weiland et al. (2022) also recruited from the NAR, so their sample may partially overlap with ours. It is also worth noting that performance on the SJ task was quite poor overall, possibly due to the difficulty of recognizing the timing of a velar consonant like /ga/. This may have adversely affected the sensitivity of our WPS analysis to subtle temporal differences. Accordingly, further examination of potential differences between autistic and non-autistic individuals in the WPS and rapid temporal recalibration (and the possibility of their resolution) is warranted.
We did, however, detect a difference between groups in the overall mean PSS value. Autistic participants showed a greater mean PSS, irrespective of stimulus type, suggesting a heightened sensory preference for visual lead. This finding may also explain the difference in the magnitude of the congruence effect between groups, at least in part, given that it was largest with a slight visual lead. The two most obvious potential explanations for the PSS difference would be either faster processing of auditory information or slower processing of visual information in autism. Research investigating responses to simple tones and disks suggests similarly protracted reaction times for auditory and visual stimuli among autistic individuals, as well as attenuated multisensory benefits and abnormal electrophysiological responses as early as 100 ms after stimulus presentation (Brandwein et al., 2013, 2015). Research into visual motion recognition, on the other hand, shows faster reaction times for autistic participants (Foss-Feig et al., 2013). It is possible that the processing speed for speech stimuli also differs, as they are often found to have unique temporal processing profiles (Stevenson & Wallace, 2013). Although research here is more limited, there is some evidence of faster recognition of voices among autistic individuals (Lin et al., 2016). To fully evaluate the possibility of a processing speed explanation for the PSS differences observed in this study, more research should be done to compare auditory and visual reaction times for speech stimuli among autistic and non-autistic individuals.
An alternative explanation falls more in line with our discussion of differences in representation of visual speech stimuli. If autistic individuals have differently developed representations of verbal lip movements (as suggested by their weaker lip reading abilities) and weaker associations between them and the sounds of language, as suggested by Van Wassenhove et al. (2007), it stands to reason that it might take them more time to interpret lip movements and integrate them with their corresponding vocal sounds. This might translate into a greater sensory preference for visual lead when processing speech stimuli. However, given the dearth of evidence provided by the literature on the alternative sensory processing speed hypotheses, this interpretation is highly speculative, and further research should explore the factors contributing to differences in PSS between autistic and non-autistic individuals. An excellent starting point would be to see whether this preference for greater visual lead is unique to speech stimuli (supporting the notion that it is driven by differences in representation of verbal mouth movements) or whether it applies more broadly to simple audiovisual stimuli (suggesting a basic sensory processing speed explanation).
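For readers less familiar with the temporal measures discussed above, the following minimal Python sketch illustrates how a PSS, a WPS, and a rapid recalibration effect could be summarized from simultaneity judgement data. It is not the analysis used in this study: it replaces Gaussian curve fitting with a simpler moment-based approximation, it takes the WPS to be the standard deviation of the synchrony distribution (other definitions exist), and the function names, SOA values, and response proportions are all hypothetical. As in the figures, negative SOAs denote auditory lead and positive SOAs denote visual lead.

```python
from math import sqrt


def pss_and_wps(soas_ms, p_sync):
    """Moment-based summary of a synchrony-judgement curve.

    The proportions of 'synchronous' responses are used as weights over SOA.
    PSS is the weighted mean SOA; WPS is approximated here by the weighted
    standard deviation (an assumption; a fitted Gaussian is more common).
    """
    total = sum(p_sync)
    pss = sum(s * p for s, p in zip(soas_ms, p_sync)) / total
    variance = sum(p * (s - pss) ** 2 for s, p in zip(soas_ms, p_sync)) / total
    return pss, sqrt(variance)


def recalibration_effect(soas_ms, p_after_visual_lead, p_after_audio_lead):
    """Change in PSS between preceding modality orders.

    A positive value means the PSS shifted toward visual lead after a
    visual-leading trial, relative to after an audio-leading trial.
    """
    pss_visual, _ = pss_and_wps(soas_ms, p_after_visual_lead)
    pss_audio, _ = pss_and_wps(soas_ms, p_after_audio_lead)
    return pss_visual - pss_audio


if __name__ == "__main__":
    # Hypothetical SOAs (ms) and response proportions, for illustration only.
    soas = [-400, -200, 0, 200, 400]
    after_visual = [0.05, 0.40, 0.85, 0.75, 0.25]
    after_audio = [0.15, 0.55, 0.90, 0.60, 0.15]

    pss, wps = pss_and_wps(soas, after_visual)
    print(f"PSS = {pss:.1f} ms, WPS (SD) = {wps:.1f} ms")
    print(f"Recalibration = {recalibration_effect(soas, after_visual, after_audio):.1f} ms")
```

Under this sign convention, a greater mean PSS corresponds to a synchrony distribution centred further toward visual lead, and a wider WPS corresponds to a broader distribution, that is, lower temporal acuity.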
Although the large size of our sample and sound experimental design are strengths of our study, it is, of course, not without its limitations. Firstly, this experiment was part of a large online experimental battery, which limited our control over the hardware/settings participants used during the experiment and placed constraints on the number of trials they could complete. Regarding the former, we were unable to control the volume/size and quality of the audio and video input to the extent that would have been possible in the laboratory. Although findings are mixed with regard to whether the McGurk/MacDonald effect differs in laboratory versus online experiments (Getz & Toscano, 2021; Magnotti et al., 2018), it is possible that differences in the choice of hardware and settings could have influenced the reliability of sensory information and therefore the strength of the illusion. Although we have no reason to believe that meaningful systematic differences in these choices existed between groups, it is possible that older participants may have been more likely to lack headphones or have older devices. Regarding the latter, a larger number of trials and range of SOAs would have allowed more sophisticated analyses of temporal processing and higher resolution representation of participants' WPS and recalibration effects. This also would have allowed us to investigate the potential effects of congruence on recalibration and, conversely, of recalibration on the likelihood for participants to perceive the illusion. A related shortcoming of this study is that the time limitation meant we were unable to include unisensory trial types. These allow a researcher to quantify participants' ability to identify visemes and phonemes on their own, which is important as autistic children have shown differences in their lip reading abilities when compared with non-autistic ones (Foxe et al., 2015; Iarocci et al., 2010; J. R. Irwin et al., 2011; E. G. Smith & Bennetto, 2007; Taylor et al., 2010). Although we did not find a difference in audiovisual speech processing between groups, we are unable to speak to the influence of unisensory factors in our findings because of the lack of audio and video-only trials. Future research should assess the degree to which autistic and non-autistic adults may differ in their perception of visemes and phonemes exclusively as well as in combination to better isolate any potential differences in MSI.
Additionally, although the McGurk/MacDonald paradigm is the most widely used tool for studying audiovisual speech integration, it is not without its shortcomings. High variability in the rate of the illusion and its somewhat contrived nature have raised concerns about its stability and ecological validity (Alsius et al., 2018; Getz & Toscano, 2021; Van Engen et al., 2022). Speech in noise experiments, in which the addition of visual information facilitates speech perception rather than distorting it, offer a more naturalistic alternative. However, speech in noise performance does not always correlate with McGurk/MacDonald susceptibility (Stevenson et al., 2018; Van Engen et al., 2017). Accordingly, our findings cannot serve as conclusive evidence on their own that autistic adults do not face difficulties with audiovisual speech perception in their daily lives, particularly given that time constraints limited our stimulus set to only two syllable pairings. Still, they do capture a form of audiovisual speech integration that has been shown to correlate positively with communication skills among autistic and non-autistic individuals (Feldman et al., 2022), and they resonate with the speech in noise results of Foxe et al. (2015). As such, they remain encouraging findings with regard to the development of MSI of speech stimuli in autism. Future research should continue to evaluate the degree to which laboratory paradigms predict real-world outcomes for autistic individuals.
Finally, it must be noted that well-educated adults with comparatively high IQs are overrepresented in the NAR sample (Scheeren et al., 2022). It could be argued that our sample is therefore less likely to capture the segments of the autistic population that may suffer from the most severe deficits in areas like MSI. In particular, those with intellectual disabilities are underrepresented. That being said, the parity in IQ (as estimated by the ICAR) between groups suggests that our results can speak directly to differences resulting from the sensory factors related to autism that are not confounded by cognitive ones related to intellectual impairment. If differences between groups in MSI were only found among the individuals with lower IQs (who are underrepresented in our sample), it would be unclear whether they were due to autism or intellectual impairment. Our sample is also notable in that women are overrepresented in our autism group, likely because they tend to be more likely to participate in online surveys (Becker, 2022), which make up the majority of NAR projects. However, as seen in the Supporting Information, although gender did interact with certain factors in our analyses, it did not alter the significance of any other main effects or interactions. This peculiarity of our sample is therefore unlikely to limit the generalizability of our findings.
In conclusion, our study has confirmed several findings with regard to basic temporal and multisensory processing, as well as challenged the degree to which reported differences between autistic and non-autistic children in these areas extend to adulthood. Our findings that MSI, temporal processing acuity, and rapid temporal recalibration all seem to be intact among autistic adults are highly encouraging given the essential role MSI has in speech perception and compensation for the unisensory deterioration that is inevitable with ageing. Additionally, our novel findings with regard to differences in the degree of interference in temporal processing posed by incongruent stimuli and in the mean PSS values between groups are intriguing and demand further research to disentangle alternative explanations. Understanding these phenomena is of paramount importance given the relevance of temporal and multisensory processing to higher order social factors and the proven efficacy of multisensory training. Pinpointing the age at which related interventions may be of use is crucial to their proper timing, which our findings suggest is prior to adulthood. Finally, our results underline the importance of expanding sample sizes and age ranges in autism research. Restricting our focus to children leads to a limited understanding of the broader trajectory of this developmental condition, which can only be extended by giving autistic adults the attention they deserve.

FIGURE 1 Age distribution of autistic and non-autistic participants. Darker sections reflect overlap between groups.

Figure 3a illustrates the mean proportion of /da/ responses (the classical McGurk/MacDonald fusion) for each group as a function of SOA for congruent and incongruent trials. Figure 3b shows the mean proportion of /da/ responses on incongruent trials, collapsed across SOAs for each group, as a function of age (divided into bins of 10 years). This provides insight into the influence of age on the occurrence of the illusion and allows a comparison between groups that is not confounded by their disparity in age (see Table 1 and Figure 1).
(a) Proportion of synchrony judgments per group as a function of stimulus onset asynchrony (SOA) for congruent and incongruent trials. Here, negative SOAs indicate that the voice was leading the lip movements, and vice versa. (b) Proportion of synchrony judgments (collapsed across SOAs) as a function of age and congruency for each group (bins of 10 years). The error bars reflect the standard error of the mean.

FIGURE 5 Mean window of perceived synchrony for autistic and non-autistic participants relative to age (in bins of 10 years). The error bars reflect the standard error of the mean.

FIGURE 6 (a) Proportion of synchrony judgments per group as a function of stimulus onset asynchrony (SOA), relative to the modality order of the preceding trial (collapsed across congruency conditions). Here, negative SOAs indicate that the voice was leading the lip movements, and vice versa. (b) Point of subjective simultaneity (PSS) per group relative to the modality order on the preceding trial. (c) Magnitude of recalibration effect (i.e., change in PSS between preceding modality orders) per group relative to age (in bins of 10 years). The error bars reflect the standard error of the mean.
TABLE 1 Demographic breakdown by group.
Abbreviations: AQ, Autism Quotient; ICAR, International Cognitive Ability Resource (abbreviated intelligence quotient test); SD, standard deviation.