Abnormal connectivity and activation during audiovisual speech perception in schizophrenia

The unconscious integration of vocal and facial cues during speech perception facilitates face‐to‐face communication. Recent studies have provided substantial behavioural evidence concerning impairments in audiovisual (AV) speech perception in schizophrenia. However, the specific neurophysiological mechanisms underlying these deficits remain unknown. Here, we investigated cortical activity and functional connectivity centered on the auditory cortex during AV speech perception in schizophrenia. Using magnetoencephalography, we recorded and analysed event‐related fields in response to auditory (A: voice), visual (V: face) and AV (voice–face) stimuli in 23 schizophrenia patients (13 males) and 22 healthy controls (13 males). The functional connectivity associated with the subadditive response to the AV stimulus (i.e., [AV] < [A] + [V]) was also compared between the two groups. Within the healthy control group, [AV] activity was smaller than the sum of [A] and [V] at latencies of approximately 100 ms in the posterior ramus of the lateral sulcus of the left hemisphere only, demonstrating a subadditive N1m effect. Conversely, the schizophrenia group did not show such a subadditive response. Furthermore, weaker functional connectivity from the posterior ramus of the lateral sulcus of the left hemisphere to the fusiform gyrus of the right hemisphere was observed in schizophrenia. Notably, this weakened connectivity was associated with the severity of negative symptoms. These results demonstrate abnormalities in connectivity between speech‐ and face‐related cortical areas in schizophrenia. This aberrant subadditive response, together with connectivity deficits in integrating speech and facial information, may constitute the neural basis of social communication dysfunction in schizophrenia.

Schizophrenia (SZ) is a disorder characterized by a range of distortions in thinking, perception, emotions, behaviour, sense of self, language and social communication (Onitsuka et al., 2022a, 2022b, 2022c). Among these symptoms, social communication dysfunction is one of the core symptoms of SZ, with a profound negative influence on patients' daily life. The perceptual and cognitive processes that support discerning and identifying facial expressions and vocal cues are essential for successful social communication during in-person interactions (Adams et al., 2017; Avery et al., 2016; Eckland et al., 2019; Little et al., 2011; McGettigan, 2015). However, a growing body of neurophysiological evidence indicates that patients with SZ exhibit deficiencies in cortical activity in response to facial stimuli (Dong et al., 2018; Murashko & Shmukler, 2019; Ohara et al., 2020; Onitsuka et al., 2006) and vocal stimuli (Conde et al., 2016; Hirano et al., 2008, 2010, 2020; Meyer et al., 2021).
In real-world contexts, our brain does not process sensory input in an independent manner; rather, it seamlessly integrates multisensory information at an unconscious level. Within the realm of audiovisual (AV) speech perception, the inclusion of facial information augments the incoming auditory signal by enhancing the auditory cortex's sensitivity to impending auditory information (Peelle & Sommers, 2015). Substantial behavioural evidence indicates that patients with SZ exhibit deficits in AV speech integration (de Gelder et al., 2003, 2005; de Jong et al., 2009; Pearl et al., 2009; Ross et al., 2007; White et al., 2014). Specifically, studies have demonstrated a reduced McGurk effect (reflecting the influence of visual articulatory information on auditory phonetic perception) in individuals with SZ (de Gelder et al., 2003; Pearl et al., 2009) and an inability to effectively utilize visual articulation cues for speech perception in noisy environments (Ross et al., 2007). Therefore, investigating the neural mechanisms underlying abnormal AV speech perception is particularly important for comprehending the social communication dysfunction observed in patients with SZ within ecologically relevant situations (Tseng et al., 2015).
Numerous neuroimaging studies utilizing functional magnetic resonance imaging (fMRI) have examined cerebral regions implicated in aberrant AV speech perception in SZ (Surguladze et al., 2001; Szycik et al., 2009, 2013; Wu et al., 2017). Szycik et al. (2013) suggested that abnormal AV speech perception in SZ primarily stems from impairment of a speech motor system in the right hemisphere, indicative of diminished lateralization of language functions to the left hemisphere in SZ. Furthermore, a study examined the patterns of brain activation and functional connectivity evoked by visual speech (lip movement) priming cues with the aim of enhancing speech perception in SZ patients and healthy controls (HCs) (Wu et al., 2017). The findings revealed that SZ patients exhibited reduced activation in the left inferior temporal gyrus during AV speech perception, along with diminished functional connectivity of the left inferior temporal gyrus with the bilateral Rolandic operculum, bilateral superior temporal gyrus and left insula. Notably, a recent fMRI study by Ross et al. demonstrated that during natural listening conditions, multisensory enhancement involves not only sites of multisensory integration but also numerous regions of the broader semantic network (Ross et al., 2022), which could potentially be disrupted in SZ patients.
Neurophysiological studies utilizing electroencephalography (EEG) or magnetoencephalography (MEG) have yielded invaluable insights into the dynamic nature of neural processing during AV speech perception (Beauchamp, 2016). In most EEG or MEG studies, the event-related potentials (ERPs) or fields (ERFs) in response to AV speech stimuli were compared with those in response to audio-only (A) and visual-only (V) stimuli. If auditory and visual inputs were processed independently, the ERPs or ERFs elicited by AV speech stimuli [AV] would be expected to equal the sum of those elicited by audio-only [A] and visual-only [V] speech stimuli (i.e., [AV] = [A] + [V]). However, it is widely acknowledged that AV stimuli elicit nonadditive responses, either supra-additive ([AV] > [A] + [V]) or subadditive ([AV] < [A] + [V]), reflecting interactions between auditory and visual processing. The auditory N1 response elicited by speech stimuli is typically suppressed by congruent visual stimuli when perceiving AV speech stimuli (Besle et al., 2004; Klucharev et al., 2003; Knowland et al., 2014; Nakamura et al., 2015; Pilling, 2009; Stekelenburg & Vroomen, 2007; van Wassenhove et al., 2005). The auditory N1 is well known to be attenuated when an incoming sound is consistent with a listener's prediction (Arnal & Giraud, 2012); therefore, it is conceivable that the subadditive N1 response during AV speech perception indicates enhanced auditory prediction caused by V speech stimuli.
Previous research on ERP-based neural processing during AV speech perception in SZ has been limited in scope (Liu et al., 2016; Senkowski & Moran, 2022; Stekelenburg et al., 2013). Notably, Stekelenburg et al. reported that AV speech stimuli evoked a subadditive N1 response in HCs, consistent with numerous prior studies, but this response was diminished in SZ. This finding suggests that the early-latency subadditive response observed in HCs reflects a reduction in the temporal uncertainty of auditory input facilitated by visual input, whereas the nonadditive response in SZ patients indicates deficits in prompt multisensory processing. However, a recent systematic review (Gröhn et al., 2021) indicates that although multisensory processes in SZ are associated with aberrant, mainly reduced, neural activity in several brain regions, as measured by EEG, multisensory integration could be intact under some conditions. Indeed, employing EEG with an audiovisual speech paradigm, Senkowski and Moran (2022) did not find evidence of alterations in integrative multisensory processing, indicating that audiovisual speech deficits can potentially be explained by unisensory auditory speech processing deficits in SZ. Therefore, further neurophysiological investigations of whether SZ patients have AV speech perception deficits are needed.
A recent systematic review (Gröhn et al., 2021) also found that a fronto-temporal region, along with the fusiform gyrus and the dorsal visual stream in the occipital–parietal lobe, is a possible key region of multisensory integration deficits in SZ. Nevertheless, the specific brain regions responsible for the subadditive N1 response and its associated brain networks have never been examined in SZ patients. Furthermore, the association between the subadditive N1 response induced by AV stimuli and clinical symptoms was not fully investigated in previous studies.
To address the above issues, we attempted to identify the specific brain regions exhibiting abnormal (subadditive) N1m responses in patients with SZ by combining source analysis with ERF data collected via MEG. MEG offers superior spatial resolution compared with EEG, enabling us to precisely examine the source activity associated with AV speech perception. In a previous MEG study, we analysed ERFs of HCs in sensor space and showed that the subadditive N1m response was induced by AV speech stimuli in only the left hemisphere (Nakamura et al., 2015). Moreover, given the emerging evidence that SZ patients have structural and functional abnormalities in the auditory cortex, especially in the left hemisphere (Hajek et al., 1997; Hirano et al., 2010, 2015, 2020; Hirano & Uhlhaas, 2021; Kasai et al., 2003; Kompus et al., 2011; Kühn & Gallinat, 2012; Modinos et al., 2013; Shinn et al., 2013; Spencer et al., 2009), we hypothesized that SZ patients may exhibit abnormal subadditive N1m responses in left-hemisphere auditory-related regions. In addition to source-based ERFs, we explored the dynamic neural networks associated with abnormal subadditive N1m responses in SZ by comparing, between the SZ and HC groups, the event-related functional connectivity centered on the cortical regions where the subadditive N1m response was observed in response to AV, A and V speech stimuli. Last, since negative symptoms appear to underlie abnormalities in facial recognition (Chan et al., 2010; Chen et al., 2012; She et al., 2017) or cortical activity during facial tasks (e.g., Ohara et al., 2020), we examined the correlations between the subadditive N1m response, its related functional connectivity and the negative symptoms of SZ.

| Participants
Twenty-three patients with SZ (mean age: 38.3 ± 11.2 years, 13 males) and 22 HCs (mean age: 41.4 ± 12.9 years, 13 males) participated in the experiment. All participants were right-handed and displayed normal hearing (pure-tone thresholds better than 40 dB hearing level at 500, 1000 and 4000 Hz) and normal or corrected-to-normal vision. The inclusion criteria were as follows: (1) no history of neurological illness or major head trauma, (2) no history of electroconvulsive therapy, (3) no history of alcohol/drug dependence or abuse and (4) a verbal IQ above 75. The HCs were recruited from the local community or were students at Kyushu University, and they were screened using the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders (SCID), nonpatient edition; the HCs and their first-degree relatives were free of Axis-I psychiatric disorders. All SZ patients were recruited from Kyushu University Hospital, and they were diagnosed based on the SCID-DSM-V. SZ symptoms were assessed with the Positive and Negative Syndrome Scale (PANSS) (Kay et al., 1991). All patients received antipsychotic medication with a mean daily dose equivalent to 445 mg of chlorpromazine (Inada & Inagaki, 2015). Detailed demographic data for all subjects are presented in Table 1. The study was approved by the Kyushu University Institutional Review Board for Clinical Trials (approval number 29-038). Informed consent was obtained from each subject after a detailed description of the study.

| Stimuli and procedure
A vowel sound /a/, spoken by a Japanese actor, was used as the auditory stimulus. This sound had a fundamental frequency of approximately 140 Hz and four formants. The frequencies of the four formants were as follows: 760 Hz (F1), 1250 Hz (F2), 2750 Hz (F3) and 3600 Hz (F4). The duration of the stimulus was 200 ms with a rise/fall time of 10 ms. This stimulus was presented binaurally through insert earphones (ER-3, Etymotic Research) at a sound pressure level of 70 dB. The visual stimulus was a grayscale image of the face of a male amateur soccer player pronouncing the same Japanese vowel sound /a/. The visual stimulus was presented for 200 ms on a black background in the centre of a screen located 1 m from the subjects' eyes. There were three stimulus conditions. In the audio-only (A) and visual-only (V) conditions, the auditory and visual stimuli were presented with a black screen and with no sound, respectively. In the audiovisual (AV) condition, the auditory and visual stimuli were presented simultaneously for 200 ms. A white circle was also presented visually for 200 ms as a target stimulus to ensure that subjects attended to the stimuli.
Figure 1 shows a schematic illustration of the experimental procedures. In each trial, one of four types of stimuli (A, V, AV or the target stimulus) was randomly presented to the participants using Stim2 software (Neuroscan Systems Co., Charlotte, NC, USA) after a fixation cross was displayed in the centre of the screen for 200 ms, followed by a blank screen for 200 ms. The interstimulus interval was randomly chosen from the range of 1800 to 2000 ms. In total, there were 300 trials, consisting of 90 trials each of A, V and AV stimuli and 30 trials of the target stimulus.
During the MEG recording, the participants were seated in a quiet and dimly illuminated magnetically shielded room at Kyushu University Hospital. They were instructed to keep their eyes open and refrain from sleeping. To ensure sustained attentiveness throughout the task, participants were instructed to press the mouse button with the index finger of their left or right hand (counterbalanced between subjects) promptly upon perceiving the target stimuli.

FIGURE 1 Schematic of the experimental procedure. In this task, there were four stimulus conditions: audio-only (a), visual-only (b), audiovisual (c) and target stimulus (d). In each trial, one of the four stimuli was randomly presented to the participants with an interstimulus interval ranging from 1800 to 2000 ms, following the presentation of a fixation cross for 200 ms and a blank screen for 200 ms.

| MEG recording and preprocessing
MEG signals were recorded using a whole-head 306-channel system (Elekta-Neuromag, Helsinki, Finland). The MEG detector array comprised 102 identical triple-sensor elements, with each sensor element comprising two orthogonally oriented planar gradiometers and one magnetometer. Magnetic fields were recorded at a sampling rate of 1000 Hz with a bandpass filter of 0.1–330 Hz. Prior to the recording session, four head-position indicator (HPI) coils were attached to the participant's scalp. Then, the locations of the fiducial points of the head (nasion and left and right auricular points), the HPI positions and approximately 200 head-surface points were recorded using a 3-D digitizer (FastTrack, Polhemus) to measure anatomical landmarks of the head with respect to the HPI coils.
After recording, a spatiotemporal signal space separation (tSSS) method was applied offline to the recorded raw data to eliminate noise originating from outside the brain (Taulu & Simola, 2006). The tSSS-reconstructed data were analysed using MNE-Python (https://mne.tools/stable/index.html). We used MEG data recorded by the 204 planar-type gradiometers. Epochs of 1800-ms duration, including a 900-ms prestimulus interval, were generated for each stimulus condition. Epochs with signal variations exceeding 2000 fT were excluded, and then independent component analysis was applied to remove ocular, myographic and cardiographic artifacts. After artifact removal, the average numbers (standard deviations) of trials for each stimulus were 82.9 (9.6) for A, 83.5 (9.1) for V and 82.4 (11.3) for AV stimuli in the HC group and 81.5 (10.3) for A, 81.6 (10.9) for V and 81.4 (10.6) for AV stimuli in the SZ group.
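The amplitude-based epoch rejection described above reduces to a peak-to-peak criterion applied per channel and per epoch. The following is a minimal sketch of that step only; the array shapes, noise level and injected artifact are hypothetical, and the actual analysis used MNE-Python's rejection machinery on real recordings.

```python
import numpy as np

# Illustrative epoch rejection: drop any epoch whose peak-to-peak signal
# on any gradiometer exceeds 2000 fT (values and shapes are hypothetical).
rng = np.random.default_rng(0)
n_epochs, n_grad, n_times = 90, 204, 1800              # 1800-ms epochs at 1 kHz
epochs = rng.normal(0, 100e-15, (n_epochs, n_grad, n_times))  # ~100 fT noise
epochs[5, 10] += 3000e-15 * np.hanning(n_times)        # inject one artifact epoch

ptp = epochs.max(axis=2) - epochs.min(axis=2)          # peak-to-peak per channel
keep = (ptp < 2000e-15).all(axis=1)                    # reject if any channel exceeds
clean_epochs = epochs[keep]
print(clean_epochs.shape[0])                           # 89 of the 90 epochs survive
```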

| Event-related magnetic fields in source space
The ERFs in response to A, V and AV stimuli were evaluated in source space. Separately for each stimulus condition, the ERF was calculated in each gradiometer by averaging artifact-free epochs and digitally filtering the averaged waveform using a Butterworth bandpass filter of 1–50 Hz. The ERF in source space was obtained for each stimulus condition by applying source localization analysis to the ERF in sensor space using minimum-norm estimation (Hämäläinen & Ilmoniemi, 1994) in the following steps. First, we conducted a coregistration between MEG and an MRI template in FreeSurfer (https://surfer.nmr.mgh.harvard.edu) based on the head-position data obtained through the HPI coils and 3-D digitizer. We then computed the forward solutions for all source locations (8196 mesh-patterned dipole locations marked in the standard brain) using a single-compartment boundary-element model. A noise covariance matrix was created using data from the prestimulus period (from −100 to 0 ms) of each stimulus condition. Finally, a noise-normalized ERF [achieved via dynamical statistical parametric mapping (dSPM)] (Dale et al., 2000) at each source location was estimated from each time point of the sensor-space ERF in each stimulus condition using an MNE inverse operator computed from the forward solution and the noise covariance matrix. The ERF obtained at each source location had an amplitude calculated relative to the prestimulus period (from −100 to 0 ms).
The ERFs in response to A, V and AV stimuli in source space are defined as [A], [V] and [AV], respectively. Drawing from our previous investigations of subadditive responses to AV stimuli (Nakamura et al., 2015), [AV] was assessed by comparing it with [A] and [V]. Initially, the difference between [A] + [V] and [AV] was calculated in each participant. Within each group, successive comparisons were made between [A] + [V] and [AV] with a time window of 10 ms from 100 to 180 ms after stimulus onset. This analysis enabled the identification of distinct cortical regions where the subadditive N1m response was prominently observed in the HC group but not in the SZ group, based on the cortical parcellation scheme outlined in Destrieux et al. (2010), in which the cerebral cortex was subdivided into 148 brain regions. Specific regions of interest (ROIs) and a time window of interest (TOI) for statistical analysis of regional ERFs were defined based on a comparison between [A] + [V] and [AV] created from the data of both the HC and SZ groups (see details in Figure S1). ERFs of individual regions for [AV] and [A] + [V] were derived from these ROIs (depicted as green lines in the source activation maps shown in Figure 2). We extracted a regional ERF separately for each stimulus condition by averaging ERFs across all vertices in the ROI. In addition, we calculated time-averaged regional ERFs of [AV] and [A] + [V] with a time window of 120–170 ms following stimulus onset (depicted as transparent yellow areas in Figure 3a).
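Per subject, the ROI-based comparison of [AV] with [A] + [V] reduces to averaging source time courses over ROI vertices and then over the 120–170 ms window. A minimal numeric sketch with synthetic dSPM-like time courses (all amplitudes, vertex counts and waveform shapes are hypothetical, constructed only to illustrate the arithmetic):

```python
import numpy as np

sfreq = 1000.0
times = np.arange(-0.1, 0.5, 1 / sfreq)            # -100 to 500 ms
rng = np.random.default_rng(1)
n_vert = 40                                        # vertices in the ROI (hypothetical)

def roi_time_courses(amp):
    """Toy N1m-like deflection peaking near 140 ms, one row per vertex."""
    peak = amp * np.exp(-((times - 0.14) ** 2) / (2 * 0.02 ** 2))
    return peak + rng.normal(0, 0.05, (n_vert, times.size))

# Synthetic condition responses; AV is constructed to be smaller than A + V
stc = {"A": roi_time_courses(1.0), "V": roi_time_courses(0.6),
       "AV": roi_time_courses(1.2)}

# Regional ERF: average across all vertices in the ROI
erf = {cond: x.mean(axis=0) for cond, x in stc.items()}

# Time-averaged amplitude in the 120-170 ms window of interest
win = (times >= 0.12) & (times <= 0.17)
av = erf["AV"][win].mean()
additive = (erf["A"] + erf["V"])[win].mean()
print(av < additive)                               # True: subadditive N1m effect
```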

| Functional connectivity analysis of the source space
We conducted a seed-based functional connectivity analysis to identify the brain network associated with the subadditive N1m response. In this study, the phase slope index (PSI) was used for functional connectivity analysis. The PSI is derived from the imaginary part of the cross-spectrum and is therefore not sensitive to false connectivity caused by volume conduction (Nolte et al., 2004, 2008). PSI analysis also enabled us to examine the direction of information flow based on the phase of the cross-spectrum. We defined the region where the subadditive N1m response was clearly observed in the HC group but not the SZ group as the seed for PSI-based functional connectivity analysis (see Figure 4a). Specifically, we first applied source localization analysis to the epoched data. In each stimulus condition, the PSI was computed from each time point of the single-trial ERFs using the seed location and each vertex. The single-trial ERFs were bandpass filtered using the continuous Morlet wavelet transformation before calculating the PSI to estimate the cross-spectrum in the theta (4–7 Hz), alpha (8–13 Hz) and beta (14–30 Hz) frequency bands. The wavelet width increased linearly from 2 to 4 cycles from the lowest to the highest frequency.
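The PSI's directionality can be illustrated with a from-scratch estimator on the complex coherency (Nolte et al., 2008): a positive value means the first signal's phase leads across the band, i.e., information flows from the first to the second signal. This FFT-based sketch deliberately omits the time-resolved Morlet-wavelet estimation used in the actual analysis, and all signal parameters below are hypothetical.

```python
import numpy as np

def phase_slope_index(x, y, sfreq, fmin, fmax, n_fft=1024):
    """Phase slope index between x and y (arrays of shape trials x samples).

    PSI = Im( sum_f conj(C(f)) * C(f + df) ), with C the complex coherency
    averaged over trials (Nolte et al., 2008). Positive values indicate
    information flow from x to y. Minimal sketch, not the MNE implementation.
    """
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(y, n_fft)
    sxy = (X * np.conj(Y)).mean(axis=0)            # cross-spectrum
    sxx = (X * np.conj(X)).mean(axis=0).real       # auto-spectra
    syy = (Y * np.conj(Y)).mean(axis=0).real
    coh = sxy / np.sqrt(sxx * syy)                 # complex coherency
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sfreq)
    idx = np.where((freqs >= fmin) & (freqs <= fmax))[0]
    return np.imag(np.sum(np.conj(coh[idx[:-1]]) * coh[idx[1:]]))

# Usage: y is a delayed copy of x, so information flows x -> y (theta band)
rng = np.random.default_rng(0)
x = rng.normal(0, 1, (60, 1024))                   # 60 trials at 1 kHz
y = np.roll(x, 8, axis=1)                          # y lags x by 8 ms
print(phase_slope_index(x, y, 1000.0, 4, 7) > 0)   # True: flow from x to y
```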
To identify the regions that had strong functional connections with the seed during AV speech perception, we created successive whole-brain PSI maps with a time window of 30 ms from 50 to 320 ms after stimulus onset, separately for each group and each frequency band. The specific brain region where the absolute value of the theta-band and alpha-band PSIs was higher during the AV condition in the HC group was identified on the parcellation of the human cortex by Destrieux et al. (2010) (see Figure 4 and Figures S2–S4); high absolute PSI values were not observed in any region in the beta frequency band. This region was labelled as the ROI for PSI-based functional connectivity (see Figure 4b). An individual PSI time course was extracted from this ROI separately for each stimulus condition and each frequency band.

| Statistical analyses
We conducted a three-way repeated-measures analysis of variance (ANOVA) on the time-averaged (120–170 ms) ERF amplitude with group (HC vs. SZ) as a between-subjects factor and stimulus condition ([AV] vs. [A] + [V]) and hemisphere (left vs. right) as within-subjects factors.
Regarding functional connectivity, we first examined the difference in the PSI time course among stimulus conditions in the HC group, separately for the theta and alpha bands, using cluster-based permutation ANOVA (Maris & Oostenveld, 2007). The detailed procedure for the cluster-based permutation ANOVA was as follows. First, the F value was calculated at each time point using a repeated-measures ANOVA with stimulus condition (three levels) as a within-subject factor. The time windows for which the F value exceeded that corresponding to p ≤ 0.05 were identified and clustered based on temporal adjacency. Second, cluster-level statistics were calculated by summing the F values within each obtained cluster. Third, each cluster-level statistic was evaluated against the empirical distribution of the maximum cluster-level statistic. This distribution was obtained by taking a thousand random partitions of the combined PSI data across the three conditions and calculating the maximum cluster-level statistic for each partition. A cluster p-value was obtained for each cluster by comparing the cluster-level statistic of interest against the empirical distribution. The null hypothesis (no difference among the three stimulus conditions) was rejected if the p-value was ≤0.05. The cluster PSI values were obtained for each stimulus condition and each frequency band by averaging the PSI values across time points included in the clusters that had a significant main effect of stimulus condition. For each cluster, we applied a two-way repeated-measures ANOVA on the cluster PSI value with stimulus condition ([A] vs. [V] vs. [AV]) as a within-subject factor and group (HC vs. SZ) as a between-subject factor.
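The cluster-based permutation procedure above can be sketched as follows. For simplicity this sketch substitutes a pointwise one-way F test (scipy's `f_oneway`) for the repeated-measures ANOVA described in the text, and permutes condition labels within subject; all data shapes are illustrative.

```python
import numpy as np
from scipy import stats

def cluster_perm_anova(data, alpha=0.05, n_perm=1000, seed=0):
    """Cluster-based permutation test over time (Maris & Oostenveld, 2007).

    data: array (n_conditions, n_subjects, n_times), e.g. PSI time courses.
    Returns a list of (cluster_slice, cluster_p_value) pairs.
    """
    rng = np.random.default_rng(seed)
    n_cond, n_subj, n_times = data.shape
    f_crit = stats.f.ppf(1 - alpha, n_cond - 1, n_cond * (n_subj - 1))

    def clusters_and_sums(d):
        f_obs = stats.f_oneway(*d)[0]              # F value at each time point
        above = f_obs > f_crit
        spans, sums, start = [], [], None
        for t in range(n_times + 1):               # walk along time, close clusters
            if t < n_times and above[t]:
                start = t if start is None else start
            elif start is not None:
                spans.append(slice(start, t))
                sums.append(f_obs[start:t].sum())  # cluster-level statistic
                start = None
        return spans, np.asarray(sums)

    spans, obs = clusters_and_sums(data)
    if not spans:
        return []
    null_max = np.zeros(n_perm)                    # max cluster statistic under H0
    for p in range(n_perm):
        shuffled = data.copy()
        for s in range(n_subj):                    # permute labels within subject
            shuffled[:, s] = data[rng.permutation(n_cond), s]
        _, perm_sums = clusters_and_sums(shuffled)
        null_max[p] = perm_sums.max() if perm_sums.size else 0.0
    return [(c, (null_max >= o).mean()) for c, o in zip(spans, obs)]
```

Applied to a (3 conditions × subjects × time points) array, clusters whose summed F value exceeds most of the permutation maxima receive small p-values, mirroring the procedure in the text.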
To investigate our hypothesis regarding the potential relationship between cerebral activation linked to AV speech perception and negative symptoms in patients with SZ, we analysed the associations between the time-averaged ERF amplitude, the cluster PSI values in the AV condition and PANSS negative symptoms with Spearman correlation analyses. We also examined their correlations with socioeconomic status (SES) score and chlorpromazine-equivalent (CPZ eq) dose.
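The symptom correlation analysis is a rank correlation between per-patient measures. A sketch with entirely hypothetical values (the direction of the simulated effect mirrors the reported association of weaker connectivity with more severe negative symptoms; these are not real data):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-patient values for 23 SZ patients (illustrative only)
rng = np.random.default_rng(3)
panss_negative = rng.integers(8, 35, 23).astype(float)        # PANSS negative scores
cluster_psi = 0.02 - 0.001 * panss_negative + rng.normal(0, 0.005, 23)

# Spearman rank correlation between connectivity strength and symptom severity
rho, p = spearmanr(cluster_psi, panss_negative)
print(rho < 0)   # True here: weaker connectivity with more severe symptoms
```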

| A decreased subadditive N1m response in the SZ group
To identify the brain regions where a subadditive response to AV stimuli was observed, we first examined the whole-brain difference between [A] + [V] and [AV] at latencies from 100 to 180 ms in 10-ms steps, created from the combined data of both the HC and SZ groups (see Figure S1). Consequently, we defined the posterior ramus of the lateral sulcus of both hemispheres as ROIs and latencies of 120–170 ms as the TOI for regional activity analyses. We then examined the difference between [A] + [V] and [AV] within each group in these ROIs and TOI (Figure 2). In the HC group, the difference between [A] + [V] and [AV] was clearly observed in the posterior ramus of the lateral sulcus (areas surrounded by green lines in Figure 2) of the left hemisphere at latencies of 120–170 ms, whereas this difference was small in the right hemisphere. Conversely, the SZ group exhibited no discernible discrepancies between [A] + [V] and [AV] in the same region (Figure 2). Figure 3a shows comparisons between the above regional ERFs of [AV] and [A] + [V] for each group and each hemisphere. In the HC group, the regional ERF of [AV] exhibited a decreased dSPM value compared with that of [A] + [V] within 100 to 200 ms after stimulus onset, specifically in the left hemisphere. This subadditive response was not evident in the right hemisphere of the HC group or in either hemisphere of the SZ group. Figure 3b shows the time-averaged (120–170 ms) ERF amplitudes for each group.

FIGURE 3 Comparison of the regional ERF between HC and SZ. (a) Comparisons of the regional event-related field (ERF) in the posterior ramus of the lateral sulcus of the left and right hemispheres (LH and RH) for the audiovisual (AV) condition with that for the sum of the audio-only (A) and visual-only (V) conditions. The transparent area surrounding each line indicates its standard error. (b) Individual time-averaged ERF amplitudes were obtained from regional ERFs in the posterior ramus of the lateral sulcus of the left and right hemispheres (LH and RH). Individual values were extracted from the time window (yellow transparent area in (a)) for the A, V and AV conditions. HC, healthy control; SZ, schizophrenia.

| Weaker theta-band functional connectivity between the auditory area and the fusiform gyrus in SZ
We evaluated PSI-based functional connectivity in the theta and alpha frequency bands using the posterior ramus of the lateral sulcus in the left hemisphere as a seed (Figure 4a). In the theta frequency band, a high positive PSI value was observed at latencies of 80–140 ms in the fusiform gyrus of the right hemisphere (areas surrounded by green lines) during the AV condition within the HC group (Figure S2). Conversely, such a connectivity pattern was not clearly observed in the SZ group. Subsequently, we extracted the PSI time course from the right fusiform gyrus for each group and each stimulus condition (see Figure 4c). In the HC group, the temporal dynamics of the PSI values exhibited distinct variations among the stimulus conditions, with the PSI values exhibiting positive trends at latencies of approximately 100 ms in both the AV and V conditions (see Figure 4c). Cluster-based permutation ANOVA on the PSI time course of the HC group revealed a main effect of stimulus condition in a specific time window (63–128 ms; transparent yellow areas in Figure 4c; cluster p = 0.015). Figure 4b shows time-averaged whole-brain PSI maps, calculated using this specific time window, for the AV, A and V conditions. Conversely, in the SZ group, the PSI time course displayed near-zero values in the same time window across all stimulus conditions (see Figure 4c). The individual cluster PSI values were obtained using the above time window and are displayed separately for each group and stimulus condition in Figure 4d. We conducted a two-way repeated-measures ANOVA on the cluster PSI values with group as a between-subject factor and stimulus condition as a within-subject factor. There were significant main effects of group (F[1,43] = 4.80, p = 0.034) and stimulus condition (F[2,86] = 12.88, p < 0.001) but no significant interaction (F[2,86] = 2.30, p = 0.14). For post hoc analyses, we also conducted multiple comparison tests among the three stimulus conditions with Bonferroni correction. Multiple comparisons between stimulus conditions showed that the PSI values were significantly higher in the [AV] and [V] conditions than in the [A] condition ([AV] vs. [A]: Bonferroni-corrected p < 0.001; [V] vs. [A]: Bonferroni-corrected p = 0.002). There was no significant difference between the [AV] and [V] conditions (Bonferroni-corrected p = 0.18). In summary, our investigation revealed robust functional connectivity between the auditory-related region exhibiting a subadditive N1m response and the fusiform gyrus during the presentation of visual speech (face) stimuli in the HC group. Remarkably, this connectivity was compromised in patients with SZ. In the alpha frequency band, there were no group differences in PSI-based functional connectivity, although high positive and negative values were confirmed at latencies of 110–140 and 200–260 ms, respectively, in the fusiform gyrus of the right hemisphere (see detailed results and figures in the Supporting Information).

| DISCUSSION
The current study investigated abnormalities in the subadditive response during AV speech perception and its related functional connectivity in patients with SZ, utilizing MEG to elucidate the neural mechanisms underlying deficits in AV speech integration in SZ. Within the HC group, a subadditive response was clearly observed in the posterior ramus of the lateral sulcus within the left hemisphere. Moreover, this particular region exhibited robust theta-band functional connectivity with the fusiform gyrus of the right hemisphere when exposed to facial stimuli. In contrast, SZ patients exhibited no such subadditive response and demonstrated diminished functional connectivity compared with that of the HC group. Our findings demonstrate impairments in the neural network responsible for the integration of AV stimuli during speech processing in SZ. The source-based ERF analysis in the present study extended our previous finding that the subadditive response during AV speech perception was specific to the left hemisphere (Nakamura et al., 2015) and revealed that the specific source of the subadditive response was the posterior ramus of the lateral sulcus within the left hemisphere. This region is considered a pivotal component of the cerebral network that facilitates the coordination of auditory and motor representations of speech (Hickok & Poeppel, 2004, 2007). Given that the visual component of AV speech stimuli conveys motor representations of speech, it is plausible that the subadditive response observed within this region may be closely linked to the integration of auditory and motor information during speech perception.
One notable finding in this study is the attenuation of the subadditive response associated with AV speech perception in patients with SZ. Based on the above discussion, this result indicates a decline in the neural mechanisms responsible for integrating auditory and motor speech information in SZ, leading to an inability to effectively utilize visual motor information during auditory speech perception. Our current findings align with those of a previous EEG study by Stekelenburg et al. (2013). Nonetheless, Senkowski and Moran (2022) reported that N1 suppression during AV speech perception was preserved in SZ. The inconsistency among these studies may stem from differences in the task procedures, equipment (EEG vs. MEG) and analytical methods (electrode vs. source space) used. In particular, our study, as well as that of Stekelenburg et al., involved the passive presentation of AV speech stimuli in a quiet environment, whereas Senkowski and Moran engaged participants in an AV speech recognition task under noisy conditions. Conceivably, patients with SZ may encounter difficulties in the unconscious integration of cross-modal information, even though they can utilize visual cues when auditory input proves insufficient for precise speech comprehension. We also argue that examining the subadditive N1 response in source space and its associated networks using MEG, with its excellent temporal and spatial resolution, allowed us to uncover the detailed audiovisual speech deficits within the cortex in SZ.
In contrast to AV speech perception, several ERP studies investigating nonspeech AV integration tasks (Stone et al., 2011; Wynn et al., 2014) did not uncover such deficits. Notably, Stone et al. (2011) administered an AV integration task utilizing an image of a soccer ball and a low tone (540 Hz) simulating a bouncing soccer ball as the visual and auditory stimuli, respectively, to assess abnormalities in AV integration effects on ERPs in SZ. They reported clear subadditive responses in both the SZ and HC groups. Combining the findings of our present study with those of previous research, it is conceivable that the reduction in the early-latency subadditive response may represent a speech-specific neurophysiological deficit in SZ.
We identified the posterior ramus of the lateral sulcus in the left hemisphere as the specific brain region associated with the early-latency subadditive response deficits in SZ. To the best of our knowledge, this study is the first to illustrate neurophysiological lateralization deficits related to AV speech perception using EEG/MEG in patients with SZ. Utilizing fMRI and video clips (combinations of speech and gestures), Straube et al. (2014) showed that SZ patients demonstrated specific reductions in functional connectivity between the left superior temporal sulcus (STS) and the bilateral inferior frontal gyrus (IFG) during the processing of metaphoric gestures. Importantly, left-lateralized language-related functional and structural deficits have been repeatedly demonstrated in SZ (Hirano et al., 2008, 2010, 2020; Hirano & Uhlhaas, 2021; Sommer et al., 2003). In addition, a recent MRI study demonstrated an association between disruptions of left language-processing areas (e.g., Heschl's gyrus and planum temporale) and verbal-related symptoms in SZ (Jung et al., 2019). Thus, left-dominant functional and structural abnormalities in speech-related brain areas may underlie the impaired AV speech perception in SZ.
In addition to the subadditive response, we investigated the associated dynamic functional connectivity. The posterior ramus of the lateral sulcus in the left hemisphere exhibited robust theta-band functional connectivity with the fusiform gyrus in the right hemisphere when exposed to AV and V speech stimuli. Furthermore, based on the positive values of the phase slope index (PSI) observed at latencies of approximately 100 ms, information was estimated to flow from the posterior ramus of the lateral sulcus to the fusiform gyrus. The fusiform gyrus is known to play a critical role in face processing (Kanwisher et al., 1997; Kanwisher & Yovel, 2006), suggesting that this functional connectivity may be closely associated with the transmission of auditory speech information to face-processing regions. As this connection was not detected in the A condition, it is also conceivable that it is mainly triggered by face processing, thereby linking the speech-related brain area with the face-related brain region. Moreover, in the alpha frequency band, we identified bidirectional functional connectivity, although no significant group differences were observed. Specifically, by considering the sign of the PSI value (positive or negative) within the fusiform gyrus, we estimated information flow from the posterior ramus of the lateral sulcus to the fusiform gyrus at latencies around 100 ms and an inverted flow pattern at latencies around 200 ms. A diffusion-weighted MRI study by Blank et al. (2011) also revealed direct structural connections between speech-related brain regions (the anterior, middle and posterior superior temporal sulcus) and the face recognition area (fusiform face area). The dynamic functional connectivity we identified in the present study approximately corresponds to this structural network.
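For readers unfamiliar with how the PSI assigns a direction, a minimal toy simulation (a numpy sketch under assumed signal names, delay and frequency band; this is not the study's source-space pipeline) shows that the index is positive when the first signal, here playing the role of the seed, temporally leads the second:

```python
import numpy as np

rng = np.random.default_rng(0)
n_epochs, n_times, delay = 200, 256, 5

# Toy seed signal x; target y is a delayed, noisier copy of x, so by
# construction information flows from x (seed) to y (target).
x = rng.standard_normal((n_epochs, n_times))
y = np.roll(x, delay, axis=1) + 0.5 * rng.standard_normal((n_epochs, n_times))

# Epoch-averaged auto- and cross-spectra via FFT.
X, Y = np.fft.rfft(x), np.fft.rfft(y)
Sxy = (X * np.conj(Y)).mean(axis=0)
Sxx = (np.abs(X) ** 2).mean(axis=0)
Syy = (np.abs(Y) ** 2).mean(axis=0)

# Complex coherency, then the phase slope index summed over a band of
# neighbouring FFT bins: PSI = Im( sum_f conj(C(f)) * C(f + df) ).
C = Sxy / np.sqrt(Sxx * Syy)
psi = np.imag(np.sum(np.conj(C[4:60]) * C[5:61]))  # illustrative bin range

print(psi > 0)  # True: positive PSI -> seed leads target (x drives y)
```

The intuition is that when one signal consistently lags another, the phase of the coherency grows approximately linearly with frequency, and the imaginary part of conj(C(f))·C(f + δf) picks up the sign of that phase slope.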
The diminished theta-band functional connectivity observed between speech- and face-processing brain regions in SZ was closely associated with the negative (deficit) symptoms of SZ. These findings have important implications for unravelling the neurophysiological mechanisms underlying deficits in AV speech integration in SZ. We posit that such aberrant connectivity may stem from both structural and functional alterations within the fusiform gyrus in SZ (Lee et al., 2002; Maher et al., 2019; Ohara et al., 2020; Onitsuka et al., 2003, 2006; Quintana et al., 2003; Takahashi et al., 2006; Walther et al., 2009). In our previous MEG study (Ohara et al., 2020), we observed a decreased ERF response to facial stimuli (M170), with the right fusiform gyrus identified as the signal source. Importantly, this decrease in M170 was found to correlate with the severity of negative symptoms, mirroring the correlation between functional connectivity and negative symptoms observed in the present study. Furthermore, this abnormal functional connectivity may be linked to social communication impairments in SZ, particularly through the influence of negative symptoms, as a certain domain of negative symptoms is known to exert a profound impact on social outcomes (Harvey et al., 2017, 2019; Ventura et al., 2015).
We acknowledge that the conclusions of this study are tempered by certain caveats. First, given that most of our patients with SZ were in the chronic phase and their psychotic symptoms were relatively mild, it is important to evaluate the early phase of SZ, including clinical high-risk individuals and patients with first-episode SZ, as well as longitudinal changes during the early phase of SZ. Second, the current study does not provide direct insights into whether structural abnormalities, such as reduced grey matter volume or deficits in white matter tracts, are the underlying basis of these neurophysiological impairments in SZ; future studies are needed to investigate this relationship. Third, the lack of behavioural data prevented us from investigating associations between task performance and brain activity in patients with SZ. Finally, it is unknown how the abnormalities observed here may have been affected by long-term use of antipsychotic medication in patients with SZ, although the subadditive N1m response to AV speech stimuli and the related functional connectivity were not correlated with antipsychotic dosage.
In conclusion, our results demonstrate abnormalities in connectivity between speech- and face-related areas in SZ. We propose that these alterations in the neural network that integrates speech and facial information may represent the neural basis of social communication dysfunctions in SZ.

FIGURE 2 Comparison of the subadditive response to audiovisual stimuli between HC and SZ in source space. Group-averaged source distribution maps (HC: upper maps; SZ: lower maps) in both the left and right hemispheres, displaying the subadditive response (i.e., ([A] + [V]) − [AV]) from 100 to 180 ms in 10 ms steps. The area surrounded by a green line in each map is the posterior ramus of the lateral sulcus. A, audio-only; V, visual-only; AV, audiovisual; HC, healthy control; SZ, schizophrenia; dSPM, dynamic statistical parametric mapping.

FIGURE 4 Comparison of seed-based functional connectivity in the theta frequency range associated with the subadditive response to audiovisual stimuli between HC and SZ. (a) The seed for the functional connectivity analysis was set to the posterior ramus of the left lateral sulcus. (b) Group-averaged source distribution maps (bottom view) displaying theta-band functional connectivity based on phase slope index (PSI) analysis during the audio-only [A], visual-only [V] and audiovisual [AV] conditions. The area surrounded by the green line on each map is the right fusiform gyrus. (c) Time courses of group-averaged PSI-based theta-band functional connectivity in the right fusiform gyrus during the A, V and AV conditions. The transparent area surrounding each line indicates its standard error. The yellow transparent area is the time window in which a significant main effect of stimulus condition was confirmed in the HC group by a cluster-based permutation analysis of variance. (d) Individual PSI values extracted from the time window (yellow transparent area in (c)) for the A, V and AV conditions. HC, healthy control; SZ, schizophrenia; LH, left hemisphere; RH, right hemisphere.

FIGURE 5 Correlation between PANSS negative scores and PSI values in the SZ group. Scatter plot showing the relationship between the PANSS negative score and the phase slope index (PSI) value extracted from the functional connectivity between the posterior ramus of the left lateral sulcus and the right fusiform gyrus during the audiovisual condition. SZ, schizophrenia.
TABLE 1 Demographic and clinical characteristics of the subjects.
Abbreviations: CPZ equiv, chlorpromazine equivalents; PANSS, Positive and Negative Syndrome Scale; SES, socioeconomic status.