Musicianship and melodic predictability enhance neural gain in auditory cortex during pitch deviance detection

Abstract When listening to music, pitch deviations are more salient and elicit stronger prediction error responses when the melodic context is predictable and when the listener is a musician. Yet, the neuronal dynamics and changes in connectivity underlying such effects remain unclear. Here, we employed dynamic causal modeling (DCM) to investigate whether the magnetic mismatch negativity response (MMNm)—and its modulation by context predictability and musical expertise—are associated with enhanced neural gain of auditory areas, as a plausible mechanism for encoding precision‐weighted prediction errors. Using Bayesian model comparison, we asked whether models with intrinsic connections within primary auditory cortex (A1) and superior temporal gyrus (STG)—typically related to gain control—or extrinsic connections between A1 and STG—typically related to propagation of prediction and error signals—better explained magnetoencephalography responses. We found that, compared to regular sounds, out‐of‐tune pitch deviations were associated with lower intrinsic (inhibitory) connectivity in A1 and STG, and lower backward (inhibitory) connectivity from STG to A1, consistent with disinhibition and enhanced neural gain in these auditory areas. More predictable melodies were associated with disinhibition in right A1, while musicianship was associated with disinhibition in left A1 and reduced connectivity from STG to left A1. These results indicate that musicianship and melodic predictability, as well as pitch deviations themselves, enhance neural gain in auditory cortex during deviance detection. Our findings are consistent with predictive processing theories suggesting that precise and informative error signals are selected by the brain for subsequent hierarchical processing.

The effect of context predictability on sound salience has been proposed to reflect the neural weighting of unexpected events by the precision afforded to sensory inputs (Quiroga-Martinez et al., 2019a; Ross & Hansen, 2016; Vuust, Dietz, Witek, & Kringelbach, 2018).
Such a precision-weighting mechanism would allow the brain to select informative sensory signals for further processing (Feldman & Friston, 2010; Friston et al., 2020; Hohwy, 2012). Research on attention has linked this selection to a modulation of postsynaptic gain in which the activity of neurons representing attended features and objects is enhanced (Garrido, Rowe, Halász, & Mattingley, 2018; Rabinowitz, Goris, Cohen, & Simoncelli, 2015; Reynolds & Desimone, 1999). Such gain modulation likely arises from a change in the strength of intrinsic (i.e., within-region) connections controlling the excitability of brain areas (Auksztulewicz et al., 2017, 2018; Auksztulewicz & Friston, 2015), usually ascribed to NMDA receptor function and synchronous interactions between fast-spiking inhibitory interneurons and pyramidal cells. However, it remains unclear whether the same gain mechanisms operate when the salience of a sound is driven, not endogenously by selective attention, but rather exogenously by the predictability of successive stimuli.
The effect of musicianship on sound salience has also been suggested to rely on precision-driven mechanisms. Vuust et al. (2018) proposed that musicians possess a more precise predictive model of musical auditory signals than nonmusicians, a view that has behavioral support (Hansen & Pearce, 2014; Hansen, Vuust, & Pearce, 2016). If musicians have a fine-grained representation of musical tuning (which facilitates deviance detection and leads to enhanced MMN responses), the same precision-driven gain mechanisms above may also operate when sound salience is enhanced by musical expertise.
Here, we characterized the neuronal dynamics and effective connectivity underlying the salience of surprising musical sounds and its modulation by predictability and musical expertise. We employed dynamic causal modeling (DCM) of magnetoencephalography (MEG) data from a previous study investigating magnetic MMN responses (MMNm) in melodic sequences (Quiroga-Martinez et al., 2019b). In that experiment, musicians and nonmusicians listened to highly predictable stimuli (a repeated four-note pattern) and less predictable stimuli (complex, less-repetitive melodies). We found that pitch deviants were more easily detected and generated larger MMNm responses in highly predictable compared to less predictable melodies, and in musicians compared to nonmusicians.
We based the DCM analyses on an auditory network comprising bilateral primary auditory cortex (A1), bilateral superior temporal gyrus (STG), and the right frontal operculum (rFOP). We asked whether the MMNm, and its modulation by predictability and musical expertise, relied on changes in intrinsic connectivity within A1 and STG, as a plausible synaptic mechanism implementing precision-weighting of prediction error. We compared this to the alternative explanation that predictability and expertise modulate propagation of prediction and error signals through forward and backward extrinsic (i.e., between-regions) connectivity, which is typically associated with short-term plasticity, sensory learning and model updating. Thus, we contrasted models in which intrinsic, forward, and/or backward connections were allowed to explain auditory evoked responses and their modulation as measured with MEG.

| Participants
Twenty musicians and 20 nonmusicians were included in the study.
These participants are part of a larger group of 24 nonmusicians and 26 musicians whose data have been analyzed and reported elsewhere (Quiroga-Martinez et al., 2019a, 2019b; Quiroga-Martinez et al., 2020). The four nonmusicians and six musicians excluded were those for whom high-quality MRI images were not available, either because of artifacts or because the participant abstained from the MRI session. Musical expertise and musical competence (Table 1) were assessed with the Goldsmiths Musical Sophistication Index (GMSI) (Müllensiefen, Gingras, Musil, & Stewart, 2014) and the Musical Ear Test (MET) (Wallentin, Højlund, Friis-Olivarius, Vuust, & Vuust, 2010).

| Stimuli
In the experiment, we included conditions with high-predictability (HP) and low-predictability (LP) stimuli. HP stimuli comprised simple melodies consisting of a four-note repeated pitch pattern that has often been used in musical MMNm paradigms and is known as the Alberti bass (Vuust et al., 2011, 2012; Vuust, Liikala, Näätänen, Brattico, & Brattico, 2016). LP stimuli consisted of a set of major and minor versions of six novel melodies, which had a much less repetitive internal structure and spanned a broader local pitch range than HP melodies. The predictability of these stimuli was measured in terms of Shannon entropy with IDyOM, a computational model of auditory expectations (Pearce, 2005). Briefly, IDyOM estimates the surprise value (referred to as "information content") of different continuations in a melody based on the probability of melodic patterns that appeared previously in the melody or in a long-term training corpus. Entropy, which is the expected value of surprise, is maximal when all continuations are equally plausible and is minimal when a single continuation is highly likely. The corresponding analyses revealed higher entropy values for the LP than the HP condition (see Quiroga-Martinez et al., 2019a for details).
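The relation between information content and entropy described above can be illustrated with a minimal sketch. It does not reproduce IDyOM; the two probability distributions over four possible continuations are hypothetical values chosen only to show the two extremes.

```python
import math

def information_content(p):
    """Surprise, in bits, of an event that occurs with probability p."""
    return -math.log2(p)

def entropy(dist):
    """Shannon entropy in bits: the expected information content."""
    return sum(p * information_content(p) for p in dist if p > 0)

# Hypothetical distributions over four possible next notes.
uniform = [0.25, 0.25, 0.25, 0.25]  # all continuations equally plausible
peaked = [0.97, 0.01, 0.01, 0.01]   # a single continuation is highly likely

h_uniform = entropy(uniform)
h_peaked = entropy(peaked)
print(h_uniform, h_peaked)
```

With four alternatives the uniform case reaches the maximum of 2 bits, whereas the peaked case falls well below it, mirroring the contrast between LP and HP contexts.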
Individual melodies were 32 notes long, lasted 8 s, and were pseudo-randomly transposed between 0 and 5 semitones upward.
The presentation order of the melodies was pseudorandom within each condition. After transposition, the pitch-range of the LP condition spanned 31 semitones from B3 (F0 ≈ 247 Hz) to F6 (F0 ≈ 1,397 Hz). HP melodies were transposed to two different octaves to cover approximately the same pitch range as LP melodies.
For stimulus delivery, a pool of 31 standard piano tones was created with the "Warm-grand" sample in Cubase (Steinberg Media Technology, version 8). Each tone was 250 ms long, was peak-amplitude normalized and had 3-ms-long fade-in and fade-out to pre-  Figure 1), selecting some of these groups, and choosing randomly any of the four places within a group with equal probability.
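The fundamental frequencies quoted for B3 and F6 can be checked with a short sketch. It assumes twelve-tone equal temperament with A4 = 440 Hz and MIDI note numbering in which C4 = 60; neither assumption is stated in the text, and the function name is illustrative.

```python
def midi_to_hz(midi_note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note (assumes A4 = MIDI 69 = 440 Hz)."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)

B3, F6 = 59, 89  # MIDI note numbers under the C4 = 60 convention (assumption)

f0_low = midi_to_hz(B3)
f0_high = midi_to_hz(F6)
n_tones = F6 - B3 + 1  # number of distinct semitone steps, both endpoints included

print(round(f0_low), round(f0_high), n_tones)
```

Under these assumptions the range contains 31 distinct pitches when both endpoints are counted, which agrees with the size of the tone pool.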
The order of appearance of the different types of deviants was pseudorandom, so that no deviant followed another deviant of the same feature. The selection of four-note groups was counterbalanced among melodies, under the constraints of a combined condition (i.e., melody and bass accompaniment) that was included to assess the predictive processing of simultaneous musical streams (see Quiroga-Martinez et al., 2019a, for further details). The analysis of the combined condition is beyond the scope of this article and will be reported elsewhere. LP and HP conditions were counterbalanced across participants and always followed the combined condition.
FIGURE 1 Example of the melodies used in the high predictability (HP) and low predictability (LP) conditions. Deviants are indicated with colors. Only pitch deviants were analyzed in this article.

| Experimental procedure
Participants received oral and written information, completed musical expertise questionnaires and put on MEG-compatible clothes. We then digitized their head shapes for co-registration with anatomical images and head-position tracking. During the recording, participants were sitting upright in the MEG scanner looking at a screen. Before presenting the musical stimuli, their individual hearing threshold was measured through a staircase procedure using a pure tone with a frequency of 1 kHz. The sound level was set at 60 dB above threshold.
We instructed them to watch a silent movie of their choice, ignore the sounds and move as little as possible. Participants were informed there would be musical sequences playing in the background interrupted by short pauses so that they could take a break and adjust their posture. Sounds were presented through isolated MEG-compatible ear tubes (Etymotic ER-30). The recording lasted approximately 90 min, and the whole experimental session took between 2.5 and 3 hr, including consent, musical expertise tests, preparation, instructions, breaks, and debriefing.

| MEG recording and preprocessing
Brain magnetic fields were recorded with an Elekta Neuromag MEG TRIUX system with 306 channels (204 planar gradiometers and 102 magnetometers) and a sampling rate of 1,000 Hz. Continuous head position information (cHPI) was obtained with four coils attached to the forehead and the mastoids. Offline, the temporal extension of the signal source separation (tSSS) technique (Taulu & Simola, 2006) was used to isolate signals coming from inside the skull employing Elekta's MaxFilter software (Version 2.2.15). This procedure included movement compensation for all participants except two nonmusicians, for whom continuous head position information was not reliable due to suboptimal placement of the coils. These participants, however, evinced reliable auditory event-related fields (ERFs), as verified by visual inspection of the amplitude and polarity of the P50(m) component. Electrocardiography, electrooculography, and independent component analysis were used to correct for eye-blink and heartbeat artifacts, employing a semiautomatic routine (FastICA algorithm and functions "find_bads_eog" and "find_bads_ecg" in MNE-Python; Gramfort et al., 2013). Visual inspection of the rejected components served as a quality check.

| Source localization and network structure
To identify the auditory networks underlying the processing of pitch MMNm responses, we localized the neural generators of the standard and deviant evoked responses. For this, we used Multiple Sparse Priors implemented in SPM12 (version 7478 Heschl's gyrus (rA1; x = 46, y = −16, z = 0), left (lSTG; x = −58, y = −6, z = 6) and right (rSTG; x = 56, y = 2, z = −1) anterior STG, and rFOP (x = 50, y = 4, z = 12). The coordinates of these peaks were used as spatial priors for the five nodes or sources of our DCM network.
First, we include anterior STG (instead of posterior STG or planum temporale), which has been related to the processing of pitch sequences (Gander et al., 2019; Patterson, Uppenkamp, Johnsrude, & Griffiths, 2002). Second, we include the FOP, which is posterior to the inferior frontal gyrus node used previously. To determine whether the source estimates in the FOP node were an artifact of source leakage, we evaluated the evidence for models with and without this node using Bayesian model comparison.

| Model structure and comparisons
We defined a network comprising five sources: rA1, lA1, rSTG, lSTG, and rFOP. Forward connections were set from A1 to STG and from STG to FOP, whereas backward connections were set from FOP to STG and from STG to A1. The A1 nodes received simulated auditory thalamic input modeled with a temporal Gaussian bump function, peaking at 60 ms after sound onset. We used point dipoles to model each source. The prior location of each dipole was obtained from the source localization estimates above and the dipole orientation and location were optimized during model fitting. We assumed interhemispheric dipole symmetry, as warranted by the bilateral auditory generators.
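The thalamic input described above can be pictured with a minimal sketch of a Gaussian bump over the modeled time window. Only the 60 ms peak latency comes from the text; the width of 16 ms, the unit amplitude, and the function name are illustrative assumptions, and this is not SPM's own input parameterization.

```python
import math

def gaussian_bump(t_ms, peak_ms=60.0, width_ms=16.0):
    """Unit-amplitude Gaussian input; width_ms is an assumed standard deviation."""
    return math.exp(-((t_ms - peak_ms) ** 2) / (2 * width_ms ** 2))

times = range(0, 301)                  # 0 to 300 ms in 1 ms steps
u = [gaussian_bump(t) for t in times]  # simulated input to the A1 nodes

peak_time = max(times, key=lambda t: u[t])
print(peak_time, u[peak_time])
```

In an actual model fit, the latency and dispersion of such an input would typically be free parameters with priors rather than fixed constants.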
In a first level (within subject) DCM analysis, we modeled the average evoked response from 0 to 300 ms after sound onset, separately for each of the two predictability conditions. We defined the standard sound as the baseline (reflected in the default connectivity matrix A) and allowed certain connections to change during processing of the deviant sound (as defined in a connectivity matrix B). Note that B-parameters effectively encode the difference between standards and deviants and, therefore, the modulation of effective connectivity that underwrites the MMN response. By switching on and off different B parameters, we were able to

| Group-level analyses
First-level B-parameter estimates were submitted to a second level (between subject or group) analysis using Parametric Empirical Bayes (PEB), a technique in which the group-level variance is used as a hyperprior to constrain the random effects on first-level parameters (Zeidman et al., 2019). The second level model was a general linear model (GLM) that enabled us to assess the evidence that (a specific set of) B-parameters were modulated by stimulus deviance, melodic predictability, musicianship, or the predictability-by-musicianship interaction. The parameters of the second level model constitute an interaction between each factor of the GLM and each B-parameter of a DCM: for example, the effect of melodic predictability on the change in intrinsic connectivity within left A1.
All factors in the GLM (i.e., design matrix) were mean-centered.
To test our hypotheses, we assessed the evidence that the above factors had an effect on the B-parameters, by comparing the evidence for all models in which the factor is switched on, with that of all models in which it is switched off (including the null model). Note that an effect of stimulus deviance is simply a nonzero B-parameter, that is modeled with the constant term in the GLM. The above analysis was repeated for a series of planned comparisons for each family of first level DCMs, that is, prespecified hypotheses about the connections mediating the mismatch responses.
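A minimal sketch of a mean-centered design matrix of this kind is given below. It assumes one row per participant and predictability condition (40 participants, two conditions) and a hypothetical 1/0 coding of the factors; the actual row layout and coding used in the study are not specified here.

```python
def center(col):
    """Subtract the column mean so that the constant term captures the average effect."""
    mean = sum(col) / len(col)
    return [x - mean for x in col]

# Hypothetical coding: group 1 = musician, 0 = nonmusician; cond 1 = HP, 0 = LP.
rows = [(group, cond)
        for group in (1, 0)
        for _ in range(20)   # 20 participants per group
        for cond in (1, 0)]

musician = center([g for g, _ in rows])
predictable = center([c for _, c in rows])
interaction = [m * p for m, p in zip(musician, predictable)]
constant = [1.0] * len(rows)  # models the mean B-parameter, i.e. the effect of deviance

X = list(zip(constant, predictable, musician, interaction))
print(len(X), X[0])
```

Because the regressors are centered, the constant column can be read as the average deviant-related change in connectivity across groups and conditions, while the other columns capture departures from that average.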
In a complementary analysis, we used Bayesian model averaging to estimate the effects parameterized by the second level model, that is, the posterior density over model parameters weighted by the posterior probability of the models considered. Finally, in an exploratory analysis, we used Bayesian model reduction, which performs a "greedy" search over all possible models (including those outside our hypothesis space) by beginning with a full model that includes all the above effects, then pruning away redundant parameters (that are not necessary to account for the data and just add to model complexity). The (log) probability that a B-parameter is modulated by a factor is given as the difference in log evidence between second level models with and without the effect in question.
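How a difference in log evidence translates into a posterior probability can be sketched as follows. The example pools the evidence of models with and without an effect, assuming a flat prior over models; the log-evidence values are hypothetical and the function names are illustrative, not part of SPM.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log of a sum of exponentials."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def posterior_effect_on(log_ev_on, log_ev_off):
    """Posterior probability that the effect is present, under a flat prior over models."""
    diff = log_sum_exp(log_ev_on) - log_sum_exp(log_ev_off)
    return 1.0 / (1.0 + math.exp(-diff))

# Hypothetical log evidences for models with and without a given effect.
with_effect = [-100.0, -101.0]
without_effect = [-103.0, -104.0]

p_on = posterior_effect_on(with_effect, without_effect)
print(round(p_on, 3))
```

Here the pooled log-evidence difference is 3, which corresponds to a posterior probability of roughly .95 for the effect.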

| Effective connectivity underlying the MMN
Compared to standard sounds, deviant sounds reduced the inhibitory intrinsic connections in A1 and STG and inhibitory backward connections from STG to A1. This is reflected in the high posterior probabilities (>.90) of "backward" and "intrinsic" families (Figure 4a, first
FIGURE 3 State equations (left) and generative model (right) based on the canonical microcircuit (Bastos et al., 2012). The activity of four populations of neurons (indicated by the x vectors and marked with different colors) evolves according to a set of coupled differential equations defined by the excitatory (red arrows) and inhibitory (blue arrows) connections in the network and a set of intrinsic (matrix G) and extrinsic (matrix A) synaptic weights. Extrinsic synaptic weights pertain to connections between brain regions or nodes, here denoted by i. See main text for further details.

| Effect of melodic predictability
We found reduced intrinsic connectivity in rA1 for HP compared to LP melodies, as shown by the relatively high probability of the "intrin-

| Interaction between predictability and expertise
We did not observe strong evidence for a predictability-by-expertise

| DISCUSSION
In this study, we found that the MMN responses elicited by surprising sounds in music listening, and the effects of predictability and expertise on these responses, rest on disinhibition of auditory areas, as indicated by reduced intrinsic connectivity within A1 and STG and backward connectivity from STG to A1. This supports the notion that neural gain, as a plausible mechanism for mediating precision-weighted prediction error, underlies the salience of surprising sounds and the strength of the neural responses they generate.

| Connectivity patterns underlying the MMN
The main contributors to the MMN were modulations of intrinsic connectivity within bilateral A1 and rSTG, and bilateral backward connectivity from STG to A1. Reduced intrinsic connectivity implies an increase in the excitability of neural populations and has been interpreted as a salience-related enhancement of neural gain in response to deviant sounds (Auksztulewicz et al., 2017, 2018; Auksztulewicz & Friston, 2015). In other words, mistuned sounds may attract attentional resources and thus be prioritized at the earliest stages of auditory cortical processing.
That we found a reduction in top-down inhibition from secondary to primary auditory areas and a lack of modulation of forward connections contrasts with most previous studies in which both forward and backward connections show oddball-related effects (Auksztulewicz & Friston, 2015; Chennu et al., 2016; Garrido, Kilner, Kiebel, & Friston, 2007; Garrido, Kilner, Kiebel, et al., 2009; Lumaca, Dietz, Hansen, Quiroga-Martinez, & Vuust, 2020; Schmidt et al., 2013). From a predictive coding perspective (Friston, 2005; Garrido, Kilner, Stephan, et al., 2009; Huang & Rao, 2011), forward and backward communication between brain areas reflects the update of predictive models by prediction error. Thus, the reason why backward connections were weakened and forward connections remained unchanged might be that out-of-tune deviants (which violate a rather low-level musical regularity) are handled locally in the auditory cortex and elicit little model updating (i.e., learning) at higher stages where melodic expectations are likely processed. This, in turn, might indicate that out-of-tune sounds are heard as occasional, attention-grabbing "wrong" notes, rather than structurally novel events demanding a change in the current predictive model of the melody. In other words, a tone in a melody could still be predicted and recognized, even when it is saliently out-of-tune.
FIGURE 6 Event-related field of the MMNm (difference between deviants and standards) for each condition, group, and hemisphere, as observed in the experiment and predicted by DCM. The time courses correspond to the average of representative left (1611, 1621, 0231, and 0241) and right (2411, 2421, 1331, and 1341) auditory channels. Shaded areas depict 95% confidence intervals. HP, high predictability; LP, low predictability. Note that, while data were originally baseline corrected, for DCM they were mean centered as shown in the figure.
A similar view has been proposed by Koelsch, Vuust, and Friston (2018), who suggest that, in typical musical MMN designs, the higher-order predictive model is so strong that deviant sounds elicit prediction error responses that do not get resolved at higher processing levels. As the authors put it, deviant sounds "fall on deaf ears" and "keep knocking on the door" (p. 6).
Another finding that differs from previous research is the lack of involvement of frontal areas in the generation of the MMN, as indicated by the low probability of the opercular family. This is consistent with the lack of frontal generators previously reported for the same dataset (Quiroga-Martinez et al., 2019b) and in a recent fMRI study using simple musical stimuli (Lumaca et al., 2020). This suggests that the opercular peak found in the present source-level statistical analyses may reflect source leakage. Thus, the acoustic deviations introduced (i.e., out-of-tune sounds) may have been resolved at low-level processing stages in the temporal lobe, without engaging frontal areas typically involved in the sequential processing of sounds (Koelsch et al., 2002, 2009). This may have been reinforced by the fact that participants were instructed to ignore the sounds and watch a film instead.
Interestingly, an EEG study using the same stimuli as here found that MMN responses were similarly modulated by melodic predictability in participants with congenital amusia (a condition that disrupts pitch processing) and controls (Quiroga-Martinez et al., 2021). Since amusia most likely arises from reduced connectivity between temporal and frontal areas (Albouy et al., 2013; Peretz, 2016), this further indicates that the processes underlying the mistuning MMN and its modulation by predictability and musicianship are restricted to local auditory areas in the temporal lobe. Note, however, that exploratory Bayesian model reduction indicated that connections between rFOP and rSTG may have been modulated by predictability and the predictability-by-expertise interaction. This could indicate that, as sounds become more salient, higher-order brain areas are engaged.
However, further research is needed to properly assess this claim and dissociate it from source leakage.

| Enhancement of neural gain in predictable melodies
The strength of intrinsic (inhibitory) connections in rA1 was reduced in predictable compared to less predictable melodies. Such connectivity changes may thus underlie the stronger MMN response observed for the former. This is consistent with the hypothesis that the stimulus-driven increase in predictive precision enhances the sensitivity to upcoming sensory signals, thus providing evidence for the role of gain modulation in precision weighting of prediction error. This effect was found only in the right hemisphere, which may reflect the fact that musical pitch processing in the general population is predominantly right-lateralized (Albouy, Benjamin, Morillon, & Zatorre, 2020; Zatorre, Belin, & Penhune, 2002).

| Left-lateralized gain enhancement in musicians
Compared to nonmusicians, musicians showed disinhibition in lA1 and backward connections from lSTG to lA1. This indicates that the stronger MMN response in this group might rest on the same gain enhancement mechanism found for the effect of predictability. In turn, this suggests that both phenomena could be framed as enhancements in predictive precision. In previous work, we have proposed the terms "stimulus-driven" and "expertise-driven" to refer to these two types of uncertainty reduction (Quiroga-Martinez et al., 2019b). Thus, here we show that, although they seem to affect prediction error responses independently, stimulus-driven and expertise-driven uncertainty reduction might rely on similar underlying changes in effective connectivity. Furthermore, these results agree with the enhanced responses previously found in musicians for violations of pitch-related regularities such as interval, contour, musical tuning, and pitch patterns (Boh, Herholz, Lappe, & Pantev, 2011; Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004; Herholz, Lappe, & Pantev, 2009; Koelsch, Schröger, & Tervaniemi, 1999; Tervaniemi et al., 2014; Vuust et al., 2012).
Considering the hemispheric specialization for temporal versus spectral information and for music versus language (Albouy et al., 2020; Zatorre et al., 2002), this could mean that musicians' pitch processing involves a finer temporal evaluation of the sounds and more explicit lexical knowledge of musical structure.

| A plausible mechanism for precision-weighted prediction error
Taken together, our results suggest that the modulation of gain in auditory areas may underlie the weighting of prediction error responses by uncertainty (Clark, 2013; Feldman & Friston, 2010; Hohwy, 2012), where uncertainty corresponds to unpredictable melodic contexts and a lack of musical expertise in the current experimental design. Effects on intrinsic connectivity are not necessarily explained by short or long-term synaptic plasticity, but rather by modulations of synaptic efficacy through acetylcholine or other classical neuromodulatory neurotransmitters (Auksztulewicz & Friston, 2016; Baldeweg et al., 2006). Changes in synaptic efficacy may also be mediated by fast synchronous interactions, involving spiking inhibitory interneurons equipped with NMDA receptors (Schmidt et al., 2013).
Consistent with our results, the precision weighting of prediction errors has been cast as reflecting unexpected uncertainty (that is, a momentary change in the estimated predictability of the context induced by unexpected events), which has been associated with modulations of pupil diameter and the neuromodulator norepinephrine (Bianco, Ptasczynski, & Omigie, 2020; Dayan & Yu, 2006; Zhao et al., 2019). Thus, a plausible hypothesis is that the enhanced excitability of auditory cortex in response to deviant sounds has its origins in neuromodulation and concomitant changes in synchronous gain.
Note, however, that we also found evidence for a reduction of backward (inhibitory) connectivity, suggesting that the observed effects were, at least in part, mediated by changes in the sensitivity to top-down afferents from cortical sources higher in the auditory hierarchy.
These changes, nonetheless, are quantitatively smaller than those in intrinsic connections and further contribute to the disinhibition of A1, thereby facilitating gain enhancement. Future research should aim to disentangle the specific contribution of neuromodulation and changes in synaptic efficacy to gain control in A1.

| CONCLUSION
In this study, we characterized the neuronal dynamics and changes in synaptic efficacy underlying the salience of pitch deviants and its modulation by melody predictability and musical expertise during music listening. Using DCM, we found that musicianship and predictability, as well as deviance itself, increased neural gain in primary auditory cortex through a reduction in the strength of intrinsic (inhibitory) connectivity in A1 and STG. The MMN effect was also associated with reduced backward connectivity from STG to A1. Gain modulation in primary auditory cortex was right-lateralized in the case of predictability and left-lateralized in the case of musical expertise. Our findings are consistent with predictive processing theories suggesting that precise and informative error signals are prioritized by the brain for subsequent hierarchical processing. Furthermore, they suggest that the ability to contextualize sensory processing in musicianship and predictable sensory streams relies on similar neuronal gain mechanisms.

ACKNOWLEDGMENTS
The authors wish to thank the project initiation group, namely Chris-

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.