Faster maturation of selective attention in musically trained children and adolescents: Converging behavioral and event‐related potential evidence

Previous work suggests that musical training in childhood is associated with enhanced executive functions. However, it is unknown whether this advantage extends to selective attention—another central aspect of executive control. We recorded a well‐established event‐related potential (ERP) marker of distraction, the P3a, during an audio‐visual task to investigate the maturation of selective attention in musically trained children and adolescents aged 10–17 years and a control group of untrained peers. The task required categorization of visual stimuli, while a sequence of standard sounds and distracting novel sounds were presented in the background. The music group outperformed the control group in the categorization task and the younger children in the music group showed a smaller P3a to the distracting novel sounds than their peers in the control group. Also, a negative response elicited by the novel sounds in the N1/MMN time range (~150–200 ms) was smaller in the music group. These results indicate that the music group was less easily distracted by the task‐irrelevant sound stimulation and gated the neural processing of the novel sounds more efficiently than the control group. Furthermore, we replicated our previous finding that, relative to the control group, the musically trained children and adolescents performed faster in standardized tests for inhibition and set shifting. These results provide novel converging behavioral and electrophysiological evidence from a cross‐modal paradigm for accelerated maturation of selective attention in musically trained children and adolescents and corroborate the association between musical training and enhanced inhibition and set shifting.

Selective attention is the ability to maintain attention towards a target while resisting interference from irrelevant stimuli and is typically considered a central aspect of EFs (cf. interference control in Diamond, 2013; and attentional control in Jurado & Rosselli, 2007). Among the different EFs, selective attention has an evident connection to inhibition and is included by some as a subcomponent of inhibitory control (Diamond, 2013), as it requires suppression of the processing of irrelevant stimuli. Inhibiting distractor processing during selective attention also seems to depend on neural resources separate from those for direction and maintenance of attention towards targets (Bidet-Caulet et al., 2010). Since being able to stay focused in the presence of distracting stimuli is crucial for cognitive development, academic achievement and health, uncovering the neurodevelopmental trajectory of selective attention has been a central goal of cognitive neuroscience and neuropsychology (Stevens & Bavelier, 2012).
Building on classic behavioral work on selective attention (Cherry, 1953; Posner, 1978), numerous auditory event-related potential (ERP) studies have investigated the neural mechanisms of attentional selection and distraction in paradigms where subjects focus on a primary task while task-irrelevant sounds are presented concurrently. These studies indicate that early (100-200 ms) auditory ERP responses such as the N1 and mismatch negativity (MMN) may be attenuated when attention is strongly focused towards the primary task and away from the eliciting sounds (Fritz et al., 2007; Hillyard et al., 1973; Woldorff et al., 1991). It is noteworthy, however, that there is a longstanding debate on whether attention modulates just the N1 or also the MMN and whether these responses are in fact separate entities (May & Tiitinen, 2010). Therefore, we adopt the term N1/MMN here to refer to the change-related negative responses in this time range. In such paradigms, rare task-irrelevant novel sounds (e.g., highly salient environmental sounds) elicit the P3a response, which is considered to reflect an involuntary, bottom-up attention switch towards unexpected sounds (for reviews, see Escera et al., 2000; Escera & Corral, 2007). Accordingly, behavioral performance in the main task tends to be perturbed by the presentation of task-irrelevant, P3a-eliciting sounds (Escera et al., 1998; Gumenyuk et al., 2004; Wetzel et al., 2006). In adults, the novel sound-induced P3a is seen as a positive deflection over the frontal and central electrode sites that peaks between 200 and 500 ms after stimulus onset.
Particularly the later time window (~300 ms onwards) of the response has been associated with attentional orienting and distraction (Escera et al., 1998; Horváth et al., 2011). Accordingly, the late P3a (lP3a) is enlarged in individuals with heightened distractibility such as children with major depression (Lepistö et al., 2004) or ADHD (van Mourik et al., 2007) and patients with closed head injury (Kaipio et al., 2000). The early phase of the P3a has been proposed to reflect the processing of the deviating stimulus features (Horváth et al., 2011) rather than attention capture, perhaps reflecting an enhanced P2 response to salient sound changes (Wetzel et al., 2011).
Selective attention continues to improve during adolescence, reflecting the maturation of the fronto-parietal and subcortical networks supporting these functions (Blakemore & Choudhury, 2006). Behavioral studies indicate that even infants are able to deploy selective attention (e.g., Tummeltshammer et al., 2014) and ERP studies have revealed smaller responses to unattended versus attended stimuli in young children (Sanders et al., 2006) akin to the results obtained in adults (Hillyard et al., 1973). However, children are more readily distracted by extraneous information than adults as indexed for example by Flanker task performance (Rueda et al., 2004). Accordingly, young children tend to show larger P3a responses to novel sounds and poorer behavioral performance in the primary task than older children (Gumenyuk et al., 2004), adolescents (Wetzel & Schröger, 2007), or adults (Wetzel et al., 2011), indicating that the P3a reduces in amplitude with age as the attentional control system matures. Thus, the novel-sound-induced P3a is a widely used electrophysiological measure for the maturation of selective auditory attention.
Children's and adolescents' EF skills predict various societally important phenomena such as academic performance, health, and occupational success (Miller et al., 2012; Moffitt et al., 2011; St Clair-Thompson & Gathercole, 2006). Such findings have fueled great interest in interventions aimed at supporting the development of EFs. There is evidence that targeted intervention programs and even leisure activities may enhance EFs in children (Diamond & Ling, 2016). A number of correlational studies indicate that musically trained individuals outperform untrained peers in tasks for key components of EFs like cognitive flexibility and working memory (Zuk et al., 2014), inhibition (Bialystok & DePape, 2009), and set shifting (Degé et al., 2011; Saarikivi et al., 2016; however, see Alemán et al., 2017; Schellenberg, 2011).
The enhancement of various EF components in musically trained children raises the question of whether this advantage extends to selective attention and the ability to inhibit distraction. Previous studies in children have provided inconclusive results due to methodological issues (for evidence for a selective attention advantage in adult musicians, see Puschmann et al., 2019; Tierney et al., 2020). Decreased trial-to-trial variability in ERP responses to syllable stimuli presented in the attended channel of a dichotic listening experiment has been interpreted as evidence for enhanced selective attention in musically trained children (Strait et al., 2015). However, the somewhat idiosyncratic neural measure employed in these studies and the lack of behavioral data from the dichotic listening task preclude firm conclusions from these findings. Although enhanced speech-in-noise perception in musically trained children has been taken as evidence for enhanced auditory selective attention (Strait et al., 2012), speech-in-noise perception also strongly taps into perceptual acuity, and the relative roles of top-down attention and lower level perceptual processes in such group differences remain unclear (Coffey et al., 2017). Finally, musically trained children have been found to outperform untrained children in a neuropsychological test for "selective attention" where the children needed to touch a red circle within 2 s after hearing the word "red" on a recording of a word list (Degé et al., 2011). However, this task probably measures sustained rather than selective attention. Thus, while these studies have offered valuable insights into cognitive differences between musically trained and untrained children, they do not provide unequivocal evidence for enhanced selective attention in musically trained children and do not address whether musical training is associated with enhanced ability to resist distraction.

PUTKINEN ET al.
Only a few studies have investigated the neural correlates of the enhanced EFs in musically trained children. A handful of fMRI studies suggest that children with musical training recruit various regions including the inferior frontal gyrus and supplementary motor area more strongly than untrained children in set-shifting and inhibition tasks (Sachs et al., 2017; Zuk et al., 2014). Electroencephalography (EEG) and magnetoencephalography (MEG) studies in musically trained and untrained children have, in turn, almost exclusively concentrated on lower level auditory processing (Chobert et al., 2014; Fujioka et al., 2006; Habibi et al., 2016; Putkinen, Tervaniemi, Saarikivi, de Vent, et al., 2014; Putkinen, Tervaniemi, Saarikivi, Ojala, et al., 2014) rather than selective attention or other EFs (however, see Moreno et al., 2011; Strait et al., 2015).
In the current study, our goal was to use the well-established neural indices of auditory attention and distraction, the N1/MMN and P3a, to examine the maturation of selective attention in musically trained and untrained children and adolescents aged 10-17 years. To this end, we employed an audio-visual distraction paradigm, adapted from classical studies on selective auditory attention (e.g., Escera et al., 1998), where a task-irrelevant sequence of standard and novel sounds was presented in the background while subjects engaged in a visual categorization task. In similar auditory-visual paradigms, auditory distractors have been shown to disrupt (or facilitate) behavioral performance in the visual task (e.g., Andrés et al.,; Ljungberg & Parmentier, 2012; Munka & Berti, 2006; Parmentier & Andrés, 2010; SanMiguel et al., 2010). Furthermore, attention towards visual tasks has been shown to reduce ERP responses to auditory deviants and novel sounds (Harmony et al., 2000; Zhang et al., 2006; SanMiguel et al., 2008; however, see Muller-Gass et al., 2007). The novel sounds were expected to elicit the N1/MMN-P3a complex, providing neural indices of selective attention and distraction. The behavioral performance on trials that followed distracting novel sounds versus standard sounds provided a behavioral measure of how well the subjects were able to ignore the auditory stimuli. Furthermore, we measured inhibition and shifting with the same neuropsychological test as in our previous study (Saarikivi et al., 2016) to replicate the EF advantage in the music group we observed previously and to delineate the relationship between behaviorally measured inhibition and set shifting and our neural indices of selective attention and distraction. Since musical training has been associated with heightened EFs, we expected that the music group would outperform the control group in the neuropsychological tests for inhibition and set shifting and in the behavioral selective attention task.
Furthermore, since attention towards visual tasks has been shown to reduce auditory ERPs in the N1/MMN and P3a time ranges and subjects with better attentional skills show smaller responses to novel sounds in audio-visual attention tasks, we expected that the music group would show smaller N1/MMN and P3a responses to the task-irrelevant novel sounds (even though in some conditions, such as passive MMN paradigms, musicians may show larger responses, e.g., Putkinen, Tervaniemi, Saarikivi, de Vent, et al., 2014; Putkinen, Tervaniemi, Saarikivi, Ojala, et al., 2014). Finally, we expected that the P3a amplitude would diminish with age, in line with previous studies (cf. Mahajan & McArthur, 2015).

| Subjects
Sixty-six subjects took part in the ERP experiment, and 80 subjects participated in neuropsychological testing. Out of these, 60 subjects took part in both (see Table 1).
The music group consisted of children who had started taking instrument lessons at approximately the age of 7. The subjects in the music group had attended or were attending a special elementary school that included music lessons (individual instrument lessons, group music lessons, music theory) as a part of their curriculum. The control group consisted of children and adolescents with no formal music training outside their school curriculum (one lesson/week). None of the children had hearing deficits or neurological impairments. There was no significant difference between the control and music groups in IQ, estimated with scores of the Block design and Vocabulary subtasks of the WISC-IV intelligence scale (t[74] = 1.33, p = .186). The socioeconomic status (SES) of the participants was estimated by parental income and parental education. Education was measured on a scale of 1-7 (1 = elementary school, 7 = postgraduate degree) and income on a scale of 1-6 (1 = <1,000€/month, 6 = over 5,000€/month). A combined score was calculated with normalized values of education and income of both parents. There was no difference between the groups in SES (t[68] = 0.97, p > .33) for the 70 subjects (38 from the music group) whose parents provided these data. Written informed consent for participation was obtained from guardians of the participants before the experiment. Participants also gave verbal consent for their participation. Participants were rewarded with three movie tickets for taking part in the study. The experiment protocol was approved by the Ethical Committees of the former Department of Psychology and of the Faculty of Behavioural Sciences, both at the University of Helsinki, Finland.

| Procedure and stimuli
During the experiment, the subjects sat in a recliner chair in an electrically and acoustically shielded room. The auditory stimuli were presented via headphones at a sound pressure level of 60 dB. The visual stimuli were presented on a computer screen placed at approximately 1.5 m in front of the subject.
The auditory stimuli were composed of a sequence of repeating standard sounds (p = .875) and occasional novel sounds (p = .125). The standard sounds were complex tones with two upper harmonic partials (−3 and −6 dB relative to the fundamental, respectively) and had the fundamental frequency of 500 Hz. The novel sounds consisted of a diverse set of environmental sounds and artificial noises. These same sounds have been found to elicit a prominent P3a response in children (Putkinen et al., 2012). The standard and novel sounds were 200 ms in duration and were presented with the interstimulus interval (ISI) of 500 ms. There were 34 different novel sounds which were all presented in a random order during each fifth of the sound sequence. Thus, each novel sound was presented five times during the experiment (a total of 170 novel sound presentations). Altogether 1,280 sounds were presented, making the total duration of the sequence 640 s.
Concurrently with the auditory stimulation, the subjects were presented with a sequence of photographs on the computer screen depicting either familiar animals (e.g., a cat, dog, or a rabbit) or non-animal objects (e.g., a car, book, or a computer) at the center of the screen on a white background. The pictures were presented with a constant SOA of 2,000 ms so that there was a 300-ms delay between each picture and the preceding sound. On half of the trials, the preceding sound was a standard sound (standard trials), and on the other half, it was a novel sound (novel trials). The novel and standard trials were presented in a random order.
The subjects were instructed to ignore the sounds and to press one button on a response box with their left hand and another button with their right hand depending on whether the picture depicted an animal or a non-animal object (counterbalanced across the subjects). Figure 1 provides a schematic of the paradigm.
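The timing relations described above can be checked with a short sketch. It is only an illustration under stated assumptions: the 500-ms sound interval is treated as onset-to-onset (which is what the stated total of 640 s for 1,280 sounds implies), the 300-ms delay is measured from sound onset, and the alignment of the first picture to the first sound is hypothetical. All names are illustrative and not taken from the authors' stimulus-presentation code.

```python
# Timing sketch for the audio-visual paradigm (illustrative names, values from the text).
N_SOUNDS = 1280            # total number of sounds in the sequence
SOUND_SOA_MS = 500         # assumed onset-to-onset interval between sounds
PICTURE_SOA_MS = 2000      # onset-to-onset interval between pictures
SOUND_TO_PICTURE_MS = 300  # assumed delay from sound onset to the following picture


def sequence_duration_s(n_sounds=N_SOUNDS, soa_ms=SOUND_SOA_MS):
    """Total duration of the sound sequence in seconds."""
    return n_sounds * soa_ms / 1000


def sounds_per_picture(picture_soa_ms=PICTURE_SOA_MS, sound_soa_ms=SOUND_SOA_MS):
    """Number of sound onsets that fall within one picture-to-picture interval."""
    return picture_soa_ms // sound_soa_ms


def picture_onsets_ms(n_sounds=N_SOUNDS):
    """Picture onset times, assuming (hypothetically) that the first picture
    follows the first sound and pictures run for the whole sound sequence."""
    step = sounds_per_picture()
    return [i * SOUND_SOA_MS + SOUND_TO_PICTURE_MS for i in range(0, n_sounds, step)]


if __name__ == "__main__":
    print(sequence_duration_s())       # duration of the sequence in seconds
    print(sounds_per_picture())        # sounds per picture interval
    print(len(picture_onsets_ms()))    # number of picture trials under these assumptions
```

Under these assumptions, every fourth sound immediately precedes a picture, which is why only a subset of the standard sounds is comparable to the novel sounds in the ERP analysis.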

| EEG recording
EEG data were continuously acquired with a BioSemi Active-Two system (BioSemi, Amsterdam, The Netherlands), recorded at a sampling rate of 512 Hz. The EEG was registered with 64 active Ag-Cl electrodes, positioned according to the International 10-20 system, and additional electrodes at the nose and the left and right mastoids. The electro-oculogram (EOG) was recorded with two electrodes, one below the left eye and the other lateral to the left outer canthus.

FIGURE 1 Illustration of the ERP paradigm. A sequence of standard and novel sounds (SOA, 500 ms) was presented in the background while the subjects engaged in a visual categorization task. N, Novel; S, Standard. Note that only responses to sounds preceding the pictures (by 300 ms) were included in the ERP analysis.

| EEG data analysis
All data preprocessing and analyses were conducted in MATLAB using the EEGLAB toolbox (v. 13.5.4b; Delorme & Makeig, 2004). Continuous data files were high-pass filtered at 0.5 Hz (Hamming windowed sinc FIR filter). The files were then epoched from 100 ms before to 500 ms after stimulus onset and referenced to the average of the two mastoid channels. Since novel sounds were always presented 300 ms before the picture onset, only those standard sounds that also preceded the picture by 300 ms were included in the analysis.
Artifact removal was done by conducting an independent component analysis (ICA) on the data. Before ICA, noisy epochs were removed through visual inspection, and bad channels were identified and excluded from the ICA. The resulting IC topography maps were used to identify and remove artifacts resulting from eye movements and other motion. The data were then low-pass filtered at 30 Hz. Any remaining epochs that contained deflections exceeding ±100 μV were automatically rejected. In total, this process removed an average of 7.7% (SD: 5.4%) of epochs. After this, bad channels were interpolated, and epochs were averaged separately for standard and novel sounds.
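The automatic amplitude criterion in the last rejection step can be sketched as follows. This is a minimal pure-Python illustration, not the authors' EEGLAB code: an epoch is assumed to be a list of channels, each a list of samples in μV, and the function name is hypothetical. Note that the 7.7% figure reported above refers to the whole cleaning process, not to this step alone.

```python
def reject_epochs(epochs, threshold_uv=100.0):
    """Drop epochs in which any sample on any channel exceeds +/- threshold_uv.

    epochs: list of epochs; each epoch is a list of channels; each channel
            is a list of sample amplitudes in microvolts (assumed layout).
    Returns (kept_epochs, rejected_fraction).
    """
    kept = [
        epoch
        for epoch in epochs
        if all(abs(sample) <= threshold_uv for channel in epoch for sample in channel)
    ]
    rejected_fraction = 1 - len(kept) / len(epochs) if epochs else 0.0
    return kept, rejected_fraction


if __name__ == "__main__":
    demo = [
        [[10.0, -20.0], [5.0, 99.0]],    # clean epoch
        [[10.0, 150.0], [0.0, 0.0]],     # positive deflection above threshold
        [[-101.0, 0.0], [0.0, 0.0]],     # negative deflection below -threshold
        [[1.0, 2.0], [3.0, 4.0]],        # clean epoch
    ]
    kept, fraction = reject_epochs(demo)
    print(len(kept), fraction)
```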

| Behavioral and neuropsychological test data analysis
Reaction times (RT) and the number of incorrect button presses (error rate, ER) were calculated separately for the pictures following the novel sounds and standards. Only trials with correct responses were included in the RT analysis. The RTs and ERs were analyzed with separate repeated measures ANOVAs with stimulus (novel, standard) as a within-subject factor, group (music, control) as a between-subject factor, and age as a continuous predictor.
A subtest from the NEPSY-II test battery (Korkman et al., 2008) was used to assess the participants' inhibition as well as set-shifting abilities. The inhibition phase of the test requires inhibiting the automatic response and naming the opposite shape ("circle" if square; "square" if circle) and direction of the arrow ("up," if down; "down," if up), and in the set-shifting phase, the participant is instructed to switch between two response strategies, naming the correct shape/direction and naming the opposite shape/direction, depending on the color of the shape or the arrow (white/black). The test scores were analyzed with separate ANOVAs with group (music, control) as a between-subject factor and age as a continuous predictor.
We correlated the novel-minus-standard RT difference and the inhibition and shifting test scores with the subject-wise response peak amplitudes, defined as the largest amplitude of the novel-minus-standard ERP difference at Cz within 100-150, 200-250, and 300-350 ms for the N1/MMN, early P3a (eP3a), and lP3a, respectively.
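The peak-amplitude definition can be illustrated with a minimal sketch. It assumes a single-channel (Cz) difference wave sampled at 512 Hz with the epoch starting 100 ms before stimulus onset, as described in the EEG sections, and it assumes that "largest amplitude" means the most negative value for the N1/MMN and the most positive value for the P3a windows. The function names are hypothetical and this is not the authors' MATLAB code.

```python
SFREQ_HZ = 512          # sampling rate reported for the recording
EPOCH_START_MS = -100   # epochs start 100 ms before stimulus onset


def ms_to_index(t_ms, sfreq=SFREQ_HZ, start_ms=EPOCH_START_MS):
    """Convert a latency in ms (relative to stimulus onset) to a sample index."""
    return round((t_ms - start_ms) * sfreq / 1000)


def peak_amplitude(diff_wave, t_min_ms, t_max_ms, polarity):
    """Return the peak of a novel-minus-standard difference wave in a window.

    polarity: "negative" picks the minimum (assumed for the N1/MMN),
              "positive" picks the maximum (assumed for eP3a and lP3a).
    """
    if polarity not in ("negative", "positive"):
        raise ValueError("polarity must be 'negative' or 'positive'")
    lo, hi = ms_to_index(t_min_ms), ms_to_index(t_max_ms)
    window = diff_wave[lo:hi + 1]
    return min(window) if polarity == "negative" else max(window)


if __name__ == "__main__":
    # Synthetic difference wave covering -100 to 500 ms, for illustration only.
    wave = [0.0] * (ms_to_index(500) + 1)
    wave[ms_to_index(125)] = -3.0   # N1/MMN-like negativity
    wave[ms_to_index(320)] = 5.0    # lP3a-like positivity
    print(peak_amplitude(wave, 100, 150, "negative"))
    print(peak_amplitude(wave, 300, 350, "positive"))
```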

| ERP results
The novel sounds elicited an N1/MMN response peaking at ~125 ms and a slow P3a-like response between approximately 175 and 400 ms, peaking at ~225 ms (Figure 2a-c).
The N1/MMN novel-minus-standard amplitude grew with age (main effect of age: F[1,62] = 6.308, p < .05) and was larger in the control group than in the music group (main effect of group: F[1,62] = 4.107, p < .05). The eP3a novel-minus-standard amplitude reduced with age (main effect of age: F[1,62] = 4.312, p < .05) and did not significantly differ between the groups. Finally, in the lP3a time range, there was a significant group × age interaction (F[1,62] = 6.571, p < .01) indicating that the lP3a novel-minus-standard amplitude reduced with age more steeply in the control group than in the music group (Figure 3). Accordingly, post hoc comparison of the estimated mean amplitudes at the lower quantile of the age range revealed a larger lP3a amplitude in the control group than in the music group, whereas there was no significant difference in the estimated mean amplitudes between the groups at the upper quantile of the age range. Separate follow-up ANOVAs for the novel and standard response amplitudes (calculated over the 300- to 350-ms time window) suggest that the interaction resulted from the novel sound amplitudes being modulated by age and group (group × age interaction for novel sounds: F[1,62] = 3.6408, p < .062; for standard sounds: F[1,62] = 0.1882, p > .66). No other significant main effects or interactions were observed for any of the responses.

| Behavioral performance in the ERP task
The music group made fewer errors than the control group (main effect of group: F[1,73] = 4.637, p < .05; Figure 4a). ERs reduced with age irrespective of group (main effect of age: F[1,73] = 24.430, p < .001). No other significant effects on ERs were observed (all p > .1).
The RTs were significantly faster for trials following the novel sounds than for trials following the standard sounds (main effect of stimulus: F[1,73] = 8.373, p < .01). The RTs for both types of trials reduced with age (main effect of age: F[1,73] = 20.032, p < .001). No other significant effects on the RT were observed (all p > .2).
The novel-minus-standard RT difference correlated positively, although modestly, with the lP3a peak amplitude at Cz (r[64] = 0.29, p < .05). This correlation remained significant after adjusting for age and group (p < .05). Thus, even though on average the RTs on novel trials were faster than on standard trials, subjects with large lP3as showed slower RTs on novel trials than on standard trials (Figure 4b).
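As a rough plausibility check of the reported correlation, the significance of r = 0.29 with 64 degrees of freedom can be approximated with the Fisher z-transformation. This is a normal approximation and not necessarily the test the authors used; the sample size n = 66 is inferred from the degrees of freedom (df = n - 2), and the function name is illustrative.

```python
import math
from statistics import NormalDist


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for a Pearson correlation r from n pairs,
    using the Fisher z-transformation (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # r and df taken from the text; n = df + 2 is an inference.
    print(fisher_z_p_value(0.29, 66))
```

Under this approximation, the p-value falls below .05, in agreement with the reported significance level.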

| Inhibition and set-shifting test performance
Completion times were faster for the music group than for the control group in both the inhibition (F[1,76

| DISCUSSION
We investigated the maturation of selective attention in musically trained and untrained children and adolescents aged 10-17 years using well-established neural markers (N1/MMN, P3a) and an audio-visual selective attention task. Our main finding was that musically trained children and adolescents showed smaller N1/MMN and late P3a responses relative to the control group and outperformed the untrained peers in the audio-visual selective attention task. The late P3a amplitude was larger in the control than in the music group in the younger subjects. The late P3a amplitude diminished with age in the control group but remained stable in the music group across the examined age range. Large lP3a amplitudes were associated with stronger behavioral distraction in the audio-visual selective attention task. Finally, the music group outperformed the control group in standardized tests for inhibition and set shifting, and poorer performance in the inhibition test was associated with a larger lP3a response.

| Musical training is associated with enhanced selective attention, inhibition and set shifting
The music group made fewer errors on average than the control group in the cross-modal selective attention task, indicating that children and adolescents in the music group were better able to ignore the distracting novel sounds and focus on the visual categorization task. Results from a handful of previous studies have been interpreted as evidence for enhanced selective attention in musically trained children (Degé et al., 2011; Strait et al., 2012, 2015). These studies have, however, employed tasks akin to those traditionally used for quantifying sustained attention (Degé et al., 2011) rather than selective attention per se or have used speech-in-noise tasks (Strait et al., 2012) that have a strong perceptual component in addition to a putative selective attention demand (Coffey et al., 2017). Thus, it is unclear whether the previously reported group differences in fact reflect differences in selective attention. Here, we attempted to overcome these difficulties by employing a cross-modal task that arguably taps more directly into selective attention and distraction (adapted from Escera et al., 1998). Even though the task does also require attention maintenance and inhibitory control, it taxes selective attention more heavily by requiring continuous attention towards one modality during interference from another.

FIGURE 3 (a) The novel-minus-standard difference signals at Cz in the younger and older subjects separately for the music and control groups (note that age was used as a continuous predictor in the main statistical analyses and the age median split was done for illustration purposes). (b) The age effect in the music and control groups.

FIGURE 4 (a) Percentage of correct responses in the ERP task and completion times for the Inhibition and Shifting subtests. The error bars indicate 95% confidence intervals. *p < .05; **p < .01; ***p < .001. (b) The relationship between lP3a amplitude and novel-minus-standard RTs. Turquoise area on the left: novel trial RTs > standard trial RTs; purple area on the right: standard trial RTs > novel trial RTs.
In our previous study (Saarikivi et al., 2016), conducted in the same cohort as the current one, we found that at age 9-15, the music group outperformed the control group in the same inhibition and set-shifting test employed here. Thus, the current results replicate our earlier findings and show that the inhibition and set-shifting advantage in the music group remained 2 years after the initial measurement. These results corroborate the association between musical training and enhanced inhibition and set shifting in children found in a number of previous studies (Bialystok & DePape, 2009;Degé et al., 2011;Holochwost et al., 2017;Joret et al., 2017;Travis et al., 2011).

| Neural processing of unattended auditory stimuli is attenuated in musically trained children and adolescents
In line with the behavioral data, the ERP results suggest more rapid maturation of neurocognitive mechanisms underlying selective attention in musically trained children than in their untrained peers. Namely, in the younger musically trained children, the amplitude of the late P3a was smaller than in their peers in the control group. This suggests that the younger children in the music group allocated fewer attentional resources to the processing of the task-irrelevant novel sounds than the control children of the same age. In the control group, the lP3a was large in the younger subjects and diminished with age, in line with previous studies on P3a maturation during adolescence (e.g., Mahajan & McArthur, 2015). In the music group, in contrast, the lP3a amplitude was smaller relative to the control group in the younger subjects and showed no additional reduction with age. This result suggests that the neural mechanism underlying the lP3a had already reached relative maturity in the younger music group subjects.
The N1/MMN elicited by the novel sounds was also smaller in amplitude in the music group. Prior studies indicate that responses in the N1 and MMN time range are attenuated for deviant stimuli when attention is strongly focused on another stimulus stream (Hillyard et al., 1973; Woldorff et al., 1991), suggesting that early sound encoding and auditory deviance detection can be partially suppressed by selective attention. The smaller N1/MMN in the music group suggests they were better able to gate the processing of task-irrelevant novel sounds. Unlike the lP3a group difference, which diminished with age, the N1/MMN was smaller in the music group irrespective of age. This suggests that N1/MMN and lP3a effects capture different aspects of auditory attention, have distinct developmental trajectories, and might be differentially affected by musical experience.
It is noteworthy that numerous prior studies have reported enlarged N1, MMN and P3-like auditory responses in musicians or musically trained children when compared to non-musicians (e.g., Chobert et al., 2014; Fujioka et al., 2004; Koelsch et al., 1999; for reviews, see Putkinen & Tervaniemi, 2018; Tervaniemi, 2009), including studies with the same participants as in the current one (Putkinen, Tervaniemi, Saarikivi, de Vent, et al., 2014; Putkinen, Tervaniemi, Saarikivi, Ojala, et al., 2014; Saarikivi et al., 2016). In most of these studies, the responses were recorded to small acoustic changes in passive listening conditions (e.g., while watching a movie), meaning that the sound changes were not particularly salient, and the attentional demands of the task were modest. Some studies have also reported enlarged N2b responses, a negative response in the N1/MMN time range, in musicians in active listening conditions, where the participants were actively discriminating the sounds (Tervaniemi et al., 2005). Thus, the enlarged responses obtained in these studies reflect enhanced discrimination of small acoustic changes in musicians, whereas the reduced responses in the current study index enhanced control over distraction induced by highly distinct novel sounds.

| P3a and behavioral indices of EF
It is a common finding that behavioral performance is impaired in the primary task on the target trials following task-irrelevant sounds that elicit the P3a. Here, in contrast, RTs were faster on the novel sound trials than on the standard sound trials. Some prior studies also indicate that the P3a is not always associated with behavioral distraction and that novel sounds can even facilitate behavioral performance (Wetzel et al., 2013). In the current study, novel sounds were always followed by a target stimulus whereas standard sounds were not. Some subjects might have been able to use the novel sounds as cues for the upcoming targets and, as a consequence, responded faster on these trials. However, this behavioral facilitation, although significant, was small and a large proportion of the subjects showed the typical RT cost on trials following the novel sounds. These subjects also displayed larger lP3a amplitudes on average than those showing the behavioral facilitation effect, as indexed by the positive correlation between the novel-minus-standard RT difference and lP3a (Figure 4b). Thus, despite the group-level behavioral facilitation on the novel trials, the positive association between lP3a amplitude and RTs is in line with numerous previous studies indicating that the lP3a is related to involuntary attention capture and distraction (Escera et al., 2000).
Interestingly, the completion times in the inhibition subtest correlated positively with the lP3a amplitude, i.e., the slower the test performance, the larger the lP3a. This result suggests that this behavioral measure of inhibition and the neural index of involuntary attention switch may tap into common underlying processes. The results are in line with the notion that inhibition of behavioral responses and control over interference are related yet separable (cf. cognitive inhibition in Diamond, 2013). Furthermore, the RTs in the standard and novel trials of the audio-visual selective attention task correlated modestly with the performance in the inhibition and set-shifting tests. These results dovetail with findings from latent variable analyses indicating that, to an extent, behavioral measures of different EFs measure overlapping functions (Friedman & Miyake, 2017). Furthermore, meta-analyses of fMRI studies on brain areas activated during diverse EF tasks have shown considerable overlap in adults (Niendam et al., 2012) as well as children (McKenna et al., 2017), indicating that a common neural system supports distinct EFs. However, in our study, test performance explained only 9%-14% of the variation in lP3a amplitude and the RTs in the selective attention task, indicating that, despite the modest association between these measures, the inhibition and set-shifting tests and the audio-visual selective attention task measure separable aspects of EFs.

Limitations
One caveat of the current study is that we did not manipulate the attentional demands of the paradigm parametrically, which could arguably have provided even stronger evidence that the reduced amplitude of the N1/MMN and P3a responses in the music group was due to selective attention. One alternative interpretation of the ERP group difference would be that the children in the music group simply show smaller responses for reasons unrelated to the main visual task. Another interpretation of our results is that the novel sounds did not trigger involuntary attention capture or an orienting response in the music group as readily as in the control group, either because of attenuated involuntary attention in the music group or enhanced involuntary attention in the control group (i.e., differences in top-down control of attention did not contribute to the ERP group difference). However, these interpretations are in contrast to previous studies, including ones in the same children who participated in the current study, showing that in other paradigms musically trained children show larger MMN and P3a responses than their untrained peers (Putkinen, Tervaniemi, Saarikivi, Ojala, et al., 2014; Putkinen, Tervaniemi, Saarikivi, de Vent, et al., 2014; Saarikivi et al., 2016). Furthermore, the correlation between task performance and response amplitudes indicates that top-down control of attention contributed to the reduced responses in the music group.
It bears repeating that cross-sectional correlational studies, like the current one, cannot establish causality between musical training and the observed group differences (Sala & Gobet, 2017; Schellenberg, 2011). Due to practical difficulties related to conducting long-term randomized controlled studies in children, the current study lacks random assignment and baseline data collected before the music group started receiving musical training, and therefore, we cannot rule out the contribution of self-selection and pretraining differences. Indeed, twin studies indicate that individual differences in EF have a genetic etiology, suggesting that enhanced EF in musicians might stem from genetic predispositions. Thus, although there is persuasive evidence that EFs can be improved by specifically designed training programs (Diamond, 2013; Diamond & Ling, 2016), some of which mimic musical training (Moreno et al., 2011), it would be premature to attribute these group differences entirely to training-induced plasticity.

CONCLUSIONS
We found that musical training is associated with enhanced selective attention, inhibition, and set shifting in children and adolescents. Namely, we found that musically trained children and adolescents outperformed untrained peers in a selective attention task and showed smaller N1/MMN and lP3a responses to unattended novel sounds. Furthermore, the music group performed faster in standardized tests for inhibition and set shifting. These results provide novel, converging behavioral and ERP evidence for enhanced maturation of EFs in musically trained children and adolescents.