Address correspondence to Dr. Jeffrey R. Binder, Department of Neurology, Medical College of Wisconsin, 9200 W. Wisconsin Ave., Milwaukee, Wisconsin 53226, U.S.A. E-mail: email@example.com
Aims: Many fMRI protocols for localizing speech comprehension have been described, but there has been little quantitative comparison of these methods. We compared five such protocols in terms of areas activated, extent of activation, and lateralization.
Methods: fMRI BOLD signals were measured in 26 healthy adults during passive listening and active tasks using words and tones. Contrasts were designed to identify speech perception and semantic processing systems. Activation extent and lateralization were quantified by counting activated voxels in each hemisphere for each participant.
Results: Passive listening to words produced bilateral superior temporal activation. After controlling for prelinguistic auditory processing, only a small area in the left superior temporal sulcus responded selectively to speech. Active tasks engaged an extensive bilateral network for attention and executive processing. Optimal results (consistent activation and strongly lateralized pattern) were obtained by contrasting an active semantic decision task with a tone decision task. There was striking similarity between the network of brain regions activated by the semantic task and the network of brain regions that showed task-induced deactivation, suggesting that semantic processing occurs during the resting state.
Conclusions: fMRI protocols for mapping speech comprehension systems differ dramatically in pattern, extent, and lateralization of activation. Brain regions involved in semantic processing were identified only when an active, nonlinguistic task was used as a baseline, supporting the notion that semantic processing occurs whenever attentional resources are not controlled. Identification of these lexical-semantic regions is particularly important for predicting language outcome in patients undergoing temporal lobe surgery.
Localization of language areas prior to brain surgery can help determine the risk of postoperative aphasia and may be useful for modifying surgical procedures to minimize such risk. Anterior temporal lobe (ATL) resection is a common and highly effective treatment for intractable epilepsy (Wiebe et al., 2001; Tellez-Zenteno et al., 2005), but carries a 30–50% risk of decline in naming ability when performed on the left temporal lobe (Hermann et al., 1994; Langfitt & Rausch, 1996; Bell et al., 2000; Sabsevitz et al., 2003). In addition to retrieval of names, the left temporal lobe is classically associated with speech comprehension (Wernicke, 1874). These seemingly different language functions both depend on common systems for processing speech sounds (phonology) and word meanings (lexical semantics), both of which are located largely in the temporal lobe (Indefrey & Levelt, 2004; Awad et al., 2007). Identification of these phonological and lexical-semantic systems is therefore an important goal in the presurgical mapping of language functions.
Functional magnetic resonance imaging (fMRI) is used increasingly for this purpose (Binder, 2006). fMRI is a safe, noninvasive procedure for localizing hemodynamic changes associated with neural activity. Many fMRI studies conducted on healthy adults have investigated the brain correlates of speech comprehension, though with a variety of activation procedures and widely varying results. There has been little systematic, quantitative comparison of these activation protocols. There is at present little agreement, for example, on which type of procedure produces the strongest activation, which is most specific for detecting processes of interest, and which is associated with the greatest degree of hemispheric lateralization.
Table 1. Five types of protocols for mapping speech comprehension areas
Another principle useful for categorizing speech comprehension studies is whether or not an active task is requested of the participants. Many studies employed passive listening to speech, whereas others required participants to respond to the speech sounds according to particular criteria. Active tasks focus participants' attention on a specific aspect of a stimulus, such as its form or meaning, which is assumed to cause “top-down” activation of the neural systems relevant for processing the attended information. For example, some prior studies of speech comprehension sought to identify the brain regions specifically involved in processing word meanings (lexical-semantic system) by contrasting a semantic task using speech sounds with a phonological task using speech sounds (Démonet et al., 1992; Mummery et al., 1996; Binder et al., 1999).
A final factor to consider in categorizing speech comprehension studies is whether or not an active task is used for the baseline condition. Performing a task of any kind requires a variety of general functions such as focusing attention on the relevant aspects of the stimulus, holding the task instructions in mind, making a decision, and generating a motor response. Active control tasks are used to “subtract out” these and other general processes that are considered nonlinguistic in nature and therefore irrelevant for the purpose of language mapping. Another potential benefit of such tasks is that they provide a better-controlled baseline state than resting (Démonet et al., 1992). Evidence suggests that the conscious resting state is characterized by ongoing mental activity experienced as “daydreams,” “mental imagery,” “inner speech,” and the like, which is interrupted when an overt task is performed (Antrobus et al., 1966; Pope & Singer, 1976; Singer, 1993; Teasdale et al., 1993; Binder et al., 1999; McKiernan et al., 2006). It has been proposed that this “task-unrelated thought” depends on the same conceptual knowledge systems that underlie language comprehension and production of propositional language (Binder et al., 1999). Thus, resting states and states in which stimuli are presented with no specific task demands may actually be conditions in which there is continuous processing of conceptual knowledge and mental production of meaningful “inner speech.”
Table 1 illustrates five types of contrasts that have been used to map speech comprehension systems. The first four are obtained by crossing either a passive or active task state with a control condition using either silence or a nonspeech auditory stimulus. The last contrast uses speech sounds in both conditions while contrasting an active semantic task with an active phonological task. Though there are other possible contrasts not listed in the table (e.g., active speech vs. passive nonspeech), most prior imaging studies on this topic can be classified into one of these five general types. Within each type, of course, are many possible variations on both stimulus content (e.g., relative concreteness, grammatical class, or semantic category of words; sentences vs. single words) and specific task requirements.
Our aim in the current study was to provide a meaningful comparison between these five types of protocols through controlled manipulation of the factors listed in Table 1. Interpreting differences between any two of these types requires that the same or comparable word stimuli be used in each case. We used single words for each protocol, as these lend themselves readily to use in semantic tasks. Similarly, we used the same nonspeech tone stimuli for the passive and active control conditions in protocols 2 and 4. Finally, because fMRI results are known to be highly variable across individuals and scanning sessions, we scanned the same 26 individuals on all five protocols in the same imaging session. The results were characterized quantitatively in terms of extent and magnitude of activation, specific brain regions activated, and lateralization of activation. The results provide the first clear picture of the relative differences and similarities, advantages and disadvantages of these speech comprehension mapping protocols.
Participants in the study were 26 healthy adults (13 men, 13 women), ranging in age from 18 to 29 years, with no history of neurologic, psychiatric, or auditory symptoms. All participants indicated strong right-hand preferences (laterality quotient > 50) on the Edinburgh Handedness Inventory (Oldfield, 1971). Participants gave written informed consent and were paid a small hourly stipend. The study received prior approval by the Medical College of Wisconsin Human Research Review Committee.
Task conditions and behavioral measures
Task conditions during scanning included a resting state, a passive tone listening task, an active tone decision task, a passive word listening task, a semantic decision task, and a phoneme decision task (Table 2). Auditory stimuli were presented with a computer and a pneumatic audio system, as previously described (Binder et al., 1997). Participants kept their eyes closed during all conditions. For the resting condition, participants were instructed to remain relaxed and motionless. No auditory stimulus was presented other than the baseline scanner noise. Stimuli in the passive and active tone tasks were digitally synthesized 500-Hz and 750-Hz sinewave tones of 150 ms duration each, separated by 250-ms intertone intervals. These were presented as sequences of three to seven tones. In the passive tone task, participants were asked simply to listen to these sequences. In the tone decision task, participants were required to respond by pressing a button with the left hand for any sequence containing two 750-Hz tones. The left hand was used in this and all other active tasks to minimize any leftward bias due to activation of the left hemisphere motor system.
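As a concrete illustration of the tone stimuli described above, the sketch below assembles one tone sequence from 150-ms sinewaves at 500 or 750 Hz separated by 250-ms silences, and applies the tone decision rule. The 44.1-kHz sample rate, the absence of onset/offset ramps, and the reading of "two 750-Hz tones" as exactly two are assumptions made for this sketch, not details reported in the study.

```python
import numpy as np

SAMPLE_RATE = 44_100        # Hz; assumed, not reported in the study
TONE_MS, GAP_MS = 150, 250  # tone duration and intertone interval from the text
LOW_HZ, HIGH_HZ = 500, 750  # the two tone frequencies

def tone_train(freqs_hz, fs=SAMPLE_RATE):
    """Return a waveform of sinewave tones separated by silent gaps (no ramps)."""
    n_tone = int(round(fs * TONE_MS / 1000))
    n_gap = int(round(fs * GAP_MS / 1000))
    t = np.arange(n_tone) / fs
    pieces = []
    for i, f in enumerate(freqs_hz):
        if i > 0:
            pieces.append(np.zeros(n_gap))  # 250-ms silence between tones
        pieces.append(np.sin(2 * np.pi * f * t))
    return np.concatenate(pieces)

def is_tone_target(freqs_hz):
    """Target if the sequence holds exactly two high (750-Hz) tones (assumed reading)."""
    return sum(f == HIGH_HZ for f in freqs_hz) == 2

# A three-tone sequence lasts 3 * 150 + 2 * 250 = 950 ms
seq = [LOW_HZ, HIGH_HZ, HIGH_HZ]
wave = tone_train(seq)
print(len(wave) / SAMPLE_RATE, is_tone_target(seq))
```

Under a different reading of the rule ("at least two high tones"), only the comparison in `is_tone_target` would change.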
Table 2. Conditions used in the study
Condition              Task
1. Rest                Remain relaxed and motionless (no stimulus)
2. Passive Tone        Listen
3. Tone Decision       Respond if sequence contains two high tones
4. Passive Word        Listen
5. Semantic Decision   Respond if animal meets both semantic criteria
6. Phoneme Decision    Respond if item contains both target phonemes
Stimuli in the passive word and semantic decision tasks were 192 spoken English nouns designating animals (e.g., turtle), presented at the rate of one word every 3 s. In the passive word task, participants were asked simply to listen to these words. In the semantic decision task, participants were required to respond by a left-hand button press for animals they considered to be both “found in the United States” and “used by people.” No animal word was used more than once in the same task. The tone and word stimuli were matched on average intensity, average sound duration (750 ms), average trial duration (3 s), and frequency of targets (37.5% of trials). Characteristics of the tone and word stimuli and the rationale for the tone and semantic decision tasks are described elsewhere in greater detail (Binder et al., 1995; Binder et al., 1997).
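The matching figures above fit together arithmetically. The sketch below checks them under stated assumptions: eight 3-s trials per block (from the scan design), three semantic decision scans with eight task blocks each, "average sound duration" of a tone sequence read as the summed duration of its tones excluding gaps, and sequence lengths of three to seven tones occurring equally often (the distribution is not reported). The match between 192 words and three semantic scans is only consistent with each word appearing once across those scans; the text does not state this directly.

```python
TRIAL_S = 3                 # one stimulus every 3 s
TRIALS_PER_BLOCK = 8        # eight stimulus presentations per 24-s block
TASK_BLOCKS_PER_SCAN = 8    # each condition fills 8 of the 16 blocks in a scan
TARGET_RATE = 0.375         # 37.5% of trials are targets
SEMANTIC_SCANS = 3          # vs. Rest, vs. Tone Decision, vs. Phoneme Decision

block_s = TRIAL_S * TRIALS_PER_BLOCK                        # 24 s
targets_per_block = TARGET_RATE * TRIALS_PER_BLOCK          # 3 of 8 trials, on average
words_per_scan = TRIALS_PER_BLOCK * TASK_BLOCKS_PER_SCAN    # 64 words
semantic_words = words_per_scan * SEMANTIC_SCANS            # 192, the size of the word set

# Mean summed tone duration, assuming 3 to 7 tones per sequence are equally frequent
tone_counts = range(3, 8)
mean_tone_sound_ms = sum(n * 150 for n in tone_counts) / len(tone_counts)

print(block_s, targets_per_block, semantic_words, mean_tone_sound_ms)
```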
Stimuli in the phoneme decision task were spoken consonant-vowel (CV) syllables, including all combinations of the consonants b, d, f, g, h, j, k, l, m, n, p, r, s, t, v, w, y, and z with the five vowels /æ/, /i/, /a/, /o/, and /u/. All syllables were edited to a duration of 400 ms. Each trial presented three CV syllables in rapid sequence, e.g., /pa dæ su/. Participants were required to respond by a left-hand button press when a triplet of CV syllables included both the consonants /b/ and /d/.
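The syllable inventory and the phoneme target rule can be sketched as follows. Letters stand in for the consonant phonemes here, a simplification ("j" and "y" denote /dʒ/ and /j/); the function name is illustrative only.

```python
from itertools import product

# Letters stand in for consonant phonemes ('j' = /dʒ/, 'y' = /j/): a simplification
CONSONANTS = list("bdfghjklmnprstvwyz")
VOWELS = ["æ", "i", "a", "o", "u"]
SYLLABLES = [c + v for c, v in product(CONSONANTS, VOWELS)]

def is_phoneme_target(triplet):
    """Target if the onsets of the three CV syllables include both /b/ and /d/."""
    onsets = {syllable[0] for syllable in triplet}
    return {"b", "d"} <= onsets

print(len(SYLLABLES))                         # 90 (18 consonants x 5 vowels)
print(is_phoneme_target(["pa", "dæ", "su"]))  # False: the example triplet lacks /b/
```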
Seven functional scans were acquired (Table 3). In each scan, one of the conditions alternated eight times with one of the other conditions in a standard block design. Each of these sixteen blocks lasted 24 s and included eight stimulus presentations (i.e., eight tone trains, eight words, or eight CV triplets). To avoid the possibility that participants might automatically perform the active decision tasks during the passive conditions, all passive scans were acquired before the participants received any training on the decision tasks or had knowledge that tasks were to be performed. Within this group of scans (scans 1–3), the order of scans was randomized and counterbalanced across participants.
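Table 3. Composition and order of the functional scans
Scan order    Scan
1, 2, or 3    Passive Words vs. Rest
1, 2, or 3    Passive Tones vs. Rest
1, 2, or 3    Passive Words vs. Passive Tones
              Learn Tone Decision task
4             Tone Decision vs. Rest
              Learn Semantic Decision task
5, 6, or 7    Semantic Decision vs. Rest
5, 6, or 7    Semantic Decision vs. Tone Decision
              Learn Phoneme Decision task
6 or 7        Semantic Decision vs. Phoneme Decision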
Following the passive scans, participants were instructed in the Tone Decision task and performed a Tone Decision versus Rest scan to identify brain regions potentially involved in task-unrelated conceptual processing during the resting state. To avoid the possibility that participants might rehearse or review the semantic task during these resting blocks, the Tone Decision versus Rest scan was always acquired first after the passive scans and before participants had any knowledge of the semantic task. The remaining three scans were acquired after training on the Semantic Decision task. These included a Semantic Decision versus Rest scan, a Semantic Decision versus Tone Decision scan, and a Semantic Decision versus Phoneme Decision scan. The order of these three scans was randomized and counterbalanced across participants, except that the Phoneme Decision task was restricted to one of the final two scans. This restriction was to avoid the mental burden of having to learn and perform two new tasks simultaneously (Semantic Decision and Phoneme Decision) for scan 5.
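The ordering constraints described above admit 3! = 6 passive orders and 4 semantic orders (the Phoneme Decision scan may not be scan 5), for 24 valid sequences. The sketch below enumerates them and assigns one per participant. The labels P1 to P3 are hypothetical placeholders for the three passive scans, and the shuffle-then-cycle assignment is only one simple counterbalancing scheme, not necessarily the one used in the study.

```python
import random
from itertools import permutations

PASSIVE_SCANS = ["P1", "P2", "P3"]  # hypothetical labels for the three passive scans
SEMANTIC_SCANS = ["SD-Rest", "SD-Tone", "SD-Phoneme"]

def valid_orders():
    """All seven-scan orders allowed by the constraints described in the text."""
    orders = []
    for passive in permutations(PASSIVE_SCANS):          # scans 1-3, any order
        for semantic in permutations(SEMANTIC_SCANS):    # scans 5-7
            if semantic[0] == "SD-Phoneme":              # phoneme scan may not be scan 5
                continue
            orders.append(list(passive) + ["Tone-Rest"] + list(semantic))
    return orders

def assign_orders(n_participants, seed=0):
    """Hypothetical scheme: shuffle the valid orders once, then cycle through them."""
    orders = valid_orders()
    random.Random(seed).shuffle(orders)
    return [orders[i % len(orders)] for i in range(n_participants)]

schedule = assign_orders(26)
print(len(valid_orders()), len(schedule))
```

With 26 participants and 24 valid orders, this scheme uses every order once and repeats two.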
Performance on the Tone Decision and Phoneme Decision tasks was scored as the proportion of correct responses. Responses on the semantic decision task were scored using response data from a group of 50 normal right-handed controls on the same stimulus sets. Items that were responded to with a probability greater than 0.75 by controls (e.g., lamb, salmon, horse, goose) were categorized as targets, and items that were responded to with a probability less than 0.25 by controls (e.g., ape, cockroach, lion, stork) were categorized as distractors. Performance by each subject was then scored as the proportion of correct discriminations between targets and distractors.
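A minimal sketch of this scoring procedure follows, assuming that "correct discriminations" means hits on targets plus correct rejections of distractors, and that items with control probabilities between 0.25 and 0.75 are left unscored. The item names and probabilities in the example are invented placeholders, not study data.

```python
def semantic_accuracy(responded, control_prob, hi=0.75, lo=0.25):
    """Score one participant against control-group response probabilities.

    responded: set of items the participant responded to.
    control_prob: item -> proportion of controls who responded to it.
    Items above `hi` count as targets and items below `lo` as distractors;
    items in between are left unscored (assumed). Returns the proportion of
    scored items handled like the controls (hits plus correct rejections).
    """
    targets = [w for w, p in control_prob.items() if p > hi]
    distractors = [w for w, p in control_prob.items() if p < lo]
    n_scored = len(targets) + len(distractors)
    if n_scored == 0:
        raise ValueError("no items fall outside the (lo, hi) band")
    hits = sum(w in responded for w in targets)
    correct_rejections = sum(w not in responded for w in distractors)
    return (hits + correct_rejections) / n_scored

# Toy probabilities, invented for illustration only
probs = {"t1": 0.90, "t2": 0.80, "d1": 0.10, "d2": 0.20, "x": 0.50}
print(semantic_accuracy({"t1", "d2", "x"}, probs))  # 1 hit + 1 correct rejection of 4 scored
```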
Scanning was conducted on a 1.5 Tesla General Electric Signa scanner (GE Medical Systems, Milwaukee, WI, U.S.A.) using a 3-axis local gradient coil with an insertable transmit/receive birdcage radiofrequency coil. Padding was placed behind the neck and around the head as needed to relax the cervical spine and to fill the space between the head and inner surface of the coil. Functional imaging employed a gradient-echo echoplanar sequence with the following parameters: 40 ms echo time, 4 s repetition time, 24 cm field of view, 64 × 64 pixel matrix, and 3.75 × 3.75 × 7.0 mm voxel dimensions. A total of 17 to 19 contiguous sagittal slice locations were imaged, encompassing the entire brain. One hundred sequential image volumes were collected in each functional run, giving a total duration of 6 min, 40 s for each run. Each 100-image functional run began with 4 baseline images (16 s) to allow MR signal to reach equilibrium, followed by 96 images during which two comparison conditions were alternated for eight cycles. High-resolution, T1-weighted anatomical reference images were obtained as a set of 124 contiguous sagittal slices using a 3D spoiled-gradient-echo (SPGR) sequence.
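The acquisition parameters above are internally consistent: 100 volumes at a 4-s repetition time give 400 s (6 min 40 s), the 96 task volumes cover eight cycles of two 24-s blocks (six volumes per block), and a 24-cm field of view on a 64 × 64 matrix gives 3.75-mm in-plane voxels. The sketch below builds the per-volume condition labels and checks these numbers; which condition occupies the first block is left as a parameter because the text does not say.

```python
TR_S = 4.0          # repetition time
N_VOLUMES = 100     # images per run
N_BASELINE = 4      # initial images before the task cycles begin
BLOCK_S = 24.0      # duration of each condition block
N_CYCLES = 8        # A/B cycles per run
FOV_MM, MATRIX = 240.0, 64

def run_labels(first="A", second="B"):
    """Condition label for each volume; None marks the 4 baseline images.
    Which condition comes first is an assumption (a parameter here)."""
    vols_per_block = int(BLOCK_S / TR_S)             # 6 volumes per 24-s block
    labels = [None] * N_BASELINE
    for _ in range(N_CYCLES):
        labels += [first] * vols_per_block + [second] * vols_per_block
    return labels

labels = run_labels()
minutes, seconds = divmod(N_VOLUMES * TR_S, 60)      # 400 s -> 6 min 40 s
in_plane_mm = FOV_MM / MATRIX                        # 240 mm / 64 = 3.75 mm
print(len(labels), minutes, seconds, in_plane_mm)
```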
Image processing and subtraction analysis
Image analysis was done with AFNI software (available at http://afni.nimh.nih.gov/afni) (Cox, 1996). Motion artifacts were minimized by registration of the raw echoplanar image volumes in each run to the first steady-state volume (fifth volume) in the run. Estimates of the three translation and three rotation movements at each point in each time-series were computed during registration and saved. The first four images of each run, during which spin relaxation reaches an equilibrium state, were discarded, and the mean, linear trend, and second-order trend were removed on a voxel-wise basis from the remaining 96 image volumes of each run.
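The discard-and-detrend step can be sketched with an ordinary least-squares polynomial fit per voxel. This is an illustration using NumPy, not a reproduction of the AFNI commands used in the study, and it assumes motion correction has already been applied (registration itself is not shown).

```python
import numpy as np

def detrend_run(data, n_discard=4):
    """Drop the first `n_discard` volumes, then remove each voxel's mean,
    linear, and quadratic trend by least squares.

    data: array of shape (n_volumes, n_voxels), assumed already motion-corrected.
    """
    kept = np.asarray(data, dtype=float)[n_discard:]
    n = kept.shape[0]
    t = np.linspace(-1.0, 1.0, n)                    # centered time axis
    X = np.column_stack([np.ones(n), t, t ** 2])     # mean, linear, quadratic
    coef, *_ = np.linalg.lstsq(X, kept, rcond=None)
    return kept - X @ coef

# A voxel that is pure drift should be flattened to ~0
vols = np.arange(100, dtype=float)
drift = np.column_stack([5 + 0.1 * vols, 2 - 0.001 * vols ** 2])
print(np.abs(detrend_run(drift)).max())
```

Centering the time axis keeps the design matrix well conditioned; it spans the same space as raw volume indices, so the residuals are the same.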
Multiple regression analysis of each run in each subject was performed to identify voxels showing task-associated changes in BOLD signal. An idealized BOLD response was derived by convolving the 24-s on/off task alternation function with a canonical hemodynamic response modeled using a gamma function (Cohen, 1997). Movement vectors computed during image registration were included in the model to remove residual variance associated with motion-related changes in MRI signal. This analysis generated, for each functional run, a map of beta coefficients representing the magnitude of the response at each voxel and a map of correlation coefficients representing the statistical fit of the observed data to the idealized BOLD response function.
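The model described above can be sketched as follows: a 24-s on/off boxcar is convolved with a gamma-variate response and sampled once per 4-s volume, then fit by ordinary least squares together with six motion regressors and an intercept. The gamma parameters (b = 8.6, c = 0.547, peak at b·c ≈ 4.7 s) are values often quoted for AFNI's implementation of Cohen's (1997) model and are treated here as assumptions. The "correlation coefficient" is computed as a partial correlation after removing the nuisance regressors, which is one reasonable reading rather than AFNI's exact statistic.

```python
import numpy as np

def gamma_hrf(t, b=8.6, c=0.547):
    """Gamma-variate response peaking at t = b*c (assumed parameters), peak value 1."""
    t = np.asarray(t, dtype=float)
    h = np.zeros_like(t)
    pos = t > 0
    h[pos] = (t[pos] / (b * c)) ** b * np.exp(b - t[pos] / c)
    return h

def task_regressor(n_vols=96, tr=4.0, block_s=24.0, dt=0.1):
    """Convolve a 24-s on/off boxcar with the gamma response; sample once per TR."""
    t = np.arange(0, n_vols * tr, dt)
    boxcar = ((t // block_s) % 2 == 0).astype(float)   # 'on' block first (assumed)
    hrf = gamma_hrf(np.arange(0, 30, dt))
    conv = np.convolve(boxcar, hrf)[: t.size] * dt
    idx = np.round(np.arange(n_vols) * tr / dt).astype(int)
    return conv[idx]

def fit_voxel(y, task, motion):
    """OLS with task + 6 motion regressors + intercept.
    Returns the task beta and the partial correlation of y with the task regressor."""
    n = y.size
    nuis = np.column_stack([np.ones(n), motion])
    X = np.column_stack([task, nuis])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Residualize y and the task regressor on the nuisance terms, then correlate
    proj = nuis @ np.linalg.pinv(nuis)
    ry, rt = y - proj @ y, task - proj @ task
    r = float(ry @ rt / np.sqrt((ry @ ry) * (rt @ rt)))
    return float(beta[0]), r

rng = np.random.default_rng(0)
task = task_regressor()
motion = rng.normal(size=(96, 6))          # stand-in motion estimates
y = 2.5 * task + 0.3 * motion[:, 0] + rng.normal(scale=0.01, size=96)
beta, r = fit_voxel(y, task, motion)
print(round(beta, 3), round(r, 4))
```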
Group activation maps were created with a random-effects model treating subject as a random factor. The beta coefficient map from each subject was spatially smoothed with a 6-mm full-width-half-maximum Gaussian kernel to compensate for intersubject variance in anatomical structure. These maps were then resized to fit standard stereotaxic space (Talairach & Tournoux, 1988) using piece-wise affine transformation and linear interpolation to a 1-mm³ voxel grid. A single-sample, two-tailed t-test was then conducted at each voxel for each run to identify voxels with mean beta coefficients that differed from zero. These group maps were thresholded using a voxel-wise 2-tailed probability of p < 0.0001 (|t-deviate| ≥ 4.55) and minimum cluster size of 200 mm³, resulting in a whole-brain corrected, 2-tailed probability threshold of p < 0.05 for each group map, as determined by Monte-Carlo simulation.
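The group procedure can be sketched with SciPy: FWHM-specified Gaussian smoothing, a voxel-wise one-sample t-test across subjects, and a cluster-size filter. This sketch assumes the beta maps are already in a common 1-mm space (spatial normalization is not shown), and it takes the 200-mm³ cluster minimum as given rather than reproducing the Monte-Carlo simulation. Positive and negative clusters are labeled separately so that adjacent voxels of opposite sign are not merged.

```python
import numpy as np
from scipy import ndimage, stats

def smooth(beta_map, fwhm_mm=6.0, voxel_mm=1.0):
    """Gaussian smoothing specified by FWHM (converted to sigma in voxels)."""
    sigma_vox = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm
    return ndimage.gaussian_filter(beta_map, sigma=sigma_vox)

def group_map(betas, p_voxel=1e-4, min_cluster_mm3=200.0, voxel_mm3=1.0):
    """Random-effects group map.

    betas: array (n_subjects, x, y, z) of beta maps in a common space.
    Runs a two-tailed one-sample t-test against 0 at every voxel, then keeps
    only same-sign clusters of suprathreshold voxels of at least min_cluster_mm3.
    """
    t, p = stats.ttest_1samp(betas, popmean=0.0, axis=0)
    keep = np.zeros(t.shape, dtype=bool)
    for sign_mask in (t > 0, t < 0):
        labels, n = ndimage.label((p < p_voxel) & sign_mask)
        for k in range(1, n + 1):
            cluster = labels == k
            if cluster.sum() * voxel_mm3 >= min_cluster_mm3:
                keep |= cluster
    return np.where(keep, t, 0.0)

# Synthetic example: 26 subjects, a 343-voxel cube with a true effect
rng = np.random.default_rng(1)
betas = rng.normal(size=(26, 12, 12, 12))
betas[:, 2:9, 2:9, 2:9] += 2.0
result = group_map(betas)
print(np.count_nonzero(result))
```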
The final analysis determined the mean number of significantly activated voxels in each hemisphere for each task contrast. Individual masks of the supratentorial brain volume were created for each subject by thresholding the first echoplanar image volume to exclude voxels outside of the brain, followed by manual editing to remove the cerebellum and brainstem. The resulting mask was aligned to standard stereotaxic space and divided at the midline to produce separate left and right whole-hemisphere regions of interest (ROIs). The correlation coefficient maps from each subject were then thresholded at a whole-brain corrected p < 0.05 (voxel-wise p < 0.001 and minimum cluster size of 295 mm³) and converted to standard stereotaxic space. Activated voxels were then automatically counted in the left and right hemisphere ROIs for each subject and task contrast. A laterality index (LI) was computed for each subject and task contrast using the formula (L – R)/(L + R), where L and R are the number of voxels in the left and right hemisphere ROIs. LIs computed in this way are known to vary as a function of the significance threshold, becoming more symmetrical as the threshold is lowered (Adcock et al., 2003). LIs for some tasks have been shown to vary substantially depending on the brain region in which the voxels are counted (Lehéricy et al., 2000; Spreer et al., 2002). To permit a meaningful comparison between activation protocols, we therefore used the same threshold and whole-hemisphere ROI for all protocols.
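A minimal sketch of the hemisphere split and the LI formula follows. It assumes each voxel's stereotaxic x coordinate is available (negative x = left) and, as an assumption, assigns voxels exactly at x = 0 to neither hemisphere. The one-dimensional "brain" in the example is a toy.

```python
import numpy as np

def hemisphere_masks(brain_mask, x_mm):
    """Split a brain mask at the midline using each voxel's stereotaxic x coordinate
    (negative x = left). Voxels exactly at x = 0 go to neither side (an assumption)."""
    return brain_mask & (x_mm < 0), brain_mask & (x_mm > 0)

def laterality_index(active, left_mask, right_mask):
    """LI = (L - R) / (L + R) from counts of suprathreshold voxels in each hemisphere."""
    L = int(np.count_nonzero(active & left_mask))
    R = int(np.count_nonzero(active & right_mask))
    if L + R == 0:
        return float("nan")          # undefined when nothing is active
    return (L - R) / (L + R)

x = np.arange(-5, 5)                 # toy 1-D "brain" spanning the midline
brain = np.ones_like(x, dtype=bool)
left, right = hemisphere_masks(brain, x)
active = (x <= -2) | (x == 3)        # 4 left voxels, 1 right voxel
print(laterality_index(active, left, right))   # (4 - 1) / (4 + 1) = 0.6
```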
All participants learned the tasks easily and tolerated the scanning procedure well. Performance on the Tone Decision task was uniformly good, with participants attaining a mean score of 98.4% correct (SD = 1.9, range 89–100%). A paired t-test showed no significant difference in Tone Decision performance when the task was paired with rest compared to when it was paired with the Semantic Decision task (p = 0.256). Participants also performed well in discriminating targets from distractors on the Semantic Decision task, with a mean score of 92.3% correct (SD = 4.4, range 72–100%). It should be noted that judgments in the Semantic Decision task are subjective and depend on participants' personal experiences, hence there are no strictly correct or incorrect responses. Accuracy scores reflect the similarity between a participant’s responses and those of a group of participants, and are intended merely to demonstrate compliance with the task. Repeated-measures ANOVA showed no significant difference in Semantic Decision performance when the task was paired with rest, with the Tone Decision task, or with the Phoneme Decision task (p = 0.745). Accuracy on the Phoneme Decision task averaged 92.4% correct (SD = 4.7, range 77–100%).
Results for each of the main speech comprehension contrasts are described below, as well as several relevant contrasts between the control conditions. Peak activation coordinates for each contrast are given in the Appendix.
Passive Words versus Rest
Activation during passive listening to words, compared to a resting state, occurred mainly in the superior temporal gyrus (STG) bilaterally, including Heschl's gyrus (HG) and surrounding auditory association cortex in the planum temporale (PT), lateral STG, and upper bank of the superior temporal sulcus (STS) (Fig. 1, top panel). Smaller foci of activation were observed in two left hemisphere regions: the inferior precentral sulcus (junction of Brodmann areas (BA) 6, 44, and 8) and the posterior inferior temporal gyrus (ITG) (BA 37). Stronger activation during the resting state (blue areas in Fig. 1, top) was observed in the posterior cingulate gyrus and precuneus bilaterally.
Passive Words versus Passive Tones
Much of the STG activation to words could be due to prelinguistic processing of auditory information, and therefore not specific to speech or language. As shown in the middle panel of Fig. 1, passive listening to tone sequences elicited a very similar pattern of bilateral activation in STG, HG, and PT, demonstrating that activation in these regions is not specific to speech.
The lower panel of Fig. 1 shows a direct contrast between passive words and passive tones. Contrasting words with this nonspeech auditory control condition should eliminate activation related to prelinguistic auditory processing. A comparison of the top and bottom panels of Fig. 1 confirms that activation in HG, PT, and surrounding regions of the STG was greatly reduced by incorporating this control condition. Activation for words relative to tones is restricted to ventral regions of the STG lying in the STS. This STS activation is clearly lateralized to the left hemisphere, consistent with the idea that additional activation for words over tones reflects language-related phoneme perception processes. No other areas showed greater activation for words over tones. Stronger activation for tones was noted in the posterior STG bilaterally, and in the right posterior cingulate gyrus.
Semantic Decision versus Rest
In contrast to the passive word listening condition, the Semantic Decision task requires participants to focus attention on the words, retrieve specific semantic knowledge, make a decision, and generate a motor response. Like the passive listening condition, this task produced bilateral activation of the STG due to auditory processing, though this activation was somewhat more extensive than in the passive listening condition (Fig. 2, upper panel). In addition, the Semantic Decision task activated a complex, bilateral network of frontal, limbic, and subcortical structures. Activation in the lateral frontal lobe was strongly left-lateralized, involving cortex in the pars opercularis of the inferior frontal gyrus (IFG) and adjacent middle frontal gyrus (MFG), and distinct regions of ventral and dorsal premotor cortex (BA 6, frontal eye field). Strong, symmetric activation occurred in the supplementary motor area (SMA), anterior cingulate gyrus, and anterior insula. Smaller cortical activations involved the left intraparietal sulcus (IPS) and the right central sulcus, the latter consistent with use of a left-hand response for the task. Bilateral activation also occurred in several subcortical regions, including the putamen (stronger on the right), anterior thalamus, medial geniculate nuclei, and paramedian mesencephalon. Finally, there was activation in the cerebellum bilaterally, involving both medial and lateral structures, and stronger on the left. Regions of task-induced deactivation (i.e., relatively higher BOLD signal in the resting condition) were observed in the posterior cingulate gyrus and adjacent precuneus bilaterally, the rostral and subgenual cingulate gyrus bilaterally, and the left postcentral gyrus.
Semantic Decision versus Tone Decision
Much of the activation observed in the Semantic Decision–Rest contrast could be explained by general executive, attention, working memory, and motor processes that are not specific to language tasks. As shown in the middle panel of Fig. 2, the Tone Decision task elicited a similar pattern of bilateral activation in premotor, SMA, anterior cingulate, anterior insula, and subcortical regions. In addition, the Tone Decision task activated cortex in the right hemisphere that was not engaged by the Semantic task, including the posterior IFG, dorsolateral prefrontal cortex (inferior frontal sulcus), supramarginal gyrus (SMG) (BA 40), and mid-middle temporal gyrus (MTG). Regions of task-induced deactivation were much more extensive with this task, involving large regions of the posterior cingulate gyrus and precuneus bilaterally, rostral/subgenual cingulate gyrus bilaterally, left orbital and medial frontal lobe, dorsal prefrontal cortex in the superior frontal gyrus (SFG) and adjacent MFG (mainly on the left), angular gyrus (mainly left), ventral temporal lobe (parahippocampus, mainly on the left), and the left anterior temporal pole.
The lower panel of Fig. 2 shows the contrast between Semantic Decision and Tone Decision tasks. Using the Tone Decision task as a control condition should eliminate activation in general executive systems as well as in low-level auditory and motor areas. A comparison of the top and bottom panels of Fig. 2 confirms these predictions, showing subtraction of the bilateral activation in dorsal STG, premotor cortex and SMA, anterior insula, and deep nuclei that is common to both tasks. Compared to the Tone task, the Semantic task produced relative BOLD signal enhancement in many left hemisphere association and heteromodal regions, including much of the prefrontal cortex (anterior and mid-IFG, SFG, and portions of MFG), several regions in the lateral and ventral temporal lobe (MTG and ITG, anterior fusiform gyrus, parahippocampal gyrus, anterior hippocampus), the angular gyrus, and the posterior cingulate gyrus. Many of these regions (dorsal prefrontal cortex, angular gyrus, posterior cingulate) were not prominently activated when the Semantic Decision task was contrasted with Rest, suggesting that these regions are also active during the resting state. Their appearance in the Semantic Decision–Tone Decision contrast is due to their relative deactivation (or lack of tonic activation) during the Tone task. Other areas activated by the Semantic task relative to the Tone task included the right cerebellum and smaller foci in the pars orbitalis of the right IFG, right SFG, right angular gyrus, right posterior cingulate gyrus, and left anterior thalamus.
Several areas showed relatively higher BOLD signals during the Tone Decision task, including the PT bilaterally, the right SMG and anterior IPS, and scattered regions of premotor cortex bilaterally.
Semantic Decision versus Phoneme Decision
Stimuli used in the Tone Decision task are acoustically much simpler than the speech sounds used in the Semantic Decision task and contain no phonemic information. The Phoneme Decision task, which requires participants to process meaningless speech sounds (pseudowords), provides a control for phonemic processing, allowing more specific identification of brain regions involved in semantic processing.
The following regions showed stronger activation for the Semantic compared to the Phoneme task (Fig. 3): left dorsal prefrontal cortex (SFG and adjacent MFG), pars orbitalis (BA 47) of the left IFG, left orbital frontal cortex, left angular gyrus, bilateral ventromedial temporal lobe (parahippocampus, fusiform gyrus, and anterior hippocampus, more extensive on the left), bilateral posterior cingulate gyrus, and right posterior cerebellum. Small activations were observed in the anterior left STS and the right pars orbitalis. In contrast to the Semantic Decision–Tone Decision contrast (see Fig. 2, bottom), there was little or no activation of dorsal regions of the left IFG or adjacent MFG, or of the lateral temporal lobe (MTG, ITG). These latter regions must have been activated in common during both the Semantic and Phoneme Decision tasks, and are therefore likely to be involved in presemantic phonological processes, such as phoneme recognition. Posterior regions of the left IFG and adjacent premotor cortex (BA 44/6) were in fact activated more strongly by the Phoneme task than the Semantic task. Other regions showing this pattern included the right posterior IFG and premotor cortex, extensive regions of the STG bilaterally, the SMG and anterior IPS bilaterally, and the SMA bilaterally.
A notable aspect of the Semantic Decision activation pattern for this contrast is how closely it resembles the network of brain areas showing task-induced deactivation (i.e., stronger activation during the resting state) in the contrast between Tone Decision and Rest (blue areas in Fig. 2, middle panel). Fig. 4 shows these regions of stronger activation for the resting state, duplicated from Fig. 2, together with the areas activated by the Semantic relative to the Phoneme task. In both the Semantic Decision–Phoneme Decision contrast and the Rest–Tone Decision contrast, stronger BOLD signals are observed in left angular gyrus, left dorsal prefrontal cortex (SFG and adjacent MFG), left orbital frontal cortex, left pars orbitalis, posterior cingulate gyrus, bilateral ventromedial temporal lobe (more extensive on the left), and left temporal pole.
Activation extent and degree of lateralization
For clinical applications, it is important not only that a language mapping protocol identify targeted linguistic systems, but also that it produce consistent activation at the single subject level and a left-lateralized pattern useful for determining language dominance. We quantified the extent and lateralization of activation for each task protocol by counting the number of voxels that exceeded a whole-brain corrected significance threshold in each participant (Table 4). Repeated-measures ANOVA showed effects of task protocol on total activation volume (F(4,100) = 19.528, p < 0.001), left hemisphere activation volume (F(4,100) = 23.070, p < 0.001), right hemisphere activation volume (F(4,100) = 15.133, p < 0.001), and LI (F(4,100) = 21.045, p < 0.001). Total activation volume was largest for the Semantic Decision–Tone Decision protocol, and was greater for this protocol than for all others (all pair-wise p < 0.05, Bonferroni corrected for multiple comparisons) except the Semantic Decision–Rest protocol. Total activation volume was greater for all of the active task protocols than for any of the passive protocols (all pair-wise p < 0.05, Bonferroni corrected). Left hemisphere activation volume was largest for the Semantic Decision–Tone Decision protocol, which produced significantly more activated left hemisphere voxels than any of the other four protocols (all pair-wise p < 0.05, Bonferroni corrected). The LI was greatest for the Semantic Decision–Tone Decision and Semantic Decision–Phoneme Decision protocols. LIs for these tasks did not differ, but both were greater than the LIs for Passive Words–Rest and Semantic Decision–Rest (all pair-wise p < 0.05, Bonferroni corrected). Single-sample t-tests showed that LIs for the Passive Words–Rest (p = 0.23) and Semantic Decision–Rest (p = 0.07) protocols did not differ from zero, whereas LIs for the other three protocols were all significantly greater than zero (all p < 0.0001). 
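The degrees of freedom reported above follow from the design: five protocols give df1 = 5 − 1 = 4, and 26 participants give df2 = 4 × 25 = 100; five protocols also yield 10 pairwise comparisons for the Bonferroni correction. The sketch below shows one way to compute these tests with NumPy and SciPy on placeholder data. The software actually used, and whether a sphericity correction was applied, are not stated in the text; this version applies none.

```python
import numpy as np
from itertools import combinations
from scipy import stats

def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA without sphericity correction.

    data: array (n_subjects, k_conditions). Returns (F, df1, df2, p).
    """
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_err = ((data - grand) ** 2).sum() - ss_cond - ss_subj
    df1, df2 = k - 1, (k - 1) * (n - 1)
    F = (ss_cond / df1) / (ss_err / df2)
    return F, df1, df2, stats.f.sf(F, df1, df2)

def bonferroni_paired(data):
    """Paired t-tests for every pair of conditions, p multiplied by the number of pairs."""
    pairs = list(combinations(range(data.shape[1]), 2))
    return {(i, j): min(1.0, stats.ttest_rel(data[:, i], data[:, j]).pvalue * len(pairs))
            for i, j in pairs}

rng = np.random.default_rng(2)
volumes = rng.normal(size=(26, 5))   # placeholder data: 26 participants x 5 protocols
F, df1, df2, p = rm_anova_oneway(volumes)
print(df1, df2, len(bonferroni_paired(volumes)))   # 4 100 10
```

With only two conditions, this F equals the square of the paired t statistic, which is a convenient check on the sums of squares.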
Finally, LIs for the Semantic Decision–Tone Decision and Semantic Decision–Phoneme Decision protocols showed much less variation across the group (smaller SD) compared to the protocols using passive conditions (all F ratios > 4.4, all p < 0.001).
Table 4. Mean extent and lateralization of language-related activation for five fMRI contrasts
Activation volume in ml
Passive Words vs. Rest
Passive Words vs. Passive Tones
Semantic Decision vs. Rest
Semantic Decision vs. Tone Decision
Semantic Decision vs. Phoneme Decision
Numbers in parentheses are standard deviations. Voxel counts have been converted to normalized volumes of activation, expressed in ml.
In summary, the active task protocols (Semantic Decision–Rest, Semantic Decision–Tone Decision, and Semantic Decision–Phoneme Decision) produced much more activation than the passive protocols, the Semantic Decision–Tone Decision protocol produced by far the largest activation volume in the left hemisphere, and the protocols that included a nonresting control (Passive Words–Passive Tones, Semantic Decision–Tone Decision, and Semantic Decision–Phoneme Decision) were associated with stronger left-lateralization of activation than the protocols that used a resting baseline. Of the five protocols, the Semantic Decision–Tone Decision protocol showed the optimal combination of activation volume, leftward lateralization, and consistency of lateralization.
For clinical applications, it is important to know the probability of detecting significant activation in targeted ROIs in individual patients. We previously constructed left frontal, temporal, and angular gyrus ROIs using activation maps from the Semantic Decision–Tone Decision contrast in a group of 80 right-handed adults (Frost et al., 1999; Szaflarski et al., 2002; Sabsevitz et al., 2003). With these three ROIs as targets, the Semantic Decision–Tone Decision protocol produced activated voxels (whole-brain corrected p < 0.05) in all three ROIs in every one of the 26 participants in the current study.
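The detection criterion can be made explicit with a small sketch: a participant counts as detected in an ROI if at least one voxel surviving the corrected threshold lies inside it, and the figure of interest is the fraction of participants detected in all ROIs at once. Representing maps as sets of voxel indices, the function names, and the toy coordinates are all hypothetical; the actual ROIs were derived from group maps as described above.

```python
Voxel = tuple[int, int, int]


def detected(active: set[Voxel], roi: set[Voxel]) -> bool:
    """True if at least one suprathreshold voxel lies inside the ROI."""
    return not active.isdisjoint(roi)


def detection_rates(participants: list[set[Voxel]], rois: dict[str, set[Voxel]]) -> dict[str, float]:
    """Per-ROI detection rate across participants, plus the rate for all ROIs at once."""
    n = len(participants)
    rates = {name: sum(detected(p, roi) for p in participants) / n for name, roi in rois.items()}
    rates["all"] = sum(all(detected(p, roi) for roi in rois.values()) for p in participants) / n
    return rates


# Toy ROIs and participant maps (hypothetical voxel indices, not study data).
rois = {
    "frontal": {(10, 40, 20), (11, 40, 20)},
    "temporal": {(20, 10, 5)},
    "angular": {(30, 5, 30)},
}
participants = [
    {(11, 40, 20), (20, 10, 5), (30, 5, 30)},  # active in all three ROIs
    {(10, 40, 20), (20, 10, 5)},               # no angular gyrus activation
]
print(detection_rates(participants, rois))
```

Under this criterion, the result reported above corresponds to a value of 1.0 for the "all" entry across the 26 participants.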
Our aim in this study was to compare five types of functional imaging contrasts used to examine speech comprehension networks. The contrasts produced markedly different patterns of activation and lateralization. These differences have important implications for the selection and interpretation of clinical fMRI language mapping protocols.
Passive Words–Rest
Many researchers have attempted to identify comprehension networks by contrasting listening to words with a resting state. Our Passive Words–Rest contrast replicates prior studies showing bilateral, symmetric activation of the STG, including primary auditory areas in HG and PT as well as surrounding association cortex, during passive word listening (Petersen et al., 1988; Wise et al., 1991; Mazoyer et al., 1993; Price et al., 1996; Binder et al., 2000; Specht & Reul, 2003). Interpretations of this STG activation vary, with some authors equating it with the “receptive language area of Wernicke” and others arguing that it represents a prelinguistic auditory stage of processing (Binder et al., 1996a; Binder et al., 2000). The latter account rests on the fact, often neglected in traditional models of language processing, that speech sounds are complex acoustic events. Speech phonemes (consonants and vowels), prosodic intonation, and speaker identity are all encoded in subtle spectral (frequency) and temporal patterns that must be recognized quickly and efficiently by the auditory system (Klatt, 1989). Like the monkey STG, which is composed largely of neurons coding such auditory information (Baylis et al., 1987; Rauschecker et al., 1995), the human STG (including the classical Wernicke area) appears to be specialized for processing complex auditory information. Thus, much of the activation observed in contrasts between word listening and resting can be attributed to auditory perceptual processes rather than to recognition of specific words. According to this model, the same STG regions should be activated during listening to spoken nonwords (e.g., “slithy toves”) and to complex nonspeech sounds. Many imaging studies have confirmed these predictions (Wise et al., 1991; Démonet et al., 1992; Price et al., 1996; Binder et al., 2000; Scott et al., 2000; Davis & Johnsrude, 2003; Specht & Reul, 2003; Uppenkamp et al., 2006).
Because auditory perceptual processes are represented in both left and right STG, this model also accounts for why the activation observed with passive listening is bilateral, and why lateralization measures obtained with this type of contrast are not correlated with language dominance as measured by the Wada test (Lehéricy et al., 2000).
Passive Words–Passive Tones
We used simple tone sequences to “subtract out” activation in the STG due to auditory perceptual processes with the expectation that only relatively early auditory processing would be removed using this control. The Passive Words–Passive Tones contrast confirmed similar prior studies, showing activation favoring speech in the mid-portion of the STS, with strong left lateralization (Mummery et al., 1999; Binder et al., 2000; Scott et al., 2000; Ahmad et al., 2003; Desai et al., 2005; Liebenthal et al., 2005; Benson et al., 2006). Nearly all of the activation observed in the dorsal STG in the Passive Words–Rest contrast was removed by incorporating this simple nonspeech control, confirming that this more dorsal activation is not specific to speech or to words. The resulting activation, though strongly left-lateralized, was much less extensive than with other protocols. Six participants had little or no measurable activation (<0.1 ml) with the Passive Words–Passive Tones protocol. Among the other 20 participants, the total activation volume for the Semantic Decision–Tone Decision protocol was, on average, 44 times greater than for Passive Words–Passive Tones.
Semantic Decision–Rest
Active tasks that require participants to consciously process specific information about a stimulus are often used to enhance activation in brain regions associated with such processing. We designed a semantic decision task that required participants to retrieve specific factual information about a concept and use that information to make an explicit decision. Like the passive listening condition, this task produced bilateral activation of the STG due to auditory processing. This activation was somewhat more extensive than in the passive listening condition, consistent with previous reports that attention enhances auditory cortex activation (O'Leary et al., 1996; Grady et al., 1997; Jancke et al., 1999; Petkov et al., 2004; Johnson & Zatorre, 2005; Sabri et al., 2008). The Semantic Decision–Rest contrast also activated widespread prefrontal, anterior cingulate, anterior insula, and subcortical structures bilaterally. Some of these activations can be attributed to general task performance processes that are not specific to language. For example, any task that requires a decision about a stimulus must engage attentional systems that enable the participant to attend to and maintain attention on the stimulus. Similarly, any such task must involve maintenance of the task instructions and response procedure in working memory, a mechanism for making a decision based on the instructions, and a mechanism for executing a particular response. Dorsolateral prefrontal and inferior frontal cortex, premotor cortex, SMA, anterior cingulate, anterior insula, IPS, and subcortical nuclei have all been linked with these general attention and executive processes in prior studies (Paulesu et al., 1993; Braver et al., 1997; Smith et al., 1998; Honey et al., 2000; Adler et al., 2001; Braver et al., 2001; Ullsperger & von Cramon, 2001; Corbetta & Shulman, 2002; Krawczyk, 2002; Binder et al., 2004). 
Although there is modest leftward lateralization of the activation in some of these areas, most notably in the left dorsolateral prefrontal cortex, much of it is bilateral and symmetric, consistent with prior studies of attention and working memory.
Semantic Decision–Tone Decision
Active control tasks are used to subtract activation related to general task processes (Démonet et al., 1992). The aim is to design a control task that activates these systems to roughly the same degree as the language task while making minimal demands on language-specific processes. The Tone Decision task used in the present study requires subjects to maintain attention on a series of nonlinguistic stimuli, hold these in working memory, generate a decision consistent with task instructions, and produce an appropriate motor response. Because many of these processes are common to both the Semantic Decision and Tone Decision tasks, activations associated with general executive and attentional demands of the Semantic Decision task are not observed in the Semantic Decision–Tone Decision contrast. As with the Passive Words–Passive Tones contrast, the tone stimuli used in the control task also cancel out activation in the dorsal STG bilaterally and even produce relatively greater activation of the PT compared to words (Binder et al., 1996a). Other areas with relatively higher BOLD signals during the Tone Decision task included scattered regions of premotor cortex bilaterally, the right SMG, and right anterior IPS. These activations probably reflect the greater demands made by the Tone task on auditory short-term memory (Crottaz-Herbette et al., 2004; Arnott et al., 2005; Gaab et al., 2006; Brechmann et al., 2007; Sabri et al., 2008).
Most striking about the Semantic Decision–Tone Decision contrast, however, are the extensive, left-lateralized activations in the angular gyrus, dorsal prefrontal cortex, and ventral temporal lobe that were not visible in the Semantic Decision–Rest map. These areas have been linked with lexical-semantic processes in many prior imaging studies (Démonet et al., 1992; Price et al., 1997; Cappa et al., 1998; Binder et al., 1999; Roskies et al., 2001; Binder et al., 2003; Devlin et al., 2003; Scott et al., 2003; Spitsyna et al., 2006). Lesions in these sites produce deficits of language comprehension and concept retrieval in patients with Wernicke aphasia, transcortical aphasia, Alzheimer disease, semantic dementia, herpes encephalitis, and other syndromes (Alexander et al., 1989; Damasio, 1989; Gainotti et al., 1995; Dronkers et al., 2004; Nestor et al., 2006; Noppeney et al., 2007). These regions form a widely distributed, left-lateralized network of higher-order, supramodal cortical areas distinct from early sensory and motor systems. We propose that this network is responsible for storing and retrieving the conceptual knowledge that underlies word meaning. Processing of such conceptual knowledge is the foundation for both language comprehension and propositional language production (Levelt, 1989; Awad et al., 2007). It is these brain regions, in other words, that represent the “language comprehension areas” in the human brain, as opposed to the early auditory areas, attentional networks, and working memory systems highlighted by the Semantic Decision–Rest contrast.
Why is activation in these regions not visible in the Semantic Decision–Rest contrast? The most likely explanation is that these regions are also active during the conscious resting state (Binder et al., 1999). These activation patterns suggest, in particular, that people retrieve and use conceptual knowledge and process word meanings even when they are outwardly “resting.” Though counterintuitive to many behavioral neuroscientists, this notion has a long history in cognitive psychology, where it has been discussed under such labels as “stream of consciousness,” “inner speech,” and “task-unrelated thoughts” (James, 1890; Hebb, 1954; Pope & Singer, 1976). Far from being a trivial curiosity, this ongoing conceptual processing may be the mechanism underlying our unique ability as humans to plan the future, interpret past experience, and invent useful artifacts (Binder et al., 1999). The existence of ongoing conceptual processing is supported not only by everyday introspection and a body of behavioral research (Antrobus et al., 1966; Singer, 1993; Teasdale et al., 1993; Giambra, 1995), but also by functional imaging studies (Binder et al., 1999; McKiernan et al., 2006; Mason et al., 2007). In particular, a recent study demonstrated a correlation between the occurrence of unsolicited thoughts and fMRI BOLD signals in the left angular gyrus and ventral temporal lobe (McKiernan et al., 2006). A key finding from both the behavioral and imaging studies is that ongoing conceptual processes are interrupted when subjects must attend and respond to an external stimulus. This observation explains why activation in this semantic network is observed when the Semantic Decision task is contrasted with the Tone Decision task but not when it is contrasted with a resting state. Unlike the resting state, the tone task interrupts semantic processing; thus, a difference in the level of activation of the semantic system appears only when the tone task is used as a baseline.
This account also explains why this semantic network was not visible in either of the passive listening contrasts. Passive listening places no demands on attention or decision processes and is therefore similar to resting. According to the model presented here, conceptual processes continue unabated during passive listening regardless of the type of stimuli presented. The semantic network therefore remains equally active through all of these conditions and is not visible in contrasts between them. Based on these findings, we disagree with authors who advocate the use of passive listening paradigms for mapping language comprehension systems. This position was articulated strongly by Crinion et al. (2003), who compared active and passive contrasts using speech and reversed speech stimuli. Similar activations were observed with both paradigms, which the authors interpreted as evidence that active suppression of default semantic processing is not necessary. Two aspects of the Crinion et al. study are noteworthy, however. First, the active task required participants to detect two or three changes from a male to a female speaker during a story that lasted several minutes. This task was likely very easy and did not continuously engage participants' attention. Second, the activation observed with these contrasts was primarily in the left STS, resembling the Passive Words–Passive Tones contrast in the current study. There was relatively little activation of ventral temporal regions such as those activated here in the Semantic Decision–Tone Decision contrast. Thus, we interpret the activations reported by Crinion et al. as occurring mainly at the level of phoneme perception, though their sentence materials likely also activated dorsolateral temporal lobe regions involved in syntactic parsing (Humphries et al., 2006; Caplan et al., 2008).
In contrast to these systems, mapping conceptual/semantic systems in the ventral temporal lobe requires active suppression of ongoing conceptual processes.
The model just developed is outlined in schematic form in Table 5. The table indicates, for each of the experimental conditions, whether or not the condition engages any of the following six processes: auditory perception, phoneme perception, retrieval of concept knowledge, attention, working memory, and response production. A comparison of the + and – entries for any two of the conditions provides a prediction for which processes are likely to be represented in the contrast between those conditions. Reviewing the contrasts discussed above, for example, the Passive Words–Rest contrast is predicted to reveal activation related to auditory and phoneme perception, and the Passive Words–Passive Tones contrast is predicted to show activation related more specifically to phoneme perception. Semantic Decision–Rest is predicted to show activation in auditory and phoneme perception, attention, working memory, and response production systems. Semantic Decision–Tone Decision is predicted to show activation in phoneme perception and concept knowledge systems.
Table 5. Hypothesized neural systems engaged by the six conditions used in the study
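The logic of Table 5 amounts to a set difference: a contrast A–B is predicted to reveal the processes marked + for A and – for B. The sketch below encodes one reading of the table, with +/– assignments inferred from the surrounding text (for example, conceptual processing is treated as engaged during rest and passive listening, following the model above, and suppressed by the Tone and Phoneme Decision tasks). It is illustrative only, not a reproduction of the published table; the dictionary and function names are hypothetical.

```python
# The six processes named in the text.
PROCESSES = ("auditory", "phoneme", "concept", "attention", "working_memory", "response")

# "+" entries per condition, inferred from the prose (a reconstruction, not Table 5 itself).
ENGAGED = {
    "Rest": {"concept"},
    "Passive Tones": {"auditory", "concept"},
    "Passive Words": {"auditory", "phoneme", "concept"},
    "Tone Decision": {"auditory", "attention", "working_memory", "response"},
    "Phoneme Decision": {"auditory", "phoneme", "attention", "working_memory", "response"},
    "Semantic Decision": set(PROCESSES),
}


def predicted(task: str, control: str) -> set[str]:
    """Processes engaged by task but not by control, i.e. predicted task > control activation."""
    return ENGAGED[task] - ENGAGED[control]


for task, control in [
    ("Passive Words", "Rest"),
    ("Passive Words", "Passive Tones"),
    ("Semantic Decision", "Rest"),
    ("Semantic Decision", "Tone Decision"),
    ("Semantic Decision", "Phoneme Decision"),
    ("Rest", "Tone Decision"),
]:
    print(f"{task} - {control}: {sorted(predicted(task, control))}")
```

Under this encoding, both Semantic Decision–Phoneme Decision and Rest–Tone Decision reduce to the conceptual system alone, which is the correspondence the model uses to account for the similarity of those two maps.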
Semantic Decision–Phoneme Decision
The last contrast we examined aims to isolate comprehension processes related to retrieval of word meaning. Because the Tone Decision task uses nonspeech stimuli, activation observed in the Semantic Decision–Tone Decision contrast represents both phoneme perception and lexical-semantic stages of comprehension. As illustrated in Table 5, the Phoneme Decision task, which incorporates nonword speech stimuli, is designed to activate the same auditory and phoneme perception processes engaged by the Semantic Decision task, but with minimal activation of conceptual knowledge. Areas activated in the Semantic Decision–Phoneme Decision contrast included the angular gyrus, ventral temporal lobe, dorsal prefrontal cortex, pars orbitalis of the IFG, orbital frontal cortex and posterior cingulate gyrus, all with strong leftward lateralization. These results are consistent with prior studies using similar contrasts (Démonet et al., 1992; Price et al., 1997; Cappa et al., 1998; Binder et al., 1999; Roskies et al., 2001; Devlin et al., 2003; Scott et al., 2003). Compared to the Semantic Decision–Tone Decision contrast, the Semantic Decision–Phoneme Decision contrast produces less extensive activation, consistent with the hypothesis that some of the activation observed in the former contrast is due to presemantic speech perception processes. It is also possible that the phoneme decision stimuli, though not words, were sufficiently word-like to partially or transiently activate word codes (Luce & Pisoni, 1998), resulting in partial masking of the lexical-semantic system.
The Phoneme Decision task requires participants to identify individual phonemes in the speech input and hold these in memory for several seconds. Thus this task makes greater demands on auditory analysis and phonological working memory than the Semantic Decision task. Posterior regions of the left IFG and adjacent premotor cortex (BA 44/6), and the SMG bilaterally, were activated more strongly by the Phoneme task than the Semantic task, supporting previous claims for involvement of these regions in phonological processes (Démonet et al., 1992; Paulesu et al., 1993; Buckner et al., 1995; Fiez, 1997; Devlin et al., 2003). Other areas activated more strongly by the Phoneme task included large regions of the STG bilaterally, the SMA bilaterally, the right posterior IFG and premotor cortex, and the anterior IPS bilaterally.
A close look at Table 5 reveals another contrast that could be used to identify activation related specifically to conceptual processing. In the contrast Rest–Tone Decision, the only system predicted to be more active during resting than during the tone task is the conceptual system. Fig. 4 shows a side-by-side comparison of the regions activated by the Semantic Decision–Phoneme Decision contrast and the Rest–Tone Decision contrast. In both cases, activation is observed in the left angular gyrus, ventral temporal lobe, dorsal prefrontal cortex, orbital frontal lobe, and posterior cingulate gyrus. The similarity between these activation maps is striking, and particularly so because they were generated using such different task contrasts. The same brain areas are activated by (1) resting compared to a tone decision task, and (2) a semantic decision task compared to a phoneme decision task. This outcome is highly counterintuitive and can only be explained by a model, such as the one in Table 5, that includes activation of conceptual processes during the resting state (Binder et al., 1999). Note that this resting state activation is the same phenomenon often called “the default state” in research on task-induced deactivations (Raichle et al., 2001). Our model provides an explicit account of the close similarity between semantic and task-induced deactivation networks, which has not been addressed in previous accounts of the default state.
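The similarity between these two maps is described qualitatively above. One common way to put a number on such overlap, which this study did not report, is the Dice coefficient computed on the sets of suprathreshold voxels. The sketch assumes each thresholded map is represented as a set of voxel indices; the toy maps are hypothetical.

```python
Voxel = tuple[int, int, int]


def dice(a: set[Voxel], b: set[Voxel]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|): 1.0 = identical maps, 0.0 = no shared voxels."""
    if not a and not b:
        return 1.0  # two empty maps are treated as identical (a convention)
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical thresholded maps for two contrasts.
semantic_vs_phoneme = {(1, 2, 3), (1, 2, 4), (5, 6, 7), (8, 8, 8)}
rest_vs_tone = {(1, 2, 3), (1, 2, 4), (5, 6, 7), (9, 9, 9)}
print(dice(semantic_vs_phoneme, rest_vs_tone))
```

Because Dice depends on the threshold applied to each map, a comparison of this kind would normally be made with both maps thresholded identically.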
Implications for language mapping
These observations have several implications for the design of clinical language mapping protocols. We found that the Semantic Decision–Tone Decision contrast produced an optimal combination of consistent activation and a strongly left-lateralized pattern. These results are not due to any unique characteristics of this particular semantic task; many similar tasks could be designed to focus participants' attention on word concepts. Moreover, as should be clear from this study, activation patterns are not determined by a single task condition, but rather by differences in processing demands between two (or more) conditions. In designing protocols for mapping language comprehension areas, it is critically important to incorporate a contrasting condition that interrupts ongoing conceptual processes by engaging the subject in an attentionally demanding task. The Tone Decision task accomplishes this by requiring continuous perceptual analysis of meaningless tone sequences. The Semantic Decision–Tone Decision contrast thus identifies not only differences related to presemantic phoneme perception but also differences in the degree of semantic processing. It is this strong contrast, not the Semantic Decision task alone, that accounts for the extensive activation.
Consistent and extensive activation, however, is not the only goal of language mapping. More important is that the activation represents linguistic processes of interest. All effortful cognitive tasks require general executive and attentional processes that may not be relevant for the purpose of language mapping. The Tone Decision task also engages these processes and thus “subtracts” activation due to them, resulting in a map that more specifically identifies language-related activation. As with the semantic task, these characteristics of the tone task are not unique, and there are many possible variations on this task that could accomplish the same goals, perhaps more effectively.
The activation patterns we observed reinforce current views regarding the neuroanatomical representation of speech comprehension processes. In contrast to the traditional view that localizes comprehension processes in the posterior STG (Geschwind, 1971), modern neuroimaging and lesion correlation data suggest that comprehension depends on a widely distributed cortical network involving many regions outside the posterior STG and PT. These areas include distributed semantic knowledge stores located in ventral temporal (MTG, ITG, fusiform gyrus, temporal pole) and inferior parietal (angular gyrus) cortices, as well as prefrontal regions involved in retrieval and selection of semantic information.
fMRI methods for mapping lexical-semantic processes have particular relevance for the presurgical evaluation of patients with intractable epilepsy. The most common surgical procedure for intractable epilepsy is resection of the anterior temporal lobe (ATL), and the most consistently reported language deficit after ATL resection is anomia (Hermann et al., 1994; Langfitt & Rausch, 1996; Bell et al., 2000; Sabsevitz et al., 2003). Although these patients have difficulty producing names, the deficit is caused not by a problem with speech articulation or articulatory planning but by a lexical-semantic retrieval impairment. Difficulty retrieving names in such cases is a manifestation of partial damage to the semantic system that stores knowledge about the concept being named, or to the connections between the concept and its phonological representation (Levelt, 1989; Lambon Ralph et al., 2001). Efforts to use fMRI to predict and prevent anomia from ATL resection should therefore focus on identification of this lexical-semantic retrieval system rather than on auditory processes or motor aspects of speech articulation, neither of which is affected by temporal lobe surgery or plays a role in language outcome.
Consistent with this view is the fact that ATL resections commonly involve ventral regions of the temporal lobe while typically sparing most of the STG. Fig. 5 shows an overlap map of 23 left ATL resections performed at our center, computed by digital subtraction of preoperative and postoperative anatomical scans in each patient. As shown in the figure, the typical resection overlaps ventral lexical-semantic areas activated in the Semantic Decision–Tone Decision contrast (green), but not STG areas activated in the Semantic Decision–Rest contrast. From a clinical standpoint, it is critical to detect language zones that lie within the region to be resected, particularly since it is damage to these lexical-semantic systems that underlies the language deficits observed in these patients. In support of this model, lateralization of temporal lobe activation elicited with the Semantic Decision–Tone Decision contrast described here has been shown to predict naming decline after left ATL resection (Sabsevitz et al., 2003).
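An overlap map of this kind can be sketched as a voxelwise count across patients' binary resection masks, where each mask stands in for the voxels identified by subtracting the postoperative from the preoperative scan. The masks, the activation set, and the min_patients threshold used to define a "typical" resection are hypothetical; the sketch then asks what fraction of a contrast's activated voxels fall inside that typical region.

```python
from collections import Counter

Voxel = tuple[int, int, int]


def overlap_map(resections: list[set[Voxel]]) -> Counter:
    """For each voxel, count how many patients' resection masks include it."""
    counts: Counter = Counter()
    for mask in resections:
        counts.update(mask)  # adds 1 for every voxel in this patient's mask
    return counts


def fraction_in_typical_resection(activation: set[Voxel], counts: Counter, min_patients: int) -> float:
    """Share of activated voxels lying where at least min_patients resections overlap."""
    typical = {v for v, n in counts.items() if n >= min_patients}
    return len(activation & typical) / len(activation) if activation else 0.0


# Hypothetical masks for three patients and one contrast's activated voxels.
resections = [{(0, 0, 0), (1, 0, 0)}, {(1, 0, 0), (2, 0, 0)}, {(1, 0, 0)}]
counts = overlap_map(resections)
ventral_activation = {(1, 0, 0), (5, 5, 5)}
print(fraction_in_typical_resection(ventral_activation, counts, min_patients=2))
```

Applied to the two contrasts discussed above, a higher fraction for the Semantic Decision–Tone Decision map than for the STG-dominated Semantic Decision–Rest map would express numerically what the figure shows visually.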
The current study focused on cognitive processes elicited by various tasks and did not attempt to resolve all issues regarding language mapping methods. For example, no attempt was made to compare lateralization in different ROIs. Previous studies showed that placement of ROIs in brain regions with lateralized activation (e.g., IFG) can circumvent the problem of nonspecific bilateral activation in other regions (Lehéricy et al., 2000). Furthermore, we restricted the current investigation to protocols involving single words, yet recent data suggest that spoken sentences may provide a more potent stimulus for eliciting temporal lobe activation (Vandenberghe et al., 2002; Humphries et al., 2006; Spitsyna et al., 2006; Awad et al., 2007). Development of more sensitive methods for identifying semantic networks in the anterior ventral temporal lobe is a particularly important goal for future research. We also did not investigate the relative ease with which these protocols can be applied to patients with neurological conditions. The Semantic Decision–Tone Decision protocol, as conducted here with a regular alternation between active tasks, has been used successfully by our group in over 200 epilepsy patients with full-scale IQ ranging as low as 70 (Binder et al., 1996b; Springer et al., 1999; Sabsevitz et al., 2003; Binder et al., 2008). Personnel with expertise in cognitive testing, such as a neuropsychologist or cognitive neurologist, are needed to train the patient to understand and perform the tasks, as is standard with Wada testing and other quantitative tests of brain function. Finally, the current study does not address whether removal of the temporal lobe regions activated in these protocols reliably causes language deficits. Does sparing or removal of these regions account for variation in language outcome across a group of patients? This question is beyond the scope of the current study but will be critical to address in future research.
The authors thank Julie Frost, William Gross, Edward Possing, Thomas Prieto, and B. Douglas Ward for technical assistance. This study was supported by National Institute of Neurological Disorders and Stroke grants R01 NS33576 and R01 NS35929, and by National Institutes of Health General Clinical Research Center grant M01 RR00058.
Conflict of interest: We confirm that we have read the Journal's position on issues involved in ethical publication and affirm that this report is consistent with those guidelines. None of the authors has any conflicts of interest to disclose.