Can resting-state functional MRI serve as a complement to task-based mapping of sensorimotor function? A test–retest reliability study in healthy volunteers




Purpose:

To investigate whether resting-state functional MRI (fMRI) can reliably serve as a complement to task-based fMRI for presurgical mapping of the sensorimotor cortex.

Materials and Methods:

Functional data were obtained in 10 healthy volunteers using a 3 Tesla MRI system. Each subject performed five bilateral finger tapping experiments interleaved with five resting-state experiments. Following preprocessing, data from eight volunteers were further analyzed with the general linear model (finger tapping data) and independent component analysis (rest data). Test–retest reliability estimates (hit rate and false alarm rate) for resting-state fMRI activation of the sensorimotor network were compared with the reliability estimates for task-evoked activation of the sensorimotor cortex. The reliability estimates constituted a receiver operating characteristics curve from which the area under the curve (AUC) was calculated. Statistical testing was performed to compare the two groups with respect to reliability.


Results:

The AUC was generally higher for the task experiments, although the median AUC was not significantly different at the group level. The two groups also showed comparable levels of within-group variance.


Conclusion:

Test–retest reliability was comparable between resting-state measurements and task-based fMRI, suggesting that presurgical resting-state mapping of functional networks can serve as a supplement to task-based fMRI in cases where patient status precludes task-based fMRI. J. Magn. Reson. Imaging 2011. © 2011 Wiley-Liss, Inc.

BLOOD OXYGENATION LEVEL dependent functional MRI (BOLD fMRI) experiments, during which the subject performs a specific task, are an important method for presurgical mapping of pertinent areas of neocortex in, for example, tumor and epilepsy patients. A typical clinical fMRI experiment involves several paradigms aimed at activating brain areas in the vicinity of the lesion to provide information to the surgeon regarding areas at risk. Task-evoked fMRI activation requires active patient participation, which can be difficult to achieve, for example, in children or in patients with severe disease-related cognitive impairment or difficulties in performing instruction-guided movements. When task-based fMRI is not feasible, it would be of great value to be able to map the areas at risk without the need for active patient participation. To this end, several studies have indicated that functional networks in the human brain can be mapped using resting-state fMRI (rs-fMRI).

The feasibility of rs-fMRI has been reported in several publications for several subject groups. Damoiseaux et al (1) showed that consistent networks can be found in healthy volunteers using Tensor Probabilistic Independent Component Analysis and concluded that baseline activity of the brain is consistent across subjects. It has also been shown that rs-fMRI can be used to map the development of functional cortical networks from infancy to adolescence (2, 3). Rs-fMRI has, furthermore, revealed functional networks during periods of extended rest or during sleep (4–8), as well as during anesthesia (9, 10). The results of these studies suggest that rs-fMRI can be valuable in a clinical setting, possibly even in sleeping or anesthetized patients. In a study by Liu et al (11), the clinical use of rs-fMRI was investigated in candidates for surgery with respect to similarities between activation patterns for task-based and resting-state fMRI. Using a seed-ROI (region of interest) based approach for rs-fMRI, a significant overlap of the activated areas identified by the two approaches was found for both hand and tongue activation, which further supports the validity of rs-fMRI for presurgical mapping of sensorimotor areas. Shimony et al (12) suggested the use of seed-ROI based rs-fMRI as a presurgical planning technique in tumor patients. However, a potential drawback of using the seed-ROI based approach in a clinical setting is that the ROI has to be defined using morphological image information. In the case of locally distorted anatomy, not uncommon in the vicinity of brain tumors, this could be problematic. Furthermore, areas such as the language system exhibit greater variability across subjects, posing difficulties in identifying seed regions even for patients exhibiting normal brain anatomy. An important aspect of any method aiming at presurgical assessment of brain regions at risk is the issue of repeatability. Chen et al (13) applied group ICA to a group of healthy volunteers over five rs-fMRI sessions and found that several resting-state networks were consistent over time. Furthermore, a study reported by Van Dijk et al (14) indicated good reliability of resting-state networks across two sessions using spatial overlap of activated areas in thresholded statistical maps as a measure of reliability.

To supplement previous studies on test–retest assessment of rs-fMRI, we have, in the present study, investigated the reliability of resting-state measurements by comparing the repeatability of identifying resting-state networks to the repeatability obtained by task-evoked brain activation. The sensorimotor cortical network is often of interest in clinical fMRI. Robust paradigms, such as finger tapping and fist clenching, are routinely used for mapping this network. Hence, mapping of the sensorimotor area during a bilateral finger tapping task was used as a reference method to which we compared the test–retest repeatability of the resting-state sensorimotor network. Also, because a seed-ROI based method might be quite unreliable in a patient with distorted anatomy, the data-driven independent component analysis (ICA) was used for assessment of resting-state networks in the present study. Test–retest reliability was assessed by the method proposed by Genovese et al (15) and Noll et al (16), where estimates of hit rate (pA), false alarm rate (pI), and overall fraction of active voxels (λ) are calculated on the basis of the repeatability of activation over several identical experiments. Importantly, the estimates of pA, pI, and λ can be assessed for a range of statistical thresholds. The estimated parameters pA and pI yield a receiver operating characteristics (ROC) curve, from which the area under the ROC curve serves as an indicator of the overall test–retest reliability.


Materials and Methods

Ten healthy volunteers participated in the study (5 females, 5 males; mean age, 26.8 years; range, 20–38 years). Written informed consent was obtained from all volunteers and the study was approved by the local ethics committee. Functional data were acquired using a 3 Tesla (T) MRI system (Philips Achieva, Philips Medical, Best, The Netherlands) equipped with an eight-channel head coil. Each experiment consisted of five successive activation (task) sessions (bilateral finger tapping/fixation) interleaved with five rs-fMRI sessions with eyes closed. For the activation sessions, self-paced finger tapping was used, as it is a well-established clinical paradigm, and, for both the finger tapping and fixation conditions, a block paradigm was used with a block length of 30 s. Scan parameters for the task-based fMRI sessions were TR/TE = 3000/30 ms, 3 mm isotropic voxels, 36 slices, and 100 dynamic scans. For the rs-fMRI sessions, the scan parameters were TR/TE = 2000/30 ms, 3 mm isotropic voxels, 33 slices, and 300 dynamic scans. The shorter TR for the rs-fMRI sessions was chosen to reduce aliasing of respiration-induced signal fluctuations, while still preserving full brain coverage. Total scan time for all experiments was 75 min. Before analysis, the image data were realigned, slice-time corrected, and smoothed with a Gaussian kernel (full width at half maximum [FWHM] = 6 mm) using SPM5. If data from several sessions, for any individual subject, exhibited translations and/or rotations larger than 1 mm or 1°, respectively, the data of that subject were excluded from further analysis.
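The acquisition figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the reported TR, number of dynamic scans, number of sessions, and block length; the function names are illustrative, and the boxcar helper assumes that each task run starts with a fixation block, which the text does not state.

```python
# Acquisition parameters as reported in the text.
TASK = {"tr_s": 3.0, "n_volumes": 100, "n_sessions": 5}
REST = {"tr_s": 2.0, "n_volumes": 300, "n_sessions": 5}
BLOCK_LENGTH_S = 30.0


def session_duration_s(protocol):
    """Duration of one session: number of volumes times TR."""
    return protocol["n_volumes"] * protocol["tr_s"]


def nyquist_hz(protocol):
    """Highest frequency that can be sampled without aliasing at this TR."""
    return 1.0 / (2.0 * protocol["tr_s"])


def total_scan_time_min(*protocols):
    """Total functional scan time over all sessions, in minutes."""
    total_s = sum(session_duration_s(p) * p["n_sessions"] for p in protocols)
    return total_s / 60.0


def boxcar(protocol, block_length_s):
    """0/1 task regressor per volume.

    Assumption (not stated in the text): the run starts with fixation.
    """
    vols_per_block = int(round(block_length_s / protocol["tr_s"]))
    return [(i // vols_per_block) % 2 for i in range(protocol["n_volumes"])]


print(session_duration_s(TASK), session_duration_s(REST))  # seconds per session
print(total_scan_time_min(TASK, REST))                      # minutes in total
print(nyquist_hz(TASK), nyquist_hz(REST))                   # Hz
```

Five task sessions of 300 s and five rest sessions of 600 s add up to the reported 75 min, and the shorter TR raises the Nyquist frequency from about 0.17 Hz to 0.25 Hz.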

Data from the activation sessions were analyzed using SPM5, whereas the rs-fMRI data were analyzed using the Group ICA of fMRI Toolbox (GIFT) (17). Dimensionality estimation was performed individually for all datasets using the minimum description length (MDL) criterion (18). ICA was run 10 times (using the Infomax algorithm) (19) for each dimensionality-reduced dataset to facilitate validation of the components using clustering (ICASSO) (20). To select a component corresponding to the motor network from the ICA results of the rs-fMRI data, all components were thresholded at z = 1. One component, representing the sensorimotor network, was then selected by three independent observers, based on visual inspection of each component's spatial characteristics. Criteria for selecting a component as representing the sensorimotor network included observed activation in areas corresponding to the motor cortex as well as similarity to previously reported spatial profiles of resting-state networks (14, 21). In cases when the three observers did not all select the same component, a consensus decision was made to select a single representative component. Additionally, to increase confidence in the component selection, the power spectral density of each component's characteristic time course was examined. The visually selected components were validated retrospectively by ensuring that a sufficient fraction of the power lay below 0.1 Hz, as would be expected for a component representing a resting-state network.
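The frequency check described above can be illustrated with a short sketch. The text does not give the exact definition of fractional power that was used, so the version below is an assumption: it takes the share of non-DC spectral power that falls below the cutoff, computed with a plain discrete Fourier transform. The function name and the synthetic time courses are hypothetical.

```python
import cmath
import math


def fractional_power_below(time_course, tr_s, cutoff_hz=0.1):
    """Fraction of non-DC spectral power at frequencies below cutoff_hz."""
    n = len(time_course)
    mean = sum(time_course) / n
    centered = [x - mean for x in time_course]  # remove the DC term
    low = 0.0
    total = 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        # Interior bins stand for a +/- frequency pair; the Nyquist bin
        # (present only for even n) occurs once.
        weight = 1.0 if (n % 2 == 0 and k == n // 2) else 2.0
        power = weight * abs(coeff) ** 2
        freq_hz = k / (n * tr_s)
        total += power
        if freq_hz < cutoff_hz:
            low += power
    return low / total if total > 0 else 0.0


# Synthetic time courses sampled like the rs-fMRI runs (TR = 2 s, 300 volumes).
TR_S = 2.0
N_VOLUMES = 300
slow = [math.sin(2 * math.pi * 0.03 * i * TR_S) for i in range(N_VOLUMES)]
fast = [math.sin(2 * math.pi * 0.20 * i * TR_S) for i in range(N_VOLUMES)]
mixed = [s + f for s, f in zip(slow, fast)]

for name, tc in (("slow", slow), ("fast", fast), ("mixed", mixed)):
    print(name, round(fractional_power_below(tc, TR_S), 3))
```

A time course dominated by slow fluctuations yields a fraction close to 1, whereas one dominated by faster fluctuations yields a fraction close to 0; a component time course could be screened against a chosen minimum fraction in the same way.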

Estimation of Reliability

Reliability estimation was performed using the approach proposed by Genovese et al (15). The t- and z-maps resulting from the five replications (M = 5) are first used to create raw reliability maps, i.e., maps showing how many times out of five replications a given voxel is classified as active for a certain statistical threshold. For each voxel, the reliability count is assumed to be drawn from a mixture of two binomial distributions according to:

$$
R_v \sim \lambda\,\mathrm{Binomial}(M, p_A) + (1 - \lambda)\,\mathrm{Binomial}(M, p_I) \tag{1}
$$

where R_v denotes the reliability count of voxel v and Binomial(M,p) denotes a binomial distribution with M trials and event probability p. pA and pI correspond to the probabilities of true positive and false positive classifications, respectively. The parameter λ represents the fraction of truly active voxels. Eq. [1] is only valid when a single statistical threshold is used to create the reliability map, because a voxel that has been classified as active for a high threshold must also be considered active at any lower threshold. To evaluate reliability for several statistical thresholds, an extension of Eq. [1], taking dependence into account, was therefore proposed [17] and is used in the present work. In this extended version of Eq. [1], λ is constrained to be the same (or nearly the same) across replications. By linking the likelihood functions of the different datasets into a joint likelihood, with λ constant across replications, a joint model can be specified, yielding λ and a pair of reliability parameters (pA and pI) for each threshold. The reliability parameters can be estimated by means of maximum likelihood, a procedure in which parameter values are chosen so as to make the observed data as likely as possible under the model. The methods needed to estimate these parameters (λ, pA, and pI) for a range of thresholds were implemented in MATLAB (The Mathworks Inc., Natick, MA). Ten thresholds were selected to provide a reasonably even distribution of pA and pI estimates, wide enough to allow model fitting. The thresholds were selected individually for the task and the resting-state experiments, respectively. For a more complete description of the reliability parameter estimation, the reader is referred to the work by Genovese et al and Noll et al (15, 16).
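As an illustration of the single-threshold model in Eq. [1], the sketch below builds reliability counts from replicated statistical maps and estimates λ, pA, and pI by maximum likelihood. It is not the MATLAB implementation used in the study: it covers only one threshold (not the joint multi-threshold extension), it uses an expectation-maximization (EM) iteration as one possible way of maximizing the likelihood, and all function names and starting values are hypothetical.

```python
from math import comb


def reliability_counts(stat_maps, threshold):
    """Per-voxel number of replications in which the statistic exceeds threshold."""
    n_voxels = len(stat_maps[0])
    return [sum(m[v] > threshold for m in stat_maps) for v in range(n_voxels)]


def binom_pmf(k, m, p):
    """Probability of k successes in m trials with success probability p."""
    return comb(m, k) * p**k * (1.0 - p) ** (m - k)


def fit_mixture(counts, m, lam=0.5, p_a=0.8, p_i=0.2, n_iter=500):
    """EM estimate of (lambda, pA, pI) for the two-binomial mixture of Eq. [1].

    The starting values are arbitrary assumptions.
    """
    hist = [0] * (m + 1)
    for c in counts:
        hist[c] += 1
    n_total = sum(hist)
    for _ in range(n_iter):
        # E-step: posterior probability that a voxel with count k is truly active.
        w = []
        for k in range(m + 1):
            a = lam * binom_pmf(k, m, p_a)
            b = (1.0 - lam) * binom_pmf(k, m, p_i)
            w.append(a / (a + b) if a + b > 0 else 0.0)
        # M-step: update the mixing fraction and the two event probabilities.
        act = sum(hist[k] * w[k] for k in range(m + 1))
        inact = n_total - act
        if act <= 0 or inact <= 0:
            break  # degenerate case: one component has vanished
        lam = act / n_total
        p_a = sum(hist[k] * w[k] * k for k in range(m + 1)) / (m * act)
        p_i = sum(hist[k] * (1.0 - w[k]) * k for k in range(m + 1)) / (m * inact)
    if p_a < p_i:
        # Keep the "active" label on the component with the higher hit rate.
        lam, p_a, p_i = 1.0 - lam, p_i, p_a
    return lam, p_a, p_i


# Toy example: three replications of a three-voxel map, thresholded at 3.1.
maps = [[4.0, 1.0, 3.5], [3.2, 0.5, 2.0], [5.0, 3.3, 1.0]]
print(reliability_counts(maps, 3.1))
```

With M = 5 replications, as in this study, the counts passed to `fit_mixture` take values from 0 to 5. Repeating the fit for each threshold would give one (pI, pA) pair per threshold, but without the shared-λ constraint of the extended model.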

The parameters pA and pI for all thresholds were then used to fit an ROC (receiver operating characteristics) curve, and from that curve the area under the curve (AUC) was calculated as a measure of reliability. To evaluate whether task-based or resting-state fMRI showed significantly higher reliability, statistical testing using the Wilcoxon pairwise signed-rank test was performed (significance level, P = 0.05). Additionally, the variance of the reliability measures within each of the two methods was assessed and compared using the F-test, i.e., F = var(AUCtask)/var(AUCrest) (significance level, P = 0.05). Furthermore, replications of a task experiment could induce a systematic change in activation over time. This possibility was investigated by first calculating the mean t-value over all voxels exhibiting t > 3 in each replication. A straight line was then fitted to the five mean values obtained from the five replications for each subject. A sign test was then used to assess whether the obtained slopes deviated significantly from zero.
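The statistical steps described above can be sketched in plain Python. Several choices here are assumptions rather than the study's procedure: the AUC is taken by the trapezoidal rule through the (pI, pA) points with anchors at (0, 0) and (1, 1), because the ROC fitting method is not specified in the text; the Wilcoxon and sign tests are computed exactly by enumeration; and only the F statistic is returned, without a P value. All function names are hypothetical.

```python
from itertools import product
from math import comb
from statistics import variance


def auc_trapezoid(p_i, p_a):
    """Area under the piecewise-linear ROC through (pI, pA), anchored at (0,0) and (1,1)."""
    pts = [(0.0, 0.0)] + sorted(zip(p_i, p_a)) + [(1.0, 1.0)]
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def wilcoxon_signed_rank(x, y):
    """Return (W+, exact two-sided P) for paired samples; zero differences are dropped."""
    d = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0  # mean rank for tied values
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    center = sum(ranks) / 2.0
    # Enumerate every sign assignment; feasible for small samples such as n = 8.
    extreme = sum(
        1 for signs in product((0, 1), repeat=len(d))
        if abs(sum(r for r, s in zip(ranks, signs) if s) - center)
        >= abs(w_plus - center) - 1e-12
    )
    return w_plus, extreme / 2 ** len(d)


def variance_ratio(x, y):
    """F statistic as defined in the text: var(x) / var(y), using sample variances."""
    return variance(x) / variance(y)


def slope(values):
    """Least-squares slope of values against replication index 0, 1, 2, ..."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def sign_test(slopes):
    """Exact two-sided sign test of the slopes against a median of zero."""
    pos = sum(s > 0 for s in slopes)
    neg = sum(s < 0 for s in slopes)
    n, k = pos + neg, min(pos, neg)
    return min(1.0, 2.0 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
```

In use, `auc_trapezoid` would be applied once per subject and method to the ten (pI, pA) estimates, the resulting two lists of eight AUC values passed to `wilcoxon_signed_rank` and `variance_ratio`, and one `slope` per subject passed to `sign_test`.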


Results

For both methods, subject motion was generally small, with the exception of subjects 2 and 8, who exhibited motion exceeding 1 mm translation and 1° rotation in several replications of the rs-fMRI experiments. Hence, subjects 2 and 8 were excluded from further analysis. For the remaining rs-fMRI datasets, the number of computed components, as determined by the MDL criterion, was on average 47.2 (range, 37 to 61). The sensorimotor network components selected by the three observers were in good agreement, and a motor component was found for all subjects and for all replications of the experiment. Of the 40 selected components in total (8 subjects × 5 replications), 38% required a consensus decision. The fractional power below 0.1 Hz, for all components, was in the range 0.074 to 0.996. Among the 40 selected components, five exhibited a fractional power below 0.1 Hz of less than 0.9, the lowest being 0.73.

Figure 1 shows the ROC curves for the remaining eight subjects, with blue curves representing the task experiments and the red curves representing the rs-fMRI experiments. The ROC curves for the two methods were similar for all subjects with the exception of subjects 4 and 5 for whom the rs-fMRI AUCs were notably smaller. Examples of reliability maps for the task and the resting-state fMRI experiments are shown in Figures 2 and 3. In both figures, the t-values obtained from the analysis of the task fMRI experiments have been converted to the corresponding z-scores. In Figure 2, reliability maps corresponding to five different thresholds (z-values of 3.1, 2, 1.3, 0.8 and 0.3, respectively) are illustrated for subject 1. In Figure 3, reliability maps for the eight subjects are shown, using a threshold of z = 3.1 for both task and rs-fMRI data. Pixels are colored according to the number of times out of the five replications that they were considered active.

Figure 1.

Fitted ROC curves for eight subjects, corresponding to the task experiment (blue) and the rs-fMRI experiment (red).

Figure 2.

An illustration of reliability maps for subject 1. From top to bottom, the z thresholds used were 3.1, 2, 1.3, 0.8, and 0.3 for the task (left) and resting-state (right) experiments, respectively.

Figure 3.

Example reliability maps for one slice representative of the sensorimotor cortex for all eight subjects. The left and right panels show maps for the task-based and resting-state fMRI experiments, respectively. The z threshold used was 3.1. Note that the reliability estimates (pA, pI, and λ) were obtained from a range of thresholds, and the maps shown above are only displayed for the purpose of illustration.

The values for AUC and λ (the estimated fraction of truly active voxels) are summarized in Table 1. The estimated λ was larger for the rs-fMRI experiments than for the task-based experiments in six of eight subjects, reflecting the activation of a larger part of the cortex. On a group level, however, the median λ was not significantly larger for the rs-fMRI data (Wilcoxon pairwise signed-rank test).

Table 1. Areas Under the ROC Curves and Estimated λ for the Eight Subjects Included in the Analysis
Subject | AUC (task) | AUC (rest) | λ (task) | λ (rest)

The estimated AUC was larger for the task-based experiments than for the rs-fMRI experiments in six of the eight subjects. For subjects 3 and 9, the AUC values were smaller for the task-based experiments than for the rs-fMRI experiments. Although the AUCs for the rs-fMRI data were generally lower, as seen in Figure 1, the statistical analysis of the AUC values showed no significant difference in median AUC between the task experiments and the rs-fMRI experiments (Wilcoxon pairwise signed-rank test), suggesting comparable levels of test–retest performance between the two methods. The variance of the AUC values was not significantly different between the two groups, showing comparable levels of within-method variance. The mean t-values for all voxels exhibiting t > 3 were, in several subjects, lower for the later experiments. However, no statistically significant modulation of induced activation over the five replications was found on a group level.


Discussion

In this work, the test–retest reliability of rs-fMRI activation of the sensorimotor network was investigated by comparison with the reliability of task-evoked activation of the sensorimotor cortex. By using the method proposed by Genovese et al, the reliability can be assessed for several statistical thresholds, yielding an ROC curve constructed from the two parameter estimates pA and pI. From such an ROC curve, reliability can be assessed as the area under the curve. Our results suggest that the overall reliability of rs-fMRI for mapping of the sensorimotor network compares well with that of the widely used finger tapping paradigm in a group of healthy volunteers. The AUC was generally somewhat higher for the task experiments, although the difference was not statistically significant according to the nonparametric test used. Additionally, no significant difference was found between the variances of the AUC for the two groups, indicating that the within-group variability was comparable.

The results of this study were obtained for a group of healthy volunteers, which means that they should be interpreted with some caution when considering a more heterogeneous patient population. Although no test–retest study has been conducted to assess the repeatability of rs-fMRI data for potential patient groups, results indicating reliable assessment of resting-state networks in infants (3, 22, 23), in subjects at sleep or at rest (4–8), and during anesthesia (9, 10) strengthen the hypothesis that rs-fMRI can be used as a complementary clinical investigation.

Furthermore, the exclusion of two volunteers and the fact that a nonparametric test can have low statistical power under certain conditions mean that some degree of care should be taken when drawing conclusions. Hence, the nonsignificant difference in test–retest reliability between the task-based and resting-state results in this study does not rule out a small difference between these methods. Another important issue is spatial overlap, which is reflected by the raw reliability maps for repeated experiments within each method. However, a comparison of spatial overlap between task-based fMRI and rs-fMRI was not performed in this study, because the use of ICA yields activation in the whole sensorimotor network, whereas task-based fMRI as used in this study activates the hand area. Despite this, the present results are encouraging and, together with the results reported by Liu et al (11), which indicate that resting-state networks exhibit good overlap with areas of activation obtained by a traditional motor task, suggest that rs-fMRI could be used as a supplementary clinical tool in cases where task-based fMRI is not feasible.

In the present work, we decided to use ICA for analysis of the rs-fMRI data, because this approach does not require any seed regions to be defined. The seed-ROI based approach could be problematic in patients showing distorted anatomy, and could also introduce a methodological bias in a reliability study. Furthermore, when a data-driven method like ICA is used, the whole network is mapped simultaneously, suggesting that a single experiment could be used instead of several paradigms demanding patient compliance. As ICA has the ability to isolate several networks, a whole range of networks could be mapped from the data acquired in a single experiment, further increasing the clinical value of the method. A problem when using ICA is, however, that a valid component representing the network of interest must be selected. Several methods have been proposed to facilitate this task. De Martino et al (24) suggested a method based on temporal and spatial characteristics of time courses corresponding to the spatial components, for automatic determination of component content, classifying BOLD-related components as well as artifactual components corresponding to, for example, motion or vessel influence. By using a priori information about expected spatial activation alone, masks can be used for the selection of specific components pertaining to targeted spatial areas. However, this approach requires transformation of the data into standard space, which is not typically done when using fMRI for presurgical mapping. Alternatively, the temporal characteristics can be used to filter the components, retaining only those that exhibit prominent power for frequencies below, for example, 0.1 Hz. In the present work, the component selection was performed by three independent observers, aided by such a frequency analysis. We found that components corresponding to the sensorimotor network were reliably observed for all subjects and replications. Furthermore, the selection of valid components was strengthened by the use of the ICASSO approach, i.e., running the ICA algorithm multiple times and then clustering the resulting components, thereby improving the validity of the aggregated components.

A methodological concern when interleaving task and rest experiments is that an activation phase can exert influence on a subsequent period of rest. Waites et al (25) addressed this issue, but were only able to demonstrate a nonsystematic spatial variation in resting-state networks on the individual level. No systematic changes were observed between functional connectivity before and after the task phase, consisting of an orthographic lexical retrieval task, and the variations were not statistically significant on the group level. Other studies have observed that resting-state connectivity can be influenced by prior task performance, but the relationship between the altered performance of the resting-state network and the preceding task is not completely straightforward. For example, Albert et al (26) found that resting-state networks involved in the learning of a visuomotor task were altered following task performance, and Hasson et al (27) found that the default mode network was modulated following a language comprehension task. Thus, the possibility that resting-state networks could be influenced by a previous phase of activation cannot be ruled out, although such alterations would not necessarily affect the sensorimotor network studied in this work.

Because, in this work, the experiments were performed during one session, habituation effects should also be considered. If present, such effects would lead to decreased activity over time, resulting in a lower reliability for the task-based experiments. When testing for this effect, the t-values for the later replications of the task-based experiments showed a slight tendency to decrease, but no monotonic behavior was found, suggesting that habituation effects were small. Furthermore, it can be argued that a more clinically relevant experimental procedure would be to perform the test–retest sessions over a longer period of time. Chen et al (13) observed, however, that resting state networks were consistent for data obtained in 5 sessions across 16 days. Hence, we do not believe that performing the experiments during a single session significantly influenced the consistency of resting state activation.

In conclusion, we have shown that the test–retest reliability for mapping of the sensorimotor network is comparable between rs-fMRI and task-based fMRI. This suggests that resting-state mapping of functional networks at risk can be considered as an alternative or complementary method in clinical cases where patient status does not permit a task-based fMRI experiment. Further work includes the evaluation of test–retest performance in patient groups and the study of spatial overlap between results obtained with different methods.