Head movement during functional magnetic resonance imaging (fMRI) degrades data quality. The effects of small movements can be ameliorated during data postprocessing, but data associated with severe movement are frequently discarded. In discarding these data, it is often assumed that head movement is a source of random error, and that data can be discarded from subjects with severe movement without biasing the sample. We tested this assumption by examining whether head movement was related to task difficulty and cognitive status among persons with multiple sclerosis (MS). Thirty-four persons with MS were scanned while performing a working memory task with three levels of difficulty (the N-back task). Maximum movement (angle, shift) was estimated for each difficulty level. Cognitive status was assessed by combining performance on a working memory task and a processing speed task. An interaction was found between task difficulty and cognitive status (high vs. low cognitive ability): there was a linear increase in movement as task difficulty increased that was larger among subjects with lower cognitive ability. Analyses of the signal-to-noise ratio (SNR) confirmed that increases in movement degraded data quality. Similar, though far smaller, effects were found in a cohort of healthy control (HC) subjects. Therefore, discarding data with severe movement artifact may bias MS samples such that only those with less-severe cognitive impairment are included in the analyses. However, even if such data are not discarded outright, subjects who move more (MS and HC) will contribute less to the group-level results because of degraded SNR. Hum Brain Mapp 35:1–13, 2014. © 2012 Wiley Periodicals, Inc.
In the analysis of functional magnetic resonance imaging (fMRI) data, perhaps the single largest factor that degrades data quality is subject motion. This is because when a subject moves his/her head during a scan, one of the fundamental assumptions underlying fMRI data analysis is violated: the assumption that a given voxel corresponds to a given volume of brain tissue across time [see Friston et al., 1996 for a review]. This assumption is critical because in fMRI data analysis, we wish to ascribe variance in the signal from each voxel to our experimental manipulation(s). However, if a given voxel corresponds to one location in the brain at Time 1 and a different location at Time 2, then there are at least two sources of variance in the data: the experimental manipulation and subject motion. To ascribe changes in the blood oxygen level dependent (BOLD) signal to the experimental manipulation(s), it is therefore necessary to ensure that subject motion accounts for little to none of the variance in the data. If this is not done, that is, if the analyses include data that have been only minimally corrected for motion, the results become unreliable [e.g., Power et al., 2012; Van Dijk et al., 2012]. This is not only because there are two sources of variance, but also because the changes in the BOLD signal associated with movement can be far larger than changes associated with the experimental manipulation. Thus, movement-related changes can “swamp” changes associated with the experimental paradigm.
In the functional neuroimaging literature, three ways have been proposed and used to minimize the contribution of head motion to variance in the data. One method is to use restraints that make movement difficult [Fitzsimmons et al., 1997; Green et al., 1994]. Nearly all fMRI studies in the literature use restraints such as foam pads that are inserted around the subject's head to help the subject remain still. While these are useful, they do not completely eliminate movement; their value is largely in allowing subjects to feel when they are moving, thereby allowing compliant subjects to remain still. A more invasive method is to use a bite-bar. This is a device that is anchored to the head-coil, and that subjects hold in their jaws. While it is very effective in limiting head motion, it is also perceived by some to be aversive and uncomfortable, limiting its utility; this is particularly so for clinical samples.
Another method that is being developed is to measure head motion in real time and to either adjust scanning to account for this motion [Derbyshire et al., 1998; Mathiak et al., 2001; Speck et al., 2006; Thesen et al., 2000; Welch et al., 2002] or to use this information retrospectively to correct for head motion [Tremblay et al., 2005]. Finally, motion can be corrected retrospectively, during image processing [Biswal and Hyde, 1997; Ciulla and Deek, 2002; Friston et al., 1996; Hajnal et al., 1995; Woods et al., 1992]. Several algorithms have been developed for this sort of “motion correction,” but the central approach is largely the same: a canonical image is chosen, and every other image in the time-series is compared to that canonical image. The extent to which each image differs is quantified in at least six parameters (three angular deviations: roll, pitch and yaw; three translational deviations: shifts in the right/left, posterior/anterior, and superior/inferior dimensions), and corrected by applying a rigid-body transformation. While this approach has proven very useful for minimizing the effects of small amounts of motion on the BOLD signal, it is less reliable when there are large deviations in the data [Tremblay et al., 2005]. Correction is difficult for movement in the X and Y directions (i.e., movement that is parallel to the slice acquisition plane), and it is nearly impossible for movement in the Z direction (i.e., across slices) because of spin history effects (i.e., it is impossible to know what the data would have been, if it had been acquired at a different time). It is therefore common practice to exclude (discard) data in which movement exceeds ∼1–2 mm, which translates to <1° in angular deviation, and less than one voxel (usually ∼3 × 3 × 3 mm or ∼27 mm³) in translational deviation.
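The exclusion rule described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in any particular software package: it assumes the six realignment parameters are already available as an array with rotations in degrees and translations in millimetres, and it assumes that "maximum movement" means the largest absolute deviation from the reference image on any rotational or translational axis. The function names and the default cutoffs (1° and one ∼3 mm voxel, taken from the text) are illustrative.

```python
import numpy as np


def max_motion(params):
    """Summarize one run (or block) of realignment parameters.

    params: (n_volumes, 6) array. Columns 0-2 are assumed to hold roll,
    pitch and yaw in degrees; columns 3-5 are assumed to hold the three
    translations in mm, all relative to the reference image.
    Returns (max_angle_deg, max_shift_mm).
    """
    params = np.asarray(params, dtype=float)
    if params.ndim != 2 or params.shape[1] != 6:
        raise ValueError("expected an (n_volumes, 6) array")
    max_angle = np.abs(params[:, :3]).max()  # largest rotation on any axis
    max_shift = np.abs(params[:, 3:]).max()  # largest translation on any axis
    return float(max_angle), float(max_shift)


def exceeds_cutoff(params, angle_cutoff_deg=1.0, shift_cutoff_mm=3.0):
    """True if either summary value exceeds its (illustrative) cutoff."""
    max_angle, max_shift = max_motion(params)
    return max_angle > angle_cutoff_deg or max_shift > shift_cutoff_mm
```

Applying `exceeds_cutoff` separately to the volumes from each difficulty level would show whether exclusions cluster in the harder conditions, which is the concern raised in the next paragraph.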
While it is unquestionably good practice to exclude data with excessive motion artifact, there are several potential disadvantages. For example, if there is a systematic relationship between excessive motion and task difficulty (i.e., if subjects tend to move more during more difficult tasks), then the removal of blocks with excessive motion will result in the removal of data from the most difficult conditions, resulting in sampling bias. Moreover, if subjects who tend to move more are systematically different from those who do not (e.g., if they have a lower IQ), then the removal of subjects with excessive motion will result in the removal of subjects with this difference (lower IQ), again introducing sampling bias. Generally, when data is excluded, it is assumed that head movement is random, and not affected by task difficulty or by subjects' cognitive abilities.
Although head motion is common in typical healthy individuals (a recent study on over 1,000 healthy subjects indicated a range of motion from 0.027 to 0.051 mm [Van Dijk et al., 2012]), the concern about inadvertently introducing sampling bias when subjects with excessive motion are excluded is stronger when clinical populations are studied. Indeed, motion has been shown to be a problem in fMRI studies of neuropsychiatric populations including multiple sclerosis [Phillips, 2008], traumatic brain injury [for review, see Hillary et al., 2002], stroke [Seto et al., 2001], epilepsy [Lemieux et al., 2007], and schizophrenia [Weinberger et al., 1996]. In addition to clinical samples, studies involving pediatric samples are affected by greater head movement as children are less able to remain still compared to adults [Evans et al., 2010; Yuan et al., 2009]. Despite this, it has not been universally found that head motion is greater in clinical populations. For example, Yoo et al. [2005] reported that there was very little head motion in a group of individuals with schizophrenia, and that their head motion was no greater than that seen in a matched group of healthy controls. While this result is reassuring, it is not clear that it is representative of other clinical populations (e.g., multiple sclerosis), nor indeed whether it is generalizable beyond the group studied inasmuch as the sample was very small (n = 11).
Here, we investigated this issue in a cohort of subjects with multiple sclerosis (MS), using a working memory task (the N-back task), with three levels of difficulty. We hypothesized that movement would be related to task difficulty in MS, based on the idea that the requirement to remain still in the fMRI scanner is similar to adding a second task to the experimental paradigm. It has been shown that when MS subjects must perform a demanding cognitive task while walking, their walking performance declines [Hamilton et al., 2009]. We hypothesized that the same would be true of the ability of MS subjects to remain still in the scanner. Moreover, previous fMRI research investigating differences in brain activation between MS and healthy controls (HCs) has shown that cognitive status moderates group differences. For example, Chiaravalloti et al. [2005] have shown that the activation in high-functioning MS subjects was similar to HCs, while a lower-functioning cohort of MS subjects showed a markedly different pattern. Therefore, we also hypothesized that the cognitive status of the MS subjects would moderate the effect of task difficulty on movement in the scanner. We tested these hypotheses by (1) comparing the extent of maximum motion (both in angular and translational deviation) across three levels of task difficulty (0-, 1-, 2-back) in a group of MS subjects, and (2) by comparing the extent of maximum motion across task difficulty in two groups of MS subjects (high vs. low cognitive ability). We also examined the effect of task difficulty on maximum motion in a group of healthy control (HC) subjects.
This study confirmed that in a clinical sample, such as MS, subjects do indeed move more as task difficulty increases. This shows that subject movement is not a random variable, but that it is related to the experimental manipulation. This is somewhat concerning, particularly if a strict cutoff of 1–2 mm [<1° of angular motion and less than one voxel (∼3 mm)] is used to determine which data to retain and which to discard. As a group, the MS sample moved as much as 1.31° in angle (corresponding to ∼3.43 mm) and 1.47 mm in shift (in the 2-Back task).
More concerning are the results that emerged when the MS sample was divided into cog− and cog+ groups. In those analyses, it emerged that the cog− group moved far more than the cog+ group. This was true for both angular and translational (shift) motion, but was far more problematic for angular motion. In the 2-Back task, the cog− group moved nearly 2° (1.82°, or ∼4.76 mm), which is far more than can be reliably corrected for with current image processing software. If these subjects were simply discarded, the sample would be strongly biased toward individuals with MS who have higher cognitive abilities. This would likely result in an underestimation of the effects of MS on brain functions.
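The millimetre equivalents quoted for the angular values (1.31° ≈ 3.43 mm; 1.82° ≈ 4.76 mm) are both consistent with treating the rotation as an arc swept at a radius of about 150 mm. That radius is not stated in this section; it is inferred here from the two reported pairs and should be read as an assumption. Under that assumption, the conversion is simply arc length = radius × angle (in radians):

```python
import math


def arc_displacement_mm(angle_deg, radius_mm=150.0):
    """Arc length (mm) swept at radius_mm by a rotation of angle_deg.

    The 150 mm default is an inferred, illustrative radius; it is not
    a value given in the text.
    """
    return math.radians(angle_deg) * radius_mm


print(round(arc_displacement_mm(1.31), 2))  # 3.43
print(round(arc_displacement_mm(1.82), 2))  # 4.76
```

The same function makes clear why angular motion is the larger problem: at this radius, even 1° corresponds to roughly 2.6 mm of displacement, close to a full voxel.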
One way to avoid the introduction of this sampling bias would be to correct for the motion as much as possible during image-processing, and then to include the motion parameters in the deconvolution as regressors of no interest. This would minimize the effects of motion on the data (though it would by no means remove them entirely), and might allow some of the subjects who would otherwise be discarded to remain in the sample. However, while this approach works well for event-related designs, it appears to decrease the sensitivity of the general linear model when block designs are used [Johnstone et al., 2006]. Moreover, the analyses of the SNR in the data presented here show this solution to be flawed as well: the more subjects move, the lower their SNR. This means that the results from subjects who moved very little are stronger than the results from subjects who moved more. This has the unfortunate result that the group-level statistics will be skewed toward the subjects who moved less: the cog+ subjects. Thus, even if the data from subjects with a large amount of motion are not simply discarded, a bias remains in the group-level data because of the higher SNR in the data from the subjects who moved less.
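Including the motion parameters as regressors of no interest amounts to adding six extra columns to the design matrix before the model is fit. The sketch below shows this for a single voxel using ordinary least squares in NumPy. It is a simplified illustration under stated assumptions: the task regressor is assumed to be already convolved with a hemodynamic response, the function name is hypothetical, and real analysis packages add further terms (drift, autocorrelation modelling) that are omitted here.

```python
import numpy as np


def task_beta_with_motion(bold, task, motion):
    """Estimate the task effect for one voxel with motion as nuisance terms.

    bold:   (n_volumes,) voxel time series
    task:   (n_volumes,) task regressor (assumed already HRF-convolved)
    motion: (n_volumes, 6) realignment parameters
    Returns the least-squares coefficient for the task regressor.
    """
    bold = np.asarray(bold, dtype=float)
    task = np.asarray(task, dtype=float)
    motion = np.asarray(motion, dtype=float)
    motion = motion - motion.mean(axis=0)  # centre the nuisance columns
    # Design matrix: intercept, task, six regressors of no interest.
    design = np.column_stack([np.ones_like(task), task, motion])
    betas, *_ = np.linalg.lstsq(design, bold, rcond=None)
    return float(betas[1])
```

When motion is correlated with the task regressor, as it will tend to be in a block design if subjects move more during harder blocks, the nuisance columns absorb part of the task-related variance. This is one way to understand the loss of sensitivity reported for block designs.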
The random combinatorial analysis demonstrates the effect of including subjects with excessive motion in the sample. In the 2-Back condition, 12 subjects moved >1°. As the data from these 12 movers were incrementally added to a subsample of 12 subjects, there was a systematic decrease in the strength of the signal. This was true for 0-Back and 1-Back, but was particularly marked for the 2-Back condition. Because the 12 subjects who moved >1° in the 2-Back condition (movers) were also all in the cog− group, adding them to the sample would be expected to result in decreased signal for two reasons: the signal from the cog− group might be expected to be less than that of the cog+ group, and the SNR would be expected to be less in this group because these 12 subjects moved. However, the difference in SNR should be worst in the 2-Back condition, since that is where these subjects moved the most.
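The logic of adding movers incrementally to a fixed subsample can be sketched as follows. The details of the analysis are not given in this section, so the sketch makes several assumptions that should not be read as the authors' procedure: each subject is represented by a single signal estimate, group "signal strength" is taken to be the mean of those estimates, and for each number of added movers the result is averaged over random draws of which movers are added. All names are hypothetical.

```python
import numpy as np


def incremental_mover_analysis(base_signal, mover_signal, n_draws=1000, seed=0):
    """Mean group signal as k = 0..n movers are added to a base subsample.

    base_signal:  per-subject signal estimates for the starting subsample
    mover_signal: per-subject signal estimates for the movers
    Returns an array of length len(mover_signal) + 1; entry k is the group
    mean after adding k randomly chosen movers, averaged over n_draws draws.
    """
    rng = np.random.default_rng(seed)
    base = np.asarray(base_signal, dtype=float)
    movers = np.asarray(mover_signal, dtype=float)
    result = np.empty(len(movers) + 1)
    for k in range(len(movers) + 1):
        group_means = []
        for _ in range(n_draws):
            # Random combination of k movers (empty selection when k == 0).
            picked = movers[rng.permutation(len(movers))[:k]]
            group_means.append(np.concatenate([base, picked]).mean())
        result[k] = np.mean(group_means)
    return result
```

If the movers' estimates are systematically lower than those of the base subsample, the returned curve declines as k increases, which is the pattern described above.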
These data tell an important cautionary tale in relation to fMRI studies of clinical populations such as MS. However, a great many fMRI studies are conducted to better understand brain function in healthy populations. We therefore also assessed whether the motion parameters increase with task difficulty in healthy controls.
As with the MS sample, the HC group showed increasing motion as the task increased in difficulty. However, unlike the MS group, the mean amount of motion in the HC group never exceeded 1° of angular motion or one voxel of translational motion (shift). This is reassuring for those who investigate cognition in healthy samples. However, the fact that SNR was nevertheless correlated with motion (albeit only for translational motion) is concerning. Just as with the MS sample, this means that the results from those who move more will be weaker than the results from those who move less, and that any group-level statistics will over-represent those subjects who moved less in the scanner.
The purpose of these experiments was to empirically assess the concern that subject motion (in the scanner) is not a random variable, a concern that is particularly important in clinical samples [e.g., Hillary et al., 2002; Phillips, 2008]. The results suggest that motion is indeed a problem in clinical samples (in this case, MS), particularly in the cog− group. If subjects with excessive motion were simply removed from the group-level analyses, the excluded subjects would overwhelmingly be the cog− subjects. This would introduce sampling bias into the study because the subjects remaining in the group-level analyses would be biased against cognitive impairment. Thus, any results would not represent MS subjects as a whole, but would rather represent MS subjects who had higher cognitive abilities. This would almost certainly lead to underestimations of the effect of MS on brain activity.
If discarding subjects with excessive motion results in sampling bias, would it be better to leave these subjects in the group-level analyses (after attempting to mitigate the motion artifact by, for example, including the motion parameters in the deconvolution as regressors of no interest)? Unfortunately, there can be no simple answer to this question. Certainly, including data from subjects with significant motion artifact will not benefit the group-level analyses: in avoiding sampling bias, spurious activation patterns (associated with motion artifact) would be included in the analyses. Moreover, even if only subjects with no obvious motion artifact are included in the group-level analyses, the signal-to-noise ratio (SNR) will be less from those who moved more (i.e., the group with low cognitive ability). Thus, it is very difficult (though not impossible: see below) to escape from sampling biases in the data, using current techniques.
Relationship to Prior Research
While we found clear evidence of greater motion in our subjects with MS than in our HC subjects, others have reported no such difference in other clinical populations [e.g., Yoo et al., 2005]. While this might have to do with a difference in disease type (MS vs. schizophrenia), it is more likely due to the fact that the cognition of the individuals with schizophrenia studied by Yoo et al. [2005] was relatively intact. Although their performance on working memory tasks was worse than the HCs, their IQ was very high (mean = 111.5), and the estimate of disease severity was very low (brief psychiatric rating scale total score = 25.5). Inasmuch as movement became a larger problem in our sample as cognitive impairment increased, one might not expect the head motion in the sample of individuals with schizophrenia studied by Yoo et al. to be that much more than their healthy counterparts. Moreover, because only 11 subjects were included in the Yoo et al. study, it is possible that the null effect they report is due, at least in part, to a lack of power.
A rather different aspect of prior research is that studies investigating functional activity in MS relative to HCs often report “more” activity in the MS group. This increased activity is generally twofold: there is an increase in the intensity of activity in the same brain areas that HCs use to perform a given task, and the extent of the active areas is greater in the MS group [e.g., Chiaravalloti et al., 2005; Sweet et al., 2006]. The results presented here suggest that this frequent finding in the MS literature may represent an underestimate of the increase in activity seen in MS. This is because movement is correlated with decreased SNR, which means that the more people move, the less signal there is to detect (relative to the noise). Inasmuch as individuals with MS move more than HCs, it is more difficult to detect activity in MS. Despite this, we consistently see increased activity in MS cohorts (relative to HCs). Therefore, it seems likely that if individuals with MS moved as little as HCs, the increase in activity seen in MS would be even larger than what is reported in the literature. There is an important caveat to this line of reasoning. In many studies that investigate increasing task difficulty in MS relative to HC, there are large differences when the task is relatively easy, but the differences are less apparent as the task becomes more difficult [for a good example using the N-back task, see Sweet et al., 2006]. The results of the current paper suggest one possible reason for this perplexing lack of difference at higher levels of task difficulty: increased motion (and therefore decreased SNR) in the MS group as difficulty increases. As the amount of motion in the MS group increases, the concomitant decrease in SNR would eventually begin to make even robust functional activity difficult to detect.
Thus, if the results presented here are present in other MS samples (as seems likely), the lack of differences in activity as task difficulty increases may be due, at least in part, to progressive increases in head motion and consequent decreases in SNR in the MS group.
Another frequent observation, when functional activation in MS samples is compared to HCs, is that the MS group shows activation in areas where the HC group shows no reliable activation [e.g., Sweet et al., 2006]. The contribution of motion to this finding is more nuanced. On the one hand, decreased SNR may play a smaller role here: if there is no reliable activation in these regions in the HC group, a smaller increase in the MS group would be detectable (even if this increase was lessened by poorer SNR). On the other hand, motion artifact may result in spurious activation in the MS group, thus producing artifactual “activation.” Unfortunately, in many studies it is difficult to determine which cause (real activation or motion artifact) produces this type of activation pattern.
Functional MRI research has traditionally considered head motion a source of random error. This would suggest that, at worst, motion reduces the SNR and, therefore, reduces statistical power. In fact, as shown by the current study, head movement may actually be a source of systematic error, which is far more troubling. That is, if clinical samples move more than healthy samples, and impaired patients move more than intact patients, then SNR and statistical power may also vary between groups and within groups as a function of impairment. One current goal of clinical fMRI research is to identify neurophysiologic biomarkers of neurologic disease and behavioral/cognitive impairment. For instance, several studies have demonstrated that functional connectivity within the default network differs between healthy adults and persons with Alzheimer disease [e.g., Greicius et al., 2004; Sorg et al., 2007]. Other studies have correlated continuous measures of behavioral/cognitive impairment with functional connectivity in clinical samples [e.g., Di Martino et al., 2009]. Given that movement impacts SNR and statistical power within functional connectivity analyses in general [Power et al., 2012; Satterthwaite et al., 2012; Van Dijk et al., 2012] and default network analyses in particular [Van Dijk et al., 2012], it is at least possible that group-related differences are due in whole or in part to head movement rather than differences in neurophysiology. However, this conclusion is far from certain: the data presented in the current article show that MS subjects move more as the task becomes increasingly difficult, while much of the functional connectivity literature is based on resting state scans which, of course, involve no overt task. Therefore, it may be that clinical samples do not move more during rest than HCs.
Nevertheless, the field of neuroimaging must consider head movement within the MR scanner as a possible source of systematic error, and seek ways to ameliorate this confound in the acquisition, analysis, and interpretation of fMRI data.
Why Do Subjects With Cognitive Impairment Move More in the Scanner?
Although it is not clear why task demands and cognitive impairment are associated with greater movement, we have considered one possible explanation. Persons with MS typically require greater cerebral resources (e.g., prefrontal activation) to perform the same cognitive tasks as healthy controls [e.g., Sweet et al., 2006]. This is especially true for MS patients with cognitive impairment [Chiaravalloti et al., 2005]. Experimental fMRI paradigms typically require subjects to perform two tasks simultaneously: (a) perform the cognitive task of interest (e.g., N-Back), and (b) remain still. As demands of the cognitive task increase (2-Back), there may be fewer cerebral resources available to maintain the second task (remain still). Healthy persons and MS patients with higher cognitive abilities may process the cognitive task with enough efficiency that cerebral resources remain available for remaining still; however, cerebral inefficiency in MS patients with lower cognitive ability may lead to depleted cerebral resources, resulting in neglect of the second task (remain still).
Another factor that may contribute to increased movement in the MS population is fatigue. Individuals with MS frequently report high levels of both physical and cognitive fatigue [Krupp et al., 2012], and self-reported fatigue levels often increase during a difficult cognitive task [Johnson et al., 1997]. Although it was not directly examined in the current study, increased fatigue throughout the course of the fMRI paradigm likely leads to increased head movement, which will significantly impact the BOLD signal.
Because we cannot fully correct for subject motion, we are left having to decide between two unpalatable alternatives: (1) exclude subjects with excessive motion and accept the resulting bias in our sample, or (2) include as many subjects as possible, and accept the fact that the subjects who moved more will contribute less to the group-level results. In practice, the latter choice is preferable, but only because the former choice is unacceptable. One insidious problem with the latter choice has to do with the fact that the SNR is almost never reported in fMRI studies. Therefore, when two groups are compared (e.g., individuals with high vs. low cognitive ability), it is almost impossible to tell how much of the difference between the groups is due to differences in SNR. This problem is less concerning in studies involving only HCs, but it would be wise for studies involving clinical samples to include analyses of SNR in their results.
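Reporting SNR alongside motion requires little additional computation. The sketch below assumes one common definition, temporal SNR (the mean of a time series divided by its standard deviation over time); the definition used in the present analyses is not restated in this section, so this choice is an assumption. It also shows how per-subject SNR might be correlated with per-subject maximum motion. Function names are illustrative.

```python
import numpy as np


def temporal_snr(timeseries):
    """Temporal SNR of one time series: mean over time / SD over time."""
    ts = np.asarray(timeseries, dtype=float)
    return float(ts.mean() / ts.std(ddof=1))


def motion_snr_correlation(max_motion_per_subject, snr_per_subject):
    """Pearson correlation between per-subject motion and per-subject SNR."""
    r = np.corrcoef(max_motion_per_subject, snr_per_subject)[0, 1]
    return float(r)
```

A clearly negative correlation in a given sample would indicate that the subjects who moved more are contributing weaker data to the group-level statistics, and reporting it per group would let readers judge how much of a group difference might reflect SNR.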
A better solution would be to prospectively coregister all of the images in the fMRI time-series, adjusting the scanner to track changes in the position of the brain as they occur. Several methods have been devised to do this, ranging from the use of three external markers placed on the participant's head [Derbyshire et al., 1998; Speck et al., 2006], to techniques that calculate rigid-body transformations of the EPI image, similar to algorithms used in retrospective motion correction [Mathiak et al., 2001; Thesen et al., 2000], to techniques that measure differences in k-space [Welch et al., 2002]. These techniques are very promising and may obviate the need to correct for motion retrospectively by ensuring that the time-series of EPI images is coregistered at the time of acquisition. This would minimize signal distortions and changes in SNR due to motion, and would thus allow clinical populations to be scanned without the concern that motion artifact will cause differences in signal strength between groups. Indeed, some of these methods have recently become commercially available (e.g., PACE, available on Siemens scanners).
Another solution is to carefully monitor motion parameters from every subject who participates in the study (an approach that should be followed in any case), and to ensure that sufficient numbers of subjects with low cognitive ability are included. From the scatter plots in our analyses, it can be seen that there are some subjects with low cognitive ability who were able to remain still. One consequence of this observation is that, despite the fact that many subjects with impaired cognition will move too much to be included, it is possible to sufficiently power a study by continuing to recruit such subjects until a sufficient number who are able to remain still have been found. This is a rather costly option, since it entails the collection of many datasets that will not ultimately be usable, but it is perhaps the best solution for studies of clinical samples.
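The cost of this strategy can be estimated with simple arithmetic. If a proportion p of recruited subjects with low cognitive ability is able to remain still, then on average n / p such subjects must be scanned to obtain n usable datasets. The sketch below illustrates this; the proportions used are hypothetical and are not estimates from the present data.

```python
import math


def expected_recruits(n_usable_needed, p_usable):
    """Expected number of subjects to scan to obtain n usable datasets.

    p_usable is the (hypothetical) proportion of recruited subjects whose
    motion stays within the inclusion cutoff.
    """
    if not 0.0 < p_usable <= 1.0:
        raise ValueError("p_usable must be in (0, 1]")
    return math.ceil(n_usable_needed / p_usable)


# Hypothetical example: 12 usable datasets needed, half of subjects stay still.
print(expected_recruits(12, 0.5))  # 24
```

This is an average; the actual number required in any one study will vary around it, so recruitment plans would need some margin.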