Accounting for task performance
An important issue in developmental imaging, and indeed for any group comparison, is group differences in task performance [Brown et al., 2005; Johnson et al., 2002; Palmer et al., 2004; Price and Friston, 1999; Schlaggar and McCandliss, 2007; Schlaggar et al., 2002]. A task that is simple for adults could be much harder for children, or a task designed for children could produce ceiling effects in adults. A discrepancy in performance on the task of interest (as well as on any comparison task; see "The Task B Problem" below) creates a potential confound in the analysis. Any differences in activation observed between groups might then be due to less successful performance (e.g., inattentiveness, misunderstanding of instructions, guessing) in one group, rather than to a fundamental difference in the way the brains of the two groups process the task. This point is important because, if we want to discover group differences in the brain processing responsible for producing a particular behavior, we should do our best to increase the chance that we are sampling brain activity during that behavior. When group task performance is discrepant, the discrepancy can arise for a number of reasons beyond fundamental group differences in the brain processing that implements the task. In studies that do not address this potential confound, results must be interpreted with caution. To be clear, understanding the functional neuroanatomical basis of group performance differences is also valuable. The argument here is that by isolating the variables, one can estimate the unique contribution of performance, or of group membership, to any differences observed between groups.
The first step in addressing the performance confound is to collect, whenever possible, behavioral data while the subject is in the scanner (e.g., recordings of verbal outputs, eye movements, button presses, etc.). Performance metrics should include both accuracy and reaction time. One can argue that response accuracy and response time are nonoptimal surrogates of performance (though chronometrics have certainly provided the means for interrogating cognitive architecture [Posner, 1978]). Two groups of subjects might be entirely matched on accuracy and response time, yet use two entirely different strategies, whether overt or implicit, for task completion [Brown et al., 2005; Schlaggar et al., 2002]. In developmental (and aging) studies, neurobiological differences may bias groups toward different implicit processing strategies across age [Grossman et al., 2002; Reuter-Lorenz and Lustig, 2005]. Group imaging differences, then, could result from one group's unsuccessful implementation of the same or an alternate strategy as the other group (i.e., "the performance confound"), or from successful implementation of an alternate strategy (i.e., "behavioral phenocopy"; see "Interpretation of Group Differences" below). When behavioral performance is not different between the groups but imaging differences remain, the interpretation space is narrowed to consideration of successful (as measured by overt performance) implementation of different strategies. Without overt performance information, there is no means to address a frank performance confound.
Relative to a typical testing environment, the scanner can degrade performance in subjects of all ages, but children may be more susceptible than adults. Hence, estimates of an individual's performance on the task outside of the scanner cannot be relied upon exclusively to give a good estimate of performance at the time the imaging data were acquired. It is also important to differentiate between correct and incorrect trials of a task, as it has been shown that error-related activity can differentially affect many regions of the brain [Dosenbach et al., 2006; Garavan et al., 2002], and that there may be group differences in error processing [Rubia et al., 2005; Velanova et al., 2008]. Examination of only the correct trials of a task, however, does not address differences in response times between the two groups. There are well-established age differences in processing speed across a variety of tasks [Kail, 1991]. Thus, reaction time effects are important to distinguish from other types of group activation differences.
Research groups have dealt with performance differences between groups using many strategies. We will discuss four strategies briefly. One common technique is to create equivalent performance by calibrating the demands of the task until the adults and children are performing at a similar level of accuracy and/or reaction time [summarized briefly in Casey, 2002; Kotsoni et al., 2006]. Testing various levels of task difficulty in each of the groups allows for comparison of activity at equivalent performance. For example, one could parametrically manipulate an N-back working memory task to create roughly equal performance between children and adults (e.g., by having children do a 1-back version and adults do a 3-back version). However, this parametric manipulation assumes that the brain activations in the two groups are being manipulated the same way by the different versions of the task. This may be problematic in cases of memory span, for instance, when different list lengths are proposed to emphasize different processes.
Another strategy for addressing performance differences is that of post-hoc "performance matching." In this approach, the groups perform the same task, and any group differences are found. A subgroup analysis is then done by separating the groups, based on overt performance measures (i.e., reaction time and accuracy), into matched and unmatched sets. Using this approach, we have identified regions that produce group differences only when performance is discrepant between groups, and regions that remain different between groups even when performance is equated [Brown et al., 2005; see "Interpretation of Group Differences" later]. This approach requires some degree of overlapping performance on the task between the groups, which may not be possible for all tasks. The method has been criticized as selecting for the slowest and dullest adults and the quickest and brightest children in the subgroups. However, it is important to note that the group imaging differences continue to exist in the other subgroups [Brown et al., 2005; Schlaggar et al., 2002]. Also, task behavioral responses are often single "moment in time" measures; the subgroups, and the overall groups, can often be matched on IQ and other offline assessments, suggesting that the "bright/dull" dichotomy is not inherent to this approach.
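To make the subgrouping step concrete, the following is a minimal sketch of one way to form performance-matched subgroups; the selection rule (shared accuracy range), variable names, and toy accuracy values are illustrative assumptions of ours, not the specific procedure of Brown et al.

```python
import numpy as np

def matched_subgroups(child_perf, adult_perf):
    """Select subjects whose performance falls in the range shared by both groups."""
    lo = max(child_perf.min(), adult_perf.min())
    hi = min(child_perf.max(), adult_perf.max())
    child_idx = np.where((child_perf >= lo) & (child_perf <= hi))[0]
    adult_idx = np.where((adult_perf >= lo) & (adult_perf <= hi))[0]
    return child_idx, adult_idx

# toy accuracy scores (proportion correct) for each subject
children = np.array([0.55, 0.70, 0.80, 0.85, 0.90])
adults   = np.array([0.80, 0.88, 0.92, 0.95, 0.98])
c_idx, a_idx = matched_subgroups(children, adults)
```

Here the matched subgroups comprise the children and adults whose accuracy falls within the range shared by both groups; the remaining subjects form the unmatched sets, in which group imaging differences can be checked separately.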
A third strategy is to regress performance variables as covariates of interest. Performance regression is often done in conjunction with an age regression, and requires at least some degree of non-collinearity between age and performance [see Fair et al., 2006 for discussion]. Performance regression has the benefit of not reducing power through subgroupings as performance matching analysis can do, but a strong degree of collinearity between age and performance may inflate the variance of the estimate related to each of the factors.
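To illustrate the collinearity concern numerically, the sketch below (toy numbers and names of our own invention, not from Fair et al.) computes the variance inflation factor, 1/(1 − r²), for an age regressor paired with either an orthogonal or a nearly collinear performance regressor:

```python
import numpy as np

def vif(x, y):
    """Variance inflation factor for a pair of regressors: 1 / (1 - r^2)."""
    r = np.corrcoef(x, y)[0, 1]
    return 1.0 / (1.0 - r ** 2)

age = np.array([1.0, 2.0, 3.0, 4.0])                      # hypothetical, rescaled ages
perf_orthogonal = np.array([1.0, -1.0, -1.0, 1.0])        # uncorrelated with age
perf_collinear = age + np.array([0.1, -0.1, 0.1, -0.1])   # tracks age closely

low_vif = vif(age, perf_orthogonal)   # ~1: coefficient estimates are stable
high_vif = vif(age, perf_collinear)   # large: coefficient variance is inflated
```

When age and performance are nearly collinear, the inflation factor is large, and the variance of each regressor's estimated effect grows accordingly; when they are orthogonal, each factor's unique contribution can be estimated stably.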
A fourth strategy for dealing with performance issues has been to equate performance between the groups on one task (e.g., picking out the capital letter in a word), while indirectly assessing the group differences on a simultaneously occurring implicit task (e.g., reading). However, this approach raises issues similar to the Task B problem discussed next, in that there is little reason to assume that because the groups are equated on the overt task that they are thus matched on the implicit task.
The “Task B” problem
Many neuroimaging studies employ direct comparisons between two (or more) task conditions, such that brain activity during a "control task" (Task B) is subtracted from brain activity during a "higher order" cognitive task (Task A). The thought is that an appropriate control task will subtract away functional activity common to the two tasks, leaving only activity related to the higher order aspects of Task A ("pure insertion"). The concept of pure insertion is problematic because of its assumptions of linearity and noninteraction between tasks [Friston et al., 1996]. The choice of a control task thus has important consequences for results and their interpretation, particularly in group comparison studies, where Task B has the potential to differ between the groups. For example, developmental studies often compare (Task A_child − Task B_child) to (Task A_adult − Task B_adult), and interpret the results as a straightforward difference between children and adults for Task A. Critically, this interpretation rests on the assumption that Task B is the same in both children and adults, but this assumption may go untested or undiscussed. The assumption is particularly common in blocked designs. However, if the two groups activate significantly different brain regions (or activate similar regions to different degrees) while performing the control task, the interpretation of group differences in the higher order task will be confounded (see Fig. 1). This issue can be addressed in at least two complementary manners. First, one can directly compare the two groups on Task B (Task B_adult − Task B_child). A null result in this direct comparison would support the contention that Task B is behaving well as a comparison task. Some might worry that a "Task C problem" emerges here, such that Task B needs to be contrasted with a lower level task, ad infinitum. The key is that the between-group Task B comparison obviates this need.
The point of the between-group Task B comparison is to test the validity of the (Task A_child − Task B_child) versus (Task A_adult − Task B_adult) construct. Alternatively, or in addition, one can investigate a Task (A vs. B) by Group (child vs. adult) interaction and use post hoc analysis to determine the source of the interaction.
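As a toy numerical illustration of these two checks (the per-subject activation magnitudes below are invented for the example), both the direct between-group Task B comparison and the Task-by-Group interaction contrast can be written out directly:

```python
import numpy as np

# hypothetical per-subject activation magnitudes in one region of interest
taskA_child = np.array([1.0, 1.2, 0.9, 1.1])
taskB_child = np.array([0.4, 0.5, 0.3, 0.4])
taskA_adult = np.array([1.5, 1.6, 1.4, 1.5])
taskB_adult = np.array([0.4, 0.5, 0.4, 0.5])

# direct between-group Task B comparison: a near-null difference supports
# Task B as a valid control condition across groups
taskB_diff = taskB_adult.mean() - taskB_child.mean()

# Task (A vs. B) by Group (child vs. adult) interaction contrast
interaction = ((taskA_adult - taskB_adult).mean()
               - (taskA_child - taskB_child).mean())
```

In this toy case Task B differs little between groups, so a nonzero interaction can be attributed to a group difference in Task A rather than to a misbehaving control condition; in real data both quantities would, of course, be evaluated statistically rather than as raw differences.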
Figure 1. The Task B Problem. Studies that compare the difference in activation between two tasks across two groups must account for the possibility of differences existing in either task (Task A or Task B), not just the task of interest (Task A).
Event-related fMRI designs allow estimation of the hemodynamic response to individual trial types, eliminating the need for subtraction and, with it, the Task B problem. When these designs do not assume a shape of response, timecourses of the hemodynamic responses can be evaluated and displayed. Examination of timecourses allows a straightforward quality control assessment to ensure appropriate event coding and data preprocessing. Analysis of the timecourses of separate conditions also reveals the direction of the hemodynamic response to an event type (positive or negative activity), which other types of analysis (particularly subtraction analyses) can obscure; the direction of the response can be critical to data interpretation (see "Negative BOLD Activity" below).
An approach that we often employ begins with the use of an event-related (or a mixed blocked/event-related) design to assess group differences. We model a parameter for every time point in the time course of each transient event type (e.g., correct vs. incorrect responses, different item types), thus estimating the magnitude at each of the points while making no assumptions about the precise shape of the response [Corbetta et al., 2000; Shulman et al., 1999]. From here, an ANOVA can test for a main effect of time at the voxel level or within experimenter-defined regions. A significant effect indicates that the hemodynamic response deviates significantly from zero (i.e., is not flat) and can be interpreted as an activation or deactivation. A significant interaction of any other factor with time implies a significant variation in the hemodynamic response across the levels of that factor. This approach allows for independent assessment of both Tasks A and B and their effects across the groups.
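A minimal sketch of this shape-free (finite impulse response, or FIR) modeling, assuming a single event type in one toy run; the frame counts, onsets, and simulated response below are illustrative only, not parameters from the cited studies:

```python
import numpy as np

def fir_design(onsets, n_frames, n_lags):
    """One regressor per post-onset frame, so no response shape is assumed."""
    X = np.zeros((n_frames, n_lags))
    for onset in onsets:
        for lag in range(n_lags):
            if onset + lag < n_frames:
                X[onset + lag, lag] = 1.0
    return X

# two trials of one event type in a 20-frame run, each modeled over 8 frames
X = fir_design(onsets=[2, 11], n_frames=20, n_lags=8)

# simulate BOLD data in which every trial evokes the same 8-frame response
true_response = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.5, 0.0])
bold = X @ true_response

# the estimated betas trace the timecourse of the response; a main effect of
# time then asks whether this estimated timecourse deviates from a flat line
beta, *_ = np.linalg.lstsq(X, bold, rcond=None)
```

Because each post-onset frame receives its own parameter, the fitted betas reproduce the full timecourse, including its direction (positive or negative), which can then be inspected for quality control and tested for effects of time and its interactions with group or condition.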