Several methods to measure implicit attitudes have recently been proposed (Steffens & Jonas, 2010). Examples include the Implicit Association Test (IAT; Greenwald, McGhee, & Schwartz, 1998; Sriram & Greenwald, 2009), the Extrinsic Affective Simon Task (EAST; De Houwer, 2003; Degner & Wentura, 2008), and the Go/No-go Association Task (GNAT; Nosek & Banaji, 2001). All these techniques are based on a discrimination task that involves speeded responding to object-evaluation pairs (Banaji, 2001). For instance, the IAT is based on the idea that it is easier to assign exemplars of two associated concepts (e.g., flowers and positive words) to the same response key than exemplars of two unassociated concepts (e.g., flowers and negative words). Indeed, Greenwald et al. (1998) demonstrated that people are better in IAT performance (i.e., faster and more accurate) when they are instructed to respond to flower exemplars and positive words with the same response key and to insect exemplars and negative words with a different key (compatible block). In contrast, they are poorer in performance when flowers and negative words are assigned to the same response key and both insects and positive words are assigned to the other key (incompatible block). Thus, the average reaction time difference between incompatible and compatible IAT blocks can be used as an indirect preference measure for one target concept (e.g., flower) over the other (e.g., insect).
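As a minimal illustration of the latency-based measure described above, the following Python sketch computes the difference between the mean reaction times of the incompatible and the compatible block. The reaction times and the function name `iat_effect` are hypothetical, and the sketch deliberately omits the scoring refinements used in practice (e.g., error penalties or standardization).

```python
from statistics import mean

def iat_effect(compatible_rts, incompatible_rts):
    """Mean RT (ms) in the incompatible block minus mean RT in the compatible block."""
    return mean(incompatible_rts) - mean(compatible_rts)

# Hypothetical reaction times in milliseconds for one participant.
compatible = [620, 580, 640, 600]
incompatible = [780, 820, 760, 800]

# Positive values indicate faster responding in the compatible block.
print(iat_effect(compatible, incompatible))
```

A positive difference would be read as an indirect preference for the first target concept (here, flowers) over the second (insects).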
In spite of some significant advantages indirect attitude measures have over self-report measures (e.g., enhanced validity and lower susceptibility to faking, cf. Boysen, Vogel, & Madon, 2006; Nosek & Smyth, 2007; Steffens, 2004), they also share a serious problem: the lack of a theoretical foundation. As Fazio and Olson (2003, p. 301) concluded, “… research concerning implicit measures has been surprisingly atheoretical. It largely has been a methodological, empirically driven enterprise.”
THE QUAD MODEL
A promising theoretical model of the cognitive processes involved in implicit attitude measures was introduced by Conrey, Sherman, Gawronski, Hugenberg, and Groom (2005). The authors developed the Quad Model, a multinomial processing tree (MPT) model (Batchelder & Riefer, 1999; Erdfelder, Auer, Hilbig, Aßfalg, Moshagen, & Nadarevic, 2009) of IAT performance. MPT models are stochastic models for discrete categorical data as typically obtained in implicit attitude tasks (e.g., correct vs. incorrect responses for different conditions). They are based on the idea that observed responses rarely depend on a single cognitive process. In general, multiple processes may affect each possible response. Moreover, it is assumed that each of these underlying processes occurs with a certain probability. By means of MPT modeling of observed data, estimates of these process probabilities can be obtained. Hence, MPT models are useful tools for disentangling the contributions of different cognitive processes to observed response frequencies, provided these models have been shown to be psychologically valid (for a brief introduction and a useful practical guide to MPT modeling, see Klauer and Wegener (1998), Appendix A).
According to Conrey et al. (2005), performance in the IAT and related tasks also depends on multiple cognitive processes. In contrast to the standard interpretation of IAT scores as pure measures of automatic associations, they proposed that IAT performance is based on a quadruplet of processes: (1) Automatically activated stimulus–valence associations (probability AC), (2) the discriminability of the correct response (probability D), (3) the ability to inhibit automatically activated associations (“overcoming bias”, probability OB), and (4) guessing (probability G). Specifically, the Quad Model makes the following predictions (see Figure 1): When an IAT stimulus is presented, a valence association is activated automatically with probability AC that depends on the strength of a stimulus–valence association. Regardless of whether an association has been activated (probability AC) or not (probability 1-AC), the IAT stimulus can be discriminated (i.e., categorized) correctly with probability D. Because discrimination is a controlled process, the size of D should mainly depend on cognitive capacity. Whereas both association activation and discrimination result in correct responses in compatible IAT blocks, they interfere in incompatible blocks. In the latter case, association activation is suppressed with probability OB to select the correct response. Thus, OB reflects “overcoming bias,” a controlled inhibition process. When no association has been activated (probability 1-AC) and there is no correct answer available (probability 1-D), the response is determined by guessing. Whether this leads to a left hand guess (probability G) or a right hand guess (probability 1-G) depends on automatic response tendencies, strategic guessing, or random influences.
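The processing tree described above can be sketched in Python as a function that returns the probability of a correct response. This is only an illustration under stated assumptions, not the model equations of Conrey et al. (2005): the function name and the argument `g_correct` are hypothetical, and the sketch assumes that an activated association drives the response whenever discrimination fails or the bias is not overcome, which the verbal description leaves implicit.

```python
def quad_p_correct(ac, d, ob, g_correct, compatible):
    """Probability of a correct response under a sketched Quad Model tree.

    g_correct is the probability that a pure guess hits the correct key
    (G if the correct key is the left one, 1 - G otherwise).
    Assumption: an activated association that is not overcome drives
    the response.
    """
    # No association activated: discriminate, otherwise guess.
    no_assoc = (1 - ac) * (d + (1 - d) * g_correct)
    if compatible:
        # Association and discrimination point to the same key.
        return ac + no_assoc
    # Incompatible block: an activated association yields a correct
    # response only if discrimination succeeds and the bias is overcome.
    return ac * d * ob + no_assoc

# Hypothetical parameter values, chosen for illustration only.
p_comp = quad_p_correct(ac=0.2, d=0.9, ob=0.5, g_correct=0.5, compatible=True)
p_incomp = quad_p_correct(ac=0.2, d=0.9, ob=0.5, g_correct=0.5, compatible=False)
print(p_comp, p_incomp)
```

With AC = 0 the two block types yield the same prediction, which reflects the idea that compatibility effects arise only from activated associations.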
One of the advantages the Quad Model shares with other MPT models is the fact that it allows for model tests by means of goodness-of-fit statistics. These statistics indicate whether the observed data match the data that are predicted by the model's assumptions, treating the four unknown probabilities AC, D, OB, and G as free parameters in the interval [0,1]. Additionally, the Quad Model is capable of disentangling and quantifying the contributions of its proposed four cognitive processes by means of maximum likelihood (ML) parameter estimation. By now, several studies have attested to the Quad Model's excellent fit to IAT data (Conrey et al., 2005; Gonsalkorale, Sherman, & Klauer, 2009; Sherman, Gawronski, Gonsalkorale, Hugenberg, Allen, & Groom, 2008). Moreover, these studies demonstrate that the Quad Model's parameters mirror experimental manipulations of study materials and context conditions in a meaningful, psychologically plausible way. For instance, D and OB proved to be substantially reduced under time constraints, and G has been shown to be sensitive to experimentally induced response biases.
However, despite all these advantages of the Quad Model, the model has one serious drawback. Because MPT models are models for categorical data, the Quad Model captures error rates only and ignores response latencies entirely. This feature of the Quad Model is of course problematic, particularly with regard to IAT data. Because the IAT does not make use of response deadlines, errors can easily be avoided by taking as much time as needed to determine the correct response. Consequently, IAT effect sizes based on error rates are usually much smaller than IAT effect sizes based on response latencies (Steffens et al., 2004).
Considering this fact, MPT models seem more appropriate for implicit attitude tasks that focus on error rates rather than response latencies, for example, the GNAT (Nosek & Banaji, 2001). According to Rudolph, Schröder-Abé, Schütz, Gregg, and Sedikides (2008), one main difference between the IAT and the GNAT is that “whereas in the IAT accuracy is held constant and reaction time varies, in the GNAT reaction time is held constant and accuracy varies” (p. 279). Unlike the IAT, the GNAT is capable of measuring implicit attitudes toward a single target concept. Therefore, the GNAT employs a set of distractor items instead of a second target concept. Furthermore, GNAT responses are restricted by a response deadline. When performing the GNAT, participants are required to respond to target stimuli and positive words in one block and to target stimuli and negative words in another block (“go” stimuli) whereas other stimuli, the distractors, have to be ignored (“no-go” stimuli). Accuracy differences between both blocks (usually assessed by means of the signal detection sensitivity measure d′) indicate the implicit attitude toward the target concept. Hence, because the GNAT focuses on accuracy differences between blocks of trials, it is clearly suitable for multinomial modeling approaches.
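For readers unfamiliar with the sensitivity measure d′ mentioned above, the following Python sketch computes it as the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The response counts are hypothetical, and the correction of adding 0.5 to each cell is an assumption made here to avoid infinite z-scores; it is not taken from the GNAT literature cited above.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal detection sensitivity: z(hit rate) - z(false-alarm rate)."""
    # Assumed correction: add 0.5 to each cell so that rates of 0 or 1
    # do not produce infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one participant (20 go and 20 no-go trials per block).
d_compatible = d_prime(hits=18, misses=2, false_alarms=3, correct_rejections=17)
d_incompatible = d_prime(hits=14, misses=6, false_alarms=8, correct_rejections=12)

# A larger d' in the compatible block indicates the assumed association.
print(d_compatible - d_incompatible)
```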
Consequently, Gonsalkorale, von Hippel, Sherman, and Klauer (2009) recently proposed a generalized Quad Model to analyze GNAT data. This generalized Quad Model differs from the original Quad Model in a single aspect: Gonsalkorale, von Hippel, et al. (2009) assume that the discrimination accuracy D in the GNAT differs between (a) stimuli of the category that serves as the go category throughout all GNAT blocks and (b) stimuli of the other categories. More precisely, it is assumed that because stimuli of the constant go category capture attention continuously in every GNAT block, participants become better at detecting these stimuli after a while. Therefore, the Quad Model variant for the GNAT uses two separate D parameters (one parameter D1 for the stimuli of the constant go category and a different parameter D2 for all other stimuli). Note that Conrey et al.'s (2005) original Quad Model can be derived from Gonsalkorale, von Hippel et al.'s (2009) generalized Quad Model by setting D1 = D2. Thus, the former model is a nested submodel of the latter. In other words, the original Quad Model can fit the data only if the generalized Quad Model does.
Apart from the additional assumption concerning D1 and D2, Gonsalkorale, von Hippel, et al. (2009) do not believe that the cognitive processes underlying the IAT and the GNAT differ from each other. In other words, the Quad Model should be applicable to both paradigms, either with a single D parameter (IAT) or with two different D parameters (GNAT). The latter hypothesis has so far only been tested in a single experiment. This study revealed a good fit of the generalized Quad Model for GNAT data (Gonsalkorale, von Hippel, et al., 2009).
THE TRIP MODEL
Despite the generalized Quad Model's good fit to GNAT data in the Gonsalkorale, von Hippel, et al. (2009) study, we doubt that the IAT and the GNAT involve the same cognitive processes. Our skepticism is based on empirical findings as well as theoretical considerations.
Empirical studies have shown that correlations between IAT and GNAT measures are rather low (Nosek & Banaji, 2001; Rudolph et al., 2008). If both measures reflect the same implicit attitude, these weak correlations can only be explained by low reliability of the measures involved or by method specific variance caused by different underlying cognitive processes. The latter explanation is supported (a) by the finding of Rudolph et al. (2008) that both the IAT and the GNAT measures exhibit satisfactory to good levels of reliability and (b) by the fact that divergent results between two-choice procedures and go/no-go procedures have also been observed for other psychological tasks, for example, the Simon task (Ansorge & Wühr, 2004, 2008) and lexical-decision tasks (Gomez, Ratcliff, & Perea, 2007).
From a theoretical perspective there is also reason to believe that the IAT and the GNAT differ in their underlying cognitive processes. There are two essential differences between the two paradigms. First, the GNAT involves a (short) response deadline whereas the IAT does not. Therefore, we believe that a controlled overcoming bias process is not involved in GNAT performance simply because participants do not have enough time to overcome bias when performing a GNAT. Second, when performing a GNAT, participants have to respond to go stimuli and to withhold responses to no-go stimuli. In contrast, when performing an IAT, participants have to respond to all stimuli. Research on go/no-go tasks has demonstrated that it is more difficult to withhold responses to no-go stimuli than to respond to go stimuli. For instance, Nieuwenhuis, Yeung, van den Wildenberg, and Ridderinkhof (2003) observed 8.3% false alarms (responses to no-go stimuli) in comparison to only 0.6% misses (omissions to go stimuli) in a go/no-go task with an equal number of go and no-go stimuli. Consequently, we assume that GNAT performance is more strongly influenced by general response biases than IAT performance is.
Based on these two major differences between the IAT and the GNAT, we designed a new MPT model of GNAT performance which complies with the task specific features of the GNAT. This model, called the Trip Model, is illustrated in Figure 2. The Trip Model differs from the Quad Model in the following manner.
First, the Trip Model does not include an overcoming bias process. Because GNATs typically involve relatively short response deadlines, it is very unlikely that people are able to overcome bias when performing a typical GNAT. Hence, in contrast to the Quad Model, the Trip Model uses a triplet of parameters only: AC, D, and G.
Second, whereas AC and D are assumed to measure independent processes in the Quad Model, the Trip Model assumes that association activation and discrimination reflect strongly correlated processes, with block type moderating the sign of this correlation. In other words, the model predicts that, depending on block type, an automatically activated association either strongly facilitates or impairs discrimination. Regarding the compatible GNAT block, the Trip Model assumes that associations already suggest the correct response and thus facilitate discrimination. Consequently, discrimination should be perfect in the compatible block if an automatic association has been activated. In contrast, associations in incompatible GNAT blocks counteract the discrimination process because they suggest the incorrect response. Thus, if an automatic association has been activated in the incompatible GNAT block, participants should be unable to determine the correct response in time.
Third, whereas the Quad Model assumes that response biases can drive responses only when no association is activated, the Trip Model assumes that response biases always determine responses when people are unable to detect the correct response (irrespective of whether an association has been activated or not). With this assumption the Trip Model tries to account for the strong impact of response activation and inhibition in go/no-go tasks (e.g., Johnstone, Pleffer, Barry, Clarke, & Smith, 2005). If response activation is dominant, participants should show a response bias toward pressing the go-key (probability G). In contrast, if response inhibition is dominant they should show a bias toward not responding (probability 1-G).
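The three assumptions above can be translated into a small Python sketch that returns the probability of a correct reaction. It is one possible reading of the verbal description and not the model equations from Figure 2 or the Appendix: the function name is hypothetical, and the sketch uses a single AC parameter for all stimuli, whereas the analyses reported later distinguish AC1 and AC2.

```python
def trip_p_correct(ac, d, g, compatible, go_stimulus):
    """Probability of a correct reaction under a sketched Trip Model tree."""
    # Probability that the response bias alone yields the correct reaction:
    # pressing the key (G) for go stimuli, withholding (1 - G) for no-go stimuli.
    bias_correct = g if go_stimulus else 1 - g
    # Without an activated association: discriminate, otherwise follow the bias.
    no_assoc = (1 - ac) * (d + (1 - d) * bias_correct)
    if compatible:
        # An activated association makes discrimination perfect.
        return ac + no_assoc
    # Incompatible block: an activated association prevents discrimination,
    # so the response bias decides.
    return ac * bias_correct + no_assoc

# Hypothetical parameter values, chosen for illustration only.
params = dict(ac=0.3, d=0.8, g=0.7)
miss_rate = 1 - trip_p_correct(**params, compatible=False, go_stimulus=True)
false_alarm_rate = 1 - trip_p_correct(**params, compatible=False, go_stimulus=False)
print(miss_rate, false_alarm_rate)
```

Under this reading, a go bias (G > 0.5) produces more false alarms than misses in the incompatible block, which is the asymmetry the third assumption is meant to capture.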
Note that the Trip Model shares some features with the ABC Model (Stahl & Degner, 2007), another MPT model that has recently been developed for analyzing data of the EAST (De Houwer, 2003). Both the Trip Model and the ABC Model include an association activation parameter, a response bias parameter, and a controlled discrimination parameter, whereas they omit an overcoming bias parameter. Moreover, in contrast to the Quad Model, neither model conceives of association activation and discriminability as independent processes. However, unlike the ABC Model, the Trip Model assumes that response bias determines responses in the incompatible block whenever people are unable to detect the correct response. This difference is due to the fact that the models have been developed to comply with specific features of two different tasks, the GNAT and the EAST, respectively.
In this article, we assess two different multinomial models of GNAT performance, namely the Trip Model and the generalized Quad Model. In particular, we evaluate the models' fit for different paradigms by applying them to two different GNAT variants and to the IAT (Experiment 1). In addition, we assess the validity of the Trip Model's parameters, that is, the AC parameter (in Experiment 2), the G parameter, and the D parameter (both of them in Experiment 3).
The goal of Experiment 1 was to test the Trip Model and the generalized Quad Model of GNAT performance against each other. Experiment 1 used a standard IAT and two different GNAT variants. The IAT and the two GNAT variants assessed flower–positive and insect–negative associations. Therefore, “flowers” and “insects” were used as target concepts and “positive” and “negative” as attribute concepts. The two GNAT variants differed in their go categories only. One variant was a standard GNAT that applied the target concepts “flowers” (or “insects”, respectively) as go concepts throughout all GNAT blocks. The other GNAT variant applied the attribute concepts “positive” (or “negative”, respectively) as go concepts throughout all blocks. In line with these differences, we refer to the former GNAT variant as Target GNAT and to the latter as Attribute GNAT.
Based on the assumption that the IAT and the GNAT differ in their underlying cognitive processes, we made the following predictions. (1) Regarding the Quad Model and the generalized Quad Model, we expected a good model fit for IAT data, as has previously been found by Conrey et al. (2005) and Sherman et al. (2008). In contrast, we expected that the fit of the Quad Model would be worse for the Target GNAT and Attribute GNAT data. (2) Regarding the Trip Model, we predicted the reverse pattern of results. That is, we expected a good Trip Model fit for both GNAT variants. Primarily because an overcoming bias process is not taken into account in the Trip Model, we did not expect a good model fit for IAT data.
Sixty-one students of the University of Mannheim participated in the experiment. One non-native German speaker had to be excluded from analysis because of difficulties in language processing. The age of the remaining 60 participants ranged from 19 to 40 years (M = 23.35, SD = 3.56). Fifteen participants were males, 45 were females.
Stimulus material consisted of 60 German nouns (15 flower names, 15 insect names, 15 positive words, and 15 negative words; see Appendix).
Task type was manipulated between participants. Each participant had to accomplish four Target GNATs, four Attribute GNATs, or four IATs in a row. In the Target GNAT condition, the target concept switched between the four GNATs, with the order (Flower GNAT first vs. Insect GNAT first) counter-balanced across participants. Likewise, for the Attribute GNAT and the IAT, the attribute assignment changed alternately, and the order of attribute assignment (positive GNAT first vs. negative GNAT first) was counter-balanced. Participants were randomly assigned to the experimental conditions.
Up to four participants simultaneously performed the experiment in experimental cubicles on standard PCs. To familiarize participants with their task, the experiment started with three warm-up blocks that consisted of a target discrimination task (20 trials), an attribute discrimination task (20 trials), and a combined task with response compatible concept assignment (20 trials). The concept-key assignment of the warm-up blocks was opposite to the concept-key assignment of the first actual task block. For example, when the first GNAT was a Flower GNAT, participants had to identify insects and negative words in the warm-up blocks.
Following the standard IAT procedure (e.g., Greenwald, Nosek, & Banaji, 2003), each of the subsequent GNATs or IATs contained seven blocks. The first four blocks consisted of a target discrimination task (e.g., “respond to flowers and ignore all other stimuli”; 20 trials), an attribute discrimination task (e.g., “respond to positive words and ignore all other stimuli”; 20 trials), a compatible combined practice block (e.g., “respond to flowers and positive words and ignore all other stimuli”; 20 trials), and a compatible combined test block (40 trials) with identical task instructions. The fifth block (20 trials) differed between the experimental paradigms. Participants performing the Target GNAT had to accomplish a reversed attribute discrimination task (e.g., “respond to negative words and ignore all other stimuli”). Because valence assignment was not supposed to change for the other two paradigms, the fifth block consisted of a reversed target discrimination block for the IAT and the Attribute GNAT (e.g., “respond to insects and ignore all other stimuli”). In the final two blocks, participants had to accomplish an incompatible combined practice block (e.g., “respond to insects and positive stimuli”; 20 trials) and an incompatible combined test block (40 trials) with identical task instructions. In each GNAT block, the same number of go stimuli and no-go stimuli was presented. Likewise, the IAT presented an equal number of stimuli assigned to the left response key and stimuli assigned to the right response key.
For the Target GNATs and the Attribute GNATs, every block started with new instructions introducing the go concepts during that block. To remind participants of these go concepts, category labels were displayed in the upper corners of the screen throughout all events of the block. When participants pressed “enter”, the instructions disappeared while a fixation mark remained in the center of the screen for 500 milliseconds. Subsequently, a black screen was shown for 200 milliseconds until the first stimulus was presented. Stimuli were randomly sampled and presented in the center of the screen. To facilitate discrimination, targets were presented in white letters whereas attributes were presented in light blue letters against a black background. Each stimulus remained on the screen until the participant responded by pressing the space bar or until the response deadline ran out (900 milliseconds). When the stimulus disappeared, a green circle (correct response) or a red cross (incorrect response) was displayed for 200 milliseconds. This feedback slide was followed by a 200-millisecond black screen preceding the next stimulus.
For the IAT, the procedure was almost identical. The only differences were that response time was not restricted and that “d” served as left response key and “l” served as right response key. Task completion took about 15–20 minutes. At the end of the experiment, participants were fully debriefed and thanked.
As usual in IAT and GNAT applications, only data from the combined test blocks were analyzed.
Analysis of Error Rates
Performance was significantly better in the compatible blocks than in the incompatible blocks for the Target GNAT, t(19) = 6.23, p < .001, dz = 1.39, the Attribute GNAT, t(19) = 7.37, p < .001, dz = 1.65, and the IAT, t(19) = 3.70, p = .002, dz = 0.83. Mean error rates are displayed in Table I. As expected, error rates differed only slightly between the compatible and the incompatible IAT blocks. In comparison, performance differences between these blocks were much more pronounced for the two GNAT variants.
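The effect sizes reported above follow from the t statistics, because Cohen's dz for a paired comparison equals t divided by the square root of the number of participants. The short Python sketch below recomputes them; the function name is hypothetical, and n = 20 per task is inferred from the reported 19 degrees of freedom.

```python
import math

def cohens_dz(t, n):
    """Cohen's dz for a paired-samples t test: t divided by sqrt(n)."""
    return t / math.sqrt(n)

# t values as reported above; n = 20 participants per task (df = 19).
for label, t in [("Target GNAT", 6.23), ("Attribute GNAT", 7.37), ("IAT", 3.70)]:
    print(label, round(cohens_dz(t, 20), 2))
```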
[Table I. Mean error rates for the different experimental groups of Experiments 1, 2, and 3. Column headings: Compatible blocks (%), Incompatible blocks (%). Row labels include: 75% go stimuli, 25% go stimuli.]
The computer program multiTree (Moshagen, 2010) was employed for testing the Trip Model and the Quad Model. The model specifications for both models are presented in the Appendix. The equation files and the corresponding data files for the following analyses can be obtained from the first author. Goodness of fit of the different models was evaluated by means of the log-likelihood ratio statistic G2, which is asymptotically χ²-distributed if the model holds (Read & Cressie, 1988), with degrees of freedom (df) corresponding to the difference between the number of independent data categories and the number of estimated parameters.
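To make the fit statistic concrete, the following Python sketch computes G2 from observed and model-predicted frequencies and converts a G2 value into a p value using only the standard library. It is not the multiTree implementation; the function names and the example frequencies are hypothetical, and the p value routine assumes integer degrees of freedom.

```python
import math

def g_squared(observed, expected):
    """Log-likelihood ratio statistic G2 = 2 * sum(obs * ln(obs / exp))."""
    # Empty cells contribute zero to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    h = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < df / 2:
        q += h ** a * math.exp(-h) / math.gamma(a + 1)
        a += 1
    return q

# Hypothetical observed and predicted frequencies for two response categories.
print(g_squared([45, 55], [50, 50]))

# p value for a fit statistic of 6.39 on 4 degrees of freedom.
print(round(chi2_sf(6.39, 4), 2))
```

A large p value means that the observed frequencies do not deviate significantly from the model's predictions, which is why nonsignificant G2 values are read as acceptable fit.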
To determine the sensitivity of our goodness-of-fit tests for model violations, we ran a power analysis using G*Power 3.1 (Faul, Erdfelder, Buchner, & Lang, 2009). In doing so, we applied the following default values. First, for the sake of comparability between studies, we used the conventional significance level of α = .05 in all of the following analyses. Second, in order to detect possible deviations from the models reliably, we set the target power of the model test at 1−β = .99. Finally, to be as conservative as possible, we assumed N = 3200 observations which is the smallest number of observations for a model test within this paper (i.e., in Experiment 3). The results of the G*Power sensitivity analysis revealed that, given these specifications, the following goodness-of-fit tests are powerful enough to detect deviations from the Trip Model of size w = 0.09 and deviations from both the Quad Model and the generalized Quad Model of size w = 0.10. According to Cohen (1988, p. 227), effects of size w = 0.10 can be considered “small.” Thus, given the present sample sizes and significance levels, our goodness-of-fit tests are powerful enough to detect even small model violations reliably.
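The logic of this sensitivity analysis can be sketched in Python without G*Power. The sketch finds the noncentrality parameter λ at which the goodness-of-fit test reaches the target power and converts it into Cohen's w via λ = N · w². It is an approximation under stated assumptions: the function names are hypothetical, the noncentral χ² distribution is evaluated as a truncated Poisson mixture, and the closed-form tail probability used here is valid for even df only (df = 4 and df = 8 are used as examples).

```python
import math

def chi2_sf_even(x, df):
    """Upper tail of a central chi-square; closed form valid for even df only."""
    h = x / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= h / j
        total += term
    return math.exp(-h) * total

def bisect(f, lo, hi, iters=50):
    """Root of a monotone function f on [lo, hi] by bisection."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == (f_lo > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def power(lam, df, crit, terms=200):
    """Noncentral chi-square upper tail, computed as a Poisson mixture."""
    weight, total = math.exp(-lam / 2), 0.0
    for i in range(terms):
        total += weight * chi2_sf_even(crit, df + 2 * i)
        weight *= (lam / 2) / (i + 1)
    return total

def detectable_w(n, df, alpha=0.05, target_power=0.99):
    """Smallest effect size w detectable with the given alpha and power."""
    crit = bisect(lambda x: chi2_sf_even(x, df) - alpha, 0.0, 200.0)
    lam = bisect(lambda l: power(l, df, crit) - target_power, 0.0, 100.0)
    return math.sqrt(lam / n)  # Cohen's w, since lambda = N * w**2

for df in (4, 8):
    print(df, round(detectable_w(3200, df), 3))
```

If the sketch is correct, the resulting values should lie close to the detectable effect sizes reported above; small discrepancies would reflect the approximations named in the lead-in.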
Trip Model Analyses
When testing the Trip Model, we observed a good model fit for the Target GNAT, G2(4) = 6.39, p = .17 as well as for the Attribute GNAT, G2(4) = 0.35, p = .99. The four Trip Model parameter estimates (AC1, AC2, G, and D) did not significantly differ between the two GNAT versions, ΔG2(4) = 6.10, p = .19 (see Figure 3). Parameter estimates showed flower–positive associations (AC1) significantly larger than zero, ΔG2(2) = 100.50, p < .001, and significant insect–negative associations (AC2), ΔG2(2) = 132.34, p < .001. D parameter estimates were relatively high, indicating that people were quite good at determining the correct response. Furthermore, G parameter estimates suggest a general response bias toward pressing the go-key. More precisely, despite an equal number of go and no-go stimuli, G parameters were significantly larger than 0.50, ΔG2(2) = 129.37, p < .001. This observation is in line with the common finding that false alarms (responses to no-go stimuli) are more frequent than misses (omissions to go stimuli) in go/no-go tasks (e.g., Falkenstein, Hoormann, & Hohnsbein, 1999; Menon, Adleman, White, Glover, & Reiss, 2001; Nieuwenhuis et al., 2003).
The implications of the Trip Model findings are twofold. First, the Trip Model seems to be an appropriate model of the cognitive processes underlying GNAT performance. Second, the underlying cognitive processes of the GNAT do not depend on whether an attribute concept or a target concept is used as the constant go category. In contrast, when IAT data were analyzed with the Trip Model, the goodness-of-fit test indicated only a marginal model fit, G2(4) = 9.29, p = .05. Parameter estimates of the Trip Model for the IAT data are also displayed in Figure 3. As can be seen, the AC parameters were smaller for the IAT compared to the GNAT. This is consistent with the idea that the IAT tends to underestimate automatic associations. However, given the marginal fit of the Trip Model, this result should be interpreted with caution.
Quad Model Analyses
The Quad Model was tested in two variants. The generalized variant of Gonsalkorale, von Hippel, et al. (2009) involving two separate D parameters was used for the GNAT data analyses whereas the classical variant with a single D parameter was used for the IAT data analysis.
When analyzing the GNAT data, the fit of the generalized Quad Model was not as good as the one that had been observed with the Trip Model. More precisely, the generalized Quad Model only fit the data of the Target GNAT, G2(8) = 14.21, p = .08, but failed to fit the data of the Attribute GNAT, G2(8) = 31.73, p < .001. As explained above, misfit of the generalized Quad Model implies misfit of the classical Quad Model involving only a single D parameter. Hence, neither of the two Quad Model variants can account for the Attribute GNAT data of Experiment 1. Interestingly, Quad Model parameter estimates for the two GNAT variants were similar to the Trip Model parameter estimates. AC parameters differed significantly from zero, ΔG2(4) = 189.49, p < .001, D parameters were rather high, and G parameters indicated a significant response bias, ΔG2(2) = 67.57, p < .001 (see Figure 4). Moreover, parameter estimates for D1 and D2 did not significantly differ from each other, ΔG2(2) = 2.81, p = .25, and the OB parameters did not differ significantly from zero, ΔG2(2) = 1.44, p = .49.
When analyzing the IAT data with the Quad Model, the goodness-of-fit test indicated a good model fit, G2(7) = 10.93, p = .14. This result replicates previous findings (Conrey et al., 2005; Sherman et al., 2008) and corroborates the Quad Model as an appropriate model of IAT performance. Parameter estimates of the Quad Model for both the GNAT and the IAT data are displayed in Figure 4. Again, the AC parameters were lower for IAT as compared to GNAT data.
As predicted, the Quad Model fit the IAT data better than the GNAT data whereas the Trip Model showed a better model fit to the GNAT data. These findings (a) support our assumption that the cognitive processes underlying the GNAT differ from those underlying the IAT and (b) favor the Trip Model as a measurement model of GNAT performance.
The goal of Experiment 2 was to investigate the validity of the Trip Model in further detail. More precisely, we wanted (1) to replicate the Trip Model's fit using independent GNAT data and (2) to assess the construct validity of the Trip Model's AC parameter. Because the AC parameter is supposed to reflect automatic associations, it is probably the most interesting parameter of the Trip Model for researchers and practitioners.
One method to manipulate automatic associations is to confront participants with a short scenario before performing an implicit attitude task. For instance, Foroni and Mayr (2005) used apocalyptic scenarios that either described insects as radioactively contaminated and flowers as indispensable to life (pro-stereotype scenario) or flowers as radioactively contaminated and insects as indispensable to life (counter-stereotype scenario). They found that these scenarios affected GNAT performance significantly. After presentation of the pro-stereotype scenario, participants exhibited highly reliable GNAT effects, indicating significant flower–positive and insect–negative associations. However, when the counter-stereotype scenario had been presented, the GNAT effects did not significantly differ from zero.
In Experiment 2, we used the manipulation of Foroni and Mayr (2005) to test the validity of the Trip Model's AC parameter. We hypothesized that reading the pro-stereotype scenario would enhance the accessibility of flower–positive and insect–negative associations whereas reading the counter-stereotype scenario would decrease the accessibility of these associations. Therefore, we predicted that the AC parameters would be significantly higher in the pro-stereotype condition than in the counter-stereotype condition. Moreover, we assumed that the manipulation should not affect response biases (parameter G). Concerning the D parameter, we had no specific prediction. Although we did not intend to manipulate the D parameter in a specific direction with the presented scenarios, we considered it to be plausible that the scenario manipulation affects discrimination performance. For instance, people who hold a negative attitude toward insects are forced in the counter-stereotypical condition to think of insects as essential for surviving. Keeping this new conflicting association in mind might require additional cognitive resources and thus might reduce discrimination performance.
Fifty-seven University of Mannheim psychology undergraduates participated in the experiment. The age of the participants ranged from 18 to 38 years (M = 21.98, SD = 4.67). Fifteen participants were males, 42 were females.
The presented scenarios were taken from Foroni and Mayr (2005) and translated into German. The GNAT stimuli were the same as used in Experiment 1. However, in contrast to Experiment 1, only Target GNATs (i.e., Flower GNATs and Insect GNATs) were used.
The presented scenarios were manipulated between participants. Half of the participants read a pro-stereotype scenario and the other half read a counter-stereotype scenario before performing two GNATs. One GNAT assessed flower–positive associations and the other one assessed insect–negative associations. The order of the two GNATs (Flower GNAT vs. Insect GNAT first) and the order of combined blocks within each GNAT (compatible block vs. incompatible block first) were counter-balanced across participants. Participants were assigned randomly to the experimental conditions.
Up to six participants simultaneously performed the experiment in experimental cubicles on standard PCs. The first three task blocks were warm-up blocks. Following these warm-up blocks, either the pro-stereotype scenario or the counter-stereotype scenario was presented on the computer screen. Participants read the scenario before completing two GNATs. To keep the procedure as short as possible, each GNAT consisted of only five instead of seven blocks: A target discrimination block (20 trials), an attribute discrimination block (20 trials), a compatible combined block (80 trials), a reversed attribute discrimination block (20 trials), and an incompatible combined block (80 trials). A reminder of the scenario preceded each combined block. The settings for the single blocks and response trials were the same as in Experiment 1.
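As a small bookkeeping sketch, the trial counts implied by this shortened block structure can be tallied as follows. The warm-up blocks are left out because their length is not stated here, and the variable names are illustrative.

```python
# Block structure of one shortened GNAT in Experiment 2, as listed in the text.
GNAT_BLOCKS = [
    ("target discrimination", 20),
    ("attribute discrimination", 20),
    ("compatible combined", 80),
    ("reversed attribute discrimination", 20),
    ("incompatible combined", 80),
]

trials_per_gnat = sum(n for _, n in GNAT_BLOCKS)
combined_per_gnat = sum(n for name, n in GNAT_BLOCKS if "combined" in name)
N_GNATS = 2  # one Flower GNAT and one Insect GNAT per participant

print(trials_per_gnat, combined_per_gnat, N_GNATS * trials_per_gnat)
```

Each GNAT thus comprises 220 trials, 160 of which stem from the combined blocks, giving 440 trials per participant after the warm-up phase.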
Task completion took about 15 minutes. At the end of the experiment, participants were fully debriefed and thanked.
Analysis of Error Rates
In both experimental groups, participants made significantly more errors in the incompatible blocks compared to the compatible blocks (pro-stereotype condition: t(27) = 7.42, p < .001, dz = 1.40; counter-stereotype condition: t(28) = 4.52, p < .001, dz = 0.84). The effectiveness of the scenario manipulation was apparent in the smaller GNAT effect size of the counter-stereotype condition compared to the pro-stereotype condition (as indicated by Cohen's dz). Mean error rates for both conditions are displayed in Table I.
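The reported effect sizes can be recovered from the test statistics. The sketch below assumes the usual definition of Cohen's dz for paired comparisons, dz = t / sqrt(n) with n = df + 1; the function name is illustrative.

```python
import math

def cohens_dz_from_t(t, df):
    """Cohen's dz for a paired t-test: dz = t / sqrt(n), with n = df + 1."""
    n = df + 1
    return t / math.sqrt(n)

# t values and degrees of freedom as reported for Experiment 2.
pro = cohens_dz_from_t(7.42, 27)      # pro-stereotype condition
counter = cohens_dz_from_t(4.52, 28)  # counter-stereotype condition
print(round(pro, 2), round(counter, 2))
```

Under this assumption, the computed values agree with the reported dz = 1.40 and dz = 0.84 to two decimals.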
When analyzing the GNAT data with the Trip Model, the model fit the data both in the counter-stereotype condition, G2(4) = 6.63, p = .16, and in the pro-stereotype condition, G2(4) = 9.12, p = .06. The Trip Model's parameter estimates for the two groups are displayed in Figure 5. As predicted, the AC parameters reflecting the flower–positive association (AC1) and the insect–negative association (AC2) were smaller in the counter-stereotype condition than in the pro-stereotype condition. This difference in the AC parameters was statistically significant, ΔG2(2) = 13.51, p = .001. Replicating Experiment 1, G parameter estimates were significantly larger than 0.50, ΔG2(2) = 121.32, p < .001, but did not differ between the two experimental conditions, ΔG2(1) = 1.57, p = .21. Thus, as expected, the scenario manipulation did not affect the G parameter. In contrast, there was a numerically small but statistically significant scenario effect on the D parameter, ΔG2(1) = 13.38, p < .001, indicating smaller discrimination ability in the counter-stereotype condition. This finding confirms our conjecture that cognitive capacity in the counter-stereotype group might be reduced. In contrast to participants in the pro-stereotype condition, participants in the counter-stereotype condition might be more distracted when performing the GNAT because they are forced to keep new associations in mind that contradict their intrinsic automatic associations.
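The p values in this paragraph can be checked by referring each G2 or ΔG2 statistic to a chi-square distribution with the stated degrees of freedom. This is a minimal sketch that assumes the reported p values are based on the asymptotic chi-square distribution; `chi2_sf` is a hypothetical helper written with the standard library only, using the closed-form expressions for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k < df/2} y^k / k!
        total = sum(y ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-y) * total
    # Odd df: erfc term plus a finite sum over half-integer gamma values
    m = (df - 1) // 2
    tail = sum(y ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, m + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail

# Statistics as reported for Experiment 2.
p_fit_counter = chi2_sf(6.63, 4)   # model fit, counter-stereotype condition
p_fit_pro = chi2_sf(9.12, 4)       # model fit, pro-stereotype condition
p_ac = chi2_sf(13.51, 2)           # AC difference between conditions
p_g = chi2_sf(1.57, 1)             # G difference between conditions
print(p_fit_counter, p_fit_pro, p_ac, p_g)
```

Rounded, these values correspond to the reported p = .16, p = .06, p = .001, and p = .21.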
The GNAT data were also analyzed with the generalized Quad Model of Gonsalkorale, von Hippel, et al. (2009). However, this model clearly failed to fit the data in both conditions (pro-stereotype condition: G2(8) = 58.71, p < .001; counter-stereotype condition: G2(8) = 43.56, p < .001). Therefore, parameter estimates of the Quad Model are not reported.
Experiment 2 successfully replicated the major findings of Experiment 1. The Trip Model fit the GNAT data in both experimental conditions whereas the generalized Quad Model did not fit the data in either condition. Furthermore, by using short scenarios influencing the accessibility of stereotypical associations, we successfully tested the validity of the Trip Model's AC parameter. After reading a counter-stereotypical scenario, the AC parameters of the Trip Model were significantly reduced compared to a pro-stereotypical scenario. Thus, the Trip Model's AC parameter indeed reflects cognitive associations.
The primary goal of Experiment 3 was to complete the series of validation studies by testing the construct validity of the Trip Model's G and D parameters. Because the G parameter is supposed to reflect response biases toward pressing the go key, we manipulated response bias by implementing GNATs with different base rates of go and no-go stimuli. We expected that G would exceed 0.50 and fall below 0.50 for high and low proportions of go-stimuli, respectively. Moreover, in order to assess the validity of the D parameter, which is supposed to reflect a controlled discrimination process, we implemented GNATs with different response deadlines. We predicted that shorter response deadlines impair discriminability and thus should result in smaller D parameter estimates. Because associations should not be affected by design features of the GNAT, such as base rates and response deadlines, we expected that the AC parameters would not differ between the different response bias and response deadline conditions.
Forty-two University of Mannheim psychology undergraduates participated in the experiment. One participant had to be excluded from the analysis because of an extremely high error rate (75%) in one GNAT block. The age of the remaining 41 participants ranged from 19 to 47 years (M = 23.54, SD = 5.98). Nine participants were male and 32 were female.
GNAT stimuli were the same as used in Experiments 1 and 2.
The ratio of go to no-go stimuli was manipulated between participants. Half of the participants processed 75% go and 25% no-go stimuli. This ratio was inverted for the other half of the participants. Each participant performed four GNATs, the first two with a response deadline of 700 milliseconds and the other two with a response deadline of 900 milliseconds, or vice versa. Thus, response deadline was a within-subject factor, with the order of deadlines (700 milliseconds vs. 900 milliseconds deadline first) counter-balanced across participants. Two of the four GNATs assessed flower–positive associations, whereas the other two assessed insect–negative associations. Flower and Insect GNATs were presented in an alternating order. In doing so, the sequence of the different GNATs (Flower GNAT first vs. Insect GNAT first) was also counter-balanced across participants. Participants were randomly assigned to the experimental groups.
Up to four persons simultaneously performed the tasks in experimental cubicles on standard PCs. Again, the first three blocks were warm-up blocks. The response deadline of these warm-up blocks matched that of the first actual GNAT.
Subsequently, participants completed four GNATs, that is, one GNAT for each combination of target concept (flower vs. insect) and response deadline (700 milliseconds vs. 900 milliseconds). Each GNAT consisted of five blocks: A target discrimination block (20 trials), an attribute discrimination block (20 trials), a compatible combined block (40 trials), a reversed attribute discrimination block (20 trials), and an incompatible combined block (40 trials). The order of the compatible and the incompatible blocks was randomized.
The settings for the single blocks and response trials were identical to Experiments 1 and 2, except for the response deadlines (700 milliseconds for one half of the GNATs and 900 milliseconds for the other half of the GNATs) and the proportions of go stimuli (75% vs. 25% for the two groups).
The experiment took about 15–20 minutes. After completion, participants were fully debriefed and thanked.
Results and Discussion
Analysis of Error Rates
The typical GNAT effect (more errors in the incompatible blocks than in the compatible blocks) was observed for participants that were exposed to 75% go stimuli, t(20) = 5.88, p < .001, dz = 1.28, and also for participants that were exposed to 25% go stimuli, t(19) = 7.41, p < .001, dz = 1.66. Mean error rates are summarized in Table I. Surprisingly, error rates were much higher in the 75% go stimuli group than in the 25% go stimuli group. Although this outcome was unexpected, it can be explained in terms of interference between responding and visual encoding (e.g., Danielmeier, Zysset, Müsseler, & von Cramon, 2004; Koch & Prinz, 2005; Müsseler & Wühr, 2002). For instance, Müsseler and Wühr (2002) used a dual-task paradigm including a simple go/no-go keypress task and a visual identification task. They found that identification performance was better in no-go trials than in go trials. Thus, given this finding, it is plausible that participants in the 75% go stimuli group were more strongly impaired in stimulus discrimination simply because they had to respond more frequently. If this explanation is correct, D parameters should turn out to be significantly lower in the 75% go stimuli condition compared to the 25% go stimuli condition. We will assess this prediction below.
When analyzing the GNAT data with the Trip Model, we obtained a good model fit for three of the four experimental conditions, G2(4) ≤ 4.60, p ≥ .33. No satisfactory fit was observed for the 75% go stimuli group under the 700 milliseconds response deadline condition, G2(4) = 13.91, p = .01. However, given the high statistical power of our goodness-of-fit tests, this slight discrepancy between model and data is presumably due to small and unsystematic model deviations. This interpretation is supported by the fact that (a) this is the only marginal misfit observed in eight model tests based on three experiments and (b) despite the marginal misfit, parameter estimates varied as predicted.
Within each of the two response bias groups, D parameter estimates were significantly higher in the 900 milliseconds than in the 700 milliseconds response deadline condition, ΔG2(2) = 113.73, p < .001. However, D parameter estimates not only varied between the two response deadline conditions but also between the two base rate conditions, ΔG2(2) = 122.09, p < .001. The observation that the D parameter was significantly reduced when participants were exposed to 75% go stimuli compared to only 25% go stimuli is consistent with the idea that responding interferes with stimulus discrimination. As hypothesized, G parameter estimates were significantly larger than 0.50 when the ratio of go stimuli was 75% and significantly lower than 0.50 when the ratio of go stimuli was 25%, ΔG2(4) = 1300.45, p < .001 (see Figure 6). Importantly, the two AC parameters measuring flower–positive associations (AC1) and insect–negative associations (AC2) did not differ significantly between the four experimental conditions, ΔG2(6) = 7.44, p = .28. This finding confirms the Trip Model's AC parameter as an uncontaminated, pure measure of automatic attitudes. More precisely, AC is neither affected by response base rates nor by response speed manipulations.
Finally, GNAT data were also analyzed with the generalized Quad Model of Gonsalkorale, von Hippel, et al. (2009). Replicating Experiment 2, the Quad Model again failed to fit the data in any of the four conditions, G2(8) ≥ 21.37, p ≤ .01.
Validity of the Trip Model
The results of our experiments provide strong evidence for the validity of the Trip Model. For seven of the eight GNAT data sets, goodness-of-fit tests indicated a good fit of the Trip Model. In contrast, the generalized Quad Model fit at most one of the eight GNAT data sets. This result clearly demonstrates that the Trip Model captures the processes underlying GNAT performance better than the generalized Quad Model. Although not the primary goal of the present paper, we also compared the Trip Model to the ABC Model (Stahl & Degner, 2007). As outlined above, the latter model was originally suggested for the EAST paradigm and not for the GNAT. In principle, however, it can also be adapted to the GNAT. Interestingly, despite some conceptual similarities between the Trip and the ABC Model, our analyses revealed that the GNAT adaptation of the ABC Model fit only one of the eight GNAT data sets. The Trip Model should, therefore, be preferred to both the Quad and the ABC Model when analyzing GNAT data. Table II provides a summary of the goodness-of-fit tests for the Trip Model, the generalized Quad Model, and the ABC Model.
Table II. Summary of goodness-of-fit test results for the eight different GNAT data sets. Columns: Trip Model fit, Quad Model fit, ABC Model fit.
Note: Significant G2-values indicate model misfit.
Apart from the Trip Model's superiority in terms of model fit, additional evidence for the validity of the Trip Model derives from the finding that its parameters are affected by experimental manipulations in meaningful ways. As demonstrated in Experiments 2 and 3, (1) both AC parameters were smaller after reading counter-stereotype scenarios compared to pro-stereotype scenarios, (2) the G parameter varied with the GNAT's base rate of go stimuli, exceeding 0.50 with 75% go stimuli and falling below 0.50 with 25% go stimuli, and (3) the D parameter was smaller when the GNAT's response deadline was set to 700 milliseconds compared to 900 milliseconds. These findings support three important conclusions regarding the Trip Model's parameters: First, the AC parameters provide valid, uncontaminated measures of implicit associations; second, the G parameter assesses response bias selectively; and third, the D parameter captures the cognitive capacity to detect the correct response.
Because the D parameter represents a controlled process that requires cognitive capacity, any manipulation that directly or indirectly affects cognitive resources should also affect D. Presumably, this is the reason why D was sensitive to (1) the response deadline manipulation, (2) the scenario manipulation, and (3) the base rate manipulation. The counter-stereotype scenario in Experiment 2 not only affected associations as measured by AC but also impaired cognitive capacity as indexed by D because participants experienced interference between the newly created associations and their conflicting automatic associations. Similarly, the base rate manipulation in Experiment 3 not only influenced response biases but also discrimination ability D. We explain this finding in terms of interference between action and visual encoding, a phenomenon that was previously found by Danielmeier et al. (2004), Koch and Prinz (2005), and Müsseler and Wühr (2002). Moreover, we would not be surprised to find significant correlations between the D parameter and individual differences in executive control or intelligence, as recently reported for the IAT (e.g., Klauer, Schmitz, Teige-Mocigemba, & Voss, 2010; von Stülpnagel & Steffens, 2010).
Generalizability of Results
The superior fit of the Trip Model to GNAT data compared to the generalized Quad Model implies that the GNAT and the IAT do not rely on exactly the same processes. As previously stated, we believe that this difference is due to two procedural features of the GNAT. First, unlike the IAT, the GNAT is not a two-choice task but a go/no-go task. Because go/no-go procedures usually provoke significant response biases, GNAT performance should also be strongly influenced by response biases. Second, in contrast to the IAT, the GNAT requires response deadlines. Typically, short response deadlines are chosen to avoid floor effects in the error rates (i.e., deadlines shorter than 1000 milliseconds as recommended by Nosek and Banaji (2001)). With deadlines shorter than 1000 milliseconds, it is very unlikely that people have sufficient time to overcome bias when performing a GNAT. Hence, this process is ignored in the Trip Model. Of course, overcoming bias could possibly play a role if substantially longer response deadlines are applied. We thus do not expect that the Trip Model holds for all conceivable GNAT procedures. Instead, we claim that the Trip Model is a valid model for the processes underlying a typical GNAT, that is, a GNAT with response deadlines shorter than 1000 milliseconds.
Although the length of response deadline is certainly the most evident variable that might influence the validity of the Trip Model, effects of other procedural variables are also possible. For example, when reanalyzing the data of Gonsalkorale, von Hippel, et al. (2009), we found good fit for the Trip Model only if two separate D parameters were used—one for stimuli of the category to which people had to respond throughout all GNAT blocks (D1) and one for stimuli of the remaining three categories (D2). One possible explanation for not finding such an effect in our own data could be that we used words as stimuli whereas Gonsalkorale, von Hippel, et al. (2009) used images. Foroni and Bel-Bahar (2010) argue that words and images usually differ in their level of stimulus representation and thus can produce divergent results. Effects of these and other procedural GNAT features on the validity of the Trip Model remain an interesting topic for subsequent research.
Robustness Against Interindividual Variability
One possible objection against our data analyses (and also against those of other authors who previously made use of MPT analyses of IAT and GNAT data) could address data aggregation across participants prior to submitting these data to the model fitting procedures. To test whether the observed estimates for the Trip Model are unbiased by data aggregation across participants, we first checked our data for parameter heterogeneity using the computer program HMMTree (Stahl & Klauer, 2007).² Details of this analysis can be obtained from the first author. As expected, parameter heterogeneity was present in each of the eight experimental conditions. Therefore, in a second step, we fitted latent-class hierarchical MPT models (Klauer, 2006) to our data that allow for variability between participants. In almost all cases, a rather good model fit was observed with either a two-class or a three-class solution. Most importantly, however, in all cases the basic pattern of the Trip Model's parameters describing participants within latent classes mirrored the effects found for the aggregate data. Hence, our results are robust against aggregation across participants. Parameter heterogeneity between participants was present but did not affect the pattern of results reported here. Moreover, inspection of the parameter estimates for different latent classes also revealed that the variability is mainly due to different ability (or motivation) levels of participants, as indicated by large differences in the D parameter between latent classes.
The present research emphasizes the utility of MPT models in studying the processes underlying implicit attitude measures. Multinomial models combine two important advantages: They are capable of disentangling and measuring the contributions of different cognitive processes, and they allow for the assessment of model fit. Based on goodness-of-fit tests, we were able to show that a typical IAT and a typical GNAT differ in their underlying cognitive processes.
As demonstrated by previous research, IAT performance is influenced by an overcoming bias process. However, as indicated by the results of our three experiments, overcoming bias can safely be ignored for typical GNAT tasks. Based on this finding, we recommend using a typical GNAT procedure rather than an IAT when research interests focus on implicit attitude assessments. Although overcoming bias is a theoretically interesting process for cognitive psychologists, applied researchers interested in measuring implicit attitudes will see it as a to-be-controlled confounding variable in the first place. One might argue that this problem can be solved by analyzing IAT data with the Quad Model, a model previously shown to be capable of separating and quantifying the influence of overcoming bias. However, in doing so, the strength of implicit attitudes is typically underestimated because IAT response latency information is ignored by the Quad Model. Therefore, we prefer the standard GNAT procedure combined with the Trip measurement model. Using this measurement tool has three important advantages: First, differences in overcoming bias cannot affect GNAT performance (at least if response deadlines are sufficiently short). Second, the Trip Model disentangles and quantifies the contribution of implicit attitudes, response biases, and discriminability in GNAT performance. Finally, in line with conventional GNAT measures, the Trip Model is based on error rates so that no important information is lost when analyzing GNAT data with the Trip Model.
To check the possibility that the generalized Quad Model's misfit to the data of the Attribute GNAT was influenced by the unusually large number of GNAT trials, we repeated the analysis separately for the first half of trials and for the second half of trials. However, the generalized Quad Model still failed to fit the data of the Attribute GNAT, G2(8) ≥ 20.49, p ≤ .01.
Parameter homogeneity across participants is implicitly assumed whenever MPT models are applied to data that have been aggregated across participants. Violations of this assumption may cause incorrect model rejections, biased parameter estimates, and biased confidence intervals; the assumption therefore needs to be tested.
As specified by Gonsalkorale, von Hippel, et al. (2009), the following parameters were estimated: one AC parameter representing the flower–positive association (AC1), one AC parameter representing the insect–negative association (AC2), one G parameter, and one OB parameter for target stimuli only. Furthermore, again following Gonsalkorale, von Hippel, et al. (2009), we used two D parameters for the GNAT analyses, one for stimuli of the constant go category (D1), and one for the stimuli of the remaining three categories (D2). Thus, six parameters (AC1, AC2, G, OB, D1, and D2) were estimated.
In contrast to Gonsalkorale, von Hippel, et al. (2009), who did not collapse data across associated categories (e.g., flower and positive), we adhered to the aggregation level originally suggested by Conrey et al. (2005). That is, data were aggregated across the categories flower and positive as well as insect and negative whenever the model equations did not differ for these categories. Aggregating independent data generated by the same process probabilities is an effective means of reducing sampling error. We thus preferred the aggregation procedure of Conrey et al. (2005) because it is statistically more efficient than the procedure of Gonsalkorale, von Hippel, et al. (2009). Furthermore, the Quad Model performed better when the aggregation procedure of Conrey et al. (2005) was applied than when the procedure of Gonsalkorale, von Hippel, et al. (2009) was applied.
Based on these model specifications, the model for the GNAT consisted of 14 model trees. For the Target GNATs there were six model trees for the compatible blocks: Three model trees for the Flower GNAT (one tree for flowers as the constant go category, one for positive words as the temporary go category, and one for the aggregated data of the no-go categories insects and negative words) and three model trees for the Insect GNAT (one tree for insects as constant go category, one for negative words as temporary go category, and one for the aggregated data of flowers and positive words as no-go categories). The same logic applies to the Attribute GNATs. Because the OB parameter was estimated for targets (flowers and insects) but not for attributes (positive and negative words), we could not aggregate data across the associated stimulus categories for incompatible GNAT blocks. Thus, the Quad Model consisted of eight trees for the incompatible blocks, resulting from the possible combinations of four different stimulus categories (flower, insect, positive, negative) and the required responses (go vs. no-go). Because each model tree involves two response categories only (i.e., correct response vs. incorrect response), Quad Model tests for GNAT data were based on 14 (independent data categories)−6 (estimated parameters) = 8 degrees of freedom.
For Quad Model analyses of IAT data only one D parameter was estimated so that in the compatible blocks data could always be aggregated for associated categories. Therefore, the model for the IAT consisted of only 12 model trees, that is, four trees for the compatible blocks (flower–positive vs. insect-negative crossed with required response left vs. right) and eight trees for the incompatible blocks (flower vs. positive vs. insect vs. negative crossed with required response left vs. right). Thus, the Quad Model test for IAT data was based on 12 (independent data categories)−5 (estimated parameters) = 7 degrees of freedom.
Only four parameters were estimated for the Trip Model (i.e., AC1, AC2, G, and D). Due to the lack of an OB parameter, data could be aggregated across the associated categories (flower and positive; insect and negative) not only in the compatible but also in the incompatible blocks. Hence, the model consisted of eight model trees (compatible block vs. incompatible block crossed with flower−positive vs. insect−negative crossed with go vs. no-go instructions) with two response categories (correct response vs. false response) each. Thus, Trip Model tests were based on 8 (independent data categories)−4 (estimated parameters) = 4 degrees of freedom.
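The degrees of freedom stated for the three model tests follow from the same rule: each model tree with two response categories contributes one independent data category, and the number of estimated parameters is subtracted. A brief sketch of this arithmetic, with illustrative function and variable names:

```python
def mpt_df(n_trees, n_params, categories_per_tree=2):
    """Degrees of freedom: independent data categories minus free parameters."""
    independent = n_trees * (categories_per_tree - 1)
    return independent - n_params

# (number of model trees, number of estimated parameters) as given in the text.
MODELS = {
    "Quad Model, GNAT": (14, 6),
    "Quad Model, IAT": (12, 5),
    "Trip Model, GNAT": (8, 4),
}
dfs = {name: mpt_df(trees, params) for name, (trees, params) in MODELS.items()}
print(dfs)
```

This reproduces the 8, 7, and 4 degrees of freedom on which the respective G2 tests are based.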