A Phase Transition Model for the Speed-Accuracy Trade-Off in Response Time Experiments

Authors


Correspondence should be sent to Gilles Dutilh, Department of Psychology, University of Amsterdam, Roetersstraat 15, 1018 WB Amsterdam, the Netherlands. E-mail: gilles.dutilh@gmail.com

Abstract

Most models of response time (RT) in elementary cognitive tasks implicitly assume that the speed-accuracy trade-off is continuous: When payoffs or instructions gradually increase the level of speed stress, people are assumed to gradually sacrifice response accuracy in exchange for gradual increases in response speed. This trade-off presumably operates over the entire range from accurate but slow responding to fast but chance-level responding (i.e., guessing). In this article, we challenge the assumption of continuity and propose a phase transition model for RTs and accuracy. Analogous to the fast guess model (Ollman, 1966), our model postulates two modes of processing: a guess mode and a stimulus-controlled mode. From catastrophe theory, we derive two important predictions that allow us to test our model against the fast guess model and against the popular class of sequential sampling models. The first prediction—hysteresis in the transitions between guessing and stimulus-controlled behavior—was confirmed in an experiment that gradually changed the reward for speed versus accuracy. The second prediction—bimodal RT distributions—was confirmed in an experiment that required participants to respond in a way that is intermediate between guessing and accurate responding.

1. Introduction

One of the key phenomena in response time (RT) research is the speed-accuracy trade-off, by which a decision maker can speed up at the expense of accuracy and become more accurate at the expense of speed (Bogacz, Wagenmakers, Forstmann, & Nieuwenhuis, 2010; Schouten & Bekker, 1967; Wickelgren, 1977). The interdependence of RT and accuracy implies that people can be accurate and slow in one situation, yet fast and inaccurate in another, although their efficiency in information processing does not change. The speed-accuracy trade-off therefore frustrates a straightforward interpretation of RT in terms of cognitive processing time and forces researchers to consider RT and accuracy jointly.

Most models that account for the speed-accuracy trade-off, including most sequential sampling models, implicitly assume that the speed-accuracy trade-off is a continuous function. This assumption implies that a participant who is responding accurately on a certain task can gradually increase speed at the cost of gradual decreases in accuracy, until speed reaches ceiling and accuracy is at chance level (i.e., fast guessing). Here, we challenge this assumption and hypothesize that with increasing pressure to respond quickly, relatively accurate behavior suddenly collapses into guessing behavior, without going through all the intermediate stages between accurate responding and guessing.

To account for this discontinuous shift in performance, we introduce a phase transition model for the speed-accuracy trade-off. The model postulates that guessing and stimulus-controlled responding are irreconcilable modes of processing. This means that when experimental settings continuously change and force people to switch from one mode of processing to the other, this switch will be abrupt. When participants are, for example, forced to speed up over trials (and become less careful), at first they will be able to persist in fairly accurate responding. However, with a gradual increase in speed stress, performance will at some point break down completely and participants abruptly resort to fast guessing. Our model predicts a similar abrupt switch when the experimental conditions gradually encourage participants to stop guessing and be more careful (and respond more slowly).

Our phase transition model finds its roots in Ollman's fast guess model (Ollman, 1966). However, our model offers a more dynamic account of the speed-accuracy trade-off and allows for a connection to sequential sampling models of RT such as Ratcliff's diffusion model (Ratcliff, 1978). The phase transition model has the form of a cusp model from catastrophe theory. Catastrophe theory is a mathematical theory that applies to dynamic systems in which continuous changes of environmental variables lead to sudden changes in observed behavior (e.g., Zeeman, 1976). From this model, we derive two signature predictions of the phase transition model: hysteresis and bimodality. We test these two predictions in two experiments.

The outline of this article is as follows: In the first section, we discuss sequential sampling models and varied state models of the speed-accuracy trade-off. In the second section, we introduce the phase transition model. In the third section, we explain the two experiments that test the predictions of our model. Next, the experimental data are described and discussed by means of quantitative models (i.e., a hidden Markov model and our cusp model).

2. The speed-accuracy trade-off

In many speeded choice tasks, participants are instructed to respond “as fast and accurately as possible.” These instructions leave it to the participant to assess the relative importance of speed versus accuracy. This implies that both RTs and proportion of errors depend largely on the participant's judgment. Therefore, the separate analysis of either mean RT or accuracy can be deceiving. Only through an understanding of how RT and accuracy trade off can observed behavior be translated into conclusions in terms of psychologically interesting constructs.

The exact nature of the trade-off between speed and accuracy has been studied for almost a century (Henmon, 1911). Over the years, many studies have been devoted to the speed-accuracy trade-off (from now on referred to as “SAT”). One approach to studying the SAT is to explore experimentally the entire range from chance performance (i.e., guessing) to asymptotic accuracy (e.g., Pachella & Pew, 1968; Swensson & Center, 1968; Wickelgren, 1977; Yellott, 1971). In most of these studies, the trade-off is assumed to be under experimental control through the use of response deadlines, response signals, differential pay-off, and various other methods (e.g., Meyer, Irwin, Osman, & Kounios, 1988; Pachella & Pew, 1968; Schouten & Bekker, 1967; Verhelst, Verstralen, & Jansen, 1997). The objective of these studies was to formulate a function that describes how participants move along the hypothetical SAT curve.

The behavior at the extremes of the hypothetical SAT curve is uncontroversial: When rewards only emphasize accuracy, participants respond accurately but slowly; when rewards only emphasize speed, participants respond fast, but at chance accuracy. The controversy, however, is about how participants can shift from highly accurate to very fast performance. Fig. 1 summarizes several conflicting points of view. Lines A1 and A2 in the left panel represent continuous accounts of this shift. Function A1 represents the predictions of the SAT by most RT models, including sequential sampling models (Wickelgren, 1977, Fig. 1). This functional form is also found in some empirical studies (e.g., Dosher, 1979; Ratcliff, 2006). Another form of the SAT function that is reported in some empirical studies is represented by sigmoid function A2 (e.g., Schouten & Bekker, 1967).

Figure 1.

 The speed-accuracy trade-off. (A) Continuous trade-off of speed for accuracy, as predicted by most models of response time (RT) (A1), and as found in some empirical studies (A2). (B) Two discontinuous trade-off functions. B1 is the functional form predicted by the fast guess model. B2 is the functional form predicted by the phase transition model.

A very different account is given by models that assume that behavior originates from different states. In particular, the fast guess model (Ollman, 1970) predicts the stepwise (discontinuous) trade-off depicted by line B1. In this model, behavior originates from one of two states. The phase transition model we present in this article also predicts a discontinuous SAT and is represented by line B2.

3. Current models of the speed-accuracy trade-off

3.1. Sequential sampling models

Sequential sampling models form the dominant class of models to account for both RT distributions and accuracy. Its members include, for instance, Ratcliff's diffusion model (Ratcliff, 1978), the leaky competing accumulator model (Usher & McClelland, 2001), the linear ballistic accumulator model (Brown & Heathcote, 2005, 2008), and Poisson counter models (e.g., Smith & Van Zandt, 2000). Sequential sampling models generally produce a good fit to behavioral data, and they allow researchers to decompose effects on RT and accuracy into effects on underlying psychological constructs (e.g., Dutilh, Vandekerckhove, Tuerlinckx, & Wagenmakers, 2009; Ratcliff & Rouder, 1998). Motivated in part by the availability of easy-to-use fitting routines, sequential sampling models are applied increasingly often, both in experimental psychology (e.g., Wagenmakers, 2009) and in the neurosciences (e.g., Bogacz et al., 2010; Forstmann et al., 2008).

All sequential sampling models postulate a decision-making system that samples stimulus information over time. Often, but not always, this information is assumed to be noisy. The accumulated information reflects the evidence for each of the possible response options (see the meandering line in the left panel of Fig. 2). When the evidence for one response option reaches a preset response criterion (boundary A), the response is initiated. The setting of the response criterion in such a model governs the SAT. When the response criterion is high, the system requires strong evidence before a response is initiated, and this results in responses that are accurate, but slow. When the response criterion is low, the system requires only weak evidence to decide, and this results in responses that are fast, but inaccurate. The right panel of Fig. 2 shows how the distribution of RT changes when the response criteria (boundaries) are set so as to generate percentages correct of 95% (boundary A), 75% (boundary A′), and 55% (boundary A′′). Decreasing boundary separation leads not only to lower accuracy but also to faster responses and smaller spread of the RT distribution. Note that all intermediate values of speed and accuracy are accessible, that is, they can be achieved by an intermediate setting of response criteria. In this sense, sequential sampling models predict a continuous SAT.
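The continuity of this trade-off can be illustrated with the closed-form predictions of the simplest diffusion process (constant drift, unbiased starting point, no across-trial variability). The sketch below is only an illustration under those assumptions; the function names, the drift value, and the noise scale are hypothetical choices and not values taken from this article.

```python
import math


def diffusion_predictions(drift, boundary, noise=1.0):
    """Accuracy and mean decision time of a simple Wiener diffusion process.

    Assumes a starting point halfway between the two boundaries and no
    across-trial variability (simplifying assumptions made for this sketch).
    """
    k = drift * boundary / noise ** 2
    p_correct = 1.0 / (1.0 + math.exp(-k))
    mean_decision_time = boundary / (2.0 * drift) * math.tanh(k / 2.0)
    return p_correct, mean_decision_time


def boundary_for_accuracy(drift, p_correct, noise=1.0):
    """Boundary separation that yields the requested proportion correct."""
    return noise ** 2 * math.log(p_correct / (1.0 - p_correct)) / drift


if __name__ == "__main__":
    drift = 2.0  # hypothetical drift rate
    for target in (0.95, 0.75, 0.55):
        a = boundary_for_accuracy(drift, target)
        p, t = diffusion_predictions(drift, a)
        print(f"target={target:.2f}  boundary={a:.3f}  "
              f"accuracy={p:.3f}  mean decision time={t:.4f}")
```

Because the boundary is a continuous parameter, any accuracy between chance and ceiling has a matching boundary value, and mean decision time shrinks smoothly as the boundary is lowered.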

Figure 2.

 Illustration of a generic diffusion-style sequential sampling model. The lower left panel shows a possible trajectory of how sampled information is accumulated until a response boundary is reached. The response time (RT) distribution associated with this process is drawn above. The three hypothetical boundary settings (A, A′, and A′′) in the left panel are associated with three different RT distributions (right panel) and proportions of correct responses (given a positive drift).

It should be acknowledged that sequential sampling models assume only implicitly that the transition from very fast to highly accurate behavior is smooth and continuous. Although the stability of intermediate trade-offs seems a crucial assumption of sequential sampling models, one could imagine a mechanism that governs discrete changes in boundary setting that would result in discrete changes in behavior. However, as we will discuss in the concluding remarks, we think that this is a rather unnatural interpretation of the boundary principle and that the resulting RT model would not be very plausible.

3.2. The fast guess model

A different account of the SAT is given by varied state models, which assume that behavior originates from various separate states. The most extensively studied varied state model is Ollman's simple fast guess model (Ollman, 1966, 1970). The simple fast guess model assumes that the behavior in choice RT tasks is governed by two distinct processes, namely a guess mode (GM) (sometimes called the pre-programmed mode) and a stimulus-controlled mode (SCM).

The GM corresponds to the way information is processed in simple detection tasks, that is, no discrimination between stimuli occurs. Responses in this mode are fast and accuracy is at chance level. In the SCM, discrimination between stimuli does occur and hence responses are slower and accuracy approaches 100%. Consequently, intermediate values of RT and accuracy can only be achieved by mixing responses from the two modes. On the basis of this fast guess model, Yellott (1971) proposed an easy procedure to correct mean RT estimates for fast guessing. Link (1982) discussed this correction for two-state models in general.
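The mixing logic of the fast guess model can be written down in a few lines. The sketch below assumes that each trial comes from exactly one of the two modes; the function name and the example RT values are hypothetical and serve only to show how intermediate accuracy arises from a mixture.

```python
def fast_guess_mixture(p_guess, rt_gm, rt_scm, acc_scm=1.0, acc_gm=0.5):
    """Expected accuracy and mean RT when a proportion p_guess of the
    trials are fast guesses and the remainder are stimulus controlled."""
    if not 0.0 <= p_guess <= 1.0:
        raise ValueError("p_guess must lie in [0, 1]")
    accuracy = p_guess * acc_gm + (1.0 - p_guess) * acc_scm
    mean_rt = p_guess * rt_gm + (1.0 - p_guess) * rt_scm
    return accuracy, mean_rt


if __name__ == "__main__":
    # Hypothetical mode means: 200 ms for guesses, 600 ms for accurate responses.
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        acc, rt = fast_guess_mixture(p, rt_gm=200.0, rt_scm=600.0)
        print(f"p_guess={p:.2f}  accuracy={acc:.3f}  mean RT={rt:.0f} ms")
```

With perfectly accurate stimulus-controlled responses, an overall accuracy of 75% corresponds to guessing on half of the trials.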

The predictions of the fast guess model were supported by the studies of Swensson (1972), Swensson and Edwards (1971), and Yellott (1971). In these studies, the pressure on speed relative to accuracy was changed in small steps over the entire domain of the SAT. The results showed that with stimuli that are difficult to discriminate, participants achieved intermediate values of accuracy by mixing fast guesses and stimulus-controlled responses. Luce (1986), Townsend and Ashby (1983), and Yantis, Meyer, and Smith (1991) review several tests of the simple fast guess model and conclude that when the experimental stimuli are easy to discriminate, the results were ambiguous, supporting either a continuous SAT or a discrete SAT.

From the perspective of the fast guess model, optimal performance involves choosing the response strategy (i.e., guessing or stimulus-controlled responding) that is most profitable for the current payoff settings and applying this strategy on every trial that features the same payoff settings. Hence, in the fast guess model, changing demands on speed versus accuracy yields a discrete and stepwise SAT, at least when people consistently apply one and the same strategy for a specific payoff setting.

The consistent response strategy described above is optimal according to the fast guess model. However, as suggested by the results of Swensson (1972), participants might use a suboptimal ‘‘probability matching’’ strategy and meet increasing demands for speed by increasing the proportion of fast guesses. This mixing of response modes would yield a continuous SAT when trials are averaged.

4. The phase transition model

4.1. Introduction

The fast guess model predicts a very simple trade-off function. As argued above, when the relative payoff for speed versus accuracy is changed from emphasizing only accuracy to emphasizing only speed, the model predicts that optimal behavior requires a stepwise speed-accuracy trade-off (line B1 in Fig. 1). A more advanced model of the dynamics can be formulated by use of the concept of phase transitions developed in research on nonlinear dynamic systems.

Phase transitions occur in all kinds of systems, ranging from those that are physical, chemical, and biological (e.g., Poston & Stewart, 1978b) to those that are social and psychological (e.g., Jansen & Van der Maas, 2001; Latané & Nowak, 1994; Schöner, Haken, & Kelso, 1986; Stewart & Peregoy, 1983; Zeeman, 1976). These phase transitions have been studied from various theoretical perspectives, including catastrophe theory, synergetics, and nonequilibrium thermodynamics. These perspectives are mathematically similar. Here, we focus on the catastrophe perspective because it is especially useful in cases where it is not feasible to derive an exact mathematical model of the transition process under investigation (Wagenmakers, Van der Maas, & Molenaar, 2005).

4.2. Catastrophe theory

We limit our review of catastrophe theory to a small number of concepts that are required for the present purposes (for more details, see Arnold, Afrajmovich, Ilyashenko, & Shilnikov, 1999; Castrigiano & Hayes, 1993; Gilmore, 1993; Poston & Stewart, 1978a; Thom, 1975). Catastrophe theory, a branch of bifurcation theory, is a mathematical theory about dynamic systems that are governed by the gradient of a potential function. Such systems optimize some quantity, like energy or profit. Consequently, the behavior in these systems attains those values, called equilibrium states, that lead to a zero gradient (defined as the first derivative of the potential function). Catastrophe theory describes and classifies changes in equilibrium behavior. These changes come about when smooth changes in the system's parameters lead to the sudden appearance or disappearance of stable states. The simplest catastrophe, in which such discontinuities occur, is the cusp catastrophe.
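For reference, the cusp catastrophe is commonly written in the canonical form below. The symbols V and z and the scaling of the coefficients follow one common textbook convention and are not taken from this article; α and β denote the two control variables.

```latex
V(z;\alpha,\beta) = \tfrac{1}{4}z^{4} - \tfrac{1}{2}\beta z^{2} - \alpha z,
\qquad
\frac{\partial V}{\partial z} = z^{3} - \beta z - \alpha = 0 .
```

Under this convention, the equilibrium equation has three real roots (two stable, one unstable) when 27α² < 4β³ and a single stable root otherwise.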

To get insight into the dynamics of the cusp catastrophe, Fig. 3 plots the equilibria of the cusp catastrophe. To illustrate the function of the axes in Fig. 3, we use the famous example of the phase transition between the liquid and solid states of water (Poston & Stewart, 1978b). For relatively high values of pressure (β), such as the pressure at sea level, smooth changes in temperature (α) lead to sudden jumps between the solid and liquid phase of water (Z). In cusp terminology, β is the splitting axis, and α is the normal axis. Z represents a function of the behavioral variables, in this example, the state of the water.

Figure 3.

 The equilibrium of the cusp catastrophe describes behavior as a function of control variables α and β. Jumps in behavior take place when the setting of control variables leaves the area within the bifurcation lines, in which the system exhibits bimodality. The ball in the valleys represents the state that the system adopts when possible states appear and disappear as a function of α.

The cusp has eight distinguishing properties, which are known as catastrophe flags. These catastrophe flags, formally derived by Gilmore (1981), can be used to test the model empirically and to detect phase transitions (Van der Maas & Molenaar, 1992). The most important flags are illustrated in Fig. 3. Sudden jump is a sudden large change in behavior. In the water example, this describes the sudden temperature-induced change from water to ice and vice versa that takes place when pressure β is large. Bimodality means that two stable modes of behavior exist for a range of values of the independent or control variables. Inaccessibility means that the behavior in between these stable modes is unstable and repelling; no stable state in between water and ice exists. However, when pressure β is low, such an intermediate, syrupy state exists. Hysteresis is a delay in the sudden jump when control variables are changed up and down. In terms of the freezing water example, hysteresis refers to the fact that, when the pressure is high and in perturbation-free conditions, water freezes at −4°C and ice melts at 0°C. Divergence refers to a strong dependency on the initial conditions with respect to the mode of behavior that will be selected. Anomalous variance refers to a strong increase in variability in behavior near the sudden jump. The last two flags specify the effect of external perturbations of the system: Critical slowing down refers to delayed recovery of equilibrium behavior, and divergence of linear response refers to large oscillations induced by perturbations. A more elaborate explanation of catastrophe theory and the catastrophe flags can be found in, for example, Gilmore (1993) and Ploeger, Van der Maas, and Hartelman (2002).
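Sudden jumps and hysteresis can be reproduced numerically with a small simulation. The sketch below assumes the canonical cusp potential V(z) = z⁴/4 − βz²/2 − αz and lets the state slide downhill to the nearest minimum while α is swept slowly up and then down. The value of β, the sweep grid, the step size, and all function names are hypothetical choices made for this illustration.

```python
def settle(z, alpha, beta, step=0.01, iterations=5000):
    """Follow the gradient of V(z) = z**4/4 - beta*z**2/2 - alpha*z
    downhill until the state rests in a nearby local minimum."""
    for _ in range(iterations):
        z -= step * (z ** 3 - beta * z - alpha)
    return z


def sweep(alphas, beta, z_start):
    """Track the equilibrium while alpha changes in small steps; every
    step starts from the state reached at the previous step."""
    states = []
    z = z_start
    for alpha in alphas:
        z = settle(z, alpha, beta)
        states.append(z)
    return states


def jump_point(alphas, states):
    """Return the first alpha at which the state changes sign."""
    for alpha, previous, current in zip(alphas[1:], states, states[1:]):
        if previous * current < 0:
            return alpha
    return None


if __name__ == "__main__":
    beta = 3.0  # hypothetical value of the splitting variable
    alphas_up = [i / 10 for i in range(-40, 41)]
    alphas_down = alphas_up[::-1]
    up = sweep(alphas_up, beta, z_start=-2.0)
    down = sweep(alphas_down, beta, z_start=2.0)
    print("jump when alpha increases:", jump_point(alphas_up, up))
    print("jump when alpha decreases:", jump_point(alphas_down, down))
```

For β = 3, the bifurcation lines lie at α = ±2, so the upward jump occurs just beyond α = 2 and the downward jump just beyond α = −2. The gap between the two jump points is the hysteresis effect.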

4.3. Critique of catastrophe theory

Catastrophe theory carried with it the prospect that all kinds of systems could be formally described, as long as they exhibit phase transitions. This prospect caused a burst of applications in the 1970s, until Sussmann and Zahler (1978) articulated strong reservations about applications of catastrophe theory in the social and behavioral sciences.1 The main concerns that were raised are the following. First, in contrast to its formal and deterministic nature, catastrophe theory was often applied to qualitative concepts and stochastic variables. Second, the choice of control variables was often arbitrary and not well motivated. Today, catastrophe theory has regained importance in many fields of science (e.g., K. M. Newell, Liu, & Mayer-Kress, 2000; Tamaki, Torii, & Maeda, 2003; Wales, 2001). As in other modern applications of catastrophe theory, our application to the SAT is scientifically informative and falsifiable. Both our control variables, payoffs for RT and accuracy, and our dependent variables, RT and accuracy, are well defined and naturally observable quantities. Furthermore, we test both the quantitative and qualitative predictions of the cusp model.

4.4. The phase transition model of SAT

Now that we have discussed the general properties of the cusp catastrophe, we formulate a cusp model for the SAT. Below, we will first show how the SAT can be mapped onto the cusp catastrophe. Second, we will describe the dynamics implied by this mapping. Then, having formulated the SAT phase transition model, we will derive testable predictions.

4.4.1. Optimality

The use of potential functions in catastrophe theory is based on the assumption that the system under investigation optimizes (minimizes or maximizes) some quantity. In RT experiments, this assumption is uncontroversial as participants are asked to respond as quickly and accurately as possible, or to maximize profits. The experimental settings thus define the function that is optimized.

4.4.2. Definition of the behavioral variable

The main behavioral variables in RT experiments are RT and/or accuracy. As RT and accuracy strongly covary in the SAT, we can take RT, accuracy, or some function of both as the behavioral variable Z. For graphical representations of the behavior, we will use RT here; RT is a continuous measure and therefore has better measurement properties than accuracy. In the statistical analyses with hidden Markov models, we will fit both RT and accuracy simultaneously.

In accordance with the fast guess model, we hypothesize that the stable states of these behavioral variables are the GM—where responses are fast and accuracy is at chance level—and the SCM—where responses are slow, but accuracy is high. The availability of these modes or states will depend on the values of the control variables. Note that the phase transition model, however, does not describe the process by which responses are generated in both modes.

4.4.3. Choice of control variables

The control variables in a cusp catastrophe are the normal factor (α), which is associated with the hysteresis effect, and the splitting factor (β), which is associated with divergence. It seems obvious to relate the normal factor α to the traditional manipulations of the SAT (e.g., deadlines, instructions, and payoffs), because they force the participant to select one of the modes. Analogous to other catastrophe models in psychology (Latané & Nowak, 1994; Van der Maas & Molenaar, 1992; Zeeman, 1976), we propose to relate the splitting factor β to motivation or involvement. Involvement in RT tasks can be quantified, for instance, by the reward that can be earned in the experiment. In a cusp catastrophe, the size of the jumps and the magnitude of the hysteresis effect depend on the value of the splitting factor. Therefore, we hypothesize that strong discontinuities in behavior occur only when rewards are substantial. Below, we formulate this hypothesis more precisely in terms of payoffs.

4.4.4. Dynamics

Payoffs are factors that weigh the speed and accuracy of the response in the computation of a reward, denoted $R_t$, that a subject receives on every trial $t$. We distinguish two types of payoffs, the payoff for speed $P_{RT}$ and the payoff for accuracy $P_{Acc}$. When the payoff for speed is near zero and the payoff for accuracy is large, we expect that participants select the SCM. When $P_{RT}$ is large and $P_{Acc}$ is near zero, we expect that participants select the GM. Finally, when both factors are large and thus the involvement is high, the typical speed-accuracy conflict is expected to arise. In this conflict situation, we expect sudden mode switches. We can express these predictions in the cusp catastrophe, in which the normal and splitting axes are functions of the payoff factors. The payoff factor for speed induces the GM and the payoff factor for accuracy induces the SCM (see Fig. 4). A rotation of 45° of these factors leads to clear definitions of the normal and splitting axes of the cusp. They are simple functions of the difference and the sum of the payoff factors, respectively:

$$\alpha = a\,(P_{Acc} - P_{RT})$$
$$\beta = b\,(P_{Acc} + P_{RT})$$

where a and b are scaling parameters.
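As a rough illustration of this mapping, the sketch below converts a pair of payoff weights into the two control variables and checks whether the resulting point lies inside the bifurcation set of the canonical cusp. The sign of the difference, the values of the scaling parameters, and the function names are assumptions made for this example; the condition 27α² < 4β³ is the textbook bifurcation condition for the canonical cusp and is not stated in this article.

```python
def control_variables(p_rt, p_acc, a=1.0, b=0.25):
    """Map payoff weights onto the cusp control variables.

    Assumption: alpha is the scaled difference and beta the scaled sum of
    the payoffs. The default scales a and b are arbitrary illustration values.
    """
    alpha = a * (p_acc - p_rt)
    beta = b * (p_acc + p_rt)
    return alpha, beta


def is_bimodal(alpha, beta):
    """True when (alpha, beta) lies inside the bifurcation set of the
    canonical cusp, where two stable modes coexist."""
    return 27.0 * alpha ** 2 < 4.0 * beta ** 3


if __name__ == "__main__":
    for p_rt, p_acc in [(0, 0), (12, 12), (0, 24), (24, 0)]:
        alpha, beta = control_variables(p_rt, p_acc)
        print(f"P_RT={p_rt:2d}  P_Acc={p_acc:2d}  alpha={alpha:6.1f}  "
              f"beta={beta:4.1f}  bimodal={is_bimodal(alpha, beta)}")
```

With these illustrative scales, equal and high payoffs fall inside the bimodal region, whereas a payoff scheme that rewards only speed or only accuracy falls outside it.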

Figure 4.

 Graphical representation of the phase transition model. At high values of $P_{RT} + P_{Acc}$, the system consists of two irreconcilable modes, the stimulus-controlled mode and the guess mode. At intermediate values of $P_{RT} - P_{Acc}$, both states are possible. Which state is adopted at a certain moment depends on whether the system was in either the guessing state or the stimulus-controlled state before that moment.

Given these choices of the behavioral and control variables, we can derive clear predictions about behaviors corresponding to the catastrophe flags. When involvement $P_{Acc} + P_{RT}$ is high, sudden jumps between the modes should occur when payoff factors are varied. Bimodality and inaccessibility are expected within the bifurcation lines. Hysteresis should occur when $P_{Acc} - P_{RT}$ varies at a high constant value of $P_{Acc} + P_{RT}$. Jumps to the SCM occur when $P_{Acc}$ is (much) higher than $P_{RT}$, whereas jumps to the GM are expected when $P_{RT}$ is (much) higher than $P_{Acc}$ (hysteresis). If both payoff factors are zero, behavior is in the vicinity of the neutral point (0,0,0) and neither mode is selected. Here participants may either guess slowly or not respond at all. This seems reasonable as they cannot win or lose anything (i.e., there is no involvement whatsoever). Divergence is expected when we increase involvement $P_{Acc} + P_{RT}$ but hold $P_{Acc} - P_{RT}$ at zero. The participant may then choose the SCM or the GM, but the participant cannot maintain an intermediate position (inaccessibility). Anomalous variance implies that the variance of Z (i.e., RT and accuracy) in each of the modes increases strongly near the jump. If the subject is perturbed (by incorrect feedback, for instance), behavior should show large oscillations (divergence of linear response), which take a long time to fade away (critical slowing down).

4.5. Empirical predictions

The phase transition model we propose here can be considered a generalization of the fast guess model. As in the fast guess model, the phase transition model assumes two stable states, a guessing state and an accurate state. However, in the fast guess model, the switches between these states take place at the same point in either direction, that is, when $P_{Acc} = P_{RT}$. The phase transition model, on the other hand, predicts that when the stakes $P_{Acc} + P_{RT}$ are high enough, hysteresis occurs in the switching between the two states, that is, the switches between these states do not take place at the same point in either direction. In catastrophe theory, the size of the hysteresis effect depends on noise in or perturbations of the system (Gilmore, 1981). Without noise, the hysteresis effect will be strong (the so-called delay convention); with noise, the hysteresis effect diminishes until switches between modes take place at the same point (the Maxwell convention), as in the fast guess model.
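The two conventions can be contrasted numerically. Under the delay convention, the system stays in its current mode until that mode ceases to exist; under the Maxwell convention, it always occupies the deepest minimum. The sketch below assumes the canonical cusp potential V(z) = z⁴/4 − βz²/2 − αz, for which the modes vanish at α = ±√(4β³/27) and are equally deep at α = 0. The function names are hypothetical.

```python
import math


def switch_points(beta, convention="delay"):
    """Return (alpha_up, alpha_down): the value of alpha at which the mode
    switches when alpha increases and when alpha decreases."""
    if beta <= 0:
        raise ValueError("two modes coexist only for beta > 0")
    if convention == "delay":
        # The current mode is abandoned only when it disappears.
        edge = math.sqrt(4.0 * beta ** 3 / 27.0)
        return edge, -edge
    if convention == "maxwell":
        # The deepest minimum is always chosen; both minima tie at alpha = 0.
        return 0.0, 0.0
    raise ValueError("convention must be 'delay' or 'maxwell'")


def hysteresis_width(beta, convention="delay"):
    """Distance between the upward and the downward switch point."""
    alpha_up, alpha_down = switch_points(beta, convention)
    return alpha_up - alpha_down


if __name__ == "__main__":
    for beta in (1.0, 3.0, 6.0):
        print(f"beta={beta:.1f}  delay width={hysteresis_width(beta):.3f}  "
              f"Maxwell width={hysteresis_width(beta, 'maxwell'):.3f}")
```

The width under the delay convention grows with β, which mirrors the claim that hysteresis should be visible only when the stakes are high; the zero width under the Maxwell convention corresponds to the single switch point of the fast guess model.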

This hysteresis prediction implies that when the subject has to speed up, accurate responding might persist well beyond the point where the fast guess model, with its stepwise trade-off, predicts a collapse of accurate responding. A similar phenomenon is familiar to many chess players who play a game with less and less time on the chess clock. Although this is difficult to test, performance in many sports and activities seems to be relatively unaffected by time pressure, albeit within a certain range.2

Standard manipulations of the SAT are not able to reveal the presence of hysteresis. Therefore, to discriminate between the phase transition model and the fast guess model, we designed an experiment where changing payoffs push a participant from accurate behavior (SCM) to guessing (GM) and vice versa, thereby switching in either direction between the two hypothesized states. A possible limitation of this procedure is that participants may perceive changes in payoffs with a delay, potentially resulting in an artificial hysteresis effect. To exclude this possibility, we included an extra experiment (1c), based on a “modified method of limits” (Hock, Kelso, & Schöner, 1993).

The difference between the phase transition model and most sequential sampling models is arguably more structural. Whereas sequential sampling models assume that all responses originate from a single unitary process, the phase transition model predicts that two distinct modes of processing underlie the behavior. When the two modes of processing that underlie the behavior are clearly separated, bimodality of the behavioral variables is expected. Standard manipulations of the SAT (“respond as fast and accurately as possible”) cannot reveal this bimodality, as participants usually respond at above 90% correct, where the phase transition model does not predict bimodality. Bimodality is expected at intermediate values of accuracy (e.g., 75% correct). Therefore, to test the phase transition model against pure sequential sampling models, we designed an experiment in which participants were pressed to respond at 75% correct, as fast as possible.

Below, we describe two experiments designed to detect hysteresis and bimodality. We first present the hysteresis experiment and a follow-up extension that corroborates the initial results. Second, we present the bimodality experiment.

5. Experiment 1: Hysteresis

The hysteresis experiment was conducted in two stages. In Experiment 1a, three participants (A–C) were tested. Experiment 1b was conducted as an autonomous replication of Experiment 1a. In this replication, eight more participants (D–K) were tested with a slightly improved experimental design. The modified method of limits experiment is discussed in the next section.

5.1. Method Experiment 1a

In the hysteresis experiments, we aimed to force the participants to perform over the entire range of the SAT. Therefore, we applied a reward function that incorporates both speed and accuracy, enabling us to change the emphasis on speed versus accuracy in small steps. By smoothly changing the reward for speed versus accuracy back and forth from a situation with complete emphasis on speed to a situation with complete emphasis on accuracy, it is possible to reveal hysteresis in the transitions between guessing and stimulus-controlled responding.

5.1.1. Participants

Three students from the University of Amsterdam participated for course credit. Students were native speakers of Dutch and did not participate in any of the other experiments.

5.1.2. Materials and procedure

We used a lexical decision task, in which participants were required to discriminate word from non-word stimuli by pressing either the “z” button (for “word”) or the “/” button (for “non-word”). We chose a lexical decision task because RTs from accurate responses were expected to be easy to distinguish from fast guesses. The stimuli were sampled randomly without replacement from a list of 120 words and 120 non-words. When all stimuli had been used, sampling started again from the whole list. Macromedia's Authorware for Macintosh was used both to present the stimuli (black on white) and to register the response (at a resolution of 60 Hz, i.e., a precision of about 16.7 ms). The response-stimulus interval varied randomly between 1000 and 3000 ms to prevent anticipatory responses. To discourage such responses further, responses faster than 80 ms resulted in a “too early” warning message.
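The sampling scheme described here can be sketched as follows. This is not the Authorware implementation used in the experiment; the function names are hypothetical, and the uniform distribution of the response-stimulus interval is an assumption, since the text states only the range.

```python
import random


def stimulus_stream(words, nonwords, rng):
    """Yield (stimulus, is_word) pairs, sampling without replacement and
    starting over from the whole list once every stimulus has been used."""
    pool = [(w, True) for w in words] + [(n, False) for n in nonwords]
    while True:
        order = pool[:]
        rng.shuffle(order)
        for item in order:
            yield item


def response_stimulus_interval(rng, low_ms=1000, high_ms=3000):
    """Draw a random interval in milliseconds (uniform by assumption)."""
    return rng.uniform(low_ms, high_ms)


def is_too_early(rt_ms, threshold_ms=80):
    """Responses faster than the threshold trigger the warning message."""
    return rt_ms < threshold_ms


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed only to make the demo repeatable
    words = [f"word{i}" for i in range(120)]        # placeholder items
    nonwords = [f"nonword{i}" for i in range(120)]  # placeholder items
    stream = stimulus_stream(words, nonwords, rng)
    for _ in range(3):
        stimulus, is_word = next(stream)
        print(stimulus, is_word, round(response_stimulus_interval(rng)))
```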

On every trial, the participant was rewarded according to

$$R_t = R_{RT,t} + R_{Acc,t}, \qquad (1)$$

where $R_t$ is the total reward earned on the current trial, and $R_{RT,t}$ and $R_{Acc,t}$ are the rewards for speed and accuracy, respectively. First, $R_{RT,t}$ is given by

$$R_{RT,t} = P_{RT,t} \times \frac{RT_{SCM} - RT_t}{RT_{SCM} - RT_{GM}}, \qquad (2)$$

where $P_{RT,t}$ ($P_{RT} \in [0, 24]$)³ is the payoff weight for speed at trial t, $RT_{SCM}$ is the mean RT in the SCM, $RT_{GM}$ is the mean RT in the GM, and $RT_t$ is the RT at trial t. Equation 2 ensures that responding at guessing speed ($RT_t = RT_{GM}$) yields an expected reward of $P_{RT,t}$.⁴ At the same time, responding slowly enough to yield perfect accuracy (i.e., $RT_t = RT_{SCM}$) yields no reward for speed at all ($R_{RT,t} = 0$).

Second, $R_{Acc,t}$ is given by

$$R_{Acc,t} = P_{Acc,t} \times Acc_t, \qquad (3)$$

where $P_{Acc,t}$ is the current payoff weight for accuracy and $Acc_t$ is −1 when the response on trial t is an error and +1 when the response is correct. This rule rewards accurate responding by the amount $P_{Acc,t}$.⁵ At the same time, guessing yields an expected reward of zero, because $Acc_t$ will be −1 on half of the trials and +1 on the other half.
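As a rough illustration, the payoff rule of Equations 1 to 3 can be sketched in Python. The sketch assumes that the speed reward falls linearly between the two anchor points described above (a reward equal to the speed weight at RTGM and zero at RTSCM). The function and argument names are ours and purely illustrative; they are not taken from the experimental software.

```python
def trial_reward(rt, correct, p_rt, p_acc, rt_gm, rt_scm):
    """Return (speed reward, accuracy reward, total reward) for one trial.

    Assumption: the speed reward is linear in RT, equal to p_rt at rt_gm
    and 0 at rt_scm. All names are hypothetical.
    """
    reward_speed = p_rt * (rt_scm - rt) / (rt_scm - rt_gm)
    acc = 1 if correct else -1          # Acc_t: +1 for correct, -1 for an error
    reward_acc = p_acc * acc
    return reward_speed, reward_acc, reward_speed + reward_acc


# Hypothetical trial: equal weights, a response at guessing speed that is wrong.
speed, accuracy, total = trial_reward(rt=250.0, correct=False, p_rt=12, p_acc=12,
                                      rt_gm=250.0, rt_scm=600.0)
print(speed, accuracy, total)
```

Under this linear form, responses slower than RTSCM earn a negative speed reward, which matches the negative (red) speed bar illustrated in Fig. 5.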

The rewards for speed and accuracy and the total reward were presented to the participant on a feedback screen (Fig. 5). The values of the rewards were represented by horizontal bars that were green and extended rightward when positive, and red and extended leftward when negative.

Figure 5.

 On the feedback screen, the reward for speed, accuracy, and the summed total were represented with three horizontal bars underneath the stimulus. These bars were green when positive (gray in figure, note the reward for accuracy) and red when negative (black in figure, note the negative reward for speed). The horizontal bar above the stimulus was displayed only in Experiments 1b and 1c. This bar contained an orange portion on the left side (gray in figure) that represented the current value of PRT, and a blue portion on the right side (black in figure) that represented the current value of PAcc.

The values for RTSCM and RTGM were determined by each participant's performance in two training blocks. In one block, PRT was set to zero (for estimating RTSCM). In the other block, PAcc was set to zero (for estimating RTGM).

To test for hysteresis, the sum PRT + PAcc was kept constant at the arbitrary value of 24, whereas the difference PAcc − PRT was varied step by step between −24 and +24. We used a simple adaptive algorithm to adjust the payoff factors. A session always started with PRT = 24 and PAcc = 0. At these payoff settings, the participant was supposed to engage in fast guessing, because only speed is rewarded. Once it was established that the participant indeed engaged in fast guessing (mean RT over the last five trials smaller than RTGM + 50 ms), we increased PAcc and decreased PRT in steps of 1 on each new trial. When the participant had given five consecutive correct responses, the direction of the steps reversed, that is, PRT was increased and PAcc was decreased, until the participant again met the aforementioned criterion for guessing. By following this simple algorithm, the direction of change of the payoff factors reversed as soon as the participant was stable, either in the GM or the SCM. The algorithm ensured that participants passed through the bifurcation set (i.e., the area where sudden jumps can take place) as often as possible, generating the essential data for a test of the hysteresis hypothesis.
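The adaptive rule described above can be sketched as a single update step. This is a minimal sketch under stated assumptions, not the original Authorware code: the function name, the `direction` flag, and the clamping of PAcc to the range 0 to 24 are ours, and the five-trial criteria are simply evaluated on the five most recent trials.

```python
def next_payoffs(p_acc, direction, recent_rts, recent_correct, rt_gm,
                 total=24, window=5, margin=50.0):
    """One step of the payoff staircase; returns (p_acc, p_rt, direction).

    direction is 0 while waiting for fast guessing at the start of a session,
    +1 while p_acc is rising and -1 while it is falling (hypothetical coding).
    """
    # Guessing criterion: mean RT of the last five trials below RT_GM + 50 ms.
    guessing = (len(recent_rts) >= window and
                sum(recent_rts[-window:]) / window < rt_gm + margin)
    # Accuracy criterion: five consecutive correct responses.
    accurate = (len(recent_correct) >= window and
                all(recent_correct[-window:]))
    if direction <= 0 and guessing:
        direction = 1           # stable in the guess mode: start rewarding accuracy
    elif direction > 0 and accurate:
        direction = -1          # stable in the accurate mode: start rewarding speed
    p_acc = min(total, max(0, p_acc + direction))   # clamping is our assumption
    return p_acc, total - p_acc, direction


# Session start: only speed is rewarded and the participant is guessing fast.
print(next_payoffs(0, 0, [240, 250, 260, 255, 245],
                   [True, False, True, False, True], rt_gm=250))
```

Because the two payoff weights always sum to 24, tracking PAcc alone is enough; PRT follows as the remainder.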

The lexical decision task was administered in blocks of 100 to 200 trials, depending on how many trials a participant took to engage in the required mode of responding. Participant A was tested on three occasions within 3 weeks and the other participants were tested once for an hour. The participants were trained on a slightly different task for about 20 min. In this training task, the payoff factors were fixed within blocks of trials. Each block ended when the total reward exceeded 200, which took between 10 and 25 trials. In each block, the payoff setting was different, so the participants learned to optimize their performance at each payoff setting. This training was very important, as naive participants tend to show suboptimal behavior when PAcc is small and PRT is large. Specifically, participants tend to display stimulus-controlled behavior, whereas, at these payoff settings, fast guessing is much more rewarding.

5.2. Results Experiment 1a

Below, we will refer to a set of trials in which the participants switched from guessing to accurate responding and vice versa as a series (in which PAcc first increases and then decreases⁶). The first series for each participant served as training on the main task, leaving 372, 436, and 299 trials for A, B, and C, in which they completed 12, 14, and 10 series, respectively.

5.2.1. Descriptive results

For each participant, mean RT was calculated at each value of PAcc, for both directions of change in PAcc separately, as shown in Fig. 6. As expected, we found that high values of PAcc provoked slow and accurate responding, whereas low values of PAcc provoked fast and inaccurate behavior. Note, however, that mean RTs at intermediate values of PAcc are higher when the participants are directed away from slow, accurate responding (i.e., decreasing PAcc) than when they are directed away from fast and inaccurate responding (i.e., increasing PAcc). In support of this finding, model selection based on the Bayesian Information Criterion (BIC) prefers a linear model that regresses RT on both PAcc and direction of change over a linear model that regresses RT on PAcc only (see BIC values in Fig. 6). BICs quantify the relative performance of the models by striking a balance between goodness-of-fit and parsimony (Schwarz, 1978). Along with the BICs, the Pr values show the accompanying Schwarz weights, that is, the posterior probabilities for both models (given equal prior probabilities for the models, Raftery, 1999; Wagenmakers & Farrell, 2004). The finding that RT depends on both PAcc and the direction of change is consistent with the hysteresis hypothesis. Due to the binary nature of the response variable, the results for accuracy (Fig. 7) are less clear. Our statistical analysis, reported later, is based on multivariate hidden Markov models that take mean RTs and proportion correct into account simultaneously. This analysis strongly supports the hysteresis hypothesis. However, we first describe Experiment 1b, which differs from Experiment 1a only by minor methodological improvements.

Figure 6.

 Experiment 1a: Mean response time (RT) increases with PAcc. When PAcc decreases (and participants are speeding up), participants are slower at intermediate values of PAcc than when PAcc increases (slowing down). The Bayesian Information Criterion (BIC) values allow for comparison between a linear model in which RT is regressed on both PAcc and direction of change (two-factor), and the model with only PAcc as predictor (one-factor). Lower BICs indicate the better model. In addition, the posterior probabilities (Schwarz weights) for both models are reported.

Figure 7.

 Experiment 1a: Mean accuracy increases with PAcc. With sparse data, visual inspection of the binary variable accuracy does not allow one to draw clear conclusions about the presence or absence of hysteresis.
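The one-factor versus two-factor comparison reported for mean RT can be illustrated with a small, self-contained sketch. The data below are made up (RT rises with PAcc and is 80 ms slower when PAcc is decreasing), the helper names are ours, and the BIC is the usual Gaussian form up to an additive constant that is shared by both models; none of this reproduces the authors' actual analysis code.

```python
import math

def ols_rss(rows, y):
    """Residual sum of squares of a least-squares fit (rows include the intercept column)."""
    k = len(rows[0])
    # Normal equations (X'X) b = X'y as an augmented matrix, solved by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] +
         [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    b = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(bi * xi for bi, xi in zip(b, r))) ** 2
               for r, yi in zip(rows, y))

def gaussian_bic(rss, n, n_coef):
    """BIC of a Gaussian linear model, up to a constant shared by all models."""
    k = n_coef + 1                      # regression coefficients plus error variance
    return n * math.log(rss / n) + k * math.log(n)

# Made-up data: 25 payoff levels in each direction of change.
p_acc = [i % 25 for i in range(50)]
decreasing = [i // 25 for i in range(50)]           # 0 = increasing, 1 = decreasing
rt = [300 + 10 * p + 80 * d + 5 * ((7 * i) % 5 - 2)
      for i, (p, d) in enumerate(zip(p_acc, decreasing))]

bic_one = gaussian_bic(ols_rss([[1, p] for p in p_acc], rt), len(rt), 2)
bic_two = gaussian_bic(ols_rss([[1, p, d] for p, d in zip(p_acc, decreasing)], rt),
                       len(rt), 3)
print(bic_one, bic_two)   # the lower BIC identifies the preferred model
```

With a direction effect this large, the extra parameter of the two-factor model is easily justified; with no direction effect, the penalty term would favor the one-factor model.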

5.3. Method Experiment 1b

Experiment 1b was conducted to replicate the results of Experiment 1a. We only describe the method of Experiment 1b insofar as it differs from Experiment 1a.

5.3.1. Participants

Eight first-year psychology students participated for course credit. None of these students participated in either Experiment 1a or Experiment 2.

5.3.2. Materials and procedure

We used two different tasks: one lexical decision task, as in Experiment 1a, and one perceptual task. In this perceptual task, participants were asked to judge, by pressing the appropriate button, whether a horizontal line crossing a vertical reference line extended more to the right or to the left. When the line extended more to the right, the distance to the right was about 1 mm larger than the distance to the left and vice versa. Participants were presented with either the lexical decision task (participants D–G) or the perceptual task (participants H–K).

In this experiment, we used Presentation® software (Version 9.90, http://www.neurobs.com) to present the stimulus and register the responses. Two response buttons attached to the parallel port were used to maximize timing accuracy. The response stimulus interval varied randomly between 1000 and 3500 ms, to prevent anticipatory responses. To discourage this behavior further, responses before stimulus onset resulted in a warning message.

For both tasks in Experiment 1b, the payoff structure was equivalent to the one used in Experiment 1a. The only difference was that in Experiment 1b RTSCM and RTGM were updated during the experiment. When it was established that a participant was performing accurately, evidenced by five consecutive correct responses, these five responses were used to update RTSCM. Likewise, when it was established that the participant was guessing, evidenced by five consecutive responses faster than RTGM + 50 ms, these five responses were used to update RTGM.

An improvement over Experiment 1a was that we added a bar in the top portion of the screen (permanently visible to the participant) that visually displayed the current values of PAcc and PRT. The blue portion of this bar represented the amount of pressure on accuracy and the orange portion the amount of pressure on speed (see Fig. 5). This bar ensured that the participant was aware of the current payoff setting at any time during the experiment.

Participants were tested in two sessions; the first one lasted 2 h and the second one lasted 1 h. Participants were allowed a short break every 20 min.

5.4. Results Experiment 1b

As in Experiment 1a, we discarded the first series for each participant. Participants D, E, F, and G (lexical decision task) contributed respectively 1021, 569, 1137, and 449 trials, in which they completed 39, 19, 32, and 10 series. Participants H, I, J, and K (visual task) contributed respectively 589, 920, 860, and 481 trials, in which they completed 16, 20, 26, and 13 series.

5.4.1. Descriptive results

Again, for each participant, mean RT was calculated at each value of PAcc, for both directions of change in PAcc separately, as shown in Fig. 8 (participants D–G) and Fig. 9 (participants H–K). Again, as expected, high values of PAcc invoke stimulus-controlled behavior, whereas low values of PAcc invoke guessing behavior. More notably, in both the lexical decision version and the perceptual version, we again found hysteresis effects on RT, indicated by higher mean RT at intermediate values of PAcc when decreasing PAcc than when increasing PAcc. Again, visual inspection of response accuracy (Figs. 10 and 11) does not allow one to draw strong conclusions, and this may be due to the aforementioned binary character of the response variable. However, as expected, accuracy is low when PAcc is low and high when PAcc is high.

Figure 8.

 Experiment 1b (lexical decision): Mean response time (RT) increases with PAcc. For participants F and G, responses are slower at intermediate values of PAcc when PAcc decreases (and participants are speeding up) than when PAcc increases (and participants are slowing down). The Bayesian Information Criterion (BIC) values allow for comparison between a linear model in which RT is regressed on both PAcc and direction of change (two-factor), and the model with only PAcc as predictor (one-factor). Lower BICs indicate the better model. In addition, the posterior probabilities (Schwarz weights) for both models are reported.

Figure 9.

 Experiment 1b (perceptual task): Mean response time (RT) increases with PAcc. When PAcc decreases (speeding up), participant I is slower at intermediate values of PAcc than when PAcc increases (slowing down). The Bayesian Information Criterion (BIC) values allow for comparison between a linear model in which RT is regressed on both PAcc and direction of change (two-factor), and the model with only PAcc as predictor (one-factor). Lower BICs indicate the better model. In addition, the posterior probabilities (Schwarz weights) for both models are reported.

Figure 10.

 Experiment 1b (lexical decision): Mean accuracy increases with PAcc. With sparse data, visual inspection of the binary variable accuracy does not allow one to draw clear conclusions about the presence or absence of hysteresis.

Figure 11.

 Experiment 1b (perceptual task): Mean accuracy increases with PAcc. With sparse data, visual inspection of the binary variable accuracy does not allow one to draw clear conclusions about the presence or absence of hysteresis.

In sum, visual inspection of changes in mean RT and consideration of the difference in model fit between the one-factor and two-factor model both suggest that hysteresis is present for about half of the participants: for the same payoff settings, these participants were faster coming out of the GM than they were coming out of the SCM.

The visual inspection of the descriptive results has two important drawbacks. First, averaging over series might have masked sudden shifts in behavior. Second, the univariate descriptives above do not reflect the multivariate character of RT data. To address these issues and provide a more formal test of the hysteresis hypothesis, we now turn to an analysis using hidden Markov models.

5.5. Hidden Markov analyses

The SAT phase transition model assumes that, as participants switch from one processing mode to the other, RT and accuracy undergo sudden jumps. This implies that RT and accuracy can be considered a multivariate time series that follows a two-state mixture distribution. Such data can be analyzed using hidden Markov models (HMMs, e.g., Böckenholt, 2005; Vermunt, Langeheine, & Böckenholt, 1999; Visser, Raijmakers, & Van der Maas, 2009; Wickens, 1982). HMMs allow one to learn about a number of latent states that cannot be observed directly. Additional parameters describe the transition dynamics of these unobserved states and their connection to the observed behavioral variables.

Thus, a hidden Markov model consists of two main parts: the measurement model and the transition dynamics (Visser et al., 2009). The measurement model defines latent states in terms of the observed variables. In our case, the measurement model defines the states (GM and SCM) in terms of RT and accuracy. The transition dynamics are defined by the transition probabilities, that is, the probabilities of switching from one state (in our case, the GM) to the other (the SCM) and vice versa.

5.5.1. HMM for the phase transition model

The phase transition model posits two modes of behavior that can be captured by a two-state hidden Markov model (see Fig. 12). The two states differ in mean and variance of RT and in proportion correct. In our analysis, one state (GM) has a mean accuracy of 0.5 and relatively short RTs, and the other state (SCM) has a high mean accuracy and longer and more variable RTs.

Figure 12.

 Graphical representation of a two-state hidden Markov model. Both states (circles) are defined by two observed measures (squares). The transition probabilities are represented by πGS (from GM to SCM) and πSG (from SCM to GM), respectively. These πGS and πSG are regressed on PAcc via a logit transformation, as is captured in the regression functions below. In the hysteresis model, a1 ≠ a2. The difference between intercepts a1 and a2 quantifies the hysteresis effect.

Furthermore, the phase transition model predicts that the probability to switch from GM to SCM (πGS in Fig. 12) increases when PAcc increases and that the probability to switch from SCM to GM (πSG in Fig. 12) increases when PAcc is lowered. We incorporate this prediction by regressing the transition probabilities on PAcc via a logit link.

Finally, the hysteresis hypothesis posits that the switching probabilities depend on the direction of change of PAcc. This asymmetry is expressed in the difference between the intercepts of the regression function that describes the probability to switch from GM to SCM and the intercepts of the regression function that describes the probability to switch from SCM to GM (see Fig. 13).

Figure 13.

 The logit link function is used to regress the probability of switching from SCM to GM (πSG) and from GM to SCM (πGS) on the payoff for accuracy PAcc (see Fig. 12). Note that we plot Pr(stay in SCM) here, which is equivalent to 1-Pr(switch from SCM to GM).
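To make the role of the two intercepts concrete, the following sketch builds a covariate-dependent transition matrix with a logit link. The parameterization is an assumption on our part (one intercept per current state, a shared slope, and the logistic function giving the probability of being in the SCM on the next trial); the exact coding used in the fitted models may differ, and the parameter values are hypothetical.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def transition_matrix(p_acc, a_from_gm, a_from_scm, slope):
    """Return [[GM->GM, GM->SCM], [SCM->GM, SCM->SCM]] at payoff level p_acc.

    Assumed form: P(next state is SCM) = logistic(intercept + slope * p_acc),
    with a separate intercept for each current state and a shared slope.
    Equal intercepts remove the dependence on the current state (no hysteresis).
    """
    to_scm_from_gm = logistic(a_from_gm + slope * p_acc)
    stay_in_scm = logistic(a_from_scm + slope * p_acc)
    return [[1.0 - to_scm_from_gm, to_scm_from_gm],
            [1.0 - stay_in_scm, stay_in_scm]]

# Hypothetical values: leaving the GM becomes likely near PAcc = 12,
# whereas leaving the SCM becomes likely only below PAcc = 8.
for p_acc in (6, 10, 14):
    print(p_acc, transition_matrix(p_acc, a_from_gm=-6.0, a_from_scm=-4.0, slope=0.5))
```

At intermediate payoff levels, such a matrix keeps the system in whichever state it currently occupies, which is the signature of hysteresis that the intercept difference is meant to capture.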

5.5.2. HMMs for competitor models

The model described above represents the phase transition model. We tested this model against three competitor models that contrast with various predictions of the phase transition model.

5.5.2.1. Model 1  The first competitor model assumes that there is only one (latent) state and that the behavior in this state is completely independent of the covariate PAcc. This model serves as a reference model and contrasts with the phase transition model's prediction of the existence of two qualitatively different modes of behavior.

5.5.2.2. Model 2  The second competitor model also comprises a single state, but here, the payoff weight PAcc is included as a linear predictor of both RT and accuracy. This model can be seen as a representation of the predictions of sequential sampling models in which there is a continuous trade-off between speed and accuracy. This continuous trade-off contrasts with the phase transition model's prediction of a discrete trade-off.

5.5.2.3. Model 3  The third competitor model comprises two states, and PAcc is modeled to affect the transition probabilities. This model represents the predictions of the fast guess model and thus comprises symmetric transition dynamics between the two states. This symmetry contrasts with the phase transition model's prediction of hysteresis, which implies asymmetric transition dynamics.

The three competitor models described above are from now on referred to as models 1, 2, and 3, and the phase transition model is referred to as model 4. To fit the models, we used the R package depmixS4 (Visser, 2007). This package allows one to fit HMMs to time series of multiple variables with different distributions (using maximum likelihood). In our case, these differently distributed variables are RT and accuracy. We chose to model RT by a lognormal distribution, so we could estimate a mean and variance of log(RT) in each state. Accuracy was modeled as binomial, so, for each state, the binomial parameter was to be estimated. Furthermore, the depmixS4 package allows us to put various constraints on the parameters of the models and to include a covariate on the transition probabilities. In the formalization of the phase transition model, we include PAcc as a covariate on the transition probabilities.
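For readers who want to see what the likelihood of such a model involves, the sketch below evaluates the log-likelihood of a two-state HMM with lognormal RT and Bernoulli accuracy using the scaled forward algorithm. It is a plain-Python illustration, not the depmixS4 implementation: the function names, the equal start probabilities, and all parameter values are hypothetical, and no parameter estimation is performed.

```python
import math

def emission(rt, correct, mu, sigma, p_correct):
    """Density of one trial in one state: lognormal RT times Bernoulli accuracy."""
    z = (math.log(rt) - mu) / sigma
    rt_density = math.exp(-0.5 * z * z) / (rt * sigma * math.sqrt(2.0 * math.pi))
    return rt_density * (p_correct if correct else 1.0 - p_correct)

def log_likelihood(rts, corrects, trans, states, start=(0.5, 0.5)):
    """Scaled forward algorithm for a two-state HMM.

    trans[t] is the 2x2 matrix used to move from trial t-1 to trial t
    (trans[0] is ignored), so it may vary with a covariate such as PAcc.
    states is a pair of (mu, sigma, p_correct) tuples.
    """
    loglik = 0.0
    alpha = list(start)
    for t, (rt, c) in enumerate(zip(rts, corrects)):
        if t > 0:                       # propagate state probabilities one trial ahead
            alpha = [sum(alpha[i] * trans[t][i][j] for i in range(2))
                     for j in range(2)]
        alpha = [alpha[j] * emission(rt, c, *states[j]) for j in range(2)]
        norm = sum(alpha)
        loglik += math.log(norm)        # accumulate the scaling constants
        alpha = [a / norm for a in alpha]
    return loglik

# Hypothetical state parameters: (mean of log RT, SD of log RT, P(correct)).
states = ((math.log(250.0), 0.2, 0.5),      # guess mode
          (math.log(550.0), 0.25, 0.9))     # stimulus-controlled mode
rts = [240.0, 260.0, 520.0, 600.0]
corrects = [False, True, True, True]
sticky = [[0.9, 0.1], [0.1, 0.9]]           # same matrix on every trial, for brevity
print(log_likelihood(rts, corrects, [sticky] * len(rts), states))
```

Maximum likelihood fitting would wrap this function in a numerical optimizer (or use the EM algorithm), with the entries of `trans[t]` computed from PAcc on each trial.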

5.5.3. Hidden Markov results Experiment 1a and 1b

Table 1 shows the BICs and accompanying Schwarz weights (column Pr) for the different models fitted to each participant's data. For nine of 11 participants, the best-fitting model is model 4, the model that represents our hypothesis of a two-state system exhibiting hysteresis. For eight of those, the odds are strongly in favor of the hysteresis model. For participant G, the odds only weakly favor the hysteresis model. For participants E and H, the model that represents the fast guess model fits best, although the odds are not convincing.

Table 1. 
For nine of the 11 participants, the model that postulates two states and hysteresis outperforms the competitor models, as evidenced by BIC values. Columns Pr show the associated posterior probabilities (assuming the models are equally likely a priori). 1s and 2s stand for one state and two states, respectively. Experiment code 1bL indicates the lexical decision version of 1b, 1bV indicates the visual version. Asterisks mark the lowest BIC value per participant
Experiment | Participant | Model 1 (1s): BIC | Pr | Model 2 (1s w/covariate): BIC | Pr | Model 3 (2s fast guess): BIC | Pr | Model 4 (2s hysteresis): BIC | Pr
1a | A | 970.12 | <.001 | 749.51 | <.001 | 490.15 | <.001 | 442.41* | >.999
1a | B | 995.28 | <.001 | 708.14 | <.001 | 389.33 | <.001 | 358.10* | >.999
1a | C | 692.19 | <.001 | 526.68 | <.001 | 415.66 | <.001 | 376.39* | >.999
1bL | D | 2,738.43 | <.001 | 1,929.94 | <.001 | 1,063.29 | <.001 | 982.65* | >.999
1bL | E | 1,263.54 | <.001 | 810.07 | <.001 | 701.81* | .578 | 702.44 | .422
1bL | F | 2,468.40 | <.001 | 1,507.48 | <.001 | 1,066.25 | .003 | 1,054.70* | .997
1bL | G | 972.91 | <.001 | 754.51 | <.001 | 656.46 | .465 | 656.18* | .535
1bV | H | 1,574.63 | <.001 | 970.32 | <.001 | 931.90* | .668 | 933.30 | .332
1bV | I | 2,468.01 | <.001 | 1,897.47 | <.001 | 1,692.40 | <.001 | 1,650.41* | >.999
1bV | J | 2,042.91 | <.001 | 1,314.84 | <.001 | 1,117.39 | <.001 | 1,093.78* | >.999
1bV | K | 1,269.36 | <.001 | 817.67 | <.001 | 760.30 | .052 | 754.50* | .948
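The Pr columns of Table 1 can be recomputed from the BIC values. The sketch below uses the standard transformation of BIC differences into Schwarz weights under equal prior model probabilities (Wagenmakers & Farrell, 2004); the function name is ours.

```python
import math

def schwarz_weights(bics):
    """Posterior model probabilities from BIC values, assuming equal priors."""
    best = min(bics)
    # Subtracting the best BIC first keeps the exponentials numerically stable.
    rel = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(rel)
    return [r / total for r in rel]

# BIC values of models 1 to 4 for participant E (Table 1).
weights = schwarz_weights([1263.54, 810.07, 701.81, 702.44])
print([round(w, 3) for w in weights])   # → [0.0, 0.0, 0.578, 0.422]
```

A BIC difference of less than one point, as for participant E, therefore translates into posterior odds that are close to even.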

For each participant, Table 2 shows the response parameters and the transition parameters for the hysteresis model 4 (model 3 for participants E and H). The response parameters are mean RT and accuracy for both the GM and the SCM.⁷ The response parameters show that for each participant, the modes are clearly separated in terms of mean RT. It is also clear that the spread of RT is larger in the SCM than in the GM. Accuracy in the stimulus-controlled mode is relatively high, but not perfect, suggesting that participants were able to trade off accuracy for speed within the stimulus-controlled mode, at least to some extent.

Table 2. 
Response parameters (mean of RT, SD of RT, and proportion correct, PC) estimated for the hysteresis model (Model 4). State 1 is fast with accuracy at chance level. State 2 is slower and accuracy is relatively high. Spread of RT is larger in the accurate state. The rightmost columns display the parameters of the link functions, estimated for the hysteresis model (for participants E and H, for the fast guess model with a1 = a2). For all participants fitted with the hysteresis model, the intercept is smaller for the function linking PAcc to the probability of switching to the SCM (a1) than for the function linking PAcc to the probability of switching to the GM (a2). β1 was constrained to be equal in the regression functions on both YSG and YGS (see Fig. 12)
Participant | RT state 1 | SD RT state 1 | PC state 1 | RT state 2 | SD RT state 2 | PC state 2 | β1 | a1 | a2
A | 250.09 | 45.33 | 0.50 | 617.66 | 148.68 | 0.91 | 6.48 | 1.39 | 4.35
B | 242.03 | 28.23 | 0.50 | 542.81 | 112.92 | 0.90 | 4.95 | 1.43 | 3.66
C | 262.65 | 55.06 | 0.50 | 570.85 | 119.29 | 0.89 | 4.55 | 1.02 | 3.96
D | 192.81 | 32.84 | 0.50 | 500.76 | 93.33 | 0.91 | 5.14 | −0.15 | 2.05
E | 258.65 | 51.73 | 0.50 | 508.42 | 98.95 | 0.90 | 7.15 | 1.38 | (= a1)
F | 258.54 | 51.33 | 0.50 | 504.38 | 96.52 | 0.89 | 9.30 | 0.17 | 1.60
G | 253.37 | 33.78 | 0.50 | 486.00 | 118.87 | 0.75 | 3.74 | 0.29 | 1.23
H | 203.14 | 27.27 | 0.50 | 541.60 | 168.31 | 0.78 | 11.42 | 7.44 | (= a1)
I | 215.67 | 37.48 | 0.50 | 521.09 | 154.82 | 0.75 | 3.03 | −0.20 | 1.64
J | 205.32 | 30.32 | 0.50 | 455.21 | 103.17 | 0.76 | 5.46 | 1.22 | 3.08
K | 196.48 | 25.77 | 0.50 | 476.44 | 137.56 | 0.81 | 3.61 | −0.06 | 1.21

The three rightmost columns of Table 2 show the parameters of the function that links PAcc to the transition probabilities. The values of β1 confirm that for all participants, increasing PAcc leads to an increased probability to switch toward the stimulus-controlled mode. The difference between intercepts a1 and a2 quantifies the hysteresis effect (for all participants but E and H).

5.6. Cusp model results

As a final analysis, we used the R package cusp (Grasman, van der Maas, & Wagenmakers, 2009) to fit the data of each participant to the stochastic cusp equation by Cobb and Watson (1981). In this model, the normal and splitting variables α and β are modeled as linear functions of the experimental variable PAcc. So, for each axis, an intercept parameter (i.e., α0 and β0) and a slope parameter (i.e., αPAcc and βPAcc) are estimated. The behavioral variable Z is modeled as a linear function of log(RT) (with parameters Z0 and ZPAcc). All parameters of the fitted cusp model can be found in Table 3. For all participants, the cusp model fitted better than a linear model according to BIC model selection, which indicates that the cusp model gives a proper description of the data. For participants B, D, E, F, and K, βPAcc could be constrained to zero, which indicates that for these participants, the experimental variable PAcc was related only to the normal axis, as predicted by the phase transition model.

Table 3. 
Parameter estimates (intercept and slope) of the linear functions that relate PAcc to the α and β axes of the cusp (first four parameter columns) for participants A to K. The last two columns display the parameters of the linear function that relates the cusp surface (Z) to log(RT). In all cases the cusp model gave the best explanation of the data. The phase transition model predicts positive coefficients αPAcc and zero values for βPAcc, as PAcc in our model is associated with the normal axis α. For participants B, D, E, F, and K, βPAcc could be constrained to zero
Participant | α0 | αPAcc | β0 | βPAcc | Z0 | ZPAcc
A | −1.42 | 0.18 | 2.27 | −0.16 | −14.76 | 2.43
B | −1.17 | 0.18 | 1.84 | 0.00 | −20.11 | 3.42
C | −1.07 | 0.19 | 1.48 | −0.08 | −16.61 | 2.78
D | −2.27 | 0.22 | 2.02 | 0.00 | −16.54 | 2.88
E | −2.14 | 0.22 | 0.84 | 0.00 | −17.90 | 3.05
F | −2.26 | 0.21 | 0.94 | 0.00 | −18.34 | 3.13
G | −1.71 | 0.13 | 1.71 | −0.08 | −18.22 | 3.05
H | −3.49 | 0.40 | 1.27 | −0.40 | −11.36 | 1.85
I | −1.47 | 0.12 | 1.68 | −0.06 | −14.14 | 2.40
J | −2.42 | 0.28 | 1.25 | −0.11 | −15.40 | 2.67
K | −2.54 | 0.21 | 1.18 | 0.00 | −16.09 | 2.79

Fig. 14 shows the best fitting model for each participant. The plotted symbols show how the participant's behavior at different settings of PAcc maps onto the αβ plane. The shaded area is the bifurcation set, that is, the area where two stable behaviors exist. The phase transition model predicts that a substantial part of the behavior falls in this bifurcation set, which is the case for most of the participants.

Figure 14.

 The best fitting cusp model for each participant. The plotted symbols show how the participant's behavior at all different settings of PAcc maps onto the αβ plane. The phase transition model predicts that a substantial part of the behavior lies in the shaded area, the bifurcation set.
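Whether a given payoff setting falls inside the bifurcation set can be checked directly from the fitted control-axis parameters. The sketch below assumes the canonical cusp equilibrium equation z³ − βz − α = 0, for which two stable states exist when 27α² < 4β³, and it assumes that PAcc entered the fit on its raw 0 to 24 scale. Both points are our assumptions for illustration, as are the function names.

```python
def control_values(p_acc, a0, a_p, b0, b_p):
    """Map PAcc onto the normal (alpha) and splitting (beta) axes."""
    return a0 + a_p * p_acc, b0 + b_p * p_acc

def in_bifurcation_set(alpha, beta):
    """True when z**3 - beta*z - alpha = 0 has three real roots (two stable states)."""
    return 27.0 * alpha ** 2 < 4.0 * beta ** 3

# Estimates for participant A from Table 3: alpha0, alphaPAcc, beta0, betaPAcc.
params_a = (-1.42, 0.18, 2.27, -0.16)
inside = [p for p in range(25)
          if in_bifurcation_set(*control_values(p, *params_a))]
print(inside)   # payoff levels at which both modes are stable under these assumptions
```

Under these assumptions the bistable region for participant A covers a band of low-to-intermediate payoff levels, while extreme settings admit only one stable mode.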

6. Interim conclusion

Experiments 1a and b were designed to detect hysteresis. When we changed payoff factors gradually, hysteresis was observed in the behavior of about half of our participants. Hidden Markov analyses confirmed that two modes of responding exist and that the phase transitions between these states displayed hysteresis for the majority of the participants. The fit of a stochastic cusp model provided converging evidence that the SAT can be described as a cusp catastrophe. Still, there are two important reasons to doubt that the SAT can be conceptualized generally as a cusp catastrophe. First, the hysteresis effect could be artificially enlarged by the speed of change of the payoff factors. Although we tried to make the change in PAcc gradual, it could still be the case that hysteresis occurs as an effect of a participant's delayed awareness of change in PAcc. To test for this possible artifact, we applied a so-called modified method of limits in Experiment 1c. Second, although hidden Markov analyses showed that a two-state mixture describes the data best, the hysteresis experiments described above do not prove that the behavior is governed by two distinct modes. (The bimodality in the data could, indeed, be caused by the experimental manipulations.) To test whether behavior is indeed bimodal, which is a necessary criterion for the existence of phase transitions, we designed the “75%-task” in Experiment 2.

7. Experiment 1c: Modified method of limits

7.1. Method Experiment 1c

To test whether the hysteresis effect found in Experiment 1a and b is an artifact caused by delayed awareness of PAcc, we applied a modified method of limits. This method is based on a procedure used by Hock et al. (1993) to study bistability in the perception of apparent motion. We apply the method by gradually changing PAcc but eventually pausing at predetermined settings of PAcc and PRT. To understand this procedure, consider a payoff setting at which optimal behavior requires that a participant switches from his or her current mode to another mode. Now suppose that the participant does not switch at this payoff setting. This reluctance to switch can have one of two causes: First, the participant was not yet fully aware of the current payoff setting. Second, the current mode of processing is very stable (hysteresis as predicted by the phase transition model). If the first reason holds, then waiting for a couple of trials at the same value of PAcc would result in a switch, because the participant will become aware of the current payoff settings. If the second reason holds, the participant would stay in the current mode of processing, even after waiting some trials at that value of PAcc. The latter finding would be strong evidence for hysteresis.

7.1.1. Participants

The method applied in this experiment is demanding in that it requires relatively many trials. Therefore, we tested only two participants. We chose to test two participants whose data clearly displayed hysteresis in Experiment 1b. This allows us to determine whether the hysteresis effect observed in Experiment 1b stands the litmus test of a “modified method of limits.” We tested participant F from the lexical decision version of Experiment 1b, and participant I from the perceptual version of Experiment 1b.

7.1.2. Materials and procedure

In this experiment, both participants performed the same task they did in Experiment 1b. The difference with Experiment 1a and b is the way PAcc changed over trials.

Based on the results of Experiment 1a and b, six critical levels of PAcc were chosen, at which most of the jumps between modes appeared to take place. These are the levels at which it is interesting to examine what happens when PAcc does not change for a couple of trials. The critical levels of PAcc we chose to examine were 8, 9, 10, 11, 12, and 13.

The experiment consisted of sets of trials. Every set of trials started with a value of PAcc that clearly favored either guessing or stimulus-controlled responding. Then, PAcc was changed, step by step, toward a predetermined critical level. When the sequence of PAcc arrived at the critical level, it remained at this value for five more trials. These are the five waiting trials in which a delayed switch might or might not happen. After these five trials, a new set was started at a PAcc value that clearly favored either guessing or stimulus-controlled responding. As is illustrated in Fig. 15, PAcc was increased from a low value that favored fast guessing upward to the critical level for two sets. Then, PAcc was decreased from a high value that favored stimulus-controlled behavior downward to the critical level for two sets. Over the entire experiment, the direction of change was alternated in this two-by-two manner. Each set's critical level was chosen semi-randomly from the selected values 8 to 13.

Figure 15.

 Illustration of the way PAcc changed in the modified method of limits of Experiment 1c. Each dot represents a trial. After a sequence of a few trials with increasing or decreasing PAcc, the value of PAcc was kept constant for six trials at the critical level.
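The payoff schedule of a single set can be sketched as follows. The start values in the example are hypothetical, since the text specifies only that they clearly favored one mode, and the function name is ours.

```python
def payoff_sequence(start, critical, n_wait=5):
    """PAcc values for one set: step by 1 from start to critical, then hold.

    The critical level is reached once and then repeated for n_wait
    waiting trials, giving n_wait + 1 trials at that level in total.
    """
    step = 1 if critical >= start else -1
    approach = list(range(start, critical + step, step))
    return approach + [critical] * n_wait

ascending = payoff_sequence(start=4, critical=10)     # hypothetical start value
descending = payoff_sequence(start=18, critical=10)   # hypothetical start value
print(ascending)    # → [4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 10, 10]
print(descending[-4:])   # the last four waiting trials, the ones entering the analysis
```

Comparing behavior on the waiting trials of ascending and descending sets that end at the same critical level is what separates true hysteresis from delayed awareness of the payoff setting.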

7.2. Results Experiment 1c

The above procedure resulted in 6 × 4 × 2 = 48 sequences containing a total of 814 trials per participant. In this procedure, however, the only trials of interest are the waiting trials. We chose to use only the last four waiting trials from each sequence, because it is reasonable to assume that after two trials with the same PAcc value, the participant is aware of this setting. Thus, the total number of trials used for calculating the descriptives below is 48 × 4 = 192 waiting trials.

Fig. 16 shows the data of both participants. The data of participant F clearly show hysteresis, that is, both RT and accuracy are higher for decreasing PAcc than for increasing PAcc. Again, we compared the fit of a linear model regressing the waiting trials' RT on both PAcc and direction of change (BIC = 2428.72, Schwarz weight Pr = 1.00) with a model that regressed RT on PAcc only (BIC = 2443.69, Pr = .00). We conducted the same analysis on the accuracy data (using logistic regression instead), in which the two-factor model again performed better (AIC = 212.29, Pr = 1.00) than the one-factor model (AIC = 498.10, Pr = .00). These analyses suggest that both RT and accuracy depend on direction of change, which supports the hysteresis hypothesis. The data of participant I are less clear, yet for all but one critical value of PAcc, mean RT is higher for decreasing PAcc than for increasing PAcc, which is consistent with our hypothesis of hysteresis. For this participant, too, the two-factor model (BIC = 2443.91, Pr = .79) outperformed the one-factor model (BIC = 2446.55, Pr = .21) in predicting RT. In the prediction of accuracy, the two-factor model (AIC = 267.71, Pr = 1.00) also outperformed the one-factor model (AIC = 498.10, Pr = .00). Thus, the results of participant I also support the hysteresis hypothesis.

Figure 16.

 Experiment 1c: Mean response time (RT) and accuracy of the “waiting” trials. Participant F's RT and accuracy are higher when PAcc was decreased (speeding) than when it was increased (slowing). Participant I's data are less clear, but RT is usually higher on slowing trials.

7.3. Discussion Experiment 1

Hysteresis was present in the data of Experiment 1a and was replicated with the slightly improved method of Experiment 1b. For all participants in Experiments 1a and 1b, the hidden Markov analyses showed that the hysteresis model outperformed the competitor models. Furthermore, hysteresis was still present for one participant when put to the strict test of the modified method of limits, carried out in Experiment 1c. These results favor the phase transition model over the fast guess model, which predicts that the jumps from GM to SCM and the jumps from SCM to GM should take place at the same setting of inline image. However, as argued before, evidence for hysteresis is not enough to contrast our model with the pure sequential sampling models. If we were to find bimodality in the SAT as well, this would provide complementary evidence favoring the phase transition model over pure sequential sampling models.

8. Experiment 2: Bimodality

The phase transition model predicts bimodality in behavior when the pressure on both speed and accuracy is high (i.e., the area between the two dotted bifurcation lines in Fig. 4). Bimodality occurs because the intermediate behavior (e.g., responding at 75% correct) is inaccessible. When a participant nonetheless wants to meet the experimenter's demands to respond at 75% correct, this can only be achieved by mixing responses from the two stable modes, that is, responding accurately on some of the trials and fast guessing on the others. This mixing of strategies yields bimodal distributions of the behavioral variables. In contrast, according to most sequential sampling models, participants can simply adjust the bounds of the decision process to reach an accuracy of 75% (see Fig. 2), which would lead to a unimodal distribution of the behavioral variables.
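The arithmetic behind this mixing strategy is simple. As a minimal sketch, assume that guessing yields 50% correct and that the stimulus-controlled mode yields some higher accuracy (the default of 100% below is an illustrative assumption, as is the function name `scm_proportion`). The proportion p of stimulus-controlled trials needed to reach a target accuracy then follows from target = (1 − p) × acc_guess + p × acc_scm.

```python
def scm_proportion(target_acc, acc_guess=0.5, acc_scm=1.0):
    """Proportion of stimulus-controlled trials needed to reach target_acc
    when responses are a mixture of guessing and stimulus-controlled responding."""
    if acc_scm <= acc_guess:
        raise ValueError("the stimulus-controlled mode must be more accurate than guessing")
    if not acc_guess <= target_acc <= acc_scm:
        raise ValueError("the target must lie between the accuracies of the two modes")
    return (target_acc - acc_guess) / (acc_scm - acc_guess)

print(scm_proportion(0.75))                # 0.5: guess on half of the trials
print(scm_proportion(0.75, acc_scm=0.95))  # somewhat more than half when accurate responding is imperfect
```

Under these assumptions, a participant who is error-free in the stimulus-controlled mode reaches 75% correct by guessing on every second trial on average.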

We set out to test these diverging predictions in the second experiment by making participants respond at 75% correct. To evoke 75% correct performance, deadline or response signal procedures could be applied. However, RTs in these tasks are under control of the deadline manipulation or response signal and only accuracy is left as a dependent variable. Accuracy, unfortunately, is a discrete variable and discrete variables cannot be bimodally distributed. For that reason, deadline or response signal procedures could not be used to study bimodality.

Thus, in the experiment below, we chose not to manipulate speed but instead to manipulate accuracy, allowing RT to be used as the dependent variable. We refer to the task as “the 75% task.” In the 75% task, participants are instructed to respond at an accuracy level of 75% correct, and to do so as fast as possible. For comparison, we also administered a 50% task and a 100% task, in which participants were asked to respond as fast as possible at 50% correct (guessing) and at 100% correct. Whereas the predictions of the phase transition model and sequential sampling models differ for the 75% task, they agree for the 50% and 100% tasks. For both tasks, the models predict unimodal RT distributions: fast, chance-level responses for the 50% task and slow, almost error-free responses for the 100% task.

It is interesting to note that we could only find a 75% instruction in the experiments of Lappin and Disch (1972). Yet, from a statistical point of view, and assuming that the speed-accuracy trade-off is continuous, such an instruction would give estimates of mean RT with much lower standard errors than those obtained at the more typical 95% correct target. As Wickelgren (1977) points out, it is precisely at high levels of accuracy that the variation in RT is very large for very small differences in error percentage.

8.1. Method Experiment 2

8.1.1. Participants

Thirteen students at the University of Amsterdam participated for a small monetary reward.

8.1.2. Materials and procedure

We used the same lexical decision task as in Experiment 1a. Stimuli were selected and presented in the same way as in Experiment 1a. The screen refresh rate, response stimulus intervals, and response button assignment were also identical to those used in Experiment 1a.

Trials were presented in blocks and sets. A set consisted of a random number of trials (between 15 and 25), and a block consisted of a number of sets (5 to 10, depending on the number of trials in each set, such that a block never had more than 150 trials). At the end of each set, participants received feedback about their performance (i.e., their penalty score, the ranking of this score on a personal high score list, and a general high score list). After each set, the penalty score (PS) was computed as follows:

image

Participants were instructed to minimize this penalty score PS. Note that, because the participants did not know in advance the number of trials in each set, their best strategy was to try to maintain a mean accuracy close to 75% correct throughout a set.

Note that the PS depends heavily on how close the participant is to the accuracy target (75%) and that speed is of secondary importance. Nevertheless, a small deviation from the optimal percentage correct could be compensated by faster responses. For instance, a 5% deviation from the goal can be compensated by a speed-up of 140 ms. Speed was also included in the PS to prevent participants from using a stimulus-controlled strategy and intentionally erring on every fourth trial.

8.1.3. Design

The experiment featured four conditions. In the first condition, the stated target was to obtain 50% correct. In the second condition, the target was to respond at 100% correct. In the third and fourth conditions, participants were instructed to respond at 75% correct. The latter two conditions (denoted 75%PT and 75%SS) differed with respect to the instructions. In the 75%PT (phase transition) condition, the instructions were given in terms of the phase transition and fast guess model. The participants were told that optimal performance implied an alternation of guessing and accurate responding. In the 75%SS (sequential sampling) condition, on the other hand, the instructions were given in terms of sequential sampling models. The participants were told that they should respond at such a high speed that accuracy, on average, reaches 75%. In a pilot study, the 75% condition was introduced to participants without any specific instruction, but it turned out that most participants then persisted in adopting the highly inefficient strategy of slow responding with intentional errors on one in four trials. RTs associated with this inefficient strategy are as slow as, or slower than, those in the 100% condition. It is important to note that, if the sequential sampling models’ prediction of a continuous SAT is correct, instructions in terms of sequential sampling models should yield lower penalty scores than instructions in terms of the phase transition model.

A complete experimental session consisted of a series of blocks. For example, the experiment could start with a block of five sets of the 100% condition, then a block with five sets of 50%, then a block with 10 sets of 75%SS, and finally a block with 10 sets of 75%PT. Each experimental session comprised a total of 40 sets × 15 to 25 items ≈ 800 trials and took about an hour to complete.

There were three groups (A, B, and C) of participants. The experimental session of group A (participants 1 to 5) was organized as follows: six sets with %correcttarget = 100%, six sets with %correcttarget = 50%, seven sets with %correcttarget = 75% (PT), and seven sets with %correcttarget = 75% (SS). The session of group B (participants 6 to 13) was organized as follows: six sets with %correcttarget = 100%, six sets with %correcttarget = 50%, seven sets with %correcttarget = 75% (SS), and seven sets with %correcttarget = 75% (PT). Two weeks later, six participants were retested. This group C (participants 1, 7, 8, 9, 11, 13) received eight sets with %correcttarget = 50%, seven sets with %correcttarget = 75% (SS), and seven sets with %correcttarget = 75% (PT). Thus, group A received the phase transition model instructions first, whereas groups B and C received the sequential sampling model instructions first.

8.1.4. Data analysis

Data were inspected visually for bimodality and analyzed using the distributional RT analysis program of Dolan, Van der Maas, and Molenaar (2002) and the mode testing program of Hartelman, van der Maas, and Molenaar (1998). Using the distributional RT analysis program, mixtures of one, two, and three ex-Gaussian components were fitted to the RT distributions obtained in each condition. The choice of the ex-Gaussian distribution is pragmatic; as explained earlier, the present phase transition model does not make any critical distributional predictions. The BIC was used to determine the number of components. The number of components is, however, not always equal to the number of modes, as components can have equal means but differ in variance. Therefore, when the BIC favored the two- or three-component mixture and the data and fitted distribution were clearly bimodal (and not trimodal), we concluded that the RTs in that condition were bimodally distributed. Finally, we confirmed the results from the mixture analysis using a kernel density mode-testing program (e.g., Hartelman et al., 1998; Silverman, 1981, 1986).
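To make the mixture approach concrete, the sketch below evaluates the density of a mixture of ex-Gaussian components and the BIC for a given set of parameter values. It is an illustration only: it is not the program of Dolan et al. (2002), the optimizer that would search for maximum likelihood estimates is omitted, and the function names and all parameter values are hypothetical. Each component is the sum of a normal variable (mean mu, standard deviation sigma) and an independent exponential variable (mean tau), so a mixture of m components has 4m − 1 free parameters.

```python
import math

def exgauss_pdf(x, mu, sigma, tau):
    """Density of Normal(mu, sigma) plus an independent Exponential with mean tau.
    Intended for RT-like parameter values; extreme values may overflow."""
    lam = 1.0 / tau
    z = (mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma)
    return 0.5 * lam * math.exp(0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * x)) * math.erfc(z)

def mixture_pdf(x, components):
    """components: list of (weight, mu, sigma, tau); the weights should sum to 1."""
    return sum(w * exgauss_pdf(x, mu, sigma, tau) for w, mu, sigma, tau in components)

def mixture_bic(rts, components):
    """BIC = -2 log L + k ln n, with k = 4m - 1 free parameters for m components."""
    log_lik = sum(math.log(mixture_pdf(rt, components)) for rt in rts)
    n_params = 4 * len(components) - 1
    return -2.0 * log_lik + n_params * math.log(len(rts))

# Hypothetical parameter values (in ms) and toy RTs, for illustration only;
# these are not estimates or data from the experiment.
two_modes = [(0.4, 300.0, 30.0, 50.0), (0.6, 600.0, 60.0, 100.0)]
toy_rts = [310.0, 335.0, 360.0, 590.0, 640.0, 700.0, 820.0]
print(mixture_bic(toy_rts, two_modes))
```

In an actual analysis, the parameters of the one-, two-, and three-component mixtures would each be estimated by maximum likelihood, and the resulting BIC values would then be compared.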

8.2. Results Experiment 2

8.2.1. Penalty scores

In the 75% task, the phase transition model instructions (75%PT) yielded lower penalty scores PS than the sequential sampling model instruction (75%SS).8 Participants in group A (75%PT first) had a mean PS of 0.86 (SD = 0.33) in the 75%PT condition and a mean PS of 0.98 (SD = 0.39) in the 75%SS condition. For participants in groups B and C (75%SS first), these means were 0.78 (SD = 0.28) and 0.96 (SD = 0.36), respectively. An ANOVA with PS as dependent variable and instruction (phase transition model vs. sequential sampling model) and group (A vs. B and C) as independent variables showed that the phase transition model instructions yielded lower penalty scores PS than the sequential sampling instructions (F(1) = 16.5, p < .001).

8.2.2. Distributional analyses

Fig. 17 shows histograms of the RT data of Experiment 2 per experimental manipulation (50%, 100%, 75%PT, and 75%SS) and per participant group (A, B, C). In each group, the data were aggregated over participants. The rightmost column of histograms displays the data when aggregated over all participants in all groups using the “Vincentizing” technique described by Ratcliff (1979). Individual participants’ RT distributions can be found on the first author's website.

Figure 17.

 Experiment 2: For each group and each condition, the data were aggregated over participants. The condition's accuracy targets (%correcttarget) are shown in the left margin of the figure. The affixes PT and SS stand for phase transition and sequential sampling model instructions, respectively. The histograms of these aggregated sets are displayed in the left three columns. Strong evidence for bimodality is found in both versions of the 75% task. The data of all participants (from all groups) were then aggregated using the “Vincentizing” method described by Ratcliff (1979). The resulting histograms are plotted in the rightmost column.
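Vincentizing is commonly implemented as quantile averaging: the same set of quantiles is computed for every participant, and each quantile is then averaged across participants, which preserves the shape of the individual distributions better than pooling the raw RTs. The sketch below follows that reading; the choice of ten quantiles, the linear interpolation rule, and the function names are our assumptions rather than details taken from the original analysis.

```python
def quantiles(sorted_xs, probs):
    """Linear-interpolation quantiles of an already sorted sample."""
    n = len(sorted_xs)
    out = []
    for p in probs:
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(sorted_xs[lo] * (1.0 - frac) + sorted_xs[hi] * frac)
    return out

def vincentize(rts_by_participant, n_quantiles=10):
    """Average each RT quantile across participants (group quantiles)."""
    probs = [(i + 0.5) / n_quantiles for i in range(n_quantiles)]
    per_participant = [quantiles(sorted(rts), probs) for rts in rts_by_participant]
    return [sum(q) / len(q) for q in zip(*per_participant)]

# Two hypothetical participants whose RTs differ by a constant 100 ms
fast_participant = list(range(100, 201))
slow_participant = [rt + 100 for rt in fast_participant]
group_quantiles = vincentize([fast_participant, slow_participant])
```

In this toy example, the group quantiles lie exactly halfway between the quantiles of the two participants, so the group distribution keeps the individual shape instead of becoming a flattened mixture of the two.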

As expected, the distributions of RT in the 50% and 100% conditions were clearly unimodal for almost all participants, as is suggested by the group averages shown in the upper two rows of panels in Fig. 17. In contrast, visual inspection of the RT distributions of individual participants in the 75%PT and 75%SS conditions suggested bimodality in the data of the majority of participants. This bimodality is also suggested by the group distributions in the lower two rows of Fig. 17.

These results were checked with the mode testing method. The distributions of about half of the participants in conditions 75%PT and 75%SS were identified as bimodal. For the majority of participants, the mixture analyses also supported a two-component solution. However, individual-subject analyses were plagued by computational problems due to outliers, sensitivity to starting values, and occasional failures to converge.

More robust results were obtained by aggregating the data of participants across groups A, B, and C. As can be seen in the rightmost histograms of Fig. 17, the aggregated RT distributions were unimodal in the 50% and 100% conditions and bimodal in both 75% conditions. These conclusions were confirmed by mode testing on kernel density estimates, with an alpha level of .05. (For details about this analysis, see Hartelman et al., 1998.)

We also inspected the sets with the 10% lowest penalty scores in conditions 75%PT and 75%SS separately, because sequential sampling models predict that (close to) optimal behavior would involve intermediate, unimodal behavior. Visual inspection showed that the bimodality of the distributions in these conditions was preserved when only the data from the sets with the 10% lowest penalty scores were included. This was checked with the mode testing method: In both the 75%PT and the 75%SS condition, the hypothesis of only one mode was rejected (h-crit = 86.5, p < .001, and h-crit = 67.5, p < .05, respectively).
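Kernel density mode tests of the kind proposed by Silverman (1981) are built on the critical bandwidth: the smallest bandwidth at which a Gaussian kernel density estimate has at most k modes. A large critical bandwidth for k = 1 means that heavy smoothing is needed to hide a second mode. The sketch below computes this quantity, presumably the counterpart of the h-crit values reported above. It is not the program of Hartelman et al. (1998); the function names, grid size, and tolerance are our assumptions, and the bootstrap step that turns the critical bandwidth into a p value is omitted.

```python
import math

def count_modes(data, h, n_grid=512):
    """Count local maxima of a Gaussian kernel density estimate on a grid.
    The count is approximate: modes narrower than the grid spacing can be missed."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    # Unnormalized density; the normalizing constant does not affect the modes
    dens = [sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in data) for g in grid]
    modes = 0
    for i in range(n_grid):
        left = dens[i - 1] if i > 0 else -1.0
        right = dens[i + 1] if i < n_grid - 1 else -1.0
        if dens[i] > left and dens[i] >= right:
            modes += 1
    return modes

def critical_bandwidth(data, k=1, rel_tol=1e-3):
    """Smallest bandwidth (found by bisection) at which the estimate has at most k modes."""
    hi = max(data) - min(data)
    if hi <= 0:
        raise ValueError("data must contain at least two distinct values")
    while count_modes(data, hi) > k:  # widen the bracket if necessary
        hi *= 2.0
    lo = 0.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if count_modes(data, mid) <= k:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical RTs (ms): one cluster around 300 and one around 700
toy = [300 + 5 * i for i in range(-10, 11)] + [700 + 5 * i for i in range(-10, 11)]
print(count_modes(toy, 20.0), count_modes(toy, 400.0))
h_crit = critical_bandwidth(toy)
```

The bisection relies on the property that, for a Gaussian kernel, the number of modes does not increase as the bandwidth grows.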

Finally, we checked whether fast responses were inaccurate and slow responses were accurate. As expected, in the 75% conditions, the proportions correct for responses with RTs below and above 450 ms were .53 and .82, respectively.
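This check amounts to a conditional accuracy computation with a single RT cutoff. A minimal sketch, assuming trials are stored as (RT in ms, correct) pairs; the function name, the toy data, and the assignment of an RT of exactly 450 ms to the slow class are our assumptions.

```python
def accuracy_by_speed(trials, cutoff_ms=450.0):
    """Return (proportion correct for fast trials, proportion correct for slow trials)."""
    fast = [correct for rt, correct in trials if rt < cutoff_ms]
    slow = [correct for rt, correct in trials if rt >= cutoff_ms]

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(fast), mean(slow)

# Hypothetical trials, not data from the experiment
toy_trials = [(300, 1), (320, 0), (350, 0), (380, 1),
              (600, 1), (650, 1), (700, 0), (720, 1)]
print(accuracy_by_speed(toy_trials))  # (0.5, 0.75)
```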

8.3. Discussion Experiment 2

The data of Experiment 2 generally support the hypothesis that the instruction to respond at an intermediate level of accuracy (75%) leads to bimodality of behavior, regardless of the specific instructions given. With both instructions, participants managed to respond at 75% correct by alternating between two modes, which are presumably the GM and the stimulus-controlled mode. Instructing participants to try to reach 75% by adjusting their response criteria did not result in data that are consistent with the continuous SAT predicted by sequential sampling models. Furthermore, the instruction based on the phase transition model led to lower penalty scores than the instruction based on sequential sampling models. As noted before, this is important because, if the sequential sampling account were correct, instructions according to the sequential sampling model would yield the lowest penalty scores. Together with the bimodal RT distributions obtained under both instructions, these findings strongly suggest that no intermediate mode of processing is available.

One could argue that grouping the data of participants could be a source of spurious bimodality. However, this would only be the case if some of the participants always guessed and others always responded accurately (which would still be evidence for the absence of an intermediate mode of processing). Both the Vincentized distributions and the distribution of RT for the 10% best sets suggest that this is not the case. If anything, averaging might have partly masked bimodality, as the locations of the two modes of processing vary over participants.

For some participants, we were unable to convincingly demonstrate bimodality. This could be due to a lack of power (the mode testing method is known to be very conservative; e.g., Fisher & Marron, 2001) or to the fact that some participants find it difficult to engage in guessing behavior. We suspect that more training is required for these participants.

9. Concluding remarks

In this article, we presented a model that departs radically from much current theorizing about RTs (e.g., Ratcliff & McKoon, 2008). The phase transition model predicts a sudden collapse in the accuracy of responding when the participant is instructed to speed up, whereas most models predict a continuous trade-off between speed and accuracy.

It should be noted, however, that in sequential sampling models, it is not precisely specified how manipulations of response strategy, such as payoffs and deadlines, relate mathematically to the effective boundary values. This issue is addressed by several models, most of which describe how performance is monitored and optimized (e.g., Bogacz, 2007). One model that describes an autonomous mechanism to adapt response thresholds is Vickers's self-regulating PAGAN model. This model provides an algorithm that adjusts threshold settings on the basis of discrepancies between experienced and desired response confidence (Vickers, 1979; Vickers & Lee, 1998, 2000). Another model that describes how performance is optimized is the neural network model proposed by Simen, Cohen, and Holmes (2006), in which response thresholds are adjusted to maximize reward rate. One of these or related models could describe the relationship between manipulations and threshold setting as a cusp function. One could argue that this implies that our current results are perfectly consistent with sequential sampling models.

Our objection to this line of reasoning is two-fold. First, although the precise relationship between boundary manipulations and the effective boundary setting is rarely specified, it is safe to say that a continuous function is assumed implicitly. A cusp function for boundaries does not really explain the inaccessible region at around 75% and we do not see a conceptual justification for this extension.9 On the contrary, in the standard explanation of the SAT in sequential sampling models, it is essential that subjects can select any boundary setting.

Second, sequential sampling models, such as the diffusion model, have not been designed to account for guessing. In these models, guessing occurs when boundary separation approaches zero. In that case, no evidence accumulation occurs and the response process reduces to residual processes combined in Ter. One could argue that Ter incorporates the guessing process, so that guessing is just normal decision making without one (important) component. It remains unclear, however, how such an account could explain inaccessibility and hysteresis, phenomena that require competition or conflict between guessing and stimulus-controlled processing. In two-state models, guessing and stimulus-controlled processing compete for the same cognitive resources such as attention, motor preparation, and stimulus encoding. This competition results in intermediate behavior that is inherently unstable: One either attempts to answer correctly or one guesses.

Therefore, we believe that the current findings are best explained in terms of a phase transition between two states or processes. Such a two-state explanation should be connected to models that describe how participants select response strategies. Relevant models would be, for example, those of Rieskamp and Otto (2006) and B. R. Newell and Lee (2010) that describe how response strategies may be selected and adjusted according to environmental variables such as pay-off and difficulty.

Yet the phase transition model is clearly too simple to explain all empirical facts. Fig. 17 shows the RT distributions of the 50%, 75%, and 100% conditions. The 100% distribution's peak is shifted to the right compared to the accurate and slow component of the 75% distributions. This is not predicted by the current formulation of the phase transition model. In a sequential sampling framework, however, this can be explained naturally by assuming two different criterion settings: one that results in very slow but almost 100% correct responses and one that results in, for example, 90% correct and somewhat quicker responses. Therefore, it would be of great value to extend our model by integrating nonlinear sequential sampling models (Heath, 2000; Roe, Busemeyer, & Townsend, 2001; Smith, 1995; Usher & McClelland, 2001) into the phase transition model. This may be feasible because, mathematically, stochastic catastrophe models are very similar to continuous time diffusion models (Cobb & Watson, 1981). In the light of this integration of both frameworks, it is interesting to observe that the qualitative form of the continuous trade-off (e.g., A2 in Fig. 1) may well be described as a fold catastrophe (see Fig. 18).

Figure 18.

 The cusp catastrophe consists of two coupled fold catastrophes. This figure shows that the upper fold 1 could represent the SAT according to sequential sampling models and that the lower fold 2 could represent a model for simple RT. In this figure, the y-axis represents accuracy, but the same pattern holds for RT.

Considering that a cusp catastrophe consists of two coupled fold catastrophes, as illustrated in Fig. 18, it seems plausible to specify the upper fold as representing a sequential sampling process and the lower fold as representing a model of simple RT. As simple RTs can also be modeled as a sequential sampling process (Luce, 1986), a complete phase transition model that includes a sequential sampling explanation for both the stimulus-controlled mode and the GM seems within reach.

Footnotes

  • 1

    For a detailed response to Sussmann and Zahler (1978), see the online appendix available on the first author's website.

  • 2

    Based on the phase transition model, one could argue that trading off accuracy for speed is not so much a free strategic choice as a competence in itself. It requires competence to operate close to the transition point, at a high speed, very near a collapse to completely inaccurate responding. This competence may be an age-related factor in the explanation of trade-off differences between age groups (e.g., Botvinick, Braver, Barch, Carter, & Cohen, 2001; Rabbitt, 1979).

  • 3

    The value of 24 of the payoff weights was chosen arbitrarily to scale the reward rule.

  • 4
    image
  • 5
    image
  • 6

    We only write the change of PAcc for brevity. Note that the change in PRT is implied, as: inline image.

  • 7

    In the fitting routine, the proportion correct of one mode (the GM) was fixed at 0.5. Preliminary analyses showed that this restriction improved any two-state model's BIC performance. Therefore, in Table 2, the accuracy parameter in the GM is 0.5 for each participant.

  • 8

    Although the majority of participants understood the task well, a small minority found it difficult to engage in guessing. These participants found it hard to ignore the primary task of discriminating words from nonwords. After some training, however, all participants were able to perform reasonably well (i.e., attained a low penalty).

  • 9

    Other sequential sampling models, of the accumulator type, are more promising in this respect. Guessing could, for instance, be modeled by excitation instead of inhibition of accumulators.
