Set as an Instance of a Real-World Visual-Cognitive Task


Correspondence should be sent to Enkhbold Nyamsuren or Niels A. Taatgen, Department of Artificial Intelligence, University of Groningen, Nijenborgh 9, 9747 AG Groningen, The Netherlands.


Complex problem solving is often an integration of perceptual processing and deliberate planning. But what balances these two processes, and how do novices differ from experts? We investigate the interplay between these two in the game of SET. This article investigates how people combine bottom-up visual processes and top-down planning to succeed in this game. Using combinatorial and mixed-effect regression analysis of eye-movement protocols and a cognitive model of a human player, we show that SET players deploy both bottom-up and top-down processes in parallel to accomplish the same task. The combination of competition and cooperation of both types of processes is a major factor of success in the game. Finally, we explore strategies players use during the game. Our findings suggest that within-trial strategy shifts can occur without the need of explicit meta-cognitive control, but rather implicitly as a result of evolving memory activations.

1. Introduction

Human performance in complex tasks is often a combination of internal planning and responding appropriately to the environment. Nevertheless, cognitive models of complex tasks typically focus on the mental planning aspects, and fail to take into account that the external world can heavily influence the control of behavior.

The role of the environment was first recognized in robotics (Brooks, 1991), but it was later extended to human cognition to form embodied cognition (e.g., Clark, 1997). However, in more complex tasks, it is clear that the control of behavior is not entirely in the environment. The challenge is, therefore, to understand how control is shared between goal-driven planning and processes that are driven by perceptual input. Moreover, the balance between goal and perceptually driven control is likely to change with expertise (Kirsh & Maglio, 1994). The approach we take in this article follows the threaded cognition theory of multitasking (Salvucci & Taatgen, 2008). We will assume two parallel processes: a bottom-up visual process that scans the visual field on the basis of saliency and similarity, and a top-down planning process that tries to achieve the goal but also biases the bottom-up process. The interaction between the two processes follows the central idea in threaded cognition that there is no overall executive process that balances parallel goals. Instead, the two processes alternate in using the cognitive resources (e.g., vision, declarative memory [DM], procedural memory, etc.). Changes in the balance between the two occur if one process benefits more from learning than the other and therefore makes more efficient use of the resources available to it.

Finding an appropriate task to study the cognitive aspects of human behavior in real-life situations is not easy. However, games provide environments that often require the same type of complex processes that are usually involved in real-world situations (Green & Bavelier, 2004). This has the advantage that the behavior of a player can be studied in a controlled environment. These qualities make games on a computer an ideal tool for studying complex cognitive processes. One such game is the card game SET.1

The SET card deck consists of 81 cards. Each card differs from other cards by a unique combination of four attributes: color, number, shape, and shading. Each attribute can have one of three distinct values: red, green, and blue for the color; open, solid, and textured for the shading; one, two, and three for the number; oval, rectangle, and wiggle for the shape. The gameplay for SET is relatively simple. At any moment in the game, 12 cards are dealt face up, as is shown in Fig. 1. From those 12 cards, players should find any combination of three cards, further referred to as a set, satisfying a rule stating that in the three cards the values for each particular attribute should be all the same or all different. We will further refer to the number of attributes for which the three cards in the set have different values as the set level. A level 1 set has only one attribute with three different values, but three attributes with identical values. Correspondingly, there can be sets of level 2, 3, or 4. Fig. 1 shows an example of a level 1 (different shape) and a level 4 set (all attributes are different). In a similar manner, we can quantify perceptual similarity of two cards as the number of attributes that are shared between the two. For example, the cards in a level 1 set have a perceptual similarity of three among each other as they have three attributes with identical values. Cards in a level 4 set have a perceptual similarity of 0 because all attribute values are different.
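The set rule, the set level, and the perceptual similarity defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the tuple encoding of a card as (color, shading, number, shape) and the function names are assumptions made here for clarity.

```python
# A card is assumed to be a tuple of four attribute values:
# (color, shading, number, shape). The encoding is illustrative only.

def is_set(a, b, c):
    """True if, for every attribute, the three values are all the same or all different."""
    return all(len({x, y, z}) in (1, 3) for x, y, z in zip(a, b, c))

def set_level(a, b, c):
    """Number of attributes on which the three cards of a set all differ."""
    if not is_set(a, b, c):
        raise ValueError("cards do not form a set")
    return sum(len({x, y, z}) == 3 for x, y, z in zip(a, b, c))

def similarity(a, b):
    """Perceptual similarity: number of attribute values shared by two cards."""
    return sum(x == y for x, y in zip(a, b))

# A level 1 set (only shape differs) and a level 4 set (everything differs).
level1 = [("red", "solid", "one", "oval"),
          ("red", "solid", "one", "rectangle"),
          ("red", "solid", "one", "wiggle")]
level4 = [("red", "open", "one", "oval"),
          ("green", "solid", "two", "rectangle"),
          ("blue", "textured", "three", "wiggle")]
```

With these definitions, any two cards of `level1` have a similarity of 3, and any two cards of `level4` have a similarity of 0, matching the description in the text.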

Figure 1.

 An example array of 12 cards. The cards with the solid highlight form a level 4 set (all attributes are different), and cards with dashed highlight form a level 1 set (shape is different, and all other attributes are the same).

In the regular game, when a set is found, the corresponding set cards are picked up and replaced with new cards from the deck. After the deck runs out, the player with the most cards wins. Even though a regular game of SET consists of multiple rounds, we will use “game of SET” to refer to what is normally a single round: finding a set in 12 displayed cards.

There are several advantages of choosing SET as a target game of study. First, SET has appealingly simple game dynamics. The game has very simple rules to follow and a relatively static game environment. Despite the simplicity, SET requires complex cognitive processes, including pattern recognition, visual processing, and decision making. Previous studies on SET have established that both cognitive and perceptual processes are important (Jacob & Hochstein, 2008; Taatgen, Oploo, Braaksma, & Niemantsverdriet, 2003). Without considering both in combination, important information about how players play the game will inevitably be lost. As such, the game of SET provides an excellent opportunity to study the dynamics of such processes in a relatively simple environment.

Next, the game is quite unpredictable in its structure, and players are not likely to replay the exact same sequence again. There are about 7 × 10^13 possible combinations of 12 cards, which makes it highly unlikely that players will play through the same 12 cards again. There are also 1,080 different sets. This means that even experienced players will periodically have to find a set they have never encountered before.
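Both counts can be checked directly. The sketch below assumes only the deck structure described above (four attributes with three values each); the variable names are ours.

```python
from math import comb

deck_size = 3 ** 4                    # four attributes, three values each: 81 cards
arrays_of_12 = comb(deck_size, 12)    # unordered 12-card arrays, roughly 7 x 10^13

# Any two cards determine exactly one third card that completes a set,
# and each set contains three such pairs, so the pair count is divided by 3.
distinct_sets = comb(deck_size, 2) // 3
```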

Finally, game difficulty can differ significantly based on a player’s strategy. Given an array of 12 cards with a single set in it, a player may choose to compare every possible combination of three cards. There are 220 possible combinations, and a probability of finding a set with random choice is 1/220. However, a player may also consider combinations of two cards as a pair uniquely defines a third card. In that strategy, a player would pick two cards, would then determine what the third card should be to complete a set, and would then see whether the predicted third card is actually among the remaining 10 cards. There are 66 possible pairs, and the same set is defined by three different pairs. Therefore, a probability of finding a set with a random choice of a pair is 1/22. However, with an optimal search strategy, a player still has to consider a maximum of 54 pairs before finding a set. This is a significant decrease in complexity compared with a strategy where a player has to compare every combination of three cards.
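The pair-based strategy rests on the fact that two cards fix the third. A minimal sketch follows, assuming (for illustration only) that the three values of each attribute are coded as the integers 0, 1, and 2; the function and variable names are hypothetical.

```python
from fractions import Fraction
from math import comb

VALUES = {0, 1, 2}  # assumed integer coding of the three values of any attribute

def third_card(a, b):
    """Return the unique card that completes a set with cards a and b."""
    # If two cards share a value, the third must share it as well;
    # if they differ, the third must take the one remaining value.
    return tuple(x if x == y else (VALUES - {x, y}).pop() for x, y in zip(a, b))

triples = comb(12, 3)                # 220 combinations of three cards
pairs = comb(12, 2)                  # 66 pairs of cards
p_random_triple = Fraction(1, triples)
p_random_pair = Fraction(3, pairs)   # three of the 66 pairs belong to the single set
```

Predicting the third card and then checking whether it is among the remaining 10 cards is what turns a search over triples into a search over pairs.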

The above two strategies are both top-down in the sense that they do not take into account the properties of the particular array of 12 cards. However, players are likely to use perceptual processes and clues, such as visual grouping and visual similarity, to decrease complexity or speed up the search. As an example, suppose that there are eight red cards and two cards each for blue and green. Furthermore, let us assume that a player is using similarity in color to find a set. Neither the blue nor the green cards can contain a set on their own, as there are only two cards in each group. There are 56 combinations of three cards among the red cards and 32 combinations of three cards with different colors. This is already a significant decrease in complexity, from 220 to 88 possible combinations, and a 2.5-fold increase in the chance probability of finding a set. The chance probability that the set is among the red cards is even higher: 56/88, or about 2/3. This leverage in chance probability comes only from the larger group size for red cards. For example, if there is an even split of four cards for each color, then the chance probability of a set being among the cards of one particular color is only 4/76. As will be discussed next, players actually exploit the advantage of a larger group size.
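The arithmetic of this example can be reproduced with a short sketch. The helper name `candidate_triples` is ours, and the function assumes exactly three color groups.

```python
from fractions import Fraction
from math import comb, prod

def candidate_triples(group_sizes):
    """Count triples that can satisfy the set rule on color.

    Returns the number of same-color triples per group and the number
    of triples that take one card from each of the three color groups.
    """
    same_color = [comb(n, 3) for n in group_sizes]
    all_different = prod(group_sizes)
    return same_color, all_different

# Eight red cards, two blue, two green.
same, different = candidate_triples([8, 2, 2])
total = sum(same) + different          # 56 + 0 + 0 + 32 = 88
gain = Fraction(comb(12, 3), total)    # 220 / 88 = 5/2
p_red = Fraction(same[0], total)       # 56 / 88 = 7/11, about 2/3

# Even split of four cards per color.
same_even, different_even = candidate_triples([4, 4, 4])
total_even = sum(same_even) + different_even   # 12 + 64 = 76
p_one_color = Fraction(same_even[0], total_even)  # 4 / 76
```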

There are two studies directly relevant to the work in this article. Jacob and Hochstein (2008) did several experiments with human subjects playing SET on a computer without any opponent. Each experiment was designed to test a particular aspect of the game including a strategy of playing the game, dependency of the performance on the set level, attribute preference, and the learning. Taatgen et al. (2003) also did similar experiments aimed at studying the strategy of playing the game and developed a computer model of a human player.

Jacob and Hochstein (2008) demonstrated that SET players prefer to look at perceptually similar cards, and, for the comparison of the cards, mainly rely on the perceptual processes such as similarity-detecting process. According to the authors, bias to the perceptual similarity and corresponding bottom-up processes can explain why players need less time to find lower level sets than higher level sets. Taatgen et al. (2003) also reached the conclusion that the perceptual elements play a greater role in finding lower level sets. They suggested a strategy where a player looks at an arbitrary first card then at a second card that shares an attribute value. Next, the player predicts the third card and determines whether that card is one of the remaining 10 cards. Taatgen et al. (2003) also hypothesized that the choice of the first card might not be arbitrary in some cases. They proposed that players try to find the set among the cards that have an attribute value occurring in more than half of 12 cards. For example, if there are many red cards, it is attractive to search for a set among those cards. Taatgen et al. (2003) implemented this strategy in an Adaptive Control of Thought–Rational (ACT-R) model. However, the data they collected did not have enough detail to determine whether subjects used such a strategy.

Jacob and Hochstein (2008) proposed a generalization of the above strategy based on the notions of the most abundant value and the most abundant-value group. The former refers to an attribute value that occurs most, and the latter refers to the group of cards that have the most abundant value. They found that the sets belonging to the most abundant-value group are preferred to the sets outside of that group. In addition, the time required to find the set in the most abundant-value group decreased as the size of the group increased. Most abundant-value group was preferred to any other value group independently of the attribute type. Jacob and Hochstein (2008) suggested a dimension-reduction strategy where players try to reduce the four-dimensional search space to three by choosing to look at cards that have one or more attribute values in common. It was assumed that dimension-reduction strategy is primarily used with the most abundant value.

2. Research objectives

2.1. Cognitive and perceptual processes

The dimension-reduction strategy is an example of a strategy that combines perceptual processing and goal-directed planning. Dimension reduction’s gain in efficiency is due to the fact that the perceptual system is good at detecting similarity, but goal-directed planning is needed to decide what attribute value to focus on and for how long. Even though the earlier studies have established that dimension reduction is used, their methodology did not allow studying the dynamics within a trial. Moreover, not all sets can be found with that strategy. In particular, level 4 sets have no attributes in common, making them impossible to find with dimension reduction. To gather real-time behavioral data that can provide more insight into previously hidden aspects of user behavior, we decided to use eye tracking. As many studies have shown that the eye-movement protocols directly or, at least, indirectly reflect both the cognitive processes and the amount of cognitive load (Kong, Schunn, & Wallstrom, 2010; Rayner, 1995; Salvucci, 1999), we considered eye tracking a viable choice for studying human behavior.

2.2. Performance

Performance in SET is defined by how fast a player can find a set. Hence, speed is a major factor in the game. Different factors can determine a player’s speed. One of them is strategy, which is the aspect of the game we are interested in exploring. Taatgen et al. (2003) found that most players differ little in reaction times when it comes to finding lower level sets. However, reaction times differ significantly in finding higher level sets. One explanation for this effect might be that all players are likely to rely on general perceptual processes to find lower level sets (Jacob & Hochstein, 2008). On the other hand, finding the higher level sets may require strategies. As a consequence, we expect that slow players’ eye movements will be guided more by similarity between cards than those of faster players, because faster players’ strategies will overrule the default similarity-based search.

We will not address how previous experience can affect the performance, or how subjects derive the strategies. One can get better at the game through practice, by naturally having better strategic thinking skills or by just simply being good at pattern recognition. Learning in SET is a complex process and requires separate study.

2.3. Improved ACT-R model

The ACT-R model created by Taatgen et al. (2003) was able to closely approximate the human player’s reaction times. Its main drawback is that it fully predicts the third card, given the first two cards it has looked at, and then searches for that card among the remaining cards. It, therefore, does not fully use a dimension-reduction strategy and also does not use perceptual similarity to find sets. In other words, it uses a pure top-down strategy. Our aim is to test whether a model with greater emphasis on perceptual elements of the game can explain the human data.

3. Experiment

3.1. Subjects

In total, 14 subjects participated in the experiment. The age of the subjects ranged from 20 to 30 years. All subjects were either students or staff members of the University of Groningen. The subjects’ previous experience with SET varied greatly, from a few games played to several years of experience.

3.2. Design and procedure

Every subject was asked to do 60 trials. The group of 60 trials was the same for all subjects, but the order was determined randomly for each subject. Each trial consisted of 12 cards shown on a computer screen and arranged in an array similar to one shown in Fig. 1. Each trial had exactly one combination of three cards that formed a set. Subjects were aware of this but were not told about the level of the set. Subjects were asked to find a set and select the relevant cards with a mouse. A time limit of 180 s was given for each trial after which the next trial was shown.

All 60 trials were randomly generated. In 30 trials, one of the set cards was highlighted with a red border. These trials were distributed evenly over the four levels, with seven or eight trials of each level for each of the two highlighting conditions. The highlighted card belonged to the set and served as a clue for the subject to find the other two cards. Subjects were aware of the meaning of a highlighted card. The presence of the highlighted card should make the task of finding a set much easier. In particular, it decreases the number of possible combinations from 220 to 55, and the number of possible pairs from 66 to 11. As there are two pairs that lead to a set, in the worst case, a player will have to consider only 10 pairs. This is a sixfold reduction in the complexity of the problem in terms of the search space. The main purpose of highlighting a card is that it provides a reference point on which we can base our eye-movement analysis.
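The effect of the highlighted card on the search space can be verified with a few lines of arithmetic. The sketch assumes, as in the experiment, a 12-card array containing exactly one set; the variable names are ours.

```python
from math import comb

triples_without_clue = comb(12, 3)   # any three of the 12 cards
triples_with_clue = comb(11, 2)      # highlighted card plus any two of the other 11
pairs_without_clue = comb(12, 2)     # any two of the 12 cards
pairs_with_clue = 11                 # highlighted card paired with each other card

# Two of the 11 pairs contain a second set card, so at most nine
# unsuccessful pairs can precede the first successful one.
set_completing_pairs = 2
worst_case_pairs = pairs_with_clue - set_completing_pairs + 1

reduction_factor = pairs_without_clue // pairs_with_clue
```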

Prior to the experiment, subjects were asked to do four warm-up trials to familiarize themselves with the experimental setup and with SET itself, in case the subject had never played it before. Results from those trials were not included in the analysis.

3.3. Eye tracking

An EyeLink 1000 eye tracker was used for recording the eye movements. It is a desktop-mounted remote eye tracker with a monocular sampling rate of 500 Hz and a spatial resolution of <0.01° RMS. The card images were shown on a 20-inch LCD monitor with a screen size of 1,024 × 768 pixels and a screen resolution of 64 pixels/inch. The card images had a size of 124 × 184 pixels, or 4.02° × 5.95°. The horizontal and vertical distances between images were 80 and 70 pixels, respectively, which corresponds to 2.59° and 2.27°. Angular sizes were calculated with an approximate viewing distance of 70 cm, as subjects were given a certain freedom for head movement. The gaze position was calculated using the eye’s corneal reflection captured using an infrared camera compensated for head movements. The eye tracker’s default parameters were used to convert gaze positions into fixations and saccades. The calibration of the eye tracker was performed at the start of the experiment and, if necessary, during it. A calibration accuracy of 0.8° was considered acceptable. Before each trial, subjects were asked to do a drift correction as an additional corrective measure.
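The conversion from pixels to degrees of visual angle can be sketched as follows. The text does not state which formula was used; the simple form atan(size / distance) is an assumption made here, chosen because it agrees with the reported angles to within about 0.01°.

```python
from math import atan, degrees

PIXELS_PER_INCH = 64        # screen resolution reported in the text
CM_PER_INCH = 2.54
VIEWING_DISTANCE_CM = 70    # approximate viewing distance reported in the text

def visual_angle_deg(pixels):
    """Visual angle of an on-screen extent, assuming angle = atan(size / distance)."""
    size_cm = pixels / PIXELS_PER_INCH * CM_PER_INCH
    return degrees(atan(size_cm / VIEWING_DISTANCE_CM))

card_width = visual_angle_deg(124)      # reported as 4.02 degrees
card_height = visual_angle_deg(184)     # reported as 5.95 degrees
horizontal_gap = visual_angle_deg(80)   # reported as 2.59 degrees
vertical_gap = visual_angle_deg(70)     # reported as 2.27 degrees
```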

4. Experiment results

4.1. Reaction times

In total, there were 29 trials where subjects failed to find the set, constituting 3% of all trials. Given this small proportion, we treated them as response trials with a reaction time of 180 s. Fig. 2 shows reaction times by level and highlighted condition. It shows that having a highlighted card as a clue more than halves the reaction time, and that the reaction time increases as the set level increases. This latter effect was also observed in previous studies (Jacob & Hochstein, 2008; Taatgen et al., 2003).

Figure 2.

 The mean reaction times with standard errors in ordinary and highlighted trials clustered by the levels and averaged over all subjects.

As shown in Fig. 3a, subjects differed significantly in mean reaction times. As can be seen in the graph, all subjects were divided into three groups of fast, medium, and slow players based on their mean reaction times. Fig. 3b indicates that there is only a small difference in speed among the three groups when it comes to finding a level 1 set. However, as the level increases, the differences between the three groups also increase. This result is consistent with the description of fast and slow players provided in Taatgen et al. (2003). Hence, we expected the groups to exhibit different behavioral effects despite the post-hoc division.

Figure 3.

 (a) Mean reaction times averaged over all trials for each subject. Subjects are divided into three groups: fast, medium, and slow players. (b) Mean reaction times averaged over trials of the same level and player group.

4.2. Dimension reduction

The reaction-time analysis shows that subjects require less time to find sets with perceptually similar cards. This suggests that subjects apply a similarity-based strategy. Even though dimension reduction is such a strategy, we want to investigate in detail to what extent this strategy is used. In this subsection, we will examine evidence for the use of the dimension-reduction strategy. If subjects used dimension-reduction strategy, then the corresponding scanpath should contain consecutive fixations on cards sharing at least one common attribute value.

To explore the existence of such a pattern, the scanpath from each trial was transformed into labeled fixation sequences. Each card in a trial was assigned one area of interest with four different labels (see Fig. 4).

Figure 4.

 One of the problems shown to a subject. Card 7 is the highlighted card. Also shown are the fixations (circles) and saccades (arrows) produced by the subject. The outer thin, black borders indicate 12 areas of interest. The four combinations of letters and numbers on top of each card represent four labels for each area of interest. A set is formed by the fourth, fifth, and seventh cards.

Each label describes one of the attribute values in a card and the position of the card in an array. For example, “G1,” “E1,” “W1,” and “C1” are four labels describing the first card with values as green-open-two-oval. Then each fixation was tagged with four labels of an area of interest within which it falls. The consecutive fixations on the same area of interest are considered as a single fixation, and the corresponding fixation durations are summed. Combining all labeled fixations of a common attribute type into fixation sequences produces four distinct sequences for each trial.
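The preprocessing described above can be sketched as follows. The data layout is an assumption made for illustration: fixations are taken to be (area of interest, duration) pairs in temporal order, and each card is a dictionary mapping an attribute type to its value. Neither the layout nor the function names come from the original analysis.

```python
from itertools import groupby

def merge_consecutive(fixations):
    """Collapse runs of fixations on the same area of interest, summing durations.

    `fixations` is a list of (aoi, duration_ms) pairs in temporal order.
    """
    return [(aoi, sum(duration for _, duration in run))
            for aoi, run in groupby(fixations, key=lambda f: f[0])]

def attribute_sequence(merged, cards, attribute):
    """Project merged fixations onto one attribute type.

    Calling this once per attribute type yields the four sequences of a trial.
    """
    return [cards[aoi][attribute] for aoi, _ in merged]

# Hypothetical three-card excerpt of a trial.
cards = {
    1: {"color": "green", "shading": "open", "number": "two", "shape": "oval"},
    4: {"color": "green", "shading": "solid", "number": "one", "shape": "wiggle"},
    7: {"color": "red", "shading": "solid", "number": "two", "shape": "oval"},
}
raw = [(1, 200), (1, 150), (4, 300), (7, 250), (7, 100), (1, 180)]
merged = merge_consecutive(raw)
color_sequence = attribute_sequence(merged, cards, "color")
```

Runs of identical values in a sequence such as `color_sequence` are the candidate dimension-reduction blocks examined in the following analysis.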

An analysis of the fixation sequences revealed the existence of a pattern of fixations related to the usage of dimension reduction. We will demonstrate this using the example problem from Fig. 4.

Fig. 5a shows a fixation sequence diagram produced from the scanpath shown in Fig. 4. Each horizontal lane in the diagram shows a subject’s fixation sequence with respect to the particular attribute type. One unit on the x-axis represents a fixation on one particular card, while the corresponding bars on four lanes represent attribute values of that card. In the diagram, the labels are color coded according to the corresponding attribute value. The consecutive fixations on the cards with the same attribute value are shaded with a solid color if the probability of such a fixation subsequence occurring by chance is equal to or below 0.05 (refer to the Appendix for details of calculating the probability).

Figure 5.

 (a) Single subject’s fixation sequence diagram for trial “lvl3_15.” (b) Changing proportion of subjects who used dimension reduction in trial “lvl3_15” as a function of fixation position in the sequence and attribute value.

From the figure, we can see that at the beginning of the trial, the subject looked at green cards, and, by the end, at cards with an oval shape. We can conclude that the subject used dimension-reduction strategy at least two times, and each time with respect to a different attribute value: green and oval. The fixation pattern for this trial is not unique for this particular subject. Fig. 5b shows the proportion of all subjects that used dimension reduction with green and oval values. This proportion is also contrasted against proportions of subjects that used dimension reduction on any of the three values from either number or shape attributes. The figure shows that at the start of the trial, subjects preferred to search for a set among green cards and later switched to a group of cards with an oval shape while mostly ignoring all other values.

4.2.1. Effects of an attribute type on dimension reduction

According to Jacob and Hochstein (2008), dimension reduction primarily occurs with the most abundant value. However, it can be observed from Fig. 5b that a majority of subjects prefer the group of green cards to the group of cards with an oval shape despite the fact that the latter has the most abundant value. This suggests that the type of the attribute also plays a role in deciding the value to be used for dimension reduction.

To find an effect of an attribute type, we calculated the average proportion of fixation sequences in which subjects used dimension reduction, across all subjects and all problems. The result indicates that blocks of fixations with the same attribute value occupy on average 46% and 35% of an overall fixation sequence in trials with and without a highlighted card, respectively. Note that these estimates are on the conservative side: some dimension-reduction sequences may have gone unrecognized because they could not be distinguished from a random sequence, owing to wandering fixations, sequences that are too short, or inaccuracy in the eye tracker.

Fig. 6a shows how the use of dimension reduction is distributed over the four attribute types and reveals an effect of overall attribute preference. Subjects are twice as likely to look at the group of cards with the same color as at a group defined by any other attribute. The distributions of the most abundant values in the 60 trials among color, shading, number, and shape were 28, 27, 19, and 26, respectively.2 Accordingly, the corresponding bars in Fig. 6a should have nearly equal height if the choice of a value depended only on group size. This is not the case. The results suggest that the four attribute types have different saliency properties, with color being the most salient, and shape and shading being the least salient attributes.

Figure 6.

 (a) Mean proportions of attribute types used in similarity-based scanning. Proportions are shown separately for trials with and without highlighted card. (b) Proportion of trials where subjects preferred to use a value for dimension reduction with the biggest group size among other values of the same attribute type. The horizontal, dashed black line indicates the expected proportion if the choice was made randomly.

There is still an effect of the most abundant value within each attribute type. This means that among the three values of the same attribute type, the most abundant value is preferred for dimension reduction. As Fig. 6b indicates, in 85% of all trials, subjects prefer the most abundant value over the other two values of the same attribute type.3 The trend is consistent among all four attributes.

4.2.2. Effect of dimension reduction on performance

There is also a difference between fast and slow players in how they use the dimension-reduction strategy. Fig. 7a shows how the usage of the dimension-reduction strategy changes over time in trials with a highlighted card. There is a general trend among players to use dimension reduction at the beginning of a trial and gradually stop using it over time. This suggests that players gradually switch from dimension reduction to a different strategy. Furthermore, the graph suggests that slow players are more likely than fast players to stick to dimension reduction for longer.

Figure 7.

 (a) Changing proportion of trials in which dimension reduction was used. The proportions are calculated as a function of the fixation position within a trial. The proportion on fixation x is calculated by counting the trials that have a dimension-reduction block that includes fixation x. (b) The mean overall similarity of all cards in a particular subsequence to the highlighted card.

4.3. Dissimilarity-based search

In the previous section, we have seen that subjects use a dimension-reduction strategy to reduce the complexity of finding a set. However, it is not yet clear how a similarity-based approach can eventually find sets with many different attribute values. The fact alone that subjects were able to find level 4 sets, in which all attribute values are different, proves that the strategies they use are not limited to dimension reduction. In fact, Fig. 6a has already shown that subjects use dimension-reduction strategy only 46% of the time, even though this number may be conservative given our analysis method. Fig. 7a also suggests that players switch to a different strategy.

It is our assumption that subjects gradually switch from a similarity-based strategy to a dissimilarity-based strategy. It should be possible to observe this switch from one strategy to another in fixation sequences produced from trials with highlighted cards.

4.3.1. Search subsequences

The next analysis involves only trials with a highlighted card. Preliminary inspection of the data revealed that subjects refixated on the highlighted card approximately every five fixations, presumably to refresh their memory and to start a new search subsequence. For example, the following labeled fixation sequence “4-7-11-10-3-7-2-11-4-3-10-2-5-9-5-6-4-7-5-8-4,” with 4 being a fixation on the highlighted card, can be broken down into three subsequences. In a similar manner, the fixation sequences corresponding to the other three attributes can be broken down into identical subsequences.
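The segmentation can be illustrated with the example sequence from the text. The sketch below is one plausible reading, not the original analysis code: it assumes that a subsequence consists of the fixations between two consecutive visits to the highlighted card and that the highlighted card itself is excluded from the subsequence.

```python
def split_subsequences(sequence, highlighted):
    """Split a fixation sequence at every fixation on the highlighted card."""
    subsequences, current = [], []
    for card in sequence:
        if card == highlighted:
            # A refixation on the highlighted card closes the current subsequence.
            if current:
                subsequences.append(current)
            current = []
        else:
            current.append(card)
    if current:  # keep a trailing, unterminated subsequence if there is one
        subsequences.append(current)
    return subsequences

example = [int(x) for x in "4-7-11-10-3-7-2-11-4-3-10-2-5-9-5-6-4-7-5-8-4".split("-")]
parts = split_subsequences(example, highlighted=4)
```

For the example sequence, this yields the three subsequences 7-11-10-3-7-2-11, 3-10-2-5-9-5-6, and 7-5-8.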

Breaking down a trial into separate subsequences allows us to analyze how a mean perceptual similarity of fixated cards to a highlighted card changes with each subsequence (Fig. 7b). The calculations were done separately for slow and fast players. There is a general tendency to look at a less similar card with each new fixation and each new subsequence. When players start a search, they seem to prioritize cards based on decreasing similarity to a highlighted card. Furthermore, Fig. 7b suggests that with each new search subsequence, subjects lower the similarity threshold and include in their visual search less similar cards that were not included in the previous subsequences. Finally, there may be a difference between fast and slow players in terms of bias to similarity-based search as Fig. 7b indicates. Fast players appear to abandon similarity-based search earlier than slow players.

Using a mixed-effect regression analysis (Baayen, Davidson, & Bates, 2008), we further investigated how the tendency to look at perceptually similar cards changes during the trial. The dependent variable in the regression is the perceptual similarity of each fixated card to the corresponding highlighted card (the values on the y-axis in Fig. 7b). The following fixed effects were used: Subsequence is the log-transformed position of a subsequence in a fixation sequence; Fixation is the log-transformed position of a fixation within a subsequence; and RT is the subject’s mean reaction time in seconds, as shown in Fig. 3a. In addition, two random effects on the intercept, Subject and Trial, were added, representing subjects and trials, respectively.
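For orientation, the model described above can be written out as follows. This is a sketch under stated assumptions: it contains only the terms named in the text (the three main effects, the Fixation by Subsequence interaction, and the two random intercepts), the symbols are ours, and the fitted model may include further interaction terms.

```latex
\mathrm{Sim}_{ijk} = \beta_0
  + \beta_1 \,\mathrm{Subsequence}_{ijk}
  + \beta_2 \,\mathrm{Fixation}_{ijk}
  + \beta_3 \,\mathrm{RT}_{i}
  + \beta_4 \,\mathrm{Subsequence}_{ijk} \times \mathrm{Fixation}_{ijk}
  + u_i + v_j + \varepsilon_{ijk},
\qquad
u_i \sim \mathcal{N}(0, \sigma_{\mathrm{Subject}}^2),\quad
v_j \sim \mathcal{N}(0, \sigma_{\mathrm{Trial}}^2),\quad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
```

Here $\mathrm{Sim}_{ijk}$ is the perceptual similarity to the highlighted card of the $k$-th fixated card of subject $i$ in trial $j$, Subsequence and Fixation are the log-transformed positions defined above, and $u_i$ and $v_j$ are the random intercepts for Subject and Trial.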

Resulting coefficients for fixed main and interaction effects are shown in Table 1. The table also presents corresponding t and p values for fixed effects. The variances and standard errors of the random effects are depicted in Table 2.

Table 1. The fixed effects’ coefficients, t and p values
Fixed Effects | Coefficients | Standard Errors | t Values | p Values
RT | 0.0046 | 0.0011 | 4.106 | .0012

Table 2. Variances and corresponding standard errors of random effects
Random Effects on Intercept | Variances | Standard Errors

In the interpretation of coefficients, we are mainly interested in their signs. Positive coefficients increase perceptual similarity to the highlighted card. Hence, the corresponding independent variables promote the similarity-based search. The negative coefficients decrease perceptual similarity. Therefore, the corresponding independent variables facilitate the transition from the similarity-based search to dissimilarity-based search.

Both Fixation and Subsequence have negative coefficients, supporting our assumption that, over time, the cards that subjects look at decrease in similarity to the highlighted card. The significant main effect for Fixation indicates that the transition occurs not only within the fixation sequence as a whole but also within individual subsequences. The interaction effect between Fixation and Subsequence has a positive coefficient. The interaction effect provides a threshold for the main effect of Subsequence after which subjects cannot look at less similar cards anymore. This makes sense, as it is impossible to look at cards that have more than four dissimilar attributes.

There is a strong correlation between subjects’ mean reaction time and the tendency to look at cards similar to the highlighted card. The variable RT serves as a strong predictor. The sign of its coefficient indicates that slower players are more biased toward similarity-based search than faster players, and that this bias grows as mean reaction time increases.

5. Experiment discussion

5.1. Experiment results’ summary

The mixed-effect regression analysis of the fixation sequences indicates that the subjects’ basic strategy of playing SET is similarity based. Subjects prefer to look for a set among the cards that are similar to each other.

One specific instance of a similarity-based strategy is the dimension-reduction strategy (Jacob & Hochstein, 2008). The dimension-reduction strategy can be used more than once within the same trial (Figs. 5 and 6), each time with a different attribute value. The player chooses one attribute value, which we refer to as a guiding value, and starts looking for a set among the cards that share that value. If the player fails to find a set with the current value, then another guiding value is chosen, and the new group of cards becomes the next search space.

The overall strategy of dimension reduction is top-down, but the choice of a guiding value is heavily influenced by two bottom-up elements: (a) the size of the group of cards that share the value and (b) its attribute type. The importance of group size (Fig. 6b) was also found by Jacob and Hochstein (2008). However, contrary to their conclusion, we have found that an attribute type also plays an important role (Fig. 6a) in choosing a guiding value. In particular, color is preferred to any other attribute type, while shape and shading are the least preferred attribute types. This result coincides with other studies, concluding that people prefer to operate on colors rather than on shapes (Kieras, 2010; Kim & Cave, 1995; Pomplun et al., 2001). The number attribute also seems to be preferred to shape and shading, at least in trials with highlighted cards. The presence of a highlighted card can bias players to values of that card. Such bias can override an effect of a group size or even attribute type.

Another interesting finding is that, within a trial, subjects decrease their use of the dimension-reduction strategy. This reduction (Fig. 7a) coincides with a gradual reduction in reliance on similarity (Fig. 7b). As the game progresses, players increasingly look at more dissimilar cards, which are more suitable for finding higher level sets.

It seems that all players follow more or less these strategies. However, there are subtle differences between fast and slow groups of players. We found that fast players are less dependent on similarity than slow players (Fig. 7b and Table 1). Fast players are initially less likely to use dimension reduction and switch faster to the dissimilarity-based search than slow players.

5.2. Additional assumptions

There are still open questions that were not answered by the data analysis. For creating a plausible model of an SET player, it is essential that we have a complete picture of a player’s behavior. In this section, we address the essential but missing aspects of an SET player’s strategy by referring to relevant literature or making our own assumptions.

The two critical aspects of finding a set are reducing the search space by selecting an appropriate guiding value, and the search strategy itself once a guiding value has been selected.

5.2.1. Choice of a guiding value

Although the decision to choose a guiding value is top-down, the choice itself, we assume, is not top-down. This choice is defined by two components: a static task-independent component that defines the saliency of an attribute value in the visual field and task-dependent factors, some of which change while the search for a set progresses.

Task-independent components include attribute type and group size. The four attributes have different inherent saliency properties. The color is the most salient attribute type, and the number is more salient than shape or shading (Kieras, 2010; Kim & Cave, 1995; Pomplun et al., 2001). On the other hand, six green cards are more salient than four red cards because of an effect of group size on the saliency. These factors are not dependent on the current goal and are inherent properties of the visual object and the visual scene as a whole.

Task-dependent components include the presence of a highlighted card and the current progress within a trial. The task for the player is to find a set that includes the highlighted card (if it is present). This connection of a highlighted card to the current task increases the relevancy of the attribute values in the highlighted card. The relevance of an attribute value, however, decreases once we have already tried to find a set with that attribute value. So, if the player has not been able to find a set among the green cards, then the task relevancy of the green value decreases. This decreasing relevance can explain why the similarity of attended cards to the highlighted card decreases: Once particular attribute values have been tried as a guiding value, their relevance decreases and other, more dissimilar values are selected to guide search.

For example, at the beginning of the game, most players tend to focus on the group of cards that share particular color or number values, as color and number are the most salient attribute types. However, their relevancy will decrease over time, and eventually a player will focus on other attribute types.

5.2.2. Strategies and within-trial strategy shifts

As described earlier, the data suggest a gradual shift from dimension-reduction to a dissimilarity-based strategy.

However, so far we have no concrete evidence for the mechanisms behind such a strategy shift. One option is that there is an explicit meta-cognitive process tracking the current state of the game and timing the strategy shifts. However, a far more elegant and simpler explanation would be one in which a strategy shift occurs implicitly as a result of changing relevance of the attribute values as they are used as a guiding value. The second option does not require an explicit process of tracking current state and timing strategy shifts. The mechanism that chooses the guiding values, outlined in the previous section, does exactly that: Initially, the attribute values of the highlighted card will dominate the choice of guiding value and will, therefore, lead to similarity-based search. However, once those values have been tried, their relevance diminishes, and other values are chosen that are not attributes of the highlighted card. This will lead to a dissimilarity strategy in which a third, dissimilar card will be necessary to complete the set.

5.2.3. Strategy implementation

Once a guiding value is chosen, a search process is needed to try to find a set using the guiding value. There are two basic strategies to do this. The first is to pick, in addition to the highlighted card, a second card on the basis of the guiding value, and then a third card that is perceptually similar to the second card. At that point, the three cards can be compared to see whether they constitute a set. Even with a highlighted card, this search process is potentially expensive, because there are still 55 possible combinations to check (all pairs drawn from the 11 remaining cards). The use of a guiding value helps the player to look at the most promising combinations first, especially combinations that are potential lower level sets.

The second strategy is to select a second card in addition to the highlighted card and predict what the third card should be. After making the prediction, the predicted card may or may not be present among the remaining cards. If it is, it completes the set. This strategy is much more efficient, because there are only 11 combinations of the highlighted card with a second card, and two of those will complete a set. Even when there is no highlighted card, the prediction strategy is more efficient than the similarity strategy, because there are only 66 possible pairs, three of which are part of the set, but 220 combinations of three cards. However, the prediction strategy is more effortful and requires at least some experience with the game to be successful.
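The counts quoted for the two strategies, and the prediction step itself, can be checked with a short sketch. The encoding below is an assumption made only for illustration: each card is a tuple of four integers in {0, 1, 2}, one per attribute, and the function name `predict_third` is hypothetical.

```python
from math import comb

VALUES_PER_ATTRIBUTE = 3  # each SET attribute takes one of three values


def predict_third(card_a, card_b):
    """Return the unique card that completes a set with card_a and card_b.

    For every attribute the third value equals the shared value when the
    two cards agree, and the remaining value when they differ. With values
    coded 0..2 this is -(a + b) modulo 3.
    """
    return tuple((-(a + b)) % VALUES_PER_ATTRIBUTE
                 for a, b in zip(card_a, card_b))


# Similarity strategy with a highlighted card: pairs among the 11 other cards.
pairs_with_highlight = comb(11, 2)
# Prediction strategy with a highlighted card: one prediction per second card.
second_card_choices = 11
# Without a highlighted card: pairs to predict from versus triples to compare.
pairs_without_highlight = comb(12, 2)
triples_without_highlight = comb(12, 3)

print(pairs_with_highlight, second_card_choices,
      pairs_without_highlight, triples_without_highlight)
print(predict_third((0, 1, 2, 0), (0, 2, 2, 1)))
```

The four counts reproduce the figures in the text (55, 11, 66, and 220), which is why prediction prunes the search so strongly even though each prediction step is more effortful.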

5.2.4. Competitive parallelism of the two strategies

Even though we can identify two distinct strategies, several hybrid combinations are possible. For example, instead of predicting all attribute values of the third card, it is possible to only predict two values and use these two to guide the similarity strategy. In fact, both strategies and all possible hybrids can be produced if we assume two parallel processes, a bottom-up process that scans cards based on similarity, and a top-down process that makes a prediction for the third card. This idea is consistent with the threaded cognition theory of multitasking (Salvucci & Taatgen, 2008); a bottom-up visual scanning and a top-down prediction task run in parallel, not only collaborating but also competing to achieve the same goal.

Competitive parallelism assumes that all players have two parallel processes, regardless of the player’s proficiency. Slow players know how to predict, but they are not good at it, so typically the visual-scanning process will dominate performance. Faster players are proficient enough to make fast and accurate predictions, so the prediction process can keep up with visual scanning, making a targeted search for a predicted card possible rather than just scanning on the basis of similarity.

Competitive parallelism provides advantages over a pure sequential strategy. It provides a means for a more objective comparative evaluation of efficiency of one process over another. It prevents a one-sided choice of one process over another even if one is less efficient. The less efficient process has a chance to become more cost effective with training and rehearsal. Competitive parallelism actually provides an opportunity for slow players to become faster, because even a partial prediction (i.e., two attributes instead of all four) already provides an advantage over pure similarity-based search.

Prediction works at a more conceptual level, and, therefore, requires a certain degree of proficiency that slow players may lack. Prediction is more beneficial in finding higher level sets in contrast to sequential perceptual comparison. However, it may provide little leverage against parallel bottom-up similarity detection in lower level sets. Those differences can explain why slow and fast players differ little in finding lower level sets and differ significantly in finding higher level sets.

6. An ACT-R model of an SET player

6.1. Briefly about ACT-R

We have implemented the model using the ACT-R cognitive architecture (Anderson, 2007). ACT-R has a modular organization where each module is dedicated to a distinct type of cognitive resources (visual, motor, etc.). Factual knowledge in ACT-R is represented by chunks with slots where other chunks serve as slot values. Each module has its own buffer where either new chunks can be created or existing chunks can be passed on. Three modules that are important for this article are described next.

The visual module handles visual mechanisms such as perception, attention shifts, and encoding of visual stimuli. Visual stimuli are represented in the form of chunks within the visicon, a virtual imitation of the screen visible to the model. This module cannot create new chunks, but rather “perceives” chunks within the visicon by placing them in its buffer.

Every chunk that has been cleared from any buffer is stored in the DM module and can be retrieved again. The DM module can retrieve only one chunk at a time, which is stored in the module’s buffer. Each chunk in DM has a base-level activation value, which represents frequency and recency of use (e.g., Anderson & Schooler, 1991). A chunk’s activation in DM can also be influenced by chunks contained in buffers at the time of retrieval, through a spreading activation mechanism. Based on activation, the module computes the probability and time cost of retrieving a chunk from memory. We have implemented an additional extension to the ACT-R visual module, which enables chunks in the visicon (i.e., the whole visual field) to spread activation to chunks in DM in the same manner as the chunks in buffers do.

Lastly, there is a problem state module that serves as a working memory. This module is unique as it can create new chunks that are neither perceived in the environment, nor retrieved from DM. Slot values from chunks in other buffers can be used as values for the new chunk’s slots. However, creating a new chunk is a time-costly process that takes 200 ms, a parameter in the architecture that is typically not changed.

The architecture provides an essential set of parameters by default including, but not limited to, times it takes to move the mouse, retrieve a chunk from memory, or encode a visual stimulus. It also provides a set of adjustable parameters and range of recommended values for each of those parameters. These elements of the architecture have received extensive experimental support (e.g., Anderson, 2007 and see

6.2. Model design decisions

We will now describe the solution method that we outlined in the previous section in more detail.

6.2.1. Threads

The model consists of two parallel processes (threads; see Salvucci & Taatgen, 2008) reflecting both the top-down and the bottom-up nature of the task. The bottom-up thread is responsible for visual processes such as choosing a scanpath or shifting attention from one card to another. The top-down thread is responsible for higher level processes such as deciding on a guiding value and comparing cards. Both threads can influence each other’s processes indirectly. For example, the top-down thread chooses a guiding value based on what has already been tried earlier in the trial. However, bottom-up features such as which cards are visible or which card is being fixated also influence the choice.

6.2.2. Algorithm for general strategy

The model largely follows strategies that we have deduced from the data and the assumptions we made in the previous section. The following is the description of the model’s general strategy:

  • 1 Focus attention on the highlighted card HC.
    • a. Let CardHC be a set of four attribute values in the highlighted card.
  • 2 Retrieve any attribute value VDM from DM.
    • a. Let AV be the attribute type of VDM.
  • 3 Pick the attribute value VHC from CardHC that also has AV as attribute type.
  • 4 If VDM = VHC then use dimension reduction.
    • a. Define search space G as a group of cards that have VHC.
  • 5 If VDM ≠ VHC then use dissimilarity strategy.
    • a. Define search space G as a group of cards that does not have VHC.
  • 6 Start comparison cycles on G to search for a set (depicted in Fig. 8).
  • 7 If a set is not found, then go back to step 1.
Figure 8.

 An algorithm for searching for a set, given a specified group of cards G. Two shaded boxes represent two approaches that the model uses in parallel to find a set. The shaded box on the right shows the bottom-up approach to find a set, and the shaded box on the left shows the top-down prediction approach.

The implementation of both strategies in the model is liberal in the sense that the model’s behavior is not hardcoded. There is no explicit control over the choice of guiding value, nor is there explicit top-down control over strategy shifts. The model decides all specific details of those steps on the fly based on the visual scene and the progress of the current trial. Steps 2 and 6 are the most important. The outcome of step 2 defines the strategy to be used, while in step 6, bottom-up and top-down threads run in parallel, each trying to find a set separately.
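Steps 3 to 5 of the general strategy, in which the retrieved value decides both the strategy and the search space G, can be sketched as follows. This is an illustration and not the ACT-R implementation: the card representation (a dictionary from attribute type to value), the function name `choose_search_space`, and the decision to exclude the highlighted card from G are all assumptions.

```python
def choose_search_space(highlighted, v_dm, attribute_type, other_cards):
    """Pick a strategy and a search space G from the retrieved value V_DM.

    highlighted    -- dict mapping attribute type to value (the card HC)
    v_dm           -- attribute value retrieved from declarative memory
    attribute_type -- attribute type AV of the retrieved value
    other_cards    -- list of dicts for the remaining (non-highlighted) cards
    """
    # Step 3: the highlighted card's value for the same attribute type.
    v_hc = highlighted[attribute_type]
    if v_dm == v_hc:
        # Step 4: dimension reduction, search among cards that share V_HC.
        strategy = "dimension reduction"
        group = [c for c in other_cards if c[attribute_type] == v_hc]
    else:
        # Step 5: dissimilarity strategy, search among cards without V_HC.
        strategy = "dissimilarity"
        group = [c for c in other_cards if c[attribute_type] != v_hc]
    return strategy, group


highlighted_card = {"color": "green", "shape": "oval"}
remaining_cards = [
    {"color": "green", "shape": "diamond"},
    {"color": "red", "shape": "oval"},
    {"color": "red", "shape": "diamond"},
]
print(choose_search_space(highlighted_card, "green", "color", remaining_cards))
print(choose_search_space(highlighted_card, "red", "color", remaining_cards))
```

In the model itself the retrieved value is not passed in by hand; it is whichever value has the highest activation at that moment, so the strategy falls out of the memory retrieval rather than from an explicit decision.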

6.2.3. Saliency and relevancy

This subsection describes how the model performs step 2 of the algorithm. The attribute value that is the most salient and relevant at the time is chosen as the guiding value VDM. Saliency is constant within a trial; relevancy is not, and it is recalculated each time a new VDM needs to be chosen. Within the model, we have used ACT-R’s activation mechanism to mimic both saliency and relevancy. Activation depends on several parameters, such as the values of the highlighted card, the number of times the attribute value was used previously, the last time it was used, and so on.

The two main parameters defining saliency are attribute type and the size of the group of cards that have that value. Color is generally the most salient attribute type followed by number, whereas shape and shading are the least salient types. Attribute-type saliency is simulated using ACT-R’s chunk referencing mechanism (Table 3).

Table 3. Parameters for calculating activation for an attribute value i

Parameter: Attribute type
Influence on activation: Positive
Implementation method: Base-level activation B_i is calculated for each attribute value based on the initial number of references it is assigned. An initial number of references (n) is set for each attribute type as follows (a higher number results in higher activation):
 Color chunks: 40
 Number chunks: 36
 Shape chunks: 32
 Shading chunks: 28
An exact calculation was used with the decay rate of base-level learning (d) set to the default value of 0.5. t_j is the elapsed time since the chunk was used for the j-th time.

Parameter: Group size
Influence on activation: Positive
Implementation method: Custom extension for Adaptive Control of Thought–Rational (ACT-R) that spreads activation from the visual field to the declarative memory (DM). The associative weight parameter (W) is set to 0.7. fan_i is a measure of how many chunks in the visual field are associated with chunk i. Higher fan_i results in more activation spread to value i: $G_i = W \ln(1 + \mathit{fan}_i)$

Parameter: Highlighted card
Influence on activation: Positive
Implementation method: ACT-R’s equation for spreading activation from the visual buffer. j indicates the value in the j-th slot of the chunk that is in the visual module buffer. fan_ji is a measure of how many chunks in DM are associated with the value in the j-th slot. Higher fan results in less activation spread to value i. Maximum associative strength (S) is set to 4, a sufficiently high value to prevent negative spreading activation. W_j is the amount of activation to be spread from the value in the j-th slot to value i if the two are associated, and is set to 0.13. (Equation image not preserved.)

Parameters: Frequency of use; Latency of use
Influence on activation: Negative (inhibitive effect)
Implementation method: ACT-R extension for base-level inhibition is used, with the short-term decay rate (d_s) and time scaling (t_s) parameters set to 1 and 10, respectively, as recommended by Lebiere and Best (2009). (Equation image not preserved.)

Parameter: Random noise
Influence on activation: Positive
Implementation method: ε_i: ACT-R’s transient noise generated from a logistic distribution with mean 0 and with the :ans parameter set to 0.1. This noise ensures that the model’s behavior differs each time, even if it is presented with exactly the same trial and starting conditions.

To model the effect of the group size, we used a logarithmic function (see Table 3) to map the number of occurrences of an attribute value i in the visual field onto a group size factor fani. This mapping is similar to the spreading activation mechanism in ACT-R’s DM.

The relevancy of a value depends on whether it appears on a highlighted card and whether it was used previously. The highlighted card spreads additional activation to each value it has. The relevancy of a value is temporarily inhibited after it has been used and no set was found. The time and duration of the inhibition are calculated according to Lebiere and Best’s (2009) short-term inhibition equation. The complete description of the parameters used in calculating the activation is shown in Table 3.
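For orientation, the activation components described in Table 3 can be written out as follows. This is a sketch that assumes the standard ACT-R forms for base-level learning and spreading activation, and the short-term inhibition term as we understand it from Lebiere and Best (2009); only the group-size term is given explicitly in the table. The symbols S_i and I_i for the highlighted-card and inhibition components, and t_n for the time since the most recent use, are labels chosen here.

```latex
\begin{align*}
B_i &= \ln\Bigl(\sum_{j=1}^{n} t_j^{-d}\Bigr)
  && \text{base-level activation (attribute type)} \\
G_i &= W \ln(1 + \mathit{fan}_i)
  && \text{group size} \\
S_i &= \sum_{j} W_j \bigl(S - \ln \mathit{fan}_{ji}\bigr)
  && \text{highlighted card, summed over associated slots } j \\
I_i &= \ln\Bigl(1 + \bigl(t_n / t_s\bigr)^{-d_s}\Bigr)
  && \text{short-term inhibition}
\end{align*}
```

Under these assumed forms, a recently used value has a small t_n and therefore a large inhibition term, which fades as time since the last use grows.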

Values for most of the constants mentioned in Table 3 are taken from the range of recommended values mentioned in ACT-R literature (see However, we fitted the four initial numbers of references for attribute types. Two other parameters that required fitting are the associative weight parameter W and spreading activation amount Wj. The first parameter defines scale of influence of a group size, and the second one defines scale of an influence of a highlighted card.

Combining all parameters from Table 3 results in the following equation for calculating activation for attribute value i: $A_i = B_i + S_i + G_i - I_i + \varepsilon_i$. The value with the highest activation is chosen for retrieval from DM. The time cost of retrieval is calculated via the ACT-R equation $\mathrm{Time} = F e^{-A}$, where A is an activation value and F is the latency factor set to 0.2, a value most commonly used in other models.
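The arithmetic of the activation sum and the retrieval latency can be illustrated with a minimal sketch. Only the formulas stated in the text and in Table 3 are used (the group-size term, the sum of components, and the latency equation); the function names are hypothetical, and treating fan_i as the raw number of cards showing the value is an assumption.

```python
import math

W = 0.7  # associative weight for the group-size effect (Table 3)
F = 0.2  # latency factor used in the retrieval-time equation


def group_size_activation(fan, w=W):
    """G_i = W * ln(1 + fan_i); fan is assumed to be the number of
    cards in the visual field that carry the attribute value."""
    return w * math.log(1 + fan)


def total_activation(b, s, g, inhibition, noise=0.0):
    """A_i = B_i + S_i + G_i - I_i + eps_i."""
    return b + s + g - inhibition + noise


def retrieval_time(activation, f=F):
    """Time = F * exp(-A): higher activation gives faster retrieval."""
    return f * math.exp(-activation)


# Six green cards spread more activation to "green" than four red cards
# spread to "red", matching the group-size example in Section 5.2.1.
g_green = group_size_activation(6)
g_red = group_size_activation(4)
print(round(g_green, 3), round(g_red, 3))
print(retrieval_time(total_activation(b=1.0, s=0.5, g=g_green, inhibition=0.0)))
```

Because the value with the highest activation wins the retrieval, any component that rises (group size, highlighted card) or falls (inhibition after a failed search) can change which guiding value is chosen next.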

6.2.4. Top-down versus bottom-up processes in comparison cycles

After deciding which strategy to use, the model proceeds by scanning the chosen group of cards. This is step 6 of the algorithm. The individual steps of scanning are shown in Fig. 8. The entire scanning can be divided into comparison cycles. In each cycle, the model picks two cards, further referred to as C1 and C2, to compare to the highlighted card. The model first chooses C1, and then C2. In each cycle, the model picks as C1 a card that was not chosen as C1 before. Hence, the number of cycles is the same as the number of cards that match the scanning criteria.

The order in which cards are chosen as C1 is mostly defined by the order in which those cards were fixated since the scanning began. Earlier-fixated cards have a higher chance of being chosen as C1. The model is free to choose its own scanpath with the only restriction that it will not refixate on the cards it fixated before, until all other cards have been fixated.

Two different approaches are used in parallel to make the decision about C2: bottom-up and top-down. In the bottom-up approach, the model continues scanning the search space and compares the first fixated card with the highlighted card and C1 (a box in Fig. 8 denoted Wait for bottom-up scanning to return C2). At the same time, the top-down approach tries to make a prediction about C2 based on the available rules (a box in Fig. 8 denoted Predict C2 values). It generates the abstract representation of C2 and asks the visual thread to find the card matching that representation. The success and completeness of the prediction depend on availability and accessibility of prediction rules. Both approaches compete with each other. The approach that requires less time is favored over the other. In other words, if the model is able to make a prediction before the visual thread fixates and encodes some card as C2, then prediction is favored.

Given all three cards, the model verifies whether the cards really make a set. If the cards do not make a set, then the model goes back to visual scanning. If a set is still not found, then the model interrupts the scanning and refixates on the highlighted card to choose another guiding value. Because of the limited number of cycles and the liberal way the model chooses C2, the search is not exhaustive, and the model can fail to find a set even if the search space contains it.
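The verification step relies on the standard rule of SET: three cards form a set when, on every attribute, their values are either all the same or all different. A minimal sketch of that check follows; the tuple representation of a card and the function name `is_set` are assumptions for illustration, not part of the ACT-R model.

```python
def is_set(card_1, card_2, card_3):
    """Return True if the three cards form a set.

    Each card is a sequence of attribute values in a fixed attribute
    order. For every attribute the three values must be all equal
    (one distinct value) or all different (three distinct values).
    """
    return all(len({a, b, c}) in (1, 3)
               for a, b, c in zip(card_1, card_2, card_3))


highlighted = ("red", "oval", "solid", "one")
c1 = ("red", "oval", "open", "two")
c2 = ("red", "oval", "textured", "three")
print(is_set(highlighted, c1, c2))
print(is_set(highlighted, c1, ("green", "oval", "textured", "three")))
```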

6.2.5. Prediction rules

Predictions are made based on prediction rules. Rules are declarative chunks that have to be retrieved from memory when necessary. An example of such a rule is Given (Textured, Solid)⇒Expected (Open). It should be noted that Given (Solid, Textured)⇒Expected (Open) and the previous rule are treated as different ones. There are also rules for similarity such as Given (Red, Red)⇒Expected (Red). In total, the model can have 36 rules: nine rules for each attribute.
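The count of 36 rules follows from treating the two given values as an ordered pair: three values per attribute give nine ordered pairs, and there are four attributes. The sketch below enumerates them. Apart from Red, Green, Oval, Solid, Textured, and Open, the value names are assumed from the standard SET deck rather than taken from the text, and the dictionary layout is hypothetical.

```python
from itertools import product

# Value names partly assumed (standard SET deck); order is arbitrary.
ATTRIBUTES = {
    "color": ("Red", "Green", "Purple"),
    "number": ("One", "Two", "Three"),
    "shape": ("Oval", "Diamond", "Squiggle"),
    "shading": ("Solid", "Textured", "Open"),
}


def build_prediction_rules(attributes):
    """Map (attribute, given_1, given_2) to the expected third value."""
    rules = {}
    for attribute, values in attributes.items():
        for first, second in product(values, repeat=2):
            if first == second:
                # Similarity rule, e.g. Given (Red, Red) => Expected (Red).
                expected = first
            else:
                # Dissimilarity rule: the one value not yet used.
                expected = next(v for v in values if v not in (first, second))
            rules[(attribute, first, second)] = expected
    return rules


rules = build_prediction_rules(ATTRIBUTES)
similarity_rules = [k for k in rules if k[1] == k[2]]
print(len(rules), len(similarity_rules))
print(rules[("shading", "Textured", "Solid")], rules[("shading", "Solid", "Textured")])
```

Keeping (Textured, Solid) and (Solid, Textured) as separate keys mirrors the statement that the two orderings are treated as different rules, each of which has to be retrieved from memory on its own.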

6.3. Model results

In each trial, the model is presented with 12 cards. One card is always highlighted, indicating that it belongs to a set. The model has to find the other two cards forming a set. The same 30 trials from the experiment with human subjects were used.

We created eight versions of the model. The only difference between model versions was the availability of prediction rules. The first model had no prediction rules in DM. The second model had 12 prediction rules for predicting similarity of the corresponding 12 attribute values (e.g., Given (Red, Red)⇒Expected (Red)). The third model had 16 rules: 12 similarity rules and four rules for predicting dissimilarity, one for each attribute. The number of available rules in subsequent models was increased in a similar manner by four. Each version of the model was run 10 times.
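The progression of rule counts across the eight model versions can be checked with a few lines of arithmetic. The constant names are hypothetical; the numbers come from the description above (no rules, then the 12 similarity rules, then steps of four up to the full set of 36).

```python
SIMILARITY_RULES = 12   # one per attribute value
TOTAL_RULES = 36        # nine per attribute, four attributes
STEP = 4                # one extra dissimilarity rule per attribute
RUNS_PER_VERSION = 10

# Version 1 has no rules; later versions start at 12 and grow by four.
rule_counts = [0] + list(range(SIMILARITY_RULES, TOTAL_RULES + 1, STEP))
total_runs = len(rule_counts) * RUNS_PER_VERSION

print(rule_counts)
print(len(rule_counts), total_runs)
```

The 24 dissimilarity rules are thus added in six steps of four, which together with the rule-free and similarity-only versions gives the eight versions reported.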

6.3.1. Reaction times

Fig. 9a shows reaction times for all eight models averaged over four difficulty levels. As was hypothesized previously, the model’s reaction time gradually decreases as the model becomes better at making predictions. The model with zero rules is the slowest model, and the model with all rules is the fastest model. For low-level sets, there is little difference in RT between different versions of the model. However, the difference is quite high for trials with high-level sets. This effect resembles the one found in human data. Overall, Fig. 9a suggests that the main boost in performance through predictions is produced by trials with high-level sets.

Figure 9.

 (a) Reaction times of eight models averaged over four difficulty levels. (b) Reaction times of the slowest and fastest models compared to the reaction times of the human players in trials with highlighted cards.

In Fig. 9b, the mean reaction times (dashed lines) of the fastest and slowest models are compared to the mean reaction times (solid lines) of the corresponding fast and slow groups of human players. As can be seen, the models closely reproduce the reaction times of both slow and fast human players. The fixation sequences produced by these two models were further compared to human data from the fast and slow groups.

6.3.2. Dimension reduction

Both the fast and slow models are quite good at replicating the subjects’ tendency to use dimension reduction and their preference for certain attribute values. As an example, the fast model’s (Fig. 10a) and the subject’s (Fig. 5a) fixation sequences from the same trial are compared. The model’s fixation sequence closely resembles the sequence produced by the subject. At the beginning of the trial, the model also preferred to look at the green cards and later on switched attention to a group of cards with oval shape in the same manner as human subjects did. This decrease is consistent with the behavior of the human subjects.

Figure 10.

 (a) Model’s fixation sequence diagram for trial “lvl3_15.” (b) Changing proportion of blocks where the models used dimension reduction in trial “lvl3_15.” Proportions are calculated from both slow and fast models’ data. Proportions are shown as a function of fixation position in the sequence and attribute value.

Multiple model runs show that half of the time the model prefers to look at the green cards at the beginning of the trial, although they form only the second-largest group after the cards with an oval shape (Fig. 10a). The fact that color is the most salient attribute type is enough to compensate for a smaller group size. Defining separate saliency values for attribute types works quite well for modeling players’ bias toward an attribute type.

It can be observed from Fig. 10 that the model favored shape, the least salient attribute type, in the later stage of the game. This is due to the effect of group size. The oval shape compensates for its inherent lack of saliency with a larger number of occurrences. The fact that the oval value provides strong competition to the green value even at the beginning of the trial suggests that the effect of group size is stronger than it should be (compare Fig. 10b to Fig. 5b).

Overall, the saliency and relevancy mechanisms work well in modeling the subjects’ use of dimension reduction. Combined data from both models show an order of preference for the attribute types similar to that of the human subjects. Fig. 11 shows that, in general, the models clearly prefer color and number, while they distinguish little between shape and shading. Both models gradually stop using dimension reduction if it fails to find a set (Fig. 12a). This behavior is again consistent with the behavior of human subjects. However, the models are more dependent on the dimension-reduction strategy than the human subjects. We attribute this difference to the difference in the manner of scanning between the model and human subjects. We discussed earlier that human subjects can get distracted and produce wandering fixations in the middle of the scan. The model, on the other hand, is precise and does not produce such fixations.

Figure 11.

 Mean proportions of attribute types used in similarity-based scanning. The overall values of all subjects’ trials with the highlighted card are compared to the overall values of both models’ trials.

Figure 12.

 (a) Changing proportion of trials in which dimension reduction was used. The proportions are calculated as a function of fixation position in the sequence. Results are shown separately for slow and fast models. (b) A mean overall similarity of all cards in a subsequence to a highlighted card shown separately for slow and fast models.

Finally, as Fig. 12a shows, the slow model is more likely to use dimension reduction in the latter part of the trial than the fast model. Overall, the fast model is less biased toward dimension reduction than the slow model, showing an effect similar to one produced by the fast human players (Fig. 7a).

6.3.3. Dissimilarity-based search

Our experiment revealed that the subjects gradually switch from the dimension-reduction strategy to dissimilarity-based search (Fig. 7b). To test whether the model exhibits the same pattern of behavior as the human players, the same type of analysis was done on the fixation sequences produced by the model. The results can be observed in Fig. 12b. There are gradual transitions from similarity-based to dissimilarity-based search for both the slow and fast models. The difference between the fast and slow models with respect to the bias toward perceptual similarity is smaller than in human players; however, it is present. The graph for the fast model comes to an abrupt end at the 10th subsequence, because the fast model rarely required more than 10 subsequences to find the set.

7. General discussion

7.1. Bottom-up and top-down processes

Improvement in playing SET can be explained by the interplay between the two types of processes. Slow players initially tend to rely on bottom-up processes, because their top-down strategies are too slow to keep up. Improvement in the game is characterized by an increase in efficiency and involvement of top-down processes.

A similar development was found in studies of other games such as Scrabble (Halpern & Wai, 2007). In that study, slow and fast players also differed in the interplay between top-down and bottom-up processes. Slow players prefer to rotate and rearrange the letters physically to check whether they form a word. It makes players very much dependent on bottom-up motor processes and perceptual stimuli representing the letters. On the other hand, fast players prefer to rotate and rearrange the letters mentally. Hence, a fast player prefers to use top-down processes to manipulate the abstract representations of the letters.

Another example of a shift in balance between bottom-up and top-down processes is observed in Tetris. Initially, it was believed that slow players prefer to rotate and translate tokens mentally to check whether a piece will fit at various parts of the screen, whereas more experienced players prefer to rapidly manipulate the tokens physically (Kirsh & Maglio, 1994). However, a later study showed that players with extensive experience prefer to rotate and translate pieces mentally rather than physically (Destefano, Lindstedt, & Gray, 2011). This means that they no longer require perceptual input to verify their solution. This is similar to learning in SET, where prediction processes make it unnecessary to “see” the third card to infer that it is part of the set.

In light of these findings, we conclude that such a shift in balance between top-down and bottom-up processes may be a very common learning process.

7.2. Implications of this study

7.2.1. Threaded cognition for bottom-up and top-down processes

In earlier studies, fast players replaced bottom-up with top-down processes by substituting abstract actions for physical, but otherwise identical, ones. However, our model showed that fast players can combine bottom-up and top-down processes in ways that go beyond simple substitution. The fast model is able to perform actions, such as prediction, that are beyond the capabilities of bottom-up processes. This capability requires viewing bottom-up and top-down processes as parallel and competing processes. Earlier, we referred to this as competitive parallelism. This is in contrast to the conventional sequential or hierarchical view, but in line with the theory of threaded cognition (Salvucci & Taatgen, 2008, 2011). However, in addition to the separation of processes into threads by task, we also divide processes into threads by type within a single task. As such, this study can be viewed as a theoretical and practical example of threaded cognition and can contribute to a general understanding of this theory.

Having two threads for the same task, that is, competitive parallelism, has a direct implication for learning. When the same task can be accomplished by both bottom-up and top-down processes, competitive parallelism ensures that training will eventually select the more suitable one. Competitive parallelism may therefore be a cornerstone of problem-solving tasks. For example, it can explain how Tetris or Scrabble players minimize the cost of mental operations while still performing the same task physically. Further study is needed to confirm these assumptions.

7.2.2. Implicit decision making

SET players can apply more than one strategy during the game: a similarity-based and a dissimilarity-based strategy. Our model has shown that shifts between those strategies can occur as a result of evolving activations triggered by basic bottom-up elements such as inherent memory associations, inhibition, and the influence of perceptual stimuli. Such a choice of strategy is not a deliberate explicit decision, but rather an implicit bottom-up decision. The absence of explicit meta-cognitive control may explain why SET players are often unable to clearly describe their strategy. Furthermore, the similarity-based strategy is bottom-up and the dissimilarity-based strategy is top-down. This suggests that there is an implicit shift not only in strategy but also in the type of processes involved. Altogether, it suggests that bottom-up decisions may have a bigger role in cognitive processes and deserve more attention in future studies.

As a possible line of future research in this direction, we would like to draw a parallel between the way our model shifts between strategies and perceptual decision-making models based on a decision threshold. These mathematical models assume integration of sensory evidence until the decision variables reach a decision threshold, after which a categorical choice is made from the alternatives (Smith & Ratcliff, 2004; Usher & McClelland, 2001). Similarly, in our model, evolving activation in memory influenced by items in the visual field can be viewed as an accumulation of sensory evidence, and the resulting probability of retrieval as a decision threshold. However, mathematical models provide no information about the processes that govern the contextual regulation of perceptual decision making (Domenech & Dreher, 2010). In contrast, our model provides a set of perceptual and cognitive processes backed by theory. As such, integration of mathematical and ACT-R-based models may provide much more insight into the domains of decision making and problem solving in general.

7.2.3. Predictability and learning in problem-solving task

In our opinion, the ability to predict is a useful but underappreciated process of human cognition. There are limitations on the amount of visual information a human brain can process, and many consider selective attention shifts to be a mechanism for dealing with this limitation. However, recent studies suggest that prediction also plays an important role in mitigating processing limitations (Alink, Schwiedrzik, Kohler, Singer, & Muckli, 2010; Soga, Akaishi, & Sakai, 2009). Important parts of visual stimuli that are not processed are predicted based on previous experience. Furthermore, prediction is used to anticipate future stimuli. A recent study showed that the predictability of the environment has a significant influence on the decision-making process (Domenech & Dreher, 2010).

In this article, we showed how the predictability of the environment, in combination with a player’s proficiency, influences decision making. In our model, differences in the ability and accuracy of prediction were able to explain the major differences between fast and slow players. This suggests that prediction may have an important role not only in decision making but also in the learning process.

8. Conclusion

It is our hope to contribute to the understanding of visual cognition where both internal conceptual knowledge and external perceptual stimuli converge in a goal-driven task. As one step toward this goal, we have studied the importance of perceptual and cognitive processes in complex tasks requiring both internal planning and reaction to perceptual stimulus from the environment.

First, there is an interaction between the two types of processes in accomplishing an immediate task. This interaction involves both sequential cooperation and parallel competition, with an emphasis on the latter. Such competition gives top-down processes a chance to gain an edge over faster, but limited, bottom-up processes.

Next, both bottom-up and top-down processes are involved in decision making. On the one hand, bottom-up processes can influence top-down decisions. On the other hand, a bottom-up process, such as evolving memory activations, can result in a decision without the need for top-down control. This suggests that decision making may not be an exclusively explicit process.


1. SET is a game by Set Enterprises (

2. The sum of distribution numbers exceeds the total number of trials because the same trial can have two or more most abundant-value groups of equal size but different attribute types.

3. To eliminate possible influence of the highlighted card, only trials without highlighted cards were considered in calculating the proportion.


The probability of k subsequent fixations falling on cards that share at least one value in common, assuming that the fixations are random, is calculated with the following equation:

$$P(k) = \sum_{i}\sum_{j} \frac{n_{ij}}{11}\left(\frac{n_{ij}-1}{11}\right)^{k-1}$$

where k is the number of fixations in the fixation subsequence, and $n_{ij}$ is the number of cards in the array of 12 cards that have value j for attribute i.

Before further explanation, one should consider that this analysis is done on a collapsed fixation sequence, in which consecutive fixations on the same card are treated as a single fixation; therefore, the next fixation always falls on another card.

Let us assume that there are five green cards among the 12 cards on the desk. If we assume that the subject is always fixating on one of the cards before fixating on another card, then the number of possible cards on which the subject can fixate next is 11. The probability of randomly fixating on any particular one of those 11 cards is 1/11. Now, if we assume that the subject started looking at green cards, then the probability of the first fixation falling on any green card is 5/11. However, the probability of a second consecutive fixation on another green card is 4/11, as the subject is already fixating on one of the green cards. The probability of each subsequent consecutive fixation will be 4/11 as well. If the subject made seven consecutive fixations on green cards, then the probability of the entire block of fixations will be $(5/11) \times (4/11)^6$. If, instead, we want to calculate the probability of seven consecutive fixations on cards that share any attribute value (not just green color), then it will be the sum of the probabilities for each individual attribute value.
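The arithmetic of this worked example can be sketched in a few lines of Python. This is only an illustration of the calculation as described in the text, not the analysis code used in the study. The function names and the representation of a card as a tuple of attribute values are assumptions made for the example.

```python
from collections import Counter


def block_chance_probability(n_matching, k, n_candidates=11):
    """Chance probability of k consecutive fixations on cards sharing one value.

    n_matching   -- number of cards in the array that have the value
    k            -- number of fixations in the (collapsed) subsequence
    n_candidates -- cards available for the next fixation (12 - 1 = 11)
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    first = n_matching / n_candidates           # e.g., 5/11 for five green cards
    following = (n_matching - 1) / n_candidates  # e.g., 4/11 for each later fixation
    return first * following ** (k - 1)


def any_value_chance_probability(cards, k):
    """Sum the block probability over every attribute value in the array.

    cards -- hypothetical representation: a list of equal-length tuples,
             one entry per attribute (e.g., color, shape, shading, number)
    """
    n_candidates = len(cards) - 1
    total = 0.0
    for attribute in range(len(cards[0])):
        counts = Counter(card[attribute] for card in cards)
        for n_matching in counts.values():
            total += block_chance_probability(n_matching, k, n_candidates)
    return total


# Seven consecutive fixations on green cards when five of 12 cards are green.
p_green = block_chance_probability(5, 7)
```

Here `p_green` evaluates the product of 5/11 for the first fixation and 4/11 for each of the six fixations that follow.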

If the calculated probability of a block of k fixations is below 0.05, then the block is assumed not to have been produced by chance. The blocks are calculated for each attribute type. If two blocks of fixations from different attributes overlap, then the block with the lower chance probability is preferred. The other block is cut at the point of overlap, and its probability is calculated again based on the block’s new length. If the two blocks overlap and have an equal chance probability, then the longer block is preferred. If the lengths are also equal, then one of the blocks is randomly chosen and removed. Finally, the Holm–Bonferroni correction was applied to the initial significance level of 0.05. The correction compensated for the inflation of the chance probability when multiple solid blocks are present in the same trial.
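As an illustration of the final step, the standard Holm–Bonferroni step-down procedure can be sketched as follows. The text does not specify how the correction was applied within a trial, so this sketch assumes that the chance probabilities of all blocks in one trial are corrected together. The function name is hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans marking which p-values remain significant.

    Standard step-down procedure: sort the p-values in ascending order,
    compare the smallest with alpha / m, the next with alpha / (m - 1),
    and so on; stop at the first p-value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda index: p_values[index])
    significant = [False] * m
    for rank, index in enumerate(order):
        if p_values[index] <= alpha / (m - rank):
            significant[index] = True
        else:
            break  # all larger p-values are also treated as non-significant
    return significant
```

With three blocks in a trial, for example, the block with the smallest chance probability is tested against 0.05/3, the next against 0.05/2, and the last against 0.05.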