Because our experiments test the semantics involved in chess chunks, we briefly comment on analogous experiments on semantics and perception. Sachs (1967), for instance, presented participants with an original paragraph and subsequently asked them whether particular sentences had been present in it. Participants easily noticed that a sentence whose meaning departed from the original could not have been in the text. However, they would identify as “the correct sentence” a sentence that shared words and meaning with the original text but was not in it. Participants thus retained the essence of the message but not its specific wording. The key point we would like to stress is that memory is more prone to detect semantic similarity than surface similarity.

These results resonate with the influential study of Chi, Feltovich, and Glaser (1981), which showed that in physics, novices paid excessive attention to the surface features of a problem, whereas experts were quickly able to identify the basic physics principle underlying each problem. Novice physics students tended to classify problems by their surface similarity, with classes such as “always a block of some mass hanging down,” “velocity problems,” “rotation,” “inclined plane,” and so on. Expert physicists, on the other hand, tended to classify the problems by the underlying physics principles that should be applied to them, regardless of their surface structure; their classes thus consisted of “Newton's third law,” “conservation of energy,” “conservation of linear momentum,” and so on. Similar findings have appeared in domains ranging from scientific knowledge to taxi driving (Chi, 1993). Here again, semantic aspects seem to take precedence in accessing long-term memory: novice students have not yet acquired the fundamentals of physics that rapidly lead one to the essence of a particular problem.

#### 2.2. Participants

The experiment was conducted at chess clubs and tournaments organized by the Chess Federation of Rio de Janeiro (FEXERJ). Forty-three chess players with varying degrees of chess skill and experience participated. However, results from 7 participants had to be discarded because of errors or discrepancies in following the procedure (4 participants felt burned out after playing a game; a fifth asked for aspirin during the experiment; and 2 others, after being properly introduced to the task, attempted to cluster positions instead of pairing them). The remaining 36 participants represent a sample of 4.5% of all players registered with FEXERJ. Participants were divided into two groups according to the Elo rating (Elo, 1978) they had obtained in the federation's tournaments. The expert group consisted of 22 players with Elo ratings of 1,600 points or more, averaging 1,942 points (*SD* = 168). Although this group consisted mostly of Class B and Class A players, some held master-level FIDE ratings. The novice (control) group consisted of 14 players with FEXERJ ratings of at most 1,599, averaging 1,299 points (*SD* = 206); three players in this group were still unrated.

#### 2.3. Materials

In our experimental setting, 20 chess positions were carefully designed so that the key abstract roles played by the pieces in each position could also be found in another, usually very different, position; the theory therefore predicted 10 pairings. The abstract roles used are listed in Table 1. Ten of these positions were “control positions,” forming five pairs that were very similar in their specific POS configuration (differing only by a single pawn added or displaced to another square). In all positions, white was to move. The positions were easy to solve, with the possible exception of position 13. We chose easy positions because (a) cognitive scientists unfamiliar with chess can grasp them without difficulty, (b) a comparison between experts and novices could provide insightful results, and (c) studying the errors that novices make might reveal some fundamental points about the learning process.

Table 1. Abstract roles used in designing the pairs of positions

| Predicted pairings | Abstract roles |
|---|---|
| 1–7 | White moves a piece to a protected square and checkmates |
| 2–4 | White moves a piece to a square guarded by black and holds a discovered checkmate |
| 3–16 | Pawn structures block passage; kings are unable to strike |
| 5–9 | White has a piece that can simultaneously attack the black king and other strong piece(s) in one move (absolute fork), leading to significant material gain |
| 6–10 | White king and passed pawn cooperate in threatening to promote; black king must defend against both threats and is overwhelmed by the task |
| 8–20 | Black king has restricted mobility (unable to move); white can sustain that situation by sacrificing and has a knight at close range |
| 11–19 | White has a pawn chain with a passed pawn; white's king is able to make black's king retreat from protecting it |
| 12–13 | The pawn structures are perceived to block each other |
| 14–17 | Pawn chains immovable; bishops unable to attack |
| 15–18 | Pawn chains immovable; white bishop capable of attack, black bishop unable to attack |

These control pairs were specifically devised to check whether players in the different groups would perceive strategic distinctions between two positions that seemed similar on the surface. As mentioned above, the positions in each control pair differed by at most one piece moved to another square or one pawn inserted; this did not alter the POS structure significantly but could drastically alter the strategic situation. Because the POS structure is nearly unchanged, traditional theories predict that both positions in a control pair should retrieve a similar set of chunks from long-term memory; in other words, according to these theories, the two positions should seem similar. We refer the reader to the appendix for the full set of positions and the corresponding commentary.

#### 2.4. Procedure

The positions were shuffled into random order and numbered 1 to 20 in that order. Participants were given two simple questionnaires. The first presented each position on a separate sheet and asked whether the participant felt that the position was a win for white, a win for black, or a draw; participants were also asked to give white's first move. This phase was intended to familiarize players with each position in preparation for the pairing task. Positions were presented in the same shuffled order in both questionnaires, and all position numbers refer to this order (e.g., according to our theory, position 1 should be matched with position 7). The matching predicted by our model is given in Table 1.

In the second phase, participants were presented with two sheets containing all 20 positions (in the same shuffled order) and a third sheet containing the numbers 1, 2, …, 20 arranged in a circle. They were told that their task was to find 10 pairings of these positions and to draw lines between the corresponding numbers. Players were specifically instructed to look for “similarities of strategic vision,” “essence, not appearance,” and their particular “feelings for how the positions will evolve strategically.” No further instructions were given. Participants took around 20 min to match up the positions.

The main dependent variable in the experiment is the number of pairs matched as predicted by the hypothesis that abstract roles determine similar strategic scenarios (i.e., the pairings listed in Table 1). A second dependent variable is the number of control pairs matched.
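Both dependent variables amount to counting the overlap between a participant's ten drawn lines and a reference set of pairs. The Python sketch below shows one way such a response could be scored; the function names and the example response are hypothetical, not the study's own tooling. Because the control pairs are listed only in the appendix, they are not hard-coded here and would simply be passed to `count_matches` as a second reference set.

```python
from collections.abc import Iterable

# Predicted pairings from Table 1 (position numbers in the shuffled 1-20 order).
PREDICTED_PAIRS = [
    (1, 7), (2, 4), (3, 16), (5, 9), (6, 10),
    (8, 20), (11, 19), (12, 13), (14, 17), (15, 18),
]


def normalize(pairs: Iterable[tuple[int, int]]) -> set[frozenset[int]]:
    """Turn (a, b) tuples into unordered pairs so that (7, 1) equals (1, 7)."""
    return {frozenset(p) for p in pairs}


def is_perfect_matching(pairs: Iterable[tuple[int, int]], n: int = 20) -> bool:
    """True if the pairs use every position 1..n exactly once."""
    flat = [x for p in pairs for x in p]
    return len(flat) == n and sorted(flat) == list(range(1, n + 1))


def count_matches(participant_pairs, reference_pairs) -> int:
    """Number of the participant's pairs that also appear in the reference set."""
    return len(normalize(participant_pairs) & normalize(reference_pairs))


# Hypothetical response: partners swapped between the 2-4 and 5-9 pairs.
answer = [(1, 7), (2, 9), (3, 16), (5, 4), (6, 10),
          (8, 20), (11, 19), (12, 13), (14, 17), (15, 18)]
assert is_perfect_matching(answer)
print(count_matches(answer, PREDICTED_PAIRS))  # 8
```

Scoring on unordered pairs matters because a participant may draw a line from either end, so (1, 7) and (7, 1) must count as the same pairing.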

Before we proceed to the results, it is important to consider how informative a matching of 10 pairs is. How reliable should such results be? Suppose an expert player finds exactly the matching predicted by our “abstract role” theory of chunking in chess. What is the likelihood of that event happening by chance? If such a “false-positive” result were likely, the robustness of the experiment would clearly be questionable. We therefore analyze the combinatorics of such pairings and compute the probability of a false-positive result arising in our experiment. If there are $N$ positions in a set, imagine that a participant picks any one of them and looks for its pair. At this stage there are $N-1$ candidate partners, so the first level of the decision tree has $N-1$ branches. Once that pair is fixed, $N-2$ positions remain; picking any one of them, the participant now has $N-3$ candidate partners, so the decision tree holds $(N-1)(N-3)$ end nodes. Because this reasoning extends to the entire set of positions until a single pair remains, the number of distinct pairings for an even number $N$ of positions is

$$(N-1)(N-3)(N-5)\cdots 1 = \prod_{i=1}^{N/2}(2i-1) = (N-1)!!.$$

This number grows very rapidly with $N$, a clear case of combinatorial explosion.
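The decision-tree argument can be checked mechanically. The Python sketch below (function names are illustrative) compares the closed-form product with a brute-force enumeration of all pairings for small even $N$. The enumeration mirrors the branching described above: fix one position, try every possible partner, and recurse on the positions that remain.

```python
from math import prod


def count_pairings_formula(n: int) -> int:
    """(n-1)(n-3)...1 = prod_{i=1}^{n/2} (2i - 1), for even n."""
    if n % 2:
        raise ValueError("n must be even")
    return prod(2 * i - 1 for i in range(1, n // 2 + 1))


def enumerate_pairings(items):
    """Yield every way to split `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # Branch: `first` can be paired with any of the len(rest) remaining items.
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in enumerate_pairings(remaining):
            yield [(first, partner)] + tail


# Compare the closed form with brute-force enumeration for small even n.
for n in range(2, 11, 2):
    brute = sum(1 for _ in enumerate_pairings(list(range(1, n + 1))))
    assert brute == count_pairings_formula(n)
    print(n, brute)  # 2 1, 4 3, 6 15, 8 105, 10 945

print(count_pairings_formula(20))  # 654729075
```

Brute force is only feasible for small sets; the closed form is what makes the $N = 20$ case tractable.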

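With 20 positions, there are $19 \cdot 17 \cdots 3 \cdot 1 = 654{,}729{,}075$ distinct pairings, so a participant pairing the positions at random would produce our predicted matching with the minuscule probability $1/654{,}729{,}075 \approx 1.52735 \times 10^{-9}$. It is exceedingly unlikely that a participant would arrive at our predicted matching of the 20 positions by chance, and the possibility vanishes if numerous participants find it independently.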

#### 2.5. Results

Given the prominence of POS information, traditional theories might expect matches between positions with greater POS overlap. Our theory, on the other hand, predicts that a large number of chess players would match the positions based on the roles that pieces play, not on the types of pieces, their number, or their specific squares. In fact, 19 participants (53% of our sample) matched the 20 positions exactly as predicted by the theory. The chance of this being a false-positive result is negligible: if each participant paired the positions at random and independently of the others, the probability that 19 given participants would all reproduce the predicted matching is $(1/654{,}729{,}075)^{19} \approx 3 \times 10^{-168}$.
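The chance model behind these figures can be made explicit. The Python sketch below assumes, as a simplification, that a random responder picks one of the possible pairings uniformly and that participants respond independently; the variable names are illustrative. Exact rational arithmetic via `fractions.Fraction` avoids any rounding before the final conversion to floating point.

```python
from fractions import Fraction
from math import prod

N_POSITIONS = 20
N_PERFECT = 19  # participants who reproduced the predicted matching exactly

# Number of ways to split 20 positions into 10 pairs: 1 * 3 * 5 * ... * 19.
n_pairings = prod(range(1, N_POSITIONS, 2))

# Assumed chance model: one pairing drawn uniformly at random per participant.
p_single = Fraction(1, n_pairings)

# Probability that 19 independent random responders all hit the predicted matching.
p_joint = p_single ** N_PERFECT

print(n_pairings)        # 654729075
print(float(p_single))   # about 1.527e-09
print(float(p_joint))    # about 3.1e-168
```

A uniform model is the most generous baseline for chance; any bias of real participants toward surface (POS) similarity would, if anything, move random responses away from the role-based matching.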

This indicates that, from the chess players' perspective, the paired positions are highly similar at the level of strategic vision (what the essence of a position feels like) rather than at the surface level (what the position looks like). We attribute this similarity to the perception of similar sets of abstract roles, which were used to create the set of positions in the first place.

Comparing the two participant groups, we obtained the following findings. In the expert group, 16 of the 22 participants matched all 10 pairs predicted by our theory, with a mean of 9.32 pairs (*SD* = 1.21). In the novice group, however, only 3 of the 14 participants matched all 10 predicted pairs, with a mean of 5 pairs (*SD* = 3.44).

For the control pairs (positions with very similar surface structure, as given by high POS overlap, but very different strategic structure), experts matched a mean of 0.14 pairs (*SD* = 0.47), whereas novices matched a mean of 1.71 pairs (*SD* = 1.82). The performance measures of experts and novices on the two types of problems are presented in Table 2.

Table 2. Performance of experts and novices on the two types of problems

| Problems | Groups | Subjects | Mean | Std. Deviation | Std. Error | 95% CI for Mean: Lower Bound | 95% CI for Mean: Upper Bound | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| Pairs expected by theory | Experts | 22 | 9.32 | 1.211 | 0.258 | 8.78 | 9.85 | 6 | 10 |
| | Novices | 14 | 5.00 | 3.442 | 0.920 | 3.01 | 6.99 | 0 | 10 |
| Control positions (pairs) | Experts | 22 | 0.14 | 0.468 | 0.100 | −0.07 | 0.34 | 0 | 2 |
| | Novices | 14 | 1.71 | 1.816 | 0.485 | 0.67 | 2.76 | 0 | 5 |
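The standard errors and interval bounds in Table 2 can be recomputed from each row's mean, standard deviation, and group size. The Python sketch below does this for both 95% and 99% t-based intervals. To stay dependency-free, it hard-codes two-sided critical t values from standard tables for 21 and 13 degrees of freedom; this is an assumption of the sketch, not a record of the software used in the original analysis.

```python
from math import sqrt

# Two-sided critical t values from standard t tables, keyed by (level, df = n - 1).
T_CRIT = {
    (0.95, 21): 2.080, (0.95, 13): 2.160,
    (0.99, 21): 2.831, (0.99, 13): 3.012,
}


def mean_ci(mean: float, sd: float, n: int, level: float) -> tuple[float, float, float]:
    """Return (standard error, lower bound, upper bound) of a t-based CI."""
    se = sd / sqrt(n)
    half_width = T_CRIT[(level, n - 1)] * se
    return se, mean - half_width, mean + half_width


rows = [
    ("Expected pairs", "Experts", 9.32, 1.211, 22),
    ("Expected pairs", "Novices", 5.00, 3.442, 14),
    ("Control pairs", "Experts", 0.14, 0.468, 22),
    ("Control pairs", "Novices", 1.71, 1.816, 14),
]
for problem, group, m, sd, n in rows:
    for level in (0.95, 0.99):
        se, lo, hi = mean_ci(m, sd, n, level)
        print(f"{problem:15} {group:8} {level:.0%} CI: SE={se:.3f} [{lo:.2f}, {hi:.2f}]")
```

Differences in the second decimal place arise because the inputs are the rounded means and standard deviations as published.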

Our theory also predicts that, as players advance to higher skill levels, they become able to perceive previously unforeseen abstract roles, which leads to the following hypothesis:

Hypothesis 1: There is a positive correlation between player ratings and the number of pairings predicted by our theory.

To test this hypothesis, an analysis of variance (ANOVA) was carried out to determine whether the difference between the groups' mean numbers of pairs matched as predicted was statistically significant, at a significance level of 1% (*p* < .01). More advanced players matched more of the pairs postulated by the theory, and the difference between groups is statistically significant, *F*(1, 34) = 29.35, *MSE* = 5.43, *p* < .01.

Our theory also predicts that as players advance in skill level, they will be less likely to match superficially similar positions (on a POS basis):

Hypothesis 2: There is a negative correlation between player ratings and the number of pairings from the control group of positions.

To test Hypothesis 2, another ANOVA was carried out to determine whether the difference between the groups' mean numbers of control pairs matched was statistically significant, again at the 1% level (*p* < .01). More advanced players matched fewer control pairs than beginners, and the difference between groups is statistically significant, *F*(1, 34) = 15.26, *MSE* = 1.40, *p* < .01.
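With only two groups, each of these ANOVAs can be reconstructed from the summary statistics in Table 2 alone. The Python sketch below (the function name is illustrative) computes the between-groups and within-groups mean squares from each group's size, mean, and standard deviation. Because the inputs are the rounded published means, the recomputed *F* values differ slightly from the reported ones.

```python
def anova_two_groups(n1, m1, sd1, n2, m2, sd2):
    """One-way ANOVA for two groups, computed from summary statistics.

    Returns (F, MSE, (df_between, df_within)).
    """
    n = n1 + n2
    grand_mean = (n1 * m1 + n2 * m2) / n
    # Between-groups sum of squares: deviations of group means from the grand mean.
    ss_between = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
    # Within-groups sum of squares, recovered from the sample standard deviations.
    ss_within = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    df_between, df_within = 1, n - 2
    ms_error = ss_within / df_within
    return (ss_between / df_between) / ms_error, ms_error, (df_between, df_within)


# Experts (n = 22) vs. novices (n = 14), values from Table 2.
F1, mse1, df1 = anova_two_groups(22, 9.32, 1.211, 14, 5.00, 3.442)
F2, mse2, df2 = anova_two_groups(22, 0.14, 0.468, 14, 1.71, 1.816)
print(f"Expected pairs: F{df1} = {F1:.2f}, MSE = {mse1:.2f}")
print(f"Control pairs:  F{df2} = {F2:.2f}, MSE = {mse2:.2f}")
```

With two groups this ANOVA is equivalent to a pooled-variance *t* test (*F* = *t*²). A *p* value would come from the *F* distribution with (1, 34) degrees of freedom, for example `scipy.stats.f.sf(F1, 1, 34)` if SciPy is available.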

An important point concerns the relation between the pairing judgments and the familiarization phase of the experiment, in which participants classified positions as a win, draw, or loss for white and suggested a move. It has long been established that more highly skilled players are better at assessing win/draw situations and selecting abstract attacking themes (Charness, 1981a, b). Our study is consistent with this: participants who were unable to perceive the core dynamics of a position, as shown by a wrong assessment and move suggestion, were also unable to identify the pairings expected by the theory. It is thus understandable that the higher the skill level, the higher the number of expected pairings: participants who cannot meaningfully perceive *a single* strategic situation cannot be expected to perform better than chance at identifying how that situation may be similar to others.

Finally, it is worth pointing out that the 3 unrated beginners included in this study matched four of the five control pairs as “most similar strategically.” Because these control positions share the highest number of POS combinations, it seems plausible that novices have great difficulty perceiving the abstract relations that constitute the essence of a position and tend to be misled by surface appearances. Although this might seem a trivial remark, it might have implications for cognitive computational architectures.