Perhaps Unidimensional Is Not Unidimensional

Correspondence should be sent to Pennie Dodds, School of Psychology, University of Newcastle, Callaghan, NSW 2308, Australia. E-mail: pennie.dodds@newcastle.edu.au

Abstract

Miller (1956) identified his famous limit of 7 ± 2 items based in part on absolute identification—the ability to identify stimuli that differ on a single physical dimension, such as lines of different length. An important aspect of this limit is its independence from perceptual effects and its application across all stimulus types. Recent research, however, has identified several exceptions. We investigate an explanation for these results that reconciles them with Miller’s work. We find support for the hypothesis that the exceptional stimulus types have more complex psychological representations, which can therefore support better identification. Our investigation uses data sets with thousands of observations for each participant, which allows the application of a new technique for identifying psychological representations: the structural forms algorithm of Kemp and Tenenbaum (2008). This algorithm supports inferences not possible with previous techniques, such as multidimensional scaling.

1. Introduction

Absolute identification (AI) is the task of identifying stimuli that vary only on one physical dimension; for example, tone frequency (e.g., Hartman, 1954; Pollack, 1952), tone intensity (e.g., Garner, 1953), or line length (e.g., Lacouture, 1997). In a typical AI task, stimuli are first presented to the participant one at a time, each with a unique label. In the test phase, the participant is then presented with randomly selected stimuli from the set and asked to recall the associated labels.

Miller’s (1956) classic article investigated limits in both short-term memory and in AI, and found that 7 ± 2 was not only the number of chunks that can be held in short-term memory but was also the mean number of items that individuals could learn to identify in such a unidimensional stimulus set. Specifically, Miller referred to a limitation in information processing capacity. He found that the average observer was only able to correctly identify seven stimuli, equivalent to transmitting approximately 2.6 bits of information (a form of measurement derived from information theory) from the stimulus set to the response choices. Miller was most surprised at the small variation in this capacity limit. Across a broad range of stimulus types, not more than nine stimuli were reliably identified (from electric shocks, to saltiness: e.g., Lacouture, Li, & Marley, 1998; Pollack, 1952; Garner, 1953). This upper limit is particularly surprising because it is apparently resistant to many experimental manipulations that greatly improve performance in other tasks, such as extensive practice (e.g., Weber, Green, & Luce, 1977), changes in the size of the choice set (e.g., Garner, 1953), and stimulus spacing (e.g., Braida & Durlach, 1972). These results, among others, led researchers to conclude that Miller’s capacity limit was a fundamental aspect of human information processing rather than a sensory limitation. Even though Miller’s original analyses were applied to averages over subjects, subsequent work confirmed that these limits applied to individual observers, provided sufficient data were obtained from each (e.g., Weber et al., 1977).
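To make the information-theoretic measurement concrete, the information transmitted in an identification task is the mutual information between presented stimuli and responses, which can be estimated directly from a confusion matrix of trial counts. The sketch below (Python; the toy matrix is ours, purely for illustration) shows the calculation: perfect identification of seven equally likely stimuli transmits log2(7) ≈ 2.8 bits, and a capacity of roughly 2.6 bits corresponds to about six perfectly identified, equally likely alternatives.

    import numpy as np

    def transmitted_information(confusions):
        """Estimate information transmitted (in bits) from a stimulus-response
        count matrix: rows index presented stimuli, columns index responses."""
        joint = confusions / confusions.sum()            # joint probability p(s, r)
        p_s = joint.sum(axis=1, keepdims=True)           # marginal probability of each stimulus
        p_r = joint.sum(axis=0, keepdims=True)           # marginal probability of each response
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = joint * np.log2(joint / (p_s * p_r))
        return np.nansum(terms)                          # mutual information I(S; R)

    # Perfect identification of 7 equally likely stimuli: log2(7), about 2.81 bits
    print(transmitted_information(100 * np.eye(7)))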

Recent work has identified exceptions. One of Rouder, Morey, Cowan, and Pfaltz’s (2004) three participants was able to learn to perfectly identify 20 line lengths. Dodds, Donkin, Brown, and Heathcote (2011) reported similar learning effects for line lengths, and also for dot separation, line angle, and tone frequency. These findings contradict Miller’s (1956) assertion of a small upper limit to memory processing capacity, and so they could represent a theoretically important finding: Perhaps the information capacity limit is not as fundamental or as fixed as previously assumed. The theoretical implications of this result could be wide ranging. At the very least, this result would challenge all existing theories of identification, which assume that learning has no effect (including Stewart, Brown, & Chater’s, 2005; Petrov & Anderson’s, 2005; Marley & Cook’s, 1986; and Brown, Marley, Donkin, & Heathcote’s, 2008).

There is, however, an alternative explanation. The number of stimuli that can be reliably identified increases exponentially as the number of dimensions increases (Eriksen & Hake, 1955; Miller, 1956; Rouder, 2001), at least when those dimensions can be perceived independently (“separable” dimensions: Nosofsky & Palmeri, 1996). For example, people are able to identify hundreds of faces, names, and letters, all of which vary on multiple dimensions. To take a simple example, if an observer could perfectly identify, say, seven line lengths and also seven angles, he/she might be able to identify 49 different stimuli with these combined features, such as circle sectors. With one key assumption, this line of reasoning might reconcile the new learning effects observed by Rouder et al. (2004) and Dodds et al. (2011) with the long-standing results of Miller. The extra assumption is that some stimulus sets which vary on just one physical dimension might nevertheless invoke more complex psychological representations. As with physically multidimensional stimuli, more complex psychological representations might support richer percepts, perhaps allowing multiple ways to estimate the magnitude of a stimulus and hence better identification.

This extra assumption is not unprecedented. Even though the stimuli used in AI always vary on just one physical dimension, this does not guarantee that the corresponding psychological representations are unidimensional. For example, perceived hue is represented either on a circle or a disk (depending on saturation: Shepard, 1962; MacLeod, 2003) and the psychological representation of pitch is a helix (Bachem, 1950), even though the corresponding physical stimuli vary on only one dimension (wavelength, in both cases). In Dodds et al.’s (2011) and Rouder et al.’s (2004) studies, it might have been that those observers who learned to identify stimuli beyond Miller’s (1956) limit managed this feat by constructing more complex psychological representations for the unidimensional stimuli. If these observers had access to percepts on dimensions that are even partially independent, this could explain their improved performance without challenging Miller’s long-standing hypothesis that performance on any single dimension is severely limited.

1.1. Examining psychological representation

In the absence of additional evidence, there is an unsatisfying circularity to the hypothesis linking complex psychological structure with learning. The only evidence to suggest that some physically unidimensional stimuli might have more complex psychological representations is that those same stimuli support learning beyond Miller’s (1956) limit. The only tested prediction from the hypothesized complex representation is that those same stimuli can be learned well. One method of independently probing psychological representation is to use multidimensional scaling (MDS; Cox & Cox, 1994, 2001). MDS determines relationships between objects by examining estimates of the perceived similarity of pairs of the objects. In some cases, such as with color, MDS techniques are able to reliably infer the complex psychological representation underlying apparently unidimensional stimuli. This success presumably depends on the clear and consistent form of the representation across different people—allowing data to be averaged across subjects. In turn, the consistency of the psychological representation across subjects is probably an upshot of the basic physiology of the retina. In less clear-cut cases, MDS is not always sensitive to subtle or inconsistent changes in the form of psychological representations.

Dodds, Donkin, Brown, and Heathcote (2010) collected similarity ratings for line lengths, which was one of the stimulus types that both Rouder et al. (2004) and Dodds et al. (2011) identified as an exception to Miller’s (1956) limit. Dodds et al. (2010) found that MDS was not reliably able to distinguish between one- and two-dimensional representations. The problem is that MDS lacks a principled framework for inference about such arrangements. This means that, if one wishes to recover the number of dimensions that best represents the relationships between objects, the conclusions rest on subjective judgments. Lee (2001) investigated this problem in detail and found that, for one- or two-dimensional representations, MDS correctly identified the number of dimensions only 14% of the time.
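As a minimal illustration of this inferential problem (our own synthetic example, using scikit-learn’s metric MDS; none of the values come from the experiments discussed here), stress keeps decreasing as dimensions are added even when the generating structure is a genuinely one-dimensional chain, so choosing the dimensionality from stress values alone is necessarily a subjective judgment:

    import numpy as np
    from sklearn.manifold import MDS

    rng = np.random.default_rng(1)
    positions = np.linspace(0, 1, 16)                       # a genuinely one-dimensional "chain"
    dissim = np.abs(positions[:, None] - positions[None, :])
    noise = rng.normal(0, 0.05, dissim.shape)               # noisy similarity judgments
    dissim = np.clip(dissim + (noise + noise.T) / 2, 0, None)
    np.fill_diagonal(dissim, 0)                             # zero self-dissimilarity

    for k in (1, 2, 3):
        mds = MDS(n_components=k, dissimilarity="precomputed", random_state=0)
        mds.fit(dissim)
        print(f"{k}-dimensional solution: stress = {mds.stress_:.4f}")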

In the more than 50 years since the development of MDS, there have been important advances in statistical algorithms that can uncover geometric structure from similarity data. The new methods include coherent frameworks for statistical inference, supporting objective and quantitative conclusions about psychological structure. One of the most influential new approaches is a Bayesian version of MDS (e.g., Oh & Raftery, 2001), which employs a “penalized likelihood” approach. Bayesian MDS calculates the likelihood of the similarity data under different assumptions about the psychological structure, capturing the notion that structures which fit closely with the observed data are more likely. Most important, the likelihood is penalized by the complexity of the assumed structure: a quantitative version of Occam’s Razor.
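In generic form (our notation, not Oh and Raftery’s specific formulation), a penalized likelihood approach scores each candidate structure S against the similarity data D by trading goodness of fit against complexity,

    \mathrm{score}(S) \;=\; \log p(D \mid S) \;-\; \mathrm{penalty}(S),

where the penalty increases with the number of free parameters in S (in BIC-style approximations, roughly (k/2) log n for k free parameters and n observations), and the structure with the highest score is preferred.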

Penalized likelihood provides a chance to investigate our core hypothesis about the complexity of psychological structures. However, Oh and Raftery’s (2001) Bayesian MDS is not quite the right tool for the job. The difficulty is that, like traditional MDS, Bayesian MDS makes very weak assumptions about the shape of the psychological structure. This can be an important advantage in exploratory usage, because it does not impose preconceptions on the data, but it can also be a disadvantage. In previous investigations of multidimensional psychological structure arising from unidimensional stimuli, very specific structures were identified. For example, in the perception of color, long wavelength light has a perceived hue (red) that is similar to the hue perceived for short wavelength light (violet), which is represented as a ring structure. In the perception of pitch, notes are perceived as similar to those an octave above, but dissimilar to those in between (e.g., A3, at 220 Hz, is perceived as similar to A4, at 440 Hz, but quite different from the notes in between). This structure is represented as a helix. Ring and helix structures can sometimes be successfully identified by Bayesian MDS, because they can be embedded as lower dimensional manifolds in higher-dimensional Euclidean spaces. However, the penalized likelihood approach can also be biased against identifying these structures: Too-large penalties can be applied, because the algorithm can treat the complexity of these structures as equivalent to the complexity of the (much larger and more complex) space in which they are embedded.

An elegant solution is to employ the same penalized likelihood approach as Bayesian MDS, but to constrain the algorithm to estimate only the simpler structures (ring, or helix) and apply appropriate complexity penalties. This approach brings with it a danger of misfitting—if the true psychological structure is some complex shape that is neither a ring nor a helix, then the results might be misleading. To avoid that problem, we also estimate the completely general multidimensional structures (with correspondingly heavy complexity penalties) that are standard in MDS.

We use Kemp and Tenenbaum’s (2008) “structural forms” algorithm for these analyses. The structural forms algorithm is based on a universal grammar for generating graphs, and the generality of those graphs allows the algorithm to coherently compare constrained structures (such as rings, helices, and one-dimensional chains) with the most flexible approach of estimating points in vector spaces (as in Bayesian MDS). The structural forms algorithm employs penalized likelihood for statistical inference, with the penalty for complexity based on a Bayesian approach, similar to Oh and Raftery’s (2001) Bayesian MDS, but one that easily allows comparison of constrained and free structures.

We use Kemp and Tenenbaum’s (2008) algorithm to investigate the psychological representation of the stimuli used in AI experiments. We limited our analyses to undirected graph structures only, on the assumption that the similarity of two stimuli should not depend on the order of comparison (or, if it did, that this dependence was not of primary interest). We investigated the likelihood of four of Kemp and Tenenbaum’s forms—the chain, ring, cylinder, and grid (see Fig. 1 for examples of the first two):

Figure 1.

An illustration of (a) a ring structure and (b) a chain structure for lines of varying length. Note that in the chain structure the extreme stimuli (the shortest and longest lines) are far apart, whereas in the ring structure they are close.

  • Chains represent the standard assumptions for AI stimuli: one-dimensional continua, where the psychological distance between stimuli is found by summing the distance from one neighbor to the next, and the next again, and so on.
  • Rings represent just a small increase in complexity from chains, capturing the additional property that stimuli near one end of the set are perceived to have something in common with stimuli at the extreme other end. This kind of relationship is found in the perception of color.
  • Cylinders are more complex again and extend ring structures to allow for successive rings, which differ on a second dimension. The perceived structure of musical pitch (a helix) is a subtype of cylinder.
  • Grids are the most general two-dimensional structures, equivalent to the two-dimensional analyses used in standard MDS and Bayesian MDS. By estimating different leaf lengths, grids allow for arbitrarily close approximation of any two-dimensional arrangement.

To foreshadow our results, we find that the data are almost always best described by chain and ring structures—the extra complexity of cylinders and grids was almost never justified by improvement in data fit. To provide some intuition into the kinds of structures we analyze in data, Fig. 1 shows an example ring and chain (using leaf lengths actually recovered from data).
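To make the chain/ring distinction concrete, the short sketch below (plain Python/numpy; the node count and equal edge lengths are arbitrary choices for illustration, not values estimated from our data) computes the pairwise psychological distances implied by each form: on the chain, distance accumulates from neighbor to neighbor, whereas on the ring the loop-closing edge brings the two extreme stimuli close together.

    import numpy as np

    def graph_distances(edge_lengths, ring=False):
        """Pairwise distances between consecutive nodes joined by the given edge
        lengths. For a ring, the final edge closes the loop (last node to first)."""
        n_nodes = len(edge_lengths) if ring else len(edge_lengths) + 1
        positions = np.concatenate([[0.0], np.cumsum(edge_lengths)])[:n_nodes]
        direct = np.abs(positions[:, None] - positions[None, :])
        if not ring:
            return direct                                  # chain: sum of edges along the line
        circumference = float(np.sum(edge_lengths))
        return np.minimum(direct, circumference - direct)  # ring: shorter way around the loop

    chain = graph_distances(np.ones(7))            # 8 nodes joined by 7 unit edges
    ring = graph_distances(np.ones(8), ring=True)  # 8 nodes joined by 8 unit edges
    print(chain[0, 7], ring[0, 7])                 # extremes: 7.0 on the chain, 1.0 on the ring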

1.2. Data

A direct way to investigate psychological structure relies on similarity estimates obtained by direct interrogation: Participants are presented with two stimuli and asked to rate their similarity on some scale. Such ratings have several problems. First, there is a severe limit on sample size, because participants find it difficult to give many repetitions of these responses. Second, the numerical similarity ratings provided by participants depend on the experimenter’s choices. For example, different ratings would be provided if observers were asked to rate similarity from 1 to 10 or from 0 to 100, or on a Likert scale, and the precise nature of this dependence is not clear. Even more troubling, it is not clear whether similarity ratings obtained by this method are based on the particular psychological representation of interest (the one underlying identification). To circumvent all three problems, we replace similarity judgments with confusion matrices calculated from many thousands of AI trials. These confusion matrices encode how often pairs of stimuli are confused with each other (e.g., when stimulus A is presented, what is the probability that it is identified as stimulus B?). Thus, our assumption is that the probability of confusing two stimuli is monotonically related to their similarity.
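As a minimal sketch of this step (Python; the function names, the 0-based labels, and the toy trials are our own, for illustration only), identification trials are tallied into a confusion matrix, row-normalized into confusion probabilities, and symmetrized so that confusability can stand in for similarity:

    import numpy as np

    def confusion_counts(stimuli, responses, n_items):
        """Tally identification trials: entry [i, j] counts how often stimulus i
        (0-based label) received response j."""
        counts = np.zeros((n_items, n_items))
        for s, r in zip(stimuli, responses):
            counts[s, r] += 1
        return counts

    def confusability(counts):
        """Row-normalize to P(response j | stimulus i), then symmetrize so the
        result can be treated as a similarity-like measure."""
        probs = counts / counts.sum(axis=1, keepdims=True)
        return (probs + probs.T) / 2

    # Toy data: 3 stimuli, 8 trials (stimulus label, response label)
    stim = [0, 0, 1, 1, 2, 2, 2, 0]
    resp = [0, 1, 1, 0, 2, 2, 1, 0]
    print(confusability(confusion_counts(stim, resp, 3)))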

We calculated confusion matrices using the data from four AI experiments reported by Dodds et al. (2011; see Table 1). In all four experiments, participants were given extensive practice over a series of 10 sessions, leading to around 5,000 observations per participant. Each experiment included five or six participants. Three of the experiments used a smaller number of stimuli (15 or 16), allowing fair comparison between different stimulus types. These three experiments included the only one in which participants did not exceed Miller’s (1956) limit of 7 ± 2 stimuli (tone intensity) and two in which they did (line length and dot separation). The fourth experiment used 30 line lengths; we included it because it showed some of the greatest improvement in performance with practice. We chose these four experiments to showcase opposite ends of the performance continuum. The experiment with tones of varying intensity included some of the poorest performance and weakest learning. Two line-length experiments were included: one used a set size comparable to the tone intensity experiment (Experiment 5a), and the other showed some of the strongest learning effects (Experiment 1a).

Table 1. 
Data sets used from Dodds et al. (2011)
Experiment^a    Stimuli            Set Size
1a              Line length        30
2b              Dot separation     15
5a              Line length        16
5b              Tone intensity     16

Note. ^a Experiment refers to the experiment number as listed in Dodds et al. (2011).

We are wary of direct comparison between smaller and larger set size experiments because of the varying statistical reliability of the data sets. The larger set sizes resulted in one quarter as many observations contributing to each element of the confusion matrix—as few as three observations per matrix element. The penalized complexity approach of Kemp and Tenenbaum’s (2008) algorithm means that noisier data lead to a preference for simpler structures—a bias toward identifying chain structures, in our case. This bias might even extend to our smaller set size experiments, because even with 5,000 observations in the smaller set sizes, the average number of observations contributing to each confusion matrix element was between 16 and 20. We return to this point in the Discussion.

To provide an alternative auditory data set, we also ran a new identification experiment using tone frequency. For this experiment, we gave six musically trained participants practice with a set of 36 pure sine tones varying in frequency—their frequencies matched the fundamental frequency of the standard Western-tuned piano notes from A3 to G#6. The procedure for this experiment was similar to the procedure outlined for Experiment 6 in Dodds et al. (2011), except that responses were labeled not only with a number (1–36) but also the corresponding piano note name. There were 10 learning sessions providing 4,860 identifications per participant.
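For reference, the fundamental frequencies of those 36 notes follow from equal-tempered tuning with A4 = 440 Hz (a standard assumption; the code below is only a convenience for reconstructing the stimulus values, not part of the experimental procedure):

    # Equal-tempered fundamentals for the 36 notes from A3 (220 Hz) up to G#6,
    # assuming A4 = 440 Hz: each semitone multiplies frequency by 2 ** (1 / 12).
    frequencies = [220.0 * 2 ** (semitone / 12) for semitone in range(36)]
    print(f"{frequencies[0]:.1f} Hz (A3) ... {frequencies[-1]:.1f} Hz (G#6)")  # 220.0 ... about 1661.2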

2. Results

Confusion matrices were constructed for individual participants for (a) their entire 10 h of practice, (b) the first 5 h of practice, and (c) the last 5 h. Data for individual participants were used as opposed to averaged data because of the large individual variation (see Table 2 for variation in accuracy). Relational feature data were simulated from the confusion matrices using the default methods described by Kemp and Tenenbaum (2008).

Table 2. 
Accuracy and log-likelihood values for each participant in each of the five experiments
Experiment (Stimuli)    Participant    Initial Accuracy    Improvement in Accuracy    Overall Log-Likelihood Difference^a    Early Log-Likelihood Difference^a    Late Log-Likelihood Difference^a

Note. ^a Difference values are calculated by subtracting the likelihood values for ring structures from the likelihood values for chain structures.

Tone intensity (16)    1    0.34    0.1     8.357     19.326    12.25
                       2    0.3     0.19    20.092    18.705    25.209
                       3    0.33    0.12    12.508    26.442    17.106
                       4    0.27    0.09    24.007    27.554    20.384
                       5    0.34    0.08    26.205    23.192    2.165
                       6    0.31    0.13    19.078    19.527    17.521
Line lengths (16)      1    0.58    0.22    −2.542    0.12      −5.116
                       2    0.51    0.38    −3.423    0.171     −4.711
                       3    0.41    0.31    7.041     12.733    4.816
                       4    0.4     0.28    3.344     9.77      −0.223
                       5    0.46    0.30    5.143     13.69     −2.046
                       6    0.57    0.24    −3.365    −3.874    −4.798
Dot separation (15)    1    0.44    0.2     7.852     10.378    8.621
                       2    0.51    0.37    −0.877    4.088     −5.481
                       3    0.53    0.27    4.472     7.229     2.468
                       4    0.39    0.37    5.703     11.38     −12.56
                       5    0.53    0.41    −2.658    2.36      −1.536
Line length (30)       1    0.21    0.26    24.556
                       2    0.18    0.11    3.405
                       3    0.29    0.47    −1.426
                       4    0.2     0.12    4.292
                       5    0.31    0.41    57.852
                       6    0.17    0.24    −22.265
Tone frequency (36)    1    0.4     0.29    −4.273
                       2    0.59    0.31    −0.394
                       3    0.2     0.13    27.8
                       4    0.21    0.09    24.934
                       5    0.19    0.08    22.549
                       6    0.22    0.14    0.634

We used v1.0 (July 2008) of the Matlab implementation of the structural forms algorithm (obtained from the first author’s website). For each confusion matrix, we identified the best chain, ring, cylinder, and grid structure, and recorded the penalized likelihood. In all cases, we used default values for the algorithm’s parameters. We made one modification to the algorithm, for numerical stability, restricting the search over edge lengths to disallow lengths that were extremely close to zero (smaller than e^−10). Note that this restriction still permits edge lengths of precisely zero, because adjacent stimuli can be collapsed into single nodes using the rules of the graph grammar; the restriction only disallows extremely small, but non-zero, separation between stimuli.

Our analyses provided almost no evidence in favor of the two more complex structures. The grid structure was not found to be the best description for any confusion matrix. The cylinder structure was deemed best in only three of the 65 confusion matrices analyzed. Two of those three were early or late half analyses, and the other was a full data set (from one of our best performers—participant 5 from the 30-line-lengths study). Since there was so little evidence in favor of the cylinder and grid structures, from here on we discuss results for the ring and chain structures only.

Table 2 reports differences in penalized log-likelihood between the chain and ring fits. To put the likelihood results in statistical perspective, differences in log-likelihood can be used to approximate the posterior probability that one structure of the pair (chain or ring) was the data-generating structure. This approximation should be interpreted with care, as it relies upon some strong assumptions—for example, that the data-generating model was one of the pair under consideration (e.g., Raftery, 1995). Notwithstanding, using this interpretation, a difference in log-likelihood of two units corresponds to about three-to-one odds in favor of one model over the other, and a difference of six units in log-likelihood to better than twenty-to-one odds.
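Concretely, the quoted correspondences follow Raftery’s (1995) guidelines for BIC-style differences, where a difference Δ corresponds to a Bayes factor of approximately exp(Δ/2); reading the penalized log-likelihood differences in Table 2 on this scale is an assumption on our part about the reporting convention. Under equal prior odds, the posterior probability of the chain structure is then

    \mathrm{BF}_{\text{chain:ring}} \approx \exp\!\left(\frac{\Delta}{2}\right), \qquad
    P(\text{chain} \mid D) \approx \frac{\mathrm{BF}_{\text{chain:ring}}}{1 + \mathrm{BF}_{\text{chain:ring}}},

so that Δ = 2 gives odds of about e ≈ 2.7 (roughly three to one) and Δ = 6 gives odds of about e^3 ≈ 20 (roughly twenty to one).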

2.1. Whole data sets

We report the analyses for smaller set sizes (15 or 16 stimuli) separately from the larger set sizes (30 or 36 stimuli). This allows cleaner comparison within each group because the number of data contributing to each element of the confusion matrices is comparable: about 18 observations per entry for the small set sizes, and about four for the large set sizes.

2.1.1. Small set sizes

The small set size experiments used 16 tones of increasing intensity, 16 line lengths, and 15 dots varying in separation. Participants who practiced tone intensity did not improve their performance much with practice, and their confusion matrices were also unanimously better described by chain structures than ring structures (see Table 2, where positive log-likelihood differences imply support for chain structures over ring structures). These results are consistent with Miller’s (1956) original hypothesis that AI is subject to a severe capacity limit when the stimuli really are unidimensional.

In comparison, there were several confusion matrices from participants who practiced with line lengths and dot separations that were better described by the ring structure than the chain. In these two experiments, the ring structure was deemed more likely for about half of the participants (five of 11; see Table 2). The support for a ring structure is surprising given the data used in these analyses: confusion matrices rather than similarity ratings. It seems conceivable that, if asked for a similarity rating, a participant might rate the extreme edge stimuli as similar, even if they are very unlikely to confuse those stimuli in an identification experiment. This presumably biases our results toward the chain structure, and yet several participants were still better described by ring structures.

Those five participants for whom the ring provided a better description in these experiments were also the ones who demonstrated higher initial identification performance and more improvement with practice. At the beginning of practice (first session), their mean accuracy was 54%, compared with 44% for the participants better described by chain structures, and over the course of practice, those subjects identified as having ring-like representations improved their identification performance by 32% compared with 29% for the chain-like participants. We are hesitant to calculate inferential tests on these differences due to the small number of participants (five in one group, six in the other).

2.1.2. Large set sizes

Table 2 also shows accuracy and log-likelihood differences for the experiments with 30 line lengths and 36 tone frequencies. Four of the twelve participants demonstrated greater likelihood for a ring structure compared to the chain structure. As with the smaller set size experiments, those who demonstrated a ring structure showed greater prepractice performance (Mring = 0.36, Mchain = 0.22) and also greater improvement in performance with practice.

2.2. The effect of practice

To examine whether the improvement in performance was associated with a change in the structure of psychological stimulus representations, we also examined the confusion matrices for each participant in the small set size experiments separately for early (1–5) and late (6–10) practice sessions (see Table 2). We did not examine this split in the data from the large set size experiments because the sample size was too small—an average of fewer than two observations per entry.

For those who practiced tone intensity (Table 2) there was no difference in the estimated structure between early and late sessions: The data from every participant, for both early and late sessions, were always better described by chain structures than rings. For those who practiced line length or dot separation (Table 2), the chain structure was also dominant for early sessions (10 out of 11 participants). For six participants, however, the most likely structure changed from a chain to a ring from early to late sessions. Three participants demonstrated a chain structure both in the early sessions and in the late sessions, and one other demonstrated a ring structure in both early and late sessions. No participant demonstrated the reverse switch—from ring to chain structure. Consistent with the hypothesis that high performance in AI is only possible through more complex psychological representations, the single participant who demonstrated a ring structure throughout practice also had very high performance throughout, and the three participants who demonstrated a chain structure even late in practice were among the poorest performers.

A repeated theme in the above findings is that more complex (ring) structures are associated with better identification and with more improvement with practice. To investigate this more formally, we calculated the correlations between improvement in performance and initial accuracy, on the one hand, and log-likelihood differences between ring and chain structures, on the other (see Fig. 2). Both improvement in accuracy and initial accuracy demonstrated strong negative correlations with log-likelihood differences. That is, smaller log-likelihood differences, representing stronger preference for ring structures, were associated with greater overall improvements in accuracy (r = −.70, p < .001) and greater initial performance (r = −.65, p < .001; see Fig. 2).
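A minimal sketch of this correlation step (Python with scipy; the arrays shown are placeholders rather than our data, and the exclusion bounds simply mirror the outlier criterion stated in the caption of Fig. 2):

    import numpy as np
    from scipy.stats import pearsonr

    def correlate_excluding_outliers(ll_difference, performance, lower=-10, upper=30):
        """Pearson correlation between chain-minus-ring log-likelihood differences
        and a performance measure, excluding extreme log-likelihood differences."""
        ll_difference = np.asarray(ll_difference, dtype=float)
        performance = np.asarray(performance, dtype=float)
        keep = (ll_difference >= lower) & (ll_difference <= upper)
        return pearsonr(ll_difference[keep], performance[keep])

    # Placeholder inputs, just to illustrate the call
    r, p = correlate_excluding_outliers([5.1, -3.4, 12.5, -45.0, 20.3],
                                        [0.20, 0.38, 0.12, 0.50, 0.09])
    print(r, p)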

Figure 2.

Accuracy and improvement in accuracy as a function of the difference in log-likelihood values for ring and chain structures (where a negative log-likelihood difference indicates a preference for a ring structure). Note. Two outliers were removed from this analysis (where the log-likelihood difference was less than −10 or greater than 30).

3. Discussion

For more than 50 years, identification with unidimensional stimulus sets has been thought to be subject to a strict performance limit, Miller’s (1956) magical number 7 ± 2. More recently, Rouder et al. (2004) and Dodds et al. (2011) have shown that some stimulus sets support much greater performance than this limit (including line length and tone frequency), although at least one does not (tone intensity). One way to reconcile these new findings with previous literature is to hypothesize that some stimulus sets, although physically varying on only one dimension, give rise to more complex psychological representations. The data from Dodds et al. and the new structural forms algorithm developed by Kemp and Tenenbaum (2008) provide an opportunity for investigating this hypothesis in a way that was not previously possible because of limitations in analytic tools such as MDS.

Our results provide support for the previously untestable hypothesis that improved identification performance is only possible with more complex psychological representations. When we examined data from the identification of tones varying in intensity (for which identification performance was severely limited), we uniformly found strong support for the simplest unidimensional psychological representation—a chain, the structure assumed by all theoretical accounts of identification. This result was observed for all participants and was also confirmed as the most likely structure in both the early and late practice data. Data from those stimulus sets in which practice improved performance yielded different results. For nearly half of these analyses (14 of 33), the psychological representations of the stimuli were better described by a structure more complex than the chain (rings, in our analyses). This figure rose to 8 of 11 when only data from the second half of practice were considered, in sharp contrast to the 1 of 11 participants identified as using a ring structure in the first half of practice. The hypothesized relationship between identification performance and structure was further supported by strong correlations between performance in practice and the goodness of fit of the ring and chain structures.

To check our results with data from another laboratory, we also analyzed data reported by Rouder et al. (2004). Rouder et al.’s participants practiced the identification of line lengths in a procedure similar to that described above, using set sizes of 13, 20, or 30 line lengths. We restrict our analyses to the two smaller set sizes, because the sample size per element of the confusion matrix became prohibitively small for the 30-line experiment. Only two of Rouder et al.’s three participants took part in these two set sizes. Three of these four analyses revealed data that were best described by the ring structure. The participants who demonstrated a more complex structure were also those who demonstrated higher initial accuracy (Mring = 0.85; Mchain = 0.68).

3.1. Limitations of the study

A natural question is why our analyses did not yield entirely uniform results. That is, if improved performance in the identification task really is supported by more complex psychological representations of the stimuli, why did we not observe such representations for every participant? Two explanations seem plausible. First, in all experiments reported in Dodds et al. (2011) there was considerable variability among participants in identification performance. About half of the participants did not learn to improve their performance beyond Miller’s (1956) limit of 7 ± 2 stimuli, and so it is consistent with our hypothesis that those participants should maintain the simplest (chain) psychological representations. Second, there is an inherent bias favoring the chain structure over the ring structure in noisy data. This bias arises because general noise (such as non-task-related responses and random error) pushes the confusion matrices toward uniformity, and uniform confusion matrices are—according to the structural forms algorithm—better described by chain than ring structures, because of the higher complexity penalty attracted by ring structures.

A potential limitation of these analyses is the small number of participants in each of Dodds et al.’s (2011) experiments. Although we analyzed nearly 400 participant hours of data in total, these were obtained from only 37 participants. Our analyses borrow strength by grouping participants across experiments, and by splitting the large samples of data into early and late practice, resulting in 65 confusion matrices in total. The small number of participants per stimulus type (five or six) always leaves open the possibility that our observed differences between stimulus types are really due to random individual differences. However, the consistency of three of our results speaks against this possibility: Every one of the 18 confusion matrices from the tone intensity experiment was best described by the chain structure; there was a strong relationship between identification performance and the complexity of the psychological representation (Fig. 2); and there was a sizeable difference between the proportion of participants identified as having chain- versus ring-structured representations early and late in practice. These three results are all consistent with the hypothesis that improved identification performance is supported by more complex psychological representations.

3.2. Conclusions

Our results are consistent with the hypothesis that improved performance in an identification task is supported by complex psychological representations—at least, more complex than the unidimensional physical nature of the stimulus sets. The logic of our proposal is as follows: First, we propose that prior to learning, the psychological representation of the unidimensional stimulus sets used in identification tasks is a straightforward unidimensional mirror of the physical reality (i.e., a chain structure). Second, during identification practice, participants somehow manage to use different, partially independent sources of information about the stimulus. Third, they use these new sources of information to develop more complex psychological representations (e.g., ring structure, or something similar). Finally, they use the more complex psychological representation to support improved identification performance in the same way that complex physical structure supports improved performance (e.g., Rouder, 2001).

Our analyses are the first to provide evidence for the third stage in this hypothesis. However, our work does not allow insight into exactly what extra stimulus information participants are using. For example, it is easy to speculate that participants might learn to judge line lengths using information from several sources—perhaps the extent of the retinal image, in addition to the magnitude of the saccade needed to traverse the line, or even cues gained by comparing the line to external objects such as the display monitor. Magnitude estimates obtained from these sources would presumably be highly, but not perfectly, correlated, which could result in psychological representations more complex than unidimensional chains. Further study might examine such hypotheses by attempting to limit the information available from such cues, for example, by presenting visual stimuli using virtual reality goggles.

In summary, it seems that tone intensity was the only stimulus modality that showed consistent evidence for a single underlying psychological dimension. Line length, dot separation, and tone frequency showed evidence for psychological representations more complex than simple chain structures—particularly for highly performing participants and late practice data. The implications of these results for the study of identification are remarkable—if these stimuli are truly represented on multiple dimensions, then unidimensional identification does not apply to them. In the extreme, it might be that the long history of study of unidimensional identification should have been limited to the study of tones varying in intensity. Or, at the very least, the identification of other stimulus types only qualifies as unidimensional as long as participants are not well practiced.

Acknowledgments

We are grateful to Jeff Rouder and Richard Morey for sharing their data for this analysis, and to A.A.J. Marley and Chris Donkin for comments on an earlier version.
