### Abstract

The current study presents a series of computational simulations that demonstrate how the neural coding of numerical magnitude may influence number cognition and development. The phenomena addressed are cataloged in the cognitive literature and include the development of numerical estimation and operational momentum. Although neural research has begun to describe the neural coding of number, it is unclear how specific characteristics of that coding relate to the extensive list of behavioral phenomena in the development of number cognition. The following study considers several possibilities.

### Introduction

Number cognition, broadly speaking, includes numerical estimation, simple arithmetic operations, magnitude judgments, and counting amongst other skills. There is a long history of research on number cognition, including the cognitive and neural processes involving numerical magnitude. Research includes behavioral studies of number development (e.g. Gelman & Gallistel, 1978; Piaget, 1954, amongst others) and more recently a large number of neural studies relevant to number cognition (e.g. Ansari & Dhital, 2006; Ansari, Garcia, Lucas, Hamon & Dhital, 2005; Cantlon, Brannon, Carter & Pelphrey, 2006; Cantlon, Libertus, Pinel, Dehaene, Brannon & Pelphrey, 2008; Cohen Kadosh & Walsh, 2009; Dehaene, Piazza, Pinel & Cohen, 2003; Göbel, Calabria, Farnè & Rossetti, 2006; Pesenti, Thioux, Samson, Bruyer & Seron, 2000; Walsh, 2003; Whalen, McCloskey, Lesser & Gordon, 1997). This increasingly large literature involving humans has been supplemented by research with non-human primates (e.g. Brannon & Terrace, 1998; Nieder & Miller, 2003; Roitman, Brannon & Platt, 2007) and by computational methods that incorporate neural principles (e.g. Ahmad, Casey & Bale, 2002; Dehaene, 2007; Dehaene & Changeux, 1993; Verguts & Fias, 2004; Zorzi, Stoianov & Umiltà, 2004).

Significant contributions to cognitive development have been made through computational modeling that connects neural and behavioral data, in areas such as language learning (Elman, 1993), motor development (e.g. Spencer, Simmering, Schutte & Schöner, 2007), and visual development (e.g. Mareschal & Johnson, 2002). For example, Spencer and colleagues used neurocomputational modeling to provide evidence for a novel interpretation of the classic A-not-B error (Piaget, 1954). By modeling visual-motor neural processes, Spencer and colleagues concluded that the A-not-B error is an example of a broader class of errors that occur in development. The current study presents a series of simulations, based on recent advances in the study of the neural coding of numerical magnitude, that offer new insights into behavioral phenomena described in the developmental literature.

#### Neural coding of number

A variety of investigations with both humans and non-human primates have characterized the neural activity related to the perception of number. First, research has focused on localizing neural activity specific to number. Findings have converged on the intraparietal sulcus and areas of prefrontal cortex (e.g. middle frontal gyrus) in both humans (e.g. Ansari & Dhital, 2006; Ansari *et al*., 2005; Cantlon *et al*., 2006; Cantlon *et al*., 2008; Dehaene *et al*., 2003) and non-human primates (Nieder, Freedman & Miller, 2002; Nieder & Miller, 2003; Nieder & Merten, 2007; Sawamura, Shima & Tanji, 2002). Numerical coding activity has been recorded in both the intraparietal sulcus and prefrontal cortex, two areas that have been found to be functionally connected (Cavada & Goldman-Rakic, 1989; Chafee & Goldman-Rakic, 2000; Quintana, Fuster & Yajeya, 1989). Neural activity in these areas has been recorded in tasks such as number magnitude comparison, arithmetic operations, and even the perception of a digit. The basic result has been replicated across presentation formats, such as dot displays and written digits (Eger, Sterzer, Russ, Giraud & Kleinschmidt, 2003), and across cultures (Tang, Zhang, Chen, Feng, Ji, Shen, Reiman & Liu, 2006).

Second, studies have used direct neural recording to describe neural responses to number in detail. Two types of neural coding have been described: number-selective coding and summation coding. Summation (monotonic) coding is graded, increasing as the perceived number magnitude increases (Roitman *et al*., 2007). This type of coding is consistent with the accumulator model of number representation, in which number is represented by the accumulation of a fixed number of pulses produced serially by a pacemaker (Meck & Church, 1983). There is also evidence of number-selective activity: the spiking rate of a given set of neurons is maximal for a particular value *N* and progressively lower for *N* ± 1, *N* ± 2, and so on (Nieder *et al*., 2002; Nieder & Miller, 2003; Nieder & Merten, 2007; Sawamura *et al*., 2002). This holds across presentation formats (e.g. dot displays, written digits). This type of coding creates Gaussian-like neural tuning functions (see Figure 1). Each number magnitude is not coded exactly, but in a manner consistent with the Weber-Fechner law (Fechner, 1966 [1860]), which states that just-noticeable differences between perceptual stimuli are a function of their proportional difference. As the magnitude of the number increases, the width of the neural tuning function increases proportionally. For example, the tuning function for the magnitude 5 is half as wide as that for 10, which in turn is half as wide as that for 20. Thus differences in perceived value are a function of proportional stimulus differences, as the Weber-Fechner law predicts.
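
Under one illustrative assumption, the proportional widening follows directly. Suppose a tuning function is Gaussian in log₁₀ *n*, centered on log₁₀ *N* with spread σ (σ and *k* are symbols introduced here for the sketch, not values taken from the recordings). Solving for the half-maximum points gives a linear-scale width that is proportional to *N*:

```latex
f_N(n) = \exp\!\left(-\frac{(\log_{10} n - \log_{10} N)^2}{2\sigma^2}\right) = \tfrac{1}{2}
\;\Longleftrightarrow\;
n = N \cdot 10^{\pm k}, \qquad k = \sigma\sqrt{2\ln 2}

w(N) = N \cdot 10^{k} - N \cdot 10^{-k} = N\,(10^{k} - 10^{-k})
\;\Rightarrow\; w(10) = 2\,w(5), \qquad w(20) = 2\,w(10)
```

Because σ, and hence *k*, is the same for every *N*, the linear-scale width scales with *N*, matching the example of 5, 10 and 20.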

Theories of how number-sensitive neural activity develops have been supported by computational models (e.g. Ahmad *et al*., 2002; Dehaene, 2007; Dehaene & Changeux, 1993; Miller & Kenyon, 2007; Pearson, Roitman, Brannon, Platt & Raghavachari, 2010; Verguts & Fias, 2004). These studies demonstrate the development of number-selective activity from other inputs, such as perceptual object tracking or accumulator-like summation coding (Miller & Kenyon, 2007; Verguts & Fias, 2004). Computational results show number-selective activity coded with tuning functions whose width is proportional to number magnitude, skewed on a linear scale and symmetric on a log scale, similar to the neural data (Dehaene, 2007).

#### The current simulations

The current simulations are based in part on prior neural and computational work. General aspects of the model, such as Gaussian tuning curves for number values, have been illustrated in prior neural (e.g. Nieder & Miller, 2003) and computational work (Dehaene, 2007; Verguts & Fias, 2004). The current model adopts these basic aspects and focuses on developmental change in both neural activity and behavior. Prior computational work has not provided a clear mechanism for how the neural coding of number may influence developmental behavioral phenomena, such as the apparent log-to-linear shift in number line estimation; as Dehaene (2007, p. 557) notes, ‘what triggers the conceptual shift from logarithmic to linear in children remains unknown’. The current focus on how changes in neural activity may influence behavioral change provides possible answers to this and other questions of numerical development.

The current model focuses on two aspects of the neural tuning curves. First, the width of the function depends on the magnitude of the value being coded. Thus the tuning function for the value 10 is narrower than the function for the value 30, on a linear scale. The functions are proportionally similar, and thus similar on a log scale (Nieder & Miller, 2003; see Figure 1). Second, the tuning functions, though resembling Gaussian distributions, are positively skewed *on a linear scale*. The positive skew also results from the transformation from a logarithmic scale to a linear scale; if the tuning function is symmetric on a log scale it will be positively skewed on a linear scale. In their studies of non-human primates, Nieder and Miller (2003) reported that neural responses are positively skewed on a linear scale. In addition, Nieder and Merten (2007) found that in the coding of values 1–30, smaller values are clearly positively skewed, and larger values are not skewed as much. Computational accounts (Dehaene, 2007; Verguts & Fias, 2004) have shown positive skew in number coding that arises through unsupervised learning with number magnitudes. Thus these two properties – the logarithmic scale and the positive skew – may be fundamental aspects of the human number system. Although both positive skew and proportional tuning functions have been reported in the literature, their role in number cognition has not been well studied.
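
Both properties can be checked numerically under the kind of assumption used throughout, a tuning curve that is Gaussian in log₁₀ units. The helper names and the spread value `sigma = 0.1` below are illustrative and not taken from the cited studies. The sketch computes the linear-scale half-maximum points for a target, the resulting width, and the ratio of the upper to the lower half-width, which exceeds 1 for a positively skewed curve.

```python
import math


def half_max_bounds(target: float, sigma: float) -> tuple[float, float]:
    """Linear-scale points where a log10-Gaussian tuning curve drops to half its peak."""
    k = sigma * math.sqrt(2 * math.log(2))  # half-width in log10 units, identical for every target
    return target * 10 ** (-k), target * 10 ** k


def width(target: float, sigma: float) -> float:
    """Full width at half maximum on the linear scale."""
    lo, hi = half_max_bounds(target, sigma)
    return hi - lo


def skew_ratio(target: float, sigma: float) -> float:
    """Upper half-width divided by lower half-width; values above 1 indicate positive skew."""
    lo, hi = half_max_bounds(target, sigma)
    return (hi - target) / (target - lo)


for n in (10, 30):
    print(n, round(width(n, 0.1), 2), round(skew_ratio(n, 0.1), 3))
```

The width for 30 is three times the width for 10, and both curves extend further above their target than below it. In this continuous idealization the skew ratio is the same (10^*k*) for every target, so the stronger skew of small values reported above would have to come from elsewhere, for example from sampling the curve only at integer indices starting at 1.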

The current study includes a series of computational simulations that explore how the properties of the neural coding of number may contribute to the development of number cognition. More specifically, the simulations provide a plausible neural mechanism for several phenomena previously described only behaviorally. The tasks used in the simulations reflect those used in behavioral investigations of number line estimation and operational momentum. Within the simulations, each numerical value has a corresponding neural tuning function that resembles a Gaussian distribution, with peak activity at the number being coded (see Figure 2). The simulations specifically examine the relation between positive skew and the width of the tuning function. Building on the neural evidence (Nieder & Miller, 2003), it is assumed that the narrower distributions that characterize small number values are more skewed than the wider distributions that represent larger numbers. The tuning functions therefore resemble Poisson distributions, in that both display a positive skew that attenuates as the mean increases. Poisson distributions have a history of use in neurocomputational work describing neural spike trains (Ashby & Valentin, 2007; Boccaletti, Latora, Moreno, Chavez & Hwang, 2006; Song, Miller & Abbott, 2000). The tuning curves presented in prior work (Nieder & Miller, 2002) show one particular neural population’s relative activation across varied numerical stimuli. The tuning curves used in the current work instead represent the relative activation of a range of neural populations in response to one specific numerical stimulus. Viewed either way, the tuning curves retain the same positively skewed, Gaussian-like shape.
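
The Poisson comparison can be checked numerically. The skewness of a Poisson distribution with mean λ is λ^(−1/2), so distributions with smaller means are more skewed. The helper below is an illustrative check and not part of the original simulations; it sums the probability mass function directly rather than using the closed form.

```python
import math


def poisson_skewness(lam: float) -> float:
    """Skewness of a Poisson(lam) distribution, computed by summing its pmf."""
    kmax = int(lam + 20 * math.sqrt(lam) + 50)  # probability mass beyond this is negligible
    pmf = [math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) for k in range(kmax + 1)]
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    third = sum((k - mean) ** 3 * p for k, p in enumerate(pmf))
    return third / var ** 1.5


for lam in (1, 4, 16):
    print(lam, round(poisson_skewness(lam), 3))  # closed form: lam ** -0.5
```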

Prior research has also reported that when behavioral errors occur, the neural activity for the preferred quantity is significantly reduced compared with correct trials (Nieder *et al*., 2002; Nieder, Diester & Tudusciuc, 2006; Nieder & Miller, 2004; Nieder & Merten, 2007). Errors in the neural coding of number were thus linked to errors in the behavioral task. This link is key to the current framework: errors or imprecision in neural coding may occur and give rise to corresponding errors in numerical judgments.

### Number estimation

By a variety of measures, young children are poor estimators of numerical values and relative quantities compared with adults (e.g. Siegler & Booth, 2004; Opfer & Siegler, 2007). One task used to investigate the development of number estimation is the mapping of number values onto spatial representations such as a number line (e.g. Baroody, 1999; Booth & Siegler, 2006; Opfer & Siegler, 2007; Siegler & Booth, 2004; Siegler & Opfer, 2003). Older children’s and adults’ estimates are linear, whereas preschool (and young school-age) children produce estimates that are, overall, logarithmic. Researchers have interpreted this developmental change as a change in children’s cognitive ‘representation’ of number, from one that is initially solely logarithmic to one that also includes a linear representation (e.g. Siegler & Booth, 2004; for an alternative view see Moeller, Pixner, Kaufmann & Nuerk, 2009). In brief, by this account, younger children rely on representations of number on a log scale, while older children are able to use multiple representations, including linear ones. Though the behavioral phenomenon is quite robust, it is unclear what precipitates the change toward linear estimation other than increased experience with numbers, or why young children initially have a logarithmic representation. Just what might be changing as a function of experience with numbers?

Advances in understanding the neural coding of discrete quantities offer a potential account. The assumption is that cognitive-level representations may reflect underlying properties of the neural code. As many have pointed out (e.g. Nieder & Miller, 2003; Johnson, Hsiao & Yoshioka, 2002), studying behavior alone limits conclusions to the realm of cognitive representations; however, what we know about the neural code suggests a clear hypothesis about the transition from logarithmic to linear mapping of numbers onto a number line. Children’s difficulty in the number estimation task may arise because the widths of the neural tuning functions increase proportionally with number magnitude, whereas the spatial representation of number on a number line is not proportionally scaled. Although this mismatch exists for adults as well as children, mapping from a proportional representational system to a linear one may be more difficult for young children than for adults if the tuning functions change in certain ways with age. This is the question investigated in the simulations.

The present approach is consistent with findings suggesting that children and adults often use the same neural networks for a task, and that differences in performance are largely a matter of magnitude, timing, or extent of activation (Brown, Lugar, Coalson, Miezin, Petersen & Schlaggar, 2005; Casey, Galvan & Hare, 2005; Casey, Giedd & Thomas, 2000; Durston, Davidson, Tottenham, Galvan, Spicer, Fossella & Casey, 2006; Gaillard, Hertz-Pannier, Mott, Barnett, LeBihan & Theodore, 2000; Rubia, Overmeyer, Taylor, Brammer, Williams, Simmons, Andrew & Bullmore, 2000; Schlaggar, Brown, Lugar, Visscher, Miezin & Petersen, 2002). That is, children may show quantitatively poorer or qualitatively different patterns of performance because their networks are noisy and less able to drive activation of parts of the network at the appropriate moment or to the optimal degree. This has been illustrated in computational work in which the narrowing of neural tuning functions contributes to modeling developmental changes in cognition (e.g. Simmering, Schutte & Spencer, 2008; Schutte, Spencer & Schöner, 2003). Narrow tuning curves have been shown to be necessary for accurate coding of number (Diester & Nieder, 2008). In addition, behavioral work shows that the Weber fraction, the smallest proportional difference that can be discriminated, changes with age (Halberda, Mazzocco & Feigenson, 2008), which may indicate a change in these underlying tuning functions.

The following series of simulations shows (1) that the combination of positive linear skew and broad neural tuning functions leads to estimation errors that are overall logarithmic; and (2) that the log-to-linear development in number estimation is facilitated by the neural coding of number and its development, specifically that the narrowing of neural tuning curves with development results in the log-to-linear shift seen in the behavioral literature.

#### Model specifications

The following simulations use vectors to represent neural tuning functions. Each item in a vector represents the relative activation level of a group of neurons that respond selectively to a particular number. Each simulation included one vector for each of the number magnitudes to be estimated. The values in each vector represent the relative activation (spiking rates) of number-selective neurons. For example, the value *A* was represented by the vector *A*(*n*₁, *n*₂, …, *n*₁₅₀), where *n*ᵢ is the activation of the neurons selective for the number magnitude *i*. Vectors for values *A* = 1 through 100 were calculated, and each vector contained 150 activation values. For example, the activation value at index 5 corresponds to the average activation of all neurons that respond maximally to the number magnitude 5. Activation values represent relative activation levels for that specific vector only and do not correspond to specific spiking rates. Research suggests that the maximum spiking rate for large numbers is actually lower than for smaller numbers (e.g. Nieder & Dehaene, 2009), so *relative spiking rates* are used here for ease of comparison. Activation values for each vector were calculated using a modified Gaussian distribution function, a general function that defines a family of Gaussian distributions. Similar equations have been used in prior computational work (Dehaene, 2007).
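
One Gaussian form consistent with the parameter definitions and the worked example in the next paragraph is sketched below, where *A* is the target number and *i* the vector index. It is a reconstruction under stated assumptions: in particular, whether *s* enters squared (as a standard deviation, as written here) or unsquared (as a variance, matching the later description of *s* values as variances) is an assumption. The two readings coincide at *s* = 1, the value used in the worked example.

```latex
A(n_i) = h \exp\!\left(-\frac{(x - m)^2}{2 s^2}\right),
\qquad x = \log_{10} A - \log_{10} i
```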

The values of *h* and *m* are constant across all simulations: *h*, the maximum value of the function, is set to 1, and *m*, the mean of the distribution, is set to 0. The value of *s* determines the width of the curve and varies across model instantiations. *x* is defined as the logarithmic difference between the target number and the vector item index. For example, if the target number is *A* = 6 and *s* = 1, then for *A*(*n*₆), *x* = log₁₀ 6 − log₁₀ 6 = 0, so *A*(*n*₆) = 1: when the vector index equals the target number, the relative activation equals 1. For *A*(*n*₄), *x* = log₁₀ 6 − log₁₀ 4 = 0.176, and *A*(*n*₄) = 0.984, so at index 4 the relative activation is slightly reduced. Defining *x* by logarithmic differences results in Gaussian functions that are symmetric on a log scale and of identical width (see Figure 2). On a linear scale, the functions vary in width and show positive skew (here skew simply means that the function is not symmetric about its peak). Smaller values are both narrower and more skewed. Again, this is simply the consequence of transforming a Gaussian curve that is symmetric on a log scale to a linear scale.
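
A minimal Python sketch of one tuning vector, assuming the form *A*(*n*ᵢ) = *h*·exp(−(*x* − *m*)²/(2*s*²)) with *x* = log₁₀ *A* − log₁₀ *i*. The squared *s* is an assumption (it makes no difference at *s* = 1), the function name `tuning_vector` is hypothetical, and the original simulations were written in MATLAB rather than Python. With the parameters of the worked example, the peak of 1 sits at index 6 and index 4 comes out close to the 0.984 given above.

```python
import math


def tuning_vector(target: int, s: float, length: int = 150,
                  h: float = 1.0, m: float = 0.0) -> list[float]:
    """Relative activation of number-selective units 1..length for one target value.

    Assumed form: A(n_i) = h * exp(-(x - m)**2 / (2 * s**2)), x = log10(target) - log10(i).
    """
    activations = []
    for i in range(1, length + 1):
        x = math.log10(target) - math.log10(i)  # logarithmic difference, as in the text
        activations.append(h * math.exp(-((x - m) ** 2) / (2 * s ** 2)))
    return activations


vec = tuning_vector(6, s=1.0)
print(vec[5])            # index 6 (list position 5): the peak value
print(round(vec[3], 3))  # index 4 (list position 3): slightly below the peak
```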

#### Methods

All simulations were run in MATLAB (MathWorks). A series of simulations was run, including, as a point of comparison, both symmetric and positively skewed coding with varying tuning function widths. In each case, coding vectors were calculated for target numbers 1 through 100. These initial vectors can be interpreted as idealized activation patterns, to which activation noise is added to determine the model’s output vectors. Noise is applied as a change in each vector value by a percentage drawn from a random distribution with a mean of zero, so some vector values increase, others decrease, and the mean change is zero. After noise is applied, the output of each vector is the index of its maximum value; if that index equals the target number, the model has estimated the number correctly. For example, before noise, the maximum value of the vector representing ‘5’ is 1, at index 5. After noise is applied, this value might fall to 0.79 while the value at index 6 rises to 0.81; the vector has now, due to noise, overestimated 5 as 6. The use of noise in neural models is well established (Schutte *et al*., 2003) and gives a more accurate representation of neural coding than static coding. The entire process of applying random noise to the set of tuning functions was repeated 200 times, yielding 200 simulated ‘subjects’ per coding condition.
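
The noise-and-readout procedure can be sketched as follows. The zero-mean percentage noise is modeled here as Gaussian with standard deviation `noise_sd`; that distribution, the parameter values in the usage line, and the function names are all assumptions, since the text specifies only zero-mean percentage noise. The tuning curve repeats the assumed log₁₀-Gaussian form with *h* = 1 and *m* = 0.

```python
import math
import random


def tuning_vector(target: int, s: float, length: int = 150) -> list[float]:
    """Idealized activation for units 1..length (assumed log10-Gaussian, h = 1, m = 0)."""
    return [math.exp(-(math.log10(target) - math.log10(i)) ** 2 / (2 * s ** 2))
            for i in range(1, length + 1)]


def noisy_estimate(target: int, s: float, noise_sd: float,
                   rng: random.Random, length: int = 150) -> int:
    """Perturb each activation by a zero-mean random percentage, then read out the argmax."""
    noisy = [a * (1 + rng.gauss(0.0, noise_sd)) for a in tuning_vector(target, s, length)]
    return max(range(length), key=noisy.__getitem__) + 1  # list position -> number value


def simulate_subjects(s: float, noise_sd: float, n_subjects: int = 200,
                      seed: int = 0) -> list[list[int]]:
    """One row of estimates for targets 1..100 per simulated subject."""
    rng = random.Random(seed)
    return [[noisy_estimate(t, s, noise_sd, rng) for t in range(1, 101)]
            for _ in range(n_subjects)]


estimates = simulate_subjects(s=1.0, noise_sd=0.05, n_subjects=5)
print(estimates[0][:10])
```

With `noise_sd = 0` every target is read out exactly, which is a quick sanity check; errors appear only once noise can lift a neighboring unit above the true peak.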

As previously noted, prior work has shown that when behavioral errors occur, the neural activity for the preferred quantity is significantly reduced compared with correct trials (Nieder *et al*., 2002, 2006; Nieder & Miller, 2004; Nieder & Merten, 2007). The hypothesis here is that the pattern of errors in the neural tuning functions influences the pattern of errors in behavioral output. Thus, in these simulations an incorrect index of the maximum activation value is interpreted as an incorrect number estimate.

#### Results and discussion

For each instantiation, the simulated estimates were plotted against the target numbers, and best-fit lines were calculated. Variances of 0.5, 1, 2, and 3 were examined for both symmetric and positively skewed coding. *R*² values were calculated for both linear and logarithmic best-fit lines; these are referred to as linear *R*² and log *R*² values. For positive skew coding, linear *R*² values decreased as variance increased (0.99, 0.95, 0.79, 0.70), while log *R*² values generally increased (0.81, 0.89, 0.97, 0.94) (see Figure 3). For symmetric coding, linear *R*² values were unchanged as variance increased (0.99, 0.99, 0.99, 0.99), and log *R*² values were similar (0.80, 0.80, 0.83, 0.81). Thus, symmetric coding was overall quite accurate in estimation and did not resemble the logarithmic curve shown by young children. For positive skew coding, small variance values, which produce narrow tuning functions, yield higher linear *R*² values than log *R*² values, similar to older children; larger variance values, which produce broader tuning functions, yield higher log *R*² values than linear *R*² values, similar to younger children. Thus, with positive skew coding, there is a shift from more logarithmic to more linear estimates as the tuning function narrows.
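
Linear and log *R*² values of this kind can be computed from ordinary least-squares fits of the estimates against the target and against its logarithm. The sketch below is an illustrative reimplementation, not the original MATLAB code, and the logarithmic estimates in the usage lines are hypothetical. For a regression with one predictor, *R*² equals the squared Pearson correlation, which is what `r_squared` computes; the base of the logarithm does not matter, since rescaling a predictor leaves *R*² unchanged.

```python
import math


def r_squared(xs: list[float], ys: list[float]) -> float:
    """R^2 of the least-squares line y = a + b*x (the squared Pearson correlation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)  # undefined if either variable is constant


def linear_and_log_r2(targets: list[float], estimates: list[float]) -> tuple[float, float]:
    """Return (linear R^2, log R^2) for a set of number-line estimates."""
    log_targets = [math.log(t) for t in targets]
    return r_squared(targets, estimates), r_squared(log_targets, estimates)


targets = list(range(1, 101))
# Hypothetical, perfectly logarithmic estimates spanning roughly 0 to 100.
log_like = [100 * math.log(t) / math.log(100) for t in targets]
lin_r2, log_r2 = linear_and_log_r2(targets, log_like)
print(round(lin_r2, 2), round(log_r2, 2))
```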

Further comparisons between behavioral data and the simulations were made. Prior behavioral data were compared directly with the current model results: behavioral data taken from Booth and Siegler (2006, Figure 1) included 37 data points, which were matched to corresponding simulation data points. Of the current simulations, positive skew coding with a broad tuning function (*s* = 2) fit these data most closely (see Figure 3). Simulation data points were highly correlated with the behavioral data points, *R* = 0.94.

Only with both an overly broad tuning function and positive linear skew does the model produce estimates similar to those of very young children. Narrowing the tuning curve produces data similar to those of developmentally advanced children and adults. Younger children tend to be less accurate overall in their estimates and tend to overestimate smaller numbers in the number line task. The simulation matches this pattern because of several factors. As the width of the tuning function increases, the potential for large misestimates increases, so broad neural tuning functions are less precise than narrow ones. In addition, the positive linear skew of the tuning function makes any misestimate likely to be an overestimate: the greater the skew, the more likely an error is to be an overestimate rather than an underestimate. As the magnitude of the estimated value increases, positive skew decreases and misestimates tend to average toward zero, because over- and underestimates become nearly equally likely. Together these factors lead the simulation to produce a logarithmic estimation pattern that closely mirrors the behavioral data.
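
One way to see why skew biases errors upward is to compare the baseline activation of the two units adjacent to the target. Under the assumed log₁₀-Gaussian tuning (*h* = 1, *m* = 0, *s* = 1; helper names hypothetical), the unit one step above the target always starts closer to the peak than the unit one step below, because log(*N* + 1) − log *N* < log *N* − log(*N* − 1). Zero-mean noise should therefore lift the upper neighbor past the peak more often, and this head start shrinks rapidly as *N* grows.

```python
import math


def activation(target: int, index: int, s: float = 1.0) -> float:
    """Assumed log10-Gaussian tuning with h = 1 and m = 0."""
    return math.exp(-(math.log10(target) - math.log10(index)) ** 2 / (2 * s ** 2))


def neighbor_advantage(target: int, s: float = 1.0) -> float:
    """Baseline activation of the unit just above the target minus the unit just below it."""
    return activation(target, target + 1, s) - activation(target, target - 1, s)


for n in (2, 5, 20, 80):
    print(n, f"{neighbor_advantage(n):.2e}")
```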

The neural coding of number is unlikely to be the only influence on children’s performance on estimation tasks. There must be a ‘read-out’ process that goes from the neural code to behavioral output, and this process could add noise to the outcome or be influenced by top-down control. Children have been shown to change their estimation performance in response to structured feedback (Siegler & Booth, 2004; Opfer & Siegler, 2007), which may reflect top-down influences on estimation. In both studies, children who had previously shown logarithmic estimation patterns were given an additional, specific landmark on the number line, after which they adjusted their estimates toward a more linear pattern. While the neural coding may provide a starting point and impose limits on accuracy, these limits may be mitigated by explicit feedback, particularly in older children.

In both the behavioral task and the current simulations, output estimates are limited to a particular range: neither child participants nor model simulations can give an estimate greater than the top value of 100. This has consequences in both cases, since neither can overestimate a value as greater than 100. In the simulations, removing this ceiling slightly reduces the fit of the log function. It is unclear how child participants would perform without it. The current model predicts constant proportional variance around the target number.

Prior work has also reported correlations between number line estimation and other number tasks (Booth & Siegler, 2006). Children’s scores on a standardized math achievement test were significantly positively correlated with the linear *R*² values of their estimates, but not with the mean absolute error of their estimates. This suggests that producing linear estimation functions is associated with superior performance in related math tasks. This is unsurprising under the current account: participants who produce logarithmic estimates because of broad neural tuning functions would also be expected to make more errors in simple computation, while participants who produce linear estimates because of narrow tuning functions may make fewer.

### Operational momentum

Another relevant aspect of number cognition is the development of knowledge of arithmetic operations. Research on simple arithmetic includes participants ranging from 5-month-olds (Wynn, 1992) to older children (e.g. Barth, Beckmann & Spelke, 2008; Prather & Alibali, 2011) and adults (e.g. Barth, Mont, Lipton, Dehaene, Kanwisher & Spelke, 2006; Robinson & Ninowski, 2003). In one line of this research, several studies have described a phenomenon termed operational momentum (Knops, Viarougue & Dehaene, 2009; Lindemann & Tira, 2011; McCrink, Dehaene & Dehaene-Lambertz, 2007; McCrink & Wynn, 2009). In short, for addition (*A* + *B* = *C*), participants tended to overestimate the value of *C*, while for subtraction (*A* − *B* = *C*) they tended to underestimate it. The basic phenomenon has been shown in participants ranging from 9-month-olds to adults. A high-level representational account of the phenomenon has been proposed: humans are able to represent numbers spatially, so addition and subtraction involve moving along a mental number line. For both addition and subtraction, participants overshoot the value of *C*, leading to overestimation in addition and underestimation in subtraction. On this account, arithmetic errors result from movement along the mental number line that overshoots the correct answer, perhaps through a mechanism similar to representational momentum (Hubbard, 2005). The original work describing operational momentum (McCrink *et al*., 2007) also suggested that the effect may reflect properties of the neural coding of number, framing this in terms of arithmetic operations as movement along a mental number line.

Given prior work on the use of mental number lines (e.g. Dehaene, Bossini & Giraux, 1993), this is a plausible behavioral description of the phenomenon. The current simulations examined whether and how the neural coding of number may contribute to this behavioral phenomenon. They illustrate that, for the operational momentum effect, the mental number line explanation is *unnecessary* once the neural coding of number is taken into account. Again, the simulations examined how two key characteristics of the tuning functions, positive skew on a linear scale and proportional scaling, contribute to the patterns of performance reported in the operational momentum literature: a tendency to overestimate in addition and underestimate in subtraction.

#### Model specifications

Model specifications were identical to those of the prior simulations, except for the range of number values, the length of the vectors, and the tuning function widths considered. In the following simulations, values 1 to 30 were used in a variety of arithmetic equations. Each value was represented by a vector of 50 items. Variance parameters of 1, 1.5, and 2 were evaluated.

#### Method

All simulations were run in MATLAB (MathWorks). Two separate simulations were carried out: symmetric Gaussian coding and positively skewed Gaussian coding (on a linear scale). In each case, coding vectors were calculated for target numbers 1 through 30. Random noise was then added to each vector value, whereby each activation level was altered by a percentage drawn from a random distribution. The amount of noise was random and independent for each vector value.

After noise was applied, the output of each vector was the index of its maximum value. For example, before noise, the maximum value of the vector representing ‘5’ is 1, at index 5. After noise is applied, this value might fall to 0.79 while the value at index 6 rises to 0.81; the vector has now, due to noise, overestimated 5 as 6. The output values for all vectors were then used to calculate the simulated results of the full set of addition and subtraction equations. For example, for the equation 7 + 3, random noise is applied to the vectors representing 7 and 3, and the resulting outputs (e.g. 7 and 4) are combined to give the model’s estimate for the equation, in this case 7 + 3 = 11. Again, this paradigm is based on prior work reporting correlations between neural coding errors and behavioral errors (Nieder *et al*., 2002, 2006; Nieder & Miller, 2004; Nieder & Merten, 2007). The entire process for the set of equations was repeated 200 times, yielding 200 simulated ‘subjects’ per coding condition.
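
The combination step can be sketched in the same hypothetical Python setting: each operand’s vector receives independent noise and is read out separately, and the two readouts are then added or subtracted. The Gaussian noise model, the parameter values, and the function names are assumptions; the text does not specify how the MATLAB code was organized.

```python
import math
import random


def tuning_vector(target: int, s: float, length: int = 50) -> list[float]:
    """Assumed log10-Gaussian tuning (h = 1, m = 0) over units 1..length."""
    return [math.exp(-(math.log10(target) - math.log10(i)) ** 2 / (2 * s ** 2))
            for i in range(1, length + 1)]


def noisy_readout(value: int, s: float, noise_sd: float,
                  rng: random.Random, length: int = 50) -> int:
    """Apply zero-mean percentage noise, then return the index of the most active unit."""
    noisy = [a * (1 + rng.gauss(0.0, noise_sd)) for a in tuning_vector(value, s, length)]
    return max(range(length), key=noisy.__getitem__) + 1


def simulate_equation(a: int, b: int, op: str, s: float, noise_sd: float,
                      rng: random.Random) -> int:
    """Decode each operand independently, then combine the decoded values."""
    x, y = noisy_readout(a, s, noise_sd, rng), noisy_readout(b, s, noise_sd, rng)
    if op == "+":
        return x + y
    if op == "-":
        return x - y
    raise ValueError(f"unsupported operation: {op!r}")


rng = random.Random(0)
print(simulate_equation(7, 3, "+", s=1.0, noise_sd=0.05, rng=rng))
```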

#### Results and discussion

Simulation results were analyzed separately by coding style and operation. For each of addition and subtraction, 435 equations were evaluated (all combinations of two distinct values from 1 to 30). The percent deviation between the target result and the simulated result was calculated for each equation. For positive skew coding, the tuning function widths *s* = 2, 1.5, and 1 produced average deviations for addition and subtraction of 72% and −39%, 56% and −19%, and 37% and 2%, respectively; that is, addition was overestimated at every width, and subtraction was underestimated except at the narrowest width. For symmetric coding, all tuning function widths (*s* = 2, 1.5, 1) produced small average deviations for addition and subtraction: 0.28% and −0.32%, 0.21% and 0.14%, and 0.01% and 0.03%, respectively. Thus operational momentum is more severe for broader tuning functions.
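
The equation set and the deviation measure are simple to reproduce. The sketch assumes that ‘all combinations of 1–30’ means pairs of distinct values, which matches the count of 435 (30 × 29 / 2), with the larger value as the first operand so that every subtraction result is positive; the sign convention (positive for overestimates) follows the reported values. Function names are hypothetical.

```python
from itertools import combinations


def equation_pairs(lo: int = 1, hi: int = 30) -> list[tuple[int, int]]:
    """Pairs (a, b) of distinct values with a > b, used for both a + b and a - b."""
    return [(a, b) for b, a in combinations(range(lo, hi + 1), 2)]


def percent_deviation(simulated: float, target: float) -> float:
    """Signed deviation of a simulated result from the correct result, in percent."""
    return 100.0 * (simulated - target) / target


pairs = equation_pairs()
print(len(pairs))                    # 435
print(percent_deviation(11, 7 + 3))  # 10.0
```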

Performance curves for addition and subtraction were calculated, similar to those reported in prior behavioral work on operational momentum (McCrink *et al*., 2007). For each equation, the difference between the simulated result and the target result was calculated as a percentage (see Figure 4). The performance curve conveys the frequency of over- and underestimation errors for both addition and subtraction. The behavioral data show that overestimates are more frequent for addition, while underestimates are more frequent for subtraction. The current simulation results show that in the positive skew, broad tuning function condition, addition results are overestimated more frequently than subtraction results. Symmetric coding shows equal frequencies of over- and underestimation for both addition and subtraction. Thus, the simulated data with positive skew and a broad tuning function show the same cross-over between addition and subtraction as the behavioral work, while symmetric coding does not.

The data reported here suggest that a neural coding of number that is positively skewed on a linear scale (Nieder & Merten, 2007; Nieder & Miller, 2002; Verguts & Fias, 2004) results in arithmetic errors consistent with the reported behavioral phenomenon termed operational momentum: addition tends to be overestimated, while subtraction tends to be underestimated. This occurs because both the chance of a misestimate and the type of misestimate vary with magnitude. Smaller numbers, with sharper tuning functions, tend to have less frequent and smaller errors, but the errors that do occur are much more often overestimates than underestimates. That relatively small values are typically overestimated is consistent with prior work on the development of numerical estimation in children (Booth & Siegler, 2006; Huntley-Fenner, 2001; Opfer & Siegler, 2007) and on numerical estimation in non-human animals (e.g. Brannon & Roitman, 2003; Platt & Johnson, 1971). Overestimates of either operand inflate a sum, whereas in subtraction the smaller operand, which is the more likely to be overestimated, is the one being subtracted; this tendency therefore accounts for both the overestimation of addition and the underestimation of subtraction.
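
The final step of this argument can be made concrete with a deliberately simple, hypothetical read-out in which every value is overestimated and the overestimate shrinks with magnitude (a bias of *k*/*n*; both this form and *k* = 1 are assumptions chosen only for illustration, not fitted to the simulations). Over all pairs *a* > *b* from 1 to 30, sums come out too large and differences too small, because the subtracted operand is always the smaller, more overestimated one.

```python
from itertools import combinations


def decoded(n: int, k: float = 1.0) -> float:
    """Hypothetical mean read-out: an upward bias that decreases with magnitude."""
    return n + k / n


pairs = [(a, b) for b, a in combinations(range(1, 31), 2)]  # every pair with a > b
add_errors = [decoded(a) + decoded(b) - (a + b) for a, b in pairs]
sub_errors = [decoded(a) - decoded(b) - (a - b) for a, b in pairs]
print(min(add_errors) > 0, max(sub_errors) < 0)  # True True
```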

Cognitive accounts of operational momentum (McCrink *et al*., 2007; Knops *et al*., 2009), such as those based on spatial associations with number (Dehaene *et al*., 1993; Knops *et al*., 2009; Santens & Gevers, 2008), are not necessarily inconsistent with the current account. A variety of cognitive representations, including number-space associations, could exacerbate the behavioral pattern. However, the current account requires a priori only the experimentally established neural coding of number: prior research has illustrated how number-selective neurons can arise through unsupervised learning (Verguts & Fias, 2004), and neural data illustrate the positive skew and relative width of the neural tuning functions used in the current simulations (e.g. Nieder & Miller, 2003). The effect can thus be described as a ‘natural result’ of the neural coding.

There were several differences of note between the current model and typical behavioral methodology. The behavioral methodology (McCrink *et al*., 2007; Knops *et al*., 2009) has typically included a verification task in which participants evaluated presented arithmetic results, whereas the current simulations produced the results of arithmetic equations. In addition, the behavioral methodology has typically used a limited set of arithmetic equations, due to experimental constraints, whereas the current simulations evaluated all relevant arithmetic equations, resulting in a more comprehensive data set.