Valuation and estimation from experience

The processing of sequentially presented numerical information is a prerequisite for decisions from experience, where people learn about potential outcomes and their associated probabilities and then make choices between gambles. Little is known, however, about how people's preference for choosing a gamble is affected by how they perceive and process numerical information. To address this, we conducted a series of experiments wherein participants repeatedly sampled numbers from continuous outcome distributions. They were incentivized either to estimate the means of the numbers or to state their minimum selling prices to forgo a consequential draw from the distributions (i.e., the certainty equivalents or valuations). We found that participants valued distributions below their means, valued high-variance sequences lower than low-variance sequences, and valued left-skewed sequences lower than right-skewed sequences. Though less pronounced, similar patterns occurred in the mean estimation task where preferences should not play a role. These results are not consistent with prior findings in decision from experience such as the overweighting of high numbers and the underweighting of rare events. Rather, the qualitative effects, as well as the similarity of effects in valuation and estimation, are consistent with the assumption that people process numbers on a compressed mental number line in valuations from experience.

In addition, we compare these valuations to objective mean judgments to examine whether behavioral patterns extend to nonpreferential tasks. Whereas preferential valuations are usually subject to risk and skewness preferences, mean estimates should not be affected by these preferences. To the extent that numeric information is similarly processed in objective estimation and subjective valuation tasks, behavioral patterns should be similar in both tasks.
However, to the extent that risk and skewness preferences drive subjective valuations but not objective estimations, responses in valuation should differ from those in the estimation task. In the following, we develop specific hypotheses that are rooted in existing findings in the literature.

Underweighting of rare events and overweighting of high outcomes
Past research in DFE suggests that people choose as if they underweight rare events (Hertwig et al., 2004). Although this effect may be partly due to undersampling of the rare event (Fox & Hadar, 2006), a recent meta-analysis found that it remains when controlling for undersampling (Wulff et al., 2018). Another important finding is that people are risk seeking in DFE, which suggests that participants overweight high outcomes (Ludvig & Spetch, 2011). This overweighting has been corroborated by higher frequency judgments for higher numbers in the gain domain (Madan et al., 2014, 2016).
So far, underweighting of rare events and overweighting of high outcomes have been tested predominantly on choices for gambles with one or two discrete outcomes. Using valuations of continuous outcome sequences, however, allows us to test the generalizability of these effects and provides a bridge to the numeric cognition literature.
For symmetric distributions like the normal distribution, rare events are equally likely to occur for high and low outcomes. Thus, underweighting of rare events predicts no deviation of valuations from the mean and no effect of variance on valuations. In skewed distributions, rare events are more likely to occur on one side of the distribution. Hence, all other characteristics being equal, underweighting of rare events predicts lower valuations for right-skewed distributions (where high values are rare) than for left-skewed distributions (where low values are rare). In contrast, overweighting of high outcomes predicts valuations above the mean, and higher valuations for high-variance (than for low-variance) and right-skewed (than for left-skewed) distributions, again all other characteristics being equal.

Numeric cognition and the compressed mental number line
Important contributions have been made to the understanding of how people process and integrate sequentially presented information (e.g., Ashby & Rakow, 2014; Baucells et al., 2011; Hotaling et al., 2019; Wulff & Pachur, 2016). Research in numeric cognition (Dehaene, 2003; Dehaene et al., 2008; Feigenson et al., 2004) has indicated that the internal representation of numerals can be described as a compressed mental number line. This means that differences between numerosities are represented as smaller at higher magnitudes, similar to concave mappings from objective stimuli to subjective perceptions in other domains (see Fechner, 1860).
For the processing of number sequences, the compressed mental number line predicts that people give lower estimations, and hence also lower valuations, than the true mean of the underlying distribution.
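To make this prediction concrete, the following is a minimal simulation, not the authors' model: it assumes, purely for illustration, that each sample is compressed by a power function v(x) = x^0.7 before averaging, and the average is then mapped back to the number line. The exponent and the construction of the skewed sequences are our own assumptions.

```python
import numpy as np

# Illustrative only: a hypothetical compression exponent below 1
a = 0.7

def perceived_mean(samples, a=a):
    """Average the samples on the compressed scale x**a, then invert the mapping."""
    return np.mean(samples ** a) ** (1 / a)

rng = np.random.default_rng(0)
n, mu = 1_000_000, 100.0

# Symmetric sequences: compression predicts answers below the mean,
# and lower answers for higher variance (Jensen's inequality).
low_var  = perceived_mean(rng.normal(mu, 5.0, n))
high_var = perceived_mean(rng.normal(mu, 10.0, n))
assert high_var < low_var < mu

# Skewed sequences: exponential-tailed deviations (sd ~10), truncated at the
# 99th percentile to avoid extreme outliers, re-centered, and mirrored.
p = (np.arange(n) + 0.5) / n                    # deterministic quantile grid
g = -10.0 * np.log(1.0 - np.minimum(p, 0.99))   # truncated exponential values
g -= g.mean()                                   # zero-mean skewed deviations
right = mu + g    # right skewed: high outcomes are rare
left  = mu - g    # left skewed: low outcomes are rare
assert perceived_mean(left) < perceived_mean(right) < mu
```

Because the compression is concave, averaging on the compressed scale penalizes spread, so higher variance and left skew both pull the perceived mean below the true mean, mirroring the qualitative predictions developed in the text.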
Further, it predicts that both estimations and valuations are lower for sequences with higher variance. In this case, apparently risk-averse behavior would be partly due to numeric perception rather than subjective economic preference (Schoemaker, 1982). Furthermore, it can be shown mathematically that for compressed power functions, answers are lower for left-skewed than for right-skewed distributions.

Note: Behavioral predictions based on findings in decision from experience (DFE) and numeric cognition on the valuation of a continuous number sequence. Key: 0 means no deviation from the mean or no effect, + means a positive deviation from the mean or a positive effect, and − means a negative deviation from the mean or a negative effect. For details, see text.

In the valuation task, participants stated their minimum selling price for each distribution. This was explained to them as the minimum price they would demand to forgo the option to make a single consequential draw from that distribution that would be paid out. In the estimation task, participants were asked to estimate the mean of the distribution. Here, accuracy was incentivized with respect to how closely the estimates matched the theoretical mean of the underlying distribution.
Under the assumption that participants were well calibrated to the mean estimation task, the variability of the monetary bonus should have been higher in the valuation than in the estimation task. However, we believe that this is at the core of the difference between the two tasks and that other incentives of the estimation task would have made answers subject to risk preferences.
Each trial contained a rectangular box representing the underlying outcome distribution. Participants could sample from the distribution by pressing <space>. Each sampled number was shown for 250 ms and was generated as a random draw from the respective underlying distribution, rounded to its nearest integer. Participants had to draw at least one sample before they typed their answers into the gray fields and confirmed their inputs with <enter> (see Figure 1 for a schematic).

Distributions
We constructed 24 continuous number distributions by combining four means (80, 100, 130, and 160), two standard deviations (5 and 10), and three distribution shapes (normal, left skewed, and right skewed). Skewed distributions were constructed from scaled gamma distributions with a shape parameter of 1 (absolute skewness = 2) and were truncated at the first (left skewed) or last (right skewed) percentile to avoid extreme outliers. The different distributions were presented in randomized order and were the same in both the valuation and the estimation tasks.
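As a sketch, one construction consistent with this description is the following; the exact scaling and truncation details, and the function name, are our assumptions.

```python
import numpy as np

def make_distribution(mean, sd, shape, n, rng):
    """Draw n samples; shape is 'normal', 'left', or 'right'.

    Skewed variants use a gamma distribution with shape parameter 1
    (skewness = 2), scaled to the target sd, truncated at the long tail's
    99th percentile, shifted to the target mean, and mirrored for left skew.
    """
    if shape == "normal":
        return rng.normal(mean, sd, n)
    g = rng.gamma(1.0, sd, n)                 # mean sd, standard deviation sd
    g = np.minimum(g, -sd * np.log(0.01))     # cut the long tail at its 99th pct
    if shape == "right":
        return mean - sd + g                  # long right tail: high values rare
    return mean + sd - g                      # mirrored: low values rare

rng = np.random.default_rng(1)
# As in the task, each presented number is rounded to its nearest integer
samples = np.round(make_distribution(130, 10, "right", 10_000, rng))
```

Truncating the gamma tail before mirroring is equivalent to truncating the resulting left-skewed distribution at its first percentile, matching the description in the text.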

Procedure and incentives
The experiment was implemented on a computer with PsychoPy (Peirce, 2007) and conducted in individual sessions in separate rooms at the University of New South Wales School of Psychology. All instructions were presented on the computer screen and could be read at a participant's own pace. Each participant completed two blocks of 24 tasks, starting with either estimations or valuations.
Payment was determined by randomly selecting one answer across both blocks. If the trial was in the valuation block, a Becker-DeGroot-Marschak (BDM) procedure was implemented (Becker et al. 1964): a random number was uniformly drawn between zero and the theoretical mean of a given distribution. When the random number was below the participant's answer for this trial, the participant received a draw from the distribution; otherwise, the participant received the points from the random number for certain. If the selected trial was in the estimation block, the observed points were determined by the true mean from which the error of the estimate was subtracted. Finally, the obtained points were exchanged for Australian dollars (AUD) with a 20:1 ratio and paid out in cash.
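The BDM payment rule described above can be sketched as follows (a hypothetical helper; the names are ours):

```python
import random

def bdm_payoff(stated_price, distribution_mean, draw_outcome, rng=random):
    """Sketch of the BDM rule as described in the text.

    A random offer is drawn uniformly between zero and the theoretical mean.
    If the offer is below the stated minimum selling price, the participant
    keeps the gamble and receives a consequential draw; otherwise the
    participant receives the offered amount for certain.
    """
    offer = rng.uniform(0, distribution_mean)
    if offer < stated_price:
        return draw_outcome   # participant keeps the gamble
    return offer              # participant "sells" at the offered price
```

Under this rule, stating one's true certainty equivalent is optimal: overstating it risks forgoing offers one would prefer to the gamble, and understating it risks selling the gamble too cheaply.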

Participants and data analysis
We tested 53 participants and determined sample size prior to data inspection. Participants were undergraduates from the University's subject pool, recruited via online advertisement. Participants received course credit and a choice-dependent bonus of 1.50 to 8.93 AUD (M pay = 5.43 AUD). In the subject pool, the mean age was 19 years, and approximately 70% of the subjects were women.
Prior to analyzing the data, we excluded two participants who did not comply with the task. Further, we excluded answers more than five standard deviations from the distribution's mean (21 out of 2448 total trials). We assumed that in these trials, participants made typos or did not pay attention to the samples; thus, these trials were not informative for our research question.
We analyzed the data by means of mixed-effects regression analyses with participant random effects in R (R Core Team, 2016; RStudio Team, 2015), using the lme4 package (Bates, Mächler, Bolker, & Walker, 2015) and the lmerTest package (Kuznetsova, Bruun Brockhoff, & Haubo Bojesen Christensen, 2016). Across all regressions, we used the theoretical characteristics of the respective distributions as independent variables.
We dummy-coded variance and skewness and treated the mean as a continuous predictor variable. As dependent variables, we defined the logarithm of sample size and participant accuracy, quantified as the deviation of their answers proportional to the true means of the distributions. The reason we chose this accuracy measure was to prevent heteroscedasticity in the data, that is, to prevent high answers from having a stronger influence on the regression results than low answers. 1
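As we read it, this accuracy measure can be computed as in the following sketch (the function name is ours):

```python
def proportional_deviation(answer, true_mean):
    """Deviation of an answer as a percentage of the distribution's true mean.

    Scaling by the true mean keeps answers for high-mean distributions from
    exerting more influence on the regression than answers for low-mean ones,
    which guards against heteroscedasticity.
    """
    return 100.0 * (answer - true_mean) / true_mean
```

For example, an answer of 95 for a distribution with a true mean of 100 deviates by −5%, the same proportional error as an answer of 152 for a mean of 160.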

Results
FIGURE 1 Participants sampled from the white box and typed their answer into the gray box. In this example trial, the participant sampled 127.

Participants in both tasks sampled more when the variance was higher (b = 0.14, p < .001). 2 This is in line with previous findings in the literature (Ashby, 2017; Lindskog et al., 2013), and it is adaptive in the sense that more samples mitigate higher uncertainty. Finally, the higher the mean, the fewer samples participants took (b = −0.001, p < .01). This effect can be explained by the decrease in relative variation with higher means and is discussed below for valuations and estimations.

Table 2 shows the corresponding regression results for the valuation task. In particular, the parameter for variance is negative, indicating lower valuations for high-variance sequences. Finally, deviations got smaller as the mean increased (b = 0.04, SE = 0.01). This might be because the variation relative to the mean decreased with higher means when the variance was held constant.

Valuation task
We address this issue in Experiment 2.

Estimation task
The mean estimates within each condition are depicted in Figure 2 (right). As in the valuation task, participants underestimated the theoretical mean of the number sequences across all distributional characteristics (M = −1.59, Mdn = 0, SD = 9.40). A t test revealed that this underestimation was significant, t(50) = −5.17, p < .001. Again for robustness, we calculated a Wilcoxon test, and it led to the same conclusion (W(n = 51) = 189, p < .001).
Variance was a significant predictor for estimation deviations in the regression (b = −1.66, SE = 0.59; Table 2 right-middle column). Together, all these effects are in accordance with a compressed mental number line.
Finally, the proportional deviation from the theoretical mean got smaller with higher means (b = 0.02, SE = 0.01), as in the valuation task.

FIGURE 2 Answers in Experiment 1A. The y axis shows the percentage deviation of participant answers from the theoretical means of the distributions. Error bars are 95% confidence intervals.

Note: Effects of theoretical mean, variance, and skewness on percentage deviation of answers from the theoretical mean in economic valuation and estimation for Experiments 1A and 1B. All models included subject random intercepts. All significant predictors are robust to the inclusion of random slopes. Standard errors in parentheses. * p < .05. ** p < .01. *** p < .001.

EXPERIMENT 1B

Method
The main difference was in participant instructions, as participants in Experiment 1A indicated some difficulty in comprehending the incentive scheme (particularly the BDM auction). Hence, in Experiment 1B, we simply instructed participants to answer thoroughly and stated that their accuracy would influence their final payoff. We further informed participants that details of the actual payment mechanism were available upon clicking an extra button on the screen. About one third of participants in each block made use of this option.

Participants
We tested 58 participants from the same subject pool as in Experiment 1A. Participants received course credit and a choice-dependent bonus of 1.50 to 8.93 AUD (M pay = 5.43 AUD). Prior to analyzing the data, we excluded answers more than five standard deviations from the distribution's mean (33 out of 2784 total trials) as preregistered.
We conducted the data analyses as preregistered; any deviations from the preregistered plan are clearly marked.

Results
On average, participants drew M = 29.29 samples from each distribution.

Estimation task

The qualitative effects in the estimation task were similar to those in the valuation task. In particular, the positive effect of skewness on estimations was robust in Experiments 1A and 1B. Presumably due to a smaller effect size, the underestimation of the mean and the influence of variance on estimations were not always significant.

4 For robustness, we also report the result of a Wilcoxon test: W(n = 58) = 465, p = .003. 5 For robustness, we also report the result of a Wilcoxon test that was not preregistered:
Surprisingly, increasing the mean led to less undervaluation and trended toward less underestimation, which was not in line with any of the theories reviewed above. A possible confound could be that we kept the variances of the underlying distributions constant across all mean levels. Thus, there was proportionally less variation for high- than for low-mean sequences (see also Whalen et al., 1999; Weber et al., 2004). To clarify these issues, we conducted another experiment, outlined next.

EXPERIMENT 2
Experiment 2 aims to clarify the influence of the absolute variance as opposed to the variation relative to the mean (the coefficient of variation).

Material
We made several changes from the previous experiments. We constructed distributions by holding the coefficient of variation (standard deviation/mean) constant across different mean levels. We introduced three variation levels (5%, 10%, and 20%) and eight mean levels (30, 50, 75, 100, 130, 160, 200, and 250). To keep from inflating the number of trials, we omitted the skewness manipulation. Finally, we drew the offer for the BDM auction in the valuation task uniformly between 0 and the 99th percentile of the respective distribution.
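A sketch of this construction, assuming normal distributions (the skewness manipulation was omitted), with names of our choosing:

```python
import numpy as np

def draw_cv_constant(mean, cv, n, rng):
    """Normal samples whose sd scales with the mean: sd = cv * mean.

    Holding the coefficient of variation constant keeps the relative
    variation identical across mean levels, unlike Experiments 1A/1B,
    where a constant sd implied less relative variation at higher means.
    """
    return rng.normal(mean, cv * mean, n)

rng = np.random.default_rng(2)
x = draw_cv_constant(200.0, 0.10, 5_000, rng)  # 10% variation level: sd = 20
```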

Participants and procedure
We recruited 120 participants from the University of Geneva subject pool. We determined the sample size before the start of the experiment and doubled it from the previous experiments to increase power.

Results
We excluded trials in which responses were more than five standard deviations from the true mean. In contrast to the previous experiments, we also excluded six participants with more than five such trials in either condition, as such behavior indicates a general misunderstanding of the task. However, all reported effects were robust even when the trials of these participants that fell within five standard deviations were included.

Estimation task
The proportional deviations of the mean estimates mirrored those in the valuation task. Similarly, we found that higher variation relative to the mean led to lower estimates. Again, this indicates that the effect of the mean on estimation in Experiment 1A was spurious.

Discussion of Experiment 2
We replicated all effects in valuations that aligned with the predictions of a compressed mental number line. The similar pattern in the estimation task, which is not based on preferences, is consistent with this observation.

EXPERIMENTS 3A AND 3B
In the previous studies, the number of samples drawn from each distribution was determined by the participants themselves. Experiments 3A and 3B test whether the effects persist when the sample size is fixed.

Method
In both Experiments 3A and 3B, participants drew a fixed number of 20 samples from each distribution. Participants were recruited online via Prolific (n = 131 and n = 133). For the payment of the valuation task, the BDM offer was randomly drawn between the lowest and the highest outcome of the selected sequence.
Sequences in Experiment 3A varied in mean (80-120), standard deviation (10 vs. 20), and skewness (−2.5 vs. 0 vs. +2.5). There were two experimental conditions: in one, participants estimated the mean before making a valuation; in the other, participants were told the true mean before making a valuation.
In Experiment 3B, only normally distributed sequences were presented. These distributions had similar variance but different means.
At the end, participants completed the Berlin Numeracy Test (Cokely et al., 2012), which consists of four questions involving probability calculations.

Results and discussion
Using a fixed sampling design, Experiment 3A replicated the effects of underestimation and undervaluation, as well as the negative effect of variance and the positive effect of skewness, in both tasks. This showed that previous results did not depend on motivated sampling or endogenous sample size. Undervaluation was less pronounced when the true mean was known than when it was unknown (M = −0.37%, Mdn = −0.28%, SD = 11.34%; regression: b = −6.21, p < .001).
Knowing the mean reduced undervaluation by about 3.39 percentage points compared with the condition where the mean was not known. In addition, the deviation in the estimation condition predicted the deviation in the following valuation, b = 0.62, p < .001. As a limitation, these effects might also have been driven by the fact that valuation followed directly after true-mean presentation or estimation, which may have set an anchor for the valuation.
In Experiment 3B, we again replicated the effects of underestimation and undervaluation. When estimation and valuation tasks were elicited in different blocks and based on separate (but identical) samples, there was no significant correlation between the mean of percentage deviations for each participant in both tasks (r = .11, p = .13).
Further, the Spearman correlation between the numeracy score and the participant mean deviation was r = .13 (p = .076) in the estimation task and r = −.05 (p = .710) in the valuation task. We conclude that in this experiment the evidence for an individual tendency to underestimate and undervalue remains inconclusive. Overall, trial-by-trial variability seemed to be very strong, which impeded the correlation of individual differences between tasks (Rouder, Kumar, & Haaf, 2019).

SUMMARY OF RESULTS
To summarize the results across all experiments, we combined and analyzed the data in an internal meta-analysis. The overall sample size was 454, and all previously reported effects were significant for both valuation and estimation. Estimation effects were smaller than valuation effects but were always in the same direction. To quantify the degree to which estimation resembled valuation, we used the ratio (estimation/valuation) of each respective effect.
Across all experiments, the overall ratio of underestimation to undervaluation was 19%. Taking the difference in answers between low- (SD = 5) and high-variance (SD = 20) sequences separately for the two tasks yielded a ratio of 27%. Finally, taking the difference in answers between left- and right-skewed sequences separately for the two tasks yielded a ratio of 48%. This means that the greatest similarity in effect size between the estimation and valuation tasks was found for the effect of skewness. 6 The ratios across all experiments, as well as separately for each experiment where the particular effect was manipulated, are presented in Figure 5.
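For concreteness, the ratio measure works as in this sketch; the numbers below are illustrative only, not the observed values.

```python
def effect_ratio(estimation_effect, valuation_effect):
    """Share of a valuation effect that reappears in estimation.

    0 means the effect is absent in estimation; 1 means it is equally
    strong in both tasks.
    """
    return estimation_effect / valuation_effect

# Illustrative (made-up) numbers: an underestimation of -1.9% alongside an
# undervaluation of -10.0% would give a ratio of 0.19, i.e., 19%.
ratio = effect_ratio(-1.9, -10.0)
```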

GENERAL DISCUSSION
In examining experience-based valuations of gambles with continuous outcome distributions, we found that people showed risk aversion.
They gave valuations below the sequence means, and valuations were lower for high-than for low-variance sequences. In addition, participants valued right-skewed higher than left-skewed distributions.
These results disconfirm the hypotheses that underweighting rare events and overweighting extreme events generalize to valuations of continuous outcomes. We found a similar qualitative, though less pronounced, pattern in a task where participants estimated the mean of a sequence, and economic preferences for risk or skewness should play no role. This shows that the characteristics of a number distribution affect the perception and integration of numeric information beyond the narrow area of economic valuation under risk and uncertainty.
The behavioral effects, as well as the similarity between valuations and estimations, are consistent with the idea that numeric cognition is subject to a compressed mental number line (Feigenson et al., 2004).

6 We do not report the ratio for the mean because, as shown in Experiment 2, this effect vanishes once we hold the variation (instead of the variance) constant.
Consequently, economic valuations of gambles could be shaped partly (20% to nearly 50%, according to our data) by regularities in numeric cognition rather than by subjective preferences alone.

Valuations versus choices from experience
The main difference between our study and those in the DFE literature is that we elicited valuations of single number sequences, whereas most previous studies asked people to choose between two number sequences. In choice studies, people have tended to prefer higher-variance sequences, a behavior explained by the overweighting of high numbers (Glickman et al., 2018; Konstantinidis et al., 2018; Ludvig & Spetch, 2011; Spitzer et al., 2017; Tsetsos et al., 2012). This contrasts with the lower valuations of high-variance sequences observed in our experiments. Future research is needed to shed more light on this gap between valuations and choices in DFE (see Golan & Ert, 2015). Crucial differences between the two paradigms that could be relevant are the number of streams the decider has to pay attention to (Vanunu, Pachur, & Usher, 2019), the goal of the decider to produce either a precise monetary amount or an ordinal comparison, and the attitude of the decider toward perceived (relative) losses (Ashby et al., 2018; Kunar et al., 2017).
Another regularity often found in choice studies of DFE is that people behave as if they underweight rare events (Hertwig et al., 2004). This contrasts with the higher valuations observed in our experiments for right-skewed distributions, with rare high outcomes, than for left-skewed distributions, with rare low outcomes. Prior choice studies that report underweighting of rare events have usually used situations with one safe option and another option with two outcomes. When participants chose between pairs of two-outcome gambles, their choices were more in line with attenuated overweighting of rare events (Glöckner et al., 2016; Kellen et al., 2016).
In addition to the difference in elicitation format, unlike previous studies, our experiments used continuous outcome sequences and, thus, a different definition of rare events.

Valuations from experience and numeric cognition
We found a statistically robust but small underestimation of the mean, −1.72% across all experiments, relative to the actual means of the observed number sequences. This is consistent with recent research (Brezis et al., 2015; Scheibehenne, 2019). Yet older studies did not find such an effect and described people as intuitive statisticians (e.g., Beach & Swenson, 1966; Laestadius Jr, 1970; Spencer, 1963). A possible explanation for this discrepancy is that older studies focused primarily on absolute or squared deviations from the mean and thus did not measure estimation biases. In addition, older studies typically had smaller numbers of participants, so the small underestimation effect might not have reached statistical significance.

As a limitation, we cannot conclude from the similarity between estimation and valuation that mean estimates are direct antecedents of valuations, nor that valuations are causally influenced by numeric cognition. For one, the individual correlation between estimation and valuation was surprisingly small given the similar patterns in the aggregate. Further, there are alternative explanations for the data. For example, the effects in Experiment 3A could be due to anchoring effects or to common-method biases, that is, task characteristics that affected both estimation and valuation similarly (Podsakoff, MacKenzie, & Podsakoff, 2012). Moreover, we are not claiming that estimation and valuation are identical. They clearly differ in that all effects in estimation were smaller than the corresponding effects in valuation. This difference could stem from the increased importance of distributional characteristics like variance or skewness in preferential tasks. To rigorously establish a causal link between numeric cognition and economic valuation, a direct manipulation of the underlying cognitive processes would be required.

The cognitive underpinning of preferences

The DFE paradigm seems particularly useful for better understanding how numeric information is processed in preferential decision making (Peters, 2014). One defining characteristic of this task is that numeric information is presented sequentially and must be stored in memory, as it is usually not available at the time a decision is made. Hence, memory effects can mediate behavioral characteristics such as the overweighting of extreme events (Kahneman et al., 1993; Madan et al., 2014). The process of sequential number integration can also be explicitly modeled through online updating or memory-based recall of individual numbers (Erev et al., 2008; Gonzalez et al., 2003; Mason et al., 2019). Future research is needed to link these memory models to the compressed mental number line and to examine whether compression differs depending on the memory processes that are necessary to integrate numeric information. Another class of cognitive models, range-frequency theory and decision by sampling, examines how the distribution of individual numbers shapes the perception of individual numbers and summary evaluations (Parducci et al., 1968; Stewart et al., 2006; Tripp & Brown, 2016). In sum, a better understanding of how context and memory processes influence the valuation of numbers can inform models of preferential decision making.
In general, processing numeric information requires cognitive resources. To the extent that people differ in their cognitive abilities, this could also affect preferential decision making (Ashby, 2017; Dohmen et al., 2018). In the current studies, we found no conclusive evidence of a relation between underestimation and numeracy. This could be due to the small effect size of this correlation. However, it could also mean that more refined cognitive models are necessary to find a direct relation between cognitive ability and DFE. One important point is that the numeracy questionnaire mostly asks about the understanding of probabilities, whereas no explicit probabilities have to be calculated in DFE. Further, we are not aware of any research explicitly linking the curvature of the mental number line to numeracy. Other measures of cognitive ability may be more important to understanding individual differences in DFE. Given the memory component present in DFE, a plausible candidate could be working memory capacity, that is, the ability to store and manipulate items in short-term memory (Frey, Mata, & Hertwig, 2015).
Finally, many important laboratory studies of economic behavior use the decisions-from-description (DFD) paradigm (e.g., Holt & Laury, 2002; Tversky & Kahneman, 1992). Cognitive models could also help us better understand behavioral differences between DFD and DFE by examining the process of numeric integration. Whereas outcomes must be integrated with probabilities when information is summarized descriptively, single outcomes must be integrated over time when information is presented sequentially. Thus, differences in the context of the numbers presented and in the involvement of memory processes in DFE could lead to divergent behavior in the two paradigms. Future research is needed to synthesize the above points through the integration of cognitive models into the examination of preferential behavior in both DFD and DFE.