Rewards are objects that are advantageous or necessary for the survival of individuals in a variety of environmental situations. Individuals live in a world with limited resources, which are incompletely known (epistemological uncertainty), partly unpredictable or inherently stochastic. Thus, uncertainty is an inherent feature of resources in many natural situations and importantly determines survival. Faced with uncertainty, humans tend to engage in superstitious beliefs and behaviour in order to reduce the uncertainty and bias uncertain events towards favorable outcomes. The degree of superstition is often inversely related to the degree of knowledge of the world, and the acquisition of knowledge can reduce superstition. This relationship suggests that human brains detect the uncertainty in environmental events. Thus, comprehensive investigations of the neuronal mechanisms underlying behaviour directed at obtaining important resources should include uncertainty.
Rewards have specific magnitudes and occur with specific probabilities. Therefore rewards can be adequately described by probability distributions of reward values, and choices between two rewards can be viewed as choices between probability distributions. Decision makers prefer probability distributions of reward with the highest expected value (the anticipated ‘mean’, first moment of the distribution) while taking into account the variance (second moment) or its square root, the standard deviation. Variance or standard deviation reflects the spread of a distribution and indicates how far possible values are away from the expected value. They are measures of the degree of uncertainty in known probability distributions and are referred to as risk (Markowitz, 1952). An alternative, often more appropriate, measure of risk is the coefficient of variation, defined as the standard deviation divided by the expected value (Weber et al., 2004). Intuitively, ‘risk’ refers to the chance of winning or losing relative to the expected value of the probability distribution, rather than the more narrow, common-sense association with loss. By contrast, ‘ambiguity’ refers to the uncertainty in probability distributions that are incompletely known or difficult to capture.
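The moments described above can be computed directly. As a minimal numerical sketch (the gamble values are illustrative, not from any experiment discussed here), the expected value, variance, standard deviation, and coefficient of variation of a two-outcome gamble are:

```python
# Sketch: first two moments of a reward probability distribution, plus the
# coefficient of variation, as measures of value and risk. Hypothetical gamble.
outcomes = [1.0, 9.0]   # possible reward magnitudes (e.g. £1 or £9)
probs    = [0.5, 0.5]   # their probabilities

expected_value = sum(p * x for p, x in zip(probs, outcomes))  # first moment
variance = sum(p * (x - expected_value) ** 2                  # second central moment
               for p, x in zip(probs, outcomes))
std_dev = variance ** 0.5
coeff_of_variation = std_dev / expected_value  # risk scaled by expected value

print(expected_value)      # 5.0
print(variance)            # 16.0
print(std_dev)             # 4.0
print(coeff_of_variation)  # 0.8
```

Scaling the standard deviation by the expected value, as in the last line, is what makes the coefficient of variation a relative risk measure comparable across options of different size (Weber et al., 2004).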
In order to make informed choices, decision makers need predictive information about the probability distributions they are about to choose from. With incomplete predictions, decision making will degenerate towards guessing. Predictions about future rewards are determined by the contingency (dependency) of the reward on a particular stimulus or action (Rescorla, 1967). In their simplest form, Pavlovian predictions are learned without the active participation of the subject. The range of Pavlovian predictions beyond the classically studied vegetative reactions (Pavlov, 1927) is open and debatable. However, for the sake of simplicity we may call any prediction of reinforcers ‘Pavlovian’ if the acquisition of these predictions does not require the subject’s own action. Importantly, with this definition Pavlovian predictions inform about probability distributions of rewards. Thus, Pavlovian predictions include at least the first two moments of these distributions, namely the value (expected value) and the risk (variance or standard deviation). With this concept, classic Pavlovian predictions allow subjects to take the risk of outcomes into account when making informed decisions.
Risk influences behavioural choices depending on the local risk attitude of the individual decision maker. In risk-averse choices, individuals prefer options with lower rather than higher risk, all other parameters being equal. Behavioural attitudes toward risk vary between individuals and depend within the same individual on the domain of the risky event and the situation in which it occurs, as shown in animal foraging (Caraco et al., 1980, 1990) and human choices (Weber et al., 2002). Risky choices are determined by the subjectively perceived riskiness of options, even with constant absolute degree of risk (Weber & Milliman, 1997). Risk attitudes can be related to temporal discounting; steeper discounting results in lower valuation of later higher rewards that would compensate earlier losses, thus leading to risk aversion. Thus risk aversion might be partly explained by temporal discounting, whereas risk seeking would be anticorrelated with temporal discounting and would require overcoming the reduced value of delayed compensation.
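The discounting argument can be illustrated numerically. In this hedged sketch (the loss, gain, delay, and discount rates are all hypothetical), a risky option incurs an immediate loss that a later, larger gain would compensate; under steep exponential discounting the delayed compensation no longer outweighs the immediate loss, producing risk-averse rejection of the option:

```python
# Sketch: how steep temporal discounting can mimic risk aversion.
# All numbers are hypothetical, for illustration only.
def discounted(value, delay, rate):
    """Simple per-step exponential discounting of a delayed value."""
    return value * (1 - rate) ** delay

loss_now, gain_later, delay = -4.0, 5.0, 3
for rate in (0.05, 0.30):  # shallow vs steep discounting
    net = loss_now + discounted(gain_later, delay, rate)
    # shallow discounting: net > 0, risky option accepted
    # steep discounting:   net < 0, risky option rejected (risk aversion)
    print(rate, net)
```

With a 5% per-step rate the delayed £5 is still worth about £4.29 and the option remains attractive; at 30% it is worth only about £1.71 and the option is rejected.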
Risk attitude can be derived from the subjective utility of objective values assessed by choice preferences. A gradually flattening, concave utility function indicates that the gains achieved by higher outcomes become gradually less important (Fig. 4A). When risk is involved, the steeper slope of the function at lower gains suggests that potential losses loom larger than gains relative to an intermediate reference. For example, a risk avoider prefers a sure gain of £5 over an option with an equal probability of winning £1 or £9 (each P = 0.5). Due to the steep slope at low values, receiving £1 reduces utility relative to £5 more than a gain of £9 increases it, thus discouraging the choice of risky options. Higher risk increases the tendency to choose safe outcomes because the differences in slope become important with larger ranges. Thus risk avoiders appear to assign lower subjective values to options with higher rather than lower risk, despite the same mean objective value (£5). By contrast, with a convex utility function the slope increases with increasing values, indicating that lower values have little utility (‘small change’) whereas higher values produce disproportionately more utility (Fig. 4B). Convex utility functions are associated with preference for higher rather than lower risk options, called risk seeking. The utility gained from outcomes exceeding the mean more than offsets the loss incurred by outcomes below the mean; the steeper slope at higher gains encourages the choice of risky options. Thus risk preference can be inferred from utility functions. Taken together, risk influences the subjective valuation of outcomes, and choices are determined not only by the value of outcomes but also by their risk. The valuation of risky outcomes, and possibly even the preceding assessment of risk itself, appear to be subjective.
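The £5-versus-gamble example can be worked through with explicit utility functions. This is a sketch under assumed functional forms (a square-root function for concavity, a quadratic for convexity; neither is claimed by the text), showing how curvature alone flips the preference between a safe option and a gamble of equal expected value:

```python
import math

def expected_utility(u, outcomes, probs):
    """Probability-weighted utility of a gamble's outcomes."""
    return sum(p * u(x) for p, x in zip(probs, outcomes))

gamble = ([1.0, 9.0], [0.5, 0.5])  # £1 or £9, P = 0.5 each
safe = 5.0                         # sure £5, same expected value

concave = math.sqrt              # illustrative risk-averse utility, u(x) = sqrt(x)
convex  = lambda x: x ** 2       # illustrative risk-seeking utility, u(x) = x^2

# Concave: EU(gamble) = 0.5*1 + 0.5*3 = 2.0 < u(5) ~ 2.236 -> safe preferred
eu_gamble_concave = expected_utility(concave, *gamble)
u_safe_concave = concave(safe)

# Convex: EU(gamble) = 0.5*1 + 0.5*81 = 41.0 > u(5) = 25.0 -> gamble preferred
eu_gamble_convex = expected_utility(convex, *gamble)
u_safe_convex = convex(safe)
```

Under the concave function the gamble's low outcome costs more utility than its high outcome adds, so the safe £5 wins; under the convex function the relation reverses, matching the risk-averse and risk-seeking patterns of Fig. 4A and B.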
Figure 4. Theory and design for risk experiments in humans. (A) Hypothetical concave utility function with single concave component, associated with risk aversion. Based on such a utility function, decision makers would prefer a safe choice of £5 over a gamble of £1 or £9 occurring with equal probability (P = 0.5 each), as the loss from obtaining £1 weighs more than the gain of £9 (arrow). (B) Hypothetical convex utility function associated with risk seeking. Such a function would be associated with higher subjective value of the gamble compared to the safe outcome. (C) Expected reward value and risk as a function of reward probability. Expected reward value, measured as mathematical expectation of reward, increases monotonically with reward probability (filled circles). Expected value is minimal at P = 0 and maximal at P = 1. Risk, measured as reward variance, follows an inverted U-function of probability and is minimal at P = 0 and P = 1 and maximal at P = 0.5 (open squares). (D) Experimental stimuli used for testing reward value and risk. Twelve different stimuli are associated with different reward magnitudes (ordinate) and probabilities (abscissa) as shown. Expected value of stimuli (sum of probability-weighted magnitudes) is indicated below stimuli and increases with distance from origin. Circles connected by lines indicate two-choice options with two identical expected values, respectively (100 and 200 points) but each with different risk due to specific magnitude–probability combinations.
Although probability denotes the frequency of uncertain events, it is not by itself a monotonic measure of risk. For example, in a two-outcome situation such as reward vs. no reward, outcome value increases monotonically with the probability of outcome whereas risk is maximal at P = 0.5. Thus the degree of risk follows an inverted U-function peaking at P = 0.5 (Fig. 4C). At P = 0.5, there is exactly as much chance to obtain a reward as there is to miss a reward, whereas higher and lower probabilities than P = 0.5 make gains and losses, respectively, more certain and thus are associated with lower risk. Thus, the design distinguishes risk, which varies according to an inverted U-function of probability, from expected value, which increases monotonically with probability.
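The dissociation between expected value and risk is easy to verify. For a two-outcome gamble delivering magnitude m with probability p and nothing otherwise, the expected value is p·m while the variance is p(1−p)·m², which is zero at the certain extremes and maximal at P = 0.5, as in Fig. 4C (magnitude chosen arbitrarily for illustration):

```python
# Sketch: expected value vs risk for a two-outcome gamble (reward m or nothing),
# illustrating the inverted U-function of probability described above.
m = 1.0  # reward magnitude (arbitrary units)
results = []
for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    ev = p * m                  # expected value: increases monotonically with p
    var = p * (1 - p) * m ** 2  # variance: inverted U, maximal at p = 0.5
    results.append((p, ev, var))
    print(p, ev, var)
# risk vanishes at the certain extremes p = 0 and p = 1 and peaks at p = 0.5
```

This is the property the experimental design exploits: probability drives value and risk in different directions, so neuronal signals tracking one can be separated from signals tracking the other.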
The use of imperative, no-choice tasks facilitates the study of basic neuronal mechanisms of risk that are independent of choice and occur before a decision is made. Many processes intervene between the reception of key decision information and the overt behavioural choice. Neuronal signals track expected reward value and risk at an initial perceptual level, and additional subsequent neuronal processes may determine the final choice, including the comparison of previously signalled action values (Sutton & Barto, 1998). Thus, a first step in investigating neuronal mechanisms of reward might focus on neuronal value and risk signals without choice. Nevertheless, to be meaningful for decision making, neuronal reward signals should also be investigated in choice situations.
In a typical experiment in humans, specific pictures predict specific reward magnitudes (100–400 points in steps of 100) at specific probabilities (P = 0.0–1.0 in steps of 0.25), resulting in specific expected value and variance predicted by each stimulus (Fig. 4D; Tobler et al., 2007b). Only one stimulus is presented in imperative trials without choice, whereas two stimuli are shown simultaneously in choice trials. The outcome is presented as the number of points gained (0–400), of which 4% are summed and paid out as British Pence immediately after the experiment.
The risk attitude of participants influences the choice between two simultaneously presented stimuli associated with low and high risk but the same expected value (e.g. connected circles in Fig. 4D). The risky gamble produces one of two equiprobable (P = 0.5) reward magnitudes. Each time the participant chooses the more certain stimulus, the factor of risk aversion increases by one, whereas choosing the more uncertain stimulus decreases it by one (four choices total). An average positive factor indicates risk aversion, a negative factor indicates risk seeking, and a zero factor risk neutrality. We also determine risk attitude at choice indifference by identifying for each risky option the safe amount for which participants are indifferent between the risky and the safe option (certainty equivalent), using the PEST procedure. In another risk assessment, participants rate the pleasantness of the risk-predicting stimuli on a scale ranging from 5 (very pleasant) to −5 (very unpleasant). We quantify risk aversion by comparing the summed ratings for P = 0.25 and P = 0.75 with the rating for P = 1.0 (Wakker, 1994). Risk attitudes measured by choice preferences and subjective pleasantness ratings correlate in our experiments with factors around r = 0.6 (Tobler et al., 2007b, 2009). Using these risk assessments with different expected values allows us to determine the coding of reward value separately from risk.
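The choice-based risk-attitude factor described above can be sketched in a few lines (the choice sequences below are hypothetical, and the ±1 scoring over four choices follows the description in the text):

```python
def risk_aversion_factor(choices):
    """+1 per safe (more certain) choice, -1 per risky choice, averaged.
    Positive -> risk aversion, negative -> risk seeking, zero -> neutrality."""
    score = sum(+1 if c == "safe" else -1 for c in choices)
    return score / len(choices)

# Hypothetical sequences of four choices between a safe and a risky option,
# each pair matched for expected value as in Fig. 4D:
print(risk_aversion_factor(["safe", "safe", "risky", "safe"]))    # 0.5  -> averse
print(risk_aversion_factor(["risky", "risky", "risky", "safe"]))  # -0.5 -> seeking
print(risk_aversion_factor(["safe", "risky", "safe", "risky"]))   # 0.0  -> neutral
```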
Subjective coding of reward risk
When different visual stimuli predict reward with different probabilities, BOLD responses in the lateral orbitofrontal cortex vary according to an inverted U-function of probability without significantly varying with reward value (Tobler et al., 2007b). These data indicate the coding of the risk in the different probability distributions. Risk coding is also found, separately from value coding, in the ventral striatum, subthalamic nucleus, mediodorsal thalamic nucleus, midbrain and bilateral anterior insula when the interval between the prediction and resolution of risk is extended to several seconds (Preuschoff et al., 2006). These latter risk signals have longer latencies than the orbitofrontal risk signal and occur in brain structures that receive dopamine afferents, possibly reflecting input from the similarly slow dopamine risk signal (Fiorillo et al., 2003). The differences in time course between the striatal and orbitofrontal responses may reflect different functions of these risk signals.
A good test for the subjective coding of risk might be to correlate the risk signal with individual risk attitudes across different individuals, as measured by their choice preferences. Indeed, the risk signal, as defined above by fitting an inverted U-function of probability, increases in the lateral orbitofrontal cortex with individual degrees of risk aversion (Fig. 5, top). Risk avoiders seem to have a particularly substantial signal indicating the degree of risk in the upcoming reward (right), whereas risk seekers lack such a signal (left). By contrast, a risk signal in the medial frontal cortex increases with risk seeking (Fig. 5, bottom). The signal is particularly strong in risk seekers (left) and, if used by the brain for biasing decisions, might drive individual choices toward the more risky options.
Figure 5. Relation of human frontal risk signals to individual risk attitude. (A) Location of BOLD signal for risk in lateral orbitofrontal cortex covarying with risk as inverted U-function of probability. The signal increases with increasing risk aversion across participants. Risk attitude is measured by choice preferences. (B) Correlation of contrast estimates for risk of individual participants with individual risk aversion. (C) Location of BOLD signal for risk in medial frontal cortex covarying with risk seeking. (D) Risk correlation analogous to B.
These data suggest that risk signals are not the same across different individuals but vary according to individual risk attitudes, suggesting subjective coding of risk. The individual variations in risk signals may explain the different attitudes of individuals towards risk and might influence their choices in risky situations.
Subjective valuation of risky rewards
Risk attitudes determine choice preferences in risky situations. It is generally assumed that choices are directed toward the most highly valued outcomes. Thus, choices biased by risk attitude are based on the subjective valuation of risky outcomes. A more complete investigation of neuronal risk mechanisms should not only assess individual, subjective variations of risk signals but, importantly, consider the influence of risk on reward value.
BOLD signals in parts of prefrontal cortex code expected reward value irrespective of individual risk attitudes. The same BOLD signal also codes risk; the risk coding, but not the value coding, varies with individual risk attitude (Tobler et al., 2007b). These data reveal a combined value and risk signal whose risk component appears to be subjective. However, the result does not yet demonstrate a neuronal correlate for the influence of risk on subjective reward valuation. What we need is not only a signal that codes both value and risk but a direct influence of risk on the value signal, and that influence of risk on the value signal should depend in a consistent manner on risk attitude. This is exactly what BOLD responses in parts of prefrontal cortex do.
BOLD responses in the lateral prefrontal cortex increase with increasing expected value irrespective of risk attitude, suggesting value coding (Fig. 6A; Tobler et al., 2009). Time courses of value responses to the low-risk options are similar irrespective of risk attitude (Figs 6B and C; blue curves). These value-coding activations are influenced by different levels of risk. Importantly, the influence depends on individual risk attitudes measured by choice preferences. The value signal decreases with increasing risk in risk avoiders (Fig. 6B; blue lines with downward arrows toward red lines) but increases with increasing risk in risk seekers (Fig. 6C; upward arrows). The changes occur in both imperative and choice situations. These results suggest a remarkable integration of risk into expected-value signals in the prefrontal cortex.
Figure 6. Influence of risk attitude on subjective valuation of risky outcomes in human lateral prefrontal cortex. (A) Location of BOLD signal for risk attitude-dependent influence of risk on reward value. The risk signal is defined by correlation with risk as an inverted U-function of probability. (B) Time courses of BOLD value signal with low-risk outcomes (blue curves). In risk avoiders, increasing risk (red curves) leads to collapse of value coding (green arrows), suggesting reduced subjective valuation of more risky outcomes compared to less risky outcomes. (C) Time courses of BOLD value signal increasing with risk in risk seekers, suggesting increased subjective valuation of more risky than of less risky outcomes. In B and C, the average variance of low-risk and high-risk outcomes is 2500 and 20 000 points2, respectively. Low and high expected values are averaged across 50 and 100, and across 150 and 200, points respectively. Time courses are averaged separately for risk avoiders and risk seekers. Risk attitude is measured by choice preferences.
The subjective nature of the BOLD value signal for risky rewards relates well to the influence of risk on behavioural decisions conceptualised by expected utility theory. Just as the lower preference of risk avoiders for risky options reflects lower subjective valuation of risky outcomes (Fig. 4A), the BOLD value responses decrease with more risky outcomes (Fig. 6B). In analogy, the preference of risk seekers for more risky options demonstrates heightened subjective value of risky outcomes and is paralleled by stronger value responses for risky outcomes (Fig. 6C). The correlation between behavioural and lateral prefrontal BOLD responses to risk relates well to alterations in risk attitude induced by electrical stimulation to the prefrontal cortex (Knoch et al., 2006; Fecteau et al., 2007). Taken together, this part of the prefrontal cortex values risky rewards in a subjective manner according to individual risk attitude, reporting less value in risk avoiders but more value in risk seekers.
These neuronal results correlate well with the role of risk in behavioural preferences conceptualised by expected utility theory. The integration of risk and expected value into a subjective value signal is reminiscent of the mean-variance approach in finance theory, which views expected utility as a function of the expected value and risk, based on Taylor series expansion using utility functions (Levy & Markowitz, 1979). However, the data should not be taken to refute neuronal correlates for other approaches to subjective valuation of outcomes such as scalar expected utility theory or prospect theory; even if these data relate to the mean-variance approach to utility, it is conceivable that other brain structures instantiate other theoretical notions of subjective outcome valuation.
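The mean-variance link to expected utility mentioned here follows from a standard second-order Taylor expansion of the utility function around the expected value; a brief sketch (notation chosen here, not taken from the original text):

```latex
% Expand u(x) around the expected value \mu = E[x]:
u(x) \approx u(\mu) + u'(\mu)\,(x - \mu) + \tfrac{1}{2}\, u''(\mu)\,(x - \mu)^2
% Taking expectations, the linear term vanishes because E[x - \mu] = 0:
E[u(x)] \approx u(\mu) + \tfrac{1}{2}\, u''(\mu)\, \sigma^2
% Expected utility thus depends on the expected value \mu and the risk \sigma^2:
% u'' < 0 (concave) penalises variance (risk aversion),
% u'' > 0 (convex) rewards variance (risk seeking).
```

In this reading, a neuronal value signal that is suppressed by risk in risk avoiders and enhanced by risk in risk seekers behaves like the variance term with negative or positive curvature, respectively.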
The assessment of expected utility is traditionally based on preferences during choices. The similarity of ventrolateral prefrontal activations in imperative and choice situations suggests that risk attitude may influence the subjective valuation of risky outcomes in this part of frontal cortex irrespective of overt choices. Subjective reward coding in imperative trials facilitates data interpretation and suggests that the mechanism is operational at the input stage of processes leading to potential choices.