Keywords: risk preferences; elicitation methods

Abstract

  1. Top of page
  2. Abstract
  3. 1 Introduction
  4. 2 Experimental Methods and Procedures
  5. 3 Replication of Results and Analysis of Aggregate Decisions
  6. 4 Analysis of Individual Decisions
  7. 5 Conclusion
  8. Acknowledgement
  9. References
  10. Supporting Information

We compare the consistency of choices in two methods used to elicit risk preferences on an aggregate as well as on an individual level. We ask subjects to choose twice from a list of nine decisions between two lotteries, as introduced by Holt and Laury (2002, 2005), alternating with nine decisions using the budget approach introduced by Andreoni and Harbaugh (2009). We find that, while on an aggregate (subject pool) level the results are consistent, on an individual (within-subject) level behaviour is far from consistent. Within each method as well as across methods we observe low (simple and rank) correlations.


1 Introduction


Measuring and controlling for risk aversion in laboratory economic experiments is commonplace. However, while the concept of measuring risk aversion – for a given utility function – is relatively straightforward theoretically, finding the appropriate test for risk attitudes is still discussed extensively in the literature. In this article we compare results of two methods that can be used to elicit risk preferences – the popular method by Holt and Laury [HL (2002, 2005)] and a newer approach by Andreoni and Harbaugh [AH (2009)]. Both methods are developed to provide data which can be easily interpreted in a constant relative risk aversion (CRRA) framework. Our design, using each method twice, alternating between methods, allows us to check for consistency of aggregate-level as well as individual-level measurements of risk aversion within and between methods. This within-individual benchmark makes our study unique in the literature studying consistency of risk attitudes. We find that while analysis of aggregate data indicates consistency in behaviour, this evidence is weak on the individual level, both within methods and between methods.

Why should we care about consistency? In this study we take the role of the practitioner who uses elicitation methods to control for risk attitudes of participants. Hence, we require usable and reasonably robust value estimates (ex post) based on an (a priori) theoretical framework for which a method was designed. As such, these estimates should not be too sensitive to small manipulations in task descriptions and pay-offs; and they should consistently identify risk attitudes – degrees of risk averse, neutral or seeking behaviour – on an aggregate as well as on the individual level. We perform this analysis for HL and AH.

A large body of literature identifies and discusses problems with elicitation methods, in particular in connecting experimental results to the underlying theoretical models. These problems have been addressed by adopting empirical strategies: using utility functions that incorporate stochastic elements affecting individual choices (e.g. Loomes and Sugden, 1995, 1998; Loomes et al., 2002), allowing for interdependence between choices and the choice options presented (Starmer and Sugden, 1993), or explicitly capturing the heterogeneity of players (Ballinger and Wilcox, 1997). Unfortunately, this literature – despite its insightful considerations – does not provide an implementable toolkit for eliciting risk attitudes in laboratory experiments.

Harrison and Rutström's (2008) survey addresses the practical issue of risk elicitation, reviewing different risk-elicitation methods and discussing ways to estimate risk attitudes. It compares elicitation methods and discusses their specific characteristics, but it focuses on (cross-sectional) group-aggregated data and does not compare differences between elicitation methods on the individual participant level. One reason for this might be that several studies have found, for individuals as well as for group aggregates (like averages), that different elicitation methods yield different measures.1

Isaac and James (2000) compare the risk attitudes implied by the choices of 34 subjects in a first-price auction with measurements based on the Becker–DeGroot–Marschak (BDM) procedure,2 finding that choices in the auction cannot be aligned with risk attitudes based on the BDM procedure. Their results indicate that the two methods only weakly preserve the ordering of risk-aversion measures, and that the methods do not simply shift risk-aversion measures within individuals. Rank correlations (across individuals) are only around 39%.

We add to this literature on within-subject consistency of risk elicitation.3 Several studies have found that risk attitudes are not stable within individuals in experimental settings: Berg et al. (2005) found that implied risk attitudes depend on whether individual decisions are measured using auctions for a risky or a riskless asset. Hey et al. (2009) compared willingness to pay, willingness to accept, BDM measures and choices over pairwise lotteries, finding inconsistencies and in some cases even negative correlations between results of the different methods within individuals. Anderson and Mellor (2009) compare results of the method developed by HL and survey results on gambles (over job and investment choices), finding that except for a small fraction of superconsistent (‘consistently consistent’, p. 152) decision-makers, the methods did not provide consistent within-individual estimates of risk attitudes. Comparing HL results and decisions in a choice setting they refer to as the ‘Deal or No Deal game’ (named after a popular TV show), Deck et al. (2008) find that decisions are not consistent and conclude that one elicitation method is treated as an investment (HL), whereas the other is treated as a gambling decision.

If risk attitudes are understood as a (stable) personal characteristic, one can also ask whether they remain constant over time. Andersen et al. (2008) and Harrison et al. (2005a) investigate the temporal stability of risk attitudes using HL. They argue for temporal stability, although their subjects do not necessarily make the same choices over a 6-month and a 17-month period respectively: they attribute changes in decisions to order effects (Harrison et al., 2005a) and to the fact that the distribution of risk attitudes does not shift over time (Andersen et al., 2008). Lönnqvist et al. (2011) also examine intertemporal stability using HL and a survey; their results indicate that the assumption of stability is problematic and that the predictive power of HL-implied risk attitudes for decisions in the trust game is low.

Each of these studies compares the results from one risk-elicitation method with the results from another choice setting (such as an auction, a trust game, a game show or a survey) in which choices are also likely to be driven by risk attitudes. Our approach differs from this literature by comparing the results of two risk-elicitation methods (each applied twice) to measure choice stability over a short time frame within individual and within method (as opposed to the long time frame in Harrison et al. (2005a)). In addition, we investigate aggregate and individual cross-method consistency using two methods of risk elicitation that have the same theoretical starting point but employ different procedures.

Closest to our study is Dave et al. (2010), a study on a cross-section of the Canadian population, which also compares the results of two methods, HL and an approach by Eckel and Grossman (2002). They find that the risk attitudes implied by the two methods differ (under the Eckel and Grossman method more individuals are risk neutral) and that HL leads to more inconsistent choices, particularly among individuals with lower mathematical skills.4 Our study differs from their approach not only in our second elicitation method but also in that our experimental subjects are drawn from a more homogeneous (student) population (see Supporting Information). In Dave et al. (2010) this mattered for their result, as mathematical skills, which were widely distributed across their experimental subjects, changed the relative accuracy of measures between methods. Furthermore, our approach of letting subjects make decisions for both methods twice provides an internal benchmark when comparing results across methods.5 In addition, both of our chosen methods are based on the same decision variable for determining risk attitudes (an optimal probability over gains; see the description of the methods below for more detail), and both were designed with the same theoretical framework (i.e. utility function) in mind. This reduces two potential reasons why different methods might yield different risk-attitude results for the same individual.

We find (a) that the two methods provide divergent pictures of the overall risk attitude of our subject pool, so that whether subjects are predominantly risk neutral or risk averse depends on the elicitation method; (b) that within-subject consistency of individual decisions throughout the experiment is limited, even within each method; and (c) that individual-level consistency decreases further when comparing across the two methods. These results confirm prior research and call into question whether experimental results on risk attitudes can be used for more than general statements about groups of subjects. The limited cross-method consistency is further aggravated by the fact that internal consistency is not much better within than across methods; that is, the problem does not only seem to be that measures depend on framing. Hence, both methods fall short of the main criteria we would consider desirable.

We proceed by introducing the methods we used for our analysis and describe the experimental procedure. Section 'Replication of Results and Analysis of Aggregate Decisions' contains an analysis on the aggregate level and section 'Analysis of Individual Decisions' the within-subject analysis. Section 'Conclusion' concludes.

2 Experimental Methods and Procedures


In our experiment we used the methods by HL and AH. Both methods are designed to provide good measures for a CRRA utility function, building on the underlying idea of stable individual risk attitudes.6 Both ask experimental participants to make choices over lotteries in which the main decision variable is the probability of winning; both allow relatively straightforward calculation of risk-aversion parameters; and both are laboratory specific and can be incentivised similarly, which increases the comparability of the data collected. We incentivised (ex post randomly) selected rounds of both methods such that they yield the same expected value for a risk-neutral decision-maker. This further increases the comparability between the methods.

We briefly outline the two approaches as we implemented them. The original studies contain full details, and screenshots can be found in the Supporting Information.

2.1 Holt and Laury's Method

HL consists of a menu of lotteries (or multiple price list, MPL) with changing probabilities over ten constant pairs of outcomes. For each pair of outcomes in the list there is a more and a less risky option (based on the variance of outcomes). Participants saw an MPL and were asked to choose separately, for each row, between a pair of lotteries. With each further row down the list, the probability mass on the higher pay-off increased by ten percentage points, making the safer Option A (i.e. the option with a lower variance in pay-offs) less attractive. We deviate from HL slightly by leaving out the certain option (i.e. 100% probability of the higher pay-off) to avoid any reference point of safety. This reduced the number of choices from ten to nine in each round in which we used HL. Over the two rounds, participants played one set-up with slightly higher stakes (a gamble between 10 and 8 vs. a gamble between 19.25 and 0.5) and one with slightly lower stakes (8 and 6.4 vs. 15.4 and 0.4). The higher-stakes set-up simply scaled up pay-offs and therefore implies slightly higher risk premia for choosing the more secure option. However, the estimated bounds for CRRA coefficients remain the same for each number of safe choices.7 Table 1 provides an example for the set-up with the slightly lower stakes.

Table 1. Multiple price list design by Holt and Laury

              Option A                      Option B
      p      X    1−p     Y         p      X     1−p     Y
     0.1     8    0.9    6.4       0.1   15.4    0.9    0.4
     0.2     8    0.8    6.4       0.2   15.4    0.8    0.4
     0.3     8    0.7    6.4       0.3   15.4    0.7    0.4
     0.4     8    0.6    6.4       0.4   15.4    0.6    0.4
     0.5     8    0.5    6.4       0.5   15.4    0.5    0.4
     0.6     8    0.4    6.4       0.6   15.4    0.4    0.4
     0.7     8    0.3    6.4       0.7   15.4    0.3    0.4
     0.8     8    0.2    6.4       0.8   15.4    0.2    0.4
     0.9     8    0.1    6.4       0.9   15.4    0.1    0.4

Note: Individuals are asked to choose between Option A and Option B in each row.
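The CRRA bounds reported later in Table 3 (e.g. 1.15 > α > 0.85 for a subject who makes four safe choices) can be recovered numerically from these pay-offs: at each row's winning probability p, solve for the exponent α of U(x) = x^α that makes a decision-maker indifferent between Options A and B. A minimal sketch for the lower-stakes set-up (the function name is ours; bisection is just one way to solve the indifference condition):

```python
def indifference_alpha(p, hi_a=8.0, lo_a=6.4, hi_b=15.4, lo_b=0.4):
    """CRRA exponent alpha (U(x) = x**alpha) at which a decision-maker is
    indifferent between Option A and Option B at winning probability p."""
    def eu_gap(alpha):
        # expected utility of Option A minus that of Option B
        eu_a = p * hi_a**alpha + (1 - p) * lo_a**alpha
        eu_b = p * hi_b**alpha + (1 - p) * lo_b**alpha
        return eu_a - eu_b
    # For the rows used here the gap is positive at very low alpha and
    # negative at high alpha, so bisect on the sign change.
    lo, hi = 0.01, 3.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if eu_gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# A subject with four safe choices switches to Option B at p = 0.5,
# which brackets alpha between roughly 0.85 and 1.15:
lower_bound = indifference_alpha(0.5)   # ~0.85
upper_bound = indifference_alpha(0.4)   # ~1.15
```

The same routine, applied to each adjacent pair of rows, reproduces the full set of category bounds used in Table 3.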

2.2 Andreoni and Harbaugh's Method

In AH individuals have to trade off the probability of winning against the amount they can win in a gamble; in each decision they allocate a budget (or convex risk budget, CRB) between the probability (p) of winning and the amount to be won (x) – with the inverse probability (1−p) individuals get a pay-off of zero. Individuals receive a budget b which they can use to buy extra percentage points of winning probability at a price or exchange rate (e) per percentage point. Hence, individuals choose a pair (p, x) such that x = b − p·e.

As the method of AH is less common, we give the following example here: in round D the participant starts with a budget of $88. The participant can buy extra probability of winning at a cost of $2.75 for each percentage point. She could, for example, choose to buy ten percentage points at a cost of 10·$2.75 = $27.5. Consequently, she would get $88−$27.5 = $60.5 with a corresponding probability of p = 10% and $0 otherwise (with 1−p = 90%). The participant can continue to buy further winning probability, or reduce the winning probability and get a higher amount in case of winning. The participant will (move the slider and) adjust her combination of probability and amount won until some optimal point is reached.
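Under the CRRA utility U(x) = x^α that both methods target, this trade-off has a closed-form interior optimum: maximising (p/100)·(b − p·e)^α over p (in percentage points) gives p* = b/(e(1 + α)), so an observed interior allocation implies α = b/(p·e) − 1. A sketch of this arithmetic for round D (the function names are ours; we assume an interior optimum):

```python
def optimal_p(b, e, alpha):
    """Percentage points of winning probability a CRRA maximiser buys.

    Maximises (p/100) * (b - p*e)**alpha; the first-order condition
    gives p* = b / (e * (1 + alpha)), clamped to the feasible range.
    """
    return min(100.0, max(0.0, b / (e * (1 + alpha))))

def implied_alpha(b, e, p):
    """CRRA exponent implied by an observed interior allocation p."""
    return b / (p * e) - 1

# Round D: budget b = $88, price e = $2.75 per percentage point.
p_neutral = optimal_p(88, 2.75, 1.0)        # a risk-neutral subject buys 16 points
x_neutral = 88 - p_neutral * 2.75           # and receives $44 if the lottery wins
alpha_back = implied_alpha(88, 2.75, p_neutral)  # recovers alpha = 1
```

The second function is how implied CRRA coefficients can be read off each observed CRB allocation.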

AH vary the range of potential winning gains (b) as well as the price e between additional gains and additional probability of winning. We implemented the AH method with the deviation that we did not use lotteries involving losses. As in the original study we presented the probability of winning as a green shaded area in a pie chart. The amount received in case of winning was illustrated as a green shaded area in a bar chart. Participants were able to change the probability of winning by moving a slider. While the gamble was graphically presented on the computer screen, the probability of winning and the amount to be won were also stated as numbers on the screen. Table 2 shows the budgets b (hence, the maximum amount that could be won with probability zero) in each round as well as the price e of one extra percentage probability of winning. These combinations were each presented to participants twice.

Table 2. Pairs of maximum gain and cost of probability

Round                               A      B      C      D      E      F      G      H      I
Budget (b)                        27.3    56    172     88    49.4   39.2   54.5   207    116
Price (e) per percentage point    0.28   1.17  10.75   2.75   0.77   0.41   0.68   8.62   2.42

Note: Individuals chose p, facing the constraint that they receive b − p·e with probability p.

2.3 Experimental Design

We used a within-subject design in which individuals make choices under the risk-elicitation methods introduced by HL and AH. We analysed the decisions of 78 experimental participants from a regular student population across seven sessions. Participants were recruited online from the experimental subject pool at the Queensland University of Technology using ORSEE (Greiner, 2004) and through announcements in tutorials. Some participants were also recruited in person in common areas at the university; however, when asking students in person to participate in the experiment, the same information was used for recruitment, including the organiser (researchers at the School of Economics and Finance), average earnings (around 20 Australian dollars) and the estimated time to complete the experiment (around 30 minutes). It was also pointed out to the students that there would be no minimum payment for participating in the experiment. It is worth noting that this recruitment of asking students personally to participate was somewhat less controlled than is common in many economic experiments. However, as we were interested in within-subject comparisons and were still drawing from a relatively homogeneous student population, this was of minor concern.

The risk-elicitation methods were implemented in a computer laboratory using custom-made, Java-based software. Upon arrival at the laboratory, participants were seated at computers and asked to work through the experimental instructions and then start the experiment. The instructions included examples of how to make choices in the experiment and two test questions for each risk-elicitation method. Further help from the experimenter was available on request. Once participants had passed the test questions, they started the experiment, going through two rounds of nine choices for each risk-elicitation method, alternating between the methods. The order of the risk-elicitation methods was switched for about half of our experimental sessions (we did not find significant order effects on participants' decisions depending on the order of the methods).

To avoid portfolio-building or wealth effects in the course of the experiment, one of the two rounds was randomly chosen for payment after the experiment was completed. For this round, one choice from each method was randomly selected. Thus, for each method, 1 of 18 decisions was payment relevant. For the two selected choices, participants were given the opportunity to change their earlier decisions; we did so to test whether participants, once they knew that a decision would be paid, would change it.8 Furthermore, the changes in decisions also provide an indicator of the reliability of previously recorded choices over (potentially) hypothetical stakes. Finally, participants were given a questionnaire asking for some demographic information and student status. After finishing the questionnaire, participants were paid and could leave the computer laboratory. Average payments were 17 Australian dollars (SD 18), of which 10 (SD 5) were paid for decisions in HL and 7 (SD 17) for decisions in AH.

3 Replication of Results and Analysis of Aggregate Decisions


In a first step we replicated some of the (central) results of the HL and AH approaches that were relevant for our comparison. Both studies derive parameter estimates for a CRRA utility function of the form U(x) = x^(1−r)/(1−r), as introduced in HL, or, equivalently for our purpose, U(x) = x^α as in AH; we only report values for α. In both methods the probability chosen was the main choice variable of interest for the analysis. For this utility function, HL grouped experimental decision-makers into categories of individuals with a certain risk attitude, based on the estimated coefficient α. Although the HL method does not allow such a coefficient to be calculated directly, bounds on it can be determined by looking at the switching points from more risky to less risky choices. These bounds are, however, difficult to identify if individuals have more than one switching point. To deal with this issue, HL counted the number of safe choices an individual had made and grouped individuals into the categories that this number of safe choices would have implied had they had only a single switching point (SSP). Table 3 reports our replicated results for our two pay-off set-ups, both comparable to the low-stakes set-up of HL, alongside the original results of HL's two treatments with low and high monetary pay-offs. The last column contains the empirical distribution of CRRA coefficients based on our AH data to allow a comparison.

Table 3. Overall distribution of risk attitudes

                       Number of                        HL (replicated) (%)  HL (2002) (%)   AH (%)
Risk attitude          safe choices  Range of α            (1)     (2)        (3)    (4)      (5)
Highly risk loving     0–1           α > 1.95                1       1          1      1        5
Very risk loving       2             1.95 > α > 1.49         0       7          1      1        2
Risk loving            3             1.49 > α > 1.15         8       5          6      4        6
Risk neutral           4             1.15 > α > 0.85        29      21         26     13       61
Slightly risk averse   5             0.85 > α > 0.59        17      23         26     19       11
Risk averse            6             0.59 > α > 0.32        22      19         23     23        9
Very risk averse       7             0.32 > α > 0.03        10      22         13     22        4
Highly risk averse     8             0.03 > α > −0.37        4       4          3     11        2
Stay in bed            9–10          −0.37 > α               9       4          1      6        0

Notes: The table shows the share of decisions that would be classified into the risk categories proposed by HL. We include our replicated HL results (1) and (2), the results from HL's original (2002) study (3) and (4), as well as the results implied by our data for the AH method (5). Our stakes are higher in (2) than in (1), but both correspond to the low-stakes treatment (3) in HL's original approach, as they are significantly lower than in HL's high-stakes treatment (4).

The AH risk-elicitation method allows a straightforward calculation of CRRA coefficients under the functional form described above; we do this for each decision that experimental participants take and report the distribution of all decisions by all participants based on the implied α-coefficient.9 We do not replicate the full analysis of AH, who answer five questions on EUT.10 Instead we focus on whether using a CRRA framework with a simple utility function as characterised above is reasonable. We confirm their regression results over all decisions, showing that budget allocations between the winning probability and the winning prize are approximately constant over the size of winning stakes. Like AH, we find very small standard errors and negligibly small coefficients in the regression results. This indicates that CRRA is a reasonable assumption.

We find that the classification of our subject pool in terms of risk attitudes under the HL method follows a distribution similar to the one reported by HL in their original contribution. Furthermore, we can generally identify a noticeable degree of risk aversion in our subject pool and also find a tendency towards (slightly) increasing risk aversion as the stakes over which the lotteries are played increase. Although our stakes are always close to the lower-stakes set-up of HL, slightly increasing the stakes shifts the results in the expected direction.

Harrison et al. (2005b) pointed out that order effects may influence the results, leading to higher risk aversion in the second round. This may affect our results and explain different levels of risk taking between the two rounds. However, we assume that the effect will be similar across subjects and should not change the rank order of participants based on their risk attitudes. For this reason we use rank correlations in our analysis.

Using AH's method, we find coefficients indicating a higher number of risk-neutral choices than under HL, some risk-averse choices, as well as some risk-loving decisions. The total distribution of results indicates that the two methods, despite drawing on a very similar notion of utility functions and both being theoretically legitimate risk-elicitation procedures, do not provide the same result. The average risk attitude in HL is between slightly risk averse and risk averse (on average 5.44 safe choices are made), while the average decision in AH is risk neutral (with a tendency towards risk aversion). This holds despite the fact that the expected monetary pay-off is the same across methods: in expected value terms, the same amount can be earned in both.

4 Analysis of Individual Decisions


To better understand these differences in results, we analysed the decisions of our participants on a within-subject basis. That is, as all our participants made 18 decisions in each method, we can analyse to what extent each individual decided consistently within and across the two methods.

4.1 Internal consistency of the methods

In a first step we analysed to what extent individual participants made consistent decisions within one risk-elicitation method, using correlations of individual decisions over the two rounds. For the HL method, the numbers of safe choices made in the first and second rounds, which were used to calculate the CRRA coefficients shown in Table 3, had a correlation of 55% and a rank (Spearman's ρ) correlation of 62%. We also considered a second way to measure the degree of risk aversion, for which we did not assume that participants have a clearly determinable SSP but instead calculated the average risk premium within their farthest switching points. (This corresponds to an approach described by Andersen et al., 2006, who are, however, critical of this procedure.) These averages were correlated at a level of 68% (ρ = 69%) over the two rounds of HL. Figure S2 in the Supporting Information also pictures the dispersion of the difference between safe choices in the first round (over lower stakes) and the second round (over higher stakes), indicating a slight shift towards risk aversion, though not a one-directional one.
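The simple and rank correlations used throughout this section can be reproduced with standard tools (e.g. scipy.stats.spearmanr); a self-contained sketch with hypothetical safe-choice counts, using average ranks for ties as Spearman's ρ requires:

```python
def pearson(x, y):
    """Simple (Pearson) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def average_ranks(x):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# hypothetical numbers of safe choices over two HL rounds
round1 = [4, 5, 6, 4, 7, 3, 5, 8]
round2 = [5, 5, 7, 4, 6, 4, 4, 8]
rho = spearman(round1, round2)
```

Ranking before correlating is what limits the influence of outliers, which is why the rank correlations reported here can differ from the simple ones.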

As the idea of an SSP from less to more risky options is important in the HL method, we also examined whether assuming the general prevalence of SSPs was reasonable for our sample, and how many of the participants with an SSP consistently chose the same number of safe choices over the two rounds. Of our 78 participants, 48 had an SSP in both rounds of the HL method.11 Of these, 22 chose the same number of safe choices in both rounds.12 Of the 22 (HL-consistent) individuals, 10 were in the risk-neutral category introduced above and 12 were either risk averse or risk loving.

When asked to reconsider their choices knowing the round that was going to be paid, 8 of 78 participants wanted to change their decision. One of these eight increased the number of safe choices, while all the others increased the number of risky choices. This indicates that participants were generally content with the choices they had made earlier. To understand better which individuals switched their decisions, we investigated whether they differed in some way from the other participants. However, we found no noticeable correlation with gender, age, estimated risk attitude or native language. There was also no order effect of which method was played first, or of whether the choice that could be reconsidered had just been made in the preceding round. There was only a small correlation (of 13%) showing that individuals were somewhat more likely to change decisions around the risk-neutral switching point. Hence, there is no observable explanation for why individuals changed their decisions in HL.

To analyse internal consistency in the AH set-up, we similarly first looked at correlations between individuals' decisions across the two rounds. For this purpose we calculated the implied CRRA α-coefficient for each decision as described in footnote 3. These coefficients showed correlations ranging between 15% and 60% for the same lottery (i.e. the same choice over a corresponding maximum gain and price of an extra probability of winning) over the two rounds. Rank correlations were between 30% and 57% across individuals, indicating that looking only at ordinal risk attitudes lowered the effect of outliers but did not lead to greater consistency over the rounds.13 There was no apparent relationship between the stake of the lottery (b) and the correlation between the two rounds; that is, it was not clear how to identify which factors led to higher consistency over the rounds. We looked at correlations both for the raw decision variable (i.e. the probability chosen in a given round) and for the implied α-coefficients for each round of the game. Figures 1a and b illustrate these correlations for each round. As can be seen, a positive relationship exists for all combinations of b and e, but correlations are far from perfect.


Figure 1. Correlations of decisions over two rounds of AH

Note: The figure shows the correspondence of the choices made over the two rounds of the AH method, with each point representing the choices of one participant. The first round of AH is on the x-axis and the second round on the y-axis. The stakes (b) and prices (e) can be mapped using the overview in Table 2.


In a second step we therefore constructed an individual aggregate of the CRRA coefficient over the different CRB choice allocations, by averaging the coefficients for each individual over each round.14 To determine whether such an aggregation was appropriate, we tested for a positive or negative relationship between the maximum gain and the implied CRRA coefficient; for most individuals we found none.15
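This check amounts to a per-individual regression of the implied α on the maximum gain b; a slope near zero supports averaging the coefficients. A sketch with the nine round budgets from Table 2 and one hypothetical individual's coefficients (the α values are illustrative, not from our sample):

```python
def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

# budgets b of the nine CRB rounds (Table 2) and one hypothetical
# individual's implied CRRA coefficients for those rounds
budgets = [27.3, 56, 172, 88, 49.4, 39.2, 54.5, 207, 116]
alphas = [0.9, 1.1, 1.0, 0.95, 1.05, 1.0, 0.9, 1.05, 1.0]
slope = ols_slope(budgets, alphas)  # near zero: no trend in alpha over b
```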

Having done this aggregation, we compared the (round-)average α values of the two rounds; they showed a correlation of 70% by individual and a rank correlation of 72%. To get a better picture of the robustness of the CRRA coefficients, we also looked at whether participants changed their decisions when informed that a certain round would be selected for final pay-off. Compared with the HL method, many participants (a total of 27) changed their choices. However, changing decisions can have very different leverage in HL and AH, so the two are not directly comparable: in AH comparatively small changes can be made by adjusting the budget allocation just a little, while a change in HL essentially always implies a significant shift in measured risk attitudes. Still, among those who revised their AH decisions the changes were noticeable: on average, participants who changed their choices moved 12% towards safer choices, and absolute changes were 30%.16

Again, as for the HL method, we investigated potential reasons for changing decisions and found some variables that correlate with decision changes in the AH method. Non-native speakers are more likely to make changes (correlation of 24%), indicating that understanding the task might play a role; age and gender play no role. Furthermore, individuals with higher values of α are less likely to change their choices (correlation of 22%). These relationships do not, however, seem strong. Individuals who changed their HL choice were not more likely to change their AH choice as well.

Finally, we investigated to what extent average CRRA coefficients derived with the AH method allowed us to reliably classify participants into the broad categories of risk averse, risk neutral and risk loving individuals. To that end we tested whether the average CRRA coefficient α was significantly different from 1, using confidence intervals of two within-subject standard deviations. For only 5 of the 78 participants was α significantly different from 1; according to our estimates these five participants were risk averse and all other participants were approximately risk neutral.17
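The classification rule can be sketched as follows: an individual counts as risk averse (α < 1) or risk loving (α > 1) only if 1 lies outside a band of two within-subject standard deviations around their average α. The α sequences below are illustrative, not our data.

```python
# Sketch: classify one individual from their per-choice alpha values.

def classify(alphas):
    n = len(alphas)
    mean = sum(alphas) / n
    sd = (sum((a - mean) ** 2 for a in alphas) / (n - 1)) ** 0.5
    lo, hi = mean - 2 * sd, mean + 2 * sd  # 2-SD within-subject band
    if hi < 1:
        return "risk averse"
    if lo > 1:
        return "risk loving"
    return "approximately risk neutral"

print(classify([0.5, 0.6, 0.55, 0.5]))  # tight band below 1 -> risk averse
print(classify([0.4, 1.3, 0.9, 1.6]))   # wide band containing 1 -> neutral
```

With within-subject standard deviations of 0.3 or more (footnote 17), the band almost always contains 1, which is why so few individuals are classified away from risk neutrality.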

4.2 Comparison across methods

Our data also allow us to compare the two risk-elicitation methods on a within-individual basis. One way to do so is to use one method to predict how an individual would have decided in the other. Following this rationale, we used the average risk aversion coefficient derived with the AH method to predict how an individual with this parameter would have decided in the HL framework.

This would have predicted 76% and 75% of decisions in the two rounds of the HL method respectively. In this comparison, however, any individual with multiple switching points (MSP) will have some incorrect predictions even if both methods estimate the same coefficient. To alleviate this effect, we restricted attention to individuals with SSP, for whom we obtained 83% and 82% correct predictions of single (row-by-row binary) HL choices (hence not the overall implied risk attitude) over the two periods. These numbers indicate a high level of comparability.
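A row-by-row prediction of this kind can be sketched as follows. The pay-offs are the low-stakes menu from Holt and Laury (2002) ($2.00/$1.60 safe versus $3.85/$0.10 risky, with the high pay-off probability rising in steps of 0.1), and the mapping r = 1 − α follows footnote 9. This is a sketch under those assumptions, not the paper's exact procedure.

```python
import math

def crra(x, r):
    """CRRA utility; r = 0 is risk neutral, r = 1 is log utility."""
    return math.log(x) if r == 1 else x ** (1 - r) / (1 - r)

def predict_hl(r, rows=9):
    """Predicted choices ('A' = safe lottery) for HL rows 1..rows."""
    choices = []
    for k in range(1, rows + 1):
        p = k / 10  # probability of the high pay-off in row k
        eu_a = p * crra(2.00, r) + (1 - p) * crra(1.60, r)
        eu_b = p * crra(3.85, r) + (1 - p) * crra(0.10, r)
        choices.append("A" if eu_a > eu_b else "B")
    return choices

# A risk-neutral agent (r = 0) switches to the risky option at row 5;
# a moderately risk-averse agent (r = 0.5) switches later.
print(predict_hl(0.0))  # -> ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
print(predict_hl(0.5))
```

Setting r = 0 for everyone gives the risk-neutral counterfactual used as a benchmark below: its predictions can be scored against observed HL choices in exactly the same way as the individual-specific ones.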

However, we read these numbers with care. We used AH to determine individual-specific risk attitudes and then predicted choices in HL. A simple benchmark is therefore to assume that all participants have the same risk attitude and to see how well this counterfactual predicts choices made in HL. We did so by assuming all individuals to be risk neutral, which should bias the comparison in favour of the AH method, as the aggregate analysis for both methods indicates risk aversion. We found that assumed risk neutrality predicted choices made under the HL method equally well (85% and 82% respectively). Hence, our individual-specific estimates do not outperform the counterfactual.

We therefore reverted to the categorisation of participants into groups with different risk attitudes as in Table 3 and allocated individuals to these risk categories according to each of the two methods. Using this approach, only 10% of participants were grouped into the same risk attitude category by both methods. The main reason is that the AH method (on average) classifies individuals as more risk neutral than the HL method; in this sense one could say the AH method 'shifts' the behaviour of individuals towards risk neutrality. We observe an average shift of 27%; however, the shift is not only in one direction (the average absolute shift is about 33%), and the rank correlation ρ of allocations to risk categories is 38%.18 This is surprisingly close to what Isaac and James (2000) found in their study comparing risk attitudes of individuals using a first-price auction and the BDM procedure, the first approach comparing two elicitation methods on an individual level. Figure 2 illustrates this relationship.


Figure 2. Allocation of individuals into risk categories by HL and AH

Note: The number of individuals at a point is indicated by the size of the bubble. The line displays a corresponding linear fit.


5 Conclusion


Using the risk-elicitation methods developed by HL and AH, we tested their internal and external consistency across and within individuals. We find correlations of about 60–70% between decisions in the two periods within method and within individual. Cross-method predictions and correlations were smaller by comparison and could only be established on an aggregate level. Hence, the two methods are not procedurally invariant, either over the full subject pool (as visible in Table 3) or on the individual level. However, since low cross-method correlations are partly driven by the low within-method consistency of decisions documented by our within-method benchmark, a cross-method ρ = 38% is not as low as it first appears. Evidently one would like risk-elicitation methods with more consistency.

This low cross-method consistency seems undesirable: a priori one would have guessed that the two methods yield similar results, and it seems difficult to determine ex post which method is better. Given our data, this gap between a priori compatibility and ex post divergence also cannot be resolved empirically, as no observable variable explains the difference. Any reasoning seems highly speculative given that the two methods have the same theoretical motivation and the same decision variable, as in both methods individuals choose over probabilities.19

The comparison between methods also provides no unambiguous guideline as to which method should be preferred, as both are subject to inconsistencies. While individuals were more consistent over the two rounds of the HL method than of the AH method, for both methods it is problematic to clearly identify the risk attitude of an individual.

In both methods the inconsistencies we describe are not small errors but shifts that are crucial for the interpretation and meaningfulness of the estimated coefficients. This conclusion stands even though we repeated the tasks over only two rounds; increasing the number of repetitions might reveal even more inconsistencies. The analysis of the HL method can be improved by disregarding or simplifying inconsistent choices, but this might not be advisable, as Jacobson and Petrie (2009) have shown. We also find few superconsistent individuals (as did Anderson and Mellor, 2009); in our subject pool individual inconsistencies are an almost universal problem. Like most of the literature before us, we read our individual-based cross-method correlations as unsatisfactorily low.

While we have no clear means of determining which of the two methods is the correct or superior one, we can evaluate to what extent the methods meet the desirable characteristics mentioned at the beginning of the study. In the aggregate, both methods allow statements about the overall risk attitude, and we would conclude that the subject pool is on average (moderately) risk averse. However, while both methods are informative about the risk attitude of a group of subjects, it seems difficult to reliably infer the risk attitude of an individual from either. Hence, judged against the desirable criteria from the perspective of a practitioner, both methods seem to meet no more than the most basic ones, primarily allowing statements about the general prevalence of risk attitudes in the population. This is somewhat disappointing considering that risk aversion is essentially an individual-based concept.

Our findings are particularly crucial from the perspective of a practitioner for whom measuring risk attitudes is not the last step and ultimate goal, but who would like to use this information for further analysis, for example when quantifying the role of risk attitudes in decisions where risk and other elements determine outcomes jointly. Without individual consistency of decisions, it is questionable to what extent HL (or AH) can be used as controls for risk aversion, as is sometimes done when interpreting other experimental games. We would, for example, now be more careful when using them as a stable measure of individual risk attitudes in experiments that try to take the risk aspect out of other decisions, or when using experimental results on risk attitudes as indicators of whether individuals are risk averse, risk neutral or risk seeking (e.g. in public good decisions (Gangadharan and Nemes, 2009), trusting decisions (Houser et al., 2010) or when linking them to genetic data (Zhong et al., 2009)).20 However, it would be very useful to have such a tool.

Acknowledgement


We would like to thank participants at the ANZWEE 2011 in Melbourne and the Econometric Society European Meeting 2012 in Malaga for helpful comments.

Footnotes
  1.

    To our understanding Isaac and James (2000) published the first study addressing this, and subsequent studies, as outlined in more detail below, included further elicitation methods and investigated further aspects of this original finding.

  2.

    This procedure elicits certainty equivalents of given lotteries.

  3.

    When talking about consistency in this study, we refer to the ability of a method to provide reliable measures in the context of the theoretical framework for which the method was designed. Hence, an individual would not always have to make identical choices when repeatedly presented with the same choice options, but would only be required to make choices which imply a similar – ideally the same – conclusion about the risk attitude of this individual. It may be argued that elicited choices can also be analysed using other methods; while we analyse our data in an expected utility (EUT) framework, it may also be used to make statements in non-EUT frameworks. We will, however, stay in EUT realms for the purposes of this study. An example of within-individual comparisons in the context of risk attitudes in a non-EUT framework can be found in Harbaugh et al. (2010).

  4.

    Two further studies looking at differences in elicited risk attitudes among non-students are Reynaud and Couture (2012) and Charness and Viceisza (2011).

  5.

    Bruner (2009) compares risk attitude results when changing the probability of the occurrence of a fixed outcome to results with a fixed probability and changing the outcomes between choices. He then compares the consistency of the method in a statistical estimation over all participants. In our study we always remain on the individual level in our data analysis, assuming heterogeneity in risk attitudes across individuals and interpreting all choices as informing about the risk attitude of the specific individual who made the choice.

  6.

    Generally, both methods can also be informative in frameworks that do not assume CRRA. In the CRRA framework HL and AH are readily comparable. We limit our analysis to CRRA here. Other theoretical frameworks as a basis of comparison might be interesting, but are beyond the scope of this study.

  7.

    In their study, HL find that subjects are generally risk averse and that risk aversion increases with the size of the stakes, a statement they refined in a second study (Holt and Laury, 2005) after a comment by Harrison et al. (2005b). Both pay-offs in our study are close to the low pay-off treatment in HL; however, we might observe a small increase in risk aversion over the two rounds.

  8.

    Participants were informed about the random selection mechanism of the two choices at the beginning of the experiment, but were not informed that they would be allowed to change their choices for these two decisions at the end of the experiment. We follow Andreoni and Harbaugh (2009) in this step and address the implications of this in the discussion of our results.

  9.

    Given that in AH U(x;p) = p·x^α and x = b − e·p, argmax_p U(x;p) = argmax_p p·(b − e·p)^α, where b is the maximum gain in a period, which can be chosen with a corresponding probability of zero. As b and e are known, for a chosen p we can calculate α = (b − e·p)/(e·p), which corresponds to r = 1 − α.
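    Spelling out the intermediate step, the optimal choice and the implied coefficient follow from the first-order condition of the maximisation over p:

    ```latex
    \max_{p} \; p\,(b - e p)^{\alpha}
    \;\Rightarrow\;
    (b - e p)^{\alpha} - \alpha\, e\, p\,(b - e p)^{\alpha - 1} = 0
    \;\Rightarrow\;
    p^{*} = \frac{b}{e\,(1 + \alpha)},
    \qquad
    \alpha = \frac{b - e\,p^{*}}{e\,p^{*}} .
    ```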

  10.

    For example, AH investigate if participants violate the generalised axiom of revealed preference (GARP). They find that in the domain of gains (which corresponds to our approach) GARP violations are not too frequent (65% of participants show no violations and 85% two or less violations over 14 budgets). For our participants we observe slightly more violations (45% and 57% of participants do not violate GARP in the first and second round of AH respectively). The number of violations decreases in the second round and we observe less than 10% of individuals violating GARP when allowing for ‘small errors’ as described in AH. Table S2 in the Supporting Information documents the GARP violations.
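    A GARP check of this kind can be sketched for linear budgets of the form x = b − e·p, where a choice of p at budget (b, e) yields the bundle (x, p). The budgets and choices below are invented for illustration; violations are counted over ordered pairs, so one contradictory crossing counts twice.

    ```python
    # Sketch: count GARP violations over AH-style linear budgets.

    def garp_violations(budgets, choices):
        """budgets: list of (b, e); choices: list of chosen p."""
        n = len(budgets)
        bundles = [(b - e * p, p) for (b, e), p in zip(budgets, choices)]
        # Direct weak revealed preference: i R j if bundle j was
        # affordable at budget i (cost at prices (1, e_i) <= b_i).
        R = [[bundles[j][0] + budgets[i][1] * bundles[j][1]
              <= budgets[i][0] + 1e-9 for j in range(n)] for i in range(n)]
        # Transitive closure (Warshall).
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    R[i][j] = R[i][j] or (R[i][k] and R[k][j])
        # GARP: if i is revealed preferred to j, bundle i must not be
        # strictly cheaper than b_j at budget j's prices.
        violations = 0
        for i in range(n):
            for j in range(n):
                if i != j and R[i][j]:
                    cost = bundles[i][0] + budgets[j][1] * bundles[i][1]
                    if cost < budgets[j][0] - 1e-9:
                        violations += 1
        return violations

    # Crossing budgets with contradictory choices vs. consistent choices:
    print(garp_violations([(10, 2), (6, 1)], [5.0, 1.0]))  # -> 2
    print(garp_violations([(10, 2), (6, 1)], [1.0, 5.0]))  # -> 0
    ```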

  11.

    The rate of individuals who had more than one switching point in our study is comparatively high, at least when compared to the other studies mentioned in the Introduction that used the HL method; they reported non-consistent individuals and non-SSP individuals with shares between 2% and 9% of the sample. Our single-round rate is higher than this and our rate is further increased as we played the HL game over two rounds.

  12.

    One of them changed the decision when being able to reconsider their choice at the end of the experiment.

  13.

    The correlations of the α-coefficients understate the correlation of the probabilities chosen over the two rounds, as small differences in chosen probabilities far away from the risk-neutral choice optimum are amplified. Hence, for comparison we also looked at the probabilities chosen over the rounds; these are correlated at between 40% and 63%.

  14.

    AH determine α by using OLS regressions in which α is an estimated parameter. Doing so for our data leads to more dispersed α values than the α values we calculate by averaging. The statistical fit in the estimations is high (average R2 = 70%). OLS-based and average-based values are strongly correlated (r = 63%, ρ = 69%). Both simple and rank correlations of the OLS-based α values with HL decisions (as described later) are lower than those of the α values based on averaging. As our main aim is to compare AH and HL and as we aim to be favourable to consistency, we have chosen the averaged values here.

  15.

    The Supporting Information includes more detail on why we concluded that using the average is appropriate, although we found a relationship between b and α for some individuals.

  16.

    Hence, this result indicates that most participants (65%) had already made their best-informed choice. Those who did change, however, often made large changes; many of them made changes of 17% or more, and numerous changes were at the rate of 40% or more. Here, as for the HL method, this raises the question of whether revised choices are 'better' in their cross-method stability. We tested these later (revised) specifications, but found no further improvement of consistency with previous choices (i.e. individuals who made changes did not only do so when they were able to alter an outlier). We therefore treated all participants (changers and non-changers) the same by using their original choices. However, we consider the fact that we observe switching at all as a further indication of problematic consistency within the methods.

  17.

    The main reason for this is that for almost all participants the estimated standard deviation on α is SD≥0.3 (see Supporting Information).

  18.

    The AH method allows for finer grains than the six risk categories which HL identified in their study. Using these finer measures for the rank correlation analysis gives ρ = 36%.

  19.

    For example, one could speculate that we did not explain the tasks sufficiently, leading to higher inconsistencies. However, we used relatively standard instructions, included test questions and supported participants when they had questions; any protocol effects are hence unintentional. Another conjecture could be that presenting the full choice list for HL at once, in contrast to presenting choices sequentially in AH, might have an influence on decisions. More generally, the cognitive load of the tasks may play a role in consistency (as in Harbaugh et al., 2010). However, one would prefer a method with desirable characteristics to be robust to such small changes, particularly as presenting HL choices sequentially would be likely to decrease within-individual consistency further.

  20.

    For a discussion of how to control for risk attitudes in econometric analysis when investigating experimental results, see also Harrison et al. (2006).

References

  • Andersen, S., G. W. Harrison, M. I. Lau and E. E. Rutström (2006), ‘Elicitation using Multiple Price List Formats’, Experimental Economics 9, 383–405.
  • Andersen, S., G. Harrison, M. Lau and E. E. Rutström (2008), ‘Lost in State Space: Are Preferences Stable?’, International Economic Review 49, 1091–1112.
  • Anderson, L. R. and J. M. Mellor (2009), ‘Are Risk Preferences Stable? Comparing an Experimental Measure with a Validated Survey-Based Measure’, Journal of Risk and Uncertainty 39, 137–160.
  • Andreoni, J. and W. Harbaugh (2009), ‘Unexpected Utility: Experimental Tests of Five Key Questions About Preferences Over Risk’, Working Paper.
  • Ballinger, T. P. and N. T. Wilcox (1997), ‘Decisions, Error and Heterogeneity’, The Economic Journal 107, 1090–1105.
  • Berg, J., J. Dickhaut and K. McCabe (2005), ‘Risk Preference Instability Across Institutions: A Dilemma’, Proceedings of the National Academy of Sciences of the United States of America 102, 4209.
  • Bruner, D. M. (2009), ‘Changing the Probability Versus Changing the Reward’, Experimental Economics 12, 367–385.
  • Charness, G. and A. Viceisza (2011), ‘Comprehension and Risk Elicitation in the Field’, IFPRI Discussion Paper.
  • Dave, C., C. C. Eckel, C. A. Johnson and C. Rojas (2010), ‘Eliciting Risk Preferences: When is Simple Better?’, Journal of Risk and Uncertainty 41, 125.
  • Deck, C., J. Lee, J. Reyes and C. Rosen (2008), ‘Measuring Risk Attitudes Controlling for Personality Traits’, Working Paper, University of Arkansas, Florida International University.
  • Eckel, C. C. and P. J. Grossman (2002), ‘Sex Differences and Statistical Stereotyping in Attitudes Toward Financial Risk’, Evolution and Human Behavior 23, 281–295.
  • Gangadharan, L. and V. Nemes (2009), ‘Experimental Analysis of Risk and Uncertainty in Provisioning Private and Public Goods’, Economic Inquiry 47, 146–164.
  • Greiner, B. (2004), ‘The Online Recruitment System ORSEE 2.0 – A Guide for the Organization of Experiments in Economics’, Technical Report, University of Cologne, Department of Economics.
  • Harbaugh, W. T., K. Krause and L. Vesterlund (2010), ‘The Fourfold Pattern of Risk Attitudes in Choice and Pricing Tasks’, Economic Journal 120, 595–611.
  • Harrison, G. W. and E. E. Rutström (2008), ‘Risk Aversion in the Laboratory’, in: Risk Aversion in Experiments, Volume 12 of Research in Experimental Economics, Emerald Group Publishing, Bingley, UK, pp. 41–196.
  • Harrison, G. W., E. Johnson, M. M. McInnes and E. E. Rutström (2005a), ‘Temporal Stability of Estimates of Risk Aversion’, Applied Financial Economics Letters 1, 31–35.
  • Harrison, G. W., E. Johnson, M. M. McInnes and E. E. Rutström (2005b), ‘Risk Aversion and Incentive Effects: Comment’, American Economic Review 95, 897–901.
  • Harrison, G., E. Johnson, M. McInnes and E. Rutström (2006), ‘Measurement with Experimental Controls’, in: M. Boumans (ed.), Measurement in Economics: A Handbook, Elsevier, San Diego, pp. 79–104.
  • Hey, J. D., A. Morone and U. Schmidt (2009), ‘Noise and Bias in Eliciting Preferences’, Journal of Risk and Uncertainty 39, 213–235.
  • Holt, C. A. and S. K. Laury (2002), ‘Risk Aversion and Incentive Effects’, American Economic Review 92, 1644–1655.
  • Holt, C. A. and S. K. Laury (2005), ‘Risk Aversion and Incentive Effects: New Data Without Order Effects’, American Economic Review 95, 902–904.
  • Houser, D., D. Schunk and J. Winter (2010), ‘Distinguishing Trust from Risk: An Anatomy of the Investment Game’, Journal of Economic Behavior & Organization 74, 72–81.
  • Isaac, R. M. and D. James (2000), ‘Just Who Are You Calling Risk Averse?’, Journal of Risk and Uncertainty 20, 177–187.
  • Jacobson, S. and R. Petrie (2009), ‘Learning from Mistakes: What Do Inconsistent Choices Over Risk Tell Us?’, Journal of Risk and Uncertainty 38, 143–158.
  • Lönnqvist, J. E., M. Verkasalo, G. Walkowitz and P. C. Wichardt (2011), ‘Measuring Individual Risk Attitudes in the Lab: Task or Ask? An Empirical Comparison’, SOEP Papers on Multidisciplinary Panel Data Research.
  • Loomes, G. and R. Sugden (1995), ‘Incorporating a Stochastic Element Into Decision Theories’, European Economic Review 39, 641–648.
  • Loomes, G. and R. Sugden (1998), ‘Testing Different Stochastic Specifications of Risky Choice’, Economica 65, 581–598.
  • Loomes, G., P. G. Moffatt and R. Sugden (2002), ‘A Microeconometric Test of Alternative Stochastic Theories of Risky Choice’, Journal of Risk and Uncertainty 24, 103–130.
  • Reynaud, A. and S. Couture (2012), ‘Stability of Risk Preference Measures: Results From a Field Experiment on French Farmers’, Theory and Decision 73, 203–221.
  • Starmer, C. and R. Sugden (1993), ‘Testing for Juxtaposition and Event-Splitting Effects’, Journal of Risk and Uncertainty 6, 235–254.
  • Zhong, S., S. Israel, H. Xue, P. Sham, R. Ebstein and S. Chew (2009), ‘A Neurochemical Approach to Valuation Sensitivity over Gains and Losses’, Proceedings of the Royal Society B: Biological Sciences 276, 4181.

Supporting Information

Data S1 (geer12043-sup-0001-SuppInfo.pdf, PDF, 293 KB): Supporting documents and experimental instructions.