Experimental and Self-Reported Measures of Risk Taking and Digit Ratio (2D:4D): Evidence from a Large, Systematic Study

We systematically investigate the links between the digit ratio (2D:4D), a biomarker for prenatal testosterone exposure, and two measures of individual risk taking: (i) risk preferences (RP) over lotteries with real monetary incentives and (ii) self-reported risk attitude (RA). We find that both the right-hand and the left-hand digit ratio are significantly associated with RP: subjects with lower digit ratios tend to choose riskier lotteries. Neither digit ratio, however, is associated with self-reported RA.


1. INTRODUCTION
We report findings from a laboratory experiment conducted with a large sample of subjects, which systematically investigates the links between two different measures of individual risk taking and the digit ratio (also known as 2D:4D), the ratio of the length of the index finger to the length of the ring finger. People's digit ratio has been shown to correlate negatively with their prenatal exposure to androgens, such as the steroid hormone testosterone (Goy and Ewen, 1980; Lutchmaya et al., 2004; Hönekopp et al., 2007; Hönekopp and Watson, 2010; Hönekopp, 2011; Zheng and Cohn, 2011). Given the difficulty of measuring prenatal androgen exposure directly, we opt for the digit ratio as a biomarker to investigate how early-life physiology shapes economic behavior in adult life.
Previous digit ratio studies provide evidence that prenatal testosterone exposure is associated with several types of important economic and financial behavior. Economic experiments show significant correlations between digit ratio and dictator game giving (Galizzi and Nieboer, 2015), cognitive reflection (Bosch-Domènech et al., 2014), contributions to a public good (Cecchi and Duchoslav, 2016), overconfidence bias under incentivized conditions (Dalton and Ghosal, 2014; Neyse et al., 2016), and effort provision (Neyse et al., 2014). In the domain of finance, low digit ratio individuals achieve higher trading profits (Coates and Herbert, 2008; Coates et al., 2009), are more likely to self-select into the financial services profession (Sapienza et al., 2009), bid more competitively (Pearson and Schipper, 2012; Schipper, 2015a), and are more active and risk-taking traders (Cronqvist et al., 2016). These findings suggest that the preferences underlying these choices, such as people's appetite for competition and risk, are partly determined before birth.
Findings to date strongly suggest a biological basis for economic behavior, complementary to recent research on genetic inheritance of economic behaviors (Rangel et al., 2008; Cesarini et al., 2009; Dreber et al., 2009; Kuhnen and Chiao, 2009; Zhong et al., 2009). Similar to genetic factors, prenatal hormone exposure can thus shape one's physiology in ways that affect a variety of social and economic outcomes over the lifetime. The evidence base for the relationship between prenatal hormones and adult behavior is broad: in both nonhuman mammals and humans, measures of prenatal hormones have been shown to correlate with postnatal behavior (Hines, 2006; Hines et al., 2015). Most evidence points to the period from 8 to 24 weeks of fetal gestation as a key stage, during which a marked difference in androgen levels is observed between male and female fetuses (Rodeck et al., 1985; Finegan et al., 1989), leading to different degrees of "masculinization" of the brain (Manning, 2002). The digit ratio correlates with these androgen levels and is similarly dimorphic: men have lower digit ratios than women. Consequently, much of the literature on prenatal androgen exposure and digit ratio has focused on correlations with sexually dimorphic behavior, such as athletic achievement (Tester and Campbell, 2007), desire for dominance (Neave et al., 2003), traffic offenses (Schwerdtfeger et al., 2010), and stereotypical childhood play behaviors (Hines, 2006).
Our study focuses on an economic behavior that is often said to be sexually dimorphic: risk taking. Although we study the distribution of risk taking within sexes, we suggest that prenatal testosterone exposure may contribute to the behavioral observation that women tend to be, on average, more risk averse than men. The latter finding, with important implications in a range of economic situations, has been documented in both experimental and observational economic studies (Byrnes et al., 1999; Croson and Gneezy, 2009). Of course, a multitude of (biological and social) factors may lead to a differentiation between the sexes on risk aversion, and the observed risk-taking tendencies of men and women will overlap to a large extent. However, at least part of the observed difference may have its origins in prenatal androgen exposure.
We investigate the hypothesis that differences in prenatal testosterone exposure give rise to different levels of risk aversion, with lower digit ratios being associated with more risk taking. Several prior studies of financial risk taking provide evidence of such a relationship within samples of both sexes (Dreber and Hoffman, 2007; Garbarino et al., 2011) or at least within male subsamples (Ronay and von Hippel, 2010; Brañas-Garza and Rustichini, 2011; Stenstrom et al., 2011). Our study contributes to the literature on digit ratio and risk taking with a systematic investigation of the relationships between the digit ratios of a large subject sample (n = 704) and two distinct economic measures of risk taking: (i) revealed risk preferences (RP) over monetary incentives, as measured by the elicitation task developed by Binswanger (1980, 1981; see also Eckel and Grossman, 2002, 2008), and (ii) unincentivized self-reported risk attitudes (RA), as measured by the scale developed by Dohmen et al. (2011).
To our knowledge, ours is the first study to systematically report, for a large sample of subjects, the associations between both the right-hand digit ratio (R2D:4D) and the left-hand digit ratio (L2D:4D), and two different experimental measures of risk taking, one incentivized and one hypothetical. As explained in more detail in Section 2, our approach to measurement, sample size, and econometric controls for ethnicity is specifically designed to mitigate some of the issues that may have driven mixed results in the literature to date.
Our main findings are as follows: First, R2D:4D and L2D:4D are significantly negatively correlated with RP: subjects with lower R2D:4D and L2D:4D tend to make riskier choices in the experimental lottery test with real monetary payments. It is worth noting that the negative correlation of the L2D:4D with an experimental measure of RP has not been previously reported in the literature. Second, and in contrast to RP, the R2D:4D and L2D:4D are not significantly associated with RA. In sum, incentivized experimental measures of risk taking correlate with both hands' digit ratios, but hypothetical measures do not.
The rest of the article is structured as follows: Section 2 contains a detailed discussion of the background literature on digit ratio and on its relationship with risk taking. Section 3 describes the methods, whereas Section 4 presents the results. Section 5 discusses the main findings and concludes.
2. BACKGROUND
2.1. Digit Ratio and Prenatal Testosterone Exposure. Before we discuss the literature on risk taking, it is worth examining the evidence for the digit ratio as a biomarker for prenatal testosterone exposure. The "exposure" is that of the brain's androgen receptors to testosterone, an exposure that is typically much higher for male than female fetuses, since the male fetus produces testosterone in larger amounts (in the Leydig cells of the testes, whereas females produce it in the adrenal glands near the kidney). Effective exposure may also vary with the hormone levels of the mother (Hines, 2006; Talarovičová et al., 2009). There are four strands of empirical evidence that support the existence of a significant, negative relationship: People with lower digit ratios were exposed to higher levels of prenatal testosterone.
First, there is direct evidence from the amniotic fluid: Using a small mixed-sex sample of 2-year-olds (n = 29), Lutchmaya et al. (2004) found that digit ratio is related to testosterone and the testosterone-to-estradiol ratio in utero. Using a larger sample of newborns (n = 102), Ventura et al. (2013) found a similar relationship between digit ratio and testosterone in plasma (p = 0.04). However, decomposing these results by sex shows significant effects for girls (p = 0.03 and p = 0.09 for right- and left-hand digit ratios) but not for boys (both p > 0.1). Follow-up research with larger samples seems desirable, as well as studies to fill the evidence gap on the relationship between prenatal testosterone exposure and digit ratios in adolescent or adult subject samples. There is evidence, however, that digit ratios are stable 3 months after fetal gestation (Malas et al., 2006; Galis et al., 2010) and longitudinally stable in samples of children and adolescents (McIntyre et al., 2005; Trivers et al., 2006).
Second, there is evidence from androgen spillovers in zygotic twins: Females with a male twin have lower digit ratios than females with a female twin (Van Anders et al., 2006). The channel of influence is a hypothesized "hormone-transfer" between the twins in utero (Miller, 1994), although the support for this theory is somewhat limited.
Third, there is evidence from individuals with sex hormone-related syndromes: conditions that alter the production of, or the brain's sensitivity to, androgens. Subjects with Congenital Adrenal Hyperplasia (CAH), characterized by increased androgen production, have lower digit ratios than control subjects (Brown et al., 2002). Males with Complete Androgen Insensitivity Syndrome (CAIS) have higher digit ratios than controls (Berenbaum et al., 2009). Similarly, males with Klinefelter's syndrome, associated with low fetal androgen levels, have higher digit ratios than controls (Manning et al., 2013).
A fourth source of evidence is the laboratory study of nonhuman mammals. Since experimentation with prenatal testosterone administration on human fetuses is ethically unacceptable, testosterone administration in laboratory animals may be the closest substitute. Increasing parental testosterone levels in pregnant rats has been found to lead to lower digit ratios of both male and female fetuses (Talarovičová et al., 2009). Similarly, Auger et al. (2013) exposed male rat fetuses to estrogenic and antiandrogenic disruptors and found that this led to higher digit ratios. In mice, testosterone administration in utero leads to lower digit ratios, whereas estrogen administration leads to higher digit ratios (Zheng and Cohn, 2011). Although replications with other species of mammals seem desirable to strengthen the evidence base, we do note that these findings fit into a broad experimental literature that documents the effects of prenatal testosterone administrations on mammalian brain development (Arnold, 2009; Arnold and Breedlove, 1985; Hines et al., 2015).
Unlike levels of circulating hormones, which may change as a response to an individual's context and actions (see Archer, 2006), the stability of the digit ratio implies it cannot be shaped by the individual's previous behavior. With the issue of two-way causation out of the way, the question remains whether there is any relationship between the digit ratio and circulating testosterone levels. The jury is still out: Although Hönekopp et al.'s (2007) meta-analysis on a sizable body of research did not find any relationship between the digit ratio and circulating sex hormone levels in adults, more recent research suggests that the digit ratio is associated with circulating sex hormones under challenging situations, like fighting or competition (Coates et al., 2010; Crewther et al., 2015).
The previous paragraph hints at a more general question: Is the effect of prenatal hormones on the developing brain the only relevant influence that the digit ratio proxies for? This is currently unclear. Most of the research on digit ratio seems to make a tacit assumption that selection into different levels of testosterone exposure in utero is independent of other indirect influences on behavior. We note that this assumption is untested and may not hold. It is, for example, possible that physiological characteristics of the mother affect both the effective level of testosterone exposure in utero and aspects of the child's upbringing. Whether this is merely a theoretical possibility or a factor of significance is a topic worthy of further research. We now turn our focus to the relationship between digit ratio and risk taking.

2.2. Digit Ratio and Risk Taking. A number of studies, summarized in Table 1, have explored the relationship between digit ratio and experimental measures for risk taking, yielding mixed evidence to date. In particular:
• Five studies find a negative, significant relationship between digit ratio and risk taking: People with a lower digit ratio take more risk. Dreber and Hoffman (2007) and Garbarino et al. (2011) [...]
• Five studies find a statistically nonsignificant association between digit ratio and risk taking (Apicella et al., 2008; Sapienza et al., 2009; Aycinena et al., 2014; Drichoutis and Nayga, 2015; Schipper, 2015b).
As Table 1 shows, methods differ greatly between studies, both in terms of subject pool and of the measurement of key variables. First, significant relationships appear either in Caucasian samples or male-only samples: Not a single significant result is found for females only. This asymmetric effect might be related to the fact that males are exposed, on average, to higher amounts of testosterone in utero.
Second, mixed results in the literature to date may stem from a combination of selective sampling from particular ethnicities and small sample sizes. The studies cited in Table 1 consider either samples of (predominantly) Caucasian subjects (Dreber and Hoffman, 2007; Ronay and von Hippel, 2010; Brañas-Garza and Rustichini, 2011; Garbarino et al., 2011) or relatively small samples of ethnically diverse subjects (Apicella et al., 2008; Sapienza et al., 2009; Drichoutis and Nayga, 2015; Schipper, 2015b). Weaker relationships between the digit ratio and risk taking in studies with mixed-ethnicity samples might therefore be due to a relationship between digit ratio and risk taking that is mediated by ethnicity. In fact, all the studies reporting significant relationships are conducted with Caucasians. To address any concerns about sample size, we recruit a large sample of subjects (n = 704) consisting of students of different ethnicities.
Finally, previous studies differ greatly in terms of how the digit ratio measure is taken and subsequently computed. Researchers use various tools (e.g., photocopies, scanners) and then use either the digit ratios of both hands, the mean digit ratio of the two hands, or the digit ratio of the right hand only. Regarding the R2D:4D, there is some biological evidence to indicate that it is more reflective of prenatal hormone exposure than the left-hand digit ratio. The two digit ratio measurements, however, are typically strongly correlated, which may mean that the L2D:4D is simply a noisier measure.
In our study, we follow a standardized procedure to obtain high-quality digit ratio measures from hand scans (Neyse and Brañas-Garza, 2014) and report data on both the R2D:4D and the L2D:4D. Note that the actual digit ratio is defined on bone length, something we do not directly observe. Any method that does not use radiographs, therefore, introduces noise into the measurement. The fact that it is only possible to obtain a noisy measure of the digit ratio, which itself is a proxy for prenatal testosterone exposure, may partly explain the mixture of significant and null results reported thus far. The literature to date may also have been affected by a reporting bias with regard to which gender and which hand is tested for a correlation with risk taking: Apicella et al. (2015), for example, point out that studies that report fewer measures of the digit ratio have a greater proportion of significant results. As with any empirical literature, one cannot rule out the possibility of a "file drawer" problem (Rosenthal, 1979; Ioannidis, 2005; Simonsohn et al., 2014).
As mentioned, our main contribution concerns the systematic investigation of the relationships between the digit ratios of a large sample and two different economic measures of risk taking. The studies listed in Table 1 use different experimental measures for risk taking, some incentivized with monetary outcomes and some not incentivized. Other studies use self-reported indicators. We collect both incentivized and not incentivized measures of risk taking and test both for an association with both hands' digit ratios. In more detail:
• Our first measure is an experimental elicitation task for RP over real monetary payments developed by Binswanger (1980, 1981) and then applied by Eckel and Grossman (2002, 2008). The RP task involves a choice between six lotteries with different levels of risk. We select this task because its links with the digit ratio have never been previously investigated (Table 1) and because it has the advantage of being simple to understand and intuitive, thus yielding clean and consistent choices (Charness et al., 2013). The RP task, in fact, has already been used to measure RP of large heterogeneous samples of the population (Dave et al., 2010; Galizzi et al., 2016a). The RP task also has drawbacks. For example, compared to the Holt and Laury (2002, 2005) test, the RP task does not allow us to discriminate between different degrees of risk seeking and maps into a rather limited range of constant relative risk aversion (CRRA) parameters that do not directly overlap with the ranges of risk aversion values implied by the standard versions of the Holt and Laury (2002) test (Loomes and Pogrebna, 2014; Crosetto and Filippin, 2016). Nonetheless, a direct systematic comparison of the RP task with the Holt and Laury (2002, 2005) test within a representative sample of the U.K. population finds a positive and statistically significant correlation between the two measures of risk aversion (Galizzi et al., 2016a).
• Our second measure is a self-reported measure for general RA on a 10-point Likert scale developed by Dohmen et al. (2011), which has been introduced in large representative surveys (Josef et al., 2016; Galizzi et al., 2016a) and has been extensively used in other studies with neurobiological measures (Cesarini et al., 2009; Zethraeus et al., 2009). This procedure also has drawbacks. For example, the procedure does not allow us to associate the different individual choices with specific ranges of risk aversion parameters under a CRRA theoretical framework.
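To make the CRRA point above concrete, the following is a minimal sketch of how a six-lottery menu maps risk aversion parameters into choices. The payoffs in `LOTTERIES`, the 50/50 probabilities, and the function names are illustrative assumptions in the style of an Eckel and Grossman menu; they are not the amounts or the code used in the study.

```python
import math

# Hypothetical 50/50 lotteries as (low payoff, high payoff). These amounts
# are an assumption for illustration, not the payoffs used in the study.
LOTTERIES = {
    "A": (28, 28),
    "B": (24, 36),
    "C": (20, 44),
    "D": (16, 52),
    "E": (12, 60),
    "F": (2, 70),
}


def crra_utility(x, r):
    """CRRA utility x^(1-r)/(1-r), with log utility in the limit r = 1."""
    if math.isclose(r, 1.0):
        return math.log(x)
    return x ** (1.0 - r) / (1.0 - r)


def expected_utility(lottery, r):
    """Expected utility of a lottery paying each outcome with probability 0.5."""
    low, high = lottery
    return 0.5 * crra_utility(low, r) + 0.5 * crra_utility(high, r)


def best_lottery(r):
    """Label of the lottery with the highest expected utility for a given r."""
    return max(LOTTERIES, key=lambda name: expected_utility(LOTTERIES[name], r))


# Positive r is risk averse, negative r is risk seeking.
for r in (5.0, 2.0, -0.5, -1.0, -3.0):
    print(f"r = {r:+.1f} -> lottery {best_lottery(r)}")
```

In this sketch every negative value of r lands on lottery F, which illustrates why a menu of this kind cannot separate different degrees of risk seeking.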
Looking at different measures of risk taking is important because risk taking is likely to be a multifaceted and largely context-specific construct (Jackson et al., 1972; Hershey and Schoemaker, 1980; MacCrimmon and Wehrung, 1990; Viscusi and Evans, 1990; Zeckhauser and Viscusi, 1990; Bleichrodt et al., 1997; Finucane et al., 2000; Loewenstein et al., 2001; Weber et al., 2002; Blais and Weber, 2006; Prosser and Wittenberg, 2007; Galizzi et al., 2016b) and because the evidence is mixed on the extent to which different measures correlate and map into each other (see Galizzi et al., 2016a, for a summary of the evidence to date on the cross-validity of RP measures). It is thus plausible that incentive-compatible, hypothetical, and/or self-reported measures capture different aspects of individual risk taking (Battalio et al., 1990; Holt and Laury, 2002, 2005; Harrison, 2006). Most of the studies on the links between digit ratio and risk taking, however, have exclusively looked at Multiple Price List measures such as the already mentioned Holt and Laury (2002) task. Exceptions are the studies by Dreber and Hoffman (2007) and Apicella et al. (2008), who consider the investment task by Gneezy and Potters (1997); Ronay and von Hippel (2010), who use the Balloon Analog Risk Task (BART) procedure; Brañas-Garza and Rustichini (2011), who use a series of nonincentivized binary lottery choices (including the Holt and Laury, 2002, task); and Stenstrom et al. (2011), who use a questionnaire. As mentioned above, no study to date has looked at the links between the digit ratios and risk taking as measured by the Binswanger (1980, 1981) and Eckel and Grossman (2002, 2008) task and the Dohmen et al. (2011) procedure.

3. METHODS
All experimental sessions were run at the Behavioural Research Lab (BRL) at the London School of Economics and Political Science (LSE), London. The first round of data collection took place in February and March 2014 (yielding 543 observations); a supplementary round of data collection took place in April 2015 (yielding a further 161 observations). The procedures followed in both rounds were identical. The experimental protocol was approved by the LSE Research Ethics Committee. Subjects were recruited from the BRL mailing list of volunteers (about 5,000 subjects, mostly current and former students of the LSE). There was no other eligibility or exclusion criterion to select subjects. In the e-mail invitation, subjects were not informed about the exact nature of the experiment that would be conducted and were only told that the experiment would last about an hour, that they would receive £10 as a show-up fee, and that they would have the chance to get an extra payment related to some of the tasks. Subjects could sign up to any of five 1-hour sessions starting every hour between 10 am and 5 pm on every working day of the week.
A total of 921 subjects participated in our experimental sessions. Upon arrival, subjects were identified anonymously using an ID code assigned by the online recruitment system (SONA), asked to read an informed consent form, and asked to sign it if they agreed to carry on with the experiment. After the experiment, subjects were led to a separate room where they were presented with a second consent form, which asked for consent to have both of their hands scanned by a high-resolution scanner. Subjects were clearly briefed that participation in this stage was entirely voluntary. A total of 704 subjects gave consent for their hands to be scanned and yielded scans of sufficiently high quality. We thus focus our analysis on these 704 subjects (76.43% of the original sample). Note that this is an underestimation of the actual consent rate, as we lost a number of observations due to a technical issue with the scanner.
We distinguish between RP (subjects' observed choice between monetary lotteries that are played out and paid for real at the end of the experiment) and RA (a self-reported measure of risk taking). Both measures were obtained in a computerized questionnaire administered at the start of the experimental session. The questionnaire also contained other items, such as questions about personality and demographic data. The computerized questionnaire was programmed and implemented using z-Tree (Fischbacher, 2007).
These choices were thus increasing in the variance of the outcomes and in the risk they represented, with A being the safe bet (a variance of 0) and F being the highest-risk choice (a variance of $\sigma_F^2 = 1156$). To make a choice, subjects clicked one of six radio buttons on their screen, which were labeled with the lottery probabilities and outcomes. Our RP measure thus increases with an individual's appetite for risk. As mentioned, the Binswanger (1980, 1981) and Eckel and Grossman (2002, 2008) [...] Dohmen et al. (2011). Each subject was asked the following: "Are you generally a person who is fully prepared to take risks or do you try to avoid taking risks?" To select an answer between 0 and 10, subjects clicked a radio button on their screen, on which the value 0 was labeled as "Unwilling to take risks" and the value 10 was labeled as "Fully prepared to take risks." In the on-screen instructions it was made clear to subjects that the question was about their own assessment of their general attitude toward risk. Our RA measure thus increases with individual self-reported risk taking, with values between 0 and 10.
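As a worked check on the variances quoted above, assume (this section does not state it explicitly) that each lottery pays a low outcome $x_L$ or a high outcome $x_H$ with equal probability. The variance then depends only on the spread between the two payoffs:

$$
\mu = \frac{x_L + x_H}{2}, \qquad
\sigma^2 = \tfrac{1}{2}(x_L - \mu)^2 + \tfrac{1}{2}(x_H - \mu)^2 = \left(\frac{x_H - x_L}{2}\right)^2 .
$$

Under that assumption, the safe bet A has $x_L = x_H$ and hence $\sigma_A^2 = 0$, while $\sigma_F^2 = 1156 = 34^2$ would correspond to a spread of $x_H - x_L = 68$ between the two outcomes of lottery F.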
The RA question was asked first, followed by the RP task a few screens later, with the two questions being separated by other questionnaire items unrelated to risk. This separation was designed to avoid subjects, consciously or unconsciously, adjusting their answer to the RP item to match their answer to the RA item. Furthermore, the RP question was preceded by an on-screen announcement that the upcoming choices would affect subjects' earnings. Note that the RP item was followed by several other incentivized decisions; subjects were informed that each of these decisions would have an equal probability of being randomly selected to be played and paid out for real at the end of the experiment. Average earnings per subject for the entire experiment, composed of the £10 show-up fee and potential extra earnings from the incentivized choices, were £19.48. Subjects were paid their earnings in cash at the end of the session.
After the questionnaire and a completely unrelated task, subjects were led into a separate room where the experimenters had set up a computer with a high-resolution scanner (300 DPI on a Canon LiDE 110). Subjects were told: "Before you leave the laboratory today, we would like to ask you to participate in an optional task. Please can you read the following consent form to see what it involves?" 12 Subjects were then given time to read an informed consent form, which explained that they would be asked to place both of their hands on a scanner to obtain the digit ratio, which "... has been shown in various scientific studies to correlate with people's behaviour in the laboratory." They were reminded that placing their hands on the scanner was completely voluntary and that the data would remain strictly anonymous and confidential ("... we will not be able to share your digit ratio with anyone, including you"). Finally, they were told that they could ask as many questions as they wanted. 13
After the experimental sessions were completed, we recruited two research assistants to provide us with independent measures of the length of the second and fourth finger of each hand. 14 We calculated the digit ratios from the finger length measures and checked the correlation between the digit ratios implied by the measurements from the two research assistants. These correlations (0.895 for left hand, 0.867 for right hand) suggest that measurement was highly reliable. To obtain a single measure of the digit ratio of each hand for our analysis, we computed the average of the two research assistants' ratios (Neyse and Brañas-Garza, 2014).
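The measurement steps described above (ratio per rater, agreement between raters, then averaging) can be sketched as follows. The finger lengths are made-up numbers, the function names are illustrative, and the use of a Pearson correlation is an assumption, since the text does not say which correlation coefficient was computed.

```python
from math import sqrt


def digit_ratio(index_len, ring_len):
    """2D:4D: length of the index finger (2D) divided by the ring finger (4D)."""
    return index_len / ring_len


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical (index, ring) lengths in mm for the same four hands,
# measured independently by two raters.
rater_a = [(70.1, 72.9), (68.4, 70.0), (73.2, 77.5), (66.0, 67.1)]
rater_b = [(70.3, 72.6), (68.1, 70.2), (73.0, 77.9), (66.4, 67.0)]

ratios_a = [digit_ratio(i, r) for i, r in rater_a]
ratios_b = [digit_ratio(i, r) for i, r in rater_b]

# Agreement between the two raters' implied digit ratios.
agreement = pearson(ratios_a, ratios_b)

# Single measure per hand: the average of the two raters' ratios.
final_ratios = [(a + b) / 2 for a, b in zip(ratios_a, ratios_b)]

print(f"inter-rater correlation: {agreement:.3f}")
print([round(x, 4) for x in final_ratios])
```

The same routine would be run once for left hands and once for right hands, since the two correlations are reported separately.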

Summary Statistics.
Our sample consists of 704 student subjects. The sample consists predominantly of female students (478, 67.89% of the sample). The sample, moreover, is highly ethnically diverse: 244 subjects described themselves as Chinese (34.65% of the sample), 241 as [...]
12 See Appendix A.2 for these instructions.
13 When subjects asked what kind of behavior the digit ratio predicted, or what the purpose of our study was, the experimenters replied that we were looking for correlations with their answers to the questionnaire that was administered earlier.
14 The research assistants were told to take as much time as they needed to provide us with reliable measures. Both research assistants used Adobe Photoshop to measure the length of the fingers on the scans. They were instructed by the same experimenter to follow the procedures described in Neyse and Brañas-Garza (2014). The assistants were also given a copy of this procedure, for reference. The two research assistants did not know or meet each other and worked independently at different times. Research assistants had no access to the details of the subjects whose fingers they were measuring.
Table 2 summarizes our L2D:4D and R2D:4D measures, in aggregate and by sex- and ethnicity-specific subsamples. Figure 1(a) shows the sample distribution of the L2D:4D for male and female subjects separately; Figure 1(b) shows the same for R2D:4D.
Overall, both the L2D:4D and the R2D:4D of the male subjects are lower than those of female subjects. The average R2D:4D is 0.9584 (SD = 0.0305) for male subjects and 0.9770 (SD = 0.0325) for female subjects; the average L2D:4D is 0.9599 (SD = 0.0353) for male subjects and 0.9733 (SD = 0.0321) for female subjects. Both differences are strongly statistically significant (p < 0.0001). The significant differences of digit ratios across sexes also hold when the analysis is replicated at ethnicity level, with the sole exception of Black subjects. Chinese males have significantly lower L2D:4D and R2D:4D than Chinese females (p = 0.0086 and p = 0.0002, respectively), and the same holds for White subjects (p = 0.0283 and p = 0.0007) and for South Asian subjects (p = 0.0468 and p = 0.0055).
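As a rough plausibility check on the sex differences reported above, the summary statistics can be plugged into a two-sample test statistic. This is only a sketch: the text does not say which test produced the reported p-values, so the Welch t statistic used here is an assumption, as is the male subsample size obtained by subtracting the 478 female subjects from the 704 total.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic for a difference in means with unequal variances."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


n_total = 704
n_female = 478
# Assumption: every subject is recorded as either male or female.
n_male = n_total - n_female

# Reported means and SDs (female first, then male).
t_right = welch_t(0.9770, 0.0325, n_female, 0.9584, 0.0305, n_male)
t_left = welch_t(0.9733, 0.0321, n_female, 0.9599, 0.0353, n_male)

print(f"n_male = {n_male}")
print(f"R2D:4D female vs. male: t = {t_right:.2f}")
print(f"L2D:4D female vs. male: t = {t_left:.2f}")
```

Both statistics come out well above conventional critical values, which is in line with the very small p-values reported, although the exact figures depend on the test actually used.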
Although the difference in digit ratio between sexes is significant, differences between ethnicities are not clear-cut in our sample. In general, the L2D:4D is 0.9661 (SD = 0.0296) for Chinese subjects, 0.9720 (SD = 0.0329) for White subjects, 0.9733 (SD = 0.0370) for South Asians, and 0.9650 (SD = 0.0476) for Black subjects: The L2D:4D for Chinese subjects is statistically different from that of White subjects (p = 0.0156). In general, the R2D:4D is 0.9679 (SD = 0.0305) for Chinese subjects, 0.9738 (SD = 0.0331) for White subjects, 0.9753 (SD = 0.0349) for South Asians, and 0.9595 (SD = 0.0352) for Black subjects: The R2D:4D for Chinese and Black subjects are statistically different from that of White subjects (p = 0.0184 and p = 0.0543, respectively).
For males, we found no statistically significant differences in L2D:4D or R2D:4D between ethnic groups. Within the female subsample, the differences between the L2D:4D and R2D:4D for White females (0.9752 and 0.9790, respectively) and for Chinese females (0.9694 and 0.9724) are both statistically significant (p = 0.0252 and p = 0.0230, respectively). Similarly, the differences between the L2D:4D and R2D:4D for White females (0.9752 and 0.9790, respectively) and for Black females (0.9605 and 0.9607) are both statistically significant (p = 0.0658 and p = 0.0245, respectively).
4.1.2. Risk taking. Only 35 subjects in our sample chose lottery F in the RP task (18 male subjects and 17 female subjects). Moreover, the Ordered Probit (OP) models (discussed in detail in the next regression analysis subsection) suggested that the estimated threshold parameters for the cutoff points corresponding to the lottery choices E and F were not statistically significantly different from each other, suggesting that the two categories are better collapsed into a single category. We have therefore recoded the responses to the RP experimental test into five categories, taking value 1 if subjects chose the safe lottery A, value 2 if subjects chose lottery B, and so on increasing in risk seeking, up to value 5 if the subjects chose either lottery E or F. 15
The left side of Table 3 summarizes our recoded RP measure. The mean value for RP in our sample is 2.794 (SD = 1.306). Male subjects in our sample chose riskier lotteries on average, with a mean choice of 2.971 (SD = 1.407) compared to 2.714 (SD = 1.251) for female subjects, a difference that is statistically significant (p = 0.0282). This result is in line with the commonly reported finding that women are more risk averse than men (Eckel and Grossman, 2008; Croson and Gneezy, 2009; Charness and Gneezy, 2010). 16
15 We are grateful to an anonymous reviewer for having suggested this analysis. We have also replicated all the estimations of the ordered probit models with six (instead of five) ordered values for the dependent variable (i.e., with choices of lotteries E and F considered in two distinct categories) or only focusing on choices of lotteries A to E, and in all cases we have obtained substantially identical results concerning the associations (or lack of associations) between the digit ratios and the two measures of risk taking (all available on request).
16 Note that evidence on the difference between male and female risk taking in the laboratory is currently disputed (see, for instance, Filippin and Crosetto, 2016).
With the exception of the Chinese subjects, who are significantly more risk averse than the White subjects (p = 0.0156), and of Chinese female subjects, who are marginally more risk averse than White female subjects (p = 0.0913), we find no significant differences between the RP of different ethnicities, either for the whole sample or for sex-specific subsamples. Moreover, when looking at each ethnicity separately, we cannot find any statistically significant differences in the RP between sexes.
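The five-category recoding of the RP responses described above amounts to a simple lookup. A minimal sketch, with illustrative names and made-up choices:

```python
# Map the six lottery labels onto five ordered categories,
# collapsing the two riskiest lotteries (E and F) into category 5.
RP_RECODE = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 5}


def recode_rp(choice):
    """Return the recoded RP value (1 = safest, 5 = riskiest) for a lottery label."""
    return RP_RECODE[choice]


# Hypothetical raw choices from five subjects.
raw_choices = ["A", "C", "F", "E", "B"]
recoded = [recode_rp(c) for c in raw_choices]
print(recoded)
```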
The right side of Table 3 summarizes our data for the RA measure. The mean value for RA in our sample is 4.697 (SD = 2.273). According to this measure too, male subjects appear slightly more risk seeking, describing themselves as 5.026 on average (SD = 2.315) compared to 4.541 (SD = 2.238) among female subjects, a difference that is statistically significant (p = 0.0087). RA among South Asian (4.697) and Black (5.433) subjects is not statistically significantly different from that of White subjects (4.971), but Chinese subjects (4.319) report taking significantly less risk than White subjects (p = 0.0012). None of the differences in RA is significant in the subsample of males only, whereas Chinese females (4.159) report taking significantly less risk than White females (4.892, p = 0.0053). Moreover, when looking at each ethnicity separately, we cannot find any statistically significant differences in RA between sexes.
Figures 1(c) and (d) report the sample distributions of the responses of male and female subjects to the RP and RA tasks, respectively.
As can be seen in Figure 1(c), male subjects in our sample tend to take more risks than female subjects in the RP task. The figure visually confirms the above-mentioned finding that women tend to be more risk averse than men (Eckel and Grossman, 2008; Croson and Gneezy, 2009). Figure 1(d) shows that, compared to female respondents, male subjects report being more willing to take risks in the RA task (see also Table 3, right side). Figure 2(a)-(b) reports the sample distribution of the responses to the RP task split by low and high L2D:4D (R2D:4D). In particular, the respondents are divided according to whether their L2D:4D (R2D:4D) is below (above median = 0) or above (above median = 1) the median value of the L2D:4D (R2D:4D) in our sample. Figures 2(c) and (d) report the corresponding sample distributions of the RP responses split by subject sex.
As can be seen in Figure 2 (right panel), subjects with low R2D:4D (below the median value) tend to take more risks in the RP task than subjects with high R2D:4D (above the median value). The bottom part of Figure 2 focuses on male and female respondents separately. Looking at the cumulative distributions, it can be seen that lottery choices by subjects with digit ratios below the median (both males and females) are first order stochastically dominated by the choices of subjects with digit ratios above the median, which implies that the former take more risk than the latter.
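The median split and the comparison of cumulative distributions can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the data are made up, and subjects exactly at the median are coded 0, which the text does not specify. Because which group is called "dominating" depends on how the outcome is ordered, the sketch only reports which empirical CDF lies weakly below the other; with RP coded so that higher values mean riskier choices, the lower CDF belongs to the group placing more mass on riskier lotteries.

```python
from statistics import median


def median_split(ratios):
    """Return the 'above median' dummy: 1 if above the sample median, else 0."""
    m = median(ratios)
    return [1 if r > m else 0 for r in ratios]


def empirical_cdf(values, categories=(1, 2, 3, 4, 5)):
    """Share of observations at or below each ordered RP category."""
    n = len(values)
    return [sum(v <= c for v in values) / n for c in categories]


def cdf_weakly_below(a, b):
    """True if the empirical CDF of a never exceeds that of b."""
    return all(x <= y + 1e-12 for x, y in zip(empirical_cdf(a), empirical_cdf(b)))


# Made-up digit ratios and RP values, for illustration only.
ratios = [0.93, 0.95, 0.96, 0.97, 0.98, 0.99, 1.00, 1.01]
rp_values = [5, 4, 4, 3, 3, 2, 2, 1]

above = median_split(ratios)
low_rp = [v for v, d in zip(rp_values, above) if d == 0]
high_rp = [v for v, d in zip(rp_values, above) if d == 1]

print(empirical_cdf(low_rp), empirical_cdf(high_rp))
print(cdf_weakly_below(low_rp, high_rp))
```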
An analogous pattern emerges when the RP responses are split by below and above the median L2D:4D (Figure 2, left panel), but the difference in the distribution of RP responses is less evident than the analogous difference for the R2D:4D. Although we observe strong differences for males, the same pattern is not observed for females (bottom left).

NOTE: (a): Histogram of Risk Preferences (RP) for low (gray) and high (black) L2D:4D subjects, with low (high) L2D:4D referring to values below (above) the median. (b): Histogram of RP for low (gray) and high (black) R2D:4D subjects, with low (high) R2D:4D referring to values below (above) the median. (c): Cumulative Distribution Functions (CDFs) of RP for low (gray) and high (black) L2D:4D female (left side) and male (right side) subjects, with low (high) L2D:4D referring to values below (above) the median. (d): CDFs of RP for low (gray) and high (black) R2D:4D female (left side) and male (right side) subjects, with low (high) R2D:4D referring to values below (above) the median.

Figure 3(a)-(b) reports the sample distribution of the responses to the RA task split by low and high L2D:4D (R2D:4D) for both male and female subjects. The respondents are again divided according to whether their L2D:4D (R2D:4D) is below (above median = 0) or above (above median = 1) the median value of the L2D:4D (R2D:4D) in our sample. The corresponding sample distributions of the RA responses by low and high L2D:4D and R2D:4D for the male and female subjects are shown below (Figures 3(c) and (d), respectively).
As can be seen in Figure 3 (notably panels 3(b) and (d)), in the RA task there are some differences in the willingness to take risks between the subjects with low R2D:4D (below the median value) and the subjects with high R2D:4D (above the median value): Subjects with low R2D:4D (right) seemingly report being somewhat more willing to take risks. The difference in the distributions of the RA responses, however, is far less evident than the analogous difference in the distributions of the RP responses.

4.2. Correlation Analysis.

Table 4 reports pairwise correlations among the main variables of interest. 17 We first note that, in our sample, L2D:4D and R2D:4D are strongly positively correlated (0.719, p < 0.001). Next, looking at the measures of risk taking, we find a significant positive correlation between the incentive-compatible RP test and the self-reported RA measure (p < 0.001). However, we note that the correlation coefficient is rather low (0.204), in line with other evidence of moderate correlations between the two methods (Galizzi et al., 2016a). This may indicate that self-reported RA and RP revealed through experimental tasks with real monetary incentives capture different aspects of individual risk taking. Furthermore, the correlation analysis reveals interesting patterns of association between digit ratios and our risk-taking measures. On the one hand, there is a negative and significant correlation between RP and R2D:4D: −0.126 (p = 0.001). So, the higher the R2D:4D (that is, the lower the prenatal testosterone exposure), the less likely subjects are to take risks in an incentivized experimental test. The association of RP with L2D:4D is also negative (−0.108) and statistically significant (p = 0.005). The sign of the association is in line with the existing literature (Dreber et al., 2009; Garbarino et al., 2011; and also Ronay and von Hippel, 2010, and Brañas-Garza and Rustichini, 2011, although for males only).

On the other hand, the self-reported RA measure does not exhibit significant correlations with either digit ratio: Although the association is negative with both the L2D:4D (−0.021) and the R2D:4D (−0.010), neither of these is statistically significant (p = 0.582 and p = 0.792, respectively).
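For readers who want to reproduce this kind of pairwise analysis on their own data, the following is a minimal sketch. It assumes a Pearson correlation (the text does not state which coefficient is used) and a large-sample normal approximation for the two-sided p-value; the function names and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def approx_two_sided_p(r, n):
    """Normal approximation to the test of zero correlation (assumes large n)."""
    t = r * sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Made-up digit ratios and RP values, for illustration only.
digit_ratio = [0.93, 0.95, 0.96, 0.97, 0.98, 0.99, 1.00, 1.01]
risk_choice = [5, 4, 4, 3, 3, 2, 2, 1]

r = pearson_r(digit_ratio, risk_choice)
p = approx_two_sided_p(r, len(digit_ratio))
print(round(r, 3), p)
```

With a sample as small as this toy example, an exact t-distribution test would be preferable; the normal approximation is used here only to keep the sketch dependency-free.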
Similar patterns of association hold when only the subsample of male or female subjects is considered (Tables A1 and A2 in the Appendix). Notice, however, that while the correlations of RP with RA, of L2D:4D with R2D:4D, and of RP with R2D:4D are all statistically significant for the sex-specific subsamples, the negative association between RP and L2D:4D is not significant in the all-female subsample, and the negative association between RP and R2D:4D is only marginally significant in the all-male subsample. 18

4.3. Regression Analysis.

4.3.1. Digit ratio and RP.
We also conduct regression analysis to explore the links between digit ratio and risk taking, controlling for sex and ethnicity. We first look at RP, which we investigate using an OP model. In our OP model, the dependent variable can take five values, from 1 (choosing lottery A) to 5 (choosing either lottery E or F), increasing with individual risk seeking. We first look at sex and ethnicity as explanatory variables and then add digit ratio variables (R2D:4D or L2D:4D) into the OP regressions, retaining controls for sex and ethnicity. Unless stated otherwise, all regression models are conducted pooling all data together and with adjustments to the variance-covariance matrix for possible heteroskedasticity and serial correlation.
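To make the structure of the OP model concrete, the sketch below computes the five category probabilities implied by a linear index and four threshold (cutoff) parameters, using P(y = k) = Φ(τ_k − x'β) − Φ(τ_{k−1} − x'β). The coefficient and thresholds are hypothetical values chosen for illustration; they are not the estimates reported in the tables, and the function name is an assumption.

```python
from statistics import NormalDist


def ordered_probit_probs(index, cutpoints):
    """Category probabilities of an ordered probit.

    index: linear index x'beta for one subject.
    cutpoints: increasing thresholds (K - 1 values for K categories).
    """
    cdf = NormalDist().cdf
    # Cumulative probabilities P(y <= k), padded with 0 and 1 at the ends.
    bounds = [0.0] + [cdf(c - index) for c in cutpoints] + [1.0]
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


# Hypothetical parameters, for illustration only (not the paper's estimates).
beta_ratio = -5.0
cutpoints = [-5.8, -5.0, -4.6, -4.0]

p_low = ordered_probit_probs(beta_ratio * 0.94, cutpoints)   # lower digit ratio
p_high = ordered_probit_probs(beta_ratio * 1.00, cutpoints)  # higher digit ratio

print([round(v, 3) for v in p_low])
print([round(v, 3) for v in p_high])
```

With a negative coefficient on the digit ratio, a lower ratio raises the index and shifts probability mass toward the riskier categories, which is the direction of association the regressions test for.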
Starting with the regressions of RP on individual characteristics, results show (Table A3, Appendix) that female subjects are more risk averse (p = 0.024), even when controlling for ethnicity (p = 0.032). There is no significant effect for any ethnicity, apart from the Chinese group, with Chinese subjects being significantly more risk averse (p = 0.035 and p = 0.048 when controlling for sex).
We now turn to the regression models with digit ratio variables, starting with R2D:4D (Table 5) and then replicating with L2D:4D (Table 6). We first look at the R2D:4D as the main explanatory variable for RP and then add sex, an interaction term between sex and R2D:4D, and ethnicity variables as control variables, while retaining R2D:4D. Table 5 shows that, when included in the regression on its own, the R2D:4D is negatively and strongly significantly associated with RP (p = 0.001): Subjects with lower R2D:4D tend to be less risk averse, a result that is closely in line with previous studies and with the descriptive and correlation analyses. Importantly, the association of RP with R2D:4D remains statistically significant even when directly controlling for sex (p = 0.007), sex and a sex × R2D:4D interaction term (p = 0.066), ethnicity (p = 0.000), and both sex and ethnicity simultaneously (p = 0.002): Individuals with lower R2D:4D tend to make less risk-averse choices in the incentive-compatible experimental test. There are no significant sex or sex × R2D:4D interaction effects in the estimations with R2D:4D. 19

18 The latter result could point to differences between male and female subjects in our sample and/or to differences in the sample size of the two sex-specific subsamples (fewer male subjects).

19 The OP estimations in Tables 5 and 6 also show that the threshold parameters for RP appear to be statistically significantly different from each other, suggesting that the five RP categories should not be further collapsed into fewer categories. As already mentioned, we have also replicated all the estimations of the OP models only focusing on choices of lotteries A to E, or with six ordered values for the dependent variable (i.e., with choices of lotteries E and F considered in two distinct categories), and in all cases we have obtained substantially identical results concerning the associations between the digit ratios and the RP measure. The OP models, however, suggested that the estimated threshold parameters for the cutoff points corresponding to the lottery choices E and F were not statistically significantly different from each other, suggesting that the two categories are better collapsed into one category.

Next, we turn to the regression model with L2D:4D. Table 6 shows that, when included in the regression on its own, the L2D:4D is negatively and strongly significantly associated with RP (p = 0.006): Subjects with lower L2D:4D tend to be less risk averse, a result that is closely in line with the descriptive and correlation analyses and that has never been previously documented in the literature.
The association of RP with the L2D:4D is marginally weaker and less statistically significant than that with the R2D:4D. Importantly, however, the association of RP with L2D:4D remains statistically significant even when directly controlling for sex (p = 0.018), sex and a sex × L2D:4D interaction term (p = 0.005), ethnicity (p = 0.003), and both sex and ethnicity simultaneously (p = 0.009): Individuals with lower L2D:4D tend to make less risk-averse choices in the incentive-compatible experimental test. There is a marginally significant sex effect (female subjects tend to make more risk-averse choices) and a marginally significant sex × L2D:4D interaction effect in the estimations with L2D:4D.

4.3.2. Digit ratio and self-reported RA. We next consider the relationship between digit ratio and RA, modeled using an OP model. In our OP model, the dependent variable can take 11 values, associated with the 11 degrees of risk taking that the subjects could self-report. Again, we first conduct a set of regressions with sex and ethnicity as explanatory variables, whereas the second set of regression models adds the digit ratios (L2D:4D and R2D:4D), retaining controls for sex and ethnicity. These regression models are also conducted with adjustments to the variance-covariance matrix for possible heteroskedasticity and serial correlation.

Table A4 (Appendix) reports the findings from the OP regression models of RA without digit ratio variables. Female subjects in our sample report significantly lower willingness to take risks. This is in line with what is found by Josef et al. (2016) and Galizzi et al. (2016a) in representative samples in Germany and the United Kingdom, respectively. Furthermore, among the various ethnic groups, only the Chinese subjects report significantly more risk-averse attitudes when directly asked how risk seeking they are. Both effects are robust to controlling for both sex and ethnicity together.
We now turn to the regression models with digit ratio variables, starting with R2D:4D (Table 7) and then replicating with L2D:4D (Table A5). We first look at the R2D:4D (or L2D:4D) as the main explanatory variable for RA, and then add sex, an interaction term between sex and R2D:4D, and ethnicity dummies as control variables, while retaining R2D:4D (or L2D:4D).
Next, we turn to the association between RA and R2D:4D, shown in Table 7. In no regression is the R2D:4D significantly associated with self-reported RA, either on its own or when included together with sex and/or ethnicity variables. The only variables significantly associated with RA are again the dummies for female and Chinese subjects, both of whom self-report more risk-averse attitudes. Table A5 (Appendix) reports the OP models of RA and L2D:4D. As with R2D:4D, there is no significant association between RA and L2D:4D, either when included in the regressions on its own or with sex and/or ethnicity as control variables. In the regressions with the L2D:4D, too, female and Chinese subjects self-report being more risk averse. 20

4.3.3. Consistency of results across subsamples. In Tables A6-A9 in the Appendix, we also report the results of the estimations obtained in the subsamples of male and female subjects. For the sake of comparability, for each dependent variable (RP or RA), we report the estimations for the full sample and for the two sex-specific subsamples, both for the models where the only explanatory variables are the digit ratios (L2D:4D or R2D:4D) and for the models adding the controls for the ethnicity groups. As can be seen in Tables A6-A9, and in line with the previously reported analysis, in the full sample both the L2D:4D and the R2D:4D are negatively and significantly associated with RP, whereas neither is associated with RA.
All the associations are robust to the inclusion of the ethnicity controls, with Chinese being the only ethnic group significantly (negatively) associated with RP. In the male subsample, both the L2D:4D and the R2D:4D are negatively and significantly associated with RP, but the association with the R2D:4D is only marginally significant (p = 0.073). In the female subsample, the R2D:4D is negatively and significantly associated with RP, but the L2D:4D is not significantly associated with RP (p = 0.407). In both the male and the female subsamples, there is no association between the digit ratios and RA.
Finally, note that all our results are qualitatively identical when the regressions are conducted excluding the respondents in the Black or Other ethnic groups; using OLS models or ordered logit hierarchical regressions; using interval regression models for the ranges of the coefficient of relative risk aversion implied by the different choices in the RP; using standardized z-values for the digit ratios (as done in Garbarino et al., 2011); or using the average digit ratio of the two hands instead of the R2D:4D and L2D:4D separately (results not reported but all available on request).
As a further robustness check, note that our results are not affected by corrections for multiple testing. For example, if we adjust the p-values of our pairwise correlation coefficients for R2D:4D and L2D:4D using a conservative correction that assumes no correlation between outcome variables, such as the Bonferroni (1935) correction, our findings remain substantially unchanged (Table A10): The digit ratios are significantly negatively associated with risk taking in the experimental task and not significantly associated with self-reported RA. Less conservative adjustments that allow for correlations among the variables, such as the corrections proposed by Holm (1979), Hochberg (1988), Hommel (1988), Benjamini and Hochberg (1995), or Benjamini and Yekutieli (2001), would a fortiori yield the same substantive findings.
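The logic of these adjustments can be illustrated with a short sketch that applies the Bonferroni and Holm corrections to the four unadjusted p-values reported in the correlation analysis (RP with R2D:4D, RP with L2D:4D, RA with L2D:4D, RA with R2D:4D). Treating exactly these four tests as one family is an assumption made here for illustration; the family actually used in Table A10 may differ, and the function names are hypothetical.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is scaled by m, the next by m - 1, and so on;
        # the running maximum keeps adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


# Unadjusted p-values as reported in the text; the grouping is an assumption.
labels = ["RP~R2D:4D", "RP~L2D:4D", "RA~L2D:4D", "RA~R2D:4D"]
pvals = [0.001, 0.005, 0.582, 0.792]

bonf = bonferroni(pvals)
holm_adj = holm(pvals)
for name, p_raw, p_bonf, p_holm in zip(labels, pvals, bonf, holm_adj):
    print(name, p_raw, round(p_bonf, 3), round(p_holm, 3))
```

Under this assumed family of four tests, both RP correlations stay below the 5% level after either adjustment, while both RA correlations remain far from significance, which matches the qualitative conclusion stated above.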

DISCUSSION AND CONCLUSIONS
To our knowledge, ours is the first study to date to systematically report, for a large ethnically diverse pool of subjects, the associations between one incentivized and one hypothetical measure of risk taking, and both the R2D:4D and the L2D:4D.
We report two main findings. First, both the R2D:4D and the L2D:4D are significantly associated with RP measured by an incentive-compatible experimental task: Subjects with lower R2D:4D and L2D:4D tend to make significantly riskier choices in the experimental lottery test with real monetary payments. This finding is robust across a wide range of alternative specifications, which vary the estimation strategies and include sex and ethnicity dummies as well as other controls. We thus contribute to the existing literature (Dreber et al., 2009;Ronay and von Hippel, 2010;Brañas-Garza and Rustichini, 2011;Garbarino et al., 2011) by showing that the association between R2D:4D and financial risk taking that these studies report for relatively small samples of Caucasian subjects also holds within large samples of ethnically diverse subjects.
Although marginally weaker than the association with the R2D:4D, the association of the L2D:4D with an experimental measure of RP has not been previously reported by the literature. This confirms the importance of separately considering both hands' measures when looking at the links between digit ratios and behavioral attitudes (Dreber and Hoffman, 2007;Apicella et al., 2008).
Second, in contrast to our findings on revealed RP, neither the R2D:4D nor the L2D:4D is significantly associated with RA measured by a hypothetical question. That is, while incentivized experimental measures of risk taking are related to both hands' digit ratios, hypothetical measures are not. Although our study is the first to test such a relationship for digit ratio and risk taking, this result is in line with the abundant experimental literature showing that self-reported and incentive-compatible measures for economic preferences correlate only imperfectly (Battalio et al., 1990; Blackburn et al., 1994; Cummings et al., 1995, 1997; Rutstrom, 1998; List, 2001; Holt and Laury, 2002, 2005; Harrison, 2006; Lusk and Shogren, 2007). 21

It is worth reflecting on how our negative findings on hypothetical risk-taking decisions fit into the broader literature. Our finding is in line with the idea that risk taking is a complex, multidimensional aspect of individual behavior and that different measures could well capture different nuances and angles of risk taking (Jackson et al., 1972; Hershey and Schoemaker, 1980; MacCrimmon and Wehrung, 1990; Viscusi and Evans, 1990; Zeckhauser and Viscusi, 1990; Bleichrodt et al., 1997; Finucane et al., 2000; Weber et al., 2002; Blais and Weber, 2006; Prosser and Wittenberg, 2007; Galizzi et al., 2016a, 2016b). It also seems plausible that one's general tendency to take risk is influenced much more by one's social environment, socioeconomic situation, knowledge, and other factors than by traits associated with prenatal hormone exposure. A monetary gamble in a laboratory experiment, a much more instantaneous decision that comes with its own context, does show a correlation with prenatal hormones. Self-reporting bias aside, the differences between the two risk-taking measures are myriad, and it may well be that they rely on different cognitive and physiological processes.
A more constructive take is that it may be possible to classify certain decisions under risk as more "visceral" or "hormonal" than others, perhaps shedding some light on the emotional determinants of risk taking (Loewenstein, 1996;LeDoux, 1998;Loewenstein et al., 2001;Damasio, 2006). An alternative interpretation of our results is that there may be a correlation between people's general tendency to take risk and prenatal hormones in the population, but that idiosyncrasies of our sample of university students do not allow us to detect this relationship.
Of course, our findings also have limitations. Although both the R2D:4D and the L2D:4D are significantly associated with experimental measures of RP, the digit ratios explain only a very small part of the variance in individual risk taking. This finding is consistent with the remarks of Apicella et al. (2008) on the small size of the digit ratio "effect." We note also the potential for measurement to introduce further noise into the equation.
Another limitation of our study is that it looks at the links between risk taking and digit ratios in a large pool of subjects that is ethnically diverse but socially homogeneous. It is widely known that university students may be a peculiar and unrepresentative subsample of the population (Enis et al., 1972; Gachter et al., 2004; Exadaktylos et al., 2013). Further research is needed to systematically explore the association of digit ratios and risk taking in more socially and culturally diverse groups and in representative samples of the population.
One line of inquiry that deserves further attention is the role of mediating factors. For instance, risk taking has previously been shown to correlate positively with cognitive ability (Frederick, 2005;Dohmen et al., 2010;Benjamin et al., 2013), and so has digit ratio. Bosch-Domènech et al. (2014) find that low 2D:4D males and females score higher in the cognitive reflection test; Brañas-Garza and Rustichini (2011) find the same relationship for males' performance in Raven matrices. Other examples of possible mediating factors are preferences for competition, sensation seeking, optimism, and overconfidence. Disentangling the relationship between prenatal hormones and various aspects of ability and preference is likely to be a complex task, but one that could greatly enhance our understanding of how personalities are shaped in utero. It may, for example, shed more light on why individuals with low 2D:4D are more likely to self-select into the financial services profession (Sapienza et al., 2009) and are more successful in highly competitive professions like financial trading (Coates and Herbert, 2008).
In closing, it is worth reiterating that, although the digit ratio is relatively easy to measure, and it cannot be altered or manipulated, it is only a proxy. Although the evidence on its association with economic behavior, notably risk taking, is rapidly accumulating, it still leaves us several steps removed from actually measuring the effect of prenatal hormone exposure. As we discussed earlier, further research is needed on whether prenatal factors affecting the digit ratio are linked to third factors that may shape one's behavior in later life. An example, which may be of sufficient interest to researchers in its own right, is the relationship between prenatal hormone exposure, parental hormone levels, and the infant's upbringing. More generally, longitudinal research that links directly observed prenatal hormone levels to behavior in later life, beyond infancy, would do much to enrich the interpretation of digit ratio studies. With the right kind of tools and sufficiently large subject samples, there is much promise in linking biological and behavioral economic measures.

A.2. Subject Consent Form for Digit Ratio Measurement.

Please read this consent form carefully and ask as many questions as you like before you decide whether or not you want to participate in the next measurement. Before you leave the laboratory today, we are asking everyone to take a measure called the digit ratio. This ratio is calculated by combining the length of your second and fourth finger, and it has been shown in various scientific studies to correlate with people's behavior in the laboratory. The most efficient and reliable way of measuring the ratio is by scanning someone's hand on a flatbed scanner.
As with all responses during our experiments, we will collect your digit ratio completely anonymously. No-one, not even the researcher in charge of the study, will be able to link your digit ratio to your identity, name, and personal information. As such, we will not be able to share your digit ratio with anyone, including you.
There are no risks to you from this research and no foreseeable direct benefits. It is hoped that the research will benefit others (or science) who wish to understand behavior and decisions. The researcher in charge of today's study has collected digit ratio data in the LSE Behavioural Research Lab before. The image data will only be used for calculating the digit ratios, and it will be stored on an encrypted hard drive with no access to any external networks, kept in a secure storage space which will only be accessible by the researchers directly involved in this project.