Examining children's ability to delay reward: Is the delay discounting task a suitable measure?

Discounting the value of delayed rewards has primarily been measured in children with the delay of gratification task and in adolescents and adults with the delay discounting task. In the present study, we assessed the suitability of the delay discounting task as a measure of temporal discounting in children. A sample of 7- to 9-year-olds (N = 98) completed a delay discounting task, a delay of gratification task, a sensation seeking measure, and IQ measures. In addition, teacher-based assessments of attention-deficit/hyperactivity disorder traits were obtained. The results indicated that the majority of children produced meaningful data on the discounting task and discounted rewards hyperbolically. Children with an elevated risk of attention-deficit/hyperactivity disorder showed a trend towards discounting future rewards on the delay discounting task more steeply than did those at low risk. However, delay discounting was unrelated to either delay of gratification or sensation seeking. We interpret these results as providing some support for the use of delay discounting as a measure of intertemporal choice in children, although the results also suggest that delay discounting and delay of gratification tasks may tap different processes in this population.


| Measuring delay of gratification abilities
There is now an extensive literature examining children's ability to exercise self-control in the pursuit of delayed rewards, with the dominant paradigm in this literature being the delay of gratification task.
Most studies of children have employed versions of a delay of gratification task developed by Mischel and Metzner (1962), the delay choice task, in which the participant chooses between a low-value reward that is immediately available and a higher-value reward available after a fixed delay. Participants' choice, once made, is binding; in what follows, we will refer to this sort of task as a DoG task. Among adolescents and adults, the ability to delay reward has primarily been assessed using more complex delay discounting tasks (henceforth DD tasks). The trial structure of DD tasks is homologous to that of DoG tasks, in that participants are presented with a series of choices between a small sooner reward (SSR) and a large later reward (LLR).
DD tasks, however, give a richer picture of DoG abilities. They allow for the estimation of a discount rate (analogous to an interest rate) that describes the present value of a delayed reward. They also allow one to examine changes in discount rates as the timing of the receipt of the delayed reward varies. This increased richness comes at a price for developmental researchers in that DD tasks are challenging to conduct with children. For example, in comparison with DoG tasks, DD tasks typically involve (a) many more trials, (b) multiple delay periods that participants do not actually experience, and (c) rewards that are hypothetical rather than realized. In what follows, we will describe the typical procedure of a DD task and discuss how discounting behavior is measured in DD tasks before assessing their suitability for use with children.

| Measuring DD
There are many variants of the DD task, each with a number of parameterizations at the discretion of the experimenter. The majority of studies employ hypothetical monetary rewards in which the LLR (e.g., $100) is fixed in value and the SSR (e.g., $10) is made available immediately. Across trials, two factors are systematically varied: the delay to receipt of the LLR and the value of the SSR. For any given delay, an indifference point can be calculated, representing the value of the SSR at which the individual is indifferent between that and the LLR (e.g., according the same value to $100 in a year's time as to $10 now). A common way of deriving an indifference point is to take the mean value of the ascending and descending switch points (i.e., when trials are ordered by the value of the SSR, the ascending switch point is the first trial on which preferences switch from delayed to immediate reward and the descending switch point is the first trial on which preferences switch from immediate to delayed reward).
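As an illustration of the switch-point procedure just described, the Python sketch below computes one indifference point for a single delay. The function names, the response coding ("immediate"/"delayed"), and the £1 to £10 response pattern are hypothetical and are not taken from the present task. The switch point is taken to be the SSR value on the trial where the preference changes, which is one of several conventions in use.

```python
def switch_point(ssr_values, choices, before, after):
    """Return the SSR value on the first trial where the choice changes from
    `before` to `after`, or None if no such switch occurs."""
    for i in range(1, len(choices)):
        if choices[i - 1] == before and choices[i] == after:
            return ssr_values[i]
    return None


def indifference_point(asc_values, asc_choices, desc_values, desc_choices):
    """Mean of the ascending and descending switch points for one delay."""
    # Ascending series: the SSR grows, so preference flips delayed -> immediate.
    ascending = switch_point(asc_values, asc_choices, "delayed", "immediate")
    # Descending series: the SSR shrinks, so preference flips immediate -> delayed.
    descending = switch_point(desc_values, desc_choices, "immediate", "delayed")
    if ascending is None or descending is None:
        return None  # no switch in one direction, so the point is undefined
    return (ascending + descending) / 2


# Hypothetical responses for one delay: LLR of £10, SSRs from £1 to £10.
asc_values = list(range(1, 11))
asc_choices = ["delayed"] * 6 + ["immediate"] * 4
desc_values = list(range(10, 0, -1))
desc_choices = ["immediate"] * 5 + ["delayed"] * 5

ip = indifference_point(asc_values, asc_choices, desc_values, desc_choices)
print(ip)  # 6.0
```

Here the ascending switch occurs at £7 and the descending switch at £5, so this hypothetical child is treated as valuing the £10 LLR at that delay about as much as £6 now.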
If switch points are calculated for a number of delay periods, then a continuous function can be fitted that describes the discounting of the delayed reward over time (Simpson & Vuchinich, 2000; Urminsky & Zauberman, 2015). Different discount functions model the relationship between subjective value and time in different ways (see Figure 1). Perhaps the most influential account is the discounted utility model (Samuelson, 1937), which models intertemporal preferences as the output of an exponential function given by the following formula:
$$V = A e^{-kt} \tag{1}$$
In Equation (1), V is the subjective (present) value of the delayed reward, A is the value of the delayed reward, k is a free parameter that is related to the steepness of the discounting, and t is the length of the delay period. Exponential discounting has a number of features that make it attractive as a normative model of intertemporal choice. The exponential discount function entails a constant discount rate across time and is therefore incompatible with dynamical inconsistencies in one's preferences; that is, if an exponential discounter prefers £997 now over £1,000 in 1 month, then they will also prefer £997 in 1 year to £1,000 in 1 year and 1 month. Despite its putative normative status, exponential discounting is a poor fit for individuals' actual intertemporal preferences (Ainslie, 1975; Thaler, 1981). This has led researchers to search for alternative discount functions. One that has received much attention is the hyperbolic discount function (Mazur, 1987), in which the discount rate declines as the delay to receipt increases. Unlike exponential discounting, hyperbolic discounting accounts for dynamical inconsistencies in intertemporal preferences. The formula for hyperbolic discounting is as follows:
$$V = \frac{A}{1 + kt} \tag{2}$$
As with the exponential function, k is a free parameter; in both functions, larger values of k indicate steeper discounting. A third type of discount function, quasi-hyperbolic discounting (Laibson, 1997), combines elements of both exponential and hyperbolic processes. It does so by the inclusion of an additional free parameter. The formula for quasi-hyperbolic discounting is given as follows:
$$V = \begin{cases} A, & t = 0 \\ \beta \delta^{t} A, & t > 0 \end{cases} \tag{3}$$
In Equation (3), $\delta^{t}$ models discounting as the outcome of an exponential function (as in Equation 1, with δ a per-period discount factor, 0 < δ ≤ 1), and β is a free parameter representing present bias that is restricted to the value range 0 < β < 1. When β approaches 1, discounting approximates an exponential process. One advantage of a model with an additional parameter is that it can potentially model intrapersonal variation in discounting arising from framing effects and contextual factors as well as interpersonal differences in discounting. Each of the three models described above, along with a y-intercept model (so-called noise model), is fitted to some sample data points and displayed in Figure 1.
Figure 1. Sample data points fitted with four different discount models. Each line represents the best-fitting equation for that model for these data points.
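To make the contrast between constant and declining discount rates concrete, the Python sketch below implements the exponential, hyperbolic, and quasi-hyperbolic functions (Equations 1 to 3) and replays the £997 versus £1,000 example. The per-day parameters (k = 0.000102, β = 0.9, δ = 0.99995) and the 30-day month are hypothetical values chosen only to illustrate the qualitative pattern; they are not estimates from any data set.

```python
import math

def exponential(A, t, k):
    """Equation (1): V = A * exp(-k * t)."""
    return A * math.exp(-k * t)

def hyperbolic(A, t, k):
    """Equation (2): V = A / (1 + k * t)."""
    return A / (1 + k * t)

def quasi_hyperbolic(A, t, beta, delta):
    """Equation (3): V = A when t = 0, otherwise beta * delta**t * A."""
    return A if t == 0 else beta * delta ** t * A

def prefers_sooner(value, small, t_small, large, t_large):
    """True if the discounted sooner amount is worth more than the discounted later one."""
    return value(small, t_small) > value(large, t_large)

# Hypothetical per-day parameters, chosen only to illustrate the qualitative pattern.
K, BETA, DELTA = 0.000102, 0.9, 0.99995
MONTH, YEAR = 30, 365  # delays in days; a month is approximated as 30 days

models = {
    "exponential": lambda A, t: exponential(A, t, K),
    "hyperbolic": lambda A, t: hyperbolic(A, t, K),
    "quasi-hyperbolic": lambda A, t: quasi_hyperbolic(A, t, BETA, DELTA),
}

results = {}
for name, value in models.items():
    near = prefers_sooner(value, 997, 0, 1000, MONTH)           # £997 now vs £1,000 in 1 month
    far = prefers_sooner(value, 997, YEAR, 1000, YEAR + MONTH)  # the same pair, 1 year later
    results[name] = (near, far)
    print(name, near, far)
# exponential True True
# hyperbolic True False
# quasi-hyperbolic True False
```

Under the exponential function the choice is the same whether the pair is offered now or a year ahead, because delaying both options by the same amount multiplies both values by the same factor. Under the hyperbolic and quasi-hyperbolic functions the preference reverses once both options lie in the future, which is the dynamical inconsistency described above.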

| Measuring DD in children
That DD tasks rely on repeated trials involving hypothetical monetary rewards raises concerns over their validity as a measure of children's intertemporal preferences. Children have well-documented difficulties in sustaining motivation and attention (Levy, 1980) and in reasoning about hypotheticals (Markovits & Vachon, 1989), and they are relatively inexperienced with money. Among adolescents, these concerns are somewhat allayed by the growing literature documenting associations between DD task performance and a number of cognitive, social, and behavioral outcomes (Barkley, Edwards, Laneri, Fletcher, & Metevia, 2001; Bromberg, Wiehler, & Peters, 2015; Olson, Hooper, Collins, & Luciana, 2007; Romer, Duckworth, Sznitman, & Park, 2010; Steinberg et al., 2009).
In contrast to the growing literature examining adolescent discount functions, there are relatively few studies examining discount functions among children in the period before adolescence (Staubitz, Lloyd, & Reed, 2018). Most of the existing studies were conducted specifically to examine whether there are group differences between children with attention-deficit/hyperactivity disorder (ADHD) and typically developing children. ADHD has been proposed to arise in part from dysfunction of the frontostriatal circuit and in particular hypofunctioning of the mesolimbic dopamine system that regulates reward reinforcement and processes of extinction (Sagvolden, Johansen, Aase, & Russell, 2005). These impairments are thought to manifest themselves in problems with self-control and, in particular, a preference for small immediate rewards over larger later rewards in situations involving intertemporal choice. The results of studies with children directly testing this hypothesis, however, are somewhat mixed (see Patros et al., 2016, for a review). Wilson, Mitchell, Musser, Schmitt, and Nigg (2011) administered a DD task to 7- to 9-year-olds with and without a diagnosis of ADHD and found that discount functions were significantly steeper for the ADHD group, although this effect did not survive controlling for IQ. By comparison, Rosch and Mostofsky (2016), using the same hypothetical DD task as Wilson et al., found no difference in performance between an ADHD group and typically developing controls aged 8 to 12 years (see also Antonini, Becker, Tamm, & Epstein, 2015; Martinelli, Mostofsky, & Rosch, 2017).
One possible interpretation of these largely negative results is that group differences are not consistently found because the standard DD task is unsuitable for use with children. Indeed, in light of the difficulties children may have with tasks involving hypothetical monetary rewards over long delays, an alternative approach has been to use DD tasks in which children experience real delays and real rewards. Studies using this approach have, by necessity, used very short delays (e.g., 30-60 s) and small rewards (e.g., 10 cents). These studies, too, have reported mixed results with regard to differences between typically developing children and those with ADHD (Martinelli et al., 2017; Scheres et al., 2006; Yu, Sonuga-Barke, & Liu, 2015).
In summary, a variety of techniques have been used to examine children's ability to delay rewards. The hypothetical DD task that has been used extensively with adults and adolescents has occasionally been used with children, but it is not clear whether more complex versions of this task using multiple delays are suitable for use before adolescence (Staubitz et al., 2018). It is also not clear to what extent, specifically in a child population, such hypothetical DD tasks measure the same thing as the DoG choice tasks more typically used with children.

| THE CURRENT STUDY
The aim of the current study was to develop a "child-friendly" version of the DD task and evaluate its suitability as a tool for assessing children's intertemporal decision making. To do so, we addressed three questions arising from children's performance on a DD task. First, we examined the relation between children's performance on a DD task and that on a DoG choice task. Given that these paradigms have emerged from distinct research traditions (DoG from the psychology of self-control and impulsivity and DD from behavioral analysis and economics), it is perhaps unsurprising that the relationship between them has been either ignored or on occasion taken for granted (Reynolds & Schiffbauer, 2005).
Second, we examined the time course of children's discounting behavior in more detail than has previously been done. Performance on DD tasks is typically measured in one of two ways: either by fitting a hyperbolic function (as described in Equation 2) and taking the log of the free parameter k, or by approximating the area under the curve using the trapezoidal rule (so-called point-based AUC; Myerson, Green, & Warusawitharana, 2001), with smaller AUC indicating greater discounting of delayed rewards. Log k presupposes that a hyperbolic discount function best describes changes in the subjective value of rewards across time. Although there is a great deal of evidence to indicate that a hyperbolic function fits adult discounting behavior better than an exponential function does (Kirby & Maraković, 1995; Thaler, 1981), there is nevertheless a growing realization that hyperbolic functions are not the best fit across all scenarios (Read, 2001; van den Bos & McClure, 2013). A single hyperbolic discount parameter, for example, cannot account for sign effects (gains are discounted more steeply than losses) or magnitude effects (smaller rewards are discounted more steeply than larger rewards; Benzion, Rappoport, & Yagil, 1989; Read, 2011; Thaler, 1981). Most importantly for developmental researchers, studies comparing different discount functions have almost exclusively been conducted with adult and adolescent samples, leaving unanswered the question of whether children's discounting approximates an exponential, hyperbolic, or some alternative model.
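For concreteness, a minimal Python sketch of the point-based AUC measure is given below. It assumes the usual normalisation (delays as proportions of the longest delay, indifference points as proportions of the delayed amount, plus a point at delay 0 with full value); the function name and the delays and indifference points are hypothetical. Estimating log k instead requires fitting Equation (2) to the same indifference points, which is not shown here.

```python
def point_based_auc(delays, indifference_points, amount):
    """Area under the normalised discounting curve via the trapezoidal rule.

    Assumes delays are sorted in ascending order. Delays are scaled by the
    longest delay and indifference points by the delayed amount, and the point
    (0, 1) is prepended on the assumption that an immediate reward keeps its
    full value. AUC then typically runs from 0 (steepest discounting) to 1 (none).
    """
    max_delay = delays[-1]
    xs = [0.0] + [d / max_delay for d in delays]
    ys = [1.0] + [ip / amount for ip in indifference_points]
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2  # one trapezoid
    return area

delays = [7, 30, 90, 180]   # hypothetical delays in days
ips = [9.0, 7.0, 5.0, 3.0]  # hypothetical indifference points for a £10 delayed reward
auc = point_based_auc(delays, ips, 10.0)
print(round(auc, 4))  # 0.5392
```

Because the measure makes no assumption about the shape of the discount function, it can be computed for any participant whose indifference points are defined, whichever model fits them best.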
In the present study, we fitted exponential, hyperbolic, quasi-hyperbolic, and y-intercept functions (so-called noise models) to individuals' indifference points and used Bayesian information criterion scores (Schwarz, 1978) to select the best-fitting model. Exponential and hyperbolic models were chosen because of the prominence they are given in adult studies of temporal discounting. The quasi-hyperbolic model (Laibson, 1997; Phelps & Pollak, 1968), like the hyperbolic model, captures dynamic inconsistencies in preferences; however, the quasi-hyperbolic model explicitly models the influence of present bias in the form of the β parameter (see Equation 3). Present bias is arguably an important feature of children's discounting behavior (Hongwanishkul, Happaney, Wendy, & Zelazo, 2005 …
Third, and finally, we examined the link between DD and measures of sensation seeking and ADHD, traits that have been associated with DD in adolescents. As described above, although a number of studies have found a difference between the discount rates of adolescents with ADHD and those of typically developing controls (…
We did not use an ADHD group in the study, but we did gather teacher ratings of ADHD traits in our sample in order to examine the relation between such ratings and children's performance on the DoG and DD tasks. It is difficult to predict whether such relations would be observed in the general population, given that previous studies have focused on clinical groups, but we were particularly interested in whether the two delayed reward tasks (DD and DoG) showed differential relations with this measure and, specifically, whether the DoG task, which involved real rewards delivered after a real delay, would show a stronger relation with ADHD traits.
However, despite the conceptualization of sensation seeking as a facet of impulsiveness, studies with adults and adolescents have typically not found a consistent relation between sensation seeking and impatience in intertemporal choice tasks (Khurana et al., 2015; Mishra & Lalumière, 2011; Ostaszewski, 1996). Interestingly, though, Romer et al. (2010) report that among younger adolescents (14 to 16 years old), higher sensation seeking is associated with steeper discounting, whereas the pattern is reversed among older participants (19 to 22 years old). To the best of our knowledge, our study is the first to examine the relation between sensation seeking and DD in a child sample.
To summarize, in the current study, we developed a novel DD task to measure intertemporal choice in children before the adolescent period. The DD task that we used was computerized and deliberately designed to be more child friendly than versions of the task typically used with older participants. There was an initial training phase, and on test trials, the rewards were represented concretely using pictures of amounts of money; previous DD tasks with children using hypothetical rewards have typically not used visual presentations (Staubitz et al., 2018). The choices available to children were also made clear by first presenting the immediate option, both visually and auditorily, and then presenting the delayed option, again both visually and auditorily (see Section 3 for details). In addition to the measures described above, we included a measure of IQ, given that previous research has indicated a relation between IQ and DD performance in older participants (see Shamosh & Gray, 2008, for a meta-analysis).

| Participants
One hundred four children aged 7 to 9 years were recruited for this study. Data from five participants were excluded from the analyses due to equipment failure on the DD task; data from one further participant were removed due to experimenter error. The final sample therefore consisted of 98 children (M age = 8 years 7 months, SD = 8 months), 53 of whom were female. Three participants were not administered the sensation seeking measure because their experimental session ran out of time. Participants were recruited from three rural primary schools local to the experimenters, and informed written consent for participation was sought from parents/guardians prior to testing.
¹ Johnson and Bickel (2008) recommend removing from DD analyses any participant who shows less than a 10% decrease in subjective value from the first delay period to the last delay period. However, as the longest delay period we used in the present task was 180 days, we considered the 10% decrease rule too conservative.

| Materials
The DoG task employed two shallow trays, one labeled now and another labeled tomorrow, to display the rewards. Four reward pairs were used: trading cards (1 vs. 2), party bags (1 vs. 2), sweets (3 vs. 6), and cartoon clips (1 vs. 2). Party bags were opaque and filled with small toys. Cartoon clips were 2 min in length, and miniature "cinema tickets" were used to represent the number of clips children could watch (1 vs. 2). There were six different cartoon clips from which to choose. Both the DD task and the sensation seeking scale were presented on a 15-inch Dell touchscreen laptop with a Core i5 vPro processor.

| Design and procedure
Children were tested individually and alone in either a room or a corridor in their school. Testing took place over two sessions on consecutive days. On Day 1, participants first completed a DoG task and then undertook the IQ measures.² On Day 2, they first completed the DD task and then the sensation seeking measure. Any rewards owed to children from delayed choices on the DoG task were presented to them at the end of the Day 2 session. The tasks were administered in the same order for all participants. Each session lasted approximately 20-30 min.

| Delay of gratification
This task consisted of four trials in which participants were offered a choice between a small immediate reward and a larger reward available after a delay of one day. Prior to commencing the test trials, children selected their favorite theme of trading card from an array of six (e.g., football and Disney). Cards from their favored theme were then used on the test trial, though their specific identity was unknown to participants as they were presented face down. Children sat at a table, opposite the experimenter, and on each trial, the rewards were placed in two trays labeled now and tomorrow, positioned in front of children.
Delayed choices were scored as 1 and immediate choices as 0, with total scores on the task varying from 0 to 4.
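The scoring rule above, together with the α coefficient reported for this task in the Results, can be sketched in Python as follows. The five response patterns and the function names are hypothetical. α is computed with the standard Cronbach formula; for binary items such as these it is equivalent to KR-20.

```python
from statistics import pvariance

def dog_total(trials):
    """Total DoG score for one child: delayed choices coded 1, immediate 0 (range 0-4)."""
    return sum(trials)

def cronbach_alpha(responses):
    """Cronbach's alpha for a children x items matrix of 0/1 codes."""
    n_items = len(responses[0])
    item_vars = [pvariance([child[i] for child in responses]) for i in range(n_items)]
    total_var = pvariance([dog_total(child) for child in responses])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical choices on the four trials (cards, party bags, sweets, cartoon clips).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
totals = [dog_total(child) for child in responses]
alpha = cronbach_alpha(responses)
print(totals, round(alpha, 3))  # [4, 3, 3, 1, 0] 0.741
```

Population variances are used for both the items and the totals; because the same scaling applies to numerator and denominator, sample variances would give the same α.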

| IQ measures
Two subtests from the Wechsler Intelligence Scale for Children (WISC; fourth edition; Wechsler, 2003) were administered, Vocabulary and Block Design, with raw scores on the tasks used in the analyses.

| DD task
The DD task was a computer-based hypothetical intertemporal choice task implemented in E-Prime (Psychology Software Tools, Pittsburgh, PA). The task began with a training phase that familiarized participants with the task parameters and trial structure. On test trials, the immediate reward was presented on the upper left side of the screen, and the delayed reward was presented on the upper right side (see Figure 2). The reward values were always displayed with both pictures and text. Below each reward, the respective delay to receipt of that reward was displayed. The pictures and text displaying the reward values and respective time delays were introduced serially. Accompanying audio described the reward values and delays as they appeared.
First, the immediate reward was displayed by itself for 2,000 ms, then the temporal information (i.e., "now") for a further 2,000 ms, then the delayed reward value appeared for 2,000 ms, and finally, the delay information (e.g., "1 week") was displayed. Participants were familiarized with this procedure in the training phase and instructed to select the option they preferred by tapping the computer screen. … £10). These were included to ensure that participants understood the choice they had to make and were paying attention to the task.
Only participants who selected the greater sum of money on at least two of the three check questions had their discounting data included in subsequent analyses. Indifference points were calculated by taking the midpoint of the ascending and descending switch points. As a final check on the validity of the data, we applied a systematicity criterion (Johnson & Bickel, 2008)
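The two data-quality screens described above can be combined into a single inclusion rule, sketched below in Python. The 20% and 10% thresholds follow our reading of Johnson and Bickel's (2008) two criteria (an indifference point rising by more than 20% of the delayed amount over the previous one; the last indifference point not falling at least 10% of the delayed amount below the first). Because the present study judged the 10% rule too conservative for a 180-day maximum delay, the second criterion is off by default here. Function names and example values are hypothetical.

```python
def jb_criterion_1(ips, amount):
    """Nonsystematic if any indifference point exceeds the one before it
    by more than 20% of the delayed amount."""
    return any(later - earlier > 0.20 * amount for earlier, later in zip(ips, ips[1:]))

def jb_criterion_2(ips, amount):
    """Nonsystematic if the last indifference point is not at least 10% of the
    delayed amount below the first."""
    return ips[0] - ips[-1] < 0.10 * amount

def include_in_dd_analysis(check_correct, ips, amount, use_criterion_2=False):
    """Keep a child only if at least 2 of 3 check questions were answered with the
    larger sum and the indifference points pass the systematicity screen."""
    passed_checks = sum(check_correct) >= 2
    nonsystematic = jb_criterion_1(ips, amount)
    if use_criterion_2:
        nonsystematic = nonsystematic or jb_criterion_2(ips, amount)
    return passed_checks and not nonsystematic

print(include_in_dd_analysis([True, True, False], [9, 7, 5, 3], 10))   # True
print(include_in_dd_analysis([True, False, False], [9, 7, 5, 3], 10))  # False: failed checks
print(include_in_dd_analysis([True, True, True], [5, 8, 4, 3], 10))    # False: 5 -> 8 jump
print(include_in_dd_analysis([True, True, True], [6, 6, 6, 6], 10))    # True
print(include_in_dd_analysis([True, True, True], [6, 6, 6, 6], 10, use_criterion_2=True))  # False
```

The last two calls show why the choice matters: a child whose indifference points are flat across delays passes the first criterion but would be excluded under the second.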

| Sensation seeking
We assessed sensation seeking using three of the five subscales from … with temptation (e.g., "Keeping secrets is easy for me"). The Thrill Seeking subscale measures individuals' preference for exciting, emotionally arousing activities (e.g., "I would like to try jumping from a plane with a parachute"). We selected these three subscales because of their high internal reliability and their association with parental and child reports of risk-taking (Morrongiello & Lasenby, 2006). We also felt that the items on these scales were better suited to a primary-school-age sample than were the two subscales we omitted, which predominantly focus on behavioral choices more typically available to older children. The SSSC presents pairs of statements with opposite meanings (e.g., "Keeping secrets is easy for me" and "Keeping secrets is hard for me") and asks children to select the statement that best characterizes them. In the present version of the task, the statements were presented on a computer, each in a speech bubble emanating from an avatar. The experimenter read each option aloud to the child. The avatars were positioned on the left and right of the screen, with the high and low sensation seeking options randomly assigned to either the left or the right.
High sensation seeking options are scored 1 and low sensation seeking options are scored 0. Scores on the Behavioral Intensity, Behavioral Inhibition, and Thrill Seeking subscales ranged from 0 to 8, 0 to 7, and 0 to 11, respectively. Higher scores on the Behavioral Inhibition subscale indicate weaker inhibition, and weak Behavioral Inhibition is associated with high sensation seeking.

| Conners Teacher rating scale
Teachers of participants completed the Conners 3 Teacher Short Form (Conners, 2008) … to guidelines given in the test manual; these adjust for age and gender. T scores greater than 65 (so-called elevated scores) lie more than 1.5 standard deviations above the mean for that age and gender group and are described as indicating "significant concerns" (Conners, 2008). We therefore identified children who had a T score greater than 65 on either the inattention subscale or the hyperactivity subscale as an at-risk group.
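The at-risk rule can be written out as a short Python sketch. It relies only on the standard T-score scale (mean 50, SD 10), under which T = 65 corresponds to z = 1.5; the function names and example scores are hypothetical.

```python
def t_to_z(t_score):
    """Convert a T score (mean 50, SD 10) to a z score."""
    return (t_score - 50) / 10

ELEVATED_T = 65  # "elevated" cut-off; a score must exceed it

def elevated_risk(inattention_t, hyperactivity_t):
    """At risk if either Conners 3 subscale T score exceeds 65."""
    return inattention_t > ELEVATED_T or hyperactivity_t > ELEVATED_T

print(t_to_z(65))             # 1.5
print(elevated_risk(68, 52))  # True
print(elevated_risk(65, 65))  # False: 65 itself is not above the cut-off
```

Using "either subscale" rather than a combined score means a child flagged on inattention alone or on hyperactivity alone is counted in the same at-risk group.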

| RESULTS
Descriptive statistics of the variables measured are reported in Table 1.
The internal consistency (α coefficient) of the DoG task was .60. Preliminary analyses examined the effect of gender on the dependent measures listed in Table 1. As males were older than females (8 years 11

| DD task
Applying both the systematicity criterion and the check question criterion removed 29% of participants (n = 28) from the DD analysis. Fourteen children (14%) produced unsystematic data, and 20 children (20%) failed two or more check questions (six children produced both unsystematic data and failed two or more check questions). We compared children removed from the DD analysis with those included in the analysis on age, vocabulary, and block design scores from the WISC and DoG score, accounting for multiple comparisons using a Bonferroni correction. Children removed from the DD analysis were significantly younger than those who produced meaningful data (101 vs. 106 months), t(96) = 2.9, p < .01. There was no difference between the groups on any of the other measures (all p values > .06).
We fitted exponential, hyperbolic, quasi-hyperbolic, and noise models to each participant's indifference points.
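A minimal Python sketch of this model comparison follows. It fits each model by least squares over a coarse parameter grid (a simplification; a dedicated nonlinear optimiser would normally be used), scores the fits with the Gaussian-error form of the Bayesian information criterion, BIC = n ln(RSS/n) + p ln n, and treats the noise model as a single constant subjective value. Indifference points are expressed as proportions of the delayed amount. The grids, function names, delays, and data are hypothetical; the data are generated from a hyperbola with k = 0.02, so the expected winner is known in advance.

```python
import math

def exponential(t, k):
    return math.exp(-k * t)                     # Equation (1) with A = 1

def hyperbolic(t, k):
    return 1.0 / (1.0 + k * t)                  # Equation (2) with A = 1

def quasi_hyperbolic(t, beta, delta):
    return 1.0 if t == 0 else beta * delta ** t  # Equation (3) with A = 1

def rss(predicted, observed):
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

def bic(residual_ss, n, n_params):
    # Gaussian-error BIC; the floor guards against log(0) for a perfect fit.
    return n * math.log(max(residual_ss, 1e-12) / n) + n_params * math.log(n)

K_GRID = [10 ** (e / 20) for e in range(-120, 21)]  # k from 1e-6 to 10 per day
BETA_GRID = [b / 100 for b in range(1, 100)]         # 0 < beta < 1

def fit_models(delays, ys):
    """Return {model: (BIC, best parameters)} for one participant."""
    n = len(ys)
    fits = {}
    for name, f in [("exponential", exponential), ("hyperbolic", hyperbolic)]:
        k = min(K_GRID, key=lambda kk: rss([f(t, kk) for t in delays], ys))
        fits[name] = (bic(rss([f(t, k) for t in delays], ys), n, 1), {"k": k})
    # Quasi-hyperbolic: search beta and delta = exp(-k) jointly.
    beta, delta = min(((b, math.exp(-kk)) for b in BETA_GRID for kk in K_GRID),
                      key=lambda p: rss([quasi_hyperbolic(t, *p) for t in delays], ys))
    q_rss = rss([quasi_hyperbolic(t, beta, delta) for t in delays], ys)
    fits["quasi-hyperbolic"] = (bic(q_rss, n, 2), {"beta": beta, "delta": delta})
    c = sum(ys) / n  # noise model: least-squares constant, insensitive to delay
    fits["noise"] = (bic(rss([c] * n, ys), n, 1), {"c": c})
    return fits

delays = [7, 30, 90, 180]                   # hypothetical delays in days
ys = [1 / (1 + 0.02 * t) for t in delays]   # hypothetical, hyperbolic with k = 0.02
fits = fit_models(delays, ys)
best = min(fits, key=lambda name: fits[name][0])
print(best)                                             # hyperbolic
print(round(math.log(fits["hyperbolic"][1]["k"]), 2))  # ln k for the hyperbolic fit
```

The lowest BIC identifies the preferred model; the extra ln n penalty for the quasi-hyperbolic model's second parameter means it must reduce the residual error substantially to be selected over the one-parameter models.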

| Relations between the measures
We next examined the zero-order correlations between the three subscales of the SSSC. Behavioral Intensity and Thrill Seeking were highly correlated with one another (r (95) = .

| Conners ratings of elevated risk of ADHD
Twenty-four children had an elevated T score (>65) on either the inattention or hyperactivity subscales of the Conners 3 Teacher Short Form. Table 4 compares the scores of the two groups on the various dependent measures. The low-risk group had significantly larger AUC scores than the elevated-risk group, t(68) = 2.06, p = .04, Cohen's d = 0.54. Conners T scores are adjusted for age and gender differences; however, to control for the potential contribution of IQ differences between the groups, we ran an analysis of covariance on AUC scores with WISC Vocabulary and WISC Block Design as covariates and risk (low vs. elevated) as the predictor variable. The trend for ADHD risk to predict AUC score was only marginally significant when covarying IQ, F(1, 66) = 3.77, p = .06, ηp² = .05.
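The group comparison reported above rests on an independent-samples t test and a Cohen's d. A minimal Python sketch of both follows. It assumes the Student (equal-variance) form of the test and a pooled-SD denominator for d, which is consistent with the reported df of 68 but is our assumption; the AUC values are hypothetical, and the ANCOVA step is not reproduced.

```python
import math
from statistics import mean, variance

def pooled_sd(a, b):
    """Pooled standard deviation using sample (n - 1) variances."""
    na, nb = len(a), len(b)
    return math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))

def student_t(a, b):
    """Equal-variance independent-samples t statistic and its degrees of freedom."""
    se = pooled_sd(a, b) * math.sqrt(1 / len(a) + 1 / len(b))
    return (mean(a) - mean(b)) / se, len(a) + len(b) - 2

def cohens_d(a, b):
    """Standardised mean difference with the pooled SD as denominator."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)

low_risk = [0.6, 0.7, 0.8]       # hypothetical AUC scores
elevated_risk = [0.4, 0.5, 0.6]  # hypothetical AUC scores
t, df = student_t(low_risk, elevated_risk)
d = cohens_d(low_risk, elevated_risk)
print(round(t, 2), df, round(d, 2))  # 2.45 4 2.0
```

Under these definitions the two statistics are linked by d = t·√(1/n₁ + 1/n₂), so with unequal group sizes the same t can correspond to a noticeably different d.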

| DISCUSSION
In this study, we examined the suitability of the DD task as a measure of intertemporal preferences in children. We examined the nature of the data obtained using the DD task from a sample of 7- to 9-year-olds; we also looked at the relations between children's performance on a DD task and their performance on a DoG choice task (Mischel & Metzner, 1962), as well as further measures of sensation seeking and ADHD traits. DD performance was unrelated to DoG performance and sensation seeking. However, DD was associated with teacher-based assessment of ADHD risk, such that those with elevated risk discounted hypothetical future rewards more steeply (although this association was only marginally significant when controlling for IQ).

| Patterns of performance on the DD task
A variety of aspects of our findings enable us to examine whether the DD task is suitable for this age group. First, it is possible to examine whether individual data are consistent with the basic axiom of normative decision theory, namely, that individuals should seek to maximize expected value. Selecting a delayed reward of less value than an immediately available reward is a violation of this axiom. Participants in our study were presented with four such trials (e.g., £10.50 now or £10 in 7 days). In total, 80% of our sample selected the immediate reward on at least three of these four trials. We also included three check trials in which participants had to select between two immediate rewards of different value; 80% of participants selected the larger reward on at least two of these trials. Second, if children are producing meaningful data on the DD task that are similar to those produced by older participants, we might also expect indifference points to decrease as time delays increase. Under the criterion recommended by Johnson and Bickel (2008), only 14% of our sample produced data that were not systematic in this way.³ Finally, when we fitted exponential, hyperbolic, quasi-hyperbolic, and noise models to each individual's indifference points, we found that the noise model was the best fit for a relatively small number of participants (17%). The noise model captures insensitivity to variations in delay.
Overall, on the basis of these criteria, we can tentatively conclude that the majority of children in this age range produce data suggesting that they engage with and understand the task. This is despite the fact that children had to complete a large number of trials and that the rewards were hypothetical and monetary in nature. However, it needs to be borne in mind that around 30% of children did not produce such data, even though we tried to ensure the task was child friendly, with concrete displays of the rewards. Notably, Staubitz et al. (2018) point out that many previous DD studies with children have not used any performance-based exclusion criteria. Our data suggest that future studies using similar tasks need to include suitable checks so that children who are not behaving systematically in the task, or who appear not to engage with it, are removed from the sample.
The fact that such a sizeable minority of children were excluded for producing inconsistent data suggests that this task is likely to be unsuitable for children younger than those included in the present study.
One of the most robust findings in the adult intertemporal choice literature is that hyperbolic or quasi-hyperbolic functions give a significantly better approximation of individual discount rates than do exponential functions (Dixon, Jacobs, & Sanders, 2006; Kirby & Herrnstein, 1995; Myerson & Green, 1995; Petry & Casarella, 1999); this is also true for teenagers (Bromberg et al., 2015). In the present study, model fitting indicated that for the majority of participants the hyperbolic and quasi-hyperbolic discount functions were the best approximations of children's discounting of delayed rewards. Unlike the exponential discount function, both the hyperbolic and quasi-hyperbolic models predict dynamical inconsistencies in intertemporal preferences. Thus, in this respect, our data suggest that, when used with children, the DD task produces data that resemble those of older participants. Interestingly, a small number of participants appeared to discount rewards exponentially (n = 6). These children had significantly smaller AUC scores than hyperbolic discounters. Notwithstanding obvious concerns about the small sample size, the reduced AUC scores suggest that exponential discounters differ from hyperbolic discounters in their discount rates at longer delays rather than at shorter delays.
³ Wilson et al. (2011), applying the same criteria to an identical task with a similarly aged sample, reported that 17% of their sample produced unsystematic data.

| Relations with other measures
The DD task has attracted particular interest amongst researchers because of its association with a host of real-world behaviors in adolescents and adults. Many of the correlates of DD among adolescents and young adults are not relevant for children (e.g., substance abuse; Audrain-McGovern et al., 2009; Kirby, Petry, & Bickel, 1999; Reynolds, 2004). However, two traits that have been associated with DD task performance in adolescents and that can be measured in children are ADHD (Barkley et al., 2001; Demurie, Roeyers, Baeyens, & Sonuga-Barke, 2012) and sensation seeking (Romer et al., 2010). In the present study, after controlling for differences in IQ, there was a trend for children with elevated risk of ADHD, as assessed by teacher report, to show steeper discounting in the DD task than children at low risk, but not lower scores on the DoG task. The former finding corroborates and extends those studies in which children with clinically diagnosed ADHD show steeper discounting of real rewards over short delays (e.g., Martinelli et al., 2017; Yu et al., 2015). Unlike those studies, the present task used hypothetical rewards over much longer delay periods and tested children with elevated risk of ADHD on the basis of teacher ratings rather than on a clinical diagnosis.
The performance of children on the DD task did not predict either sensation seeking or the tendency to delay on a DoG choice task of the type typically used with children. To the best of our knowledge, this is the first study that has simultaneously examined sensation seeking and DoG in children. The measure of sensation seeking that we used has previously been shown to predict children's real-world […] Khurana et al., 2015) argue that, although preference for smaller immediate rewards should be viewed as a facet of weak self-control that is associated with poorer developmental outcomes, high levels of sensation seeking can be adaptive across the adolescent period and are not in and of themselves predictive of poorer long-term outcomes. Although our results do not allow us to speculate on whether similar claims can be made regarding sensation seeking in childhood, they do suggest that this trait is not associated with difficulties delaying gratification, indicating that these two constructs cannot straightforwardly be characterized together as features of impulsivity in children.
Finally, we turn to the lack of a relation between the DD and DoG measures. Relatively few previous studies, with any age group, have included both of these measures (though see Göllner, Ballhausen, Kliegel, & Forstmeier, 2018). In a recent meta-analysis of the convergent validity of measures of self-control, Duckworth and Kern (2011) report an average across-task correlation of r = .21 from studies that used at least two DoG tasks (although this was based on a sample of just four papers, indicative of the small number of studies that have reported correlations among different DoG measures). In the current study, the DoG and DD tasks differed from each other in a number of ways, including reward type (real vs. hypothetical), delay lengths, and trial number. It is difficult to assess which, if any, of these factors contributed to the lack of a relation between the measures. At face value, the findings suggest that the tasks draw on different processes, although it is also possible that the two measures simply differ in their sensitivity. The DoG task consisted of four trials at a single delay period (1 day), whereas the DD task in the current study consisted of 91 trials and measured indifference points at four different delays. Indeed, among those who produced systematic data on the DD task, 78% delayed on at least three of the four DoG trials, meaning there was relatively little variance in the DoG data. It is possible that a more sensitive DoG task would yield a relation with DD (see Göllner et al., 2018), although we note that the DoG task was sufficiently sensitive to yield a significant zero-order correlation with sensation seeking. As things stand, though, despite research with adults suggesting that delayed reward tasks involving hypothetical rewards measure essentially the same thing as those involving real rewards (Madden, Begotka, Raiff, & Kastern, 2003), we cannot yet be confident that this is the case in children.

| Summary and conclusions
The current findings suggest that a DD task involving hypothetical monetary rewards may be suitable for use from around 8 years, at least when the task is child friendly and includes visual props that represent the reward values in a concrete way. However, individual data need to be inspected carefully, and researchers should expect to exclude data from a substantial minority of their participants. There may be further ways of improving the DD task for use with children; some versions of the task use a titration method (Rodzon, Berry, & Odum, 2011) in which the value of the rewards is adjusted on the basis of participants' previous responses (see Bromberg et al., 2015, for an example). Such a method could yield a less onerous procedure in which fewer trials are needed to gauge switch points. In the current study, we found some evidence that DD performance relates to a behavioral measure (teacher ratings of ADHD traits). However, unlike with adults and adolescents, it remains to be seen whether children's performance on the DD task is more widely predictive of real-world behavior, either concurrently or longitudinally. Indeed, Watts, Duncan, and Quan's (2018) recent findings regarding the limited predictive power of the DoG task in children once other variables are controlled for provide grounds for caution before assuming that the DD task will serve as an important predictor of developmental outcomes.
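The titration idea can be sketched as follows. This is a hypothetical adjusting-amount procedure written in Python, not the protocol of Rodzon et al. (2011) or Bromberg et al. (2015): the starting value (half the larger later reward), the halving step rule, the six trials per delay, and the simulated hyperbolic chooser (`simulated_choice`) are all assumptions made for illustration. Each choice moves the smaller sooner reward toward the value at which the chooser would switch, so the indifference point is bracketed by bisection.

```python
# Hypothetical adjusting-amount (titration) sketch: the start value, step rule, trial
# count and simulated chooser are illustrative assumptions, not a published protocol.


def simulated_choice(ssr, llr, delay, k):
    """Hypothetical chooser who values the delayed reward hyperbolically."""
    return "LLR" if llr / (1 + k * delay) > ssr else "SSR"


def titrate(llr, delay, choose, n_trials=6):
    """Estimate the indifference point for `llr` at `delay` by adjusting the SSR.

    The SSR starts at half the LLR. After each choice it moves by the current step
    toward the point where the chooser would switch, and the step is then halved,
    so the true indifference point stays bracketed (a bisection on the SSR value).
    """
    ssr, step = llr / 2, llr / 4
    for _ in range(n_trials):
        if choose(ssr, llr, delay) == "LLR":
            ssr += step  # waited for the LLR: make the sooner reward more tempting
        else:
            ssr -= step  # took the SSR: make the sooner reward less tempting
        step /= 2
    return ssr


K = 0.05  # hypothetical discount rate of the simulated child
for delay in [7, 30, 180, 365]:  # hypothetical delays (days)
    estimate = titrate(100, delay, lambda s, lr, d: simulated_choice(s, lr, d, K))
    print(delay, round(estimate, 2), round(100 / (1 + K * delay), 2))
```

Under these assumptions, each indifference point is pinned down to within 100/2^7 ≈ 0.78 after six choices, so four delays would require 24 trials rather than the 91 trials of the DD task used here.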