An exploration of the non‐iterative time trade‐off method to value health states

Abstract Time Trade‐Off (TTO) usually relies on “iteration,” which is susceptible to bias. Discrete Choice Experiment with duration (or DCETTO) is free of such bias, but respondents find this cognitively more challenging. This paper explores non‐iterative TTO with or without lead time: NI(LT)TTO. In NI(LT)TTO, respondents see a series of independent pairwise choices without iteration (similar to DCETTO), but one of the two scenarios always involves full health for a shorter duration (similar to TTO). We compare three different “types” of NI(LT)TTO relative to DCETTO. Each type is presented in two “modes”: (a) verbally tabulated (as in a DCE) and (b) with visual aids (as in a TTO). The study has 8 survey variants, each with 12 experimental choice tasks and a 13th task with a logically determined answer. Data on the 12 experimental choices from an online survey of 6,618 respondents are modelled, by variant, using conditional logistic regressions. The results indicate that NI(LT)TTO is feasible, but some relatively mild states appear to have implausibly low predicted values, and the range of predicted values is much narrower than in DCETTO. The presentation of NI(LT)TTO tasks needs further improvement.

dead at the aggregate level (Bansback, Brazier, Tsuchiya, & Anis, 2012). However, a disadvantage of DCE TTO is the cognitive burden: each pairwise choice task in a DCE TTO of EQ-5D-5L comprises 12 pieces of information (the five dimensions of health and duration for one scenario, and the same for the other scenario), and all of these can change from one task to another. There is evidence that respondents find this cognitively more challenging than a conventional TTO, where only one piece of information (duration in full health) changes from one task to another for a given health profile (Mulhern et al., 2014).
The second methodological challenge for conventional TTO is that it has separate protocols depending on whether the state is better or worse than being dead, and data for the latter are typically subjected to arbitrary transformation (Patrick, Starks, Cain, Uhlmann, & Pearlman, 1994). DCE TTO does not require a separate protocol or transformation but has its challenges (see above). The addition of lead time has been suggested as an alternative way of valuing both kinds of states using a uniform protocol (Devlin, Tsuchiya, Buckingham, & Tilling, 2011). This, in effect, allows the shorter duration in full health to take negative values and requires no arbitrary transformation of data. However, respondents may "exhaust" lead time (Devlin et al., 2011; Devlin et al., 2013), where a given lead time is not long enough to accommodate their preference for a state worse than dead. For example, if a respondent prefers to die immediately rather than to live in full health for 10 years (lead time) followed by 10 years in a severe state, then lead time TTO cannot determine an indifference point, only that the health state value is strictly lower than −1.
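The arithmetic behind the lead time example can be sketched as follows. This is a minimal illustration, assuming the usual lead time TTO scoring in which the value is the full-health duration at indifference, net of lead time, divided by the years spent in the state; the function name and arguments are hypothetical.

```python
def lt_tto_value(x_full_health: float, lead_time: float, state_years: float) -> float:
    """Value implied by indifference between
    A: `lead_time` years in full health followed by `state_years` in the state, and
    B: `x_full_health` years in full health followed by death."""
    return (x_full_health - lead_time) / state_years


# Indifference at 15 years of full health: a state better than dead.
print(lt_tto_value(15, lead_time=10, state_years=10))  # 0.5

# Indifference at 0 years (immediate death): the lowest value the task can observe.
print(lt_tto_value(0, lead_time=10, state_years=10))   # -1.0
```

A respondent who still prefers immediate death to scenario A has exhausted the lead time: the value is below −1, but the task cannot say by how much.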
One possible next step is a non-iterative variant of TTO. Non-iterative TTO (NITTO) without lead time is a cross between iterative TTO and DCE TTO. The respondent is faced with a series of independent pairwise choices between health profiles, without iteration (so it is like a DCE TTO), but one of the two scenarios always involves full health for a shorter duration than the other scenario (so it is like a TTO). The motivations are to avoid the iteration bias of TTO and to reduce some of the cognitive burden associated with DCE TTO. However, it is not clear if NITTO can predict values for states worse than dead with high precision: it will not have direct observations in the negative range and will need to extrapolate negative values based on ordinal preferences observed in the positive range. A possible adaptation is NITTO with lead time (NILTTTO), which simply adds lead time to NITTO scenarios. The obvious disadvantage of this is the added complexity, while the advantage is that, because the preferences elicited are ordinal, the analysis of NILTTTO data is not hindered by the exhaustion of lead time. Table 1 summarises the main advantages and disadvantages of these different methods. The abbreviation "NI(LT)TTO" is used throughout this paper to mean non-iterative TTO with or without lead time. An NI(LT)TTO exercise was first explored using the three-level EQ-5D in Mulhern et al. (2014), which referred to it as "binary choice TTO," but we will call it "non-iterative TTO," because most TTO exercises (including those that are iterative) are based on a series of binary choice tasks. Mulhern et al. (2014) demonstrated that respondents can deal with non-iterative TTO tasks. However, not much else is known about this innovative method.
The aim of this paper is to build on earlier research to explore the effects of the following:
• respondents with various data quality concerns;
• different "types" of NI(LT)TTO designs;
• different "modes" of presenting pairwise choice tasks;
• learning and fatigue effects over 12 choice tasks; and
• heterogeneity in respondent preferences.

| The four types of experimental tasks
All the choice tasks used in this study use EQ-5D-5L (Herdman, Gudex, & Lloyd, 2011) to describe health states. Each choice task involves two scenarios consisting of "you" living in a hypothetical EQ-5D-5L state or in "full health" for a specified duration followed by death, where respondents are asked which scenario they think is better. No ties are allowed. Of the four types, the first (DCE TTO) is included as a baseline against which to compare the NI(LT)TTO types. The second (NITTO) is the natural hybrid of DCE TTO and iterative TTO. The third and fourth add lead time to this (NILTTTO) but are designed using different approaches.
Type 0: The baseline type used in the study is a DCE TTO that replicates the design used in an earlier study ("Type Ia" from Mulhern, Bansback, Hole, & Tsuchiya, 2017). This consists of 120 scenario pairs generated using Ngene (Choice Metrics, 2012) and has six levels of duration (6 months, 1, 2, 4, 7, and 10 years). (The number of scenario pairs, 120, is sufficient to estimate a model with categorical dummies representing the EQ-5D-5L descriptive system and continuous duration; interactions between the descriptive system and duration; quadratic duration; and interactions between the descriptive system and quadratic duration; for details, see Mulhern et al., 2017.) Prior values of zero are used for all parameters, and no adjustment is made for so-called "implausible" states. A DCE TTO task can be represented as a choice between two scenarios, or health profiles, A and B, where the levels of utility $u$ associated with each health profile, made up of state $x$ for duration $t$, are given by

$$u_A = \beta t_A + \lambda x_A t_A + \varepsilon_A,$$

and similarly

$$u_B = \beta t_B + \lambda x_B t_B + \varepsilon_B,$$

where $\beta$ represents the utility of living in full health for 1 year (expected to be positive); and $\lambda$ represents the (dis)utility associated with living with health problems $x$ for 1 year (expected to be negative).
The associated value of health state $x$ ($v_x$) is given by

$$v_x = \frac{\beta + \lambda x}{\beta} = 1 + \frac{\lambda x}{\beta}.$$

This formula applies to all four types. Under Type 0, negative values are interpolated from within the data, when the combined disutility of an EQ-5D-5L state cancels out the utility of full health. For further details, see Bansback et al. (2012).
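As a worked illustration of how a health state value follows from the model coefficients, the sketch below takes the value of a state to be its per-year utility relative to that of full health, that is, (β + λx)/β. This reading of the utility specification is an assumption of the sketch, and the coefficient values and names are hypothetical rather than estimates from the study.

```python
def health_state_value(beta: float, lambdas: dict[str, float], state: list[str]) -> float:
    """Value of a state: (beta + summed level decrements) / beta."""
    disutility = sum(lambdas[level] for level in state)
    return (beta + disutility) / beta


# Hypothetical coefficients, for illustration only.
beta = 0.25
lambdas = {"MO2": -0.02, "SC2": -0.02, "UA2": -0.02, "PD2": -0.03, "AD2": -0.03}

v_22222 = health_state_value(beta, lambdas, ["MO2", "SC2", "UA2", "PD2", "AD2"])
print(round(v_22222, 3))  # 0.52

# When the summed decrement exceeds beta, the value is negative (worse than dead).
v_severe = health_state_value(beta, {"severe": -0.30}, ["severe"])
print(round(v_severe, 3))  # -0.2
```

The value crosses zero exactly where the combined disutility of the state cancels out the utility of full health.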
Type 1: This is a non-iterative TTO with no lead time (NITTO). One scenario (A) is to live in an EQ-5D-5L state for 10 years, whereas the other (B) is to live in "full health" for one of six shorter durations (6 months, 1, 2, 4, 6, and 9 years). This can be represented as

$$u_A = \beta T + \lambda x_A T + \varepsilon_A, \qquad u_B = \beta t + \varepsilon_B,$$

where $T$ is the duration in the EQ-5D-5L state (10 years) and $t$ is the shorter duration in full health. The choice tasks correspond to TTO for states better than being dead, and therefore negative values are extrapolated from observations in the positive range. Ngene is used to generate 120 scenario pairs using prior values of zero, with scenario B constrained to be full health.

Type 2: This is a non-iterative TTO with lead time (NILTTTO). It is similar to Type 1, but the six levels of duration used in scenario B include a "negative" level of duration (−3, 0, 3, 5, 7, and 9 years). The design is identical to that of Type 1, except for the labels attached to the different levels of duration; because the experimental design uses zero prior values, it is unaffected. In order to operationalise the negative durations, a 4-year lead time is used: in the actual choice tasks, scenario A is 4 years in "full health" followed by 10 years in the EQ-5D-5L state, whereas scenario B is 1, 4, 7, 9, 11, or 13 years in "full health." In the analysis, the lead time is subtracted so that $T$ is 10 and $t$ ranges from −3 to 9.

Type 3: This is another NILTTTO (NILTTTO-II), but instead of an experimental design to select pairs of health scenarios in a single step, a two-stage design is used. This is an innovative and promising approach to design DCE TTO (Mulhern et al., 2017), and the present study tests if this is also viable for NILTTTO. In the first stage, 120 pairs of EQ-5D-5L states (with no durations) combined with full health are generated, assuming $u_A = \lambda x_A + \varepsilon_A$ and $u_B = 0$.
In the second stage, each state x in scenario A is matched with one of the six duration levels t for scenario B (−3, 0, 3, 5, 7, and 9 years) that achieves an expected split of respondents between the two scenarios of 70% versus 30%, which is chosen to be within the range of optimal choice probabilities for DCEs derived by Kanninen (2002). The results of Mulhern et al. (2014) were used as priors for this second stage. As with Type 2, the lead time in "full health" is added for the presentation to respondents but removed for the analysis.
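The second-stage matching can be sketched as below. This is only an illustration of the idea, not the procedure used in the study: it assumes a logit choice probability, hypothetical prior coefficients, and a rule that picks the duration level whose predicted split is closest to 70/30 in either direction. It also shows the 4-year lead time being added back for presentation.

```python
import math

LEAD_TIME = 4                 # years of full health added for presentation
T = 10                        # years in the EQ-5D-5L state (scenario A)
LEVELS = [-3, 0, 3, 5, 7, 9]  # analysis durations t for scenario B


def prob_choose_b(beta: float, lam_x: float, t: float) -> float:
    """Logit probability of choosing B, with u_A = (beta + lam_x) * T and u_B = beta * t."""
    u_a = (beta + lam_x) * T
    u_b = beta * t
    return 1.0 / (1.0 + math.exp(u_a - u_b))


def match_duration(beta: float, lam_x: float, target: float = 0.7) -> int:
    """Pick the level whose predicted split is closest to target versus (1 - target)."""
    def distance(t: int) -> float:
        p = prob_choose_b(beta, lam_x, t)
        return min(abs(p - target), abs(p - (1.0 - target)))
    return min(LEVELS, key=distance)


# Hypothetical priors: beta per year of full health, lam_x the state's summed decrement.
t = match_duration(beta=0.3, lam_x=-0.06)
shown_to_respondent = t + LEAD_TIME
print(t, shown_to_respondent)  # 5 9
```

With these priors, about 29% of respondents are predicted to choose scenario B when it offers 5 analysis years (9 years as presented), which is the closest available level to a 70/30 split.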
2.2 | The modes of presentation

NI(LT)TTO is a cross between iterative (LT)TTO and DCE TTO and can be presented as either of these. Typically, (LT)TTO tasks are presented using visual aids or TTO boards, whereas DCE exercises are presented in a tabular format. Thus, each of the four types above is presented in two different modes.

| Survey design, recruitment, and the sample
Within each type, the 120 choice sets are blocked randomly into 10 blocks of 12 tasks using Stata (Stata Corp); this procedure is repeated 10 times, and the blocking variable with the lowest association with the design attributes is chosen as the final blocking variable. Each respondent within a given variant is randomly allocated 1 of the 10 blocks.
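The blocking step was carried out in Stata; the sketch below only illustrates the logic in Python. The measure of "association" is an assumption (the largest gap between any block's attribute mean and the overall mean), and the design matrix is randomly generated rather than taken from the study.

```python
import random


def random_blocks(n_tasks=120, n_blocks=10, rng=random):
    """Randomly assign tasks to equally sized blocks; returns one block label per task."""
    labels = [i % n_blocks for i in range(n_tasks)]
    rng.shuffle(labels)
    return labels


def imbalance(labels, design, n_blocks=10):
    """Largest gap between a block's attribute mean and the overall mean (assumed measure)."""
    worst = 0.0
    for a in range(len(design[0])):
        overall = sum(row[a] for row in design) / len(design)
        for b in range(n_blocks):
            vals = [row[a] for row, lab in zip(design, labels) if lab == b]
            worst = max(worst, abs(sum(vals) / len(vals) - overall))
    return worst


rng = random.Random(2024)
# Hypothetical design: 120 tasks, five EQ-5D-5L levels (1-5) plus a duration level index (0-5).
design = [[rng.randint(1, 5) for _ in range(5)] + [rng.randrange(6)] for _ in range(120)]

# Repeat the random blocking 10 times and keep the least imbalanced candidate.
candidates = [random_blocks(rng=rng) for _ in range(10)]
best = min(candidates, key=lambda labels: imbalance(labels, design))
```

Each respondent would then be allocated one of the 10 block labels at random and shown the 12 tasks carrying that label.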
Data were collected through an online survey using a commercial internet panel (Survey Sampling International). Age and sex quotas were set for each of the eight survey variants corresponding roughly to the UK general population. Target sample size was 600 for Type 0 (DCE TTO) and 900 for the other types. Panel members were invited by e-mail to take part in one of the survey variants. Part 1 of the survey consisted of background questions including age, sex, education, own EQ-5D-5L, and life satisfaction. Part 2 was for the 12 choice tasks. In addition, there was a 13th task with a logically determined answer, which was the same across all variants (A: mild state for 10 years vs. B: full health for 10 years) but presented in the relevant format. Part 3 of the survey asked additional questions including assessment of the choice tasks. The survey was hosted by epiGenesys, a University of Sheffield spin-off company.

| Analyses of quantitative data
The choice data by respondent $i$ for scenario $j$ are modelled using conditional logit regressions:

$$u_{ij} = \beta t_{ij} + \lambda x_{ij} t_{ij} + \varepsilon_{ij}.$$

Of particular interest are the sign and significance of the $\beta$ and the $\lambda$ coefficients ($\beta$ is expected to be positive and $\lambda$ is expected to be negative); the relative ordering and significance of the $\lambda$ coefficients within dimensions (e.g., whether the Level 4 coefficient for self-care is statistically significantly worse than the Level 3 coefficient of the same dimension); predicted health state values for select states (22222, 33333, 44444, and 55555); and the gap between the predicted values for states 22222 and 55555.
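For readers less familiar with conditional logit on pairwise choices, the sketch below shows how the log-likelihood is built from the utility form βt + λxt described for Type 0. It is a simplified illustration, not the Stata estimation used in the study: the task records, the single dummy variable, and the function names are all hypothetical.

```python
import math


def utility(beta, lam, t, x):
    """Deterministic utility beta*t + (lam . x)*t, where x is a list of 0/1 level dummies."""
    return beta * t + sum(l * xi for l, xi in zip(lam, x)) * t


def prob_a(beta, lam, task):
    """Conditional logit probability of choosing A in a two-alternative task."""
    diff = (utility(beta, lam, task["tA"], task["xA"])
            - utility(beta, lam, task["tB"], task["xB"]))
    return 1.0 / (1.0 + math.exp(-diff))


def log_likelihood(beta, lam, tasks):
    """Sum of log choice probabilities; task['choice'] is 'A' or 'B'."""
    ll = 0.0
    for task in tasks:
        p = prob_a(beta, lam, task)
        ll += math.log(p if task["choice"] == "A" else 1.0 - p)
    return ll


# Two hypothetical NITTO-style tasks with a single health dummy.
tasks = [
    {"tA": 10, "xA": [1], "tB": 4, "xB": [0], "choice": "A"},
    {"tA": 10, "xA": [1], "tB": 9, "xB": [0], "choice": "B"},
]

# With all coefficients at zero every choice has probability 0.5.
ll_zero = log_likelihood(0.0, [0.0], tasks)
# Coefficients with the expected signs fit these two choices better.
ll_signed = log_likelihood(0.2, [-0.1], tasks)
```

Estimation amounts to finding the β and λ that maximise this log-likelihood over all respondents and tasks within a variant.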
There are two further quantitative analyses. First, learning and fatigue effects are explored, through modelling the early tasks (Tasks 1-4), middle tasks (Tasks 5-8), and the late tasks (Tasks 9-12) separately, for each variant.
Second, heterogeneity in respondent preferences is examined through latent class analysis, by variant, using the lclogit command in Stata (Pacifico & Yoo, 2013). This analysis assumes that respondents can be divided into subgroups (or classes) depending on their preferences. In the estimation process, a separate set of coefficients is estimated for each class (Greene & Hensher, 2003; Hole, 2008), reflecting that preferences are allowed to vary across, but not within, classes.
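The core idea of a latent class choice model can be sketched as a mixture of class-specific logit models. The example below does not reproduce lclogit or its estimation routine; it only shows, under hypothetical class shares and coefficients, how one respondent's sequence of choices translates into posterior class membership probabilities.

```python
import math


def prob_a(beta, lam_x, t_a, t_b):
    """Logit probability of choosing A: state x for t_a years vs full health for t_b years."""
    diff = (beta + lam_x) * t_a - beta * t_b
    return 1.0 / (1.0 + math.exp(-diff))


def sequence_likelihood(beta, lam_x, tasks):
    """Probability of one respondent's whole sequence of choices under one class."""
    like = 1.0
    for t_a, t_b, chose_a in tasks:
        p = prob_a(beta, lam_x, t_a, t_b)
        like *= p if chose_a else 1.0 - p
    return like


def posterior_shares(classes, tasks):
    """Posterior class membership probabilities for one respondent."""
    joint = [share * sequence_likelihood(beta, lam_x, tasks)
             for share, beta, lam_x in classes]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-class model: (prior share, beta, lambda for a single state).
classes = [(0.6, 0.3, -0.05), (0.4, 0.3, -0.40)]
# One respondent who always picks full health (B), whatever its duration.
tasks = [(10, 1, False), (10, 4, False), (10, 9, False)]
post = posterior_shares(classes, tasks)
```

Here the respondent who always chooses full health is assigned almost entirely to the class with the large decrement, that is, the class with the lowest implied health state values.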
Stata Versions 13 and 14 are used for all analyses.

| Analysis of free text comments
Before the end of the survey, participants were given a chance to leave a comment in a textbox field. The comments are reviewed with the aim of developing overarching themes and comparing these themes across variants. They are categorised into themes in several steps. Each comment from the first variant (0a) is assigned an initial theme, and an initial index is developed. This index is applied to the subsequent variants, and a new theme is created if no appropriate theme is available for a comment. Comments can be classified under multiple themes. The themes and the number of times each theme is mentioned are then reviewed to see whether there are differences by variant.

| Response rate and demographics
There are no large discrepancies across variants in respondent numbers or completion rates, although respondents allocated to Type 0 (DCE TTO) take longer than the rest (see Table A1). In general, respondents' background characteristics are similar across the eight variants (Table A2).

| Descriptive statistics of the choice tasks by variant
The median time taken for individual choice tasks, by variant, is reported in Table 2. Respondents spend considerably more time on the first task than on the remaining 12 tasks. Task 13 (the logical consistency test) does not seem to take less (or more) time than the preceding tasks. Looking at the averages of the individual median time taken for Tasks 2-12, Type 0 (DCE TTO) takes the most and Type 1 (NITTO) takes the least time. There is little difference by mode amongst these types. Across Types 2 and 3 (NILTTTO and NILTTTO-II), the effect of mode is larger than the effect of type, and the tabulated variants (Mode a) take less time than the visual aid variants (Mode b).
In terms of the distribution of preferences across the two scenarios, A and B, the data in Type 0 (DCE TTO) are balanced evenly across the two scenarios, and few (<1%) respondents choose one or the other scenario throughout. For the remaining six variants, there is a stronger preference for scenario B (shorter survival in full health) over A (longer survival in suboptimal health), and this is observed throughout the 12 tasks. There are no clear associations between these patterns and respondent background characteristics. Over 90% of the respondents "pass" the logical consistency test by correctly choosing scenario B, with similar rates across all variants. Table 2 also reports the respondents' assessment of the choice tasks: respondents who were allocated to DCE TTO tasks (0a and 0b) found the tasks more difficult than the others; and amongst the NI(LT)TTO variants, those allocated to the visual Mode (b) report more difficulty than those allocated to the tabulated Mode (a). Nevertheless, there is little variation in the consistently high proportion of respondents who felt they could answer more tasks.

| The choice model results
The conditional logit regressions by variant are summarised in Table 3 (see Table A3 for full results). The coefficient for duration is significant in all models, and with the expected sign (positive). Variants 0a, 0b, and 1a do not perform well in terms of number of significant coefficients (variant 0b shows a coefficient with the unexpected sign for Level 2 mobility interacted with duration, or "MO2xD"). Variants 0a and 3a performed the best in terms of the number of coefficients in the expected ordering, followed by variants 2a and then 0b, 1a, and 1b. Statistical significance of each interaction term relative to the adjacent level before (within the same dimension) is also reported. Two of those differences are in the wrong order and are statistically significant at the 5% level (asterisk with a dash): between AD5xD and AD4xD (1b); and between UA5xD and UA4xD (2a). Across the dimensions, most variants result in the largest Level 5 decrement in PD and AD; and the smallest Level 5 decrement in dimensions SC, UA, or MO. In this respect, variants 1b and 3b appear to have unusual ordering of dimensions.
Predicted values ($\hat{v}$) for a select number of EQ-5D-5L states are illustrated in Figure 1a using Mulhern et al., 2017). In particular, for variant 2b, the predicted value for state 22222 is worse than being dead (−0.03). Variants 2b and 3b reported higher predicted values for state 33333 than 22222, which is due to the number of coefficients with the unexpected ordering between Levels 2 and 3 in these two variants. The differences between the predicted values for states 22222 and 55555, reported in Table 3 with 95% confidence intervals, are significantly lower than 1 in the two visual NILTTTO variants (2b and 3b). The DCE TTO in Mulhern et al. (2017) has a corresponding value of 1.32.

| Qualitative analysis of free text comments
Overall, 1,530 comments were coded (including multiple categories per comment) into 29 themes (see footnote 5). Table 4 summarises the 10 most frequent themes, within each variant. Example quotes for each theme are presented in Table 5. These themes cover 86-96% of the comments, depending on the variant. Across all variants, the most frequent are positive comments (from 26% for 3b to 40% for 2a). Explanation of how a respondent made their choices was another frequent theme. These often referred to not wanting to be a burden on others. Comments for the NI(LT)TTO variants focused often on the trade-off between quality of life and survival, rather than between health dimensions. For example:

"I was expecting to have to make more difficult choices e.g. choosing between anxiety/depression and pain. In the event I always chose the full health option (even if only 1 year was offered) because I believe full health is priceless" (variant 3b).

4 Full results available from authors on request.
5 Full results available from authors on request.

Between 2% and 14% of comments expressed difficulty with the task, with the DCE TTO variants (0a, 0b) reported as more difficult than the NI(LT)TTO variants. Respondents also found the visual variant (Mode b) consistently more difficult than the corresponding tabulated variant (Mode a). Both these findings are in line with the assessment questions (in Table 2). Participants in DCE TTO variants more often reported being made uncomfortable by the survey. For the other themes, there were no clear patterns by variant.

| DISCUSSION
This study is the first to experiment with a full-scale health state valuation using an innovative valuation method that is a cross between iterative (LT)TTO and DCE TTO. NITTO was developed so that the advantages of TTO and DCE are combined and the disadvantages avoided. Compared to TTO without lead time, NI(LT)TTO is not susceptible to iteration bias, uses the same task for states better and worse than being dead, and does not need to make arbitrary transformations of negative values. Compared to iterative LT-TTO, NI(LT)TTO is not susceptible to iteration bias and does not need to make arbitrary assumptions to address exhaustion of lead time. All of these advantages also apply to DCE TTO, but NILTTTO is less cognitively challenging. However, the presentation of each individual NILTTTO task is more complex compared to a DCE TTO task, because of the lead time.
The study used a single-stage design and a two-stage design for NILTTTO. It also compared two modes of presentation: tabulated, and with a visual aid.
The overall results show that NI(LT)TTO is feasible. Ngene can be used to design NI(LT)TTO surveys; fewer respondents found the tasks difficult compared to DCE TTO; the great majority "pass" the logical consistency test; and data can be modelled to produce interpretable coefficients. However, a closer look suggests a few issues for discussion.

FIGURE 2 Predicted health state value of four states, by variant, by stages of the tasks
A major concern is that the modelled coefficients for the NI(LT)TTO variants predict very low values for the milder states. A negative value for state 22222 (2b, Figure 1, full sample) lacks face validity. It appears that at least some respondents are choosing on the basis of the health states alone, without accounting for the durations. This is in contrast to iterative TTO, where some respondents resist trading off any time in full health for relatively mild states (Robinson, Dolan, & Williams, 1997). In other words, relative to iterative TTO, the way NI(LT)TTO was operationalised in this study seems to draw the respondent's attention away from the sacrifice in duration in full health. The exclusion of non-traders increases the values, but only by 0.14 on average.

Note (Table 4). Percentages represent the share of a given theme within all the comments given in the variant. The columns are ordered by the overall row, pooling across all variants.

TABLE 5 Example quotes for each theme

Positive
"it was a very good survey"
"the survey was easy to understand well laided out a pleasure to complete"
"FAB SURVEY"

Explaining choices
"my choices were based on the fact that, i hate pain and also hate being a burden to anyone including family"
"I tended to opt for the choice which gave me the best healthy years. When I chose the alternative one I was intending to do away with myself before I became too dependent on carers."
"It was an interesting survey. Not a situation you normally think about unless you are in it. I think overall my choices were right. I would rather live a shorter life in full health than a longer one in pain and being dependant on another person for everyday task like washing and dressing."

Made me think
"Made you think about your own life"
"Interesting to try to decide whether a short, healthy life is preferable to a longer, but possibly more demanding, life. A really good survey, has given me much to think about. Well designed and set out."
"Very interesting and thought provoking."

Other
"just be your-self and manage your time and food"

Difficult
"Anxiety and depression are difficult to compare to pain."

Needs more information about life with health state
"There is no info about where you would be living or if you had enough money. You can put up with a lot if you are home with someone you love."

Difficult to imagine
"Difficult to imagine what 'severe' depression or pain would feel like."

Uncomfortable
"I found this entire survey very uncomfortable"

Unrealistic
"Some of the scenarios had totally unrealisitic combinations, e.g. no difficulty walking around but unable to wash and dress yourself. If the scenario does not make logical sense, it is hard bordering on imposible to make a judgement about it."

Relates to previous experience
"a very interesting survey that was relevant to me after receiving an Industrial Accident of crush spinal injuries"

Furthermore, there seems to be no simple pattern across the modes. The predicted values for the select states in Figure 2 (both panels) suggest that there may be complex interactions between the mode, the type, and the state. For example, the two milder states tend to have higher values using Mode (a) with the tabular presentation (but not Type 3, state 33333), whereas the two more severe states tend to have lower values with this mode (but not Type 0). In addition, the differences between 22222 and 55555 (Table 3) are larger using Mode (b) with visual aids for the NITTO variants, but not for DCE TTO. In terms of respondents' feedback, the tabular Mode (a) was found clear by significantly more respondents in Types 2 and 3 (Table 2).
However, it should also be noted that Type 0a, which is identical to the DCE TTO used in online studies elsewhere, has also resulted in lower predicted values for mild states. Therefore, the lower values for mild states cannot be attributed entirely to NI(LT)TTO. This raises questions about the reliability of online DCE TTO, which is beyond the scope of the current study.
A related point is the high proportion of nontraders. Always choosing either A or B throughout NI(LT)TTO tasks means they are not trading between quality of life and survival, and the exercise generates little information from these respondents. The proportion of respondents who always choose B throughout ranges from 7% (2b) to 23% (1b), suggesting a Type × Mode interaction (Table 2). The lower percentage in variant 2b may suggest this variant is relatively immune from nontrading. Indeed, Figure 3 shows that the latent class with the lowest values has the lowest share (dotted lines) in variant 2b.
On the other hand, the predicted values for state 55555 are not particularly low compared to other studies. This leads to a narrower range between the predicted values of 22222 and 55555 (0.48 for variant 2b; Table 3). Analyses of NI(LT)TTO excluding nontraders do not improve the outcomes (e.g., a range of 0.51 for variant 2b; Table 3), suggesting that the phenomenon is not down to a minority of easily identifiable individuals.
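Flagging nontraders from raw choice data is straightforward. The sketch below is a minimal illustration with made-up responses; the function names and data layout (one list of "A"/"B" choices per respondent) are assumptions.

```python
def trading_pattern(choices):
    """Classify a respondent's sequence of 'A'/'B' choices."""
    if all(c == "A" for c in choices):
        return "always A"
    if all(c == "B" for c in choices):
        return "always B"
    return "trader"


def share_always_b(respondents):
    """Proportion of respondents who chose B in every task."""
    flags = [trading_pattern(c) == "always B" for c in respondents]
    return sum(flags) / len(flags)


# Hypothetical responses to the 12 experimental tasks.
respondents = [
    ["B"] * 12,
    ["A", "B"] * 6,
    ["B"] * 11 + ["A"],
    ["B"] * 12,
]
print(share_always_b(respondents))  # 0.5
```

A respondent who switches even once counts as a trader under this definition, so the flag identifies only the most extreme response pattern.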
The analysis of early/middle/late tasks in Figure 2 suggests that the responses to NI(LT)TTO tasks are stable (at the aggregate) as respondents work their way through the 12 tasks. On the other hand, DCE TTO data appear to deteriorate through the stages, which disagrees with Mulhern et al. (2014). This may also question feedback from respondents: similar proportions of respondents (around 85%) reported that they could answer a few more tasks, with no indication that the Type 0 respondents may be more fatigued than the rest.
The results of the latent class analysis suggest that there is heterogeneity in the data. Assuming the randomisation of respondents across survey variants was successful, it is natural to assume that the level of heterogeneity in underlying individual preferences is similar across the variants. If so, not all the variation across the variants in Figure 3 should be interpreted to represent heterogeneity in individual preferences. One interpretation is to attribute at least some of it to a heterogeneity in the ability of respondents to deal with different NI(LT)TTO tasks. Another possibility is that the NI(LT)TTO variants (especially Types 1 and 2) are better at capturing preference heterogeneity than Type 0 (DCE TTO). Either way, judging on the basis of the difference between predicted values for 22222 and 55555 for the class with the highest share (Class 1; solid line), variants 1a (1.23), 1b (1.08), and 3a (1.55) appear similar to DCE TTO (1.59 for 0a; 1.15 for 0b).
The qualitative data reinforce these findings. First, NI(LT)TTO respondents are more likely to choose full health over nonfull health, even when the duration in full health is very short and the nonfull health state is relatively mild. And, second, respondents find NI(LT)TTO easier than DCE TTO. However, the qualitative analysis has limitations. The comments were coded by one person (MK), and there was no formal secondary coding. In addition, content analysis was performed and the comments were counted, which assumes that each comment has the same strength.
To conclude, NI(LT)TTO aims to overcome methodological challenges of iterative (LT)TTO and DCE TTO. The data indicate that NI(LT)TTO is easier than DCE TTO, generates more stable data, and involves less respondent fatigue. However, in its current forms, it clearly has its own challenges. A particular issue is the effect of visual aids used: the respondents in the variant with visual aids found the choice tasks more difficult than the respondents in the variant without visual aids. This seems to suggest substantial scope for improvement in the way the NI(LT)TTO tasks were presented. Further research is needed to better understand the potential interactions across the mode of presentation, the method (or type) of valuation exercise, and the health state being valued in non-iterative tasks, especially when conducted online.

SSI internet panel. Due to an administrative error, informed consent was not obtained from individual respondents specifically for this research. The conduct of the analyses reported in this paper is approved by University of Sheffield Research Ethics Committee. The usual disclaimers apply.

A number of people accessing the survey had to be turned away because the initial set up only allowed up to 1,000 attempts per variant, at which point, survey blocks "ran out." Subsequently, this was corrected to continue accepting respondents and allocating to blocks.

APPENDIX A ADDITIONAL TABLES
b Completion rate = n included/(n accessing − n excluded due to block not available)