Allowing for missing outcome data and incomplete uptake of randomised interventions, with application to an Internet-based alcohol trial

Missing outcome data and incomplete uptake of randomised interventions are common problems, which complicate the analysis and interpretation of randomised controlled trials, and are rarely addressed well in practice. To promote the implementation of recent methodological developments, we describe sequences of randomisation-based analyses that can be used to explore both issues. We illustrate these in an Internet-based trial evaluating the use of a new interactive website for those seeking help to reduce their alcohol consumption, in which the primary outcome was available for less than half of the participants and uptake of the intervention was limited. For missing outcome data, we first employ data on intermediate outcomes and intervention use to make a missing at random assumption more plausible, with analyses based on general estimating equations, mixed models and multiple imputation. We then use data on the ease of obtaining outcome data and sensitivity analyses to explore departures from the missing at random assumption. For incomplete uptake of randomised interventions, we estimate structural mean models by using instrumental variable methods. In the alcohol trial, there is no evidence of benefit unless rather extreme assumptions are made about the missing data nor an important benefit in more extensive users of the intervention. These findings considerably aid the interpretation of the trial's results. More generally, the analyses proposed are applicable to many trials with missing outcome data or incomplete intervention uptake. To facilitate use by others, Stata code is provided for all methods. Copyright © 2011 John Wiley & Sons, Ltd.


Introduction
Missing outcome data and incomplete uptake of trial interventions are common problems in randomised controlled trials. A key consideration in handling both issues is the intention-to-treat (ITT) principle [1], which states that all individuals randomised in a clinical trial should be included in the analysis, in the groups to which they were randomised, regardless of any departures from randomised treatment. Following this principle preserves the benefit of randomisation, that the treatment groups cannot differ systematically on any factors except those assigned in the trial, and avoids selection bias. However, it is not universally agreed how the ITT principle applies when some outcomes are missing [2]. Further, the ITT principle does not tell us how to estimate the effects that might have been observed with better uptake of trial interventions.
Missing outcome data are problematic because they cause a loss of power and can lead to biased estimates of intervention effects. Once data are missing, the loss of power cannot be reversed, but it can be minimised by appropriate analysis choices, in particular by including all observed data in the analysis [3]. Estimates of intervention effects are typically biased if the analysis makes the wrong assumption about the missing data. However, any analysis with missing data must make partly or completely untestable assumptions, so we can rarely be sure that we have the correct analysis. For this reason, sensitivity analysis is recommended [4][5][6].
The assumptions of many (but not all) statistical methods for handling missing data can be expressed using the framework of Little and Rubin [7]. Data are missing completely at random (MCAR) if the probability of data being missing does not depend on any missing or observed values. Data are missing at random (MAR) if the probability of a particular set of values being missing for an individual does not depend on the values themselves, conditional on the observed values of other variables. Otherwise, data are missing not at random (MNAR).
Incomplete uptake of trial interventions often means that randomised groups have more similar experience than the investigators had intended, which usually causes the difference in outcomes to be smaller than it would have been with better uptake [8]. However, bias in the estimated intervention effect is not always towards zero: incomplete uptake in equivalence or non-inferiority trials, or in trials where non-trial interventions are available, can inflate differences between randomised groups [9]. Estimating the effect of allocating an intervention does not require adjustment for incomplete uptake, in contrast to estimating the effect of a particular level of intervention uptake. The latter is commonly carried out by per-protocol analysis, which excludes data observed when participants had poor intervention uptake. However, per-protocol analysis is undesirable because it is subject to selection bias. Randomisationrespecting alternatives achieve the same aim by using only comparisons of groups as randomised [8]. One such method is principal stratification [10], which leads to estimation of the complier-average causal effect (CACE) [11] in problems where intervention uptake is dichotomous. An alternative, suitable for quantitative intervention uptake, is the structural mean model (SMM) [12].
This paper aims to promote the implementation of recent methodological developments by describing a sequence of analyses that explores both issues and to illustrate the methods using data from an Internetbased trial. This trial is a good example because the issues are particularly acute, but they arise in a wide range of other trials. The Internet-based trial is described in Section 2. Methods for tackling missing data are described in Section 3, with results in Section 4. Methods for tackling incomplete uptake of interventions are described in Section 5, with results in Section 6. We conclude with a discussion in Section 7.

The Down Your Drink trial
Hazardous drinking in the general population is an important public health problem [13]. Brief interventions are effective [14] but hard to implement. The Internet is increasingly used to deliver behaviour change interventions [15], and a new 'Down Your Drink' (DYD) website was developed, building on psychological theories and aiming to engage users by providing interactive tools [16].
The DYD trial was a randomised evaluation of the DYD website compared with a non-interactive control website providing information only [17]. All stages of the trial-recruitment, randomisation, intervention and data collection-were conducted online. This presented a number of challenges [18].
The key challenge relevant to the present paper was whether the numbers of participants using the intervention website and providing follow-up data would be sufficient.
The primary trial outcome was alcohol consumption in the previous week, which was recorded by the TOT-AL, a specially developed online questionnaire [19]. When an outcome assessment was due, participants received an email with a link to the trial website where they could complete the outcome questionnaires. Alcohol consumption was transformed in all analyses to log(number of units in the last week plus 1). This paper uses data, summarised in Table I, from the pilot trial, which recruited 3746 individuals from 16 February to 16 October 2007. Outcome data were collected at 1 and 3 months; we focus on estimating the intervention effect at 3 months. The correlation between baseline and 3-month alcohol consumption was 0.41 (0.45 for baseline and 1 month; 0.53 for 1 month and 3 months).
Although baseline data were complete, poor follow-up response rates were anticipated because there was no personal contact with participants. To increase response rates, all participants who did not complete the outcome questionnaires within 7 days of the first email invitation received second and (if necessary) third invitations at weekly intervals. A fourth email inviting participants to provide their outcome data directly by email to the investigators yielded no further responses. Offline follow-up was attempted for users who had provided a telephone number or address, but was not successful [18]. Incentives were also trialled [20]. Participants were additionally randomised to complete only one of four secondary outcome measures in order to reduce the assessment burden and improve response rates Copyright [21]. The number of emails sent to each participant is summarised in Table II and is used in the analysis in Section 4. In the intervention arm, the mean log(TOT-AL C 1) is larger in later respondents than earlier respondents, suggesting a MNAR mechanism, with non-respondents perhaps having an even higher mean log(TOT-AL C 1). Similarly, low use of the website was a concern. It is hard to define and measure website use [22]; in particular, although each page download was recorded, the length of time that participants spent actually using the website is unknown. We summarised website use by the number of login sessions and the total number of pages downloaded in the first month; in calculating the latter, multiple downloads of the same page were counted only if they occurred in different login sessions. Individuals were automatically logged in to the intervention or control websites after randomisation, but a few who immediately left the trial website had no logins.
The main findings of the trial were that alcohol consumption in responders dropped substantially from baseline to 1 month, and again slightly from 1 to 3 months, but that the drops were very similar across randomised groups [23]. The ratio of (geometric mean) 3-month alcohol consumption in the intervention group compared with the control group was 1.04 (95% confidence interval 0.94 to 1.16). However, missing data were substantial and more common in the intervention group (Table I). Use of the intervention website was greater than use of the control website, but the majority of participants in both arms had only one login session.
The outstanding questions that this paper aims to answer are whether the results are robust to different assumptions about the missing data and whether interpretation is affected by incomplete use of the website.

Missing outcome data: methods
We propose a modelling strategy that starts with simple data on baseline and outcome and then progressively adds in intermediate outcomes, website use and ease-of-contact data.
For the ith participant, let´i denote their randomised group, x i a vector of baseline covariates (assumed complete), y i1 , y i2 the outcomes at two follow-up times and r i1 , r i2 whether each outcome was observed (1) or missing (0). Our methods generalise easily to more than two follow-up times. If the data were complete, then an adjusted analysis for the outcome at follow-up time 2 would estimateˇin the model An unadjusted analysis is the same without the 0 x i term.

Complete cases
A first analysis fits model (1) in the subset with r i2 D 1, the 'complete cases'. This analysis is inefficient because it does not make use of individuals with y i2 missing but y i1 observed. It is valid if the model is correctly specified and the data are 'covariate-dependent missing completely at random' [24]: that is, if the missing data mechanism depends only on the baseline covariates included in the model. If the model is incorrectly specified (e.g. if it should contain a nonlinear function of x i ), then the analysis is in general valid only if the data are MCAR within randomised groups [25].

Using repeated outcome measures
We now consider three methods that jointly model both y i1 and y i2 . These are valid if the data .y i1 ; y i2 / are MAR given .´i ; x i /: in particular, for participants with y i1 observed, dropout at follow-up time 2 is now allowed to depend on y i1 . A generalised estimating equations (GEE) approach [26] fits the model: Normality is not assumed, but the residual variance 2 is assumed to be equal at the two times. Estimation uses the standard estimating equations [26]; the parameter of main interest isˇ2. If the model is misspecified (in particular, if the residual variance is different at the two times), then valid standard errors can still be obtained by the robust (sandwich) method. With incomplete data, point estimates for a correctly specified model are valid if the data are MAR, whereas point estimates for an incorrectly specified model are valid if the data are MCAR; weighted estimating equations can relax the latter condition to MAR [27]. We allow the coefficients in the two components of (2) to be different: this amounts to allowing interactions between time and the baseline variables x and´. In general, it is best to use an unstructured working correlation matrix; with only two time points, this is the same as an exchangeable working correlation matrix. Copyright  A mixed models approach [28] modifies the model defined by (2) and (3) by adding the distributional assumption and replacing (3) with an unconstrained variance-covariance matrix †. The model is estimated using restricted maximum likelihood. It may be appropriate to allow † to differ by randomised group.
In multiple imputation (MI), several completed data sets are produced by drawing the missing values from their posterior predictive distribution thus acknowledging the uncertainty due to missing data under a MAR assumption [29,30]. This can be carried out using model (4). It is often sensible to draw imputations separately for each trial arm, because interactions between randomised group and baseline covariates may be of interest [31]. It is sometimes considered that MI offers a way to include all randomised individuals in the analysis (e.g. [32]). However, if the imputation model is the same as the analysis model, then MI is expected to give approximately the same results as a mixed model analysis [33].

Using compliance
One way to make the MAR assumption more plausible is to introduce other post-randomisation variables v i into the analysis. Specifically, we now assume that .y i1 ; y i2 ; v i / are MAR given .´i ; so that observed values of v i are allowed to explain missingness of .y i1 ; y i2 /. Here, we take v i as the amount of intervention received (compliance), because this is likely to predict both outcomes .y i1 ; y i2 / and responses .r i1 ; r i2 /, but v i could also include trial outcomes that are more observed than y i2 . In the mixed model approach, v i can be included using the extended model [5] where i1 and i2 are still as defined in (2) and i3 D˛3 Cˇ3´i C 0 3 x i . † is modelled completely flexibly. The GEE approach is similar but without the normality assumption; † is modelled with an unstructured working correlation and equal variances, so it is advisable to scale the compliance variables to have variances similar to the outcome variables. Including v i is easiest under the MI approach, because it can simply be included in the imputation model and excluded from the analysis model.
In many trials, compliance has a very different distribution across the two arms and may have different meaning. In this case, it is important to allow the association between v i and .y i1 ; y i2 / to vary by randomised group. This is most conveniently carried out in MI, by imputing separately by arm; it cannot be carried out using standard GEE implementations, but a mixed model could allow † in (5) to depend on´i .

Sensitivity analyses
The aforementioned models attempt to make a MAR assumption more plausible by including more data in the analysis [34]. However, MAR often remains at least questionable, if not implausible [35]. We now consider sensitivity analyses to departures from MAR. Following Kenward et al. [4], we embed the MAR model in a wider family of MNAR models indexed by one or more 'informative missing parameters' that express the magnitude of departures from MAR. We then use subject-matter knowledge to specify possible values of the informative missing parameters and re-estimate the intervention effect in each case.
We use a pattern-mixture model [36] that extends Equation (1) by allowing a term ı Y that controls departures from MAR: The regression parameters subscripted CC can be estimated by fitting Equation (1) to the complete cases (r i2 D 1), but ı Y is not identified by the data. An important extension allows the informative missing parameter ı Y to differ between randomised groups: This model is plausible because, for example, missing data may well be more informative among individuals who have been encouraged to change their behaviour than among controls, and is important because treatment effects are most affected when departures from MAR behave differently in the two arms [37]. Parameters .ı Y 0 ; ı Y 1 / are not identified by the data: ı Y 0 is the mean difference between unobserved and observed outcomes in the control arm, adjusted for x, and ı Y 1 is the corresponding difference in the intervention arm.
This model has previously been used with an informative prior distribution for .ı Y 0 ; ı Y 1 / that was elicited from investigators [37]. In the present paper, investigators' views are used to define plausible values of .ı Y 0 ; ı Y 1 / for sensitivity analysis, rather than tackling a fully Bayesian analysis. It is useful to consider three sensitivity analyses: first, to values of It is easy to estimate Â C C and Â ADD . Finally, the estimated parameters are independent, so we can estimate var

Using the number of attempts
Instead of specifying the informative missing parameter(s) based on subject-matter knowledge, it may be possible to estimate them using data on the number of attempts made to observe outcome y i2 . Let r i2k be the outcome of the kth attempt to observe the primary outcome y i2 , where r i2k D 1 indicates that the outcome was observed, r i2k D 0 indicates that it was not observed and r i2k D indicates that the kth attempt was not made (either because a previous attempt was successful or because the participant had refused or withdrawn from the trial). The association between r i2k and y i2 can be identified using Alho's model, which assumes that this association is the same for all k [38]: Here, we allow the probability of responding to vary between attempts, but we assume that the association between fully observed covariates and responding is the same at all attempts, although the latter assumption could easily be relaxed. ı R is an informative missing parameter, and ı R D 0 corresponds to MAR. Estimation of model (8) uses data on individuals with observed outcomes together with the numbers and baseline covariates of individuals with unobserved outcomes. The model may be fitted using a conditional likelihood supplemented by a set of estimating equations [38], but this algorithm is not guaranteed to converge. Alternative estimation methods are based on the full likelihood for model (8) jointly with model (4). A Bayesian approach has been used [39], and a likelihood-based approach is also possible; the likelihood involves integrating out the unobserved values of y i2 . Fitting the model by using the full likelihood directly estimates the parameters of (4); an alternative is to use the inverse of the response probability as a weight for analysis of complete cases [38], but care must be taken to obtain standard errors that allow for the often large uncertainty in the weights [39].
As in the previous section, an important extension to model (8) allows the informative missing parameter ı R to differ between randomised groups: where ı Ŕ i y i2 can also be written as ı R 1´i y i2 C ı R 0 .1 ´i /y i2 .

Implementation
In the DYD trial,´i D 1 for individuals randomised to the interactive website and 0 for the control website, and .y i1 ; y i2 / are the 1-month and 3-month alcohol consumption outcomes (log(TOT-AL C 1)). All analyses were performed both unadjusted and adjusted for x i , which comprises the baseline variables  Table I; in general, we prefer the analysis adjusted for baseline covariates, especially the baseline value of the outcome. For GEEs, robust standard errors were used. For mixed models, the unstructured variance-covariance matrix was allowed to differ between randomised groups, although results with a common variance-covariance matrix (not shown) were very similar. Multiple imputations were drawn separately for each arm by using the chained equations approach [40] implemented in Stata [41,42]. For method MI1, the imputation model for the outcome at each time was a linear regression including the outcome at the other time and the baseline variables. For method MI2, the imputation models additionally included the log of one plus the numbers of pages hit at 1 and 3 months and the log of one plus the numbers of login sessions at 1 and 3 months. The distribution of the incomplete variables was not fully Normal, even after log transformation, so predictive mean matching was used to improve the imputations [31]. In each case, model (1) was fitted to each of 50 imputed data sets, and the results were combined using Rubin's rules [29]. Covariates were used in the imputation model even when unadjusted analyses were performed.
For sensitivity analyses, the views of five DYD investigators were quantified before the trial results were known, and these views were used to choose values of the informative missing parameters .ı Y 0 ; ı Y 1 / in Equation (7). When the informative missing parameters were assumed the same in both arms, the investigators believed that the mean of the unobserved responses for alcohol consumption at 3 months could be as much as 75% more or 50% less than the mean of the observed responses: these suggest the sensitivity analyses ı Y 0 D ı Y 1 D log 1:75 and ı Y 0 D ı Y 1 D log 0:5. When the data were assumed to be informatively missing only in the control arm, the investigators believed that the mean of the unobserved responses could be as much as 50% more or 50% less than the mean of the observed responses: these suggest the sensitivity analyses .ı Y 0 ; ı Y 1 / D .log 1:5; 0/ and .log 0:5; 0/. We also choose the corresponding cases with the data informatively missing only in the intervention arm: .ı Y 0 ; ı Y 1 / D .0; log 1:5/ and .0; log 0:5/. In addition to these rather extreme sensitivity analyses, we also used more moderate sensitivity analyses with .ı Y 0 ; ı Y 1 / D .log 1:5; log 1:5/, .log 1:25; 0/ and .0; log 1:25/. Analysis of number of attempts used the number of email reminders that were sent to each participant. The conditional likelihood algorithm diverged in some cases, so we used the maximum likelihood approach. 'Alho 1' and 'Alho 2' refer to models (8) and (9), respectively.
Stata code for these analyses is given in Appendix A.1.

Results
We summarise the results in Table III and display the covariate-adjusted results in Figure 1. The intervention effect is expressed as the ratio of the geometric mean alcohol consumption (plus 1 unit/week) at 3 months in the intervention group to the corresponding geometric mean in the control group. In the following text, we interpret these figures as percentage increases or decreases. All methods based on MAR, as well as complete-cases analysis, give very similar results: the point estimate represents a non-significant increase of between 4% and 12% due to the intervention, with a 95% confidence interval that does not extend below a 6% reduction.
Sensitivity analyses show that the estimated intervention effect is not very sensitive to departures from MAR when the informative missing parameter is assumed to be equal across randomised groups, but is very sensitive to departures from MAR that occur differently in the randomised groups. Moderate sensitivity analyses (indicated by * in Table III and Figure 1) yield estimates ranging from an 8% reduction to a 23% increase in alcohol consumption, whereas more extreme sensitivity analyses range from a 32% reduction to a 56% increase. This suggests that the trial's results are only robust to departures from MAR that are similar in both randomised groups.
Using the number of email reminders and the MNAR models (8) and (9) gives the estimates of the informative missing parameters in Table IV. For model (8), where the informative missing parameter is assumed equal across the two groups, the negative estimate of the informative missing parameter ı R suggests that heavier drinkers are more likely to be non-responders. However, the informative missing parameter is not significantly different from zero so that the data are consistent with a MAR assumption. For model (9), where the informative missing parameter is allowed to differ between groups, both estimates are again negative and that for the intervention group is larger in magnitude, suggesting that the tendency for heavier drinkers to be non-responders may be greater in the intervention group. Although the informative missing parameter in the intervention group is significantly different from zero (P D 0:03), a test for a difference between the arm-specific informative missing parameters is not significant (P D 0:26) nor is a test on 2 degrees of freedom for departure from MAR (P D 0:09).    (8)). Alho 2: MNAR, common ı R across attempts (model (9)).
These results do not provide good evidence against a MAR assumption, but they change the estimated intervention effects in Table III when the informative missing parameter is allowed to differ across randomised groups as in model (9). This MNAR analysis indicates a much larger increase due to intervention in the unadjusted analysis, and much wider confidence intervals in both unadjusted and adjusted analyses. We attribute these findings to the great sensitivity of estimated intervention effects to differences in informative missing parameter ı R between randomised groups, along with the difficulty of estimating this parameter.

Incomplete uptake of interventions: methods
Intervention receipt in some randomised trials can be summarised as a binary variable [43], whereas other trials have complex intervention receipt that may be summarised as one or more quantitative variables. We present a SMM that is applicable to both binary and quantitative cases, provided that intervention receipt is univariate.

Structural mean model
Structural mean models describe the relationship between the observed data and the counterfactual data that would have been observed with a different random allocation [12,44]. For the ith individual, we define y i .1/ as the outcome that would be observed if they were randomised to intervention and y i .0/ as the corresponding outcome if they were randomised to control. Exactly one of these potential outcomes is observed for each individual. Define d i .1/ as the ith individual's compliance (binary or quantitative) with the intervention, if they were allocated to intervention. We initially ignore compliance with the control. We now assume that the causal effect of the intervention is proportional to the compliance. This implies that individuals who would be complete non-compliers if allocated to intervention have no effect of allocation, the 'exclusion restriction' assumption. We then have the SMM where e i is a zero-mean error term whose presence allows treatment effects to vary between individuals. Model (10) implies that Estimation proceeds by noting that randomised group´i is independent of the potential outcomes y i .1/, y i .0/ and d i .1/, so each expectation in (11) can be computed in one arm of the trial. This leads to the estimating equation 1/ or 0 for individuals randomised to intervention or control, respectively.
Baseline covariates x i that are uncorrelated with´i and e i may be used in two ways to improve the efficiency of the estimation procedure. First, we can condition on x i in (11) and model E OEy i .0/jx i D C x i , yielding the alternative estimating equation to which we add standard estimating equations for˛and [45], giving Equation (12) is easy to estimate because it is a standard instrumental variables (IV) model [46], in which´i is the instrument, d i is the 'endogenous' variable and x i is the 'exogenous' variable. A second approach, not adopted here, makes use of baseline covariates w i that predict d i in the intervention group. In this case, precision can be gained by using the interactions´i w i as additional instruments, but at the cost of further assumptions [47]. Again, this can be fitted using standard IV software.

Interpretation
Interpreting the estimated parameter is easiest when compliance is binary. In the aforementioned model, this means that d i .1/ is 0 or 1. In the statistical literature, the two groups formed are often known as 'compliers' (d i .1/ D 1) and 'non-compliers' (d i .1/ D 0), referring to an individual's compliance status if they were randomised to the intervention. In this setting, the 'exclusion restriction' assumption, which identifies the model, states that randomised allocation has no effect on non-compliers. The parameter can be interpreted without further assumptions as the CACE, the average of y i .1/ y i .0/ over the subgroup with d i .1/ D 1 [48].
If compliance is not naturally binary, it is tempting to dichotomise it. Clearly, a good definition of 'compliers' is needed. It is not necessary to assume that compliers all receive the same benefit of intervention, because the CACE represents an average overall compliers. However, it is essential to assume that non-compliers receive no benefit from intervention. It is therefore typically necessary to use a restrictive definition of non-compliance, classing any individual whose moderate compliance could have brought him or her benefit with the compliers.
An alternative approach is to express compliance d i .1/ quantitatively. This typically makes the exclusion restriction more plausible, because the zero level can be chosen to represent no use of the intervention. However, without further assumptions, no simple interpretation of generalises the CACE. The further assumption usually made is E OEe i jd i D 0 so that model (10) is correctly specified. In this case, d can be interpreted as the average causal effect of allocation to intervention in the subgroup who would comply to an extent d : that is,

Using control group compliance
When the control group also receives some intervention, such as a placebo or a standard treatment, control-group compliance d i .0/ is also available. The SMM could then be extended as which allows for a causal effect of the control intervention. The original approach to this problem assumed that d i .1/ is a monotonic function of d i .0/ [49], but the method is very sensitive to departures from this assumption [50]. More recent causal estimation methods for models such as (13) use either an assumption that d i .0/ Ä d i .1/ for all i and Bayesian modelling with slightly informative priors [51], or informative priors for one of the treatment effects [52], or covariates that predict d i .0/ and d i .1/ differently but that do not modify the causal effect of treatment [45]. Because of the complexities of all these approaches, we would prefer to ignore d i .0/ when it is plausible that the control intervention has no causal effect (i.e. that 0 D 0 in (13)).

Missing data
Missing outcome data complicate estimation of the IV model. Standard implementations of IV are restricted to using complete cases only and are thus valid only under MCAR. Three approaches can be used to make them valid under MAR. First, inverse probability weighting (IPW) can be used [53,54]. Models are constructed for p.r i jx i ;´i D 1; d i / and p.r i jx i ;´i D 1/, and the 'stabilised weights' [55]  Second, the 'adjusted treatment received' (ATR) method [56,57] is equivalent to IV regression for complete data and is valid when outcomes are MAR [54]. In this method, a linear regression model is first constructed for actual treatment receipt on randomised group and covariates (using all observations including those with missing y i ), and the residuals are estimated. The causal effect of actual treatment receipt is then estimated by linear regression of y i on actual treatment receipt, adjusting for Copyright  the previously estimated residuals and the covariates. The standard errors from this second stage may be underestimated because they ignore uncertainty in the residuals [54].
Third, MI can be used.

Implementation
The DYD trial has complex intervention receipt: individuals could use the website on different numbers of occasions, for different lengths of time, and in different ways. Any attempt to estimate the effect of intervention receipt in such data relies on a plausible causal model describing how intervention receipt may affect outcomes. We describe one approach with dichotomised compliance and one with quantitative compliance.
For dichotomised compliance, a non-zero cut-off was chosen, because almost all randomised individuals had at least one login (Table I). Section 5.2 argues for a relatively low cut-off, and we defined compliers as individuals who logged in more than once or accessed more than 10 pages of the website within the first 1 month from randomisation. Our analyses therefore rest on the assumption that an individual who accessed fewer than 10 pages on only one occasion received no benefit, and they estimate the average benefit of the intervention website over a wide range of use.
For quantitative compliance, we defined d i .1/ in Equation (10) as the number of pages downloaded over the first month of the trial, but with an upper limit of 300 pages because we did not believe that use above this level would have further benefit. We did not use website uptake in the control group in the model, because the control website is unlikely to be effective. For the MI approach, we used the imputations constructed using compliance variables as in MI2 of Section 4.
Stata code for these analyses is given in Appendix A.2.

Results
Of 1880 individuals allocated to intervention, 1461 (78%) were classed as compliers. As a result, the estimated CACE (Table V) was not very different from the ITT MAR estimate (Table III). IPW and ATR methods behaved very similarly. MI gave somewhat different results: although this is unexpected, it is consistent with the differences between MI1 and MI2 in Table III.
Estimates of the causal effect per 100 pages downloaded were somewhat larger than the ITT estimates. This appears to be because the mean number of pages downloaded in the intervention group was 65, so the estimated effect of downloading 100 pages was approximately one and a half (100/65) times the ITT effect. The confidence interval for the intervention effect in these analyses does not extend below an 11% reduction.

Conclusions for the Down Your Drink trial
A concern with the DYD trial, and many other online trials, is that high rates of non-response and low intervention uptake makes it hard to draw conclusions about the intervention's effectiveness. Our analyses in this paper show that the conclusions were not substantially affected under a range of assumptions about the missing data mechanism, except when we assumed that the informative missing parameter differed between randomised groups. To the extent that the latter assumption may be implausible, our results appear reasonably robust. Similarly, conclusions were not substantially affected when we used causal models to consider the impact of downloading 100 website pages. The latter conclusion depends on a judgement that 100 website pages was a reasonable target for moderately conscientious website use.
Our results therefore provide some support for the use of online trials in general, with two cautions: it is essential to consider the informative missingness parameters differing between randomised groups and to consider how much the observed intervention uptake falls short of what might be hoped for. Analyses allowing for non-response and low intervention uptake are best specified in advance and included in the analysis plan.

Methodological conclusions
These methods are of potential use in all trials and not just online trials. When rates of missing data are low, sensitivity analysis may be enough to demonstrate that missing data are not a problem. In other cases, including intermediate or other outcomes and/or compliance variables in MAR analyses is a useful strategy, although treatment effect estimates may only be changed when the auxiliary variables are strongly associated with outcome [58]. Sensitivity analyses are always helpful but depend on expert consideration of the plausible degree of departure from MAR. Data on number of attempts to obtain data, or more generally ease of contact, are often recorded and should be more widely used in analysis: results from the Alho model (8) or (9) can be a useful way to allow for extra uncertainty due to the possibility of MNAR data without the need to rely on expert opinion. With pressure on journal space, it may be convenient for all these alternative analyses to be included in web appendices. In the primary publication of the DYD trial [23], which was based on more data than those used here, the primary analysis was the adjusted complete-cases analysis, and web appendices presented alternative analyses for the missing data-a partial last observation carried forward (LOCF; see in the next section), MI and sensitivity analyses using (7)-and analyses adjusting for non-compliance.
A particularly relevant question in a trial with a 'negative' result is whether this negative result is attributable to incomplete intervention uptake. In this context, it is important to formulate the causal question carefully, defining a parameter such as the CACE or the causal effect of a particular amount of intervention, and then consider the limits of the confidence interval for the parameter.

Other methods
We have not reported here an analysis using LOCF, one of the most widely used techniques [3], which simply replaces missing outcomes with the last observed value. LOCF rests on an assumption that outcomes do not change (on average in each arm) after participants drop out of the study, which is often implausible. In the DYD trial, with its large change in outcome after baseline, LOCF would yield implausibly different imputations for individuals with no post-baseline measurement and those with a 1-month measurement. Because LOCF is widely used, the primary DYD trial publication [23] reported a partial LOCF analysis that carried only post-baseline measurements forward. Unfortunately, usual justifications for LOCF rest not on the plausibility of its assumption but on approximate constancy of observed outcomes, or on an appeal to the ITT principle, or on conservatism: none of these are valid [2].
Our sensitivity analyses were based on data from baseline and follow-up time 2 only. Basing the sensitivity analyses on the mixed model (4) might be preferable but is technically more complicated and is unlikely to make much difference in view of the small differences between complete-cases and MARbased analyses (Table III). The Alho method could also be extended to allow for the repeated measures: for example, better estimating the informative missing parameter ı R by assuming it to be constant across follow-up times. Another possible assumption about the missing data is that they are 'latent ignorable', meaning that they would be MAR if the potential compliance d i .1/ were observed for everyone [59].

Extensions
For binary outcomes, mixed models become more complex, and GEE or MI methods might be preferred. The MNAR methods can be applied equally well, and the exp .ı/ parameters can be interpreted Copyright  as informatively missing odds ratios [60,61]. The SMMs described may still be used to estimate causal risk differences, but if causal odds ratios are wanted then generalised SMMs are needed [62].
Methods used for survival outcomes are typically very different from those that we have described. Here, missing data take the form of censoring, and the non-informative censoring assumption takes the place of the MAR assumption: departures from the non-informative censoring assumption are rarely considered but should be. SMMs are not suitable for survival outcomes, but the structural accelerated failure time model is a general alternative for handling incomplete intervention uptake [63,64], and hazard-based methods are available for handling all-or-nothing uptake [65].
In the DYD trial, all baseline covariates were complete. Incomplete baseline covariates are simply and efficiently handled by single imputation methods such as imputing the overall or centre-specific mean of the covariate [31]. Such simple methods would be inappropriate for missing outcomes: they are appropriate for missing baselines because baseline covariates are independent of randomised group, and adjustment for baseline covariates is not required for unbiased estimation [66].
Stata do-files to implement the analyses presented in this paper are given in Appendix A.

A.1. Missing data
The succeeding code uses four user-written commands. ice [67] and mim [68] implement MI and are available from the Statistical Software Components (SSC) archive. alho and rctmiss implement the 'number of attempts' model and sensitivity analyses, respectively, and are available from the first author's website by typing net from http://www.mrc-bsu.cam.ac.uk/IW_Stata/ in Stata. The code shows adjusted analyses; for unadjusted analyses, delete the global xvars and global time_xvars commands. The data are assumed to be in a file DYDwide.dta with one record per randomised individual.

A.2. Allowing for incomplete intervention uptake
The following code is for quantitative compliance: results with binary compliance are obtained by redefining treat.