Allowing for uncertainty due to missing and LOCF imputed outcomes in meta‐analysis

The use of the last observation carried forward (LOCF) method for imputing missing outcome data in randomized clinical trials has been much criticized and its shortcomings are well understood. However, only recently have published studies widely started using more appropriate imputation methods. Consequently, meta‐analyses often include several studies reporting their results according to LOCF. The results from such meta‐analyses are potentially biased and overprecise. We develop methods for estimating summary treatment effects for continuous outcomes in the presence of both missing and LOCF‐imputed outcome data. Our target is the treatment effect if complete follow‐up was obtained even if some participants drop out from the protocol treatment. We extend a previously developed meta‐analysis model, which accounts for the uncertainty due to missing outcome data via an informative missingness parameter. The extended model includes an extra parameter that reflects the level of prior confidence in the appropriateness of the LOCF imputation scheme. Neither parameter can be informed by the data and we resort to expert opinion and sensitivity analysis. We illustrate the methodology using two meta‐analyses of pharmacological interventions for depression.


INTRODUCTION
Missing data in clinical trials pervade all fields of medicine and may compromise the validity of inferences even from well-designed randomized controlled trials. 1 Trials usually follow patients over time and take measurements at several time points. Many participants drop out from follow-up before the end of the study but have their outcomes reported at intermediate time points. Our target is the treatment effect if complete follow-up was obtained, even if some participants discontinue the protocol treatment. Of course, discontinuing treatment is indicative of how effective and acceptable a treatment is, but ideally, the target in a randomized controlled trial is to take measurements and calculate an effect size at the end of the trial in order to abide by the intention-to-treat principle.
To achieve this, some imputation method is needed. A standard methodology in many clinical fields for imputing incomplete longitudinal data sets is the last observation carried forward (LOCF) method: The missing outcome is replaced by the last observed value. Missing data are particularly evident in mental health trials where dropout rates may exceed 50% 2 and the LOCF method is commonly applied. 3 An LOCF analysis is valid for estimating the treatment effect only under very restrictive and usually unrealistic assumptions. In medical fields where disease progression is a definite feature and patients are expected to deteriorate over time, eg, in dementia, 4 assuming no progression after dropout is expected to give biased results. In such a case, an LOCF analysis would give overly optimistic results for both groups; if participants in the treatment group leave earlier (eg, due to adverse events) or more frequently, then results would favor the treatment group. However, in depression and psychosis trials, we expect participants to improve over time, and carrying forward an early measurement may give conservative results if participants in the experimental treatment drop out earlier because of adverse events.
Establishing a treatment effect based on an analysis that is clearly conservative represents compelling evidence of efficacy from a regulatory perspective. 1 However, LOCF may induce bias in unpredictable ways, so treatment effects estimated using these assumptions are not necessarily conservative. LOCF (and other single-imputation methods) does not propagate imputation uncertainty and leads to an underestimation of standard errors, which, in turn, increases the likelihood of finding a false positive result.
Although more appropriate methods have been proposed and adopted in new trials, older trials included in systematic reviews and meta-analyses often use LOCF. 5 A recent study showed that more than 75% of meta-analyses in mental health contained studies that had LOCF imputed outcomes. 3 The availability of individual participant data is rare and, as a result, meta-analyses are not able to use appropriate imputation methods (eg, multiple imputation, likelihood methods) within each study. In this paper we focus on meta-analysis with aggregate data (AD) and provide methods to reanalyze any study in an AD meta-analysis whose reporting used the most common single-imputation methods. In these reanalyses, we make a range of assumptions about the missing data. We use the term 'LOCF analysis' to refer to a synthesis of the reported outcome data from completers with LOCF-imputed outcome values.
If studies take one single measurement at the end of the trial, then the complete case analysis would be valid under the missing-at-random (MAR) assumption: Missingness is conditionally independent of the outcome given any predictor. In either case (multiple or one final measurement), the probability of missingness may depend on unobserved characteristics such as the value of the missing outcome. In this case, data are missing not at random (MNAR). Patients in the treatment group may leave earlier because of adverse events, or patients randomized to a placebo group or a suboptimal treatment may leave earlier because of improvement and an LOCF analysis would give a biased treatment effect.
Methods to account for missing outcome data in AD meta-analysis have been previously developed. 6 They are primarily based on informative missingness parameters; parameters that relate the observed outcomes in completers to the assumed missing outcomes. White et al presented a pattern mixture model for handling dichotomous missing outcomes in which the degree of departure from the MAR assumption is quantified by the informative missingness odds ratio; this is defined as the ratio of the odds of the outcome in the missing participants to the odds of the outcome in the completers. 7,8 Mavridis et al extended the approach to missing continuous outcomes and to network meta-analysis by quantifying the degree of departure from a MAR assumption using various informative missingness parameters such as an informative missingness difference of means (IMDoM, the difference in mean value of outcome in the missing participants and completers). 9 Little work has been done, however, to account for uncertainty in data that have been imputed using LOCF. Dimitrakopoulou et al considered a sensitivity analysis by decomposing the probability of an unobserved successful outcome assuming various prior distributions for the sensitivity and specificity of the LOCF imputation. 10 Here, we extend our previous work on AD meta-analysis with continuous outcomes to account not only for missing outcome data but also for outcomes that have been imputed using LOCF.
We propose a pattern-mixture model that allows us not only to consider LOCF as a special case but also to assume LOCF with some uncertainty introduced for the imputed values. Hence, we may get LOCF estimates with increased uncertainty reflecting the facts that we made an assumption that may not be true and that imputed data should not be treated as if they had been actually observed. The suggested model uses expert opinion to correct for bias. If expert opinion is not available, we can employ a sensitivity analysis to explore how robust results are to departures from the LOCF assumptions. The methods potentially work for the most common single-imputation method, but we describe them for LOCF as this is the commonest and we describe other single-imputation methods in the discussion. This paper is organized as follows. In Section 2, we present two data sets from a large network of depression trials. 11 In Section 3, we define the model. In Section 4, we discuss how we can inform the informative missingness parameters of the model, and in Section 5, we illustrate the methodology using the data sets presented in Section 2. We conclude with a discussion in Section 6.

MOTIVATING EXAMPLES
We use two data sets to illustrate the suggested methodology. The first data set (Table 1) consists of 14 studies comparing fluoxetine and venlafaxine, whereas the second one (Table 2) consists of 11 studies comparing reboxetine with placebo. Both comparisons are taken from a large network of depression trials. 11 In both data sets, the outcome is the reduction in symptoms of depression on the Hamilton depression scale. Figure 1 shows the proportions of participants who are imputed using LOCF, drop out from follow-up because of side effects, and have missing outcomes, for fluoxetine and venlafaxine (graphs on top row) and for reboxetine and placebo (graphs on bottom row). Dropout for side effects and dropout before providing any measurement (missing outcomes) are more likely in the experimental groups (venlafaxine and reboxetine). The overall LOCF imputation rate is more balanced. We conjecture that participants randomized to the experimental groups tend to leave the studies early because of side effects, whereas those randomized to the control groups tend to leave somewhat later because of lack of efficacy. The imbalances in missingness and imputation rates raise concerns that data are likely to be MNAR and that study effects are potentially biased.
It is interesting that the three studies that provide an effect from both a complete case and an LOCF analysis (that includes both completers and imputed outcomes) show larger and less precise effect estimates for the former (Table 1). For example, the study of Sheehan et al shows a very large effect in the complete case analysis, ie, −0.61 (95% confidence interval, −0.95 to −0.28), and a much smaller effect in the LOCF analysis, ie, −0.27 (95% confidence interval, −0.55 to 0.02).
The analysis of completers (using only four studies) gave a summary standardized mean difference (SMD) of −0.28 (95% confidence interval, −0.51 to −0.04) and a heterogeneity standard deviation of $\tau = 0.18$, suggesting that there is a small difference between the two antidepressants. An analysis of the LOCF data gave a summary SMD of −0.13 (95% confidence interval, −0.20 to −0.05) with $\tau = 0.09$, leading to the same conclusion but with a more precise and less heterogeneous effect size. Both sources of data are likely to be biased: the latter because it relies on a single-imputation method, and the former because it excludes more than two thirds of the studies and all the participants who dropped out.
In Table 2, all study-specific SMDs are more precise in the LOCF analysis than in completers, although within-study standard deviations are smaller in completers. This happens because sample size in the LOCF analysis is much bigger. The Versiani 2000 study had an imputation rate of 57% in the placebo group compared with a 14% rate in the experimental group. As a result, the LOCF analysis hardly showed a benefit in the placebo group and an SMD of −1.42 (95% confidence interval, −2.01 to −0.84) was computed. The corresponding SMD for the completers is −0.70 (95% confidence interval, −1.47 to 0.07). The analysis of completers gave a summary SMD of −0.15 (95% confidence interval, −0.30 to 0.00) and a heterogeneity standard deviation of $\tau = 0.17$, suggesting that the difference between reboxetine and placebo narrowly misses statistical significance. An analysis of the LOCF data gave a summary SMD of −0.24 (95% confidence interval, −0.43 to −0.05) with heterogeneity $\tau = 0.29$.
We see from these two examples that LOCF does not always give more conservative meta-analytic results than completers analysis. Although LOCF is typically suggested as a conservative method, in the second example, all studies are more precise in the LOCF analysis compared with those in the complete-case analysis. The LOCF pooled estimate, however, is less precise because heterogeneity is much larger in the LOCF analysis. Hence, the decrease in within-study variations in the LOCF analysis brought an increase in between-study variation.

Notation and model definition
We divide all randomized individuals into three groups. Completers are those who completed the study providing outcome data at the end of the study. Imputed are those who did not complete the study but provided an outcome at an intermediate step and whose missing values at the end of the trial were imputed using LOCF (or another single-imputation method).
Missing are those who left the study without providing any outcome data. An analysis of the completers only is a complete case analysis. The completers and imputed together form the reported outcomes, and we refer to an analysis of these outcomes as an LOCF analysis.
In the notation, index i refers to study, j refers to study arm, and k refers to individuals. The notation involving $\pi$ denotes population probabilities that a participant is of a particular type (completer/imputed/missing); $\chi$ and $\sigma$ denote population outcome means and standard deviations, respectively; and p, x, s denote sample counterparts of these quantities.
Among participants randomized to arm j of study i, we count $n_{com}$ completers, $n_{imp}$ imputed, and $n_{miss}$ missing. Therefore, the fraction who reported at least one post-baseline measurement during the study is

$$p_{rep} = \frac{n_{com} + n_{imp}}{n_{com} + n_{imp} + n_{miss}},$$

with $p_{miss} = 1 - p_{rep}$ the fraction who left without any reported outcome. We use tilde throughout the manuscript to refer to quantities and estimates that have been potentially contaminated by the LOCF imputation. What we observe is $\tilde{x}_{rep}$, the mean outcome for the completers and imputed participants. A thorough description of the model parameters is shown in Table 3.
We define $Y_{ijk}$ to be the true outcome of the kth individual at the end of the trial, and we define the indicator variable $R_{ijk}$ to be 1 in reported outcomes and 0 in missing outcomes, where $P(R_{ijk} = 1) = \pi_{rep}$. We then define $\chi_{com}$ and $\chi_{imp}$ as the true mean outcomes in completers and imputed participants, respectively. We also denote by $\pi^{*}_{com}$ and $\pi^{*}_{imp}$, with sample counterparts $p^{*}_{com} = \frac{n_{com}}{n_{com} + n_{imp}}$ and $p^{*}_{imp} = \frac{n_{imp}}{n_{com} + n_{imp}}$, the probabilities of an individual being a completer and imputed, respectively, conditional on having at least one outcome reported.
Thus, in those who had their outcome imputed, we distinguish the imputed outcome $\tilde{Y}_{ijk}$ with expectation $\tilde{\chi}_{imp}$ from the true unobserved outcome $Y_{ijk}$ with expectation $\chi_{imp}$. More details are given in Appendix A.
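As a minimal sketch of this bookkeeping, the following Python snippet computes the sample proportions for one study arm from the three counts, and the reported mean as a mixture of completers and LOCF-imputed values. The function names and the example numbers are hypothetical and are not taken from the data sets in this paper.

```python
def arm_proportions(n_com, n_imp, n_miss):
    """Sample proportions for one study arm (hypothetical helper).

    n_com: completers, n_imp: LOCF-imputed, n_miss: no outcome reported.
    """
    n_rep = n_com + n_imp             # participants with a reported outcome
    n_tot = n_rep + n_miss            # all randomized participants
    return {
        "p_rep": n_rep / n_tot,       # reported at least one post-baseline value
        "p_miss": n_miss / n_tot,     # provided no outcome at all
        "p_com_star": n_com / n_rep,  # completer, given a reported outcome
        "p_imp_star": n_imp / n_rep,  # imputed, given a reported outcome
    }


def reported_mean(x_com, x_imp_locf, n_com, n_imp):
    """Mean of the reported data: completers mixed with LOCF-imputed values."""
    return (n_com * x_com + n_imp * x_imp_locf) / (n_com + n_imp)


# Made-up arm: 60 completers, 30 imputed, 10 missing.
props = arm_proportions(60, 30, 10)
x_rep = reported_mean(x_com=12.0, x_imp_locf=6.0, n_com=60, n_imp=30)
print(props, x_rep)
```

Only the reported mean and the counts are typically available from a publication; the completer and imputed means are shown separately here purely to make the mixture explicit.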
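We aim to estimate the mean outcome $E(Y_{ijk})$ and its variance $\mathrm{var}(Y_{ijk})$ for all individuals that were initially randomized to group j in study i. The former is expressed in the following equation:

$$\chi_{tot} = E(Y_{ijk}) = \pi_{rep}\,\chi_{rep} + \pi_{miss}\,\chi_{miss}, \tag{1}$$

where $\pi_{miss} = 1 - \pi_{rep}$ and $\chi_{miss}$ is the true mean outcome in missing participants. The true outcome in the reported data, $\chi_{rep}$, is not known. We define the expected mean value of the reported data using LOCF imputation as

$$\tilde{\chi}_{rep} = \pi^{*}_{com}\,\chi_{com} + \pi^{*}_{imp}\,\tilde{\chi}_{imp}. \tag{2}$$

We develop a pattern mixture model as follows.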
1. We estimate $\chi_{rep}$ by associating it with the estimable parameter $\tilde{\chi}_{rep}$ via an unidentified parameter using the methodology presented in Section 3.2.
2. We estimate the outcome $\chi_{tot}$ as a mixture of $\chi_{rep}$ and $\chi_{miss}$; see Equation (1). We associate $\chi_{tot}$ with $\chi_{rep}$ via an unidentified parameter using the methodology presented in the IMDoM paper. 9
3. We contrast $\chi_{tot}$ across study arms within the same study to obtain effect sizes and their standard errors.
4. We synthesize effect sizes via inverse variance random-effects meta-analysis. 12

Accounting for uncertainty and bias due to LOCF and missing outcome data
The aim here is to estimate the true outcome mean in participants who provided at least one outcome value. This is

$$\chi_{rep} = \pi^{*}_{com}\,\chi_{com} + \pi^{*}_{imp}\,\chi_{imp}. \tag{3}$$

However, only a sample estimate for $\tilde{\chi}_{rep}$ is reported; see Equation (2).
To link $\chi_{rep}$ to $\tilde{\chi}_{rep}$, we introduce a new parameter, the bias in LOCF (BILOCF) parameter $\lambda_{imp}$, that quantifies the bias in the imputed values as the difference between the true outcome $\chi_{imp}$ and the imputed outcome $\tilde{\chi}_{imp}$ in patients who left the study early:

$$\lambda_{imp} = \chi_{imp} - \tilde{\chi}_{imp}. \tag{4}$$

The BILOCF parameter is not estimable and we need to make assumptions about its value. We may consider a fixed value or a plausible range of values by assigning a distribution, eg, $\lambda_{imp} \sim N(\mu_{\lambda_{imp}}, \sigma^{2}_{\lambda_{imp}})$, that would reflect our uncertainty about its true value. Letting $\lambda_{imp} = 0$ is equivalent to an analysis of reported outcomes.
In the examples considered in this manuscript, letting $\lambda_{imp} = 0$ is equivalent to the LOCF analysis. We can acknowledge uncertainty about the correct analysis by letting $\mu_{\lambda_{imp}} = 0$, meaning that our best guess is that those who dropped out neither improved nor deteriorated, and $\sigma^{2}_{\lambda_{imp}} > 0$, expressing uncertainty about this guess. Effect estimates will be similar to the LOCF analysis but less precise. The methodology can be applied for other imputation schemes (eg, mean imputation).
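From Equations (2), (3), and (4), it follows that

$$\chi_{rep} = \tilde{\chi}_{rep} + \pi^{*}_{imp}\,\lambda_{imp}. \tag{5}$$

We previously developed a model for missing outcome data that uses $\lambda_{miss}$, an IMDoM 9 parameter, that quantifies the difference in mean outcome between observed and missing participants:

$$\lambda_{miss} = \chi_{miss} - \chi_{rep}, \tag{6}$$

$$\lambda_{miss} \sim N(\mu_{\lambda_{miss}}, \sigma^{2}_{\lambda_{miss}}). \tag{7}$$

Again, we need to resort to assumptions to define this distribution.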
The total outcome is

$$\chi_{tot} = \pi_{rep}\,\chi_{rep} + \pi_{miss}\,\chi_{miss} = \chi_{rep} + \pi_{miss}\,\lambda_{miss}. \tag{8}$$
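For readers who want the intermediate algebra, the two identities used in this section can be written out as follows. The symbols are a notational assumption made for this sketch: $\chi$ for population means, $\pi$ for population probabilities, a tilde for LOCF-contaminated quantities, $\lambda_{imp}$ for the BILOCF (true minus imputed mean), and $\lambda_{miss}$ for the IMDoM (missing minus reported mean).

```latex
\begin{align*}
\chi_{rep} &= \pi^{*}_{com}\,\chi_{com} + \pi^{*}_{imp}\,\chi_{imp} \\
           &= \pi^{*}_{com}\,\chi_{com}
              + \pi^{*}_{imp}\,(\tilde{\chi}_{imp} + \lambda_{imp}) \\
           &= \tilde{\chi}_{rep} + \pi^{*}_{imp}\,\lambda_{imp}, \\[4pt]
\chi_{tot} &= \pi_{rep}\,\chi_{rep}
              + \pi_{miss}\,(\chi_{rep} + \lambda_{miss}) \\
           &= \chi_{rep} + \pi_{miss}\,\lambda_{miss}
              \qquad (\pi_{rep} + \pi_{miss} = 1).
\end{align*}
```

The second chain shows why only the proportion of missing participants, and not the proportion reported, multiplies the IMDoM in the total mean.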
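We now estimate these quantities from the data, which we write as expectations given the data. Using Equations (5) and (2), we obtain the imputation-adjusted outcome

$$E(\chi_{rep} \mid data) = \tilde{x}_{rep} + p^{*}_{imp}\,\mu_{\lambda_{imp}}. \tag{9}$$

We can also estimate an imputation-adjusted variance for the mean outcome by using a Taylor-series approximation and assuming that outcomes, probabilities of observing a pattern (completers, imputed, missing), and informative missingness parameters are uncorrelated as

$$V(\chi_{rep} \mid data) \approx V(\tilde{x}_{rep}) + \mu^{2}_{\lambda_{imp}}\,V(p^{*}_{imp}) + (p^{*}_{imp})^{2}\,\sigma^{2}_{\lambda_{imp}}. \tag{10}$$

Proofs are given in Appendix A.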
If $p^{*}_{com} = 1$ (all patients with intermediate measurements completed the study) or if $\lambda_{imp} = 0$ ($\mu_{\lambda_{imp}} = \sigma_{\lambda_{imp}} = 0$) (the imputation process is accurate without uncertainty), then $V(\chi_{rep} \mid data) = V(\tilde{x}_{rep})$. We can also let the BILOCF and IMDoM parameters be correlated. Mathematically, this is easily done (see Appendix B), but eliciting information about this correlation may be hard in practice.
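The imputation adjustment of the reported mean can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the sampling variance of the reported mean is taken as the squared reported SD over the number of reported participants, the variance of the imputed proportion is taken as binomial, and the variance formula is a first-order Taylor approximation with all components uncorrelated. The function name and all numbers are hypothetical.

```python
def adjust_reported(x_rep, s_rep, n_com, n_imp, mu_imp, sd_imp):
    """Imputation-adjusted mean and variance of the reported outcome.

    x_rep, s_rep : reported (LOCF-contaminated) mean and SD
    mu_imp, sd_imp : assumed mean and SD of the BILOCF parameter
    """
    n_rep = n_com + n_imp
    p_imp = n_imp / n_rep                    # imputed, given reported
    var_x = s_rep ** 2 / n_rep               # assumption: variance of a sample mean
    var_p = p_imp * (1 - p_imp) / n_rep      # assumption: binomial proportion
    mean = x_rep + p_imp * mu_imp
    var = var_x + mu_imp ** 2 * var_p + p_imp ** 2 * sd_imp ** 2
    return mean, var


# BILOCF fixed at zero reproduces the reported mean and its variance.
locf = adjust_reported(10.0, 8.0, n_com=60, n_imp=40, mu_imp=0.0, sd_imp=0.0)
# Doubting LOCF shifts the mean and inflates the variance.
doubt = adjust_reported(10.0, 8.0, n_com=60, n_imp=40, mu_imp=-2.0, sd_imp=3.0)
# With no imputed participants the BILOCF assumptions have no effect.
no_imp = adjust_reported(10.0, 8.0, n_com=100, n_imp=0, mu_imp=-2.0, sd_imp=3.0)
print(locf, doubt, no_imp)
```

The two special cases mentioned in the text (no imputed participants, or a BILOCF fixed at zero) both return the unadjusted variance, which is a quick sanity check on any implementation.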
Data inform directly $\tilde{\chi}_{rep}$, $\pi^{*}_{imp}$, and $\pi_{miss}$, whereas the external assumptions inform the BILOCF ($\lambda_{imp}$) and the IMDoM ($\lambda_{miss}$). The expected value of the outcome conditional on the reported data is

$$E(\chi_{tot} \mid data) = \tilde{x}_{rep} + p^{*}_{imp}\,\mu_{\lambda_{imp}} + p_{miss}\,\mu_{\lambda_{miss}}. \tag{11}$$

By taking the variance of Equation (8) conditional on the observed data and using Equation (10) to replace $V(\chi_{rep})$, we get

$$V(\chi_{tot} \mid data) \approx V(\tilde{x}_{rep}) + \mu^{2}_{\lambda_{imp}}\,V(p^{*}_{imp}) + (p^{*}_{imp})^{2}\,\sigma^{2}_{\lambda_{imp}} + \mu^{2}_{\lambda_{miss}}\,V(p_{miss}) + p^{2}_{miss}\,\sigma^{2}_{\lambda_{miss}}. \tag{12}$$

It should be noted that participants drop out for various reasons. It may be unrealistic to assume the same BILOCF and IMDoM parameters ($\lambda_{imp}$ and $\lambda_{miss}$) across all imputed and missing participants, respectively (eg, for those who left because of lack of improvement and those who left because of side effects). In Appendix D, we present how one can assume different scenarios according to the reasons for missingness and adapt the aforementioned equations accordingly by assuming different BILOCF and IMDoM parameters for the various types of missing participants. However, the numbers of participants who left for each possible reason are rarely reported.
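The second adjustment, from the reported to the total population of an arm, follows the same pattern. The sketch below takes an already imputation-adjusted mean and variance for the reported participants and adds the contribution of the IMDoM. As before, the binomial variance for the missing proportion and the first-order approximation are assumptions of this sketch, and the function name and numbers are hypothetical.

```python
def adjust_total(mean_rep, var_rep, n_rep, n_miss, mu_miss, sd_miss):
    """Mean and variance of the arm-level outcome for all randomized.

    mean_rep, var_rep : imputation-adjusted mean and variance among reported
    mu_miss, sd_miss  : assumed mean and SD of the IMDoM parameter
    """
    n_tot = n_rep + n_miss
    p_miss = n_miss / n_tot                    # proportion with no outcome
    var_p = p_miss * (1 - p_miss) / n_tot      # assumption: binomial proportion
    mean = mean_rep + p_miss * mu_miss
    var = var_rep + mu_miss ** 2 * var_p + p_miss ** 2 * sd_miss ** 2
    return mean, var


# Hypothetical arm: 100 reported, 25 missing, IMDoM assumed N(1, 2^2).
total = adjust_total(9.2, 2.0896, n_rep=100, n_miss=25, mu_miss=1.0, sd_miss=2.0)
# An IMDoM fixed at zero leaves the reported-data results unchanged.
unchanged = adjust_total(9.2, 2.0896, n_rep=100, n_miss=25, mu_miss=0.0, sd_miss=0.0)
print(total, unchanged)
```

Keeping the two adjustments in separate functions mirrors the two unidentified parameters: one can vary the BILOCF assumptions while holding the IMDoM fixed, and vice versa.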

Estimating the effect size and its uncertainty for each trial
The unconditional means $\chi_{tot}$ are contrasted to obtain the relative treatment effect in each study, which is defined as the difference

$$\beta_{i} = f(\chi_{tot,iT}) - f(\chi_{tot,iC}), \tag{13}$$

where j = C and j = T refer to the control and treatment group and f is a link function that determines the effect measure.
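If f is the identity function, f(u) = u, then $\beta_{i}$ is the mean difference (MD). If f divides the mean by the pooled standard deviation of study i, $f(u) = u/\sigma_{i}$, we obtain the SMD. We show the working for the SMD in the Appendix. For MDs, it holds

$$E(\beta_{i} \mid data) = E(\chi_{tot,iT} \mid data) - E(\chi_{tot,iC} \mid data), \tag{14}$$

and applying Equation (11) in each arm of the right-hand side of Equation (14), we obtain

$$E(\beta_{i} \mid data) = \left(\tilde{x}_{rep,iT} + p^{*}_{imp,iT}\,\mu_{\lambda_{imp},iT} + p_{miss,iT}\,\mu_{\lambda_{miss},iT}\right) - \left(\tilde{x}_{rep,iC} + p^{*}_{imp,iC}\,\mu_{\lambda_{imp},iC} + p_{miss,iC}\,\mu_{\lambda_{miss},iC}\right), \tag{15}$$

$$V(\beta_{i} \mid data) = V(\chi_{tot,iT} \mid data) + V(\chi_{tot,iC} \mid data), \tag{16}$$

where $V(\chi_{tot,iT} \mid data)$ and $V(\chi_{tot,iC} \mid data)$ can be estimated from Equation (12). More information is given in Appendix B. Then, we can conduct a meta-analysis in two steps as follows.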
a. Compute study-specific treatment effects and their variances from Equations (15) and (16).
b. Conduct an inverse-variance meta-analysis. 12

Alternatively, the model can be fit in a single one-stage procedure, 8 eg, in WinBUGS software. 13
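The two-step procedure can be sketched as follows. Step a contrasts the adjusted arm-level means within a study; step b pools the study effects with inverse-variance weights. The text specifies only an inverse-variance random-effects meta-analysis; the DerSimonian-Laird moment estimator of the between-study variance used here is an assumption of this sketch, and the function names and numbers are hypothetical.

```python
import math


def mean_difference(mean_t, var_t, mean_c, var_c):
    """Step a: study-level mean difference and its variance (independent arms)."""
    return mean_t - mean_c, var_t + var_c


def random_effects(effects, variances):
    """Step b: inverse-variance random-effects pooling (DerSimonian-Laird tau^2)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    k = len(effects)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, se, tau2


# Two hypothetical studies with adjusted arm-level means and variances.
study1 = mean_difference(mean_t=9.5, var_t=0.005, mean_c=10.0, var_c=0.005)
study2 = mean_difference(mean_t=10.0, var_t=0.005, mean_c=10.0, var_c=0.005)
effects, variances = zip(study1, study2)
pooled, se, tau2 = random_effects(effects, variances)
print(pooled, se, tau2)
```

Because the BILOCF and IMDoM assumptions enter only through the study-level variances and means, any standard meta-analysis routine can replace the pooling function once step a is done.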

INFORMING THE MODEL PARAMETERS
The model presented in Section 3 is underidentified because the distributions of $\lambda_{imp}$ and $\lambda_{miss}$ (BILOCF and IMDoM parameters) cannot be informed by the data. To inform these parameters, we can either use expert opinion, possibly informed by empirical data, eg, from studies with individual patient data to inform BILOCF, or conduct a sensitivity analysis assuming various distributions for $\lambda_{imp}$ and $\lambda_{miss}$, to explore how robust results are to departures from the LOCF analysis.
Methods have been suggested in the literature 14,15 to elicit the distribution of $\lambda_{miss}$. More details are given in Appendix C.
We propose new methods to elicit the distribution of $\lambda_{imp}$. This involves experts' beliefs about those who dropped out of the study at an intermediate step and had their outcome imputed using LOCF. More specifically, we would like to know how different the imputed outcome is from the one we would have observed had the individual stayed in the trial until its end. We can use expert opinion to inform the BILOCF parameter. Along with the number of imputed outcomes, we may inform the expert of the dropout times. Participants may have dropped out at different time points. Suppose that we measure reduction in symptoms of depression at 12 weeks using the Hamilton Rating Scale for Depression (HAMD). Previous measurements exist for 4 and 8 weeks. We consulted two psychiatrists (AC and TF) with expertise in conducting depression trials with the aim of identifying what information is important to deliver to the expert and of forming appropriate questions for eliciting the parameters of interest ($\lambda_{imp}$ and $\lambda_{miss}$). We put forward the following question to the experts.
Participants randomized to fluoxetine who dropped out of follow-up at 8 weeks after the onset of the treatment were observed at this point to have a mean score of 35 on the HAMD scale with 95% confidence interval [30-40]. What is your prediction about their outcome at 12 weeks?
Then, we may repeat the question for measurements at a different time point (eg, we have measurements at 4 weeks) or for other antidepressants (eg, venlafaxine) or placebo. Table 4 shows the responses of a hypothetical expert who believes that participants who left at 4 weeks would have improved considerably had they stayed in the study until its completion but participants who left at week 8 would not change at 12 weeks. Translating the answers from Table 4 into parameter values for the BILOCFs, we get approximately $\lambda_{imp} \sim N(-6, 3^{2})$ for 4 weeks and $\lambda_{imp} \sim N(0, 6^{2})$ for 8 weeks.
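One simple way to turn an elicited prediction into a normal prior for the BILOCF is sketched below. It assumes, purely for illustration, that the expert states a predicted mean at the end of the trial together with a symmetric 95% interval, and that this interval can be read as the central interval of a normal distribution. The elicitation format, the function name, and the example answers are all hypothetical; the answers were chosen only so that the output lands near the two distributions quoted in the text and they are not the contents of Table 4.

```python
from statistics import NormalDist


def bilocf_from_elicitation(carried_forward, predicted, lower, upper, level=0.95):
    """Convert an expert's prediction and interval into a normal BILOCF prior.

    Assumes the stated interval is a symmetric central `level` interval of a
    normal distribution (an assumption of this sketch).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    mean = predicted - carried_forward          # true minus imputed outcome
    sd = (upper - lower) / (2 * z)
    return mean, sd


# Hypothetical answers for participants last observed with a mean score of 35.
week4 = bilocf_from_elicitation(35, predicted=29, lower=23, upper=35)
week8 = bilocf_from_elicitation(35, predicted=35, lower=23, upper=47)
print(week4, week8)
```

The sign convention matters: here a negative BILOCF mean says that the true end-of-trial score is lower than the carried-forward score, so the interpretation as improvement or deterioration depends on the direction of the outcome scale.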
We typically know neither the mean imputed outcome ($\tilde{\chi}_{imp}$) nor the time point at which participants dropped out from the study. The latter is very important. For two active antidepressants, the typical trajectory in the acute phase treatment of depression is that we have a large improvement in 2 to 4 weeks, a smaller one in 4 to 8 weeks, and then, the effect almost flattens out. For a comparison between an antidepressant and placebo, we would expect a small difference in the first 2 to 4 weeks and the largest difference would occur around 8 weeks, and then, the difference decreases. Hence, we may have different BILOCFs for different groups of participants (or even for different comparisons of interventions). We show how this can be implemented in Appendix D. Ideally, we would like to provide the expert with the following information:
1. proportion of participants who were LOCF-imputed;
2. mean outcome estimated from imputed participants ($\tilde{\chi}_{rep}$) and its uncertainty; and
3. time of dropout (eg, 20% left before completion of eight weeks; this is usually not available in the absence of individual participant data).

TABLE 4
Eliciting expert opinion to evaluate the differences in the outcomes between LOCF imputed participants and their true outcomes at the end of the trial. This table shows, for illustration purposes, a hypothetical example with the responses of an expert who believes that participants who left at 4 weeks would have reduced by many points in the Hamilton Rating Scale for Depression (HAMD) scale had they stayed in the trial, but those who left at 8 weeks would not change at the end of the trial

[Table 4 body not recoverable from the source. Surviving labels: column "Left at 8 weeks"; row stem "If the patient stayed in the study, s/he would have been improved by".]

It is not always easy to elicit expert opinion. There are difficulties in communicating the question and translating the experts' answers into parameters. With a systematic review including many studies, we would need expert opinion in each one of the studies and such a process would entail a large time burden. This was not our intention in this work as we placed more emphasis on establishing the statistical model. An easier solution is to conduct a thorough sensitivity analysis. We can start by assuming $\lambda_{imp} = \lambda_{miss} = 0$ for all studies i and move gradually away from the LOCF analysis by considering nonzero values for the means and variances of these parameters. The sensitivity analysis should be prespecified in the analysis protocol. A simple approach would be to assume $\mu_{\lambda_{imp}} = \mu_{\lambda_{miss}} = 0$ and increasingly assume larger values for $\sigma^{2}_{\lambda_{imp}}$ and $\sigma^{2}_{\lambda_{miss}}$. This would be ideal if one believes in LOCF, as it overcomes the problem of having spuriously narrow confidence intervals.
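The simple approach just described, zero means with an increasing standard deviation, can be sketched for a single study as follows. With zero means the point estimate stays at the LOCF value and only the variance grows, so the sweep shows at which level of doubt the confidence interval starts to include zero. The shared standard deviation for BILOCF and IMDoM, the normal-approximation interval, the function names, and the numbers are assumptions of this sketch.

```python
import math


def arm_variance(s_rep, n_com, n_imp, n_miss, sd):
    """Variance of the adjusted arm mean with zero-mean BILOCF and IMDoM.

    Both parameters share the same standard deviation `sd` (an assumption).
    """
    n_rep = n_com + n_imp
    p_imp = n_imp / n_rep                  # imputed, given reported
    p_miss = n_miss / (n_rep + n_miss)     # missing, among all randomized
    return s_rep ** 2 / n_rep + (p_imp ** 2 + p_miss ** 2) * sd ** 2


def sweep(md, arm_t, arm_c, sds):
    """95% confidence limits of a mean difference for each assumed SD."""
    rows = []
    for sd in sds:
        se = math.sqrt(arm_variance(*arm_t, sd) + arm_variance(*arm_c, sd))
        rows.append((sd, md - 1.96 * se, md + 1.96 * se))
    return rows


# Hypothetical study; each arm is (reported SD, completers, imputed, missing).
rows = sweep(-3.0, arm_t=(8.0, 60, 30, 10), arm_c=(8.0, 70, 20, 10), sds=[0, 1, 2, 3])
for sd, lower, upper in rows:
    print(f"sd={sd}: [{lower:.2f}, {upper:.2f}]")
```

Prespecifying the grid of standard deviations in the protocol, rather than choosing it after seeing such output, guards against selecting the scenario that gives a preferred conclusion.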

ANALYSIS OF MOTIVATING EXAMPLE
We suggest assuming $\lambda_{imp} = \lambda_{miss} = 0$ as the primary analysis, which is equivalent to the LOCF analysis. Any difference in the mean values $\mu_{\lambda_{imp}}$ and $\mu_{\lambda_{miss}}$ across groups would favor one treatment over the other. We take a neutral stance, assuming a zero mean for the BILOCF and IMDoM parameters in both groups ($\mu_{\lambda_{imp}} = \mu_{\lambda_{miss}} = 0$). Most probably, this scenario is not realistic but we use it for illustration purposes. We let the standard deviation of the BILOCFs and the IMDoMs assume a range of values from 0 up to 6. Introducing uncertainty around BILOCF and IMDoM increases within-study variation. Hence, the pooled effect would change because study effect sizes would be weighted differently.
In this example, missing and imputation rates are similar across studies and we do not expect big fluctuations. Figure A1 in the Appendix shows the summary effect size, denoted by a solid line, and its 95% confidence limits, denoted by the dotted lines, under various scenarios with $\mu_{\lambda_{imp}} = \mu_{\lambda_{miss}} = 0$ and with increasing $\sigma_{\lambda_{imp}}$, $\sigma_{\lambda_{miss}}$ reflected in the horizontal axis. We assume that the BILOCF and IMDoM parameters are independent across arms and of each other and that $\sigma_{\lambda_{imp}} = \sigma_{\lambda_{miss}}$. We made this choice so that we will not a priori favor either of the drugs. The summary effect is similar across the various scenarios with a minor reduction due to the different weights assigned to the studies. We observe that, when $\sigma_{\lambda_{imp}} = \sigma_{\lambda_{miss}} > 5$, ie, $\lambda_{imp} \sim N(0, 5^{2})$ and $\lambda_{miss} \sim N(0, 5^{2})$, the summary estimate becomes nonsignificant for fluoxetine vs venlafaxine. In the comparison placebo vs reboxetine, results become nonsignificant almost immediately, ie, already with $\lambda_{imp} \sim N(0, 2^{2})$ and $\lambda_{miss} \sim N(0, 2^{2})$, suggesting that even minor doubts about the LOCF results would result in no differences between the two groups. Table 5 shows the summary effect assuming various scenarios. Some scenarios are neutral in the sense that they assume that the distributions for BILOCF and IMDoM are the same across the two arms (scenarios N1-N5); other nonneutral scenarios assume different distributions across the two arms so that either fluoxetine (F1-F3) or venlafaxine is favored (V1-V3). Neutral scenarios do not have a large impact on the results unless $\sigma_{\lambda_{imp}} = \sigma_{\lambda_{miss}} > 5$. In this case, there is a small drop in the summary effect because relative weights are reassigned and a study with a positive SMD (Keller, 2009) loses much of its weight (Table E1 in the Appendix). However, with that much uncertainty around IMDoM and BILOCF, the summary effect becomes statistically nonsignificant.

Table footnotes: [1] Complete case analysis is based on four studies. [2] Last observation carried forward (LOCF) analysis is based on thirteen studies.
The Keller 2009 study is by far the largest in this meta-analysis with 266 and 781 participants randomized to fluoxetine and venlafaxine, respectively (Table 1). It also has large imputation numbers (47 and 124), but its imputation rates are similar to those of other studies (Table 1). However, the penalty given to that study is relatively large exactly because of the large weight this study has in the LOCF analysis. Not trusting the LOCF results impacts mainly studies with large imputation rates. If imputation rates are similar across studies, not trusting the LOCF results impacts larger studies whose effect size has a larger impact on the summary results.
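The reweighting mechanism can be illustrated with a toy calculation. When imputation rates are similar, doubting LOCF adds roughly the same amount to every study's variance, which shrinks the relative weight of the most precise study the most. The variances below are hypothetical and are not those of the studies in this meta-analysis, and between-study variance is ignored for simplicity.

```python
def relative_weights(variances):
    """Inverse-variance weights scaled to sum to 1."""
    w = [1 / v for v in variances]
    total = sum(w)
    return [wi / total for wi in w]


# Hypothetical within-study variances: one large study and two small ones.
locf_variances = [0.01, 0.04, 0.04]
# Similar imputation rates, so the added uncertainty is the same for each study.
extra = 0.03
adjusted_variances = [v + extra for v in locf_variances]

before = relative_weights(locf_variances)
after = relative_weights(adjusted_variances)
print(before[0], after[0])
```

In this toy case the large study keeps the largest weight but loses a substantial share of it, so a study whose effect points in the opposite direction to the others can move the pooled estimate noticeably once it is down-weighted.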

DISCUSSION
Missing data have not been handled properly in most trials, potentially leading to biased and overprecise results. These problems are propagated in a synthesis of trials through a meta-analysis, and we run the risk of finding a false-positive result because of the inflated sample sizes within trials. The LOCF method has been typically requested by regulatory agencies on the grounds that it is a conservative method, but this is mistaken and recommendations have been against its use. 1,16-18 In this paper, we focused on LOCF, but the methodology can be applied to other imputation schemes. Another well-known method is baseline observation carried forward (BOCF), in which the outcome at the end of the study is replaced by its baseline measurement; it is typically employed when patients withdraw from trials because of adverse events and LOCF is seen as insufficiently conservative. 19,20 This is equivalent to assuming that missing participants have neither improved nor deteriorated at all.
Most depression trials report the outcome values from the LOCF analysis. We agree with the current practice that considers an LOCF analysis or a complete case analysis to be the primary analysis in a meta-analysis. The suggested methodology can be used alongside as a sensitivity analysis. It is easily understood conceptually that, by using LOCF, we not only run the risk of getting a biased outcome but also artificially increase the sample size of the study. Missing data are usually MNAR. Participants may drop out of a study because they do not see any improvement or because of drug-related side effects. Because drugs usually differ in terms of effectiveness and side effects, we expect different imputation rates and time points of dropout across the groups of a study. The method can easily extend to network meta-analysis. 21 We created R code 22 (given in Appendix E) that uses Equations (11) and (12) to compute the adjusted effect sizes and standard errors and, then, uses R package "meta" to synthesize them. 23 We have also created a Stata 24 command that will become available through "mtm.uoi.gr" and would be an extension of the recently developed command metamiss2. 25 It is not always straightforward how to embed an expert's beliefs into a statistical model. We may have data on intermediate time points that show a very different effect across time points. It could be the case, depending on the field, that there is a seemingly significant effect during the first weeks that is lost at the end of the trial (transient effect). In depression trials, this may be the case when placebo or a suboptimal treatment is involved. It is important that experts understand the reasons people drop out of a study group or collect reasons for dropout. If they leave with unequal rates, then missing data may well introduce bias. 
There may be bias in favor of the group with the highest imputation rate if participants are expected to deteriorate over time and in favor of the group with the lowest imputation rate if participants are expected to improve over time. The researcher may try to adjust results by making assumptions about the BILOCF parameter that favor the group that is not favored by the imputation rates. One way to inform the missing data parameters is through individual participant data (from the studies where it is available) or from trials in the systematic review that have results on all time points. In a comparison between two antidepressants, the one with the smallest imputation rate is favored as patients stay in the study longer with more chances of seeing any improvement.
Any analysis about missing data has to make untestable assumptions because the actual data needed to test the assumptions are missing. These assumptions can be used mathematically to inform effect estimates in a sensitivity analysis. Hence, starting with the LOCF analysis, we then consider various scenarios about the informative missingness parameters and explore how robust the results are. The outcomes can be adjusted in such a wide range of ways that there is a risk that one may, deliberately or not, make assumptions in favor of a certain drug. To minimize this risk, we suggest that the sensitivity analyses should be prespecified and described in detail in the protocol and that values for the BILOCF and IMDoM parameters should be chosen on clinical grounds.
The validity of the analysis rests on the plausibility of the assumptions made. Clinicians with expertise in clinical trials have a good understanding of the reasons for missingness in clinical trials, but caution is needed in translating this expertise into values for the BILOCFs and the IMDoMs. We plan to continue working on how to formulate the appropriate questions to elicit information about the distributions of imp and miss. Extra caution is needed when trying to elicit correlation parameters that are not easily understood by clinicians. Participants drop out for various reasons, and ideally, these reasons are reported. It may be unrealistic to assume the same BILOCF or IMDoM across all missing participants. In Appendix D, we present how one can assume different scenarios for the various types of missing participants. Even if we do not wish to favor any of the interventions, we suggest assuming departures from the missing distribution while using the same distributions for BILOCF and IMDoM across the groups of the study (neutral scenarios). An expert may inform us on which drug is likely to be favored by the LOCF analysis, and we may then consider nonneutral scenarios in favor of the opposite drug.
In practice, it is very time consuming to define BILOCF and IMDoM for all studies taking into account their characteristics, and some grouping is necessary (eg, all placebo-controlled studies have the same BILOCF and IMDoM).
Another limitation of the model presented here is that we associated the mean outcome in the missing participants with the true outcome and not with the outcome reported in the trial. The reason we did this was to avoid potential contamination due to the LOCF imputed outcomes. However, experts might be more comfortable relating missing values to a quantity for which the data provide an estimate. The maths could be adapted to do this.
It is likely that dropouts in a randomized controlled trial would have dropped out in real life as well. Even in that case, LOCF would underestimate/overestimate a drug's efficacy if patients are expected to improve/deteriorate over time. The target in randomized controlled trials is to get an unbiased effect estimate at the predesignated primary outcome measurement point. Such an estimate would inform us about the true effectiveness of the experimental intervention. Dropouts and side effects should be taken into account (this is also why the dropout rate is a major outcome in depression trials) when informing the patient of the benefits and costs of each drug.
Our model suggests an extra source of variance (around imputed and missing outcome data). If studies have similar imputation/missing rates, then reweighting the studies would give more weight to small studies because we add an extra source of uncertainty that would have a relatively larger impact on large studies with small variances. This is similar to what happens when we go from fixed-effect to random-effects meta-analysis. Hence, we have to take into account how much confidence we would like to place in small and large studies. Small studies may be poorly reported (eg, not reporting missing data) and hence get overweighted in the suggested analysis.
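The reweighting can be seen in a small Python sketch. It assumes, purely for illustration, that the adjustment adds the same amount of variance to every study, which is a simplification of the model. The sampling variances, the size of the extra variance, and the helper name `relative_weights` are hypothetical.

```python
def relative_weights(variances, extra_variance=0.0):
    """Inverse-variance weights, normalised to sum to 1, after adding a common
    extra variance to every study (illustrative simplification)."""
    weights = [1.0 / (v + extra_variance) for v in variances]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical sampling variances: a large study (0.01) and a small study (0.25).
variances = [0.01, 0.25]

before = relative_weights(variances)                      # unadjusted analysis
after = relative_weights(variances, extra_variance=0.05)  # with extra uncertainty

print("small-study weight before:", round(before[1], 3))
print("small-study weight after: ", round(after[1], 3))
```

With these hypothetical inputs, the relative weight of the small study rises once the extra variance is added, in the same way that adding the between-study variance flattens the weights in a random-effects analysis.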

APPENDIX A CALCULATION OF IMPUTATION-ADJUSTED VARIANCE
Using the result for the variance of the product of two independent random variables A and B, $\operatorname{var}(AB) = \{E(A)\}^2 \operatorname{var}(B) + \{E(B)\}^2 \operatorname{var}(A) + \operatorname{var}(A)\operatorname{var}(B)$, with B = imp, we take the variance of the true mean outcome for those who provided some data (Equation (5) in the manuscript), which gives Equation (10) in the manuscript. By taking the variance of Equation (8) conditional on the observed data, using the above result for random variables A and B, and then using Equation (10) for V, we obtain Equation (12) in the manuscript.

APPENDIX B COMPUTATION OF EFFECT SIZES FOR SIMPLE AND NETWORK META-ANALYSIS
We would like to compute the adjusted effect size of Equation (13) where j = C and j = T refer to the control and treatment group and f is a link function that determines the effect measure. We would like to estimate E( i | data) and V( i | data). For E( i | data), we have to estimate )) .
For the variance of a sum of two random variables, it holds var (A ± B) = var (A) + var (B) ± 2cov (A, B) . or study-specific such as The method is easily extended to network meta-analysis. If there are three-arm trials, the correlation between effect sizes using a common comparator should be accounted for. Suppose that, in a three-arm trial, we estimate iAB and iAC . In this case, i = ( iAB , iAC ) ′ follows a bivariate normal distribution with covariance given by the formula If, instead of MD, we consider SMD, the above Equation should be multiplied by 1
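To illustrate where the covariance in a multi-arm trial comes from, the Python sketch below considers mean differences from a hypothetical three-arm trial with arms A, B, and C, where A is the common comparator. It assumes that the three arm-level (adjusted) means are mutually independent with known variances, in which case the covariance between the two mean differences equals the variance of the mean in arm A. The function name and the numerical variances are illustrative assumptions.

```python
def md_covariance(var_a, var_b, var_c):
    """Covariance matrix of (MD_AB, MD_AC), where MD_AB = mean_B - mean_A and
    MD_AC = mean_C - mean_A, assuming independent arm-level means."""
    # Each row maps the arm means (A, B, C) to one mean difference.
    contrasts = [(-1.0, 1.0, 0.0), (-1.0, 0.0, 1.0)]
    arm_vars = (var_a, var_b, var_c)
    # J * diag(arm_vars) * J^T, written out without external libraries.
    return [
        [sum(r[k] * arm_vars[k] * s[k] for k in range(3)) for s in contrasts]
        for r in contrasts
    ]


# Hypothetical variances of the arm-level means.
cov = md_covariance(var_a=0.5, var_b=0.25, var_c=1.0)
print(cov)
```

The diagonal entries are the usual variances of a difference of independent means, and the off-diagonal entry is the variance contributed by the shared arm A.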

APPENDIX C PRIOR ELICITATION
To inform the IMDoM parameter imp, we inform the experts of the data we have observed and ask for their opinion about potential differences in the missing data. For example, suppose that we have a depression trial comparing reboxetine to placebo, just like in Example 2, and the outcome is measured in the HAMD scale. We may ask the experts the following question: "Suppose that XX% of patients allocated to reboxetine had completed the final interview and their mean score in the HAMD scale is 20 with standard deviation 6 (so that about 95% of these participants have values between 8 and 32). What is your expectation for the mean outcome score for those who did not provide any outcome data compared with those who completed the trial?" The experts are then asked to distribute a total weight of 100 across nine categories. Most likely, we would expect the outcomes of missing participants to be judged worse than those of completers. In Table C1, we give an example of the answers of an expert who believes there is no difference between missing participants and completers, and the beliefs of a second expert who believes that missing participants did worse than completers.
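One simple way to turn such answers into a prior, assumed here rather than taken from the manuscript, is to treat the elicited weights as probabilities attached to the midpoint of each category and to compute the implied mean and standard deviation. In the Python sketch below, the nine category midpoints (in HAMD points, with positive values meaning worse outcomes for noncompleters) and the two sets of weights are hypothetical and are not the entries of Table C1.

```python
import math


def summarise_elicitation(midpoints, weights):
    """Mean and standard deviation implied by weights (summing to 100)
    placed on the category midpoints."""
    if len(midpoints) != len(weights):
        raise ValueError("one weight is needed per category")
    if sum(weights) != 100:
        raise ValueError("weights must sum to 100")
    probs = [w / 100.0 for w in weights]
    mean = sum(p * m for p, m in zip(probs, midpoints))
    variance = sum(p * (m - mean) ** 2 for p, m in zip(probs, midpoints))
    return mean, math.sqrt(variance)


# Hypothetical midpoints of nine categories (difference in mean HAMD score,
# noncompleters minus completers).
MIDPOINTS = [-8, -6, -4, -2, 0, 2, 4, 6, 8]

# Hypothetical answers: one expert centred on "no difference",
# one expert who expects noncompleters to do worse.
expert_neutral = [0, 0, 5, 20, 50, 20, 5, 0, 0]
expert_worse = [0, 0, 0, 5, 15, 30, 30, 15, 5]

mean_neutral, sd_neutral = summarise_elicitation(MIDPOINTS, expert_neutral)
mean_worse, sd_worse = summarise_elicitation(MIDPOINTS, expert_worse)
print(mean_neutral, sd_neutral)
print(mean_worse, sd_worse)
```

The resulting mean and standard deviation could then serve as candidate values in a sensitivity analysis; other summaries of the elicited distribution are equally possible.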

TABLE C1
Eliciting expert opinion to evaluate the differences in the outcomes between missing and reported outcomes. Experts are asked to distribute a total of 100 across the nine categories. This table shows the responses of an expert who believes that there are no big differences between missing and reported outcomes, and an expert who believes that missing participants had worse outcomes than those reported

The mean score in the HAMD scale of those who were randomized to reboxetine and completed the final questionnaire is 20 with standard deviation 6 (so that about 95% of these participants have values between 8 and 32). What is your expectation for the mean outcome score for those who did not complete the final questionnaire, compared with those who completed it?
First expert Interval of mean change for the noncompleters Worse than reported outcomes by Same as reported outcomes Better than reported outcomes by

APPENDIX D DIFFERENT BILOCF PARAMETERS
Suppose that we expect to observe different effects in those dropping out before 4 weeks after the onset of treatment and in those dropping out after 4 weeks. We then introduce two BILOCFs, one referring to the period before this cutoff and one to the period from the cutoff onwards. Similarly, if we have K groups, we introduce K BILOCF parameters.