Information anchored reference-based sensitivity analysis for truncated normal data with application to survival analysis

The primary analysis of time-to-event data typically makes the censoring at random assumption, that is, that, conditional on covariates in the model, the distribution of event times is the same whether they are observed or unobserved. In such cases, we need to explore the robustness of inference to more pragmatic assumptions about patients post-censoring in sensitivity analyses. Reference-based multiple imputation, which avoids analysts explicitly specifying the parameters of the unobserved data distribution, has proved attractive to researchers. Building on results for longitudinal continuous data, we show that inference using a Tobit regression imputation model for reference-based sensitivity analysis with right censored log normal data is information anchored, meaning the proportion of information lost due to missing data under the primary analysis is held constant across the sensitivity analyses. We illustrate our theoretical results using simulation and a clinical trial case study.


INTRODUCTION
However carefully clinical trials are designed and planned, some outcome data are often missing. This occurs when a patient is lost to follow-up, which could be, for example, due to noncompliance with the study protocol, or stopping an assigned treatment due to experiencing adverse effects. When the outcome of interest is the time to an event, patients censored at their last-known time point can be considered as generating a type of missing data. The event occurs after they were last seen. Any analysis must make inherently untestable assumptions about the distribution of the time from the last observed visit to the event. Therefore, censored data cause unavoidable ambiguity in the analysis. Since we never know the true event time for censored data, it is important to be clear about the assumptions made about such data, and the subsequent impact of these assumptions on the conclusions. If a contextually implausible assumption concerning censored data is made, then the results from the analysis might lead to potentially misleading conclusions, which can directly influence patient care (among others, Sterne et al., 2009; White & Carlin, 2010; Ibrahim, Chu, & Chen, 2012; Jakobsen, Gluud, & Winkel, 2017).
Therefore, in a trial setting the ideal is for the primary analysis to make the most contextually plausible assumptions about the censored data; the conclusion from the primary analysis is then tested using sensitivity analysis which makes different, contextually plausible assumptions concerning the censored data.
In practice, the primary analysis of time-to-event data most often makes the censoring at random (CAR) assumption, that is, that, conditional on covariates in the analysis model, the distribution of event times is the same whether they are observed or unobserved. A sensitivity analysis will then typically assume that the data are censored not at random (CNAR), also known as informative censoring, when modeling post-censoring behavior.
If the results from the primary and sensitivity analysis are broadly consistent, we can conclude that for the sensitivity analysis scenarios investigated, the outcome from the primary analysis is robust to contextually plausible departures from the primary assumption concerning the missing data mechanism. If this is not the case, and the conclusions change following the sensitivity analysis, then the investigators should report the conditions under which the conclusions may change, along with the relative likelihood of these circumstances occurring. These steps provide more confidence in the results, especially when regulators are considering new treatments for approval. Both the European Medicines Agency (EMA) and U.S. Food and Drug Administration have underlined the importance of performing sensitivity analyses in clinical trials (CHMP, 2010; NRC, 2010), with the recent EMA ICH E9 (R1) addendum providing clarity in terms of both definitions and example scenarios for such sensitivity analyses (CHMP, 2019).
Multiple imputation (MI) involves forming a suitable imputation model based on the observed data. Typically, this will take the form of a regression model, with asymptotically normally distributed parameter estimates. Repeated draws from this conditional predictive distribution of the missing data are made to create multiple completed datasets. Each completed dataset is then analyzed using the substantive analysis model of interest to obtain multiple treatment estimates (Carpenter & Kenward, 2013, chapter 2). Treatment and variance estimates are then averaged across multiply imputed data sets using Rubin's MI combination rules; provided the imputations are proper (e.g., Rubin, 1987), statistical inferences have good frequentist properties.
Reference-based multiple imputation (RBMI), an approach originally proposed by Little and Yau (1996), has recently gained popularity as a sensitivity analysis approach due to its inherent practicality. Missing data are imputed by reference to an appropriately chosen group, or groups, of patients from the observed data. For example, observed data from one arm of a clinical trial can be re-used to impute all unobserved data in the trial under a particular assumption, where all unobserved data are expected to behave like the specified "reference" arm. Recent practical applications of RBMI are numerous, with examples in the fields of psychotherapy, endocrinology, and paediatric and geriatric care (e.g., Philipsen et al., 2015; Jans et al., 2015; Billings et al., 2018; Atri, Frölich, Ballard, et al., 2018). Carpenter, Roger, and Kenward (2013) defined a number of approaches for RBMI sensitivity analysis for longitudinal data with a continuous endpoint, and one of these, "Jump to Reference," has recently been extended to the time-to-event setting (Atkinson, Kenward, Clayton, & Carpenter, 2019). In summary, Jump to Reference imputes unobserved data using an imputation distribution that has parameters from the patient's randomized arm up to their last observed time point, with parameters from a specified reference arm for unobserved time points.
With RBMI methods, the models used for imputing and analyzing the data are no longer the same (i.e., they are uncongenial; Meng, 1994), and this has provoked a discussion on the operating characteristics of Rubin's variance estimate in this context (Hughes, Sterne, & Tilling, 2014; Kim, Zeng, & Taylor, 2017; Liu & Peng, 2016; Robins & Wang, 2000; Tang, 2018). In addition, with RBMI we can intuitively understand that, since we reuse data, in the analysis we increase the homogeneity between the groups we are seeking to compare. If we are not careful, this dilution or mixing effect between groups decreases the variability of the resulting treatment estimate. We view this as undesirable, since this reduced variability may be seen as a "reward" for missing data.
However, recently Cro, Carpenter, and Kenward (2019) proved that, at least for continuous longitudinal data, RBMI is information anchored, meaning that the proportion of information lost due to missing data is held constant across the primary and sensitivity analysis. Here, we are using "information" in a rather specific sense, defined to be the inverse of the derived estimator of the sampling variance.
Furthermore, in their recent paper Atkinson et al. (2019) showed that, counterintuitively, the empirical variance decreases as the proportion of censored data increases (columns 7 and 8 of Table 1 of Atkinson et al., 2019), whereas information anchoring was shown to hold for Rubin's variance estimator (albeit with simulated data). Building on this work, and that of Cro et al. (2019), in this article we provide theoretical results showing that information anchoring also holds for a reference-based sensitivity analysis with a Tobit imputation model assuming truncated normal data. We illustrate the approach with right censored log normal survival times, where the estimate is the difference in mean survival times by treatment.
The remainder of the paper is organized as follows. The next section introduces our example clinical trial dataset which will be used to illustrate the theoretical results. Section 3 defines information anchoring in more detail. Section 4 presents the theoretical results, which are then compared with simulated data in Section 5, along with an illustration of the approach based on the clinical trial data in Section 6. We finish with a discussion of the results and possible areas for further study.

MOTIVATING DATA
The Second Randomized Intervention Treatment of Angina (RITA-2) trial (Henderson et al., 1997, 2003) randomized 1018 eligible coronary artery disease patients from the United Kingdom and Ireland to receive either Percutaneous Transluminal Coronary Angioplasty (PTCA, n = 504) or continued medical treatment (n = 514). Those patients randomized to angioplasty received the intervention in the first 3 months. The primary endpoint of the study was a composite of all cause mortality and definite nonfatal myocardial infarction. After 7 years, there were 73 deaths (14.5%) on the PTCA arm and 63 (12.3%) on the medical arm (difference in proportions +2.2%, 95% confidence interval [−2%, 6.4%], p = .21).
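The difference in proportions and its confidence interval can be checked from the event counts alone. The sketch below assumes a simple Wald interval for the difference of two independent proportions; the trial report may have used a different interval method, and the function name `risk_difference` is purely illustrative.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference(events_a, n_a, events_b, n_b, level=0.95):
    """Difference in proportions (arm a minus arm b) with a Wald interval."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b
    # Standard error of a difference of two independent binomial proportions.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Event counts quoted in the text: 73/504 (PTCA) and 63/514 (medical).
diff, lower, upper = risk_difference(73, 504, 63, 514)
print(f"difference = {100 * diff:.1f}%, "
      f"95% CI [{100 * lower:.1f}%, {100 * upper:.1f}%]")
```

Under this assumption the counts give a difference of about 2.2 percentage points with an interval close to the one quoted above.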
The study concluded that an initial policy of PTCA was associated with greater improvement in angina symptoms, with this effect being particularly present in patients with more severe angina, and that the increased risk of performing PTCA should be offset against these benefits.
RITA-2 was a pragmatic trial, so that although patients were initially randomized to PTCA or medical treatment, in the course of the follow-up patients received further procedures according to clinical need, and the trial was designed to compare a policy of beginning with medical treatment against a policy of beginning with PTCA. Subsequent nonrandom interventions (NRIs) were either PTCA, or when necessary, a coronary artery bypass graft (CABG). In the PTCA arm, 17.0% of patients had a second PTCA, while 12.7% had a CABG. By contrast, in the medical arm 27% went on to have a PTCA and 12.3% had a CABG.
The primary published analysis of our illustrative example was an "Intention To Treat" (ITT) analysis, that is, it targeted a "treatment policy" estimand, with a composite outcome of time to first of death or nonfatal myocardial infarction. Consistent with this ITT approach, this analysis did not take account of nonrandomized interventions (of which, as we detail below, there were quite a number, especially in the medical therapy arm, where many patients "crossed-over" to PTCA). This suggested that our approach could be illustrated as follows: we artificially censor patients in the medical arm at the time of their first nonrandomized intervention, typically PTCA. Our first analysis of the data with this artificial censoring uses within-arm imputation and therefore targets a per-protocol (i.e., hypothetical) estimand (in the language of the EMA ICH E9 (R1) addendum). However, in our sensitivity analysis we impute event times for these artificially censored patients, but now using our method to assume that at these nonrandomized intervention times patients in the medical arm "jumped" to the hazard in the PTCA arm. As this is effectively what happened (because the typical nonrandomized intervention for medical arm patients was PTCA), this sensitivity analysis is targeting the treatment policy estimand. We did this to provide an empirical evaluation of the utility of our approach: if our method works as intended, our sensitivity analysis is targeting the ITT estimand, and we expect it to give similar results to the original ITT analysis. Should our approach give clinically sensible results when we can cross-check it with actual data, we will have more confidence in applying it in typical applications, where such cross-checking is not possible.
In the next section we define information anchored sensitivity analysis.

INFORMATION ANCHORED SENSITIVITY ANALYSIS
If we consider a two-arm clinical trial such as the RITA-2 trial, the aim of a sensitivity analysis is to assess the behavior of the treatment estimate under alternative, clinically plausible, CNAR scenarios for post-censoring data. Since in RITA-2 post-censored patients in both treatment arms underwent PTCA, we propose using RBMI to model the post-censoring behavior based on the PTCA arm; patients "jump" from the medical arm to the PTCA arm. Therefore, we estimate the event time distribution of the PTCA arm in the usual manner, and use this to impute potential post-censoring behavior on the medical arm. However, variance estimators for the primary analysis may behave in an unexpected manner under certain sensitivity analysis scenarios. Indeed, there are examples in which the usual variance estimator with a reference-based method decreases as the proportion of missing values increases (refer to Appendix A of Data S1). Such counter-intuitive properties would undermine our confidence in the approach and of course would reward trialists for losing data! Therefore, to justify the use of RBMI it is important to verify that this is not the case by quantifying the amount of statistical information available in the sensitivity analysis relative to the primary analysis. This will establish whether the sensitivity analysis is injecting new information, or taking away information, relative to the primary analysis. Cro et al. (2019) proposed the principle of information anchoring to address this, which we now formally define in the context of time-to-event data.
Consider a clinical trial in which time-to-event data, denoted by $Y$, are collected from patients in order to estimate a treatment effect $\theta$. We denote the data from those patients experiencing the event by $Y_{obs}$, and the data of those censored by $Y_{cens}$. We make a primary set of assumptions, for example, that all censored patients are "censored at random" (CAR). The estimate of $\theta$ under this primary assumption is denoted by $\hat{\theta}_{obs,CAR}$, which, for a valid estimate under CAR, is calculated using both the observed event and censoring times.
Furthermore, let us assume that we are able to observe a realization of the actual event times for the censored patients, $Y_{cens-realized,CAR}$, under the primary assumption of CAR. Of course, this is a hypothetical construct, but it will help to frame the definition of information anchoring.
Taking together the observed data, $Y_{obs}$, and the realization of the event times for the censored patients, $Y_{cens-realized,CAR}$, we obtain a full data set under the primary CAR assumption. We define $\hat{\theta}_{full,primary}$ to be the corresponding estimate of $\theta$ after fitting the primary analysis model to this full dataset.
For the sensitivity analysis, we make a different set of assumptions concerning the distribution of post-censoring data, that is, scenarios in which censoring is assumed to be informative (i.e., censored not at random).
Defined in the same manner as for the primary analysis, for the sensitivity analysis we have $\hat{\theta}_{obs,sensitivity}$ and $\hat{\theta}_{full,sensitivity}$, whereby "full" is defined analogously, assuming we are able to observe the posited post-censoring behavior from our hypothetical construct denoted $Y_{cens-realized,sens}$, associated with a specific set of assumptions for the sensitivity analysis (e.g., "jump to PTCA arm").
Furthermore, we define the observed information about $\theta$ under the primary and sensitivity analyses by $I(\cdot)$. Since there is less information when there are censored data, we would expect the ratios $I(\hat{\theta}_{full,primary}) / I(\hat{\theta}_{obs,primary}) > 1$ and $I(\hat{\theta}_{full,sensitivity}) / I(\hat{\theta}_{obs,sensitivity}) > 1$. Information anchored sensitivity analysis is defined as:
$$\frac{I(\hat{\theta}_{full,primary})}{I(\hat{\theta}_{obs,primary})} = \frac{I(\hat{\theta}_{full,sensitivity})}{I(\hat{\theta}_{obs,sensitivity})}, \qquad (1)$$
so that the proportion of information lost due to missing data is constant across primary and sensitivity analyses. If Equation (1) above holds then we say that the sensitivity analysis is information anchored with regard to the primary analysis (Cro et al., 2019). In the next section, we show that for a two-arm clinical trial, and assuming a Tobit imputation model, RBMI for a time-to-event outcome is information anchored. We do this by estimating each of the components of Equation (1) above. We begin by defining the clinical trial setting and distributional assumptions, then provide variance estimates assuming fully observed behavior, before moving on to estimate the components under CAR, and finally provide the results for the specific informative censoring scenario of interest ("jump to reference").
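The information anchoring condition can be made concrete with a few lines of code. The sketch below is illustrative only: it takes information to be the inverse of a variance estimate, as stated in the introduction, and the function names, the tolerance `rel_tol`, and the numerical variances are all hypothetical.

```python
def information(variance):
    """Information is taken to be the inverse of the estimated sampling variance."""
    return 1.0 / variance


def information_ratio(var_full, var_obs):
    """I(full) / I(obs); this exceeds 1 when censoring loses information."""
    return information(var_full) / information(var_obs)


def anchored_variance(var_full_primary, var_obs_primary, var_full_sens):
    """Variance the sensitivity analysis needs for the two ratios to be equal."""
    return var_full_sens * var_obs_primary / var_full_primary


def is_information_anchored(var_full_primary, var_obs_primary,
                            var_full_sens, var_obs_sens, rel_tol=0.05):
    """Compare the primary and sensitivity information ratios within a tolerance."""
    primary = information_ratio(var_full_primary, var_obs_primary)
    sensitivity = information_ratio(var_full_sens, var_obs_sens)
    return abs(primary - sensitivity) <= rel_tol * primary


# Hypothetical variances, chosen only to illustrate the check.
print(information_ratio(0.040, 0.050))
print(anchored_variance(0.040, 0.050, 0.036))
print(is_information_anchored(0.040, 0.050, 0.036, 0.045))
```

The helper `anchored_variance` simply rearranges the anchoring equality to give the sensitivity-analysis variance that would hold the information loss constant.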

Clinical trial setting with time-to-event data
We consider a two-arm clinical trial in which patients are randomized either to a reference or active treatment arm. For example, the reference arm could be a placebo or standard of care treatment.
We assume the times from patient randomization to when an event occurs, typically death or treatment failure, are log normally distributed. For the primary endpoint we wish to estimate the difference in the log normal survival times between the reference and active treatment arms.
In addition, we also measure a baseline covariate per patient at the time of randomization. Furthermore, we assume that, due to randomization, the mean and variance of the baseline covariate are the same in both arms.
Patients are right censored if they deviate from the study protocol, or do not experience the event before the end of the study period. In addition, and without loss of generality, for ease of exposition we assume that patients are only censored on the intervention arm, and therefore those on the reference arm always experience the outcome event of interest (i.e., patients are fully observed). We assume the baseline covariate is fully observed on both arms, and further, that the log event times and baseline covariate are bivariate normally distributed.
As noted above, the treatment effect of interest is the difference in mean log time-to-event between the trial arms, and we will test if this difference is statistically significant at the 5% level using a standard t-test with shared sample variance estimate.
Throughout, we assume a single baseline covariate, but the approach generalizes to multivariable scenarios. Also note that we assume 1:1 randomization between the trial arms, so in expectation there are equal numbers of patients in the reference and active treatment arms.
Using the subscript $r$ for patients on the reference arm, we therefore have baseline covariate measurements $Y_{rj1}$ for the $j = 1, \ldots, n_r$ patients, and their log event times $Y_{rj2}$:
$$\begin{pmatrix} Y_{rj1} \\ Y_{rj2} \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_{r2} \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{r22} \end{pmatrix} \right),$$
where the subscript $r$ denotes the reference arm, and the baseline mean $\mu_1$ is common between the reference and active arms. Using the subscript $a$ for the active treatment arm, we now differentiate between two sets of patients, those whose event times are observed (subscript $o$), and those whose event times are censored (subscript $d$, for deviating from protocol). Throughout, we assume that $n_d$ of the $n_a$ patients on the active treatment arm are censored at some fixed time point $\tau$, and the remaining $n_o$ are followed until they have the event, so that $n_o + n_d = n_a$. The choice of a common censoring time allows the use of standard results for the truncated normal distribution in the derived expressions.
We again assume bivariate normality for the baseline covariate and event times, with mean log event time $\mu_{a2}$ for those observed and $\mu_{d2}$ for those censored. For the primary analysis we assume censoring is at random, so that $\mu_{d2} = \mu_{a2}$. For the sensitivity analysis, we make the assumption of "Jump to Reference" (J2R) for those censored on the active arm.
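To make the setting concrete, the following sketch simulates one trial of this form using only the Python standard library. It is an illustration under stated assumptions rather than the simulation code used in this paper: the parameter values, the seed, and the names `draw_patient` and `simulate_trial` are hypothetical, and censoring is applied at a single fixed log time `tau` on the active arm only.

```python
import random
from math import sqrt


def draw_patient(rng, mu1, mu2, s11, s12, s22):
    """Draw (baseline, log event time) from a bivariate normal distribution."""
    baseline = rng.gauss(mu1, sqrt(s11))
    # Conditional distribution of the log event time given the baseline value.
    cond_mean = mu2 + (s12 / s11) * (baseline - mu1)
    cond_sd = sqrt(s22 - s12 ** 2 / s11)
    return baseline, rng.gauss(cond_mean, cond_sd)


def simulate_trial(n, mu1, mu_r2, mu_a2, s11, s12, s22, tau, seed=2024):
    """Reference arm fully observed; active arm right censored at log time tau."""
    rng = random.Random(seed)
    reference = [draw_patient(rng, mu1, mu_r2, s11, s12, s22) for _ in range(n)]
    active = []
    for _ in range(n):
        baseline, log_time = draw_patient(rng, mu1, mu_a2, s11, s12, s22)
        censored = log_time > tau
        # A censored patient contributes only the censoring time tau.
        active.append((baseline, min(log_time, tau), censored))
    return reference, active


reference, active = simulate_trial(n=500, mu1=0.0, mu_r2=1.0, mu_a2=1.5,
                                   s11=1.0, s12=0.5, s22=1.0, tau=2.5)
n_d = sum(1 for _, _, censored in active if censored)
n_o = len(active) - n_d
print(f"n_o = {n_o}, n_d = {n_d}")
```

With these illustrative values the censoring point sits one standard deviation above the active-arm mean, so roughly 16% of active-arm patients are expected to be censored.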
We estimate the parameters of the unobserved data distribution from the reference arm, and use this to impute event times for censored patients on the active treatment arm, so that, post-censoring, these patients "jump to reference" (for details refer to Atkinson et al., 2019). An example of this post-censoring behavior is the "jump to PTCA" sensitivity analysis scenario noted above in the context of the RITA-2 trial. More formally, defining $S(t)$ to be the usual survival function, and assuming a patient is censored on the active arm at a time point $T_i$, in such a "jump to reference" scenario from this point on we use the survival function on the reference arm to impute an event time. So that, for $t > T_i$,
$$S_{post}(t \mid t > T_i) = S_r(t \mid t > T_i),$$
where the subscript $post$ denotes the post-censoring distribution and the subscript $r$ the reference arm. This expression is the basis for the multiple imputation process for the sensitivity analysis.
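As an illustration of this imputation step, the sketch below draws a post-censoring log event time from a normal distribution with reference-arm parameters, conditional on exceeding the censoring time. It is a simplified example under stated assumptions: it ignores the baseline covariate, treats the reference-arm mean and standard deviation as known, and uses inverse-CDF sampling; the function name `draw_j2r_event_time` and all numerical values are hypothetical.

```python
import random
from statistics import NormalDist


def draw_j2r_event_time(rng, censor_time, ref_mean, ref_sd):
    """Draw from N(ref_mean, ref_sd^2) conditional on exceeding censor_time."""
    ref = NormalDist(ref_mean, ref_sd)
    lower = ref.cdf(censor_time)               # probability mass below the censoring time
    u = lower + (1.0 - lower) * rng.random()   # uniform on [lower, 1)
    u = min(max(u, 1e-12), 1.0 - 1e-12)        # keep inv_cdf away from 0 and 1
    # Guard against rounding so the imputed time never precedes censoring.
    return max(ref.inv_cdf(u), censor_time)


rng = random.Random(11)
imputed = [draw_j2r_event_time(rng, censor_time=2.5, ref_mean=1.0, ref_sd=1.0)
           for _ in range(5)]
print(imputed)
```

Inverse-CDF sampling is used here for brevity; redrawing from the untruncated distribution until the value exceeds the censoring time targets the same conditional distribution.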
Since we are using a reference-based sensitivity analysis method, post imputation we retain the primary analysis method, the t-test, for the difference in the log normal survival times, and apply this to each of the multiply imputed datasets created under the J2R assumption for post-censoring behavior, and combine the estimates in the usual way with Rubin's rules.
Now, to confirm that the principle of information anchored sensitivity analyses holds for this scenario we require the following equality to hold, at least approximately:
$$\frac{I(\hat{\theta}_{full,CAR})}{I(\hat{\theta}_{obs,CAR})} \approx \frac{I(\hat{\theta}_{full,J2R})}{I(\hat{\theta}_{obs,J2R})}, \qquad (2)$$
so that the proportion of information lost due to missing data is constant across primary and sensitivity analyses. We build this result step by step, evaluating each of the components in the above equation in turn. We begin with the left-hand side of Equation (2), deriving the information ratio $I(\hat{\theta}_{full,CAR}) / I(\hat{\theta}_{obs,CAR})$. In the next subsection, we derive the upper part of this expression, when the data are fully observed for the primary analysis, $I(\hat{\theta}_{full,CAR})$. In Section 4.3, we estimate the variance following multiple imputation under CAR, which gives $I(\hat{\theta}_{obs,CAR})$. Putting these two components together, we derive an expression for the left-hand side of Equation (2) in Section 4.4. The right-hand side of the equation, $I(\hat{\theta}_{full,J2R}) / I(\hat{\theta}_{obs,J2R})$, is then derived for the J2R sensitivity analysis scenario in Section 4.5. Finally, we compare the two ratios in Section 4.6.

Estimation of $V(\hat{\theta}_{full,CAR})$
On the active treatment arm, let us assume that we are able to observe a realization of the censored log event times under the primary assumption of CAR, $Y_{cens-realized,CAR}$, and we then combine this with the fully observed data, $Y_{obs}$, forming a full dataset under CAR. Fitting the primary analysis model to this dataset leads to a treatment estimate $\hat{\theta}_{full,CAR}$, the subscript making the CAR setting explicit. We start by deriving the expression for $V(\hat{\theta}_{full,CAR})$, the variance of this estimate. Conditioning on $n_d$, the number of patients censored, the expected treatment effect is a weighted average of the mean time to censoring, and the mean time to event for those fully observed, compared to the mean survival time for all fully observed patients on the reference arm. The variance of this estimate for the fully observed data is given by Equation (3). Now, we additionally assume equal numbers of patients in both arms of the study, $n_r = n_a = n$, and equal variance in both arms, so that $\sigma_{r22} = \sigma_{a22} = \sigma_{22}$. The expected value of this expression under repeated draws, with the same underlying CAR assumption, is
$$E[V(\hat{\theta}_{full,CAR})] = \frac{2\sigma_{22}}{n}.$$
This establishes the expression for the top part of the left-hand side of Equation (2). In the next section, we calculate the post multiple imputation estimate for Rubin's variance under CAR.
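The expected value of the full-data variance can be checked by simulation. The sketch below assumes the variance of the difference in arm means is estimated as the sum of the two sample variances divided by their arm sizes, which with equal arm sizes coincides with the pooled estimate and has expectation $2\sigma_{22}/n$; the helper names, seed, and parameter values are illustrative.

```python
import random


def sample_variance(values):
    """Unbiased sample variance (divisor len(values) - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def variance_of_difference(reference, active):
    """Estimated variance of the difference in arm means."""
    return (sample_variance(reference) / len(reference)
            + sample_variance(active) / len(active))


def average_variance_estimate(n, sigma22, reps=2000, seed=7):
    """Average the variance estimate over repeated fully observed trials."""
    rng = random.Random(seed)
    sd = sigma22 ** 0.5
    total = 0.0
    for _ in range(reps):
        reference = [rng.gauss(1.0, sd) for _ in range(n)]
        active = [rng.gauss(1.5, sd) for _ in range(n)]
        total += variance_of_difference(reference, active)
    return total / reps


n, sigma22 = 50, 4.0
print(average_variance_estimate(n, sigma22), 2 * sigma22 / n)
```

The two printed numbers should agree closely, the first being the Monte Carlo average and the second the theoretical value.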

Tobit imputation model
To establish the properties of multiply imputed data on the active treatment arm, consistent with our underlying normal model, we use a Tobit model fitted to the observed data as the imputation model (Greene, 2003; Tobin, 1958). Based on estimates from this fitted model, we impute events for the censored patients, and finally derive the variance of the combined sets of observed and imputed data using Rubin's rules.
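For readers less familiar with the Tobit model, the sketch below writes out its log-likelihood for right censored data with a single covariate: observed log times contribute a normal density term, and censored ones contribute the probability of exceeding the censoring time. It is a minimal illustration with hypothetical names (`tobit_loglik` and its arguments); in practice the likelihood would be maximized numerically, since no closed-form solution exists.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tobit_loglik(beta0, beta1, sigma, covariate, log_time, censored):
    """Log-likelihood of a right censored normal regression (Tobit) model."""
    total = 0.0
    for x, y, is_censored in zip(covariate, log_time, censored):
        z = (y - beta0 - beta1 * x) / sigma
        if is_censored:
            # Censored at y: contributes P(Y > y) = 1 - Phi(z) = Phi(-z).
            total += log(STD_NORMAL.cdf(-z))
        else:
            # Observed event: contributes the normal density at y.
            total += log(STD_NORMAL.pdf(z) / sigma)
    return total


# One observed and one censored patient, both at log time 0 (toy values).
print(tobit_loglik(0.0, 0.0, 1.0, [0.0, 0.0], [0.0, 0.0], [False, True]))
```

Writing the censored contribution as `cdf(-z)` rather than `1 - cdf(z)` is a small numerical choice that behaves better in the upper tail.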
We now calculate the estimate of the variance under the CAR assumption following multiple imputation of the truncated (censored) data from the fitted Tobit model. We assume the estimates $\hat{\beta}$ from the fitted imputation model are normally distributed. Although MI is formally Bayesian, we assume the observed data dominate the posterior distribution, and, using inferential arguments set out on pp. 56-60 of Carpenter and Kenward (2013), we can, without any important loss of generality, assume the variance of the data is known in the imputation model. Accordingly, the multiple imputation process is as follows:
1. Fit the Tobit model to the observed data on the active arm, regressing the log normal survival times $Y_{aj2}$ on the baseline covariate $Y_{aj1}$ and including the appropriate likelihood terms for those censored at time $\tau$, resulting in the maximum likelihood estimates $\hat{\beta}_0$, $\hat{\beta}_1$, and the estimate of the residual variance $\hat{\sigma}^2_{2.1}$.
2. Approximate a draw from the Bayesian posterior distribution assuming a noninformative prior by drawing $\tilde{\sigma}^2_{2.1} = (n_o - 2)\hat{\sigma}^2_{2.1} / X$, where $X \sim \chi^2_{n_o - 2}$. We assume the model estimates are bivariate normally distributed with mean $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1)^T$ and covariance matrix $V$; we then take a draw from $MVN(\hat{\beta}, V)$, resulting in a vector of estimates $(\tilde{\beta}_0, \tilde{\beta}_1)$.
3. Impute the censored observations using the resulting regression model with one set of the estimates $(\tilde{\beta}_0, \tilde{\beta}_1)$, repeating this process until the imputed event time is greater than $\tau$, resulting in a complete dataset with no censored observations.
4. Repeat the above steps $K$ times, resulting in $K$ complete data sets.
5. Perform the substantive analysis, in this case the t-test, on each of the $k = 1, \ldots, K$ complete datasets in turn, resulting in estimates $\hat{\theta}_k$, $\hat{\sigma}^2_k$, which we combine to form overall estimates using Rubin's rules in the usual manner. The MI estimate of $\theta$ is $\hat{\theta}_{MI,CAR} = \frac{1}{K} \sum_{k=1}^{K} \hat{\theta}_k$. Rubin's variance estimator is defined as:
$$V(\hat{\theta}_{MI,CAR}) = \hat{W} + \left(1 + \frac{1}{K}\right)\hat{B}, \quad \hat{W} = \frac{1}{K}\sum_{k=1}^{K}\hat{\sigma}^2_k, \quad \hat{B} = \frac{1}{K-1}\sum_{k=1}^{K}\left(\hat{\theta}_k - \hat{\theta}_{MI,CAR}\right)^2.$$
This is the standard MI procedure we would follow for the primary analysis assuming CAR.
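The final combination step can be sketched in a few lines. The function below applies Rubin's rules to a list of per-imputation estimates and their variances; the name `rubins_rules` and the toy inputs are illustrative.

```python
from statistics import mean, variance


def rubins_rules(estimates, variances):
    """Pool K completed-data estimates and variances with Rubin's rules."""
    k = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance, divisor K - 1
    total_variance = within + (1 + 1 / k) * between
    return pooled_estimate, total_variance


# Toy inputs: three imputations with identical within-imputation variance.
estimate, total = rubins_rules([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, total)
```

The total variance is the within-imputation variance inflated by the between-imputation variance, with the factor $(1 + 1/K)$ accounting for the finite number of imputations.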
Now, to derive an estimate of Rubin's variance analytically we have to take a slightly different approach since there is no closed form solution to calculate the maximum likelihood estimators for the Tobit model.

Estimation of $V(\hat{\theta}_{MI,CAR})$
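We begin by briefly presenting existing results for truncated normal distributions which we will subsequently use. Recall that $n_d$ patients on the active arm are censored at a randomly defined, but fixed, time point $\tau$. Then, for the $j = 1, \ldots, n_d$ patients censored at $\tau$,
$$\mu_{a2d} = E[Y_{aj2} \mid Y_{aj2} > \tau] = \mu_{a2} + \sqrt{\sigma_{22}} \left[ \frac{\phi(\alpha)}{1 - \Phi(\alpha)} \right], \quad \alpha = \frac{\tau - \mu_{a2}}{\sqrt{\sigma_{22}}}, \qquad (6)$$
with $\phi$ and $\Phi$ being the density and CDF of the standard normal distribution, respectively. The fraction part (in large square brackets) of the expression above is known as the inverse Mills ratio (Greene, 2003), here denoted $\lambda_d$, and, for the $n_o$ patients observed before $\tau$,
$$\mu_{a2o} = E[Y_{aj2} \mid Y_{aj2} < \tau] = \mu_{a2} - \sqrt{\sigma_{22}} \left[ \frac{\phi(\alpha)}{\Phi(\alpha)} \right]. \qquad (7)$$
Note that the expression in Equation (6) is just the "usual" expected value, with an additional term $\sqrt{\sigma_{22}}\,\lambda_d$, where $\lambda_d$ is treated as a constant for specific values of $\mu_{a2}$, $\sigma_{22}$, and $\tau$. Analogously, we also define:
$$\sigma_{a2d} = Var(Y_{aj2} \mid Y_{aj2} > \tau) = \sigma_{22}\left[1 + \alpha\lambda_d - \lambda_d^2\right], \qquad (8)$$
using the standard expression for the variance of the truncated normal distribution, and analogously, with $\lambda_o = \phi(\alpha)/\Phi(\alpha)$:
$$\sigma_{a2o} = Var(Y_{aj2} \mid Y_{aj2} < \tau) = \sigma_{22}\left[1 - \alpha\lambda_o - \lambda_o^2\right]. \qquad (9)$$
Without loss of generality, the truncation limit is assumed to be greater than the mean throughout the analysis.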
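The standard truncated normal results referred to above can be evaluated directly. The sketch below computes the mean and variance of a normal variable conditional on lying above, or below, a truncation point, using the inverse Mills ratio; the function names and the example parameter values are illustrative, and `tau` stands for the common censoring time on the log scale.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def left_truncated_moments(mean, variance, tau):
    """Mean and variance of Y ~ N(mean, variance) conditional on Y > tau."""
    sd = sqrt(variance)
    alpha = (tau - mean) / sd
    mills = STD_NORMAL.pdf(alpha) / (1.0 - STD_NORMAL.cdf(alpha))  # inverse Mills ratio
    return mean + sd * mills, variance * (1.0 + alpha * mills - mills ** 2)


def right_truncated_moments(mean, variance, tau):
    """Mean and variance of Y ~ N(mean, variance) conditional on Y < tau."""
    sd = sqrt(variance)
    alpha = (tau - mean) / sd
    ratio = STD_NORMAL.pdf(alpha) / STD_NORMAL.cdf(alpha)
    return mean - sd * ratio, variance * (1.0 - alpha * ratio - ratio ** 2)


# Illustrative values: censoring one standard deviation above the mean.
mean_cens, var_cens = left_truncated_moments(1.5, 1.0, 2.5)
mean_obs, var_obs = right_truncated_moments(1.5, 1.0, 2.5)
print(mean_cens, var_cens, mean_obs, var_obs)
```

A useful sanity check is that weighting the two conditional means by the probabilities of being censored and observed recovers the untruncated mean.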
We now derive the variance estimate under CAR. The direction we take is to write down an expression for a "typical" multiply imputed event time, and then work from there to derive an expression for Rubin's variance estimate. To do this, we combine our knowledge of the observed data, the properties of the bivariate normal distribution (see Appendix B of Data S1), and the results for the truncated normal distribution stated above.
The imputation model for the $j$th of the $n_d$ censored values, from the $k$th of the $K$ imputed datasets, is defined in terms of $\tilde{Y}_{a2o,k}$, the expected value of the observed survival times for the censored patients, together with a draw $\tilde{\sigma}^2_{2.1,k}$ of the residual variance, where $\hat{\sigma}^2_{2.1}$ is the estimate of the residual variance from the fitted Tobit model or, equivalently, using the properties of bivariate normality, $\hat{\sigma}^2_{2.1} = \hat{\sigma}_{22} - \hat{\sigma}^2_{12}/\hat{\sigma}_{11}$; the coefficient $\hat{\beta}_k$ of the regression model is $\hat{\sigma}_{12}/\hat{\sigma}_{11}$, where $\hat{\sigma}_{22}$ is the sampling variance of the log survival times on the active treatment arm. We note at this point that we use the $\chi^2$-distribution here as an estimate of the sampling variance of the standard deviation of $\hat{\sigma}_{22}$, $\sqrt{\hat{\sigma}_{22}}$. Finally, the error term is distributed as
$$\epsilon_{j,k} \mid \bar{Y}_{a2d,k}, \tilde{\sigma}^2_{2,k} \sim Tr\left(0, \tilde{\sigma}_{a2d,k}, a = (\tau - \bar{Y}_{a2d,k})\right),$$
with $\tilde{\mu}_{a2d,k}$ the expected value of the unobserved survival times for the censored patients. The right-hand side of the above expression denotes the truncated normal distribution with mean 0 and variance $\tilde{\sigma}_{a2d,k}$, truncated on the left-hand side at $(\tau - \bar{Y}_{a2d,k})$; we use this relocation so that the mean of this expression is centered at zero, with the variance as we require, and we ensure that multiply imputed events are greater than the original censoring time for patient $j$.
To estimate the variance for both the observed and multiply imputed data, we have to evaluate Equation (3) in Section 4.2. Since there is no missingness on the reference arm, the first part of this expression can be calculated directly. For the second part of Equation (3), pertaining to the active treatment arm, we decompose the summation into observed and censored parts, substituting our new expressions for $\tilde{Y}_{aj2,k}$ and $\bar{Y}_{a2,k}$ (for the latter, refer to Appendix C of Data S1), which gives Equation (10). Now, to calculate the above expression following multiple imputation we need to consider both the components of Rubin's variance estimator, that is, the between and within imputation variance estimates (refer to Equations (4) and (5)).
Proposition 1.
Letting $(n_a - 1) \approx n_a$ and $K \to \infty$, the expected value of Rubin's variance estimate for the active treatment arm under CAR can be written in closed form, in which the two $w_k$ terms are normally distributed variables with mean 0 that encompass the variability of the inverse Mills ratio terms in Equation (8).
To arrive at the pooled variance of the treatment difference under CAR following MI, we just add this expression to the variance for the reference arm.
Proof. Refer to Appendices C and D. ▪

Estimation of $V(\hat{\theta}_{MI,CAR}) / V(\hat{\theta}_{full,CAR})$
We now have the building blocks for the first result, concerning the information ratio $I(\hat{\theta}_{full,primary}) / I(\hat{\theta}_{obs,primary})$, which, under the primary assumption of CAR, is rewritten as $I(\hat{\theta}_{full,CAR}) / I(\hat{\theta}_{obs,CAR})$. In Equation (3) of Section 4.2 we defined an estimate of $1/I(\hat{\theta}_{full,CAR})$ for the hypothetical case of no censoring. In the previous section, the expression for $E[V(\hat{\theta}_{MI,CAR})]$ was obtained, which is an estimate of $1/I(\hat{\theta}_{obs,CAR})$. Therefore, the required ratio on the left-hand side of Equation (2) is estimated by the ratio of these two variances.
Lemma 1. The ratio $I(\hat{\theta}_{full,CAR}) / I(\hat{\theta}_{obs,CAR})$ of the information in the full data relative to that in the incomplete data following multiple imputation assuming CAR, using the asymptotic expressions for Rubin's variance estimator as $K$ tends to infinity, is bounded above. The bound assumes $n = n_a = n_r$ and depends on the censoring level $\pi_d$, on $\rho^2$, which is the squared correlation between the baseline measurements and the event times, and on $C$, the variance of the standard deviation of $\sigma_{22}$, with $N = n_a$ (e.g., p. 171 of Kenney & Keeping, 1951).
Proof. Refer to Appendix E of Data S1. ▪
For the principle of information anchoring to hold, the ratio assuming CAR shown above should be, at least approximately, the same numerically as that for the sensitivity analysis following multiple imputation under the Jump to Reference assumption for censored patients.

Estimation of $V(\hat{\theta}_{MI,J2R}) / V(\hat{\theta}_{full,J2R})$
Under J2R, the $n_d$ censored patients obtain multiply imputed event times based on the reference-arm hazard. This has the effect of reducing the difference between the estimated event times on the two arms, since we now have $n_d$ additional observations generated under the hazard from the reference arm. We referred to this phenomenon in our introductory comments as the "dilution" or "mixing" effect.
Lemma 2. The ratio of the information in the full data relative to the incomplete data under the assumption of J2R, $I(\hat{\theta}_{full,J2R}) / I(\hat{\theta}_{obs,J2R})$, is built from the components given in Equations (15) and (16), where $\Delta_c = \mu_{a2d} - \mu_{a2o}$, with the inverse Mills ratio calculated assuming $N(\mu_{r2}, \sigma_{22})$.
Proof. For the derivation of Equations (15) and (16), refer to Data S1. ▪
When only the leading-order terms are retained, the expression for the variance under J2R simplifies to one quite similar to that derived for the CAR case. In fact, both expressions are dominated by the first term enclosed in brackets, but the expression under J2R starts with a term in $\sigma_{22}/n$ rather than $2\sigma_{22}/n$, which was the first term of the analogous expression under the CAR assumption (see Proposition 1).
Again, this makes sense because the $n_d$ censored observations have been replaced with new event times of a similar magnitude to those on the reference arm (in terms of the hazard). Therefore, and in line with what might be expected, the variability in the difference between the event times of both arms is somewhat reduced due to the hazard dilution effect.

Information anchoring under Jump to Reference
Equations (15) and (16) provide the building blocks for the main result concerning information anchoring.

Theorem 1. For bivariate log normally distributed right censored event times, the variance estimate, $E[V(\hat{\theta}_{MI,J2R})]$, is approximately information anchored.
Proof (outline). We hypothesize that despite using the J2R approach for sensitivity analysis, the variance inflation due to missing data following MI is the same as that under CAR. We compare the expression for the estimated variance under J2R in Equation (15), $E[V(\hat{\theta}_{MI,J2R})]$, with the predicted variance under J2R, $E[V_{anchored}]$. The latter we calculate using the other three terms in the equality in Lemma 2, which we recall, relates the ratios for information anchoring to hold, Therefore, using the expressions for the three terms on the right hand side, which we know from earlier calculations, we can obtain the predicted information anchored variance, Now, if we subtract the predicted term $E[V_{anchored}]$ above from the newly derived expression for $E[V(\hat{\theta}_{MI,J2R})]$ in Equation (15), we will obtain an estimate of the difference, which, if information anchoring holds, should be rather small numerically.
We obtain the following expression: where we only consider terms greater than or equal to O The upper bound on the difference in Equation (18) is dominated in absolute magnitude by the first two terms, and the negative ones in $\sigma_{a2o}$ and $\Delta_c^2$. Focussing only on these terms, we see that the difference depends on the number of patients on each arm ($n$), the censoring level ($\pi_d$), the variance of the data at time 2 ($\sigma_{22}$), the variance of those observed ($\sigma_{a2o}$), the correlation between measurements at times 1 and 2 ($\rho^2$), the difference between the mean of those observed and those deviating on the active arm at time 2 ($\Delta_c$), and the inverse Mills ratio relating to the censoring point .
Using the same argument as Cro et al. (2016), we can apply standard trial size calculation arguments to provide a realistic upper bound on $\Delta_c$, assuming, for example, 80% power and 5% significance: The first term in Equation (18)  Since $\pi_d \ll 0.5$ can be expected for many sensitivity analyses, the whole expression is of the order of approximately 10% of the total variance $\sigma_{22}$.
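The standard trial size argument referred to above can be sketched as follows. This is the textbook two-arm calculation for a normally distributed outcome with equal allocation, not a reproduction of the bound derived in this paper. It returns the difference in means that a trial with a given variance and per-arm sample size is powered to detect. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(variance, n_per_arm, power=0.80, alpha=0.05):
    """Difference in means detectable with the stated power (two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)               # about 0.84 for 80% power
    # From n = 2 * variance * (z_alpha + z_beta)^2 / delta^2, solved for delta.
    return (z_alpha + z_beta) * sqrt(2.0 * variance / n_per_arm)


# Values matching the simulation study: variance 0.6, 250 patients per arm.
delta = detectable_difference(variance=0.6, n_per_arm=250)
```

With a variance of 0.6 and 250 patients per arm, the detectable difference is roughly 0.19 on the log-time scale, which gives a sense of the magnitude that a realistic bound of this kind would take.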
Therefore, we conclude that the upper bound on the difference is small in comparison to the absolute information anchored variance, and the principles of information anchoring have been approximately upheld following MI under J2R, confirming the proposition in Lemma 2 and Theorem 1 above. ▪
This final result concludes the presentation of the theoretical results for information anchoring for censored data. In the following sections, we validate these results using a simulation study, and then illustrate their application using simulated data closely modeled on the RITA-2 trial data.
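The anchoring relation underlying Theorem 1 can be expressed as a short calculation. The sketch below assumes the definition of information anchoring as equality of the full-to-observed information ratio across the primary and sensitivity analyses, with information taken as the reciprocal of the variance. The function names and the numerical inputs are hypothetical and are not results from this paper.

```python
def information_ratio(v_full, v_obs):
    """Ratio I(full) / I(obs), with information defined as 1 / variance."""
    return v_obs / v_full


def anchored_variance(v_full_sens, v_full_primary, v_obs_primary):
    """Variance the sensitivity analysis should have if it is information anchored.

    Solves I_full,primary / I_obs,primary = I_full,sens / I_obs,sens
    for the observed-data variance under the sensitivity assumption.
    """
    return v_full_sens * v_obs_primary / v_full_primary


# Hypothetical variances, for illustration only.
v_anchor = anchored_variance(v_full_sens=0.0040,
                             v_full_primary=0.0048,
                             v_obs_primary=0.0060)
```

Comparing `v_anchor` with the variance actually obtained from Rubin's rules under J2R gives the difference whose size the proof bounds.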

SIMULATION STUDY
We present the results of a simulation study which demonstrates the information anchoring results derived in the previous section. We began by assuming normally distributed baseline and time-to-event outcome data without any censoring, and then increased the level of censoring on the active treatment arm. We then compared the empirical results from multiply imputing events using the Tobit model ("simulation"), with those predicted using the theoretical results presented in the previous section ("theory").
The simulation study includes patients with a baseline measurement and event or censoring times generated from a bivariate normal distribution with means and covariances as follows: with a sample size $n = n_r = n_a = 250$ in each arm. We imputed new event times for those censored at a fixed time point using the CAR and J2R MI methodology presented earlier with 50 imputed datasets (refer to Appendix I of Data S1 for example R code). We used mean baseline measurements $\mu_{r1} = \mu_{a1} = 2.0$ on the reference and active treatment arms, respectively, and common variance $\sigma_{11} = 0.4$. There is no censoring on the reference arm, which has mean log survival time $\mu_{r2} = 2.0$, with $\mu_{a2}$ unknown due to censoring on the active treatment arm. Again, in line with our theoretical framework, the survival times have a known common variance $\sigma_{22} = 0.6$. Censoring was varied between 10% and 50% on the active treatment arm, with no censoring on the reference arm. We simulated 500 datasets. Table 1 presents the results. In the following, we have dropped the hat from the expressions to ease readability.
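A data-generating step of this kind can be sketched in a few lines. This is not the R code from Appendix I of Data S1. The active-arm mean `MU_A2` and the covariance `COV12` are hypothetical placeholders, since their values are not stated in the text above, and the fixed censoring point is chosen from the normal quantile that gives the target censoring proportion in expectation.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_arm(n, mu1, mu2, var1, var2, cov12, rng):
    """Draw n (baseline, log event time) pairs from a bivariate normal."""
    slope = cov12 / var1
    resid_sd = sqrt(var2 - cov12 ** 2 / var1)  # SD of time given baseline
    rows = []
    for _ in range(n):
        x1 = rng.gauss(mu1, sqrt(var1))
        x2 = rng.gauss(mu2 + slope * (x1 - mu1), resid_sd)
        rows.append((x1, x2))
    return rows


def censor_at(rows, cutoff):
    """Right-censor at a fixed cutoff; returns (baseline, time, observed)."""
    return [(x1, min(x2, cutoff), x2 <= cutoff) for x1, x2 in rows]


rng = random.Random(2024)
MU_A2 = 2.2      # hypothetical active-arm mean log event time
COV12 = 0.2      # hypothetical covariance between baseline and log event time
TARGET = 0.30    # target censoring proportion on the active arm

cutoff = NormalDist(MU_A2, sqrt(0.6)).inv_cdf(1.0 - TARGET)
reference = simulate_arm(250, 2.0, 2.0, 0.4, 0.6, COV12, rng)
active = censor_at(simulate_arm(250, 2.0, MU_A2, 0.4, 0.6, COV12, rng), cutoff)
```

Repeating this over many seeds and a grid of `TARGET` values would reproduce the structure, though not necessarily the numbers, of the study described here.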
Column (A) shows the theoretical estimate of the information anchored variance following MI under the assumption of J2R. In other words, we calculate each of the three elements on the right-hand side of Equation (19) below with a priori known summary statistics for the simulated data:
Column (B) shows the same theoretical estimate of Rubin's variance under the J2R assumption, $E[V_{MI,J2R}]$, but this time calculated using the expression for that term directly (i.e., the left-hand side of Equation (19) above), again using a priori values.
Column (C) uses the same calculation for information anchoring as defined in column (A) (i.e., the right-hand side of Equation (19)), but estimates each of the quantities following MI of simulated data (denoted by the wide hat), not using the theoretical expressions derived earlier:
Column (D) is the analogue of Column (B), but calculated using simulated data to estimate Rubin's variance under the J2R assumption, denoted $\hat{E}[V_{MI,J2R}]$.
Columns ( Following our theoretical expressions and assuming information anchoring holds, we would expect Column A ≈ Column B and Column C ≈ Column D. The results in Table 1 show excellent agreement when we compare the predicted difference using theoretically calculated values (column "Difference theory (A-B)"), with those from simulated data (column "Difference simulation (C-D)").
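For readers less familiar with how the simulation-based columns are obtained, Rubin's variance estimator combines the within-imputation and between-imputation variances across the K imputed datasets. The sketch below shows the standard pooling rule. The function name and the example inputs are hypothetical.

```python
from statistics import mean, variance


def rubin_pool(estimates, within_variances):
    """Pool K imputation-specific estimates and variances using Rubin's rules."""
    k = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(within_variances)   # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor k - 1)
    total_variance = within + (1.0 + 1.0 / k) * between
    return pooled_estimate, total_variance


# Hypothetical treatment-effect estimates from three imputed datasets.
estimate, total_var = rubin_pool([0.1, 0.2, 0.3], [0.01, 0.01, 0.01])
```

As K grows, the factor (1 + 1/K) tends to one, which is the limit used in the asymptotic expressions of the theory section.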
The discrepancies between the true information anchored variance and the MI estimate of it grow with the censoring level as we move down the table (column "Difference simulation (C-D)"), from 0.00002 at 10% censoring to 0.0002 at 50% censoring. These values are approximately of the same order of magnitude as the Monte Carlo simulation error (0.00016).
These results are in line with those for continuous longitudinal data presented in Cro et al. (2019), which were also based on asymptotic arguments.
We conclude that the simulation results are consistent with our expectations and our information anchoring arguments hold.
In the next section, we investigate whether information anchoring principles hold in a clinical trial setting, inspired by data from the RITA-2 trial.

ILLUSTRATIVE EXAMPLE BASED ON THE RITA-2 TRIAL
Here, we use a simulated dataset very closely modeled on that from the RITA-2 trial. Our theoretically derived expressions assume a common right censoring threshold, , for all patients. However, for the RITA-2 data this assumption does not hold, and therefore, instead of using the original data set directly, we have simulated data based on the descriptive statistics of the original data, and chosen to reproduce the level of censoring of the original data.
We generated bivariate normally distributed data according to the properties of the RITA-2 trial data, and chose to result in a censoring level of 27% on the medical arm, again as in the RITA-2 data. The baseline mean we define to be the same on both arms ($\mu_{medical} = \mu_{PTCA} = 0.94$). The fully observed log normal survival times on the PTCA arm have mean $\mu_{PTCA} = 1.75$. The "true" mean of the log time on the medical arm, $\mu_{r2}$, is unknown because of the censoring:̂m  that is, $\sigma_{a1} = \sigma_{r1} = \sigma_{11} = 0.15$, $\sigma_{a2} = \sigma_{r2} = \sigma_{22} = 0.22$, and $\sigma_{12} = -0.04$. For the medical arm, we also know the mean time of the observed events, $\mu_{a2o} = 1.60$, with associated variability, $\sigma_{a2o} = 0.11$. The end-point is the difference in mean log survival time for patients on the medical arm (standard of care, the reference arm) and those receiving PTCA (the active treatment arm). Furthermore, for the primary analysis we assume there is no censoring on the PTCA arm, and patients are censored at their first NRI on the medical arm.
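The mean and variability of the observed events on a censored arm are linked to the underlying normal distribution through the inverse Mills ratio. The sketch below gives the standard moments of a normal variable conditional on falling below a fixed censoring point. It is a textbook result offered for orientation and is not the derivation used in this paper. The function name and the example inputs are hypothetical.

```python
from statistics import NormalDist


def observed_moments(mu, sigma, cutoff):
    """Mean and variance of N(mu, sigma^2) conditional on being below `cutoff`."""
    std = NormalDist()
    a = (cutoff - mu) / sigma
    mills = std.pdf(a) / std.cdf(a)   # inverse Mills ratio for upper truncation
    mean_obs = mu - sigma * mills
    var_obs = sigma ** 2 * (1.0 - a * mills - mills ** 2)
    return mean_obs, var_obs


# Hypothetical check: truncating a standard normal at its own mean.
mean_obs, var_obs = observed_moments(mu=0.0, sigma=1.0, cutoff=0.0)
```

The observed-event mean is always below the untruncated mean and the observed-event variance is always below the untruncated variance, which is the pattern seen in the summary statistics quoted above.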
With this simulated dataset, and consistent with the original trial, we assume that medical arm patients censored because of a nonrandomized intervention "Jump to the PTCA arm." This allows us to compare the observed results (from analyzing the trial data) with (i) multiply imputed data and (ii) our theoretical results. Table 2 summarizes the parameter estimates and SDs, along with the results from the primary and sensitivity analyses.
Column [1] provides the descriptive statistics and treatment difference for the primary analysis assuming CAR. Column [2] summarizes the results of the original trial when patients were followed up irrespective of nonrandom interventions (i.e., "intention to treat"). Column [3] presents the results when censored patients are multiply imputed under "Jump to PTCA," again calculated using both the theoretically derived results and following multiple imputation.
Under Consistent with our theory and the simulation results, there is little practical difference between the theoretically predicted estimates using our calculated quantities, and those following multiple imputation. Furthermore, the estimates of Rubin's variance in the primary and sensitivity analysis are also very similar (final row of Table 2), which provides additional validation of our theoretical results. Plugging these and the other relevant values back into the information anchoring equation indicated that the sensitivity analysis was approximately information anchored, as expected.

DISCUSSION
Given the increasingly prominent role of sensitivity analysis in the analysis of clinical trials, exemplified by the ICH E9 addendum published in 2019 (CHMP, 2019), it is important to provide methods which are not only easy to implement and use, but which are also clinically plausible and contextually relevant to the trial team and other stakeholders. This also responds to the FDA mandated report by the U.S. National Research Council in 2010, which highlighted the lack of such sensitivity analysis methods (NRC, 2010). To address this, reference based multiple imputation methods have recently been extended for time to event data (Atkinson, 2019;Atkinson et al., 2019). However, there has been considerable discussion and numerous publications concerning the perceived conservativeness of Rubin's variance estimator when the imputation and analysis models are not congenial (Hughes et al., 2014;Kim, 2004;Nielson, 2003;Robins, Hernan, & Brumback, 2000), a situation which occurs with RBMI (Liu & Peng, 2016). Following the arguments set out in Cro et al. (2019), we believe it is important that sensitivity analyses are information anchored, so that as we move to the sensitivity analysis we do not inflate the statistical information about the treatment effect. As the theory and the simulations presented here show, this is achieved when we use Rubin's variance estimator. By contrast, in this setting, using the bootstrap (e.g., Liu & Peng, 2016) means that the information about the treatment effect is being increased as we move to the sensitivity analysis (so we are rewarded in effect for missing data). We argue the conservativeness of Rubin's variance estimator is of less concern compared to the requirement to ensure that the sensitivity analysis does not (implicitly or explicitly) increase the statistical information about the treatment effect. Therefore, we believe Rubin's variance estimator is more appropriate to use in sensitivity analyses of this type.
The information anchoring principle states that the statistical information should be held constant across the primary and sensitivity analyses, and that it should certainly not be increased, because (Cro et al., 2019) "an information positive sensitivity analysis is rarely justifiable, implying … that the more data are missing, the more certain we are about the treatment effect under the sensitivity analysis," and, "while information negative sensitivity analyses provides an incentive for minimizing the missing data, there is no natural consensus about the appropriate loss of information." We showed that if we wish to follow the information anchoring principle, reference-based sensitivity analysis implemented using multiple imputation is statistically appropriate for inference in trials with censored outcome data that can be modelled with the truncated normal distribution. A natural example of this is time-to-event data, where the survival times follow a log-normal distribution. We built our theoretical results based on using Tobit regression as the imputation model, and showed that, assuming log normal survival times and right censored data, information anchoring holds. We validated these results by simulation, with the results closely mirroring those obtained in the continuous longitudinal setting. As a reviewer pointed out, whilst we assumed no difference in the baseline covariate between the two arms, the framework would allow an adjusted analysis of variance.
A limitation of the approach is that we assume a fixed censoring point, primarily to enable us to use standard results for the theoretical work. Having a fixed censoring point is unusual in individual patient randomized trials, but can occur in cluster randomized trials where interventions start on a specific day, and is most often the case, by design, for emulated trials using observational data (Atkinson et al., 2020). Notwithstanding these examples, most clinical trials have rolling recruitment and stop at a fixed calendar time, which means that the censoring time is variable. Therefore, as proposed by a reviewer, we performed additional simulations with random censoring, and compared the results with those from the theory based on a fixed censoring point (refer to Supplementary Material Appendix J of Data S1). The results suggest that, at least empirically, our theory also holds with random censoring, although this remains to be formally confirmed.
A further potential limitation of the approach presented here is using the t-test, or analysis of covariance, to determine the mean log difference in the event times. The most common choice for the primary analysis model would be the Cox Proportional Hazards (CPH) model, with the hazard ratio over the total follow-up period defined as the treatment effect for the trial. However, it is important to recognize that the CPH model inherently assumes that the hazards are proportional, even though this is increasingly being challenged in many clinical settings (Royston & Parmar, 2011, 2013). The restricted mean survival time (RMST) is now frequently used instead of the hazard ratio as a preferable clinical endpoint. Although not always equivalent to the RMST, there are clear parallels between using the RMST and the mean log time to event used to calculate the endpoint in our theoretical framework. Nevertheless, an extension of the information anchoring theory to the proportional hazards setting would be beneficial.
We have focused solely on the "Jump to Reference" assumption (one of a number of proposals suggested in ), since this was the approach which, following mapping to the time to event setting, had a clear clinical context when assuming proportional hazards (refer to Atkinson et al., 2019).
In addition, our simulation results and those from a previous study (cf. Figure 3 of Atkinson et al., 2019) indicate that information anchoring holds for censoring levels of approximately 50% or less. We believe this is sufficient reassurance for practical use since we are not primarily intending this approach to be applied to administrative censoring (which we know is completely at random and may reach high proportions), but to censoring (for various reasons, typically treatment changes) in the course of the follow-up.
The information anchoring approximation is most accurate when randomization is 1:1. When this is not the case, the theory presented here suggests how to modify the procedure to retain information anchoring. For simplicity the theory presented here focuses on the case in which there is censoring in one arm. The approach could be extended for deviations in both arms in an analogous manner to that set out by Cro et al.
Approaches varying an explicitly defined sensitivity parameter, such as δ-methods, are often presented as an alternative to RBMI for performing sensitivity analysis. Such methods introduce a δ-size change to each patient's hazard at the censoring time point, which is varied to reflect an improvement or a deterioration in their post-censoring condition (Lipkovich, Ratitch, & O'Kelly, 2016;Lu, Li, & Koch, 2015). While attractive due to their simplicity, determining a meaningful range of values for δ has often proved to be a practical drawback. The process often requires iterative discussions within the trial team, and these can become difficult to conclude satisfactorily, especially when considering delta multipliers of a hazard or odds ratio, and how this may vary by treatment arm. Nevertheless, Cro et al. showed that the information anchoring principle holds for specific settings with delta methods, but this has yet to be demonstrated for time to event data. This is certainly an area for potential further study.
In summary, the attraction of RBMI sensitivity analysis methods is that they are accessible, that is, they are both simple to understand and straightforward to implement. Taking such an approach avoids the alternative in which we would have to explicitly model the event and censoring process, which is often rather complex to achieve in practice, even for experts in the field. We have demonstrated that RBMI in the time to event setting, albeit under specific distributional assumptions, is information anchored, providing both regulators and industry with confidence for using this approach for sensitivity analysis for time to event data.