Regression discontinuity designs for time‐to‐event outcomes: An approach using accelerated failure time models

Regression discontinuity designs (RDDs) have been developed for the estimation of treatment effects using observational data, where a treatment is administered using an externally defined decision rule, linked to a continuous assignment variable. Typically, RDDs have been applied to situations where the outcome of interest is continuous and non‐temporal. Conversely, RDDs for time‐to‐event outcomes have received less attention, despite such outcomes being common in many applications. We explore RDDs for a time‐to‐event outcome subject to right censoring. An accelerated failure time (AFT) approach is used to establish a treatment effect estimate for a fuzzy RDD (where treatment is not always strictly applied according to the decision rule). This estimation approach is robust to different levels of fuzziness and unobserved confounding, assessed using simulation studies and compares favourably to established structural AFT models. A motivating example is presented in which models are fitted to estimate the effect of metformin on mortality and cardiovascular disease rate using real observational data from UK Primary Care.


INTRODUCTION
Regression discontinuity designs (RDDs) are an approach to the estimation of a treatment effect in observational studies. An RDD can be applied to situations where a treatment or intervention is allocated according to an externally imposed and pre-specified guideline that is linked to the value of a continuous assignment variable and a threshold. The assignment variable is measured for a subject and the subject should receive treatment if the value of their assignment variable is greater (or less) than or equal to the pre-specified threshold value. An RDD exploits the fact that, for subjects with assignment variables close to the threshold, the threshold can be seen as a randomising device, where groups of subjects whose assignment variable values lie 'just above' and 'just below' the threshold may, under certain assumptions, be seen as 'exchangeable' and balanced with regard to confounding variables.
As an example, which shall be explored in this paper, in the United Kingdom it has been recommended that metformin, a medicine used to reduce blood sugar level, be prescribed to patients who are at risk of diabetes whose haemoglobin A1c (HbA1c) level is at or above the level of 48 mmol/mol. A high level of HbA1c can be an indicator of uncontrolled diabetes which may lead to serious health complications or death if not treated. In this example, the 'treatment' is a prescription of metformin, the 'assignment variable' is a patient's HbA1c level and the treatment 'threshold' is an HbA1c level of 48 mmol/mol or above. An effect that we might wish to estimate is that of metformin on all-cause mortality or perhaps on the occurrence of particular complications.
In general, a treatment guideline is not always adhered to strictly. As such, there may be some subjects whose assignment variable value lies below the threshold but who receive treatment (and vice-versa). In this case, the design is referred to as a fuzzy RDD. Otherwise, where there is strict adherence to the guideline, the design is known as a sharp RDD. In this work, we focus on fuzzy RDD designs, which are common in medical studies and for which treatment effect estimation may be more challenging in general. In the metformin example, it may be that some patients are counter-indicated for metformin, perhaps owing to side effects or co-morbidities. It may be that some patients would prefer to try other measures to reduce their blood glucose level, such as changing their diet or lifestyle choices, or that patients who are 'just below' the 48 mmol/mol HbA1c level are prescribed metformin as a precautionary measure.
The RDD was first introduced by Thistlethwaite and Campbell (1960) and has since been used widely in economics (Angrist et al., 1996;Hahn et al., 2001;Imbens & Angrist, 1994;Imbens & Kalyanaraman, 2012;Xu, 2017) and political science (Caughey & Sekhon, 2011;Erikson et al., 2015). Despite the obvious analogy between an RDD and a randomised controlled trial (with the threshold seen as a randomising device), RDDs have not been used extensively in medical studies until relatively recently. Most applications of the RDD to medicine have focused on studies where the outcome of interest has a continuous distribution, such as a normal distribution (see, e.g. Bor et al., 2014;Geneletti et al., 2015;Law et al., 2017;Linden et al., 2006;O'Keeffe et al., 2014). A few more recent applications of RDDs when the outcome is binary has also been explored (Bor et al., 2017;Geneletti et al., 2019;van Leeuwen et al., 2018). Time-to-event outcomes are important and ubiquitous measures in many medical studies. Despite this, the use of an RDD with a time-to-event or survival outcome has received little attention. Bor et al. (2014) fitted a model that used a log hazard ratio to estimate the effect of antiretroviral drugs on mortality in HIV patients, where antiretrovirals were prescribed according to a decision rule linked to a patient's CD4 count. A scaling method was used for the model fit but this was only valid where the event of interest is rare and the follow-up time short.
In this paper, we explore approaches for the estimation of a treatment effect in an RDD where the outcome is a time-to-event. In particular, we focus on treatment effect estimation under the accelerated failure time (AFT) assumption, an alternative to the popular proportional hazard (PH) assumption for modelling time-to-event outcomes. An attractive property of the AFT assumption is that the treatment effect estimate can be interpreted directly in terms of the time-to-event and the AFT assumption is also more compatible with assumptions required for an RDD. An estimator is derived for an AFT treatment effect in an RDD, using the assumptions made for an RDD. In addition, we also demonstrate that a spline-based approach may be suitable for situations where the assignment variable-outcome relationship is non-linear around the threshold. These estimation approaches are compared to that from a structural AFT (S-AFT) model, commonly used in causal inference for time-to-event outcomes in observational studies (Clarke & Windmeijer, 2010;Didelez et al., 2010;Hernán & Robins, 2006). Using simulation studies, the RDD estimator is desirable in that it is less sensitive to confounding. An important assumption of the S-AFT model is that all potential confounders are observed or that unobserved confounders can be explained by the observed confounders. This is a reasonably strong assumption and, since it is untestable, the corresponding estimate might be misleading when the assumption is violated. The RDD approach does not include such a stringent assumption, particularly when the treatment guideline is adhered to.
This paper is organised as follows: in Section 2, the RDD and its assumptions are outlined. In Section 3, the RDD-AFT and S-AFT estimators are introduced and described. A simulation study is carried out in Section 4 to compare estimation approaches. The example on the prescription of metformin on mortality and cardiovascular disease is presented in Section 5 and a discussion and conclusions are provided in Section 6.

THE REGRESSION DISCONTINUITY DESIGN
In this paper, motivated by the modelling of the effect of metformin on all-cause mortality for pre-diabetic patients (patients at risk of type 2 diabetes), we develop an RDD for a time-to-event outcome. We now describe the notation and the modelling assumptions for an RDD with a time-to-event outcome in detail.

RDD: variables and assumptions
We define the following variables that will be used in the RDD. The index i denotes the i th subject for the set of subjects {1, … , N}. In this work, we consider only right censoring for the time-to-event.
• T * i is the true time-to-event and C i is the time at which right censoring occurs. In a given data set, the observed 'time-to-event' is denoted T i = min{T * i , C i }. • Since either T * i or C i is observed (and not both), we define the event indicator i = 1 which takes the value 1 if the i th subject's time-to-event is observed and 0 if the time-to-event is right censored.
• X i is the continuous assignment variable and x 0 is the treatment threshold, such that the i th subject should receive treatment if X i ≥ x 0 .
• Z i = 1 {X i ≥ x 0 } is threshold indicator, which is equal to 1 (0) if the i th subject's assignment variable lies above (below) the threshold.
• A i is a treatment indicator, such that A i = 1 if the i th subject receives treatment and 0 otherwise.
is a set of confounding variables both observed ( i ) and unobserved ( i ).
As noted previously, an RDD considers subjects whose assignment variable values lie 'just above' the threshold to be similar to those whose assignment variable values lie 'just below' the threshold. To quantify this notion, we define a bandwidth, h, such that only those subjects whose assignment variable values lie within the set [x 0 − h, x 0 + h] are included in the RDD and used to estimate the treatment effect at the threshold. Some works have discussed bandwidth selection and, in particular, choosing the 'optimal' bandwidth for an RDD (Calonico et al., 2014;Imbens & Kalyanaraman, 2012;Ricciardi et al., 2020); however, in this paper, we refrain from a focus on an optimal bandwidth. In a medical context, it is likely that clinicians would have input into how similar subjects would be with regard to their assignment variable values, particularly where these values are clinical indicators or perhaps risk scores. As such, we shall consider RDDs for a variety of bandwidths and investigate the sensitivity of results to bandwidth choice in a pragmatic manner, similar to the approach taken in other papers (for example Geneletti et al., 2015;Geneletti et al., 2019;Li et al., 2015).
In an RDD, inference is typically focused on treatment effect estimation at the threshold. To identify an effect at the threshold, we make a number of assumptions that should hold. In this paper, we use a conditional independence framework (Dawid, 1979;Didelez et al., 2010) to outline these and details of these assumptions are provided in Appendix A. We now consider approaches for a time-to-event (survival) outcome.

RDD FOR A TIME-TO-EVENT
We use an AFT assumption when comparing groups with a time-to-event outcome in an RDD. First, we outline the general AFT model for groups and will later adapt this for use in an RDD.

Accelerated failure time assumption
An AFT model assumes that the time-to-event in the 'treated' group is accelerated (or decelerated) compared to that in the 'untreated' group. Formally, the survivor functions for the treated and untreated in a general AFT model may be expressed as Here is termed an 'acceleration factor' and it is assumed that, in general, the times taken to attain the same survival probability in the treated and untreated groups are t and t respectively. As such, is a measure of the difference in the time-to-event between the two groups and, clearly, = 1 would suggest no general difference. Typically, a log-linear form may be used where an AFT assumption is made, with a general model for the time-to-event, T i , given by Here i denotes the (natural) log event time for a subject who does not receive treatment (A i = 0) and the acceleration factor is given by = exp( ). We note that other covariates could be easily added into this model if desired. If the probability distribution of i is specified, the parameter may be estimated parametrically. Alternatively, a semi-parametric AFT model may be used without specifying the distribution of i (Buckley & James, 1979;Louis, 1981;Wei, 1992). For a randomised controlled trial without non-compliance, or a sharp RDD with no unobserved confounding, an estimate of reflects the treatment effect at the threshold (subject to the validity of the AFT model and any distributional assumptions). However, in a fuzzy RDD (essentially similar to a randomised controlled trial with non-compliance), estimation of the treatment effect at the threshold is less straightforward. Proportional hazards (PH) models are used commonly for inference in analyses involving time-to-event outcomes. However, PH models may have a limited interpretation when estimating causal effects, with or without confounding (Aalen et al., 2015;Hernán, 2010;Lange & Hansen, 2011;Martinussen et al., 2020;VanderWeele, 2011). In a PH model, marginal and conditional hazard ratio estimates may differ considerably and give different causal interpretations, owing to the non-additive form of the hazard function with respect to the intervention and potential confounding variables. This may be problematic and lead to the misinterpretation of a treatment effect (Janes et al., 2010;Martinussen & Vansteelandt, 2013;Sjölander et al., 2016;Sutradhar & Austin, 2018). Furthermore, for a Cox PH model, the construction of the likelihood function using risk sets implies that, if the risk is different for different levels of confounding variables, the independence of individuals who have not experienced the event of interest by a given time-but have different levels of confounding variables-may not hold if the confounding variables are not included in the model or are unknown. In contrast, the AFT model has been used for causal inference (Huling et al., 2019;Robins, 1992;Robins & Tsiatis, 1991). In particular, the Weibull AFT model, used in this paper, can be easily formulated as a location-scale model which yields a treatment effect estimate that is collapsible and can be interpreted causally. Furthermore, the AFT model is used to describe and compare the survivor experience of group(s) using the probability of survival, which may be more intuitive than comparing survivor experience using hazards.

RDD-AFT estimator
We now consider an estimator for the acceleration factor at the threshold based on the RDD assumptions outlined in Appendix A. In RDDs where the outcome is continuous and not a time-to-event, the local average treatment effect (LATE) has been used extensively for effect estimation at the threshold Imbens & Lemieux, 2008). Essentially, the LATE involves fitting linear models for the outcome above and below the threshold to estimate the 'jump' in the outcome, using only data where the assignment variable, x i lies in the range [x i − h, x 0 + h] for a given bandwidth h. The fuzziness present in an RDD is accounted for by estimating the proportion of subjects who receive treatment above and below the threshold and then scaling the estimate of the 'jump' by the difference in these proportions. That is, the LATE at the threshold is given by where is the difference in means for the linear models above and below the threshold and a − b is the difference in the probability of receiving treatment above (a) and below (b) the threshold.
We note that the AFT model specified in Equation (1) is a linear model and methods used to estimate the LATE in a standard RDD may be adapted for use in an AFT, as we now outline. A model that estimates the discontinuity in the time-to-event at the threshold may be written The interaction term between the assignment variable and the threshold indicator may be dropped if it is assumed that slopes (on the log-scale) are the same above and below the threshold by setting 3 = 0. The discontinuity or 'jump' at the threshold is measured by 1 . As mentioned previously, if the distribution of i is specified then model parameters may be estimated using maximum likelihood and accounting for right censoring, where appropriate, and these models may be fitted easily using standard statistical software (e.g. using the survival package in R (Therneau, 2021)).
In this paper, we shall assume a Weibull distribution for i and use maximum likelihood methods for parameter estimation. In practice, other distributions such as a log-normal or log-logistic distribution could be fitted and an optimality criteria based on the model residuals can be used to select the distribution that best fits a data set (Chan et al., 2018).
In a sharp RDD, the estimate of 1 should be unbiased for the treatment effect at the threshold on the log-scale (assuming, of course, that the RDD assumptions are satisfied and that the model is correctly specified). In a fuzzy RDD, direct estimation of 1 by fitting the model (2) would yield a biased estimate of the treatment effect at the threshold, owing to the fact that Z i does not necessarily distinguish the treated from the untreated. As such, in a fuzzy RDD, the estimate of the treatment effect at the threshold on the log-scale may be given by Here a and b denote the probability of treatment for subjects with assignment variable values above and below the threshold, respectively. A proof of this result is provided in Appendix B. The parameters a and b may be estimated using logistic regression or another appropriate method. We now consider a structural AFT model as a method for treatment effect estimation.

Structural AFT model
Structural models have been considered widely for causal effect estimation in observational or non-randomised studies (Clarke & Windmeijer, 2010;Didelez et al., 2010;Hernán & Robins, 2006;Robins et al., 2000;VanderWeele, 2009;Vansteelandt & Goetghebeur, 2003). In particular, the structural AFT (S-AFT) model has been used for treatment effect estimation where the outcome is a time-to-event (Hernán et al., 2005). The S-AFT model may be defined easily using a potential outcomes framework which we now outline. We define a as an indicator such that a = 1 implies that treatment is assigned and a = 0 that treatment is not assigned. The potential outcomes are T i (1) (where a = 1) and T i (0) (where a = 0) and these represent the time-to-event for the i th subject where the subject is treated and not treated, respectively. The treatment effect at the subject level would be ascertained by comparing T i (1) and T i (0). However, since these cannot both be observed, the treatment effect is typically estimated using G-estimation, which involves exploiting the conditional independence between the potential outcomes and the treatment indicator. Using the AFT assumption, it follows that where is the acceleration factor. For the S-AFT model to be valid, the following assumptions must hold: S1 There are no unobserved confounders and treatment allocation is strongly ignorable conditional on observed confounders. This assumption may be written formally as In words, this assumption implies that the time-to-event outcome for the i th subject, whether treated or not does not depend on whether or not the subject has actually received treatment (observed as treated). S2 The probability of treatment is non-zero, conditional on observed variables. That is The S-AFT model is not strictly an RDD approach. Instead, the S-AFT method compares the treated and untreated groups while correcting for effect of confounding using the observed confounders and relying on the assumptions presented above. In contrast, the RDD-AFT method compares the patients above and below the threshold, using the threshold as a quasi-randomising device, and a treatment effect at the threshold is estimated by adjusting the result of this comparison with the probability of compliance to the treatment guideline. Considering the estimates made by the RDD-AFT and S-AFT models (the RDD-AFT model compares outcomes for levels of Z and the S-AFT model compares outcomes for levels of A), we note that assumption S1 is similar to A5 in the RDD assumptions (Appendix A). In addition, assumption S2 is similar to A6. The S-AFT model would be applicable for an RDD, assuming that there is no unmeasured confounding in the region around the threshold or, alternatively, if any confounding variables present in a region around the threshold were accounted-for in the S-AFT model. Where these assumptions are satisfied, the acceleration factor (i.e. the treatment effect at the threshold) may be estimated using G-estimation (Hernán et al., 2005) G-estimation is well known as a method that may overcome bias from time-varying confounders when estimating a treatment effect and has been used with AFT models. It is known that estimation problems may sometimes be encountered when fitting S-AFT models using G-estimation, owing to the 'artificial censoring' procedure used in the model fitting process (Joffe et al., 2012). An overview of the G-estimation procedure used for the fitting of the S-AFT model is provided in Appendix C.
We now aim to compare the S-AFT and RDD-AFT approaches using a variety of simulated scenarios, before a comparison and application using a real data set on the prescription of metformin and mortality in UK primary care.

SIMULATION STUDY
We consider some scenarios using simulated data in which an RDD would apply, with different levels of confounding. In doing so, our aim is to explore and evaluate the use of the S-AFT and RDD-AFT models for a time-to-event outcome in a fuzzy RDD. The simulation algorithm and set-up is outlined below. First we shall consider a simple linear relationship between the assignment variable and the outcome of interest. We shall then consider a simulation scenario with a non-linear assignment variable-outcome variable relationship where flexible functions of the assignment variable are considered in the model.

Simulation algorithm and set-up
We describe the simulation algorithm where M data sets, each of size N, are simulated (M ∈ N, N ∈ N). As noted previously, the index i denotes a subject with i ∈ {1, … , N}.
Step 1: For each subject, an assignment variable, X i , is simulated from a continuous uniform distribution.
Step 2: The threshold is set to be equal to 0.5 and we define the centred assignment variable, X C i , and threshold indicators, Z i , as Step 3: A confounding variable, W i , is simulated from a standard normal distribution Step 4: The probability that the i th subject receives treatment, p i , is given by The parameters of this model are specified to reflect the simulated fuzziness of the design and the level of confounding with regard to treatment allocation.
Step 5: The treatment indicator, A i , is simulated as follows Step 6: The 'true' time-to-event is simulated as Here, 4 defines the treatment effect (on the log-scale), 5 reflects the correlation between the time-to-event outcome and the confounder W i and i defines the probability distribution for log T * i .
Step 7: To incorporate right censoring, a censoring time C i is simulated. A censoring probability, p c , and an 'end-of-study' time K are specified. A random variable is specified U C i ∼ Uniform(0, 1) and C i is defined as follows Step 8: The observed time-to-event and event indicators are defined as follows Step 9: Steps 1-8 are repeated N times, to create a data set with N subjects.
Step 10: Steps 1-9 are repeated M times and M data sets, each with N subjects, are obtained.
In this paper, we set 0 = −2 and 2 = 2. We set 4 = log 1.5 so that the acceleration factor is 1.5. We choose the median survival time in the untreated group to be 7 years and, as such, we set ) .
For the censoring mechanism, we set the maximum follow-up time as K = 10 and the probability of drop-out (right censoring) before 10 years as p c = 0.15. As noted in the simulation algorithm, 1 denotes the strength of the relationship between the threshold indicator and the probability of treatment (i.e. the fuzziness of the RDD). In addition, 3 and 5 reflect levels of unobserved confounding with treatment and the time-to-event, respectively. We varied these parameters to adjust the level of fuzziness and level of confounding, resulting in a range of simulated data sets using which the methods outlined in Section 3 may be compared. The type of simulation scenarios, to which we refer in our discussion of simulation results, are outlined with parameter choices in Table 1.
No confounding implies that there is no linear correlation between the confounding variable W i and both the time-to-event T i and treatment allocation A i . The chosen values of 3 and 5 result in the Pearson correlation coefficients shown in the columns T,W and A,W which allows an assessment of the linear correlation level between variable pairs and, hence, the level of confounding.
We measure how well the treatment guideline is adhered to using the level of fuzziness of the design. Where there is a high level of compliance to the treatment guideline, the fuzziness is weak and a low compliance to the treatment guideline implies that fuzziness is strong. This is shown in the column P.C. of Table 1. In the case of weak fuzziness, about 80% to 90% of the patients complied to the treatment guideline, while the probability of compliance to the treatment guideline is about 55% where fuzziness is strong. We chose N = 2000 and M = 1000. Standard errors were estimated using bootstrapping, with a bootstrap sample size of 1000. (3) and (5) for the simulation scenarios with the corresponding probability of compliance (P.C.) and estimates of correlation coefficients between T and W ( T,W ) and A and W ( A,W )

Simulation study-results
Simulations were carried out using the chosen parameter values stated in Section 4.1. We chose RDD bandwidths of 0.05, 0.1, 0.15 and 0.2 and fitted both the S-AFT and RDD-AFT models to estimate the treatment effect at the threshold. We note that the use of the bandwidths results in a reduction in sample size from N = 2000, with smaller bandwidths using smaller sample sizes. The probability of censoring (either by drop-out or by not experiencing the event before the simulated end point of 10 years) was approximately 0.47 for each simulated data set. Figure 1 shows box plots that summarise the acceleration factor estimates for each bandwidth, estimation approach and confounding level where the level of fuzziness is weak. As would be expected, when there is no confounding both methods yield unbiased estimates of the acceleration factor across all bandwidths. Where there is a low level of unobserved confounding, the S-AFT method is generally biased for the acceleration factor whereas the RDD-AFT method is not, with the median RDD-AFT estimate lying close to the true acceleration factor value for all bandwidths. Similarly, for the scenario where the level of confounding is high, the RDD-AFT estimate is more desirable than the S-AFT estimate, generally producing an unbiased estimate of the acceleration factor for all bandwidths. The difference between the S-AFT and RDD-AFT estimates would be expected for scenarios where confounding occurs, because confounding variables should be accounted-for for the S-AFT model to be valid. To check the S-AFT model further, we repeated the simulation study but conditioned on confounders and found that the S-AFT estimates were less biased, as would be expected, shown by the boxplots for the adjusted S-AFT estimates in Figures 1 and 2. Numerical summaries of these adjusted S-AFT estimates (and all other simulations) are shown in Table 2. Figure 2 shows box plots that summarise the acceleration factor estimates for each bandwidth, estimation approach and confounding level where the level of fuzziness is strong. Both methods produce unbiased estimates of the acceleration factor where there is no confounding. For both the low and high unobserved confounding levels, the S-AFT model yields a biased estimate of the acceleration factor. In contrast, the RDD-AFT model is less biased and generally provides a more accurate estimate of the acceleration factor (treatment effect).
In the simulation study, the time-to-event was drawn from a Weibull distribution, which has a monotonic hazard function. To investigate the sensitivity of the approach to the distributional assumption, we simulated data from a log-logistic distribution that has a non-monotonic hazard function and fitted the Weibull RDD-AFT model to the simulated data sets. Results are shown in Appendix D and these show some bias in the RDD-AFT estimates, which tend to underestimate the treatment effect at the threshold. However, the bias is smaller as the bandwidth reduces. From the simulation study, we can conclude that a smaller bandwidth not only reduces the effect of unobserved confounding, it may also reduce the effect of model mis-specification, subject to the validity of the RDD assumptions.

Simulation study-flexible models
We consider a scenario where the relationship between the assignment variable and outcome is non-linear and a flexible model is used. The simulation algorithm is similar to that given in Section 4.1, but Step 6. is replaced with the step outlined below, to allow a non-linear relationship between the 'true' time-to-event and the assignment variable. Step 6 The 'true' time-to-event is simulated as where Here, 4 defines the treatment effect (on the log-scale), 5 reflects the correlation between the time-to-event outcome and the confounder W i and i defines the probability distribution for log T * i . Figure 3 shows a plot the functional relationship between the assignment variable and the outcome on the log-scale.
To fit a flexible model, we use a natural cubic spline (Hastie & Tibshirani, 1999) to model the relationship between the outcome and assignment variable (i.e. the numerator of the log (RDD-AFT) estimator). The number of knots in the spline (V) varied for different bandwidth sizes as follows: for h = 0.2, V = 8; for h = 0.15, V = 6; for h = 0.1, V = 4 and for h = 0.05, V = 2. Table 3 shows numerical summaries of acceleration factor estimates at the threshold where data were simulated from the non-linear scenario for all levels of confounding and where fuzziness is weak or strong. Figure 4 shows boxplots that summarise acceleration factor estimates for each bandwidth and where the fuzziness level is weak. Figure 5 shows similar boxplots but where the level of fuzziness is strong.
Examining these results and summaries, across all scenarios, the linear RDD-AFT approach does not accurately estimate the true acceleration factor as we should expect. In contrast, the spline-based RDD-AFT model appears to provide an unbiased estimate of the true acceleration factor across all bandwidths and confounding levels, where fuzziness is weak. Performance of the spline-based model is also reasonable where fuzziness is strong. However, across all scenarios, we note that the variability in the spline-based estimates is higher than that for the linear RDD-AFT estimates. We might expect the variability to be larger with a spline-based model owing to the possibility of a bias-variance trade-off commonly encountered when fitting splines. We note that the linear RDD-AFT method improves (in terms of a reduction in bias) as the bandwidth decreases. This would be expected as the functional form of the assignment variable-outcome variable relationship varies less over a smaller bandwidth. The linear S-AFT model performs reasonably well where the bandwidth is small and confounding is low. However, where confounding is high, the S-AFT model does not estimate the true acceleration factor accurately.

EXAMPLE: METFORMIN IN UK PRIMARY CARE
In the United Kingdom, the National Institute for Health and Care Excellence (NICE) has recommended that metformin, a drug that reduces blood sugar level, is prescribed to adults at risk TA B L E 3 Estimates, biases, empirical standard errors (ESE), average standard errors (ASE) and 95% coverage for the log of the acceleration factor from simulation studies for the RDD-AFT estimators estimated using the previous approaches and a flexible model (Spline RDD-AFT

F I G U R E 5
Boxplots showing the acceleration factor estimates from the simulation studies to compare performance of the standard RDD-AFT and spline-based RDD-AFT approaches for designs with strong fuzziness. The true value of the acceleration factor is 1.5 (dashed horizontal line). The sample size was 2000 in each simulated data set and simulations were repeated 1000 times of developing type 2 diabetes. Specifically, a NICE guideline states that metformin should be considered as a treatment for patients whose HbA1c level is greater than or equal to 48 mmol/mol (NICE, 2015). Here, HbA1c is the continuous assignment variable, the threshold is an HbA1c level of 48mmol/mol and the treatment is metformin. The treatment effect of metformin is to reduce blood sugar which, in turn, reduces complications that may arise. We aim to explore the effect of metformin on survival (time to death) and time to a cardiovascular event (myocardial infarction or stroke) for patients who are at risk of type 2 diabetes (i.e. patients with a high HbA1c level). We consider a subset of patients from a large data set of patients in UK Primary Care, known as The Health Improvement Network (THIN) database. The THIN database contains anonymised data collected at over 500 UK general practices (family doctors) and is representative of the general UK population (Blak et al., 2011;Bourke et al., 2004). We use data from 4532 male patients aged between 40 and 80 years who had their first HbA1c measurement in 2010, who had not been diagnosed with diabetes previously and whose body mass index (BMI) was less than 30 kg/m 2 . The time origin is the time of HbA1c measurement.
Of these 4532 patients, 643 patients had HbA1c values above the threshold and 3889 patients had HbA1c values below the threshold. Of those patients with HbA1c values above the threshold, 453 (70%) were prescribed metformin whereas, for those with HbA1c values below the threshold, 27 (1%) were prescribed metformin. Hence, it seems plausible that the probability of treatment is very different above and below the threshold, strengthening the case for the use of an RDD with these data.
For bandwidth selection, we consider the distributions of potential confounding variables for patients above and below the threshold. Since the bandwidth should reflect the 'similarity' of patients above and below the threshold, our aim is that, within a given bandwidth, distributions of confounding variables appear similar. The confounders that we consider, after discussion with epidemiologists, are: age at origin, BMI, low-density lipoprotein (LDL) cholesterol level and high-density lipoprotein (HDL) cholesterol level. Table 4 shows means and standard deviations for each of these variables for patients above and below the threshold, of both raw and standardised variables. Examining Table 4, we see that BMI values and LDL and HDL cholesterol values are broadly similar for all bandwidths, when comparing groups. For age, patients above the threshold appear to be slightly older in general than those below the threshold. As such, we will incorporate age at origin as a covariate when fitting models. In doing this, we are assuming that the relationship between the treatment and age has been correctly specified-we assume a simple linear relationship here.
Empirical probabilities of the events of interest are given Table 5. The probability of dying is similar in patients above and below the threshold across the bandwidths. Table 5 also shows median time-to-events for patients that experienced the events of interest. For mortality, empirical probabilities are very similar, above and below the threshold, for all bandwidths. Median times to death are broadly similar but perhaps slightly higher for subjects below the threshold, especially as the bandwidth increases. For CVD, the probability of a CVD event appears to be higher overall for subjects below the threshold. The median time-to-event is also higher for subjects below the threshold across all bandwidths. This may suggest that CVD event rates and times may differ for subjects above and below the threshold, perhaps implying that there is a discontinuity in the CVD event rate at the threshold.
Finally, we apply the methods discussed in Section 3 to the data. Since age appears to be unbalanced for patients above and below the threshold, age was included as a covariate in the methods for modelling both the LATE numerator and denominator. In comparing values of potential confounders above and below the threshold, we are seeking to verify that groups of patients above TA B L E 4 Sample means and standard deviations (SD) for potential confounding variables (both raw and standardised) above (Z = 1) and below (Z = 0) the threshold, for various HbA1c bandwidths ( and below the threshold are balanced with respect to these variables, as would be expected when comparing randomised groups in a randomised controlled trial, in line with assumption A5 in Appendix A. Implicitly, we assume that patients either side of the threshold are exchangeable and that there is a degree of randomness with regard to where patients' HbA1c values lie (above or below the threshold).
Estimates and associated 95% confidence intervals of the acceleration factor for the effect of metformin prescription on mortality and CVD are presented in Table 6. Standard errors that were TA B L E 6 Estimates and 95% confidence intervals for the acceleration factor across varying bandwidths (h) for the RDD-AFT and S-AFT approaches. n denotes the sample size for the corresponding model  .25, 9.88) used to construct the confidence intervals for the RDD-AFT and S-AFT models were computed using bootstrapping, with a bootstrap sample size of 1000. Acceleration factor estimates for the two methods are all greater than 1 and this may suggest that metformin prescription is beneficial for the events considered. That is, patients that receive metformin prescription have a higher median time-to-event (mortality and CVD) compared to patients that did not receive the prescription. However, these estimates are not statistically significant as the confidence bounds include 1 for the two methods. The 95% confidence intervals are generally wide and this may be because of a reduced sample size when the data are sub-sampled so that only patients whose HbA1c values lie close to treatment threshold are included in the RDD. In addition, the number of events observed is low for both deaths and CVD events which is likely to reduce the precision when estimating parameters of interest.

DISCUSSION
In this work, we have focused on methods for an RDD with a time-to-event (survival) outcome using an AFT approach. Time-to-event outcomes are common in medical scenarios and, in addition, treatments are increasingly prescribed or administered according to pre-defined, external guidelines. As such, an RDD approach for time-to-event outcomes is appealing. We have shown that the assumptions made for a standard RDD may be adapted for use with a time-to-event outcome using an AFT model. The RDD-AFT model is intuitive in that the treatment effect estimate obtained may be interpreted directly in terms of the time-to-event.
We have shown that the RDD-AFT approach performs reasonably well, typically yielding an unbiased treatment effect estimate at the threshold, even when the RDD is fuzzy and there is confounding. We compared the approach to the commonly used S-AFT model and saw that, as would be expected, the S-AFT model did not provide an accurate or precise estimate of the treatment effect where confounding was ignored. When confounding variables were conditioned on in the simulation study, the S-AFT model produced more precise and unbiased estimates of the treatment effect at the threshold. We note that this relied on conditioning on the correct, and known, confounding variables. This is straightforward in a simulation study but perhaps not so in real data sets in which confounders may be unmeasured or unknown. In contrast, the RDD-AFT method did not rely on the inclusion of confounding variables in the fitted model, even for simulated scenarios in which a high level of confounding was incorporated, and the model treatment effect estimates were still, generally, unbiased. Despite this, it is still possible to include confounding variables in the RDD-AFT model if desired. In addition, we showed that the RDD-AFT model could be adapted to handle situations where the assignment variable-outcome variable relationship is non-linear, using a spline-based approach to estimation. In the metformin prescription example, we compared the RDD-AFT and S-AFT models, controlling for baseline age as confounder. Although estimates from both approaches were imprecise, perhaps owing to the sample sizes, the point estimate of the acceleration factor was greater than one, for both time to death and time to a CVD event. The two modelling approaches agreed on the 'direction' of the acceleration factor estimate and yielded estimates reasonably close to one-another for all bandwidths. This may suggest that the RDD-AFT model would serve as a useful check in situations where an S-AFT model has been fitted.
The RDD-AFT model is parametric and we examined examples using a Weibull and a log-logistic distribution. This could be a drawback of the RDD-AFT approach, when compared to the semi-parametric S-AFT model which requires fewer distributional assumptions. However, the choice between parametric and non-parametric survival models may be often debated, even in a non-RDD setting. As with any statistical model, we would recommend that distributional assumptions are justified or checked before fitting a parametric RDD-AFT model. In addition, checking that the RDD assumptions are valid is also important.
Overall, we recommend that RDDs be considered more often for time-to-event outcomes, particularly in medical studies although methods presented may be easily adapted to other fields. The RDD-AFT method is an approach that is easy to fit using standard statistical software and that may produce a useful and interpretable estimate of the difference, if any, in the time-to-event between treated and untreated groups using observational data and a fuzzy RDD.

A1:
The threshold and treatment indicators are not independent.
This assumption implies that the threshold rule and the allocation to treatment must be associated. For a sharp design, this relationship would be deterministic. For a fuzzy design, there should be an association between A i and Z i which should be of sufficient strength for an RDD approach to be valid. A2: The probability of receiving treatment is discontinuous at the threshold. That is This implies that the threshold signifies a separation in the probability of treatment (above and below) and, coupled with A1, ensures that the strength of association between treatment and threshold allows an RDD to be used. A3: where f (.) is the distribution function of the time-to-event. If a treatment effect on the time-to-event occurs then we should expect a difference in the distribution of the time-to-event above and below the threshold. Assumption A3 ensures that any 'jump' or difference in the time-to-event distribution at the threshold is due to the treatment and not any other variable. A4: The threshold indicator is independent of confounders conditional on the assignment variable.
This assumption implies that the treatment guideline (threshold rule) cannot be adapted for individual subjects and we note that this should hold at least within the region of the threshold. A5: The threshold indicator and the time-to-event outcome variable are independent conditional on the other variables.
This assumption ensures that, within a group of exchangeable subjects with assignment variable values close to the threshold, subjects ought to fall randomly above and below the treatment threshold. The threshold represents an externally defined decision rule and does not change based on subject characteristics. We should expect confounding variables to be distributed similarly amongst groups of subjects above and below the threshold, in an analogous manner to randomised groups in a randomised controlled trial. In this sense, the threshold acts as a quasi-randomisation device for a group of subjects whose assignment variable values lie close enough to the threshold for these subjects to exhibit similar characteristics. In practice, this assumption is likely to hold if there is an association between the threshold and treatment assignment and as long as patients are not prescribed (or prescribed) treatment in a systematic manner that ignores the threshold-assignment variable decision rule. In addition, this assumption implies that the treatment guideline (threshold) is externally defined and would imply that subjects cannot manipulate their outcome variable so that they would lie above (or below) the treatment threshold. As the outcome is a time-to-event (and especially in the metformin example where the outcome is all-cause mortality) this example is likely to hold in many scenarios. A6: We assume that there is no systematic non-adherence to the treatment guideline. This assumption implies that there are no healthcare practitioners who would systematically behave in the opposite way suggested by the threshold rule. In other words, that there is not a practitioner who would deliberately prescribe treatment to all subjects with assignment variables below the threshold and withhold treatment from subjects with assignment variable values above the threshold. This assumption would, in general, seem plausible and especially so in the metformin example discussed in this paper. A7: Censoring is uninformative. We assume that right censoring would occur at similar rates (and for similar reasons) above and below the threshold (or, at least, in the subset of subjects whose assignment variables lie close enough to the threshold to be included in an RDD).

APPENDIX B. DERIVATION OF THE RDD-AFT ESTIMATOR
Consider the model: is the treatment effect of interest. Here, the expectation of is not necessarily zero as it represents the expected value of log T when treatment is not received.
Fitting the marginal model in Equation (A1) may result in a biased estimate of the treatment effect, owing to confounding variables. Therefore, we estimate the treatment effect by exploiting the fact that we have partial information on treatment allocation.
We can represent In subsequent derivations, the limits will be dropped for simplicity: E(log T | Z = 1) will appear instead of lim x↓x 0 E(log T | X = x) and E(log Using Assumption 3, we have Therefore, reverting to the original notation including limits, The numerator term is equivalent to the estimate of 1 in Equation (2) while the denominator term is the probability of compliance.

APPENDIX C. G-ESTIMATION: OVERVIEW
We define the counterfactual outcomes of receiving and not receiving treatment as T 1 i and T 0 i , respectively. The counterfactual outcome of not receiving the treatment is estimated from the S-AFT model using the relationship: We note that if the i th subject did not receive the treatment, then T i = T 0 i whereas T i = T 1 i if the i th subject received the treatment, where T i is the observed outcome. Hence Equation (A2) may be rewritten as: and T 0 i = T i for subjects who do not receive treatment and T 0 i = exp(− )T i for those who receive treatment. We consider T 0 i as a function and write T 0 i = T 0 i ( ); the parameter of interest, to be estimated using G-estimation, is . A logistic regression model for the probability of treatment, conditional on observed confounders and T 0 i = T 0 i ( ) may be written When assumption S1 holds, that is, T 0 i ( ) is independent of treatment conditional on  i , then 1 = 0. Therefore the G-estimate of is the value of (say, * ) that leads to the failure to reject the null hypothesis that 1 = 0. The G-estimate can be obtained by minimising the score test statistic for 1 = 0 with respect to * . This is equivalent to finding the solution to the estimating equation (Hernán et al., 2005): Hence, the value * that solves the estimating equation above is the G-estimate of . The solution to the estimating in Equation (A4) is computed using the uniroot function in R.
To handle right-censoring using G-estimation, the type of censoring should be considered. Specifically, we should consider whether censoring occurs owing to a subject dropping out prior to the end of a study's follow-up period or if censoring happens because the event of interest has not occurred by the end of follow-up (administrative censoring). When censoring occurs because of drop out, inverse probability weighting is used to account for right-censoring. We define an indicator variable D i , where D i = 0 if the i th subject drops out before the end of the study and D i = 1 otherwise. The inverse probability weight for censoring is calculated as The score function (Equation A4) is then adjusted to account for the inverse probability weight.
For administrative censoring, we specify K to be the end of follow-up time for the study, such that any subject for whom T 0 i ( ) ≥ K would have a time-to-event that is right-censored. We define K( ) as Using K( ), we define Δ i ( ) = 1(T 0 i ( ) ≥ K( )). For any subject where Δ i ( ) = 1, T 0 i ( ) is replaced with zero in the estimating equation below (Equation A5). Further explanation regarding the intuition behind this approach can be found in Hernán et al. (2005) The G-estimate of is then the * that solves the estimating in Equation (A5) and, typically, this equation will be solved numerically (for example, using uniroot function in R (R Core Development Team, 2019)).

APPENDIX D. SIMULATION STUDY RESULTS WHERE THE HAZARD FUNCTION IS NON-MONOTONIC
The distribution of ( i ) in Step 6 of the simulation study presented in Section 4 is simulated from a logistic distribution: i ∼ Logistic[log(7), 0.15].
Therefore, T * comes from a log-logistic distribution with scale parameter of 7 and shape parameter of 1 0.15 . Since the value of the shape parameter is greater than 1, the hazard function of T * is clearly non-monotonic, which implies the Weibull distribution is not appropriate. Figure A1 shows boxplots that summarise simulation study estimates.

F I G U R E A1
Boxplots showing the acceleration factor estimates from the simulation studies to compare performance of the RDD-AFT and S-AFT approaches. The true value of the acceleration factor is 1.5 (dashed horizontal line). The sample size was 2000 in each simulated data set and simulations were repeated 1000 times