Estimating efficacy in trials with selective crossover

When one arm in a trial has a worse early endpoint such as recurrence, a data‐monitoring committee might recommend that all participants are offered the apparently superior treatment. The resultant crossover makes it difficult to measure differences between arms thereafter, including for longer‐term endpoints such as mortality and disease‐specific mortality. In this paper, we consider estimators of the efficacy of treatment on those who would not cross over if randomised to the apparently inferior arm. Binomial and proportional hazards maximum likelihood estimators are developed. The binomial estimator is applied to analysis of a breast cancer treatment trial and compared with intention‐to‐treat and inverse probability weighting estimators. Full and partial likelihood proportional‐hazard model estimators are assessed through computer simulations, where they had similar bias and variance. The new efficacy estimators extend those for all‐or‐none compliance to this important problem. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd


Introduction
Unplanned crossover occurs in randomised trials when participants decide to switch treatment arms. It makes it more difficult to measure a difference between arms, especially when there is selective crossover because those who switch have a different background risk than others. In this paper, we consider trials with selective crossover in one arm arising due to results from early efficacy endpoints, such as when initial results from a trial or other concurrent trials lead a data-monitoring committee to recommend that all participants are offered the apparently superior treatment. This form of crossover is represented using a lexis diagram in Figure 1. It occurred in the BIG-1 98 trial, where women were randomised to receive 5 years of either letrozole or tamoxifen to prevent breast cancer recurrence. Initial results [1] led the organisers to inform women in the tamoxifen monotherapy arms of their treatment, in order to allow an informed decision about future care; one quarter of those in the tamoxifen arm switched to letrozole [2].
The traditional inferential method for a trial with selective crossover is an intention-to-treat (ITT) analysis. This maintains the randomised balance of all causal factors other than the treatment, but it is likely to attenuate the estimated effect of treatment, because some participants in the control arm receive the intervention. A per-protocol approach censors those who cross over in the analysis, but it is unreliable when crossover is informative because bias may occur in either direction and inference is compromised [3]. Instead of these, we focus in this article on the effect of treatment on those who would not cross over when offered if randomised to control. Although this may also be biased for the effect of treatment on everyone in the trial, it is still of interest to identify the effect of treatment on those who would receive it for the duration, and these are the only individuals who provide information on treatment effects over the complete follow-up period.
Efficacy is defined more formally using a model with two latent strata. The first latent stratum contains insistors, who would cross over to treatment when offered if randomised to control; the second consists of 'ambivalents', who would not cross over when offered if randomised to control. The strata are shown using an example in Figure 2. No one in the treatment arm switches to control, so there are assumed to be no individuals who defy randomisation by taking the opposite treatment, or who would always switch from treatment to control. The latent strata are unobserved in both arms prior to crossover, and only partially revealed in the control arm. Efficacy measures the average difference between the ambivalents in the two treatment arms. Figure 2 is a suitable schematic for analysis of disease-free survival such as in the BIG-1 98 trial, but it does not adequately show subtle complications that arise for a mortality endpoint. Figure 3 illustrates the flow of patients under the model in the control arm when death is also included. The additional issue is that the underlying strata of participants in the control arm are not revealed after unblinding for those still alive but whose disease progressed beforehand. If this information was ignored, then the insistors in control with a recurrence before the offer to switch would be incorrectly identified as ambivalent.
We view the model as a plausible description of the trial population when there is a difference in underlying risk after crossover between those who switch treatment and the entire treatment arm. Models with similar latent strata have been applied to different problems [4], including when they are revealed at baseline [5][6][7] and when they are revealed after baseline [8]. A consequence of the two-strata model is that the proportion of insistors at risk is likely to be different at baseline from the time of potential crossover.
The unobserved boxes in Figure 2 show why the estimation problem is difficult. The marginal effect of treatment may be estimated before crossover by comparing the two arms without adjustment. After crossover, insistors in control receive the treatment. If the ambivalents were observed in both arms after the offer, then estimation of efficacy would be straightforward. The problem is that they are missing data before crossover in control and all the time in treatment. The remainder of the article considers this estimation problem.

Binomial model
A binary model may be applied to analysis of a trial when events are rare and little information is contained in the follow-up or event time of patients. It also provides insight into assumptions and parameter identifiability aspects. We next focus on the model illustrated by Figure 2, but the methods may be adapted to mortality endpoints such as in Figure 3 by ensuring that the latent strata are unknown throughout follow-up for those with a recurrence before the opportunity to cross over.

Assumptions
The first assumption is as follows:

(A1) The potential time of crossover (the time when the patient would cross over if an insistor and randomised to control) is observable for all participants in both arms.

This holds for the type of crossover shown by Figure 1. Using A1, we may take two time periods before and after the offer (potential crossover time) and parameterise the probability $p_{trk}$ of an event in period $t = 0, 1$ (before, after the offer), treatment arm $r = 0, 1$ (control, treatment) and latent insistor stratum $k = 0, 1$ (ambivalent, insistor) as follows. For ambivalents ($k = 0$) in control ($r = 0$) and periods $t = 0, 1$,

$$p_{t00} = \pi_t,$$

where $0 \leq \pi_t \leq 1$ is an unknown probability of the event (if a constant rate $\lambda^* \geq 0$ is assumed and the time at risk $T_t$ is known, then $\pi_t = 1 - \exp(-\lambda^* T_t)$, so that $\pi_t$ in both periods only depends on a single unknown parameter). Efficacy $\theta_t$ in period $t = 0, 1$ is defined as the ratio between the treated and untreated ambivalents, $\theta_t = p_{t10}/p_{t00}$. Insistors in the treatment arm are taken to differ from the ambivalents in period $t = 0, 1$ through

$$p_{t11} = \psi_t\, p_{t10} = \psi_t \theta_t \pi_t.$$

We also assume the following:

(A2) The treatment effect prior to the offer is the same for the ambivalents and insistors, so that $p_{011}/p_{001} = \theta_0$, or equivalently $p_{001} = \psi_0 \pi_0$.

A pragmatic reason for taking assumption A2 is that it ensures that the conditional treatment effect among ambivalents before the offer matches the marginal effect (seen hereafter in Equation (2)), which would be the primary measure for an ITT analysis of a trial without selective crossover.

After the offer to switch, we allow a different treatment effect for the insistors in control through $p_{101} = \pi_1 \psi_1 \theta^*_1$, where $\theta^*_1$ need not equal $\theta_1$. For example, a treatment crossover effect may exist, such as resulting from receiving a drug in control then the different drug from the treatment arm (which would not have been the case at baseline for those randomised to treatment). Alternatively, this might represent a difference in the treatment effect for insistors and ambivalents, and so testing whether $\theta^*_1 = \theta_1$ is a way to partially verify A2, albeit after the offer. We require two further assumptions:

(A3) The probability of being in stratum $k = 1$ (insistor) at baseline is the same in both arms because of randomisation.

(A4) Censoring is independent of strata $k = 0, 1$.
A4 may be partly verified by using data in the control arm after the offer to switch and inspecting censoring in the other observed groups prior to and after the offer. Equivalent versions of A3 and A4 have been used by all-or-none compliance estimators [7].
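The parameterisation under A1 and A2 can be tabulated in a few lines. The sketch below is illustrative only: the names `pi`, `theta`, `psi` and `theta_star` (event probability for control ambivalents, efficacy, insistor effect, and the post-offer treatment effect for insistors in control) are labels assumed here, and the numerical values are invented for the example.

```python
def event_probabilities(pi, theta, psi, theta_star=None):
    """Return a dict p[(t, r, k)] of event probabilities.

    t: period (0 before, 1 after the offer); r: arm (0 control, 1 treatment);
    k: latent stratum (0 ambivalent, 1 insistor). Each of pi, theta, psi is a
    pair indexed by period. theta_star defaults to theta[1] (assumed names).
    """
    if theta_star is None:
        theta_star = theta[1]
    p = {}
    for t in (0, 1):
        p[(t, 0, 0)] = pi[t]                      # control ambivalents
        p[(t, 1, 0)] = theta[t] * pi[t]           # treated ambivalents
        p[(t, 1, 1)] = psi[t] * theta[t] * pi[t]  # treated insistors
    p[(0, 0, 1)] = psi[0] * pi[0]                 # A2: untreated insistors before the offer
    p[(1, 0, 1)] = psi[1] * theta_star * pi[1]    # insistors in control after crossing over
    return p


# Invented values: 10% risk per period, efficacy 0.7, insistor effect 0.5.
p = event_probabilities(pi=(0.10, 0.10), theta=(0.7, 0.7), psi=(0.5, 0.5))
```

Under these inputs the ratio `p[(0, 1, 1)] / p[(0, 0, 1)]` equals `theta[0]`, which is the content of A2.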

Identifiability
Six observable statistics depend on the model parameters. These are the total number of events in arm $r = 0, 1$ in the first period $y_{0r}$, the number of insistors $n_{101}$ (or ambivalents $n_{100}$) in control after the offer, the number of events in the second period in control ambivalents $y_{100}$ and insistors $y_{101}$, and the number of events in the treatment arm in the second period $y_{11}$. The number censored $d_{0r}$ in the first period in each arm $r$ is an ancillary statistic, and binomial distributions are conditional upon the number $n_{tr}$ in each arm $r$ at risk at the start of each period $t$. The expected values of the observable statistics relate to the model as follows. Before the offer, the expected proportion of events in control is

$$E(Y_{00})/n_{00} = \pi_0\{(1 - \rho) + \rho\psi_0\}, \qquad (1)$$

where $\rho$ is the proportion of insistors at baseline, $\pi_0$ is the event probability for control ambivalents and $\psi_0$ is the insistor effect before the offer, and in the treatment arm it is

$$E(Y_{01})/n_{01} = \theta_0\, E(Y_{00})/n_{00}. \qquad (2)$$

At the offer to cross over, we observe the number of insistors still at risk in control $n_{101}$, where

$$E(N_{101}) = \rho\, n_{00} - E(Y_{00})\,\frac{\rho\psi_0}{(1 - \rho) + \rho\psi_0} - \rho\, d_{00}, \qquad (3)$$

which subtracts the expected number of events and censored (assumption A4) in insistors from the expected number enrolled in control at baseline. The term multiplying $E(Y_{00})$ is the probability that the events are insistors. In the treatment arm, the number of insistors at risk at the offer $N_{111}$ is unobserved, but the expected value has the same form as Equation (3) with $r = 1$ in the subscripts rather than $r = 0$. Thus, in the first period, there are three equations but four unknowns ($\rho$, $\pi_0$, $\psi_0$, $\theta_0$). The only parameter that is uniquely identifiable without further constraints is the treatment effect $\theta_0$, replacing expectations by observed values in Equation (2). After the offer to switch, the expected proportion of events in three observed groups (cf. Figure 2) is

$$E(Y_{100})/n_{100} = \pi_1, \qquad (4)$$

$$E(Y_{101})/n_{101} = \pi_1 \psi_1 \theta^*_1, \qquad (5)$$

$$E(Y_{11})/n_{11} = \pi_1 \theta_1 \{(1 - \phi_{11}) + \phi_{11}\psi_1\}, \qquad (6)$$

where $\phi_{11}$ is the probability that individuals at risk in the treatment arm after the offer are insistors. These equations help to assess identifiability of the model. Replacing the expectation with the observed value in Equation (4) provides an estimate of $\pi_1$. There is one degree of freedom for $\hat\psi_0$ and $\hat\rho$ ($\hat\rho$ may be expressed as a polynomial function of $\psi_0$ via Equation (3), and $\hat\phi_{11} = \hat{E}(N_{111})/n_{11}$). It might be important to allow for different rates in both periods, so that $\pi_0$ and $\pi_1$ cannot be tied together through a constant-rate assumption. For example, risk of death increases with age. After incorporating ($\pi_0$, $\pi_1$, $\psi_0$, $\theta_0$), we have three unknowns ($\psi_1$, $\theta_1$, $\theta^*_1$) and two degrees of freedom. However, this is not very restrictive because our a priori assumption is that $\theta_0 = \theta_1 = \theta^*_1$, partly because we are interested in an overall measure of treatment efficacy over the entire period of follow-up, and $\psi_0 = \psi_1$ is also our primary model.
The ability to check these assumptions contrasts favourably with an ITT analysis, where it would be very difficult to check whether the treatment effect is constant because any difference might only be caused by crossover.

Estimation
Estimation might be based on equating observed to expected in Equations (1)-(6), but we recommend maximum likelihood estimation, which is covered in Section 4. For insight into the target of estimation, consider when $\theta^*_1 = \theta_1$ and rearrange Equation (6) using Equations (4) and (5) so that

$$\theta_1 = \frac{E(Y_{11})/n_{11} - \phi_{11}\, E(Y_{101})/n_{101}}{(1 - \phi_{11})\, E(Y_{100})/n_{100}}. \qquad (7)$$

The numerator is the total proportion of events in treatment after crossover minus the proportion of events in insistors after crossover, scaled by the proportion at risk at crossover in treatment. The denominator is the proportion of events in the ambivalents in control after crossover, scaled by the proportion of ambivalents at risk in treatment at crossover. This structure is very similar to that for all-or-none compliance [6]. Essentially, estimation of efficacy after crossover subtracts out the insistors from both arms by using the observed insistors from the control arm.
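The subtraction described above can be checked numerically. The following is a minimal sketch of the moment-style calculation with observed proportions in place of expectations; the function name and argument names are hypothetical, `phi11` (the proportion of insistors among those at risk in the treatment arm at the offer) is treated as known, and the counts are invented so that the true efficacy is 0.7.

```python
def efficacy_after_offer(y11, n11, y101, n101, y100, n100, phi11):
    """Moment-style efficacy estimate after the offer (assumes theta*_1 = theta_1).

    y11/n11   : events / at risk in the treatment arm after the offer
    y101/n101 : events / at risk among insistors in control after the offer
    y100/n100 : events / at risk among ambivalents in control after the offer
    phi11     : proportion of insistors at risk in treatment at the offer
    """
    treated = y11 / n11
    insistors_control = y101 / n101
    ambivalents_control = y100 / n100
    # Remove the insistors' share of events from the treatment arm ...
    numerator = treated - phi11 * insistors_control
    # ... and compare with the ambivalents in control, on the same scale.
    denominator = (1.0 - phi11) * ambivalents_control
    return numerator / denominator


# Invented expected counts: baseline risk 0.1, insistor effect 0.5, efficacy 0.7.
theta1 = efficacy_after_offer(y11=61.25, n11=1000, y101=8.75, n101=250,
                              y100=75, n100=750, phi11=0.25)
```

In practice `phi11` is not observed and has to be estimated, which is one reason the likelihood approach is preferred to this direct calculation.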

Proportional-hazards model
The binomial model extends straightforwardly to an exponential model, but a proportional-hazards model is more flexible, and so we focus on it next. Let the results from the trial for independent individuals $i = 1, \dots, n$ be $(t_i, \delta_i)$, where $t_i$ is the continuous observation time and $\delta_i$ is equal to 1 if the event occurred and 0 if right censored at the last follow-up. Denote the treatment randomisation indicator $r_i$ to be 1 if subject $i$ was randomised to treatment and 0 if control, and $c_i = 1$ if an insistor and 0 if ambivalent. The potential crossover time is $s_i$ (cf. Figure 1). Then the hazard at time $t \geq 0$, conditional on covariates $x_i$, is specified by model (8), where $\lambda_{0i}(t)$ is a baseline hazard function (with an $i$ subscript because it might be stratified so that $\lambda_{0i} = \lambda_{0j}$ if $i$ and $j$ are in the same strata, or not with $\lambda_{0i} = \lambda_0$), $I(.)$ is the indicator function, a prime denotes the transpose, and the parameters are $\vartheta = (\beta, \alpha, \gamma_1, \dots, \gamma_m)$. The treatment effect parameter is $\beta$, the effect of being an insistor is $\alpha$, and the covariate effects are $\gamma = (\gamma_1, \dots, \gamma_m)$. The formulation is an extension of [7], but where the insistors receive the effect of treatment only after crossing over. It is quite different from treating compliance as a time-dependent covariate, which does not respect randomisation and leads to bias; the formulation here instead allows for the latent strata in both arms. The model does not include time-varying covariates, but this is unlikely to be restrictive for randomised trials, where even static covariates are not commonly used other than by stratifying the baseline hazard function.
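One hazard that is consistent with the verbal description above can be written out as follows. This is a sketch under assumptions rather than a quotation of the model: the symbols $\beta$ (treatment effect), $\alpha$ (insistor effect) and $\gamma$ (covariate effects) are labels assumed here, and the use of $t \geq s_i$ rather than $t > s_i$ for the switch is a guess.

```latex
\lambda_i(t \mid r_i, c_i, s_i, x_i)
  = \lambda_{0i}(t)\,
    \exp\bigl[\beta\{r_i + (1 - r_i)\,c_i\,I(t \geq s_i)\}
              + \alpha c_i + \gamma' x_i\bigr]
```

Under this form, ambivalents in control have hazard ratio 1 relative to baseline throughout, insistors in control have $\exp(\alpha)$ before $s_i$ and $\exp(\alpha + \beta)$ afterwards, and the treatment arm has $\exp(\beta)$ for ambivalents and $\exp(\alpha + \beta)$ for insistors at all times.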
The estimators that we develop in the next section require assumptions A1-A3 from Section 2 and the following.
(A4b) Censoring is independent given covariates and independent of latent strata given covariates. (A5a) Latent strata are independent of covariates, or (A5b) latent strata are independent of covariates within defined groups.
(A6) Latent strata are independent of time at entry within defined groups.
A5a is needed for the partial likelihood estimator in Section 4; the more relaxed A5b is for the full likelihood estimator. Neither is an issue if no covariates are used in the model, and A5a was also used by [7] in their partial-likelihood estimator. A6 is needed for both estimators. The assumptions A5 and A6 may be checked using the data after the offer to switch. If A5b and A6 are not met without groups, then groups might be sought where they are met. As for the binomial estimator, one may test if the treatment effect is constant before and after the offer.

An outline of a full maximum likelihood estimation algorithm is as follows; see Section 4 for technical details. Firstly, the probability that each individual is an insistor at baseline is computed given $\hat\alpha$; then the baseline hazard may be fitted conditional upon this and $\hat\vartheta$; finally, the parameters $\vartheta$ are fitted given the baseline hazard and insistor probability estimates. The approach used in this algorithm to estimate the proportion of insistors at baseline (full likelihood) and through time (partial likelihood) is the biggest difference compared with [7]. This is because in the all-or-none compliance case the proportion of switchers at risk through time in the unobserved arm may be estimated simply as the observed number of switchers at risk. However, the new algorithm could also be applied in the all-or-none case, where the main practical difference is that the estimate of the number of non-compliers at risk in the unobserved arm would change each time there is an event in that arm, rather than only each time there is an event in the observed arm. The overall profile-likelihood algorithm continues until convergence in the likelihood [9]. The partial likelihood is asymptotically concave because it is the same form as for all-or-none compliance [7], so it offers a good starting point for full maximum likelihood.
Inference may be based on profile likelihood. One approach that is not recommended is to use Wald confidence intervals from the partial likelihood; these would underestimate variability by treating the proportion of insistors at risk through time as known.

Binary model
In the first period in control ($t = 0$; $r = 0$) and the treatment arm in both periods ($t = 0, 1$; $r = 1$), the probability of an event is

$$(1 - \phi_{tr})\, p_{tr0} + \phi_{tr}\, p_{tr1},$$

where $p_{trk}$ was defined in Section 2, $\phi_{tr}$ is the probability that an individual at risk at the start of period $t$ in arm $r$ is an insistor, and $\phi_{0r} = \rho$ is assumed constant for $t = 0$. There are five unknown parameters in the primary model: the baseline probabilities $\pi_0$, $\pi_1$ (potentially different in both periods), the insistor effect $\psi = \psi_0 = \psi_1$ (common in both periods), the treatment effect $\theta = \theta_0 = \theta_1 = \theta^*_1$ (common in both periods) and the proportion of insistors at baseline $\rho$. The probability of being an insistor at risk in treatment after the offer ($\phi_{11}$) is estimated as a function of the data and parameter estimates ($\pi_0$, $\psi$, $\theta$, $\rho$), as discussed in Section 2. For the treatment arm ($t = 0, 1$; $r = 1$) and the control arm before the offer ($t = 0$; $r = 0$), the likelihood is binomial with this event probability; after the offer in control, the strata are observed and the likelihood uses the stratum-specific probabilities $p_{10k}$. The complete likelihood may be used to form maximum likelihood estimates and profile-likelihood confidence intervals in the usual manner.

For the proportional-hazards model, the likelihood contribution $L_i$ of individual $i$ involves $\rho_i = P(C_i = 1 \mid r_i, x_i)$, the probability that a person is an insistor conditional on baseline covariates $x_i$. If the crossover is observed ($t_i \geq s_i$ and $r_i = 0$), then $L_i$ is the product of two terms given later in Equation (20). The first contribution is a survivor function $P(t_i > s_i \mid r_i = 0, s_i, x_i)$, which is of the same form as Equation (14); the second is the contribution when $c_i$ is known. The full likelihood for everyone is the product of the individual contributions $L_i$, and we will write $l(\vartheta)$, or just $l$, for the log likelihood.

Partial likelihood function.
Consider when $\lambda_{0i}(t) = \lambda_0(t)$ so that there are no baseline-hazard groups. Then the partial likelihood is given by Equation (15). When the latent stratum of an individual $j$ in the risk set at event time $t_i$ is unknown ($r_j = 0$ and $s_j > t_i$, or $r_j = 1$), the relative hazard is averaged over the two strata with weight $\phi(t_i \mid r_j)$, where $\phi(t_i \mid r_j)$ is the conditional probability of being an insistor ($c_j = 1$) still at risk in arm $r_j$ at time $t_i$. Note that we follow assumption A5a and assume $\phi(t_i \mid r_j, x_j) = \phi(t_i \mid r_j)$. When crossover status is known ($r_j = 0$ and $s_j \leq t_i$), the relative hazard is evaluated at the observed $c_j$. For a model with different baseline hazard functions for defined groups, the product of different partial likelihoods of form (15) for each baseline-hazard group is taken.

Proportion of insistors.
In this section, we develop a consistent estimator of the proportion of insistors at risk through time given the event times, the insistor effect $\alpha$ and the number of insistors at baseline $u$, without a full probability survival model. We take a group where $C$ is independent of $X$ (assumption A5b) and of time at entry in the group (A6); the following may be applied separately to all such groups. Then consider arm 0, where latent strata are observed for $t_j \geq s_j$. The proportion of insistors at risk is to be updated at $t'_j = \min(t_j, s_j)$ for $j = 1, \dots, n_0$ individuals in arm 0, which is the crossover time $s_j$ if it is before the survival time $t_j$, or $t_j$ otherwise. The combination of censoring indicator $\delta_j$ and whether the individual was at risk at $s_j$ is classified using

$$e_j = \begin{cases} 0 & s_j > t_j \text{ and } \delta_j = 1 \text{ (event before offer to switch)} \\ 1 + c_j & s_j \leq t_j \text{ (at risk after offer)} \\ 3 & s_j > t_j \text{ and } \delta_j = 0 \text{ (censored before the offer).} \end{cases}$$

We take the order of individuals to be sorted ascending by $t'_j$, so that for indices $j \in (1, \dots, n_0 - 1)$, $t'_j \leq t'_{j+1}$. For a given number $u_0 = u$ of insistors in arm 0 at baseline and insistor effect $\alpha' = \exp(\alpha)$, the number $\hat{u}_j$ and proportion $\hat\phi(t'_j)$ of insistors at risk for $j = 1, \dots, n_0$ are estimated iteratively using

$$\hat{u}_j = \hat{u}_{j-1} - \epsilon_j, \qquad \hat\phi(t'_j) = \hat{u}_j/(n_0 - j), \qquad (16)$$

$$\epsilon_j = \begin{cases} \hat\phi(t'_{j-1})\alpha' \big/ \{1 - \hat\phi(t'_{j-1}) + \hat\phi(t'_{j-1})\alpha'\} & e_j = 0 \text{ (unknown stratum, event before the offer)} \\ 0 & e_j = 1 \text{ (ambivalent)} \\ 1 & e_j = 2 \text{ (insistor)} \\ \hat\phi(t'_{j-1}) & e_j = 3 \text{ (unknown stratum, censored before the offer).} \end{cases}$$

The rationale is as follows. When $e_j = 1, 2$, then we know whether the individual crossed over, and the number still at risk $\hat{u}_j$ at time $t'_j$ is updated accordingly. When $e_j = 3$, then $\hat\phi(t'_j)$ does not change because censoring is assumed independent of insistor status. If $e_j = 0$, then $c_j$ is unobserved and $\epsilon_j$ estimates the probability $P(c_j = 1 \mid e_j)$; compare with Equation (3) for the binomial model. It arises from model (8), where the ratio of the event intensities for an insistor and an ambivalent is $\alpha'$, using $N(t \mid c_j)$ to denote a counting process that jumps by one when there is an event. For example, if the strata are balanced with $\phi(t'_j) = 0.5$, but the insistor effect is $\alpha' = 2$, then for each event among the ambivalents ($c_j = 0$), $\alpha' = 2$ events are expected in the insistors ($c_j = 1$), that is, $\epsilon_j = 2/3$. Similarly, if $\alpha' = 1$ and there are twice as many insistors as ambivalents ($\phi(t'_j) = 2/3$), then for each event among ambivalents, there will be $\phi(t'_j)\{1 - \phi(t'_j)\}^{-1} = 2$ events expected in the insistors. Although the survival model is in continuous time, a few ties are sometimes tolerated. If $t'_j = t'_{j+1} = \dots = t'_{j+k}$ are tied, then the suitable adjustment is to set the associated $\hat{u}_l = \hat{u}_{j+k}$ for $l = j, \dots, j + k - 1$.
Secondly, the number of insistors at baseline $u$ may be estimated by maximum likelihood, given $\alpha$. The data are occasions when an individual crossed over or not, that is, when $e_j = 1, 2$. For $j = 1, \dots, n_0$, if $e_j = 0$ or 3, then the log-likelihood $l_j(u \mid \alpha) = 0$, else

$$l_j(u \mid \alpha) = c_j \log \phi^*(t'_j) + (1 - c_j) \log\{1 - \phi^*(t'_j)\},$$

where $\phi^*(t'_j)$ is the estimated proportion of insistors at risk just before $t'_k$, and $k$ selects the first crossover event time equal to $t'_j$. The $\phi^*(t'_j)$ definition is needed for ties when more than one event occurs at each point $t'_j$; the same estimate of the proportion of insistors should be used for all of them. The proportion of insistors at the start may be estimated by maximising the overall log-likelihood $l(u \mid \alpha) = \sum_{j=1}^{n_0} l_j(u \mid \alpha)$. Finally, given $u$ and $\alpha$, one may estimate the proportion of insistors at risk in the observed arm 0. If the proportion of insistors at the start of the trial in arm 0 is estimated to be the same as in arm 1, then one may use Equation (16) but for the unobserved arm 1.
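The iterative update for the proportion of insistors at risk can be sketched in code. This is an illustrative reading of the rationale above, not the authors' implementation: the function name, the coding of `e` as integers 0 to 3, and the recursion "subtract the expected insistor count of each departing individual, then divide by the number remaining" are assumptions, and ties are not handled.

```python
def insistor_proportions(e, u0, alpha_ratio):
    """Track the estimated proportion of insistors among those still at risk.

    e           : status codes sorted by t'_j (0 event before offer, 1 ambivalent,
                  2 insistor, 3 censored before offer)
    u0          : assumed number of insistors at baseline
    alpha_ratio : insistor hazard ratio, exp(alpha)
    Returns the proportion in force before each individual's update.
    """
    n0 = len(e)
    u = float(u0)
    phi = u / n0
    phis = [phi]
    for j, code in enumerate(e, start=1):
        if code == 0:    # stratum unknown: probability the event was in an insistor
            eps = phi * alpha_ratio / ((1.0 - phi) + phi * alpha_ratio)
        elif code == 1:  # revealed as ambivalent at the offer
            eps = 0.0
        elif code == 2:  # revealed as insistor at the offer
            eps = 1.0
        elif code == 3:  # censoring is independent of stratum
            eps = phi
        else:
            raise ValueError("status code must be 0, 1, 2 or 3")
        u -= eps
        remaining = n0 - j
        if remaining == 0:
            break
        phi = u / remaining
        phis.append(phi)
    return phis


# Invented example: four individuals, two assumed insistors, no insistor effect.
example = insistor_proportions([0, 3, 1, 2], u0=2, alpha_ratio=1.0)
```

With a censored individual (code 3) the proportion is unchanged by construction, which matches the independence-of-censoring argument; a fuller version would also bound the running count between zero and the number remaining.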

Baseline hazard function.
A baseline hazard function may be estimated for each baseline-hazard group, so for simplicity take $\lambda_{0i}(t) = \lambda_0(t)$ for $i = 1, \dots, n$, where $\lambda_0(t)$ is a step function with unknown jumps $\Delta_i$ at each observed event time $t_i$, and assume that the times are sorted ascending. To estimate it, we use the partial derivative of the log likelihood with respect to $\Delta_i$. In the following, we drop conditioning arguments from the setup except $c$ in order to reduce notational burden, but they are still present. Firstly, consider $j = 1, \dots, n$ where $r_j = 1$, that is, those in the treatment arm whose crossover status $c_j$ is not observed. Let $\rho_j = P(C_j = 1 \mid x_j)$; then, for $j = i, \dots, n$ such that $r_j = 1$, the contribution is given by Equation (19) and denoted $a_1$, where $:=$ denotes definition. Now consider $j = 1, \dots, n$ in arm 0. When insistor status is not observed because $t_j < s_j$, then the same form as Equation (19) applies; denote it $a_2$. If $t_j \geq s_j$, then the likelihood contribution is the product of two terms, so that overall for those observed to cross over ($t_j > s_j$) the contribution is given by Equation (20); denote the corresponding term $a_3$. Overall, the derivative is given by Equation (21). Setting Equation (21) to zero provides a route to estimation of $\Delta_i$ for $i = 1, \dots, n$, conditional on $\vartheta$ and $\rho$. If $\delta_i = 1$ and $\delta_{i+1} = 1$ and it is not true that $t_i < s_j < t_{i+1}$ for $j = i + 1, \dots, n$, then the difference $\Delta_i^{-1} - \Delta_{i+1}^{-1}$ takes the simple form given in Equation (22), $D_i$ say. This simple form arises because the $i + 1, \dots, n$ terms in the $a_1$, $a_2$, $a_3$ summations over $j$ cancel out. Extra terms are needed with censoring, and if $t_i < s_j < t_{i+1}$ for $j = i + 1, \dots, n$, because the derivatives of those $j$'s likelihood contribution with respect to $\Delta_i$ do not cancel out. Thus, more generally for two events at times $t_i$ and $t_{i+k}$ with censored observations between from $i + 1$, say, Equation (23) applies, where the first summation is for the censored observations and the second summation is for those observed to cross over in arm 0 between $t_i$ and $t_{i+k}$. Now $\Delta_{i+1}^{-1} = \Delta_i^{-1} - D_i$, and the problem of estimating the baseline given $\vartheta$ is reduced to determining $\Delta_i$ for the first $i$ (i.e. smallest $t_i$) with $\delta_i = 1$. This may be achieved using a root-finding algorithm for the score function.
The above extends when there are ties. Suppose that $w_i$ times are tied following $t_i$ ($t_i = t_{i+1} = \dots = t_{i+w_i-1}$), and $w_{i+k}$ following $t_{i+k}$. Then the left-hand side of Equation (23)

Example
We next apply the binomial estimator to the data from two monotherapy arms in the BIG-1 98 trial, where postmenopausal women who had been diagnosed with hormone-receptor positive invasive breast cancer were randomised to receive 5 years of tamoxifen (control) or letrozole (treatment). Following the primary analysis [1], women who were still receiving tamoxifen were unblinded [2]. In this example, one might question the model assumption that the treatment effect of letrozole is the same after taking tamoxifen as at baseline (i.e. $\theta_1 = \theta^*_1$ in the notation of Section 2). This issue was actually investigated by planned crossover arms in the BIG-1 98 trial, where little difference was found [10], so we proceed under the assumption. Table I shows a summary of data previously reported [10,11]. The results before unblinding are in period 0; period 1 is the time after that until 12-year follow-up. Of 2459 women randomised to tamoxifen, some 25% received letrozole after unblinding, with most crossing over between 3 and 5 years since randomisation. Table II shows published results from fitting ITT and inverse probability of censoring weighting (IPCW) stratified proportional hazards models using individual time-to-event data [10].

Note to Table I: The data in period 0 were reported by [11]. The number of events in period 1 is from the 12-year follow-up analysis [10] minus events in period 0. The number at risk in the second period was calculated as the number randomised minus the number (i) of disease-free survival events in the first period, (ii) lost to follow-up and (iii) withdrawn (Table A2 in [11]), split by the reported 619 who chose to cross over [10].

The per-protocol approach that censors women who ceased to comply with their randomised allocation is not suitable for these data because there was strong evidence that those who crossed over had a different baseline risk than those who did not; the group who crossed over from tamoxifen in Table I had a lower proportion of events (9.4%) than the letrozole arm (13.0%). Thus, the randomised balance of risk factors between comparison groups is almost certainly lost in a per-protocol analysis. An ITT analysis was applied using just the number of events and women in both arms. In general, relative risks are closer to unity than hazard ratios, so it is not surprising that the relative risks for ITT were closer to unity than the hazard ratio in Table II. However, differences were small and ITT inference was unchanged. The binomial efficacy estimate for disease-free survival was lower than the ITT estimate by a similar amount to the IPCW estimate. This provides some support to the IPCW estimate for disease-free survival.
We also used the model to allow for different treatment effects in each period. There was little heterogeneity between the point estimates of efficacy before and after unblinding (Table II; Figure 4). However, the wide confidence intervals after crossover indicate that further follow-up would be useful to determine whether disease-free survival was better in the first period. There was also almost no evidence for a difference between $\psi_0$ and $\psi_1$ (data not shown).
A limitation of the example is that it is best viewed as a demonstration of the method. Firstly, the actual number at risk in the second period was only estimated on the basis of published data (although small changes to these numbers will not materially affect results). Secondly, application of the proportional-hazards model on individual data would also be a better comparison with IPCW. Finally, we were unable to apply either estimator for mortality endpoints because the subdivision of data illustrated by Figure 3 would be needed.

Simulations
The following simulations consider the performance of the full and partial likelihood estimators for different treatment and insistor effects, as the level of information on the treatment effect changes from censoring. Nelder-Mead simplex algorithms were used for both full and partial likelihood estimation, but very little difference was seen to a Broyden-Fletcher-Goldfarb-Shanno algorithm (not reported); Brent-Dekker algorithms estimated the insistor proportions and baseline hazard function [12].
Survival times were simulated from an exponential distribution. The simulation was set up so that there were three periods in follow-up time: before, during and after crossover. The period during crossover was the same length as the period before it. We considered two censoring scenarios: (i) with censoring, where an expected 5% of events occurred before the first crossover in the lowest hazard group, and 5% of events after the last crossover; and (ii) no censoring, with an expected 10% of events before the first crossover in the lowest hazard group. For example, if the hazard was less for insistors and the treatment group, then the lowest hazard group was insistors randomised to treatment. One thousand individuals were in each arm, and the chance of being an insistor was 25% in both arms, but the number at baseline was sampled independently in both arms from a binomial distribution. The treatment effect ($\exp(\beta)$) was 7/10 or 10/7; the effect of being an insistor ($\alpha' = \exp(\alpha)$) was either 0.2 or 2.0.
The pseudo algorithm used to generate the data is as follows. For $r = 0, 1$ and $i = 1, \dots, n_r$ individuals in arm $r$ with maximum follow-up $F$ and an event rate pre-crossover $\lambda_{ar}(c_i)$ and post-crossover $\lambda_{pr}(c_i)$ conditional on underlying crossover trait $c_i$, sample (i) crossover trait $c_i$ at baseline from $P(c_i = 1) = 0.25$; (ii) crossover time $s_i$ from a uniform distribution between 0 and 1, $U(0, 1)$, plus 1; (iii) survival time $t_{i1}$ from an exponential distribution with pre-crossover rate $\lambda_{ar}(c_i)$; (iv) survival time $t_{i2}$ from an exponential distribution with post-crossover rate $\lambda_{pr}(c_i)$. Then calculate (i) unobserved survival time (no censoring) $t^u_i = t_{i1}$ if $t_{i1} \leq s_i$, else $t^u_i = s_i + t_{i2}$; (ii) observed survival time (with censoring) $t_i = \min(t^u_i, F)$; and (iii) event status $d_i = 1$ if $t_i < F$, else $d_i = 0$.
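The data-generating steps above translate directly into code. The following is a minimal sketch for one arm using only the Python standard library; the function name, the use of dictionaries keyed by the crossover trait for the rates, and the example rate values are assumptions made for illustration.

```python
import random


def simulate_arm(n, pre_rate, post_rate, follow_up, p_insistor=0.25, rng=None):
    """Simulate (c, s, t, d) for n individuals in one arm.

    pre_rate, post_rate : dicts mapping trait c (0 ambivalent, 1 insistor) to the
                          exponential event rate before and after crossover time
    follow_up           : maximum follow-up F (administrative censoring)
    """
    rng = rng or random.Random()
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < p_insistor else 0   # (i) crossover trait
        s = 1.0 + rng.random()                      # (ii) crossover time, U(0, 1) + 1
        t1 = rng.expovariate(pre_rate[c])           # (iii) pre-crossover survival time
        t2 = rng.expovariate(post_rate[c])          # (iv) post-crossover survival time
        t_unobs = t1 if t1 <= s else s + t2         # survival time without censoring
        t = min(t_unobs, follow_up)                 # observed time
        d = 1 if t < follow_up else 0               # event status
        rows.append((c, s, t, d))
    return rows


# Invented control-arm rates: baseline 0.05, insistor hazard ratio 2, and
# treatment hazard ratio 0.7 applied to insistors only after they cross over.
control = simulate_arm(1000, pre_rate={0: 0.05, 1: 0.10},
                       post_rate={0: 0.05, 1: 0.07},
                       follow_up=3.0, rng=random.Random(1))
```

For a treatment arm the same function applies with identical pre- and post-crossover rates, since no one switches away from treatment.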
Full and partial likelihood estimators are compared in Table III. Mean bias was very small; for example, 0.467% for $\hat\beta$ from the partial likelihood in scenario 4 implies that the mean estimated treatment effect was 0.699 = exp{(1 + 0.00467) × log(0.7)}. The full likelihood bias was less than the partial likelihood for both the treatment effect $\beta$ and insistor effect $\alpha$, but their variances were very similar. There was also very little difference in the estimated proportion of insistors at baseline. Thus, overall, the full and partial likelihood estimators performed similarly.

To evaluate the use of a simple bootstrap for inference, Table IV presents some results of resampling with replacement individuals in each arm in each simulation sample. Two hundred bootstrap resamples were made in each sample, and an estimate of the variance of the parameter estimate was obtained. The table shows that this matched the observed variance from the 1000 simulation replicates at each scenario quite closely. Because the variance of the full likelihood matched that of the partial likelihood quite closely in Table III, one might just bootstrap the partial likelihood as computational time becomes more of a consideration.
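The resampling scheme described above (individuals drawn with replacement within each arm) can be sketched generically. The code below is an illustration only: `bootstrap_variance` and `risk_difference` are hypothetical names, and the toy estimator stands in for the partial or full likelihood fit, which is not reproduced here.

```python
import random
import statistics


def bootstrap_variance(control, treatment, estimator, n_boot=200, seed=None):
    """Variance of an estimator under within-arm resampling with replacement.

    control, treatment : sequences of per-individual records
    estimator          : callable taking (control_sample, treatment_sample)
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample each arm separately so that arm sizes are preserved.
        c_star = rng.choices(control, k=len(control))
        t_star = rng.choices(treatment, k=len(treatment))
        estimates.append(estimator(c_star, t_star))
    return statistics.variance(estimates)


def risk_difference(control, treatment):
    """Toy estimator on 0/1 event indicators (a stand-in for a model fit)."""
    return sum(treatment) / len(treatment) - sum(control) / len(control)


# Invented data: 30/100 events in control and 20/100 in treatment.
control_events = [1] * 30 + [0] * 70
treatment_events = [1] * 20 + [0] * 80
var_hat = bootstrap_variance(control_events, treatment_events,
                             risk_difference, n_boot=200, seed=1)
```

For this toy estimator the result can be compared with the usual binomial variance, 0.3 × 0.7/100 + 0.2 × 0.8/100 = 0.0037, which gives a quick check that the resampling is behaving sensibly.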

Conclusion
The methods in this paper are applicable for analysis of trials with selective crossover arising due to results from early efficacy endpoints and preserve randomisation. Our binary estimator is relatively simple to implement, but the proportional-hazards model is more involved. An R package [13] with code for both is available on request. We end by discussing the model assumptions in relation to other methods, and comment on other breast cancer trials where treatment efficacy might be estimated.
The most common methods to account for selective crossover in practice are IPCW [14] and g-estimation [15,16]. IPCW censors individuals when they cease to comply with randomised allocation, but up-weights those who do comply and have similar characteristics to the non-compliers, as judged by a chosen statistical model. The problem is that the assumptions required to guarantee that randomisation is maintained cannot be verified because it is not possible to check the randomised balance of factors that have not been measured. G-estimation involves predicting the outcome for each person who switches treatment to be that expected if they had not switched. It is often accomplished through an accelerated failure-time model [17]. In contrast with IPCW, randomisation is maintained. A disadvantage is that it is impossible to assess the assumed effect of treatment on the insistors after the offer to switch because there is no comparison group: all insistors receive treatment after the offer. It is also not possible to allow for time-dependent treatment effects. Other advantages and disadvantages of these methods have been discussed more fully elsewhere [18,19].
Although inverse probability weighting and g-estimation make important untestable assumptions that are not required by our method, they may have better statistical performance than our estimators when their assumptions are most closely met. For instance, assumptions A5b and A6 might be restrictive for a particular data set. We also note that assumption A1 precludes a stochastic time of crossover, perhaps arising due to the health of a participant through time when randomised to control, but inverse probability weighting and g-estimation do not require this assumption. An important example of this in oncology is crossover after progression-free survival in advanced disease.
Other oncology trials appear to be suitable for our estimators. The open-label ABCSG-8 addressed a similar issue to BIG-1 98 and after publication of initial results approximately 9% crossed over from the tamoxifen arm [20]. The MA.17 trial showed superiority of letrozole over placebo for postmenopausal women with hormone receptor-positive breast cancer following about 5 years of tamoxifen [21] and almost two-thirds of the patients in the control arm crossed over after unblinding. In the open-label Herceptin Adjuvant (HERA) trial, more than half of patients in the 'control' group chose to cross over after a first interim analysis [22]. With such substantial crossover, methods such as those in this article are clearly needed to help assess longer-term differences between treatments.
In conclusion, we recommend that treatment efficacy and our estimators thereof be considered for trials with selective crossover in one arm arising due to results from early efficacy endpoints.