Estimation in multi‐arm two‐stage trials with treatment selection and time‐to‐event endpoint

We consider estimation of treatment effects in two‐stage adaptive multi‐arm trials with a common control. The best treatment is selected at interim, and the primary endpoint is modeled via a Cox proportional hazards model. The maximum partial‐likelihood estimator of the log hazard ratio of the selected treatment will overestimate the true treatment effect in this case. Several methods for reducing the selection bias have been proposed for normal endpoints, including an iterative method based on the estimated conditional selection biases and a shrinkage approach based on empirical Bayes theory. We adapt these methods to time‐to‐event data and compare the bias and mean squared error of all methods in an extensive simulation study and apply the proposed methods to reconstructed data from the FOCUS trial. We find that all methods tend to overcorrect the bias, and only the shrinkage methods can reduce the mean squared error. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.


Introduction
Multi-arm trials, which compare multiple treatment arms to a common control group, are more efficient than traditional two-arm designs as only a single control group is used [1]. Additional efficiency gains can be achieved by selecting treatments at one or more interim analyses [2]. Most of the recent research in this area focuses on how such studies are designed for a normally distributed endpoint (e.g., [3][4][5][6]). Approaches that are applicable to non-normal and particularly time-to-event endpoints are less frequent in contrast (e.g., [7][8][9][10]), although many examples of multi-arm trials with survival endpoints, such as the three-arm SANAD trial in generalized and unclassifiable epilepsy [11], the four-arm STAMPEDE trial in prostate cancer [12], or the five-arm FOCUS trial in poor prognosis advanced colorectal cancer [13], exist.
Adaptive treatment selection at an interim analysis increases the chance of a successful trial and avoids wasting resources on unpromising treatments [1]. It is well known, however, that selecting promising treatments based on the observed data leads to overestimation of the treatment effect in the selected treatment arms and an underestimation of the treatment effect in the dropped treatment arms [14]. A variety of different strategies to remove or reduce the bias has been developed for normal data. The uniformly minimum variance conditionally unbiased estimator (UMVCUE) in a two-stage design when selecting the best treatment was first derived by Cohen and Sackrowitz [15] and extended to more general selection rules in [16] . A simple iterative method for calculating a bias-corrected estimate based on the estimated conditional selection bias was proposed by Stallard and Todd [17]. An extension of the shrinkage estimator of Hwang [18] to two-stage adaptive designs has been proposed by Carreras and Brannath [19]. They show that the shrinkage estimator has smaller Bayes risk than the maximum likelihood estimator (MLE) and also a smaller mean squared error (MSE) in many scenarios.
Common to the methods for normal data is that they assume equal variance between treatment arms and do not consider the control group, because in the normal case, the conditional selection bias of the MLE is the same with or without a control group (although the equal variance assumption has been relaxed for the UMVCUE in recent work [20], which also explicitly accounts for a shared control group). However, in the context of survival trials and for the estimation of log hazard ratios (log-HRs), a control group is required. The challenge thereby is that a control group, which is common to all treatment groups, induces a correlation among the log-HR estimators. Moreover, the assumption of equal and known variances is unrealistic in the survival setting as the variances depend on the number of observed events. In this paper, we therefore adapt the method of Stallard and Todd [17] to time-to-event endpoints and consider shrinkage estimators in this context. In an extensive simulation study, we then compare the bias and MSE of these estimators and analyze reconstructed data from the FOCUS trial.
In Section 2.2, we derive explicit formulas for the selection bias of the MLE in the selected, as well as in the dropped treatment arms under a Cox proportional hazards model, when the treatment with the largest observed effect at an interim analysis is selected. We derive two shrinkage estimators in Section 2.3 and adapt the iterative Stallard-Todd method to survival data in Section 2.4. The simulation study in Section 3 compares the bias and MSE of these methods for different number of treatment groups, allocation ratios, and total number of events. An analysis of reconstructed data from the FOCUS trial is presented in Section 4 before we conclude with a discussion.

Methods
We consider a two-stage design where the best treatment, that is, the treatment with the smallest estimated log-HR, is selected at the interim analysis. Let S be the index of the selected treatment. Note that S is a random variable depending on the Stage 1 data.

Model
Consider K ⩾ 2 treatment groups and a common control group. Denote the total sample size by n, and suppose patients are allocated to group k in Stage 1 with probability p k (k = 0, … , K). Here, the index k = 0 represents the control group. Assuming proportional hazards between each treatment group and the control group, we consider the following Cox proportional hazards model: where is the vector of log-HRs and Z j = (Z 1j , … , Z Kj ) is the vector of binary treatment indicators, that is, Z kj = 1 if patient j is in group k and 0, otherwise (k = 1 … , K, j = 1, … , n). The joint model in Eq. (1) is equivalent to K separate Cox models with the same baseline hazard function 0 , which is the hazard in the common control group. In a sequential survival trial with more than one analysis time point, we need to distinguish between the calendar time, measured from the start of the trial, and the individual survival time of each patient. Associated with each patient is an entry time R j , measured from the start of the trial, a survival time T j , and a censoring time C j , both measured from entry of the patient into the trial. At each calendar time t, only the minimum Y j (t) = min(T j , C j , t − R j ) and the censoring indicator Δ j (t) = I{T j ⩽ min(C j , t − R j )} can be observed. The observed data of all patients at calendar time t are given by is the number of patients recruited by calendar time t.

Maximum likelihood estimator
The MLE of in Eq. (1) at calendar time t is denoted bŷM LE (t) = (̂M LE 1 (t), … ,̂M LE K (t)) and is obtained by maximizing the partial-likelihood of the Cox model based on the data D(t). In practice, interim and final analyses are usually not scheduled at fixed calendar times but after a certain number of events have been observed. Let d j be the total number of events of the j-th analysis and d kj the number of events in the k-th group at the j-th analysis, such that d j = d j0 + … + d jK (k = 0, … , K, j = 1, 2). Denote bŷ MLE,j the vector of estimated log-HRŝM LE,j k (k = 1, … , K, j = 1, 2) at the interim and final analyses, respectively. The estimated log-HR in the selected treatment group at the interim analysis iŝM LE,1 The true treatment effect S of the selected treatment is a random variable, and in general, S ≠ min k k . We say that an estimator̂M The MLEs are correlated because of the common control group. If ≈ 0 and the censoring time C is stochastically independent from Z, we show in Section A in the Appendix that the correlation kl between Note that when the size of the common control group decreases, the correlation of the MLEs increases. In the special case where all treatment groups are the same size, the correlation between any two log-HR estimators at the interim analysis is approximately (1 − p 0 )∕(1 + (K − 1)p 0 ). The derivation of the selection bias is based on the joint asymptotic distribution of the vectorŝM LE ,1 and̂M LE ,2 . We have approximately where Σ j is the asymptotic covariance matrix of̂M LE ,j (j = 1, 2) (see Section A in the Appendix).

Estimation in a two-stage design. Denote bŷM LE
S the increment of the MLE based on all data observed after the interim analysis (including events observed in Stage 2, from patients recruited in Stage 1). This is an (asymptotically) unbiased estimator of the true log-HR, which is inefficient, because it ignores all of the Stage 1 data. Formally,̂M LE S can be defined via the score process U of the Cox model and the corresponding Fisher information I conditional on the selection S. Asymptotically U 2S − U 1S and U 1S are stochastically independent, because of the independent increments property of U. Thus, the increment of the MLE given bŷM is asymptotically independent from̂M LE ,1 S conditionally on S. At the final analysis, the MLE of the selected treatment can be defined in two different, but asymptotically equivalent, ways, either bŷM LE ,2 , that is, from all data accrued until the final analysis or by a weighted sum of the first stage MLE and the increment from first to second stage, that is, where w ∈ [0, 1] is a pre-specified constant. We will consider both estimators in our simulation study. Usually, w will be equal to the information fraction at the planned time of the interim analysis, that is, w = d 1 ∕d 2 .

Selection bias and mean squared error.
The bias of the MLE of the selected treatment at analysis j can be written as The true treatment effect in the selected treatment arm is overestimated, when the best performing treatment arm is selected, because One can see that the true treatment effect in the dropped treatment arms is underestimated.
For normal means, the selection bias is maximal if all means are equal [19]. Their result and its proof still hold asymptotically for the estimated log-HRs when the signs are reversed; that is, for given variances and correlation, the absolute value of the bias is maximal if all true log-HRs are equal. We obtain a lower bound for the selection bias of̂M LE ,j S depending on the asymptotic covariance matrix Σ j . This bound is attained when the true log-HRs are all equal. Because the correlations are always nonnegative in our setting, it is an immediate consequence of Slepian's lemma [21] that for fixed variances, the lower bound C j (Σ j ) is minimal when all correlations are 0. This corresponds to a situation where every treatment group has its own independent control group.
There are two possible ways of defining the MSE, because S is a random variable. First, the MSE of which is the definition that we will use in our simulation study, or alternatively the MSE of estimating Note that while the bias does not depend on the correlation in the normal case, this is not the case for the variance. The MSE is always larger with control group than without a control group: where X k is the sample mean and k is the true mean in group k = 0, … , K.

Conditional selection bias.
We will now give explicit expressions for the selection probabilities and the conditional selection biases at the interim and final analyses. The detailed derivations can be found in Section B in the Appendix.
respectively. The probability of selecting treatment k is given by At the interim analysis, the conditional selection bias cb + 1k of̂M LE ,1 In the dropped treatment arm, the MLE is biased in the opposite direction; that is, the treatment effect is underestimated, and the bias cb − 1k of̂M LE ,1 We derive now an explicit formula for the conditional selection bias at the final analysis. Let u k = Σ 2k. Σ −1 1 , where Σ 2k⋅ is the k-th row of the matrix Σ 2 . Then, the conditional selection bias at the final analysis can be written as where In the dropped treatment arm, analogously to Eq. (3), We have shown in Section 2.2.2 that the absolute value of the bias is maximal when the correlation is 0; that is, each treatment group has its own independent control group. In this case, Eqs (4) and (5) Thus, each additional event observed after the interim analysis reduces the conditional bias, and the bias at the final analysis will be smaller than at the interim analysis. We also see that only the total number of observed events matters and not whether the event comes from a patient recruited before or after the interim analysis. This is especially interesting in the dropped treatment arms, because it quantifies the bias reduction that we can expect, when continuing follow-up until the final analysis. In our simulation study in Section 3, we see that this relation also holds approximately in the case of a common control group. This result is analogous to the normal case, for example [22].
In a two-arm trial with a single treatment group, recruitment may be stopped early at the interim analysis for the lack of benefit, when the estimated log-HR is larger than some pre-specified threshold c, but follow-up is continued until the final analysis. The selection bias in this case can be obtained as a special case by setting K = 2,̂M LE ,1 2 = c, and letting 12 → 0. This leads to We see that Eq. (6) still holds; that is, additional events reduce the conditional selection bias. This confirms the simulation study results of [23].

Shrinkage estimator
We consider two different approaches to deriving the shrinkage estimator. The first approach follows closely the original derivation of [18], by considering the posterior mean of the selected treatment when the covariance matrix of the data is known, but arbitrary, and then replacing the prior mean and variance by their sample estimates. The second approach first scales the estimators by the square root of their covariance matrix to obtain independent estimators with unit variances and then rescales the shrinkage estimates back to the original scale.
In the normal case with known and equal variances and four or more treatment groups, the shrinkage estimator can be derived as an empirical Bayes estimator [24]. Let X k be the sample mean and i the true mean in group k (k = 1, … , K ⩾ 4). We assume normal priors for the true means k ∼ N( , 2 ) and that the sample mean X k conditional on k is normally distributed with mean k and known variance 2 (k = 1, … , K). The posterior mean of k conditional on X k is given by (1−C) +CX k and the posterior variance by 2 To obtain an estimator of the posterior mean of S , the unknown quantities and C are estimated by the overall sample meanX andĈ respectively.Ĉ is an unbiased estimator of C [24]. Because C ⩾ 0, the estimatorĈ is replaced byĈ + = max(Ĉ, 0). Therefore, the shrinkage estimator is given by The aforementioned estimators are only defined for K ⩾ 4. The best linear unbiased predictor can be used instead when K ⩽ 3. The best linear unbiased predictor is the same as the empirical Bayes estimator but with K − 3 replaced by K − 1 [19].

2.3.1.
Shrinkage estimator for the log hazard ratios. The extension from independent equal variances to the general case of an arbitrary covariance matrix is straightforward. In order to simplify the notation, we writêin the following for the MLE at Stage j instead of̂M LE ,j . Assume now where I is the K × K identity matrix and the Σ the covariance matrix, which we assume to be known for the moment. Note that we still put independent priors on the true treatment effects. The posterior mean of given the datâis where the shrinkage factor C is now the matrix I − Σ( 2 I + Σ) −1 . The marginal covariance matrix of̂is 2 I + Σ. Because Σ is symmetric, there exists an orthogonal matrix U and a diagonal matrix D such that D = UΣU ′ . Thus, Ûis a vector of independent normal distributed variables. The marginal covariance matrix of Ûis 2 I + D. The following iterative procedure described by Morris [24] to calculate the MLÊ 2 of 2 can now be applied to the transformed data Û: 1 Start with an initial guess of̂2. 2 Define weights w k = (̂2 + D 2 kk ) −1 (k = 1, … , K).
As Σ is unknown, we apply this iterative method with the diagonal matrixD =ÛΣÛ ′ instead of D, whereΣ is an estimator of the unknown covariance matrix Σ andÛ is an orthogonal matrix to diagonalizê Σ. We can now define an estimator of the matrix C bŷ Similarly as before, we replace the unknown prior mean by the overall log-HR̄, which is calculated by pooling all treatment groups together and fitting a Cox proportional hazards model to the pooled data. The shrinkage estimator is defined bŷE The shrinkage estimator of the selected treatment iŝE B S . If̂2 = 0, thenĈ = 0, and the shrinkage estimator is equal to the overall log-HR̄. We will call this the EB shrinkage method.
We could have diagonalized with the orthogonal transformation U from the beginning; that is, we could have used U and Û, applied the shrinkage, and then transformed back to the original coordinates. This would have led to the shrinkage factor: which is exactly the same as C, because U is orthogonal.

Shrinkage estimator of the scaled log hazard ratios.
Instead of only diagonalizingΣ, we standardize it to the identity matrix and then apply the method of the independent equal variances case. This leads to an alternative shrinkage estimator. Consider the problem of shrinkage towards 0 when Σ is known. We start by transforming the vector̂with covariance matrix Σ to a vector * with independent components and unit variances. Define the scaled MLEs: * = Σ −1∕2̂∼ N(Σ −1∕2 , I).
When we use a N( * , * 2 I) prior on the scaled treatment effects Σ −1∕2 , we know from the independent case with equal variances that the empirical Bayes estimator is given byĈ +̂ * , wherê The right hand side is the Wald test statistic for testing equality of all treatment effects, which is asymptotically equivalent to the log-rank test statistic in the Cox model in Eq. (1). When shrinking towards the overall log-HR̄, we take the log-rank test statistic Z, which directly compares the treatment groups, that is, without the control group and treatment group 1 being the new baseline, instead. Therefore, ) .
As in the normal case, we replace K − 3 by K − 1 if K < 4. The log-rank test statistic Z is asymptotically 2 K−1 distributed. This is the justification to consider the shrinkage estimator: which we will call the LR shrinkage method.

Two-stage shrinkage estimators.
We define a two-stage estimator similar to the two-stage shrinkage estimator proposed by Carreras and Brannath [19] using the increment of the MLE: * ,1, S ∶= ŵ * ,1 wherê * ,1 S is one of the two shrinkage estimators at the interim analysis defined previously.

Confidence intervals.
Our focus is on point estimation. Construction of correct confidence intervals is difficult and a topic of ongoing research. No adequate methods for constructing confidence intervals in this setting are available apart from bootstrap resampling. However, bootstrap confidence intervals cannot properly account for the variability introduced by the selection process, because the selection in a resampled data set may be different from the decision in the original data set, but the decision to stop recruitment in the dropped treatment arms cannot be reversed in retrospect. Despite this, such intervals are useful to compare the relative merits of the estimators. We use a bootstrap approach conditional on the selection S 0 in the original data set similar to [25]: 1 Resample data accrued until final analysis; 2 Determine calendar time t * of interim analysis in resampled data set; 3 Obtain Stage 1 data from resampled data set, and if necessary, recensor event times of overrunning patients at t * ; 4 Calculate Stage 1 MLEs̃k (k = 1, … , K) from Stage 1 data; 5 If̃S 0 = min k̃k , calculate bias-corrected estimates. These steps should be repeated until a large enough number of estimates have been produced in order to calculate empirical quantiles of the sampling distributions with the desired accuracy. Note that the interim analysis in each resampled data set is conducted at the same information fraction as in the original data set, although this may correspond to a different number of events and calendar time.
The conditional biases depend on the unknown covariance matrix of̂M LE ,1 (Section 2.2.3), which we replace with the estimatorΣ 1 from the Cox model. The estimation of the vector b(̃) is computationally complex, because it requires the numeric evaluation of integrals of multivariate cumulative distribution functions. Transforming the vector̂M LE ,1 in the same way as in Section 2.3.2 and then solving Eq. (9) with the identity matrix as covariance matrix does not work, because these solutions are in general not solutions of the original equation.
We have defined the Stallard-Todd estimator only at the interim analysis. A similar definition is possible at the final analysis using the bias formulas derived in Section 2.2.3, that is solving Eq. (9) witĥ MLE ,2 instead of̂M LE ,1 . However, in our simulations, the iterative procedure diverged most of the time. This also happened in the data example in Section 4. This seems to be related to the fact that the selection bias depends on the marginal covariance matrix, but only the conditional covariance matrix given the selection can be estimated at the final analysis. We therefore only use the two-stage estimator at the final analysis defined bŷS Confidence intervals for the Stallard-Todd estimator can only be obtained by resampling, as its (asymptotic) sampling distribution is not known. Simulations not reported here suggest that asymptotic normality does indeed hold.

Bias-corrected Kaplan-Meier estimator
The Kaplan-Meier estimator in the selected treatment group will be biased upwards. The Kaplan-Meier estimator in the control group will always be (asymptotically) unbiased. We can define a bias-corrected Kaplan-Meier estimator based on any of the bias-corrected estimatorŝ * S (t) of the log-HR by using the relationship between the survival functions in the control and the treatment groups under the Cox model (Eq. (1)). The bias-corrected Kaplan-Meier estimator at a time t is defined aŝ In simulations not reported here, we found that the selection bias of the Kaplan-Meier estimator in the selected treatment arm is negligible even when all true treatment effects are equal. We would expect the selection bias to be more pronounced, when selection was performed based on, for example, the estimated survival probability at a specific time instead of the estimated log-HRs at interim.

Simulations
We consider three base scenarios in our simulation study, which are similar to those in [19] and cover a range of configurations of true log-HRs. It is of course impossible to cover all possible configurations in a simulation study.
2 Linear: A linear trend from 0.6 to 1: 3 Peak: All but one HRs are 1: The baseline hazard was log(2)∕12, which corresponds to a median survival time of 12 (months) in the control group. We compare the bias and MSE of all methods for an increasing number of treatment groups K. A maximum of 200 patients are allocated to each group. The interim analysis was conducted after (K + 1) ⋅ 50 observed events in all treatment groups and control group combined. This amounts to approximately 50 events in each group when the event rates are similar. The final analysis was performed after 200 events in the selected treatment group and control group combined. Figure 1 shows the bias in 10 5 simulations of each method in each of the base scenarios as the number of treatment groups K ranges from 2 to 6 at the interim analysis, the final analysis, and for the twostage estimators. The corresponding square root of the MSE is shown in Figure 2. We observed some convergence problems of Stallard-Todd estimator, especially in the case K = 2 with non-convergence in up to 20% of the simulations. In these cases, the MLE was used as a fallback, which slightly increases the bias but reduces the MSE. Overall the results are very similar to those of [19] in the normal case.
The simulation results confirm the theoretical result that the bias of the MLE is maximal, when all treatment effects are equal ("Constant" scenario). Both shrinkage estimators clearly outperform the MLE and the Stallard-Todd method in terms of bias and MSE in this case. In the "Linear" scenario, the LR shrinkage estimator has smaller bias than the MLE, when there are four or more treatment groups and about the same MSE for any number of treatment groups. Both shrinkage estimator win over the Stallard-Todd method except in the K = 2 case (which is identical to the K = 2 case in the "Peak" scenario). In the "Peak" scenario, the MLE is practically unbiased and beats all other methods in terms of bias and MSE. However, the LR shrinkage estimator is still very close to the MLE and has a smaller MSE than the Stallard-Todd estimator.
While the estimates of the Stallard-Todd method are on average not far from those of the other methods, a closer look at the distribution of the estimated HRs at the interim analysis ( Figure 3) reveals that the Stallard-Todd method produces more extreme and skewed results, than the other methods, especially in the "Constant" scenario.
The influence of the allocation ratio on the bias and MSE of the two-stage estimators is shown in Figure 4. A maximum of n patients were allocated to each treatment group and a maximum of m patients to the control group. The allocation ratios m ∶ n considered were 3:1, 2:1, 1:1, 1:2, and 1:3. This corresponds to correlations between the estimated log-HRs of approximately 1/4, 1/3, 1/2, 2/3, and 3/4. The bias increases with increasing correlation in line with our theoretical result. The MSE apparently attains its minimum at a correlation of 0.5. This suggests that patients should be equally allocated to all groups. The bias and MSE decrease when the number of observed events increases ( Figure 5). There is a considerable finite sample bias in addition to the selection bias, when only a few events (< 25 events per group) are observed.
The largest effect considered is a HR of 0.6. The MLE is already practically unbiased when the HR is 0.6 for one treatment and 1 for all other treatments as can be seen in Figure 1. An even larger effect would not provide any additional insight, although estimation also works well for values smaller than 0.6.

Application
We apply our methods to data reconstructed from the published Kaplan-Meier curves of the FOCUS trial [13] using the method of [27]. The Kaplan-Meier curves were digitized using WebPlotDigitizer [28].
In the FOCUS trial, five treatment strategies of sequential and combination chemotherapy for poor prognosis advanced colorectal cancer were compared: flourouracil only (arm A, control), irinotecan after flourouracil (arm B-ir), combination of flourouracil and irinotecan (arm C-ir), oxaliplatin after flourouracil (arm B-ox), and combination of flourouracil and oxaliplatin (arm C-ox). A total of 2135 patients were randomized in the ratio 2:1:1:1:1.
The reconstructed data closely match the original data in terms of hazard ratios and corresponding confidence intervals (Table I). The data are analyzed under a hypothetical two-stage design, with an interim analysis after 900 events across all arms. This corresponds to roughly 300 events in the control arm and 150 events in each of the four treatment arms. The final analysis is conducted after 900 events  in the control arm and the selected treatment arm, resulting in about 600 events in the control arm and 300 events in the selected treatment arm. In addition to point estimates, bootstrap confidence intervals from 1000 replications for all methods conditional on the treatment selection in the original data set are provided. The resulting confidence intervals therefore do not account for the variability introduced by the selection and are expected to have less than the nominal coverage as discussed in Section 2.3. Both shrinkage methods produce very similar and sensible results (Table II), when compared with the original observed HRs of the FOCUS trial (Table I), where no selection was performed. The estimated    HRs of the LR shrinkage method are very close the MLE in all arms, with narrower confidence intervals, while the EB shrinkage method shrinks all estimates to the overall HR, resulting in a smaller treatment effect in the selected C-ir arm. This is the same tendency of the EB shrinkage estimator to overcorrect the bias that was also observed in the "Linear" and "Peak" simulation scenarios. The Stallard-Todd estimator applied to this specific data set at the interim analysis gives rather different results, which contradict the observed HRs in the original data, where the C-ir arm was the best performing treatment arm. While this result is unsatisfactory, it is in-line with the simulation results ( Figure 3) that show that extreme solutions can be obtained with this approach. The iterative procedure did not converge when applied at the final analysis (see also Section 2.4).
The effect of continuing follow-up in the dropped treatment arms can be seen clearly in the B-ir and C-ox arms. The substantial difference in Stage 1 HRs completely disappears at the end of Stage 2.

Conclusion
The bias of the MLE was found to be small except for a large number (K ⩾ 6) of treatment groups. We have shown that the common control group reduces the bias of the MLE because of the induced correlation of the MLEs and that the bias of the MLE in the dropped treatment arms is reduced when continuing follow-up until the final analysis in accordance with the simulation results of [23].
In our simulation study, the shrinkage methods performed very well, when all treatment effects are equal, but had a larger bias than the MLE, which increased with number of treatment groups, when the the maximal absolute difference of the true treatment effects was large. These results reflect the prior assumptions underlying the empirical Bayes estimators. When comparing the estimators with respect to the MSE, the Stallard-Todd method performs the worst across all scenarios. This is also reflected in the analysis of the reconstructed FOCUS trial data, which shows that either shrinkage method is preferable to the Stallard-Todd method.
Unless there is reason to believe that there are large differences between the true treatment effects, the LR shrinkage estimator can be recommended as a way to reduce the bias and the MSE of the MLE. Because the shrinkage estimators tend to overcorrect the bias, they might also be of interest in situations where overestimation of the treatment effect is considered worse than underestimation.
All methods were based on the asymptotic normal approximation of the log-HRs. In our main simulations, the number of events was at least 150, such that finite sample bias is negligible. This can be seen, for example, in the Peak scenario in Figure 1, where the MLE is practically unbiased. However, when the number of events per group is very small (< 25), the finite sample bias outweighs the selection bias as was seen in Figure 5. Such small event numbers do not seem relevant in practice, because typically much larger event numbers are required to achieve acceptable power.
Assuming a Cox proportional hazards model is convenient but not necessary, because all methods are based on the asymptotic approximation of the distribution of the treatment effect estimators. Therefore, the Cox model could be replaced with any other model with asymptotically normal treatment effect estimators. It is anticipated that the results would be comparable to those under the Cox model.
Confidence intervals after adaptive treatment selection that have the correct coverage probability are non-trivial and were outside of the scope of this work. As far as we know, no adequate methods for construction of confidence intervals for the shrinkage or Stallard-Todd methods have been proposed.
In most trials some form of covariate adjustment is required, for example, inclusion of center effects in a multi-center trial to account for stratified randomization. As long as the asymptotic results still hold, no adjustment is necessary, because only the dimension K of the multivariate normal distribution needs to be increased to accommodate any additional regression coefficients.
The shrinkage estimators could also be derived as estimators arising from penalized Cox regression [29]. When using an L 2 penalty, that is, ∑ K k=1 ( k −̄) 2 , there is a one-to-one correspondence between the penalty parameter and the prior variance . This is appealing, because the penalized Cox regression works directly on the time-to-event data and not only via an asymptotic normal approximation. Selection of the penalty parameter could be performed by cross-validation in order to minimize the prediction MSE. However, cross-validation is computationally extremely expensive and results in an additional cross-validation error. In simulations not reported here, the penalized MLE with cross-validation did not perform better than the shrinkage estimator.
The LR shrinkage of the vector of MLEs does not change the ordering of its components, becausê (1 ⩽ k, l ⩽ K) andĈ + ⩾ 0. Thus, the results would be the same if selection were based on the LR shrinkage estimator instead of the MLE. Finally, it is worth mentioning that the calculation of the shrinkage estimator does not depend on the specific selection rule used. It can be used without change with complex selection rules, for example, when selection is based on hypothesis tests. For the Stallard-Todd method and the UMVCUE approach, non-trivial modifications are necessary [20].