Confidence bands for multiplicative hazards models: Flexible resampling approaches

Abstract

We propose new resampling-based approaches to construct asymptotically valid time-simultaneous confidence bands for cumulative hazard functions in multistate Cox models. In particular, we exemplify the methodology in detail for the simple Cox model with time-dependent covariates, where the data may be subject to independent right-censoring or left-truncation. We use simulations to investigate their finite-sample behavior. Finally, the methods are utilized to analyze two empirical examples with survival and competing risks data.


Algorithmic Implementation
In the following we describe the algorithmic application of the bootstrap procedures to construct time-simultaneous confidence bands (TSCBs) of the form (1) for the cumulative hazard function $\Lambda_0$ on a fixed interval $I = [t_1, t_2] \subset [0, \tau]$ in the Cox model, as described in Section 3.3 of the main paper.

TSCBs based on the 'classical' multiplier bootstrap
We start with the 'classical' multiplier bootstrap based on direct resampling as inspired by Lin et al. (1994) and described in Section 3.1. The Spiekerman and Lin (1998)-type versions are obtained by substituting $G_i N_i(t)$ with $G_i \hat M_i(t)$ as described in the manuscript. In the first algorithm we use the untransformed version of (1) based on $\varphi = \varphi_1 = \mathrm{id}$ with the weight function $g_n(t) = g_n^{(1)}(t)$.

Algorithm 1 (the 'classical' multiplier bootstrap): untransformed TSCBs for $\Lambda_0$.
Require: Survival data (including covariate vectors) from $n$ subjects.
Step 1: Calculate the estimators $\hat\beta$, $\hat\sigma$, and $\hat\Lambda_0 = \hat\Lambda_0(t, \hat\beta)$ from the data.
To obtain the log-transformed equal precision bands we only have to change the last Step 8 within Algorithm 1: there, the weight function $g_n(t) = \tilde g_n^{(1)}(t)$ is used. For the log-transformed Hall-Wellner bands the same transformation is used together with the weight function $g_n(t) = \tilde g_n^{(2)}(t)$. As explained in Section 3.3 of the manuscript, we also tried to use the $(1-\alpha)$-quantile of $\sup_{t \in I} \tilde g^*(t)\, |\log\{\hat\Lambda_0^*(t, \hat\beta)\} - \log\{\hat\Lambda_0(t, \hat\beta)\}|$ as critical value $c_\varphi^*(\alpha)$ in (1). Due to numerical instabilities, we preferred the current, asymptotically equivalent choice.
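As an illustration of the ideas behind Algorithm 1, the following is a minimal numerical sketch (in Python rather than the R code accompanying the paper) of a multiplier bootstrap with $G_i\,\mathrm dN_i$ resampling. For simplicity it assumes no covariates, so the Breslow estimator reduces to the Nelson-Aalen estimator, and it uses the constant weight $g \equiv 1$ instead of $g_n^{(1)}$ over the full observation window; all variable names are our own and the sketch is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated right-censored survival data; with no covariates the Breslow
# estimator reduces to the Nelson-Aalen estimator of the cumulative hazard.
n = 200
T = rng.exponential(1.0, n)            # latent event times, true hazard = 1
C = rng.exponential(2.0, n)            # independent censoring times
time = np.minimum(T, C)
status = (T <= C).astype(float)

order = np.argsort(time)
time, status = time[order], status[order]
at_risk = n - np.arange(n)             # Y(t_i): number still at risk at t_i

# Step 1: estimate the cumulative hazard (increments dLambda at event times).
dLambda = status / at_risk
Lambda = np.cumsum(dLambda)

# Resampling: multiply each increment by G_i (the G_i dN_i scheme) and
# record the supremum of the resampled process sqrt(n) * cumsum(G * dLambda).
B = 1000
sups = np.empty(B)
for b in range(B):
    G = rng.standard_normal(n)         # multipliers with mean 0, variance 1
    W_star = np.sqrt(n) * np.cumsum(G * dLambda)
    sups[b] = np.max(np.abs(W_star))

# Final step: untransformed band Lambda(t) -/+ c*(alpha)/sqrt(n), with
# c*(alpha) the empirical 95% quantile of the sup-statistics.
c_star = np.quantile(sups, 0.95)
lower = np.maximum(Lambda - c_star / np.sqrt(n), 0.0)
upper = Lambda + c_star / np.sqrt(n)
```

The transformed versions only change the weight function and apply the log-transformation to the band limits in the last step.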

TSCBs based on bootstrapping the score equations
We now consider the resampling approach based on bootstrapping the score equations as described in Section 3.2. Again, we start with the untransformed version of (1) based on $\varphi = \varphi_1 = \mathrm{id}$ with the weight function $g_n(t) = \sqrt{n}/\hat\sigma(t)$. The resulting Algorithm 2 is shown below and shares the same first three steps with Algorithm 1.
Step 9: Calculate the TSCB (1) based upon $g_n(t) = g_n^{(1)}(t)$.
To obtain the untransformed TSCBs based upon $g_n^{(2)}(t)$ or the corresponding log-transformed equal precision or Hall-Wellner bands, one only has to change Steps 8 and 9 of Algorithm 2 as described in Subsection 1.1 above for Steps 7 and 8 of Algorithm 1.
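To make the idea of bootstrapping the score equations concrete, here is a hedged Python sketch for a one-dimensional Cox model: the partial-likelihood score is reweighted subject-wise with mean-one weights $G_i + 1$ (here $G_i$ centered standard exponential, so the weights are nonnegative) and re-solved for $\hat\beta^*$. This only illustrates the general principle; the exact resampling scheme and Steps 4-9 of Algorithm 2 are specified in the main paper, and all function names here are our own.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated Cox data with a single binary covariate.
n = 300
X = rng.binomial(1, 0.5, n).astype(float)
beta0 = 0.5
T = rng.exponential(1.0 / np.exp(beta0 * X))   # Cox hazard exp(beta0 * X)
C = rng.exponential(2.0, n)
time, status = np.minimum(T, C), (T <= C).astype(float)
order = np.argsort(time)
time, status, X = time[order], status[order], X[order]

def score(beta, w):
    """Weighted partial-likelihood score: sum_i w_i delta_i (X_i - E(t_i))."""
    risk = np.exp(beta * X)
    S0 = np.cumsum(risk[::-1])[::-1]           # sum over the risk set at t_i
    S1 = np.cumsum((risk * X)[::-1])[::-1]
    return np.sum(w * status * (X - S1 / S0))

def solve(w):
    """Solve the (re-weighted) score equation by bisection on [-5, 5]."""
    lo, hi = -5.0, 5.0
    for _ in range(60):      # score is decreasing in beta for w >= 0
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid, w) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

beta_hat = solve(np.ones(n))

# Wild bootstrap of the score equation: reweight each subject's score
# contribution by E_i = G_i + 1 ~ Exp(1), then re-solve for beta*.
beta_stars = np.array([solve(rng.exponential(1.0, n)) for _ in range(200)])
```

The spread of `beta_stars` around `beta_hat` then drives the resampled version of the cumulative hazard process.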

R Code
In order to make the developed resampling procedures easily available to all readers, we have decided to upload the R code that was used in Section 5 of the manuscript. It can be accessed through the following link: https://github.com/scheike/wildCoxBootstrap. The repository contains implementations of all considered resampling procedures and illustrates how they were used to conduct the real data analyses in Section 5 of the manuscript.

Second Simulation Study: Cumulative Incidence in Competing Risks
In addition to the simulation study in Section 4 of the manuscript, which focused on cumulative hazard functions, we conducted a second set of simulations in a competing risks model. In this section we analyze the performance of the multiplier bootstrap in the situation of two competing risks. We focus on confidence bands for the baseline cumulative incidence function of the first risk, $F_1(t \mid X = 0)$. A Cox model with a one-dimensional covariate is used for each cause-specific hazard: the cause-specific baseline hazard rates are $\lambda_1 \equiv \lambda_2 \equiv 0.5$ and the two cause-specific hazards are of Cox form, $\lambda_1 \exp(X\beta_1)$ and $\lambda_2 \exp(X\beta_2)$, respectively, where $\beta_1 = 0.3$ and $\beta_2 = -0.3$. The covariates are i.i.d. $X_i \sim N(0, 16)$ and the right-censoring times follow a mixture distribution of standard exponentially distributed and deterministic times. The considered sample sizes are $n = 100, 200, 400$ as in the main paper.
The aim is again time-simultaneous inference on $F_1(t \mid X = 0) = 0.5(1 - \exp(-t))$ along the time interval $[0.1, 3]$, based on 95% confidence bands. Because of its good performance for simultaneous inference on cumulative baseline hazards, we chose the resampling combination of centered standard exponentially distributed multipliers, the estimating equation resampling method, and $G_i\,\mathrm dN_i$. We would like to verify that this choice leads to reliable inference on cumulative incidence functions as well. Table 1 contains the empirical coverage probabilities of the bands, based on 10,000 independent bands, each of which used 1,000 bootstrap iterations.
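The data-generating mechanism of this second study can be sketched as follows (Python; variable names are our own, and the censoring mixture is omitted for brevity). With constant cause-specific baseline hazards given $X$, the all-cause event time is exponential and the cause is drawn proportionally to the cause-specific hazards; in particular, at $X = 0$ the target function is $F_1(t \mid X = 0) = 0.5(1 - e^{-t})$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Competing-risks setup of the second simulation study.
lam1, lam2 = 0.5, 0.5
beta1, beta2 = 0.3, -0.3
n = 100_000                          # large n, to check the target function

X = rng.normal(0.0, 4.0, n)          # X ~ N(0, 16), i.e. standard deviation 4
a1 = lam1 * np.exp(beta1 * X)        # cause-specific hazard of cause 1
a2 = lam2 * np.exp(beta2 * X)        # cause-specific hazard of cause 2

# With constant cause-specific hazards given X, the event time is
# exponential with rate a1 + a2 and the cause is 1 w.p. a1 / (a1 + a2).
T = rng.exponential(1.0 / (a1 + a2))
cause = np.where(rng.random(n) < a1 / (a1 + a2), 1, 2)

# Baseline cumulative incidence of cause 1: F1(t | X = 0) = 0.5 (1 - e^{-t}).
def F1_true(t):
    return 0.5 * (1.0 - np.exp(-t))
```

Restricting to subjects with $X$ near zero, the empirical proportion of cause-1 events by time $t$ approximates $F_1(t \mid X = 0)$.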
From the results we can conclude that the multiplier bootstrap again leads to very accurate confidence bands. In particular, even the bands that were not based on the log-transformation showed coverage probabilities close to 95%, also for small sample sizes.

Proofs
In the statements of the facts below, we always assume that the considered events are measurable; this will be the case in all of their applications.
In the following, let (Z n ) n∈N and (Z * n ) n∈N be sequences of random elements in the Skorokhod space D[0, τ ], let Z be a time-continuous random element of D[0, τ ], and let G be a σ-field with respect to which all Z n and Z are measurable. Denote by ρ the usual Skorokhod metric.
The first fact, which will help us to simplify the proofs in this supporting information, is that convergence in probability is equivalent to convergence in conditional probability. This was pointed out to us by a referee who also suggested a proof similar to the one given below; see also Lemma A.5(ii) in combination with Korollar A.4 in Pauly (2009) for a similar argument. This is why we will only refer to unconditional convergence in probability. In particular:

Fact 1: For $\varepsilon > 0$, write $D_n(\varepsilon) = P(\rho(Z_n, Z) > \varepsilon \mid \mathcal G)$. Then $\rho(Z_n, Z) \xrightarrow{p} 0$ if and only if $D_n(\varepsilon) \xrightarrow{p} 0$ for every $\varepsilon > 0$.

Proof. If $\rho(Z_n, Z) \xrightarrow{p} 0$, then $E\{D_n(\varepsilon)\} = P(\rho(Z_n, Z) > \varepsilon) \to 0$; since $D_n(\varepsilon) \geq 0$, convergence in mean implies $D_n(\varepsilon) \xrightarrow{p} 0$. On the other hand, suppose that $D_n(\varepsilon) \xrightarrow{p} 0$ for every $\varepsilon > 0$. This implies that, for any subsequence $\{n'\}$ of $\{n\}$, there exists a subsequence thereof, say $\{n''\}$, along which $D_{n''}(\varepsilon) \to 0$ $P$-almost surely. By the dominated convergence theorem, $P(\rho(Z_{n''}, Z) > \varepsilon) = E\{D_{n''}(\varepsilon)\} \to 0$; as this holds along a further subsequence of every subsequence, the convergence holds along the full sequence. Furthermore, we will make use of a conditional version of Lenglart's martingale inequality (Section II.5.2.1 in Andersen et al., 1993). For completeness, it is stated as another fact.
Fact 2: Let $m$ be a time-continuous martingale with respect to a filtration $\mathcal F$ with a non-trivial initial $\sigma$-field $\mathcal F_0$. Then, for all $\delta, \eta > 0$,
$$P\Big(\sup_{t \in [0,\tau]} |m(t)| > \eta \;\Big|\; \mathcal F_0\Big) \;\leq\; \frac{\delta}{\eta^2} + P(\langle m \rangle(\tau) > \delta \mid \mathcal F_0).$$
We will rely on martingale arguments for processes of the form
$$m(t) = \sum_{i=1}^n \int_0^t k_{n,\beta,i}(s)\, G_i \,\mathrm dN_i(s).$$
Here the function $k_{n,\beta,i}$ is measurable with respect to $\mathcal F_0$, which is the non-trivial initial $\sigma$-field in the filtration $\mathcal F = (\mathcal F_t)_{t \in [0,\tau]}$, $\mathcal F_t = \mathcal F_0 \vee \sigma\{G_i N_i(s) : s \leq t,\ i = 1, \dots, n\}$. Indeed, it turns out that $m$ is a martingale with respect to $\mathcal F$:

Fact 3: The stochastic process $m$ is a martingale with respect to the filtration $\mathcal F$. Its predictable and optional variation processes are given by
$$\langle m \rangle(t) = \sum_{i=1}^n \int_0^t k^2_{n,\beta,i}(s)\,\mathrm dN_i(s), \qquad [m](t) = \sum_{i=1}^n \int_0^t k^2_{n,\beta,i}(s)\, G_i^2 \,\mathrm dN_i(s).$$
Proof. Consider for $0 \leq r \leq t$ the conditional expectation $E\{m(t) - m(r) \mid \mathcal F_r\}$, which vanishes because $E\{G_i \,\mathrm dN_i(s) \mid \mathcal F_r\} = 0$ for $r < s$ (Bluhmki et al., 2018). Hence, the martingale property is satisfied. With similar arguments, one finds the stated representations of the predictable and optional variation processes.
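The martingale structure of $m$ can be checked numerically: conditionally on the data (i.e., on $\mathcal F_0$), $m(t)$ is a centered sum over multipliers with mean zero and variance one, so its conditional variance at a fixed $t$ is $\sum_i \int_0^t k^2_{n,\beta,i}\,\mathrm dN_i$. A small Monte Carlo sketch (our own, with standard normal multipliers and at most one jump per counting process, as in survival data) is:

```python
import numpy as np

rng = np.random.default_rng(4)

# Fixed "data" (playing the role of F_0): each counting process N_i has at
# most one jump, and k_i is the value of the weight function k_{n,beta,i}
# at that jump time, held fixed across the resampling.
n = 50
jump = rng.random(n)                  # jump time of N_i, uniform on (0, 1)
k = rng.normal(0.0, 1.0, n)          # k_{n,beta,i}(jump_i), F_0-measurable
t = 0.5

def m_t(G):
    # m(t) = sum_i k_i * G_i * 1{jump_i <= t}, i.e. sum_i int_0^t k_i G_i dN_i
    return np.sum(k * G * (jump <= t))

# Conditional on the data, m(t) has mean 0 and variance sum_i k_i^2 1{...},
# the predictable variation <m>(t) for multipliers with mean 0, variance 1.
pred_var = np.sum(k**2 * (jump <= t))

draws = np.array([m_t(rng.standard_normal(n)) for _ in range(100_000)])
```

The empirical mean of `draws` is close to zero and their empirical variance matches `pred_var`.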
Proof of Lemma 1. To a large extent, it is possible to parallel the martingale arguments as used in the proofs of Theorems VII.2.1 and VII.2.2 in Andersen et al. (1993). We show the proof for the resampling scheme (13) only; once it has been understood how martingale methods can be applied here, it will be apparent how to conduct the proof for the classical multiplier bootstrap scheme (9) which consists of martingales entirely.
Proof of (I). The proof follows the lines of Theorem VII.2.1 in Andersen et al. (1993).
where $\nabla$ again denotes the gradient with respect to $\beta$. We analyse the asymptotic behaviour of the process $X^*(t, \beta)$, which involves integrals of the form $\int_0^t \log\{S_0(s, \beta)\}\,\mathrm dN_i(s)$; cf. Fact 3. Thus, the martingale $X^*(t, \beta) - \tilde X^*(t, \beta)$ with respect to $\mathcal F$ has an asymptotically negligible predictable variation process, where the second equality follows from $\hat\beta - \beta_0 = O_p(n^{-1/2})$ in combination with the mean-value theorem applied to the function $\beta \mapsto \log S_0(s, \beta)$, whose gradient, with the different partial derivatives evaluated at different intermediate vectors, is bounded in probability.
Proof of (II). Recall the definition of $DU^*_\tau(\tilde B)$, where each column $\tilde\beta^{(j)}$ of $\tilde B$ is on the line segment between $\hat\beta^*$ and $\hat\beta$. We again use martingale theory to prove the claimed convergences. Without loss of generality, we assume that $\tilde\beta^{(1)} = \cdots = \tilde\beta^{(p)}$ because random matrices converge in probability if and only if each row does. Hence, we assume that $DU^*_\tau(\tilde B) = DU^*_\tau(\tilde\beta^{(1)})$, which is the usual Jacobian at $\tilde\beta^{(1)}$. The object to be analyzed is thus $-n^{-1} DU^*_t(\tilde\beta^{(1)})$. We consider the integral until $t$ instead of $\tau$ to make use of martingale theory, and we apply the decomposition $-n^{-1} DU^*_t(\tilde\beta^{(1)}) = A_n(t) + B_n(t) + C_n(t) + D_n(t)$.
Next, an application of Lenglart's inequality as above shows the asymptotic negligibility of B n (t) which is a square-integrable martingale in t with respect to (F t ) t∈[0,τ ] .
Proof of (III). We will use that $t \mapsto n^{-1/2} U^*_t(\hat\beta)$ defines a square-integrable martingale in $t$ with respect to $\mathcal F$. This can be shown in the same way as for the other martingales above. Similarly as before, its predictable variation process is asymptotically equivalent to a sum of two terms: the second term on the right-hand side converges in probability to $\int_0^t v(s, \beta_0)\, s_0(s, \beta_0)\, \lambda_0(s)\,\mathrm ds$, while the remaining martingale term vanishes asymptotically. Indeed, its predictable variation process, written via the vectorization $\mathrm{vec}(\cdot)$ of matrices, clearly goes to zero in probability, since the $X_i$ are bounded and the functions $E$ and $S_0$ converge uniformly in probability to bounded functions. Hence, in order to apply Rebolledo's martingale central limit theorem (Theorem II.5.1 in Andersen et al., 1993) to conclude the asymptotic normality of $n^{-1/2} U^*_\tau(\hat\beta)$, only the Lindeberg-type condition remains to be verified. To this end, define for arbitrary $\varepsilon > 0$ the process $t \mapsto n^{-1/2} U^*_{\varepsilon, t}(\hat\beta)$ as the cumulation of all jumps of size at least $\varepsilon$ of $n^{-1/2} U^*_{(\cdot)}(\hat\beta)$ until $t$; here the vector-valued indicator function is defined for each component, and multiplication with it is understood component-wise. Obviously, this process is still a martingale with respect to $\mathcal F$, to which we apply Lenglart's inequality: for any $\eta, \delta > 0$, after having taken expectations on both sides as above, it remains to bound its predictable variation process. All jump times are assumed to differ. Hence, if any component of the indicator function is one, this implies the existence of $i$ such that $n^{-1/2}\, |G_i|\, \|X_i(s) - E(s, \hat\beta)\|_\infty \geq \varepsilon$. Here, $\|\cdot\|_\infty$ denotes the sup-norm, i.e., the largest absolute value of a vector. We will show that the probability of this event goes to zero across all $i$ and $s$. To this end, we define the sequence of random variables $Z_n = \sup_{i=1,\dots,n,\ s \in [0,\tau]} \|X_i(s) - E(s, \hat\beta)\|_\infty$.
From the boundedness assumption on the covariates and the asymptotic behaviour of $E$ and $\hat\beta$, it is clear that $Z_n$ is bounded in probability. That is, there is a constant $K > 0$ such that $P(Z_n > K) \to 0$. Let us consider the probability $P(n^{-1/2} \max_{i=1,\dots,n} |G_i|\, Z_n \geq \varepsilon)$, where the last step follows from the existing second moment $E(G_1^2)$, which implies the tail rate $P(|G_1| > t) = o(t^{-2})$ as $t \to \infty$. (This is easily seen from Problem 2.3.3 in van der Vaart and Wellner (1996) because, for all $\varepsilon > 0$, $P(\max_{i=1,\dots,n} |G_i| > \varepsilon \sqrt n) \leq n\, P(|G_1| > \varepsilon \sqrt n) = n \cdot o\{(\varepsilon^2 n)^{-1}\} \to 0$.) All in all, we conclude that the probability that the predictable variation process of $n^{-1/2} U^*_{\varepsilon,(\cdot)}(\hat\beta)$ jumps at all goes to zero. Hence, it converges to zero in probability as $n \to \infty$, the Lindeberg-type condition is verified, and Rebolledo's martingale central limit theorem applies to $n^{-1/2} U^*_{(\cdot)}(\hat\beta)$. In particular, it follows that $n^{-1/2} U^*_\tau(\hat\beta)$ is asymptotically normal.
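The key growth bound in this step, $n^{-1/2}\max_{i \leq n} |G_i| \to 0$ in probability for multipliers with a finite second moment, can also be seen numerically; the following sketch (our own, with standard normal multipliers) shows the normalized maximum shrinking as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(5)

# For i.i.d. multipliers G_i with a finite second moment, max_i |G_i| grows
# more slowly than sqrt(n); hence jumps of size >= eps eventually vanish.
ratios = []
for n in (100, 1000, 10_000):
    G = rng.standard_normal((500, n))           # 500 Monte Carlo replications
    ratios.append(np.max(np.abs(G), axis=1).mean() / np.sqrt(n))
```

For standard normal multipliers the maximum grows like $\sqrt{2 \log n}$, so the successive ratios decrease toward zero.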
Proof of Theorem 1. We only prove the assertion on the 'wild bootstrapping the score equations' approach. The applicability of the Lin et al. (1993, 1994) multiplier scheme follows along the same lines and is in fact easier to prove because fewer applications of the mean-value theorem are required. All in all, we will make use of similar martingale arguments for the multiplier bootstrapped estimators as in the proof of Lemma 1. To increase readability, some repeated arguments are omitted. As a first step, paralleling the proof of Theorem VII.2.3 in Andersen et al. (1993), we deduce an asymptotic representation of the first part of $\hat\Lambda^*_0$ by a Taylor expansion around $\hat\beta$. Here, $\tilde\beta^*$ is on the line segment between $\hat\beta^*$ and $\hat\beta$.
Note that we can replace the intermediate $\tilde\beta^*$ vectors by $\hat\beta$ because the resulting error converges to zero in probability as $n \to \infty$: due to Lemma 1 in combination with Conditions 1(a)-(c), the difference in the curly brackets in (3) is of order $o_p(n^{-1})$ uniformly in $s$, giving us $\sum_{i=1}^n |G_i + 1|\, o_p(n^{-1}) = o_p(1)$ as an upper bound of (3). Hence, the corresponding remainder vanishes because the process inside the curly brackets is a square-integrable martingale with respect to $\mathcal F$, and thus asymptotically Gaussian. We conclude that
$$\sqrt n\{\hat\Lambda_0^*(t, \hat\beta^*) - \hat\Lambda_0(t, \hat\beta)\} = -\sqrt n(\hat\beta^* - \hat\beta)^\top \int_0^t e(u, \beta_0)\,\lambda_0(u)\,\mathrm du + W^*(t) + o_p(1).$$
For the joint convergence of $\sqrt n(\hat\beta^* - \hat\beta)$ and $W^*(t)$, we consider the bivariate process $t \mapsto \{n^{-1/2} U^*_t(\hat\beta), W^*(t)\}$ which, for similar reasons as in the proof of Lemma 1, defines a square-integrable martingale with respect to $\mathcal F$. We wish to apply Rebolledo's martingale central limit theorem (with non-trivial initial $\sigma$-field $\mathcal F_0$) in order to obtain the desired joint conditional central limit theorem, which holds in probability. To this end, we analyze the predictable covariation process: approximating $E(u, \hat\beta)$ on the right-hand side by $E(u, \beta_0)$ (then using (4)) and $S_0^{-1}(u, \hat\beta)$ by $s_0^{-1}(u, \beta_0)$, the covariation splits into the two terms (5) and (6). By the definitions of $S_0$ and $S_1$, the second term (6) on the right-hand side is zero. The remaining term (5) is a martingale whose predictable variation vanishes asymptotically; thus, Lenglart's inequality implies that the martingale (5) also goes to zero in probability as $n \to \infty$. Hence, $\sqrt n(\hat\beta^* - \hat\beta)$ and $W^*$ are asymptotically independent.