Nonparametric analysis of nonhomogeneous multistate processes with clustered observations

Abstract Frequently, clinical trials and observational studies involve complex event history data with multiple events. When the observations are independent, the analysis of such studies can be based on standard methods for multistate models. However, the independence assumption is often violated, such as in multicenter studies, which makes standard methods improper. This work addresses the issue of nonparametric estimation and two‐sample testing for the population‐averaged transition and state occupation probabilities under general multistate models with cluster‐correlated, right‐censored, and/or left‐truncated observations. The proposed methods do not impose assumptions regarding the within‐cluster dependence, allow for informative cluster size, and are applicable to both Markov and non‐Markov processes. Using empirical process theory, the estimators are shown to be uniformly consistent and to converge weakly to tight Gaussian processes. Closed‐form variance estimators are derived, rigorous methodology for the calculation of simultaneous confidence bands is proposed, and the asymptotic properties of the nonparametric tests are established. Furthermore, I provide theoretical arguments for the validity of the nonparametric cluster bootstrap, which can be readily implemented in practice regardless of how complex the underlying multistate model is. Simulation studies show that the performance of the proposed methods is good, and that methods that ignore the within‐cluster dependence can lead to invalid inferences. Finally, the methods are illustrated using data from a multicenter randomized controlled trial.

with Y i·,· (0+) = h∈T c Y i·,h (0+). The corresponding empirical version can be easily obtained by replacing the unknown state occupation and transition probabilities with their consistent estimates, γ ihj (0, t) withγ ihj (0, t) as described above, and the expectations with sample averages over clusters.
The influence function for the weighted state occupation probability estimatorP ′ n,j (t) is The corresponding empirical version can be easily obtained by replacing the unknown state occupation and transition probabilities with their consistent estimates, γ ′ ihj (0, t) withγ ′ ihj (0, t) as described above, and the expectations with sample averages over clusters.
The classes of functions {ψ ij (t) : t ∈ [0, τ ]} and {ψ ′ ij (t) : t ∈ [0, τ ]} are P -Donsker for any j ∈ S. This is due to the fact that these classes consist of linear combinations of functions that belong to P -Donsker classes by Theorem 2 in the main text, fixed functions, and random variables with bounded second moments.

Web Appendix B: Asymptotic Theory Proofs
The proofs of the theorems provided in Section 2 of the manuscript rely on empirical process theory (van der Vaart and Wellner, 1996;Kosorok, 2008). In this Appendix we use the standard empirical processes notation where, for any measurable function f : D → R, D i denotes the observed variables for the ith cluster, D denotes the sample space, and P the true (induced) probability measure defined on the Borel σ-algebra on D. We also use the supremum norm notation f (t) ∞ ≡ sup t∈ [0,τ ] |f (t)|. Let V be a generic constant that may differ from place to place. In this Web Appendix we only prove the asymptotic properties ofP n (s, t) since the properties ofP ′ n (s, t), P n,j (t), andP ′ n,j (t), j ∈ S, can be established using the same arguments. Without loss of generality and for simplicity of presentation we set the starting point s = 0. Before outlining the proofs of Theorems 1-3 we provide and prove two useful lemmas. Proof. Let h Q,2 = ( |h| 2 dQ) 1/2 for any probability measure Q. Now, for any probability measure Q and any t 1 , t 2 ∈ [0, τ ] it follows that By lemma 22.4 in Kosorok (2008), it follows that the class Φ 1 = {N (t) : t ∈ [0, τ ]} has a bounded uniform entropy integral (BUEI) with envelope 2N (τ ), and is also pointwise measurable (PM). This implies that, for any t ∈ [0, τ ] there exist a t i ∈ [0, τ ], i = 1, . . . , N (ϵ2 N (τ ) Q,2 , Φ 1 , L 2 (Q)), such that N (t) − N (t i ) Q,2 < ϵ2 N (τ 2 ) Q,2 , for any ϵ > 0 and any finitely discrete probability measure Q. Therefore, for any member of F 1 (s), there exist a t i s h(u)dN (u), for i = 1, . . . , N (ϵ2 N (τ ) Q,2 , Φ 1 , L 2 (Q)), such that for any ϵ > 0 and any finitely discrete probability measure Q. Consequently, by the minimality of the covering number it follows that for any ϵ > 0 and any finitely discrete probability measure Q, we have that N (ϵ2V N (τ ) Q,2 , F 1 (s), L 2 (Q)) N (ϵ2 N (τ ) Q,2 , Φ 1 , L 2 (Q)), which yields a BUEI for F 1 (s) with envelope 2V N (τ ). Using similar arguments to those used in the example of page 142 of Kosorok (2008), it can be shown that the class F 1 (s) is also PM. Therefore, by Theorem 2.5.2 in van der Vaart and Wellner (1996), the class F 1 (s) is P -Donsker. Since s was arbitrary, the last statement is true for any s ∈ [0, τ ).
Lemma 2: Let Y (t) be an arbitrary at-risk process, A(t) a continuous cumulative transition intensity function on [0, τ ], and h(t) a fixed and non-negative function with h(t) V almost everywhere with respect to the Lebesgue-Stieltjes measure generated by A(t). Then, the class of functions is P -Donsker for any s ∈ [0, τ ).
Proof. It is not hard to show that for any probability measure Q and any t 1 , for any ϵ > 0 and any finitely discrete probability measure Q. Therefore, for any member of for any ϵ > 0 and any finitely discrete probability measure Q. Consequently, by the minimality of the covering number, it follows that for any ϵ > 0 and any finitely discrete probability measure Q, we have that which yields a BUEI for F 2 (s). Finally, similar arguments to those used in the proof of Lemma 1 lead to the conclusion that the class F 2 (s) is P -Donsker for any s ∈ [0, τ ).

B.1 Regularity Conditions
In this work we assume the following conditions: C1. The potential left truncation L im,1 and right censoring L im,2 times are independent of the underlying counting processes , h ∈ T c , and the cluster size M i . Also, L im,1 and L im,2 are identically distributed C3. The underlying counting processes are identically distributed conditionally on cluster size, C4. The underlying at-risk processes are identically distributed conditionally on cluster size, Note that the counting and at-risk processes are also allowed to depend on the total cluster size M i . However, the assumption of identical distributions in conditions C3' and C4' is defined conditional on the size of the pth sample within the ith cluster.

B.2 Proof of Theorem 1
It is clear thatŇ im,hj (t), h = j can be expressed aš where T imv,hj , v = 1, . . . ,Ň im,hj (τ ), are the random jump times ofŇ im,hj (t), t ∈ [0, τ ], and v 0 ∈ N is a constant which is selected to satisfyŇ im,hj (τ ) v 0 a.s. in light of condition C3. The corresponding observable version, which is subject to right censoring and/or left truncation, is Thus, by conditions C1 and C2, Additionally, the observed version ofY im, and thus, by conditions C1, C2, and C5 Next, in light of the assumption of identically distributed counting processes conditional on cluster size (condition C3), condition C2, and the i.i.d. assumption of the observations across clusters it follows that for h = j and any t ∈ [0, τ ] for any m = 1, . . . , M 1 . Similarly, under conditions C2 and C4, it can be shown that for any cluster member m = 1, . . . , M 1 .

As a result,
Taking all the pieces together, and using empirical process theory notation, if follows, by condition C4, that hj (t) = 0. Next, it is easy to see that, for any h ∈ T c and j ∈ S, the following inequality holds: The first term can be bounded as follows: where the last inequality follows from condition C4, which implies that there exists a constant hj being the Lebesgue-Stieltjes measure generated by (the sample paths of) N ·,hj (t). By conditions C2 and C3, the class of functions can be expressed as a (finite) linear combination of monotone caglad square-integrable processes (Andersen et al., 2012), multiplied by R m (t−), which belong to Donsker classes by lemma 4.1 in Kosorok (2008). Thus, by lemma 4.1 and corollary 9.32 in Kosorok (2008), the classes This result and the fact that {P n Y ·,h (t)} −1 is bounded a.e. (µ N ·,hj ) with probability 1 lead to the conclusion that Q n,1 as * → 0. For Q n,2 , conditions C1 and C4 imply that there exists a constant V such that Thus, by conditions C2, C3, and Lemma 1, it follows that the class t ∈ [0, τ ]} is P -Donsker and thus also P -Glivenko-Cantelli. This implies that Q n,2 as * → 0 and, consequently, by inequality (1) it follows that Â n,hj (t) − A 0,hj (t) ∞ as * → 0, for all h ∈ T c and j ∈ S. This result along with the continuity of the product integral (Andersen et al., 2012) lead to the conclusion that

B.3 Proof of Theorem 2
The class of functions h ∈ T c and j ∈ S, by conditions C2 and C3, and lemma 4.1 and corollary 9.32 in Kosorok (2008). Also, the class {Y ·,h (t) : [0, τ ]} is P -Donsker for any h ∈ T c as argued in the proof of for h = j, whereG 1hj andG 2h are tight zero mean Gaussian processes with covariance func- for ϵ > 0 and F −1 1 of bounded variation (Kosorok, 2008), with derivative at (f 1 , f 2 ) given by These facts along with condition C6 and the functional delta method (van der Vaart, 2000), lead to the conclusion that The class of the influence functions {ϕ hj (t) : t ∈ [0, τ ]} is P -Donsker by the Donsker property of the class {N ·,hj (t) : [0, τ ]}, conditions C2-C5, Lemmas 1 and 2, and corollary 9.32 in Kosorok (2008). Therefore, √ n(Â n,hj −A 0,hj ) converges weakly to a tight zero mean Gaussian where the influence functions belong obviously to a P -Donsker class. Thus, the joint sequence √ n(Â n,hj − A 0,hj ) for h = j, converges weakly to a tight zero mean Gaussian process with cross-covariance betweeñ where the matrix ϕ i (t) contains the elements ϕ ihj (t), and the matrix γ i (0, t) contains the

By the P -Donsker property of the classes {N
for h ∈ T c , conditions C3-C5, corollary 9.32 in Kosorok (2008), and Lemmas 1 and This concludes the proof of part (i) of Theorem 2.
Before showing the weak convergence results which are conditional on the observed data, we provide a more formal definition of what does conditional weak convergence mean. Clearly, conditionally on the observed data, the only source of randomness inB n,hj (s, ·) andB ′ n,hj (s, ·) are the standard normal variates ξ i . Weak convergence of conditional laws of a random sequence G n (ξ, O), that depends on the simulation realizations ξ and the observed data O, to a tight process G in some metric space (D, d) is defined as follows (Kosorok, 2008) sup where BL 1 is the space of Lipschitz functions f : D → R, with Lipschitz norm bounded by 1, and E ξ denotes conditional expectation with respect to the simulation realizations ξ, Weak convergence of conditional laws of the cluster bootstrap processes is defined in a similar manner. More precisely, let (U n1 , . . . , U nn ) be a random vector from the multinomial distribution with n trials and probabilities 1/n for each trial. Then, the nonparametric cluster bootstrap versions of the proposed estimators arê respectively. Weak convergence of conditional laws of the corresponding bootstrap processes is defined, conditionally on the observed data, with respect to the multinomial bootstrap weights U and is denoted as For the first conditional weak convergence result in part (ii) of Theorem 2, define the processB hj (0, t) = √ nP n γ hj (0, t)ξ. By the P -Donsker property of the class {γ hj (0, t) : t ∈ [0, τ ]} and the conditional multiplier central limit theorem (Kosorok, 2008) it follows that unconditionally on the observed data. After some algebra it can be shown that wherẽ Next, it is easy to see that almost everywhere with respect to both µ N ·,lq and µÂ n,lq (which is the Lebesgue-Stieltjes measure generated byÂ n,lq ). Therefore, by condition C3 and C6, the outer almost sure consistency of the transition probability estimators, arguments similar to those used in the proof of Theorem 1, and the central limit theorem, it follows that By similar arguments and condition C5 it follows thatQ n,lq2 = o p (1). Finally, by the P -Donsker property of the class {Y ·,l (t) : t ∈ [0, τ ]}, the uniform consistency of the cumulative transition intensity, and the same arguments to those used in the proof of proposition 7.27 in Kosorok (2008), it follows thatQ n,lq3 = o p (1), since convergence in distribution to a constant implies convergence in probability. Thus, by (2) For the second conditional weak convergence result in part (ii) of Theorem 2, the P -Donsker hj −A 0,hj ), the bootstrap central limit theorem (Kosorok, 2008), and the bootstrap functional delta method (Kosorok, 2008, Kosorok, 2008) lead to the conclusion that √ n{P * n,hj (0, ·) −P n,hj (0, ·)} p U G hj (0, ·). The proof of part (iii) of Theorem 2 follows from the same arguments.

B.4 Proof of Theorem 3
By Theorem 2 and the uniform consistency ofŴ hj (t), it follows that The boundedness of the fixed function W hj (t) and the P -Donsker property of {γ p,hj (0, t) : for t 1 , t 2 ∈ [0, τ ].
Next, by the conditional multiplier central limit theorem it follows that Also, by the uniform boundedness of W hj (t) and the P -Donsker property of the class {γ p,hj (0, t)ξ : The uniform consistency ofŴ hj (t) and the arguments used in the proof of part (ii) in Theorem 2 lead to the conclusion that sup t∈[0,τ ] Ĉ n,hj (0, t) −C n,hj (0, t) = o p (1) and, thus, By Theorem 2 and the bootstrap continuous mapping theorem it follows that By the (unconditional) multiplier central limit theorem (van der Vaart and Wellner, 1996) and a double application of the functional delta method, it follows that √ n{P * n,phj (0, ·) − P n,phj (0, ·)}, p = 1, 2, converge weakly (unconditionally) to tight mean zero Gaussian processes in D [0, τ ]. This result along with the uniform consistency ofŴ hj (t) lead to the conclusion that Part (ii) of Theorem 3 can be shown using similar arguments.
It has to be noted that the proposed Kolmogorov-Smirnov-type tests are consistent for any fixed alternative hypothesis. This follows from Theorem 3, the uniform consistency of the proposed estimators, the continuity of these tests in the differences∆ n,hj (s, t),∆ n,j (t), ∆ ′ n,hj (s, t), and∆ ′ n,j (t), and Lemma 14.15 in van der Vaart (2000).

B.5 Violation of condition C6
It is possible that, in some applications, condition C6 is not satisfied. This happens when for ϵ > 0 and F −1 1 of bounded variation (Kosorok, 2008). Therefore, the same calculations to those used in the proof of Theorem 2 lead to the conclusion that converges weakly to a tight zero mean Gaussian processG 3hj in DJ h with covariance function P ϕ hj (t 1 )ϕ hj (t 2 ), t 1 , t 2 ∈ J h , for h = j. The same arguments to those used in the proof of Theorem 2 can be used to show that this theorem holds for t restricted to ∩ h∈T c J h . This means that inference about P 0,hj (s, t), h = j, is possible for s and t in ∩ h∈T c J h . From a practical standpoint one needs to restrict the time interval for confidence intervals/bands and hypothesis tests to a set such that there are at least some observations in all transient states.

B.6 A remark on non-Markov processes
As mentioned in the main manuscript, inference with non-Markov processes can be performed as indicated in Theorems 2 and 3, with the exception that the influence functions for the landmark versions ofP n,hj (s, t) andP ′ n,hj (s, t) involve the modified processesÑ im,lj (t; h, s), l = j, andỸ im,l (t; h, s), l ∈ T c . However, note that the influence functions of the estimators P n,hj (0, t),P ′ n,hj (0, t),P n,j (t), andP ′ n,j (t) involve the quantities P 0,qj (u, t) and P ′ 0,qj (u, t), for u > 0. With non-Markov processes, these quantities are defined as the (q, j) element of the matrices (u,t] {I k + dA 0 (s)} and (u,t] {I k + dA ′ 0 (s)}, respectively. The latter matrices are not necessarily equal to the true (conditional on the prior history) transition probability matrices under a non-Markov process. Nevertheless, the true influence functions depend on these matrices regardless of the Markov assumption. This is because, given the consistency of the estimators, the derivation of the influence functions in the proof of Theorem 2 (Web Appendix B.3) does not utilize the Markov assumption. The same phenomenon is observed for the independent observations setting (Glidden, 2002). Since these matrices are continuous in A 0 (s) and A ′ 0 (s) (Andersen et al., 2012), they can be consistently estimated by (u,t] {I k + dÂ n (s)} and (u,t] {I k + dÂ ′ n (s)}. These can be used to estimate the influence functions of the estimatorsP n,hj (0, t),P ′ n,hj (0, t),P n,j (t), andP ′ n,j (t).

Web Appendix C: Test for Informative Cluster Size
For situations where cluster size is random, the proposed Kolmogorov-Smirnov tests can be applied for testing the null hypotheses H 0 : P 0,hj (s, ·) = P ′ 0,hj (s, ·), for s ∈ [0, τ ), or H 0 : P 0,j = P ′ 0,j , for the transition or state of the main scientific interest, respectively. Rejecting such a null hypotheses provides evidence for a violation of the non-informative cluster size assumption. The weighted difference functions for this hypothesis testing problem becomê In light of Theorem 2 and the discussion in Section 2.4 of the main manuscript, these differences are asymptotically linear which both belong to P -Donsker classes. Furthermore, the cluster bootstrap versions of the weighted differences arê With these (slight) modifications, a similar version of Theorem 3 holds for the aforementioned weighted differences and, thus, testing the null hypothesis H 0 : P 0,hj (s, ·) = P ′ 0,hj (s, ·), for s ∈ [0, τ ), or H 0 : P 0,j = P ′ 0,j can be performed using the test statistics sup t∈[s,τ ] |Ŵ (t){P n,hj (s, t) −P ′ n,hj (s, t)}| or sup t∈[0,τ ] |Ŵ (t){P n,j (t) − P ′ n,j (t)}|, as described in the last paragraph of Section 2.5 of the main manuscript.

Web Appendix D: Additional Simulation Results
This Web Appendix includes additional simulation results. The population-averaged probabilities of interest under the main simulation setup are depicted in Figure 1. Since, under the simulation setup the intensity for the transition 1 → 2 was lower for larger clusters, the population-averaged probabilities over the population of all cluster members P 0,2 (t) and P 0,12 (0.5, t) were lower compared to the corresponding ones for the population of typical cluster members P ′ 0,2 (t) and P ′ 0,12 (0.5, t) ( Figure 1). This illustrates the fact that larger [ Figure 1 about here.]
Ignoring the within-cluster dependence was associated with underestimated standard errors and poor coverage probabilities of the 95% pointwise confidence intervals and simultaneous confidence bands. This poor performance of the naïve methods for the transition probabilities P 0,12 (0.5, t) and P ′ 0,12 (0.5, t) was less pronounced compared to the case of the state occupation probabilities P 0,2 (t) and P ′ 0,2 (t). This is because estimation of P 0,12 (0.5, t) and P ′ 0,12 ( [

D.2 Right Censoring and Left Truncation
Simulation results for the situation with both right censoring and left truncation are presented in Tables 5-12. As expected, the naïve methods which ignore the within-cluster dependence were associated with underestimated standard errors and poor coverage probabilities of the 95% pointwise confidence intervals and simultaneous confidence bands. The proposed methods performed well with the exception of lower coverage probabilities (reaching 91% in a few cases for the influence function-based intervals) for cases with a very small number of clusters (n = 20), and a small number of clusters (n = 40) with only 5-15 observations per cluster. This is attributed to the fact that with such small cluster sizes and under both right censoring and left truncation, the amount of available information for estimating the parameters of interest was quite small. For such cases, the performance of the nonparametric cluster bootstrap was somewhat better compared to the influence function-based inference, indicating that the nonparametric cluster bootstrap may have a better performance for cases with a very small number of clusters. It is important to note that, even in such cases, the proposed methods outperformed their naïve counterparts.
[ [       Table 2: Simulation results for the analysis of P 0,12 (0.5, τ 0.6 ) and P ′ 0,12 (0.5, τ 0.6 ), where τ 0.6 is the 60th percentile of the follow-up time conditional on survival at t = 0.5, based on the standard approach which ignores the within-cluster dependence (naïve) and the proposed method with i) the influence function-based variance estimator (IF) and ii) the nonparametric cluster bootstrap (CB). Results under right censoring. (n: number of clusters; F M : discrete uniform distribution of the cluster size; * : ×10 2 ; MCSD: Monte Carlo standard deviation of the estimates; ASE: average estimated standard error; CP: coverage probability).  Table 3: Simulation results regarding the coverage probabilities of the 95% simultaneous confidence bands for P 0,12 (0.5, ·) and P ′ 0,12 (0.5, ·) based on the standard method that ignores the within-cluster dependence (naïve) and the proposed method with i) the estimated processesB n,12 (0.5, ·) andB ′ n,12 (0.5, ·) (IF) and ii) the nonparametric cluster bootstrap (CB). Results under right censoring. (n: number of clusters; F M : discrete uniform distribution of the cluster size).  Table 4: Simulation results regarding the empirical type I error (H 0 ) and the empirical power (H 1 ) of the proposed two-sample Kolmogorov-Smirnov-type tests for H 0 : P 0,112 (0.5, ·) = P 0,212 (0.5, ·) and H 0 : P ′ 0,112 (0.5, ·) = P ′ 0,212 (0.5, ·) at the α = 0.05 level. Significance levels were calculated based on either the estimated processesĈ n,12 (0.5, ·) andĈ ′ n,12 (0.5, ·) (IF) or the nonparametric cluster bootstrap (CB). Results under right censoring. (n: number of clusters; F M : distribution of the cluster size).  Table 5: Simulation results for the analysis of P 0,2 (τ 0.4 ) and P ′ 0,2 (τ 0.4 ), where τ 0.4 is the 40th percentile of the follow-up time, based on the standard approach which ignores the withincluster dependence (naïve) and the proposed method with i) the influence function-based variance estimator (IF) and ii) the nonparametric cluster bootstrap (CB). Results under both right censoring and left truncation. (n: number of clusters; F M : discrete uniform distribution of the cluster size; * : ×10 2 ; MCSD: Monte Carlo standard deviation of the estimates; ASE: average estimated standard error; CP: coverage probability).  Table 6: Simulation results for the analysis of P 0,2 (τ 0.6 ) and P ′ 0,2 (τ 0.6 ), where τ 0.6 is the 60th percentile of the follow-up time, based on the standard approach which ignores the withincluster dependence (naïve) and the proposed method with i) the influence function-based variance estimator (IF) and ii) the nonparametric cluster bootstrap (CB). Results under both right censoring and left truncation. (n: number of clusters; F M : discrete uniform distribution of the cluster size; * : ×10 2 ; MCSD: Monte Carlo standard deviation of the estimates; ASE: average estimated standard error; CP: coverage probability).  Table 7: Simulation results regarding the coverage probabilities of the 95% simultaneous confidence bands for P 0,2 (·) and P ′ 0,2 (·) based on the standard method that ignores the within-cluster dependence (naïve) and the proposed method with i) the estimated processeŝ B n,2 andB ′ n,2 (IF) and ii) the nonparametric cluster bootstrap (CB). Results under both right censoring and left truncation. (n: number of clusters; F M : discrete uniform distribution of the cluster size).  Table 8: Simulation results regarding the empirical type I error (H 0 ) and the empirical power (H 1 ) of the proposed two-sample Kolmogorov-Smirnov-type tests for H 0 : P 0,12 (·) = P 0,22 (·) and H 0 : P ′ 0,12 (·) = P ′ 0,22 (·) at the α = 0.05 level. Significance levels were calculated based on either the estimated processesĈ n,2 andĈ ′ n,2 (IF) or the nonparametric cluster bootstrap (CB). Results under both right censoring and left truncation. (n: number of clusters; F M : distribution of the cluster size).  Table 9: Simulation results for the analysis of P 0,12 (0.5, τ 0.4 ) and P ′ 0,12 (0.5, τ 0.4 ), where τ 0.4 is the 40th percentile of the follow-up time conditional on survival at t = 0.5, based on the standard approach which ignores the within-cluster dependence (naïve) and the proposed method with i) the influence function-based variance estimator (IF) and ii) the nonparametric cluster bootstrap (CB). Results under both right censoring and left truncation. (n: number of clusters; F M : discrete uniform distribution of the cluster size; * : ×10 2 ; MCSD: Monte Carlo standard deviation of the estimates; ASE: average estimated standard error; CP: coverage probability).  Table 10: Simulation results for the analysis of P 0,12 (0.5, τ 0.6 ) and P ′ 0,12 (0.5, τ 0.6 ), where τ 0.6 is the 60th percentile of the follow-up time conditional on survival at t = 0.5, based on the standard approach which ignores the within-cluster dependence (naïve) and the proposed method with i) the influence function-based variance estimator (IF) and ii) the nonparametric cluster bootstrap (CB). Results under both right censoring and left truncation. (n: number of clusters; F M : discrete uniform distribution of the cluster size; * : ×10 2 ; MCSD: Monte Carlo standard deviation of the estimates; ASE: average estimated standard error; CP: coverage probability).  Table 11: Simulation results regarding the coverage probabilities of the 95% simultaneous confidence bands for P 0,12 (0.5, ·) and P ′ 0,12 (0.5, ·) based on the standard method that ignores the within-cluster dependence (naïve) and the proposed method with i) the estimated processesB n,12 (0.5, ·) andB ′ n,12 (0.5, ·) (IF) and ii) the nonparametric cluster bootstrap (CB). Results under both right censoring and left truncation. (n: number of clusters; F M : discrete uniform distribution of the cluster size).  Table 13: Simulation results for the analysis of P 0,2 (τ 0.4 ) and P ′ 0,2 (τ 0.4 ), where τ 0.4 is the 40th percentile of the follow-up time, based on the standard approach which ignores the withincluster dependence (naïve) and the proposed method with i) the influence function-based variance estimator (IF) and ii) the nonparametric cluster bootstrap (CB). Results under right censoring, a large variability of cluster size M i , and a small number of clusters. (n: number of clusters; F M : discrete uniform distribution of the cluster size; * : ×10 2 ; MCSD: Monte Carlo standard deviation of the estimates; ASE: average estimated standard error; CP: coverage probability).   Table 14: Simulation results for the analysis of P 0,2 (τ 0.6 ) and P ′ 0,2 (τ 0.6 ), where τ 0.6 is the 60th percentile of the follow-up time, based on the standard approach which ignores the withincluster dependence (naïve) and the proposed method with i) the influence function-based variance estimator (IF) and ii) the nonparametric cluster bootstrap (CB). Results under right censoring, a small number of clusters, and a large variability of cluster size M i . (n: number of clusters; F M : discrete uniform distribution of the cluster size; * : ×10 2 ; MCSD: Monte Carlo standard deviation of the estimates; ASE: average estimated standard error; CP: coverage probability). P 0,2 (τ 0.6 ) P ′ 0,2 (τ 0.6 ) n F M Method Bias * MCSD * ASE * CP Bias * MCSD * ASE * CP  Table 15: Simulation results regarding the coverage probabilities of the 95% simultaneous confidence bands for P 0,2 (·) and P ′ 0,2 (·) based on the standard method that ignores the within-cluster dependence (naïve) and the proposed method with i) the estimated processeŝ B n,2 andB ′ n,2 (IF) and ii) the nonparametric cluster bootstrap (CB). Results under right censoring, a small number of clusters, and a large variability of cluster size M i . (n: number of clusters; F M : discrete uniform distribution of the cluster size).