Non-parametric Analysis of Gap Times for Multiple Event Data: An Overview



Multiple event data are frequently encountered in medical follow-up, engineering and other applications in which multiple events are the primary outcomes. The events may be repetitions of the same event (recurrent events) or events of different natures. Times between successive events (gap times) are often of direct interest in these applications. The stochastic-ordering structure and within-subject dependence of multiple events create statistical challenges for analysing such data, including induced dependent censoring and non-identifiability of marginal distributions. This paper provides an overview of a class of existing non-parametric methods for estimating gap time distributions for various types of multiple event data, in which the sampling bias from induced dependent censoring is effectively adjusted for. We discuss the statistical issues in gap time analysis, describe the estimation procedures and illustrate the methods with a comparative simulation study and a real application to an AIDS clinical trial. Because there is no standard approach to identifying an appropriate gap time method for a given research question, a comprehensive understanding of the challenges and available methods for non-parametric analysis can be useful. The methods discussed in this review should allow practitioners to handle a variety of real-world multiple event data effectively.

1 Introduction

Multiple event data arise in various applications, including medical follow-up studies in which each subject may experience a series of events. Events of the same type that repeat are known as recurrent events; examples include repeated hospitalisations due to a certain disease and multiple infections after bone marrow transplant. Alternatively, the events may be chronologically ordered clinical events of different natures, such as HIV infection, AIDS onset and death in a natural history study of AIDS. Similar examples arise in the analysis of complex systems, such as manufacturing equipment in engineering applications. To analyse multiple event data, two types of variables can be considered: (i) the total time from the start of follow-up to an event; and (ii) the times between successive events. The latter are referred to as gap times, and they may be preferred when multiple events are chronologically ordered. For recurrent event data in which the total time to an event is the variable of interest, a variety of statistical methods that treat an individual's multiple events as the realisation of a counting process have been developed. Two popular classes of models are (i) intensity processes, which describe the instantaneous risk of an event given the at-risk status (Andersen & Gill, 1982; Prentice et al., 1981; Wang et al., 2001); and (ii) mean processes (Lawless & Nadeau, 1995; Lin et al., 2000; Pepe & Cai, 1993), which model the expected number of events over time rather than the event times themselves. In many applications, gap times are of more interest than the total time to an event. For instance, in an AIDS study, patients may repeatedly experience opportunistic infections that indicate deteriorating health. The occurrence of opportunistic infections is commonly used to measure patients’ quality of life, and gap times between successive opportunistic infections can serve as an index of disease progression.
Another example comes from a study evaluating the effect of HIV subtype on the progression of AIDS, where the event sequence of interest is HIV → AIDS → death. The total time from HIV infection to death might be of less interest, because an HIV subtype with faster progression from HIV infection to AIDS onset will inevitably shorten the total time to death even if it has little or no effect after AIDS onset.

The stochastic-ordering structure and within-subject dependence of multiple events create challenges for developing statistical methods for gap time data. Many medical follow-up studies have a fixed length, resulting in right censoring. Various statistical methods have been proposed for non-parametrically estimating gap time distributions in the setting of sequentially ordered multiple events. For events of the same type, treating the number of recurrent events as random, Peña et al. (2001) derived Nelson–Aalen and Kaplan–Meier-type estimators for univariate gap times. These were shown to be non-parametric maximum likelihood estimators under the assumption that gap times are independent and identically distributed, and their asymptotic variances have explicit closed-form expressions. Wang & Chang (1999) relaxed the independence assumption to allow dependence among gap times within the same subject and developed non-parametric estimation of univariate gap time distributions. For a fixed number of multiple events of different types, Visser (1996) considered non-parametric estimation of the bivariate survival function when censoring may depend on previous gap times; however, his method relied on estimating the cumulative conditional hazard of the second gap time given the first and required the censoring time and gap times to be discrete. Huang & Louis (1998) developed estimation methods for marked survival data that can be readily applied to gap times. Wang & Wells (1998) suggested a product-limit type estimator for the second gap time based on inverse probability of censoring weighting; the estimator has a very complicated covariance structure. Lin et al. (1999) proposed a non-parametric approach to estimating joint and conditional distributions of multivariate gap times. van der Laan et al. (2002) developed two estimators of a multivariate survival function, an initial inverse probability of censoring weighted estimator and a one-step locally efficient doubly robust estimator, and their method can accommodate censoring that depends on the total times and covariates. Schaubel & Cai (2004a) constructed a non-parametric estimator of the conditional gap time-specific survival function directly through a cumulative hazard estimator, rather than through a joint distribution function estimator as in Lin et al. (1999) and van der Laan et al. (2002). In addition, Huang & Wang (2005) proposed a non-parametric estimator of the joint distribution of bivariate alternating gap times, which arise from studies in which subjects experience two different types of events alternately over time, a kind of bivariate recurrent event data.

The literature on gap time analysis is scattered, and despite recent methodological developments, no standard approach has emerged. Furthermore, the issues of induced dependent censoring and non-identifiability are not well known among practitioners. In some cases, unrealistic assumptions on the censoring mechanism are imposed to avoid non-identifiability; often, these issues are ignored or not described explicitly. In this paper, we discuss the key issues in gap time modelling and use two examples to illustrate the results. One is a comparative simulation study assessing the performance of different estimation approaches. The other is an application to opportunistic infection data from an AIDS clinical trial, which reveals important gap time features that need to be addressed appropriately in practice. The paper is organised as follows. In Section 2, we introduce notation and formalise the statistical issues in gap time analysis. Section 3 reviews existing non-parametric methods for estimating gap time distributions for univariate recurrent event data, multivariate failure time data and bivariate recurrent event data. Examples illustrating the different methods are presented in Section 4. Finally, we conclude with some remarks and discuss further developments.

2 Notation, Induced Dependent Censoring and Identifiability

Firstly, we introduce some notation to facilitate the discussion. Let $i$ index subjects ($i = 1, \dots, n$) and $j$ index events, which may be of the same or different types. Suppose that subject $i$ may experience a sequence of events in chronological order at times $Y_{i1}, Y_{i2}, \dots, Y_{ij}, \dots$, the total times to events measured from the start of follow-up. The total censoring time from the start for subject $i$ is denoted by $C_i$, with survival function $G$. We make the standard assumption that follow-up is subject to independent right censoring, that is, the total times to events $(Y_{i1}, Y_{i2}, \dots, Y_{ij}, \dots)$ are independent of $C_i$. This assumption is fundamental for all the gap time methods considered in this paper. Gap time analysis focuses on the times between successive events: $T_{i1} = Y_{i1}$, $T_{i2} = Y_{i2} - Y_{i1}$, …, $T_{ij} = Y_{ij} - Y_{i,j-1}$, …. For any $j \ge 2$, the gap time $T_{ij}$ is right censored by $C_{ij} = C_i - Y_{i,j-1}$; thus, the censoring time $C_{ij}$ for the $j$-th gap time is functionally related to the previous event times. Unless $T_{ij}$ is independent of $Y_{i,j-1}$, the second and subsequent gap times are therefore subject to ‘induced dependent censoring’ (Gelber et al., 1989), which is our first concern. For example, the longer the first gap time $T_{i1}$, the greater the probability that the second gap time $T_{i2}$ is censored; because $T_{i1}$ and $T_{i2}$ are often naturally dependent, long second gap times are then censored more often than short ones. As a result, the observed second gap times contain a disproportionate number of short times, so standard estimation methods for the gap time distribution that do not account for induced dependent censoring lead to biased results.

The next important issue in gap time analysis is identifiability. Consider the marginal survival function $P(T_j > t)$ of each gap time. Here and in the sequel, the subscript $i$ is dropped to denote a generic random variable from the population. For the first gap time, the familiar constraint is that $P(T_1 > t)$ can only be estimated for $t \in [0, \tau_c]$, where $\tau_c = \sup\{t : P(C \ge t) > 0\}$. For the second and subsequent gap times, $P(T_j > t)$ ($j \ge 2$) is generally not identifiable unless the support of the preceding total time $Y_{j-1}$ lies sufficiently far inside the interval $[0, \tau_c]$. To be specific, if $\tau_c < \tau_1 + t$, where $\tau_1 = \sup\{t : P(T_1 > t) > 0\}$ is the upper limit of the support of $T_1$, the marginal survival function $P(T_2 > t)$ of the second gap time is not estimable, because subjects whose first gap time is close to $\tau_1$ can never be followed for a further time $t$. The challenge is to find some quantity that is identifiable, interpretable and of general interest to investigators. To this end, it is desirable to estimate meaningful conditional distributions related to gap times. Accordingly, for the second and subsequent gap times, many authors have proposed methods to estimate the conditional distribution or survival function given that the prior event occurs before some fixed time point. In particular, the conditional estimates can be derived from the joint distribution of gap times (Lin et al., 1999) or from the conditional cumulative hazard function (Schaubel & Cai, 2004a).
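The identifiability conditions above can be written as simple checks. This is a minimal sketch with hypothetical helper names; it treats the boundary as included ($\le$), whereas the estimators in Section 3 are stated with strict or non-strict inequalities depending on the method, so the boundary case should be treated with care.

```python
def joint_identifiable(t1, t2, tau_c):
    """P(T1 <= t1, T2 <= t2) and P(T2 > t2 | Y1 <= t1) need follow-up up to t1 + t2."""
    return t1 + t2 <= tau_c


def marginal_t2_identifiable(t, tau_1, tau_c):
    """P(T2 > t) needs every possible first gap (up to tau_1) to be followed for t more."""
    return tau_1 + t <= tau_c


def max_identifiable_t2(t1, tau_c):
    """Largest t2 for which the conditional survival given Y1 <= t1 is identifiable."""
    return max(tau_c - t1, 0.0)
```

For example, with an exponential first gap time ($\tau_1 = \infty$) and a finite maximum follow-up $\tau_c$, `marginal_t2_identifiable` is false for every $t$, while the conditional survival given $Y_1 \le t_1$ remains identifiable up to $\tau_c - t_1$.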

3 Methods for Gap Time Data Analysis

Approaches to analysing gap times depend on the structure of the multiple event data as well as on the research question of the study. In the following, for each type of multiple event data, we describe the data features in detail and present the corresponding non-parametric methods for estimating the gap time distribution of interest.

3.1 Estimation for Gap Time Data from Univariate Recurrent Events

Univariate recurrent event data arise in longitudinal studies in which each subject may repeatedly experience a certain event during follow-up. When the focus is on the gap times between events and the gap times of different episodes are assumed to have the same marginal distribution, it is of interest to estimate this common marginal survival function non-parametrically. Gap times from recurrent events can be treated as a type of dependent survival data. However, because recurrent events are chronologically ordered, standard methods such as the Kaplan–Meier estimator (Kaplan & Meier, 1958) applied to the pooled gap times may not be appropriate. Another special feature of recurrent event data is that the last observed (censored) gap time is always biased because of induced dependent censoring. These issues need to be considered with care in estimation.

Let $N_i = \{T_{ij} : j = 1, 2, \dots\}$ be the set of gap times for subject $i$. Assume that the data for different subjects, $(N_1, C_1), (N_2, C_2), \dots, (N_n, C_n)$, are independent and identically distributed. Throughout the discussion, the number of recurrent events, and hence of gap times, per subject is considered random. Let $m_i$ denote the index satisfying $\sum_{j=1}^{m_i - 1} T_{ij} \le C_i$ and $\sum_{j=1}^{m_i} T_{ij} > C_i$; thus $m_i$ is a random variable taking positive integer values. The collection $\{T_{ij} : j = 1, \dots, m_i - 1\}$ constitutes the uncensored data, and $T_{im_i}^{+} = C_i - \sum_{j=1}^{m_i - 1} T_{ij}$, the time from event $m_i - 1$ to the end of follow-up, is the censored observation.
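The bookkeeping that turns one subject's gap sequence into $m_i$, the uncensored gaps and the censored last gap (the time from event $m_i - 1$ to the end of follow-up) can be sketched as follows. The function name and the simulated-data use case are assumptions for illustration; in real data only the censored prefix of the sequence is ever available.

```python
def split_gaps(gaps, c):
    """Split one subject's gap times T_i1, T_i2, ... at the total censoring time c.

    Returns (m_i, complete, last): m_i is the index with
    T_i1 + ... + T_i,m_i-1 <= c < T_i1 + ... + T_i,m_i,
    complete = [T_i1, ..., T_i,m_i-1] are the uncensored gaps, and
    last = c - (T_i1 + ... + T_i,m_i-1) is the censored gap from event m_i - 1
    to the end of follow-up.
    """
    elapsed = 0.0
    for m, t in enumerate(gaps, start=1):
        if elapsed + t > c:
            return m, list(gaps[:m - 1]), c - elapsed
        elapsed += t
    raise ValueError("simulate more gaps: the sequence ends before time c")


print(split_gaps([1.0, 2.0, 4.0], 5.0))  # (3, [1.0, 2.0], 2.0)
```

A subject with $m_i = 1$ contributes no uncensored gap at all, only a censored first gap, which is why the estimators below treat the cases $m_i = 1$ and $m_i \ge 2$ separately.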

Wang & Chang (1999) proposed a non-parametric estimator of the marginal survival function based on a hazard estimation technique. Their method rests on the following two assumptions:

  • There exists a frailty Z, so that the gap times T1,T2, … are independent and identically distributed given Z = z.

  • The censoring time C is independent of (N,Z).

The first assumption is a common condition in frailty models used to characterise the dependence among gap times from the same subject, and it ensures that the gap times $T_1, T_2, \dots$ share the same marginal survival function $S(t) = P(T_j > t)$. The second assumption is the usual independent censoring condition. Together, the two assumptions resolve the issues of identifiability and induced dependent censoring; whether they are valid depends on the nature of the recurrent events and the censoring pattern in a specific study. The frailty $Z$ in the first assumption serves only to guarantee that the gap times within a subject are independent and identically distributed; the distribution of $Z$ is not used in the methodology. An advantage of the method is that the estimator and its properties still hold even if $Z$ is an abstract variate with probability measure $P_Z$, in which case the marginal survival function is defined as inline image. We then consider estimating the marginal survival function from the censored gap times inline image. Define inline image for $m_i \ge 2$ and inline image for $m_i = 1$. The observed gap times are $X_{ij} = T_{ij}$ for $j = 1, \dots, m_i - 1$ and inline image for $j = m_i$. As with the Kaplan–Meier estimator, we need to identify the risk set and the failures at each time. The total mass of the risk set at time $t$ is

display math

and the mass of failures at t is

display math

where $a_i = a(C_i)$ and $a(\cdot)$ is a positive-valued function subject to the constraint inline image (the expectation is taken with respect to the distribution of $C$). The random coefficient $a_i$ in $R^*(t)$ and $d^*(t)$ can be viewed as a weight that depends on the length of the censoring time, with the potential to give more weight to subjects with longer observation periods. When $m_i \ge 2$, the last gap time inline image is not used, in order to avoid sampling bias. Using the same hazard estimation technique as for the Kaplan–Meier estimator, the product-limit type estimator of the survival function is

display math

where inline image are the ordered and distinct uncensored gap times, and inline image is non-increasing in $t$ with inline image. Under regularity conditions, and assuming that $a_i = a(C_i)$ is bounded, as $n \to \infty$ the random process inline image for $t \in [0, t^*]$ converges weakly to a zero-mean Gaussian process with covariance function $\sigma(t_1, t_2) = S(t_1) S(t_2) E\{\varphi_i(t_1) \varphi_i(t_2)\}$ (Wang & Chang, 1999), where $t^*$ is a constant satisfying $t^* < \sup\{t : S(t) G(t) > 0\}$ and

display math
display math

The weights $a_i$ can be chosen to improve the efficiency of the estimator. However, the optimal choice of $a_i$ varies across time points, and no closed-form solution exists for it. For the convenient and important choice $a_i = 1$ (or, equivalently, any positive constant), the estimator has a smaller pointwise asymptotic variance than the Kaplan–Meier estimator based only on the first gap times or on the pooled gap times with the last gap time always dropped. The efficiency gain is substantial when the gap times are weakly dependent. When the censoring times vary among subjects, the weight $a_i = C_i$ usually yields better efficiency when the gap times are weakly dependent, whereas a constant weight is preferable when the dependence is stronger.
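The product-limit construction above can be sketched in a few lines. Because the displays for $R^*(t)$ and $d^*(t)$ are not reproduced in this copy, the sketch rests on an assumption about their form, following our reading of Wang & Chang (1999): a subject with $m_i \ge 2$ spreads its weight $a_i$ equally over its $m_i - 1$ uncensored gaps (and its censored last gap is dropped), while a subject with $m_i = 1$ contributes its censored first gap with weight $a_i$ to the risk set only. The function name and data layout are hypothetical.

```python
def wang_chang_survival(subjects, weights=None):
    """Product-limit estimate of the common gap time survival function S(t).

    subjects: list of (complete_gaps, censored_last_gap), where complete_gaps holds
    T_i1, ..., T_i,m_i-1 and censored_last_gap is the time from event m_i - 1
    to the end of follow-up. weights: optional a_i = a(C_i); defaults to a_i = 1.
    """
    a = [1.0] * len(subjects) if weights is None else list(weights)
    records = []  # (time, is_event, mass)
    for (complete, last), a_i in zip(subjects, a):
        if complete:
            # m_i >= 2: spread a_i over the m_i - 1 complete gaps; drop the censored last gap
            mass = a_i / len(complete)
            records.extend((t, True, mass) for t in complete)
        else:
            # m_i = 1: only the censored first gap is available
            records.append((last, False, a_i))
    steps, s = [], 1.0
    for u in sorted({t for t, is_event, _ in records if is_event}):
        r_star = sum(w for t, _, w in records if t >= u)        # risk-set mass R*(u)
        d_star = sum(w for t, e, w in records if e and t == u)  # failure mass d*(u)
        s *= 1.0 - d_star / r_star
        steps.append((u, s))

    def survival(t):
        value = 1.0
        for u, s_u in steps:
            if u > t:
                break
            value = s_u
        return value

    return survival


S = wang_chang_survival([([1.0, 3.0], 0.5), ([], 2.0)])
print(S(2.0))  # 0.75
```

Because the estimate depends on the weights only through the ratios $d^*(u)/R^*(u)$, multiplying every $a_i$ by the same positive constant leaves it unchanged, which is why any positive constant weight is equivalent to $a_i = 1$. Passing `weights=[c_1, ..., c_n]` gives the $a_i = C_i$ variant discussed above.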

In applications, it is often of interest to estimate the average length of gap times for recurrent events in the population, and the non-parametric estimate of the marginal survival function can be readily used to produce such an estimate. In particular, such an estimate is a useful tool for comparing survival behaviour across risk groups. In Section 4.2, the method is applied to compare treatment effects on recurrent opportunistic infections in an AIDS clinical trial.
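One common way to turn an estimated survival curve into an average gap length is the restricted mean $\int_0^{\tau} \hat S(t)\,dt$, the area under the step function up to a horizon $\tau$. This is a standard device rather than something prescribed by the source, and we assume $\tau$ is chosen within the range where $\hat S$ is identifiable (for the Wang–Chang estimator, $\tau \le t^*$). The helper name is hypothetical.

```python
def restricted_mean(steps, tau):
    """Area under a right-continuous step survival curve on [0, tau].

    steps: sorted (time, survival just after the jump) pairs; S = 1 before the first jump.
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # flat piece of the curve before the jump at t
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


print(restricted_mean([(1.0, 0.75), (3.0, 0.0)], 4.0))  # 2.5
```

Comparing restricted means across treatment groups, with a common $\tau$, gives a single interpretable summary of differences in gap time distributions.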

3.2 Estimation for Gap Time Data from Multivariate Failure Times

In many medical follow-up studies, each subject may experience a series of events of different nature, representing different states of a process. A fundamental question in analysing such multiple event data is the estimation of the multivariate distribution function for gap times in the presence of right censoring. In the sequentially ordered multivariate failure time setting, the gap times except the first one are subject to induced dependent censoring, even if the overall follow-up time is independently right censored. Analysis should take account of this dependent censoring, and the corresponding estimation method for gap times is challenging. We present several convenient and simple approaches to estimating the joint and conditional distributions or survival functions of gap times in the following.

3.2.1 Joint and conditional distribution estimations by inverse probability weighting method

Suppose that each subject under observation experiences $J$ ordered events of different types at times $Y_1 < Y_2 < \dots < Y_J$. The gap times of interest are $T_1 = Y_1, T_2 = Y_2 - Y_1, \dots, T_J = Y_J - Y_{J-1}$, and the $j$-th gap time $T_j$ is censored by $C - Y_{j-1}$. Lin et al. (1999) provided a non-parametric estimator of the joint distribution function of the gap times. The method assumes that follow-up is subject to independent right censoring but imposes no assumption on the dependence structure of the gap times; inverse probability weighting is used to adjust for the dependent censoring.

For ease of discussion, we first consider the case $J = 2$ and then extend the method to the general setting $J > 2$. Suppose that there are $n$ subjects in the study, each experiencing two events at times $Y_1 < Y_2$, so that the gap times are $T_1 = Y_1$ and $T_2 = Y_2 - Y_1$. The observed data consist of $\{(\tilde Y_{i1}, \tilde Y_{i2}, \delta_{i1}, \delta_{i2}) : i = 1, \dots, n\}$, where $\tilde Y_{ij} = Y_{ij} \wedge C_i$ and $\delta_{ij} = I(Y_{ij} \le C_i)$ for $j = 1, 2$; here $\wedge$ denotes the minimum and $I(\cdot)$ is the indicator function. Let $F$ be the joint distribution function of $(T_1, T_2)$ and $G$ the survival function of the censoring time $C$. We have

$$F(t_1, t_2) = H(t_1, 0) - H(t_1, t_2),$$

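where $H(t_1, t_2) = P(T_1 \le t_1, T_2 > t_2)$. Therefore, an estimator of $F$ can be obtained by estimating $H$. Because $E\{I(\tilde Y_1 \le t_1, \tilde Y_2 - \tilde Y_1 > t_2) \mid T_1, T_2\} = I(T_1 \le t_1, T_2 > t_2)\, G(T_1 + t_2)$, we have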

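$$H(t_1, t_2) = E\left\{ \frac{I(\tilde Y_1 \le t_1,\ \tilde Y_2 - \tilde Y_1 > t_2)}{G(\tilde Y_1 + t_2)} \right\},$$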

which indicates that H(t1,t2) can be estimated by a weighted empirical estimator,

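$$\hat H(t_1, t_2) = \frac{1}{n} \sum_{i=1}^{n} \frac{I(\tilde Y_{i1} \le t_1,\ \tilde Y_{i2} - \tilde Y_{i1} > t_2)}{\hat G(\tilde Y_{i1} + t_2)},$$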

where $\hat G$ is the Kaplan–Meier estimator of $G$ based on the data inline image or inline image for $i = 1, \dots, n$, and $\tau_c = \sup\{t : G(t) > 0\}$. The corresponding estimator of $F(t_1, t_2)$ is $\hat F(t_1, t_2) = \hat H(t_1, 0) - \hat H(t_1, t_2)$ for $t_1 + t_2 < \tau_c$; if $t_1 + t_2 > \tau_c$, neither $F(t_1, t_2)$ nor $H(t_1, t_2)$ is estimable. In addition, $\hat F$ is not always a proper distribution function, in that it may assign negative mass, although it converges to a proper distribution in large samples. As shown by Lin et al. (1999), $\hat F$ is strongly consistent, and the process $n^{1/2}\{\hat F(t_1, t_2) - F(t_1, t_2)\}$ converges weakly to a bivariate zero-mean Gaussian process with covariance function

display math

where $D(t_1, t_2; u) = \{\Pr(T_1 \le t_1) - \Pr(T_1 \le u)\}^{+} - \{H(t_1, t_2) - H(u - t_2, t_2)\}^{+}$, and $\Lambda_c$ is the cumulative hazard function of $C$.
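The bivariate inverse-probability-weighted estimator can be sketched directly from $\hat H(t_1, t_2) = n^{-1} \sum_i I(\tilde Y_{i1} \le t_1, \tilde Y_{i2} - \tilde Y_{i1} > t_2) / \hat G(\tilde Y_{i1} + t_2)$ and $\hat F(t_1, t_2) = \hat H(t_1, 0) - \hat H(t_1, t_2)$. This is a minimal sketch, not the authors' code: it assumes $\hat G$ is the Kaplan–Meier estimator of $P(C > t)$ computed from $(\tilde Y_{i2}, 1 - \delta_{i2})$, one of the choices mentioned in the text, and the function names are hypothetical.

```python
def km_censoring_survival(times, deltas):
    """Kaplan-Meier estimate of G(t) = P(C > t); delta == 0 marks a censoring 'event'."""
    data = sorted(zip(times, deltas))
    steps, at_risk, g, i = [], len(data), 1.0, 0
    while i < len(data):
        u, cens, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == u:
            cens += 1 - data[i][1]
            removed += 1
            i += 1
        if cens:
            g *= 1.0 - cens / at_risk
            steps.append((u, g))
        at_risk -= removed

    def G(t):
        value = 1.0
        for u, g_u in steps:
            if u > t:
                break
            value = g_u
        return value

    return G


def lin_bivariate(y1_obs, y2_obs, delta2):
    """IPCW estimates of H(t1, t2) and F(t1, t2) for two ordered events.

    y1_obs[i], y2_obs[i]: observed total times min(Y_ij, C_i);
    delta2[i]: I(Y_i2 <= C_i), used to estimate G from (y2_obs, 1 - delta2).
    """
    G = km_censoring_survival(y2_obs, delta2)
    n = len(y1_obs)

    def H(t1, t2):
        total = 0.0
        for a, b in zip(y1_obs, y2_obs):
            if a <= t1 and b - a > t2:     # T1 <= t1 and T2 > t2, both observed
                total += 1.0 / G(a + t2)   # inverse probability of remaining uncensored
        return total / n

    def F(t1, t2):
        return H(t1, 0.0) - H(t1, t2)      # F(t1, t2) = H(t1, 0) - H(t1, t2)

    return H, F
```

Without censoring $\hat G \equiv 1$ and $\hat F$ reduces to the empirical joint distribution function. With censoring, a subject counted in $\hat H(t_1, t_2)$ was still under observation after $\tilde Y_{i1} + t_2$, so the weight's denominator is positive; $\hat F$ can nonetheless be improper in small samples, as noted above.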

For the first gap time $T_1$, the marginal distribution can be estimated by $\hat H(t_1, 0)$, which is identical to the Kaplan–Meier estimator. Although the marginal distribution function of $T_2$ is not generally estimable, it is both meaningful and possible to estimate the conditional distribution $F_{2\mid1}(t_2 \mid t_1) = P(T_2 \le t_2 \mid T_1 \le t_1) = F(t_1, t_2)/F(t_1, \infty)$ as long as $t_1 + t_2 < \tau_c$. A natural estimator of $F_{2\mid1}(t_2 \mid t_1)$ is obtained by replacing $F$ with $\hat F$; this conditional estimator is consistent and converges weakly to a zero-mean Gaussian process with an easily estimated covariance function (Lin et al., 1999). The method extends straightforwardly to the general case $J > 2$. Define $\mathbf{t} = (t_1, \dots, t_J)$, $\mathbf{t}_0 = (t_1, \dots, t_{J-1}, 0)$, $F(\mathbf{t}) = P(T_1 \le t_1, \dots, T_J \le t_J)$ and $H(\mathbf{t}) = P(T_1 \le t_1, \dots, T_{J-1} \le t_{J-1}, T_J > t_J)$. Clearly, $F(\mathbf{t}) = H(\mathbf{t}_0) - H(\mathbf{t})$. Similarly, $H(\mathbf{t})$ can be estimated by

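$$\hat H(\mathbf{t}) = \frac{1}{n} \sum_{i=1}^{n} \frac{I(\tilde Y_{i1} \le t_1,\ \tilde Y_{i2} - \tilde Y_{i1} \le t_2, \dots, \tilde Y_{i,J-1} - \tilde Y_{i,J-2} \le t_{J-1},\ \tilde Y_{iJ} - \tilde Y_{i,J-1} > t_J)}{\hat G(\tilde Y_{i,J-1} + t_J)},$$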

where $\hat G$ is the Kaplan–Meier estimator of $G$ based on the data $\{(\tilde Y_{iJ}, 1 - \delta_{iJ})\}$ for $i = 1, \dots, n$. The joint distribution $F(\mathbf{t})$ is then estimated by $\hat F(\mathbf{t}) = \hat H(\mathbf{t}_0) - \hat H(\mathbf{t})$, and the conditional distributions of the second and subsequent gap times can be obtained in the same fashion as for $J = 2$.
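The general-$J$ version and the conditional distribution of the last gap can be sketched in the same way. The sketch assumes the weighted-indicator form of $\hat H(\mathbf t)$ described above, with $\hat G$ computed from $(\tilde Y_{iJ}, 1 - \delta_{iJ})$, and estimates $P(T_J \le t_J \mid T_1 \le t_1, \dots, T_{J-1} \le t_{J-1})$ by $\hat F(\mathbf t)/\hat H(\mathbf t_0)$, using $F(t_1, \dots, t_{J-1}, \infty) = H(\mathbf t_0)$ when gap times are positive. For $J = 2$ this is the estimator of $F_{2\mid1}(t_2 \mid t_1)$. All names are hypothetical.

```python
def km_censoring_survival(times, deltas):
    """Kaplan-Meier estimate of G(t) = P(C > t); delta == 0 marks a censoring 'event'."""
    data = sorted(zip(times, deltas))
    steps, at_risk, g, i = [], len(data), 1.0, 0
    while i < len(data):
        u, cens, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == u:
            cens += 1 - data[i][1]
            removed += 1
            i += 1
        if cens:
            g *= 1.0 - cens / at_risk
            steps.append((u, g))
        at_risk -= removed
    return lambda t: next((g_u for u, g_u in reversed(steps) if u <= t), 1.0)


def lin_multivariate(y_obs, deltas):
    """IPCW estimates of F(t) and of the conditional distribution of the last gap.

    y_obs[i] = [min(Y_i1, C_i), ..., min(Y_iJ, C_i)] with J >= 2;
    deltas[i] = [delta_i1, ..., delta_iJ].
    """
    G = km_censoring_survival([y[-1] for y in y_obs], [d[-1] for d in deltas])
    n = len(y_obs)

    def H(t):
        *t_head, t_last = t
        total = 0.0
        for y in y_obs:
            gaps = [y[0]] + [y[k] - y[k - 1] for k in range(1, len(y))]
            # last gap observed beyond t_J implies every earlier gap is fully observed
            if gaps[-1] > t_last and all(g <= s for g, s in zip(gaps[:-1], t_head)):
                total += 1.0 / G(y[-2] + t_last)
        return total / n

    def F(t):
        t = list(t)
        return H(t[:-1] + [0.0]) - H(t)  # F(t) = H(t_0) - H(t)

    def conditional(t):
        """P(T_J <= t_J | T_1 <= t_1, ..., T_{J-1} <= t_{J-1}); fails if H(t_0) == 0."""
        t = list(t)
        return F(t) / H(t[:-1] + [0.0])

    return F, conditional
```

The conditional estimate is only meaningful where the conditioning event has positive estimated probability and $t_1 + \dots + t_J$ lies within the identifiable range.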

The method provides a simple solution to gap time distribution estimation for multivariate failure times. The joint distribution function is estimated by an empirical mean-type estimator with a closed-form covariance structure. Essentially, each observation is weighted by the inverse of its probability of being uncensored, which effectively removes the bias due to dependent censoring. The method also estimates the conditional distribution of the second gap time, which may be more informative and easier for practitioners to interpret. Consider, for example, a natural history study of HIV in which the event sequence of interest is birth → HIV → death. A regression analysis of the time from HIV infection to death on age at HIV infection, based on the Cox proportional hazards model, would give a general understanding of the relationship between the two gap times. Nevertheless, disease progression could be depicted more precisely by the conditional probability of surviving beyond a certain time given infection before a specified age. In addition, it is often of interest to compare gap time distributions among groups of subjects, and the method plays an important role in constructing a family of non-parametric two-sample tests for differences in gap time distributions (Lin & Ying, 2001). The method, however, requires that the censoring time be completely independent of the gap times, which could be a practical limitation in some contexts.

3.2.2 Conditional survival estimation based on conditional cumulative hazard

As an alternative to the method of Lin et al. (1999), an estimator of the conditional survival function of a gap time can be developed from the corresponding conditional cumulative hazard function, via the relationship $S(t) = \exp\{-\Lambda(t)\}$, where $\Lambda(t)$ is the cumulative hazard function. The resulting estimate cannot have negative mass, a problem shared by many survival function estimators that rely on estimating the joint survival function, as in Section 3.2.1.

Because the marginal distributions of the second and subsequent gap times are not identifiable, interest for $j \ge 2$ lies in the conditional survival function of the $j$-th gap time $T_j$ given that the $(j-1)$-th event occurs before a fixed time, $S_{j\mid j-1}(t_j \mid t_{j-1}) = P(T_j > t_j \mid Y_{j-1} \le t_{j-1})$, where $Y_{j-1}$ is the total time to the $(j-1)$-th event. Schaubel & Cai (2004a) proposed to estimate this conditional survival function through a conditional cumulative hazard estimator. In the absence of censoring, the conditional cumulative hazard for $(T_j \mid Y_{j-1} \le t_{j-1})$ could be estimated by

display math(1)

where inline image and inline image $(s, Y_{i,j-1} \le t_{j-1})$. With censoring, the observed gap time data consist of inline image, where inline image, $\delta_{ij} = I(T_{ij} \le C_{ij})$, $C_{ij} = C_i - Y_{i,j-1}$, and $C_i$ is the overall censoring time for subject $i$, for $j = 1, 2, \dots$ and $i = 1, \dots, n$. Denote $G(t) = P(C_i > t)$. In the presence of censoring, for $j \ge 2$, $S_{j\mid j-1}(t_j \mid t_{j-1})$ is identifiable for $t_j \in [0, \tau_j]$, where $t_{j-1} + \tau_j \le \tau_c$ and $\tau_c = \sup\{t : G(t) > 0\}$. For $j \ge 2$, we have

display math
display math

where inline image and inline image. An estimate of the conditional cumulative hazard function can be obtained by replacing the potentially unobservable random variables in (1) with consistent estimates of observable quantities that have the same conditional expectation, which gives

display math

for $j \ge 2$, where inline image are the ordered and distinct uncensored gap times for $T_j$, and inline image is the Kaplan–Meier estimator of $G(t)$ based on $(Y_{iJ} \wedge C_i, 1 - \delta_{iJ})$ for $i = 1, \dots, n$. The conditional survival function of interest can then be estimated by inline image, which is bounded between 0 and 1 and is monotone in $t_j$ for $t_j \in [0, \tau_j]$. The asymptotic results are established by empirical process techniques and the functional delta method. The estimator inline image is uniformly consistent and converges weakly to a zero-mean Gaussian process whose covariance function can be consistently estimated by inline image, where inline image is given by formula (3) in Schaubel & Cai (2004a). In addition, a simultaneous 100(1 − α)% confidence band for the estimator inline image can be constructed on the basis of this normality result.
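The idea of the conditional cumulative hazard estimator can be sketched as follows. The displays of Schaubel & Cai (2004a) are not reproduced in this copy, so the sketch only follows the verbal description: among subjects with an observed $(j-1)$-th event at $Y_{i,j-1} \le t_{j-1}$, both the risk set and the failures at each uncensored gap time $u$ are weighted by $1/\hat G(Y_{i,j-1} + u)$, the cumulative hazard is accumulated over the distinct failure times, and $\hat S = \exp(-\hat\Lambda)$. Evaluating $\hat G$ at the left limit is our own choice, and all function names are hypothetical; the published estimator may differ in such details.

```python
import math


def km_censoring_survival(times, deltas):
    """Kaplan-Meier estimate of G(t) = P(C > t); delta == 0 marks a censoring 'event'."""
    data = sorted(zip(times, deltas))
    steps, at_risk, g, i = [], len(data), 1.0, 0
    while i < len(data):
        u, cens, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == u:
            cens += 1 - data[i][1]
            removed += 1
            i += 1
        if cens:
            g *= 1.0 - cens / at_risk
            steps.append((u, g))
        at_risk -= removed

    def G(t, left=False):
        value = 1.0
        for u, g_u in steps:
            if u > t or (left and u == t):   # left=True gives the left limit G(t-)
                break
            value = g_u
        return value

    return G


def conditional_survival_sketch(prev_times, gaps, events, t_prev, G):
    """exp(-Lambda_hat) estimate of P(T_j > t | Y_{j-1} <= t_prev).

    prev_times[i]: observed Y_{i,j-1}, or None if subject i was censored before event j-1;
    gaps[i], events[i]: observed gap min(T_ij, C_ij) and delta_ij.
    """
    sel = [(y, x, d) for y, x, d in zip(prev_times, gaps, events)
           if y is not None and y <= t_prev]
    steps, cum_haz = [], 0.0
    for u in sorted({x for _, x, d in sel if d}):
        # risk set and failures at u, inverse-weighted by G(Y_{i,j-1} + u-)
        risk = sum(1.0 / G(y + u, left=True) for y, x, _ in sel if x >= u)
        fail = sum(1.0 / G(y + u, left=True) for y, x, d in sel if d and x == u)
        cum_haz += fail / risk
        steps.append((u, math.exp(-cum_haz)))

    def survival(t):
        value = 1.0
        for u, s_u in steps:
            if u > t:
                break
            value = s_u
        return value

    return survival
```

In practice $G$ would be built from the last-event data $(Y_{iJ} \wedge C_i, 1 - \delta_{iJ})$, as described above. Without censoring all weights equal one and the sketch reduces to the exponentiated Nelson–Aalen estimator within the conditioning subgroup.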

Similar in spirit to Lin et al. (1999), the method adjusts for induced dependent censoring by weighting risk set contributions by the inverse of the probability of being uncensored. The conditional survival function is estimated directly through a cumulative hazard estimator rather than through a joint distribution estimator. One advantage over the estimator in Section 3.2.1 is the ease of computing asymptotic standard errors, which is important in practice. For the sake of identifiability, the method conditions on the $(j-1)$-th event occurring in the interval $[0, t_{j-1}]$. In applications, such conditioning often makes the survival estimates more meaningful to investigators, particularly when it serves to identify the subjects of primary interest. For instance, in the disease process HIV → AIDS → death, one could argue that subjects who develop AIDS within 1 year of HIV infection have particularly rapid disease progression, so special attention should be paid to their survival after AIDS onset. The method can address questions of this kind.

3.2.3 Joint distribution estimation based on method for marked survival data

Huang & Louis (1998) proposed a general estimation procedure for the joint distribution of a survival time and mark variables that characterise the endpoint and are observed only if the survival time is uncensored. Analogous to the product-integral representation of a survival function, the joint distribution of the survival time and mark variables can be represented through a cumulative mark-specific hazard function, which serves as the basis of the estimation method. In the sequentially ordered multiple events setting, the survival time is the overall follow-up time, and the mark variables are a vector of gap times. Therefore, by appropriately choosing endpoints and mark variables, an estimator of the joint distribution of gap times can be obtained using methods for marked survival data. We present the gap times of interest and the corresponding procedure for estimating their joint distribution function within the framework of marked survival data.

For simplicity, we consider the case $J = 2$, with gap times $T_1$ and $T_2$. Define $X = (T_1, T_2)$ as the bivariate mark vector and $Y = T_1 + T_2$ as the survival time, and denote their joint distribution function by $F_{XY}$ and the marginal survival function of $Y$ by $S_Y$. For $x = (t_1, t_2)$, $F_{XY}(x, y) = P(T_1 \le t_1, T_2 \le t_2, Y \le y)$ and $S_Y(y) = 1 - F_{XY}\{(\infty, \infty), y\}$. Clearly, the joint distribution function of $(T_1, T_2)$ is determined by $F_{XY}$ through $F(t_1, t_2) = F_{XY}\{(t_1, t_2), t_1 + t_2\}$, because $T_1 \le t_1$ and $T_2 \le t_2$ imply $Y \le t_1 + t_2$. Suppose that the overall follow-up time $Y$ is subject to independent censoring with censoring time $C$, and that the values of $X = (T_1, T_2)$ are completely observed only when $Y$ is uncensored. The observed data consist of inline image, where inline image and $\delta_i = I(Y_i \le C_i)$ for $i = 1, \dots, n$. Denote the marginal survival function of inline image by inline image and let inline image. Under the assumption that $Y$ and $C$ are independent, the joint distribution function $F_{XY}$ can be expressed as

display math

where inline image, inline image, and $G(\cdot)$ is the survival function of $C$. To construct the estimator of $F_{XY}$, the survival function $S_Y$ is estimated by the Kaplan–Meier estimator, and inline image and inline image are estimated by their corresponding empirical measures. The joint distribution of the gap times $(T_1, T_2)$ is then estimated by inline image, which enjoys desirable asymptotic properties by Theorem 6 of Huang & Louis (1998). By defining the mark vector $X = (T_1, \dots, T_J)$ and the survival time $Y = T_1 + \dots + T_J$, the method extends straightforwardly to the general case $J > 2$. The resulting estimator is unbiased, uniformly strongly consistent and asymptotically normal. Like the other methods considered in this paper, the approach relies on the random censorship assumption.
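A compact way to compute this estimator is to assign each uncensored subject the probability mass $\hat S_Y(u-)/\#\{k : \tilde Y_k \ge u\}$ at its mark, where $u$ is its observed total time, and then sum the masses of marks with $T_1 \le t_1$ and $T_2 \le t_2$. This follows from the representation $dF_{XY}(x, u) = S_Y(u-)\, dH_1(x, u)/H(u)$, with $H_1(x, u) = P(X \le x, \tilde Y \le u, \delta = 1)$ and $H(u) = P(\tilde Y \ge u)$, after replacing $S_Y$ by the Kaplan–Meier estimator and $H_1$, $H$ by empirical measures. The notation $H_1$, $H$ and the function name are ours and may differ from the symbols used by Huang & Louis (1998); the sketch is an illustration under the independent censoring assumption, not their implementation.

```python
def marked_joint_cdf(total_obs, deltas, marks):
    """Estimate F(t1, t2) = P(T1 <= t1, T2 <= t2) from marked survival data.

    total_obs[i] = min(Y_i, C_i) with Y_i = T_i1 + T_i2; deltas[i] = I(Y_i <= C_i);
    marks[i] = (T_i1, T_i2) when deltas[i] == 1 (may be None otherwise).
    """
    data = sorted(zip(total_obs, deltas, marks), key=lambda r: r[0])
    n, i = len(data), 0
    at_risk, s_left = n, 1.0  # s_left tracks the Kaplan-Meier value S_Y(u-)
    masses = []               # (mark, probability mass placed at that mark)
    while i < n:
        u, j = data[i][0], i
        while j < n and data[j][0] == u:
            j += 1
        failed = [m for _, d, m in data[i:j] if d]
        for mark in failed:
            # S_Y(u-) times the empirical dH1(x, u) / H(u) = (1/n) / (at_risk/n)
            masses.append((mark, s_left / at_risk))
        s_left *= 1.0 - len(failed) / at_risk
        at_risk -= j - i
        i = j

    def F(t1, t2):
        # T1 <= t1 and T2 <= t2 already imply Y <= t1 + t2
        return sum(w for (a, b), w in masses if a <= t1 and b <= t2)

    return F
```

The masses are exactly the jumps of the Kaplan–Meier estimator of $S_Y$, so unlike the estimator in Section 3.2.1 this estimate never assigns negative mass.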

In summary, the three non-parametric methods presented in this section for estimating gap time distributions from multivariate failure times all adjust appropriately for induced dependent censoring. For the conditional survival function, the variance is easier to compute for the estimator based on the conditional cumulative hazard than for the one derived from the estimated joint and marginal distributions. In addition, simultaneous confidence bands can be constructed with the method based on the conditional cumulative hazard; neither of the other two methods provides such a technique. Further, when combined with the method for univariate recurrent events in Section 3.1, the method based on marked survival data can be applied not only to gap times from multivariate failure times but also to bivariate alternating gap times from bivariate recurrent events, as shown in Section 3.3.

3.3 Estimation for Gap Time Data from Bivariate Recurrent Events

In many studies, subjects experience two different types of events alternately over time. For instance, a patient with coronary heart disease could be repeatedly admitted to and discharged from hospital. Such data arise from bivariate recurrent events, and we focus on the recurrent gap times of the two different natures. In the coronary heart disease example, at each point in time a patient is either hospitalised or discharged. When evaluating the efficacy of a treatment, investigators may want to study whether the treatment shortens hospital stays or prolongs the time spent out of hospital. The two gap times of different natures are then of particular interest, and estimating their joint distribution may indicate the degree of association within the bivariate recurrent events. The method of Huang & Louis (1998) can be applied using only the first pair of bivariate recurrent times, but this approach loses efficiency because the higher-order pairs are disregarded. Exploiting the recurrent nature of the data, Huang & Wang (2005) developed a non-parametric method for estimating the joint distribution of gap times of this type, combining techniques for gap times from univariate recurrent events with techniques for gap time data from bivariate failure times.

Suppose each subject's status alternates between two states over time. Denote the durations of the two states by $(S, T)$, a pair of gap times, and the collection of two-state gap times for the $i$th subject by $N_i = \{(S_{i1}, T_{i1}), (S_{i2}, T_{i2}), \ldots\}$. Let $C$ be the censoring time measured from the start of the study, with survival function $G(\cdot)$. Assume that $(N_1, C_1), (N_2, C_2), \ldots, (N_n, C_n)$ are independent and identically distributed, while gap times from the same subject may be correlated. Let $m_i$ denote the index that satisfies $\sum_{j=1}^{m_i-1}(S_{ij}+T_{ij}) \le C_i$ and $\sum_{j=1}^{m_i}(S_{ij}+T_{ij}) > C_i$, so that the last pair of two-state gap times is subject to right censoring. The observed data are denoted by inline image. It is possible that the last $S$ is uncensored and only the last $T$ is censored. As in the univariate case, the total number of bivariate recurrent events is random, and the last observation time inline image is always biased. The interest lies in estimating the joint distribution of $(S, T)$, denoted by $F_{ST}(s,t)$. Because the observation time of $S + T$ cannot exceed $C$, $F_{ST}(s,t)$ is identifiable only in the region $\{(s,t) : s + t \le \tau_c\}$, where $\tau_c = \sup\{t : G(t) > 0\}$. Two assumptions are adopted in the estimation method:

  • Given a frailty $Z = z$, the two-state gap times $(S_1, T_1), (S_2, T_2), \ldots$ are independent and identically distributed.
  • The censoring time $C$ is independent of $(N, Z)$.
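A minimal Python sketch of this observation scheme follows; the helper name observe_two_state and its return layout are hypothetical, chosen only for illustration. Given one subject's latent sequence of (S, T) pairs and a censoring time C, it returns the $m_i - 1$ fully observed pairs and the partially observed $m_i$th pair, in which either S itself is censored (and T is never reached) or S is observed and only T is censored. It assumes the latent sequence is long enough to run past C.

```python
from typing import List, Optional, Tuple

Pair = Tuple[float, float]
LastPair = Tuple[float, bool, Optional[float]]  # (S observed, S uncensored?, T observed)

def observe_two_state(pairs: List[Pair], c: float) -> Tuple[List[Pair], LastPair]:
    """Censor one subject's alternating (S, T) sequence at follow-up time c.

    Returns the m_i - 1 complete pairs and the censored m_i-th pair, where
    m_i is the first cycle whose cumulative end time S_1 + T_1 + ... exceeds c.
    """
    elapsed = 0.0
    complete: List[Pair] = []
    for s, t in pairs:
        if elapsed + s + t > c:
            residual = c - elapsed            # follow-up left in cycle m_i
            if s <= residual:                 # S observed, only T censored
                return complete, (s, True, residual - s)
            return complete, (residual, False, None)  # S censored, T unseen
        complete.append((s, t))
        elapsed += s + t
    raise ValueError("latent sequence ends before the censoring time")

complete, last = observe_two_state([(1.0, 2.0), (0.5, 0.5), (2.0, 3.0)], c=6.5)
m_i = len(complete) + 1   # here m_i = 3 and last == (2.0, True, 0.5)
```

Only the complete pairs are exchangeable draws of $(S, T)$; the last pair is length-biased by construction, which is why the estimator below treats it separately.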

Next, we present the estimation procedure for $F_{ST}(s,t)$. Using the technique for marked survival data, define $X_{ij} = (S_{ij}, T_{ij})$ as the bivariate mark vector and $Y_{ij} = S_{ij} + T_{ij}$ as the survival time. For $1 \le j \le m_i$, let inline image and denote inline image, inline image and $\delta_{ij} = I(Y_{ij} \le C_{ij})$. Under the two assumptions,

display math(2)

where $F_a(x, y) = E\{a_i I(X_{i1} \le x, Y_{i1} \le y, \delta_{i1} = 1)\}$, $R_a(y) = E\{a_i I(Y_{i1} \ge y)\}$, $x = (s,t)$ is a vector of real numbers, and $a_i$ is a positive-valued weight with inline image. By the exchangeability of the complete observations of the two-state gap times, $F_a$ and $R_a$ can be estimated by the corresponding empirical measures inline image and inline image. Define inline image for $m_i \ge 2$ and inline image for $m_i = 1$. To avoid sampling bias, the last pair of two-state gap times is not used unless $m_i = 1$. We have

display math


display math

By the technique for univariate recurrent events in Section 3.1,

display math(3)

where inline image are the ordered and distinct uncensored times from inline image. Following the representations in (2) and (3), we can estimate $F_{XY}$ by

display math

Then the joint distribution function of the two-state gap times, $F_{ST}(s,t)$, can be estimated immediately by

$$\hat F_{ST}(s,t) = \hat F_{XY}\{(s,t),\, s+t\}, \qquad s + t \le \tau_c,$$

which has desirable asymptotic properties in large samples (see Theorem 2 in Huang & Wang, 2005).
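The step from the marked pair to $F_{ST}$ rests on the identity $P(S \le s, T \le t) = P\{X \le (s,t), Y \le s + t\}$, which holds because $S \le s$ and $T \le t$ already imply $S + T \le s + t$; it is also why $F_{ST}$ is identifiable only where $s + t \le \tau_c$. The Python sketch below checks the identity empirically on complete (uncensored) simulated data whose exponential margins are arbitrary choices of ours. It illustrates the identity only and is not the censoring-adjusted estimator of Huang & Wang (2005).

```python
import numpy as np

rng = np.random.default_rng(2005)
n = 10_000
s = rng.exponential(1.0, n)    # illustrative complete S values (scale 1)
t = rng.exponential(2.0, n)    # illustrative complete T values (scale 2)
y = s + t                      # marked "survival time" Y = S + T

def f_st(a, b):
    """Empirical P(S <= a, T <= b)."""
    return np.mean((s <= a) & (t <= b))

def f_xy(a, b, u):
    """Empirical P(X <= (a, b), Y <= u) for the mark X = (S, T)."""
    return np.mean((s <= a) & (t <= b) & (y <= u))

for a, b in [(0.5, 1.0), (1.0, 2.5), (2.0, 0.3)]:
    # S <= a and T <= b force S + T <= a + b, so the extra condition is redundant.
    assert f_st(a, b) == f_xy(a, b, a + b)
```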

In general, the marginal distribution functions of $S$ and $T$ cannot be estimated directly from $F_{ST}(s,t)$, because this joint distribution function is identifiable only for $s + t \le \tau_c$. However, the marginal distribution function of $S$ can be estimated readily by applying the gap time method for univariate recurrent events. The marginal distribution function of $T$ is not estimable because of induced dependent censoring; we can only estimate the conditional distribution function of $T$ given $S$ for $s + t \le \tau_c$. Although the method is non-parametric and does not require specifying a parametric distribution for the frailty $Z$, its validity depends on the assumption that the two-state gap times are conditionally independent and identically distributed, as well as on the independent censoring assumption. Without the conditional independence and identical distribution assumption, the exchangeability among uncensored gap times may not hold, and the method is not appropriate. The independent censoring assumption could fail when observation of the bivariate recurrent events is terminated by informative drop-out or by a failure event such as death. In applications, these two assumptions should be examined carefully.

4 Illustrative Examples

4.1 Simulation

For gap times arising from multiple events of different natures, the conditional distributions of the second or subsequent gap times, given that the previous event occurred before some time point, are often of scientific interest and are fully identifiable. The conditional distribution can be estimated from the joint distribution estimated by inverse probability weighting (Section 3.2.1), from the joint distribution estimated by the method for marked survival data (Section 3.2.3) or from the estimated conditional cumulative hazard (Section 3.2.2). The finite-sample properties of these estimators were studied separately in Lin et al. (1999) and Schaubel & Cai (2004a). We present a simulation study here to compare the performance of the three approaches in estimating the conditional distribution. We consider gap times of two states and generate $(T_1, T_2)$ from the bivariate survival function of the Clayton (1978) copula,

display math

with unit exponential margins and dependence parameter α = 1 and 2, where inline image and inline image are the marginal survival functions of $T_1$ and $T_2$, respectively. The follow-up is subject to right censoring by an independent Uniform(0, 4) random variable, so that about 20% to 25% of $T_1$ and 40% to 45% of $T_2$ are censored. One thousand replications are conducted with sample sizes n = 200 and n = 400 for each data configuration.
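The copula display did not survive extraction, so the Python sketch below assumes the common Clayton parametrisation $S(t_1,t_2) = \{S_1(t_1)^{-\alpha} + S_2(t_2)^{-\alpha} - 1\}^{-1/\alpha}$ with $S_k(t) = e^{-t}$; the function names are hypothetical. It draws $(T_1, T_2)$ by conditional inversion of the copula, applies Uniform(0, 4) censoring to the total follow-up and computes the true target $S_{2\mid 1}(t_2 \mid t_1) = \{S_2(t_2) - S(t_1,t_2)\}/\{1 - S_1(t_1)\}$. As a sanity check, the expected share of censored $T_1$ is $(1 - e^{-4})/4 \approx 0.245$, consistent with the 20% to 25% stated above.

```python
import numpy as np

def clayton_survival(t1, t2, alpha):
    """Joint survival P(T1 > t1, T2 > t2) under the ASSUMED Clayton form with
    unit exponential margins: {e^(alpha*t1) + e^(alpha*t2) - 1}^(-1/alpha)."""
    return (np.exp(alpha * t1) + np.exp(alpha * t2) - 1.0) ** (-1.0 / alpha)

def simulate_gap_pairs(n, alpha, rng):
    """Draw (T1, T2) by conditional inversion of the Clayton copula and an
    independent Uniform(0, 4) censoring time for the total follow-up."""
    u1 = rng.uniform(size=n)
    w = rng.uniform(size=n)
    # Solve dC(u1, u2)/du1 = w for u2 (closed form for the Clayton copula).
    u2 = (u1 ** (-alpha) * (w ** (-alpha / (1.0 + alpha)) - 1.0) + 1.0) ** (-1.0 / alpha)
    t1, t2 = -np.log(u1), -np.log(u2)   # P(T > t) = e^(-t) when U = e^(-T)
    c = rng.uniform(0.0, 4.0, size=n)
    return t1, t2, c

def true_cond_survival(t1, t2, alpha):
    """Target S_{2|1}(t2 | t1) = P(T2 > t2 | T1 <= t1) = {S2(t2) - S(t1, t2)} / {1 - S1(t1)}."""
    return (np.exp(-t2) - clayton_survival(t1, t2, alpha)) / (1.0 - np.exp(-t1))

rng = np.random.default_rng(1)
t1, t2, c = simulate_gap_pairs(200_000, 1.0, rng)
print("T1 censored:", np.mean(t1 > c))        # theory: (1 - e^-4)/4, about 0.245
print("T2 censored:", np.mean(t1 + t2 > c))   # T2 unobserved or cut off by C
print("truth at t1 = t2 = 0.511:", true_cond_survival(0.511, 0.511, 1.0))
```

Conditional inversion is used here because the Clayton conditional distribution has a closed-form inverse; other samplers (for example, gamma frailty) would serve equally well.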

Table 1. Simulation summary statistics for the three estimators of $S_{2|1}(t_2 \mid t_1)$: from the joint survival function of Section 3.2.1, from the joint survival function of Section 3.2.3 and from the conditional cumulative hazard of Section 3.2.2.

                          Section 3.2.1          Section 3.2.3          Section 3.2.2
  n    α   t1     t2      Bias    see    sem     Bias    see    sem     Bias    see    sem
 200   1   0.511  0.223  −0.001  0.058  0.056    0.002  0.053  0.052    0.006  0.048  0.049
 200   1   0.511  0.511  −0.035  0.060  0.059    0.013  0.058  0.056    0.001  0.059  0.057
 200   1   0.511  0.916  −0.043  0.046  0.044    0.017  0.049  0.048    0.003  0.054  0.053
 200   1   0.916  0.223   0.003  0.046  0.045   −0.002  0.050  0.048    0.005  0.040  0.038
 200   1   0.916  0.511  −0.040  0.052  0.050    0.015  0.054  0.052    0.004  0.051  0.048
 200   1   0.916  0.916  −0.051  0.039  0.033    0.018  0.043  0.042    0.005  0.044  0.042
 200   2   0.511  0.511  −0.023  0.054  0.052    0.008  0.056  0.055    0.008  0.058  0.057
 200   2   0.511  0.916  −0.023  0.035  0.034    0.009  0.035  0.035    0.010  0.045  0.041
 200   2   0.916  0.511  −0.036  0.049  0.048    0.008  0.047  0.046    0.001  0.047  0.045
 200   2   0.916  0.916  −0.037  0.035  0.032    0.010  0.038  0.036    0.004  0.040  0.037
 400   1   0.511  0.223  −0.001  0.041  0.040    0.001  0.042  0.041    0.002  0.038  0.039
 400   1   0.511  0.511  −0.035  0.042  0.040    0.010  0.045  0.043    0.003  0.044  0.042
 400   1   0.511  0.916  −0.041  0.031  0.028    0.015  0.036  0.033    0.005  0.039  0.038
 400   1   0.916  0.223   0.003  0.033  0.032   −0.001  0.034  0.033   −0.002  0.033  0.031
 400   1   0.916  0.511  −0.041  0.036  0.033    0.013  0.036  0.034    0.000  0.035  0.034
 400   1   0.916  0.916  −0.052  0.028  0.025    0.018  0.030  0.029    0.001  0.033  0.030
 400   2   0.511  0.223  −0.001  0.044  0.043    0.001  0.043  0.042   −0.002  0.044  0.042
 400   2   0.511  0.511  −0.025  0.040  0.038    0.008  0.041  0.039    0.000  0.040  0.037
 400   2   0.511  0.916  −0.024  0.025  0.024    0.007  0.041  0.039    0.004  0.027  0.026
 400   2   0.916  0.511  −0.036  0.036  0.035    0.009  0.033  0.031    0.006  0.033  0.031
 400   2   0.916  0.916  −0.038  0.025  0.023    0.012  0.023  0.024    0.003  0.027  0.025

Bias, empirical bias; see, empirical standard error; sem, average model-based standard error.
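The three summary columns of Table 1 can be computed from the Monte Carlo output of any one estimator as in the Python sketch below. The function name summarise is hypothetical, and the n − 1 divisor for the empirical standard error is our assumption, since the divisor is not stated.

```python
import numpy as np

def summarise(estimates, model_ses, truth):
    """Bias, see and sem for one estimator across Monte Carlo replications.

    estimates: point estimates of S_{2|1}(t2 | t1), one per replication
    model_ses: model-based (e.g. delta-method) standard errors, one per replication
    truth:     the true value of S_{2|1}(t2 | t1) under the simulation model
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(model_ses, dtype=float)
    bias = est.mean() - truth        # empirical bias
    see = est.std(ddof=1)            # empirical standard error (n - 1 divisor assumed)
    sem = se.mean()                  # average model-based standard error
    return bias, see, sem

bias, see, sem = summarise([0.40, 0.50, 0.60], [0.09, 0.10, 0.11], truth=0.45)
```

Close agreement between see and sem, as in most rows of Table 1, indicates that the model-based variance estimator tracks the actual sampling variability.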

We assess the performance of the estimators of $S_{2|1}(t_2 \mid t_1) = P(T_2 > t_2 \mid T_1 \le t_1)$ and of the corresponding variance estimators. Three conditional survival function estimators are compared: one obtained from the joint survival function estimate of Section 3.2.1 (inverse probability weighting), one from the joint survival function estimate of Section 3.2.3 (marked survival data) and one from the conditional cumulative hazard estimate of Section 3.2.2. Table 1 summarises the empirical bias, empirical standard error and average model-based standard error of each estimator. The results are given for $t_1 = 0.511$ and $0.916$, corresponding to marginal survival probabilities of 0.6 and 0.4, and for $t_2 = 0.223$, $0.511$ and $0.916$, corresponding to marginal survival probabilities of 0.8, 0.6 and 0.4. Although the empirical and average model-based standard errors of all three estimators are comparable and small, the conditional cumulative hazard-based estimator generally has much smaller bias than the other two. The larger bias of the two joint distribution-based estimators is probably due to the additional step of estimating the marginal survival probability of $T_1$, and the bias of the Section 3.2.1 estimator is larger than that of the Section 3.2.3 estimator. Moreover, for all three methods the empirical and average model-based standard errors are very close, which indicates that inference on $S_{2|1}(t_2 \mid t_1)$ based on these methods is reasonably reliable. In addition, the functional delta method-based asymptotic variance estimator for the conditional cumulative hazard-based estimator is more convenient to implement than those for the two joint distribution-based estimators. The simulation experiment therefore suggests advantages of the conditional cumulative hazard-based approach for estimating the conditional survival function of gap times.

4.2 Application to Opportunistic Infection Data

An application to opportunistic infection data from an AIDS clinical trial is provided as an illustration of gap time analysis for univariate recurrent events. One objective of the clinical trial conducted by the Community Programs for Clinical Research on AIDS was to compare didanosine (DDI) and zalcitabine (DDC) as treatments for HIV-infected patients (Abrams et al., 1994). A total of 467 patients who had previously received zidovudine and had 300 or fewer CD4 cells per cubic millimetre or a diagnosis of AIDS were recruited. Of these, 230 were randomly assigned to DDI and 237 to DDC. As shown in Figure 1, during the first 180 days after randomisation the survival curves of the two treatment groups are very similar, whereas for the rest of the study DDC provides a survival advantage over DDI. Nevertheless, prolonging overall survival is not the only goal of treatment; quality of life is also an important index for treatment assessment.

Figure 1.

Kaplan–Meier estimates for time to death by treatment group.

Various opportunistic infections are associated with AIDS because of the patient's compromised immune system. Their occurrence indicates a deterioration in health and is a commonly used measure of quality of life. In the study, 362 opportunistic infections were recorded, 171 in the DDI group and 191 in the DDC group. Each individual experienced between 0 and 5 opportunistic infections. The interest is in investigating the treatment effect on opportunistic infection by analysing the gap times between successive opportunistic infections. However, because the observation of opportunistic infections may be terminated by death, the gap time method for univariate recurrent events in Section 3.1, which relies on the independent censoring assumption, cannot be applied directly. Such an approach, if used for the entire study population, would assess the overall benefit of the treatment without distinguishing its effect on opportunistic infection from its effect on susceptibility to death.

One solution is to focus on a subsample of the original data comprising the 279 patients who survived until the end of the study. More generally, different time lengths from the beginning of the study can be chosen to study the treatment effect on opportunistic infection among survivors within a restricted time interval. For instance, it is often of interest to know whether DDC reduces the likelihood of opportunistic infection compared with DDI for AIDS patients who survive beyond 1 year. Gap times and censoring times are measured in days. The maximum follow-up time is 604 days, so it may be reasonable to assume that the gap time distribution does not change over the follow-up period. Among the 279 survivors, 121 experienced between 1 and 5 opportunistic infections (57 receiving DDI and 64 receiving DDC), and the remaining 158 had no opportunistic infection (73 receiving DDI and 85 receiving DDC).

The estimated survival function for gap times is calculated for each group, and the results are presented in Figure 2. The two estimated survival curves show no substantial difference between DDI and DDC in preventing opportunistic infection, and the corresponding 95% pointwise confidence bands for the two curves (not shown) largely overlap.
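The curves in Figure 2 come from the univariate gap time estimator of Section 3.1, which is not reproduced in this section. As a rough Python sketch only, the function below computes a weighted product-limit estimate in which, echoing the sampling-bias adjustment described in Section 3.3, a subject's last (censored) gap is used only when the subject has no complete gap. The weights 1/(m_i − 1) for m_i ≥ 2 and 1 for m_i = 1 are our assumption, and weighted_gap_km is a hypothetical name; in this application it would be run separately on the DDI and DDC survivors.

```python
import numpy as np

def weighted_gap_km(subjects):
    """Weighted product-limit estimate of the gap time survival function.

    subjects: for each subject, the observed gap times in order; the last
    entry is the censored gap (follow-up ended before the next event), so a
    subject with m_i entries has m_i - 1 complete gaps.
    ASSUMED weighting (hypothetical, not taken from Section 3.1): a subject
    with m_i >= 2 contributes its complete gaps with weight 1/(m_i - 1) and
    drops the censored last gap; a subject with m_i = 1 contributes its
    censored gap with weight 1.
    Returns the distinct event times and the survival estimate at each.
    """
    times, events, weights = [], [], []
    for gaps in subjects:
        m = len(gaps)
        if m >= 2:
            used, flags, w = gaps[:-1], [True] * (m - 1), 1.0 / (m - 1)
        else:
            used, flags, w = gaps, [False], 1.0
        times.extend(used)
        events.extend(flags)
        weights.extend([w] * len(used))
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    weights = np.asarray(weights, dtype=float)
    event_times = np.unique(times[events])
    surv, s = [], 1.0
    for u in event_times:
        at_risk = weights[times >= u].sum()          # weighted risk set at u
        d = weights[events & (times == u)].sum()     # weighted events at u
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.asarray(surv)

# Toy data: one subject with two complete gaps, one with one, one with none.
t, s_hat = weighted_gap_km([[1.0, 2.0, 0.3], [2.0, 0.7], [1.5]])
```

When every subject has at most one complete gap, all weights equal 1 and the function reduces to the ordinary Kaplan–Meier estimator applied to the gaps it uses.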

Figure 2.

Estimated survival functions for gap times of opportunistic infection by treatment group.

In addition, we artificially set different maximum censoring times, C = 200, 300, 400, 500 and 600 days, and constructed non-parametric survival function estimates for the patients who were still alive at C. The resulting survival estimates are close to each other in the estimable regions, which may indicate that the assumption of exchangeability among gap times holds approximately. Moreover, the survival function estimates for the two treatment groups do not differ significantly in any of these cases. The analysis suggests that although DDC outperforms DDI in prolonging life, it does not show a significant improvement in preventing opportunistic infection among AIDS survivors. To avoid the dependent censoring caused by death, we focus on the survivors and consider the treatment effect in this particular subsample. Such an analysis is important and scientifically meaningful, but it is limited by not including all the study patients. It would also be interesting to study the treatment effect on opportunistic infection among those who died. These patients may be sicker, with weaker immune systems, and DDC might show some advantage in preventing opportunistic infection for them. Answering such questions would require gap time methods for univariate recurrent events in the presence of dependent censoring, which remain to be developed.
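The survivor subsetting with an artificial cut-off can be sketched in Python as follows. The record layout (Patient) and the helper survivor_gaps are hypothetical, the trial data themselves are not reproduced, and we assume the first gap runs from randomisation (day 0) to the first infection. For a chosen cut-off c, the helper keeps patients known to be alive at day c, drops infections after c and closes each history with a gap censored at c.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Patient:                        # hypothetical record layout
    arm: str                          # "DDI" or "DDC"
    exit_day: float                   # day of death or end of follow-up
    died: bool
    infection_days: List[float] = field(default_factory=list)

def survivor_gaps(patients: List[Patient], c: float) -> Dict[str, List[List[float]]]:
    """Gap times among patients known to be alive at day c, censored at c.

    Returns {arm: [gaps_i, ...]}; in each gaps_i the last entry is the
    censored gap from the last infection (or day 0) to day c.
    """
    out: Dict[str, List[List[float]]] = {}
    for p in patients:
        alive_at_c = p.exit_day > c or (p.exit_day == c and not p.died)
        if not alive_at_c:
            continue                  # died or left follow-up before day c
        starts = [0.0] + sorted(d for d in p.infection_days if d <= c)
        gaps = [b - a for a, b in zip(starts, starts[1:])]   # complete gaps
        gaps.append(c - starts[-1])                          # censored at c
        out.setdefault(p.arm, []).append(gaps)
    return out

cohort = [Patient("DDI", 400, False, [50, 120]),
          Patient("DDC", 300, True, [10]),
          Patient("DDC", 604, False, [])]
by_arm = survivor_gaps(cohort, c=365)
```

Repeating the call for c = 200, 300, 400, 500 and 600 and estimating the gap time survival function within each arm would mirror the sensitivity check described above.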

5 Conclusion and Further Development

This paper summarises the current state of non-parametric gap time analysis and discusses a class of methods for gap times from various types of multiple event data, with a focus on medical follow-up studies. We have kept the presentation of each method as clear and simple as possible, focusing on specific data structures and providing the necessary technical details of the estimation procedures. Two illustrative examples show some practical issues in implementation. One is a comparative simulation study of the finite-sample performance of three approaches to estimating the conditional survival function, which suggests advantages of the conditional cumulative hazard-based estimator over the two joint distribution-based estimators. The other is an application to opportunistic infection data from an AIDS clinical trial, an example of gap times from univariate recurrent events. Practitioners should use gap time methods with care because they rely on various assumptions. The choice of method depends on the type of gap time data under study as well as on the research question of interest.

We now discuss some directions for further work. All the methods considered in this paper rely on the common assumption of independent censoring, which means that the overall follow-up is subject to random right censoring. In many applications, however, the censoring time could be informative. For instance, in the AIDS clinical trial, the observation of opportunistic infections for some patients was terminated by death. Consequently, patients with shorter life expectancy may have relatively less chance to experience opportunistic infection, and directly applying the gap time methods for univariate recurrent event data would lead to an unfair comparison among patients with different survival experiences. In the current analysis, we focus on a specific group of survivors to avoid such dependent censoring, so the statistical inference is restricted to survivors only. A more flexible approach that allows for dependent censoring is desirable. Within a stochastic process framework, Wang et al. (2001) proposed a multiplicative intensity model for univariate recurrent event data with informative censoring, in which a latent variable is introduced to relax the independence assumption between the censoring time and the recurrent event process to conditional independence. This process-based approach may shed some light on handling informative censoring in gap time analysis.

This paper is concerned with non-parametric methods for estimating gap time distributions using event time data only. In some real-world studies, however, information on a variety of continuous and/or discrete covariates is collected, which would help in modelling gap times and understanding their relationship with other factors. Extending these methods to a regression setting that incorporates covariate information is another promising direction. Various regression methods have been proposed for modelling the hazard functions of gap times of different types (Chen et al., 2004; Huang & Chen, 2003; Schaubel & Cai, 2004b; Xue & Brookmeyer, 1996). A discussion and review of this topic will be presented in a separate report.


The author thanks two reviewers, the editor and the co-editor-in-chief for their helpful comments and suggestions that have greatly improved this paper.