## 1 Introduction

Multiple event data arise in various applications, including medical follow-up studies in which each subject may experience a series of events. If the events are repetitions of the same type of event, they are known as recurrent events. Examples include repeated hospitalisations due to a certain disease and multiple infections after bone marrow transplant. Alternatively, they may be chronologically ordered clinical events of different types, such as HIV infection, AIDS onset and death in a natural history study of AIDS. Similar examples arise in the analysis of complex systems, such as manufacturing equipment in engineering applications.

To analyse multiple event data, two types of variable can be considered: (i) the total time from the start of follow-up to an event; and (ii) the times between successive events. The latter are referred to as gap times, and they may be preferred when multiple events are chronologically ordered. For recurrent event data, when the total time to an event is the variable of interest, a variety of statistical methods that treat an individual's multiple events as the realisation of a counting process have been developed. Two popular classes of model are (i) intensity processes, which describe the instantaneous risk of an event given at-risk status (Andersen & Gill, 1982; Prentice *et al.*, 1981; Wang *et al.*, 2001); and (ii) mean processes (Lawless & Nadeau, 1995; Lin *et al.*, 2000; Pepe & Cai, 1993), which consider the expected number of events over time rather than the event times themselves.

In many applications, gap times are of more interest than the total time to an event. For instance, in an AIDS study, patients may repeatedly experience opportunistic infections that indicate deteriorating health. The occurrence of opportunistic infection is commonly used to measure patients’ quality of life, and gap times between successive opportunistic infections can serve as an index of disease progression. Another example comes from a study to evaluate the effect of HIV subtype on disease progression of AIDS, where the event sequence of interest is HIV → AIDS → death. The total time from HIV infection to death might be of less interest, because an HIV subtype that progresses faster from HIV infection to AIDS onset will inevitably shorten the total time to death even if it has little or no effect after AIDS onset.
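The distinction between total times and gap times can be made concrete with a small sketch. The code below is illustrative only: the names `Subject`, `event_times`, `censor_time` and `gap_times` are hypothetical and do not come from any of the cited methods. It assumes each subject is followed from time zero until a single end-of-follow-up time, and converts the observed total times into gap times, with the last gap cut off by the end of follow-up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subject:
    # Hypothetical record layout, assumed for illustration only.
    event_times: List[float]  # total times from start of follow-up, increasing
    censor_time: float        # end of follow-up for this subject


def gap_times(subject: Subject) -> List[Tuple[float, bool]]:
    """Return (duration, observed) pairs, one per gap.

    Gaps that end in an event within follow-up are marked observed;
    the final gap is always the one cut off by the end of follow-up.
    """
    gaps: List[Tuple[float, bool]] = []
    previous = 0.0
    for t in subject.event_times:
        if t > subject.censor_time:
            break  # events after the end of follow-up are not seen
        gaps.append((t - previous, True))
        previous = t
    # The remaining follow-up after the last observed event censors the next gap.
    gaps.append((subject.censor_time - previous, False))
    return gaps


if __name__ == "__main__":
    s = Subject(event_times=[2.0, 5.0, 9.0], censor_time=7.0)
    print(gap_times(s))
```

In this sketch the follow-up time left for each later gap is the censoring time minus the sum of the earlier gaps, so it depends on how long those earlier gaps were. This is one informal way to see why later gap times need more care than the first.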

The stochastic-ordering structure and within-subject dependence of multiple events create challenges for the development of statistical methods for gap time data. Many medical follow-up studies have a fixed length, resulting in right censoring. Various statistical methods have been proposed for non-parametrically estimating gap time distributions in the sequentially ordered multiple events setting. For events of the same type, and treating the number of recurrent events as random, Pena *et al*. (2001) derived Nelson–Aalen and Kaplan–Meier-type estimators for univariate gap times. These are shown to be the non-parametric maximum likelihood estimators under the assumption that gap times are independent and identically distributed. Their asymptotic variances have explicit, closed-form expressions. Wang & Chang (1999) relaxed the independence assumption on gap times to allow dependence within the same subject and developed non-parametric estimation of univariate gap time distributions.

For the case with a fixed number of multiple events of different types, Visser (1996) considered non-parametric estimation of the bivariate survival function when censoring may depend on the previous gap times. However, his method relied on estimating the cumulative conditional hazard of the second gap time given the first gap time and required discrete censoring times and gap times. Huang & Louis (1998) developed estimation methods for marked survival data, which can be readily applied to analyse gap times. Wang & Wells (1998) suggested a product-limit type estimator for the second gap time based on an inverse probability of censoring weighting method; the estimator has a very complicated covariance structure. Lin *et al*. (1999) proposed a non-parametric approach to estimating joint and conditional distributions for multivariate gap times. van der Laan *et al*. (2002) developed two estimators of a multivariate survival function: an initial inverse probability of censoring weighted estimator and a one-step locally efficient doubly robust estimator. Their method can accommodate censoring that depends on the total times and covariates. Schaubel & Cai (2004a) constructed a non-parametric estimator of the conditional gap time-specific survival function directly through a cumulative hazard estimator, rather than through a joint distribution function estimator as in Lin *et al*. (1999) and van der Laan *et al*. (2002). In addition, Huang & Wang (2005) proposed a non-parametric estimator for the joint distribution of bivariate alternating gap times, which arise in studies where subjects experience two different types of events alternately over time, a form of bivariate recurrent events.
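As a point of reference for the product-limit estimators mentioned above, the following is a minimal sketch of the textbook Kaplan–Meier estimator. It is not an implementation of any of the cited gap time methods. It is meant to be applied only to the first gap time, under the assumption that the end of follow-up is independent of the event process. The function name `kaplan_meier` and its arguments are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], observed: List[bool]) -> List[Tuple[float, float]]:
    """Textbook product-limit estimate of a survival function.

    durations: time to event or to censoring for each subject
    observed:  True if the duration ended in an event, False if censored
    Returns (event time, estimated survival just after that time) pairs.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")

    # Distinct times at which at least one event was observed.
    event_times = sorted({d for d, o in zip(durations, observed) if o})

    survival = 1.0
    curve: List[Tuple[float, float]] = []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Events occurring exactly at time t.
        events = sum(1 for d, o in zip(durations, observed) if o and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Five first gap times; the third and fifth are censored.
    first_gaps = [2.0, 3.0, 3.0, 5.0, 8.0]
    seen = [True, True, False, True, False]
    for time, surv in kaplan_meier(first_gaps, seen):
        print(f"t = {time:.1f}, S(t) = {surv:.3f}")
```

With the five durations in the usage example, the estimate steps down at times 2, 3 and 5 to roughly 0.8, 0.6 and 0.3. Applying the same function naively to second or later gap times is not justified by this sketch, because the censoring of those gaps depends on the earlier gaps.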

The literature on gap time analysis is scattered, and a standard approach has not emerged despite recent methodological developments. Further, the issues of induced dependent censoring and non-identifiability are not well known among practitioners. In some cases, unrealistic and unnatural assumptions on the censoring mechanism are imposed to avoid non-identifiability. Often, these issues have been ignored or not described explicitly. In this paper, we discuss the key issues in gap time modelling and use two examples to illustrate the results. One is a comparative simulation study to assess the performance of different estimation approaches. The other is an application to opportunistic infection data from an AIDS clinical trial, which shows important gap time features that need to be appropriately addressed in practice. The paper is organised as follows. In Section 2, we introduce some notation and formalise the statistical issues in gap time analysis. Section 3 reviews existing non-parametric methods for estimating gap time distributions for univariate recurrent event data, multivariate failure time data and bivariate recurrent event data. Section 4 presents examples that illustrate the different methods. Finally, we conclude with some remarks and consider further developments.