## 1. Introduction

The Cox (1972) proportional hazards model is the corner-stone of modern survival analysis. The model specifies that the hazard function of the failure time conditional on a set of possibly time varying covariates is the product of an arbitrary base-line hazard function and a regression function of the covariates. Cox (1972, 1975) introduced the ingenious partial likelihood principle to eliminate the infinite dimensional base-line hazard function from the estimation of regression parameters with censored data. In a seminal paper, Andersen and Gill (1982) extended the Cox regression model to general counting processes and established the asymptotic properties of the maximum partial likelihood estimator and the associated Breslow (1972) estimator of the cumulative base-line hazard function via the elegant counting process martingale theory. The maximum partial likelihood estimator and the Breslow estimator can be viewed as non-parametric maximum likelihood estimators (NPMLEs) in that they maximize the non-parametric likelihood in which the cumulative base-line hazard function is regarded as an infinite dimensional parameter (Andersen *et al.* (1993), pages 221–229 and 481–483, and Kalbfleisch and Prentice (2002), pages 114–128).
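For reference, the model and estimators just described can be written explicitly (the notation below is ours, following standard accounts such as Andersen *et al.* (1993)):

```latex
% Cox model: hazard at time t given (possibly time varying) covariates Z,
% with the base-line hazard \lambda_0 left completely unspecified.
\lambda\{t \mid Z(\cdot)\} = \lambda_0(t)\, e^{\beta^{\mathsf T} Z(t)}.

% Partial likelihood for n subjects with observed times T_i, failure
% indicators \delta_i and risk sets R(t) = \{j : T_j \ge t\}:
L(\beta) = \prod_{i:\,\delta_i = 1}
  \frac{e^{\beta^{\mathsf T} Z_i(T_i)}}
       {\sum_{j \in R(T_i)} e^{\beta^{\mathsf T} Z_j(T_i)}}.

% Breslow estimator of the cumulative base-line hazard:
\widehat\Lambda_0(t) = \sum_{i:\,T_i \le t,\ \delta_i = 1}
  \frac{1}{\sum_{j \in R(T_i)} e^{\widehat\beta^{\mathsf T} Z_j(T_i)}}.
```

Maximizing the non-parametric likelihood over step functions $\Lambda_0$ with jumps at the observed failure times yields exactly $(\widehat\beta, \widehat\Lambda_0)$, which is the NPMLE interpretation mentioned above.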

The proportional hazards assumption is often violated in scientific studies, and other semiparametric models may provide more accurate or more concise summaries of the data. Under the proportional odds model (Bennett, 1983), for instance, the hazard ratio between two sets of covariate values converges to 1, rather than staying constant, as time increases. The NPMLE for this model was studied by Murphy *et al.* (1997). Both the proportional hazards and the proportional odds models belong to the class of linear transformation models, which relates an unknown transformation of the failure time linearly to covariates (Kalbfleisch and Prentice (2002), page 241). Dabrowska and Doksum (1988), Cheng *et al.* (1995) and Chen *et al.* (2002) proposed general estimators for this class of models, none of which are asymptotically efficient. The class of linear transformation models is confined to traditional survival (i.e. single-event) data and time invariant covariates.
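In our notation, the class of linear transformation models referred to here takes the form

```latex
% Linear transformation model: H is unknown and increasing, the error
% \epsilon has a known distribution, and Z is time invariant.
H(T) = -\beta^{\mathsf T} Z + \epsilon.

% Extreme value error    => proportional hazards model;
% standard logistic error => proportional odds model, under which the
% covariate effect is a constant shift in the odds of failure:
\frac{1 - S(t \mid Z)}{S(t \mid Z)}
  = e^{\beta^{\mathsf T} Z}\, \frac{1 - S_0(t)}{S_0(t)},
```

so that under the proportional odds model the hazard ratio between two covariate values tends to 1 as $t \to \infty$, as stated above.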

As an example of non-proportional hazards structures, Fig. 1 displays (in the full curves) the Kaplan–Meier estimates of survival probabilities for the chemotherapy and chemotherapy plus radiotherapy groups of gastric cancer patients in a randomized clinical trial (Stablein and Koutrouvelis, 1985). The crossing of the two survival curves is a strong indication of crossing hazards. This is common in clinical trials because the patients who receive the more aggressive intervention (e.g. radiotherapy or transplantation) are at elevated risks of death initially but may enjoy considerable long-term survival benefits if they can tolerate the intervention. Crossing hazards cannot be captured by linear transformation models. The use of the proportional hazards model could yield very misleading results in such situations.

Multivariate or dependent failure time data arise when each study subject can potentially experience several events or when subjects are sampled in clusters (Kalbfleisch and Prentice (2002), chapters 8–10). It is natural and convenient to represent the dependence of related failure times through frailty or random effects (Clayton and Cuzick, 1985; Oakes, 1989, 1991; Hougaard, 2000). The NPMLE of the proportional hazards model with gamma frailty was studied by Nielsen *et al.* (1992), Klein (1992), Murphy (1994, 1995), Andersen *et al.* (1997) and Parner (1998). Gamma frailty induces a very restrictive form of dependence, and the proportional hazards assumption fails more often with complex multivariate failure time data than with univariate data. The focus of the existing literature on the proportional hazards gamma frailty model is due to its mathematical tractability. Cai *et al.* (2002) proposed estimating equations for linear transformation models with random effects for clustered failure time data. Zeng *et al.* (2005) studied the NPMLE for the proportional odds model with normal random effects and found the estimators of Cai *et al.* (2002) to be considerably less efficient.
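The two random-effects formulations contrasted in this paragraph can be sketched as follows (notation ours):

```latex
% Proportional hazards with gamma frailty: for subject j in cluster i,
% \xi_i has a gamma distribution with mean 1 and variance \theta.
\lambda(t \mid Z_{ij}, \xi_i)
  = \xi_i\, \lambda_0(t)\, e^{\beta^{\mathsf T} Z_{ij}}.

% Normal random effects (as in Zeng et al. (2005)): b_i ~ N(0, \Sigma)
% enters the linear predictor through a subset \tilde Z of covariates,
% allowing richer dependence structures than a single shared frailty.
\lambda(t \mid Z_{ij}, b_i)
  = \lambda_0(t)\, e^{\beta^{\mathsf T} Z_{ij}
    + b_i^{\mathsf T} \widetilde Z_{ij}}.
```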

Lin (1994) described a colon cancer study in which the investigators wished to assess the efficacy of adjuvant therapy on recurrence of cancer and death for patients with resected colon cancer. By characterizing the dependence between recurrence of cancer and death through a random effect, one could properly account for the informative censoring caused by death on recurrence of cancer and accurately predict a patient's survival outcome given his or her cancer recurrence time. However, random-effects models for multiple types of events have received little attention in the literature.

In longitudinal studies, data are often collected on repeated measures of a response variable as well as on the time to the occurrence of a certain event. There has been tremendous recent interest in joint modelling, in which models for the repeated measures and failure time are assumed to depend on a common set of random effects. Such models can be used to assess the joint effects of base-line covariates (such as treatments) on the two types of outcomes, to study the effects of potentially mismeasured time varying covariates on the failure time and to adjust for informative drop-out in the analysis of repeated measures. The existing literature (e.g. Wulfsohn and Tsiatis (1997), Hogan and Laird (1997) and Henderson *et al.* (2000)) has focused on the linear mixed model for repeated measures and the proportional hazards model with normal random effects for the failure time.
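The joint models in the references cited take broadly the following shared random-effects form (a schematic, with notation ours):

```latex
% Repeated measures: linear mixed model with random effects
% b_i ~ N(0, \Sigma) for subject i at measurement time j.
Y_{ij} = \alpha^{\mathsf T} X_{ij} + b_i^{\mathsf T} \widetilde X_{ij}
         + \epsilon_{ij},
\qquad \epsilon_{ij} \sim N(0, \sigma^2).

% Failure time: proportional hazards model sharing the same b_i,
% which induces the dependence between the two outcome types.
\lambda(t \mid X_i, b_i)
  = \lambda_0(t)\, e^{\beta^{\mathsf T} X_i(t) + \gamma^{\mathsf T} b_i}.
```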

The linear mixed model is confined to continuous repeated measures with normal error. In addition, the transformation of the response variable is assumed to be known. Inference under random-effects models is highly non-robust to misspecification of the transformation. Our experience in human immunodeficiency virus (HIV) and acquired immune deficiency syndrome research shows that different transformations of CD4 cell counts often yield conflicting results. Thus, it would be desirable to employ semiparametric models (e.g. linear transformation models) for continuous repeated measures, so that a parametric specification of the transformation or distribution can be avoided. This kind of model has not been studied even outside the context of joint modelling, although econometricians (Horowitz (1998), chapter 5) have proposed inefficient estimators for univariate responses.

As evident from the above description, the existing semiparametric regression models, although very useful, have important limitations and, in most cases, lack efficient estimators or careful theoretical treatments. In this paper, we unify and extend the current literature, providing a comprehensive methodology with strong theoretical underpinning. We propose a very general class of transformation models for counting processes which encompasses linear transformation models and which accommodates crossing hazards, time varying covariates and recurrent events. We then extend this class of models to dependent failure time data (including recurrent events, multiple types of events and clustered failure time data) by incorporating a rich family of multivariate random effects. Furthermore, we present a broad class of joint models by specifying random-effects transformation models for the failure time and generalized linear mixed models for (discrete or continuous) repeated measures. We also propose a semiparametric linear mixed model for continuous repeated measures, under which the transformation of the response variable is completely unspecified.
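As a schematic illustration of the first extension (the precise class and its regularity conditions are set out later in the paper; the display below is a sketch in our notation), a transformation model for a counting process $N(t)$ specifies its cumulative intensity as

```latex
% G is a specified increasing transformation, \Lambda an unspecified
% increasing function and Y(t) the at-risk indicator.
\Lambda\{t \mid Z(\cdot)\}
  = G\!\left\{\int_0^t Y(s)\, e^{\beta^{\mathsf T} Z(s)}\,
      d\Lambda(s)\right\}.

% G(x) = x          recovers the proportional hazards model;
% G(x) = log(1 + x) yields the proportional odds model;
% other choices of G accommodate crossing hazards.
```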

We establish the consistency, asymptotic normality and asymptotic efficiency of the NPMLEs for the proposed models by appealing to modern empirical process theory (van der Vaart and Wellner, 1996) and semiparametric efficiency theory (Bickel *et al.*, 1993). In fact, we develop a very general asymptotic theory for non-parametric maximum likelihood estimation with censored data. Our general theory can be used to derive asymptotic results for many existing semiparametric models which are not covered in this paper as well as those to be invented in the future. Simulation studies show that the asymptotic approximations are accurate for practical sample sizes.

It is widely believed that NPMLEs are intractable computationally. This perception has motivated the development of *ad hoc* estimators which are less efficient statistically. We present in this paper simple and effective methods to calculate the NPMLEs and to implement the corresponding inference procedures. These methods apply to a wide variety of semiparametric models with censored data and make the NPMLEs computationally more feasible than the *ad hoc* estimators (when the latter exist). Their usefulness is amply demonstrated through simulated and real data.

As hinted in the discussion thus far, we are suggesting the following strategies in the research and practice of survival analysis and related fields.

- (a) Use the new class of transformation models to analyse failure time data.
- (b) Make routine use of random-effects models for multivariate failure time data.
- (c) Choose normal random effects over gamma frailty.
- (d) Determine transformations of continuous response variables non-parametrically.
- (e) Formulate multiple types of outcome measures with semiparametric joint models.
- (f) Adopt maximum likelihood estimation for semiparametric regression models.
- (g) Rely on modern empirical process theory as the primary mathematical tool.

We shall elaborate on these points in what follows, particularly at the end. In addition, we shall pose a wide range of open problems and outline several directions for future research.