Invited Review Series: Modern Statistical Methods in Respiratory Medicine
Survival analysis of time-to-event data in respiratory health research studies
Department of Epidemiology and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
Victorian Centre for Biostatistics (ViCBiostat), Melbourne, Victoria, Australia
Correspondence: Jessica Kasza, Department of Epidemiology and Preventive Medicine, Monash University, The Alfred Centre, 99 Commercial Road, Melbourne, Vic. 3004, Australia. Email: email@example.com
The Authors: Dr Jessica Kasza, BSc, PhD, a research fellow in biostatistics at the Department of Epidemiology and Preventive Medicine at Monash University, has research interests that include health-care provider comparison and the estimation of causal effects. Dr Darren Wraith, BMath, PhD, a research fellow in biostatistics at the School of Population and Global Health at Melbourne University, has broad research interests in biostatistics. Dr Karen Lamb, BSc, PhD, a biostatistician and research fellow at the Murdoch Children's Research Institute at the Royal Children's Hospital, has broad biostatistical research interests, with a focus on measures for describing treatment effects in survival analysis and appropriate methods for dealing with correlation in the analysis of neighbourhood effects on health. Prof. Rory Wolfe, BSc, PhD, professor of biostatistics at the School of Public Health and Preventive Medicine, has broad research interests in biostatistics.
Series Editors: Rory Wolfe and Michael Abramson
This article provides a review of techniques for the analysis of survival data arising from respiratory health studies. Popular techniques such as the Kaplan–Meier survival plot and the Cox proportional hazards model are presented and illustrated using data from a lung cancer study. Advanced issues are also discussed, including parametric proportional hazards models, accelerated failure time models, time-varying explanatory variables, simultaneous analysis of multiple types of outcome events and the restricted mean survival time, a novel measure of the effect of treatment.
In a previous article in this series, regression models were introduced for the analysis of relationships between a measure of respiratory health (measured on a continuous, binary or ordinal scale) and one or more patient characteristics (explanatory variables). We now consider the situation where the outcome of interest is the time to occurrence of a specific event, for example, time to death for lung cancer patients or time to recovery for pneumonia patients. Two key aspects characterize the challenge of analysing ‘time to event’ or ‘survival time’ data. Firstly, the distribution of event times in a group of study participants may be unlike a normal distribution, for example, because it is strongly skewed, as may be observed with times to discharge from hospital for patients admitted with a severe asthma episode. Secondly, research study participants may not experience the event by the time the study ends or might be lost to follow up prior to experiencing the event. These participants have a time to event that is ‘censored’ at the time they were last observed. Unusual distributions of event times and the presence of censoring in a time-to-event outcome require special methods for statistical analysis. This article will introduce such methods, including appropriate regression methods to relate the survival time of patients to explanatory variables.
To illustrate survival analysis methods, we use the Veterans Administration lung cancer trial, from Kalbfleisch and Prentice. Male patients with inoperable lung cancer were randomized to receive one of two treatments (standard or experimental chemotherapy). The survival time (or time to censoring) of each patient was recorded, along with their age (in years), Karnofsky score, whether or not they received prior treatment and the histological tumour type (classified as adenocarcinoma, squamous, small cell or large cell carcinoma). Age is a continuous variable, treatment and prior treatment are binary and tumour type is categorical (adenocarcinoma is taken as the reference level). The Karnofsky score is a measure of the quality of life of the patient, taking values from 0 (dead) to 100 (healthy), and we treat it as a continuous variable. For the purposes of this article, we restrict follow up to 365 days, censoring patients who were still alive at this point. Of 137 patients in the trial, 118 died during this follow-up time; the remaining 19 patients had censored survival times. In other studies, the fraction of censored patients may be larger. For the analyses that follow, we are primarily interested in the relationship between treatment and survival time.
Censoring and Observation Period
In clinical trials, participants are typically considered to be at risk of the event of interest from the time of randomization. For observational studies, the situation is more complex depending on the exact circumstances of the study, and study participants may be considered to have been at risk of the event either from birth or from some other point in time such as a date of surgery, diagnosis or recruitment to a cohort. There are differing views on the relative merits of these approaches.[4, 5]
If a study participant has not experienced the event of interest by the time observation of that person ceases (which may be due to the scheduled end of the study or unscheduled loss of the person to follow up), they are said to be ‘right censored’. We only have partial information on the survival time of someone who is right censored: it is longer than the length of time for which they were observed (which incorporates the possibility that they never experience the event of interest). This is the type of censoring usually observed in clinical trials, for example, the 19 censored patients in the lung cancer trial.
Left and interval censoring can also arise although these are less common than right censoring. Left censoring occurs, for example, in an observational study when a participant has already experienced the event when the study begins, but precisely when that event occurred is unknown. Interval censoring occurs when it is known that someone experienced an event during an interval of time, but the exact timing of the event is unknown, as may occur in longitudinal studies with assessment of participants at regular periods, for example annually, for disease status.
The methods described in this review assume that the censoring of patients is not related to their subsequent survival history, that is, that the censoring mechanism is ‘non-informative’. The validity of this assumption must be assessed qualitatively for each application. For example, if the event of interest is non-lethal and participants are censored because of death, then whether this represents non-informative censoring may be debatable, and specialist methods may be required. While all of the methods we discuss below assume non-informative right censoring, they may be extended to different censoring mechanisms.
Survival Functions and the Kaplan–Meier Method
The survival function, denoted by S(t), gives, for any time t > 0, the probability that a patient remains event free for a period of length t after the start of observation, for example, from a clinical diagnosis, randomization or recruitment to a cohort. A useful graphical summary plots the survival function against t. To produce such a plot, an estimate of S(t) is required. The most popular method for estimating S(t) is the Kaplan–Meier method,[8-10] an approach that takes into account both censored and uncensored observed survival times (Box 1).
Box 1. Example calculations for the Kaplan–Meier (KM) approach to estimating survival probabilities
Time index (tj) | Follow-up time (months) | Risk set (nj) | Outcome; event (E) or censored (C) | Number of events (dj) | Survival probability, S(tj)
t1 | 2 | 100 | C | 0 | 1.0000
t2 | 5 | 99 | E | 1 | 0.9899
t3 | 6 | 98 | C† | 0 | 0.9899
t4 | 9 | 96 | E | 1 | 0.9796
† Two participants censored at 6 months.
The table illustrates the simplicity of the calculations for the overall survival probability at different time points using the KM approach. The hypothetical study starts with 100 participants where: nj represents the number of participants at risk of an event occurring as we enter time tj; E indicates whether the event of interest occurred, and C indicates if a participant was censored before experiencing the event; and dj equals 1 if the event of interest occurred and 0 if not.
Assuming events are independent over time, that is, that the participant outcomes do not influence each other in any way, the probability of being event free at time tj (denoted as S(tj)) is calculated using: S(tj) = S(tj−1) × (nj − dj)/nj, where t0 = 0 and S(t0) = 1.
A table of survival can be simply constructed by ordering the observed follow-up times from smallest to largest (noting that there can be ties where more than one participant has the same follow-up time, e.g. two participants censored at 6 months in the table) and identifying if an event has occurred or not. In the table, we see that a participant was followed for 2 months and censored at that time. In this case, the survival probability does not change, but the number of participants at risk of an event for the next time period decreases by 1. At t2 = 5 months, a participant experienced the event of interest and the new survival probability reflects the survival probability in the previous time period (S(t1) = 1) and the number at risk in the time period (n2 = 99). More specifically, at t4 = 9 months a participant experienced an event of interest and the survival probability then equals the product of the survival probability in the previous time period (S(t3) = 0.9899) and the proportion of survivors at t4 = 9 months among the persons at risk just before 9 months (n4 = 96 and hence the proportion of survivors is 95/96). Censoring is assumed to take place at the end of the time interval, tj, and the event is assumed to occur at the start of the time interval.
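The Box 1 calculations can be written as a short routine. The following is a minimal pure-Python sketch (not from the original article) that reproduces the Box 1 figures; the hypothetical late-censoring times for the remaining 95 participants are an assumption made only so that the risk sets match.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t).

    times  -- follow-up time for each participant
    events -- 1 if the event occurred at that time, 0 if censored
    Returns a list of (event_time, survival_probability) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    s = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        while i < len(data) and data[i][0] == t:   # gather tied follow-up times
            d += data[i][1]
            c += 1 - data[i][1]
            i += 1
        if d > 0:
            # events at a tied time are counted before censorings, as in Box 1
            s *= (at_risk - d) / at_risk
            curve.append((t, s))
        at_risk -= d + c
    return curve

# Hypothetical data matching Box 1: one participant censored at 2 months,
# an event at 5 months, two participants censored at 6 months, an event at
# 9 months, and the remaining 95 of the 100 participants censored at 12 months.
times = [2, 5, 6, 6, 9] + [12] * 95
events = [0, 1, 0, 0, 1] + [0] * 95
for t, s in kaplan_meier(times, events):
    print(t, round(s, 4))   # prints: 5 0.9899 and 9 0.9796, as in Box 1
```

The censored time at 2 months changes the risk set but not the survival estimate, exactly as described above; only event times contribute a new factor to the product.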
KM plots show the survival probability estimates using the KM approach on the y-axis and the follow-up time on the x-axis, so it is quite common to see constant estimates (flat lines) corresponding to periods of follow-up time where no events have occurred. The cumulative incidence of events at a point in time can also be plotted and is simply calculated as 1 − S(t). Confidence intervals (CI) for the survival probability estimates can be constructed, with the CI width at a given time point related to the number at risk (nj) and also the value of the survival probability. As time progresses and participants exit the study, the CI gets wider (greater uncertainty as fewer participants remain on which to base estimates of survival), although this may be counteracted by a narrowing as the estimated survival tends towards zero.
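The CI width calculation is commonly based on Greenwood's variance formula. The following sketch (not from the original article) applies it to the Box 1 figures at 9 months; note that the simple symmetric interval can stray above 1 early in follow up, which is why software typically computes the interval on a transformed scale.

```python
import math

# Greenwood's formula: Var(S(t)) = S(t)^2 * sum over event times tj <= t of
# dj / (nj * (nj - dj)). Applied to the Box 1 example at t = 9 months, where
# events occurred at 5 months (n = 99, d = 1) and 9 months (n = 96, d = 1).
s_9 = (98 / 99) * (95 / 96)
var_9 = s_9 ** 2 * (1 / (99 * 98) + 1 / (96 * 95))
se_9 = math.sqrt(var_9)

# simple symmetric 95% CI; the upper limit here exceeds 1, illustrating why
# software often works on a log(-log) transformed scale instead
ci = (s_9 - 1.96 * se_9, s_9 + 1.96 * se_9)
```

As events accumulate and the risk sets shrink, the summed terms grow, which is the formal counterpart of the widening CI described above.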
The Kaplan–Meier estimate of the survival function for the lung cancer trial is shown in Figure 1a with a 95% confidence interval (CI) to indicate uncertainty in the estimate. This CI should only be interpreted for single time points (i.e. the CI should not be used for inferences about the shape of the curve through multiple time points). Below Figure 1a, the number of patients at risk of death at various time points is given. Only 25 patients reached 200 days of follow up, so the survival curve beyond 200 days is estimated based on a diminishing sample size.
Figure 1b shows the Kaplan–Meier survival curves separately for the two treatment groups. An informal by-eye assessment indicates that there is not much difference in the survival of the two groups (or if there is a difference, then it favours first one group, then crosses over to favour the other group—we return to this idea of changing treatment effect over time in subsequent sections as its analysis requires advanced methods). There are a number of statistical tests available to formally test whether or not survival is better in one group compared with another, one of which is the log-rank test. Here, the log-rank test yields a P-value of 0.70, which indicates that there is no evidence of a difference in overall survival of patients on standard versus experimental chemotherapy. The log-rank test provides no facility to describe how different the groups are: for that, alternative methods are required. Additionally, methods are required to determine the effect of multiple explanatory variables on survival. We discuss such methods later, but first introduce a new concept: the hazard function.
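To show the mechanics of what the log-rank test computes, observed versus expected events in one group accumulated over the event times, here is a hedged pure-Python sketch for two groups; in practice the test would be run in a statistical package.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test; groups coded 0/1, events coded 1 = event,
    0 = censored. Returns the chi-square statistic (1 df) and its P-value."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n = sum(1 for tt in times if tt >= t)                 # at risk, overall
        n1 = sum(1 for tt, g in zip(times, groups) if tt >= t and g == 1)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e == 1)
        d1 = sum(1 for tt, e, g in zip(times, events, groups)
                 if tt == t and e == 1 and g == 1)
        observed1 += d1
        expected1 += d * n1 / n          # expected events in group 1 at time t
        if n > 1:                         # hypergeometric variance contribution
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = (observed1 - expected1) ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2))   # chi-square(1) tail probability
    return stat, p_value
```

When the two groups have identical event histories, observed and expected counts agree at every event time and the statistic is zero (P-value of 1); large imbalances in event timing between groups drive the statistic up and the P-value down.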
An alternative (mathematically convenient) way to express the information contained in the survival function is to use the hazard function (also referred to as the hazard rate), denoted by h(t). The hazard and survival functions are mathematically related, which implies that one may be derived from the other. Informally, the hazard function h(t) describes the risk of the event occurring at time t. More formally, it describes the instantaneous event rate (death rate for the lung cancer trial) at time t for an individual who has survived to that time.
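The mathematical relationship referred to above can be written explicitly. With f(t) the density of event times and H(t) the cumulative hazard, the standard identities are:

```latex
h(t) = \frac{f(t)}{S(t)} = -\frac{\mathrm{d}}{\mathrm{d}t}\log S(t),
\qquad
S(t) = \exp\!\left(-\int_0^t h(u)\,\mathrm{d}u\right) = \exp\{-H(t)\}
```

Either function therefore determines the other: a high hazard over an interval of time drains the survival function quickly over that interval.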
An estimate of the hazard function for the lung cancer trial is shown in Figure 2a and for the two treatment groups separately in Figure 2b. Although a bit bumpy, Figure 2a indicates that the hazard of death tends to decrease over time for the patients in the lung cancer trial. Estimating separate hazard functions for the two treatment groups (Fig. 2b) suggests that the hazard of death may differ in the two groups, although the CIs (not shown) are very wide. Figure 3 displays some theoretical examples of hazard functions: in Figure 3a, the dashed line indicates a theoretical hazard function that decreases over time and the solid line a theoretical hazard function that is constant over time.
Estimation of the hazard function proceeds through the assessment of risk at each instant of time; however, we have no events for most time instants. In our example, 137 patients are observed over a year, so even if all patients died on distinct days, there would be 228 days with no event. (This does not even account for time periods shorter than a day.) Estimation of the hazard is inherently noisy, unlike the estimation of survival which is more stable due to its consideration of the cumulative occurrence of events. If information from neighbouring time instants is pooled in order to estimate the hazard, then the noise is ‘smoothed’. Figure 2a,b was constructed using a smoothing estimation approach.
Regression models relating survival time to explanatory variables are often expressed in terms of the hazard function, which is a mathematically more convenient quantity. Such models are discussed after first introducing hazard ratios.
The hazard ratio is the ratio of hazard rates for two groups and quantifies how the two groups differ in experiencing the event of interest. If we assume, for convenience, that the hazard ratio remains constant over time, regardless of fluctuations in the underlying hazard rates, then this defines a ‘proportional hazards’ model. In such a model, the hazard function may be modelled as a function of multiple explanatory variables by applying the proportional hazards assumption to each variable. The one-to-one relationship between hazard and survival functions then implies that a proportional hazards model, although specified in terms of hazard functions, simultaneously and equivalently relates survival time to the explanatory variables. The remaining question for the full specification of a model is: what are we willing to assume about how the underlying hazard function fluctuates over time?
Consider those hazard functions displayed in Figure 3a: each of the lines on this figure relates to a specific shape governed by the value of a single parameter called ‘p’. Assuming such a form for the hazard function results in what is known as a ‘parametric model’. An alternative approach, used in the Cox model (after the British statistician, Sir David Cox), is not to assume any particular shape for the hazard function. This is known as a ‘semiparametric’ approach because although no parameters are estimated to describe the hazard function, the model does incorporate the proportional hazards assumption and hence hazard ratio ‘parameters’ that can be estimated from data. The Cox model is discussed in the next section, and parametric models are discussed later.
Cox Proportional Hazards Model
For the lung cancer trial, to obtain a comparison of survival in the two treatment groups allowing for any baseline imbalance between groups that might have occurred in spite of randomization, we may need to account for the variables age, Karnofsky score, prior treatment and cell type.
The Cox proportional hazards model[13, 14] allows for explanatory variables to be included in a regression model for survival times. In one view, it assumes a linear relationship between the logarithm of the hazard function (where the logarithm is base e = 2.718 …) and the explanatory variables, assuming no specific form for the hazard function:

log(h(t)) = log(h0(t)) + β1 × Treatment + β2 × Age + β3 × Karnofsky + β4 × Squamous + β5 × Small cell + β6 × Large cell + β7 × Prior treatment
A binary variable called treatment was defined, which took the values 0 for standard and 1 for experimental chemotherapy, and variables for the cell types were defined similarly, taking the value 1 when a patient has that particular cell type and 0 otherwise. h0(t) is the baseline hazard that is experienced by individuals with each explanatory variable equal to zero. log(h0(t)) is analogous to the intercept term in a linear or logistic regression model, except that it varies with time, as conveyed by dependence on t. As for the intercept terms in other regression models, this quantity is often not of interest (and is only meaningful when 0 is a valid value for each explanatory variable), but is structurally important. The additive structure for the effects of explanatory variables in the Cox proportional hazards model is analogous to the structure of other regression models, and hence inclusion of transformed explanatory variables and interaction terms follow the same rules.
If the equation is rearranged, this model's specification of hazard ratios according to the proportional hazards assumption can be seen more explicitly. If we exponentiate both sides (where the exponential is the reverse operation to taking a logarithm with base e, i.e. for an arbitrary value α, exp[log(α)] = α, and we can equivalently write exp(β) or eβ) we have:

h(t) = h0(t) × exp(β1 × Treatment) × exp(β2 × Age) × exp(β3 × Karnofsky) × exp(β4 × Squamous) × exp(β5 × Small cell) × exp(β6 × Large cell) × exp(β7 × Prior treatment)
The quantities exp(β1), … , exp(β7) are the hazard ratios for the respective explanatory variables (cf. odds ratios in logistic regression).
When applied to the lung cancer trial data, coefficients βi, i = 1, … , 7, standard errors, P-values, hazard ratios and 95% CI for the hazard ratios can be obtained from standard statistical software packages and are displayed in Table 1. The hazard ratio of treatment is slightly larger than 1, implying that patients in the study treated with experimental chemotherapy had slightly greater risk of mortality. The fairly wide 95% CI of 0.90–2.05 leaves us uncertain as to whether the treatment is of benefit or harm but appears to rule out the possibility that the treatment exerts a strongly protective effect on mortality risk (which would be indicated by a hazard ratio much less than 1). Squamous and large cell tumour types have hazard ratios less than 1. For instance, compared with patients with adenocarcinoma, patients with squamous cell tumours had a 69% lower hazard of death (hazard ratio = 0.31). This implies that patients with squamous and large cell types are expected to have longer survival times than adenocarcinoma patients. The hazard ratio for small cell tumours is also less than 1, but has a CI that includes 1, indicating that for generalizing beyond this sample there is no evidence of a difference in the hazard rates for patients with adenocarcinoma and small cell tumours.
Table 1. Coefficients (log hazard ratios), standard errors (SE), P-values, hazard ratios and 95% confidence intervals (CI) for the hazard ratios, for the Cox proportional hazards model of survival time for the lung cancer trial data on treatment (experimental compared with standard chemotherapy), age, Karnofsky score, histological tumour type and prior receipt of treatment
Tumour type (comparisons with adenocarcinoma)
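Moving between the coefficient (log hazard ratio) scale and the hazard ratio scale is simple arithmetic. The following sketch uses the squamous-versus-adenocarcinoma hazard ratio quoted above; the standard error supplied to the CI helper is a made-up value for illustration, not a figure from Table 1.

```python
import math

# converting the Table 1 squamous-cell hazard ratio back to the model scale
hr_squamous = 0.31
beta_squamous = math.log(hr_squamous)      # the coefficient (log hazard ratio)
percent_lower = (1 - hr_squamous) * 100    # the "69% lower hazard" in the text

def hazard_ratio_ci(beta, se, z=1.96):
    """Hazard ratio with 95% CI from a coefficient and its standard error:
    form the interval on the log scale, then exponentiate both limits."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# 0.28 is an assumed standard error, used only to show the calculation
hr, lo, hi = hazard_ratio_ci(beta_squamous, 0.28)
```

Because the interval is symmetric on the log scale, the resulting hazard ratio CI is asymmetric around the point estimate, which is why published CI for hazard ratios (such as 0.90–2.05 for treatment) are not centred on the estimate.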
Testing the proportional hazards assumption
A key assumption of the Cox proportional hazards model is that the hazard of death in one group is proportional to the hazard of death in any other group over the entire observation period (i.e. that the hazard ratio is constant over time), which must be assessed for each explanatory variable included in the model. For instance, when the Cox proportional hazards model is assumed for the data from the lung cancer trial, the hazard of death in the standard chemotherapy group is assumed to be proportional to the hazard of death in the experimental chemotherapy group over the 365 days of follow-up and similarly for the remaining explanatory variables.
To informally assess the assumption of proportional hazards, plots of the estimated hazards in each group could be examined, for example, Figure 2b. Such plots are not recommended in general because of the ‘noisy’ aspect of their estimation, as discussed above. A more informative plot is that exemplified in Figure 4, which plots –log(–log(survival probability)) against log(time) (a ‘log-minus-log plot’) for the two treatment groups. These plots can be constructed with standard statistical packages and, if the assumption of proportional hazards is valid, should give rise to roughly parallel lines. That parallel lines indicate the validity of the assumption of proportional hazards is related to the mathematical relationship between the hazard and survival functions.
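The parallelism criterion follows from the proportional hazards identity S1(t) = S0(t)^HR: applying the log-minus-log transform to both curves leaves a constant vertical gap of −log(HR). A quick numerical check, using an assumed Weibull baseline and hazard ratio purely for illustration:

```python
import math

# Under proportional hazards, S1(t) = S0(t) ** HR, so the -log(-log(S))
# transform used in log-minus-log plots differs between the two groups by
# the constant -log(HR): the curves are parallel.
lam, p, hr = 0.001, 1.2, 1.8   # assumed Weibull baseline and hazard ratio

def mll(s):
    return -math.log(-math.log(s))

for t in [30, 90, 180, 365]:
    s0 = math.exp(-lam * t ** p)   # baseline (standard-treatment) survival
    s1 = s0 ** hr                  # proportional-hazards survival, group 1
    offset = mll(s1) - mll(s0)
    assert abs(offset + math.log(hr)) < 1e-9   # constant gap of -log(HR)
```

If the hazard ratio changes over time, the gap between the transformed curves changes with t and the plotted lines converge, diverge or cross, as seen for the treatment groups in Figure 4.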
For the lung cancer trial, the lines for the two treatment groups do not appear to be parallel so the assumption of proportional hazards seems to be violated for this dataset: the hazard ratio does not appear to be constant over time. However, another interpretation in cases such as this, where the two lines are close to each other throughout the plot, is that the groups simply do not differ much in their respective hazards at any point in time. The crossed-over lines correspond to a hazard ratio moving in value from just below 1 to just above 1. With the first interpretation, reporting a single hazard ratio for the effect of the experimental chemotherapy versus standard is not valid, as this quantity is inferred to change through time. With the second interpretation, a constant hazard ratio with value close to 1 would be an appropriate summary.
Although the technicalities of testing the proportional hazards assumption are quite complex, the assumption of proportional hazards should be checked for each explanatory variable in the model. The plots described above are useful when assessing the assumption for categorical/binary variables, but not for continuous variables (e.g. age and Karnofsky score). Formal statistical tests of the proportional hazards assumption are available to complement the graphical approach (and may be used for both categorical and continuous variables), for example, there are tests known as the Schoenfeld residuals test and the time-dependent covariate test.
Stratified proportional hazards models
If the assumption of proportional hazards is not satisfied for a particular explanatory variable, a possible solution is to ‘stratify’ the model by that variable, see Klein and Moeschberger, section 9.3. If this variable is continuous, it must be transformed into an ordinal variable for this purpose by dividing its range into categories.
The stratified model estimates a different baseline hazard for each of the different strata, but assumes that the remaining explanatory variables exert their influences equally in all strata (i.e. that the coefficients βi are identical across strata). For example, if we were to stratify the Cox model for the lung cancer trial by treatment, two separate baseline hazard functions would be estimated (one for each treatment group), and the effects of the remaining variables (age, Karnofsky score, prior treatment and tumour type) would be assumed to be identical in each of the two treatment groups. Such a model would not estimate a hazard ratio for the treatment variable and would be of limited use when the effect of treatment on survival is of most interest. If the variable violating the proportional hazards assumption is of main interest, alternative models are required that allow for non-proportional hazards, such as the accelerated failure time model, described below.
Parametric proportional hazards
The Cox proportional hazards model is ‘semiparametric’ in the sense of assuming linear relationships between the log-hazard and the explanatory variables (the strengths of these relationships being characterized by parameters β), but not specifying the form that the hazard function takes. Alternatively, one may assume a convenient or appropriate form for the hazard function. While there are many forms available, common choices are the exponential, Gompertz and Weibull distributions.
The exponential model is the simplest choice because it assumes a hazard function that is constant over time; an assumption that is not generally valid in human disease applications, but may be in other applications (e.g. assessing the lifetime of electronic components). The Weibull and Gompertz models are appropriate if the hazard steadily increases or decreases over time. The Gompertz model is often used to model mortality data and implies that the log of the hazard function changes linearly with time, whereas the Weibull model implies that the log of the hazard function changes linearly with log(time).
The parameters of the Weibull and Gompertz distributions define the shape of the associated hazard functions, and Figure 3 displays the hazard functions implied by various parameter values. The exponential model is a special case of both the Weibull (when the parameter p = 1) and the Gompertz (when the parameter γ = 0), as shown in Figure 3 (chapter 5 of Collett and sections 8.2 and 8.3 of Hosmer et al. provide further detail).
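The three hazard functions, and the special cases just described, can be written down directly. A sketch under one common parameterization (conventions vary between texts, so the exact form of lam, p and gamma should be checked against the software used):

```python
import math

# hazard functions for the three common parametric models (lam > 0 is a rate)
def h_exponential(t, lam):
    return lam                          # constant over time

def h_weibull(t, lam, p):
    return lam * p * t ** (p - 1)       # log-hazard linear in log(t)

def h_gompertz(t, lam, gamma):
    return lam * math.exp(gamma * t)    # log-hazard linear in t

# the special cases reduce to the exponential (constant-hazard) model:
for t in [0.5, 1.0, 2.0]:
    assert h_weibull(t, 0.2, 1.0) == h_exponential(t, 0.2)    # p = 1
    assert h_gompertz(t, 0.2, 0.0) == h_exponential(t, 0.2)   # gamma = 0
```

With p > 1 (or γ > 0) the Weibull (or Gompertz) hazard rises steadily over time, and with p < 1 (or γ < 0) it falls, matching the shapes sketched in Figure 3.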
Parametric proportional hazards models incorporate the same proportional hazards assumption as the Cox model, which will need to be assessed in each application. Each of these models has additional parameter(s) specific to the parametric form for the baseline hazard that are estimated from observed data in addition to the log hazard ratio coefficients β.
Accelerated failure time models
Proportional hazards models are specified in terms of hazard functions, rather than directly in terms of survival times. This leads to models that are mathematically tractable, but the conceptual basis of hazard functions can be elusive. An alternative class of models are accelerated failure time models, which directly specify models in terms of survival times. The parameters of accelerated failure time models are interpreted in terms of ‘time ratios’, which compare the lengths of time to event occurrences given different values of the explanatory variables. Although accelerated failure time models are applied less frequently than proportional hazards models, expressing the effects of explanatory variables in terms of time ratios may offer an attractive alternative to expression in terms of hazard ratios.
Accelerated failure time models are written in terms of the logarithm of survival time (survival time denoted by t). The model for the lung cancer trial is given by:

log(t) = β0 + β1 × Treatment + β2 × Age + β3 × Karnofsky + β4 × Squamous + β5 × Small cell + β6 × Large cell + β7 × Prior treatment + log(τ)
τ represents residuals that follow some assumed distribution, in much the same way that residuals in linear regression are assumed to follow the normal distribution. For now, we assume a logistic distribution for log(τ) (i.e. a log-logistic distribution for τ) and discuss alternatives below.
The exponentials of the β coefficients may be interpreted as time ratios. For example, exp(β1) is the ratio of the median survival times of patients receiving experimental chemotherapy to those receiving standard chemotherapy. If exp(β1) > 1, the effect of the experimental chemotherapy is to increase the amount of time until death occurs, relative to standard: time can be thought of as ‘slowing down’ for a patient receiving experimental chemotherapy relative to a patient receiving the standard treatment. Conversely, if exp(β1) < 1, the effect of the experimental chemotherapy is to decrease the amount of time until death occurs: time can be thought of as being ‘accelerated’ for those patients receiving experimental chemotherapy relative to those receiving standard.
The estimated coefficients and time ratios for the accelerated failure time model applied to the lung cancer study are given in Table 2. These data provide no evidence that the survival times of patients on experimental chemotherapy are different to those on the standard chemotherapy (which is similar to the results obtained for the Cox proportional hazards model). The estimates also indicate that the median survival time of patients with squamous cell carcinoma is expected to be about twice as long as the median survival time of patients with adenocarcinoma (the estimated time ratio is 2.19, with 95% CI 1.27–3.79).
Table 2. Coefficients (log time ratios), standard errors (SE), P-values, time ratios and 95% confidence intervals (CI) for the time ratios, for the accelerated failure time model‡ of survival time on treatment (experimental compared with standard chemotherapy), age, Karnofsky score, histological tumour type and prior receipt of treatment
†The constant term is β0 from the accelerated failure time model. exp(β0) = 3.67 can be interpreted as the estimated median survival time for patients with all covariates equal to 0. As for the intercept term in linear regression models, this quantity is not meaningful if 0 is not a meaningful value for all explanatory variables. This interpretation only holds when a log-normal or log-logistic distribution is assumed for τ. When alternative distributions for τ are assumed, such as the Weibull or Gompertz distributions, the median survival time for patients with all covariates equal to 0 may be estimated as exp(β0) × median(τ).
‡The additional parameter γ (estimate: 0.60, 95% CI: (0.52, 0.70)) is the parameter of the log-logistic distribution assumed for τ and determines the shape of the hazard function for these data. Figure 5 displays some examples of the hazard function associated with different choices of γ (which determines the standard deviation of log(τ)).
In the preceding model, we assumed the log-logistic distribution for τ, but there are alternative choices, such as the log-normal and Weibull distributions. The shapes of the log-logistic and log-normal distributions are similar to each other and are useful in situations when the hazard first increases and then decreases over time. Figure 5 displays the shape of the hazard function when the log-logistic distribution is assumed for various values of the log-logistic distribution parameter.
In addition to being useful in their own right, accelerated failure time models offer an alternative to proportional hazards models when the assumption of proportional hazards appears to be violated.
A relatively simple solution to address non-proportional hazards can be to allow for coefficients in a regression model to change over time by including an interaction with a time variable in the model. For example, considering the Cox proportional hazards model for the lung cancer trial outlined earlier, if there is some evidence of non-proportional hazards over time for the two treatment groups, then the following model could be considered:

log(h(t)) = log(h0(t)) + β1 × Treatment + β2 × Age + β3 × Karnofsky + β4 × Squamous + β5 × Small cell + β6 × Large cell + β7 × Prior treatment + β8 × (Treatment × Time)
where a new variable, the interaction formed by the product ‘Treatment × Time’ has been included. More complex relationships over time can also be considered, such as assuming a log-linear relationship (by using log(Time) in the interaction term) or splitting the follow-up time into discrete periods and using this categorical time variable in the interaction (whereby within each time period the hazards are assumed proportional, but between periods the treatment effect differs).
In some studies, important variables relevant to a patient's risk of experiencing the event of interest may change over the course of the study. For example, in cohort studies of time to thromboembolism among lung cancer patients, risk depends on patient characteristics such as age and stage of cancer, as well as dynamic factors such as treatment stage (e.g. chemo/radiotherapy, surgery and use of biologic agents). The commencement of a new treatment stage is often accompanied by a hypercoagulable ‘pro-thrombotic’ state for a period of time.[19, 20] Such time-dependent information can be incorporated into a survival analysis: each patient's follow-up time is split into a number of records, each relating to a time interval defined by changes in the time-dependent variable, with the analysis accounting for these multiple records per person. On a technical note, the model structure for accommodating time-dependent variables is equivalent to the structure used for allowing time-varying effects of explanatory variables, as discussed in the previous subsection.
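A minimal sketch of this record-splitting step, for a hypothetical patient whose time-dependent variable is the number of treatment-stage changes so far (patient identifier and times are invented):

```python
def split_record(pid, end, event, change_times):
    """Split one patient's follow-up (0, end] at the times a
    time-dependent variable changes, yielding rows of
    (pid, start, stop, event_in_interval, n_changes_so_far)."""
    cuts = sorted(t for t in change_times if 0 < t < end)
    bounds = [0.0] + cuts + [end]
    rows = []
    for k in range(len(bounds) - 1):
        last = k == len(bounds) - 2
        # The event, if observed, falls only in the final interval.
        rows.append((pid, bounds[k], bounds[k + 1],
                     event if last else 0, k))
    return rows

# A patient followed for 10 months who starts chemotherapy at month 3
# and has surgery at month 7, then experiences the event at month 10:
rows = split_record("pt01", 10.0, 1, [3.0, 7.0])
# rows -> [("pt01", 0.0, 3.0, 0, 0), ("pt01", 3.0, 7.0, 0, 1), ("pt01", 7.0, 10.0, 1, 2)]
```

In statistical software, these (start, stop, event) intervals form the counting-process representation of the data, with the current value of the time-dependent variable attached to each interval.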
Mean survival time
Although it is common in a survival analysis to present results in terms of hazard ratios, sometimes absolute measures of the treatment effect are of interest and can provide greater clinical insight in certain contexts. A popular example in end-stage cancer trials is to report median survival for the two treatment groups, that is, the time at which 50% of participants are expected to have experienced the event of interest, as obtained from the estimated survival curve. Another measure that may be of interest is the mean survival time. However, there can be difficulties in calculating this measure. Conceptually, the mean survival time for a group of patients can be calculated as the area under the estimated survival curve for that group. In the presence of censored observations, mean survival time can be estimated using either the ‘restricted mean’ or the ‘extended mean’. The restricted mean is calculated as the area under the survival curve up to the time at which the last observed event occurs or, alternatively, up to some other prespecified time point, t*, where the choice of t* should be clinically motivated. The extended mean survival time, on the other hand, involves parametrically extrapolating the survival time distribution, extending the survival curve until it reaches zero so that the mean can be estimated. The restricted mean survival time underestimates the ‘true’ mean survival time when the last observed time point is censored. The extended mean survival time avoids this problem but depends on the survival time distribution chosen, and must therefore be used and interpreted with caution. Computational details are available in Barker.
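The area-under-the-curve calculation can be sketched directly from the Kaplan–Meier step function; the survival times below are invented toy data for illustration:

```python
def km_curve(times, events):
    """Kaplan-Meier estimate: returns [(t, S(t))] at each observed event time."""
    at_risk, surv, curve = len(times), 1.0, []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= sum(1 for ti in times if ti == t)  # drop deaths and censorings
    return curve

def rmst(times, events, t_star):
    """Restricted mean survival time: area under the Kaplan-Meier
    step function from 0 up to the prespecified horizon t_star."""
    area, last_t, surv = 0.0, 0.0, 1.0
    for t, s in km_curve(times, events):
        if t > t_star:
            break
        area += surv * (t - last_t)  # rectangle up to this drop in the curve
        last_t, surv = t, s
    return area + surv * (t_star - last_t)

times = [2, 3, 4, 5, 8]
events = [1, 0, 1, 1, 0]   # 1 = death observed, 0 = censored
r = rmst(times, events, t_star=6.0)
```

Because the estimated survival curve is a step function, the area up to t* is an exact sum of rectangle areas; no numerical integration is needed.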
Throughout this article, we have assumed non-informative censoring. A practical example of informative censoring arises in early-stage lung cancer studies in which death due to lung cancer is the event of interest. In such studies, patients may die as a result of advanced age, or of smoking-related chronic diseases other than lung cancer that were present at the commencement of follow-up. Such patients have survival times censored at the time of their death and never experience the event of interest. Events of a type different from the event of interest, but which occur through a mechanism related to the risk of the event of interest, are known as ‘competing-risk’ events. In the presence of competing-risk events, the approaches described in this article will overestimate survival probability, because it is the sicker patients who tend to be censored. To allow for the effect of competing risks, relatively simple adjustments can be made to the Kaplan–Meier approach, which incorporate the other event types into the survival probability calculations. More complex models, such as a modified form of the Cox proportional hazards model or a competing-risks regression model, can also be applied.
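The adjusted Kaplan–Meier calculation mentioned above corresponds to the cumulative incidence function (the Aalen–Johansen estimator). The sketch below, on invented toy data, contrasts it with the naive 1 − KM estimate that treats competing events as ordinary censoring; the naive estimate is the larger of the two, illustrating the overestimation of event risk:

```python
def cuminc(times, causes, cause=1):
    """Aalen-Johansen cumulative incidence for one event type in the
    presence of competing risks (causes: 0 = censored, 1, 2 = event types)."""
    at_risk, surv, cif, out = len(times), 1.0, 0.0, []
    for t in sorted(set(times)):
        d_cause = sum(1 for ti, ci in zip(times, causes) if ti == t and ci == cause)
        d_all = sum(1 for ti, ci in zip(times, causes) if ti == t and ci > 0)
        cif += surv * d_cause / at_risk   # S(t-) times cause-specific hazard
        surv *= 1 - d_all / at_risk       # event-free (any cause) survival
        at_risk -= sum(1 for ti in times if ti == t)
        out.append((t, cif))
    return out

def naive_one_minus_km(times, causes, cause=1):
    """1 - KM, wrongly treating competing events as plain censoring."""
    at_risk, surv = len(times), 1.0
    for t in sorted(set(times)):
        d = sum(1 for ti, ci in zip(times, causes) if ti == t and ci == cause)
        surv *= 1 - d / at_risk
        at_risk -= sum(1 for ti in times if ti == t)
    return 1 - surv

times = [1, 2, 3, 4, 5]
causes = [1, 2, 1, 0, 2]   # 0 = censored, 1 = event of interest, 2 = competing
cif_final = cuminc(times, causes)[-1][1]
naive = naive_one_minus_km(times, causes)
```

The competing events reduce the number of patients who can ever experience the event of interest, which the cumulative incidence function respects but the naive estimate does not.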
While Cox proportional hazards models are the most popular choice for modelling survival data, parametric proportional hazards or accelerated failure time models may provide a better fit to observed survival data in some studies. As in other areas of statistics, model choice in survival analysis is a complex issue and extends beyond the scope of this review. Briefly, the desire to closely fit the hazard or survival rate over time is often weighed against model parsimony, or a preference for a simple model. The Akaike information criterion (AIC) can be used to compare competing models, for example, models of differing complexity or models based on different assumptions about the underlying hazard, such as a Cox and a Weibull proportional hazards model. A model having better fit to observed survival in a sample of patients than a competing model does not guarantee that it describes survival accurately in the population of similar patients. See Bradburn et al. for a brief description of various techniques for assessing model adequacy.
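The AIC weighs fit against complexity via AIC = 2k − 2 log L̂, where k is the number of estimated parameters and log L̂ is the maximised log-likelihood. The sketch below uses invented log-likelihood values (not from any fitted model) to compare two hypothetical parametric survival models:

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: smaller values indicate a
    better trade-off between fit and parsimony."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical maximised log-likelihoods for two competing parametric
# survival models fitted to the same data (values invented for illustration):
aic_weibull = aic(-1204.3, n_params=5)       # 4 covariates + 1 shape parameter
aic_loglogistic = aic(-1206.1, n_params=5)   # same structure, different distribution

better = "Weibull" if aic_weibull < aic_loglogistic else "log-logistic"
```

With equal parameter counts, the comparison reduces to the log-likelihoods; when models differ in complexity, the 2k penalty can reverse the ranking suggested by fit alone.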
It is possible that none of the options discussed so far will satisfactorily model the hazard function for a particular application. A family of models called ‘Royston–Parmar models’[26, 27] provides a great deal of flexibility, allowing for hazard functions with more complex shapes than those considered in Figures 3 and 5. We refer interested readers elsewhere for discussion of other important issues, which include, but are not limited to, the following. Firstly, time-to-event analyses where events may be recurrent, for example, studies investigating recurrent tuberculosis or asthma attacks. Secondly, situations where patients under study are not independent with regard to their underlying hazard, perhaps because of a shared genetic background (family members) or a shared environment (as in a cluster-randomized clinical trial); these can be analysed using ‘frailty’ models, which incorporate random effects to capture the homogeneity that may exist within clusters of study participants. Thirdly, multiple imputation to deal with missing data has been introduced in this series; its extension to survival analysis is often needed but requires some care.
Jessica Kasza, Darren Wraith and Karen Lamb were supported under a National Health and Medical Research Council Centre of Research Excellence grant, ID#1035261, awarded to the Victorian Centre for Biostatistics (ViCBiostat). We thank Michael Abramson and Christian Schindler for insightful comments that helped to clarify many ideas in this article.