## Introduction

In a previous article in this series,[1] regression models were introduced for the analysis of relationships between a measure of respiratory health (measured on a continuous, binary or ordinal scale) and one or more patient characteristics (explanatory variables). We now consider the situation where the outcome of interest is the time to occurrence of a specific event, for example, time to death for lung cancer patients or time to recovery for pneumonia patients. Two key aspects characterize the challenge of analysing ‘time to event’ or ‘survival time’ data. Firstly, the distribution of event times in a group of study participants may be unlike a normal distribution, for example, because it is strongly skewed, as may be observed with times to discharge from hospital for patients admitted with a severe asthma episode. Secondly, research study participants may not experience the event by the time the study ends, or might be lost to follow-up before experiencing the event. These participants have a time to event that is ‘censored’ at the time they were last observed: all that is known is that their event time exceeds that time. Non-normal distributions of event times and the presence of censoring in a time-to-event outcome require special methods for statistical analysis. This article will introduce such methods, including appropriate regression methods to relate the survival time of patients to explanatory variables.

To illustrate survival analysis methods, we use the Veterans Administration lung cancer trial, from Kalbfleisch and Prentice.[2] Male patients with inoperable lung cancer were randomized to receive one of two treatments (standard or experimental chemotherapy). The survival time (or time to censoring) of each patient was recorded, along with their age (in years), Karnofsky score, whether or not they had received prior treatment and the histological tumour type (classified as adenocarcinoma, squamous, small cell or large cell carcinoma). Age is a continuous variable, treatment and prior treatment are binary and tumour type is categorical (adenocarcinoma is taken as the reference level). The Karnofsky score[3] is a measure of the quality of life of the patient, taking values from 0 (dead) to 100 (healthy), and we treat it as a continuous variable. For the purposes of this article, we restrict follow-up to 365 days, censoring patients who were still alive at this point. Of 137 patients in the trial, 118 died during this follow-up time; the remaining 19 patients had censored survival times. In other studies, the fraction of censored patients may be larger. For the analyses that follow, we are primarily interested in the relationship between treatment and survival time.
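The restriction of follow-up to 365 days described above can be sketched in a few lines of Python. This is a minimal illustration only, not the code used for the analyses in this article: the function name `censor_at` and the four example records are hypothetical, and we assume that the event indicator is coded 1 for death and 0 for censored, and that a death recorded exactly on day 365 still counts as an observed event.

```python
from typing import List, Sequence, Tuple


def censor_at(
    times: Sequence[float],
    events: Sequence[int],
    horizon: float = 365,
) -> List[Tuple[float, int]]:
    """Apply administrative censoring at `horizon` days.

    times  : observed follow-up time for each patient (days)
    events : 1 if the patient died at that time, 0 if already censored
    Returns a list of (time, event) pairs restricted to the horizon.
    """
    restricted = []
    for time, event in zip(times, events):
        if time > horizon:
            # Still under observation and alive at the horizon:
            # truncate the time and mark the record as censored.
            restricted.append((horizon, 0))
        else:
            # Death or censoring occurred within the window
            # (assumption: day `horizon` itself is inside the window).
            restricted.append((time, event))
    return restricted


# Hypothetical records, NOT taken from the trial data.
example_times = [72, 411, 100, 999]
example_events = [1, 1, 0, 1]

records = censor_at(example_times, example_events, horizon=365)
n_deaths = sum(event for _, event in records)
n_censored = len(records) - n_deaths
```

In this hypothetical example, the two patients who died after day 365 are recorded as censored at 365 days, while the patient who was already censored at day 100 is left unchanged. Applying the same rule to the trial data is what yields the counts of deaths and censored survival times quoted above.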