Visualizing the quantile survival time difference curve

Abstract The difference between the pth quantiles of 2 survival functions can be used to compare patients' survival between 2 therapies. Setting p = 0.5 yields the median survival time difference. Varying p between 0 and 1 defines the quantile survival time difference curve which can be straightforwardly estimated by the horizontal differences between 2 Kaplan‐Meier curves. The estimate's variability can be visualized by adding either a bundle of resampled bootstrap step functions or, alternatively, approximate bootstrap confidence bands. The user‐friendly SAS software macro %kmdiff enables the straightforward application of this exploratory graphical approach. The macro is described, and its application is exemplified with breast cancer data. The advantages and limitations of the approach are discussed.


| INTRODUCTION
As a consistent estimator of the survival function, the Kaplan-Meier curve 1 is the most commonly used graphical tool in survival analysis. It is extensively used to visually compare censored survival time curves between groups of patients distinguished by different therapies, biomarker categories, or demographic features. Estimates for survival probabilities at specific times (eg, 3-year survival probability) and specific quantiles of the survival time distribution (eg, median survival time) can be easily obtained.
The diagonal upper-left-to-lower-right nature of the plot, however, hampers the visual assessment of the difference between the pth quantiles of 2 Kaplan-Meier curves. To overcome this problem, the quantile survival time difference curve is defined and straightforwardly estimated by calculating the horizontal differences between the corresponding Kaplan-Meier curves. Two bootstrap-based approaches are suggested to illustrate the estimate's variability; alternatively, the traditional normal approximation method or a smoothed empirical likelihood approach could have been considered as well. 2 The quantile survival time difference curve is not to be confused with the survival probability difference curve [3][4][5][6] and approaches related to it. [7][8][9][10] Besides that, the term "quantile difference" itself is not unambiguously defined. It could also refer to the difference between the pth and qth quantiles of a single survival curve 11,12 , eg, the interquartile range for p = 0.75 and q = 0.25.
In Section 2, the technical details of the approach are presented.
The SAS macro %kmdiff is described and illustrated in Section 3. A brief discussion is given in Section 4. The SAS macro code was generated using Version 9.4 (for Windows) of the SAS software (copyright

| METHODS
Assume that from a large patient population with survival function $S(t)$, right-censored survival data have been observed for a sample of $n$ patients, where $x_1, \ldots, x_n$ denote the observed survival times and $a_1, \ldots, a_n$ denote their corresponding censoring indicators. The Kaplan-Meier estimator of the survival function is defined as

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right), \qquad (1)$$

where $0 \le t_1 < t_2 < \ldots < t_K$ are the $K > 0$ different observed failure times, $d_i$ is the number of failures, and $n_i$ is the size of the risk set at $t_i$, $i = 1, \ldots, K$. If $t < t_1$, then $\hat{S}(t) = 1$. Note that $\hat{S}(t_K)$ equals 0 only if all censored observations occur before $t_K$. If patients are still alive after $t_K$, then either $\hat{S}(t)$ can be set equal to $\hat{S}(t_K)$ from $t_K$ to the largest censoring time or $\hat{S}(t)$ can be considered not defined for $t > t_K$. In the current manuscript, the latter definition is applied. Consequently, in the case of $K = 0$, that is, no observed failure times, $\hat{S}(t)$ equals 1 for $t = 0$ and is not defined for $t > 0$.
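The product-limit computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the %kmdiff implementation; the function name `kaplan_meier`, the toy data, and the coding of the censoring indicator (1 = failure, 0 = censored) are assumptions made for this example.

```python
from collections import Counter
from fractions import Fraction


def kaplan_meier(times, events):
    """Return [(t_i, S_hat(t_i))] for the distinct observed failure times.

    Assumed coding of the censoring indicator: 1 = failure, 0 = censored.
    The estimate is left undefined beyond the last failure time t_K.
    """
    failures = Counter(t for t, e in zip(times, events) if e == 1)
    curve, s = [], Fraction(1)
    for t in sorted(failures):
        at_risk = sum(1 for x in times if x >= t)   # risk set size n_i
        s *= 1 - Fraction(failures[t], at_risk)     # factor (1 - d_i / n_i)
        curve.append((t, s))
    return curve


# Hypothetical toy sample: six patients, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(t, float(s))
```

Exact fractions are used so that survival probabilities can later be compared with a given p without rounding effects. An empty list corresponds to the case K = 0.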
The $p$th quantile survival time $Q(p)$ corresponds to the labelling of the survival function axis of the Kaplan-Meier plot and is equivalent to the $(1 - p)$th quantile of the survival time distribution. For $\hat{S}(t_K) \le p \le 1$ and $K > 0$, the estimator for the $p$th quantile survival time can be defined as

$$\hat{Q}(p) = \min\{t_i : \hat{S}(t_i) \le p\}. \qquad (2)$$

Now assume that from a second large patient population (independent from the first one) with survival function $S'(t)$, right-censored survival data have been observed as well for a sample of $m$ patients.
Their observed survival times and censoring indicators are denoted by $y_1, \ldots, y_m$ and $b_1, \ldots, b_m$, respectively, and $0 \le u_1 < u_2 < \ldots < u_L$ are the $L$ different observed failure times. By analogy with formulae (1) and (2), a corresponding estimator $\hat{Q}'(p)$ for the $p$th quantile survival time $Q'(p)$ is obtained for $\hat{S}'(u_L) \le p \le 1$ and $L > 0$. Note that the prime in the notations refers to the second population and its corresponding sample. Let

$$p_0 = \max\{\hat{S}(t_K), \hat{S}'(u_L)\}. \qquad (3)$$

Now, an estimator for the quantile survival time difference curve $D(p) = Q(p) - Q'(p)$ is

$$\hat{D}(p) = \hat{Q}(p) - \hat{Q}'(p), \qquad p_0 \le p \le 1.$$

It seems obvious now to plot $\hat{D}(p)$ against $p$. However, for the sake of comparability with the Kaplan-Meier plot, $p$ has to be plotted on the vertical axis against $\hat{D}(p)$ on the horizontal axis.
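The horizontal-difference construction can be sketched as follows. The sketch assumes that the quantile estimator is taken as the smallest observed failure time at which the Kaplan-Meier estimate has dropped to p or below, and that the difference is formed as first sample minus second sample. Both conventions, the function names, the event coding (1 = failure, 0 = censored), and the toy data are assumptions of this example, not details confirmed by the macro.

```python
from collections import Counter
from fractions import Fraction


def km_curve(sample):
    """Kaplan-Meier curve [(t_i, S_hat(t_i))] from (time, event) pairs.

    Assumed event coding: 1 = failure, 0 = censored.
    """
    failures = Counter(t for t, e in sample if e == 1)
    curve, s = [], Fraction(1)
    for t in sorted(failures):
        at_risk = sum(1 for x, _ in sample if x >= t)
        s *= 1 - Fraction(failures[t], at_risk)
        curve.append((t, s))
    return curve


def quantile(curve, p):
    """Smallest failure time with S_hat <= p (assumed convention)."""
    return next(t for t, s in curve if s <= p)


def difference_curve(sample1, sample2, grid):
    """Return (p0, {p: D_hat(p)}) for the grid points p with p0 <= p <= 1."""
    c1, c2 = km_curve(sample1), km_curve(sample2)
    p0 = max(c1[-1][1], c2[-1][1])   # both quantile estimates exist for p >= p0
    diffs = {p: quantile(c1, p) - quantile(c2, p) for p in grid if p >= p0}
    return p0, diffs


# Hypothetical toy samples, each with at least one observed failure.
group1 = list(zip([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
group2 = list(zip([2, 3, 4, 6, 7, 8], [1, 0, 1, 1, 1, 0]))
p0, diffs = difference_curve(group1, group2, grid=[0.1, 0.25, 0.5, 0.75])
print(float(p0), diffs)
```

In this toy example the grid point p = 0.1 lies below p0 and is therefore dropped. For plotting, the dictionary keys would go on the vertical axis and the differences on the horizontal axis.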
As a next step, the variability of the estimator $\hat{D}(p)$ will be visualized. For this purpose, 2 bootstrap solutions are suggested, both of which are based on Efron's classical bootstrap for censored data. 13

| Bootstrap bundle
Draw a sample of size $n$ with replacement from $(x_1, a_1), (x_2, a_2), \ldots, (x_n, a_n)$ and a sample of size $m$ with replacement from $(y_1, b_1), (y_2, b_2), \ldots, (y_m, b_m)$.
These are the first bootstrap samples, from which the first bootstrapped quantile difference curve $\hat{D}^*_1(p)$, $p^*_1 \le p \le 1$, is computed. Here, $p^*_1$ is defined in analogy to $p_0$ of formula (3).
Repeating the drawing of the bootstrap samples bundle times yields the bootstrapped quantile difference curves $\hat{D}^*_1(p), \ldots, \hat{D}^*_{bundle}(p)$, which can be plotted together with $\hat{D}(p)$. In doing so, a subdued colour like light grey should be used for the bootstrapped quantile difference curves, and a vibrant colour like green should be used for $\hat{D}(p)$. Setting bundle to a value between 40 and 200 seems reasonable when using the bootstrap bundle approach.
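The resampling scheme of this section can be sketched as follows. The sketch reuses an assumed quantile convention (smallest failure time with Kaplan-Meier estimate at or below p) and an assumed sign convention (first sample minus second sample). As a simplification that is not taken from the source, resamples without any observed failure are redrawn. All function names, the seed, and the toy data are hypothetical.

```python
import random
from collections import Counter
from fractions import Fraction


def km_curve(sample):
    """Kaplan-Meier curve [(t_i, S_hat(t_i))]; event coding 1 = failure (assumed)."""
    failures = Counter(t for t, e in sample if e == 1)
    curve, s = [], Fraction(1)
    for t in sorted(failures):
        at_risk = sum(1 for x, _ in sample if x >= t)
        s *= 1 - Fraction(failures[t], at_risk)
        curve.append((t, s))
    return curve


def quantile(curve, p):
    """Smallest failure time with S_hat <= p (assumed convention)."""
    return next(t for t, s in curve if s <= p)


def difference_curve(sample1, sample2, grid):
    """Return (p_star, {p: D(p)}), or None if a sample contains no failure."""
    c1, c2 = km_curve(sample1), km_curve(sample2)
    if not c1 or not c2:
        return None
    p_star = max(c1[-1][1], c2[-1][1])
    return p_star, {p: quantile(c1, p) - quantile(c2, p)
                    for p in grid if p >= p_star}


def bootstrap_bundle(sample1, sample2, grid, bundle=100, seed=2024):
    """Draw `bundle` bootstrapped quantile difference curves."""
    rng = random.Random(seed)
    replicates = []
    while len(replicates) < bundle:
        boot1 = rng.choices(sample1, k=len(sample1))   # size n, with replacement
        boot2 = rng.choices(sample2, k=len(sample2))   # size m, with replacement
        replicate = difference_curve(boot1, boot2, grid)
        if replicate is not None:   # simplification: redraw failure-free resamples
            replicates.append(replicate)
    return replicates


group1 = list(zip([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
group2 = list(zip([2, 3, 4, 6, 7, 8], [1, 0, 1, 1, 1, 0]))
grid = [i / 20 for i in range(1, 21)]
replicates = bootstrap_bundle(group1, group2, grid, bundle=50)
print(len(replicates), "bootstrapped curves")
```

Each replicate could then be drawn as a light grey step function, with p on the vertical axis, underneath the estimate from the original samples.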

| Bootstrap confidence bands
The generation of a confidence band also requires bootstrapped quantile difference curves $\hat{D}^*_1(p), \ldots, \hat{D}^*_{boot}(p)$. Here, the value of boot should depend on the chosen confidence level $100(1 - \alpha)\%$. As a rule of thumb, we recommend boot ≥ 100/α, that is, boot ≥ 2000 for a 2-sided 95% confidence band and boot ≥ 10000 for a 2-sided 99% confidence band.
A confidence band for the quantile difference curve can be constructed from a series of pointwise confidence intervals for quantile differences. For a given survival probability $p$, the corresponding confidence interval is determined as the $(\alpha/2)$th and the $(1 - \alpha/2)$th quantiles of $\hat{D}^*_1(p), \ldots, \hat{D}^*_{boot}(p)$, where $p^*_{\max} \le p \le 1$ and $p^*_{\max} = \max(p^*_1, p^*_2, \ldots, p^*_{boot})$.
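Given a set of bootstrapped curves, the pointwise percentile limits can be sketched as below. The input format (a list of pairs holding p*_j and a dictionary that maps grid points p to the bootstrapped difference at p), the function name, and the order-statistic convention for the empirical quantiles are assumptions of this sketch; other empirical quantile definitions would give slightly different limits.

```python
def pointwise_band(replicates, grid, alpha=0.05):
    """Pointwise percentile band from bootstrapped quantile difference curves.

    replicates: list of (p_star_j, {p: D*_j(p)}) pairs, where each dictionary
    holds at least the grid points p >= p_star_j (assumed input format).
    Returns (p_star_max, {p: (lower, upper)}) for grid points p >= p_star_max.
    """
    boot = len(replicates)
    p_star_max = max(p_star for p_star, _ in replicates)
    low = int(alpha / 2 * boot)    # index of the lower order statistic
    high = boot - 1 - low          # symmetric index of the upper order statistic
    band = {}
    for p in grid:
        if p < p_star_max:
            continue               # at least one replicate is undefined at this p
        values = sorted(curve[p] for _, curve in replicates)
        band[p] = (values[low], values[high])
    return p_star_max, band


# Hypothetical replicates: D*_j(0.5) = j for j = 0..9; replicate 0 starts at p = 0.3.
replicates = [(0.3 if j == 0 else 0.1, {0.5: float(j)}) for j in range(10)]
p_star_max, band = pointwise_band(replicates, grid=[0.2, 0.5], alpha=0.2)
print(p_star_max, band)
```

The grid point p = 0.2 is excluded because it lies below the largest p*_j. With boot = 2000 and α = 0.05, this convention would pick the 51st smallest and the 51st largest bootstrapped difference at each p.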

| SAS macro
The SAS macro %kmdiff allows the straightforward application of the method to survival data sets and is freely available at https://cemsiis.meduniwien.ac.at/en/kb/science-research/software/statistical-software/.
The macro produces the standard Kaplan-Meier plot of SAS, and plots of the quantile differences with (i) a bootstrap bundle, (ii) bootstrap confidence bands, and (iii) both a bootstrap bundle and bootstrap confidence bands. The macro parameters are described in Table 1.

| Example
TABLE 1 (excerpt) Macro parameter for the input data set: Input SAS data set (required); its name must not start with two underscores, and it must not contain variables whose names start with two underscores (in particular, "__g", "__t", and "__status").

| DISCUSSION
The SAS macro %kmdiff provides an easy-to-use computational tool to visually assess the estimated quantile survival time difference curve and its sample variability. The curve is intended as a useful complement to, but not as a replacement for, the survival probability difference curve and all the refined approaches based thereon. [3][4][5][6][7][8][9][10] The motivation to plot a quantile survival time difference curve may be threefold. Firstly, patients as well as health professionals may find time differences more intuitive and easier to interpret than probabilities and probability differences. Secondly, the curve shows an overall picture, unlike the isolated snippet provided by the commonly reported median survival time difference. Thirdly, it can be rather difficult to assess horizontal (and also vertical) differences between 2 Kaplan-Meier curves; the visual perception is often affected by the shortest (Euclidean) distance between the 2 curves.
The main purpose of the SAS macro %kmdiff is to support the exploration of (possibly time-dependent) group effects in the presence of right censoring. Using the macro for confirmatory purposes would require the prespecification of statistical hypotheses and the proper adjustment for any multiple testing; it should also be taken into account that the macro produces a pointwise confidence band.
There is a potential limitation of bootstrap-based confidence intervals for quantile differences: in particular in very small samples and risk sets, the actual coverage probability may considerably deviate from the nominal one. 15,16 Naturally, this limitation also affects the bootstrap bundle approach; that is, the graphically shown bootstrap replications might only provide a distorted picture of the variability of the estimated quantile differences in very small samples and risk sets.
Confidence intervals could also be obtained by normal approximation or a smoothed empirical likelihood method. 2 However, the former needs moderate to large sample sizes to work properly, whereas the latter requires the selection of a kernel bandwidth by cross-validation, which is computationally burdensome. 2 Besides that, due to the close relationship between the bootstrap and the empirical likelihood, it seems reasonable to assume that the empirical likelihood will face problems similar to those of the bootstrap in very small samples and risk sets.
The presented SAS macro %kmdiff only uses Base SAS and SAS/STAT procedures for statistical computations; graphical representations are based on the ODS Graphics procedure SGPLOT. Note that the user can easily modify the SGPLOT code or replace it with purpose-built SAS code to obtain specially tailored graphical output.
In conclusion, the SAS macro %kmdiff provides a useful exploratory tool for medical researchers as it brings in an additional dimension to the assessment and communication of group differences in patient survival.