• Open Access

Two-stage meta-analysis of survival data from individual participants using percentile ratios



Methods for individual participant data meta-analysis of survival outcomes commonly focus on the hazard ratio as a measure of treatment effect. Recently, Siannis et al. (2010, Statistics in Medicine 29:3030–3045) proposed the use of percentile ratios as an alternative to hazard ratios. We describe a novel two-stage method for the meta-analysis of percentile ratios that avoids distributional assumptions at the study level. Copyright © 2012 John Wiley & Sons, Ltd.

1 Introduction

In a review of individual participant data (IPD) meta-analyses published during the years 1999 to 2001, Simmonds et al. [1] found that in practice meta-analysis of IPD was most frequently conducted using simple two-stage methods. For example, in the meta-analysis of survival IPD, hazard ratios can be estimated for each study individually in the first stage using a proportional hazards model, or approximated by a log-rank statistic (for example, [2]). Individual study results can then be combined using standard random-effects or fixed-effects meta-analysis in the second stage.

In the analysis of a single survival study, the hazard ratio is a commonly used measure of treatment effect and is therefore a natural quantity to consider when undertaking a meta-analysis. However, an assumption of proportional hazards is even more restrictive in meta-analysis because it is imposed on multiple studies. If the proportional hazards assumption does not hold for some studies, then the estimated hazard ratio depends on the length of follow-up in those studies, and meta-analysis may not be appropriate. Methods that relax the proportional hazards assumption, the majority of which focus on a measure or measures of treatment effect that are based on the hazard ratio, have been proposed. For example, Moodie et al. [3] meta-analysed a function that can be interpreted as a log-transformed weighted average of hazard ratios over time for each trial. Arends et al. [4] extended the model of Dear et al. [5] by modelling the survival probabilities in treatment and control groups using a multivariate, mixed-effects model. The treatment effect in the model of Arends et al. is the hazard ratio, which can be time varying when treatment-by-time interaction terms are included in the model. Fiocco et al. [6] described a piecewise-constant hazard model, where hazards for the treatment and control arms are modelled using a bivariate frailty model. And more recently, Ouwens et al. [7] have proposed a meta-analysis model in which hazard ratios are time varying and expressed in terms of the shape and the scale parameters of parametric survival curves. The methods we propose here provide an alternative approach to the use of time-varying hazard ratios in this context.

In a recent paper, Siannis et al. [8] considered the percentile ratio as an alternative measure of treatment effect in survival IPD meta-analysis when proportional hazards assumptions are not appropriate. The percentile ratio qk at percentile level k comparing the survival distributions of two groups is defined as

display math(1)

The median ratio at k = 0.5 represents the expected ratio of times at which half of the population will fail in the treatment group compared with the control group. Similarly, for other percentile levels k, the percentile ratio is the expected ratio for the time at which 100k% of the population will fail in the treatment group compared with the control group. Because the interpretation of percentile ratios can be so easily explained, their use could lead to clearer, more practical understanding of survival differences between treatment groups. In general, percentile ratios may vary across percentile levels. If the survival distributions in question are accelerated failure time models, however, the percentile ratio q is constant across all values of k [8].

Siannis et al. estimated percentile ratios using a one-stage parametric model with data at the individual study level being modelled using either accelerated failure time or proportional hazards distributions. In the simplest version of the model, accelerated failure time distributions were used to model the data at the study level. In this case, the combined percentile ratio q is constant across percentile levels and can be modelled using either fixed or random effects. The proposed framework is very general, however, and could be used to model any choice of distribution at the study level. This was illustrated using a combination of accelerated failure time and proportional hazards models.

Motivated by the popularity of simple two-stage analyses, we propose an alternative, two-stage method for meta-analysis of percentile ratios, which in addition avoids all distributional assumptions in the first stage. In stage 1, we use Kaplan–Meier estimates of the survivor functions for the treatment and control groups to estimate percentile ratios and their variance–covariance matrix. In stage 2, percentile ratios are combined using either univariate or multivariate, random-effects meta-analysis (see [9] for an overview of multivariate meta-analysis). The pros and cons of using multivariate meta-analysis in this context are explored in the analysis of an example data set.

The layout of the paper is as follows. In Section 2, we describe the new two-stage method and explore its properties using a simulation study in Section 3. We apply the method to an example data set in Section 4 and conclude with a discussion in Section 5.

2 Two-stage meta-analysis of percentile ratios

In this section, we describe our methods in more detail. In Section 2.1, we describe stage 1 of the analysis, in which a vector of log percentile ratios (logPRs) and its variance–covariance matrix is estimated for each study. In Section 2.2, we describe how estimated logPRs from several studies can be combined using multivariate meta-analysis.

2.1 Stage 1: estimation of log percentile ratios

We focus here on the analysis of the data from a single study. The aim is to estimate a vector of logPRs qi = (qi1, … ,qiK)T for each study i, where qik denotes the percentile ratio in study i at percentile level k, and its variance–covariance matrix Si. Methods for combining these estimates will be discussed in Section 2.2. To estimate qik, we first estimate the kth percentile in the treatment and control groups for study i, which we denote inline image and inline image, respectively, and use

display math(2)

To simplify the notation, we will ignore study index i for the remainder of this section. We will assume throughout that censoring is non-informative, that is, it occurs independently of survival.

The percentile tk at percentile level k of a distribution of survival times is defined by the equation S(tk) = k, where S() is the survivor function. We estimate tk from a Kaplan–Meier estimate of the survivor function inline image as follows

display math(3)

where the minimisation in (3) is necessary because of the steplike nature of the estimated survival curve inline image. In a graph of inline image against t, this corresponds to the value of t for which a horizontal line at S(t) = k crosses the estimated survival curve. If there is an interval t1 ≤ t < t2 for which S(t) = k exactly, then the percentile is estimated to be the midpoint of that interval:

display math(4)

Suppose inline image and inline image are the estimated kth percentiles in the treatment and control groups, respectively. Then, an estimate for the logPR is given by substituting the estimates inline image and inline image in Equation (2).

Sander [10] demonstrated asymptotic normality for quantile estimates of survival distributions derived from Kaplan–Meier curves under certain conditions and further discussed by Roth [11] and Reid [12] (we thank a referee for pointing out these references). However, estimation of the variance–covariance matrix S for quantile estimates is not straightforward. We describe two methods, the first of which is based on asymptotic approximations and is relatively fast to compute. The second method uses bootstrapping but is computationally more intensive.

2.1.1 Asymptotic variance estimation

A simple asymptotic expression for the variance of an estimated log-percentile inline image is given by

display math(5)

where f(t) is the probability density function of the survival distribution (see Appendix A.1 for a derivation). To estimate inline image, we use Equation (5) with inline image and f(t) replaced by suitable estimates. inline image is taken to be inline image (theorem 2 of [11]), which is estimated by Greenwood's formula. The estimation of f(t) is less straightforward but can be carried out using the presmooth package in R [13]. A brief explanation of presmoothed density estimation is given in Appendix B.

Having estimated the variances of both treatment and control group log percentiles, an estimate of the variance of the estimated logPR is given by

display math(6)

An asymptotic expression to estimate the covariance between two estimated log percentiles inline image and inline image is given by

display math(7)

(see Appendix A.2 for a derivation). When k1 = k2, Equation (7) reduces to the variance formula of Equation (5). Var{S(t)}and f(t) can again be estimated using Greenwood's formula and the presmooth package, respectively. The estimated covariance matrix S between two logPRs is then given by

display math(8)

2.1.2 Bootstrap variance estimation

Variance–covariance matrices could alternatively be estimated by bootstrapping. The validity of bootstrap methods for the case of censored data has been demonstrated by Efron [14] (see also [15] for a summary of bootstrap methods). For a study with nT and nC participants in the treatment and control groups, respectively, a bootstrap sample is constructed by making nT random draws with replacement from the treatment group and nC from the control group. A large number B of bootstrap samples are drawn, and each is used to estimate a vector of logPRs inline image for b = 1, … ,B. Then, an estimate of the variance–covariance matrix S of inline image is given by

display math(9)

2.2 Stage 2: meta-analysis

The output from stage 1 of the analysis is an estimated logPR or vector of logPRs from each study, inline image, along with an estimated variance or variance–covariance matrix Si. Si can be estimated using Equation (8) or (9) but, for the purposes of the meta-analysis, is assumed fixed and known. When a single percentile level k is of interest, logPRs can be combined using standard random-effects meta-analysis as follows. The estimated logPR from study i, inline image, is assumed to be normally distributed about the true logPR in study i, log qik, with variance Si,kk:

display math(10)

The true logPR in study i, log qik, is then assumed to be normally distributed about the combined logPR qk with between-studies variance inline image:

display math(11)

If a vector of logPRs from each study are to be combined, this can be performed either by using a separate univariate meta-analysis at each percentile level or by combining logPRs at all levels simultaneously using multivariate meta-analysis. It has been argued that multivariate meta-analysis is preferable when combining multiple correlated outcomes because estimates of combined effect sizes borrow strength across outcomes through the correlations between them [16]. For a multivariate meta-analysis, we assume that inline image has a multivariate normal distribution

display math(12)

where logqi is a vector of true, underlying logPRs in study i. Si is the within-study variance–covariance matrix from study i, which is again assumed fixed and known. The within-study correlation for a given study arises from the use of data from the same set of participants in the estimation of percentile ratios for that study. For random-effects meta-analysis, the vectors of true logPRs are then assumed to be distributed multivariate normally about a vector of combined logPRs log q,

display math(13)

where Σ is the between-studies variance–covariance matrix, to be estimated from the data. Σ comprises the between-studies variances and the between-studies correlation. The between-studies variances measure the variation in the true effect sizes across studies, equivalent to inline image in the aforementioned standard random-effects meta-analysis. The between-studies correlation is the correlation between the true effect sizes across studies.

The model given by (12) and (13) can be estimated using the mvmeta package in STATA, which uses restricted maximum likelihood (REML) to estimate log q and Σ [17, 18]. In practice, there must be sufficient data available to estimate the K variance parameters and the K(K − 1) ∕ 2 correlation parameters contained in the matrix Σ. The number of percentile levels for which estimation is possible is therefore limited by the number of studies in the dataset. The number of parameters to be estimated in Σ could be reduced by imposing constraints, but options to do this within the mvmeta package are currently limited.

3 Simulation study

In this section, we present the results of a simulation study designed to assess the performance of stage 1 of the two-stage method. Bivariate meta-analysis has already been investigated in simulation studies [19]. These results can be anticipated to apply to the multivariate meta-analysis of stage 2 of our method; it is not our intention to examine the properties of multivariate meta-analysis here.

Data were simulated from a Cox proportional hazards model with a log-logistic baseline distribution with location parameter μ = 4, scale parameter σ = 0.3 and log hazard ratio θ = − 0.4. The baseline parameter values were chosen to be similar to parameter estimates for the example dataset of Section 4. The hazard ratio was chosen to represent a moderately beneficial effect of treatment. For simplicity, no censoring was assumed to take place, but results should be comparable to censored datasets with a similar numbers of events.

Simulated data were generated for studies with 20, 100 and 500 participants, with equal numbers in each treatment arm, and percentile ratios were calculated for percentile levels 0.9, 0.7, 0.5, 0.3 and 0.1 (we give percentile levels in decreasing orders because higher percentile levels correspond to earlier survival times). Because of the absence of censoring in these simulated datasets, estimated survival probabilities were always equal to the chosen percentile levels for some time interval. Percentile estimates were therefore always calculated using Equation (4). To investigate percentiles estimated using Equation (3), we also generated data for studies with 18, 98 and 498 participants. For each simulated dataset, two variance estimates were calculated, one using the asymptotic expansion of Section 2.1.1 and the other using the bootstrapping method of Section 2.1.2 with 1000 bootstrap samples.

For asymptotic variance estimation, the presmoothing method for estimation of the survival density function requires two bandwidths to be specified, as explained in Appendix B. The ‘presmooth’ package offers two built-in options for bandwidth selection, ‘plug-in’ and ‘bootstrap’. However, we found these to be unreliable for some simulated datasets, giving undersmoothed density function estimates that were inappropriately close to zero. For the simulation study, we instead fixed the bandwidths for all datasets to be the median of the bandwidths selected by the ‘plug-in’ method for 50 simulated datasets for each treatment arm in each study size.

From the simulations study results (not presented here), the bias in the estimated logPRs is estimated to be less than 0.01 for all percentile ratios for all sample sizes except the smallest, n = 18 and 20. For the smaller sample sizes, the estimated bias is greatest for the lower percentile levels, rising to 0.086 for n = 18 and 0.046 for n = 20 at k = 0.1. The true logPR at k = 0.1 is 0.362.

Results for the estimated coverage probabilities are given in Table 1, calculated as the percentage of replications for which the estimated 95% confidence interval (CI) for the logPR contains the true value. The asymptotic variance estimation method appears to perform badly, with only larger sample sizes giving appropriate coverage probabilities of approximately 0.95 for mid-range percentile levels. For higher percentile levels, coverage tends to be too high, whereas for lower percentile levels, it tends to be too low. This is consistent with the first omitted term in the asymptotic expansion (14), which provides a negative contribution for lower percentile levels and a positive contribution for higher levels. In practice, the method could be improved by selecting more appropriate bandwidths, considering each study individually, as in the analysis of the example dataset in Section 4. For the bootstrap method of variance estimation, the estimated coverage is much better, with reasonable coverage for studies with as few as 20 patients for the mid-range percentile levels. For the smaller studies, coverage is too low for the highest percentile levels, where few events have yet to take place, and for low percentile levels, where few patients remain in the study. Computation time is longer for the bootstrap method than for the asymptotic method but is not prohibitively so, with the estimation procedure taking around 1 min for the largest studies considered here.

Table 1. Estimated coverage of log percentile ratio estimates for data simulated from a log-logistic proportional hazards distribution.
Variance estimation methodEstimated coverage probability
kn = 18n = 20n = 98n = 100n = 498n = 500

4 Example: glioma dataset

We illustrate our method using a dataset consisting of nine studies examining the post-surgery treatment of patients with high-grade glioma. Patients in the treatment groups underwent a post-surgery course of radiotherapy and chemotherapy, whereas in the control groups, patients were treated with radiotherapy alone. The original meta-analysis contained 12 studies. Hazard ratios were estimated by log-rank statistics and combined using fixed-effects meta-analysis, giving a pooled hazard ratio of 1.18 (95% CI 1.09, 1.28) [20]. We were unable to obtain data from three of these studies and re-analysed the remaining nine using the two-stage method of Section 2. In our analysis, the studies are labelled 2, 7, 9, 11, 13, 16, 17, 18 and 19, in line with the labels given in the dataset.

The data were investigated for departures from the proportional hazards assumption of Siannis et al. [8] by inspecting plots of the estimated log-cumulative hazard against log time (in these plots, the lines for the treatment and control groups should appear parallel under proportional hazards). In these plots, proportional hazards appeared to be violated for trials 9 and 17 and to be questionable for trials 19 and 11.

These data were also analysed using the accelerated failure time models of Siannis et al. [8]. The distribution used was the extended log-gamma distribution, which incorporates several of the more commonly used accelerated failure time distributions. We have assessed the goodness of fit of the extended-log-gamma distribution to the data from each study by comparing the estimated survival curves from the extended-log-gamma analysis for treatment and control groups with the respective Kaplan–Meier estimates. The plots for the treatment groups are given in Figure 1; plots for control groups are similar. The extended-log-gamma distribution appears to fit the data well for some studies, such as study 16, but not for others, such as studies 9 and 17. The data therefore warrant further investigation using the methods described here.

Figure 1.

Assessing goodness of fit of the extended-log-gamma distribution to the data from each study. Plotted are estimates of the survival curve for the treatment group with Kaplan–Meier estimates plotted using solid lines and extended-log-gamma estimates using dashed lines.

4.1 Implementation

We estimated logPRs for each study in the glioma dataset at k = 0.9, … ,0.2 using the two-stage method of Section 2. There were insufficient data available to estimate logPRs at k = 0.1. Variance–covariance matrices were calculated for each study using both asymptotic and bootstrap methods.

For the asymptotic variance estimation method, bandwidths were selected using the plug-in option as a default. To ensure that appropriate bandwidths had been selected, estimated density functions were plotted for each treatment arm in each study. For the control arms of studies 9 and 19, the estimated density functions appeared to be undersmoothed. We then used the bootstrap bandwidth selection option as an alternative. If the estimated curves still appeared to be undersmoothed, we then multiplied the bootstrap bandwidth by successive integers until the estimated curves appeared smooth.

When applying the bootstrap variance estimation method, we were unable to calculate the logPR at lower percentile levels for some bootstrap samples when Kaplan–Meier estimates of survival curves did not decrease below a survival probability of 0.2. Removing all bootstrap samples where this is the case may bias the variance estimate because those samples would tend to have higher survival rates in the treatment group and therefore larger differences between treatment groups. For study 9, the logPR at k = 0.2 could not be estimated for 140 out of 1000 bootstrap samples and for four out of 1000 at k = 0.3. We therefore calculated logPRs only at k = 0.9, … ,0.3, discarding the small number of samples at k = 0.3 for which logPRs could not be calculated.

The code was written in R for the estimation of percentile ratios and the variance–covariance matrix in stage 1 and is available from the authors on request. In stage 2, multivariate meta-analysis was carried out using the ‘mvmeta’ command in STATA with unconstrained between-studies covariance matrix Σ. Parameter values for the multivariate meta-analysis model described in Equations (12) and (13) are estimated using REML [17, 18]. Univariate meta-analyses were also carried out using ‘mvmeta’ and REML.

4.2 Results

We first present in Figure 2 a forest plot of median ratios with variances estimated by bootstrapping in stage 1 and combined on the log-scale using univariate, random-effects meta-analysis. The combined median ratio of 1.12 with 95% CI (1.01, 1.23) indicates a small increase in median survival time associated with treatment. This is consistent with a two-stage meta-analysis of the same nine trials using hazard ratios, which gives a combined hazard ratio of 0.85 (95% CI 0.77, 0.93), and with the one-stage analysis of Siannis et al. using accelerated failure time models, which gave a combined percentile ratio of 1.18 (95% CI 1.07, 1.29). The widths of the estimated CIs are similar for all analyses, suggesting that the new method has similar power to detect non-null treatment effects compared with the standard method. The value of I2, which measures the proportion of the variation in the data due to heterogeneity [21], is estimated to be 13%, indicating a low level of heterogeneity in the data.

Figure 2.

Forest plot of estimated median ratios for the glioma data using bootstrap variance estimation in stage 1 and univariate meta-analysis in stage 2. MR, median ratio, CI, confidence interval.

We present results for the full range of k-values for logPRs combined using univariate meta-analysis in the left-hand side of Table 2, illustrated graphically in Figures 3a and 3b. The results using asymptotic variance estimation in stage 1 suggest a slight increase in combined logPR values at later values of k, whereas for the bootstrapped results, combined logPRs appear to be more constant over time. Estimated standard errors are also less variable when the bootstrap method is used. Results from the simulation study of Section 3 suggest that the bootstrapped results are more reliable. In general, CIs are wider at early k, when fewer events have taken place, and at later k, when fewer participants remain in the studies. Also presented in Table 2 are REML estimates for the between-studies standard deviation inline image. For higher percentile levels, inline image is 0.000, indicating that no or very little heterogeneity is present. Heterogeneity increases as the percentile level decreases, with inline image rising to 0.091 for k = 3 (using the bootstrap method).

Table 2. Meta-analysis results for the glioma data with asymptotic and bootstrap variance estimation in stage 1 and univariate and multivariate meta-analysis in stage 2.
kAsymptotic univariateBootstrap univariateAsymptotic multivariateBootstrap multivariate
logPRinline imagelogPRinline imagelogPRinline imagelogPRinline image
  1. Reported are parameter estimates with standard errors in brackets and estimates of the between-studies standard deviation (σk in the univariate case and inline image in the multivariate case)
0.90.106 (0.114)0.0000.080 (0.094)0.000
0.80.121 (0.077)0.0000.156 (0.065)0.0000.095 (0.078)0.0400.149 (0.066)0.051
0.70.084 (0.059)0.0000.126 (0.059)0.0440.074 (0.066)0.0790.119 (0.060)0.067
0.60.051 (0.050)0.0000.083 (0.056)0.0710.048 (0.064)0.1060.084 (0.055)0.078
0.50.090 (0.045)0.0000.111 (0.049)0.0520.091 (0.059)0.1050.109 (0.052)0.082
0.40.148 (0.048)0.0320.126 (0.049)0.0240.150 (0.058)0.1100.150 (0.054)0.089
0.30.205 (0.077)0.1600.165 (0.066)0.0910.197 (0.079)0.1780.184 (0.074)0.132
0.20.205 (0.082)0.151
Figure 3.

Plots of combined percentile ratio estimates and 95% confidence intervals using (a) asymptotic and (b) bootstrap variance estimation in stage 1 and univariate meta-analysis in stage 2.

To compare our results with a method based on time-varying hazard ratios, we re-analysed the data using Cox relative risk regression models with different treatment effects in years 1, 2, 3 and beyond of the studies. The combined hazard ratio was 0.84 for year 1 (95% CI 0.72, 0.98), 0.85 for year 2 (95% CI 0.71, 1.02) and 0.86 for year 3 and beyond (95% CI 0.64, 1.15). For this example, both methods are in agreement, suggesting a small, beneficial effect of treatment, which is maintained over time.

Results for logPRs combined using multivariate meta-analysis are presented in the right-hand side of Table 2. Only logPRs for k = 0.8, … ,0.3 have been combined to avoid convergence difficulties when too many correlation parameters are estimated. The asymptotic and bootstrapped results both show similar trends to those observed in the univariate analyses. The multivariate standard error estimates are slightly higher than the corresponding univariate estimates, in contrast with expectations that the borrowing of strength among outcomes would lead to smaller standard errors. The larger standard errors in the multivariate case appear to be caused by higher estimates of the between-studies standard deviation (also given in Table 2). To investigate why this is the case, we consider the estimated between-studies correlation matrices, which are presented in Table 3. Recall that the between-studies correlation is the correlation between the true logPRs across studies and is estimated from the data. Riley et al. [19] found that estimates of between-studies variance may be inflated in bivariate meta-analysis to compensate for a between-studies correlation estimate of ± 1, which is on the boundary of the parameter space. It is possible that a similar effect may be observed when the estimated correlation is very close to the boundary. In our case, only one of the correlations is estimated to be 1 in Table 3, but several correlation parameters are estimated to be very close to 1.

Table 3. Estimated between-studies correlation matrices from multivariate meta-analysis of the glioma data with asymptotic and bootstrap variance estimation in stage 1.
0.50.8130.9980.9621.. − 0.0260.9880.9751..
0.40.6940.9930.8960.9831. − 0.2570.9250.8970.9731.
0.30.6760.9900.8850.9791.0001 − 0.3350.8910.8590.9510.9971

Our analysis of the glioma dataset suggests that there is limited benefit in the use of multivariate meta-analysis in this context. However, it may still be advantageous to use multivariate meta-analysis for some datasets. For example, if some of the studies had only sufficient follow-up to estimate a subset of the logPRs of interest, multivariate meta-analysis would enable the combination of logPRs that were missing in those studies to borrow strength from the logPRs that had been observed [22]. To illustrate this, we re-analysed the glioma data, but with one of the studies artificially truncated by removing estimated logPRs for k = 0.5, … ,0.1 for that study. We did this for each study in turn and analysed each truncated dataset using both univariate and multivariate meta-analyses with bootstrapping in stage 1 (we chose to use the bootstrapping method here because it gives better results than the asymptotic method in the simulation results of Section 3). For the multivariate meta-analyses, logPRs at levels k = 0.7,0.6,0.5 and 0.4 were combined simultaneously. Results for combined median ratios are presented in Table 4. Recall from Table 2 that for the full dataset the estimated combined median ratio from the univariate analysis was 0.111 with standard error 0.049. For each truncated study, the multivariate median ratio estimates are closer to the full data estimate than the univariate estimates. This is because data from the truncated studies did not contribute to the univariate meta-analyses because estimated median ratios were not available, whereas the multivariate meta-analyses incorporated information from logPR estimates at k = 0.7 and 0.6 for those studies. Estimated standard errors in Table 4 are lower in the multivariate meta-analyses for over half the truncated studies. For studies 2, 13 and 17, the univariately estimated standard errors are lower compared with the estimate from the full dataset because the removal of a study has reduced the estimated heterogeneity in the data. In these cases, the multivariate analysis would give a more conservative estimate for the standard error.

Table 4. Meta-analysis results for the glioma data with truncated follow-up in one of the studies using bootstrap variance estimation in stage 1 and univariate and multivariate meta-analyses in stage 2.
StudynAnalysisinline image (SE)p-valueinline image
  1. Multivariate meta-analyses combined logPRs simultaneously for k = 0.7, 0.6, 0.5, 0.4.
220Univariate0.101 (0.048)0.0340.045
  Multivariate0.112 (0.051)0.0280.073
7105Univariate0.103 (0.053)0.0510.060
  Multivariate0.111 (0.052)0.0320.072
9116Univariate0.126 (0.054)0.0190.058
  Multivariate0.122 (0.052)0.0190.071
11511Univariate0.126 (0.060)0.0340.081
  Multivariate0.106 (0.055)0.0530.084
1391Univariate0.083 (0.045)0.0650.000
  Multivariate0.115 (0.054)0.0330.076
16125Univariate0.115 (0.052)0.0280.062
  Multivariate0.116 (0.053)0.0280.077
17270Univariate0.082 (0.048)0.0850.000
  Multivariate0.111 (0.054)0.0390.071
18674Univariate0.137 (0.055)0.0120.054
  Multivariate0.128 (0.050)0.0100.058
19235Univariate0.130 (0.056)0.0210.068
  Multivariate0.121 (0.052)0.0210.071

5 Discussion

Meta-analyses of survival data usually focus on the estimation of constant hazard ratios. If an assumption of proportional hazards seems inappropriate, then models that incorporate time-varying hazards could be used. However, it may be preferable to consider alternative measures of treatment effect, which are more readily interpretable. We have proposed a novel, two-stage method for meta-analysis of survival IPD using percentile ratios. The advantage of our method lies in the avoidance of distributional assumptions at the study level, which makes it suitable to use when proportional hazards or accelerated failure time assumptions are inappropriate.

Stage 1 of the proposed meta-analysis involves estimation of logPRs and their variance–covariance matrix from Kaplan–Meier estimates of the survivor function. We have presented two methods for estimation of the variance–covariance matrix, an asymptotic method and a bootstrap method. Results from the simulation study of Section 3 suggest that the bootstrap method has the superior coverage properties, and we therefore recommend its use for variance estimation. If bootstrapping is too computationally intensive, the asymptotic method could be used to estimate mid-range logPRs for sufficiently large studies.

In stage 2, logPRs can be combined using either univariate or multivariate meta-analysis. When logPR estimates are available for all percentile levels of interest, we recommend the use of univariate meta-analysis because multivariate meta-analysis can lead to inflated estimates of between-studies variances. However, when some studies lack sufficient data to estimate all logPRs, then a multivariate meta-analysis may be more appropriate to allow combined logPR estimates to borrow strength across percentile levels. We have not yet explored the possibility of constraining the between-studies variance–covariance matrix Σ to have, for example, an auto-correlated form; we leave this for future work.

Our method could be adapted to allow either parametric or non-parametric methods to be used in stage 1 of a two-stage meta-analysis. Where appropriate, study-level data could be modelled using parametric distributions, with the non-parametric approach being used when distributional assumptions appear unjustified. Alternatively, the non-parametric method could be extended to estimate other quantities of interest from survival curves, such as percentile differences, ratios of survival probabilities at a given time, or differences in survival probabilities. Further work would be required to investigate the asymptotic properties of these alternative measures of treatment effect.

Our focus has been on methods for meta-analysis of IPD. Our approach could also be used in the meta-analysis of aggregate data if methods were available for the extraction of IPD from published survival curves. To reconstruct the individual patient data, a model would be required for the censoring mechanism. Parmar et al. [23], Williamson et al. [24], Ouwens et al. [7] and Fiocco et al. [25] have recently made progress in this area.

Appendix A: Derivation of variance–covariance formulae

A.1 Derivation of variance formula

We derive the formula (5), following the method described in [26], by asymptotic expansion of the survival function

display math(14)

Rearranging and using f(t) = − dS ∕ dt, we get

display math(15)

By the delta method

display math(16)

and using equation (15) we obtain

display math(17)

A.2 Derivation of covariance formula

Let t(1), … ,t(N) be the N observed event times for the study in question and let inline image be the estimated probability of surviving through the time interval that begins at t(m). Then the estimated survivor function at time t is given by

display math(18)

where t(r) ≤ t < t(r + 1). Suppose t1 ≤ t2, then

display math(19)

where we have used the assumption that inline image's corresponding to different time intervals are independent. Then, by using asymptotic expansions of inline image in both sides of (19) we obtain

display math(20)

Then, using similar calculations to those described in section A.1, and approximating inline image by (20), we obtain

display math(21)

Appendix B: Presmoothed density function estimation

A presmoothed kernel density estimator for censored data was introduced by Cao and Jacome in [27]. See also [28] for an extension to left-truncated data.

Let Zi be the ordered observation times, i = 1, … ,n, with Zi = min(Ti,Ci), where Ti is the event time and Ci is the censoring time. Let δi be the censoring indicator with δi = 1 if the observation is uncensored and δi = 0 otherwise. Then the Kaplan-Meier estimate inline image of the probability distribution function F(t) of T is given by

display math(22)

An estimate inline image for the probability density function f(t) can be derived from inline image using the standard kernel density estimation procedure due to Rosenblatt [29] and Parzen [30]:

display math(23)

where K(x) is a kernel and Ks(x) = s − 1K(x ∕ s) is the rescaled kernel with bandwidth s.

Let m(t) be the conditional probability of uncensoring, m(t) = E(δ | Z = t). The function m(t) can be estimated non-parametrically using the Nadaraya–Watson estimator with kernel K and bandwidth b

display math(24)

Replacing the censoring indicator δi in (22) with inline image leads to the presmoothed estimate inline image of F(t):

display math(25)

Note that the presmoothed estimate inline image is a step-like function of t with jumps at all values of Zi. It has been shown to be more efficient than the Kaplan-Meier version inline image for a suitable choice of the bandwidth b. Analogous to equation (23), a corresponding presmoothed estimate inline image of f(t) is then

display math(26)

Note that the estimate inline image depends on two bandwidths; the presmoothing bandwidth b, used in the estimation of m(t), and the smoothing parameter s, used to estimate f(t) from inline image.

Efficiency of the presmoothed estimator inline image relies on selection of appropriate bandwidths b and s. In [31] two approaches to bandwidth selection are described, both of which have been implemented in the presmooth package. The first approach, termed ’plug-in’ bandwidth selection, minimises an asymptotic expansion of the mean integrated squared error (MISE). The second approach, ’bootstrap’ bandwidth selection, minimises a bootstrap version of the MISE. See [31] for details.


This work was supported by the Medical Research Council [grant numbers G0902100 and G0600650, Unit Programme numbers U105261167 and U105285807]. We would like to thank Simon Thompson for useful comments. We would also like to thank the Glioma Meta-analysis Trialists Group who brought the individual patient data together for the original meta-analysis and to thank all the trial groups for their permission to use data from their trials (Alberta Brain Tumor Group, USA; EORTC Headquarters, Brussels, Belgium and EORTC Brain Tumour Group - EORTC 26751, 26812, 26882; Istituto Nazionale Tumori, Milan Italy; Medical Research Council, UK - MRC BR05; National Institute of Neurosurgery, Budapest, Hungary; Neurological Departments of Cracow, Warsaw and Lublin Medical Academies; Radiotherapy Oncology Group, USA - RTOG 7401). The contents of this publication and the methods used are, however, the sole responsibility of the authors and do not necessarily represent the official views of the trial groups supplying data for the meta-analysis.