Keywords:

  • assurance;
  • elicitation;
  • prior distribution;
  • power;
  • sample size;
  • survival analysis

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Assurance and sample size
  5. Assurance calculations for parametric survival models
  6. Assurance calculation for proportional hazard and nonparametric survivor function models
  7. Numerical examples
  8. Summary
  9. Acknowledgements
  10. References

We consider the use of the assurance method in clinical trial planning. In the assurance method, which is an alternative to a power calculation, we calculate the probability of a clinical trial resulting in a successful outcome, via eliciting a prior probability distribution about the relevant treatment effect. This is typically a hybrid Bayesian-frequentist procedure, in that it is usually assumed that the trial data will be analysed using a frequentist hypothesis test, so that the prior distribution is only used to calculate the probability of observing the desired outcome in the frequentist test. We argue that assessing the probability of a successful clinical trial is a useful part of the trial planning process. We develop assurance methods to accommodate survival outcome measures, assuming both parametric and nonparametric models. We also develop prior elicitation procedures for each survival model so that the assurance calculations can be performed more easily and reliably. We have made free software available for implementing our methods. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.

Introduction

Sample size determination is an important part of clinical trial design and conventionally involves power calculations. However, the power of a trial does not necessarily give the probability of the trial demonstrating a treatment effect, as the true treatment effect may be different to that assumed in the power calculation. Several authors have proposed a hybrid classical-Bayesian approach for assessing the probability of a successful trial, given the sample size only, which can then be used to inform sample size decisions.

The hybrid method was first considered by Spiegelhalter and Freedman [1]. They constructed an unconditional probability of having a desired outcome and called this unconditional probability the average power. O'Hagan and Stevens [2] used this method for choosing sample sizes for clinical trials of cost-effectiveness. They referred to the unconditional probability of a successful trial as the ‘assurance’ of the trial, and we use this term here. O'Hagan et al. [3] extended assurance methods to two-sided testing and equivalence trials, covering the use of non-conjugate prior distributions for uncertain parameters. Chuang-Stein [4] discussed the difference between traditional power calculations and assurance calculations to determine sample sizes, giving an example of planning the next trial based on the results of an early trial. Chuang-Stein and Yang [5] reviewed the concept of assurance and illustrated its use when planning phase III trials. They also applied assurance to study designs when re-estimating a sample size based on an interim analysis.

An assurance calculation requires a prior distribution for the treatment effect, but does not necessarily involve a Bayesian analysis of the trial data. The method of analysis, and in particular the criteria for which the trial is deemed a ‘success’, are determined externally, for example, by a regulator. Once the criteria have been specified, a prior distribution is used to assess the probability that these criteria will be met. Typically, the prior distribution will only be used in the design stage and not the analysis. At the design stage, the risk of trial failure is primarily the trial sponsor's, and so it should be uncontroversial for a trial sponsor to use all their prior knowledge in assessing such a risk.

We consider clinical trials in which the endpoint of interest is a survival time. For time-to-event outcome measures, power and sample size calculations have been well studied under various model assumptions. For example, Schoenfeld and Richter [6] developed a power function with a limited recruitment period and a pre-specified follow-up period under the assumption that the survival times in each treatment group follow exponential distributions and patients enter the trial uniformly. Gross and Clark [7] provided a method of calculating sample size by assuming that the sample mean survival time is approximately normally distributed under Weibull models for the survival times. Freedman [8] and Schoenfeld [9] derived sample size formulae under the assumption of proportional hazards based on asymptotic properties of the logrank statistic.

Little has been done in calculating assurance for survival endpoints. Assuming proportional hazards, Spiegelhalter et al. [10] derived an assurance formula in the case of equal allocation and follow-up. The only uncertain variable considered was the log hazard ratio, and a normal prior was assumed. In this paper, we extend assurance calculations to accommodate both parametric and proportional hazards models. Under proportional hazards models, we derive an assurance formula assuming uniform patient entry over a limited recruitment period. We consider uncertainty in both the log hazard ratio and the baseline survivor function.

In Section 2, we review how assurance is calculated to determine the unconditional probability of having a desired outcome. In Section 3, we derive assurance calculations for exponential and Weibull survival models and describe the elicitation methods for the required prior distributions. In Section 4, we extend assurance calculations to accommodate proportional hazards models, considering uncertainty in both treatment effect and baseline survivor function. We also describe the procedure of generating the baseline survivor function. Examples are given in Section 5.

Assurance and sample size

We now review the concept of assurance. Suppose that a randomised controlled trial is to be conducted to compare an experimental treatment and a standard treatment for a particular disease. A hypothesis test is to be carried out to test the null hypothesis that the treatment effect θ = 0 against the alternative that θ ≠ 0. On the basis of a power calculation, the sample size is chosen to solve

  • P(Reject H0 | θ = θA) = π*    (1)

for some desired probability π*.

The power of the test P(Reject H0 | θ = θA) provides the probability of successfully rejecting the null hypothesis if the true value of θ is the specified θA. As the true value of θ may be very different to θA, the actual probability of successfully rejecting the null hypothesis may be very different to the power.

Assurance is the unconditional probability that the trial will end with the desired outcome, which we derive via

  • γ = ∫ P(Reject H0 | θ) f(θ) dθ    (2)

where f(θ) is the prior distribution for the true treatment effect θ. If a successful trial simply corresponds to rejecting a null hypothesis of no treatment effect, then the assurance in (2) can be thought of as an expected power (interpreting θA in (1) as the true value of the treatment effect, rather than some minimum clinically relevant difference).

If our desired outcome is to reject the null hypothesis with data favouring the experimental treatment, then assurance is given by

  • display math

with inline image indicating that the data favour the experimental treatment. We again emphasise that specifying what constitutes a ‘successful trial’ is not part of the assurance method; the criteria determining a successful trial are set externally, and the idea of assurance is to use prior information to determine the probability that these criteria are met.

The power of a clinical trial can, in theory, be made as large as desired by increasing the sample size. The same does not hold for assurance. For a large enough sample size, we will ‘observe’ the true treatment effect, so that the assurance converges to the prior probability that the new treatment is suitably effective. If this prior probability is low, no trial will have a high assurance of success; we cannot ‘beat the prior’.
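This limiting behaviour can be checked numerically. The Python sketch below is illustrative only and is not part of the supporting software. It assumes a two-arm trial with a normally distributed outcome of known standard deviation, n patients per arm, a normal prior for θ, and a 'successful' trial defined as a significant two-sided test with the estimate favouring the experimental treatment (taken here as an estimate greater than zero). All function names and numerical values are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(theta, n, sigma=1.0, alpha=0.05):
    """P(significant result with estimate > 0 | theta), assuming a z-test."""
    se = sigma * (2.0 / n) ** 0.5            # standard error of the estimated difference
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)    # two-sided critical value
    return STD_NORMAL.cdf(theta / se - z)


def assurance(n, prior_mean, prior_sd, draws=20000, seed=1):
    """Monte Carlo average of the power over a normal prior for theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        theta = rng.gauss(prior_mean, prior_sd)
        total += power(theta, n)
    return total / draws


if __name__ == "__main__":
    prior_mean, prior_sd = 0.2, 0.05 ** 0.5   # hypothetical prior for theta
    prior_prob = 1 - NormalDist(prior_mean, prior_sd).cdf(0.0)
    for n in (50, 500, 5000, 500000):
        print(n, round(power(0.2, n), 3),
              round(assurance(n, prior_mean, prior_sd), 3))
    print("prior P(theta > 0):", round(prior_prob, 3))
```

Under these assumptions the power at θA = 0.2 approaches 1 as n grows, whereas the assurance levels off near the prior probability that θ > 0.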

We restrict the scope of this paper to frequentist methods for analysing the trial data, but Bayesian methods could be used and are discussed in depth by Ibrahim et al. [11] and Christensen et al. [12]. (In particular, Ibrahim et al. [11] discussed Bayesian inference for all the survival models considered in this paper.) In this case, there may be a distinction between the prior used in the design stage, and the prior used in the analysis stage, if the regulator is not willing to accept the trial sponsor's prior.

Assurance calculations for parametric survival models

We now suppose that, in each of two treatment groups, the outcome variable for each patient is the survival time to some event and consider exponential and Weibull models for the survival times. For each model, we first choose the analysis method, and hence the criteria for a successful trial. We then consider assurance calculations and elicitation methods for the required prior distributions.

Exponential distribution

We first suppose that the survival times in each treatment group follow an exponential distribution, with hazard rates λ1 and λ2 ( i = 1 for the control group and i = 2 for the experimental group) and allow for a limited recruitment period from time 0 to R with uniform patient entry and T as the total trial length. The time origin for survival time is when a patient enters the trial, not when the trial starts. Here, we consider the analysis method based on Schoenfeld and Richter [6].

The null hypothesis is m1 ∕ m2 = 1, where mi is the median survival time in group i, against the alternative m1 ∕ m2 = ϕ, where ϕ is the minimum clinically important difference. Note that assuming an exponential model, the hypotheses stated earlier are equivalent to H0 : θ = 0 versus H1 : θ ≠ 0, where

  • display math(3)

The test statistic is

  • display math

where di is the number of events in group i and θ̂ is the maximum likelihood estimate of θ.

On the basis of the asymptotic properties of the test statistic, the power πE of a 100α% two-sided test is

  • display math(4)

where Ni is the number of patients in group i and Pie is the probability of an individual patient in group i experiencing the outcome event during the trial. Schoenfeld and Richter [6] derived an exact formula for Pie:

  • display math(5)

The assurance of rejecting the null hypothesis with data favouring the experimental treatment is

  • display math(6)

where θ and Pie are functions of λ1 and λ2.

Constructing the priors

From (6), we see that a joint prior is required for λ1 and λ2. Kadane and Wolfson [13] argued that it is better to ask for opinion about observable quantities rather than parameters in statistical models, and we follow their advice here. We elicit f(λ1,λ2) via judgements about survival rates at some specified time.

To construct a prior for each parameter, first note that

  • λi = −log Si(t0) ∕ t0, for i = 1,2,    (7)

with Si(t0) the survival rate for group i at time t0. Hence, to elicit the joint prior distribution, we first elicit judgements about S1(t0) instead of eliciting beliefs about λ1 directly. An expert may judge S1(t0) to be informative for S2(t0), so that λ1 and λ2 are not independent. To elicit this dependence, we propose to elicit judgements about the difference, ρ = S2(t0) − S1(t0), and assume that ρ is independent of S1(t0).

Methods for eliciting univariate distributions are given in Sections 6.3 and 6.4 of O'Hagan et al. [14] and can be implemented using the freely available SHELF package [15] and the MATCH online elicitation tool available at http://optics.eee.nottingham.ac.uk/match/uncertainty.php. See also Johnson et al. [16] for a systematic review of elicitation methods.

One option is to elicit a beta distribution for S1(t0) and a normal distribution for ρ, truncating the normal prior if necessary to ensure S2(t0) ∈ (0,1). (An alternative would be to use a shifted and scaled beta distribution for ρ, although we have not found the need to truncate to cause significant computational problems.)

For illustration, we describe a ‘trial roulette’ method proposed by Gore [17] to elicit a normal prior for ρ. The method is based on the fixed interval approach, in which the expert is asked to provide a probability that the unknown quantity of interest will fall in a pre-fixed interval. Using the SHELF package, the facilitator, who conducts the elicitation, first elicits from the expert the lower and upper bounds of the range of plausible values for ρ. The facilitator then divides the range from the lower bound to the upper bound into 10 equal-width ‘bins’. The expert is asked to specify his or her probability of ρ lying in a particular bin by placing ‘chips’ in that bin, with the proportion of chips allocated representing the probability. The number of chips given to the expert is specified by the facilitator. For example, if in total 20 chips are used, then each chip represents a probability of 0.05. The trial roulette method is simple to use and provides the expert with an immediate display of the elicited judgements.

A parametric distribution can be fitted to the elicited judgements using a least squares procedure: the parameters are chosen to make the fitted probabilities as close as possible to the elicited probabilities. Feedback should be provided to the expert for checking the adequacy of the elicited distribution.
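As an illustration of the least squares idea, the following Python sketch fits a normal distribution to a set of roulette judgements by matching the normal distribution function to the elicited cumulative probabilities at the bin edges. It is a minimal sketch under our own assumptions: the function name, the refined grid search and the example chip allocation are hypothetical, and the fitting procedure used in SHELF may differ in detail.

```python
from statistics import NormalDist


def fit_normal_to_roulette(lower, upper, chips, rounds=6, grid=41):
    """Least squares fit of a normal distribution to roulette (chip) judgements.

    The normal CDF is matched to the elicited cumulative probabilities at the
    bin edges, using a grid search that is refined `rounds` times.
    """
    n_bins = len(chips)
    total = sum(chips)
    width = (upper - lower) / n_bins
    edges = [lower + k * width for k in range(n_bins + 1)]
    elicited = [sum(chips[:k]) / total for k in range(n_bins + 1)]

    def loss(mean, sd):
        dist = NormalDist(mean, sd)
        return sum((dist.cdf(x) - p) ** 2 for x, p in zip(edges, elicited))

    def linspace(a, b, m):
        return [a + (b - a) * i / (m - 1) for i in range(m)]

    mean_lo, mean_hi = lower, upper
    sd_lo, sd_hi = (upper - lower) / 1000.0, upper - lower
    best_mean, best_sd = None, None
    for _ in range(rounds):
        candidates = [(loss(m, s), m, s)
                      for m in linspace(mean_lo, mean_hi, grid)
                      for s in linspace(sd_lo, sd_hi, grid)]
        _, best_mean, best_sd = min(candidates)
        # Shrink the search box around the current best pair.
        mean_step = (mean_hi - mean_lo) / (grid - 1)
        sd_step = (sd_hi - sd_lo) / (grid - 1)
        mean_lo, mean_hi = best_mean - mean_step, best_mean + mean_step
        sd_lo, sd_hi = max(best_sd - sd_step, 1e-9), best_sd + sd_step
    return best_mean, best_sd


if __name__ == "__main__":
    # Hypothetical elicitation: 20 chips over 10 bins covering (0, 0.4).
    chips = [0, 0, 1, 3, 6, 6, 3, 1, 0, 0]
    print(fit_normal_to_roulette(0.0, 0.4, chips))
```

The fitted distribution function can then be shown to the expert alongside the chip allocation as feedback.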

One illustration of the use of elicitation in clinical trials is given by Parmar et al. [18], who used a questionnaire to elicit the log hazard ratio given in (3) by eliciting a point estimate for S1(t0) and a prior distribution for ρ using the roulette method. Tan et al. [19] and Hiance et al. [20] both adapted this questionnaire and used it for a phase III trial. Our elicitation process is more complicated, as we consider uncertainty in both S1(t0) and ρ.

Given the elicited distributions f(S1(t0)) and f(ρ), we estimate γE using Monte Carlo simulation:

  • display math(8)

where θ(j) and Pie(j), for i = 1,2, are obtained by the following steps.

  1. Simulate S1(j)(t0) from the elicited prior distribution f(S1(t0)).

  2. Simulate ρ(j) from the elicited prior distribution f(ρ) and calculate S2(j)(t0) = S1(j)(t0) + ρ(j).

  3. Calculate λ1(j) and λ2(j) using (7).

  4. Calculate θ(j) using (3).

  5. Calculate P1e(j) and P2e(j) using (5).

The process is computationally quick, so we can make M very large to ensure convergence.
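The simulation steps above can be sketched in Python as follows. This is a hedged illustration rather than the supporting software. It assumes equal allocation, a normal approximation in which the variance of the estimated log hazard ratio is 1 ∕ (N1P1e) + 1 ∕ (N2P2e), the sign convention θ = log(λ1 ∕ λ2) so that positive values favour the experimental treatment, and a normal prior for ρ truncated by rejection so that S2(t0) lies in (0,1). The event probability is the closed form that follows from exponential survival and uniform entry. These choices may differ in detail from Equations (3) to (6), and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def event_probability(lam, recruit, total):
    """P(event during the trial) for hazard `lam`, uniform entry on
    [0, recruit] and total trial length `total`."""
    no_event = (math.exp(-lam * (total - recruit))
                - math.exp(-lam * total)) / (lam * recruit)
    return max(1.0 - no_event, 1e-12)   # guard against rounding for tiny hazards


def exponential_assurance(n_per_group, a_s, b_s, rho_mean, rho_var, t0=5.0,
                          recruit=3.0, total=5.0, alpha=0.05,
                          draws=20000, seed=1):
    """Monte Carlo estimate of the assurance under the assumptions above."""
    rng = random.Random(seed)
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    rho_sd = math.sqrt(rho_var)
    acc = 0.0
    for _ in range(draws):
        s1 = rng.betavariate(a_s, b_s)                 # step 1
        while True:                                     # step 2 (truncated normal)
            s2 = s1 + rng.gauss(rho_mean, rho_sd)
            if 0.0 < s2 < 1.0:
                break
        lam1 = -math.log(s1) / t0                       # step 3
        lam2 = -math.log(s2) / t0
        theta = math.log(lam1 / lam2)                   # step 4 (assumed sign)
        p1 = event_probability(lam1, recruit, total)    # step 5
        p2 = event_probability(lam2, recruit, total)
        se = math.sqrt(1.0 / (n_per_group * p1) + 1.0 / (n_per_group * p2))
        acc += STD_NORMAL.cdf(theta / se - z)           # success given this draw
    return acc / draws


if __name__ == "__main__":
    # Hypothetical priors: S1(5) ~ Beta(60, 40) and rho ~ N(0.2, 0.001).
    for n in (50, 100, 200, 400):
        print(n, round(exponential_assurance(n, 60, 40, 0.2, 0.001), 3))
```

Averaging the conditional probability of success, rather than simulating trial data for each draw, keeps the Monte Carlo error small for a given M.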

Supporting software

We have made available software to implement the methods in this paper. The software can be downloaded from www.jeremy-oakley.staff.shef.ac.uk/assurance.zip. For the exponential case, we have written an interactive elicitation tool for computing assurance. The code is written in R [21] and uses the rpanel package of Bowman and Crawford [22] and the tkrplot package of Tierney [23] to provide interactive graphics. The tool helps to elicit the prior distributions of the baseline survival rate S1(t0) at a specified time t0, and the survival difference ρ between the experimental group and the control group at time t0, using the trial roulette method. The tool also provides feedback to check the adequacy of the elicited distributions. Once the priors are specified, the tool draws both power and assurance curves for the corresponding elicited distributions. Users can see immediately how changes in the elicited beliefs affect the assurance.

Weibull distribution

We now suppose that the survival times of patients receiving the standard and experimental treatment follow Weibull distributions, with scale parameters λ1, λ2 and shape parameters κ1, κ2, respectively. The probability density function of the Weibull distribution in each treatment group is

  • display math

for i = 1,2. The method of analysis that we consider here is to compare mean survival times for each group. We assume that the sample mean survival times are approximately normally distributed: x̄i ∼ N(μi, σi² ∕ Ni), with Ni the number of patients in group i ( i = 1 for the control group, i = 2 for the experimental group).

Gross and Clark [7] derived a power function of a 100α% two-sided test of the null hypothesis that the mean survival times are the same, μ1 = μ2, against the alternative μ1 ≠ μ2. They used the test statistic

  • display math

The power formula is

  • display math(9)

where μi and σi², for i = 1,2, are expressed by

  • display math(10)
  • display math(11)

As the variance parameters σ1² and σ2² are unknown, we switch to using a two-sample t-test in the assurance calculation. The assurance of rejecting the null hypothesis with data favouring the experimental treatment is given by

  • display math(12)

where T1 − α ∕ 2;ν is the 100 × (1 − α ∕ 2) percentile from the t-distribution with degrees of freedom ν calculated according to Welch's t-test. As in the exponential case, we estimate this integral using Monte Carlo simulation.

Constructing the priors

To derive the assurance, a joint prior distribution for κ1,κ2,λ1 and λ2 is needed. Clearly, making judgements directly about these parameters would be too difficult, so we again construct the priors from judgments about survival rates.

Several authors have presented methods for eliciting an expert's opinion for a single Weibull distribution. In Singpurwalla [24], beliefs about the median survival time and shape parameter κ are elicited. Berger and Sun [25] and Kaminskiy and Krivtsov [26] both considered a predictive approach, in which survival rates at two specified times are elicited. We consider a similar approach, allowing for the possibility of dependence between the two uncertain survival distributions.

For each group, the shape and scale parameters can be estimated using the survival rate after two periods. Let Si(t0) and Si(t ′ 0) be the survival rates at time t0 and t ′ 0, where t ′ 0 > t0 without loss of generality. The Weibull parameters are derived from

  • display math(13)
  • display math(14)
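To make the mapping from two survival rates to the Weibull parameters concrete, the following Python sketch solves for the shape and scale under an assumed parameterisation S(t) = exp(−(t ∕ scale)^shape). This parameterisation is our assumption for illustration and may not match the one used in Equations (13) and (14); the function names are hypothetical. The example values are the control-group survival rates used in the Weibull numerical example (20% at 1 year and 10% at 2 years).

```python
import math


def weibull_from_survival(t0, s0, t1, s1):
    """Shape and scale of S(t) = exp(-(t / scale) ** shape) passing through
    (t0, s0) and (t1, s1), with t1 > t0 > 0 and 1 > s0 > s1 > 0."""
    shape = ((math.log(-math.log(s1)) - math.log(-math.log(s0)))
             / (math.log(t1) - math.log(t0)))
    scale = t0 / (-math.log(s0)) ** (1.0 / shape)
    return shape, scale


def weibull_survival(t, shape, scale):
    """Survivor function under the assumed parameterisation."""
    return math.exp(-((t / scale) ** shape))


if __name__ == "__main__":
    shape, scale = weibull_from_survival(1.0, 0.2, 2.0, 0.1)
    print(shape, scale)
    # Round trip: the fitted curve reproduces the two elicited survival rates.
    print(weibull_survival(1.0, shape, scale), weibull_survival(2.0, shape, scale))
```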

We could elicit the prior distribution for κi and λi by eliciting an expert's opinion about Si(t0) and Si(t ′ 0) and then applying (13) and (14), but the expert may judge that Si(t0) and Si(t ′ 0) are dependent. Instead, we suggest eliciting beliefs about the following four observable quantities (assuming independence):

  • display math(15)
  • display math(16)
  • display math(17)

and this will induce a joint prior for κ1, κ2, λ1 and λ2. Another option is to elicit judgments about odds ratios instead of differences.

If beliefs about the differences between two survival rates are elicited, one option is to elicit a beta distribution for S1(t0), a normal prior for δ12, and beta priors for δ11 and δ22. It may be necessary to truncate the priors to ensure that S1(t ′ 0), S2(t0) and S2(t ′ 0) all lie in the range (0,1), but this is unlikely to be a significant computational problem. If the odds ratios are the uncertain quantities of interest, we could again elicit a beta distribution for S1(t0), and lognormal distributions for the odds ratios.

We estimate γW using Monte Carlo simulation:

  • display math(18)

where I() is the indicator function, and the simulation procedure for each j is as follows.

  1. Simulate S1(j)(t0), δ11(j), δ12(j) and δ22(j) from their elicited prior distributions.

  2. Calculate S1(j)(t ′ 0), S2(j)(t0) and S2(j)(t ′ 0) from the sampled values in step 1 using Equations (15)–(17).

  3. Calculate κi(j) and λi(j) for i = 1,2, using Equations (13) and (14).

  4. Simulate Ni survival times for each group i = 1,2, from Weibull distributions with the parameter values calculated in step 3.

  5. Calculate the sample mean and sample variance of the simulated survival times in each group i = 1,2, from the simulated data in step 4.

Again, the process is computationally quick, so M can be chosen to be very large to ensure convergence. The R code to compute the assurance for the Weibull model is also available in the supporting software described in Section 3.2.

Priors based on historical data

Suitable historical data may be available for informing the prior distributions, particularly with regard to the control arm of a trial. We could then derive a posterior distribution given the historical data (following the approaches described by Ibrahim et al. [11]), which could be used as the prior in the assurance calculation. For example, in the exponential case, the distribution for the rate parameter λ1 in the control arm could be based on the historical data (and perhaps a noninformative prior), and we would then only elicit prior judgements about the difference between the treatments ρ = S2(t0) − S1(t0).

Ibrahim et al. [11] also described the use of ‘power priors’. This also involves deriving a posterior distribution given the historical data, but a posterior in which the likelihood function is downweighted, by raising it to a power between 0 and 1. The downweighting may be used to reflect differences between the study populations in the historical and new trials.
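For the exponential case, the effect of the downweighting can be seen in a conjugate sketch. The Python code below assumes a gamma initial prior for the control hazard λ1 and historical data summarised by the number of events and the total follow-up time; raising the exponential likelihood to a power a0 then scales both summaries by a0. The gamma prior, the function names and the historical figures are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


def gamma_power_prior(a0, prior_shape, prior_rate, events, exposure):
    """Gamma(shape, rate) distribution for an exponential hazard after the
    historical likelihood lam**events * exp(-lam * exposure) is raised to a0."""
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie between 0 and 1")
    return prior_shape + a0 * events, prior_rate + a0 * exposure


def sample_control_survival(shape, rate, t0, draws, seed=1):
    """Draw S1(t0) = exp(-lam1 * t0) with lam1 ~ Gamma(shape, rate)."""
    rng = random.Random(seed)
    # random.gammavariate takes a scale parameter, so pass 1 / rate.
    return [math.exp(-rng.gammavariate(shape, 1.0 / rate) * t0)
            for _ in range(draws)]


if __name__ == "__main__":
    # Hypothetical historical control data: 40 events in 350 patient-years.
    for a0 in (0.0, 0.5, 1.0):
        shape, rate = gamma_power_prior(a0, 0.5, 1.0, 40, 350.0)
        draws = sample_control_survival(shape, rate, 5.0, 5000)
        print(a0, shape, rate, round(sum(draws) / len(draws), 3))
```

Setting a0 = 0 discards the historical data and a0 = 1 uses them at full weight; the sampled S1(t0) values could then replace draws from an elicited prior in the assurance calculation.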

Assurance calculation for proportional hazard and nonparametric survivor function models

We now consider a proportional hazards model, with no parametric assumption about the underlying survivor functions, for a two-arm trial with uniform patient entry during the recruitment period 0 to R and total study length T. We suppose that the trial results will be analysed with a logrank test. Let hi and Si denote the hazard and survivor function for treatment group i ( i = 1 for the control group and i = 2 for the experimental group), respectively, with ϕ the hazard ratio h2(t) ∕ h1(t). A two-sided 100α% logrank test is performed to test the null hypothesis, H0, that the log hazard ratio, θ = log(ϕ), is zero against the alternative θ ≠ 0.

Assuming equal number of patients per treatment group, the power formula is

  • display math

where d is the total number of events in the trial. Spiegelhalter et al. [10] derived an exact assurance formula assuming a normal prior N(m,v) for θ:

  • display math

This assumes that all patients are monitored until the outcome event with no limitation of the length of the trial. We extend this formula to allow for limited recruitment and follow-up periods.
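A closed form of this kind can be derived directly from the usual normal approximation. The Python sketch below is illustrative and makes its own assumptions: the estimated log hazard ratio is taken to be normally distributed around θ with variance 4 ∕ d, benefit corresponds to θ < 0 (since θ is the log of h2(t) ∕ h1(t)), and success means a significant two-sided test with the estimate favouring the experimental treatment. It may therefore differ in detail from the formula in [10]. The function names and example values are hypothetical, and a Monte Carlo average is included as a check.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def logrank_power(theta, d, alpha=0.05):
    """Approximate P(significant result favouring the experimental arm | theta)."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(-math.sqrt(d) * theta / 2.0 - z)


def logrank_assurance(d, m, v, alpha=0.05):
    """Closed-form average of logrank_power over theta ~ N(m, v)."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se2 = 4.0 / d                      # assumed variance of the estimate
    return STD_NORMAL.cdf((-z * math.sqrt(se2) - m) / math.sqrt(v + se2))


def logrank_assurance_mc(d, m, v, draws=50000, seed=1):
    """Monte Carlo check of the closed form."""
    rng = random.Random(seed)
    sd = math.sqrt(v)
    return sum(logrank_power(rng.gauss(m, sd), d) for _ in range(draws)) / draws


if __name__ == "__main__":
    d, m, v = 200, -0.3, 0.04          # hypothetical number of events and prior
    print(logrank_assurance(d, m, v), logrank_assurance_mc(d, m, v))
```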

Our assurance calculation is based on the power formula derived by Schoenfeld [9]. The recruitment rate is uniform over an interval 0 to R, and there is a follow-up period to time T. The time origin for survival time is when a patient enters the trial. Given the total number of patients N, the test statistic is

  • display math

where U is the logrank statistic, Pe is the probability that an individual patient will experience the outcome event during the trial and Qi is the proportion of patients allocated to group i for i = 1,2. Let P1e and P2e denote the probabilities that a patient from treatment groups 1 and 2 will experience the outcome event during the trial, respectively. The power formula for a two-sided 100α% logrank test is

  • display math(19)

where

  • display math(20)

The assurance, considering uncertainty in θ and Pe, is

  • display math(21)

where inline image implies that the experimental treatment is better than the standard/placebo.

To compute the assurance γP, a joint prior for θ and Pe is required. Depending on whether there are data available, the elicitation procedures for Pe are different. When there are no data available about the standard treatment, we could elicit Pe directly, and the assurance is calculated using Equation (21). When data about the standard treatment are available, we will use the data to learn about S1(.) and then derive Pe from

  • display math(22)

In both cases (data available and no data available), beliefs about the log hazard ratio θ are required. The model parameter θ can be expressed in terms of the survival rates at a fixed time point t0 in each group:

  • θ = log[ log(S1(t0) + ρ) ∕ log S1(t0) ]    (23)

where ρ denotes the difference between the survival rates in the two groups at time t0 (group 2 minus group 1). We elicit opinion about the survival rates S1(t0) and ρ instead of the model parameter θ as in the case of exponential models.
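The relationship between θ and the survival rates follows from the proportional hazards identity S2(t) = S1(t)^exp(θ). The Python sketch below converts a pair (S1(t0), ρ) to θ and evaluates a power function of the form Φ(√(N Pe Q1 Q2) |θ| − z), which is the form we assume for the power formula in (19) for illustration. The function names and numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def log_hazard_ratio(s1_t0, rho):
    """theta implied by S1(t0) and rho = S2(t0) - S1(t0) under proportional hazards."""
    s2_t0 = s1_t0 + rho
    if not (0.0 < s1_t0 < 1.0 and 0.0 < s2_t0 < 1.0):
        raise ValueError("both survival rates must lie strictly between 0 and 1")
    return math.log(math.log(s2_t0) / math.log(s1_t0))


def schoenfeld_power(theta, n_total, p_event, q1=0.5, alpha=0.05):
    """Approximate power of a two-sided logrank test (assumed form)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    information = n_total * p_event * q1 * (1.0 - q1)
    return NormalDist().cdf(math.sqrt(information) * abs(theta) - z)


if __name__ == "__main__":
    theta = log_hazard_ratio(0.6, 0.2)        # S1(t0) = 0.6 and S2(t0) = 0.8
    print(theta, 0.6 ** math.exp(theta))      # second value recovers S2(t0)
    print(schoenfeld_power(theta, 300, 0.25))
```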

Constructing the priors with no data available

From (21), a joint prior distribution for θ and Pe is required. Under the proportional hazards model, P2e is given by

  • display math(24)

Hence, to elicit the joint prior distribution for θ and Pe, we can elicit beliefs about P1e, S1(t0) and ρ and then apply (24), (23) and (22). As in Section 3.1.1, we could elicit independent beta distributions for P1e and S1(t0) and an independent normal distribution for ρ. As before, a Monte Carlo simulation can be used to estimate (21).

Constructing the priors with available data

Information from a pilot study or historic data for the standard treatment may be available at the planning stage. In this section, we describe a method of incorporating both information from the data and expert opinion to obtain the final joint prior distribution for the assurance calculation.

From (22), Pe is determined by S1(.) and θ. Hence, we consider a joint distribution for (θ,S1(.)) rather than (θ,Pe). The integral in the first term of (22) can be estimated numerically, for example, using Simpson's rule:

  • display math(25)

where wk = 1,4,2,4,2, … ,4,1 for k = 1, … ,H, and H is the number of equally spaced quadrature points u1, … ,uH in the interval (T − R) to T, with H an odd number. We now just require a joint prior for (θ,S1(u1), … ,S1(uH)).
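The quadrature step can be sketched in Python as follows. The sketch assumes that the probability of an event in group i is one minus the average of Si(u) over u from (T − R) to T, that S2(u) = S1(u)^exp(θ) under proportional hazards, and that Pe is the allocation-weighted average of P1e and P2e. These are our assumed forms for (20) and (22), given for illustration only, and the function names are hypothetical.

```python
import math


def simpson_weights(h):
    """Weights 1, 4, 2, 4, ..., 4, 1 for an odd number h of quadrature points."""
    if h < 3 or h % 2 == 0:
        raise ValueError("Simpson's rule needs an odd number of points (at least 3)")
    return [1 if k in (0, h - 1) else (4 if k % 2 == 1 else 2) for k in range(h)]


def event_probability(surv, recruit):
    """1 - (1 / R) * integral of S(u) over [T - R, T], where `surv` holds S at
    equally spaced points from T - R to T."""
    h = len(surv)
    step = recruit / (h - 1)
    integral = step / 3.0 * sum(w * s for w, s in zip(simpson_weights(h), surv))
    return 1.0 - integral / recruit


def overall_event_probability(surv1, theta, recruit, q1=0.5):
    """Allocation-weighted event probability under proportional hazards."""
    surv2 = [s ** math.exp(theta) for s in surv1]
    p1 = event_probability(surv1, recruit)
    p2 = event_probability(surv2, recruit)
    return q1 * p1 + (1.0 - q1) * p2


if __name__ == "__main__":
    recruit, total, lam = 3.0, 5.0, 0.102      # hypothetical exponential baseline
    grid = [total - recruit + recruit * k / 10 for k in range(11)]
    surv1 = [math.exp(-lam * u) for u in grid]
    print(event_probability(surv1, recruit))
    print(overall_event_probability(surv1, -0.8, recruit))
```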

Taking into account both uncertainty in the log hazard ratio θ and baseline survival rates S1(uk) for k = 1, … ,H, the (approximate) assurance inline image is

  • display math(26)

We suppose that survival data (with right censoring at J distinct times, τ1, … ,τJ) are available for the standard treatment, and so consider inference for S1(u1), … ,S1(uH) using a Dirichlet distribution, as in Susarla and Ryzin [27]. To handle censoring, we use a Gibbs sampling approach, as suggested by Kuo and Smith [28], to generate (S1(u1), … ,S1(uH)) from its posterior distribution, which is another Dirichlet distribution. See also Ibrahim et al. [11] for further discussion of inference for survivor functions using Dirichlet distributions.

The problem with censored data is that we do not observe the exact event times. Hence, in the Gibbs sampler, we firstly simulate event times for each censored observation and then update the prior distribution using these simulated times and the observed uncensored event times.

We now describe the procedure for simulating (S1(u1), … ,S1(uH)) using the Gibbs sampling approach. The first step is to partition the sample space [(T − R),T] into (H + J) subintervals according to the censored times and the quadrature points in Simpson's rule. The uncertain quantities of interest are S1(u1), … ,S1(uH), and the nuisance parameters are S1(τ1), … ,S1(τJ). To simplify the notation, we define

  • display math

where tj is the jth smallest value in the set {u1, … ,uH,τ1, … ,τJ}.

We define a vector of probabilities p1:H + J + 1 = (p1,p2, … ,pH + J + 1) of an event occurring in each subinterval:

  • display math

We consider a Dirichlet process prior for S1(.) with parameter function α, of the form α([t, ∞ )) = c0G(t). The function G(.) represents the beliefs about the shape of S1(.). The precision parameter c0 is a positive real number, and it measures how much weight to put on these prior beliefs. The prior distribution of p1:H + J + 1 is a Dirichlet distribution:

  • display math

In the Gibbs sampler, we iterate between sampling event times for the censored data conditional on the probabilities p1:H + J + 1, and sampling a new probability vector p1:H + J + 1 given the sampled event times.

Sampling the unobserved event times for the censored data

For the Gibbs sampler, conditional on the probabilities p1:H + J + 1, we need to sample which interval each censored event time occurred in (we do not actually need the precise event time). We introduce variables Zk + 1,k, … ,ZH + J + 1,k, which decompose the number of censored observations rk in the interval (tk − 1,tk], into the number of events that fall in the intervals (tk,tk + 1], … , (tH + J − 1,tH + J], (tH + J, ∞ ), so that Zk + 1,k + … + ZH + J + 1,k = rk. The full conditional distribution of Zk + 1,k, … ,ZH + J + 1,k given the probabilities p1:H + J + 1 is a multinomial distribution with sample size rk and probability parameters ηk + 1,k, … ,ηH + J + 1,k, where

  • display math

for j = k + 1, … ,H + J + 1.

Sampling the probabilities p1:H + J + 1

Following the sampling of Zk + 1,k, … ,ZH + J + 1,k, define d ′ k to be the revised number of events in the interval (tk − 1,tk], which is the sum of the observed events and sampled events:

  • display math(27)

where dk and Zk,j are the number of observed events and simulated events in the interval (tk − 1,tk], respectively. New probabilities p1,p2, … ,pH + J + 1 are sampled from their full conditional distribution, which is another Dirichlet distribution:

  • display math
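The two sampling steps can be sketched in Python as follows. This is a minimal illustration under our own assumptions and is not the supporting R code: each observation censored in interval k is allocated to a later interval j with probability proportional to pj, and the Dirichlet draw is generated by normalising independent gamma variates. The function names and the small data set are hypothetical.

```python
import random


def gibbs_survivor(prior, events, censored, n_iter=2000, burn_in=200, seed=1):
    """Gibbs sampler for interval probabilities with right-censored data.

    prior[j]    Dirichlet parameter for interval j
    events[j]   observed events in interval j
    censored[k] observations censored in interval k (event in a later interval)
    Returns a list of sampled probability vectors.
    """
    if censored[-1] != 0:
        raise ValueError("no censoring is possible in the final open interval")
    rng = random.Random(seed)
    n_int = len(prior)
    p = [a / sum(prior) for a in prior]          # start at the prior mean
    draws = []
    for it in range(n_iter + burn_in):
        # Step 1: allocate each censored observation to a later interval.
        imputed = [0] * n_int
        for k in range(n_int - 1):
            if censored[k] > 0:
                later = range(k + 1, n_int)
                for j in rng.choices(later, weights=p[k + 1:], k=censored[k]):
                    imputed[j] += 1
        # Step 2: Dirichlet update given observed plus imputed events.
        gammas = [rng.gammavariate(prior[j] + events[j] + imputed[j], 1.0)
                  for j in range(n_int)]
        norm = sum(gammas)
        p = [g / norm for g in gammas]
        if it >= burn_in:
            draws.append(p)
    return draws


def survivor_from_probs(p):
    """Survival rates at the interval end points t1, ..., tK-1."""
    out, cumulative = [], 0.0
    for prob in p[:-1]:
        cumulative += prob
        out.append(1.0 - cumulative)
    return out


if __name__ == "__main__":
    draws = gibbs_survivor([1.0, 1.0, 1.0], [5, 3, 2], [4, 0, 0])
    print(survivor_from_probs(draws[-1]))
```

Each retained probability vector gives one draw of the survivor function at the partition points, from which the values at the quadrature points can be read off.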

Eliciting the prior for θ and calculating the assurance

To elicit the prior for the log hazard ratio θ, we only need to elicit the prior for the survival difference ρ at time t0 and then use simulation to obtain the prior for θ using (23). Simulated S1(t0) can be obtained when generating the survival rates using the Gibbs sampling approach. As before, the Monte Carlo simulation can be used to estimate (26). The R code to compute the assurance for the proportional hazards models is also available in the supporting software described in Section 3.2.

Numerical examples

In this section, we provide examples to illustrate how assurance is computed to inform the sample size choice under different model assumptions. We also show how the elicited priors affect the assurance. In each example, we suppose that a randomised controlled trial is going to be conducted to compare two treatments with an equal number of patients allocated to each treatment group. The sample sizes determined using power calculations are based on a 5% two-sided hypothesis test.

Exponential model

We first consider sample sizes based on power. We suppose that the trial has a 3-year recruitment period with a 2-year follow-up period and that 60% of patients receiving the standard treatment are expected to be alive after 5 years. For the power calculation, we consider an absolute 20% increase in patient survival for the experimental group.

Using Equation (7), we have the model parameters λ1 = 0.102 and λ2 = 0.0446. To achieve a specified power π*, the required sample size N is determined by solving

  • display math

where the power function πE is given in Equation (4).
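As a quick check of the parameter values quoted above, the following minimal sketch assumes that Equation (7) is the usual exponential relation S(t) = exp(−λt), so that λ = −log S(t0)/t0. The function name `exponential_rate` is hypothetical and is not part of the supporting software.

```python
import math

def exponential_rate(survival_prob, t0):
    """Hazard rate lambda such that exp(-lambda * t0) equals survival_prob."""
    return -math.log(survival_prob) / t0

# Design assumptions from the text: 60% control survival at 5 years and an
# absolute 20% improvement (80%) in the experimental group.
lambda1 = exponential_rate(0.60, 5.0)
lambda2 = exponential_rate(0.80, 5.0)
print(round(lambda1, 3), round(lambda2, 4))
```

Under this assumed form of Equation (7), the rounded rates agree with the values of λ1 and λ2 given in the text.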

To calculate the assurance γE, an expert's judgements about the 5-year survival rate in the control group, S1(5), and the 5-year survival difference, ρ, are assessed using univariate elicitation methods. Suppose this yields S1(5) ∼ B(as,bs) and ρ ∼ N(mρ,vρ). In the following, we look at three scenarios for the priors.

  • Scenario 1: ρ ∼ N(0.2,0.001) and S1(5) ∼ B(60,40).

  • Scenario 2: ρ ∼ N(0.2,0.05) and S1(5) ∼ B(60,40).

  • Scenario 3: ρ ∼ N(0.3,0.005) and S1(5) ∼ B(60,40).

In scenario 1, the prior ρ ∼ N(0.2,0.001) indicates a strong prior belief that the 5-year survival difference is around 0.2. In scenario 2, vρ = 0.05 implies that P(S2(5) − S1(5) > 0) = 0.769. In scenario 3, the prior ρ ∼ N(0.3,0.005) expresses the belief that P(S2(5) − S1(5) < 0) = 0.00003, that is, the experimenter believes that the experimental treatment has a very high probability of being superior.
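The prior probability that the experimental treatment is superior can be approximated by simulation. The sketch below is illustrative only. It assumes that the second argument of N(mρ,vρ) is a variance and that joint draws are discarded unless the implied S2(5) = S1(5) + ρ lies strictly between 0 and 1; this truncation is an assumption made here and is not stated in this section. The function name `prior_prob_superior` is hypothetical.

```python
import random

def prior_prob_superior(m_rho, v_rho, a_s, b_s, n_draws=100_000, seed=1):
    """Monte Carlo estimate of P(S2(5) - S1(5) > 0) under the joint prior.

    Assumption of this sketch: a draw is kept only if S1(5) + rho is a
    valid probability, i.e. strictly between 0 and 1.
    """
    rng = random.Random(seed)
    sd_rho = v_rho ** 0.5
    kept = 0
    superior = 0
    while kept < n_draws:
        s1 = rng.betavariate(a_s, b_s)   # control-group 5-year survival
        rho = rng.gauss(m_rho, sd_rho)   # 5-year survival difference
        if 0.0 < s1 + rho < 1.0:
            kept += 1
            superior += rho > 0.0
    return superior / kept

p2 = prior_prob_superior(0.2, 0.05, 60, 40)   # scenario 2
print(round(p2, 3))
```

The estimate for scenario 2 can be compared with the value 0.769 quoted above. Without the truncation step, a N(0.2,0.05) prior alone would give a larger probability, which is why the truncation assumption matters in this sketch.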

Figure 1 shows how the assurances differ given the different joint prior distributions. When an expert has strong beliefs (scenario 1) that the treatment effect will be close to that specified in the power calculation, the required sample size informed by assurance is similar to that determined by the power calculation. In scenario 2, the assurance cannot exceed 80%, as the prior probability of the new treatment being superior is 76.9%. In scenario 3, a smaller sample size may be required to achieve an 80% probability of having a successful trial given the very high prior probability of the experimental treatment being superior.

Figure 1. The comparison between power and assurance γE considering ρ ∼ N(mρ,vρ) and S1(5) ∼ B(60,40).

Weibull model

We first consider determining sample sizes using the power function under the Weibull model. Suppose that the 1-year survival rate in the control group is expected to be 20% and to decrease to 10% at the end of the second year. We consider a two-sided hypothesis test of the null hypothesis of no change in the mean survival time between the control and experimental groups. For the power calculation, we suppose that the survival rate in the experimental group at 1 year is 30% and at 2 years is 20%.

Using Equations (13) and (14), the Weibull parameters in each group are κ1 = 0.52, λ1 = 1.61, κ2 = 0.42 and λ2 = 1.20. Using Equations (10) and (11), the mean and variance of the survival times in each group are μ1 = 0.75, inline image, μ2 = 1.89, and inline image. To achieve a specified power π*, the required sample size N is determined by solving

  • display math

where πW is given in Equation (9).
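The Weibull parameter values above can be checked with a short calculation. The sketch below assumes the parameterisation S(t) = exp(−λt^κ), which is an inference from the quoted numbers rather than a restatement of Equations (13) and (14), and uses the standard Weibull moment formulas. The function names are hypothetical.

```python
import math

def weibull_from_two_survival_rates(s_t1, s_t2, t1=1.0, t2=2.0):
    """Solve S(t) = exp(-lam * t**kappa) for (kappa, lam) from two survival rates."""
    kappa = math.log(math.log(s_t2) / math.log(s_t1)) / math.log(t2 / t1)
    lam = -math.log(s_t1) / t1 ** kappa
    return kappa, lam

def weibull_mean_var(kappa, lam):
    """Mean and variance of the survival time under S(t) = exp(-lam * t**kappa)."""
    scale = lam ** (-1.0 / kappa)
    mean = scale * math.gamma(1.0 + 1.0 / kappa)
    var = scale ** 2 * math.gamma(1.0 + 2.0 / kappa) - mean ** 2
    return mean, var

kappa1, lam1 = weibull_from_two_survival_rates(0.20, 0.10)   # control group
kappa2, lam2 = weibull_from_two_survival_rates(0.30, 0.20)   # experimental group
mu1, var1 = weibull_mean_var(kappa1, lam1)
mu2, var2 = weibull_mean_var(kappa2, lam2)
print(round(kappa1, 2), round(lam1, 2), round(kappa2, 2), round(lam2, 2))
print(round(mu1, 2), round(mu2, 2))
```

Under this assumed parameterisation, the rounded parameters and means agree with the values quoted in the text. The variances `var1` and `var2` are computed from the same formulas and are not taken from the paper.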

To compute the assurance, we consider eliciting an expert's opinion about the 1-year survival rate in the control group S1(1), the survival difference at 1 year between the two groups, δ12, and the difference in survival probability between the experimental and control groups at 1 and 2 years, denoted by δ11 and δ22, respectively.

Suppose that we elicited quartiles for the uncertain quantities, as given in Table 1. Using the MATCH online elicitation tool, the distributions fitted to the elicited judgements are given below.

  • display math

Table 1. Elicited quartiles of S1(1), δ11, δ12 and δ22.

                  S1(1)   δ11    δ12    δ22
Lower quartile    0.15    0.08   0.05   0.05
Median            0.2     0.11   0.1    0.1
Upper quartile    0.23    0.15   0.14   0.12
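To illustrate how a distribution can be matched to elicited quartiles, the sketch below fits a normal distribution whose lower and upper quartiles equal the elicited values for δ12 in Table 1. This is a simplified illustration only: it is not the MATCH fitting procedure, the choice of a normal family is an assumption made here, and the function name `normal_from_quartiles` is hypothetical.

```python
from statistics import NormalDist

def normal_from_quartiles(lower, upper):
    """Normal distribution whose 25% and 75% points equal the elicited quartiles."""
    z75 = NormalDist().inv_cdf(0.75)      # upper quartile of the standard normal
    mean = (lower + upper) / 2.0
    sd = (upper - lower) / (2.0 * z75)
    return NormalDist(mean, sd)

# Elicited quartiles for delta_12 in Table 1: 0.05 (lower) and 0.14 (upper).
fit = normal_from_quartiles(0.05, 0.14)
print(round(fit.mean, 3), round(fit.stdev, 3))
print(round(fit.inv_cdf(0.25), 3), round(fit.inv_cdf(0.75), 3))
```

A symmetric fit of this kind reproduces the two outer quartiles exactly, but its median is the midpoint of the quartiles and so need not equal the elicited median. A dedicated elicitation tool would typically compare several families and report the quality of the fit.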

Figure 2 shows how the four survival rates S1(1), S1(2), S2(1) and S2(2) are correlated given the elicited quartiles listed in Table 1. Figure 3 illustrates the comparison between the power and assurance functions. Given the quartiles of the uncertain quantities in Table 1, the prior probability that the experimental treatment is indeed superior is 73.9%, which cannot be exceeded by the assurance. The large gap between power and assurance at a fixed large sample size arises because uncertainty in the prior distributions has a large influence on the probability of success.

Figure 2. Scatterplots of Si(1) and Si(2) for i = 1,2, given the elicited quartiles.

Figure 3. The comparison between power and assurance γW considering uncertainty in S1(1), δ11, δ12 and δ22.

Proportional hazard model

Suppose that a trial is planned to have a 5-month recruitment period with a 5-month follow-up period. In our example, we use the data given in Kaplan and Meier [29] as the available information for the standard treatment. The data are 0.8, 1.0*, 2.7*, 3.1, 5.4, 7.0*, 9.2 and 12.1*, where ‘*’ denotes censored observations.

We first consider sample size calculations using the power function. For the power calculation, we consider an absolute 17.5% increase in the 7-month survival rate for the experimental treatment compared with the standard. Using the Kaplan–Meier estimate, the survival rate at 7 months for the standard treatment is S1(7) = 0.525, so the corresponding log hazard ratio θ is −0.591. Using Equation (20), the probability, Pe, that a patient will experience the outcome event during the trial is 0.42. To achieve a specified power π*, the required sample size N is determined by solving

  • display math

where πP is given in Equation (19).
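The two design quantities S1(7) and θ quoted above can be reproduced from the listed data. The sketch below computes the Kaplan–Meier estimate directly and then converts the 17.5% survival improvement into a log hazard ratio, assuming the standard proportional hazards relation S2(t) = S1(t)^exp(θ). The function names are hypothetical.

```python
import math

def kaplan_meier(times, censored):
    """Return [(event_time, survival_estimate)] for right-censored data."""
    data = sorted(zip(times, censored))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [c for (u, c) in data if u == t]   # all subjects observed at time t
        deaths = sum(1 for c in tied if not c)
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(tied)
        i += len(tied)
    return curve

def survival_at(curve, t):
    """Step-function value of the Kaplan-Meier estimate at time t."""
    s = 1.0
    for event_time, value in curve:
        if event_time <= t:
            s = value
    return s

# Data from the text; True marks a censored observation.
times = [0.8, 1.0, 2.7, 3.1, 5.4, 7.0, 9.2, 12.1]
censored = [False, True, True, False, False, True, False, True]

curve = kaplan_meier(times, censored)
s1 = survival_at(curve, 7.0)                     # standard treatment at 7 months
s2 = s1 + 0.175                                  # assumed experimental survival
theta = math.log(math.log(s2) / math.log(s1))    # log hazard ratio
print(round(s1, 3), round(theta, 3))
```

Under the assumed proportional hazards relation, the rounded results agree with S1(7) = 0.525 and θ = −0.591 as given in the text.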

Considering uncertainty in both the log hazard ratio and the survivor function in the control group, the quantities that need to be elicited are the difference ρ in survival probabilities at 7 months between the experimental and standard treatments, and the parameter function α of the Dirichlet process prior. Suppose we have already obtained the prior ρ ∼ N(0.175,0.01). Furthermore, an expert proposes an exponential distribution with a 5-month survival rate S1(5) of 50% as the prior mean of the standard treatment survivor function. Hence, the parameter function α of the Dirichlet process is α([t, ∞)) = c0 exp(−λt), where λ is given by

  λ = −log(0.5)/5 ≈ 0.139.

We fix c0 at 1, which represents a fairly weak prior for the survival rates S1(.) with the data being dominant. Figure 4 shows the comparison between the power calculated on the basis of the Kaplan–Meier estimate of S1(.) and the assurance. The assurance (dashed line) in this case cannot exceed 95.5%, which is the prior probability of the experimental treatment being superior. If we had a stronger prior for ρ, for example, ρ ∼ N(0.175,0.001), then the calculated assurance would be very close to the power (as shown by the dotted line in Figure 4).
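The way the assurance levels off below the power curve can be illustrated with a simplified simulation. The sketch below is not the procedure used to produce Figure 4. It swaps in Schoenfeld's normal approximation for the log-rank test in place of Equation (19), holds S1(7) fixed at the Kaplan–Meier estimate 0.525 instead of sampling it from the Dirichlet process posterior, keeps Pe fixed at 0.42 for every draw of θ, and rejects draws of ρ for which S1(7) + ρ is not a valid probability. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)   # critical value for a two-sided 5% test

def approx_power(theta, n_total, p_event):
    """Approximate probability of a significant result favouring the
    experimental arm for log hazard ratio theta (equal allocation)."""
    events = n_total * p_event
    return STD_NORMAL.cdf(-theta * math.sqrt(events) / 2.0 - Z_CRIT)

def draw_thetas(s1, m_rho, v_rho, n_draws=20_000, seed=1):
    """Prior draws of theta = log(log S2 / log S1), where S2 = S1 + rho."""
    rng = random.Random(seed)
    thetas = []
    while len(thetas) < n_draws:
        s2 = s1 + rng.gauss(m_rho, math.sqrt(v_rho))
        if 0.0 < s2 < 1.0:               # keep only valid survival rates
            thetas.append(math.log(math.log(s2) / math.log(s1)))
    return thetas

def assurance(thetas, n_total, p_event):
    """Average the approximate power over the prior draws of theta."""
    return sum(approx_power(t, n_total, p_event) for t in thetas) / len(thetas)

thetas = draw_thetas(s1=0.525, m_rho=0.175, v_rho=0.01)
p_superior = sum(t < 0 for t in thetas) / len(thetas)
print("prior P(theta < 0):", round(p_superior, 3))
for n in (100, 200, 400, 800):
    print(n, round(approx_power(-0.591, n, 0.42), 3),
          round(assurance(thetas, n, 0.42), 3))
```

In this sketch, the power at θ = −0.591 approaches 1 as N grows, whereas the assurance is limited by the prior probability that θ < 0, which is roughly 0.96 when S1(7) is held fixed. The 95.5% bound reported in the text additionally reflects uncertainty in S1(.), which this simplified version ignores.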

Figure 4. The comparison between power and assurance γP considering uncertainty in θ and S1(.).

Summary

We have extended the assurance method to accommodate time-to-event outcomes in clinical trials, assuming one of three analysis methods, and we have made software available for implementing our methods. The reliability of an assurance probability will depend on the reliability of the elicited prior, and so it will be important to check the robustness of assurances to the choice of prior. However, the process of formally assessing the evidence in support of a new treatment and quantifying the attendant uncertainties could itself form a useful part of the trial planning process. Overall, we believe that it is clearly useful to know the probability of a trial producing a successful result, and in the context of clinical trial planning, the extra effort required in using the assurance method is relatively small.

Acknowledgements

This work was funded by an EPSRC Dorothy Hodgkin PhD studentship, with financial support from Roche. We thank John Stevens, Simon Day and Nelson Kinnersley for helpful discussions and two referees for their suggestions to improve the paper.

References

  1. Spiegelhalter DJ, Freedman LS. A predictive approach to selecting the size of a clinical trial based on subjective clinical opinion. Statistics in Medicine 1986; 5:1–13.
  2. O'Hagan A, Stevens JW. Bayesian assessment of sample size for clinical trials of cost-effectiveness. Medical Decision Making 2001; 21:219–230.
  3. O'Hagan A, Stevens JW, Campbell MJ. Assurance in clinical trial design. Pharmaceutical Statistics 2005; 4:187–201.
  4. Chuang-Stein C. Sample size and the probability of a successful trial. Pharmaceutical Statistics 2006; 5:305–309.
  5. Chuang-Stein C, Yang R. A revisit of sample size decisions in confirmatory trials. Statistics in Biopharmaceutical Research 2010; 2:239–248.
  6. Schoenfeld DA, Richter JR. Nomograms for calculating the number of patients needed for a clinical trial with survival as an endpoint. Biometrics 1982; 38:163–170.
  7. Gross AJ, Clark VA. Survival Distributions: Reliability Applications in the Biomedical Sciences. Wiley: New York, 1975.
  8. Freedman LS. Tables of the number of patients required in clinical trials using the logrank test. Statistics in Medicine 1982; 1:121–129.
  9. Schoenfeld DA. Sample-size formula for the proportional-hazards regression model. Biometrics 1983; 39:499–503.
  10. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-care Evaluation. John Wiley and Sons Ltd: England, 2004.
  11. Ibrahim JG, Chen M, Sinha D. Bayesian Survival Analysis. Springer: New York, 2001.
  12. Christensen R, Johnson W, Branscum A, Hanson T. Bayesian Ideas and Data Analysis: An Introduction for Scientists and Statisticians. CRC Press: Boca Raton, 2010.
  13. Kadane JB, Wolfson LJ. Experiences in elicitation. Statistician 1998; 47:1–20 (with discussion, 55–68).
  14. O'Hagan A, Buck CE, Daneshkhah A, Eiser JE, Garthwaite PH, Jenkinson DJ, Oakley JE, Rakow T. Uncertain Judgements: Eliciting Expert Probabilities. John Wiley and Sons Ltd: England, 2006.
  15. Oakley JE, O'Hagan A. Shelf: The SHeffield ELicitation Framework (version 2.0), School of Mathematics and Statistics, University of Sheffield, 2010. http://tonyohagan.co.uk/shelf.
  16. Johnson SR, Tomlinson GA, Hawker GA, Granton JT, Grosbein HA, Feldman BM. Methods to elicit beliefs for Bayesian priors: a systematic review. Journal of Clinical Epidemiology 2010; 63:355–369.
  17. Gore SM. Biostatistics and the medical research council. Medical Research Council News 1987; 35:19–20.
  18. Parmar MKB, Spiegelhalter DJ, Freedman LS. The CHART trials: Bayesian design and monitoring in practice. Statistics in Medicine 1994; 13:1297–1312.
  19. Tan SB, Chung YFA, Tai BC, Cheung YB, Machin D. Elicitation of prior distributions for a phase III randomized controlled trial of adjuvant therapy with surgery for hepatocellular carcinoma. Controlled Clinical Trials 2003; 24:110–121.
  20. Hiance A, Chevret S, Lévy V. A practical approach for eliciting expert prior beliefs about cancer survival in phase III randomized trial. Journal of Clinical Epidemiology 2009; 62:431–437.
  21. R Development Core Team. R: a language and environment for statistical computing, R Foundation for Statistical Computing: Vienna, Austria, 2011. http://www.R-project.org/, ISBN 3-900051-07-0.
  22. Bowman AW, Crawford E. R package rpanel: simple control panels (version 1.0-5), University of Glasgow, UK, 2008. http://www.stats.gla.ac.uk/~adrian/rpanel.
  23. Tierney L. tkrplot: Tk rplot, 2011. http://CRAN.R-project.org/package=tkrplot, R package version 0.0-23.
  24. Singpurwalla ND. An interactive PC-based procedure for reliability assessment incorporating expert opinion and survival data. Journal of the American Statistical Association 1988; 83:43–51.
  25. Berger JO, Sun D. Bayesian analysis for the poly-Weibull distribution. Journal of the American Statistical Association 1993; 88:1412–1418.
  26. Kaminskiy MP, Krivtsov VV. A simple procedure for Bayesian estimation of the Weibull distribution. IEEE Transactions on Reliability 2005; 54:612–616.
  27. Susarla V, Ryzin JV. Nonparametric Bayesian estimation of survival curves from incomplete observations. Journal of the American Statistical Association 1976; 71:897–902.
  28. Kuo L, Smith AFM. Bayesian computations in survival models via the Gibbs sampler. In Survival Analysis: State of the Art, Klein JP, Goel PK (eds). Kluwer Academic: Dordrecht, 1992; 11–24.
  29. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 1958; 53:457–481.