• Open Access

Sequential methods for random-effects meta-analysis



Although meta-analyses are typically viewed as retrospective activities, they are increasingly being applied prospectively to provide up-to-date evidence on specific research questions. When meta-analyses are updated account should be taken of the possibility of false-positive findings due to repeated significance tests. We discuss the use of sequential methods for meta-analyses that incorporate random effects to allow for heterogeneity across studies. We propose a method that uses an approximate semi-Bayes procedure to update evidence on the among-study variance, starting with an informative prior distribution that might be based on findings from previous meta-analyses. We compare our methods with other approaches, including the traditional method of cumulative meta-analysis, in a simulation study and observe that it has Type I and Type II error rates close to the nominal level. We illustrate the method using an example in the treatment of bleeding peptic ulcers. Copyright © 2010 John Wiley & Sons, Ltd.

1. Introduction

The meta-analysis of results from multiple, similar studies is traditionally considered a retrospective activity. However, many meta-analyses are updated over time as new studies are undertaken or are identified from the previously hidden literature. In particular, systematic reviews in the Cochrane Database of Systematic Reviews1 are expected to be updated approximately every two years 2. Furthermore, genuinely prospective meta-analyses are increasingly being performed. For example, meta-analyses have been planned to combine evidence from existing, ongoing clinical trials 3, 4, and clinical trials have been designed with the aim of meta-analysing their results 5. In the field of genetic epidemiology, potentially important genetic variants are examined prospectively in multiple existing collections of DNA samples until sufficient evidence of their role has been determined 6, 7.

If a meta-analysis is conducted repeatedly on the addition of new studies, without any allowance for multiple testing, the overall risk of a false-positive finding will increase with the number of meta-analyses performed. Indeed, Berkey et al. note that when there is no underlying effect the process of continually updating a meta-analysis using standard significance tests will lead eventually to a false-positive result 8. One way to address this problem is to exploit formal sequential methods. Although sequential methods are well established in the analysis of individual randomized trials, they have received much less attention in the meta-analysis context. This is an area where opinions differ. Some argue that, because meta-analysts do not (typically) have control over the generation of new evidence, they are not in a position to act on stopping rules and so sequential methods should not be applied. In contrast, our view is that since meta-analysts frequently make decisions on whether to recommend further research or whether to recommend that there is convincing evidence for the presence or absence of an effect, sequential methods can be similarly important to meta-analyses as to individual primary studies. The intention is to facilitate a justifiable recommendation rather than to control future research directly.

Whitehead 9 describes the use of a stopping boundaries approach for fixed and random-effects meta-analyses based on the sequential trial methodology of Whitehead 10. Pogue and Yusuf suggest the use of monitoring boundaries based on alpha-spending functions and stochastic curtailment, with application to fixed-effect meta-analysis 11, and these have been implemented in practice 12. These methods are based on desirable sample sizes in terms of numbers of participants across all studies. Lan et al. apply the law of the iterated logarithm that ‘penalizes’ the usual test statistic to account for multiple tests, equivalent to using a particular open ended boundary in the framework of Whitehead, although the approach does not control Type II error 13, 14.

Heterogeneity of effects is a key characteristic of many meta-analyses, and estimation of the among-study variance may create problems in the early stages of a sequential meta-analysis. The methods of Pogue have been adapted to account for heterogeneity in a (retrospective) cumulative meta-analysis scenario, by adjusting the desired sample size based on a function of I2, a measure of inconsistency across the observed studies' findings 15, 16. The adjustment is derived essentially from a comparison of the (assumed known) variances of fixed-effect and random-effects meta-analysis estimates. In prospective meta-analyses these will not be known, although anticipated values of I2 might be used. Much of the present paper is devoted to the problem of accounting prospectively for heterogeneity, as it is the principal obstacle to the simple application of sequential trial methodology.

In this paper we consider the use of frequentist sequential approaches to meta-analysis with the aim of controlling the risk of drawing incorrect conclusions. We take an approach based on ‘group-sequential’ trial methodology, with a stopping rule. A previously proposed method is reviewed and extended to better incorporate random effects. We compare six methods (including a sequential fixed-effect method) through a simulation study and compare them in an example in the treatment of bleeding peptic ulcers. Our emphasis is on the determination of a straightforward approach that has good empirical properties, recognizing that even more formal methods are available through the use of dedicated software such as PEST 17.

2. A sequential approach to meta-analysis

2.1. Random-effects meta-analysis

Before introducing the sequential methods we review a general parametric approach to meta-analysis on which the sequential methods are based 18. Suppose we have an estimate, yi, of a true effect θi with estimated variance vi, from study i, i = 1, …, k. The effect might be measured as a log odds ratio or a difference in means for a comparative study, or a simple mean or logit proportion for a single group study; numerous other options are possible. We make the usual assumption that the estimates and variances are uncorrelated, although this will not always be the case (for example, if larger studies tend to be done in populations in which a treatment is less effective). Alternative approaches are available that do not rely on this assumption. For example, the raw data from the studies could be modelled, which may require individual participant data. Alternatively, all studies could be given equal weight 19, although recent proposals to adopt this practice have not been enthusiastically received 20–22.

A random-effects analysis involves inference on the distribution of the θi across studies. A standard assumption is that this distribution is normal with mean µ and variance τ2. The variance describes the extent of heterogeneity. A simple estimate of µ is given by the weighted average

equation image

where equation image represents the weight attributed to study i. An approximate variance of this estimate is given by

equation image

The heterogeneity variance τ2 may be estimated simply by equation image, a method of moments estimator described by DerSimonian and Laird 23, where

equation image

wi = 1/viand

equation image

is the standard statistic used to test for heterogeneity across studies. In many circumstances, inferences from a random-effects meta-analysis should be based on the full distribution of the θi across studies rather than on µ alone. This can be achieved using a predictive interval for the effect in a new study, which takes into account both the mean µ and variance τ2 of effects 24. In the methods we describe below, we assume that primary inference is to be made on the mean effect across studies, i.e. on µ.

If equation image is equal to zero, then wmath image is equal to wi and the resulting calculations lead to a fixed-effect meta-analysis.

The estimates yi and vi are typically calculated from either Wald statistics or score statistics. For the former, equation image and equation image, where equation image denotes the maximum likelihood estimate of θi and equation image its variance. For the latter, we may write yi = Zi/Viand vi = 1/Vi, where Zi is the efficient score for θi and Vi is Fisher's information, both evaluated at θi = 0. When sample sizes are large and θi is reasonably near 0, there will be little difference between the results from the two approaches.

2.2. A monitoring boundary approach to sequential random-effects meta-analysis

To follow the progress of a meta-analysis over time we require a measure of the amount of relevant information contained within it. One approach is to use the total number of study participants 11, 15, 16. Pogue et al. introduce a desirable total that they refer to as the ‘optimal information size’, based on standard sample size calculations 11. Wetterslev et al. adapt this to account for observed heterogeneity in a retrospective cumulative meta-analysis 16, although in our applications this heterogeneity is yet to be observed. In contrast, Whitehead 9 and Lan et al.13, 14 use statistical information in the form of the inverse variance of equation image or, equivalently, the sum of the weights (specifically, Whitehead uses Fisher's information). We choose the statistical information approach, since it relates directly to the precision of the meta-analysis estimate and hence the amount of evidence contained in the collection of studies. In particular, if highly heterogeneous studies are combined sequentially over time, the precision of a random-effects meta-analytic estimate may decrease while the number of participants increases. Our approach may be used when study estimates yi and wi are calculated either from Wald statistics (as in Lan et al.) or score statistics (as in Whitehead).

A boundaries approach to sequential meta-analysis involves monitoring information using the sum of weights, equation image after tj studies have been included, j = 1, 2, …. A plot of Vj against the cumulative effect size, equation image, at each update of the meta-analysis provides a visualization of the path of the meta-analysis over time. A line from the origin to a point (Zj, Vj) has slope equal to the simple random-effects meta-analytic estimate at the corresponding stage. Note that as information about τ2 changes over time, the individual study weights, wmath image, are recalculated on each update.

In the absence of heterogeneity, and when τ2 is assumed to be zero (as in a fixed-effect meta-analysis), this approach may be justified formally 9. It is equivalent to the group-sequential approach to monitoring clinical trials described by Whitehead 10. Monitoring boundaries may be used to stop the meta-analysis when there is sufficient evidence of an effect or lack of an effect based on a pre-specified significance level, power and clinically important effect.

Whitehead 9 describes the implementation of a triangular design in the context of a prospectively planned meta-analysis, and Bollen et al. implement a double-triangular design 25. Here we focus on the restricted procedure of Whitehead, equivalent to an O'Brien and Fleming stopping rule 26. The O'Brien and Fleming design produces a rectangular stopping boundary in the (Z, V) plane, with symmetric upper and lower boundaries at Z = ± H and a vertical boundary at V = Vmax reflecting the maximum amount of information to be collected (Figure 1). Table I provides monitoring boundaries for a selection of choices of two-sided significance level (α) and power (1 − β), obtained using the software PEST 4.0 17. To allow for the possibility of the path crossing the boundary between updates, a ‘Christmas tree correction’ is applied, bringing the vertical boundaries in at update j to values equation image and equation image, where V0 = 0 10.

Figure 1.

Sequential fixed-effect meta-analysis of peptic ulcers data.

Table I. Monitoring boundaries in the (Z, V) plane for the O'Brien and Fleming design.
1 − βBoundary0.0010.010.05
  1. Values presented correspond to a reference effect size of µR = 1. Boundaries for other values of µR are obtained by dividing H by µR, and Vmax by µmath image.

0.8Horizontal (±H)±14.576±9.779±6.457
 Vertical (Vmax)17.53512.1388.299
0.9Horizontal (±H)±16.120±11.029±7.461
 Vertical (Vmax)21.44715.43811.079
0.95Horizontal (±H)±17.394±12.061±8.288
 Vertical (Vmax)24.97218.46113.673

As soon as |Zj|⩾Hj or VjVmax, the meta-analysis will be stopped with the conclusion that the mean effect is in one direction or the other (for the former criterion) or that the mean effect is zero (if only the latter criterion is true). We refer to the values of Zj and Vj at this point as Zend and Vend. If the path stops with Vend>Vmax, then we extend the Christmas tree correction to Vend, and consider an effect to be present if |Zend|⩾Hend.

The methods we describe can be extended to address incremental information within studies as well as across studies. For instance, if at time j, study i has interim data yielding an estimate yij and weight wmath image (based on the current estimate of τ2 across currently available data), then we can calculate

equation image

This situation was addressed by Whitehead 9 but we do not pursue it further in the present paper.

2.3. Repeated confidence intervals

The sequential procedure, described in the previous section in terms of monitoring boundaries, may also be presented using the repeated confidence intervals approach introduced by Jennison and Turnbull 27. This enables us to present sequential meta-analyses using forest plots, the conventional way of illustrating meta-analyses. Lower and upper limits, Lj and Uj, for repeated confidence intervals after the inclusion of tj studies can be obtained by inverting the sequential design, yielding

equation image(1)

The stopping rules based on the horizontal monitoring boundaries, that is stop the meta-analysis if Zj⩽ − Hj or if ZjHj, is equivalent to stopping if the interval (Lj, Uj) excludes 0.

The repeated confidence intervals (Lj, Uj) have the property that all of them, calculated up until V = Vmax, contain the true value of the parameter µ with probability 1 − α, where αis the chosen two-sided significance level. When the amount of information is less than Vmax, the confidence interval series has probability greater than 1 − α that the truth is contained in every interval. This means that at any time, an individual confidence interval has probability greater than 1 − α of containing the truth, and is therefore wider than a confidence interval from a conventional meta-analysis.

The repeated confidence interval calculated at V = Vmax is the last valid member of the sequence. Thus if the amount of information exceeds Vmax it is unclear how to proceed. For consistency with the previous section, we propose the calculation of repeated confidence intervals until and including the first time that Vmax is exceeded. Thereafter, this last confidence interval is used irrespective of how many additional data accumulate. This is likely to be conservative, but is based on the result that the coverage property of the valid sequence of repeated confidence intervals is also satisfied by the intersection of all of the repeated confidence intervals.

2.4. Formal versus ad hoc stopping rules

At each analysis the repeated confidence interval provides confidence limits that are valid and not dependent on any stopping rule which might be used. Thus information about the magnitude of µ can be obtained from the latest confidence interval. If the sequential meta-analysis follows a formal stopping rule, then a final analysis would be conducted after a stopping boundary had been crossed. A point estimate of µ is given by Zend/Vendand the repeated confidence interval can be calculated. However, this point estimate may be a biased estimate. Methods for obtaining adjusted estimates and confidence intervals are available based on sequential methodology (see, for example, Chapter 5 of Whitehead 10), but these require specialized software, and we do not address them in detail here. The adjusted confidence interval will be narrower than that based on repeated confidence interval methodology. This is because the latter is not dependent on the stopping rule that has been used.

2.5. Estimation of heterogeneity variance

A key issue in sequential random-effects meta-analysis is the heterogeneity variance parameter, τ2. In the early stages of a sequential procedure, when very few studies are included, uncertainty surrounding any estimate of among-study variance will be considerable. Lan et al. prefer to over-estimate τ2 by using the arithmetic mean of the yi when there are five or fewer studies, mainly to avoid spurious estimates of zero 13, 14. Whichever estimate is used, consecutive point estimates of the parameter are likely to be variable, and if equation image it is possible that Vj will be smaller than Vj − 1. As a consequence, the path (Zj, Vj) of the sequential meta-analysis will go backwards. The situation also has implications for the Christmas tree correction, and if Vj is smaller than Vj − 1 we set Hj = H, so that a correction is only made if the path goes forwards. Within a repeated confidence interval framework, a decrease from Vj − 1 to Vj yields a confidence interval wider than the previous one. We consider that the possibility that the confidence intervals can become wider (or that the path goes backwards)—and the actual widening of the confidence intervals—is a suitable reflection of the increased uncertainty resulting from identifying inconsistency among the studies.

In Section 3 we consider five methods for addressing the heterogeneity variance in a sequential meta-analysis: (i) ignoring it (a fixed-effect meta-analysis); (ii) implementing a standard random-effects method (using the DerSimonian–Laird estimate); (iii) accounting for uncertainty in τ2 (using the approach of Biggerstaff and Tweedie 28); (iv) Bayesian updating of uncertainty in τ2; and (v) approximate Bayesian updating of uncertainty in τ2. The last two approaches are proposed to allow for prior information on τ2 to be incorporated in the early stages of the sequential meta-analysis. We refer to the sequential approaches using Bayesian inference for τ2 as ‘semi-Bayes’ to reflect the fact that inference on the parameter of interest (µ) is frequentist. We compare the five approaches, contrasting them with a naïve cumulative random-effects meta-analysis without correction for multiple looks, in a simulation study in Section 4.

3. Specific methods for sequential meta-analysis

3.1. Fixed-effect sequential meta-analysis using the general parametric approach

For a fixed-effect sequential meta-analysis, as described by Whitehead 9, we take

equation image

where yi and wi are as defined in Section 2.1. As with all methods described below, we apply the confidence interval formulae in Equation (1) to produce repeated confidence intervals.

3.2. Random-effects sequential meta-analysis using the general parametric approach

For a random-effects sequential meta-analysis, also described by Whitehead, we take

equation image

where yi and wmath image are as defined in Section 2.1 and wmath image involves the DerSimonian–Laird estimator, equation image, applied to the studies accumulated so far in the sequential meta-analysis. The estimate of τmath image is assumed known at each update, as in a conventional meta-analysis.

3.3. Accounting for uncertainty in the estimation of τ2

Methods that incorporate uncertainty in equation image are available (Biggerstaff and Tweedie 28, Hardy and Thompson 29). Biggerstaff and Tweedie consider an approach for accounting for uncertainty in the DerSimonian–Laird estimate based on a gamma approximation to the distribution of Q. This leads the authors to a revised weighting scheme for combining the effect estimates, yi:

equation image

where equation image are obtained as the expected values of the DerSimonion–Laird weights, wmath image, over the approximating distribution for equation image (allowing for truncation at zero), obtained for example using numerical integration.

As the new weights, equation image, are no longer inverse variances, we define Zj and Vj as

equation image

Incorporating uncertainty in estimates of τ2 into the meta-analysis should slow the progress of the (Z, V) path towards the boundary, since the uncertainty is reflected in a smaller value for V. Early repeated confidence intervals are likely to be wide.

3.4. Incorporating prior information about heterogeneity: Bayesian updating of τ2

A natural approach to making inferences about τ2 in the presence of a small number of studies is to draw on external evidence about its likely value. An informative prior distribution may contribute heavily to producing a realistic estimate in the early stages of a sequential meta-analysis. We may therefore expect the estimates of τ2 in the meta-analysis to be less erratic, and the path to proceed forward towards the boundaries. We propose to use Bayes' theorem to update the value of τ2, and to insert this into the frequentist sequential approach outlined above.

A suitable prior distribution may be developed following the ideas of Higgins and Whitehead 30, who gathered a collection of meta-analyses in the same therapeutic area and produced an empirical distribution from the observed degrees of heterogeneity for a similar type of outcome. A point summary from the prior distribution may be used in the analysis of the first study in the sequential meta-analysis, and point summaries from updated posterior distribution for τ2 may be used in subsequent updates of the sequential meta-analysis.

Let p2) represent the prior distribution for τ2, which here will be considered to be an inverse gamma distribution with shape parameter η and scale parameter λ. At the jth meta-analysis the likelihood function for the data L(µ, τ2;{y1, …, ymath image}, {v1, …, vmath image}) is a product of normal distributions with means equal to µand variances equal to (vi + τ2). The joint posterior distribution of µand τ2 at this meta-analysis, assuming an independent prior distribution, p(µ), for µ, is given by

equation image

Inference on τ2 would normally be performed by integration or simulation. However, µ is unknown and in our frequentist analysis is not associated with a probability distribution, so cannot be integrated out. A solution is to replace µ by its estimate from the (j − 1)th meta-analysis. This makes τ2 the only unknown parameter in the posterior formula, and a simple one-dimensional numerical integration can be performed to obtain its posterior mean, equation image as follows.

equation image

If the first meta-analysis (j = 1) contains several studies (i.e. t1>1), then a suitable value for equation image might be provided by the estimate from a standard random-effects meta-analysis based on the first (t1 − 1) studies. If the first meta-analysis contains exactly two studies (t1 = 2) we would set equation image equal to y1.

We opt for the mean rather than the median or mode since not only is it straightforward to calculate, but it will tend to overestimate τ2 in the early stages, producing an analysis that is less likely to yield a false positive result. As information accumulates, and the posterior distribution of τ2 develops a stronger peak, the mean will move towards the median and the mode.

For a random-effects sequential meta-analysis, we take

equation image

where equation image. In principle, the approach could be extended to incorporate additional uncertainty in τ2 by applying the ideas of Biggerstaff and Tweedie. Specifically, the integrations in the calculation of equation image could be performed for the weights wmath image rather than for equation image. We have not implemented this extension.

3.5. An approximation to the Bayesian approach for updating τ2

We now propose a simpler approach that avoids the numerical integration. Suppose the true effects in the studies to date were known exactly, so that yi = θi and vi = 0. Then the inverse gamma prior distribution for τ2 is the conjugate prior distribution for the unknown variance of a normal distribution, and the posterior distribution for τ2 is then an inverse gamma posterior distribution with parameters (η + (tj/2)) and (λ + (tjτmath image/2)), where equation image is the heterogeneity variance among the studies up to tj. We may estimate this variance using the DerSimonian–Laird estimate equation image. As the mean of an inverse gamma distribution with parameters η and λ is given by λ/(η − 1), it follows that the posterior mean for τ2 is given by

equation image

where equation image is the mean of the prior inverse gamma distribution. The posterior mean can therefore be viewed as a weighted average of the prior estimate of τ2 and the usual estimate from the studies in the meta-analysis. As the number of studies increases, the effect of the prior information decreases.

For a random-effects sequential meta-analysis, our approximate semi-Bayes approach takes

equation image

where equation image. R code for this method is included in the Appendix.

In practice, the approximations here might be expected to yield adequate results when the studies are large (in which case yi≈θi and vi≈0), and to overestimate τ2 when the studies are small, due to the overdispersion in the distribution of the estimates, yi, compared with the distribution of the true effects, θi.

4. Simulation study

4.1. Methods of the simulation study

We conducted a simulation study to examine the behaviour of the five sequential meta-analysis methods described in Section 3 under a wide range of conditions. These were also compared with a traditional cumulative meta-analysis approach that simply performs a conventional random-effects meta-analysis on the addition of each study. We chose an O'Brien and Fleming design with 90 per cent power to detect an effect µR of 0.5 with a two-sided significance level of 5 per cent. This value for µR corresponds to a moderately sized standardized mean difference 31, or to an odds ratio, hazard ratio or risk ratio of 1.65. The boundaries for the design, obtained from the package PEST 4.0, are given by H = 14.92 and Vmax = 44.32 (see Table I).

The simulation study was conducted with true effect sizes of µ = 0, 0.25, 0.5, and 1, that is ranging from no effect to an effect substantially larger than µR. Values for the heterogeneity variance τ2 were 0, 0.0625, and 0.25. We simulated studies such that the expected number of studies under the null hypothesis under a fixed-effect analysis would be t = 5, t = 10, and t = 20, as described below. This corresponds to an approximate doubling of sample size for each case we investigated. In total we examined 36 scenarios.

Within each simulated sequential meta-analysis the true effect size from each study, θi, was sampled from a N(µ, τ2) distribution. The mean within-study variance of the effect estimates, yi, across all studies was taken to be σ2 = t/Vmax, ensuring that there would be approximately t studies in a sequential fixed-effect meta-analysis if the vertical boundary was crossed at Vmax and τ2 = 0. The within-study variance of the effect estimate for each study was sampled from a uniform distribution in the range (σ2 − 0.75σ2, σ2 + 0.75σ2). This ensures that the studies have different degrees of precisions, reflecting a realistically wide range of study sizes within each meta-analysis. As an approximate interpretation, for the case of one-group studies estimating means with a between-person standard deviation equal to 4 in every study, this ensures that, for t = 5, sample sizes would be distributed between 81 and 567 rather than all being equal to 142. The prior distribution for τ2 for the semi-Bayes methods was an inverse gamma distribution with parameters η = 1.5 and λ = 0.08. This corresponds to a prior estimate for τ2 of 0.16. This is less than the maximum simulated heterogeneity, and, if the true effect size were 0.5, allows for some studies to have negative effect sizes. Taking η = 1.5 is equivalent to having prior information with the weight of one study, so the prior estimate contributes less than 10 per cent towards the posterior estimate of heterogeneity after ten studies have been included in the meta-analysis.

For each of the 36 scenarios 5000 sequential meta-analyses were simulated and analysed using each of the six approaches, using the procedures described in Section 2. The stopping rule for the naive cumulative meta-analysis method was to stop (conclude an effect exists) if the naive 95 per cent confidence interval excludes 0; or to stop (conclude no effect) if Vj exceeds Vmax; or otherwise to continue. The simulation was conducted in R (the program is available upon request).

From the simulations, for each approach, we recorded (i) the mean number of studies taken to reach a stopping boundary of the design; (ii) the proportion of simulations stopping having crossed the upper boundary (in order to evaluate Type I error under the null hypothesis); (iii) the percentage of simulations in which all calculated two-sided 95 per cent confidence intervals contained the true effect size µ (that is, the coverage of all confidence intervals); (iv) the percentage of simulations in which the confidence interval at the time when a stopping boundary is crossed (based on Zend and Vend) contained the true effect size µ(coverage of the last confidence interval); (v) estimates of the heterogeneity variance, τ2, at the time when a stopping boundary is crossed and (vi) mean value of Vend.

4.2. Results of the simulation study

Table II–IV give a selection of the results when t = 5 (few, large studies), t = 10 (more, average sized studies) and t = 20 (many, small studies). The results of the simulations should be interpreted in comparison with the fixed-effect sequential method as well as in isolation. The fixed-effect sequential method should in theory have good properties in the absence of heterogeneity. However, the results do not agree with the expectations due to a combination of (i) the inadequacy of the Christmas tree correction; (ii) our random sampling of within-study variances from a uniform distribution, which means that we cannot guarantee to hit the Vmax boundary with the tth study and (iii) the ad hoc decision to consider an effect present when Vend>Vmax and |Zend|⩾Hend, a region in which sequential theory does not formally apply. The third issue is expected to increase the probability of declaring that an effect exists, with a larger increase the further away Vend is from Vmax. When µ = 0, the vertical boundary will be crossed in the majority of simulations, and it can be seen from Table II2 = 0) with approximately five studies to reach Vmax that the Type I error rate is 0.039 instead of 0.025. When there are more and smaller studies, for example in Table IV, Vend moves closer to Vmax and it can be seen that the Type I error rate reduces to 0.028. In addition, the probability that all confidence intervals up to and including the one at Vend will contain the true effect size will only be equal to 0.95 if Vend = Vmax. If Vend<Vmax, this probability will be greater than 0.95, and if Vend>Vmax this probability will be less than 0.95. The closer that Vend is to Vmax, the closer will be the probability to 0.95. When µ = 0 and τ2 = 0, this probability is equal to 0.922 (Table II) and 0.939 (Table IV). The results for the other sequential meta-analysis approaches should be viewed in this light.

Table II. Results from simulation study with approximately five studies.
µτ2 Naïve random-effectsFixed-effectRandom-effectsBTApprox. semi-BayesSemi-Bayes
  1. BT, Biggerstaff and Tweedie.

00Studies at stopping6.
  Coverage, all CIs90.
  Coverage, last CI90.
  τ2 median0.01400.0150.0150.0350.060
  τ2 mean00000.0210.057
  V at stopping48.648.749.049.048.547.4
 0.0625Studies at stopping7.
  Coverage, all CIs79.981.586.286.391.294.6
  Coverage, last CI79.981.586.286.391.294.6
  τ2 median0.04000.0440.0440.0630.074
  τ2 mean0.007200.0150.140.0410.066
  V at stopping47.748.548.348.348.047.2
 0.25Studies at stopping10.84.813.113.114.312.7
  Coverage, all CIs64.055.677.777.783.783.9
  Coverage, last CI64.055.677.777.783.783.9
  τ2 median0.1500.
  τ2 mean0.1200.
  V at stopping45.546.346.
0.50Studies at stopping3.
  Coverage, all CIs89.391.692.592.595.898.2
  Coverage, last CI96.096.397.096.998.499.2
  τ2 median0.0300.0180.0100.0450.068
  τ2 mean00000.0320.065
  V at stopping22.
 0.0625Studies at stopping3.
  Coverage, all CIs80.880.986.786.791.394.6
  Coverage, last CI92.991.594.594.596.797.9
  τ2 median0.06200.0460.0460.0120.073
  τ2 mean0.001400.00460.00460.0430.081
  V at stopping21.031.430.730.730.629.8
 0.25Studies at stopping5.
  Coverage, all CIs63.856.477.978.183.183.5
  Coverage, last CI86.779.591.891.994.994.6
  τ2 median0.1700.
  τ2 mean0.1000.
  V at stopping18.130.628.428.428.528.2
Table III. Results from simulation study with approximately 10 studies.
µτ2 Naïve random-effectsFixed-effectRandom-effectsBTApprox. semi-BayesSemi-Bayes
  1. BT, Biggerstaff and Tweedie.

00Studies at stopping10.910.511.911.912.513.9
  Coverage, all CIs84.793.293.993.995.196.9
  Coverage, last CI84.793.293.993.995.196.9
  τ2 median0.02200.0250.0220.0330.067
  τ2 mean00000.0150.065
  V at stopping46.346.446.646.646.746.1
 0.0625Studies at stopping11.610.313.413.414.014.5
  Coverage, all CIs77.488.590.990.992.594.5
  Coverage, last CI77.488.590.990.992.594.5
  τ2 median0.04900.0570.0570.0670.079
  τ2 mean0.006600.0240.0240.0380.071
  V at stopping46.046.446.546.546.546.0
 0.25Studies at stopping15.19.719.519.519.917.5
  Coverage, all CIs64.971.885.985.887.887.7
  Coverage, last CI64.971.885.985.887.887.7
  τ2 median0.1600.
  τ2 mean0.1300.
  V at stopping45.145.945.745.845.845.7
0.50Studies at stopping5.
  Coverage, all CIs85.993.694.694.695.597.4
  Coverage, last CI96.397.798.
  τ2 median0.04300.0310.0310.0480.073
  τ2 mean00000.0230.069
  V at stopping16.130.029.729.729.629.5
 0.0625Studies at stopping5.
  Coverage, all CIs78.287.990.890.992.794.6
  Coverage, last CI94.395.796.896.897.498.1
  τ2 median0.07500.0610.0610.0770.085
  τ2 mean000.0130.0140.0370.076
  V at stopping15.429.929.729.729.729.4
 0.25Studies at stopping7.16.812.312.412.711.3
  Coverage, all CIs65.072.185.485.287.086.5
  Coverage, last CI89.990.295.595.595.995.5
  τ2 median0.1800.
  τ2 mean0.1100.
  V at stopping13.529.
Table IV. Results from simulation study with approximately 20 studies.
µτ2 Naïve random-effectsFixed-effectRandom-effectsBTApprox. semi-BayesSemi-Bayes
  1. BT, Biggerstaff and Tweedie.

00Studies at stopping19.716.322.522.522.824.2
  Coverage, all CIs80.793.994.794.895.296.3
  Coverage, last CI80.793.994.794.895.296.3
  τ2 median0.03500.0380.0380.0440.073
  τ2 mean00000.00080.069
  V at stopping45.345.445.545.545.545.3
 0.0625Studies at stopping20.320.324.124.124.424.8
  Coverage, all CIs76.191.593.193.293.695.0
  Coverage, last CI76.191.593.193.293.695.0
  τ2 median0.06300.0710.0710.0760.084
  τ2 mean0.001700.0230.0230.0310.076
  V at stopping45.245.445.545.545.545.2
 0.25Studies at stopping23.719.730.630.630.727.2
  Coverage, all CIs67.083.790.590.590.889.8
  Coverage, last CI67.083.790.590.590.889.8
  τ2 median0.1700.
  τ2 mean0.1400.
  V at stopping44.845.245.345.345.345.2
0.50Studies at stopping8.613.615.215.215.516.3
  Coverage, all CIs79.993.494.494.494.795.8
  Coverage, last CI95.897.897.897.897.998.7
  τ2 median0.06700.0490.0490.0580.081
  τ2 mean00000.0160.076
  V at stopping13.629.329.
 0.0625Studies at stopping9.013.616.216.216.416.6
  Coverage, all CIs75.891.793.393.493.995.4
  Coverage, last CI95.097.698.
  τ2 median0.09200.0790.0790.0870.090
  τ2 mean000.0150.0150.0280.081
  V at stopping13.129.228.828.828.829.0
 0.25Studies at stopping10.113.219.819.920.017.8
  Coverage, all CIs66.084.090.890.890.789.6
  Coverage, last CI91.794.897.
  τ2 median0.2000.
  τ2 mean0.1000.
  V at stopping11.328.027.827.827.928.1

The Biggerstaff and Tweedie method is not noticeably different from the sequential random-effects meta-analysis. This is because it does not change the estimate of heterogeneity, but widens the confidence interval in line with the uncertainty in the estimation of heterogeneity. This difference in confidence intervals appears to be small after the first few studies, so the Biggerstaff and Tweedie adjustment has little impact on the analysis.

The simulation results show that the sequential random-effects meta-analysis generally takes at least one study more to reach a stopping boundary than the sequential fixed-effect meta-analysis, the additional number of studies increasing with increasing heterogeneity. The semi-Bayes methods generally result in about one more study than the sequential random-effects meta-analysis. This shows that taking account of heterogeneity can reduce the risk of early stopping.

As the number of studies increases it can be seen that the Type I error rate for the naïve cumulative random-effects meta-analysis without correction for multiple looks also increases, in contrast to the other random-effects approaches which allow for the multiple looks. Type I error rates increase with increasing heterogeneity, this being most noticeable for the sequential fixed-effect meta-analysis, which makes no allowance for heterogeneity. The use of a sequential random-effects meta-analysis improves the Type I error rate in the presence of heterogeneity, but it is still unacceptably high when there is a moderate amount of heterogeneity. Hence these methods may produce spurious findings if there is heterogeneity. The semi-Bayes methods produce values closer to the desired value, but they too show a slight increase with heterogeneity.

The semi-Bayes methods show considerably greater coverage than the sequential fixed-effect meta-analysis when there is no heterogeneity. This over-coverage is caused by their invalid prior assumption of non-zero heterogeneity. The coverage falls off rapidly with heterogeneity for the sequential fixed and random-effects meta-analyses. This reduction in coverage is smaller for the semi-Bayes methods.

There is some underestimation of heterogeneity in the sequential random-effects meta-analysis, which may lead to smaller confidence intervals and hence to early stopping. The semi-Bayes methods provide estimates of τ2 that are closer to the true value provided that τ2 is not equal to 0. If there is no heterogeneity the Bayesian methods give an overestimate.

In conclusion, the semi-Bayes and approximate semi-Bayes methods had the most desirable false-positive and coverage properties in our simulation study, despite the prior distribution being informative and derived independently of the simulated meta-analyses. Our findings may be dependent on the choice of prior distribution, and we return to this issue in the Discussion.

5. Application

Sacks et al. consider the data from 23 trials that compare endoscopic haemostasis with a control treatment for the treatment of bleeding peptic ulcers 32. The outcome of interest is post-treatment bleeding and the treatment effect recorded here is the log-odds ratio of ‘no bleeding’. The trial estimates yi and vi are calculated from Wald statistics, that is

equation image

where S1 and F1 are the number of patients with ‘no bleeding’ and ‘bleeding’ in the endoscopic haemostasis group, and S2 and F2 are defined similarly for the control group. A positive value for yi favours the experimental treatment. The data and log-odds ratio estimates and confidence intervals for the individual studies are presented in Table V. For trials in which there were no patients with ‘no bleeding’ or no patients with ‘bleeding’ in at least one treatment group, 0.5 was added to all of the four cells of the 2 × 2 contingency table. Although this is common practice, for meta-analyses of rare events it is often preferable to use alternative approaches such as logistic regression 33. However, such methods do not fall conveniently into the general sequential framework we describe. A conventional random-effects meta-analysis of the full data set (Figure 2) demonstrates a significant benefit of treatment, with a log-odds ratio of 1.09 (an odds ratio of 3.0), but with considerable heterogeneity equation image.

Figure 2.

Random-effects meta-analysis and naïve random-effects cumulative meta-analysis of peptic ulcers data.

Table V. Peptic ulcers data: randomized trials of endoscopic hemostasis compared with control for bleeding peptic ulcers: Log-odds ratio of ‘no bleeding’.
Trial, iBledTotalBledTotalwiwiyiLog-odds ratio (yi) [odds ratio]95 per cent CI equation image
1. Vallon (1980)206823687.321.500.20[1.2]−0.52, 0.93
2. Swain (1981)113617404.292.220.52[1.7]−0.43, 1.47
3. Papp (1982)11613160.682.834.17[65]1.79, 6.56
4. Rutgeerts (1982)55219543.315.391.63[5.1]0.55, 2.71
5. MacLeod (1983)6218242.380.530.22[1.3]−1.05, 1.49
6. Jensen (1984)27790.741.622.17[8.8]−0.10, 4.44
7. Kernohan (1984)9217242.52−1.51-0.60[0.5]−1.83, 0.63
8. Goudie (1984)7215252.15−1.49-0.69[0.5]−2.03, 0.64
9. Freitas (1985)73617423.623.751.04[2.8]0.01, 2.07
10. Swain (1986)76927684.548.001.76[5.8]0.84, 2.68
11. O'Brien (1986)17101341038.727.760.89[2.4]0.23, 1.55
12. Krejs (1987)198518897.28−0.92−0.13[0.9]−0.85, 0.60
13. Brearley (1987)6208212.270.820.36[1.4]−0.94, 1.66
14. Moreto (1987)11611210.802.232.80[16.5]0.61, 5.00
15. Laine (1987)01012140.391.814.65[105]1.51, 7.80
16. Panes (1987)35525582.366.092.57[13.1]1.30, 3.85
17. Chung (1987)03434340.252.098.47[4761]4.52, 12.4
18. Balanzo (1988)73615363.433.721.08[3.0]0.03, 2.14
19. Fellerton (1989)0205230.441.092.50[12]−0.46, 5.46
20. Angerinas (1989)7334322.14−1.36−0.63[0.5]−1.97, 0.71
21. Rutgeerts (1989)104012202.934.401.50[4.5]0.36, 2.65
22. Chiozzini (1989)4345191.801.780.99[2.7]−0.47, 2.45
23. Laine (1989)73815373.483.851.11[3.0]0.05, 2.16

We suppose for the purposes of illustration that this meta-analysis had been planned with the aim of incorporating observations at the end of each trial as they became available. Given the anticipated heterogeneity of effects, we apply the random-effects methods described above. The analysis uses an O'Brien and Fleming design in order to avoid early stopping unless the odds ratio is very large, with a significance level of 5 and 90 per cent power to detect a log-odds ratio of 0.693 (an odds ratio of 2). From Table I, for µR = 0.693 this gives H = 10.77 and Vmax = 23.07 (Figure 1).

The sequential random-effects meta-analysis and the approximate semi-Bayes and semi-Bayes methods are applied and compared with a sequential fixed-effect meta-analysis (Figure 3). To assess sensitivity to prior distributions, two inverse gamma prior distributions for τ2 are used for the semi-Bayes methods: an IG (1.5, 0.08) has a prior mean of 0.16 and an IG (1.5, 1) has a prior mean of 2. These prior means are respectively much smaller and much larger than the DerSimonian–Laird estimate based on all of the trials. Table VI shows the number of studies included at the point when the stopping criterion is met, together with estimates of the treatment effect (Zend/Vend) and heterogeneity, and the last repeated confidence interval. If the plan for this sequential meta-analysis had been to follow the formal stopping rule of the O'Brien and Fleming design, then it would be more appropriate to present a ‘final’ analysis which correctly adjusts for earlier looks. The results from conducting this ‘final’ analyses using the PEST package 17 are included in Table VI. Such an analysis corrects for the bias in the Zend/Vendestimate, and provides an appropriate P value and confidence interval. The confidence interval is narrower than the last repeated confidence interval, because the repeated confidence intervals have had no effect on the decision to conduct further studies.

Figure 3.

Sequential meta-analysis of peptic ulcers data.

Table VI. Results from the sequential meta-analyses of the peptic ulcers data.
  FixedRandomApprox semi-Bayes IG(1.5, 0.08)Approx semi-Bayes IG(1.5, 1)Semi-Bayes IG(1.5, 0.08)Semi-Bayes IG(1.5, 1)
At stoppingNumber of trials4111115915
 confidence0.14, 1.390.014, 1.630.042, 1.590.032, 1.750.015, 1.200.0054, 1.79
 equation image00.550.520.740.170.79
PEST analysisP-value0.00550.00520.00410.00480.01610.0039
 unbiased estimate of µ [odds ratio] Confidence[2.1][2.2][2.3][2.2][1.8][2.3]
 interval for µ0.22, 1.240.24, 1.380.26, 1.370.25, 1.370.11, 1.090.27, 1.41

These results show that a sequential fixed-effect meta-analysis, which ignores the heterogeneity, stops after four studies (see also Figure 1), leading to a smaller estimate of the treatment effect compared with that from all trials and a confidence interval that is narrow. The sequential random-effects method requires 11 studies, and produces a slightly larger estimate of the treatment effect and a wider confidence interval. When the prior distribution provides a much smaller underestimate of τ2 than is apparent in the dataset, as with the IG (1.5, 0.08) prior, the semi-Bayes method stops after a smaller number of studies (nine) although the approximate semi-Bayes method carries on for the same 11 studies as for the sequential random-effects method. The earlier stopping of the semi-Bayes IG (1.5, 0.08) method results in smaller estimates of treatment effect and heterogeneity. The use of an IG (1.5, 1) prior with its large prior estimate of heterogeneity causes the approximate semi-Bayes and semi-Bayes methods to stop after 15 trials. The estimates of treatment effect and heterogeneity from the sequential approaches are smaller than those from the overall random effects analysis of 23 studies. This is mainly due to the seventeenth trial, which has a large, beneficial effect of treatment, increasing the heterogeneity.

In our example, we have assumed that a sequential approach to the meta-analysis had been planned before the trials had reported their results. The example is instructive in illustrating some of the limitations of applying our generic statistical methods prospectively in a meta-analysis situation. We have remarked already that 2 × 2 contingency tables may be better analysed using binomial likelihoods. This would have avoided the need to make continuity corrections by adding 0.5 s to cell counts. Furthermore, the use of maximum likelihood estimates of odds ratios also creates a correlation between the yi and vi. Indeed, a commonly applied test for association between the estimates and standard errors produces P = 0.005 34, although a more appropriate test based on efficient score and Fisher information statistics, which overcomes artefactual correlation between the estimates and weights, produces P = 0.44 35. Also, some of the studies display extreme treatment effects; for example, in the Chung study, 0 and 100 per cent of the patients bled in the treatment and control groups, respectively. This study is not very informative about the magnitude of the odds ratio, and receives very little weight in the meta-analysis, despite providing convincing evidence that the treatment has a large effect on the absolute risk of bleeding. Using risk differences rather than odds ratios as the effect size would have given this study more weight, but risk differences are generally discouraged as a metric for meta-analysis 36.

6. Discussion

This paper has considered the use of sequential methods for meta-analyses that incorporate random effects to allow for heterogeneity between studies. Our emphasis has been on determining a straightforward approach that has good empirical properties. We have developed and compared five methods, including a direct extension of the standard random-effects meta-analysis method, an extension of the approach of Biggerstaff and Tweedie, a semi-Bayes method involving Bayesian updating of the heterogeneity variance, and an approximation to the semi-Bayes method. The choice of an O'Brien and Fleming sequential design with a Christmas tree correction for discrete monitoring leads to a simple procedure for calculating repeated confidence intervals, which enables us to present sequential meta-analyses using forest plots, the conventional way of illustrating meta-analysis results. When either a repeated confidence interval excludes zero or Vmax is reached, then the recommendation can be made that no further research is required to inform the question addressed by the meta-analysis.

Some issues are worthy of discussion, including philosophical, practical, and technical considerations. First, should sequential methods be applied to meta-analysis? Second, if sequential methods are to be used, what methods should be implemented in practice? Third, what technical problems remain that require further research?

The use of sequential methods for meta-analysis is contentious. Whereas the steering committee for a prospective study has control over the recruitment of participants, researchers undertaking meta-analyses may have no direct control over whether further studies are performed. Whether sequential methods have a role for updating of meta-analyses that are unconnected with primary researchers (such as those in Cochrane reviews) remains an open question. Chalmers and Lau 37 question the need to correct for multiple looks within a cumulative meta-analysis because of the lack of direct control. However, we would remark that many systematic reviews contain recommendations for practice and for research, and decisions over the content of these recommendations are similar to decisions over whether a primary research study should continue to recruit participants. Formal sequential methods for this process allow the wider field of evolving knowledge to be subject to the same rigorous considerations of Type I and Type II errors that individual randomized trials have traditionally received.

Meta-analyses are now well established as part of the research process, and many have argued for the need for primary studies to both be informed by meta-analyses of existing evidence, and be rounded off with an updated meta-analysis including the newly generated evidence. Thus primary research is increasingly viewed as part of a wider sequential process, and the methods described here may have application in this context. For instance, there may well be more enthusiasm for conducting a new study if the effect estimate is close to an upper or lower stopping boundary in a scheme such as Figure 1. Finally, we believe that sequential meta-analyses can play an important role in the actual design of individual studies, since the amount of further information that would be required to meet a pre-specified Vmax can be determined. However, there are many considerations in the design of a new study, and it is rare for them to be designed primarily around an updated meta-analysis. Related issues are discussed by Sutton et al.38.

One area of potential application of sequential methods is the monitoring of adverse effects of pharmacological interventions. However, as we remark in Section 5, the particular general parametric approach we describe does not lend itself well to rare events. Furthermore, it is not clear that strict control over false-positive findings is important in this context, since a small, non-statistically significant, signal should still be investigated when the adverse effect is major. More research is required to explore appropriate methods for this issue.

If sequential methods are to be implemented, can the methods we have described be recommended? Certainly we would recommend against a fixed-effect approach. However, in a random-effects framework, a key problem in the early stages of a sequential meta-analysis is the poor estimation of the heterogeneity variance. We argue that prior information is therefore necessary. Of our two semi-Bayes methods, the approximate semi-Bayes approach is straightforward to implement, and this is the one we recommend from among those we investigated. From the simulation results, it appears that the method has reasonable false-positive and coverage properties, although these are likely to be dependent on the choice of prior distribution for the among-study variation. Our application demonstrated that the early stages of the sequential meta-analysis can be affected by the prior distribution. Further empirical research is needed to characterize the degree of heterogeneity that can be anticipated in a meta-analysis with particular clinical and methodological features, so that realistic informative prior distributions can be formulated. An alternative approach would be a sensitivity analysis in which the sequential meta-analysis is repeated for a variety of plausible values of τ2, but an explicit stopping rule does not follow from such an approach.

Our recommendations are interim, since more work is required in this area. We have commented on methods based on the law of the iterated logarithm proposed by Lan et al.13, 14. Their open-ended boundaries have the advantage of allowing studies to continue being conducted as long as the null hypothesis is not rejected, but they do not ensure that a pre-specified power is achieved. Open-ended boundaries with specified Type 2 error may offer preferable methods. Some technical issues with the proposed methods might also be investigated. For example, we have assumed that primary inference is to be made on the mean effect across studies. This does not take into account the extent of heterogeneity. It may be more appropriate to base decisions on alternative characteristics of the random-effects distribution, such as the probability that an individual study will have an effect of a particular size. Such inferences can be drawn from the predictive distribution of the effect in a new study 24. Another limitation is that we have proposed the calculation of repeated confidence intervals until and including the first time that Vmax is exceeded. When this last repeated confidence interval is calculated, there is a probability of slightly less than 1 − α that the truth is contained in every interval. The reduction in the probability depends on the increase in V from Vmax, and will be greater if only a few sequential steps have been conducted. A better understanding of this issue would be useful. Finally, the advantage of using the Christmas tree correction for discrete monitoring with the O'Brien and Fleming design is that this leads to a simple procedure for calculating the repeated confidence intervals, but there are also some disadvantages to this approach. First, Stallard and Facey 39 show that the Christmas tree correction is less accurate for the O'Brien and Fleming design than for other types of sequential design, particularly if there are only a few looks at the data. Second, the Christmas tree correction depends on the information at the current (Vj) and previous (Vj − 1) meta-analysis update. If (VjVj − 1) is less than 0, there is no Christmas tree correction. However, if at the next update (Vj + 1Vj) is greater than 0, there will be a Christmas tree correction. However, Vj + 1 may be less than Vj − 1. Further work is needed to assess the effect that this might have on the validity of the results.

We have considered a frequentist sequential approach to meta-analysis, proposing semi-Bayes approaches to allow prior information about τ2 to be incorporated. The entire meta-analysis could alternatively be undertaken within a fully Bayesian framework. The meta-analyst would then calculate a new credibility interval for every meta-analysis update. The credibility interval, unlike the frequentist repeated confidence interval, is not corrected for multiple looks. The posterior distributions of µ and τ2 from the first meta-analysis would become the prior distribution for the second and so on. If τ2 increases during the course of the sequential meta-analysis, the credibility interval may become wider, but this causes no problem. Bayesian stopping rules for individual clinical trials have been proposed by various authors (see, for example 40, 41). However, as the Bayesian inference is not affected by the repeated updates, Bayesian monitoring procedures can have very poor frequentist properties, such as inflated Type I errors 42.


equation image


This work was supported by an MRC studentship, by MRC Grant U.1052.00.011 and by a grant from the PHG Foundation.