Sequential methods for random-effects meta-analysis

Although meta-analyses are typically viewed as retrospective activities, they are increasingly being applied prospectively to provide up-to-date evidence on specific research questions. When meta-analyses are updated, account should be taken of the possibility of false-positive findings due to repeated significance tests. We discuss the use of sequential methods for meta-analyses that incorporate random effects to allow for heterogeneity across studies. We propose a method that uses an approximate semi-Bayes procedure to update evidence on the among-study variance, starting with an informative prior distribution that might be based on findings from previous meta-analyses. We compare our method with other approaches, including the traditional method of cumulative meta-analysis, in a simulation study and observe that it has Type I and Type II error rates close to the nominal level. We illustrate the method using an example in the treatment of bleeding peptic ulcers. Copyright © 2010 John Wiley & Sons, Ltd.


Introduction
The meta-analysis of results from multiple, similar studies is traditionally considered a retrospective activity. However, many meta-analyses are updated over time as new studies are undertaken or are identified from the previously hidden literature. In particular, systematic reviews in the Cochrane Database of Systematic Reviews [1] are expected to be updated approximately every two years [2]. Furthermore, genuinely prospective meta-analyses are increasingly being performed. For example, meta-analyses have been planned to combine evidence from existing, ongoing clinical trials [3,4], and clinical trials have been designed with the aim of meta-analysing their results [5]. In the field of genetic epidemiology, potentially important genetic variants are examined prospectively in multiple existing collections of DNA samples until sufficient evidence of their role has been determined [6,7].
If a meta-analysis is conducted repeatedly on the addition of new studies, without any allowance for multiple testing, the overall risk of a false-positive finding will increase with the number of meta-analyses performed. Indeed, Berkey et al. note that when there is no underlying effect the process of continually updating a meta-analysis using standard significance tests will lead eventually to a false-positive result [8]. One way to address this problem is to exploit formal sequential methods. Although sequential methods are well established in the analysis of individual randomized trials, they have received much less attention in the meta-analysis context. This is an area where opinions differ. Some argue that, because meta-analysts do not (typically) have control over the generation of new evidence, they are not in a position to act on stopping rules and so sequential methods should not be applied.
In contrast, our view is that, since meta-analysts frequently make decisions on whether to recommend further research or whether to conclude that there is convincing evidence for the presence or absence of an effect, sequential methods can be as important to meta-analyses as to individual primary studies. The intention is to facilitate a justifiable recommendation rather than to control future research directly.
Whitehead [9] describes the use of a stopping boundaries approach for fixed and random-effects meta-analyses based on the sequential trial methodology of Whitehead [10]. Pogue and Yusuf suggest the use of monitoring boundaries based on alpha-spending functions and stochastic curtailment, with application to fixed-effect meta-analysis [11], and these have been implemented in practice [12]. These methods are based on desirable sample sizes in terms of numbers of participants across all studies. Lan et al. apply the law of the iterated logarithm, which 'penalizes' the usual test statistic to account for multiple tests and is equivalent to using a particular open-ended boundary in the framework of Whitehead, although the approach does not control Type II error [13,14].
Heterogeneity of effects is a key characteristic of many meta-analyses, and estimation of the among-study variance may create problems in the early stages of a sequential meta-analysis. The methods of Pogue have been adapted to account for heterogeneity in a (retrospective) cumulative meta-analysis scenario, by adjusting the desired sample size based on a function of $I^2$, a measure of inconsistency across the observed studies' findings [15,16]. The adjustment is derived essentially from a comparison of the (assumed known) variances of fixed-effect and random-effects meta-analysis estimates. In prospective meta-analyses these will not be known, although anticipated values of $I^2$ might be used. Much of the present paper is devoted to the problem of accounting prospectively for heterogeneity, as it is the principal obstacle to the simple application of sequential trial methodology.
In this paper we consider the use of frequentist sequential approaches to meta-analysis with the aim of controlling the risk of drawing incorrect conclusions. We take an approach based on 'group-sequential' trial methodology, with a stopping rule. A previously proposed method is reviewed and extended to better incorporate random effects. We compare six methods (including a sequential fixed-effect method) through a simulation study and compare them in an example in the treatment of bleeding peptic ulcers. Our emphasis is on the determination of a straightforward approach that has good empirical properties, recognizing that even more formal methods are available through the use of dedicated software such as PEST [17].

Random-effects meta-analysis
Before introducing the sequential methods we review a general parametric approach to meta-analysis on which the sequential methods are based [18]. Suppose we have an estimate, $y_i$, of a true effect $\theta_i$ with estimated variance $v_i$, from study $i$, $i = 1, \ldots, k$. The effect might be measured as a log odds ratio or a difference in means for a comparative study, or a simple mean or logit proportion for a single group study; numerous other options are possible. We make the usual assumption that the estimates and variances are uncorrelated, although this will not always be the case (for example, if larger studies tend to be done in populations in which a treatment is less effective). Alternative approaches are available that do not rely on this assumption. For example, the raw data from the studies could be modelled, which may require individual participant data. Alternatively, all studies could be given equal weight [19], although recent proposals to adopt this practice have not been enthusiastically received [20-22].
A random-effects analysis involves inference on the distribution of the $\theta_i$ across studies. A standard assumption is that this distribution is normal with mean $\mu$ and variance $\tau^2$. The variance describes the extent of heterogeneity. A simple estimate of $\mu$ is given by the weighted average
$$\hat{\mu} = \frac{\sum_i w_i^* y_i}{\sum_i w_i^*},$$
where $w_i^* = (v_i + \hat{\tau}^2)^{-1}$ represents the weight attributed to study $i$. An approximate variance of this estimate is given by
$$\mathrm{var}(\hat{\mu}) = \frac{1}{\sum_i w_i^*}.$$
The heterogeneity variance $\tau^2$ may be estimated simply by $\hat{\tau}^2_{DL}$, a method of moments estimator described by DerSimonian and Laird [23], where
$$\hat{\tau}^2_{DL} = \max\left\{0, \frac{Q - (k-1)}{\sum_i w_i - \sum_i w_i^2 / \sum_i w_i}\right\}, \qquad w_i = 1/v_i,$$
and
$$Q = \sum_i w_i \left(y_i - \frac{\sum_i w_i y_i}{\sum_i w_i}\right)^2$$
is the standard statistic used to test for heterogeneity across studies. In many circumstances, inferences from a random-effects meta-analysis should be based on the full distribution of the $\theta_i$ across studies rather than on $\mu$ alone. This can be achieved using a predictive interval for the effect in a new study, which takes into account both the mean $\mu$ and variance $\tau^2$ of effects [24]. In the methods we describe below, we assume that primary inference is to be made on the mean effect across studies, i.e. on $\mu$. If $\hat{\tau}^2_{DL}$ is equal to zero, then $w_i^*$ is equal to $w_i$ and the resulting calculations lead to a fixed-effect meta-analysis.
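The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `dersimonian_laird` and its return layout are hypothetical, and the formulae are the standard DerSimonian-Laird moment estimator with inverse-variance weights.

```python
def dersimonian_laird(y, v):
    """Return (mu_hat, var_mu_hat, tau2_dl, q) for estimates y with variances v.

    Hypothetical helper: standard DerSimonian-Laird random-effects calculation.
    """
    k = len(y)
    w = [1.0 / vi for vi in v]                      # fixed-effect weights w_i = 1/v_i
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Q: weighted sum of squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    denom = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments estimate, truncated at zero (and zero for a single study)
    tau2 = max(0.0, (q - (k - 1)) / denom) if k > 1 and denom > 0 else 0.0
    w_star = [1.0 / (vi + tau2) for vi in v]        # random-effects weights
    sws = sum(w_star)
    mu_hat = sum(wi * yi for wi, yi in zip(w_star, y)) / sws
    return mu_hat, 1.0 / sws, tau2, q
```

When the truncated estimate is zero the random-effects weights coincide with the fixed-effect weights, so the function then returns the fixed-effect result.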
The estimates $y_i$ and $v_i$ are typically calculated from either Wald statistics or score statistics. For the former, $y_i = \hat{\theta}_i$ and $v_i = \mathrm{var}(\hat{\theta}_i)$, where $\hat{\theta}_i$ denotes the maximum likelihood estimate of $\theta_i$ and $\mathrm{var}(\hat{\theta}_i)$ its variance. For the latter, we may write $y_i = Z_i / V_i$ and $v_i = 1/V_i$, where $Z_i$ is the efficient score for $\theta_i$ and $V_i$ is Fisher's information, both evaluated at $\theta_i = 0$. When sample sizes are large and $\theta_i$ is reasonably near 0, there will be little difference between the results from the two approaches.

A monitoring boundary approach to sequential random-effects meta-analysis
To follow the progress of a meta-analysis over time we require a measure of the amount of relevant information contained within it. One approach is to use the total number of study participants [11,15,16]. Pogue et al. introduce a desirable total that they refer to as the 'optimal information size', based on standard sample size calculations [11]. Wetterslev et al. adapt this to account for observed heterogeneity in a retrospective cumulative meta-analysis [16], although in our applications this heterogeneity is yet to be observed. In contrast, Whitehead [9] and Lan et al. [13,14] use statistical information in the form of the inverse variance of $\hat{\mu}$ or, equivalently, the sum of the weights (specifically, Whitehead uses Fisher's information). We choose the statistical information approach, since it relates directly to the precision of the meta-analysis estimate and hence the amount of evidence contained in the collection of studies. In particular, if highly heterogeneous studies are combined sequentially over time, the precision of a random-effects meta-analytic estimate may decrease while the number of participants increases. Our approach may be used when study estimates $y_i$ and $w_i$ are calculated either from Wald statistics (as in Lan et al.) or score statistics (as in Whitehead).
A boundaries approach to sequential meta-analysis involves monitoring information using the sum of weights, together with a cumulative measure of effect given by the weighted sum of the estimates. If $t_j$ studies are available at the $j$th update, we write
$$Z_j = \sum_{i=1}^{t_j} w_i^* y_i, \qquad V_j = \sum_{i=1}^{t_j} w_i^*.$$
Plotting $Z_j$ against $V_j$ at each update of the meta-analysis provides a visualization of the path of the meta-analysis over time. A line from the origin to a point $(Z_j, V_j)$ has slope $Z_j / V_j$, equal to the simple random-effects meta-analytic estimate at the corresponding stage. Note that as information about $\tau^2$ changes over time, the individual study weights, $w_i^*$, are recalculated on each update. In the absence of heterogeneity, and when $\tau^2$ is assumed to be zero (as in a fixed-effect meta-analysis), this approach may be justified formally [9]. It is equivalent to the group-sequential approach to monitoring clinical trials described by Whitehead [10]. Monitoring boundaries may be used to stop the meta-analysis when there is sufficient evidence of an effect or lack of an effect based on a pre-specified significance level, power and clinically important effect.
Whitehead [9] describes the implementation of a triangular design in the context of a prospectively planned meta-analysis, and Bollen et al. implement a double-triangular design [25]. Here we focus on the restricted procedure of Whitehead, equivalent to an O'Brien and Fleming stopping rule [26]. The O'Brien and Fleming design produces a rectangular stopping boundary in the $(Z, V)$ plane, with symmetric upper and lower boundaries at $Z = \pm H$ and a vertical boundary at $V = V_{\max}$ reflecting the maximum amount of information to be collected (Figure 1). Table I provides values of $H$ and $V_{\max}$ for this design. Because the meta-analysis is inspected only at discrete updates, the horizontal boundaries are brought inwards at the $j$th update from $\pm H$ to $\pm H_j$ using the 'Christmas tree correction' of Whitehead [10], which in its usual form is $H_j = H - 0.583\sqrt{V_j - V_{j-1}}$, with $V_0 = 0$. As soon as $|Z_j| \geq H_j$ or $V_j \geq V_{\max}$, the meta-analysis will be stopped with the conclusion that the mean effect is in one direction or the other (for the former criterion) or that the mean effect is zero (if only the latter criterion is true). We refer to the values of $Z_j$ and $V_j$ at this point as $Z_{\mathrm{end}}$ and $V_{\mathrm{end}}$. If the path stops with $V_{\mathrm{end}} > V_{\max}$, then we extend the Christmas tree correction to $V_{\mathrm{end}}$, and consider an effect to be present if $|Z_{\mathrm{end}}| \geq H_{\mathrm{end}}$.
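The monitoring rule can be illustrated with a short Python sketch. It is not the authors' implementation. The function name `sequential_path` and the `tau2_fn` hook are hypothetical. The sketch also assumes the usual form of the Christmas tree correction, $H_j = H - 0.583\sqrt{V_j - V_{j-1}}$, applied only when the path moves forwards.

```python
import math

def sequential_path(y, v, H, V_max, tau2_fn=lambda ys, vs: 0.0):
    """Follow (Z_j, V_j) one study at a time; return (j, Z_j, V_j, decision).

    tau2_fn is a hypothetical hook returning the heterogeneity variance to use
    at each update; the default of zero gives a fixed-effect analysis.
    """
    Z = V = V_prev = 0.0
    for j in range(1, len(y) + 1):
        tau2 = tau2_fn(y[:j], v[:j])
        w = [1.0 / (vi + tau2) for vi in v[:j]]     # weights recalculated each update
        Z = sum(wi * yi for wi, yi in zip(w, y[:j]))
        V = sum(w)
        # Assumed Christmas tree correction; no correction if the path goes backwards
        H_j = H - 0.583 * math.sqrt(V - V_prev) if V > V_prev else H
        if abs(Z) >= H_j:
            return j, Z, V, "effect"
        if V >= V_max:
            return j, Z, V, "no effect"
        V_prev = V
    return len(y), Z, V, "continue"
```

For example, `sequential_path([1.0] * 10, [0.5] * 10, 14.92, 44.32)` follows ten identical studies against the boundaries used later in the simulation study. Passing an estimator of the among-study variance as `tau2_fn` turns the same loop into a random-effects analysis.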
The methods we describe can be extended to address incremental information within studies as well as across studies. For instance, if at time $j$, study $i$ has interim data yielding an estimate $y_{ij}$ and weight $w_{ij}^*$ (based on the current estimate of $\tau^2$ across currently available data), then we can calculate
$$Z_j = \sum_i w_{ij}^* y_{ij}, \qquad V_j = \sum_i w_{ij}^*.$$
This situation was addressed by Whitehead [9] but we do not pursue it further in the present paper.

Repeated confidence intervals
The sequential procedure, described in the previous section in terms of monitoring boundaries, may also be presented using the repeated confidence intervals approach introduced by Jennison and Turnbull [27]. This enables us to present sequential meta-analyses using forest plots, the conventional way of illustrating meta-analyses. Lower and upper limits, $L_j$ and $U_j$, for repeated confidence intervals after the inclusion of $t_j$ studies can be obtained by inverting the sequential design, yielding
$$L_j = \frac{Z_j - H_j}{V_j}, \qquad U_j = \frac{Z_j + H_j}{V_j}. \qquad (1)$$
The stopping rule based on the horizontal monitoring boundaries, that is, stop the meta-analysis if $Z_j \leq -H_j$ or if $Z_j \geq H_j$, is equivalent to stopping if the interval $(L_j, U_j)$ excludes 0. The repeated confidence intervals $(L_j, U_j)$ have the property that all of them, calculated up until $V = V_{\max}$, contain the true value of the parameter with probability $1 - \alpha$, where $\alpha$ is the chosen two-sided significance level. When the amount of information is less than $V_{\max}$, the confidence interval series has probability greater than $1 - \alpha$ that the truth is contained in every interval. This means that at any time, an individual confidence interval has probability greater than $1 - \alpha$ of containing the truth, and is therefore wider than a confidence interval from a conventional meta-analysis.
The repeated confidence interval calculated at $V = V_{\max}$ is the last valid member of the sequence. Thus if the amount of information exceeds $V_{\max}$ it is unclear how to proceed. For consistency with the previous section, we propose the calculation of repeated confidence intervals until and including the first time that $V_{\max}$ is exceeded. Thereafter, this last confidence interval is used irrespective of how many additional data accumulate. This is likely to be conservative, but is based on the result that the coverage property of the valid sequence of repeated confidence intervals is also satisfied by the intersection of all of the repeated confidence intervals.
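The rule for carrying the last interval forward can be sketched as follows. This is an illustrative Python fragment, not the authors' code. It assumes that inverting the horizontal boundaries gives limits of the form $(Z_j \mp H_j)/V_j$, which is the form under which an interval excludes 0 exactly when $|Z_j| > H_j$. The function name `repeated_intervals` and the input layout are hypothetical.

```python
def repeated_intervals(path, V_max):
    """path: list of (Z_j, V_j, H_j) tuples, one per update.

    Returns one (L_j, U_j) pair per update. Intervals are computed up to and
    including the first update at which V_j exceeds V_max; after that the last
    computed interval is reused unchanged.
    """
    intervals, frozen = [], None
    for Z, V, H_j in path:
        if frozen is None:
            ci = ((Z - H_j) / V, (Z + H_j) / V)     # assumed inverted-boundary form
            if V > V_max:
                frozen = ci                          # last member of the sequence
        else:
            ci = frozen
        intervals.append(ci)
    return intervals
```

Each tuple in `path` would come from one update of the meta-analysis, so the output can be drawn directly as the rows of a forest plot.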

Formal versus ad hoc stopping rules
At each analysis the repeated confidence interval provides confidence limits that are valid and not dependent on any stopping rule which might be used. Thus information about the magnitude of $\mu$ can be obtained from the latest confidence interval. If the sequential meta-analysis follows a formal stopping rule, then a final analysis would be conducted after a stopping boundary had been crossed. A point estimate of $\mu$ is given by $Z_{\mathrm{end}} / V_{\mathrm{end}}$ and the repeated confidence interval can be calculated. However, this point estimate may be biased. Methods for obtaining adjusted estimates and confidence intervals are available based on sequential methodology (see, for example, Chapter 5 of Whitehead [10]), but these require specialized software, and we do not address them in detail here. The adjusted confidence interval will be narrower than that based on repeated confidence interval methodology. This is because the latter is not dependent on the stopping rule that has been used.

Estimation of heterogeneity variance
A key issue in sequential random-effects meta-analysis is the heterogeneity variance parameter, $\tau^2$. In the early stages of a sequential procedure, when very few studies are included, uncertainty surrounding any estimate of among-study variance will be considerable. Lan et al. prefer to over-estimate $\tau^2$ by using the arithmetic mean of the $y_i$ when there are five or fewer studies, mainly to avoid spurious estimates of zero [13,14]. Whichever estimate is used, consecutive point estimates of the parameter are likely to be variable, and if $\hat{\tau}_j > \hat{\tau}_{j-1}$ it is possible that $V_j$ will be smaller than $V_{j-1}$. As a consequence, the path $(Z_j, V_j)$ of the sequential meta-analysis will go backwards. The situation also has implications for the Christmas tree correction, and if $V_j$ is smaller than $V_{j-1}$ we set $H_j = H$, so that a correction is only made if the path goes forwards. Within a repeated confidence interval framework, a decrease from $V_{j-1}$ to $V_j$ yields a confidence interval wider than the previous one. We consider that the possibility that the confidence intervals can become wider (or that the path goes backwards), and the actual widening of the confidence intervals, is a suitable reflection of the increased uncertainty resulting from identifying inconsistency among the studies.
In Section 3 we consider five methods for addressing the heterogeneity variance in a sequential meta-analysis: (i) ignoring it (a fixed-effect meta-analysis); (ii) implementing a standard random-effects method (using the DerSimonian-Laird estimate); (iii) accounting for uncertainty in $\tau^2$ (using the approach of Biggerstaff and Tweedie [28]); (iv) Bayesian updating of uncertainty in $\tau^2$; and (v) approximate Bayesian updating of uncertainty in $\tau^2$. The last two approaches are proposed to allow for prior information on $\tau^2$ to be incorporated in the early stages of the sequential meta-analysis. We refer to the sequential approaches using Bayesian inference for $\tau^2$ as 'semi-Bayes' to reflect the fact that inference on the parameter of interest ($\mu$) is frequentist. We compare the five approaches, contrasting them with a naïve cumulative random-effects meta-analysis without correction for multiple looks, in a simulation study in Section 4.

Fixed-effect sequential meta-analysis using the general parametric approach
For a fixed-effect sequential meta-analysis, as described by Whitehead [9], we take
$$Z_j = \sum_{i=1}^{t_j} w_i y_i, \qquad V_j = \sum_{i=1}^{t_j} w_i,$$
where $y_i$ and $w_i$ are as defined in Section 2.1. As with all methods described below, we apply the confidence interval formulae in Equation (1) to produce repeated confidence intervals.

Random-effects sequential meta-analysis using the general parametric approach
For a random-effects sequential meta-analysis, also described by Whitehead, we take
$$Z_j = \sum_{i=1}^{t_j} w_i^* y_i, \qquad V_j = \sum_{i=1}^{t_j} w_i^*,$$
where $y_i$ and $w_i^*$ are as defined in Section 2.1 and $w_i^*$ involves the DerSimonian-Laird estimator, $\hat{\tau}^2_{DL,j}$, applied to the studies accumulated so far in the sequential meta-analysis. The estimate of $\tau^2$ is treated as known at each update, as in a conventional meta-analysis.

Accounting for uncertainty in the estimation of $\tau^2$
Methods that incorporate uncertainty in $\hat{\tau}^2$ are available (Biggerstaff and Tweedie [28], Hardy and Thompson [29]). Biggerstaff and Tweedie consider an approach for accounting for uncertainty in the DerSimonian-Laird estimate based on a gamma approximation to the distribution of $Q$. This leads the authors to a revised weighting scheme for combining the effect estimates, $y_i$:
$$\hat{\mu}_{BT} = \frac{\sum_i \tilde{w}_i y_i}{\sum_i \tilde{w}_i},$$
where the $\tilde{w}_i$ are obtained as the expected values of the DerSimonian-Laird weights, $w_i^*$, over the approximating distribution for $\hat{\tau}^2_{DL}$ (allowing for truncation at zero), obtained for example using numerical integration.
As the new weights, $\tilde{w}_i$, are no longer inverse variances, we define $Z_j$ and $V_j$ as
$$V_j = \frac{1}{\mathrm{var}(\hat{\mu}_{BT,j})}, \qquad Z_j = \hat{\mu}_{BT,j} V_j,$$
so that $V_j$ remains the inverse variance of the estimate at the $j$th update and $Z_j / V_j$ remains the estimate itself. Incorporating uncertainty in estimates of $\tau^2$ into the meta-analysis should slow the progress of the $(Z, V)$ path towards the boundary, since the uncertainty is reflected in a smaller value for $V$. Early repeated confidence intervals are likely to be wide.

Incorporating prior information about heterogeneity: Bayesian updating of $\tau^2$
A natural approach to making inferences about $\tau^2$ in the presence of a small number of studies is to draw on external evidence about its likely value. An informative prior distribution may contribute heavily to producing a realistic estimate in the early stages of a sequential meta-analysis. We may therefore expect the estimates of $\tau^2$ in the meta-analysis to be less erratic, and the path to proceed forward towards the boundaries. We propose to use Bayes' theorem to update the value of $\tau^2$, and to insert this into the frequentist sequential approach outlined above. A suitable prior distribution may be developed following the ideas of Higgins and Whitehead [30], who gathered a collection of meta-analyses in the same therapeutic area and produced an empirical distribution from the observed degrees of heterogeneity for a similar type of outcome. A point summary from the prior distribution may be used in the analysis of the first study in the sequential meta-analysis, and point summaries from the updated posterior distribution for $\tau^2$ may be used in subsequent updates of the sequential meta-analysis.
Let $p(\tau^2)$ represent the prior distribution for $\tau^2$, which here will be considered to be an inverse gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$. At the $j$th meta-analysis the likelihood function for the data, $L(\mu, \tau^2; \{y_1, \ldots, y_{t_j}\}, \{v_1, \ldots, v_{t_j}\})$, is a product of normal distributions with means equal to $\mu$ and variances equal to $(v_i + \tau^2)$. The joint posterior distribution of $\mu$ and $\tau^2$ at this meta-analysis, assuming an independent prior distribution, $p(\mu)$, for $\mu$, is given by
$$p(\mu, \tau^2 \mid \{y_i\}, \{v_i\}) \propto L(\mu, \tau^2; \{y_1, \ldots, y_{t_j}\}, \{v_1, \ldots, v_{t_j}\})\, p(\mu)\, p(\tau^2).$$
Inference on $\tau^2$ would normally be performed by integration or simulation. However, $\mu$ is unknown and in our frequentist analysis is not associated with a probability distribution, so cannot be integrated out. A solution is to replace $\mu$ by its estimate, $\hat{\mu}_{j-1}$, from the $(j-1)$th meta-analysis. This makes $\tau^2$ the only unknown parameter in the posterior formula, and a simple one-dimensional numerical integration can be performed to obtain its posterior mean, $\hat{\tau}^2_{B,j}$, as follows:
$$\hat{\tau}^2_{B,j} = \frac{\int_0^\infty \tau^2\, L(\hat{\mu}_{j-1}, \tau^2; \{y_i\}, \{v_i\})\, p(\tau^2)\, d\tau^2}{\int_0^\infty L(\hat{\mu}_{j-1}, \tau^2; \{y_i\}, \{v_i\})\, p(\tau^2)\, d\tau^2}.$$
If the first meta-analysis ($j = 1$) contains several studies (i.e. $t_1 > 1$), then a suitable value for $\hat{\mu}_{j-1}$ might be provided by the estimate from a standard random-effects meta-analysis based on the first $(t_1 - 1)$ studies. If the first meta-analysis contains exactly two studies ($t_1 = 2$) we would set $\hat{\mu}_{j-1}$ equal to $y_1$. We opt for the posterior mean rather than the median or mode since not only is it straightforward to calculate, but it will tend to overestimate $\tau^2$ in the early stages, producing an analysis that is less likely to yield a false-positive result. As information accumulates, and the posterior distribution of $\tau^2$ develops a stronger peak, the mean will move towards the median and the mode.
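The one-dimensional integration can be sketched in Python as below. This is a rough illustration under stated assumptions, not the authors' program: the function name `posterior_mean_tau2`, the names `alpha` and `beta` for the inverse gamma shape and scale, the midpoint rule, and the truncation of the integral at `upper` are all choices made here for illustration. Truncation ignores the upper tail of the posterior and so can bias the result slightly downwards.

```python
import math

def posterior_mean_tau2(y, v, mu_prev, alpha, beta, upper=5.0, n=20000):
    """Approximate posterior mean of the among-study variance.

    The mean effect is fixed at mu_prev (the previous update's estimate), the
    prior is inverse gamma(alpha, beta), and the integral over (0, upper) is
    evaluated by the midpoint rule. All names here are hypothetical.
    """
    h = upper / n
    num = den = 0.0
    for s in range(n):
        t2 = (s + 0.5) * h                           # midpoint of the s-th cell
        # log of the unnormalized inverse gamma prior density
        log_post = -(alpha + 1.0) * math.log(t2) - beta / t2
        # add the normal log-likelihood, with variance v_i + tau^2 per study
        for yi, vi in zip(y, v):
            log_post += -0.5 * math.log(vi + t2) - (yi - mu_prev) ** 2 / (2.0 * (vi + t2))
        p = math.exp(log_post)
        num += t2 * p
        den += p
    return num / den
```

With closely agreeing studies the result falls below the prior mean, and with widely scattered studies it rises above it, which is the behaviour the updating scheme relies on.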
For a random-effects sequential meta-analysis, we take
$$Z_j = \sum_{i=1}^{t_j} w_{B,i}^* y_i, \qquad V_j = \sum_{i=1}^{t_j} w_{B,i}^*,$$
where $w_{B,i}^* = (v_i + \hat{\tau}^2_{B,j})^{-1}$. In principle, the approach could be extended to incorporate additional uncertainty in $\tau^2$ by applying the ideas of Biggerstaff and Tweedie. Specifically, the integrations in the calculation of $\hat{\tau}^2_{B,j}$ could be performed for the weights $w_{B,i}^*$ rather than for $\hat{\tau}^2_{B,j}$. We have not implemented this extension.

An approximation to the Bayesian approach for updating $\tau^2$
We now propose a simpler approach that avoids the numerical integration. Suppose the true effects in the studies to date were known exactly, so that $y_i = \theta_i$ and $v_i = 0$. Then the inverse gamma prior distribution for $\tau^2$ is the conjugate prior distribution for the unknown variance of a normal distribution, and the posterior distribution for $\tau^2$ is then an inverse gamma distribution with parameters $(\alpha + t_j/2)$ and $(\beta + \tfrac{1}{2}\sum_{i=1}^{t_j} (\theta_i - \mu)^2)$, where $\sum_{i=1}^{t_j} (\theta_i - \mu)^2 / t_j$ is the heterogeneity variance among the studies up to $t_j$. We may estimate this variance using the DerSimonian-Laird estimate $\hat{\tau}^2_{DL,j}$. As the mean of an inverse gamma distribution with parameters $\alpha$ and $\beta$ is given by $\beta/(\alpha - 1)$, it follows that the posterior mean for $\tau^2$ is given by
$$\hat{\tau}^2_{A,j} = \frac{\beta + t_j \hat{\tau}^2_{DL,j}/2}{\alpha + t_j/2 - 1}.$$
For a random-effects sequential meta-analysis, our approximate semi-Bayes approach takes
$$Z_j = \sum_{i=1}^{t_j} w_{A,i}^* y_i, \qquad V_j = \sum_{i=1}^{t_j} w_{A,i}^*, \qquad w_{A,i}^* = (v_i + \hat{\tau}^2_{A,j})^{-1}.$$
R code for this method is included in the Appendix. In practice, the approximations here might be expected to yield adequate results when the studies are large (in which case $y_i \approx \theta_i$ and $v_i \approx 0$), and to overestimate $\tau^2$ when the studies are small, due to the overdispersion in the distribution of the estimates, $y_i$, compared with the distribution of the true effects, $\theta_i$.
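The conjugate update reduces to a single formula, sketched here in Python. This is separate from the R code in the Appendix and rests on assumptions: the function names are hypothetical, `alpha` and `beta` denote the inverse gamma shape and scale, and the posterior mean is taken to be $(\beta + t_j \hat{\tau}^2_{DL,j}/2)/(\alpha + t_j/2 - 1)$, which follows from the inverse gamma mean $\beta/(\alpha - 1)$ applied to the updated parameters.

```python
def approx_posterior_mean_tau2(tau2_dl, t, alpha, beta):
    """Approximate semi-Bayes estimate of the among-study variance.

    tau2_dl is the DerSimonian-Laird estimate from the t studies so far;
    alpha and beta are the (assumed) inverse gamma shape and scale.
    """
    return (beta + 0.5 * t * tau2_dl) / (alpha + 0.5 * t - 1.0)

def prior_weight(t, alpha):
    """Share of the posterior mean contributed by the prior mean beta/(alpha-1)."""
    return (alpha - 1.0) / (alpha + 0.5 * t - 1.0)
```

The posterior mean is a weighted average of the prior mean and the DerSimonian-Laird estimate, with the prior receiving the share returned by `prior_weight`. With a shape parameter of 1.5 that share is $1/(1 + t)$, so the prior counts for about as much as one study.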

Methods of the simulation study
We conducted a simulation study to examine the behaviour of the five sequential meta-analysis methods described in Section 3 under a wide range of conditions. These were also compared with a traditional cumulative meta-analysis approach that simply performs a conventional random-effects meta-analysis on the addition of each study. We chose an O'Brien and Fleming design with 90 per cent power to detect an effect $\mu_R$ of 0.5 with a two-sided significance level of 5 per cent. This value for $\mu_R$ corresponds to a moderately sized standardized mean difference [31], or to an odds ratio, hazard ratio or risk ratio of 1.65. The boundaries for the design, obtained from the package PEST 4.0, are given by $H = 14.92$ and $V_{\max} = 44.32$ (see Table I).
The simulation study was conducted with true effect sizes of $\mu = 0$, 0.25, 0.5, and 1, that is ranging from no effect to an effect substantially larger than $\mu_R$. Values for the heterogeneity variance $\tau^2$ were 0, 0.0625, and 0.25. We simulated studies such that the expected number of studies under the null hypothesis under a fixed-effect analysis would be $t = 5$, $t = 10$, and $t = 20$, as described below. This corresponds to an approximate doubling of sample size for each case we investigated. In total we examined 36 scenarios.
Within each simulated sequential meta-analysis the true effect size from each study, $\theta_i$, was sampled from a $N(\mu, \tau^2)$ distribution. The mean within-study variance of the effect estimates, $y_i$, across all studies was taken to be $\sigma^2 = t/V_{\max}$, ensuring that there would be approximately $t$ studies in a sequential fixed-effect meta-analysis if the vertical boundary was crossed at $V_{\max}$ and $\tau^2 = 0$. The within-study variance of the effect estimate for each study was sampled from a uniform distribution in the range $(\sigma^2 - 0.75\sigma^2, \sigma^2 + 0.75\sigma^2)$. This ensures that the studies have different degrees of precision, reflecting a realistically wide range of study sizes within each meta-analysis. As an approximate interpretation, for the case of one-group studies estimating means with a between-person standard deviation equal to 4 in every study, this ensures that, for $t = 5$, sample sizes would be distributed between 81 and 567 rather than all being equal to 142. The prior distribution for $\tau^2$ for the semi-Bayes methods was an inverse gamma distribution with parameters $\alpha = 1.5$ and $\beta = 0.08$. This corresponds to a prior estimate for $\tau^2$ of 0.16. This is less than the maximum simulated heterogeneity, and, if the true effect size were 0.5, allows for some studies to have negative effect sizes. Taking $\alpha = 1.5$ is equivalent to having prior information with the weight of one study, so the prior estimate contributes less than 10 per cent towards the posterior estimate of heterogeneity after ten studies have been included in the meta-analysis.
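The data-generating step described above can be sketched in Python. The authors' simulation was written in R, so this is only an illustrative re-expression: the function name `simulate_studies`, the argument names and the use of Python's `random.Random` generator are assumptions made here.

```python
import random

def simulate_studies(mu, tau2, t, V_max, n_studies, rng):
    """Generate (y_i, v_i) pairs for one simulated meta-analysis.

    The mean within-study variance is t / V_max, and each study's variance is
    drawn uniformly within 75 per cent either side of that mean.
    """
    sigma2 = t / V_max
    studies = []
    for _ in range(n_studies):
        theta = rng.gauss(mu, tau2 ** 0.5)           # true effect in this study
        v = rng.uniform(0.25 * sigma2, 1.75 * sigma2)
        y = rng.gauss(theta, v ** 0.5)               # observed estimate
        studies.append((y, v))
    return studies

# Implied sample sizes for one-group means with standard deviation 4 and t = 5
sigma2_t5 = 5 / 44.32
n_mean = round(16 / sigma2_t5)                       # equal-sized studies
n_small = round(16 / (1.75 * sigma2_t5))             # least precise study
n_large = round(16 / (0.25 * sigma2_t5))             # most precise study
```

The last three lines reproduce the approximate interpretation in the text, converting a within-study variance into a sample size through $n = 16/v_i$.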
For each of the 36 scenarios 5000 sequential meta-analyses were simulated and analysed using each of the six approaches, using the procedures described in Section 2. The stopping rule for the naïve cumulative meta-analysis method was to stop (conclude an effect exists) if the naïve 95 per cent confidence interval excludes 0; or to stop (conclude no effect) if $V_j$ exceeds $V_{\max}$; or otherwise to continue. The simulation was conducted in R (the program is available upon request).
From the simulations, for each approach, we recorded (i) the mean number of studies taken to reach a stopping boundary of the design; (ii) the proportion of simulations stopping having crossed the upper boundary (in order to evaluate Type I error under the null hypothesis); (iii) the percentage of simulations in which all calculated two-sided 95 per cent confidence intervals contained the true effect size (that is, the coverage of all confidence intervals); (iv) the percentage of simulations in which the confidence interval at the time when a stopping boundary is crossed (based on $Z_{\mathrm{end}}$ and $V_{\mathrm{end}}$) contained the true effect size (coverage of the last confidence interval); (v) estimates of the heterogeneity variance, $\tau^2$, at the time when a stopping boundary is crossed; and (vi) the mean value of $V_{\mathrm{end}}$.

Results of the simulation study
Tables II-IV give a selection of the results when $t = 5$ (few, large studies), $t = 10$ (more, average sized studies) and $t = 20$ (many, small studies). The results of the simulations should be interpreted in comparison with the fixed-effect sequential method as well as in isolation. The fixed-effect sequential method should in theory have good properties in the absence of heterogeneity. However, the results do not agree with the expectations due to a combination of (i) the inadequacy of the Christmas tree correction; (ii) our random sampling of within-study variances from a uniform distribution, which means that we cannot guarantee to hit the $V_{\max}$ boundary with the $t$th study; and (iii) the ad hoc decision to consider an effect present when $V_{\mathrm{end}} > V_{\max}$ and $|Z_{\mathrm{end}}| \geq H_{\mathrm{end}}$, a region in which sequential theory does not formally apply. The third issue is expected to increase the probability of declaring that an effect exists, with a larger increase the further away $V_{\mathrm{end}}$ is from $V_{\max}$. When $\mu = 0$, the vertical boundary will be crossed in the majority of simulations, and it can be seen from Table II ($\tau^2 = 0$) with approximately five studies to reach $V_{\max}$ that the Type I error rate is 0.039 instead of 0.025. When there are more and smaller studies, for example in Table IV, $V_{\mathrm{end}}$ moves closer to $V_{\max}$ and it can be seen that the Type I error rate reduces to 0.028. In addition, the probability that all confidence intervals up to and including the one at $V_{\mathrm{end}}$ will contain the true effect size will only be equal to 0.95 if $V_{\mathrm{end}} = V_{\max}$.
If $V_{\mathrm{end}} < V_{\max}$, this probability will be greater than 0.95, and if $V_{\mathrm{end}} > V_{\max}$ this probability will be less than 0.95. The closer that $V_{\mathrm{end}}$ is to $V_{\max}$, the closer the probability will be to 0.95. When $\mu = 0$ and $\tau^2 = 0$, this probability is equal to 0.922 (Table II) and 0.939 (Table IV). The results for the other sequential meta-analysis approaches should be viewed in this light.
The Biggerstaff and Tweedie method is not noticeably different from the sequential random-effects meta-analysis. This is because it does not change the estimate of heterogeneity, but widens the confidence interval in line with the uncertainty in the estimation of heterogeneity. This difference in confidence intervals appears to be small after the first few studies, so the Biggerstaff and Tweedie adjustment has little impact on the analysis. The simulation results show that the sequential random-effects meta-analysis generally takes at least one study more to reach a stopping boundary than the sequential fixed-effect meta-analysis, the additional number of studies increasing with increasing heterogeneity. The semi-Bayes methods generally result in about one more study than the sequential random-effects meta-analysis. This shows that taking account of heterogeneity can reduce the risk of early stopping.
As the number of studies increases it can be seen that the Type I error rate for the naïve cumulative random-effects meta-analysis without correction for multiple looks also increases, in contrast to the other random-effects approaches which allow for the multiple looks. Type I error rates increase with increasing heterogeneity, this being most noticeable for the sequential fixed-effect meta-analysis, which makes no allowance for heterogeneity. The use of a sequential random-effects meta-analysis improves the Type I error rate in the presence of heterogeneity, but it is still unacceptably high when there is a moderate amount of heterogeneity. Hence these methods may produce spurious findings if there is heterogeneity. The semi-Bayes methods produce values closer to the desired value, but they too show a slight increase with heterogeneity. The semi-Bayes methods show considerably greater coverage than the sequential fixed-effect meta-analysis when there is no heterogeneity. This over-coverage is caused by their invalid prior assumption of non-zero heterogeneity. The coverage falls off rapidly with heterogeneity for the sequential fixed and random-effects meta-analyses. This reduction in coverage is smaller for the semi-Bayes methods.
There is some underestimation of heterogeneity in the sequential random-effects meta-analysis, which may lead to smaller confidence intervals and hence to early stopping. The semi-Bayes methods provide estimates of $\tau^2$ that are closer to the true value provided that $\tau^2$ is not equal to 0. If there is no heterogeneity the Bayesian methods give an overestimate.
In conclusion, the semi-Bayes and approximate semi-Bayes methods had the most desirable false-positive and coverage properties in our simulation study, despite the prior distribution being informative and derived independently of the simulated meta-analyses. Our findings may be dependent on the choice of prior distribution, and we return to this issue in the Discussion.

Application
Sacks et al. consider the data from 23 trials that compare endoscopic haemostasis with a control treatment for the treatment of bleeding peptic ulcers [32]. The outcome of interest is post-treatment bleeding and the treatment effect recorded here is the log-odds ratio of 'no bleeding'. The trial estimates $y_i$ and $v_i$ are calculated from Wald statistics, that is $y_i = \log\{S_1 F_2/(F_1 S_2)\}$ and $v_i = 1/S_1 + 1/F_1 + 1/S_2 + 1/F_2$, where $S_1$ and $F_1$ are the number of patients with 'no bleeding' and 'bleeding' in the endoscopic haemostasis group, and $S_2$ and $F_2$ are defined similarly for the control group. A positive value for $y_i$ favours the experimental treatment. The data and log-odds ratio estimates and confidence intervals for the individual studies are presented in Table V. For trials in which there were no patients with 'no bleeding' or no patients with 'bleeding' in at least one treatment group, 0.5 was added to all of the four cells of the 2×2 contingency table. Although this is common practice, for meta-analyses of rare events it is often preferable to use alternative approaches such as logistic regression [33]. However, such methods do not fall conveniently into the general sequential framework we describe. A conventional random-effects meta-analysis of the full data set (Figure 2) demonstrates a significant benefit of treatment, with a log-odds ratio of 1.09 (an odds ratio of 3.0), but with considerable heterogeneity ($\hat{\tau}^2_{DL} = 0.91$). We suppose for the purposes of illustration that this meta-analysis had been planned with the aim of incorporating observations at the end of each trial as they became available. Given the anticipated heterogeneity of effects, we apply the random-effects methods described above. The analysis uses an O'Brien and Fleming design in order to avoid early stopping unless the odds ratio is very large, with a significance level of 5 per cent and 90 per cent power to detect a log-odds ratio of 0.693 (an odds ratio of 2). From Table I, for R = 0.693 this gives H = 10.77 and $V_{\max}$ = 23.07 (Figure 1).
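The per-trial calculation described above can be sketched as follows. This is a minimal illustration that assumes the standard Wald form of the log-odds ratio and its variance, with 0.5 added to every cell whenever any cell is zero. The function names and the example counts are hypothetical and are not taken from Table V.

```python
import math


def log_odds_ratio(s1, f1, s2, f2):
    """Wald log-odds ratio of 'no bleeding' and its variance for one trial.

    s1, f1: patients without / with bleeding in the treatment group.
    s2, f2: patients without / with bleeding in the control group.
    If any cell is zero, 0.5 is added to all four cells.
    """
    cells = [float(s1), float(f1), float(s2), float(f2)]
    if 0.0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    y = math.log((a * d) / (b * c))        # positive values favour treatment
    v = 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d
    return y, v


def wald_interval(y, v, z=1.96):
    """Approximate 95 per cent confidence interval for the log-odds ratio."""
    half_width = z * math.sqrt(v)
    return y - half_width, y + half_width


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the zero-cell correction.
    y, v = log_odds_ratio(10, 0, 5, 5)
    print(y, v, wald_interval(y, v))
```

A trial with a zero cell yields a large variance after the correction, which is why such trials receive little weight in an inverse-variance meta-analysis.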
The sequential random-effects meta-analysis and the approximate semi-Bayes and semi-Bayes methods are applied and compared with a sequential fixed-effect meta-analysis (Figure 3). To assess sensitivity to prior distributions, two inverse gamma prior distributions for $\tau^2$ are used for the semi-Bayes methods: an IG(1.5, 0.08) has a prior mean of 0.16 and an IG(1.5, 1) has a prior mean of 2. These prior means are respectively much smaller and much larger than the DerSimonian-Laird estimate based on all of the trials. Table VI shows the number of studies included at the point when the stopping criterion is met, together with estimates of the treatment effect ($Z_{\text{end}}/V_{\text{end}}$) and heterogeneity, and the last repeated confidence interval. If the plan for this sequential meta-analysis had been to follow the formal stopping rule of the O'Brien and Fleming design, then it would be more appropriate to present a 'final' analysis which correctly adjusts for earlier looks. The results from conducting this 'final' analysis using the PEST package [17] are included in Table VI. Such an analysis corrects for the bias in the $Z_{\text{end}}/V_{\text{end}}$ estimate, and provides an appropriate P value and confidence interval. The confidence interval is narrower than the last repeated confidence interval, because the repeated confidence intervals have had no effect on the decision to conduct further studies.
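The quoted prior means follow from the usual formula for the mean of an inverse gamma distribution, scale/(shape − 1), which holds for shape greater than 1. The short sketch below checks the arithmetic; it assumes the shape and scale parameterization, which is the one consistent with the means stated above, and the function name is illustrative.

```python
def inverse_gamma_mean(shape, scale):
    """Mean of an inverse gamma distribution in the shape/scale form."""
    if shape <= 1:
        raise ValueError("the mean is undefined for shape <= 1")
    return scale / (shape - 1)


if __name__ == "__main__":
    for shape, scale in [(1.5, 0.08), (1.5, 1.0)]:
        print(shape, scale, inverse_gamma_mean(shape, scale))
```

Because the shape parameter of 1.5 is below 2, both priors have infinite variance, so they are informative about the location of the among-study variance while remaining heavy-tailed.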
These results show that a sequential fixed-effect meta-analysis, which ignores the heterogeneity, stops after four studies (see also Figure 1), leading to a smaller estimate of the treatment effect compared with that from all trials and a confidence interval that is narrow. The sequential random-effects method requires 11 studies, and produces a slightly larger estimate of the treatment effect and a wider confidence interval. When the prior distribution provides a much smaller estimate of $\tau^2$ than is apparent in the dataset, as with the IG(1.5, 0.08) prior, the semi-Bayes method stops after a smaller number of studies (nine) although the approximate semi-Bayes method carries on for the same 11 studies as for the sequential random-effects method. The earlier stopping of the semi-Bayes IG(1.5, 0.08) method results in smaller estimates of treatment effect and heterogeneity. The use of an IG(1.5, 1) prior with its large prior estimate of heterogeneity causes the approximate semi-Bayes and semi-Bayes methods to stop after 15 trials. The estimates of treatment effect and heterogeneity from the sequential approaches are smaller than those from the overall random-effects analysis of 23 studies. This is mainly due to the seventeenth trial, which has a large, beneficial effect of treatment, increasing the heterogeneity.
In our example, we have assumed that a sequential approach to the meta-analysis had been planned before the trials had reported their results. The example is instructive in illustrating some of the limitations of applying our generic statistical methods prospectively in a meta-analysis situation. We have remarked already that 2×2 contingency tables may be better analysed using binomial likelihoods. This would have avoided the need to make continuity corrections by adding 0.5 to cell counts. Furthermore, the use of maximum likelihood estimates of odds ratios also creates a correlation between the $y_i$ and $v_i$. Indeed, a commonly applied test for association between the estimates and standard errors produces P = 0.005 [34], although a more appropriate test based on efficient score and Fisher information statistics, which overcomes artefactual correlation between the estimates and weights, produces P = 0.44 [35]. Also, some of the studies display extreme treatment effects; for example, in the Chung study, 0 and 100 per cent of the patients bled in the treatment and control groups, respectively. This study is not very informative about the magnitude of the odds ratio, and receives very little weight in the meta-analysis, despite providing convincing evidence that the treatment has a large effect on the absolute risk of bleeding. Using risk differences rather than odds ratios as the effect size would have given this study more weight, but risk differences are generally discouraged as a metric for meta-analysis [36].

Discussion
This paper has considered the use of sequential methods for meta-analyses that incorporate random effects to allow for heterogeneity between studies. Our emphasis has been on determining a straightforward approach that has good empirical properties. We have developed and compared five methods, including a direct extension of the standard random-effects meta-analysis method, an extension of the approach of Biggerstaff and Tweedie, a semi-Bayes method involving Bayesian updating of the heterogeneity variance, and an approximation to the semi-Bayes method. The choice of an O'Brien and Fleming sequential design with a Christmas tree correction for discrete monitoring leads to a simple procedure for calculating repeated confidence intervals, which enables us to present sequential meta-analyses using forest plots, the conventional way of illustrating meta-analysis results. When either a repeated confidence interval excludes zero or $V_{\max}$ is reached, then the recommendation can be made that no further research is required to inform the question addressed by the meta-analysis.
Some issues are worthy of discussion, including philosophical, practical, and technical considerations. First, should sequential methods be applied to meta-analysis? Second, if sequential methods are to be used, what methods should be implemented in practice? Third, what technical problems remain that require further research?
The use of sequential methods for meta-analysis is contentious. Whereas the steering committee for a prospective study has control over the recruitment of participants, researchers undertaking meta-analyses may have no direct control over whether further studies are performed. Whether sequential methods have a role for updating of meta-analyses that are unconnected with primary researchers (such as those in Cochrane reviews) remains an open question. Chalmers and Lau [37] question the need to correct for multiple looks within a cumulative meta-analysis because of the lack of direct control. However, we would remark that many systematic reviews contain recommendations for practice and for research, and decisions over the content of these recommendations are similar to decisions over whether a primary research study should continue to recruit participants. Formal sequential methods for this process allow the wider field of evolving knowledge to be subject to the same rigorous considerations of Type I and Type II errors that individual randomized trials have traditionally received.
Meta-analyses are now well established as part of the research process, and many have argued for the need for primary studies to both be informed by meta-analyses of existing evidence, and be rounded off with an updated meta-analysis including the newly generated evidence. Thus primary research is increasingly viewed as part of a wider sequential process, and the methods described here may have application in this context. For instance, there may well be more enthusiasm for conducting a new study if the effect estimate is close to an upper or lower stopping boundary in a scheme such as Figure 1. Finally, we believe that sequential meta-analyses can play an important role in the actual design of individual studies, since the amount of further information that would be required to meet a pre-specified $V_{\max}$ can be determined. However, there are many considerations in the design of a new study, and it is rare for them to be designed primarily around an updated meta-analysis. Related issues are discussed by Sutton et al. [38].
One area of potential application of sequential methods is the monitoring of adverse effects of pharmacological interventions. However, as we remark in Section 5, the particular general parametric approach we describe does not lend itself well to rare events. Furthermore, it is not clear that strict control over false-positive findings is important in this context, since a small, non-statistically significant, signal should still be investigated when the adverse effect is major. More research is required to explore appropriate methods for this issue.
If sequential methods are to be implemented, can the methods we have described be recommended? Certainly we would recommend against a fixed-effect approach. However, in a random-effects framework, a key problem in the early stages of a sequential meta-analysis is the poor estimation of the heterogeneity variance. We argue that prior information is therefore necessary. Of our two semi-Bayes methods, the approximate semi-Bayes approach is straightforward to implement, and this is the one we recommend from among those we investigated. From the simulation results, it appears that the method has reasonable false-positive and coverage properties, although these are likely to be dependent on the choice of prior distribution for the among-study variation. Our application demonstrated that the early stages of the sequential meta-analysis can be affected by the prior distribution. Further empirical research is needed to characterize the degree of heterogeneity that can be anticipated in a meta-analysis with particular clinical and methodological features, so that realistic informative prior distributions can be formulated. An alternative approach would be a sensitivity analysis in which the sequential meta-analysis is repeated for a variety of plausible values of $\tau^2$, but an explicit stopping rule does not follow from such an approach.
Our recommendations are interim, since more work is required in this area. We have commented on methods based on the law of the iterated logarithm proposed by Lan et al. [13,14]. Their open-ended boundaries have the advantage of allowing studies to continue being conducted as long as the null hypothesis is not rejected, but they do not ensure that a pre-specified power is achieved. Open-ended boundaries with specified Type II error may offer preferable methods. Some technical issues with the proposed methods might also be investigated. For example, we have assumed that primary inference is to be made on the mean effect across studies. This does not take into account the extent of heterogeneity. It may be more appropriate to base decisions on alternative characteristics of the random-effects distribution, such as the probability that an individual study will have an effect of a particular size. Such inferences can be drawn from the predictive distribution of the effect in a new study [24]. Another limitation is that we have proposed the calculation of repeated confidence intervals until and including the first time that $V_{\max}$ is exceeded. When this last repeated confidence interval is calculated, there is a probability of slightly less than $1 - \alpha$ that the truth is contained in every interval. The reduction in the probability depends on the increase in $V$ from $V_{\max}$, and will be greater if only a few sequential steps have been conducted. A better understanding of this issue would be useful. Finally, the advantage of using the Christmas tree correction for discrete monitoring with the O'Brien and Fleming design is that this leads to a simple procedure for calculating the repeated confidence intervals, but there are also some disadvantages to this approach. First, Stallard and Facey [39] show that the Christmas tree correction is less accurate for the O'Brien and Fleming design than for other types of sequential design, particularly if there are only a few looks at the data.
Second, the Christmas tree correction depends on the information at the current ($V_j$) and previous ($V_{j-1}$) meta-analysis update. If $V_j - V_{j-1}$ is less than 0, there is no Christmas tree correction. If, at the next update, $V_{j+1} - V_j$ is greater than 0, there will be a Christmas tree correction, even though $V_{j+1}$ may still be less than $V_{j-1}$. Further work is needed to assess the effect that this might have on the validity of the results.
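The second difficulty can be made concrete with a small sketch. It assumes the form of the Christmas tree correction commonly given in the sequential trials literature, in which a constant boundary H is brought in by 0.583 times the square root of the increment in V; this form, the function name and the sequence of V values are assumptions for illustration and are not taken from this paper's tables. Only the value H = 10.77 comes from the application above.

```python
import math


def christmas_tree_boundary(h, v_current, v_previous, constant=0.583):
    """Boundary for |Z| at the current look under the assumed correction.

    The boundary is left at h when the information has not increased,
    mirroring the 'no correction' case described in the text.
    """
    increment = v_current - v_previous
    if increment <= 0:
        return h
    return h - constant * math.sqrt(increment)


if __name__ == "__main__":
    h = 10.77
    # Hypothetical information values; V can fall when the heterogeneity
    # estimate rises between updates.
    v_values = [2.0, 5.0, 4.2, 4.8]
    previous = 0.0
    for j, v in enumerate(v_values, start=1):
        print(j, v, round(christmas_tree_boundary(h, v, previous), 3))
        previous = v
```

In this hypothetical sequence the third look receives no correction because V has fallen, while the fourth look is corrected even though its V is still below the value at the second look, which is the situation the text flags as needing further work.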
We have considered a frequentist sequential approach to meta-analysis, proposing semi-Bayes approaches to allow prior information about $\tau^2$ to be incorporated. The entire meta-analysis could alternatively be undertaken within a fully Bayesian framework. The meta-analyst would then calculate a new credibility interval for every meta-analysis update. The credibility interval, unlike the frequentist repeated confidence interval, is not corrected for multiple looks. The posterior distributions of the mean effect and $\tau^2$ from the first meta-analysis would become the prior distribution for the second and so on. If $\tau^2$ increases during the course of the sequential meta-analysis, the credibility interval may become wider, but this causes no problem. Bayesian stopping rules for individual clinical trials have been proposed by various authors (see, for example [40,41]). However, as the Bayesian inference is not affected by the repeated updates, Bayesian monitoring procedures can have very poor frequentist properties, such as inflated Type I errors [42].
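The idea that each posterior becomes the next prior can be sketched in a deliberately simplified setting. The example below assumes a normal prior for the mean effect and treats $\tau^2$ as a known constant, so that each update is conjugate; a fully Bayesian analysis would also update $\tau^2$, which this sketch does not attempt. All names and numbers are hypothetical.

```python
import math


def update_normal(prior_mean, prior_var, y, total_var):
    """Conjugate normal update of the mean effect after one new study.

    total_var is the within-study variance plus tau^2 (assumed known here).
    """
    post_precision = 1.0 / prior_var + 1.0 / total_var
    post_mean = (prior_mean / prior_var + y / total_var) / post_precision
    return post_mean, 1.0 / post_precision


def credible_interval(mean, var, z=1.96):
    """Central 95 per cent interval, not adjusted for the number of looks."""
    half_width = z * math.sqrt(var)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    tau2 = 0.1                               # hypothetical, held fixed
    studies = [(0.5, 0.2), (1.2, 0.3), (0.9, 0.25)]  # hypothetical (y_i, v_i)
    mean, var = 0.0, 100.0                   # vague prior on the mean effect
    for y, v in studies:
        # Yesterday's posterior is today's prior.
        mean, var = update_normal(mean, var, y, v + tau2)
        print(round(mean, 3), credible_interval(mean, var))
```

In this simplified setting, updating one study at a time gives the same posterior as analysing all studies at once, which illustrates why the Bayesian inference itself is unaffected by how often the meta-analysis is updated.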