Meta‐analysis of Gaussian individual patient data: Two‐stage or not two‐stage?

Quantitative evidence synthesis through meta‐analysis is central to evidence‐based medicine. For well‐documented reasons, the meta‐analysis of individual patient data is held in higher regard than aggregate data. With access to individual patient data, the analysis is not restricted to a “two‐stage” approach (combining estimates and standard errors) but can estimate parameters of interest by fitting a single model to all of the data, a so‐called “one‐stage” analysis. There has been debate about the merits of one‐ and two‐stage analysis. Arguments for one‐stage analysis have typically noted that a wider range of models can be fitted and overall estimates may be more precise. The two‐stage side has emphasised that the models that can be fitted in two stages are sufficient to answer the relevant questions, with less scope for mistakes because there are fewer modelling choices to be made in the two‐stage approach. For Gaussian data, we consider the statistical arguments for flexibility and precision in small‐sample settings. Regarding flexibility, several of the models that can be fitted only in one stage may not be of serious interest to most meta‐analysis practitioners. Regarding precision, we consider fixed‐ and random‐effects meta‐analysis and see that, for a model making certain assumptions, the number of stages used to fit this model is irrelevant; the precision will be approximately equal. Meta‐analysts should choose modelling assumptions carefully. Sometimes relevant models can only be fitted in one stage. Otherwise, meta‐analysts are free to use whichever procedure is most convenient to fit the identified model.

Meta-analysis of individual participant data (IPD) uses the raw study data in place of published estimates. The merits of using IPD are well documented[1][2][3][4]: data can be checked by the reviewer; consistent inclusion and exclusion criteria can be applied; participants can be included who were not in the original reports; longer follow-up may be included than was available at the time of the primary study report; analyses can be executed identically for each study, making it more reasonable to combine estimates; and there is greater scope to investigate interactions. Collecting and checking IPD is a time-consuming, diplomatic, and expensive exercise, so when IPD have been obtained, the advantages should be fully leveraged.
When IPD have been obtained, the analysis is not restricted to combining estimates and standard errors from published data. Instead, the whole analysis can be done in one stage. The merits of "one-stage" compared to "two-stage" approaches have been debated. [5][6][7][8] Those who argue for one-stage methods tend to be statistically trained, while those who argue for two-stage methods are typically meta-analysts whose primary training is not statistical. (Executing a thorough systematic review and IPD meta-analysis involves many non-statistical skills.) There seems to be support for both perspectives.
The views of a one-stage exponent are exemplified by an anonymous reviewer of one author's (DJF) work on a Stata package to perform two-stage meta-analysis of IPD.3 The reviewer was ambivalent: despite approving the package, he or she was convinced that two-stage methods have little to offer, writing, "The debate is ongoing but the statistical merits of one-stage IPD far outweigh the benefits of a two-stage approach … the merit of the one-stage is doing an analysis that is definitely more accurate … a simulation study is needed to quantify how much better one-stage is in certain scenarios … I do not think it [two-stage meta-analysis] should be recommended for use when a one-stage approach is possible."
These opinions were apparently informed by Mathew and Nordström's paper, 6 which shows that a two-stage procedure can be at best as asymptotically efficient as a one-stage procedure. This is in contrast to the view of Burke et al who argue that "differences arise because of different modelling assumptions, rather than the choice of one-stage or two-stage itself." 9 The two-stage perspective tends to be that the models and their assumptions are transparent and of scientific value and are sufficient to address relevant substantive questions. There is arguably less transparency in the reporting of models used with one-stage analysis, as evidenced by two recent reviews. 10, 11 Simmonds et al found that "One-stage methods were in general more poorly described, perhaps because of the greater complexity involved in describing properties of regression models." 10 The main choices for two-stage analysis are around whether to model treatment effects as fixed or random and (a closely related choice) how much weight studies should contribute to the overall estimate. The apparent simplicity of two-stage analysis, combined with the longer history of two-stage estimation in the context of meta-analysis, may be why descriptions tend to be better.
The aim of this paper is to consider the statistical arguments for one-and two-stage analysis in the context of Gaussian outcome data, and particularly to identify differences in precision for models that can be fitted in either way. This is done in the finite-sample setting; the reality we all work in. That is, our results allow explicitly for small numbers of patients-per-study and the implied uncertainty about study-specific estimates.
To aid clarity, we focus here on the simple setting where the IPD include a quantitative outcome, an indicator of treatment assignment, a participant identifier, and a study identifier. It is assumed that the focus of the meta-analysis is on the treatment effect and that no covariates are present. This is an inadequate summary of a meta-analysis, 12 but this is often the primary focus and permits a clear discussion of the arguments for and against one-and two-stage approaches.
As evidenced by the above quotes, two-stage procedures can receive a bad press from statisticians, and we seek to establish the extent to which such comments are justified, not asymptotically, but in practice.

MODELS FOR META-ANALYSIS
Even for the relatively simple structure of datasets considered here, there are several possible models for performing a meta-analysis. Borrowing heavily from Senn, 13 Table 1 lists seven that might be of interest. We define our notation below before discussing the models in Table 1.

Notation
Let y denote outcome and x denote treatment assignment14 (coded −0.5/+0.5). Let i = 1, …, I index studies and j = 1, …, n_i index patients within a study, so that y_ij is the outcome for the jth patient in the ith study. Let α denote the intercept term and β the treatment effect, the parameter of central interest. The interpretation of α and β will depend on which model is used; for example, in model I, α represents mean outcome (because treatment is coded −0.5/+0.5 and it is assumed that randomisation is 1:1).

[TABLE 1: Models for individual patient data meta-analysis. Model II is the "fixed-effects meta-analysis" and model V is the "random-effects meta-analysis". The table lists seven models, whether each can be fitted in one and/or two stages, and example syntax, eg, for the model with study-specific intercepts and random treatment effects: mixed y x i.study || study: x, nocons residuals(, by(study)) reml dfmethod(kr) (one stage) and ipdmetan, study(study) re(kr): regress y x (two stage). Footnote a: while model VI can technically be fitted in a two-stage way, it only obtains estimates of the β_i, not β, so is practically useless for meta-analysis of treatment effects. Footnote b: ipdmetan and mvmeta are user-written Stata packages, which can be installed in Stata using ". ssc install command_name".]

There are three possibilities for modelling the main effect (intercept) of study, and the treatment effect may be common to all studies, study specific, or random (a random treatment-by-study interaction). With a one-stage analysis, σ²_i can be allowed to vary by study, restricted such that σ²_i = σ² ∀i, or some combination of these. For example, a fixed-effects model based on ordinary least squares (OLS) implies α_i = α + a_i (with a_i representing fixed differences in intercepts for i = 1, …, I), β_i = β, and σ²_i = σ². Two-stage analysis by default allows σ²_i to vary by i (we say "default" because, at the time of writing, we believe all software packages that perform stage two do so under this assumption). However, the assumption that σ²_i = σ² can also be invoked with a two-stage analysis (see Olkin and Sampson5). To do this, relative study weights are made proportional to a function of n_i rather than of σ̂²_i. Weighting in this way exactly recovers the one-stage OLS estimate.15 We return to this point later.
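The point that weighting study estimates by a function of n_i exactly recovers the one-stage OLS estimate can be checked numerically. The following Python sketch (ours, not the paper's Stata code) uses invented data with exactly balanced 1:1 allocation and x coded −0.5/+0.5; all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic IPD: 3 studies, balanced 1:1 allocation, x coded -0.5/+0.5.
sizes = [8, 20, 40]
y_all, x_all, study_all = [], [], []
for i, n in enumerate(sizes):
    x = np.repeat([-0.5, 0.5], n // 2)
    y = 1.0 * i + 0.4 * x + rng.normal(0, 1 + 0.5 * i, n)  # study intercepts, shared beta
    y_all.append(y); x_all.append(x); study_all.append(np.full(n, i))
y = np.concatenate(y_all); x = np.concatenate(x_all); study = np.concatenate(study_all)

# One-stage OLS with fixed study intercepts (sigma_i^2 assumed common).
I = len(sizes)
X = np.column_stack([(study == i).astype(float) for i in range(I)] + [x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0][-1]

# Two-stage: per-study mean differences, combined with weights proportional to n_i.
beta_i = np.array([y[(study == i) & (x > 0)].mean() - y[(study == i) & (x < 0)].mean()
                   for i in range(I)])
n = np.array(sizes, dtype=float)
beta_two_stage = np.sum(n * beta_i) / np.sum(n)

print(beta_ols, beta_two_stage)  # identical to numerical precision
```

With balanced ±0.5 coding, the treatment column is orthogonal to the study dummies, so both calculations reduce to the same n_i-weighted average of the per-study mean differences.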
We distinguish here between two types of variance, particularly for understanding the workings provided in the appendices. First, we denote the true variance of the estimator of β by Var(β̂); in the frequentist sense, this is the long-run variance of β̂ under repeated sampling. Second, the estimated variance is denoted by V̂ar(β̂); this is what is estimated in a specific realisation of a meta-analysis. A good variance estimator should have expectation Var(β̂).

Flexibility and estimands
Note the column of Table 1 stating whether the model can be fitted in one stage (top) or two (bottom). Of the models considered, (I), (III), and (IV) cannot be fitted in two stages. Further, although model (VI) can be fitted in two stages, it does not provide an estimate of β. This restriction is what is meant when one-stage approaches are promoted for their flexibility.
Note that, provided that the available evidence is representative and the model-specific assumptions hold, any of the models listed in Table 1 could provide an unbiased estimate of the treatment effect and its variance. The practical settings for which this is true may be rather limited. For example, if each study used simple randomisation and the same allocation ratio (not necessarily 1:1), then β̂ would be unbiased for all the models in Table 1. The interesting point is that this depends only on the design of the included studies and not on the true data-generating model. We refer interested readers to Parzen et al16 or Kahan and Morris.17

Fixed-effects model (model II)
We wish to compare the variance of one- and two-stage fixed-effect meta-analysis. There are two approaches to this:
1. Compare the variance of "default" or most natural one- and two-stage fixed-effect estimators.
2. Compare the variance of one- and two-stage estimators when the models and associated assumptions are the same.
The former is an apples-vs-oranges comparison. It is interesting because one-stage fixed-effect estimators are most naturally estimated by OLS (this is the default in all major software), which assumes a common variance across studies (previously described as "scarcely credible" in the context of meta-analysis).15 Meanwhile, the two-stage inverse-variance estimator most naturally assumes heterogeneity of variances across studies (again, this is the default in software). Assumptions about within-study variances affect the relative weights of studies and thus the overall estimate of β and its variance. Results may differ between one- and two-stage analysis because of subtle differences in modelling assumptions such as this one, and it is important not to attribute such differences to the number of stages used in computation. The comparison of defaults is therefore very much a comparison of modelling assumptions, rather than of the procedure by which the models are estimated (ie, in one or two stages).
The most natural one-stage analysis is to fit a model based on OLS, where the theory is well understood. This assumes σ²_i is common across studies. The theory is less well understood when the σ²_i are allowed to be study specific and the model is estimated by variance-weighted least squares. The fact that the mixed command is needed to fit this model demonstrates that it is not a natural one-stage model to fit. We derive the one-stage estimator of β and its variance estimator allowing study-specific σ²_i in Appendix A.1. These turn out to be the same as the two-stage inverse-variance estimator and the usual asymptotic inverse-variance formula,

V̂ar(β̂) = 1∕Ŵ, where Ŵ = Σ_i ŵ_i and ŵ_i = 1∕V̂ar(β̂_i). (1)

For details, see Appendix A.1.
Thus, when we allow study-specific variances, computing our estimates in one or two stages leads to identical point estimates and variance estimates. (In fact, the variance estimators are biased downwards in finite samples because they assume that the σ̂²_i are known rather than estimated, as is assumed in Mathew and Nordström.6) This result is related to the result of Olkin and Sampson,5 where both models impose a shared σ², and to Lunn et al,18 who present a two-stage strategy for fitting a full Bayesian model.
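This equivalence is easy to verify numerically. The following Python sketch (an illustration on invented, balanced data, not the paper's code) fits the one-stage variance-weighted least-squares model and the two-stage inverse-variance combination with matching study-specific σ̂²_i, and shows they agree.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic IPD: balanced 1:1 allocation, x coded -0.5/+0.5, heteroskedastic studies.
sizes = [8, 20, 40]
data = []
for i, n in enumerate(sizes):
    x = np.repeat([-0.5, 0.5], n // 2)
    y = i + 0.4 * x + rng.normal(0, 1 + i, n)   # study-specific residual SD
    data.append((y, x))

# Stage 1: per-study estimate, residual variance, and V-hat(beta_i) = 4*s2/n.
beta_i, var_i, sig2_i = [], [], []
for y, x in data:
    n = len(y)
    b = y[x > 0].mean() - y[x < 0].mean()
    resid = y - y.mean() - b * x                 # x is centred, so intercept = ybar
    s2 = resid @ resid / (n - 2)
    beta_i.append(b); sig2_i.append(s2); var_i.append(4 * s2 / n)
beta_i, var_i, sig2_i = map(np.array, (beta_i, var_i, sig2_i))

# Stage 2: inverse-variance combination.
w = 1 / var_i
beta_2s, var_2s = np.sum(w * beta_i) / w.sum(), 1 / w.sum()

# One-stage weighted least squares with observation weights 1/sigma-hat_i^2.
I = len(sizes)
y = np.concatenate([d[0] for d in data]); x = np.concatenate([d[1] for d in data])
study = np.concatenate([np.full(len(d[0]), i) for i, d in enumerate(data)])
X = np.column_stack([(study == i).astype(float) for i in range(I)] + [x])
W = 1 / sig2_i[study]
XtWX = X.T @ (W[:, None] * X)
beta_1s = np.linalg.solve(XtWX, X.T @ (W * y))[-1]
var_1s = np.linalg.inv(XtWX)[-1, -1]

print(beta_1s - beta_2s, var_1s - var_2s)  # both ~0
```

The agreement is exact (up to floating-point error) because the same σ̂²_i enter both procedures.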
We focus on comparisons of variance here and, to do so, needed to derive some new theoretical results. Appendix A works through the mathematics, while the rationale and results are given here. The overall aim is to calculate the expected value of the variance for the OLS and inverse-variance estimators. This involved:
1. Calculation of the expected value of the asymptotic variance formula (1) for the inverse-variance estimator, noting again that this variance estimator is biased downwards in small samples and so produces misleading conclusions (Appendix A.2).
2. Derivation of a new variance formula (2), which explicitly acknowledges that the σ̂²_i are estimates and so is unbiased with practical sample sizes (Appendix A.3). Formula (2) tends to the standard formula (1), 1∕Ŵ, as all n_i → ∞ but is less biased when any n_i is small.
3. Calculation of the expectation of the new small-sample variance formula (2) (Appendix A.4).
The expected values of the asymptotic and small-sample variance formulas are then compared with the version that assumes common σ̂²_i.
Results are shown in Figure 1. There is very little effect of the number of studies I (the 39 translucent curves are almost on top of each other) but some effect of the number of patients per study n_i; this effect becomes small with more than about 25 patients per study. Results are shown for scenarios where all studies contain the same number of participants but could equally be produced for specific instances of unbalanced data. This demonstrates that assuming σ²_i is the same across studies reduces the variance of β̂ when n_i is small. Because this is the assumption underlying OLS, it is typically associated with one-stage analysis; two-stage fixed-effects meta-analysis typically allows σ²_i to be study specific. These are merely defaults, however, and neither procedure is obliged to use its default assumption.
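The direction of the small-sample bias in the asymptotic formula (1) can be seen in a quick Monte Carlo experiment. This Python sketch is ours (not the paper's derivation or its Stata code); the settings (5 studies of 10 patients, unit variances, β = 0) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check: with few patients per study, the mean of the asymptotic
# inverse-variance formula 1/W-hat falls below the true variance of beta-hat.
reps, I, n = 10_000, 5, 10                # 10 patients per study, 5 per arm
yt = rng.normal(0, 1, (reps, I, n // 2))  # treatment arm, true beta = 0
yc = rng.normal(0, 1, (reps, I, n // 2))  # control arm
b = yt.mean(axis=2) - yc.mean(axis=2)     # per-study estimates
s2 = (yt.var(axis=2, ddof=1) * (n // 2 - 1) +
      yc.var(axis=2, ddof=1) * (n // 2 - 1)) / (n - 2)   # pooled variance
w = n / (4 * s2)                          # estimated inverse-variance weights
beta_hat = (w * b).sum(axis=1) / w.sum(axis=1)
vhat = 1 / w.sum(axis=1)                  # asymptotic formula (1)

ratio = vhat.mean() / beta_hat.var(ddof=1)
print(round(ratio, 3))                    # noticeably below 1
```

The ratio of the average variance estimate to the empirical variance is well below 1 at these sample sizes, consistent with the downward bias discussed in the text.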

Random-effects model (model V)
Here, we consider model V of Table 1, often designated "random-effects" meta-analysis because the I treatment effects are assumed to follow a probability distribution. There is a bewildering range of competing two-stage estimators for this model.3 We consider the restricted maximum likelihood (REML) estimator. It is attractive because it is more efficient than moment-based estimators, such as DerSimonian and Laird's method, and provides consistent estimates of variance parameters. For finite-sample inference with REML estimation, Kenward and Roger's approximate correction to the covariance matrix and degrees of freedom is the best available option.19
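As a concrete illustration of the second stage under REML, the following Python sketch implements the standard REML fixed-point iteration for τ² given study estimates and within-study variances. This is a generic textbook form of the estimating equation, not the paper's Stata/SAS code, and the data are invented; the Kenward-Roger correction is not included here.

```python
import numpy as np

def reml_tau2(b, v, tol=1e-10, max_iter=200):
    """Second-stage REML estimate of tau^2 via the standard fixed-point
    iteration, given study estimates b and within-study variances v."""
    tau2 = max(np.var(b, ddof=1) - v.mean(), 0.0)   # moment-based start value
    for _ in range(max_iter):
        w = 1 / (v + tau2)
        mu = np.sum(w * b) / w.sum()
        new = np.sum(w**2 * ((b - mu) ** 2 - v)) / np.sum(w**2) + 1 / w.sum()
        new = max(new, 0.0)                         # truncate at zero
        if abs(new - tau2) < tol:
            return new
        tau2 = new
    return tau2

# Invented study estimates and within-study variances.
b = np.array([0.8, 0.1, -0.3, 0.6, 1.2])
v = np.array([0.04, 0.09, 0.06, 0.05, 0.16])
tau2 = reml_tau2(b, v)
w = 1 / (v + tau2)
beta_hat, se = np.sum(w * b) / w.sum(), np.sqrt(1 / w.sum())
print(tau2, beta_hat, se)
```

At convergence, the returned τ̂² satisfies the REML estimating equation; the overall estimate and its standard error then follow from the usual weighted combination.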

FIGURE 1 Ratio of two- to one-stage expected values of variances, using the asymptotic variance formula (1) (lines below 1) and the small-sample formula (2) (lines above 1). The plot shows 39 translucent lines as the number of studies increases over I = 2, …, 40. The lines for smaller I are higher in both panels, but differences are negligible.

It is possible to fit the REML random-effects meta-analysis with the Kenward-Roger correction in one stage via mixed in Stata or proc mixed in SAS. The covariance and degrees of freedom for two-stage meta-analysis of a single parameter have not previously been derived; we provide the derivation in Appendix B.
The one- and two-stage versions of these models were compared in a simulation study. The simulation design follows, structured as aims, data-generating mechanism, methods, estimand, and performance measures.

• Aims
The aim of the simulation study is to compare one- and two-stage procedures in terms of (1) precision of meta-analysis with random treatment effects and fixed study intercepts and (2) coverage of 95% confidence intervals. Both are with respect to β, the overall average treatment effect.

• Data-generating mechanism
Individual participant data were simulated independently for 2 to 40 studies from model V of Table 1; that is, studies had fixed intercepts and random treatment effects (drawn from a Gaussian distribution). One thousand simulated datasets were produced using Stata 14's default random-number generator (64-bit Mersenne twister). For the data-generating mechanism we used, parameter values were set to α = a_i = 0 ∀i and β = 0, with τ² = 1, and with study-specific residual variances drawn as σ²_i = 50∕χ²_50 (where χ²_50 denotes a random draw from a χ² distribution on 50 df), so that E[σ²_i] ≈ 1. Study sizes were unbalanced. Following Rücker et al,20 based on Galandi et al,21 study sizes n_i were drawn from a log-normal distribution with E(log n_i) = 3.798 and SD(log n_i) = 1.104, with n_i rounded to the nearest integer, or set to 20 for any drawn values < 20.
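The mechanism above can be sketched in code. This Python version is our reading of the description (the paper used Stata); forcing even study sizes for exact 1:1 allocation is our added assumption, as the paper does not state how allocation was handled.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_dataset(I, tau2=1.0, beta=0.0, rng=rng):
    """Simulate one IPD dataset from model V: fixed study intercepts (all 0),
    random treatment effects beta_i ~ N(beta, tau2), study-specific residual
    variances 50/chi2_50, and log-normal study sizes with a floor of 20."""
    n = np.exp(rng.normal(3.798, 1.104, I)).round().astype(int)
    n = np.maximum(n, 20)
    n += n % 2                           # force even sizes for 1:1 allocation (our assumption)
    beta_i = rng.normal(beta, np.sqrt(tau2), I)
    sigma2 = 50 / rng.chisquare(50, I)   # E[sigma2_i] approximately 1
    studies = []
    for i in range(I):
        x = np.repeat([-0.5, 0.5], n[i] // 2)
        y = beta_i[i] * x + rng.normal(0, np.sqrt(sigma2[i]), n[i])
        studies.append((i, x, y))
    return studies

data = simulate_dataset(I=10)
print([len(y) for _, _, y in data])
```

Each returned tuple holds the study index, the centred treatment indicator, and the outcomes for that study.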

• Methods
One-stage analysis was done in SAS 9.3 using proc mixed, while two-stage analysis3 used ipdmetan in Stata 14, applied to the same simulated datasets. SAS was chosen for the one-stage models for computational speed: both packages return the same results, but SAS currently gets there faster. The only difference between the methods is that the two-stage variance estimates were based on expected information and the one-stage estimates on observed information.

• Estimand
The estimand of interest is the meta-analytic estimate of the average overall treatment effect β. As mentioned earlier, we acknowledge that this alone is not an adequate summary of a meta-analysis, but it is usually the parameter of central interest.

• Performance measures
The key performance measure is precision: the inverse of the empirical variance of β̂.22 This is estimated for two-stage relative to one-stage (see White22) and presented as "% gain". The empirical variance is based only on the REML point estimate and so does not depend on the Kenward-Roger adjustment. To assess the new variance and df adjustments, we also compare the coverage of nominal 95% confidence intervals.

The Stata files required to generate data and run one- and two-stage analyses are included as a supplement to this article. Also included is the analysis of the simulation results (though the code to produce figures is not).22

Figure 2 plots results for relative precision and coverage. All results are accompanied by 95% Monte Carlo confidence intervals to describe simulation uncertainty.22 The upper panel displays relative precision, showing that the two-stage REML estimator has precision extremely similar to its one-stage counterpart under this data-generating mechanism. Although we were not able to obtain an analytic result to establish whether this holds generally, it appears to be in line with the equivalence of one- and two-stage procedures for fixed-effects meta-analysis when the same models are fitted.
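The "% gain" measure, with a Monte Carlo interval, can be computed as follows. This Python sketch is illustrative only: the jackknife Monte Carlo standard error is our choice (the paper cites White's formulas), and the two sets of estimates are simulated stand-ins, not the paper's results.

```python
import numpy as np

def precision_gain(est1, est2):
    """Percent gain in precision of procedure 2 relative to procedure 1:
    100 * (Var(est1)/Var(est2) - 1), with a jackknife Monte Carlo 95% CI."""
    est1, est2 = np.asarray(est1), np.asarray(est2)
    gain = 100 * (est1.var(ddof=1) / est2.var(ddof=1) - 1)
    # Leave-one-out jackknife for the Monte Carlo standard error of the gain.
    n = len(est1)
    loo = np.empty(n)
    for k in range(n):
        m = np.arange(n) != k
        loo[k] = 100 * (est1[m].var(ddof=1) / est2[m].var(ddof=1) - 1)
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    return gain, (gain - 1.96 * se, gain + 1.96 * se)

rng = np.random.default_rng(5)
one_stage = rng.normal(0, 1, 1000)
two_stage = one_stage + rng.normal(0, 0.05, 1000)   # nearly identical estimates
gain, ci = precision_gain(one_stage, two_stage)
print(round(gain, 2), ci)
```

When the two procedures produce nearly identical estimates per dataset, the gain is close to zero with a tight Monte Carlo interval, which is the pattern reported in Figure 2.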
Coverage results (the lower panel of Figure 2) indicate that both methods tend to have good coverage, except when the number of studies is below 5. Inspection of the simulation results shows that the variance estimates are accurate, on average very close to the true (empirical) variance; for both procedures, the variance formulas closely matched the empirical variance of the REML estimate of β. The problem with confidence interval coverage is therefore due to the df used: the estimated Kenward-Roger df appear to be conservative for the two-stage procedure and anti-conservative for the one-stage procedure. A comparison of the one- and two-stage df across simulation runs showed that the one-stage df were slightly higher than their two-stage counterparts. This is apparently due to the approximations involved in the Taylor expansions, which begin to fail in different directions at very small study sample sizes.

ILLUSTRATION OF METHODS FOR INDANA META-ANALYSIS
The INDANA data include IPD from 10 trials of antihypertensive drugs in patients at high risk of cardiovascular disease.23,24 Several outcome variables were collected, such as death, stroke, systolic blood pressure, diastolic blood pressure, and cholesterol. We focus here on systolic blood pressure (SBP), which was recorded annually at years 1 to 5 in seven of the 10 studies and never in the remaining three. As previously, we consider a simple analysis involving a linear regression of SBP on randomised group, restricted to participants with observed values of SBP at 1 year. The situation is shown in Figure 3, a forest plot of the within-study estimates of the mean difference in SBP at 1 year. The trials are highly heterogeneous in many ways, and this is apparent in the estimated treatment effects. A fixed-effect meta-analysis will thus produce a confidence interval that is too narrow and relates to what is arguably an ill-defined target parameter, as the assumption of a constant treatment effect is clearly not supported. Meanwhile, the smaller trials seem to have larger effects, so the random-effects point estimate may be too large. Our purpose here is not to produce the most appropriate summary of the overall effect in terms of the point estimate and confidence interval; instead, we aim to demonstrate the impact of different modelling assumptions, and of the number of stages used to fit a given model, on these quantities.
We consider three different sets of modelling assumptions, matching the models considered in Section 2: first, a fixed-effect meta-analysis where σ²_i is assumed equal in all studies; second, a fixed-effect meta-analysis assuming the σ²_i differ across studies; and third, a random-effects meta-analysis also allowing σ²_i to vary across studies. Table 2 gives the results of applying one- and two-stage procedures for parameter estimation. Results are intentionally given to 4 decimal places. We see clear differences between the different modelling assumptions, the largest being between the fixed- and random-effects models. However, for a given model, the one- and two-stage procedures produce almost identical estimates of β and very similar confidence intervals.

DISCUSSION
Focusing on practical sample sizes (rather than asymptotic results), this paper attempts to resolve the debate about using one- or two-stage estimation methods in meta-analysis. We have considered Gaussian data and demonstrated, theoretically for fixed-effect models and via simulation for random-effects models, that provided the same underlying model is used, inference from one- and two-stage procedures is practically equivalent for two practically relevant models.
To make progress, we focused on a very simple data structure and considered combining evidence from randomised studies. We considered the main arguments regarding "flexibility" and precision. Table 1 shows that a wider range of statistical models can be fitted in a one-stage framework; however, for many meta-analyses, this flexibility is not needed.
To study precision, we derived some new results: the one-stage fixed-effects estimator of β allowing for study-specific variances; a new variance estimator which acknowledges that the σ̂²_i are estimated rather than known (again for the fixed-effect model allowing study-specific variances); and the expectation of this formula, which we compared with the expectation of the OLS variance, where σ²_i is assumed equal across all studies. For random-effects meta-analysis, we also calculated and implemented the Kenward-Roger small-sample correction.
For practical sample sizes, our work shows that the arguments regarding precision are largely redundant, as previously shown asymptotically for more general models.7 Mathew and Nordström claim to show that a one-stage analysis will always be at least as precise as two-stage, but their only example of a strict inequality is a meta-analysis with no study effects at all, which is of no practical interest.6 Appendix A.1 shows that the fixed-effect model can be fitted using weighted one-stage estimation, which returns the same estimates as inverse-variance weighted two-stage estimation. Meanwhile, Senn15 shows how OLS estimates can be recovered from summary data in two stages. The results displayed in Figure 1 show that the latter is more precise than the former. Thus, analysis in two stages may be less precise than, more precise than, or identical to analysis in one stage, depending on the modelling assumptions made by the computational procedure. In summary, these papers do not contradict our key finding: for equivalent models, fitting by a one- or two-stage procedure gives practically equivalent results.
Improvements in precision must be attributed to different modelling assumptions; that is, the precision argument for one-stage procedures is actually about flexibility again. A different model may be more or less precise: in general, we buy information with assumptions. We argue that model choice should never be based on precision, and opting to fit a model in one stage or two based on perceived efficiency is doing exactly this. In practice, it is important to make modelling assumptions that are tenable and to fit the model using whatever procedure accommodates these assumptions. For the assumptions which may change between one- and two-stage procedures, see Burke et al.9

Because of the simple data structure considered, we have not touched on covariate adjustment here. Adjusting for the main effects of covariates may be desirable, either to obtain a valid estimate of Var(β̂_i) (for example, where covariates have been balanced by design25) or to increase power.26 Here, a one-stage model can constrain covariate effects to take the same values across all studies. Two-stage analysis relaxes this assumption by fitting a model which implies separate covariate effects for each study, which we conjecture is the correct way to ensure a valid estimate of Var(β̂_i). To fit a one-stage version of this model, covariate-study interaction terms need to be included and, if the modelling assumptions are identical, our results would hold. If the covariate effects truly differ by study, then the model that incorporates this (whether fitted by a one- or two-stage procedure) will tend to give more precise estimates; the converse is also true.
For meta-analysis of patient-level treatment-covariate interactions, the two-stage procedure of fitting a model including interactions within each study and then combining these within-study interactions in a second-stage meta-analysis guards against ecological bias.27-29 As always, a practically equivalent model can be fitted using a one-stage procedure; however, in this setting, care is needed over the parameterisation, and study-specific covariate means must be adjusted for.28,29 This may not be intuitive, and in this context, a one-stage analysis should aim to estimate the same effect as a two-stage analysis. Here, a model that can be fitted in one stage but not two may actually be misleading ("deluded" in the parlance of Fisher et al).28

For practical meta-analysis, a clear description of the model and assumptions is important. See Burke et al9 for a description of practical assumptions which may differ between one- and two-stage meta-analysis; the relevance of each of these should be considered and described.
When fitting a model in two stages, it is important to describe whether a fixed- or random-effects model is being used. For fixed-effect models, the assumption about common or study-specific σ̂²_i then needs to be described. For random-effects models, the random-effects specification should be described; the clearest way of doing this is in terms of the estimator of τ² (eg, DerSimonian-Laird or REML3), followed by any additional details such as use of the Hartung-Knapp/Sidik-Jonkman or Kenward-Roger variance correction. When fitting a model in one stage, the aspects to describe include whether random effects were estimated and the parameters they were attached to; whether σ²_i was shared or study specific; any correction to the covariance matrix; whether parameters for covariate effects were constrained to be equal or allowed to vary across studies; and how interactions were estimated.
We encourage those involved in meta-analysis methodology to focus more on consideration of models and assumptions and less on the procedures used for parameter estimation. The procedure used (one or two stages) is a computational tool; the model being fitted is what the tool aims to construct, and a tool that does the required job conveniently should be favoured.

CONFLICTS OF INTEREST
The meta-analysis group of the MRC Clinical Trials Unit at UCL has historically used two-stage approaches, and defending the approach may buttress our colleagues' work. DJF wrote the ipdmetan Stata package, which performs two-stage meta-analysis, and our colleague Ian White wrote the mvmeta Stata package; defending two-stage approaches may increase the use of these packages and citations of the associated articles. TPM came into this research expecting one-stage analysis to be better than two-stage and now favours neither approach over the most convenient one. The authors declare that they have no other conflicts of interest.

APPENDIX A: PROPERTIES OF WEIGHTED ESTIMATORS
The sections below derive the variance of the fixed-effects inverse-variance estimator. Unless otherwise stated, summations are over all values of the study index i.

A.1 Variance estimator for a one-stage analysis assuming varying variances
The fixed-effects model is y_ij = α_i + βx_ij + ε_ij, where ε_ij ∼ N(0, σ²), i indexes studies, and j indexes patients. The model parameters are estimated by (X′X)⁻¹X′y and the covariance matrix by σ̂²(X′X)⁻¹. We require the expectation of V̂ar(β̂).
Assume that we have I studies, each with n_i patients. Define the Σn_i × (I + 1) design matrix X whose first I columns indicate (exclusive) membership of study i = 1, …, I and whose (I + 1)th column holds the treatment assignments x_ij = ±0.5. For a one-stage analysis, V̂ar(β̂) is the (I + 1)th diagonal element of σ̂²(X′X)⁻¹. First, X′X is (I + 1) × (I + 1); because allocation is 1:1 within each study, the treatment column is orthogonal to the study-membership columns, so X′X = diag(n_1, …, n_I, Σ_i n_i∕4). Inverting and taking the last element (the variance of the treatment effect) gives V̂ar(β̂) = 4σ̂²∕Σ_i n_i.

The shared-σ² assumption can be relaxed in the following way. Construct a Σn_i × Σn_i diagonal matrix of weights W, where the nonzero elements are 1∕σ̂²_i; the variance-covariance matrix is then estimated as (X′WX)⁻¹. The matrix X′WX is again block diagonal, X′WX = diag(n_1∕σ̂²_1, …, n_I∕σ̂²_I, Σ_i n_i∕(4σ̂²_i)), and partitioning to invert the (I + 1)th diagonal element (the variance of the treatment effect) gives V̂ar(β̂) = 1∕Σ_i n_i∕(4σ̂²_i) = 1∕Σ_i ŵ_i. This is identical to the standard inverse-variance formula used by a two-stage procedure. The algebra can also be worked through for the estimator of β including the weights W, that is, (X′WX)⁻¹X′Wy; the one- and two-stage procedures then give identical results for β̂. This complements the result described and worked through in Senn,15 taken from Olkin and Sampson,5 which shows that a two-stage procedure can obtain one-stage estimates when modified to make the same assumption of equal σ²_i ∀i.
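The block-diagonal structure of X′WX, and hence the identity between the (I + 1)th diagonal element of its inverse and the inverse-variance formula, can be checked numerically. A minimal Python sketch, with invented sizes and assumed study variances:

```python
import numpy as np

# For balanced studies with x = -0.5/+0.5, the (I+1)th diagonal element of
# (X'WX)^{-1} should equal 1 / sum_i n_i/(4*sigma2_i).
sizes = np.array([6, 10, 14])
sig2 = np.array([0.5, 1.0, 2.0])            # assumed study-specific variances
I = len(sizes)
x = np.concatenate([np.repeat([-0.5, 0.5], n // 2) for n in sizes])
study = np.concatenate([np.full(n, i) for i, n in enumerate(sizes)])
X = np.column_stack([(study == i).astype(float) for i in range(I)] + [x])
W = np.diag(1 / sig2[study])
var_beta = np.linalg.inv(X.T @ W @ X)[-1, -1]
print(var_beta, 1 / np.sum(sizes / (4 * sig2)))  # identical
```

The two quantities agree exactly because, with balanced ±0.5 coding, the treatment column is orthogonal to the study indicators under W.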

A.2 Expected value of standard two-stage variance estimator
Let $i = 1, \ldots, I$ index studies and $j = 1, \ldots, n_i$ index individuals. The unadjusted estimator of $\beta$ for active vs control in study $i$ is
$$\hat{\beta}_i = \bar{Y}_{i,\text{active}} - \bar{Y}_{i,\text{control}},$$
with variance for study $i$ estimated by $\widehat{V}(\hat{\beta}_i) = 4\hat{\sigma}_i^2/n_i$. This first stage is repeated within each study. Define weights $\hat{w}_i = n_i/4\hat{\sigma}_i^2$ and let $w_i = n_i/4\sigma_i^2$. For the second stage, the inverse-variance estimator of the overall treatment effect is
$$\hat{\beta} = \frac{\sum_i \hat{w}_i \hat{\beta}_i}{\sum_i \hat{w}_i}. \tag{A1}$$
The variance of $\hat{\beta}$ is almost always estimated by
$$\widehat{\operatorname{Var}}(\hat{\beta}) = \frac{1}{\sum_i \hat{w}_i}. \tag{A2}$$
We now calculate the expectation of $\widehat{\operatorname{Var}}(\hat{\beta})$ for 2 studies. Note that $E[\hat{\sigma}_i^2] = \sigma_i^2$ and $\operatorname{Var}(\hat{\sigma}_i^2) = 2\sigma_i^4/(n_i - 1)$.

For a meta-analysis of two studies, our variance estimator is
$$\widehat{\operatorname{Var}}(\hat{\beta}) = f(\hat{\sigma}_1^2, \hat{\sigma}_2^2) = \left( \frac{n_1}{4\hat{\sigma}_1^2} + \frac{n_2}{4\hat{\sigma}_2^2} \right)^{-1}. \tag{A3}$$
Write $x = \hat{\sigma}_1^2$ and $y = \hat{\sigma}_2^2$, and take the first and second partial derivatives of $f$ with respect to $x$ and $y$. Denote by $f_x$ the derivative evaluated at $x = \sigma_1^2$, $y = \sigma_2^2$, and similarly for $f_y$, $f_{xx}$, $f_{yy}$. Writing $\delta_x = \hat{\sigma}_1^2 - \sigma_1^2$ and $\delta_y = \hat{\sigma}_2^2 - \sigma_2^2$, a second-order Taylor expansion gives
$$E\left[\widehat{\operatorname{Var}}(\hat{\beta})\right] \approx f + f_x E[\delta_x] + f_y E[\delta_y] + \tfrac{1}{2}\left( f_{xx} E[\delta_x^2] + 2 f_{xy} E[\delta_x \delta_y] + f_{yy} E[\delta_y^2] \right).$$
Note that $E[\delta_x] = E[\delta_y] = 0$, and studies are uncorrelated, so $E[\delta_x \delta_y] = 0$; then
$$E\left[\widehat{\operatorname{Var}}(\hat{\beta})\right] \approx f + \tfrac{1}{2} f_{xx}\operatorname{Var}(\hat{\sigma}_1^2) + \tfrac{1}{2} f_{yy}\operatorname{Var}(\hat{\sigma}_2^2).$$
Within each study, evaluating at the true values and writing $W = w_1 + w_2$,
$$f_{xx} = -\frac{2 w_1 (W - w_1)}{W^3 \sigma_1^4},$$
and similarly for $f_{yy}$. Generalising to $I$ studies, with $W = \sum_i w_i$, this gives us that
$$E\left[\widehat{\operatorname{Var}}(\hat{\beta})\right] \approx \frac{1}{W} - \frac{2}{W^3} \sum_i \frac{w_i (W - w_i)}{n_i - 1},$$
which is below $1/W$ in finite samples. This result is used to compare the expectation of (A2) to the OLS fixed-effect estimator (results given in Figure 1, upper panel).
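The downward bias of the plug-in estimator (A2), and the Taylor approximation to its expectation, can be illustrated by simulation. This sketch uses hypothetical study sizes and variances and draws $\hat{\sigma}_i^2 \sim \sigma_i^2\,\chi^2_{n_i-1}/(n_i-1)$, so that $\operatorname{Var}(\hat{\sigma}_i^2) = 2\sigma_i^4/(n_i-1)$ as assumed above:

```python
import numpy as np

rng = np.random.default_rng(42)
n = np.array([40, 80])         # patients per study (hypothetical)
sigma2 = np.array([1.0, 2.0])  # true residual variances (hypothetical)
w = n / (4.0 * sigma2)         # true inverse-variance weights
W = w.sum()

# Simulate first-stage variance estimates: sigma_i^2 * chi2(n_i-1)/(n_i-1)
reps = 200_000
df = n - 1
sig2hat = sigma2 * rng.chisquare(df, size=(reps, 2)) / df
what = n / (4.0 * sig2hat)
v_hat = 1.0 / what.sum(axis=1)  # the standard estimator (A2), per replicate

naive = 1.0 / W                 # value of (A2) at the true variances
taylor = 1.0 / W - (2.0 / W**3) * np.sum(w * (W - w) / df)

sim_mean = v_hat.mean()
# (A2) is biased downwards; the second-order formula tracks its expectation
assert sim_mean < naive
assert abs(sim_mean - taylor) < abs(sim_mean - naive)
```

The simulated mean of (A2) falls below $1/W$, and the second-order approximation is closer to the simulated mean than the naive value is.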

A.3 An improved variance estimator to partner the inverse-variance estimator of $\hat{\beta}$
Because the standard variance formula given in (A2) is biased downwards in finite samples, we derive an improved estimator for the variance of (A1). Let
$$\hat{\beta} = f(\hat{\beta}_1, \ldots, \hat{\beta}_I, \hat{w}_1, \ldots, \hat{w}_I) = \frac{\sum_i \hat{w}_i \hat{\beta}_i}{\sum_i \hat{w}_i}.$$

The first derivatives with respect to $\hat{\beta}_i$ and $\hat{w}_i$ are
$$\frac{\partial f}{\partial \hat{\beta}_i} = \frac{\hat{w}_i}{\sum_k \hat{w}_k}, \qquad \frac{\partial f}{\partial \hat{w}_i} = \frac{\hat{\beta}_i - \hat{\beta}}{\sum_k \hat{w}_k}.$$
By a Taylor expansion of $f(\cdot)$, we get
$$\operatorname{Var}(\hat{\beta}) \approx \sum_i \left(\frac{w_i}{\sum_k w_k}\right)^2 \operatorname{Var}(\hat{\beta}_i) + \sum_i \left(\frac{\beta_i - \beta}{\sum_k w_k}\right)^2 \operatorname{Var}(\hat{w}_i). \tag{A6}$$
Two kinds of covariance term are omitted from (A6). Covariances between quantities from different studies are always equal to 0 in meta-analysis provided studies can be assumed independent. The within-study covariances $\operatorname{Cov}(\hat{\beta}_i, \hat{w}_i)$ are set to 0 here because we are considering linear models, where an estimate and its variance are independent. Note that when $\hat{\beta}_i$ and $\operatorname{Var}(\hat{\beta}_i)$ cannot be assumed independent, as with binary outcomes, these terms need to be replaced in (A6) with the correct covariances. Using $\operatorname{Var}(\hat{\beta}_i) = 1/w_i$ and $\operatorname{Var}(\hat{w}_i) = 2w_i^2/(n_i - 1)$, and substituting estimates throughout, gives
$$\widehat{\operatorname{Var}}(\hat{\beta}) = \frac{1}{\sum_i \hat{w}_i} + \frac{2}{\left(\sum_i \hat{w}_i\right)^2} \sum_i \frac{\hat{w}_i^2 \left(\hat{\beta}_i - \hat{\beta}\right)^2}{n_i - 1}. \tag{A7}$$
(A7) provides a variance estimator for Gaussian data which improves on (A2) in finite samples.
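A plug-in version of such a corrected estimator can be sketched as follows. This is an illustration, not the paper's supplementary code; the correction term assumes $\widehat{\operatorname{Var}}(\hat{w}_i) \approx 2\hat{w}_i^2/(n_i-1)$, from the delta method applied to $\hat{w}_i = n_i/4\hat{\sigma}_i^2$, and the data are hypothetical:

```python
import numpy as np

def two_stage_variances(beta_i, sig2hat, n):
    """Standard inverse-variance (A2-style) and delta-method-corrected
    variance estimators for the pooled effect. A sketch under the stated
    assumptions, not a definitive implementation."""
    what = n / (4.0 * sig2hat)              # estimated weights
    Wsum = what.sum()
    beta_pool = np.sum(what * beta_i) / Wsum
    v_standard = 1.0 / Wsum
    # Correction: sum_i (df/dw_i)^2 Var(w_i_hat), Var(w_i_hat) ~ 2 w_i^2/(n_i-1)
    v_corrected = v_standard + (2.0 / Wsum**2) * np.sum(
        what**2 * (beta_i - beta_pool) ** 2 / (n - 1)
    )
    return beta_pool, v_standard, v_corrected

# Hypothetical two-study example
beta_i = np.array([0.30, 0.10])   # study treatment-effect estimates
sig2hat = np.array([1.1, 1.9])    # first-stage residual variance estimates
n = np.array([40, 80])            # study sizes
pool, v_std, v_cor = two_stage_variances(beta_i, sig2hat, n)
assert v_cor >= v_std > 0
```

The corrected value is never smaller than the standard one, consistent with the standard formula's downward bias.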

A.4 Expected value of the variance estimator (A7)
The estimator is given by (A1), where $\hat{\beta}_i$ and $\hat{w}_i$ are assumed to be independent. Note that $\hat{w}_i$ can be expressed as a function of a random variable $X_i$. Let $X$ represent the set of all $X_i$s.

We want the variance of $\hat{\beta}$. Since $E[\hat{\beta} \mid X] = \beta$, the variance reduces to
$$\operatorname{Var}(\hat{\beta}) = E\left[\operatorname{Var}(\hat{\beta} \mid X)\right] = E\left[\frac{\sum_i \hat{w}_i^2 \operatorname{Var}(\hat{\beta}_i)}{\left(\sum_i \hat{w}_i\right)^2}\right] = E\left[\frac{A}{B}\right], \qquad A = \sum_i e_i X_i^2, \quad B = \left(\sum_i f_i X_i\right)^2.$$
The generic problem is that we want the expectation of the ratio $A/B$ for $A$ and $B$ as defined above with arbitrary positive constants $e_i$ and $f_i$. Use the second-order expression$^{30}$
$$E\left[\frac{A}{B}\right] \approx \frac{E(A)}{E(B)} - \frac{\operatorname{Cov}(A,B)}{E(B)^2} + \frac{E(A)\operatorname{Var}(B)}{E(B)^3}. \tag{A8}$$
The calculation of the required moments follows from those of $X_i \sim 1/\chi^2_{v_i}$:
$$E(X_i) = \frac{1}{v_i - 2}, \qquad \operatorname{Var}(X_i) = \frac{2}{(v_i-2)^2(v_i-4)}, \qquad E(X_i^2) = \frac{1}{(v_i-2)(v_i-4)},$$
and $\operatorname{Var}(X_i^2) = E(X_i^4) - E(X_i^2)^2$, where $E(X_i^4) = \left[(v_i-2)(v_i-4)(v_i-6)(v_i-8)\right]^{-1}$.

Substituting the expectations $E(A)$, $E(B)$, $\operatorname{Cov}(A,B)$, and $\operatorname{Var}(B)$ into (A8) gives an expression for the actual variance of $\hat{\beta}$. This has been verified in simulation; the Mata code to calculate the formula and verify (A8) and its components is provided as a supplement, together with the log file. This is run for 7 studies where, arbitrarily, $f = [1, 2, 3, 4, 5, 6, 7]$ and $e = [10, 5, 3\tfrac{1}{3}, 2.5, 2, 1\tfrac{2}{3}, 1\tfrac{3}{7}]$. We simulated the $X_i$s from $1/\chi^2_{v_i}$ and calculated $R = A/B$ for each repetition. We provide 2 simulation results. First, $\operatorname{Var}(\hat{\beta}) = E(R)$ is estimated empirically from the simulated values; second, it is calculated using (A8), where empirical values of $E(A)$, $E(B)$, $\operatorname{Var}(B)$, and $\operatorname{Cov}(A,B)$ are substituted for the theoretical expectations. These results, and those based on the theoretical expectations, are given in Table A1 for $1 \times 10^{10}$ repetitions. A tiny error in the approximation (A8) is seen, but the empirical and theoretical components of (A8) are near identical. This justifies the use of the theoretical formula to compare expected variances of the inverse-variance and OLS estimators (as in Figure 1).
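A scaled-down version of this check can be run in a few lines. The sketch below uses the $f$ and $e$ constants quoted in the text, takes $A$ and $B$ as weighted forms in $X_i^2$ and $(\sum_i f_i X_i)^2$, and sets the degrees of freedom $v_i$ to an illustrative value (the paper's $v_i$ are not reproduced here); it compares the empirical mean of $R = A/B$ with the second-order expression (A8) evaluated at empirical moments:

```python
import numpy as np

rng = np.random.default_rng(7)
I = 7
f = np.arange(1, I + 1, dtype=float)  # f = [1, 2, ..., 7] as in the text
e = 10.0 / f                          # e = [10, 5, 10/3, ...] as in the text
v = np.full(I, 40.0)                  # degrees of freedom: illustrative assumption

reps = 200_000
X = 1.0 / rng.chisquare(v, size=(reps, I))  # X_i ~ inverse chi-squared
A = (e * X**2).sum(axis=1)
B = (f * X).sum(axis=1) ** 2
R = A / B

# Second-order approximation (A8) with empirical moments of A and B
EA, EB = A.mean(), B.mean()
covAB = np.cov(A, B)[0, 1]
varB = B.var(ddof=1)
approx = EA / EB - covAB / EB**2 + EA * varB / EB**3

rel_err = abs(R.mean() - approx) / R.mean()
assert rel_err < 0.02  # only a small approximation error remains
```

As in the text, only a tiny discrepancy between the simulated mean and the second-order formula remains.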

APPENDIX B: KENWARD-ROGER CORRECTION FOR TWO-STAGE RANDOM-EFFECTS META-ANALYSIS ESTIMATED BY REML
This section derives the Kenward-Roger variance estimator and degrees of freedom for two-stage meta-analysis using the notation of Kenward and Roger.$^{19,31}$ We first recap the general approach and notation before applying it to the meta-analysis model. The general linear mixed-effects model can be written as
$$y = X\beta + Zu + e,$$
where $y$ is a $(k \times 1)$ vector of random variables, $X$ is a $(k \times p)$ matrix of known constants for the $(p \times 1)$ fixed-effect parameter vector $\beta$, $Z$ is the $(k \times q)$ design matrix for the $(q \times 1)$ random-effects parameter vector $u$, and $e$ is a $(k \times 1)$ vector of random error terms.
We assume $E[u] = 0$, $E[e] = 0$, and $\operatorname{Cov}[u, e] = 0$. Define $D$ as the $(q \times q)$ covariance matrix of the random-effects parameters in $u$, and $R$ as the $(k \times k)$ covariance matrix of $e$. Then $\Sigma$, the $(k \times k)$ covariance matrix of $y$, is equal to $ZDZ^\top + R$. After assuming normality of the random terms in the model, we obtain $y \sim N(X\beta, \Sigma)$.
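For intuition, the sketch below instantiates these matrices for the random-effects meta-analysis special case discussed in Section B.1: one random intercept per study ($Z = I$, $D = \tau^2 I$) and independent study-level errors ($R = \operatorname{diag}(\sigma_i^2)$). The numerical values are hypothetical:

```python
import numpy as np

tau2 = 0.04                                 # between-study variance (assumed)
sig2 = np.array([0.10, 0.25, 0.15, 0.30])   # within-study variances (assumed)
I = len(sig2)

Z = np.eye(I)          # one random effect per study
D = tau2 * np.eye(I)   # covariance of the random effects
R = np.diag(sig2)      # covariance of the error terms
Sigma = Z @ D @ Z.T + R  # covariance of y

# For this model, Sigma is diagonal with elements sigma_i^2 + tau^2
assert np.allclose(Sigma, np.diag(sig2 + tau2))
```

This confirms the familiar random-effects structure $\Sigma_{ii} = \sigma_i^2 + \tau^2$ as a special case of $ZDZ^\top + R$.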
Kenward and Roger describe $\Sigma$ under the general linear mixed-effects model as being an unknown, block-diagonal, $(k \times k)$ variance-covariance matrix whose elements are well-behaved functions of the $(r \times 1)$ parameter vector $\sigma$.$^{19}$ The standard approach would be to use REML to find an estimate $\hat{\sigma}$ for $\sigma$ and proceed to find a REML estimator $\hat{\beta} = \Phi(\hat{\sigma})\,X^\top \Sigma(\hat{\sigma})^{-1} y$ for $\beta$, with variance $\hat{\Phi} = \Phi(\hat{\sigma}) = \left[X^\top \Sigma(\hat{\sigma})^{-1} X\right]^{-1}$, where $\hat{\beta}$ and $\hat{\Phi}$ are dependent on $\hat{\sigma}$.
The standard REML estimator of $\operatorname{Var}(\hat{\beta})$, $\hat{\Phi} = [X^\top \hat{\Sigma}^{-1} X]^{-1}$, is the variance-covariance matrix of the asymptotic limiting distribution of $\hat{\beta}$ based on the REML estimation of $\sigma$. Kenward and Roger obtain a better approximation than $\hat{\Phi}$ to the finite-sample variance-covariance matrix of $\hat{\beta}$.
The corrected estimator can be written as
$$\hat{\Phi}_A = \hat{\Phi} + 2\hat{\Phi}\left\{ \sum_{i=1}^{r}\sum_{j=1}^{r} W_{ij}\left( Q_{ij} - P_i \hat{\Phi} P_j - \tfrac{1}{4} R_{ij} \right) \right\}\hat{\Phi},$$
where $W_{ij}$ is the $(i,j)$th element of the variance-covariance matrix of $\hat{\sigma}$, that is, of $W = \operatorname{Var}[\hat{\sigma}]$, which may be approximated by the inverse of the observed or expected information matrix. Kenward and Roger give the following expression for the expected information:
$$\{I_E\}_{ij} = \frac{1}{2}\operatorname{tr}\left( \Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_i}\,\Sigma^{-1}\frac{\partial\Sigma}{\partial\sigma_j} \right) - \operatorname{tr}\left( \Phi Q_{ij} \right) + \frac{1}{2}\operatorname{tr}\left( \Phi P_i \Phi P_j \right).$$
The term of $\hat{\Phi}_A$ involving $R_{ij}$ may be ignored if the covariance matrix $\Sigma$ is linear in $\sigma$, because $R_{ij}$ involves second derivatives of $\Sigma$, which are then 0. The quantities $P_i$, $Q_{ij}$, and $R_{ij}$ are defined as follows:
$$P_i = X^\top \frac{\partial\Sigma^{-1}}{\partial\sigma_i} X, \qquad Q_{ij} = X^\top \frac{\partial\Sigma^{-1}}{\partial\sigma_i}\,\Sigma\,\frac{\partial\Sigma^{-1}}{\partial\sigma_j}\, X, \qquad R_{ij} = X^\top \Sigma^{-1}\frac{\partial^2\Sigma}{\partial\sigma_i\,\partial\sigma_j}\Sigma^{-1} X.$$

B.1 The random-effects meta-analysis model
For the random-effects meta-analysis model, $y$ consists of the $I$ treatment-effect estimates $\hat{\beta}_i$, $X = \mathbf{1}_I$, $\beta$ is the "pooled" effect size, and $\Sigma$ is diagonal with elements $\Sigma_{ii} = \hat{\sigma}_i^2 + \hat{\tau}^2$, where $\hat{\sigma}_i^2$ is the estimated variance of $\hat{\beta}_i$ and $\hat{\tau}^2$ is the estimated between-study heterogeneity.
We proceed by noting that Kenward and Roger's $\sigma_1 = \tau^2$ (note that this is distinct from $\sigma_i$, the within-study standard deviation) and also define $w_i = (\sigma_i^2 + \sigma_1)^{-1}$, so that $\Sigma_{ii} = w_i^{-1}$. Then,
$$\Phi = \left( X^\top \Sigma^{-1} X \right)^{-1} = \left( \sum_i w_i \right)^{-1}.$$
To calculate the approximate degrees of freedom $m$, we work through the formulae of Kenward and Roger$^{19(\text{p}987)}$:
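The corrected variance for this model can be sketched directly from the general formulae. In this sketch $\hat{\tau}^2$ is taken as given rather than estimated by REML, and the inputs are hypothetical; because $\Sigma$ is linear in the single parameter $\tau^2$, the $R_{ij}$ term vanishes, the scalar quantities reduce to $P = -\sum_i w_i^2$ and $Q = \sum_i w_i^3$, and $W = \operatorname{Var}(\hat{\tau}^2)$ is approximated by the inverse expected information:

```python
import numpy as np

def kr_variance(sig2, tau2):
    """Kenward-Roger-type corrected variance of the pooled effect for
    random-effects meta-analysis with a single variance parameter tau^2.
    A sketch under the stated assumptions; tau2 is taken as known."""
    w = 1.0 / (sig2 + tau2)      # w_i = (sigma_i^2 + tau^2)^{-1}
    phi = 1.0 / w.sum()          # Phi = (sum_i w_i)^{-1}
    P = -np.sum(w**2)            # X' (dSigma^{-1}/dtau^2) X
    Q = np.sum(w**3)             # X' dSigma^{-1} Sigma dSigma^{-1} X
    # Expected REML information for tau^2 (R term absent: Sigma linear in tau^2)
    info = 0.5 * np.sum(w**2) - phi * Q + 0.5 * phi**2 * np.sum(w**2) ** 2
    Wvar = 1.0 / info            # approximate Var(tau2_hat)
    phi_A = phi + 2.0 * phi**2 * Wvar * (Q - P * phi * P)
    return phi, phi_A

sig2 = np.array([1.0, 2.0, 3.0])  # hypothetical within-study variances
phi, phi_A = kr_variance(sig2, tau2=1.0)
# By Cauchy-Schwarz, Q - P*Phi*P >= 0, so the correction inflates the variance
assert phi_A >= phi > 0
```

The correction term is nonnegative, so the Kenward-Roger variance never falls below the standard REML variance $\Phi$ for this model.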