Multistep estimators of the between‐study covariance matrix under the multivariate random‐effects model for meta‐analysis

A wide variety of methods are available to estimate the between‐study variance under the univariate random‐effects model for meta‐analysis. Some, but not all, of these estimators have been extended so that they can be used in the multivariate setting. We begin by extending the univariate generalised method of moments, which immediately provides a wider class of multivariate methods than was previously available. However, our main proposal is to use this new type of estimator to derive multivariate multistep estimators of the between‐study covariance matrix. We then use the connection between the univariate multistep and Paule–Mandel estimators to motivate taking the limit, where the number of steps tends toward infinity. We illustrate our methodology using two contrasting examples and investigate its properties in a simulation study. We conclude that the proposed methodology is a fully viable alternative to existing estimation methods, is well suited to sensitivity analyses that explore the use of alternative estimators, and should be used instead of the existing DerSimonian and Laird‐type moments based estimator in application areas where data are expected to be heterogeneous. However, multistep estimators do not seem to outperform the existing estimators when the data are more homogeneous. Advantages of the new multivariate multistep estimator include its semi‐parametric nature and that it is computationally feasible in high dimensions. Our proposed estimation methods are also applicable for multivariate random‐effects meta‐regression, where study‐level covariates are included in the model.

studies' true effect sizes (ie, heterogeneity) in the model.The estimation of the between-study variance has received considerable attention in the literature. 8,9Accurate estimation of this parameter is desirable for quantifying the heterogeneity in the studies' true effect sizes and this also has implications for the estimation of the average effect, because the estimated between-study variance is incorporated into the study weights.A wide variety of point estimators of the between-study variance has been proposed. 8,9The estimator that is currently most often used was proposed by DerSimonian and Laird, [10][11][12] but the restricted maximum likelihood and Paule-Mandel estimator are nowadays recommended. 8,9ore recently, Van Aert and Jackson 13 revealed a relationship between the multistep and Paule-Mandel estimator, where the Paule-Mandel estimate is obtained in the limit as the number of steps tends towards infinity.
The multivariate random-effects meta-analysis model 14,15 is a generalization of the univariate model.This model allows the joint analysis of multiple, and possibly correlated, outcomes.There are numerous advantages of the multivariate approach, 16 including the potential for improving precision, reducing bias, and facilitating the appropriate calculation of joint confidence intervals.However, there are also potential disadvantages of using the multivariate approach, so that it does not result in better inferences in every case. 15,16Potential disadvantages of the multivariate approach are, for instance, described by Jackson et al 16 in their section 9.8, which include encountering estimation problems and that bias, rather than strength, might be borrowed from some outcomes.
In this article, we focus on the multivariate random-effects model and in particular on estimation of the between-study covariance matrix.The between-study covariance matrix contains the variances of true effect sizes of each outcome and their corresponding covariances.8][19][20][21][22] The main contribution of this article is that we propose a novel multistep estimator of the between-study covariance matrix.
The development of our new estimator begins by extending the univariate generalised method of moments estimator 3,13 to the multivariate setting.We achieve this by generalizing the multivariate estimation method of Jackson et al 23 to accommodate a more general weighting scheme.This immediately provides a wider class of multivariate estimation methods than was previously possible.However, our main proposal is to use this to motivate multivariate multistep estimators that are analogous to the univariate multistep estimator. 13We will derive our results under the more general multivariate model for meta-regression, where study level covariates are included, but meta-analysis (no covariates) is our focus.By taking the limit as the number of steps tends toward infinity in the multistep estimator, we are able to tentatively propose a multivariate Paule-Mandel 24 estimator.Our work therefore connects and extends a wide variety of methods for meta-analysis that have recently been published.
The rest of the article is set out as follows.In Section 2, we present the multivariate random-effects model for meta-regression.In Section 3, we develop the multivariate generalised method of moments estimator, and so develop a new class of estimators for the between-study covariance matrix.In Section 4, we use this to motivate the multistep estimator, and tentatively propose that the limiting form (where the number of steps tends towards infinity) of this estimator should be considered a Paule-Mandel-type estimator.In Section 5, we illustrate our new methodology using two contrasting examples.We perform a simulation study in Section 6, and show that our proposed methodology is a fully viable alternative to the existing methods.We conclude with a discussion in Section 7.

THE MULTIVARIATE RANDOM-EFFECTS MODEL FOR META-REGRESSION
Let d be the number of outcomes in the meta-analysis and p be the number of regression parameters.If there are no covariates, then there is an intercept for each outcome in the "regression", and p = d.The multivariate random-effects meta-regression model 23,25,26 is for all studies i = 1, 2, … n, where Y i is the d × 1 column vector of effect size estimates of the outcomes (or summary effect measures) associated with study i, S i is the d × d corresponding within-study covariance matrix,  is the d × d between-study covariance matrix, X i is the d × p design matrix for study i and  is the p × 1 column vector containing the regression coefficients.We use N in model (1) to refer to the multivariate normal distribution.We assume that the outcomes from different studies are independent, but within studies outcomes may be correlated via either nonzero off-diagonal entries of S i or  (or both).
For a multivariate meta-analysis (no covariates), X i is the d × d identity matrix and  is the d × 1 column vector of average outcome effects.If instead covariates are included, then the X i contain further columns of covariates in order to describe the multivariate meta-regression.We adopt the convention of treating the S i as fixed (and known) constants but these are estimated in practice.Adopting this convention may, just as in the univariate setting, [27][28][29][30] result in inaccurate inferences, because the uncertainty in S i is ignored.If a study does not provide all outcomes (some studies may only estimate a subset of the outcomes of interest) then, assuming these are missing at random (MAR), 31 the model for the observed outcomes for study i is taken as the implied marginal model from (1); that is we use the submodel from (1) that describes the observed Y i .
To make inferences about the average outcome effects vector , a "two-stage" approach to estimation is used where we estimate the between-study covariance matrix  in the first stage.While the estimation of this covariance matrix is the main focus of our article, we proceed in this section by assuming that this estimation has been performed.Inference for  is then performed in the "second stage" by approximating  = Σ.Recalling that we treat the S i as known, this means that we treat the covariance matrix of Y i in model (1), S i + , as known.When making inferences for , we therefore use a weighted linear regression where all weights are known.Let Y obs denote the stacked vector of the observed entries of all the Y i .Let X obs denote the design matrix for Y obs as implied by model (1).Let Var(Y obs ) =  −1 be the block diagonal covariance matrix of Y obs implied by model (1), where  is treated as fixed and known.Then where t denotes the matrix transpose, which is approximately normally distributed with covariance matrix such that standard errors of the estimates can be obtained as the square root of the diagonal entries of Var( β); 95% confidence intervals for each entry of  can be obtained as the estimates plus and minus the corresponding standard errors multiplied by the 97.5% quantile of the standard normal distribution.
Missing outcome data can also be accommodated by instead defining  to be the precision matrix of Y, the stacked vector of the Y i including the missing observations.The sub-matrix of  corresponding to the observed outcomes is then the inverse of Var(Y obs ) and all other rows and columns of  contain only zeroes.Equations ( 2) and (3) are then applied with X obs replaced by X, where X is obtained by vertically stacking the X i , and Y obs replaced by Y; missing values of Y are imputed with an arbitrary value (such as zero) in this computation.This alternative approach allocates zero precision to missing observations and ensures that the imputed values used to accommodate missing data do not contribute to the estimation.

THE MULTIVARIATE GENERALIZED METHOD OF MOMENTS
In this section, we will develop the generalised method of moments for estimating  under the multivariate random-effects model for meta-regression.Our main interest is estimation for meta-analysis (no covariates) for which we obtain the necessary estimating equations as a special case.We continue in Section 3.1 with describing the existing method of Jackson et al. 23 Our proposed new methodology commences in Section 3.2.

The method of Jackson et al
Before extending the estimation method of Jackson et al 23 to accommodate a more general weighting scheme, and so derive the multivariate generalized method of moments, we will briefly describe their approach.Jackson et al 23 initially compute the fitted d × 1 outcome vectors from the equal-effect model model ( = 0), which they denote by Ŷi .These fitted values are easily computed for all study outcomes (both observed and any that are missing).They then define where R i is a d × d diagonal matrix containing the missing data indicator of Y i .Letting Y i,j be the jth entry of Y i , j = 1, 2, … d, the jth entry of the leading diagonal of R i is therefore equal to one if Y i,j is observed and is zero if Y i,j is missing.W i is the d × d within-study precision matrix associated with study i.If all outcomes are provided by study i, then W i = S −1 i .If some of these outcomes are missing, then we take the sub-matrix formed by the rows and columns of S i that correspond to the observed outcomes.Then, we obtain W i by including columns and rows of zero that correspond to the missing outcomes whilst the other rows and columns of W i are given by the corresponding entries of S −1 i of reduced dimension.Missing outcomes are therefore allocated zero precision, in the similar way to the alternative approach for handling missing data when making inferences about  in Section 2.1.
The pre-multiplication of the residuals (Y i − Ŷi ) by the R i in Equation ( 4) ensures that missing values of Y i do not contribute to Q.This means that any missing outcomes in the Y i can be imputed with an arbitrary value without affecting the computation, for which zero is often used in practice.Missing outcome data, which are common in application, are therefore easily accommodated.From the definition of W i , we have W i R i = W i .Hence, we can simplify Equation (4) slightly and write ( Jackson et al 23 where Q is given in Equation ( 5), as a function of  (their Equation 4).They then use the method of moments, and so replace E[Q] with the observed value of Q, and  with its estimate, to provide an estimating equation for Σ (their Equation 6).The resulting estimating equation cannot immediately be solved, because Σ appears in the middle of matrix products.Hence, Jackson et al 23 apply the vec operator to the estimating equation, and use the identity vec(AXB) = (B t ⊗ A)vec(X), where ⊗ denotes the Kronecker product, to produce an estimating equation where Σ is the final term in all matrix products.The resulting estimating equation can then be solved to provide an initial Σ.This initial estimate is not symmetrical or positive semi-definite, both of which are required for it to be acceptable.
By taking the average of this asymmetrical estimate and its transpose, which is equivalent to matching the moments of (Q + Q t )∕2, a symmetrical estimate of  is easily obtained.Then, by writing this symmetrical estimate in terms of its spectral decomposition, and setting any negative eigenvalues in this decomposition to zero, a final positive semi-definite symmetrical estimator of  is obtained as follows: where  i is the ith eigenvalue of the symmetrical Σ and e i is the corresponding normalized eigenvector.The final "truncated" (symmetrical and positive semi-definite) estimate Σ+ is then used to make inferences about , as explained in Section 2. The matrix Q in Equations ( 4) and ( 5) is based on a weighting scheme where residuals (Y i − Ŷi ) are weighted by their (assumed fixed and known) within-study precision matrices W i .This is most easily seen from Equation (5) where the outer products of these residuals are pre-multiplied by the W i .Furthermore, the Ŷi , and so the residuals, are also conceptualised as being estimated using these weighting matrices.To see why this is the case, we note that Equation ( 5) is equal to where btr (M) is the block trace operator, 23 which is the sum of all n of the d × d matrices along the main diagonal of matrix M, Y and Ŷ are obtained by vertically stacking the Y i and Ŷi , respectively, W = diag(W i ) which is the block diagonal matrix containing the W i and R = diag(R i ).To compute Ŷ in Equation ( 6), we use the standard weighted regression result where X is obtained by vertically stacking the X i and H is the 'hat' matrix under the equal-effect model.Hence, W is the weighting matrix used in the weighted linear regression used to compute Ŷ.The study results are therefore conceptualised as being weighted by their W i when computing the Ŷi , as stated.

Generalizing the estimation method of Jackson et al
We generalise this estimation method so that it is based upon an alternative weight matrix W * = diag(W * i ), so that the study results are instead weighted by the W * i .If we set W * = W, then we recover the previous method of Jackson et al 23 as a special case.The study specific W * i are interpreted as arbitrary weighting matrices so that, other than being block diagonal, W * is also an arbitrary weighting matrix.For example, by using W * i that are more similar than the W i , we can weight studies more equally.We can then, for example, expect the estimator to perform better when the studies are heterogeneous because the total study covariances matrices S i +  will be similar when the entries of  are large.
Hence, instead of Equations ( 6) and (7), we base our estimation on where We assume that all entries of W * are fixed and known constants but we no longer require that the weighting matrix is related to the precision of Y. Now that W * may be of a more general form, we use the notation H * and Ŷ * to emphasise that all aspects of this weighted regression depend on the weighting scheme adopted.

Constraints on the form of W *
Although we permit quite a general form for W * , we place three constraints on this.Firstly, we require that all rows and columns of W * that correspond to missing data in Y must be zero.This is so that missing outcome data do not contribute to Ŷ * in Equation ( 9), and so also do not contribute to Q * in (8), as in the special case of Jackson et al. 23 This means that we can impute missing outcomes with any arbitrary value when computing Q * , such as zero, in exactly the same way as when computing Q.
Secondly, we require that the sub-matrix of W * corresponding to observed outcomes is symmetrical and positive definite.This ensures that the weight matrix W * can be conceptualized as being the inverse of a valid covariance matrix, where missing data have been accommodated with zero precision, so that regression implicit in Equation ( 9) is valid under the theory of generalized least squares.
Finally, we require that W * is block diagonal, where the ith d × d block along the main diagonal corresponds to the ith study.This ensures that each block can be interpreted as a weighting matrix for each study and the weighting matrix W * respects the fact that the studies are assumed to be independent.However, this is not really an additional constraint, because we have defined W * = diag(W * i ) such that W * is necessarily block diagonal.

3.2.2
Computing the expectation of Equation ( 8) and deriving the resulting estimating equation The estimation procedure for  proceeds in a similar way as in Jackson et al. 23 In the web supplementary materials we show that, under the assumptions of model ( 1), the expectation of Q * in Equation ( 8) is given by where ), I nd is the nd × nd identity matrix; we partition the nd × nd matrices A * and B * into n 2 blocks of dimension d × d and we denote the ith by jth sub-matrix of A * and B * by A * i,j and B * i,j , respectively.We now apply the method of moments, and so replace E[Q * ] with its observed value Q * , and  with Σ. Equation ( 10) therefore provides the estimating equation We apply the vec operator to Equation (11) giving Equation ( 12) can then easily be solved for .We then proceed as in Section 3.1 to produce a final symmetrical and positive semi-definite Σ+ , and the estimation of  is completed.Inferences for  can then be made as explained in Section 2.1.
The matrix H * has the properties W * H * W * −1 = H * t and (I nd − H * t ) 2 = (I nd − H * t ).Upon substituting W = S −1 , these properties quickly give rise to the previous estimating equations of Jackson et al. 23 Equation ( 12) is a multivariate generalization of estimating Equation (2) of van Aert and Jackson 13 and Equation ( 6) of DerSimonian and Kacker. 3

Handling missing data
By requiring that all rows and columns of W * that correspond to missing data in Y must be zero, missing entries of Y do not contribute to the computation of Q * and may be imputed with any arbitrary value (Section 3.2.1).The corresponding missing values in S= diag(S i ) must also be accommodated in estimating Equation (12).S does not contribute to the computation of Q * and only the first term on the right-hand side of the estimating equation ( 12) depends on this matrix via M * = btr(A * SB * ).Hence, we must also accommodate missing values of S when computing M * .This was not an issue for Jackson et al, 23 because algebraic simplification was possible to remove S from the calculations when weighting studies by their within-study precision matrices.
In the web supplementary materials we show that missing values of S do not contribute to A * SB * , and so do not contribute to the computation of M * .Hence, the missing values of S, like those of Y, can be imputed with an arbitrary number when performing the calculations.We therefore (arbitrarily) impute zeroes for all missing entries of S and Y.It makes intuitive sense that the assumed properties of the missing outcomes do not affect any aspect of the estimation.This is because the definition of Q * , with the constraint that W * is block diagonal and contains zeroes for missing outcomes, ensures that missing outcomes do not contribute to any aspect of the computation and so their assumed properties are irrelevant.

MULTI-STEP ESTIMATORS
The multivariate estimating equation ( 12) provides a much wider class of estimators than was previously available.
Although this equation could be used with a variety of weighting schemes, our main proposal is to use it to motivate the multivariate multistep estimator.

The univariate case
Our work builds upon that of DerSimonian and Kacker 3 who develop two-step estimators of the between-study variance in the univariate setting.An initial ('first step') estimate of the between-study variance (such as the DerSimonian and Laird estimator) is calculated.After deriving the univariate generalized method of moments, the second step estimator is then obtained using weights equal to the reciprocal of the total (the within plus the first-step estimate of the between) study variances.
Van Aert and Jackson 13 extend this idea to develop univariate multistep estimators.The second step estimator of the between-study variance is then used in the same way as the one from the first step, and as explained in the previous paragraph, to compute the third step estimator.This estimate is then used to compute the fourth step estimator, and so on.

The multivariate case
We define the multivariate multistep estimator of  in an analogous way.An initial estimate Σ1 is calculated, for example using the estimator of Jackson et al. 23 The second step estimator, Σ2 , is then calculated using estimating equation (12)  where W * i is the estimated total precision of the ith study study using the estimate of Σ1 from the first step, and If instead some entries of Y i are missing then we define W * i as the corresponding precision matrix in the usual way; the sub-matrix of W * i corresponding to observed data is obtained as the inverse of the estimated covariance matrix of the observed entries of Y i and the other entries of W * i are set to zero.We truncate Σ2 as explained in Section 3.1 to provide a symmetrical and positive semi-definite second-step estimate, to ensure the weighting matrix used in the subsequent step has the properties described in Section 3.2.1.We then estimate Σ3 in the same way, but where W * i is the estimated total precision of the ith study study using the estimate Σ2 from the second step.We then compute Σ4 using Σ3 to compute the weights, and so on.
We will explore the use of alternative initial estimates Σ1 in the examples and simulation studies that follow.A simple way to implement different Σ1 is to define a variety of 'starting' or 'step-zero' matrices Σ0 and then calculate Σ1 as the corresponding initial estimate, in the iterative way described in the previous paragraph.For example, and in particular, by taking Σ0 = 0, the initial Σ1 is then the DerSimonian and Laird type estimator of Jackson et al. 23 This observation shows that the DerSimonian and Laird estimator may be naturally embedded into the multistep framework using a 'starting' or 'step-zero' matrix corresponding to homogeneity.By using alternative Σ0 , for example taking this to be an average or the largest within-study covariance matrix, we can commence the multistep estimation in different parts of the parameter space; Σ0 can be interpreted as the analysts' prior conjecture for the true .
By allowing the number of steps to become large, we can investigate the limiting properties of the multistep estimator.In this limit, we obtain the Paule-Mandel estimator 24 in the univariate case (when convergence is obtained, but non-convergence is an extremely rare event), 13 and so this limit is a potential way to define a multivariate Paule-Mandel estimate of .However for this to be acceptable, we will require that the limiting properties of the multistep estimator are not sensitive to the choice of Σ1 .The use of alternative 'starting' or 'step-zero' matrices, as described in the previous paragraph, will be used to assess this.

EXAMPLES
In this section, we illustrate the multistep estimator by applying it to two examples.Both examples are two-dimensional meta-analyses.Data are complete for the first example whereas not all studies report the effect size estimates for both outcomes in the second example.We only show the results of the multistep estimator for the multivariate random-effects model (ie, no covariates) in the article.However, the R code used for the analyses is available at https://osf.io/58u3w/and contains an example where a study level covariate is included in the model.In our examples, we will evaluate the use of the proposed multi-step and the restricted maximum likelihood (REML) 32 estimators.We will also evaluate the existing DerSimonian and Laird (DL) type estimator proposed by Jackson et al 23 (Section 3.1) by using Σ0 = 0 (Section 4.2).

Example one: Complete data
Antczak-Bouckoms et al 33 compare surgical and non-surgical procedures for treating periodontal disease in this famous example.Data for this meta-analysis are available in Table 1.This meta-analysis consists of five studies and two outcomes of interest: probing depth and attachment level.The effect size measure used was the mean difference in millimeters of probing depth and attachment level when comparing surgical versus non-surgical treatment.Positive effect sizes indicate that surgery was more effective.The forest plot in the left panel of Figure 1 provides an overview of the included studies.
Figure 2 shows the entries of the estimated between-study covariance matrix for the first 10 steps of the multistep estimator.Three different initial estimates Σ1 were used, and implemented by using alternative 'starting' or 'step-zero' matrices Σ0 (see Section 4.2).The results shown in black use Σ0 = 0, so that Σ1 is the DerSimonian and Laird type estimator of Jackson et al. 23 The results shown in green use an average within-study covariance matrix as Σ0 : the five within-study covariance matrices were ranked by their trace (the sum of their diagonal terms) and Σ0 was taken to be the median.Finally, the results shown in orange use the within-study covariance matrix with the largest trace as Σ0 .The REML estimates are indicated by a diamond on the right side of the figure.
The main conclusions for this example are that convergence is quickly obtained and that all three initial Σ1 result in very similar estimates from Σ3 onwards.The limit of the multistep estimator is in better agreement with the REML, compared to the DL type estimator.The limiting properties of the multistep estimator do not depend on which of the three initial estimates Σ1 is used.Table 2 shows the estimates of  and corresponding statistical inferences for the three F I G U R E 1 Forest plots of the meta-analysis by Antczak-Bouckoms et al. 33 (left panel) and the meta-analysis by Malin et al. 34 (right panel).These forest plots were created using the R package "metafor".estimators.The results of the multistep estimator are based on Σ10 .All estimators yield highly similar estimates and the same conclusions when drawing statistical inferences.

Example two: Missing data
Malin et al. 34 conducted a meta-analysis to study to what extent different cut-points on the Apgar score are related to neonatal mortality.The Apgar score 36 ranges from 0 to 10, and lower scores imply an increased risk.Studies in the meta-analysis examined whether babies with an Apgar score smaller than 3 and 6 were associated with increased risk of neonatal mortality compared to babies with a higher score.The effect size measure of interest was the log odds ratio, and ten studies were included in the meta-analysis.Data for this meta-analysis were obtained from Riley et al. 37 and are reported in Table 3 and the right panel of Figure 1.There are missing data in this meta-analysis, because not all studies reported effect sizes on both cut-points.For example, study 6 only reported an effect size for cut-point 3 and study 10 only reported an effect size for cut-point 6.
Figure 3 shows the entries of the estimated between-study covariance matrix for the first 10 steps of the multistep estimator.Three different initial estimates Σ1 were used, implemented in a similar way as in the previous example.The

F I G U R E 2
Estimates of the between-study covariance matrix by the multistep estimator for the meta-analysis by Antczak-Bouckoms et al. 33 The first 10 steps of the multistep estimator are shown together with the estimate of the restricted maximum likelihood estimator.The colors of the different lines refer to the different 'starting' or 'step-zero' matrices Σ0 .DL and REML refer to the DerSimonian and Laird and restricted maximum likelihood estimator, respectively.PD is probing depth and AL is attachment level.

TA B L E 2
Effect size estimates and results of statistical inferences for the meta-analysis by Antczak-Bouckoms et al. 33 when the between-study covariance matrix is estimated with the multistep (10th step), the restricted maximum likelihood (REML), and DerSimonian and Laird (DL) estimator.main conclusions for this example is that convergence is again quickly obtained.However, this time the limit of the multistep estimator is in better agreement with the DL type compared to the REML estimator.As in the previous example, the limiting properties of the multistep estimator do not depend on which of the three initial estimates Σ1 is used.All estimators yield highly similar estimates and the same conclusions when drawing statistical inferences (see Table 4).

SIMULATION STUDY
The two examples shown in Section 5 are encouraging, because in both cases convergence was quickly achieved.Furthermore, the limit of the multistep estimator was found to be in broad agreement with the DL type and REML estimators, but also this was sufficiently different to demonstrate that the multistep estimator is distinct.In this section, we perform a simulation study to examine the properties of the multistep estimator for both complete and missing data.The simulation study will address two main questions: Firstly, do the limiting properties of the multistep estimator depend on the initial Σ1 ?This was not the case for either example in Section 5 but this is an important issue to determine TA B L E 3 Data for the meta-analysis by Malin et al. 34 OR refers to odds ratio and SE to standard error.

F I G U R E 3
Estimates of the between-study covariance matrix by the multistep estimator for the meta-analysis by Malin et al. 34 The first 10 steps of the multistep estimator are shown together with the estimate of the restricted maximum likelihood estimator.The colors of the different lines refer to the different starting values.DL and REML refer to the DerSimonian and Laird and restricted maximum likelihood estimator, respectively.the acceptability of presenting the limit of the multistep estimator as a multivariate Paule-Mandel estimator.Secondly, does the multistep estimator perform better, or substantively differently, to the DL type and REML estimators?To simplify this question, we will assess the performance of the limit of the multistep estimator, instead of any particular step (for example, the second-step estimator).This means that datasets that converge after a different number of steps are directly comparable, and it is an efficient way to assess whether or not multistep estimators offer any improvement per se, rather than at any particular step.

TA B L E 4
Effect size estimates and results of statistical inferences for the meta-analysis by Malin et al. 34 when the between-study covariance matrix is estimated with the multistep (10th step), the restricted maximum likelihood (REML), and DerSimonian and Laird (DL) estimator.

Outcome 1 (cut-point 3) Outcome 2 (cut-point 6)
Multi Note: CI refers to confidence interval and two-tailed p-values are reported.

Methodology for simulating data
We simulated data for meta-analyses with two outcomes (d=2) and no covariates (p=2).For each simulated meta-analysis, i = 1, 2, … n vectors of observed effect sizes, Y i , were generated from a multivariate normal distribution with mean  = 0 and covariance matrix S i + .All simulated meta-analyses consisted of n=10 studies, a realistically small sample size but one where the between-study covariance matrix should be reasonably well identified.Setting  = 0 is immaterial because alternative choices simply translate the effect sizes, with no implications for any estimate of .
To simplify matters, the same within-study variance was used for both outcomes within every study.In order to generate a variety of study sizes in a simple and transparent way, the resulting 10 within-study variances were taken to be 0.1, 0.2, 0.3, … , 1.All within-study correlations were taken to be 0.7.
Eleven different  were explored:  was taken to be equal to each of the 10 different within-study covariance matrices plus a  = 0 reflecting homogeneity.This simple approach allows us to calibrate the extent of the between-study heterogeneity relative to the size of the studies.Although the magnitude of the within-study variances in this simulation study may not correspond to those of effect sizes in particular applications, re-scaling all variance components simply re-scales the Σ in the same way.We also proceeded in a simple and transparent way in the missing data scenario.We reanalyzed the simulated complete data after removing the 2nd, 4th, 6th, 8th, and 10th observations of the first outcome (and the corresponding entries of the within-study covariance matrices).
The simulations were programmed in R. 38 The R package MASS 39 was used for generating data from a multivariate normal distribution.1,000 iterations per condition (ie, the  assumed) were used, so that with 11 different values of Σ, in total 11,000 datasets were simulated (and, including the missing data scenario, 22,000 simulated datasets were analysed).R code of the simulation study is available at https://osf.io/dntzp/.

Implementation of the multistep estimator
The first goal of the simulation study was to examine limiting properties of the multistep estimator.We explored the use of alternative initial Σ1 by using three different "starting" or "step-zero" matrices Σ0 (see Section 4.2).We first used Σ0 = 0, so that Σ1 is the DL-type estimator, then secondly Σ0 = S 5 and finally Σ0 = S 10 , where S i , i = 1, 2, … 10, is the ith within-study variance used in the simulation study.This means that we explore the implications of starting at three very different parts of the parameter space, in the same way as in the examples (Figures 2 and 3).The maximum number of steps was set to 10,000.Criteria are needed to determine when the limiting behaviour has been reached and no further steps are necessary.The criterion used to determine convergence was that all entries of Σ at the current step, rounded to eight decimal places, must be the same as at the 10 previous steps.This is a very strong condition but we wanted to be absolutely certain that convergence was achieved whenever it is declared.We therefore only began checking for potential convergence at the 11th step.
Van Aert and Jackson 13 show that it is possible to construct datasets where the univariate multistep estimator does not converge and instead oscillates between two notably different estimates.We therefore also checked for oscillation from the 11th step onwards, by again rounding the current Σ, and those at the previous nine steps, to eight decimal places.Oscillation was considered to be detected if the estimates at the "odd steps" were all equal to each other, the estimates at the "even steps" were all equal to each other, and the two sets of estimates differ.For example, at the 12th step, we checked whether the "odd steps" Σ3 , Σ5 , Σ7 , Σ9 , Σ11 are all equal.We also checked whether the "even steps" Σ4 , Σ6 , Σ8 , Σ10 , Σ12 are all equal.If this was the case for both sets of estimates, but those at the "odd" and "even" steps differ, then oscillation was declared.This is a weaker condition than the one used for declaring convergence, which instead would be declared if the estimates at the "odd" and "even" steps are the same (and where one further step confirms this).The estimation procedure of the multistep estimator was stopped as soon as convergence or oscillation was declared.The maximum number of iterations was never reached.The limiting behaviour for all 22,000 simulated datasets could be therefore evaluated.
The above criteria, although useful for determining when the limiting behaviour is reached, are not necessarily reliable for distinguishing between convergence and oscillation.This is because a multistep estimator that is very slowly converging can sometimes meet the above criteria for oscillation before it meets the stronger criteria for convergence.A final measure was put in place to ultimately distinguish between convergence and oscillation.The sum of the absolute differences between the entries of the final and the penultimate multistep estimates of  was calculated.If this sum was more than 10 −7 then the multistep estimator was considered to be oscillating, otherwise it was considered to have converged.However, even this can be too stringent, because multistep estimators may be oscillating between two estimates that are very similar (or even converging, but extremely slowly).If this sum was more than 10 −4 , then the multistep estimator was considered to be oscillating between two notably different estimates.
A convention is needed to produce a single estimate when the multistep estimator oscillates in the limit.We propose taking the average of the two oscillations whenever this occurs.Our codes therefore take the average of the penultimate and final Σ whenever the iteration was stopped because oscillation was detected.This is of no consequence if the multistep estimator is in fact slowly converging and provides a unique point estimate if there is genuine oscillation.We return to this issue in the discussion.
An algorithm box is provided in Figure 4 to summarize the procedure used to implement the multistep estimator in the simulation study.

6.3
Results: Limiting properties of the multistep estimator

Meta-analyses with complete data
All three initial estimates Σ1 resulted in the same (to eight decimal places) multistep estimator for all 11,000 simulated meta-analyses with complete data.The limiting properties of the multistep estimator therefore do not appear to depend on the initial estimate in this simulation setting.The median number of steps taken by the multistep estimator before convergence was detected was 21 (quartiles 19 and 24), 20 (quartiles 18 and 23), and 20 (quartiles 18 and 24) for the first, second, and third initial Σ1 , respectively.Note that our methods for detecting convergence require at least 11 iterations, and so convergence will often have occurred sooner than this, as in the two examples.However, convergence could also be substantially slower: the largest number of steps needed before detecting convergence was 196.Including simulated datasets and initial Σ1 where the estimation procedure was stopped due to detecting oscillation, increased the maximum number of steps to 1223.
Oscillation was detected at the 10 −7 threshold (see Section 6.2) for at least one of the three initial Σ1 in 57 datasets.This was reduced to 50 at the 10 −4 threshold, so a few of these 57 datasets are likely cases of very slow convergence.Oscillation, instead of convergence, therefore occurs in around 0.5% of simulated datasets with complete data.

Meta-analyses with missing data
All three initial estimates Σ1 resulted in the same (to eight decimal places) multistep estimator in the vast majority of simulated meta-analyses (10,974 out of 11,000; 99.8%).For the 10,974 'well behaved' simulated datasets, the median number of steps before convergence was detected was 23 (quartiles 20 and 28), 22 (quartiles 19 and 27), and 22 (quartiles 19 and 28) for the first, second, and third initial Σ1 , respectively.Convergence could also be obtained very slowly: the largest number of steps needed before detecting convergence was 327.Including simulated datasets and initial Σ1 where the estimation procedure was stopped due to detecting oscillation, increased the maximum number of steps to 6,983.
Oscillation was detected at the 10 −7 threshold (see Section 6.2) for at least one of the three initial Σ1 in 127 datasets.However, this was reduced to 114 at the 10 −4 threshold, so a few of these 127 datasets are likely very slow converging.Oscillation, instead of convergence therefore occurs in around 1% of simulated datasets in this setting, i.e. this is about twice as common as in the complete data scenario.
We scrutinized the multistep estimator in the 26 meta-analyses where the different initial Σ1 yielded different results.There was in fact convergence to a unique solution in 9 out of these 26 meta-analyses for all three initial estimates but this convergence was very slow, which resulted in slight discrepancies at the eight decimal place.The estimation procedure was oscillating between the same three initial estimates for one dataset.
In the 16 remaining meta-analyses, the multistep estimator using one or two of the initial Σ1 resulted in exactly the same type of oscillation whereas there was the same type of convergence for another initial estimate or estimates.We therefore conclude that it is possible for the limiting behaviour of the multistep estimator to depend on the initial estimate but this is very rare.The analyst needs to make a decision which limiting result is to be preferred, which could be based on either their preferred initial Σ1 or their preference for convergence rather than oscillation.In either case, the resulting limiting form of the multistep estimator could be presented as a Paule-Mandel estimator.Although this evidence for the potential dependence of the limiting behaviour of multistep estimator on the initial Σ1 is a little uncomfortable, this seems to be a rare event.Furthermore, no serious dependence of this type has been uncovered by our investigations, because the initial Σ1 always yielded the same estimates in the limit if convergence was achieved.It therefore seems reasonable to suggest that analysts may tentatively present the limit of their multistep estimators as a multivariate Paule-Mandel estimator.We return to this issue in the discussion.

Results: Statistical properties of the multistep estimator
The second goal of the simulation study was to study the statistical properties of the multistep estimator.We compare the performance of the proposed method to the REML 32 and the existing DL type estimator proposed by Jackson et al. 23 (Section 3.1), by using Σ0 = 0 (Section 4.2).We report the multistep estimator results using Σ0 = 0, because the impact of using different initial estimates is negligible (see Section 6.3, this only has implications for 16 simulated datasets in the missing data scenario).Estimates of the REML estimator were obtained using the R package "mvmeta". 40n the complete data scenario, the simulation outputs of interest are the bias and precision of the estimates of the between-study variance of the first outcome and the between-study covariance.Due to the symmetrical nature of our simulation study, the results for the second outcome are, to within Monte Carlo error, the same as the first in the complete data scenario.However, this symmetry is lost in the missing data scenario so that the estimated between-study variances of both outcomes and the between-study covariance are then of interest.

Meta-analyses with complete data
Table 5 shows the results of the complete data simulations.The first two columns show the true between-study variance and covariance.The next three columns show the corresponding average estimates and their variance (in brackets).The last three columns show the same results for the between-study covariance.We highlight, for each condition, which of the estimators was the least biased (bold font) and second least biased (underlined).All estimators overestimated the true between-study variance when this is less than 0.7.This overestimation was greatest for the smallest true between-study variances, as expected because all three methods constrain estimated variances to be non-negative.Differences between the performance of the estimators are generally small and decrease as the true between-study variance increases.The REML and DL type estimators were the least-biased methods for up to "moderate" true between-study variance (ie, 0.7 or less) but the multistep estimator performed better for larger true between-study variances.The REML-and DL-type estimators were most precise for up to moderate true between-study variance, whereas the REML and multistep estimators were most precise for larger true between-study variances.This lower precision of the DL-type estimator for large between-study variance can be explained because this estimator, in contrast to the REML and multistep estimators, does not take the magnitude of the between-study covariance matrix into account in the weights.
The between-study covariance was generally slightly overestimated by all methods.Differences between the methods were again small, but the REML and DL-type estimators were generally the two least-biased methods.Precision was again the largest for REML and DL type estimators if the true between-study covariance was at most moderate, whereas the REML and multistep estimators were most precise for larger true between-study covariances.
The overall impression is that the performance of the multistep estimator is broadly similar to the established REML-and DL-type estimators, and that it compares most favourably to them when the between-study heterogeneity is considerable.

Meta-analyses with missing data
The table of results of the simulation study with missing data is shown in the web supplementary materials.As expected, the biases of the estimated between-study variances again decreased as the true value increased.As also expected, all three estimators were notably less precise for the between-study variance for the first outcome (with missing data) and the between-study covariance, compared to the complete data scenario.The precision of estimates of the between-study variance for the second outcome were also slightly adversely affected by missing outcome data for the first outcome.
Biases also appeared to be adversely affected by missing data, but to a much lesser extent than precision.The multistep estimator was consistently the most biased estimator of the two between-study variances but its performance when estimating the between-study covariance was much better.This is in contrast to the results in the complete data scenario, which suggests that the relative performance of the three estimators may be context specific.We return to this issue in the discussion.As in the complete data scenario, the overall impression is that the multistep estimator's performance is broadly similar to the established REML-and DL-type estimators.

Recommendations
This simulation study demonstrates that multistep estimators are a fully viable alternative to the existing standard methods.However, there is no evidence that, in general, they perform substantively better.In the supplementary materials, we show the simulation study results for the between-study correlation that indicate that all estimators are downwardly biased, but that REML outperforms the other estimators in this regard.Multistep estimators should be considered to be an alternative, rather than a replacement, estimator of the between-study covariance matrix.
The results from the simulation study indicate that the existing DL type estimator proposed by Jackson et al 23 should be preferred to the use of multistep estimators when the data are expected to be homogeneous, or the between-study heterogeneity is mild.This is because the multistep estimator then appears to result in both greater bias and lower precision.However, these results also indicate that worthwhile gains in precision can be made using the multistep estimator when the data are heterogeneous.This makes intuitive sense because the DL type estimator clearly uses very appropriate study weights when the data are homogeneous but the proposed procedure is able to weight studies more equally when the data are heterogeneous, so that their total covariance matrices S i +  are more similar.We therefore recommend that our proposed approach should replace the existing moments-based approach in application areas where considerable between-study heterogeneity is anticipated.
We also recommend that multistep estimators should be considered as an alternative to the REML estimator in situations where the normality assumptions made by the multivariate random-effects model are especially dubious, for example because many studies are small.This is because inaccurate likelihood-based inference is a serious implication of violated normality assumptions in the univariate setting (Jackson and White, 30 their Table 3) and this can also be anticipated to be the case multivariately.Like the REML estimator, multistep estimators do not require the use of a predetermined weighting scheme, and allow the studies to be weighted more equally when the data are heterogeneous.Hence, they naturally accompany the REML estimator in sensitivity analyses that explore the use of alternative estimators of .More generally, multistep estimators naturally lend themselves to this type of sensitivity analyses because different numbers of steps may be used.
Finally, applying multistep estimators may be especially beneficial in high-dimensional multivariate meta-analyses.Estimation of a small number of steps with the multistep estimator is expected to be substantially faster than the REML estimator in a high-dimensional multivariate meta-analysis.

DISCUSSION
We have extended a variety of recently proposed methods for meta-analysis to facilitate the development of multivariate multistep estimators of the between-study covariance matrix.The multivariate generalized method of moments is in itself a useful development, so that the analyst can adopt a prespecified weighting scheme best suited to their analysis.For example, it might be considered desirable to weight studies more equally than by their within-study precision, whilst also ensuring that large studies receive more weight, to lessen the impact that small studies have on the resulting meta-analysis.The multivariate generalized method of moments also allows the implementation of a sensitivity analysis to the choice of weighting scheme.Multistep estimators are an automatic way to explore sensitivity, because estimates using different numbers of steps provide different possibilities to explore.The simulation study results suggest that multistep estimators are merely an alternative to existing methods, and so may be used as a sensitivity analysis by analysts who already have a preference for DerSimonian and Laird and/or likelihood based estimators.
We have tentatively proposed that the limiting form of the multistep estimator may be presented as a Paule-Mandel estimator.This is because of the relationship between these two estimators in the univariate setting.There are two main reasons for our caution here.Firstly, the simulation study found that it is possible (but rare) for the limiting behaviour of the multivariate multistep estimator to depend on the initial estimate.Secondly, defining a multivariate Paule-Mandel estimator via its relationship with the multistep estimator is not the only approach.An alternative is to define a generalized Q matrix, where the weights depend on the unknown true between study covariance matrix, and solve the estimating equation where this matrix is equated to its expected value.This is the more usual way of defining the univariate Paule-Mandel estimator, 24 but the difficulties include determining the most appropriate definition of the generalized Q matrix (a variety of such matrices, all of which simplify to the generalized Q statistic in the univariate case, are likely to be feasible) and ensuring that the resulting multivariate estimating equation has a unique solution.Another possibility is to make use of the equivalence between the univariate Paule-Mandel and empirical Bayes estimators 41 to motivate multivariate Paule-Mandel estimators.We suspect that a variety of ways of generalizing the univariate Paule-Mandel estimator to the multivariate setting are possible, of which the approach used here is just one.If "Paule-Mandel" is to ultimately be associated with the limiting form of the multistep estimator, then we suggest that it is a Paule-Mandel estimator, rather than the Paule-Mandel estimator.
We have found that oscillation, rather than convergence, is unusual (but not exceptional) in the multivariate setting.This is in contrast to the univariate case, where oscillation was observed in rare cases with large differences between the within-study sampling variances. 13Hence, oscillation of the multivariate multistep estimator is expected to be less prevalent in situations where the between-study covariance matrices are more similar.We have proposed taking the average of the oscillating estimates when this occurs, but other conventions are possible and should be considered in both applications and future methodological work.Although we have found a few simulated examples where the limiting behaviour of the multivariate multistep estimator depends on the initial estimate no very unsatisfactory behaviour has been found, for example where the multistep estimator converges to different estimates.However, we cannot rule out the possibility that datasets, either simulated or real, might uncover this type of behaviour in the future.If such an example or examples should come to light then this may dampen enthusiasm for taking the limit, but two and three-step estimators would still likely be an appealing option.This is because, by requiring only a few iterations, they are both computationally feasible in very high dimensions (unlike likelihood based methods) and take the extent of the between-study variation into account when computing the weighting matrices (unlike the DerSimonian and Laird estimator).Our methods therefore can be used to provide a compromise between the computational and conceptual simplicity of the existing moment based methods and fully likelihood based analyses.Modifications of our approach that attempt to avoid oscillation, perhaps motivated by ideas for avoiding convergence issues in likelihood based analyses, are potentially worth considering.For example, one might consider decreasing the difference between subsequent multistep estimators whenever oscillation occurs, but we leave this for future work.
The simulation study suggests that the performance of the multistep estimator, relative to the existing alternatives, may be context specific.This is because different observations were made regarding the estimation of the between-study covariance matrix.Our simulation study is by no means extensive, and other scenarios have the potential to uncover more strengths and weaknesses of the alternative estimators as in the univariate case. 9For example, Langan et al. 9 conclude that, in the univariate setting, "Paule-Mandel is often approximately unbiased when DerSimonian and Laird is negatively biased.However, results also show that Paule-Mandel has high positive bias when there are large differences in study size."Further work is needed to determine if these pros and cons extend to the multivariate setting.Future simulation studies may also address whether non-normal within-study and also between-study distributions affect estimation of the between-study covariance matrix, because we only generated data under the multivariate random-effects model and this model is only an approximation of data that are encountered in practice.][19][20][21][22] In conclusion, we have developed a new class of estimator for the between-study covariance matrix for multivariate random-effects meta-analysis and meta-regression.Our new estimator has been found to perform satisfactorily in real examples and a simulation study.It is a viable alternative estimator and facilitates sensitivity analysis.Our work connects and extends a variety of existing methods for meta-analysis, and we hope it will stimulate further methodological development and prove useful in applications.

F I G U R E 4
Algorithm box of the implemented multivariate multistep estimator in the simulations. 35 33ta for the meta-analysis by Antczak-Bouckoms et al.33 TA B L E 1Note: SE refers to standard error and correlation is the within-study correlation between probing depth and attachment level.
Results of the simulation study with complete data.The table shows the average and variance (between brackets) of the estimated variance of the first outcome and covariance between outcomes across all iterations.The included methods are the multistep, restricted maximum likelihood (REML), and DerSimonian and Laird (DL) estimators.Values in bold and underlined indicate that a method was the least or second least biased in a particular condition. Note: