### Abstract

- Abstract
- 1. Introduction
- 2. Mixed-effects meta-regression models
- 3. Estimating the model predictive power in meta-analysis
- 4. Previous simulation studies
- 5. Objectives and hypotheses of this study
- 6. An illustrative example
- 7. Simulation study
- 8. Results
- 9. Discussion
- 10. Conclusion
- Acknowledgement
- References

Several methods are available to estimate the total and residual amount of heterogeneity in meta-analysis, leading to different alternatives when estimating the predictive power in mixed-effects meta-regression models using the formula proposed by Raudenbush (1994, 2009). In this paper, a simulation study was conducted to compare the performance of seven estimators of these parameters under various realistic scenarios in psychology and related fields. Our results suggest that the number of studies (*k*) exerts the most important influence on the accuracy of the results, and that precise estimates of the heterogeneity variances and the model predictive power can only be expected with at least 20 and 40 studies, respectively. Increases in the average within-study sample size (*N̄*) also improved the results for all estimators. Some differences among the accuracy of the estimators were observed, especially under adverse (small *k* and *N̄*) conditions, while the results for the different methods tended to converge in more optimal scenarios.

### 1. Introduction

Meta-analysis is a form of research synthesis that allows researchers to quantitatively integrate the results from a set of studies on the same topic (Borenstein, Hedges, Higgins & Rothstein, 2009; Cooper, Hedges & Valentine, 2009). Since the outcomes from the individual studies are often expressed in different measurement units, their results are typically converted into a common metric through a standardized effect size index (such as the standardized mean difference). The main objectives in a meta-analysis are to obtain an overall effect size estimate, to assess the heterogeneity among the individual effect size estimates, and to search for moderators that can account for (at least) part of that heterogeneity (Hedges & Olkin, 1985; Sánchez-Meca & Marín-Martínez, 2010).

The results or effect sizes of the individual studies in a meta-analysis usually exhibit some heterogeneity (e.g., Sidik & Jonkman, 2005b; Thompson & Higgins, 2002). This means that, even when a set of studies analysing the same phenomenon (e.g., the effectiveness of psychological treatments and interventions for a given disorder) is selected, their results are likely to differ to some extent. For that reason, moderator analyses typically constitute a crucial element of a meta-analysis (Lipsey, 2009). In a moderator analysis, the goal is to test the influence of one or more study characteristics (e.g., type and duration of the intervention, severity of the disorder in the sampled patients) on the outcome variable (e.g., efficacy of the intervention, assessed through the comparison between a treatment and a control group). Such analyses can be conducted by fitting linear models to the data, where the moderators constitute the predictor variables and the effect sizes are employed as the criterion variable (Borenstein *et al*., 2009). This leads to the so-called meta-regression models (Thompson & Higgins, 2002), in which both continuous and categorical moderators can be included.

When carrying out a meta-analysis, some statistical model must be assumed for the effect size distribution, and the model choice will have an influence on the validity and generalizability of the results from the meta-analysis. Two kinds of statistical models have been employed for the majority of meta-analytic reviews conducted so far, namely the fixed-effects and random-effects models (Hedges & Vevea, 1998; Schmidt, Oh & Hayes, 2009). Nowadays, most researchers agree that the model choice should be made based on the generalizability intended for the results (National Research Council, 1992). Only random-effects models, which include an additional variance component to model the between-studies heterogeneity, allow for generalization to studies different from the ones included in the meta-analysis, which is usually the goal when carrying out such a review. Thus, random-effects models are a suitable option for most meta-analyses (Hedges & Vevea, 1998; Raudenbush, 1994, 2009).

Under a random-effects model, it is assumed that the study outcomes (e.g., treatment efficacy) will fluctuate as a consequence of two sources of variation: the sampling of the participants for each study; and the differential characteristics of the studies (e.g., different conditions of the sample, treatment application, methodology, or context in each individual study). The magnitude of the latter can be analysed through the estimation of the heterogeneity (or between-studies) variance, τ^{2}, which represents the excess variation among the effects over that expected from sampling error alone (Thompson & Sharp, 1999). In contrast to the sampling variances from each effect size, which quantify the random sampling error, τ^{2} denotes systematic differences due to the influence of characteristics from the individual studies. The identification of some of these characteristics (or moderators) is the main objective of the moderator analyses. Since the moderators are usually included as fixed effects in the model, the addition of a random effect (the effect sizes in the studies) to model the heterogeneity among the studies leads to mixed-effects meta-regression models.

There are several parameters of interest in a meta-regression model. One of these is the model predictive power, denoted by Ρ^{2} (Ρ denotes the capital Greek letter ‘rho’), which can be defined as the proportion of variance among the effect sizes that can be accounted for by the predictors included in the model. Note that only the variance due to differences among the studies, quantified by τ^{2}, can be explained by the predictors usually included in a mixed-effects meta-regression model. An estimate of the Ρ^{2} parameter is usually denoted as an *R*^{2} value. The interpretation of *R*^{2} is identical in ordinary regression and in meta-regression models, in terms of a percentage or proportion of the variability in the outcomes associated with the predictor(s).

When regression models are fitted using ordinary least squares techniques, the *R*^{2} index is computed as the quotient between the sum of squares due to the regression and the total sum of squares, that is, *R*^{2} = SS_{Regression}/SS_{Total} (e.g., Pedhazur & Schmelkin, 1991). However, this strategy is not suitable for meta-regression models because part of the total variability, more specifically the sampling error of an observed effect size given the population effect size in that study, cannot by definition be explained by the moderators included in the model (Aloe, Becker & Pigott, 2010; Konstantopoulos & Hedges, 2009; Rodriguez & Maeda, 2006).^{1} Thus, a different method is typically proposed for obtaining an *R*^{2} index in meta-regression models (Raudenbush, 1994), where the total variability is an estimate of the between-studies variance, τ^{2}, and the variability explained by the predictors in the model is estimated as a part of τ^{2} (see equation (3)). This method will be presented, explained, and illustrated in this paper.

In a meta-regression model, an adequate estimate of the magnitude of its predictive power via the *R*^{2} index is an essential complement to the statistical significance of the model. The *R*^{2} index informs us about the practical significance, or the degree of influence of a set of moderators on the heterogeneity of the effect sizes in a meta-analysis (e.g., explaining around 20% or 30% of the heterogeneity). However, as far as we know, no studies have yet systematically evaluated the performance of the *R*^{2} index under the conditions of a meta-regression model. Therefore, the purpose of the present study was to assess the performance of the method proposed by Raudenbush to compute an *R*^{2} index in meta-analysis, by conducting a Monte Carlo simulation with different conditions usually found in real meta-analyses.

The outline of the present paper is as follows. First, mixed-effects meta-regression models are briefly sketched. Second, various alternatives for computing an *R*^{2} index according to the proposal of Raudenbush (1994) for meta-analysis are considered. After presenting the methods, results from previous simulation studies that pursued part of the objectives of our study are summarized. The performance of the alternative methods considered here is then illustrated by applying them to an example. Next, a simulation study comparing the various estimators is presented and the results obtained are detailed. Finally, the results are discussed and some conclusions are provided, where the degree of accuracy of the different methods for computing an *R*^{2} index as a measure of the explanatory power of a predictor is assessed as a function of the specific conditions of a meta-analysis (e.g., number of studies, sample size distribution of the studies, effect size distribution, and the true percentage of variance accounted for by the predictor).

### 2. Mixed-effects meta-regression models

In a meta-analysis with *k* studies, let **y** denote a *k* × 1 vector of independent effect sizes {*y*_{i}} that represents the results of the studies, and **X** a *k* × (*p* + 1) design matrix of full column rank with *p* predictor variables, representing some differential characteristics in the studies. Since the predictors are included as fixed effects in the model, assuming a random-effects model for the effect sizes leads to a mixed-effects meta-regression model, which can be expressed by the formula (Raudenbush, 1994)

- **y** = **Xβ** + **u** + **e** (1)

where **β** is a (*p* + 1) × 1 vector containing the regression coefficients {β_{j}}, **u** is a *k* × 1 vector of independent between-studies errors {*u*_{i}}, each with distribution *N*(0, τ_{res}^{2}), and **e** is a *k* × 1 vector of independent within-study errors {*e*_{i}}, each with distribution *N*(0, *v*_{i}). While *v*_{i} is the within-study variance (or sampling error) for the *i*th study, τ_{res}^{2} represents the residual heterogeneity (or between-studies) variance, that is, the remaining variability in the true effect sizes not accounted for after adding one or more predictors to the model (Viechtbauer, 2007a).

Note that the mixed-effects model presented in equation (1) is actually an extension of the random-effects model and that the latter can be formulated if **X** is defined as a *k* × 1 vector of ones. In this case we would have a model without predictors, where **β** is a scalar containing the hypermean (mean of the population effects) and **u** is normally distributed with mean 0 and variance τ^{2}, the latter denoting the total heterogeneity in the true effects. If, moreover, the error term **u** were suppressed from equation (1), then the model would become a fixed-effect model (which is equivalent to setting τ^{2} = 0 or assuming that the sampling error is the only source of variability).

The regression coefficients can be estimated using the weighted least squares formula

- **β̂** = (**X**′**WX**)^{−1}**X**′**Wy** (2)

where **W** is a *k* × *k* diagonal matrix with the inverse variances of the effect sizes as elements, that is, *w*_{i} = 1/(*v*_{i} + τ_{res}^{2}) for mixed-effects models. Note that an adequate estimate of both the within-study variance for each study, *v*_{i}, and the residual between-studies variance, τ_{res}^{2}, is needed for the estimation of the regression coefficients. For commonly used effect size metrics (e.g., standardized mean differences, correlation coefficients, odds ratios, risk ratios), approximately unbiased estimators are available for *v*_{i}, and the usual practice in meta-analysis is to substitute those estimates and treat them as known values (e.g., Aloe *et al*., 2010; Hedges & Pigott, 2004; Knapp, Biggerstaff & Hartung, 2006; Konstantopoulos & Hedges, 2009; Viechtbauer, 2007b; for a different approach, see Malzahn, Böhning & Holling, 2000). A more crucial issue is the choice of estimator for τ_{res}^{2}, and at least seven different estimators have been described in the literature, as detailed in the next section.
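To make equation (2) concrete, the weighted least squares fit for a model with a single moderator can be sketched in a few lines. This is an illustrative helper under the assumptions just stated (weights *w*_{i} = 1/(*v*_{i} + τ_{res}^{2}) treated as known), not the metafor implementation:

```python
def wls_meta_regression(y, v, x, tau2_res):
    """Weighted least squares fit of a mixed-effects meta-regression with
    one moderator, following equation (2): for X = [1, x], solve the 2x2
    normal equations with weights w_i = 1/(v_i + tau2_res)."""
    w = [1.0 / (vi + tau2_res) for vi in v]
    # Elements of X'WX and X'Wy for the intercept-plus-slope design.
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swx2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swx2 - swx ** 2
    b0 = (swx2 * swy - swx * swxy) / det  # intercept estimate
    b1 = (sw * swxy - swx * swy) / det    # slope estimate
    return b0, b1
```

With equal variances and τ_{res}^{2} = 0, the weights are constant and the fit reduces to ordinary least squares, which provides a quick sanity check.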

### 3. Estimating the model predictive power in meta-analysis

A proposal to compute an *R*^{2} index in meta-analysis was presented by Raudenbush (1994, 2009). It is based on the re-estimation of the amount of heterogeneity (i.e., the between-studies variance) after adding one or more predictors to the model, resulting in the residual heterogeneity, that is, the heterogeneity that cannot be explained by the predictors. The rationale for this index is that the extent to which the moderators can account for the heterogeneity in the true effects will be reflected in the degree by which the residual heterogeneity, τ_{res}^{2}, is smaller than the total amount of heterogeneity, τ^{2}, as a result of including explanatory variables in the model. In practice, the parameter values are replaced by their estimates, τ̂^{2} and τ̂_{res}^{2}, allowing for the computation of the *R*^{2} index as (Borenstein *et al*., 2009)

- *R*^{2} = (τ̂^{2} − τ̂_{res}^{2}) / τ̂^{2} (3)

denoting the proportion of total heterogeneity accounted for by the moderator(s) included in the model.

Several alternatives have been proposed in the literature to estimate the total heterogeneity variance, τ^{2}, in random-effects models (DerSimonian & Laird, 1986; Morris, 1983; Sánchez-Meca & Marín-Martínez, 2008; Sidik & Jonkman, 2005b, 2007; Viechtbauer, 2005). Most of these estimators have also been extended to mixed-effects models, allowing for the estimation of the residual heterogeneity variance, τ_{res}^{2} (Raudenbush, 1994, 2009; Sidik & Jonkman, 2005a,b). It is important to remark here that, for both parameters, no estimator is expected to provide accurate results unless the number of studies is large enough (e.g., Borenstein *et al*., 2009; Schulze, 2004).

Seven different estimators of τ^{2} and τ_{res}^{2} can be computed with the formulae gathered in Table 1. The metafor package for R (Viechtbauer, 2010) directly computes these seven estimators from the values of the effect sizes and their corresponding within-study variances in the studies of the meta-analysis. The Hedges (HE), Hunter–Schmidt (HS), DerSimonian–Laird (DL), and Sidik–Jonkman (SJ) methods are non-iterative estimators, while the maximum likelihood (ML), restricted maximum likelihood (REML), and empirical Bayes (EB) methods require iterative computations. All estimators presented in Table 1 can be succinctly expressed after defining the matrix
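As a concrete instance of the non-iterative group, the following sketch implements the DerSimonian–Laird estimator for the random-effects case (*p* = 0) in its standard method-of-moments form from the general literature; it is an illustration, not a transcription of Table 1:

```python
def dl_tau2(y, v):
    """DerSimonian-Laird estimate of the total heterogeneity variance
    tau^2 in a random-effects model (no predictors), truncated at zero:
    tau2 = max(0, (Q - (k - 1)) / c), with Q the usual heterogeneity
    statistic computed with fixed-effects weights w_i = 1/v_i."""
    w = [1.0 / vi for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)  # weighted mean
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))  # Q statistic
    k = len(y)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (k - 1)) / c)
```

When the observed effects are identical, Q = 0 and the negative moment estimate is truncated to zero, mirroring the truncation practice discussed below.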

- **P** = **W** − **WX**(**X**′**WX**)^{−1}**X**′**W** (4)

where **W** is a diagonal weighting matrix whose elements, *w*_{i}, can change from one estimator to another. For the iterative estimators, one starts with an initial estimate of τ̂_{res}^{2} (e.g., as obtained with one of the non-iterative estimators) and then iterates through the equation

- τ̂_{res(new)}^{2} = τ̂_{res(old)}^{2} + Δ (5)

until convergence, where Δ is given in Table 1 for the ML, REML, and EB estimators. Although all the equations gathered in Table 1 include predictors, they also apply to the random-effects model without predictors by setting *p* = 0 and defining **X** as a *k* × 1 vector of ones. In a model without predictors, the equations in Table 1 estimate the total heterogeneity variance, τ^{2}, while the inclusion of predictors in the same equations leads to the estimation of the residual heterogeneity variance, τ_{res}^{2}.

A value of zero for τ̂_{res}^{2} suggests that all the heterogeneity among the effect sizes is accounted for by the predictors included in the model (Viechtbauer, 2007a). Also, due to random sampling error, the estimators in Table 1 (with the exception of the SJ estimator) can provide a negative estimate, which is a value outside the parameter space for a variance component. The usual practice is to truncate negative values to zero. When an iterative estimator is employed, a simple strategy to avoid negative estimates is the use of step-halving (Jennrich & Sampson, 1976), which implies repeatedly multiplying the adjustment value, Δ, by 1/2 (first by 1/2, then by 1/4, then by 1/8, and so on) until it becomes small enough for the resulting estimate to stay non-negative.
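The step-halving safeguard amounts to the following loop, a minimal sketch assuming the update of equation (5) has already produced a candidate adjustment Δ (the iteration cap is an added numerical safeguard, not part of the original scheme):

```python
def step_halve_update(tau2_old, delta, max_halvings=60):
    """One iteration of equation (5), tau2_new = tau2_old + delta, with
    step-halving (Jennrich & Sampson, 1976): delta is repeatedly halved
    until the updated estimate is non-negative."""
    for _ in range(max_halvings):
        if tau2_old + delta >= 0.0:
            break
        delta *= 0.5
    return max(0.0, tau2_old + delta)
```

For example, starting from 0.1 with an adjustment of −0.5, the adjustment is halved three times (to −0.0625) before the update stays non-negative.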

Both (total and residual) heterogeneity variance estimates employed in equation (3) can be obtained using any of the methods presented in Table 1. As a consequence, there are at least seven different methods for computing the *R*^{2} index using this proposal. Aloe *et al*. (2010) recommended using the same method for both estimates. Indeed, it does not seem sensible to mix two estimates obtained using methods with different theoretical assumptions and, furthermore, only the estimates obtained with the same method are readily comparable.

It is important to note that, due to sampling error, the formula proposed by Raudenbush may require or lead to truncation in several situations. First, τ̂_{res}^{2} can be larger than τ̂^{2} for a given meta-analytic data set, especially with small samples (a small number of studies, small sample sizes, or both), leading to a negative *R*^{2} value that is typically truncated to zero in practice (indicating that all of the heterogeneity among the effect sizes remains unaccounted for after including the moderator(s) in the model). Second, a negative value of τ̂^{2} truncated to zero leads to division by zero in equation (3), in which case *R*^{2} is undefined. It is then common practice to set (or truncate) the value of *R*^{2} to 0 (indicating that none of the heterogeneity is accounted for by the moderators, given that there appeared to be none to begin with). Finally, with a positive value of τ̂^{2}, a negative value of τ̂_{res}^{2} truncated to zero will lead to an *R*^{2} value of 1 (indicating that all of the heterogeneity is accounted for).
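These truncation rules can be collected into one small helper (an illustrative sketch; the variance estimates are assumed to have already been truncated at zero):

```python
def r_squared(tau2_hat, tau2_res_hat):
    """R^2 of equation (3) with the truncation rules from the text:
    returns 0 when tau2_hat is zero (nothing to explain) or when
    tau2_res_hat exceeds tau2_hat, and 1 when tau2_res_hat is zero
    while tau2_hat is positive."""
    if tau2_hat <= 0.0:
        return 0.0  # total heterogeneity estimated as zero: R^2 set to 0
    r2 = (tau2_hat - tau2_res_hat) / tau2_hat
    return min(max(r2, 0.0), 1.0)  # clamp to the [0, 1] parameter space
```

For instance, estimates of 0.08 and 0.06 yield an *R*^{2} of about 0.25, while a residual estimate larger than the total one yields 0.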

Since an estimate of the heterogeneity variance is included in both the random- and mixed-effects model weights (cf. equation (2)), the accuracy of these estimates might affect the results of other statistical analyses, such as the computation of an overall effect size estimate and its confidence interval in a random-effects model, or the estimation and testing of the model coefficients in a mixed-effects meta-regression model. However, obtaining accurate estimates of τ^{2} and τ_{res}^{2} seems even more crucial for the assessment of the predictive power in meta-regression models, since the *R*^{2} index in equation (3) requires estimates of both the total and the residual amount of heterogeneity.

### 4. Previous simulation studies

Several simulation studies have already been conducted with the aim of comparing the accuracy of various estimators of the heterogeneity variance in meta-analysis. Some of these studies employed effect size indices for dichotomous measures (e.g., Malzahn *et al*., 2000; Sidik & Jonkman, 2005b, 2007), while others considered indices for continuous variables (e.g., Van den Noortgate & Onghena, 2003; Viechtbauer, 2005).

In general, a positive bias has been found in the SJ estimator for small to medium parameter values (Sidik & Jonkman, 2005b, 2007), while a negative bias was reported for the HS and ML estimators, as well as for the DL method when estimating large parameter values (Malzahn *et al*., 2000; Viechtbauer, 2005). The HE method was found to perform appropriately in terms of bias, although it was less efficient than the HS, DL, ML, and REML estimators (Viechtbauer, 2005). Finally, good performance was observed for both the REML and EB estimators when considering bias and efficiency criteria jointly (Sidik & Jonkman, 2007; Van den Noortgate & Onghena, 2003; Viechtbauer, 2005).

All of these simulation studies focused on random-effects models. Therefore, it is not certain to what extent these results would also carry over to mixed-effects meta-regression models. Moreover, these studies do not indicate whether one of the various estimators of τ^{2} and τ_{res}^{2} would be preferable when computing the *R*^{2} index given by equation (3).

### 5. Objectives and hypotheses of this study

In the present study, all seven heterogeneity variance estimators presented (i.e., the HE, HS, DL, SJ, ML, REML, and EB estimators) were considered and applied to simulated meta-analyses where the standardized mean difference was the effect size index. This simulation compared the accuracy of the methods under different scenarios for the estimation of the total and residual heterogeneity variances as well as of the model predictive power, as defined by Raudenbush (1994).

A first objective was to check whether the patterns reported in previous studies for the heterogeneity variance estimators under random-effects models also hold for mixed-effects models with one predictor. The second objective was to assess the performance of Raudenbush's proposal for estimating the model predictive power in meta-analysis when computing *R*^{2} with the various estimators of τ^{2} and τ_{res}^{2} described earlier.

Regarding our hypotheses, we expected to find results similar to those reported in previous simulation studies for the different estimators of the total heterogeneity variance under random-effects models. In particular, we expected the HS and ML estimators to show a negative bias and the DL method to provide negatively biased estimates for large parameter values. The SJ estimator was expected to show a large positive bias for small to medium parameter values, while the HE method was expected to provide essentially unbiased estimates, although less efficiently than the remaining methods under comparison. According to our hypotheses, the REML and EB estimators were expected to provide the best performance, as found in previous simulation studies. The same trends observed for the different estimators under random-effects models were also expected to be found when estimating the residual heterogeneity variance under mixed-effects meta-regression models with one moderator. Finally, it was expected that the REML and EB estimators would also provide the best performance for the estimation of the predictive power in mixed-effects meta-regression models, computed with equation (3). We also expected that an increase in the average sample size and (especially) the number of studies would lead to more precise results for all estimators.

### 6. An illustrative example

Else-Quest, Hyde and Linn (2010) published a meta-analysis integrating results from the Programme for International Student Assessment (PISA) in different countries in 2003. This report evaluated 15-year-old students' performance in several subjects. The authors focused on mathematics and, since they were interested in gender differences, effect sizes were defined as standardized mean differences between the marks achieved by boys and girls (with positive values indicating better performance for boys).

One of the coded characteristics for each country was the share of parliamentary seats held by women (given as a proportion), used as a moderator in this example. Twenty countries from different parts of the world were selected to illustrate the methods described earlier. Table 2 shows the effect size, *y*_{i}, sampling variance, *v*_{i}, and the moderator value, *Parl*_{i}, for each of the 20 countries.

Table 2. Data from the meta-analysis published by Else-Quest *et al*. (2010)

| Country | *y*_{i} | *v*_{i} | *Parl*_{i} | Country | *y*_{i} | *v*_{i} | *Parl*_{i} |
|---|---|---|---|---|---|---|---|
| Australia | 0.06 | 0.0003 | 0.27 | Mexico | 0.13 | 0.0001 | 0.16 |
| Belgium | 0.07 | 0.0005 | 0.25 | Netherlands | 0.06 | 0.0010 | 0.33 |
| Brazil | 0.16 | 0.0009 | 0.09 | Poland | 0.06 | 0.0009 | 0.21 |
| Canada | 0.13 | 0.0002 | 0.24 | South Korea | 0.25 | 0.0008 | 0.06 |
| France | 0.09 | 0.0009 | 0.12 | Spain | 0.10 | 0.0004 | 0.27 |
| Germany | 0.09 | 0.0009 | 0.31 | Sweden | 0.07 | 0.0009 | 0.45 |
| Greece | 0.21 | 0.0009 | 0.09 | Thailand | −0.05 | 0.0008 | 0.10 |
| Iceland | −0.17 | 0.0012 | 0.35 | Tunisia | 0.15 | 0.0008 | 0.12 |
| Italy | 0.19 | 0.0003 | 0.10 | Turkey | 0.14 | 0.0008 | 0.04 |
| Japan | 0.08 | 0.0009 | 0.10 | USA | 0.07 | 0.0007 | 0.14 |

All seven variance estimators compared in this study were employed to estimate the total heterogeneity variance in a random-effects model, as well as the slope, the residual heterogeneity variance, and the proportion of variance accounted for by the moderator in a mixed-effects meta-regression model. Results are presented in Table 3.

Table 3. Estimates in random- and mixed-effects models using data from Else-Quest *et al*. (2010)

| Method | τ̂^{2} | β̂_{1} | τ̂_{res}^{2} | *R*^{2} |
|---|---|---|---|---|
| HE | 0.0077 | −0.3870 | 0.0061 | .2120 |
| HS | 0.0052 | −0.3849 | 0.0046 | .1207 |
| DL | 0.0058 | −0.3861 | 0.0054 | .0691 |
| SJ | 0.0076 | −0.3870 | 0.0061 | .1891 |
| ML | 0.0069 | −0.3858 | 0.0051 | .2544 |
| REML | 0.0073 | −0.3867 | 0.0058 | .2060 |
| EB | 0.0075 | −0.3868 | 0.0059 | .2093 |

As the slope estimates show, a negative relationship was found with all methods, indicating that a higher proportion of women in parliament was associated with a smaller advantage for boys in the mathematics test. Regarding the total heterogeneity variance, the lowest estimates were obtained with the HS and DL methods (0.0052 and 0.0058, respectively), while the highest estimates were provided by the HE, SJ, and EB methods (0.0077, 0.0076, and 0.0075, respectively). The residual heterogeneity variance estimates also showed some variability, with values ranging from 0.0046 (HS estimator) to 0.0061 (HE and SJ estimators). These differences led to notable variation among the estimates of the model predictive power depending on the estimator used: the *R*^{2} values ranged from 6.9% of heterogeneity accounted for (DL estimator) to 25.4% (ML estimator).

### 7. Simulation study

A simulation study was programmed in R using the metafor (Viechtbauer, 2010) package. Meta-analyses of *k* studies were generated, obtaining the individual scores for each study from two normal populations (see Marín-Martínez & Sánchez-Meca, 2010) and using the standardized mean difference as the effect size index (Marín-Martínez & Sánchez-Meca, 2010; equation (2)).

For each meta-analysis, **θ** and **x** were defined as *k* × 1 vectors containing the parameter effects and the moderator values, respectively. The predictor **x** was generated from a standard normal distribution. The **θ** values were obtained from the expression **θ** = β_{0} + β_{1}**x** + **u**, where β_{0} was set to 0.5, which can be regarded as an effect of medium size in some psychological areas (Cohen, 1988); the slope β_{1} was set as described below, and **u** is an error term with distribution *N*(0, τ_{res}^{2}). Note that if the predictor is dropped from the model, the error term **u** will have distribution *N*(0, τ^{2}).

The total heterogeneity variance, τ^{2}, and the model predictive power, Ρ^{2}, were manipulated in the simulations. The former was set to values representative of no, low, medium, or large amounts of heterogeneity in psychology and related fields (0, 0.08, 0.16, and 0.32, respectively), similar to the values employed in previous simulation studies (e.g., Knapp & Hartung, 2003; Marín-Martínez & Sánchez-Meca, 2010; Schulze, 2004). For Ρ^{2}, we used values of 0%, 25%, 50%, or 75% of heterogeneity accounted for, with the aim of reflecting realistic conditions (Thompson & Higgins, 2002). After setting both parameter values, we then assigned a value to β_{1} by means of the expression β_{1} = √(Ρ^{2}τ^{2}), since with a standard normal predictor the heterogeneity accounted for equals β_{1}^{2}. Table 4 gathers the different values considered for these parameters, as well as the resulting values for β_{1}^{2} and for the residual heterogeneity variance parameter, τ_{res}^{2}, which we computed as τ_{res}^{2} = τ^{2} − β_{1}^{2}.^{2}

Table 4. Parameter values considered in this simulation for τ^{2} and Ρ^{2} (and the resulting values for β_{1}^{2} and τ_{res}^{2})

| τ^{2} | 0 | 0.08 | 0.08 | 0.08 | 0.08 | 0.16 | 0.16 | 0.16 | 0.16 | 0.32 | 0.32 | 0.32 | 0.32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ρ^{2} | 0 | 0 | 0.25 | 0.50 | 0.75 | 0 | 0.25 | 0.50 | 0.75 | 0 | 0.25 | 0.50 | 0.75 |
| β_{1}^{2} | 0 | 0 | 0.02 | 0.04 | 0.06 | 0 | 0.04 | 0.08 | 0.12 | 0 | 0.08 | 0.16 | 0.24 |
| τ_{res}^{2} | 0 | 0.08 | 0.06 | 0.04 | 0.02 | 0.16 | 0.12 | 0.08 | 0.04 | 0.32 | 0.24 | 0.16 | 0.08 |
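The mapping between (τ^{2}, Ρ^{2}) and (β_{1}, τ_{res}^{2}) used in Table 4 can be checked numerically; this sketch assumes, as in the text, a standard normal predictor, so that the heterogeneity explained by the moderator equals β_{1}^{2}:

```python
import math

def design_values(tau2, rho2):
    """Given the total heterogeneity tau^2 and the predictive power
    Rho^2, return (beta1, beta1_squared, tau2_res), with
    beta1 = sqrt(rho2 * tau2) and tau2_res = tau2 - beta1^2."""
    beta1_sq = rho2 * tau2
    return math.sqrt(beta1_sq), beta1_sq, tau2 - beta1_sq
```

For example, τ^{2} = 0.08 and Ρ^{2} = 0.25 give β_{1}^{2} = 0.02 and τ_{res}^{2} = 0.06, matching the corresponding column of Table 4.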

Other factors manipulated in this simulation were the number of studies in each meta-analysis (*k* = 5, 10, 20, 40, and 80) and the average sample size of the *k* studies (, 50, 100, 150, and 200). Note that, for the *i*th study, *N*_{i} = *n*_{iE} + *n*_{iC}, with *n*_{iE} = *n*_{iC}. Vectors of individual sample sizes were generated with a skewness of +1.546, as reported by Sánchez-Meca and Marín-Martínez (1998, p. 317) in a review of meta-analytic syntheses in psychology. A total of 13 × 5 × 5 = 325 conditions were examined. For each condition, 10,000 meta-analyses were simulated, and τ̂^{2}, τ̂_{res}^{2}, and *R*^{2} were computed with the seven alternatives presented above for each simulated data set.
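The generation steps described above can be sketched as follows. This is an intentionally simplified illustration, not the study's actual code: it assumes equal group sizes, no sample-size skew, unit within-group variances, a large-sample approximation for the sampling variance of *d*, and hypothetical parameter defaults:

```python
import math
import random

def simulate_meta(k=20, n=50, beta0=0.5, beta1=math.sqrt(0.02),
                  tau2_res=0.06, seed=12345):
    """Generate one simulated meta-analysis: true effects follow
    theta_i = beta0 + beta1 * x_i + u_i, and each study's standardized
    mean difference d_i is computed from two simulated groups of n/2
    participants each. Returns a list of (d_i, v_i, x_i) triples."""
    rng = random.Random(seed)
    studies = []
    for _ in range(k):
        x = rng.gauss(0.0, 1.0)  # moderator value
        theta = beta0 + beta1 * x + rng.gauss(0.0, math.sqrt(tau2_res))
        n2 = n // 2  # per-group sample size
        grp_e = [rng.gauss(theta, 1.0) for _ in range(n2)]  # experimental
        grp_c = [rng.gauss(0.0, 1.0) for _ in range(n2)]    # control
        me, mc = sum(grp_e) / n2, sum(grp_c) / n2
        var_e = sum((s - me) ** 2 for s in grp_e) / (n2 - 1)
        var_c = sum((s - mc) ** 2 for s in grp_c) / (n2 - 1)
        s_pool = math.sqrt(((n2 - 1) * var_e + (n2 - 1) * var_c) / (n - 2))
        d = (me - mc) / s_pool  # standardized mean difference
        v = n / (n2 * n2) + d * d / (2 * n)  # large-sample variance approx.
        studies.append((d, v, x))
    return studies
```

The resulting triples are exactly the inputs needed by the estimation formulae discussed earlier (effect size, sampling variance, and moderator value per study).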

Moreover, the MSE of each estimator was estimated as

- MSE(θ̂) = Σ_{j=1}^{10,000} (θ̂_{j} − θ)^{2} / 10,000, (8)

where θ denotes the parameter of interest (τ^{2}, τ_{res}^{2}, or Ρ^{2}) and θ̂_{j} the corresponding estimate in the *j*th simulated meta-analysis.

Finally, as described earlier, the computation of the *R*^{2} value may require truncation in various cases. When τ^{2} and τ_{res}^{2} are both actually positive (in which case 0 < Ρ^{2} < 1), a large rate of truncated *R*^{2} values would reflect undesirable performance of equation (3). Therefore, the proportion of *R*^{2} values truncated to 0 or 1 was also examined for the different estimators across the simulated scenarios.
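The truncation-rate criterion is simple to state as code (a trivial illustrative helper; it assumes the *R*^{2} replicates have already been computed with the truncation rules of Section 3):

```python
def truncation_rates(r2_values):
    """Proportion of R^2 estimates truncated to 0 and to 1 across the
    simulated replicates of one condition."""
    n = len(r2_values)
    at_zero = sum(1 for r in r2_values if r == 0.0) / n
    at_one = sum(1 for r in r2_values if r == 1.0) / n
    return at_zero, at_one
```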

### 9. Discussion

In this study, the performance of seven methods for estimating the total and residual heterogeneity variances, as well as the model predictive power, was assessed under a variety of realistic scenarios in applied research. The estimators compared here showed differing performance, especially under adverse and intermediate conditions, while all methods provided similar and accurate estimates of the parameters of interest under the most favourable conditions (e.g., a large number of studies and a large number of participants per study).

Regarding the results for the total heterogeneity variance, the patterns found in this simulation are comparable to the ones reported by Viechtbauer (2005). The DL, REML, and EB estimators performed reasonably well in terms of bias and efficiency, although the DL method yielded negatively biased estimates for large parameter values, as was found in previous simulations (Malzahn *et al*., 2000; Sidik & Jonkman, 2005b, 2007; Viechtbauer, 2005). The HE estimator showed essentially unbiased results (the slight positive bias observed in Table 5 can be regarded as a consequence of truncating the negative estimates to zero) but large MSE values, while the HS and ML methods performed very efficiently but with a negative bias. Finally, the SJ method showed a large positive bias for small parameter values, as has been previously described (Sidik & Jonkman, 2005b), and the largest MSE values. The performance of the various estimators remained very similar after the inclusion of a moderator.

Regarding the estimation of the predictive power in meta-regression models with one predictor, no estimator performed accurately with fewer than 40 studies. Again, the HS, ML, and SJ estimators yielded the most biased estimates. The remaining estimators performed more precisely, although their estimates still showed wide variation even with a moderate to large *k*, including values truncated to zero and one, as shown in Table 7. Despite its large MSE when estimating τ^{2} and τ^{2}_{res}, the SJ estimator showed surprisingly efficient performance for estimating P^{2}, while the HS and ML methods now provided the largest MSE values.

Out of the different factors manipulated in this simulation, our results suggest that the number of studies exerts an important influence on the accuracy of the results, and that precise estimates of the heterogeneity variances and the model predictive power can only be expected with at least 20 and 40 studies, respectively. An increase in the average sample size also improved the results for all estimators. The critical influence of *k* on the accuracy of the heterogeneity variance estimators has already been discussed by several authors both in the context of random-effects models (e.g., Borenstein *et al*., 2009; Schulze, 2004) and mixed-effects models (Thompson & Higgins, 2002). The fact that results were more accurate as *k* and N̄ increased is in agreement with large-sample theory, which underlies the statistical models and methods in meta-analysis (Hedges, 2009). Moreover, as shown in Figure 2 and Table 8, the P^{2} estimators performed more efficiently as the total heterogeneity variance increased. An explanation of this fact is that a small τ^{2} parameter value will more often lead to negative estimates requiring truncation, which in turn leads to truncated *R*^{2} values.

### 10. Conclusion


When a meta-analysis is carried out, some variability is usually found among the effect sizes from the individual studies. The part of that variability due to systematic differences among studies can be quantified by estimating the heterogeneity (or between-studies) variance, τ^{2}. Moreover, if the results from the studies are not homogeneous, the meta-analyst may be interested in identifying one or more study characteristics that can explain part of the variability among the results. This goal can be addressed through meta-regression analyses, which are typically conducted under a mixed-effects model. Two parameters of interest in a mixed-effects meta-regression model are the residual heterogeneity variance after including one or more moderators, τ^{2}_{res}, and the predictive power of the moderator(s) included in the model, P^{2}.

The results obtained in this simulation study suggest that about 40 studies are required to obtain accurate estimates of P^{2} in mixed-effects meta-regression models, so a cautious interpretation of *R*^{2} values is advised for meta-regression models fitted with fewer studies (Thompson, 1994). Among the estimators compared here, the REML, DL, and EB methods showed the most accurate results across the different scenarios and criteria considered. Although the present study focused on standardized mean differences, our findings are likely to generalize to meta-analyses with other effect size measures that are (at least approximately) normally distributed. However, conclusions from this simulation are restricted to the scenarios considered here, so further simulation studies are needed to cover conditions beyond those examined in the present study.