Volume 11, Issue 3
RESEARCH ARTICLE
Open Access

Methods for estimating between‐study variance and overall effect in meta‐analysis of odds ratios

Ilyas Bakbergenuly

Corresponding Author

E-mail address: i.bakbergenuly@uea.ac.uk

School of Computing Sciences, University of East Anglia, Norwich, UK

Correspondence

Ilyas Bakbergenuly, School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK.

Email: i.bakbergenuly@uea.ac.uk

David C. Hoaglin

Population and Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, Massachusetts, USA

Elena Kulinskaya

School of Computing Sciences, University of East Anglia, Norwich, UK

First published: 29 February 2020

Funding information: Economic and Social Research Council, Grant/Award Number: ES/L011859/1

Abstract

In random‐effects meta‐analysis the between‐study variance ($\tau^2$) has a key role in assessing heterogeneity of study‐level estimates and combining them to estimate an overall effect. For odds ratios the most common methods suffer from bias in estimating $\tau^2$ and the overall effect and produce confidence intervals with below‐nominal coverage. An improved approximation to the moments of Cochran's Q statistic, suggested by Kulinskaya and Dollinger (KD), yields new point and interval estimators of $\tau^2$ and of the overall log‐odds‐ratio. Another, simpler approach (SSW) uses weights based only on study‐level sample sizes to estimate the overall effect. In extensive simulations we compare our proposed estimators with established point and interval estimators for $\tau^2$ and point and interval estimators for the overall log‐odds‐ratio (including the Hartung‐Knapp‐Sidik‐Jonkman interval). Additional simulations included three estimators based on generalized linear mixed models and the Mantel‐Haenszel fixed‐effect estimator. Results of our simulations show that no single point estimator of $\tau^2$ can be recommended exclusively, but Mandel‐Paule and KD provide better choices for small and large numbers of studies, respectively. The KD estimator provides reliable coverage of $\tau^2$. Inverse‐variance‐weighted estimators of the overall effect are substantially biased, as are the Mantel‐Haenszel odds ratio and the estimators from the generalized linear mixed models. The SSW estimator of the overall effect and a related confidence interval provide reliable point and interval estimation of the overall log‐odds‐ratio.

Highlights

What is already known?

In combining estimates from studies that had a binary individual‐level outcome, the most common methods of meta‐analysis use a weighted average of the studies' odds ratios (on the logarithmic scale), under a random‐effects model; but their required estimators of the between‐study or heterogeneity variance suffer from bias and below‐nominal coverage, and produce bias and undercoverage in estimates of the overall log‐odds‐ratio.

What is new?

Our extensive simulations confirm that the usual methods of meta‐analysis produce biased estimates of the overall effect and confidence intervals whose coverage is too low. Estimates of heterogeneity variance have similar shortcomings. Small sample sizes are rather problematic, and meta‐analyses that involve numerous small studies are especially challenging.
  • For estimating between‐study variance, a new method (KD), based on an improved approximation to the null distribution of Cochran's Q, provides reliable interval estimates. The KD point estimator is inferior to another estimator (Mandel‐Paule) when the number of studies is small, but is better otherwise.
  • A new, pragmatic point estimator of the overall effect (SSW) uses a weighted average in which a study's weight is proportional to its effective sample size. It has less bias than the popular inverse‐variance‐weighted estimators and three estimators obtained from generalized linear mixed models.
  • The best interval estimator of the overall log‐odds‐ratio is centered on SSW and bases its endpoints on a t distribution and the KD point estimator of the between‐study variance.

Potential impact for RSM readers outside the authors' field

  • The methods in common use for random‐effects meta‐analysis of odds ratios can advantageously be replaced by the new estimators, which have better performance.
  • Meta‐analysis software should include the new estimators.

1 INTRODUCTION

Meta‐analysis is broadly used for combining estimates of a measure of effect from a set of studies in order to estimate an overall (pooled) effect. In studies with binary individual‐level outcomes, the most common measure of treatment effect is the odds ratio.1 Our primary interest lies in meta‐analysis of odds ratios. The actual measure of effect is the logarithm of the odds ratio (LOR), and the summary data are the numbers of subjects and the numbers of events in the two arms of each study, from which the usual analysis calculates the logarithm of each study's sample odds ratio and the large‐sample estimate of its variance. A fixed‐effect (or common‐effect) model (FEM) assumes that the studies share a single true effect. It is usually more likely that the true study‐level effects differ. A random‐effects model (REM) describes that variation via a distribution, whose mean serves as the overall effect and whose variance summarizes the heterogeneity of the true study‐level effects. Higgins et al2 point out, “This variance explicitly describes the extent of the heterogeneity and has a crucial role in assessing the degree of consistency of effects across studies, which is an element of random‐effects meta‐analysis that often receives too little attention.”

We focus mainly on two‐stage approaches, which first calculate the studies' log‐odds‐ratios (and their estimated variances) and then combine those estimates; but we include, for limited comparisons, some one‐stage approaches, which use the studies' numbers of events and subjects (eg, in a binomial likelihood) and avoid calculating the sample log‐odds‐ratios. To estimate the overall effect, the most common methods use a weighted average of the study‐level estimates in which the weight for a study's estimate is the reciprocal of an estimate of its variance. Under the REM such inverse‐variance weights combine the variance of the study‐level estimate and the variance of the distribution of true study‐level effects ($\tau^2$). Thus, they require an estimate of the between‐study variance. Most of the common inverse‐variance‐weighted methods estimate $\tau^2$ by using the theoretical moments of Cochran's Q or its generalization. However, Kulinskaya and Dollinger3 and van Aert et al4 have shown that, for log‐odds‐ratio, the distributions assumed for those theoretical moments are incorrect. As a result, the moment‐based point estimators of $\tau^2$ are biased, and the interval estimators have coverage below the intended 95% level. Also, in combination with inverse‐variance weighting, the departures from assumptions lead to biased point estimation of the overall effect and undercoverage of the associated confidence intervals (CIs). Therefore, for estimating between‐study variance, we propose new point and interval estimators based on an improved approximation to the moments of Cochran's Q statistic, suggested by Kulinskaya and Dollinger.3 For the overall effect, we propose a weighted average in which the weights depend only on the effective sample sizes.

We use simulation to compare bias of our proposed point estimator of $\tau^2$ with that of three previous moment‐based estimators (the popular estimators of DerSimonian and Laird5 and Mandel and Paule6 and the less‐familiar estimator of Jackson7) and the restricted‐maximum‐likelihood estimator, and also to compare coverage of our proposed interval estimator of $\tau^2$ with that of four previous estimators (profile likelihood,8 the Q‐profile [QP] interval,9 and the generalized QP intervals of Biggerstaff and Jackson10 and Jackson7). We also compare bias and coverage of our proposed point estimator of the overall effect and a companion interval estimator with those of the related inverse‐variance‐based estimators. We extend the comparisons by including point estimators of $\tau^2$ and point and interval estimators of the overall effect obtained from logistic linear mixed‐effects models, and also the Mantel‐Haenszel estimator of the odds ratio.

Section 2 reviews estimation of study‐level log‐odds‐ratio and Section 3 briefly reviews REMs. Section 4 discusses previous point and interval estimators of between‐study variance and introduces the proposed Kulinskaya‐Dollinger method. Section 5 describes the corresponding point and interval estimators of the overall effect. Section 6 presents our simulation study and summarizes its results. In Section 7, we apply the various methods to data on the effect of diuretics on pre‐eclampsia. Section 8 offers a concluding summary.

The Supporting Information (Data S1) reviews the logistic linear mixed‐effects models, tabulates the methods studied in our simulations, discusses the properties of the M‐H estimator under the REM, presents the results of the additional simulations that included the logistic linear mixed‐effects estimators and the M‐H estimator, and lists our R programs for calculating the proposed estimators.

2 ESTIMATION OF STUDY‐LEVEL LOG‐ODDS‐RATIO

Consider $K$ studies that used a particular individual‐level binary outcome. Each study $i$ reports a pair of independent binomial variables, $X_{iT}$ and $X_{iC}$, the numbers of events in $n_{iT}$ subjects in the Treatment arm ($j = T$) and $n_{iC}$ subjects in the Control arm ($j = C$); for $i = 1, \ldots, K$:
$$X_{iT} \sim \mathrm{Binom}(n_{iT}, p_{iT}) \quad \text{and} \quad X_{iC} \sim \mathrm{Binom}(n_{iC}, p_{iC}). \tag{2.1}$$
The log‐odds‐ratio for Study $i$ is:
$$\theta_i = \log_e \frac{p_{iT}(1 - p_{iC})}{p_{iC}(1 - p_{iT})} \quad \text{estimated by} \quad \hat\theta_i = \log_e \frac{\hat p_{iT}(1 - \hat p_{iC})}{\hat p_{iC}(1 - \hat p_{iT})}. \tag{2.2}$$
The large‐sample estimate of the variance of $\hat\theta_i$, derived by the delta method, is:
$$\hat\sigma_i^2 = \widehat{\mathrm{Var}}(\hat\theta_i) = \frac{1}{n_{iT}\,\hat p_{iT}(1 - \hat p_{iT})} + \frac{1}{n_{iC}\,\hat p_{iC}(1 - \hat p_{iC})} \tag{2.3}$$
(in finite samples $\mathrm{Var}(\hat\theta_i)$ is not finite). Evaluation of $\hat\theta_i$ and $\hat\sigma_i^2$ requires the estimates $\hat p_{ij}$. The usual (and maximum‐likelihood) estimate of $p_{ij}$ is $\hat p_{ij} = x_{ij}/n_{ij}$, but an adjustment is necessary when either of the observed counts $x_{ij}$ is 0 or $n_{ij}$ (ie, when the 2 × 2 table for Study $i$ contains a 0 cell). The standard approach adds 1/2 to $x_{iT}$, $n_{iT} - x_{iT}$, $x_{iC}$, and $n_{iC} - x_{iC}$ when the 2 × 2 table contains exactly one 0 cell, and it omits Study $i$ when the 2 × 2 table contains two 0 cells. An alternative approach always adds $a$ ($> 0$) to all four cells of the 2 × 2 table for each of the $K$ studies; that is, it estimates $p_{ij}$ by $\hat p_{ij}^{(a)} = (x_{ij} + a)/(n_{ij} + 2a)$. The most common choice, $a = 1/2$, removes bias of order $n^{-1}$ in $\hat\theta_i$.11 It is convenient to denote the resulting estimate of $\theta_i$ by $\hat\theta_i^{(a)}$.

Using $\hat p_{ij}^{(a)}$ with $a = 1/2$ in Equation (2.3) yields an estimator of $\mathrm{Var}(\hat\theta_i^{(a)})$ that is unbiased except for terms of order $n^{-3}$. When $n_{ij} p_{ij} < 3$, however, that estimator substantially overestimates $\mathrm{Var}(\hat\theta_i^{(a)})$.12 As far as we are aware, the corresponding small‐sample bias of the standard approach has not been calculated. However, using unbiased estimators of the $\theta_i$ and $\mathrm{Var}(\hat\theta_i)$ does not make the inverse‐variance estimator of the combined LOR unbiased, because $1/\widehat{\mathrm{Var}}(\hat\theta_i)$ is a biased estimator of $1/\mathrm{Var}(\hat\theta_i)$ and the $\hat\theta_i$ and their estimated variances are not independent.13, 14
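As an illustration of the two zero‐cell conventions described in this section, the following minimal Python sketch computes a study‐level log‐odds‐ratio and its estimated variance from one 2 × 2 table. The function name and arguments are hypothetical, not taken from the authors' R programs. The variance is written as the sum of reciprocal (adjusted) cell counts, which corresponds to Equation (2.3) with $n_{ij} + 2a$ in place of $n_{ij}$ when cells are adjusted; that choice is an assumption of the sketch.

```python
import math

def log_odds_ratio(x_t, n_t, x_c, n_c, a=0.5, always_add=False):
    """Return (theta_hat, var_hat) for one study, or None if the study is omitted.

    always_add=False follows the 'standard approach' of Section 2: add `a` to all
    four cells only when exactly one cell is 0, and omit the study when two (or
    more) cells are 0. always_add=True adds `a` to all four cells of every table.
    """
    cells = [x_t, n_t - x_t, x_c, n_c - x_c]
    zeros = sum(1 for c in cells if c == 0)
    if always_add:
        add = a
    elif zeros == 0:
        add = 0.0
    elif zeros == 1:
        add = a
    else:
        return None  # omitted under the standard approach
    e_t, ne_t, e_c, ne_c = (c + add for c in cells)
    theta_hat = math.log((e_t * ne_c) / (e_c * ne_t))
    # Sum of reciprocal cell counts (assumption: adjusted counts throughout).
    var_hat = 1 / e_t + 1 / ne_t + 1 / e_c + 1 / ne_c
    return theta_hat, var_hat
```

With `always_add=True` a double‐zero study is retained and contributes an estimate of 0 with a large variance, which is the situation discussed in Section 2.1.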

2.1 Double‐zero studies

Meta‐analysis of binary data is challenging when the event rates are low. Such situations may involve so‐called double‐zero studies (ie, studies with zero events in both arms or, at the other extreme, $x_{iT} = n_{iT}$ and $x_{iC} = n_{iC}$). Actual practice varies, but often meta‐analyses omit these studies. A popular argument is that such studies provide no information on the direction or magnitude of the effect.15

Simulations that retain double‐zero studies are rather scarce. Kuss16 considers only methods that include double‐zero studies without adjustment. Bhaumik et al13 refer to their extensive simulation study comparing inclusion (with $a = 1/2$) and exclusion of double‐zero studies and claim that inclusion results in less bias in estimation of the overall effect, but negatively affects estimation of $\tau^2$. Cheng et al17 provide a review and some limited simulations for $p \le .01$ and $K = 5$. They argue that including double‐zero studies is beneficial when $\theta$ is 0, but detrimental when a true treatment effect exists. We believe that this issue has no major practical consequences for our simulations (Section 6) because we use $\theta \ge 0$ and $p_{iC} \ge .1$.

3 RANDOM‐EFFECTS MODELS

3.1 Standard random‐effects model

The standard REM assumes that each estimated study‐level effect, $\hat\theta_i$, has an approximately normal distribution and that the true study‐level effects, $\theta_i$, follow a normal distribution:
$$\hat\theta_i \sim N(\theta_i, \sigma_i^2) \quad \text{and} \quad \theta_i \sim N(\theta, \tau^2). \tag{3.1}$$
Thus, the marginal distribution of $\hat\theta_i$ is $N(\theta, \sigma_i^2 + \tau^2)$. Although the $\sigma_i^2$ are generally unknown, they are routinely replaced by their estimates, $\hat\sigma_i^2$. A key step involves estimating the between‐study variance, $\tau^2$; the most popular random‐effects method uses the DerSimonian‐Laird (DL) estimate.5 The estimate of the overall effect is then
$$\hat\theta_{RE} = \frac{\sum_{i=1}^{K} \hat w_i \hat\theta_i}{\sum_{i=1}^{K} \hat w_i}, \tag{3.2}$$
where $\hat w_i = \hat w_i(\hat\tau^2) = (\hat\sigma_i^2 + \hat\tau^2)^{-1}$ is the inverse‐variance weight for Study $i$. If the $\sigma_i^2$ and $\tau^2$ were known, the variance of $\hat\theta_{RE}$ would be $[\sum w_i]^{-1}$ with $w_i = (\sigma_i^2 + \tau^2)^{-1}$. In practice, the variance of $\hat\theta_{RE}$ is traditionally estimated by $[\sum \hat w_i(\hat\tau^2)]^{-1}$, and a CI for $\theta$ uses critical values from the normal distribution.
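The inverse‐variance pooling in Equation (3.2) and the traditional normal‐theory interval can be sketched as follows. This is a minimal illustration with hypothetical names; it takes the study‐level estimates, their estimated variances, and some estimate of $\tau^2$ as given, and uses 1.959964 as the 0.975 quantile of the standard normal distribution.

```python
import math

def random_effects_estimate(theta_hat, var_hat, tau2, z=1.959964):
    """Inverse-variance-weighted mean (Eq. 3.2) with the traditional normal CI.

    theta_hat, var_hat: lists of study-level estimates and estimated variances.
    tau2: an estimate of the between-study variance (from any method).
    """
    weights = [1.0 / (v + tau2) for v in var_hat]
    total = sum(weights)
    estimate = sum(w * t for w, t in zip(weights, theta_hat)) / total
    std_err = math.sqrt(1.0 / total)  # traditional variance estimate [sum w_i]^-1
    return estimate, (estimate - z * std_err, estimate + z * std_err)
```

Setting `tau2=0.0` gives the usual fixed‐effect (common‐effect) inverse‐variance estimate.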

The assumptions in this model (eg, within‐study normality, between‐study normality, and known $\sigma_i^2$) have become familiar and seldom attract attention. Jackson and White,14 however, advocate careful examination; they conclude that methods that make fewer normality assumptions should be considered more often in practice.

3.2 Logistic linear mixed‐effects models

One alternative approach uses a binomial‐normal likelihood; the resulting logistic linear mixed‐effects model belongs to the class of generalized linear mixed models (GLMMs).18, 19 Kuss,16 Jackson et al,20 and Bakbergenuly and Kulinskaya21 review these GLMM methods. We include a fixed‐intercept model (FIM) and a random‐intercept model (RIM), equivalent to Models 4 and 5, respectively, of Jackson et al20 and to models FIM2 and RIM2 of Bakbergenuly and Kulinskaya.21 Briefly, the FIM includes fixed control‐group effects (log‐odds for the control‐group probabilities), and the RIM replaces these fixed effects with random effects. Section A.1 in the Supporting Information gives more details.

3.3 Noncentral‐hypergeometric‐normal model (NCHGN)

When one conditions on the total number of events for Study $i$, $X_{iT} + X_{iC} = X_i$, only the number of events in the treatment group, $X_{iT}$, is random. Then, given the study‐specific log‐odds‐ratio $\theta_i$, $X_{iT}$ has a noncentral hypergeometric distribution. If the $\theta_i$ are normally distributed, $\theta_i \sim N(\theta, \tau^2)$, the exact hypergeometric‐normal likelihood function for Study $i$ can be written as19, 22:
$$l_{HGN}(x_{iT} \mid \theta, \tau^2) = \int \binom{n_{iT}}{x_{iT}} \binom{n_{iC}}{x_{iC}} \frac{\exp(x_{iT}\theta_i)}{P(\theta_i)}\, \phi(\theta_i \mid \theta, \tau^2)\, d\theta_i, \tag{3.3}$$
where the normalizing constant is defined as:
$$P(\theta_i) = \sum_{u = \max(0,\, X_i - n_{iC})}^{\min(n_{iT},\, X_i)} \binom{n_{iT}}{u} \binom{n_{iC}}{X_i - u} \exp(u\theta_i)$$
and $\phi(\cdot \mid \theta, \tau^2)$ is the probability density function of the normal distribution with mean $\theta$ and variance $\tau^2$. Integrating out the unobserved study‐specific effects produces the marginal distribution of $X_{iT}$. Section A.1 in the Supporting Information gives more details.
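To make the conditional model concrete, the sketch below evaluates the noncentral hypergeometric probability of $X_{iT}$ for a fixed $\theta_i$, including the normalizing constant $P(\theta_i)$. It covers only the integrand of Equation (3.3) without the normal density; the integration over $\theta_i$ is not attempted here. The function name is hypothetical.

```python
import math

def nchg_pmf(x_t, n_t, n_c, total_events, theta_i):
    """P(X_iT = x_t | X_iT + X_iC = total_events, theta_i).

    Noncentral hypergeometric probability with log-odds-ratio theta_i.
    """
    lo = max(0, total_events - n_c)
    hi = min(n_t, total_events)

    def term(u):
        return (math.comb(n_t, u) * math.comb(n_c, total_events - u)
                * math.exp(u * theta_i))

    if not lo <= x_t <= hi:
        return 0.0
    normalizer = sum(term(u) for u in range(lo, hi + 1))  # P(theta_i)
    return term(x_t) / normalizer
```

When `theta_i` is 0 the result reduces to the ordinary (central) hypergeometric probability, which provides a quick sanity check.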

4 METHODS OF ESTIMATING BETWEEN‐STUDY VARIANCE

A number of methods provide point and interval estimates of between‐study variance. In a comprehensive review of existing simulation and empirical studies, Veroniki et al23 focus on general‐purpose estimators. Langan et al24 systematically review simulation studies that compared estimators of heterogeneity variance. They summarize performance in estimating heterogeneity and also in estimating the overall effect. The studies used a variety of effect measures, including the odds ratio. Langan et al25 use simulated data on standardized mean difference and odds ratio to compare nine estimators. We considered the recommendations of those three reports in choosing estimators to study. This section briefly reviews them; for reference, Section A.2 in the Supporting Information contains a list. More‐detailed descriptions appear in Veroniki et al,23 Langan et al,24 and Langan et al25 and in Sections A.1 and A.3 in the Supporting Information.

4.1 Point estimators

In applications, the DerSimonian‐Laird5 method remains the most popular; its relative simplicity facilitated its early implementation in software. Accumulating evidence of its inferior performance has done little to dislodge it. Recommended alternative point estimators include restricted maximum likelihood (REML), the method of Mandel and Paule,6 and the method of Jackson.7 These and other methods have been studied by many authors, including Viechtbauer26 and Kosmidis et al.27 This section briefly reviews these four methods and describes the Kulinskaya‐Dollinger method. Information on the logistic linear mixed‐effects models (FIM, RIM, and NCHGN) appears in Section A.1 in the Supporting Information. All these methods replace negative values of $\hat\tau^2$ with zero.

4.1.1 DerSimonian‐Laird method

When $\tau^2 = 0$, the statistic $Q = \sum \hat w_i (\hat\theta_i - \hat\theta)^2$, with $\hat w_i = \hat w_i(0) = 1/\hat\sigma_i^2$ and $\hat\theta = \sum \hat w_i \hat\theta_i / \sum \hat w_i$, is customarily assumed to have approximately the $\chi^2$ distribution on $K - 1$ degrees of freedom. DerSimonian and Laird5 substitute $w_i = 1/\sigma_i^2$ for $\hat w_i$, derive the corresponding expected value of $Q$ when $\mathrm{Var}(\hat\theta_i) = \sigma_i^2 + \tau^2$, and estimate $\tau^2$ by the method of moments. The resulting closed‐form expression has made the DL estimator attractive.
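The closed form is not written out in this section. The sketch below uses the familiar DL expression, $\hat\tau^2_{DL} = \max\{0, [Q - (K - 1)] / (\sum \hat w_i - \sum \hat w_i^2 / \sum \hat w_i)\}$, which follows from equating $Q$ to its expected value under the assumptions described above. Function and variable names are hypothetical.

```python
def dersimonian_laird(theta_hat, var_hat):
    """Return (Q, tau2_DL) from study-level estimates and estimated variances."""
    k = len(theta_hat)
    w = [1.0 / v for v in var_hat]          # fixed-effect weights 1 / sigma_hat_i^2
    sum_w = sum(w)
    theta_fixed = sum(wi * t for wi, t in zip(w, theta_hat)) / sum_w
    q = sum(wi * (t - theta_fixed) ** 2 for wi, t in zip(w, theta_hat))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)      # negative values are replaced by zero
    return q, tau2
```

For three estimates 0, 2, 4 with unit variances, $Q = 8$ and the estimate is 3, which equals the sample variance of the estimates (4) minus the within‐study variance (1).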

4.1.2 Restricted‐maximum‐likelihood method

Assuming that the $\hat\theta_i$ are distributed as $N(\theta, \hat\sigma_i^2 + \tau^2)$, the REML estimator $\hat\tau^2_{REML}$ maximizes the restricted (or residual) log‐likelihood function $l_R(\theta, \tau^2)$, which differs from the ordinary log‐likelihood function by the addition of the term $-\frac{1}{2} \ln \sum \hat w_i(\tau^2)$. It is obtained iteratively, using $\theta = \hat\theta_{REML}$ from Equation (3.2) with weights $\hat w_i(\hat\tau^2_{REML})$. REML is superior to DL because of its balance between unbiasedness and efficiency.26 However, as with DL, using the $\hat\sigma_i^2$ as if they were the $\sigma_i^2$ may undermine its performance.

One can also obtain the REML estimator of $\tau^2$ by maximizing the penalized log‐likelihood developed by Kosmidis et al27 to reduce the bias of maximum‐likelihood estimation.

4.1.3 Mandel‐Paule method

The Mandel‐Paule (MP) estimator, $\hat\tau^2_{MP}$, is another iterative moment‐based estimator of the between‐study variance.6, 28

As in Section 3.1, let the random‐effects weights and $\hat\theta_{RE}$ depend on $\tau^2$; denote the resulting $Q$ by $Q(\tau^2)$. The MP estimator $\hat\tau^2_{MP}$ is obtained by iteratively solving the equation:
$$Q(\tau^2) = \sum_{i=1}^{K} \hat w_i(\tau^2) (\hat\theta_i - \hat\theta_{RE})^2 = K - 1 \tag{4.1}$$
and requiring $\hat\tau^2_{MP} > 0$.

This method is equivalent to the empirical Bayes methods of Carter and Rolph29 and Morris,30 as noted by Rukhin and Vangel31 and Rukhin et al.32
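One simple way to solve Equation (4.1) numerically is bisection, which relies on the fact that $Q(\tau^2)$ decreases as $\tau^2$ increases. The sketch below is only an illustration under that approach; it is not the iteration used in any particular software, the names are hypothetical, and the estimate is set to zero when $Q(0) \le K - 1$.

```python
def generalized_q(theta_hat, var_hat, tau2):
    """Q(tau2) with weights 1 / (sigma_hat_i^2 + tau2), as in Eq. (4.1)."""
    w = [1.0 / (v + tau2) for v in var_hat]
    theta_re = sum(wi * t for wi, t in zip(w, theta_hat)) / sum(w)
    return sum(wi * (t - theta_re) ** 2 for wi, t in zip(w, theta_hat))

def mandel_paule(theta_hat, var_hat, tol=1e-10, max_iter=200):
    """Solve Q(tau2) = K - 1 by bisection; return 0.0 if no positive root exists."""
    target = len(theta_hat) - 1
    if generalized_q(theta_hat, var_hat, 0.0) <= target:
        return 0.0
    lo, hi = 0.0, 1.0
    while generalized_q(theta_hat, var_hat, hi) > target:
        hi *= 2.0                      # expand until the root is bracketed
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if generalized_q(theta_hat, var_hat, mid) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

With equal within‐study variances the equation has a closed‐form root, which makes a convenient check: estimates 0, 2, 4 with unit variances give $Q(\tau^2) = 8/(1 + \tau^2) = 2$, so the estimate is 3.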

4.1.4 Jackson method

DerSimonian and Kacker33 generalize $Q$, replacing the $\hat w_i$ by arbitrary fixed positive constants, $a_i$, to obtain $Q_a = \sum a_i (\hat\theta_i - \hat\theta_a)^2$, from which they derive a general method‐of‐moments estimator of $\tau^2$. They discuss several special cases, including DL (with $a_i = 1/\hat\sigma_i^2$, treating the $\hat\sigma_i^2$ as fixed).

As an option when some heterogeneity is anticipated but there is little prior knowledge about its extent, Jackson7 uses $Q_a$ with $a_i = 1/\sigma_i$. Although that choice yields a point estimator of $\tau^2$, he focuses on the interval estimator. However, the R function inference in the supplementary materials of Jackson7 returns the point estimate. (His computational procedure avoids negative $\hat\tau^2$.) We abbreviate the point and interval estimators as J. In practice, meta‐analyses would use $\hat\sigma_i$, so the $a_i$ in $Q_a$ are not fixed.

4.1.5 Kulinskaya‐Dollinger method

The chi‐squared approximation for $Q$ is inaccurate, and the actual distribution depends on the effect measure. Under the null hypothesis of homogeneity of the log‐odds‐ratio, Kulinskaya and Dollinger3 obtain corrected approximations for the mean and variance of $Q$ and match those corrected moments to obtain a gamma distribution that (as their simulations confirm) closely fits the null distribution of $Q$. These approximations blend theoretical derivations with simulation results. Let $E_{KD}(Q)$ denote the corrected expected value of $Q$ under the null hypothesis $\tau^2 = 0$. This corrected first moment has the form $E_{KD}(Q) = K - 1 - 0.687[K - 1 - E_{th}(Q)]$, where $E_{th}(Q)$ is a theoretical moment obtained from their general expansion for the mean of $Q$ for arbitrary effect measures.34 The corrected variance of $Q$ is a quadratic function of the corrected mean $E_{KD}(Q)$. The expression for $E_{th}(Q)$ involved in specifying the corrected distribution of $Q$ is not simple; Kulinskaya and Dollinger3 give the details. For large sample sizes, $E_{th}(Q) \to K - 1$.

We propose a new estimator of $\tau^2$ based on this improved approximation. One obtains the KD estimate $\hat\tau^2_{KD}$ by iteratively solving
$$Q(\tau^2) = \sum_{i=1}^{K} \frac{(\hat\theta_i - \hat\theta_{RE})^2}{\hat\sigma_i^2 + \tau^2} = E_{KD}(Q). \tag{4.2}$$
This estimator closely resembles the MP estimator; both assume that adding $\tau^2$ to $\hat\sigma_i^2$ in the IV weights makes the non‐null distribution of $Q$ (or at least, its mean) close to its null distribution. This assumption needs to be verified by simulation.
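The only difference between Equations (4.1) and (4.2) is the target on the right‐hand side. The short sketch below computes the corrected target $E_{KD}(Q)$ from the formula in Section 4.1.5. It treats $E_{th}(Q)$ as a supplied input, because its expression is given by Kulinskaya and Dollinger3 and is not reproduced here; the function name and the example value of `e_th` are hypothetical.

```python
def corrected_mean_q(k, e_th):
    """E_KD(Q) = K - 1 - 0.687 * [K - 1 - E_th(Q)] for K studies.

    e_th: the theoretical mean E_th(Q), supplied by the caller (not computed here).
    """
    return (k - 1) - 0.687 * ((k - 1) - e_th)

# Illustrative input only: K = 5 studies and an assumed E_th(Q) of 3.0.
target = corrected_mean_q(5, 3.0)
```

When $E_{th}(Q)$ equals $K - 1$, as it does in the large‐sample limit, the target reduces to $K - 1$ and Equation (4.2) coincides with the MP equation (4.1).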

4.2 Interval estimators

Viechtbauer9 and Jackson and Bowden35 compare CI estimators of the between‐study variance. Interval estimators recommended by Veroniki et al23 include profile likelihood,8 the QP interval,9 and the generalized QP intervals of Biggerstaff and Jackson10 and Jackson.7 Quality of estimation varies with the effect measure; for odds ratio van Aert et al4 found that coverage of the last three methods can deviate substantially from the nominal 95% level. If the lower confidence limit is not defined or is negative, all these methods set it to zero. The logistic linear mixed‐effects methods (FIM, RIM, and NCHGN) as implemented in the rma.glmm function in metafor,36 used in our simulations, do not produce CIs for $\tau^2$.

4.2.1 Profile‐likelihood interval

The 95% profile‐likelihood (PL) CI for $\tau^2$ consists of the values that are not rejected by the likelihood‐ratio test with $\tau^2$ as the null hypothesis.8 Here the other parameter in the likelihood, $\hat\theta$, is a function of $\tau^2$, as in Equation (3.2). The values of $\tau^2$ in the CI satisfy
$$\left\{ \tau^2 : l_R(\hat\theta(\tau^2), \tau^2) > l_R(\hat\theta_{REML}, \hat\tau^2_{REML}) - \tfrac{1}{2} \chi^2_{1;0.95} \right\}, \tag{4.3}$$
where $\chi^2_{1;0.95} = 3.841$ is the 0.95 quantile of the $\chi^2_1$ distribution, and $l_R(\hat\theta(\tau^2), \tau^2)$ is the restricted log‐likelihood function evaluated at $(\hat\theta(\tau^2), \tau^2)$.
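A grid search gives a rough picture of how the set in Equation (4.3) can be found. The sketch below writes the restricted log‐likelihood up to an additive constant, profiles out $\theta$ through the weighted mean of Equation (3.2), and keeps the grid values that pass the cutoff. It is a crude illustration, not the algorithm of any package: the names, the grid, and the upper bound `tau2_max` are assumptions, and the reported upper limit is truncated at `tau2_max`.

```python
import math

def restricted_loglik(theta_hat, var_hat, tau2):
    """Restricted log-likelihood at (theta_hat(tau2), tau2), up to a constant."""
    w = [1.0 / (v + tau2) for v in var_hat]
    sum_w = sum(w)
    theta = sum(wi * t for wi, t in zip(w, theta_hat)) / sum_w
    q = sum(wi * (t - theta) ** 2 for wi, t in zip(w, theta_hat))
    # 0.5 * sum(log w_i) equals -0.5 * sum(log(sigma_hat_i^2 + tau2)).
    return 0.5 * sum(math.log(wi) for wi in w) - 0.5 * math.log(sum_w) - 0.5 * q

def profile_likelihood_interval(theta_hat, var_hat, tau2_max=10.0,
                                steps=10000, crit=3.841):
    """Smallest and largest grid values of tau2 satisfying Eq. (4.3)."""
    grid = [tau2_max * j / steps for j in range(steps + 1)]
    loglik = [restricted_loglik(theta_hat, var_hat, t) for t in grid]
    cutoff = max(loglik) - 0.5 * crit   # grid maximum stands in for the REML fit
    inside = [t for t, ll in zip(grid, loglik) if ll > cutoff]
    return inside[0], inside[-1]
```

With very few studies the interval can be extremely wide, so `tau2_max` may need to be large before the upper limit stops being truncated.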

4.2.2 Q‐profile confidence interval

If the weight for Study $i$ is $1/(\sigma_i^2 + \tau^2)$, the generalized Q‐statistic
$$Q(\tau^2) = \sum_{i=1}^{K} \frac{(\hat\theta_i - \hat\theta(\tau^2))^2}{\sigma_i^2 + \tau^2} \tag{4.4}$$
follows the chi‐squared distribution with $K - 1$ degrees of freedom. To obtain the QP CI, Viechtbauer9 finds the lower and upper confidence limits by iteratively solving $Q(\tilde\tau_L^2) = \chi^2_{K-1;0.975}$ and $Q(\tilde\tau_U^2) = \chi^2_{K-1;0.025}$. In practice, it is necessary to use the $\hat\sigma_i^2$ instead of the $\sigma_i^2$, and then the generalized Q‐statistic no longer follows the assumed chi‐squared distribution.
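The two equations for the QP limits can be solved with the same kind of root‐finding used for moment‐based point estimators. The sketch below uses bisection and takes the two chi‐squared quantiles as inputs rather than computing them. In the usage example $K = 3$, so the reference distribution has 2 degrees of freedom and its quantiles have the closed form $-2\ln(1 - p)$. All names are hypothetical.

```python
import math

def generalized_q(theta_hat, var_hat, tau2):
    """Generalized Q-statistic of Eq. (4.4), using estimated variances."""
    w = [1.0 / (v + tau2) for v in var_hat]
    theta = sum(wi * t for wi, t in zip(w, theta_hat)) / sum(w)
    return sum(wi * (t - theta) ** 2 for wi, t in zip(w, theta_hat))

def solve_q(theta_hat, var_hat, target, tol=1e-10, max_iter=200):
    """Return tau2 >= 0 with Q(tau2) = target (target > 0), or 0.0 if Q(0) <= target."""
    if generalized_q(theta_hat, var_hat, 0.0) <= target:
        return 0.0
    lo, hi = 0.0, 1.0
    while generalized_q(theta_hat, var_hat, hi) > target:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if generalized_q(theta_hat, var_hat, mid) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def q_profile_interval(theta_hat, var_hat, chi2_upper, chi2_lower):
    """Lower limit solves Q = upper quantile; upper limit solves Q = lower quantile."""
    return (solve_q(theta_hat, var_hat, chi2_upper),
            solve_q(theta_hat, var_hat, chi2_lower))

# Usage with K = 3 (2 degrees of freedom): quantile(p) = -2 * ln(1 - p).
upper_quantile = -2.0 * math.log(0.025)   # 0.975 quantile of chi-squared(2)
lower_quantile = -2.0 * math.log(0.975)   # 0.025 quantile of chi-squared(2)
tau2_lower, tau2_upper = q_profile_interval(
    [0.0, 2.0, 4.0], [1.0, 1.0, 1.0], upper_quantile, lower_quantile)
```

Because $Q(\tau^2)$ decreases in $\tau^2$, the larger quantile yields the lower confidence limit and the smaller quantile yields the upper one.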

4.2.3 Biggerstaff and Jackson interval

For a generic effect measure, Biggerstaff and Jackson10 derive the exact distribution of the statistic
$$Q = \sum_{i=1}^{K} w_i (\hat\theta_i - \hat\theta)^2, \tag{4.5}$$
where $w_i = 1/\sigma_i^2$ and $\hat\theta = \sum w_i \hat\theta_i / \sum w_i$. They show that the distribution is that of a linear combination of mutually independent chi‐squared random variables, each with 1 degree of freedom, and they take advantage of available software for evaluating the cumulative distribution function $F_Q$ of such a distribution.
That distribution yields a generalized QP CI, whose lower and upper limits are the solutions to the equations:
$$Q(\tilde\tau_L^2) = F_{Q;0.975}, \qquad Q(\tilde\tau_U^2) = F_{Q;0.025}, \tag{4.6}$$
in which $F_{Q;0.025}$ and $F_{Q;0.975}$ are, respectively, the 0.025 and 0.975 quantiles. If the equation for $\tilde\tau_L^2$ has no solution, they set $\tilde\tau_L^2 = 0$. We refer to this interval as the BJ CI.

Despite the title of Biggerstaff and Jackson,10 $Q$ in (4.5) is not Cochran's heterogeneity statistic. In the definition of $Q$, Cochran37 used $w_i = 1/\hat\sigma_i^2$.

4.2.4 Jackson interval

As mentioned in Section 4.1.4, Jackson7 proposes another generalized QP CI for $\tau^2$. The approach is the same as for the BJ interval, but with $a_i = 1/\sigma_i$ in $Q_a$.

4.2.5 Kulinskaya‐Dollinger interval

For the log‐odds‐ratio, we propose a new CI for the between‐study variance. The KD CI for $\tau^2$ combines the QP approach and the improved approximation by Kulinskaya and Dollinger.3 This corrected QP CI can be estimated from the lower and upper quantiles of $F_Q$, the cumulative distribution function for the corrected distribution of $Q$, as in Equation (4.6). The upper and lower confidence limits for $\tau^2$ can be calculated iteratively.

5 METHODS OF ESTIMATING OVERALL EFFECT

Most of the point estimators of the overall effect have corresponding interval estimators, but some do not. Therefore, we describe point estimators and interval estimators in separate sections.

5.1 Point estimators

A random‐effects method that estimates $\theta$ by a weighted mean with inverse‐variance weights, as in Equation (3.2), is determined by the particular $\hat\tau^2$ that it uses in $\hat w_i(\hat\tau^2)$. The best‐known and most widely used estimator, $\hat\theta_{DL}$, was introduced by DerSimonian and Laird5; it uses $\hat\tau^2_{DL}$. Its shortcomings, in particular bias and below‐nominal coverage of the companion CI, have led numerous authors to propose alternative estimators of $\tau^2$. Some of those shortcomings arise from the derivation underlying $\hat\tau^2_{DL}$, which uses the $\sigma_i^2$ and $\tau^2$ and then substitutes the $\hat\sigma_i^2$ and $\hat\tau^2$. Unfortunately, the alternative methods (REML, J, and MP) generally rely on that same unsupported substitution. In our simulations, we add one more inverse‐variance‐weighted estimator, KD, to this list.

In an attempt to avoid the bias in the inverse‐variance‐weighted estimators, we include a point estimator whose weights depend only on the studies' effective sample sizes.38, 39 For this estimator (SSW) $\hat\theta_i$ uses $\hat p_{ij}^{(a)}$ with $a = 1/2$ (as discussed in Section 2), and $w_i = \tilde n_i = n_{iT} n_{iC} / (n_{iT} + n_{iC})$; $\tilde n_i$ is the effective sample size in Study $i$. These weights would be equivalent to the inverse‐variance weights if all the probabilities across studies were equal (ie, $p_{iT} = p_{iC} \equiv p$ for $i = 1, \ldots, K$).
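The SSW point estimator described above is straightforward to compute from the raw counts. The following sketch is an illustration with hypothetical names, not the authors' R code: it adds $a = 1/2$ to every cell, forms each study's log‐odds‐ratio, and averages with weights equal to the effective sample sizes.

```python
import math

def ssw_estimate(x_t, n_t, x_c, n_c, a=0.5):
    """Sample-size-weighted (SSW) estimate of the overall log-odds-ratio.

    x_t, n_t, x_c, n_c: lists of event counts and arm sizes, one entry per study.
    """
    numerator = 0.0
    denominator = 0.0
    for xt, nt, xc, nc in zip(x_t, n_t, x_c, n_c):
        p_t = (xt + a) / (nt + 2 * a)
        p_c = (xc + a) / (nc + 2 * a)
        theta_i = math.log(p_t * (1 - p_c) / (p_c * (1 - p_t)))
        n_eff = nt * nc / (nt + nc)          # effective sample size of the study
        numerator += n_eff * theta_i
        denominator += n_eff
    return numerator / denominator
```

Because the weights do not involve the observed event counts, they are not correlated with the study‐level estimates, which is the motivation given in the text for this choice.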

As we mentioned in Section 1, we also include estimators obtained from logistic linear mixed‐effects models, namely FIM, RIM, and NCHGN.

A reviewer pointed out that the weights in SSW are the same as those in the MH estimator of a common risk difference,40 and suggested that we include the MH estimator of a common odds ratio. That fixed‐effect estimator applies the weight $(n_{iT} - x_{iT}) x_{iC} / (n_{iT} + n_{iC})$ to the sample odds ratio for Study $i$. As we discuss in Section A.3 in the Supporting Information, we expect MH to be biased under the REM.
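Applying the stated weight to each study's sample odds ratio, $x_{iT}(n_{iC} - x_{iC}) / [x_{iC}(n_{iT} - x_{iT})]$, and normalizing by the sum of the weights gives the ratio of two sums shown in this sketch. The function name is hypothetical, and no cell adjustment is made, as in the usual version of the estimator.

```python
def mantel_haenszel_or(x_t, n_t, x_c, n_c):
    """Mantel-Haenszel estimate of a common odds ratio from K 2x2 tables."""
    studies = list(zip(x_t, n_t, x_c, n_c))
    # weight_i * OR_i simplifies to x_iT * (n_iC - x_iC) / (n_iT + n_iC)
    numerator = sum(xt * (nc - xc) / (nt + nc) for xt, nt, xc, nc in studies)
    # weight_i = (n_iT - x_iT) * x_iC / (n_iT + n_iC)
    denominator = sum((nt - xt) * xc / (nt + nc) for xt, nt, xc, nc in studies)
    return numerator / denominator
```

A study with zero events in both arms contributes nothing to either sum, so it drops out of this estimator without any explicit exclusion.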

In summary, the point estimators that we study are DL, REML, J, MP, KD, SSW, FIM, RIM, NCHGN, and MH.

5.2 Interval estimators

The point estimators DL, REML, J, MP, and KD have companion interval estimators of $\theta$. The customary approach estimates the variance of $\hat\theta_{RE}$ by $[\sum \hat w_i(\hat\tau^2)]^{-1}$ and bases the width of the interval on the normal distribution. That expression for the variance of $\hat\theta_{RE}$ would be correct if it were based on $w_i = (\sigma_i^2 + \tau^2)^{-1}$. In practice, however, using $\hat w_i(\hat\tau^2)$ may not yield a satisfactory approximation. Also, we have not seen empirical evidence that the sampling distributions of $\hat\theta_{RE}$ for the various choices of estimator for $\tau^2$ are adequately approximated by a normal distribution.

Hartung and Knapp41 and, independently, Sidik and Jonkman42 developed an estimator for the variance of $\hat\theta_{DL}$ that takes into account the variability of the $\hat\sigma_i^2$ and $\hat\tau^2$. The Hartung‐Knapp‐Sidik‐Jonkman (HKSJ) CI uses the estimator:
$$\widehat{\mathrm{Var}}_{HKSJ}(\hat\theta_{DL}) = \frac{\sum_{i=1}^{K} \hat w_i(\hat\tau^2_{DL}) (\hat\theta_i - \hat\theta_{DL})^2}{(K - 1) \sum_{i=1}^{K} \hat w_i(\hat\tau^2_{DL})}, \tag{5.1}$$
together with critical values from the $t$ distribution on $K - 1$ degrees of freedom. A potential weakness is that the derivation of the variance estimator and the $t$ distribution uses the $\sigma_i^2$ and $\tau^2$ and then substitutes the $\hat\sigma_i^2$ and $\hat\tau^2_{DL}$. Also, the HKSJ interval uses $\hat\theta_{DL}$ as its midpoint, so it will have any bias that is present in $\hat\theta_{DL}$. We study a modification of HKSJ (HKSJ KD) that uses the KD estimator of $\tau^2$ and uses $\hat\theta_{KD}$ as the midpoint.
The interval estimator corresponding to SSW (SSW KD) uses the SSW point estimator as its center, and its width equals the estimated standard deviation of SSW under the REM times twice the critical value from the $t$ distribution on $K - 1$ degrees of freedom. The estimator of the variance of SSW is
$$\widehat{\mathrm{Var}}(\hat\theta_{SSW}) = \frac{\sum \tilde n_i^2 (v_i^2 + \hat\tau^2)}{(\sum \tilde n_i)^2}, \tag{5.2}$$
in which $v_i^2$ comes from Equation (2.3) and $\hat\tau^2 = \hat\tau^2_{KD}$.
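Equations (5.1) and (5.2) translate directly into code. The sketch below is illustrative, with hypothetical names. It takes the estimate of $\tau^2$ and the $t$ critical value as inputs rather than computing them (for example, the 0.975 quantile of $t$ on 4 degrees of freedom is about 2.776), and it writes Equation (5.1) for a generic $\hat\tau^2$, which covers both the DL and KD versions.

```python
import math

def hksj_variance(theta_hat, var_hat, tau2):
    """HKSJ variance estimate of Eq. (5.1) for a given estimate of tau2."""
    k = len(theta_hat)
    w = [1.0 / (v + tau2) for v in var_hat]
    sum_w = sum(w)
    theta_re = sum(wi * t for wi, t in zip(w, theta_hat)) / sum_w
    q = sum(wi * (t - theta_re) ** 2 for wi, t in zip(w, theta_hat))
    return q / ((k - 1) * sum_w)

def ssw_interval(theta_hat, v2, n_eff, tau2, t_crit):
    """SSW-centered interval using the variance estimate of Eq. (5.2).

    theta_hat: study-level estimates; v2: their estimated variances (Eq. 2.3);
    n_eff: effective sample sizes; t_crit: t critical value on K - 1 df.
    """
    total = sum(n_eff)
    center = sum(n * t for n, t in zip(n_eff, theta_hat)) / total
    variance = sum(n ** 2 * (v + tau2) for n, v in zip(n_eff, v2)) / total ** 2
    half_width = t_crit * math.sqrt(variance)
    return center - half_width, center + half_width
```

When $\hat\tau^2$ solves $Q(\tau^2) = K - 1$, the numerator of Equation (5.1) equals $K - 1$ and the HKSJ variance coincides with the traditional $[\sum \hat w_i]^{-1}$.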

In summary, the interval estimators that we study are DL, REML, J, MP, KD, HKSJ, HKSJ KD, SSW KD, FIM, RIM, and NCHGN.

6 SIMULATION STUDY

In a simulation study with log‐odds‐ratio as the effect measure, we varied six parameters: the number of studies $K$, the total sample size of each study $n$, the proportion of observations in the Control arm $q$, the overall true LOR $\theta$, the between‐study variance $\tau^2$, and the probability of an event in the Control arm $p_C$. The number of studies was $K \in \{5, 10, 30\}$. We included sample sizes that were equal for all $K$ studies and sample sizes that varied among studies. The total sample sizes were $n \in \{40, 100, 250, 1000\}$ for equal sample sizes, and the average total sample sizes were $\bar n \in \{30, 60, 100, 160\}$ for unequal sample sizes. In choosing sample sizes that varied among studies, we followed a suggestion of Sánchez‐Meca and Marín‐Martínez,43 who selected study sizes having skewness 1.464, which they considered typical in behavioral and health sciences. For $K = 5$, Table 1 lists the sets of five sample sizes, which have the chosen skewness and average equal to 30, 60, 100, and 160. The simulations for $K = 10$ and $K = 30$ used each set of unequal sample sizes twice and six times, respectively. The values of $q$ were .5 and .75. The sample sizes of the Treatment and Control arms were $n_{iT} = \lceil (1 - q) n_i \rceil$ and $n_{iC} = n_i - n_{iT}$, $i = 1, \ldots, K$. The values of the overall true LOR were $\theta = 0(0.5)2$ (ie, from 0 to 2 in steps of 0.5). The probability in the Control arm was $p_{iC} = .1, .2, .4$. The values of the between‐study variance were $\tau^2 = 0(0.1)1$, corresponding to small to moderate heterogeneity. This interval of $\tau^2$ values is similar to or, for smaller sample sizes, somewhat shorter than that for the meta‐analyses of LOR in the Cochrane database (Appendix 2 of Langan et al25).

Table 1. Unequal sample sizes for simulations with K = 5
Average size    Study 1    Study 2    Study 3    Study 4    Study 5
30              12         16         18         20         84
60              24         32         36         40         168
100             64         72         76         80         208
160             124        132        136        140        268
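The entries of Table 1 can be checked against the stated averages and skewness, and the arm sizes follow from the ceiling rule given in the text. The sketch below assumes the moment‐based (population) skewness, $m_3 / m_2^{3/2}$ with divisor $K$; the source does not say which skewness formula was used, but this version reproduces 1.464 for each row. Names are hypothetical.

```python
import math

# Rows of Table 1: average total sample size -> the five study sizes.
TABLE_1 = {
    30: [12, 16, 18, 20, 84],
    60: [24, 32, 36, 40, 168],
    100: [64, 72, 76, 80, 208],
    160: [124, 132, 136, 140, 268],
}

def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (assumed definition, divisor n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def arm_sizes(n_total, q):
    """Treatment and Control arm sizes: n_T = ceil((1 - q) * n), n_C = n - n_T."""
    n_t = math.ceil((1 - q) * n_total)
    return n_t, n_total - n_t

for average, sizes in TABLE_1.items():
    print(average, sum(sizes) / len(sizes), round(skewness(sizes), 3))
```

For example, a study of total size 18 with $q = .75$ has 5 subjects in the Treatment arm and 13 in the Control arm.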

Altogether, the simulations comprised 7920 combinations of the six parameters. We generated 10 000 meta‐analyses for each combination. The true values of LOR ($\theta_i$) were generated from a normal distribution with mean $\theta$ and variance $\tau^2$. For a given $p_{iC}$, the number of events in the control group, $X_{iC}$, was generated from the Binomial($n_{iC}$, $p_{iC}$) distribution. The number of events in the treatment group, $X_{iT}$, was generated from the Binomial($n_{iT}$, $p_{iT}$) distribution with $p_{iT} = p_{iC} \exp(\theta_i) / (1 - p_{iC} + p_{iC} \exp(\theta_i))$. The estimate $\hat\theta_i$ was calculated as in Equation (2.2), and its sampling variance was estimated by substituting $\hat p_{iT}$ and $\hat p_{iC}$ in Equation (2.3). The methods differed, however, in the way they obtained $\hat p_{ij}$ from $x_{ij}$ and $n_{ij}$. In all standard methods, we added 1/2 to each cell of the 2 × 2 table only when the table had at least one cell equal to 0. This approach corresponds to the default values of the arguments add, to, and drop00 of the escalc procedure in metafor.36 In the KD methods, and for estimation of $\theta_i$ in SSW, we corrected for bias by adding 1/2 to each cell of all $K$ tables. We also tried always adding 1/2 in the standard methods, but that made the biases for $\hat\tau^2$ worse.
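The data‐generating steps above can be sketched for a single simulated meta‐analysis as follows. This is an illustration in Python with hypothetical names, not the authors' R code; binomial counts are drawn as sums of Bernoulli trials to keep the example free of external libraries. The first lines also recount the parameter grid, which gives the 7920 combinations stated in the text.

```python
import math
import random

# 3 values of K, 4 equal + 4 unequal sample-size settings, 2 values of q,
# 5 values of theta, 11 values of tau2, 3 values of p_C.
N_COMBINATIONS = 3 * (4 + 4) * 2 * 5 * 11 * 3

def simulate_meta_analysis(n_total, q, theta, tau2, p_c, rng):
    """Generate one meta-analysis as a list of (x_T, n_T, x_C, n_C) tuples.

    n_total: list of total study sizes; rng: a random.Random instance.
    """
    studies = []
    for n_i in n_total:
        n_t = math.ceil((1 - q) * n_i)
        n_c = n_i - n_t
        theta_i = rng.gauss(theta, math.sqrt(tau2))   # true study-level LOR
        odds_term = p_c * math.exp(theta_i)
        p_t = odds_term / (1 - p_c + odds_term)
        x_c = sum(rng.random() < p_c for _ in range(n_c))  # Binomial(n_C, p_C)
        x_t = sum(rng.random() < p_t for _ in range(n_t))  # Binomial(n_T, p_T)
        studies.append((x_t, n_t, x_c, n_c))
    return studies

# Usage: K = 5 equal-sized studies with n = 40, q = .5, theta = 0.5,
# tau2 = 0.2, and p_C = .1; the seed is arbitrary.
example = simulate_meta_analysis([40] * 5, 0.5, 0.5, 0.2, 0.1, random.Random(2020))
```

Repeating the call 10 000 times for each parameter combination and applying the estimators to each generated data set would mirror the design described in this section.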

In expanding our comparative study, we included the MH estimator of θ and the estimators from the FIM, RIM, and NCHGN models in simulations for selected values of the parameters: p C = .1, q = .5 and equal sample sizes with n = 40 and n = 100. The three logistic linear mixed‐effects methods provide point but not interval estimators of τ2 and both point and interval estimators of θ. For the MH point estimator of θ, we studied two versions: the usual version (MH), which does not modify the cell counts, and a version that always adds 1/2 to each cell (MH with 1/2). The results of these additional simulations are plotted in Section A.4 in the Supporting Information.

6.1 Results of simulation studies

Our full simulation results are available as an arXiv e‐print.44 They comprise 300 figures, each presenting a plot of bias or coverage vs τ2 for the four values of n or n̄ and the three values of K. A detailed summary is given below and illustrated by Figures 1-3.

Figure 1. Bias of τ̂²_KD and τ̂²_MP in estimating the between‐study variance τ2 for θ = 0(0.5)2, p iC = .1, q = .5, n = 40, 100. The values of θ are distinguished by colour and symbol: θ = 0, black °; θ = 0.5, brown; θ = 1, green +; θ = 1.5, blue ×; and θ = 2, red. Light grey line at 0.

Figure 2. Bias and ratio of MSEs for estimators of the overall effect θ for θ = 0, p iC = .1, q = .5, and equal sample sizes n = 40, 100. Light grey line at 0 and 1, respectively.

Figure 3. Coverage of between‐studies variance τ2 (top two rows) and overall effect θ (bottom two rows) for θ = 0, p iC = .1, q = .5, and unequal sample sizes n̄ = 30, 100. Light grey line at 0.95.

6.1.1 Bias in estimation of τ2

All the estimators have bias that varies with τ2, often roughly linearly. The sign and magnitude of the bias and the slope of that relation depend on p iC, θ, n, K, and q. For example, when p iC = .1, θ = 0, q = .5, n = 40, and K = 5, the bias of KD goes from +0.32 when τ2 = 0 to −0.08 when τ2 = 1, and the traces for the other estimators, close together, go from around +0.12 to around −0.47. Among these, MP appears to be the least biased. As K increases, the pattern shifts down; and as n increases, the traces tend to flatten (when n = 1000, most of the estimators are unbiased, but the bias when τ2 = 1 is −0.08 for J and −0.17 for DL). As θ increases, the patterns shift down. When all studies are unbalanced (in favor of the Control arm), q = .75, the patterns often shift down, and the slopes become steeper.

Figure 1 shows these patterns for KD and MP in the balanced case (q = .5). Both estimators have positive bias at zero, but for larger values of τ2, the bias of MP is mostly negative, whereas for KD it may be positive for larger values of θ. MP is considerably worse than KD (apart from τ2 = 0) for K = 30. For K = 5 and 10, KD is less biased than MP for large values of τ2, but it may be worse for small values.

The effect of increasing p iC is not simple. As p iC increases from .1 to .2 to .4, the (positive) bias of KD at τ2 = 0 decreases, and its bias at τ2 = 1 approaches 0; at τ2 = 0 the (positive) bias of the other estimators changes little, but at τ2 = 1 the magnitude of the (negative) bias decreases when θ = 0 but decreases and then increases when θ = 2.

None of the point estimators of τ2 has bias consistently close enough to 0 to be recommended; but among the existing estimators, MP and KD provide better choices for small and large K, respectively (Figure 2).

6.1.2 Bias in estimation of θ

In the results for bias of the point estimators of θ, a common pattern is that the bias is roughly linearly related to τ2 with a positive slope. The varied positions of the estimators' traces relative to the horizontal line of zero bias, however, complicate the process of summarizing. The situation with p iC = .1 and θ = 0 is straightforward: When n = 40 and K = 5, all estimators have no bias when τ2 = 0; when τ2 = 1, SSW has bias 0.14, and the other estimators' biases range from 0.23 to 0.26. Increasing K (to 10 and 30) has little effect on the pattern, and increasing n (to 100, 250, and 1000) flattens the pattern until little bias remains. (The plots for n = 100 show that the bias of SSW decreases more rapidly.) When θ = 0.5, the pattern splits into three: SSW has much smaller slope and flattens to essentially zero bias; the bias of KD changes from negative to positive around τ2 = 0.5; and the common trace for the other estimators parallels that for KD and is about 0.06 units above it. Again, by n = 1000 the traces have flattened and merged. As θ increases (to 1.0, 1.5, and 2.0), the traces for all estimators except SSW shift down further, and the gap between KD and the others widens. When p iC = .2 and θ > 0, slopes of the non‐SSW traces decrease as θ increases (the traces are flat when θ = 2). When p iC = .4 and θ > 0, the non‐SSW traces go from flat to having negative slope as θ increases. Also, increasing K tends to shift those traces down slightly.

When all K studies are unbalanced (q = .75), p iC = .1, θ = 0, and n = 40, the estimators have larger positive bias, even when τ2 = 0. This effect decreases as θ and p iC increase, consistent with the behavior when q = .5, and it is absent when n ≥ 100.

As expected, in the vast majority of situations, SSW avoids most, if not all, of the bias in the IV‐weighted estimators. The bias of the IV‐weighted estimators affects their efficiency, so SSW tends to have smaller mean squared error than MP as τ2 and K increase, but larger MSE than KD when K = 5 and K = 10 and when K = 30 and τ2 is small (Figure 2).

6.1.3 Coverage in estimation of τ2

Coverage of τ2 is generally good for K = 5, but is considerably worse for larger numbers of studies, especially so for large values of θ. All methods are somewhat conservative at τ2 = 0. When K = 5, PL is very conservative, whereas KD provides close to nominal coverage for τ2 > 0, though it may become a bit liberal for large θ. The other methods are between these two, being somewhat conservative for small sample sizes n. For K = 10, PL is still mostly conservative, though it may become somewhat liberal for larger τ2. KD is almost perfect, though in one instance, for unequal sample sizes with n̄ = 30, p C = .4, and θ = 2, its coverage drops to 90%. The other intervals are too liberal for small n. The large number of studies K = 30 presents the greatest challenge for the standard methods. PL is the most affected, with considerable undercoverage up to n = 100 for medium to large values of τ2. The other methods also have low coverage for small n, but they improve faster with increasing n. KD provides reliable coverage except for small sample sizes combined with p C = .4 and θ ≥ 1.5, where its undercoverage worsens with increasing τ2, though it is still considerably better than all the competitors (Figures 2 & 3).

6.1.4 Coverage in estimation of θ

Interval estimators of θ respond in a variety of ways to the variables in the simulations. No simple description adequately summarizes the patterns. In one common pattern, coverage decreases as τ2 increases, often falling substantially below the nominal 95% for the IV‐weighted estimators. For a given value of θ and K = 10 and K = 30, undercoverage tends to decrease as n increases. For K = 5, however, the undercoverage of the IV‐weighted estimators generally increases as n goes from 40 to 100 to 250 to 1000; when n = 1000, coverage is around 95% when τ2 = 0 and roughly constant, at several percentage points below 95%, for 0.1 ≤ τ2 ≤ 1 (the decrease is greater for p iC = .2 and p iC = .4 than for p iC = .1). Because HKSJ, HKSJ KD, and SSW KD do not exhibit such undercoverage in these situations, the explanation is likely to lie in the use of the normal distribution as the basis for the CI (Figure 3).

On the other hand, for given values of θ, n = 40 and n = 100, and τ2 > 0, coverage tends to decrease as K increases. This effect is small for SSW KD (which moves from overcoverage to coverage close to 95%) and larger (by varying amounts) for all the other estimators. Thus, counterintuitively, when more than a small amount of heterogeneity is present and n ≤ 100, increasing the number of studies decreases coverage. A likely contributor is bias in estimating θ, which (for n = 40 and n = 100) is positive and increasing as τ2 increases, and changes little with K.

A different pattern arises when θ ≥ 1, n = 40 and n = 100 (and n ¯ = 30 and 100), and K = 30. Coverage of HKSJ KD and KD is below 95% when τ2 = 0 and increases toward 95% as τ2 increases. For KD the explanation probably lies in its bias in estimating θ, which is negative and rises toward 0 (but remains <0) as τ2 increases. For HKSJ KD (which has greater undercoverage), the reason is less clear. Undercoverage of both KD and HKSJ KD at τ2 = 0 increases as θ increases. This pattern arises when p iC = .1 and p iC = .2. When p iC = .4, however, it is not evident when θ = 1. When θ ≥ 1.5, coverage of KD and HKSJ KD decreases as τ2 increases and then stabilizes.

We do not recommend standard CIs based on IV‐weighted estimators of θ, because of their undercoverage. HKSJ and HKSJ KD often have coverage close to 95%, but they sometimes have serious undercoverage. All problems are typically worse for the unbalanced sample sizes. The SSW KD interval often has coverage somewhat greater than 95%, but its coverage is at least 93% (except for a few cases involving K = 30 and unequal sample sizes with n̄ = 30) (Figure 2).

6.1.5 Additional results: FIM, RIM, NCHGN, and MH

In estimating τ2, FIM and RIM (Figure A4.1 in the Supporting Information) often have bias that is between those of τ̂²_KD and τ̂²_MP (Figure 1) and is generally not small, going from positive near τ2 = 0 to negative at larger τ2. The size of their bias tends to decrease as K increases. As θ increases, the bias of RIM tends to decrease, whereas the bias of FIM remains roughly constant. The pattern of NCHGN is more complicated: positive and decreasing as K increases and θ increases when n = 40; but roughly linear (+ to −) in τ2, increasing as θ increases, and flattening as K increases when n = 100, where NCHGN is almost unbiased for larger τ2 when K = 30. However, convergence rates of NCHGN are rather low, especially so for low values of τ2 and K; they improve somewhat for larger values of n (Figure A4.4 in the Supporting Information).

For point estimation of θ (Figure A4.2 in the Supporting Information), the biases of FIM, RIM, NCHGN, and MH follow patterns that resemble those of KD and MP and are quite unlike the (generally more favorable) patterns of SSW. The bias of MH increases with τ2, starting at 0 when θ = 0 but at around −0.1 to −0.05 when θ = 1. MH with 1/2 is less positively biased than MH when θ = −1 or 0, but more negatively biased than MH for low values of τ2 when θ = 1 (Figures A3.1 and A3.2 in the Supporting Information).

For interval estimation of θ (Figure A4.3 in the Supporting Information), FIM, RIM, and NCHGN generally have lower coverage than the other estimators, decreasing as θ increases. When n = 100, the coverage of RIM decreases rapidly as τ2 increases, and that pattern becomes more pronounced as K increases.

In summary, we do not recommend MH or the GLMMs for point or interval estimation of θ.

7 EXAMPLE: EFFECTS OF DIURETICS ON PRE‐ECLAMPSIA

Data from nine trials that reported the effect of diuretics on pre‐eclampsia45 were studied by Hardy and Thompson,8 Biggerstaff and Tweedie,46 Turner et al,18 Viechtbauer,9 Kulinskaya and Olkin,47 and Bakbergenuly and Kulinskaya.48 The data are shown in Table 2 and are re‐analyzed here in order to compare the methods of point and interval estimation of between‐study variance and the log‐odds‐ratio. For comparison we include results from three GLMMs available in the metafor package36: the FIM, the RIM, and the exact method based on the NCHGN. Bakbergenuly and Kulinskaya21 give more details on those methods.

Table 2. Data for meta‐analysis on effects of diuretics on pre‐eclampsia45
Study   y_iT   y_iC   n_iT   n_iC   p̂_iT     p̂_iC     θ̂_i      ñ_i
1       14     14     131    136    0.1068   0.1029   0.042    66.727
2       21     17     385    134    0.0545   0.1268   −0.924   99.403
3       14     24     57     48     0.2456   0.5000   −1.122   26.057
4       6      18     38     40     0.1579   0.4500   −1.473   19.487
5       12     35     1011   760    0.0118   0.0460   −1.391   433.857
6       138    175    1370   1336   0.1007   0.1310   −0.297   676.393
7       15     20     506    524    0.0296   0.0382   −0.262   257.421
8       6      2      108    103    0.0555   0.0194   1.089    52.720
9       65     40     153    102    0.4248   0.3921   0.135    61.200
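The θ̂ i and ñ i columns of Table 2 follow directly from the counts, and the same counts give the sample-size-weighted (SSW) point estimate. The sketch below is illustrative: the helper names are ours, and we assume that SSW combines LORs corrected by adding 1/2 to every cell (as described for the simulations) with weights ñ i computed from the unmodified arm sizes. Under those assumptions the weighted mean comes out at about −0.558, in line with the SSW KD row of Table 3.

```python
import math

# (y_T, y_C, n_T, n_C) for the nine trials in Table 2
trials = [
    (14, 14, 131, 136), (21, 17, 385, 134), (14, 24, 57, 48),
    (6, 18, 38, 40), (12, 35, 1011, 760), (138, 175, 1370, 1336),
    (15, 20, 506, 524), (6, 2, 108, 103), (65, 40, 153, 102),
]

def log_odds_ratio(y_t, y_c, n_t, n_c, add=0.0):
    """Log-odds-ratio, optionally adding `add` to each of the four cells."""
    return (math.log((y_t + add) / (n_t - y_t + add))
            - math.log((y_c + add) / (n_c - y_c + add)))

def effective_size(n_t, n_c):
    """Effective sample size n_T * n_C / (n_T + n_C)."""
    return n_t * n_c / (n_t + n_c)

theta_hat = [log_odds_ratio(*t) for t in trials]            # Table 2, theta-hat column
n_eff = [effective_size(t[2], t[3]) for t in trials]        # Table 2, n-tilde column

# SSW point estimate: effective-sample-size-weighted mean of corrected LORs.
theta_corrected = [log_odds_ratio(*t, add=0.5) for t in trials]
theta_ssw = sum(w * th for w, th in zip(n_eff, theta_corrected)) / sum(n_eff)

for i, (th, ne) in enumerate(zip(theta_hat, n_eff), start=1):
    print(i, round(th, 3), round(ne, 3))
print("SSW estimate:", round(theta_ssw, 3))
```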

Table 3 provides the point estimates of the between‐study variance and the point estimates and CIs for the overall log‐odds‐ratio and the overall odds ratio; and Table 4 shows the point estimates and CIs for the between‐study variance. DL has the lowest estimate of τ2, 0.230, followed by the GLMM estimates at 0.254 to 0.264, and KD gives the highest estimate, 0.392. MP is second highest at 0.386. QP provides the longest CI for τ2, with length 2.130, and KD the second longest at 1.875, whereas BJ is considerably shorter at 1.384, and NCHGN has a very short interval with a length of just 0.667.

Table 3. Meta‐analysis of diuretics in pre‐eclampsia
Model   Method    τ̂2      θ̂        L        U        Length   OR      OR L    OR U
FEM     –         –       −0.398   −0.573   −0.223   0.350    0.672   0.564   0.800
REM     DL        0.230   −0.517   −0.916   −0.117   0.799    0.596   0.400   0.889
REM     HKSJ DL   –       −0.517   −1.061   0.028    1.089    0.596   0.346   1.028
REM     REML      0.300   −0.518   −0.956   −0.080   0.876    0.596   0.384   0.923
REM     J         0.329   −0.518   −0.971   −0.065   0.906    0.596   0.379   0.937
REM     MP        0.386   −0.518   −0.998   −0.037   0.961    0.596   0.369   0.963
REM     KD        0.392   −0.507   −0.987   −0.027   0.960    0.602   0.373   0.973
REM     HKSJ KD   0.392   −0.507   −1.054   0.040    1.094    0.602   0.348   1.040
REM     SSW KD    0.392   −0.558   −1.337   0.221    1.558    0.572   0.263   1.247
GLMM    FIM       0.254   −0.513   −0.923   −0.104   0.819    0.599   0.398   0.901
GLMM    RIM       0.264   −0.516   −0.930   −0.102   0.828    0.597   0.395   0.903
GLMM    NCHGN     0.260   −0.513   −0.927   −0.100   0.827    0.599   0.396   0.905
  • Note: Point estimates of the between‐study variance τ2 and point estimates and confidence intervals for the overall log‐odds‐ratio (θ) and the overall odds ratio (OR). L and U are the lower and upper limits of the 95% confidence intervals.
  • Abbreviations: DL, DerSimonian‐Laird; FEM, fixed‐effect model; FIM, fixed‐intercept model; GLMM, generalized linear mixed model; HKSJ DL, Hartung‐Knapp‐Sidik‐Jonkman DL; KD, Kulinskaya‐Dollinger; MP, Mandel‐Paule; NCHGN, Noncentral‐hypergeometric‐normal model; REM, random‐effects model; REML, restricted maximum likelihood; RIM, random‐intercept model.
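The FEM and DL rows of Table 3 can be reproduced from the counts in Table 2 with the textbook inverse-variance and DerSimonian‐Laird moment formulas. These formulas are not restated in this section, so the sketch below should be read as our assumption about the standard calculation rather than as the authors' code. No table has a zero cell, so no continuity correction is applied.

```python
import math

# (y_T, y_C, n_T, n_C) for the nine trials in Table 2
trials = [
    (14, 14, 131, 136), (21, 17, 385, 134), (14, 24, 57, 48),
    (6, 18, 38, 40), (12, 35, 1011, 760), (138, 175, 1370, 1336),
    (15, 20, 506, 524), (6, 2, 108, 103), (65, 40, 153, 102),
]

theta, var = [], []
for y_t, y_c, n_t, n_c in trials:
    theta.append(math.log(y_t / (n_t - y_t)) - math.log(y_c / (n_c - y_c)))
    var.append(1 / y_t + 1 / (n_t - y_t) + 1 / y_c + 1 / (n_c - y_c))

def pooled(theta, var, tau2=0.0):
    """Inverse-variance-weighted mean and its standard error."""
    w = [1 / (v + tau2) for v in var]
    est = sum(wi * ti for wi, ti in zip(w, theta)) / sum(w)
    return est, math.sqrt(1 / sum(w))

def dl_tau2(theta, var):
    """DerSimonian-Laird moment estimator, truncated at zero."""
    w = [1 / v for v in var]
    fe, _ = pooled(theta, var)
    q = sum(wi * (ti - fe) ** 2 for wi, ti in zip(w, theta))   # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(theta) - 1)) / c)

z = 1.959964  # 97.5% quantile of the standard normal distribution
theta_fe, se_fe = pooled(theta, var)
tau2_dl = dl_tau2(theta, var)
theta_dl, se_dl = pooled(theta, var, tau2_dl)

print("FEM:", round(theta_fe, 3), round(theta_fe - z * se_fe, 3), round(theta_fe + z * se_fe, 3))
print("DL tau^2:", round(tau2_dl, 3))
print("DL: ", round(theta_dl, 3), round(theta_dl - z * se_dl, 3), round(theta_dl + z * se_dl, 3))
```

The DL interval is wider than the FEM interval solely because τ̂2 is added to every within-study variance before weighting.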
Table 4. Meta‐analysis of diuretics in pre‐eclampsia
Model   Method      τ̂2      L       U       Length
REM     DL (QP)     0.230   0.072   2.202   2.130
REM     DL (BJ)     0.230   0.047   1.431   1.384
REM     J           0.329   0.074   1.678   1.604
REM     MP (QP)     0.386   0.072   2.202   2.130
REM     REML (PL)   0.300   0.043   1.475   1.432
REM     KD          0.392   0.087   1.962   1.875
GLMM    NCHGN       0.260
  • Note: Point estimates and confidence intervals for the between‐study variance τ2. The GLMM estimate using the NCHGN distribution is included for comparison. L and U are the lower and upper limits of the 95% confidence intervals. Methods are DL with QP and BJ confidence intervals, J, MP with QP interval, REML with PL interval, and KD.
  • Abbreviations: DL, DerSimonian‐Laird; GLMM, generalized linear mixed model; KD, Kulinskaya‐Dollinger; MP, Mandel‐Paule; PL, profile‐likelihood; REM, random‐effects model; REML, restricted maximum likelihood.

In estimating θ, all inverse‐variance‐weighted methods give similar values, ranging from −0.518 to −0.517, apart from KD, which gives −0.507; the GLMM methods also give similar values, ranging from −0.516 to −0.513. By contrast, the FEM produces the highest estimate, −0.398, and SSW produces the lowest, −0.558. All the standard inverse‐variance‐weighted methods and the GLMMs show a significant effect of diuretics on pre‐eclampsia, whereas none of the methods using t quantiles (HKSJ DL, HKSJ KD, and SSW KD) finds a significant effect.
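The link between the two scales in Table 3, and the significance statements above, can be made concrete: the odds-ratio columns are the exponentials of the log-scale values, and an effect is significant at the 5% level exactly when the 95% interval for θ excludes 0 (equivalently, when the interval for the OR excludes 1). A short sketch using a selection of rows copied from Table 3; the helper names are ours.

```python
import math

# (method, theta_hat, lower, upper) on the log-odds-ratio scale, from Table 3
rows = [
    ("FEM", -0.398, -0.573, -0.223),
    ("DL", -0.517, -0.916, -0.117),
    ("HKSJ DL", -0.517, -1.061, 0.028),
    ("KD", -0.507, -0.987, -0.027),
    ("HKSJ KD", -0.507, -1.054, 0.040),
    ("SSW KD", -0.558, -1.337, 0.221),
    ("NCHGN", -0.513, -0.927, -0.100),
]

def to_or_scale(est, lo, hi):
    """Exponentiate a log-odds-ratio estimate and its interval limits."""
    return tuple(math.exp(v) for v in (est, lo, hi))

def excludes_null(lo, hi):
    """True when the interval for theta does not contain 0."""
    return not (lo <= 0.0 <= hi)

summary = {name: (to_or_scale(est, lo, hi), excludes_null(lo, hi))
           for name, est, lo, hi in rows}

for name, (or_values, significant) in summary.items():
    print(name, [round(v, 3) for v in or_values],
          "significant" if significant else "not significant")
```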

It is rather difficult to decide, from our simulation results, which method gives the best estimates, as the sample sizes, even though rather balanced, vary greatly, from 38 to 1370 in the Treatment arm. Therefore, we ran additional simulations, in which we kept the sample sizes and the prevalence in the Control arm as in the actual nine trials, and varied θ over −0.6, −0.5, and −0.4 and τ2 over 0.20(0.05)0.45 to cover the range of possible values of these parameters. We used 10 000 repetitions at each combination of θ and τ2. Results of these simulations are shown in Figure 4.

Figure 4. Bias and coverage of estimators of the between‐study variance τ2 and of the LOR θ for the sample sizes and the p̂ iC in the pre‐eclampsia data of Collins et al,45 θ = −0.6, −0.5, −0.4 and τ2 = 0.20(0.05)0.45.

From these simulations, MP and KD are the least biased estimators of τ2; the other methods have considerable negative bias, especially DL and the GLMMs, RIM being the most biased. KD provides the best coverage of τ2, though the coverage of all methods appears to be reasonable. All methods but SSW considerably overestimate θ, though here NCHGN and FIM are the least biased, with positive biases of 0.01 to 0.03. Coverage of θ is best for SSW KD and somewhat too low for the other methods based on t quantiles. The coverage of the standard IV‐weighted methods based on normal quantiles is clearly not acceptable, and the GLMMs provide even worse coverage, probably because of their underestimation of τ2.

8 SUMMARY

Our extensive simulations demonstrate that the existing methods of meta‐analysis of (log) odds ratio often present a biased view of both the heterogeneity and the overall effect. In brief: small sample sizes are rather problematic, and meta‐analyses that involve numerous small studies are especially challenging. Because the study‐level effects and their variances are related, estimates of the overall effects are biased, and the coverage of CIs is too low, especially for small sample sizes and larger numbers of studies.

The between‐study variance, τ2, is typically estimated by generic methods which assume normality of the estimated effects θ̂ i. It is usually overestimated near zero, but the standard methods are negatively biased for larger values of τ2. Our findings agree with those of van Aert et al4 that the standard interval estimators of τ2 are often too liberal. The behavior of the PL method is especially erratic.

Therefore, we proposed and studied a new method of estimating τ2 based on the corrected approximation to the null distribution of Cochran's Q for log‐odds‐ratio developed by Kulinskaya and Dollinger.3 The KD method provides reliable interval estimation of τ2 across all values of τ2, n, and K. Point estimation of τ2 is more challenging; even though KD is better for K = 30, for small values of K it has positive bias and MP is better.

Arguably, the main purpose of a meta‐analysis is to provide point and interval estimates of an overall effect.

Usually, after estimating the between‐study variance τ2, inverse‐variance weights are used in estimating the overall effect and its variance. This approach relies on the theoretical result that, for known variances, and given unbiased estimates θ̂ i, it yields a uniformly minimum‐variance unbiased estimate of θ.

In practice, however, the true within‐study variances are unknown, and use of the estimated variances makes the inverse‐variance‐weighted estimate of the overall effect biased. These biases (and even their sign) depend on τ2 and the true value of θ, worsen for unbalanced studies, and may be considerable, even for reasonably large sample sizes such as n = 250. The coverage of the overall effect follows the same patterns because the centering of the CIs is biased. Additionally, traditional intervals using normal quantiles are too narrow; and the use of t quantiles, as in the HKSJ method, brings noticeable though not sufficient improvement.

Our additional simulations showed that the MH method and the GLMMs also do not perform well for point or interval estimation of θ.

A pragmatic approach to unbiased estimation of θ uses weights that do not involve estimated variances of study‐level estimates, for example, weights proportional to the study size n i. Hedges and Olkin,38 Hunter and Schmidt,39 and Shuster,49 among others, have proposed such weights. We use weights proportional to an effective sample size, ñ i = n iT n iC / n i; these are equivalent to the optimal inverse‐variance weights for LOR when all the probabilities are equal. Importantly, because inverse‐variance‐weighted estimators have considerable biases, little, if any, efficiency is lost by using the sample‐size‐based weights.
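The equivalence claimed for equal probabilities is easy to verify numerically. Assuming the usual large-sample variance of the LOR, 1/(n iT p(1 − p)) + 1/(n iC p(1 − p)) when both arms share event probability p, the inverse-variance weight equals p(1 − p) ñ i, so it is proportional to the effective sample size. The sketch below checks this for a few of the arm sizes in Table 2; the function names and the choice p = .1 are illustrative.

```python
def iv_weight(n_t, n_c, p):
    """Inverse of the large-sample LOR variance when both arms have event probability p."""
    v = 1 / (n_t * p * (1 - p)) + 1 / (n_c * p * (1 - p))
    return 1 / v

def effective_size(n_t, n_c):
    """Effective sample size n_T * n_C / (n_T + n_C)."""
    return n_t * n_c / (n_t + n_c)

arms = [(131, 136), (385, 134), (57, 48), (38, 40), (1011, 760)]
p = 0.1  # common event probability (illustrative)

ratios = [iv_weight(n_t, n_c, p) / effective_size(n_t, n_c) for n_t, n_c in arms]
print(ratios)        # every ratio equals p * (1 - p)
print(p * (1 - p))
```

Because the ratio is the same constant for every study, normalizing either set of weights gives identical relative weights.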

A reasonable estimator of τ2, such as MP or KD, can be used as τ̂2. Further, CIs for θ centered at θ̂_SSW with τ̂²_KD in Equation (5.2) can be used. In our simulations, this is by far the best interval estimator of θ, providing near‐nominal coverage under all studied conditions.

ACKNOWLEDGEMENTS

The work by E. Kulinskaya was supported by the Economic and Social Research Council [grant number ES/L011859/1].

CONFLICT OF INTEREST

The author reported no conflicts of interest.

DATA AVAILABILITY STATEMENT

Our full simulation results are available as an arXiv e‐print.44
