Confidence intervals for the between‐study variance in random effects meta‐analysis using generalised Cochran heterogeneity statistics

Statistical inference is problematic in the common situation in meta‐analysis where the random effects model is fitted to just a handful of studies. In particular, the asymptotic theory of maximum likelihood provides a poor approximation, and Bayesian methods are sensitive to the prior specification. Hence, less efficient, but easily computed and exact, methods are an attractive alternative. Here, methodology is developed to compute exact confidence intervals for the between‐study variance using generalised versions of Cochran's heterogeneity statistic. If some between‐study heterogeneity is anticipated, but it is unclear how much, then a pragmatic approach is to use the reciprocals of the within‐study standard errors as weights when computing the confidence interval. © 2013 The Authors. Research Synthesis Methods published by John Wiley & Sons, Ltd.


Introduction
Meta-analysis is the statistical process of pooling the results from separate studies that investigate the same treatment or issue. One major difficulty when attempting this in practice, however, is that the study results may be too disparate to assume that they measure a common underlying effect. Indeed, some between-study variation may be thought to be inevitable because of differing study practices and population mixes. The random effects model, described in detail in Section 2, is commonly used to model this between-study heterogeneity and to estimate a meaningful average effect. Here, an unobserved random effect provides any necessary additional variation in the studies' results.
Although the average effect is the parameter of primary interest, quantifying the extent of the between-study variation is also important (Higgins and Thompson, 2002; Higgins et al., 2009). In particular, if this variability appears to be considerable, then investigating the reasons for this is often encouraged (Thompson, 1994). However, the uncertainty in the estimate of the between-study variance is often large (Biggerstaff and Jackson, 2008), and a measure of this uncertainty is an important aid to inference. Here, a procedure for calculating confidence intervals is developed for this purpose.
More specifically, generalised Cochran heterogeneity statistics (DerSimonian and Kacker, 2007) are used to provide confidence intervals in a similar way to Biggerstaff and Jackson (2008), who used the more conventional heterogeneity statistic. Here, it is also proved that the calculations result in 'well-behaved' confidence intervals with the correct coverage. Methods based on these heterogeneity statistics are, however, not the most efficient (Jackson et al., 2010). Likelihood-based methods are asymptotically efficient, but these present problems. For example, inferences from the asymptotic theory of maximum likelihood (Hardy and Thompson, 1996; Biggerstaff and Tweedie, 1997) cannot be expected to be accurate in situations where there are a small number of studies, as is commonly the case. Small sample correction methods, such as those suggested for the average effect (Noma, 2011; Guolo, 2012), could be usefully employed to provide more accurate inference, but finding accurate likelihood-based methods for variance components with very few observations is inevitably challenging. Furthermore, inferences from Bayesian analyses are sensitive to the prior specification (Higgins et al., 2009; Lambert et al., 2005; Pullenayegum, 2011). In situations where an informative prior distribution for the between-study variance is available, this could be used, but the resulting analysis requires more assumptions than conventional meta-analyses. A Bayesian analysis using informative prior distributions will also be open to immediate criticism by those who do not find the priors plausible.
Less efficient, but exact and easily computed, methods with good frequentist properties are therefore a desirable alternative. The Q profile method has recently been developed for this purpose (Knapp et al., 2006; Viechtbauer, 2007), where Cochran-type heterogeneity statistics are used with weights related to the total (within-study plus the unknown between-study) variances. This gives rise to a pivotal statistic that follows a χ² distribution and that may be inverted to give confidence sets. A similar approach was also suggested by Tian (2008) for normally distributed data, and Bowden et al. (2011) present some closely related ideas. Here, however, an alternative procedure is developed where the weights are instead fixed constants. As demonstrated in a simulation study in Section 4, this can provide shorter exact confidence intervals than the alternatives. The methods applied in this paper provide exact confidence intervals under the assumptions of the random effects model for meta-analysis, where this exactness requires normally distributed study outcomes with within-study variances that are treated as fixed and known. Because all the methods are exact under these assumptions, the lengths of the confidence intervals will be used to determine which method provides the most informative inference and so makes best use of the data.
The rest of the paper is set out as follows. In Section 2, the random effects model for meta-analysis and Cochran's conventional heterogeneity statistic are described. In Section 3, DerSimonian and Kacker's generalised heterogeneity statistic is presented, and all the properties needed to ensure that it can be used to provide confidence intervals for the between-study variance are derived. In Section 4, a simulation study is performed that compares the proposed method with the Q profile method. Three variations of the methods are applied to some example datasets in Section 5, and the paper concludes with a short discussion in Section 6.

The univariate random effects model and Cochran's heterogeneity statistic
The random effects model assumes that the outcome from each of n studies, y_i for i = 1, 2, . . ., n, may be modelled using the equation

y_i = μ + u_i + e_i, where u_i ~ N(0, τ²), e_i ~ N(0, σ_i²),

and all u_i and e_i are mutually independent. It is conventional in the meta-analysis setting to assume that the within-study variances σ_i² are fixed and known, but they are estimated in practice. This convention is followed here, but methodology that reflects the fact that the σ_i² are estimated has also been developed (Malzahn et al., 2000). The random variable u_i denotes the random, study-specific deviation from the mean effect, and the parameter τ² represents the between-study variance: τ² > 0 reflects underlying study differences in a formal sense, and if τ² = 0, then all studies have the same underlying average effect μ, providing a fixed effect model. Cochran's heterogeneity or Q statistic is frequently used in conjunction with this model and is conventionally written as the weighted sum of squares

Q = Σ_{i=1}^{n} w_i (y_i − ŷ)²,   (1)

where w_i = σ_i^{−2} and ŷ = Σ_{i=1}^{n} w_i y_i / Σ_{i=1}^{n} w_i. The random effects model treats the σ_i² as fixed constants, rather than random quantities, and hence any fixed function of the σ_i² is also treated as a constant by this model. The conventional choice for this function is the reciprocal function. Estimation of τ², and then inference for μ, may be performed as described by DerSimonian and Kacker.
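As a small numerical sketch (in Python rather than the paper's supplementary R code, and with entirely hypothetical outcomes and within-study variances), Cochran's Q with the conventional weights w_i = σ_i^{−2} can be computed in a few lines:

```python
import numpy as np

def cochran_q(y, s2):
    """Cochran's Q: weighted sum of squares about the fixed effect estimate."""
    w = 1.0 / s2                           # conventional weights w_i = 1 / s_i^2
    y_hat = np.sum(w * y) / np.sum(w)      # inverse-variance weighted mean
    return np.sum(w * (y - y_hat) ** 2)

y = np.array([0.12, 0.41, -0.08, 0.55, 0.20])   # hypothetical study outcomes
s2 = np.array([0.04, 0.09, 0.06, 0.10, 0.05])   # hypothetical within-study variances
q = cochran_q(y, s2)                            # large values suggest heterogeneity
```

Under the fixed effect model (τ² = 0), Q follows a χ² distribution with n − 1 degrees of freedom, which is the basis of the usual test for heterogeneity.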

A generalisation of Biggerstaff and Jackson's result
In this section, a generalisation of the result from Biggerstaff and Jackson (2008) is proved. If a_i = w_i for all i, then the generalised heterogeneity statistic reduces to Cochran's heterogeneity statistic, and Biggerstaff and Jackson's result is recovered as a special case.
Write DerSimonian and Kacker's generalised heterogeneity statistic in matrix form. Let Y be the vector containing the Y_i, and let B = A − a_+^{−1} a aᵗ, where A is the diagonal matrix containing the a_i, a is the vector containing the a_i, a_+ = Σ_i a_i and ᵗ denotes matrix transpose. The matrix representation of DerSimonian and Kacker's generalised heterogeneity statistic, Q_a, is then given by

Q_a = Yᵗ B Y.

Next, let Σ denote the variance of Y, the diagonal matrix with entries (σ_i² + τ²), and let Z denote a standard n-dimensional multivariate normal vector. Noting that Q_a is location invariant, we can write

Q_a = Zᵗ Σ^{1/2} B Σ^{1/2} Z = Zᵗ S Z,

defining S = Σ^{1/2} B Σ^{1/2}. B is symmetric and hence so is S. Following the same procedure described by Biggerstaff and Jackson (2008), and writing S in terms of its spectral decomposition, we obtain

Q_a = Σ_{i=1}^{n} λ_i(S) χ_i²(1),   (2)

where the χ_i²(1) are mutually independent chi-squared random variables with 1 degree of freedom and λ_1(S) ≥ λ_2(S) ≥ ⋯ ≥ λ_n(S) are the ordered eigenvalues of S.
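To make the algebra concrete, the following sketch (with hypothetical data and weights) checks numerically that the matrix form Yᵗ B Y equals the weighted sum of squares, and that S = Σ^{1/2} B Σ^{1/2} has one zero eigenvalue with the remaining n − 1 positive:

```python
import numpy as np

def qa_direct(y, a):
    """Q_a as the a-weighted sum of squares about the a-weighted mean."""
    ybar = np.sum(a * y) / np.sum(a)
    return np.sum(a * (y - ybar) ** 2)

def qa_matrix(y, a):
    """Q_a = y^t B y with B = A - a a^t / a_+."""
    B = np.diag(a) - np.outer(a, a) / np.sum(a)
    return y @ B @ y

y = np.array([0.12, 0.41, -0.08, 0.55, 0.20])   # hypothetical outcomes
a = np.array([5.0, 3.3, 4.1, 3.2, 4.5])         # hypothetical fixed weights
assert np.isclose(qa_direct(y, a), qa_matrix(y, a))

# eigenvalues of S = Sigma^{1/2} B Sigma^{1/2} for some tau^2
s2 = np.array([0.04, 0.09, 0.06, 0.10, 0.05])
tau2 = 0.1
half = np.sqrt(s2 + tau2)                       # Sigma^{1/2} is diagonal
B = np.diag(a) - np.outer(a, a) / np.sum(a)
lam = np.linalg.eigvalsh(half[:, None] * B * half[None, :])  # ascending order
# the smallest eigenvalue is zero; the other n - 1 are positive
assert abs(lam[0]) < 1e-10 and np.all(lam[1:] > 0)
```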

The parameters on which the distribution of Q_a depends
The λ_i(S) are functions of τ² through their dependence on S but do not depend on μ. The eigenvalues λ_i(S) also depend upon the σ_i² and the a_i, but these are taken to be fixed constants: the σ_i² are treated as fixed constants in the random effects model, and the a_i are fixed values, possibly a fixed function of the σ_i², chosen by the analyst prior to the analysis. Hence, the only unknown parameter on which the distribution of Q_a depends is τ². Next, some other important properties possessed by Q_a are proven.
3.3. Property 1: Q_a is distributed as a positive linear combination of χ²(1) random variables with exactly (n − 1) positive coefficients
Biggerstaff and Jackson (2008) proved that the conventional Q statistic (1) is distributed as a positive linear combination of χ²(1) random variables with at most (n − 1) positive coefficients. The Appendix contains a proof that this is also the case for the generalised heterogeneity statistic Q_a, where it is further shown that this statistic is distributed as such a linear combination with exactly (n − 1) positive coefficients.
3.4. Property 2: The cumulative distribution function of Q_a is a continuous and strictly decreasing function of τ²
Biggerstaff and Jackson (2008) implicitly assumed this property when obtaining confidence intervals using the conventional Q statistic.
Proof: The entries of S are continuous in τ² and hence so are its eigenvalues. Hence, from the form of (2), the cumulative distribution function of Q_a is continuous in τ², and all that remains to be shown is that it is strictly decreasing in τ².
The eigenvalues in (2) are those of Σ^{1/2} B Σ^{1/2}, where Σ depends on τ² and B does not. Suppose that we also consider a larger value of τ², denoting the larger variance of Y as ΣM, where M is a diagonal matrix whose diagonal entries are all greater than one, so that λ_i(M) > 1 for all i. The eigenvalues in (2) for the larger value of τ² are those of Σ^{1/2} M^{1/2} B Σ^{1/2} M^{1/2}, which are equal to those of SM. Then the second inequality in Equation (5) in the Appendix, with C = S and D = M, immediately shows that all the eigenvalues that provide coefficients in (2) are greater when considering the larger τ², and so are strictly increasing in τ². Hence, the cumulative distribution function of Q_a is strictly decreasing in τ².
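Property 2 can also be checked numerically. In this illustrative sketch (hypothetical within-study variances, with the reciprocals of the standard errors as weights), every positive eigenvalue of S grows as τ² grows, so Q_a becomes stochastically larger and its distribution function falls:

```python
import numpy as np

def eigs(s2, a, tau2):
    """Ascending eigenvalues of S = Sigma^{1/2} B Sigma^{1/2}."""
    B = np.diag(a) - np.outer(a, a) / np.sum(a)
    half = np.sqrt(s2 + tau2)
    return np.linalg.eigvalsh(half[:, None] * B * half[None, :])

s2 = np.array([0.04, 0.09, 0.06, 0.10, 0.05])   # hypothetical variances
a = 1.0 / np.sqrt(s2)                           # reciprocal standard error weights

lam_small = eigs(s2, a, 0.05)
lam_large = eigs(s2, a, 0.50)
# the n - 1 positive coefficients in (2) strictly increase with tau^2
assert np.all(lam_large[1:] > lam_small[1:])
```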

Confidence intervals for t 2 by test inversion
Now that the necessary properties of Q_a have been established, the test inversion procedure described by Casella and Berger (2002), Section 9.2.1, will be used to construct confidence sets with coverage probability 1 − α₁ − α₂, where α₁ + α₂ denotes the significance level associated with the two-tailed test. The proposed method essentially follows the method proposed by Biggerstaff and Jackson (2008) when using Q_a with the conventional weights a_i = w_i.
We accept the null hypothesis H₀: τ² = τ₀², and thus τ₀² lies in the corresponding confidence set, if and only if

P(Q_a ≥ q_a; τ² = τ₀²) ≥ α₁   (3)

and

P(Q_a ≤ q_a; τ² = τ₀²) ≥ α₂.   (4)

Conceptually, (3) ensures that τ₀² is not too small to result in the observed q_a and so provides a lower bound; (4) provides an upper bound. We reject the null hypothesis H₀: τ² = τ₀² using (3) if q_a is greater than the (1 − α₁) quantile of Q_a. Similarly, we reject the null hypothesis using (4) if q_a is less than the α₂ quantile. Because α₁ + α₂ < 1, it is impossible for (3) and (4) to simultaneously reject the null hypothesis. Hence, the significance level of the test is α₁ + α₂, as required when producing a confidence set with coverage probability 1 − α₁ − α₂.

Obtaining confidence intervals numerically
The cumulative distribution function of Q_a, P(Q_a ≤ q_a; τ²), can be evaluated using the algorithm for a positive linear combination of χ² random variables proposed by Farebrother (1984). The CompQuadForm R package implementation of this algorithm was used throughout.
If P(Q_a ≤ q_a; τ² = 0) < α₂, then no τ₀² satisfies (4); τ² is non-negative and the cumulative distribution function of Q_a is decreasing in τ². Hence, the strict implementation of (3) and (4) provides a null set for the confidence set for τ². This is analogous to the possible null confidence set that the Q profile method proposed by Viechtbauer (2007) may also provide. However, this only occurs when the observed q_a is very small, so instead of providing a null set for the confidence set in such instances, it is preferable to report the interval [0, 0] for τ². If τ² > 0, as is typically thought to be the case, then this does not increase the coverage probability of the confidence set. If τ² = 0, then the coverage probability of the confidence set increases by α₂ and the confidence set is conservative. This convention can also be adopted when using the Q profile method, and this was suggested by Knapp et al. (2006). Alternative interpretations of the null set are possible in application, however, such as 'the data appear to be highly homogeneous' or even 'the interval estimation fails'.
If instead P(Q_a ≤ q_a; τ² = 0) ≥ α₂ then, because the cumulative distribution function P(Q_a ≤ q_a; τ²) is continuous and strictly decreasing in τ², we can use any simple numerical method to find the value τ_u² that satisfies

P(Q_a ≤ q_a; τ² = τ_u²) = α₂.

Then all τ² in the interval [0, τ_u²] satisfy Equation (4), and all other values of τ² do not. Next, if P(Q_a ≥ q_a; τ² = 0) ≥ α₁, then all τ² in the interval [0, ∞) satisfy (3) and we define τ_l² = 0; Biggerstaff and Jackson (2008) refer to this as 'truncating' the lower confidence bound to zero. Otherwise, we can use any simple numerical method to find the value τ_l² that satisfies

P(Q_a ≥ q_a; τ² = τ_l²) = α₁,

and then all τ² in the interval [τ_l², ∞) satisfy (3) and all other values do not. The intersection of the intervals [τ_l², ∞) and [0, τ_u²] provides the confidence set. It is easily shown that the resulting confidence set is the interval [τ_l², τ_u²]. First, note that if the null confidence set has been interpreted as [0, 0], then trivially τ_u² ≥ τ_l². Otherwise, τ_u² > 0, and if the lower confidence bound has been truncated to zero then, again trivially, τ_u² ≥ τ_l². Finally, if the lower bound is not truncated, then when solving P(Q_a ≥ q_a; τ² = τ_l²) = α₁, and noting that α₁ + α₂ < 1, we have that

P(Q_a ≤ q_a; τ² = τ_l²) = 1 − α₁ > α₂ = P(Q_a ≤ q_a; τ² = τ_u²).

Because the cumulative distribution function of Q_a is strictly decreasing in τ², we must have τ_l² < τ_u². Hence, τ_u² ≥ τ_l² in all cases.
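The whole procedure can be sketched end to end. The paper evaluates P(Q_a ≤ q_a; τ²) with Farebrother's algorithm via the CompQuadForm R package; the Python sketch below instead uses a Monte Carlo estimate with common random numbers (so the estimated distribution function is still decreasing in τ²) and entirely hypothetical five-study data. It therefore illustrates the test inversion rather than reproducing the paper's implementation:

```python
import numpy as np
from scipy.optimize import brentq

def qa_stat(y, a):
    """Observed generalised Cochran statistic q_a."""
    ybar = np.sum(a * y) / np.sum(a)
    return np.sum(a * (y - ybar) ** 2)

def eigs(s2, a, tau2):
    """Eigenvalues of S = Sigma^{1/2} B Sigma^{1/2} at a given tau^2."""
    B = np.diag(a) - np.outer(a, a) / np.sum(a)
    half = np.sqrt(s2 + tau2)
    return np.linalg.eigvalsh(half[:, None] * B * half[None, :])

# Common random numbers: the same chi-squared draws are reused at every tau^2,
# so the Monte Carlo CDF below is decreasing in tau^2, as the theory requires.
Z2 = np.random.default_rng(1).standard_normal((200_000, 5)) ** 2  # n = 5 studies

def cdf(q, s2, a, tau2):
    """Monte Carlo estimate of P(Q_a <= q; tau^2), standing in for Farebrother."""
    return np.mean(Z2 @ eigs(s2, a, tau2) <= q)

def ci_tau2(y, s2, a, alpha1=0.025, alpha2=0.025, upper=100.0):
    q = qa_stat(y, a)
    if cdf(q, s2, a, 0.0) < alpha2:
        return 0.0, 0.0                               # null set, reported as [0, 0]
    hi = brentq(lambda t: cdf(q, s2, a, t) - alpha2, 0.0, upper)     # condition (4)
    if 1.0 - cdf(q, s2, a, 0.0) >= alpha1:
        lo = 0.0                                      # lower bound truncated at zero
    else:
        lo = brentq(lambda t: 1.0 - cdf(q, s2, a, t) - alpha1, 0.0, upper)  # (3)
    return lo, hi

# hypothetical five-study example using weights a_i = 1 / s_i (p = 1/2, x = 0)
y = np.array([0.12, 0.41, -0.08, 0.55, 0.20])
s2 = np.array([0.04, 0.09, 0.06, 0.10, 0.05])
lo, hi = ci_tau2(y, s2, 1.0 / np.sqrt(s2))
```

With CompQuadForm's farebrother (or imhof) routine in place of the Monte Carlo step, the same root-finding scheme yields the exact intervals described above.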

A simulation study
To assess the performance of the proposed method, a simulation study was performed with n = 5 studies. This represents the common situation where there is just a handful of studies and where the proposed method may be anticipated to be especially valuable. If instead there were just two or three studies, any form of estimation for the random effects model would be extremely challenging, and if there were many more studies, then the sample size would be sufficient, for example, to make accurate inferences using the profile likelihood in the way proposed by Hardy and Thompson (1996).
Weights of the form a_i = 1/(σ_i² + x)^p were investigated when applying the proposed method, where x took the five values of τ² described previously, in conjunction with p = 0, 0.5, 1. Provided the analyst specifies the fixed values of x and p to be used prior to examining the data or performing the analysis, the resulting weights are fixed constants, because the σ_i² are treated as such in the random effects model. The repeated sampling properties of confidence intervals under the assumptions of the random effects model, where x and p depend in any way on an examination of the data, or any other random variables, are much harder to evaluate, but one possibility is explored in Section 4.2. Weights of the form a_i = 1/(σ_i² + x)^p could also be used to make inferences for μ in the way described by DerSimonian and Kacker (2007), but here the focus is the proposed method for constructing confidence intervals for τ². The three values of p provide an unweighted analysis, and also weights that are related to the within-study standard errors and the corresponding variances. Hence, these values of p are intuitively appealing values to consider; x = 0 and p = 1 correspond to the conventional weights w_i = σ_i^{−2}. If p = 0, then an unweighted heterogeneity statistic is used irrespective of x; unweighted methods for meta-analysis have previously been proposed (Bonett, 2008; Bonett, 2009; Shuster, 2010), and x = 0 was, completely arbitrarily, used in conjunction with p = 0. Setting x = 0 and p = 0.5 means that the reciprocals of the studies' within-study standard errors are used as weights. If, for example, an a priori value of τ² is thought plausible, then setting x equal to this means that the weights are related to the total study variances thought plausible before examining the data, which are intuitively appealing weights to use.
In situations where a suitable positive value of x is difficult or impossible to state in advance, then this should be set to zero to use weights that are more akin to the conventional ones.
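The weight families above are simple to generate; the sketch below (with hypothetical variances) just spells out the three choices of p with x = 0:

```python
import numpy as np

def gen_weights(s2, x=0.0, p=0.5):
    """a_i = 1 / (s_i^2 + x)^p; x and p must be fixed before seeing the data."""
    return 1.0 / (s2 + x) ** p

s2 = np.array([0.04, 0.09, 0.06, 0.10, 0.05])      # hypothetical variances
unweighted   = gen_weights(s2, x=0.0, p=0.0)       # all weights equal to one
recip_se     = gen_weights(s2, x=0.0, p=0.5)       # 1 / s_i, the pragmatic choice
conventional = gen_weights(s2, x=0.0, p=1.0)       # 1 / s_i^2, Cochran's weights
```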
Confidence intervals with a 1 = a 2 = 0.025, and hence, 95% coverage probability, were used throughout. Putting an equal probability of 0.025 in each tail follows common practice, but other possibilities are returned to in the discussion.
To compare the proposed procedure to the established Q profile method, the R package metafor was used to apply Viechtbauer's implementation of this. This method was chosen because it has become popular and, like the proposed method, only requires that the random effects model is assumed for the outcome data used in analysis. For example, neither the proposed method nor the Q profile method requires that the raw data follow a normal distribution.
The results of the simulation study are shown in Table 1, where the mean and the standard deviation of the lengths of the resulting confidence intervals are shown. Here, the proposed method with each set of weights, and the Q profile method, were applied to the same 40,000 simulated datasets for each τ²; μ = 0 was used when simulating data, but this is immaterial. A different random seed was used for each value of τ². The convention where null confidence sets are interpreted as intervals of [0, 0] was adopted throughout. The empirical coverage probabilities of the 95% confidence intervals are also shown in Table 1 and are within Monte Carlo error of 0.95 if τ² > 0 and of 0.975 if τ² = 0, as the theory predicts. One conclusion from Table 1 is that the confidence intervals are in general very wide, reflecting the considerable uncertainty in τ² for examples with just five studies. The results for the best performing method (shortest average confidence interval) for each value of τ² are highlighted in bold in Table 1.
The results in Table 1 show that as τ² increases, the average lengths of all confidence intervals, and the variation in these lengths, also increase. In terms of the previously proposed methods, the Q profile method provides shorter confidence intervals than intervals based on Cochran's conventional heterogeneity statistic ('B and J' in Table 1) when there is considerable heterogeneity. However, this requires I² > 0.5 (τ² > 0.069), and otherwise Biggerstaff and Jackson's method is preferable.
Intuition suggests that using p = 1 in conjunction with a value of x that is appropriate in the context of the meta-analysis in question will ensure that the weights used most accurately reflect the true variance structure in the data and hence provide shorter confidence intervals. This intuition is confirmed in Table 1, where the method that provides the shortest confidence intervals is in every case the one that uses the inverse of the true total study variances (p = 1 and x = τ²). Cochran's heterogeneity statistic can therefore be seen as quite an extreme case, where the weights incorporate no between-study variance whatsoever. In practice, using values of x that are thought a priori close to the true value can be expected to perform better than sticking to the conventional weights. Those more comfortable with the I² statistic could specify a plausible value and convert this to a value of x to use in the weights. However, this suggestion requires some a priori knowledge about the likely extent of the between-study variation, which the author does not possess for the examples that follow, and so may be difficult to implement in practice.
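For an analyst who thinks in terms of I², the conversion to a value of x is a one-liner: taking the typical within-study variance σ̃² as fixed, I² = τ²/(τ² + σ̃²) inverts to τ² = σ̃² I²/(1 − I²). The sketch below uses 0.069 as the typical variance, consistent with the I² = 0.5 and τ² = 0.069 correspondence quoted above for this simulation study; both numbers are specific to that setting.

```python
def tau2_from_i2(i2, typical_s2):
    """Invert I^2 = tau^2 / (tau^2 + typical within-study variance)."""
    return typical_s2 * i2 / (1.0 - i2)

# an analyst who believes I^2 is about 0.5 would set x close to 0.069 here
x = tau2_from_i2(0.5, 0.069)
```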
The results for p = 1/2 are interesting because, for example, p = 1/2 and x = 0 outperforms the Q profile method unless the heterogeneity is very severe, and only increases the length of the intervals slightly compared to Biggerstaff and Jackson's method when the heterogeneity is mild. Weighting studies by the reciprocal of their within-study standard errors in this way, rather than by their variances as convention dictates, appears to provide a sensible and viable option when there is little a priori knowledge about the extent of heterogeneity but some is anticipated. This weights the studies more equally than the usual weights and so better reflects the true variance structure when heterogeneity is present. Hence, it makes intuitive sense to consider this alternative set of weights under these circumstances. This proposal is compared with the established alternatives using some real datasets in Section 5.

Table 1. Results from the simulation study. 40,000 simulated datasets were produced for each value of τ². The average lengths of the 95% confidence intervals are shown with their standard deviations in parentheses. The results for the procedure that provides, on average, the shortest intervals are highlighted in bold font for each value of τ². 'B and J' indicates that the conventional weights, and so the procedure suggested by Biggerstaff and Jackson (2008), are used. The empirical coverage probabilities of the 95% confidence intervals are also shown.

Additional simulation studies
Further simulation studies were performed, again using 40,000 simulated datasets for each scenario, where different random seeds were used for each combination of n and τ². Sample sizes of n = 10, n = 20 and n = 40, and the same values of τ² as in Table 1, were used in these additional simulation studies. Within-study variances were obtained in the same manner as before, as equally spaced quantiles, from 0% to 100%, of the distribution suggested by Brockwell and Gordon (2007). The results from these simulation studies reinforce the previous conclusions. In each case, Biggerstaff and Jackson's method outperformed (gave shorter confidence intervals than) the Q profile method when the heterogeneity was mild, but when the heterogeneity was large, these roles were reversed. In every instance, p = 1 in conjunction with x equal to the true between-study variance performed very well, as anticipated. Perhaps most importantly, the choice of p = 1/2 and x = 0 seemed a reasonable compromise between Biggerstaff and Jackson's method and the Q profile method, exactly as it did for n = 5. The results from these additional simulation studies are available in the supplementary materials that accompany the paper.

Weighting by the reciprocal of the estimated total study variances
Weights of the form a_i = 1/(σ_i² + τ²), so that the weights are the reciprocals of the true total study variances, were found to perform well in the simulation study. However, because τ² is unknown, it is not entirely fair to compare the use of these weights with the established methods. It is however tempting to use weights equal to the reciprocals of the estimated total study variances, so that a_i = 1/(σ_i² + τ̂²). These weights are easily computed and straightforward to use in application, but their use invalidates the theory that ensures that exact confidence intervals are obtained, because the weights a_i are now random variables.
Despite this theoretical objection, results were also obtained, using the simulated datasets used to produce Table 1 and weights of the form a_i = 1/(σ_i² + τ̂²), where τ̂² is the usual estimate originally proposed by DerSimonian and Laird (1986). The average lengths of the resulting confidence intervals (with standard deviations in parentheses, as in Table 1) were 0.872 (0.862), 1.182 (1.050), 1.564 (1.306), 2.751 (2.136) and 11.541 (8.396) for τ² = 0, 0.029, 0.069, 0.206 and 1.302, respectively. Furthermore, the empirical coverage probabilities of the nominal 95% confidence intervals for these same values of τ² were 0.978, 0.953, 0.953, 0.954 and 0.948. Because the Monte Carlo standard error associated with estimating a probability of 0.95 with a sample size of 40,000 is √(0.95 × 0.05/40000) ≈ 0.001, these results provide evidence that this procedure fails to provide the nominal coverage probability exactly, but also that this departure from the nominal level is too small to be of any practical concern. This method also appears to perform well compared with the alternatives, in that it provides relatively short 95% confidence intervals; in particular, it appears to provide shorter confidence intervals than using the weights proposed for general use in Section 4.1 (p = 1/2 and x = 0; seventh row of Table 1). These conclusions are supported by the results using n = 10, 20 and 40; all the results for these larger sample sizes are available in the supplementary materials that accompany this paper.
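For completeness, the DerSimonian and Laird moment estimate used to form these random weights is easy to compute. The five outcomes below are hypothetical and deliberately heterogeneous so that the estimate is not truncated at zero:

```python
import numpy as np

def dersimonian_laird(y, s2):
    """The usual moment estimator of tau^2 (DerSimonian and Laird, 1986)."""
    w = 1.0 / s2
    y_hat = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_hat) ** 2)                 # Cochran's Q
    denom = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (q - (len(y) - 1)) / denom)      # truncated at zero

y = np.array([0.10, 0.90, -0.30, 0.80, 0.20])        # hypothetical outcomes
s2 = np.array([0.04, 0.09, 0.06, 0.10, 0.05])        # hypothetical variances
tau2_hat = dersimonian_laird(y, s2)
a_hat = 1.0 / (s2 + tau2_hat)   # reciprocal estimated total variances, now random
```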
Further investigation is required before this procedure can be safely recommended for general use, because the random weights invalidate the theory, but this simulation study suggests that using the reciprocals of the estimated total study variances as weights, which are the conventional weights for making inferences about μ (DerSimonian and Laird, 1986; Jackson et al., 2010), may also prove to be a good option for obtaining confidence intervals for τ².

Examples
Four examples were analysed by Biggerstaff and Jackson (2008), and these will also be used here. See Biggerstaff and Jackson for full details; briefly, the examples use meta-analytic data from studies that examine: (i) aspirin and heart attack; (ii) diuretic and preeclampsia; (iii) glycerol for acute stroke; and (iv) sclerotherapy and cirrhosis. Three methods for obtaining exact confidence intervals will be applied to each example. First, the established Q profile method (first row of Table 1) and the method proposed by Biggerstaff and Jackson using Cochran's Q statistic (second row of Table 1) will be used. Finally, generalised Cochran heterogeneity statistics will be used, where the weights are the reciprocals of the within-study standard errors (seventh row of Table 1). This follows the suggestion in Section 4 for the situation where some heterogeneity is anticipated but it is uncertain how much. Now that we have real data, however, the exactness of the confidence intervals is brought into question, because the random effects model only provides an approximation for data such as these.
The results are shown in Table 2. In three of the four examples, the proposed method (weighting by the reciprocal of the within-study standard errors) provides shorter confidence intervals than the method used by Biggerstaff and Jackson. This reflects the fact that statistical heterogeneity is present in all four datasets (Biggerstaff and Jackson, 2008). However, in three of the four examples, the Q profile method provides the shortest confidence interval. This observation may appear to contradict the finding from the simulation study that this method is generally outperformed by the proposed method. However, the Q profile method provides a very much longer confidence interval for the Diuretic data, which gives an indication of the occasionally much longer confidence intervals that this method has been found to provide.
The confidence intervals are generally in good agreement for all four examples and are wide, as anticipated, given the difficulty in accurately estimating the between-study variance in examples where there are few studies. The reasons for the more specific differences between the results using the three methods are hard to explain however, which is perhaps inevitable given the considerable heterogeneity present in these examples and the imprecision of estimates of t 2 in meta-analyses such as these.

Discussion
Generalised Cochran heterogeneity statistics provide a convenient method for computing exact confidence intervals for the between-study variance parameter in a random effects meta-analysis. They incorporate an existing method as a special case, and by choosing more appropriate weights than the conventional ones, shorter confidence intervals may be obtained. The only potential numerical difficulty is the use of Farebrother's algorithm, but the R implementation is convenient and fast. R code that produces confidence intervals in a few seconds is available in the supplementary materials that accompany the paper.
In the simulation study, α₁ = α₂ = 0.025 was used, so that the convention of using equal probabilities in both tails has been adopted, but shorter confidence intervals with the same coverage might be possible using α₂ ≠ α₁, and it is left as an open question whether this should be considered more often in practice. However, those who do not consider τ² = 0 plausible may prefer to calculate one-sided confidence intervals that reflect this and set α₁ = 0. Furthermore, given the very wide confidence intervals that were obtained in the simulation study and for the examples, confidence intervals with lower coverage, 90% or even 80%, might be deemed preferable in order to obtain tighter intervals.
A wide variety of methods for estimating τ² are available (Sidik and Jonkman, 2007), and the confidence intervals developed here are built upon just one of these, the method proposed by DerSimonian and Kacker. Methods might also be developed that build upon the alternatives, however, and this could form the subject of future work. Methods for constructing confidence intervals for I² follow naturally by taking the typical within-study variance proposed by Higgins and Thompson (2002) as fixed, so that I² may be interpreted as a monotonic function of τ². Generalised Cochran heterogeneity statistics could also be used to test the null hypothesis that τ² = 0, in a similar way to the conventional statistic, but this possibility is not examined here. This is because using them is more complicated and no gain in power is anticipated, but this too could provide an avenue for further work.
The Q profile method remains a viable alternative, but it appears to be inefficient compared with the alternatives considered here unless the heterogeneity is very considerable, in which case all confidence intervals become very wide and so very little can be inferred about the extent of the heterogeneity. In any case, many would hesitate to combine very disparate results in a random effects meta-analysis. If some heterogeneity is thought plausible, but it is difficult to determine a priori how much there might be, then using generalised Cochran heterogeneity statistics with weights equal to the reciprocals of the within-study standard errors would appear to be a sensible option. The possibility of using weights equal to the reciprocals of the estimated total study variances warrants further investigation.

Table 2. Results for the four examples using three exact methods for obtaining confidence intervals for the between-study variance. For each method, the 95% confidence interval is tabulated using α₁ = α₂ = 0.025, and the width of the interval is given in square brackets.

Appendix
If C and D are square matrices of the same size, then λ_i(CD) = λ_i(DC) (Zhang, 2010; page 57). Because the rows of B sum to zero, this matrix has an eigenvalue of zero and hence so does S. This can most easily be seen by observing that λ_i(S) = λ_i(ΣB), and the premultiplication of B by Σ retains the linear dependency amongst the rows of B and hence the eigenvalue of zero. Thus λ_n(S) = 0, and the sum in Equation (2) only extends to (n − 1). It can also be seen that λ_i(S) > 0 for i < n in Equation (2). This is a consequence of the observation that λ_i(A^{−1/2} B A^{−1/2}) = λ_i(B A^{−1}) = 1 for i < n; A^{−1/2} B A^{−1/2} = S in the special case where σ_i² = a_i^{−1} and τ² = 0, and it is a standard result that Q_a ~ χ²(n − 1) when these standard weights are used and τ² = 0 (Biggerstaff and Jackson, 2008).
Because (2) shows that Q_a is a linear combination of χ²(1) random variables, an inspection of the form of the moment generating function of the χ² distribution confirms that λ_i(B A^{−1}) = 1 for i < n. If C and D are positive semi-definite Hermitian matrices, then

λ_1(D) λ_i(C) ≥ λ_i(CD) ≥ λ_n(D) λ_i(C),   (5)

where λ_1(D) and λ_n(D) are the largest and smallest eigenvalues of D, respectively (Zhang, 2010; page 274). Then the first inequality in (5), with C = B and D = A^{−1}, and noting that λ_i(A^{−1}) > 0 for all i, shows that λ_i(B) > 0 if i < n. Finally, the second inequality in (5), with C = B and D = Σ, and noting that λ_i(Σ) > 0 for all i, shows that λ_i(S) = λ_i(BΣ) > 0 for i < n. Hence, Q_a is a linear combination of χ²(1) random variables with exactly (n − 1) positive coefficients, as stated.