1. We consider the use of weighted regression when modelling data from different sites, times or studies. Our primary focus is on the coverage rate of the 95% confidence interval for the slope parameter when we have a single predictor variable. We use simulation to assess this coverage rate for both weighted and unweighted regression, across a range of scenarios likely to be encountered in ecology.
2. Our results are surprising: unweighted regression will often be more reliable than weighted regression. The well-known advantages of weighted regression are offset by having to estimate the process error variance. Although unweighted regression involves assuming that the measurement error variances are equal, the coverage rate is remarkably robust to departures from this assumption. Unweighted regression will often be more robust because it does not make use of potentially poor information on the measurement error variances. The only situation in which unweighted regression will perform poorly is when there is a strong relationship between the precision of an estimate and its leverage in the regression. We propose a simple diagnostic tool to assess when this might be the case.
3. The implications of our results are important in a management context as they indicate the benefits obtained from using a simple, readily understood approach to combining information.
In many ecological settings, we wish to combine information from different sites, times or studies to relate the true value of the parameter to one or more predictor variables. For example, we might wish to estimate the relationship between reproductive rate and an index of climate. Meta-regression, in which we combine information from different studies, is an important special case of this type of analysis (Verdú & Travaset 2005; Knowles, Nakagawa, & Sheldon 2009).
It is important to realise that there are two sources of error variation in such data: process error and measurement error. In our example, the former refers to the error variation we would encounter if we knew the exact reproductive rate each year, while the latter arises as a consequence of having to estimate reproductive rate from field data. These two sources of error variation need to be borne in mind whenever we analyse such data.
Recognition of the need to distinguish process error from measurement error is one of the reasons for the increasing popularity of hierarchical modelling, which arises naturally within the Bayesian approach to data analysis (Gelman & Hill 2007; Royle & Dorazio 2008; Gurevitch & Mengersen 2010). In our example, we would specify a model for the relationship between the true reproductive rate and the climate index (incorporating process error), as well as for the relationship between the true reproductive rate and the observed reproductive rate (incorporating measurement error). Fitting these two models simultaneously provides an effective means of recognising and allowing for the two sources of error variation.
Although hierarchical modelling has obvious benefits, we might sometimes wish to adopt a simpler approach, in which the analysis is performed in two stages (Cox 2006; Murtaugh 2007). In our example, this would involve first calculating annual estimates of reproductive rate and then fitting a regression of these against the climate index. To allow for measurement error, it would be common practice to perform a weighted regression at the second stage, with the standard errors of the annual estimates being used to determine the relevant weights (Gurevitch & Hedges 1999; Murtaugh 2007).
This two-stage approach may be necessary if the original data are not available, as is common in meta-regression. In addition, in some settings, specialist software might be required to model the measurement process, making a hierarchical modelling approach logistically difficult (see the molecular rates example discussed below). Even if all the data are available, a two-stage approach might be preferred for reasons of simplicity and transparency (Murtaugh 2007).
The purpose of this paper is to consider how such a two-stage approach should be carried out. In particular, we focus on assessing when it is preferable to use standard (unweighted) regression in the second stage, i.e. to ignore the differences in the measurement error of the individual estimates. Our motivation for this is twofold. First, we might wish to use an even simpler analysis, as long as we can still make reliable inferences (Murtaugh 2007). Second, it is possible that the use of unweighted regression might provide a more reliable analysis, as it does not require separate estimation of the process and measurement error variation. The standard recommendation in the literature is to use weighted regression (e.g. Gurevitch & Hedges 1999), and our aim is to consider the extent to which such a recommendation is merited.
We do not consider all the statistical issues that arise in meta-regression or more generally in meta-analysis, such as choice of population and parameter, sampling issues (including publication bias), and standardisation of methods across studies (Englund, Sarnelle, & Cooper 1999; Osenberg et al. 1999), as these are outside the scope of this paper. In addition, for simplicity of presentation, we primarily consider the case where it is reasonable to assume a linear relationship between the population parameter and a single predictor variable, with all the error terms being independent and normally distributed.
In the next section, we provide motivating examples and describe three methods of analysis that might be used in a two-stage approach: unweighted regression, weighted regression and ‘weighted regression ignoring the process error’. We compare these methods, using theory to outline their properties, and simulation to assess their performance in terms of the coverage rate and width of a 95% confidence interval for the slope.
Materials and methods
1. Reproductive rate in blue petrels: Chastel, Weimerskirch, & Jouventin (1995) studied the relationship between reproductive rate and body condition of blue petrels (Halobaena caerulea), using data collected from the Kerguelen Archipelago between 1986 and 1991. Figure 1 shows annual estimates of reproductive rate (fledglings per egg laid) vs. mean pre-breeding body condition (body mass divided by culmen length) for a sample of males that year. The plot suggests that there might be a negative relationship between reproductive rate and body condition. The standard errors of the estimates of reproductive rate range from 0·05 to 0·12. This difference in precision is common, and we might wish to allow for it when estimating the relationship between true reproductive rate and body condition.
2. Meta-regression of emergence time in plants: Verdú & Travaset (2005) combined information from 55 studies to assess the effect of emergence time on different components of plant fitness (survival, growth and fecundity). In doing so, they used meta-regression to allow for differences between the studies in terms of seed size, experimental conditions (field, laboratory, greenhouse or nursery), life-form (annual vs. perennial) and the time between first emergence and measurement of fitness. The meta-regression used Fisher’s z-transformation of the correlation coefficient from each study as the response variable, the standard error of which ranged from 0·011 to 0·224 (survival), 0·022 to 0·378 (growth) and 0·009 to 0·209 (fecundity).
3. Time dependency in molecular rates: Burridge et al. (2008) assessed time dependency in rates of mitochondrial DNA change by regressing estimates of the rate of change against time, for nine ‘events’ in New Zealand, dated between 7000 years and 5 million years ago. They used specialist software to estimate the rates of change (Hey & Nielsen 2006). These estimates were used in a weighted nonlinear regression to model the relationship between rate of change and time of the event; the regression involved a log transformation of the estimates, and on this scale, their standard errors ranged from 0·29 to 0·74.
Model for combining information
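Suppose we have an estimate $y_i$ of an ecological parameter $\mu_i$, for each of $k$ ‘replicates’ (sites, times or studies), and that it is reasonable to assume the hierarchical model given by

$$y_i = \mu_i + \varepsilon_i, \qquad \mu_i = \alpha + \beta x_i + \gamma_i, \qquad i = 1, \ldots, k, \tag{1}$$

where $\alpha$ is the intercept, $\beta$ is the slope parameter, $\varepsilon_i$ is the measurement error, $\gamma_i$ is the process error and $x_i$ is the predictor variable. We also assume that the $\gamma_i$ and $\varepsilon_i$ are independent, with $\gamma_i \sim N(0, \sigma_p^2)$ and $\varepsilon_i \sim N(0, \sigma_i^2)$. Thus $\sigma_p^2$ is the process error variance and the $\sigma_i^2$ are the measurement error variances. The unconditional variance of $y_i$ implied by this model is $\sigma_p^2 + \sigma_i^2$, while the variance conditional upon $\mu_i$ is $\sigma_i^2$. We assume that the first stage of the analysis has provided both the estimates $y_i$ and their estimated standard errors $\hat{\sigma}_i$.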
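As a concrete illustration of the two sources of error, the following minimal sketch draws one set of first-stage estimates from the two-level model. The function name, the intercept argument `alpha` and the rescaling of the measurement error variances to a mean of one are illustrative assumptions, not part of the original analysis (which was carried out in SAS).

```python
import random

def simulate_estimates(x, meas_var, sigma_p, alpha=0.0, beta=0.0, rng=None):
    """Draw one set of first-stage estimates y_1..y_k from the two-level model."""
    rng = rng if rng is not None else random.Random()
    y = []
    for x_i, var_i in zip(x, meas_var):
        gamma_i = rng.gauss(0.0, sigma_p)        # process error
        eps_i = rng.gauss(0.0, var_i ** 0.5)     # measurement error
        mu_i = alpha + beta * x_i + gamma_i      # true parameter for replicate i
        y.append(mu_i + eps_i)                   # estimate from the first stage
    return y

# Ten replicates with measurement error variances in the ratio 10 : 9 : ... : 1.
x = list(range(1, 11))
relative_sizes = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
mean_size = sum(relative_sizes) / len(relative_sizes)
meas_var = [r / mean_size for r in relative_sizes]   # illustrative scaling: mean variance 1
y = simulate_estimates(x, meas_var, sigma_p=1.0, rng=random.Random(1))
```

Each `y[i]` carries both error terms, so its unconditional variance is the sum of the process and measurement error variances.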
Approaches to the second stage of the analysis
In comparing different methods for the second stage of the analysis, our focus will be on the coverage rate of a 95% confidence interval for the slope parameter $\beta$, i.e. the probability that the interval contains $\beta$. For all the methods, this interval is calculated as

$$\hat{\beta} \pm t \sqrt{\widehat{\mathrm{var}}(\hat{\beta})}, \tag{2}$$

where $\hat{\beta}$ and $\widehat{\mathrm{var}}(\hat{\beta})$ are the estimates of $\beta$ and $\mathrm{var}(\hat{\beta})$, respectively, and $t$ is the 97·5th percentile from a t-distribution with $\nu$ degrees of freedom.
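If we use weighted regression (Kutner et al. 2004), the estimates of the parameters are given in matrix notation by

$$\hat{\boldsymbol{\theta}}_W = (X^\top \hat{W} X)^{-1} X^\top \hat{W} y,$$

where $y$ is the vector of $y_i$, $X$ is the design matrix, $\hat{\boldsymbol{\theta}}_W$ is the vector of estimates, $W$ is a diagonal matrix, with the $i$th diagonal element being the weight $w_i$, and $\hat{W}$ is the corresponding matrix containing the estimated weights $\hat{w}_i$, where

$$\hat{w}_i = \frac{1}{\hat{\sigma}_p^2 + \hat{\sigma}_i^2},$$

with $\hat{\sigma}_p^2$ and $\hat{\sigma}_i^2$ being the estimates of $\sigma_p^2$ and $\sigma_i^2$. An estimate of the covariance matrix of $\hat{\boldsymbol{\theta}}_W$ is given by

$$\widehat{\mathrm{var}}(\hat{\boldsymbol{\theta}}_W) = (X^\top \hat{W} X)^{-1}.$$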
For unweighted regression, we have:

$$\hat{\boldsymbol{\theta}}_U = (X^\top X)^{-1} X^\top y, \qquad \widehat{\mathrm{var}}(\hat{\boldsymbol{\theta}}_U) = s^2 (X^\top X)^{-1},$$

where $s^2$ is the residual mean square from the unweighted regression (Kutner et al. 2004). Unweighted regression involves the assumption that the $\sigma_i^2$ are equal, which is unlikely to be true. However, unlike the weighted regression methods, it does not require estimation of $\sigma_p^2$ and does not depend on how well the $\sigma_i^2$ are estimated; Bement & Williams (1969) suggest that at least ten degrees of freedom are required when estimating each $\sigma_i^2$ in a weighted regression.
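For a single predictor, the matrix expressions for the weighted and unweighted fits reduce to simple sums. The sketch below is one way to compute the slope, its estimated variance and the resulting interval; the function names are hypothetical, and the weights are assumed to be supplied by the user as the reciprocals of the estimated unconditional variances.

```python
def fit_line(x, y, w=None):
    """Fit y = a + b*x and return (slope, estimated variance of the slope).

    w=None gives the unweighted fit, with slope variance s^2 / Sxx, where s^2
    is the residual mean square on k - 2 d.f.
    With weights w_i, the slope variance is the slope element of (X'WX)^{-1},
    which equals 1 / (weighted Sxx).
    """
    k = len(x)
    unweighted = w is None
    if unweighted:
        w = [1.0] * k
    sw = sum(w)
    xbar = sum(w_i * x_i for w_i, x_i in zip(w, x)) / sw
    ybar = sum(w_i * y_i for w_i, y_i in zip(w, y)) / sw
    sxx = sum(w_i * (x_i - xbar) ** 2 for w_i, x_i in zip(w, x))
    sxy = sum(w_i * (x_i - xbar) * (y_i - ybar) for w_i, x_i, y_i in zip(w, x, y))
    slope = sxy / sxx
    if unweighted:
        intercept = ybar - slope * xbar
        rss = sum((y_i - intercept - slope * x_i) ** 2 for x_i, y_i in zip(x, y))
        return slope, (rss / (k - 2)) / sxx
    return slope, 1.0 / sxx

def slope_interval(slope, var_slope, t_value):
    """Confidence interval: slope +/- t * sqrt(estimated variance)."""
    half_width = t_value * var_slope ** 0.5
    return slope - half_width, slope + half_width
```

The caller chooses `t_value` to match the degrees of freedom, for example 2·306 for 8 d.f. or 1·96 for infinite d.f.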
Theoretical comparison of weighted and unweighted regression
From the form of eqn 2, it is clear that the coverage rate of the confidence interval will be influenced by how well we can estimate $\beta$ and $\mathrm{var}(\hat{\beta})$ and how well we can determine a suitable value for the degrees of freedom, $\nu$. It is straightforward to show that the unweighted estimate $\hat{\beta}_U$ is always unbiased for $\beta$ (Kutner et al. 2004). If the weights are known, the weighted estimate $\hat{\beta}_W$ is also unbiased and has the smallest possible variance amongst linear unbiased estimates (Aitken 1935; Kutner et al. 2004); if not, the bias in $\hat{\beta}_W$ will depend on how the weights are estimated and whether the predictor variables are considered as fixed or random.
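In general, both $\widehat{\mathrm{var}}(\hat{\beta}_U)$ and $\widehat{\mathrm{var}}(\hat{\beta}_W)$ will be biased. It is difficult to obtain a simple expression for the bias in $\widehat{\mathrm{var}}(\hat{\beta}_W)$, owing to the estimation of the weights (Bement & Williams 1969). For unweighted regression, the bias in $\widehat{\mathrm{var}}(\hat{\beta}_U)$ can be written as (Appendix A)

$$E\{\widehat{\mathrm{var}}(\hat{\beta}_U)\} - \mathrm{var}(\hat{\beta}_U) = -\frac{k-1}{(k-2)\,S_{xx}} \sum_{i=1}^{k} h_i \left(\sigma_i^2 - \bar{\sigma}^2\right), \tag{3}$$

where $S_{xx} = \sum_i (x_i - \bar{x})^2$, $\bar{\sigma}^2$ is the mean of the $\sigma_i^2$ and $h_i = 1/k + (x_i - \bar{x})^2 / S_{xx}$ measures the ‘leverage’ of the $i$th observation. The term $\sum_i h_i (\sigma_i^2 - \bar{\sigma}^2)$ is proportional to the covariance between $\sigma_i^2$ and $h_i$; if this covariance is positive, $\widehat{\mathrm{var}}(\hat{\beta}_U)$ will tend to underestimate $\mathrm{var}(\hat{\beta}_U)$ and the resulting confidence interval will be too narrow, and vice versa if the covariance is negative.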
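The link between the sign of this covariance and the direction of the bias can be checked with a short derivation. What follows is a sketch under the model assumptions of this section, not a reproduction of Appendix A; the symbols $v_i$, $d_i$ and $S_{xx}$ are introduced here for convenience, and the leverage is assumed to be the usual hat-matrix diagonal for a straight-line fit.

```latex
\begin{align*}
v_i &= \sigma_p^2 + \sigma_i^2, \qquad
S_{xx} = \sum_{i=1}^{k} (x_i - \bar{x})^2, \qquad
d_i = \frac{(x_i - \bar{x})^2}{S_{xx}}, \qquad
h_i = \frac{1}{k} + d_i, \\
\operatorname{var}(\hat{\beta}_U) &= \frac{1}{S_{xx}} \sum_i d_i v_i, \\
E(s^2) &= \frac{1}{k-2} \sum_i (1 - h_i)\, v_i
        = \frac{(k-1)\,\bar{v} - \sum_i d_i v_i}{k-2}, \\
E\!\left(\frac{s^2}{S_{xx}}\right) - \operatorname{var}(\hat{\beta}_U)
  &= \frac{k-1}{(k-2)\,S_{xx}} \Bigl( \bar{v} - \sum_i d_i v_i \Bigr)
   = -\frac{k-1}{(k-2)\,S_{xx}} \sum_i h_i \left(\sigma_i^2 - \bar{\sigma}^2\right).
\end{align*}
```

The last step uses $\sum_i d_i = 1$ and $\sum_i (v_i - \bar{v}) = 0$. The process error variance cancels because it is common to every $v_i$, so only the pattern of the measurement error variances relative to the leverages affects the bias.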
The precision of $\hat{\beta}$ and $\widehat{\mathrm{var}}(\hat{\beta})$ will determine suitable choices for $\nu$ in eqn 2. However, any expressions for the precision are likely to be complicated and so provide little insight. Qualitatively, we would expect their precision to depend on the values of the predictor variable. In addition, the precision of $\widehat{\mathrm{var}}(\hat{\beta}_U)$ will depend on the precision of $s^2$, and that of $\widehat{\mathrm{var}}(\hat{\beta}_W)$ will depend on how well the weights are estimated.
Choices in the use of weighted regression
Use of weighted regression involves a choice as to how $\sigma_p^2$ is estimated and how we then determine an appropriate value for $\nu$. We assume here that $\sigma_p^2$ is estimated using residual maximum likelihood (REML), as this is commonly used and is regarded as a good general method for estimating variance components (Thompson & Sharp 1999). Standard use of REML leads to $\hat{\sigma}_p^2$ being non-negative. Although this seems logical, it may have undesirable effects on the confidence interval. We therefore also consider a ‘no-bound’ version of REML, which allows $\hat{\sigma}_p^2 < 0$. We consider the following values for $\nu$:
1. $\nu = \infty$, which leads to $t$ = 1·96,
2. $\nu = k-2$, the error d.f. from the unweighted regression,
3. $\nu = k-5$.
A special case of weighted regression involves setting $\hat{\sigma}_p^2 = 0$, which is likely to perform poorly unless $\sigma_p^2$ is small. We include it here because many statistical textbooks state that weighted regression involves setting $w_i$ to be the reciprocal of the variance of $y_i$, without making it clear whether this refers to the conditional or unconditional variance; use of the conditional variance is equivalent to setting $\sigma_p^2 = 0$.
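The difference between the standard and ‘no-bound’ estimates of the process error variance can be sketched as follows. This is a minimal illustration that maximises the residual log-likelihood over a grid, assuming the measurement error variances are known and positive; the function names, the grid search and the lower limit used for the unbounded version are assumptions made here, and dedicated mixed-model software would normally be used instead.

```python
import math

def reml_loglik(proc_var, x, y, meas_var):
    """Residual log-likelihood (up to a constant) for a given process variance."""
    v = [proc_var + m for m in meas_var]          # unconditional variances of y_i
    w = [1.0 / v_i for v_i in v]
    sw = sum(w)
    swx = sum(w_i * x_i for w_i, x_i in zip(w, x))
    swy = sum(w_i * y_i for w_i, y_i in zip(w, y))
    swxx = sum(w_i * x_i * x_i for w_i, x_i in zip(w, x))
    swxy = sum(w_i * x_i * y_i for w_i, x_i, y_i in zip(w, x, y))
    det = sw * swxx - swx ** 2                    # det(X'WX) for intercept + slope
    slope = (sw * swxy - swx * swy) / det
    intercept = (swy - slope * swx) / sw
    wrss = sum(w_i * (y_i - intercept - slope * x_i) ** 2
               for w_i, x_i, y_i in zip(w, x, y))
    return -0.5 * (sum(math.log(v_i) for v_i in v) + math.log(det) + wrss)

def estimate_process_variance(x, y, meas_var, upper, bounded=True, n_grid=2001):
    """Grid-search maximiser; bounded=False allows a negative estimate."""
    # Unbounded search stops just above -min(meas_var) so every variance stays positive.
    lower = 0.0 if bounded else -0.99 * min(meas_var)
    grid = [lower + (upper - lower) * j / (n_grid - 1) for j in range(n_grid)]
    return max(grid, key=lambda pv: reml_loglik(pv, x, y, meas_var))

def weights(proc_var_hat, meas_var_hat):
    """Estimated weights; proc_var_hat = 0 ignores the process error."""
    return [1.0 / (proc_var_hat + m) for m in meas_var_hat]

# Toy data whose scatter is smaller than the stated measurement error.
x = list(range(1, 11))
y = [0.5 if x_i % 2 else -0.5 for x_i in x]
meas_var = [1.0] * 10
bounded_est = estimate_process_variance(x, y, meas_var, upper=5.0)
no_bound_est = estimate_process_variance(x, y, meas_var, upper=5.0, bounded=False)
```

In this toy data set the residual scatter is smaller than the measurement error alone would imply, so the bounded estimate sits on the boundary at zero while the unbounded estimate is negative.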
We used simulation to compare the different methods in terms of the coverage rate of the 95% confidence interval for $\beta$. As the value of $\beta$ will not affect our results, we arbitrarily set $\beta = 0$. Likewise, the results will be influenced by the relative size of $\sigma_p^2$ and $\bar{\sigma}^2$ (the mean of the $\sigma_i^2$) and by the variation in the $\sigma_i^2$, rather than their actual values (Cochran 1954). We therefore fixed $\bar{\sigma}^2$ at an arbitrary value. The methods we considered are as follows:
1. Method W: Weighted regression with $\sigma_p^2$ estimated using standard REML and $\nu = \infty$, $k-2$ or $k-5$.
2. Method WNB: This is the same as Method W, but using the ‘no-bound’ version of REML.
3. Method W0: Assuming $\sigma_p^2 = 0$ with $\nu = \infty$.
4. Method U: Standard (unweighted) regression with $\nu = k-2$.
The factors that were likely to influence the performance of the methods were chosen as follows:
1. Values of $\sigma_p$: these were set at 0, 0·1, 0·2, ..., 2.
2. Number of replicates: $k$ = 5, 10, 20, 30 and 100, with the primary focus being on $k$ = 10.
3. Values of the predictor variable: we initially considered equally spaced values between 1 and 10. Based on the theoretical discussion above, we also considered two scenarios that had a large covariance between $\sigma_i^2$ and $h_i$, to assess how poorly unweighted regression might perform.
4. Variation in the $\sigma_i^2$: we primarily considered $\sigma_{\max}^2 / \sigma_{\min}^2 = 10$, where $\sigma_{\max}^2$ and $\sigma_{\min}^2$ are the maximum and minimum of the $\sigma_i^2$, but we also considered ratios of 100 and 2 (Table 1).
5. Pattern of variation in $\sigma_i^2$: we initially considered $\sigma_i^2$ decreasing linearly with $x_i$, as would occur when the predictor variable is year and sampling effort increases over time. We also considered $\sigma_i^2$ increasing linearly with $x_i$ and three cases where $\sigma_i^2$ has no relationship with $x_i$, the relative sizes of the $\sigma_i^2$ being (5,2,10,6,7,1,8,4,9,3), (10,1,1,1,1,1,1,1,1,10) or (2,1,1,1,1,1,1,1,1,2). The last two choices were again chosen to assess how poorly unweighted regression might perform, as they correspond to a large and a moderate covariance between $\sigma_i^2$ and $h_i$, respectively.
6. Estimation of $\sigma_i^2$: in almost all scenarios, we assumed that the $\sigma_i^2$ were estimated well enough that they were effectively known. To assess how important this assumption is to the performance of weighted regression, we also considered three scenarios in which each $\hat{\sigma}_i^2$ had a (scaled) $\chi^2$ distribution with $\tau$ degrees of freedom, where $\tau$ = 4, 9 or 19 (as would arise if $y_i$ were a mean based on 5, 10 or 20 observations).
We considered a total of 15 scenarios (Table 1) for each of the 21 values for $\sigma_p$. For each scenario and value of $\sigma_p$, we simulated and analysed 10 000 sets of data using SAS version 9·1 (Appendix B). We summarised the results by calculating
Table 1. Scenarios considered in the simulations, where $k$ is the number of estimates obtained from the first stage of the analysis, $x_i$ shows the values of the predictor variable, $\sigma_i^2$ shows the relative sizes of the measurement error variances and $\tau$ is the d.f. associated with each $\hat{\sigma}_i^2$ (when $\sigma_i^2$ is unknown)

| Scenario | $x_i$ | Relative sizes of the $\sigma_i^2$ |
| --- | --- | --- |
| Decrease in $\sigma_i^2$ | 1, 2, 3 … 10 | 10, 9, …, 1 |
| Increase in $\sigma_i^2$ | 1, 2, 3 … 10 | 1, 2, …, 10 |
| No relationship between $\sigma_i^2$ and $x_i$ | 1, 2, 3 … 10 | 5, 2, 10, 6, 7, 1, 8, 4, 9, 3 |
| Large decrease in $\sigma_i^2$ | 1, 2, 3 … 10 | 100, 89, 78, …, 1 |
| $k$ = 5 | 1, 3·25, 5·50, 7·75, 10 | Same as $x_i$ |
| $k$ = 20 | 1, 1·47, 1·95, … 9·53, 10 | Same as $x_i$ |
| $k$ = 30 | 1, 1·31, 1·62, … 9·69, 10 | Same as $x_i$ |
| $k$ = 100 | 1, 1·09, 1·18, … 9·91, 10 | Same as $x_i$ |
| $\tau$ = 4 | 1, 2, 3 … 10 | 10, 9, …, 1 |
| $\tau$ = 9 | 1, 2, 3 … 10 | 10, 9, …, 1 |
| $\tau$ = 19 | 1, 2, 3 … 10 | 10, 9, …, 1 |
| Outlier with large $\sigma_i^2$ | 1, 21, 22, …, 29 | 10, 1, 1, …, 1 |
| Outlier with small $\sigma_i^2$ | 1, 2, …, 9, 29 | 10, …, 10, 10, 1 |
| Influential points with large $\sigma_i^2$ | 1, 2, 3 … 10 | 10, 1, 1, …, 1, 10 |
| Influential points with moderate $\sigma_i^2$ | 1, 2, 3 … 10 | 2, 1, 1, …, 1, 2 |
1. B, the estimated bias in $\hat{\beta}$, calculated as the difference between the mean of $\hat{\beta}$ over all simulations and the true value of $\beta$;
2. E, the estimated efficiency of $\hat{\beta}$ relative to the best possible estimator, calculated as the ratio of the variance of $\hat{\beta}$ obtained using weighted regression with known weights to the variance of $\hat{\beta}$ over all simulations;
3. BV, the estimated relative bias in $\widehat{\mathrm{var}}(\hat{\beta})$, calculated as the ratio of the mean of $\widehat{\mathrm{var}}(\hat{\beta})$ over all simulations to the variance of $\hat{\beta}$ over all simulations. Overestimation of $\mathrm{var}(\hat{\beta})$ is indicated by BV > 1. We used relative, rather than the absolute, bias here for ease of interpretation when plotting the results against $\sigma_p$;
4. The empirical coverage rate of the 95% confidence interval for $\beta$, calculated as the proportion of confidence intervals that include $\beta$. This will tend to exceed 0·95 if $\mathrm{var}(\hat{\beta})$ is overestimated or $\nu$ is underestimated.
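The summary measures B, BV and the coverage rate can be illustrated for unweighted regression with a small simulation. This sketch is not the SAS code of Appendix B: it uses far fewer simulated data sets, one scenario (ten equally spaced predictor values, measurement error variances decreasing in the ratio 10 : 1), a mean measurement error variance of one and a fixed seed, all of which are assumptions made here for illustration.

```python
import random
import statistics

T_975_DF8 = 2.306   # 97.5th percentile of a t-distribution with 8 d.f. (k = 10, nu = k - 2)

def unweighted_fit(x, y):
    """Ordinary least squares slope and its estimated variance s^2 / Sxx."""
    k = len(x)
    xbar = sum(x) / k
    ybar = sum(y) / k
    sxx = sum((x_i - xbar) ** 2 for x_i in x)
    slope = sum((x_i - xbar) * (y_i - ybar) for x_i, y_i in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((y_i - intercept - slope * x_i) ** 2 for x_i, y_i in zip(x, y))
    return slope, rss / (k - 2) / sxx

def simulate_method_u(n_sims=2000, sigma_p=1.0, beta=0.0, seed=1):
    """Return (B, BV, coverage) for unweighted regression in one scenario."""
    rng = random.Random(seed)
    x = list(range(1, 11))
    relative_sizes = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
    mean_size = sum(relative_sizes) / len(relative_sizes)
    meas_sd = [(r / mean_size) ** 0.5 for r in relative_sizes]
    slopes, var_hats, hits = [], [], 0
    for _ in range(n_sims):
        # Process error plus measurement error around the line (intercept 0).
        y = [beta * x_i + rng.gauss(0.0, sigma_p) + rng.gauss(0.0, sd_i)
             for x_i, sd_i in zip(x, meas_sd)]
        b, var_b = unweighted_fit(x, y)
        half_width = T_975_DF8 * var_b ** 0.5
        hits += (b - half_width <= beta <= b + half_width)
        slopes.append(b)
        var_hats.append(var_b)
    bias = statistics.mean(slopes) - beta                                    # B
    rel_bias_var = statistics.mean(var_hats) / statistics.variance(slopes)   # BV
    return bias, rel_bias_var, hits / n_sims

bias, rel_bias_var, coverage = simulate_method_u()
```

Repeating the call over a grid of `sigma_p` values would give curves of the kind plotted against the process error in the figures.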
The bias in $\hat{\beta}$ is close to zero for all four methods, even when the weights are estimated (results not shown); as mentioned earlier, $\hat{\beta}_U$ is known to be exactly unbiased. The estimates of $\mathrm{var}(\hat{\beta})$ all exhibit bias. For the ‘decrease in $\sigma_i^2$’ scenario, $\mathrm{var}(\hat{\beta})$ is generally overestimated using W and WNB but correctly estimated using U (Fig. 2). Unless there is no process error, W0 can severely underestimate $\mathrm{var}(\hat{\beta})$. For the two scenarios in Fig. 2, the bias is similar for W and WNB; across all 15 scenarios, there were some differences between these two methods, but no clear pattern. Both tend to overestimate $\mathrm{var}(\hat{\beta})$ when $\sigma_p$ is small, especially when $k$ is small. U provides an unbiased estimate of the variance except when there is some covariance between $\sigma_i^2$ and $h_i$ (Fig. 3, bottom panels).
The unweighted estimator U is less efficient than WNB and W, the two estimators that include process error (Fig. 2). However, the loss of efficiency for U is much smaller when the measurement variance is estimated, a result that was also evident in other scenarios (results not shown). When the process error is large relative to the mean measurement error, U is as efficient as WNB or W.
The coverage of the confidence interval for β depended upon both the method and the choice of ν. Again, we show results for two of the scenarios in detail (Fig. 4). As would be expected from the above results, use of W0 leads to intervals that are too narrow unless σp is small, and we do not discuss this method any further. For W and WNB, use of ν = ∞ and ν = k−5 leads to intervals that are too narrow and too wide, respectively, so we focus on the use of ν = k−2 for these two methods. WNB performs slightly better than W across all scenarios, including those not shown in Fig. 4, so we focus on WNB as the preferred weighted regression option.
Figure 5 provides a comparison of WNB and U across a range of scenarios. U achieves approximately the correct coverage rate for all values of $\sigma_p$ in almost all scenarios. The exceptions are the four scenarios with large-to-moderate covariances between $\sigma_i^2$ and $h_i$, where U can perform very poorly. WNB achieves the correct coverage rate when $\sigma_p$ is large, except when $k$ is small or we have two of the scenarios for which $\sigma_i^2$ and $h_i$ are correlated. It is less reliable when $\sigma_p$ is small, $k$ is not large or the $\sigma_i$ are not known precisely. Our results suggest that U is to be preferred to WNB unless there is some covariance between $\sigma_i^2$ and $h_i$. If this covariance is positive (negative), U will produce an interval that is too narrow (wide).
A simple diagnostic tool to evaluate when U might perform poorly is to plot $\hat{\sigma}_i^2$ against $h_i$, as in Fig. 6, and to visually assess their correlation. When this correlation is close to zero, the bias in the estimated variance is small and the confidence interval for the regression slope will be reliable. For the blue petrel example, the relationship between $\hat{\sigma}_i^2$ and $h_i$ is weak and negative, suggesting that the U confidence interval for the slope will be reliable or possibly a little wide. This is confirmed by comparing the estimated variance of the slope from the unweighted regression with an estimate of its bias (eqn 3). The estimate of bias is 0·0040, approximately half the estimated variance of 0·0088. As the bias is positive, the estimated variance will tend to be too large and the confidence interval from U will tend to be too wide.
For the ‘decrease in $\sigma_i^2$’ scenario, the covariance between $\sigma_i^2$ and $h_i$ is zero, so U will be reliable, even though $\sigma_i^2$ and $x_i$ are clearly dependent. For the ‘influential points with large $\sigma_i^2$’ scenario, the covariance is large and positive, so U gives an estimated variance that is too small and a confidence interval that is too narrow. Conversely, for the ‘outlier with small $\sigma_i^2$’ scenario, it is large and negative and U gives an interval that is too wide. The bias in the estimated variance depends on all the observations, so it does not matter that the non-zero covariance is attributable to a single unusual value.
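The diagnostic can also be computed numerically rather than read from a plot. The sketch below calculates the correlation between the measurement error variances and the leverages for three of the scenarios in Table 1; the leverage is assumed to be the usual hat-matrix diagonal for a straight-line fit, and the function and dictionary names are hypothetical.

```python
def leverages(x):
    """Hat-matrix diagonal for a straight-line fit: 1/k + (x_i - xbar)^2 / Sxx."""
    k = len(x)
    xbar = sum(x) / k
    sxx = sum((x_i - xbar) ** 2 for x_i in x)
    return [1.0 / k + (x_i - xbar) ** 2 / sxx for x_i in x]

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    abar = sum(a) / n
    bbar = sum(b) / n
    sab = sum((a_i - abar) * (b_i - bbar) for a_i, b_i in zip(a, b))
    saa = sum((a_i - abar) ** 2 for a_i in a)
    sbb = sum((b_i - bbar) ** 2 for b_i in b)
    return sab / (saa * sbb) ** 0.5

# (predictor values, relative sizes of the measurement error variances)
scenarios = {
    "decrease": (list(range(1, 11)), [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]),
    "influential points, large": (list(range(1, 11)), [10, 1, 1, 1, 1, 1, 1, 1, 1, 10]),
    "outlier, small": (list(range(1, 10)) + [29], [10] * 9 + [1]),
}
diagnostic = {name: correlation(rel_var, leverages(x))
              for name, (x, rel_var) in scenarios.items()}
for name, r in diagnostic.items():
    print(f"{name}: correlation between variance and leverage = {r:.2f}")
```

A value near zero suggests the unweighted interval should be reliable, a clearly positive value suggests it will be too narrow, and a clearly negative value suggests it will be too wide.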
Our results suggest that automatic use of weighted regression in this setting requires careful consideration and that unweighted regression will often be more reliable. In the literature, discussion of weighted regression places emphasis on it being the ‘optimal’ method of analysis, in terms of the precision of the parameter estimates. However, this is only part of the story: to have a reliable confidence interval, we also need to be able to estimate the precision. In addition, optimality of weighted regression only applies if the weights are known, which is seldom the case in practice, primarily due to the need to estimate the process error variance. We found that the efficiency of unweighted regression relative to weighted regression was greater when the weights needed to be estimated, with the efficiency tending to one as the process error variance increased. Estimation of the weights is nontrivial, even if the measurement error variances are known. As the measurement error variances usually need to be estimated, the benefits of using weighted regression can be quite small (Bement & Williams 1969). For simplicity, we have assumed that the measurement error variances are estimated without bias, which might not be the case in practice, particularly if the first stage of the analysis is relatively complex; any such bias will cause weighted regression to be less reliable.
A further issue in using weighted regression is that method W0 is likely to be used in practice, as the need to allow for process error is not always well understood. The increasing use of hierarchical models will lead to greater understanding of this point, but it is important to note that W0 can fail spectacularly, leading to confidence intervals that are far too narrow and therefore to the detection of effects that are spurious.
Recommending the use of unweighted regression in this setting might seem odd, given that it amounts to ignoring the different levels of precision associated with the individual estimates. The key point is that under the right circumstances, unweighted regression is more robust than other methods precisely because it does not try to make use of the potentially misleading information on these levels of precision. In the unlikely case that the weights are known, there is no doubt that weighted regression is the preferred method.
We have provided a simple diagnostic tool to allow researchers to assess whether unweighted regression is to be preferred to weighted regression. The form of this tool suggests that unweighted regression will perform poorly when the estimates for which the predictor variable is particularly low or high also have either low or high precision relative to the other estimates. However, this is also the type of situation in which weighted regression does not perform particularly well.
Of the different options for carrying out weighted regression, we have focussed on use of REML to estimate the process variance, with the d.f. being set equal to the residual d.f. from an unweighted regression. In addition, we have found that the use of an unconstrained estimate of the process variance performs slightly better than the standard constrained estimate. This might again appear odd, in that it would seem preferable to have an estimate of the process variance that is guaranteed to be non-negative. However, we argue that it is preferable to accept a negative estimate to obtain a more reliable confidence interval for the parameter of interest. The fact that the estimate is negative can still be interpreted as suggesting that the true process variance is very small. This issue has been discussed elsewhere in the more general context of estimation of variance components (Searle, Casella, & McCulloch 1992, pp. 60–62).
We have focussed attention on the coverage rate of confidence intervals, as estimation is usually more informative than hypothesis testing. However, our results naturally carry over to the hypothesis-testing framework. Thus, we can say that the use of unweighted regression will generally lead to a type I error rate (for testing H0: β = 0) that is closer to the nominal level than that achieved using weighted regression.
There are several issues associated with these two-stage analyses that we have not considered. For example, if we have annual estimates of a parameter and wish to use year as a predictor variable, we might want to allow for serial correlation in the second stage of the analysis. Likewise, we might have measurement error in the predictor variable, as in the blue petrel example, where the body condition index is an estimate of the body condition for the whole study population. However, these sorts of complications arise regardless of whether or not one uses weighted or unweighted regression. One advantage of adopting an unweighted approach is that modification of the analysis to allow for serial correlation or measurement error would be simpler.
We have assumed that both the process and measurement error follow normal distributions. We would expect our results to hold qualitatively in other settings, for example where the parameters are probabilities, as in the blue petrel example. Indeed, our results suggest that unweighted regression of such data (possibly after a transformation of the response variable) may be more reliable than the use of a generalised linear model if that model does not allow for the overdispersion induced by process variation (cf. Young, Campbell, & Capuano 1999).
We have not considered the simpler case with no predictor variables, i.e. where we wish to estimate the mean of an ecological parameter. Meta-analysis is an important special case of this setting. In a landmark paper on the combination of information from agricultural experiments, Cochran (1954) concluded that in this setting, use of an unweighted mean ‘is often entirely adequate’, especially when the process error variance is large relative to the measurement error variances. Although he did not consider coverage rates of confidence intervals, his results suggested that there are circumstances in which use of an unweighted analysis will be preferable. We suggest there is a need for further work in this area.
The implications of our results are important in a management context, in that they indicate the benefits obtained from using a relatively simple approach to combining information from different sites, times or studies. This simplicity should allow such analyses to be carried out more easily and transparently.
We are grateful for financial support from the University of Otago in providing travel funds for PMD to visit Dunedin in 2009. We are also indebted to Sally and Chris Sclater in Wisborough Green, Sussex, for their hospitality in the final stages of preparing the manuscript.