Abstract
 Top of page
 Abstract
 Introduction
 Theoretical Background
 Meta-analysis: Literature Sampling
 The Meta-analysis Framework: Testing for Publication Bias and Estimating the True Coefficient
 Graphical Investigation and Bivariate Testing for Publication Bias and True Empirical Effect
 Multivariate MRA
 Conclusion
 References
This paper seeks to identify whether there is a representative empirical Okun's law coefficient (OLC) and to measure its size. We carry out a meta-regression analysis (MRA) on a sample of 269 estimates of the OLC to uncover reasons for differences in empirical results and to estimate the ‘true’ OLC. On statistical (and other) grounds, we find it appropriate to investigate two separate subsamples, using respectively (some measure of) unemployment or output as dependent variable. Our results can be summarized as follows. First, there is evidence of type II publication bias in both subsamples, but a type I bias is present only among the papers using some measure of real output as the dependent variable. Second, after correction for publication bias, authentic and statistically significant OLC effects are present in both subsamples. Third, bias-corrected estimated true OLCs are significantly lower (in absolute value) with models using some measure of unemployment as the dependent variable. Using a bivariate MRA approach, the estimated true effects are −0.25 for the unemployment subsample and −0.61 for the output subsample; with a multivariate MRA methodology, the estimated true effects are −0.40 and −1.02 for the unemployment and the output subsamples respectively.
Introduction
Since the pioneering work of Okun (1962) and his famous result that a 3 per cent increase in output is associated with a 1 percentage point decline in the rate of unemployment, a large stream of literature has been devoted to the so-called Okun's law, the responsiveness of the unemployment rate to real output variations. As the Okun's law coefficient (OLC hereafter) continues to be a central parameter in the field of short-run macroeconomics, it is not surprising that the empirical component of this literature has reported a proliferation of estimates of the correlation between unemployment and real GDP movements.
To date, however, no consensus has been reached regarding the size of the OLC, and several alternative theoretical models and empirical strategies have been used for estimating its value. Moreover, empirical estimates are often sensitive to model specification, and particularly to whether output or unemployment is used as the dependent variable. Other differences in model specification arise from the choice between a static and a dynamic model, and from the choice between a first-difference model (with output and unemployment variables expressed in first differences) and a gap model (with output and unemployment variables expressed in terms of cyclical components or deviations from long-term trends). In the case of the gap model, empirical results may also be sensitive to the choice of detrending method (linear trend, Hodrick–Prescott (HP) filter, etc.).
While this literature is characterized by a diversity of models and empirical strategies and by a striking heterogeneity of empirical results, no systematic survey has been done. This diversity of models, empirical strategies and results makes it difficult to use these estimated OLC values for the practical analysis of short-run macro fluctuations.
Moreover, as suggested by De Long and Lang (1992), publication bias can be found in several fields of economic research and may thus potentially concern empirical analysis of the Okun relationship. Two forms of publication bias are of particular interest in the present context. One form will exist if the process of research publishing predominantly selects papers with statistically significant results. Hence, larger and more significant effects will be over-represented while studies with small insignificant effects will be under-represented or will not be published. This form of bias, where statistically significant results are preferred, is known as type II bias. A second form, known as type I bias, occurs where a particular direction of results is preferred.
With publication selection, one would expect the average of effect magnitudes across papers to be upwardly biased, and so the presence of large empirical effects in the literature would not be statistically well founded (Stanley, 2005). Without correction for publication bias, it is not valid to take summary statistics of large empirical effects found in the literature as indicative of true population values of the effect in question. It follows that if the Okun's law literature has been subject to publication selection bias, averages of OLC estimates across papers are likely to be upwardly biased in magnitude (in absolute value) and so will be invalid as evaluations of the true value of the OLC.
Economists have already tried to use meta-regression analysis (MRA hereafter) to test for publication selection and then to remove or lessen its effects (beginning with Stanley and Jarrell, 1989). One of the main aims of this paper is to use MRA to study whether the observed variation in OLC may be partly accounted for by the existence of such publication biases.1 To the best of our knowledge, this is the first paper which performs a meta-regression on Okun's law. Okun's law is widely used as a rule of thumb for assessing the expected level of the unemployment rate, and the reliability of any such assessments should be improved if estimated values of the OLC are corrected for significant evidence of publication bias.
As suggested by Stanley (2005, 2008), we first test for publication bias with the simplest bivariate meta-regression model. In order to accommodate systematic heterogeneity, this preliminary test is then embedded into a multivariate funnel asymmetry test (FAT) based on a multivariate meta-regression including ‘moderator’ dummy variables. These moderator variables may help to explain genuine systematic variation among reported OLCs and to establish whether variations in OLC across studies are mainly due to data characteristics or to different model specifications. The retained procedure is thus in line with the reporting guidelines proposed by Stanley et al. (2013) for MRA.
As the choice of real output or unemployment as dependent variable is a notable aspect of heterogeneous specifications in the empirical literature on the Okun's law, this choice may be expected to influence empirical estimates of the OLC (except if there were one cointegrating relationship between unemployment and real output, which is not found in the literature). Hence, we will investigate the influence of this specification choice by running separate investigations for the subset of studies using real output as the dependent variable and for the subset of papers using unemployment as the endogenous variable.
Our results can be summarized as follows. First, there is evidence of type II bias in both subsets, but a type I bias is present only among the papers using some measure of real output as the dependent variable. Second, after correction for publication bias, statistically significant OLC effects are present in both subsets. Third, bias-corrected estimated OLCs are significantly lower (in absolute value) with models using some measure of unemployment as the dependent variable. Using a bivariate MRA approach, the estimated true effects are −0.25 and −0.61 for the unemployment and the output subsets respectively; with a multivariate MRA methodology, the estimated true effects are −0.40 and −1.02 for the unemployment and the output subsets respectively.
The paper is structured as follows. Section 2 briefly reviews the main issues in the empirical research on Okun's law. Section 3 describes the properties of the literature sample used for the meta-analysis. Section 4 explains our approach to implementing the MRA. Section 5, using graphical analysis and bivariate MRA, tests for the existence and magnitude of publication bias. This permits us to estimate (one or more) ‘authentic’ OLC beyond publication bias. The corresponding multivariate MRA is conducted in Section 6. Section 7 concludes.
Theoretical Background
Since Okun's (1962) seminal paper, Okun's law has widely been accepted in the literature as a representation of the negative relation between unemployment and output. In his 1962 article, Okun presented two simple equations connecting the rate of unemployment to real output which have frequently been used as rules of thumb for applied macroeconomic analysis. Since that time, these equations have been expanded on and modified by many authors so as to improve statistical fit and to make their theoretical foundation more precise.
A first group of papers includes two classes of specification suggested by Okun (1970): the first-difference model and the ‘gap’ model. According to the first-difference model, the relationship between the natural log of observed real output (y_{t}) and the observed unemployment rate (u_{t}) is given by the expression
\[ \Delta u_{t} = a_{0} + a_{1}\,\Delta y_{t} + \varepsilon_{t} \tag{1} \]
where a_{0} is the intercept, a_{1} (a_{1} < 0) is Okun's coefficient measuring by how much changes in output produce changes in the unemployment rate, and ε is the disturbance term.
From the point of view of the gap model, the specification is given by the expression
\[ u_{t} - u^{*}_{t} = a_{0} + a_{1}\,(y_{t} - y^{*}_{t}) + \varepsilon_{t} \tag{2} \]
where y* represents the log of potential output, u* is the natural rate of unemployment and the other symbols have the same meaning as in equation (1). In this second specification, the left-hand side term represents the unemployment gap, whereas (y_{t} − y*_{t}) captures the output gap. In other words, the difference between the observed and potential real GDP captures the cyclical level of output. Likewise, the difference between the observed and natural rate of unemployment represents the cyclical rate of unemployment.
A major problem with the gap model is that there are no observable data on y* and u* so they have to be estimated. While Okun treated u* as a target rate of labour utilization and favoured a simple time trend to measure y*, alternative time series approaches have been proposed in the literature for estimating y* and u*. Among others, deterministic methods such as the HP filter (see for instance Marinkov and Geldenhuys, 2007, or Moosa, 2008) or the Baxter–King filter (see for instance Villaverde and Maza, 2009) have been widely used while some authors selected stochastic decomposition procedures such as Beveridge and Nelson (see for instance Lee, 2000) or the unobserved components model suggested by Harvey (1989) and estimated with a Kalman filter algorithm (see for instance Moosa, 1997, or Silvapulle et al., 2004). Finally, some papers use a specific auxiliary model to estimate these equilibrium values (see for instance Prachowny, 1993, or Marinkov and Geldenhuys, 2007).
As Okun noted that one of the shortcomings of the proposed relationship lies in the fact that the unemployment rate may only be considered as a proxy variable for idle resources affecting output losses, a second group of papers built empirical versions of the Okun's law from a macroeconomic production function relating real output to a set of factors potentially including labour, capital and technology (see for instance Gordon, 1984). Assuming that equilibrium real output is obtained when all factors reach their equilibrium level, the production function can then be transformed into a gap version of Okun's law including the idle resources coming from each input and which can be written as
 (3)
where the additional regressor is a vector of gaps between equilibrium and observed values of inputs other than labour. It is important to note that this kind of production-function version of Okun's law is then estimated with real output as the dependent variable instead of the unemployment rate.
Theoretically and econometrically, this reversal of the functional form of the estimated relationship makes it difficult to compare the empirical results found with the two groups of studies: one group in which the unemployment change or gap is the dependent variable; the other obtained using the production-function version of Okun's law. It is well known that the coefficient of a regression of X on Y is not in general equal to the inverse of the coefficient of a regression of Y on X. However, to make both groups of OLC estimates interpretable as the sensitivity of unemployment to real output changes, and so to facilitate comparison across the two groups of studies, coefficients estimated with equations using real output as the endogenous variable were systematically inverted, thereby rewriting all OLC values as the effect of real output variations on unemployment movements.
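The gap between a directly estimated coefficient and an inverted reverse-regression coefficient can be illustrated numerically. The following is only a minimal sketch: the data are invented for illustration, and `ols_slope` is a hypothetical helper, not something taken from the studies surveyed. It relies on the standard result that the product of the two OLS slopes equals the squared correlation, so the inverted coefficient is never smaller in absolute value than the direct one.

```python
def ols_slope(x, y):
    """Slope of an OLS regression of y on x (intercept included)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Invented output growth (dy) and unemployment changes (du), illustration only
dy = [3.1, 2.4, -0.5, 1.8, 4.0, 0.2, -1.2, 2.9]
du = [-0.4, -0.1, 0.9, 0.0, -0.8, 0.5, 1.1, -0.3]

slope_u_on_y = ols_slope(dy, du)   # unemployment as the dependent variable
slope_y_on_u = ols_slope(du, dy)   # output as the dependent variable
inverted = 1.0 / slope_y_on_u      # inverted output-equation coefficient

# The product of the two slopes is the squared correlation coefficient
r_squared = slope_u_on_y * slope_y_on_u
```

Because `r_squared` lies below 1 whenever the fit is imperfect, `inverted` exceeds `slope_u_on_y` in absolute value, which is one reason why the two groups of estimates need not coincide even after inversion.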
Meta-analysis: Literature Sampling
Here we describe the procedure retained for literature sampling for the MRA. In order to select a sample of OLC empirical studies which is both representative of this literature and of a manageable size, we have resorted to a structured search for articles using the following sampling criteria. First, we searched the EconLit database for empirical studies on the OLC and selected all the papers that fulfilled the following criteria: (i) the keywords used in the search were ‘Okun's law’ and ‘Output–unemployment relationship’; (ii) an abstract is presented so that the presence of econometric estimations of the OLC can be checked; and (iii) the article was published after 1980 and was listed in the EconLit database as of December 2010.
The year 1980 was retained as the starting date in order to permit analysis of the variance of published OLC empirical estimates but within relatively unified econometric frameworks and with data sets of the same quality and with reasonable time lengths. Dynamic time series methods with regards to data transformation, data stationarity and optimal lag selection became increasingly common in the eighties. Prior to 1980, many papers used very short data series (for instance, Thirlwall, 1969, used annual data from 1950 to 1967 with just 18 data points) or statistically questionable methods (such as empirically estimated time trends or ad hoc coefficients in order to calculate potential output or the natural rate of unemployment). All papers not related to the research question have been excluded. This selection process identified 97 papers.
After having examined these 97 articles, we excluded studies that do not include any original econometric estimation of the OLC. We also excluded studies that do not give sufficient information concerning the type of estimated model (endogenous/exogenous variables), the database (initial and final dates, periodicity) or the empirical results (R-squared value, estimated coefficients and standard errors). We decided to exclude studies including only nonlinear Okun's law models.2 Finally, it is important to note that while some studies suggest that Okun's law has undergone structural change over time (e.g. Lee, 2000; Sögner and Stiassny, 2002; Huang and Chang, 2005), over countries (Kaufman, 1988; Moosa, 1997; Lee, 2000) or over the course of the business cycle (e.g. Crespo-Cuaresma, 2003; Silvapulle et al., 2004; Huang and Chang, 2005), we decided to restrict our database to linear versions of the Okun relationship assumed to be stable across the whole data sample. This choice was motivated by the following reasons. First, these studies predominantly use either nonlinear models such as threshold models which include ad hoc assumptions concerning the threshold variable (the previous level of unemployment or the previous growth rates of real output for instance) or time-varying models where empirical results may appear highly dependent upon the characteristics of the retained methodology (the size of the rolling window, for example). Incorporating these papers in the database would thus go hand in hand with a large increase in the set of conditioning variables in the multivariate meta-regression model, with a limited number of observations associated with each variable. Second, due to the sensitivity of the estimated results to the retained testing procedure, these papers often lead to heterogeneous results and may give rise to controversies (see for instance the recent debate between Owyang and Sekhposyan, 2012 and Ball et al., 2012 on the stability of the Okun's law relationship during the Great Recession).
As a consequence, while the comparison of the empirical results produced by linear and nonlinear models within an MRA may constitute an interesting area of research, it seemed a priori difficult to include both linear models and heterogeneous nonlinear models within the same meta-regression sample. The total number of studies left after applying these criteria was 28 and the total number of observations in our database is 269, each corresponding to one regression. Figure 1 shows the ‘life cycle’ of this literature in terms of the number of documents recorded in EconLit and retained in the present MRA.
As can be seen, the average number of papers meeting our selection criteria increased after 2003 and the literature peaked in 2007. Even base specifications of the Okun's law model permitted more than one regression per study since this specification is often applied to different samples, different time periods and different measures of the output gap or of the variation of the unemployment rate around its equilibrium level. In accordance with common practice in MRA, these were recorded as independent regressions in order to investigate the influence of these heterogeneities on the published effect. The full list of studies included in the MRA is given in the list of References at the end of this paper (each being marked by an * symbol).
Table 1 presents salient characteristics of the papers retained for our MRA. The number of observations used in the Okun's law (OL) equations varied enormously. The smallest was 21, while the largest was 408. All but 1.1 per cent of the OLC estimates were obtained from time series databases and more than two thirds (68.5 per cent) used annual frequency. Nearly three quarters of the estimates use country-level data while the remaining ones use regional databases. The percentage of estimates obtained with either the gap or the first-difference version of the OL equation (41.8 per cent) is somewhat lower than the percentage of estimates obtained with production-function versions of the OL (58.2 per cent).
Table 1. Descriptive Statistics of OLC Studies (28 Studies) and OLC Estimates (269 Estimators)
Variable  Minimum  Maximum  Mean  Standard deviation  Median 
OLC  −3.22  0.17  −0.77  0.71  −0.58 
Number of observations  21  408  50.4  46.54  41 
First year  1948  1990  1968.2  10.75  1970 
Last year  1985  2006  1999.2  4.61  1999 
Proportion of OLC estimators with the following features (%) 
Time series database  98.9  Country  74.0 
Panel database  1.1  Region  26.0 
Yearly frequency  68.5  European countries  74.4 
Frequency higher than year  31.5  United States  7.6 
Endogenous variable: Unemployment rate  41.8  Rest of the world  18.0 
Endogenous variable: Real output  58.2  Static model  53.6 
Model in level  9.2  Dynamic model  40.0 
Model in first difference  14.7  Cointegrated model  6.4 
Equilibrium values of real output and unemployment from filtering procedure  76.1   
The Meta-analysis Framework: Testing for Publication Bias and Estimating the True Coefficient
The process of academic publishing may influence the characteristics of the published results. While several kinds of publication biases can appear, two specific biases are most often encountered (Stanley, 2005; Stanley et al., 2008). Type I bias occurs when editors, referees, and/or researchers have a preference for a particular direction of results. Positive estimates of the OLC, for instance, might be ignored as it seems implausible that short-run movements of unemployment are positively correlated with output gap fluctuations. However, even if there are very strong theoretical reasons for expecting negative estimates of the OLC, at least a few studies should report positive estimates. We can, for example, imagine the case of specific labour market regulations in case of macroeconomic downturns. A positive OLC finding may also arise due to some characteristics of data sets or of empirical methodologies. Such a bias would make the average taken from the published literature larger (in absolute value) than the estimated true effect.
Type II bias arises when editors, referees, and/or researchers have a preference for results that are statistically significant. As smaller samples and limited degrees of freedom reduce the probability of finding a significant result, this kind of publication bias may appear when researchers using small samples are inclined to search across econometric ‘tools’ (proxies, estimators, specifications) in order to produce more significant results. Type II selection will thus lead to excess variation (Stanley, 2005).
Detection of type I publication bias most commonly starts with the so-called funnel plot, which compares the effect size for each regression (here the OLC) against some measure of its precision (the inverse standard error of the OLC; Egger et al., 1997). In the case of no bias, the plot should appear as an inverted funnel: observations with high precision should be concentrated close to the true effect, while those with lower precision should be more spread out at the base of the plot. In the absence of type I publication bias, the funnel plot is thus symmetric.
This visual investigation can also be supplemented with explicit regression tests. The FAT due to Egger et al. (1997) is implemented by means of the regression
\[ OLC_{i} = \alpha + \beta\, SE_{i} + u_{i}, \qquad i = 1, \dots, N \tag{4} \]
where OLC_{i} is the ith estimate of the OLC, SE_{i} is the standard error of point estimate i, N is the number of estimates of the OLC and u_{i} is the regression error term. In this simple MRA, α denotes the true OLC, and β indicates the size of publication bias.
As regression (4) is heteroskedastic and the measure of heteroskedasticity is the standard error of the estimate of the OLC, Stanley (2008) suggests performing weighted least squares by dividing equation (4) by the standard error of the OLC. This is simply achieved by OLS estimation of the transformed regression equation:
\[ t_{i} = \frac{OLC_{i}}{SE_{i}} = \beta + \alpha\,\frac{1}{SE_{i}} + \frac{u_{i}}{SE_{i}} \tag{5} \]
where t_{i} is the t-statistic measuring the significance of the ith OLC. Equation (5) represents a regression line through a funnel graph which is rotated by 90 degrees and which is adjusted for heteroskedasticity. The FAT for publication bias is then a simple t-test on the intercept of equation (5); a β significantly different from zero indicates the presence of publication bias. If β is significantly positive (or negative), then the effect size is subject to an upward (or downward) bias. Moreover, there is evidence of a ‘true’ empirical effect (i.e. a systematic relationship between unemployment variation and real output movements) if the coefficient α is significantly nonzero.
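The bivariate test just described, an OLS regression of the t-statistic on a constant and on precision (1/SE_{i}), can be sketched as follows. This is an illustrative sketch only: the function name `fat_pet`, the simulated standard errors, the small deterministic noise terms and the assumed values of −0.3 (true effect) and −1.5 (bias) are all hypothetical and are not taken from the sample analysed here.

```python
import math

def fat_pet(olc, se):
    """OLS of t_i = OLC_i / SE_i on a constant and 1 / SE_i.

    Returns (beta, alpha, t_beta, t_alpha): the intercept (publication bias),
    the slope on precision (true effect) and their t-ratios.
    """
    t = [b / s for b, s in zip(olc, se)]
    p = [1.0 / s for s in se]                      # precision
    n = len(t)
    mp, mt = sum(p) / n, sum(t) / n
    spp = sum((x - mp) ** 2 for x in p)
    alpha = sum((x - mp) * (y - mt) for x, y in zip(p, t)) / spp
    beta = mt - alpha * mp
    resid = [y - beta - alpha * x for x, y in zip(p, t)]
    s2 = sum(e * e for e in resid) / (n - 2)       # residual variance
    se_alpha = math.sqrt(s2 / spp)
    se_beta = math.sqrt(s2 * (1.0 / n + mp ** 2 / spp))
    return beta, alpha, beta / se_beta, alpha / se_alpha

# Hypothetical estimates: true effect -0.3, bias term -1.5 * SE, small noise
se = [0.05, 0.08, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40]
noise = [0.01, -0.02, 0.015, -0.01, 0.02, -0.015, 0.01, -0.01]
olc = [-0.3 - 1.5 * s + e for s, e in zip(se, noise)]

beta, alpha, t_beta, t_alpha = fat_pet(olc, se)
```

With these invented inputs the intercept should land close to the assumed bias and the slope close to the assumed true effect; the t-ratio on the intercept corresponds to the FAT and the t-ratio on the slope to the test of a nonzero true effect.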
As the process of selecting estimates from the literature makes meta-analysis highly vulnerable to data contamination, the robustness of this basic test is checked by re-estimating equation (5) with the iteratively reweighted least squares (IRLS) method as in Krassoi Peach and Stanley (2009) or Havranek (2010).
In a similar way to the case of type I bias, the presence of type II bias can be inspected visually using the Galbraith plot (Galbraith, 1988). This consists of a scatter diagram of the precision of the estimates of the OLC against the t-statistics corresponding to those estimates for a given assumed value of the true effect. If there were type II selection, large values (in absolute terms) would be over-reported and there would be an excessive likelihood of reporting significant results. If there were no type II publication bias and the assumed true effect (labelled TE) were correct, the statistics |(OLC_{i} − TE)/SE_{i}| should exceed 2 no more than 5 per cent of the time, and the cloud should be randomly distributed around 0, with no systematic relation to precision.
The method of testing for type I bias can also be used to test for significance of the true effect beyond publication bias. The precision effect test (PET) is a simple t-test on the slope coefficient α of equation (5).
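The 5 per cent benchmark behind the Galbraith plot can be checked with a few lines of code. The sketch below is illustrative only: `share_outside` is a hypothetical helper, and the estimates, standard errors and assumed true effect of −0.30 are invented rather than drawn from the literature sample.

```python
def share_outside(olc, se, true_effect, cutoff=2.0):
    """Share of estimates whose standardized distance from the assumed
    true effect, |(OLC_i - TE) / SE_i|, exceeds the cutoff."""
    z = [(b - true_effect) / s for b, s in zip(olc, se)]
    return sum(1 for v in z if abs(v) > cutoff) / len(z)

# Invented estimates and standard errors, illustration only
olc = [-0.30, -0.28, -0.55, -0.10, -0.33, -0.90, -0.26, -0.31, -0.05, -0.29]
se = [0.05, 0.04, 0.08, 0.06, 0.10, 0.15, 0.03, 0.07, 0.09, 0.05]

share = share_outside(olc, se, true_effect=-0.30)
```

A share well above 0.05, as in this invented example where four of the ten standardized values lie outside ±2, would point to the excess variation associated with type II selection.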
As one of the main objectives of most meta-analyses is to determine the dependencies of empirical results on characteristics of empirical strategy and design, we finally (in Section 6) use the general multivariate version of the FAT-PET method which is specified as follows:
\[ t_{i} = \beta + \alpha\,\frac{1}{SE_{i}} + \sum_{k=1}^{K} \gamma_{k}\,\frac{Z_{ki}}{SE_{i}} + \omega_{i} \tag{6} \]
where Z_{ki}, k = 1, …, K are meta-independent variables assumed to potentially affect the estimate of the OLC and ω_{i} is the meta-regression disturbance term, which has the standard characteristics. Each of the Z_{ki} is weighted by (1/SE_{i}) and the γ_{k} are K coefficients to be estimated, where each one measures the impact of the corresponding variable on the OLC.
The meta-independent variables used in this paper are presented in Table 2. We focus on a set of variables constructed to represent the following characteristics of models used in the Okun's law empirical literature. Regarding the influence of sample features on empirical results we concentrate on the initial and final dates (respectively FIRSTYEAR and LASTYEAR) of the studies (and a variable constructed as the central point of the sample period used, AVGYEAR); we distinguish between time series data (SAMPTS) and panel data (SAMPPA); between samples dealing with annual data (FREQY) and semestrial or quarterly data (FREQSQ); between samples using country-level (COUNT) or regional-level (REG) data sets; and finally between papers that focus on OECD countries (OECDCOUNT) and papers centred on non-OECD countries (NOECDCOUNT). While there may be variance across countries within each of the OECD and non-OECD groups, these dummies control for a variety of institutional characteristics (such as property rights regimes and labour mobility conditions) that may differ systematically between, but not within, the two groups.
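A multivariate regression of this form, with a constant, precision and precision-weighted moderators as regressors, can be sketched with the normal equations. This is only a minimal illustration under stated assumptions: `solve` and `multivariate_fat_pet` are hypothetical helper names, the single dummy moderator `annual` and all numbers are invented, and the data are generated without noise so that the assumed coefficients are recovered exactly.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def multivariate_fat_pet(olc, se, moderators):
    """OLS of t_i on [1, 1/SE_i, Z_1i/SE_i, ..., Z_Ki/SE_i].

    moderators is a list of K lists, one per Z_k.
    Returns [beta, alpha, gamma_1, ..., gamma_K].
    """
    rows = [[1.0, 1.0 / s] + [z[i] / s for z in moderators]
            for i, s in enumerate(se)]
    t = [b / s for b, s in zip(olc, se)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y for r, y in zip(rows, t)) for a in range(k)]
    return solve(xtx, xty)

# Invented, noise-free data: alpha = -0.4, beta = -1.0, gamma = -0.5
se = [0.05, 0.08, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40]
annual = [1, 0, 1, 0, 0, 1, 1, 0]          # hypothetical dummy moderator
olc = [-0.4 - 1.0 * s - 0.5 * z for s, z in zip(se, annual)]

coefs = multivariate_fat_pet(olc, se, [annual])
```

In applied work one would normally rely on an established regression routine that also reports standard errors; the hand-written solver is used here only to keep the sketch self-contained.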
Table 2. Description of Potential Explanatory Variables
Variables  Description of the variable 
FIRSTYEAR  First year of the sample 
LASTYEAR  Last year of the sample 
SAMPTS  Dummy, 1 if the study uses a time series database, 0 otherwise 
SAMPPA  Dummy, 1 if the study uses a panel database, 0 otherwise 
FREQY  Dummy, 1 if the study uses annual data, 0 otherwise 
FREQSQ  Dummy, 1 if the study uses semestrial or quarterly data, 0 otherwise 
COUNTDED  Dummy, 1 if the database only includes developed countries, 0 otherwise 
COUNTDING  Dummy, 1 if the database only includes developing countries, 0 otherwise 
COUNT  Dummy, 1 if the database only includes countries, 0 otherwise 
REG  Dummy, 1 if the database only includes regions, 0 otherwise 
MODSTA  Dummy, 1 if the model is static, 0 otherwise 
MODDYN  Dummy, 1 if the model is dynamic, 0 otherwise 
OTHEXO  Dummy, 1 if the model includes other exogenous variables than the unemployment variable or the GDP variable, 0 otherwise 
NOOTHEXO  Dummy, 1 if the model includes no other exogenous variables than the unemployment variable or the GDP variable, 0 otherwise 
NEQ1  Dummy, 1 if the model includes a single equation, 0 otherwise 
NEQN  Dummy, 1 if the model includes several equations, 0 otherwise 
ENDU  Dummy, 1 if unemployment rate is used as the endogenous variable, 0 otherwise 
ENDY  Dummy, 1 if real GDP is used as the endogenous variable, 0 otherwise 
LEVEL  Dummy, 1 if the model is written with the levels of the variables, 0 otherwise 
DELTA  Dummy, 1 if the model is written with first differences of the variables, 0 otherwise 
FILTLT  Dummy, 1 if the equilibrium paths of GDP and unemployment are estimated with a linear trend, 0 otherwise 
FILTHP  Dummy, 1 if the equilibrium paths of GDP and unemployment are estimated with a HP filter, 0 otherwise 
FILTBK  Dummy, 1 if the equilibrium paths of GDP and unemployment are estimated with a Baxter King filter, 0 otherwise 
FILTBN  Dummy, 1 if the equilibrium paths of GDP and unemployment are estimated with a Beveridge Nelson filter, 0 otherwise 
FILTUC  Dummy, 1 if the equilibrium paths of GDP and unemployment are estimated with unobserved component models, 0 otherwise 
FILTMOD  Dummy, 1 if the equilibrium paths of GDP and unemployment are estimated with specific models, 0 otherwise 
YEAR  Publication year 
YEAR2  Variable YEAR squared 
Regarding equation characteristics, as explained previously we first distinguish between models using unemployment as the endogenous variable (ENDU) and models using real output as the endogenous variable (ENDY). We then distinguish between static (MODSTA) and dynamic models (MODDYN), between models including only one exogenous variable (NOOTHEXO) and models including several additional exogenous variables (OTHEXO), and then between single-equation models (NEQ1) and multi-equation models (NEQN). As the empirical evaluation of potential output and natural unemployment are essential steps in the estimation of the OLC, we also tried to take into account the precise nature of the econometric procedure retained for estimating these two variables. We thus constructed separate dummies for distinguishing between a linear trend methodology (FILTLT), an HP filter (FILTHP), a Baxter–King filter (FILTBK), a Beveridge–Nelson procedure (FILTBN), an unobserved components model (FILTUC) or an explicit model such as a production function for potential output (FILTMOD). In order to investigate more deeply the influence of model characteristics, we also included separate dummies for distinguishing between models in levels (LEVEL) and models in first difference (DELTA).
Graphical Investigation and Bivariate Testing for Publication Bias and True Empirical Effect
As is now common in applied MRA, we start by investigating the presence of type I publication bias by using the funnel plot technique. Figure 2 (a) and (b) displays the funnel plots for the unemployment subset and the real output subset, respectively. As a measure of precision, we use the inverse of the standard error of point estimates, which is plotted on the vertical axis; estimates of the OLC are plotted on the horizontal axis.
There are no positive estimates in the real output subset and only seven positive estimates in the unemployment subset, so that the plot is clearly weighted towards the left side in both cases. This asymmetry is strongly suggestive of publication bias. Even though macroeconomic theory generally leads to the prediction of a negative OLC, an unbiased set of empirical evidence on the OLC would be consistent with a symmetric distribution of estimated OLC around a negative mean. For the unemployment subset, visual inspection suggests a somewhat bimodal distribution of estimates; the mean of the two most precisely estimated values places the top portion of the funnel around −0.10, although the average of the top five points on the chart is substantially larger in magnitude, at around −0.3. In the case of the real output subset, the top portion of the funnel is close to −1.63 and the average of the top five points on the graph equals −1.35. These top values are quite far from the average of all the estimates (larger by 54 per cent in the case of the unemployment subset and lower by 98 per cent in the case of the real output subset). Although there is a very high probability that the OLC is in fact negative, the potential magnitudes of the bias show that simple summaries of this literature may lead to a biased evaluation of the true size of the OLC.
As visual inspection of the funnel plots can be misleading and vulnerable to subjective interpretation, the funnel graphs are now supplemented with the FAT performed using equation (5). Table 3 summarizes FAT results for the same samples as discussed before.
Table 3. Tests of Type I Publication Bias and the True Effect
Dependent variable = t-statistic on the OL coefficient (t-statistics in parentheses)

| Subset | Obs. | OLS: β (bias) | OLS: α (precision effect) | OLS: R^{2} | IRLS: β (bias) | IRLS: α (precision effect) | IRLS: R^{2} |
|---|---|---|---|---|---|---|---|
| Output subset | 157 | −2.060 (−5.22)*** | −0.606 (−11.77)*** | 0.51 | −1.970 (−6.53)*** | −0.593 (−11.41)*** | 0.47 |
| Unemployment subset | 112 | 0.171 (0.12) | −0.265 (−8.39)*** | 0.39 | −0.125 (−0.06) | −0.253 (−3.11)*** | 0.39 |
Before performing the FAT on each subset separately, we first test the null that the data do not need to be split into these two subsets. To do so, we merge the two subsamples and perform an ordinary least squares (OLS) estimation of equation (5) on the whole sample. We then perform a Chow test of this null hypothesis. The test produces an F-statistic of 13.594 with an associated p value of 0.000, which clearly rejects the null. As a result, the remainder of the paper focuses mainly on these two subsets separately.3
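The mechanics of the Chow test just described can be sketched as follows. This is only an illustration under stated assumptions: equation (5) is taken to have two coefficients (an intercept and a slope, so k = 2), the function name is ours, and the residual sums of squares are made-up numbers; only the subsample sizes (157 and 112) come from the text.

```python
def chow_f_statistic(rss_pooled, rss_a, rss_b, n_a, n_b, k):
    """F-statistic for the null that two subsamples share the same k coefficients.

    rss_pooled: residual sum of squares from the regression on the merged sample
    rss_a, rss_b: residual sums of squares from the two separate regressions
    """
    rss_split = rss_a + rss_b
    df_den = n_a + n_b - 2 * k
    if df_den <= 0:
        raise ValueError("not enough observations for the chosen k")
    return ((rss_pooled - rss_split) / k) / (rss_split / df_den)


# Hypothetical residual sums of squares (NOT taken from the paper);
# the sample sizes are those of the output and unemployment subsets.
f_stat = chow_f_statistic(rss_pooled=1200.0, rss_a=600.0, rss_b=400.0,
                          n_a=157, n_b=112, k=2)
print(round(f_stat, 2))
```

Under the null, the statistic follows an F distribution with k and n_a + n_b − 2k degrees of freedom, so a large value argues against pooling.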
We now consider the sign and significance of publication bias for each of the two subsets.4 First consider the subset of studies with real output as the dependent variable (denoted ‘Output subset’ in Table 3). Here, the estimated sign of β suggests that the direction of publication bias is negative. Moreover, with either the OLS or the IRLS estimator, the FAT shows that the β coefficient (intercept term) is highly significant, so that the null of no type I publication bias is strongly rejected. Note also that not only is the β coefficient negative, but its size is larger than 2 in absolute value (or nearly 2 in the case of the IRLS estimator), which might be considered an indication of a ‘severe selectivity’ effect according to Doucouliagos and Stanley (2008).
The story is different for the subset of studies with the unemployment rate as dependent variable (denoted ‘Unemployment subset’ in Table 3). In this case, the β coefficient is not significant with either the OLS or the IRLS estimator, so that the hypothesis of no type I publication bias is not rejected in this subset.
Hence we find that a type I bias is present only in the subset of papers estimating the OLC with empirical models using real output as the dependent variable. The difference between studies using real output as the endogenous variable and studies using unemployment rate as the endogenous variable is an important finding: while the first group of papers seems to be plagued by publication bias, the null hypothesis that the second group is not affected by this problem cannot be rejected at the usual confidence level.
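To make the FAT regression concrete, here is a minimal sketch under stated assumptions. Equation (5) is not reproduced in this section; from the description of Table 3 we assume it regresses each estimate's t-statistic on an intercept β (the bias term) and on precision 1/SE with slope α (the precision effect). The function name and the five data points are hypothetical, and plain OLS is used; the IRLS estimator is not shown.

```python
import math


def fat_pet(estimates, std_errors):
    """OLS of t_i = estimate_i / se_i on an intercept and precision_i = 1 / se_i."""
    t_stats = [b / s for b, s in zip(estimates, std_errors)]
    precision = [1.0 / s for s in std_errors]
    n = len(t_stats)
    mean_x = sum(precision) / n
    mean_y = sum(t_stats) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, t_stats))
    alpha = sxy / sxx                    # slope: precision effect
    beta = mean_y - alpha * mean_x       # intercept: publication bias term
    resid = [y - beta - alpha * x for x, y in zip(precision, t_stats)]
    s2 = sum(e * e for e in resid) / (n - 2)
    se_alpha = math.sqrt(s2 / sxx)
    se_beta = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    return {"beta": beta, "t_beta": beta / se_beta,
            "alpha": alpha, "t_alpha": alpha / se_alpha}


# Made-up estimates and standard errors, for illustration only.
estimates = [-0.395, -0.51, -0.68, -0.825, -1.3]
std_errors = [0.05, 0.1, 0.2, 0.25, 0.5]
result = fat_pet(estimates, std_errors)
print(result)
```

In this reading, a significant intercept signals funnel asymmetry (type I bias), while a significant slope signals an effect that survives the correction.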
We now turn to type II bias, and begin by examining the Galbraith plots shown in Fig. 3 (a) and (b) for the output subset and the unemployment subsample respectively (the horizontal lines are the +2 and −2 limits for the t-statistics). The reported t-statistics exhibit both a wide variation and an apparent tendency to decline with rising precision. This visual examination of the Galbraith plots can be complemented by the use of z-type tests on the proportion of significant t-statistics. Table 4 reports the results of these z-tests.
Table 4. Tests of Type II Publication Bias and the True Effect

| Endogenous variable | Proportion of significant t-stat.^{a} | Z | p value | Assumed true effect |
|---|---|---|---|---|
| Real output | 84% | 41.50 | 0.00 | 0.00 |
| Real output | 60% | 30.66 | 0.00 | −1.60^{b} |
| Unemployment | 76% | 38.80 | 0.00 | 0.00 |
| Unemployment | 65% | 34.95 | 0.00 | −0.275^{b} |
As can be seen in the Galbraith plots for the output subset and the unemployment subsample, type II biases seem to be present in both subsamples. Assuming that there is no underlying true effect (TE = 0), only 5 per cent of the studies should report t-statistics larger than 2 in absolute value. However, the proportions of studies reporting t-statistics exceeding 2 are close to 84 per cent and 76 per cent respectively, and the null hypothesis that the proportion of significant t-statistics is equal to 5 per cent is systematically rejected when the TE is taken to be zero (z = 41.50 with p < 0.0001 for the output subset and z = 38.80 with p < 0.0001 for the unemployment subset). Moreover, implementing the tests for a value of the TE evaluated from the top 10 per cent of the corresponding funnel graphs, the null hypothesis that the proportion of significant t-statistics is equal to 5 per cent is again strongly rejected (z = 30.66 with p < 0.0001 for the output subset and TE = −1.601, and z = 34.95 with p < 0.0001 for the unemployment subset and TE = −0.275).
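The logic of this type II test can be sketched with a standard one-sample z-test for a proportion. This is an assumption on our part: the exact statistic behind Table 4 is not spelled out in this section, so the sketch illustrates the idea and is not meant to reproduce the reported z values. Function names and the sample data are hypothetical.

```python
import math


def share_significant(estimates, std_errors, true_effect=0.0, cutoff=2.0):
    """Share of estimates whose t-statistic against the assumed true effect exceeds the cutoff."""
    hits = sum(1 for b, s in zip(estimates, std_errors)
               if abs((b - true_effect) / s) > cutoff)
    return hits / len(estimates)


def proportion_z(observed_share, n_estimates, null_share=0.05):
    """Normal-approximation z-statistic for H0: the share of significant results equals null_share."""
    se = math.sqrt(null_share * (1.0 - null_share) / n_estimates)
    return (observed_share - null_share) / se


# Made-up inputs: four estimates with a common standard error of 0.1.
share = share_significant([-0.5, -0.1, -0.3, -0.02], [0.1, 0.1, 0.1, 0.1])
z = proportion_z(0.60, 100)
print(share, round(z, 2))
```

Passing a non-zero `true_effect` corresponds to the second row of each pair in Table 4, where significance is measured against the value read off the top of the funnel rather than against zero.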
While studies using real output as the endogenous variable and studies using unemployment rate as the endogenous variable exhibited different results with respect to the null hypothesis of no type I publication bias, the null of no type II bias is now rejected for both subsamples (and also for the combined whole sample as it happens). In the literature on the OLC, this excess variation may thus reflect selection for statistically significant results.
Whereas the detection of the presence of publication bias is a necessary step in analysing the literature, a more important question concerns whether there is an underlying true effect, irrespective of publication selection. As suggested by Stanley (2008), equation (5) may also be used to test for an authentic empirical effect beyond publication bias. Empirical results of performing the PET on the slope coefficient α of equation (5) highlight the following points.
Using the α (precision effect) point estimates and t-statistics reported in Table 3, the 95 per cent confidence intervals reported by PET for the unemployment rate subset are: [−0.33; −0.20] with OLS and [−0.41; −0.09] with IRLS. In the case of the output subset, empirical estimates of the TE are much larger (in absolute value) since they vary from [−0.72; −0.52] with OLS to [−0.70; −0.50] with IRLS.
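As a worked illustration of how such an interval follows from a point estimate and its t-statistic, the sketch below backs out the standard error and applies a normal critical value of 1.96. The critical value and the function name are our assumptions; because Table 3 reports rounded figures, results may differ from the intervals quoted in the text in the last digit.

```python
def interval_from_t(estimate, t_stat, critical=1.96):
    """Confidence interval implied by a coefficient and its t-statistic."""
    std_error = abs(estimate / t_stat)
    return (estimate - critical * std_error, estimate + critical * std_error)


# OLS precision effect for the unemployment subset, as reported in Table 3.
low, high = interval_from_t(-0.265, -8.39)
print(round(low, 3), round(high, 3))
```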
Aside from the evident sensitivity of results to the estimation procedure, the TE obtained for the OLC appears to be systematically larger (in absolute value) for the output subset than for the unemployment subset. Empirical models that estimate the OLC with real output as the dependent variable thus seem to yield larger estimates of the sensitivity of unemployment movements to real output fluctuations.
Multivariate MRA
To implement the multivariate MRA, we estimate equation (6) first for the full set of 269 estimates, and then separately for each subset of those estimates, where the partition is based on choice of endogenous (dependent) variable. Each regression initially includes all the dummy explanatory variables listed in Table 2, other than those which have to be omitted so as to avoid linear dependence (in which case the constant term represents the effects of the omitted dummies). In this paper, the omitted dummies are SAMPTS, FREQY, COUNT, COUNTDED, MODSTA, NOOTHEXO, NEQ1 and DELTA.
Each model is first estimated with OLS. Insignificant variables are then excluded with a stepwise procedure involving both specific-to-general (or forward) and general-to-specific (or backward) selection steps to specify the finally estimated model. More precisely, variables are added to the model sequentially until no variable not yet in the model would, when added, have a t-statistic with a p value smaller than 0.05. Each time a variable is added to the model, the variables with the lowest t-statistics (in absolute value) are deleted until all remaining variables have a p value smaller than 0.05.
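The selection loop described above can be sketched as follows. This is a schematic reading of the procedure, not the authors' code: `pvalues_fn` is a hypothetical callback that fits the model on a given list of variables and returns a p value for each, and the round limit is our own safeguard against cycling between forward and backward steps.

```python
def stepwise_select(candidates, pvalues_fn, alpha=0.05, max_rounds=100):
    """Forward-backward stepwise selection on p values.

    pvalues_fn(variables) must return a dict {variable: p value} for a model
    containing exactly those variables (hypothetical interface).
    """
    selected = []
    for _ in range(max_rounds):
        # Forward step: find the excluded variable with the smallest p value below alpha.
        best_var, best_p = None, alpha
        for var in candidates:
            if var in selected:
                continue
            p = pvalues_fn(selected + [var])[var]
            if p < best_p:
                best_var, best_p = var, p
        if best_var is None:
            return selected
        selected.append(best_var)
        # Backward step: drop the least significant variable until all pass.
        while selected:
            pvals = pvalues_fn(selected)
            worst = max(selected, key=lambda v: pvals[v])
            if pvals[worst] < alpha:
                break
            selected.remove(worst)
    return selected


# Stub with fixed, made-up p values, just to exercise the loop.
fixed_p = {"LEVEL": 0.01, "FILTHP": 0.60, "FREQSQ": 0.03}
chosen = stepwise_select(list(fixed_p), lambda model: {v: fixed_p[v] for v in model})
print(chosen)
```

In practice the callback would wrap an actual regression fit, so that p values change as variables enter and leave the model.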
A robustness check was then performed by re-estimating the finally retained model with the IRLS procedure. Meta-explanatory variables that appear as significant with both OLS and IRLS estimation of the finally selected model can be considered as the most influential effects on the value of the OLC. Lastly, in order to take into account the fact that the so-called ‘economics research cycle’ (Havranek, 2010) may influence the size of the OLC, the year of publication (YEAR) and its square (YEAR2) are also added to the list of the finally selected significant variables. According to the economics research cycle hypothesis, when pioneering empirical results are published they are often quickly confirmed by other publications exhibiting highly significant estimates. After that, publishing sceptical results or empirical results that diverge from the initial results may become preferable for editors in order to feed the controversies. A positive coefficient associated with the variable YEAR and a negative coefficient associated with YEAR2 (with joint significance) may indicate that the economics research cycle hypothesis is consistent with the data at hand in fully specified models. Empirical results are reported in Table 5.
Table 5. Multivariate MRA (t-statistics in parentheses)

| Variable | Whole sample: OLS | Whole sample: stepwise then IRLS | Unemployment subsample: OLS | Unemployment subsample: stepwise then IRLS | Output subsample: OLS | Output subsample: stepwise then IRLS |
|---|---|---|---|---|---|---|
| Constant | −240.41 (−2.01) | −194.45 (−3.00) | −286.50 (−0.72) | | −274.87 (−3.24) | −327.92 (−5.58) |
| Precision | −0.400 (−3.08) | −0.528 (−9.44) | −0.289 (−1.15) | −0.409 (−12.53) | −1.138 (−8.85) | −1.022 (−14.81) |
| SAMPPA | −0.261 (−1.74) | −0.174 (−1.80) | | | 0.054 (0.64) | |
| FREQSQ | 0.152 (1.37) | 0.186 (4.38) | 0.147 (0.72) | 0.197 (4.55) | 1.775 (5.36) | 1.489 (11.86) |
| COUNTDING | 0.188 (3.83) | 0.225 (4.83) | 0.139 (1.65) | 0.205 (6.77) | | |
| REG | 0.334 (2.67) | 0.293 (3.71) | | | 0.183 (2.01) | 0.192 (2.77) |
| MODDYN | 0.117 (2.36) | 0.145 (2.96) | 0.008 (0.09) | | 1.379 (6.33) | 1.107 (10.29) |
| OTHEXO | 0.138 (2.16) | 0.218 (5.54) | 0.012 (0.10) | | −0.764 (−4.34) | −0.614 (−5.22) |
| NEQN | −0.057 (−1.65) | | −0.071 (−1.39) | | | |
| ENDY | −0.437 (−3.35) | −0.390 (−6.22) | | | | |
| LEVEL | −0.124 (−1.71) | | −0.253 (−1.89) | −0.211 (−5.85) | 1.371 (5.33) | 1.107 (10.296) |
| FILTLT | −0.153 (−1.09) | | −0.055 (−0.11) | | 0.123 (0.85) | |
| FILTHP | −0.031 (−0.54) | | −0.008 (−0.08) | | 0.134 (0.99) | |
| FILTBK | −0.160 (−1.00) | | 0.022 (0.05) | | 0.301 (1.77) | |
| FILTBN | −0.300 (−1.20) | | −0.325 (−0.72) | | 0.106 (0.51) | |
| FILTUC | −0.019 (−0.16) | | −0.012 (−0.05) | | 0.057 (0.32) | |
| FILTMOD | 0.545 (0.88) | | | | | |
| AVGYEAR | 0.120 (1.99) | 0.097 (2.96) | 0.143 (0.72) | | 0.138 (3.22) | 0.164 (5.56) |
| R^{2} | 0.65 | 0.61 | 0.62 | 0.57 | 0.80 | 0.79 |
| F-test (p value) | 0.000 | 0.000 | 12.43 (0.00) | 0.000 | 42.95 (0.00) | 71.30 (0.00) |
| Reset test (p value) | 0.061 (0.80) | 0.024 (0.87) | 0.003 (0.95) | 0.936 (0.33) | 2.097 (0.15) | 0.557 (0.46) |
In order to obtain more information about the influence of the endogenous variable on the OLC estimates, equation (6) is first estimated for the whole set of 269 OLC estimates, with the model including the full set of explanatory variables. Given our previous finding in Section 4, we are aware that this pooling process (stacking the effect of GDP on unemployment and the inverse of the effect of unemployment on GDP) is likely to be invalid. But this is precisely why we do carry out this step so that here, in a more general multivariate context, the influence of endogenous variable (either GDP or unemployment) can be statistically tested for.
Of particular interest in this exercise is the role played by the dummy variable ENDY (which equals 1 if real GDP is used as the endogenous variable and 0 otherwise). In this case, the constant term captures the influence of omitted variables for the subset of models with unemployment rate as the endogenous variable and the coefficient associated with the dummy ENDY, where it is nonzero and significant, indicates by how much the OLC changes when moving from the unemployment subset to the real output subset.
This initial regression is presented in the first two columns of Table 5. The last four columns present the empirical results for the unemployment subset and the output subset respectively. For each pair of columns in the table, the first column in the pair lists unrestricted OLS regression results, while the second reports results from the IRLS estimator after applying the stepwise testing down procedure.
For the whole sample and each of the two subsets, F-tests indicate that the estimated coefficients are jointly significant. However, in the unrestricted regressions, low values of t-statistics indicate that some coefficients may be non-significant. This is confirmed by the stepwise testing-down procedure.
For the ‘pooled regression’ using the full set of 269 OLC study estimates, the results of the multivariate analysis are consistent with the bivariate FAT model and also suggest the presence of a publication bias. Moreover, the estimated ‘true’ OLC equals −0.53 (with 95 per cent confidence interval (−0.64, −0.42)) with the IRLS procedure. Note that in this multivariate analysis, the coefficient of the precision effect can be considered as a measure of the OLC for studies corresponding to the omitted dummies (i.e. studies using annual time series data for developed countries and single-equation models specified as static relationships involving the first difference of the unemployment rate as the dependent variable and the first difference of real output as the only explanatory variable). As suggested by the value and significance of the coefficient associated with the moderator variable ENDY, studies using a model specified with output as the dependent variable tend to yield larger absolute values of the OLC (a positive sign means that the value of the OLC increases towards zero while a negative sign means that the value of the OLC decreases away from zero). Moreover, this effect appears to be highly significant, as revealed by the associated t-statistics. The use of real output instead of the unemployment rate as the dependent variable in the Okun's law equation specification increases the absolute value of the OLC by 0.390 (on average). As the estimated OLCs in the sample are harmonized so as to represent the impact of output on unemployment, the OLC retained for this group of studies is simply the inverse of the coefficient associated with unemployment (or employment) in the real output equation.
As a consequence, the large negative values of the OLC estimated in this pooled group of studies may result from the fact that estimating some form of production function leads to an underestimation of the sensitivity of output to employment (or unemployment) because of simultaneity bias. The OLC calculated as the inverse of this coefficient is thus mechanically overestimated.
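A small numerical sketch may help fix the direction of this mechanical effect. Okun's original 3-to-1 rule corresponds to a coefficient of −3 on unemployment in an output equation; the attenuated value of −2 is purely hypothetical and stands in for a coefficient biased towards zero. The function name is ours.

```python
def olc_from_output_equation(coef_on_unemployment):
    """OLC implied by inverting the unemployment coefficient of an output equation."""
    if coef_on_unemployment == 0:
        raise ValueError("a zero coefficient cannot be inverted")
    return 1.0 / coef_on_unemployment


okun_rule = olc_from_output_equation(-3.0)    # Okun's 3-to-1 rule
attenuated = olc_from_output_equation(-2.0)   # hypothetical coefficient biased towards zero
print(round(okun_rule, 3), attenuated)
```

Because the map is a reciprocal, any shrinkage of the output-equation coefficient towards zero inflates the implied OLC in absolute value.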
When splitting the whole sample so as to analyse separately the group of studies involving an Okun's law model with unemployment rate as the endogenous variable and the group of studies with real output as the endogenous variable, the multivariate models lead to empirical results for publication bias and authentic empirical effect which are fully consistent with those from bivariate MRA. Papers with real output as the endogenous variable are affected by negative publication bias while no publication bias appeared as statistically significant in the case of papers with unemployment rate as the endogenous variable. Moreover, authentic empirical effects are significant in both groups of papers with a lower value (in absolute terms) for the group of studies with unemployment rate as the endogenous variable. The precision effect equals −0.40 (with 95 per cent confidence interval (−0.47, −0.34)) for the unemployment subset and −1.02 (with 95 per cent confidence interval (−1.15, −0.88)) for the output subset.
For both subsets, it is important to note that the influence of the filtering procedure (such as the HP filter, the Baxter-King filter or the Beveridge-Nelson filter) is never significant after selection of the most influential moderator variables with the stepwise methodology. Finally, as in the case of the bivariate MRA, the hypothesis of an ‘economics research cycle’ is systematically rejected at the 5 per cent significance level for both subsets: YEAR and YEAR2 are not jointly significant (F_{(2,259)} = 0.327 with p value = 0.722 for the unemployment rate subset and F_{(2,259)} = 0.960 with p value = 0.385 for the real output subset).
Let us first consider results for the multivariate MRA using the ‘unemployment as endogenous variable’ subset. The null hypothesis of linear functional form (no omitted variables) for the estimated model is not rejected by the Ramsey RESET test. Empirical estimates of the magnitude of the OLC are affected by the frequency of the data (FREQSQ: +), the development level of the countries (COUNTDING: +) and by whether the model specification is in terms of level or first difference of the variables (LEVEL: −). The higher the frequency of the data, the smaller the OLC (in absolute terms). Whereas adjustment may be rather rapid in some circumstances, it takes time for output variations to generate changes in the rate of unemployment. Quarterly or semi-annual databases may thus yield lower estimated OLC values. Other things equal, the estimated OLC is also lower (in absolute terms) when the database includes only non-OECD countries. One might conjecture, although we have no evidence for this here, that this may be explained by the dependence of the magnitude of the OLC on labour market institutions, the ease of hiring and firing workers, labour mobility, migration possibilities and the nature of economic shocks. Finally, specification of the Okun's law model in levels (LEVEL = 1) systematically leads to higher estimated OLC values (in absolute terms). One plausible explanation for this finding is that models estimated in levels (without filtering the data so as to exclude potential output or natural unemployment) will capture the total cumulated or long-run effect of the exogenous variable on the endogenous variable. The corresponding estimates of the OLC may thus be expected to be larger with this kind of model.
We now consider results for the multivariate MRA using the ‘output as endogenous variable’ subset. The overall fit is quite high for a meta-regression and the null hypothesis of linear functional form is again not rejected by the RESET test. The last two columns of Table 5 show that empirical estimates of the OLC are smaller (in absolute value) when using semi-annual or quarterly data rather than annual data5 (FREQSQ: +), and when using regional data instead of national data (REG: +).
The results in the last two columns of Table 5 show positive coefficients on the dummy variables indicating whether the specification used is a dynamic model of Okun's law involving lags of the measure of unemployment and/or real output (MODDYN: +), and whether the model specification is in terms of the levels of the variables (LEVEL: +). But we must take care in interpreting these two positive coefficient signs, particularly given that the positive coefficient on LEVEL in this regression appears to contradict the negative coefficient found on LEVEL in the MRA regression involving the unemployment subset. This apparent contradiction is easily resolved. In the case of models where unemployment is the endogenous variable, we reported in Table 5 that where a study used a regression in the levels of variables the OLC will be larger in absolute value; that is, the coefficient on LEVEL was negative. In the case of models where GDP is the endogenous variable, the same result will appear and the impact of unemployment on real output will be larger. But this will be revealed as a positive coefficient on LEVEL in Table 5 because we retain the inverse of the estimated OLC for models with output as the endogenous variable (so as to make them comparable to the OLC obtained when unemployment is endogenous).
The same reasoning applies to the coefficient attached to the variable MODDYN as to that attached to the variable LEVEL. They are both reported as positive (and of the same order) in Table 5. Hence, the coefficient on MODDYN implies that, for studies using output as endogenous variable, the OLC will be larger in absolute value where models are estimated with dynamic regressions (including at least lags of the endogenous variables). Again, one might conjecture that this arises because such models will capture the total cumulated or long-run effect of the exogenous variable on the endogenous variable.
Finally, one can see from the final two columns of Table 5 that a more recent database also seems to lead to smaller values (in absolute value) of the OLC (AVGYEAR: +). In contrast, the estimated impact of output on unemployment is larger (in absolute terms) when extra exogenous variables are added to the regression model (OTHEXO: −).
These results suggest the following. First, studies that use regional data instead of macroeconomic data are more likely to report smaller values (in absolute terms) of the OLC. This lower sensitivity of the unemployment rate to regional output variations may be due to the fact that asymmetric regional output shocks are partly dampened by local or regional policy adjustments. Another possibility might be that regional labour market disequilibrium is partly cancelled by real wage variations and labour mobility so that the adjustment does not systematically occur through variations in the number of unemployed persons. Second, the OLC tends to be smaller (in absolute terms) in studies using a dynamic model instead of a static one. Dynamic models incorporate lags of the endogenous variable and may also include lags of the exogenous variables as in the traditional autoregressive distributed lag (ARDL) model. Even with a limited number of lags, this kind of model may capture the total cumulated effect of real output variations on unemployment. This total cumulated effect of real output on unemployment may thus be expected to be lower than the impact effect evaluated with a static model if disequilibria of the labour market tend to vanish progressively over time. However, this interpretation has to be advanced with care because the retained sample does not allow us to investigate the context of complex dynamic effects such as threshold effects or nonlinear effects over time.
Conclusion
In this paper, we have been searching for the value of an underlying non-observable parameter (a ‘true effect’). However, in the real world, the observed value of the parameter or the estimated value of the parameter can be different from this underlying true value because of the characteristics of the country under examination, and of many other things such as data periodicity, the filtering procedure, and so on. It is these kinds of factors which can help one to understand and explain the large degree of heterogeneity of the OLC in the associated empirical literature.
We selected a sample of 269 estimates of the OLC from the literature to uncover the reasons for the differences in empirical results across studies and to estimate the ‘true’ OLC. On the basis of prior analysis suggesting the inappropriateness of pooling, we then implemented an MRA on each of two subsets of studies: the group using some measure of unemployment as the dependent variable and the group employing a production function version of Okun's law with some measure of output as dependent variable.
While there is evidence of type II publication bias in both subsets, a type I bias is present only among the papers using a measure of output as the dependent variable. Moreover, taking into account those biases, the estimated true OLCs are significantly larger (in absolute value) with models using output as the dependent variable (−0.61 instead of −0.25 with a bivariate MRA and −1.02 instead of −0.40 with a multivariate MRA). Our results clearly show that one of the primary sources of heterogeneity that can be identified in this literature is between studies which investigate the OLC with a model including some measure of unemployment as the dependent variable and those that focus on a model involving some measure of output as the endogenous variable.
Thus, model specification is an important source of heterogeneity in this literature, and it may be reasonable to argue that there are two underlying ‘true values’ for the OLC depending on the choice of dependent variable. Selecting some measure of output as endogenous variable might amount to estimating a form of production function indicating the long-run impact of employment on real output. In contrast, when estimating the OLC with a model in which some measure of the unemployment rate is treated as the endogenous variable, such a specification seems adequate to capture the short-run impact of aggregate demand movements on unemployment variations.
But of course the choice of dependent variable is not the only source of heterogeneity. Among other possible sources of heterogeneity, we found the dynamic specification of the model, the frequency of the data, the degree of development of the countries and the choice between regional data and national data to be particularly important. To help interpret our results, let us consider characteristics of the zone or country in question, including the degree of development of the region or country. To capture (and control for) such factors, our multivariate MRA models included exogenous dummy variables to indicate whether a study's database comprised only developed countries (or only developing countries) and whether it included only countries (or only regions). In doing so, we implicitly assume that the OLC can be different from the true value because of two important characteristics of the country under examination: the degree of development of the country or zone; and the degree of exogeneity of wages and the degree of labour mobility (through the dummies REG and COUNT).6 Moreover, including different countries will not bias our OLC estimates if the chosen dummies capture the main influence of the characteristics of the countries on the OLC. Thus, if one wished to identify the particular value of the Okun's law relationship for a given country, one should not use the estimated ‘true value’ of the OLC, but rather use the value implied by our estimates for that country; that is, the value which takes into account the characteristics of this country.
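One way to read ‘the value implied by our estimates’ is sketched below, under an assumption we cannot verify from this section: that the moderator dummies in equation (6) enter divided by the standard error, so that each coefficient shifts the OLC additively relative to the precision effect. The coefficients are the stepwise/IRLS values for the unemployment subsample in Table 5; the function name and the example profile are hypothetical.

```python
# Stepwise/IRLS coefficients for the unemployment subsample, as reported in Table 5.
PRECISION_EFFECT = -0.409
MODERATORS = {"FREQSQ": 0.197, "COUNTDING": 0.205, "LEVEL": -0.211}


def implied_olc(profile):
    """OLC implied for a study profile given as {dummy name: 0 or 1}.

    Assumes (our reading, not stated in this section) that each retained
    moderator shifts the OLC additively relative to the baseline study.
    """
    unknown = set(profile) - set(MODERATORS)
    if unknown:
        raise KeyError(f"no retained coefficient for: {sorted(unknown)}")
    return PRECISION_EFFECT + sum(MODERATORS[name] * value
                                  for name, value in profile.items())


baseline = implied_olc({})                          # all retained dummies at zero
quarterly_levels = implied_olc({"FREQSQ": 1, "LEVEL": 1})
print(baseline, round(quarterly_levels, 3))
```

An empty profile returns the precision effect itself, which corresponds to the baseline defined by the omitted dummies.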
Now we turn to what our results tell us about the true value of the OLC. After eliminating the influence of the main characteristics of each country (the previously mentioned dummies), the influence of the characteristics of the databases and the characteristics of the econometric procedures, the fundamental true values of the OLC are −0.40 and −1.02 (depending on the endogenous variable: unemployment or GDP). Of course, we cannot use these values for a given country but we can say that the real value of the correlation between unemployment and GDP movements should be close to −0.40 (when the main assumption is that short-run unemployment movements are mainly driven by exogenous output shocks), and −1.02 (when the main assumption is that exogenous unemployment rate variations impact on GDP movements), on average and across countries and regions.