Summary
 Summary
 1 Introduction
 2 Data and Design of the Forecasting Exercise
 3 Baseline Case
 4 Selection of Hyperparameters and Lag Length
 5 Level Versus Growth Rates and the Role of Cointegration/Unit Roots Priors
 6 Alternative Multi-Step Forecasting Approaches
 7 Litterman Prior and Cross-Variable Shrinkage
 8 Rolling Versus Recursive Estimation
 9 VAR Size
 10 Results for Other Countries
 11 Conclusions
 Acknowledgements
 References
 Supporting Information
In this paper we discuss how the point and density forecasting performance of Bayesian vector autoregressions (BVARs) is affected by a number of specification choices. We adopt as a benchmark a common specification in the literature, a BVAR with variables entering in levels and a prior modeled along the lines of Sims and Zha (International Economic Review 1998; 39: 949–968). We then consider optimal choice of the tightness, of the lag length and of both; evaluate the relative merits of modeling in levels or growth rates; compare alternative approaches to h-step-ahead forecasting (direct, iterated and pseudo-iterated); discuss the treatment of the error variance and of cross-variable shrinkage; and assess rolling versus recursive estimation. Finally, we analyze the robustness of the results to the VAR size and composition (using also data for France, Canada and the UK, while the main analysis is for the USA). We obtain a large set of empirical results, but the overall message is that we find very small losses (and sometimes even gains) from the adoption of specification choices that make BVAR modeling quick and easy, in particular for point forecasting. This finding could therefore further enhance the diffusion of the BVAR as an econometric tool for a vast range of applications. Copyright © 2013 John Wiley & Sons, Ltd.
1 Introduction
Forecasting future developments in the economy is a key element of the decision process in policy making, consumption and investment decisions, and financial planning. For example, members of the Federal Open Market Committee often stress that, because monetary policy affects the economy with a lag, policy must be forward-looking. Looking ahead means relying on forecasts of output growth, inflation, and other key indicators.
Recently there has been a resurgence of interest in applying Bayesian methods to point and density forecasting, particularly with Bayesian vector autoregressions (BVARs). BVARs have a long history in forecasting, stimulated by their effectiveness documented in the seminal studies of Doan et al. (1984) and Litterman (1986). In recent years, the models seem to be used even more systematically for policy analysis and forecasting macroeconomic variables (e.g. Kadiyala and Karlsson, 1997; Koop, 2013). At present, there is considerable interest in using BVARs for these purposes in a large dataset context (e.g. Carriero et al., 2009, 2011; Banbura et al., 2010; Koop, 2013).
However, putting BVARs to use in practical forecasting raises a host of detailed questions about model specification, estimation and forecast construction.
With regard to model specification, the researcher needs to address issues such as: (i) the choice of the tightness and of the lag length of the BVAR; (ii) the treatment of the error variance and the imposition of cross-variable shrinkage; (iii) whether or not to transform the variables to achieve stationarity, and whether to complement this choice with the imposition of priors favoring cointegration and unit roots.
Accordingly, a first aim of this paper is to examine the effects that such specification choices have on the forecast accuracy of the BVAR. We adopt as a benchmark a common specification in the literature, a Bayesian VAR with variables entering in levels and a prior modeled along the lines of Sims and Zha (1998), Robertson and Tallman (1999), Waggoner and Zha (1999), Zha (1998) and, more recently, Giannone et al. (2012).
With regard to model estimation and forecast construction, under some approaches estimating and forecasting with a BVAR can be technically and computationally demanding. For the homoskedastic BVAR with natural conjugate prior, the posterior and one-step-ahead predictive densities have convenient analytical forms (Student's t). However, even for this prior, multi-step predictive densities do not have analytical forms and simulation methods are required. Under a Normal-inverted Wishart prior and posterior that treat each equation symmetrically, Monte Carlo methods can be used to efficiently simulate the multi-step predictive densities, taking advantage of a Kronecker structure in the posterior variance of the model's coefficients.
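To illustrate why the Kronecker structure matters computationally, the sketch below draws from a Normal-inverted Wishart posterior in matrix-normal form using numpy only. The function names and the posterior-moment inputs (Phi_bar, V, S, nu) are our own notation, not the paper's; the point is that the draw never forms the large Kronecker covariance matrix.

```python
import numpy as np

def draw_inverse_wishart(S, nu, rng):
    """One draw from IW(S, nu) via the Bartlett decomposition (a sketch,
    not a tuned sampler): W ~ Wishart(nu, S^{-1}), then return W^{-1}."""
    p = S.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(S))
    A = np.tril(rng.standard_normal((p, p)), k=-1)
    np.fill_diagonal(A, np.sqrt(rng.chisquare(nu - np.arange(p))))
    W = L @ A @ A.T @ L.T
    return np.linalg.inv(W)

def draw_niw_posterior(Phi_bar, V, S, nu, rng):
    """Joint draw (Phi, Sigma) when vec(Phi) | Sigma ~ N(vec(Phi_bar),
    Sigma kron V) and Sigma ~ IW(S, nu). The matrix-normal form below
    only needs K x K and M x M Cholesky factors, never the MK x MK
    Kronecker covariance."""
    Sigma = draw_inverse_wishart(S, nu, rng)
    K, M = Phi_bar.shape
    Z = rng.standard_normal((K, M))
    Phi = Phi_bar + np.linalg.cholesky(V) @ Z @ np.linalg.cholesky(Sigma).T
    return Phi, Sigma
```

With an asymmetric prior across equations, this shortcut is unavailable and each draw requires manipulating the full system covariance, which is what makes Gibbs sampling costly in large models.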
Other priors or model extensions (such as allowing for asymmetric prior variances across equations, or time series heteroskedasticity in the disturbances) mean that neither posteriors nor predictive densities have analytical forms. In these cases, simulations become more computationally intensive because the posterior variance of the model's coefficients no longer has a Kronecker structure. To avoid costly simulation, Litterman's (1986) specification of the Minnesota prior treats the error variance matrix as fixed and diagonal. Litterman (1986) imposes such a strong assumption to allow for equation-by-equation ridge estimation of the system; treating the error variance matrix as random would have required Markov chain Monte Carlo (MCMC) simulation of the entire system of equations.
While improved computational power has made simulation of models under a Normal-inverted Wishart prior specification more tractable, some researchers and practitioners may prefer to avoid simulation methods and use alternatives considered in such studies as Banbura et al. (2010) and Koop (2013). This preference could stem from the computational hurdles of conducting simulations with very large models. Also, it can stem from very tight time constraints for the production of the forecasts, as can be the case for market strategists, for example. Alternatively, the preference could be a function of software choice and the coding burden of simulation. Common software such as RATS and EViews provides commands for estimating BVARs and forecasting without simulation; simulation requires more significant programming by the user. Similarly, while many users of Matlab are capable of programming simulation, the absence of simple procedures or toolboxes may make simulation costly to other users.
Accordingly, a second aim of this paper is to examine approaches that make the computation of point and density forecasts from BVARs quick and easy, for example by making specific choices on the priors and by using direct rather than iterated forecasts (e.g. Marcellino et al., 2006). In most cases, the resulting forecasts represent approximations of the posterior distribution. Hence we then assess whether such approximations yield significant losses in terms of decreased forecast precision, as measured by either the root mean squared forecast error or, in the case of density forecasts, the predictive score. We show that, for users focused on point forecasts, there is little cost to methods that do not involve simulation.
Since it is difficult to rank the alternative modeling and forecasting choices from a purely theoretical point of view, given that their relative performance will be determined by the unknown data-generating process, we take a more practical perspective. Specifically, we consider a set of variables whose future evolution is of key interest for central banks and more generally for economic policy making, and we evaluate the performance of different BVAR modeling choices in this context. In light of recent evidence on the success of larger models relative to smaller ones and interest in large datasets (e.g. Banbura et al., 2010; Koop, 2013), we focus on mid-size models applied to monthly data: 18-variable BVARs for US macroeconomic and financial data.
To ensure our results have broad applicability, we check their robustness to changes in both the time-series and cross-sectional dimensions of the system. In particular, we consider recursive and rolling estimation, a reduction in the size of the VAR to a subset of seven of the 18 US variables, and a repeat of the analysis on other datasets, specifically data for Canada, France and the UK.
We obtain a large set of empirical results, but we can summarize them by saying that we find very small losses (and sometimes even gains) from the adoption of BVAR modeling choices that make forecast computation quick and easy, in particular for point forecasting. An approach that works well is to specify a Normal-inverted Wishart prior along the lines of Sims and Zha (1998) on the VAR in levels, preferably optimizing its tightness and lag length. Optimizing over the lag length is generally helpful, and optimal selection of the tightness never harms, though the average gains are small in our empirical applications. For the accuracy of point forecasts, there proves to be essentially no payoff to using MCMC methods to obtain multi-step forecasts from the posterior distribution. For density forecasting, simulation methods work better than a direct multi-step approach, especially at longer horizons (and less so at shorter horizons). Specifications in levels benefit substantially from the imposition of the sum of coefficients and dummy initial observation priors of Doan et al. (1984) and Sims (1993). In contrast, there is no payoff to using a Litterman (1986) prior that treats the error variance matrix as fixed and diagonal and is tighter for lags of other variables than for lags of the dependent variable. Using forecast-robustifying methods, such as rolling estimation or modeling in differences, can enhance the density forecasting performance, while in terms of mean squared error it is difficult to do better than the benchmark. The finding that simple methods work well could therefore further enhance the diffusion of the BVAR as an econometric tool for a vast range of applications.
The paper is structured as follows. In Section 2 we describe the US data and the design of the forecasting exercise. In Section 3 we present the baseline case. In the following three sections we evaluate changes in three main features of the benchmark. Specifically, in Section 4 we consider optimal choice of the tightness, the lag length and both. In Section 5 we consider modeling in levels or growth rates. In Section 6 we compare the alternative approaches to multi-step forecasting, with a special focus on the non-simulation-based ones. In Section 7 we discuss the treatment of the error variance and of cross-variable shrinkage. Next we evaluate the robustness of our findings. In particular, in Section 8 we consider the relative merits of rolling and recursive estimation. In Section 9 we look at the size of the VAR, and in Section 10 we summarize the results for Canada, France and the UK, comparing them with those for the USA. Finally, in Section 11 we summarize the main findings and conclude. Supplemental material referred to in the text is available upon request.
2 Data and Design of the Forecasting Exercise
Our dataset for the USA is monthly and runs from January 1973 to March 2010. The data include 18 macroeconomic and financial series of major interest to policymakers and forecasters, listed in Table 1 (panel A).
Table 1. Description of dataset and transformations

Code          Series                                                  VAR in levels    VAR in growth rates

Panel A: USA
UR            Unemployment rate                                       None             None
PCEPI         PCE price index                                         1200 ln(y_t)     1200 ln(y_t / y_{t-1})
PCEXFEPI      Core PCE price index (ex food and energy)               1200 ln(y_t)     1200 ln(y_t / y_{t-1})
PAYROLLS      Nonfarm payroll employment                              1200 ln(y_t)     1200 ln(y_t / y_{t-1})
WEEKLYHRS     Weekly hours worked                                     None             None
CLAIMS        New claims for unemployment insurance                   None             None
RETAILSALES   Nominal retail sales                                    1200 ln(y_t)     1200 ln(y_t / y_{t-1})
CONSCONF      Index of consumer confidence                            None             None
STARTS        Single-family housing starts                            100 ln(y_t)      100 ln(y_t / y_{t-1})
IP            Industrial production                                   1200 ln(y_t)     1200 ln(y_t / y_{t-1})
CU            Index of capacity utilization                           None             None
PMISUPDELIV   Purchasing Managers' Index of supplier delivery times   None             None
PMIORDERS     Purchasing Managers' Index of new orders                None             None
POIL          Price of oil (West Texas Intermediate)                  100 ln(y_t)      100 ln(y_t / y_{t-1})
SP500         S&P 500 index of stock prices                           100 ln(y_t)      100 ln(y_t / y_{t-1})
ITB10Y        Yield on 10-year Treasury bonds                         None             None
FFR           Federal funds rate                                      None             None
REALXR        Real exchange rate                                      100 ln(y_t)      100 ln(y_t / y_{t-1})

Panel B: Canada, France, UK
UNRATE        Unemployment rate                                       None             None
EMPLOY        Total employment                                        1200 ln(y_t)     1200 ln(y_t / y_{t-1})
IP            Industrial production                                   1200 ln(y_t)     1200 ln(y_t / y_{t-1})
CPI           CPI inflation                                           1200 ln(y_t)     1200 ln(y_t / y_{t-1})
OIL           Spot commodity price (crude oil)                        100 ln(y_t)      100 ln(y_t / y_{t-1})
XRATE         Real exchange rate vs. major currencies                 100 ln(y_t)      100 ln(y_t / y_{t-1})
STOCKPRICE    Stock price index                                       100 ln(y_t)      100 ln(y_t / y_{t-1})
POLRATE       Official policy rate                                    None             None
BONDRATE      10-year government bond yield                           None             None
In the paper we will report results based on both a VAR for the variables in levels or log levels (which we label the VAR in levels), and a VAR estimated after transforming variables as needed to achieve stationarity (which we label the VAR in growth rates). In this growth rates specification, we log-difference variables such as employment to make them stationary, but we do not difference interest rates and diffusion indexes from surveys because, conceptually, they should be stationary. For all variables, the prior means of the coefficients will be set accordingly: to 1 for the VAR in levels, or to 0 for the VAR in growth rates. The transformations used on each variable are listed in Table 1.
While we estimate the models in both levels form and (for some variables) difference form, we always report the forecast results in units corresponding to stationary variables, given in the last column of Table 1. For example, the forecast results for industrial production (IP) are for annualized growth rates of IP. For models estimated in levels, we must transform some of the model-produced forecasts to use the same units.
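As a concrete reading of these units, the Table 1 transformations amount to the following (a sketch; the helper names are ours, with `scale` equal to 1200 for series reported at annualized monthly rates and 100 for the others):

```python
import numpy as np

def to_log_level(y, scale):
    # Levels-form column of Table 1, e.g. 1200 ln(y_t)
    return scale * np.log(y)

def to_growth_rate(y, scale):
    # Growth-rates column, e.g. 1200 ln(y_t / y_{t-1});
    # the first observation is lost to differencing
    return scale * np.log(y[1:] / y[:-1])
```

A levels forecast is mapped back into these stationary units by applying the same log-difference to the forecast path.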
The main forecasting exercise is performed in pseudo-real time, i.e. we never use information which is not available at the time the forecast is made. For all models, we use a recursive estimation window, except in Section 8, where we assess the robustness of the results to the use of a rolling sample estimation scheme. We have data starting from 1973:1, but after differencing the first observation is missing. Moreover, as we plan to compare models in levels featuring up to 13 lags (and 12 lags in growth rates), we start with the estimation sample of 1974:2 to 1985:12 in order to have the same number of data points for each model. We produce forecasts for all the horizons up to 12 steps ahead; for a horizon of h periods, the first available forecast is for 1986:1 + (h − 1). Our last estimation sample is 1974:2 to 2009:3, yielding a forecast for horizon h for date 2009:4 + (h − 1).
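The recursive pseudo-real-time scheme just described can be sketched as follows. This is a stylized loop with placeholder `fit`/`forecast` callables of our own devising, not the paper's estimation code; it only illustrates that each forecast uses data through the origin and nothing later.

```python
def pseudo_real_time(y, first_origin, max_h, fit, forecast):
    """Expanding-window forecasting exercise. At origin t only y[:t] is
    seen by the model, so no future information leaks in. Returns, per
    horizon h = 1..max_h, a list of (forecast, outcome) pairs."""
    out = {h: [] for h in range(1, max_h + 1)}
    for t in range(first_origin, len(y)):
        model = fit(y[:t])                 # estimated on data through t-1
        path = forecast(model, max_h)      # forecasts for horizons 1..max_h
        for h in range(1, max_h + 1):
            if t + h - 1 < len(y):         # keep only horizons we can score
                out[h].append((path[h - 1], y[t + h - 1]))
    return out
```

For instance, `fit = lambda s: s[-1]` and `forecast = lambda m, H: [m] * H` reproduce a naive last-value (random-walk) forecast within this design.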
We will evaluate both the point and density forecast ability of the examined models. For point forecasts, we evaluate our results in terms of root mean squared forecast error (RMSFE). Let $\hat{y}_{i,t+h}^{M}$ denote the $h$-step-ahead forecast of the $i$th variable made by model $M$, and $y_{i,t+h}$ the corresponding outcome. The RMSFE made by model $M$ in forecasting the $i$th variable at horizon $h$ is

$$\mathrm{RMSFE}_{i,h}^{M}=\sqrt{\frac{1}{P}\sum_{t}\left(y_{i,t+h}-\hat{y}_{i,t+h}^{M}\right)^{2}} \qquad (1)$$

where the sum is computed over all the $P$ forecasts produced.
To compare each model $M$ against the benchmark $B$ we therefore consider the percentage gain in terms of RMSFE, defined as

$$\mathrm{GAIN}_{i,h}^{\mathrm{RMSFE}}=100\left(1-\frac{\mathrm{RMSFE}_{i,h}^{M}}{\mathrm{RMSFE}_{i,h}^{B}}\right) \qquad (4)$$

and the percentage gain in terms of the (average log predictive) score, which is

$$\mathrm{GAIN}_{i,h}^{\mathrm{SCORE}}=100\left(\frac{\mathrm{SCORE}_{i,h}^{M}-\mathrm{SCORE}_{i,h}^{B}}{\left|\mathrm{SCORE}_{i,h}^{B}\right|}\right) \qquad (5)$$

In both cases, positive values indicate that model $M$ improves on the benchmark.
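For concreteness, the loss and gain measures might be computed as below. This is a sketch: the helper names are ours, and the sign conventions assume that positive gains favor model M over the benchmark.

```python
import numpy as np

def rmsfe(outcomes, forecasts):
    # Equation (1): root mean squared forecast error over P forecasts
    e = np.asarray(outcomes) - np.asarray(forecasts)
    return np.sqrt(np.mean(e ** 2))

def gain_rmsfe(rmsfe_m, rmsfe_b):
    # Equation (4): percentage RMSFE gain, positive when M beats B
    return 100.0 * (1.0 - rmsfe_m / rmsfe_b)

def gain_score(score_m, score_b):
    # Equation (5): percentage gain in average log score (scores are
    # typically negative, hence the absolute value in the denominator)
    return 100.0 * (score_m - score_b) / abs(score_b)
```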
Finally, to have an indication of the statistical significance of differences in forecasting performance, we provide the results of the Diebold and Mariano (1995) test for equal mean squared forecast error, compared against standard normal critical values. Following the recommendation of Clark and McCracken (2011c), to reduce the chances of spurious rejections at longer forecast horizons, we compute the Diebold–Mariano test with the Harvey et al. (1997) adjustment of the variance that enters the test statistic. Our use of the Diebold–Mariano test with forecasts that are, in many cases, nested is a deliberate choice. Monte Carlo evidence in Clark and McCracken (2011a, 2011b) indicates that, with nested models, the Diebold–Mariano test compared against normal critical values can be viewed as a somewhat conservative (in the sense of tending to have size modestly below nominal size) test for equal accuracy in finite samples. Nonetheless, we obtain many rejections of the null of equal accuracy. This reflects the higher power of the test due to our long forecast sample (269 one-step-ahead observations for the USA) compared to many other studies in the literature.
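A minimal version of the adjusted statistic could look like this. It is a sketch under the stated choices (rectangular-kernel long-run variance with h − 1 lags, Harvey et al. small-sample factor); the value is compared against standard normal critical values, and the function name is ours.

```python
import numpy as np

def diebold_mariano(e1, e2, h):
    """DM test for equal MSFE at horizon h, from two forecast error
    series. Positive values indicate model 2 is more accurate."""
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2   # loss differential
    P = d.size
    dbar = d.mean()
    u = d - dbar
    # Rectangular-kernel long-run variance with h - 1 autocovariances
    # (note: this estimator is not guaranteed positive in small samples)
    lrv = u @ u / P
    for k in range(1, h):
        lrv += 2.0 * (u[k:] @ u[:-k]) / P
    stat = dbar / np.sqrt(lrv / P)
    # Harvey et al. (1997) small-sample adjustment factor
    adj = np.sqrt((P + 1 - 2 * h + h * (h - 1) / P) / P)
    return adj * stat
```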
To provide a rough gauge of the statistical significance of differences in average log scores, we use the Amisano and Giacomini (2007) t-test of equal means, applied to the log score for each model relative to the baseline forecast. We view the tests as a rough gauge because the asymptotic validity of the Amisano and Giacomini (2007) test requires that, as forecasting moves forward in time, the models be estimated with a rolling, rather than expanding, sample of data. The t-statistics are computed with a serial-correlation-robust variance, using a rectangular kernel, h − 1 lags, and the small-sample adjustment of Harvey et al. (1997).
5 Level Versus Growth Rates and the Role of Cointegration/Unit Roots Priors
It is in principle unclear whether transforming variables into their growth rates can enhance the forecasting performance of the BVAR. Some researchers and practitioners prefer to leave variables in log levels and impose prior means of unit roots with additional priors on sums of coefficients (see, for example, Banbura et al., 2010; Giannone et al., 2012). One reason is that such a specification can better take into consideration the existence of long-run (cointegrating) relationships across the variables, which are omitted in a VAR in differences. On the other hand, Clements and Hendry (1996) show that in a classical framework differencing can improve the forecasting performance in the presence of instability. Diebold and Kilian (2000) show that, for variables with unit roots, forecasting accuracy can be improved by differencing. Hence this is another issue to be considered from an empirical perspective. As far as we know, there has been little published effort in the BVAR forecasting literature to compare specifications in levels versus differences. Following the Litterman (1986) tradition, some BVAR forecasting work uses models with variables in levels or log levels (e.g. Banbura et al., 2010; Giannone et al., 2010, 2012), while other work uses models in differences or growth rates (e.g. Del Negro and Schorfheide, 2004; Clark and McCracken, 2008; Koop, 2013).
Accordingly, we revisit the levels versus growth rates question. We estimate a version of the baseline BVAR in which many variables enter the model in growth rates (see Table 1). As the transformed variables are likely to be stationary, we change the prior beliefs accordingly. In particular, the prior mean Φ^{*} in (13) is set to 0 for all variables, while we remove the unit root/cointegration priors. We use 12 lags and set the overall shrinkage parameter λ_{1} at 0.2.
Results for the growth rate specification are displayed in Figure 4. On average over all variables and forecast horizons, the differences in the loss functions are small, with an average gain in RMSFE of just 0.37% and an average loss in score of just 0.75%. When one looks at individual variables, an interesting pattern emerges. Most of the variables feature a small increase in RMSFE when the forecasts are produced with the model in growth rates. Indeed, the model in growth rates outperforms the model in levels in only 36.6% of the combinations of variables and horizons. However, for a few variables at longer horizons (WEEKLYHRS, PMISUPDELIV, FFR and ITB10Y), the model in growth rates provides sizable forecasting gains. On average, these effects cancel out, so that the mean and median relative gains are very close to zero. Similarly, for average log scores, for most variables and horizons (67.6% of the cases) the model in levels performs better than the model in growth rates, although the median difference in scores is small. Again, though, for a few variables at longer horizons (e.g. WEEKLYHRS and PMISUPDELIV), the growth rates specification yields sizable improvements in scores.
For this growth rates specification we have also considered optimizing over the tightness hyperparameter and the lag length. The optimal tightness changes only once, moving from a value of 0.25 in the first 40 estimation samples to 0.2 for all the remaining samples. The optimal lag length, in contrast, is fixed at 12 over all the samples. For this reason, in the growth rates case the results obtained by optimizing the hyperparameters are inevitably very close to those obtained with the baseline specification.
The role played by the inclusion of the sum of coefficients and dummy initial observation priors in the baseline levels specification also deserves investigation. If one decides to estimate the model in levels, then these priors implement the belief that there are unit roots/cointegration in the system, which is a typical feature of macroeconomic datasets such as the one considered here. Sims (1993) shows examples that 'without such elements in the prior, fitted multivariate time series models tend to imply that an unreasonably large share of the sample period variation in the data is accounted for by deterministic components'. In order to check for this effect we have also re-estimated our BVAR in levels omitting the sum of coefficients and dummy initial observation priors. We do not report the results here, for brevity, but they are available in the supporting information, Figure 11. The main finding is that, once these priors are removed, the model in growth rates systematically dominates the model in levels. This is evident in RMSFEs (relative to the baseline model) that are much higher for the model in levels without these priors (by an average of about 11%) than for the model in growth rates. This suggests that, while the model in growth rates does not need the cointegration/unit root priors by construction (because it is stationary), when the model is estimated in levels such priors do help and should be included, consistent with evidence in Banbura et al. (2010).
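Both priors are commonly implemented by appending artificial ('dummy') observations to the data matrices, as in Banbura et al. (2010). The sketch below follows that construction with our own hypothetical hyperparameter names `mu` and `tau` (smaller values meaning tighter priors); the regressor columns are ordered [lag 1 block, ..., lag p block, intercept].

```python
import numpy as np

def sum_of_coefficients_dummies(ybar, p, mu):
    """Sum-of-coefficients prior (Doan et al., 1984) as dummy
    observations: one row per variable, shrinking the sum of each
    variable's own-lag coefficients toward one.
    ybar: vector of pre-sample means; p: lag order."""
    n = len(ybar)
    D = np.diag(np.asarray(ybar, dtype=float)) / mu
    Yd = D
    Xd = np.hstack([np.tile(D, (1, p)), np.zeros((n, 1))])
    return Yd, Xd

def dummy_initial_observation(ybar, p, tau):
    """Dummy-initial-observation prior (Sims, 1993): a single row
    saying the data start near their pre-sample means, a belief
    compatible with cointegration among the levels."""
    n = len(ybar)
    y = np.asarray(ybar, dtype=float) / tau
    Yd = y.reshape(1, n)
    Xd = np.hstack([np.tile(y, p), [1.0 / tau]]).reshape(1, n * p + 1)
    return Yd, Xd
```

Stacking these rows on top of the actual data and running the usual posterior computations imposes the priors without any change to the estimation code.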
Finally, it is worth mentioning that, even when working with larger VARs, forecasters might be interested in forecasting only a few variables in the system, e.g. inflation and output growth. In this paper, selections of lag length and prior hyperparameters were based on all of the variables. While we do not pursue this here to save space, a promising direction for further research is to optimize only the marginal likelihood of the variables of interest.
To sum up, we find that specifications in levels and growth rates produce on average comparable forecasts. For most variables and horizons, the model in levels more often produces gains in point and density forecasting, but the model in growth rates performs particularly well for selected variables. The good performance of the model in levels deteriorates substantially if one removes the unit roots/cointegration priors. Therefore, if one wants to use the traditional Minnesota prior specified in (13), then we recommend working in differences. For a VAR in levels, the traditional Minnesota prior should be combined with the dummy observation/sum of coefficients priors in (14) and (15). Overall, the latter specification should be preferred.
7 Litterman Prior and Cross-Variable Shrinkage
The baseline specification we have considered so far is a Normal-inverted Wishart (NIW) conjugate prior which features the same prior mean for the VAR coefficients as the prior proposed by Litterman (1986), known as the Minnesota prior. However, the NIW prior of our baseline specification differs in three respects. First, in Litterman's original implementation the unit root/cointegration priors were not considered (these were introduced by Doan et al., 1984, and Sims, 1993). We have analyzed the role played by such priors in Section 5 and concluded that it is indeed relevant for models in levels.
Second, in the original Litterman implementation, the hyperparameter λ_{2} in the prior variance is set to a value smaller than 1, which puts additional shrinkage on the lags of all the variables other than the dependent variable of the ith VAR equation. Litterman (1986) sets this parameter below 1 in order to capture the idea that, at least in principle, these lags should be less relevant than the lags of the dependent variable itself. This modification implies that the Kronecker form for the coefficient variance matrix breaks down; as a consequence, one can only derive the conditional posteriors, and to draw from the joint posterior of the coefficients and error variance matrix one needs to implement MCMC (Gibbs) sampling. Gibbs sampling has poorer mixing properties than the simple Monte Carlo integration required in the NIW case, as MCMC methods produce autocorrelated draws. Moreover, a Gibbs sampling algorithm would require, at each iteration, the manipulation of MN × MN matrices to derive the conditional posterior mean of the coefficients and to perform a random draw from the conditional posterior.
The third difference in the Litterman (1986) approach arises precisely because of the difficulty of estimating a large system when the cross-variable shrinkage is imposed. To overcome this, Litterman (1986) treats the error covariance matrix as fixed and diagonal and estimates it in a preliminary step. This assumption means that the model can be estimated with ridge regression on an equation-by-equation basis. In contrast, in our baseline NIW prior, the covariance matrix is sampled from an inverted Wishart, calibrated so that its expected value coincides with the fixed diagonal matrix of Litterman (1986).
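A stylized version of this equation-by-equation estimator is sketched below. The 1/l lag decay, the loose intercept variance, and the function names are our own illustrative choices rather than the paper's exact settings; `sigma` holds the residual scales from Litterman's preliminary univariate regressions.

```python
import numpy as np

def minnesota_variances(i, sigma, n, p, lam1, lam2):
    """Prior coefficient variances for equation i: own lag l gets
    (lam1 / l)^2, while lags of other variables j are further shrunk
    by lam2 < 1 and rescaled by the ratio of residual scales.
    Regressors are ordered [lag 1 block, ..., lag p block, intercept]."""
    v = np.empty(n * p + 1)
    k = 0
    for l in range(1, p + 1):
        for j in range(n):
            tight = 1.0 if j == i else (lam2 * sigma[i] / sigma[j]) ** 2
            v[k] = (lam1 / l) ** 2 * tight
            k += 1
    v[-1] = 1e6          # effectively flat prior on the intercept
    return v

def ridge_posterior_mean(y, X, b0, v, sig2):
    """Posterior mean for one equation with the error variance fixed
    at sig2: a ridge regression shrinking toward the prior mean b0."""
    Vinv = np.diag(1.0 / v)
    A = X.T @ X / sig2 + Vinv
    return np.linalg.solve(A, X.T @ y / sig2 + Vinv @ b0)
```

Because the error variance matrix is fixed and diagonal, each equation can be processed independently; this is exactly what breaks down once the variance matrix is treated as random and non-diagonal.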
While the pioneering work of Litterman (1986) suggested it was useful to have cross-variable shrinkage, it has become more common to estimate larger models without cross-variable shrinkage, in order to have a Kronecker structure that speeds up computations and facilitates simulation. Still, pre-programmed Bayesian capabilities in programs like RATS include an option for cross-variable shrinkage.
To assess these specification choices, Figure 7 provides results for our Litterman specification of a model in levels, using cross-variable shrinkage of λ_{2} = 0.2. To be able to compare density results, we simulated forecasts for the Litterman specification using the posterior normal distribution for each equation's coefficients, treating each equation independently per Kadiyala and Karlsson (1997) and holding the error variance matrix fixed (and diagonal). The Litterman approach that uses both cross-variable shrinkage and a diagonal error variance matrix fares on average slightly worse than the baseline model. The average losses are rather small: about 0.92% for the RMSFE and 1.41% for the SCORE loss function. These apparently clear-cut results hide an interesting feature, which becomes apparent when one looks at the role played by the cross-variable shrinkage versus the diagonal error variance matrix. To shed light on this, we estimated the BVAR using the Litterman estimation approach but setting λ_{2} = 1. This case is in between the Litterman approach and the baseline model: it can be thought of as the Litterman approach with no cross-variable shrinkage, or as the baseline model with a diagonal variance matrix. Results for this case (available in the supporting information, Figure 12) show a clear deterioration with respect to the baseline model, with an average loss of 10.57% in RMSFE and 24.87% in SCORE. Moreover, the model is outperformed by the baseline specification in 100% of the cases for the RMSFE and 97% of the cases for the SCORE. Therefore, it seems that, by itself, imposing a diagonal variance matrix on the baseline specification reduces forecast accuracy, while the use of cross-variable shrinkage in the Litterman approach improves accuracy. On balance, these two effects offset each other.
To sum up, the imposition of a diagonal error variance matrix is detrimental to forecast accuracy, especially for density forecasting. Imposing cross-variable shrinkage provides a benefit, but on average this benefit is offset by the cost of imposing diagonality on the variance of the system. Finally, while imposing cross-variable shrinkage and allowing a non-diagonal variance matrix is possible in principle, estimation must then be performed via Gibbs sampling, which quickly becomes difficult from a computational perspective (see, for example, Karlsson, 2012).
8 Rolling Versus Recursive Estimation
There is a long-standing debate in the forecasting literature on the relative merits of rolling versus recursive estimation. The former can be more robust in the presence of structural breaks, while the latter can be more efficient. Hence we now assess their performance in our context and evaluate whether the other results we have obtained so far are robust to the choice of the estimation method.
To start, in Figure 8 we compare point and density forecasts from recursive and rolling estimates of the benchmark specification, taking the recursive case as the benchmark. The rolling estimates use a window of 11 years of data, corresponding to the size of the sample used to generate the first forecast observation in the recursive scheme. On average, the two methods perform broadly similarly, particularly in terms of RMSFEs. But looking at the percentage of cases in which a given method outperforms the other, it appears that the rolling method performs relatively better in density forecasting, while the recursive method performs relatively better in point forecasting. We interpret this finding as possibly related to time-varying volatility in the errors. Indeed, when one considers density forecasts, the assumption of homoskedasticity and Gaussianity is restrictive, and drifting volatilities likely matter. Introducing stochastic volatility in a VAR of this dimension creates substantial computational problems and goes beyond the scope of this paper, but it is considered in Carriero et al. (2011). It is also worth mentioning that rolling estimation substantially improves the FFR forecasts under both loss functions, a finding that might be explained by the ability of rolling forecasts to 'forget' previous policy regimes.
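In implementation terms, the two schemes differ only in where the estimation sample starts for each forecast origin, e.g.:

```python
def estimation_start(t, scheme, window):
    """Start index of the estimation sample for forecast origin t.
    'recursive' uses all data through t-1; 'rolling' keeps only the
    last `window` observations (132 months would mimic the 11-year
    window used here; the function name and arguments are ours)."""
    if scheme == "recursive":
        return 0
    return max(0, t - window)
```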
We now assess the robustness of the previous findings on the usefulness of optimal selection of shrinkage and lag length and of cross-variable shrinkage. To save space, we present a summary of the results, with full details available upon request. When we optimize the overall shrinkage hyperparameter λ_{1}, the optimal setting moves somewhat more than in the recursive case, where it was steady at 0.15. In particular, the parameter still remains at 0.15 for the majority (86.4%) of the samples, but in the other cases it moves up to 0.2. The resulting forecasts are, however, fairly close to those based on a fixed tightness, confirming what we found before. When we optimize the lag order, the selected lag is much more variable than in the recursive case: it starts at 3, rises to 6 or 7 in the mid-1990s, bounces around at levels as high as 12, and ends the sample at 9. Consistent with the recursive results, for models estimated with a rolling sample, optimizing the lag improves accuracy relative to a fixed lag. For example, RMSFE is lower with the optimal lag in about 82% of the horizon and variable combinations, by an average of 2%. Finally, the results on cross-variable shrinkage are in line with those documented in the case of recursive estimation.
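The optimal-selection step can be sketched as a grid search repeated at each forecast origin. The grid values and the toy criterion below are our assumptions for illustration, standing in for whatever objective (e.g. model fit or past forecast performance) is maximized at each origin.

```python
def select_tightness(grid, criterion):
    """Return the shrinkage hyperparameter on `grid` maximizing `criterion`.

    `criterion` is a stand-in for the objective evaluated at one forecast
    origin; re-running the search at every origin traces out a path like
    the one described in the text (0.15 for most samples, sometimes 0.2).
    """
    return max(grid, key=criterion)

# Toy quadratic criterion peaking near 0.16, so 0.15 wins on this grid:
grid = [0.05, 0.1, 0.15, 0.2, 0.25]
lam = select_tightness(grid, lambda l: -(l - 0.16) ** 2)
```

The same routine applies to the lag order by replacing the grid of tightness values with a grid of integer lag lengths.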
In summary, rolling estimation can slightly improve the density forecasts, but not the point forecasts, relative to recursive estimation. However, the other main findings of the paper, namely small but positive gains from hyperparameter and lag length selection, are broadly confirmed under rolling estimation as well.
9 VAR Size
 Top of page
 Summary
 1 Introduction
 2 Data and Design of the Forecasting Exercise
 3 Baseline Case
 4 Selection of Hyperparameters and Lag Length
 5 Level Versus Growth Rates and the Role of Cointegration/unit Roots Priors
 6 Alternative MultiStep Forecasting Approaches
 7 Litterman Prior and CrossVariable Shrinkage
 8 Rolling Versus Recursive Estimation
 9 VAR Size
 10 Results for other Countries
 11 Conclusions
 Acknowledgements
 References
 Supporting Information
While a number of studies have found that forecast accuracy improves with larger datasets, it is not necessarily the case that more is always better. For example, Boivin and Ng (2006) suggest that preselecting the variables included in a factor model according to their relationship with the target variable of interest can improve forecasting precision. Similarly, Banbura et al. (2010) show that a medium-scale BVAR of about 20 variables often delivers more accurate forecasts than large BVARs. Koop (2013) shows that forecasting performance improves with size, but only up to about 20 variables.
Therefore, we now assess the forecasting performance of a smaller-scale BVAR for a subset of the variables of interest, comparing it to that of our benchmark medium-sized VAR. Then, we consider whether the findings on the role of the BVAR specification choices remain valid for the smaller system.
We focus on the following seven variables: unemployment rate (UR), core PCE price index (PCEXFEPI), nonfarm payroll employment (PAYROLLS), nominal retail sales (RETAILSALES), single-family housing starts (STARTS), industrial production (IP) and the federal funds rate (FFR).
Results based on the baseline specification for both VARs are displayed in Figure 9. Interestingly, while on average the small system seems to produce slightly better forecasts than the large system, it is actually less accurate for most variables and horizons: the small system wins in only 38.1% of cases for the RMSFE and 29.8% for the score. The similar average performance of the two models is driven by the particularly good performance of the small system in forecasting the FFR. Excluding this variable, the large system produces better forecasts in most cases.
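The summary statistics used throughout this comparison, average RMSFE and the share of variable-horizon cells in which one model beats the other, can be computed as below. The numbers are hypothetical, not taken from the paper.

```python
import numpy as np

def rmsfe(errors):
    """Root mean squared forecast error for one variable-horizon cell."""
    e = np.asarray(errors, dtype=float)
    return np.sqrt(np.mean(e ** 2))

def pct_better(rmsfe_a, rmsfe_b):
    """Percentage of variable-horizon cells where model A beats model B."""
    a = np.asarray(rmsfe_a, dtype=float)
    b = np.asarray(rmsfe_b, dtype=float)
    return 100.0 * np.mean(a < b)

# Hypothetical RMSFEs for four variable-horizon cells of two models: the
# small model has the lower average but wins in only half of the cells.
small = np.array([1.00, 0.90, 1.20, 1.10])
large = np.array([1.10, 1.00, 1.00, 1.00])
share = pct_better(small, large)
```

The distinction between average performance and the share of wins is exactly what drives the FFR result discussed above: one variable can dominate the average while losing most cells.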
Let us now assess the robustness of the results obtained using the baseline VAR with 18 variables with respect to a set of specification choices, specifically optimal tightness and lag length, levels versus growth rates, and the iterated, pseudo-iterated and direct approaches. To save space, we do not report the detailed results, but they are available upon request.
With the seven-variable VAR, the optimal lag length selected is 7 in the first quarter of the sample, 8 in the second quarter and 13 in the second half. It seems that, with some variables dropped, the smaller model needs longer lags to soak up the associated dynamics. As the selected lag length is always quite high, the gains from choosing the lag length optimally are limited. The mean and median of the loss functions are very similar to those computed using the 13-lag specification, and the percentage of cases where the optimal selection pays off is close to 50%. When considering optimal tightness (λ_{1}) selection, the optimal tightness comes out at 0.2 for all forecast origins, up a bit (implying less shrinkage) from the 0.15 that proved optimal in the 18-variable model. The similarity of the shrinkage selections for the seven-variable and 18-variable models is consistent with the conventional wisdom that a setting of 0.2 generally works well. As in the 18-variable case, with just seven variables in the model the mean and median loss function values with optimal shrinkage are very similar to those obtained with fixed tightness. Moreover, the percentage of instances in which a variable is better forecast by the model with optimal tightness is 43%, so in this case selecting the tightness optimally slightly reduces forecast accuracy on average. These results do not change much when tightness and lag length are jointly optimized.
With regard to the effect of levels versus growth rates and the use of the iterated, pseudo-iterated and direct forecasting approaches, results are in line with those obtained with the baseline 18-variable specification.
With regard to the direct approach, the advantage of the 18-variable model over the seven-variable model shrinks. On average, the larger model is more accurate, but the difference between the two models is smaller than under the pseudo-iterated and iterated approaches. In fact, at longer forecast horizons, for many variables the smaller model becomes slightly more accurate than the larger one. This might indicate that the seven-variable VAR is somewhat misspecified relative to the 18-variable VAR, and that the direct approach is better suited to handle such misspecification.
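The iterated-versus-direct distinction can be made concrete with a univariate AR(1), a deliberate simplification of the VAR case discussed in the text: the iterated forecast powers up the one-step model, while the direct forecast regresses y_t on y_{t-h} and predicts in a single step.

```python
import numpy as np

def iterated_h_step(y, phi, h):
    """Iterate a fitted AR(1) forward: y_{T+h|T} = phi**h * y_T."""
    return (phi ** h) * y[-1]

def direct_h_step(y, h):
    """Direct approach: project y_t on y_{t-h} (OLS, no intercept for
    simplicity) and forecast T+h in one step."""
    beta = np.dot(y[:-h], y[h:]) / np.dot(y[:-h], y[:-h])
    return beta * y[-1]

# On an exact AR(1) path the two approaches coincide; with noisy data and
# a misspecified model they generally differ, which is the trade-off
# discussed in the text.
y = 0.5 ** np.arange(20)                                  # y_t = 0.5 * y_{t-1}
phi = np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])      # OLS slope = 0.5
f_iter = iterated_h_step(y, phi, 3)
f_dir = direct_h_step(y, 3)
```

Under misspecification, the direct projection targets the h-step relationship itself rather than compounding one-step errors, which is why it can help the smaller model at long horizons.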
10 Results for other Countries
To make sure our conclusions have broad applicability, we extend our analysis to three more countries: Canada, France and the UK. For each country we collected a dataset of nine variables, since the entire set of variables used for the US analysis is not available for each country, or at least not for a sufficiently long time span. The variables, together with their transformations, are described in Table 1 (panel B).
The estimation and forecast samples are comparable to those for the USA. Specifically, for Canada and France we use data ranging from January 1971 to May 2010. The first estimation sample is February 1972 to December 1983, and then the estimation sample expands with the recursive scheme, ending in May 2009. The forecast period for these countries ranges from January 1984 to May 2010. For the UK the sample is slightly shorter, starting in January 1975 and ending in March 2010. The first estimation window is February 1976 to December 1987, the last is January 1975 to March 2009, and the forecast sample is January 1988 to March 2010.
The benchmark specification is the same as for the USA: fixed lag length and tightness, variables in levels (with unit root and sums of coefficients priors), full iteration to obtain h-step-ahead forecasts, and recursive estimation. We then assess the role of optimal selection of the lag length and prior tightness; growth versus level specification; alternative methods for computing h-step-ahead forecasts (pseudo-iterated and direct); cross-variable shrinkage; and rolling versus recursive estimation. In most cases we consider both point and density forecasts.
In the interest of space, here we briefly summarize the results, focusing especially on the comparison with the findings obtained for the USA. More detailed results can be found in the online Appendix (supporting information).
The results on the choice of the tightness parameter and lag are broadly in line with those for the USA, but with some small differences. For two of the three other countries (Canada and France), optimizing the shrinkage yields slightly larger and more consistent gains than does optimizing the lag, while optimizing the lag performs relatively better for the USA and UK. For example, in the case of Canada, optimizing shrinkage improves RMSFEs relative to the baseline in 89.8% of the variable and horizon combinations, with a median gain of 1.15%; optimizing the lag improves RMSFEs in 66.7% of cases, with a median improvement of 0.73%. For the UK, optimizing shrinkage also improves RMSFEs in 89.8% of cases, with a median gain of 0.83%; optimizing the lag improves RMSFEs in 87.0% of cases, with a median improvement of 1.74%. Optimizing both shrinkage and lag together does not yield additional RMSFE gains. Therefore, the overall message remains that optimizing over the tightness hyperparameter and lag length can be helpful for the majority of variables and forecast horizons, though for most of them the gains are limited.
The comparison of levels versus growth rates mostly, although not entirely, resembles that for the USA. For Canada and France, as for the USA, forecast accuracy is similar for models in levels and in growth rates, whether measured by RMSFE or by predictive score. However, for the UK the levels specification has a clearer advantage over the growth specification, with average RMSFE and score gains of about 2% and 5%, respectively, and a better performance in about 60% of cases. Overall, relative to the USA, there is somewhat less support for specifications in growth rates.
Turning to the findings on the iterated versus pseudo-iterated approach, for the USA we found clear evidence that the two methods produce virtually the same point forecasts, supporting the use of the much quicker pseudo-iterated approach. These results are strongly confirmed: for Canada, France and the UK, the RMSFEs never differ by 1% or more.
As for the comparison between the direct and iterated approaches (for models in what we refer to as growth rate form), the results for the other countries are broadly similar to those for the USA: there is little to be gained from direct multi-step estimation and forecasting. For Canada, France and the UK, RMSFEs with the direct approach are on average slightly higher than with the baseline specification (in levels, with iterated forecasts), and are higher in most variable-horizon combinations. The direct approach also yields slightly lower scores, although the percentage of cases in which it is less accurate than the baseline is smaller in terms of scores than in terms of RMSFEs.
With regard to the comparison between the Litterman prior and the baseline specification, we confirm that cross-variable shrinkage, when coupled with a fixed and diagonal error variance matrix, does not pay off much: the fraction of cases in which the simpler baseline specification forecasts better is often above 50%, with average gains in the range of 1–4%.
Finally, regarding rolling rather than recursive estimation, rolling performs slightly better than it did for the USA, in particular for density forecasting. In terms of RMSFE, the largest average gains are about 3% for the UK, and about 2% for Canada and France. The gains in terms of predictive score are instead in the range of 5–11%, with the rolling score better than the recursive one in 71–91% of cases. As mentioned in the case of the USA, rolling estimation can provide more robustness in the presence of parameter time variation, and this seems to matter more for density forecasts.
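The predictive score referred to here is the log predictive density evaluated at the realized value. Under a Gaussian density forecast it can be computed as below; the numbers in the example are hypothetical.

```python
import math

def log_score(y_real, mean, sd):
    """Log predictive score of a Gaussian density forecast evaluated at
    the realization `y_real`. Higher is better."""
    z = (y_real - mean) / sd
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * z * z

# A sharper, well-centred density earns a higher score than a diffuse one
# when the realization falls near the forecast mean:
s_tight = log_score(0.1, 0.0, 0.5)
s_wide  = log_score(0.1, 0.0, 2.0)
```

Because the score rewards both calibration and sharpness, a rolling scheme that tracks drifting volatility can gain on this metric even when point forecasts are unchanged.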
11 Conclusions
In this paper we discuss how a set of specification choices affects the forecasting performance of Bayesian VARs. We adopt as a benchmark a common specification in the literature, a Bayesian VAR with variables entering in levels and a prior modeled along the lines of Sims and Zha (1998). We then consider optimal choice of the tightness, of the lag length and of both; consider the relative merits of modeling in levels or growth rates; compare alternative approaches to multi-step forecasting (direct, iterated and pseudo-iterated); and discuss the treatment of the error variance and of cross-variable shrinkage.
To ensure our results have broad applicability, we check their robustness to the choice of the sample size (the time series dimension of the VAR) by comparing rolling with recursive estimation; to the VAR size (the cross-sectional dimension of the VAR) by analyzing a subset of seven of the 18 US variables; and to the VAR composition, by repeating the analysis for other datasets, specifically data for Canada, France and the UK.
We obtain a large set of empirical results, which we can summarize by saying that we find very small losses (and sometimes even gains) from the adoption of BVAR modeling choices that make forecast computation quick and easy, in particular for point forecasts. An approach that works well is to specify a Normal-inverted Wishart prior along the lines of Sims and Zha (1998) on the VAR in levels, preferably optimizing its tightness and lag length. Optimizing over the lag length tends to be more helpful (i.e. to provide relatively larger gains) than optimizing the tightness. For the accuracy of point forecasts, there proves to be essentially no payoff to using simulation methods to obtain multi-step forecasts from the posterior distribution. For density forecasting, simulation methods work better than a direct multi-step approach, especially at long horizons. Specifications in levels benefit considerably from the imposition of the sum of coefficients and dummy initial observation priors of Doan et al. (1984) and Sims (1993). Instead, there is no payoff to using a Litterman (1986) prior that is tighter on lags of other variables than on lags of the dependent variable and that treats the error variance matrix as fixed and diagonal. Rolling estimation can enhance density forecasting performance, while in terms of mean squared error it is difficult to do better than the benchmark. The finding that simple methods work well could therefore further enhance the diffusion of the BVAR as an econometric tool for a vast range of applications, particularly among researchers and practitioners relying on preprogrammed BVAR tools in common software packages such as RATS and EViews.