Sophisticated and small versus simple and sizeable: When does it pay off to introduce drifting coefficients in Bayesian VARs?

We assess the relationship between model size and complexity in the time-varying parameter VAR framework via thorough predictive exercises for the Euro Area, the United Kingdom and the United States. It turns out that sophisticated dynamics through drifting coefficients are important in small data sets, while simpler models tend to perform better in sizeable data sets. To combine the best of both worlds, novel shrinkage priors help to mitigate the curse of dimensionality, resulting in competitive forecasts for all scenarios considered. Furthermore, we discuss dynamic model selection to improve upon the best performing individual model for each point in time.

1. Introduction. In predictive inference, two main tendencies can be observed. First, simple models are increasingly replaced by more sophisticated versions in order to avoid functional misspecification. Second, due to increased data availability, information sets become more sizeable, and models thus higher dimensional, which in turn decreases the likelihood of omitted variable bias. The goal of this paper is a systematic assessment of the relationship between model size and complexity using different macroeconomic data sets. We compare the predictive performance of multivariate time series models that range from relatively simple (i.e., featuring constant parameters and heteroskedastic shocks) to very flexible (time-varying parameters and heteroskedastic shocks). Our conjecture is that the introduction of drifting coefficients in multivariate time series regressions can control for an omitted variable bias in small-scale models or, conversely, that larger information sets can substitute for non-linear model dynamics.
The choice between model size and model complexity is not innocuous. Large models, such as vector autoregressions (VARs) with many endogenous variables, naturally avoid an omitted variable bias. This often translates into superior predictive performance (Bańbura, Giannone and Reichlin, 2010) and avoids puzzles commonly observed in empirical macroeconomics (such as the price puzzle; see Sims, 1992). Adding stochastic volatility (SV) often improves predictive performance further. In our empirical work, we consider a small model with three variables, a medium-sized model with seven variables, and a large model with 15 variables. To focus on the relationship between model size and complexity, we differentiate between models that feature constant and drifting parameters. In both specifications, we assume that the shocks are heteroskedastic and follow an SV model. To control for overfitting, two recent shrinkage priors, the Normal-Gamma (NG) prior (Griffin and Brown, 2010) and the Dirichlet-Laplace (DL) prior (Bhattacharya et al., 2015), are used to induce shrinkage in our different model specifications.
Our results are threefold. First, we show that the proposed TVP-VAR-SV shrinkage models improve one-step-ahead forecasts. Allowing for time variation and using shrinkage priors leads to smaller drops in forecast performance during the global financial crisis, a finding that is also corroborated by looking at model weights in a dynamic model selection exercise. Second, comparing the proposed priors, we find that the DL prior shows a strong performance in small-scale applications, while the NG prior outperforms when larger information sets are used. This is driven by the higher degree of shrinkage the NG prior provides, which is especially important for large-scale applications. Last, we demonstrate that the larger the information set, the stronger the forecast performance of a simple constant parameter VAR with SV. However, here too, the NG-VAR-SV model turns out to be a valuable alternative, providing forecasts that are not far off those of the constant parameter competitor. To allow for different models at different points in time, we also discuss the possibility of dynamic model selection (Koop, Leon-Gonzalez and Strachan, 2009).
The remainder of the paper is structured as follows. The second section sets the stage, introduces a standard TVP-VAR-SV model, and highlights typical estimation issues involved. Section 3 describes in detail the prior setup adopted. Section 4 presents the necessary details to estimate the model, including an overview of the Markov chain Monte Carlo (MCMC) algorithm and the relevant conditional posterior distributions. Section 5 provides empirical results alongside the main findings of our forecasting comparison. Furthermore, it contains a discussion of dynamic model selection. Finally, the last section summarizes and concludes the paper.
2. Econometric framework. In this paper, the model of interest is a TVP-VAR with SV in the spirit of Primiceri (2005). The model summarizes the joint dynamics of an M-dimensional zero-mean vector of macroeconomic time series {y_t}_{t=1}^T as follows:

  y_t = A_1t y_{t−1} + ... + A_pt y_{t−p} + ε_t.   (1)

The M × M matrices A_jt (j = 1, ..., p) contain time-varying autoregressive coefficients, ε_t is a vector white noise error with zero mean and a time-varying variance-covariance matrix Σ_t = H̃_t V_t H̃_t', where H̃_t is a lower unitriangular matrix and V_t = diag(e^{v_1t}, ..., e^{v_Mt}) denotes a diagonal matrix with time-varying shock variances. The model in Equation 1 can be cast in a standard regression form as follows,

  y_t = A_t x_t + ε_t,   (2)

with A_t = (A_1t, ..., A_pt) and x_t = (y_{t−1}', ..., y_{t−p}')'. Following Cogley and Sargent (2005), we can rewrite Equation 2 as

  y_t = A_t x_t + H̃_t η_t,   η_t ~ N(0_M, V_t),   (3)

and multiplying from the left with H_t := H̃_t^{−1} yields

  H_t y_t = H_t A_t x_t + η_t.   (4)

For further illustration, note that the first two equations of the system are given by

  y_1t = A_1•,t x_t + η_1t,   (5)
  y_2t + h_21,t y_1t = A_2•,t x_t + h_21,t A_1•,t x_t + η_2t,   (6)

with h_21,t denoting the second element of the first column of H_t. Equation 6 can be rewritten as

  y_2t = A_2•,t x_t − h_21,t (y_1t − A_1•,t x_t) + η_2t,   (7)

where A_i•,t denotes the ith row of A_t. More generally, the ith equation of the system is a standard regression model augmented with the residuals of the preceding i − 1 equations,

  y_it = A_i•,t x_t − Σ_{s=1}^{i−1} h_is,t ε_st + η_it,   ε_st = y_st − A_s•,t x_t.   (8)

Thus, the ith equation is a standard regression model with K_i = pM + i − 1 explanatory variables z_it collecting x_t and the residuals of the preceding i − 1 equations, and a K_i-dimensional vector of time-varying coefficients B_it. For each equation i > 1, the corresponding dynamic regression model is then given by

  y_it = z_it' B_it + η_it,   η_it ~ N(0, e^{v_it}).   (9)

The states in B_it evolve according to a random walk process,

  B_it = B_i,t−1 + e_it,   e_it ~ N(0, Ω_i),   (10)

where Ω_i = diag(ω_i1, ..., ω_iK_i) is a diagonal variance-covariance matrix. Note that if a given diagonal element of Ω_i is zero, the corresponding regression coefficient is assumed to be constant over time.
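To make the state-space form concrete, the following minimal R sketch simulates the dynamic regression in Equations 9 and 10 for a single equation i. All dimensions and parameter values are illustrative, and the regressors are generic placeholders rather than actual lags and residuals.

```r
# Simulate Eqs. (9)-(10) for one equation: random walk states B_it and a
# heteroskedastic observation equation. Values are purely illustrative.
set.seed(1)
Tn <- 200                            # number of time periods
Ki <- 3                              # number of regressors in equation i
omega <- c(0.05, 0, 0.01)            # diagonal of Omega_i; a zero yields a constant coefficient
B <- matrix(NA_real_, Tn, Ki)
B[1, ] <- c(0.5, -0.3, 0.2)          # initial states
for (t in 2:Tn) {
  B[t, ] <- B[t - 1, ] + rnorm(Ki, mean = 0, sd = sqrt(omega))  # Eq. (10)
}
z <- matrix(rnorm(Tn * Ki), Tn, Ki)  # placeholder regressors z_it
v <- rnorm(Tn, mean = -1, sd = 0.3)  # placeholder log variances v_it
y <- rowSums(z * B) + rnorm(Tn, sd = exp(v / 2))                # Eq. (9)
```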
Typically, conjugate inverted Gamma priors are specified on ω_ij (j = 1, ..., K_i). However, as Frühwirth-Schnatter and Wagner (2010) demonstrate, this choice is suboptimal if ω_ij equals zero, since the inverted Gamma distribution artificially places prior mass away from zero and thus introduces time variation even if the likelihood points toward a constant parameter specification. To alleviate such concerns, Frühwirth-Schnatter and Wagner (2010) exploit the non-centered parameterization of Eqs. (9) and (10),

  y_it = z_it' B_i0 + z_it' √Ω_i B̃_it + η_it.   (11)

We let √Ω_i denote the matrix square root such that Ω_i = √Ω_i √Ω_i, and B̃_it has typical element j given by b̃_ij,t = (b_ij,t − b_ij,0)/√ω_ij. The corresponding state equation is given by

  B̃_it = B̃_i,t−1 + ẽ_it,   ẽ_it ~ N(0, I_{K_i}),   B̃_i0 = 0.   (12)

Moving from the centered to the non-centered parameterization allows us to treat the (signed) square roots of the state innovation variances as additional regression parameters to be estimated. Moreover, this parameterization also enables us to control for model uncertainty associated with whether a given element of z_it, i.e., both autoregressive coefficients and covariance parameters, should be included in or excluded from the model. This can be achieved by noting that if b_ij,0 ≠ 0, the jth regressor is included. The second dimension of model uncertainty stems from the empirically relevant question whether a given regression coefficient should be constant or time-varying: if ω_ij ≠ 0, the jth regressor drifts smoothly over time. Especially for forecasting applications, appropriately selecting which subset of regression coefficients should be constant or time-varying proves to be one of the key determinants in achieving superior forecasting properties (D'Agostino, Gambetti and Giannone, 2013; Korobilis, 2013; Belmonte, Koop and Korobilis, 2014; Bitto and Frühwirth-Schnatter, 2019).
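The reparameterization is easy to see in code: the centered state is recovered as b_ij,t = b_ij,0 + √ω_ij b̃_ij,t, so √ω_ij acts like an ordinary regression coefficient on a standardized random walk. A small illustrative R snippet:

```r
# Non-centered parameterization of Eqs. (11)-(12) for a single coefficient.
set.seed(2)
Tn     <- 200
omega  <- 0.04                       # state innovation variance
b0     <- 0.8                        # time-invariant part of the coefficient
btilde <- cumsum(rnorm(Tn))          # standardized random walk with N(0, 1) shocks (Eq. 12)
b      <- b0 + sqrt(omega) * btilde  # centered states (Eq. 11)
# If omega were exactly zero, b would collapse to the constant b0.
```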
Finally, we also have to introduce a suitable law of motion for the diagonal elements of V_t. Here, we assume that each v_it evolves according to an independent AR(1) process,

  v_it = μ_i + ρ_i (v_i,t−1 − μ_i) + ς_it,   ς_it ~ N(0, σ_i²),   (13)

for i = 1, ..., M. The parameter μ_i denotes the mean of the ith log variance, ρ_i is the corresponding persistence parameter, and σ_i² stands for the error variance of the relevant shocks.
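As an illustration, the sketch below simulates the SV law of motion in Equation 13 for a single series; the parameter values are examples, not estimates from the paper.

```r
# Simulate the AR(1) log-variance process of Eq. (13) and the implied shocks.
set.seed(3)
Tn     <- 200
mu     <- -1.0                       # mean of the log variance
rho    <- 0.95                       # persistence parameter
sigma2 <- 0.05                       # innovation variance of the log variance
v <- numeric(Tn)
v[1] <- mu
for (t in 2:Tn) {
  v[t] <- mu + rho * (v[t - 1] - mu) + rnorm(1, sd = sqrt(sigma2))
}
eps <- rnorm(Tn, sd = exp(v / 2))    # heteroskedastic shocks with variance e^{v_t}
```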
3. Prior specification. We opt for a fully Bayesian approach to estimation, inference, and prediction. This calls for the specification of suitable priors on the parameters of the model. Typically, inverse Gamma or inverted Wishart priors are used for the state innovation variances in Equation 10. However, as discussed above, such priors bound the diagonal elements of Ω_i artificially away from zero, always inducing at least some movement in the parameters of the model.
We proceed by utilizing two flexible global-local (GL) shrinkage priors (see Polson and Scott, 2010) on B_i0 and ω_i = (ω_i1, ..., ω_iK_i)'. A GL shrinkage prior comprises a global scaling parameter that pushes all elements of the coefficient vector toward zero and a set of local scaling parameters that enable coefficient-specific deviations from this general pattern.
3.1. The NG shrinkage prior. The first prior we consider is a modified variant of the NG shrinkage prior proposed in Griffin and Brown (2010) and adopted within the general class of state space models in Bitto and Frühwirth-Schnatter (2019). In what follows, we let a_0 = vec(A_0) denote the time-invariant part of the VAR coefficients with typical element a_0j for j = 1, ..., K = pM². The corresponding signed square root of the state innovation variance is consequently denoted by ±√ω_j or simply √ω_j. Thus, √ω_j crucially determines the amount of time variation in the jth element of a_t.
With this in mind, our prior specification is a scale mixture of Gaussians,

  a_0j | τ²_aj ~ N(0, τ²_aj),   τ²_aj | ϑ_a, λ_a ~ G(ϑ_a, ϑ_a λ_a/2),
  ±√ω_j | τ²_ωj ~ N(0, τ²_ωj),   τ²_ωj | ϑ_ω, λ_ω ~ G(ϑ_ω, ϑ_ω λ_ω/2),

where τ²_sj for s ∈ {a, ω} is a set of local scaling parameters that follow a Gamma distribution and λ_s is a set of global shrinkage parameters, each equipped with a Gamma hyperprior λ_s ~ G(c_s, d_s). The hyperparameter ϑ_a controls the excess kurtosis of the marginal prior,

  p(a_0j | λ_a) = ∫ p(a_0j | τ²_aj, λ_a) dτ²_aj,   (18)

obtained after integrating out the local scaling parameters. For the marginal prior, λ_a controls the overall degree of shrinkage. Lower values of ϑ_a place increasing prior mass at zero while at the same time leading to heavy tails of p(a_0j | λ_a).
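Since the hierarchy can be simulated directly, it is easy to inspect the shape of the implied marginal prior. The R sketch below draws from the NG prior under the parameterization given above; the hyperparameter values are illustrative.

```r
# Monte Carlo draws from the NG scale mixture: Gamma local scales, then
# conditionally Gaussian coefficients. Hyperparameters are illustrative.
set.seed(4)
K        <- 100000
vartheta <- 0.1                      # controls excess kurtosis / mass near zero
lambda   <- 10                       # global shrinkage parameter
tau2 <- rgamma(K, shape = vartheta, rate = vartheta * lambda / 2)
a0   <- rnorm(K, mean = 0, sd = sqrt(tau2))
# Smaller vartheta places more draws near zero while fattening the tails.
```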
On the covariance parameters h̃_is,0 (i = 2, ..., M; s = pM + 1, ..., K_i) and the associated innovation standard deviations γ_is = √ω_is, we impose the standard implementation of the NG prior. To simplify prior implementation, we collect the v = M(M − 1)/2 free covariance parameters in a vector h̃_0 and the corresponding elements of Ω = diag(Ω_1, ..., Ω_M) in a v-dimensional vector γ with typical elements h̃_i0 and γ_i,

  h̃_i0 | τ²_hi ~ N(0, τ²_hi),   τ²_hi | ϑ_h, ϖ_h ~ G(ϑ_h, ϑ_h ϖ_h/2),
  γ_i | τ²_γi ~ N(0, τ²_γi),   τ²_γi | ϑ_γ, ϖ_γ ~ G(ϑ_γ, ϑ_γ ϖ_γ/2),   for i = 1, ..., v.

Here, for s ∈ {h, γ}, τ²_si is the set of local scaling parameters, and ϖ_h and ϖ_γ are global shrinkage parameters (again equipped with Gamma hyperpriors ϖ_s ~ G(c_s, d_s)) that push all covariance parameters and the corresponding state innovation standard deviations across equations toward zero, respectively. The set of hyperparameters ϑ_s for s ∈ {h, γ} again controls the excess kurtosis of the marginal priors.

3.2. The DL shrinkage prior. The NG prior possesses good empirical properties. However, from a theoretical point of view, its properties are still not well understood. In principle, GL shrinkage priors aim to approximate a standard spike and slab prior (George and McCulloch, 1993; George, Sun and Ni, 2008) by introducing suitable mixing distributions on the local and global scaling parameters of the model. Bhattacharya et al. (2015) introduce a prior specification and analyze its properties within the stylized normal means problem. Their prior, the DL shrinkage prior, excels in both theory and empirical applications, especially in very high dimensions. Thus, it seems well suited for the TVP-VAR-SV, given the high-dimensional parameter and state space.
Similarly to the NG prior, the DL prior also depends on a set of global and local shrinkage parameters,

  a_0j | ψ_aj, ξ_aj, λ_a ~ N(0, ψ_aj ξ²_aj λ²_a),   ψ_aj ~ Exp(1/2),   ξ_a ~ Dir(n_a, ..., n_a),   λ_a ~ G(K n_a, 1/2),

with an analogous hierarchy for ±√ω_j in terms of ψ_ωj, ξ_ωj, and λ_ω. Here, for s ∈ {a, ω}, ξ_sj is the set of local scaling parameters defined on the (K − 1)-dimensional unit simplex S^{K−1} = {x = (x_1, ..., x_K)': x_j ≥ 0, Σ_{j=1}^{K} x_j = 1} with ξ_s = (ξ_s1, ..., ξ_sK)'. The parameter n_s controls the overall tightness of the prior, and ψ_sj is the set of auxiliary scaling parameters introduced to achieve conditional normality.
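A draw from this hierarchy can be generated by simulating the Dirichlet component via normalized Gamma variates. The R sketch below follows the hierarchy stated above; the value of n_a is an illustrative choice.

```r
# Monte Carlo draws from the DL prior of Bhattacharya et al. (2015).
set.seed(5)
K      <- 1000
n_a    <- 1 / K                          # Dirichlet concentration; small values favor sparsity
xi     <- rgamma(K, shape = n_a, rate = 1)
xi     <- xi / sum(xi)                   # Dir(n_a, ..., n_a) draw on the unit simplex
psi    <- rexp(K, rate = 1 / 2)          # auxiliary scalings for conditional normality
lambda <- rgamma(1, shape = K * n_a, rate = 1 / 2)  # global shrinkage parameter
a0     <- rnorm(K, mean = 0, sd = sqrt(psi) * xi * lambda)
```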
For the variance-covariance matrix, we also impose the DL prior,

  h̃_i0 | ψ_hi, ξ_hi, π_h ~ N(0, ψ_hi ξ²_hi π²_h),   γ_i | ψ_γi, ξ_γi, π_γ ~ N(0, ψ_γi ξ²_γi π²_γ),

for i = 1, ..., v, with ψ_si ~ Exp(1/2), ξ_s ~ Dir(n_s, ..., n_s), and π_s ~ G(v n_s, 1/2) for s ∈ {h, γ}. The local shrinkage parameters ψ_si and ξ²_si for s ∈ {h, γ} are thus defined analogously to the case of the regression coefficients described above, and π_s denotes the corresponding set of global shrinkage parameters governing the overall degree of shrinkage on the covariance parameters of the model.
The main difference between the NG and the DL prior is the presence of the Dirichlet components, which introduce even more flexibility. Bhattacharya et al. (2015) show that, in the framework of the stylized normal means problem, this specification yields excellent posterior contraction rates in light of a sparse DGP. Within an extensive simulation exercise, they moreover provide evidence that this prior also works well in practice.
Finally, the prior setup on the coefficients in the state equation of the log volatilities closely follows Kastner (2016). Specifically, we place a weakly informative Gaussian prior on μ_i, μ_i ~ N(0, 10²), and a Beta prior on (ρ_i + 1)/2 ~ B(25, 1.5). Additionally, σ_i² ~ G(1/2, 1/2) introduces some shrinkage on the process innovation variances of the log volatilities. This setup is used for all equations.
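This prior setup maps directly onto the arguments of the stochvol package, where priorsigma is the scalar B_σ in σ² ~ B_σ χ²₁, so B_σ = 1 corresponds to G(1/2, 1/2). A minimal sketch on simulated data (a univariate illustration, not the full model of this paper):

```r
# Univariate SV estimation with the prior setup described above.
library(stochvol)
set.seed(6)
y <- rnorm(500) * exp(rnorm(500) / 2)    # toy heteroskedastic series
fit <- svsample(y, draws = 10000, burnin = 1000,
                priormu = c(0, 10),      # mu ~ N(0, 10^2)
                priorphi = c(25, 1.5),   # (rho + 1)/2 ~ B(25, 1.5)
                priorsigma = 1)          # sigma^2 ~ G(1/2, 1/2)
summary(fit)
```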
4. Bayesian inference. The joint posterior distribution of our model is analytically intractable. Fortunately, however, the full conditional posterior distributions mostly belong to some well-known family of distributions, implying that we can set up a conceptually straightforward Gibbs sampling algorithm to estimate the model.

4.1. A brief sketch of the MCMC algorithm. Our algorithm is related to the MCMC scheme put forward in Carriero, Clark and Marcellino (2019) and estimates the latent states on an equation-by-equation basis. Specifically, conditional on a suitable set of initial conditions, the algorithm cycles through the following steps (a high-level skeleton is sketched below):

1. Sample the time-invariant coefficients and the (signed) square roots of the state innovation variances equation by equation from their multivariate Gaussian full conditional posteriors. These are standard Bayesian regression updates based on a volatility-normalized dependent variable, a vector with typical element y_it e^{−v_it/2}, where V_i is a prior covariance matrix that depends on the prior specification adopted. Note that in contrast to Carriero, Clark and Marcellino (2019), who sample the VAR parameters in A_0 and the elements of H̃_0 conditionally on each other, we follow Eisenstat, Chan and Strachan (2016) and sample these jointly, which speeds up the mixing of the sampler.
2. Simulate the full history of {B̃_it}_{t=1}^T by means of a forward filtering backward sampling algorithm (see Carter and Kohn, 1994; Frühwirth-Schnatter, 1994) per equation.
3. The log volatilities and the corresponding parameters of the state equation in Equation 13 are simulated using the algorithm put forward in Kastner and Frühwirth-Schnatter (2014) via the R package stochvol (Kastner, 2016; Hosszejni and Kastner, 2021).
4. Depending on the prior specification adopted, draw the parameters used to construct V_i using the conditional posterior distributions detailed in Section 4.2 (NG prior) or Section 4.3 (DL prior).
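To fix ideas, the R skeleton below outlines one sweep of the sampler. Every step_* function and init_state are hypothetical placeholders for the corresponding updates, not implementations.

```r
# High-level structure of the equation-by-equation Gibbs sampler.
run_mcmc <- function(data, M, S = 30000, burnin = 15000) {
  state <- init_state(data)                # placeholder: initialize states and parameters
  draws <- vector("list", S - burnin)
  for (s in seq_len(S)) {
    for (i in seq_len(M)) {
      state <- step_coefficients(state, i) # Step 1: joint Gaussian draw per equation
      state <- step_ffbs(state, i)         # Step 2: forward filtering backward sampling
    }
    state <- step_volatilities(state)      # Step 3: stochvol update of the log variances
    state <- step_prior_scales(state)      # Step 4: NG or DL scaling parameters
    if (s > burnin) draws[[s - burnin]] <- state
  }
  draws
}
```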
This algorithm produces draws from the joint posterior distribution of the states and the model parameters. In the empirical application that follows, we use 30,000 iterations and discard the first 15,000 as burn-in.
4.2. Conditional posterior distributions associated with the NG prior. Conditional on the full history of all latent states in our model as well as the global shrinkage parameters, it is straightforward to show that the conditional posterior distributions of τ²_sj for s ∈ {a, ω} and j = 1, ..., K are given by

  τ²_aj | • ~ GIG(ϑ_a − 1/2, a²_0j, ϑ_a λ_a),   τ²_ωj | • ~ GIG(ϑ_ω − 1/2, ω_j, ϑ_ω λ_ω),

where • indicates conditioning on the remaining parameters and states of the model. Moreover, GIG(ζ, χ, ϱ) denotes the generalized inverse Gaussian distribution with density proportional to x^{ζ−1} exp{−(χ/x + ϱx)/2}. To draw from this distribution, we use the algorithm of Hörmann and Leydold (2013) implemented in the R package GIGrvg (Leydold and Hörmann, 2017).
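In R, these updates map onto GIGrvg::rgig(n, lambda, chi, psi), which uses the same parameterization as above. A minimal sketch for one local scaling parameter, with illustrative values:

```r
# Draw tau2_aj from its GIG full conditional using GIGrvg.
library(GIGrvg)
set.seed(7)
vartheta <- 0.1                          # hyperparameter controlling kurtosis
lambda_a <- 10                           # global shrinkage parameter
a0j      <- 0.3                          # current draw of the coefficient a_0j
tau2_aj  <- rgig(1, lambda = vartheta - 1 / 2,
                 chi = a0j^2, psi = vartheta * lambda_a)
```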
The conditional posteriors of the local scalings for the covariance parameters and their corresponding innovation standard deviations also follow GIG distributions,

  τ²_hi | • ~ GIG(ϑ_h − 1/2, h̃²_i0, ϑ_h ϖ_h),   τ²_γi | • ~ GIG(ϑ_γ − 1/2, γ²_i, ϑ_γ ϖ_γ),   for i = 1, ..., v.

The conditional posterior of λ_s for s ∈ {a, ω} is of well-known form, namely, a Gamma distribution,

  λ_s | • ~ G(c_s + ϑ_s K, d_s + (ϑ_s/2) Σ_{j=1}^{K} τ²_sj).   (33)

Likewise, for s ∈ {h, γ}, the conditional posterior of ϖ_s is given by

  ϖ_s | • ~ G(c_s + ϑ_s v, d_s + (ϑ_s/2) Σ_{i=1}^{v} τ²_si).   (34)

4.3. Conditional posterior distributions associated with the DL prior. Following Bhattacharya et al. (2015), we update ξ_sj, λ_s, and ψ_sj for s ∈ {a, ω} jointly from p(ξ_sj, λ_s, ψ_sj | •). For the Dirichlet components, the conditional posterior distribution is obtained by sampling a set of K auxiliary variables N_aj, N_ωj (j = 1, ..., K),

  N_aj | • ~ GIG(n_a − 1, 2|a_0j|, 1),   N_ωj | • ~ GIG(n_ω − 1, 2|√ω_j|, 1).

After obtaining the K scaling parameters, we set ξ_aj = N_aj/N_a and ξ_ωj = N_ωj/N_ω with N_a = Σ_{j=1}^{K} N_aj and N_ω = Σ_{j=1}^{K} N_ωj. The conditional posterior of the global shrinkage parameter λ_s for s ∈ {a, ω} follows a GIG distribution,

  λ_a | • ~ GIG(K(n_a − 1), 2 Σ_{j=1}^{K} |a_0j|/ξ_aj, 1),

and analogously for λ_ω. The full conditional posterior distributions of ψ⁻¹_aj and ψ⁻¹_ωj are inverse Gaussian (in mean-shape parameterization),

  ψ⁻¹_aj | • ~ iG(λ_a ξ_aj/|a_0j|, 1),   ψ⁻¹_ωj | • ~ iG(λ_ω ξ_ωj/|√ω_j|, 1).

To sample from the conditional posterior distribution of ξ_si for s ∈ {h, γ}, we again introduce a set of auxiliary variables N_hi, N_γi,

  N_hi | • ~ GIG(n_h − 1, 2|h̃_i0|, 1),   N_γi | • ~ GIG(n_γ − 1, 2|γ_i|, 1),

and obtain draws of ξ_hi and ξ_γi by setting ξ_hi = N_hi/Σ_{i=1}^{v} N_hi and ξ_γi = N_γi/Σ_{i=1}^{v} N_γi. The global shrinkage parameters on the covariance parameters and the process innovation standard deviations again follow GIG distributions,

  π_h | • ~ GIG(v(n_h − 1), 2 Σ_{i=1}^{v} |h̃_i0|/ξ_hi, 1),   π_γ | • ~ GIG(v(n_γ − 1), 2 Σ_{i=1}^{v} |γ_i|/ξ_γi, 1).

The full conditional posterior distributions of ψ⁻¹_hi and ψ⁻¹_γi for i = 1, ..., v are given by

  ψ⁻¹_hi | • ~ iG(π_h ξ_hi/|h̃_i0|, 1),   ψ⁻¹_γi | • ~ iG(π_γ ξ_γi/|γ_i|, 1).

5. Forecasting macroeconomic quantities for three major economies. In what follows, we systematically assess the relationship between model size and model complexity by forecasting several macroeconomic indicators for three large economies, namely the EA, the UK, and the US. In Section 5.1, we briefly describe the different data sets and discuss model specification issues. Section 5.2 provides simple visual summaries of posterior sparsity in terms of the VAR coefficients and their time variation for the two shrinkage priors proposed. The main forecasting results are discussed in Section 5.3. Finally, Section 5.4 discusses the possibility of dynamically selecting among different specifications in an automatic fashion.
5.1. Data and model specification. We use prominent macroeconomic data sets for the EA, the UK, and the US. All three data sets are on a quarterly frequency but span different periods of time. For the euro area, we take data from the area wide model (Fagan, Henry and Mestre, 2005) and additionally include equity prices; the data are available from 1987Q1 to 2015Q4. UK data stem from the Bank of England's "A millennium of macroeconomic data" (Thomas, Hills and Dimsdale, 2010) and cover the period from 1982Q2 to 2016Q4. For the US, we use a subset from the FRED QD database (McCracken and Ng, 2016), which covers the period from 1959Q1 to 2015Q1.
For each of the three economies we use three subsets: a small (3 variables), a medium (7 variables), and a large (15 variables) one. The small subset covers only real activity, prices, and short-term interest rates. The medium models additionally cover investment and consumption, the unemployment rate, and either nominal or effective exchange rates. For the large models we add wages, money (measured as M2 or M3), government consumption, exports, equity prices, and 10-year government bond yields.
To complete the data set for the large models, we include additional variables depending on data availability for each country. For example, the UK data set offers a wide range of financial data, so we complement the large model with data on mortgage rates and bond spreads. For the EA data set, we also include a commodity price indicator and labor market productivity, while for the US we add consumer sentiment and hours worked. In what follows, we are interested not only in the relative performance of the different priors but also in the forecasting performance using different information sets. Thus, we have opted, first, to strike a good balance between different types of data (e.g., real, labor market, and financial market data) and, second, to vary the variables in the large data sets slightly across countries. This is done to rule out that performance differences between information sets depend crucially on the type of information that is added (e.g., labor market data vs. financial market data).
For data that are non-stationary, we take first differences; see Table 2 in the appendix for more details. Consistent with the literature (Cogley and Sargent, 2005; Primiceri, 2005; D'Agostino, Gambetti and Giannone, 2013), we include p = 2 lags of the endogenous variables in all models. For both the NG prior and the DL prior, we use lag-wise specifications, similar to those in Huber and Feldkircher (2017), in order to increase shrinkage for higher lags (given that the lag length is set equal to two, this choice does not matter much empirically).

5.2. Inspecting posterior sparsity. Before we turn to the forecasting exercise, we assess the amount of sparsity induced by our two proposed global-local shrinkage specifications, labeled TVP-SV NG and TVP-SV DL. This analysis is based on inspecting heatmaps that show the posterior mean of the coefficients as well as the posterior mean of the standard deviations that determine the amount of time variation in the dynamic regression coefficients. Figs. 1 to 3 show the corresponding heatmaps. Red and blue squares indicate positive and negative values, respectively. To permit comparability, we use the same scaling across priors within a given country. We start by inspecting posterior sparsity attached to the time-invariant part of the models, provided in the upper panels of Figs. 1 to 3. We generally find that the first own lag of a given variable appears to be important, while the second lag is slightly less important in most equations. This can be seen from the dense (i.e., colored) main diagonal elements. Turning to the off-diagonal elements, i.e., the coefficients associated with variables j ≠ i in equation i, we find considerable evidence that the (un)employment rate as well as long-term interest rates load heavily on the other quantities in most country models, as indicated by relatively dense columns associated with the first lag of unemployment and interest rates.
Equations that are characterized by a large number of non-zero coefficients (i.e., dense rows) are mostly related to financial variables, namely, exchange rates, equity prices, and commodity prices. These observations are general in nature and relate to all three countries considered.
In the next step, we investigate sparsity in terms of the degree of time variation of the VAR coefficients (see the lower panels of Figs. 1 to 3). Here, we observe that, consistent with the dense pattern in a_0, equations associated with financial variables display the largest amount of time variation. Interestingly, the results suggest that coefficients in the EA tend to display a greater propensity to drift compared to the coefficients of the UK country model.
Comparing the degree of shrinkage between the DL and the NG prior reveals that the latter specification induces much more sparsity in large dimensional systems. While both priors yield rather sparse models, the findings point toward a much stronger degree of shrinkage of the NG prior. Notice that the NG prior also favors constant parameter specifications. This suggests that in large-scale applications, the NG prior might be particularly useful when issues of overparameterization are more of a concern, while in smaller models, the flexibility of the DL prior might be beneficial.

5.3. Forecasting results. In this section, we examine the forecasting performance of the proposed prior specifications. The forecasting setup largely follows Huber and Feldkircher (2017) and focuses on the one-quarter- and one-year-ahead forecast horizons and three different information sets: small (3 variables), medium (7 variables), and large (15 variables). We use an expanding window and a hold-out sample of 80 quarters, which results in the following hold-out samples: 1995Q4-2015Q3 for the EA, 1997Q1-2016Q4 for the UK, and 1995Q4-2015Q3 for the US.
Forecasts are evaluated using log predictive scores (LPSs), a widely used metric to measure density forecast accuracy (see, e.g., Geweke and Amisano, 2010). The LPSs are closely related to the marginal likelihood, a standard Bayesian measure for discriminating between competing models. It is also worth stressing that, as opposed to focusing on point forecasts, considering LPSs allows us to factor in how well a given model captures higher-order features of the predictive distribution. These higher-order features are important if the researcher is interested in reporting functions of the predictive distribution, such as growth at risk (Adrian, Boyarchenko and Giannone, 2019) or other measures relevant for capturing tail risks to macroeconomic outcomes (Clark et al., 2023). Last, relying on point forecasts requires the researcher to choose a loss function (in addition to model and prior choice) in order to evaluate their quality, whereas using LPSs amounts to choosing the model-implied loss function.
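When the predictive distribution is simulated by MCMC, the LPS is typically approximated by averaging the predictive density of the realized observation over the posterior draws. A sketch of this computation, assuming draw-specific Gaussian predictive means and covariances are available and using mvtnorm for the density:

```r
# Approximate the log predictive score from S posterior predictive draws,
# using the log-sum-exp trick for numerical stability.
library(mvtnorm)
log_predictive_score <- function(y_real, mu_draws, Sigma_draws) {
  # y_real: realized M-vector; mu_draws: S x M matrix of predictive means;
  # Sigma_draws: M x M x S array of predictive covariance matrices.
  S <- nrow(mu_draws)
  logdens <- vapply(seq_len(S), function(s) {
    dmvnorm(y_real, mean = mu_draws[s, ], sigma = Sigma_draws[, , s], log = TRUE)
  }, numeric(1))
  max(logdens) + log(mean(exp(logdens - max(logdens))))
}
```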
We compare the NG and DL specifications with a simpler constant parameter Bayesian VAR (BVAR-SV) and a time-varying parameter VAR with a loose prior setting (TVP-SV) as a general benchmark. Specifically, this benchmark model places a weakly informative Gaussian prior on √ω_j. On a_0, and for the BVAR-SV, we use the NG shrinkage prior described in Section 3. For the evaluation, we focus on the joint predictive distribution of three focal variables, namely, GDP growth, inflation, and short-term interest rates. This allows us to assess the predictive differences obtained by switching from small to large information sets. Figure 4 summarizes the results for the one-step-ahead forecast horizon. All panels display LPSs for the three focus variables relative to the TVP-SV specification. To assess the overall forecast performance over the hold-out sample, consider in particular the rightmost point in the respective panels. Doing so reveals that the time-varying parameter specifications, TVP-SV NG and TVP-SV DL, outperform the benchmark for all three countries and information sets, as indicated by positive log predictive Bayes factors. Except for the EA and the small information set, this finding also holds true for the constant parameter BVAR-SV specification. Zooming in and looking at performance differences among the priors reveals that the TVP-SV DL specification dominates in the case of small models. The TVP-SV NG prior ranks second, and the constant parameter BVAR-SV model performs worst. The dominance of the DL prior stems from the performance during the period of the global financial crisis 2008/2009. While predictions from all model specifications worsen, they deteriorate the least for the DL specification. In particular for the EA and the UK, the dominance of the DL prior stems mainly from improved forecasts for short-term interest rates; see Figs. 8 and 9 in Appendix B.
It is worth noting that in small-dimensional models, the TVP-SV specification also performs quite well and proves to be a competitive alternative to the BVAR-SV model. This is due to the fact that parameters are allowed to move significantly, with only little penalization introduced through the prior, effectively controlling for structural breaks and sharp movements in the underlying structural parameters. This result corroborates findings in D'Agostino, Gambetti and Giannone (2013) and appears to support our conjecture that for small information sets, the benefit of allowing for time variation dominates the detrimental effect of the large number of additional parameters to be estimated.
In the next step, we enlarge the information set and turn our focus to the seven variable VAR specifications. Here, the picture changes slightly, and the NG prior outperforms its competitors. Depending on the country, either the DL specification or the constant parameter BVAR-SV model ranks second. For US data, it pays off to use a time-varying parameter specification since, as with the small information set, the BVAR-SV model performs worst. Finally, we turn to the large VAR specifications, featuring 15 variables. Here, we see a very similar picture as with the seven variable specification. The TVP-SV NG prior yields the best forecasts, with the constant parameter model turning out to be a strong competitor. Only for US data do both time-varying parameter specifications clearly outperform the constant parameter competitor.
We now briefly examine forecasts for the four-quarter horizon displayed in Figure 5. For the small and medium-sized models, all competitors yield forecasts that are close to or worse than those of the loose shrinkage benchmark model. The high degree of shrinkage induced by the NG prior yields particularly poor forecasts, especially for observations that fall in the period of the global financial crisis. The picture reverses when considering the large-scale models. Here, all competitors easily outperform forecasts of the loose benchmark model, implying that shrinkage pays off. Viewed over all settings, the DL prior does a fine job in balancing the degree of shrinkage across model sizes.

5.4. Dynamic model selection. Given that no single specification uniformly dominates in terms of achieving superior forecasting results, one could ask whether there are gains from dynamically selecting models. Following Raftery, Karny and Ettler (2010), Koop and Korobilis (2012), and Onorante and Raftery (2016), we perform dynamic model selection by computing a set of weights for each model within a given model size. These weights are based on the predictive likelihood for the three focus variables at t − 1. Intuitively speaking, this combination scheme implies that if a given model performed well in predicting last quarter's output, inflation, and interest rates, it receives a higher weight in the next period. By contrast, models that performed badly receive less weight in the model pool. We further employ a so-called forgetting factor that induces persistence in the model weights over time. This implies that the weights are not only shaped by the most recent forecast performance of the underlying models, but also by their historical forecasting performance. Finally, to select a given model, we simply pick the one with the highest weight.
The predicted weight associated with model i is computed as follows:

  w_{t|t−1,i} = (w_{t−1|t−2,i} p_{t−1|t−2,i})^α / Σ_{j∈M} (w_{t−1|t−2,j} p_{t−1|t−2,j})^α,

where α ∈ (0, 1] denotes the forgetting factor. Here, p_{t−1|t−2,i} denotes the one-step-ahead predictive likelihood for the three focus variables in t − 1 for model i within the model space M. Letting t_0 stand for the final quarter of the training sample, the initial weights w_{t_0+1|t_0,i} are assumed to be equal for each model. Figure 7 displays the corresponding results, with the relative performance of dynamic model selection gradually improving over the sample period. Forecasts for the EA that are based on the large information set are less precise during the period from 2000 to 2012 compared to the benchmark models. This might be related to the creation of the euro, which in turn has triggered a fundamental shift in the joint dynamics of the EA's macro model. Due to the persistence in the models' weights, the model selection algorithm takes some time to adapt to the new regime. This can be seen by investigating the latest period in the sample, in which dynamic model selection again outperforms forecasts of the benchmark model. In other words, for EA data, either restricting the sample period to post-2000 or reducing the persistence via the forgetting factor might improve forecasting results.
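A compact R sketch of this recursion is given below; the function name, the matrix layout, and the value of the forgetting factor are illustrative assumptions, not taken from the paper.

```r
# Recursive model weights with a forgetting factor alpha, in the spirit of
# Raftery, Karny and Ettler (2010) and Koop and Korobilis (2012).
dms_weights <- function(pl, alpha = 0.99) {
  # pl: T x N matrix of one-step-ahead predictive likelihoods for N models
  n_per <- nrow(pl); n_mod <- ncol(pl)
  w <- matrix(NA_real_, n_per, n_mod)
  w[1, ] <- rep(1 / n_mod, n_mod)            # equal initial weights
  for (t in 2:n_per) {
    raw <- (w[t - 1, ] * pl[t - 1, ])^alpha  # discounted update
    w[t, ] <- raw / sum(raw)
  }
  w
}
# Selecting a model at time t then amounts to which.max(w[t, ]).
```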
6. Conclusive remarks. In this paper, our goal is to investigate how model complexity (loosely defined as the dimension of the state space) interacts with model size. The initial conjecture is that small but flexible (TVP) models are capable of achieving predictive performance comparable to large-scale models that assume constant parameters (but time-varying error variances). In three different data sets, we find precisely this pattern. The larger the model gets, the less we require time-varying parameters, and the closer the forecasting performance of the large TVP-VAR is to that of the constant parameter large VAR with SV. This is mainly driven by the fact that modern global-local shrinkage priors introduce substantial shrinkage on the process innovation variances, and the degree of shrinkage increases rapidly in the number of coefficients. If the model is small, the TVPs control for omitted variable biases, and this proves to be a good substitute for using larger data sets in combination with simpler models.

FIG 1. Posterior means in the large model - euro area.

FIG 2. Posterior means in the large model - United Kingdom.
FIG 3. Posterior means in the large model - USA.

FIG 4. One-quarter-ahead cumulative log predictive Bayes factors for the three focus variables GDP growth, inflation, and short-term interest rates over time, relative to the TVP-SV-VAR without shrinkage. Top row: Small model (3 variables). Middle row: Medium model (7 variables). Bottom row: Large model (15 variables). Left column: euro area (EA). Middle column: United Kingdom (UK). Right column: United States (US).

FIG 7. Log predictive Bayes factors of dynamic model selection relative to the best performing model over time, as implied by the predictive performance for the three focus variables GDP growth, inflation, and short-term interest rates. Top row: Small model (3 variables). Middle row: Medium model (7 variables). Bottom row: Large model (15 variables). Left column: euro area (EA). Middle column: United Kingdom (UK). Right column: United States (US).