Dynamic shrinkage in time-varying parameter stochastic volatility in mean models

Successful forecasting models strike a balance between parsimony and flexibility. This is often achieved by employing suitable shrinkage priors that penalize model complexity but also reward model fit. In this note, we modify the stochastic volatility in mean (SVM) model proposed in Chan (2017) by introducing state-of-the-art shrinkage techniques that allow for time-variation in the degree of shrinkage. Using a real-time inflation forecast exercise, we show that employing more flexible prior distributions on several key parameters slightly improves forecast performance for the United States (US), the United Kingdom (UK) and the Euro Area (EA). Comparing in-sample results reveals that our proposed model yields qualitatively similar insights to the original version of the model.


INTRODUCTION
Forecasting in macroeconomics and finance requires flexible models that are capable of capturing salient features of the data such as structural breaks in the regression coefficients and/or heteroscedastic measurement errors. Time-variation in the shocks is often introduced through stochastic volatility (SV) models that imply a smoothly evolving error variance over time. Such models typically rule out that the level of the volatility directly affects the conditional mean of the predictive regression. This assumption is relaxed in Koopman and Hol Uspensky (2002) and Chan (2017) by assuming that the volatilities enter the conditional mean equation and thus exert a direct effect on the quantity of interest.
In this note, we reconsider the model proposed in Chan (2017) and replicate the main findings both in a narrow and wide sense. The original specification is a time-varying parameter (TVP) model with SV that allows for feedback effects between the level of volatility and the endogenous variable. As opposed to most of the existing literature, this model assumes that this relationship is time-varying. Estimation and inference are carried out in a Bayesian framework, implying that prior distributions are specified on all coefficients of the model. These priors are often set to be weakly informative.
One key contribution of this note is to introduce shrinkage via state-of-the-art dynamic shrinkage priors that capture situations where coefficients are time-varying over certain periods while they remain constant in others. These priors are based on Kowal et al. (2019), who propose a dynamic shrinkage process that is time-varying and follows an AR(1) model with Z-distributed shocks. Proper specification of the hyperparameters of this error distribution yields a dynamic horseshoe (DHS) prior that possesses excellent shrinkage properties. Other specifications we propose also introduce shrinkage but assume the shrinkage coefficients to be independent over time (static horseshoe prior, SHS) or time-invariant, such as a standard horseshoe (HS) prior that exploits the non-centered parameterization of the state space model (see Frühwirth-Schnatter and Wagner, 2010).
The second contribution deals with replicating the main findings of Chan (2017) using updated real-time inflation data. Instead of considering the original three countries (the US, the UK and Germany), we replace Germany with the EA and investigate whether the main findings also hold for this dataset. Using more flexible shrinkage priors generally yields similar in-sample findings for the US and the UK. For the EA, we find only minor evidence of a link between inflation and inflation volatility. This finding relates to Jarociński and Lenza (2018), who observe limited evidence in favor of SV for inflation derived from the harmonized index of consumer prices (HICP). When it comes to forecasting, we find that shrinkage sometimes improves predictive accuracy. In cases where predictive accuracy falls below that of the no-shrinkage specification, the differences are often very small.
In the remainder of the note we proceed as follows. The next section summarizes the model and motivates our shrinkage priors. Section 3 replicates the main findings of Chan (2017) using the proposed model and carries out a real-time forecasting exercise to show that using shrinkage often further improves upon the already excellent predictive performance of the original model. Finally, the last section briefly summarizes and concludes the note.

The Time-varying Parameter Stochastic Volatility in Mean Model
The time-varying parameter stochastic volatility in mean (TVP-SVM) model is given by:
y_t = τ_t + z_t' β_t + γ_t e^{h_t} + ε_t,  ε_t ∼ N(0, e^{h_t}),  (1)
h_t = µ_h + φ_h (h_{t−1} − µ_h) + δ y_{t−1} + ν_t,  ν_t ∼ N(0, σ^2),
where y_t is a scalar time series, τ_t denotes a stochastic trend term, β_t is a K-dimensional vector of dynamic regression coefficients, and γ_t is a coefficient that measures the (potentially) time-varying relationship between y_t and the shock volatility e^{h_t}. The column vector z_t may contain lags of the dependent variable, additional predictors and/or latent factors capturing high-dimensional information. The log-volatility h_t follows an AR(1) process with unconditional mean µ_h, persistence parameter φ_h, and error variance σ^2. Moreover, h_t depends on the lag of y_t through a time-invariant parameter δ.
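To make the structure of the model concrete, the following is a minimal simulation sketch in Python. It assumes, beyond what is stated above, that the states τ_t, β_t and γ_t follow independent Gaussian random walks with a common innovation variance `omega`, and that the predictors z_t are standard normal draws. The function name and all default parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np

def simulate_tvp_svm(T=200, K=1, mu_h=-1.0, phi_h=0.95, sigma=0.2,
                     delta=0.0, omega=0.01, seed=0):
    """Simulate y_t = tau_t + z_t' beta_t + gamma_t * exp(h_t) + eps_t,
    with eps_t ~ N(0, exp(h_t)) and an AR(1) log-volatility h_t."""
    rng = np.random.default_rng(seed)
    k = K + 2  # states ordered as (tau_t, beta_t (K elements), gamma_t)
    # Assumption: every state follows an independent random walk.
    theta = np.cumsum(np.sqrt(omega) * rng.standard_normal((T, k)), axis=0)
    z = rng.standard_normal((T, K))  # hypothetical exogenous predictors
    h = np.empty(T)
    y = np.empty(T)
    h_prev, y_prev = mu_h, 0.0  # start h at its unconditional mean
    for t in range(T):
        # Log-volatility: AR(1) around mu_h plus feedback from lagged y.
        h[t] = (mu_h + phi_h * (h_prev - mu_h) + delta * y_prev
                + sigma * rng.standard_normal())
        # Regressors: intercept (trend), predictors, and the volatility level.
        x_t = np.concatenate(([1.0], z[t], [np.exp(h[t])]))
        y[t] = x_t @ theta[t] + np.exp(0.5 * h[t]) * rng.standard_normal()
        h_prev, y_prev = h[t], y[t]
    return y, h, theta
```

Setting `delta` to a nonzero value switches on the feedback from lagged y_t to the log-volatility; large values can make the simulated process unstable.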

Imposing Shrinkage in TVP Models
The model outlined in the previous sub-section is quite flexible: it allows for a direct relationship between the error volatilities and y_t, and this relationship may itself be subject to parameter instability. Allowing for TVPs in all coefficients could, however, lead to overfitting, which often decreases predictive accuracy. Chan (2017) uses weakly informative priors on key parameters and finds that they yield good forecasting results.
Here, we aim to improve upon this finding by introducing three additional priors that allow us to flexibly select restrictions in the empirical model and thus achieve parsimony.
The priors we consider in this study are:
(1) A weakly informative prior on the coefficients and state innovation variances, similar to that in Chan (2017). We use independent weakly informative inverse Gamma priors on the innovation variances of the state equation, ω_j. We subsequently label this prior "None," reflecting the notion that almost no shrinkage is imposed.
(2) A hierarchical global-local prior (labeled "HS") on the constant part and innovation variances of the model. We achieve this by rewriting the model in the non-centered parameterization of Frühwirth-Schnatter and Wagner (2010). We collect the constant parameters and the square roots of the state innovation variances in a 2k × 1 vector α = (θ_0, √ω_1, …, √ω_k) and index its ith element, for i = 1, …, 2k, by α_i.
(3) A static variant of the horseshoe prior (labeled "SHS") that imposes shrinkage using the centered parameterization of the state equation with time-varying state innovation variances in Ω_t. We denote the jth diagonal element of Ω_t by ω_jt = λ_j φ_jt and assume inverse Gamma distributions as priors for the global and local shrinkage parameters. Following Makalic and Schmidt (2015), auxiliary variables v_jt ∼ G^{−1}(1/2, 1) and w_j ∼ G^{−1}(1/2, 1), for j = 1, …, k, are used to establish the horseshoe prior. Here, λ_j governs the overall amount of time variation in the coefficient on the jth regressor, while φ_jt allows for predictor- and time-specific shrinkage.
(4) A dynamic horseshoe prior (labeled "DHS") as in Kowal et al. (2019). We again use the centered parameterization of the state equation with time-varying state innovation variances in Ω_t, whose jth diagonal element is ω_jt = λ_0 λ_j φ_jt. To achieve a log-scale representation of the global-local prior, define ψ_jt = log(λ_0 λ_j φ_jt) and assume that ψ_jt follows an AR(1) process with Z-distributed innovations, where Z denotes the Z-distribution with parameters a and b. Setting a = b = 1/2 yields the dynamic horseshoe prior (for details on related prior choices, see Kowal et al., 2019). Here, λ_0 is a global, λ_j are predictor-specific, and φ_jt are predictor- and time-specific shrinkage parameters that follow a joint autoregressive law of motion.
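The difference between the static and dynamic horseshoe variants can be illustrated by drawing state innovation variances from the two priors. The sketch below is not the authors' code and rests on two assumptions. First, the SHS hierarchy is taken to be the standard Makalic and Schmidt (2015) mixture, λ_j | w_j ∼ G^{−1}(1/2, 1/w_j) and φ_jt | v_jt ∼ G^{−1}(1/2, 1/v_jt). Second, a Z(a, b, 0, 1) draw is generated as the logit of a Beta(a, b) variable. The function names and the AR(1) parameters `mu` and `rho` are hypothetical.

```python
import numpy as np

def draw_inv_gamma(shape, scale, rng):
    """Inverse Gamma draw: scale divided by a Gamma(shape, 1) draw."""
    scale = np.asarray(scale, dtype=float)
    return scale / rng.gamma(shape, 1.0, size=scale.shape)

def draw_shs_variances(T, k, rng):
    """Prior draws of omega_jt = lambda_j * phi_jt (static horseshoe)."""
    w = draw_inv_gamma(0.5, np.ones(k), rng)         # w_j ~ IG(1/2, 1)
    lam = draw_inv_gamma(0.5, 1.0 / w, rng)          # lambda_j | w_j
    v = draw_inv_gamma(0.5, np.ones((T, k)), rng)    # v_jt ~ IG(1/2, 1)
    phi = draw_inv_gamma(0.5, 1.0 / v, rng)          # phi_jt | v_jt
    return lam * phi                                 # independent over t

def draw_z(a, b, size, rng):
    """Z(a, b, 0, 1) draw, generated as the logit of a Beta(a, b) draw."""
    x = rng.beta(a, b, size=size)
    x = np.clip(x, 1e-12, 1.0 - 1e-12)  # guard against log(0)
    return np.log(x) - np.log1p(-x)

def draw_dhs_log_variances(T, k, rng, mu=0.0, rho=0.8, a=0.5, b=0.5):
    """Prior draws of psi_jt = log(omega_jt) from an AR(1) with Z shocks."""
    psi = np.empty((T, k))
    psi[0] = mu + draw_z(a, b, k, rng)
    for t in range(1, T):
        psi[t] = mu + rho * (psi[t - 1] - mu) + draw_z(a, b, k, rng)
    return psi
```

Under the SHS draws, a large variance in one period carries no information about the next period, whereas the DHS draws are persistent, so that periods of strong or weak shrinkage tend to cluster.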
We use standard Markov chain Monte Carlo (MCMC) methods such as Gibbs sampling augmented by a forward filtering backward sampling (FFBS) algorithm for the TVPs (Carter and Kohn, 1994; Frühwirth-Schnatter, 1994).
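The FFBS step can be sketched as follows for a generic linear Gaussian state space model with random walk states. This is a minimal illustration under stated assumptions rather than the sampler used in this note: it conditions on the measurement error variances `sig2` and the state innovation variances `omega`, which in a full Gibbs sampler would themselves be drawn in separate steps. The function name and argument layout are hypothetical.

```python
import numpy as np

def ffbs_random_walk(y, X, sig2, omega, m0, P0, rng):
    """One FFBS draw of theta_1, ..., theta_T in
         y_t     = x_t' theta_t + eps_t,   eps_t ~ N(0, sig2[t]),
         theta_t = theta_{t-1} + eta_t,    eta_t ~ N(0, diag(omega[t]))."""
    T, k = X.shape
    m = np.empty((T, k))      # filtered means
    P = np.empty((T, k, k))   # filtered covariances
    m_prev, P_prev = m0, P0
    for t in range(T):                      # forward Kalman filter
        R = P_prev + np.diag(omega[t])      # predicted state covariance
        x = X[t]
        f = x @ R @ x + sig2[t]             # predictive variance of y_t
        gain = R @ x / f
        m[t] = m_prev + gain * (y[t] - x @ m_prev)
        P[t] = R - np.outer(gain, x @ R)
        m_prev, P_prev = m[t], P[t]
    theta = np.empty((T, k))
    cov = 0.5 * (P[-1] + P[-1].T)           # symmetrize for numerical safety
    theta[-1] = rng.multivariate_normal(m[-1], cov)
    for t in range(T - 2, -1, -1):          # backward sampling
        R_next = P[t] + np.diag(omega[t + 1])
        B = P[t] @ np.linalg.inv(R_next)
        mean = m[t] + B @ (theta[t + 1] - m[t])
        cov = P[t] - B @ P[t]
        theta[t] = rng.multivariate_normal(mean, 0.5 * (cov + cov.T))
    return theta
```

Because `omega` is indexed by t, the same routine covers the constant-variance case as well as the SHS and DHS specifications, where the innovation variances change over time.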

INFLATION MODELING
In this study we take a real-time perspective on modeling inflation for the US, the UK, and the EA. Vintage data, as available at specific points in the past, are obtained from the webpages of the respective data providers. We model inflation, defined as π_t = 400 log(p_t / p_{t−1}), with an unobserved component model augmented with stochastic volatility in the mean (UC-SVM):
π_t = τ_t + γ_t e^{h_t} + ε_t,  ε_t ∼ N(0, e^{h_t}),
which is a special case of Eq. (1) with β_t = 0 for all t. This model has been used by Chan (2017) to forecast inflation. If γ_t = 0 for all t, we obtain the UC-SV model proposed in Stock and Watson (2007). If the prior on the state innovation variances ω is specified too loosely, the model might be prone to overfitting, which would be deleterious for predictive accuracy.
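As a small worked example of the inflation definition, the transformation 400 log(p_t / p_{t−1}) can be computed as below. The factor 400 corresponds to an annualized percentage rate for quarterly data. The function name and the example price levels are hypothetical.

```python
import numpy as np

def annualized_inflation(p):
    """Return 400 * log(p_t / p_{t-1}) for a series of price levels."""
    p = np.asarray(p, dtype=float)
    return 400.0 * np.diff(np.log(p))

# Hypothetical quarterly price index: a 1% quarterly rise is roughly
# a 3.98% annualized log change.
prices = [100.0, 101.0, 101.0]
inflation = annualized_inflation(prices)
```

The resulting series is one observation shorter than the price series, since the first period has no lagged price level.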
Hence, in this empirical application we assess whether using shrinkage priors improves the predictive fit of the model. Before we turn to analyzing predictions, we focus on key in-sample results. Before and after that period, the error volatility process remains rather stable (as opposed to more rapidly changing log-volatilities in the no-shrinkage case).

In-sample results
The findings for γ_t paint a different picture. While low-frequency movements remain similar across shrinkage priors, some interesting differences arise. Specifications that imply time-varying shrinkage (i.e., SHS and DHS) allow for sharp movements in γ_t in selected periods and across economies. For instance, in the US we observe a pronounced change in the relationship between inflation and inflation volatility during the Volcker disinflation. A comparably sharp decrease in γ_t can also be observed in the UK during the crisis of the European Exchange Rate Mechanism (ERM) at the beginning of the 1990s. A similar, albeit noisier, decline can be found in the EA during the global financial crisis (GFC).
In sum (and with some exceptions) Figure 1 shows that the original results of Chan (2017) remain remarkably robust with respect to different shrinkage priors. Exceptions arise especially during periods where the level of inflation experienced sharp changes (such as during the Volcker disinflation, the ERM and the GFC) and for EA data.

Forecast results
In this section, we analyze whether our set of shrinkage priors improves out-of-sample predictive performance within a real-time forecasting exercise. We evaluate point forecasts by means of root mean squared errors (RMSEs) and density forecasts by means of average log predictive likelihoods (LPLs; see, e.g., Geweke and Amisano, 2010). Each real-time vintage is used to produce forecasts, which are then evaluated using the final available vintage.
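The two evaluation criteria can be sketched as follows. The RMSE function is standard. For the LPL, the sketch assumes that each predictive density is approximated by an equally weighted mixture of Gaussians, one component per retained MCMC draw; this is a common approximation and not necessarily the one used in this note. Function names and the array layout are hypothetical.

```python
import numpy as np

def rmse(forecasts, realized):
    """Root mean squared error of point forecasts."""
    e = np.asarray(forecasts, dtype=float) - np.asarray(realized, dtype=float)
    return np.sqrt(np.mean(e ** 2))

def avg_log_predictive_likelihood(draw_means, draw_vars, realized):
    """Average LPL over evaluation periods.

    draw_means, draw_vars: arrays of shape (n_periods, n_draws) holding the
    conditional predictive mean and variance implied by each MCMC draw.
    realized: array of shape (n_periods,) with the realized values."""
    mu = np.asarray(draw_means, dtype=float)
    var = np.asarray(draw_vars, dtype=float)
    y = np.asarray(realized, dtype=float)[:, None]
    logdens = -0.5 * (np.log(2.0 * np.pi * var) + (y - mu) ** 2 / var)
    # Log of the average density over draws, computed stably.
    mx = logdens.max(axis=1, keepdims=True)
    lpl = mx[:, 0] + np.log(np.mean(np.exp(logdens - mx), axis=1))
    return lpl.mean()
```

Lower RMSEs and higher LPLs indicate better forecasts; in a real-time setting, `realized` would be taken from the final available vintage.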
We assess the merits of using shrinkage in the SVM model relative to the following competitors. As in Chan (2017), we use a random walk (RW) model as the benchmark for relative RMSEs and LPLs: π_t = π_{t−1} + η_t, with η_t ∼ N(0, σ_η^2). Moreover, we include unobserved component models with stochastic volatility (UC-SV) as a special case of the UC-SVM model: π_t = τ_t + ε_t. We assume ε_t ∼ N(0, e^{h_t}), with the state equation given by h_t = µ_h + φ_h (h_{t−1} − µ_h) + ν_t and ν_t ∼ N(0, σ^2). UC-SV and UC-SVM are estimated using the four prior specifications (None, HS, SHS and DHS) discussed above.
Table 1 presents forecasting results for different economies and shrinkage priors. In general, and with only very few exceptions, we find that all models improve upon the random walk. This holds for both point and density forecasts and across economies and forecast horizons. Only for density forecasts of EA inflation do we find the random walk to yield more precise predictions. The strong performance of the UC-SVM model without shrinkage confirms the findings reported in Chan (2017).
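For reference, the random walk benchmark implies a particularly simple predictive distribution: the h-step-ahead point forecast equals the last observation, and the forecast variance grows linearly in h. The sketch below uses a plug-in estimate of σ_η^2 based on squared first differences, which is an assumption made for brevity; a Bayesian treatment would instead place a prior on σ_η^2 and integrate over its posterior. The function name is hypothetical.

```python
import numpy as np

def rw_forecast(pi, h=1):
    """h-step-ahead Gaussian forecast from pi_t = pi_{t-1} + eta_t."""
    pi = np.asarray(pi, dtype=float)
    # Plug-in estimate of the innovation variance (assumption).
    sigma2_eta = np.mean(np.diff(pi) ** 2)
    mean = pi[-1]             # point forecast: last observed value
    var = h * sigma2_eta      # variance accumulates over h steps
    return mean, var
```

The returned mean and variance can be passed to RMSE and LPL calculations in the same way as the forecasts from the competing models.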
We now investigate whether using shrinkage further improves predictive accuracy. Considering both density and point forecasts, this question is difficult to answer. For some economies, horizons and specifications, shrinkage priors seem to improve both point and density forecasting performance, while for other configurations shrinkage seems to slightly hurt predictive accuracy. These differences (both negative and positive) are often very small, but there are some cases where we find more pronounced improvements. For instance, the UC-SV model with shrinkage predicts UK inflation appreciably better than its no-shrinkage counterpart at both horizons and in terms of both RMSEs and LPLs. Further evidence that shrinkage improves forecasts can be found for EA inflation density forecasts, where any shrinkage prior yields better forecasts than the no-shrinkage specification.
Comparing the different shrinkage priors reveals no clear winner of our forecasting horse race. In most cases, predictions are similar to each other. If we were to choose a preferred prior, our default recommendation would be the HS specification, because it performs well across the different configurations and for both model classes considered. In the case of the EA in particular, we find the HS setup to provide favorable point and density forecasts (especially for the UC-SV model).
The key takeaway from this discussion is that the benchmark model introduced in Chan (2017) seems to work very well for all considered economies. Using shrinkage helps in some cases but also leads to slightly inferior predictive performance in others. However, these decreases in forecast accuracy are never substantial. By contrast, we observe several cases where shrinkage improves forecasts substantially. Hence, as a general rule, we suggest combining the SVM model with shrinkage priors, since the risk of obtaining markedly weaker forecasts appears to be low while the chance of substantially improving forecasts is much higher.

CONCLUDING REMARKS
In this note we have successfully replicated the findings in Chan (2017) both in a narrow and wide sense. We have shown that using several different shrinkage techniques has the potential to improve forecasts. While these gains are small on average, several cases emerge where improvements are more pronounced. More importantly, we never find situations where using shrinkage strongly decreases forecast performance.