Forecasts with Bayesian vector autoregressions under real time conditions

This paper investigates the sensitivity of forecast performance measures to taking a real time versus pseudo out-of-sample perspective. We use monthly vintages for the United States (US) and the Euro Area (EA) and estimate a set of vector autoregressive (VAR) models of different sizes with constant and time-varying parameters (TVPs) and stochastic volatility (SV). Our results suggest differences in the relative ordering of model performance for point and density forecasts depending on whether real time data or truncated final vintages in pseudo out-of-sample simulations are used for evaluating forecasts. No clearly superior specification for the US or the EA across variable types and forecast horizons can be identified, although larger models featuring TVPs appear to be affected the least by missing values and data revisions. We identify substantial differences in performance metrics with respect to whether forecasts are produced for the US or the EA.


INTRODUCTION
Forecasting economic and financial series is of crucial interest to the private sector and to policy makers in central banks and governments. Many contributions in specialized academic journals thus develop new econometric methods, aiming to improve the forecast accuracy of existing approaches. The proposed models are benchmarked against established frameworks and evaluated in terms of their performance for point and density forecasts, mostly by means of so-called pseudo out-of-sample exercises.
In these simulations, one splits available data at a specific point in time into an estimation and holdout period, produces forecasts for the relevant horizon based on the estimation period and assesses these forecasts with realized values in the holdout sample.
Subsequently, an additional observation is added to the information set, and the procedure is iterated until the end of the holdout is reached. Results for vast sets of models can subsequently be compared, and a "winner" in terms of point and density forecasts can be chosen on the grounds of various applicable statistics (see, for instance, Geweke and Amisano, 2010).
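The recursive scheme described above can be sketched in a few lines of code. The following is a minimal illustration, not the paper's implementation; the function and argument names (`pseudo_oos_mae`, `fit`, `forecast`) are purely illustrative, and a naive random-walk benchmark stands in for an actual econometric model.

```python
import numpy as np

def pseudo_oos_mae(y, first_origin, fit, forecast, horizon=1):
    """Expanding-window pseudo out-of-sample exercise: estimate on data up
    to each forecast origin t, predict h steps ahead, and score the
    prediction against the realized value in the holdout."""
    errors = []
    for t in range(first_origin, len(y) - horizon):
        model = fit(y[: t + 1])           # estimation period ends at t
        y_hat = forecast(model, horizon)  # h-step-ahead point forecast
        errors.append(abs(y[t + horizon] - y_hat))
    return float(np.mean(errors))         # mean absolute forecast error

# Hypothetical benchmark: a random walk forecasts the last observation.
rw_fit = lambda data: data[-1]
rw_forecast = lambda last_value, h: last_value
```

Swapping in a different `fit`/`forecast` pair for each competing model and comparing the resulting scores is exactly the "winner"-selection step discussed above.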
A key difference for forecasting under real time conditions is, however, that data are frequently subject to revisions and that observations for many series are not yet available to the forecaster at the time of the first release of the data. This implies that practitioners must rely on incomplete information sets, the "vintages" published at a specific point in time. By contrast, pseudo out-of-sample forecasts are produced and evaluated using truncated series from a single vintage. Handling real time data and accounting for revisions and missing values is a burdensome task. Is it necessary to address such concerns, or do pseudo out-of-sample exercises suffice to establish the relative performance ordering of different modeling approaches?
In this paper, we assess the sensitivity of forecasts to taking a real time perspective, and seek to identify models that are most robust to data revisions and to situations where data for the most recent periods are not yet available (for related studies, see Diron, 2008; Giannone et al., 2008; Molodtsova et al., 2008; Schumacher and Breitung, 2008; Bańbura and Rünstler, 2011; Bańbura et al., 2013; Ghysels et al., 2018; Krippner and Lewis, 2018). Rather than focusing on developing methods for predictable measurement errors in initial releases of statistical agencies (see Aruoba, 2008; Strohsal and Wolf, in press), as in Kishor and Koenig (2012) or Cogley et al. (2015), our interest centers on assessing the robustness of popular modeling approaches that do not explicitly account for data revisions. We ask whether forecast metrics derived from pseudo out-of-sample and real time studies establish the same ordering in terms of forecast performance, and thus agree on model selection.
We investigate datasets for the United States (US) and the Euro Area (EA). The Federal Reserve Bank of St. Louis maintains monthly vintages for the US, where the dataset for a given month only contains observations until the previous month and data are revised frequently (Federal Reserve Economic Data, FRED-MD, see McCracken and Ng, 2016). A comparable database is available for the EA, with even longer lags in the publication of available series (Euro Area Real Time Database, EA-RTD, see Giannone et al., 2012). For pioneering work in the context of real time analysis of data, see Diebold and Rudebusch (1991) and Croushore and Stark (2001); a comprehensive survey of the related literature is given by Croushore (2006, 2011).
Using a set of popular models reflecting recent trends in macroeconomic forecasting, that is, Bayesian time-varying parameter (TVP) vector autoregressive (VAR) models with stochastic volatility (SV) of different sizes equipped with a flexible shrinkage prior, we investigate relative differences in forecast performance relying on real time information and pseudo out-of-sample exercises. To gauge the role of large information sets while keeping computation feasible, we also use variants of these models augmented by principal components extracted from the high-dimensional datasets that are available (see also Bernanke and Boivin, 2003). Moreover, we employ methods for in-sample imputation of missing values.
Our results suggest three main conclusions. First, we observe differences in the relative ordering of model performance for point and density forecasts depending on whether real time data or truncated final vintages in pseudo out-of-sample simulations are used for evaluating forecasts. This finding suggests that when developing econometric frameworks for forecasting, special attention has to be paid to the robustness of the proposed models to missing observations and data revisions. Providing methods for this case is crucial for practitioners, since initial data releases are typically incomplete and imperfect. Second, depending on the release schedule of the data, we detect differences in the severity of this problem between the US and the EA. Missing values and data revisions play a much more prominent role for the case of the EA. Finally, and perhaps unsurprisingly, producing accurate forecasts is substantially harder when relying on real time data. Depending on the variables and forecast horizon of interest, no clearly superior model specification arises.
Relatedly, we cannot identify a unique specification that performs best for both the US and the EA economy. Differences in forecast performance between real time and pseudo out-of-sample designs are typically smaller for larger and more complex models featuring TVPs.
The rest of the paper is structured as follows. Section 2 briefly presents the econometric framework. Section 3 introduces the datasets and discusses imputation methods for missing data alongside details on model specification. The main findings are discussed in Section 4. Section 5 concludes.

ECONOMETRIC FRAMEWORK
Recent approaches in macroeconomic forecasting usually involve high-dimensional multivariate models to extract information from many available series to forecast key variables of interest. Examples are variants of factor or large Bayesian VAR models (see Forni et al., 2000; Stock and Watson, 2002; Bańbura et al., 2010; Giannone et al., 2015; Huber and Feldkircher, 2019; Carriero et al., 2019). In addition, forecasters rely on methods to account for nonlinear dynamics and structural breaks. Key features identified to achieve gains in forecast performance, especially for predictive densities, are SVs and, to a lesser degree, TVPs (see, for instance, D'Agostino et al., 2013; Koop and Korobilis, 2013; Carriero et al., 2016; Aastveit et al., 2017; Feldkircher et al., 2017; Huber et al., 2020). For the purposes of this paper, we consider TVP-VARs with SV, equipped with a flexible shrinkage prior as in Feldkircher et al. (2017) and Huber et al. (in press). The baseline TVP-VAR-SV specification for an M-dimensional vector of endogenous variables y_t for t = 1, ..., T is

y_t = A_{1t} y_{t-1} + ... + A_{Pt} y_{t-P} + ε_t,  ε_t ~ N(0, Ω_t),

with M × M matrices of time-varying regression coefficients A_{pt} (for lags p = 1, ..., P), and a Gaussian error term ε_t with zero mean and time-varying M × M covariance matrix Ω_t. In the empirical application, we also include an intercept term that we omit here for brevity.
The covariance matrix can be decomposed as Ω_t = H_t Σ_t H_t', with H_t denoting a lower unitriangular matrix, and Σ_t = diag(σ_{1t}, ..., σ_{Mt}). The model can be recast in triangular regression form using A_t = (A_{1t}, ..., A_{Pt}) and x_t = (y_{t-1}', ..., y_{t-P}')':

y_t = A_t x_t + H_t η_t,  η_t ~ N(0, Σ_t).

Reparameterizing the model and augmenting the individual VAR equations with the preceding residuals allows the covariances to be treated as regression coefficients (see also Carriero et al., 2019), and permits equation-by-equation estimation. Here, we exploit the fact that H_t^{-1} ε_t = η_t. Denoting the ith row of A_t by A_{i•,t} and the jth element in the ith row of H_t by h_{ij,t}, the ith equation for i > 1 is

y_{it} = A_{i•,t} x_t + h_{i1,t} η_{1t} + ... + h_{i,i-1,t} η_{i-1,t} + η_{it}.

To incorporate high-dimensional information while keeping the computational burden such models entail feasible, we also consider variants augmented by principal components. This amounts to extracting principal components from the full information set other than the series treated as observed (depending on the model size) and including them in the vector y_t.
The states are assumed to follow a random walk process with zero mean Gaussian innovations and a diagonal innovation variance matrix V_i. Using a non-centered parameterization of the model (Frühwirth-Schnatter and Wagner, 2010) allows us to treat the square roots of the innovation variances in V_i as additional regressors,

β_{ij,t} = β_{ij} + √θ_{ij} β̃_{ij,t},

where β_{ij} denotes the constant part of the coefficient and β̃_{ij,t} follows a standard random walk initialized at zero. This feature allows for inducing shrinkage not only on the constant part of the parameters, but also on the amount of time variation in the states. We rely on equation-specific global-local shrinkage via the popular horseshoe prior (Carvalho et al., 2010). Note that many other shrinkage priors can be used on these coefficients (see Cadonna et al., 2019; Huber et al., in press). Popular choices also include the Bayesian Lasso (Park and Casella, 2008), the Normal-Gamma (Griffin and Brown, 2010) and the Dirichlet-Laplace prior (Bhattacharya et al., 2015). For a comparison of the empirical shrinkage properties of various priors in VAR models, see Cross et al. (in press).
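The mechanics of the non-centered parameterization can be illustrated with a short simulation. The sketch below is not from the paper; the function name and arguments are hypothetical, and it only shows how a single time-varying coefficient decomposes into a constant part plus a scaled standardized random walk.

```python
import numpy as np

def tvp_path_noncentered(T, beta_const, sqrt_theta, seed=0):
    """Time-varying coefficient in non-centered form:
    beta_t = beta_const + sqrt_theta * beta_tilde_t, with beta_tilde_t a
    standard random walk started at zero. Shrinking sqrt_theta towards
    zero switches off time variation, recovering a constant coefficient."""
    rng = np.random.default_rng(seed)
    beta_tilde = np.concatenate(([0.0], np.cumsum(rng.standard_normal(T - 1))))
    return beta_const + sqrt_theta * beta_tilde
```

Because sqrt_theta enters linearly, a shrinkage prior such as the horseshoe can pull it towards zero exactly like any other regression coefficient, which is the appeal of this parameterization.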
On the logarithms of the time-varying variances, we impose independent AR(1) processes,

log(σ²_{it}) = μ_i + φ_i (log(σ²_{i,t-1}) − μ_i) + ν_{it},  ν_{it} ~ N(0, ς²_i),

and use the prior setup and algorithm proposed in Kastner and Frühwirth-Schnatter (2014). All competing models feature stochastic volatility, motivated by the inferior forecast performance of constant-variance specifications identified in many studies (see, for instance, Clark, 2011).
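To make the stochastic volatility law of motion concrete, the following minimal simulation (an illustrative sketch, not the estimation algorithm; the initialization at the unconditional mean is a simplifying assumption) generates a log-variance path and the implied heteroskedastic errors.

```python
import numpy as np

def simulate_sv(T, mu, phi, sigma_nu, seed=0):
    """Simulate h_t = log(sigma2_t) from the AR(1) law of motion
    h_t = mu + phi * (h_{t-1} - mu) + nu_t, nu_t ~ N(0, sigma_nu^2),
    and return the path together with the heteroskedastic errors."""
    rng = np.random.default_rng(seed)
    h = np.empty(T)
    h[0] = mu  # start at the unconditional mean for simplicity
    for t in range(1, T):
        h[t] = mu + phi * (h[t - 1] - mu) + sigma_nu * rng.standard_normal()
    eps = np.exp(h / 2) * rng.standard_normal(T)  # eps_t ~ N(0, sigma2_t)
    return h, eps
```

Setting sigma_nu = 0 collapses the process to constant variance exp(mu), the homoskedastic special case the cited studies find to forecast poorly.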
Appendix A provides details on prior specification and the estimation algorithm. To obtain a reasonably long initial estimation period for the forecast exercise, we only rely on vintages published after 2003:12.

Data sources
Due to the mode of publication of the underlying Monthly Bulletin, the release schedule of the vintages in the EA is less strictly organized than for the case of the US data.
Consequently, individual series may exhibit a different number of missing values, depending on the date of the respective release, implying a "ragged edge" of data availability at the end of the sample (see also Jarociński and Lenza, 2018). In selected periods, this is also the case for FRED-MD. For instance, while oil prices may already be available for the full length of the vintage sample, inflation data may not yet have been released for the current month. This calls for a conditional nowcast or data imputation scheme, described in the next subsection.
Both datasets range until 2019:01. We preprocess the vintages (when applicable) to account for seasonality using the methods established by the US Census Bureau (X-13-ARIMA-SEATS, see Monsell, 2007; Sax and Eddelbuettel, 2018), and obtain data on interest rates from the ECB's financial market database. For pseudo out-of-sample simulations, we rely on truncated samples from the final vintage, while for real time simulations we use the data available at the specific months. In both cases, we evaluate the forecasts using the final available vintage.

Imputing missing values in real time
The Bayesian approach to estimation allows for a fully probabilistic treatment of missing endogenous variables.There are two relevant cases to be considered.First, a subset of values may be missing, while other variables are already available.Second, at a specific point in time, values for all series may be missing, which allows for sampling these values similarly to unconditional forecasts.
The fitted values at time t + 1 are A_{t+1} x_{t+1} = (μ_1', μ_2')', with μ_1 being a q × 1 vector corresponding to the missing values, and μ_2 the (M − q) × 1 vector related to the available series. Similarly, we partition the covariance matrix Σ_{t+1}, denoting the upper left q × q block by Σ_11, the upper right q × (M − q) block by Σ_12, the bottom left (M − q) × q block by Σ_21, and the bottom right (M − q) × (M − q) block by Σ_22.
The distribution of the missing values ỹ*_{t+1}, conditional on the realizations y_{2,t+1}, the endogenous variables up to t and all other model parameters, follows from the properties of the multivariate Gaussian distribution. In particular,

ỹ*_{t+1} | y_{2,t+1}, • ~ N(μ̃, Σ̃),  with μ̃ = μ_1 + Σ_12 Σ_22^{-1} (y_{2,t+1} − μ_2) and Σ̃ = Σ_11 − Σ_12 Σ_22^{-1} Σ_21.

Conditioning on the realized values alters both the mean μ̃ and variance Σ̃ of the distribution of the missing values, although the variance does not depend upon the particular value of the realizations. For the case where all values at a specific point in time are missing, the distribution of ỹ*_{t+1} is N(A_{t+1} x_{t+1}, Σ_{t+1}). Nowcasting the missing values in this way is related to data augmentation techniques, and moreover allows for drawing the model coefficients conditional on the synthetic data in each iteration of the algorithm. Consequently, posterior uncertainty surrounding both the missing data and the model parameters is accounted for during estimation.
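The partitioned-Gaussian conditioning step can be written compactly in code. This is a generic sketch of the textbook formulas, not the paper's sampler; the function name and the convention that the first q elements are missing are illustrative assumptions.

```python
import numpy as np

def conditional_moments(mu, Sigma, y_obs, q):
    """Moments of the first q (missing) elements of a Gaussian vector
    conditional on the remaining observed elements, via the standard
    partitioned-covariance formulas."""
    mu1, mu2 = mu[:q], mu[q:]
    S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
    S22 = Sigma[q:, q:]
    S22_inv = np.linalg.inv(S22)
    mu_cond = mu1 + S12 @ S22_inv @ (y_obs - mu2)        # conditional mean
    Sigma_cond = S11 - S12 @ S22_inv @ S12.T             # conditional variance
    return mu_cond, Sigma_cond
```

In each sweep of a Gibbs-type sampler, the missing values would then be drawn via `np.random.multivariate_normal(mu_cond, Sigma_cond)` before updating the remaining parameters.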

Model specification
This subsection describes the differently sized information sets used for estimation. We focus on forecasting inflation, a short-term interest rate and the unemployment rate as key variables of interest (for abbreviations, see Appendix B). We rely on three different model sizes, where the information sets are specified to approximately correspond to similar studies employing VAR models for forecasting, conditional on the availability of the respective series in all vintages:
• Small (M = 3 variables): The small model employs the focus variables. For the US we thus include CPIAUCSL as a monthly inflation indicator, FEDFUNDS as the key interest rate and UNRATE to measure the unemployment rate; for the EA, we use C OV for inflation, EUR3M as the short-term interest rate and UNETO for the unemployment rate.
• Medium (M = 6 variables): This specification incorporates the small information set and adds stock prices, industrial production and long-term interest rates.The variable codes for the additional quantities are S&P500, INDPRO, GS10 (US) and DJE50, XCONS, and 10Y (EA).
• Large (M = 11 variables): The large VAR features the information set of the small and medium models and further includes information on oil prices, exchange rates, loans, a money aggregate, and a term spread: OILPRICEx, EXUSUKx, BUSLOANS, M2SL, T10YFFM for the US; and OILBR, ERC0 BGR, LOANSEC U NG, M2 V NC, 10Y2Y in the case of the EA.
To circumvent scaling issues arising from the data and to improve stability of the models, we standardize each vintage prior to estimation to have zero mean and unit variance by subtracting the mean and dividing all series by their standard deviation.We denote the sample mean and standard deviation by m y and s y , respectively.
Augmented variants of the baseline specification that use the full available information sets are constructed as follows. Depending on the respective model size, we extract five principal components from the remaining (demeaned and standardized) series and include them in the vector of endogenous variables y_t. Varying the number of principal components does not significantly alter the forecasting results. Rather than taking a fully Bayesian approach, we rely on this approximation to reduce the computational burden. This implies that the largest model features M = 16 endogenous variables. We use P = 2 lags.
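A standard way to obtain such factor estimates is via a singular value decomposition of the standardized panel. The sketch below is a generic implementation of this common approach, not the paper's code; the function name is illustrative.

```python
import numpy as np

def principal_components(X, k=5):
    """Extract the first k principal components from a T x N panel via
    SVD of the standardized data; the resulting T x k factors can then
    augment the vector of endogenous variables."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # demean, unit variance
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    return U[:, :k] * s[:k]                    # T x k factor estimates
```

Stacking these k factors below the observed focus variables yields the augmented y_t used in the principal-component variants of the models.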

FORECASTS UNDER REAL TIME CONDITIONS
One of the main questions this paper raises is whether pseudo out-of-sample forecasts suffice to establish a clear ranking in terms of predictive performance that also applies in the real time context. We assess both moments of the predictive distributions and evaluate point and density forecasts. Details on the employed metrics are provided in Appendix A.

Overall forecast performance
To gauge the overall forecast performance of the competing models in real time and pseudo out-of-sample designs, we rely on cumulative joint log predictive scores (LPS) over the full holdout. Higher scores indicate superior performance, and allow for constructing a ranking of the models at a monthly frequency. We obtain these ranks for the real time and pseudo out-of-sample exercises, and study their correlation over time to see whether they agree on the ordering of the models in terms of relative forecast performance.
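The comparison of the two rankings boils down to a rank correlation between two orderings of the same model set. As a self-contained illustration (the function name is ours; in practice one would use a library routine such as `scipy.stats.kendalltau`), the coefficient can be computed directly from concordant and discordant pairs:

```python
import numpy as np
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two model rankings: concordant minus
    discordant pairs, normalized by the number of pairs (no ties)."""
    n = len(rank_a)
    s = sum(np.sign((rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]))
            for i, j in combinations(range(n), 2))
    return 2.0 * s / (n * (n - 1))
```

Feeding in the real time and pseudo out-of-sample rankings at each month of the holdout yields the time-varying coefficient discussed below: +1 for identical orderings, -1 for fully reversed ones.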
As a summary statistic, we calculate Kendall's rank correlation coefficient τ for all holdout periods. Zooming in on the forecasts for the US, we detect a high concordance of real time and pseudo out-of-sample performance rankings. The rank correlation coefficient is close to one, especially after 2010. Depending on the forecast horizon, the measure detects subtle differences over time. For one-month ahead forecasts, we observe that the relative ordering seems to change between the two recessions in the holdout, and during the Great Recession. These changes are even more pronounced for the one-quarter ahead forecasts, with Kendall's τ indicating values close to zero during and just after the Great Recession.
Interestingly, for one-year ahead forecasts, differences in the ranking of the models occur mainly between the two recessions.
For the EA, using real time information appreciably changes the relative performance ordering of the competing specifications, especially at short forecast horizons. At the one-month ahead horizon, we observe periods over the full holdout where the model ordering between real time and pseudo out-of-sample information sets is reversed, indicated by negative values of Kendall's τ. During and between the two recessions, the correlation fluctuates around zero. For the one-quarter ahead horizon, the rankings tend to agree with each other early in the holdout. After the European debt crisis, the rank correlation coefficient turns negative, indicating that real time information and pseudo out-of-sample rankings diverge. By contrast, for one-year ahead forecasts, real time and pseudo out-of-sample designs suggest similar rankings of relative performance, with correlations close to one apart from a brief period early in the holdout.
We proceed with investigating the relative forecast performance of the competing specifications in real time and based on the pseudo out-of-sample simulations. Figure 2 shows the ranking over time, featuring the two best and worst specifications at the end of the holdout for visualization purposes. Starting with the US, panel (a) indicates that small and medium models featuring TVPs without principal components perform well for most of the holdout sample at the one-month and one-quarter ahead horizons. Specifications of the baseline model augmented by principal components occupy the lowest ranks. As suggested in the context of our discussion of rank correlations, the best and worst performing models tend to be similar for both information sets in the case of the US. Some differences occur for one-year ahead forecasts. First, we find that the large model featuring TVPs but no principal components outperforms all other specifications for most of the holdout period.
Second, models that perform relatively poorly for shorter horizons exhibit significant gains in the Great Recession.
Considering the results for the EA in Fig. 2(b), while the large model featuring TVPs performs best for most of the holdout at the one-month and one-quarter ahead horizons in real time, smaller models with constant parameters and without principal components dominate all other specifications in the pseudo out-of-sample exercise. This finding translates into values of the rank correlation coefficient close to zero or even negative, as described previously. A striking example is provided by the small constant parameter model without principal components, which performs best for one-month ahead forecasts in pseudo out-of-sample simulations, but is second to last when using real time information. For one-year ahead forecasts, a different picture emerges. The larger constant parameter models without principal components perform best, while the models featuring TVPs are ordered last for most of the holdout in both real time and pseudo out-of-sample contexts.
The final part of this subsection assesses in detail how forecast performance differs across models when adopting a real time versus a pseudo out-of-sample simulation. For this purpose, each real time model is benchmarked against its complementary specification estimated using the pseudo out-of-sample information set. Relative LPSs larger (smaller) than zero indicate that predictive accuracy based on the pseudo out-of-sample information set is superior (inferior). These relative differences can be interpreted as measures capturing the distance between real time and pseudo out-of-sample forecasts, and do not necessarily indicate superior forecast performance, since each real time model is benchmarked against its pseudo out-of-sample counterpart.
Figure 3 shows the resulting relative cumulative LPSs over time. We find that producing forecasts at the one-month and one-quarter ahead horizons using real time information is substantially harder for both the US and the EA, indicated by negative relative joint LPSs for all models considered. At the one-month ahead horizon, the distance between the performance metrics in the real time and pseudo out-of-sample exercises is minimal for medium and large models with TVPs.
Reconsidering Fig. 2, this implies that such models can be considered comparatively robust to data revisions and imputations, and they tend to perform well in both forecast evaluation contexts. Interestingly, we do not observe clear breaks in relative real time and pseudo out-of-sample performance, but rather consistent differences over time. For one-quarter ahead forecasts, differences are smallest for smaller constant parameter models, and substantial especially for principal component augmented variants with TVPs. At the one-year ahead horizon, we find that some models estimated using real time data outperform their pseudo out-of-sample counterparts, with small to medium constant parameter models showing the most pronounced gains. In contrast to the shorter forecast horizons, clear differences in relative performance emerge during the Great Recession.
A clearer pattern in terms of relative performance using real time and pseudo out-of-sample data is visible for the EA in Fig. 3(b). In particular, for all considered forecast horizons, the constant parameter models show the largest differences between the two forecast evaluation designs. While most specifications estimated using real time data consistently perform worse, we observe deteriorating relative performance measures for one-month and one-quarter ahead forecasts especially during the Great Recession. Similar to the US, the picture is different for one-year ahead forecasts. Again, there are several TVP real time models that outperform their pseudo out-of-sample counterparts.
This discussion of overall forecast performance can be summarized by noting three key observations. First, and perhaps unsurprisingly, producing accurate forecasts in real time is substantially harder than relying on a pseudo out-of-sample forecast exercise. This is mainly the case at short forecast horizons, and differs for longer horizons. Second, real time and pseudo out-of-sample information sets do not necessarily result in the same relative performance ordering among models. This appears to be a significant problem in the case of the EA. Finally, differences occur both in terms of the specifics of the dataset and release schedule of the vintages, and in terms of which forecast horizon is considered.

Data vintages and predictions
Before turning to a more detailed analysis of differences between forecasts in real time and pseudo out-of-sample exercises for different variables and addressing point and density forecasts, we pick the small model with time-varying parameters but without principal components as an example to demonstrate both data features across the different vintages and the resulting predictive distributions. In Fig. 5(a), it is clearly visible that taking a real time perspective yields wider credible sets for the predictive distribution for the US, originating from the need to nowcast some of the involved quantities. Interestingly, even though interest rates are less affected by data revisions, we also detect differences in the predictive distribution stemming from missing values in the other variables. In general, the benchmark model performs well for forecasting, with the predictive intervals covering the realized series in most cases.
Exceptions are various months during recessions, where some values lie outside the credible set.
For the EA (see Fig. 5(b)), a key observation again is that the posterior distributions of predictions based on real time data are wider than for pseudo out-of-sample forecasts for all focus variables, even more so than for the US. This increased uncertainty surrounding forecasts stems from both the imputed values and data revisions. While the forecasts track the actual evolution of interest rates satisfactorily, we observe that some peaks and troughs of inflation lie outside the bounds of the credible set. An interesting finding for unemployment is that the model consistently predicts lower unemployment rates in real time during the Great Recession, while for the pseudo out-of-sample exercise, the posterior tracks the evolution of the realized series more accurately. The same is visible during the European debt crisis, albeit to a lesser extent.

Forecasts for the United States
In Sections 4.3 and 4.4, we provide a more detailed discussion of differences in forecasts when adopting a real time perspective, individually for the US and the EA. In particular, we assess specific findings for point and density forecasts across focus variables and horizons.
Additional results to identify the best performing models and differences between real time and pseudo out-of-sample contexts are provided in Appendix C.
Figure 6 shows Kendall's rank correlation coefficient τ for the forecast performance at different horizons and across variable types based on both point and density forecasts.
For point forecasts, we assign ranks based on minimal cumulative absolute forecast errors (FEs), while ranks for predictive densities are constructed from cumulative marginal LPSs in descending order. A striking observation is that rank correlations for density forecasts and point forecasts exhibit different patterns over time across most variable types and forecast horizons.
At the one-month ahead horizon, we find that the relative performance in terms of density forecasts is similar in real time and pseudo out-of-sample contexts for interest rates and inflation, with the rank correlation coefficient close to one for most of the holdout.
Here, especially models with TVPs appear to perform well. For unemployment, there is essentially no correlation in rank orders until the Great Recession. During the recovery period, we observe a slightly higher agreement in terms of relative forecast performance for density forecasts. However, this robustness to data revisions and imputations fades slowly in the later part of the holdout, fluctuating around 0.3 at the end of the considered period.
In a real time context, introducing additional information via principal components tends to pay off.
Conversely, for point forecasts, we find almost no association in the rank ordering until the Great Recession for all variables. Afterwards, coinciding with the period when the zero lower bound was reached, the relative orderings tend to agree for interest rates, and to a slightly lesser extent for inflation. The rank correlation coefficients over time visibly show comovement for point and density forecasts of unemployment, albeit the correlation is higher for point forecasts. It is noteworthy that models that perform well for density forecasts do not necessarily perform well for point forecasts, especially for inflation.
Interestingly, the large constant parameter model featuring principal components produces accurate point and density forecasts in real time and pseudo out-of-sample contexts for unemployment.
For one-quarter ahead forecasts of interest rates, we find that rank orderings consistently agree for point and density forecasts, although we observe a brief period of lower correlations just after the Great Recession in terms of point forecasts. The relative performance for point forecasts of inflation is rather stable and strongly positively correlated across real time and pseudo out-of-sample exercises. Conversely, we find that although the performance ordering agrees early in the sample for one-quarter ahead density forecasts of inflation, the rank correlation coefficient is zero during the Great Recession. After that, we again observe a clearly positive correlation between real time and pseudo out-of-sample relative performance rankings. Here, TVPs seem to be crucial for providing accurate density forecasts, while the best point forecasts are achieved by small scale constant parameter models.
Turning to one-quarter ahead forecasts of unemployment, we observe fluctuating correlations between almost zero and 0.8, with point and density rank correlations exhibiting a substantial degree of comovement. This relationship decouples during the Great Recession, with high correlations in terms of densities, and zero association between real time and pseudo out-of-sample rank orderings for point forecasts afterwards. Larger TVP specifications without principal components appear to produce the best point and density forecasts for both real time and pseudo out-of-sample simulations, on average.
One-year ahead forecasts appear to be remarkably robust in terms of relative performance orderings. For point forecasts, the rank correlation coefficient is close to one for all three focus variables for most of the holdout, indicating that real time and pseudo out-of-sample exercises produce almost identical rankings of model performance. For density forecasts, we observe a similar situation, although we identify lengthy periods before 2010 with correlations around 0.5, especially for the interest rate and inflation. In general, we find that constant parameter models produce superior forecasts at longer horizons, except for density forecasts of inflation.

Forecasts for the Euro Area
Variable specific differences in the relative orderings of point and density forecast performance for the EA are shown in Fig. 7. As suggested in Section 4.1, taking a real time perspective is even more consequential for relative forecast performance for the EA than for the US. This observation holds for both point and density forecasts.
Starting with the one-month ahead horizon, point and density forecasts for interest rates exhibit a similar ordering for real time and pseudo out-of-sample simulations until the Great Recession. Overall, TVPs appear to improve forecast accuracy. In the period between the two recessions, we observe negative rank correlations, indicating that using real time information reverses the ordering of the considered models in terms of their relative forecast accuracy. In the period after the European debt crisis, when the zero lower bound was reached, density forecasts are again similar for real time and pseudo out-of-sample information, while this is not the case for point forecasts. This is mainly due to the relatively poor performance of smaller constant parameter models augmented with principal components in the pseudo out-of-sample simulation, specifications that appear to perform satisfactorily in real time.
A different picture emerges for short-horizon inflation forecasts, with no clear patterns in terms of rank correlation, albeit changes in relative performance orderings appear to occur especially during and between the two recessions. While larger models with TVPs and without principal components perform well for density forecasts, small to medium size models with constant parameters indicate superior performance for point forecasts. Interest rate density forecasts for the one-quarter horizon show similar concordance as in the one-month ahead case, again with gains for models featuring TVPs. In contrast to the shorter horizon, point forecasts mostly agree on model ordering. Principal components seem to improve forecast accuracy in both the TVP and constant parameter cases. For inflation, we observe higher correlations especially after the Great Recession, indicating that pseudo out-of-sample exercises establish a similar model ordering when compared to relying on real time information. Here, TVP models are superior for density forecasts, while constant parameter models are superior for point predictions.
Interestingly, for unemployment, correlation measures show a similar path as for the US. Early in the holdout, the measure fluctuates substantially, increasing during the Great Recession. This implies that for density forecasts, both real time and pseudo out-of-sample simulations produce similar orderings of relative performance. The small constant parameter model featuring principal components performs relatively similarly in both evaluation contexts, while TVP models are ranked last in both cases. Point forecast correlations deteriorate afterwards, and are close to zero or even negative after the first recession.
Long-horizon forecasts exhibit different patterns for rank correlations. For interest rates, consequences for the relative performance orderings derived from real time and pseudo out-of-sample simulations are present but muted, indicated by a coefficient that fluctuates just above 0.5 for most of the holdout. TVPs play an important role for point and density forecasts using both real time and truncated vintage data. Inflation forecast performance orderings are discordant for point forecasts early in the sample, but tend to agree on the rank order after the Great Recession. Small scale constant parameter models featuring principal components perform well. Density forecasts, on the other hand, tend to be ordered similarly early in the sample. This changes after the Great Recession, with negative τ suggesting inverse performance orderings of the competing models. At the end of the holdout sample, the association between performance orderings is close to zero. The best performing model in real time here is the large TVP-VAR without principal components, while the medium scale constant parameter model works best in the pseudo out-of-sample exercise.
For unemployment, we find consistently high values of correlations throughout the holdout period in terms of density forecasts. This implies that pseudo out-of-sample evaluation is sufficient to establish a useful ordering of model performance. Regarding point forecasts, we observe some periods until 2010 where substantial rank changes occur.
From 2010 onwards, Kendall's rank correlation is close to one, again indicating that relying on real time information does not affect relative performance orderings. Here, constant parameter specifications appear to be superior for point and density forecasts in both information sets.
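The rank correlation measure underlying these comparisons can be illustrated with a minimal, self-contained sketch. The rankings below are hypothetical and only serve to show how identical, partially swapped, and fully reversed model orderings map to values of Kendall's τ:

```python
from itertools import combinations

def kendall_tau(ranks_a, ranks_b):
    """Kendall's tau-a: share of concordant minus discordant pairs
    between two rankings of the same set of models (no ties)."""
    assert len(ranks_a) == len(ranks_b)
    concordant = discordant = 0
    for i, j in combinations(range(len(ranks_a)), 2):
        s = (ranks_a[i] - ranks_a[j]) * (ranks_b[i] - ranks_b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(ranks_a) * (len(ranks_a) - 1) / 2
    return (concordant - discordant) / n_pairs

# hypothetical model ranks under real time vs. pseudo out-of-sample evaluation
real_time = [1, 2, 3, 4, 5]
pseudo_oos = [2, 1, 3, 4, 5]                      # one adjacent swap
print(kendall_tau(real_time, pseudo_oos))         # 0.8
print(kendall_tau(real_time, real_time))          # identical orderings -> 1.0
print(kendall_tau(real_time, real_time[::-1]))    # fully reversed -> -1.0
```

A coefficient near one thus signals that a pseudo out-of-sample exercise would have selected essentially the same model as a real time evaluation, while negative values indicate a reversed ordering.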

CLOSING REMARKS
In this paper, we systematically assess differences in forecast performance between a set of differently sized models using pseudo out-of-sample versus real time simulations. We rely on constant and TVP-VAR models with SV, equipped with a global-local shrinkage prior.
We also consider variants augmented by principal components to capture high-dimensional information from available datasets, and discuss imputing missing values. Our results suggest differences in the relative ordering of model performance for point and density forecasts. No clearly superior specification for the US or the EA across variable types and forecast horizons can be identified, although larger models featuring TVPs appear to be affected the least by missing values and data revisions. This finding suggests that pseudo out-of-sample simulations are insufficient to establish a clear ordering of relative model performance, and attention in the development of new methods should always be paid to real time features of data in order to be of value for forecasters in central banks and governments.

A.1. The horseshoe prior
For simplicity, we construct a 2K_i × 1-dimensional vector b_i containing the constant part of the coefficients, covariances, and the associated state innovation variances for equation i of the VAR, and denote the jth element of this vector by b_ij (j = 1, ..., 2K_i). As in Huber et al. (in press), we establish the horseshoe prior of Carvalho et al. (2010) based on the auxiliary variable representation in Makalic and Schmidt (2015),

b_ij | λ_i, ψ_ij ∼ N(0, λ_i ψ_ij),  λ_i | ϕ_i ∼ G⁻¹(1/2, 1/ϕ_i),  ψ_ij | ζ_ij ∼ G⁻¹(1/2, 1/ζ_ij),

with G⁻¹ referring to the inverse Gamma distribution, λ_i denoting the global and ψ_ij the local shrinkage parameters. The auxiliary variables are assigned the priors ϕ_i ∼ G⁻¹(1/2, 1) and ζ_ij ∼ G⁻¹(1/2, 1).
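As a minimal illustration of this auxiliary representation, the following Python sketch simulates coefficients from the horseshoe hierarchy. The function names are ours, and the scalar setup abstracts from the equation-by-equation VAR structure; it is a sketch of the prior, not the authors' sampler:

```python
import math
import random

def inv_gamma(a, b, rng=random):
    """Draw from an inverse-Gamma(a, b): reciprocal of a Gamma draw
    with shape a and rate b (gammavariate uses shape/scale)."""
    return 1.0 / rng.gammavariate(a, 1.0 / b)

def draw_horseshoe_prior(n_coef, rng=random):
    """Simulate n_coef coefficients from the horseshoe prior in its
    auxiliary inverse-Gamma form (Makalic and Schmidt, 2015)."""
    phi = inv_gamma(0.5, 1.0, rng)             # auxiliary for global scale
    lam = inv_gamma(0.5, 1.0 / phi, rng)       # global shrinkage parameter
    coefs = []
    for _ in range(n_coef):
        zeta = inv_gamma(0.5, 1.0, rng)        # auxiliary for local scale
        psi = inv_gamma(0.5, 1.0 / zeta, rng)  # local shrinkage parameter
        coefs.append(rng.gauss(0.0, math.sqrt(lam * psi)))
    return coefs

random.seed(1)
draws = draw_horseshoe_prior(5)
print(len(draws))  # 5
```

The heavy-tailed inverse-Gamma mixing reproduces the characteristic horseshoe behavior: most simulated coefficients are shrunk toward zero, while occasional draws remain large.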

A.2. Priors for stochastic volatilities
For the stochastic volatilities, we rely on the R-package stochvol and employ the priors proposed in Kastner and Frühwirth-Schnatter (2014). Specifically, we use a Gaussian prior for the level of the log-volatilities, µ_i ∼ N(0, 100). For the autoregressive coefficient φ_i ∈ (−1, 1), a transformed Beta distribution is used, (φ_i + 1)/2 ∼ B(25, 5), while a Gaussian prior on the signed square root of the innovation variances is employed, ±√ς_i ∼ N(0, 1). The prior on the initial state is based on the stationary distribution of the process, h_{i0} ∼ N(µ_i, ς_i/(1 − φ_i²)).

A.3. Posterior simulation and algorithm

1. Given a draw of the full history of the time-varying part of the model coefficients and the stochastic volatilities, the VAR coefficients, covariances and associated state innovation variances in b_i can be drawn jointly (on an equation-by-equation basis) from their multivariate Gaussian conditional posterior distribution with standard moments for linear regression models (see, for instance, Koop, 2003). For the exact formulae of the moments, refer to Feldkircher et al. (2017).
2. Conditional on all other model parameters, a forward-filtering backward-sampling (FFBS; Carter and Kohn, 1994; Frühwirth-Schnatter, 1994) algorithm is employed, again on an equation-by-equation basis, for drawing the full history of the TVPs. This step is omitted for the constant parameter models.
3. The posteriors associated with the horseshoe prior established in Appendix A.1 are, for the global and local shrinkage parameters,

λ_i | • ∼ G⁻¹((2K_i + 1)/2, 1/ϕ_i + (1/2) Σ_{j=1}^{2K_i} b_ij²/ψ_ij),  ψ_ij | • ∼ G⁻¹(1, 1/ζ_ij + b_ij²/(2λ_i)).

The conditional posterior distributions for the auxiliary variables are

ϕ_i | • ∼ G⁻¹(1, 1 + 1/λ_i),  ζ_ij | • ∼ G⁻¹(1, 1 + 1/ψ_ij).

4. Sampling of the stochastic volatilities and associated parameters is carried out using the algorithm in Kastner and Frühwirth-Schnatter (2014) and its implementation in the R-package stochvol.
5. Depending on the forecast exercise, we impute missing values in the endogenous variables using the quantities discussed in Section 3.2.
We use a total number of 6,000 draws and discard the initial 2,000 draws as burn-in. We keep every second of the remaining draws to obtain a set of S = 2,000 posterior draws. These draws can be used for calculating the predictive distribution ex post, and allow for simulating point and density forecasts that are used to obtain the performance metrics.
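To make the stochastic volatility component used in steps 2 and 4 concrete, the following sketch (our own illustrative code, independent of the R-package stochvol) draws the SV parameters from the priors of Appendix A.2 and simulates a log-volatility path, initializing from the stationary AR(1) distribution:

```python
import math
import random

def simulate_sv_prior_path(T, rng=random):
    """Draw SV parameters from the priors in Kastner and
    Fruhwirth-Schnatter (2014) and simulate a log-volatility path."""
    mu = rng.gauss(0.0, math.sqrt(100.0))       # level: mu ~ N(0, 100)
    phi = 2.0 * rng.betavariate(25, 5) - 1.0    # persistence: (phi+1)/2 ~ B(25, 5)
    sigma2 = rng.gauss(0.0, 1.0) ** 2           # variance: +-sqrt(sigma2) ~ N(0, 1)
    # initial state from the stationary distribution of the AR(1) process
    h = rng.gauss(mu, math.sqrt(sigma2 / (1.0 - phi ** 2)))
    path = [h]
    for _ in range(T - 1):
        h = mu + phi * (h - mu) + rng.gauss(0.0, math.sqrt(sigma2))
        path.append(h)
    return mu, phi, path

random.seed(0)
mu, phi, path = simulate_sv_prior_path(100)
print(-1.0 < phi < 1.0)  # True
print(len(path))         # 100
```

The B(25, 5) prior concentrates φ_i near high positive values, reflecting the persistent volatility clustering typical of macroeconomic and financial series.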

A.4. Producing and evaluating forecasts
For calculating iterated h-step ahead forecasts, we use the VAR coefficients A_T and covariance matrix Ω_T at time T, let Y_T = (y_T′, y_{T−1}′, ..., y_{T−P+1}′)′ and ν_T = (ε_T′, 0_M′, ..., 0_M′)′, and denote the respective companion matrix and companion-form covariance matrix by Ã_T and Ω̃_T.
The moments of the distribution of the h-step ahead forecasts can be derived by observing

µ_{T+h} = Ã_T^h Y_T,  Σ_{T+h} = Σ_{i=0}^{h−1} Ã_T^i Ω̃_T (Ã_T^i)′.

The forecast y^(f)_{T+h} arises from a Gaussian distribution,

y^(f)_{T+h} ∼ N(μ̃_{T+h}, Σ̃_{T+h}),  (A.1)

with moments μ̃_{T+h} = µ_{T+h}^[1:M] and Σ̃_{T+h} = Σ_{T+h}^[1:M], whereby the superscript [1:M] indicates selecting the first M elements or the M × M block in the respective vector or matrix. Taking account of standardizing the series, the non-standardized forecast is y^(f)_{T+h} ⊙ s_y + m_y, where ⊙ denotes element-wise multiplication. We assess the performance of both point and density forecasts. For measuring the precision of first moment forecasts, we rely on the root mean squared error (RMSE), while second moment forecasts are evaluated by the log predictive likelihood (LPL) constructed from the full predictive distribution.
To obtain a point forecast, we store draws from Eq. (A.1) for all iterations of the algorithm and refer to the mean of this distribution for series i by ŷ^(f)_{i,t+h}. Let T_H indicate the length of the holdout sample and y_{i,t+h} the corresponding realized values; RMSEs for the ith variable (for i = 1, ..., M) are then defined as

RMSE_{i,h} = √( (1/T_H) Σ_t (ŷ^(f)_{i,t+h} − y_{i,t+h})² ).

This measure captures the average deviation of the forecast from realizations over the holdout period. Absolute forecast errors (FEs) by period are closely related, and are calculated as

FE_{i,t+h} = |ŷ^(f)_{i,t+h} − y_{i,t+h}|.

A key aspect in forecasting, however, is how precise the forecasts are, which is also reflected in the second moment of the predictive distribution. The marginal predictive likelihood (MPL) for each period of the holdout at time t for the h-step ahead forecast of variable i is given by

MPL_{i,t+h} = f_N(y_{i,t+h}; s_{yi} μ̃_{i,t+h} + m_{yi}, s_{yi}² Σ̃_{ii,t+h}),  (A.2)

where f_N denotes the probability density function of the normal distribution with the realized value being evaluated in the predictive distribution. Note that if the data are not standardized a priori, m_{yi} = 0, s_{yi} = 1, and the moments in Eq. (A.2) correspond to Eq. (A.1). This distribution can be evaluated for each iteration of the algorithm, taking into account both intrinsic uncertainty from the error term and estimation uncertainty for parameters. Following Geweke and Amisano (2010), the LPLs are defined as

LPL_{i,t+h} = log( (1/S) Σ_{s=1}^S MPL^(s)_{i,t+h} ).

Here, s = 1, ..., S refers to the iterations of the sampler with a total number of S draws from the posterior distribution. To assess overall forecast performance of a model, one may also evaluate the joint predictive likelihood (JPL):

JPL_{t+h} = f_N(y_{t+h}; S_y μ̃_{t+h} + m_y, S̃_{t+h}).  (A.3)

The non-normalized predictive covariance matrix is constructed using a Cholesky factorization of Σ̃_{t+h} = L_{t+h} L_{t+h}′ and the normalizing standard deviations in S_y = diag(s_{y1}, ..., s_{yM}) to yield S̃_{t+h} = (S_y L_{t+h})(S_y L_{t+h})′. Note that we evaluate the joint predictive likelihood solely for the focus variables, which amounts to subsetting the vectors and matrices in Eq. (A.3) to the corresponding elements. The joint log predictive likelihood is

JLPL_{t+h} = log( (1/S) Σ_{s=1}^S JPL^(s)_{t+h} ).
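These evaluation metrics reduce to a few lines of code. The sketch below (illustrative only, with hypothetical inputs and a univariate Gaussian predictive density) computes the RMSE over a holdout and a log predictive likelihood averaged over posterior draws:

```python
import math

def rmse(forecasts, realized):
    """Root mean squared error of point forecasts over the holdout sample."""
    n = len(forecasts)
    return math.sqrt(sum((f - y) ** 2 for f, y in zip(forecasts, realized)) / n)

def log_predictive_likelihood(draw_means, draw_vars, realized):
    """LPL: log of the average Gaussian predictive density across S
    posterior draws, each evaluated at the realized value."""
    S = len(draw_means)
    densities = [
        math.exp(-0.5 * (realized - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
        for m, v in zip(draw_means, draw_vars)
    ]
    return math.log(sum(densities) / S)

print(round(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), 4))  # 1.1547
```

Averaging the density over draws before taking logs, as in Geweke and Amisano (2010), ensures that parameter uncertainty is reflected in the predictive score.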

B. DATA APPENDIX
We obtain data for the US and the EA to reflect similar information sets. All series (when applicable) are seasonally adjusted; see also Section 3. To avoid confounding information sets, all adjustments such as for seasonality, transformations to stationarity and standardizing the data are applied to the vintages independently. The transformations of a series x_t for obtaining stationarity are: (1) no transformation, (2) ∆x_t, (5) ∆log(x_t), (6) ∆²log(x_t), with ∆^i indicating ith differences.
The transformations for the US dataset are based on suggestions in McCracken and Ng (2016), while for the EA we use corresponding transformations for the respective series.
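A sketch of how these transformation codes might be applied to a raw series (our own illustrative helper, not the authors' code; the code numbers follow the FRED-MD convention quoted above):

```python
import math

def transform(series, tc):
    """Apply a stationarity transformation code to a series:
    1 = none, 2 = first difference, 5 = first log difference,
    6 = second log difference."""
    if tc == 1:
        return list(series)
    if tc == 2:
        return [b - a for a, b in zip(series, series[1:])]
    if tc == 5:
        logs = [math.log(x) for x in series]
        return [b - a for a, b in zip(logs, logs[1:])]
    if tc == 6:
        d = transform(series, 5)
        return [b - a for a, b in zip(d, d[1:])]
    raise ValueError(f"unknown transformation code {tc}")

print(transform([1.0, 2.0, 4.0], 5))  # two log-differences, both log(2)
```

Applying such a helper vintage by vintage, rather than once to the final data, is what keeps the real time information sets unconfounded.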

B.1. FRED-MD data
The Federal Reserve Economic Data (FRED) is maintained by the Federal Reserve Bank of St. Louis and available for download at research.stlouisfed.org. All series are on a monthly frequency, with some of them starting in 1959:01. Data vintages are available from 1999:08. The final vintage contains 129 series; we preselect the 99 series used for this paper, as mentioned in Section 3, based on consistent availability of all historical data. Moreover, some variables are dropped due to substantial publication lags exceeding six months (see also McCracken and Ng, 2016). Base year changes are accounted for by normalizing the respective series to a unique basis.

B.2. EA-RTD data
The Euro Area Real Time Database (EA-RTD) is maintained by the European Central Bank based on information from its Monthly Bulletin, and available for download at sdw.ecb.europa.eu. A thorough description of the dataset is provided by Giannone et al. (2012). All 165 series are on a monthly frequency, with consistent coverage starting from 1999:01, while vintages are available from 2001:01. The dataset contains a substantial number of variables in different units or transformations (e.g. unemployment numbers and rates), alongside series with substantial publication lags exceeding six months. We preselect the 94 variables used for the forecasting application due to these reasons.

C. ADDITIONAL RESULTS
This appendix provides additional information on variable specific forecast performance measures to be considered in the context of Sections 4.3 and 4.4. In the figures below, each real time model is benchmarked against its complementary specification estimated using the pseudo out-of-sample information set. The resulting relative differences and ratios can be interpreted as measures capturing the distance between real time and pseudo out-of-sample forecasts, and do not necessarily indicate superior forecast performance.
Corresponding average measures over the full holdout are displayed in Tabs. C.1 (US) and C.2 (EA). The tables feature average LPSs and RMSEs for the real time information set with the rank at the end of the holdout in parentheses in the first row per model. The second row indicates the difference (LPSs) and ratio (RMSEs) between real time and pseudo out-of-sample simulations within each model specification, with ranks for pseudo out-of-sample exercises in parentheses.

Fig. 1: Rank correlation of different specifications between real time and pseudo out-of-sample forecasts over the holdout. Note: Ranks are derived from cumulative joint log predictive scores. The grey shaded areas indicate recessions dated by the NBER Business Cycle Dating Committee (US) and the CEPR Euro Area Business Cycle Dating Committee (EA).

Figure 1 shows that real time and pseudo out-of-sample information sets produce different rankings in terms of model performance depending on the forecast horizon and, more importantly, on specific features of the data releases. Comparing the US and the EA, differences in the release schedule of real time data for the EA lead to substantially more disagreement between real time and pseudo out-of-sample simulations regarding the relative performance of the competing models.

Fig. 3: Relative cumulative LPSs of real time and pseudo out-of-sample information sets. Note: Each real time model is benchmarked against its complementary specification estimated using the pseudo out-of-sample information set. The grey shaded areas indicate recessions dated by the NBER Business Cycle Dating Committee (US) and the CEPR Euro Area Business Cycle Dating Committee (EA).

Fig. 4: Data vintages. Note: The black lines indicate the respective maximum and minimum value at each point in time across all vintages. For imputed data, we present the posterior median estimate of the benchmark specification. The grey shaded areas indicate recessions dated by the NBER Business Cycle Dating Committee (US) and the CEPR Euro Area Business Cycle Dating Committee (EA).

Figure 4 shows the full dataset of the focus variables over time, with the blue line marking the final vintage. To visualize the vast amount of data captured in all new releases and the imputed values, we proceed as follows. We use the posterior median of the nowcasted observations and calculate the minimum and maximum value of the respective series per period (black lines) across all vintages. The figure thus indicates both data revisions and uncertainty surrounding imputed values. We start with the US in panel (a). Since historical vintage data starts only in late 1998, data imputation only plays a role past this date. Interestingly, differences across vintages are also visible prior to this date, implying that earlier data are also subject to revisions. This notion is especially important for inflation. After 1998, the main differences originate from the nowcasting scheme, with the largest deviations observable during recessionary episodes for inflation and unemployment. Interest rates, on the other hand, are comparatively unaffected by revisions and data imputation. The same is true for the EA. Data revisions and imputations in the EA, shown in Fig. 4(b), play only a minor role for interest rates, evidenced by the narrow bounds surrounding

Fig. 6: Rank correlation of different specifications between real time and pseudo out-of-sample forecasts for the US. Note: Ranks are derived from cumulative log predictive scores (density forecasts) and cumulative absolute forecast errors (point forecasts). The grey shaded areas indicate recessions dated by the NBER Business Cycle Dating Committee.

Fig. 7: Rank correlation of different specifications between real time and pseudo out-of-sample forecasts for the EA. Note: Ranks are derived from cumulative log predictive scores (density forecasts) and cumulative absolute forecast errors (point forecasts). The grey shaded areas indicate recessions dated by the CEPR Euro Area Business Cycle Dating Committee.
Figures C.1 (US) and C.3 (EA) show the model ranks for point and density forecasts for real time and pseudo out-of-sample simulations over time. In Figs. C.2 (US) and C.4 (EA), each real time model is benchmarked against its complementary specification estimated using the pseudo out-of-sample information set.

Fig. C.2: Relative performance measures for real time and pseudo out-of-sample information sets for the US. Note: Each real time model is benchmarked against its complementary specification estimated using the pseudo out-of-sample information set (differences for density forecasts, ratios for point forecasts). The grey shaded areas indicate recessions dated by the NBER Business Cycle Dating Committee.
For the US, the Federal Reserve Bank of St. Louis maintains many macroeconomic and financial series on a monthly frequency starting in 1959:01, with monthly data vintages available from 1999:08. We preselect a set of 99 series with consistent availability of all historical data. For the EA, we rely on the real time database described in Giannone et al. (2012). The Monthly Bulletin provides the ECB Governing Council with the most recent macroeconomic and financial data available, and thus establishes a historical record of vintages. Since many of the variables of interest were only established after the inception of the Euro in 1999, consistent coverage can be achieved starting from 1999:01. We preselect a set of 94 relevant variables. The first available vintage was published in 2001:01.

Table B.1: Variables in the FRED-MD dataset. Notes: The dataset described in McCracken and Ng (2016) is available for download at research.stlouisfed.org (FRED-MD). Column Tc indicates the transformation of a series x_t for obtaining stationarity: (1) no transformation, (2) ∆x_t, (5) ∆log(x_t), (6) ∆²log(x_t), with ∆^i indicating ith differences. Columns Small, Medium and Large refer to different model sizes discussed in Section 3.
Table B.2: Variables in the EA-RTD dataset. Notes: The dataset described in Giannone et al. (2012) is available for download at sdw.ecb.europa.eu (EA-RTD). Column Tc indicates the transformation of a series x_t for obtaining stationarity: (1) no transformation, (2) ∆x_t, (5) ∆log(x_t), (6) ∆²log(x_t), with ∆^i indicating ith differences. Columns Small, Medium and Large refer to different model sizes discussed in Section 3.
Table C.1: Average marginal LPSs and RMSEs for the US.