Model instability in predictive exchange rate regressions

Abstract
In this paper, we aim to improve existing empirical exchange rate models by accounting for uncertainty with respect to the underlying structural representation. Within a flexible Bayesian framework, our modeling approach assumes that different regimes are characterized by commonly used structural exchange rate models, with transitions across regimes being driven by a Markov process. We assume a time-varying transition probability matrix with transition probabilities depending on a measure of the monetary policy stance of the central bank at home and in the USA. We apply this model to a set of eight exchange rates against the US dollar. In a forecasting exercise, we show that model evidence varies over time, and a model approach that takes this empirical evidence seriously yields more accurate density forecasts for most currency pairs considered.


Introduction
Since the end of the Bretton Woods system in 1971, economists have been confronted with the challenging issue of designing empirical models of bilateral exchange rates that are also useful for forecasting applications. In a seminal contribution, Meese and Rogoff (1983) provide some early evidence that exchange rates are difficult to predict, at least in the short run. Using a set of theoretical models in the spirit of Frankel (1979), Dornbusch (1976) and Hooper and Morton (1982) to guide the choice of covariates included in the forecasting regression, Meese and Rogoff (1983) find that a simple random walk benchmark is difficult to outperform for most major exchange rate pairs. One reason for the dismal performance of most empirical and structural models is that, within a standard asset pricing framework, highly persistent fundamentals combined with a discount factor near unity translate into highly persistent exchange rates. As a consequence, the random walk is a benchmark that is extremely hard to beat (see Engel and West, 2005).
Over the years, a plethora of alternative econometric techniques have emerged that provide more sophisticated means of analyzing exchange rate data and that have helped improve longer-term predictions. The literature on unit roots and cointegration, for example, opened the way for tools that explicitly discriminate between the short-term movements of a given currency pair and its long-run behavior. Mark (1995), for instance, applies an error correction model to a set of four exchange rates against the US dollar. Within this error correction framework, the exchange rate is assumed to eventually return to its long-run equilibrium value determined by a simple monetary model, with short-run fluctuations determined by lagged values of the exchange rate and of its fundamentals. The finding that exchange rates tend to be predictable in the medium and long run sparked a series of related contributions that corroborate this result for different periods and currency pairs (Groen, 2000; Mark and Sul, 2001; Rapach and Wohar, 2002).
More recently, several studies have emphasized the usefulness of accounting for non-linearities in the underlying econometric models to provide more precise exchange rate predictions (see, for example, Canova, 1993; Sarno et al., 2004; Mark, 2009; Byrne et al., 2016; Huber, 2016, 2017; Huber and Zörner, 2018). These non-linearities may relate to movements in the error variances of the models or to changes in the regression coefficients over time. Byrne et al. (2016) assess whether exchange rate predictions can be improved by using time-varying parameter models for a set of competing models, with the choice of variables guided by a set of theoretical models.
The majority of the above literature deals with the question of whether a given empirical model that is loosely based on an underlying structural model outperforms a set of competing models. However, another key source of non-linearities could stem from the fact that the underlying theoretical model changes over time, potentially jeopardizing the predictive fit of the econometric specification.¹ For instance, the recent success of Taylor rule based models (see Engel and West, 2006; Molodtsova et al., 2008; Molodtsova and Papell, 2009; Molodtsova et al., 2011) can be attributed to the fact that the central banks involved did actually follow a policy rule closely related to a Taylor rule. With short-term interest rates reaching the zero lower bound (ZLB) and central banks adopting unconventional monetary policy measures, however, the question arises whether a Taylor rule still proves to be an adequate exchange rate model. In fact, recent literature on non-linear Taylor rules suggests that during the ZLB period, Taylor rule based models lose their edge over simple random walk specifications (Byrne et al., 2016; Huber, 2017).
In this paper, we contribute to the literature by acknowledging this empirical evidence and proposing a modeling framework that is capable of handling model instability over time in a flexible manner. We allow for dynamic switching between regimes, each incorporating a different empirical exchange rate model. This is achieved by proposing a Markov switching (MS) regression model in which each regime is characterized by different covariates arising from structural exchange rate models. In contrast to the existing literature, which relies on dynamic Bayesian model averaging techniques, our approach is an integrated modeling device. Through the introduction of time-varying transition probabilities, it allows us to assess how the likelihood that a given model is adopted at each point in time depends on some signal variable. As signal variables, we adopt the (lagged) interest rates of the home and foreign country. This specification is motivated by the observation that Taylor rule fundamentals are good predictors during the Great Moderation (with policy rates significantly above zero), but are known for their weak performance after the Great Recession (characterized by policy rates close to zero).
We assess the merits of the proposed approach in a forecasting exercise for eight different exchange rates against the US dollar. By considering the resulting regime allocation and the transition probabilities, we examine whether structural models indeed tend to change and how this is related to movements in policy rates. The findings indicate that allowing for time-varying transition probabilities is a key feature, pointing towards a strong relationship between policy rates and the underlying transition distribution of the Markov process. In terms of forecasting, we find that our proposed model improves upon the random walk for selected currencies, both in terms of point and density predictions. The improvements for point forecasts are, however, muted. Comparing different model features reveals that a model whose predictors combine fundamentals from the various structural models, coupled with shrinkage priors and non-linearities (in the form of Markov switching), is also competitive.
The remainder of this paper is organized as follows. Section 2 discusses the four structural exchange rate models adopted, while Section 3 proposes the econometric framework. The empirical application is presented in Section 4. Finally, the last section summarizes and concludes the paper. A technical appendix provides details on the estimation algorithm adopted.

Theoretical exchange rate models
In this section, we briefly discuss the main theoretical underpinnings to be used to guide covariate inclusion in the empirical model as well as to structurally identify the different regimes considered in our non-linear regression framework.
The point of departure for the discussion is a set of macroeconomic and financial quantities stored in an $R$-dimensional vector $X_t$, with $i_{t-1}$ denoting the lagged short-term interest rate, $\pi_t$ inflation, $x_t$ the output gap, $m_t$ money supply, $y_t$ income and $p_t$ the price level, while the real exchange rate is denoted by $q_t$ and the nominal exchange rate by $e_t$.² The subsets of $X_t$, $X_{jt}$ ($j = 1, \ldots, 4$), represent the different structural models we describe next.

A taxonomy of selected models of exchange rate determination
In the following, we provide a brief taxonomy of the theoretical models considered, which, depending on the underlying theory, guide the specific partitions of $X_t$.
• Our starting point is the model based on Taylor rule fundamentals (see Molodtsova and Papell, 2009, for a recent forecasting study). This specification assumes that the set of predictors is given by $X_{1t}$ and thus includes the lagged short-term interest rate, inflation and the output gap of both the home and foreign country, as well as the real exchange rate. This model has proved successful in describing exchange rate movements, both in-sample (Engel and West, 2006) and out-of-sample (Molodtsova et al., 2008, 2011). However, one critical assumption of this model is that the central banks at home and abroad actively pursue a Taylor rule-type monetary policy strategy. Especially during the recent ZLB period, this assumption could be violated, effectively leading to an inferior model fit.
• The second model considered is the long-run monetary model. The monetary model assumes that the covariates are given by $X_{2t}$, which includes data on domestic and foreign money supply as well as the cross-country difference in income for a given income elasticity. As mentioned by Rapach and Wohar (2002), the long-run monetary model simply states that the price levels of the home and foreign country are determined by money supply and the level of production. Assuming purchasing power parity (PPP) and uncovered interest rate parity (UIP), one is able to relate changes in the exchange rate to the supply of and demand for money.
• Third, we consider a model based on PPP. This model uses $X_{3t}$, leading to a regression model that includes domestic and foreign price indices. PPP originates from the law of one price in goods markets, which in turn implies that the real exchange rate should revert to a long-run equilibrium level determined by relative prices. If this holds, the real exchange rate is a stationary process. However, Sarno (2005) highlights substantial persistence in real exchange rates. Convergence towards PPP is thus slow, and real exchange rates typically display pronounced deviations from their PPP-implied fundamentals in the short run.
• Finally, we also augment our forecasting regression with the UIP model. Using $X_{4t}$, this model simply establishes a relationship between the change in the exchange rate and the interest rate differential between home and abroad. Following Chinn (2006), UIP implies a positive one-to-one relationship between the interest rate differential and changes in the exchange rate. Empirically, however, a positive change in the interest rate differential may be followed by both an immediate and a persistent appreciation in the short run, implying that UIP does not hold at short horizons. Here, we follow Molodtsova and Papell (2009), who address this UIP puzzle by not placing any restrictions on the coefficients.³

All these models have been shown to possess some merit in terms of predictive power. However, several recent studies find remarkable heterogeneity with respect to the fundamental model adopted (see, inter alia, Wright, 2008; Beckmann and Schüssler, 2016; Byrne et al., 2018; Beckmann et al., 2018). In particular, Beckmann et al. (2018) argue that an investor is able to adjust his strategy by means of sequential learning, incorporating new information as it arrives at each point in time. The authors assume that this behavior is best captured by a dynamically changing set of fundamentals that depends on historical forecast performance.
We now turn to describing our model framework that allows for dealing with issues of model instability in an intuitive way.

Controlling for model instability in empirical exchange rate models
In this section, we propose a model that controls for dynamic model instability by specifying a non-linear econometric framework. After summarizing the model structure in Subsection 3.1, we highlight the prior setup adopted in Subsection 3.2.

A Markov switching model specification
We now turn to describing the proposed Markov switching model with time-varying transition probabilities (MS-TVP). The key feature of our proposed framework is that it allows for switching between the fundamentals implied by $K$ competing theoretical exchange rate models. We assume that exchange rate returns $\Delta e_t$ follow an MS-TVP model given by

$$\Delta e_t = X_{S_t,t-1}'\beta_{S_t} + \eta_t. \tag{3.1}$$

Hereby, $S_t \in \{1, \ldots, K\}$ follows a first-order Markov process, $\beta_k$ represents a vector of dimension $M_k$ that collects the state-specific coefficients of state $S_t = k$, $X_{k,t-1}$ denotes the corresponding vector of lagged fundamentals, and $\eta_t \sim \mathcal{N}(0, \sigma^2_{S_t})$ is a white noise shock with regime-specific variance $\sigma^2_{S_t}$. Note that the vectors $\beta_k$ may differ in dimension across states. We depart from the traditional literature on Markov switching models (see, among many others, Hamilton, 1994; Engel, 1994; Filardo, 1994; Amisano and Fagan, 2013; Kaufmann, 2015; Billio et al., 2016; Huber and Fischer, 2018; Casarin et al., 2018) by assuming that the regimes are characterized by competing structural exchange rate models, implying that different fundamentals enter the predictive exchange rate regression at different points in time.
In the spirit of Belmonte and Koop (2014) and Frühwirth-Schnatter (2006), we introduce a selection matrix $D_{S_t}$ that entails switching between $K$ alternative model specifications,

$$\Delta e_t = X_{t-1}' D_{S_t} \beta + \eta_t,$$

with $X_{t-1}$ denoting an $R$-dimensional vector of the full set of economic fundamentals.
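The selection matrix $D_k$ of state $S_t = k$ is an $R \times M$-dimensional matrix of binary indicators, where $\beta = (\beta_1', \ldots, \beta_K')'$ stacks all state-specific coefficients and $M = \sum_{k=1}^{K} M_k$. It allows for choosing $\beta_k$ and $X_{k,t-1}$ while zeroing out the elements in $\beta$ and $X_{t-1}$ associated with the remaining models. For instance, we effectively obtain the model based on Taylor rule fundamentals, characterized through $S_t = 1$, by choosing $D_1$ so that it maps $\beta_1$ onto the positions of the Taylor rule fundamentals in $X_{t-1}$ and is filled with zero blocks $0_{i \times j}$ elsewhere, whereby $0_{i \times j}$ is an $i \times j$-dimensional matrix of zeros. Multiplying $\beta$ from the left with $D_1$ thus returns $\beta_1$ on the positions of $X_{1,t-1}$ and zeros otherwise, so that $X_{t-1}' D_1 \beta = X_{1,t-1}'\beta_1$. From this discussion, it is clear that the matrix $D_{S_t}$ effectively controls the prevailing structural exchange rate model and the set of covariates to include in the state-specific regression. Notice that an MS kitchen-sink regression is obtained by defining $D_{S_t}$ in such a way that, in each state, all economic indicators are included at all points in time.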
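To make the role of the selection matrix concrete, the following minimal NumPy sketch builds binary matrices $D_k$ and checks numerically that $X_{t-1}' D_k \beta$ reduces to $X_{k,t-1}'\beta_k$. The layout of $X_{t-1}$ (which positions belong to which model), the number of states and the helper name `selection_matrix` are hypothetical choices for illustration, not the paper's actual data ordering. The third state mimics a kitchen-sink state that uses all fundamentals.

```python
import numpy as np


def selection_matrix(idx_k, offset_k, R, M):
    """Binary R x M matrix that places the block beta_k (which starts at position
    offset_k of the stacked vector beta) on the positions idx_k of X_{t-1}."""
    D = np.zeros((R, M))
    for m, r in enumerate(idx_k):
        D[r, offset_k + m] = 1.0
    return D


# Hypothetical layout: R = 6 fundamentals in X_{t-1}, K = 3 states.
# Model 1 uses entries 0, 1, 2; model 2 uses entries 0 and 3 (overlap is allowed);
# state 3 is a kitchen-sink state that uses all R entries.
R = 6
idx = [[0, 1, 2], [0, 3], list(range(R))]
M_k = [len(i) for i in idx]
M = sum(M_k)
offsets = np.concatenate(([0], np.cumsum(M_k)[:-1]))  # start of each beta_k in beta

D = [selection_matrix(idx[k], offsets[k], R, M) for k in range(len(idx))]

rng = np.random.default_rng(0)
x = rng.normal(size=R)      # X_{t-1}
beta = rng.normal(size=M)   # stacked (beta_1', ..., beta_K')'

for k in range(len(idx)):
    beta_k = beta[offsets[k]:offsets[k] + M_k[k]]
    # X_{t-1}' D_k beta equals X_{k,t-1}' beta_k: only model k's fundamentals enter
    assert np.isclose(x @ D[k] @ beta, x[idx[k]] @ beta_k)
```

Because only one $D_k$ is active at a time, overlapping fundamentals across models (such as interest rates entering both the Taylor rule and the UIP regressions) cause no conflict.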

Time-varying transition probabilities
Assuming constant transition probabilities is a standard (and potentially restrictive) assumption in Markov switching models (for economically motivated examples of relaxing it, see Filardo, 1994; Amisano and Fagan, 2013; Kaufmann, 2015). Both Amisano and Fagan (2013) and Kaufmann (2015) propose treating the transition distributions as dependent on additional covariates. Since our model features $K$ regimes, we follow Kaufmann (2015) and parameterize the transition probabilities by a multinomial logit specification. Given the forecasting evidence provided in the literature quoted above, we assume that the transition probabilities depend on a measure of the monetary policy stance, such as the policy rate. This captures the notion that, as policy rates approach the ZLB, a Taylor rule-based model might become inadequate and the likelihood of a regime shift could increase.
Let $S^T = (S_1, \ldots, S_T)$ denote the full history of the state vector. The multinomial likelihood $p(S^T \mid \gamma, Z^T)$ is built from multinomial logit transition probabilities with category-specific regression coefficients $\gamma_{kj} = (\gamma_{0,kj}, \ldots, \gamma_{N,kj})'$, collected in $\gamma$ for all $k$ and $j$. Moreover, we define $Z_t$ as an $N$-dimensional set of covariates, which collects the signal variables $z_t$ together with the indicators $I(S_{t-1} = k)$ for the previous state, whereby $z_t$ is a vector of covariates that determine the dynamics of the transition probabilities, while $I(\cdot)$ denotes an indicator function that equals one if its argument is true. Including the previous states as additional regressors in this way captures the first-order Markov structure. Moreover, $\gamma_{0,kj}$ represents the intercept of the reference state $S_{t-1} = K$ and thus captures the corresponding time-invariant state persistence.
Consistent with Amisano and Fagan (2013), we let the coefficients associated with $z_t$ be regime-invariant. It is worth noting that, if the coefficients of $z_t$ are zero, we obtain a classic Markov switching model with fixed transition probabilities. For further convenience, define $Z^T = (Z_1, \ldots, Z_T)$ and $z^T = (z_1, \ldots, z_T)$ to be the histories of $Z_t$ and $z_t$ up to time $T$. The specific choice of $z_t$ proves to be an important modeling decision. As mentioned above, our goal is to include a measure of the (conventional) monetary policy stance to signal a potential transition from Taylor rule-type policy making to discretionary monetary policy actions such as quantitative easing (QE). In our case, we assume two early warning indicators, $z_t = (\tilde{i}_{t-1}, \tilde{i}^*_{t-1})'$, the demeaned, lagged interest rates at home and abroad. Consequently, the three-dimensional coefficient vector $\gamma_{kj}$ determines the sensitivity of the probability of a transition from the $k$th to the $j$th state. The demeaned covariates correspond to the centered parameterization of Kaufmann (2015): a covariate $z_t$ can be decomposed into its time-varying component $(z_t - \bar{z})$ and its mean $\bar{z}$, which affects the time-invariant average state persistence. Demeaning the covariates thus ensures that the time-invariant part does not depend on the scale of $z_t$.
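The multinomial logit transition probabilities can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the intercepts depend on the previous state $k$ (capturing the first-order Markov structure), the slopes on $z_t$ are shared across previous states (the regime-invariance described above), and the last state serves as the reference category with its coefficients fixed at zero. This normalization, the function name `transition_matrix` and all numerical values (including the policy rates) are hypothetical.

```python
import numpy as np


def transition_matrix(z_t, alpha, gamma):
    """K x K matrix with entry [k, j] = Pr(S_t = j | S_{t-1} = k, z_t).

    alpha: (K, K) intercepts that depend on the previous state k.
    gamma: (K, n_z) slopes on z_t, shared by all previous states.
    The last column of alpha and the last row of gamma are zero (reference state K).
    """
    logits = alpha + gamma @ z_t                         # [k, j] = alpha[k, j] + gamma[j] @ z_t
    logits = logits - logits.max(axis=1, keepdims=True)  # guard against overflow in exp
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)              # softmax within each row


# Hypothetical lagged policy rates (home, foreign), demeaned over the sample
rates = np.array([[4.5, 5.0], [3.0, 2.5], [0.2, 0.1], [0.1, 0.1]])
z = rates - rates.mean(axis=0)

alpha = np.array([[3.0, 0.0, 0.0],     # hypothetical values: large diagonal = persistent states
                  [0.0, 3.0, 0.0],
                  [-3.0, -3.0, 0.0]])
gamma = np.array([[0.8, 0.8],          # higher rates favour state 1 (e.g. Taylor rule)
                  [0.0, 0.0],
                  [0.0, 0.0]])         # reference state

P_high = transition_matrix(z[0], alpha, gamma)   # policy rates well above their mean
P_zlb = transition_matrix(z[-1], alpha, gamma)   # policy rates close to zero
```

With demeaned covariates, setting $z_t = 0$ returns the time-invariant average transition matrix implied by the intercepts alone. Setting the slopes to zero recovers a model with fixed transition probabilities.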

Prior specification and estimation strategy
Our approach is Bayesian, which implies that we have to carefully specify priors on the parameters of the model. Here, we follow George and McCulloch (1993, 1997) and specify a mixture of Gaussians prior on $\beta_{ik}$, the $i$th element of $\beta_k$. The prior is centered on the theoretically motivated restrictions in order to test whether these restrictions hold. The prior mean is stored in an $M_k$-dimensional vector $\underline{\beta}_k$ and summarized in Table 1. A priori, we assume a symmetric Taylor rule with the same coefficients for the home and foreign country and do not consider interest rate smoothing (see Molodtsova and Papell, 2009, for a detailed discussion). For the remaining models, we center the prior on the implied long-run fundamental value.
Formally, this prior reads

$$\beta_{ik} \mid \delta_{ik} \sim (1 - \delta_{ik})\, \mathcal{N}(\underline{\beta}_{ik}, \tau^2_{ik,0}) + \delta_{ik}\, \mathcal{N}(\underline{\beta}_{ik}, \tau^2_{ik,1}).$$

Here, we let $\tau^2_{ik,0}$ and $\tau^2_{ik,1}$ be prior variances (with $\tau^2_{ik,1} \gg \tau^2_{ik,0}$), for $i = 1, \ldots, M_k$, and $\underline{\beta}_{ik}$ denotes the $i$th element of $\underline{\beta}_k$. The first mixture component is referred to as the 'spike' component, tightly centered on the prior mean $\underline{\beta}_{ik}$, while the second is called the 'slab' component, which translates into almost no prior influence. The indicator $\delta_{ik}$ selects the mixture component used. Following the semiautomatic approach of George et al. (2008), moreover, we scale the prior variances $\tau^2_{ik,0}$ and $\tau^2_{ik,1}$ with the variances of the ordinary least squares estimates of the underlying structural model of state $S_t = k$.
This modeling approach constitutes a data-driven way of assessing whether coefficients should be pushed towards theoretically motivated restrictions or allowed to stay close to the corresponding maximum likelihood estimate. Thus, if $\delta_{ik} = 0$, the posterior estimate of $\beta_{ik}$ is strongly pushed towards the prior restriction $\underline{\beta}_{ik}$, while in the opposite case only little prior information on $\beta_{ik}$ is introduced. In what follows, we store all regime-specific indicators in a vector $\delta_k = (\delta_{1k}, \ldots, \delta_{M_k k})'$ that corresponds to the block of $\beta$ associated with the $k$th structural model. Each element of the latent vector $\delta_k$ is a priori independently Bernoulli distributed,

$$\delta_{ik} \sim \mathcal{B}(\omega_{ik}),$$

for hyperparameters $\omega_{ik} \in \omega_k$, an $M_k$-dimensional vector chosen by the researcher. As with $\delta_k$, the dimension and elements of $\omega_k$ correspond directly to those of the coefficient vector $\beta_k$. A reasonable choice is $\omega_{ik} = 0.5$ for all $i, k$, implying equal prior probabilities of introducing significant prior information and of using a relatively loose prior.⁴

For the variances, we assume an independent inverse Gamma prior on each element $\sigma^2_i$ ($i = 1, \ldots, K$). More specifically, we set

$$\sigma^2_i \sim \mathcal{G}^{-1}(a_0, A_0),$$

with $a_0$ and $A_0$ being scalars. Both are chosen to be weakly informative, with hyperparameters $a_0 = 0.01$ and $A_0 = 0.01$.
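The Gibbs updates implied by the spike-and-slab prior and the inverse Gamma prior take a simple form. The NumPy sketch below shows the textbook conditional probability that $\delta_{ik} = 1$ given a draw of $\beta_{ik}$, and a draw of $\sigma_k^2$ from its inverse Gamma full conditional. It assumes that $\mathcal{G}^{-1}(a_0, A_0)$ has shape $a_0$ and scale $A_0$. The function names and the spike and slab variances in the usage lines are hypothetical: the paper instead scales these variances by OLS variances, and its full sampler is the one described in Appendix A.

```python
import numpy as np


def log_normal_pdf(x, mean, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def slab_probability(beta, prior_mean, tau2_spike, tau2_slab, omega=0.5):
    """Pr(delta = 1 | beta): probability that the loose 'slab' component generated beta."""
    log_slab = np.log(omega) + log_normal_pdf(beta, prior_mean, tau2_slab)
    log_spike = np.log1p(-omega) + log_normal_pdf(beta, prior_mean, tau2_spike)
    return 1.0 / (1.0 + np.exp(log_spike - log_slab))


def draw_indicators(beta, prior_mean, tau2_spike, tau2_slab, omega, rng):
    """One Gibbs step for the Bernoulli indicators delta_ik."""
    p = slab_probability(beta, prior_mean, tau2_spike, tau2_slab, omega)
    return (rng.uniform(size=np.shape(p)) < p).astype(int)


def draw_variance(resid_k, a0, A0, rng):
    """Draw sigma_k^2 from its inverse Gamma full conditional, given the residuals of
    the observations currently allocated to state k (prior: shape a0, scale A0)."""
    shape = a0 + 0.5 * resid_k.size
    scale = A0 + 0.5 * np.sum(resid_k ** 2)
    return 1.0 / rng.gamma(shape, 1.0 / scale)   # 1 / Gamma(shape, rate=scale)


rng = np.random.default_rng(42)
# Hypothetical restriction beta = 1 with spike variance 0.01 and slab variance 10
p_near = slab_probability(1.0, 1.0, 0.01, 10.0)  # roughly 0.03: the spike is favoured
p_far = slab_probability(2.0, 1.0, 0.01, 10.0)   # close to 1: the slab is favoured
d = draw_indicators(np.array([1.0, 2.0]), 1.0, 0.01, 10.0, 0.5, rng)
s2 = draw_variance(rng.normal(size=50), 0.01, 0.01, rng)
```

A coefficient estimate close to its theoretical restriction is attributed to the spike, so the restriction is enforced; an estimate far from it switches on the slab and leaves the coefficient essentially unrestricted.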
The prior distribution on the initial state is set to $p(S_0 = k) = 1/K$ for all $k$ (Kaufmann, 2015). Finally, for the coefficients of the multinomial logit model, we adopt a weakly informative prior that is symmetric across all states. That is,

$$\gamma_{kj} \sim \mathcal{N}(0, V) \quad \text{for all } k \text{ and } j = 1, \ldots, K,$$

with $V = \zeta I_K$ and $\zeta$ denoting a scalar. In the empirical application, we set $\zeta = 100$.
In a Bayesian framework, we combine the likelihood with the prior to obtain the posterior distribution. In our case, the joint posterior density is intractable. Fortunately, however, the full conditional posterior distributions take simple forms, permitting Gibbs updating steps. The Markov chain Monte Carlo (MCMC) algorithm is described in more detail in Appendix A. In the empirical application, we run the algorithm for 80,000 iterations, discard the first 30,000 draws as burn-in and retain every tenth draw, thus basing inference on 5,000 draws from the joint posterior.
Before proceeding to the empirical application, a brief word on identification is in order. Identification is necessary for a structural interpretation of the states, but is not relevant if interest centers exclusively on the predictive density of the model (Frühwirth-Schnatter, 2001, 2006).⁵ Recall that in the present model, each regime is characterized by a different set of fundamentals, reflecting different theoretical exchange rate models. By exploiting the specific structure of the theoretical models, we impose inequality constraints on the coefficients, based on assumptions about which fundamentals enter each regime. The only potential source of non-identifiability arises if more than one state points towards a random walk. However, pushing coefficients towards theoretically guided values is sufficient to disentangle the regimes and fully identify the model. In the alternative specification, in which we always include all predictors, identification is certainly an issue. Moreover, each state is then implicitly centered on a random walk a priori. In this case, we apply a permutation sampling step and focus solely on predictive densities.

Empirical application
This section starts by briefly describing the dataset and forecasting design adopted in Subsection 4.1. Then, we discuss key in-sample features of the model in Subsection 4.2. Finally, Subsection 4.3 presents the main forecasting results, discriminating between the point and density forecasting performance of all models considered.

Data, forecasting design, and competing models
In this paper, our aim is to forecast bilateral exchange rates for Australia, Canada, Japan, Norway, South Korea, Sweden, Switzerland and the United Kingdom relative to the US dollar. We collect monthly data on nominal exchange rates, industrial production, monetary aggregates, three-month money market rates and consumer price indices for the countries under consideration. Table 2 depicts the data transformations of the variables. Table 3 provides an overview of the data alongside information on time coverage and the sources of the economic fundamentals.
Table 3: Sources of economic fundamentals.
In order to assess whether time-varying transition probabilities improve predictive accuracy, we benchmark the proposed model framework against MS specifications with fixed transition probabilities (labeled MS-FT), as well as against standard structural exchange rate models estimated under weakly informative priors (labeled linear). These linear benchmarks are based on Taylor rule, monetary, PPP, and UIP fundamentals. In the forecasting exercise, the set of competing models is therefore divided into three overall classes: MS-TVP, MS-FT and linear. Moreover, we consider not only theoretically motivated MS-TVP and MS-FT specifications, but also models that include all macroeconomic indicators of $X_{t-1}$ within each state (labeled kitchen-sink). For the kitchen-sink regressions, we consider different numbers of states, ranging from two to four regimes. To allow for state-specific shrinkage in the kitchen-sink regressions, the stochastic search variable selection (SSVS) prior described above is centered on zero, and separate state-specific indicators are estimated. To assess the role of heteroscedasticity in forecasting exchange rates, we also consider MS-TVP specifications with state-specific variances. All models are benchmarked against the random walk without drift.
We evaluate predictive accuracy by means of a recursive pseudo out-of-sample forecasting exercise. This implies choosing an initial estimation period that ranges from $t = 1$ up to $t = T_0$, with the remaining periods used as a hold-out sample. In the present application, we estimate all models using data up to 2004M12 and then compute $h$-step-ahead predictions for $t = T_0 + 1$. After obtaining draws from the corresponding predictive distributions, we expand the estimation period by one month. This procedure is repeated until the end of the sample is reached.
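The expanding-window design can be sketched as a simple loop. In this hypothetical NumPy illustration, `fit_and_forecast` stands in for re-estimating a model on the current estimation sample (for the MS models this would be a full MCMC run). Here it is replaced by two trivial forecasters: the driftless random walk (a zero predicted return) and the historical mean return. Indices are 0-based, and the simulated returns are made up.

```python
import numpy as np


def recursive_forecasts(y, T0, h, fit_and_forecast):
    """Expanding-window pseudo out-of-sample exercise.

    y: observations indexed 0, ..., T-1; the first T0 form the initial estimation sample.
    fit_and_forecast(sample, h): re-estimates a model on `sample` and returns its
    h-step-ahead forecast.
    Returns a dict mapping the (0-based) index of each target observation to its forecast.
    """
    forecasts = {}
    for t in range(T0, len(y) - h + 1):   # estimation sample y[0], ..., y[t-1]
        forecasts[t + h - 1] = fit_and_forecast(y[:t], h)
    return forecasts


rng = np.random.default_rng(1)
y = rng.normal(scale=0.02, size=120)      # hypothetical monthly exchange rate returns

# Random walk without drift in levels: the predicted return is zero
rw = recursive_forecasts(y, T0=100, h=1, fit_and_forecast=lambda sample, h: 0.0)
# A simple competitor: the historical mean return of the expanding window
hist_mean = recursive_forecasts(y, T0=100, h=1,
                                fit_and_forecast=lambda sample, h: sample.mean())
```

Each pass adds one observation to the estimation sample, so every forecast only uses information available at the forecast origin.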
To rank forecasts, we rely on cumulative squared forecast errors (CSFEs) to assess the quality of point forecasts, taking the median of the predictive density as the point prediction. Turning to density forecasts, we follow Geweke and Amisano (2010) and rely on the log predictive score (LPS) to measure density forecasting accuracy. This has the advantage that, conditional on the proposed model and data, uncertainty surrounding the parameters and latent quantities is integrated out. After obtaining the LPS, we compute log predictive Bayes factors (LBFs) for the entire hold-out sample as the difference between the LPS of a given model and that of the random walk.
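The evaluation metrics can be computed as below. This sketch assumes that each posterior draw $s$ implies a Gaussian predictive density with mean $\mu^{(s)}$ and variance $v^{(s)}$, so that the predictive density at the realized value is approximated by averaging these Gaussian densities over draws. The function names and numbers are hypothetical. Relative CSFEs below zero and positive cumulative LBFs favor the model over the benchmark.

```python
import numpy as np


def log_predictive_score(y_obs, means, variances):
    """Log predictive density at the realized value, approximated by averaging the
    Gaussian conditional densities implied by S posterior draws (log-sum-exp for stability)."""
    means, variances = np.asarray(means), np.asarray(variances)
    logs = -0.5 * (np.log(2.0 * np.pi * variances) + (y_obs - means) ** 2 / variances)
    m = logs.max()
    return m + np.log(np.mean(np.exp(logs - m)))


def relative_csfe(y, model_fc, benchmark_fc):
    """Cumulative squared forecast errors of a model minus those of the benchmark."""
    y = np.asarray(y)
    return np.cumsum((y - np.asarray(model_fc)) ** 2 - (y - np.asarray(benchmark_fc)) ** 2)


def cumulative_lbf(lps_model, lps_benchmark):
    """Log predictive Bayes factors against the benchmark, accumulated over the hold-out sample."""
    return np.cumsum(np.asarray(lps_model) - np.asarray(lps_benchmark))


# Hypothetical hold-out returns; point forecasts would be medians of predictive draws,
# e.g. np.median(predictive_draws)
y = np.array([0.01, -0.02, 0.005])
rw_fc = np.zeros(3)                          # driftless random walk: zero predicted return
model_fc = np.array([0.008, -0.01, 0.0])
csfe_diff = relative_csfe(y, model_fc, rw_fc)  # negative values favour the model
```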

Inspecting evidence for model instability
We now assess whether the model proposed in Subsection 3.1 signals significant shifts in the underlying structural representation. Figure 1 summarizes the mean of the filtered state probabilities for the eight exchange rates considered. In general, we observe that the regime dynamics across countries share one common feature. The models based on Taylor rule (state 1) and UIP fundamentals (state 4) appear to be the dominant states before the global financial crisis in 2008/2009. After that period, however, model evidence changes significantly for the majority of countries. More precisely, models based on monetary (state 2) and PPP (state 3) fundamentals tend to receive more posterior support.
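Filtered state probabilities of the kind summarized in Figure 1 are typically obtained with the Hamilton filter. The following minimal NumPy sketch runs the filter for given parameter values, with state-specific fundamentals and time-varying transition matrices. It conditions on one set of parameters rather than integrating over MCMC draws, and the toy data, function name and parameter values are hypothetical.

```python
import numpy as np


def hamilton_filter(y, X_list, betas, sigma2, P_seq, p0):
    """Filtered probabilities Pr(S_t = k | y_1, ..., y_t), conditional on one set of parameters.

    y: (T,) exchange rate returns.
    X_list[k]: (T, M_k) fundamentals of model k (row t holds X_{k,t-1}).
    betas[k]: (M_k,) coefficients of model k; sigma2: (K,) state-specific variances.
    P_seq: (T, K, K) transition matrices, P_seq[t][k, j] = Pr(S_t = j | S_{t-1} = k).
    p0: (K,) distribution of the initial state S_0.
    """
    T, K = len(y), len(p0)
    sigma2 = np.asarray(sigma2, dtype=float)
    filtered = np.zeros((T, K))
    prev = np.asarray(p0, dtype=float)
    for t in range(T):
        predicted = prev @ P_seq[t]                                  # Pr(S_t = j | y_1..y_{t-1})
        mu = np.array([X_list[k][t] @ betas[k] for k in range(K)])   # state-specific means
        lik = np.exp(-0.5 * (y[t] - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)
        joint = predicted * lik                                      # Bayes update
        filtered[t] = joint / joint.sum()
        prev = filtered[t]
    return filtered


# Toy example: two states with different means; the data (all zeros) match state 1
T = 10
y = np.zeros(T)
X_list = [np.ones((T, 1)), np.ones((T, 1))]
betas = [np.array([0.0]), np.array([5.0])]
P = np.array([[0.9, 0.1], [0.1, 0.9]])
probs = hamilton_filter(y, X_list, betas, [1.0, 1.0], np.tile(P, (T, 1, 1)), p0=[0.5, 0.5])
```

In an MS-TVP setting, each `P_seq[t]` would vary with the lagged policy rates, e.g. through a multinomial logit.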
Compared to the remaining currencies, the Swiss franc (see Fig. 1(c)) exhibits a somewhat higher regime-switching frequency. Figure 1, moreover, suggests that hitting the ZLB does shift filtered probabilities towards regimes other than Taylor rule fundamentals (state 1) for countries such as Australia, Canada, South Korea, Sweden and the United Kingdom.⁶ Countries such as Japan and Switzerland, on the other hand, display the opposite dynamic, namely a shift of probabilities towards the Taylor rule state.
Taking a closer look at the United Kingdom, Taylor rule fundamentals initially constitute the predominant regime, reflecting the fact that these quantities tend to describe exchange rates well in times when the Taylor rule is the primary policy rule of the Bank of England. After this period, transition probabilities point towards a first shift during the crisis of the European Monetary System. After the financial crisis, and upon hitting the ZLB, the model based on Taylor rule fundamentals receives only limited posterior support. It is noteworthy that after 2010, short-term interest rates (both at home and abroad) are stuck at zero (and almost constant). This implies that the model based on interest rate fundamentals closely mimics a random walk with drift during this period, even without introducing shrinkage.
The transition probabilities, depicted in Figs. 2 and 3, generally track the movements in filtered state probabilities, providing considerable evidence of time-varying transition distributions. Our findings thus suggest that a measure of the monetary policy stance at home and abroad tends to drive transitions between structural models. This is consistent with our conjecture that during the ZLB period, using Taylor rule-based exchange rate models might be inappropriate, at least from an in-sample perspective.

Forecasting results
In this section, interest centers on the predictive performance of our proposed MS-TVP specification. The discussion in the previous section highlights that allowing for time-varying transition probabilities proves important in-sample for several exchange rate pairs. This suggests that parameterizing the transition distributions with additional covariates helps to avoid situations where the model gets stuck in a certain state. Amisano and Fagan (2013) and Kaufmann (2015) highlight this issue and point towards the advantages of explaining the regime-switching behavior of the model, as opposed to a model based on constant transition probabilities. The key question, however, is whether this additional flexibility also improves predictive performance. We answer this question using both point and density forecasts.

Point forecasts
Figures 4 and 5 present the evolution of CSFEs of one-step-ahead forecasts for the best performing models across the considered model classes. We consider all linear predictive exchange rate regressions and the five best performing MS-TVP and MS-FT models, according to the CSFEs at the end of the hold-out sample. The CSFEs of all models are presented relative to the CSFEs of the random walk benchmark, so values below zero indicate more accurate forecasts than random walk predictions. Here, we focus on one-step-ahead forecasts, since we find that models that perform well at the one-step-ahead horizon also do well for $h > 1$ periods ahead.⁷ When considering density forecasts, we report the results for longer horizons as well.
Turning to the actual results, we observe pronounced differences across countries. For instance, in Australia, Canada, Norway, Sweden and the United Kingdom, modeling non-linearities pays off, in particular during periods of financial turmoil, outperforming forecasts of linear models as well as the random walk benchmark. One interesting finding in this respect is that controlling for heteroscedasticity also tends to exert a positive effect on point forecasting performance during volatile periods of the business cycle.
By contrast, for South Korea and Switzerland, the random walk appears to be hard to beat. In general, we observe error ratios that are close to unity when averaged over the full hold-out sample. This indicates that including more information does not necessarily translate into improved point predictions relative to a simple no-change forecast for these two economies. Again, we find some heterogeneity in relative forecasting performance over time.
Turning to the performance of the theoretically inspired MS-TVP specifications, we observe strong forecasting accuracy for Japan, Norway, South Korea and the United Kingdom, at least for one specification (marked red in Figs. 6 and 7). On the other hand, kitchen-sink specifications appear to dominate theoretically inspired regimes for Australia, Canada, Switzerland and Sweden. In particular, this holds true for Switzerland, as indicated by the absence of a red line for the MS-TVPs in Fig. 6(c).
When comparing MS-TVP to MS-FT models, we observe that MS-FT specifications appear to be more robust over time in terms of CSFEs, as their forecast errors feature fewer outliers. In general, we observe a better performance of at least one MS-TVP model for Australia, Japan and Norway compared to the MS-FT models, although MS-FT models perform well for the United Kingdom and Canada.

Density forecasts
Tables 4 and 5 summarize the LBFs of all models for all currency pairs considered. Values highlighted in green point towards outperformance of a model relative to the random walk, while red values signal weaker predictive performance than the random walk. To provide a dynamic picture of LBFs over time, Figs. 6 and 7 again show the LBFs of all linear predictive exchange rate regressions and the five best performing MS-TVP and MS-FT models.
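The cumulative LBFs reported in Tables 4 and 5 and Figs. 6 and 7 can be illustrated with a short sketch. We assume here that the LBF at a given point of the hold-out sample is the running sum of log predictive scores of a model minus those of the random walk, so that positive values favor the model. For simplicity, both predictive densities are Gaussian; the predictive standard deviations and the realized changes are hypothetical.

```python
import numpy as np


def gaussian_logpdf(y, mean, sd):
    """Log density of N(mean, sd**2) evaluated elementwise at y."""
    y = np.asarray(y, dtype=float)
    mean = np.asarray(mean, dtype=float)
    sd = np.asarray(sd, dtype=float)
    return -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((y - mean) / sd) ** 2


def cumulative_lbf(logscore_model, logscore_rw):
    """Running sum of log predictive scores of a model minus those of the
    random walk; positive values favor the model."""
    return np.cumsum(np.asarray(logscore_model) - np.asarray(logscore_rw))


# Hypothetical realized exchange rate changes over three periods.
dy = np.array([0.01, -0.03, 0.02])

# Random walk: zero expected change, constant predictive volatility.
lp_rw = gaussian_logpdf(dy, mean=0.0, sd=0.02)
# Hypothetical model: zero expected change, but a predictive volatility
# that adapts to the prevailing regime.
lp_model = gaussian_logpdf(dy, mean=0.0, sd=np.array([0.01, 0.03, 0.02]))

lbf = cumulative_lbf(lp_model, lp_rw)
print(lbf)
```

Both densities share the same mean here, so any gain stems from the variance alone, which is why density forecasts can reward a model even when its point forecasts match the random walk.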
In general, both tables attest to the good predictive power of non-linear specifications, while linear models display a somewhat weaker forecast performance. Furthermore, our results suggest that non-linear models that perform well in terms of point predictions also exhibit high predictive capabilities in terms of density forecasts. However, predictive performance evolves differently over time for the CSFEs and the LBFs. Density forecasts strengthen the argument in favor of non-linear models, as the performance gains of the MS models are even more sizable in periods of high exchange rate volatility, while accuracy losses in tranquil times are rather muted.
Although we do not observe a single dominant non-linear model across forecast horizons and countries, Tables 4 and 5 suggest that, for almost every currency pair, at least one non-linear specification outperforms both the random walk and the linear models. One exception is Japan, for which the best specification is either the random walk (for one-step-ahead predictions) or the linear PPP model (for longer horizons).
For all forecast horizons considered, non-linear specifications do well for Australia, Canada, South Korea, Sweden and the United Kingdom. Specifically, the MS-TVP kitchen-sink specification with four states and state-specific variances, which is the most flexible specification, has good predictive power for Australia, Canada and South Korea. For Australia, this model is the single best performing model across forecast horizons; for South Korea, it is the best for three- and twelve-step-ahead forecasts and the second best for one-step-ahead forecasts. Moreover, as shown in Figs. 6 and 7, the poor performance of linear exchange rate models is even more pronounced for density forecasts. In particular, the linear structural regressions display a sharp decline in predictive power after the financial crisis, corroborating the findings of Byrne et al. (2016) and Molodtsova and Papell (2012).
Turning to the question of whether allowing for heteroscedasticity pays off in terms of density forecasting, we find substantial evidence that this additional flexibility is important. The gains in predictive accuracy of non-linear models can mainly be attributed to the more flexible variance specification of MS models. This can also be seen by comparing MS specifications with a common variance across states to MS models with state-specific variances: the latter achieve a slight accuracy premium relative to their homoscedastic counterparts. Allowing for state-specific variances thus appears to be an important ingredient of a successful forecasting model. However, this increased flexibility comes at a cost. During normal periods, relative predictive accuracy declines steadily for several MS models (see, for example, Fig. 7(c)), in line with recent evidence provided by Abbate and Marcellino (2018).
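The role of state-specific variances can be illustrated with the one-step-ahead predictive density of a Markov switching regression, which is a mixture of Gaussian densities weighted by the predicted state probabilities. The sketch below conditions on fixed, hypothetical parameter values and therefore ignores the parameter uncertainty that a full Bayesian predictive density integrates over. The function name `ms_log_predictive` and all numbers are illustrative.

```python
import numpy as np


def ms_log_predictive(y_next, x_next, filt_prob, P, betas, sigmas):
    """Log one-step-ahead predictive density of a Markov switching regression.

    filt_prob : (K,) filtered state probabilities at time t
    P         : (K, K) transition matrix, P[i, j] = Pr(S_{t+1} = j | S_t = i)
    betas     : (K, p) state-specific coefficients
    sigmas    : (K,) state-specific error standard deviations
    """
    pred_prob = np.asarray(filt_prob) @ P        # Pr(S_{t+1} = j | data up to t)
    means = betas @ x_next                       # conditional mean in each state
    dens = np.exp(-0.5 * ((y_next - means) / sigmas) ** 2) / (
        np.sqrt(2.0 * np.pi) * sigmas
    )
    return np.log(pred_prob @ dens)              # log of the mixture density


P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
filt_prob = np.array([0.5, 0.5])
betas = np.zeros((2, 1))                         # zero expected change in both states
x_next = np.array([1.0])                         # intercept only

y_turbulent = 0.08                               # a large exchange rate move
state_specific = ms_log_predictive(y_turbulent, x_next, filt_prob, P, betas,
                                   sigmas=np.array([0.01, 0.04]))
common = ms_log_predictive(y_turbulent, x_next, filt_prob, P, betas,
                           sigmas=np.array([0.025, 0.025]))
print(state_specific, common)
```

For the large move in this example, the specification with state-specific variances assigns a higher predictive density than the common-variance one, because the high-volatility state places more mass in the tails. With a common variance, the mixture collapses to a single Gaussian density. The ranking for other realizations depends on the data and parameters.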
Contrasting MS-TVP with MS-FT models, Figs. 6 and 7 show that MS-TVP models yield more precise density forecasts for Australia, Norway, South Korea and Switzerland, while performing almost equivalently for Canada. Moreover, with time-varying transition probabilities, theoretically motivated specifications play a more important role than in MS-FT models. This points towards potential accuracy premia from allowing for time-varying transition probabilities, which sharpen inference on the regime allocation.
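One way to let transition probabilities depend on the monetary policy stance is a multinomial logit link, which is compatible with Pólya-Gamma data augmentation. The sketch below assumes this link and treats the last state in each row as the reference category. The coefficients and the two-dimensional stance measure (home and US policy rates) are hypothetical. The example illustrates the mechanism and does not reproduce our estimated specification.

```python
import numpy as np


def tv_transition_matrix(z_t, gamma):
    """Time-varying transition matrix from a multinomial logit link.

    z_t   : (m,) covariates at time t, e.g. a policy stance measure at
            home and in the US (hypothetical)
    gamma : (K, K, m + 1) coefficients; gamma[i, j] maps [1, z_t] to the
            log-odds of moving from state i to state j. The last state
            serves as reference, so gamma[:, -1] is kept at zero.
    Returns P with P[i, j] = Pr(S_t = j | S_{t-1} = i, z_t).
    """
    w = np.concatenate(([1.0], np.asarray(z_t, dtype=float)))
    logits = gamma @ w                                   # (K, K) log-odds
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)      # rows sum to one


# Two hypothetical states: 0 = Taylor rule fundamentals, 1 = other fundamentals.
gamma = np.zeros((2, 2, 3))
gamma[0, 0] = [2.0, 1.0, 1.0]    # staying in state 0 is likelier when rates are high
gamma[1, 0] = [-2.0, 1.0, 1.0]   # moving back to state 0 likewise

P_normal = tv_transition_matrix([3.0, 2.0], gamma)  # home and US policy rates in %
P_zlb = tv_transition_matrix([0.0, 0.0], gamma)     # both rates at the zero lower bound
print(P_normal)
print(P_zlb)
```

In this parameterization, the probability of remaining in the Taylor rule state falls once both policy rates reach zero, which is the type of shift that fixed transition probabilities cannot capture.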
Theoretically motivated MS-TVP specifications exhibit good forecast performance for Australia, Canada and Norway. For Sweden, theoretically motivated MS-TVPs perform well during prolonged periods of high exchange rate volatility but are slightly outperformed by competing specifications in periods of low volatility. By contrast, for South Korea, MS-TVP kitchen-sink regressions improve upon our proposed MS-TVP model that allows for switching across structural exchange rate regressions. For the remaining countries in Figs. 6 and 7, no clear pattern emerges when comparing the two model types. For example, structural MS-TVP specifications yield strong increases in predictive power during the financial crisis but a weaker performance afterwards, whereas kitchen-sink MS-TVP regressions show no gain during the crisis but, at the same time, no subsequent loss in its aftermath.
Finally, we assess whether using shrinkage priors on the coefficients improves forecasts. Tables 4 and 5 indicate that shrinkage generally yields better results in pairwise comparisons with the corresponding non-shrinkage counterpart. This observation, however, does not hold consistently across all models, countries and forecast horizons considered. In particular, a kitchen-sink regression without shrinkage leads to poor forecast performance, as already shown by Li et al. (2015) and Wright (2008). For theoretically motivated MS models, the evidence on whether shrinkage is useful is mixed. We conjecture that this is because Markov switching specifications with theoretically defined regimes already introduce a certain amount of regularization that helps avoid overfitting.
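The regularization argument can be illustrated with the simplest shrinkage prior, an independent Gaussian prior centered at zero. Under this prior, the posterior mean of the coefficients coincides with a ridge estimator. This stand-in was chosen for transparency and is not meant to replicate the shrinkage priors used in the empirical exercise. It assumes a known error variance and a fixed prior variance `tau2`, and the data are simulated.

```python
import numpy as np


def gaussian_shrinkage_mean(X, y, sigma2, tau2):
    """Posterior mean of beta for y ~ N(X beta, sigma2 I) under the prior
    beta ~ N(0, tau2 I), with sigma2 treated as known (ridge-type estimator)."""
    p = X.shape[1]
    precision = X.T @ X / sigma2 + np.eye(p) / tau2   # posterior precision
    return np.linalg.solve(precision, X.T @ y / sigma2)


rng = np.random.default_rng(0)
T, p = 60, 20                      # short sample, many candidate covariates
X = rng.standard_normal((T, p))
beta_true = np.zeros(p)
beta_true[0] = 0.5                 # only one covariate actually matters
y = X @ beta_true + 0.5 * rng.standard_normal(T)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]      # no shrinkage
beta_shrunk = gaussian_shrinkage_mean(X, y, sigma2=0.25, tau2=0.01)
print(np.linalg.norm(beta_ols), np.linalg.norm(beta_shrunk))
```

Letting `tau2` grow recovers the unrestricted least-squares estimate, while small values pull all coefficients towards zero. This is the sense in which shrinkage guards a kitchen-sink regression against overfitting.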

Concluding remarks
In this paper, we propose a Bayesian non-linear time series model for exchange rates. Our framework, a multi-process model, allows for dynamic switching between selected theoretical exchange rate models that guide the choice of covariates. As an additional novelty, we assume that the transition probabilities vary over time and depend on a measure of the monetary policy stance at home and abroad. This feature enables us to capture breaks in the policy rule of the central bank that, in turn, could affect which structural exchange rate model prevails. For instance, our framework allows the model to switch dynamically between structural specifications once short-term interest rates hit the zero lower bound.
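Filtered state probabilities of the kind discussed in this paper can be computed with a forward filtering recursion. The sketch below assumes the standard Hamilton filter for a Markov switching regression with Gaussian errors and conditions on fixed parameter values rather than on posterior draws. It accepts one transition matrix per period, so that time-varying transition probabilities can be plugged in. Representing each structural model by zero restrictions on a common covariate matrix is an illustrative assumption, and all inputs are hypothetical.

```python
import numpy as np


def hamilton_filter(y, X, betas, sigmas, P_seq, init_prob):
    """Filtered probabilities Pr(S_t = k | y_1, ..., y_t) for a Markov
    switching regression with Gaussian errors and time-varying transitions.

    y         : (T,) dependent variable
    X         : (T, p) covariates shared by all states; a state excludes a
                covariate by setting its coefficient to zero
    betas     : (K, p) state-specific coefficients
    sigmas    : (K,) state-specific error standard deviations
    P_seq     : (T, K, K) transition matrices, P_seq[t][i, j] = Pr(S_t = j | S_{t-1} = i)
    init_prob : (K,) state probabilities before the first observation
    """
    T, K = len(y), len(sigmas)
    filtered = np.empty((T, K))
    prob = np.asarray(init_prob, dtype=float)
    for t in range(T):
        pred = prob @ P_seq[t]                     # prediction step
        z = (y[t] - betas @ X[t]) / sigmas         # standardized residual per state
        lik = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * sigmas)
        joint = pred * lik                         # update step (Bayes' rule)
        prob = joint / joint.sum()
        filtered[t] = prob
    return filtered


# Hypothetical example: two states with different intercepts.
y = np.array([1.0, 1.0, 1.0])
X = np.ones((3, 1))
betas = np.array([[0.0], [1.0]])
sigmas = np.array([0.5, 0.5])
P_seq = np.tile(np.array([[0.9, 0.1], [0.1, 0.9]]), (3, 1, 1))
filtered = hamilton_filter(y, X, betas, sigmas, P_seq, init_prob=[0.5, 0.5])
print(filtered)
```

Since every observation in this toy example matches the intercept of state 1, the filtered probability of that state rises with each period. With time-varying transition probabilities, `P_seq` would instead be built period by period from the policy stance covariates.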
We use this framework to predict eight exchange rates vis-à-vis the US dollar. The transition probabilities show considerable evidence of time variation. The filtered probabilities indicate that, especially once interest rates are bound at zero, model evidence shifts away from the Taylor rule based models towards other models, highlighting the necessity to control for model uncertainty. To assess whether this feature also translates into predictive accuracy gains, we conduct a forecasting exercise. The results are rather mixed for point forecasts, which are only slightly better than those obtained from standard models. In terms of density predictions, however, we observe pronounced accuracy gains for selected exchange rates.

… from a Pólya-Gamma distribution. This approach has the advantages of fast convergence and simple implementation, and it requires no additional layer for approximating the error distribution.
$C$ is a $T \times (K-1)$ matrix with elements defined as …, and $e_j$ denotes the unit vector with a one in the $j$th position.

Fig. 1: Mean posterior state probabilities for a weakly informative prior and a common variance across states. State 0 indicates Taylor rule fundamentals, state 1 monetary fundamentals, state 2 PPP fundamentals and state 3 UIP fundamentals. The vertical bars (yellow) indicate NBER recessions for the US. The black solid line depicts the country-specific interest rate and the red solid line the US interest rate. The left axis shows the probabilities and the right axis the interest rates.

Fig. 2: The blue line depicts the posterior mean of the time-varying transition probabilities for each state with a weakly informative prior and a common variance across states. State 0 indicates Taylor rule fundamentals, state 1 monetary fundamentals, state 2 PPP fundamentals and state 3 interest rate fundamentals. The vertical bars (yellow) indicate NBER recessions for the US.

Fig. 3: The blue line depicts the posterior mean of the time-varying transition probabilities for each state with a weakly informative prior and a common variance across states. State 0 indicates Taylor rule fundamentals, state 1 monetary fundamentals, state 2 PPP fundamentals and state 3 interest rate fundamentals. The vertical bars (yellow) indicate NBER recessions for the US.

Table 1: Prior mean $\beta_k$ for each state

Table 2: Transformation of variables.

Table 4: Cumulative one-, three- and twelve-step-ahead LBFs relative to the random walk benchmark at the end of the full hold-out sample for Australia, Canada, Switzerland and Japan. Values highlighted in green are greater than zero and values highlighted in red are smaller than zero, indicating a better or a weaker performance, respectively, than the random walk. The best model is shown in bold.