A Bayesian Time-Varying Autoregressive Model for Improved Short- and Long-Term Prediction

Motivated by the application to German interest rates, we propose a time-varying autoregressive model for short- and long-term prediction of time series that exhibit temporary non-stationary behavior but are assumed to mean revert in the long run. We use a Bayesian formulation to incorporate prior assumptions on the mean reverting process in the model and thereby regularize predictions in the far future. We use MCMC-based inference by deriving relevant full conditional distributions and employ a Metropolis-Hastings within Gibbs sampler approach to sample from the posterior (predictive) distribution. By combining data-driven short-term predictions with long-term distributional assumptions, our model is competitive with existing methods in the short horizon while yielding reasonable predictions in the long run. We apply our model to interest rate data and contrast its forecasting performance with that of a 2-Additive-Factor Gaussian model as well as with the predictions of a dynamic Nelson-Siegel model.


Introduction
To forecast a univariate time series, the first model of choice is often a linear model. A very basic example of this model class in the context of time series analysis is the autoregressive model of order 1 (AR(1)), which is defined as follows:
$$x_t = \alpha + \beta x_{t-1} + \varepsilon_t, \tag{1}$$
where $x_t$ represents the observed variable at time point $t$ and $\alpha$ and $\beta$ are real-valued constants, while $|\beta| < 1$ is assumed to ensure stationarity. The innovation process $\varepsilon_t$ can be, e.g., a Gaussian white noise process, i.e., independent and identically distributed (i.i.d.) normal with $\varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$ for all time points $t \in T$. Linearity is often an excessively strict assumption in practice, and many time series exhibit features that cannot be captured by a linear model [14]. In the last decades a lot of research has been conducted to introduce different types of nonlinear models. A bi-linear model is an example of a nonlinear model, which assumes a nonlinear relationship between the covariates and the response variable [see, e.g., 12,26], although it is not often used in macroeconomic applications [20]. A more typical approach is to allow one (or more) parameters of a linear model to change over time. This comprises the regime switching and time-varying parameter models.
The first approaches to regime switching models were conducted by Quandt [21], who considered a switching regression model extending a linear regression model by allowing the parameters to switch according to a random variable. Bacon and Watts [1] introduced a smooth transition model, which implements a smooth transition from one regime to another without a sudden jump. Goldfeld and Quandt [11] introduced the Markov switching regression model and used a discrete latent Markov process to determine the current regime. These models were adapted to time series models by Lim and Tong [17] and Chan and Tong [5], introducing the threshold autoregressive model (TAR) and the smooth transition autoregressive model (STAR), respectively. Hamilton [14] introduced the Markov switching autoregressive model for applications in economics. These are amongst the most famous regime switching models used in macroeconomics and have been investigated thoroughly together with different variants in the literature [13,27,15]. Lanne and Saikkonen [16] used a TAR model, which only allows regime changes for the constant parameter $\alpha$, and applied it to strongly autocorrelated time series data.
In contrast to regime switching models, which allow the parameters to take a finite number of states, time-varying parameter models allow one (or more) of the parameters in a linear model to be driven by its own continuous process [20]. For example, if the parameter vector $(\alpha, \beta, \sigma^2)$ of the linear AR(1) model becomes a stochastic process, this results in a time-varying autoregressive model of order 1 (TV-AR(1)),
$$x_t = \alpha_t + \beta_t x_{t-1} + \varepsilon_t,$$
with $\varepsilon_t \sim N(0, \sigma_t^2)$. Certain distribution assumptions for the underlying stochastic process of the parameter vector $(\alpha_t, \beta_t, \sigma_t)$ are made in practice to complete the TV-AR(1) model specification [28]. Similar to the TAR model in Lanne and Saikkonen [16], the time variation of the TV-AR(1) model can be restricted to the constant parameter $\alpha_t$, resulting in a time-varying constant autoregressive model of order 1 (TVC-AR(1)):
$$x_t = \alpha_t + \beta x_{t-1} + \varepsilon_t.$$
If $|\beta| < 1$ and the latent process of $\alpha_t$ is stationary, the process $x$ is also stationary. However, due to random shifts in the mean reversion level caused by the time-varying constant parameter, realizations of the model can resemble those of a (close to) random walk process when restricted to a limited time window.
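To illustrate how a time-varying constant can produce paths that resemble a (close to) random walk over a limited window, the following minimal sketch simulates a TVC-AR(1) process. It assumes, purely for illustration, that $\alpha_t$ follows a centered stationary AR(1) process; the function name `simulate_tvc_ar1` and all parameter values are hypothetical and not taken from the paper.

```python
import random


def simulate_tvc_ar1(n, beta, sigma, rho, tau, seed=0):
    """Simulate x_t = alpha_t + beta * x_{t-1} + eps_t.

    Assumption (illustrative): alpha_t = rho * alpha_{t-1} + eta_t with
    eta_t ~ N(0, tau^2) and eps_t ~ N(0, sigma^2), both Gaussian white noise.
    """
    rng = random.Random(seed)
    # Start the latent constant from its stationary distribution.
    alpha = rng.gauss(0.0, tau / (1.0 - rho ** 2) ** 0.5)
    x = 0.0
    path = []
    for _ in range(n):
        alpha = rho * alpha + rng.gauss(0.0, tau)
        x = alpha + beta * x + rng.gauss(0.0, sigma)
        path.append(x)
    return path


# Hypothetical parameter values: a persistent latent constant (rho close
# to 1) combined with a clearly stationary beta.
path = simulate_tvc_ar1(n=240, beta=0.9, sigma=1.0, rho=0.98, tau=0.2)
```

Plotting `path` for different seeds typically shows long excursions away from 0 even though both $|\beta| < 1$ and $|\rho| < 1$, so the process is stationary.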
Strong autocorrelation and (close to) random walk behavior of time series that are actually stationary is the feature we address in this work. Using a linear (near) integrated process to model these time series might not account for characteristics that are valid according to economic theory. As Lanne and Saikkonen [16] point out, an impulse response function would imply a very slow mean reversion inconsistent with properties of many economic variables. Also, the behavior in the very long horizon might be unrealistic. A (near) integrated process, e.g., applied to interest rates or unemployment rates, might lead, due to its large variance, to extreme values in the long run that have never been observed in the past. Furthermore, estimating the model parameters of a near integrated but stationary process might involve large estimation errors if the sample size is not sufficiently large.
We therefore consider a nonlinear model, which allows for a time-varying mean reversion level. Specifically, we propose a Bayesian TVC-AR(1) model, which is still stationary but has linear properties similar to an integrated or nearly integrated process due to a stochastic mean reversion level. Furthermore, the Bayesian approach allows us to regularize the long run distribution of the time series without affecting the short-term distributions adversely. The novelty of our approach lies in the proposed Bayesian framework, which (1) allows a model with linear properties in accordance with economic theory, (2) offers the possibility to regularize the long run distribution by using prior assumptions and (3), if applied, e.g., to interest rates, gives improved forecasting performance in the short horizon compared to linear models commonly used in practice. Moreover, we place particular emphasis on the interpretability of the model structure and prior parameters. This allows expert knowledge or assumptions in accordance with economic theory about the long run behavior of a time series to be included in the model in a sound mathematical way.
The remainder of this paper is arranged as follows. Section 2 specifies the Bayesian TVC-AR(1) model, including the derivation of the required full conditional posterior distributions and the application of a Metropolis-Hastings within Gibbs sampling routine for statistical inference. In Section 3 we discuss an application of our model to interest rate data and compare the forecasting performance as well as the long run distribution of our nonlinear model with the dynamic Nelson-Siegel model and the Gauss2++ model, which is a standard model in the insurance industry. We conclude with Section 4 and give a brief outlook on potential further research topics.

A Bayesian TVC-AR(1) Model for Long Run Regularization
In this section we introduce the Bayesian TVC-AR(1) (BTVC-AR(1)) model. The model incorporates assumptions about the long-term behavior of the time series and thereby regularizes the process in the long horizon. At the same time, the model is mainly driven by the given data in the short run and thus fosters a good short-term prediction.
The model is given by
$$x_t = \alpha_t + \beta x_{t-1} + \varepsilon_t, \tag{4}$$
where $\varepsilon_t$ is assumed to be a Gaussian white noise process, i.e., $\varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$. We further specify $\alpha_t$ as a stationary Gaussian process characterized by its unconditional expectation $\theta := \vartheta \cdot \mathbf{1}$ and covariance $\Sigma$, i.e.,
$$\alpha \sim N(\theta, \Sigma). \tag{5}$$
The Bayesian approach considers the parameters of model (4) as random variables. For the prior distribution of $\beta$ conditional on $\sigma^2$, a truncated normal distribution with lower bound $-1$ and upper bound $1$ is assumed, i.e.,
$$\beta \mid \sigma^2 \sim N(\mu_\beta, \sigma^2 \sigma_\beta^2, -1, 1),$$
with conditional prior expectation $\mu_\beta$ and additional multiplicative variance parameter $\sigma_\beta^2$. The prior distribution for $\sigma^2$ is an inverse gamma distribution with shape and scale parameters $a$ and $b$, respectively,
$$\sigma^2 \sim IG(a, b).$$
These two prior distributions are conjugate priors for model (4) if the respective other parameter is known and therefore allow for an analytical derivation of the corresponding full conditional distributions.
Using these priors, the defined model can be seen as a Bayesian version of the TVC-AR(1) model. The mean $\theta$ and covariance $\Sigma$ might be assumed fixed or defined as random variables with further attached prior distributions. In the latter case, (5) describes the distribution of $\alpha$ conditional on $\theta$ and $\Sigma$. Placing priors on these parameters allows assumptions about the long run distribution to be incorporated into the model, as further elaborated in Section 2.2.
While this basic model setup is flexible in many ways, particularly in terms of its covariance structure assumptions for $\alpha$, further practical insights can be obtained from a more in-depth model characterization. In the following, we will shed light on useful properties of this framework when assuming an AR-covariance structure.

Arbitrating Between Short and Long Run Distribution
The goal of our work is to propose a new modeling framework that can regularize the long run distribution of (nearly) integrated time series while keeping a good forecasting performance in the short horizon. Linear models often concentrate on the conditional distribution in the short horizon, but due to the near integration property of the time series this can lead to inappropriate long run distributions. For example, if the AR(1) model is estimated for a time series that shows a (close to) random walk behavior, the parameter $\beta$ of the model will take a value close to 1. This can lead to a large long run variance given by
$$\mathrm{Var}(x_t) = \frac{\sigma^2}{1 - \beta^2},$$
potentially yielding unrealistic values in the long run that have never been observed in the past. On the other hand, calibrating $\beta$ to a given long run variance is not straightforward without deteriorating the short run prediction performance. Figure 1 depicts this undesired behavior by showing the long run mean of a linear AR(1) model, which is driven by the conditional short run distribution at the expense of an unrealistic long-term distribution.
We address this issue by incorporating a time-varying mean reversion level, which locally preserves the good short-term prediction and at the same time regularizes the long run distribution. The current mean reversion level valid in the short run can differ from the long run behavior, accounting for the current market situation and therefore improving the short run prediction. We enable the model to stay in a reasonable range in the long run by assuming a stationary process for the time-varying mean reversion level and a stronger mean reversion to this time-varying level than a linear AR(1) model would induce to its constant mean reversion level. Such a behavior can be achieved by introducing a time-varying $\alpha$ parameter into a linear AR(1) model with additional prior assumptions. In particular, this does not change the (weak) stationarity property of the model if the assumed process for $\alpha$ is (weakly) stationary itself. This can be verified by calculating the unconditional mean, the unconditional variance and the unconditional covariance:
$$E[x_t] = \frac{\vartheta}{1 - \beta},$$
$$\mathrm{Var}(x_t) = \frac{\sigma^2}{1 - \beta^2} + \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \beta^{i+j}\, \mathrm{Cov}(\alpha_{t-i}, \alpha_{t-j}),$$
$$\mathrm{Cov}(x_t, x_{t+h}) = \frac{\beta^{h} \sigma^2}{1 - \beta^2} + \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \beta^{i+j}\, \mathrm{Cov}(\alpha_{t-i}, \alpha_{t+h-j}), \quad h \geq 0.$$
As the $\alpha$-process is stationary and $|\beta| < 1$, none of these moments depends on $t$, and the BTVC-AR(1) is (weakly) stationary.
The time-varying $\alpha$ increases the flexibility of our model to account for short and long run distributional properties. As current observations have almost no influence in the very long run, a reasonable way to include information about the long run mean and long run variance in a Bayesian setting is via prior assumptions for $\alpha$. We will further elaborate on this in Sections 2.2.1 and 2.2.2. The time-varying $\alpha$ also increases the flexibility of the model such that the conditional distribution in the short run is consistent with the empirical data, i.e., $E[x_{t+h} \mid x_t, x_{t-1}, \ldots]$ and $\mathrm{Var}[x_{t+h} \mid x_t, x_{t-1}, \ldots]$ still reflect the empirical distribution for a short horizon $h$.
Our BTVC-AR(1) model can therefore produce both a conditional short-term distribution, which roughly corresponds to that of an unrestricted linear model, and a long run distribution with a reasonable range of values.

The Long Run Mean and Time-Varying Mean Reversion
The mean reversion level in a linear AR(1) model as specified in (1) amounts to $\frac{\alpha}{1 - \beta}$. As the mean reversion level stays constant over time, it is also the long run mean of the model. In contrast, the mean reversion level in the BTVC-AR(1) model changes over time and is given by $\frac{\alpha_t}{1 - \beta}$ for time point $t$. This local mean reversion level is in general different from the long-term mean and can even pull the process away from it in expectation, i.e., the expected next value can lie farther from the long-term mean than the current observation, which helps in fitting the model to a time series exhibiting a (close to) random walk behavior. The long run mean of the BTVC-AR(1) depends on the unconditional mean of $\alpha$ and amounts to $\frac{\vartheta}{1 - \beta}$ in our model. We assume the data to be centered around a prior specified long run mean. By setting $\theta = 0$, i.e., $\vartheta = 0$, this long run mean is reached in expectation after reshifting the simulated data.
The implications of the time-varying mean reversion level of the BTVC-AR(1) model are visualized in Figure 1. As an example, two AR(1) models (with unrestricted and restricted constant parameter) and the BTVC-AR(1) model have been fitted to a simulated stationary time series that shows a (nearly) integrated behavior.
In the left graphic the "historical" time series can be seen as well as the expected future development according to the three models. The AR(1) model with no restrictions has a long-term mean far away from the historical domain, as its focus lies on the conditional short-term distribution. The restricted AR(1) model sets the $\alpha$ parameter to 0 to regularize the long run mean, but at the same time the expected values in the short horizon are pushed in the direction of the long run level, leading to an inferior forecasting performance. If we assume that the (close to) random walk behavior stems from changes in the mean reversion level determined by unobserved variables, the BTVC-AR(1) model shows a more desirable behavior. The time-varying constant parameter in the model leads to a time-varying mean reversion level and can therefore account for the changes induced by the unobserved variables. The long run mean can still be regularized to 0 while influencing the short-term distribution less abruptly. This allows the time series to follow the current trend in expectation and veer away from the long run mean for a couple of time steps. The reason for this behavior is that the latent $\alpha$-process induces a local mean reversion level that lies below the last observation, which can be seen in the right plot of Figure 1 showing the average latent mean reversion level extracted during the simulation process. In the long run the mean reversion level returns in expectation to the prespecified value of 0.

Long Run Variance
The long run variance of a linear AR(1) model is given by
$$\mathrm{Var}(x_t) = \frac{\sigma^2}{1 - \beta^2}.$$
The closer the model behaves like a random walk, i.e., the closer $\beta$ approaches 1, the larger the long run variance gets under the assumption of a fixed conditional variance $\sigma^2$. In terms of the long run variance, the BTVC-AR(1) model is more flexible as it incorporates two sources of variation: the residual term of the AR(1) model and the variance of the latent $\alpha$-process. The model's long run variance is given by
$$\mathrm{Var}(x_t) = \frac{\sigma^2}{1 - \beta^2} + \frac{1}{1 - \beta^2} \left( \mathrm{Var}(\alpha_t) + 2 \sum_{k=1}^{\infty} \beta^{k}\, \mathrm{Cov}(\alpha_t, \alpha_{t-k}) \right).$$
The first term has the same form as the long run variance of a linear AR(1) model and can be interpreted as the "unconditional" variance around the time-varying mean reversion level, i.e., the variance conditional on the $\alpha$-process. The second term incorporates the part of the variance stemming from the $\alpha$-process and depends on both its unconditional variance and unconditional covariances. This allows the BTVC-AR(1) model to be more flexible and to control the long run variance of $x_t$, while reducing the opposing effect on the conditional distribution in the short horizon. The model thus still produces short-term distributions consistent with the given data. If $\alpha$ is a constant process, the second term is 0 and the BTVC-AR(1) model reduces to a linear AR(1) model.
Prior Assumptions. With the goal in mind to control the long-term variance based on prior information, a more refined specification of the BTVC-AR(1) model is helpful in order to translate this information into the model. We will use a centered $\alpha$-process with an AR-covariance structure for demonstrative purposes. In this case, $\alpha$ can be represented by a linear AR(1) model
$$\alpha_t = \rho\, \alpha_{t-1} + \eta_t,$$
where $\rho$ represents the correlation between two successive time steps and $\eta_t$ is a Gaussian white noise process, i.e., $\eta_t \overset{\text{i.i.d.}}{\sim} N(0, \tau^2)$. The long run variance of the BTVC-AR(1) model is then given by
$$\mathrm{Var}(x_t) = \frac{\sigma^2}{1 - \beta^2} + \frac{\tau^2 (1 + \beta\rho)}{(1 - \beta^2)(1 - \rho^2)(1 - \beta\rho)}. \tag{7}$$
If the process $x_t$ is supposed to reach a certain objective variance in the long run, the degrees of freedom in (7) reduce from four to three. For example, for given $\rho$, $\beta$ and $\sigma^2$ and a prior value assumption for $\mathrm{Var}(x_t)$, the variance of $x_t$ has a one-to-one relationship with $\tau^2$ and it is straightforward to solve (7) for $\tau^2$. Let $\tilde{\tau}^2$ denote the solution. To ensure positivity, the truncation limits for the prior distribution of $\beta$ can be set to $-1$ and $\frac{\mathrm{Var}(x_t) - \sigma^2}{\mathrm{Var}(x_t)}$. For this specific covariance structure, a possible prior distribution of $\tau^2$ can thus be defined by the conditional distribution
$$\tau^2 \mid \rho, \beta, \sigma^2 \sim \delta_{\tilde{\tau}^2}, \tag{8}$$
where $\delta$ denotes a degenerate distribution with point mass 1 at $\tilde{\tau}^2$. This definition forces the process to reach its prespecified long run variance $\mathrm{Var}(x_t)$ while controlling the speed of mean reversion of the $\alpha$-process through $\rho$. A conjugate prior for $\rho$ is a normal distribution truncated from below by $-1$ and from above by $1$, i.e.,
$$\rho \sim N(\mu_\rho, \sigma_\rho^2, -1, 1),$$
with mean $\mu_\rho$ and variance $\sigma_\rho^2$. The previous prior specifications allow prior information to be introduced into the model in a straightforward manner while maintaining the properties of the BTVC-AR(1) model.
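The calibration of $\tau^2$ to a target long run variance can be sketched as follows. The sketch assumes that the long run variance under an AR-covariance structure takes the form $\sigma^2/(1-\beta^2) + \tau^2(1+\beta\rho)/\big((1-\beta^2)(1-\rho^2)(1-\beta\rho)\big)$, which follows from the stationary variance of an AR(2) process with roots $\beta$ and $\rho$. The function names are hypothetical, and the value $\sigma^2 = 1$ is chosen for illustration only.

```python
def btvc_long_run_variance(beta, sigma2, rho, tau2):
    """Long run variance of x_t when alpha_t is a centered AR(1) process."""
    noise_part = sigma2 / (1.0 - beta ** 2)
    alpha_part = tau2 * (1.0 + beta * rho) / (
        (1.0 - beta ** 2) * (1.0 - rho ** 2) * (1.0 - beta * rho)
    )
    return noise_part + alpha_part


def solve_tau2(target_var, beta, sigma2, rho):
    """Innovation variance of alpha_t that yields the target long run variance."""
    excess = target_var * (1.0 - beta ** 2) - sigma2
    if excess <= 0.0:
        # The residual term alone already exceeds the target variance.
        raise ValueError("target variance too small for the given beta and sigma2")
    return excess * (1.0 - rho ** 2) * (1.0 - beta * rho) / (1.0 + beta * rho)


# Illustrative values; sigma2 = 1.0 is an assumption made for this example.
tau2 = solve_tau2(target_var=120.0, beta=0.95, sigma2=1.0, rho=0.98)
```

The `ValueError` branch corresponds to the positivity requirement on $\tau^2$, which is what motivates an upper truncation limit for $\beta$.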

The Short Run Distribution
For the short run distribution of the BTVC-AR(1) model, the goal is to balance an estimation consistent with the observed data against the opposing effect of the prespecified long run distribution. For a linear AR(1) model with a restricted long run mean of 0, the conditional expectation and the conditional variance amount to
$$E[x_{t+1} \mid x_t] = \beta x_t, \qquad \mathrm{Var}[x_{t+1} \mid x_t] = \sigma^2.$$
The model can get arbitrarily close to a centered random walk if $\beta$ approaches 1, while the long run variance increases at the same time, as shown in Section 2.2.2. For the BTVC-AR(1) model we get, for given parameters,
$$E[x_{t+1} \mid x_t, x_{t-1}, \ldots] = E[\alpha_{t+1} \mid x_t, x_{t-1}, \ldots] + \beta x_t, \qquad \mathrm{Var}[x_{t+1} \mid x_t, x_{t-1}, \ldots] = \mathrm{Var}[\alpha_{t+1} \mid x_t, x_{t-1}, \ldots] + \sigma^2.$$
A random walk behavior, i.e., $E[x_{t+1} \mid x_t] \approx x_t$, can be reached without $\beta$ necessarily being close to 1, due to the conditional expectation of the $\alpha$-process that supports the random walk behavior in the short horizon. This increases the flexibility of the BTVC-AR(1) model compared to a linear AR(1) model in combining short and long run distributional characteristics.
We can further decompose the conditional expectation to see the similarities of the BTVC-AR(1) model to a linear AR($t$) process at a given time point $t$. Let $\tilde{\alpha} = (\alpha_1, \ldots, \alpha_{t+1})$ denote the time-varying constant extended to $t+1$ in a manner consistent with the BTVC-AR(1) model definition, i.e., the same covariance parameterization is assumed. For a given data set $x = (x_0, \ldots, x_t)$, the conditional distribution of $\tilde{\alpha} \mid x$ is multivariate normal (cf. Appendix A.1), i.e., $\tilde{\alpha} \mid x \sim N(\tilde{\mu}, \tilde{\Sigma})$. The conditional expectation of $\alpha_{t+1}$ is given by the last entry of $\tilde{\mu}$, a linear combination of the increments $x_j - \beta x_{j-1}$, $j \leq t$, with weights determined by $s_{t+1,\cdot} = (s_{t+1,1}, \ldots, s_{t+1,t+1})$, where $s_{i,j}$ represent the entries of $\tilde{\Sigma}$. The one-step-ahead conditional expectation of the model therefore amounts to
$$E[x_{t+1} \mid x] = \beta x_t + \tilde{\mu}_{t+1}.$$
This shows that the conditional expectation depends on all previous time points, as in a linear AR($t$) model, allowing the BTVC-AR(1) model to better account for current trends in the process. Due to the given covariance structure for $\alpha$, the number of parameters is, however, much smaller than in an actual AR($t$) process.
The goal of Bayesian inference is to find the joint posterior distribution, $p(\alpha, \beta, \sigma^2 \mid x)$, conditional on the observed data $x = (x_0, \ldots, x_t)$. If the full conditional distributions of all parameters are known, the Gibbs sampler [see, e.g., 9] can be used to draw samples from this joint posterior distribution, and inference can be based on Monte Carlo approximation [see, e.g., 6]. By regularizing the long run variance under the assumption of an AR-covariance structure and choosing a degenerate prior distribution for $\tau^2$ as in (8), the full conditional distributions of $\rho$, $\beta$ and $\sigma^2$ depend on the prior of $\tau^2$ and cannot be derived analytically. We therefore apply a Metropolis-Hastings within Gibbs sampling routine [see, e.g., 19]. We will state the algorithmic details in the following section and here only derive the necessary distributions.
As the model defined in Section 2.2.2 can be considered under a different parameterization, where $\tau^2$ is given by a function of $(\rho, \beta, \sigma^2)$, namely the solution of (7) for $\tau^2$, and is thus fixed for given $\rho$, $\beta$, $\sigma^2$ and a specified long run variance $\mathrm{Var}(x_t)$, we will focus on deriving two conditional distributions in order to be able to employ a two-step Gibbs sampling procedure. The goal is to iteratively sample $\alpha$ and the vector $(\rho, \beta, \sigma^2)$ based on the respective other full conditional distribution. As it is not straightforward to derive the conditional distribution for the latter vector, we will here derive conditional distributions for all parameters involved as if the parameter $\tau^2$ were fixed, and later employ these distributions to derive a suitable proposal distribution in a Metropolis-Hastings procedure. In the following subsections we just state the (full) conditional distributions. A more detailed derivation can be found in Appendices A.1 to A.4.

Full Conditional Distributions of α
In the following we derive the full conditional distribution of $\alpha$. It holds
$$p(\alpha \mid x, \beta, \sigma^2) \propto p(x \mid \alpha, \beta, \sigma^2)\, p(\alpha). \tag{10}$$
Due to the conditional independence induced by the Markov assumption in the AR(1) model, the likelihood of the parameters is given by
$$p(x \mid \alpha, \beta, \sigma^2) = \prod_{i=2}^{t} \varphi(x_i \mid \alpha_i + \beta x_{i-1}, \sigma^2), \tag{11}$$
where $\varphi(\cdot \mid \mu, \sigma^2)$ denotes the density function of a normal distribution with expectation $\mu$ and variance $\sigma^2$. Note that we have assumed a degenerate distribution with point mass 1 for the first entry in $x$. An alternative option is to estimate the unconditional distribution. For increasing length of the time series the difference between these two approaches will, however, vanish.
With (10) and (11) and the prior distributions specified in Section 2.1, the full conditional distribution of $\alpha$ can be derived analytically. Under the assumption that $\theta = 0$, as specified in Section 2.2.1 to regularize the long run mean, the full conditional distribution of $\alpha$ is a multivariate normal distribution (see Appendix A.1) whose mean is linear in the vector $\Delta$, which in this case denotes $\Delta = (x_2 - \beta x_1, \ldots, x_t - \beta x_{t-1}, 0, \ldots, 0)$.
As $\Delta$ incorporates data information up to time point (vector entry) $t$, is 0 for time points $> t$, and $\mathrm{Cov}(\alpha_{t+j}, \alpha_t) \to 0$ with increasing $j$, the mean of the full conditional distribution tends to 0, corresponding to the unconditional mean of the prior distribution. The covariance structure of the full conditional distribution behaves analogously. Therefore, the distribution of $\alpha_{t+j} \mid x, \beta, \sigma^2$ tends to the prior distribution in the long run. This means that the prior distribution of $\alpha$ effectively regularizes the distribution of $x$ in the long horizon towards the prespecified long run mean and long run variance. Note that the derivations are independent of the specific choice of $\Sigma$. If prior distribution assumptions for the parameters in $\Sigma$ are used, we need to further condition on the hyper-parameters for the full conditional distribution of $\alpha$.
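The decay of the full conditional mean towards 0 for future time points can be illustrated numerically. The sketch below is not the derivation from Appendix A.1; it assumes a centered stationary AR(1) prior for $\alpha$ (so that the prior precision matrix is tridiagonal), Gaussian residuals with known $\sigma^2$, and the standard Gaussian conjugate update in which the posterior precision equals the prior precision plus $1/\sigma^2$ on the observed entries. The function name and the input values are hypothetical.

```python
def alpha_conditional_mean(increments, n_future, sigma2, rho, tau2):
    """Mean of alpha | x, beta, sigma2 under a centered AR(1) prior on alpha.

    increments: observed values x_i - beta * x_{i-1}
    n_future:   number of unobserved future time points appended (>= 1)
    """
    n = len(increments) + n_future
    off = -rho / tau2                      # off-diagonal of the AR(1) precision
    diag, rhs = [], []
    for i in range(n):
        prior_prec = (1.0 if i in (0, n - 1) else 1.0 + rho ** 2) / tau2
        observed = i < len(increments)
        diag.append(prior_prec + (1.0 / sigma2 if observed else 0.0))
        rhs.append(increments[i] / sigma2 if observed else 0.0)
    # Thomas algorithm for the symmetric tridiagonal system
    # (posterior precision) * mean = rhs.
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = off / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    mean = [0.0] * n
    mean[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        mean[i] = d[i] - c[i] * mean[i + 1]
    return mean


# Hypothetical increments and parameter values.
mu = alpha_conditional_mean([0.5, 0.4, 0.6, 0.5], n_future=3,
                            sigma2=1.0, rho=0.9, tau2=0.04)
```

Under these assumptions each future entry of `mu` equals `rho` times the previous entry, so the mean shrinks geometrically towards the prior mean of 0 as the horizon grows.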

Full Conditional Distributions of ρ, β, σ 2
If we assume an AR-covariance structure with prior distributions for its parameters as specified in Section 2.2.2, the conditional distribution of $\rho$ is a truncated normal distribution. The conditional distribution of $\beta$ is likewise a truncated normal distribution, whose parameters depend on the data through $d_{t-j}$, which is defined by $d_{t-j} := x_{t-j} - \alpha_{t-j}$.
The conditional distribution of $\sigma^2$ is given by an inverse gamma distribution with updated shape and scale parameters. Note that this only holds if the prior of $\beta \mid \sigma^2$ is a normal distribution instead of a truncated normal distribution as assumed in Section 2.1. When using a truncated distribution assumption, the derivation of the full conditional of $\sigma^2$ is more intricate, as the prior distribution of $\beta$ also conditions on $\sigma^2$. Since our approach makes use of the full conditionals as proposal distributions in the Metropolis-Hastings part of our sampling routine, this simplification allows a more straightforward implementation, while we observe that values outside the given truncation are highly unlikely and practically occur with zero probability in our application.
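For orientation, the textbook conjugate updates for $\beta$ and $\sigma^2$ given $\alpha$ can be sketched as below. This is not a transcription of the expressions in the appendix: it assumes the untruncated prior $\beta \mid \sigma^2 \sim N(\mu_\beta, \sigma^2 \sigma_\beta^2)$, the prior $\sigma^2 \sim IG(a, b)$, and the regression $x_i - \alpha_i = \beta x_{i-1} + \varepsilon_i$. All function names are hypothetical.

```python
import random


def beta_conditional(x, alpha, sigma2, mu_beta, sigma_beta2):
    """Mean and variance of beta | sigma2, alpha, x (untruncated normal prior).

    x and alpha are aligned lists; alpha[0] is not used.
    """
    d = [x[i] - alpha[i] for i in range(1, len(x))]   # d_i = x_i - alpha_i
    lagged = x[:-1]                                   # x_{i-1}
    precision = 1.0 / sigma_beta2 + sum(v * v for v in lagged)
    mean = (mu_beta / sigma_beta2
            + sum(l * di for l, di in zip(lagged, d))) / precision
    return mean, sigma2 / precision


def sigma2_conditional(x, alpha, beta, a, b, mu_beta, sigma_beta2):
    """Shape and scale of the inverse gamma distribution of sigma2 | beta, alpha, x."""
    resid = [x[i] - alpha[i] - beta * x[i - 1] for i in range(1, len(x))]
    # The extra 1/2 in the shape stems from the prior of beta given sigma2.
    shape = a + (len(resid) + 1) / 2.0
    scale = (b + 0.5 * sum(r * r for r in resid)
             + (beta - mu_beta) ** 2 / (2.0 * sigma_beta2))
    return shape, scale


def draw_inverse_gamma(shape, scale, rng):
    """Draw from IG(shape, scale) as the reciprocal of a gamma variate."""
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)
```

In a sampler, a draw of $\beta$ outside the truncation interval would need separate handling (e.g., rejection); the text above notes that such values are practically never observed in the application.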

Markov Chain Monte Carlo Inference
In the following we again assume an AR-covariance structure for $\Sigma$ determined by the parameters $\rho$ and $\tau^2$ with prior distributions as specified in Section 2.2.2. To conduct inference, we use the Metropolis-Hastings within Gibbs sampler. More specifically, we generate samples from the posterior distribution by iteratively sampling from the full conditional distribution of $\alpha$ given a sample of $(\rho, \beta, \sigma^2)$ and vice versa. Based on the derivation of the full conditional distribution for $\alpha$ in the previous section, we are able to directly sample from a multivariate normal distribution to generate values for $\alpha$. To obtain a sample from $p(\rho, \beta, \sigma^2 \mid \alpha, x)$ conditional on $\alpha$, we apply the Metropolis-Hastings algorithm, as neither the joint distribution of $\rho$, $\beta$, $\sigma^2$ nor each single full conditional distribution is available. A suitable and already available proposal distribution $q$ for these parameters is given by the product of the conditional distributions of $\rho$, $\beta$ and $\sigma^2$ derived in the previous section, i.e., the product of all full conditional distributions under the assumption of a fixed $\tau^2$.
In the BTVC-AR(1) model we use this approach in a first step to draw from the joint posterior distribution $p(\alpha, \beta, \sigma^2 \mid x)$. A detailed description of the sampling routine can be found in Appendix B. In a second and final step, we use these samples to generate paths of the $x$-process as follows:
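The accept/reject logic of a single Metropolis-Hastings update can be sketched generically as follows. This is a textbook step, not the routine from Appendix B: the callables `log_target`, `draw_proposal` and `log_proposal` are hypothetical placeholders for the (unnormalized) posterior of $(\rho, \beta, \sigma^2)$ and the proposal $q$, and the toy target at the end only serves to make the example runnable.

```python
import math
import random


def mh_step(current, log_target, draw_proposal, log_proposal, rng):
    """One Metropolis-Hastings update.

    log_proposal(to, given) must return log q(to | given); normalizing
    constants that do not depend on the state may be dropped.
    """
    candidate = draw_proposal(current, rng)
    log_ratio = (log_target(candidate) - log_target(current)
                 + log_proposal(current, candidate)
                 - log_proposal(candidate, current))
    accept = rng.random() < math.exp(min(0.0, log_ratio))
    return (candidate if accept else current), accept


# Toy check: standard normal target with an independent N(0, 2^2) proposal.
def log_target(v):
    return -0.5 * v * v


def draw_proposal(_current, rng):
    return rng.gauss(0.0, 2.0)


def log_proposal(to, _given):
    return -0.5 * (to / 2.0) ** 2


rng = random.Random(1)
state, chain = 0.0, []
for _ in range(5000):
    state, accepted = mh_step(state, log_target, draw_proposal, log_proposal, rng)
    chain.append(state)
```

Within a Gibbs sweep, such a step for $(\rho, \beta, \sigma^2)$ would alternate with a direct draw of $\alpha$ from its multivariate normal full conditional.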
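A minimal sketch of such a path simulation is given below. It assumes an AR-covariance structure for $\alpha$, so that the latent constant can be propagated forward by $\alpha_{t+h} = \rho\,\alpha_{t+h-1} + \eta_{t+h}$, and it takes one posterior draw of the parameters and of the last latent value as input. The function names and the tuple layout of the draws are hypothetical.

```python
import random


def simulate_path(x_last, alpha_last, beta, sigma2, rho, tau2, horizon, rng):
    """Simulate one future path of x for a single posterior draw."""
    x, alpha, path = x_last, alpha_last, []
    for _ in range(horizon):
        alpha = rho * alpha + rng.gauss(0.0, tau2 ** 0.5)   # latent constant
        x = alpha + beta * x + rng.gauss(0.0, sigma2 ** 0.5)
        path.append(x)
    return path


def mean_forecast(paths):
    """Pointwise average of simulated paths, used as the point forecast."""
    return [sum(step) / len(paths) for step in zip(*paths)]


# Hypothetical posterior draws: (alpha_last, beta, sigma2, rho, tau2).
draws = [(-0.4, 0.95, 1.0, 0.98, 0.02), (-0.3, 0.94, 1.1, 0.98, 0.03)]
rng = random.Random(0)
paths = [simulate_path(-15.0, a, b, s2, r, t2, horizon=12, rng=rng)
         for a, b, s2, r, t2 in draws]
forecast = mean_forecast(paths)
```

Using one path per posterior draw propagates parameter uncertainty into the predictive distribution; quantiles of `paths` at each horizon give predictive intervals.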

Application To Interest Rate Data
We now apply the BTVC-AR(1) model to the first principal component (PC) of a principal component analysis (PCA) on interest rate data to predict the term structure of interest rates and compare it to the 2-Additive-Factor Gaussian (Gauss2++) model [see, e.g., 3] and the dynamic Nelson-Siegel model [7] with respect to the forecasting performance and the long run distribution.

Motivation and Background
The Gauss2++ model is a popular short-rate model in the insurance industry, used, e.g., to classify certified pension contracts into risk classes. Because its mean reversion level is calibrated to external interest rate forecasts, it generates realistic interest rates in the long horizon, which is a necessary model feature for insurance companies, as they are obliged to calculate risk measures and performance scenarios for specific insurance contracts for up to 40 years [8]. Nevertheless, Diebold and Li [7] point out that short-rate models perform poorly in forecasting. Their dynamic Nelson-Siegel model shows a better forecasting performance than the Gauss2++ model in the short horizon, but can produce unrealistic interest rates in the very long horizon. Our model, which we call the BTVC-AR(1)-Factor model in the following as it applies the BTVC-AR(1) model to the first PC of a PCA, combines both: a good forecasting performance in the short horizon and realistic interest rates in the long horizon. It further accounts for the strong autocorrelation and the (close to) random walk behavior of interest rates.

Data
We use data on the German term structure of interest rates estimated by the Deutsche Bundesbank from prices of German government bonds. The exact estimation procedure can be found in [23]. The time span ranges from September 1997 to August 2016. Figure 2 shows the monthly evolution of the interest rate curves. In the last ten to fifteen years a decrease of the interest rates can be observed. Each maturity represents a dimension in the data set. We use PCA to reduce the dimension of the data set for the following reason. According to Litterman and Scheinkman [18], a three-factor model can explain at least 96% of the variability of each interest rate with a specific maturity. We here extract these (principal) factors but only use the first two to facilitate a fair comparison with the Gauss2++ model, which is a two-factor model. Furthermore, the first two PCs already account for more than 99% of the variability in the given data. Figure 3 shows the loadings and the time series of the two extracted PCs. The loadings of the first PC are similar for all 20 maturities, while the loadings of the second PC are positive for short and negative for long maturities. The first and the second PC are therefore often interpreted as level and slope of the term structure, respectively.
The decrease of the interest rates in recent years is also visible in the level factor, which shows a downward trend. There is an ongoing discussion in the literature about mean reversion of interest rates. Economic theory predominantly assumes that interest rates are mean reverting (in the long run), but the statistical evidence is less clear [29]. The mainstream literature finds that unit roots cannot be rejected, which would imply that interest rates are not mean reverting [24,22,25,4]. More recent literature investigates the unit root hypothesis by fractional integration techniques that apply differencing to time series by an order smaller than or greater than one [2,10]. These studies find that shocks to interest rates have a long memory, which explains their (close to) random walk behavior.

Estimation of Model Parameters
In this subsection the estimation of the BTVC-AR(1)-Factor model and the two benchmark models is described.

Modeling Interest Rates with the BTVC-AR(1)-Factor Model
The factors of our BTVC-AR(1)-Factor model are the first two PCs extracted by a PCA and interpreted as level and slope of the interest rate curve. The level factor shows a (close to) random walk behavior, which cannot be adequately captured by a stationary linear model. Following the view of economic theory that interest rates (and therefore also the level) are mean reverting (in the long run), and assuming that the random walk behavior results from changes in the mean reversion level, we therefore use the BTVC-AR(1) model for this PC. It allows us to account for the (close to) random walk behavior as well as to regularize the level of the interest rate curve in the long horizon via prior assumptions. The slope factor is more stable over time. As an augmented Dickey-Fuller test suggests that the existence of a unit root can be rejected, a linear AR(1) model is used for this factor. By modeling the level and the slope factor, interest rate forecasts $\hat{r}_t(\tau)$ with maturity $\tau$ can be calculated via
$$\hat{r}_t(\tau) = \mu(\tau) + \xi_1(\tau)\, \hat{l}_t + \xi_2(\tau)\, \hat{s}_t,$$
where $\hat{l}_t$ and $\hat{s}_t$ denote the forecasts of the level and the slope factor, respectively. $\xi_1(\tau)$ and $\xi_2(\tau)$ denote the loadings of the first and second PC for maturity $\tau$. Before applying the PCA the data has been centered, and therefore $\mu(\tau)$ is the mean interest rate of the data set for maturity $\tau$. We now specify the prior assumptions of the BTVC-AR(1) model for the level factor and the estimation procedure of the AR(1) model for the slope factor.
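The mapping from factor forecasts back to an interest rate curve is a simple linear reconstruction, sketched below under the assumption that the curve is rebuilt as the maturity-wise mean plus the two loadings times the factor forecasts. The function name and all numbers are hypothetical and serve only as an illustration.

```python
def forecast_rates(mean_curve, loadings_level, loadings_slope, level, slope):
    """Rebuild a rate forecast per maturity from level and slope forecasts."""
    return [m + l1 * level + l2 * slope
            for m, l1, l2 in zip(mean_curve, loadings_level, loadings_slope)]


# Two hypothetical maturities: similar level loadings, slope loadings of
# opposite sign for the short and the long maturity.
curve = forecast_rates(mean_curve=[1.0, 2.0],
                       loadings_level=[0.5, 0.5],
                       loadings_slope=[0.5, -0.5],
                       level=2.0, slope=1.0)
```

With both factor forecasts equal to 0, the function returns the mean curve, which mirrors the statement that the long run curve converges in expectation to the average curve of the data set when both factors are centered.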

The Level Factor
Latent AR(1) constant α. For this application we assume an AR covariance structure for the α-process of the BTVC-AR(1) model, with the parameters ρ and $\tau^2$ representing the correlation between two successive time points and the conditional variance, respectively. The unconditional mean of the α-process is set to 0, which implies that the long-run mean of the level factor is 0. Because we also assume that the slope factor is a centered process, the long-run interest rate curve converges in expectation to the average interest rate curve of the dataset.
Autocorrelation parameter ρ. As specified in Section 2.2.2, we use a truncated normal distribution as hyper-prior for ρ, with parameters $\mu_\rho = 0.98$ and $\sigma_\rho^2 = 0.001^2$, lower truncation −1 and upper truncation 1, i.e., $\rho \sim N(0.98,\, 0.001^2,\, -1,\, 1)$. The truncation ensures the stationarity of the process. The parameters of this hyper-prior rely on expert judgment and encode the assumption of a weakly mean-reverting α-process. They therefore allow the mean reversion level of the level factor to deviate from the long-run mean for longer periods. This yields the (close to) random walk behavior present in (our) interest rate data.
Variance of the latent process. According to Section 2.2.2, the parameter $\tau^2$ is set in each iteration of the sampling procedure such that the long-run variance of the level factor equals a prespecified value. Here we use the value 120, which is inferred from a quantile of the unconditional distribution: in view of the rather unusual market situation of extremely low interest rates, we assume that the last observation is equal to the 7.5% quantile. Under the model assumptions, the unconditional distribution is normal with mean 0, so the corresponding unconditional variance can be calculated easily.
Slope parameter of the AR(1) model. For β we assume $\mu_\beta = 0.95$ and $\sigma_\beta^2 = 0.015^2$. This expert judgment represents a weak mean reversion to the time-varying mean reversion level. The lower truncation of the truncated normal distribution is −1, and the upper truncation is chosen so as to ensure the stationarity of the model as well as the positivity of $\tau^2$.
Residual variance. For the prior distribution of $\sigma^2$, the shape and scale parameters a and b are set to 0.5 and 2, respectively, representing an uninformative prior.
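The two variance computations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Section 2.2.2: the level factor is assumed to follow $x_t = \alpha_t + \beta x_{t-1} + \varepsilon_t$ with $\alpha_t = \rho\,\alpha_{t-1} + u_t$ and $\mathrm{Var}(u_t) = \tau^2$, the function names are hypothetical, and the last observation of −15.77 is a made-up value chosen only so that the implied variance is close to 120.

```python
from statistics import NormalDist


def variance_from_quantile(last_obs: float, prob: float) -> float:
    """Variance v of N(0, v) whose `prob`-quantile equals `last_obs`."""
    z = NormalDist().inv_cdf(prob)  # standard normal quantile
    if z == 0 or last_obs / z <= 0:
        raise ValueError("observation and quantile level are incompatible")
    return (last_obs / z) ** 2


def long_run_variance(tau2: float, beta: float, rho: float, sigma2: float) -> float:
    """Stationary variance of x_t under the ASSUMED model in the lead-in."""
    var_alpha = tau2 / (1 - rho**2)
    return (sigma2 + var_alpha * (1 + beta * rho) / (1 - beta * rho)) / (1 - beta**2)


def tau2_from_long_run_variance(v: float, beta: float, rho: float, sigma2: float) -> float:
    """Invert long_run_variance for tau2; a non-positive result means that
    (beta, sigma2) are incompatible with the target variance v."""
    var_alpha = (v * (1 - beta**2) - sigma2) * (1 - beta * rho) / (1 + beta * rho)
    return var_alpha * (1 - rho**2)


# Hypothetical last level-factor score, assumed to sit at the 7.5% quantile.
target_variance = variance_from_quantile(last_obs=-15.77, prob=0.075)
# Illustrative parameter values (prior means for beta and rho, sigma2 made up).
tau2 = tau2_from_long_run_variance(target_variance, beta=0.95, rho=0.98, sigma2=1.0)
```

Under this assumed parameterization, $\tau^2$ is positive only if $v(1-\beta^2) > \sigma^2$, which illustrates why an upper bound on β is needed in addition to the stationarity condition.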
By specifying the parameters of the prior (and hyper-prior) distributions, the full conditional distribution of α as well as the conditional distributions of the other parameters can be derived analytically as described in Section 2.3. Combining the Gibbs Sampler and the Metropolis-Hastings algorithm as explained in Section 2.4, paths of the level factor can be generated. Forecasts of the level factor are then given by the average of the simulated paths.
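The last step, forecasting by averaging simulated paths, can be sketched as below. This is a simplified illustration: it assumes the recursion $\alpha_t = \rho\,\alpha_{t-1} + u_t$, $x_t = \alpha_t + \beta x_{t-1} + \varepsilon_t$ and holds all parameters fixed, whereas a full posterior predictive simulation would use a separate MCMC draw of the parameters for each path. The function name and arguments are hypothetical.

```python
import numpy as np


def simulate_level_paths(x_last, alpha_last, beta, rho, tau2, sigma2,
                         horizon, n_paths, seed=0):
    """Simulate future level-factor paths under the ASSUMED recursion
    alpha_t = rho*alpha_{t-1} + u_t,  x_t = alpha_t + beta*x_{t-1} + eps_t."""
    rng = np.random.default_rng(seed)
    alpha = np.full(n_paths, alpha_last, dtype=float)
    x = np.full(n_paths, x_last, dtype=float)
    paths = np.empty((n_paths, horizon))
    for h in range(horizon):
        alpha = rho * alpha + rng.normal(0.0, np.sqrt(tau2), n_paths)
        x = alpha + beta * x + rng.normal(0.0, np.sqrt(sigma2), n_paths)
        paths[:, h] = x
    return paths


# Point forecast for each horizon: average over the simulated paths.
paths = simulate_level_paths(x_last=-15.0, alpha_last=-0.7, beta=0.95, rho=0.98,
                             tau2=0.015, sigma2=1.0, horizon=12, n_paths=5000)
forecast = paths.mean(axis=0)
```

The same array of paths also yields predictive quantiles, for example via `np.quantile(paths, [0.05, 0.95], axis=0)`.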

The Slope Factor
The linear AR(1) model for the slope factor is given by $s_t = c + \gamma s_{t-1} + \eta_t$, where $s_t$ denotes the slope factor at time t, γ is a real-valued constant between −1 and 1 and $\eta_t$ is a Gaussian white noise process, i.e., $\eta_t \overset{\text{i.i.d.}}{\sim} N(0, \tilde{\sigma}^2)$. The constant parameter c is set to 0. The other parameters are estimated by ordinary least squares.
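For an AR(1) model without constant, the ordinary least squares estimate of the autoregressive coefficient has a closed form. The following sketch shows one way to compute it together with a residual variance estimate; the function name is hypothetical and the degrees-of-freedom correction is a choice made here, not taken from the paper.

```python
import numpy as np


def fit_ar1_no_intercept(s):
    """OLS fit of s_t = gamma * s_{t-1} + eta_t (constant fixed at 0)."""
    s = np.asarray(s, dtype=float)
    y, x = s[1:], s[:-1]
    gamma = float(x @ y / (x @ x))      # closed-form OLS slope
    resid = y - gamma * x
    # One estimated coefficient, hence len(y) - 1 degrees of freedom.
    sigma2 = float(resid @ resid / (len(y) - 1))
    return gamma, sigma2
```

The estimate is not constrained to lie between −1 and 1, so a check of `abs(gamma) < 1` after fitting is advisable.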

Modeling Interest Rates With the Gauss2++ Model
The Gauss2++ model, in a different representation also known as the 2-Factor-Hull-White model, is a popular interest rate model in the insurance industry, used for pricing interest rate derivatives as well as for risk management and forecasting purposes. The model assumes that the short-rate r(t), which is the interest rate with an infinitesimally small maturity, is given by the sum of two latent processes $(x(t))_{t \ge 0}$ and $(y(t))_{t \ge 0}$ and a deterministic function ϕ: r(t) = x(t) + y(t) + ϕ(t).
The latent processes are modeled by dependent Ornstein-Uhlenbeck processes, which are the continuous-time counterpart of a linear AR(1) process. Interest rates with longer maturities are then derived from the short-rate by pricing the corresponding zero-coupon bonds, which is analytically possible due to the model's distributional assumptions. The estimation process differs materially from that of the other two models: it does not use historical data but calibrates the model to current market expectations about the future, which are (implicitly) provided by the current interest rate curve, interest rate derivatives and interest rate forecasts. Using the downhill simplex algorithm, the parameters of the model are chosen such that forward rates (implicitly given by the current interest rate curve) and swaption prices are met in expectation. The relevant data have been extracted from Bloomberg. Additionally, the mean reversion levels of the two latent factors are set analytically such that two interest rate forecasts published by the OECD, for maturities of 3 months and 10 years, are met in expectation. This approach is in line with the standard calibration procedure in the insurance industry.
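The link between Ornstein-Uhlenbeck processes and the AR(1) model can be made concrete with the exact discretization on a time grid of step `dt`. The sketch below assumes the common parameterization $dx = -a\,x\,dt + \sigma\,dW_1$, $dy = -b\,y\,dt + \eta\,dW_2$ with instantaneous correlation `rho` between the two Brownian motions. The parameter names, the numerical values and the constant shift function are assumptions for illustration and are not calibrated values from the paper.

```python
import numpy as np


def ou_step_params(a, b, sigma, eta, rho, dt):
    """AR(1) coefficients and shock covariance of the exact discretization
    of two correlated, zero-mean Ornstein-Uhlenbeck processes."""
    phi_x, phi_y = np.exp(-a * dt), np.exp(-b * dt)
    var_x = sigma**2 * (1 - np.exp(-2 * a * dt)) / (2 * a)
    var_y = eta**2 * (1 - np.exp(-2 * b * dt)) / (2 * b)
    cov_xy = rho * sigma * eta * (1 - np.exp(-(a + b) * dt)) / (a + b)
    return (phi_x, phi_y), np.array([[var_x, cov_xy], [cov_xy, var_y]])


def simulate_short_rate(a, b, sigma, eta, rho, shift, dt, n_steps, seed=0):
    """Simulate r(t) = x(t) + y(t) + shift(t) on an equidistant grid."""
    rng = np.random.default_rng(seed)
    (phi_x, phi_y), cov = ou_step_params(a, b, sigma, eta, rho, dt)
    shocks = rng.multivariate_normal(np.zeros(2), cov, size=n_steps)
    x = y = 0.0
    r = np.empty(n_steps)
    for k in range(n_steps):
        x = phi_x * x + shocks[k, 0]   # AR(1) step with coefficient exp(-a*dt)
        y = phi_y * y + shocks[k, 1]   # AR(1) step with coefficient exp(-b*dt)
        r[k] = x + y + shift((k + 1) * dt)
    return r


# Monthly grid over two years with made-up parameters and a flat shift.
rates = simulate_short_rate(0.5, 0.1, 0.01, 0.008, -0.7,
                            shift=lambda t: 0.01, dt=1 / 12, n_steps=24)
```

A larger mean reversion speed `a` gives a smaller AR(1) coefficient `exp(-a*dt)`, that is, a stronger pull toward the mean reversion level.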

Modeling Interest Rates With the Dynamic Nelson-Siegel Model
The dynamic Nelson-Siegel model of Diebold and Li [7] applies specific time series models to extracted latent factors. Diebold and Li tested several time series models on the level, slope and curvature factors of the Nelson-Siegel interest rate curve and compared their forecasting performance [7]. In this paper we follow one of their approaches, in which they apply a PCA to interest rate data and use a univariate linear AR(1) process for each of the first three PCs. For comparability with the other two models in this paper, which are two-factor models, we use only the first two PCs. The parameters of the AR(1) model are estimated by ordinary least squares.
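The PCA-plus-AR(1) approach can be sketched in a few lines. The code below is an illustrative reading of the description above, not the authors' implementation: it assumes that the curves are centered before the PCA, that each AR(1) includes an unrestricted constant, and that point forecasts are obtained by iterating the fitted recursion. All function names are hypothetical.

```python
import numpy as np


def pca_scores(yields, n_components=2):
    """PCA of a (time x maturity) yield matrix via SVD of the centered data."""
    mean_curve = yields.mean(axis=0)
    centered = yields - mean_curve
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components].T          # shape (n_maturities, k)
    scores = centered @ loadings            # shape (n_times, k)
    return mean_curve, loadings, scores


def fit_ar1(z):
    """OLS fit of z_t = c + phi * z_{t-1} + noise; returns (c, phi)."""
    design = np.column_stack([np.ones(len(z) - 1), z[:-1]])
    coef, *_ = np.linalg.lstsq(design, z[1:], rcond=None)
    return coef[0], coef[1]


def forecast_curve(yields, horizon, n_components=2):
    """Iterate each PC's AR(1) `horizon` steps ahead and map back to yields."""
    mean_curve, loadings, scores = pca_scores(yields, n_components)
    future = np.empty(n_components)
    for j in range(n_components):
        c, phi = fit_ar1(scores[:, j])
        z = scores[-1, j]
        for _ in range(horizon):
            z = c + phi * z
        future[j] = z
    return mean_curve + loadings @ future
```

Given a `yields` array with one row per month and one column per maturity, `forecast_curve(yields, horizon=12)` returns the predicted curve one year ahead.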

Backtest
We now compare the forecasting performance of the BTVC-AR(1)-Factor model, the Gauss2++ model and the dynamic Nelson-Siegel model and analyse their long run distributions of the 10-year interest rate.

Comparison of the Forecasting Performance
For the out-of-sample backtest we apply an expanding window approach. The data of the first 10 years of observations are used to estimate the parameters of the BTVC-AR(1)-Factor model and the dynamic Nelson-Siegel model as described in Section 3.3. The Gauss2++ model is calibrated to the current market data. We then forecast the interest rates for the maturities of 1, 3, 5 and 10 years (representing the interest rate curve) for the horizons of 1, 3, 6 and 12 months. We expand the training sample by one month and repeat the procedure. This is done until 12 months before the last observation in the data set. To evaluate the forecasting performance, the error between the predicted interest rate $\hat{r}_\tau(t)$ and the actual interest rate $r_\tau(t)$ with maturity τ is calculated, i.e., $\mathrm{error}_\tau(t) = r_\tau(t) - \hat{r}_\tau(t)$.

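Tables C.1 to C.4 in Appendix C show the mean and the standard deviation of this error for each model. In addition, the root mean squared error (RMSE) is calculated as $\mathrm{RMSE}_\tau = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \mathrm{error}_\tau(t_i)^2}$, where N is the number of forecasts conducted in the backtest and $t_1, \dots, t_N$ are the corresponding forecast dates.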
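The expanding window procedure and the error statistics can be sketched for a single maturity and horizon as follows. The forecaster used here is a naive last-value rule that merely stands in for any of the three models; the function names are hypothetical, and the loop runs up to the last date for which the realized value is available, which differs slightly from stopping 12 months before the end for every horizon.

```python
import numpy as np


def expanding_window_errors(series, initial_window, horizon, forecaster):
    """Errors (actual - predicted) from an expanding-window backtest."""
    errors = []
    for end in range(initial_window, len(series) - horizon + 1):
        predicted = forecaster(series[:end], horizon)   # fit on data up to end-1
        actual = series[end + horizon - 1]              # value `horizon` steps later
        errors.append(actual - predicted)
    return np.array(errors)


def summarize(errors):
    """Mean, standard deviation and RMSE of the forecast errors."""
    return {
        "mean": float(np.mean(errors)),
        "std": float(np.std(errors)),                   # population std (ddof=0)
        "rmse": float(np.sqrt(np.mean(errors**2))),
    }


def naive_forecast(history, horizon):
    """Placeholder model: predict the last observed value for any horizon."""
    return history[-1]
```

With the population standard deviation used here, the three statistics satisfy RMSE² = mean² + std², so a large RMSE can stem from a systematic bias, from dispersion, or from both.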
The RMSE for the 1-month-ahead forecasts is similar for all three models. For longer forecasting horizons the Gauss2++ model shows the highest RMSE. For example, for the 10-year interest rate the RMSE of the Gauss2++ model is approximately twice as high as that of the other two models for the 6-month-ahead forecast and more than three times as high for the 12-month-ahead forecast. This supports the statement of Diebold and Li [7] that short-rate models perform poorly in forecasting. However, the performance of the Gauss2++ model depends strongly on the interest rate forecasts used in the calibration process. The predominantly negative mean error suggests that the OECD forecasts have been too optimistic in the past.
The results of the BTVC-AR(1)-Factor model and the dynamic Nelson-Siegel model are more consistent. For the 1-month forecasting horizon the BTVC-AR(1)-Factor model shows a slightly lower RMSE except for the 10-year interest rate. For the 3-, 6- and 12-month forecasting horizons the BTVC-AR(1)-Factor model shows a lower RMSE for the short maturities but a higher RMSE for the longer maturities compared to the dynamic Nelson-Siegel model. Note that the dynamic Nelson-Siegel model anticipated the downward trend present in the last years, which might have been beneficial for its forecasting performance in the past but also produces unrealistic interest rates in the long horizon. In contrast, the BTVC-AR(1)-Factor model is forced to mean revert to a prespecified level in order to regularize the interest rates in the long horizon. It can therefore follow the current trend only for a couple of time steps, which might explain the slightly worse performance for the 6- and 12-month forecasting horizons. The fact that the RMSE is still similar to that of the dynamic Nelson-Siegel model suggests that this does not affect the forecasting performance in the short horizon much.

Comparison of the Distribution in the Long Run
We further investigate the interest rate distribution in the long horizon. This is especially important for insurance companies, as risk measures and performance scenarios for their products have to be calculated for up to 40 years [8]. We therefore fit all three models on all data points up to the last observation date of the data set. We then simulate paths of the 10-year interest rate and visualize the distribution in 40 years (see Figure 4). The median of the dynamic Nelson-Siegel model amounts to approximately -10%, a value that is not realistic for the 10-year interest rate. In comparison, the distributions of the BTVC-AR(1)-Factor model and the Gauss2++ model seem to be more realistic, as they lie mainly between 0% and 10%. The standard deviation of the Gauss2++ model is much smaller than that of the BTVC-AR(1)-Factor model, and as its median is quite high, negative values are not reached by this model. This is due to the fact that the Gauss2++ model assumes a stronger mean reversion than historical data would suggest. The (close to) random walk behavior is better captured by the BTVC-AR(1)-Factor model, leading to a prediction range that fits historical observations quite well. This is due to the regularization of the mean and the standard deviation of the BTVC-AR(1)-Factor model induced by appropriate prior assumptions, which represents the main difference to other interest rate models.
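One way to see how a linear AR(1) model with an unrestricted constant can drift to implausible levels is to look at its implied long-run moments. For the stationary AR(1) from the Introduction, $x_t = \alpha + \beta x_{t-1} + \varepsilon_t$ with $|\beta| < 1$, the long-run mean is $\alpha / (1 - \beta)$ and the long-run variance is $\sigma^2 / (1 - \beta^2)$. The sketch below evaluates these formulas. The numbers are purely illustrative and are not estimates from the paper, and the dynamic Nelson-Siegel model applies the AR(1) to PC scores rather than to the rate itself.

```python
def ar1_long_run_moments(alpha, beta, sigma2):
    """Long-run mean and variance of x_t = alpha + beta*x_{t-1} + eps_t."""
    if abs(beta) >= 1:
        raise ValueError("the process is stationary only for |beta| < 1")
    return alpha / (1 - beta), sigma2 / (1 - beta**2)


def conditional_mean(x_now, alpha, beta, steps):
    """E[x_{t+steps} | x_t = x_now] for the same AR(1) model."""
    mu = alpha / (1 - beta)
    return mu + beta**steps * (x_now - mu)


# Illustrative only: a small negative constant combined with beta close to 1
# implies a long-run mean far below the current level.
mean_lr, var_lr = ar1_long_run_moments(alpha=-0.05, beta=0.995, sigma2=0.04)
mean_40y = conditional_mean(x_now=2.0, alpha=-0.05, beta=0.995, steps=480)
```

Because β is close to 1, a small change in the estimated constant moves the long-run mean by a large amount, which is why fixing the mean reversion level through prior assumptions stabilizes the far horizon.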

Conclusion
In this paper we introduced a new Bayesian framework for the TVC-AR(1) model that is particularly suitable for nearly integrated time series for which a linear model cannot be estimated in a way that is consistent with economic theory or historical observations. In these cases a (close to) random walk behavior can be an indication of a missing variable, which we account for by using a non-linear model. The time-varying constant of the BTVC-AR(1) model allows a stochastic mean reversion level, leading to realizations that exhibit a random walk behavior although the process is stationary and does not have an exploding long-run variance. Additionally, the Bayesian approach makes it possible to incorporate prior assumptions about the long-run distribution into the model without adversely affecting the short-term predictions. This makes it possible to include expert knowledge or well-known economic facts about the long-term behavior of the time series in a model that is otherwise fully data-driven in the short-term forecast.
We apply the proposed approach to interest rate data. We find that the BTVC-AR(1)-Factor model, which applies a BTVC-AR(1) model to the first PC of a PCA, shows a forecasting performance similar to that of the dynamic Nelson-Siegel model in the short horizon but, in contrast, produces realistic interest rates in the very long horizon. It also yields better forecasts than the Gauss2++ model.
The presented framework allows for many different specifications and is, in particular, flexible in terms of the assumed covariance structure of the latent α process in the model.In this paper we propose an AR-covariance structure and explain how model parameters can be inferred in this special case.Investigating other covariance structures may further improve the forecasting performance in the short horizon while still regularizing the distribution in the long run.

Figure 1: A comparison of a linear AR(1) model with no restrictions on the constant parameter, a linear AR(1) model with restrictions on the constant parameter and a BTVC-AR(1) model, each applied to a simulated time series.

Figure 2: Time series of the term structure of German government bond yields.

Figure 3: The scores and the loadings of the first two PCs.

Figure 4: Comparison of the distributions of the 10-year interest rate in 40 years modeled by the dynamic Nelson-Siegel model, the Gauss2++ model and the BTVC-AR(1)-Factor model.