### Abstract

- Abstract
- 1. INTRODUCTION
- 2. A POSTERIOR SIMULATOR FOR SMOOTH TRANSITION AUTOREGRESSIONS
- 3. BAYESIAN ESTIMATION OF MARKOV SWITCHING AUTOREGRESSIONS
- 4. BAYESIAN SPECIFICATION SEARCH
- 5. POSTERIOR PREDICTIVE *P*-VALUES
- 6. MCMC ESTIMATION
- 7. MAXIMUM LIKELIHOOD ESTIMATION
- 8. FORECAST EVALUATION
- 9. DISCUSSION AND CONCLUSIONS
- Acknowledgements
- REFERENCES
- Supporting Information

Logistic smooth transition and Markov switching autoregressive models of a logistic transform of the monthly US unemployment rate are estimated by Markov chain Monte Carlo methods. The Markov switching model is identified by constraining the first autoregression coefficient to differ across regimes. The transition variable in the LSTAR model is the lagged seasonal difference of the unemployment rate. Out-of-sample forecasts are obtained from Bayesian predictive densities. Although both models provide very similar descriptions, Bayes factors and predictive efficiency tests (both Bayesian and classical) favor the smooth transition model. Copyright © 2008 John Wiley & Sons, Ltd.

### 1. INTRODUCTION


US unemployment is characterized by relatively brief periods of rapid economic contraction (rising unemployment) followed by relatively extended periods of slow economic expansion (falling unemployment). In the recent literature, several studies (Rothman, 1998; Montgomery *et al.*, 1998; Koop and Potter, 1999; Van Dijk *et al.*, 2002) have attempted to capture this salient feature by means of well-known nonlinear models such as threshold autoregression (TAR), the closely related logistic smooth transition autoregression (LSTAR), and Markov switching autoregression (MSAR). The first two papers attempt to base model choice on a comparison of forecasting performance.

There are two important conceptual differences between the MSAR and the TAR or LSTAR models. First, MSAR incorporates less prior information than TAR and LSTAR. Indeed, a filtered or smoothed regime probability in an MSAR model can be interpreted as a transition function which is estimated flexibly from the data. By contrast, specifying the transition function in a TAR or LSTAR model necessitates the choice of a transition variable (a difficult problem). Secondly, regime changes are predetermined in a TAR or LSTAR model, but are exogenous in the MSAR: in the latter, even if the model parameters were known, these changes could not be predicted with certainty from past data due to the presence of additional disturbances (in the Markov evolution equation). It is of interest to investigate whether the added flexibility and complexity of the MSAR model result in a superior predictive ability.

For the reasons given by West and McCracken (1998), it may be important to base such an investigation on small-sample predictive densities that take parameter uncertainty into account. In the context of maximum likelihood estimation, this requires using the bootstrap, and involves the repeated estimation of nonlinear models by local optimization algorithms. Convergence difficulties may make this impractical: see, for example, Chan and McAleer (2002, 2003). These difficulties do not arise if Bayesian methods are used: Markov chain Monte Carlo (MCMC) is used for simulating the joint posterior, and the resulting parameter replications are used for dynamic simulations of future observations.

In a Bayesian context, a prior identification constraint on the parameters of Markov switching models should be imposed; without such a constraint the posterior is multimodal, and an apparently unimodal posterior can arise only spuriously, when mixing in the MCMC sampler is poor. The constraint can take the form θ_{1} < θ_{2} < ⋯ < θ_{K}, where θ_{i} is a particular population parameter in regime *i* and *K* is the number of regimes. The permutation sampler proposed by Frühwirth-Schnatter (2001) provides an effective procedure both for choosing an appropriate prior identification constraint and for subsequently imposing it. An MCMC posterior simulator for LSTAR models has been proposed by Lopes and Salazar (2006).

Among the authors mentioned in the first paragraph, only Montgomery *et al.* (1998) investigate the forecasting performance of an MSAR model; and only Koop and Potter (1999) fully rely on Bayesian methods. Even though the MSAR model in Montgomery *et al.* (1998) is estimated by the Gibbs sampler, the presentation is frequentist: the authors do not present their prior specification (including those aspects of the prior that are relevant for model identification) and only discuss point estimates and point forecasts. Koop and Potter (1999) provide a thorough Bayesian treatment of a TAR model of US unemployment; however, they do not compare its forecasting performance with that of an MSAR model.

An LSTAR formulation can approximate the TAR models used in Montgomery *et al.* (1998) and Koop and Potter (1999), but can be estimated with standard econometric software (contrary to the TAR and MSAR models), and is therefore particularly convenient. Van Dijk *et al.* (2002) have been the only authors to investigate the forecasting accuracy of an LSTAR model of the US unemployment rate. However, they do not update the parameter estimates as new observations become available, presumably for the reasons given in our third paragraph.

As pointed out by Koop and Potter (1999), there are important benefits in using a logistic transformation of the unemployment rate. This transformation not only guarantees that predictions are restricted to the unit interval (an important consideration if the emphasis is on predictive densities), but also removes the strong residual leptokurticity which plagues a model estimated from untransformed data. Among the four contributions mentioned in the first paragraph, only the paper by Koop and Potter (1999) uses such a transformation.

On these grounds, and since the permutation sampler has only recently become available, it may be argued that the potential of the LSTAR and MSAR models for predicting the US unemployment rate should be examined in more detail, and that true Bayesian predictive densities should be used in the investigation. This is the twofold objective of this paper.

An outline follows. Section 2 presents an MCMC posterior simulator for LSTAR models. It differs from the previous one in two respects. First, an independence Metropolis–Hastings chain is used, rather than the random walk chain used by Lopes and Salazar (2006). Secondly, the autoregressive order *p* and transition delay parameter *d* are assumed to be fixed (whereas one of the algorithms proposed by Lopes and Salazar is defined on a space that includes *p* and *d*). In our approach, we propose to choose *p* and the transition variable (or function) according to the criterion of highest marginal likelihood; some potential advantages are discussed. Section 2 therefore also describes our application of the bridge sampling method of Meng and Wong (1996) to the estimation of marginal likelihoods in a STAR model.

Section 3 briefly describes the MCMC estimation of the MSAR model and the bridge sampling estimation of marginal likelihoods for this model.

Section 4 presents estimated marginal likelihoods for 54 possible LSTAR, MSAR, and autoregressive (AR) models, where the dependent variable is a logistic transformation of the monthly US unemployment rate; a sensitivity analysis with respect to the prior parameters is done.

Section 5 discusses Bayesian misspecification diagnostics for the LSTAR and MSAR models that were found, in Section 4, to have the highest marginal likelihoods. The diagnostics are based on posterior predictive *p*-values for three relevant misspecification indicators.

Section 6 presents the MCMC estimates of the chosen LSTAR model and of its MSAR counterpart; some economic implications of the estimates are discussed.

Section 7 presents, for comparison purposes, maximum likelihood estimates of the models in Section 6, and diagnostics based on generalized residuals.

Finally, Section 8 attempts to discriminate between the MSAR, the LSTAR, and a benchmark AR model by means of simulated out-of-sample prediction exercises. For each model, Bayesian predictive densities are estimated from expanding windows of observations and for horizons of 1 to 6 months. Diagnostics based on probability integral transforms (Diebold *et al.*, 1998; Berkowitz, 2001), on one of the test statistics proposed by Diebold and Mariano (1995), and on efficiency tests based on regressions of observations on point predictions are reported; versions of the efficiency tests are analyzed from both classical and Bayesian standpoints. Section 9 concludes.

### 2. A POSTERIOR SIMULATOR FOR SMOOTH TRANSITION AUTOREGRESSIONS


The two-state LSTAR model introduced by Teräsvirta (1994) may be written as

*y*_{t} = α_{1} + ∑_{j=1}^{p} ϕ_{1j}*y*_{t−j} + *G*(*s*_{t}, γ, *c*)[α_{2} + ∑_{j=1}^{p} ϕ_{2j}*y*_{t−j}] + *u*_{t}  (1)

with

*G*(*s*_{t}, γ, *c*) = 1/{1 + exp[−(γ^{2}/*r*)(*s*_{t} − *c*)]}  (2)

where *r* is a scaling constant that can be set equal to the sample standard deviation of the observable variable *s*_{t}, and where, conditional on *s*_{t} and *y*_{t−1}, …, *y*_{t−p}, *u*_{t} is a random disturbance with distribution *N*(0, σ^{2}). *G*(*s*_{t}, γ, *c*) is called the transition function; *s*_{t} the transition variable; γ the shape parameter; and *c* the location parameter. The model implies transitions between the two regimes where *G*(*s*_{t}, γ, *c*) = 0 (which tends to occur when *s*_{t} < *c*) and *G*(*s*_{t}, γ, *c*) = 1 (which tends to occur when *s*_{t} > *c*). When γ becomes large, *G*(*s*_{t}, γ, *c*) tends to the step function postulated by a two-state TAR model. In (2), the division of γ^{2} by *r* ensures that γ has a comparable order of magnitude across competing models.
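As a concrete illustration, the transition function (2) can be sketched as follows; the function name and the use of NumPy are our own, and the parameterization in γ²/*r* follows the description above.

```python
import numpy as np

def lstar_transition(s, gamma, c, r):
    """Logistic transition function G(s_t, gamma, c): the shape parameter
    enters as gamma**2 / r, so G rises from 0 (s_t << c) to 1 (s_t >> c),
    and a large gamma approximates the step function of a TAR model."""
    return 1.0 / (1.0 + np.exp(-(gamma**2 / r) * (np.asarray(s, dtype=float) - c)))
```

At *s*_{t} = *c* the function equals 0.5, and increasing γ sharpens the transition toward the TAR limit.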

The MCMC algorithm of this section iterates on the full conditional posteriors of the vector:

β = (α_{1}, ϕ_{11}, …, ϕ_{1p}, α_{2}, ϕ_{21}, …, ϕ_{2p})′  (3)

of σ^{2}, and of (γ, *c*), using the most recently drawn conditioning values.

If γ and *c* are known, equation (1) collapses to the usual regression model *y* = *X*β + *u*, where row *t* of the *T* × (2*p* + 2) matrix *X* has the form

(1, *y*_{t−1}, …, *y*_{t−p}, *G*_{t}, *G*_{t}*y*_{t−1}, …, *G*_{t}*y*_{t−p})

with *G*_{t}≡*G*(*s*_{t}, γ, *c*). A multinormal prior on β with expectation vector β_{a} and precision matrix *V*_{a} and an independent inverted Gamma prior on σ^{2} with parameters *a* and *b* are assumed. It is then straightforward to show that the full conditional posteriors of β and σ^{2} are respectively multinormal and inverted Gamma:

β | σ^{2}, γ, *c*, *y* ∼ *N*(β_{b}, *V*_{b}^{−1})  (4)

σ^{2} | β, γ, *c*, *y* ∼ IG(*a* + *T*/2, *b* + (*y* − *X*β)′(*y* − *X*β)/2)  (5)

with

*V*_{b} = *V*_{a} + σ^{−2}*X*′*X* and β_{b} = *V*_{b}^{−1}(*V*_{a}β_{a} + σ^{−2}*X*′*y*).
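A minimal sketch of the two conjugate Gibbs draws follows; the function names are ours, and the inverted Gamma IG(*a*, *b*) is assumed parameterized so that the full conditional of σ² has shape *a* + *T*/2 and scale *b* plus half the residual sum of squares.

```python
import numpy as np

def draw_beta(X, y, sigma2, beta_a, V_a, rng):
    """Draw beta from its multinormal full conditional (4);
    V_a is the prior precision matrix, beta_a the prior mean."""
    V_b = V_a + X.T @ X / sigma2                   # posterior precision
    mean = np.linalg.solve(V_b, V_a @ beta_a + X.T @ y / sigma2)
    chol = np.linalg.cholesky(np.linalg.inv(V_b))  # factor of the covariance
    return mean + chol @ rng.standard_normal(len(mean))

def draw_sigma2(X, y, beta, a, b, rng):
    """Draw sigma^2 from its inverted Gamma full conditional (5)."""
    resid = y - X @ beta
    shape = a + 0.5 * len(y)
    scale = b + 0.5 * resid @ resid
    return scale / rng.gamma(shape)                # reciprocal of a Gamma draw
```

These two draws, alternated with the Metropolis–Hastings step for (γ, *c*) described next, make up one pass of the sampler.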

The algorithm for simulating ϑ is based on a Metropolis–Hastings independence chain (Tierney, 1994), using a multivariate Student candidate-generating density with location and scale parameters based on the following linearization, obtained by taking a first-order Taylor expansion of (1)–(2) around (γ*, *c**) and regrouping in the left-hand side those terms that do not depend on γ and *c*:

*y*_{*t} = *x*_{*t}ϑ + *u*_{t}  (6)

where

*w*_{t} ≡ α_{2} + ∑_{j=1}^{p} ϕ_{2j}*y*_{t−j},  *x*_{*t} = *w*_{t}(∂*G*_{t}/∂γ, ∂*G*_{t}/∂*c*)|_{ϑ = ϑ*},  *y*_{*t} = *y*_{t} − α_{1} − ∑_{j=1}^{p} ϕ_{1j}*y*_{t−j} − *G*(*s*_{t}, γ*, *c**)*w*_{t} + *x*_{*t}ϑ*

The anchor point ϑ* = (γ*, *c**) is an approximate solution of the Bayesian update equations:

*V*_{ϑ} = *V*_{ϑa} + σ^{−2}*X*_{*}′*X*_{*}  (7)

ϑ* = *V*_{ϑ}^{−1}(*V*_{ϑa}ϑ_{a} + σ^{−2}*X*_{*}′*y*_{*})  (8)

where ϑ_{a} and *V*_{ϑa} are the prior expectation vector and precision matrix of ϑ, *X*_{*} is the *T* × 2 matrix with row *t* equal to *x*_{*t}, and *y*_{*} is the *T* × 1 vector with elements *y*_{*t}. This approximate solution is obtained from a few iterations on (7) and (8), with starting point given by the prior expectations. A candidate ϑ is drawn from a multivariate Student density with kernel

*k*(ϑ) ∝ [1 + ν^{−1}(ϑ − ϑ*)′*V*_{ϑ}(ϑ − ϑ*)]^{−(ν+2)/2}  (9)

and is accepted with probability

min{1, [*p*(ϑ)*f*(*y*|ϑ, β, σ^{2})/*k*(ϑ)] ÷ [*p*(ϑ_{old})*f*(*y*|ϑ_{old}, β, σ^{2})/*k*(ϑ_{old})]}

where *p*(ϑ) denotes the prior density of ϑ and ϑ_{old} is the most recently drawn vector. If the candidate is rejected, ϑ_{old} is retained. The number ν of degrees of freedom in (9) can be chosen by experimentation; in the empirical part of this paper, a value of ν = 3 was chosen and led to acceptance rates of approximately 0.80.
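The independence-chain step can be sketched generically as follows; the target and proposal log-densities may be unnormalized, since normalizing constants cancel in the acceptance ratio. The function is our own illustration, not the paper's code.

```python
import numpy as np

def independence_mh_step(theta_old, log_target, log_proposal, draw_proposal, rng):
    """One Metropolis-Hastings independence-chain step (Tierney, 1994):
    the candidate is drawn without reference to theta_old, and the
    acceptance ratio corrects the target/proposal mismatch."""
    cand = draw_proposal(rng)
    log_ratio = (log_target(cand) - log_proposal(cand)) \
              - (log_target(theta_old) - log_proposal(theta_old))
    if np.log(rng.uniform()) < min(0.0, log_ratio):
        return cand, True      # candidate accepted
    return theta_old, False    # candidate rejected, old value retained
```

Monitoring the acceptance rate (about 0.80 here with ν = 3) is a quick check that the Student candidate density matches the conditional posterior well.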

Lopes and Salazar (2006) parameterize equation (2) in terms of γ rather than γ^{2}, and ensure the positivity of γ by specifying a prior with positive support (such as a Gamma distribution) for this parameter. They use a random walk Metropolis–Hastings chain, where two tuning parameters must be specified. An advantage of the method proposed in this section is that its implementation can be automatic: choosing ν = 3 in (9) seems to give uniformly good results, with high acceptance rates and well-mixing chains. This advantage will prove decisive in Section 8, where several thousand MCMC estimations will be needed.

Lopes and Salazar (2006) also present a reversible jump MCMC method where the autoregressive lag order and transition delay parameter are included in the parameter space. By contrast, our method treats *p*, and the transition variable *s*_{t}, as fixed; it is proposed to investigate the choice of *p* and *s*_{t} by estimating marginal likelihoods for a range of candidate models. Although less ambitious than reversible jump MCMC, this approach easily allows the comparative investigation of models where *s*_{t} is *any* transition variable, and *G*(.) is *any* transition function (in particular, LSTAR and ESTAR models can easily be compared).

The rest of this section describes the method for estimating marginal likelihoods. It is based on the bridge sampling identity (Meng and Wong, 1996), which allows estimation of the ratio of the normalizing constants of two density kernels with overlapping support. For a model with prior *p*(θ) and likelihood *f*(*y*|θ), and given a bridge function α(θ), the marginal likelihood

*p*(*y*) = ∫ *p*(θ)*f*(*y*|θ)dθ

is equal to

*p*(*y*) = E_{q}[*p*(θ)*f*(*y*|θ)α(θ)] / E_{θ|*y*}[*q*(θ)α(θ)]  (10)

where *q*(θ) is a normalized importance density. The numerator in (10) can be estimated by an average of replications of *p*(θ)*f*(*y*|θ)α(θ) where θ is drawn from *q*(θ), and the denominator by an average of replications of *q*(θ*)α(θ*), where θ* is drawn from the posterior.

A good choice of α(θ) is important for numerical efficiency. Meng and Wong (1996) recommend

α(θ) ∝ [*n* *q*(θ) + *m* *p*(θ|*y*)]^{−1}  (11)

where we have assumed that *n* replications from *q*(θ) and *m* replications from *p*(θ|*y*) are available. Since *p*(*y*) is unknown, (11) must be obtained by an iterative procedure.
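Under these definitions the iteration can be sketched as follows. The inputs are log *p*(θ)*f*(*y*|θ) and log *q*(θ) evaluated at draws from *q* and at posterior draws; the function name and the log-scale stabilization are our own.

```python
import numpy as np

def bridge_marglik(log_kernel_q, log_q_q, log_kernel_p, log_q_p, n_iter=200):
    """Iterative bridge sampling estimate of log p(y) with the optimal
    bridge function (11).  log_kernel_* holds log p(theta)f(y|theta) and
    log_q_* holds log q(theta), at n draws from q ('_q') and m posterior
    draws ('_p')."""
    n, m = len(log_q_q), len(log_q_p)
    l1 = log_kernel_q - log_q_q   # log posterior-kernel / q ratios at q draws
    l2 = log_kernel_p - log_q_p   # same ratios at posterior draws
    log_py = 0.0                  # starting value for log p(y)
    for _ in range(n_iter):
        t1 = np.exp(l1 - log_py)
        t2 = np.exp(l2 - log_py)
        # numerator E_q[kernel * alpha] over denominator E_post[q * alpha]
        log_py = log_py + np.log(np.mean(t1 / (n + m * t1))) \
                        - np.log(np.mean(1.0 / (n + m * t2)))
    return log_py
```

When *q*(θ) is close to the posterior, as recommended below, the iteration converges in very few steps.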

For the STAR model, the author chose an importance density *q*(θ) having the same parametric form as the prior, but with moments that match the empirical posterior moments obtained by MCMC. This choice is appropriate when the marginal posteriors are unimodal, and gave very good results in this case.

Bridge sampling has been shown by Frühwirth-Schnatter (2004) to include as special cases other well-known methods for estimating *p*(*y*), such as the method of Gelfand and Dey (1994). The method proposed by Chib and Jeliazkov (2001) has also been shown by Mira and Nicholls (2004) to be a special case of bridge sampling.

### 3. BAYESIAN ESTIMATION OF MARKOV SWITCHING AUTOREGRESSIONS


The MSAR counterpart of (1)–(2) is

*y*_{t} = (1 − *G*_{t})(γ_{1} + ∑_{j=1}^{p} β_{1j}*y*_{t−j}) + *G*_{t}(γ_{2} + ∑_{j=1}^{p} β_{2j}*y*_{t−j}) + *u*_{t}  (12)

where γ_{1} = α_{1}, β_{1j} = ϕ_{1j}, γ_{2} = α_{1} + α_{2}, and β_{2j} = ϕ_{1j} + ϕ_{2j} for *j* = 1, …, *p*. Given (*G*_{t}, *y*_{t−1}, *y*_{t−2}, …, *y*_{t−p}), *u*_{t} has the distribution *N*(0, σ^{2}). The parameters α_{i} and ϕ_{ij} have the same interpretation as in (1), but *G*_{t} is here a discrete random variable with a value of zero in the first regime and a value of unity in the second. The prior on (*G*_{1}, …, *G*_{T}) is first-order Markov, with *P*[*G*_{t} = 0|*G*_{t−1} = 0] = **p** and *P*[*G*_{t} = 1|*G*_{t−1} = 1] = **q**; independent uniform hyperpriors on **p** and **q** are assumed.

The MCMC estimation of this model is now well established; see, for example, Albert and Chib (1993), and Chib (1996). Assuming conjugate priors, the full conditional posterior of the regression coefficients is multivariate normal; that of σ^{2} is inverted Gamma; those of **p** and **q** are Beta; and draws of (*G*_{1}, …, *G*_{T}) can be obtained by a simulation smoother. The identification of (12), however, has only recently been adequately discussed in the literature. Frühwirth-Schnatter (2001) proposes the permutation sampler, which comes in an unconstrained and a constrained version. In the unconstrained version, each pass of the Gibbs sampler is followed by a random permutation of the regime definitions; this guarantees a balanced sample from the unconstrained posterior. An examination of this sample is used to suggest an appropriate identification constraint; in model (12), one may choose a single constraint of the form α_{2} > 0, ϕ_{2j} > 0 for one *j*∈{1, …, *p*}, or **p** < **q**. This examination is followed by an application of the constrained version of the sampler, where the chosen identification constraint is imposed. More details can be found in Frühwirth-Schnatter (2006), where Bayesian and classical methods for estimation and specification search in finite mixture models such as (12) are extensively reviewed.
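For instance, with uniform (Beta(1, 1)) hyperpriors, the full conditionals of **p** and **q** depend only on the regime-transition counts; a sketch in our own notation:

```python
import numpy as np

def draw_stay_probs(G, rng):
    """Draw p = P[G_t=0 | G_{t-1}=0] and q = P[G_t=1 | G_{t-1}=1] from
    their Beta full conditionals, given a 0/1 regime path G and
    independent uniform (Beta(1,1)) hyperpriors."""
    G = np.asarray(G)
    n00 = int(np.sum((G[:-1] == 0) & (G[1:] == 0)))   # 0 -> 0 transitions
    n01 = int(np.sum((G[:-1] == 0) & (G[1:] == 1)))   # 0 -> 1
    n11 = int(np.sum((G[:-1] == 1) & (G[1:] == 1)))   # 1 -> 1
    n10 = int(np.sum((G[:-1] == 1) & (G[1:] == 0)))   # 1 -> 0
    p = rng.beta(1 + n00, 1 + n01)
    q = rng.beta(1 + n11, 1 + n10)
    return p, q
```

The regression-coefficient and σ² draws are the conjugate updates already given for the LSTAR model, conditional on the simulated regime path.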

It is important to note that the permutation sampler requires a prior that is invariant with respect to relabeling. In the context of (12), this implies a prior with *p*(γ_{1}, γ_{2}) = *p*(γ_{2}, γ_{1}) and *p*(β_{1j}, β_{2j}) = *p*(β_{2j}, β_{1j}) for all *j* = 1, …, *p*. When the prior on the regression coefficients is normal, it is of course a simple matter to translate such a prior into an equivalent prior on the α_{i} and ϕ_{ij}.

Frühwirth-Schnatter (2004, 2006, Ch. 5) recommends using bridge sampling for estimating the marginal likelihood of this model from the *unconstrained* permutation sampler output. The unconstrained marginal likelihood is proportional to the constrained one, with a factor of proportionality equal to the factorial of the number of regimes; this implies that computing the constrained marginal likelihood is unnecessary, and also that marginal likelihoods cannot be used to investigate the adequacy of an identification constraint.

In the bridge sampling identity (10), the likelihood *f*(*y*|θ) is needed. It can easily be computed by integrating out the latent variables (*G*_{1}, …, *G*_{T}), as follows:

*f*(*y*|θ) = ∏_{t=1}^{T} ∑_{i=0}^{1} *f*(*y*_{t}|*G*_{t} = *i*, *y*^{t−1}, θ) *P*[*G*_{t} = *i*|*y*^{t−1}, θ]  (13)

where

*f*(*y*_{t}|*G*_{t} = *i*, *y*^{t−1}, θ) = (2πσ^{2})^{−1/2} exp{−[*y*_{t} − γ_{1+i} − ∑_{j=1}^{p} β_{1+i,j}*y*_{t−j}]^{2}/(2σ^{2})}

where *y*^{t − 1} contains all observations on *y*_{s} up to *t* − 1, and where the conditional probabilities of the regimes can be evaluated by the filter described in Hamilton (1994, pp. 692–693).
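The filter recursion behind (13) can be sketched as follows for a two-state model; here the regime conditional means mu0[t] and mu1[t] are assumed precomputed, and the chain is initialized at its stationary distribution (an assumption on our part).

```python
import numpy as np

def msar_loglik(y, mu0, mu1, sigma2, p, q):
    """Log-likelihood (13) of a two-state MSAR via the Hamilton filter.
    mu0[t], mu1[t] are the conditional means of y[t] in regimes 0 and 1;
    p and q are the staying probabilities of the two regimes."""
    pr0 = (1 - q) / (2 - p - q)              # stationary P[G_t = 0]
    pred = np.array([pr0, 1 - pr0])          # P[G_t = i | y^{t-1}]
    P = np.array([[p, 1 - p], [1 - q, q]])   # transition matrix (rows: from)
    loglik = 0.0
    for t in range(len(y)):
        dens = np.exp(-0.5 * (y[t] - np.array([mu0[t], mu1[t]]))**2 / sigma2) \
               / np.sqrt(2 * np.pi * sigma2)
        joint = pred * dens
        f_t = joint.sum()                    # f(y_t | y^{t-1})
        loglik += np.log(f_t)
        filt = joint / f_t                   # P[G_t = i | y^t]
        pred = filt @ P                      # one-step-ahead prediction
    return loglik
```

When the two regime means coincide and *p* = *q* = 0.5, the recursion collapses to an i.i.d. Gaussian log-likelihood, which provides a convenient check.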

The construction of the importance density *q*(θ) in (10) requires care, due to the multimodality of the unconstrained posterior. The author adopts the suggestion of Frühwirth-Schnatter (2004) to construct *q*(θ) from a discrete mixture of transition kernels. Specifically, a sample (*G*^{1}, …, *G*^{S}) of regime paths is drawn from the unconstrained posterior. The importance density is

*q*(θ) = *S*^{−1} ∑_{s=1}^{S} *k*_{1}(**p**, **q**|*G*^{s}) *k*_{2}(β|σ^{2}_{(s)}, *G*^{s}) *k*_{3}(σ^{2})  (14)

where *k*_{1}(.) is the full conditional posterior of the transition probabilities, *k*_{2}(.) is the full conditional posterior of the regression coefficients, *k*_{3}(σ^{2}) is an inverted Gamma density whose moments match the empirical moments of the MCMC output for σ^{2}, and σ^{2}_{(s)} is the previous draw from the full conditional posterior of σ^{2}. In practice, for a two-state model, setting *S* = 20 will suffice.

### 4. BAYESIAN SPECIFICATION SEARCH


We will now apply the methods of the two preceding sections to LSTAR and MSAR models of the unemployment rate. The dependent variable is *y*_{t} = ln[0.01*U*_{t}/(1 − 0.01*U*_{t})], where *U*_{t} is the civilian male (over 20 years old) deseasonalized monthly US unemployment rate (in percentage points) from 1960:1 to 2004:12, taken from the LRMT20 series in the Haver Analytics USECON database.

Van Dijk *et al.* (2002) suggest using as transition variable the lagged seasonal difference of the unemployment rate. Indeed, Figure 1 reveals that the series *s*_{t} = *U*_{t−1} − *U*_{t−13} closely reproduces the business cycle, with low values of *s*_{t} corresponding to expansions and high values of *s*_{t} to contractions. However, Van Dijk *et al.* (2002) do not use the logistic transformation of *U*_{t}, and their estimates (equations 45 and 46 in their paper) imply an extreme lack of normality in the residuals, with a *p*-value of 5.9 × 10^{−5} for the Bera–Jarque statistic.
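For reference, the dependent variable and transition variable can be constructed as follows (the helper names are our own):

```python
import numpy as np

def logit_rate(U):
    """Logistic transform y_t = ln[0.01 U_t / (1 - 0.01 U_t)] of an
    unemployment rate U_t given in percentage points."""
    u = 0.01 * np.asarray(U, dtype=float)
    return np.log(u / (1 - u))

def transition_variable(U, d=1):
    """Lagged seasonal difference s_t = U_{t-d} - U_{t-d-12}; the first
    d + 12 entries are undefined and returned as NaN."""
    U = np.asarray(U, dtype=float)
    s = np.full(len(U), np.nan)
    s[d + 12:] = U[12:len(U) - d] - U[:len(U) - d - 12]
    return s
```

The transform maps the unit interval onto the real line, so predictive densities on the *y*-scale translate into rate forecasts that necessarily stay between 0 and 100 percent.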

Table I reports the logarithms of the marginal likelihoods for AR, LSTAR, and two- and three-state MSAR models with dependent variable *y*_{t}, for various values of the autoregressive order *p* and of the transition delay parameter *d* in *s*_{t} = *U*_{t−d} − *U*_{t−d−12}. The prior parameters were as follows. For the MSAR models, the autoregression coefficients (β_{ij} in (12)) are independent standard normal, and the intercepts γ_{i} are independent *N*(0, 0.01). For the LSTAR models, the prior on the intercepts (α_{1}, α_{2}) is bivariate normal with null expectation vector, variances of *V*(α_{1}) = 0.01 and *V*(α_{2}) = 0.02, and correlation coefficient −1/√2; and the priors on the autoregression coefficients (ϕ_{1j}, ϕ_{2j}) are bivariate normal with null expectation vector, variances of *V*(ϕ_{1j}) = 1 and *V*(ϕ_{2j}) = 2, and correlation coefficient −1/√2. This choice ensures that the priors on the regression coefficients are identical for the MSAR and the LSTAR models. The prior parameters on γ and *c* are γ_{a} = 3, with prior variance 0.1, and *c*_{a} = 0. For the AR models, the prior on the intercept is *N*(0, 0.01), and the priors on the autoregression coefficients are *N*(0, 1). In all models, the inverted Gamma prior on σ^{2} is almost improper, with *a* = *b* = 10^{−6}, and prior coefficient independence across covariates is assumed.

Table I. Logarithmic marginal likelihoods and numerical standard errors (in parentheses)

| Model | | *p* = 1 | *p* = 2 | *p* = 3 | *p* = 4 | *p* = 5 | *p* = 6 |
|---|---|---|---|---|---|---|---|
| AR | | 919.566 (0.000) | 916.410 (0.001) | 927.651 (0.001) | 937.031 (0.001) | **937.795** (0.001) | 935.637 (0.001) |
| 2-state MSAR | | 932.093 (0.008) | 946.035 (0.008) | **949.597** (0.010) | 945.624 (0.016) | 941.595 (0.017) | 937.158 (0.034) |
| 3-state MSAR | | 934.640 (0.040) | 941.001 (0.158) | 943.708 (0.085) | 939.379 (0.229) | 934.560 (0.063) | 929.930 (0.078) |
| LSTAR | *d* = 1 | 935.575 (0.004) | **956.206** (0.005) | 955.722 (0.006) | 952.162 (0.006) | 946.854 (0.006) | 941.455 (0.007) |
| | *d* = 2 | 936.308 (0.005) | 954.752 (0.005) | 953.807 (0.006) | 951.360 (0.006) | 946.043 (0.006) | 940.856 (0.007) |
| | *d* = 3 | 928.488 (0.005) | 950.139 (0.005) | 951.096 (0.005) | 948.494 (0.006) | 943.463 (0.006) | 938.140 (0.006) |
| | *d* = 4 | 926.544 (0.005) | 945.231 (0.005) | 947.103 (0.005) | 946.198 (0.006) | 941.172 (0.006) | 935.886 (0.006) |
| | *d* = 5 | 922.249 (0.006) | 938.212 (0.005) | 942.523 (0.005) | 941.437 (0.006) | 937.365 (0.006) | 932.142 (0.006) |
| | *d* = 6 | 919.496 (0.007) | 933.870 (0.005) | 938.681 (0.007) | 938.802 (0.009) | 934.502 (0.012) | 929.884 (0.017) |

An examination of Table I confirms that bridge sampling is very efficient, especially for the AR and LSTAR models. Among the AR specifications, *p* = 5 is preferred. Among the LSTAR specifications, the evidence is in favor of (*p* = 2, *d* = 1), with (*p* = 3, *d* = 1) being a close contender. The Bayes factor in favor of the first of the two models is exp(956.206 − 955.722) = 1.623; on the Jeffreys scale, the evidence against the less parsimonious model is ‘not worth more than a bare mention’ (Jeffreys, 1961, Appendix B). Indeed, assuming prior odds of unity, the posterior probability of (*p* = 2, *d* = 1) against (*p* = 3, *d* = 1) is 1.623/2.623 = 0.619. In the MSAR case, a two-state model with *p* = 3 is very clearly preferred.

Marginal likelihoods are well known to be sensitive to the prior specification (and are not defined for improper priors). In Table II, we therefore present some marginal likelihood estimates obtained when all the prior variances (except that of σ^{2}) are doubled. The marginal likelihoods are indeed uniformly lower, but the ranking between models remains unchanged.

Table II. Logarithmic marginal likelihoods and numerical standard errors (in parentheses) with looser priors

| Model | | *p* = 1 | *p* = 2 | *p* = 3 | *p* = 4 | *p* = 5 | *p* = 6 |
|---|---|---|---|---|---|---|---|
| AR | | 919.136 (0.000) | 915.639 (0.001) | 926.566 (0.001) | 935.586 (0.001) | **935.997** (0.001) | 933.493 (0.001) |
| 2-state MSAR | | 931.312 (0.008) | 944.596 (0.011) | **947.571** (0.009) | 942.833 (0.014) | 938.350 (0.020) | 932.123 (0.038) |
| 3-state MSAR | | 933.653 (0.037) | 938.641 (0.065) | 940.267 (0.246) | 935.681 (0.137) | 929.097 (0.163) | 928.811 (0.145) |
| LSTAR | *d* = 1 | 934.715 (0.005) | **954.603** (0.005) | 953.354 (0.006) | 949.074 (0.007) | 943.098 (0.007) | 937.012 (0.007) |
| | *d* = 2 | 935.410 (0.006) | 953.186 (0.005) | 951.411 (0.006) | 948.264 (0.006) | 942.274 (0.007) | 936.393 (0.007) |
| | *d* = 3 | 927.619 (0.006) | 948.612 (0.005) | 948.802 (0.005) | 945.452 (0.006) | 939.741 (0.006) | 933.724 (0.006) |

The conclusions that emerge from Tables I and II are clear. First, Bayes factors present very strong evidence against linearity. Secondly, the LSTAR model with *s*_{t} = *U*_{t−1} − *U*_{t−13} is uniformly, and very strongly, preferred to the two- and three-state MSAR models. Other choices of *s*_{t}, such as *U*_{t−1} − *U*_{t−d−1} for various values of *d*, yielded lower marginal likelihoods; the same was true for the transition variable (*y*_{t−1} − *y*_{t−6})/5, found by Koop and Potter (1999) to maximize the posterior odds in their TAR model, albeit for a different sample. An ESTAR model was also tried, with inferior results.

Even though care was taken to ensure comparable priors across models, an element of uncertainty due to prior sensitivity remains. Also, the marginal likelihood criterion is known to favor parsimony; if one treats the discrete latent variables in the MSAR model as unknown parameters, LSTAR becomes much more parsimonious than MSAR. In the author's opinion, an ultimate comparison of both models should therefore rely on complementary evidence. Such evidence might be provided by posterior predictive *p*-values; this is the topic of the next section.

### 5. POSTERIOR PREDICTIVE *P*-VALUES


Let *s*_{i}(*x*, θ), for *i* = 1, …, *n*, be a collection of statistics commonly used for misspecification testing, *x* being a vector of data assumed to be generated by a given model with parameter vector θ. In the applications of this section, *s*_{i}(*x*, θ) will be based on the generalized residuals *u*_{t}(*x*, θ), defined as

*u*_{t}(*x*, θ) = Φ^{−1}(*P*[*X*_{t} ≤ *x*_{t}|*x*^{t−1}, θ])  (15)

where Φ(.) is the normal integral, and *x*^{t − 1} contains all observations on *x*_{s} up to *t* − 1. If the probabilities on the right-hand side of (15) are indeed those implied by the process generating *x*, the generalized residuals are independent standard normal. In the LSTAR model, *u*_{t}(*x*, θ) is simply the standardized residual implied by (1). In the MSAR model (12), we have

*u*_{t}(*x*, θ) = Φ^{−1}(∑_{i=0}^{1} *P*[*G*_{t} = *i*|*x*^{t−1}, θ] Φ[(*x*_{t} − γ_{1+i} − ∑_{j=1}^{p} β_{1+i,j}*x*_{t−j})/σ])  (16)

where the conditional probabilities of the regimes are computed as in (13).

The predictive distribution of *s*_{i}(*x*, θ) can be readily simulated if a posterior sample from *p*(θ|*y*) is available: for each replication θ from this sample, one simply generates *x* by recursively simulating the model, computes the generalized residual series *u*_{t}(*x*, θ), and computes the resulting *s*_{i}(*x*, θ) for *i* = 1, …, *n*. The posterior predictive *p*-value for statistic *i* is defined as

*P*[*s*_{i}(*x*, θ) ≥ *s*_{i}(*y*, θ)|*y*]

and is estimated as the percentage of replications of *s*_{i}(*x*, θ) that exceed the posterior average of the values computed from the actual data. More details on this approach can be found in Gelman and Meng (1996) or Koop (2003).
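The simulation scheme just described can be sketched generically; all names are ours, and `stat` stands for any of the three statistics of this section.

```python
import numpy as np

def posterior_predictive_pvalue(stat, posterior_draws, simulate_data,
                                observed_stats, rng):
    """Estimate P[s(x, theta) >= s(y, theta) | y]: for each posterior draw
    theta, simulate a dataset x from the model, compute stat(x, theta), and
    compare with the posterior average of the observed-data values."""
    reps = np.array([stat(simulate_data(th, rng), th) for th in posterior_draws])
    return float(np.mean(reps >= np.mean(observed_stats)))
```

A *p*-value near 0 or 1 indicates that the observed statistic is extreme relative to what the fitted model generates, i.e., evidence of misspecification in the direction the statistic measures.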

We will compute three statistics *s*_{i}(*x*, θ) from the generalized residual series *u*_{t}(*x*, θ), namely:

1. the Bera–Jarque statistic, used as an indicator of error non-normality;
2. an *F* statistic for the nullity of the autoregression coefficients in an AR(12) model of the generalized residuals, used as an indicator of error autocorrelation and denoted by AC(12);
3. an *F* statistic for the nullity of the autoregression coefficients in an AR(12) model of the squared generalized residuals, used as an indicator of error conditional heteroscedasticity and denoted by ARCH(12).

Table III presents the estimated posterior predictive *p*-values for each statistic and for the LSTAR and MSAR models that were found, in Section 4, to have the highest marginal likelihoods. The MSAR model was identified using the first autoregression coefficient: the constraint ϕ_{21} > 0 in (12) was very clearly suggested by the output of the unconstrained permutation sampler. In each case, the estimated *p*-values are based on 10,000 replications. The estimated predictive distributions of the statistics were quite close to the relevant asymptotic chi-square and *F* distributions, suggesting that asymptotic approximations would be reasonable.

Table III. Posterior predictive *p*-values

| Model | Bera–Jarque | AC(12) | ARCH(12) |
|---|---|---|---|
| LSTAR (*p* = 2, *d* = 1) | 0.6718 | 0.0935 | 0.0066 |
| LSTAR (*p* = 3, *d* = 1) | 0.5872 | 0.0681 | 0.0254 |
| 2-state MSAR (*p* = 3) | 0.7330 | 0.0149 | 0.0468 |

The estimated *p*-values in Table III are all larger than 0.01, with the exception of that for ARCH(12) in the LSTAR model with *p* = 2. So, there is weak evidence of conditional heteroscedasticity in this model.

Since, as discussed in Section 4, the posterior probabilities of the two preferred LSTAR formulations are nearly equal, it is legitimate to use other evidence, such as the one provided in this section, for discriminating between these two models (additional evidence in favor of *p* = 3 will be given in Section 8). There is another, more subjective, reason for preferring a three-lag LSTAR model: since our ultimate aim is the comparison of LSTAR and MSAR models, it makes sense to choose the same autoregressive order for both. It might otherwise be difficult to disentangle those differences that are due to model order from those that are due to model class.

For these reasons, and for the sake of brevity, we will concentrate in the sequel on the 2-state MSAR model with *p* = 3 and on the LSTAR model with *p* = 3 and *s*_{t} = *U*_{t−1} − *U*_{t−13}. However, we will occasionally mention results obtained with the two-lag LSTAR model when this appears noteworthy.

### 6. MCMC ESTIMATION


In this section, we will present Markov chain Monte Carlo estimates of the LSTAR model (1)–(2) with *p* = 3 and *s*_{t} = *U*_{t−1} − *U*_{t−13}, and of the MSAR model (12) with *p* = 3 and the identification constraint ϕ_{21} > 0. The prior distributions are the same as those used for estimating the marginal likelihoods in Table I, and were described in the third paragraph of Section 4.

In Table IV, θ_{α} denotes the estimated posterior quantile at probability α and *s*_{θ} is the estimated posterior standard error. The regression coefficient estimates for the two models are broadly similar, although some differences can be noticed: in particular, LSTAR predicts regime switching in the second autoregression coefficient, whereas MSAR does not, since the interval for ϕ_{22} contains zero in the second model but not in the first. In the MSAR model, *G*_{t} = 0 can be associated with expansion and *G*_{t} = 1 with contraction. In the LSTAR model, the data do not appear to be very informative about the shape parameter of the transition function, since the posterior results for γ almost reproduce its *N*(3, 0.1) prior: the 95% prior confidence interval is [2.38, 3.62], and the *p*-value of a Bera–Jarque statistic testing the normality of the posterior replications of γ is 0.39. This is not surprising, since a wide range of values of γ will typically lead to similar shapes of the transition function, as noted by Teräsvirta (1994); indeed, this is commonly invoked to explain the failure of likelihood maximization algorithms in STAR models.
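To see why the data carry little information about γ, consider the textbook first-order logistic transition function *G*(*s*) = 1/(1 + exp(−γ(*s* − *c*))); the paper's exact parameterization (e.g., any scaling of γ) may differ. Away from the threshold *c*, quite different values of γ produce nearly identical curves:

```python
import numpy as np

# Textbook first-order logistic transition function for an LSTAR model.
def transition(s, gamma, c):
    return 1.0 / (1.0 + np.exp(-gamma * (s - c)))

s = np.linspace(-2.0, 2.0, 401)
c = 0.19  # close to the posterior median of c in Table IV
g3 = transition(s, 3.0, c)
g5 = transition(s, 5.0, c)

# More than one unit away from the threshold, the two curves nearly coincide,
# so the likelihood is almost flat in gamma there.
far = np.abs(s - c) > 1.0
print(round(float(np.max(np.abs(g3 - g5)[far])), 3))
```

The maximum discrepancy in the tails is only a few percent, which is consistent with the posterior for γ essentially reproducing its prior.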

Table IV. Posterior replication summaries for the LSTAR and MSAR models

| θ | LSTAR θ_{0.025} | LSTAR θ_{0.5} | LSTAR θ_{0.975} | LSTAR *s*_{θ} | MSAR θ_{0.025} | MSAR θ_{0.5} | MSAR θ_{0.975} | MSAR *s*_{θ} |
|---|---|---|---|---|---|---|---|---|
| α_{1} | −0.082 | −0.045 | −0.006 | 0.019 | −0.078 | −0.034 | 0.013 | 0.023 |
| ϕ_{11} | 0.525 | 0.645 | 0.761 | 0.060 | 0.458 | 0.588 | 0.712 | 0.065 |
| ϕ_{12} | 0.165 | 0.291 | 0.417 | 0.064 | 0.126 | 0.255 | 0.380 | 0.065 |
| ϕ_{13} | −0.055 | 0.054 | 0.164 | 0.056 | 0.027 | 0.151 | 0.280 | 0.065 |
| α_{2} | −0.115 | −0.045 | 0.023 | 0.035 | −0.117 | −0.052 | 0.019 | 0.034 |
| ϕ_{21} | 0.384 | 0.583 | 0.779 | 0.100 | 0.340 | 0.539 | 0.743 | 0.102 |
| ϕ_{22} | −0.591 | −0.312 | −0.027 | 0.143 | −0.362 | −0.098 | 0.142 | 0.127 |
| ϕ_{23} | −0.487 | −0.294 | −0.101 | 0.098 | −0.656 | −0.467 | −0.261 | 0.100 |
| σ^{2} × 100 | 0.128 | 0.145 | 0.163 | 0.009 | 0.114 | 0.130 | 0.148 | 0.009 |
| γ | 2.463 | 3.006 | 3.567 | 0.282 | | | | |
| *c* | 0.043 | 0.188 | 0.326 | 0.072 | | | | |
| **p** | | | | | 0.921 | 0.966 | 0.987 | 0.017 |
| **q** | | | | | 0.868 | 0.941 | 0.978 | 0.029 |

In Figure 2, we present kernel density estimates of the marginal posteriors of the intercepts in both states (α_{1} and α_{1} + α_{2}) and of the sums of the autoregression coefficients in both states (ϕ_{11} + ϕ_{12} + ϕ_{13} and (ϕ_{11} + ϕ_{21}) + (ϕ_{12} + ϕ_{22}) + (ϕ_{13} + ϕ_{23})). Both models predict a unit root during expansions and a highly persistent process during contractions (the results obtained with the two-lag LSTAR model were almost identical). The presence of unit roots, or roots very close to unity, implies that it would be misleading to speak of an equilibrium unemployment rate in a particular regime.
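As a rough check, the regime-specific persistence implied by the LSTAR posterior medians in Table IV can be computed directly (bearing in mind that a sum of marginal medians only approximates the median of the sum):

```python
# Sum of autoregression coefficients in each regime, from the LSTAR column
# of Table IV. State 1 (expansion) uses phi_{1j}; state 2 (contraction)
# adds the switching components phi_{2j}.
phi1 = [0.645, 0.291, 0.054]    # phi_{11}, phi_{12}, phi_{13}
phi2 = [0.583, -0.312, -0.294]  # phi_{21}, phi_{22}, phi_{23}

expansion_sum = sum(phi1)
contraction_sum = sum(a + b for a, b in zip(phi1, phi2))
print(round(expansion_sum, 3), round(contraction_sum, 3))
```

The expansion sum is essentially one (a unit root), while the contraction sum is slightly below one (high persistence), in line with the densities in Figure 2.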

It is also of interest to compare the estimated transition function of the LSTAR model with the smoothed probability of the second regime in the MSAR model, *P*[*G*_{t} = 1|*y*_{1}, …, *y*_{T}]. This probability can be estimated from the MCMC output by the fraction of replications in which *G*_{t} equals one. The two functions are plotted in Figure 3. They are generally similar, but the ‘transition function’ of the MSAR model appears better able to anticipate turning points. This is not too surprising, since the smoothed probability is conditional on all the sample observations, whereas the LSTAR transition function uses only past observations. For comparison, the NBER business cycles (available at www.nber.org/cycles.html, accessed on January 4, 2007) are also reported as shaded areas.
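The estimator is simply a Monte Carlo average of the regime-indicator draws. A minimal sketch, with hypothetical simulated MCMC output standing in for the paper's sampler:

```python
import numpy as np

# Hypothetical MCMC output: draws of the regime indicators G_t (0 = expansion,
# 1 = contraction), one row per posterior replication, one column per month.
rng = np.random.default_rng(1)
n_draws, T = 5000, 12
true_probs = np.linspace(0.1, 0.9, T)  # stand-in smoothed path
regime_draws = rng.binomial(1, true_probs, size=(n_draws, T))

# P[G_t = 1 | y_1, ..., y_T] estimated by the fraction of replications
# in which G_t = 1, as plotted in Figure 3.
smoothed = regime_draws.mean(axis=0)
print(smoothed.shape)
```

With a few thousand replications, the Monte Carlo error of each point on the smoothed-probability path is well below one percentage point.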

Finally, Figure 4 reports kernel density estimates of the ergodic probability of contraction in the MSAR model, given by (1 − **p**)/(2 − **p** − **q**); of the expected duration of an expansion, given by 1/(1 − **p**); and of the expected duration of a contraction, given by 1/(1 − **q**). The posteriors of the expected durations are strongly leptokurtic and skewed to the right, with medians of approximately 30 months for expansions and 17 months for contractions.
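Plugging the posterior medians of **p** and **q** from Table IV into these formulas reproduces the reported magnitudes (this is only a point-estimate check; Figure 4 reports the full derived posteriors):

```python
# Ergodic probability of contraction and expected regime durations for the
# two-state Markov chain, evaluated at the posterior medians of Table IV.
p, q = 0.966, 0.941  # persistence of expansion and contraction, respectively

ergodic_contraction = (1 - p) / (2 - p - q)  # long-run fraction of time in contraction
expected_expansion = 1 / (1 - p)             # mean expansion length, in months
expected_contraction = 1 / (1 - q)           # mean contraction length, in months
print(round(expected_expansion, 1), round(expected_contraction, 1))
```

This gives roughly 29 months for expansions and 17 months for contractions, close to the posterior medians quoted above.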

The MSAR model (12) assumes that the innovation variance σ^{2} is constant. In a previous version of this paper, the more general model described in Deschamps (2006), which assumes Student-*t* disturbances with regime-dependent scale parameters, was estimated. In that model, the point estimates of the Student scale parameters were 0.00126 in expansions and 0.00125 in contractions, and the point estimate of the Student degrees of freedom was 94. The point and interval posterior estimates of the other parameters (regression coefficients and transition probabilities) were almost identical to those in Table IV. Together with the predictive *p*-values reported in Section 5, this confirms that a normal homoscedastic model is appropriate for this sample.

### 7. MAXIMUM LIKELIHOOD ESTIMATION

Table V presents maximum likelihood estimates of the preceding three-lag LSTAR model and of the two-state, three-lag MSAR model. The LSTAR model was estimated using EViews 5.1; the MSAR model was estimated by the EM algorithm described in Smith *et al.* (2006, section 2.2). To ensure convergence of this algorithm, it proved necessary to concentrate the likelihood with respect to the regression coefficients and equation variance, and to maximize the concentrated loglikelihood *L*(**p**, **q**) by a bivariate grid search on the square [0.9, 0.99] × [0.9, 0.99], with increments of 0.01 in each coordinate; this was followed by a refined grid search on [0.986, 0.995] × [0.986, 0.995] with increments of 0.001.
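The two-stage search can be sketched as follows, with a hypothetical quadratic stand-in for the concentrated loglikelihood *L*(**p**, **q**) (the actual objective requires the full MSAR likelihood machinery):

```python
import numpy as np

# Hypothetical stand-in for the concentrated loglikelihood L(p, q); its
# maximizer (0.990, 0.991) mimics the ML estimates reported in Table V.
def concentrated_loglik(p, q):
    return -((p - 0.990) ** 2 + (q - 0.991) ** 2)

def grid_search(f, lo, hi, step):
    # Evaluate f on a square grid and return the maximizing (p, q) pair.
    grid = np.arange(lo, hi + step / 2, step)
    return max(((f(p, q), p, q) for p in grid for q in grid))[1:]

p0, q0 = grid_search(concentrated_loglik, 0.90, 0.99, 0.01)     # coarse pass
p1, q1 = grid_search(concentrated_loglik, 0.986, 0.995, 0.001)  # refined pass
print(round(float(p1), 3), round(float(q1), 3))
```

The coarse pass locates the region of the maximum; the refined pass then pins down the estimates to three decimals, exactly as in the text.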

Table V. Maximum likelihood estimates of the LSTAR and MSAR models

| θ | LSTAR estimate | LSTAR std. error | MSAR estimate | MSAR std. error |
|---|---|---|---|---|
| α_{1} | −0.047 | 0.020 | −0.031 | 0.020 |
| ϕ_{11} | 0.651 | 0.061 | 0.582 | 0.056 |
| ϕ_{12} | 0.291 | 0.063 | 0.251 | 0.064 |
| ϕ_{13} | 0.046 | 0.056 | 0.162 | 0.056 |
| α_{2} | −0.050 | 0.038 | −0.054 | 0.029 |
| ϕ_{21} | 0.577 | 0.105 | 0.536 | 0.079 |
| ϕ_{22} | −0.323 | 0.146 | −0.101 | 0.105 |
| ϕ_{23} | −0.279 | 0.100 | −0.462 | 0.078 |
| σ^{2} × 100 | 0.145 | 0.009 | 0.132 | 0.008 |
| γ | 3.153 | 0.850 | | |
| *c* | 0.195 | 0.067 | | |
| **p** | | | 0.990 | 0.006 |
| **q** | | | 0.991 | 0.006 |

The regression coefficient estimates in Table V are quite close to the corresponding ones in Table IV. The estimated standard error of γ̂ in Table V (0.850) is, however, larger than *s*_{γ} in Table IV, reflecting the relatively tight prior on γ. Another noteworthy observation is the near equality of the estimated persistence probabilities p̂ and q̂ in Table V; furthermore, both estimates are larger than the corresponding posterior medians in Table IV. They imply nearly equal ergodic probabilities and nearly equal expected durations for both regimes. This is at variance with the behavior exhibited in Figure 4, and conflicts somewhat with historical evidence. However, the smoothed probability graph was very similar to the one in Figure 3.

Figures 5 and 6 present distribution graphs and correlograms of the generalized residuals defined in Section 5. Table VI presents *p*-values of misspecification diagnostics computed from these residuals; the values for an AR(5) model are also reported. They confirm the previous results: there is evidence against linearity, and an LSTAR model with *p* = 2 appears slightly misspecified.

Table VI. *p*-values of misspecification diagnostics (maximum likelihood)

| Model | Bera–Jarque | AC(12) | ARCH(12) |
|---|---|---|---|
| AR (*p* = 5) | 0.0200 | 0.1013 | 0.0008 |
| LSTAR (*p* = 2) | 0.6112 | 0.1068 | 0.0096 |
| LSTAR (*p* = 3) | 0.6252 | 0.1279 | 0.0484 |
| MSAR (*p* = 3) | 0.7602 | 0.0171 | 0.0476 |

### 9. DISCUSSION AND CONCLUSIONS

This paper has attempted to compare the performance of LSTAR, MSAR, and AR models for forecasting the monthly US unemployment rate. Unlike many past contributions, it has used fully Bayesian methods for estimating predictive densities.

The LSTAR model incorporates strong prior knowledge on the factors determining the onset of transitions between regimes, in the form of a particular transition function and transition variable; by contrast, in the MSAR model, such prior knowledge only consists in a flexible evolution equation. It is therefore not surprising that *an appropriate* LSTAR model should make better use of available information and perform better on predictive efficiency tests. The fact that both models yield insignificant misspecification diagnostics and have insignificant mean loss differentials, however, is an additional argument in favor of the particular LSTAR model used in this paper: if two formulations with different prior assumptions yield results that are essentially similar, our confidence in the stronger assumptions increases, and one will tend to favor the more structured alternative. In this sense, the MSAR and LSTAR approaches can be said to be cross-validating and complementary.

It has been pointed out by a referee that the basic MSAR model can be extended to specify transition probabilities that depend on past observables. A simple formulation would model the transition probabilities as logistic normal processes. By a suitable choice of parameters, a logistic normal can approximate a Beta distribution; this leads to a natural candidate-generating density in a Metropolis–Hastings step. This was tried by the author, but did not result in a well-behaved MCMC sampler. The reason appears to be the shape of the logistic function, which is nearly constant over a large range of argument values. A more sophisticated approach, followed by Filardo and Gordon (1998), uses a latent probit process to model the transition probabilities and appears to work well in the MCMC context. Unfortunately, this generalization obviously complicates an MCMC sampler, the bridge sampling estimation of marginal likelihoods, and the estimation of predictive densities; it is not certain that the primary aims of this paper could have been achieved with the more general model. In the opinion of the author, the more limited objective of comparing two relatively simple formulations that clearly differ in their prior assumptions on structural change is also interesting, if only to validate the choice of a particular LSTAR transition variable: indeed, in Table I, the ranking between the MSAR and LSTAR models would have been reversed if *d* = 4 rather than *d* = 1 had been chosen.

The data used in this paper are available from January 1948. When the MSAR model is estimated using the full sample from 1948 to 2004, a third regime can be identified. In this regime, the equation variance is approximately six times higher than in the other two; and the autoregression coefficients take values that are intermediate between those for expansions and contractions. The probability that this third regime occurs after 1960, however, is estimated to be less than 20% in any period, most estimates being very close to zero. Adding the pre-1960 data to each of the 295 information sets defined in Section 8.2 and using the three-regime MSAR model for conducting the simulated prediction exercises improves neither efficiency nor mean absolute prediction errors. For this reason, the data prior to 1960 were considered to be mostly of historical interest, and the results obtained with the extended MSAR model are not reported.

The evidence in favor of nonlinear dynamics appears to be stronger in this paper than in previous contributions. This may be partly due to our use of a monthly data frequency, rather than the quarterly frequency in, for example, Rothman (1998) and Montgomery *et al.* (1998). Indeed, in another context, Klaassen (2005) finds that using lower-frequency data can mask evidence in favor of regime switching.

This paper has also illustrated the importance of allowing *all* the parameters of a hidden Markov model of the unemployment rate to switch between regimes. This can be seen by comparing our results with those obtained by Bianchi and Zoega (1998). By allowing only the intercept term and the equation variance to vary across regimes, these authors all but eliminate the possibility of detecting Markov switching behavior in the US unemployment rate, and conclude in favor of a one-state model for this series; this conclusion is at considerable variance with the results of the present study.

Finally, this paper has illustrated the potential of regressing observations on out-of-sample point predictions for discriminating between non-nested models. It is perhaps surprising that this simple technique, which appears to have been originally suggested by Granger and Ramanathan (1984), has not been used more often in the literature.