A new procedure is proposed for modelling nonlinearity of a smooth transition form, by allowing the transition variable to be a weighted function of lagged observations. This function depends on two unknown parameters and requires specification of the maximum lag only. Nonlinearity testing for this specification uses a search over a plausible set of weight function parameters, combined with bootstrap inference. Finite-sample results show that the recommended wild bootstrap heteroskedasticity-robust testing procedure performs well, for both homoskedastic and heteroskedastic data-generating processes. Forecast comparisons relative to linear models and other nonlinear specifications of the smooth transition form confirm that the new WSTR model delivers good performance. Copyright © 2010 John Wiley & Sons, Ltd.


Nonlinear models play an increasingly important role in the analysis and forecasting of economic and financial time series. An attractive feature of these models is their state dependence, since it is often plausible that the nature of responses may vary with underlying conditions, such as the state of the business cycle, the central bank's monetary policy stance or conditions in financial markets. Smooth transition regression (STR) models are popular in this context because they provide an economic explanation for the regime, by making it a continuous function of an observed variable. Teräsvirta (1994) provides a coherent modelling strategy in the univariate smooth transition autoregression (STAR) context, which is generalized in Teräsvirta (1998) to the STR case. Recent applications include Anderson et al. (2007), Sensier et al. (2002) and Teräsvirta et al. (2005).

A crucial step in STR modelling is the specification of the transition variable that determines the regime. In practice, this is almost invariably taken to be a single lag of an observed variable, with researchers typically following the recommendation of Teräsvirta (1994) to specify the appropriate lag using linearity test statistics computed over a range of plausible values. However, the choice is often not clear-cut, since linearity may be rejected at multiple lags, implying that more than one lagged value contains information about the current regime. Although Medeiros and Veiga (2003, 2005) allow the transition variable to be an unknown linear function of multiple lags, the resulting procedure is fairly complicated and does not always deliver a transition variable that is a weighted average (with positive weights) of lagged values.

The present paper proposes a new procedure for transition variable lag specification, by defining it to be a parsimonious weighted average over potential lags. This WSTR (weighted STR) model requires estimation of only one additional parameter compared to procedures based on a single lag, which is a minimal cost in relation to the added flexibility it delivers. Although the Medeiros and Veiga (2003, 2005) specification is more general, our approach is preferable for the many situations in which regimes are anticipated to vary smoothly over time. The WSTR model is closely related to the STMIDAS specification of Galvão (2009), who exploits the mixed data frequency MIDAS model of Ghysels et al. (2005, 2006) to examine the use of high-frequency data for forecasting a lower-frequency variable in an STR context. Although the WSTR model can also be used with mixed-frequency data, our focus differs from Galvão (2009) in being more concerned with issues of model specification and nonlinearity testing.

As usual with models in the STR class, testing for nonlinearity has to confront the issue that transition function parameters are not identified under the null hypothesis. Our approach follows much of the literature by applying a Taylor series approximation. However, the implementation differs in that we propose searching over a plausible set of values for the WSTR weighting function parameters and applying the bootstrap approach of Hansen (1996). Further, we advocate the use of the wild bootstrap to account for possible heteroskedasticity of unknown form. In line with Becker and Hurn (2009), our results indicate that the wild bootstrap approach performs very well, delivering reliable finite sample size and power comparable to that achieved by tests that assume homoskedasticity when the true data-generating process (DGP) is homoskedastic.

The structure of the paper is as follows. Section 2 outlines the WSTR model, focusing on the transition variable. Inference is discussed in Section 3, which develops the proposed WSTR nonlinearity test and studies its properties through a Monte Carlo analysis. Section 4 then examines forecast performance in comparison with linear and other STR specifications. A concluding section completes the paper, with model estimation discussed in an Appendix.


This section discusses the transition variable in STR models and then examines the WSTR weighting function we propose. Although we consider a single transition function, generalization to two or more such functions is straightforward.

2.1. The Transition Variable

A model of the STR class may be written as

yt = α0 + xtα1 + (β0 + xtβ1)f(st) + εt        (1)

where xt is a (1 × n) vector of explanatory variables (typically including p lagged values of yt), α1 and β1 are (n × 1) parameter vectors, f(st) is the (scalar) transition function satisfying 0 ≤ f(st) ≤ 1 and εt is a martingale difference process. As (1) can be written as yt = θ0(st) + xtθ1(st) + εt, where θ0(st) = α0 + β0f(st) and θ1(st) = α1 + β1f(st), it is evident that the coefficients of the STR model vary as a function of the (1 × q) observation vector st. This time variation is often captured through the logistic specification1

f(st) = 1/[1 + exp{−γ(stδ − c)}],  γ > 0        (2)

in which f(st) is a monotonically increasing smooth function of the scalar transition variable stδ for given (q × 1) parameter vector δ. The properties of the model given by (1) and (2) are well described by Teräsvirta (1998), while van Dijk et al. (2002) review univariate STAR models.
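As a numerical illustration of the logistic transition function just described, the following minimal sketch (function and variable names are ours) evaluates f for a given value of the transition variable:

```python
import numpy as np

def logistic_transition(s_delta, gamma, c):
    """Logistic transition function of (2): a smooth, monotonically
    increasing function of the scalar transition variable s_t'delta,
    bounded between 0 and 1; gamma controls the transition speed and
    c its location."""
    return 1.0 / (1.0 + np.exp(-gamma * (np.asarray(s_delta, dtype=float) - c)))
```

A large γ makes the transition close to a step function at c, while γ near zero leaves f almost constant at 0.5, which is precisely why γ = 0 creates the identification problem discussed in Section 3.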

In practice, most researchers follow the methodology of Teräsvirta (1994) and define f(st) in terms of a single lagged value of an observed variable. That is, assuming it is known that zt defines the regimes in yt (zt = yt in the univariate case), then st = (zt−1, …, zt−q), with δ in (2) defined implicitly as δ = ek, where ek is the kth column of a (q × q) identity matrix. Specification of the transition variable then amounts to specifying the appropriate lag k. Following the convention of the literature, we refer to (1) and (2) with δ = ek as an LSTR model (or LSTAR model in the univariate case). Although the LSTR specification ensures that f(st) is a smooth function of the specific lagged variable zt−k, it is not necessarily smooth over time. Since regimes associated with economic phenomena such as the business cycle are persistent, some researchers have attempted to avoid implausibly frequent regime changes (that is, abrupt movements from f(st) ≈ 0 to f(st) ≈ 1 or vice versa) by employing an average as the transition variable.2 Although this implies an essentially ad hoc specification of δ, it suggests that some averaging over lags may be appropriate when the transition function f(st) is anticipated to be a smooth function over time.

The NCSTR (neuro coefficient STR) model of Medeiros and Veiga (2003, 2005) is also based on (1) and (2), but imposes no a priori restrictions on the elements of δ beyond the maximum lag q and the identification conditions requiring δ′δ = 1 and δi > 0 for some given i. For practical reasons, an unrestricted δ is not estimated, with Medeiros and Veiga (2003, 2005) generalizing the selection procedure of Teräsvirta (1994) to specify the nonzero elements of this vector. However, although the NCSTR class allows for very general weight functions, this generality does not ensure that the estimated weights deliver smooth temporal regimes in contexts when these are plausible.

Our proposed WSTR model parameterizes the weights through the beta distribution in order to avoid these problems. The properties of these weights are discussed in the next subsection, with estimation issues considered in the Appendix. In addition to delivering smooth regimes over time, the WSTR model has the advantage of avoiding the transition variable specification step required in both Teräsvirta (1994) and Medeiros and Veiga (2003, 2005).

2.2. Beta Distribution Weights

Our proposed WSTR model employs

δi = g(i|κ1, κ2) / Σj=1,…,q g(j|κ1, κ2),  i = 1, …, q        (3)

where g(i|κ1, κ2) is the density function of the beta distribution and κ1, κ2 are parameters to be estimated. Use of (3) ensures that the restrictions Σi=1,…,q δi = 1 with δi ≥ 0 (i = 1, …, q) are satisfied, so that the transition variable stδ is generally smooth over time.

Parsimonious weighting functions have a long history in econometrics, where they are often known as distributed lags (e.g., Almon, 1965). Beta distribution weights have been found useful in the recent literature developing the MIDAS approach (Ghysels et al., 2005) for modelling data observed at different frequencies, with Ghysels et al. (2007) providing a discussion of the variety of shapes generated by (3). In particular, κ1 = κ2 = 1 yields equal weights (corresponding to a transition variable which is an average) over lags 1, …, q, while other values yield patterns with modal weight at a specific lag, such as 1 or 2. The single lag weight function δ = ek can also be approximated by beta distribution weights.3
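The weighting scheme in (3) can be sketched as follows; the mapping of the lag index i into the unit interval as i/(q + 1), like the function name, is our own assumption, since any such mapping yields weights of the shapes described above:

```python
import numpy as np

def beta_lag_weights(q, kappa1, kappa2):
    """Beta-distribution lag weights in the spirit of (3). The lag index
    i = 1..q is mapped into (0, 1) as i/(q+1) (an assumption here); the
    normalizing constant of the beta density cancels in the ratio, so only
    the kernel is needed. Weights are nonnegative and sum to one."""
    x = np.arange(1, q + 1) / (q + 1.0)
    g = x ** (kappa1 - 1.0) * (1.0 - x) ** (kappa2 - 1.0)  # beta density kernel
    return g / g.sum()
```

With κ1 = κ2 = 1 every lag receives weight 1/q, reproducing the equal-weights (averaging) case; κ2 > κ1 = 1 places the modal weight on the shortest lags, approaching the single-lag specification as κ2 grows.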

An important advantage of the WSTR model based on (3) is that, by introducing one extra parameter compared with the usual LSTR model (κ1 and κ2 as compared to the single lag k), flexible and parsimonious weighting functions are obtained. Further, no model specification step is required, because estimation of κ1 and κ2 also acts to select the lags that enter the transition function. Although the specification of q is required, this is common to all procedures in the realistic case where the relevant lag(s) entering the transition function is (are) unknown.


This section develops a procedure for testing linearity against nonlinearity of the WSTR form.4 After discussing the use of Taylor series expansions for STR nonlinearity tests, our wild bootstrap test is outlined and its finite sample properties examined.

3.1. Taylor Series Expansions

The testing procedure for STR models through Taylor series approximations is laid out in Teräsvirta (1994) and Luukkonen et al. (1988). Our discussion considers the bivariate case with xt = (yt−1, …, yt−p, zt−1, …, zt−r) and n = p + r in (1), since this is sufficient to illustrate the issues. We assume the transition variable is a linear combination of the elements of st = (zt−1, zt−2, …, zt−q); the modifications required for st = (yt−1, yt−2, …, yt−q), including the univariate case, are obvious. Generalizations to further explanatory variables are also straightforward, although our methodology assumes that st involves lags of a single variable. Note, however, that we do not require r = q, so that the maximum lag q potentially entering the transition function (2) can differ from the maximum lag r for zt in xt.

It is convenient to define the scalar s̃t = stδ and, without loss of generality, centre the logistic function by defining f̃(s̃t) = f(s̃t) − 1/2, so that f̃ = 0 under the null hypothesis γ = 0. To avoid the identification problem in (1) under this null hypothesis, Luukkonen et al. (1988) replace f̃(s̃t) by a Taylor series approximation around γ = 0. With a third-order approximation for f̃(s̃t), this leads to

yt = ϕ0 + xtϕ1 + (stδ)xtϕ2 + (stδ)²xtϕ3 + (stδ)³xtϕ4 + e*t        (4)

where e*t includes the approximation error.5 The parameter vectors ϕi (i = 1, 2, 3, 4) are functions of α0, α1, β0, β1, γ and δ in (1)/(2). For known δ, linearity can be tested against a WSTR alternative through the 3n restrictions ϕi = 0 for i = 2, 3, 4 in (4).

However, in practice δ is typically unknown.6 Using (3), only (κ1, κ2) are required and, although these parameters are also unknown, it may be plausible to define a set of values κℓ = (κ1,ℓ, κ2,ℓ)′ (ℓ = 1, …, m), with corresponding weight vectors δℓ, which captures the features anticipated in a specific application. Alternatively, the traditional single lag specification δ = ek for k = 1, …, q could be employed as the set of lag functions for testing purposes. It may be noted that although the test regression (4) can be generalized to include all members of the set, as in Luukkonen et al. (1988) for the LSTAR model, this leads to a highly parameterized test regression. Therefore, we pursue a bootstrap approach to nonlinearity testing.

3.2. Bootstrap Inference

Assume a discrete set Δ of weight vectors δℓ (ℓ = 1, 2, …, m) is given, which can be specified through a set of κℓ = (κ1,ℓ, κ2,ℓ)′ in (3). Then, indexing the corresponding parameters of (4) in an obvious way, a Lagrange multiplier test of H0: ϕi,ℓ = 0 (i = 2, 3, 4) for given δℓ in this regression results in a test statistic LM(δℓ). A linearity test requires a joint test over all ℓ = 1, 2, …, m and, following the literature in this area (Davies, 1987; Andrews and Ploberger, 1994; Hansen, 1996), the maximal, exponential or average statistics, LMmax, LMexp or LMave, can be applied to translate LM(δℓ), for all δℓ ∈ Δ, into a single test statistic. As is now well known, these overall statistics do not follow standard distributions, and critical values can be tabulated only for limited specific cases (Andrews, 1993).
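Given the m individual statistics LM(δℓ), the overall statistics can be computed as in the sketch below; the maximal and average forms are direct, while the specific exponential form used (the Andrews–Ploberger exponential average) is our assumption:

```python
import math

def overall_statistics(lm_values):
    """Translate the per-delta statistics LM(delta_l), l = 1..m, into the
    single overall statistics LMmax, LMexp and LMave. The exponential form
    shown is the Andrews-Ploberger exponential average (an assumption)."""
    m = len(lm_values)
    lm_max = max(lm_values)
    lm_ave = sum(lm_values) / m
    lm_exp = math.log(sum(math.exp(0.5 * v) for v in lm_values) / m)
    return lm_max, lm_exp, lm_ave
```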

Our approach to hypothesis testing is based on Hansen (1996), who proposes simulating the distributions of test statistics for parameters that are unidentified under the null hypothesis. However, our approach differs in how possible heteroskedasticity is taken into account. Allowing for heteroskedasticity when testing for STR nonlinearity has been problematic, with Lundbergh and Teräsvirta (1998) finding that robustification can remove most of the power of the test. Nevertheless, Becker and Hurn (2009) demonstrate that appropriate bootstrap techniques can deliver reliable inference and we pursue this line in order to avoid the undesirable consequences of unmodelled heteroskedasticity.

For a specific δℓ, let xt,ℓ = (1, xt, (stδℓ)xt, (stδℓ)²xt, (stδℓ)³xt) and let Y and Xℓ represent the stacked matrices of sample observations yt and xt,ℓ for t = 1, …, T, with Mℓ = Xℓ′Xℓ/T. Define the n0 = n + 1 and n1 = 3n-dimensional parameter vectors α = (ϕ0 ϕ1′)′ and βℓ = (ϕ2,ℓ′ ϕ3,ℓ′ ϕ4,ℓ′)′, so that (4) has coefficient vector θℓ = (α′ βℓ′)′ and the null hypothesis becomes H0: βℓ = 0. Also define R = (0 I)′, with 0 being an (n1 × n0) matrix of zeros and I an (n1 × n1)-dimensional identity matrix, so that R′θℓ = βℓ. Further, let ε̂ be the estimated residual vector for the model imposing the null hypothesis, with the score function having typical element xt,ℓ′ε̂t.

In the presence of possible heteroskedasticity, the LM statistic for testing H0: βℓ = 0 in (4) for a specific δℓ is

LM(δℓ) = ŝℓ′R(R′V̂ℓR)⁻¹R′ŝℓ        (5)

where V̂ℓ is a consistent robust estimator of the covariance matrix for the score vector ŝℓ = (1/√T) Σt=1,…,T ε̂txt,ℓ′, in which V̂ℓ = Ŝℓ′Ŝℓ/T and Ŝℓ is obtained by stacking the score vector evaluated under the null hypothesis, namely stacking ε̂txt,ℓ for t = 1, …, T. When (5) is computed over all δℓ ∈ Δ, sample statistics LMmax, LMexp or LMave can be obtained.
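A compact sketch of a robust LM statistic of this general form, using a White-type covariance estimator, is given below; this is our own illustrative implementation, not the paper's code, and the argument names are ours:

```python
import numpy as np

def robust_lm(y, X_null, X_aux):
    """Heteroskedasticity-robust LM statistic in the spirit of (5) for one
    weight vector delta: the score of the auxiliary regressors is evaluated
    at the null (linear) estimates and combined with a White-type covariance
    estimator. An illustrative sketch only."""
    # Residuals imposing the null hypothesis (linear part of (4) only)
    b = np.linalg.lstsq(X_null, y, rcond=None)[0]
    e = y - X_null @ b
    # Full regressor set x_{t,l}: null regressors plus the Taylor terms
    X = np.hstack([X_null, X_aux])
    S = X.T @ e                              # score vector under the null
    V = (X * (e ** 2)[:, None]).T @ X        # robust covariance estimator
    n0, n1 = X_null.shape[1], X_aux.shape[1]
    R = np.vstack([np.zeros((n0, n1)), np.eye(n1)])  # selects tested coefficients
    Rs = R.T @ S
    RVR = R.T @ V @ R
    return float(Rs @ np.linalg.solve(RVR, Rs))
```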

In order to conduct asymptotically valid tests that replicate the heteroskedasticity observed in the data, the resampling procedure of Hansen (1996), when applied in our case, involves resampling from the distribution of ŝℓ through

ŝj,ℓ = (1/√T) Σt=1,…,T ε̂tuj,txt,ℓ′        (6)

for j = 1, 2, …, J, where uj is a (T × 1) vector of standard normally distributed random variables. Asymptotically, this randomization preserves the observed heteroskedasticity, but any given random draw j does not exactly reproduce the heteroskedastic pattern of the observed data. Following the recent bootstrapping literature, we apply the fixed-design wild bootstrap to generate random draws from the null distribution. Gonçalves and Kilian (2004) show theoretically that wild bootstrap procedures applied to robust test statistics deliver consistent inference for the coefficients of an autoregressive process. Moreover, they establish through simulations that in moderate samples wild bootstrap procedures tend to be more accurate than the robust test statistics evaluated against their asymptotic distributions. Becker and Hurn (2009) find that the wild bootstrap performs well when testing nonlinearity in the presence of heteroskedasticity.

To implement the wild bootstrap, replace uj in (6) with ηj, where each element ηj,t (t = 1, …, T; j = 1, …, J) is generated as an independent draw from the Rademacher distribution, such that ηj,t = 1 or −1, each with probability 0.5. By using the residual ε̂t computed under the null hypothesis in (6), but randomizing its sign through ηj,t, the fixed-design wild bootstrap exactly replicates the heteroskedasticity observed in the finite sample under test.7 Realization j from the asymptotic null distribution of (5) is computed as

LMj(δℓ) = ŝj,ℓ′R(R′V̂ℓR)⁻¹R′ŝj,ℓ        (7)
After calculating LMj(δℓ) in (7) for each δℓ ∈ Δ, maximum, exponential and/or average test statistics for bootstrap replication j can be computed. Repeating this procedure for j = 1, …, J draws of ηj, the bootstrap p-value for a sample (LMmax, LMexp or LMave) test statistic is then obtained as

p̂ = (1/J) Σj=1,…,J I(LMj ≥ LM)        (8)
where I is an indicator function which takes the value unity when the condition in parentheses is satisfied and is zero otherwise. Clearly, (8) refers to the maximal, exponential or average form of the test statistic, as appropriate.
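The wild bootstrap loop and the p-value calculation just described can be sketched as follows (function and argument names are ours; stat_fn stands in for the full computation of the chosen overall statistic from a residual draw):

```python
import numpy as np

def rademacher_pvalue(stat_fn, e_hat, J=400, seed=0):
    """Fixed-design wild bootstrap p-value in the spirit of (8). Each draw
    flips the signs of the null residuals with Rademacher variables;
    stat_fn maps a residual vector to the overall (max, exp or ave)
    statistic. An illustrative sketch, not the paper's code."""
    rng = np.random.default_rng(seed)
    lm_obs = stat_fn(e_hat)
    count = 0
    for _ in range(J):
        eta = rng.choice([-1.0, 1.0], size=e_hat.shape)  # Rademacher signs
        if stat_fn(e_hat * eta) >= lm_obs:               # indicator in (8)
            count += 1
    return count / J
```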

Under homoskedasticity, the sample statistic is calculated using V̂ℓ = σ̂²Mℓ in (5), where σ̂² is the estimated disturbance variance imposing the null hypothesis. Bootstrap inference is conducted as in Hansen (1996), and proceeds by using ŝj,ℓ = (1/√T) Σt=1,…,T εj,t*xt,ℓ′ in place of (6), where the elements εj,t* are independent N(0, σ̂²) random variables.

3.3. Finite-Sample Properties

This subsection examines the finite-sample size and power properties of the nonlinearity test procedure just outlined. All results are based on 10,000 replications and sample sizes are T = 200, 500 and 1000, while bootstrap inference uses J = 400.

3.3.1. Size

The DGP for investigating size is the univariate AR(1) yt = 0.4yt−1 + εt and the nonlinear model investigated is of the univariate WSTAR form. In the homoskedastic case, εt ∼ N(0, 1), while the heteroskedastic case employs εt ∼ N(0, σt²), with σt² constant over an initial portion of the sample and taking an abruptly different value thereafter; the latter represents the occasional abrupt volatility changes that appear to characterize macroeconomic variables (see, for example, Sensier and van Dijk, 2004). The WSTAR nonlinearity test utilizes the auxiliary regression (4), with inference conducted as in Section 3.2, employing the Hansen (1996) homoskedastic bootstrap procedure and our fixed-design wild bootstrap version of the Hansen heteroskedasticity-robust test.8 These two cases are denoted by the subscripts homo and wb, respectively.

In practice, the order p of the autoregression in the DGP is unknown, while implementation of the WSTAR nonlinearity test also requires a priori specification of the maximum lag q that may enter the transition function. To replicate the (often arbitrary) values used in applications with quarterly data, we consider p = q = 4 and p = q = 8. A third case utilizes p = 4 and q = 8, mimicking the situation where a researcher considers that a relatively long weighted average of past values may be required to capture business cycle regimes. The sets Δ of δ values employed are defined through (κ1,ℓ, κ2,ℓ), and include κ1,ℓ = κ2,ℓ = 1, giving equal weight to all lags, and also cases where the weight on lag i is close to unity for i = 1, 2, 3, 4, to allow the possibility that nonlinearity may be of the LSTR type assumed by Teräsvirta (1994).9

Size results are displayed in Table I for nominal significance levels of 1%, 5% and 10%. Results based on LMexp are not shown, in either its homoskedastic or heteroskedastic forms, since these are always very similar to those obtained using the corresponding LMave and, more particularly, LMmax statistics.

Table I. Empirical size of WSTR tests

Note: Empirical sizes of WSTR nonlinearity tests, with p lags in the linear part and maximum lag q in the transition function. Test statistics assume homoskedastic (subscript homo) or heteroskedastic disturbances through the use of the wild bootstrap (subscript wb). See text for further details.

                        p = 4, q = 4         p = 8, q = 8         p = 4, q = 8
T       Statistic       1%    5%    10%      1%    5%    10%      1%    5%    10%

(a) Homoskedastic DGP
200     LMmax,homo      0.009 0.041 0.086    0.007 0.048 0.099    0.008 0.044 0.093
        LMave,homo      0.007 0.038 0.087    0.006 0.043 0.098    0.008 0.042 0.088
        LMmax,wb        0.012 0.048 0.094    0.011 0.048 0.099    0.012 0.049 0.099
        LMave,wb        0.011 0.049 0.095    0.010 0.051 0.104    0.012 0.050 0.097
500     LMmax,homo      0.010 0.046 0.093    0.008 0.046 0.097    0.010 0.042 0.088
        LMave,homo      0.010 0.041 0.089    0.009 0.043 0.090    0.009 0.040 0.082
        LMmax,wb        0.010 0.050 0.100    0.011 0.050 0.097    0.011 0.047 0.097
        LMave,wb        0.012 0.051 0.102    0.012 0.050 0.096    0.011 0.045 0.095
1000    LMmax,homo      0.010 0.046 0.091    0.012 0.049 0.096    0.010 0.044 0.091
        LMave,homo      0.011 0.045 0.091    0.011 0.045 0.095    0.009 0.045 0.088
        LMmax,wb        0.010 0.049 0.097    0.013 0.052 0.101    0.013 0.050 0.102
        LMave,wb        0.013 0.051 0.096    0.012 0.049 0.101    0.013 0.051 0.098

(b) Heteroskedastic DGP
200     LMmax,homo      0.165 0.379 0.522    0.348 0.631 0.768    0.183 0.401 0.544
        LMave,homo      0.159 0.370 0.510    0.368 0.651 0.784    0.171 0.386 0.526
        LMmax,wb        0.013 0.053 0.102    0.016 0.064 0.124    0.013 0.056 0.107
        LMave,wb        0.014 0.055 0.110    0.016 0.068 0.132    0.012 0.055 0.106
500     LMmax,homo      0.204 0.414 0.557    0.408 0.678 0.796    0.211 0.428 0.572
        LMave,homo      0.195 0.401 0.540    0.438 0.698 0.812    0.199 0.414 0.557
        LMmax,wb        0.012 0.057 0.106    0.015 0.057 0.106    0.013 0.057 0.108
        LMave,wb        0.012 0.053 0.104    0.014 0.059 0.118    0.013 0.054 0.106
1000    LMmax,homo      0.219 0.428 0.569    0.424 0.678 0.798    0.222 0.445 0.581
        LMave,homo      0.211 0.423 0.556    0.454 0.700 0.808    0.213 0.430 0.572
        LMmax,wb        0.013 0.055 0.107    0.014 0.060 0.113    0.012 0.055 0.112
        LMave,wb        0.014 0.057 0.112    0.016 0.060 0.116    0.011 0.056 0.114

With a homoskedastic DGP in panel (a), the tests assuming homoskedasticity have empirical size fairly close to the nominal value, although they are slightly conservative. Not surprisingly, size generally improves with T and also generally improves when fewer parameters are estimated. Indeed, very similar sizes apply for q = 4 and 8 when a common assumed AR order p = 4 is employed. Although it allows for the possibility of heteroskedasticity that is not present in this DGP, it is striking that the use of the wild bootstrap procedure delivers very good size for all cases in panel (a), irrespective of T.

However, when the true DGP is heteroskedastic, tests based on homoskedasticity are badly oversized; see panel (b) of Table I. For example, with a nominal significance level of 0.05, the empirical size is around 0.40 or more, and exceeding 0.60 when p = 8 is employed. This finding applies irrespective of the use of LMmax, LMexp or LMave, and hence rejections of linearity using tests based on homoskedasticity should be treated with extreme caution when heteroskedasticity may be present. Once again, wild bootstrap inference performs well. Although some oversizing is observed for these tests in panel (b), this is modest even when T = 200, and especially with p = 4.

3.3.2. Comparison with Other Tests

Table II compares our WSTAR nonlinearity test procedure with two approaches based on the conventional assumption of a single-lag transition variable, yt−k. The first, denoted LMk,max, is a special case of LMmax,homo, but restricts the set of weight functions Δ to those with weight concentrated on a single lag yt−k, k = 1, …, q, representing a conventional LSTAR model. The second, denoted LST, is the overall test of Luukkonen et al. (1988) for a single-variable transition function at unknown lag k = 1, …, q, for which the test statistic is compared with the appropriate asymptotic χ2 distribution. This test also assumes homoskedasticity, and hence LMk,max and LST compare different approaches to implementing a nonlinearity test (under homoskedasticity) when δ = ek.

Table II. Empirical size and power of nonlinearity tests

Note: Tests are the WSTR LMmax and LMave (subscripts homo and wb indicate use of homoskedasticity and wild bootstrap draws, respectively), LMk,max is the LMmax test computed over weight distributions representing the single-lag LSTAR specification, while LST indicates the Luukkonen et al. (1988) test. The linear AR(1) results yield size, while the LSTAR and WSTAR DGPs give empirical power. In all cases p = q = 4 in the test regression.

                      Linear AR(1) DGP     LSTAR DGP,                WSTAR DGP,
                                           δ = (1, 0, 0, 0)′         δ = (1/3, 1/3, 1/3, 0)′
T       Test          1%    5%    10%      1%    5%    10%           1%    5%    10%

200     LMmax,homo    0.008 0.041 0.084    0.020 0.084 0.161         0.041 0.148 0.258
        LMave,homo    0.006 0.039 0.081    0.018 0.086 0.166         0.051 0.178 0.294
        LMmax,wb      0.008 0.044 0.094    0.022 0.086 0.163         0.049 0.154 0.257
        LMave,wb      0.009 0.047 0.097    0.026 0.105 0.187         0.055 0.169 0.275
        LMk,max       0.010 0.042 0.089    0.023 0.087 0.160         0.044 0.142 0.240
500     LMmax,homo    0.007 0.044 0.086    0.094 0.247 0.368         0.254 0.479 0.609
        LMave,homo    0.008 0.044 0.090    0.087 0.243 0.368         0.294 0.531 0.659
        LMmax,wb      0.012 0.052 0.099    0.097 0.251 0.371         0.277 0.505 0.635
        LMave,wb      0.012 0.051 0.101    0.108 0.272 0.391         0.338 0.574 0.690
        LMk,max       0.011 0.045 0.093    0.109 0.249 0.359         0.239 0.448 0.581
1000    LMmax,homo    0.009 0.045 0.092    0.376 0.593 0.706         0.718 0.880 0.932
        LMave,homo    0.008 0.046 0.092    0.305 0.551 0.679         0.754 0.904 0.948
        LMmax,wb      0.010 0.050 0.101    0.381 0.598 0.709         0.747 0.892 0.939
        LMave,wb      0.009 0.053 0.106    0.334 0.570 0.695         0.788 0.917 0.957
        LMk,max       0.009 0.048 0.094    0.382 0.606 0.713         0.688 0.859 0.919

Although not reported in Table II, our analysis also considered a test based on the lag k leading to the strongest rejection of the linearity null hypothesis over k = 1, …, q when δ = ek in (4). This is proposed by Teräsvirta (1994) as a model specification procedure, but the literature abounds with examples where the minimum p-value obtained by comparing each test statistic to the χ2 distribution is incorrectly treated as valid in relation to the overall linearity null hypothesis. However, this leads to very substantial over-sizing, with empirical size around three times the nominal size, irrespective of the sample size employed.

Both homoskedastic and wild bootstrap versions of the univariate WSTAR tests are implemented in Table II using q = 4.10. An AR lag of p = 4 is employed for all tests.

The linear AR(1) process used in Subsection 3.3.1 is again employed to evaluate size. Two nonlinear DGPs provide power information, both of which use (1) and (2) with α0 = β0 = 0, α1 = (0.6, 0, 0, 0)′, β1 = (−0.4, 0, 0, 0)′, γ = 20 and c = 0. One is a conventional LSTAR specification with δ = (1, 0, 0, 0)′ and hence has transition variable yt−1, while the second has weights of the WSTAR form with δ = (1/3, 1/3, 1/3, 0)′. Note that the latter true weight distribution is an average over three, not four, lags, so that none of the weight distributions included in the set Δ exactly corresponds to this δ. For the analysis of Table II, all DGPs are homoskedastic.
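For concreteness, the LSTAR power DGP can be simulated with a short routine of the following kind (burn-in length, seeding and names are our choices):

```python
import numpy as np

def simulate_lstar(T, gamma=20.0, c=0.0, sigma=1.0, seed=0, burn=100):
    """Simulate the LSTAR power DGP used for Table II:
    y_t = 0.6 y_{t-1} - 0.4 y_{t-1} f(y_{t-1}) + eps_t,
    with f the logistic transition function, gamma = 20 and c = 0.
    The burn-in discards start-up effects."""
    rng = np.random.default_rng(seed)
    y = np.zeros(T + burn)
    for t in range(1, T + burn):
        f = 1.0 / (1.0 + np.exp(-gamma * (y[t - 1] - c)))
        y[t] = 0.6 * y[t - 1] - 0.4 * y[t - 1] * f + sigma * rng.standard_normal()
    return y[burn:]
```

The autoregressive coefficient thus moves smoothly between 0.6 (when yt−1 is well below c) and 0.2 (well above c), so the simulated series is stationary in both regimes.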

The WSTR results for the AR(1) in Table II are, not surprisingly, similar to the more detailed size results in Table I. The LST statistic is quite severely undersized for T = 200, 500 but, as anticipated, this is less marked for T = 1000.

Turning to power, the LST test always has less power than those employing bootstrap inference, due primarily to the number of parameter restrictions being tested. For example, in the LSTAR specification with transition variable yt−1, when T = 500 and a nominal 5% significance level is employed, the LST test has power 0.18, whereas the corresponding bootstrap tests, including LMk,max, have power around 0.25. All tests gain power when applied to the DGP that has a WSTAR form, but the advantage of the bootstrap approach over the Luukkonen et al. (1988) test is also more marked in this case. Although considered here only for p = 4, it can be anticipated that the performance of the bootstrap tests will dominate the LST test even more strongly when higher potential lag orders are considered, especially for realistic sample sizes in macroeconomics, such as T = 200 or T = 500.

Also note that, when the DGP is of the WSTAR form, the WSTAR tests that consider a wider set of weight functions almost always have higher power than LMk,max. However, and perhaps surprisingly, the LMk,max statistic does not dominate these more general tests when the DGP has an LSTAR form. Further, robustification against heteroskedasticity using the wild bootstrap LM statistics does not lead to a deterioration of power. Indeed, in contrast to the recommendation of van Dijk et al. (2002, p. 160) that robust procedures not be used for nonlinearity testing, our recommendation (based on Tables I and II) is that the wild bootstrap form of the WSTR test should always be applied, irrespective of whether heteroskedasticity is suspected and irrespective of whether the transition variable may be a single lag or a weighted average of lags.


While the presence of nonlinearities in many economic time series relationships is unquestioned, it is more contentious whether nonlinear models are useful for forecasting (see Teräsvirta et al., 2005, for a relevant discussion). This section undertakes two comparisons of the WSTR form to linear, LSTR and NCSTR models, through both a Monte Carlo experiment and an empirical study of UK gross domestic product (GDP).

4.1. Monte Carlo Evidence

We employ a nonlinear DGP similar to the two-regime model estimated by Potter (1995, Table II, p. 113) for GDP in the USA. Our DGP has the form of (1), where xt = st, while maximum lags p = q = 4 are employed for analysis. The transition function is logistic, as in (2), with c = 0.0 and γ = 100, while α0 = −0.205 and β0 = 0.545, the coefficient vectors α1 and β1 are based on the Potter (1995) estimates, and εt ∼ N(0, σ²).

The parameters σ² and δ vary across simulations, with σ² = 0.94, 0.5, 0.1, the first of these being approximately the value estimated in Potter (1995). For the nonlinear specifications, lower values of σ² (with other parameters given) yield stronger nonlinearities. Two weight vectors δ are considered: δ1 = (0, 1, 0, 0)′, which yields a conventional LSTAR model with delay parameter k = 2, and a more general weight function δ2 that may be captured by WSTAR or NCSTAR specifications. In addition, a linear AR(2) DGP is considered, with parameters α0, α1 and σ² as for the nonlinear models.

Each DGP is replicated 1000 times for samples of T = 500, 1000 observations. For the linear, LSTAR and WSTAR models, the lag p employed in xt is selected using the Schwarz criterion (SBC) in a linear model context. For the NCSTAR model, SBC is used in the way proposed by Medeiros and Veiga (2005). The transition variable lag k for the LSTAR model is selected as that yielding the minimum residual sum of squares in a grid search over γ, c and k = 1, 2, 3, 4,11 while the included lags in the NCSTAR case are selected through the testing procedure of Medeiros and Veiga (2005). As noted above, no such lag choice is required for the WSTAR model. All models are estimated by nonlinear least squares; see the Appendix for further discussion.

After estimation, forecasts are calculated for horizons τ = 1, 2, …, 10 by simulating the estimated process to τ steps ahead (400 simulations per forecast), with the average yielding an estimate of E[yT+τ|yT, yT−1, …]. The resulting root mean square forecast errors (RMSFEs) across the 1000 replications are shown in Table III as ratios with the AR model on the numerator, so that values greater than unity indicate more accurate forecasts relative to the linear specification.
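The simulation-based forecast just described can be sketched generically as follows (names are ours; step_fn represents one step of whichever estimated model is being forecast):

```python
import numpy as np

def simulated_forecast(y_last, step_fn, tau, n_sim=400, seed=0):
    """Monte Carlo multi-step forecast: simulate the estimated process tau
    steps ahead n_sim times and average the terminal values, estimating
    E[y_{T+tau} | y_T, ...]. step_fn(y_prev, eps) returns y_t given the
    previous value and a shock."""
    rng = np.random.default_rng(seed)
    terminal = np.empty(n_sim)
    for i in range(n_sim):
        y = y_last
        for _ in range(tau):
            y = step_fn(y, rng.standard_normal())  # one step of the fitted model
        terminal[i] = y
    return float(terminal.mean())
```

Averaging over simulated paths, rather than iterating the conditional mean, is needed here because for nonlinear models the multi-step conditional expectation is not obtained by simply plugging earlier point forecasts back into the model.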

Table III. Forecast performance relative to linear AR model
T | γ | σ2 | δ | LSTAR model, horizon τ | WSTAR model, horizon τ | NCSTAR model, horizon τ
Note: Values presented are root mean square forecast errors (RMSFE) relative to the linear model (on the numerator). Linear and nonlinear DGPs are employed. See text for details.

(a) No prior nonlinearity test
(b) With prior nonlinearity test

Panel (a) of Table III reports results without any prior test for the presence of nonlinearity and with a single transition function employed for the nonlinear models. Not surprisingly, the linear specification generally produces the most accurate forecasts when the true process is linear (γ = 0). However, the accuracy loss from using a WSTAR or NCSTAR model at short horizons (up to τ = 5) is at most 2%, which is less than that of the more restricted LSTAR model. At τ = 10, however, the performance of the NCSTAR model can drop dramatically, even with T = 1000 observations. Although this may be ascribed partly to the absence of prior testing for nonlinearity, neither the WSTAR nor the LSTAR model exhibits similar RMSFE deterioration.12

The NCSTAR model also generally performs well for the LSTAR DGP (weight vector δ1). Indeed it outperforms even the LSTAR model for this case in panel (a), whereas the WSTAR model is generally a little less accurate. Although the more general weight vector δ2 can, in principle, be captured by both the WSTAR and the NCSTAR models, the former does better overall in this case. As for the linear DGP, the performance of the NCSTAR (and indeed the LSTAR here) sometimes deteriorates badly at longer horizons. However, the WSTAR model does not suffer from this problem.

It is often emphasized that nonlinear models should be used for forecasting only when there is evidence that nonlinearity is present in the data. Therefore, panel (b) of Table III shows results for a selection of DGPs when the nonlinear model is used only if a prior test for nonlinearity, applied to the sample of T observations, indicates nonlinearity at the 5% significance level. In the light of its good performance in Section 3.3, this prior testing employs the WSTAR test LMequation image, using the weights for q = 4 as in Tables I and II. Forecasts are generated from a linear model if no significant nonlinearity is found. Further, since the NCSTAR modelling strategy of Medeiros and Veiga (2003, 2005) involves testing for additional nonlinearity after estimating a model with one (or more) transition functions and adding further transitions (one at a time) until no such evidence is found, this is incorporated in the NCSTAR model results in panel (b).13

The pattern of results from panel (a) continues to apply in panel (b), except that the NCSTAR model delivers worse performance than previously when the DGP is linear. Further, the pre-testing strategy does not prevent the NCSTAR model from sometimes producing very large RMSFE when τ = 10, as in the results for T = 1000, σ2 = 0.5 and weights δ1. The WSTAR model is particularly impressive: with pre-testing it yields forecasts as accurate as the linear and LSTAR models for the linear and LSTAR DGPs, respectively, and it has the best performance of all models when the DGP has the smooth transition function weights of δ2.

4.2. UK Growth and Interest Rate Spread

There is an extensive literature (particularly for the USA) on the leading indicator properties of the interest rate spread for output growth; see, among many others, Estrella and Hardouvelis (1991), Hamilton and Kim (2002), and Davis and Fagan (1997). Although much of the literature employs linear models, Galvão (2009) uses higher-frequency financial variable data in a smooth transition MIDAS specification for US GDP growth, while Anderson et al. (2007) examine nonlinear interactions for the G7 countries. Here we investigate the performance of linear and nonlinear models for forecasting quarterly UK real GDP growth using the end-of-month interest rate spread (at the monthly frequency). The spread is constructed as 10-year bond returns less the 3-month Treasury bill yield, while GDP growth is computed as the first difference of the logarithm of the quarterly seasonally adjusted values. The sample period of 1960Q1 to 1999Q4 used for nonlinearity testing and model specification is the same as that employed by Anderson et al. (2007), with subsequent data used for forecasting.

A preliminary linear analysis, with GDP and spread values considered to a maximum of one year (4 and 12 lags, respectively) and maximum lags selected by the Akaike information criterion (AIC), leads to xt = (yt−1, yt−2, yt−3, rt−1/3, rt−2/3)′, where yt is GDP growth and rt−i/3 is the interest rate spread at a lag of i months relative to GDP in quarter t. Including an intercept and the dummy variable DUM73 for 1973Q1 to account for the abnormal growth of 5% experienced in that quarter, the WSTR wild bootstrap nonlinearity test for an interest rate spread transition with q = 12 yields p-values of 0.049, 0.006 and 0.032 for LMequation image, LMequation image and LMequation image, respectively. The set of beta distribution parameters (κ1, ℓ, κ2, ℓ) used in computing these statistics captures a variety of plausible shapes for this monthly case.14

To specify the WSTR model, the transition function corresponding to LMequation image is employed as equation image and the nonlinear model of (1) is estimated by ordinary least squares (OLS), with AIC then used to select individual variables from the set (1, DUM73t, xt, equation image, equation image).15 Nonlinear least squares estimation of the resulting specification yields (with heteroskedasticity-robust t-statistics conditional on transition function parameter estimates in parentheses)




The function equation image(0.010, 0.827) has weight of approximately 0.3 at a lag of 1 month, with weights that decline slowly as the lag length increases.
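As a rough illustration of such a weight function, the sketch below assumes the weight on lag ℓ is proportional to a Beta(κ1, κ2) density kernel evaluated at ℓ/(q + 1) and normalized to sum to one. This parameterization is an assumption (the paper's exact normalization is not reproduced in this extract), although it does reproduce the weight of approximately 0.3 at the 1-month lag for the parameter pair (0.010, 0.827).

```python
def beta_weights(q, k1, k2):
    """Lag weights proportional to a Beta(k1, k2) density kernel evaluated at
    the interior points l/(q+1) for lags l = 1, ..., q, normalized to sum to
    one (an assumed parameterization of the weight function)."""
    pts = [l / (q + 1) for l in range(1, q + 1)]
    dens = [u ** (k1 - 1) * (1 - u) ** (k2 - 1) for u in pts]
    total = sum(dens)
    return [d / total for d in dens]

# (1.0, 1.0) gives equal weight to every lag; the estimated (0.010, 0.827)
# concentrates roughly 0.3 of the weight on the 1-month lag, declining thereafter.
w = beta_weights(12, 0.010, 0.827)
```

Because the beta density is continuous, no parameter choice puts unit weight on a single lag (cf. footnote 3), but very concentrated shapes approximate the conventional single-lag transition variable.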

A forecast comparison employs the alternative models as in the previous subsection. All include an intercept and DUM73; the linear specification employs xt as defined above. The LSTR specification is selected from the set (1, DUM73t, xt, equation image, equation image) by AIC in an analogous way to the WSTR case, except that equation image is here the single lag rt−k/3 in the range k = 1, 2, …, 12 that yields the maximum nonlinearity test statistic in (4) for δ = ek.16 For the NCSTR case, the modelling procedure follows that recommended by Medeiros and Veiga (2005), employing (1, DUM73t, xt) and transition variable st = (rt−1/3, rt−2/3, rt−3/3, rt−4/3). Note that st in this case is restricted, compared to the WSTR and LSTR cases, due to degrees-of-freedom problems which otherwise arise using this transition function specification procedure.

Out-of-sample forecasts are compared over 25 quarters from 2000Q1 (i.e., estimated using data to 1999Q4) to 2006Q1 inclusive. Although no model respecification takes place, model parameters are re-estimated each quarter. Forecast accuracy is again compared using RMSFE, with the Clark and West (2007) test performed to evaluate whether forecast improvements are statistically significant. The Clark and West (2007) test is designed for comparing nested models, and examines the null hypothesis of equal predictive accuracy against the alternative that the more general model delivers superior forecasts.17
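The Clark and West (2007) statistic can be sketched as below, implemented generically with a Newey-West long-run variance (lag length 2, as in footnote 17); this is a textbook-style implementation, not the authors' code.

```python
import numpy as np

def clark_west(y, f_small, f_big, nw_lags=2):
    """Clark-West (2007) test of equal predictive accuracy for nested models.

    f_small: forecasts from the restricted (nested) model; f_big: forecasts
    from the more general model.  Returns the CW t-statistic, compared under
    H0 with standard normal critical values (one-sided alternative: the more
    general model forecasts better).
    """
    y, f_small, f_big = map(np.asarray, (y, f_small, f_big))
    # CW adjusted loss differential: squared-error difference corrected for
    # the noise the larger model introduces under the null
    f = (y - f_small) ** 2 - ((y - f_big) ** 2 - (f_small - f_big) ** 2)
    n = len(f)
    fbar = f.mean()
    d = f - fbar
    # Newey-West (Bartlett kernel) long-run variance of the mean
    s = d @ d / n
    for j in range(1, nw_lags + 1):
        w = 1.0 - j / (nw_lags + 1)
        s += 2.0 * w * (d[j:] @ d[:-j]) / n
    return fbar / np.sqrt(s / n)
```

The adjustment term (f_small − f_big)² is what distinguishes the CW statistic from a Diebold-Mariano comparison and makes it suitable for nested models.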

One-step-ahead forecasts yield the results in Table IV. Note, first, that all three nonlinear specifications have lower RMSFE than the linear model, with the forecasting gains over the AR model being significant at the 5% level or less. Moreover, the more general WSTR and NCSTR models also provide significant forecast gains over the LSTR specification. Although the difference between the accuracy of the NCSTR and WSTR forecasts is not statistically significant, the WSTR forecasts deliver one-step-ahead RMSFE around 2% smaller than those of the NCSTR model.

Table IV. UK GDP forecast comparison
Note: Comparison of one-step-ahead forecasts of UK GDP growth from 2000Q1 to 2006Q1. Linear, LSTR, WSTR and NCSTR forecasts are described in the text. RMSFE is root mean square forecast error relative to the linear model (on the numerator). CW shows the p-values of the Clark and West (2007) test for the forecast accuracy of the more general model relative to the indicated restricted model.

                                 LSTR     WSTR     NCSTR
CW p-value relative to linear    0.0273   0.0000   0.0031
CW p-value relative to LSTR               0.0001   0.0273
CW p-value relative to WSTR                        0.8985


Building on the distributed lag literature begun by Almon (1965), this paper generalizes the smooth transition class of nonlinear models by defining the transition variable as a parsimonious weighted average of lagged values. Our approach removes the need to specify the appropriate individual lag or lags for the transition variable, while also enabling a wide variety of weight functions to be considered. One advantage is that such a weighted average will, in general, deliver a transition variable that is smooth over time and hence imply persistent regimes. By employing the methodology of Hansen (1996), we also develop a nonlinearity testing procedure that delivers correctly sized tests in the realistic situation that the appropriate weight function is unknown and heteroskedasticity may be present. In particular, we recommend use of the fixed-design wild bootstrap (Gonçalves and Kilian, 2004) in this context, showing that it delivers well-sized inference in finite samples, while also having power comparable to that of the homoskedastic Hansen (1996) test when heteroskedasticity is absent.
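The fixed-design wild bootstrap recommended above can be sketched generically: fit the linear null model, flip its residuals with Rademacher draws while holding the regressors fixed, and recompute the test statistic on each artificial sample. The statistic passed in below is a placeholder for the paper's LM-type nonlinearity statistics.

```python
import numpy as np

def wild_bootstrap_pvalue(stat_fn, y, X, n_boot=499, seed=0):
    """Fixed-design wild bootstrap p-value for a nonlinearity test.

    stat_fn(y, X) returns the (heteroskedasticity-robust) test statistic.
    Under H0 the model is linear, so bootstrap samples are built from the
    linear fit with residuals multiplied by Rademacher draws while the
    regressors are held fixed (in the spirit of Goncalves and Kilian, 2004).
    """
    rng = np.random.default_rng(seed)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    stat0 = stat_fn(y, X)                            # statistic on actual data
    count = 0
    for _ in range(n_boot):
        eta = rng.choice([-1.0, 1.0], size=len(y))   # Rademacher weights
        y_star = fitted + resid * eta                # fixed-design wild draw
        if stat_fn(y_star, X) >= stat0:
            count += 1
    return (1 + count) / (1 + n_boot)
```

Because each bootstrap error keeps the magnitude of the original residual but randomizes its sign, the scheme preserves (conditional) heteroskedasticity under the null, which is what delivers the robustness reported in the paper.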

Because both testing and modelling use a transition variable that is based on flexible but parsimonious weighted functions of lagged values, our approach can be applied in the mixed-frequency context considered by Ghysels et al. (2005, 2006). This is illustrated in our empirical application, where the monthly interest rate spread is used to forecast UK quarterly GDP growth. In this case, and also in a more extensive Monte Carlo examination of forecast accuracy, the WSTR model performs well relative not only to a linear specification but also to the nonlinear LSTAR and NCSTR models. Galvão (2009) has independently proposed an approach similar to ours in the context of prediction, and our forecast results are in line with those she obtains.

In summary, we believe that the WSTR model proposed here has wide applicability for economic and financial time series, removing the need to specify the unknown lag for the transition variable and also delivering regimes that plausibly represent persistent phenomena such as business cycle phases. Further, the proposed testing procedure is correctly sized in finite samples, even in the presence of heteroskedasticity.


As with any nonlinear estimation, starting values are important in ensuring that the global optimum of the criterion function is reached. Therefore, use of multiple sets of starting values is recommended for WSTR model estimation. One set of starting values for θWSTR = (α0, α1, β0, β1, γ, c, κ1, κ2)′ can be obtained as a by-product of the nonlinearity test of SubSection 3.2, by using values (κ1, ℓ, κ2, ℓ) corresponding to LMequation image to compute an initial transition variable equation image, which is combined with a grid search over γ, c to obtain an initial equation image in (2).

A second set of starting values can be obtained from the same approach with δ = ek, that is, from the more restricted STR form. This vector, which is also used as the STR starting values, is translated into a vector θWSTR, 0 for the more general WSTR model by mapping the LSTR delay parameter k0 into κ1, 0 and κ2, 0. However, a number of parameter combinations for (κ1κ2)′ give almost unit weight to a single lag k and consequently WSTR estimation problems may be encountered because the objective function is flat in the neighbourhood of such starting values. For this reason, (κ1, 0, κ2, 0) are adjusted to distribute the weight over more than one lag. In practice, we achieve this by multiplying κ1, 0 and κ2, 0 corresponding to the LSTR case by a factor less than one, which increases the variance of the weight distribution.

All results are obtained using the constrained maximum likelihood procedure of GAUSS 6.0 to minimize the residual sum of squares function over the respective parameter vectors. In achieving this, the parameter vector is decomposed as θ = (θ1′, θ2′)′, where θ1 = (α0, α1, β0, β1)′ and θ2 = (γ, c, k)′, θ2 = (γ, c, κ1, κ2)′ or θ2 = (γ, c, δ′)′ for the LSTR, WSTR and NCSTR models, respectively. This is convenient as, given the estimated equation image, θ1 can be concentrated out of the criterion function, so that nonlinear optimization only needs to search over the relevant parameter space for θ2. Since equation image yields equation image, equation image can be obtained as the standard OLS estimate from regressing yt on equation image. In practice, the WSTR estimation is initialized using the LMequation image results for κ1, κ2, as suggested above, while c is initialized as the median observed value of the corresponding transition variable st′δ and γ is obtained from a grid search. Nonlinear estimation of the LSTR specification employs an initial grid search for k and γ, with c again initialized as the corresponding median observed value. Finally, initialization for the NCSTR model follows the recommendations of Medeiros and Veiga (2003, 2005), which involves a grid search with random δ.
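The concentration step can be sketched as follows for a single logistic transition, assuming θ2 = (γ, c)′ and a pre-computed weighted transition variable s; the function names are illustrative.

```python
import numpy as np

def concentrated_ssr(theta2, y, X, s):
    """Residual sum of squares with the linear parameters concentrated out.

    theta2 = (gamma, c): transition-function parameters; X holds the linear
    regressors (intercept plus lags); s is the (already weighted) transition
    variable.  Given G(s; gamma, c), theta1 = (alpha', beta')' is just the OLS
    estimate from regressing y on [X, G*X], so an outer optimizer needs to
    search over theta2 only.
    """
    gamma, c = theta2
    G = 1.0 / (1.0 + np.exp(-np.clip(gamma * (s - c), -700.0, 700.0)))
    Z = np.hstack([X, G[:, None] * X])      # regressors for both regimes
    theta1, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ theta1
    return resid @ resid

# A coarse initialization grid over (gamma, c), as described above:
# best = min(((g, c) for g in gamma_grid for c in c_grid),
#            key=lambda t2: concentrated_ssr(t2, y, X, s))
```

Concentrating out θ1 reduces an eight-parameter WSTR optimization to a low-dimensional search over the transition-function parameters, which is why the grid-search initializations described above are feasible.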

Our experience is that further restrictions often need to be imposed for NCSTR model estimation. As in any nonlinear threshold model, the estimated equation image needs to deliver equation image that varies over the sample, which (as usual) we ensure by restricting equation image to lie between the 5th and 95th percentile of observations on the nonlinear driving variable equation image. However, for the NCSTR model, variation in equation image may result in equation image lying outside the range of equation image. To avoid this problem, we define equation image and restrict, via a normal cdf squashing function, equation image. We find this substantially improves the stability of the estimation process.

It is interesting to note that, in our Monte Carlo analysis of SubSection 4.1, the NCSTR methodology tends to produce transition variable weight vectors that are substantially less widely distributed across different lags (even when δ2 is the true weight vector) than the WSTR model. Details are available on request.

Finally, after estimation, if the estimated variance–covariance matrix for the full vector equation image is required, this can be obtained using gradient and Hessian estimates for the unconcentrated likelihood function.


This version of the paper has benefited from comments of Heather Anderson, Marianne Sensier, Dick van Dijk, Hashem Pesaran and three referees. Their constructive input is gratefully acknowledged by the authors.

  • 1

    Much of our discussion can be applied to other forms of the transition function, such as the widely used exponential STR specification. However, we focus on the logistic case for expositional convenience.

  • 2

    For example, Teräsvirta et al. (2005), Sensier et al. (2002), and others employ the lagged annual change as the transition variable in a model for a quarterly variable Δyt.

  • 3

As the weights are based on the density function for a continuous random variable, the use of (3) cannot place unit weight on a single lag. Nevertheless, the weights can provide good approximations to the conventional single-lag transition variable if a suitably concentrated distribution is chosen.

  • 4

We are grateful to a referee who points out that the proposed procedure does not deliver a test that is ‘generically comprehensively revealing’ in the sense of Stinchcombe and White (1998). Nevertheless, our purpose is not to design a test against any type of nonlinearity, but rather against models of the LSTR type.

  • 5

Implicitly, (4) assumes r ≥ q. If this is not the case, then the linear term xtϕ1 needs to be supplemented by the additional lags zr+1, …, zq, with the corresponding coefficients also zero under H0: γ = 0. Further, it is assumed that any redundant transformed variables are dropped from (4).

  • 6

    In a similar context, Galvão (2009) proposes treating the transition function as known for the purposes of the test by employing a MIDAS specification for the linear part of the model and using the resulting (linear model) weights for δ. However, there is in general no a priori reason why a weighting function suitable for parameterizing a lag function for xtα1 in (1) will also be appropriate for the transition variable.

  • 7

    Although other choices of equation image are possible, our choice is supported by the Monte Carlo analysis of Godfrey and Orme (2004), who study the performance of a variety of regression misspecification tests.

  • 8

Our analysis also employed the Hansen (1996) approach for heteroskedasticity-robust inference. These results are, however, not reported, as these tests were undersized even for T = 1000, irrespective of whether the DGP was heteroskedastic or not (similar conservative results are reported in Hansen, 1996).

  • 9

    In each case, the set Δ includes nine (κ1, ℓ, κ2, ℓ) values. For q = 4 these are: (0.04,3.0), (4.0,18.0), (6.0,10.0), (0.14,0.89), (1.0,1.0), (0.04,10.0), (14.0,22.0), (22.0,14.0), (10.0,0.04). The values for q = 8 are: (0.04,3.0), (4.0,15.0), (4.0,10.0), (0.14,0.89), (1.0,1.0), (0.04,16.0), (17.0,60.0), (40.0,80.0), (60.0,80.0). Generally, it is advisable to include weights that mimic full weight on individual lags—as, in these examples, the four weights following (1.0,1.0)—and reasonable distributed weight structures—here represented by the weights listed before (1.0,1.0). It is also possible to directly employ pre-specified weight vectors for δl when applying the nonlinearity test and, if required, later find (using a least squares criterion) the beta distribution parameters that provide the closest fit to a desired weight vector.

  • 10

    The same weight set is used for testing as in SubSection 3.3.1, footnote 9. Results were also obtained for the exp form of all LM statistics. However, as for size (discussed in the preceding subsection), these results were qualitatively very similar to those obtained using the max and ave versions. Indeed, the empirical size and power for the exp form was typically intermediate between the max and ave results shown.

  • 11

    The procedure proposed by Teräsvirta (1994) for selecting k yields qualitatively very similar results.

  • 12

The reason for this, both here and for other DGPs of Table III, is that on a small number of occasions the NCSTR model produces explosive processes which eventually produce large forecast errors. While in practice such processes are easily identified as unsuitable representations of the data, their presence indicates that estimating NCSTR models can be problematic.

  • 13

    To obtain these results, as proposed by Medeiros and Veiga (2003, 2005), the minimum p-value of additional nonlinearity tests across all possible lag combinations is evaluated against 5% nominal significance. However, underperformance persists even if the more conservative modified Bonferroni p-value of Benjamini and Hochberg (1995) is used. For example, in the latter case, the DGP with T = 500, γ = 0 and σ2 = 0.94 has relative forecast performance of 0.91 at τ = 1.

  • 14

    The values employed are: (1.0,1.0), (8.0,45.0), (16.0,45.0), (16.0,25.0), (15.0,15.0), (65.0,65.0), (25.0,16.0), (45.0,16.0), (45.0,8.0).

  • 15

    Model reduction for the WSTR can be conducted using techniques applicable for LSTR models. In this case we use the approach adopted by Sensier et al. (2002).

  • 16

    We also investigated selecting the transition variable lag through a grid search procedure, as in Sensier et al. (2002). Although yielding marginally more accurate forecasts, the results are qualitatively the same as those reported.

  • 17

    Robust standard errors (Newey–West with lag length 2) are used in calculating the Clark–West statistics. Critical values are obtained from the asymptotic standard normal distribution (Clark and West, 2007). Note that although in principle the models are nested, the separate model reductions applied to obtain the specifications used for forecasting do not guarantee this in practice. Using the full set of variables defined by xt, without model reduction, resulted (in each case) in worse forecast performance than that reported.