This section develops a procedure for testing linearity against nonlinearity of the WSTR form.4 After discussing the use of Taylor series expansions for STR nonlinearity tests, our wild bootstrap test is outlined and its finite sample properties examined.
3.1. Taylor Series Expansions
The testing procedure for STR models through Taylor series approximations is laid out in Teräsvirta (1994) and Luukkonen et al. (1988). Our discussion considers the bivariate case with xt = (yt−1, …, yt−p, zt−1, …, zt−r) and n = p + r in (1), since this is sufficient to illustrate the issues. We assume the transition variable is a linear combination of the elements of st = (zt−1, zt−2, …, zt−q); the modifications required for st = (yt−1, yt−2, …, yt−q), including the univariate case, are obvious. Generalizations to further explanatory variables are also straightforward, although our methodology assumes that st involves lags of a single variable. Note, however, that we do not require r = q, so that the maximum lag q potentially entering the transition function (2) can differ from the maximum lag r for zt in xt.
It is convenient to define the scalar transition variable stδ and, without loss of generality, to centre the logistic function by subtracting 1/2, so that the centred function f satisfies f(0) = 0. To avoid the identification problem in (1) under the null hypothesis γ = 0, Luukkonen et al. (1988) replace the transition function by a Taylor series approximation around γ = 0. With a third-order approximation for f, this leads to

$$y_t = \phi_0 + x_t\phi_1 + (s_t\delta)\,x_t\phi_2 + (s_t\delta)^2\,x_t\phi_3 + (s_t\delta)^3\,x_t\phi_4 + e_t, \tag{4}$$

where the error term et includes the approximation error (see footnote 5). The vector parameters ϕi (i = 1, 2, 3, 4) are functions of α0, α1, β0, β1, γ and δ in (1)/(2). For known δ, linearity can be tested against a WSTR alternative through the 3n restrictions ϕi = 0 for i = 2, 3, 4 in (4).
However, in practice δ is typically unknown.6 Using (3), only (κ1, κ2) are required and, although these parameters are also unknown, it may be plausible to define a set κℓ = (κ1, ℓ, κ2, ℓ)′(ℓ = 1, …, m), with corresponding weight vectors δℓ, which captures the features anticipated in a specific application. Alternatively, the traditional single lag specification δ = ek for k = 1, …, q could be employed as the set of lag functions for testing purposes. It may be noted that although the test regression (4) can be generalized to include all members of the set, as in Luukkonen et al. (1988) for the LSTAR model, this leads to a highly parameterized test regression. Therefore, we pursue a bootstrap approach to nonlinearity testing.
3.2. Bootstrap Inference
Assume a discrete set Δ of weight vectors δℓ(ℓ = 1, 2, …, m) is given, which can be specified through a set of κℓ = (κ1, ℓ, κ2, ℓ)′ in (3). Then, indexing the corresponding parameters of (4) in an obvious way, a Lagrange multiplier test of H0: ϕi, ℓ = 0(i = 2, 3, 4) for given δℓ in this regression results in a test statistic LM (δℓ). A linearity test requires a joint test over all ℓ = 1, 2, …, m and, following the literature in this area (Davies, 1987; Andrews and Ploberger, 1994; Hansen, 1996), the maximal, exponential or average statistics, LMmax, LMexp or LMave, can be applied to translate LM (δℓ), for all δℓ∈Δ, into a single test statistic. As is now well known, these overall statistics do not follow standard distributions, and critical values can be tabulated only for limited specific cases (Andrews, 1993).
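As a minimal sketch of how the individual LM(δℓ) values might be collapsed into a single statistic over a finite set Δ, the Python functions below compute the maximal, average and exponential forms. The exponential form is written as the log of the average of exp(LM/2), the form usually associated with Andrews and Ploberger (1994); that this exact scaling is the one intended here is an assumption, and the function names and input values are purely illustrative.

```python
import math

def lm_max(stats):
    """Maximal form: the largest LM(delta_l) over the set."""
    return max(stats)

def lm_ave(stats):
    """Average form: equally weighted mean of LM(delta_l)."""
    return sum(stats) / len(stats)

def lm_exp(stats):
    """Exponential form: log of the mean of exp(LM/2).

    The maximum is subtracted before exponentiating for numerical stability.
    """
    half = [0.5 * v for v in stats]
    m = max(half)
    return m + math.log(sum(math.exp(h - m) for h in half) / len(half))

lm_values = [2.0, 4.0, 6.0]  # hypothetical LM(delta_l) values, one per delta_l
summary = {
    "max": lm_max(lm_values),
    "ave": lm_ave(lm_values),
    "exp": lm_exp(lm_values),
}
```

Each functional maps the same set of LM(δℓ) values to a scalar, so the same functions can be applied unchanged to the sample statistics and to each set of bootstrap statistics.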
Our approach to hypothesis testing is based on Hansen (1996), who proposes simulating the distributions of test statistics for parameters that are unidentified under the null hypothesis. However, our approach differs in how possible heteroskedasticity is taken into account. Allowing for heteroskedasticity when testing for STR nonlinearity has been problematic, with Lundbergh and Teräsvirta (1998) finding that robustification can remove most of the power of the test. Nevertheless, Becker and Hurn (2009) demonstrate that appropriate bootstrap techniques can deliver reliable inference and we pursue this line in order to avoid the undesirable consequences of unmodelled heteroskedasticity.
For a specific δℓ, let xt,ℓ = (1, xt, (stδℓ)xt, (stδℓ)²xt, (stδℓ)³xt) and let Y and Xℓ represent the stacked matrices of sample observations yt and xt,ℓ for t = 1, …, T, with Mℓ = X′ℓXℓ/T. Define the n0 = (n + 1)-dimensional parameter vector α = (ϕ0, ϕ′1)′ and the n1 = 3n-dimensional parameter vector that stacks ϕ2, ϕ3 and ϕ4. Also define R = (0, I)′, with 0 being an (n1 × n0) matrix of zeros and I an (n1 × n1)-dimensional identity matrix, so that R selects the n1 coefficients that are zero under the null hypothesis. Further, let the estimated residual vector be obtained from the model imposing the null hypothesis, with the score function having typical element given by the product of xt,ℓ and the corresponding restricted residual.
In the presence of possible heteroskedasticity, the LM statistic (5) for testing H0: ϕi,ℓ = 0 (i = 2, 3, 4) in (4) for a specific δℓ is a quadratic form weighted using a consistent robust estimator of the relevant covariance matrix. This estimator is built from the matrix obtained by stacking the score vector evaluated under the null hypothesis for t = 1, …, T. When (5) is computed over all δℓ∈Δ, the sample statistics LMmax, LMexp or LMave can be obtained.
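For concreteness, one way of writing a heteroskedasticity-robust statistic of this kind, consistent with the definitions of Xℓ, Mℓ and R given above and with the general form used by Hansen (1996), is sketched below. The symbols θ̂ℓ, Ω̂ℓ, V̂ℓ and ε̃t are notation assumed for this sketch, with ε̃t denoting the residual from the model imposing the null hypothesis; the expression should be read as an illustration under those assumptions rather than as a definitive statement of (5).

```latex
\begin{aligned}
\hat\theta_\ell &= (X_\ell' X_\ell)^{-1} X_\ell' Y, \\
\hat\Omega_\ell &= \frac{1}{T}\sum_{t=1}^{T} x_{t,\ell}'\, x_{t,\ell}\, \tilde\varepsilon_t^{\,2}, \\
\hat V_\ell &= M_\ell^{-1}\, \hat\Omega_\ell\, M_\ell^{-1}, \\
LM(\delta_\ell) &= T\, \hat\theta_\ell' R \left( R' \hat V_\ell R \right)^{-1} R' \hat\theta_\ell .
\end{aligned}
```

In this form R′θ̂ℓ picks out the 3n coefficients on the interaction terms, and the middle matrix is the corresponding block of the sandwich covariance estimator.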
In order to conduct asymptotically valid tests that replicate the heteroskedasticity observed in the data, the resampling procedure of Hansen (1996), when applied in our case, involves resampling through (6), in which the residuals computed under the null hypothesis are multiplied element by element by uj for j = 1, 2, …, J, where uj is a (T × 1) vector of standard normally distributed random variables. Asymptotically, this randomization preserves the observed heteroskedasticity, but any given random draw j does not exactly reproduce the heteroskedastic pattern of the observed data. Following the recent bootstrapping literature, we apply the fixed-design wild bootstrap to generate random draws from the null distribution. Gonçalves and Kilian (2004) show theoretically that wild bootstrap procedures applied to robust test statistics deliver consistent inference for the coefficients of an autoregressive process. Moreover, they establish through simulations that in moderate samples wild bootstrap procedures tend to be more accurate than the robust test statistics evaluated against their asymptotic distributions. Becker and Hurn (2009) find that the wild bootstrap performs well when testing nonlinearity in the presence of heteroskedasticity.
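The difference between Gaussian multipliers and sign-flipping multipliers can be seen in a few lines. The sketch below, which uses hypothetical residual values and illustrative names, multiplies a residual vector by standard normal draws and by Rademacher (±1) draws, the multiplier commonly used for the wild bootstrap. Only the sign flips leave every squared residual, and hence the in-sample volatility pattern, unchanged.

```python
import random

def rademacher_draw(T, rng):
    """T independent draws taking the values +1 and -1 with probability 0.5 each."""
    return [rng.choice((-1.0, 1.0)) for _ in range(T)]

rng = random.Random(0)
resid = [0.3, -1.2, 0.1, 2.5, -0.4]  # hypothetical residuals under the null

# Wild bootstrap: flip signs only, so each squared residual is preserved exactly.
eta = rademacher_draw(len(resid), rng)
wild = [e * h for e, h in zip(resid, eta)]

# Gaussian multipliers: the variance pattern is preserved only on average.
gauss = [e * rng.gauss(0.0, 1.0) for e in resid]

same_squares_wild = all(w * w == e * e for w, e in zip(wild, resid))
same_squares_gauss = all(g * g == e * e for g, e in zip(gauss, resid))
```

Here `same_squares_wild` is true by construction, whereas the Gaussian-perturbed residuals generally have different squares in any single draw.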
To implement the wild bootstrap, replace uj in (6) with ηj, where each element of ηj (t = 1, …, T, j = 1, …, J) is generated as an independent draw from the Rademacher distribution, taking the values +1 and −1, each with probability 0.5. By using the residual computed under the null hypothesis in the score, but randomizing its sign through ηj, the fixed-design wild bootstrap exactly replicates the heteroskedasticity observed in the finite sample under test (see footnote 7). Realization j from the asymptotic null distribution of (5), denoted LMj(δℓ), is computed as in (7).
After calculating LMj(δℓ) in (7) for each δℓ∈Δ, maximum, exponential and/or average test statistics for bootstrap replication j can be computed. Repeating this procedure for j = 1, …, J draws of ηj, the bootstrap p-value for a sample (LMmax, LMexp or LMave) test statistic is then obtained as

$$p = \frac{1}{J}\sum_{j=1}^{J} I\left(LM^{j} > LM\right), \tag{8}$$
where I is an indicator function which takes the value unity when the condition in parentheses is satisfied and is zero otherwise. Clearly, (8) refers to the maximal, exponential or average form of the test statistic, as appropriate.
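The steps of this subsection can be put together in a short NumPy sketch. It is an illustration under stated assumptions rather than the authors' implementation: the robust statistic is taken to be a sandwich-type quadratic form in the spirit of Hansen (1996), the covariance estimator is held fixed across bootstrap draws, the same Rademacher draws are reused for every δℓ, and all function and variable names are hypothetical.

```python
import numpy as np

def aux_regressors(x, s, delta):
    """Rows (1, x_t, w_t x_t, w_t^2 x_t, w_t^3 x_t), with w_t = s_t @ delta."""
    w = s @ delta
    return np.hstack([np.ones((len(x), 1)), x] + [(w ** k)[:, None] * x for k in (1, 2, 3)])

def wild_bootstrap_test(y, x, s, deltas, J=400, seed=0):
    """Return bootstrap p-values for the max/ave/exp forms and the sample LM values."""
    rng = np.random.default_rng(seed)
    T, n = x.shape
    n0 = n + 1                                   # parameters left free under the null
    X0 = np.hstack([np.ones((T, 1)), x])
    coef, *_ = np.linalg.lstsq(X0, y, rcond=None)
    resid = y - X0 @ coef                        # residuals imposing the null
    eta = rng.choice([-1.0, 1.0], size=(J, T))   # Rademacher draws, shared across deltas
    lm = np.empty(len(deltas))
    lm_boot = np.empty((J, len(deltas)))
    for l, delta in enumerate(deltas):
        X = aux_regressors(x, s, np.asarray(delta, dtype=float))
        XtX_inv = np.linalg.inv(X.T @ X)
        scores = X * resid[:, None]              # score contributions under the null
        V = T * XtX_inv @ (scores.T @ scores) @ XtX_inv   # sandwich covariance
        Vb_inv = np.linalg.inv(V[n0:, n0:])
        A = XtX_inv[n0:] @ X.T                   # maps a T-vector to the tested coefficients
        b = A @ resid                            # equals the OLS estimates of the tested block
        lm[l] = T * b @ Vb_inv @ b
        bj = (eta * resid) @ A.T                 # sign-randomized analogue, shape (J, n1)
        lm_boot[:, l] = T * np.einsum("ji,ik,jk->j", bj, Vb_inv, bj)
    forms = {
        "max": lambda a: a.max(axis=-1),
        "ave": lambda a: a.mean(axis=-1),
        "exp": lambda a: np.log(np.exp(0.5 * a).mean(axis=-1)),
    }
    pvals = {k: float(np.mean(f(lm_boot) > f(lm))) for k, f in forms.items()}
    return pvals, lm

# Usage on a simulated linear AR(1), with two lags in both x_t and s_t.
rng = np.random.default_rng(1)
T = 200
e = rng.standard_normal(T + 2)
y_full = np.zeros(T + 2)
for t in range(1, T + 2):
    y_full[t] = 0.4 * y_full[t - 1] + e[t]
y = y_full[2:]
lags = np.column_stack([y_full[1:-1], y_full[:-2]])   # y_{t-1}, y_{t-2}
pvals, lm = wild_bootstrap_test(y, lags, lags, [(1, 0), (0, 1), (0.5, 0.5)], J=200)
```

Reusing one set of draws across all δℓ matters because the maximal, exponential and average forms depend on the joint behaviour of the LM(δℓ) values, not on each one separately.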
Under homoskedasticity, the sample statistic is calculated using the conventional (non-robust) covariance estimator in (5), based on the estimated disturbance variance imposing the null hypothesis. Bootstrap inference is conducted as in Hansen (1996), and proceeds by using a homoskedastic analogue of (6) and resampling a vector ε whose elements are independent random variables.
3.3. Finite-Sample Properties
This subsection examines the finite-sample size and power properties of the nonlinearity test procedure just outlined. All results are based on 10,000 replications and sample sizes are T = 200, 500 and 1000, while bootstrap inference uses J = 400.
The DGP for investigating size is the univariate AR(1) yt = 0.4yt−1 + εt and the nonlinear model investigated is of the univariate WSTAR form. In the homoskedastic case, εt∼N(0, 1), while the heteroskedastic case employs a disturbance variance that differs across subsamples, representing the occasional abrupt volatility changes that appear to characterize macroeconomic variables (see, for example, Sensier and van Dijk, 2004). The WSTAR nonlinearity test utilizes the auxiliary regression (4), with inference conducted as in Subsection 3.2 employing the Hansen (1996) homoskedastic bootstrap procedure and our fixed-design wild bootstrap version of the Hansen heteroskedastic-robust test (see footnote 8). These two cases are denoted by the subscripts homo and wb, respectively.
In practice, the order p of the autoregression in the DGP is unknown, while implementation of the WSTAR nonlinearity test also requires a priori specification of the maximum lag q that may enter the transition function. To replicate the (often arbitrary) values used in applications with quarterly data, we consider p = q = 4 and p = q = 8. A third case utilizes p = 4 and q = 8, mimicking the situation where a researcher considers that a relatively long weighted average of past values may be required to capture business cycle regimes. The sets Δ of δℓ values employed are defined through (κ1, ℓ, κ2, ℓ), and include (κ1, ℓ = 1, κ2, ℓ = 1), giving equal weight to all lags, and also cases where the weight on lag i is close to unity for i = 1, 2, 3, 4, to allow the possibility that nonlinearity may be of the LSTR type assumed by Teräsvirta (1994).9
Size results are displayed in Table I for nominal significance levels of 1%, 5% and 10%. Results based on LMexp are not shown, in either its homoskedastic or heteroskedastic forms, since these are always very similar to those obtained using the corresponding LMave and, more particularly, LMmax statistics.
Table I. Empirical size of WSTR tests

| Sample size | Statistic | p = q = 4: 1% | 5% | 10% | p = q = 8: 1% | 5% | 10% | p = 4, q = 8: 1% | 5% | 10% |
|---|---|---|---|---|---|---|---|---|---|---|
| (a) Homoskedastic DGP | | | | | | | | | | |
| T = 200 | LM | 0.009 | 0.041 | 0.086 | 0.007 | 0.048 | 0.099 | 0.008 | 0.044 | 0.093 |
| T = 500 | LM | 0.010 | 0.046 | 0.093 | 0.008 | 0.046 | 0.097 | 0.010 | 0.042 | 0.088 |
| T = 1000 | LM | 0.010 | 0.046 | 0.091 | 0.012 | 0.049 | 0.096 | 0.010 | 0.044 | 0.091 |
| (b) Heteroskedastic DGP | | | | | | | | | | |
| T = 200 | LM | 0.165 | 0.379 | 0.522 | 0.348 | 0.631 | 0.768 | 0.183 | 0.401 | 0.544 |
| T = 500 | LM | 0.204 | 0.414 | 0.557 | 0.408 | 0.678 | 0.796 | 0.211 | 0.428 | 0.572 |
| T = 1000 | LM | 0.219 | 0.428 | 0.569 | 0.424 | 0.678 | 0.798 | 0.222 | 0.445 | 0.581 |
With a homoskedastic DGP in panel (a), the tests assuming homoskedasticity have empirical size fairly close to the nominal value, although they are slightly conservative. Not surprisingly, size generally improves with T and also generally improves when fewer parameters are estimated. Indeed, very similar sizes apply for q = 4 and 8 when a common assumed AR order p = 4 is employed. Although it allows for the possibility of heteroskedasticity that is not present in this DGP, it is striking that the use of the wild bootstrap procedure delivers very good size for all cases in panel (a), irrespective of T.
However, when the true DGP is heteroskedastic, tests based on homoskedasticity are badly oversized; see panel (b) of Table I. For example, with a nominal significance level of 0.05, the empirical size is around 0.40 or more, and exceeds 0.60 when p = 8 is employed. This finding applies irrespective of the use of LMmax, LMexp or LMave, and hence rejections of linearity using tests based on homoskedasticity should be treated with extreme caution when heteroskedasticity may be present. Once again, wild bootstrap inference performs well. Although some oversizing is observed for these tests in panel (b), this is modest even when T = 200, and especially with p = 4.
3.3.2. Comparison with Other Tests
Table II compares our WSTAR nonlinearity test procedure with two approaches based on the conventional assumption of a single-lag transition variable, yt−k. The first, a single-lag LM statistic, is a special case of the WSTAR LM statistic, but restricts the set of weight functions Δ to those with weight concentrated on a single lag yt−k, k = 1, …, q, representing a conventional LSTAR model. The second, denoted LST, is the overall test of Luukkonen et al. (1988) for a single-variable transition function at unknown lag k = 1, …, q, for which the test statistic is compared with the appropriate asymptotic χ2 distribution. This test also assumes homoskedasticity, and hence the single-lag LM statistic and LST compare different approaches to implementing a nonlinearity test (under homoskedasticity) when δ = ek.
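Table II. Empirical size and power of nonlinearity tests

| Sample size | Statistic | AR(1) DGP: 1% | 5% | 10% | LSTAR DGP: 1% | 5% | 10% | WSTAR DGP: 1% | 5% | 10% |
|---|---|---|---|---|---|---|---|---|---|---|
| T = 200 | LM | 0.008 | 0.041 | 0.084 | 0.020 | 0.084 | 0.161 | 0.041 | 0.148 | 0.258 |
| T = 500 | LM | 0.007 | 0.044 | 0.086 | 0.094 | 0.247 | 0.368 | 0.254 | 0.479 | 0.609 |
| T = 1000 | LM | 0.009 | 0.045 | 0.092 | 0.376 | 0.593 | 0.706 | 0.718 | 0.880 | 0.932 |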
Although not reported in Table II, our analysis also considered a test based on the lag k leading to the strongest rejection of the linearity null hypothesis over k = 1, …, q when δ = ek in (4). This is proposed by Teräsvirta (1994) as a model specification procedure, but the literature abounds with examples where the minimum p-value obtained by comparing each test statistic to the χ2 distribution is incorrectly treated as valid in relation to the overall linearity null hypothesis. However, this leads to very substantial over-sizing, with empirical size around three times the nominal size, irrespective of the sample size employed.
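A back-of-the-envelope calculation indicates why treating the minimum p-value as valid inflates size. The sketch below assumes, purely for illustration, that the q single-lag tests are independent, which they are not in practice because they share the same data; the function name is hypothetical. Under that assumption the probability that at least one of q = 4 tests rejects at level a is 1 − (1 − a)^q.

```python
def familywise_size(alpha, q):
    """Probability that at least one of q independent level-alpha tests rejects."""
    return 1.0 - (1.0 - alpha) ** q

# Ratio of the implied overall size to the nominal size for q = 4 candidate lags.
inflation = {a: familywise_size(a, 4) / a for a in (0.01, 0.05, 0.10)}
```

At the 5% level this gives an overall rejection probability of about 0.185, between three and four times the nominal size. Positive dependence between the individual statistics would be expected to pull the figure down somewhat, which is in line with the inflation of around three times reported above.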
Both homoskedastic and wild bootstrap versions of the univariate WSTAR tests are implemented in Table II using q = 4 (see footnote 10). An AR lag of p = 4 is employed for all tests.
The linear AR(1) process used in Subsection 3.3.1 is again employed to evaluate size. Two nonlinear DGPs provide power information, both of which use (1) and (2) with α0 = β0 = 0, α1 = (0.6, 0, 0, 0)′, β1 = (−0.4, 0, 0, 0)′, γ = 20 and c = 0. One is a conventional LSTAR specification with δ = (1, 0, 0, 0)′ and hence has transition variable yt−1, while the second has weights of the WSTAR form with δ = (1/3, 1/3, 1/3, 0)′. Note that the latter true weight distribution is an average over three, not four, lags, so that none of the weight distributions included in the set Δ exactly corresponds to this δ. For the analysis of Table II, all DGPs are homoskedastic.
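To make the two power DGPs concrete, the following sketch simulates them using only the Python standard library. Since (1) and (2) are not restated in this section, the functional form is an assumption: yt = (α1 + β1·F(wt))·yt−1 + εt with wt = stδ, F the logistic function 1/(1 + exp(−γ(wt − c))) and εt ∼ N(0, 1). The burn-in length, seed and function names are likewise hypothetical.

```python
import math
import random

def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def simulate_wstar(T, delta, alpha1=0.6, beta1=-0.4, gamma=20.0, c=0.0,
                   burn=200, seed=0):
    """Simulate T observations; the transition variable is a weighted sum of lags."""
    rng = random.Random(seed)
    q = len(delta)
    y = [0.0] * q                                  # start-up values
    for _ in range(T + burn):
        lags = y[-1:-q - 1:-1]                     # y_{t-1}, ..., y_{t-q}
        w = sum(d * v for d, v in zip(delta, lags))
        F = logistic(gamma * (w - c))
        y.append((alpha1 + beta1 * F) * lags[0] + rng.gauss(0.0, 1.0))
    return y[-T:]                                  # drop start-up and burn-in values

lstar = simulate_wstar(500, [1.0, 0.0, 0.0, 0.0])        # transition variable y_{t-1}
wstar = simulate_wstar(500, [1 / 3, 1 / 3, 1 / 3, 0.0])  # average of three lags
```

With γ = 20 the transition is close to a switch, so under this assumed form the coefficient on yt−1 moves between roughly 0.6 when the transition variable is well below zero and 0.2 when it is well above zero.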
The WSTR results for the AR(1) in Table II are, not surprisingly, similar to the more detailed size results in Table I. The LST statistic is quite severely undersized for T = 200 and 500 but, as anticipated, this is less marked for T = 1000.
Turning to power, the LST test always has less power than those employing bootstrap inference, due primarily to the number of parameter restrictions being tested. For example, in the LSTAR specification with transition variable yt−1, when T = 500 and a nominal 5% significance level is employed, the LST test has power 0.18, whereas the corresponding bootstrap tests, including the single-lag LM statistic, have power around 0.25. All tests gain power when applied to the DGP that has a WSTAR form, but the advantage of the bootstrap approach over the Luukkonen et al. (1988) test is also more marked in this case. Although considered here only for p = 4, it can be anticipated that the performance of the bootstrap tests will dominate the LST test even more strongly when higher potential lag orders are considered, especially for realistic sample sizes in macroeconomics, such as T = 200 or T = 500.
Also note that, when the DGP is of the WSTAR form, the WSTAR tests that consider a wider set of weight functions almost always have higher power than the single-lag LM statistic. However, and perhaps surprisingly, the single-lag LM statistic does not dominate these more general tests when the DGP has an LSTAR form. Further, robustification against heteroskedasticity using the wild bootstrap LM statistics does not lead to a deterioration of power. Indeed, in contrast to the recommendation of van Dijk et al. (2002, p. 160) that robust procedures not be used for nonlinearity testing, our recommendation (based on Tables I and II) is that the wild bootstrap form of the WSTR test should always be applied, irrespective of whether heteroskedasticity is suspected and irrespective of whether the transition variable may be a single lag or a weighted average of lags.