Response surface regressions for critical value bounds and approximate p-values in equilibrium correction models

In the following, we present some computational aspects of the Monte Carlo simulations in Section 3 of the main paper. All computations are performed in Stata 16. The bulk of the computations, the Monte Carlo simulations themselves, are performed in Stata's integrated matrix language, Mata. As a byte-compiled language, Mata runs about 5 to 6 times slower than a high-performance compiled language such as C. However, most Mata functions used in our simulations hook directly into compiled routines, such as LAPACK functions (Anderson et al., 1999), which narrows the speed disadvantage substantially. We estimate that our simulation runs about half as fast as pure C would. Mata, however, is much more user friendly than C. For example, an appropriate random number generation mechanism that has a sufficiently large period and that accommodates parallel computations is readily available. For that, we use random number streams based on the Mersenne Twister pseudo-random number generator. Overall, we believe that Mata provides a good balance between speed and high-level language features. We manually distribute the computational load across 38 cores, each running at 2.9 GHz. After the removal of any redundant calculations, such as repeated computation of the same cross products, the average run time per core is 2.9 days, with the longest taking 5.1 days.

Storing the calculated statistics is a desirable computational aspect of the simulation. One advantage is that it isolates sequential steps that are computationally intensive. Once the statistics are saved, any subsequent operations can be carried out independently, without recalculating the results from the previous step over and over again, should either bugs or additional research ideas pop up. Storing the test statistics is also a prerequisite for the density plots (Figures 1 to 3 in the main paper). However, the large number of calculated statistics, roughly 100 billion F-statistics and 60 billion t-statistics, poses several problems, the most serious one being storage. Using single-precision floating-point numbers (4 bytes per number), the (uncompressed) storage requirement for our application is 578 GB. While this is not technically infeasible, it is too much of a hindrance for practical research. Since floating-point numbers are typically not suited for eliciting sizeable compression gains, our solution was to first round the calculated statistics to three digits after the decimal point. As we show further below, the effect of rounding on the RS regressions is negligible. After rounding, we further transformed the numbers into first differences of sorted statistics and occurrence counts. The transformation is completely reversible, so that the original rounded 10 billion statistics per simulation design can be fully recovered. The resulting storage requirement is 43 GB, which decreases further to 9 GB when a conventional compression algorithm is added. This magnitude is easily manageable. Compared with the original storage requirement, the total disk saving is 92.6% after the number transformations and 98.4% after additional compression.

Corresponding author: University of Exeter Business School, Department of Economics, Streatham Court, Rennes Drive, Exeter, EX4 4PU, UK. Tel.: +44-1392-722110; E-mail: S.Kripfganz@exeter.ac.uk. Max Planck Institute for Demographic Research, Konrad-Zuse-Straße 1, 18057 Rostock, Germany. Tel.: +49-381-2081245; E-mail: schneider@demogr.mpg.de
As an illustration, consider ten statistics that, after rounding to three decimal places, take the values 3.001, 3.002, and 3.003 with occurrence counts 3, 3, and 4. The untransformed and uncompressed storage requirement is 10 · 4 = 40 bytes. After rounding and multiplying by 1,000, we obtain the integers {3001, 3002, 3003} with corresponding occurrence counts {3, 3, 4}. Since the occurrence counts can be larger than 1-byte integers in practice, each of the numbers that characterize the rounded and transformed data is stored as a 2-byte integer. The storage requirement in this example thus shrinks to 12 bytes. Furthermore, instead of the rounded values themselves we store their first differences, along with the level of the first statistic, so that the algorithm remains reversible (up to rounding). That is, {3001, 3002, 3003} becomes {3001, 1, 1}. The resulting data exhibit many repeating bit-level patterns, so applying a conventional compression algorithm yields an additional sizeable disk saving of 80%.
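For illustration only, the round-count-difference transformation and its inverse can be sketched as follows (in Python rather than Mata; the function names `pack` and `unpack` are ours and not part of our codebase):

```python
import numpy as np

# Sketch of the storage transformation: round to three decimals, scale to
# integers, collapse into occurrence counts, and difference the sorted unique
# values. The original ordering is lost, but the multiset of statistics
# (all that matters for EDFs and quantiles) is fully recovered. The 2-byte
# integers suffice for this example; larger levels would need a wider type.
def pack(stats):
    ints = np.round(np.asarray(stats) * 1000).astype(np.int64)
    values, counts = np.unique(ints, return_counts=True)  # np.unique sorts
    diffs = np.diff(values, prepend=0)  # first entry keeps its level
    return diffs.astype(np.int16), counts.astype(np.int16)

def unpack(diffs, counts):
    values = np.cumsum(diffs.astype(np.int64))
    return np.repeat(values, counts) / 1000.0

diffs, counts = pack([3.001] * 3 + [3.002] * 3 + [3.003] * 4)
# diffs -> [3001, 1, 1], counts -> [3, 3, 4]: six 2-byte integers = 12 bytes
```

A conventional compressor (e.g. zlib) applied to the packed bytes would then exploit the repeating patterns in the small differences and counts.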
We emphasize that the effects of rounding to 3 digits after the decimal point are inconsequential for practical purposes. To substantiate this claim, we kept the unrounded statistics for a subset of specifications (k = 1 only) and compared the predicted critical values from the response surface regressions in Appendix C for the rounded and unrounded statistics. 99.9999% of the predicted critical values remain unchanged at least for the first two digits after the decimal point. Less than 0.002% of the predicted critical values change at the 3rd digit after the decimal point (and only for very small sample sizes), and the vast majority of the critical values remain unchanged even for the first 6 digits after the decimal point.
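The negligible effect of rounding can also be understood mechanically: rounding to three decimals moves each statistic by at most 0.0005, and since rounding is monotone, every empirical quantile moves by at most 0.0005 as well. A small numerical check with stand-in statistics (not our simulated ones) illustrates this:

```python
import numpy as np

# Rounding to 3 decimals perturbs each value by <= 0.0005; monotonicity of
# rounding implies every order statistic, and hence every interpolated
# empirical quantile, shifts by at most 0.0005.
rng = np.random.default_rng(42)
stats = rng.chisquare(df=3, size=1_000_000) / 3  # stand-in for F-statistics
for p in (0.90, 0.95, 0.99):
    gap = abs(np.quantile(stats, p) - np.quantile(np.round(stats, 3), p))
    assert gap <= 0.0005
```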

Appendix B Alternative data-generating processes
The finite-sample distributions and CVs that we obtained in Section 3 of the main paper are based on the restricted VAR(1) DGP in equations (4) and (5). As a robustness check, we alternatively generate the disturbances ε_t = (ε_yt, ε'_xt)′ either from a first-order autoregressive (AR) process or from a first-order moving-average (MA) process, where the elements of the vector η_t are independently drawn from the standard normal distribution. We choose ρ = 0.4 for the AR(1) error process and ρ = 0.5 for the MA(1) error process. Due to the computational complexity, we restrict our attention to a situation with k = 1 long-run forcing variable. Along the same lines as in Section 3.1, we obtain EDFs of the test statistics for various combinations of time periods T and lag orders q for all five cases (i)-(v). With these EDFs, we can then assess the finite-sample size distortions of the bounds test that is ignorant of the AR(1) or MA(1) error structure. For this purpose, we infer the actual p-values corresponding to the CVs from Section 3. Tables B.1 to B.8 present the sizes for both the F-statistic and the t-statistic, separately based on the finite-sample or the asymptotic CVs. Overall, the size distortions remain relatively small when the regression model is augmented with sufficiently many lags to account for the serial dependence. To make this point, notice that the AR(1) process (B.1) for ε_yt translates into an AR(2) process for y_t:

y_t = y_{t−1} + ε_yt = (1 + ρ) y_{t−1} − ρ y_{t−2} + η_yt.
Hence, augmenting the regression model by at least the second lag of y_t captures the serial correlation induced by the AR(1) shock process, which is a prerequisite for the validity of the bounds test (Assumption 2). Indeed, our results confirm that the size distortions can be substantial without such a lag augmentation, that is, q < 2, and they can go either way. With a lag order of q = 2, the extent to which the bounds test is oversized for small sample sizes appears to be acceptable. Adding further lags does not yield improvements. With increasing sample size, the size distortions become smaller. They eventually become negligible for very large sample sizes, provided that q ≥ 2. It is less straightforward to deal with the MA(1) process (B.2) for ε_yt, which translates into an AR(∞) process for y_t. With any finite-order approximation, the disturbance term will retain some serial correlation, thus violating Assumption 2. Hence, not surprisingly, the size distortions exceed those from the AR(1) shock process in most situations. While the approximation generally becomes better with a higher lag augmentation q, the distortions do not monotonically decrease over the whole spectrum. In particular for q < 2, the results are quite erratic. Even for very large sample sizes, the bounds test remains size distorted to a nonnegligible extent unless the lag order is chosen sufficiently large, say q ≥ 8. For smaller sample sizes, properly accounting for such MA shock dynamics may be infeasible because degrees-of-freedom restrictions prevent the inclusion of sufficiently many lags.
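The AR(2) representation above can be verified numerically. The following sketch (illustrative Python, not our simulation code) simulates a random walk with AR(1) errors and checks that the recursion holds exactly:

```python
import numpy as np

# With y_t = y_{t-1} + e_t and AR(1) errors e_t = rho * e_{t-1} + eta_t,
# the level series obeys y_t = (1 + rho) * y_{t-1} - rho * y_{t-2} + eta_t
# exactly, so two lags of y_t absorb the serial correlation.
rng = np.random.default_rng(0)
rho, T = 0.4, 200
eta = rng.standard_normal(T)
e = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    e[t] = rho * e[t - 1] + eta[t]
    y[t] = y[t - 1] + e[t]

# AR(2) recursion holds for all t >= 2 (up to floating-point precision)
assert np.allclose(y[2:], (1 + rho) * y[1:-1] - rho * y[:-2] + eta[2:])
```

An analogous exercise with MA(1) errors would leave a residual correlation for any finite number of lags, in line with the AR(∞) representation.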
In most of the considered scenarios with serially dependent disturbances, our finite-sample CVs from Section 3 still improve the inference over the asymptotic CVs, in particular for relatively small sample sizes. Nevertheless, there are a few instances in which the asymptotic CVs provide better approximations. However, this happens primarily when both the lag order and the sample size are large, in which case the size distortions are very small.
As a final remark, when aiming to run RS regressions for a DGP based on AR(1) shocks, one should discard all test statistics computed from a regression model with q < 2 for which the bounds test would be inconsistent. Otherwise, the quantiles of the distributions from those inconsistent tests would bias the RS coefficients. The same logic implies that consistent RS estimates based on finite-order autoregressive models cannot be obtained for a DGP with MA(1) shocks.

Appendix C Separate response surface regressions
In Section 3.2 of the main paper, we have obtained RS estimates for the F- and t-statistics that allow us to predict quantiles of the distributions for any number of long-run forcing variables. In the previous literature, these RS models were estimated separately for each variable count k of interest. In this appendix, we do the same for all k ∈ {0, 1, . . . , 10}. While the resulting predictions are expected to be slightly more precise, we have seen in Figure 4 of the main paper that there is hardly any practically relevant difference compared to the joint model for all k.
In the following, we estimate separate RS models for each quadruplet {c, k, d, p}. Given the 100 meta replications, up to 19 choices of the time horizon T, and 8 different lag orders q, we have between 5,900 and 12,400 observations per estimation, accounting for the restriction that there shall be at least twice as many observations as parameters in equation (6). The general RS model is given in equation (7), with the restrictions θ_0,l = 0 for all l > 0 due to the asymptotic irrelevance of the short-run coefficients. We have chosen the polynomial orders m = 3 and n = 1. The latter provides a better fit than alternatively setting n = 3 together with the restrictions θ_j,l = 0 whenever j ≠ l for l > 0, which has been done by Cheung and Lai (1995). Equation (7) thus reduces to equation (C.1). We report the ordinary least squares results for the quantiles corresponding to a size of 1%, 5%, and 10% in Tables C.1 to C.8. These tables also contain the heteroskedasticity-robust standard error (SE) of the intercept as a measure of uncertainty about the asymptotic quantile. It is always smaller than 0.0041 for the F-statistic and below 0.0011 for the t-statistic. In most experimental designs, the standard error remains far below these magnitudes. However, the reported standard errors are too small because they are conditional on the correct specification of the RS model, as emphasized by MacKinnon (1991).
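The mechanics of such an RS regression can be sketched as follows. This is a stylized Python example with made-up coefficients and hypothetical regressors (the exact terms of equation (C.1) and the data are those of the paper, not of this sketch): simulated quantiles CV(N, q) are regressed by ordinary least squares on a cubic polynomial (m = 3) in the inverse effective sample size 1/N plus a single (n = 1) lag-order term.

```python
import numpy as np

# Hypothetical RS design: intercept (asymptotic quantile), cubic polynomial
# in 1/N, and one lag-order regressor (q - 1) / N; not the exact regressors
# of equation (C.1).
def rs_design(N, q):
    invN = 1.0 / N
    return np.column_stack([np.ones_like(invN), invN, invN**2, invN**3,
                            (q - 1) * invN])

rng = np.random.default_rng(1)
N = rng.integers(30, 1000, size=500).astype(float)  # effective sample sizes
q = rng.integers(1, 9, size=500).astype(float)      # lag orders
theta = np.array([3.22, -8.0, 15.0, -40.0, 1.5])    # made-up coefficients
cv = rs_design(N, q) @ theta + 0.001 * rng.standard_normal(500)

theta_hat, *_ = np.linalg.lstsq(rs_design(N, q), cv, rcond=None)
# theta_hat[0] estimates the asymptotic quantile (the intercept theta_{0,0});
# rs_design(N0, q0) @ theta_hat predicts finite-sample CVs.
```

The intercept plays the role of θ_0,0, and finite-sample CVs are obtained as fitted values for the (N, q) combination of interest.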
The asymptotic CVs can be read off directly from the RS intercept θ_0,0. Our estimates are close to the corresponding near-asymptotic CVs tabulated by Pesaran et al. (2001). The absolute difference is for the most part below 0.05, both for the F-statistic and the t-statistic. However, these asymptotic CVs are less useful in small samples. For a given number of variables in the level relationship, finite-sample CVs can be computed from the regression coefficients for any combination of the effective sample size and the number of short-run coefficients.
Previously reported CVs typically do not take the lag augmentation in equation (6) into account and might thus be inaccurate in many empirically relevant situations, in particular when the sample size is relatively small. Figure 5 in the main paper illustrates the variation across lag orders. For k = 0, there is obviously no distinction possible between I(0) and I(1) variables in the level relationship. In this situation, the F-statistic in cases (ii) and (iv) is the one analyzed by Dickey and Fuller (1981). In cases (i), (iii), and (v), it equals the square of the t-statistic. The latter corresponds to the familiar augmented Dickey-Fuller unit-root test statistic.

Figure C.1: RS from equation (C.1) for the t-statistic in case (iii) with k = 0 variables at the 5% significance level for selected lag orders q over a range of effective sample sizes N(T, q). The diamonds are the CVs computed from the aggregate EDFs of the 10^7 t-statistics. The horizontal line represents the respective estimate of θ_0,0 in Table C.7 and the solid curve the corresponding RS. The long-dashed curve is the RS from Cheung and Lai (1995), the medium-dashed curve from Ericsson and MacKinnon (2002), and the short-dashed curve from MacKinnon (2010). Crosses are tabulated CVs from Dickey (1976).
Our RS regressions closely match those reported in the previous literature (see Table 1 in the main paper). RS estimates for the original Dickey and Fuller (1979) test statistic, q = 1, have been previously obtained by MacKinnon (1991, 2010) and Ericsson and MacKinnon (2002). Dickey (1976) obtains his CVs as predictions from RS regressions but does not report the regression coefficients. Cheung and Lai (1995) go one step further by estimating a RS that allows the quantiles of the distribution to vary with the lag order. Figure C.1 compares these RS estimates to ours for case (iii) and three different lag orders at a size of 5%. For the test without lag augmentation, q = 1, our RS and the ones from MacKinnon (2010) and Ericsson and MacKinnon (2002) are visually indistinguishable, and they all fit nicely through the quantiles from the aggregate EDFs obtained in Section 3.1 of the main paper. The advantage of our approach becomes apparent when we move to higher lag orders. Because the RS from MacKinnon (2010) does not accommodate the lag augmentation, it becomes too conservative. In fact, for higher lag orders the asymptotic critical value would provide a better approximation for most sample sizes than the MacKinnon (2010) surface or the tabulated CVs from Dickey (1976). By contrast, Figure C.1 confirms that our RS provides a very good fit to the CVs implied by our simulated aggregate EDFs. It also outperforms the RS from Cheung and Lai (1995), which is skewed towards zero, possibly due to the smaller number of replications in their simulation and a lower polynomial order in their RS regressions. Ericsson and MacKinnon (2002) indirectly account for the lag order by estimating the RS over the degrees-of-freedom adjusted sample size.
MacKinnon (2010) is an updated version of MacKinnon (1991).

Figure C.2: RS from equation (C.1) for the F- and t-statistics in case (iii) at the 5% significance level with k ∈ {0, 2, 4, 6, 8} variables over a range of effective sample sizes N(T, q) and with a lag order q = 1. The solid curve refers to k = 0. With increasing k, the curves have shorter dashes.
However, Figure C.1 clearly shows that this strategy is not appropriate for higher lag orders, as the fit worsens even compared to MacKinnon (2010).
In the multivariable environment, the order of integration affects the distribution of the test statistic. Banerjee et al. (1998) and Ericsson and MacKinnon (2002) consider the t-statistic for cointegration testing under the assumption that all regressors are individually I(1), the upper bound for the bounds test, but neither of them accounts for the lag augmentation. In this situation, when we vary k for a fixed lag order q = 1, the spread between the RS curves is largely driven by the asymptotic critical value, which now depends on k. This is shown in Figure C.2 for both the F- and t-statistics. Importantly, the gap between the curves becomes systematically smaller with increasing k, which justifies our approach in Section 3.2 of the main paper to directly model the variation in k as part of a joint RS.

Appendix D Critical values and approximate p-values
To assess the precision of the empirical distribution functions obtained in our Monte Carlo simulation in Section 3.1 of the main paper, we can compute the coefficient of variation for the quantiles of interest based on the 100 meta replications with 100,000 replications each. For selected simulation designs, these coefficients of variation are reported in Tables D.1 and D.2. Because the replications for a given design are independent, the coefficient of variation for the quantiles based on 10 million replications is expected to be one-tenth of that for 100,000 replications.
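The one-tenth scaling follows from the 1/sqrt(R) rate at which the sampling variability of an empirical quantile shrinks in the number of replications R. A small sketch with illustrative numbers (500 versus 50,000 replications instead of 100,000 versus 10 million, and a standard normal stand-in distribution) makes this concrete:

```python
import numpy as np

# A 100-fold increase in replications should cut the coefficient of
# variation of an empirical quantile roughly by a factor sqrt(100) = 10.
rng = np.random.default_rng(7)

def quantile_cv(R, meta=300, p=0.95):
    """Coefficient of variation of the empirical p-quantile across
    `meta` independent meta replications of R draws each."""
    qs = np.array([np.quantile(rng.standard_normal(R), p) for _ in range(meta)])
    return qs.std() / abs(qs.mean())

ratio = quantile_cv(500) / quantile_cv(50_000)  # expected to be near 10
```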
Besides being useful on their own in an empirical analysis, the approximate p-values computed in Section 3.3 of the main paper can be used to assess the relevance of the differences between asymptotic and finite-sample CVs. Tables D.3 and D.4 present the approximate finite-sample p-values for a given sample size and variable count that correspond to the respective asymptotic CVs at a specified significance level. These p-values can be interpreted as the expected finite-sample size of the asymptotic test.
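Mechanically, such an approximate p-value is the right-tail mass of the simulated EDF beyond the critical value in question. The following minimal sketch uses stand-in numbers rather than our simulated EDFs (for the left-tailed t-statistic the inequality would be reversed):

```python
import numpy as np

# The expected finite-sample size of a test run with a given critical value
# is the share of simulated statistics at or beyond that value.
def edf_pvalue(stats, critical_value):
    """Right-tail p-value of `critical_value` under the EDF of `stats`."""
    return float(np.mean(np.asarray(stats) >= critical_value))

rng = np.random.default_rng(3)
stats = rng.chisquare(df=3, size=100_000) / 3  # stand-in statistics
cv_5pct = np.quantile(stats, 0.95)             # finite-sample 5% CV
assert abs(edf_pvalue(stats, cv_5pct) - 0.05) < 0.005
```

Evaluating `edf_pvalue` at an asymptotic rather than a finite-sample CV yields the expected finite-sample size of the asymptotic test, as reported in Tables D.3 and D.4.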
Finally, we present larger and colored versions of Figures 1 to 3 and 5 to 6 at the end of this document.

[Notes to Tables B.1 to B.8] Reported are the p-values obtained from the EDFs for a DGP with AR(1) shock process (B.1) or MA(1) shock process (B.2) that are associated with the finite-sample or asymptotic CVs from Tables 2 to 5 for a given significance level. Only designs with k = 1 are considered.

[Notes to Tables C.1 to C.8] SE(θ_0,0) denotes the heteroskedasticity-robust standard error of the intercept, R̄² the adjusted coefficient of determination, and RMSE the root mean square error.

[Notes to Tables D.1 and D.2] The coefficient of variation is computed as the ratio of the standard deviation to the (absolute value of the) mean over the 100 meta replications for the empirical quantiles that correspond to the respective significance level and simulation design. Only designs with a lag order q = 1 are considered.

[Figure captions] RS from equation (9) for the F-statistic in case (iii) at the 5% significance level over a range of effective sample sizes N(T, q) with k ∈ {1, 3} variables and lag orders q ∈ {0, 3, 6, 9, 12}; RS from equation (9) for the t-statistic in case (iii) under the same settings with k ∈ {0, 1} variables; in each of these figures, the white dashed line indicates the asymptotic critical value. RS from equation (9) for the t-statistic in case (iii) with k ∈ {0, 2, 4, 6, 8} variables.