Abstract
 Abstract
 INTRODUCTION
 BASIC SETUP WITH NO SHIFT OR KNOWN SHIFT DATE
 TREATING THE SHIFT DATE AS UNKNOWN
 TESTING FOR A LEVEL SHIFT IN A UNIVARIATE TIME SERIES
 FINITE SAMPLE PERFORMANCE OF THE VF STATISTIC
 APPLICATION: DATA AND METHODS
 CONCLUSIONS
 Acknowledgements
 REFERENCES
 Supporting Information
Comparisons of trends across climatic data sets are complicated by the presence of serial correlation and possible step-changes in the mean. We build on heteroskedasticity and autocorrelation robust methods, specifically the Vogelsang–Franses (VF) nonparametric testing approach, to allow for a step-change in the mean (level shift) at a known or unknown date. The VF method provides a powerful multivariate trend estimator robust to unknown serial correlation up to but not including unit roots. We show that the critical values change when the level shift occurs at a known or unknown date. We derive an asymptotic approximation that can be used to simulate critical values, and we outline a simple bootstrap procedure that generates valid critical values and p-values. Our application builds on the literature comparing simulated and observed trends in the tropical lower troposphere and mid-troposphere since 1958. The method identifies a shift in observations around 1977, coinciding with the Pacific Climate Shift. Allowing for a level shift causes apparently significant observed trends to become statistically insignificant. Model overestimation of warming is significant whether or not we account for a level shift, although null rejections are much stronger when the level shift is included. © 2014 The Authors. Environmetrics published by John Wiley & Sons, Ltd.
INTRODUCTION
Many empirical applications involve comparisons of linear trend magnitudes across different time series with autocorrelation and/or heteroskedasticity of unknown form. Vogelsang and Franses (2005, hereafter VF) derived a class of heteroskedasticity and autocorrelation robust (HAC) tests for this purpose. The VF statistic is similar in form to the familiar regression F-type statistics but remains valid under serial dependence up to but not including unit roots in the time series. For treatments of the theory behind HAC estimation and inference, see Andrews (1991), Kiefer and Vogelsang (2005), Newey and West (1987), Sun et al. (2008), and White and Domowitz (1984), among others. Like many HAC approaches, the VF approach is nonparametric with respect to the serial dependence structure: it does not require a specific model of serial correlation or heteroskedasticity. Unlike most nonparametric approaches, the VF approach avoids sensitivity to bandwidth selection by setting the bandwidth equal to the entire sample.
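The bandwidth point can be made concrete. Assuming the Bartlett kernel (our assumption for this illustration), a HAC long-run variance estimate with bandwidth equal to the sample size T reduces algebraically to 2T^{-2} Σ_t Ŝ_t Ŝ_t′, where Ŝ_t is the partial sum of the scores up to t, provided the scores sum to zero, as OLS scores do. No bandwidth is left to choose. The minimal Python/NumPy sketch below checks this identity numerically; the function names are hypothetical, and the code illustrates the idea rather than reproducing the VF authors' implementation.

```python
import numpy as np

def bartlett_lrv_direct(v):
    """Bartlett-kernel long-run variance of the T x k array v with bandwidth M = T,
    built from weighted sample autocovariances Gamma_j = T^{-1} sum_t v_t v_{t-j}'."""
    T = v.shape[0]
    omega = v.T @ v / T                          # Gamma_0
    for j in range(1, T):
        gamma_j = v[j:].T @ v[:-j] / T           # Gamma_j
        omega += (1 - j / T) * (gamma_j + gamma_j.T)
    return omega

def bartlett_lrv_partial_sums(v):
    """Same quantity from partial sums S_t = v_1 + ... + v_t; valid when S_T = 0."""
    T = v.shape[0]
    S = np.cumsum(v, axis=0)
    return 2.0 * S.T @ S / T**2

rng = np.random.default_rng(0)
v = rng.standard_normal((200, 2))
v -= v.mean(axis=0)      # impose sum_t v_t = 0, a property OLS scores x_t * u_t have
print(np.allclose(bartlett_lrv_direct(v), bartlett_lrv_partial_sums(v)))   # True
```

The partial-sum form is also the cheaper one to compute, since it avoids the loop over all T − 1 autocovariances.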
Here, we extend the VF approach to the case in which one or more of the series has a possible level shift. Our assumption throughout is that a researcher considers a one-time level shift as a fundamentally different process than a continuous trend. Consequently, if the null hypothesis is that two series have the same trend, and one series exhibits a trend while the other exhibits a level shift and no trend, a rejection of the null would be considered valid: the two phenomena are distinct, and a prediction of one is not confirmed by observing the other.
Accounting for level shifts does not necessarily increase the likelihood of rejecting a null of trend equivalence. In the top panel of Figure 1, a comparison of the linear trend coefficients would suggest they are similar, but clearly, y_{1} differs from y_{2} in that the former is steadily trending while the latter is trendless with a single discrete level shift at the break point T_{b}. By contrast, in the bottom panel, a failure to account for the shift would overstate the difference between the trend slopes. In each case, the influence of the shift term is highlighted by the fact that if the trend slope comparisons were conducted over the pre-shift or post-shift intervals, they might give results opposite to those based on the entire sample (with the shift term omitted).
The basic linear trend model is written as

$$y_{it} = a_i + b_i t + u_{it}, \qquad (1)$$
where i = 1,…,n denotes a particular time series and t = 1,…,T denotes the time period. The random part of y_{it} is given by u_{it}, which is assumed to be covariance stationary (in which case y_{it} is labeled a trend stationary series, that is, stationary around a linear time trend, if one is present). For a series of length T, we parameterize the break point by denoting the fraction of the sample occurring before it as λ = T_{b}/T.
The following issues must be addressed in order to derive an HAC robust trend comparison test in the presence of a possible level shift. (i) If λ is known, and specifically is known to lie in the (0, 1) interval, the VF test statistic can be generalized, as we show in Section 'BASIC SETUP WITH NO SHIFT OR KNOWN SHIFT DATE', but its distribution depends on λ and the critical values change. It will turn out that the form of the VF statistic and its critical values are the same whether one is testing hypotheses involving the trend coefficients or other parameters in the trend function. (ii) If λ is unknown, it must be estimated along with the magnitude of the associated shift term. But this gives rise to a problem of nonidentification if we want to allow for the possibility that the true value of the level shift parameter is zero.
The regression model with level shift takes the form

$$y_{it} = a_i + g_i DU_t(\lambda) + b_i t + u_{it}, \qquad (2)$$
where the dummy variable DU_{t}(λ) = 0 if t ≤ λT and 1 otherwise (we will typically suppress the λ term where it is convenient to do so). Hence, for series i, estimation of (2) by ordinary least squares (OLS) yields an estimated intercept of â_{i} up to T_{b} and â_{i} + ĝ_{i} thereafter. In our empirical application, we are primarily interested in testing hypotheses about the trend slope parameters, b_{i}, while controlling for the possibility of a level shift. If it is reasonable to view λ as known, then inference about the trend slopes will proceed in a straightforward way with DU_{t}(λ) included in the model even in the case where the true value of g_{i} is zero. However, if it is more reasonable to treat λ as unknown and we want to be robust to the possibility that g_{i} is nonzero, then inference about the trend slopes (b_{i}) becomes more delicate because λ is not identified when g_{i} is zero. We would face a similar identification problem if we wanted to test the null hypothesis that g_{i} itself is zero and λ is unknown.
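As an illustration of model (2) with a known break fraction, the following minimal Python/NumPy sketch (function and variable names are hypothetical) builds DU_t(λ), fits one series by OLS, and reports the implied intercepts, â_i up to T_b and â_i + ĝ_i thereafter. The simulated series is made-up data resembling the trendless, shifted series in the top panel of Figure 1.

```python
import numpy as np

def fit_level_shift_trend(y, lam):
    """OLS fit of y_t = a + g*DU_t(lam) + b*t + u_t for one series.

    DU_t(lam) = 0 for t <= T_b and 1 afterwards, with T_b = lam*T rounded to an
    integer so floating-point error in lam cannot move the break by one period.
    Returns (a, g, b): intercept a applies up to T_b, a + g thereafter.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    T_b = int(round(lam * T))
    t = np.arange(1, T + 1)
    du = (t > T_b).astype(float)                 # the level-shift dummy DU_t
    X = np.column_stack([np.ones(T), du, t])
    (a, g, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    return a, g, b

# Hypothetical data: no trend, a level shift of 0.3 after T_b = 240, noise s.d. 0.1
rng = np.random.default_rng(1)
T, T_b = 660, 240
t = np.arange(1, T + 1)
y = 0.3 * (t > T_b) + 0.1 * rng.standard_normal(T)
a, g, b = fit_level_shift_trend(y, T_b / T)
print(f"intercept up to T_b: {a:.3f}, after T_b: {a + g:.3f}, trend slope: {b:.5f}")
```

Passing the break as a fraction mirrors the λ = T_b/T parameterization in the text; rounding back to an integer date keeps DU_t exactly aligned with T_b.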
There is now a well-established literature in statistics and econometrics for carrying out inference where a parameter is not identified under the null hypothesis but is identified under the alternative hypothesis. See, for example, Davies (1987), Andrews (1993), Andrews and Ploberger (1994), and Hansen (1996), among others. One solution to this identification problem involves the use of a supremum function, which is akin to a data-mining approach. In the present case, we can compute the VF statistic for equality of trends for a range of λ allowed to vary across (0, 1) and find the largest VF statistic, the supVF statistic. To be robust to the possibility that there are no level shifts in the data, that is, to be robust to the critique that the date of the level shift was chosen to data-mine an outcome for the equality of trends test, we work out the null distribution of the supVF statistic for equality of trends for the case where g_{i} = 0. This yields a trend equality test that is very robust to the possibility and location of potential level shifts.
Although our focus is on the problem of trend inference allowing for possible unknown level shifts, our extension of the VF approach is general enough to include tests of the null hypothesis of no level shift, and we report some limited results in the paper for these tests. A potential application of tests for a level shift is the homogenization of weather data. Many long observational records are believed to have been affected by possible equipment and/or sampling changes, changes to the area around monitoring locations, and so forth (see Hansen et al., 1999; Brohan et al., 2006 for examples in the land record; Folland and Parker, 1995; Thompson et al., 2008 for examples in sea surface data). A typical method for detecting and removing level shifts is to construct a reference series that is not expected to exhibit the discontinuity, such as the mean of other weather station records in the vicinity, and then look for one or more jumps in a record relative to its reference series.
While the application of the VF approach to testing for a level shift is potentially quite useful in many empirical settings, the problem of testing for a level shift in a trending series with a known or unknown shift date has already received some attention in the econometrics literature (Vogelsang, 1997; Sayginsoy and Vogelsang, 2011) and the empirical climate literature (see Gallagher et al., 2013, and references therein). Each proposed method has inherent strengths and weaknesses. A complete comparison of the VF approach to existing tests for a level shift would be a substantial undertaking and is beyond the scope of this paper, but we draw some contrasts in Sections 'TESTING FOR A LEVEL SHIFT IN A UNIVARIATE TIME SERIES' and 'FINITE SAMPLE PERFORMANCE OF THE VF STATISTIC'.
The question of whether or not a level shift is present in trending data can strongly affect the resulting trend calculations and tests of equality of trend slopes. If a change point λ is known, the analysis in Section 'BASIC SETUP WITH NO SHIFT OR KNOWN SHIFT DATE' applies, and if a change point is suspected but the date is unknown, the analysis in Section 'TREATING THE SHIFT DATE AS UNKNOWN' applies. In our application, we focus on the case where there is at most one level shift in each series. In other applications, such as those involving very long weather series, one might suspect there are multiple shifts. If they occur at known dates, then our extension of the VF approach is general enough to apply. However, should it be more reasonable to model the shift dates as unknown, or should there be uncertainty regarding the number of shifts, the analysis becomes much more complicated, especially from a computational perspective, and is beyond the scope of this paper. In addition, if one thinks level shifts occur frequently and randomly, then there is the additional difficulty that the range of possible specifications could, in principle, include the case in which the level changes by a random amount at each time, which is equivalent to having a random walk (unit root) component in u_{it}. If y_{it} has a unit root component, inference in models (1) and (2) becomes more complicated. More importantly, it is difficult to give a physical interpretation to a unit root component of a temperature series. See Mills (2010) for a discussion of temperature trend estimation when a random walk is a possible element of the specification.
In our application, we think it is reasonable to assume that the observed series are well characterized by a trend and at most one level shift at a known date and that the errors are covariance stationary. We focus on the prediction of climate warming in the troposphere over the tropics. As shown in Section 'APPLICATION: DATA AND METHODS', climate models predict a steady warming trend in this region because of rising atmospheric greenhouse gas levels, but none predict a step-change, so trends and shifts can be regarded as distinct phenomena. A number of studies (summarized later) have shown that models likely overstate the warming trend, but there is disagreement as to whether the bias is statistically significant. McKitrick et al. (2010) used the original VF test to examine this issue over the 1979–2009 interval, coinciding with the record available from weather satellites. We extend their analysis to the 1958–2012 interval using data from weather balloons. This long span encompasses a date at which a known climatic event caused a level shift in many observed temperature series. If the shift is nontrivial in magnitude, the comparison would thus be akin to that in Figure 1, such that failure to take it into account could bias the comparison either toward overstating or understating the difference in trend slopes.
The event in question occurred around 1977 and is called the Pacific climate shift (PCS). It manifested itself as a change in the oceanic circulation system during which basin-wide wind stress and sea surface temperature anomaly patterns reversed, causing an abrupt step-like change (level shift) in many weather observations, including in the troposphere, as well as in other indicators such as fisheries catch records (see Seidel and Lanzante, 2004; Tsonis et al., 2007; Powell and Xu, 2011, and extensive references therein). For our purposes, we do not try to present a specific physical explanation of the PCS or even evidence that its origin was exogenous to the climate system; we rely only on its being a large event at an approximately known date, whose existence has been documented and studied extensively, and which resulted in a shift in the mean of the temperature data. We first present results based on the assumption that the PCS occurred at a known date (Section 'Multivariate trend comparisons: no shift and known shift date cases') and then based on the assumption that the PCS is not known to have occurred or that the date of occurrence is unknown (Section 'Multivariate trend comparisons: unknown shift date case'). We find, in some cases, that the shift term is significant at the 5% or 10% level, confirming the overall importance of controlling for this possibility when comparing trends.
If the date of the PCS is taken as given and exogenous, we find that the models project significantly more warming in both the lower troposphere and mid-troposphere than is found in weather balloon records over the interval. This finding remains robust if we treat the date of the PCS as unknown and apply the conservative data-mining approach. In fact, this finding is robust whether or not we include a level shift in the regression model: we reject equivalence of the trend slopes between the observed and model-generated temperature series either way. The evidence against equivalence is simply stronger when we control for a level shift, and this is true whether we treat the date of the shift as known or unknown.
We also find that if the date of the PCS is assumed to be known, then (a) the appearance of positive and significant trend slopes in the individual observed temperature series vanishes once we control for the effect of the level shift and (b) we find statistical evidence for a level shift in some but not all observed temperature series. If the date of the PCS is assumed to be unknown, statistical evidence remains for a level shift in the mean of the observed mid-troposphere series but is weak in the lower troposphere series. This is not surprising given that we use data-mining-robust critical values, which reduce the power to detect such a shift.
TREATING THE SHIFT DATE AS UNKNOWN
Many previous authors (e.g. Seidel and Lanzante, 2004) have treated the date of the level shift as known because the PCS was an exogenous event observed across many different climatic data series. As a robustness check, we also report results where we treat the date of the level shift as unknown. We take a “data-mining” approach that has a long history in the change point literature. For a given hypothesis, we compute the VF statistic for a grid of possible shift dates and determine the one that gives the largest VF statistic. In other words, we search for the shift date that gives the strongest evidence against the null hypothesis. The effect of this search on critical values must be taken into account; otherwise, the approach would be a “data-mining” exercise that could give misleading inference, because using the critical values for a known shift date would inflate the size of the test above the nominal level. Fortunately, it is easy to obtain critical values that take into account the search over shift dates.
For a given potential shift date T_{b}, let VF(λ) denote the VF statistic for testing a given null hypothesis, where λ = T_{b}/T. The limiting random variable given by (14) depends on λ through the level shift regressor, and we now index the limit by λ to make explicit its dependence on the shift date used to estimate the model. For technical reasons (Andrews, 1993), we need to “trim” the fraction v from each end of the sample, leaving a grid of potential shift dates given by vT + 1, vT + 2, …, T − vT (in our application, we set v = 0.1). Define the “data-mined” VF statistic as

$$\text{supVF} = \max_{T_b \in \{vT+1,\, \ldots,\, T-vT\}} VF(T_b/T).$$

Under the null hypothesis (7) and under the assumption that there is no level shift in the data, we have
 (15)
where the limit follows from (14) and application of the continuous mapping theorem. Using simulation methods identical to those used for the known shift date case, we computed asymptotic critical values for supVF for v = 0.1 and q = 1 for testing hypotheses about the trend slope parameters in model (2). These critical values are given in Table 2. Using the supVF statistic along with the critical values given by (15) provides a very conservative test with regard to the shift date.
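The following Python/NumPy sketch shows one way to compute a VF-type statistic for a linear restriction on model (2) and the supVF statistic over the trimmed grid vT + 1, …, T − vT. All function names are hypothetical. The sketch assumes that the VF statistic is an F-type Wald form, divided by q, whose long-run variance uses the Bartlett kernel with bandwidth T computed from partial sums of the OLS scores. It is our reconstruction, not the paper's equation (9), so scaling conventions may differ. Only if the scaling matches (9) should the result be compared with the nonstandard critical values of Table 2 (or Table 1 for a known date), and never with F tables.

```python
import numpy as np

def vf_stat(Y, X, R, r):
    """VF-type statistic for H0: R vec(B) = r in Y = X B + U with common regressors.

    Y: T x n series, X: T x k regressors; vec(B) stacks the coefficients series by
    series. The long-run variance uses the Bartlett kernel with bandwidth T,
    computed from partial sums of the OLS scores (an assumed construction).
    """
    T, n = Y.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    B = XtX_inv @ X.T @ Y                        # k x n OLS coefficients
    U = Y - X @ B                                # T x n residuals
    # scores v_t = u_t (kron) x_t, ordered to match vec(B)
    V = (U[:, :, None] * X[:, None, :]).reshape(T, -1)
    S = np.cumsum(V, axis=0)                     # partial sums; S_T = 0 by OLS
    meat = 2.0 * S.T @ S / T                     # = sum_{t,s} (1 - |t-s|/T) v_t v_s'
    bread = np.kron(np.eye(n), XtX_inv)
    cov = bread @ meat @ bread
    diff = R @ B.T.reshape(-1) - r               # vec(B): block i = series i
    q = R.shape[0]
    return float(diff @ np.linalg.solve(R @ cov @ R.T, diff)) / q

def sup_vf(Y, R, r, trim=0.1):
    """Largest VF-type statistic over shift dates T_b in {vT+1, ..., T-vT}.

    Each series gets its own intercept, level-shift coefficient and trend,
    with a common shift date, as in model (2). Returns (supVF, maximizing T_b).
    """
    T = Y.shape[0]
    t = np.arange(1, T + 1)
    m = int(np.floor(trim * T))
    best = (-np.inf, None)
    for T_b in range(m + 1, T - m + 1):
        X = np.column_stack([np.ones(T), (t > T_b).astype(float), t])
        stat = vf_stat(Y, X, R, r)
        if stat > best[0]:
            best = (stat, T_b)
    return best

# Usage sketch on hypothetical simulated data: two series with a common slope 0.01
rng = np.random.default_rng(2)
T = 660
t = np.arange(1, T + 1)
Y = np.column_stack([0.01 * t + rng.standard_normal(T),
                     0.01 * t + rng.standard_normal(T)])
# vec(B) = (a1, g1, b1, a2, g2, b2); H0: b1 - b2 = 0, so q = 1
R = np.array([[0.0, 0.0, 1.0, 0.0, 0.0, -1.0]])
r = np.array([0.0])
stat, T_b = sup_vf(Y, R, r)
print(f"supVF = {stat:.2f} at T_b = {T_b}; Table 2 5% value (trend slope): 166.41")
```

To test for no level shift in a single series, as in Section 'TESTING FOR A LEVEL SHIFT IN A UNIVARIATE TIME SERIES', one would pass a one-column Y with R = [[0, 1, 0]]. As the text explains, the relevant supVF critical values are then those in the level-shift column of Table 2.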
Table 2. Asymptotic critical values: model (2), unknown shift date, q = 1, 10% trimming (v = 0.1)

| Quantile | supVF, trend slope | supVF, level shift |
|---|---|---|
| 0.700 | 79.765 | 95.455 |
| 0.750 | 88.184 | 109.94 |
| 0.800 | 98.532 | 116.20 |
| 0.850 | 111.78 | 130.76 |
| 0.900 | 131.92 | 150.99 |
| 0.950 | 166.41 | 188.68 |
| 0.975 | 205.15 | 225.78 |
| 0.990 | 261.39 | 279.85 |
| 0.995 | 301.94 | 322.48 |
TESTING FOR A LEVEL SHIFT IN A UNIVARIATE TIME SERIES
As part of the empirical application, we provide visual evidence that the observed temperature series exhibit level shifts around the time of the PCS. Some formal statistical evidence regarding these level shifts can be provided by application of the VF statistic to an individual time series. Consider model (2) for the case of n = 1, and place the model in the general framework (3) with d_{0t} = DU_{t}, d_{1t} = (1, t)′, β_{1} = g_{1}, and δ_{1} = (a_{1}, b_{1})′. If we take the shift date as known, then the VF statistic for testing for no level shift (H_{0} : g_{1} = 0) can be computed as before using (9) with R = 1 and r = 0. The asymptotic null critical values are still given by Table 1.
If we treat the shift date as unknown, we can apply the supVF statistic, although the asymptotic critical values depend on which regressor is placed in d_{0t}. While it is true that, for a given value of λ, the distribution of the limiting random variable is the same regardless of the regressor placed in d_{0t}, its covariance structure across λ depends on which regressor is placed in d_{0t}. Therefore, the supVF statistic for testing a zero trend slope has different asymptotic critical values from the supVF statistic for testing a zero level shift. We simulated the asymptotic critical values of supVF for testing for a zero level shift for the case of v = 0.1 and q = 1 and provide those critical values in Table 2.
Other tests for a level shift at an unknown date of a trending time series have been proposed in the empirical climate literature. Reeves et al. (2007) provide a review of change point detection methods developed in the climate literature, but the review focuses on tests designed for time series variables that do not have serial correlation. In contrast, Lund et al. (2007) propose a test for a level shift that allows a specific form of autocorrelation: the first-order periodic autoregressive model. We prefer the VF approach for two reasons. First, the VF approach is robust to more general forms of autocorrelation. Second, we formally derive and characterize the limiting null distribution of the sup statistic, and this allows us to tabulate null critical values. Lund et al. (2007) also use a sup-type statistic, but they do not provide any asymptotic theory that can be used to generate valid approximate critical values. A recent paper by Gallagher et al. (2013) develops asymptotic theory for a level shift test that treats the shift date as unknown, but their analysis is confined to trend models where u_{it} is assumed to be uncorrelated over time. What seems to be missing from the empirical climate literature are level shift tests that allow the shift date to be unknown and permit serial correlation in u_{it}. Fortunately, several papers in the econometrics literature propose level shift tests for trending series that have these properties; see Ploberger and Krämer (1996), Vogelsang (1997), and Sayginsoy and Vogelsang (2011). While clearly well outside the scope of this paper, it would be interesting to compare the supVF test for a level shift at an unknown date with the other tests proposed in the literature.
FINITE SAMPLE PERFORMANCE OF THE VF STATISTIC
We use the following data generating process:
 (17)
 (18)
where , , , ε_{it} ~ iidN(0, 1), cov(ε_{1t}, ε_{2s}) = 0, and , . The errors, u_{it}, are configured to have unit variances with . When η = 0, y_{1t} and y_{2t} are uncorrelated with each other.
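Equations (17) and (18) did not survive extraction, so the exact design cannot be restated here. The sketch below only illustrates one common way to give AR(2) errors unit variance, as the text requires. It assumes u_t = ρ_1 u_{t−1} + ρ_2 u_{t−2} + σ ε_t with ε_t ~ iid N(0, 1) and chooses σ from the stationary AR(2) variance formula. The recursion form, function names, and burn-in length are our assumptions, not details taken from the paper.

```python
import numpy as np

def ar2_innovation_scale(rho1, rho2):
    """Innovation s.d. sigma giving a stationary AR(2) process unit variance.

    Uses Var(u) = sigma^2 (1 - rho2) / ((1 + rho2) * ((1 - rho2)**2 - rho1**2)).
    """
    var_per_unit_sigma2 = (1 - rho2) / ((1 + rho2) * ((1 - rho2) ** 2 - rho1 ** 2))
    return 1.0 / np.sqrt(var_per_unit_sigma2)

def simulate_ar2(T, rho1, rho2, rng, burn=500):
    """Simulate u_t = rho1*u_{t-1} + rho2*u_{t-2} + sigma*e_t with e_t ~ iid N(0, 1).

    This recursion is an assumed error design (hypothetical; the paper's
    equations (17)-(18) are not recoverable here). The first `burn` draws are
    discarded so the zero starting values do not affect the returned series.
    """
    sigma = ar2_innovation_scale(rho1, rho2)
    e = sigma * rng.standard_normal(T + burn)
    u = np.zeros(T + burn)
    for t in range(2, T + burn):
        u[t] = rho1 * u[t - 1] + rho2 * u[t - 2] + e[t]
    return u[burn:]

rng = np.random.default_rng(3)
u = simulate_ar2(660, 0.6, 0.3, rng)     # one of the Table 3 configurations
print(u.var())                            # roughly 1, up to sampling noise
```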
We report two sets of results. The first set of results focuses on empirical null rejection probabilities. For y_{1t}, we set b_{1} = 0.01 and g_{1} = 0 so that there is no level shift in y_{1t}. For y_{2t}, we set b_{2} = 0.01 so that the null hypothesis of equal trend slopes holds. We set η = 0 so that the two series are uncorrelated. We report results for T = 120, 240, and 660 and a selection of values of ρ_{1} and ρ_{2}. In all cases, 50,000 replications were used, and we computed empirical rejection probabilities for the W_{PW}, W_{Bart}, and VF statistics for testing b_{1} = b_{2} using the appropriate asymptotic critical values. The simulation results for this configuration highlight the impact of the serial correlation structure on null rejection probabilities relative to the sample size.
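With 50,000 replications, the Monte Carlo noise in each entry of Tables 3 and 4 is small but not zero. The minimal Python sketch below (hypothetical helper names) computes the binomial standard error of an empirical rejection frequency. It shows that, at a nominal 5% level, entries within roughly 0.05 ± 0.002 cannot be distinguished from exact size.

```python
import math

def mc_standard_error(p, reps):
    """Binomial standard error of a rejection frequency p estimated from reps replications."""
    return math.sqrt(p * (1.0 - p) / reps)

def rejection_rate(stats, crit):
    """Share of simulated statistics exceeding the critical value crit."""
    return sum(s > crit for s in stats) / len(stats)

se = mc_standard_error(0.05, 50_000)
half_width = 1.96 * se
# se is about 0.00097, so the band is about [0.0481, 0.0519]
print(f"se = {se:.5f}; 95% band = [{0.05 - half_width:.4f}, {0.05 + half_width:.4f}]")
```

By this yardstick, entries such as 0.051 or 0.052 are consistent with correct size, whereas 0.058 or above reflect genuine over-rejection.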
The results are tabulated in Table 3. There are two sets of results reported for each of the three statistics. The first set of results corresponds to the case where no level shift dummy variable is included in the estimated model. The second set of results corresponds to the case where the level shift dummy is included in the estimated model. In this case, we also report results for the supVF statistic. Results are organized into three blocks corresponding to the three sample sizes. Within a block, results are given for seven configurations of the autoregressive parameters ranging from no serial correlation to strong serial correlation.
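Table 3. Empirical null rejections with AR(2) errors (columns marked "no dummy": level shift dummy not included in the estimated model; "dummy": level shift dummy included)

| T | ρ_{1} | ρ_{2} | W_{PW}, no dummy | W_{Bart}, no dummy | VF, no dummy | W_{PW}, dummy | W_{Bart}, dummy | VF, dummy | SupVF, dummy |
|---|---|---|---|---|---|---|---|---|---|
| 120 | 0 | 0 | 0.062 | 0.058 | 0.051 | 0.064 | 0.059 | 0.050 | 0.042 |
|  | 0.3 | 0 | 0.065 | 0.091 | 0.057 | 0.067 | 0.095 | 0.058 | 0.054 |
|  | 0.6 | 0 | 0.076 | 0.126 | 0.069 | 0.079 | 0.133 | 0.070 | 0.079 |
|  | 0.9 | 0 | 0.141 | 0.263 | 0.131 | 0.140 | 0.271 | 0.120 | 0.203 |
|  | 0.3 | 0.3 | 0.180 | 0.175 | 0.079 | 0.180 | 0.178 | 0.078 | 0.101 |
|  | 0.6 | 0.3 | 0.258 | 0.303 | 0.153 | 0.253 | 0.299 | 0.131 | 0.238 |
|  | 0.9 | −0.3 | 0.020 | 0.102 | 0.057 | 0.021 | 0.117 | 0.060 | 0.054 |
| 240 | 0 | 0 | 0.057 | 0.055 | 0.050 | 0.058 | 0.055 | 0.052 | 0.045 |
|  | 0.3 | 0 | 0.058 | 0.080 | 0.054 | 0.059 | 0.081 | 0.054 | 0.050 |
|  | 0.6 | 0 | 0.063 | 0.101 | 0.060 | 0.064 | 0.106 | 0.061 | 0.061 |
|  | 0.9 | 0 | 0.100 | 0.190 | 0.097 | 0.101 | 0.203 | 0.096 | 0.137 |
|  | 0.3 | 0.3 | 0.167 | 0.140 | 0.064 | 0.168 | 0.142 | 0.066 | 0.072 |
|  | 0.6 | 0.3 | 0.216 | 0.226 | 0.110 | 0.215 | 0.232 | 0.106 | 0.164 |
|  | 0.9 | −0.3 | 0.013 | 0.084 | 0.054 | 0.015 | 0.091 | 0.055 | 0.049 |
| 660 | 0 | 0 | 0.052 | 0.051 | 0.050 | 0.053 | 0.052 | 0.051 | 0.050 |
|  | 0.3 | 0 | 0.052 | 0.067 | 0.051 | 0.054 | 0.068 | 0.052 | 0.050 |
|  | 0.6 | 0 | 0.055 | 0.078 | 0.053 | 0.056 | 0.081 | 0.055 | 0.052 |
|  | 0.9 | 0 | 0.068 | 0.122 | 0.067 | 0.070 | 0.131 | 0.069 | 0.078 |
|  | 0.3 | 0.3 | 0.154 | 0.102 | 0.055 | 0.156 | 0.104 | 0.056 | 0.055 |
|  | 0.6 | 0.3 | 0.177 | 0.149 | 0.073 | 0.179 | 0.155 | 0.075 | 0.090 |
|  | 0.9 | −0.3 | 0.009 | 0.067 | 0.051 | 0.010 | 0.072 | 0.053 | 0.049 |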
If the asymptotic approximations were working perfectly for the statistics, we would see rejections of 0.05 in all cases. When serial correlation is absent, all statistics have empirical rejection probabilities close to 0.05 regardless of the sample size. Once there is serial correlation in the model, overrejections can occur depending on the strength of the serial correlation relative to the sample size. First focus on the case of AR(1) errors (ρ_{2} = 0). Rejections tend to be close to 0.05 when ρ_{1} is small, but as ρ_{1} increases in value, rejections tend to increase. This is especially true for the W_{Bart} statistic, where rejections exceed 0.25 when ρ_{1} = 0.9 and T = 120. In contrast, W_{PW} and VF suffer from less severe overrejection problems, although they tend to be oversized when T = 120 and ρ_{1} = 0.9. For a given value of ρ_{1}, as T increases, overrejections tend to fall for all three statistics, but slowest for W_{Bart}. Overall, for the AR(1) error case, W_{PW} and VF have similar rejections to each other and outperform W_{Bart}. The supVF statistic tends to overreject more than VF when serial correlation is strong, although the differences between supVF and VF decrease as the sample size increases. It is a common finding that supremum statistics tend to have more overrejection problems than statistics that treat break dates as known.
One of the reasons that W_{PW} performs relatively well with AR(1) errors is that W_{PW} is explicitly designed for AR(1) error structures. But when the errors are not AR(1), W_{PW} can suffer from overrejection and underrejection problems. Consider the case ρ_{1} = 0.3, ρ_{2} = 0.3, where W_{PW} shows substantial overrejections that are larger than those of W_{Bart} and VF. These overrejections tend to persist as T increases. In contrast, VF is much less distorted, and its rejections approach 0.05 as T increases. For the case of ρ_{1} = 0.9, ρ_{2} = −0.3, W_{PW} underrejects, and the underrejection problem becomes more severe as T increases, whereas VF has rejections close to 0.05 for all sample sizes. The W_{Bart} statistic tends to overreject mildly in this case.
In general, Table 3 indicates that the VF statistic has the least severe overrejection problems and offers the best control of type I error among the statistics considered.
In the second set of results, we use T = 660 to match the empirical application. We now include a level shift in y_{1t} with λ = 0.3636, and we set b_{1} = 0.01 and g_{1} = 0, 0.25. For y_{2t}, we set b_{2} = 0.01, 0.0105, 0.011, 0.0116, and 0.0121. We report results for η = 0, 0.5. While we ran simulations for a wide range of values for ρ_{1} and ρ_{2}, we only report results for ρ_{1} = 0, 0.9 and ρ_{2} = 0, given that results for other serial correlation configurations have patterns similar to those reported in Table 3.
The results are given in Table 4. The first block of 20 rows gives results for η = 0, whereas the second block of 20 rows gives results for η = 0.5. Within each η block, results are first given for g_{1} = 0, followed by results for g_{1} = 0.25. For each value of g_{1}, results are given for ρ_{1} = 0, followed by results for ρ_{1} = 0.9. When b_{2} = 0.01, we are observing null rejection probabilities, whereas for other values of b_{2}, we are observing power.
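Table 4. Empirical null rejections and empirical power with AR(1) errors (columns marked "no dummy": level shift dummy not included in the estimated model; "dummy": level shift dummy included)

| η | g_{1} | ρ_{1} | b_{2} | W_{PW}, no dummy | W_{Bart}, no dummy | VF, no dummy | W_{PW}, dummy | W_{Bart}, dummy | VF, dummy | SupVF, dummy |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.01 | 0.052 | 0.051 | 0.050 | 0.053 | 0.052 | 0.051 | 0.050 |
|  |  |  | 0.0105 | 0.452 | 0.451 | 0.364 | 0.178 | 0.177 | 0.150 | 0.222 |
|  |  |  | 0.011 | 0.954 | 0.955 | 0.878 | 0.526 | 0.524 | 0.439 | 0.684 |
|  |  |  | 0.0116 | 1.000 | 1.000 | 0.994 | 0.858 | 0.859 | 0.762 | 0.955 |
|  |  |  | 0.0121 | 1.000 | 1.000 | 1.000 | 0.981 | 0.981 | 0.937 | 0.997 |
|  |  | 0.9 | 0.01 | 0.068 | 0.123 | 0.068 | 0.069 | 0.132 | 0.069 | 0.078 |
|  |  |  | 0.0105 | 0.093 | 0.155 | 0.088 | 0.078 | 0.142 | 0.075 | 0.089 |
|  |  |  | 0.011 | 0.166 | 0.247 | 0.148 | 0.102 | 0.175 | 0.096 | 0.127 |
|  |  |  | 0.0116 | 0.284 | 0.386 | 0.245 | 0.142 | 0.226 | 0.130 | 0.188 |
|  |  |  | 0.0121 | 0.435 | 0.551 | 0.374 | 0.197 | 0.293 | 0.175 | 0.271 |
| 0 | 0.25 | 0 | 0.01 | 0.442 | 0.441 | 0.292 | 0.053 | 0.052 | 0.051 | 0.228 |
|  |  |  | 0.0105 | 0.051 | 0.050 | 0.036 | 0.178 | 0.177 | 0.150 | 0.078 |
|  |  |  | 0.011 | 0.450 | 0.449 | 0.293 | 0.526 | 0.524 | 0.439 | 0.278 |
|  |  |  | 0.0116 | 0.954 | 0.954 | 0.810 | 0.858 | 0.859 | 0.762 | 0.698 |
|  |  |  | 0.0121 | 1.000 | 1.000 | 0.984 | 0.981 | 0.981 | 0.937 | 0.947 |
|  |  | 0.9 | 0.01 | 0.092 | 0.153 | 0.087 | 0.069 | 0.132 | 0.069 | 0.091 |
|  |  |  | 0.0105 | 0.068 | 0.122 | 0.066 | 0.078 | 0.142 | 0.075 | 0.079 |
|  |  |  | 0.011 | 0.092 | 0.154 | 0.087 | 0.102 | 0.175 | 0.096 | 0.093 |
|  |  |  | 0.0116 | 0.165 | 0.245 | 0.145 | 0.142 | 0.226 | 0.130 | 0.130 |
|  |  |  | 0.0121 | 0.280 | 0.384 | 0.242 | 0.197 | 0.293 | 0.175 | 0.192 |
| 0.5 | 0 | 0 | 0.01 | 0.051 | 0.051 | 0.051 | 0.052 | 0.052 | 0.050 | 0.050 |
|  |  |  | 0.0105 | 0.691 | 0.691 | 0.577 | 0.280 | 0.278 | 0.234 | 0.365 |
|  |  |  | 0.011 | 0.998 | 0.998 | 0.983 | 0.779 | 0.779 | 0.672 | 0.908 |
|  |  |  | 0.0116 | 1.000 | 1.000 | 1.000 | 0.983 | 0.983 | 0.941 | 0.998 |
|  |  |  | 0.0121 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.995 | 1.000 |
|  |  | 0.9 | 0.01 | 0.068 | 0.123 | 0.068 | 0.069 | 0.130 | 0.068 | 0.078 |
|  |  |  | 0.0105 | 0.112 | 0.182 | 0.105 | 0.084 | 0.151 | 0.081 | 0.100 |
|  |  |  | 0.011 | 0.242 | 0.341 | 0.212 | 0.128 | 0.208 | 0.117 | 0.166 |
|  |  |  | 0.0116 | 0.439 | 0.556 | 0.379 | 0.198 | 0.297 | 0.178 | 0.273 |
|  |  |  | 0.0121 | 0.656 | 0.758 | 0.570 | 0.292 | 0.410 | 0.261 | 0.417 |
| 0.5 | 0.25 | 0 | 0.01 | 0.683 | 0.684 | 0.419 | 0.053 | 0.052 | 0.050 | 0.357 |
|  |  |  | 0.0105 | 0.050 | 0.050 | 0.028 | 0.280 | 0.278 | 0.234 | 0.105 |
|  |  |  | 0.011 | 0.687 | 0.689 | 0.424 | 0.779 | 0.779 | 0.672 | 0.459 |
|  |  |  | 0.0116 | 0.998 | 0.998 | 0.939 | 0.983 | 0.983 | 0.941 | 0.903 |
|  |  |  | 0.0121 | 1.000 | 1.000 | 0.999 | 1.000 | 1.000 | 0.995 | 0.996 |
|  |  | 0.9 | 0.01 | 0.111 | 0.177 | 0.100 | 0.069 | 0.130 | 0.068 | 0.099 |
|  |  |  | 0.0105 | 0.067 | 0.121 | 0.066 | 0.084 | 0.151 | 0.081 | 0.081 |
|  |  |  | 0.011 | 0.111 | 0.179 | 0.100 | 0.128 | 0.208 | 0.117 | 0.103 |
|  |  |  | 0.0116 | 0.241 | 0.338 | 0.206 | 0.198 | 0.297 | 0.178 | 0.172 |
|  |  |  | 0.0121 | 0.436 | 0.552 | 0.370 | 0.292 | 0.410 | 0.261 | 0.281 |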
First focus on the results when the null hypothesis is true, that is, b_{2} = 0.01. For ρ_{1} = 0, we see that when the level shift dummy is included, we have rejections close to 0.05 for all statistics. However, when the level shift dummy is not included and g_{1} = 0.25, we observe severe overrejections that range from 0.292 to 0.684. The statistic with the least severe overrejection problem is VF. When ρ_{1} = 0.9, we have relatively strong autocorrelation in the data. When either g_{1} = 0 or the level shift dummy is included in the model, there are some mild overrejection problems ranging from 0.069 for VF to 0.132 for W_{Bart}, with W_{PW} in between. Overrejections are slightly worse when η = 0.5 compared with η = 0. As in Table 3, supVF tends to overreject slightly more than VF when autocorrelation is strong. In addition, when there is a level shift in the data, supVF tends to have rejections above 0.05. This happens because the null hypothesis underlying the supVF critical values jointly imposes equal trend slopes and no level shift, so a rejection using supVF indicates a level shift and/or differences in trend slopes.
Now focus on the cases where b_{2} > 0.01. In these cases, y_{2t} has a bigger trend slope than y_{1t}, and we should be rejecting the null of equal trend slopes. When g_{1} = 0, all statistics have good power when ρ_{1} = 0, and power is higher for η = 0.5 than for η = 0. As expected, power increases as b_{2} increases. Across the three statistics, VF tends to have lower power than the other two. This illustrates the well-known trade-off between overrejection problems and power. Note that although the power of VF is the lowest, it is still relatively good in an absolute sense. If we include the level shift dummy in the model even though it is not needed (g_{1} = 0), all three tests show a reduction in power, as one would expect. An unexpected finding is that the supVF statistic has higher power than VF and the two Wald statistics when the level shift regressor is included but there is no shift in the data. In contrast, and as expected, the power of supVF is lower than that of the other tests when the level shift regressor is not included in the estimated model.
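The power loss from an unneeded dummy can be gauged with a back-of-the-envelope calculation. Under i.i.d. errors, the variance of the OLS trend slope is σ² divided by the sum of squares of t after the other regressors are partialled out; adding DU_t replaces the overall mean of t by its regime-specific means, which shrinks that sum and inflates the slope variance. The sketch below is a rough guide only (it ignores serial correlation and the nonstandard HAC critical values), and the design of 660 monthly observations with the break after observation 240 is a hypothetical choice, not the simulation design behind the table.

```python
from typing import Sequence


def centered_ss(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def slope_variance_inflation(T: int, break_index: int) -> float:
    """Var(slope | DU included) / Var(slope | DU omitted) under i.i.d. errors.

    By Frisch-Waugh-Lovell, Var(b_hat) = sigma^2 / sum(t_tilde^2), where t_tilde is t
    after partialling out the other regressors: the overall mean for the model (1, t),
    the regime-specific means for (1, t, DU) with DU_t = 1(t > break_index).
    """
    t = list(range(1, T + 1))
    within = centered_ss(t[:break_index]) + centered_ss(t[break_index:])
    return centered_ss(t) / within


def asymptotic_inflation(lam: float) -> float:
    """Large-T limit 1 / (1 - 3 lam (1 - lam)) for break fraction lam."""
    return 1.0 / (1.0 - 3.0 * lam * (1.0 - lam))


# Hypothetical design (assumption): T = 660 monthly observations, shift after t = 240.
ratio = slope_variance_inflation(660, 240)
print(round(ratio, 3), round(asymptotic_inflation(240 / 660), 3))
```

For this hypothetical design both calculations give a ratio of about 3.27, i.e. slope standard errors roughly 1.8 times larger when the unneeded dummy is included.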
The most interesting power results occur for g_{1} = 0.25 and b_{2} = 0.0105 when the level shift dummy is left out of the model. In this case, the estimator of b_{1} is biased upward, and one can show that its probability limit exactly equals 0.0105, the value of b_{2}, so the two estimated slopes converge to the same value. For the case of ρ_{1} = 0, rejections of all three statistics are close to the nominal level of 0.050. This shows that an omitted level shift variable can cripple the power of the tests to detect a difference in trend slopes between two series. For larger values of b_{2}, the tests have power even if the level shift variable is not included. When b_{2} = 0.011, power is higher if the level shift dummy is included, whereas for b_{2} = 0.012, power is higher when the level shift dummy is left out. When a level shift is present in the data, supVF has less power overall than VF, as expected given that supVF treats the break date as unknown and uses conservative critical values.
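The mechanism behind the upward bias is ordinary omitted-variable bias. If y_t = a + b t + g DU_t with DU_t = 1 for t > T_b, regressing y_t on (1, t) alone yields a slope of b + g Σ(t − t̄)DU_t / Σ(t − t̄)², and the second term is positive when g > 0 because DU_t is switched on only for late, above-average values of t. The minimal noise-free sketch below checks the formula numerically; T = 660 and T_b = 240 are assumptions for illustration, not the sample size and break date of the Monte Carlo design, so the implied slope is close to, but not exactly, 0.0105.

```python
from typing import Sequence


def ols_slope(y: Sequence[float]) -> float:
    """OLS slope from regressing y_t on (1, t), t = 1..T."""
    T = len(y)
    tbar = (T + 1) / 2.0
    ybar = sum(y) / T
    num = sum((t - tbar) * (y[t - 1] - ybar) for t in range(1, T + 1))
    den = sum((t - tbar) ** 2 for t in range(1, T + 1))
    return num / den


def slope_with_omitted_shift(T: int, break_index: int, b: float, g: float,
                             a: float = 0.0) -> float:
    """Noise-free y_t = a + b t + g DU_t, DU_t = 1(t > break_index), fitted WITHOUT DU."""
    y = [a + b * t + (g if t > break_index else 0.0) for t in range(1, T + 1)]
    return ols_slope(y)


def omitted_shift_bias(T: int, break_index: int, g: float) -> float:
    """Omitted-variable bias g * sum((t - tbar) DU_t) / sum((t - tbar)^2)."""
    tbar = (T + 1) / 2.0
    num = sum(t - tbar for t in range(break_index + 1, T + 1))
    den = sum((t - tbar) ** 2 for t in range(1, T + 1))
    return g * num / den


# Hypothetical design: b = 0.01, shift g = 0.25 after t = 240 of T = 660.
print(slope_with_omitted_shift(660, 240, b=0.01, g=0.25))
```

With b = 0.01 and g = 0.25 the fitted slope is about 0.01053 in this design, so a series trending at 0.01 with a level shift of this size looks almost indistinguishable from one trending at 0.0105.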
These simulation results show that (i) the VF statistic has type 1 errors closest to the nominal level; (ii) the VF statistic has lower power, which is the price paid for more accurate type 1 errors, although its power is still reasonably good; (iii) including a level shift dummy when there is no level shift in the data lowers power; (iv) failing to include a level shift dummy when there is a level shift in the data pushes type 1 errors well above the nominal level and, depending on the magnitude and direction of the level shift, can make it difficult to detect slopes that differ; (v) positive correlation across series (η = 0.5) tends to increase power; and (vi) stronger serial correlation tends to inflate overrejections under the null while reducing power.
CONCLUSIONS
Heteroskedasticity and autocorrelation robust (HAC) covariance matrix estimators have been adapted to the linear trend model, permitting robust inferences about trend significance and trend comparisons in data sets with complex and unknown autocorrelation characteristics. Here, we extend the multivariate HAC approach of Vogelsang and Franses (2005) to allow more general deterministic regressors in the model. We show that the asymptotic (approximating) critical values of the test statistics of Vogelsang and Franses (2005) are nonstandard and depend on the specific deterministic regressors included in the model. These critical values can be simulated directly. Alternatively, a simple bootstrap method is available for obtaining valid critical values and p-values.
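To make the HAC idea concrete, here is a minimal univariate sketch, not the authors' multivariate VF statistic: a t-statistic for one coefficient in a regression on deterministic regressors, with the sandwich variance built from a Bartlett kernel whose bandwidth equals the sample size (assumed here to mirror the VF construction). It uses the identity that, for scores summing to zero, the Bartlett M = T double sum equals (2/T) times the sum of squared partial sums. It then simulates two-sided 5% critical values with i.i.d. N(0, 1) errors, with and without a level shift dummy, which illustrates that the critical values depend on the deterministic regressors. Relying on i.i.d. draws assumes the usual fixed-bandwidth asymptotics, under which the limit does not depend on short-run serial correlation; the helper names and the simulation design are hypothetical, and the paper's own asymptotic approximation and bootstrap may be set up differently.

```python
import math
import random
from typing import List, Optional, Sequence

# ASSUMPTION: univariate illustration only; not the paper's multivariate VF statistic.


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def bartlett_full_bandwidth(w: Sequence[float]) -> float:
    """sum_t sum_s (1 - |t - s| / T) w_t w_s, computed as (2/T) sum_{t<T} W_t^2.

    The partial-sum form is exact when sum(w) == 0, which holds for OLS scores.
    """
    T = len(w)
    partial, total = 0.0, 0.0
    for t in range(T - 1):
        partial += w[t]
        total += partial * partial
    return 2.0 * total / T


def fixed_b_tstat(X: Sequence[Sequence[float]], y: Sequence[float], j: int,
                  null_value: float = 0.0) -> float:
    """t-statistic for coefficient j with the Bartlett (M = T) sandwich variance."""
    T, k = len(y), len(X[0])
    XtX = [[sum(X[t][a] * X[t][c] for t in range(T)) for c in range(k)] for a in range(k)]
    Xty = [sum(X[t][a] * y[t] for t in range(T)) for a in range(k)]
    beta = solve(XtX, Xty)
    resid = [y[t] - sum(X[t][a] * beta[a] for a in range(k)) for t in range(T)]
    # c = (X'X)^{-1} e_j, so c'x_t u_t is coefficient j's score contribution at time t.
    c = solve(XtX, [1.0 if a == j else 0.0 for a in range(k)])
    w = [sum(c[a] * X[t][a] for a in range(k)) * resid[t] for t in range(T)]
    return (beta[j] - null_value) / math.sqrt(bartlett_full_bandwidth(w))


def trend_design(T: int, break_index: Optional[int] = None) -> List[List[float]]:
    """Rows (1, t) or (1, t, DU_t) with DU_t = 1 for t > break_index."""
    rows = []
    for t in range(1, T + 1):
        row = [1.0, float(t)]
        if break_index is not None:
            row.append(1.0 if t > break_index else 0.0)
        rows.append(row)
    return rows


def simulated_critical_value(T: int, break_index: Optional[int] = None,
                             alpha: float = 0.05, reps: int = 2000, seed: int = 0) -> float:
    """Two-sided critical value for the trend-slope t-statistic (coefficient 1),
    simulated with i.i.d. N(0, 1) errors."""
    rng = random.Random(seed)
    X = trend_design(T, break_index)
    draws = sorted(
        abs(fixed_b_tstat(X, [rng.gauss(0.0, 1.0) for _ in range(T)], j=1))
        for _ in range(reps)
    )
    return draws[min(reps - 1, math.ceil((1 - alpha) * reps) - 1)]


if __name__ == "__main__":
    print(simulated_critical_value(100, reps=500),
          simulated_critical_value(100, break_index=40, reps=500))
```

In practice one would use many more replications; the small counts here only keep the sketch fast.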
The empirical focus of the paper is a comparison of trends in climate model-generated temperature data and corresponding observed temperature data in the tropical troposphere. Our empirical innovation is to make the trend model robust to the possibility of a level shift in the observed data corresponding to the Pacific Climate Shift (PCS), which occurred around 1978. Within the Vogelsang and Franses (2005) approach, this amounts to adding a level shift dummy to the model, which in turn requires a new set of critical values that we provide.
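Adding the level shift amounts to appending one column to the trend regressors. The sketch below builds DU_t and the rows (1, t, DU_t) for a monthly series; the sample span 1958:01–2012:12 and the convention that DU_t switches on the month after the break date (so that a break at 1977:12 puts DU_t = 1 from 1978:01) are assumptions for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple

YearMonth = Tuple[int, int]


def month_offset(ym: YearMonth, start: YearMonth) -> int:
    """0-based position of year-month `ym` in a monthly series beginning at `start`."""
    return (ym[0] - start[0]) * 12 + (ym[1] - start[1])


def level_shift_dummy(start: YearMonth, end: YearMonth,
                      last_pre_shift: YearMonth) -> List[float]:
    """DU_t = 0 through `last_pre_shift` and 1 afterwards (assumed timing convention)."""
    n_obs = month_offset(end, start) + 1
    n_pre = month_offset(last_pre_shift, start) + 1
    return [0.0] * n_pre + [1.0] * (n_obs - n_pre)


def design_rows(du: List[float]) -> List[Tuple[float, float, float]]:
    """Regressors (1, t, DU_t) for the trend model with a level shift, t = 1..T."""
    return [(1.0, float(t), d) for t, d in enumerate(du, start=1)]


du = level_shift_dummy((1958, 1), (2012, 12), (1977, 12))
X = design_rows(du)
print(len(du), int(sum(du)))  # 660 observations, 420 of them after the shift
```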
As our empirical findings show, the detection of a trend in the tropical lower troposphere and midtroposphere data over the 1958–2012 interval is contingent on whether or not one controls for a level shift coinciding with the PCS. If the level shift term is included, a time trend regression with autocorrelation-robust error terms indicates that the trend is small and not statistically different from zero in either the LT or the MT layer. Also, most climate models predict a significantly larger trend over this interval than is observed in either layer. We find a statistically significant discrepancy between the average climate model trend and observational trends whether or not the mean-shift term is included. However, with the shift term included, the null hypothesis of trend equivalence is rejected much more strongly (at much smaller significance levels).
On the question of the preferred specification (that is, whether or not to include a shift), results ought to be robust to controlling for a break wherever the researcher suspects that one has occurred. In the multivariate tests, when we fix the break at 1977:12, the shift terms are not significant in either layer, but when we use the grid search method, the shift is significant at 10% in the LT layer and at 5% in the MT layer. Because breaks are harder to identify than trends, these findings indicate the importance of controlling for the possibility that one is present.
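A grid search over break dates can be sketched as follows: for every candidate date in a trimmed range, fit y_t = a + b t + g DU_t + u_t, compute a fixed-bandwidth (Bartlett, M = T) t-statistic for g, and keep the largest squared value together with its date. This is a hypothetical univariate analogue written for illustration; the paper's multivariate supVF statistic also involves the trend-slope restriction, and the 15% trimming, the function names and the made-up series are assumptions. Because the maximum is taken over many dates, the statistic needs its own nonstandard critical values, obtained by simulation or bootstrap.

```python
import math
import random
from typing import List, Sequence, Tuple


def detrend(z: Sequence[float]) -> List[float]:
    """Residuals from an OLS regression of z_t on (1, t), t = 1..T."""
    T = len(z)
    tbar = (T + 1) / 2.0
    zbar = sum(z) / T
    stt = sum((t - tbar) ** 2 for t in range(1, T + 1))
    slope = sum((t - tbar) * (z[t - 1] - zbar) for t in range(1, T + 1)) / stt
    return [z[t - 1] - zbar - slope * (t - tbar) for t in range(1, T + 1)]


def shift_tstat(y_tilde: Sequence[float], break_index: int) -> float:
    """Fixed-b t-statistic for g in y_t = a + b t + g DU_t + u_t, DU_t = 1(t > break_index).

    Frisch-Waugh-Lovell: y_tilde is y already detrended; DU is detrended here.
    """
    T = len(y_tilde)
    d = detrend([1.0 if t > break_index else 0.0 for t in range(1, T + 1)])
    sdd = sum(x * x for x in d)
    g_hat = sum(a * b for a, b in zip(d, y_tilde)) / sdd
    resid = [yt - g_hat * dt for yt, dt in zip(y_tilde, d)]  # full-model residuals
    partial, acc = 0.0, 0.0
    for t in range(T - 1):  # Bartlett kernel with M = T via partial sums of scores
        partial += d[t] * resid[t]
        acc += partial * partial
    return g_hat / math.sqrt(2.0 * acc / T / sdd ** 2)


def sup_shift_stat(y: Sequence[float], trim: float = 0.15) -> Tuple[float, int]:
    """Largest squared t-statistic over candidate break dates, and where it occurs."""
    T = len(y)
    y_tilde = detrend(y)
    candidates = range(int(trim * T), int((1 - trim) * T) + 1)
    return max((shift_tstat(y_tilde, tb) ** 2, tb) for tb in candidates)


if __name__ == "__main__":
    rng = random.Random(1)
    # Made-up series: slope 0.01, shift of 3.0 after t = 80, small i.i.d. noise.
    y = [0.01 * t + (3.0 if t > 80 else 0.0) + rng.gauss(0.0, 0.05) for t in range(1, 201)]
    print(sup_shift_stat(y))
```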
The testing method used herein is both powerful and relatively robust to overrejections under the null hypothesis caused by strong serial correlation. The power of the test is indicated by the span of test scores in Table 8, in which relatively small changes in modeled trends translate into smaller p-values. Using the data-mining method provides a check on the extent to which the results depend on the assumption of a known shift date.
More generally, our empirical approach has many other potential applications to climatic and other data sets in which level shifts are believed to have occurred. Examples could include stratospheric temperature trends, which are subject to level shifts coinciding with major volcanic eruptions, and land surface trends in records where the measuring equipment is believed to have been changed or moved. Generalizing the approach to allow more than one unknown break point is left for subsequent work.