Abstract

In a recent paper, Bai and Perron (1998) considered theoretical issues related to the limiting distribution of estimators and test statistics in the linear model with multiple structural changes. In this companion paper, we consider practical issues for the empirical applications of the procedures. We first address the problem of estimation of the break dates and present an efficient algorithm to obtain global minimizers of the sum of squared residuals. This algorithm is based on the principle of dynamic programming and requires at most least-squares operations of order O(T²) for any number of breaks. Our method can be applied to both pure and partial structural change models. Second, we consider the problem of forming confidence intervals for the break dates under various hypotheses about the structure of the data and the errors across segments. Third, we address the issue of testing for structural changes under very general conditions on the data and the errors. Fourth, we address the issue of estimating the number of breaks. Finally, a few empirical applications are presented to illustrate the usefulness of the procedures. All methods discussed are implemented in a GAUSS program. Copyright © 2002 John Wiley & Sons, Ltd.


1 INTRODUCTION

Both the statistics and econometrics literatures contain a vast amount of work on issues related to structural change, most of it specifically designed for the case of a single change. The problem of multiple structural changes has received considerably less, though increasing, attention. Related literature includes Andrews, Lee and Ploberger (1996), Garcia and Perron (1996), Liu, Wu and Zidek (1997), Pesaran and Timmermann ('Model instability and choice of observation window', unpublished manuscript, 1999), Lumsdaine and Papell (1997), and Morimune and Nakagawa (1997). Most of these studies are concerned with issues related to hypothesis testing in the context of multiple changes. Recently, Bai and Perron (1998) considered estimating multiple structural changes in a linear model estimated by least squares. They derived the rate of convergence and the limiting distributions of the estimated break points. The results are obtained under a general framework of partial structural changes which allows a subset of the parameters not to change (and, of course, includes a pure structural change model as a special case). They also addressed the important problem of testing for multiple structural changes: sup Wald-type tests for the null hypothesis of no change versus an alternative containing an arbitrary number of changes, and a procedure that allows one to test the null hypothesis of, say, ℓ changes, versus the alternative hypothesis of ℓ + 1 changes. The latter is particularly useful in that it allows a specific-to-general modelling strategy to consistently determine the appropriate number of changes in the data.

The present study focuses on the empirical implementation of the theoretical results of Bai and Perron (1998), henceforth referred to as BP. We first address the problem of the estimation of the break dates and present an efficient algorithm to obtain global minimizers of the sum of squared residuals, based on the principle of dynamic programming, which requires at most least-squares operations of order O(T²) for any number of breaks. Our method can be applied to both pure and partial structural change models. We also consider the problem of forming confidence intervals for the break dates under various hypotheses about the structure of the data and errors across segments. In particular, we may allow the data and errors to have different distributions across segments or impose a common structure. The issue of testing for structural changes is also considered under very general conditions on the data and the errors. We discuss how the tests can be constructed allowing for serial correlation in the errors and different distributions for the data and the errors across segments, or imposing a common structure. We also address the issue of estimating the number of breaks. To that effect, we discuss methods based on information criteria and a method based on a sequential testing procedure. Empirical applications are presented to illustrate the usefulness of the procedures. All methods discussed are implemented in a GAUSS program.

The rest of this paper is structured as follows. Section 2 presents the model and the estimator. Section 3 discusses in detail an algorithm, based on the principle of dynamic programming, that allows us to efficiently estimate models with multiple structural changes. Section 4 discusses the construction of confidence intervals for the various parameters, in particular the break dates. Section 5 discusses tests for multiple structural changes, methods to estimate the number of breaks and summarizes practical recommendations based on a simulation study presented in Bai and Perron (‘Multiple structural change models: a simulation analysis’, unpublished manuscript, 2000). Empirical applications are presented in Section 6. Some conclusions are contained in Section 7.

2 THE MODEL AND ESTIMATORS

We consider the following multiple linear regression with m breaks (m + 1 regimes):

  yt = xt′β + zt′δj + ut,  t = Tj−1 + 1, …, Tj   (1)

for j = 1, …, m + 1. In this model, yt is the observed dependent variable at time t; xt (p × 1) and zt (q × 1) are vectors of covariates, β and δj (j = 1, …, m + 1) are the corresponding vectors of coefficients, and ut is the disturbance at time t. The indices (T1, …, Tm), or break points, are explicitly treated as unknown (we use the convention that T0 = 0 and Tm+1 = T). The purpose is to estimate the unknown regression coefficients together with the break points when T observations on (yt, xt, zt) are available. This is a partial structural change model, since the parameter vector β is not subject to shifts and is estimated using the entire sample. When p = 0, we obtain a pure structural change model where all the coefficients are subject to change. The variance of ut need not be constant: breaks in variance are permitted, provided they occur at the same dates as the breaks in the parameters of the regression.

The multiple linear regression system (1) may be expressed in matrix form as

  Y = Xβ + Z̄δ + U

where Y = (y1, …, yT)′, X = (x1, …, xT)′, U = (u1, …, uT)′, δ = (δ1′, δ2′, …, δm+1′)′, and Z̄ is the matrix which diagonally partitions Z at (T1, …, Tm), i.e. Z̄ = diag(Z1, …, Zm+1) with Zi = (zTi−1+1, …, zTi)′. We denote the true value of a parameter with a 0 superscript. In particular, δ0 = (δ10′, …, δm+10′)′ and (T10, …, Tm0) are used to denote, respectively, the true values of the parameters δ and the true break points. The matrix Z̄0 is the one which diagonally partitions Z at (T10, …, Tm0). Hence, the data-generating process is assumed to be

  Y = Xβ0 + Z̄0δ0 + U   (2)

The method of estimation considered is that based on the least-squares principle. For each m-partition (T1, …, Tm), the associated least-squares estimates of β and δj are obtained by minimizing the sum of squared residuals

  Σ_{i=1}^{m+1} Σ_{t=Ti−1+1}^{Ti} [yt − xt′β − zt′δi]²

Let β̂({Tj}) and δ̂({Tj}) denote the estimates based on the given m-partition (T1, …, Tm), denoted {Tj}. Substituting these in the objective function and denoting the resulting sum of squared residuals as ST(T1, …, Tm), the estimated break points (T̂1, …, T̂m) are such that (T̂1, …, T̂m) = argmin(T1,…,Tm) ST(T1, …, Tm), where the minimization is taken over all partitions (T1, …, Tm) such that Ti − Ti−1 ≥ q. Thus the break-point estimators are global minimizers of the objective function. The regression parameter estimates are those associated with the m-partition {T̂j}, i.e. β̂ = β̂({T̂j}) and δ̂ = δ̂({T̂j}). Since the break points are discrete parameters that can only take a finite number of values, they can be estimated by a grid search. This method rapidly becomes computationally excessive when m > 2. Fortunately, there exists a very efficient method which we now discuss.
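
To fix ideas, the following sketch (in Python with NumPy; the function names and the 0-based indexing convention are ours, not part of the original GAUSS program) evaluates the objective function ST(T1, …, Tm) for a given partition in the pure structural change case (p = 0), applying OLS segment by segment. A break at index b is understood to mean that a regime ends with observation b.

    import numpy as np

    def segment_ssr(y, Z, i, j):
        """SSR from OLS fitted to observations i..j (0-based, inclusive)."""
        ys, Zs = y[i:j + 1], Z[i:j + 1]
        coef, *_ = np.linalg.lstsq(Zs, ys, rcond=None)
        resid = ys - Zs @ coef
        return resid @ resid

    def global_ssr(y, Z, breaks):
        """S_T(T_1, ..., T_m) for a given m-partition (pure model, p = 0)."""
        T = len(y)
        bounds = [0] + [b + 1 for b in breaks] + [T]
        return sum(segment_ssr(y, Z, bounds[k], bounds[k + 1] - 1)
                   for k in range(len(bounds) - 1))

A grid search would call global_ssr over all admissible partitions, which is feasible for m = 1 or 2 but quickly becomes prohibitive; the dynamic programming algorithm of Section 3 avoids this.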

3 METHOD TO COMPUTE GLOBAL MINIMIZERS

We now consider an algorithm, based on the principle of dynamic programming, that allows the computation of estimates of the break points as global minimizers of the sum of squared residuals. This algorithm requires at most least-squares operations of order O(T²) for any number of structural changes m, unlike a standard grid search procedure, which would require least-squares operations of order O(T^m). The basis of the method, for specialized cases, is not new and was considered by Guthery (1974), Bellman and Roth (1969) and Fisher (1958). Nevertheless, it seems to have been forgotten, at least in the econometrics literature, and a thorough description appears useful. The original method works only for pure structural change models; we propose a scheme that allows the estimation of more general partial structural change models.

3.1 The Triangular Matrix of Sums of Squared Residuals

The basic idea of the approach becomes fairly intuitive once it is realized that, with a sample of size T, the total number of possible segments is at most T(T + 1)/2 and is therefore of order O(T²). This is illustrated in Figure 1 for the special case T = 25 and m = 2, where the vertical axis represents the initial date of a segment and the horizontal axis the ending date. Each entry represents an estimated sum of squared residuals corresponding to the associated segment. The global sum of squared residuals for any m-partition (T1, …, Tm), and for any value of m, must necessarily be a particular linear combination of these T(T + 1)/2 sums of squared residuals. The estimates of the break dates, the m-partition (T̂1, …, T̂m), correspond to the combination with minimal value. The dynamic programming algorithm can be seen as an efficient way to compare the possible combinations (corresponding to different m-partitions) to achieve a minimum global sum of squared residuals.

[Figure 1. Example of the triangular matrix of sums of squared residuals with T = 25, h = 5 and m = 2]

In practice, fewer than T(T + 1)/2 segments are permissible. First, some minimum distance h between consecutive breaks may be imposed, as is done in the construction of the tests discussed in Section 5 (we suppose, without loss of generality, that h ≥ q). This implies a reduction of (h − 1)T − (h − 2)(h − 1)/2 in the number of segments to be considered (see Figure 1). Other reductions are possible. The largest segment must be short enough to allow m other segments before or after it. For example, when a segment starts at a date between 1 and h, its maximal length is T − hm when m breaks are allowed. This allows a further reduction of h²m(m + 1)/2 in the total number of segments considered. Finally, a segment cannot start at dates 2 to h, since otherwise no segment of minimal length h could be inserted at the beginning of the sample. This allows a further reduction of T(h − 1) − mh(h − 1) − (h − 1)² − h(h − 1)/2 segments. We discuss below how this triangular matrix of sums of squared residuals can be constructed and used to obtain global minimizers for both pure and partial structural change models.

3.2 The Case of a Pure Structural Change Model

We first consider the case of a pure structural change model with the regression given by:

  yt = zt′δj + ut,  t = Tj−1 + 1, …, Tj   (3)

In such a case, the computation of the estimates δ̂, the residuals ût and ST(T1, …, Tm) can be done by applying OLS segment by segment, without constraints across segments. The computation of the triangular matrix of sums of squared residuals can be achieved using standard updating formulae to calculate recursive residuals. Indeed, all the relevant information can be calculated from T − hm + 1 sets of recursive residuals. Let v(i, j) be the recursive residual at time j obtained using a sample that starts at date i, and let SSR(i, j) be the sum of squared residuals obtained by applying least squares to a segment that starts at date i and ends at date j. We have the following recursive relation (e.g. Brown, Durbin and Evans, 1975): SSR(i, j) = SSR(i, j − 1) + v(i, j)². All the relevant information is contained in the values SSR(i, j) for the relevant combinations (i, j). Note that the number of matrix inversions needed is simply of order O(T).
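
As an illustration, the following sketch builds the triangular matrix SSR(i, j) with this recursion, updating the inverse moment matrix by the Sherman–Morrison formula so that only O(T) matrix inversions are needed. It is our own minimal implementation, not the authors' code; indices are 0-based and only segments of length at least h are stored.

    import numpy as np

    def ssr_triangular(y, Z, h):
        """SSR(i, j) for all admissible segments, via recursive residuals:
        SSR(i, j) = SSR(i, j - 1) + v(i, j)^2 (Brown, Durbin and Evans, 1975).
        Returns a dict keyed by (i, j), 0-based inclusive endpoints."""
        T, q = Z.shape
        ssr = {}
        for i in range(T - h + 1):
            # initialize the recursion on the first q observations of the segment
            Zi, yi = Z[i:i + q], y[i:i + q]
            Minv = np.linalg.inv(Zi.T @ Zi)        # (Z'Z)^{-1}, assumed nonsingular
            beta = Minv @ Zi.T @ yi
            r = yi - Zi @ beta
            s = r @ r                              # zero when the fit is exact
            for j in range(i + q, T):
                z, yj = Z[j], y[j]
                f = 1.0 + z @ Minv @ z
                v = (yj - z @ beta) / np.sqrt(f)   # recursive residual v(i, j)
                s += v * v                         # SSR(i, j) = SSR(i, j-1) + v(i, j)^2
                Mz = Minv @ z
                Minv -= np.outer(Mz, Mz) / f       # Sherman-Morrison update of (Z'Z)^{-1}
                beta = beta + Minv @ z * (yj - z @ beta)   # updated OLS coefficients
                if j - i + 1 >= h:
                    ssr[(i, j)] = s
        return ssr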

3.3 The Dynamic Programming Algorithm

Once the sums of squared residuals of the relevant segments have been computed and stored, a dynamic programming approach can be used to evaluate which partition achieves a global minimization of the overall sum of squared residuals. This method essentially proceeds via a sequential examination of optimal one-break (or two segments) partitions. Let SSR({Tr, n}) be the sum of squared residuals associated with the optimal partition containing r breaks using the first n observations. The optimal partition solves the following recursive problem:

  SSR({Tm,T}) = min_{mh ≤ j ≤ T−h} [SSR({Tm−1,j}) + SSR(j + 1, T)]   (4)

The procedure starts by evaluating the optimal one-break partitions for all sub-samples that allow a possible break ranging from observation h to T − mh. Hence, the first step stores a set of T − (m + 1)h + 1 optimal one-break partitions along with their associated sums of squared residuals. Each of these optimal partitions corresponds to a sub-sample ending at a date ranging from 2h to T − (m − 1)h.

Consider now the next step, which searches for optimal partitions with two breaks. Such partitions have ending dates ranging from 3h to T − (m − 2)h. For each of these possible ending dates, the procedure looks at which one-break partition (saved earlier) can be inserted to achieve a minimal sum of squared residuals. The outcome is a set of T − (m + 1)h + 1 optimal two-break (or three-segment) partitions. The method continues sequentially until a set of T − (m + 1)h + 1 optimal (m − 1)-break partitions is obtained, with ending dates ranging from mh to T − h. The final step is to see which of these optimal (m − 1)-break partitions yields an overall minimal sum of squared residuals when combined with an additional segment. The method can therefore be viewed as a sequential updating of T − (m + 1)h + 1 segments into optimal one-break, two-break and up to (m − 1)-break partitions (or into two, three and up to m sub-segments), the last step creating a single optimal m-break (or (m + 1)-segment) partition.
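
A compact version of this sequential updating, under the same assumptions and naming as the previous sketch, is given below. best[r][n] stores the minimal SSR over partitions of observations 0..n containing r breaks, so the recursion mirrors equation (4); all the least-squares work is in the precomputed ssr dictionary.

    def optimal_partition(ssr, T, m, h):
        """Global SSR minimization over m-break partitions by dynamic
        programming (a sketch; assumes T >= (m + 1)h). ssr[(i, j)] is the
        SSR of segment i..j. Returns (minimal SSR, sorted break indices)."""
        INF = float("inf")
        best = [[INF] * T for _ in range(m + 1)]
        arg = [[-1] * T for _ in range(m + 1)]
        for n in range(h - 1, T):
            best[0][n] = ssr.get((0, n), INF)      # no break: one segment 0..n
        for r in range(1, m + 1):
            for n in range((r + 1) * h - 1, T):
                for j in range(r * h - 1, n - h + 1):   # date of the last break
                    cand = best[r - 1][j] + ssr.get((j + 1, n), INF)
                    if cand < best[r][n]:
                        best[r][n], arg[r][n] = cand, j
        breaks, n = [], T - 1                      # backtrack from the full sample
        for r in range(m, 0, -1):
            breaks.append(arg[r][n])
            n = arg[r][n]
        return best[m][T - 1], sorted(breaks)

Combined with ssr_triangular, these two functions deliver the global minimizers (T̂1, …, T̂m) for the pure structural change model.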

It is important to note that, in practice, this method is very fast for samples of the usual sizes. Indeed, the major computational cost is the construction of the triangular matrix of sums of squared residuals for all possible segments; the search for the optimal m-partition represents a marginal addition to the total computing time. This means that it takes only marginally longer to obtain global minimizers with five or ten breaks than with two.

3.4 The Case of a Partial Structural Change Model

This dynamic programming method to obtain global minimizers of the sum of squared residuals cannot be applied directly to the case of a partial structural change model (p > 0). This is because we cannot concentrate out the parameter vector β without knowing the appropriate partition: the estimate of β associated with the global minimum depends on the optimal partition which we are trying to obtain. Unlike the pure structural change model, for which the regression can be written in the form (3), each element of the triangular matrix of sums of squared residuals here depends on the final optimal m-partition being sought.

However, a simple iterative procedure is possible. Let θ = (δ, T1, …, Tm); we can then write the sum of squared residuals as a function of the vectors β and θ, i.e. SSR(β, θ). As discussed in Sargan (1964), we can minimize SSR(β, θ) in an iterative fashion: first minimize with respect to θ keeping β fixed, then minimize with respect to β keeping θ fixed, and iterate. Each iteration ensures a decrease in the objective function.

We discuss the details of this method in our context, with a slight modification that permits very rapid convergence. Note that the first step, minimizing with respect to θ keeping β fixed, amounts to applying the dynamic programming algorithm discussed above with yt − xt′β as the dependent variable. Since β is fixed, this is indeed a step involving a pure structural change model. Let θ* = (δ*, {T*}) be the associated estimate from this first stage. The application of Sargan's method suggests that the second step be a simple linear regression with yt − zt′δj* as the dependent variable for t in regime j (j = 1, …, m + 1), the regimes being defined by the partition {T*}.

Important efficiency gains can be obtained by making a slight modification to the second step. The idea is to keep only {T*} fixed and to minimize again with respect to δ and β simultaneously; hence, δ is updated at each of the two steps. The reason why this leads to important efficiency gains can be explained as follows. In general, the values {T*} obtained at the first iteration will be quite close to the values {T̂} that correspond to the global minimum (unless the initial value of β is very far from its true value β0). Intuitively, this is so because a misspecification in the initial value of β has little effect on the estimates {T*}, since the latter depend mostly on the changes in the coefficients δ (associated with the zt variables) across regimes. Consider a second step which applies the OLS regression Y = Xβ + Z̄*δ + U, with Z̄* the diagonal partition of Z at the m-partition {T*}. If the values {T*} are equal to {T̂}, corresponding to the global minimum, the estimates of β and δ from this second step are then those corresponding to the global minimum. Experiments with real and simulated data showed that, in the majority of cases, a single iteration is sufficient. In a few cases two are required, but it was difficult to find examples where three were needed.

To highlight the contrast between the two methods, consider what happens if δ is not re-updated in the second step. This step becomes a simple OLS regression of the form Y − Z̄*δ* = Xβ + U. Note that even if {T*} is equal to {T̂}, corresponding to the global minimum, the estimate of β will not necessarily be close to β̂ (the value at the global minimum) unless, of course, δ* is already close to δ̂ at the first iteration (which can only happen with a small probability). Hence the need for additional iterations; and experiments on real and simulated data have shown that the number of iterations needed can be high, even in simple models.

The convergence criterion adopted is that the change in the objective function ST(T1, …, Tm) be smaller than some arbitrary ϵ. Using the suggested iterative method, it is possible to specify ϵ = 0 because of the discrete nature of the variables (T1, …, Tm). Indeed, in most of the experiments performed, the minimum was attained after the first iteration, and the second iteration only verified that there was effectively no change in the objective function.
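
The iteration just described can be sketched as follows, reusing ssr_triangular and optimal_partition from the earlier sketches (again our own minimal code; diag_partition builds the matrix Z̄ of Section 2). Both β and δ are re-estimated in the second step, holding only the partition fixed, and ϵ = 0 is used as the stopping rule.

    import numpy as np

    def diag_partition(Z, breaks, T):
        """Z-bar: Z partitioned diagonally at the given break indices."""
        bounds = [0] + [b + 1 for b in breaks] + [T]
        q, nseg = Z.shape[1], len(bounds) - 1
        Zbar = np.zeros((T, q * nseg))
        for i in range(nseg):
            Zbar[bounds[i]:bounds[i + 1], i * q:(i + 1) * q] = Z[bounds[i]:bounds[i + 1]]
        return Zbar

    def estimate_partial(y, X, Z, m, h, beta0, max_iter=20):
        """Iterative estimation of the partial structural change model (sketch)."""
        T, p = X.shape
        beta, prev = np.asarray(beta0, dtype=float), np.inf
        for _ in range(max_iter):
            # Step 1: pure structural change model applied to y - X beta
            tri = ssr_triangular(y - X @ beta, Z, h)
            _, breaks = optimal_partition(tri, T, m, h)
            # Step 2 (modified): re-estimate beta and all delta_j jointly
            W = np.hstack([X, diag_partition(Z, breaks, T)])
            theta, *_ = np.linalg.lstsq(W, y, rcond=None)
            resid = y - W @ theta
            ssr = resid @ resid
            beta, delta = theta[:p], theta[p:]
            if prev - ssr <= 0:                    # eps = 0: the breaks are discrete
                break
            prev = ssr
        return beta, delta, breaks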

3.5 The Choice of the Initial Value for β

The ability of the method proposed above to reach a global minimum (as opposed to a local minimum) depends on an appropriate choice of the initial value of the vector β used to start the iterations. We suggest the following procedure. First, apply the dynamic programming algorithm treating all coefficients as subject to change, i.e. treat the model as one of pure structural change. To be precise, write this pure structural change model as

  yt = xt′δ1j + zt′δ2j + ut,  t = Tj−1 + 1, …, Tj

for j = 1, …, m + 1. The application of the dynamic programming algorithm to this model yields estimates (δ̂1ja, δ̂2ja; j = 1, …, m + 1) and (T̂1a, …, T̂ma). To obtain an initial value of the vector β, we need only run the OLS regression Y − Z̄aδ̂2a = Xβ + U, where Z̄a is the diagonal partition of Z at the m-partition (T̂1a, …, T̂ma) and δ̂2a is the estimate of δ2. The estimate so obtained, say βa, is used to initialize the iterative procedure.

Using this method to choose the initial value of β is justified on the grounds that the estimates (λ̂1a, …, λ̂ma) of the break fractions (λ10, …, λm0) are consistent at rate T even when some coefficients do not change across regimes. All that is needed is that at least one coefficient changes at every break date. Hence, the estimate βa is asymptotically equivalent to the estimate β̂ associated with the global minimum. This permits reaching the global minimum in very few iterations and greatly reduces the risk of converging to a local minimum. Indeed, this latter problem did not occur in any of the experiments that we tried.

It may be that this method of initializing the vector β is difficult to implement in practice, for example when the dimension p of the vector β and/or the number m of changes is large. In such cases, one can always use some fixed initial values. Here, however, the problem of convergence towards a local minimum becomes more important, and care should be exercised by applying some sensitivity analysis.

3.6 Extension to Threshold Models

The algorithm can be adapted to estimate threshold models of the form:

  yt = xt′β + zt′δj + ut,  if τj−1 < vt ≤ τj   (5)

for j = 1, …, m + 1, with the convention that τ0 = −∞ and τm+1 = +∞. Again, yt is the observed dependent variable at time t; xt (p × 1) and zt (q × 1) are vectors of covariates, β and δj (j = 1, …, m + 1) are the corresponding vectors of coefficients, and ut is the disturbance at time t. Here, the functional form of the regression depends on the value of some observable variable vt. This variable can be an element of the vectors xt or zt but need not be, and it should be predetermined relative to ut (e.g. a lagged value of the dependent variable is permitted). There are m threshold points (τ1, …, τm), which are unknown, and hence m + 1 regimes. The purpose is to estimate the unknown regression coefficients together with the threshold points when T observations on (yt, xt, zt) are available. This is a partial threshold model in the sense that β is not subject to shifts and is effectively estimated using the entire sample. When p = 0, we obtain a pure threshold model with all coefficients subject to change.

To describe the estimation method, let v = (v1, …, vT)′ and let its sorted version be (v_{t_1}, …, v_{t_T})′, such that v_{t_1} ≤ v_{t_2} ≤ … ≤ v_{t_T}. The indices (t1, …, tT) are a permutation of the time indices (1, …, T). Now, for i = 1, …, m, let Ti be the index such that v_{t_j} ≤ τi for all j ≤ Ti and v_{t_j} > τi for all j > Ti. The m-partition (T1, …, Tm) is the partition that corresponds to the positions in the sorted vector at which the variables v_{t_j} reach each of the m thresholds. We can write model (5) using all variables sorted according to the partition (T1, …, Tm). Then we have, for j = 1, …, T and i = 1, …, m + 1:

  y_{t_j} = x_{t_j}′β + z_{t_j}′δi + u_{t_j},  if Ti−1 < j ≤ Ti   (6)

(using T0 = 0 and Tm+1 = T). This model is in the form of the partial structural change model considered above. Note that this change in the time scale maintains the structure of the model even with lagged dependent variables (see Tsay, 1998). One can obtain consistent estimates of the parameters (T1, …, Tm) using the dynamic programming algorithm. Let the estimate of the partition be denoted by (T̂1, …, T̂m); the estimates of the thresholds are then recovered as τ̂j = v_{t_r} with r = T̂j, for j = 1, …, m. One can then recover the estimates of β and δi from (5) by OLS, conditioning on the threshold values.
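
In code, the change of time scale is a sort followed by the partial structural change estimation above (a sketch under the same assumptions; estimate_partial is the routine from Section 3.4):

    import numpy as np

    def threshold_model(y, X, Z, v, m, h, beta0):
        """Partial threshold model estimated by sorting on v (a sketch)."""
        order = np.argsort(v, kind="stable")       # permutation (t_1, ..., t_T)
        beta, delta, breaks = estimate_partial(
            y[order], X[order], Z[order], m, h, beta0)
        taus = [v[order][b] for b in breaks]       # tau-hat_j = v_{t_r}, r = T-hat_j
        return beta, delta, taus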

4 CONSTRUCTING CONFIDENCE INTERVALS

A central result derived in BP concerns the convergence of the estimated break fractions λ̂i = T̂i/T and the rate of this convergence. The results show not only that λ̂i converges to its true value λi0, but that it does so at the fast rate T, i.e. T(λ̂i − λi0) = Op(1) for all i. It is important, however, to note that this rate-T convergence pertains to the estimated break fractions λ̂i and not to the break dates T̂i themselves. For the latter, the result shows that, with probability arbitrarily close to 1, the distance between T̂i and Ti0 is, in large samples, bounded by a constant independent of the sample size.

This convergence result is obtained under a very general set of assumptions allowing a wide variety of models. It, however, precludes integrated variables (with an autoregressive unit root) but permits trending regressors. The assumptions concerning the nature of the errors in relation to the regressors {xt, zt} are of two kinds. First, when no lagged dependent variable is allowed in {xt, zt}, the conditions on the residuals are quite general and allow substantial correlation and heteroscedasticity. The second case allows lagged dependent variables as regressors but then, of course, no serial correlation is permitted in the errors {ut}. In both cases, the assumptions are general enough to allow different distributions for both the regressors and the errors in each segment.

The possibility of the two cases described above is potentially quite useful in dynamic models when the parameters associated with the lagged dependent variables are not subject to structural change. In this case, the investigator can take the dynamic effects into account either in a direct parametric fashion (e.g. introducing lagged dependent variables so as to have uncorrelated residuals) or using an indirect non-parametric approach (e.g. leaving the dynamics in the disturbances and applying a non-parametric correction for proper asymptotic inference). This trade-off can be useful to distinguish gradual from sudden changes, in the same way a distinction is made between innovational and additive outliers. Consider, for example, the case of a change in mean for a correlated series. When specifying zt = {1} and xt = {∅}, all the dynamics are contained in the error term and do not affect the impact of the change in mean on the level of the series; the change is, hence, abrupt. However, when specifying zt = {1} and xt = {lags of yt}, a change in the coefficient associated with the constant zt translates into a change in the level of yt that varies over the periods following the break. This change depends on the autoregressive dynamics and takes effect gradually.

4.1 Confidence Intervals for the Parameters β and δ

The fact that the quantities λ̂i converge at the fast rate T is enough to guarantee that the estimation of the break dates has no effect on the limiting distribution of the other parameters of the model. This permits us to recover, for these estimates, the standard √T asymptotic normality. More precisely, let θ̂ = (β̂′, δ̂′)′ and θ0 = (β0′, δ0′)′; then √T(θ̂ − θ0) →d N(0, V−1ΦV−1), with V = plim T−1W̄0′W̄0, Φ = plim T−1W̄0′ΩW̄0, Ω = E(UU′), and where W̄0 is the diagonal partition, at (T10, …, Tm0), of W = (w1, …, wT)′ with wt = (xt′, zt′)′. Note that when the errors are serially uncorrelated and homoscedastic we have Φ = σ²V, and the asymptotic covariance matrix reduces to σ²V−1, which can be consistently estimated using an estimate of σ². When serial correlation and/or heteroscedasticity is present, a consistent estimate of Φ can be constructed along the lines of Andrews (1991). In all cases where covariance matrices robust to heteroscedasticity and serial correlation are needed, we use Andrews's (1991) data-dependent method with the quadratic spectral kernel and an AR(1) approximation to select the bandwidth (henceforth referred to as the HAC estimator). The program also allows pre-whitening, as suggested in Andrews and Monahan (1992). Note that the correction for possible serial correlation can be made allowing the distributions of the regressors and errors to differ across segments.
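
For concreteness, here is a minimal sketch of such a HAC estimator: the quadratic spectral kernel with Andrews's (1991) AR(1) automatic bandwidth, and optional VAR(1) pre-whitening in the spirit of Andrews and Monahan (1992). It is applied to a T × k matrix of series such as {wtût} or {ztût}; the exact GAUSS implementation accompanying the paper may differ in details.

    import numpy as np

    def qs_kernel(x):
        """Quadratic spectral kernel (Andrews, 1991)."""
        if x == 0.0:
            return 1.0
        a = 6.0 * np.pi * x / 5.0
        return 25.0 / (12.0 * np.pi ** 2 * x ** 2) * (np.sin(a) / a - np.cos(a))

    def hac(V, prewhiten=False):
        """Long-run covariance of the T x k series V (a sketch)."""
        V = np.asarray(V, dtype=float)
        A = None
        if prewhiten:                               # VAR(1) pre-whitening
            A = np.linalg.lstsq(V[:-1], V[1:], rcond=None)[0].T
            V = V[1:] - V[:-1] @ A.T
        T, k = V.shape
        num = den = 0.0                             # AR(1) plug-in for alpha(2)
        for a in range(k):
            x = V[:, a]
            rho = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])
            sig2 = np.mean((x[1:] - rho * x[:-1]) ** 2)
            num += 4.0 * rho ** 2 * sig2 ** 2 / (1.0 - rho) ** 8
            den += sig2 ** 2 / (1.0 - rho) ** 4
        ST = 1.3221 * (num / den * T) ** 0.2        # automatic QS bandwidth
        J = V.T @ V / T
        for j in range(1, T):
            G = V[j:].T @ V[:-j] / T
            J += qs_kernel(j / ST) * (G + G.T)
        if prewhiten:                               # re-colour the estimate
            D = np.linalg.inv(np.eye(k) - A)
            J = D @ J @ D.T
        return J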

4.2 Confidence Intervals for the Break Dates

To obtain an asymptotic distribution for the break dates, the strategy is to adopt an asymptotic framework in which the magnitudes of the shifts converge to zero as the sample size increases. The resulting limiting distribution is then independent of the specific distribution of the pair {zt, ut}. To describe the relevant distributional result, we need some notation. For i = 1, …, m, let ΔTi0 = Ti0 − Ti−10, Δi = δi+10 − δi0, Qi = lim (ΔTi0)−1 Σ_{t=Ti−10+1}^{Ti0} E(ztzt′), and Ωi = lim (ΔTi0)−1 Σ_{r=Ti−10+1}^{Ti0} Σ_{t=Ti−10+1}^{Ti0} E(zrzt′urut). In the case where the data are non-trending we have, under various assumptions stated in BP, the following limiting distribution for the break dates:

  (Δi′QiΔi)²/(Δi′ΩiΔi) (T̂i − Ti0) ⇒ arg max_s V(i)(s)   (7)

where V(i)(s) = W1(i)(−s) − |s|/2 if s ≤ 0, and V(i)(s) = √ξi (ϕi,2/ϕi,1) W2(i)(s) − ξi|s|/2 if s > 0, with ξi = Δi′Qi+1Δi/Δi′QiΔi, ϕi,1² = Δi′ΩiΔi/Δi′QiΔi and ϕi,2² = Δi′Ωi+1Δi/Δi′Qi+1Δi. Also, W1(i) and W2(i) are independent standard Wiener processes defined on [0, ∞), starting at the origin when s = 0; these processes are also independent across i. The cumulative distribution function of arg max_s V(i)(s) is derived in Bai (1997a), and all that is needed to compute the relevant critical values are estimates of Δi, Qi and Ωi. These are given by Δ̂i = δ̂i+1 − δ̂i, Q̂i = (T̂i − T̂i−1)−1 Σ_{t=T̂i−1+1}^{T̂i} ztzt′, and an estimate of Ωi constructed using a HAC estimator applied to the vector {ztût} with data over segment i only.

In practice, one may want to impose some constraints related to the distribution of the errors and regressors across segments. We then have the following cases:

  • The regressors zt are identically distributed across segments. Then Qi = Qi+1 = Q, which can be estimated by Q̂ = T−1 Σ_{t=1}^{T} ztzt′. In this case, the limiting result (7) holds with ξi = 1.

  • The errors are identically distributed across segments. Then Ωi = Ωi+1 = Ω, which can be estimated using a HAC estimator applied to {ztût} using data over the whole sample.

  • The errors and the data are identically distributed across segments. Here we have ξi = 1 and ϕi,1 = ϕi,2, and the limiting distribution reduces to arg max_s {W(i)(s) − |s|/2}, which has a density function symmetric about the origin. Here, W(i)(s) denotes a two-sided standard Wiener process defined on ℝ.

  • The errors are serially uncorrelated. In this case Ωi = σi²Qi and Ωi+1 = σi+1²Qi+1, which can be estimated using σ̂i² = (T̂i − T̂i−1)−1 Σ_{t=T̂i−1+1}^{T̂i} ût². The confidence intervals can then be constructed from the approximation

    (Δ̂i′Q̂iΔ̂i/σ̂i²)(T̂i − Ti0) ≈ arg max_s V(i)(s)   (8)
  • The errors are serially uncorrelated and the regressors are identically distributed across segments. Here Qi = Qi+1 = Q and ξi = 1. The confidence intervals can then be constructed from the approximation

    (Δ̂i′Q̂Δ̂i/σ̂i²)(T̂i − Ti0) ≈ arg max_s V(i)(s)   (9)
  • The errors are serially uncorrelated and identically distributed across segments. The approximation is the same as (8), with the full-sample estimate σ̂² in place of σ̂i².

  • The errors are serially uncorrelated and both the data and the errors are identically distributed across segments. The approximation is the same as (9), with σ̂² in place of σ̂i².

All the cases discussed above are available as options in the accompanying computer program. Since the break dates are integer valued, we report confidence intervals that are likewise integer valued, obtained by rounding the lower bound down and the upper bound up.
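
The program uses the closed-form distribution function from Bai (1997a); as an illustration of where the interval widths come from, the following sketch instead simulates the quantiles of |arg max_s {W(s) − |s|/2}| for the symmetric case of the third bullet above (all tuning constants here are our own choices):

    import numpy as np

    def argmax_quantile(coverage=0.95, n_sim=10000, grid=2000, span=100.0, seed=0):
        """Simulated quantile of |arg max_s {W(s) - |s|/2}|, with W a two-sided
        standard Wiener process, discretized on [-span, span]."""
        rng = np.random.default_rng(seed)
        dt = span / grid
        s = np.arange(1, grid + 1) * dt
        out = np.empty(n_sim)
        for i in range(n_sim):
            right = np.cumsum(rng.normal(0.0, np.sqrt(dt), grid)) - s / 2
            left = np.cumsum(rng.normal(0.0, np.sqrt(dt), grid)) - s / 2
            vals = np.concatenate(([0.0], right, left))
            locs = np.concatenate(([0.0], s, -s))
            out[i] = abs(locs[np.argmax(vals)])
        return np.quantile(out, coverage)

    # In the last case above, a 95% interval for T_i follows from the scaling
    # in (9): T_i_hat +/- c * sigma_hat^2 / (Delta_hat' Q_hat Delta_hat),
    # with c = argmax_quantile(0.95), rounded outward to integers.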

4.3 The Case with Trending Regressors

Simple modifications can be applied to deal with the case of trending regressors. Suppose that we have regressors zt of the form zt = (g1(t/T), …, gq(t/T))′, with the gi(·) having bounded derivatives on [0,1]; for example, in the case of a polynomial trend function, gi(t/T) = (t/T)^i. Then (see Bai, 1997a):

  (Δi′g(λi0)g(λi0)′Δi/ϕi²)(T̂i − Ti0) ⇒ arg max_s V(i)(s)

where g(λ) = (g1(λ), …, gq(λ))′ and ϕi² is the long-run variance of ut over segment i. If the errors have the same distribution across segments, we have ϕi² = fu(0), where fu(0) is (2π times) the spectral density function of ut at frequency zero, which can be consistently estimated using standard kernel methods. If ut is uncorrelated, fu(0) is replaced by σu², estimated by σ̂u² = T−1 Σ_{t=1}^{T} ût².

5 TEST STATISTICS FOR MULTIPLE BREAKS

5.1 A Test of No Break versus a Fixed Number of Breaks

We consider the supF-type test of no structural break (m = 0) versus the alternative of m = k breaks. Let (T1, …, Tk) be a partition such that Ti = [Tλi] (i = 1, …, k). Let R be the conventional matrix such that (Rδ)′ = (δ1′ − δ2′, …, δk′ − δk+1′). Define

  FT(λ1, …, λk; q) = (1/T)((T − (k + 1)q − p)/kq) δ̂′R′(RV̂(δ̂)R′)−1Rδ̂   (10)

where V̂(δ̂) is an estimate of the variance–covariance matrix of δ̂ that is robust to serial correlation and heteroscedasticity, i.e. a consistent estimate of

  V(δ̂) = plim T(Z̄′MXZ̄)−1Z̄′MXΩMXZ̄(Z̄′MXZ̄)−1,  MX = I − X(X′X)−1X′   (11)

Following Andrews (1993) and others, the test is sup FT(k; q) = FT(λ̂1, …, λ̂k; q), where (λ̂1, …, λ̂k) minimize the global sum of squared residuals, which is equivalent to maximizing the F-test when the errors are spherical. This is asymptotically equivalent to, and yet much simpler to construct than, maximizing the F-test (10) directly, since the estimated break dates are consistent even in the presence of serial correlation. The asymptotic distribution depends on a trimming parameter, via the imposition of the minimal length h of a segment, namely ϵ = h/T.
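
Under spherical errors, the statistic is a monotone transformation of the SSR reduction, so the test can be computed directly from the Section 3 routines. A sketch for the pure structural change case (our simplification; the general version uses the robust V̂(δ̂) of (11)):

    def sup_f(y, Z, k, h):
        """supF_T(k): 0 vs k breaks, pure model with spherical errors (sketch)."""
        T, q = Z.shape
        tri = ssr_triangular(y, Z, h)              # Section 3 sketch
        ssr0 = tri[(0, T - 1)]                     # SSR under no break
        ssrk, _ = optimal_partition(tri, T, k, h)  # global minimum with k breaks
        return ((T - (k + 1) * q) / (k * q)) * (ssr0 - ssrk) / ssrk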

Various versions of the tests can be obtained depending on the assumptions made about the distribution of the data and the errors across segments. These correspond to different specifications in the construction of the estimate of V(δ̂) given by (11). In the case of a partial structural change model (p > 0), we consider three specifications.

  • Allowing for serial correlation, different distributions for the data across segments and the same distribution for the errors across segments. The estimate is then V̂(δ̂) = T(Z̄′MXZ̄)−1K̂T(Z̄′MXZ̄)−1, where K̂T is a HAC estimator of T−1Z̄′MXΩMXZ̄ constructed from the (m + 1)q vector {z̃tût}, with z̃t the elements of the matrix MXZ̄.

  • Serially uncorrelated errors, different variances of the errors and different distributions of the data across segments. The estimate has the same sandwich form, with the middle term estimated segment by segment using the residual variance estimates σ̂i² over each segment i and Z̄* = MXZ̄.

  • Serially uncorrelated errors, different distributions for the data across segments and the same distribution for the errors across segments. In this case, V(δ̂) = σ² plim T(Z̄′MXZ̄)−1, which can be estimated by σ̂²T(Z̄′MXZ̄)−1.

In the case of a pure structural change model, we consider a larger set of specifications for estimating the relevant asymptotic covariance matrix. They are the following:

  • No serial correlation, different distributions for the data and identical distribution for the errors across segments. In this base case, the estimate is V̂(δ̂) = σ̂²T(Z̄′Z̄)−1.

  • No serial correlation in the errors, different variances of the errors and different distributions of the data across segments. In this case, V̂(δ̂) = T diag(σ̂1²(Z1′Z1)−1, …, σ̂m+1²(Zm+1′Zm+1)−1), where σ̂i² uses only data from segment i, i.e. σ̂i² = (Ti − Ti−1)−1 Σ_{t=Ti−1+1}^{Ti} (yt − zt′δ̂i)², with δ̂i = (Zi′Zi)−1Zi′Yi. These are simply the OLS estimates obtained using data from each segment separately.

  • Serial correlation in the errors, different distributions for the data and the errors across segments. Here we make use of the fact that the errors in different segments are asymptotically independent, so the limiting variance is block diagonal with ith block plim T(Zi′Zi)−1Zi′ΩiZi(Zi′Zi)−1, where Ωi = E(UiUi′). This can be consistently estimated, segment by segment, with a HAC estimator of T−1Zi′ΩiZi based on {ztût} using only data from segment i.

  • Serial correlation in the errors, same distribution for the errors across segments. In this case, the limiting covariance matrix involves Λ = diag(λ1 − λ0, …, λm+1 − λm) (using the convention that λ0 = 0 and λm+1 = 1) together with the full-sample long-run covariance matrix of {ztut}. It can be consistently estimated using Λ̂ and a HAC estimator of T−1Z′ΩZ based on {ztût} constructed using the full sample.

In the construction of the tests, we do not consider imposing the restriction that the distribution of the regressors zt be the same across segments, even when it is (except as it enters the construction of a HAC estimate involving the pair {ztût}). This might seem surprising, since imposing a valid restriction should lead to more precise estimates. This is, however, not the case here. Consider the case with no serial correlation in the errors and the same distribution for the errors across segments. Imposing the restriction that the distribution of the regressors zt be the same across segments leads to the asymptotic covariance matrix σ²(Λ ⊗ Q)−1, where Q = plim T−1 Σ_{t=1}^{T} E(ztzt′). A consistent estimate can be obtained using σ̂², Q̂ and Λ̂ constructed from the estimated break fractions λ̂i (i = 1, …, m). Suppose that the z's are exogenous and the errors have the same variance across segments. Then, for a given partition (T1, …, Tm), the exact variance of δ̂ is σ² diag((Z1′Z1)−1, …, (Zm+1′Zm+1)−1). Using the asymptotic version may imply an inaccurate approximation, especially if small segments are allowed, in which case the exact moment matrix of the regressors may deviate substantially from its full-sample analogue.

The same problem occurs in the case with no serial correlation in the errors and different variances for the residuals across segments. Imposing the restriction that the distribution of the regressors zt be the same across segments gives the limiting variance D ⊗ Q−1, where D = diag(σ1²/(λ1 − λ0), …, σm+1²/(λm+1 − λm)), which can be consistently estimated using Q̂, Λ̂ and the σ̂i². Again, imposing the constraint that Zi′Zi/(Ti − Ti−1) be approximated by Q̂ over all segments may imply a poor approximation in finite samples. We have found, in these two cases, that imposing a common distribution for the regressors across segments leads to tests with worse properties, even when the data indeed have an invariant distribution. These distortions become less important, however, when the sample size is large and/or the trimming ϵ is large.

The relevant asymptotic distribution was derived in BP, and critical values were provided for a trimming ϵ = 0.05, values of k from 1 to 9, and values of q from 1 to 10. As discussed in Bai and Perron (2000), a trimming as small as 5% of the total sample can lead to tests with substantial size distortions when different variances of the errors across segments or serial correlation are allowed. This is because one is then trying to estimate various quantities using very few observations; for example, if T = 100 and ϵ = 0.05, one ends up estimating, for some segments, quantities like the variance of the residuals using only 5 observations. Similarly, with serial correlation, a HAC estimator would need to be applied to very short samples. The estimates are then highly imprecise and the tests accordingly show size distortions. When allowing different variances across segments or serial correlation, a higher value of ϵ should be used. Hence, the case with no serial correlation and homogeneous errors should be considered the base case, in which the tests can be constructed using an arbitrarily small trimming ϵ. For all other cases, care should be exercised in the choice of ϵ, and larger values should be considered. For that purpose, we supplemented the critical values tabulated in BP with similar ones for ϵ = 0.10, 0.15, 0.20 and 0.25. The results are presented in Bai and Perron ('Additional critical values for multiple structural changes tests', unpublished manuscript, 2001). Note that when ϵ = 0.10 the maximum number of breaks considered is 8, since allowing 9 breaks forces the estimates to be exactly λ̂1 = 0.1, λ̂2 = 0.2, up to λ̂9 = 0.9. For similar reasons, the maximum number of breaks allowed is 5 when ϵ = 0.15, 3 when ϵ = 0.20 and 2 when ϵ = 0.25.

Note that the asymptotic theory for these tests in BP is valid only for non-trending data. The case with trending data, discussed in Bai (1999), yields different asymptotic distributions. However, the distributions in the two cases are fairly similar, especially in the tail where critical values are obtained, so one can safely use the same critical values. Using simulations, we found the size distortions to be minor.

5.2 Double Maximum Tests

Often, an investigator does not wish to pre-specify a particular number of breaks before making inference. To allow this, BP introduced two tests of the null hypothesis of no structural break against an unknown number of breaks given some upper bound M, called the double maximum tests. The first is an equal-weighted version, UD max FT(M, q) = max_{1≤m≤M} FT(λ̂1, …, λ̂m; q), where λ̂j = T̂j/T (j = 1, …, m) are the estimates of the break fractions obtained from the global minimization of the sum of squared residuals. The second test applies weights to the individual tests such that the marginal p-values are equal across values of m, and is denoted WD max FT(M, q); see BP for details. Critical values were provided in BP for M = 5 and ϵ = 0.05. A value M = 5 should be sufficient for most empirical applications; in any event, the critical values vary little for choices of the upper bound M larger than 5. Bai and Perron (2001) provide additional critical values for ϵ = 0.10 (M = 5), 0.15 (M = 5), 0.20 (M = 3) and 0.25 (M = 2).

5.3 A Test of ℓ versus ℓ + 1 Breaks

BP proposed a test of ℓ versus ℓ + 1 breaks, labelled sup FT(ℓ + 1|ℓ). The method amounts to the application of (ℓ + 1) tests of the null hypothesis of no structural change versus the alternative hypothesis of a single change. The test is applied to each segment containing the observations T̂i−1 + 1 to T̂i (i = 1, …, ℓ + 1). We conclude in favour of a model with (ℓ + 1) breaks if the overall minimal value of the sum of squared residuals (over all segments where an additional break is included) is sufficiently smaller than the sum of squared residuals from the ℓ-break model. The break date thus selected is the one associated with this overall minimum. The estimates T̂i need not be the global minimizers of the sum of squared residuals; one can also use sequential one-at-a-time estimates, which allows the construction of a sequential procedure to select the number of breaks (see Bai, 1997b).
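
The mechanics can be sketched as follows (our code, ignoring the scaling that turns the SSR improvement into the supF statistic; see BP for the exact normalization and critical values): for each of the ℓ + 1 segments, search for the single added break yielding the largest reduction in the sum of squared residuals.

    import numpy as np

    def best_added_break(y, Z, breaks, h):
        """Largest SSR reduction from one extra break, given an l-break model."""
        T = len(y)
        bounds = [0] + [b + 1 for b in breaks] + [T]
        best_gain, best_date = -np.inf, None
        for i in range(len(bounds) - 1):
            lo, hi = bounds[i], bounds[i + 1] - 1
            n = hi - lo + 1
            if n < 2 * h:
                continue                           # no room for an extra break
            tri = ssr_triangular(y[lo:hi + 1], Z[lo:hi + 1], h)
            base = tri[(0, n - 1)]
            for j in range(h - 1, n - h):          # candidate break inside segment
                gain = base - (tri[(0, j)] + tri[(j + 1, n - 1)])
                if gain > best_gain:
                    best_gain, best_date = gain, lo + j
        return best_gain, best_date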

Asymptotic critical values were provided by BP for a trimming ϵ = 0.05 and q ranging from 1 to 10, and Bai and Perron (2001) present additional critical values for ϵ = 0.10, 0.15, 0.20 and 0.25. Note that, unlike for the sup FT(k; q) test, we need not impose similar restrictions on the number of breaks for different values of the trimming ϵ. Of course, the same options are available as for the previous tests concerning the potential specifications of the distributions of the errors and the data across segments.

5.4 Estimating the Number of Breaks

A common procedure to select the dimension of a model is to consider an information criterion. Yao (1988) suggests the Bayesian information criterion (BIC), while Liu et al. (1997) propose a modified Schwarz criterion (LWZ). Perron (1997) presented a simulation study of the behaviour of these two information criteria, and of the AIC, in the context of estimating the number of changes in the trend function of a series in the presence of serial correlation. The results showed, first, that the AIC performs very badly. The BIC and LWZ perform reasonably well in the absence of serial correlation in the errors, but choose a number of breaks much higher than the true one when serial correlation is present. When no serial correlation is present in the errors but a lagged dependent variable is included, the BIC performs badly if the coefficient on the lagged dependent variable is large; in such cases, the LWZ performs better under the null of no break but underestimates the number of breaks when some are present. The method suggested by BP is instead based on the sequential application of the sup FT(ℓ + 1|ℓ) test using the sequential estimates of the breaks.
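
For reference, a sketch of the two criteria as used in this context (with p* = (m + 1)q + p + m counting regime coefficients, fixed coefficients and break dates; the LWZ constants c0 = 0.299 and δ0 = 0.1 are the values suggested by Liu et al., 1997):

    import numpy as np

    def bic_lwz(ssr_by_m, T, q, p=0, c0=0.299, d0=0.1):
        """BIC and LWZ for m = 0, 1, ..., given global minimal SSRs (sketch)."""
        crit = {}
        for m, ssr in enumerate(ssr_by_m):
            k = (m + 1) * q + p + m                # number of estimated parameters
            bic = np.log(ssr / T) + k * np.log(T) / T
            lwz = np.log(ssr / (T - k)) + k * c0 * np.log(T) ** (2 + d0) / T
            crit[m] = (bic, lwz)
        return crit                                # pick m minimizing each criterion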

5.5 Summary and Practical Recommendations

Bai and Perron (2000) present an extensive simulation analysis pertaining to the size and power of the tests, the accuracy of the asymptotic approximations for the confidence intervals and the relative merits of different methods to estimate the number of breaks. The methods are shown to be adequate, in general, but care must be taken when using particular specifications. The following recommendations are made:

  • First, ensure that the specification is such that the size of the test is adequate under the hypothesis of no break. If serial correlation and/or heterogeneity in the data or errors across segments are not allowed in the estimated regression model (and not present in the DGP), any value of the trimming ϵ will lead to tests with adequate size. However, if such features are allowed, a higher trimming is needed. With a sample of T = 120, ϵ = 0.15 should be enough for heterogeneity in the errors or the data; if serial correlation is allowed, ϵ = 0.20 may be needed. These values could be reduced if larger samples are available.

  • Overall, selecting the number of breaks using the BIC works well when breaks are present, but less so under the null hypothesis of no break, especially if serial correlation is present. The method based on the LWZ criterion works better under the null hypothesis (even with serial correlation) by imposing a higher penalty; however, this higher penalty translates into a very bad performance when breaks are present. Also, model selection procedures based on information criteria cannot take into account potential heterogeneity across segments, unlike the sequential method. Overall, the sequential procedure works best.

  • There are instances where the sequential procedure can be improved. The problem is that, in the presence of multiple breaks, certain configurations of changes are such that it is difficult to reject the null hypothesis of 0 versus 1 break, but not difficult to reject the null hypothesis of 0 versus a higher number of breaks (this occurs, for example, when two changes are present and the value of the coefficient returns to its original value after the second break). In such cases, the sequential procedure breaks down. A useful strategy is to first look at the UD max or WD max tests to see if at least one break is present. If these indicate the presence of at least one break, then the number of breaks can be decided upon a sequential examination of the sup FT(ℓ + 1|ℓ) statistics constructed using global minimizers for the break dates (i.e. ignore the test FT(1|0) and select m such that the tests sup FT(ℓ + 1|ℓ) are insignificant for all ℓ ≥ m). This method leads to the best results and is recommended for empirical applications; a sketch of the decision rule is given after this list. Its usefulness is illustrated in Section 6.2.

  • In general, non-symmetric confidence intervals for the break dates provide better coverage rates than symmetric ones when the data are non-stationary (i.e. the second moments of the regressors are non-constant). This case includes structural changes in autoregressive models, because a change in the intercept or in the AR coefficient implies non-constant second moments of the observable variables. Indeed, Monte Carlo simulations show asymmetric distributions for the estimated break points in AR models; these asymmetric distributions are well approximated by the asymptotic distributions discussed in Section 4.2.

  • The coverage rates for the break dates are adequate unless the break is either too small (so small as not to be detected by the tests) or too large. This is, from a practical point of view, an encouraging result: the confidence intervals are inadequate (in that they miss the true break date too often) exactly in those cases where it would be difficult to conclude that a break is present (in which case they would not be used anyway). When the breaks are very large, the confidence intervals do contain the true values, but lead to a conservative assessment of the accuracy of the estimates. It was found that correcting for heterogeneity in the data and/or errors across segments yields improvements over a more straightforward uncorrected interval. Correcting for serial correlation also leads to substantial improvements.
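
The recommended strategy of the third bullet above can be summarized in a few lines (a sketch; supf[l] is understood to hold sup FT(l + 1|l) computed from global minimizers, and cv[l] the matching critical value):

    def recommended_num_breaks(udmax_significant, supf, cv, M):
        """Decision rule: UDmax/WDmax first, then scan supF(l+1 | l), l >= 1."""
        if not udmax_significant:
            return 0
        m = M                                      # default to the upper bound
        for l in range(M - 1, 0, -1):              # ignore F(1|0)
            if supf[l] <= cv[l]:                   # insignificant at l
                m = l
            else:
                break
        return m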

6 EMPIRICAL APPLICATIONS

We discuss two applications of the procedures presented. The first analyses the US ex-post real interest rate series considered by Garcia and Perron (1996). The second re-evaluates some findings of Alogoskoufis and Smith (1991) who analysed changes in the persistence of inflation and shifts in an expectations-augmented Phillips curve resulting from such changes.

6.1 The US Ex-post Real Interest Rate

Garcia and Perron (1996) considered the US ex-post real interest rate (the three-month treasury bill rate deflated by the CPI inflation rate, taken from the Citibase data bank). The data are quarterly and the sample is 1961:1–1986:3. Figure 2 presents a graph of the series. Of interest is the presence of abrupt structural changes in the mean of the series. To that effect, we apply our procedures with only a constant as regressor (i.e. zt = {1}) and account for potential serial correlation via non-parametric adjustments (see the discussion in Section 4). We allowed up to 5 breaks and used a trimming ϵ = 0.15, so that each segment has at least 15 observations. We also allowed serial correlation in the errors and different variances of the residuals across segments. The results are presented in Table I.

[Figure 2. US ex-post real interest rate, 1961:1–1986:3]

Table I. Empirical results: US ex-post real interest rate (1961:1–1986:3)

Specifications: zt = {1}, q = 1, p = 0, h = 15, M = 5

Tests:¹
  supFT(1)    supFT(2)    supFT(3)    supFT(4)    supFT(5)    UD max    WD max
  57.91*      43.01*      33.22*      24.77*      18.33*      57.91*    57.91*
  supFT(2|1)  supFT(3|2)  supFT(4|3)
  33.93*      14.72*      0.03

Number of breaks selected:²
  Sequential: 3    LWZ: 2    BIC: 2

Estimates with three breaks:³
  δ̂1 = 1.82 (0.19)    δ̂2 = 0.87 (0.16)    δ̂3 = −1.80 (0.51)    δ̂4 = 5.64 (0.60)
  T̂1 = 66:4 (64:1–69:2)    T̂2 = 72:3 (70:3–72:4)    T̂3 = 80:3 (79:4–81:1)

Notes:
  1. The supFT(k) tests and the reported standard errors and confidence intervals allow for the possibility of serial correlation in the disturbances. The heteroscedasticity and autocorrelation consistent covariance matrix is constructed following Andrews (1991) and Andrews and Monahan (1992), using a quadratic kernel with automatic bandwidth selection based on an AR(1) approximation. The residuals are pre-whitened using a VAR(1).
  2. We use a 5% size for the sequential test supFT(ℓ + 1|ℓ).
  3. In parentheses are the standard errors (robust to serial correlation) for δ̂i (i = 1, …, 4) and the 95% confidence intervals for T̂i (i = 1, 2, 3).
  * Significant at the 5% level.

The first issue is the determination of the number of breaks. Here the supFT(k) tests are all significant for k between 1 and 5, so at least one break is present. The supFT(2|1) test takes the value 33.93 and is therefore highly significant. The supFT(3|2) test takes the value 14.72, which is also significant at the 5% level. The sequential procedure (using a 5% significance level) selects 3 breaks, while the BIC and the modified Schwarz criterion of Liu et al. (1997) select two breaks. Given the documented facts that the information criteria are biased downward and that the sequential procedure and the supFT(ℓ + 1|ℓ) tests perform better in this case, we conclude in favour of the presence of three breaks.

Of direct interest are the estimates obtained under global minimization. The break dates are estimated at 1966:4, 1972:3 and 1980:3. The first date has a rather large 95% confidence interval (between 1964:1 and 1969:2). The other break dates are, however, precisely estimated, since the 95% confidence intervals cover only a few quarters before and after. The differences in the estimated means over the segments are significant and point to a decrease of 0.95% in 1966:4, another decrease of 2.67% in late 1972, and a large increase of 7.44% in late 1980. These results contrast with those of Garcia and Perron (1996), who found only two breaks. This suggests that our procedure may be more powerful than the regime-switching method they used to detect abrupt changes in level. In particular, the difference in results is largely due to the fact that allowance is made for different error structures across segments.

6.2 Changes in the Persistence of Inflation and the Phillips Curve

Alogoskoufis and Smith (1991) considered the expectations-augmented Phillips curve:

  Δwt = α1 + α2E(Δpt|It−1) + α3Δut + α4ut−1 + ηt

where wt is the log of nominal wages, pt is the log of the Consumer Price Index, and ut is the unemployment rate. They posit that inflation is an AR(1) process, so that

  Δpt = δ1 + δ2Δpt−1 + εt   (12)

Hence, upon substitution, the Phillips curve is:

  Δwt = γ1 + γ2Δpt−1 + γ3Δut + γ4ut−1 + ηt   (13)

where γ1 = α1 + α2δ1 and γ2 = α2δ2. Here, a parameter of importance is δ2, which is interpreted as measuring the persistence of inflation. Using post-war annual data from the United Kingdom and the United States, Alogoskoufis and Smith (1991) argue that the process describing inflation exhibits a one-time structural change from 1967 to 1968, whereby the autoregressive parameter δ2 is significantly higher in the second period. This is interpreted as evidence that the abandonment of the Bretton Woods system relaxed the discipline imposed by the gold standard and created higher persistence in inflation. They also argue that the parameter γ2 in the Phillips curve equation (13) exhibits a similar increase at the same time, thereby lending support to the empirical significance of the Lucas critique.

Using the methods presented in this paper, we re-evaluate Alogoskoufis and Smith's (1991) claim using post-war annual data for the United Kingdom. Consider first the structural stability of the AR(1) representation of the inflation series, whose graph is depicted in Figure 3. When applying a one-break model (not reported), we indeed find the same results, namely a structural change in 1967 with δ2 increasing from 0.274 to 0.739 while δ1 remains constant. The break date is, however, imprecisely estimated, with a 95% confidence interval covering the period 1961–1973. More importantly, the supFT(1) test is not significant at any conventional level, indicating that the data do not support a one-break model. A feature of substantial importance is that a look at the graph of the inflation series suggests different variability in different periods. To that effect, we investigated the stability of the inflation process allowing different variances for the residuals across segments. Details of the estimation results are contained in Table II. Again, the supFT(1) test is not significant at any conventional level; the supFT(2) test is, however, significant at the 5% level, and the supFT(2|1) test is significant at the 10% level. The supFT(ℓ + 1|ℓ) test is not significant for any ℓ ≥ 2. Since the supFT(1) test is not significant, it is not surprising that the sequential procedure selects zero breaks; the BIC and LWZ also select zero breaks. However, with the supFT(2), UD max, WD max and supFT(2|1) tests all significant, the results, overall, suggest a model with two breaks.

Figure 3. Post-war UK inflation rate, 1947–1987

Table II. Empirical results: UK CPI inflation rate, 1947–1987

Specifications: zt = {1, yt−1}; het_u = 1; q = 2; p = 0; h = 8; M = 3; ε = 0.20

Tests:
  SupFT(1)   SupFT(2)   SupFT(3)   SupFT(2|1)   SupFT(3|2)
  8.50       9.88a      6.74b      10.22b       1.25

  UD max     WD max (10%)   WD max (5%)
  9.88b      11.71b         12.08

Number of breaks selected: sequential procedure, 0; LWZ, 0; BIC, 0.

Parameter estimates with two breaks (standard errors in parentheses):
         Regime 1   Regime 2   Regime 3
  δ̂1     0.024      0.00       0.018
         (0.008)    (0.020)    (0.016)
  δ̂2     0.274      1.34       0.684
         (0.200)    (0.250)    (0.136)
  Estimated break dates: T̂1 = 1967 (1964–1968), T̂2 = 1975 (1969–1981)

Notes: a and b denote a statistic significant at the 5% and 10% levels, respectively. The intervals next to the break dates are 95% confidence intervals.

Nevertheless, the estimates of the two-break model reveal a picture similar to that suggested by Alogoskoufis and Smith (1991). The first break date is the same as in the one-break model, namely 1967, which is linked to the end of the Bretton Woods system. The second break is located in 1975. The coefficient estimates point to the importance of shifts in the persistence of inflation. Indeed, the coefficient δ2 rises from 0.274 to 1.34 in 1967 but falls back to 0.684 after 1975, suggesting that the effect of the abandonment of the Bretton Woods system was short-lived.

Since there are structural changes in the inflation process, it is of interest to see whether the Phillips curve equation underwent similar changes, in accordance with the Lucas critique. Here, the setup involves a partial structural change model, since changes in the inflation process should affect only the coefficients γ1 and γ2, with no effect on the coefficients γ3 and γ4. The results are presented in Table III, followed by a sketch of how such a regression can be set up. The evidence points strongly to a two-break model with exactly the same break dates as for the inflation process (1967 and 1975). The supFT(2|1) test is significant, as are the supFT(k) tests for all k. The sequential method, the BIC and the LWZ all select two breaks. Finally, the UD max and WD max tests are also highly significant. Furthermore, the coefficient γ2 (associated with lagged inflation) moves in the same direction as the persistence of inflation; in particular, there is a substantial increase in this coefficient in 1967, from 0.094 to 1.23 (following a change in persistence from 0.274 to 1.34). In 1975, γ2 shows a substantial decrease, in agreement with the decrease in the persistence of inflation. Overall, the results confirm the conclusions of Alogoskoufis and Smith (1991) and provide support for the Lucas critique.

Table III. Empirical results: Phillips curve equation

Specifications: yt = {Δwt}; zt = {1, Δpt−1}; xt = {Δut, ut−1}; q = 2; p = 2; het_u = 0; h = 4; M = 5; ε = 0.10

Tests:
  SupFT(1)   SupFT(2)   SupFT(3)   SupFT(4)
  22.84a     25.77a     20.76a     17.19a

  SupFT(2|1)   SupFT(3|2)   SupFT(4|3)   UD max   WD max (1%)
  24.39a       4.98         4.98         25.77a   32.34a

Number of breaks selected: sequential procedure, 2; LWZ, 2; BIC, 2.

Parameter estimates with two breaks (standard errors in parentheses):
         Regime 1   Regime 2   Regime 3
  γ̂1     0.066      0.062      0.181
         (0.012)    (0.019)    (0.054)
  γ̂2     0.094      1.23       0.015
         (0.240)    (0.205)    (0.257)
  γ̂3 = −0.141 (0.581) and γ̂4 = −0.877 (0.373), common to all regimes
  Estimated break dates: T̂1 = 1967 (1965–1968), T̂2 = 1975 (1973–1976)

Note: a denotes a statistic significant at the 1% level. The intervals next to the break dates are 95% confidence intervals.
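To illustrate how the partial structural change regression in Table III can be set up, the following sketch (our own Python illustration, not the published GAUSS code) interacts the changing regressors zt with regime indicators while keeping the coefficients on xt common across regimes, and estimates everything in a single OLS pass:

    import numpy as np

    def partial_break_ols(y, z, x, break_dates):
        """OLS for a partial structural change model: coefficients on z shift
        at the given break dates (0-based index of the last observation of
        each regime but the final one); coefficients on x are common."""
        T, q = z.shape
        bounds = [0, *[b + 1 for b in break_dates], T]
        m = len(bounds) - 1                       # number of regimes
        Z = np.zeros((T, m * q))                  # block-diagonal expansion of z
        for j in range(m):
            lo, hi = bounds[j], bounds[j + 1]
            Z[lo:hi, j * q:(j + 1) * q] = z[lo:hi]
        X = Z if x is None else np.hstack([Z, x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        ssr = float(np.sum((y - X @ beta) ** 2))
        return beta, ssr

For the Phillips curve, zt = (1, Δpt−1) and xt = (Δut, ut−1): the first m·q elements of beta are the regime-specific estimates of γ1 and γ2, and the last two are the common γ3 and γ4.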

7 CONCLUSIONS


This paper has presented a comprehensive treatment of practical issues arising in the analysis of models with multiple structural changes. Of considerable interest is a dynamic programming algorithm that permits efficient computation of the break point estimates as global minimizers of the sum of squared residuals. We have also discussed methods to construct confidence intervals for the break dates, test statistics for multiple structural changes, and model selection procedures to determine the number of breaks. All procedures discussed are implemented as options in a GAUSS program available for non-profit academic purposes.
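To convey the idea behind the algorithm, the following is a minimal Python sketch of the dynamic programming recursion (an illustration of the technique, not the distributed GAUSS program), written for a pure mean-shift model with minimal segment length h; the general regression case replaces the segment means with recursive least-squares fits:

    import numpy as np

    def global_min_ssr(y, m, h):
        """Global minimization of the SSR over all partitions with m breaks,
        sketched for a mean-shift model. Returns (minimal SSR, break dates),
        break dates being the 0-based last index of each of the first m
        segments."""
        T = len(y)
        ssr = np.full((T, T), np.inf)       # ssr[i, j]: segment i..j inclusive
        for i in range(T):
            s = s2 = 0.0
            for j in range(i, T):
                s += y[j]
                s2 += y[j] ** 2
                n = j - i + 1
                if n >= h:
                    ssr[i, j] = s2 - s * s / n   # SSR around the segment mean
        cost = np.full((m + 1, T), np.inf)  # cost[k, j]: k breaks in y[0..j]
        arg = np.zeros((m + 1, T), dtype=int)
        cost[0] = ssr[0]
        for k in range(1, m + 1):
            for j in range(T):
                for b in range(j):          # b: date of the k-th break
                    c = cost[k - 1, b] + ssr[b + 1, j]
                    if c < cost[k, j]:
                        cost[k, j], arg[k, j] = c, b
        breaks, j = [], T - 1               # backtrack the optimal partition
        for k in range(m, 0, -1):
            j = arg[k, j]
            breaks.append(j)
        return cost[m, T - 1], sorted(breaks)

Filling the table of segment sums of squared residuals requires least-squares operations of order O(T2), after which the recursion itself delivers the optimal partition for any number of breaks.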

Acknowledgements


This paper was in part the basis of the 1998 Jacob Marschak Lecture of the Econometric Society presented by Perron at the Latin American Econometric Society Meeting in Lima, Peru on 14 August 1998. The code discussed in this paper is available for non-profit academic use at the Journal of Applied Econometrics' Data Archive and also at http://econ.bu.edu/perron. Perron acknowledges financial support from the Fonds pour la Formation de Chercheurs et l'Aide à la Recherche du Québec (FCAR). We acknowledge financial support from the National Science Foundation under Grants SBR9709508 (Bai) and SES-0078492 (Perron). We wish to thank Maxwell King for useful comments.

Footnotes
1. The existence of breaks in the variance could be exploited to increase the precision of the break date estimators. We do not, however, pursue this avenue; instead, we treat the variance as a nuisance parameter and focus on breaks in the conditional mean of yt. Hence, while permitting breaks in the variance, we use them only to estimate the variance segment by segment when such changes are permitted.

2. It is possible to relax the constraint that a segment be of length at least q by making use of generalized inverses. We have not, however, considered this extension in the algorithm presented in Section 3.

3. The convergence properties of this scheme are discussed in Sargan (1964). Of course, convergence to the global minimum is not guaranteed, and a proper choice of the initial value of β might be important to avoid a local minimum.

4. However, considering more than int[1/ϵ] − 2 breaks implies changing ϵ as one progresses through the sequential procedure. For example, one could use a trimming ϵ = 0.05 and find 6 breaks in the first half of the sample, then switch to a trimming of ϵ = 0.20 to test for a seventh break. The accompanying computer program does not incorporate the possibility of such a switch; hence, in this case the same constraints as for the supFT(k; q) test on the maximum number of breaks apply.

5. Garcia and Perron (1996) only allowed the mean of the series and the variance of the errors to be state dependent, the autoregressive parameters being constrained to be fixed across regimes. If no allowance is made for the possibility of changes in the structure of the correlation in the errors, applying our method leads to the same conclusion as in Garcia and Perron (1996), namely two breaks in 1972:3 and 1980:3. Allowing the structure of the correlation to differ across segments shows that the errors are negatively correlated in the 1960s and basically i.i.d. in the 1970s and 1980s. This negative correlation in the 1960s allows a more precise estimate of the break date by reducing its variance and, accordingly, the tests also have higher power.

6. The data are the same as in Alogoskoufis and Smith (1991) and were kindly provided by George Alogoskoufis. We refer the reader to their paper for details on the definition and source of each series.

Supporting Information


The JAE Data Archive directory is available at http://qed.econ.queensu.ca/jae/datasets/bai001/.
