INFERENCE ON STRUCTURAL BREAKS USING INFORMATION CRITERIA*

This paper investigates the usefulness of information criteria for inference on the number of structural breaks in a standard linear regression model. In particular, we propose a modified penalty function for such criteria, which implies each break is equivalent to estimation of three individual regression coefficients. A Monte Carlo analysis compares information criteria to sequential testing, with the modified Bayesian and Hannan–Quinn criteria performing well overall, for data-generating processes both without and with breaks. The methods are also used to examine changes in Euro area monetary policy between 1971 and 2007.


INTRODUCTION
Over recent years many papers have studied aspects of change in important macroeconomic relationships through the use of formal tests for structural breaks. In particular, the seminal studies of Andrews (1993) and Bai and Perron (1998) provide researchers with statistical testing procedures to investigate the presence and timing of change when one or more breaks may occur within the available sample period. One context where such tests have been widely applied relates to monetary policy, where models for either short-term interest rates or inflation have been examined to shed light on the nature and implications of changes in monetary policy since the 1970s; examples include Cecchetti and Debelle (2006), Duffy and Engle-Warnick (2006), O'Reilly and Whelan (2005), and Zhang et al. (2008).
Information criteria provide an alternative approach to inference on structural breaks for linear models, with Yao (1988), Liu et al. (1997) (LWZ) and Zhang and Siegmund (2007) proposing versions of the criterion of Schwarz (1978) (referred to as BIC) for this purpose, while Ninomiya (2005) considers a version of the Akaike (1973) criterion (AIC). Further, Bai (2000) establishes conditions under which an information criterion is consistent for estimation of the number of breaks in vector autoregressions with martingale difference sequence errors. Nevertheless, despite the widespread use of such criteria for model specification in econometrics, there appear to be very few applications to structural break inference. One reason may be that the Monte Carlo study of Bai and Perron (2006) finds that the criteria of Yao (1988) and LWZ do not perform well relative to testing-based procedures. In particular, they conclude that the former can be poor when there are no breaks (especially in the presence of serial correlation), whereas the latter often fails to detect breaks when these are present. Based on such results, Bai and Perron (2003) recommend the use of sequential testing for structural break detection.
However, an implication of recent theoretical analyses by Ninomiya (2005) and Hall et al. (2013) is that the penalty terms incorporated in the structural breaks information criteria of Yao (1988) and LWZ may not take full account of the estimation of break dates. More specifically, extending the analysis of Ninomiya (2005) to a regression model, we show in Hall et al. (2013) that estimating the dates of change in a process that experiences m true breaks has an asymptotic effect on the minimized residual sum of squares equivalent to the estimation of 3m coefficients, rather than m as embedded in the penalty function employed by Yao (1988). Based on this result, the present paper proposes a modified penalty term for information criteria in the context of structural break estimation.
Employing data-generating processes (DGPs) similar to those used in Bai and Perron (2006), the present paper undertakes a Monte Carlo study to examine the performance of a range of consistent information criteria for estimating the number of structural breaks, also comparing these with results obtained using the sequential testing procedure of Bai and Perron (1998). Implementation of information criteria approaches requires searching for the global minimum of the residual sum of squares, for which the efficient algorithm of Bai and Perron (2003) is employed. Our results indicate that the modified penalty term substantially improves the overall performance of both the BIC criterion and also that of Hannan and Quinn (1979) (HQIC). Indeed, these can provide reliable information for structural breaks inference even in the presence of serial correlation, when sequential testing does not perform well.
Using a range of techniques, a number of studies have drawn inferences about changes in US monetary policy by employing analyses that allow time variation in the coefficients of the policy rule; see, for example, Boivan (2006), Duffy and Engle-Warnick (2006) or Sims and Zha (2006). Surprisingly, however, few such studies focus on Euro area monetary policy changes in a historical context. Such changes are of particular interest because the Euro area came into existence only in 1999, but data have been constructed back to the 1970s by aggregating over the countries that later combined to form this monetary union. Although both Clausen and Hayo (2005) and Castelnuovo (2007) consider the possibility of a break in Euro area monetary policy, they assume the date is known to be 1999, whereas monetary policy of the constituent countries may have changed before that date as monetary integration progressed, or subsequently as monetary policy developed for the newly formed area. We employ a formal structural breaks analysis to shed light on this question.
The structure of the paper is as follows. Section 2 first sets out the regression model with structural breaks and then discusses the methodology of structural breaks inference, focusing particularly on information criteria methods. A simulation analysis is conducted in Section 3 to compare the performance of various information criteria with testing-based procedures for estimating the number of structural breaks. An analysis of Euro area monetary policy follows in Section 4, with concluding remarks in a final section.

The Model
The case of interest is a linear DGP that exhibits m ≥ 0 true breaks in coefficients, such that

$$y_t = x_t'\beta_i^0 + u_t, \qquad t = T_{i-1}^0 + 1, \dots, T_i^0, \quad i = 1, \dots, m + 1, \tag{1}$$

where $T_0^0 = 0$ and $T_{m+1}^0 = T$, for a total sample of size T. In (1), $y_t$ is a stationary dependent variable, while $x_t$ is a p × 1 vector of exogenous explanatory variables that includes the constant term and may also include autoregressive terms, $\beta_i^0$ is the corresponding vector of regime-dependent coefficients for i = 1, …, m + 1 and $u_t$ is a mean zero disturbance with variance $\sigma^2$. Although estimation of the parameters of (1) is straightforward given knowledge of the break dates $T_i^0$ (i = 1, …, m), in practice a researcher typically knows neither the true number of breaks m nor their temporal locations.
In order to derive analytical results, a number of formal assumptions are required on (1), such as those made by Bai and Perron (1998) or Hall et al. (2013). These always require that the breaks are distinct, so that $T_i^0 = [T\lambda_i^0]$, where $0 < \lambda_1^0 < \dots < \lambda_m^0 < 1$ and [•] is the integer part of the expression in brackets. Each of the regimes is assumed to satisfy $T_i - T_{i-1} \ge [\varepsilon T] \ge p$ (i = 1, …, m + 1), so that each contains a pre-specified minimum number of observations that must be sufficient to enable estimation of the regression coefficients $\beta_i^0$. Clearly, it is required that $\beta_i^0 \ne \beta_{i+1}^0$ for regimes i and i + 1 to be distinct for the coefficients of (1) and hence for distinct regimes to be defined in terms of these coefficients.
Assumptions are required on the behaviour of the regressors $x_t$ and the disturbances $u_t$, including the exogeneity restriction $E[h_t] = 0$, where $h_t = x_t u_t$. This exogeneity restriction generally rules out the inclusion of lagged dependent variables in $x_t$ in the presence of autocorrelated disturbances. Otherwise, autocorrelation of a stationary form is permitted. As noted by Bai and Perron (1998), the researcher may therefore have a choice between using a parametric dynamic model with uncorrelated disturbances or using a static regression model with autocorrelated disturbances. In the latter case, a testing approach requires making an appropriate non-parametric correction for autocorrelation. The regressors in (1) are assumed to be 'well-behaved', with I(1) and trending regressors ruled out.¹

Since the investigator has no a priori knowledge of either the number or dates of breaks in (1), a search strategy is employed to estimate these. The assumption that each true regime contains at least [εT] observations also specifies the minimum length of the estimated regimes; following Bai and Perron (1998) and others in this literature, this minimum regime length also applies at the beginning and end of the sample. In practice, the parameter ε is specified by the researcher, and this is often referred to as the trimming parameter. Searching therefore considers all observations t = [εT], [εT] + 1, …, T − [εT] as potential break dates, subject to the required minimum sample proportion between breaks. An efficient search algorithm is discussed in some detail by Bai and Perron (2003), and is employed in our analysis below.
Now, consider a regression model for (1) that is correctly specified, except that the number of breaks considered, denoted as n, may have n ≠ m:

$$y_t = x_t'\beta_i + e_t^*, \qquad t = T_{i-1} + 1, \dots, T_i, \quad i = 1, \dots, n + 1, \tag{2}$$

where $e_t^*$ is an error, $T_0 = 0$ and $T_{n+1} = T$. For given break dates $T_i$ (i = 1, …, n), the estimates of $\beta_1, \dots, \beta_{n+1}$ are obtained by minimizing the sum of squared residuals

$$S_T(T_1, \dots, T_n) = \sum_{i=1}^{n+1} \sum_{t=T_{i-1}+1}^{T_i} (y_t - x_t'\beta_i)^2; \tag{3}$$

we denote these estimates as $\hat\beta(\{T_i\}_{i=1}^{n})$. Using the efficient search algorithm of Bai and Perron (2003), (3) can be evaluated for all partitions of the sample into n + 1 segments satisfying $T_i - T_{i-1} \ge [\varepsilon T]$, with the estimator of the set of break points then obtained as the global minimizer

$$(\hat T_1, \dots, \hat T_n) = \arg\min_{T_1, \dots, T_n} S_T(T_1, \dots, T_n), \tag{4}$$

where $S_T$ is evaluated at the corresponding coefficient estimates. The corresponding estimated break fractions are denoted as $\hat\lambda(n)$, the n × 1 vector with jth element equal to $\hat T_j / T$. The estimators $\hat\lambda(n)$ and $\hat\beta(\{\hat T_i\}_{i=1}^{n})$ are calculated conditional on n. In practice, n is typically unknown a priori, and the next two subsections outline the sequential testing and information criteria approaches to obtaining the optimal n, yielding the estimator $\hat m$ of m.

1 The specific assumptions made differ across studies. For example, Hall et al. (2013) assume that the scaled regressor cross-product matrix has constant asymptotic properties over regimes, while Bai and Perron (1998) allow this to be regime-dependent. Further, although not considered in the present paper, Bai (1999) examines the case of trending regressors.
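As an illustration only, the global minimization in (4) can be sketched in Python by dynamic programming over admissible segments. This is a brute-force version that refits OLS on every segment, not the recursive implementation of Bai and Perron (2003); the function names `segment_rss` and `global_min_breaks`, the convention that a break date b means observations 1 to b precede the break, and the simulated mean-shift data are all assumptions made for the example. The sketch requires (n + 1)h ≤ T.

```python
import numpy as np

def segment_rss(y, X, h):
    """rss[i, j]: OLS residual sum of squares on observations i..j-1 (0-based),
    computed only for segments with at least h observations."""
    T = len(y)
    rss = np.full((T + 1, T + 1), np.inf)
    for i in range(T - h + 1):
        for j in range(i + h, T + 1):
            beta, *_ = np.linalg.lstsq(X[i:j], y[i:j], rcond=None)
            resid = y[i:j] - X[i:j] @ beta
            rss[i, j] = resid @ resid
    return rss

def global_min_breaks(y, X, n_breaks, h):
    """Return (minimum RSS, break dates) for n_breaks breaks and minimum
    regime length h. A break date b means observations 1..b precede it."""
    T = len(y)
    rss = segment_rss(y, X, h)
    # cost[k, j]: minimal RSS for the first j observations with k breaks
    cost = np.full((n_breaks + 1, T + 1), np.inf)
    last = np.zeros((n_breaks + 1, T + 1), dtype=int)
    cost[0] = rss[0]
    for k in range(1, n_breaks + 1):
        for j in range((k + 1) * h, T + 1):
            starts = np.arange(k * h, j - h + 1)   # candidate dates of the k-th break
            totals = cost[k - 1, starts] + rss[starts, j]
            best = int(np.argmin(totals))
            cost[k, j], last[k, j] = totals[best], starts[best]
    # Backtrack from the full sample to recover the break dates
    breaks, j = [], T
    for k in range(n_breaks, 0, -1):
        j = int(last[k, j])
        breaks.append(j)
    return float(cost[n_breaks, T]), breaks[::-1]

# Hypothetical data: intercept-only regression with a mean shift after t = 60
rng = np.random.default_rng(0)
T = 120
y = np.concatenate([rng.standard_normal(60), 5.0 + rng.standard_normal(60)])
X = np.ones((T, 1))
rss_1, dates_1 = global_min_breaks(y, X, n_breaks=1, h=12)  # h = [0.10 * T]
print(rss_1, dates_1)
```

Calling `global_min_breaks` for n = 0, …, N gives the sequence of minimized residual sums of squares on which both the sequential tests and the information criteria discussed next are based.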

Sequential Testing
Bai and Perron (1998) propose a method for estimation of the number of breaks based on the sequential application of tests for parameter change. The strategy consists of applying tests for n + 1 breaks against the null hypothesis of n breaks, for n = 0, 1, …, N − 1, where N is the maximum number of breaks considered. The tests are applied for an increasing number n, but stop at $n = \hat m$ when the null hypothesis is not rejected for this n at the specified significance level.
In more detail, the procedure is as follows. For n breaks, the optimal break dates given by (4) are obtained. The test against the alternative of n + 1 breaks then examines each of the n + 1 segments defined by $(\hat T_1, \dots, \hat T_n)$ to determine whether the insertion of one additional break date significantly decreases the residual sum of squares. For a regression with disturbances that are neither autocorrelated nor heteroscedastic, the Bai and Perron (1998) sequential test statistic is

$$F_T(n + 1 \mid n) = \frac{S_T(\hat T_1, \dots, \hat T_n) - \min_{1 \le i \le n+1} \inf_{\tau \in \Lambda_i} S_T(\hat T_1, \dots, \hat T_{i-1}, \tau, \hat T_i, \dots, \hat T_n)}{\hat\sigma^2}, \tag{5}$$

where $\hat\sigma^2$ is a consistent estimator of the disturbance variance and $\Lambda_i$ is the set of all partitions within the ith regime defined by $(\hat T_1, \dots, \hat T_n)$ such that the sub-segments created by the additional break date τ contain at least the minimum fraction ε of the total sample T. Autocorrelation and/or heteroscedasticity robust versions of (5) are available where these are required, while Qu and Perron (2007) extend the approach to systems of equations.
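A minimal Python sketch of the statistic in (5), for the case without autocorrelation or heteroscedasticity corrections, may help to fix ideas. The helper names `ols_rss` and `sequential_f` are hypothetical, the variance estimator (the residual sum of squares under n + 1 breaks divided by its degrees of freedom) is one possible consistent choice rather than necessarily the one used in the studies cited, and the comparison with critical values is not shown.

```python
import numpy as np

def ols_rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def sequential_f(y, X, breaks, h):
    """F-type statistic for n + 1 versus n breaks, given the n estimated
    break dates in `breaks` and a minimum sub-segment length h."""
    T, p = X.shape
    edges = [0, *breaks, T]
    segments = list(zip(edges[:-1], edges[1:]))
    seg_rss = [ols_rss(y[a:b], X[a:b]) for a, b in segments]
    rss_n = sum(seg_rss)
    best = rss_n
    for (a, b), s in zip(segments, seg_rss):
        # Insert one extra break at tau inside segment (a, b), if room allows
        for tau in range(a + h, b - h + 1):
            split = ols_rss(y[a:tau], X[a:tau]) + ols_rss(y[tau:b], X[tau:b])
            best = min(best, rss_n - s + split)
    # Assumed variance estimator: RSS with n + 1 breaks over its degrees of freedom
    sigma2 = best / (T - (len(breaks) + 2) * p)
    return (rss_n - best) / sigma2

# Hypothetical data with a mean shift after t = 60; test 1 break against 0 breaks
rng = np.random.default_rng(1)
T = 120
y = np.concatenate([rng.standard_normal(60), 5.0 + rng.standard_normal(60)])
X = np.ones((T, 1))
f_1_vs_0 = sequential_f(y, X, breaks=[], h=12)
print(f_1_vs_0)
```

The inner loop only searches within the segments implied by the current break dates, which is the feature of the sequential procedure that distinguishes it from a comparison of global minima.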
An implication of (5) is that although the global optimizer (4) is used to obtain the residual sum of squares and associated break date estimates for n breaks, this is not compared with the analogous global optimizer for n + 1 breaks: rather, the latter considers the insertion of an additional break date into those given by $(\hat T_1, \dots, \hat T_n)$. Although Bai (1999) provides a sequential test that employs a comparison of the respective global optimized residual sum of squares for n + 1 versus n breaks, we do not consider it here as this test does not appear to be widely used by practitioners.² Since the Bai and Perron (1998) test is now commonly used in empirical econometric research, our Monte Carlo study below compares the performance of various information criteria methods to the sequential procedure based on their test.

The Manchester School

Information Criteria
Information criteria used for estimation of the number of breaks in (1) can be written in generic form

$$IC(n) = \ln\left[S_T(\hat T_1, \dots, \hat T_n)/T\right] + K(n, T), \tag{6}$$

where $S_T(\hat T_1, \dots, \hat T_n)$ is the global minimum of the residual sum of squares for n breaks, as in (4), and K(n, T) is a penalty term that depends on the dimension of the model. The estimated number of breaks then minimizes the information criterion over the potential number of breaks considered (n = 0, …, N), so that

$$\hat m = \arg\min_{0 \le n \le N} IC(n). \tag{7}$$

The penalty term typically has the form $K(n, T) = K_1(n)K_2(T)$, where $K_1(n)$ is a monotonically increasing function of n while the predominant choices of $K_2(T)$ are ln(T)/T, which is associated with BIC (Schwarz, 1978), 2 ln[ln(T)]/T, which is the choice associated with HQIC (Hannan and Quinn, 1979), or 2/T as in AIC (Akaike, 1973).

Yao (1988) considers a BIC criterion when the only parameter of interest is the mean of an independent and identically distributed (i.i.d.) Gaussian process. For a regression model as in (1), the form used (see, for example, Bai and Perron, 2006) is

$$K_1(n) = (n + 1)p + n. \tag{8}$$

This $K_1(n)$ effectively treats estimation of each break date as equivalent to estimation of a single coefficient in (1).³ Yao (1988) establishes the consistency of BIC with (8) for the estimation of the number of breaks in his context. Using similar arguments to Bai's (2000) proof of his Theorem 6, it is possible to establish consistency for a wider range of penalty functions.⁴ Specifically, in our notation, $\hat m$ defined by (7) is consistent for m provided that $K_2(T)$ satisfies

$$K_2(T) \to 0 \quad \text{and} \quad T K_2(T) \to \infty \quad \text{as } T \to \infty. \tag{9}$$

These conditions cover both BIC and also the HQIC criterion, with $K_2^{HQ}(T) = 2\ln[\ln(T)]/T$. Although apparently not considered previously in the context of estimating the number of structural breaks, our analysis considers $K_2^{HQ}(T)$, in addition to $K_2^{BIC}(T)$, in conjunction with $K_1(n)$ of (8). However, the AIC criterion $K_2^{AIC}(T) = 2/T$ does not satisfy these conditions and can asymptotically lead to overestimation of m; consequently AIC procedures are not considered in our analyses.

2 Further, the Bai (1999) test is not included in the Monte Carlo analysis of Bai and Perron (2006).
3 Note that for practical purposes this can be replaced by $K_1(n) = n(p + 1)$, since the term p in (8) is common to all comparisons made and hence can be omitted.
A different BIC-type criterion is proposed by Liu et al. (1997), who argue that (8) is not sufficiently severe for inference in a non-Gaussian model and their penalty employs

$$K_2^{LWZ}(T) = c_0[\ln(T)]^{2+\delta_0}/T, \tag{10}$$

where $c_0 > 0$ and $\delta_0 > 0$. Based partly on simulation experiments for sample sizes between 30 and 200, they recommend $c_0 = 0.299$ and $\delta_0 = 0.1$. Further, LWZ employ a degrees of freedom correction, with $S_T(\hat T_1, \dots, \hat T_n)$ divided by T − [(n + 1)p + n] rather than by T. However, since this is equivalent to including the additional term −ln{T − [(n + 1)p + n]} in (6), it has no asymptotic role⁵ and the LWZ criterion leads to consistent inference. The finite sample performance of the BIC-type criteria of (8) and (10) is compared with testing-based methods in the simulation study of Bai and Perron (2006) and also in Section 3 below.

Taking a more theoretical perspective, Zhang and Siegmund (2007) follow Schwarz (1978) by employing a Bayesian approach. That is, for an unknown number of breaks in the mean of an independent Gaussian process and a uniform prior distribution over the break dates and regime means, Zhang and Siegmund (2007) derive an asymptotic approximation to the posterior probability. The resulting penalty K(n, T) is data-dependent, involving both the number of breaks and the intervals between the break fractions. Although these intervals become relatively less important as T increases, they note that estimating each break date carries a penalty equivalent to the estimation of between one and two mean values, which again implies that the penalty embodied in (8) may not be sufficient to capture the impact of break date estimation on the residual sum of squares in samples of moderate size. However, to our knowledge, this approach has not been extended to a more general regression context.

4 … positive fraction of the total sample size). As a result, the conditions on the penalty function for consistency of the information criteria stated in Bai's (2000) Theorem 6 can be relaxed to (9) in our setting.
5 Division of $S_T(\hat T_1, \dots, \hat T_n)$ by T in (6) plays no (asymptotic or finite sample) role for (7), since −ln(T) is constant over all model comparisons. Although the LWZ degrees of freedom correction may have a finite sample effect, nevertheless different n lead to asymptotically negligible differences in −ln{T − [(n + 1)p + n]} across models.
To detect the breakpoints for the mean and variance of an i.i.d. (vector) Gaussian process, Ninomiya (2005) considers AIC as a bias-corrected maximum likelihood estimator. In contrast to Yao (1988), where each breakpoint has the same weight as one conventional parameter, Ninomiya (2005) shows that evaluation of the bias leads to each breakpoint having a weight equivalent to three such parameters. Although AIC remains unattractive because it does not lead to consistent estimation of m, the result of Ninomiya (2005) is illuminating for the importance of break date estimation in relation to the estimation of the other parameters.
Our analysis in Hall et al. (2013) extends this result to the regression context and for non-Gaussian but homoscedastic and serially uncorrelated disturbances, where the regressor cross-product matrix satisfies $T^{-1}\sum_{t=1}^{[Tr]} x_t x_t' \overset{p}{\to} Q(r)$ and Q(r) is linear in the sample fraction r. In order to facilitate the derivation of asymptotic results, the breaks examined in Hall et al. (2013) are 'shrinking', in the sense that the differences $\beta_{i+1}^0 - \beta_i^0$ are assumed to converge to zero as the sample size increases.⁶ Under these assumptions, if the regression model (1) experiences m true structural breaks, then the difference between the asymptotic expectation of the global minimum in (4) and the expected residual sum of squares evaluated at the true break dates and with true parameters is shown to be

$$AE\left(S_T(\hat T_1, \dots, \hat T_m)\right) - E\left(\sum_{t=1}^{T} u_t^2\right) = -\sigma^2[(m + 1)p + 3m], \tag{11}$$

where AE denotes the asymptotic expectation of the quantity in parentheses; see Hall et al. (2013) (Theorem 1). In common with the more restricted case examined by Ninomiya (2005), (11) implies that estimation of each break date has an asymptotic impact on the global minimum of the residual sum of squares equivalent to estimation of three individual coefficients in (1). Since conventional information criteria for model selection employ a penalty component $K_1$ equal to the number of coefficients in the model, the result in (11) suggests that the appropriate penalty for an information criterion used for estimation of the unknown number of structural breaks in (1) is

$$K_1(n) = (n + 1)p + 3n. \tag{12}$$

Clearly, this penalty is more severe than that used by Yao (1988) and consequently may alleviate the tendency for the BIC criterion using (8) to detect spurious breaks in the simulation study of Bai and Perron (2006). It may also be noted that the BIC penalty of Zhang and Siegmund (2007), which is derived from a different perspective and assumes breaks of fixed magnitude, will deliver results intermediate between those of (8) and (12). Since HQIC is also able to provide consistent inference for the number of structural breaks, our analysis uses (12) in conjunction with $K_2^{HQ}(T)$, in addition to employing it with $K_2^{BIC}(T)$.

The criteria using (8) and (10) sometimes lead to poor inference on the number of breaks in the Monte Carlo analysis of Bai and Perron (2006). The next section reconsiders the performance of information criteria in this context by expanding the set to include modified BIC and HQIC that employ the penalty component of (12).

6 The formal assumption is that $\beta_{i+1}^0 - \beta_i^0 = \theta_i^0 s_T$, where $s_T = T^{-\alpha}$ for some α ∈ (0, 0.5) and $\theta_i^0$ does not depend on T. Bai and Perron (1998) make the same assumption in order to derive confidence intervals for break dates.
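To make the comparison of penalties concrete, the following Python sketch selects the number of breaks under the five criteria discussed in this section, taking as given the globally minimized residual sums of squares for n = 0, …, N. The function names and the illustrative RSS values are hypothetical; the LWZ constants are the recommended values 0.299 and 0.1 quoted above, and the LWZ degrees of freedom correction is applied by dividing the residual sum of squares by T − [(n + 1)p + n].

```python
import math

def k1(n, p, per_break=1):
    """Penalty component K1(n): per_break = 1 treats each break date as one
    coefficient, per_break = 3 gives the modified penalty."""
    return (n + 1) * p + per_break * n

def k2(T, kind):
    """Penalty component K2(T) for the BIC, HQ and LWZ criteria."""
    if kind == "BIC":
        return math.log(T) / T
    if kind == "HQ":
        return 2.0 * math.log(math.log(T)) / T
    if kind == "LWZ":
        return 0.299 * math.log(T) ** 2.1 / T
    raise ValueError(kind)

def select_breaks(rss, T, p):
    """rss[n] is the global minimum RSS with n breaks (n = 0..N).
    Returns the estimated number of breaks under each criterion."""
    specs = {  # name: (K2 type, coefficients per break, LWZ dof correction)
        "BIC": ("BIC", 1, False), "HQIC": ("HQ", 1, False),
        "MBIC": ("BIC", 3, False), "MHQIC": ("HQ", 3, False),
        "LWZ": ("LWZ", 1, True),
    }
    chosen = {}
    for name, (kind, per_break, dof) in specs.items():
        values = []
        for n, s in enumerate(rss):
            denom = T - k1(n, p) if dof else T
            values.append(math.log(s / denom) + k1(n, p, per_break) * k2(T, kind))
        chosen[name] = min(range(len(rss)), key=values.__getitem__)
    return chosen

# Hypothetical minimized RSS values for n = 0, 1, 2, 3 with T = 120 and p = 2
example = select_breaks([200.0, 100.0, 99.0, 98.0], T=120, p=2)
print(example)
```

Because the modified penalty adds a further 2n K2(T) to the unmodified one, MBIC and MHQIC can never select more breaks than BIC and HQIC, respectively, on the same data.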

MONTE CARLO ANALYSIS
This section evaluates the performance of information criteria alongside the sequential testing procedures of Bai and Perron (1998), by comparing their empirical distributions for the number of estimated breaks. The information criteria include BIC and HQIC, employing $K_2^{BIC}(T)$ and $K_2^{HQ}(T)$ (respectively) with $K_1(n) = (n + 1)p + n$, together with the modified versions that replace this last expression with (12) and are denoted as MBIC and MHQIC respectively. The LWZ criterion is also included.⁷ The Bai and Perron (1998) sequential testing procedure is examined in two main versions: with no correction for heteroscedasticity or serial correlation (BP), and allowing for both (BP(HAC)), following the Bai and Perron (2006) simulations in using the covariance matrix estimator of Andrews (1991) and a Quadratic Spectral kernel with a first-order autoregressive (AR(1)) approximation to construct the optimal bandwidth. For DGPs with no structural breaks, the versions of BP with corrections for serial correlation only (BP(AC)) and for heterogeneous error variances only (BP(Het)) are also included. We omit these last two cases for DGPs with breaks in order to save space, but also (and more substantively) because researchers undertaking structural breaks analyses often wish to employ HAC estimators in order to account for unmodelled serial correlation and heteroscedastic data features.
The experiments we undertake are primarily based on those of Bai and Perron (2006), although we extend the analysis to consider DGPs with four regressors (in addition to the intercept), rather than the maximum of one considered by Bai and Perron (2006). In all experiments discussed below, $\varepsilon_t$ is a sequence of i.i.d. N(0, 1) random variables and $w_t$, representing one or more regressors, is a scalar or vector of i.i.d. N(1, 1) random variables that are mutually and serially uncorrelated and also uncorrelated with $\varepsilon_t$. Reported results primarily relate to the sample size of T = 120, which corresponds to 30 years of quarterly observations and is typical for empirical macroeconomic analysis. Nevertheless, the impact of increasing sample size is illustrated by also presenting empirical distributions for a selection of DGPs that present difficulties for all methods. Specifically, results are shown using T = 240 for DGPs which exhibit no breaks but contain moderately strong unmodelled AR(1) disturbances, and also for DGPs with true break(s) in the intercept but constant regressor coefficients; a full set of results for all DGPs used here and T = 240 is available from the authors on request.
Each DGP is replicated 2000 times and within each replication the same random observations are employed across all methods. The sequential testing procedure of Bai and Perron (1998) is implemented with a nominal significance level of 5 per cent, using the critical values recently provided by Hall and Sakkas (2013). All simulations are performed in MATLAB.
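The structure of such an experiment can be sketched as follows. This is a small Python illustration, not the MATLAB code used for the reported results: it covers only an intercept-only regression (p = 1) applied to i.i.d. N(0, 1) data with no breaks, compares only BIC and MBIC, and uses far fewer replications than the 2000 employed in the paper. The function names, the seed and the default of 200 replications are assumptions.

```python
import numpy as np

def min_rss_by_breaks(y, h, N):
    """Global minimum RSS of an intercept-only model for n = 0..N breaks,
    with minimum regime length h (requires (N + 1) * h <= len(y))."""
    T = len(y)
    c1 = np.concatenate(([0.0], np.cumsum(y)))
    c2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def rss(i, j):
        # RSS around the segment mean for observations i..j-1
        return (c2[j] - c2[i]) - (c1[j] - c1[i]) ** 2 / (j - i)

    cost = np.full((N + 1, T + 1), np.inf)
    ends = np.arange(h, T + 1)
    cost[0, ends] = rss(0, ends)
    for k in range(1, N + 1):
        for j in range((k + 1) * h, T + 1):
            s = np.arange(k * h, j - h + 1)        # candidate last break dates
            cost[k, j] = np.min(cost[k - 1, s] + rss(s, j))
    return cost[:, T]

def estimate_m(rss, T, p, per_break):
    """Number of breaks minimizing ln(RSS/T) + K1(n) * ln(T)/T."""
    n = np.arange(len(rss))
    k1 = (n + 1) * p + per_break * n
    return int(np.argmin(np.log(rss / T) + k1 * np.log(T) / T))

def simulate(reps=200, T=120, eps=0.10, N=5, seed=0):
    """Empirical frequencies of the estimated number of breaks."""
    rng = np.random.default_rng(seed)
    h = int(eps * T)
    freq = {"BIC": np.zeros(N + 1, int), "MBIC": np.zeros(N + 1, int)}
    for _ in range(reps):
        y = rng.standard_normal(T)                 # no-break DGP
        rss = min_rss_by_breaks(y, h, N)           # same data for both criteria
        freq["BIC"][estimate_m(rss, T, 1, 1)] += 1
        freq["MBIC"][estimate_m(rss, T, 1, 3)] += 1
    return freq

freq = simulate()
print({name: counts.tolist() for name, counts in freq.items()})
```

As in the experiments described above, each replication applies every method to the same simulated sample, so differences in the frequency distributions reflect the criteria rather than sampling variation across methods.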
Results are presented as empirical frequency distributions for the numbers of breaks identified. Subsections 3.1, 3.2 and 3.3 consider results for DGPs with no breaks, one break and two breaks respectively. The maximum number of breaks allowed (N) depends on the trimming window employed in each case. For the no breaks case we present results for ε = 0.10, 0.20, with ε = 0.10 used in the one and two-break DGPs. For ε = 0.10 we set N = 5, but for ε = 0.20 we set N = 3 as more breaks would result in trivial cases where breaks would only be allowed at specific locations due to the trimming restriction. Results are available on request for ε = 0.20 in DGPs with breaks.
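The link between the trimming parameter and the number of admissible break configurations can be checked with a short calculation. The Python sketch below counts the ways of placing n break dates in a sample of size T when every regime must contain at least [εT] observations; the function names are hypothetical and the count follows from a standard stars-and-bars argument rather than from the paper.

```python
from math import comb

def min_regime_length(T, eps):
    """Integer part of eps * T, guarding against floating point error."""
    return int(eps * T + 1e-9)

def max_breaks(T, eps):
    """Largest n for which n + 1 regimes of minimum length fit in the sample."""
    return T // min_regime_length(T, eps) - 1

def n_configurations(T, eps, n):
    """Number of admissible sets of n break dates: the slack T - (n + 1)h
    is distributed freely over the n + 1 regimes."""
    h = min_regime_length(T, eps)
    slack = T - (n + 1) * h
    return comb(slack + n, n) if slack >= 0 else 0

for eps, n in [(0.20, 3), (0.20, 4), (0.10, 5)]:
    print(eps, n, n_configurations(120, eps, n))
```

With T = 120 and ε = 0.20, four breaks can only be placed at a single set of dates, which is the trivial case that motivates setting N = 3, whereas three breaks with ε = 0.20 or five breaks with ε = 0.10 still leave many admissible configurations.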
There are, of course, trade-offs in choosing the appropriate trimming. A higher value of ε leaves more observations in each segment for parameter estimation and Bai and Perron (1998) find this to be particularly important when HAC robust inference is applied using sequential testing for breaks. Indeed, they recommend the use of a relatively wide trimming window, such as ε = 0.20, in this case, in order to avoid substantial size distortions exhibited by HAC tests with relatively small ε. The disadvantage of large trimming, however, especially in a relatively modest sample size of T = 120, is that it restricts the fitting of breaks in the sense that it leaves fewer permissible break locations in the sample and this may result in omitting true breaks or forcing them to the wrong locations. Bai and Perron (1998) also effectively assume that the researcher knows whether the true DGP exhibits serial correlation; hence they apply HAC inference and ε = 0.20 when this is present and inference for serially uncorrelated disturbances and ε = 0.05 when it is absent. In contrast, we employ both types of test and report results for the intermediate value ε = 0.10 for DGPs that exhibit structural breaks.

No-break DGPs
We employ a total of eight different DGPs that exhibit no structural breaks, with the following showing both the true DGP and the corresponding regression which is employed for inference (with p being the number of coefficients, including the intercept), with the theoretical $R^2$ associated with the latter also provided:

In all cases, the required starting values for $u_t$ or $\varepsilon_t$, as appropriate, are set to zero.
The first six DGPs above are the set used by Bai and Perron (2006). DGPs 1 and 2 are the benchmark cases of i.i.d. N(0, 1) disturbances, with either an intercept only or an intercept and a single exogenous regressor in the regression model. DGPs 3−6 allow for various patterns of autocorrelation. Although DGPs 3 and 4 are identical, the dynamics are explicitly modelled only in DGP 3. DGPs 7−8 extend the set considered by Bai and Perron (2006) to include four exogenous regressors together with an intercept. The coefficient vector $\gamma_0$ has all elements set to 0.50 and 0.58 for DGPs 7 and 8, respectively, which ensures the theoretical $R^2$ = 0.5.
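The choice of 0.50 and 0.58 can be checked with a short calculation. The Python sketch below computes the population R² of a regression with k mutually uncorrelated unit-variance regressors sharing a common coefficient; the function name is hypothetical, and the AR(1) coefficient of 0.5 for the disturbances of DGP 8 is an assumption made for this check (it matches the AR(1) disturbances used in the one-break experiments).

```python
def theoretical_r2(gamma, k, rho=0.0, var_w=1.0, var_eps=1.0):
    """Population R^2 of y_t = mu + gamma * (w_1t + ... + w_kt) + u_t, with
    mutually uncorrelated regressors of variance var_w and AR(1) disturbances
    u_t = rho * u_{t-1} + eps_t (rho = 0 gives i.i.d. disturbances)."""
    explained = k * gamma ** 2 * var_w          # variance of the regression part
    var_u = var_eps / (1.0 - rho ** 2)          # stationary disturbance variance
    return explained / (explained + var_u)

r2_dgp7 = theoretical_r2(0.50, k=4)             # i.i.d. disturbances
r2_dgp8 = theoretical_r2(0.58, k=4, rho=0.5)    # rho = 0.5 is an assumption
print(round(r2_dgp7, 3), round(r2_dgp8, 3))
```

Under these assumptions both values are approximately 0.5, since the larger disturbance variance implied by autocorrelation is offset by the larger coefficient value.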
The results are presented in Table 1. In the benchmark cases (DGPs 1 and 2) all the information criteria in our study perform very well, selecting the true model more than 95 per cent of the time, with the exception of the unmodified HQIC. Nevertheless, the modification of (12) improves performance for both BIC and HQIC, so that MBIC is (like LWZ) almost always correct while MHQIC has a success rate above 98 per cent. With regard to the performance of sequential testing, the BP method that takes no account of serial correlation or heteroscedasticity outperforms the other versions and has empirical size close to nominal size. In these, and in fact all DGPs in the table, the larger trimming parameter value (ε = 0.20) improves the ability of the procedures to select the correct number of breaks, especially when heteroscedasticity-consistent inference is employed. Although Bai and Perron (2006) also find that size is improved with larger trimming, and recommend the use of ε = 0.20 with HAC inference, nevertheless BP(HAC) has empirical size twice its nominal value in DGP 2 with this value of ε.
When the AR(1) process is estimated in DGP 3, performance is generally similar to the benchmark cases. Note, in particular, that while the performance of HQIC deteriorates, especially with ε = 0.10, other methods are largely unaffected.⁸ However, the presence of unmodelled positive disturbance autocorrelation (DGPs 4 and 5) adversely affects all inference methods, implying that explicit modelling of the dynamics leads to better performance than relying on HAC inference with these sample sizes when $R^2$ is low. The worst performance is again given by HQIC, where it finds an average of three spurious breaks for DGP 4 with ε = 0.10. Although the modification helps, nevertheless the performance of MHQIC remains relatively poor for this DGP. Notice that BIC performs relatively poorly, as noted by Bai and Perron (2006), but the degrees of freedom correction of (12) is successful in tempering the detection of spurious breaks. Our results also confirm that LWZ performs well when no breaks occur. Although BP(AC) and BP(HAC) outperform the tests without autocorrelation corrections, they are substantially oversized, even with T = 240.⁹ Conversely, the negatively autocorrelated MA(1) process of DGP 6 leads to all procedures having very high success rates, although this also implies that the sequential testing methods are undersized.
It should also be remarked that DGPs 4−6 are difficult, since the regression model has no explanatory power (i.e. $R^2$ = 0). Indeed, as the parameter in an AR(1) process such as DGP 4 approaches a unit root, spurious detections of structural breaks in the constant may be anticipated to occur relatively more frequently. A similar comment applies also to the moving average of DGP 5.
With the inclusion of exogenous regressors in DGPs 7 and 8, the information criteria maintain a high level of success in inference across the board, albeit with performance being a little worse in the presence of autocorrelation than when this is absent.Indeed, the inclusion of regressors aids inference using these criteria, as seen by comparing DGPs 4 and 8. Surprisingly, however, this is not the case with the testing approach, where more marked oversizing applies with the application of a heteroscedasticity correction in DGPs 7 and 8 compared with DGPs 2 and 4 (respectively) when T = 120; the HAC correction is especially poor in the former cases and is out-performed by the uncorrected BP test even when autocorrelation is present.
The information criteria that are the most reliable overall when no breaks are present are MBIC and LWZ, but their advantage over MHQIC is not great when the regression model has at least reasonable explanatory power (DGPs 2, 3, 7 and 8). Our results also imply that sequential testing employing a HAC correction is not recommended in models containing multiple regressors, even with ε = 0.20. It is also notable that MBIC, MHQIC and LWZ have good performance in Table 1 (with the partial exception of DGP 4) irrespective of whether trimming of 0.10 or 0.20 is applied, even when p = 5 coefficients are examined for potential breaks. It is this feature of the information criteria that leads us to focus on ε = 0.10 for DGPs with structural breaks.

9 The oversizing we find for these in Table 1 is greater than that reported by Bai and Perron (2006). However, they do not indicate whether the sample size employed for their DGPs with no breaks is T = 120 or T = 240, and the better size they report may be associated with the use of larger T. Note also that results in our Table 1 for DGP 4 with T = 240 are similar to the values indicated by Bai and Perron (2006) for this DGP.

One-break DGPs
As in Bai and Perron (2006), the DGP we employ for one break can be written as [ .]

The results, given in Table 2, are divided into four cases. These are a single regressor plus constant (p = 2), with i.i.d. disturbances (Case I) and AR(1) errors (where $u_t = 0.5u_{t-1} + \varepsilon_t$, with $\varepsilon_t$ i.i.d. N(0, 1)) (Case II), together with analogous processes with four exogenous regressors and a constant (p = 5) (Cases III and IV). We present various combinations of parameter values within each case, some of which are considered also by Bai and Perron (2006); however, they do not consider four regressor DGPs as in our Cases III and IV. In these latter cases, the same parameter value applies for all elements of $\gamma_1$, and similarly for $\gamma_2$. Under all four cases, we also consider DGPs such that the same theoretical regression $R^2$ applies across the two segments, and this value is given in the table. The regression model applied always includes an intercept and the indicated (correct) number of regressors. As noted above, results for T = 240 are included for DGPs where only the intercept changes, but otherwise the sample size of T = 120 is employed.

We consider first DGPs that exhibit breaks in the regressor coefficient vector. For Case I, all methods except (unmodified) HQIC perform well when there is change in the exogenous variable coefficient(s), with or without a change in the intercept. The modification embodied in (12) works well here, pushing performance close to 100 per cent, and benefits HQIC more than BIC since the former initially has worse performance. Irrespective of the application of the HAC correction or not, the BP test correctly estimates the presence of one break more than 90 per cent of the time across all Case I DGPs. Nevertheless, performance is always worse for Case II (with AR(1) disturbances), compared with the corresponding Case I DGP. Despite the information criteria correction being based on the analysis of Hall et al. (2013) that assumes serially uncorrelated disturbances, MBIC continues to do well when autocorrelation is present, and has similar (and sometimes better) performance compared with LWZ. Not surprisingly, the BP test taking no account of serial correlation can be relatively poor and BP(HAC) improves on this, but it never surpasses MBIC. In this context, it should also be noted that no size corrections are applied to the sequential tests, with statistics compared with the nominal critical values throughout.

The patterns of results for Cases I and II largely carry over to Cases III and IV with four regressors, when coefficients on these regressors change. Nevertheless, the information criteria benefit from the additional regressors relative to sequential testing, with all except HQIC correctly identifying one break with higher frequency than either BP or BP(HAC). Notice also that BP performs better than BP(HAC) even when autocorrelation is present. Since the information criteria are applied without any consideration of autocorrelation, their good performances here provide an advantage over testing. That is, in order to apply the theoretically appropriate test, the researcher not only has to decide in advance whether autocorrelation is present or not, but we now find there is risk that use of the HAC statistic may result in a drop in performance even when it is appropriate.
Finally, note that, compared with other DGPs and for a sample size of T = 120, the occurrence of a break in the intercept alone (i.e. the first sets of results in each of Cases I-IV) leads to a reduction in overall performance, with this sometimes being substantial. These breaks may be considered small and difficult to detect, especially for Cases III and IV with four regressors whose coefficients are constant and a change equal to one disturbance standard deviation applies in the intercept. Although MBIC generally fails to detect intercept-only breaks with this smaller sample size, its relative performance markedly improves with T = 240. With four regressors, the LWZ criterion performs very poorly, correctly detecting one break at most 25 per cent of the time even with the larger sample size.

Notes to Table 2: As for Table 1, except that all results employ ε = 0.10 and R² gives the theoretical R² for the regression in each segment.
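To make the Case II design concrete, the following minimal sketch simulates a single draw from a one-break DGP with one regressor, an intercept and AR(1) disturbances with coefficient 0.5. It is an illustration only: the function name, the placement of the break at mid-sample, the standard normal regressor, the start-up value of the disturbance and the default parameter values are all assumptions, not the settings used to produce Table 2.

```python
import numpy as np


def simulate_one_break(T=120, mu=(0.0, 0.0), gamma=(1.0, 2.0), rho=0.5,
                       break_frac=0.5, seed=0):
    """Draw y_t = mu_j + gamma_j * x_t + u_t with one break (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(T)        # assumed: i.i.d. N(0, 1) regressor
    eps = rng.standard_normal(T)      # innovations, i.i.d. N(0, 1)
    u = np.empty(T)
    u[0] = eps[0]                     # assumed start-up value for the AR(1) error
    for t in range(1, T):
        u[t] = rho * u[t - 1] + eps[t]
    Tb = int(break_frac * T)          # assumed break location (mid-sample by default)
    regime = (np.arange(T) >= Tb).astype(int)   # 0 before the break, 1 after
    mu = np.asarray(mu, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    y = mu[regime] + gamma[regime] * x + u
    return y, x, Tb
```

Setting rho=0.0 gives the i.i.d. disturbances of Case I, while equal values in gamma with different values in mu give an intercept-only break of the kind discussed above.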

Two-break DGPs
The simulated DGP with two breaks again follows Bai and Perron (2006). As with one break, we divide the results, given in Table 3, into four cases differentiated by the nature of the errors (i.i.d. and AR(1)) and the number of exogenous regressors (one and four, plus intercept). Again, we include DGPs with changes in the intercept alone, changes in the regressor coefficients, changes in both, and controlled changes that keep the R² constant across segments. In the two-break model, we include DGPs where the three regimes include two distinct sets of coefficients and also DGPs with two non-reverting breaks in both the intercept and regressor coefficients. Some of the parameter sets that we consider are also used by Bai and Perron (2006); however, they do not consider DGPs with more than one exogenous regressor and, indeed, their DGPs have no regressors when u_t is AR(1). Unlike the one-break case of the preceding subsection, we use different values for μ_i and γ_i for four versus one regressor, since we found that parameter sets that present a challenge for all methods in DGPs with one regressor lead to high levels of performance and little discernible difference across methods when four regressors are used. As for the one-break DGPs, results for both T = 120 and T = 240 are shown when the intercept only changes, with T = 120 being used otherwise.
The results generally follow similar patterns to those of the one-break DGPs with the corresponding error structure and number of regressors, albeit with some notable differences. For DGPs where a change in intercept and/or regressor coefficients is later reversed, typically either zero or two breaks are detected across all methods. With a single regressor, plus intercept, the use of (12) implies a substantial increase in the penalty for BIC when two breaks are considered, causing MBIC often to perform worse than the original BIC of Yao (1988) for T = 120. With its initially more liberal penalty, the performance of HQIC is improved using (12) and it performs well for this sample size. Although we show results for T = 240 only for the intercept break case, the patterns seen in Table 3 of MBIC selecting the correct number of breaks more frequently than BIC for this larger sample size and of MHQIC often over-specifying the number of breaks in the presence of autocorrelated disturbances apply across all two-break DGPs we consider. Although there are DGPs in Table 2 where it performs well, LWZ often has poor performance in Table 3, finding no breaks up to 90 per cent of the time (Case IV) when in fact two breaks are present.

Notes to Table 3: As for Table 2.

The Manchester School
Sequential testing never matches the performance of MHQIC in terms of correctly estimating the number of true breaks in Table 3. Further, irrespective of whether autocorrelation is present in the true DGP or not, BP(HAC) has a strong tendency to opt for no breaks, a finding in common with the results of Bai and Perron (2006). In the light of the oversizing of the sequential tests in Table 1 in the presence of unmodelled autocorrelation, especially with ε = 0.10, it appears they have little power. It is to be noted, however, that the performance of the sequential procedure for these DGPs might be substantially improved by prior application of a 'double maximum' test for the null hypothesis of no breaks against the alternative of one or more breaks.
Considering the simulation results presented here in their entirety, for zero, one and two breaks, we conclude that there is no clear winner when the aim is to estimate the number of true breaks. Nevertheless, in terms of avoiding spurious breaks, the MBIC and LWZ criteria perform well in Table 1, while also avoiding the oversizing sometimes exhibited by the sequential testing approach. However, as also noted by Bai and Perron (2006), LWZ can be poor in detecting breaks when these are actually present (Tables 2 and 3). In this respect, our modification of BIC, namely MBIC based on the penalty (12), is preferable to LWZ overall, and the modification also considerably improves the performance of HQIC. Although the latter can detect spurious breaks, this occurs primarily when the true DGP includes a positively autocorrelated process that is not explicitly modelled. On the other hand, while MBIC largely avoids spurious breaks, it sometimes underestimates the number of true breaks, but delivers impressive performance in estimating the number of breaks for samples of T = 240 observations. Overall, these two modified information criteria perform well relative to sequential testing.
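The mechanics of choosing the number of breaks by an information criterion can be sketched as follows. For each candidate number of breaks m, the break dates are chosen to minimize the total residual sum of squares subject to the trimming restriction, and the criterion adds a penalty to the log of the resulting residual variance. The code is a hedged illustration rather than the procedure used for Tables 1 to 3: the exact penalty (12) is not reproduced here, so the sketch assumes a criterion of the form ln(SSR_m/T) + k_m c_T/T, with k_m = (m + 1)p + 3m (each break date charged as three coefficients), c_T = ln T for a BIC-type criterion and c_T = 2 ln ln T for an HQ-type criterion. All function and argument names are hypothetical.

```python
import numpy as np


def segment_ssr(y, X, h):
    """OLS residual sum of squares for every segment [i, j) with j - i >= h."""
    T = len(y)
    ssr = np.full((T + 1, T + 1), np.inf)
    for i in range(T - h + 1):
        for j in range(i + h, T + 1):
            beta, *_ = np.linalg.lstsq(X[i:j], y[i:j], rcond=None)
            resid = y[i:j] - X[i:j] @ beta
            ssr[i, j] = resid @ resid
    return ssr


def min_ssr_by_breaks(ssr, T, max_breaks):
    """Dynamic programme: minimal total SSR for m = 0, ..., max_breaks breaks."""
    # cost[m, j] is the best SSR for the first j observations in m + 1 segments
    cost = np.full((max_breaks + 1, T + 1), np.inf)
    cost[0] = ssr[0]
    for m in range(1, max_breaks + 1):
        for j in range(1, T + 1):
            cost[m, j] = np.min(cost[m - 1, :j] + ssr[:j, j])
    return cost[:, T]


def select_breaks(y, X, trim=0.10, max_breaks=5, criterion="bic", per_break=3):
    """Return the number of breaks minimising the (assumed) penalised criterion."""
    T, p = X.shape
    h = max(int(trim * T), p + 1)          # minimum regime length from trimming
    best = min_ssr_by_breaks(segment_ssr(y, X, h), T, max_breaks)
    c_T = np.log(T) if criterion == "bic" else 2.0 * np.log(np.log(T))
    m = np.arange(max_breaks + 1)
    k = (m + 1) * p + per_break * m        # assumed parameter count per model
    ic = np.log(best / T) + k * c_T / T
    return int(np.argmin(ic)), ic
```

Setting per_break=1 in this sketch charges each break date as a single parameter, which gives a more liberal penalty for comparison; the value 3 reflects the modification proposed in this paper only under the assumptions stated above.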

EURO AREA MONETARY POLICY
Since the establishment of the European Monetary Union (the Euro area) in 1999, discussion has been ongoing about the nature of monetary policy pursued by the European Central Bank (ECB). However, largely due to the relatively short period of existence of the Euro area, it is common to base empirical studies of ECB monetary policy on data extending back to the 1970s or 1980s; recent examples include Castelnuovo (2007), Clausen and Hayo (2005) and Lippi and Neri (2007). Nevertheless, the possibility of change during the period of analysis is sometimes recognized by the authors. Although Castelnuovo (2007) examines whether a structural break occurs at the beginning of 1999, he finds no evidence for such a break.
On the one hand, it may appear surprising if the establishment of the Euro area in 1999 did not see any change in the nature of monetary policy across the (aggregate) area, when the responsibility for the conduct of this policy passed from individual countries to the ECB. On the other hand, the beginning of the process of European monetary integration is usually dated to be the introduction of the European exchange rate mechanism in 1979, with progress being somewhat chequered over much of the subsequent two decades. Hence it is plausible that an aggregate monetary policy equation estimated over an extended period may have experienced more than one structural break. Further, even if the establishment of the Euro area itself led to a break, the appropriate date for this is unclear since the decade of the 1990s witnessed a number of landmarks in the movement towards monetary integration. To shed light on these questions, we apply the methods of structural break inference discussed in preceding sections to Euro area monetary policy.

Model and Data
Empirical monetary policy functions are frequently based on the Taylor rule, originally proposed by Taylor (1993) as a description of interest rate policy in the USA. As in Castelnuovo (2007) and many other studies, the Taylor rule can be written as

r_t = α + β_π π_t + β_y y_t + u_t    (13)

where r_t is the short-term interest rate, π_t is inflation and y_t is the output gap; in practice, u_t is typically autocorrelated. This equation may be considered as the baseline monetary policy reaction function of macroeconomic modelling. Therefore, to gain insight into Euro area monetary policy in a historical context, the next subsection applies the structural break methods discussed in previous sections to (13).
Any analysis for the Euro area over a sample period that starts prior to 1999 must employ pseudo-historical data, where the series are constructed by aggregating across countries that later constituted the Euro area. A common source for such data, employed by Castelnuovo (2007), Lippi and Neri (2007), and many other researchers, is the AWM database prepared within the ECB for use in their area-wide model (Fagan et al., 2005). However, Anderson et al. (2011) argue that the fixed-weight cross-country aggregation of the type adopted in the AWM database pre-1999 may be inappropriate for representing the financial and monetary characteristics of the later Euro area. Therefore, in order to mitigate the effects of possibly inappropriate aggregation, our main analysis employs the interest rate and inflation series constructed by Anderson et al. (2011), while also noting the nature of results obtained when the corresponding AWM series are used. Anderson et al. (2011) (ADOV) provide aggregate Euro area data for inflation and interest rates from 1971Q1 to 2007Q4 inclusive, giving 148 quarterly observations. To represent monetary policy, the short-term (three-month) interest rate is employed as r_t, while π_t is annual percentage inflation in the harmonized index of consumer prices, HICP, computed as 100 times the difference of log values compared with one year earlier. Finally, and as conventional in much macroeconomic analysis, the output gap y_t is measured by applying the Hodrick-Prescott filter to the logarithm of real gross domestic product (GDP) obtained from the AWM database.¹⁰ Figure 1 shows the data used to obtain the results reported in Tables 4 and 5. In line with the modern macroeconomic theory of monetary policy, both interest rates and inflation are assumed to be stationary variables, albeit with the former exhibiting possible structural breaks.¹¹
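The two data transformations just described can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the code used to construct the series in Fig. 1: the function names are hypothetical, the Hodrick-Prescott trend is obtained by solving the standard penalized least squares problem directly, and the smoothing parameter of 1600 is the conventional value for quarterly data rather than a value stated in the text.

```python
import numpy as np


def hp_trend(series, lam=1600.0):
    """Hodrick-Prescott trend: argmin of sum (y - tau)^2 + lam * sum (d2 tau)^2."""
    y = np.asarray(series, dtype=float)
    T = y.size
    D = np.diff(np.eye(T), n=2, axis=0)    # (T - 2) x T second-difference matrix
    return np.linalg.solve(np.eye(T) + lam * D.T @ D, y)


def output_gap(real_gdp, lam=1600.0):
    """100 times the gap between log real GDP and its HP trend (lam is assumed)."""
    log_gdp = np.log(np.asarray(real_gdp, dtype=float))
    return 100.0 * (log_gdp - hp_trend(log_gdp, lam))


def annual_inflation(price_index):
    """100 times the log difference of a quarterly index over four quarters."""
    log_p = np.log(np.asarray(price_index, dtype=float))
    return 100.0 * (log_p[4:] - log_p[:-4])
```

Because the annual difference uses the observation four quarters earlier, the inflation series is four observations shorter than the price index from which it is computed.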

Results
The results in Table 4 show evidence of multiple structural breaks in Euro area monetary policy since 1971, reflecting the changing nature of monetary affiliations in Europe over the period. That analysis employs trimming parameters ε = 0.10 and 0.20 in conjunction with maximum numbers of breaks of N = 5 and 3, respectively, with a 5 per cent nominal significance level used for the sequential tests of Bai and Perron (1998). Although BP(HAC) finds no break in Panel A, which is in line with the Monte Carlo results of Table 3 for DGPs with multiple breaks, it may nevertheless be noted that tests (not reported) of no breaks against two or more breaks, as well as the 'double maximum' tests of Bai and Perron (1998) using BP(HAC), would deliver the conclusion that multiple breaks are present. Although all methods except BP(HAC) find the maximum permitted three breaks when ε = 0.20, the number varies between three and five for the narrower trimming parameter of ε = 0.10. In effect, for this latter case, BP omits the first break identified by the BIC and HQIC-based criteria, while LWZ effectively omits a further one. It is also noteworthy that identical results to those shown are obtained with trimming ε = 0.10 when the maximum number of breaks is set at N = 6, except that HQIC finds an additional break in the mid-1980s. The breaks uncovered suggest that the decades before the establishment of the Euro area should not be regarded as a period of constant monetary policy. Nevertheless, no evidence of a change in monetary policy is indicated after 1999, namely from the establishment of the Euro area.

¹⁰ Although later AWM data are available, Euro area real GDP is truncated at 2007Q4 prior to application of the Hodrick-Prescott filter. This is to avoid the two-sided filter using information about the subsequent recession, thereby affecting output gap estimates at the end of our sample. The output gap is computed as 100 times the difference between log real GDP and the Hodrick-Prescott trend estimate.
¹¹ The formal assumptions of Hall et al. (2013) also require inflation and the output gap to be stationary, without structural breaks. This assumption may be called into question in relation to inflation over the sample examined here.

FIG. 1. Data for Euro Area Monetary Policy Analysis
Although the first break identified in 1974 in Table 4 may be associated with the response to the inflationary pressures induced by the oil price increases of the period, other breaks appear to be associated with events in Europe. Indeed, the first European monetary system began operation in March 1979 and, for this reason, some researchers explicitly select that year as the starting date for their Euro area analyses (see, for example, Clausen and Hayo, 2005). The break dated in 1980 may, therefore, be due to this change in monetary policy. The period around 1990 was atypical for Europe due to macroeconomic consequences of German reunification; see, for example, the discussion in Perez et al. (2007). Further, the Maastricht Treaty, which agreed the final stage of monetary integration, came into force in 1993. Finally, although the LWZ criterion finds a break in 1997, other methods point to such a break occurring at the euro introduction in 1999Q1. Unlike Castelnuovo (2007), therefore, our analysis does point to a break in monetary policy at the establishment of the Euro area.
Table 5 shows the estimated coefficients of (13) obtained using both the full sample and the break dates obtained using MBIC and MHQIC for ε = 0.10; estimation is by ordinary least squares and HAC standard errors are shown for all coefficients. The choice of these regimes is based on the good performance of MBIC and MHQIC with this small trimming in the Monte Carlo analysis of Section 3, even in the presence of autocorrelation, and also on the relevance of the estimated break dates for events in European integration (as just discussed). Comparison across estimates indicates that the full sample regression explains substantially less of the variation in interest rates than the models estimated over the subsample periods (except 1989-93) and, further, exhibits substantially more first-order serial correlation than the subsample estimations; both of these features of the full sample estimates could be a consequence of structural breaks.
There are a number of interesting features to the changing monetary policy responses shown in Table 5. First, inflation plays a much more important role for monetary policy after 1974. Second, the distinctive nature of the 1989-93 period is emphasized, with interest rates being high in relation to inflation in order to finance the costs of German reunification (see Fig. 1). Third, the fight against inflation is evident in the subsample from 1993 to the late 1990s, when a number of countries had to bring inflation down in order to meet the Maastricht Treaty criteria for joining the Euro area. Finally, the euro period (post-1999) shows an apparently subdued response of interest rates to inflation, which may be due to inflation itself being close to the ECB target of 2 per cent during this period. Time variation in the monetary policy responses to the output gap is also indicated by Table 5, with the strongest responses being during the 1970s and since 1999; Fig. 1 shows that interest rates primarily track the output gap during these periods.
A further notable feature of Table 5 is that the coefficient on inflation is less than unity over much of the period. This is in contrast to the requirements of macroeconomic theory that this should exceed unity for effective monetary policy, but similar estimates of this coefficient for the (actual) Euro area are reported in Sauer and Sturm (2007).
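The estimation underlying each column of Table 5, namely ordinary least squares applied to (13) over a given regime with HAC standard errors, can be sketched as below. This is a minimal illustration and not the code used for Table 5: the function name is hypothetical, and the sketch assumes that a Bartlett window with lag truncation 4 corresponds to weights 1 − j/5 on autocovariances at lags j = 1, ..., 4, with no small-sample degrees-of-freedom adjustment.

```python
import numpy as np


def ols_hac(y, X, lags=4):
    """OLS coefficients with Bartlett-kernel (Newey-West type) HAC standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta                      # OLS residuals
    Xu = X * u[:, None]                   # rows are x_t * u_t
    S = Xu.T @ Xu                         # lag-0 contribution
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)        # assumed Bartlett weight at lag j
        G = Xu[j:].T @ Xu[:-j]            # sum of x_t u_t u_{t-j} x_{t-j}'
        S += w * (G + G.T)
    cov = XtX_inv @ S @ XtX_inv           # sandwich covariance estimate
    return beta, np.sqrt(np.diag(cov))
```

For a regime running between two estimated break dates, y would hold the interest rate and X a column of ones together with inflation and the output gap for the observations in that regime.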
Although results are not shown in order to conserve space, a structural breaks analysis of (13) using AWM data for interest rates and inflation shows qualitatively similar results to those of Table 4. However, all methods then find the maximum number of breaks permitted, namely 5 with ε = 0.10 and 3 with ε = 0.20, except that four breaks are obtained using BP(HAC) with ε = 0.10.

It is also noteworthy that when a dynamic monetary policy rule is examined, including lagged interest rates to account for interest rate smoothing and the autocorrelation generally evident in (13), fewer breaks are generally detected irrespective of whether ADOV or AWM data are employed. Indeed, including two lags of r_t and using ADOV data, MBIC finds one break (at 1980Q3) and MHQIC two (1980Q3 and 1985Q4). While this confirms the importance of the 1980 break, it is nevertheless surprising that later changes to European monetary affiliations are not reflected in the detected structural breaks. A plausible reason may be that interest rate dynamics themselves do not change substantially over time, making breaks in other coefficients more difficult to detect when all are assumed to change at each break date. This possibility is, however, left as a matter for further research.

CONCLUDING REMARKS
This paper investigates the usefulness of information criteria for estimation of the number of structural breaks in models estimated by ordinary least squares. In particular, based on the asymptotic expected residual sum of squares when break dates are estimated, we propose a modification to the penalty function for structural break inference. This modified penalty is more severe than the BIC-type criteria proposed for structural break estimation by Yao (1988) and also by Zhang and Siegmund (2007). Although Liu et al. (1997) propose a criterion based on BIC, their modification is primarily based on calibration, whereas ours is analytical. Since our modification essentially compares the impact of estimation of break dates with that from estimating individual regression coefficients, it can be applied to a range of information criteria that yield consistent estimators for the number of breaks.
We undertake a Monte Carlo analysis to compare the performance of a number of methods for structural break inference. Information criteria applied are the BIC criterion of Yao (1988), an analogous criterion based on Hannan and Quinn (1979) (which does not appear to have been employed previously for structural break inference), our modified versions of BIC and HQIC, and the LWZ criterion of Liu et al. (1997). Alongside these, the sequential testing approach of Bai and Perron (1998) is also examined, using both i.i.d. and HAC inference. Overall, the modified BIC and HQIC perform well, irrespective of whether the disturbances are serially uncorrelated or positively autocorrelated, with the new penalty function substantially reducing the problem of spurious breaks to which the BIC is subject in the study of Bai and Perron (2006). Therefore, these modified criteria provide a viable alternative to sequential testing, and in some cases have superior properties. However, the criterion of Liu et al. (1997) is often poor in detecting the presence of true breaks.
The methodology of our Monte Carlo analysis largely follows that of Bai and Perron (2006), notably in evaluating the methods through the empirical distributions of the number of estimated breaks, irrespective of the magnitudes of these breaks. However, other criteria can be considered. For example, a researcher may be primarily interested in the coefficient values themselves, for which an appropriate metric would involve a measure of the distance of the estimated from the true coefficient vectors over all regimes that apply over the sample period. In that case it may be optimal to ignore breaks when these are sufficiently small, due to the additional noise induced by sample splitting. A related situation applies in a forecasting context, where Pesaran and Timmermann (2007) show that use of pre-break data can reduce mean-squared forecast error compared with the use of post-break data only. These considerations point to the need for further research relating to the role of the estimation of the number of structural breaks.
Applied to Euro area monetary policy, our modified BIC/HQIC methods indicate multiple structural breaks prior to, and also at, the establishment of the Euro area in 1999, but none over the subsequent period. These results, including the break dates and estimated coefficients, are compatible with monetary policy changing over the various earlier phases of European monetary integration and hence point to the inadequacy of assuming monetary policy to be constant over a period that includes pre-euro data.
defined in the text. BIC, MBIC, HQIC, MHQIC and LWZ are information criteria methods, with M indicating use of the modified penalty of (12). Sequential test methods employ a nominal significance level of 5 per cent, with Het and AC indicating heteroscedasticity and autocorrelation corrected statistics, respectively, while HAC includes both. NA means not applicable.
Coefficient estimates relate to (13) with HAC consistent standard errors shown in parentheses (using the Bartlett window with a lag truncation of 4); s is the residual standard deviation and ρ_1 is the first-order autocorrelation coefficient of the residuals.

TABLE 2. DISTRIBUTIONS OF ESTIMATED NUMBERS OF BREAKS FOR DGPS WITH ONE BREAK

TABLE 3. DISTRIBUTIONS OF ESTIMATED NUMBER OF BREAKS FOR DGPS WITH TWO BREAKS

TABLE 4. BREAKS IDENTIFIED FOR EURO AREA MONETARY POLICY
Notes: m is the number of breaks estimated for the monetary policy rule given by (13) where the trimming parameter ε defines a minimum regime length of εT observations, where T = 148 (1971Q1 to 2007Q4). The maximum number of breaks considered is 5 for ε = 0.10 and 3 for ε = 0.20. Methods employed as in Table 1; data are discussed in Section 4.1. NA indicates no break dates are applicable.