Keywords:

  • Forecast combination;
  • Structural breaks;
  • Density forecasts

Abstract


Summary  We consider forecasting using a combination when no model coincides with a non-constant data generation process (DGP). Practical experience suggests that combining forecasts adds value, and can even dominate the best individual device. We show why this can occur when forecasting models are differentially mis-specified, and is likely to occur when the DGP is subject to location shifts. Moreover, averaging may then dominate over estimated weights in the combination. Finally, it cannot be proved that only non-encompassed devices should be retained in the combination. Empirical and Monte Carlo illustrations confirm the analysis.


1. INTRODUCTION


In the third of a century since Bates and Granger (1969), the combination of individual forecasts of the same event has often been found to outperform the individual forecasts, in the sense that the combined forecast delivers a smaller mean-squared forecast error (MSFE)—see inter alia Diebold and Lopez (1996) and Newbold and Harvey (2002) for recent surveys, and Clemen (1989) for an annotated bibliography. Studies such as Newbold and Granger (1974) provided early evidence consistent with that claim. Moreover, simple rules for combining forecasts, such as averages (i.e. equal weights), often work as well as more elaborate rules based on the relative past performance of the forecasts to be combined (see Stock and Watson 1999; Fildes and Ord 2002). Nevertheless, despite some potential explanations (such as Granger 1989), precisely why forecast combinations should work well does not appear to be fully understood. This paper addresses that issue.

There are a number of potential explanations. First, if two models provide partial, but incompletely overlapping, explanations, then some combination of the two might do better than either alone. In particular, if two forecasts were differentially biased (one upwards, one downwards), it is easy to see why combining could be an improvement over either. Similarly, if all explanatory variables were orthogonal, and models contained subsets of these, an appropriately-weighted combination could more completely reflect all the information. However, it is unclear why investigators would construct systematically biased or inefficient models; and there are other solutions to forecast biases and inefficiencies than pooling forecasts. Moreover, it is less easy to see why a combination need improve over the best of a group, particularly if there are some decidedly poor forecasts in that group.

Second, in non-stationary time series, most forecasts will fail in the same direction when forecasting over a period within which a break unexpectedly occurs. Combination is unlikely to provide a substantial improvement over the best individual forecasts in such a setting. Nevertheless, what will occur when forecasting after a location shift depends on the extent of model mis-specifications, data correlations, the sizes of breaks and so on, so combination might help. Since a theory of forecasting allowing for model mis-specification interacting with intermittent location shifts has explained many other features of the empirical forecasting literature (see Clements and Hendry 1999), we explore the possibility that it can also account for the benefits from pooling.

Third, averaging reduces variance to the extent that separate sources of information are used. Since we allow all models to be differentially mis-specified, such variance reduction remains possible. Nevertheless, we will ignore sample estimation uncertainty to focus on specification issues, so any gains from averaging that also reduce that source of variance will be additional to those we delineate.1

Next, an alternative interpretation of combination is that, relative to a ‘baseline’ forecast, additional forecasts act like intercept corrections (ICs). It is well known that appropriate ICs can improve forecasting performance not only if there are structural breaks, but also if there are deterministic mis-specifications. Indeed, Clements and Hendry (1999) present eight distinct interpretations of the role that ICs can play in forecasting, and for example, interpret the cross-country pooling in Hoogstrate et al. (2000) as a specific form of IC.

Finally, pooling can also be viewed as an application of the Stein–James ‘shrinkage’ estimation (see e.g. Judge and Bock 1978). If the unknown future value is viewed as a ‘meta-parameter’ of which all the individual forecasts are estimates, then averaging may provide a ‘better’ estimate thereof. Below, we consider whether data-based weighting will be useful when the process is subject to unanticipated breaks.

Thus, we evaluate the possible benefits of combining forecasts in light of the nature of the economic system and typical macroeconomic models thereof, to discern the properties of the system and models—and the relationships between the two—that result in forecast combination reducing MSFEs. In particular, given that a general theory of economic forecasting which allows for structural breaks and mis-specified models has radically different implications from one that assumes stationarity and well-specified models (see Clements and Hendry 1999; Hendry and Clements 2003), we explore the role of forecast combinations in the former framework.

Section 2 confirms that combinations of forecasts are ineffective when forecasting using the correct conditional expectation in a weakly stationary process. Thus, departures from ‘optimality’, due to mis-specification, mis-estimation, or non-stationarities are necessary to explain gains from combination. Section 3 considers whether combination could deliver gains in a weakly-stationary process when forecasting models are differentially mis-specified by using only subsets of the relevant information. We show there is a range of values of the parameters of the data generation process (DGP) where this can occur, but gains are not guaranteed. Nevertheless, the logic of why gains ensue in such a setting points to why combination might work in general, partly by providing ‘insurance’ against obtaining the worst forecasts. Section 4 notes alternative ways of implementing forecast combinations, and Section 5 considers the role of encompassing—which is violated by the need to pool—and discusses whether only non-encompassed models are worth pooling. If the weights used in any combination are estimated, then they directly reflect a lack of encompassing; however, if pre-fixed weights, such as the average, are used, encompassed models may lower rather than raise the efficiency of the combined forecast. Section 6 extends the analysis to processes subject to location shifts, where the combination can dominate in MSFE. Moreover, previously encompassed models may later become dominant, and the earlier dominant model may fail badly, so averaging across all contenders cannot be excluded as a sensible strategy. Section 7 provides an empirical illustration based on the data set originally used by Bates and Granger (1969), and by demonstrating the efficacy of ICs, suggests that combination works there because of location shifts of the form underlying our theoretical approach. The Monte Carlo study of the behaviour in finite samples of our theoretical approximations in Section 8 supports their applicability in practice. Section 9 considers forecast densities after pooling, and Section 10 concludes.

2. FORECASTING BY THE CONDITIONAL EXPECTATION

  1. Top of page
  2. Abstract
  3. 1. INTRODUCTION
  4. 2. FORECASTING BY THE CONDITIONAL EXPECTATION
  5. 3. FORECASTS FROM MIS-SPECIFIED CONSTANT MODELS
  6. 4. IMPLEMENTING FORECAST COMBINATIONS
  7. 5. THE ROLE OF ENCOMPASSING
  8. 6. COMBINING UNDER EXTRANEOUS STRUCTURAL BREAKS
  9. 7. EMPIRICAL ILLUSTRATION
  10. 8. A MONTE CARLO STUDY
  11. 9. POOLING AND FORECAST DENSITIES
  12. 10. CONCLUSION
  13. ACKNOWLEDGEMENTS
  14. REFERENCES
  15. Appendix

Consider a weakly stationary n-dimensional stochastic process {x_t} with density D_x(x_t | X_{t−1}, θ), which is a function of past information X_{t−1} = (x_1, …, x_{t−1}), for t = 1, 2, …. Forecasts of x_{T+h} based on the conditional expectation given information up to period T,

  x̂_{T+h} = E[x_{T+h} | X_T],   (1)

are conditionally unbiased,

  E[x_{T+h} − x̂_{T+h} | X_T] = 0,   (2)

and no other predictor conditional on only XT has a smaller MSFE matrix,

  V[x_{T+h} − x̃_{T+h} | X_T] − V[x_{T+h} − x̂_{T+h} | X_T] ≥ 0 for any x̃_{T+h} = g(X_T).   (3)

Here '≥ 0' denotes a positive semi-definite difference. Moreover, both (2) and (3) hold for all h. Consequently, on an MSFE basis for forecasting x_{T+h}, the conditional expectation cannot be beaten, as is well known. However, the empirical evidence that combination is useful clearly indicates that the above framework is inappropriate as an analytic basis.
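As a concrete check on (1)–(3), the following minimal sketch—ours, with an arbitrary AR(1) parameter—draws x_T from its stationary distribution and compares the MSFE of the conditional expectation ρx_T with that of a deliberately distorted predictor:

```python
import numpy as np

# Minimal check of (1)-(3): for a stationary AR(1) x_t = rho*x_{t-1} + e_t,
# no predictor based on X_T beats the conditional mean rho*x_T on MSFE.
rng = np.random.default_rng(0)
rho, reps = 0.8, 100_000

x_T = rng.standard_normal(reps) / np.sqrt(1 - rho**2)  # stationary draws of x_T
x_next = rho * x_T + rng.standard_normal(reps)         # realizations of x_{T+1}

print(np.mean((x_next - rho * x_T) ** 2))   # conditional mean: ~1.0 = V[e]
print(np.mean((x_next - 0.5 * x_T) ** 2))   # distorted predictor: larger
```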

There are several possible explanations for the empirical outcome. First, the forecasts might be based on only subsets of the available information X_T. Second, the functions of past data used to form those forecasts may not coincide with the conditional expectation. Third, parameter estimation uncertainty may be sufficiently large that averaging is advantageous. Finally, the underlying data density D_x(x_t | X_{t−1}, θ) may not be constant, in which case the first two mistakes are almost bound to occur as well, particularly if location shifts are the source of the non-constancy.2 The proliferation of competing forecasting methods and models is also evidence for the first two potential explanations. Here, we first explore the implications of combining the forecasts from mis-specified models when D_x(·) is constant, then consider what happens when the DGP is subject to intermittent breaks.

3. FORECASTS FROM MIS-SPECIFIED CONSTANT MODELS


To articulate our approach, we approximate the DGP D_x(x_t | X_{t−1}, θ) by the constant-parameter first-order vector autoregression (VAR),

  x_t = Υ x_{t−1} + ε_t,   (4)

where ε_t ∼ IN_n[0, Ω_ε]. Section 6 considers the impacts of breaks due to location shifts. We focus on 1-step ahead forecasts for T + 1 from time T purely to simplify the algebra; no issue of principle seems involved in generalizing to multi-step forecasts. Also, we restrict attention to forecasting the scalar y_t, which is one element of x_t, and in this section, assume that, in the absence of structural breaks, x_t in (4) has been reduced to weak stationarity by appropriate transformations. Thus, partitioning x_t′ = (x_{1,t}′ : x_{2,t}′), the model determining y_t is given by

  y_t = β_1′w_t + β_2′z_t + e_t,   (5)

where w_t = x_{1,t−1}, z_t = x_{2,t−1} and e_t ∼ IN[0, σ_e²], independently of x_{t−1}. Since the processes are all weakly stationary, intercepts are set to zero.

Two investigators, unaware of the nature of the process in (5), fit separate models of the form:

  y_t = a′w_t + v_{w,t}   (6)

and

  y_t = b′z_t + v_{z,t}.   (7)

Each model is mis-specified by omitting the components which the other includes—the absence of overlapping variables seems an inessential simplification (the switch to w_t and z_t is to ease notation below, but note that w_{T+1} and z_{T+1} are known at the forecast origin). Moreover, as we believe the explanation for any benefits from combination derives from specification—rather than estimation—issues, we further simplify by neglecting sampling variability in the coefficients a and b where necessary to obtain sharper results. The assumption that the partial models span the information set is to simplify the algebra, and does not seem consequential: Section 8 provides a Monte Carlo illustration.

It must be stressed that in such a constant-parameter framework, pooling the information will produce the optimal forecast, as the resulting model coincides with the DGP, whereas pooling the forecasts will not in general (but see Granger (1989) for an example). However, that implication need not generalize to non-constant DGPs.

Let

  w_t = φ_{w,t} + η_{w,t} and z_t = φ_{z,t} + η_{z,t},   (8)

where φ_{w,t} and φ_{z,t} are fixed functions of past variables, and

  V[(η_{w,t}′ : η_{z,t}′)′] = [Ω_ww  Ω_wz; Ω_zw  Ω_zz] = Ω.   (9)

Our interest is in comparing the accuracy of the forecasts from the models in (6) and (7) against that of a pooled forecast, based on MSFEs (as that is the criterion most frequently applied in practice, but see Clements and Hendry 1993). We set φ_{w,t} = φ_{z,t} = 0, so both dynamics and deterministic factors are ignored, and this is known to the investigators, so intercepts and further lags are omitted: Section 8 investigates dynamics via Monte Carlo simulations.

The 1-step ahead forecast from (6) is denoted ŷ_{w,T+1} = â′w_{T+1}, so the forecast error is

  ê_{w,T+1} = y_{T+1} − ŷ_{w,T+1} = (β_1 − â)′w_{T+1} + β_2′z_{T+1} + e_{T+1}.   (10)

The corresponding forecast from (7) uses ŷ_{z,T+1} = b̂′z_{T+1} with

  ê_{z,T+1} = y_{T+1} − ŷ_{z,T+1} = β_1′w_{T+1} + (β_2 − b̂)′z_{T+1} + e_{T+1}.   (11)

Neither forecast should encompass the other. Section 5 considers testing for non-encompassing before forecast combining.

In the Appendix, we detail the derivation of the MSFEs for the two models. Letting M[·] denote MSFE, these are given by

  M[ŷ_{w,T+1}] ≈ σ_e² + β_2′Ω_{η_zw}β_2   (12)

and

  M[ŷ_{z,T+1}] ≈ σ_e² + β_1′Ω_{η_wz}β_1,   (13)

where the approximations result from ignoring parameter estimation uncertainty, that is, terms of O_p(T^{−1}). In these expressions, Ω_{η_zw} = V[η_{zw,t}], where V[·] denotes a variance, and η_{zw,t} is defined by:

  η_{zw,t} = z_t − Π_zw′w_t, with Π_zw = Ω_ww^{−1}Ω_wz.   (14)

Similarly, Ω_{η_wz} = V[η_{wz,t}], where

  η_{wz,t} = w_t − Π_wz′z_t, with Π_wz = Ω_zz^{−1}Ω_zw.   (15)

To a first approximation, then, the MSFEs depend on the importance in the DGP of the omitted variables (e.g. for the model given by (6), with MSFE given by (12), this is β_2, the coefficient on z_t), and will be greater to the extent that the included variables do not explain the excluded (measured by Ω_{η_zw} in (12) for the model in (6)). To order the outcomes in terms of accuracy, we assume M[ŷ_{w,T+1}] < M[ŷ_{z,T+1}], so β_2′Ω_{η_zw}β_2 < β_1′Ω_{η_wz}β_1. Consequently, ŷ_{w,T+1} would transpire on average to be the more accurate forecast here: equivalent results hold for the opposite ranking.

Writing a combined forecast as

  ȳ_{T+1} = λŷ_{w,T+1} + (1 − λ)ŷ_{z,T+1}, with λ ∈ [0, 1],   (16)

we derive in the Appendix the associated MSFE as

  M[ȳ_{T+1}] ≈ σ_e² + λ²β_2′Ω_{η_zw}β_2 + (1 − λ)²β_1′Ω_{η_wz}β_1 + 2λ(1 − λ)β_2′Ω_{η_zw η_wz}β_1   (17)

(to the same order of approximation as for the individual forecasts), where

  Ω_{η_zw η_wz} = E[η_{zw,t}η_{wz,t}′] = E[(z_t − Π_zw′w_t)(w_t − Π_wz′z_t)′]
                = −Ω_zw(I − Ω_ww^{−1}Ω_wz Ω_zz^{−1}Ω_zw),

and, for example, Ω_zw = E[z_t w_t′], as indicated. The last line in the above is the matrix analogue of (1 − R²_wz), and has a negative sign: intuitively, if the regression of z_t on w_t over- (under-) estimates, the reverse regression will do the opposite.

Stock and Watson (1999) find that a combination obtained by pooling forecasts across many methods does well, using either the mean or median forecast, so we focus on the case where λ = 0.5. Then

  M[ȳ_{T+1}] ≈ σ_e² + ¼(β_2′Ω_{η_zw}β_2 + β_1′Ω_{η_wz}β_1 + 2β_2′Ω_{η_zw η_wz}β_1),   (18)

as against the smaller of the two individual forecast errors:

  M[ŷ_{w,T+1}] ≈ σ_e² + β_2′Ω_{η_zw}β_2.

Therefore

  M[ȳ_{T+1}] < M[ŷ_{w,T+1}]

if and only if

  β_1′Ω_{η_wz}β_1 + 2β_2′Ω_{η_zw η_wz}β_1 < 3β_2′Ω_{η_zw}β_2.

Let β_2′Ω_{η_zw}β_2 = kβ_1′Ω_{η_wz}β_1, where k < 1 given our ordering; then combination dominance requires

  2β_2′Ω_{η_zw η_wz}β_1 < (3k − 1)β_1′Ω_{η_wz}β_1.

This is more likely to hold if the marginal effects of w and z on y in the DGP are of the same sign and 'match' the sign of Ω_zw.

In the special case that Ω_zw = 0, so that Ω_{η_zw η_wz} = 0, combination dominance requires

  1/3 < k < 1,

so an improvement over the better individual forecast by averaging is possible within that range (and similarly for the alternative ranking). However, the larger forecast error was

  M[ŷ_{z,T+1}] ≈ σ_e² + β_1′Ω_{η_wz}β_1,

as against (18), so when Ω_zw = 0, dominance requires

  β_2′Ω_{η_zw}β_2 < 3β_1′Ω_{η_wz}β_1, i.e. k < 3,

which is bound to hold. Thus, averaging guarantees 'insurance', and may provide dominance when the models are differentially mis-specified for a constant DGP.
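A minimal simulation sketch of this section's setting—our code, with scalar w and z and parameter values chosen to match the first (no-shift) design of Table 2 in Section 8—illustrates both the 'insurance' property and outright dominance of the average:

```python
import numpy as np

# Constant DGP y = b1*w + b2*z + e; two subset models, each omitting one regressor.
rng = np.random.default_rng(1)
b1, b2, s_w, s_z, r_wz, s_e = 1.0, 1.0, 1.5, 1.0, 0.75, 0.4
T, reps = 100, 20_000
cov = [[s_w**2, r_wz * s_w * s_z], [r_wz * s_w * s_z, s_z**2]]

err = np.zeros((reps, 3))                      # columns: M_w, M_z, average
for i in range(reps):
    wz = rng.multivariate_normal([0.0, 0.0], cov, size=T + 1)
    w, z = wz[:, 0], wz[:, 1]
    y = b1 * w + b2 * z + s_e * rng.standard_normal(T + 1)
    a_hat = w[:T] @ y[:T] / (w[:T] @ w[:T])    # least-squares slope, model M_w
    b_hat = z[:T] @ y[:T] / (z[:T] @ z[:T])    # least-squares slope, model M_z
    f_w, f_z = a_hat * w[T], b_hat * z[T]      # 1-step forecasts of y_{T+1}
    err[i] = y[T] - np.array([f_w, f_z, 0.5 * (f_w + f_z)])

print("MSFE of M_w, M_z, average:", np.mean(err**2, axis=0))
```

With these values the approximations (12), (13) and (18) give MSFEs of about 0.60, 1.14 and 0.27 respectively (cf. row 1 of Table 2), so the average beats both individual forecasts.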

3.1. Scalar case

In the scalar case where n_1 = n_2 = 1, somewhat more transparent results can be obtained. Denote the correlation between w and z by r_wz and their variances by σ_w² and σ_z²; then domination by the average over the best requires

  β_1²σ_w² − 2r_wzβ_1β_2σ_wσ_z < 3β_2²σ_z²,

for ρ = σ_w/σ_z > 0 with β_2²σ_z² = kβ_1²σ_w². Normalizing such that β_1 = β_2 = 1, then k = 1/ρ², so ρ > 1, and dominance requires:

  ρ² − 2r_wzρ − 3 < 0.

This is bound to hold when ρ is close to unity, and also for ρ < 3 when r_wz is close to +1.

Also, against the larger forecast error (again using the normalized parameter values), dominance requires

  3ρ² + 2r_wzρ − 1 > 0,

which must always hold even when r_wz < 0. Thus, combination—even by averaging—seems likely to be advantageous here.
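A small numeric scan—ours—of the two scalar conditions as reconstructed above shows where the average beats the better model, and confirms that it always beats the worse one for ρ > 1:

```python
import numpy as np

# Scan the two scalar conditions of Section 3.1 (as reconstructed above):
# the average beats the better model iff rho**2 - 2*r*rho - 3 < 0, and the
# worse model iff 3*rho**2 + 2*r*rho - 1 > 0 (normalized case, rho > 1).
rhos = np.linspace(1.001, 4.0, 4000)
for r in (-0.75, 0.0, 0.5, 0.95):
    ok_best = rhos[rhos**2 - 2 * r * rhos - 3 < 0]
    always_ok_worst = bool(np.all(3 * rhos**2 + 2 * r * rhos - 1 > 0))
    print(f"r_wz = {r:+.2f}: beats better model for rho < {ok_best.max():.2f}; "
          f"always beats worse model: {always_ok_worst}")
```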

4. IMPLEMENTING FORECAST COMBINATIONS


Forecast combination can be implemented in many different ways (see Granger and Ramanathan 1984; Diebold 1988; Wall and Correia 1989; Coulson and Robins 1993 among others). Potential approaches range from simple averaging to more complex schemes designed to give optimal combination weights. In this last case, the weights are often estimated to optimize some criterion (e.g. minimizing the MSFE of the combined forecast) on a post-model-estimation ‘training sample’ for which the realizations are available, prior to undertaking genuine out-of-sample forecasting. Sometimes the individual models' explanatory variables will be assumed known, and the true values can be conditioned on, at either training or forecasting stages, or alternatively these may themselves be forecast.

Forecasting is seldom a ‘one-off’ venture, and typically forecasts will be made at a number of successive forecast origins. The individual models may be re-specified and/or re-estimated at each origin, as may the combination weights—one can imagine the training window moving through the sample as the forecast origin progresses. The estimation windows may be of fixed length so that early observations are dropped, or may expand indefinitely. The success (or otherwise) of forecast combining is likely to depend in part on how it is implemented, so that explanations of its efficacy will be multi-faceted. Nevertheless, given a careful articulation of the context in which forecasting is undertaken, it should be possible to determine which factors are likely to play a key role.3

4.1. Forecast combination as a bias correction

Suppose {ŷ_{1,T+i}, ŷ_{2,T+i}} denotes a set of forecasts over a training period i = 1, …, R, where ŷ_{1,T+i} is the 1-step ahead forecast of y at T + i based on T + i − 1, etc., and the parameter estimates are based on a sample over 1, …, T. We allow the forecasts to be biased, possibly because they are generated from assumed constant-parameter models in the presence of structural breaks: Granger (1989) recommends 'unbias(ing) the component forecasts' prior to combination. Thus, E[y_{T+i} − ŷ_{1,T+i}] ≠ 0 and E[y_{T+i} − ŷ_{2,T+i}] ≠ 0 for i = 1, …, R, and this is reflected in non-zero values of the corresponding sample moments. Suppose the weights are calculated to minimize the MSFE of the combined forecast, imposing the restriction that the weights sum to unity, and allowing for bias by including an intercept. Letting y, ŷ_1 and ŷ_2 denote the vectors of observations over T + 1 to T + R, the weight α is estimated from

  y = δi + αŷ_1 + (1 − α)ŷ_2 + u,   (19)

where i is an R-dimensional vector of 1s, or

  y − ŷ_2 = δi + α(ŷ_1 − ŷ_2) + u.

By the Frisch–Waugh theorem (see Frisch and Waugh 1933), one can equivalently run the regression of M_i(y − ŷ_2) on M_i(ŷ_1 − ŷ_2), where M_i = I_R − i(i′i)^{−1}i′, so that

  α̂ = [(ŷ_1 − ŷ_2)′M_i(ŷ_1 − ŷ_2)]^{−1}(ŷ_1 − ŷ_2)′M_i(y − ŷ_2).

Using

  M_i(y − ŷ_1) = (y − ŷ_1) − b̄_1 i = y − ỹ_1,

where b̄_1 = R^{−1}i′(y − ŷ_1) is the sample estimate of the bias in ŷ_1, and ỹ_1 = ŷ_1 + b̄_1 i is the bias-corrected forecast. Similarly, M_i(y − ŷ_2) = y − ỹ_2, where

  ỹ_2 = ŷ_2 + b̄_2 i, with b̄_2 = R^{−1}i′(y − ŷ_2),

so

  M_i(ŷ_1 − ŷ_2) = M_i[(y − ŷ_2) − (y − ŷ_1)] = (y − ỹ_2) − (y − ỹ_1) = ỹ_1 − ỹ_2.

The combination forecast is

  ỹ_c = α̂ỹ_1 + (1 − α̂)ỹ_2,

that is, a combination of the bias-corrected forecasts. Bias correction should account for a reduction in the MSFE, so that the appropriate benchmarks for the combined forecast should be ỹ_1 and ỹ_2 rather than ŷ_1 and ŷ_2. In practice, the combined forecast is usually only compared to the uncorrected individual forecasts.

An alternative interpretation of the role of δ in (19) is as an 'IC' for the forecast given by ŷ_1. This interpretation is clearer if we assume there is just a single forecast ŷ_1, so that the problem is simply to calculate δ in

  y = δi + ŷ_1 + u,   (20)

or

  y − ŷ_1 = δi + u,

so that δ̂ = R^{−1}i′(y − ŷ_1) = b̄_1, namely the sample estimate of the bias. If δ̂ > 0 because of a tendency to under-predict, the intercept-corrected forecasts ŷ_1 + δ̂i are revised up by that amount.
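The training-sample algebra of this subsection is easily verified numerically. The sketch below—our code, on synthetic forecasts with made-up biases—checks that the restricted regression (19), its Frisch–Waugh demeaned form, and the combination of bias-corrected forecasts all coincide:

```python
import numpy as np

rng = np.random.default_rng(7)
R = 40
y = rng.standard_normal(R)                     # realizations over training sample
f1 = y + 0.5 + 0.8 * rng.standard_normal(R)    # biased forecast 1
f2 = y - 0.3 + 1.0 * rng.standard_normal(R)    # biased forecast 2

# (19): y - f2 = delta + alpha*(f1 - f2) + u, weights restricted to sum to 1
X = np.column_stack([np.ones(R), f1 - f2])
delta, alpha = np.linalg.lstsq(X, y - f2, rcond=None)[0]

# Frisch-Waugh: demean both sides, then regress without the intercept
d = lambda v: v - v.mean()
alpha_fw = d(f1 - f2) @ d(y - f2) / (d(f1 - f2) @ d(f1 - f2))
assert np.isclose(alpha, alpha_fw)

# equivalently, combine the bias-corrected forecasts
f1c = f1 + (y - f1).mean()                     # bias-corrected forecast 1
f2c = f2 + (y - f2).mean()                     # bias-corrected forecast 2
combo = alpha * f1c + (1 - alpha) * f2c
assert np.allclose(combo, delta + alpha * f1 + (1 - alpha) * f2)
print("alpha_hat =", round(alpha, 3), "delta_hat =", round(delta, 3))
```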

4.2. Forecast-error variances

Even if pooling were to improve the accuracy of the resulting forecasts relative to all the individual models, to be of practical value, the resulting forecast would need a measure of error variance. Consider the fixed-weight combination in (16), so

  ē_{T+1} = y_{T+1} − ȳ_{T+1} = λ(y_{T+1} − ŷ_{w,T+1}) + (1 − λ)(y_{T+1} − ŷ_{z,T+1}),

or

  ē_{T+1} = λê_{w,T+1} + (1 − λ)ê_{z,T+1},

so

  V[ē_{T+1}] = λ²V[ê_{w,T+1}] + (1 − λ)²V[ê_{z,T+1}] + 2λ(1 − λ)C[ê_{w,T+1}, ê_{z,T+1}].

The first two components on the right-hand side can be calculated from the estimated models, and the third could be derived from any available historical track record on past forecast errors. Alternatively, the historical MSFE of ȳ could be used directly, and indeed may be the only feasible approach when many models need to be pooled. In stationary processes, V[ē_{T+1}] would allow valid forecast intervals to be constructed.
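In code, the variance decomposition just given is one line; the sketch below—ours, using a hypothetical track record of past 1-step errors to estimate the covariance term—illustrates:

```python
import numpy as np

def combined_error_variance(e1, e2, lam=0.5):
    """Variance of lam*e1 + (1-lam)*e2 from histories of 1-step errors."""
    v1, v2 = np.var(e1, ddof=1), np.var(e2, ddof=1)
    c12 = np.cov(e1, e2, ddof=1)[0, 1]
    return lam**2 * v1 + (1 - lam)**2 * v2 + 2 * lam * (1 - lam) * c12

rng = np.random.default_rng(3)
e1 = rng.standard_normal(50)                    # hypothetical error history 1
e2 = 0.5 * e1 + 0.8 * rng.standard_normal(50)   # correlated error history 2
print(combined_error_variance(e1, e2))
```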

5. THE ROLE OF ENCOMPASSING


When fixed weights are used (as in an average), it is easy to illustrate a case where only non-encompassed models are worth pooling. In particular, when (5) is one of the forecasting equations, averaging with any subset model, or models, will produce systematically poorer forecasts. This should hold more generally for weakly stationary processes—since all other forecasts are then inferentially redundant—and suggests testing for forecast encompassing prior to averaging (see Harvey et al. 1998 and Diebold 1989, who relate encompassing to forecast combinations). Ericsson and Marquez (1993) and Andrews et al. (1996) provide empirical examples of forecast-encompassing tests. However, Section 6.4 provides a counter example in processes subject to location shifts where an encompassed model may later dominate: since breaks seem pandemic in macroeconomics, no general result can be established.

When weights are estimated, two offsetting forces operate. First, under weak stationarity, there is the detrimental effect of the uncertainty added by estimation of the weights. Second, there is an offset from the benefit of choosing the best weights. Overall, we suspect estimation probably does not explain much of the success of pooling: whether or not the weights are estimated, combining must be better than the worst of the individual forecasts, and could beat the best. Section 8 shows that this occurs in the Monte Carlo.

When the weights are estimated by regression, then any forecast which contributes to a combination is not encompassed by the others (see Chong and Hendry 1986). Thus, estimated weights assign little role to encompassed forecasts, as their weights will be insignificant. While the need to pool violates encompassing (see Lu and Mizon 1991; Ericsson 1992), and so reveals non-congruence, congruence per se cannot be established as a necessary feature for good forecasting (see Hendry and Clements 2003). Indeed, the next section suggests that averaging might be preferable when unanticipated breaks can occur. Section 8 confirms that estimated weights need not dominate over fixed.

6. COMBINING UNDER EXTRANEOUS STRUCTURAL BREAKS


Clements and Hendry (1999), Hendry and Doornik (1997) and Hendry (2000) establish that location shifts are the problematic class of structural breaks in a forecasting context, so we focus on those. We consider a DGP where the regressor processes x_{1,t−1} and x_{2,t−1} in (5) experience breaks at different times, but the forecasting model remains unchanged. Thus, φ_{w,t} and φ_{z,t} in (8) are non-constant, beyond being functions of past variables. The DGP for the y process in terms of w_t and z_t remains

  y_t = β_1′w_t + β_2′z_t + e_t,   (21)

where e_t ∼ IN[0, σ_e²]. As before, dynamics and intercepts are assumed absent merely to simplify the algebra, so prior to forecasting, φ_{z,t} = φ_{w,t} = 0, whereas in-sample

  (w_t′ : z_t′)′ ∼ IN[(φ_{w,t}′ : φ_{z,t}′)′, Ω].   (22)

Again, the investigators fit separate models of the form

  y_t = c_w + a′w_t + v_{w,t},   (23)
  y_t = c_z + b′z_t + v_{z,t}.   (24)

Now intercepts are included, to offset any mean values induced by location shifts. We first allow only the z process to shift by φ_{z,T+1} = μ_z (redefined to simplify notation), which is in fact a change at the end of the estimation sample, influencing the forecast-period behaviour of y. Since the shifts occur in the processes determining the regressors, we refer to these as extraneous breaks. Any breaks in variables that influence the DGP but are excluded from both (23) and (24) would act to influence them in a similar way to the case we examine, but without the offset from averaging over models that included the breaking variables. Breaks in the intercept of the DGP equation are noted below.

The 1-step ahead forecast from (23) is

  ŷ_{w,T+1} = ĉ_w + â′w_{T+1},

so the forecast error ê_{w,T+1} = y_{T+1} − ŷ_{w,T+1} is

  ê_{w,T+1} = β_1′w_{T+1} + β_2′z_{T+1} + e_{T+1} − ĉ_w − â′w_{T+1}.   (25)

The corresponding forecast from (24) uses ŷ_{z,T+1} = ĉ_z + b̂′z_{T+1} with forecast error ê_{z,T+1} = y_{T+1} − ŷ_{z,T+1},

  ê_{z,T+1} = β_1′w_{T+1} + β_2′z_{T+1} + e_{T+1} − ĉ_z − b̂′z_{T+1}.

Next, we derive the conditional biases and variances of the forecast errors. This requires the relationship equations between the regressors, of which the first is given by

  z_t = Π_zw′w_t + η_{zw,t},   (26)

so

  E[z_t | w_t] = Π_zw′w_t, where Π_zw = Ω_ww^{−1}Ω_wz.   (27)

Thus, from the estimation sample, prior to any shifts, and assuming least squares estimates of in-sample parameters,

  E[â] ≈ β_1 + Π_zw β_2 and E[ĉ_w] ≈ 0,

so

  ê_{w,T+1} ≈ β_2′(z_{T+1} − Π_zw′w_{T+1}) + e_{T+1} = β_2′η_{zw,T+1} + e_{T+1},

using (27). Again we ignore O_p(T^{−1}) terms arising from estimation increasing MSFEs, so

  E[ê_{w,T+1}] = β_2′μ_z,

with

  V[ê_{w,T+1}] = σ_e² + β_2′Ω_{η_zw}β_2.

A break may also be induced in the model which includes z when z_{T+1} shifts, because

  E[η_{wz,T+1}] = E[w_{T+1}] − Π_wz′E[z_{T+1}] = −Π_wz′μ_z = κ,

so κ = −Π_wz′μ_z, where Π_wz = Ω_zz^{−1}Ω_zw, leading to a forecast error of

  ê_{z,T+1} ≈ β_1′η_{wz,T+1} + e_{T+1}, with E[ê_{z,T+1}] = β_1′κ.

Then the squared error is

  M[ŷ_{z,T+1}] ≈ σ_e² + β_1′Ω_{η_wz}β_1 + (β_1′Π_wz′μ_z)².

We continue to assume that, prior to the break, the model including w is the more accurate, so β_2′Ω_{η_zw}β_2 = kβ_1′Ω_{η_wz}β_1 for k < 1. Then, to the approximations involved,

  M[ŷ_{w,T+1}] ≈ σ_e² + kβ_1′Ω_{η_wz}β_1 + (β_2′μ_z)²,
  M[ŷ_{z,T+1}] ≈ σ_e² + β_1′Ω_{η_wz}β_1 + (β_1′Π_wz′μ_z)².

Consequently, ŷ_{z,T+1} could be the more accurate forecast here, despite being less accurate prior to the break. This is more likely the larger μ_z and the less correlated are z and w—in the limit, when Ω_zw = 0, b̂ is a consistent estimator of β_2, and the term involving μ_z drops out of the MSFE for ŷ_{z,T+1}.

The average forecast is

  ȳ_{T+1} = ½(ŷ_{w,T+1} + ŷ_{z,T+1}),

with error

  ē_{T+1} = ½(ê_{w,T+1} + ê_{z,T+1}) ≈ ½(β_2′η_{zw,T+1} + β_1′η_{wz,T+1}) + e_{T+1},

so

  E[ē_{T+1}] = ½(β_2′μ_z + β_1′κ) = ½(β_2 − Π_wz β_1)′μ_z.

Again, ignoring terms of O_p(T^{−1}),

  M[ȳ_{T+1}] ≈ σ_e² + ¼(1 + k)β_1′Ω_{η_wz}β_1 + ½β_2′Ω_{η_zw η_wz}β_1 + ¼[(β_2 − Π_wz β_1)′μ_z]².

Thus, the combined forecast could beat both individual forecasts, depending on the size of the unmodelled shift in the z process relative to the error variances.

To illustrate this, we consider two simplifications: first Ω_wz = 0, then a scalar case in Section 6.1. Against ŷ_{w,T+1} (the more accurate forecast in the absence of breaks) in the first simplification, the average forecast dominates when

  (1 − 3k)β_1′Ω_{η_wz}β_1 < 3(β_2′μ_z)²,

which is bound to hold for k > 1/3 and could hold even for small k. Against the second forecast,

  (β_2′μ_z)² < (3 − k)β_1′Ω_{η_wz}β_1.

If we approximate by k = 1, then both hold when

  (β_2′μ_z)² < 2β_1′Ω_{η_wz}β_1 and −2β_1′Ω_{η_wz}β_1 < 3(β_2′μ_z)²,

where the last inequality must be true. If instead, k is small, then

  β_1′Ω_{η_wz}β_1/3 < (β_2′μ_z)² < 3β_1′Ω_{η_wz}β_1.

Thus, irrespective of whether k is large or small, the average can ‘win’ against both mis-specified forecasting devices when the DGP experiences location shifts.
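A simulation sketch makes the point concrete. The code below—ours, mirroring the design later reported in row 4 of Table 2 (Section 8)—adds a one-standard-deviation location shift to z in the forecast period only; the average is then both nearly unbiased and far more accurate than either model:

```python
import numpy as np

# Section 6 sketch: shift the z process at T+1 by mu_z, models with intercepts.
rng = np.random.default_rng(2)
b1, b2, s_w, s_z, r_wz, s_e, mu_z = 1.0, 1.0, 1.5, 1.0, 0.75, 0.4, 1.0
T, reps = 20, 20_000
cov = [[s_w**2, r_wz * s_w * s_z], [r_wz * s_w * s_z, s_z**2]]

err = np.zeros((reps, 3))                         # M_w, M_z, average
for i in range(reps):
    wz = rng.multivariate_normal([0.0, 0.0], cov, size=T + 1)
    w, z = wz[:, 0], wz[:, 1]
    z[T] += mu_z                                  # extraneous location shift at T+1
    y = b1 * w + b2 * z + s_e * rng.standard_normal(T + 1)
    Xw = np.column_stack([np.ones(T), w[:T]])     # estimation sample, model M_w
    Xz = np.column_stack([np.ones(T), z[:T]])     # estimation sample, model M_z
    cw = np.linalg.lstsq(Xw, y[:T], rcond=None)[0]
    cz = np.linalg.lstsq(Xz, y[:T], rcond=None)[0]
    f_w = cw[0] + cw[1] * w[T]
    f_z = cz[0] + cz[1] * z[T]
    err[i] = y[T] - np.array([f_w, f_z, 0.5 * (f_w + f_z)])

print("bias of M_w, M_z, average:", np.mean(err, axis=0))
print("MSFE of M_w, M_z, average:", np.mean(err**2, axis=0))
```

The printed biases and MSFEs should be close to the row-4 entries of Table 2: biases of roughly 1.0, −1.1 and −0.1, and MSFEs of roughly 1.6, 2.5 and 0.3 for M_w, M_z and the average respectively.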

6.1. Scalar illustration

In the scalar case when n_1 = n_2 = 1, using the approach in Section 3.1,

  M[ŷ_{w,T+1}] ≈ σ_e² + β_2²σ_z²(1 − r_wz²) + (β_2μ_z)²,
  M[ŷ_{z,T+1}] ≈ σ_e² + β_1²σ_w²(1 − r_wz²) + (β_1π_wz μ_z)²,

with

  M[ȳ_{T+1}] ≈ σ_e² + ¼(β_2²σ_z² + β_1²σ_w² − 2r_wzβ_1β_2σ_wσ_z)(1 − r_wz²) + ¼(β_2 − π_wz β_1)²μ_z².

Against ŷ_{w,T+1}, the average outperforms in the normalized case if (as π_wz = r_wzρ and kρ² = 1)

  σ_z²(1 − r_wz²)(ρ² − 2r_wzρ − 3) < μ_z²(1 + r_wzρ)(3 − r_wzρ).

When ρ is close to unity and r_wz is large, this reduces to

  μ_z² > 0,   (28)

which must hold. Alternatively, if r_wz = 0, then

  σ_z²(ρ² − 3) < 4μ_z²,

which will hold when the relative break μ_z²/σ_z² is sufficiently large.

Against ŷ_{z,T+1}, the average dominates if

  σ_z²(1 − r_wz²)(1 − 2r_wzρ − 3ρ²) < μ_z²(3r_wzρ − 1)(1 + r_wzρ).

As before, when ρ is close to unity and r_wz is large, we replicate (28). And if r_wz = 0, dominance requires

  μ_z² < σ_z²(3ρ² − 1).

Thus, dominance over both individual models simultaneously requires (for r_wz = 0)

  σ_z²(ρ² − 3)/4 < μ_z² < σ_z²(3ρ² − 1).

We conclude that there is a wide range over which averaging will dominate.

6.2. Later breaks

If, in a later forecast period, there is a break in the other process, then a similar analysis applies with the initial rankings of the individual models reversed. The algebra naturally becomes tedious, but the outcome must depend on both the absolute and relative sizes of the breaks, whether earlier breaks were modelled or not, the robustness of devices to breaks, and the sizes of the signal–noise ratios. There must exist combinations in which the average dominates over individual forecasting devices, on average over repeated forecasting episodes, because other devices swing from good to bad performance. Such later breaks may also vitiate the estimation of weights: when a method is doing well because it had not previously suffered forecast failure, estimation will attribute an above-average weight to it. Any later shift in that ‘current-best’ device would induce poorer performance than just the average.

6.3. Breaks in falsely included variables

If some of the variables that are included with non-zero coefficients in forecasting models are in fact irrelevant, then an analogous derivation is feasible to show that the effects of breaks favour combination. When such variables experience a location shift, the forecasts from that model will be poor, since the dependent variable will not have been affected. Any average will attribute a smaller weight than unity to such a set of forecasts, and so outperform it. Later breaks in other variables in rival models will similarly worsen their performance, leaving the average as the ‘winner’.

6.4. Within-equation breaks

Finally, a break in the y process introduces further complications, depending on the class of models under analysis. When a break occurs after forecasts are announced, all devices will fail, usually in the same direction, so averaging will neither resolve nor exacerbate that problem. However, some methods will continue to fail for many later periods—especially equilibrium-correction models (EqCMs)—again usually in the same direction (see e.g. Clements and Hendry 1999). If the EqCMs were previously the dominant approach, then we have the analogue of the conditions in Section 6, namely a switch in ranking between methods pre and post break, precisely the situation when averaging can dominate on average. Now, however, in the sub-periods, the average may or may not dominate. Moreover, estimated weights would emphasize the near encompassing of an EqCM over (say) a first-differenced autoregression, so could do less well than the average. Indeed, when simple—but robust—forecasting devices are encompassed by the EqCM, and so excluded from pooling, we have a counter example to any claim that only non-encompassed models should be included in the average.

6.5. Pooling information

In the present context, pooling of information should prove more successful than pooling forecasts for all extraneous breaks in correctly included variables, but not for breaks in the equation of interest, however generated. Since there are often many variables involved, the former type of break should be more frequent than the latter, supporting pooling information. On the other hand, false inclusion of variables that later break will be detrimental. In Hendry and Clements (2001), we explore these ideas to investigate the apparent success of ‘factor forecasts’, or diffusion indices, as in Stock and Watson (1999) and Forni et al. (2000).

Moreover, extraneous breaks become endogenous in a system, so our approach also points to an explanation for why multi-step (or dynamic) estimation may be advantageous: see Chevillon (2000). Conversely, when different transformations (e.g., log and linear) of the same variable are involved, pooling information seems less likely to dominate.

7. EMPIRICAL ILLUSTRATION


Bates and Granger (1969) provide an example of the usefulness of combining forecasts from linear and exponential trend models of output. Table 1 records an output index for the U.K. gas, electricity and water sectors for the years 1948 to 1965, along with forecast errors from linear and exponential trend models of output {y_t}, given by y_t = α + βt + error_t and ln(y_t) = a + bt + error_t, where t is a linear time trend. The forecast error in period t (t = 1950, …, 1965) is calculated from a forecast based on estimating the model on data up to t − 1. The results in the table show that although the exponential model forecasts have a much smaller sum of squared errors (SSE) than the linear model, nevertheless, a combination which attaches a small weight to the linear forecasts has a smaller SSE. For example, for a fixed weight of 0.16 on the linear forecasts, the combined forecast SSE is 78.8.4 This clearly supports combination, but it is of interest to interpret how the gain comes about given our analysis.

Table 1.  Forecasts of output indices, 1950–1965.

  Year                   Actual | Linear  Exponential  Combination | Linear bias-corrected  Exponential bias-corrected
  1948                    58.0  |    —        —            —       |       —        —
  1949                    62.0  |    —        —            —       |       —        —
  1950                    67.0  |   1.0      0.7          0.77     |      1.0      0.7
  1951                    72.0  |   0.7      0.1          0.21     |     −0.3     −0.6
  1952                    74.0  |  −2.5     −3.4         −3.24     |     −3.3     −3.8
  1953                    77.0  |  −2.2     −3.3         −3.11     |     −1.9     −2.4
  1954                    84.0  |   2.1      0.8          0.99     |      2.8      2.2
  1955                    88.0  |   1.0     −0.6         −0.37     |      1.2      0.4
  1956                    92.0  |   0.4     −1.7         −1.33     |      0.4     −0.7
  1957                    96.0  |   0.0     −2.5         −2.08     |     −0.0     −1.4
  1958                   100.0  |  −0.2     −3.2         −2.71     |     −0.3     −2.0
  1959                   103.0  |  −1.3     −4.8         −4.28     |     −1.4     −3.4
  1960                   110.0  |   1.9     −2.1         −1.47     |      2.0     −0.3
  1961                   116.0  |   3.2     −1.4         −0.71     |      3.1      0.4
  1962                   125.0  |   7.0      1.8          2.60     |      6.7      3.5
  1963                   133.0  |   8.8      2.8          3.74     |      8.0      4.3
  1964                   137.0  |   6.1     −0.9          0.26     |      4.7      0.3
  1965                   145.0  |   8.0     −0.0          1.26     |      6.3      1.1
  Sample bias                   |   2.1     −1.1         −0.6      |      1.8     −0.1
  Sum of squared errors         | 263.3     84.4         78.8      |    211.9     77.0

  Note: The last five columns report 1-step forecast errors. The output series is the output index for the gas, electricity and water sector, given in Bates and Granger (1969, Table A1, p. 462). The combination forecast has fixed weights of 0.16 and 0.84 on the (uncorrected) linear and exponential forecasts.

The forecast errors from the linear model become large and positive from around 1961 onwards, indicating that the constant-absolute-increase model is inappropriate. On average, the exponential model over-predicts (negative errors), albeit to a lesser extent. Combination is seen to work by tempering the negative errors of the more accurate exponential model with the predominantly positive errors of the linear model over the 1955–1961 period. This view is supported by the SSEs of the bias-corrected forecast errors (see the last two columns of the table), and the results of combining the bias-corrected forecasts. The bias-corrected forecast of period t is calculated by adding the sample mean of the forecast errors up to period t − 1 to the forecast of period t. Because the bias term is calculated from past forecast errors up to that point, it adapts only slowly to the run of positive errors in the linear forecasts of the 1960s. The SSE of the bias-corrected exponential forecasts is 77, less than the combined forecast SSE of 78.8 (with a weight of 0.16), but more pertinently, we find that any fixed-weight combination of the bias-corrected forecasts, with weights in the interval (0, 1), has a larger SSE than that of the exponential model forecasts.5 Of course, the fixed-weight combination forecasts discussed are not feasible, in the sense that they are based on knowledge of the full set of forecast errors. Moreover, fixed weights can also be improved upon by varying-weight schemes, as shown by Bates and Granger (1969). This example shows that gains from combination may disappear if individual forecasts are first bias-corrected, consistent with the derivation for the no-break case, in which combination works by exploiting offsetting biases.
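The combination arithmetic in Table 1 can be reproduced directly from the tabulated errors: since both models forecast the same actual, the combined error is the same fixed-weight combination of the individual errors. The sketch below—ours—uses the rounded 1-step errors printed above, so the resulting sums of squares differ marginally from the published 263.3, 84.4 and 78.8:

```python
import numpy as np

# 1-step forecast errors, 1950-1965, as printed in Table 1 (rounded).
lin = np.array([1.0, 0.7, -2.5, -2.2, 2.1, 1.0, 0.4, 0.0,
                -0.2, -1.3, 1.9, 3.2, 7.0, 8.8, 6.1, 8.0])
exp = np.array([0.7, 0.1, -3.4, -3.3, 0.8, -0.6, -1.7, -2.5,
                -3.2, -4.8, -2.1, -1.4, 1.8, 2.8, -0.9, -0.0])

w = 0.16                           # fixed weight on the linear forecasts
comb = w * lin + (1 - w) * exp     # combined error = combination of errors
for name, e in [("linear", lin), ("exponential", exp), ("combined", comb)]:
    print(f"{name:12s} SSE = {np.sum(e**2):6.1f}   bias = {np.mean(e):+.2f}")
```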

A final implication, given the autocorrelated forecast errors, is that IC or differencing should improve the forecasts. For differencing, the SSEs become 73.9 and 59.0 for the linear and exponential models respectively: a dramatic improvement for the linear model, and a smaller—but worthwhile—gain for the exponential model, which now does better than any combination. Clements and Hendry (1999) treat inappropriate specification or estimation of deterministic terms as near equivalents of shifts in those terms, so such an interpretation is also consistent with the present gains from combination and differencing.

8. A MONTE CARLO STUDY


We consider a range of settings. The first set includes extraneous shifts in white-noise processes to match the theory derivations and check their applicability in finite samples (Section 8.1). We then allow for dynamic models (Section 8.2), breaks in the DGP equation itself (Section 8.3), and situations where some of the explanatory variables are absent from all of the models (Section 8.4).

8.1. Shifts in an extraneous variable

8.1.1. Forecast-period shift

Table 2 reports a selection of results from a Monte Carlo study of the usefulness of combination in small samples for constant DGPs and when there is a shift in the mean of an extraneous variable. The DGP is as given in Section 6 with w_t and z_t scalar variables. The models are estimated without intercepts on the sample up to T, and used to forecast T + 1. The means φ_z = φ_w = 0 in-sample, but we allow φ_{z,T+1} = μ_z to be non-zero in some experiments. Results are given for three combination schemes: simple averaging, the 'optimal' combination, and the use of relative MSFE weights.6 We set β_1 = β_2 = 1, with a DGP disturbance variance of 0.16. The table records the Monte Carlo estimates of the biases and MSFEs over 50,000 replications, for a number of sample sizes T, and different values of σ_w, σ_z and r_wz. We also record the Monte Carlo means of the estimated combination weights, λ̂ and λ̂_M, and of the estimated slope coefficients in the two models, b̂ and â. The columns headed M_z, M_w and M_C relate to forecasts from the models including z, w, and the simple average of the two, whereas the columns headed M_λ and M_λM show the optimal and MSFE-weight combinations respectively.

Table 2.  Simulation results: Extraneous shifts.

  Row   T   φ_z,T+1  σ_z  φ_w,T  σ_w  r_wz  | Bias:  M_z    M_w    M_C    M_λ    M_λM | MSFE:  M_z    M_w    M_C    M_λ    M_λM |   λ̂    λ̂_M    b̂      â

  No shift
   1   100    0.0    1.0    0.0   1.5   0.75 |        0.00  −0.00   0.00   0.00   0.00 |        1.14   0.60   0.27   0.24   0.25 |  0.61   0.66   2.13   1.50
   2    20    0.0    1.0    0.0   1.5   0.75 |        0.01   0.00   0.00   0.01   0.01 |        1.20   0.63   0.29   0.27   0.28 |  0.61   0.65   2.13   1.50
   3    10    0.0    1.0    0.0   1.5   0.75 |       −0.00   0.00   0.00  −0.00  −0.00 |        1.28   0.67   0.33   0.32   0.34 |  0.61   0.64   2.12   1.50
  φ_z,T+1 = σ_z
   4    20    1.0    1.0    0.0   1.5   0.75 |       −1.12   1.00  −0.06   0.19   0.27 |        2.52   1.64   0.31   0.34   0.40 |  0.61   0.65   2.13   1.50
   5    20    1.0    1.0    0.0   1.0   0.00 |        0.00   1.01   0.51   0.50   0.50 |        1.28   2.24   0.95   0.99   0.99 |  0.50   0.50   1.00   1.00
   6    20    1.5    1.5    0.0   1.0   0.75 |       −0.75   1.50   0.38   0.13   0.04 |        1.21   3.47   0.44   0.32   0.33 |  0.39   0.35   1.50   2.12
   7    20    1.0    1.0    0.0   1.5  −0.75 |        1.13   1.01   1.07   0.94   1.04 |        2.54   1.64   1.94   1.49   1.79 |  1.11   0.65  −0.12   0.50
   8    20    1.5    1.5    0.0   1.0  −0.75 |        0.75   1.51   1.13   0.67   1.01 |        1.23   3.48   2.07   1.11   1.74 | −0.11   0.35   0.50  −0.13
  φ_z,T+1 = 2×σ_z
   9    20    2.0    1.0    0.0   1.5   0.75 |       −2.25   2.00  −0.12   0.37   0.53 |        6.50   4.64   0.37   0.52   0.73 |  0.61   0.65   2.13   1.50
  10    20    2.0    1.0    0.0   1.0   0.00 |        0.00   2.01   1.00   0.99   1.00 |        1.47   5.25   1.76   1.83   1.82 |  0.50   0.50   1.00   1.00
  11    20    3.0    1.5    0.0   1.0   0.75 |       −1.50   3.00   0.75   0.25   0.08 |        3.00  10.23   0.89   0.46   0.48 |  0.39   0.35   1.50   2.12
  12    20    2.0    1.0    0.0   1.5  −0.75 |        2.25   2.01   2.13   1.87   2.07 |        6.52   4.65   5.38   4.13   5.02 |  1.11   0.65  −0.12   0.50
  13    20    3.0    1.5    0.0   1.0  −0.75 |        1.50   3.01   2.26   1.32   2.01 |        3.01  10.26   5.90   2.62   4.83 | −0.11   0.35   0.50  −0.13
  φ_z,T+1 = σ_z, φ_w,T:T+1 = 2×σ_w
  14    20    1.0    1.0    3.0   1.5   0.75 |        1.73  −0.18   0.78   0.52   0.43 |        4.39   0.84   1.03   0.68   0.63 |  0.64   0.69   2.13   1.42
  15    20    1.0    1.0    2.0   1.0   0.00 |        1.91   1.01   1.46   1.40   1.41 |        5.01   2.49   2.97   2.84   2.87 |  0.55   0.54   1.00   1.00
  16    20    1.5    1.5    2.0   1.0   0.75 |        1.16  −0.27   0.44   0.59   0.64 |        2.05   1.63   0.65   0.76   0.84 |  0.41   0.38   1.50   1.93
  17    20    1.0    1.0    3.0   1.5  −0.75 |        3.98   2.19   3.09   2.00   2.71 |       17.25   5.60  10.37   4.88   8.16 |  1.08   0.69  −0.12   0.58
  18    20    1.5    1.5    2.0   1.0  −0.75 |        2.66   3.29   2.97   2.58   2.86 |        7.77  12.34   9.69   7.43   8.95 |  0.07   0.37   0.50   0.06
  φ_z,T+1 = 2×σ_z, φ_w,T:T+1 = −2×σ_w
  19    20    2.0    1.0   −3.0   1.5   0.75 |       −5.09   3.19  −0.95   0.23   0.63 |       27.61  10.95   1.37   0.68   1.22 |  0.64   0.69   2.13   1.42
  20    20    2.0    1.0   −2.0   1.0   0.00 |       −1.89   2.01   0.06   0.22   0.20 |        5.19   5.53   0.76   1.12   1.05 |  0.55   0.54   1.00   1.00
  21    20    3.0    1.5   −2.0   1.0   0.75 |       −3.39   4.78   0.69  −0.05  −0.32 |       12.38  24.38   0.93   0.65   0.99 |  0.41   0.38   1.50   1.94
  22    20    2.0    1.0   −3.0   1.5  −0.75 |       −0.59   0.82   0.11   0.82   0.38 |        2.01   1.47   0.92   1.50   1.00 |  1.08   0.69  −0.12   0.58
  23    20    3.0    1.5   −2.0   1.0  −0.75 |       −0.39   1.23   0.42  −0.20   0.21 |        1.01   3.05   1.07   1.07   0.97 |  0.07   0.37   0.50   0.06

  Note: M_z and M_w refer to the forecasts from the models including z and w, respectively. M_C assigns equal weights to each forecast, M_λ assigns 'optimal' weights based on the formula in Bates and Granger, and M_λM assigns optimal weights but omitting the covariance terms between the models' forecast errors. λ̂ and λ̂_M are the Monte Carlo means of the estimated weights on M_w under the two schemes, and b̂ and â those of the estimated slope coefficients in M_z and M_w.

For the first three rows of the table, μ_z = 0, so these show the effects of combination when there are no structural shifts. The model including w (M_w) is the more accurate of the individual models, because with β_1 = β_2 = 1, the higher variability of w (σ_w² > σ_z²) means that it explains more of the variation in the dependent variable. Nevertheless, the simple average of the two forecasts yields a smaller MSFE. The optimal combination assigns a weight of just over 0.6 to M_w (a little higher when relative MSFE weights are used), and the MSFE of the combined forecast is then a little smaller than in the case of averaging. Monte Carlo estimates of these weights are shown in the table under the columns headed λ̂ and λ̂_M. Notice that the individual forecasts (and therefore the combinations) have a zero bias (to two decimal places) in the absence of location changes. The high value of r_wz entails that the effects of z and w in the individual models (estimated as b̂ and â) are quite different from their effects in the DGP. Our analytic derivations ignore terms of O(T^{−1}): the Monte Carlo suggests that the qualitative results are the same for T = 100 and T = 10 (compare the first and third results), suggesting that these terms are indeed unimportant.

The next set of rows reports results for a shift φ_z equal to one standard deviation of the z-equation disturbance term, namely μ_z = σ_z (equalling one standard deviation of z in the absence of explanatory variables in the z-equation). Consider row 4. This suggests that the relative percentage reductions in MSFE can be much larger when there are structural shifts. The bias in the forecasts from M_w is approximately the value of the shift. By including z, M_z picks up the value of the shift, but because the coefficient on z is approximately double that in the DGP, this model over-predicts by approximately the amount of the shift. Now the combination based on optimal weights (M_λ) no longer delivers the smallest MSFE: just as the best model in-sample may not yield the most accurate forecasts when there are structural changes, so the optimal combination in-sample may no longer be optimal for out-of-sample forecasting. When r_wz = 0 (row 5), b̂ is an unbiased estimator of the coefficient on z in the DGP, so that M_z is unbiased. Nevertheless, combination is still better (averaging is optimal): it pays to combine with the biased predictor. Rows 7 and 8 illustrate the results of combination when r_wz < 0, so that both individual models are biased in the same direction, and averaging leads to a worse outcome than the best, but still outperforms the worst individual forecast. The optimal combination remains dominant, but the weights are outside (0, 1), and relative MSFE weights give similar results to averaging. The third set of rows is for φ_{z,T+1} = 2σ_z. Row 9 illustrates a greater proportionate reduction in MSFE from combination. Row 10 (r_wz = 0) indicates that for shifts of this size the bias induced in M_w is large enough to counteract the benefits to combination, and M_z has the smallest MSFE.

8.1.2. Forecast and estimation period shifts

The fourth and fifth panels of Table 2 replicate the second and third, but with a shift in the intercept of the w process of two and minus two times the standard error of its disturbance, respectively, taking effect in periods T and T + 1 (φ_{w,T} = φ_{w,T+1} = ±2σ_w). We allow intercepts in the M_z and M_w models, but otherwise proceed as above. Note that the impact of the single observation T on the estimation of the models' parameters is relatively minor, so that the results for consecutive shifts in the forecast period would be qualitatively similar. From the bottom panel, it is apparent that combination can yield large percentage reductions in MSFE when the explanatory variables undergo shifts in different directions, and the variables are positively correlated (r_wz > 0, rows 19 and 21). Then, the upward bias in the coefficient estimates exacerbates the forecast biases of M_z and M_w. When r_wz < 0, the models' slope parameter estimates are biased towards zero, and the forecasts from both individual models are closer to the actual value of y_{T+1} that results from the largely offsetting shifts in the two explanatory variables. When the shifts are in the same direction (rows 14–18), the deterioration in the individual models' performances is less pronounced, but depending on the relative sizes of the shifts and the importance of the individual explanatory variables, combination can either beat the best individual forecast or guard against inadvertently choosing the worst. For example, in row 14 the size of the shift in w relative to that in z is such that M_w is better than averaging, but a scheme that assigns a higher weight (λ) to M_w delivers a smaller MSFE (the table presents results for average values of 0.64 and 0.69). When the sizes of the shifts are more comparable (e.g. row 16), averaging is again beneficial.

8.2. Autocorrelated explanatory variables

Table 3 reports results for a subset of these experiments, except that z and w now follow AR(1) processes, with an autoregressive coefficient of 0.9. Keeping the same values of the disturbance variances as before, the variances of z and w increase by a factor of approximately 5, so that the costs of omitting either, in terms of MSFE, are now larger: see M_z and M_w in the first three rows of the table, for example. The proportionate gains to combination are correspondingly greater. Now, combination pays even when r_wz = 0 and μ_z = 2 (row 10), but note that the size of the shift relative to the standard error of z has fallen.

Table 3.  Simulation results: Extraneous shift when z and w are AR(1) processes.

  Row   T   φ_z,T+1  σ_z  σ_w  r_wz  | Bias:  M_z    M_w    M_C    M_λ    M_λM | MSFE:  M_z    M_w    M_C    M_λ    M_λM |   λ̂    λ̂_M    b̂      â

  No shift
   1   100    0.0    1.0   1.5   0.75 |        0.01  −0.00   0.00   0.00   0.00 |        5.36   2.47   0.83   0.66   0.74 |  0.61   0.68   2.13   1.50
   2    20    0.0    1.0   1.5   0.75 |       −0.01   0.01   0.00   0.00   0.00 |        5.15   2.35   1.06   0.82   0.92 |  0.61   0.66   2.13   1.50
   3    10    0.0    1.0   1.5   0.75 |        0.00  −0.00  −0.00  −0.00  −0.00 |        4.79   2.22   1.17   0.85   0.98 |  0.61   0.65   2.12   1.50
  φ_z,T+1 = σ_z
   4    20    1.0    1.0   1.5   0.75 |       −1.13   1.01  −0.06   0.21   0.33 |        6.85   3.37   1.17   0.92   1.10 |  0.61   0.66   2.13   1.50
   5    20    1.0    1.0   1.0   0.00 |       −0.01   1.00   0.49   0.47   0.50 |        5.65   6.11   3.00   2.50   2.72 |  0.50   0.50   1.00   1.00
   6    20    1.5    1.5   1.0   0.75 |       −0.75   1.51   0.38   0.15   0.06 |        3.14   7.36   1.24   0.89   1.02 |  0.39   0.34   1.50   2.12
   7    20    1.0    1.0   1.5  −0.75 |        1.12   1.00   1.06   0.75   0.99 |        6.82   3.34   4.07   2.45   3.35 |  1.12   0.66  −0.12   0.50
   8    20    1.5    1.5   1.0  −0.75 |        0.75   1.49   1.12   0.64   0.96 |        3.13   7.30   4.13   2.44   3.39 | −0.12   0.34   0.50  −0.12
  φ_z,T+1 = 2×σ_z
   9    20    2.0    1.0   1.5   0.75 |       −2.26   2.01  −0.13   0.42   0.66 |       11.90   6.38   1.49   1.20   1.61 |  0.61   0.66   2.13   1.50
  10    20    2.0    1.0   1.0   0.00 |       −0.01   2.00   0.99   0.93   1.00 |        6.90   9.10   4.05   3.42   3.76 |  0.50   0.50   1.00   1.00
  11    20    3.0    1.5   1.0   0.75 |       −1.51   3.01   0.75   0.31   0.11 |        5.39  14.15   1.81   1.10   1.29 |  0.39   0.34   1.50   2.12
  12    20    2.0    1.0   1.5  −0.75 |        2.24   2.00   2.12   1.50   1.97 |       11.81   6.34   7.74   4.31   6.39 |  1.12   0.66  −0.12   0.50
  13    20    3.0    1.5   1.0  −0.75 |        1.49   2.99   2.24   1.28   1.93 |        5.35  14.04   8.05   4.24   6.52 | −0.12   0.34   0.50  −0.12

  Note: M_z and M_w refer to the forecasts from the models including z and w, respectively. M_C assigns equal weights to each forecast, M_λ assigns 'optimal' weights based on the formula in Bates and Granger, and M_λM assigns optimal weights but omitting the covariance terms between the models' forecast errors.
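The 'factor of approximately 5' quoted above follows from the AR(1) unconditional-variance formula; a one-line check (our arithmetic):

```python
# var(x) = var(u) / (1 - phi**2) for an AR(1) with coefficient phi = 0.9,
# so holding the disturbance variance fixed inflates var(z) and var(w) by:
print(1 / (1 - 0.9**2))   # 5.263...
```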

8.3. Shifts in the y equation

Table 4 reports the results for the three combination schemes and the individual model forecasts when there are shifts in the y equation. We chose the parameter values corresponding to row 2 of Table 2, so μ_z = 0, σ_z = 1 and σ_w = 1.5, and r_wz = 0.75 in the first panel. The shifts in the y equation are defined by: τ, the time of the shift, whereby the new values take effect from τ + 1 onwards, and τ = 15 or 18 for T = 20; δ_0, the shift in the (hitherto zero-valued) intercept, of 0.8 (twice the standard deviation of the y-equation disturbance term); and δ_1, the shift in the coefficient on z, where δ_1 = 1 so the coefficient doubles in size. The second and third sets of four rows repeat the first, but with r_wz = 0, and r_wz = −0.75. For both the M_z and M_w models, intercepts are estimated to accommodate the shifts in the y equation.

Table 4.  Simulation results: Shifts in the y equation.

  Row   T    τ   δ_0  δ_1  r_wz  | Bias:  M_z    M_w    M_C    M_λ    M_λM | MSFE:  M_z    M_w    M_C    M_λ    M_λM |   λ̂    λ̂_M    b̂      â

   1    20   15   0.8   0    0.75 |        0.61   0.60   0.61   0.61   0.61 |        1.64   1.04   0.68   0.67   0.68 |  0.61   0.63   2.13   1.50
   2    20   18   0.8   0    0.75 |        0.73   0.72   0.73   0.73   0.73 |        1.80   1.19   0.84   0.82   0.83 |  0.61   0.64   2.13   1.50
   3    20   15   0     1    0.75 |        0.01   0.01   0.01   0.01   0.01 |        1.85   2.36   1.31   1.38   1.41 |  0.55   0.56   2.38   1.62
   4    20   18   0     1    0.75 |        0.01   0.01   0.01   0.01   0.01 |        2.08   2.47   1.56   1.65   1.68 |  0.59   0.61   2.23   1.55
   5    20   15   0.8   0    0.00 |        0.61   0.61   0.61   0.61   0.61 |        3.04   1.67   1.44   1.33   1.35 |  0.69   0.66   1.00   1.00
   6    20   18   0.8   0    0.00 |        0.73   0.73   0.73   0.73   0.73 |        3.20   1.82   1.60   1.49   1.50 |  0.69   0.66   1.00   1.00
   7    20   15   0     1    0.00 |        0.02   0.01   0.01   0.02   0.02 |        3.24   4.38   2.74   2.90   2.89 |  0.59   0.58   1.25   1.00
   8    20   18   0     1    0.00 |        0.02   0.01   0.01   0.02   0.01 |        3.47   4.33   2.93   3.14   3.13 |  0.65   0.63   1.10   1.00
   9    20   15   0.8   0   −0.75 |        0.61   0.61   0.61   0.61   0.61 |        1.64   1.04   1.20   1.01   1.13 |  1.10   0.63  −0.12   0.50
  10    20   18   0.8   0   −0.75 |        0.73   0.73   0.73   0.73   0.73 |        1.80   1.19   1.36   1.17   1.28 |  1.11   0.64  −0.12   0.50
  11    20   15   0     1   −0.75 |        0.01   0.01   0.01   0.01   0.01 |        1.85   2.34   1.95   2.11   1.98 |  0.83   0.57   0.13   0.37
  12    20   18   0     1   −0.75 |        0.01   0.01   0.01   0.01   0.01 |        2.08   2.45   2.12   2.34   2.17 |  1.01   0.61  −0.02   0.45

  Note: M_z and M_w refer to the forecasts from the models including z and w, respectively. M_C assigns equal weights to each forecast, M_λ assigns 'optimal' weights based on the formula in Bates and Granger, and M_λM assigns optimal weights but omitting the covariance terms between the models' forecast errors.

The results suggest the following. The individual forecasts (and therefore combinations) remain unbiased when δ1 is not equal to zero, because z is a mean-zero variable. Nevertheless, for both types of shift, combination proves to be efficacious for rwz= 0.75 and rwz= 0, but, as in the absence of such shifts, is generally less so when rwz is negative.

8.4. Completely omitted variables

Our analytic derivations assume that the variables in the models span the explanatory variables in the DGP, so each model only excludes variables which the other contains. The condition that all the variables in the DGP are included in at least one of the models would appear to be unimportant to our explanations of why pooling works, but we checked that aspect in a further Monte Carlo study reported in Table 5. There we report experiments based on rows 1 to 6 of Table 2, but allowing an additional variable {q_t} to enter the DGP with a unit coefficient. This variable is mean-zero white noise, with a variance of unity, but with a shift to a mean of unity (rows 1–6) or 2 (rows 7–12) for periods T and T + 1, i.e., φ_q = 0 but φ_{q,T:T+1} = 1 or 2. If q were uncorrelated with the explanatory variables and φ_{q,T:T+1} = 0, our analytical calculations would be unaffected, since q could be subsumed into the disturbance term, so would only affect the equation error variance. Maintaining the inter-relatedness assumption, a shift in φ_q is equivalent to a shift in the intercept of the y equation. The interesting cases are when q is correlated with one or both of z and w. In our experiments, both correlations are one half. We also estimate intercepts in both the M_w and M_z models.

The first three rows of Table 5 (and rows 7–9) show that the individual-model forecasts (and therefore the combined forecast) are biased by approximately the size of the shift in q. Nevertheless, combination reduces the MSFEs. When z also shifts (rows 4–6), then because q and z shift in the same direction, and because the 'omitted variable bias' in M_z causes the coefficient on z to be upward biased, the forecast biases of M_z are smaller than those of either M_w or the combination. When, in addition, σ_z > σ_w, so that z is the more important determinant of y (row 6), combination is worse than M_z (but only marginally so). For larger shifts in φ_q, M_z is relatively better than the combined forecast.

Table 5.  Simulation results: Shift in variable excluded from both models.

  Row   T   φ_z,T+1  σ_z  σ_w  r_wz  | Bias:  M_z    M_w    M_C    M_λ    M_λM | MSFE:  M_z    M_w    M_C    M_λ    M_λM |   λ̂    λ̂_M    b̂      â

  φ_q,T:T+1 = 1
   1   100    0.0    1.0   1.5   0.75 |        1.00   0.99   1.00   1.00   1.00 |        3.31   2.61   2.05   2.02   2.02 |  0.59   0.59   2.62   1.83
   2    20    0.0    1.0   1.5   0.75 |        0.95   0.94   0.94   0.94   0.94 |        3.40   2.65   2.03   2.06   2.04 |  0.59   0.58   2.63   1.83
   3    10    0.0    1.0   1.5   0.75 |        0.90   0.90   0.90   0.90   0.90 |        3.64   2.81   2.11   2.24   2.18 |  0.59   0.57   2.63   1.83
   4    20    1.0    1.0   1.5   0.75 |       −0.68   1.94   0.63   0.86   0.84 |        3.11   5.53   1.57   2.06   1.97 |  0.59   0.58   2.63   1.83
   5    20    1.0    1.0   1.0   0.00 |        0.45   1.95   1.20   1.18   1.19 |        3.59   7.04   3.43   3.53   3.50 |  0.50   0.50   1.50   1.50
   6    20    1.5    1.5   1.0   0.75 |       −0.31   2.44   1.07   0.81   0.84 |        1.95   8.46   2.30   1.99   1.99 |  0.41   0.42   1.83   2.63
  φ_q,T:T+1 = 2
   7   100    0.0    1.0   1.5   0.75 |        1.99   1.98   1.99   1.99   1.99 |        6.28   5.56   5.00   4.98   4.98 |  0.59   0.58   2.62   1.83
   8    20    0.0    1.0   1.5   0.75 |        1.90   1.89   1.89   1.89   1.89 |        6.11   5.35   4.73   4.77   4.74 |  0.59   0.57   2.63   1.83
   9    10    0.0    1.0   1.5   0.75 |        1.80   1.80   1.80   1.80   1.80 |        6.11   5.27   4.57   4.74   4.63 |  0.59   0.56   2.63   1.83
  10    20    1.0    1.0   1.5   0.75 |        0.27   2.89   1.58   1.81   1.77 |        2.73  10.13   3.68   4.63   4.42 |  0.59   0.57   2.63   1.83
  11    20    1.0    1.0   1.0   0.00 |        1.40   2.90   2.15   2.13   2.14 |        5.37  11.65   6.62   6.69   6.66 |  0.50   0.50   1.50   1.50
  12    20    1.5    1.5   1.0   0.75 |        0.64   3.39   2.02   1.76   1.81 |        2.29  14.01   5.24   4.47   4.56 |  0.41   0.42   1.83   2.63

  Note: M_z and M_w refer to the forecasts from the models including z and w, respectively. M_C assigns equal weights to each forecast, M_λ assigns 'optimal' weights based on the formula in Bates and Granger, and M_λM assigns optimal weights but omitting the covariance terms between the models' forecast errors.

8.5. Summary

These simulations confirm the analytical results, and explore a number of extensions. The qualitative nature of the conclusions based on the analytical work hold up, so that model mis-specification and parameter non-constancy are seen to explain why combination, and especially averaging, often works in practice. When a DGP variable which is not included in any of the individual models undergoes a shift in mean, at the same time that other variables shift, a range of outcomes is possible depending on the exact design, that is, the relative sizes of the shifts, their relative contributions to the total variation in the dependent variable, and the signs of the cross-correlations, etc. In general, allowing for variables that do not enter any of the models could strengthen or weaken the case for combination when there are shifts.

9. POOLING AND FORECAST DENSITIES


Our emphasis has been on explaining the efficacy of pooling point forecasts: the vast majority of studies in which pooling has been found to be beneficial have examined point forecasts. Nevertheless, in recent years there has been a growing recognition that some measure of the uncertainty surrounding a ‘central tendency’ will often enhance the usefulness of a forecast, and that in some cases the dispersion, the tails, or even the whole forecast density are of interest. Section 4.2 considered forecast-error variances in stationary processes. The production and evaluation of interval and density forecasts have recently attracted a good deal of attention, but pooling has only rarely been applied outside the analysis of point forecasts.7 Granger et al. (1989) is a notable exception, examining interval-forecast combination using the quantile-regression techniques of Koenker and Bassett (1978, 1982). A full analysis of the pooling of density forecasts is beyond the scope of the present paper, but in the remainder of this section we discuss some preliminary results obtained in the Monte Carlo described in Section 8.

9.1. Calculation of density forecasts

For simplicity, we use the Box–Jenkins method of calculating density forecasts. The drawbacks of this method, and a number of alternatives, are discussed in the context of interval-forecast calculations by Clements and Taylor (2001). We assume that the forecast density is Gaussian, with mean given by the model's point forecast and standard deviation given by that of the conditional forecast-error distribution; for a 1-step-ahead density, the latter is approximated by the estimated equation standard error. For the pooled forecast densities, the logic of this approach suggests taking the densities to be Gaussian, with mean given by the pooled point forecast and standard error estimated from the in-sample ‘residuals’, where these residuals are actuals less the linear combination of the two models' fitted values, as in Section 4.2.
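
A minimal sketch of this construction (our code, not the authors'), given point forecasts and in-sample fitted values from the two models:

```python
import numpy as np
from scipy import stats

def gaussian_density(point_forecast, resid):
    """Box-Jenkins-style 1-step density: Gaussian around the point forecast,
    with standard deviation set to the in-sample equation standard error."""
    return stats.norm(loc=point_forecast, scale=np.std(resid, ddof=1))

def pooled_density(f_z, f_w, y_in, fit_z, fit_w, lam=0.5):
    """Pooled density: Gaussian around the pooled point forecast, with the
    standard error taken from the in-sample combination 'residuals'."""
    resid_c = y_in - (lam * fit_z + (1.0 - lam) * fit_w)
    return gaussian_density(lam * f_z + (1.0 - lam) * f_w, resid_c)
```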

9.2. Evaluation of density forecasts

The key tool in the recent literature on density-forecast evaluation is the probability integral transform. This can be traced back at least to Rosenblatt (1952), with recent contributions by Shephard (1994), Kim et al. (1998) and Diebold et al. (1998), inter alia. When the forecast densities closely approximate the true conditional densities, the probabilities obtained by integrating the forecast densities up to the realized values will constitute an i.i.d. series of uniform random variables on the unit interval. The idea, then, is to evaluate the forecast densities by assessing whether these probabilities are i.i.d. U[0, 1]. Let {z_i}, i = 1, …, n, denote the sequence of n 1-step-ahead probability integral transforms. In empirical applications the hypothesis of interest is a joint one (uniformity and independence), and testing is complicated by the fact that the effects of a failure of independence on the distribution of a test for uniformity are unknown, while tests of autocorrelation (and of independence more generally) will be affected by failure of the uniformity assumption. In view of this, some authors (e.g. Diebold et al. 1999) have recommended using graphical analysis in addition to formal tests.
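
A minimal sketch of the transform itself (our code; the densities are assumed to be frozen scipy distributions like those in the Section 9.1 sketch):

```python
import numpy as np

def pits(actuals, densities):
    """Probability integral transforms: each 1-step forecast density
    integrated up to (i.e. its CDF evaluated at) the realized value."""
    return np.array([d.cdf(y) for y, d in zip(actuals, densities)])
```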

Our task is somewhat easier. On each replication of the Monte Carlo, we obtain a 1-step forecast density from each of the models for period T. The sequences corresponding to {z_i} are obtained from n replications, so independence holds by virtue of the independence of the drawings of the pseudo-random numbers. Thus only the distributional part of the joint hypothesis needs to be tested. Berkowitz (2001) suggests taking the inverse-normal CDF transformation of the {z_i} series, to give, say, {z*_i}: the z_i, which are i.i.d. U[0, 1] under the null, thus become i.i.d. standard normal variates. Berkowitz argues that more powerful tools can be applied to testing a null of i.i.d. N[0, 1] than one of i.i.d. uniformity. He proposes a one-degree-of-freedom test of independence against a first-order autoregressive structure, as well as a three-degrees-of-freedom test of zero mean, unit variance and independence. In each case the maintained assumption is normality, so standard likelihood-ratio tests are constructed using Gaussian likelihoods. We calculate a test of the assumption of normality (that recommended by Doornik and Hansen, 1994), as well as a test of zero mean and unit variance assuming normality and independence. Because our forecast densities are Gaussian, with mean given by the point forecast and standard error given by the in-sample standard error, z* is just the standardized forecast error (actual less forecast, divided by the forecast standard error).
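
The inverse-normal transform and the zero-mean/unit-variance likelihood-ratio check can be sketched as follows (2 degrees of freedom, since independence holds by construction in our Monte Carlo; the Doornik–Hansen statistic has no standard Python implementation, so scipy's generic normality test is shown as a stand-in):

```python
import numpy as np
from scipy import stats

def mean_variance_lr_test(z):
    """LR test of zero mean and unit variance of z* = Phi^{-1}(z),
    assuming normality and independence (2 degrees of freedom)."""
    zstar = stats.norm.ppf(z)                # inverse-normal CDF transform
    mu, sig = zstar.mean(), zstar.std()      # unrestricted Gaussian ML estimates
    lr = 2.0 * (stats.norm.logpdf(zstar, mu, sig).sum()
                - stats.norm.logpdf(zstar).sum())
    return lr, stats.chi2.sf(lr, df=2)       # statistic and p-value

# Usage with hypothetical inputs: z = pits(actuals, densities)
# stats.normaltest(stats.norm.ppf(z))        # stand-in for the Doornik-Hansen test
# mean_variance_lr_test(z)
```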

9.3. Monte Carlo results

Table 6 reports results for a set of experiments focusing on ‘extraneous shifts’, using the design parameter values of the first 13 rows of Table 2. As before, each row relates to a different experiment. In each instance the sample size is T = 100, so that ‘small-sample’ effects should be of secondary importance.8 For rows 1 to 11 the number of replications is set at 100, and for rows 12 to 22 at 200, so that n = 100 and 200 respectively, providing some information on the effect of the number of forecast densities on the outcomes of the adequacy tests. For each model or combination method, the table gives the p-value of the Doornik–Hansen normality test, followed by the p-value of a test of zero mean and unit variance assuming normality (and independence).

Table 6. Density evaluation results.

                               Mz           Mw           MC           Mλ           Mλ̄
     φz,T+1   σz    σw    rwz   DH   m,v    DH   m,v    DH   m,v    DH   m,v    DH   m,v

100 replications
 No shift
  1   0.00  1.00  1.50   0.75  0.64  0.63  0.67  0.13  0.78  0.31  0.59  0.65  0.34  0.60
 φz,T+1 = σz
  2   1.00  1.00  1.50   0.75  1.00  0.00  0.97  0.00  0.96  0.31  0.81  0.00  0.75  0.00
  3   1.00  1.00  1.00   0.00  0.30  0.38  0.44  0.00  0.04  0.00  0.07  0.00  0.06  0.00
  4   1.50  1.50  1.00   0.75  0.23  0.00  0.01  0.00  0.26  0.00  0.18  0.04  0.11  0.76
  5   1.00  1.00  1.50  −0.75  0.42  0.00  0.51  0.00  0.34  0.00  0.60  0.00  0.37  0.00
  6   1.50  1.50  1.00  −0.75  0.73  0.00  0.58  0.00  0.33  0.00  0.49  0.00  0.42  0.00
 φz,T+1 = 2×σz
  7   2.00  1.00  1.50   0.75  0.97  0.00  0.85  0.00  0.08  0.00  0.06  0.00  0.17  0.00
  8   2.00  1.00  1.00   0.00  0.66  0.19  0.00  0.00  0.04  0.00  0.03  0.00  0.04  0.00
  9   3.00  1.50  1.00   0.75  0.03  0.00  0.19  0.00  0.44  0.00  0.36  0.00  0.14  0.11
 10   2.00  1.00  1.50  −0.75  0.76  0.00  0.46  0.00  0.92  0.00  0.41  0.00  0.85  0.00
 11   3.00  1.50  1.00  −0.75  0.48  0.00  0.72  0.00  0.56  0.00  0.54  0.00  0.55  0.00

200 replications
 No shift
 12   0.00  1.00  1.50   0.75  0.72  0.21  0.43  0.57  0.36  0.12  0.34  0.18  0.29  0.21
 φz,T+1 = σz
 13   1.00  1.00  1.50   0.75  0.38  0.00  0.04  0.00  0.43  0.00  0.59  0.00  0.80  0.00
 14   1.00  1.00  1.00   0.00  0.53  0.04  0.53  0.00  0.90  0.00  0.97  0.00  0.96  0.00
 15   1.50  1.50  1.00   0.75  0.59  0.00  0.91  0.00  0.89  0.00  0.95  0.00  0.91  0.19
 16   1.00  1.00  1.50  −0.75  0.84  0.00  0.61  0.00  0.93  0.00  0.46  0.00  0.86  0.00
 17   1.50  1.50  1.00  −0.75  0.02  0.00  0.46  0.00  0.27  0.00  0.01  0.00  0.14  0.00
 φz,T+1 = 2×σz
 18   2.00  1.00  1.50   0.75  0.45  0.00  0.45  0.00  0.93  0.00  0.95  0.00  0.73  0.00
 19   2.00  1.00  1.00   0.00  0.53  0.86  0.87  0.00  0.33  0.00  0.32  0.00  0.29  0.00
 20   3.00  1.50  1.00   0.75  1.00  0.00  0.26  0.00  0.93  0.00  0.73  0.00  0.56  0.40
 21   2.00  1.00  1.50  −0.75  0.40  0.00  0.10  0.00  0.14  0.00  0.12  0.00  0.10  0.00
 22   3.00  1.50  1.00  −0.75  0.72  0.00  0.52  0.00  0.86  0.00  0.53  0.00  0.98  0.00

Note: Mz and Mw refer to the forecasts from the models including z and w, respectively. MC assigns equal weights to each forecast, Mλ assigns ‘optimal’ weights based on the formula in Bates and Granger, and Mλ̄ assigns optimal weights but omits the covariance term between the models’ forecast errors. For each method, ‘DH’ is the p-value of the Doornik–Hansen normality test and ‘m,v’ the p-value of the test of zero mean and unit variance (assuming normality and independence).

The results suggest the following. First, in the absence of shifts (rows 1 and 12), there is no evidence against the adequacy of the forecast densities obtained from either of the mis-specified models or from any of the three pooled methods: for both n = 100 and n = 200 there is no evidence that the {z*_i} series are not Gaussian, or that they depart from zero mean and unit variance (conditional on their being Gaussian). Second, when there are shifts in z, the densities from model Mw (which omits z) are in all cases shown to be inadequate by the test for zero mean and unit variance of the z* series; the test for normality seldom rejects in this set of experiments. Third, when there are shifts in z, the Mz model densities are rejected in all cases except when rwz = 0: only then does the impact of z in the Mz model match that in the DGP, so that the mean of the model density correctly shifts to match the shift in the location of the data. Finally, when rwz = 0.75, σz = 1.5 and σw = 1 (e.g. rows 4 and 9), the optimal combination that ignores the covariances yields forecast densities that are not rejected, whereas those of the other two combination schemes are.

10. CONCLUSION


Practical experience shows that combining forecasts adds value, and can even dominate the best individual device. We therefore considered selecting a forecasting method by pooling several individual devices when no model coincides with a non-constant DGP.

We first showed that averaging provides ‘insurance’, and may deliver dominance, when the models are differentially mis-specified, even for a constant DGP. While such a result can occur in weakly stationary processes, we suspect that the empirical findings are better explained by the intermittent occurrence of location shifts in unmodelled explanatory variables. Accordingly, we demonstrated that, when forecasting time series subject to location shifts, the average of a group of forecasts from differentially mis-specified models can outperform them all on average over repeated forecasting episodes. Moreover, averaging may well then dominate the use of estimated weights in the combination. Finally, it cannot be proved that only non-encompassed devices should be retained in the combination.

In practice, trimmed means, or perhaps medians, might be needed to exclude ‘outlying’ forecasts, since otherwise one really poor forecast could needlessly worsen a combination.

Both the empirical and the Monte Carlo simulation illustrations confirmed the theoretical analysis. The average of the levels forecasts outperformed the best individual forecast in both settings, sometimes spectacularly. However, in the empirical example, bias-correcting the forecasts removed much of the benefit of averaging, and other devices for robustifying forecasts to breaks did even better. Thus, although we have established that combination can be beneficial in our theoretical framework, comparisons with other approaches are merited.

Hendry and Clements (2003) present 10 cases in which well-known empirical phenomena in economic forecasting can be explained by a theory of mis-specified models of processes that experience intermittent location shifts. The present paper extends that list to 11. We believe that the related results on forecasting using ‘factor models’ can be accounted for by the same general theory, and we are also investigating multi-step estimation within that framework.

ACKNOWLEDGEMENTS


Financial support from the U.K. Economic and Social Research Council under grants L116251015 and L138251009 is gratefully acknowledged by both authors. Computations were performed using the Gauss programming language, Aptech Systems, Inc., Washington. We are grateful to Neil Shephard and two anonymous referees for helpful comments.

Footnotes
  • 1

    There is a substantial literature on Bayesian model averaging, which is claimed to reduce ‘model uncertainty’. In our general framework, the key uncertainty relates to which (mis-specified) model will best represent the outcomes after an unanticipated location shift. It seems impossible to form ‘priors’ on that, and ‘in-sample’ weights will not be a useful guide for combining. Even if one were correct and certain about the in-sample model, pooling that with encompassed models could dominate its forecasts in such a setting, although some other solutions (such as ICs) would also be viable.

  • 2

    Combination could also offset measurement errors in preliminary data (see Gallo and Mariano 1994).

  • 3

    As an example of a possible factor, consider the early successes based on the combination of time-series models and (largely) static economic models. The failure to model the dynamics in the latter and the absence of causal factors in the former constitute important sources of model mis-specification.

  • 4

    The figures we report are based on our own calculations. We reproduce the forecasts, forecast errors, etc., based only on the actual series. Some small differences were observed relative to Bates and Granger's figures, presumably because of improved precision.

  • 5

    The optimal combination for the bias-corrected forecasts, imposing the constraint that they sum to unity (and with a zero intercept in the combination) was −0.22 on the linear forecasts, delivering an SSE of 72.61. Negative weights sometimes appear anomalous but may be warranted, as here, when the individual forecast errors are highly positively correlated. In general, structural breaks may be expected to bias all the forecasts in the same direction, resulting in positively correlated forecast errors. In our example, the method of bias correction increases the degree of positive correlation between the individual model's forecast errors (from 0.76 to 0.91), and the less accurate linear model forecasts attract a negative weight. Including an intercept in the combination results in a smaller SSE of 60.1, but the optimal combination weights that sum to unity are largely unchanged at −0.23 and 1.23.

  • 6

    The optimal combination, as derived by Bates and Granger (1969), chooses the weights to minimize the MSFE of the combined forecast (subject to the weights on the individual forecasts summing to unity). This involves covariance terms between the models' forecast errors; when these are ignored, the optimal weight is given by the relative MSFEs alone. For simplicity, we substitute the in-sample estimated residuals for the 1-step in-sample (1, …, T) forecast errors in calculating the weights, so that the period-t ‘forecast error’ is based on parameter estimates obtained from data up to T, rather than t − 1. A code sketch of these calculations follows these notes.

  • 7

    On interval forecasts, see Granger et al. (1989), Chatfield (1993), Christoffersen (1998) and Clements and Taylor (2003), and on density forecasts, Diebold et al. (1998, 1999a, 1999b), Clements and Smith (2000), and the review by Tay and Wallis (2000). Wallis (2003) evaluates both types of forecast in a single framework using chi-squared goodness-of-fit tests.

  • 8

    So, for example, ignoring parameter estimation uncertainty in the construction of the forecast density, as happens when the Box–Jenkins method is used, should have only a minor influence.
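
As a concrete rendering of the weight calculations described in note 6, a minimal sketch (our code; e1 and e2 stand for the two models' in-sample residuals):

```python
import numpy as np

def bates_granger_weight(e1, e2, ignore_covariance=False):
    """Weight on forecast 1 (the two weights sum to one) minimizing the
    combined MSFE; dropping the covariance term reduces the weight to
    relative MSFEs alone."""
    s11, s22 = np.mean(e1 ** 2), np.mean(e2 ** 2)
    s12 = 0.0 if ignore_covariance else np.mean(e1 * e2)
    return (s22 - s12) / (s11 + s22 - 2.0 * s12)
```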

REFERENCES

  • Andrews, M. J., A. P. L. Minford and J. Riley (1996). On comparing macroeconomic forecasts using forecast encompassing tests. Oxford Bulletin of Economics and Statistics 58, 279–305.
  • Bates, J. M. and C. W. J. Granger (1969). The combination of forecasts. Operational Research Quarterly 20, 451–68.
  • Berkowitz, J. (2001). Testing density forecasts, with applications to risk management. Journal of Business and Economic Statistics 19, 465–74.
  • Chatfield, C. (1993). Calculating interval forecasts. Journal of Business and Economic Statistics 11, 121–35.
  • Chevillon, G. (2000). Multi-step estimation for forecasting non-stationary processes. M.Phil. thesis, Economics Department, University of Oxford.
  • Chong, Y. Y. and D. F. Hendry (1986). Econometric evaluation of linear macro-economic models. Review of Economic Studies 53, 671–90.
  • Christoffersen, P. F. (1998). Evaluating interval forecasts. International Economic Review 39, 841–62.
  • Clemen, R. T. (1989). Combining forecasts: A review and annotated bibliography. International Journal of Forecasting 5, 559–83.
  • Clements, M. P. and D. F. Hendry (1993). On the limitations of comparing mean squared forecast errors. Journal of Forecasting 12, 617–37.
  • Clements, M. P. and D. F. Hendry (1999). Forecasting Non-stationary Economic Time Series. Cambridge, MA: MIT Press.
  • Clements, M. P. and J. Smith (2000). Evaluating the forecast densities of linear and non-linear models: Applications to output growth and unemployment. Journal of Forecasting 19, 255–76.
  • Clements, M. P. and N. Taylor (2001). Bootstrapping prediction intervals for autoregressive models. International Journal of Forecasting 17, 247–67.
  • Clements, M. P. and N. Taylor (2003). Evaluating prediction intervals for high-frequency data. Journal of Applied Econometrics 18, 445–56.
  • Coulson, N. F. and R. P. Robins (1993). Forecast combination in a dynamic setting. Journal of Forecasting 12, 63–68.
  • Diebold, F. X. (1988). Serial correlation and the combination of forecasts. Journal of Business and Economic Statistics 6, 105–11.
  • Diebold, F. X. (1989). Forecast combination and encompassing: Reconciling two divergent literatures. International Journal of Forecasting 5, 589–92.
  • Diebold, F. X., T. A. Gunther and A. S. Tay (1998). Evaluating density forecasts: With applications to financial risk management. International Economic Review 39, 863–83.
  • Diebold, F. X., J. Y. Hahn and A. S. Tay (1999a). Multivariate density forecast evaluation and calibration in financial risk management: High frequency returns on foreign exchange. Review of Economics and Statistics 81, 661–73.
  • Diebold, F. X. and J. A. Lopez (1996). Forecast evaluation and combination. In G. S. Maddala and C. R. Rao (Eds.), Handbook of Statistics, Volume 14, pp. 241–68. Amsterdam: North-Holland.
  • Diebold, F. X., A. S. Tay and K. F. Wallis (1999b). Evaluating density forecasts of inflation: The Survey of Professional Forecasters. In R. F. Engle and H. White (Eds.), Festschrift in Honor of C. W. J. Granger, pp. 76–90. Oxford: Oxford University Press.
  • Doornik, J. A. and H. Hansen (1994). A practical test for univariate and multivariate normality. Discussion paper, Nuffield College.
  • Ericsson, N. R. (1992). Parameter constancy, mean square forecast errors, and measuring forecast performance: An exposition, extensions, and illustration. Journal of Policy Modeling 14, 465–95.
  • Ericsson, N. R. and J. Marquez (1993). Encompassing the forecasts of U.S. trade balance models. Review of Economics and Statistics 75, 19–31.
  • Fildes, R. and K. Ord (2002). Forecasting competitions—their role in improving forecasting practice and research. In M. P. Clements and D. F. Hendry (Eds.), A Companion to Economic Forecasting, pp. 322–53. Oxford: Blackwell.
  • Forni, M., M. Hallin, M. Lippi and L. Reichlin (2000). The generalized factor model: Identification and estimation. Review of Economics and Statistics 82, 540–54.
  • Frisch, R. and F. V. Waugh (1933). Partial time regression as compared with individual trends. Econometrica 1, 221–23.
  • Gallo, G. M. and R. S. Mariano (1994). Combining provisional data and forecasts in nonlinear models. Working paper no. 47, Dipartimento Statistico, Università degli Studi di Firenze.
  • Granger, C. W. J. (1989). Combining forecasts—Twenty years later. Journal of Forecasting 8, 167–73.
  • Granger, C. W. J. and R. Ramanathan (1984). Improved methods of combining forecasts. Journal of Forecasting 3, 197–204.
  • Granger, C. W. J., H. White and M. Kamstra (1989). Interval forecasting: An analysis based upon ARCH quantile estimators. Journal of Econometrics 40, 87–96.
  • Harvey, D. I., S. Leybourne and P. Newbold (1998). Tests for forecast encompassing. Journal of Business and Economic Statistics 16, 254–59.
  • Hendry, D. F. (2000). On detectable and non-detectable structural change. Structural Change and Economic Dynamics 11, 45–65.
  • Hendry, D. F. and M. P. Clements (2001). Forecasting using factor models. Mimeo, Economics Department, University of Oxford.
  • Hendry, D. F. and M. P. Clements (2003). Economic forecasting: Some lessons from recent research. Economic Modelling 20, 301–29.
  • Hendry, D. F. and J. A. Doornik (1997). The implications for econometric modelling of forecast failure. Scottish Journal of Political Economy 44, 437–61.
  • Hoogstrate, A. J., F. C. Palm and G. A. Pfann (2000). Pooling in dynamic panel-data models: An application to forecasting GDP growth rates. Journal of Business and Economic Statistics 18, 274–83.
  • Judge, G. G. and M. E. Bock (1978). The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics. Amsterdam: North-Holland.
  • Kim, S., N. Shephard and S. Chib (1998). Stochastic volatility: Likelihood inference and comparison with ARCH models. Review of Economic Studies 65, 361–93.
  • Koenker, R. and G. Bassett (1978). Regression quantiles. Econometrica 46, 33–50.
  • Koenker, R. and G. Bassett (1982). Robust tests for heteroscedasticity based on regression quantiles. Econometrica 50, 43–61.
  • Lu, M. and G. E. Mizon (1991). Forecast encompassing and model evaluation. In P. Hackl and A. H. Westlund (Eds.), Economic Structural Change, Analysis and Forecasting, pp. 123–38. Berlin: Springer-Verlag.
  • Newbold, P. and C. W. J. Granger (1974). Experience with forecasting univariate time series and the combination of forecasts. Journal of the Royal Statistical Society A 137, 131–46.
  • Newbold, P. and D. I. Harvey (2002). Forecast combination and encompassing. In M. P. Clements and D. F. Hendry (Eds.), A Companion to Economic Forecasting, pp. 268–83. Oxford: Blackwell.
  • Rosenblatt, M. (1952). Remarks on a multivariate transformation. Annals of Mathematical Statistics 23, 470–72.
  • Shephard, N. (1994). Partial non-Gaussian state space. Biometrika 81, 115–31.
  • Stock, J. H. and M. W. Watson (1999). A comparison of linear and nonlinear models for forecasting macroeconomic time series. In R. F. Engle and H. White (Eds.), Cointegration, Causality and Forecasting, pp. 1–44. Oxford: Oxford University Press.
  • Tay, A. S. and K. F. Wallis (2000). Density forecasting: A survey. Journal of Forecasting 19, 235–54.
  • Wall, K. D. and C. Correia (1989). A preference-based method for forecast combination. Journal of Forecasting 8, 269–92.
  • Wallis, K. F. (2003). Chi-squared tests of interval and density forecasts, and the Bank of England's fan charts. International Journal of Forecasting 19, 165–76.

Appendix


This appendix details the derivation of the individual-model MSFEs, and that of the combined forecast, reported in Section 3. We derive the conditional biases and variances of the forecast errors, and then combine them. Using Equation (10) for the forecast error from the model given in (6), the conditional bias and conditional variance (denoted V[·]) of the first model's 1-step forecast error are obtained, and similarly for the second model. Letting M[·] denote the MSFE, the squared conditional bias plus the conditional variance, ignoring terms of Op(T−1), gives the MSFEs for the two individual models reported in the main text as Equations (12) and (13). The combined forecast is the equally weighted average of the two individual forecasts, and the last expression in its derivation relates pooling to IC; its forecast error is correspondingly the average of the individual forecast errors, whose conditional bias and variance follow directly. Again ignoring terms of Op(T−1), this yields the expression for the combined-forecast MSFE reported in the text as Equation (17). [The intermediate equations, including (A.1) and (A.2), are rendered only as images in this source and are not reproduced here.]
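
Although the algebra itself is not reproduced above, the equal-weights case can be checked numerically: with e_C = (e_z + e_w)/2, the combined MSFE satisfies M[e_C] = (M[e_z] + M[e_w] + 2 E[e_z e_w])/4. A minimal check (the error series below are stand-ins, not the paper's design):

```python
import numpy as np

rng = np.random.default_rng(1)
e_z = rng.standard_normal(100_000) + 0.5        # stand-in Mz errors, with bias
e_w = 0.8 * e_z + rng.standard_normal(100_000)  # positively correlated Mw errors
e_c = 0.5 * (e_z + e_w)                         # equal-weights combination error

msfe = lambda e: np.mean(e ** 2)
print(np.isclose(msfe(e_c),                     # True: M[e_C] equals the identity above
                 0.25 * (msfe(e_z) + msfe(e_w) + 2.0 * np.mean(e_z * e_w))))
```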