Stochastic Modeling and Environmental Change
Published Online: 15 SEP 2006
Copyright © 2002 John Wiley & Sons, Ltd
Encyclopedia of Environmetrics
How to Cite
Young, P. C. and Pedregal, D. J. 2006. Forecasting, Environmental. Encyclopedia of Environmetrics. 3.
The term ‘environmental forecasting’ is open to a range of interpretations, from ‘black-box’ time series analysis and forecasting, based on relatively simple stochastic methods, to ‘mechanistic’ prediction based on ‘physically meaningful’, and normally quite complex, deterministic simulation models. In this entry, however, we concentrate on stochastic forecasting methods based on models that range from purely black-box time series characterizations of data to data-based mechanistic (DBM) models that are physically meaningful but, at the same time, fairly simple in their structure and signal topology.
There are numerous statistical approaches to forecasting, from simple, regression-based methods to optimal statistical procedures formulated in stochastic state–space (SS) terms. Since it would be impossible to review all of these methods here, this entry first briefly reviews those felt to have the most significance, in theoretical and practical terms, within the specific context of environmental forecasting. It then discusses in more detail a unified approach to forecasting based on a generic unobserved component (UC) model, formulated in stochastic SS terms.
A Brief Review of Model-Based Forecasting
The more recent approaches to stochastic forecasting are all model-based and allow inherently for the specification of dynamic relationships. They began with the consideration of simple, univariate series, stimulated by the discovery that such models were not only simpler to estimate, but could also produce forecasts that compared well with, and often out-performed, those generated by much larger mechanistic models.
Some of the earliest and simplest univariate options are those based on forms of exponential smoothing (ES). These were introduced by Holt and Winters but, because of their practical utility and success, they are still used today. Different versions of these ES forecasting procedures are available, but the general idea is the synthesis of forecasting functions on the basis of discounted past observations. These apparently ad hoc methods can, however, be justified and unified by reference to both the univariate autoregressive integrated moving average (ARIMA) model (see Time Series) and alternative UC models.
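The discounting idea can be sketched in a few lines. The following minimal implementation of Holt's linear-trend version of ES is illustrative only: the function name and default smoothing constants are our own assumptions, not values from the cited literature.

```python
import numpy as np

def holt_linear(y, alpha=0.3, beta=0.1):
    """Holt's linear-trend exponential smoothing: level and trend are
    updated recursively, discounting past observations geometrically.
    Returns a function giving the h-step-ahead forecast."""
    level, trend = y[0], y[1] - y[0]          # simple initialisation
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return lambda h: level + h * trend        # linear forecast function
```

On a perfectly linear series the recursion locks on to the trend exactly, so the h-step-ahead forecast extrapolates without error; on noisy data the constants alpha and beta control how quickly past observations are discounted.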
Although there were a number of important contributions dealing with the ARIMA class of models prior to 1970, that year marked the beginning of the popularization of this model and its associated forecasting methodology, with the publication of Box and Jenkins' highly influential book Time-series Analysis: Forecasting and Control [5]. Indeed, the influence of this monograph has been so great that some authors consider it to mark the beginning of modern forecasting developments. The methodology has served as the basis of innumerable developments and extensions since 1970, as the number of published books on the subject clearly demonstrates (see [6-11]). Also within the univariate context are AR fractionally integrated MA (ARFIMA) models, in which the integration order is not necessarily an integer, as assumed in ARIMA models. The ARFIMA model has become a popular type of ‘long memory’ model, with obvious environmental implications.
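To make the ARIMA idea concrete, the sketch below fits the autoregressive part of an ARI(p,1) model by ordinary least squares after first differencing. This is a deliberately simplified stand-in for full Box–Jenkins maximum likelihood estimation, and the function names are hypothetical.

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares fit of an AR(p) model y_t = phi_1 y_{t-1} + ... + e_t."""
    # column i of the design matrix holds the lag-(i+1) values y_{t-1-i}
    X = np.column_stack([y[p - 1 - i : len(y) - 1 - i] for i in range(p)])
    phi, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return phi

def ari_one_step(y, p):
    """One-step forecast from an ARI(p,1) model: difference once, fit an
    AR(p) to the differences, then re-integrate the forecast difference."""
    dy = np.diff(y)
    phi = fit_ar(dy, p)
    return y[-1] + phi @ dy[-1 : -p - 1 : -1]
```

Differencing handles the nonstationary 'I' part of the model; a full ARIMA treatment would add the moving average terms and estimate everything jointly by maximum likelihood.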
One important extension of the univariate models is to multivariate or vector AR and ARMA models (VAR; VARMA), as introduced by Quenouille [13] and Tiao and Box [14], and described in the book by Lütkepohl. Often, VAR is preferred to VARMA because any VARMA model can be expressed as an equivalent (although parametrically less efficient) VAR model and, in addition, the estimation methods are simpler in the VAR case. In particular, estimation of the VAR model can be carried out separately, equation by equation, using linear least squares methods, without any loss of efficiency. Versions of these models with exogenous input variables have also been developed (i.e. VARX and VARMAX models).
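The equation-by-equation property is easy to demonstrate: each row of the VAR coefficient matrix can be estimated by a separate least-squares regression. The short simulation below uses synthetic data and illustrative coefficient values of our own choosing.

```python
import numpy as np

def fit_var1(Y):
    """OLS estimate of A in the VAR(1) model Y_t = A Y_{t-1} + e_t,
    fitted one equation (one row of A) at a time."""
    X, Z = Y[:-1], Y[1:]                      # regressors and targets
    k = Y.shape[1]
    A = np.empty((k, k))
    for i in range(k):                        # separate regression per equation
        A[i], *_ = np.linalg.lstsq(X, Z[:, i], rcond=None)
    return A

# simulate a stable bivariate VAR(1) and re-estimate its coefficients
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.1],
                   [0.0, 0.3]])
Y = np.zeros((2000, 2))
for t in range(1, 2000):
    Y[t] = A_true @ Y[t - 1] + 0.1 * rng.standard_normal(2)
A_hat = fit_var1(Y)
```

Estimating each equation separately loses no efficiency here because every equation shares the same set of regressors, so joint and equation-by-equation least squares coincide.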
Another time-series approach that has developed in parallel with the previous methodologies and, indeed, can subsume some of them straightforwardly, is the UC model (also known in the econometrics literature as the structural model (SM)). As we shall see in the next section, these models are based on a decomposition of the time series into simple components (sometimes also called ‘stylized facts’).
The UC approach to forecasting can be divided roughly into two groups: first, methods based directly on the UC model, which take a variety of forms depending upon the nature of the stochastic models used to define the components (e.g. [4, 16-25]); second, reduced-form models, in which the UC components are inferred from other models, most notably the Box–Jenkins ARIMA type (e.g. [26-28]). In this reduced-form approach, however, the UC components are used simply as submodels for signal extraction, normally without an integrated option for forecasting.
The Unobserved Component Model
In their present form, UC models derive originally from the statistics and econometrics literature, where they have been variously termed dynamic linear models (see Dynamic Model) and structural models. However, their origin is intimately linked with state–space methods of recursive estimation (e.g. [22, 29]) and, in particular, the Kalman Filter (KF) [30].
Various Approaches to UC Modeling
The components of the UC model normally include a low frequency stochastic trend; a (possibly damped) periodic cycle defined at fundamental and associated harmonic frequencies; and an irregular component, normally considered as a zero mean, serially uncorrelated sequence of random variables (white noise). In the case of univariate time series (e.g. the famous Mauna Loa atmospheric CO2 series shown in Figure 1), these components are normally visually recognizable both in a temporal plot of the series and in its associated frequency spectrum.
However, in the environmental context, multivariable processes are often active, so that the environmental variable of interest may be affected by one or more input or ‘exogenous’ variables entering the model through regression or transfer function (TF) terms. A typical example is the flow or ‘discharge’ series shown in the upper panel of Figure 2, which is obviously affected by the local rainfall shown in the lower panel.
Reflecting their perceived importance in a forecasting and signal extraction context, many extensions of UC models have been developed. For instance, new models for trend and seasonal components have been introduced recently, including a facility to allow for the modulation of one periodic component by another (e.g. a diurnal component modulated by a weekly component: see [31, 32]); the introduction of nonlinear and TF components; and the use of vector models including phenomena such as cointegration [21, 34, 35].
Theoretical and Algorithmic Background
For simplicity, attention in this entry is restricted to univariate UC models, i.e. models that can be used for forecasting a single time series variable, although it is assumed that this variable may be affected by other input variables. A typical univariate UC model of this type takes the following form:

yt = Tt + Ct + f(Ut) + Nt + et   (1)
where Tt is a nonstationary trend or very low frequency component; Ct is a cyclical or periodic component that may have time variable amplitude and phase characteristics; f(Ut) are linear or nonlinear relations between the outputs and any specified inputs, denoted here by the vector Ut; Nt is colored (often assumed ARMA) noise; and et is a zero mean, serially uncorrelated white noise sequence with variance σ2. In this formulation, Ct may represent cyclical components of any period that seems appropriate, e.g. in a typical environmental application they could represent annual, weekly, or diurnal cycles, or combinations of these (see Temporal Change).
For estimation purposes, (1) is considered in the following alternative vector ‘observation equation’ form:

yt = Ht xt + et   (2)
where Ht relates the various components in (1) to the n-dimensional stochastic state vector xt. The definition of xt and the exact form of Ht will depend upon the nature of the components in (1) and the assumed model for their stochastic evolution. In general terms, however, this evolution is defined by a set of stochastic state equations of the following form:

xt = A xt−1 + B ut−1 + G ηt−1   (3)
Here, ut is an m-dimensional vector of inputs that may have elements in common with Ut; and ηt is an n-dimensional vector of (typically assumed Gaussian) system disturbances, i.e. zero-mean white-noise inputs with covariance matrix Q. In general, et is assumed to be independent of ηt, and these two noise inputs are assumed to be independent of the initial state vector x0. A, B, G and Ht are, respectively, n × n, n × m, n × n and 1 × n matrices, some elements of which are known and others of which need to be estimated. The manner in which these matrices are defined in specific applications is discussed in the cited publications. Two environmental examples relating to the data shown in Figures 1 and 2 are considered later.
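The way the components map into these matrices can be sketched directly. The particular blocks below (an integrated-random-walk trend plus one harmonic cycle of period 12) are an illustrative assumption, since the exact structure depends on the component models chosen in each application.

```python
import numpy as np

period = 12
w = 2 * np.pi / period

# integrated-random-walk trend: state = [level, slope]
trend_block = np.array([[1.0, 1.0],
                        [0.0, 1.0]])
# one harmonic cycle, carried by a rotating pair of states
cycle_block = np.array([[ np.cos(w), np.sin(w)],
                        [-np.sin(w), np.cos(w)]])

# block-diagonal transition matrix A; disturbances enter every state via G
A = np.block([[trend_block, np.zeros((2, 2))],
              [np.zeros((2, 2)), cycle_block]])
G = np.eye(4)
# observation vector H picks out the trend level and the cycle state
H = np.array([[1.0, 0.0, 1.0, 0.0]])
```

With zero disturbances the cycle block simply rotates its pair of states, returning to the starting point after one full period; the disturbances injected through G are what allow the components to evolve stochastically.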
The main reason for formulating the system model in discrete-time state–space (SS) terms is to facilitate the process of recursive estimation by exploiting the power of the KF (see Space-Time Kalman Filter) and fixed interval smoothing (FIS) algorithms. In recursive estimation, which has a long history going back to Gauss, the estimates of model parameters and states are updated sequentially while working through the data in some specified order, usually forward or backward in time. It is well known that this allows for a number of useful operations when dealing with real data, operations that are not normally available in other forecasting/signal extraction algorithms. If missing data are detected anywhere within, or immediately outside, the data set, then the filtering and smoothing algorithms simply replace the missing samples by their expectations, based on the SS model and the data. In this way, the KF can produce multiple-step-ahead forecasts, while FIS can provide interpolations and backcasts of the series. Recursive estimation is also useful in ‘variance intervention’ and related, more complex, methods for handling sudden changes in the trend level.
For a data set of N observations, the KF algorithm runs forward in time and yields a ‘filtered’ estimate of the state vector xt at every sample t, based on the time-series data up to sample t. The FIS algorithm, in contrast, is applied subsequent to the filtering pass and runs in reverse time, producing a ‘smoothed’ estimate of xt which, at every sample t, is based on all N observations of the data. Because more information is used in the latter estimate, its mean square error cannot be greater than that of the former. As these algorithms are discussed in detail in other references (see, for example, [4, 19, 22, 25]), we will not pursue the topic further.
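The two passes can be sketched for the simplest UC model, the random-walk-plus-noise (local level) case. This is a minimal scalar illustration, not the general multivariate algorithm, and the near-diffuse initialisation is an assumption.

```python
import numpy as np

def kf_fis(y, nvr):
    """Kalman filter plus fixed-interval (RTS-type) smoother for the
    local-level model x_t = x_{t-1} + eta_t, y_t = x_t + e_t, where
    nvr = var(eta)/var(e).  NaN observations are treated as missing and
    replaced by their model-based expectations."""
    n = len(y)
    x, p = y[np.isfinite(y)][0], 1e6          # near-diffuse initial state
    xf, pf, xp, pp = (np.empty(n) for _ in range(4))
    for t in range(n):                        # forward (filtering) pass
        xprior, pprior = x, p + nvr           # variances in units of var(e)
        xp[t], pp[t] = xprior, pprior
        if np.isfinite(y[t]):
            k = pprior / (pprior + 1.0)       # Kalman gain
            x = xprior + k * (y[t] - xprior)
            p = (1.0 - k) * pprior
        else:                                 # missing sample: keep prediction
            x, p = xprior, pprior
        xf[t], pf[t] = x, p
    xs = xf.copy()
    for t in range(n - 2, -1, -1):            # backward (smoothing) pass
        c = pf[t] / pp[t + 1]
        xs[t] = xf[t] + c * (xs[t + 1] - xp[t + 1])
    return xf, xs
```

Running the smoother over a series with an artificial gap interpolates across the gap, mirroring the missing-data behaviour described above.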
Optimization of Hyper-Parameters
The application of the recursive KF/FIS algorithms requires knowledge of the system matrices A, B and Ht, together with the noise covariance matrix Q and the variance σ2 of the white noise et. In many cases, these variance–covariance parameters can be combined to form a noise variance ratio (NVR) matrix Qr = Q/σ2 (usually assumed to be diagonal for simplicity). Depending on the particular structure of the model, a number of elements in the above matrices will be known a priori (often zeros and ones). Normally, however, there will be some unknown elements, referred to as hyper-parameters, that must be estimated in some manner. Of particular importance within this stochastic setting are the diagonal elements of the NVR matrix. These NVR parameters inject a statistical degree of freedom into the model and so allow for stochastic variations in the state variables. As we shall see, this is the primary mechanism for estimating temporal changes in the parameters and variables of the UC model.
There are a number of ways of handling this hyper-parameter estimation problem. The best-known approach is based on maximum likelihood (ML) optimization in the time domain. Assuming that all the disturbances in the SS model are normally distributed, the log-likelihood function can be computed using the KF and a technique known as ‘prediction error decomposition’ [4, 39]. The hyper-parameters can then be estimated by numerical optimization of this likelihood function.
There is no doubt that ML has theoretical and practical advantages. Its optimal properties are well known and, in the above form, it is generally applicable if the Gaussian assumptions are valid. But ML in the time domain is not the only estimation method available. Other methods are discussed in [4, 19, 21, 25, 27, 28, 40]. The last of these references suggests an alternative frequency domain (YFD) method for hyper-parameter optimization that appears to be superior to ML, particularly for higher-dimensional models. In particular, the optimum is much better defined and computation times are much shorter than those required for equivalent ML optimization. Indeed, ML very often fails to converge at all in these higher-dimensional situations, and convergence may only be achieved if several harmonics in the seasonal components are constrained to have the same hyper-parameters. Paradoxically, therefore, the YFD method can produce solutions that are even better in likelihood terms than the constrained versions of the same model estimated directly by ML optimization.
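The prediction error decomposition itself is compact enough to sketch for the local-level model: the KF innovations and their variances deliver the Gaussian log-likelihood directly, and a crude grid search then plays the role of the numerical optimizer. The simulation settings, seed, and grid below are illustrative assumptions.

```python
import numpy as np

def loglik_nvr(y, nvr):
    """Concentrated log-likelihood of the local-level model, computed
    from the Kalman filter innovations (prediction error decomposition);
    var(e) is concentrated out of the likelihood analytically."""
    x, p = y[0], 1e6                      # near-diffuse initialisation
    ss = logf = 0.0
    n = 0
    for obs in y[1:]:
        pprior = p + nvr
        fvar = pprior + 1.0               # innovation variance / var(e)
        innov = obs - x
        ss += innov ** 2 / fvar
        logf += np.log(fvar)
        n += 1
        k = pprior / fvar
        x += k * innov
        p = (1.0 - k) * pprior
    s2 = ss / n                           # concentrated estimate of var(e)
    return -0.5 * (n * np.log(s2) + logf + n)

# simulate a random walk plus noise with true NVR = 0.5, then grid-search
rng = np.random.default_rng(1)
level = np.cumsum(np.sqrt(0.5) * rng.standard_normal(400))
y = level + rng.standard_normal(400)
grid = np.logspace(-3, 1, 80)
nvr_hat = max(grid, key=lambda q: loglik_nvr(y, q))
```

In practice a proper numerical optimizer replaces the grid, and the same decomposition extends to the full multivariate SS model; the grid search here simply makes the shape of the likelihood in the NVR easy to inspect.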
Most of the methods mentioned here have been implemented in commercially available statistical packages. These include the CAPTAIN Matlab Toolbox (see http://cres1.lancs.ac.uk/systems.html); STAMP®; BATS®; and SEATS®. Of course, no single optimization and estimation method will outperform the rest in all situations, and all have their own particular advantages and disadvantages. Consequently, as is often the case, the user's experience and knowledge are essential in selecting the best option for each application.
Other publications illustrate how the modeling and forecasting procedure outlined above can be applied to environmental and other time series (see the references already cited and the prior references therein). Here, however, we will consider two simple but practically important environmental examples relating to the time series shown in Figures 1 and 2.
Mauna Loa Atmospheric CO2 Series
The complete analysis of this CO2 series is described in [42]. Given the nature of the series, the specific UC model takes the form:

yt = Tt + St + et   (4)
where the trend Tt = a0 + a1 t + a2 t² + dt is a polynomial in t, with unknown but constant parameters a0, a1 and a2, combined with a residual signal dt that models the stochastic deviations about this polynomial trend. The annual seasonality St is then modeled by:

St = Σj=1,2 [ajt cos(ωj t) + bjt sin(ωj t)]   (5)
where ω1 = 2π/12 is the fundamental frequency associated with the 12 month cycle and ω2 = 2π/6 is its first harmonic. The spectrum of the data suggests that the other harmonics are insignificant and can be omitted. The time variable parameters (TVPs) ajt and bjt, j = 1,2, allow for possible temporal variations in the amplitude and phase of the annual cycle.
Equations 4 and 5 are a special version of the dynamic harmonic regression (DHR) model. The associated stochastic state equations 3 are composed of simple random walk (RW) models (see Stochastic Process) for dt, ajt and bjt; and the hyper-parameters are the NVRs (see earlier discussion) associated with these models. In other words, A is a fifth-order identity matrix; B is a null matrix, since there are no deterministic inputs; G is a 5 × 1 vector with all unity entries; and H is defined as follows:

H = [1  C1  S1  C2  S2]
where Cj = cos(ωj t) and Sj = sin(ωj t), for j = 1,2. The NVR matrix is then defined as a diagonal matrix with diagonal elements NVRj, j = 1,2,…,5, equal to the individual NVRs. These define the variances of the white noise inputs to the RW models for dt, ajt and bjt (j = 1,2), respectively. However, because of the nature of the periodic components, it makes sense to constrain the NVRs associated with each individual periodic component (though not across different components) to be equal. The resultant three NVRs are optimized by the YFD method in the frequency domain, and the polynomial parameters aj, j = 0,1,2, are estimated concurrently by linear regression in an iterative ‘backfitting’ procedure. This procedure yields the optimized parameter estimates.
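A constant-parameter miniature of this DHR setup can be written directly: build the harmonic regressors of equation 5, remove a quadratic trend as in equation 4, and regress out the harmonics — one sweep of the backfitting idea. The synthetic series and its coefficient values below are illustrative assumptions, not the Mauna Loa estimates.

```python
import numpy as np

w = 2 * np.pi / 12                    # fundamental (12-sample) frequency
t = np.arange(240.0)                  # 20 complete annual cycles

# harmonic regressors H_t = [1, cos(w t), sin(w t), cos(2w t), sin(2w t)]
H = np.column_stack([np.ones_like(t),
                     np.cos(w * t), np.sin(w * t),
                     np.cos(2 * w * t), np.sin(2 * w * t)])

# synthetic CO2-like series: quadratic trend + annual cycle + first harmonic
y = 300.0 + 0.1 * t + 1e-4 * t**2 + 2.5 * np.sin(w * t) + 0.4 * np.cos(2 * w * t)

# one backfitting sweep: regress out the quadratic trend, then the harmonics
T = np.column_stack([np.ones_like(t), t, t**2])
trend_coef, *_ = np.linalg.lstsq(T, y, rcond=None)
harm_coef, *_ = np.linalg.lstsq(H, y - T @ trend_coef, rcond=None)
```

In the full DHR model the harmonic coefficients are random walks estimated recursively by the KF/FIS algorithms, so the amplitude and phase of the annual cycle can drift over time rather than staying fixed as here.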
Typical interpolation and forecasting results obtained using the optimized model, as generated by Matlab m-files using tools from the CAPTAIN Matlab Toolbox mentioned earlier, are presented in Figures 3, 4 and 5. For this analysis, the estimation data set consisted of the first 313 samples (1958(1)–1985(5)), but with two years of data between 1971(8) and 1973(8) omitted to show how the algorithms interpolate over such a gap. True multistep-ahead forecasting is carried out in the short (3 years ahead) and long (14.42 years ahead, to 1999(12)) terms, respectively, based only on the estimated model and the estimation data set.
The top panel in Figure 3 shows the two-year interpolation results, while the lower panel shows the three-year-ahead forecasts. In both cases, the results are excellent, with very small errors between the interpolates/forecasts (full lines) and the data (dashed lines: not used in the analysis). The estimated error bounds (95% confidence intervals) are shown as dotted lines. Figure 4 presents the main signals extracted by the FIS algorithm during this analysis. The top two panels show, respectively, the estimated 12- and 6-month components of the annual cycle; the estimate of dt is plotted in the third panel; and the very small final residual (the estimate of et) is shown in the lower panel. As required, the innovations process [22, 42] has the desired white noise properties.
Figure 5 shows the last five and a half years of the 14.4 year-ahead forecast. Here, the data (again not used in the estimation or forecasting) are plotted as circular points and the forecast as a full line, with the error bounds again shown dotted. Given the very long forecasting interval, these results are quite remarkable and show how predictable this important series can be if sufficiently powerful forecasting algorithms are utilized.
Finally, it is worth referring back to the estimate of dt shown in the third panel of Figure 4. This represents an estimate of the deviations about the polynomial trend after the seasonal pattern and residuals have been removed. The origin of this signal is not clear, but earlier research suggested that it might be associated with the El Niño/Southern Oscillation (ENSO) phenomenon. Although this is not proven in any way, it is supported by the present analysis, which shows that the extracted signal correlates quite well with the ENSO anomaly series over the same 1958–2000 period.
River Hodder Rainfall–Flow Series
The complete analysis of the rainfall–flow data series is described in [45]. The UC model in this case is quite different from that used in the previous example, since there is no trend or periodicity in the flow data. As a result, the main component is the nonlinear function f(Ut), where Ut represents two additive, unobserved flow components, both of which derive nonlinearly from the rainfall rt via a second-order TF model:

yt = x1,t + x2,t + et,  with x1,t = [β1/(1 − α1 z⁻¹)] ut and x2,t = [β2/(1 − α2 z⁻¹)] ut
In terms of 3, the associated stochastic state equations are defined by A, B, G and H as follows:

A = [α1  0; 0  α2],  B = [β1; β2],  G = I2 (the 2 × 2 identity matrix),  H = [1  1]
and, in this case, the scalar input ut is the ‘effective rainfall’, a delayed, nonlinear transformation of the measured rainfall rt. In these definitions, the parameters α1 = 0.753, α2 = 0.986, β1 = 0.0055 and β2 = 0.147 are obtained from prior TF modeling using an iterative optimization procedure combining optimal instrumental variable (IV) estimation of the TF parameters with state-dependent parameter (SDP) estimation of the effective rainfall nonlinearity (see [44-48]).
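The parallel decomposition implied by these matrices can be simulated directly. The parameter values are those quoted above, but the exact lag placement of the input is an assumption for illustration; the cited papers give the identified transfer function in full.

```python
import numpy as np

def parallel_flow(u, a1=0.753, a2=0.986, b1=0.0055, b2=0.147):
    """Two parallel first-order stores driven by effective rainfall u:
    a 'fast' pathway x1 and a 'slow' pathway x2; flow is their sum."""
    x1 = x2 = 0.0
    flow = np.empty(len(u))
    for t, ut in enumerate(u):
        x1 = a1 * x1 + b1 * ut        # fast, quickly decaying store
        x2 = a2 * x2 + b2 * ut        # slow store with a near-unity pole
        flow[t] = x1 + x2
    return flow
```

The steady-state gains b1/(1 − a1) ≈ 0.02 and b2/(1 − a2) ≈ 10.5 show that, in the long run, almost all of the flow passes through the slow pathway, consistent with the slow, subsurface interpretation of x2.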
Flow forecasting is quite difficult because, nominally, it requires concurrent forecasting of the rainfall rt, which is obviously problematic, particularly in England's changeable and unpredictable climate. However, forecasts are usually only required a few hours ahead (see, for example, the entry on meteorology) and, in this example, the TF model exhibits a four-hour pure time delay between the occurrence of rainfall events and their first effect on flow. As a result, rt is known over this delay period and can be utilized to compute ut, which then acts as a ‘leading indicator’ to generate good flow forecasts. This is demonstrated in Figure 6, which compares the four-hour-ahead forecasts of flow on the River Hodder with the measured flows. Here, the two diagonal elements of the Qr NVR matrix are optimized at values of NVR(x1) = 0 and NVR(x2) = 0.023, respectively. In other words, the first flow component x1 is defined exactly by the prior TF model predictions, while the second component x2 is ‘adapted’ by the KF algorithm from the prior TF model predictions to a degree determined by the associated NVR.
The performance of the forecasting algorithm is quite good: the coefficient of determination of the four-hour-ahead forecasts is R2 = 0.872, compared with R2 = 0.484 for a naive or ‘persistence’ forecast (forecast equal to present value).
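The quoted comparison rests on the coefficient of determination of the forecast errors, which is easy to compute alongside a persistence benchmark; the sinusoidal test series below is synthetic, purely to exercise the functions.

```python
import numpy as np

def coeff_det(y, yhat):
    """Coefficient of determination: 1 - SSE(forecast) / SS about the mean."""
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def persistence_r2(y, lead):
    """R^2 of the naive forecast that predicts y_{t+lead} by y_t."""
    return coeff_det(y[lead:], y[:-lead])
```

For a slowly varying series, short-lead persistence scores highly on its own, which is why a forecasting model should be judged against this benchmark rather than by its R^2 alone.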
Note that, in this example, the model parameters are assumed to be constant, unlike the ‘parameter-adaptive’ flood forecasting system described in [33]. Although, as in that reference, TVP estimation could be used here, the prior TF estimation results justify the constant parameter assumption, and this makes the forecasting algorithm simpler and more robust in practice. As a result, the recursive analysis in this case yields short-term, ‘state-adaptive’ forecasts of flow, in the sense that the adjustment of the flow estimates derives from the stochastic estimates of the two flow components x1,t and x2,t, rather than from any adjustment of the TF model parameters. This is in contrast to the ‘parameter-adaptive’ forecasts of the Mauna Loa CO2 series, where the model parameters are continually adjusted to reflect any estimated nonstationarity in the series.
Finally, although the model used in this example has been presented so far as a ‘black-box’ representation of the rainfall–flow process, this is not the case. The model is, in fact, a good example of a DBM model (see [45] and the prior references therein). In other words, the data-based model can be interpreted in hydrologically meaningful terms. In particular, the state equations represent a ‘parallel flow’ process, often encountered in hydrological systems, with the state x1,t modeling the ‘fast’, surface processes in the river catchment, and the second state x2,t modeling the ‘slow’, subsurface processes (see Catchment Hydrology). In addition, the estimated effective rainfall nonlinearity, which is a function of flow, can be interpreted in terms of the continually changing soil moisture characteristics of the catchment, with the flow acting as a ‘surrogate’ measure of soil moisture.
References
- [2] (1957). Forecasting seasonals and trends by exponentially weighted moving averages, ONR Research Memorandum 52, Carnegie Institute of Technology, Pittsburgh.
- [4] (1989). Forecasting, Structural Time-series Models and the Kalman Filter, Cambridge University Press, Cambridge.
- [5] (1970, 1976). Time-series Analysis: Forecasting and Control, Holden-Day, San Francisco.
- [6] (1983). Statistical Methods for Forecasting, Wiley, New York.
- [7] (1984). The Analysis of Time-series: An Introduction, Chapman & Hall, London.
- [8] (1998). Elements of Forecasting, South-Western, Cincinnati.
- [9] (1990). Time-series: A Biostatistical Introduction, Clarendon Press, Oxford.
- [10] (1986). Forecasting Economic Time-series, 2nd Edition, Academic Press, San Diego.
- [11] (1981). Time-series Models, Philip Allan, New York.
- [13] (1957). The Analysis of Multiple Time-series, Griffin, London.
- [14] (1981). Modelling multiple time-series with applications, Journal of the American Statistical Association 76, 802–816.
- [16] (1976). Bayesian forecasting, Journal of the Royal Statistical Society, Series B 38, 205–247.
- [17] (1984). Recursive filtering and the inversion of ill-posed causal problems, Utilitas Mathematica 35, 351–376.
- [18] (1995). STAMP 5.0: Structural Time-series Analyser, Modeller and Predictor, Timberlake Consultants.
- [20] (1995). Applied Bayesian Forecasting and Time-series Analysis, Chapman & Hall, New York.
- [21] (1989). Bayesian Forecasting and Dynamic Models, Springer-Verlag, New York.
- [22] (1984). Recursive Estimation and Time-series Analysis, Springer-Verlag, Berlin.
- [27] (1983). Modelling considerations in the seasonal adjustment of economic time-series, in Applied Time-series Analysis of Economic Data, A. Zellner, ed., US Dept. of Commerce, Bureau of the Census, Washington, pp. 74–100.
- [29] (1983). Theory and Practice of Recursive Estimation, MIT Press, Cambridge.
- [30] (1960). A new approach to linear filtering and prediction problems, Journal of Basic Engineering 83-D, 95–108.
- [31] (2001). Multi-rate forecasting of telephone call demand: a software package for unobserved components modelling and forecasting, International Journal of Forecasting, in press.
- [33] (2000). Data-based mechanistic modelling and adaptive flow forecasting, in Flood Forecasting: What Does Current Research Offer the Practitioner?, M. Lees & P. Walsh, eds, British Hydrological Society, Occasional Paper No. 12, produced by the Centre for Ecology & Hydrology on behalf of the British Hydrological Society, pp. 26–40.
- [35] (1997). Comments on multivariate structural time-series models, in System Dynamics in Economic and Financial Models, C. Heij, H. Schumacher, B. Hanzon & K. Praagman, eds, Wiley, Chichester.
- [40] (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B 39, 1–38.
- [41] (1999). Nonstationary time-series analysis and forecasting, Progress in Environmental Science 1, 3–48.
- [42] (2000). The Mauna Loa atmospheric CO2 data: new analysis and forecasting results, Centre for Research on Environmental Systems and Statistics (CRES), Technical Note No. TR/180.
- [45] (2001). Data-based mechanistic modelling and validation of rainfall-flow processes, in Model Validation in Hydrological Science, M.G. Anderson, ed., Wiley, Chichester, pp. 117–162.
- [46] (2000). Stochastic, dynamic modelling and signal processing: time variable and state-dependent parameter estimation, in Nonstationary and Nonlinear Signal Processing, W.J. Fitzgerald, A. Walden, R. Smith & P.C. Young, eds, Cambridge University Press, Cambridge, pp. 74–114.
- [47] (2001). The identification and estimation of nonlinear stochastic systems, in Nonlinear Dynamics and Statistics, A.I. Mees, ed., Birkhäuser, Boston, in press.