Abstract
The present paper adopts an autoregressive approach to examine the time series of monthly maximum temperature (T_{max}) over northeast India. Through autocorrelation analysis the T_{max} time series of northeast India is identified as nonstationary, with a seasonality of 12 months, and both parametric and nonparametric methods show that it has an increasing trend. Autoregressive models of the reduced T_{max} time series, which becomes stationary once the seasonal and trend components are removed from the original series, were generated through the Yule–Walker equations. The sixth-order autoregressive model, AR(6), is identified as a suitable representative of the T_{max} time series on the basis of the Akaike information criterion, and the prediction potential of AR(6) is also established statistically through Willmott's indices. Subsequently, autoregressive neural network models were generated as a multilayer perceptron, a generalized feed forward neural network and a modular neural network. An autoregressive neural network of order four implemented as a modular neural network, ARNN(4)MNN, performs comparably to AR(6), as shown by the high values of Willmott's indices and the low values of the prediction error. ARNN(4)MNN is therefore a better option than AR(6) for forecasting the monthly T_{max} time series of northeast India, because it requires fewer predictors for a forecast of comparable quality. Copyright © 2010 Royal Meteorological Society
1. Introduction
The observed and projected global warming in the 20^{th} and 21^{st} centuries has affected, and will continue to affect, agriculture, the hydrological cycle, environmental conditions and ecological systems (Lianchun et al., 2007). The average air temperature at the surface of the Earth is the most frequently used parameter for sensing the state of a climatic system (Ceschia et al., 1994). Different forcing actions of external factors, such as the carbon dioxide concentration (greenhouse effect), the amount of solar radiation and the particulates reaching the stratosphere from volcanic eruptions, significantly influence the surface temperature (Cracknell and Varotsos, 2007; IPCC, 2007). Moreover, advective processes exerted by atmospheric circulation are crucial factors that control regional air temperature changes (Xoplaki et al., 2003). The forecasting of surface temperature has long been an area of interest to the scientific community in general, and to meteorologists in particular. Since the late 1930s, different statistical methodologies have been attempted to forecast the surface temperature on hourly (Spreen, 1956), daily (Mantis and Dickey, 1945; Gilbert, 1953), monthly (Kangieser, 1959) and seasonal (Van Loon and Jenne, 1975; Kumar et al., 1997) time scales. Namias (1948) was among the first to state that the mean monthly geopotential height fields at midtropospheric levels determine monthly air temperature anomalies.
The nonlinear nature of the relationship between changes in mean temperature and the corresponding changes in extreme temperature events is well documented in the literature (Mearns et al., 1984; Meehl et al., 2000), and the major impact of relatively small alterations in the mean state on the probabilities of extreme events has been identified (Griffiths et al., 2005). The studies mentioned above were based on simple linear statistical approaches. Improved methodologies in this direction include Klein and Lewis (1970), Klein et al. (1971) and Klein and Marshall (1973), whilst authors such as Calvo and Gregory (1994), Massie and Rose (1997) and Vil'fand et al. (2007) adopted the regression approach in various forms to forecast surface air temperature.
Studies on long-term variations in surface air temperature have shown a rising trend during the last few decades (Hingane et al., 1985; Willmott and Matsuura, 1995; Shrestha et al., 1999), and various authors have emphasized the need for proper forecasts of the surface temperature (e.g. Hussain, 1984; Rehman et al., 1990; Said, 1992). Tasadduq et al. (2002) noted the importance of knowledge of the variability of surface ambient temperature for weather forecasting, surface budget studies, estimation of total solar radiation, calculation of cooling and heating degree-days, micrometeorological studies, initialization of planetary boundary-layer models, calculation of thermal loads on buildings, air pollution studies and upper-air heating rate calculations. Luterbacher et al. (2004) discussed the evolution of European winter, summer and annual mean temperatures over more than 500 years in the context of estimated uncertainties, emphasizing the trends, the spatial patterns of extreme summers and winters, and changes in both extreme and mean conditions. Elliott and Angell (1987) studied the relation between Indian monsoon rainfall, the Southern Oscillation and the hemispheric air and sea surface temperatures, and revealed that the correlations between sea surface temperature and monsoon rainfall are higher than those between monsoon rainfall and the various pressure indices over northeast India. Kumar et al. (1999), after analyzing a 140-year historical record, suggested that the inverse relationship between the El Niño–Southern Oscillation (ENSO) and the Indian summer monsoon (a weak monsoon arising from a warm ENSO event) has broken down in recent decades. Frias et al. (2005) discussed the limitations of general circulation models and subsequently applied a multimodel ensemble system to investigate the predictability of monthly average maximum temperature, finding that the results were not dependent upon model formulation. Mandal et al. (2007) discussed the association of sea surface temperature with the genesis of a severe storm over India and concluded that the sea surface temperature and its gradient have a significant impact on modulating the intensity of the storm, the peak intensity of which was reached over the warmest sea surface.
The present paper is concerned with the monthly maximum temperature over northeast India. Most of northeast India, like much of north India, has a humid subtropical climate. Northeast India refers to the easternmost region of India, consisting of the states of Arunachal Pradesh, Assam, Meghalaya, Manipur, Mizoram, Nagaland, Tripura, Sikkim and parts of North Bengal (the districts of Darjeeling, Jalpaiguri and Koch Bihar). Weston (1972) studied the flow pattern over northeast India and revealed that solar heating has a positive impact upon cumulus and cumulonimbus convection over northeast India during the premonsoon season (March, April and May). Early cloud observations over the Indian landmass reported that the premonsoon convection in northeast India is more intense than the monsoonal convection (Ludlam, 1980; Chaudhuri and Chattopadhyay, 2001; Zuidema, 2003; Yamane and Hayashi, 2006). The convective intensity may be aided by the presence of midtropospheric dry air, which increases downdraft evaporation and, thereby, the intensity of the cold pool. The convective activities over northeastern India have been studied by Pattanaik (2007), and several authors (e.g. Peterson and Mehta, 1995; Yamane and Hayashi, 2006) have documented that severe local storms, including tornadoes, damaging hail and wind gusts, frequently occur in northeastern India during the premonsoon season. The role of the maximum surface temperature in the genesis of premonsoon thunderstorms has been established by Chaudhuri and Chattopadhyay (2005). It can, therefore, be surmised that a forecast model for maximum temperature may contribute to the forecasting of thunderstorms. Moreover, the univariate modelling approach adopted in the present research depends only upon the past values of the same time series, i.e. the monthly maximum temperature. Therefore, without the help of any other climatic predictor, the present models can be used to forecast future values of temperature over northeast India.
Northeast India has a great economic dependence on crops such as paddy, tea and forest products. Rainfall, evaporation, transpiration and evapotranspiration are vital components of the hydrological cycle and are significant for irrigation and agricultural practices. Temperature, along with other climatic parameters such as sunshine duration, wind speed and humidity, affects the processes of evaporation and evapotranspiration. No significant trend in rainfall has been observed in the northeast region of India as a whole (Das and Goswami, 2003; Das, 2004). However, a significant decreasing trend in seasonal rainfall, at a rate of 11 mm decade^{−1}, was reported during the last century in the South Assam Meteorological Subdivision, covering the hilly states of Nagaland, Manipur, Mizoram and Tripura and parts of the Barai Hills in southern Assam (Mirza et al., 1998; Das, 2004). Jhajharia et al. (2009) examined trends in total rainfall using the Mann–Kendall nonparametric test for 11 sites in northeast India over different durations (yearly, winter, premonsoon, monsoon and postmonsoon seasons), and reported that, apart from increasing trends at Agartala in winter and at Chuapara in the yearly and premonsoon data, and decreasing trends at Nagrakata in yearly and monsoon rainfall, no statistically significant trends in yearly or seasonal rainfall were observed at most of the sites of northeast India. Das (2004) observed that the mean maximum temperature is rising at a rate of 0.11 °C decade^{−1}. The annual mean temperature is also reported to be rising, at a rate of 0.04 °C decade^{−1}, in the northeast region of the country (Pant and Rupa Kumar, 1997; Das, 2004). Jhajharia et al. (2009) further reported that five and six sites of northeast India showed statistically significant increasing trends in maximum temperature in the monsoon and postmonsoon seasons, respectively, according to the Mann–Kendall (MK) test at the 5% significance level. However, in both the winter and premonsoon seasons, 10 of the 11 sites showed no significant trends in maximum temperature. Similarly, 10 and 9 sites showed no significant trend in minimum temperature, by the MK test at the 5% level of significance, in the winter and premonsoon seasons, respectively. Nevertheless, almost half of the sites analysed witnessed increasing trends in minimum temperature in the monsoon season over the northeast region of India (Jhajharia et al., 2009). Tea is one of the main cash crops of northeast India, and the relationship between temperature and tea yield is discussed in Wijeratne (1992) and Ghosh Hajra and Kumar (1999). The present study, therefore, may also help in the agrometeorological modelling of tea.
The organization of the paper is as follows. In Section 2 the autocorrelation structure of the time series of monthly maximum temperature is investigated and the trend in the time series is tested using both parametric and nonparametric approaches. Subsequently, the time series is deseasonalized and detrended. In Section 3, autoregressive models are generated using the Yule–Walker equations. The development of the autoregressive neural network models is described in Section 4, and a comparative statistical analysis of all of the models is presented in Section 5. Conclusions are presented in Section 6.
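Because the deseasonalizing and detrending of Section 2 are later reversed to return predictions to the original temperature scale, a concrete sketch of the two operations may be helpful. The following is an illustrative implementation, not the paper's exact procedure: it removes a least-squares linear trend and a 12-month mean seasonal cycle (the function and variable names are ours):

```python
import numpy as np

def deseasonalize_detrend(series, period=12):
    """Remove a linear trend (least squares) and a mean seasonal cycle
    of the given period from a monthly series; return the reduced series
    plus the fitted components so the transformation can be reversed."""
    series = np.asarray(series, dtype=float)
    t = np.arange(len(series), dtype=float)
    # Parametric trend removal: least-squares straight line.
    slope, intercept = np.polyfit(t, series, 1)
    detrended = series - (slope * t + intercept)
    # Seasonal component: mean of the detrended values for each calendar month.
    seasonal = np.array([detrended[m::period].mean() for m in range(period)])
    reduced = detrended - seasonal[np.arange(len(series)) % period]
    return reduced, slope, intercept, seasonal
```

Reversing the transformation (adding back seasonal[month] and slope*t + intercept) restores model output to the original scale, which is the step used before computing the goodness-of-fit statistics in Section 5.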
3. Autoregressive modelling using the Yule–Walker equations
After deseasonalizing and detrending the time series, the ACF and the partial autocorrelation function (PACF) have been computed for the reduced time series. The ACF (Figure 5(a)) shows that after the first few lags the autocorrelations are very close to 0 and, consequently, an autoregressive (AR) process can be attempted. This gradual decay of the ACF to 0 indicates that the reduced time series is stationary. The PACF is presented in Figure 5(b), and Box et al. (2007) describe that the PACF of an AR process of order p cuts off after lag p. From the plot of the PACF in Figure 5(b) it is seen that the PACF becomes essentially 0 at lag 7. Thus, AR(6) can be a suitable autoregressive model for the given time series. However, because all the partial autocorrelations before lag 6 (except that at lag 1) are also very close to 0, AR(p) processes are examined for p varying from 1 to 6, and their performances are finally judged by the Akaike information criterion (AIC). The AIC is given by (Storch and Zwiers, 1999):
AIC(p) = n ln σ̂^{2}_{p} + 2p  (3)
Details of the symbols used in the above expression are given on page 167 of Storch and Zwiers (1999). Some examples of the application of AR modelling to climatological time series include Davies and Milionis (1994), Koscielny and Duchon (1984), Leite and Peixoto (1996) and Besse et al. (2000).
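As a concrete illustration of this order-selection step, the sketch below fits AR(p) for p = 1, …, max_p by the Yule–Walker method and evaluates AIC(p) = n ln σ̂²_p + 2p for each order; the order with the smallest AIC is retained. The function names and this particular AIC variant are our assumptions, not taken from the paper:

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations r_0, ..., r_nlags of a 1-D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    c0 = np.dot(x, x) / len(x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / (len(x) * c0)
                     for k in range(nlags + 1)])

def aic_for_ar_orders(x, max_p):
    """Fit AR(p) by Yule-Walker for p = 1..max_p and return {p: AIC(p)},
    where AIC(p) = n*ln(sigma2_p) + 2p and sigma2_p is the residual
    variance of the order-p fit."""
    n = len(x)
    r = sample_acf(x, max_p)
    aic = {}
    for p in range(1, max_p + 1):
        # Toeplitz matrix of sample autocorrelations, R[i, j] = r_|i-j|.
        R = r[np.abs(np.subtract.outer(np.arange(p), np.arange(p)))]
        phi = np.linalg.solve(R, r[1 : p + 1])
        sigma2 = np.var(x) * (1.0 - np.dot(phi, r[1 : p + 1]))
        aic[p] = n * np.log(sigma2) + 2 * p
    return aic
```

Calling min(aic, key=aic.get) then gives the selected order; for the series studied in the paper that order is 6.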
The set of adjustable parameters ϕ_{1}, ϕ_{2}, …, ϕ_{p} of an autoregressive process of order p, i.e. the AR(p) process:
z̃_{t} = ϕ_{1}z̃_{t−1} + ϕ_{2}z̃_{t−2} + … + ϕ_{p}z̃_{t−p} + a_{t}  (4)
satisfies certain conditions for the process to be stationary. Here, z̃_{t} = z_{t} − µ and a_{t} is a white-noise (random shock) term. The parameter ϕ_{1} of an AR(1) process must satisfy the condition |ϕ_{1}| < 1 for the time series to be stationary. It can be shown that the autocorrelation function satisfies the equation:
ρ_{k} = ϕ_{1}ρ_{k−1} + ϕ_{2}ρ_{k−2} + … + ϕ_{p}ρ_{k−p},  k > 0  (5)
Substituting k = 1, 2, …, p in Equation (5) we get the system of Yule–Walker equations (Box et al., 2007):
ρ_{1} = ϕ_{1} + ϕ_{2}ρ_{1} + … + ϕ_{p}ρ_{p−1}
ρ_{2} = ϕ_{1}ρ_{1} + ϕ_{2} + … + ϕ_{p}ρ_{p−2}
⋮
ρ_{p} = ϕ_{1}ρ_{p−1} + ϕ_{2}ρ_{p−2} + … + ϕ_{p}  (6)
The Yule–Walker estimates of the autoregressive parameters ϕ_{1}, ϕ_{2}, …, ϕ_{p} are obtained by replacing the theoretical autocorrelations ρ_{k} by the estimated autocorrelations r_{k}. Thus, in matrix notation, the autoregressive parameters can be written as:

ϕ̂ = R^{−1}r

where ϕ̂ = (ϕ̂_{1}, ϕ̂_{2}, …, ϕ̂_{p})^{T}, r = (r_{1}, r_{2}, …, r_{p})^{T} and R is the p × p symmetric matrix whose (i, j)th entry is r_{|i−j|}.
The six autoregressive models AR(1), AR(2), AR(3), AR(4), AR(5) and AR(6) have been generated for the normalized time series, which has already been identified as stationary by the method explained above. The autoregressive parameters for the six models are presented in Table III. For each model the AIC has been computed, and the magnitudes of the AIC corresponding to the models are presented in Figure 6. This figure shows that the minimum AIC occurs for AR(6); it is thus established that AR(6) is the best autoregressive model for the time series under consideration. The autoregressive model fitting is based on the methodology explained in Box et al. (2007).
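In practice, the matrix solution above amounts to a single linear solve against the symmetric Toeplitz matrix of sample autocorrelations. A minimal numpy sketch (the function name is ours, and np.linalg.solve stands in for the explicit matrix inversion):

```python
import numpy as np

def yule_walker(x, p):
    """Yule-Walker estimates phi_1..phi_p of an AR(p) process: replace
    the theoretical rho_k by the sample autocorrelations r_k and solve
    the p x p linear system R phi = r."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    c0 = np.dot(x, x) / n
    # Sample autocorrelations r_0 .. r_p.
    r = np.array([np.dot(x[: n - k], x[k:]) / (n * c0) for k in range(p + 1)])
    # Symmetric Toeplitz matrix with (i, j) entry r_|i-j|.
    R = r[np.abs(np.subtract.outer(np.arange(p), np.arange(p)))]
    return np.linalg.solve(R, r[1 : p + 1])
```

For a simulated AR(2) series the estimates recover the generating coefficients to within sampling error, which is the behaviour expected of the fits reported in Table III.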
Table III. Autoregressive parameters corresponding to the six AR(p) models

AR(p)   ϕ_{1}   ϕ_{2}    ϕ_{3}    ϕ_{4}    ϕ_{5}    ϕ_{6}
AR(1)   0.517   —        —        —        —        —
AR(2)   0.062   0.421    —        —        —        —
AR(3)   0.128   −0.193   0.699    —        —        —
AR(4)   0.044   0.072    −0.099   0.604    —        —
AR(5)   0.111   −0.100   0.237    −0.365   0.831    —
AR(6)   0.047   0.057    0.015    0.048    −0.011   0.474
5. Comparative study
Before training the ARNN models, the prediction capacity of the AR(6) model is judged by computing the prediction error (PE) and Willmott's indices of order 1 (WI1) and 2 (WI2). These statistics are given by:
PE = Σ_{i=1}^{n} |P_{i} − O_{i}| / Σ_{i=1}^{n} |O_{i}|  (16)

WI1 = 1 − [Σ_{i=1}^{n} |P_{i} − O_{i}|] / [Σ_{i=1}^{n} (|P_{i} − Ō| + |O_{i} − Ō|)]  (17)

WI2 = 1 − [Σ_{i=1}^{n} (P_{i} − O_{i})^{2}] / [Σ_{i=1}^{n} (|P_{i} − Ō| + |O_{i} − Ō|)^{2}]  (18)

where O_{i} and P_{i} denote the observed and predicted values, respectively, and Ō is the mean of the observed values.
The predicted values of the deseasonalized and detrended time series generated by the AR(6) model are brought back to the original scale by reversing the deseasonalizing and detrending operations. Equations (16)–(18) are then used to compute the statistical measures of goodness of fit of the AR(6) model. The results are presented in Table IV, where it is seen that the values of WI1 and WI2 are very close to 1, which indicates a very good prediction by the AR(6) model.
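Willmott's index of agreement has a standard form, WI_j = 1 − Σ|P_i − O_i|^j / Σ(|P_i − Ō| + |O_i − Ō|)^j for order j, and both indices are straightforward to compute directly. In the sketch below the WI formula is the standard one, but the PE definition (total absolute error relative to the total absolute observation) is our assumption, since the paper's exact expression did not survive extraction:

```python
import numpy as np

def willmott_index(obs, pred, order):
    """Willmott's index of agreement of the given order (1 or 2):
    WI_j = 1 - sum(|P - O|^j) / sum((|P - Obar| + |O - Obar|)^j)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    obar = obs.mean()
    num = np.sum(np.abs(pred - obs) ** order)
    den = np.sum((np.abs(pred - obar) + np.abs(obs - obar)) ** order)
    return 1.0 - num / den

def prediction_error(obs, pred):
    """Total absolute prediction error scaled by the total absolute
    observation (an assumed form of the PE statistic)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.sum(np.abs(pred - obs)) / np.sum(np.abs(obs))
```

A perfect forecast gives WI = 1 and PE = 0; values of WI near 1 and PE near 0, as reported for AR(6) in Table IV, therefore indicate a good fit.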
Table IV. Values of various statistics measuring the goodness of fit of various models

Model  PE  WI1  WI2
AR(6)  0.03  0.87  0.98 
ARNN(2) (MLP)  0.11  0.45  0.70 
ARNN(3) (MLP)  0.09  0.56  0.76 
ARNN(4) (MLP)  0.08  0.50  0.76 
ARNN(2) (GFFNN)  0.04  0.77  0.94 
ARNN(3) (GFFNN)  0.06  0.73  0.96 
ARNN(4) (GFFNN)  0.18  0.45  0.56 
ARNN(2) (MNN)  0.06  0.70  0.89 
ARNN(3) (MNN)  0.03  0.83  0.83 
ARNN(4) (MNN)  0.03  0.83  0.97 
The purpose of implementing an ARNN is to examine whether the same accuracy as AR(6) can be achieved with fewer predictors. As explained earlier, the ARNN has been developed for three types of ANN, and the statistics PE, WI1 and WI2 have been computed for each model. The results are available in Table IV. It is apparent from the table that AR(6) and ARNN(4), with MNN chosen as the form of the neural network, produce almost identical values of the statistics under consideration. Thus, the AR(6) model, which requires the past six values of the time series, can be replaced by the ARNN(4) model in which a modular neural network (MNN) is used. It is further found that the ARNN(4) model produces much poorer values of the same statistics when it is trained as an MLP or a GFFNN; thus AR(6) outperforms ARNN(4) when MLP or GFFNN is used as the ANN. The PE values are equal for AR(6) and ARNN(4) when MNN is used. It should further be noted that ARNN(3) with MNN performs very well but, considering all the statistics, ARNN(4) with MNN is more acceptable. The other two models that perform well are ARNN(2) and ARNN(3) with GFFNN; these two models produce very high values of WI2. The worst prediction comes from the ARNN(4) model with GFFNN: in this case the prediction error attains its highest value, and WI1 and WI2 attain their lowest values, which are far from 1. Finally, ARNN(4) trained as an MNN is identified as the most acceptable autoregressive neural network. While developing this particular ARNN model, the network has been trained three times and minimization of the mean squared error (MSE) has been taken as the stopping criterion. The evolution of the MSE with the number of epochs is presented in Figure 7.
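The ARNN idea, feeding the past p values of the reduced series into a small neural network trained on one-step-ahead targets, can be sketched with a plain-numpy one-hidden-layer perceptron. This is an illustrative stand-in only: the paper's MLP/GFFNN/MNN architectures, training schedules and stopping rules are not reproduced here, and every name below is ours:

```python
import numpy as np

def make_lagged(x, p):
    """Rows of p consecutive past values as inputs; the next value as target."""
    X = np.array([x[t - p : t] for t in range(p, len(x))])
    y = np.asarray(x[p:], dtype=float)
    return X, y

def train_arnn(x, p, hidden=5, epochs=3000, lr=0.01, seed=0):
    """Train a one-hidden-layer MLP on lagged inputs by full-batch
    gradient descent on the mean squared error; return a predictor
    mapping the last p values to a one-step forecast."""
    rng = np.random.default_rng(seed)
    X, y = make_lagged(np.asarray(x, dtype=float), p)
    n = len(y)
    W1 = 0.1 * rng.standard_normal((p, hidden))
    b1 = np.zeros(hidden)
    W2 = 0.1 * rng.standard_normal(hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                   # hidden-layer activations
        pred = H @ W2 + b2                         # linear output unit
        err = pred - y                             # gradient of 0.5*MSE w.r.t. pred
        gW2 = H.T @ err / n
        gb2 = err.mean()
        dH = np.outer(err, W2) * (1.0 - H ** 2)    # backprop through tanh
        gW1 = X.T @ dH / n
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda past: float(np.tanh(np.asarray(past, dtype=float) @ W1 + b1) @ W2 + b2)
```

With p = 4 this mirrors the ARNN(4) setup in spirit: four past values in, one forecast out, with the network weights taking the place of the six Yule–Walker coefficients of AR(6).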
6. Conclusions
In the first phase of the present study, the relevance of studying the monthly maximum surface temperature over northeast India has been discussed. Subsequently, the structure of the time series has been analyzed, and it is discerned through autocorrelation analysis that the monthly maximum temperature time series of northeast India is characterized by significant nonstationarity and a seasonality of 12 months. Increasing trends in the maximum temperature time series were observed, through both the Mann–Kendall nonparametric test and a parametric test, in almost all months and seasons over northeast India. The seasonal and trend components were removed from the original, nonstationary time series in order to produce a stationary one. The reduced time series, generated after the removal of the seasonal and trend components, is found to be normally distributed. Autoregressive models have been generated through the Yule–Walker equations, and the autoregressive process of order six (AR(6)) is found to be the best autoregressive representation of the maximum temperature time series of northeast India, on the basis of both the partial autocorrelation function of the reduced time series and the Akaike information criterion. The prediction potential of the AR(6) model is also established through the high values of Willmott's indices and the low value of the prediction error.
In the next phase, autoregressive neural network models were attempted in order to reduce the number of predictors from the six required by AR(6). Three types of neural network, the multilayer perceptron, the generalized feed forward neural network and the modular neural network, were attempted with varying orders of autoregression. An autoregressive neural network with four predictors, denoted ARNN(4), with a modular neural network as the form of the neural network, has a prediction performance comparable to that of the AR(6) process, as shown by the high values of Willmott's indices and the low value of the prediction error. Thus, it is finally concluded that implementing a modular neural network within the autoregressive structure can potentially reduce the number of predictors from six to four while still predicting the monthly maximum temperature over the northeastern region of India. In the present paper, the maximum temperature time series has been modelled univariately using an artificial neural network as well as the autoregressive approach. In future studies in applied hydrology or meteorology, a similar univariate approach can be used to scrutinize the time series of monsoon rainfall over different regions of India and over India as a whole.