Seasonal prediction of the Indian summer monsoon rainfall using canonical correlation analysis of the NCMRWF global model products


U. C. Mohanty, Centre for Atmospheric Sciences, Indian Institute of Technology, New Delhi, India. E-mail:


In this study, canonical correlation analysis (CCA) has been used to statistically downscale the seasonal predictions of the Indian summer monsoon rainfall (ISMR) from a global spectral model. An extensive diagnostic study of the global model products and observed data for the period 1981–2008 indicates that while the predictions of rainfall anomalies have poor skill, the mean flow patterns are brought out reasonably well by the model. The model precipitation is found to be more strongly dependent on sea surface temperature over the Nino regions in the Pacific Ocean. However, the observed precipitation has stronger links to winds at 850 hPa near the Somali coast than is evident in the model. On the basis of correlation maps, potential model predictors (specific humidity and zonal and meridional winds over different regions at different levels) are chosen for CCA for the prediction of ISMR. Using a leave-three-out cross-validation technique, canonical coefficients for the CCA model are computed from 25 years of data (the training period). With this, predictions from the CCA model have also been prepared for the period 1981–2005 to evaluate the performance. In addition, predictions are made for four independent years (2006–2009). An improvement in skill of the composite forecasts (obtained using all the predictors) in terms of interannual variability is noticed over some parts of east and northeast India as well as many parts of the peninsular region, especially over the west coast of India. Copyright © 2012 Royal Meteorological Society

1. Introduction

The Indian summer monsoon plays a crucial role for the agriculture-based economy of India, as the success or failure of the crops and water scarcity in any year is always closely linked with the amount of the monsoon rains in India in that year. The Indian summer monsoon rainfall (ISMR) from June to September (JJAS) contributes > 70% of the annual rainfall over India (Parthasarathy et al., 1995). Rainfall during the Indian summer monsoon season shows considerable spatial and temporal variability. Early warning of likely behaviour of seasonal rainfall over smaller spatial domains over India has the potential to help farmers in taking appropriate decisions that could increase farm productivity and maximize returns or minimize losses. The seasonal scale prediction of monsoon rainfall over smaller regions such as meteorological subdivision scale (Parthasarathy et al., 1995) and at monthly to seasonal timescales is one of the challenging tasks for the scientific community. To predict seasonal rainfall over India during the monsoon seasons, several techniques have been developed by the India Meteorological Department (IMD) (e.g. Gowariker et al., 1989; Rajeevan et al., 2006a).

Forecast products from coupled general circulation models (CGCMs) or atmosphere-only general circulation models (GCMs) are being used effectively all over the world for generating seasonal forecasts. A number of previous studies (Kang et al., 2004; Kang and Shukla, 2005; Krishna Kumar et al., 2005; Wang et al., 2005; Barnston et al., 2010) suggested that most present-day dynamical global models have poor skill in predicting summer monsoon rainfall over India. Earlier studies indicate that statistical post-processing such as multi-model ensemble (MME) marginally improves the skill of seasonal monsoon predictions (Krishnamurti et al., 1999; Goddard et al., 2001; Robertson et al., 2004; Wang et al., 2004; Kang and Shukla, 2005; Chakraborty and Krishnamurti, 2006; Acharya et al., 2011). Moreover, these global models often poorly represent small-scale physical processes which drive important local or regional surface variables, such as local circulation and precipitation (frequency of occurrence and intensity) (Wood et al., 2004). The performance of global models in simulating rainfall (in terms of intensity and variability) is poor; however, large-scale flow patterns are simulated well (Zhang et al., 1997; Anagnostopoulou et al., 2007). Thus, statistical corrections are needed to translate GCM output for seasonal scale prediction (Zhu et al., 2008). Two important methodologies for producing forecasts with the use of statistical relationships are (1) the perfect prognostic method (Klein et al., 1959), where it is assumed that the behaviour of the model outputs (predictor fields) is exactly the same as in the real atmosphere, so that the regression coefficients are calculated using past observational data (predictor as well as predictand) and are then used for prediction of the predictand variable; and (2) model output statistics (MOS) (Glahn and Lowry, 1972), in which it is recognized that models are not able to represent the atmosphere perfectly, and the regression coefficients are calculated using predictor variables from the model and predictand variables from observations.
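The difference between the two approaches can be illustrated with a minimal sketch. The data below are hypothetical toy values (not taken from any model or observation), and the helper names `fit_line` and `rmse` are introduced only for illustration. The model predictor is constructed as a damped and biased copy of the observed predictor, so that regression coefficients trained on observations (perfect prognostic) are misapplied to model output, whereas coefficients trained on the model predictor (MOS) absorb the systematic error.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def rmse(forecast, observed):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed))

# Hypothetical toy data: the observed predictand depends linearly on the observed predictor.
obs_predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
obs_predictand = [2.0 * x + 1.0 for x in obs_predictor]
# Hypothetical model predictor: damped and biased relative to the observed predictor.
model_predictor = [0.5 * x + 3.0 for x in obs_predictor]

# Perfect prognostic: coefficients from observations only, then applied to model output.
pp_a, pp_b = fit_line(obs_predictor, obs_predictand)
pp_forecast = [pp_a + pp_b * xm for xm in model_predictor]

# MOS: coefficients from model predictor versus observed predictand.
mos_a, mos_b = fit_line(model_predictor, obs_predictand)
mos_forecast = [mos_a + mos_b * xm for xm in model_predictor]

pp_rmse = rmse(pp_forecast, obs_predictand)
mos_rmse = rmse(mos_forecast, obs_predictand)
print(f"perfect-prog RMSE = {pp_rmse:.3f}, MOS RMSE = {mos_rmse:.3f}")
```

In this idealized case the model error is purely linear, so MOS removes it completely; with real model output only the systematic, linearly related part of the error can be corrected.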

Downscaling works as the bridge between climate forecast and weather (Wilby and Wigley, 1997). This links the state of large-scale variables to the state of small-scale variables (Benestad et al., 2008). One important approach is ‘statistical downscaling’, which takes advantage of statistical relationships between the regional climate and statistical characteristics of desired fields from a coarse resolution GCM. Downscaling techniques have been well described in the literature (Wigley et al., 1990; Wilby and Wigley, 1997; Murphy, 1999; Tippett et al., 2003; Robertson et al., 2009). Wilby and Wigley (2000) investigated the choice of atmospheric predictor variables and their respective domains for statistical downscaling. During the last few years, several efforts have been made using statistical downscaling techniques to predict monthly as well as seasonal timescale rainfall over the Indian region (Iyengar and Raghukanth, 2003; Kishtawal et al., 2003; Mohanty and Dimri, 2004; Rajeevan et al., 2004; Juneng et al., 2010; Prasad et al., 2010; Ashok et al., 2011) and other regions (Yu et al., 1997; Tippett et al., 2003; Kang et al., 2009).

Canonical correlation analysis (CCA) is a MOS technique that selects a sequence of pairs of patterns in two multivariate data sets and constructs sets of transformed variables by projecting the original data onto these patterns (Wilks, 1995). CCA can also be viewed as a diagnostic of multiple regressions since the multicomponent predictors are linearly related to the multicomponent predictands and independent from one another. A few studies (Yu et al., 1997; Juneng et al., 2010) stated that the skill of MME forecasts is higher than that of individual model forecasts and that the improvement in skill is greater when CCA techniques are used; however, this improvement is region specific. So far, no statistical downscaling approaches using CCA have been developed for prediction of the Indian summer monsoon seasonal rainfall.

For the last several years, the National Centre for Medium Range Weather Forecasting (NCMRWF) has been using an atmospheric general circulation model (AGCM) for dynamical seasonal prediction of the Indian monsoon (Kar, 2007; Acharya et al., 2011). This atmosphere-only GCM uses predicted sea surface temperature (SST) anomalies and has very limited skill. To represent the uncertainties in dynamical predictions, many ensemble hindcasts (retrospective forecasts) have been carried out using perturbations of both atmospheric and surface variables. The average of the ensemble members in conjunction with statistical downscaling techniques may have improved skill in seasonal scale predictions.

As seen from the literature review presented in previous paragraphs, the CCA technique has not been used for statistically downscaling seasonal rainfall predictions from global models over India. Therefore, in this study, a CCA model has been developed for the Indian monsoon rainfall predictions in seasonal timescale. This CCA model downscales and statistically corrects the seasonal prediction products of a global model. Further, in this study, an attempt has been made to find out regions where forecast skill is improved using the CCA model.

For developing the CCA model, the NCMRWF global model products have been used. A brief description of the model is given in the next section. A description of the data and methodology used for this study is given in Section 3. Results and discussion are provided in Section 4, and the present study is concluded in Section 5.

2. Model description and hindcast runs

The global model used for this study is the Indian global model (In-GLM1) (Kar, 2007) which is the climate version of a medium-range weather forecast model of NCMRWF. This is a global spectral model with 80 waves in triangular truncation (T80), which corresponds with a grid resolution of about 1.4° × 1.4°. The vertical coordinate is sigma with 18 vertical layers. Deep convection is modelled by a fairly basic Kuo–Anthes type of scheme, requiring moisture convergence and deep conditional instability in order to be active (Anthes, 1977). More details of the model may be found in Kanamitsu (1991), Kar (2007) and Kar et al. (2011).

One of the main problems for real time prediction with an atmospheric model is to have the prediction of boundary conditions such as the SST (Fu and Wang, 2003; Wang et al. 2004). Predictions of SST from coupled models have large variations among one another for a particular season. There are also large systematic errors in the SST predictions from a given model (Tippett and Barnston, 2008). The structure of the systematic errors of predicted SSTs can be estimated from the hindcast data. For preparing seasonal predictions in real time, it is possible to represent uncertainties in SST predictions through the systematic error variance of SST in the hindcast period. One can build a probability density function based on the root mean square error (RMSE) of the SST predictions in the hindcast period and create several SST scenarios for the season of prediction. By incorporating uncertainties in SST predictions, one introduces a physical-based ensemble prediction system along with different initial conditions (Acharya et al., 2011).
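The construction of SST scenarios from hindcast errors can be sketched as follows. This is a minimal single-grid-point illustration with hypothetical values; the function names `hindcast_rmse` and `sst_scenarios`, and the choice of perturbing the forecast by plus or minus one RMSE, are assumptions made for the example rather than a description of the operational procedure.

```python
import math

def hindcast_rmse(forecasts, observations):
    """RMSE of SST forecasts over the hindcast years at one grid point."""
    n = len(forecasts)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / n)

def sst_scenarios(forecast_sst, rmse, n_sigma=1.0):
    """Return low / mean / high SST scenarios spaced by n_sigma * RMSE (assumed spread)."""
    return {
        "low": forecast_sst - n_sigma * rmse,
        "mean": forecast_sst,
        "high": forecast_sst + n_sigma * rmse,
    }

# Hypothetical hindcast SSTs (degrees C) at one grid point: forecast versus observed.
hind_forecast = [27.0, 28.0, 26.5, 27.5]
hind_observed = [27.5, 27.5, 27.0, 27.0]

rmse = hindcast_rmse(hind_forecast, hind_observed)
scenarios = sst_scenarios(28.0, rmse)  # 28.0 is a hypothetical real-time SST forecast
print(rmse, scenarios)
```

Each scenario would then serve as a separate lower boundary condition for the atmospheric model, so that the ensemble spread reflects SST uncertainty as well as initial-condition uncertainty.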

For each year over the period 1981–2009, the model is initialized in May and integrated until the end of September. The 18 member ensemble is generated using initial conditions from six different days from 10 to 15 May of each year, and for each initial date, three sets of SST data [mean ± 1 standard deviation of National Centre for Environment Prediction (NCEP)–Climate Forecast Systems (CFS) forecasted SST, version-1] are provided to the model. The NCEP–CFS (version-1; Saha et al., 2010) forecasted SST was obtained from

3. Data and methodology

In this study, ensemble mean of the NCMRWF GCM outputs for the period 1981–2008 are considered as predictor variables for statistical downscaling approaches. The observed SST (Reynolds and Smith, 1994) for the period 1982–2008 and NCEP–National Centre for Atmospheric Research (NCAR) reanalysis data (Kalnay et al., 1996) for 1981–2008 are used to examine observed teleconnection patterns. Observed daily rainfall data at 1° × 1° resolution (Rajeevan et al., 2006b) obtained from IMD has been used to prepare observed seasonal monsoon rainfall.

Observations as well as model output data are seasonally averaged over JJAS for the period 1981–2008. Model rainfall fields were bi-linearly interpolated to the observed rainfall grid and set to missing values outside the Indian boundary to compare with observations. During the period of study, 1982, 1986, 1987, 2002 and 2004 were deficit rainfall years, while 1983, 1988 and 1994 were excess rainfall years (e.g.∼kolli/MOL/Monsoon/frameindex.html). A deficit (excess) rainfall year is defined as one in which the departure of monsoon-season rainfall from climatology is ⩽ −10% (≥ +10%) of climatological rainfall. In the following text, deficit or excess rainfall years are referred to as extreme years. Composite anomalies of winds for deficit (excess) years have been calculated by taking the mean of all deficit (excess) years minus the climatological winds of the period of study for observations and model, respectively.
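The classification of extreme years and the composite anomaly calculation can be sketched as below. The rainfall and wind values are hypothetical, the years are labelled 1 to 5 rather than by calendar year, and the function names are illustrative; the threshold is taken to be a departure of 10% from climatology.

```python
def classify_year(seasonal_total, climatology, threshold_pct=10.0):
    """Label a season as 'deficit', 'excess' or 'normal' from its percentage departure."""
    departure = 100.0 * (seasonal_total - climatology) / climatology
    if departure <= -threshold_pct:
        return "deficit"
    if departure >= threshold_pct:
        return "excess"
    return "normal"

def composite_anomaly(values_by_year, years):
    """Mean over the selected years minus the mean over all years (the climatology)."""
    climatology = sum(values_by_year.values()) / len(values_by_year)
    composite = sum(values_by_year[y] for y in years) / len(years)
    return composite - climatology

# Hypothetical JJAS all-India rainfall totals (mm) and 850 hPa zonal wind (m/s) at one point.
rain = {1: 900.0, 2: 700.0, 3: 1000.0, 4: 800.0, 5: 1100.0}
u850 = {1: 10.0, 2: 8.0, 3: 11.0, 4: 9.0, 5: 12.0}

rain_clim = sum(rain.values()) / len(rain)
labels = {y: classify_year(r, rain_clim) for y, r in rain.items()}
deficit_years = [y for y, lab in labels.items() if lab == "deficit"]
excess_years = [y for y, lab in labels.items() if lab == "excess"]

deficit_anom = composite_anomaly(u850, deficit_years)
excess_anom = composite_anomaly(u850, excess_years)
print(labels, deficit_anom, excess_anom)
```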

In this study, a MOS approach using CCA is used for statistical downscaling. In this method, the original data X (independent) and Y (dependent) are transformed into a new set of variables Vm and Wm, respectively, called canonical variables (Wilks, 1995). In this context, the X's are the predictor variables from the global model and Y is the predictand, i.e. the observed precipitation data. These variables are defined as follows:

$$V_m = A^{T} X \qquad (1)$$
$$W_m = B^{T} Y \qquad (2)$$

Calculation of A and B, called as canonical vectors, is similar to that of principal component analysis. The linear regression between the canonical variables can be written as:

$$W_m = \beta V_m \qquad (3)$$

It can be easily proved that β = Rc, where Rc is the diagonal matrix of the canonical correlations. The original data Y can then be estimated by inverting the transformation that defines the canonical variables of the predictand, i.e. $Y = (B^{T})^{-1} W_m$. The Student's t-test is used to test the statistical significance of correlations, and the critical value of the correlation coefficient for 28 years is 0.34 at the 10% significance level (SL).
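The relation between the regression coefficient and the canonical correlations can be checked numerically. The following is a minimal sketch, not the implementation used in this study: it assumes synthetic random data, stores years as rows (so the canonical variables are computed as X A and Y B), and obtains the canonical vectors from a singular value decomposition of the whitened cross-covariance matrix, which is one standard way of solving the CCA problem. The names `inv_sqrt` and `cca` are illustrative.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca(X, Y):
    """Return canonical vectors A, B (as columns) and canonical correlations r_c."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = Xc.shape[0]
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, r_c, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is, full_matrices=False)
    return Sxx_is @ U, Syy_is @ Vt.T, r_c

# Synthetic data (assumption): 25 "years", 3 predictors, 2 predictands.
rng = np.random.default_rng(0)
n_years = 25
X = rng.standard_normal((n_years, 3))
Y = X @ rng.standard_normal((3, 2)) + 0.5 * rng.standard_normal((n_years, 2))

A, B, r_c = cca(X, Y)
V = (X - X.mean(axis=0)) @ A   # canonical variables of the predictors
W = (Y - Y.mean(axis=0)) @ B   # canonical variables of the predictand

# Slope of the regression of each W_m on the matching V_m.
beta = np.array([np.polyfit(V[:, m], W[:, m], 1)[0] for m in range(len(r_c))])

# Predicted canonical variables mapped back to predictand anomalies.
Y_anom_hat = (V * r_c) @ np.linalg.inv(B)

print("canonical correlations:", r_c)
print("regression slopes:     ", beta)
```

Because the canonical variables have unit variance, the regression slope for each pair equals its canonical correlation. When all canonical modes are retained, as here, the back-transformed estimate coincides with an ordinary multivariate least squares fit; truncating to the leading modes is what gives CCA its filtering effect.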

4. Results and discussion

Before presenting the results related to the prediction of ISMR using statistical downscaling, a brief diagnosis of the NCMRWF model output is described. For this purpose, various statistical parameters, such as standard deviation, correlation coefficient, RMSE etc., have been used. A standardized anomaly index (SAI; Katz and Glantz, 1986) of rainfall was calculated for the Indian region using observed as well as the model rainfall. For this, standardized anomalies of rainfall have been averaged over India and SAI of rainfall obtained from model and observation have been compared.
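The SAI calculation described above can be sketched as follows, assuming the rainfall is held as a (years × grid points) array and that every grid point is weighted equally; the function name and the toy values are hypothetical.

```python
import numpy as np

def standardized_anomaly_index(rain):
    """Average over grid points of the standardized anomalies.

    rain: array of shape (n_years, n_points) holding seasonal totals.
    Returns an array of shape (n_years,).
    """
    mean = rain.mean(axis=0)          # climatology at each grid point
    std = rain.std(axis=0)            # interannual standard deviation at each grid point
    z = (rain - mean) / std           # standardized anomalies
    return z.mean(axis=1)             # spatial average for each year

# Hypothetical seasonal totals (mm) for 3 years at 2 grid points.
rain = np.array([[800.0, 1200.0],
                 [1000.0, 1500.0],
                 [1200.0, 1800.0]])
sai = standardized_anomaly_index(rain)
print(sai)
```

Standardizing before averaging prevents the wettest grid points from dominating the index.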

4.1. Model diagnosis

Rainfall climatology based on the available 28 years of data obtained from the observations and the model (ensemble mean) is shown in Figure 1(a) and (b), respectively. Figure 1(b) shows that the rainfall distribution pattern, especially over the Western Ghats, northwestern parts of India and Jammu & Kashmir, is represented well in the model simulations. However, over central and most parts of northeast India (NEI), the model-simulated rainfall climatology is underpredicted, while over the southern peninsular region, especially over the rain shadow region of the Western Ghats, it is overpredicted compared to the observed climatology. Interannual rainfall variability (standard deviation) at each grid point has been calculated from observed rainfall as well as from the model rainfall for the period of study. These are shown in Figure 2(a) and (b), respectively. Note that, since the interannual variability of the model rainfall is weaker than observed, the range of the colour scale in Figure 2(b) has been halved in order to bring out the spatial variation and pattern of rainfall variability in the model. Figure 2(a) clearly shows that, in the observations, rainfall variability is at its maximum over the Western Ghats and almost all parts of NEI. The model is able to represent the relative maximum in rainfall variability over the north of the Western Ghats, but fails to do so over NEI. Over northwest India and some parts of peninsular India, rainfall variability in the model is relatively small, in good agreement with observations. In central parts of India, rainfall variability in the model is higher than in the southern and northern parts, which is not seen in the observations.

Figure 1.

Climatological mean JJAS rainfall total (mm) during the study period (1981–2008), for (a) observed (IMD gridded rainfall) data and (b) model (NCMRWF) rainfall interpolated to the observed grid points.

Figure 2.

Interannual standard deviation of JJAS rainfall totals (mm), for (a) observed gridded rainfall data and (b) model rainfall interpolated over observed grid points.

The spatial averages (all-India) of standardized rainfall anomalies from the observed gridded data and the ensemble mean of the model retrospective forecasts (SAI of all-India) are shown in Figure 3. It is seen from the figure that the model reproduces the sign of the observed rainfall anomaly in about 60% of cases (about 17 out of 28 years); however, the model performance is not satisfactory in representing the extreme (deficit and excess) rainfall years. It is also noted that the model rainfall anomaly shows the opposite sign compared with observations during a few extreme years such as 1983 and 2002. In contrast to Figure 2, the interannual variation of the SAI in Figure 3 is higher in the model than in the observations. This implies that the spatial coherence of the model's rainfall anomalies is larger than observed. As discussed in Moron et al. (2007, 2012), the variance of the SAI can provide a useful measure of the spatial coherence of interannual rainfall anomalies. The variances of the SAI in Figure 3 are 0.364 and 0.104 for model and observations, respectively.
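The link between the variance of the SAI and spatial coherence follows from the fact that, for standardized anomalies, the variance of their spatial average equals the mean of all pairwise correlations between grid points (including each point with itself). The sketch below verifies this identity on synthetic fields; the field construction, the seed and the names are assumptions made only for illustration.

```python
import numpy as np

def sai_variance(field):
    """Variance of the spatially averaged standardized anomalies of a (years, points) field."""
    z = (field - field.mean(axis=0)) / field.std(axis=0)
    return z.mean(axis=1).var()

rng = np.random.default_rng(1)
n_years, n_points = 28, 50
common = rng.standard_normal((n_years, 1))  # signal shared by all grid points

# Synthetic fields (assumption): strongly and weakly coherent in space.
coherent = 0.8 * common + 0.6 * rng.standard_normal((n_years, n_points))
incoherent = 0.2 * common + 1.0 * rng.standard_normal((n_years, n_points))

var_coherent = sai_variance(coherent)
var_incoherent = sai_variance(incoherent)

# Mean of the full grid-point correlation matrix of the coherent field.
mean_corr = np.corrcoef(coherent, rowvar=False).mean()

print(var_coherent, mean_corr, var_incoherent)
```

A field whose grid points vary together therefore yields a larger SAI variance than a field with the same local variability but little spatial organization.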

Figure 3.

Spatial average of JJAS standardized rainfall anomalies over India (SAI) for observed gridded rainfall data (solid line) and model rainfall data interpolated to the observed grid points (dashed line).

Correlation coefficient (CC) and RMSE between the rainfall anomalies from the model and observations have been calculated grid point-wise for the period 1981–2008 and are shown in Figure 4(a) and (b), respectively. These figures show that the performance of the model in terms of simulation of seasonal mean rainfall anomalies is better over the northern parts of India, with relatively higher CC values (about 0.2–0.5) and lower RMSE (200–300 mm). Statistically significant correlations (at 10% SL) are seen only over a few regions such as parts of Himachal Pradesh, Haryana, Jammu & Kashmir and Bihar. Almost all parts of northwest and central India have positive CC of smaller magnitude (about 0.1–0.32, not statistically significant at 10% SL) and lower RMSE; however, some parts of these regions have negative CC values and relatively higher RMSE. It is seen from the figure that over the west coast of India and NEI, where summer monsoon rainfall is at its maximum compared to other regions, the CC is poor and has negative values, and the RMSE is also at its maximum (> 500 mm). Negative CC values are also seen over the southern peninsula of India as well as the southern tip of NEI, but the RMSE is comparatively lower than in the surroundings.
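A grid point-wise calculation of the CC and RMSE between model and observed anomalies can be sketched as below. The arrays are hypothetical toy values arranged as (years × grid points), and the function name is illustrative. Anomalies are taken with respect to each data set's own mean, so a constant model bias does not contribute to the RMSE in this sketch.

```python
import numpy as np

def gridpoint_skill(model, obs):
    """Return (cc, rmse) of anomalies at each grid point for (n_years, n_points) arrays."""
    ma = model - model.mean(axis=0)   # model anomalies
    oa = obs - obs.mean(axis=0)       # observed anomalies
    cc = (ma * oa).sum(axis=0) / np.sqrt((ma ** 2).sum(axis=0) * (oa ** 2).sum(axis=0))
    rmse = np.sqrt(((ma - oa) ** 2).mean(axis=0))
    return cc, rmse

# Hypothetical data: 4 years, 2 grid points.
obs = np.array([[1.0, 4.0],
                [2.0, 3.0],
                [3.0, 2.0],
                [4.0, 1.0]])
# Point 0: model equals observations plus a constant bias; point 1: model has the wrong sign.
model = np.column_stack([obs[:, 0] + 50.0, obs[::-1, 1]])

cc, rmse = gridpoint_skill(model, obs)
print(cc, rmse)
```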

Figure 4.

Statistical analysis between observed and model JJAS rainfall totals (model rainfall is interpolated in observed grid) for the period 1981–2008 shown (a) for correlation coefficient (positive in solid contour and shaded and negative are in dash contour) and (b) for root mean square error (mm).

Climatological winds obtained from the model at different pressure levels illustrate that the circulation patterns, e.g. the low-level Somali jet, the upper air jet stream etc., are well brought out in the model when compared with the NCEP–NCAR reanalysis fields (figures not shown). In order to understand the model simulations for deficit and excess rainfall years over the Indian region, a composite analysis has been carried out. Composite wind anomalies obtained from the NCEP–NCAR reanalysis fields as well as from the model for deficit (1982, 1986, 1987, 2002 and 2004) and excess (1983, 1988 and 1994) years at 850 hPa are shown in panels (a) and (b), respectively, of Figures 5 and 6. Figure 5(a) clearly depicts that, in the reanalysis fields, the southwesterly flow over the Arabian Sea (AS) is weaker during deficit years. An anticyclonic flow centred near Gujarat and a cyclonic flow centred over the southern AS (0–10°N) exist. It is also noticed that the wind anomaly is easterly over the AS (10–20°N) and northerly over the central and north Bay of Bengal (BoB). This suggests that the entrainment of moisture onto the Indian landmass from the AS and BoB is less during deficit rainfall years. The wind anomaly for deficit years obtained in the model [Figure 5(b)] shows a cyclonic circulation similar to observations, though its centre is located too far north compared to the reanalysis. The southwesterly flow near Somalia is stronger, which causes a westerly flow near the tip of the Indian peninsular region. It is noted that over the north and central BoB, the wind anomaly is easterly, which indicates more moisture entrainment to the east coast of India, such as West Bengal, Orissa and Andhra Pradesh. During excess years [Figure 6(a)], a strong cyclonic circulation centred over the northeast AS exists in the wind anomalies of the reanalysis fields, and within this cyclonic pattern a southwesterly flow enters India through the west coast.
Composite wind anomalies obtained from the model for excess years show a southwesterly flow over the AS that crosses the upper parts of the west coast [Figure 6(b)]. It is well known that the intensity of seasonal rainfall over the west coast strongly depends on the low-level southwesterly flow over the AS, which brings moisture to the Indian landmass. Since, during excess rainfall years in the model, the southwesterly wind over the southern part of the west coast is weaker, the simulated rainfall over this region is less. It is seen that the anomalous flow entering the Indian landmass is southerly over the BoB in the reanalysis, while the anomalous flow is westerly/southwesterly over the BoB in the model. This indicates that during excess years, the central and eastern parts of India may receive more moisture from the BoB, while in the model this does not happen.

Figure 5.

Anomaly of composite JJAS wind (m/s) averaged over the deficit years (mean of composite wind for deficit years—wind climatology) at 850 hPa calculated using (a) NCEP–NCAR reanalysis data and (b) model output

Figure 6.

Anomaly of composite JJAS wind (m/s) averaged over the excess years (mean of composite wind for excess years—wind climatology) at 850 hPa calculated using (a) NCEP–NCAR reanalysis data (b) model output

This study reveals that the model has several deficiencies in its predictions of the Indian summer monsoon circulation and precipitation in terms of intensity and variability. Interannual variation as well as the flow pattern for extreme years in the model exhibit large differences from those in the observations. Interannual variability of the model ensemble-mean rainfall depends on how the model responds to the local and remote SST forcing prescribed in the model (Fu and Wang, 2003; Wang et al., 2004). The analysis of rainfall variability of the model presented so far indicates that the model may be responding incorrectly to the prescribed SSTs. Wang et al. (2004) reached a similar conclusion after conducting monsoon experiments with 11 AGCMs. It also appears that the model rainfall over India is strongly linked to the SST over the Pacific. This aspect is examined further in the section on the development of the CCA model.

Thus, for improved prediction of seasonal scale rainfall over India using this model, it is necessary to correct the conditional as well as mean biases of the global model, to the extent possible. In the following, we explore the potential for statistical downscaling of model output using CCA to improve seasonal forecasts of summer monsoon rainfall over India.

4.2. Prediction of ISMR using MOS approaches

A MOS approach using CCA has been used. As a first step, teleconnection maps have been prepared to find suitable predictors as well as domains, and then predictions of ISMR have been made using MOS.

4.2.1. Teleconnection maps

Teleconnection maps are generated using the correlation between all-India summer monsoon rainfall and different meteorological parameters (surface as well as upper air) obtained from observations/reanalysis and the model output separately for the period 1981–2008. For this, the SAI of rainfall is calculated over India and correlation maps with global variables (meteorological parameters of surface and upper air) are produced at 10% SL. The analysis is performed separately for observations and model, using the SAI constructed from observed data and model hindcasts, respectively, and the correlation maps are plotted in panels (a) and (b), respectively, of Figures 7–9.
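The construction of a teleconnection map with a significance mask can be sketched as follows. This is an illustrative example on synthetic data, not the code used for the figures. It assumes a two-sided t-test on the correlation coefficient with n − 2 degrees of freedom, and the tabulated t-values (1.706 for 26 and 1.714 for 23 degrees of freedom) are supplied by hand as assumptions; the function names are hypothetical.

```python
import math
import numpy as np

def critical_correlation(n_years, t_crit):
    """Correlation magnitude whose t statistic equals t_crit for n_years samples."""
    df = n_years - 2
    return t_crit / math.sqrt(df + t_crit ** 2)

def teleconnection_map(index, field, r_crit):
    """Correlate an index (n_years,) with a field (n_years, n_points); mask weak values as NaN."""
    ia = index - index.mean()
    fa = field - field.mean(axis=0)
    r = (ia[:, None] * fa).sum(axis=0) / np.sqrt((ia ** 2).sum() * (fa ** 2).sum(axis=0))
    return np.where(np.abs(r) >= r_crit, r, np.nan)

# Two-sided 10% critical t-values from standard tables (assumed, not computed here).
r_crit_28 = critical_correlation(28, 1.706)   # 28 years, 26 degrees of freedom
r_crit_25 = critical_correlation(25, 1.714)   # 25 years, 23 degrees of freedom

# Synthetic index and field (assumption): 28 years, 3 grid points.
index = np.arange(28.0)
alternating = np.where(np.arange(28) % 2 == 0, 1.0, -1.0)
field = np.column_stack([2.0 * index + 1.0, alternating, -index])

tmap = teleconnection_map(index, field, r_crit_28)
print(round(r_crit_28, 2), round(r_crit_25, 2), tmap)
```

Under these assumptions a 28-year sample gives a threshold near 0.32 and a 25-year sample a threshold near 0.34. In the synthetic field the first and third points are retained (correlations of +1 and −1), while the second, which is almost uncorrelated with the index, is masked.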

Figure 7.

Teleconnection maps showing anomaly correlations between JJAS all-India rainfall index (SAI) and 850 hPa relative humidity/specific humidity (shaded; negative values are lightly shaded and positive values are darkly shaded) and wind (vectors) for the period 1981–2008 for (a) IMD rainfall with NCEP–NCAR reanalysis fields and, (b) NCMRWF rainfall with NCMRWF fields. Only values significant at the 10% level are plotted.

Figure 8.

Same as Figure 7 but for 850 hPa geopotential heights (shaded; negative values are lightly shaded and positive values are darkly shaded) and 200 hPa winds (vectors).

Figure 9.

Same as Figure 7 but for (a) SAI of observed rainfall with Reynolds SST and (b) SAI of model rainfall with CFS SST. The negative values are lightly shaded and positive values are darkly shaded.

It is seen in Figure 7(a) that the 850 hPa reanalysis wind over the Indian monsoon region covering the Indian landmass, AS and BoB has significant correlation with observed rainfall. Significant correlation of rainfall with the 850 hPa wind obtained from the model is also noted for the same domain [Figure 7(b)], although there are some differences in wind-anomaly patterns between model and observations. The map for the observations indicates that the westerly and southwesterly flow over the AS and the southerly flow over the BoB have significant correlation with rainfall, while in the model only the southwesterly flow over the AS and BoB has significant correlation. Correlation of the observed SAI with 850 hPa moisture fields (relative humidity) is significant (at 10% SL) over the western part of India, some parts of Pakistan and the northeast of the AS [Figure 7(a)], which is well brought out by the model. The role of the southwesterly wind at 850 hPa over the AS and the southerly wind over the BoB is crucial for all-India summer monsoon rainfall, and this is well represented by the model. It is well known that moisture availability over the Indian landmass during the summer monsoon season is mainly due to the low-level southwesterly flow over the AS and the southerly/southeasterly flow from the BoB, which has significant correlation with the Indian monsoon rainfall (Rao, 1976; Sikka and Gadgil, 1980; Webster et al., 1998). The model represents this feature reasonably well.

The correlation map generated for the upper air flow pattern with all-India rainfall using observational data shows that, over a large area of the Pacific Ocean and some parts of the Indian subcontinent, the wind at 200 hPa has a significant correlation (at 10% SL), as shown in Figure 8(a). Mohanty et al. (1983) stated in their study that large-scale divergence in the upper troposphere associated with large-scale low-level convergence and rising motion on average in the middle troposphere over the domain 0–150°E and 0–40°N plays an important role in ISMR variability. Figure 8(b) shows that the model also exhibits significant correlations of the 200 hPa wind over the Pacific Ocean with the SAI of rainfall over India. It is seen that observed geopotential heights at 850 hPa over the Indian subcontinent and AS are negatively correlated with ISMR, and the model also brings out this feature well.

The respective correlation maps with SST are shown in Figure 9. The correlation between observed ISMR and observed SST (Reynolds and Smith, 1994) for the period 1982–2008 is highly statistically significant (at 10% SL), with a small region of significant negative correlations confined to the Nino3.4 region [Figure 9(a)]. The model rainfall also has a significant (at 10% SL) negative correlation with forecasted SST over the Nino3.4 region. However, compared with the observed pattern, the region with negative CC is large and extends towards the east as well as the west, as seen from Figure 9(b). The CC map generated using the model rainfall and SST indicates that model rainfall is positively correlated with forecasted SST over the west and south Pacific Ocean, which is not seen in the observations. This result is similar to that of an earlier study by Wang et al. (2004), who examined the relationship of ISMR with SST over the Pacific Ocean using simulations from 11 AGCMs. Thus, the model's ISMR variability is much more strongly tied to SSTs than that in the observations.

On the basis of teleconnection maps and physically based relationships between ISMR and meteorological variables, four predictors are considered for statistical downscaling from the model using CCA: specific humidity at 850 hPa over the Indian domain (domain-1), zonal as well as meridional wind at 850 hPa over the AS, Indian subcontinent and BoB (domain-2), and meridional wind at 200 hPa over the Pacific Ocean (domain-3). These domains are shown in Figure 10. Since the length of the data period is modest (28 years), only four predictors have been chosen, which are sufficient for the study.

Figure 10.

Spatial domains of various predictors used in MOS approaches.

4.2.2. Prediction of ISMR rainfall

For developing the CCA model, 25 years of data (1981–2005) have been used as the training period to compute the canonical variables and correlations for the regression equations. A rainfall forecast for each year (using the regression equations developed during the training period in cross-validation) has been made to estimate the skill of the CCA model in terms of correlation. In the cross-validation technique, three years are left out (the forecast year and two other randomly selected years) when calculating the canonical coefficients. The omission of the two random years in addition to the forecast year was found to make the CCA model more robust and to reduce the cross-validation bias (Mason and Tippett, 2004). In addition, detailed assessments of the forecasts for the years 2006–2009 (four independent years not used in training) are also made. The composite forecast at a grid point is taken from the individual predictor whose CC is the highest at that grid point in cross-validation mode. The mean forecast is the simple mean of the forecasts computed using the individual predictors.
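The leave-three-out splitting of the training period can be sketched as follows. The function name, the fixed random seed and the way the two additional years are drawn are assumptions made for reproducibility of the illustration; only the general scheme (the forecast year plus two randomly chosen years withheld from training) is taken from the description above.

```python
import random

def leave_three_out_splits(years, seed=0):
    """Yield (target_year, extra_left_out_years, training_years) for each target year."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    for target in years:
        others = [y for y in years if y != target]
        extra = rng.sample(others, 2)            # two additional years withheld at random
        held_out = {target, *extra}
        train = [y for y in years if y not in held_out]
        yield target, sorted(extra), train

years = list(range(1981, 2006))                  # 25-year training period
splits = list(leave_three_out_splits(years))

target, extra, train = splits[0]
print(target, extra, len(train))
```

For each target year the canonical coefficients would be estimated from the 22 remaining years and then applied to the model predictors of the target year, so that no forecast is verified against data that entered its own training.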

The anomaly correlation skill (at 10% SL) of the forecasted rainfall obtained with the use of cross-validated CCA models versus observed data over the period 1981–2005 is shown in Figure 11, for different choices of predictors. The anomaly correlation in Figure 11 is plotted only over those areas where the CC is statistically significant (≥ 0.34) to understand the performance of the CCA model. An experiment was carried out with the model rainfall as a predictor. However, the correlation of the downscaled rainfall from this predictor with observed rainfall was not found to be significant. Therefore, in this study, results obtained using the model rainfall as a predictor are not described. It is seen from Figure 11 that all four predictors chosen in this study have significant correlation in predicting ISMR over some parts of southern India and a few parts of north India and NEI; however, a larger area with significant correlation is found for the composite forecast than for the individual predictors. The number of grid points over India with statistically significant skill obtained for the composite forecast is greater than that for any individual predictor or for the mean forecast. It may be noted here that the correlation of the model rainfall anomaly (direct model output interpolated to the observation grid) with the observed rainfall anomaly is very poor over all parts of India except some parts of northern India [Figure 4(a)], while an improvement in skill is noticed over some parts of eastern and northern India and many parts of southern and west-central India (WCI; in the composite forecast) with the use of statistical downscaling approaches.
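The difference between the composite and mean forecasts can be made concrete with a small sketch. The arrays below are hypothetical: `forecasts` holds the downscaled rainfall from each predictor with shape (predictors × years × grid points), and `skill` holds the cross-validated CC of each predictor at each grid point. The function name is illustrative.

```python
import numpy as np

def composite_and_mean(forecasts, skill):
    """Combine per-predictor forecasts.

    forecasts: (n_predictors, n_years, n_points)
    skill:     (n_predictors, n_points) cross-validated correlation per predictor
    Returns (composite, mean, best), where best[k] is the predictor chosen at point k.
    """
    best = skill.argmax(axis=0)  # index of the most skilful predictor at each grid point
    composite = np.take_along_axis(forecasts, best[None, None, :], axis=0)[0]
    mean = forecasts.mean(axis=0)
    return composite, mean, best

# Hypothetical example: 2 predictors, 2 years, 3 grid points.
forecasts = np.stack([np.full((2, 3), 1.0),    # forecasts from predictor 0
                      np.full((2, 3), 3.0)])   # forecasts from predictor 1
skill = np.array([[0.5, 0.1, 0.4],
                  [0.2, 0.6, 0.4]])

composite, mean, best = composite_and_mean(forecasts, skill)
print(best, composite, mean, sep="\n")
```

Because the composite selects the predictor with the highest cross-validated CC at every grid point, its hindcast skill map is by construction at least as good as that of any single predictor; the independent years therefore provide the more stringent test of this combination.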

Figure 11.

Anomaly CC skill of MOS-corrected hindcasts (1981–2005) using various NCMRWF predictors over the domains in Figure 11: (a) precipitation, domain 1; (b) zonal wind at 850 hPa, domain 2; (c) meridional wind at 850 hPa, domain 2; (d) meridional wind at 200 hPa, domain 3; (e) specific humidity at 850 hPa, domain 1; (f) composite of all predictor forecasts; (g) mean of all predictor forecasts (positive values are shown as solid contours with shading and negative values as dashed contours).

In order to understand the performance of the forecasted rainfall over WCI (based on Parthasarathy et al., 1995), the CC between the area-averaged forecast rainfall and the area-averaged observed rainfall has been computed and is shown in Figure 12. The CC is statistically significant (at 10% SL) for all the forecasts except the one using Q850_d1 as a predictor. However, in the other regions, such as peninsular India and eastern India, the CC of area-averaged rainfall is positive but not statistically significant (figures not shown).

Figure 12.

Correlation coefficient between forecasted rainfall using different predictors and observed rainfall for the period 1981–2005 computed over west-central India

Forecasted rainfall anomalies (standardized) for four independent years (2006–2009) are obtained using the CCA model that was developed for the training period. Observed and predicted (from individual predictors, composite and mean of all forecasts) standardized rainfall anomalies for 2006 and 2007 are shown in Figures 13 and 14, respectively. The figures indicate that the skill of the composite forecast has improved over many parts of India, such as some parts of the east (parts of West Bengal and Bihar) and northeast (parts of Mizoram and Tripura), the west coast of India, many parts of peninsular India and some parts of north India (near Uttarakhand and Himachal Pradesh). The area of reasonable skill is larger for the composite forecast than for the individual predictor forecasts. The rainfall of eastern and central India is highly dominated by the monsoon lows/depressions that form over the BoB and move towards the northwest. In addition, the monsoon trough passes through northern India near the Gangetic plains and extends up to the head of the BoB. It may be mentioned here that present-day AGCMs are not able to simulate the mean and interannual variability of the Indian summer monsoon very successfully (Gadgil and Sajani, 1998; Kang et al., 2002), especially over the central and eastern regions of India. This may be due to the coarse resolution of CGCMs/AGCMs, which is inadequate to capture the mesoscale convective activity, especially the monsoon depressions over the BoB associated with the summer monsoon circulation.

Figure 13.

Observed (panel a) and MOS forecasts (panels b–h) of standardized seasonal rainfall anomalies for JJAS 2006 (positive values are shaded and negative values are contoured). The MOS forecasts in panels b–h correspond to the CCA predictors detailed in Figure 11.

Figure 14.

As Figure 13 but for 2007.

Spatial correlations between observed and forecasted rainfall for each independent year (2006–2009) have been computed and are shown in Table 1. From the CCA model, only the rainfall obtained with the composite technique has been used for this purpose, as the composite forecasts have better skill than the others. A total of six rainfall regions, namely all-India and five homogeneous rainfall regions based on Parthasarathy et al. (1995), have been considered. It is noticed that the spatial correlation over all-India is higher (by about 0.15–0.3) in the composite forecast than in the NCMRWF model for all the years. It is also found that the spatial correlation is higher in the composite forecast for four of the homogeneous regions, namely northwest India (NWI), WCI, southern peninsular India (SPI) and NEI, during 2006–2009. Over central northeast India (CNEI), the composite forecast shows higher spatial correlation during 2008 and 2009 but lower correlation during 2006 and 2007. The correlation values from the composite forecasts are statistically significant at 95% SL. Therefore, the rainfall distribution and magnitude over all-India as well as the five homogeneous rainfall regions are forecasted better using the statistical downscaling approach, namely the CCA.

Table 1. Spatial CC of area-averaged rainfall obtained from observations and model output for the period 2006–2009
NEI    0.03    0.31    0.03    0.76    −0.09    0.25    −0.01    0.52
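The spatial correlation for a single season over one rainfall region can be sketched as follows. This is an illustrative Python example, not the procedure used to produce Table 1: the grid-point values and the region mask are invented toy numbers, the function name `spatial_correlation` is hypothetical, and the grid points are assumed to be unweighted (no latitude weighting).

```python
from statistics import mean

def spatial_correlation(forecast, observed, in_region):
    """Unweighted Pearson correlation over the grid points flagged in in_region."""
    pairs = [(f, o) for f, o, keep in zip(forecast, observed, in_region) if keep]
    if len(pairs) < 3:
        raise ValueError("need at least three grid points in the region")
    f_mean = mean(f for f, _ in pairs)
    o_mean = mean(o for _, o in pairs)
    cov = sum((f - f_mean) * (o - o_mean) for f, o in pairs)
    f_var = sum((f - f_mean) ** 2 for f, _ in pairs)
    o_var = sum((o - o_mean) ** 2 for _, o in pairs)
    if f_var == 0 or o_var == 0:
        raise ValueError("a constant field has no defined correlation")
    return cov / (f_var * o_var) ** 0.5

# Toy standardized anomalies on six grid points (illustrative values only).
observed = [1.2, -0.4, 0.3, -1.1, 0.8, 0.0]
forecast = [0.9, -0.1, 0.5, -0.7, 0.2, 0.6]
region = [True, True, True, True, False, False]  # hypothetical regional mask

r_region = spatial_correlation(forecast, observed, region)
r_all = spatial_correlation(forecast, observed, [True] * len(observed))
```

Applying the same function to the direct model output and to the composite forecast, year by year and region by region, would give pairs of values that can be compared in the manner of Table 1.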

5. Conclusion

A CCA model has been developed to statistically downscale the Indian monsoon rainfall at seasonal timescales using the NCMRWF global model products. Based on 28 years (1981–2008) of model-predicted data, it was found that the model has several deficiencies in predicting the climatology and interannual variability of monsoon rainfall over India. The interannual variance of spatially averaged all-India rainfall is higher in the model than in the observed data, while the model underestimates the interannual variability at smaller spatial scales. Thus, the model greatly underestimates the highly spatially heterogeneous nature of the Indian summer monsoon seasonal rainfall anomalies. This may be partly due to the model's moderate spatial resolution (T80). The model has reasonable skill in terms of correlation over some parts of north India; however, its performance in simulating seasonal-scale rainfall is very poor for the rest of India. The mean circulation pattern is described well by the model; however, the variation in the circulation pattern for extreme years is not represented well.

Although the global model outputs differ from observations, some of the large-scale variables can be used as predictor fields for statistical downscaling using the CCA technique. Based on the teleconnection maps and physically based relationships between ISMR and meteorological variables, four predictors are considered for the CCA: specific humidity at 850 hPa over the Indian domain, zonal as well as meridional wind at 850 hPa over the AS, the Indian subcontinent and the BoB, and meridional wind at 200 hPa over the Pacific Ocean. An improvement in the prediction skill of seasonal rainfall using the CCA technique has been obtained over some regions, especially over the west coast of the country. This finding is consistent with Yu et al. (1997) and Juneng et al. (2010), who state that statistical downscaling can improve seasonal forecasts over some domains only. The area with higher correlations is larger for the composite forecast than for the forecasts from individual predictors. The rainfall distribution and magnitude over all-India as well as the five homogeneous rainfall regions are forecasted better using the statistical downscaling approach, namely the CCA, than by the AGCM itself. Statistical downscaling using CCA improves the forecast skill over some parts of SPI, especially the west coast of India, and a few parts of northern India and NEI, where the performance of the NCMRWF global model in simulating ISMR is not satisfactory.


This research work is a part of the project entitled ‘Development and Application of Extended Range Forecast System for Climate Risk Management in Agriculture’ at IIT Delhi and is supported by a financial grant from the Department of Agriculture & Cooperation (DAC), Ministry of Agriculture, Government of India. The authors thank IMD for providing the gridded (1°× 1°) rainfall data. The authors also duly acknowledge NCEP for the reanalysis observed SST data provided by the NOAA/OAR/ESRL PSD, Boulder, CO, USA, from their website. The authors are thankful to NCEP–CFS for providing the forecasted SST. AWR and MKT are supported by a grant/cooperative agreement from the National Oceanic and Atmospheric Administration (NA10OAR4310210). The views expressed herein are those of the authors and do not necessarily reflect the views of NOAA or any of its sub-agencies.