Prediction of interannual variability (IAV) of Indian summer monsoon (ISM) rainfall is limited by “internal” dynamics, and the monsoon intraseasonal oscillations (MISOs) seem to be at the heart of producing internal IAV of the ISM. If one could find an identifiable way through which these MISOs are modulated by slowly varying “external” forcing, such as El Niño–Southern Oscillation (ENSO), the uncertainty in the prediction of IAV could be reduced, leading to improvement of seasonal prediction. Such efforts, so far, have been inconclusive. In this study, the modulation of MISOs by ENSO is assessed by using a nonlinear pattern recognition technique known as the Self-Organizing Map (SOM). The SOM technique is efficient in handling the nonlinearity/event-to-event variability of the MISOs and is capable of identifying various shades of MISO from large-scale dynamical/thermodynamical indices, without being given any information on rainfall. It is shown that particular MISO phases are preferred during ENSO years: the canonical break phase is preferred in El Niño years and the typical active phase is preferred in La Niña years. Interestingly, if the SOM clustering is done after removing the ENSO effect on the seasonal mean, the preference for the break node remains relatively unchanged, whereas the preference for the active node is reduced or vanishes. The results indicate that the El Niño–break relationship is almost independent of the ENSO-monsoon relationship on the seasonal scale, whereas the La Niña–active association seems to be interwoven with the seasonal relationship.
 Due to the intense socio-economic impact over the region, the prediction of interannual variability (IAV) of Indian summer monsoon (ISM) is of great use to policy-makers. Several studies [e.g., Goswami and Ajaya Mohan, 2001; Goswami, 2005; Joseph et al., 2009, 2010] indicate that the prediction skills of IAV come from “external” as well as “internal” components. The major contributions from external forcing are through the El Niño–Southern Oscillation (ENSO)-monsoon teleconnections [Rasmusson and Carpenter, 1983; Ashok et al., 2007] and the local air-sea interactions over the warm oceans, especially over the eastern equatorial Indian Ocean (EEIO) [Saji et al., 1999] and the western Pacific [Wang et al., 2005]. While the internal IAV in the atmosphere could be generated largely through nonlinear interaction between intraseasonal oscillations (ISOs) and the seasonal cycle [Goswami and Xavier, 2005], some contribution could also come from interactions between the monsoon flow and topography [Shukla, 1985] and from nonlinear scale interactions between high frequency oscillations. It has been estimated that almost 50% of the contribution to the IAV from ISM may come from internal components which are, in turn, controlled mainly by the chaotic monsoon ISOs (MISOs) [Goswami, 1998; Goswami and Ajaya Mohan, 2001; Goswami and Xavier, 2005], thus making the monsoon IAV challenging to predict [Kang et al., 2004; Wang et al., 2004; Goswami et al., 2006; Xavier and Goswami, 2007]. However, if the internal IAV is somehow constrained by external forcing like ENSO, it would be a cause for optimism for improving the skill of seasonal prediction of the ISM. As MISOs are at the heart of generating internal IAV of ISM, if it is possible to find some identifiable way through which the chaotic MISOs are modulated by slowly varying external components such as ENSO, the uncertainty in IAV prediction could be reduced.
 ENSO and ISM are known to be associated on a seasonal/interannual scale [Walker, 1918; Sikka, 1980; Rasmusson and Carpenter, 1983; Webster and Yang, 1992; Webster et al., 1998]. The conventionally accepted mechanism of ENSO-ISM teleconnection is through an anomalous, large-scale, east-west shift in the tropical Walker circulation, which, in turn, modifies the monsoon Hadley circulation [Ashok et al., 2004]. During El Niño years, ISM may be below normal (e.g., 1982 and 1987); whereas ISM tends to be above normal during La Niña years (e.g., 1975 and 1988). However, there is no one-to-one relationship. There have been some drought years without El Niño (e.g., 1979) and some flood years without La Niña (e.g., 1983). Some studies indicate that the ENSO-ISM relationship has weakened in recent decades [Kumar et al., 1999, 2006; Kucharski et al., 2007].
 The relationship between ENSO and ISM on an intraseasonal time scale has been a topic of research in recent years. Palmer  advocated that the probability distribution functions (PDFs) of ISOs may be affected by external forcing. Annamalai et al.  and Sperber et al.  suggested from the PDFs for El Niño and La Niña years that El Niño disposes ISM to more break spells. These studies found that ENSO and ISO are related on an intraseasonal time scale. On the other hand, Krishnamurthy and Shukla  showed clear separation between intraseasonal and interannual modes of variability. Some studies on the IAV of ISO found that there is no significant correlation between ENSO and ISO activity in summer as well as winter [Salby and Hendon, 1994; Hendon et al., 1999; Slingo et al., 1999; Lawrence and Webster, 2001]. Recently, based on results from a nonlinear hidden Markov model (HMM), Yoo et al.  proposed that the influence of ENSO on ISO exists as a preference of certain ISO phases, even without IAV. They also indicated that when a seasonal mean anomaly is retained, the ENSO-related seasonal mean will project onto preferred ISO phases with a similar pattern, and this may give the impression of a shift in the PDF.
Most previous studies on the ENSO-ISO connection have used rainfall data in the analyses. Also, the ISO signals have been extracted by using linear statistical methods such as empirical orthogonal function (EOF) [Annamalai et al., 1999], the extended EOF (EEOF; [Lau and Chan, 1986]), and multichannel singular spectrum analysis [Krishnamurthy and Shukla, 2007]. Pre-filtered data have also been used, for the sake of obtaining large-scale structure and smooth propagation characteristics. Given that ISOs are inherently chaotic and exhibit event-to-event variability [Chattopadhyay et al., 2008], nonlinear techniques are better suited to isolating the different phases of ISO [Chattopadhyay et al., 2008; Yoo et al., 2010]. However, such attempts have so far been few.
In this study, we make use of a nonlinear pattern recognition/clustering technique known as self-organizing maps (SOMs) [Kohonen, 1990], which can effectively extract various shades of the convectively coupled oscillation of rainfall (e.g., active, break, and normal conditions and their transitions) by employing a set of large-scale dynamical and thermodynamical indices [Chattopadhyay et al., 2008] that describe the seasonal mean ISM and its variability, without using any prior information on rainfall. A detailed description of the SOM algorithm and of its advantages over conventional methods, such as EOF, EEOF, and multiple regression, is provided in Appendix A. The application of SOM in the present study is given in section 2.
 Our main objective in this study is to examine whether particular MISO phases are preferred by ENSO, and if they are preferred, whether the relationship is interlaced with the ENSO-ISM relationship on the seasonal scale. To elucidate this, we have examined the ENSO-ISO relationship by using two different methods: Method-1, using the standardized anomalies of the input indices thus preserving the influence of ENSO on all temporal scales of ISM, and Method-2 removing the effect of ENSO on seasonal mean of the input indices. Methods 1 and 2 are described in detail in sections 2.2.1 and 2.2.2, respectively.
2. Data and Method
The eight indices used in the study are presented in Table 1. To develop the time series of dynamical indices, we have used the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis data [Kalnay et al., 1996] for 60 years (1948–2007), available from http://www.cdc.noaa.gov/ at a horizontal resolution of 2.5° × 2.5°. The parameters used from NCEP/NCAR reanalysis data are zonal, meridional, and vertical components of wind, air temperature at pressure levels, and specific humidity at pressure levels. High-resolution (1° × 1°) gridded daily rainfall data from the National Climate Centre (NCC), India Meteorological Department (IMD), Pune [Rajeevan et al., 2006] and monthly sea surface temperature (SST) data from the Extended Reconstructed SST (ERSST) version 2 [Smith and Reynolds, 2004], at a horizontal resolution of 2° × 2°, are also used for the same period as supplementary data.
Table 1. List of Dynamical Indices Used in SOM Classification and the Regions Used to Define Them. The references for the well-known indices are also given.
 A SOM is a type of artificial neural network technique which uses “unsupervised learning” to produce a low-dimensional (typically two-dimensional), discretized representation of the multidimensional input space of the training samples, called a map. It is an effective method for feature extraction and classification. SOM methodology is explained in detail in Appendix A.
2.2.1. Basic Steps Involved in the Implementation of SOM
The indices used in the study are known to portray the seasonal mean ISM and its variability, and most of them have been defined/used by earlier researchers (e.g., Webster-Yang (WY) index [Webster and Yang, 1992], Goswami (GO) index [Goswami et al., 1999], Wang-Fan (WF) index [Wang and Fan, 1999], tropospheric temperature (TT) index [Xavier et al., 2007], and the specific humidity 850 (S850) index). Since seasonal mean monsoon and ISOs are governed by a common mode of spatial variability, it is worthwhile to use these indices to classify MISO phases. As the eight indices used in our SOM classification (Table 1) vary in their magnitude over a large range (from the order of 10⁻³ for specific humidity indices to the order of 10³ for the kinetic energy (KE) index), we have used the corresponding standardized values to generate the SOM clusters. All indices exhibit significant intraseasonal variations within a particular monsoon season and possess a linear correlation with the precipitation index (not shown). However, their respective correlations with the precipitation index are weak and the phase relationships between them change from event to event, indicating a certain degree of nonlinearity embedded in the relationships. Chattopadhyay et al.  have shown that the SOM technique is highly efficient in isolating nonlinearly coupled ISO states from the combination of the large-scale indices alone. The basic steps of SOM clustering for Method-1 are described in this section. Method-2 is discussed in section 2.2.2.
2.2.1.1. Deciding the Number of Nodes
 While deciding on the total number of nodes, the basic requirement we imposed is that the nodes should be kept to a minimum, such that they will have the least distortion and a sufficiently low quantization error (a measure of error that is due to a reduction in the output dimension). Also, the nodes should reproduce the basic characteristics of the known phases of the oscillation (e.g., active states, break states, and normal states). We consider the mean spatial patterns associated with active, break, and normal conditions [e.g., Krishnamurthy and Shukla, 2000; Webster et al., 1998; Goswami and Ajaya Mohan, 2001] as the “base” states (A, B, and C states, respectively). Any of these base states, say A can, in turn, have “normal” (ensemble mean, A0), “above normal” (A+) and “below normal” (A−) substates. Similar substates can be obtained for the states B and C. These substates of any base state could be characterized either by an east-west or north-south shift of the dominant spatial pattern or by an increase or decrease in the intensity of the pattern that is due to movement of the monsoon trough. Thus, a minimum of nine (3 A(0,+,−), 3 B(0,+,−) and 3 C(0,+,−)) states are required to detail the regional patterns and their transition from one phase to another (similar to a nonlinear curve-fitting problem which typically requires at least 8–9 points to trace a full nonlinear curve). Therefore, based on a consideration of mathematical optimization and the physical requirement of identifying distinct patterns, a configuration of 3 × 3 states is chosen.
2.2.1.2. Preparation of Data
 The standardized anomalies of the eight large-scale indices are now to be arranged to give input to the SOM routine. To determine whether a particular day (target day) from each of 122 days (starting from 1 June) for each of 60 (1948–2007) years is associated with a particular node in the 3 × 3 lattice, we consider the target day (N), the previous three days (N–1, N–2, and N–3) and the next 3 days together. This exercise is performed to include the evolutionary history of the pattern associated with each target day. Thus, we have data for 7 days for each of the eight indices, that is, 7 × 8 = 56 input values for any target day. Also for each target day, we include the 1 May value of all eight indices for the target day's corresponding year (i.e., adding another eight input values). The 1 May information is added to make the reference vector “informed” (initialized) to the pre-monsoon condition of each variable for each year. Finally, the Julian day variation of eight parameters is introduced as a variable (input value) to represent the annual cycle, according to Cavazos : sin[(2πt/365)−π/2], where t is the target day. Thus, the input vector has 65 (56 + 8 + 1) components (input values) for each target day. Similarly, the associated reference vector has 65 weighing coefficients.
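The assembly of the input vector described above can be illustrated with a minimal sketch. It is not the authors' code: the function and variable names are hypothetical, the daily series is assumed to extend a few days beyond the June–September window so that every target day has three neighbors on each side, and t is assumed to be the day of the year.

```python
import math

N_INDICES = 8      # eight large-scale indices (Table 1)
HALF_WINDOW = 3    # three days before and three days after the target day

def annual_cycle_term(t):
    """Annual-cycle input value sin(2*pi*t/365 - pi/2); t assumed to be day of year."""
    return math.sin(2.0 * math.pi * t / 365.0 - math.pi / 2.0)

def build_input_vector(daily, may1, day_pos, day_of_year):
    """Assemble the 65-component input vector for one target day.

    daily       : list of days, each a list of 8 standardized index values
    may1        : the 8 index values on 1 May of the same year
    day_pos     : position of the target day within `daily`
    day_of_year : day-of-year value t for the target day
    """
    if day_pos < HALF_WINDOW or day_pos + HALF_WINDOW >= len(daily):
        raise ValueError("target day needs three days on either side")
    window = daily[day_pos - HALF_WINDOW : day_pos + HALF_WINDOW + 1]
    vector = [value for day in window for value in day]   # 7 days x 8 indices = 56
    vector.extend(may1)                                    # + 8 pre-monsoon values
    vector.append(annual_cycle_term(day_of_year))          # + 1 annual-cycle value
    return vector

# Synthetic stand-in data (hypothetical values, for illustration only)
daily = [[0.1 * d + 0.01 * k for k in range(N_INDICES)] for d in range(10)]
may1 = [0.0] * N_INDICES
vec = build_input_vector(daily, may1, day_pos=4, day_of_year=156)
print(len(vec))
```

The ordering of the 65 components is arbitrary as long as it is the same for every target day, since the SOM compares vectors component by component.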
2.2.1.3. Random Initialization and Training
After determining the number of nodes and constructing the data set, the reference vector of each of the 9 nodes is initialized with random values, with the condition that no two of the 9 initial reference vectors are identical. The input vectors (having the same dimension as the reference vectors) are then broadcast in parallel to each of the nodes. The node whose code vector has the minimum Euclidean distance to the input vector x(n) is the winning node. The code vector of the winning node is changed according to equation (A1) in Appendix A. The iteration is repeated once for each data record used for training. This process is also repeated many times (many training cycles), starting from a large number of neighbors and a high learning rate, until the neighborhood is fine-tuned to a single nearest neighbor and the learning rate converges to zero. Thus, finally, the weight vectors for the nodes are arranged nonlinearly (because of the inclusion of the neighborhood) into distinctly separated nodes. After this initialization and training of the reference vector, we classify the sample of 60 years. Since each input vector has to be associated with a particular node, the corresponding target day will also be associated with that node. The dates clustered at each node are identified. If the summer MISO is a convectively coupled oscillation, the actual value for the different variables (indices) on the dates clustered at a node corresponds to the commonality among the various input parameters, and each pattern should be strongly related to a particular phase of the precipitation oscillation. In particular, one of the nodes should correspond to the active pattern, while another should correspond to that of the break pattern.
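The training loop can be sketched as follows, purely for illustration. The update used here is a generic Kohonen-style rule with a Gaussian neighborhood and linearly decaying learning rate and radius; these choices, the random presentation order, and all names and parameter values are assumptions and are not taken from equation (A1) or from the authors' implementation.

```python
import numpy as np

def train_som(data, rows=3, cols=3, n_cycles=20, lr0=0.5, radius0=2.0, seed=0):
    """Train a small rectangular SOM; returns reference vectors, shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    n_nodes = rows * cols
    # Random initialization; continuous draws make identical vectors practically impossible.
    weights = rng.normal(size=(n_nodes, data.shape[1]))
    # Lattice coordinates of each node, used to measure neighborhood distance.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total_steps = n_cycles * len(data)
    step = 0
    for _ in range(n_cycles):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                     # learning rate decays toward zero
            radius = max(radius0 * (1.0 - frac), 0.5)   # neighborhood shrinks over time
            # Winning node: minimum Euclidean distance to the input vector.
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            lattice_dist2 = np.sum((grid - grid[winner]) ** 2, axis=1)
            influence = np.exp(-lattice_dist2 / (2.0 * radius ** 2))
            # Move the winner and its lattice neighbors toward the input vector.
            weights += lr * influence[:, None] * (x - weights)
            step += 1
    return weights

def assign_nodes(data, weights):
    """Map every input vector to the node with the nearest reference vector."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

# Synthetic stand-in for the 65-component input vectors (illustration only)
rng = np.random.default_rng(42)
samples = rng.normal(size=(200, 65))
ref_vectors = train_som(samples)
labels = assign_nodes(samples, ref_vectors)
print(ref_vectors.shape, labels.min(), labels.max())
```

Node labels 0 to 8 here index the 3 × 3 lattice row by row; mapping them to the (row, column) notation used in the text is a matter of convention.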
2.2.2. SOM Clustering by Method-2
 To remove the ENSO effect on the seasonal mean from the input indices, we have regressed the seasonal mean values of the indices with respect to the seasonal value of the ENSO index (defined as concurrent JJAS value of Nino 3.4 SST). Subsequently, the respective estimated seasonal mean component of the indices, which, in turn, are associated with the ENSO, are subtracted from the corresponding daily values of the indices. After that, the standardized values of the residuals that do not contain any ENSO-associated component from a linear sense are calculated and given as the input to the SOM routine. Clustering is then carried out in the same way as explained in Section 2.2.1.
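One way to picture the Method-2 preprocessing is the sketch below. It assumes a simple least-squares fit of each index's seasonal mean on the JJAS Niño 3.4 value and a single global standardization of the residuals; the function name, the array layout, and the standardization choice are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np

def remove_enso_seasonal_mean(daily, nino34):
    """Remove the ENSO-related seasonal-mean component from one index.

    daily  : array (n_years, n_days) of daily values of the index
    nino34 : array (n_years,) of concurrent JJAS Nino 3.4 SST values
    """
    daily = np.asarray(daily, dtype=float)
    nino34 = np.asarray(nino34, dtype=float)
    seasonal_mean = daily.mean(axis=1)
    # Least-squares fit: seasonal_mean ~ slope * nino34 + intercept
    slope, _ = np.polyfit(nino34, seasonal_mean, 1)
    # ENSO-associated part of each year's seasonal mean (as an anomaly)
    enso_part = slope * (nino34 - nino34.mean())
    # Subtract that part from every day of the corresponding year
    residual = daily - enso_part[:, None]
    # Standardize the residuals (assumed: one mean and spread for the whole record)
    return (residual - residual.mean()) / residual.std()

# Synthetic example (hypothetical data, illustration only)
rng = np.random.default_rng(1)
nino34 = rng.normal(size=60)
daily = rng.normal(size=(60, 122)) - 0.8 * nino34[:, None]
cleaned = remove_enso_seasonal_mean(daily, nino34)
print(np.corrcoef(cleaned.mean(axis=1), nino34)[0, 1])
```

By construction, the seasonal means of the cleaned series are linearly uncorrelated with the ENSO index, while intraseasonal departures within each year are left untouched.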
3. Results and Discussion
3.1. Basic Characteristics of ISOs Derived Using the Two Methods
 MISOs in the form of active and break spells within the monsoon season arise as a manifestation of fluctuations in the intertropical convergence zone (ITCZ) between its continental and oceanic locations. The periodicity of a complete MISO cycle, comprising active, intermediate, and break phases, ranges from 30 to 60 days, with an average of 45 days. The active/break spells are conventionally identified based on criteria, such as rainfall or convection anomalies averaged over the central Indian region, exceeding or falling below a threshold [e.g., Annamalai and Slingo, 2001; Krishnan et al., 2000; Gadgil and Joseph, 2003; Joseph et al., 2009; Rajeevan et al., 2010]. Such definitions, in general, identify only the intense shades of the MISO, i.e., the most intense active or break phase. In this study, we objectively identify the broader shades of the MISO, which constitute the most intense to less intense active/break phases as well as the intermediate/normal phases. Traditionally, active/break phases are identified when their duration is at least 3 days [Joseph et al., 2009], but, as per our definition, the active/break spells may be of 1 day or even 30 days in duration, as our intention is not to identify only the intense active/break shades, but to identify all shades of MISO. In addition, the identification of rainfall MISO phases is done using broad-scale dynamical and thermodynamical indices, defined over a large region (not concentrated only over central India). Thus, the procedure to identify the MISO phases in this study differs slightly from traditional methods.
 Here, we have considered the whole JJAS season unlike conventional methods that consider mainly July–August for the identification of active/break phases. This change is because the onset and withdrawal phases are also associated with some phases of the MISO. Further, ENSO influence tends to peak toward the end of the monsoon season (September). Since our objective is to identify the role of ENSO in modulating MISO, it will be difficult to do that if we omit September. In this subsection, the basic characteristics of the MISO phases are discussed.
Once we obtain the classification using the SOM algorithm, the dates from the 60 year data are collected at each node. In order to test whether the SOM nodes based on only large-scale indices (without using information on rainfall data) are related to organized rainfall anomalies associated with different shades of MISO, we composited the rainfall anomalies area-averaged over central India (CI; 70°E–85°E, 15°N–25°N) that corresponded to the dates associated with each of the 3 × 3 nodes (Table 2). A brief description of the interpretation of SOM clusters follows: A strong positive rainfall anomaly associated with the node (3,1) indicates that the node clustered by the large-scale indices corresponds to a strong active phase of ISO; whereas, a strong negative rainfall anomaly associated with the node (1,3) indicates that the node corresponds to an intense break situation. Rainfall anomalies corresponding to other nodes signify that nodes (1,2) and (1,1) correspond to less acute break states and nodes (3,2) and (3,3) represent less intense active phases. Nodes (2,1), (2,2), and (2,3) correspond to near-neutral states of ISO. Table 2 also gives the average number of days per ISO event that are clustered at each node. The number of “events” at each node is determined by counting the number of times the data records are mapped consecutively to a particular node without a break. The average number of days per event is defined by dividing the total number of days mapped on to a SOM node by the number of events counted for that node. Table 2 shows that the maximum number of days per event occurs at the extreme nodes (1,3) and (3,1). This indicates that these two nodes are relatively more stable than other nodes, based on their residence times.
Assuming that one full ISO (active-break-active) cycle is an episode, the number of days per episode (obtained by summing the average days per event at all nodes) is about 43, which corresponds to the average periodicity of a low-frequency ISO event. Thus, it becomes clear from Table 2 that SOM can effectively classify the different phases of ISO in rainfall, from large-scale indices alone.
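The event counting and the episode length defined above can be expressed compactly. The sketch below is illustrative only: the function name is hypothetical, nodes are represented as (row, column) tuples, and events are assumed not to continue across the boundary between two monsoon seasons.

```python
from itertools import groupby

def average_days_per_event(seasons):
    """Average number of days per event at each node.

    seasons : list of seasons, each a list of node labels for consecutive days.
    An "event" is an unbroken run of days mapped to the same node.
    """
    days, events = {}, {}
    for season in seasons:
        for node, run in groupby(season):
            length = sum(1 for _ in run)
            days[node] = days.get(node, 0) + length     # total days at this node
            events[node] = events.get(node, 0) + 1      # one more unbroken run
    return {node: days[node] / events[node] for node in days}

# Tiny hypothetical example with two short "seasons"
seasons = [
    [(1, 3), (1, 3), (1, 3), (2, 3), (3, 1), (3, 1)],
    [(1, 3), (2, 3), (2, 3)],
]
avg = average_days_per_event(seasons)
episode_length = sum(avg.values())   # days per full episode, as defined in the text
print(avg, episode_length)
```

In this toy case node (1,3) has 4 days in 2 events, so its average is 2 days per event; summing the averages over all visited nodes gives the episode length.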
Table 2. Rainfall Anomalies Averaged Over 70°–85°E, 15°–25°N and the Average Number of Days Per Event at Each Node for Method-1
Rainfall Anomalies (mm day⁻¹)
Average Number of Days Per Event
Importantly, if we plot the composite rainfall anomalies over the Indian subcontinent corresponding to the dates clustered at each SOM node (Figure 1), nodes (3,1) and (1,3) reproduce well-known patterns of active and break phases, respectively, with considerable regional details. The other nodes represent developing or decaying active/break phases. Further, northward propagation of rainfall anomalies can be clearly seen by following the nodes in Figure 1 in a clockwise direction starting from node (1,1).
The propagation characteristics of each SOM node are shown in Figure 2, which depicts the probability (as a percentage) of propagation of each node. If we start from, say, node (1,1), that node is seen to exhibit a maximum probability to move to node (1,2), although there are chances that it may go to node (2,1) or node (2,2). Likewise, node (1,2) has an approximate 60% chance of going to node (1,3), but there are still chances, with less probability of course, that the MISO may opt for any of nodes (1,1), (1,3), (2,2), or (2,3). Thus, in a broad sense, the ISO cycle has a maximum probability of following the path (1,1) → (1,2) → (1,3) → (2,3) → (3,3) → (3,2) → (3,1) → (2,1) → (1,1). However, since ISOs are chaotic and exhibit event-to-event variability, they can also follow different paths, and there is a likelihood that the neutral node (2,2) also gets involved in the ISO path, thus lengthening/shortening the total period of the ISO cycle.
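Propagation probabilities of this kind can be estimated from the daily sequence of node labels. The following sketch assumes that only actual moves between different nodes are counted (days on which the state stays at the same node are ignored) and that transitions are not counted across season boundaries; whether Figure 2 was built exactly this way is not stated, and the names are hypothetical.

```python
from collections import Counter, defaultdict

def transition_percentages(seasons):
    """Percentage of departures from each node that lead to each destination node."""
    counts = defaultdict(Counter)
    for season in seasons:
        for today, tomorrow in zip(season, season[1:]):
            if tomorrow != today:            # count only moves to a different node
                counts[today][tomorrow] += 1
    return {
        node: {dest: 100.0 * n / sum(c.values()) for dest, n in c.items()}
        for node, c in counts.items()
    }

# Tiny hypothetical example
A, B, C = (1, 1), (1, 2), (1, 3)
seasons = [[A, A, B, C, A, B], [A, C]]
probs = transition_percentages(seasons)
print(probs[A])
```

Here node (1,1) is left three times, twice toward (1,2) and once toward (1,3), so the estimated percentages are two thirds and one third, respectively.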
Because the extreme nodes have the maximum residence time, we calculated and plotted the cumulative rainfall anomaly (CRA; averaged over the entire Indian subcontinent, i.e., over all Indian land points) associated with the days clustered at each node (Figure 3). The percentage contribution of the CRA at each node to the seasonal mean is also depicted in Figure 3. Note that the maximum contribution comes from extreme nodes (1,3) and (3,1). The contributions from all other nodes are meager. Thus, if a relation could be found between the ENSO index and the rainfall or residence time associated with these two nodes, the seasonal mean prediction could be improved.
Table 3 shows the composited rainfall anomalies averaged over the CI region and the average number of days per event clustered at each SOM node, for Method-2. Note that even after removing the effect of ENSO on the seasonal mean, the rainfall anomalies at each node still correspond to various shades of ISO (compare with Table 2). Compared to Method-1 values, the node values have not markedly changed. The average number of days per event is highest for extreme nodes (1,3) and (3,1), that is, for the intense break and active phases, respectively. The number of days per ISO episode totals 42, which indicates that, in Method-2 as well, a complete ISO cycle is clustered by the SOM.
The spatial pattern of composite rainfall anomalies corresponding to each node is shown in Figure 4. The patterns of the break phases, nodes (1,3), (1,2), and (1,1), and the patterns of the active phases, nodes (3,1), (3,2), and (3,3), are very similar to the ones clustered by Method-1. This is because the ENSO effect on MISO is not large enough to change the intrinsic spatial pattern associated with MISO; instead, it could modulate the residence time of each MISO phase, thereby lengthening or shortening its duration. Only the patterns of the neutral phases are slightly different between Method-1 and Method-2. The propagation characteristics are also similar between the two methods (Figure 5), with Method-2 results broadly following the path (1,1) → (1,2) → (1,3) → (2,3) → (3,3) → (3,2) → (3,1) → (2,1) → (1,1).
 The CRA and its percentage contribution to the seasonal mean are depicted in Figure 6. It is clear from Figure 6 that even after removing the ENSO effect on the seasonal mean, the contribution to the seasonal mean comes largely (∼70%) from the typical active (node (3,1)) and break (node (1,3)) phases. Compared to Method-1, the contribution is reduced slightly for node (1,3).
3.2. Relationship Between ENSO and ISM on an Intraseasonal Scale
 It is clear from section 3.1 that the SOM technique is efficient in separating the various phases of ISO, from raw data as well as from data from which the ENSO-related signals are removed. It has also been shown that the contribution to the seasonal mean mainly comes from the nodes or ISO phases that have maximum residency time, that is, nodes (1,3) and (3,1). Here, we examine the relation of these two nodes with ENSO, as these two nodes are significant to seasonal mean prediction.
The average number of days per event clustered during ENSO years for the canonical active and break phases is calculated for methods 1 and 2 and shown in Table 4. The days clustered during El Niño years are given in bold letters and those during La Niña years are in italics. In Method-1, the influence of ENSO on all temporal scales of ISM has been retained. For the typical break phase, the average number of days per event during El Niño years is about 12.95, whereas that during La Niña years is only 6.27. For the active phase, the average number of days is 6.67 (12.07) during El Niño (La Niña) years. This clearly indicates that during El Niño years, the canonical break phase is preferred more, and with more residency time, than in the La Niña years, in which the typical active phases are favored. The difference between the break and active periods is statistically significant (95% significance level, one-tailed Student's t-test). For Method-2, in which the ENSO effect on the seasonal mean has been removed, the number of days is slightly reduced for the break node during El Niño years and slightly increased during La Niña years. In the active phase, the average number of days per event is comparable during both El Niño and La Niña years (the difference is statistically insignificant). Interestingly, the average number of days per event for both break and active phases during La Niña years is comparable (the difference is statistically insignificant). This gives an indication that even after removing the ENSO effect on the seasonal mean, the preference for the break node persists during El Niño years, whereas there is no specific preference for the active node during La Niña years.
Table 4. Various Statistics of Most Active and Most Break Spells, When SOM Clustering is Done Using Methods 1 and 2
Number of Days Per Event at Each SOM Node (El Niño; La Niña)
Correlation of the Cumulative Rainfall Anomalies Associated With the Days Clustered at Each SOM Node With ENSO Index
Correlations significant at 99% significance level.
Table 4 also shows the correlation coefficient (CC) calculated between the ENSO index and the CRA (averaged over all Indian land points) associated with the days clustered at both the active (3,1) and break (1,3) nodes. It is interesting to note that the CC is significant (at 99% significance level) for the break node in both cases. On the other hand, the CC for the active node is significant only for Method-1; for Method-2, the CC is insignificant.
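The correlation described here can be illustrated with a short sketch. It assumes that the daily rainfall anomaly has already been averaged over all Indian land points, that node labels are stored as integers, and that the CRA for a year is the sum of the anomalies over the days assigned to the node; the function names and array layout are hypothetical.

```python
import numpy as np

def node_cra_by_year(rain_anom, labels, node):
    """Per-year cumulative rainfall anomaly over the days assigned to `node`.

    rain_anom : array (n_years, n_days) of area-averaged daily rainfall anomalies
    labels    : array (n_years, n_days) of integer node labels for the same days
    """
    rain_anom = np.asarray(rain_anom, dtype=float)
    mask = np.asarray(labels) == node
    # Keep anomalies only on days mapped to the node, then sum within each year
    return np.where(mask, rain_anom, 0.0).sum(axis=1)

def correlation_with_enso(cra, enso_index):
    """Pearson correlation between the yearly CRA series and the ENSO index."""
    return float(np.corrcoef(cra, enso_index)[0, 1])

# Tiny hypothetical example: 2 years x 3 days
rain = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
labs = [[0, 1, 0], [1, 1, 0]]
cra0 = node_cra_by_year(rain, labs, 0)
cra1 = node_cra_by_year(rain, labs, 1)
print(cra0, cra1)
```

With 60 yearly values per node, the resulting coefficient would then be tested for significance, which this sketch does not attempt.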
 Thus, based on the ENSO effect on residence time and on the CCs, especially for residence time, it is proposed that the relationship of El Niño with the typical ISM break phase is nearly independent of the ENSO-monsoon relationship on a seasonal scale. The break node is preferred more during El Niño years, even after the removal of ENSO effect on the seasonal mean. The year-wise contribution of the node to the seasonal mean has been slightly reduced in Method-2. This is also evident in the CC results. On the other hand, the La Niña–active node association seems to be interwoven with the seasonal relationship. The preference for the active node during La Niña years vanishes when SOM classification is performed using Method-2. Here, the average number of days per event for both break and active nodes during La Niña years is comparable. Also, the number of days per event for the active node is similar during both El Niño and La Niña years. This is consistent with the loss of significance in the CC values for the active node when clustering was done using Method-2.
4. Summary and Conclusions
Uncertainty in the prediction of IAV of ISM rainfall comes partly from the event-to-event variability associated with the convectively coupled MISOs, which form an integral part of the internal variability of ISM. Thus, indications of modulation of MISO characteristics by external components such as ENSO would provide some hope in overcoming this uncertainty. The association between ENSO and ISM on a seasonal scale is well known; however, their relationship on an intraseasonal time scale has not been explored by many researchers. In this study, we attempt to comprehend this ENSO-MISO relationship, using SOM, a nonlinear pattern recognition technique. The nonlinear nature of the MISOs prompted us to use SOM to recognize the various shades of MISO from large-scale dynamical and thermodynamical indices, without giving any information on rainfall as an input. Our main objective is to identify whether any particular shade of MISO is related to ENSO and, if so, whether this relationship is linked to the ENSO-ISM connection on a seasonal scale. For this purpose, SOM clustering was done in two ways: by retaining the ENSO effect on all temporal scales (Method-1) and by removing the ENSO effect on the seasonal mean (Method-2).
It is demonstrated that SOM is successful in identifying the various shades/phases of MISO, from the large-scale indices alone. The spatial patterns of the rainfall anomalies corresponding to the days clustered at each node depict the conventional patterns related to the active, break, and normal/transition phases of MISO. The propagation characteristics of MISO are also very clear in both methods. The total number of days per event for all nodes, which in effect depicts the periodicity of the total episode clustered by SOM, is in the range of 42 to 45 days, clearly indicating that a complete MISO cycle is reconstructed by SOM from the large-scale indices.
From the plot of CRA averaged over the whole of the Indian subcontinent, it is clear that the major contributions to the seasonal mean come from the two extreme nodes, (1,3) and (3,1). The two nodes contribute about 70%–75% of the total variance. It is noteworthy that these two nodes remain the prominent contributors, even after the removal of ENSO-related signals from the seasonal mean. Hence, we examined the relation of these two nodes with ENSO. It is found from Method-1 that the residence time or the average number of days per event for the canonical break node is maximum during El Niño years, while that for the active node is maximum during La Niña years. When the ENSO effect on the seasonal mean is removed, the preference for the break node is still present during El Niño years, while there is no specific preference for the active node during La Niña years. Also, the average number of days per event for both break and active nodes during La Niña years is comparable. The CC of these two nodes with the ENSO index, in both cases, is also calculated. It is interesting to note that the CC is significant (99% significance level) for the break node in both cases. On the other hand, the CC for the active node is significant only for Method-1; for Method-2, the CC is not significant. This gives us an indication that the El Niño–break relationship is almost independent of the ENSO-monsoon relationship on the seasonal scale. However, the La Niña–active association seems to be interlinked with the seasonal relationship.
 The study is unique in the sense that it provides clear evidence of the association between ENSO and MISO, especially of the influence of ENSO on the residence time of MISO. Yoo et al. also indicated that ENSO can influence particular phases of ISO; however, the ISO they referred to was the Madden-Julian oscillation (MJO) in the boreal summer, which has northward as well as eastward propagating components. Here, our focus is on the MISO, which has prominent northward propagation. Also, the HMM technique that Yoo et al. used exhibited difficulties when trained over a reduced domain; hence, they could not reproduce results consistent with those obtained when training was done over a larger domain. An important advantage of the present study is that the SOM technique could isolate the various shades of MISO and their relationship with ENSO by using large-scale dynamical/thermodynamical indices alone. The results have great implications for improving the predictability of the parts of IAV that are controlled by nonlinear and chaotic MISOs.
Appendix A: The Self-Organizing Map (SOM) Algorithm
 The self-organizing map, or SOM, is basically a pattern recognition technique or clustering algorithm based on “unsupervised learning” neural networks (i.e., a learning process without prior knowledge of the data domain or human intervention). This method is similar to standard iterative clustering algorithms such as K-means clustering [see, e.g., Gutiérrez et al., 2004]. In this study we used the Kohonen model [Kohonen, 1990] of SOM, which belongs to the class of “vector-coding” algorithms [Haykin, 1999]. Given an N-dimensional (N-D) data space consisting of a cloud of data points (input variables), the SOM algorithm distributes an arbitrary number of “nodes” (or cluster centers) in the form of a one-dimensional (1-D) or two-dimensional (2-D) regular lattice in such a way that it is representative of the multidimensional distribution function, thereby facilitating data compression and visualization. Mathematically speaking, this is a process of “topology conserving projection” from an original higher-dimensional data space into the lower-dimensional lattice [Haykin, 1999]. Each node is uniquely defined by a reference vector (or code vector) consisting of weighting coefficients. Each weighting coefficient of the reference vector is associated with a particular input variable. The SOM technique adjusts the reference vectors to the N-D data cloud (input vector) through a user-defined iterative cycle, adapting the reference vector in accordance with the input vector. This adaptation is done through the minimization of the Euclidean distance between the reference vector for any jth node, Wj, and the input data vector X, to find min ∥X − Wj∥.
 For a particular data record, only one node wins, the “winner node.” “Optimal” mapping will be such that the winner node also changes the neighbor nodes, as defined by the user. This inclusion of the neighborhood makes the SOM classification nonlinear, since each node has to be adjusted relative to its neighbors. This training cycle may be continued n times and may be mathematically described as:
$$W_j(n+1) = \begin{cases} W_j(n) + c(n)\,[x(n) - W_j(n)], & j \in R_j(n) \\ W_j(n), & \text{otherwise} \end{cases}$$
where Wj(n) is the reference vector for the jth node for the nth training cycle, x(n) is the input vector, Rj(n) is the predefined neighborhood around the node j, and c(n) is the “neighborhood kernel” which defines the neighborhood. The neighborhood kernel may be a monotonically decreasing function of n (0 < c(n) < 1; called the “bubble”) or it may be of Gaussian type:
$$c(n) = \alpha(n)\,\exp\!\left(-\frac{\lVert r_j - r_i \rVert^{2}}{2\sigma^{2}(n)}\right)$$
where α(n) and σ(n) decrease monotonically with n. The parameter α(n) is the learning rate, which determines the “velocity” of the learning process, and σ(n) is the amplitude that determines the width of the neighborhood kernel. The rj and ri are the lattice coordinates of nodes j and i, on which the neighborhood kernel is defined. In the present study we have used a Gaussian neighborhood. The free software for SOM used in this study is available at http://www.cis.hut.fi/research/som-research/.
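 The training cycle described above can be sketched in a few lines of standard-library Python. This is a minimal illustration and not the software used in this study. It assumes one randomly drawn input vector per training cycle, linearly decreasing schedules for α(n) and σ(n) with an arbitrary floor of 0.5 on σ(n), initialization of the reference vectors from randomly chosen samples, and a Gaussian kernel applied to every node of the lattice. The function names, default values, and toy data are all hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_som(data, rows, cols, n_cycles=2000, alpha0=0.5, seed=0):
    """Train a rows x cols SOM; returns lattice coordinates and reference vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumed initialization: each reference vector starts at a random sample.
    weights = [list(rng.choice(data)) for _ in coords]
    sigma0 = max(rows, cols) / 2.0
    for n in range(n_cycles):
        frac = n / n_cycles
        alpha = alpha0 * (1.0 - frac)            # learning rate, decreasing with n
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # kernel width, decreasing with n
        x = rng.choice(data)
        # Winner node: the node whose reference vector minimizes ||x - W_j||.
        winner = min(range(len(weights)), key=lambda j: sq_dist(x, weights[j]))
        for j, w in enumerate(weights):
            # Gaussian neighborhood kernel on the lattice distance to the winner.
            lattice_d2 = sq_dist(coords[j], coords[winner])
            c = alpha * math.exp(-lattice_d2 / (2.0 * sigma ** 2))
            for k in range(dim):
                w[k] += c * (x[k] - w[k])        # W_j(n+1) = W_j(n) + c(n)[x(n) - W_j(n)]
    return coords, weights


# Toy input: 500 random 3-D vectors standing in for three standardized indices.
toy_rng = random.Random(1)
toy = [[toy_rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
coords, weights = train_som(toy, rows=3, cols=3)
```

 Because 0 < c(n) < 1 at every step, each update moves a reference vector part of the way toward the current input vector, so the reference vectors always remain inside the range spanned by the data.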
 The SOM reference vectors span the data space and each node represents the position that approximates the mean of the nearby samples in the data space. An important advantage of SOM is that a smaller (larger) number of SOM nodes is allocated where the data are sparse (dense) [Hewitson and Crane, 2002]. Also, SOM arranges the distribution of nodes in such a way that similar nodes are located close together and dissimilar nodes are farther apart. Hewitson and Crane demonstrated the advantages of SOM by using a simple artificial 2-D data set. Chattopadhyay et al. have explained the same for a 2-D meteorological data set. In this study, we illustrate the advantages of SOM for three-dimensional (3-D) atmospheric data (Figure A1). For this, we randomly selected three indices: the S850 and KE indices from the eight indices used in the study and an index from rainfall data (PR index) defined by standardizing the rainfall anomalies averaged over central India (CI; 70°E–85°E, 15°N–25°N). The input data set contains 122 days of the summer monsoon season (1 June to 30 September) from a 60 year period (1948–2007), that is, a total of 122 × 60 = 7320 data points for each index. This large sample of data is then mapped by SOM onto 15 × 15 = 225 nodes. The number of nodes is chosen arbitrarily (as we shall see, the choice depends on the physical requirements of the problem in question). In Figure A1, we show scatter plots of different combinations of these three indices for the 7320 points (small closed black circles), with the 225 mapped SOM nodes (open shaded circles) overlaid on the data points. It can be seen from Figure A1 that the nodes are placed continuously and densely in the region with abundant data points and sparsely where there are fewer data points.
The node distribution also captures the nonlinearity in the data by preserving the “topology.” It is interesting to note that the features of a huge sample of 7320 points are reproduced well by a comparatively small sample of 225 points, resulting in data compression and easy visualization. Thus, it is demonstrated that the advantageous features of SOM are intact for real atmospheric data. Using the SOM routine one can also determine the dates clustered at each node and plot the input variable (here any of the three indices) on those dates for any node. Such a plot may be used to visualize a “pattern” for any variable associated with each node.
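 The last step mentioned above, finding the dates clustered at each node and averaging a variable over those dates, can be sketched as follows. This is a minimal illustration that assumes the reference vectors have already been obtained from a trained SOM; the function names, the two toy reference vectors, the dates, and the rainfall values are all hypothetical.

```python
from collections import defaultdict


def best_matching_node(x, weights):
    """Index of the node whose reference vector is closest to x (Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(x, weights[j])),
    )


def dates_per_node(samples, dates, weights):
    """Map each node index to the list of dates whose input vector it wins."""
    clusters = defaultdict(list)
    for x, d in zip(samples, dates):
        clusters[best_matching_node(x, weights)].append(d)
    return dict(clusters)


def composite(values_by_date, dates):
    """Mean of a variable over the given dates (the node "pattern" at one point)."""
    return sum(values_by_date[d] for d in dates) / len(dates)


# Toy example: two nodes, three days, one scalar variable.
weights = [[-1.0, -1.0], [1.0, 1.0]]
samples = [[-0.9, -1.2], [0.8, 1.1], [-1.1, -0.7]]
dates = ["1948-06-01", "1948-06-02", "1948-06-03"]
rain = {"1948-06-01": 2.0, "1948-06-02": -1.0, "1948-06-03": 4.0}

clusters = dates_per_node(samples, dates, weights)
node0_mean = composite(rain, clusters[0])
```

 For a gridded field, the same composite would be computed at every grid point, giving a spatial pattern for each node.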
 The SOM algorithm has been used in various disciplines [e.g., Palakal et al., 1995; Chen and Gasteiger, 1997]. In meteorology, the SOM has been used for synoptic classification of weather states [Cavazos, 1999; Hewitson and Crane, 2002], for climate study and downscaling of seasonal forecasts [Malmgren and Winter, 1999; Gutiérrez et al., 2005], for cloud classification [Ambroise et al., 2000], for ENSO variability and diagnostic studies [Leloup et al., 2007], and for Indian Ocean variability studies [Tozuka et al., 2008; Morioka et al., 2010], among others. The SOM technique is different from other statistical analysis tools like EOF and multiple regression. In SOM, the clustering at each node (which has a specific pattern) is based on the Euclidean distance between the reference vector associated with a node and the input data vector. If the distance between two reference vectors is the largest among all pairs, the two nodes will be the most different, and so will be the patterns associated with those nodes. In EOF or EEOF analysis, the data are classified in terms of variance. However, the “orthogonality” property of the EOF modes makes it logically unsuitable for analyzing a quasiperiodic oscillation like ISO. The shortcomings of EOF and various conventional techniques are discussed by Goulet and Duvel. Further, in multiple regression (linear or nonlinear), a functional relationship is computed between the predictor parameters and the predictand. In contrast, in SOM, the lattice is first chosen and then the patterns are obtained at each node through an iterative process without (explicitly) seeking any functional relationship between the parameters involved. In a way, the SOM technique is analogous to (but more complex than) the nonparametric regression technique [Heskes and Kappen, 1995].
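 The statement above about the most different nodes can be illustrated with a short sketch that searches all pairs of reference vectors for the largest Euclidean distance. The function name and the three toy reference vectors are hypothetical, and the sketch assumes Python 3.8 or later for `math.dist`.

```python
import math
from itertools import combinations


def most_dissimilar_nodes(weights):
    """Return the pair of node indices whose reference vectors are farthest apart."""
    return max(
        combinations(range(len(weights)), 2),
        key=lambda pair: math.dist(weights[pair[0]], weights[pair[1]]),
    )


# Toy reference vectors for three nodes in a 2-D data space.
toy_weights = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
extreme_pair = most_dissimilar_nodes(toy_weights)
```

 On a well-organized 2-D lattice such a pair would be expected to sit near opposite corners, since SOM places dissimilar nodes far apart.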
 The authors are grateful to two anonymous reviewers for their valuable comments. We are thankful for partial support from Indo-French project 3907/1. IITM is funded by Ministry of Earth Sciences, Government of India. The authors wish to acknowledge K. Ashok for help in improving the manuscript.