Deep Residual Convolutional Neural Network Combining Dropout and Transfer Learning for ENSO Forecasting

To improve El Niño-Southern Oscillation (ENSO) amplitude and type forecasts, we propose a model based on a deep residual convolutional neural network with few parameters. We leverage dropout and transfer learning to overcome the challenge of insufficient data in the model training process. By applying the dropout technique, the model effectively predicts the Niño3.4 index at a lead time of 20 months during the 1984–2017 evaluation period, three months longer than the existing optimal model. Moreover, with homogeneous transfer learning this model precisely predicts the Oceanic Niño Index up to 18 months in advance. Using heterogeneous transfer learning, the model achieves 83.3% accuracy in forecasting the El Niño type at a 12-month lead. These results suggest that our proposed model can enhance ENSO prediction performance.

HU ET AL. 10.1029/2021GL093531

The most remarkable work is the CNN-based model that makes effective forecasts 17 months in advance (Ham et al., 2019), outperforming most existing methods. This model is trained on Coupled Model Intercomparison Project phase 5 (CMIP5) and reanalysis data to predict the Niño3.4 index. However, the model has few layers (only convolutional and pooling layers), does not use residual structures, and, apart from transfer learning on CMIP5, applies no additional techniques to improve predictability. Recent studies have shown that the dropout technique can improve the performance of shallow neural networks applied to temperature simulation problems (Piotrowski et al., 2020). Through extensive experiments, they showed that improving model performance and stability requires nodes to be discarded with a much lower probability than is common in deep neural networks (about 1%, instead of the 10%–50% typical in deep learning). Given the number of layers in our model, we examine dropout in more detail to further improve the prediction ability. Additionally, in contrast to existing ENSO prediction research that performs transfer learning only on simulated data, we also transfer the knowledge learned from the task of predicting the Niño3.4 index to the task of predicting the Oceanic Niño Index (ONI), so-called homogeneous transfer learning.
There are various methods of predicting ENSO types, for example, those based on random forests (Santos et al., 2020), multimodel ensembles (Ren, Scaife, et al., 2018), and CNNs (Ham et al., 2019). In this work, we focus on the CNN method trained on CMIP5 data to predict El Niño types. Its accuracy remains 66.7% at a lead time of 12 months. However, it predicts only the types of El Niño, not the types of La Niña or normal events. Besides, while transfer learning yields a slight performance improvement in index prediction, no transfer learning is used in type prediction. Nevertheless, we can transfer the knowledge learned from the index prediction task to the type prediction task. This method, called heterogeneous transfer learning, further improves the prediction ability.
In this work, the main contributions are summarized as follows:
1. We propose a deep Residual Convolutional Neural Network (Res-CNN) model for ENSO predictions, including the Niño3.4 index, ONI, and types. Notably, our model requires only a few changes for different tasks. We find that the Res-CNN model can effectively predict the Niño3.4 index up to 20 months in advance, 3 months more than the previous CNN-based model.
2. Keeping the network structure intact, we show that the ONI can be skillfully predicted 18 (12) months in advance with (without) homogeneous transfer learning, which provides a new strategy for further enhancing ENSO predictive ability.
3. We apply heterogeneous transfer learning to enhance type prediction. We show that the knowledge learned from the index prediction task can be transferred to the type prediction task by changing only the output layer of the model trained for index prediction and retraining on the type prediction task. The accuracy of El Niño type prediction reaches 83.3% at a 12-month lead, while the current best is 66.7%.
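To make the heterogeneous transfer step concrete, the following is a minimal sketch of reusing a trained index model's feature-extractor weights while replacing only its one-unit regression head with a fresh classification head; the dictionary keys and shapes are hypothetical, not the paper's actual parameterization.

```python
import numpy as np

def transfer_head(index_model, n_classes, rng):
    """Heterogeneous transfer sketch: keep the feature-extractor weights of
    the trained index model and replace only its 1-unit output layer with a
    freshly initialized n_classes classification head (names hypothetical)."""
    type_model = dict(index_model)                 # share all feature weights
    d = index_model["head_w"].shape[0]             # feature dimension of the head
    type_model["head_w"] = 0.01 * rng.standard_normal((d, n_classes))
    type_model["head_b"] = np.zeros(n_classes)
    return type_model
```

The transferred model is then retrained on the type prediction task, so only the new head starts from scratch.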

Res-CNN Model
The input data for three consecutive months are recorded as x_{t-2}, x_{t-1}, x_t; the output data, the Niño3.4 index, ONI, or type, are all referred to as y; and the forecast result can be described by

y_{t+l} = F(x_{t-2}, x_{t-1}, x_t),

where F denotes the Res-CNN model and l is the forecast lead in months, from 1 to 23.
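The input-output pairing above can be sketched as a sliding window over the monthly record; array shapes here are illustrative, not the paper's actual grid.

```python
import numpy as np

def make_pairs(fields, index, lead):
    """Build (x_{t-2}, x_{t-1}, x_t) -> y_{t+lead} training pairs.

    fields: (T, H, W) monthly predictor maps.
    index:  (T,) target index values (e.g. Nino3.4).
    lead:   forecast lead in months (1..23).
    """
    X, y = [], []
    for t in range(2, len(fields) - lead):
        X.append(fields[t - 2:t + 1])   # three consecutive input months
        y.append(index[t + lead])       # target value `lead` months ahead
    return np.stack(X), np.array(y)
```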
Res-CNN, shown in Figure 1, uses seven convolutional layers and three max-pooling layers to extract features, two skip connections, and one fully connected layer to generate the final result. In the index prediction task the output is a single value, while in the type prediction task the output is the probability of each category. The convolution, the model's main feature-extraction operation, is

v_{x,y}^{l,f} = \tanh\Big(b^{l,f} + \sum_{m=1}^{M}\sum_{p=0}^{P_l-1}\sum_{q=0}^{Q_l-1} w_{p,q}^{l,f,m}\, v_{x+p,\,y+q}^{l-1,m}\Big),

where (x, y) are the coordinates in the feature map, l denotes the lth convolution layer, and f denotes the fth feature map. M is the number of feature maps in layer l-1, and (P_l, Q_l) are the dimensions of the lth filter. b is the bias unit, w is the weight at grid point (p, q) in the convolution kernel, and v^{l,f} denotes one value of the fth feature map in the lth layer.
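A direct numpy transcription of this convolution for a single output feature map (valid padding, tanh activation as in the text) might look like the following; it is a reference sketch, not an efficient implementation.

```python
import numpy as np

def conv2d_tanh(v_prev, w, b):
    """One output feature map of the convolution in the text.

    v_prev: (M, H, W)  feature maps of layer l-1
    w:      (M, P, Q)  kernel weights for this output feature map
    b:      scalar bias
    """
    M, P, Q = w.shape
    H = v_prev.shape[1] - P + 1   # valid-padding output height
    W = v_prev.shape[2] - Q + 1   # valid-padding output width
    out = np.full((H, W), b, dtype=float)
    for m in range(M):
        for p in range(P):
            for q in range(Q):
                # accumulate w_{p,q}^{m} * v_{x+p, y+q}^{m} over all (x, y) at once
                out += w[m, p, q] * v_prev[m, p:p + H, q:q + W]
    return np.tanh(out)
```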
The parameters of our model are learned through multiple iterations minimizing a loss function: mean square error for index prediction or cross entropy for type prediction. In our model, the residual structure can be defined as

y = R(x) + x,

where x and y are the input and output vectors of the considered layer, and the function R denotes the residual mapping to be learned. The operation R + x is performed by a shortcut connection and element-wise addition.
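A minimal residual block consistent with this definition, using two dense tanh layers as a stand-in for R (the paper's actual R is convolutional; shapes are kept equal so the identity shortcut needs no projection):

```python
import numpy as np

def residual_block(x, w1, w2):
    """Sketch of y = R(x) + x with a two-layer residual branch and
    tanh activation after the element-wise shortcut addition."""
    h = np.tanh(x @ w1)        # first layer of the residual branch R
    r = h @ w2                 # second layer, no activation before the add
    return np.tanh(r + x)      # shortcut connection + element-wise addition
```

With zero weights the branch contributes nothing and the block reduces to tanh of the identity, which is what lets gradients pass through untouched early in training.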
The other details are the same as those in K. He et al. (2016), except that we use the Tanh activation function instead of the rectified linear unit and do not use batch normalization (Ioffe & Szegedy, 2015). Because our network is shallow compared to a standard residual network, small changes in network parameters have little effect; moreover, because our data are limited and complex, forcing each layer's input to follow the same distribution prevents the model from training well. Setting the number of residual connections to 0, 1, 2, or 3 gives our model various structures. To further improve performance, 11 different dropout rates are taken: 0, 0.01, 0.03, 0.05, 0.07, 0.1, 0.3, 0.5, 0.7, 0.9, and 0.99. Thus, for each lead month, there are 44 candidate models (Figure S1 in Supporting Information S1). The final model is the best-performing of these, which also determines the number of residual connections. See Text S1 in Supporting Information S1 for details on the dropout and transfer learning techniques.
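For reference, the dropout operation swept over here can be sketched as standard inverted dropout; at the low rates favored in this work (0.01–0.1), only a few units are discarded per pass.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` and
    rescale the survivors by 1/(1-rate) so the expected value is unchanged."""
    if not training or rate == 0.0:
        return x                          # identity at inference or rate 0
    mask = rng.random(x.shape) >= rate    # keep a unit with probability 1-rate
    return x * mask / (1.0 - rate)
```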

Indexes Forecast
The index prediction task includes the Niño3.4 index and the ONI. The fully connected layer has a single output unit, and the Adam (Kingma & Ba, 2014) optimization algorithm is used. The specific parameter settings can be found in Text S2 in Supporting Information S1. We use the correlation coefficient I as a measure of ENSO index prediction skill:

I(m, l) = \frac{\sum_{y=s}^{e} (O_{y,m} - \bar{O}_m)(F_{y,m,l} - \bar{F}_{m,l})}{\sqrt{\sum_{y=s}^{e} (O_{y,m} - \bar{O}_m)^2}\,\sqrt{\sum_{y=s}^{e} (F_{y,m,l} - \bar{F}_{m,l})^2}},

where O and F denote the observed and predicted values, respectively, and \bar{O} and \bar{F} denote their temporal climatologies with respect to the calendar month m (from 1 to 12) and the forecast lead months l (from 1 to 23). The label y means the forecast target year. Finally, s and e denote the earliest and latest validation or test year.

Types Forecast
We conduct two kinds of experiments: one predicts three types, that is, EP, CP, and MIX El Niño; the other predicts seven types, that is, EP, CP, and MIX El Niño (and likewise for La Niña), plus the Normal Year (NY). See Text S3 in Supporting Information S1 for more details.
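The two-step variant used later (first phase, then flavor) can be sketched as follows; the probability vectors are hypothetical model outputs, not the paper's actual values.

```python
def two_step_type(p_phase, p_flavor):
    """Two-step type prediction sketch: first pick El Nino / La Nina / Normal
    from p_phase; for non-normal phases, then pick EP / CP / MIX from p_flavor."""
    phases = ["El Nino", "La Nina", "Normal"]
    flavors = ["EP", "CP", "MIX"]
    phase = phases[max(range(3), key=lambda i: p_phase[i])]   # argmax over phases
    if phase == "Normal":
        return "Normal"                                       # no flavor for NY
    flavor = flavors[max(range(3), key=lambda i: p_flavor[i])]
    return f"{flavor} {phase}"
```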

Results
The All-season Correlation Skill (ACS) is shown in Figure 2 for the CNN (b) and the Res-CNN (c). The ACS of the 3-month-moving-averaged Niño3.4 index between 1984 and 2017 in the Res-CNN model is higher than that of almost all state-of-the-art dynamical models and the CNN model (Figure 2a). Notably, unlike the Res-CNN model, the CNN model fails to perform optimally when the lead time is less than 6 months. The correlation coefficient of the CNN model exceeds 0.5 only up to 17 months in advance and is worse than that of the Scale Interaction Experiment-Frontier (SINTEX-F) dynamical prediction model at a lead of 23 months, while the Res-CNN model reaches 20 months in advance and outperforms the SINTEX-F model at all lead months. Thus, we conclude that the Res-CNN model can skillfully predict ENSO 20 months in advance, better than all the compared models. The Res-CNN model exhibits a higher correlation coefficient than the CNN model in almost all target seasons, especially spring and autumn. For example, when the target season is MJJ (May-June-July), the SINTEX-F model attains a correlation coefficient above 0.5 at only four lead months (Table S3 in Supporting Information S1), the CNN model at 11 (Figure 2b), and the Res-CNN model at 17 (Figure 2c), suggesting that our model is less affected by the spring prediction barrier (SPB) than the CNN and SINTEX-F models. Typically, the SPB phenomenon is more severe in statistical models than in dynamical models (Jan van Oldenborgh et al., 2005). The Res-CNN model is less affected than other statistical methods, likely because it makes fuller use of heat content information, and accurate initialization of heat content can improve spring forecasting (McPhaden, 2003). Nevertheless, the skill is much lower for summer than for winter, which may be related to predictability itself.
The "spring barrier" (western Pacific Ocean) may be a main factor affecting summer predictability. In addition, the Res-CNN has a correlation coefficient about 0.2 higher than the CNN model 20 months in advance (Figure S4 in Supporting Information S1), better predicting years with a higher Niño3.4 index, such as 1982/1983 and 1997/1998.
The ACS of the ONI is shown in Figure 3 for the Gaussian Density Neural Network (GDNN) (a), the Quantile Regression Neural Network (QRNN) (b) (Petersik & Dijkstra, 2020), and the Res-CNN (c). Unlike our method, which predicts a single value, the GDNN and QRNN are used to estimate the prediction uncertainty of ENSO forecasts. Compared to the GDNN and QRNN methods, the ACS of the ONI between 1984 and 2017 is highest in the Res-CNN model using homogeneous transfer learning (Figure 3d). Notably, the correlation coefficients of the GDNN and QRNN in predicting the ONI from 2002 to 2011 drop below 0.5 at a 7-month lead, while our method still reaches 0.6 at a 10-month lead, almost consistent with its Niño3.4 skill over 2002 to 2017. Besides, comparing the results of predicting the Niño3.4 index and the ONI from 1982 to 2017, the ONI gives better results up to a 12-month lead, while the Niño3.4 index gives better results beyond that, suggesting that for index prediction beyond one year the amount of data has an impact on the model.

The type prediction results are given in Table 1. We found that the two-step approach achieves better results than the one-step approach in all five scenarios A-E. Comparing A-two with B-two and C-two with E-two, the results of A are better than those of B, and C's are better than E's. This indicates that pretraining followed by SODA training is not as good as pretraining alone in type prediction; at the same time, the distributions of the SODA and GODAS data are very inconsistent, possibly due to the significant difference in the frequency of occurrence of the various types in the SODA data set (Yeh et al., 2009) and the greater diversity after 2000 (Barnston et al., 2012). Comparing A-two and C-two, our model can still achieve 67% accuracy at a 12-month lead using heterogeneous transfer learning.
Also, it can predict all the super ENSO event types 12 months in advance; in particular, the 2015/2016 El Niño, the strongest event on record, can still be predicted at a lead time of 18 months. At present, almost all models cannot predict that event even one year in advance (Tang et al., 2018). Comparing the results of A-two and D-two, the accuracy of D-two is lower than that of A-two, indicating that transfer training on SODA instead reduces the accuracy of most of the models initially trained on CMIP5. This suggests that fine-tuning on SODA does not yield better results, probably because heterogeneous transfer learning has already resolved, to some extent, the problem of unbalanced data distribution between CMIP5 and SODA.
Finally, to evaluate the performance of our model, we compare it with the CNN model. Figure S5 in Supporting Information S1 shows that our model achieves 83.3% accuracy at a 12-month lead over the period from 1984 to 2017, compared with 66.7% for the CNN model. These results indicate that the Res-CNN model predicts both the ENSO index and type better than the CNN model.

Discussions
Through various dropout experiments, we found better and more stable results at lower dropout rates (0-0.3) than at higher rates (0.5-0.9) (Figure S6 in Supporting Information S1). This finding differs from conventional deep learning practice, where the dropout rate is usually set to 0.4-0.6. A high dropout rate is not suitable for this task: too large a rate leads to serious loss of regional features, and retaining only some of the regional features reduces the accuracy of the ENSO forecast. To find the appropriate number of residual connections, we conducted ablation experiments. The results (Figure S7 in Supporting Information S1) show effective prediction ranges of about 17, 18, 20, and 16 months for 0, 1, 2, and 3 residual connections, respectively. The model with two residual connections predicted best and was therefore selected as our optimal model. Furthermore, Figure S8 in Supporting Information S1 shows that the two-residual-connection model without normalization achieved better prediction results than with normalization, indicating that data normalization does not improve model performance in deep learning-based ENSO prediction. Additionally, the model with three residual connections yields an effective forecast of only 17 months, indicating no further improvement from a higher number of residual connections.

Conclusions
Although this study showed remarkable results, there are still some limitations. In predicting the Niño3.4 index, the predictive ability of Res-CNN is notably improved in all seasons. However, comparing the correlation coefficients across lead months from 1 to 23, we found that they were nearly the lowest from late spring to fall (Table S1 in Supporting Information S1), the same as for the CNN (Table S2 in Supporting Information S1) and SINTEX-F (Table S3 in Supporting Information S1). This suggests that the SPB is still prevalent (Levine & McPhaden, 2015) and requires further study. Moreover, there is a large negative anomaly in the predicted SST for the first 10 years for both the CNN and our model; whether this implies a change in climate or has other causes also needs further investigation. Holding the model structure constant to predict the ONI, Res-CNN surprisingly remains effective out to 12 months (Figure S9 in Supporting Information S1) despite using only a small amount of data. However, the correlation coefficients fluctuated between high and low values across lead months rather than showing a stable downward trend. To alleviate this problem, we predicted the ONI using homogeneous transfer learning, and the skill was significantly enhanced. Since the ONI definition is close to that of the Niño3.4 index, which our model already predicts well, the model can reuse much of what it has learned and needs only limited retraining to predict the ONI well. By varying only the number of units in the output layer of our model to predict the El Niño type, the result in Table 1 is still almost 20 percentage points higher than the CNN's. Moreover, two-step prediction and heterogeneous transfer learning were used in this work to predict ENSO types, with some improvement in predictive performance.

Note. Results of forecasting the types of ENSO 3, 6, 9, 12, and 18 months in advance from 1982 to 2017.
There are 36 events in total; A-E in the table represent the number of correct predictions. H and N denote the use and nonuse of heterogeneous transfer learning, respectively. "One" means one-step seven-class prediction; "two" means two-step seven-class prediction, which first predicts El Niño, La Niña, and normal-year events and then predicts whether an El Niño or La Niña is EP, CP, or MIX. "Super ENSO" is the number of the 1982/1983, 1997/1998, and 2015/2016 El Niño events correctly predicted by A-two/C-two. "Accuracy" is the accuracy of A-two/C-two. A: trained on CMIP5. B: trained on CMIP5 and then on SODA. C: heterogeneous transfer of the index model to CMIP5. D: homogeneous transfer of C to SODA. E: heterogeneous transfer of the index model to CMIP5, then training on SODA. The model using heterogeneous transfer is the optimal model for predicting the respective lead and target of the Niño3.4 index. Values in bold are the best results among these options for each lead month.

Table 1 Prediction of ENSO Types
In summary, this study showed that the Res-CNN-based model can improve the long-term prediction of ENSO. We also found that predictive ability can be further improved by using transfer learning and dropout techniques. Future extensions would use different numbers of predictors and input months for different lead months; for example, intuitively, fewer predictors and input months at shorter leads.

Data Availability Statement
The research data can be found in the website (https://doi.org/10.5281/zenodo.4646653).