Optimizing Seasonal‐To‐Decadal Analog Forecasts With a Learned Spatially‐Weighted Mask

Seasonal‐to‐decadal climate prediction is crucial for decision‐making in a number of industries, but forecasts on these timescales have limited skill. Here, we develop a data‐driven method for selecting optimal analogs for seasonal‐to‐decadal analog forecasting. Using an interpretable neural network, we learn a spatially‐weighted mask that quantifies how important each grid point is for determining whether two climate states will evolve similarly. We show that analogs selected using this weighted mask provide more skillful forecasts than analogs that are selected using traditional spatially‐uniform methods. This method is tested on two prediction problems using the Max Planck Institute for Meteorology Grand Ensemble: multi‐year prediction of North Atlantic sea surface temperatures, and seasonal prediction of El Niño Southern Oscillation. This work demonstrates a methodical approach to selecting analogs that may be useful for improving seasonal‐to‐decadal forecasts and understanding their sources of skill.


10.1029/2023GL104983
Analog predictions are made by taking the mean evolution of the top-N analogs, where N is chosen by the user. There are several ways to quantify the similarity between the potential analogs and the state of interest (SOI). The most straightforward method is to compute the global correlation between each potential analog and the SOI (e.g., Mahmood et al., 2022). Using a global correlation assumes that the similarity between the maps matters equally at every grid point on the globe. A natural next step in complexity is to compute a correlation over a region that is known to be important for predictability of a given target, such as the North Pacific for predicting the Pacific Decadal Oscillation (e.g., Wu & Yan, 2023). While this approach removes some regions that may not be useful for determining the best analogs, it still assumes that each grid point within the region is equally important, and it requires that the region be known a priori.
In the following work, we train an interpretable neural network on a proxy task that is similar to the analog problem (Section 3). The network learns a weighted mask which is used for determining analogs. The forecasting skill of the analogs selected using the learned weighted mask is tested through a perfect model approach, in which climate model data substitutes for observations and is used to predict future climate model data. We demonstrate how this method can be applied to analog forecasting through two prediction examples: forecasting 5-year sea surface temperature (SST) anomalies in the North Atlantic (Section 4) and wintertime SST anomalies in the tropical Pacific (i.e., El Niño Southern Oscillation; Section 5). In these examples, analogs identified using the weighted mask provide more skillful forecasts than analogs that are identified in a way that is globally or regionally uniform. In addition, we show that these masks, once generated by a neural network, can be modified post hoc to further investigate the importance of each region for seasonal-to-decadal prediction (Section 5).

Climate Model Data
We use monthly SST from the historical run of the Max Planck Institute (MPI) for Meteorology Grand Ensemble (GE; Maher et al., 2019) at 2° latitude by 2° longitude resolution. This data set contains 100 members, each of which simulates 156 years of the Earth's climate with historical forcing. The MPI-GE uses the MPI Earth System Model version 1.1 (ESM1.1; Giorgetta et al., 2013). Each member is initialized using a different year of the preindustrial control simulation such that the differences between ensemble members are a product of internal variability. Learning the weighted mask requires a large number of training samples, which makes the 15,600 years provided by the MPI-GE historical simulations a natural fit for this task.

Standardization and Selection
Subsets of the MPI-GE ensemble members are used for different purposes. Our library of potential analogs is made up of members 1-35. Members 36-50 are the SOIs for training the neural network, members 51-55 are the SOIs for the early stopping validation set (which is used to prevent overfitting to the training data), and members 56-60 are the SOIs for the tuning validation set (which is used to identify optimal hyperparameters for the neural network). Finally, members 96-100, which are withheld until the very end, are the test set for making and evaluating the analog forecasts. Details on the process of tuning and training the neural network, including selecting the hyperparameters, can be found in Section S1 of Supporting Information S1.
Each sample i or j, from the SOIs or the library of potential analogs, is composed of an input field ($I_{\text{SOI},i}$ or $I_{\text{analog},j}$) and a target ($T_{\text{SOI},i}$ or $T_{\text{analog},j}$). The input fields are one or more maps of global SST leading the targets over some earlier period (the "input period"). The targets are time- and area-mean SST anomalies over a certain region and forecast window.
We removed the forced signal from the climate model data by subtracting the ensemble mean of the library of potential analogs at each location and year from each set of data. After the forced signal was removed, the data was standardized by dividing by the standard deviation at each grid point across the library of potential analogs. By using the library of potential analogs to calculate the forced signal and internal variance, we treat the SOIs as if they are truly unseen data, as we would when forecasting.
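The preprocessing described above can be sketched as follows. This is a minimal illustration rather than the code used in this study: the function name `standardize`, the array layout (member, year, latitude, longitude), and the absence of missing values (e.g., land points) are all assumptions made for the example.

```python
import numpy as np

def standardize(library, soi):
    """Remove the forced signal and standardize, using library statistics only.

    library: array (n_library_members, n_years, n_lat, n_lon)  [assumed layout]
    soi:     array (n_soi_members, n_years, n_lat, n_lon)      [assumed layout]
    """
    # Forced signal: ensemble mean of the library at each year and grid point.
    forced = library.mean(axis=0, keepdims=True)
    library_anom = library - forced
    soi_anom = soi - forced  # SOIs are treated as unseen: no SOI statistics are used

    # Internal variability: standard deviation at each grid point across the library.
    std = library_anom.std(axis=(0, 1), keepdims=True)
    return library_anom / std, soi_anom / std
```

Because both statistics come from the library alone, the same function can be applied unchanged to training, validation, and test SOIs.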

Metrics
We measure forecasting skill with a mean absolute error (MAE) skill score. This skill score is calculated by comparing the MAE of the analog prediction for the SOIs in the test set with the MAE of climatology, as:

$$\text{Skill Score} = 1 - \frac{\text{MAE}_{\text{analog}}}{\text{MAE}_{\text{climatology}}}$$

such that a perfect prediction has a score of one, and a climatology prediction has a score of zero. Climatology is the prediction by the mean state, which is zero for this standardized data. Analog forecasts made using the weighted mask are compared with the following additional baselines: a global analog forecast, a target region analog forecast, a mean target evolution forecast, and a random forecast. In the global analog forecast (target region analog forecast), the analogs are selected if the unweighted mean-squared error (MSE) over the entire globe (target region) is the smallest. The mean target evolution forecast is based on how the targets in the input period evolve on average and is detailed in Section S2 of Supporting Information S1. The random forecast is made by randomly selecting targets from the library of potential analogs and using them as the prediction. In addition to the MAE skill score, the Pearson correlation coefficient can be found in Figure S6 of Supporting Information S1.
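As an illustration of the skill score, the short sketch below computes one minus the ratio of the forecast MAE to the climatology MAE, which is the form implied by a perfect score of one and a climatology score of zero. The function name `mae_skill_score` and its arguments are hypothetical.

```python
import numpy as np

def mae_skill_score(prediction, truth, climatology=0.0):
    """MAE skill score: 1 for a perfect forecast, 0 for a climatology forecast.

    climatology defaults to zero, the mean state of the standardized data.
    """
    mae_forecast = np.mean(np.abs(prediction - truth))
    mae_climatology = np.mean(np.abs(climatology - truth))
    return 1.0 - mae_forecast / mae_climatology
```

A forecast that is worse than climatology gives a negative score under this definition.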

Optimized Analog Forecasting Approach
Our goal is to find optimal analogs for forecasting a specific target. To do this, we train a neural network to identify a spatially-weighted mask. This weighted mask is then multiplied by the SOI and potential analogs, and the MSE between the weighted maps is used to determine how similar they are (Figure 1). This weighted mask should contain large values where similarity between the analogs and the SOI is most important for predicting the target and near-zero values where similarity between the maps is not important. With this architecture, the MSE will be low if the maps agree where the mask weights are high, regardless of the differences between the maps where the mask weights are low. For the plots in this paper, the mask is normalized by dividing by the sum of the weights and multiplying by the number of grid points in the input, such that the mean weight is one.
We generate the weighted mask by training a neural network on a proxy task that is tangential to our main goal. While our goal is to identify a weighted mask that is optimized for making an analog forecast, our proxy task is to predict the difference between $T_{\text{SOI},i}$ and $T_{\text{analog},j}$ given $I_{\text{SOI},i}$ and $I_{\text{analog},j}$. En route to making this prediction, the neural network must learn the weighted mask, multiply it by the two input maps, compute the MSE between these weighted maps, and finally convert the MSE into a predicted difference in the targets. This process is depicted in the red box of Figure 1.
Once the weighted mask has been learned, a neural network is no longer needed to make analog predictions. The weighted mask is multiplied by the SOI and each potential analog, the MSE is computed between the weighted SOI and the weighted potential analogs, and the potential analogs with the lowest MSE are used to make the analog forecast. While the proxy task is not identical to the analog problem, it provides a weighted mask that improves analog forecasting skill, as we will show in Sections 4 and 5.
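The similarity measure and the plotting normalization can be illustrated with a few lines of NumPy. The function names below are hypothetical, and the sketch assumes a single two-dimensional input map with no missing values.

```python
import numpy as np

def normalize_mask(mask):
    """Rescale the mask so that its mean weight is one (used for plotting)."""
    return mask / mask.sum() * mask.size

def weighted_mse(mask, soi_map, analog_map):
    """MSE between two maps after each is multiplied by the weighted mask."""
    return np.mean((mask * soi_map - mask * analog_map) ** 2)
```

Where the mask is zero, differences between the two maps contribute nothing to the score, which is the behavior described above.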
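The forward pass of the proxy task might look like the sketch below. It is illustrative only: the single hidden layer, its width, the ReLU activation, and the initialization are assumptions that are not taken from this study, and training (gradient updates of the mask and dense-layer weights) is omitted.

```python
import numpy as np

def init_params(map_shape, hidden=8, seed=0):
    """Hypothetical parameter set: a trainable mask plus one small dense layer."""
    rng = np.random.default_rng(seed)
    return {
        "mask": np.ones(map_shape),                      # trainable weighted mask
        "w1": rng.normal(scale=0.1, size=(1, hidden)),   # assumed hidden width
        "b1": np.zeros(hidden),
        "w2": rng.normal(scale=0.1, size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def predict_target_difference(params, soi_map, analog_map):
    """Mask both inputs, reduce to a single MSE, then map it to a target difference."""
    weighted_soi = params["mask"] * soi_map
    weighted_analog = params["mask"] * analog_map
    mse = np.mean((weighted_soi - weighted_analog) ** 2)
    # Dense layers see only the scalar MSE, so all spatial information
    # must pass through the mask (assumed ReLU activation).
    hidden = np.maximum(0.0, np.array([mse]) @ params["w1"] + params["b1"])
    return (hidden @ params["w2"] + params["b2"])[0]
```

Because the dense layers receive only the scalar MSE, the network can reduce its error on the proxy task only by adjusting where the mask places weight, which is what makes the learned mask interpretable.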
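Analog selection with a learned mask needs no neural network, as the following sketch shows. The function name, argument names, and array shapes are assumptions, and the forecast is taken as the mean target of the top-N analogs.

```python
import numpy as np

def analog_forecast(mask, soi_map, library_maps, library_targets, n_analogs=10):
    """Forecast the SOI target from the library analogs with the lowest weighted MSE.

    mask, soi_map:   arrays (n_lat, n_lon)
    library_maps:    array (n_samples, n_lat, n_lon)
    library_targets: array (n_samples,)
    """
    # Weighted MSE between the SOI and every potential analog (broadcast over samples).
    weighted_diff = mask * (library_maps - soi_map)
    scores = np.mean(weighted_diff ** 2, axis=(1, 2))
    # Indices of the n_analogs most similar library samples.
    best = np.argsort(scores)[:n_analogs]
    return library_targets[best].mean(), best
```

The spread of `library_targets[best]` could also be reported alongside the mean as a rough measure of forecast uncertainty.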

Multi-Year Prediction of North Atlantic Sea Surface Temperature
We first test our analog forecasting approach on a multi-year prediction of SSTs over the North Atlantic. North Atlantic SSTs exhibit clear variability on multi-annual timescales (Jackson et al., 2022) and exhibit potential for skillful decadal forecasts (Hawkins et al., 2011; Sutton & Allen, 1997). SST variability in the North Atlantic has been associated with weather and climate anomalies globally, including Atlantic hurricane frequency and intensity (Balaguru et al., 2018; Goldenberg et al., 2001), northern hemisphere precipitation (Enfield et al., 2001; Si et al., 2023), and the strength of the Asian summer monsoon (Shekhar et al., 2022). In this prediction problem, we use global maps of SST, averaged over the previous 5 years, to predict the mean SST anomaly in the North Atlantic (40°-60°N, 10°-70°W) over the following 5 years.
The weighted mask learned by the neural network is shown in Figure 2a. The Greenland Sea and the Gulf Stream region in the western North Atlantic emerge as the most important regions for identifying analogs in the MPI-GE.
Over the western North Atlantic, there is an area of zero weight between two areas of high weight. This area may be where the boundaries of persistent SST anomalies vary, and the neural network has learned that the specific locations of these boundaries are not important for the prediction problem. Previous studies that have used an analog approach to assess North Atlantic decadal predictability selected the best analogs by taking a correlation over the whole globe (Mahmood et al., 2022) or the entire North Atlantic basin (Menary et al., 2021). As shown in Figures 2b-2d, when using the weighted mask, the best analogs only have to look like the SOI in the highest weight regions. An example SOI is shown in Figure 2b and its best analog in Figure 2c. These two maps look similar in the North Atlantic, but are starkly different in the North Pacific and Indian Ocean, among other regions.
Once the weighted mask has been applied to the SOI (Figure 2d) and its best analog (Figure 2e), the maps look nearly identical.
These results suggest that using uniform weights across the entire North Atlantic basin, or the whole globe, may lead to a selection of analogs that are not optimized for forecasting multi-year variability in the North Atlantic. Indeed, we see that this is true in the skill scores shown in Figure 3a. For 1 ≤ N ≤ 50, where the top-N analogs are averaged, our weighted mask analog forecast outperforms the global and target region analog forecasts, as well as the climatology, mean target evolution, and random baselines. The skill score is lowest when only the single best analog is used for forecasting, and subsequently improves for larger N. Given that the skill score maximizes
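One input/target pair for this prediction problem could be assembled as in the sketch below. Several choices here are assumptions made for illustration: annual-mean standardized SST as the starting point, longitudes expressed in degrees east from -180 to 180 (so 10°-70°W becomes -70 to -10), cosine-latitude area weighting, and no missing values inside the target box.

```python
import numpy as np

def region_mean(field, lats, lons, lat_bounds, lon_bounds):
    """Area-weighted mean of a (..., lat, lon) field over a latitude/longitude box."""
    lat_sel = (lats >= lat_bounds[0]) & (lats <= lat_bounds[1])
    lon_sel = (lons >= lon_bounds[0]) & (lons <= lon_bounds[1])
    box = field[..., lat_sel, :][..., lon_sel]
    # Assumed cos(latitude) weighting to approximate grid-cell area.
    weights = np.cos(np.deg2rad(lats[lat_sel]))[:, None] * np.ones(lon_sel.sum())
    return (box * weights).sum(axis=(-2, -1)) / weights.sum()

def make_sample(annual_sst, lats, lons, year, window=5):
    """Input: mean map over the previous `window` years.
    Target: North Atlantic mean over the following `window` years.

    annual_sst: array (n_years, n_lat, n_lon); `year` is an index into its first axis.
    """
    input_map = annual_sst[year - window:year].mean(axis=0)
    future_map = annual_sst[year:year + window].mean(axis=0)
    target = region_mean(future_map, lats, lons, (40, 60), (-70, -10))
    return input_map, target
```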

Seasonal Prediction of El Niño Southern Oscillation
In addition to improving prediction skill, the weighted mask can be used to explore precursor patterns within the climate simulated by MPI-GE. This is a major benefit of the interpretable neural network architecture, as the weighted mask can be compared to known precursor patterns to improve trust in the weighted analog forecasts and provide new insight into Earth system predictability. Here, we extend the application of the weighted mask analog approach to seasonal forecasts of ENSO. ENSO precursors are well studied, providing an ideal case for exploring the utility of the weighted mask for predictability studies.
ENSO is the leading mode of global annual SST variability (Hsiung & Newell, 1983) and has an extensive influence on global weather and climate (reviewed in Yeh et al., 2018). Analog forecasting has been applied to seasonal prediction of ENSO in several studies due to its potential to outperform initialized GCM forecasts (e.g., Ding et al., 2018, 2019). In the following example, we use wintertime (November-March) global SST anomalies to forecast SST anomalies in the Niño 3.4 region (5°S-5°N, 120°-170°W; Barnston et al., 1997; Hanley et al., 2003) the following winter.
The weighted mask for forecasting ENSO looks markedly different from that for forecasting North Atlantic multi-year variability (Figure 4a). While a few regions are assigned higher weights, the weights in Figure 4a are much more uniform across the globe than in Figure 2a. The four main regions that stand out in this weighted mask have also been identified as important precursors in previous literature: the western North Pacific (e.g., S.-Y.et al., 2015), and the tropical Pacific itself (e.g., Capotondi & Sardeshmukh, 2015). The skill score of the global analog forecast (Figure 4b) is similar to that of our weighted mask analog forecast (but always lower, see Figure S3 in Supporting Information S1), which is not surprising since the values of the weighted mask are near one for most areas of the globe.
Since the weighted mask can be manually updated post hoc, we use this to explore the sensitivity of the forecast skill to which regions are included in the weighted mask. Figure 5a shows the weighted mask for ENSO prediction (Figure 4a) but where the smallest 95% of the weights have been set to zero. Forecasts made with this "constrained" weighted mask have similar skill to the original weighted mask (as shown in Figure S4 of Supporting Information S1). From the constrained weighted mask, we identify four main precursor regions for ENSO: the West Pacific (ocean grid points bounded by 0°-40°N, 100°-170°E), the Tropical Pacific (25°S-10°N, 170°E-65°W), the Baja Coast (10°-40°N, 110°-140°W), and the Tropical Atlantic (0°-20°N, 20°-80°W).
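Post hoc edits of this kind amount to simple array operations on the mask. The sketch below uses hypothetical helper names and assumes longitudes in degrees east from -180 to 180. It zeroes the smallest 95% of weights with a quantile threshold, builds a latitude/longitude box, and then removes or keeps the weights inside that box, as in the occlusion and isolation tests of this section.

```python
import numpy as np

def constrain_mask(mask, fraction=0.95):
    """Set the smallest `fraction` of the weights to zero."""
    threshold = np.quantile(mask, fraction)
    return np.where(mask >= threshold, mask, 0.0)

def box_region(lats, lons, lat_bounds, lon_bounds):
    """Boolean (n_lat, n_lon) array that is True inside a latitude/longitude box."""
    lat_sel = (lats >= lat_bounds[0]) & (lats <= lat_bounds[1])
    lon_sel = (lons >= lon_bounds[0]) & (lons <= lon_bounds[1])
    return lat_sel[:, None] & lon_sel[None, :]

def occlude(mask, region):
    """Zero the weights inside the region; keep everything else."""
    return np.where(region, 0.0, mask)

def isolate(mask, region):
    """Keep the weights inside the region; zero everything else."""
    return np.where(region, mask, 0.0)

# Example (hypothetical grid): the Tropical Atlantic box, 0-20N and 20-80W.
# region = box_region(lats, lons, (0, 20), (-80, -20))
# mask_without_atlantic = occlude(constrain_mask(mask), region)
```

Each modified mask can then be passed to the same analog selection step to recompute the skill score.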
We assess how important each precursor region is in two ways. In the first approach, we test the skill score of analog forecasting when each region is occluded from the constrained weighted mask (weights in that region are set to zero). When all four regions are included, the skill score is 0.146. Removing any of the four regions from the weighted mask results in a skill score decrease. Interestingly, removing the Tropical Atlantic results in the most drastic decrease in prediction skill. While the Tropical Atlantic has been connected to ENSO predictability (tropical Atlantic SSTs modulate the Walker Circulation and, in turn, the SST gradient of the tropical Pacific; Ding et al., 2012; Martín-Rey et al., 2015), it is not considered a primary driver (C. Wang, 2018). In the second approach, we isolate each of the four regions (weights outside that region are set to zero). There is no

Discussion and Conclusions
We have shown how an interpretable neural network can be used to identify a weighted mask that improves the selection of analogs for seasonal-to-decadal forecasting. The precursors identified in the weighted masks are not necessarily causal, but they do provide the optimal predictors for the given input. In this work we have constrained the neural network to learn one mask that represents all pathways of predictability; however, allowing the network to learn different masks for different SOIs could lead to better analog forecasts.
This paper is intended to demonstrate the weighted mask approach to analog forecasting. For clarity and simplicity, we only used a single input map of SST to predict a future target SST in this work. However, this methodology is designed to identify masks for multiple inputs (e.g., different variables, time lags) as well. We provide an example of this in Figure S5 of Supporting Information S1, where we include the time tendency of SST as a second input variable for the North Atlantic multi-year prediction example. Including sea surface height or ocean heat content as an additional variable (e.g., Ding et al., 2018; Gordon & Barnes, 2022) has the potential to improve prediction skill in the North Atlantic and tropical Pacific and would provide a unique mask for where these variables provide information beyond SST alone.
We have explored this method through a perfect model setup. As such, the identified precursors are intrinsic to the MPI-ESM and may not reflect patterns of predictability in the observed Earth system. Although there are known issues in the MPI-ESM's ability to simulate North Atlantic SSTs, including a warm bias and a weak meridional gradient (Sein et al., 2020), this learned weighted mask still acts to improve observational forecasts relative to a uniform mask. These results, and a comparison with an initialized dynamical forecast, can be found in Section S3 of Supporting Information S1. Training the weighted mask on a multi-model ensemble may provide patterns that are more consistent with observations (e.g., Kirtman et al., 2014; Rader et al., 2022) and allow for enhanced analog predictions on real data. Additionally, we could train on models and observations at the same time to identify a weighted mask that is more representative of the true Earth System. We believe that this weighted mask approach will be influential to analog forecasting moving forward.

Figure 1. Optimized analog forecasting method and interpretable neural network architecture. The analog forecasting method can be described in three steps: (1) Identify a state of interest (SOI) and a library of potential analogs. (2) Determine which maps are the most similar. (3) Make a prediction using the best analog(s). In the blue box, we show our weighted-mask approach for determining the similarity of two maps. The weighted mask is multiplied by the SOI and a potential analog before computing the mean squared error (MSE). In the red box, the interpretable neural network architecture is shown. Two input samples are multiplied by a matrix of trainable weights and the MSE is computed. This MSE is then converted to a predicted difference in the sample targets using a group of fully-connected dense layers. Note that the weighted mask has the same dimensions as the input field(s), despite the coarser resolution in this figure.

Figure 2. Weighted mask and example for multi-year predictions of North Atlantic sea surface temperature (SST). (a) Weighted mask, as learned by the interpretable neural network. (b) Standardized SST anomalies for a sample state of interest (SOI). (c) Standardized SST anomalies for the best analog associated with the SOI. (d) Weighted SOI. (e) Weighted best analog.

Figure 3. Analog forecasts of North Atlantic sea surface temperature. (a) Skill scores for our weighted mask analog forecast and other baselines. (b) Weighted mask analog forecasts for 200 years of MPI-GE simulations, including the mean prediction from the top-10 analogs, the spread of these predictions, and the truth values.

Figure 4. Weighted mask and skill scores for seasonal predictions of El Niño Southern Oscillation. (a) Weighted mask. (b) Skill scores for our weighted mask analog and other baselines.

Figure 5. Analog forecasting skill of El Niño Southern Oscillation when various regions are occluded or isolated. (a) As in Figure 4a, but the lowest 95 percent of weights are set to zero. Four regions of focus are highlighted by the colored boxes. (b) Skill scores for analog forecasts when each region is occluded from the mask (top) and when the region is isolated to make a forecast (bottom).