Harnessing the Power of Graph Representation in Climate Forecasting: Predicting Global Monthly Mean Sea Surface Temperatures and Anomalies

The variability of sea surface temperatures (SSTs) is crucial in climate dynamics, influencing marine ecosystems and human activities. This study leverages graph neural networks (GNNs), specifically a GraphSAGE model, to forecast SSTs and their anomalies (SSTAs), focusing on the global-scale structure of climatological data. We introduce an improved graph construction technique for representing SST teleconnections. Our results highlight the GraphSAGE model's capability in 1-month-ahead global SST and SSTA forecasting, with SST predictions spanning up to 2 years with a recursive approach. Notably, regions with persistent currents exhibited enhanced SSTA predictability, contrasting with equatorial and Antarctic areas. Our GNN outperformed both the persistence model and traditional methods. Additionally, we offer an SST and SSTA graph data set based on ERA5 and a graph generation tool. This GNN case study shows how GraphSAGE can be used in SST and SSTA forecasting, and provides a foundation for further refinements such as adjusting graph construction, optimizing imbalanced regression techniques for extreme SSTAs, and integrating GNNs with other temporal pattern learning methods to improve long-term predictions.


Introduction
Sea surface temperature variability, or SST anomalies (SSTAs), is associated with climate oscillations and extreme events, such as the El Niño-Southern Oscillation (ENSO), the Indian Ocean Dipole (IOD) oscillation, and marine heatwaves. SSTAs are the departures from the average SST, evaluated over a certain period. The values of climate oscillation indices and extreme events are derived from SSTs or SSTAs. Hence, the ability to accurately forecast SSTs or SSTAs 1 month to 2 years in advance would allow mitigation of the potential impact of these climate oscillations and extreme events, for example, by enhancing the monitoring of marine reserves and protected areas, moving resources and collecting healthy DNA samples from areas under stress, or adjusting aquaculture production. In recent years, applications of machine learning, specifically deep learning (DL), to SST and SSTA forecasts have emerged. DL models serve as an alternative to the resource-intensive numerical dynamical models traditionally used for these forecasts. DL models differ by the type of data structures they are suited to work with. For example, many existing models used sequences or grids as input. DL methods that use image and video input for forecasting often treat these inputs as Euclidean data. For example, a fully-connected neural network (FCN) was proposed to predict monthly SSTAs in the Indian Ocean and the IOD index (Ratnam et al., 2020), a long short-term memory neural network (LSTM) (Hochreiter & Schmidhuber, 1997) predicted the monthly IOD index (Pravallika et al., 2022), and a convolutional neural network (CNN) was used to forecast the seasonal ENSO index (Ham et al., 2021; Ham et al., 2019). Among these neural network classes, CNNs have become widespread due to their advantage in efficiently capturing spatial patterns, and notably the CNN model for ENSO forecasts (Ham et al., 2019) showed a performance comparable with physics-based numerical models.
However, potential improvements could capitalize on the structure of climatological data at global scale, which is different from images and videos. The teleconnections of climate events (Tsonis et al., 2006), through atmospheric and oceanic circulation or large-scale waves, are increasingly considered an important factor for developing DL methods (Cachay et al., 2021; Taylor & Feng, 2022) and can be incorporated using the graph structure. Besides, the use of grids and CNNs still has some inherent problems: the handling of missing values, the lack of rotation equivariance (Defferrard et al., 2019), and the issue of computational receptive fields (Luo et al., 2016) for Earth data. These inherent drawbacks might affect the performance of DL-based climatological forecasts and will be discussed in the Methods section.
Considering that grids are essentially lattice graphs, exploring a more generic non-Euclidean data structure might provide a better basis for DL models. Graph representation learning has been successfully applied to environmental fields such as forecasting weather (Lam et al., 2022), frost (Lira et al., 2021), river streamflows (A. Y. Sun et al., 2022), groundwater levels (Bai & Tahmasebi, 2023), PM2.5 (Wang et al., 2020), and the air quality index (Xu et al., 2021). In this paper, we explore the predictability of global SST and SSTA fields by a graph neural network (GNN). Specifically, for SSTAs, we focused on a time horizon of 1 month, which could serve as a simple baseline for this type of modeling in further studies. We presented a tool for grid-to-graph SST and SSTA conversion and the results of a GNN model. Specifically, the following research questions are addressed in this paper: First, we further investigated the appropriate graph re-sampling method for SST and SSTA grids. We presented a simple tool with an alternative measure to convert the preprocessed SST grids into graphs. Due to limitations in computational resources, it was not feasible to use complete graphs directly converted from grid data at global scale, and an efficient tool is needed. Additionally, we examined the impact of graph construction on forecasting performance.
Second, building on the preliminary results of several GNN classes from our earlier work (Ning et al., 2023), we continued our exploration of the performance of the GraphSAGE model, not only for global monthly mean SST forecasts but also extending it to SSTA forecasts. We evaluated the spatial patterns learned by the GraphSAGE, especially for 1-month-lead short-term forecasts, which could be compared with the persistence baseline. We focused on the SSTA predictability at specific marine heatwave hotspots from an oceanographic viewpoint.
Third, two ways of implementing SSTA forecasts were explored: using SSTs as input (Taylor & Feng, 2022) and using SSTAs as input (Ham et al., 2019; Zhou & Zhang, 2023). Direct forecasts by the GraphSAGE model produced more accurate short lead SSTA forecasts and could be easily adjusted to improve predictions for extreme values, that is, those associated with marine heatwaves, while computing SSTAs with SST predictions by the model has potential advantages in long lead forecasts.
The rest of the paper is organized as follows. Section 2 presents a review of related work on DL models for SST, SSTA, and/or related event forecasts. Section 3 briefly introduces the ERA5 data set used in this study. Section 4 details the methods for data preprocessing, graph construction, and GNNs, as well as reproducibility considerations. Section 5 presents the experiment results and engages in a discussion of our findings. Section 6 concludes with our contributions and potential directions for future research.


Related Work
Ham et al. (2019) used a CNN to forecast the seasonal Nino3.4 SSTA index up to 18 months ahead, a climate index based on the SST field over the tropical eastern Pacific that characterizes the ENSO in the equatorial Pacific Ocean. Another distinctive aspect of their work was the application of transfer learning to improve prediction, by leveraging two supplementary data sets. Ratnam et al. (2020) proposed an FCN to forecast SSTAs and the IOD oscillation over the Indian Ocean. Also, IOD forecasts have been made using an LSTM (Pravallika et al., 2022) and a CNN (Feng et al., 2022). Moreover, a CNN was developed to forecast SSTs and marine heatwaves around Australia (Boschetti et al., 2022).
A number of studies utilized combinations of several neural network classes for SST and SSTA forecasts. A CNN-LSTM was applied for forecasting daily SSTs in the East China Sea (Xiao et al., 2019). A deep gated recurrent unit (GRU) (Chung et al., 2014) CNN was proposed to forecast daily SSTs in the East China Sea and the Yellow Sea (Yu et al., 2020). Taylor and Feng (2022) combined the U-Net (Ronneberger et al., 2015) with an LSTM to forecast SSTs up to 24 months ahead and validated the forecasts with a focus on specific SSTA-related oscillations (ENSO and IOD) and climate events (the "Blob" marine heatwave, a heatwave that occurred off the west coast of the United States between 2013 and 2015). The U-Net-LSTM used SSTs as the input to predict SSTs, which then were converted into the Nino3.4 and Nino4 indices and the Blob index. The Blob index is the difference between the monthly average SST climatology and the monthly average SST computed over the Blob area (34°N to 47°N and 147°W to 128°W), as an indicator of the marine heatwave. In this DL framework, to forecast SSTs, the spatial dependencies were captured by the convolutional autoencoder, the short-term temporal dependencies were learned by the sliding window method, and the long-term temporal dependencies were captured by the LSTM. Taylor and Feng (2022) also mentioned a possible deficiency of the transfer learning used in Ham et al. (2019), which is insufficient to correct the inherent biases in the coupled models (Timmermann et al., 2018), and highlighted the feasibility of knowledge-based physics-informed machine learning (Karniadakis et al., 2021). A U-Net was also applied to predicting sea surface salinity over the Western Pacific Ocean (Zhang et al., 2023). Some work also investigated other neural network classes than CNNs and LSTMs. Zhou and Zhang (2023) used a 3D transformer model to forecast 3D upper-ocean temperature anomalies and wind stress anomalies. The model could depict the evolution of upper-ocean temperatures and the coupled ocean-atmosphere dynamics by using a self-attention-based mechanism, and achieved a generally higher correlation skill for forecasting the Nino3.4 index 18 months in advance than the CNN (Ham et al., 2019). Notably, this enhancement in performance is particularly evident at specific lead times: 1 month, around 10 months, and around 15 months.
The DL models outlined above input sequences or grids, that is, Euclidean data, to perform SST and SSTA forecasts. The structure of the climatological data could be further utilized by re-sampling grids into graphs. For example, Cachay et al. (2021) used a GNN for ENSO forecasts which outperformed the previous CNN model (Ham et al., 2019) for one to six lead months. Similar research on GNNs for SST forecasts includes Zhang et al. (2021), who used the Euclidean distance between coordinates to define the adjacency matrix and a memory GNN to forecast SSTs for two areas near the Bohai Sea and the East China Sea, and Y. Sun et al. (2021), who used the Pearson correlation coefficient to define the adjacency matrix and a time-series GNN to forecast SSTs in the northwest Pacific. For global SST forecasts, Ning et al. (2023) constructed graphs from the ERA5 SST data used in Taylor and Feng (2022) with the Pearson correlation coefficient, and investigated several GNN classes for learning on the graphs for 1-month-ahead global monthly SST forecasts. Among the GNNs, the GraphSAGE (Hamilton et al., 2017) outperformed the persistence model and the other GNNs at most nodes across the world.
Another relevant work is ResGraphNet, a combination of a GraphSAGE model and a residual network for forecasting average global monthly mean temperature anomalies (Chen et al., 2022). The data set was derived from HadCRUT5 (Morice et al., 2021), a global surface temperature data set that combines SSTs over the ocean and near-surface air temperatures over the land with a spatial resolution of 5°, from January 1850 to March 2022. The mean temperature anomaly data, in this context, represented the average monthly temperature anomaly time series, computed across all locations. The graphs were path graphs constructed from the time series. For predicting 1-month-ahead temperature anomalies in terms of the coefficient of determination, ResGraphNet outperformed the other models, including the ResNet (He et al., 2016), GraphSAGE, UniMP (Shi et al., 2020), the graph convolutional network (Kipf & Welling, 2016), LSTM, GRU, conventional machine learning models, and autoregressive integrated moving average (ARIMA) models. In this study, the graph construction and GNN modeling were based on a 1D time series, and the temperature anomalies were derived from both SSTAs and land surface air temperature anomalies. The spatial patterns, including those emerging from climate teleconnections, were not effectively captured.
Here, we extend the investigation into graph construction and GNN modeling specifically for global monthly mean SST and SSTA forecasting from a 3D grid. Specifically, for SSTA predictions, we further compare and analyze two methods: direct SSTA predictions using a GNN, and computing SSTAs from SST predictions made by the GNN.

Data
The data set for SST and SSTA forecasts was extracted from ERA5 (Hersbach et al., 2020). ERA5 is the fifth generation reanalysis by the European Centre for Medium-Range Weather Forecasts (ECMWF) for the global climate and weather covering the last nine decades. The ERA5 reanalysis product provides monthly estimates of a large number of atmospheric, land, and oceanic variables at global scale with a spatial resolution of 0.25°, from January 1940 to the present. By the time we downloaded the data, ERA5 was available from January 1940 to February 2023. We downloaded the data up to December 2022, both to exclude the potentially unstable data from the final 2 months and to simplify interpretations by focusing on complete yearly periods. Different from grid representation learning with CNNs, learning with GNNs does not require special treatment for missing values, for example, for land pixels. In this research, we exclusively utilized the SST variable extracted from ERA5. This SST data set encompasses a spatial extent of [720, 1440] pixels along the latitude and longitude dimensions respectively. In this data set, land pixels are represented as missing values for SSTs, and only ocean pixels were used for graph construction. The temporal domain spans from January 1940 to December 2022, with a temporal resolution of 1 month, a total of 996 months.

Methods
In this research, we investigated graphs and GNNs for global monthly mean SST and SSTA forecasts. Graphs could be a viable data structure for modeling the teleconnections of climate events globally. Grids and CNNs have some inherent problems. First, Earth surface data grids usually have pixels that represent either land or ocean, and some environmental variables may exist in one of these domains and leave missing values in pixels of the other domain, making the replacement of these missing values a problem. Second, grids are not assumed to be rotation equivariant, but Earth data are essentially spherical data with this property (Defferrard et al., 2019). Third, CNNs have an issue of computational receptive fields (Luo et al., 2016): the input Earth data are assumed to have edges, with pixels near edges having less influence on the computation of feature maps by the CNN, such as the large number of ocean pixels near the International Date Line (180°W and 180°E) in the Pacific Ocean, which is often used as the edge in map projections. These inherent problems in grids and CNNs might affect the performance of DL-based SST and SSTA forecasts.

Data Processing
Following Taylor and Feng (2022) and Ning et al. (2023), we preprocessed the data to a smaller [64, 128, 996] grid: latitude (64°S to 62°N in 2° increments), longitude (180°W to 180°E in 2.8125° increments), and monthly resolution. By removing pixels with highly spatially correlated values, the smaller data set substantially reduced the model training time. The unit of SSTs is Kelvin. The 3D data were split into a training set and a test set based on the temporal domain. There was a 2D latitude and longitude grid at each time step. Due to the proposed window size of 12, in the training set, the 840 2D grids at time steps from January 1940 to December 2009 were features and the grids from January 1941 to December 2010 were labels. The same 12-month window lag was applied to the 144 grids in the test set, where January 2010 to December 2021 was the feature period and January 2011 to December 2022 was the label period.
Graph Construction.Following Ning et al. (2023), we constructed the SST graphs from the 2D SST grids by defining the node feature tensor, the node label matrix, and the adjacency matrix.Different from Ning et al. (2023), we used a different measure to select the edges between nodes and constructed undirected graphs only, where each edge between two nodes does not have a direction.
To define the node feature tensor, let I ∈ ℝ^(M×N×T) denote a tensor that represents the preprocessed SST grid, where M is the number of points in the latitudinal direction, N is the number of points in the longitudinal direction, and T is the number of monthly time steps. There is an SST time series of T elements at every latitude and longitude coordinate. The node feature matrix without the window size, V ∈ ℝ^(X×T), where X = M × N is the number of nodes, was acquired by flattening every 2D slice I_{:,:,t} of I at time step t. V_{x,t} is the SST value at the xth node at time step t. Given a window size w, the node feature tensor is the tensor extracted from V excluding the last w columns (the last w months), where each V_{x,t} is replaced by the vector [V_{x,t}, …, V_{x,t+w−1}].

Earth and Space Science
The node label matrix is the matrix extracted from V excluding the first w columns (the first w months).
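The flattening and windowing just described can be sketched in NumPy; the function name and toy dimensions below are illustrative, not taken from the released code:

```python
import numpy as np

def build_node_tensors(I, w):
    """Flatten an [M, N, T] SST grid into node feature/label tensors.

    V is the flattened [X, T] node matrix (X = M * N).  The feature tensor
    stacks, for each start month t, the window [V[x, t], ..., V[x, t+w-1]];
    the label matrix is V with the first w columns dropped, so each window
    is paired with the value one month after it ends.
    """
    M, N, T = I.shape
    V = I.reshape(M * N, T)                              # one series per node
    feats = np.stack([V[:, t:t + w] for t in range(T - w)], axis=1)
    labels = V[:, w:]                                    # [X, T - w]
    return V, feats, labels
```

With the paper's dimensions (M = 64, N = 128, T = 996, w = 12) this yields X = 8192 nodes and 984 window/label pairs per node.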
For the adjacency matrix A, an element A_{x,y} is defined by an indicator function: A_{x,y} = 1 if ρ(V_{x,:}, V_{y,:}) > c, and A_{x,y} = 0 otherwise, where V_{x,:} and V_{y,:} are the SST time series at any two nodes, ρ(⋅) is the Kendall rank correlation coefficient (Kendall, 1938), and c is a threshold which is a controllable parameter. Algorithm 1 describes how the approach is implemented. Decreasing the correlation threshold c leads to a substantial increase in the number of edges and node degrees. The node degree of a node is the number of edges connected to the node. We generated several sets of SST graphs using different c values. Besides, all graphs have isolated nodes and no self-loops, and graphs in the same set are spatially homogeneous. Homogeneous graphs are graphs that have the same structure but different node features and labels at different time steps.

Algorithm 1. Adjacency matrix construction
1: function BUILD_ADJACENCY(node_feat_mat, c)
2:   Initialize adj_mask as empty list
3:   for i in range of node_feat_mat length do
4:     for j from i + 1 to node_feat_mat length do
5:       if the Kendall rank correlation between rows i and j exceeds c then
6:         Append both (i, j) and (j, i) to adj_mask
7:       end if
8:     end for
9:   end for
10:  Sort adj_mask
11:  Convert adj_mask to adjacency matrix form adj_mat
12:  Ensure minimum number of edges for each node in adj_mat
13:  Sort adj_mat using lexsort
14:  return adj_mat
15: end function

We selected the Kendall rank correlation coefficient as the measure because it has greater flexibility than the Pearson correlation coefficient used in Ning et al. (2023). Our approach commenced with the utilization of simpler correlation coefficients, and the selection of appropriate measures for edge construction remains an open research question. In the constructed graphs, we noticed that specific nodes, for example, the nodes in the equatorial Pacific, have no or very few edges, even using a small threshold c = 0.3. We therefore constructed another set of undirected graphs that preset a minimum number of edges for all nodes, denoted as m. If the number of edges at a node was smaller than m (a less-connected node), random edges were added to the node to make up the difference. Several sets of graphs with different adjacency matrices were used in the experiments, shown in Table 1. Figure A1 is a node degree map using the defined adjacency matrix when c = 0.7, and Figures A2-A4 show the nodes connected and disconnected to four example nodes using the same adjacency matrix.
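A minimal sketch of this construction, using SciPy's Kendall tau; the function name, the exact threshold comparison, and the random-compensation details are our assumptions:

```python
import numpy as np
from scipy.stats import kendalltau

def build_adjacency(V, c, m=0, seed=0):
    """Connect node pairs whose SST series have a Kendall rank correlation
    above the threshold c, then (optionally) add random compensating edges
    so every node has at least m edges, as in the paper's second graph set."""
    rng = np.random.default_rng(seed)
    X = V.shape[0]
    A = np.zeros((X, X), dtype=int)
    for i in range(X):
        for j in range(i + 1, X):
            tau, _ = kendalltau(V[i], V[j])
            if tau > c:                       # threshold on the correlation
                A[i, j] = A[j, i] = 1         # undirected: mirror each edge
    for i in range(X):                        # top up less-connected nodes
        deficit = m - A[i].sum()
        if deficit > 0:
            candidates = np.flatnonzero((A[i] == 0) & (np.arange(X) != i))
            for j in rng.choice(candidates, size=min(deficit, len(candidates)),
                                replace=False):
                A[i, j] = A[j, i] = 1
    return A                                  # no self-loops: diagonal stays 0
```

At the full 8192-node scale the O(X²) correlation loop is the expensive part, which is why an efficient conversion tool matters.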
Graph construction offers greater flexibility compared to grid manipulation, and provides numerous possibilities for defining SST and SSTA graphs.However, because graph construction primarily involves the selection of edges, and learning is not directly tied to this process, we will not delve deeper into graph construction in this study.
Computing SSTAs. For further experiments on SSTA forecasts, the SSTs were converted into SSTAs by removing seasonality, which generated a separate SSTA graph data set. The SSTA values are the departures from the average monthly SSTs evaluated over the 30-year period 1981-2010. Tables 2 and 3 show the node feature tensors and node label matrices for constructing the SST graphs and the SSTA graphs respectively.
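The climatology removal can be sketched as follows; the function name and toy series are ours, while the base period matches the paper's 1981-2010:

```python
import numpy as np

def sst_to_ssta(V, start_year, base=(1981, 2010)):
    """Convert monthly SSTs V [X, T] (starting January of start_year) into
    anomalies by subtracting each calendar month's mean over the base period."""
    X, T = V.shape
    years = start_year + np.arange(T) // 12
    months = np.arange(T) % 12
    ssta = np.empty_like(V, dtype=float)
    for mth in range(12):
        in_base = (months == mth) & (years >= base[0]) & (years <= base[1])
        clim = V[:, in_base].mean(axis=1)        # per-node monthly climatology
        ssta[:, months == mth] = V[:, months == mth] - clim[:, None]
    return ssta
```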
Normalization. The data values were normalized to the [0, 1] range before being input to the models, using the following formula: x̂_i = (x_i − x_train,min) / (x_train,max − x_train,min), where x_i is a raw ERA5 SST or converted SSTA value, x_train,min and x_train,max are the minimum and the maximum over the training data, and x̂_i is a normalized SST or SSTA value. Unlike the normalization in Taylor and Feng (2022) and Ning et al. (2023), we applied the scale factors from the training data, rather than from the entire data set. This approach helps to ensure that scale factors remain consistent, even when future test data are input into the pre-trained models. For simplicity, we did not normalize the data using the extremes for each node, so every node was normalized by the same range. Normalization primarily helps to stabilize numerical calculations and accelerate the rate of convergence to a solution (Taylor & Feng, 2022). The test results were computed from the denormalized values, which were transformed back from their normalized state. Due to the limited memory on the shared GPU where we performed the experiments, the largest adjacency matrix that could be used for training the models was when c = 0.68.

Other GNN architectures, such as DropEdge (Rong et al., 2019) and DeGNN (Miao et al., 2021), may warrant further investigation for this forecasting task, though they are beyond the scope of this study. Hence, we applied the GraphSAGE to perform learning on the SST and SSTA graphs. The GraphSAGE provides a scalable solution to produce embeddings for nodes in large graphs, and its inductive learning makes it suitable for graphs that evolve over time (Hamilton et al., 2017). Algorithm 2 describes the GraphSAGE model and Figure 1 explains the procedure with a simple example. The forecasting task here is node regression with sliding windows. First of all, we aimed at forecasting SSTs or SSTAs at every node 1 month ahead. In each iteration, the inputs were a set of w graphs at earlier time steps, V_{:,t−w+1,…,t}, where w is the forecasting window size, and the output was one graph at the time step for prediction, V_{:,t+1}. Following Taylor and Feng (2022), we used a window size of 12, and for longer lead forecasts we used a recursive model for up to 2-year-ahead forecasts, where the model's predictions for one lead time become part of the inputs for the next lead time prediction; this process could continue iteratively to produce forecasts for unlimited lead times. The generated SST and SSTA graphs in NumPy arrays, including the node feature tensors (containing both features and labels), the adjacency matrices, and the node coordinates, are accessible at https://doi.org/10.5281/zenodo.10702373 (Ning, 2024d).
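The recursive roll-out can be sketched independently of the network itself; here a toy callable stands in for the trained GraphSAGE:

```python
import numpy as np

def recursive_forecast(model, window, steps):
    """Recursive multi-step forecasting: a 1-month-ahead model is rolled
    forward by appending each prediction to its own input window and
    dropping the oldest month.  `model` maps an [X, w] window to an [X]
    next-month prediction for all nodes."""
    window = np.asarray(window, dtype=float).copy()
    preds = []
    for _ in range(steps):
        y = model(window)                                     # next month, all nodes
        preds.append(y)
        window = np.concatenate([window[:, 1:], y[:, None]], axis=1)  # slide window
    return np.stack(preds, axis=1)                            # [X, steps]

# toy stand-in model: next month = last month + 1 at every node
toy = lambda win: win[:, -1] + 1.0
out = recursive_forecast(toy, np.zeros((2, 12)), steps=24)    # a 2-year roll-out
```

Because each step feeds on the previous prediction, errors compound with lead time, consistent with the RMSE growth reported for the 2-year forecasts.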

Graph Neural Network
Experiment Details. For training the GraphSAGE, the graph construction was further implemented using the Data object in PyTorch Geometric (Fey & Lenssen, 2019). The attributes were set as follows: "x" was the node feature matrix, "y" was the node label matrix, "edge_index" was transformed from a selected adjacency matrix, "edge_attr" was none, "num_nodes" was the number of rows of the node feature matrix, "num_edges" was the rank of the adjacency matrix, "has_isolated_nodes" was true, "has_self_loops" was false, and "is_undirected" was true.
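The `edge_index` conversion is the only non-trivial step here; a sketch (the commented `Data(...)` call is a hypothetical illustration of how the pieces fit together, not the paper's exact code):

```python
import numpy as np

def adj_to_edge_index(A):
    """Convert a dense 0/1 adjacency matrix into the [2, num_edges] COO
    array that PyTorch Geometric's Data object takes as `edge_index`
    (each undirected edge is listed in both directions)."""
    src, dst = np.nonzero(np.asarray(A))
    return np.stack([src, dst])

# A hypothetical Data construction would then look like (not run here):
# data = Data(x=torch.tensor(feats), y=torch.tensor(labels),
#             edge_index=torch.tensor(adj_to_edge_index(A), dtype=torch.long))
```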
The GraphSAGE models were implemented in Python using PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey & Lenssen, 2019). We deployed a GraphSAGE structure similar to that in Ning et al. (2023), with the following changes: the number of output features between layers is 15, and the optimizer is AdamW, with a 0.001 learning rate and a 0.0001 weight decay. The other configurations remained the same: there are two layers, where the first layer inputs w features and the second layer outputs one feature; the activation function is the hyperbolic tangent; and the loss is the mean squared error. The RMSE calculated from the converted values in Kelvin is reported. PyTorch Geometric provides three aggregation functions for SAGE convolutional layers, and we used the default mean aggregation.
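The mean aggregation inside each SAGE layer can be illustrated in plain NumPy; this mirrors SAGEConv's update x'_i = W1·x_i + W2·mean of the neighbours of i, with the toy graph and the linear output activation being our assumptions:

```python
import numpy as np

def sage_layer(H, A, W_self, W_neigh, act=np.tanh):
    """One GraphSAGE layer with mean aggregation: each node combines its
    own features with the mean of its neighbours' features.  Weights are
    plain matrices here; SAGEConv learns them during training."""
    deg = A.sum(axis=1, keepdims=True).astype(float)
    deg[deg == 0] = 1.0                    # isolated nodes keep only the self term
    neigh_mean = (A @ H) / deg
    return act(H @ W_self + neigh_mean @ W_neigh)

# Two layers as in the paper: 12 input features (the window) -> 15 -> 1.
rng = np.random.default_rng(0)
w, hidden = 12, 15
H = rng.normal(size=(4, w))                # 4 nodes, 12-month windows
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]])
h1 = sage_layer(H, A, rng.normal(size=(w, hidden)), rng.normal(size=(w, hidden)))
out = sage_layer(h1, A, rng.normal(size=(hidden, 1)), rng.normal(size=(hidden, 1)),
                 act=lambda z: z)          # linear output for regression (our choice)
```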
For training the SST forecasting models, we used early stopping to prevent the overfitting associated with long training times. We set the patience to 100 and the maximum number of training epochs to 1000. After each training epoch, the model with the lowest overall mean squared error (MSE) over the test data was saved, and training continued for up to 100 further epochs in search of the next model with a lower test MSE. If improved models continued to be found, training ended at the 1000th epoch. Because there were early fluctuations in training the SSTA forecasting models, early stopping might stop at an early epoch with local minima, so we trained the SSTA models for a fixed number of epochs (400). Figures A5 and A6 show the training and test MSEs of the models by epoch.
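The early-stopping logic described above can be sketched as follows (the callable names are illustrative, and in practice a held-out validation set would normally play the role of the evaluation data):

```python
def train_with_early_stopping(train_epoch, test_mse, patience=100, max_epochs=1000):
    """Patience-based early stopping: track the epoch with the lowest test
    MSE and stop once `patience` epochs pass without improvement, or at
    `max_epochs`.  `train_epoch` runs one epoch; `test_mse` returns the
    current test MSE."""
    best_mse, best_epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_epoch()
        mse = test_mse()
        if mse < best_mse:
            best_mse, best_epoch = mse, epoch   # a checkpoint would be saved here
        elif epoch - best_epoch >= patience:
            break                               # patience exhausted
    return best_mse, best_epoch

# simulated run: MSE improves for 3 epochs, then plateaus (patience=2 for brevity)
mses = iter([5.0, 4.0, 3.0, 4.0, 4.0, 4.0, 4.0])
best, at = train_with_early_stopping(lambda: None, lambda: next(mses), patience=2)
```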
The GraphSAGE models were trained on a shared Nvidia Tesla V100 machine. The largest graph set that could be trained with the allocated memory used the adjacency matrix with c = 0.68, but the best test performance was obtained with c = 0.7.

Results and Discussion
For global SST and SSTA forecasts, we calculated RMSEs on the denormalized test data for each node. The average RMSEs of all nodes are summarized. The persistence model was used as a baseline against which to compare performance for 1-month-ahead forecasts.
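The evaluation can be sketched as follows; persistence here simply carries the previous month's value forward, and the function names are ours:

```python
import numpy as np

def per_node_rmse(pred, obs):
    """RMSE at each node over the test months (pred, obs: [X, T])."""
    return np.sqrt(((pred - obs) ** 2).mean(axis=1))

def persistence_rmse(V):
    """Persistence baseline: next month's forecast is simply this month's
    value, so the errors are the month-to-month differences."""
    return per_node_rmse(V[:, :-1], V[:, 1:])
```

A per-node skill map (as in Figure 2) then corresponds to `persistence_rmse(V) - per_node_rmse(model_pred, obs)`: positive values mean the model beats persistence at that node.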

One Month to Two Years Ahead Global SST Forecasts
Table 4 shows the results of the persistence model and the GraphSAGE models with and without random compensating edges for the 1-month-ahead global SST forecasts. The GraphSAGE models outperformed persistence substantially in terms of RMSEs. The lower standard deviation of the RMSEs demonstrated that the GraphSAGE's performance was more consistent than persistence across different nodes.
Adding random edges to the less-connected nodes did not improve performance, which suggests that the graph construction with the Kendall rank correlation measure was already effective. Hence, in the subsequent SSTA forecasts, we only used the graphs without random edges. However, we also noticed that the model returned a slightly smaller average RMSE when c = 0.7 than when c = 0.68, so larger graphs might not necessarily lead to better performance. Including a temporal lag when calculating the correlations could be another possible improvement; however, because the graphs in this study were undirected, including a lag time was out of scope. Another approach could be to assign a fixed number of edges to each node, ensuring that every node is connected to the same degree.
To further investigate the spatial predictability of the GraphSAGE, we obtained the difference between the persistence RMSE and the RMSE of the GraphSAGE at each node, shown in Figure 2. The GraphSAGE model outperformed persistence at the majority of nodes across the world in terms of RMSEs, and the poorest performance occurred in the equatorial Pacific. The reasons could be that the SSTs in this area varied within a small range with less seasonality, and that the nodes were less connected, as shown in Figure A1.
In Figure A8, we selected 12 locations of interest for the time series plots and scatter plots in Figures A9 and A10. The locations are in the marine heatwave hotspots summarized in Oliver et al. (2021). The forecasts at the selected locations were all more accurate than the persistence model. The predictions for larger observed SSTs were sometimes less accurate than those for smaller observed SSTs, for example, in Benguela, Peru, and northern Australia, possibly reflecting a trend of long-term increases in SSTs in these regions.
Figure 3 shows the results of the GraphSAGE models with the c = 0.7 graph set for longer lead global SST forecasts. The average RMSEs of the SST forecasts across all nodes for 2 months and 3 months ahead were 0.6641 and 0.7479 respectively. The RMSEs for the period spanning 4 months to 1 year hovered around 0.8, slightly exceeding the 0.75 RMSE reported by Taylor and Feng (2022). The average RMSEs exhibited an increase after the 1-year mark, eventually stabilizing around 1.05 for the 2-year-ahead forecasts. The uptick in the initial 3 months of the first year may be attributed to the high temporal autocorrelation of SSTs. However, the increase in the second year is likely tied to the utilization of a 12-month window size. Consequently, we hypothesize that employing a larger window size could potentially enhance the accuracy of longer lead forecasts, despite necessitating more input features and demanding a more computationally intensive training process. To investigate the effect of the window size on forecasting performance, we employed a smaller adjacency matrix with c = 0.9 and further experimented with four window sizes: 3, 6, 12, and 24 months. A longer window indeed provided better overall RMSE scores than a shorter window. Comparing the 24-month window with the 12-month window, the difference is evident for forecasting time horizons up to 24 months, where all RMSE values were below or around 0.8 even with a small adjacency matrix. Especially beyond 12 lead months, a larger window size decreased the rate of change in the RMSEs after a 1-year time horizon. Conversely, the smaller window sizes of 3 and 6 did not provide usable long lead forecasts, as expected. These findings further highlight the importance of window size selection in capturing long-term dependencies. The results are in Figure A33.
Another solution is incorporating approaches to tackle long-term patterns such as LSTMs and transformers. However, this study focused on GNNs, and adding another class of neural network layers will increase the model's complexity, making it difficult to isolate GNNs' performances and the spatial patterns learned from them. We conducted a preliminary ablation study by integrating an LSTM layer subsequent to the GraphSAGE, which resulted in a reduction of the RMSE; thus, further investigation might be needed.

One Month Ahead Global SSTA Forecasts
Table 5 shows the results of the persistence model and the GraphSAGE models with and without random compensating edges for 1-month-ahead global SSTA forecasts. Additionally, we computed SSTAs from the SSTs predicted by the GraphSAGE model with the c = 0.7 graph set. Similar to the SST forecasts, the added compensating edges did not improve performance. The difference between the persistence RMSE and the RMSE of the GraphSAGE at each node is shown in Figure 4. The GraphSAGE model outperformed persistence by a small margin at most nodes across the world. The SSTA forecasts derived from the predicted SSTs exhibited a higher average RMSE than those obtained directly by the GraphSAGE and by the persistence model. Figures 4 and A27 demonstrate the two types of SSTA forecasts.
The GraphSAGE model outperformed the persistence model in terms of the average RMSE. Compared with the SST forecasts, the differences between the RMSEs of the GraphSAGE model and the persistence model were smaller, indicating that the improvement over persistence for SSTAs was not as substantial as that for SSTs, and that SSTA forecasts might be more challenging.
In Table 6, we further compared both types of SSTA forecasts at the 12 selected locations. Figure 5 shows the forecasts by the GraphSAGE and further adds the selected locations and the major ocean gyres to the map for further analysis. There are several regions in the northeast Pacific where direct SSTA forecasts by the GraphSAGE were outperformed by persistence, which are areas known for extreme heatwaves, for example, the "Blob" (Michaud et al., 2022). The northeast Pacific is affected by multiple gyres, which feature the North Pacific Current, the Alaska Current, and the California Current. There was an increase in SST variance and positive SSTAs in the Gulf of Alaska and Bering Sea and the northwest Atlantic (Oliver et al., 2018, 2021; Walsh et al., 2018). Large ocean gyres are driven by the global wind patterns (Talley et al., 2011), so variability in the wind drives variation in the strength of these currents, which ultimately leads to SSTAs being more variable. This variability might account for the elevated forecast errors in the northeast Pacific. In Figures A2 and A3, the two nodes in the northeast Pacific were well connected with many nodes in the northern temperate zone and some nodes in the southern temperate zone, which ensured a sufficient number of nodes for learning. However, the node next to Baja California Sur was only connected with the nodes nearby, which might not provide sufficient information for learning. We also noted several regions in the southern temperate zone with poorer SSTA forecasts. Possibly the cold currents from Antarctica have a stronger influence on the ocean temperatures here. Also, poorer forecasts generally occurred in the centers of the subtropical ocean gyres, which are affected by downwelling Ekman transport. For the other marine heatwave hotspots, the SSTA forecasts produced by the GraphSAGE generally outperformed persistence. With the exception of one node in the northwest Atlantic, which is likely more affected by continental land processes, most of these nodes were influenced by currents around a single major ocean gyre, which makes the regions less complex and the forecasts more accurate.

Note. The nodes are in the marine heatwave hotspots (Oliver et al., 2021). The selection of the hotspots followed Oliver et al. (2021), where key historical marine heatwaves were observed since 2000. AL, Gulf of Alaska and Bering Sea; Beng., Benguela; BCS, Baja California Sur; ECS, East China Sea; Med., Mediterranean Sea; NA, Northern Australia; NEP, Northeast Pacific; NWA, Northwest Atlantic; Tas., Tasman Sea; WA, Western Australia; WSA, Western South Atlantic; these abbreviations apply to the other parts of this study. The SSTA forecasts that outperformed the persistence model are in bold, indicating that the direct SSTA forecasts by the GraphSAGE had more instances of outperforming persistence at the selected locations. The slight differences between the corresponding persistence RMSEs (in parentheses) were potentially caused by rounding discrepancies, due to the various conversions (between SSTs and SSTAs, and normalization and denormalization) of the NumPy arrays.
When the two types of SSTA forecasts were compared, the direct SSTA forecasts by the GraphSAGE were mostly more accurate, but the predictions for the upper extreme observations, which are associated with marine heatwaves, were generally underestimated. This problem could potentially be resolved by techniques for imbalanced regression, which enhance predictions for outliers.
Longer-lead SSTA forecasts could only be produced using SST predictions with the recursive model. Figures A31 and A32 show the 2-year-ahead SST and SSTA forecasts with the recursive model, using 2021 and 2022 as an example. The SST forecasts could be accurate because of the extra information provided by strong seasonality, but the SSTA forecasts were generally inaccurate and potentially unusable.

Conclusion
This study represents a step toward the spatiotemporal forecasting of monthly mean SSTAs and marine heatwaves. The methods applied to SST and SSTA forecasts, centered on graph construction and GNN modeling, highlight the flexibility of graph feature structures and underscore the importance of modeling SST teleconnections. This is especially beneficial for understanding the spatial patterns in SSTs and SSTAs globally.

Contributions
In this study, the work in Ning et al. (2023) was further progressed by extending 1-month-ahead SST forecasts to up-to-two-year SST forecasts and 1-month-ahead SSTA forecasts. SSTA forecasts are more challenging than SST forecasts, possibly because removing the seasonality removes predictable structure. We found that the direct SSTA forecasts by the GraphSAGE outperformed both the persistence model and the SSTA forecasts converted from the GraphSAGE's SST predictions at most nodes in terms of RMSE. We also found that, unlike for SST forecasts, the recursive model would not produce usable longer-lead SSTA forecasts by directly feeding back 1-month-ahead SSTA predictions.
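The conversion route compared above amounts to subtracting a monthly climatology from the SST predictions at each node. The snippet below is a minimal illustration of that idea only; the array shapes and the function name sst_to_ssta are our illustrative assumptions, not the study's code.

```python
import numpy as np

def sst_to_ssta(sst_pred, climatology, month_idx):
    # climatology: (12, n_nodes) monthly mean SSTs; month_idx gives the
    # calendar month (0-11) of each prediction row. SSTA = SST - climatology.
    return sst_pred - climatology[month_idx]

# Toy example: 2 nodes, predictions for January and February.
clim = np.tile(np.arange(10.0, 22.0)[:, None], (1, 2))  # Jan=10, Feb=11, ...
preds = np.array([[10.5, 11.0],
                  [11.0, 10.0]])
ssta = sst_to_ssta(preds, clim, np.array([0, 1]))
# ssta -> [[0.5, 1.0], [0.0, -1.0]]
```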
The exploration of graph construction was further progressed by changing the measure used to define the adjacency matrix: to accommodate the non-linear associations between nodes while keeping edge construction simple, the Kendall correlation coefficient was used instead of the Pearson. Although including random edges did not improve the performance, random edges may serve as a baseline demonstrating the efficacy of the assumption that teleconnections, as described by statistical correlations, influence the performance.
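The edge rule described above, connecting two nodes whenever the rank correlation between their time series clears a threshold c, can be sketched as follows. This is a simplified illustration using scipy's kendalltau; the function and variable names are ours, not those of the released graph-generation package.

```python
import numpy as np
from scipy.stats import kendalltau

def build_adjacency(series, c=0.7):
    """Connect nodes i and j when |Kendall tau| between their
    time series is at least the threshold c."""
    n = series.shape[0]
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            tau, _ = kendalltau(series[i], series[j])
            if abs(tau) >= c:
                adj[i, j] = adj[j, i] = 1
    return adj

# Toy example: three nodes; the first two are monotonically related,
# so they should be connected; the third is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=48)
series = np.stack([x, 2 * x + 1, rng.normal(size=48)])
adj = build_adjacency(series, c=0.7)
```

In practice the same thresholding would run over every pair of grid nodes, which is why the paper also discusses controlling the number of edges per node.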
We showed that, in our SST forecasts, regions with strong persistent currents, that is, those around ocean gyres, are more predictable than low-current equatorial regions. For SSTA forecasts, the spatial patterns inherent in global SSTA predictability and the forecasts at selected marine heatwave hotspots were further analyzed. Regions with poor SSTA forecasts were identified in several distinct zones: middle latitudes in the Southern Hemisphere, which are less influenced by currents; high latitudes, which are influenced by cold currents from Antarctica; the equatorial Pacific; and the northeast Pacific, which is under the influence of multiple currents.
In addition, along with the study, a global SST and SSTA graph data set was generated from ERA5, and a software package to generate these graph data was developed.

Future Work
SSTA and marine heatwave forecasts, viewed through the lens of spatiotemporal anomaly forecasting, provide avenues for deeper exploration, particularly in anomaly prediction and long-term time series forecasting.
SST and SSTA Graph Construction. The investigation into the construction of graphs for SST and SSTA is ongoing. Given the inherent flexibility of graphs and their construction methods, there is potential for further improvement. When computational resources permit, larger graphs could also serve as input. Besides setting a minimum number of edges at each node, it could also be feasible to set a maximum number of edges at each node to control the graph size, because some nodes may have an excessive number of edges that are not necessary for learning. Other bivariate dependency measures could be explored. Applying different edge structures for SST and SSTA graphs might be viable, by finding more efficient measures for SSTA graph construction. Introducing temporal lags into the construction of graphs might also be viable, despite the increased complexity.
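One way the maximum-degree idea could be realized is to keep only each node's strongest incident edges by absolute correlation, retaining an edge only when both endpoints keep it. The mutual top-k rule below is a hypothetical sketch of that direction, not a method from the paper:

```python
import numpy as np

def cap_node_degree(weights, adj, max_deg=8):
    # For each node, rank its incident edges by |weight| (e.g. |Kendall
    # tau|) and keep the top max_deg; an edge survives only if both
    # endpoints keep it, which guarantees every degree <= max_deg.
    n = adj.shape[0]
    top = []
    for i in range(n):
        nbrs = np.flatnonzero(adj[i])
        order = nbrs[np.argsort(-np.abs(weights[i, nbrs]))]
        top.append(set(order[:max_deg]))
    capped = np.zeros_like(adj)
    for i in range(n):
        for j in top[i]:
            if i in top[j]:
                capped[i, j] = capped[j, i] = 1
    return capped

# Toy example: a fully connected 5-node graph capped at degree 2.
rng = np.random.default_rng(1)
w = rng.random((5, 5))
w = (w + w.T) / 2                      # symmetric edge weights
full = np.ones((5, 5), dtype=int) - np.eye(5, dtype=int)
capped = cap_node_degree(w, full, max_deg=2)
```

The mutual rule trades some edges away for a hard degree bound; a one-sided rule would keep more edges but could not bound the degree.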
Techniques for Imbalanced Regression. Techniques tailored for imbalanced regression could improve the predictions of extreme SSTAs, which are associated with marine heatwaves. These techniques include continuous imbalanced data re-sampling (Branco et al., 2017; Torgo et al., 2013; Yang et al., 2021) and loss functions (Ren et al., 2022; Yang et al., 2021). Considering the completeness of the ERA5 data and the fact that outliers are not present in the SSTA distribution at every node, one topic of interest is applying imbalanced regression loss functions, or customizing one, to cater specifically to imbalanced regression.
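As a deliberately simple illustration of the loss-function direction, one could up-weight squared errors by the magnitude of the target anomaly so that rare, extreme SSTAs contribute more to the loss. This sketch is our own minimal example, not the specific scheme of Yang et al. (2021) or Ren et al. (2022):

```python
import numpy as np

def weighted_mse(pred, target, alpha=1.0):
    # Weight each squared error by 1 + alpha*|target|, then normalize
    # the weights so the loss stays on the same scale as plain MSE;
    # extreme anomalies are penalized more heavily.
    w = 1.0 + alpha * np.abs(target)
    w = w / w.mean()
    return float(np.mean(w * (pred - target) ** 2))

# A model that ignores the single extreme anomaly pays a larger
# weighted loss than the plain MSE would suggest.
target = np.array([0.0, 0.0, 3.0])
pred = np.zeros(3)
plain = float(np.mean((pred - target) ** 2))   # 3.0
weighted = weighted_mse(pred, target)          # 6.0
```

The same weighting can be dropped into a gradient-based training loop, since it only rescales per-sample squared errors.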
Long-Term Temporal Dependency Learning. Using larger window sizes could improve long-term SST predictions. Besides the sliding window method and the recursive model, incorporating LSTMs (Pravallika et al., 2022; Taylor & Feng, 2022) or transformers (Zhou & Zhang, 2023) may enhance long-term SST forecasting, subsequently improving the SSTA predictions converted from SSTs. However, graphs have more complex structures than grids and sequences, especially large graphs, which requires LSTMs or transformers to be combined with GNNs to capture the underlying graph structure before forecasting. Such combinations, however, might introduce integration complexity and hyperparameter sensitivity, resulting in more challenging optimization.
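The recursive scheme discussed above, in which each 1-step prediction is fed back into the input window, can be sketched generically. Here season_model, which simply repeats the value from 12 months earlier, is a toy stand-in for the trained GraphSAGE, chosen so the expected output is easy to verify:

```python
import numpy as np

def recursive_forecast(model, history, steps, w=12):
    """Roll a 1-step model forward: append each prediction to the
    sliding window and feed it back in, producing multi-step forecasts."""
    window = list(history[-w:])
    out = []
    for _ in range(steps):
        y = model(np.asarray(window))
        out.append(y)
        window = window[1:] + [y]   # slide the window by one step
    return np.array(out)

# Toy 1-step model: next value equals the value 12 months earlier
# (pure seasonality), so the 2-year forecast repeats the last cycle.
season_model = lambda win: win[0]
hist = np.array([np.sin(2 * np.pi * t / 12) for t in range(24)])
fc = recursive_forecast(season_model, hist, steps=24, w=12)
```

With a purely seasonal signal the rollout is exact; for SSTAs, where seasonality has been removed, each feedback step compounds error, which is consistent with the unusable long-lead SSTA rollouts reported above.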

Figure 1. This simple case assumes that a sea surface temperature (SST) grid (gray dots) has been converted into three spatially homogeneous SST graphs at different time steps (a, b, c panels). The blue, red, and green nodes are target nodes, and lines of the corresponding color show the connections with other nearby or remote nodes. Each graph is then sequentially input to the GraphSAGE for every training epoch. The GraphSAGE aggregates node features X_i (i.e., features at 12 time steps before the target time) from the connected nodes. It predicts the node feature at the target nodes using the aggregated node features and also the node features of the target nodes themselves. The yellow node in the equatorial Pacific is an isolated node, where the GraphSAGE only uses the node features of the node itself for prediction.

Figure 2. ΔRMSE_GraphSAGE (= RMSE_persistence − RMSE_GraphSAGE) for 1-month-ahead sea surface temperature forecasts by GraphSAGE using the c = 0.7 graph set at every node on the global grid. Positive values are red, indicating that the GraphSAGE outperforms the persistence model in terms of the test root mean squared error at a node, and negative values are blue; the size of a dot represents the absolute value, in the same way as in the other figures in this study. The dot sizes are normalized using fixed maximum and minimum sizes for display.

Figure 3. Average RMSEs and corresponding standard deviations across all nodes for sea surface temperature (SST) forecasts ranging from 1 month to 2 years. Both metrics generally increase as the number of lead months increases, demonstrating growing variability of prediction error across nodes. Compared to the U-Net-LSTM, which maintained a root mean squared error below 0.75 across all months in a 24-month SST prediction period (Taylor & Feng, 2022), our model exhibited higher errors for forecasts with longer leads.

Figure 4. ΔRMSE_GraphSAGE (= RMSE_persistence − RMSE_GraphSAGE) for 1-month-ahead SSTA forecasts by GraphSAGE at every node on the global grid. The dot sizes are normalized using the same maximum and minimum sizes as Figure 2.

Figure 5. ΔRMSE_GraphSAGE (= RMSE_persistence − RMSE_GraphSAGE) for 1-month-ahead SSTA forecasts by GraphSAGE at every node on the global grid, with the 12 selected locations marked and major ocean gyres included.

Figure A1. Node degree map when c = 0.7, using the Kendall rank correlation coefficient as the measure to construct graphs.

Figure A5. The MSEs calculated from a comparison of the sea surface temperature predictions with the training and the test data sets during training. We used early stopping to prevent overfitting and saved the model with the lowest test MSE at an earlier epoch.

Figure A6. The MSEs calculated from a comparison of the SSTA predictions with the training and the test data sets during training. Early stopping was not used because of a substantial local minimum in the first 100 epochs. Instead, we used a fixed number of training epochs.

Figure A7. Scatter plots for node degrees and RMSEs for 1-month-ahead sea surface temperature forecasts by GraphSAGE across all nodes.

Figure A8.

Figure A9. Time series plots for 1-month-ahead sea surface temperature forecasts by GraphSAGE at the 12 selected locations.

Figure A10. Scatter plots for 1-month-ahead sea surface temperature forecasts by GraphSAGE at the 12 selected locations.

Figure A11. Time series plots for 2-month-ahead sea surface temperature forecasts by GraphSAGE at the 12 selected locations.

Figure A12. Scatter plots for 2-month-ahead sea surface temperature forecasts by GraphSAGE at the 12 selected locations.

Figure A13. Time series plots for 3-month-ahead sea surface temperature forecasts by GraphSAGE at the 12 selected locations.

Figure A14. Scatter plots for 3-month-ahead sea surface temperature forecasts by GraphSAGE at the 12 selected locations.

Figure A15. Time series plots for 6-month-ahead sea surface temperature forecasts by GraphSAGE at the 12 selected locations.

Figure A16. Scatter plots for 6-month-ahead sea surface temperature forecasts by GraphSAGE at the 12 selected locations.

Figure A17. Time series plots for 1-year-ahead sea surface temperature forecasts by GraphSAGE at the 12 selected locations.

Figure A18. Scatter plots for 1-year-ahead sea surface temperature forecasts by GraphSAGE at the 12 selected locations.

Figure A19. Time series plots for 18-month-ahead sea surface temperature forecasts by GraphSAGE at the 12 selected locations.

Figure A20. Scatter plots for 18-month-ahead sea surface temperature forecasts by GraphSAGE at the 12 selected locations.

Figure A21. Time series plots for 2-year-ahead sea surface temperature forecasts by GraphSAGE at the 12 selected locations.

Figure A22. Scatter plots for 2-year-ahead sea surface temperature forecasts by GraphSAGE at the 12 selected locations.

Figure A23. Scatter plots for node degrees and RMSEs for 1-month-ahead SSTA forecasts by GraphSAGE across all nodes.

Figure A25. Time series plots for 1-month-ahead SSTA forecasts by GraphSAGE at the 12 selected locations.

Figure A27. ΔRMSE_GraphSAGE (= RMSE_persistence − RMSE_GraphSAGE) for 1-month-ahead SSTA forecasts computed from the sea surface temperature predictions by GraphSAGE at every node on the global grid.

Figure A28. ΔRMSE_GraphSAGE (= RMSE_persistence − RMSE_GraphSAGE) for 1-month-ahead SSTA forecasts computed from the sea surface temperature predictions by GraphSAGE at every node on the global grid, with the 12 selected locations marked.

Figure A29. Time series plots for 1-month-ahead SSTA forecasts computed from the sea surface temperature predictions by GraphSAGE at the 12 selected locations.

Figure A30. Scatter plots for 1-month-ahead SSTA forecasts computed from the sea surface temperature predictions by GraphSAGE at the 12 selected locations.

Figure A31. Time series plots for 2-year-ahead sea surface temperature forecasts for 2021 and 2022 by GraphSAGE at the 12 selected locations.

Figure A32. Time series plots for 2-year-ahead SSTA forecasts for 2021 and 2022 by GraphSAGE at the 12 selected locations. The recursive model was ineffective for long-lead SSTA forecasts.

Figure A33. Average RMSEs and corresponding standard deviations across all nodes for sea surface temperature forecasts ranging from 1 month to 2 years, using the adjacency matrix c = 0.9 and different window sizes: (a) 3 months; (b) 6 months; (c) 12 months; (d) 24 months.

Table 2
Node Feature Tensors for Constructing Graphs When w = 12

Table 3
Node Label Matrices for Constructing Graphs

The ERA5 data can be downloaded from the Copernicus Climate Change Service, https://cds.climate.copernicus.eu/, by selecting the "ERA5 monthly averaged data on single levels from 1940 to present" data set.

Table 4
Average RMSEs Across all Nodes for One-Month-Ahead SST Forecasts

Table 5
Average RMSEs Across all Nodes for One-Month-Ahead SSTA Forecasts

Table 6
Average RMSEs at the 12 Selected Nodes for One-Month-Ahead SSTA Forecasts Using Two Approaches: Direct SSTA Prediction by the GraphSAGE and Computation From SST Predictions by the GraphSAGE