Learning Sea Surface Height Interpolation From Multi‐Variate Simulated Satellite Observations

Satellite‐based remote sensing missions have revolutionized our understanding of the Ocean state and dynamics. Among them, space‐borne altimetry provides valuable Sea Surface Height (SSH) measurements, used to estimate surface geostrophic currents. Due to the sensor technology employed, important gaps occur in SSH observations. Complete SSH maps are produced using linear Optimal Interpolations (OI) such as the widely used Data Unification and Altimeter Combination System (duacs). On the other hand, Sea Surface Temperature (SST) products have much higher data coverage and SST is physically linked to geostrophic currents through advection. We propose a new multi‐variate Observing System Simulation Experiment (OSSE) emulating 20 years of SSH and SST satellite observations. We train an Attention‐Based Encoder‐Decoder deep learning network (abed) on this data, comparing two settings: one with access to ground truth during training and one without. On our OSSE, we compare abed reconstructions when trained using either supervised or unsupervised loss functions, with or without SST information. We evaluate the SSH interpolations in terms of eddy detection. We also introduce a new way to transfer the learning from simulation to observations: supervised pre‐training on our OSSE followed by unsupervised fine‐tuning on satellite data. Based on real SSH observations from the Ocean Data Challenge 2021, we find that this learning strategy, combined with the use of SST, decreases the root mean squared error by 24% compared to OI.


Introduction

Background
Since the first ocean remote sensing missions in the 1970s, satellite observation has become one of the most important contributions to understanding ocean state and dynamics (S. Martin, 2014). Through the years, satellites have provided a huge amount of data of various physical natures with wide spatial coverage, complementing in situ datasets. Among these techniques, satellite altimetry is used to retrieve the Sea Surface Height (SSH), a determining variable of the ocean circulation. The SSH spatial gradient can be used to estimate the geostrophic circulation, i.e. the currents arising from the equilibrium between the Coriolis force and the pressure force in the surface layer of the ocean. SSH (also called Absolute Dynamic Topography by the altimetry community) is currently measured by nadir-pointing altimeters, meaning that they can only take measurements vertically, along their ground tracks, by calculating the return time of a radar pulse. This leads to large gaps in the observed SSH, and providing a gap-free product (L4) is a challenging spatio-temporal interpolation problem. One of the most widely used L4 products in oceanography applications is provided by the Data Unification and Altimeter Combination System (duacs) (Taburet et al., 2019). It is a linear Optimal Interpolation (OI) of the nadir along-track measurements leveraging a covariance matrix tuned on 25 years of data. However, several studies show that duacs OI misses some of the mesoscale structures and eddies (Amores et al., 2018; Stegner et al., 2021). Improving the reconstruction of gridded altimetry products remains an open challenge.
To enhance the quality of the SSH reconstruction and sea surface current estimation, using additional physical information such as the Sea Surface Temperature (SST) has been demonstrated to be beneficial (Ciani et al., 2020; Thiria et al., 2023; S. A. Martin et al., 2023; Archambault et al., 2023; Fablet et al., 2023). SST motion is linked to ocean circulation (Isern-Fontanet et al., 2006), and therefore to SSH, as currents transport heat through advection. SST measurements obtained through passive infrared technology offer a remarkably high spatial resolution, ranging from 1.1 to 4.4 km (Emery et al., 1989), even if intermittent clouds introduce data gaps. On the other hand, microwave sensors provide lower-resolution SST data (25 km) which can be obtained through non-raining clouds. Infrared and microwave data are then combined with in situ measurements to produce fully gridded SST maps (Donlon et al., 2012; Chin et al., 2017). Thus, a crucial challenge lies in developing efficient reconstruction methods capable of fusing data derived from different remote sensing techniques, each presenting distinct interpolation challenges. This is essential to unlock the full potential of satellite oceanography products.

SSH interpolation with deep neural networks
In the last decade, deep learning has emerged as one of the leading methods to address image inverse problems. Neural networks have demonstrated remarkable flexibility in fusing observations from various sources and modalities, exhibiting their capacity to learn complex relationships given enough training samples (McCann et al., 2017; Ongie et al., 2020). Prior works proved that it is possible to use SST to enhance SSH reconstruction with a deep learning network, whether from a downscaling perspective (Nardelli et al., 2022; Thiria et al., 2023) or an interpolation one (Fablet et al., 2023; S. A. Martin et al., 2023). However, training neural networks usually requires fully gridded ground truth, which is unavailable in a realistic geoscience scenario. To overcome this limitation, it is possible to design a twin experiment of the satellite observing system on a numerical simulation, also called an Observing System Simulation Experiment (OSSE). Neural networks can then be trained on simulated data and applied to satellite observations. The Ocean Data Challenge 2020 (CLS/MEOM, 2020) is a 1-year OSSE providing simulated SSH observations and ground truth, aiming to compare various reconstruction methods. Among them, Fablet et al. (2021) performed a supervised deep learning of the SSH interpolation and extended their study using SST, showing increased performance (Fablet et al., 2023). However, while the SSH-only network was successfully applied to real data, adapting its SST-using version remains a challenging problem. Another way to overcome the lack of ground truth is to employ loss functions allowing the neural network to learn from observations alone. Archambault et al. (2023) and S. A. Martin et al. (2023) trained neural networks using only SST and SSH observations, showing the potential of unsupervised learning for SSH interpolation. This last option has the advantage of not suffering from the domain gap between the simulation and the real data, but we expect unsupervised interpolations to produce less accurate reconstructions.

Contributions
First, as the previously existing Ocean Data Challenge OSSE provided only one year of data without SST, it presents clear limitations for training neural networks. We propose a new OSSE that includes 20 years of SSH and SST data, with realistic simulated observations of these variables. Second, we compare a fixed neural architecture trained in a supervised and an unsupervised way, with or without SST. The SSH interpolation is learned by an Attention-Based Encoder-Decoder (ABED) on our OSSE. Its assessment involves evaluating errors in SSH and geostrophic current reconstruction. Additionally, a comparison of the eddy structures is conducted, both quantitatively and visually. Third, we propose a hybrid learning strategy consisting of supervised pre-training on our OSSE and unsupervised fine-tuning on real-world observations. Specifically, we compare the same network architecture trained in the three following manners: supervised on our OSSE and directly applied to observations, trained directly on observations, and the proposed hybrid approach. This paper is structured as follows. In Section 2, after giving a rationale for leveraging SST information in the interpolation method, we detail our OSSE. In Section 3, we present our architecture and the training loss functions. In Section 4, we evaluate the interpolation on our OSSE, in terms of SSH reconstruction and geostrophic circulation errors. We also perform an eddy detection to demonstrate that SST-using methods retrieve more realistic ocean structures, and we compare our results to existing state-of-the-art methods on the Ocean Data Challenge 2020 OSSE. Finally, we compare the learning strategies on real observations. In Section 5, we discuss the limitations and perspectives of this work.

Multi-variate data
In the following, we provide a rationale for the SSH and SST relationship, outline the reference data source we utilized (the Global Ocean Physics Reanalysis (CMEMS, 2020)), and detail our OSSE's SSH and SST observations. We also present the satellite observations that will be used for training and fine-tuning.

Physical relationship between SSH and SST
One of the most important uses of SSH data is to recover oceanic currents through the geostrophic approximation. It consists of assuming a static equilibrium between the surface projection of the Coriolis force and the resultant of the pressure forces. Far from the Equator (at which the Coriolis force projection is null), it is a good approximation of the circulation. The surface geostrophic currents can be computed from the SSH h following Equation 1:

u_geo = −(g/f) ∂h/∂y,    v_geo = (g/f) ∂h/∂x,    (1)

where u_geo and v_geo are the Eastward and Northward geostrophic currents, x and y the Eastward and Northward coordinates, and f = 2Ω sin(ϕ) is the Coriolis factor, Ω being the Earth's rotation rate, ϕ the latitude, and g the gravitational acceleration.
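A minimal NumPy sketch of Equation 1, estimating geostrophic currents from a gridded SSH field with finite differences. The function name, grid spacings, and the single-latitude Coriolis factor are illustrative simplifications, not the paper's implementation:

```python
import numpy as np

def geostrophic_currents(ssh, dx, dy, lat_deg, g=9.81, omega=7.2921e-5):
    """Estimate surface geostrophic currents from an SSH field (Equation 1).

    ssh     : 2D array (y, x) of sea surface height in meters
    dx, dy  : grid spacing in meters (assumed constant here for simplicity)
    lat_deg : latitude in degrees, used for the Coriolis factor f = 2*Omega*sin(lat)
    """
    f = 2.0 * omega * np.sin(np.deg2rad(lat_deg))
    # np.gradient returns the derivative along axis 0 (y) first, then axis 1 (x)
    dh_dy, dh_dx = np.gradient(ssh, dy, dx)
    u_geo = -(g / f) * dh_dy  # Eastward geostrophic current
    v_geo = (g / f) * dh_dx   # Northward geostrophic current
    return u_geo, v_geo
```

A uniformly tilted SSH plane then yields a spatially constant current, as expected from the geostrophic balance.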
In a first approximation, the surface temperature T can be considered as a passive tracer transported by surface currents. The evolution of a scalar in a velocity field is described by the linear advection equation given in Equation 2:

∂T/∂t + u ∂T/∂x + v ∂T/∂y = 0.    (2)
Combining the geostrophic and advection Equations (1, 2), we understand why a time series of SST observations should provide pertinent information for constraining the SSH reconstruction. Several studies pointed out the interest of using SST to reconstruct SSH, such as Isern-Fontanet et al. (2006) and González-Haro et al. (2020), which established spectral relations between SSH and SST in a Surface Quasi-Geostrophic framework. However, the physical link between temperature and SSH is more complex, as other phenomena must be considered, such as diffusion, convection, circulation between water depths, atmosphere interactions, and viscosity. Satellite observations of temperature and sea surface height also suffer from instrumental errors and are, by nature, limited to observing the ocean surface. This is why neural network architectures, thanks to their flexibility, seem appropriate to learn the complex underlying link between the data.

Observing System Simulation Experiment
To effectively replicate the relationship between the two variables, we propose an Observing System Simulation Experiment (OSSE), meaning a twin experiment that accurately models the satellite observations of the ocean. This approach is widely used in the geosciences community as it provides a way to test reconstruction methods and errors (Gaultier et al., 2016; Amores et al., 2018; Stegner et al., 2021). With this mindset, the SSH and SST variables of a high-resolution simulation are considered as the ground truth ocean state upon which we simulate satellite measurements. The coherence of the relation between SSH and SST is ensured by the physical model, while with our OSSE we produce enough ground truth/observation pairs to train a neural network.
In this paper, we denote X_ssh and X_sst the ground truth fields of SSH and SST, and Y_ssh and Y_sst the simulated observations. Hereafter, we detail the reference dataset of our OSSE and the observation operators of the two variables.

Base simulation
We conduct our experiments on the Global Ocean Physics Reanalysis product (GLORYS12) (CMEMS, 2020). It provides various physical data such as SSH, SST, and oceanic currents with a spatial resolution of 1/12° (around 8 km). GLORYS12 is based on the NEMO 3.6 model (Madec et al., 2017) and assimilates satellite observations (SSH along-track observations and SST full-domain observations) through a reduced-order Kalman filter. It is updated annually by the Copernicus European Marine Service, making it impossible to use in near-real-time applications. We select a temporal subset of this simulation from 2000/03/20 to 2019/12/29, for a total of 7194 days.
We select a portion of the Gulf Stream, between 33° and 43° North and −65° and −55° East. This area is known for its intense circulation and its water masses of very different temperatures, and it is far enough from the Equator for the geostrophic approximation to be applied. Comparing the surface circulation of the model with its geostrophic approximation, we find an RMSE of 6.6 cm/s for u_geo and 6.1 cm/s for v_geo. Considering the high intensity and variations of the currents in the Gulf Stream (with 37.1 and 34.3 cm/s of standard deviation for u and v, respectively), geostrophy seems to be an adequate estimation. Thus, we expect a significant synergy between SSH and SST which a neural network can learn. For computational reasons, we resample the data to images of size 128 × 128 with a bilinear interpolation, corresponding to a resolution of 0.078° per pixel (approximately 8.7 km). In doing so, the receptive field of the network covers the entire 10° by 10° area.
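As an illustration of the resampling step, a dependency-free bilinear resize can be sketched as follows. This is a simplified stand-in for the bilinear interpolation used to produce the 128 × 128 images; the function and its interface are hypothetical:

```python
import numpy as np

def bilinear_resize(field, out_h, out_w):
    """Bilinearly resample a 2D field to (out_h, out_w), e.g. a 128x128 grid."""
    in_h, in_w = field.shape
    # Target sample positions expressed in source-pixel coordinates
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.clip(np.floor(ys).astype(int), 0, in_h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, in_w - 2)
    wy = (ys - y0)[:, None]  # fractional weights along y
    wx = (xs - x0)[None, :]  # fractional weights along x
    # Gather the four neighboring corners and blend them
    a = field[np.ix_(y0, x0)]
    b = field[np.ix_(y0, x0 + 1)]
    c = field[np.ix_(y0 + 1, x0)]
    d = field[np.ix_(y0 + 1, x0 + 1)]
    return (1 - wy) * ((1 - wx) * a + wx * b) + wy * ((1 - wx) * c + wx * d)
```

By construction, a field that varies linearly in space is resampled exactly.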

SSH simulated observations
The nadir-pointing altimetry satellites take approximately one measurement per second along their ground tracks. Their observations are a series of values with precise spatio-temporal coordinates that we aim to simulate. To do so, we retrieve the support of real-world satellite observations, denoted Ω = {Ω_i = (t_i, lat_i, lon_i), i ∈ [0 : N]}, from the Copernicus sea level product (CMEMS, 2021). Using Ω and the ground truth data X_ssh, we simulate SSH observations Y_ssh as the trilinear interpolation of the simulated field on each point of the support. We add an instrumental error ε ∼ N(0, σ) with σ = 1.9 cm, which is the distribution used in the Ocean Data Challenge 2020 (CLS/MEOM, 2020). The SSH observations Y_ssh are defined as follows:

Y_ssh = H_ssh(X_ssh) + ε,    (3)

where H_ssh is the trilinear interpolation operator of the ground truth X_ssh on the support Ω. An example of these simulated along-track measurements is presented in the first row of Figure 1. For the neural network input observations, we regrid these data to a daily 128 × 128 image. We set the pixels with no simulated satellite observation to zero, and we average the daily measurements of SSH inside each pixel to represent the mean of the daily data from the different satellites (if any). As the GLORYS12 reanalysis assimilates along-track SSH data, selecting satellite measurements at the same locations as the assimilated data might introduce a bias in our observations. To overcome this issue, we desynchronize the real satellite ground tracks from the ones we use to produce SSH observations by introducing a time delay (772 days) between the real L3 satellite observations and the simulation. This ensures that simulated along-track data are selected randomly, rather than specifically where the model assimilated real-world observations.
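The regridding of daily along-track measurements (zeros where unobserved, per-pixel daily means elsewhere) can be sketched as follows, assuming hypothetical arrays of measurement values and their pixel coordinates:

```python
import numpy as np

def grid_daily_tracks(track_vals, track_rows, track_cols, size=128):
    """Grid one day of simulated along-track SSH onto a size x size image.

    Pixels crossed by several measurements get their mean; unobserved pixels
    stay at zero, matching the network input convention described above.
    """
    img = np.zeros((size, size))
    count = np.zeros((size, size))
    # Unbuffered accumulation so repeated (row, col) pairs add up correctly
    np.add.at(img, (track_rows, track_cols), track_vals)
    np.add.at(count, (track_rows, track_cols), 1.0)
    mask = count > 0
    img[mask] /= count[mask]
    return img, mask
```

The returned mask distinguishes true zero values from unobserved pixels, which is also useful for the unsupervised losses discussed later.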

SST simulated observations
SST remote sensing is based on direct infrared imaging, leading to wider measurement swaths but making the data sensitive to cloud cover. The so-called L3 satellite products have much higher data coverage, but no observation is possible when clouds are present. To fill the gaps, the L3 products from several satellites are merged and interpolated to form a fully gridded image, using complementary microwave satellite sensors (which produce lower-resolution data but are less sensitive to clouds) and in situ measurements (Donlon et al., 2012; Chin et al., 2017). This results in varying resolutions within the same product, where high-resolution structures are artificially smoothed when the cloud cover (C) is too thick.
We simulate the SST observation operator H_sst by blending the clean field with a smoothed version according to the cloud cover, as follows:

Y_sst = H_sst(X_sst) = (1 − C) ⊙ X_sst + C ⊙ (G_{σt,σx} ⋆ X_sst) + ε,    (4)

where ⊙ is the element-wise product, ⋆ the convolution product, and ε is a white Gaussian noise image of size 32 × 32 linearly upsampled to a 128 × 128 image. The spatio-temporal Gaussian filter G_{σt,σx}, with σ_t = 1.23 days and σ_x ≈ 16 km, simulates the smoothing of the interpolation performed by satellite products. To compute a realistic cloud cover C, we use 2 years of data from an NRT L3 product (CMEMS, 2023), which we periodically replicate to match the length of our dataset. We then linearly interpolate the cloud cover to our spatial resolution and apply an average filter with a kernel size of approximately 43 km. This step is essential, as applying a binary mask results in patches at the frontiers between cloud-free and cloudy regions. Our SST observations thus present a spatially and temporally correlated noise, with different resolutions depending on cloud cover. In the end, H_sst adds a noise with an RMSE of 0.48 °C, where the SST standard deviation of the ground truth is 4.96 °C, as presented in Figure 2. This observation operator differs from real-world degradations but produces an image with an unequal noise resolution similar to the errors present in L4 SST products. Also, as SST presents strong annual variations that should be removed, we deseasonalize it: for each SST image, we subtract the mean image calculated for the corresponding day across the dataset. This is known to improve machine learning time-series prediction (Ahmed et al., 2010), and in our case it produces better reconstructions, as shown in Appendix 6.3.
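A rough NumPy sketch of such a degradation, under the assumption that the operator blends the clean field with a spatially smoothed version weighted by the (non-binary) cloud cover and then adds noise. The parameter values and the omission of the temporal dimension are simplifications, not the paper's settings:

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """1D normalized Gaussian kernel used for separable smoothing."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def degrade_sst(sst, cloud, sigma_x=2.0, noise_std=0.1, rng=None):
    """Illustrative SST observation operator in the spirit of Equation 4.

    sst   : 2D clean SST field
    cloud : 2D cloud-cover weights in [0, 1] (non-binary, as in the paper)
    """
    rng = np.random.default_rng() if rng is None else rng
    k = gaussian_kernel(sigma_x, radius=int(3 * sigma_x))
    # Separable Gaussian smoothing: convolve columns, then rows
    smoothed = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, sst)
    smoothed = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, smoothed)
    noise = noise_std * rng.standard_normal(sst.shape)
    return (1.0 - cloud) * sst + cloud * smoothed + noise
```

With zero cloud cover and no noise the field is returned unchanged, while fully cloudy regions only see the smoothed field.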

Satellite observations
To constitute a dataset of real-world observations, we use the L3 SSH product from which we recovered the realistic satellite ground tracks (CMEMS, 2021). These data are the inputs used in the duacs optimal interpolation process and are available from 1993 to 2023. For the L4 SST product, we use the Multiscale Ultrahigh Resolution (MUR) SST (NASA/JPL, 2019). MUR SST is produced through an optimal interpolation of infrared, microwave, and in situ measurements (Chin et al., 2017). Its resolution is very high (0.01°), so we linearly interpolate the data to our resolution (0.078°); it is available from 2002/05/31 to the present. We select satellite observations from 2002/06/01 to 2022/02/09, for a total of 7194 days, which is the same number of timesteps as our OSSE. We also select the same geographical area, between 33° and 43° North and −65° and −55° East. Both datasets are presented in Figure 3.
Proposed interpolation method

Learning the interpolation
The observation operator H_ssh previously described can be seen as a forward operator that we aim to invert. In the past years, deep neural networks, especially convolutional neural networks, have proven their ability to solve ill-posed image inverse problems (McCann et al., 2017) and more specifically inpainting problems (Jam et al., 2021; Qin et al., 2021). A neural network f_θ is trained on a database to estimate the true state from observations, f_θ(y) ≈ x. Learning this inversion operator thus requires (y, x) pairs (supervised) or only y (unsupervised) (Ongie et al., 2020).
We chose to perform the interpolation on a temporal window of 21 days; the input is thus a tensor of 21 images of SSH, with or without SST images, and the output is the 21 corresponding days of SSH only. The neural network estimates the true state from observations, X̂_ssh = f_θ(Y), where Y = Y_ssh for an SSH-only interpolation, and Y = (Y_ssh, Y_sst) if the network uses SST. The length of the time window is discussed in Section 4.1, and the training losses of the network in Section 3.3.

Architecture
Convolutional neural networks, among the most used deep learning methods for image tasks, learn convolution operations able to identify features over space and/or time. These networks have been used for multiple tasks in geosciences, from forecasting (Che et al., 2022) to interpolation (Manucharyan et al., 2020; Fablet et al., 2021; S. A. Martin et al., 2023; Archambault et al., 2023), and from eddy detection (Moschos et al., 2020) to super-resolution (Nardelli et al., 2022; Thiria et al., 2023), to name a few. Over time, the machine learning community introduced various ways to organize these convolution operations, each presenting distinct advantages. Residual layers learn small modifications between their input and output, making neural networks easier to train (He et al., 2016). Attention layers weight their inputs by a factor between zero and one. This allows subsequent layers to focus on important features while neglecting irrelevant ones, which makes attention well suited to extracting information from contextual variables. It is widely used in many computer vision tasks (Guo et al., 2021) and can be transposed to geoscience applications such as Che et al. (2022). An encoder-decoder architecture progressively compresses and decompresses the input data, identifying structures at different resolutions.
In this study, we compare different learning techniques on a fixed architecture: an attention-based encoder-decoder (abed) presented in Figure 4. This neural network benefits from the layers described above. The overall structure of our neural network is inspired by Che et al. (2022), who introduced a residual U-Net with attention layers for rain nowcasting. We removed the U-Net residual connections that were not suited to the interpolation task and changed the attention and upsampling blocks. The encoder starts with a batch normalization and a 3D convolution (in time and space) followed by two downsampling blocks that divide the spatial dimensions by 2 (see Figure 4). The decoder is composed of residual attention blocks followed by upsampling blocks.
Hereafter, we describe our attention block, which consists of two essential steps: temporal and spatial attention modules. Our approach builds upon the Convolutional Block Attention Module (CBAM) principle introduced by Woo et al. (2018), which successively performs channel and spatial attention. We extend this idea by incorporating temporal information in the channel attention mechanism. To do so, we first compute the spatial average of each channel and instant, resulting in a tensor of size C × T, where C is the number of channels and T is the length of the time series. Subsequently, we apply two one-dimensional convolutional layers with a kernel of size 1, followed by a sigmoid activation function, to estimate the attention weights. This corresponds to a 2-layer perceptron shared by every time step, which differs from the CBAM in that it includes the temporal information in the channel attention. These weights are then multiplied with each timestep of every channel, enabling the network to highlight salient features and suppress irrelevant information. After performing temporal attention, we proceed with spatial attention. This step uses a 3-dimensional convolution whose temporal kernel length matches the length of the time series. As a result, the entire time series is aggregated into a single 2D image, which serves as the basis for deriving the spatial attention. A residual skip connection is then applied, and the described block is repeated 4, 2, and 1 times for the first, second, and last block, respectively. For further details, we provide the PyTorch implementation of our network at https://gitlab.lip6.fr/archambault/james2024.
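The channel-temporal attention step can be sketched in plain NumPy as follows. The weight matrices stand in for the two kernel-size-1 convolutions, and the ReLU between them is an assumption; the paper's actual PyTorch code is in its repository:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_temporal_attention(x, w1, w2):
    """NumPy sketch of the channel-temporal attention step described above.

    x  : tensor of shape (C, T, H, W)
    w1 : (C_mid, C) and w2 : (C, C_mid) -- a 2-layer perceptron over channels
         shared by every time step (equivalent to two kernel-size-1 convolutions).
    """
    pooled = x.mean(axis=(2, 3))              # spatial average -> (C, T)
    hidden = np.maximum(0.0, w1 @ pooled)     # (C_mid, T), ReLU assumed
    weights = sigmoid(w2 @ hidden)            # (C, T), values in (0, 1)
    # Reweight every channel/timestep of the input by its attention weight
    return x * weights[:, :, None, None]
```

With zero weight matrices the sigmoid outputs 0.5 everywhere, so the block simply halves its input, which is a convenient sanity check of the broadcasting.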

Loss and regularization
We propose to compare two main strategies to train the neural network. Thanks to the OSSE previously described, we have access to the ground truth, which we can use to learn the interpolation in a classic supervised fashion. However, it is also possible to train directly on observations by applying H_ssh to the generated map X̂_ssh before computing the loss (see Equations 5, 6, 7). Filoche et al. (2022) performed the interpolation with SSH observations only, and, using the same principle, Archambault et al. (2023) showed that it was possible to estimate SSH images starting from SST only while constraining on SSH observations. Both these methods are fitted on one (or a small number of) examples and must be refitted to be applied to unseen data. Using a larger real-world satellite dataset, S. A. Martin et al. (2023) trained a neural network directly from observations by constraining it on independent satellite observations that were not given as input. However, the lack of a ground truth reference makes it harder to compare the different reconstructions, especially regarding detected eddies and structures. We propose to train neural networks using the 3 following losses:

• The MSE using the ground truth:

L_sup = (1 / (T·H·W)) Σ_{t,i,j} ( X̂_ssh(t,i,j) − X_ssh(t,i,j) )²    (5)

• The MSE using only observations:

L_unsup = (1/N) Σ_{k=1}^{N} ( H_ssh(X̂_ssh)_k − (Y_ssh)_k )²    (6)

• The MSE using only observations and the regularization introduced by S. A. Martin et al. (2023):

L_unsup_reg = L_unsup + (λ1/N1) Σ_{k=1}^{N1} ( ∂/∂s H_ssh(X̂_ssh)_k − ∂/∂s (Y_ssh)_k )² + (λ2/N2) Σ_{k=1}^{N2} ( ∂²/∂s² H_ssh(X̂_ssh)_k − ∂²/∂s² (Y_ssh)_k )²    (7)

where ∂/∂s is the along-track derivative of the SSH approximated by its rate of change (see Appendix 6.1), T is the temporal length of the time series (here 21), H and W the spatial dimensions of the images (here both equal to 128), and N, N1, N2 the numbers of satellite measurements of SSH, SSH first, and SSH second spatial derivatives along satellite tracks, respectively. We take λ1 = λ2 = 0.05 as regularization coefficients, the same values used by S. A. Martin et al. (2023).
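The distinction between the supervised and unsupervised objectives can be sketched as follows. Here the track operator H_ssh is approximated by simply sampling the estimate at observed pixels, a simplification of the trilinear operator, and the regularization terms of Equation 7 are omitted:

```python
import numpy as np

def supervised_loss(x_hat, x_true):
    """L_sup: MSE against the fully gridded ground truth (Equation 5)."""
    return np.mean((x_hat - x_true) ** 2)

def unsupervised_loss(x_hat, y_obs, obs_mask):
    """L_unsup: MSE computed only where along-track observations exist
    (Equation 6). obs_mask is a boolean image of observed pixels."""
    return np.mean((x_hat[obs_mask] - y_obs[obs_mask]) ** 2)
```

The unsupervised loss never touches unobserved pixels, which is why the paper withholds one satellite's tracks from the input to force generalization away from the observed locations.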
The losses L_unsup and L_unsup_reg apply the observation operator H_ssh before computing the MSE, which allows training in a framework where only observations are available. Thus, from an interpolation point of view, the inversion methods that use these losses are unsupervised, as they can be trained without any ground truth image. However, if we constrain the network on the same observations that were given as input, an over-fitting of the along-tracks will occur, with no guarantee of generalization. To avoid this problem, S. A. Martin et al. (2023) constrained their network on the observations of one satellite that were withdrawn from the input. Similarly, we remove the data of one satellite from the inputs, but we calculate the loss function on all satellite observations (the ones given and the ones left aside). In doing so, the network must generalize outside the along-track measurements that were given as input. In Figure 5, we denote Y_ssh^in the input observations and present the computational graph of an unsupervised inversion.

Training details
Train, validation, test split. We partitioned the OSSE dataset into three subsets: training, validation, and test data. We used the year 2017 exclusively to test our reconstructions.

Normalization. We normalize the network's input and output by subtracting the mean and dividing by the standard deviation. The normalization parameters are computed only on the neural network inputs, SST or along-track data. Specifically, we first perform this normalization for images related to SSH along-track measurements and subsequently replace any missing values with zeros. We normalize the neural network SSH outputs with the statistics computed on the input observations (so that the method remains applicable in an unsupervised setting). When training with the regularized loss of Equation 7, we also normalize the data from the first and second SSH along-track derivatives.
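The normalization of the along-track inputs can be sketched as follows: statistics are computed on observed pixels only, and missing values are filled with zeros afterwards, as described above (the function name is illustrative):

```python
import numpy as np

def normalize_tracks(obs, obs_mask):
    """Normalize along-track SSH inputs with statistics from observed pixels
    only, then fill missing values with zeros."""
    mean = obs[obs_mask].mean()
    std = obs[obs_mask].std()
    out = np.zeros_like(obs)  # unobserved pixels stay at zero
    out[obs_mask] = (obs[obs_mask] - mean) / std
    return out, mean, std
```

Computing the statistics on the inputs rather than the targets keeps the whole pipeline usable in the unsupervised setting, where no gridded ground truth exists.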
Training hyperparameters. We train every method using an Adam optimizer (Kingma & Ba, 2017) with a learning rate starting at 5 × 10⁻⁵ and a decay of 0.99. We perform early stopping with a patience of 8 epochs. For the supervised training, the stopping criterion is the RMSE of the reconstruction on the fully gridded domain of the validation data, while in the unsupervised setting we compute this RMSE on left-aside along-track measurements. In doing so, the stopping strategy remains compliant with a situation where no ground truth is accessible.
Ensemble. As neural network optimization is sensitive to weight initialization, we train 3 networks for every setting. The so-called "Ensemble" estimation is the average SSH map of the 3 networks. An ensemble estimation helps stabilize performance and enhances the reconstruction (Hinton et al., 2015). In the following, we call "Ensemble score" the score of the previously mentioned ensemble estimation and "Mean score" the average of the scores of each network taken independently.
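The two scores can be sketched as follows. Note that the ensemble RMSE is never larger than the mean RMSE (by the triangle inequality applied to the error maps), which is consistent with the behavior reported in the results:

```python
import numpy as np

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def ensemble_scores(predictions, truth):
    """Compute the "Ensemble score" (RMSE of the averaged SSH maps) and the
    "Mean score" (average of the individual networks' RMSE)."""
    ensemble = np.mean(predictions, axis=0)          # average SSH map
    ensemble_score = rmse(ensemble, truth)
    mean_score = float(np.mean([rmse(p, truth) for p in predictions]))
    return ensemble_score, mean_score
```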

Results
In Sections 4.1 and 4.2, we compare the different training methods on our OSSE to highlight the drawbacks of unsupervised learning and the advantages of SST. In Section 4.3, we assess the similarity of our OSSE with the previously existing one, the Ocean Data Challenge 2020. In Section 4.4, we build upon the conclusions drawn in the previous sections to present a transfer learning method from our OSSE to observations.

SSH reconstruction and quality of derived geostrophic currents
We compare the fields estimated by the networks trained using the 3 losses L_sup, L_unsup, and L_unsup_reg, with 3 different sets of input data: only SSH tracks, SSH and the noised SST (denoted nSST), and SSH and the noise-free SST (denoted SST). The noise-free SST provides an upper bound on the performance of the neural network in the case of a perfect physical link between SSH and SST. We give the RMSE of the estimated SSH fields on the test set in Table 1, and the RMSE on the velocity fields in Table 2. Systematically, the ensemble reconstruction has a lower RMSE than the mean performance, which is usual in machine learning, as individual member errors are compensated by others. Comparing the ensemble scores, we observe that the supervised loss function outperforms the unsupervised framework in every data scenario. Specifically, in the SSH+SST scenario, the supervised loss decreases the ensemble RMSE of L_unsup by 17%, and by 9% without SST. Also, adding SST as an additional input to the network generally improves performance compared to using SSH alone. This improvement is observed across all three loss functions, as the error values decrease for SSH+nSST compared to SSH. For instance, the SSH-only ensemble RMSE is decreased by 31% and 20% for SST and nSST, respectively, with L_sup. The regularization introduced by S. A. Martin et al. (2023) also reduces the errors of the unsupervised reconstructions (see Table 2). We estimate the surface currents from the reconstructed SSH using Equation 1, and we compare them to the surface circulation of the model. The errors on velocity in Table 2 follow the same patterns as the RMSE on the SSH fields, but with smaller differences between methods. The RMSE is not far from the minimal error achievable through geostrophy, which is 6.57 cm/s for u and 6.14 cm/s for v on this data.
In Figure 6, we show the daily errors of the different methods on the test year. We notice a strong temporal variability of the RMSE, with a notable increase in late summer. Specifically, in August and September, all methods perform worse than in winter, which can be explained by the high kinetic energy of the ocean in summer (Zhai et al., 2008; Kang et al., 2016).
An important challenge of ocean satellite products is to provide real-time estimations, as many applications cannot use products available with too much time delay. In an operational framework, products that are immediately available are called Near Real Time (NRT), whereas those that require a time delay before release are called Delayed Time (DT). While in Table 1 we presented the results obtained on the central image of the time window, we can also display the scores along the 21-day temporal window, as in Figure 7. The central image is a 10-day Delayed Time reconstruction, as it requires the observations of the 10 following days.

Table 2. RMSE of the Eastward (u) and Northward (v) surface currents in cm/s. The currents were estimated by applying the geostrophic approximation (see Equation 1) to the SSH ensemble estimation of the 3 ABED networks.

                 SSH           SSH+nSST      SSH+SST
                 u      v      u      v      u      v
L_sup            12.8   13.9   11.1   12.0   10.1   10.7
L_unsup          13.4   15.5   12.0   14.1   11.1   13.1
L_unsup_reg      12.8   14.3   11.7   12.9   11.0   12.0

Importance of mesoscale eddies
Mesoscale eddies play an important role in ocean circulation and dynamics, and their understanding leads to diverse applications in oceanography and navigation (Chelton, Schlax, & Samelson, 2011). Previous studies underline how these structures transport heat, especially between latitudes 0° and 40° in the North Atlantic (Jayne & Marotzke, 2002), but also salinity (Amores et al., 2017) and plankton (Chelton, Gaube, et al., 2011). In practice, mesoscale eddies and structures are estimated through geostrophic currents derived from satellite altimetry. However, operational satellite products such as duacs OI have too coarse a resolution to accurately resolve mesoscale structures. Performing an OSSE to simulate the satellites' remote sensing, Amores et al. (2018) and Stegner et al. (2021) showed that duacs-like optimal interpolation aggregates small eddies into larger ones (i.e., with a radius greater than 100 km). These interpolations also capture only a small percentage of the eddies of the model simulation (around 6% in the North Atlantic) and change the eddies' distribution and properties. This is why we are interested in finding to what extent our reconstruction methods can detect the small eddies of the ground truth, how well the detected eddies are resolved, and whether their physical properties are conserved.

Automatic eddy detection algorithm: AMEDA
We use the Angular Momentum for Eddy Detection and tracking Algorithm (AMEDA) introduced by Vu et al. (2018) to perform the eddy detection. It is based on the Local Normalized Angular Momentum (LNAM), a dynamic metric first introduced by Mkhinini et al. (2014), that we define hereafter:

LNAM(P_i) = L_i / (S_i + BL_i), with L_i = Σ_j (P_iP_j × V_j) · k, S_i = Σ_j P_iP_j · V_j, BL_i = Σ_j ||P_iP_j|| ||V_j||,

where P_i is the point of the grid where we compute the LNAM, P_j is a neighboring grid point, P_iP_j is the position vector from P_i to P_j, and V_j is the velocity vector at P_j. Thus, the unnormalized angular momentum L_i is computed through a sum of cross products and is bounded by BL_i, so that if P_i is the center of an axisymmetric cyclone (resp. anticyclone), LNAM(P_i) will be equal to 1 (resp. -1). Also, if the circulation field is hyperbolic rather than elliptic, S_i will reach large values, and LNAM(P_i) will be close to 0. All sums are computed on a local neighborhood of P_i, which is a hyperparameter of the method (typically a square centered on P_i). In our case, we used the default parameters, where the square has a side length of 2∆x, with ∆x being the grid resolution (≈9 km).
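As a concrete illustration, the LNAM at one grid point can be sketched as follows. The function name, the neighborhood loop, and the normalization L/(S + BL) are our assumptions based on the description above, not AMEDA's actual code.

```python
import numpy as np

def lnam(u, v, dx, i, j, k=2):
    """Local Normalized Angular Momentum at grid point (i, j) (sketch).

    Assumed form: LNAM = L / (S + BL); `k` sets the half-width of the
    square neighborhood (side length 2*k*dx).
    """
    L = S = BL = 0.0
    for di in range(-k, k + 1):
        for dj in range(-k, k + 1):
            if di == 0 and dj == 0:
                continue
            # position vector Pi -> Pj and velocity at Pj
            rx, ry = dj * dx, di * dx
            vx, vy = u[i + di, j + dj], v[i + di, j + dj]
            L += rx * vy - ry * vx          # z-component of the cross product
            S += rx * vx + ry * vy          # scalar product
            BL += np.hypot(rx, ry) * np.hypot(vx, vy)
    return L / (S + BL) if (S + BL) != 0 else 0.0
```

For a solid-body (axisymmetric) cyclone centered on P_i, the scalar products vanish and L equals BL, so the value is exactly 1, as stated above.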
AMEDA finds potential eddy centers by searching for the local extrema of the LNAM field, more precisely by taking the points P_i where |LNAM(P_i)| > 0.7. The characteristic contour of an eddy is then defined as the closed streamline of maximum velocity which does not include another eddy center. We run the AMEDA algorithm on the geostrophic velocity field of our estimation and on the ground truth currents. We then look for the eddies that are present both in the ground truth and in our estimation. An eddy is said to be detected if the distance between its barycenter and the reference one is smaller than the average of the mean radii of the two characteristic contours. This definition allows "multiple" detections (i.e., colocalization with several eddies); therefore, we exclude eddies that match more than one candidate in the ground truth. For further details about the AMEDA algorithm, we refer the reader to Vu et al. (2018).
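The matching criterion described above can be sketched as follows. This is an illustrative helper, not part of AMEDA; the function name and list-based inputs are our assumptions.

```python
import numpy as np

def match_eddies(est_centers, est_radii, true_centers, true_radii):
    """Match estimated eddies to ground-truth eddies (illustrative sketch).

    An estimated eddy counts as a detection when the distance between
    barycenters is smaller than the mean of the two mean radii; estimated
    eddies matching more than one ground-truth eddy are discarded.
    """
    matches = []
    for a, (ca, ra) in enumerate(zip(est_centers, est_radii)):
        hits = [b for b, (cb, rb) in enumerate(zip(true_centers, true_radii))
                if np.hypot(ca[0] - cb[0], ca[1] - cb[1]) < 0.5 * (ra + rb)]
        if len(hits) == 1:          # exclude ambiguous multiple detections
            matches.append((a, hits[0]))
    return matches
```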

Eddy detection performances
We present the detection scores of the different reconstruction methods, with three data scenarios and three losses. We take the ensemble SSH estimation of the neural networks and perform the AMEDA algorithm on the velocity field derived through the geostrophic approximation (see Equation 1).
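The geostrophic approximation referred to as Equation 1 can be sketched as follows, assuming a regular grid in meters and the standard relations u = -(g/f) ∂SSH/∂y, v = (g/f) ∂SSH/∂x; the function name and signature are ours.

```python
import numpy as np

def geostrophic_currents(ssh, dx, dy, lat):
    """Geostrophic currents from a gridded SSH field (illustrative sketch).

    u = -(g/f) dSSH/dy, v = (g/f) dSSH/dx, with f the Coriolis parameter
    at latitude `lat` and grid spacings dx, dy in meters.
    """
    g = 9.81                                      # gravity (m/s^2)
    f = 2 * 7.2921e-5 * np.sin(np.radians(lat))   # Coriolis parameter
    deta_dy, deta_dx = np.gradient(ssh, dy, dx)   # rows ~ y, columns ~ x
    return -(g / f) * deta_dy, (g / f) * deta_dx
```

An SSH field increasing linearly eastward yields a uniform northward current, as expected from geostrophy.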
In Table 3 we present the F1 score, the recall, and the precision of the methods. The recall tells us the proportion of actual positive instances that were correctly identified by the detection (a recall of 1 means that all ground truth eddies were detected). The precision gauges our trust in the detected eddies (a precision of 1 means that all eddies detected in the estimation were also present in the ground truth). To aggregate recall and precision, we use the F1 score, which is the harmonic mean of recall and precision. A value of 1 means a perfect detection: all ground truth eddies were detected, and the estimation produced no false positives. Data comparison. As expected, no matter which loss we consider, the noise-free temperature detection method outperforms the two other scenarios, with higher F1 scores. Even the noisy SST provides important information for eddy reconstruction, as the SSH-only method yields lower results than the two other scenarios. We also see that, for each loss, the precision scores are less impacted by the input data than the recall is. This means that the SSH-only scenario does not produce many more false detections than the SST methods but misses many more structures.
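The three detection scores can be computed from the match counts as follows; this is a minimal sketch and the helper name is ours.

```python
def detection_scores(n_matched, n_detected, n_truth):
    """Precision, recall and F1 for eddy detection (illustrative sketch).

    n_matched: detected eddies matching a ground-truth eddy;
    n_detected: all eddies found in the estimation;
    n_truth: all eddies in the ground truth.
    """
    precision = n_matched / n_detected if n_detected else 0.0
    recall = n_matched / n_truth if n_truth else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```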

Loss comparison. On the other hand, the loss function used to perform the inversion substantially impacts precision and recall. The regularization of the unsupervised loss brings the detection precision to the level of the supervised method (even higher for SSH-only and SSH+SST) but also reduces the recall of all methods compared to their unregularized versions. In other words, the regularization prevents the neural network from generating false eddies but also keeps it from retrieving some real structures, which leads to lower F1 scores.
Visual comparison. We plot in Figure 8 the SSH maps and eddies detected by AMEDA, and in Figure 9 the relative vorticity ξ computed from the geostrophic currents (see Equation 1) as follows:

ξ = ∂v/∂x − ∂u/∂y. (9)

Relative vorticity is an important quantity in the analysis of surface circulation, as it highlights areas where the flow direction changes sharply. ξ is positive for counterclockwise spin and negative for clockwise spin. In the presented figures, we normalize the relative vorticity fields by the Coriolis factor f. Figures 8 and 9 illustrate an example of the conclusions established in Table 3: the SSH-only reconstruction shows fewer eddies than the ones using SST and aggregates small eddies into larger ones (see highlighted eddies).
We also see the effect of regularization, especially in the relative vorticity fields, which are much smoother than those of the supervised and unregularized inversions. This smoothing effect results in a reduced number of detected eddies, as illustrated by the two highlighted eddies that are detected separately when SST is used without regularization.
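The normalized relative vorticity shown in the figures can be computed from the currents as sketched below, assuming a regular grid in meters; the function name is ours.

```python
import numpy as np

def normalized_relative_vorticity(u, v, dx, dy, lat):
    """Relative vorticity xi = dv/dx - du/dy, normalized by the Coriolis
    factor f (illustrative sketch; dx, dy in meters)."""
    f = 2 * 7.2921e-5 * np.sin(np.radians(lat))
    dv_dx = np.gradient(v, dx, axis=1)   # columns ~ x
    du_dy = np.gradient(u, dy, axis=0)   # rows ~ y
    return (dv_dx - du_dy) / f
```

For a solid-body rotation of angular velocity ω, the relative vorticity is 2ω everywhere, which provides a simple sanity check.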

Physical properties of detected eddies
To further investigate the performance of the eddy detection methods, we analyze the detection outcomes based on the physical characteristics of the eddies. For instance, smaller eddies tend to have shorter lifespans, making them more challenging to detect due to their decreased likelihood of being observed by satellites. Conversely, high-speed eddies correspond to strong sea surface height (SSH) variations and thus exhibit a strong signature in the generated mapping. Figure 10 shows the detection performances as a function of some key parameters, such as maximum radius, lifetime, or maximum velocity along the final closed current line.
As anticipated, using SST and nSST data contributes to the detection of eddies, as indicated by the higher F1 scores achieved in every loss scenario. However, small and short-lived eddies are less frequently detected, resulting in lower recall scores. Specifically, only 17% of the eddies with a radius below 15 km are successfully detected in the best scenario. Nonetheless, except for the unregularized loss function, the precision scores for the detected eddies remain high, even for small and short-lived ones. This observation confirms the previously noted phenomenon whereby the regularization employed in the inversion process prevents the network from generating false eddy detections but also stops it from capturing a significant portion of the actual eddies. This regularization behavior is expected, as forcing a smoothness constraint on the SSH gradient field suppresses some of the small structures.
We also want to assess the model's accuracy in estimating the eddies' physical properties. To this end, we focus on the eddies that were successfully detected by all the methods (3534 eddies out of the 7908 eddies in the ground truth) and compare the physical parameters of the estimated eddies to their values in the corresponding true eddy. We compute the RMSE and bias of the following parameters: maximum radius and velocity of the characteristic contour of the eddies. Once again, Tables 4 and 5 show that SST helps to estimate the eddies' radius and velocity. Nonetheless, there is a bias in radius and velocity: the size of the eddy is statistically overestimated compared to its ground truth, while its speed is systematically underestimated. This is particularly true for the regularized unsupervised loss, because of its smoothness constraint, with a velocity bias accounting for half of the RMSE.

Comparison with state-of-the-art methods on a NATL60 OSSE
Comparing various SSH interpolation methods requires a common benchmark and metrics. The Ocean Data Challenge 2020 (CLS/MEOM, 2020) provides an OSSE similar to the one described in Section 2, together with state-of-the-art estimations and metrics. The included data are the ground truth SSH, nadir-pointing observations, and a simulation of the SWOT (Surface Water and Ocean Topography) observations, a new altimetry technology (Gaultier et al., 2016). In this study, we have excluded the SWOT measurements, as we do not simulate them in our OSSE and focus on nadir-pointing data. The ground truth is the NATL60 simulation (Ajayi et al., 2019), which uses the same physical model (NEMO 3.6) (Madec et al., 2017) as GLORYS12, but at finer scales and without assimilation. Given that the NATL60 model also outputs SST and ocean current fields, we retrieved and used these variables, even though they were not included in the official repository of the challenge. The state-of-the-art frameworks presented in this challenge are the following:
• duacs: the operational linear optimal interpolation leveraging a covariance matrix tuned on 25 years of data;
• dymost (Ubelmann et al., 2016; Ballarotta et al., 2020) and miost (Ardhuin et al., 2020): two variants of the linear optimal interpolation where the Gaussian covariance model is replaced by a non-linear quasi-geostrophic dynamic model (for dymost) or by a wavelet basis (for miost);
• bfn (Le Guillou et al., 2020): a data assimilation method that performs a back-and-forth nudging of a quasi-geostrophic model;
• 4dvarnet (Fablet et al., 2021): a deep learning framework supervised on the Ocean Data Challenge 2020. In this configuration, it only takes SSH observations as input;
• musti (Archambault et al., 2023): an unsupervised neural network fitting SSH along-track observations starting from an SST image. The fact that this method must be fitted to new observations limits its operational use.
This benchmark is not complete, as the convltsm interpolations introduced by S. A. Martin et al. (2023) were trained on real satellite observations only, and the 4dvarnet versions using SST were only computed using SSH observations from nadir-pointing satellites and SWOT data (Fablet et al., 2023). Still, we are interested in evaluating the reconstructions of our networks, trained on our OSSE, on the Ocean Data Challenge 2020 to show the similarity of the two simulated observation systems. To produce our estimation, we regrid the provided data to our resolution (from 0.016° to 0.078°) using trilinear interpolation. We use the SSH simulated observations of the data challenge and the SST of the corresponding NATL60 simulation. The test period includes 42 days of simulation (between 2012/10/22 and 2012/12/02), as defined in the challenge. As such, the comparison is not fully fair, since regridding and not training on the same data might bias the scores obtained. It is still a good way to evaluate the similarity of our OSSE to the Ocean Data Challenge 2020, as our approach obtains performances comparable to the state of the art. Each method is then evaluated using the following metrics, and we sum up the results in Table 6:
• µ and σ_t (in cm) are respectively the RMSE of the SSH and the temporal standard deviation of this RMSE. In the data challenge, these two metrics are normalized by the root mean square of the SSH, but we prefer giving them in centimeters to be coherent with the rest of this work;
• λ_x (in degrees) and λ_t (in days) are two spectral metrics introduced by Le Guillou et al. (2020). We compute respectively the spatial and temporal power spectra of the error; λ_x is then the smallest spatial wavelength at which the power spectrum of the error equals the power spectrum of the signal, and λ_t is its temporal equivalent. For further information, we refer the reader to Le Guillou et al. (2020);
• µ_u and µ_v (in cm/s) are the RMSE between the NATL60 currents and the geostrophic currents of the estimation.
The scores show a predominance of neural-network-based methods (musti, 4dvarnet, and abed), as well as the importance of the SST in the reconstruction (musti and abed). The abed-ssh networks do not perform as well as 4dvarnet, but better than the optimal interpolations (duacs, dymost, miost) and bfn. This analysis further supports using SST data in deep-learning-based methods for these inverse problems. We can expect around 2 cm of error reduction with respect to the operational interpolation scheme duacs with our best method (41% reduction). We also significantly reduce the errors on currents compared to duacs, by 5.7 cm/s for u and 5.4 cm/s for v (35% and 34% error reduction).
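The µ and σ_t metrics can be sketched as follows, assuming gridded (time, y, x) SSH arrays; the exact averaging used by the data challenge may differ.

```python
import numpy as np

def rmse_metrics(ssh_est, ssh_true):
    """SSH error metrics (illustrative sketch): mu is the overall RMSE and
    sigma_t the temporal standard deviation of the per-time-step RMSE.
    Inputs are (time, y, x) arrays in the same unit."""
    err = ssh_est - ssh_true
    rmse_t = np.sqrt((err ** 2).mean(axis=(1, 2)))  # one RMSE per time step
    mu = np.sqrt((err ** 2).mean())
    return mu, rmse_t.std()
```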

Application to real satellite observations
In this section, we focus on applying the developed methods to real observations, with two objectives in mind: showing the utility and realism of our OSSE compared to the pre-existing one, and exploring transfer learning strategies. To evaluate our method on a shared benchmark, we use the Ocean Data Challenge 2021 (CLS/MEOM, 2021), which provides one year of real SSH nadir observations and evaluation metrics. All the evaluations presented in this section are computed on the along-track data from the CryoSat-2 satellite, which was left aside in all the benchmarked methods. The comparison is done on the entire year 2017, which is the year that we left aside from training on our OSSE to avoid data leakage. To be coherent with the area covered by all the methods, the evaluation area is smaller than that of the OSSE (between 34° and 42° North and -65° and -55° East). These real-world measurements present instrumental errors that produce much higher RMSE scores than the ones computed on the OSSE. Also, as we do not have access to complete SSH maps, the metrics used are µ, σ_t, and λ_x (in km this time). For methods requiring SST information, we use satellite SST from NASA/JPL (2019), described in Section 2.3.

OSSE comparison
In this part, we compare the generalization to real satellite data of models trained on our OSSE with that of models trained on the Ocean Data Challenge 2020. As this last dataset provides one year of data, it can also be used to fit neural networks, but as shown in Appendix 6.2, training on a longer dataset drastically improves reconstructions. As the existing OSSE does not provide SST data, it is possible to use NATL60 SST, but the lack of realistic noise leads to a domain gap with real data. To this day, while SSH-only neural networks have been successfully transferred to real SSH data, this is not the case for SST-aware ones. We compare abed trained in a supervised way on our OSSE (SSH-only or using noisy SST), and on the Ocean Data Challenge 2020 (SSH-only or with the NATL60 SST output). To train abed on NATL60 data, we regrid the input and target data to our resolution and use the data split of the challenge (CLS/MEOM, 2020): validation of the training between 2012/10/22 and 2012/12/02, and fitting on the remaining days. We use the same hyperparameters as for the training on our OSSE.
Once the networks are trained on the simulation, we perform inferences on real data, excluding the tracks from the independent satellite. In Table 7, we present the mean and ensemble scores of the models on the Ocean Data Challenge 2021. As expected, abed performs significantly better when trained on our OSSE. Specifically, abed-ssh-sst trained on the Ocean Data Challenge leads to higher errors than its SSH-only version, which shows the domain gap between NATL60 and satellite SST. We conclude that the length of our OSSE and the addition of realistic SST noise enhanced the reconstructions of the real-world SSH.
Transfer OSSE learning to real-world data
Enhancing real-world SSH reconstruction using the information of a simulation is a typical transfer learning problem, where we have access to ground truth in a source domain (OSSE) but not in a target domain (satellite data) (Pan & Yang, 2010). Given the losses described in Section 3.3 and a satellite dataset (see Section 2.3), we can consider three ways to apply our methodology to the Ocean Data Challenge 2021. We partially presented this experiment in (Archambault et al., 2024).
Observation only: Perform an unsupervised training on real-world data, with the loss function described in Equation 6. The training hyperparameters and dataset split are the same as the ones used in the OSSE study (see Section 3.4).
Simulation only: Use the networks trained on our OSSE in a supervised way directly on satellite data. As the test year of our OSSE and that of the Ocean Data Challenge 2021 are the same, there is no data leakage.
Pre-training on OSSE and fine-tuning on satellite data: After the supervised pre-training on OSSE data, we fine-tune the neural network on satellite data for a few epochs using the unsupervised loss. The fine-tuning is done using a small learning rate of 1 × 10⁻⁵ and a decay of 0.9. We use early stopping with a patience of 8 epochs and save the best model on the validation set.
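The early-stopping and learning-rate schedule of the fine-tuning stage can be sketched as follows. This is illustrative logic only, not the authors' training code, and we interpret the decay of 0.9 as a per-epoch multiplicative factor.

```python
def fine_tune_schedule(val_losses, patience=8, lr0=1e-5, decay=0.9):
    """Early-stopping logic of the fine-tuning stage (illustrative sketch).

    The learning rate starts at lr0 and is multiplied by `decay` each
    epoch; training stops once the validation loss has not improved for
    `patience` epochs, and the best epoch is kept.
    """
    best, best_epoch, lrs = float("inf"), -1, []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr0 * decay ** epoch)     # per-epoch learning rate
        if loss < best:
            best, best_epoch = loss, epoch   # new best model saved here
        elif epoch - best_epoch >= patience:
            break                            # patience exhausted: stop
    return best_epoch, lrs
```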
We present in Table 8 the RMSE on the Ocean Data Challenge 2021 of 3 abed networks trained with the previously mentioned methodologies. One of the first conclusions we can draw from these reconstruction scores is the value of our OSSE in the training process. The networks fitted on the simulation perform better than their equivalents trained with observations only, except for the network trained using noise-free SST. This shows that our SST noise is realistic, as introducing SST noise during pre-training is beneficial for generalization to satellite data. Secondly, in every data scenario, the pre-trained and fine-tuned networks perform significantly better than their versions trained on observations or simulation only. In particular, once fine-tuned, the networks pre-trained on nSST and on SST lead to close performances, whereas without fine-tuning, the network trained on noise-free SST produces the worst reconstruction. Given an appropriate fine-tuning strategy, the features learned on noise-free SST that do not apply to satellite data are effectively modified. From this experiment, we conclude that combining supervised training on our OSSE with unsupervised re-fitting on satellite data increases performance, especially if SST is used.

Learning method              SSH           SSH+nSST      SSH+SST
Observation                  7.07 - 6.75   6.63 - 6.27   -
Simulation                   6.63 - 6.35   6.28 - 6.06   6.89 - 6.68
Pre-training & Fine-tuning   6.49 - 6.28   6.02 - 5.82   6.04 - 5.84

In Table 9, we compare our method to the state-of-the-art interpolation methods provided in the context of the Ocean Data Challenge 2021. The included methods are the same as in Table 6, plus convltsm-ssh and convltsm-ssh-sst (S. A. Martin et al., 2023). We give the ensemble scores of the three pre-trained and fine-tuned abed networks using only SSH, or SSH and the noised SST. The enhanced scores of abed-ssh-sst and convltsm-ssh-sst compared to their SSH-only versions emphasize the improvements brought by the SST. abed, convltsm, and 4dvarnet lead to better SSH gridding than optimal-interpolation-based methods (duacs, dymost, miost), both in terms of RMSE and effective spatial resolution. We also note a significant drop in RMSE score for the bfn method compared to its OSSE reconstruction, which shows that the idealized QG model is less applicable to real-world observations.
In Figure 11, we present the SSH maps of the different reconstruction methods with their associated relative vorticity (see Equation 9). The first three methods (dymost, duacs, miost) present smooth vorticity maps as a consequence of the optimal interpolation. All the vorticity maps from neural-network-based methods (4dvarnet, musti, convltsms, and abeds) have higher contrast, but also some artifacts due to convolution operations. 4dvarnet, in particular, produces very high-frequency variations on which we can see the input satellite paths. We suppose this is a consequence of the U-Net's skip connections, whereas the other networks have Encoder-Decoder architectures, less prone to producing high-frequency noise. For the last four methods, convltsm-ssh, convltsm-ssh-sst, abed-ssh, and abed-ssh-sst, we highlight areas where small structures are visible in the vorticity maps of the SST-using methods but not in their SSH counterparts.
The similar shape of the structures between convltsm-ssh-sst and abed-ssh-sst suggests that they are linked to the use of SST and not the deep learning method.

Summary
In this work, we designed a new OSSE emulating 20 years of satellite observations of SSH and SST, whereas the previously existing OSSE provided only one year of simulated SSH observations (CLS/MEOM, 2020). We were able to train an Attention-Based Encoder-Decoder using 3 different loss functions (2 of them learning the reconstruction without ground truth) on three different sets of data (SSH only, SSH and noised SST, SSH and noise-free SST). We show a systematic interpolation improvement thanks to the use of SST. Using temperature data (noisy or not), the unsupervised inversion outperforms even the supervised SSH-only neural network (3.86 cm of RMSE for the unsupervised noisy-SST method against 4.18 cm for the supervised SSH-only method). This shows the importance of contextual information to constrain this inverse problem, even when learning with observations only.
Using AMEDA, an automatic eddy detection algorithm, we were able to identify cyclones and anticyclones in the ground truth and compare them with the eddies detected in the geostrophic approximation of the different mappings. This allows a deeper physical interpretation than the SSH reconstruction alone. We conclude that SST aids in capturing finer structures that might be overlooked by SSH-only methods, and that SST-using methods better render the key physical properties of the detected eddies, such as size, speed, or center position. Furthermore, in unsupervised reconstruction, we show that the non-regularized and regularized inversions have close detection scores, but their errors are different. The regularized inversions exhibited lower recall scores, indicating that certain eddies were not detected due to the smoothing effect of the regularization process. However, they demonstrated higher precision scores, implying increased confidence in the successfully detected eddies.
We evaluate abed trained using the data from our OSSE on the Ocean Data Challenge 2020 and compare it with state-of-the-art interpolation techniques. We show that the utilization of SST led to a substantial improvement of 41% in terms of RMSE for SSH compared to the widely used L4 product from duacs. Moreover, we observed significant improvements of 34% and 35% for the u and v currents, respectively. These findings present promising perspectives for advancing satellite SSH gridding through the application of deep learning methodologies and the fusion of diverse physical information.
Finally, we presented a novel training strategy using jointly OSSE and real-world satellite observations. We proposed to perform a transfer from the OSSE to the satellite domain by pre-training the neural network on the OSSE and fine-tuning it on a real-world dataset in an unsupervised way. Comparing the same network trained following three strategies (on simulation only, on observations only, or with the strategy introduced here), we found that using simulation and satellite data together leads to better performance. Specifically, our transfer method achieves state-of-the-art performance on the Ocean Data Challenge 2021, on which we report an RMSE improvement of 24% compared to duacs.

Perspectives
SSH Forecast. This study focused on a delayed-time interpolation of the SSH. However, near-real-time and forecast data are often useful in many operational applications, such as navigation and meteorology. In future work, we would be interested in extending the output window into the future compared to the input one. In doing so, the neural network would be trained to interpolate and forecast the SSH simultaneously. We would be interested in comparing a method performing the two tasks simultaneously to one performing them successively.
Global interpolation. Furthermore, many challenges still need to be addressed to move toward a global gridded SSH product. For instance, as the geostrophic equilibrium depends on the surface projection of the Coriolis force, and thus on the latitude considered, we may need to train a model on several areas at different latitudes. Also, we can wonder which strategy is more efficient: training a global model or several local models, each specialized for a latitude range or geographical area. Closed seas and coastal waters also have very different physical interactions and might need to be reconstructed by different methods.
Using different input and output data. We have demonstrated the benefit of using multi-physical information, specifically SST, to enhance SSH reconstruction through the implementation of a flexible neural network framework. The integration of data from diverse physical sources exhibits promising outcomes, yet conventional model-based methods encounter challenges due to noise and observational difficulties associated with real-world data. In contrast, machine learning opens doors to augment these methods with diverse and abundant data sources. For instance, we employed noisy yet complete SST data in our investigation, but using L3 SST products is also possible. Furthermore, an intriguing prospect arises as to whether Level 4 (L4) and Level 3 (L3) SST products can be effectively combined, thereby potentially yielding even more precise and exhaustive information. Other physical measurements might improve the reconstruction, such as chlorophyll maps that track plankton advected by currents (Kahru et al., 2012).

In Equations 10 and 11, Y_ssh_i is the i-th measurement of SSH, △s_i is the ground distance between the SSH measurements, and △s′_i is the ground distance between the two first-derivative approximations. The lists of first and second spatial derivatives, ∂/∂s Y_ssh and ∂²/∂s² Y_ssh, are recentered on new coordinates, corresponding to the dual coordinates of Y_ssh and ∂/∂s Y_ssh, respectively. We only compute the spatial derivatives from observations coming from the same satellite, and only if the measurements are taken less than two seconds apart. This way, we estimate spatial derivatives only where the rate of change is a valid approximation of the derivative.
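The rate-of-change approximations of Equations 10 and 11 can be sketched as follows; the function name and array-based interface are our assumptions.

```python
import numpy as np

def along_track_derivatives(ssh, s):
    """First and second along-track SSH derivatives by rates of change
    (illustrative sketch of Equations 10 and 11).

    ssh: SSH measurements of one satellite, sorted in time;
    s: curvilinear abscissa (ground distance) of each measurement.
    Each derivative lives on dual coordinates: midpoints of the
    coordinates it was differenced from.
    """
    ssh, s = np.asarray(ssh, float), np.asarray(s, float)
    d1 = np.diff(ssh) / np.diff(s)       # rate of change of SSH
    s1 = 0.5 * (s[:-1] + s[1:])          # dual coordinates of d1
    d2 = np.diff(d1) / np.diff(s1)       # rate of change of d1
    s2 = 0.5 * (s1[:-1] + s1[1:])        # dual coordinates of d2
    return (d1, s1), (d2, s2)
```

For a quadratic SSH profile along the track, the second derivative recovered this way is constant, as expected.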

Impact of the OSSE temporal length on training
Our OSSE dataset is composed of 7194 days, which leads to 5504 training days once the partition between train, validation, and test sets is made. To evaluate the interest of using more data to constrain the neural network, we train the abed network in the optimal configuration (supervised and using noise-free SST). We compare the scenario where all the samples are seen during training with those where only half, a quarter, or a single year of the dataset is used. The validation and test sets remain unchanged, while the training subset consists of the first consecutive days of the initial training set.

Table 10. Mean RMSE score (in cm) of 3 abed networks trained on our OSSE in a supervised manner using SSH and noise-free SST. We compare the situations where the full dataset, half of it, a quarter of it, or one year is used.

Impact of the SST deseasonalization on reconstruction
In the results presented in this work, we deseasonalized the SST data in the inputs of the neural networks. In Table 11, we show the RMSE of the neural networks using "native" SST and the ones using deseasonalized SST. We see that this preprocessing operation decreases the RMSE in every scenario.
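A simple day-of-year deseasonalization can be sketched as follows; the paper's exact climatology computation may differ.

```python
import numpy as np

def deseasonalize(sst, day_of_year):
    """Remove a mean seasonal cycle from an SST time series (illustrative
    sketch). sst is a (time, y, x) array; day_of_year gives each sample's
    calendar day."""
    out = sst.astype(float).copy()
    for d in np.unique(day_of_year):
        sel = day_of_year == d
        out[sel] -= sst[sel].mean(axis=0)   # subtract per-day climatology
    return out
```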

Figure 1 .
Figure 1. Images of the ground truth SSH from GLORYS12, the simulated along-track measurements, and the difference.

Figure 2 .
Figure 2. Images of our cloud cover, the ground truth SST from GLORYS12, the noised SST, and the difference.

Figure 3 .
Figure 3. Images of satellite observations of the SSH and the SST, respectively.

Figure 4 .
Figure 4. The architecture of the proposed Attention-Based Encoder-Decoder (abed) neural network. It is designed to take a time series of 21 images of SSH, with or without a time series of SST. The encoder divides the spatial dimensions of the images by 4 through two "down-blocks".

Figure 6 .
Figure 6. RMSE of the different reconstructions during the test year (2017).

Figure 7 .
Figure 7. RMSE of the different reconstructions along the time window. The errors at a time delay of -20 correspond to an anti-causal scheme (knowing only future observations), whereas a time delay of 0 corresponds to a causal scheme (knowing no future observations). Knowing both past and future observations leads to the optimal reconstruction at a time delay of -10.

Figure 8 .
Figure 8. SSH maps and detected eddies on 1 June 2017 in our OSSE. The first line presents the true SSH, the noised SST, and the true SST, on which we plot the eddies detected on the true SSH. The second, third, and last lines present respectively the inversions using L sup, L unsup, and L unsup reg. The first, second, and last columns present the maps using the SSH-only, SSH+nSST, and SSH+SST data, respectively. Each SSH map is the ensemble reconstruction of 3 networks with their associated eddies.

Figure 9 .
Figure 9. Relative vorticity (normalized by the Coriolis factor) and detected eddies on 1 June 2017 in our OSSE. The first line presents the true relative vorticity. The second, third, and last lines present the neural networks trained with L sup, L unsup, and L unsup reg. The first, second, and last columns present the SSH-only, SSH+nSST, and SSH+SST interpolations. Each relative vorticity map is computed from the ensemble SSH estimation of the 3 networks.

Figure 10 .
Figure 10. Detection scores of the different methods on eddies separated by radius (first row), lifetime (second row), and maximum velocity (last row). The considered scores are F1 (first column), recall (second column), and precision (third column). The recall tells the proportion of actual positive instances that were correctly identified, the precision gauges the trust that we can put in the detected eddies, and the F1 score aggregates these two values.

Figure 11 .
Figure 11. SSH maps and Relative Vorticity maps (normalized by the Coriolis frequency) of the methods from Table 9. The SSH maps are used to compute geostrophic currents, from which we derive Relative Vorticity. Due to the different areas covered by the methods, we plot the SSH and RV on a portion of the training area: from 34.25° to 41.75° North and from -64.75° to -56.75° East. On the last four relative vorticity maps, we highlight some regions where small structures are visible in the SST-using interpolations and not visible (or less salient) in their SSH counterparts.
Along-track spatial derivatives
We calculate the SSH's first and second spatial derivatives along the satellite ground track as described in Equations 10 and 11. Given Y_ssh, the list of SSH measurements from one satellite (sorted in time), we approximate the derivatives by rates of change of the SSH:

∂/∂s Y_ssh_i ≈ (Y_ssh_{i+1} − Y_ssh_i) / △s_i, (10)
∂²/∂s² Y_ssh_i ≈ ((∂/∂s Y_ssh)_{i+1} − (∂/∂s Y_ssh)_i) / △s′_i. (11)

Figure 5. Computational graph of the proposed unsupervised interpolation method. The neural network input is a 21-day time series of SSH satellite observations, excluding data from a single satellite, and optionally includes SST measurements. The network estimates a time series of SSH field states, upon which the observation operator is subsequently applied in order to deduce Ŷ_ssh. Finally, the Mean Squared Error between Ŷ_ssh and Y_ssh is used to control the network.

Table 1 .
SSH reconstruction RMSE in centimeters (mean score on the left and ensemble score on the right) of 3 abed networks. The interpolation is trained using the 3 different losses described in Section 3.3 with the following settings: SSH-only interpolation, SSH and noised SST, and SSH and noise-free SST. All metrics are given on the central image of a 21-day time window.

Table 3 .
Scores of the AMEDA eddy detection performed on the ensemble estimation of the abed interpolation. The considered scores are the precision, the recall, and the F1 score.

Table 4 .
Eddies' maximum radius RMSE and bias (km). The eddy detection is performed on the geostrophic currents of the ensemble estimation, and the bias is computed as the estimated radius minus the ground-truth radius.

Table 6 .
Comparison of the state-of-the-art reconstruction methods on the Ocean Data Challenge 2020. SST stands for whether or not the reconstruction methods are using SST, and SUP stands for whether or not the methods are supervised.

Table 7 .
Comparison of abed networks trained on our OSSE to the ones trained on the Ocean Data Challenge 2020. All the metrics are computed on independent real data of the Ocean Data Challenge 2021. The left scores are the mean performances over three networks and the right ones are the ensemble scores.

Table 8 .
Along-track SSH RMSE in centimeters (mean score on the left and ensemble score on the right) of 3 abed networks, computed on 1 year of data provided by the Ocean Data Challenge 2021. The training strategies include observation-only training (with satellite SSH and SSH+nSST), simulation-only training (SSH, SSH+nSST, SSH+SST), and fine-tuned networks (SSH, SSH+nSST, SSH+SST). For the fine-tuned networks, when a network is pre-trained with noise-free SST, it is still fine-tuned with noisy satellite SST.

Table 9 .
Comparison of the state-of-the-art reconstruction methods on the real satellite data of the Ocean Data Challenge 2021. SST stands for whether or not the reconstruction methods are using SST. abed-ssh and abed-ssh-sst stand for the ensemble scores of our pre-trained and fine-tuned networks.
Table 10 presents the RMSE of the reconstructions on the test year of our OSSE. The scores of the networks trained with different dataset sizes clearly show better reconstruction performance when the size increases.