DNN‐Based Retrieval of Arctic Sea Ice Concentration From GNSS‐R and Its Effects on the Synoptic‐Scale Forecasting as Supplementary Observation Source

Using delay‐Doppler maps (DDMs) of Global Navigation Satellite System Reflectometry (GNSS‐R) from the TechDemoSat‐1 satellite and accounting for the interaction between sea ice and ocean, we propose an innovative method for retrieving Arctic sea ice concentration (SIC) based on a deep neural network. The method shows the potential of future GNSS‐R applications for Arctic missions. Compared with the SIC products from Hamburg University, the root mean square errors (RMSE) of the retrieved results in March and June 2016 are 0.0284 and 0.0415, respectively. When the retrieved GNSS‐R SIC data are assimilated as a supplement to passive microwave remote‐sensing data, they improve the accuracy of Arctic SIC forecasts. In some sea ice edge regions in particular, the regional RMSE of the joint assimilation decreases by up to approximately 17% at the 24‐hr forecast time, and by over 5% at 72 hr, compared with assimilating only the remote‐sensing data.

provides detailed information about sea ice edge changes (Torres et al., 2012). In high-resolution dynamic assimilation models and machine learning models, ice chart products derived from SAR are increasingly being applied in sea ice forecasting (Fritzner et al., 2020). Notably, SIC retrieval algorithms based on both passive microwave and SAR images have specific uncertainties. For instance, high-frequency algorithms may erroneously detect ice over open water (Lu et al., 2018), and SAR images are strongly affected by the incidence angle and the physical properties of sea ice (Gao et al., 2019). The effectiveness of assimilation depends on whether the observations provide useful and accurate information, so errors arising from these sources may degrade the assimilation of SIC.
Global Navigation Satellite System-Reflectometry (GNSS-R) is a new low-cost remote-sensing technology (Hall & Cordey, 1988). GNSS-R is favored in sea ice satellite remote-sensing research because of its small size and mass, low power consumption and acceptable spatial resolution (Fabra et al., 2010, 2012; Komjathy et al., 2000; Yan & Huang, 2019a). Since the launch of the TechDemoSat-1 (TDS-1) satellite in 2014, millions of delay-Doppler maps (DDMs) have become available, which has greatly accelerated the use of DDMs for sea ice detection and classification (Cartwright et al., 2019; Jiang et al., 2022; Tye et al., 2015; Zhu et al., 2017). Yan et al. (2017) first demonstrated the feasibility of retrieving SIC from DDMs with neural networks. Subsequently, further studies retrieved SIC by relating DDMs to SIC with deep learning or statistical models (Yan & Huang, 2018, 2019b; Zhu et al., 2021). Whether high-accuracy SIC can be obtained from a retrieval model built on two premises, that sea ice and seawater can be distinguished directly from DDMs and that many physical factors are closely related to changes in SIC, is a question worth exploring further. It is also currently unclear whether SIC retrieved from GNSS-R has positive effects when applied to Arctic sea ice forecasting.
In this study, we propose a novel method to intelligently retrieve Arctic SIC from DDMs, taking into account the interaction between sea ice and ocean. First, sea ice and seawater are identified from TDS-1 DDMs using the DDM pixel-number eigenvalue method. On this basis, a SIC retrieval model based on a deep neural network (DNN) is established by extracting sea ice attributes from historical satellite-based SIC products and combining them with ocean and atmospheric reanalysis data related to the thermodynamic and dynamic processes of sea ice change. The data and method are described in Section 2, and the retrieved SIC is evaluated and verified in Section 3. In Section 4, to compensate for the limited spatial coverage of GNSS-R, the retrieved SIC and the passive microwave remote-sensing SIC product are jointly assimilated, and a series of retrospective real-time SIC forecast experiments is carried out.

Data and Method
TDS-1 is a satellite technology demonstration platform designed by Surrey Satellite Technology Ltd. It carries eight separate payloads operated in an 8-day cycle. The data used in this study, from the Space GNSS Receiver Remote Sensing Instrument on TDS-1, are provided in the form of DDMs. DDMs reflect the roughness of the reflecting surface; because sea ice and seawater differ greatly in roughness, their DDMs also differ. Generally, the DDM of seawater is more spread out along the delay and Doppler axes than that of sea ice (Yan & Huang, 2016). In this study, SIC at the GNSS-R subsatellite points is retrieved in two parts, as shown in Figure 1. First, normalized DDMs are obtained through a series of preprocessing steps, such as noise removal. To describe the degree of diffusion of a DDM, the number of pixels in the normalized DDM whose power exceeds a preset threshold (set to 0.5) is defined as the DDM observation value (Yan & Huang, 2016). Normalized DDMs from outside the study period are then selected as training data, and, following the histogram method, the intersection of the sea ice and seawater histograms is taken as the DDM observation threshold (Cartwright et al., 2019). DDM observation values larger than this threshold are classified as seawater and the others as sea ice, which separates sea ice from seawater.
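The ice/water discrimination described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the list-of-rows DDM layout, the bin edges and the rule used to locate the histogram crossing (the first bin in which the seawater histogram exceeds the sea ice histogram) are assumptions. Only the 0.5 power threshold and the rule "larger than the threshold means seawater" come from the text.

```python
# Sketch of DDM-based ice/water discrimination. Function names, the list-of-rows
# DDM layout and the histogram-crossing rule are assumptions, not the authors' code.

POWER_THRESHOLD = 0.5  # preset power threshold on the normalized DDM (from the text)


def ddm_observation_value(normalized_ddm, power_threshold=POWER_THRESHOLD):
    """Count DDM pixels whose normalized power exceeds the threshold."""
    return sum(1 for row in normalized_ddm for power in row if power > power_threshold)


def normalized_histogram(values, edges):
    """Fraction of values in each bin [edges[k], edges[k + 1]); values outside are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    total = sum(counts) or 1
    return [c / total for c in counts]


def intersection_threshold(ice_values, water_values, edges):
    """Left edge of the first bin where the seawater histogram exceeds the sea ice one.

    Assumes compact ice DDMs give small observation values and diffuse water DDMs
    give large ones, so the two histograms cross once between their modes.
    """
    h_ice = normalized_histogram(ice_values, edges)
    h_water = normalized_histogram(water_values, edges)
    for k, (p_ice, p_water) in enumerate(zip(h_ice, h_water)):
        if p_water > p_ice:
            return edges[k]
    return edges[-1]


def label_ice_water(observation_value, threshold):
    """1 = sea ice, 0 = seawater; values larger than the threshold are seawater."""
    return 0 if observation_value > threshold else 1


# Usage with toy observation values from DDMs of known ice and known water.
edges = list(range(0, 41, 5))
threshold = intersection_threshold([3, 4, 5, 4, 3, 6], [20, 25, 30, 22, 18, 28], edges)
compact_ddm = [[0.1, 0.2, 0.1], [0.3, 1.0, 0.4], [0.1, 0.2, 0.1]]
print(threshold, label_ice_water(ddm_observation_value(compact_ddm), threshold))  # prints: 15 1
```

A compact, ice-like DDM has few pixels above the power threshold, so its observation value falls below the histogram-derived threshold and it is labeled 1.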
Sea ice and seawater are then labeled 1 and 0, respectively. Because the real Arctic sea ice environment results from the interaction of many elements, the retrieval of SIC requires a nonlinear relationship between the inputs and the target. Deep learning has shown great advantages in dealing with complex nonlinear problems in the marine field in recent years (Reichstein et al., 2019). A DNN is an artificial neural network with multiple hierarchical layers that has proven effective in retrieving sea ice and ocean variables (Ding et al., 2020; Wang et al., 2021). Eight types of parameters are used as inputs: the ice/water value (1 or 0) at the GNSS-R subsatellite point, its geographical location, historical SIC remote-sensing data, and reanalysis data of ocean and atmospheric elements (sea surface temperature (SST), 2 m temperature, 10 m u-component and v-component of wind, and mean sea level pressure). The daily SIC observations are produced by the Arctic Radiation and Turbulence Interaction STudy Sea Ice (ASI) algorithm of Hamburg University, with a spatial resolution of 12.5 km × 12.5 km (hereafter, ASI-SIC). These data also serve as the truth values for the DNN model output. The monthly average SIC is derived from the NSIDC-0051 data set, which is based on multiple passive microwave instruments and released by the National Snow and Ice Data Center, with a spatial resolution of 25 km × 25 km. The reanalysis data are ERA5 products at 0000 UTC from the European Centre for Medium-Range Weather Forecasts, with a spatial resolution of 0.25° × 0.25°. All of the above data are first normalized to the 0-1 range. Then, the four grid points closest to each subsatellite point are found, and their values are interpolated to the subsatellite point by bilinear interpolation over the irregular quadrilateral they form.
After preprocessing, the model has 23 inputs and 1 output (see Table S1 in Supporting Information S1 for details).
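The normalization and interpolation steps can be illustrated with a short sketch. Min-max scaling is assumed for "normalized to the 0-1 range", and the irregular-quadrilateral bilinear interpolation is implemented by inverting the bilinear map with Newton's method, which is one common way to do this; the helper names, corner ordering and tolerances are hypothetical.

```python
# Sketch of input preprocessing: min-max normalization and bilinear interpolation
# inside an irregular quadrilateral. Helper names, corner order and tolerances are
# hypothetical; the Newton inversion is one common technique, assumed here.


def min_max_normalize(values, vmin, vmax):
    """Scale values linearly so that vmin -> 0 and vmax -> 1."""
    span = vmax - vmin
    return [(v - vmin) / span for v in values]


def quad_bilinear(corners, corner_values, x, y, tol=1e-12, max_iter=50):
    """Interpolate corner values to (x, y) inside a convex quadrilateral.

    corners and corner_values are ordered P00, P10, P11, P01 (counter-clockwise).
    The local coordinates (s, t) in [0, 1]^2 are found by Newton iteration on the
    bilinear map X(s, t) = (1-s)(1-t)P00 + s(1-t)P10 + s*t*P11 + (1-s)*t*P01.
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
    s, t = 0.5, 0.5  # start at the parametric centre of the quadrilateral
    for _ in range(max_iter):
        # Residual between the mapped point and the target point
        fx = (1 - s) * (1 - t) * x0 + s * (1 - t) * x1 + s * t * x2 + (1 - s) * t * x3 - x
        fy = (1 - s) * (1 - t) * y0 + s * (1 - t) * y1 + s * t * y2 + (1 - s) * t * y3 - y
        # Jacobian of the bilinear map
        dxs = (1 - t) * (x1 - x0) + t * (x2 - x3)
        dxt = (1 - s) * (x3 - x0) + s * (x2 - x1)
        dys = (1 - t) * (y1 - y0) + t * (y2 - y3)
        dyt = (1 - s) * (y3 - y0) + s * (y2 - y1)
        det = dxs * dyt - dxt * dys
        # Newton step solved with Cramer's rule
        ds = (fx * dyt - dxt * fy) / det
        dt = (dxs * fy - dys * fx) / det
        s, t = s - ds, t - dt
        if abs(ds) < tol and abs(dt) < tol:
            break
    v00, v10, v11, v01 = corner_values
    return (1 - s) * (1 - t) * v00 + s * (1 - t) * v10 + s * t * v11 + (1 - s) * t * v01


# Usage: a field that is linear in space, f = 1 + 2x + 3y, is reproduced exactly.
corners = [(0.0, 0.0), (2.0, 0.2), (2.5, 1.8), (-0.2, 1.5)]
values = [1 + 2 * cx + 3 * cy for cx, cy in corners]
print(round(quad_bilinear(corners, values, 1.0, 0.8), 6))  # prints: 5.4
```

Because bilinear interpolation on a quadrilateral reproduces any field that is linear in x and y, the linear test field is a convenient sanity check for the inversion.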
The model consists of three neuron layers with 120, 100, and 80 neurons, respectively. The exponential linear unit (ELU) (Clevert et al., 2016) is used as the activation function for the first two layers. To keep SIC physically meaningful, the activation function of the last layer is set to sigmoid, which constrains the output to the range 0-1. The parameters of the DNN model, such as the weights and biases of each connection layer, are updated by backpropagating the mean squared error between the predicted and true values. In this study, GNSS-R data from 1 September 2015 to 1 July 2016 were selected and divided into two parts for training (the range contains gaps; see Table S2 in Supporting Information S1 for details). In the first part, a total of 279,019 GNSS-R samples (86%) from 1 September 2015 to 24 February 2016 were used to train the DNN model, and 44,559 samples (14%) from 1 to 26 March 2016 were used as an independent test set. In the second part, the training and test sets cover 13 January to 14 June 2016 (232,177 samples, 81%) and 15 June to 1 July 2016 (54,743 samples, 19%), respectively.
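A minimal forward pass for the network described above is sketched below in plain Python. The layer widths (120, 100, 80), the ELU and sigmoid activations, the 23 inputs and the mean squared error loss follow the text. Reading the three layers as hidden layers followed by a single sigmoid output neuron, using ELU on the 80-unit layer, and the weight initialization are assumptions. Training by backpropagation is omitted; in practice an automatic-differentiation framework would compute the gradients.

```python
import math
import random

# 23 inputs -> hidden layers of 120, 100, 80 neurons -> 1 output.
# Assumption: the sigmoid "last layer" is read as a single output neuron.
LAYER_SIZES = [23, 120, 100, 80, 1]


def elu(z, alpha=1.0):
    """Exponential linear unit (Clevert et al., 2016)."""
    return z if z > 0 else alpha * (math.exp(z) - 1.0)


def sigmoid(z):
    """Numerically stable logistic function; keeps the SIC estimate in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# ELU on the 80-unit layer is an assumption; the text fixes ELU only for the first two.
ACTIVATIONS = [elu, elu, elu, sigmoid]


def init_params(sizes, seed=0):
    """Uniform fan-in-scaled weights and zero biases (initialization is an assumption)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def forward(params, x):
    """Propagate one normalized 23-element input vector to a SIC estimate."""
    h = x
    for (weights, biases), act in zip(params, ACTIVATIONS):
        h = [act(sum(w * v for w, v in zip(row, h)) + b) for row, b in zip(weights, biases)]
    return h[0]


def mse(predictions, targets):
    """Mean squared error: the loss whose gradient is backpropagated in training."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


params = init_params(LAYER_SIZES)
sic = forward(params, [0.5] * 23)
print(0.0 < sic < 1.0)  # prints: True
```

The sigmoid output guarantees a value in (0, 1) for any input, which is why it suits a concentration target.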

Retrieved GNSS-R SIC Estimation and Accuracy
Figure 2a shows an example of GNSS-R observation points covering part of the Arctic region, illustrating the locations of the ASI-SIC 12.5 km grid points and the GNSS-R subsatellite points. Compared with the grid points, the GNSS-R data points clearly have lower spatial coverage but higher spatial resolution. Where two GNSS-R tracks intersect (e.g., the blue box in Figure 2a), they provide additional sea ice information at four to six points in the data gap between four neighboring grid points. A single track can still provide two to three such points, which may give more detailed information for capturing sea ice changes. Qualitatively, the SIC retrieved by the DNN model (hereafter, GNSS-SIC) shows almost the same spatial distribution as ASI-SIC in both winter (Figure 2b) and summer (Figure S1 in Supporting Information S1). In summer, however, the two differ somewhat near the sea ice edge: whereas SIC changes abruptly in ASI-SIC, GNSS-SIC tends to show a more gradual transition. This may result from their different spatial resolutions.
The accuracy of GNSS-SIC was quantitatively evaluated against ASI-SIC interpolated to the GNSS-R subsatellite points, which served as the true values. Five evaluation indicators are used: mean deviation (Bias), mean absolute deviation (MAD), root mean square error (RMSE), correlation coefficient (Coff) and dispersion index (SI) (Yang et al., 2020). As shown in Figure 2c, the GNSS-SIC scatter for all samples in March is tightly clustered, with an RMSE of only 0.0284, a Coff of 0.9975, and small Bias (−0.0018), MAD (0.0122) and SI (3.97%). Specifically, 95.36% of GNSS-SIC data points differ from the true value by less than 0.05 in absolute terms, while only 0.37% differ by more than 0.2 (Figure 2d). The accuracy of GNSS-SIC decreases slightly in June, with SI rising to 7.63% and about 400 (0.68%) data points differing from the true value by more than 0.2 (Figure S2 in Supporting Information S1). Nonetheless, GNSS-SIC still has a low RMSE (0.0415) and a significant positive correlation with the true value (0.9951), indicating that the DNN retrieval model is reasonably robust. The spatial distribution of the differences between GNSS-SIC and ASI-SIC is further analyzed in Figure S3 in Supporting Information S1. In March, GNSS-SIC deviates only in parts of the marginal ice zones (MIZ). In June, as sea ice gradually melts, the areas with significant differences along the outer edge of the sea ice expand, mostly with positive deviations, and small negative deviations also appear within the ice zone.
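The five indicators can be computed as in the sketch below. Bias, MAD, RMSE and the Pearson correlation follow their standard definitions. The text does not give a formula for SI, so RMSE divided by the mean of the true values (in percent) is assumed here; the helper for the share of points within a tolerance mirrors the "less than 0.05" statistic.

```python
import math


def evaluation_metrics(pred, truth):
    """Bias, MAD, RMSE, Pearson correlation (Coff) and SI between retrieved and true SIC.

    SI is assumed to be RMSE / mean(truth) * 100 (%); the text does not give its formula.
    """
    n = len(pred)
    diff = [p - o for p, o in zip(pred, truth)]
    bias = sum(diff) / n
    mad = sum(abs(d) for d in diff) / n
    rmse = math.sqrt(sum(d * d for d in diff) / n)
    mean_p = sum(pred) / n
    mean_o = sum(truth) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, truth))
    ss_p = sum((p - mean_p) ** 2 for p in pred)
    ss_o = sum((o - mean_o) ** 2 for o in truth)
    coff = cov / math.sqrt(ss_p * ss_o)
    si = rmse / mean_o * 100.0
    return {"Bias": bias, "MAD": mad, "RMSE": rmse, "Coff": coff, "SI": si}


def share_within(pred, truth, tol):
    """Fraction of points whose absolute difference from the truth is below tol."""
    hits = sum(1 for p, o in zip(pred, truth) if abs(p - o) < tol)
    return hits / len(pred)


metrics = evaluation_metrics([0.1, 0.5, 0.9], [0.0, 0.5, 1.0])
print({name: round(value, 4) for name, value in metrics.items()})
```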
In general, the GNSS-SIC obtained based on the DNN retrieval model not only provides more detailed information, but also has good accuracy in different periods.

Impact of Retrieved GNSS-R SIC on Forecast
Given the higher spatial resolution and good accuracy of GNSS-SIC, combining it with satellite-based SIC of high spatial coverage has the potential to improve Arctic SIC forecasts compared with assimilating a single data source. A set of retrospective forecast experiments is therefore performed to study this question. The experiments use the Massachusetts Institute of Technology general circulation model (MITgcm) (Marshall et al., 1997), which simulates SIC, sea ice thickness (SIT), and snow cover through two-way coupling between its sea ice and ocean modules (Zhang et al., 1998). The model uses a cubed-sphere grid; the grid covering the Arctic is locally orthogonal, with an average horizontal resolution of 18 km. The MITgcm has been shown to perform well in predicting Arctic sea ice variables (Yang et al., 2014). As shown in Table 1, four comparative experiments are performed: a control experiment (Exp_Ctrl) integrated forward without any assimilation, and three experiments that assimilate different SIC data: ASI-SIC and GNSS-SIC (Exp_Comb), only ASI-SIC (Exp_ASI), and only GNSS-SIC (Exp_GNSS). The spatial multi-scale recursive filter (SMRF) method is a data assimilation method that uses variational optimization to minimize the difference between the estimated and observed fields. Recursive filters are applied to the gradient of the cost function, and the filter scale decreases with each iteration so that information at successively smaller scales is extracted (Yang et al., 2022). The SMRF is used to construct the daily SIC initial fields of the model in the experiments. From each daily initial field, the MITgcm is integrated forward for 7 days to forecast SIC.
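The idea of spreading observational information with recursive filters whose scale shrinks from pass to pass can be illustrated with the 1D sketch below. It is not the SMRF used in this study: the SMRF minimizes a variational cost function and applies the filters to its gradient, whereas this simplified, hypothetical analogue spreads observation-minus-analysis innovations directly (a successive-correction scheme). The function names and filter coefficients are assumptions.

```python
# Simplified 1D analogue of multi-scale recursive-filter analysis. NOT the SMRF used
# in the study (which minimizes a variational cost function and filters its gradient);
# function names and filter coefficients are hypothetical.


def recursive_filter(field, alpha):
    """First-order recursive filter: one forward and one backward sweep."""
    out = [0.0] * len(field)
    acc = 0.0
    for j, value in enumerate(field):  # forward sweep
        acc = alpha * acc + (1.0 - alpha) * value
        out[j] = acc
    acc = 0.0
    for j in range(len(field) - 1, -1, -1):  # backward sweep over the forward result
        acc = alpha * acc + (1.0 - alpha) * out[j]
        out[j] = acc
    return out


def multiscale_analysis(background, observations, alphas=(0.8, 0.5, 0.2)):
    """Correct a background SIC profile with observations, from large to small scales.

    observations: (grid_index, observed_sic) pairs; a larger alpha spreads further.
    """
    analysis = list(background)
    for alpha in alphas:
        innovation = [0.0] * len(analysis)
        for idx, value in observations:
            innovation[idx] += value - analysis[idx]
        # Away from the boundaries a unit impulse comes back with peak
        # (1 - alpha) / (1 + alpha); rescale so an isolated observation is fitted.
        gain = (1.0 + alpha) / (1.0 - alpha)
        increment = recursive_filter(innovation, alpha)
        # Add the increment and keep SIC physically bounded in [0, 1]
        analysis = [min(1.0, max(0.0, a + gain * d)) for a, d in zip(analysis, increment)]
    return analysis


analysis = multiscale_analysis([0.3] * 101, [(50, 0.9)])
print(round(analysis[50], 3), round(analysis[0], 3))  # prints: 0.9 0.3
```

The large-scale pass fits the observation and spreads its influence with a decay of alpha per grid step; later, smaller-scale passes only correct what the earlier passes left unresolved.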
GNSS-SIC data are not available every day. Therefore, on days without GNSS-SIC data, Exp_Comb assimilates only ASI-SIC and Exp_GNSS performs no assimilation. The forecast start dates range from 1 to 17 March 2016 and from 15 June to 6 July 2016. ASI-SIC is again used as the reference for evaluating the SIC forecasts.
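The daily choice of assimilated data in the four experiments can be written down as a small lookup, which makes the fallback rule explicit. The mapping and function name are a hypothetical encoding of Table 1, and ASI-SIC is assumed to be available every day.

```python
# Which SIC data sets each experiment assimilates on a given day (after Table 1).
# The mapping and function name are a hypothetical encoding; ASI-SIC is assumed
# to be available every day.
EXPERIMENTS = {
    "Exp_Ctrl": (),
    "Exp_ASI": ("ASI-SIC",),
    "Exp_GNSS": ("GNSS-SIC",),
    "Exp_Comb": ("ASI-SIC", "GNSS-SIC"),
}


def datasets_to_assimilate(experiment, gnss_available):
    """Drop GNSS-SIC on days without it; an empty tuple means no assimilation that day."""
    return tuple(
        name for name in EXPERIMENTS[experiment] if name != "GNSS-SIC" or gnss_available
    )


for exp in EXPERIMENTS:
    print(exp, datasets_to_assimilate(exp, gnss_available=False))
```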

Table 1 summarizes the evaluation of the SIC forecasts of the four experiments over the two forecast periods. Compared with Exp_Ctrl, Exp_GNSS shows significantly improved difference statistics, reflecting the positive impact of assimilating GNSS-SIC. However, the deviations of Exp_GNSS are clearly larger than those of Exp_ASI. This is expected, because GNSS-SIC has much lower temporal resolution and spatial coverage than ASI-SIC. When GNSS-SIC is assimilated as supplementary data to ASI-SIC (Exp_Comb), MAD and RMSE are reduced relative to Exp_ASI, albeit only slightly. In addition, given the high resolution of GNSS-SIC, the error in the MIZ is analyzed separately. The RMSE over areas where SIC is above 0.05 and below 0.8 in both the observations and any experiment is used as an indicator, denoted RMSE_MIZ. Compared with Exp_ASI, the RMSE_MIZ of Exp_Comb decreases by 0.46% in March and 0.98% in June-July, and by 2.19% and 4.45%, respectively, in the 24-hr forecasts (not shown). These conclusions also hold when the forecasts are compared with independent SIC observations (Table S3 in Supporting Information S1).
GNSS-SIC data are available on only a small proportion of the forecast start dates, approximately 35% in March and 31% in June-July. Further analysis of the direct and potential impacts on the forecasts when GNSS-SIC is available is therefore needed. Here, we focus on the differences between Exp_Comb and Exp_ASI. To show the effect of adding GNSS-SIC more clearly, the percentage decline in forecast RMSE of Exp_Comb relative to Exp_ASI is used as an indicator, defined as

$$\mathrm{RMSE}_{\mathrm{dec}} = \frac{\mathrm{RMSE}_{\mathrm{ASI}} - \mathrm{RMSE}_{\mathrm{Comb}}}{\mathrm{RMSE}_{\mathrm{ASI}}} \times 100\%,$$

where the subscript "dec" indicates a decline and the subscripts "Comb" and "ASI" denote Exp_Comb and Exp_ASI, respectively.
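Both forecast indicators can be sketched as below. The MIZ mask is one reading of the text: it requires the observed SIC and the SIC of the experiment being evaluated both to lie strictly between 0.05 and 0.8 (how the bounds and "any experiment" are applied is an assumption). RMSE_dec is the percentage decline of the Exp_Comb RMSE relative to Exp_ASI, so positive values mean Exp_Comb is better.

```python
import math


def rmse_miz(forecast, observed, lower=0.05, upper=0.8):
    """RMSE over cells where forecast and observed SIC both lie strictly in (lower, upper).

    This mask is an assumed reading of the MIZ definition; returns NaN if no cell qualifies.
    """
    pairs = [(f, o) for f, o in zip(forecast, observed)
             if lower < f < upper and lower < o < upper]
    if not pairs:
        return float("nan")
    return math.sqrt(sum((f - o) ** 2 for f, o in pairs) / len(pairs))


def rmse_dec(rmse_comb, rmse_asi):
    """Percentage RMSE decline of Exp_Comb relative to Exp_ASI (positive = improvement)."""
    return (rmse_asi - rmse_comb) / rmse_asi * 100.0


print(round(rmse_miz([0.02, 0.5, 0.7, 0.95], [0.0, 0.4, 0.6, 0.9]), 6))  # prints: 0.1
print(round(rmse_dec(0.09, 0.1), 6))  # prints: 10.0
```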
Figures 3a and 3d show the time series of the changes in forecast RMSE caused by adding GNSS-SIC. A positive value indicates an improvement (a smaller RMSE for Exp_Comb), while a negative value indicates the opposite. When GNSS-SIC is additionally assimilated (red arrows), the forecast RMSE decreases markedly. Moreover, this improvement persists for a short time after GNSS-SIC is no longer added. In March, the improvement lasts about 3-4 days, and even the forecast on the seventh day retains a slight advantage. In June-July, this persistence shortens to 2-3 days because the sea ice melts faster (Figure 3d). In addition, because SIC in March is mostly 0.9 or above, adding GNSS-SIC to the assimilation has a limited impact on the forecasts (the maximum RMSE_dec is 0.91%).
In contrast, when sea ice changes more rapidly from June to July, combining GNSS-SIC and ASI-SIC brings larger improvements to the forecasts, with a maximum RMSE_dec of 3.31%. GNSS-SIC was assimilated on two consecutive days before both 23 June and 1 July, but the 24-hr forecast RMSE decreased more on 23 June than on 1 July. This may be because more GNSS-SIC data points were available on 23 June (9,591) than on 1 July (3,082), providing more "real" information for constructing the initial field. A comparison of the 24-hr forecast RMSE on 15 June (10,051 points) and 21 June (2,215 points) leads to a similar conclusion.
Furthermore, taking the 24-hr forecasts on 10 March and 23 June (the two days with the maximum RMSE_dec) as examples, we analyze the spatial distribution of the improvement brought by adding GNSS-SIC. On 10 March, the forecasts of the two experiments hardly differ in the inner Arctic (Figure S4 in Supporting Information S1), but the deviation of Exp_Comb in the MIZ (Figure 3c) is slightly smaller than that of Exp_ASI (Figure 3b), for example in the Greenland Sea. On 23 June, the joint assimilation of GNSS-SIC and ASI-SIC yields more accurate forecasts at approximately 77°N in the Laptev Sea and in the Beaufort Sea on the Pacific side (enlarged views in Figures 3e and 3f). The regional RMSEs in these two regions decrease by 13.84% and 17.17%, respectively, compared with Exp_ASI (Figure S5 in Supporting Information S1). Even at the 72-hr forecast time, Exp_Comb still improves on Exp_ASI by 5.01% and 8.76%, respectively (Figure S6 in Supporting Information S1). Both locations are near the MIZs (Figure S7 in Supporting Information S1). The added GNSS-SIC data provide additional information at these locations (Figure S8 in Supporting Information S1), which the SMRF method propagates to broader areas, helping to reduce the large positive bias in the background field. In some regions, such as the sea ice margins of the Greenland Sea and the Barents Sea (Figures 3e and 3f), the Exp_Comb forecast is slightly worse than that of Exp_ASI. This may be because the small amount of GNSS-SIC data at these locations introduces uncertain perturbations in areas with large SIC gradients during assimilation.

Conclusions and Discussions
In this study, based on ice/water information extracted from GNSS-R DDMs and historical sea ice and ocean data, a DNN-based multi-element model is proposed to retrieve Arctic SIC at GNSS-R subsatellite points. The retrieved SIC not only provides more detailed local information than coarse-resolution passive microwave remote-sensing data but also has good accuracy, with a mean RMSE of 0.035 and a mean correlation coefficient of 0.996.
Adding ocean elements to the inputs of the retrieval model is useful. The melting or freezing of sea ice is closely related to changes in SST. Similarly, heat from the atmosphere is an important factor in maintaining the energy balance during the melting or growth of sea ice (Andersson et al., 2021). Wind, a key driver of sea ice drift, affects the movement and formation of sea ice (Guemas et al., 2016; Kang et al., 2014) and influences the accuracy of SIC estimated from DDMs (Yan et al., 2017). In a supplementary experiment, removing the ocean elements from the retrieval model increased the RMSE by 3.58%. Moreover, adding each of the selected ocean parameters individually improves the accuracy of the retrieved results to varying degrees. Yan and Huang (2019b) used features selected from DDMs as inputs to a neural network, and the optimal RMSE of the retrieved SIC in winter was approximately 0.15, higher than the 0.03 obtained after adding ocean elements in this study.
Although ample historical SIC data and ocean reanalysis data are included in the inputs of the retrieval model, the ice/water value provided by GNSS-R remains one of the key factors determining the accuracy of the retrieved results: when it is excluded from the input variables, the RMSE increases by 6.78%. The ice/water value provides effective weight information for training the model parameters, and it is also the only real-time data among the input variables.
Furthermore, the potential impact of the retrieved GNSS-SIC on forecasts is evaluated. Compared with the control experiment, assimilating GNSS-SIC yields smaller forecast deviations, but its forecast performance is inferior to that of assimilating only the microwave remote-sensing ASI-SIC. Nevertheless, when GNSS-SIC is added to the assimilation as supplementary data to ASI-SIC, it improves forecast accuracy, especially in the MIZs. For the 23 June case, the regional 24-hr forecast RMSEs in the MIZs of the Beaufort Sea and the Laptev Sea decrease by 17.17% and 13.84%, respectively, and clear advantages remain at the 72-hr forecast time. Yang et al. (2015) showed that assimilating microwave remote-sensing SIC observations can improve forecast accuracy in the MIZs; our study further reduces the error by adding GNSS-R data. In addition, assimilating SIT can also improve SIC forecast accuracy (Chen et al., 2017; Mu et al., 2019). Future work will address the retrieval of SIT from GNSS-R data and evaluate its impact on SIC forecast accuracy.
Because the revisit cadence of TDS-1 cannot capture daily SIC dynamics (Zhu et al., 2021), daily assimilation of GNSS-SIC in the Arctic is not yet possible. The Cyclone Global Navigation Satellite System (CYGNSS) can detect dynamic sea ice systems with high spatial and temporal resolution, but it cannot observe high-latitude sea ice regions. In the future, GNSS Transpolar Earth Reflection Monitoring (G-TERN) missions dedicated to polar science are expected to provide more valuable sea ice observations (Cardellach et al., 2018), which may help further improve the forecast accuracy of Arctic SIC.