PCSSR‐DNNWA: A Physical Constraints Based Surface Snowfall Rate Retrieval Algorithm Using Deep Neural Networks With Attention Module

Global surface snowfall rate estimation is crucial for hydrological and meteorological applications but remains a challenging task. A novel approach is developed that jointly uses passive microwave data, infrared data, and physical constraints in a deep‐learning neural network with an attention module to retrieve surface snowfall rate (PCSSR‐DNNWA). PCSSR‐DNNWA consistently outperforms traditional approaches in predicting surface snowfall rates, with a correlation coefficient of ∼0.76, mean error of ∼−0.02 mm/hr, and root mean squared error of ∼0.21 mm/hr. Graupel water path is found to be of vital importance, making the largest contribution to the retrieval of surface snowfall rate. By integrating physical constraints and adaptively weighting the predictors, PCSSR‐DNNWA opens a new avenue for retrieving the surface snowfall rate from satellites, with increased accuracy, interpretability, and computational efficiency.

reduce the dimensionality of high-dimensional datasets, making it easier to visualize the data and improving the performance of machine learning models (Kongoli et al., 2015). Nevertheless, information loss is inevitable if multiple independent variables are simultaneously considered in PCA, which also hinders the interpretability of results due to the new dimensions not necessarily corresponding to the original features.
The recent advancements in deep learning techniques have opened up new opportunities for the detection and retrieval of snowfall (Adhikari et al., 2020). The attention mechanism is a new technique used in deep learning to enhance the model's ability to focus on specific parts among the multiple independent variables in predicting the dependent variable. By incorporating the attention module into deep neural networks (DNN), multi-sensor data could be comprehensively and intelligently considered as independent variables with appropriate corresponding weights, greatly aiding in improving the accuracy and interpretability of the model. Furthermore, physical constraints such as surface variables obtained through radiative transfer models are also taken into account to alleviate the uncertainties in retrieving the snowfall rate under different surface conditions. This study aims to develop a physically-constrained surface snowfall rate retrieval algorithm using deep neural networks with attention module (PCSSR-DNNWA) for addressing the aforementioned challenges in surface snowfall rate retrieval. CloudSat CPR 2C-SNOW-PROFILE products are used as a reference. PCSSR-DNNWA has great advantages in intelligently and effectively considering multiple independent variables to accurately estimate surface snowfall rate at high spatiotemporal resolutions.

CloudSat-GPM Coincidence Data Set
Multi-sensor brightness temperatures come from the 2B-CSATGPM product. The 2B-CSATGPM (V03B) product selects coincident samples from both CloudSat and the GPM Core Observatory, with a maximum time difference of ±15 min (Turk et al., 2021). It incorporates data from GMI, DPR, and CPR, as well as some auxiliary datasets. Specifically, the data set used in this study contains GMI brightness temperatures (GMI L1C-R, microwave), NPP-ATMS brightness temperatures (1C.NPP.ATMS, microwave), the MODIS auxiliary product (MODIS-AUX, infrared), and surface snowfall rate (SSR, from the CPR 2C-SNOW-PROFILE product).
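The ±15 min coincidence criterion can be illustrated with a short Python sketch. It covers only the temporal test stated above; any spatial matching is outside its scope, and the function name and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Maximum time difference between CloudSat and GPM samples, as stated in the text.
MAX_DT = timedelta(minutes=15)

def is_coincident(t_cloudsat, t_gpm, max_dt=MAX_DT):
    """Return True if the two observation times differ by at most max_dt."""
    return abs(t_cloudsat - t_gpm) <= max_dt

# Hypothetical observation times, for illustration only.
t_cpr = datetime(2015, 1, 10, 12, 0, 0)
t_gmi_near = datetime(2015, 1, 10, 12, 9, 30)   # 9.5 min apart
t_gmi_far = datetime(2015, 1, 10, 12, 20, 0)    # 20 min apart

print(is_coincident(t_cpr, t_gmi_near))
print(is_coincident(t_cpr, t_gmi_far))
```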
GMI is a passive, conically scanning microwave radiometer that probes the atmosphere and surface. It has 13 channels with frequencies ranging from 10.65 to 183.31 ± 7 GHz. The 10.65, 18.7, 36.5, 89, and 166 GHz channels have both vertical and horizontal polarizations.
Additional data from the MODIS-AUX auxiliary product (thermal channels from the Moderate Resolution Imaging Spectroradiometer (MODIS) imager onboard the Aqua satellite, matched to each CloudSat beam) are included for the 11 thermal infrared (IR) or near-IR channels (channels 20 and 27-36).

MiRS Precipitation and Surface Products
The Microwave Integrated Retrieval System (MiRS) is a physically based retrieval system that employs a 1-D variational (1DVAR) retrieval scheme to solve simultaneously for surface and atmospheric parameters, including hydrometeors (Boukabara et al., 2007, 2011). This approach retrieves the fundamental physical attributes that impact microwave observations, including the profiles of atmospheric temperature, water vapor, and hydrometeors, as well as surface emissivity and temperature, through physical means (Boukabara et al., 2018).
The MiRS system was first implemented operationally by the U.S. National Oceanic and Atmospheric Administration (NOAA) in 2007 for the NOAA-18 satellite. Currently, MiRS generates orbital products such as rain rate, snowfall rate, layered and total precipitable water (TPW), cloud water, snow cover/water equivalent, sea-ice concentration, and land surface temperature/emissivity from various satellites including NOAA-18, -19, and -20, MetOp-A and -B, S-NPP, GPM, DMSP F-17 and F-18. Furthermore, this 1-DVAR retrieval scheme can be applied to other sensors like Chinese FengYun-3D microwave soundings (Xu et al., 2023).

Methods
The framework of the study is shown in Figure 1. First, DNN input data sets were produced by combining CloudSat, GPM, MODIS, and MiRS product data. Second, the DNN and other machine learning methods were trained on 2015 data under various input scenarios. Finally, the results of the different methods were evaluated and compared.

Data Preprocessing
The PCSSR-DNNWA system receives several inputs, including PMW data from GMI and NPP-ATMS, IR data from MODIS, and physical constraints-based data from MiRS Precipitation and Surface Products. Although the physical constraints-based data are not directly provided in 2B-CSATGPM, they can be resampled from MiRS Precipitation and Surface Products to a 0.25° × 0.25° map that serves as a monthly physical background reference, taking both accuracy and efficiency into account. Additionally, to account for the diurnal cycle, each day is divided into three eight-hour periods. The physical constraints-based data include variables such as GWP (Graupel Water Path) and LWP (Liquid Water Path) as environmental data (ENVI), and Emis01-22 (emissivity of the 22 NPP-ATMS channels). Incorporating physical constraints-based data has the potential to enhance the accuracy of surface snowfall rate estimates by reducing physical uncertainty and enabling differentiation between various surface backgrounds. Furthermore, there is no need to test various combinations of input data, because the model itself calibrates the significance of the predictors, which makes it widely applicable to other satellite sensors.
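The background lookup implied by this preprocessing (monthly, 0.25° × 0.25° grid, three eight-hour periods per day) can be sketched in Python as follows. The grid origin (row 0 at 90°S, column 0 at 180°W), the use of a plain hour-of-day value, and all function names are assumptions made for illustration; the text does not specify them.

```python
import math

GRID_DEG = 0.25        # grid spacing stated in the text
HOURS_PER_PERIOD = 8   # three eight-hour periods per day

def grid_index(lat, lon, res=GRID_DEG):
    """Map latitude/longitude in degrees to (row, col) on a global grid.

    Assumed convention: row 0 starts at 90S, column 0 starts at 180W.
    """
    n_rows = int(round(180.0 / res))
    n_cols = int(round(360.0 / res))
    row = min(int(math.floor((lat + 90.0) / res)), n_rows - 1)  # keep 90N in the last row
    col = int(math.floor((lon + 180.0) / res)) % n_cols          # wrap 180E onto column 0
    return row, col

def diurnal_period(hour):
    """Return 0, 1, or 2 for the eight-hour period containing `hour` (0-23)."""
    return int(hour) // HOURS_PER_PERIOD

def background_key(month, hour, lat, lon):
    """Key identifying one cell of the monthly physical background reference."""
    return (month, diurnal_period(hour)) + grid_index(lat, lon)

# Hypothetical sample: January, 13 h, 45.3N, 100.2W.
print(background_key(1, 13, 45.3, -100.2))
```

A sample's ENVI and emissivity values would then be read from the background map at the returned key, so that every satellite pixel falling in the same cell, month, and period shares one physical background.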

DNN With Attention Module
A deep neural network (DNN) with an attention module is a type of neural network architecture that includes an additional mechanism that enables the network to selectively focus on different parts of the input data. Attention mechanisms are commonly used in neural machine translation, natural language processing, and other sequence modeling tasks, where the input data may be long and complex (Vaswani et al., 2017).
PCSSR-DNNWA is based on the standard DNN, with the incorporation of an attention module that enables self-adaptive calibration of predictors in surface snowfall rate retrieval, and provides insights into predictor importance. Compared to artificial neural networks (ANNs) that typically comprise an input layer, a hidden layer, and an output layer, DNNs have multiple hidden layers, which increase their capacity to capture complex nonlinear relationships. In the input layer, each neuron represents one predictor, and the neuron in the output layer represents the estimated result. Neurons in each hidden layer take the weighted sum of neurons from the previous layer via dense connections, and apply a nonlinear transformation via an activation function.
The DNNWA model, which incorporates an attention module based on the sigmoid activation function, represents an improvement over the standard DNN model in terms of self-adaptive calibration of predictors. Initially, dense connections were implemented to identify dependencies among the input predictors. Next, the sigmoid activation function was employed to calculate the importance score for each input predictor, which was then self-adaptively adjusted based on the interaction among the input predictors. Finally, the input predictors were multiplied by the corresponding importance scores to achieve self-adaptive calibration. Consequently, informative predictors were emphasized while less useful predictors were suppressed. The sigmoid activation function is computed as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

The attention module was added after the input layer of the standard DNN. Following the attention module, several hidden layers were connected to the output layer. The number of hidden layers and the number of hidden neurons in each hidden layer were determined through cross-validation (CV). Based on the trade-off between model accuracy and efficiency, the DNNWA model was configured with six hidden layers, each comprising 1,024 neurons. Thus, the proposed DNNWA model not only self-adaptively calibrates the predictors, but also possesses strong nonlinear modeling capabilities. The DNNWA model, incorporating the predictors, can be expressed as:

$$\mathrm{SSR} = f\left(T_{\mathrm{GMI}},\ T_{\mathrm{MOD}},\ T_{\mathrm{NPP}},\ \mathrm{ENVI},\ \mathrm{Emis}_{\mathrm{NPP}}\right)$$

where $f$ represents the DNNWA algorithm, SSR refers to surface snowfall rate, $T_{\mathrm{GMI}}$ refers to the 13 GMI Level 1C brightness temperatures, $T_{\mathrm{MOD}}$ refers to MODIS 1-km thermal channels 20 and 27-36, $T_{\mathrm{NPP}}$ refers to NPP-ATMS Level 1C brightness temperatures (CloudSat-GPM period only) and the corresponding Earth incidence angle of each pixel (NPP IA), and ENVI (environmental data) and $\mathrm{Emis}_{\mathrm{NPP}}$ (emissivity of the 22 NPP-ATMS channels) refer to variables of the MiRS Precipitation and Surface Products from S-NPP.
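The three attention steps described above (dense connection, sigmoid scoring, element-wise multiplication) followed by hidden layers can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: the ReLU hidden activation, the random untrained weights, the toy layer sizes, and all names are assumptions.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function, mapping any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def dense(x, weights, bias):
    """Fully connected layer; `weights` holds one row of coefficients per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def attention_gate(x, weights, bias):
    """Score every predictor with dense + sigmoid, then rescale the inputs by the scores."""
    scores = [sigmoid(z) for z in dense(x, weights, bias)]
    gated = [s * xi for s, xi in zip(scores, x)]
    return gated, scores

def relu(values):
    # Assumed hidden activation; the text does not name the one actually used.
    return [max(0.0, v) for v in values]

def forward(x, params):
    """Attention gate, then hidden layers, then a single output neuron (the SSR estimate)."""
    h, scores = attention_gate(x, *params["attention"])
    for w, b in params["hidden"]:
        h = relu(dense(h, w, b))
    w_out, b_out = params["output"]
    return dense(h, w_out, b_out)[0], scores

def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
n_pred, n_hidden = 5, 8  # toy sizes; the text reports six hidden layers of 1,024 neurons
params = {
    "attention": init_layer(n_pred, n_pred, rng),
    "hidden": [init_layer(n_pred, n_hidden, rng), init_layer(n_hidden, n_hidden, rng)],
    "output": init_layer(n_hidden, 1, rng),
}
x = [0.2, -1.0, 0.5, 1.5, 0.0]  # hypothetical normalized predictors for one sample
ssr_hat, scores = forward(x, params)
print(scores)
```

Because the scores are computed from the sample itself, each sample receives its own set of importance scores, which is what allows the predictor weights to be examined per location and time.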

Identification of Predictor Importance
To validate the robustness of the model, a five-fold cross-validation strategy was employed in this study. Specifically, the data set was first randomly divided into five subsets, each containing approximately 20% of the samples. One subset was then assigned as testing data, and the model was trained with the remaining subsets. This process was repeated five times so that every subset was used for testing once.
The validation results of the proposed DNNWA model were compared with and without physical constraints, using identical datasets and a five-fold CV approach. In addition, the performance of the DNNWA models was compared with that of the standard DNN and three widely used machine learning models, namely support vector regression (SVR), random forest (RF), and gradient boosting regression (GBR; Rysman et al., 2019), which have shown exceptional performance in modeling surface snowfall rate retrieval.
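The five-fold splitting procedure can be sketched as below. The shuffling seed, the helper name, and the sample count are hypothetical; only the five-way random partition with each subset held out once comes from the text.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random assignment of samples to folds
    folds = [indices[i::k] for i in range(k)]  # k subsets of nearly equal size
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# Hypothetical data set of 103 samples.
splits = list(k_fold_indices(103, k=5))
for train, test in splits:
    print(len(train), len(test))
```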
The attention module captures spatiotemporal heterogeneity of the predictors and calibrates them using the sigmoid activation function. The sigmoid function is applied to a weighted sum of the input predictors, formed with learned parameters, to compute the attention scores. The sigmoid function bounds each attention score between 0 and 1 (the scores are computed independently and are not forced to sum to 1), allowing the module to effectively identify the most relevant predictors. Unlike the RF model's overall predictor importance, the PCSSR-DNNWA model determines predictor importance for each grid. Scores closer to 1 indicate higher importance of the predictor in the model (i.e., the predictor is emphasized), while scores closer to 0 indicate lower importance of the predictor (i.e., the predictor is suppressed).

Physically Constrained Surface Snowfall Rate Retrieval Model Outperforms Traditional Algorithms
The results section is based on the testing period of 2015. To evaluate the performance of surface snowfall rate estimates, three commonly used evaluation metrics were utilized: Pearson correlation coefficient (CC), mean error (ME), and root mean squared error (RMSE). Figure 2 depicts the density scatter plots of data pairs (different model estimates vs. CloudSat estimates) extracted from all snowfall events during the entire testing period. The metrics (CC, ME, and RMSE) are based on a point-by-point comparison of model estimates to CloudSat estimates for all snowfall events.
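The three metrics follow their standard definitions and can be computed point by point as in the sketch below. The sign convention for ME (model estimate minus CloudSat reference, so that negative values indicate underestimation) is an assumption, and the sample values are invented for illustration.

```python
import math

def evaluate(estimates, reference):
    """Return (CC, ME, RMSE) for paired model estimates and reference values."""
    n = len(estimates)
    if n != len(reference) or n < 2:
        raise ValueError("need two equally long sequences with at least two pairs")
    mean_e = sum(estimates) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimates, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_e == 0.0 or var_r == 0.0:
        raise ValueError("CC is undefined when either sequence is constant")
    cc = cov / math.sqrt(var_e * var_r)                     # Pearson correlation coefficient
    me = sum(e - r for e, r in zip(estimates, reference)) / n  # mean error (bias)
    rmse = math.sqrt(sum((e - r) ** 2 for e, r in zip(estimates, reference)) / n)
    return cc, me, rmse

# Hypothetical snowfall rates in mm/hr.
model = [0.1, 0.3, 0.5]
cloudsat = [0.2, 0.4, 0.6]
cc, me, rmse = evaluate(model, cloudsat)
print(cc, me, rmse)
```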
Among the traditional machine learning models evaluated on these datasets (SVR, GBR, and RF), RF showed the best performance. These models may considerably underestimate high values and overestimate low values, most notably the SVR model. Owing to their superior ability to capture nonlinear relationships, DNNs address this challenge effectively. In the absence of physical constraints, the standard DNN displays a significant advantage over traditional machine learning models. Furthermore, the introduction of the attention mechanism gives DNNWA superior performance: compared to the standard DNN, DNNWA has an 8% improvement in CC, a 60% reduction in absolute ME, and a 14% reduction in RMSE. With the incorporation of physical constraints-based data, the PCSSR-DNNWA model demonstrates the best performance, achieving the highest CC (0.76), the lowest absolute ME (−0.02 mm/hr), and the lowest RMSE (0.21 mm/hr).
The overall and seasonal performance of the PCSSR-DNNWA model was compared with that of the RF, DNN, and DNNWA models on different surface types (Table 1). Across seasons and surface types, the PCSSR-DNNWA model demonstrated consistent superiority over the competing models, exhibiting the highest CC and lowest RMSE. Generally, all models show a higher CC over the sea surface than over land. This is particularly noticeable during winter, when all models, most notably PCSSR-DNNWA with a CC of 0.82, perform well over the sea surface. A plausible explanation is the greater stability of brightness temperature observed over the sea surface, coupled with a larger number of data samples, which enhances the generalization capability of the model. Conversely, the models' performance during summer tends to be less stable, likely due to the randomness caused by an inadequate sample size. PCSSR-DNNWA exhibits weaker performance over the sea surface during summer (CC = 0.65), but fares better over land in the same season (CC = 0.75). While every model performs worse over the summer sea surface, PCSSR-DNNWA shows the largest improvement relative to DNNWA there, with an increase in CC of 0.08. This suggests that the integration of physical constraints effectively mitigates the impact of high randomness when dealing with smaller sample sizes.

Attention Module in PCSSR-DNNWA Offers Advantages in Intelligently Adjusting the Weights of Predictors
Table 1
Metrics (CC, ME, and RMSE) of Surface Snowfall Rate Estimates From Different Algorithms (RF, DNN, DNNWA, and PCSSR-DNNWA) Against CloudSat Estimates in Different Seasons
Note. Bold values represent the optimal metrics among the four models.

To better understand how the predictors affect the skill of SSR retrieval, a detailed analysis of the weights of predictors in the models was conducted. The attention module calibrates the predictors by evaluating their importance, which can be converted into their respective weights. An overview of the mean weight of predictors in PCSSR-DNNWA is shown in Figure 3a. Overall, the seven most important predictors are GWP, RWP, T MOD 1, TWP, T MOD 2, T GMI 6, and NPP IA. Brightness temperature data from the MODIS, NPP-ATMS, and GMI sensors account for almost the same proportion, with physical constraint-based data (ENVI and Emis NPP) accounting for nearly half of the weight in the PCSSR-DNNWA model. It is worth noting that ENVI data (especially GWP, RWP, and TPW) play an important part in surface snowfall rate retrieval.
The weights of predictors given by the PCSSR-DNNWA model are more reasonable and interpretable than those given by RF. Figure 3b shows the weights of Emis NPP given by RF and PCSSR-DNNWA. While the overall trend in the weight distribution of the two models is similar, the approaches differ for channels 10-15 and 18-22, where the center frequencies coincide. In these instances, the RF model assigns equivalent weights to these inputs. In contrast, the PCSSR-DNNWA model emphasizes one channel, thereby diminishing the significance of the other channels with similar characteristics. This distinctive trait enhances the interpretability of the PCSSR-DNNWA model, aiding in the identification of key variables, specifically the peak channels 1, 4, 7, 11, and 16.
Moreover, unlike the static weights of the RF model, the PCSSR-DNNWA model provides weights that vary in space and time. For example, the mean weight of Emis NPP can be calculated for different latitude ranges, although the resulting values are similar. The spatial pattern in the Northern Hemisphere of the most important Emis NPP (channel 16) is depicted in Figure 3c. Ranging from 0.18 to 0.3, the weights are stable overall but spatially variable, with high-weight areas mostly tied to geographical locations associated with storm tracks in the Northern Hemisphere (Liu, 2008).
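Averaging per-sample weights over latitude ranges, as mentioned above, amounts to a simple grouped mean. The sketch below assumes 10° bands and uses invented weights within the reported 0.18 to 0.3 range; neither the band width nor the values come from the study.

```python
import math
from collections import defaultdict

def mean_weight_by_latitude(latitudes, weights, band_deg=10.0):
    """Average per-sample predictor weights within latitude bands.

    Each band is labeled by its southern edge in degrees; the 10-degree
    default width is an assumption made for illustration.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, w in zip(latitudes, weights):
        band = math.floor(lat / band_deg) * band_deg
        sums[band] += w
        counts[band] += 1
    return {band: sums[band] / counts[band] for band in sorted(sums)}

# Hypothetical per-sample weights for one emissivity channel.
lats = [42.0, 47.5, 55.0, 58.0]
channel_weights = [0.20, 0.24, 0.28, 0.30]
band_means = mean_weight_by_latitude(lats, channel_weights)
print(band_means)
```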
It is worth noting that the combination of deep neural networks and attention mechanisms allows the model to effectively capture nonlinear relationships. For instance, although the correlation between ATMS brightness temperatures at window channel 16 (band3 of T NPP, center frequency 88.2 GHz) and surface snowfall rate is weak (Kongoli et al., 2015), the PCSSR-DNNWA model still gives band3 of T NPP a relatively high weight, which means that this high-frequency channel plays an important role in SSR retrieval (Rysman et al., 2019). Similarly, graupel water path (GWP) is given the highest weight even though its correlation with SSR is weak. As a representative of solid precipitation characteristics in the MiRS precipitation and surface products, GWP carries information relevant to SSR retrieval (Moisseev et al., 2017). The PCSSR-DNNWA model can effectively capture this information and highlight the importance of GWP in SSR retrieval, which has often been overlooked in previous studies.
The effectiveness of PCSSR-DNNWA in retrieving SSR with high accuracy can be attributed to several characteristics. To begin with, the use of all high-frequency channels of microwave sensors addresses the issue of vertical and microphysical variability of snowfall. Depending on their frequencies and polarizations, these high-frequency channels can be more or less sensitive to different layers of a snow cloud and the size or shape of snowflakes (Panegrossi et al., 2017). Moreover, the incorporation of low-frequency channels provides information on the background surface at the time of overpass, without relying on climatology (Rysman et al., 2019). In addition, the use of infrared data yields ancillary information on the surface type or cloud top. Lastly, a prominent advantage of PCSSR-DNNWA is its incorporation of physical constraint-based variables to adjust the retrieval strategy.

Conclusions
This study proposes a physically based deep learning model (PCSSR-DNNWA) for surface snowfall rate retrieval, which combines multi-sensor data with physical constraints-based data to generate surface snowfall rate estimates. The effectiveness and advantages of PCSSR-DNNWA have been verified through cross-validation and comparison with other widely used traditional models, providing insights into its superior performance. One notable finding is that graupel water path (GWP) is of vital importance, making the largest contribution to the retrieval of surface snowfall rate, which could serve as a useful guideline for future surface snowfall rate retrieval studies. It should be noted that this study predominantly focuses on SSR retrieval, without considering the detection of snowfall events. Although the models employed in this study partially mitigate the issue, the reduced precision in retrieving both heavy and light snowfall events remains a critical challenge (Rysman et al., 2019) that future studies must seriously consider. The results show that PCSSR-DNNWA can effectively calibrate the weights of predictors spatiotemporally, achieving good statistical performance in terms of CC (∼0.76), ME (∼−0.02 mm/hr), and RMSE (∼0.21 mm/hr). Furthermore, PCSSR-DNNWA performs better than DNNWA without physical constraints, which illustrates the significant role of physical constraints in reducing uncertainties in the surface snowfall rate retrieval process. The results of this study could provide a reference for future satellite-based global SSR retrievals that aim for better accuracy, interpretability, and computational efficiency.