Comparative Assessment of Snowfall Retrieval From Microwave Humidity Sounders Using Machine Learning Methods

Accurate quantification of snowfall rate from space is important but has remained difficult. Four years (2007–2010) of NOAA‐18 Microwave Humidity Sounder (MHS) data are trained and tested with snowfall estimates from coincident CloudSat Cloud Profiling Radar (CPR) observations using several machine learning methods. Among the studied methods, random forest using MHS (RF‐MHS) is found to be the best for both detection and estimation of global snowfall. The RF‐MHS estimates are tested using independent years of coincident CPR snowfall estimates and compared with snowfall rates from Modern‐Era Retrospective analysis for Research and Applications Version 2 (MERRA‐2), Atmospheric Infrared Sounder (AIRS), and MHS Goddard Profiling Algorithm (GPROF). It was found that RF‐MHS algorithm can detect global snowfall with approximately 90% accuracy and a Heidke skill score of 0.48 compared to independent CloudSat samples. The surface wet bulb temperatures, brightness temperatures at 190 GHz, and 157 GHz channels are found to be the most important features to delineate snowfall areas. The RF‐MHS retrieved global snowfall rates are well compared with CPR estimates and show generally better statistics than MERRA‐2, AIRS, and GPROF products. A case study over the United States verifies that the RF‐MHS estimated snowfall agrees well with the ground‐based National Center for Environmental Prediction (NCEP) Stage‐IV and MERRA‐2 product, whereas a relatively large underestimation is observed with the current GPROF product (V05). MHS snowfall estimated based on RF algorithm, however, shows some underestimation over cold and snow‐covered surfaces (e.g., Greenland, Alaska, and northern Russia), where improvements through new sensors or retrieval techniques are needed.


Introduction
Snowfall is the most frequent type of precipitation in high latitudes (Adhikari et al., 2018;Behrangi, Tian, et al., 2014;Field & Heymsfield, 2015;Levizzani et al., 2011;Liu & Curry, 1997) and can play an important role in global energy and water cycle studies (Kidd & Huffman, 2011). In high latitudes, snow in the form of snowpack provides a major source of freshwater for many countries including the United States (Skofronick-Jackson et al., 2015). Snow over ocean may influence the surface sensible heat and ocean surface buoyancy flux (Liu & Curry, 1997). Knowing the snowfall amount on sea ice is also essential to derive sea ice thickness from spaceborne altimeters that measure ice freeboard (Brucker & Markus, 2013;Kwok et al., 2017;Webster et al., 2014). Therefore, an accurate quantification of snowfall in regional and global scales is important not only for the present weather observations but also for long-term climate analysis.
The traditional gauge measurement technique for snowfall measurement comes with high uncertainties and errors. Over the high latitudes, the gauge deployment is sparse and discontinuous , and due to various factors such as wind-induced undercatch, the measurement could be largely biased Fuchs et al., 2001;Panahi & Behrangi, 2020). The launch of CloudSat satellite in 2006 (Stephens et al., 2002) provides a valuable alternative for global snowfall estimation from space. CloudSat carries the Cloud Profiling Radar (CPR) that is suitable to measure snowfall from space because of its high sensitivity (~−28 dBZ) (Tanelli et al., 2008). CPR shows great capability to detect falling snow from space (Kulie & Bennartz, 2009;Wood et al., 2014); as a result, it has been widely used for regional and global snowfall analysis Chen et al., 2016;Kulie et al., 2016;Liu, 2008;Palerme et al., 2014). CloudSat ran into a battery issue in 2011, limiting its data collection to daylight only. In 2018 CloudSat exited the A-Train constellation but still continued to collect samples during the day (Braun et al., 2019). Another satellite with instrumentation capable of measuring rain and snowfall is the Global Precipitation Measurement (GPM) mission core observatory, which was launched in 2014 (Hou et al., 2014;Skofronick-Jackson et al., 2017). GPM is capable of measuring falling snow because of its dual-frequency precipitation radar (DPR) that operates at Ka and Ku bands. DPR can detect falling snow especially those that are associated with heavy particles (reflectivity >14 dBZ; Hamada & Takayabu, 2015) but may miss significant amount of lighter snow (~50%) when compared to the CPR observations (Adhikari et al., 2018;Skofronick-Jackson et al., 2019). Also, its limited latitudinal coverage (up to 67°N/S) restricts GPM to measure snowfall in higher latitudes.
Besides the active measurements (i.e., radar), Passive Microwave (PMW) sensors are frequently used for snowfall estimation. Snow retrieval from PMW sensors are relatively less direct than radar because of the issues with the passive nature of the sensor and the comparative lack of vertical profile information, but precipitation estimation from PMW has been performed for a few decades (e.g., Kidd et al., 1998;Kummerow et al., 1996Kummerow et al., , 2015Meng et al., 2017;Smith et al., 1992;Spencer et al., 1989) and is valuable due to the long-term and relatively good temporal sampling that the collection of PMW sensors offer. PMW snow retrieval is challenging and comes with large uncertainties and errors (Panegrossi et al., 2017). One of the major uncertainties is related to the complex radiative properties of nonspherical shape of snowflakes and ice crystals compared to nearly spherical shape of liquid droplets that are often considered in the retrievals (Levizzani et al., 2011;Liu, 2008). Another source of uncertainty is the difficulty of distinguishing falling snow signals or background contributions to upwelling radiations from snow/ice surfaces (Skofronick-Jackson & Johnson, 2011;You et al., 2017). More comprehensive discussion on issues related to the PMW snow retrieval is summarized in Liu and Seo (2013), Skofronick-Jackson and Johnson (2011), You et al. (2017), etc.
While challenging, PMW observations, especially high-frequency (e.g., >89 GHz) channels, have been used for snowfall retrieval (Kongoli, 2003;Liu & Curry, 1997;Noh et al., 2006;Skofronick-Jackson et al., 2004;Staelin & Chen, 2000;You et al., 2017, etc.). You et al. (2017) assessed the performance of all GPM microwave imager (GMI) channels and concluded that the 166 GHz frequency channel is most influential for snow detection. This is because the higher-frequency channels (e.g., 89-187 GHz) or their combinations are sensitive to the scattering of the particles rather than emission and can be used over land and ocean. (Bennartz & Bauer, 2003;Michelle & Bauer, 2006;Weng & Grody, 2000). The depression in upwelling brightness temperatures (TBs) caused by falling snow is~15 K at higher-frequency channels, whereas the difference is hardly detectable by lower channels (<37 GHz) (Katsumata et al., 2000). Noh and Liu (2004) observed the response of PMW channels for snowfall cases and noticed that TBs depression is optimum at 150 GHz. Skofronick-Jackson and Johnson (2011) summarized that TBs near 166 and 183.3 GHz are more sensitive to frozen hydrometeors and are potentially useful to retrieve snow over both land and ocean. Although snow retrieval over land is found more complex than over ocean because of the variations in emissivity from heterogeneous nature of surface (Skofronick-Jackson & Johnson, 2011), combinations of lowerand higher-frequency (e.g., 10 and 166 GHz) channels have shown great potentials for retrieving snowfall over snow and ice surfaces (Ebtehaj & Kummerow, 2017). Furthermore, Edel et al. (2019) demonstrated that among all Microwave Humidity Sounder (MHS) channels, near-183 GHz channels are the most important ones to detect snowfall. It should be noted that besides sensitivity toward scattering signatures, high-frequency channels are useful to detect snowfall because they are less affected by surface features (Skofronick-Jackson et al., 2004).
Currently, multiple PMW sensors such as GMI, combined microwave imager/sounders (e.g., Special Sensor Microwave Imager and Sounder [SSMIS]), and microwave sounders (e.g., Advanced Technology Microwave Sounder [ATMS] and MHS) are in space and can potentially be used to detect snowfall. One of the popular PMW precipitation retrieval algorithms is the Goddard Profiling (GPROF) algorithm which implements Bayesian approach to retrieve precipitation rates from TBs (Kummerow et al., 2015). GPROF has gone through several improvements over time and currently is being used to retrieve rainfall and snowfall from GMI and GPM constellations of microwave sensors. Figure 1 shows the geographical distributions of mean surface snowfall rates from (a) CPR, (b) MHS on NOAA-18 using GPROF V05, and (c) their differences. The mean snowfall rates are calculated in each 2°× 2°longitude and latitude boxes by dividing accumulated snowfall by the total number of snow and 10.1029/2020EA001357

Earth and Space Science
ADHIKARI ET AL. no-snow samples. Note that Figure 1 represents the statistics for matched footprints between the sensors, and both CPR and MHS using GPROF (or MHS-GPROF) precipitation are considered when Modern-Era Retrospective analysis for Research and Applications Version 2 (MERRA-2) 2 m surface temperatures are below 1°C. Four years of matched climatology shows that MHS snowfall rate using GPROF heavily underestimates that of CPR over both land and ocean, especially over Greenland and Antarctica (Figure 1c). In several global regions such as northern Greenland including coastal areas, Arctic Ocean, Alaska, Eastern Europe, and southern Antarctic ocean, CPR reports higher values of mean snowfall rates (e.g., >2.5 mm/day) (Figure 1a) than MHS-GPROF estimates that are often less than 1 mm/day (Figure 1b). This motivated us to explore the information content in MHS channels for snowfall retrieval and to investigate if an alternative approach can be used to outperform GPROF snowfall retrieval. There are several MHSs available, mainly on National Oceanic and Atmospheric Administration (NOAA) and MetOP platforms, providing valuable spatiotemporal sampling for precipitation retrieval. Liu and Seo (2013) used coincident observations from CPR and Advanced Microwave Sounding Unit-B (AMSU-B)/ MHS to produce a lookup table that can be used for snowfall detection. Using independent samples, they reported a Heidke skill score of up to 0.4 for snowfall detection over land. Kongoli et al. (2015) used ground observations to train the ATMS observations over the United States. They established a snowfall relationship with variable weather conditions such as for relatively warmer (colder) weather; snowfall is associated with lower (higher) high-frequency TBs, and the TBs are negatively (positively) correlated with measured snowfall rates, respectively. Tang et al. (2018) merged CPR and GPM radar observations to separately train GMI and Moderate Resolution Imaging Spectroradiometer (MODIS) channels for snowfall retrieval. Their report shows that snowfall rate estimated by combining GMI channels, environmental variables, and infrared bands can be improved over snowfall estimates only from microwave (MW) channels. A recent study by Edel et al. (2019) have demonstrated the ability of MHS channels to detect snow over the Arctic region. Combining MHS channels with some meteorological variables and applying random forest (RF) model, the study has reported a success of detecting Arctic snow occurrence that was fairly close to the CPR observations (~4% underestimation). However, besides detection, a global snowfall intensity estimation is needed to assess MHS potentials over various surface types. Considering CPR as reference or "truth," this study trains NOAA-18 MHS channels with CloudSat snowfall rate by using several machine learning (ML) methods. The study aims to address the following scientific questions: 1. How does snowfall rate retrieved from MHS-GPROF compare with that estimated from the CPR? 2. Are ML methods effective for detecting and estimate global snowfall from MHS observations? How do these methods compare? What are the important input features? 3. How well does the developed ML snowfall retrieval compare with snowfall estimates from CPR, groundbased, reanalysis, and Atmospheric Infrared Sounders (AIRS) products?
Note that precipitation estimates from AIRS (Susskind et al., 1997) has been used in the Global Precipitation Climatology Project (GPCP) product (Adler et al., 2003(Adler et al., , 2018 in lack of quality MW precipitation estimates in high latitudes and Polar Regions.To answer the above questions, this study utilizes 4 yr (2007)(2008)(2009)(2010) of coincident data from CPR, NOAA-18 MHS, and a few environmental variables from reanalysis, to assess the possibility of using ML methods to detect and estimate snowfall from PMW sounders at a global scale. MHS on NOAA-18 was selected as it provides ample coincident observations with CPR during the 2007-2010 period, when CPR was fully functional during both day and night. Section 2 provides a brief description of the data set used in the study. Section 3 describes the data-matching process followed by comparison of several ML algorithms to detect global snow. Section 4 provides both classification and regression results, followed by evaluation in section 5. Summary and conclusions are presented in section 5.1.

Data
The data sets used in this study are summarized in Table 1. A brief description of each data set is given as follows:

NOAA-18 MHS
The MHS on board the NOAA-18 satellite is the primary source of the data set used in the study. MHS operates at five MW channels (89, 157, 183 ± 1, 183 ± 3, and 190 GHz; hereafter "tb89," "tb157," "tb1831," "tb1833," and "tb190," respectively), and it has been collecting data since it was launched in May 2005. It is a self-calibrating cross-track scanning radiometer, observing the Earth at an altitude of 854 km above the Earth surface with a field of view of ±50°across the nadir (Mo, 2006). MHS samples 90 contiguous footprints across each track, with a resolution of 17 km at nadir, that results in approximately 2,180 km of the total swath width. MHS high-frequency channels are sensitive to ice hydrometeors (e.g., depression in TBs in the presence of ice); thus, they are suitable to estimate snowfall measurement (Bennartz & Bauer, 2003, Noh & Liu, 2004. Few other MHS sensors are available on NOAA (e.g., 18 and 19) and MetOP (A, B, and C) platforms, enabling good spatiotemporal coverage. However, all of them are not available at the same time (Huffman et al., 2019).

CloudSat CPR
CPR is a 94 GHz CPR that was launched in 2006 and is on the CloudSat satellite. CPR is a nadir-looking radar with spatial resolution of~1.4 × 1.7 km and vertical resolution of 500 m. CPR covers 81°S to 81°N, providing almost global coverage for detecting and estimation of snowfall from space. CPR may have some limitations in measuring heavy snowfall, because of its signal saturation under intense precipitation. However, the saturation is minimal in high latitudes, especially poleward of 60°latitude (Behrangi et al., 2012). In this study, 4 yr (2007-2010) of CloudSat 2C-snow profile V5 data are used which includes surface snowfall rate, snow uncertainty, and other variables listed in Table 1. 2C-snow profile estimates surface snowfall rates based on the vertically retrieved radar reflectivity profiles considering size, attenuation, and multiple scattering properties of the hydrometer. Note that the CPR surface snowfall rates in 2C-Snow product are not directly at the surface; it is estimated by ignoring two to four ground-contaminated bins depending upon surface types (i.e., ocean or land) (Wood et al., 2013). CloudSat CPR is chosen as a reference data set here because of large coincident sampling with MHS on NOAA-18, its capability to detect and estimate snowfall comparable to ground-based radar observations (Chen et al., 2016;Hudak et al., 2009;Matrosov et al., 2008;Smalley et al., 2014), and its wide coverage (81°N/S).

Auxiliary Data
In this study, MERRA-2 variables are considered as one of the primary auxiliary data sets. MERRA-2 has improved over the original MERRA data set because of the several improvements made in the assimilation system including the assimilation of modern MW observations and hyperspectral radiances along with GPS-Radio Occultation data sets (Gelaro et al., 2017). MERRA-2 produces several meteorological variables. This study uses surface 2 m wet bulb temperatures (hereafter: "t2mwet"), total precipi water vapor (TPW), and surface precipitation rate. The temporal resolution of surface data is 3 hr with a horizontal resolution of 0.5 × 0.6°.
Another important axillary data set is surface snow/ice index ("AUTOSNOW" hereafter) which is derived from global multisensor automated snow/ice cover maps (Romanov, 2017). Surface indices are represented by binary values, where 0 stands for open water, 1 for snow-free land, 2 for snow over land, and 4 for ice over water. This is a daily global gridded product with a spatial resolution of 4 km. The surface indices are generated from combined snow/ice retrievals with various satellite sensors operating in visible, infrared, and MW spectral bands.

Data Sets for Evaluation and Comparison
For the evaluation purpose, besides independent CloudSat snowfall rates, hourly accumulated precipitation data from the National Center for Environmental Prediction (NCEP) Sage-IV was used over the United States. NCEP Stage-IV data assimilates both radar and gauge observations, which are further adjusted based on 12 River Forecast Centers. The spatial resolution of Stage-IV data is 4 km and is available in polar stereographic grid (Lin, 2011). With quality control, the product agrees closely with gauge measurement (Zhang, 2018), so this study uses Stage-IV product for validation and case study. Furthermore, the study uses MERRA-2 surface precipitation rates, AIRS IR only precipitation, and NOAA-18 MHS-GPROF level 2 products to assess the performance of the snowfall rates predicted by ML approach compared to other popular products. To define snowfall from MERRA-2, AIRS, and MHS-GPROF precipitation products, a threshold of surface wet bulb temperatures below 1°C is applied, which ensures that the given precipitation product is snowfall. AIRS estimated precipitation rate is an hourly "level 2G" global gridded orbital product with a grid cell size of 0.25°. AIRS precipitation rates are estimated from cloud information such as cloud top pressure, fractional cloud cover, and cloud layer relative humidity using a regression relationship between cloud parameters and coincident rain gauge measurements (Bolvin et al., 2009;Susskind et al., 1997). AIRS precipitation rate is used in GPCP in high latitudes and has been compared with in situ data reasonably well (e.g., Bolvin et al., 2009).

Methodology
Figure 2 provides a summary of data and procedure that has been implemented in this study. The first step is to create a database by matching up all the required parameters, considering CPR snowfall rate as the reference data. Then, train, test, and evaluate the model using independent years of CPR snowfall as well as other popular products listed in Figure 2.

Database and Data Matching Approach
To prepare input data for ML algorithms, 4 yr (2007-2010) of CloudSat CPR orbital snowfall rates are considered as a reference data. CPR footprints are matched with MHS footprints by using the nearest neighbor approach. CPR footprint size is fixed (1.4 km × 1.7 km), whereas MHS footprint size is approximately 16 × 16 km 2 at nadir and increases toward the edge of the scan line. One possible approach to make footprint sizes comparable is upscaling the CPR resolution, which is a common practice (Adhikari et al., 2018;Behrangi et al., 2012;Liu & Seo, 2013;Stephens et al., 2010;Sun et al., 2006). Following the methodology, we have reduced the CPR resolution by calculating a running mean average of snowfall rates at 10 contiguous CPR footprints (e.g., see Behrangi et al., 2012). This allows a better comparison between CPR and MHS footprints. For each CPR footprints, nearest MHS footprints are identified within 20 km of distance and 15 min of overpass. In case multiple MHS footprints are found within the given threshold, preference is given to the MHS footprint that has the closest time with CPR footprints. Once the nearest MHS footprint is found, MHS TB values (from all five channels) and MHS-GPROF surface precipitation rate are stored. The MHS-GPROF precipitation values toward the outermost edges of the swath are either missing or not determined, so the outermost five footprints on both sides of the swath (1-5 and 86-90) are excluded from our analysis, and only 80 MHS footprints in a scanline are considered. To be consistent with CPR coverage, this study is restricted to 80°S/N.
Meteorological variables and surface-type information associated with each CPR footprint are derived from MERRA-2 and AUTOSNOW products. Since MERRA-2 and AUTOSNOW are gridded products, the nearest grid box of these products is found based on CPR longitude, latitude, and time, then variables such as t2mwet, TPW, surface precipitation rate, and surface type index are stored. Furthermore, we matched up CPR and the gridded AIRS precipitation product that is used in GPCP and has 0.25 × 0.25°resolution. CPR provides nadir-only observation and has a footprint size of~1.4 km × 1.7 km. Therefore, several CloudSat footprints fit inside a single AIRS grid, and the thin CloudSat track can be anywhere within the corresponding AIRS footprint. Sometimes, the CloudSat track passes through the center of an AIRS grid, and sometimes it passes near the edges. So, AIRS search box is extended to one more neighbor box in both longitude and latitude directions, then average precipitation value is assigned to the corresponding CPR footprint.

Training and Testing Data
This study uses eight variables, that is, TBs from five MHS channels and TPW and t2mwet from MERRA-2; surface index from AUTOSNOW product as an input (features); and snowfall index/snowfall rate from CloudSat CPR as a reference (label). The CPR snowfall rates associated with high uncertainties such as "snowfall retrieval status" greater than 3 are excluded from this study, and only "snow certain" footprints are considered. Surface index, t2mwet, and TPW are also considered as input variables to understand the importance of environmental conditions toward snow retrieval.
Since 80 MHS footprints in a scanline are considered in this study, one issue may be related with MHS footprints locations (e.g., nadir vs. edges), which might result in different TB values at nadir when compared with the edges due to increase in slant path toward the edges (Skofronick-Jackson & Johnson, 2011). To Figure 2. A schematic diagram lists the data sets as well as description of the process that has been implemented to train, test, and evaluate the model. Four years of data (2007-2010) is used to train and test the model based on "rotating-train test" scheme. The scheme uses any three years of data to train the model and test over remaining independent year data for each combination. The independent year estimation is further evaluated by comparing with CPR, MHS-GPROF, MERRA-2, and AIRS precipitation.

10.1029/2020EA001357
Earth and Space Science ADHIKARI ET AL.
address this issue, MHS footprints are divided into four groups (20 in each group) based on the footprint locations. The first group consists of 20 near-nadir footprints (36-55), and similarly going toward the edges on both sides, the fourth group consists of the outermost 20 footprints (6-15 and 76-85). Then, each footprints group is trained separately. Out of 4 yr of data, the model is trained using 80% of 3 yr of training sets and validated using the remaining 20%. The method of splitting training and testing data in 80:20% is similar to Edel et al. (2019). The final test was performed using the remaining 1 yr data set that is completely independent from those used for training and validation. This process is repeated for all four possible combinations ("rotating-train test" scheme), and average values are used for interpretation. The four possible combinations are 2007-2008versus 2010, 2007-2008versus 2009, 2007versus 2008versus 2007. For instance, if 2007-2008 are used for training (80%), it will be validated with the remaining 20% in this period and will be tested against 2010 independent year and so on. So at the end of the process, the model is tested against four independent years of data, and the results of all four years are analyzed.

Model Selection
There are several ML methods available that can be used for both classification and regression analysis, and choosing a proper one for snowfall detection is important. In this study, a supervised learning approach is used, where features are trained under the supervision of labeled data; that is, MHS TBs and environmental variables are trained using CPR snowfall. Note that classification approach is used for snowfall detection and regression approach is used for estimating snowfall rate. For snowfall area classification, a snow index is used as labeled data. If CPR footprints show nonzero surface snowfall rate, then "1" is assigned, otherwise "0" is assigned for no-snow footprints. To pick the best model for snow classification, we have analyzed four popular supervised ML models: K nearest neighbor (KNN), naïve Bayes, decision tree, and RF. A brief introduction of each algorithm is given as follows.
The KNN is a relatively simple learning algorithm that can be used for both classification and regression problems. The KNN has been used in several remote sensing applications such as precipitation types classifications (e.g., Yang et al., 2019), precipitation estimation (Ahmed et al., 2020;Huang et al., 2017), and image classification (e.g., Li et al., 2009). The KNN assumes that features with similar properties exist in a close proximity. We have determined the "proximity" based on the Euclidian distance; that is, an object is classified by a majority vote of its neighbors (Baoli et al., 2004;Keller & Gray, 1985). In this study, algorithm finds the KNN (K = 5; where K is an arbitrary positive integer) among the training data and uses the category of the nearest neighbor, that is, snow and no-snow footprints.
Another ML algorithm that is tested in this study is naïve Bayes algorithm, which is one of the simplest instances of a probabilistic classifier. It uses the probability theories especially the Bayes theorem based on the prior knowledge and makes a naïve assumption that all the features are independent (Soria et al., 2011;Ting et al., 2011). The naïve Bayes algorithm has been used in several classification and regression problems such as cloud classification (Grams et al., 2016), image classification (e.g., Yang et al., 2017), and rainfall forecast (e.g., Gupta & Ghose, 2015;Hameg et al., 2016).
The third algorithm that has been tested for classification problem is a decision tree. Decision tree is based on a multistage or hierarchical decision scheme or a tree-like structure, where it recursively divides the data set into smaller subdivisions based on the descriptive features until it reaches a small enough set, which eventually falls under one label. Decision tree has been widely used in remote sensing classification problems because it does not need assumption regarding the distributions of the input data and handles nonlinear relationships between features and labels (Brodley & Friedl, 1997;Xu et al., 2005).
The fourth and the last ML algorithm that has been tested is the RF algorithm. Like its name suggests, RF is a model that is made up of several decision trees with randomly sampled training data points, as well as randomly subsets of features when splitting the nodes. For the classification problem, each individual tree predicts a class of the features, and the class with the most votes is considered as the final class or model prediction (Gislason et al., 2004;Ham et al., 2005). If the statistical probability of the given class is higher than 0.5, then the class is considered as true otherwise false. In this study, each sample footprint is passed through numerous decision trees and each tree casts a vote for snow or no-snow classification, then the output is averaged out in terms of statistical probability. If the probability of snowfall of a given footprint is 10.1029/2020EA001357 Earth and Space Science ADHIKARI ET AL.
higher than 0.5, the RF classifies the given footprint as a snow footprint. The RF can also be used in a regression problem, where it predicts snowfall rate of a given footprint based on the all decision trees' prediction.
Since RF method relies on several decision trees (in this study, 100 trees are considered because there is no significant improvement by adding more trees, which is consistent with Edel et al., 2019), and it is based on a random process, the RF is considered to have a low chance of overfitting (Hastie et al., 2009). RF runs efficiently on large data sets, computationally lighter than other tree assemble methods, and can capture nonlinear relationships between input features and labels, so it has widely been used in satellite remote sensing research (Kühnlein et al., 2014).
To select a suitable model for snowfall area classification, all the four ML models are trained with the same input features. Each model's performance is evaluated based on a test data set (i.e., the independent year data), and the skill scores are compared among the models (see Table 2). The validation statistics (i.e., based on 20% of data) are not considered in this study, as it might be biased because of potential overlap between training and validation samples or similarity of the observed scenes. In terms of accuracy score, all the models show greater than 0.85 except the naïve Bayes (0.74). KNN shows the least value of probability of detection (POD: 0.32) and Heidke skill score (HSS: 0.25) when compared to the other models. In terms of bias, decision tree and RF have a bias score close to 1. Bias is calculated by dividing the number of snowfall samples detected by RF-MHS by the number of snowfall samples detected by CPR. A bias value of 1 indicates that the total number of snowfall samples detected by RF-MHS and CPR are equal, and a bias value smaller (larger) than 1 indicates underdetection (overdetection) by RF-MHS, respectively. Overall, RF classification approach stands out as the best among the other models with a bias score 1.01, accuracy score 0.89, HSS score 0.48, and with the lowest value of false alarm rate (0.27) ( Table 2). So, this study uses RF approach for both classification and regression (hereafter "RF-MHS") for further analysis.

Classification Analysis
In this section, the skill of RF-MHS classification for detection of the snowfall area is assessed. Note that a total of eight input features are used in the model which includes five TBs from NOAA 18 MHS, and rest of others are t2mwet from MERRA-2, TPW from MERRA-2, and surface types from AUTOSNOW. The RF-MHS is trained and tested based on the "rotating-train test" approach, and each independent year's statistics is presented in Tables 2 and 4; years of average snowfall occurrence climatology are presented in Figure 3. Figure 3a shows the occurrence of snow reported by CPR during 4 yr of observations. Snow occurrence is estimated by dividing total number of CPR snow by total number of CPR snow and no-snow in each 2°× 2°longitude and latitude grid. The CPR snowfall occurrence map is consistent with previously reported global snow occurrence maps (Kulie & Bennartz, 2009;Liu, 2008). Figure 3b is similar to Figure 3a, but the snow occurrence is estimated from MHS based on the "rotating-train test" RF algorithm. Four years of average RF-MHS snowfall occurrence shows underestimation over few geographic locations such as Greenland, eastern Russia, and Antarctic region, whereas RF-MHS overestimates CPR over northern Russia, eastern coast of the United States, Sea of Japan, and Southern Ocean (Figure 3b). RF-MHS snowfall underestimates CPR by~25% over Greenland and some  regions of northern Antarctic and overestimates CPR by~20% over northern Russia and the Southern Ocean. However, overall, it shows good agreement with CPR in many other locations across the globe (Figure 3b). This implies that RF-MHS algorithm can be transferred to independent years of MHS observations, yet detect global snow with approximate accuracy of 0.88, POD of 0.55, and HSS score of 0.48 (Table 3). It is reasonable to see some lower values of skill scores for independent years test data because the model has never seen those data during the training. It is also important to note that the differences in footprint size between MHS and CloudSat is still a source of uncertainty. The 10 CloudSat footprints considered here for comparison with MHS is based on the lower bound approach discussed in Behrangi et al. (2012). Increasing the number of CloudSat footprints based on other scenarios (e.g., areal fit instead of linear fit) can lead to increase in CPR precipitation occurrence as discussed in Behrangi et al. (2012).
An important question is how the RF-MHS method performs over different surface types identified by the AUTOSNOW product. The RF-MHS performance is further evaluated over different surface types such as over open ocean, snow-free land, snow-covered land, and sea ice, and 4 yr of statistics are presented in Table 3. RF-MHS snowfall area detection skill scores are slightly better over snow-free surfaces when compared to the snow/ice-covered surfaces both over land and ocean. Based on skill scores, RF-MHS shows relatively better performance over snow-free land (HSS: 0.57, POD:0.61, bias: 0.99) and open water (HSS: 0.51,   (Table 03). Although it has been widely accepted that the PMW snow detection over snow/ice surface are complex and associated with large uncertainties (Liu, 2008;Levizzani et al., 2011, Panegrossi et al., 2017Skofronick-Jackson & Johnson, 2011;You et al., 2017 etc.), RF-MHS shows encouraging statistics that the model can be implemented to any independent years data and any surface types. For all combined surface types, an HSS score of 0.48 explains that 48% of the correct classification is not due to chance, and a bias score close to 1 demonstrates the RF-MHS detected snowfall area is similar to the reference CPR.

Importance of the Variables
One of the interesting features of the RF is that it can provide the feature importance based on how each variable influences the entire classification process. The importance of each of the eight features for snowfall area classification is presented in Figure 4. It shows an average important score and its deviation from the mean (error bars) for each input feature using 4 yr of data. The most important features are found to be t2mwet and tb190 with approximately 19% of importance, followed by tb157 and TPW with approximately 13% of importance. t2mwet is an effective variable for precipitation phase separation and has been used in several snowfall classification algorithms (Adhikari et al., 2018;Behrangi et al., 2018;Sims & Liu, 2015). Sims and Liu (2015) reported that t2mwet heavily influences the mode of the precipitation near the surface. They estimated that at or below 1°C t2mwet, there is more than 50% chance that precipitation falls as snow both over land and ocean. Higher-frequency PMW channels such as tb190 and tb157 also show higher importance when compared to other variables. Higher PMW channels (>89 GHz) are sensitive to ice scattering signatures, and several previous studies have reported the potential of satellite higher-frequency channels to detect snowfall signal (Edel et al., 2019;Liu & Seo, 2013;You et al., 2017). TPW shows approximately 13% importance, while surface types are found to be the least important variables with relative importance being less than 2%. It is interesting that the surface type is found as the least important variables among the others, and it is consistent with previous reports (Edel et al., 2019;Rysman et al., 2018). It is speculated that when the surface is below freezing temperatures or covered with snow, RF-MHS determines snow classification based on the t2mwet and other input features, so a separate surface type input is not critical.

Regression Analysis
In this section, surface snowfall rates are estimated based on the eight input features (i.e., those shown in Figure 4) using RF regression approach. The labels are snowfall rates instead of snow or no-snow index used in the classification. A two-step process is used in which in the first step no-snow (or zero snowfall rate) cases are identified, and then a regression approach is applied to only snowfall (or nonzero snowfall rate) cases. The two-step process was selected as it resulted in better statistics than one-step process, and it also enables consistency between snowfall detection and estimation outcomes. Like the classification approach, the regression model is trained based on the "rotating-train test" methods as described in the previous section. The regression output is evaluated mainly based on three skill scores: Pearson correlation coefficient (R), Figure 4. The figure represents the input feature importance to classify MHS pixels. The total sum adds up to 1. Since the model is trained based on "rotating-train test" method, the plot represents 4 yr average importance score, and the error bar represents the standard deviation.

Earth and Space Science
ADHIKARI ET AL.
root-mean-square error (RMSE), and bias. R is a statistical measure of the strength of the relationship between the relative movement of two variables. Its value ranges between −1 and +1. A value of +1 (−1) is a perfect positive (negative) correlation which indicates two variables move in the same (opposite) direction, whereas an R value of 0 implies no linear relationships between the variables. RMSE is the standard deviation of the residuals that tells how well the predicted data is concentrated around the best fit line, and bias compares the predicted total snowfall with respect to the reference. The regression statistics are estimated over open water, snow-free land, snow-covered land, sea ice and combined surface types. For combined surface types, an R value of approximately 0.5 is estimated which indicates that the CPR snowfall pattern is fairly captured by RF-MHS with an error of approximately 0.28 mm/hr (Table 4). For each independent year, the bias score is close to 1, which indicates that the RF-MHS can estimate almost similar amount of accumulated snowfall when compared to the CPR. The analysis is further extended over different surface types. In terms of R value, open ocean and snow-free land show slightly better scores (~0.55) when compared to the snow/ice covered surface (~0.50). The RMSE is comparable to all surface types (~0.30 mm/hr) for each independent test year except a slightly higher value over sea ice (~0.40 mm/hr). The bias score is close to 1 for all surface types except over snow-free land where RF-MHS slightly underestimates CPR. Besides minimal differences over snow-covered land and sea ice, RF-MHS shows an encouraging result.
The above presented statistics are based on footprint to footprint comparison. The skill scores can be improved significantly by averaging them in 0.5°× 0.5°longitude and latitude grid boxes. Scatter density plot of 4 yr of averaged snowfall rates between CPR and RF-MHS over various surface types are presented in Figure 5. The R values are improved noticeably from approximately 0.5 to 0.75 over combined surface types when compared to footprint to footprint statistics (Table 4 and Figure 5). For lighter snowfall rates

Earth and Space Science
(<5 mm/day), RF-MHS slightly overestimates CPR, whereas RF-MHS underestimates CPR for heavy snowfall cases (>15 mm/day). Similar patterns are observed over all surface types ( Figure 5). The bias is almost close to 1 for all surface types except over snow-free land, where a slight underestimation (bias = 0.92) by RF-MHS is observed. Although the overall statistics are promising for all surface types, capturing a correct snowfall rate remains a challenge.
To understand the differences and biases in terms of spatial distributions, global maps of mean snowfall rates are generated and presented in Figure 6. Figure 6a shows geographical distributions of mean unconditional snowfall rates estimated by CPR for 2007-2010. The mean snowfall rates in each 2°× 2°grid boxes are aggregated as in the previous figures. A similar plot is constructed by RF-MHS predicted snowfall rates using a similar time period of data based on "rotating-train test" method and presented in Figure 6b. Both figures are fairly similar, and only slight differences are observed (Figure 6c). Over the Northern Hemisphere, RF-MHS predicted snowfall rates slightly underestimates CPR (up to 0.5 mm/day) in few spots such as over the north east coast of the United States, southern coastal region of Greenland, and northern part of Europe. Over the Southern Ocean, RF-MHS is compared well with CPR except some overestimation or underestimation in few spots as displayed in Figure 6c. The pattern seems to be more or less random and might be due to sampling stability. Over northern hemispheric higher latitudes (70°N and beyond), although most of the regions are either frequently or permanently covered with snow (Karl et al., 1993;Prigent et al., 2006;Turk et al., 2014), RF-MHS algorithm shows promising results with a difference of 0.2 mm/day or less when compared to CPR. It is reasonable to see slight underestimation over these regions because under colder environments, scattering signatures are relatively weaker because of the smaller snowflake sizes (Kongoli et al., 2015;Liu & Seo, 2013). It is also true that the snowfall rates are often small over these regions, so the differences are naturally small. Over the Antarctic region, similar results are observed with slight underestimation by RF-MHS predicted snowfall rates. Note that CPR distribution represents statistics for only those footprints that are matched with MHS. It can be concluded that RF-MHS regression approach is able to reproduce global snow climatology maps that are comparable with CloudSat in terms of overall pattern and magnitude.

Evaluation and Discussion
This section evaluates RF-MHS estimated snowfall rates for a case study (i.e., an event) using ground observation and compares the estimate with reanalysis (i.e., MERRA-2) and GPROF products. The goal is to assess the snowfall pattern obtained from the RF-MHS estimate at an event scale. Figure 7 shows a case study, representing a mesoscale winter storm over the United States on 20 December 2010 at approximately 10 UTC. Figure 7a shows precipitation rate by NCEP Stage-IV (mm/hr). The observed precipitation is indeed associated with snow because the MERRA-2 surface 2 m temperatures are well below 0°C, indicated by the black contours in Figure 7a. During the event, a long narrow band of winter storm has occurred that extends from Illinois all the way to North Dakota and Canada border. Stage-IV product shows large areas with approximately 1.5 mm/hr snowfall rate. Similar pattern, but slightly wider, is also reported by MERRA-2 snowfall product (Figure 7b), which is derived from MERRA-2 precipitation product with applying a temperature threshold of 1°C. The snow event coincides with a MHS (on NOAA-18) overpass  Figure 7d. A clear and distinct snowfall band is predicted by RF-MHS algorithm with surface snowfall rates up to 1.5 mm/hr. The RF-MHS snowfall rates are fairly comparable with the Stage-IV product in terms of intensity, but with a slightly wider band (especially toward the south near Indiana and Iowa). The RF-MHS snowfall estimates also agree well with the MERRA-2 snowfall product, in terms of both intensity and spatial pattern (Figures 7b and  7c). This verifies that RF-MHS snowfall product can reasonably detect and estimate snowfall rates.

Comparison With Other Products
RF-MHS estimated snowfall rates are further compared with snowfall rates from some of the popular products such as MERRA-2, AIRS precipitation, and MHS-NOAA18 GPROF V05 product using CPR as the reference. AIRS precipitation is used in GPCP in high latitudes over both land and ocean. The outcomes over various surface types such as open water, snow-free land, snow-covered land, and sea ice as well as over combined types are summarized in Table 5. Note that for consistency, AIRS and MHS-GPROF precipitation are

10.1029/2020EA001357
Earth and Space Science converted to snowfall using MERRA-2 t2mwet below 1°C. The current version of GPROF correlates low (R = 0.10) with and notably underestimates (bias = 0.22) the CPR snowfall rates (Figure 1 and Table 5). The GPROF estimate shows almost no linear relationship with CPR over snow-free land (R = −0.04) and over sea ice (R = 0.05). Table 5 further shows that AIRS has relatively better statistics with CPR (R = 0.36 and bias = 0.59) over combined surface types, whereas the statistics drop over the sea ice surface (R = 0.28 and bias = 0.52). MERRA-2 shows better agreement with the CPR (R = 0.52 and bias = 0.81) when compared to GPROF and AIRS for combined surface types. For each surface type, MERRA-2 statistics are consistent to each other and do not fluctuate much. RF-MHS snowfall outperforms both MHS-GPROF and AIRS products and shows comparable results to MERRA-2, with slightly better performance for BIAS. From Table 5, it is seen that RF-MHS snowfall has a correlation coefficient of 0.53 with the CPR that is the highest among the others. The bias value 0.96 indicates that the total snow retrieve from RF-MHS approach is almost close to the CPR with only a slight underestimation, and an RMSE of 0.27 mm/hr indicates the lower value of prediction error (Table 5). Over snow/ice covered regions, RF-MHS shows good agreement to the CPR as indicated by R and bias values of approximately 0.52 and 0.82, respectively. This demonstrates that RF-MHS estimated global snowfall over various surface types agrees better with CPR estimation than the current GPROF and a reanalysis product. The RF-MHS algorithm performs better or consistent with some of the earlier snowfall estimation studies (such as Liu & Seo, 2013, Kongoli et al., 2015Tang et al., 2018).
Note that the above statistics are based on the RF-MHS model where eight input features are used to train the model including t2mwet and TPW from MERRA-2. t2mwet is already established as an important variable to separate snowfall from rain and combination of near-surface air temperature, and TPW can provide information about precipitation regime as it is also used in GPROF. Figure 4 also suggests that TPW is important for snow/no-snow classification (~13%). A sensitivity test is conducted to analyze the impact of TPW on RF-MHS performance without including the TPW as an input features, and statistics are listed in Table 5. The statistics drop for all surface types when TPW is excluded. For example, the RF-MHS snowfall rate (without TPW) correlates lower (R = 0.41) when compared to the RF-MHS with TPW (R = 0.52) and the R value is reduced (from~50 to~0.37) over snow/ice covered surfaces. This indicates that without TPW, RF-MHS snowfall retrieval is less skillful but still outperforms both AIRS and MHS-GPROF estimates. Figure 8 shows geographical distribution of mean snowfall rates estimated by RF-MHS algorithm, MERRA-2, and AIRS IR-only products for footprints matched with CPR and by applying a t2mwet threshold below 1°C . Note that the represented samples include only those footprints where all the given products are matched with CPR footprints within the given threshold, that is, within 15 min of overpass and less than 20 km of

10.1029/2020EA001357
Earth and Space Science distance. Therefore, in some regions, all the sensors may not overlap with each other within the threshold used in this study, and some of the stripes or gaps are expected (see Figure 10c) because of the sample issues as they all have different scanning geometry. Although CPR and AIRS are flying together, only about 1 min apart, they do not always overlap each other within the above mentioned thresholds. The orientation of both satellites could be different and follow the same pattern when passing over the particular locations each time. This might be the reason why some of the spots in the Southern Ocean do not have matched samples even considering the 4 yr of matching samples between CPR and AIRS. Over the Northern Hemisphere in several geographical locations (e.g., the United States, northeast part of Asia, and northern Europe), RF-MHS predicted snowfall rates (Figure 8a) are well compared with MERRA-2 ( Figure 8b) and AIRS (Figure 8c). However, over Greenland, northeast of Canada, and the Arctic region, RF-MHS predicted snowfall rates slightly underestimate MERRA-2 and AIRS, which is consistent with CPR product (Figure 6c). The slight underestimation over the above mentioned regions are associated with snow-covered land/ocean which might cause augmentation in TBs or difficulty in distinguishing snowfall signals from surface or snow producing clouds. Another possible reason is that land systems have more supercooled liquid water when compared to ocean, and these supercooled liquid water may increase TBs instead of the ice scattering (Kulie et al., 2010;Liu & Seo, 2013). Over the Southern Hemisphere, RF-MHS predicted snowfall rates provide almost identical pattern to MERRA-2 with slight underestimation especially over the Antarctic Peninsula.

Error Analysis
The above analysis demonstrates the ability of RF-MHS to detect and estimate snowfall from MHS. While RF-MHS performance is fairly robust at regional and global scales, some of the snow/ice covered regions such as Greenland, Antarctic regions, and some spots beyond 70°N still show relatively higher errors for both detection and estimation. Edel et al. (2019) noticed that performance of their snowfall detection algorithm over Greenland is off by 30% when compared to the reference CPR and stated that extremely dry weather is the reason behind it. To understand the errors in the RF-MHS detection algorithm, CPR versus RF-MHS skill scores are analyzed as a function TPW and T2M and are presented in Figure 9 (solid curves). For extreme values of TPW (<5 and >14 mm), RF-MHS performance is sharply declined as indicated by lower values of HSS and POD and higher values of FAR (Figure 9a). This indicates that if the column of atmosphere is either very humid or dry, RF-MHS has difficulty to detect snowfall. The RF-MHS detection algorithm also shows its dependency on near-surface air temperatures. The POD near-freezing temperature is approximately 0.7, which is then decreases rapidly with decreasing the surface temperatures and reaches to~0.3 at or below −25°C (Figure 9b). The lower values of POD and HSS are observed specially over those regions where CPR and RF-MHS differences are high such as the northern part of the Greenland and Antarctic Peninsula regions (not shown). Similar skill score analysis between CPR and AIRS is conducted and presented in Figure 9 (dashed curves). When compared to AIRS-CPR analysis, RF-CPR shows better HSS scores for all values of TPW and T2M except the very low end (T2M < −23°C and TPW < 2 mm), where the AIRS-CPR has better HSS score than the RF-CPR. RF-CPR shows lower FAR scores than AIRS-CPR across all TPW and T2M values, whereas AIRS-CPR shows higher POD scores, almost for the entire TPW and T2M values.
Furthermore, and to investigate the effect of surface type, 4 yr of average CPR, RF-MHS, RF-NO-TPW, GPROF, and AIRS mean snowfall rates are shown as a function of T2M and TPW over various surface Figure 8. Geographical distribution of mean snowfall rate (mm/day) estimated using four years of (a) RF (b) MERRA-2, and (c) AIRS estimates. The RF estimation is average of 4 yr of model output which is trained and tested based on "rotating-train test" scheme. Note that the average is calculated based on matched pixels between CPR, MERRA-2, and AIRS. MERRA-2 t2mwet less than or equal to 1°C is used for both Panels (b) and (c). types ( Figures 10 and 11), respectively. The mean snowfall rates are calculated in each T2M and TPW bins by dividing accumulated snow by total number of samples including both snow and no-snow. CPR snowfall rate pattern is well captured by RF-MHS for all temperature bins over ocean, snow-covered land, and sea ice (Figures 10a, 10c, and 10d). Over snow-free land, RF-MHS slightly underestimates CPR especially for relatively warmer temperatures (> −10°C) (Figure 10b). This is quite interesting and not expected because we generally believe that the snow retrieval over snow/ice covered surfaces is difficult. The exact reason is unknown, but it is possible that the learning algorithm performs better when surface is relatively homogeneous (e.g., ocean or snow/ice covered surfaces) when compared to heterogenous surface (e.g., snow-free land). MHS-GPROF heavily underestimates CPR at all temperature bins for all surface types, whereas AIRS perform better than GPROF, but still large underestimation is observed when compared to the CPR (Figure 10). The RF-NO-TPW (RF estimates without TPW as an input feature) is consistent with CPR at near-freezing surface temperatures but tends to underestimate CPR when surface becomes colder. It should be noted that while the precipitation products show differences in mean snowfall rates as a function of T2M, they generally follow a similar decreasing pattern as T2M decreases. The only exception is AIRS over open water and snow-free land that shows a slightly different pattern. Figure 11 is similar to Figure 10 but shows mean snowfall rates as a function of TPW instead of  , which is trained and tested based on "rotating-train test" scheme. Red, blue, and black curves represent FAR, HSS, and POD, respectively.

Earth and Space Science
T2M. RF-MHS shows better agreements with CPR than the other products at all TPW bins over all surface types. AIRS shows better performance than the MHS-GPROF but still underestimates CPR and shows a slightly different pattern (e.g., the value of TPW corresponding to the maximum mean snowfall rate) compared to the other products. RF-NO-TPW overestimates the CPR when TPW is below~5 mm, and above that value it underestimates CPR over all surface types.
While the above findings are encouraging and illustrate the value of using ML techniques to retrieve global snowfall, some of the shortcomings of this study should be highlighted. First, this study has considered CPR surface snowfall rates as a reference data set for training and testing. CPR estimates snowfall rates based on the near-surface reflectivity which itself may suffer from attenuation problem or missing near-surface snowfall due to the near-surface clutter issue. The CPR estimated snowfall rate could be overestimated significantly by ground clutter especially over complex terrain such as mountainous regions and fjords (Palerme et al., 2018). Another important challenge is matching up the CPR with MHS footprints. CPR has a much smaller footprint size (1.4 × 1.7 km 2 ) than MHS footprint (16 × 16 km 2 at nadir). MHS footprint becomes even larger toward the edge of the swath. To mitigate the footprint size differences, we calculated running mean of 10 CPR footprints prior to the comparison. However, one may argue that more CPR footprints need to be averaged (see Behrangi et al., 2012). Another issue is related to the MHS footprints themselves, in which TBs need to be adjusted for changes in scan angle and slant path. We classified MHS footprints into four different groups based on their locations and trained them separately. This might reduce some uncertainties associated with the slant path, but it may not be accurate enough. Furthermore, uncertainties related to MERRA-2 product in colder environments also need to be considered. It is stated that MERRA-2 precipitation shows higher biases especially over the Arctic region (Boisvert et al., 2018). MERRA-2 surface temperatures could have up to 15% of false alarm rate when compared to the ground-based observations, especially when the surface is near-freezing temperatures (Adhikari & Liu, 2019a, 2019b. The use of more accurate auxiliary data may improve our analysis.

Summary and Conclusion
This study develops and compares several ML methods for detecting snowfall occurrence and estimating snowfall rate using MHS observations and auxiliary data set. Four years (2007)(2008)(2009)(2010) of CloudSat CPR snow product is used as a reference for training and testing based on the "rotating-train test" scheme to create independent samples using the entire 4 yr of data. RF algorithm is found to be the best among the other

10.1029/2020EA001357
Earth and Space Science studied methods, so we chose that for the final classification and regression analysis. A global snowfall statistic is estimated and compared with the CPR, MHS-GPROF, MERRA-2, and AIRS products. The main findings of the study are summarized as follows: 1. Four ML techniques (RF, KNN, decision tree, and naïve Bayes) are developed and tested for snow classification. The models' performance is further analyzed based on independent year data sets. Statistics such as HSS (0.48), BIAS (0.99), and FAR (0.45) suggested that RF method has the best skill to estimate snowfall from the passive MW sounder (i.e., MHS) observations with the help of some additional auxiliary data set. 2. The geographical distribution of RF-based snowfall occurrence is generally comparable to the CPR estimates, but it also shows some overestimation over the Southern Ocean and underestimation over Greenland and the Arctic Ocean. 3. The feature's importance provided by the RF-MHS method shows that surface 2 m wet bulb temperatures and TBs at 190 GHz have the highest importance for snowfall area classification with approximately 18% importance, followed by TBs at 157 GHz (16%), and TPW (13%). Surface type shows the least importance (~1.5%) among the other features that might be due to the capability of the remaining input features to provide information pertinent to the surface types. 4. A case study over the United States demonstrates that RF-MHS predicted snowfall rate is well agreed with NCEP Stage-IV and MERRA-2 products in terms of capturing the overall magnitude and location of the snowfall event. 5. The RF estimated global snowfall rates is well compared with CPR estimates (R = 0.51, bias = 0.96, and RMSE = 0.27 mm/hr) and outperforms MHS-GPROF (R = 0.13, bias = 0.22, and RMSE = 0.36 mm/hr) and AIRS (R = 0.36, bias = 0.59, and RMSE = 0.31 mm/hr). It produces results comparable to MERRA-2 (R = 0.52, bias = 0.81, and RMSE = 0.28 mm/hr). When compared to other products, RF-MHS performs better as a function of T2M and TPW.
This study shows that ML has a potential to learn complicated relationships between input features (here from MHS and auxiliary data) and reference snowfall product (here from CloudSat). We found that RF is the most robust method among the three other ML methods explored here, but this does not mean that RF-MHS is the best available option. Other options could be the use of Deep Neural Networks (DNN). A study by Tang et al. (2018), who used DNN to retrieve snow and rain in high latitudes using GMI, MODIS, and reanalysis data, showed that the DNN-based snowfall retrieval is close to reanalysis data. DNNs have been successfully applied in many diverse domains. Recently, there has been increasing interest in using DNNs to generate and improve weather forecasting. Long short-term memory (LSTM), Convolutional Neural Network (CNN, or ConvNet), and a recently proposed approach Convolutional LSTM (ConvLSTM) are other options that can compete with RF-MHS and should be tested for precipitation retrieval.
ML seems to be a valuable approach to mitigate some of the challenges that current popular precipitation products (e.g., GPROF) have for snowfall retrieval in high latitudes. The fact that the performance of the RF-MHS method is fairly stable over different surface types is particularly useful for implementation in the combined products such as the NASA Integrated Multisatellite Retrievals for GPM (IMERG; Huffman et al., 2019) as IMERG currently uses MW retrievals only over snow/ice-free surfaces. However, progress is also being made by other groups through development of more advanced physical-based methods. It seems that combinations of ML and physical methods are needed to maximize our understanding of physical processes as well as the retrieval accuracy. While in the present study we explored MHS data, the use of more advanced sensors (e.g., the Advanced Microwave Sounder, ATMS, or future generation of sensors) can help better learn features of the reference snowfall product, so they can be reproduced more effectively using independent samples.