Improving Snow‐Process Modeling by Evaluating Reanalysis Vertical Temperature Profiles Using a Distributed Hydrological Model

The use of the Japanese 55‐year Reanalysis (JRA55) vertical temperature profile (VTP) has been suggested in previous studies for estimating gridded air temperatures at high elevations in data‐scarce regions to conduct hydrological simulations. In this study, we investigate and further enhance the applicability of VTP for hydrological simulations through several experimental sets using a distributed hydrological model with improved snow physics (WEB‐DHM‐S) based on water and energy budget to simulate the hydrology of a snowfed river basin in Japan, Kurobe Basin, while simultaneously correcting the snow precipitation estimates through optimization. We examined the simulated discharge, point snow depth, and snow cover, finding that the snow processes were highly sensitive to the individual station used with VTP for extrapolating air temperature to higher elevations, with corrected snowfall estimates differing by as much as 18%. We further compared this with simulations based on linear regression‐based lapse rates derived from station observations (scarce and often distributed at low elevations). It was found that the use of VTP improved discharge simulations but deteriorated point snow depth simulations at low elevations, compared with linear regression‐based lapse rates. A hybrid of the two methods led to better discharge and point snow depth simulations with reasonable snow cover simulations. The study advances the application of snow‐process models in data‐scarce regions by enhancing our understanding about the usage of VTP.

the vertical gradient of air temperature with elevation. This air temperature lapse rate is known to be a sensitive parameter in snowmelt runoff simulations (Liu et al., 2018;Martinec & Rango, 1986). A spatiotemporally constant value of −6.5 K/km for the air temperature lapse rate (also known as the environmental lapse rate) has been used in a number of previous studies (Fiddes & Gruber, 2014;Stull, 2018). However, there is increasing evidence that the assumption of this constant lapse rate in space-time does not always hold true and thus may lead to unreliable estimates of air temperature (Kunkel, 1989;Liston & Elder, 2006). To deal with this, one of the simplest methods that makes use of available station observations is to estimate the lapse rate based on a linear regression between the station air temperature measurement and its elevation (Harlow et al., 2004;Hay & Clark, 2003;Hay et al., 2000;de Scally, 1997;Wilby et al., 1999Wilby et al., , 2000. Furthermore, a number of regional studies have derived and analyzed spatiotemporally varying lapse rates. These studies found that regional lapse rates generally exhibited a strong diurnal variation and a distinct seasonal cycle and also depended upon synoptic weather conditions (Blandford et al., 2008;Dodson & Marks, 1997;X. Li et al., 2013;Minder et al., 2010;Navarro-Serrano et al., 2018;Pepin et al., 2006). However, in these studies, the derivation of lapse rates relied on a large number of station observations. For areas with no such ground observations or observations in low-elevated regions, other alternatives are required. For example, Gruber (2012) derived lapse rates using the lowermost seven pressure levels of the National Centers for Environmental Prediction reanalysis. However, the derivation and evaluation of these lapse rates was not the primary objective of their study and hence was not thoroughly investigated. Gao et al. (2012) derived lapse rates using the European Center for Medium-Range Weather Forecasts reanalysis ERA-Interim air temperature over three different pressure levels (700, 850, and 925 hPa) to correct the ERA-Interim 2 m air temperature. Their results have been validated at several sites in the German and Swiss Alps and exhibited better performance, rather than using a constant or monthly lapse rate. They further extended the application of their method over the Tibetan Plateau (Gao et al., 2017) and the Tian Shan Mountains in China (Gao et al., 2018), and similar results were established. However, they used different combinations of pressure levels for different study areas to estimate lapse rates. Tang et al. (2018) also downscaled air temperature using ERA-Interim based air temperature over pressure levels, following Gao et al. (2012) and Gruber (2012). They established that, in comparison with more straightforward methods such as nearest neighbor or bilinear interpolation, the method by Gruber (2012) performed better for phase partitioning of precipitation. More recently, Dutra et al. (2020) used different combinations of ERA5 pressure level data to estimate lapse rates and applied corrections to the ERA5 2 m air temperature. However, they noted limited improvements when comparing ERA-Interim to ERA5.
In all of the aforementioned studies, elevation-corrected air temperatures were only cross-validated against other station observations and were not assessed for hydrological modeling applications. However, some recent studies have also utilized distributed hydrological models to study the impact of using lapse rates derived from different methods on the performance of the hydrological model. Wang et al. (2016) used nighttime Moderate Resolution Imaging Spectroradiometer (MODIS) land surface temperature (LST) to derive lapse rates for extrapolating station air temperature observations. These air temperature observations were then used to drive a water and energy budget based distributed hydrological model with improved snow physics (WEB-DHM-S). The performance of the model-simulated discharge from a series of experiments was compared and it was observed that the MODIS LST-based lapse rates derived at various snow conditions performed better than those derived by linear regression using station observations. Recently, Naseer et al. (2019) used the entire vertical temperature profile (VTP) from the JRA55 reanalysis and considered its implications for snowmelt modeling. A single temperature observation station was extrapolated using the VTP to obtain gridded air temperature for use as a forcing in WEB-DHM-S. They further used these gridded air temperatures for characterizing the phase of precipitation and applying snowfall correction factors to a radar-gauge composite precipitation product for Japan called Radar-AMeDAS (AMeDAS is the Automated Meteorological Data Acquisition System). These snowfall correction factors were applied since Radar-AMeDAS is known to underestimate snowfall because the Z-R relationship (where Z is radar reflectivity and R is precipitation rate) is designed primarily for liquid precipitation and the AMeDAS observations used for calibration of the radar product are scarce in high elevation regions (Shrestha, Koike, et al., 2014). However, the use of different air temperature observation stations with VTP and its implications for hydrological modeling were not discussed in the study by Naseer et al. (2019). Furthermore, a comparative analysis of the performance of hydrological simulations using lapse rates based on VTP with lapse rates derived by linear regression using station observations remains to be conducted. Moreover, both studies do not evaluate how extrapolating air temperature to elevations higher than where station data are available alters MOIZ ET AL.

10.1029/2021JD036174
3 of 25 estimation of the precipitation phase and corrected gridded snowfall estimates (such as those obtained by the application of snowfall correction factors defined by Naseer et al. (2019) for Radar-AMeDAS), since it is a function of air temperature.
Keeping these limitations in mind, we further apply the method developed by Naseer et al. (2019). This study aims to determine (a) how the use of individual station observations with VTP affects snowfall correction factors for Radar-AMeDAS (defined by Naseer et al. (2019)) and the resulting snowmelt simulations; (b) how well does the VTP-based method compare with simple linear regression-based methods from a hydrological modeling perspective and if it is possible to develop hybrid products with the benefit of both methods; and (c) if the VTP-based method can also be applied to reanalysis-based 2 m air temperature products to effectively "downscale" them and see what their implications in hydrological modeling are.

Distributed Hydrological Model
In this study, WEB-DHM-S was utilized to analyze the impact of different air temperature correction methods on the performance of snow-process modeling. WEB-DHM-S is based on the original water and energy budget based distributed hydrological model (WEB-DHM) (Wang, Koike, Yang, Jackson, et al., 2009;Wang, Koike, Yang, & Yeh, 2009), developed by coupling a land surface model, Simple Biosphere Model 2 (SiB2) Sellers, Tucker, et al., 1996), with the distributed hydrological model known as the geomorphology-based hydrological model (GBHM) (Yang et al., 2004).
In previous research, the snow modeling capabilities of WEB-DHM were further improved by coupling the three-layered energy balance snow scheme from the Simplified Simple Biosphere 3 model (SSiB3) (Xue et al., 2003) and the prognostic albedo scheme from the Biosphere-Atmosphere Transfer Scheme (BATS) (Dickinson et al., 1993) with WEB-DHM. The model was henceforth called WEB-DHM-S and could simulate snow processes more accurately than WEB-DHM (Shrestha et al., 2010). In addition, WEB-DHM-S could also simulate the spatial distribution of snow variables, including the snow water equivalent, snow density, snow depth, snow albedo, and snow surface temperature. WEB-DHM-S-derived spatially distributed snow cover simulations have been successfully compared with MODIS snow cover products in a number of previous studies (Bhatti et al., 2016;Naseer et al., 2019;Shrestha et al., 2012). The most recent iteration of WEB-DHM-S also employs parallel programming for grid-based processes to improve its computational efficiency. For this study, the most recent iteration of WEB-DHM-S was utilized. The model consists of regular grids and for this study we configured the model grid size to be 250 m. All the input gridded data sets were resampled to match the model grid size. Furthermore, as the target basin analyzed for the purpose of this study contains several dams, we employed a dam function which bases the releases to downstream on observation data and maintains the reservoir mass balance.

Air Temperature Correction Using Ground Station Observations
Corrections to air temperature are generally applied while interpolating observations to the grids of the hydrological model based on lapse rates derived from ground station observations using a detrended inverse distance weighting interpolation method (Shrestha et al., 2012). In this study, the lapse rates are derived based on a simple linear regression between the air temperature and elevation of the stations. Some studies have also employed factors such as the longitude and latitude of the stations (Y. Li et al., 2015); however, in this study, due to the small size of the river basin, we only considered the station elevations. This linear relationship is shown below: where is the air temperature observation at each station, a and are regression coefficients, and is the elevation of each station. represents the near-surface air temperature lapse rate(Γ) here and can be calculated for various time scales depending on .

Air Temperature Correction Using VTP
Corrections to air temperature were applied using VTP and a single observation station as proposed by Naseer et al. (2019). In this case, instead of using a linear lapse rate, the entire VTP was used to calculate the air temperature at various model grids. In this study, VTP is based on pressure level data of the JRA55 reanalysis data set (Kobayashi et al., 2015). Air temperatures and geopotential heights are defined over 37 pressure levels ranging from 1,000 hPa to 1 hPa, showing the variation of air temperature with altitude vertically above the surface. An offset is applied to the VTP by comparing the air temperature values at one selected observation station with the air temperature value of the VTP at the same elevation. Equations 2 and 3 represent a simpler formulation of the method proposed by Naseer et al. (2019) for general applicability. Figure 1 shows a schematic illustration of this method for the case of a single model grid. To prepare the gridded model inputs, this process was conducted for all model grids, as given by: where corr is the corrected air temperature at the model grid, model is the elevation of the model grid, model is the geopotential height of the lower reanalysis pressure level within which the model grid elevation lies, model is the corresponding air temperature at this pressure level, and Γ model is the lapse rate between the pressure levels within which the model grid elevation lies and the offset is given by, here obs is the air temperature at the individual observation station (which is to be extrapolated), obs is the elevation of the station, obs is the geopotential height of the lower reanalysis pressure level within which the station elevation lies, obs is the corresponding air temperature at this pressure level, and Γ obs is the lapse rate between the pressure levels within which the station elevation lies.

Snow Precipitation Correction
Despite the availability of highly sophisticated energy balance snow models, one of the biggest limitations to their application is the uncertainty in snow precipitation measurements, which are especially scarce in mountainous regions (Elsner et al., 2014;Materia et al., 2010;Nasonova et al., 2011). The interpolation of these sparse observations at high elevations for distributed hydrological modeling can lead to very poor discharge simulations compared with when some form of elevation dependent correction is applied, with Nash-Sutcliffe efficiency (NS) (Nash & Sutcliffe, 1970) varying by as much as from −0.03 to 0.66 as reported by Shrestha, Wang et al. (2014). Mo et al. (2012) also reported that precipitation forcing is the primary source of uncertainties in soil moisture and runoff simulations, particularly in the Western United States where the number of precipitation gauges are relatively limited. Radar products, on the other hand, offer a much wider spatial coverage compared with gauge-based measurements. In this study, we used the gridded precipitation data from a radar-gauge composite product developed by the Japan Meteorological Agency (JMA) called Radar-AMeDAS for all the experiments of this study. Radar-AMeDAS consists of observations from 20 JMA C-band radars and 26 MLIT (Ministry of Land Infrastructure Transport and Tourism) C-band radars. However, due to several factors, such as the high variability of the Z-R relationship employed, orographic obstacles, and distance from the radar, AMeDAS observations are used by JMA to calibrate the Radar-AMeDAS product (Makihara et al., 1996;MRI, 2000).
Despite the calibration of the radar product by AMeDAS observations, Radar-AMeDAS is known to underestimate snowfall in mountainous regions of Japan (Motoya et al., 2001;Shrestha, Koike, et al., 2014;Yamamoto et al., 2011). This is because of the reason that there are insufficient AMeDAS gauges in the mountainous regions, leading to poor Radar-AMeDAS snowfall estimates even after calibration with AMeDAS observations. Furthermore, the Z-R relationship employed in the Radar-AMeDAS product, while suitable for rainfall, may not be suitable for snowfall and is known to vary considerably for snowfall depending on factors such as the size and shape of snow crystals and the liquid content in partially melted snowfall (Fassnacht et al., 1999(Fassnacht et al., , 2003Matrosov, 1992). To overcome this underestimation in Radar-AMeDAS snowfall, Shrestha, Koike, et al. (2014) and Shrestha, Wang et al. (2014) applied a snowfall correction factor to the snow precipitation estimates provided by Radar-AMeDAS. However, recently, Naseer et al. (2019) suggested that different snowfall correction factors should be applied to the two elevation bands, one above the constant altitude plan position indicator (CAPPI) and one below it. CAPPI is a characteristic of the radar, and in the case of Radar-AMeDAS, is located at an altitude of 2 km. Furthermore, to reconstruct snowfall estimates at both the point and basin scales, several studies have relied on inverse modeling approaches (Bartolini et al., 2011;Cherry et al., 2005;Valery et al., 2009). Most of these studies only minimized errors in discharge simulations to estimate snowfall; however, Shrestha, Koike, et al. (2014) and Shrestha, Wang et al. (2014) developed a framework to minimize errors in both discharge and snow cover simulations in order to apply corrections to gauge-based snow precipitation. Our study optimized two snowfall correction factors for Radar-AMeDAS as suggested by Naseer et al. (2019) using the optimization framework of Shrestha, Koike, et al. (2014) and Shrestha, Wang et al. (2014). The framework for correcting the Radar-AMeDAS snow precipitation used in this study is displayed in Figure 2.
The WEB-DHM-S model is forced with the corrected gridded air temperature estimates along with other static data sets (a digital elevation model or DEM, as well as land use and soil) and meteorological data sets (precipitation, surface pressure, cloud cover, wind speed, relative humidity, downward longwave, and shortwave radiation). The phase of the Radar-AMeDAS precipitation was determined using the air temperature threshold snow . In this study, we set the air temperature threshold for phase partitioning ( snow) to 0 o C based on the findings of Shrestha, Koike, et al. (2014) and Shrestha, Wang et al. (2014), who used a similar modeling and optimization framework for a different river basin in Japan. This assumption was further tested through model sensitivity runs using different values of snow (see Section 4.3). Once the precipitation phase was determined, a correction factor is applied based on the grid elevation. The corrected snow precipitation is given by: where 0 is the original precipitation from the radar, 1 is the correction factor at/below the CAPPI, 2 is the correction factor above the CAPPI, is the elevation of the WEB-DHM-S model grid, and is the corrected snow precipitation. The model-simulated discharge was compared with the observed discharge to calculate the discharge error ( err ), and the model-simulated snow cover pixels were compared with the MODIS snow cover product (MOD10A2) to calculate the snow cover error ( err ). The objective function to be minimized is given by: where, err is the objective function, and is defined as assigning weights to both error components. Following Shrestha, Koike, et al. (2014) and Shrestha, Wang et al. (2014), should be defined in such a way that both error components are given equal weights. is given by: here, err is defined by a combination of the NS and relative volume error (RVE), as given by: The snow cover error ( err ) is calculated based on a pixel-by-pixel comparison between MODIS and model-simulated snow cover and is given by: , and are discharge error, snow cover pixel error, and the objective function, respectively. NS and RVE are the Nash-Sutcliffe efficiency coefficient and relative volume error, respectively. OE and UE are the snow cover overestimation (land classified as snow) and underestimation (snow classified as land) errors at time step respectively, where is the total number of time steps (see Equation 10). air is the grid air temperature and snow is the threshold temperature for phase partitioning of precipitation. 0 is the original snow precipitation from Radar-AMeDAS, 1 is the correction factor at/below the CAPPI, 2 is the correction factor above the CAPPI and is the corrected Radar-AMeDAS snow precipitation.

of 25
where, OE is the model overestimation error (land classified as snow), UE is the model underestimation error (snow classified as land), and is the number of time steps. , , , and are the number of pixels defined by the confusion matrix where, = pixels with snow in both the model and MODIS, = pixels with snow in the model, but no snow in MODIS, = pixels with no snow in the model and snow in MODIS, and = pixels with no snow in either the model or MODIS. The objective function is minimized by utilizing a well-known heuristic algorithm called the shuffle complex evolution developed by University of Arizona (SCE-UA) (Duan et al., 1992(Duan et al., , 1993(Duan et al., , 1994 to determine the snow correction factors 1 and 2 .

Study Area and Experiment Design
This study was conducted in a typical snow-dominated watershed located in Toyama Prefecture, Japan, with an area of 592km 2 (Figure 3). The river basin is known to receive heavy amounts of snowfall, with the snow depth reported to be over 4 m during winters. Heavy snowfall is associated with the cold and dry northwesterly winds of the Asian winter monsoon with sensible and latent fluxes of the Japan Sea . The basin has four hydropower dams operated by the Kansai Electric Power Company (KEPCO), which rely largely on seasonal snowmelt, particularly Kurobe Dam, which is the highest dam in Japan with an effective storage capacity of 148.8 million m 3 . The basin exhibits a wide variation in elevation from sea level to above 3,000 m. At Kurobe Dam, the coldest mean temperatures are observed in January at −5 o C and the warmest temperatures in August, with a mean of 20 o C . Much of the winter-accumulated snow melts from April to July. From August to September, the basin is usually covered with minimal to no snow, and much of the precipitation is largely received in the form of rainfall commonly associated with typhoons and Mei-Yu fronts, which can produce sharp flood peaks. Further information on the climatology and hydrology of the Japanese Alps in Toyama Prefecture is provided by Ma et al. (2013) and Yamanaka et al. (2012). Table 1 shows the experimental runs conducted in this study to determine the effect of using different elevation correction methods for air temperature on the hydrological simulations in a snowfed watershed. We further grouped these experimental runs into three unique experimental sets. For all experimental runs, the hydrological model (WEB-DHM-S) was calibrated for land-soil parameters during the summer season. The initial conditions for all the experimental runs were obtained by running the model multiple times in a spin-up period using the forcing data from 2010-09-15 to 2011-09-15 until the relative change in soil moisture and ground water storage between successive runs became less than 0.1% (see Moiz et al., 2018;Shrestha, Wang, et al., 2014;Wang, Koike, Yang, & Yeh, 2009). The model was configured at a resolution of 250 m. All experimental runs were simulated for four hydrological years (2011)(2012)(2013)(2014).
In experiment set 1, we compared four different experiments (KR-VTP, SD-VTP, KD-VTP, and DD-VTP), all of which used the Naseer et al. (2019) elevation correction method. Here, KR, SD, KD, and DD are the names of the stations used in different experiments; that is, Kurobe, Sennindani, Koyadaira, and Dashidaira, respectively, to offset the VTP. For example, KR-VTP uses air temperature observations at the Kurobe station to offset the VTP. In this way, experiment set 1 aimed to determine how the usage of different stations to offset the VTP affected the hydrological simulations and how differently the snowfall correction factors could be optimized for each experimental run. In experiment set 2, the performance of the VTP-based air temperature correction method (Section 2.3) was compared with the hourly lapse rates (HLR) derived by the linear regression-based air Note. 1 is the correction factor at/below the radar-AMeDAS CAPPI and 2 is the correction factor above the radar-AMeDAS CAPPI. Here, KEPCO = Kansai Electric Power Company and AMeDAS = Automated Meteorological Data Acquisition System. Ratio (2011Ratio ( -2014 MOIZ ET AL.

Table 1 Description of Experiments Along With the Optimized Snow Correction Factors and the Resulting Basin Average Annual Corrected Snowfall and Snowfall/Rainfall
10.1029/2021JD036174 9 of 25 temperature correction method (Section 2.2), using hydrological simulations. We further subdivided experiment set 2 into experiment set 2A and 2B, depending on the network of air temperature measurement stations used, managed by KEPCO and AMeDAS, respectively ( Figure 3). In addition, a third hybrid experiment (HYBRID) was conducted in each of experiment set 2A and 2B, in which we used a linear regression-based method (HLR) below the elevation of the highest station and the VTP-based method above that elevation. Thereafter, experiment set 2A consists of three experiments, namely KR-VTP (where KR denotes Kurobe station), HLR [K], and HYBRID [K]. Similarly, experiment set 2B consists of three experiments, namely OM-VTP (where OM denotes Omachi station), HLR [A] and HYBRID [A]. Here, "K" and "A" in brackets represent the KEPCO and AMeDAS network, respectively. The purpose of deriving the hybrid product is to determine whether there is any significance in merging both methods. In addition, to determine the impact of the hybrid product on phase partitioning of precipitation and its subsequent impact on hydrological modeling, we conducted a fourth experiment in the case of the AMeDAS station network (in experiment set 2B). In this experiment, called HYBRID [A] Snowfall , we used air temperature from HLR [A] along with the same snowfall (and rainfall) used in HYBRID [A]. Finally, in experiment set 3, HLR [K] was compared with an experimental run where the air temperature was entirely based on reanalysis (JRA55 [R]). Here, "R" in the brackets denotes that this air temperature data set is only based on the reanalysis product (JRA55). In experiment set 3, the air temperature was derived by elevation correction of a coarse resolution 2 m air temperature, using reanalysis based VTP. This approach is similar to the study by Gao et al. (2012); however, the major difference lies in the usage of the entire VTP instead of using only specific pressure levels to derive lapse rates. The advantage of this method is that no station observations were used, which may be useful for hydrological modeling of regions with no station observations.
To evaluate the impact of different air temperature correction methods on hydrological simulation in each experimental run, we compared the NS and PBIAS of discharge simulations at each of the four dams (Kurobe, Sennindani, Koyadaira, and Dashidaira) at a daily scale. Furthermore, we also compared the root mean square error (RMSE) between the observed and simulated point snow depths at each of the four stations located at the four dams. The pixel-by-pixel underestimation ( UE ) and overestimation ( OE ) between snow cover simulations and MODIS observations were also compared. The snow correction factors ( 1 and 2 ) were independently optimized in all the experiments, whereas the rest of the model parameters did not vary (see Table S1 in Supporting Information S1 for a complete list of model parameters). The distribution of the average annual air temperature at each model grid and its elevation is displayed in Figure 4 for all experiment sets. Their seasonal counterparts are displayed in Figures S1-S4 in Supporting Information S1.

Data Sets
The static and dynamic inputs used for the WEB-DHM-S simulations and the evaluation of the model outputs are shown in Table 2. These data sets are discussed in the following sections.

Static Data Sets
The static data sets include the DEM and the soil and land cover types. The DEM was obtained from the Geospatial Information Authority of Japan (GSI) at 10 m resolution and was then resampled to 50 m to derive the grid hillslope parameters; then was further resampled to 250 m to provide the model elevation grid. Soil maps were obtained from the Food and Agriculture Organization (2003), and land cover maps for SiB2 land classification were obtained from the United States Geological Survey (USGS) (https://lta.cr.usgs.gov/GLCC). Both data sets were resampled to model resolution (250 m) using nearest neighbor interpolation.

Dynamic Vegetation Data Sets
The dynamic vegetation inputs include the leaf area index (LAI) and the fraction of photosynthetically active radiation (FPAR) based on the MODIS 8-day composite product (MCD15A2H), combined from both the Aqua and Terra sensors. These data sets were downloaded from NASA's AppEEARS (https://lpdaac.usgs.gov/tools/ appeears/) at a resolution of 500 m and were then reprojected and resampled to the model resolution (250 m) and projection system (UTM 53 N).

Meteorological Forcing
Precipitation forcing is based on the Radar-AMeDAS composite product obtained from the Data Integration and Analysis System (DIAS) with a spatial resolution of 1 km and a temporal resolution of 30 min. This product is a combination of 20 JMA C-band radars and 26 MLIT (Ministry of Land Infrastructure Transport and Tourism) C-band radars along with approximately 1,300 AMeDAS stations. The precipitation grid was interpolated to model spatial and temporal resolution.
Air temperatures were obtained from networks managed by KEPCO and AMeDAS. Both provide air temperature observations at a resolution of 1 hr. Data for the four stations were obtained from KEPCO, whereas data for 10 AMeDAS stations were downloaded from DIAS. It should be noted that the AMeDAS stations provide real time data.
All other meteorological forcings (wind speed, relative humidity, downward shortwave radiation, downward longwave radiation, and air pressure) were obtained from JRA55 with a temporal resolution of 3-hr and a grid resolution of 0.5625° (∼55 km). The 2 m air temperature was also obtained from JRA55 for use in experiment JRA55 [R]. All the meteorological forcings from JRA55 were linearly interpolated to a temporal resolution of 1 hr and a spatial resolution of 250 m except for 2 m air temperature, which was corrected using VTP in experiment JRA55 [R].

Vertical Temperature Profile (VTP)
VTP data were extracted from the isobaric analysis fields (anl_p125) of the 1.25° (∼110 km) model of the JRA55. The 1.25° model provided data at a temporal resolution of 6 hr. The gridded isobaric temperature fields were linearly interpolated to a temporal resolution of 1 hr for this study. The geopotential height and air temperature were the two variables that were extracted. The isobaric analysis fields are available at 37 pressure levels from 1,000 to 1 hPa; however, in this study, we merely used the pressure level ranging from 1,000 to 650 hPa to cover the full elevation range of the Kurobe River Basin. The data set was downloaded from DIAS.

Evaluation Data Sets
For discharge evaluation, we used hourly discharge observations at the Kurobe, Sennindani, Koyadaira, and Dashidaira dams. For snow cover evaluations, the MODIS 8-day composite maximum snow extent product  (MOD10A2) from the Terra sensor was utilized. The 8-day composite was used for minimal cloud contamination in the images. The data set was reprojected and resampled to model resolution (250 m) and projection system (UTM 53N). The MODIS snow product was downloaded from NASA's AppEEARS website. The snow depth (at a daily temporal resolution) at each dam site was also used as an evaluation data set.

Land-Soil Parameter Calibration
The land-soil parameters were calibrated for the summer season of 2011, when MODIS exhibited no snow cover in the entire basin. The initial soil moisture conditions were obtained by spinning up the model several times in the same year. The model was initialized on 2010-09-15 00:00 when there was no snow cover, since the initial conditions of snow on the land surface are difficult to obtain. No snow correction factors were applied while calibrating the land-soil parameters. As the Radar-AMeDAS product is already calibrated with AMeDAS, we did not apply any correction to liquid precipitation. The calibrated land-soil model parameters are listed in Table S1 in Supporting Information S1. Figure S5 represents the calibrated daily hydrograph for the late summer of 2011. As displayed, the daily simulated discharge matches well with the daily observed discharge, with an NS of 0.68, and a PBIAS of −30.22%. The large negative PBIAS is due to the underestimation of baseflow, because snowfall correction factors were not applied in this simulation. Due to the underestimated amount of snowfall during the winter season, the snowmelt runoff is significantly underestimated, leading to lower soil moisture in the underlying soil. However, the peaks of the hydrograph are well reproduced.
In all the experiments discussed in the following sections, we used the same land-soil calibration parameters. In addition, the air temperature threshold for the phase partitioning of precipitation ( snow ) was set to 0 o C and the albedo of fresh snow in the visible spectrum ( VIS ) was set to 0.75, from the default value of 0.85. The albedo of fresh snow in the near-infrared spectrum ( NIR ) was set to a default value of 0.65. These values were determined through a series of sensitivity experiments (not shown here) and were kept fixed in all experimental runs. All the experimental runs were initialized from 2010-09-15 00:00 and ran up to 2014-09-23:00.

Experiment Set 1: Using Different Individual Air Temperature Observation Station With VTP for Extrapolation
In experiment set 1, the impact of using different stations to offset the JRA55 VTP was analyzed. Referring to Figure 4a, it can be clearly seen that depending upon the station used to offset the VTP, elevation-corrected air temperature differs significantly among different experiments. For example, in KR-VTP, the corrected annual average air temperature at grids was significantly higher than that of DD-VTP. Similar differences were also observed for the seasonal averages. This means that at higher elevations, DD-VTP received more precipitation in the form of snow, as compared to KR-VTP, because more grids were below the threshold temperature ( snow ). This led to a larger snow cover area in the DD-VTP experiment compared with the KR-VTP. The basin average corrected annual snowfall differed significantly among the different experimental runs, by up to 18%.
KR-VTP exhibited the best performance for discharge simulation in terms of both NS and PBIAS (see Table 3). The PBIAS was significantly lower in KR-VTP than in the other experiments, largely owing to its higher snowfall correction factor and the resulting snowmelt. SD-VTP, KD-VTP, and DD-VTP exhibited large negative PBIAS values, which indicates that the snowfall may have been underestimated. Figure 5a shows that due to the higher temperature used in KR-VTP, discharge was generally higher during the early snowmelt season (April and May) than in DD-VTP, where snow may have melted later in the season. Due to larger negative PBIAS in experiments SD-VTP, KD-VTP, and DD-VTP at Kurobe, the water storage in Kurobe Dam was underestimated, which led to a larger negative PBIAS in the three downstream dams.
While KR-VTP performed reasonably well in simulating discharge and snow depth at Kurobe, the snow depth at lower elevation stations (Koyadaira and Dashidaira) was significantly underestimated, as shown in Figure 5b. This is due to the fact that the air temperature used in KR-VTP for lower elevations (such as at Dashidaira) was much higher than the observation. In DD-VTP, on the other hand, the snow depth simulation at Dashidaira was much more reasonable. The snow cover simulations match quite well with the MODIS observations in KR-VTP, MOIZ ET AL.    however, DD-VTP overestimated the snow-covered area compared to MODIS, particularly in the lower elevation grids located in the valleys (Figure 5c).

Experiment Set 2A: Comparison of Linear Regression Lapse Rate Based Air Temperatures With Those Derived Using VTP (KEPCO Observation Network)
Based on the findings from experiment set 1, in experiment set 2A, we further compared the performance of KR-VTP with HLR [K]. In HLR [K], the four stations from KEPCO were used to derive hourly lapse rates using linear regression. On average, a very high negative correlation ( = −0.86 ) was obtained between the air temperature at the stations and their elevation. Referring to Figure 4b, it can be interpreted that the variation in annual average air temperatures with elevation was much steeper in the case of KR-VTP than in HLR [K]. This difference was particularly large at lower elevations, with HLR [K] generally exhibiting a colder air temperature. The basin average annual corrected snowfall was very similar between the two experiments, with a difference of less than 0.9%. The discharge simulations were also very similar, with KR-VTP exhibiting a slightly higher NS and lower PBIAS compared with HLR [K].
In contrast, HLR [K] performed best in terms of point snow depth simulation at all stations, as exhibited by the significantly lower snow depth RMSE than that for KR-VTP. HLR [K] generally overestimated the snow cover in May compared with KR-VTP ( Figure S6 in Supporting Information S1).
The HYBRID [K] experiment used the HLR [K] lapse rate below the Kurobe station and that of the KR-VTP above it for elevation correction of grid air temperature, thus acting as a hybrid of the two (Figure 4b). The performance of the discharge simulation in the case of HYBRID [K] is slightly better than that of KR-VTP at Kurobe however, these are generally similar since the same technique is used to extrapolate air temperature above Kurobe. However, owing to the slightly poor PBIAS at Sennindani, the NS deteriorated at Koyadaira and Dashidaira. Due to the higher temperatures at high elevations in HLR [K], snowmelt occurred earlier, leading to slightly higher discharge early in the snowmelt season and lower discharge in the post-snowmelt season (Figure 6a) as compared with HYBRID [K]. Despite the slightly improved performance for discharge simulations, HYBRID [K] still exhibits significantly better point snow depth simulations than KR-VTP (Figure 6b). An exception to this is the snow depth simulation at Kurobe, where we noticed a minor deterioration in performance as exhibited by the snow depth RMSE (Table 3). In the case of Koyadaira and Dashidaira, all experiments largely underestimated the snow depth (Figure 6b). The large underestimation at low elevations is peculiar and we think it may be caused by radar beam blockage in the valley. Overall, the snow cover simulation of HYBRID [K] was largely similar to that of HLR [K], with minor overestimation ( Figure S6 in Supporting Information S1).

Experiment Set 2B: Comparison of Linear Regression Lapse Rate Based Air Temperatures With Those Derived Using VTP (AMeDAS Observation Network)
Recall that experiment set 2A used air temperature stations located inside the river basin and was able to estimate hourly lapse rates with a fairly high correlation. However, this may not always be the case in a data-scarce region. Moreover, experiment set 2A utilized station data that may not be available as open-source data sets. Therefore, in experiment set 2B, we used air temperature stations from AMeDAS, all of which are located outside the river basin boundaries and are generally concentrated at lower elevations. As expected, the correlation of hourly lapse rates was much lower in the case of HLR [A] ( = −0.66 ) than HLR [K] ( = −0.86 ). Furthermore, similar to KR-VTP, in experiment set 2B we used the highest elevation station from AMeDAS (within the basin proximity) along with the VTP; we named this experiment OM-VTP. The average annual temperature distribution across the grids' elevation appears very similar between HLR [A] and OM-VTP (see Figure 4c). However, there were considerable differences in the seasonal averages (see Figure S3 in Supporting Information S1). Moreover, err was also slightly lower (9.0%) than that of HLR [A] (9.1%).

As shown in
While the discharge simulations were improved in OM-VTP versus HLR [A], similar to the outcome of experiment set 2A, the performance of snow depth simulation deteriorated at the lower elevation stations, as indicated by the higher snow depth RMSE in OM-VTP (see Table 3 and Figure 7b)  Figure S3 in Supporting Information S1), leading to earlier melting of snow. The snow cover simulation did not differ significantly among different experiments (see Figure S7 in Supporting Information S1).  Figure 7a shows that during April and May, HYBRID [A] Snowfall produced a higher discharge as compared with HYBRID . This is because the air temperature at high elevations in March-April-May (see Figure S3 in Supporting Information S1) was lower in case of HYBRID Snowfall. This demonstrates that the use of the hybrid product for phase partitioning contributes more significantly to spatial snow depth simulations as compared with its use for snowpack processes.

HYBRID [A]
We further tested the assumption of setting the air temperature threshold for phase partitioning ( snow) to 0 o C by conducting sensitivity runs following Shrestha, Koike, et al. (2014) and Shrestha, Wang et al. (2014). We used higher values of snow (1 o C and 2 o C) , optimized the snowfall correction factors and evaluated the discharge ( err ) and snow cover errors ( err ) as well as the value of the objective function ( err ) for the HYBRID [A] experiment. Figure 9 shows that a higher value of snow led to higher errors in both the discharge and snow cover simulation, consequently producing a higher value of the objective function, which justifies our assumption of using 0 o C as the air temperature phase partitioning threshold in this river basin.

Experiment Set 3: Using 2 m JRA55 Gridded Air Temperature for Extrapolation With VTP
In all experiments discussed above, station-based air temperature observations were used. However, this may pose limitations in regions where there are no observations. To handle this issue, in experiment set 3, we conducted an experiment where reanalysis-based 2 m gridded air temperature was corrected using a reanalysis-based VTP (JRA55 [R]) and compared its performance with the experiment based on lapse rates generated through linear regression, HLR [K]. In the case of JRA55 [R], the VTP was generally steeper than the lapse rate in HLR [K], leading to lower temperatures above Sennindani and higher temperatures below it (see Figure 4d). The corrected basin average annual snowfall amount was optimized to be approximately 4.3% higher in JRA55 [R] than in HLR [K] (see Table 1 , where the snow melted more rapidly, as shown in Figure 10a, with a higher discharge during the latter part of the snow season.
While the discharge was simulated well by JRA55 [R], the point snow depth was poorly simulated as compared with HLR [K], which is reasonable because HLR [K] uses the observed temperature at each location to simulate the snow depth. Owing to the higher temperature at low elevations in JRA55 [R], the snow depth was underestimated at Koyadaira and Dashidaira (see Figure 10b). The snow cover simulations were largely similar; however, during the early snowmelt season (April and May), JRA55 [R] slightly overestimates the snow cover area at low elevations, largely because of the lower air temperatures at this elevation compared to those in HLR [K] during March-April-May (as shown in Figure S4 in Supporting Information S1), which also explains the higher (see Figure 10c). Nonetheless, remarkably, the discharge and snow-process simulations in JRA55 [R] are reasonably comparable, especially at high elevations, even though no station observations were used in JRA55 [R].

Discussion
In this study, we have thoroughly investigated the hydrological implications of using VTP to extrapolate air temperature to higher elevations through several sets of comprehensive hydrological experiments. Experimental set 1 was designed to investigate whether there are any consequences to using different individual station observations along with VTP to extrapolate air temperature. These experiments were similar to the methodology proposed by Naseer et al. (2019). It was found that the choice of the station used for the extrapolation significantly altered the hydrological simulations, with corrected annual snowfall differing by as much as 18% among experiments. We found that the use of the highest elevation station (in KR-VTP) led to the overall best simulations in terms of discharge NS and PBIAS. This result clearly shows that the methodology proposed by Naseer et al. (2019) is highly sensitive to the individual station observation used with VTP and should be used with caution. This is especially true when such methods are used to optimize snowfall correction factors through a similar framework adopted in this study, such as in Shrestha, Koike, et al. (2014) and Shrestha, Wang et al. (2014), as this can lead to erroneous snowfall estimates at high elevations.
Experimental set 2 was designed to compare the performance of the VTP-based method proposed by Naseer et al. (2019) with the traditional linear regression-based method. At first the local air temperature station data available within the river basin (provided by KEPCO) were used in experimental set 2A. We found that indeed the use of the VTP-based method with the highest elevation station (KR-VTP) did outperform the linear regression-based method (HLR [K]) in terms of discharge simulations, however, this was not the case for point snow depth simulations, where the linear regression-based method simply performed better. This was especially true for lower elevation stations. Here it is worth mentioning that the HLR [K] represents near-surface air temperature gradients as it is based on station observations. Whereas KR-VTP represents the free air temperature gradi- ent as this is based on the reanalysis pressure levels and is mainly controlled by adiabatic effects and stratification of the atmosphere (Cullen & Marshall, 2011;Marshall et al., 2007). We think that KR-VTP performs better at high elevations, because free air flow conditions are more dominant at high elevations whereas at lower elevations near-surface conditions dominate, which is why HLR [K] performs better when simulating snow depth at lower elevation stations (Blandford et al., 2008;Mahrt, 2006;Pepin & Seidel, 2005;Rolland, 2003). For example, the lower temperature at low elevations in HLR [K] could be due to cold-air drainage in the Kurobe valley leading to localized lapse rates (Minder et al., 2010), which are not well represented by JRA55 VTP. A similar observation has also been made in the study by Gao et al. (2012Gao et al. ( , 2017Gao et al. ( , 2018. Furthermore, we also suspect that even though JRA55 assimilates observations from AMeDAS stations, radiosonde and satellites, this difference can be attributed to the fact that the station observations we used in this study are located below the JRA55 1.25° model orography, where VTP may simply have been linearly extrapolated to fill the missing values. Another reason for this difference could also be that the local stations from KEPCO were not assimilated in the JRA55 product. Keeping this in mind, we proposed a hybrid product which incorporated VTP-based temperatures at high elevations (above the elevation of Kurobe station) and linear regression-based temperatures at lower elevations (HYBRID [K]). As expected, this product led to both slightly improved discharge simulations at Kurobe as well as significantly improved snow depth simulations at lower elevation stations. This could have very important implications for the hydrological science community as air temperature observation stations are often located at lower elevations and therefore, such a hybrid product could be very useful for hydrological modeling in such data-scarce situations.
As the station data used in HYBRID [K] is maintained by a local authority (KEPCO), station data at high elevations (such as in the case of Kurobe) may not always be available openly. Furthermore, unlike the AMeDAS network, the stations maintained by local authorities such as KEPCO do not record data in real time. Obtaining real time meteorological data is crucial for developing real time operational hydrometeorological forecasts. Such forecasts could be utilized to support reservoir operations, such in the case of the Kurobe Basin. Therefore, to further consider the general applicability of the hybrid product, we designed experimental set 2B where only observations from the AMeDAS network were used (HLR [A], OM-VTP, and HYBRID [A]). Here, the overall performance of all these experiments was lower than when local observation stations (from KEPCO) were used since all the AMeDAS stations were located outside the river basin, with the highest elevation station, Omachi, located at a much lower elevation than Kurobe (784 vs. 1,453 m.a.s.l.). However, despite this, the hybrid product (HYBRID [A]) still outperformed both the cases where either only the linear regression-based lapse rates were used (HLR [A]) or only VPT-based temperatures were used (OM-VTP), similar to the findings of experimental set 2A. Using the hybrid product based on AMeDAS stations is especially useful for real time data applications, such as in the case of operational hydrometeorological forecasting to obtain snow and soil moisture conditions. Through an additional experiment (HYBRID [A] Snowfall ), we also found that this hybrid product is more important for precipitation phase partitioning rather than snowpack processes when simulating spatial snow depth in the case of Kurobe River Basin. This is in contrast to the results of similar experiments conducted by Wang et al. (2016) using MODIS nighttime LSTs for the upper Yellow River Basin, where the snowpack processes were more dominant. While techniques involving the use of MODIS nighttime LSTs to estimate lapse rates do have their advantages due to their high spatial resolution, they often suffer from cloud contamination and mixed pixel effects (Zhang et al., 2018). Furthermore, the MODIS nighttime LST derived lapse rates are sensitive to the snow depth and may require spatial snow depth observations, for characterizing lapse rates (as in Wang et al. (2016)), which are often not available. We think that the hybrid product developed in this study could be particularly useful for developing real time operational hydrometeorological forecasts in situations where real time air temperature observations are very limited and located at lower elevations.
For situations where no air temperature observations are available, we conducted the experimental set 3, where only air temperature from JRA55 reanalysis was used (JRA55 [R]). This is similar to the studies by Gao et al. (2012Gao et al. ( , 2017Gao et al. ( , 2018 and Dutra et al. (2020), who employed ERA-Interim and ERA5 respectively. We found that while this experiment did perform better than HLR [K] in simulating discharge, the point snow depth simulations had higher errors at lower elevations due to the same reasons discussed in the preceding paragraphs for the experimental set 2 such as the coarse resolution of the JRA55 model orography and poor representation of the localized lapse rates in valleys by the JRA55 VTP. Nonetheless, this technique can be quite important for situations where no station observations are available.

of 25
It is worth mentioning that while our assumption of setting the air temperature threshold for phase partitioning ( ) to 0 o C is reasonable in the case of the Kurobe River Basin since any increase in the threshold air temperature led to increased model errors, similar to the findings of Shrestha, Koike, et al. (2014) and Shrestha, Wang et al. (2014). However, this may not always be the case for other river basins (Jennings et al., 2018). Furthermore, another source of uncertainty is the coarse spatial resolution of the radar precipitation product used in this study. While Radar-AMeDAS is the highest spatial resolution precipitation product available in Japan, its spatial resolution of 1,000 m is still coarse compared to the model resolution of 250 m. The high variability of precipitation in mountainous regions can lead to uncertainties in the snowfall and rainfall estimates, and the discharge simulations. Furthermore, this can also lead to uncertainties in the snow depth simulations, which are evaluated at point scale, as well as the pixel-by-pixel evaluation of snow cover simulations using MODIS data (having a spatial resolution of 250 m).
There are several points which could be considered for future research. Although this study establishes the importance of using reanalysis-based VTP, only JRA55 was used as the reanalysis product in this study. Other reanalysis products having a higher resolution, such as ERA5 (Dutra et al., 2020), should be explored using the experimental framework devised in this study. Furthermore, the applicability of VTP-based methods should also be extended across other snowfed river basins using a similar framework to further validate their applicability. Other methods for data-scarce regions, such as the use of MODIS nighttime LSTs, should also be compared with VTP-based methods in future works. Remotely sensed snow depth observations, such as those from passive microwave satellites (Wang et al., 2016), can also be used in further studies to validate snow depth and its spatial distribution at high elevations.
values based on observation of precipitation phase (Jennings et al., 2018), can be used to further analyze the 0 o C assumptions made in this study. Future work should also be conducted to build upon the correction factors for Radar-AMeDAS precipitation suggested by Naseer et al. (2019) by considering basin topography and relative location of radars to take radar beam blockage into account, in addition to radar CAPPI.

Conclusion
This study tested the applicability of reanalysis-based VTP to obtain elevation-corrected air temperatures for distributed hydrological modeling and compared the performance of this method with the traditional method for elevation correction of air temperatures using lapse rates derived through linear regression-based analysis of station air temperature measurements. For this purpose, we used the case of the Kurobe River Basin in Japan, a typical snowfed watershed, and evaluated the performance of the hydrological model under different experimental settings by comparison with observed discharge, MODIS snow cover observations, and point snow depth observations for four hydrological years (2011)(2012)(2013)(2014). To deal with the uncertainty of snow precipitation in complex mountainous regions with data scarcity, we used a high-resolution Radar-AMeDAS precipitation product and optimized correction factors for snow precipitation using a distributed hydrological model and a heuristic algorithm to minimize errors in discharge simulation and snow cover simulation.
It was found that depending on the single-station observation used to offset the VTP, the performance of discharge simulation (NS ranging from 0.855 to 0.864 at Kurobe) and snow depth simulation differed significantly, with minor differences in the snow cover simulations as well. Furthermore, the corrected basin average snowfall amounts differed by as much as 18%. Comparative analysis of simulations based on linear regression-based lapse rates and VTP-based methods showed that the VTP-based methods generally resulted in better discharge simulations. However, this resulted in very poor point snow depth simulations at low elevation stations. A hybrid of the two methods, combining linear regression-based lapse rates used at lower elevations where station data were available, with the VTP-based method at high elevations where station data were limited, was also investigated. We found that the hybrid of the two products in general resulted in improved discharge simulations as well as improved point snow depth simulations at low elevations. We tested this for two groups of stations and found that for the local observation stations maintained by KEPCO, the NS for discharge simulation improved from 0.857 to 0.865 when compared with the use of linear regression-based lapse rates. Similarly, in the case of AMeDAS stations located outside the river basin, the NS for discharge simulations significantly improved from 0.824 to 0.850 when using the hybrid product. We conclude that the hybrid method can be especially useful in regions with limited station observations, which are not well distributed across the elevation of the basin, and also, incorporating the use of VTP can lead to particularly improved simulations in regions where most of the stations 23 of 25 are concentrated at low elevations. Furthermore, for data-scarce regions, we conducted an experiment with no station observation and instead applied elevation correction to 2 m gridded near-surface air temperatures using VTP from the same reanalysis. In general, we found that this method led to very reasonable discharge simulations with overestimated point snow depths at low elevations. While this method cannot completely replace any of the methods mentioned above, it can be quite useful for hydrological simulations in mountainous river basins where observations are extremely limited.

Data Availability Statement
The DEM data was downloaded from GSI Japan and is publicly available to download from https://www.gsi.go.jp/ENGLISH/page_e30233.html.
Soil parameter maps were obtained from the Food and Agriculture Organization (2003). Land cover data was obtained from the United States Geological Survey (2018). All MODIS data sets were obtained from Myneni et al. (2015) for LAI and FPAR variables and Hall and Riggs (2016) for snow cover extent variable. JRA55 reanalysis data sets were from Kobayashi et al. (2015) and can be downloaded from DIAS. Radar-AMeDAS data set can be downloaded from a publicly open database (http://database.rish.kyoto-u.ac.jp/arch/jmadata/data/jma-radar/synthetic/original/). Radar-AMeDAS can also be downloaded from DIAS (https://diasjp.net/en/), however in this case, login credentials and administrator permissions are required. In situ temperature data sets were partially obtained from AMeDAS and partially from KEPCO. AMeDAS station data is open to the public and can be downloaded from JMA's website (https://www.data.jma.go.jp/gmd/risk/obsdl/index.php#) by specifying the name of the station and the desired duration of data set. Air temperature data sets and other meteorological data sets from KEPCO cannot be shared publicly due to a Non-Disclosure Agreement. Readers may request this data by contact with KEPCO.