Employing higher-density, lower-reliability weather data from Global Historical Climatology Network monitors to generate serially complete weather data for watershed modelling

Hydrological models require complete and accurate weather data time series to represent watershed‐scale responses adequately. The Global Historical Climatology Network (GHCN) is the most comprehensive weather database used in hydrological modelling studies globally. Since higher‐density, lower‐reliability precipitation measurements collected by private citizens in the Community Collaborative Rain, Hail, and Snow (CoCoRaHS) network were integrated into the GHCN, hydrological modellers in the United States have had access to a much greater amount of weather data. However, the benefit of using CoCoRaHS data has not been assessed. The objectives of this work were to develop a method for generating a complete weather data time series by combining data from multiple GHCN monitors and to assess several methods for estimating missing weather data. Weather data from GHCN monitors located within a specified radius of a watershed were obtained and interpolated using three estimation methods (Inverse Distance Weighting (IDW), Inverse Distance and Elevation Weighting (IDEW) and Closest Station) to create a seamless time series of weather observations. To evaluate the performance of the methods, the weather data obtained from each estimation method were used to force the Soil and Water Assessment Tool (SWAT) and Thornthwaite‐Mather models for 21 US Department of Agriculture‐Conservation Effects Assessment Project watersheds in different climate regions to simulate daily streamflow for 2010–2021. Except for three watersheds, all of the SWAT models achieved a satisfactory performance rating, with a Nash‐Sutcliffe Efficiency above 0.5, a ratio of the root mean square error to the standard deviation of observations below 0.7, and a percent bias between −25% and 25%. IDEW and IDW performed similarly, and the Closest Station method resulted in the poorest streamflow simulation.
A comparison with published SWAT model results further corroborated the improved model performance obtained using the novel combined GHCN data with each of the Closest Station, IDW and IDEW methods.


| INTRODUCTION
Weather data are critical inputs for hydrological models used to simulate hydrological, geomorphological and biological processes in watersheds. Complete and accurate weather data are essential for modellers and decision-makers to predict, assess and manage water resources accurately and efficiently (Gyau-Boakye & Schultz, 1994). However, acquiring representative weather data is often challenging because timely, continuous and objective observations with uniform spatial coverage are sparse (Kite & Haberlandt, 1999; Rafii & Kechadi, 2019). Moreover, if the regional distribution of precipitation is not adequately represented, process-based hydrological models such as the Soil and Water Assessment Tool (SWAT) (Arnold et al., 1998; Gassman et al., 2010; Neitsch et al., 1998) cannot accurately simulate the hydrological processes in a watershed (Chaplot et al., 2005). The amount of weather data available to hydrological modellers in the United States has expanded substantially with the integration of the much higher-density (though less reliable) precipitation measurements submitted by private citizens to the Community Collaborative Rain, Hail, and Snow (CoCoRaHS) network (Doesken & Reges, 2011; Reges et al., 2016) into the Global Historical Climatology Network (GHCN) (Menne, Durre, Korzeniewski et al., 2012; Menne, Durre, Vose et al., 2012). However, the benefit of these additional data, which are less reliable at any given location, has not been assessed across multiple climatic regions of the United States.
Weather data used to force watershed models are usually estimated from the closest weather stations, combining records from neighbouring stations where records are available (Galván et al., 2014). Many methods have been proposed for estimating and combining weather data to obtain representative forcing data (Thiebaux & Pedder, 1987). Many approaches compute missing values from adjacent stations using a weighting function, ranging from simple arithmetic averaging to more complex differential weighting techniques and interpolation based on spatial covariance (DeGaetano et al., 1995; Huth & Nemešová, 1995; Saborowski & Stock, 1994; Wallis et al., 1991; Willmott & Robeson, 1995). Among these methods, the Closest Station (Wallis et al., 1991), inverse distance weighting (IDW) (Hubbard, 1994) and inverse distance and elevation weighting (IDEW) (Masih et al., 2011) techniques are the most commonly used and have performed well in estimating missing data (Suhaila et al., 2008; Teegavarapu & Chandramouli, 2005). Numerous studies have focused on filling gaps in weather data using advanced techniques such as quantile mapping, machine learning and deep learning (Eischeid et al., 2000; Newman et al., 2015; Tang et al., 2020, 2021).
These methods aim to create serially complete and accurate weather data sets. However, the generated data sets tend to be limited to certain locations or time periods, and they often do not include the most recent data. Additionally, users need to download the data and adjust them further for their needs.
There are many sources of weather data used in hydrological modelling studies, including ground-based (Faurès et al., 1995), reanalysis-based (Dile & Srinivasan, 2014; Fuka et al., 2014) and satellite-based (Alazzy et al., 2017) sources. Weather data from ground-based stations are often considered the standard for hydrological modelling (Colston et al., 2018; Mistry et al., 2022); however, many ground-based stations have missing data. The GHCN, managed by the National Oceanic and Atmospheric Administration (NOAA)'s National Centers for Environmental Information (NCEI), is the most comprehensive ground-based global database of daily weather records (Menne, Durre, Vose et al., 2012), often including precipitation, daily maximum and minimum temperature, snowfall, humidity and snow depth. GHCN includes data from 107 000 ground stations across the globe (Jaffrés, 2019; Menne, Durre, Korzeniewski et al., 2012; Menne, Durre, Vose et al., 2012) and is used broadly in hydroecological applications (Brazel et al., 2000; Larkin, 2005; Muche et al., 2020). GHCN data are derived from several weather networks, including CoCoRaHS, the National Weather Service's Cooperative Observers Program (COOP), the European Climate Assessment and Dataset (ECA&D), the World Meteorological Organization (WMOID), the National Meteorological or Hydrological Center (NM/HC), the US Interagency Remote Automatic Weather Station (RAWS), the US Natural Resources Conservation Service (NRCS) SNOwpack TELemetry (SNOTEL) and the Weather Bureau Army Navy channel (WBAN) (Menne, Durre, Korzeniewski et al., 2012; Menne, Durre, Vose et al., 2012). Of these eight GHCN data sources, five (CoCoRaHS, COOP, RAWS, WBAN and SNOTEL) provide the majority of reporting stations in the continental United States, where this study was performed.
This study evaluates whether the combined GHCN datasets, including the higher-density but perhaps lower-quality citizen science weather data, together with previously published methods for estimating missing weather data from multiple nearby stations, represent the weather forcings over a watershed better than the most commonly used approach of relying on the closest weather station. To test this, we use the SWAT model for 21 United States Department of Agriculture (USDA)-Agricultural Research Service (ARS)-NRCS-Conservation Effects Assessment Project (CEAP) watersheds (Sadler et al., 2008, 2015) across five Köppen climate classification (Peel et al., 2007) regions in the United States and compare model-predicted streamflow to observed data. To perform the evaluation, we developed a new modelling interface, 'FillMissWX' (Garna et al., 2023), to automatically download GHCN data (precipitation, maximum temperature and minimum temperature) from all monitors that are located in a specific radius from the target loca-

| GHCN weather data sources
CoCoRaHS, with over 20 000 active observers across the United States, Puerto Rico, the US Virgin Islands and Canada (Reges et al., 2016), is a collaborative precipitation monitoring network sponsored by NOAA and the National Science Foundation (NSF) (Kelsch, 1998; Reges et al., 2016). CoCoRaHS volunteer observers use inexpensive measurement tools to measure daily precipitation (https://www.cocorahs.org/). The National Weather Service (NWS) COOP (Wuertz et al., 2018) is another network, with a mix of contractors and volunteers, that provides observational weather data, including snowfall, precipitation, and minimum and maximum temperature, from more than 8700 observers. All COOP observers must use NWS-approved equipment and follow NWS standards (https://www.weather.gov/coop/Overview).

Remote Automatic Weather Stations (RAWS) (Brown et al., 2011) are solar-powered stations, both portable and permanent, that collect weather data, primarily used by US government agencies such as the US Forest Service and the US National Park Service (Zachariassen, 2003).
[…] This continues until either there are no more missing data in the particular time period or there are no more stations with measured data within the maximum search radius (Wallis et al., 1991).
Inverse distance weighting (IDW): IDW estimates missing data as the distance-weighted average of the data from stations within the user-defined radius (Hubbard, 1994). The missing values are calculated by Equation 1:

$$V_0 = \frac{\sum_{i=1}^{n} d_i^{-\alpha}\, V_i}{\sum_{i=1}^{n} d_i^{-\alpha}} \qquad (1)$$

where $V_0$ is the estimated value of the missing data, $V_i$ is the data value of the $i$th closest station, $d_i$ is the distance between the $i$th closest station and the target location, $\alpha$ is the weighting power, which ranges from 1 to 6 with a most common value of 2 (Teegavarapu & Chandramouli, 2005; Vieux, 2001), and $n$ is the total number of GHCN monitors located within the user-defined search radius.
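A minimal Python sketch of the IDW estimate for a single day, with weights $d_i^{-\alpha}$ normalized by their sum. Two behaviours here are assumptions not stated in the text: stations that are missing on that day are skipped (so the weights are renormalized over the reporting stations), and a station at zero distance is used directly. The function name `idw` is illustrative.

```python
from typing import Optional, Sequence


def idw(values: Sequence[Optional[float]], distances_km: Sequence[float],
        alpha: float = 2.0) -> Optional[float]:
    """Inverse distance weighted estimate of one day's value at the target.

    Stations whose value is None on that day are skipped, so the weights are
    renormalized over the stations that actually reported.
    """
    num = 0.0
    den = 0.0
    for v, d in zip(values, distances_km):
        if v is None:
            continue
        if d == 0.0:  # a station at the target itself: use its value directly
            return v
        w = d ** -alpha  # weight = 1 / d^alpha
        num += w * v
        den += w
    return num / den if den > 0.0 else None


# Two stations 1 km and 2 km away: weights 1 and 0.25
print(idw([10.0, 20.0], [1.0, 2.0]))  # 12.0
print(idw([None, 4.0], [1.0, 2.0]))   # 4.0
```

Raising `alpha` concentrates the weight on the nearest stations; as `alpha` grows large, IDW approaches the Closest Station method.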
Inverse distance and elevation weighting (IDEW): While IDW considers only the distance between the target and reference locations, in many areas elevation also influences the distribution of precipitation. The IDEW method in this study is based on the methodology presented in (Liston & Elder, 2006; Zhang et al., 2017) for precipitation ($P_0$) and temperature ($T_0$), respectively:

$$P_0 = \frac{\sum_{i=1}^{n} d_i^{-\alpha}\, P_i\, \dfrac{1 + x\,(E_0 - E_i)}{1 - x\,(E_0 - E_i)}}{\sum_{i=1}^{n} d_i^{-\alpha}} \qquad (2)$$

$$T_0 = \frac{\sum_{i=1}^{n} d_i^{-\alpha} \left[T_i - \Gamma\,(E_0 - E_i)\right]}{\sum_{i=1}^{n} d_i^{-\alpha}} \qquad (3)$$

where $P_i$ and $T_i$ are the precipitation and temperature values of the $i$th closest station, $E_0$ is the elevation of the target location with missing data, $E_i$ is the elevation of the $i$th closest station, $x$ is a precipitation adjustment factor that varies monthly, and $\Gamma$ is the air temperature lapse rate, which varies with the month of the year. By default, the lapse rates are set to the values suggested by Kunkel (1989), Liston and Elder (2006) and Thornton et al. (1997) for precipitation in the Northern Hemisphere. The values of $x$ and $\Gamma$ can be specified by the user if known.
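The IDEW idea can be sketched in Python under explicit assumptions: each station value is first elevation-adjusted with the MicroMet-style forms of Liston and Elder (2006), precipitation scaled by $(1 + x\,\Delta E)/(1 - x\,\Delta E)$ and temperature shifted by $-\Gamma\,\Delta E$ with $\Delta E = E_0 - E_i$ in km, and the adjusted values are then combined with IDW weights $d_i^{-\alpha}$. This is an illustration of that technique, not the FillMissWX source; the function names, the rejection of $|x\,\Delta E| \ge 1$ and the example numbers are hypothetical.

```python
from typing import Optional, Sequence


def idew_precip(p: Sequence[Optional[float]], dist_km: Sequence[float],
                elev_km: Sequence[float], target_elev_km: float,
                x_per_km: float, alpha: float = 2.0) -> Optional[float]:
    """IDW average of station precipitation after the elevation adjustment
    P_i * (1 + x*dE) / (1 - x*dE), with dE = E_0 - E_i in km.
    All distances must be > 0."""
    num = den = 0.0
    for pi, di, ei in zip(p, dist_km, elev_km):
        if pi is None:
            continue
        xde = x_per_km * (target_elev_km - ei)
        if abs(xde) >= 1.0:  # factor would be undefined or negative
            raise ValueError("elevation adjustment requires |x * dE| < 1")
        w = di ** -alpha
        num += w * pi * (1.0 + xde) / (1.0 - xde)
        den += w
    return num / den if den > 0.0 else None


def idew_temp(t: Sequence[Optional[float]], dist_km: Sequence[float],
              elev_km: Sequence[float], target_elev_km: float,
              lapse_c_per_km: float, alpha: float = 2.0) -> Optional[float]:
    """IDW average of station temperature shifted by T_i - Gamma * (E_0 - E_i).
    All distances must be > 0."""
    num = den = 0.0
    for ti, di, ei in zip(t, dist_km, elev_km):
        if ti is None:
            continue
        w = di ** -alpha
        # a target higher than the station (dE > 0) gets a cooler value
        num += w * (ti - lapse_c_per_km * (target_elev_km - ei))
        den += w
    return num / den if den > 0.0 else None


# Illustrative values: one station 0.5 km below the target
print(idew_temp([10.0], [5.0], [0.2], 0.7, 6.5))     # about 6.75
print(idew_precip([10.0], [5.0], [0.0], 0.5, 0.35))  # about 14.24
```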

| Modelling interface development
To reliably estimate missing weather data, we developed FillMissWX, a modelling interface in the EcoHydRology R package (Fuka et al., 2014), which automatically downloads daily weather data from GHCN monitors and fills and interpolates missing data using data from neighbouring monitors with one of three estimation methods: IDW, IDEW and Closest Station. The modelling interface assimilates precipitation and maximum and minimum temperature from monitors located within a user-defined distance of a location of interest using the 'rnoaa' R package (Edmund et al., 2016). It also automatically generates plots of the weather variables, including the sources and numbers of GHCN monitors used (CoCoRaHS, COOP, ECA&D, WMOID, NM/HC, RAWS, SNOTEL) and their distances from the target location. The required inputs to the FillMissWX function are the latitude (declat) and longitude (declon) of the location of interest; the radius, in kilometres, within which to search for monitors around the target location (StnRadius); the minimum number of monitors from which data need to be downloaded (minstns); the earliest (date_min) and latest (date_max) dates of interest; the elevation of the target location in km, used by the IDEW method (targElev); the method used to fill missing weather data, 'closest', 'IDW' or 'IDEW' (method); the weighting power for the IDW and IDEW methods, ranging from 1 to 6 with a default value of 2 (alfa); and the output plot format, 'png' or 'pdf' (printinto). The outputs of the FillMissWX function include a data frame containing the filled precipitation (P, mm), maximum temperature (MaxTemp, °C) and minimum temperature (MinTemp, °C), and the weighted-average elevations of the monitors used for precipitation (prcpElev), maximum temperature (tmaxElev) and minimum temperature (tminElev).
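FillMissWX delegates the station search to rnoaa, so the following Python sketch only illustrates the geometry involved, under stated assumptions: great-circle (haversine) distances from the target, a radius filter with nearest-first ordering, and an inverse-distance-weighted mean elevation as one plausible way to form an output like prcpElev (the text says "weighted-average" without giving the weights). The `Station` type and all function names are hypothetical.

```python
import math
from typing import List, NamedTuple, Tuple


class Station(NamedTuple):
    station_id: str
    lat: float
    lon: float
    elev_km: float


EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def stations_within(stations: List[Station], lat: float, lon: float,
                    radius_km: float) -> List[Tuple[float, Station]]:
    """(distance_km, station) pairs inside the radius, nearest first."""
    hits = [(haversine_km(lat, lon, s.lat, s.lon), s) for s in stations]
    return sorted((h for h in hits if h[0] <= radius_km), key=lambda h: h[0])


def weighted_mean_elev(hits: List[Tuple[float, Station]], alpha: float = 2.0) -> float:
    """Inverse-distance-weighted mean elevation (distances must be > 0)."""
    weights = [d ** -alpha for d, _ in hits]
    return sum(w * s.elev_km for w, (_, s) in zip(weights, hits)) / sum(weights)


stations = [Station("A", 0.0, 0.1, 0.3), Station("B", 0.0, 1.0, 0.5),
            Station("C", 0.0, -0.2, 0.1)]
hits = stations_within(stations, 0.0, 0.0, 30.0)  # B (about 111 km) is excluded
print([s.station_id for _, s in hits])  # ['A', 'C']
print(round(weighted_mean_elev(hits), 6))  # 0.26
```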

| Study watersheds
To evaluate whether combined and complete GHCN weather data from the estimation methods adequately represent the regional weather distribution, a major driver of hydrological processes, we initialized SWAT models for each of the 21 USDA-ARS-NRCS-CEAP (Sadler et al., 2008, 2015) benchmark watersheds (Figure 1 and missing data and can be varied based on user knowledge of data completeness and station density. The initial search radius was set to 30 km, which has been shown to be a suitable first estimate in many locations (Chen & Liu, 2012; Fuka et al., 2014). In some cases, 30 km was inadequate to provide suitable station density; for those watersheds (search radii greater than 30 km in Table 1), the radius was increased iteratively until a sufficient number of stations was identified. Figure 1 shows the watershed locations and their corresponding climate types based on the Köppen climate classification (Köppen et al., 2011; Rubel & Kottek, 2011).
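The iterative enlargement of the search radius can be sketched as a simple loop. The text does not give the increment or the exact sufficiency test, so this Python sketch assumes a 5 km step, a minimum station count (analogous to the minstns argument) as the stopping rule and an arbitrary 200 km cap; `expand_radius` and its parameters are hypothetical.

```python
from typing import Sequence


def expand_radius(distances_km: Sequence[float], min_stations: int,
                  start_km: float = 30.0, step_km: float = 5.0,
                  max_km: float = 200.0) -> float:
    """Grow the search radius from start_km until at least min_stations
    stations lie inside it; fail if max_km is exceeded."""
    radius = start_km
    while radius <= max_km:
        n_inside = sum(1 for d in distances_km if d <= radius)
        if n_inside >= min_stations:
            return radius
        radius += step_km
    raise ValueError("not enough stations within max_km")


# Station distances (km) from a watershed centroid, illustrative only
print(expand_radius([12.0, 28.0, 41.0, 47.0, 90.0], min_stations=4))  # 50.0
```

A coarser step reaches a sufficient radius in fewer iterations but may include more distant stations than necessary.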
Based on this methodology, three weather datasets for each of the 21 watersheds were developed for testing.For each watershed, we investigated the types of GHCN networks and the number of monitors per type that were used to generate the complete weather data time series used within the SWAT model.

| Hydrologic model initialization and calibration
The SWATmodel package, an R-based distribution of the SWAT model, was used for this study (Fuka et al., 2014). The SWATmodel R package offers a linear-model-like interface to the SWAT modelling system in R, transforming weather data into hydrological output responses. The SWATmodel initialization for each CEAP watershed included two steps. First, we obtained the data required to represent each watershed, including basin characteristics (size, centroid, location), the weather forcing data interpolated to the watershed centroid and the hydrological response data used for calibration. SWAT uses a process known as spatial disaggregation to distribute weather data from weather stations to the hydrologic response units (HRUs). The elevation of each HRU is taken into account during this process to adjust the weather data for elevation-related variations in climate and hydrological processes. As described in section 2.3, a 30-km search radius was initially set and increased until a sufficient number of stations was found to fill in the missing data. Second, models of the 21 watersheds were initialized for the 2010–2021 period and calibrated against USGS-measured data using the DEoptim algorithm (Ardia et al., 2007; Mullen et al., 2011). DEoptim is coupled to the SWATmodel package in R and performs evolutionary global optimization using the differential evolution algorithm. Note that three SWAT models were initial- Table S1). The simulated streamflow from each of the Closest Station, IDW and IDEW methods was assessed for goodness of fit against observations for each watershed using three statistical measures, the Nash-Sutcliffe efficiency (NSE), percent bias (PBIAS) and ratio of the root mean square error to the standard deviation of measured data (RSR), as recommended by Moriasi et al. (2007).
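To make the calibration step concrete, here is a minimal Python sketch of differential evolution in the classic DE/rand/1/bin form: mutation from three random population members, binomial crossover, bound clipping and greedy replacement. It illustrates the algorithm family that DEoptim implements; it is not DEoptim's code or its defaults. The function name, parameter values and the toy quadratic objective (standing in for "run the model and return 1 − NSE", which is itself an assumed objective) are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    f: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    F: float = 0.8,   # differential weight (mutation scale)
    CR: float = 0.9,  # crossover probability
    generations: int = 200,
    seed: int = 1,
) -> Tuple[List[float], float]:
    """Minimize f over a box with DE/rand/1/bin (in-place replacement)."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 population members")
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # three distinct members, all different from the target i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # ensures at least one mutated coordinate
            trial = []
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < CR or k == j_rand:
                    v = pop[a][k] + F * (pop[b][k] - pop[c][k])
                    v = min(max(v, lo), hi)  # keep parameters inside their bounds
                else:
                    v = pop[i][k]
                trial.append(v)
            trial_cost = f(trial)
            if trial_cost <= cost[i]:  # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=lambda idx: cost[idx])
    return pop[best], cost[best]


# Toy objective standing in for "run the model, return 1 - NSE"
best_x, best_cost = differential_evolution(
    lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2, bounds=[(-5.0, 5.0), (-5.0, 5.0)]
)
print(best_x, best_cost)  # close to [1.0, -2.0] with a near-zero cost
```

In a real calibration, each evaluation of `f` is a full model run, so the population size and generation count dominate the computational cost.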
The NSE indicates how well the plot of simulated versus observed data fits the 1:1 line; it ranges from −∞ to 1 and is a good estimator of overall mass balance adherence (Nash & Sutcliffe, 1970).
RSR is another statistic for model evaluation; it ranges from 0 to +∞, where 0 indicates a perfect fit (Singh et al., 2005). PBIAS measures how much the model overestimates (>0) or underestimates (<0) the observed values (Gupta et al., 1999). Based on the reported statistical metrics, Moriasi et al. (2007) suggested that models for 1989-1998, 2001-2008, 2006-2012 and 1995-2003, Creek studies did not report model evaluation. To determine whether the newly developed complete GHCN weather data were as reliable as the previously used weather data in representing hydrological processes, we compared the resulting NSEs with those reported in the previous studies.
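The three goodness-of-fit measures and the satisfactory thresholds used in this study (NSE > 0.5, RSR ≤ 0.70 and −25 ≤ PBIAS ≤ 25) can be written compactly. In this Python sketch the PBIAS sign follows the text (positive means overestimation); note that Moriasi et al. (2007) write PBIAS with the opposite sign, so check the convention before comparing numbers across studies. RSR is computed as the root of the sum of squared errors over the root of the total sum of squares, which makes RSR equal to $\sqrt{1 - \mathrm{NSE}}$. Function names are illustrative.

```python
import math
from typing import Sequence


def _sums(obs: Sequence[float], sim: Sequence[float]):
    mean_o = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))  # squared errors
    sst = sum((o - mean_o) ** 2 for o in obs)          # variance of observations
    return sse, sst


def nse(obs, sim) -> float:
    sse, sst = _sums(obs, sim)
    return 1.0 - sse / sst


def rsr(obs, sim) -> float:
    """RMSE divided by the standard deviation of observations."""
    sse, sst = _sums(obs, sim)
    return math.sqrt(sse / sst)


def pbias(obs, sim) -> float:
    """Percent bias, positive = model overestimates (sign convention of the text)."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def apbias(obs, sim) -> float:
    return abs(pbias(obs, sim))


def satisfactory(obs, sim) -> bool:
    return nse(obs, sim) > 0.5 and rsr(obs, sim) <= 0.70 and -25.0 <= pbias(obs, sim) <= 25.0


obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]  # constant +1 offset
print(nse(obs, sim), pbias(obs, sim), satisfactory(obs, sim))  # 0.2 40.0 False
```

The over- and underestimation cancellation that motivates APBIAS is easy to see: for `obs = [1, 3]` and `sim = [3, 1]`, PBIAS is 0 while NSE is −3.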

| Weather data
For the Closest Station, IDW and IDEW weather data time series, the search radius within which the weather variables were filled varied from 30 km for most watersheds to 85 km for the Big Sunflower River at Sunflower, MS (Table 1). For the watershed above USGS gauge 04282650, Figure S1 shows an example of the plots automatically generated by FillMissWX, which display the types of GHCN monitors and their distances from the watershed centroid. Scatter plots of the precipitation, maximum temperature and minimum temperature data obtained by IDW and IDEW against those from the Closest Station method are shown in Figure S2.
Figure S3 shows the number of stations, by GHCN platform type, used to acquire and interpolate precipitation data for the 21 basins.
Table 1 and Figure S3 show that among all GHCN sources, and for nearly every watershed, the CoCoRaHS observer network contributes at least 50% of the precipitation data, with COOP the next most common source and WBAN and SNOTEL contributing less than 10% of the observations.
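Network shares like those summarized in Table 1 and Figure S3 can be tallied directly from GHCN-Daily station identifiers, whose third character encodes the source network (1 = CoCoRaHS, C = COOP, E = ECA&D, M = WMO, N = NM/HC, R = RAWS, S = SNOTEL, W = WBAN, 0 = unspecified). The Python sketch below is illustrative, not part of FillMissWX, and the station IDs in the example are made up.

```python
from collections import Counter
from typing import Dict, Iterable

# Third character of an 11-character GHCN-Daily station ID -> source network
NETWORK_CODES = {
    "0": "unspecified", "1": "CoCoRaHS", "C": "COOP", "E": "ECA&D",
    "M": "WMOID", "N": "NM/HC", "R": "RAWS", "S": "SNOTEL", "W": "WBAN",
}


def network_of(station_id: str) -> str:
    if len(station_id) < 3:
        return "unknown"
    return NETWORK_CODES.get(station_id[2], "unknown")


def network_shares(station_ids: Iterable[str]) -> Dict[str, float]:
    """Fraction of stations contributed by each network."""
    counts = Counter(network_of(s) for s in station_ids)
    total = sum(counts.values())
    return {net: n / total for net, n in counts.items()}


# Hypothetical station IDs for one watershed's search circle
ids = ["US1XXAA0001", "US1XXAA0002", "USC00000001", "USW00000001"]
print(network_shares(ids))  # {'CoCoRaHS': 0.5, 'COOP': 0.25, 'WBAN': 0.25}
```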

| Evaluation of Closest Station, IDW, and IDEW estimation methods for streamflow simulation
The SWAT models were calibrated individually against observed streamflow for the 2010–2021 period. In Table S2 and Figure S2). Consistent with the findings for the SWAT model, the Riesel and Leon River watersheds exhibited lower performance ratings.
The one-way ANOVA showed significant differences among the estimation methods for all statistical metrics, with probability values (p-values) less than 0.1. Tukey's test revealed that these differences were attributable to the Closest Station method (Table 3). We used diagnostic plots to confirm that the linear models were appropriate for the data and met all the necessary assumptions (Figures S4-S6). For all watersheds, the IDW and IDEW methods resulted in greater NSE, lower RSR and lower APBIAS values than the Closest Station method, in most cases substantially (Table S2 and Figure 2), demonstrating that IDW and IDEW represent hydrological processes better than the Closest Station method.
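For readers who want to see what the ANOVA tables report, this Python sketch computes the one-way ANOVA entries (Df, Sum Sq, Mean Sq and F) for groups of a metric by estimation method. It is an illustration, not the analysis script used here. The p-value would come from the upper tail of the F distribution with the two Df values (for example `scipy.stats.f.sf`), and the Tukey HSD post hoc test is available in libraries such as `scipy.stats.tukey_hsd` (SciPy 1.8 and later); neither is reimplemented below. The example numbers are made up.

```python
from typing import Dict, Sequence


def one_way_anova(groups: Dict[str, Sequence[float]]) -> Dict[str, object]:
    """One-way ANOVA table entries (Df, Sum Sq, Mean Sq, F) for k groups."""
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)
    means = {name: sum(g) / len(g) for name, g in groups.items()}
    # between-group (method effect) and within-group (residual) sums of squares
    ss_between = sum(len(g) * (means[name] - grand_mean) ** 2 for name, g in groups.items())
    ss_within = sum((v - means[name]) ** 2 for name, g in groups.items() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_between = ss_between / df_between  # Mean Sq = Sum Sq / Df
    ms_within = ss_within / df_within
    return {
        "Df": (df_between, df_within),
        "Sum Sq": (ss_between, ss_within),
        "Mean Sq": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }


# Made-up metric values for three methods across three watersheds
table = one_way_anova({"Closest": [1.0, 2.0, 3.0], "IDW": [2.0, 3.0, 4.0], "IDEW": [5.0, 6.0, 7.0]})
print(table["Df"], table["F"])  # (2, 6) and an F of about 13
```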
Climate classification had a significant effect only on the APBIAS values (Table 4). No significant effect of the interaction between estimation method and climate classification was found for any of the studied statistical measures (Table 4). We used diagnostic plots to assess the suitability of the linear models for the data and to ensure that all essential assumptions were met (Figures S7-S9).
The description of MICRONET is detailed in Guzman et al. (2014).
Abbreviations: ANOVA, analysis of variance; APBIAS, absolute percent bias; IDEW, inverse distance and elevation weighting; IDW, inverse distance weighting; NSE, Nash-Sutcliffe efficiency; RSR, the ratio of the root mean square error to the standard deviation of measured data.
TABLE 4 p-values, degrees of freedom (Df), sums of squares (Sum Sq, the total variation between the group means and the overall mean) and mean squares (Mean Sq, Sum Sq divided by Df) obtained for the fitted linear model. In the model, the statistical metrics (NSE, RSR, APBIAS) were the response variables, with estimation method, climate classification and their interaction (interaction) as fixed effects. Abbreviations: APBIAS, absolute percent bias; IDEW, inverse distance and elevation weighting; IDW, inverse distance weighting; NSE, Nash-Sutcliffe efficiency; RSR, the ratio of the root mean square error to the standard deviation of measured data.
4a), Choptank River (Figure 4b), Ft. Cobb Reservoir Watershed (Figure 4c) and Goodwater Creek (Figure 4d).

| Model evaluation and corroboration
Across all watersheds and all three methods (Closest Station, IDW and IDEW), the simulated hydrographs align fairly well with observed peak flows and dry periods and capture their timing accurately. However, the models tend to slightly underestimate a few storm events. For the Closest Station method, the hydrographs of Ft. Cobb Reservoir Watershed (Figure 4c) and Goodwater Creek (Figure 4d) reveal a moderate overestimation of some simulated storm event magnitudes. given the simplicity of this method. Limiting weather data to the closest station can bias model results (Sattari et al., 2017), especially in large watersheds (Table S2 and Figure 2). The IDW and IDEW estimation methods performed similarly across climate classes, with the exception of the BSk climate region. The improved model performance with IDEW in the BSk region was likely due to the considerable elevation differences between the target location and nearby weather stations in a mountainous region where precipitation is highly correlated with elevation (Zhang et al., 2017).
The model corroboration (Table 6) against previously published SWAT model results for four watersheds revealed that the combined GHCN weather data generally provided streamflow predictions similar to those obtained with the weather data sources used in the individual watershed studies (Ghidey et al., 2007; Guzman et al., 2015; Lee et al., 2018; Wagena et al., 2018). Interestingly, all three estimation In duplicating the SWAT model study with the simpler TM model, we can suggest that the SWAT model results apply more generally across similar watershed model structures.

| CONCLUSIONS
This study demonstrates that, for watershed modelling across a variety of hydroclimatic regimes and watershed types, using higher-density but perhaps lower-quality weather data can improve model performance. Incorporating the citizen science-based CoCoRaHS weather monitors integrated into the GHCN better represents the weather forcings over a watershed. As hypothesized, integrating all available weather measurements, even those of lower 'quality', resulted in streamflow predictions as good as or better than those obtained with the best (often nearest) weather station to a watershed centroid. This is because a higher station density captures more information than an individual weather station. Adding higher-density, lower-quality citizen weather measurements to the suite of hydrological modelling tools provides opportunities for improving hydrological understanding, including for modelling ungauged watersheds. As citizen science data are made available more rapidly, they have the potential to advance real-time hydrological modelling and prediction. As highlighted in the discussion, the model exhibited unsatisfactory performance in two large watersheds, primarily owing to spatial variation in weather. Thus, when initializing watershed models, the spatial variability of weather in the basin needs to be considered, and internal sub-basins should be forced with more locally representative weather measurements. This scenario presents a potential application in which the incorporation of higher-density data could be particularly valuable and should be considered in future work.
tion and then estimate and interpolate missing data using three estimation methods: Closest Station, IDW and IDEW. The complete weather data time series generated from multiple GHCN monitors by the FillMissWX estimation methods are then used to force SWAT models and simulate streamflow. The predicted streamflow for each estimation method is evaluated using standard statistical measures [Nash-Sutcliffe efficiency (NSE), percent bias (PBIAS) and ratio of the root mean square error to the standard deviation of measured data (RSR)] and the evaluation criteria suggested by Moriasi et al. (2007). The effect of each estimation method on simulated streamflow is further assessed by comparing model results with previously published model results. To examine whether the results of the methodologies developed in this study are particular to the SWAT model or apply more generally across model structures, we employ the Thornthwaite-Mather (TM) model to simulate streamflow in the 21 watersheds and assess the results using the NSE metric.
for each of the 21 watersheds based on Closest Station, IDW and IDEW weather data time series. After model initialization in each of the 21 ARS-NRCS-CEAP watersheds, 19 parameters previously shown to be important for SWAT streamflow simulation were calibrated (Cibin et al., 2010; Khorashadi Zadeh et al., 2017; Leta et al., 2015; Nossent & Bauwens, 2012; streamflow simulation are satisfactory when NSE > 0.5, RSR ≤ 0.70 and −25 ≤ PBIAS ≤ 25. Although PBIAS indicates whether the simulated data of a model tend to be larger or smaller than their observed counterparts, comparing estimation methods on the basis of PBIAS values can be misleading, because overestimation and underestimation offset each other: a poorly performing model can still have a PBIAS close to zero (Moriasi et al., 2015; Schaefli & Gupta, 2007). Thus, in addition to evaluating goodness of fit with the PBIAS metric, we calculated the absolute percent bias (APBIAS). A one-way analysis of variance (ANOVA) and Tukey's Honest Significant Difference post hoc test were applied individually to each statistical measure (NSE, RSR, APBIAS) to identify statistically significant differences between the means of the estimation methods
FIGURE 1 Locations and climate classifications of the 21 ARS-NRCS-CEAP watersheds and test USGS watersheds. Red dots show area-weighted USGS watershed centroids. The climate of the watersheds is grouped based on the Köppen climate classification (Beck et al., 2018) into BSk, cold semi-arid climate and hybrid rain-snow-dominated hydrological regime with infrequent but intense rainfall events (orange); Cfa, humid subtropical climate and rain-dominated hydrological regime due to consistent year-round precipitation (light green); Csa, Mediterranean hot-summer climate and rain-dominated hydrological regime due to consistent year-round precipitation (yellow); Dfa, hot-summer continental climate with cold winters and snow-dominated hydrological regime with significant snow accumulation in winter and snowmelt in spring (light blue); Dfb, warm-summer continental or hemiboreal climate with cold winters and snow-dominated hydrological regime with significant snow accumulation in winter and snowmelt in spring (dark blue).
TABLE 1

| Model evaluation and corroboration
Because changing conditions can influence hydrological simulations, the conventional split-sample technique may fail to capture the full range of variability in a watershed. Calibrating on the full time series, as recommended by Arsenault et al. (2018), Garna et al. (2023) and Singh and Bárdossy (2012), is a more robust alternative. Thus, we calibrated our models for daily streamflow simulation over the entire duration of the dataset (2010 to 2021). To assess model performance, we used bootstrap resampling, which enabled us to estimate the probability distribution of the performance metrics (NSE and RSR) over the entire 2010–2021 period.
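A minimal Python sketch of bootstrap resampling of the performance metrics: daily (observed, simulated) pairs are resampled with replacement, NSE and RSR are recomputed for each replicate, and percentiles summarize the resulting distribution. The text does not specify the resampling scheme, so the plain day-level resampling here is an assumption; because daily streamflow is autocorrelated, a block bootstrap is a common alternative. Function names and the synthetic series are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def nse_rsr(obs: Sequence[float], sim: Sequence[float]) -> Tuple[float, float]:
    mean_o = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / sst, math.sqrt(sse / sst)


def bootstrap_metrics(obs: Sequence[float], sim: Sequence[float],
                      n_boot: int = 1000, seed: int = 42) -> List[Tuple[float, float]]:
    """Resample (obs, sim) day pairs with replacement; recompute NSE and RSR."""
    rng = random.Random(seed)
    n = len(obs)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        o = [obs[i] for i in idx]
        s = [sim[i] for i in idx]
        if max(o) == min(o):  # zero variance: NSE and RSR are undefined
            continue
        samples.append(nse_rsr(o, s))
    return samples


def percentile(values: Sequence[float], q: float) -> float:
    """Simple index-based percentile (q in 0-100), no interpolation."""
    xs = sorted(values)
    return xs[round(q / 100.0 * (len(xs) - 1))]


# Synthetic daily series, illustrative only
obs = [float(i % 7) for i in range(365)]
sim = [0.9 * o + 0.3 for o in obs]
samples = bootstrap_metrics(obs, sim, n_boot=500)
nses = [e for e, _ in samples]
print(percentile(nses, 2.5), percentile(nses, 97.5))  # a 95% interval for NSE
```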
Figure 2). For the Upper Washita River watershed in Oklahoma, the Closest Station and IDW simulations indicated poor performance, with PBIAS values of −36% and −40%, respectively. Streamflow simulations from the TM model calibrated against observed data for 2010–2021 agreed with the observed streamflow, although, as would be expected from a simpler model, the TM model did not perform as well as the SWAT model, which includes more processes (Table S2). Consistent with the find-

Figure 3 shows the distribution of NSE, RSR and APBIAS values for the watersheds, categorized by Köppen climate classification. For

FIGURE 3 Boxplot of NSE (a), RSR (b) and APBIAS (c) values (dots) of
methods provided better model performance in the Mahantango Creek Watershed and Goodwater Creek, which both employ custom USDA research weather stations and more sophisticated SWAT models. In the Choptank River and Ft. Cobb Reservoir watersheds, all three estimation methods resulted in poorer model performance for the calibration period, although the IDW and IDEW NSE values were similar to the results of Lee et al. (2018) for the Choptank watershed. The previous studies initialized their models with multiple weather time series, which would result in more degrees of freedom and better calibration results than our models, in which a single weather time series was used. For the evaluation period in the Choptank watershed, the Closest Station, IDW and IDEW models outperformed the previous study. These results indicate that forcing the models with the open-source GHCN datasets achieves similar results at a much lower cost than deploying custom weather stations. Our results agree with several studies that assessed sources and methods of weather data for driving hydrological models. Auerbach et al. (2016) and Fuka et al. (2014) found that reanalysis-based Climate Forecast System Reanalysis weather data (which integrate all available higher-density climate stations), whether gridded or interpolated at various densities, almost exclusively resulted in better hydrological model results, at least for streamflow, than the closest land-based weather station. Others (Tan et al., 2015; Vu et al., 2018; Worqlul et al., 2014) have shown that satellite-based weather data also provide better estimates of the weather occurring over a watershed, as judged by the hydrological model response. This study demonstrates that using GHCN stations, specifically including CoCoRaHS volunteer monitors, results in similar levels of hydrological model performance across diverse watersheds.

Table 1). The relevant properties of the 21 watersheds are shown in Table 1, including the ARS-NRCS-CEAP benchmark watershed name, US Geological Survey (USGS) gauge name (U.S. Geological Survey, 2022), drainage area (km²), latitude and longitude of the watershed centroid, search radius, number and type of reporting weather stations and average annual weather characteristics, including precipitation and maximum and minimum temperature. The watersheds above the USGS gauges listed in Table 1 are the portions of the ARS-NRCS-CEAP watersheds used as test watersheds in this study; these gauges had daily measured streamflow data for 2010–2021, which were used for model calibration. The search radius, a user-defined parameter, is the distance from which to draw
Table of study area information, including ARS-NRCS-CEAP benchmark watershed names, the name of the experimental (test) USGS gauge within each ARS-NRCS-CEAP watershed, the area of the watershed above the USGS gauge (km²), the longitude and latitude of watershed centroids, the search radius (R, km) used to interpolate all missing data from the centre of the watershed, the number and platform type of precipitation, maximum temperature and minimum temperature GHCN stations within a circle of radius R centred on the watershed centroid, and the average annual precipitation (mm) and annual maximum and minimum temperature (°C) obtained with the FillMissWX modelling interface for each of the Closest Station (Closest), IDW and IDEW interpolation methods.
(PET) losses, with any soil moisture surplus exceeding the AWC being retained within the watershed storage. The release of stored water into the river is determined by a linear equation. We evaluated the agreement between the streamflow simulated by the TM model, forced by each of the Closest Station, IDW and IDEW methods, and the observed data by assessing the goodness of fit with the NSE metric.
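The TM water balance and the NSE check can be illustrated with a short daily sketch. It assumes the classic Thornthwaite-Mather behaviour in which soil moisture declines exponentially with the daily deficit (PET minus precipitation), sends any moisture above the AWC to a watershed storage, and releases that storage through a linear reservoir, Q = k·S. The recession coefficient `k`, the initial states and the function names are hypothetical, and snow processes are ignored.

```python
import math
from typing import List, Optional, Sequence


def thornthwaite_mather(precip: Sequence[float], pet: Sequence[float], awc: float,
                        k: float, sm0: Optional[float] = None,
                        storage0: float = 0.0) -> List[float]:
    """Daily Thornthwaite-Mather-style water balance; all fluxes in mm/day.

    awc: available water capacity of the soil (mm)
    k: hypothetical linear-reservoir recession coefficient (0 < k <= 1, per day)
    """
    if len(precip) != len(pet):
        raise ValueError("precip and pet must have the same length")
    sm = awc if sm0 is None else sm0  # start at capacity unless told otherwise
    storage = storage0
    flows = []
    for p, e in zip(precip, pet):
        if p >= e:
            sm += p - e                     # wetting: net input fills the soil
            surplus = max(0.0, sm - awc)    # moisture above AWC goes to storage
            sm -= surplus
        else:
            sm *= math.exp(-(e - p) / awc)  # drying: exponential decline with deficit
            surplus = 0.0
        storage += surplus
        q = k * storage                     # linear reservoir: outflow proportional to storage
        storage -= q
        flows.append(q)
    return flows


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 is no better than the mean."""
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den
```

For example, `thornthwaite_mather([10, 0, 0], [0, 0, 0], awc=5, k=0.5)` starts with a full soil, routes the 10 mm surplus to storage and returns the receding flows `[5.0, 2.5, 1.25]`.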
, maximum and minimum temperatures obtained with the three methods in the FillMissWX modelling interface (Closest Station, IDW and IDEW).
Goodwater Creek (Ghidey et al., 2007; Table 2). These studies were selected because they had published SWAT model results and employed weather data from four different research groups using traditional weather data aggregation methods. To compare streamflow simulations, the SWAT model performance reported in previous studies was compared with SWAT model results using the

Table 1 (Continued)

Abbreviations: ARS, Agricultural Research Service; CEAP, Conservation Effects Assessment Project; CoCoRaHS, Community Collaborative Rain, Hail, and Snow; COOP, Cooperative Observers Program; GHCN, Global Historical Climatology Network; IDEW, inverse distance and elevation weighting; IDW, inverse distance weighting; NRCS, Natural Resources Conservation Service; RAWS, Remote Automatic Weather Station; SNOTEL, US Natural Resources Conservation Service SNOwpack TELemetry; USGS, US Geological Survey; WBAN, Weather Bureau Army Navy.

a The platform types of the GHCN network for the ARS-NRCS-CEAP watersheds include CoCoRaHS, COOP, WBAN, RAWS and SNOTEL.

b Average annual precipitation

Table 2 Watershed name, calibration and evaluation periods, and weather data sources for four watersheds: Mahantango Creek Watershed, Choptank River, Ft. Cobb Reservoir watershed and Goodwater Creek. The studies for the Ft. Cobb Reservoir and Goodwater Creek watersheds did not report evaluation periods.

a Table of probability values (p-values), degrees of freedom (Df), error mean square (MSerror) and coefficient of variation (CV) obtained from Tukey's Honest Significant Difference post hoc test, used to determine whether there is a significant difference between the means of Closest Station and IDW (Closest Station-IDW), Closest Station and IDEW (Closest Station-IDEW), and IDW and IDEW (IDW-IDEW) for each of the NSE, RSR and APBIAS statistical measures.
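The Df, MSerror and CV listed with the Tukey's Honest Significant Difference results come from the underlying analysis of variance. As a sketch, assuming a one-way design with the estimation method as the only factor (the model structure is not stated here) and the common definition CV = 100·√MSerror / grand mean, these quantities can be computed as below. The pairwise p-values additionally require the studentized range distribution (for example via `scipy.stats.tukey_hsd`) and are not computed in this sketch; the function name is hypothetical.

```python
import math
from typing import Dict, List


def anova_error_stats(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Error degrees of freedom, error mean square and CV for a one-way layout.

    groups: hypothetical mapping of method name (e.g. "Closest", "IDW", "IDEW")
    to the metric values (e.g. NSE per watershed) obtained with that method.
    """
    all_vals = [v for vals in groups.values() for v in vals]
    n, k = len(all_vals), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(all_vals) / n
    if grand_mean == 0:
        raise ValueError("CV is undefined when the grand mean is zero")
    # Within-group (error) sum of squares: deviations from each group's own mean
    ss_error = 0.0
    for vals in groups.values():
        m = sum(vals) / len(vals)
        ss_error += sum((v - m) ** 2 for v in vals)
    df_error = n - k
    ms_error = ss_error / df_error
    cv = 100.0 * math.sqrt(ms_error) / grand_mean
    return {"Df": df_error, "MSerror": ms_error, "CV": cv}
```

Because APBIAS and RSR are non-negative while NSE can approach zero, the CV is most informative for the metrics whose grand mean is well away from zero.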
Figure 2 Bar plots of NSE (a), RSR (b), PBIAS (c) and APBIAS (d) values obtained from the Closest Station, IDW and IDEW methods for SWAT models of 21 ARS-NRCS-CEAP watersheds.

Table 3

ARS-NRCS-CEAP watersheds obtained from the Closest Station, IDW and IDEW methods, grouped by climate classification (BSk, cold semi-arid; Cfa, humid subtropical; Csa, Mediterranean hot-summer; Dfa, hot-summer continental; Dfb, warm-summer continental or hemiboreal). Red, green and blue dots show the statistical metrics obtained from the Closest Station, IDW and IDEW methods, respectively. Note that there is only one watershed in each of the BSk and Csa classes. The boxplots show the first and third quartiles, median, minimum and maximum values. A model with higher NSE, lower RSR and lower APBIAS performs better.

For APBIAS, the median IDW and IDEW values for all climate classifications were lower than those of the Closest Station method. Because it produced the lowest NSE and the highest RSR and APBIAS values, the Closest Station method resulted in the poorest performance. For IDW and IDEW, the median NSE, RSR and APBIAS values for the Dfa, Dfb, Cfa and Csa classes were approximately similar. However, for the BSk climate region (Snake River near Kimberly, ID), the IDEW method resulted in higher NSE, lower RSR and lower APBIAS values; because there was only one BSk watershed, no statistical test was performed.
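The four metrics in Figure 2 can be computed from paired observed and simulated flows. A minimal sketch, assuming the Moriasi et al. (2007) definitions (PBIAS = 100·Σ(obs − sim)/Σobs, so positive values indicate underestimation; RSR = RMSE divided by the standard deviation of the observations) and taking APBIAS to be the absolute value of PBIAS. The `satisfactory` flag combines NSE > 0.5 and RSR ≤ 0.70 with the ±25% PBIAS range used in this study; the function name is hypothetical.

```python
import math
from typing import Sequence


def performance(obs: Sequence[float], sim: Sequence[float]) -> dict:
    """NSE, RSR, PBIAS, APBIAS and a satisfactory flag for one model run."""
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))   # squared errors
    sst = sum((o - mean_obs) ** 2 for o in obs)         # spread of observations
    nse = 1.0 - sse / sst
    rsr = math.sqrt(sse) / math.sqrt(sst)  # RMSE / SD_obs; the 1/n factors cancel
    pbias = 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)
    apbias = abs(pbias)  # assumed definition: magnitude of the bias only
    satisfactory = nse > 0.5 and rsr <= 0.70 and apbias <= 25.0
    return {"NSE": nse, "RSR": rsr, "PBIAS": pbias,
            "APBIAS": apbias, "satisfactory": satisfactory}
```

Using APBIAS rather than PBIAS for the boxplots means over- and underestimation count equally, so a lower value is always better.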
Lee et al., 2018) of NSE and RSR obtained from the bootstrap test for model evaluation (Ghidey et al., 2007). Performance ratings are considered satisfactory if NSE > 0.5 and RSR ≤ 0.70, as suggested by Moriasi et al. (2007).

Comparison of calibration and evaluation NSE values of daily (Mahantango Creek Watershed, Wagena et al., 2018; Ft. Cobb Reservoir watershed, Guzman et al., 2015; Goodwater Creek, Ghidey et al., 2007) and monthly (Choptank River, Lee et al., 2018) streamflow simulations of SWAT models forced by the three estimation methods (Closest Station, IDW and IDEW) with the NSE values of previous studies.

a The study did not report model fit for an evaluation period.

While 18 of the 21 CEAP watersheds had satisfactory model performance using our weather estimation methods, two watersheds had poorer performance: the Riesel and Leon River watersheds, both located in Texas. Both watersheds are relatively large (76 671 and 5995.5 km²) compared with the other watersheds listed in Table 1. As watersheds become larger, using only the basin centroid may misrepresent the spatial variability of precipitation events over the watershed. Both watersheds also have features that add complexity to model development: the Leon River watershed