The effect of data sources on calculating mean temperature and integrated water vapor in Iran

The weighted mean temperature (Tm) plays a crucial role in calculating Precipitable Water Vapor (PWV) and integrated water vapor (IWV) using Global Navigation Satellite Systems (GNSS) techniques. Currently, the primary sources of meteorological parameters are radiosonde measurements and Numerical Weather Models (NWMs). This study assesses the influence of different data sources on the computation of Tm and IWV in Iran. The investigation compared several datasets: ERA5 numerical data with spatial resolutions of 0.125° and 2.5° (ERA5 0.125, ERA5 2.5), ERA‐Interim, NCEP numerical data and Tm results derived from the GPT3 model. The results were validated using data from 12 radiosonde stations situated across Iran. In addition, the precision of the IWV parameter was evaluated using measurements from the only available IGS station in the region, situated in Tehran. The results revealed that ERA5 0.125 was more accurate in Tm estimation than the other datasets, showing a discrepancy of approximately 1–2 K. In contrast, the GPT3 model displayed an accuracy of about 3 K. Analysing the results by month revealed elevated root mean square error (RMSE) values during warmer months, with little dependence on station height in the region for the four datasets. Regarding IWV, the ERA5 0.125 dataset outperformed the other three datasets, demonstrating an accuracy of about 0.07 kg m−2. Notably, RMSE values during summer were approximately 50% higher than the annual RMSE.


1 | INTRODUCTION
The troposphere is a vital atmospheric layer whose intricate behaviour gives it particular significance. It is also one of the most challenging error sources in space geodesy, because it contains water vapor that varies strongly in time and space (Böhm & Schuh, 2013). Its thickness spans 50-80 km, depending on the region (Hofmann-Wellenhof et al., 2008; Leick et al., 2015; Seeber, 2008). Monitoring the troposphere matters from two perspectives, geodetic and meteorological. From a geodetic standpoint, this layer is a substantial source of error that dual-frequency observations alone cannot mitigate (Hofmeister & Böhm, 2017). Meteorologically, the troposphere lies closest to the Earth's surface, and changes within it directly affect the Earth's systems and climate (Zhao et al., 2019). Among the tropospheric parameters, water vapor is a pivotal factor. It is typically confined to altitudes of 8-11 km above ground level, and its influence diminishes at higher elevations.
Meteorological information is obtained from radiosonde data and from numerical meteorological data. Radiosonde data, although characterized by a non-uniform temporal resolution of 12 h, offer substantial potential, while numerical meteorological data are provided on a uniform grid with temporal resolutions ranging from 6 to 1 h. The latter are useful in ray tracing for tropospheric delay estimation (Nafisi et al., 2012). Owing to their precision, radiosonde data make it possible to assess and validate the results (Chen et al., 2017; Liu et al., 2018). Tm and Zenith Total Delay (ZTD) are pivotal tropospheric parameters in both the geodetic and meteorological domains. Together, they underpin the computation of integrated water vapor (IWV) and precipitable water vapor (PWV) (Davis et al., 1985). Several models have been proposed for Tm. Linear models that depend on local surface temperature have been proposed for different regions (Bevis et al., 1992; Boutiouta & Lahcene, 2013; Chen et al., 2017; Liou et al., 2001). There are also global models for calculating Tm that are based on numerical data (Böhm et al., 2015; Schueler et al., 2001).
Several studies have examined the influence of meteorological data on tropospheric parameters. Chen et al. evaluated the accuracy of ZTD obtained from ECMWF and NCEP data for Asia using GPS-derived ZTD data. ZTDs derived from ECMWF data proved more precise than those derived from NCEP data (Chen et al., 2011). Chen's investigation also covered 29 GPS stations within China and showed that integral numerical methods outperform the Saastamoinen approach by a margin of 1-3 cm, with ECMWF data again outperforming NCEP data. Wang et al. compared five datasets to calculate PWV using 268 GPS stations with global coverage (Wang et al., 2020). The results indicated that the ERA5 data are more accurate than the other data. Jiang et al. evaluated the accuracy of ZTDs derived from ERA5 data using 219 GPS stations in China. Based on their results, the accuracy (root mean square error [RMSE]) of ZTDs derived from ERA5 data is about 11.49 mm. In a comparison of ZTD time series, the ERA5 data agreed better in sub-daily variation than GPT3 and ERA-Interim (Jiang et al., 2020).
This study calculates and compares the Tm parameter across four datasets. The data sources are ERA5 data at spatial resolutions of 0.125° and 2.5° with a 6-h time resolution, together with ERA-Interim and NCEP data with a spatial resolution of 2.5° and a 6-h time resolution. The global GPT3 model was also adopted so that its Tm results could be compared with those of the four datasets. Global Pressure and Temperature 3 (GPT3) is a comprehensive troposphere model designed to enhance mapping functions such as the Vienna Mapping Functions 3 (VMF3). GPT3 integrates empirical coefficients and meteorological data to approximate tropospheric delays more accurately, making it a versatile tool for geodetic, meteorological and climatological applications (Landskron & Böhm, 2018). For validation, a set of 12 well-dispersed radiosonde stations located in Iran was employed. To compare IWV results, ZTD observations from the IGS station in Tehran were used.
The paper is structured as follows. Section 2 describes the data sources. Section 3 presents the calculation of Tm from numerical data, and Section 4 describes the computation of IWV from Tm and Zenith Wet Delay (ZWD). Section 5 examines the Tm results against radiosonde data, and Section 6 presents and evaluates the IWV results from the four datasets and the GPT3 model against radiosonde data. Section 7 summarizes the conclusions. This structure makes it possible to trace how Tm and IWV computations depend on the data source, and the conclusions contribute to a better understanding of meteorological phenomena and operational forecasting practices.

2 | DATA SOURCES
The meteorological datasets used in this study are ERA5, ERA-Interim and NCEP. ERA5 is the fifth generation of ECMWF reanalysis data. It provides many meteorological and oceanic parameters on an hourly basis from January 1950. ERA5 is produced by the Copernicus Climate Change Service at ECMWF and is available at https://cds.climate.copernicus.eu. These data have global coverage and provide meteorological parameters from the surface layer to an altitude of 80 km. Two versions of this dataset with different spatial resolutions were employed in this research: the first had a spatial resolution of 0.125° (ERA5 0.125) and the second had a spatial resolution of 2.5° (ERA5 2.5). For better comparison with the other two datasets, a time resolution of 6 h was used (0, 6, 12 and 18 UTC). The vertical resolution comprises 37 pressure levels from 1000 to 1 hPa.
The second dataset is ERA-Interim (ERAI), developed by the European Centre for Medium-Range Weather Forecasts (ECMWF). This dataset provides atmospheric and oceanic parameters on a global scale between January 1979 and August 31, 2019. It was replaced with the ERA5 dataset from 2019 onwards. The spatial resolution of this dataset was 2.5°, and its time and height resolutions were taken to be the same as those of the ERA5 dataset.
The third dataset is the NCEP data. This dataset is a joint product of the National Centers for Environmental Prediction (NCEP) and the National Center for Atmospheric Research (NCAR). These data are available at https://www.psl.noaa.gov. This dataset has provided meteorological data on a regular global grid since 1948. All of these datasets were downloaded for latitudes of 24°-41° and longitudes of 43°-64°, for the period from 2007 to the end of 2019.
Radiosonde data are another pivotal meteorological source and provide direct measurements from sensors attached to balloons. These data are typically provided at 12-h intervals and can be accessed freely via the National Oceanic and Atmospheric Administration (NOAA) website (https://ruc.noaa.gov/raobs). Table 1 lists the radiosonde stations used in this study, and Figure 1 displays the distribution of these stations in Iran.
To investigate the impact of data sources on IWV, the IGS station located in Tehran (OIII station in Figure 1) was used. The IGS has been in charge of providing GNSS data and products since 1994. One of these products is ZTD data, provided with a time resolution of 300 s and a nominal accuracy of 4 mm (https://igs.org/wg/troposphere/#data). Tropospheric products are also archived at the IGS site at https://cddis.nasa.gov/archive/gnss/products/troposphere/zpd/.

3 | Tm CALCULATION FROM THE METEOROLOGICAL DATASET AND RADIOSONDE DATA
Tm is defined by the integral in Equation (1) (Davis et al., 1985):
$$T_m = \frac{\int_{h_0}^{h_{tp}} \frac{e}{T}\,dh}{\int_{h_0}^{h_{tp}} \frac{e}{T^2}\,dh} \qquad (1)$$
where e is the water vapor pressure and T is the air temperature. Moreover, h0 represents the height of the station and htp represents the height of the highest troposphere layer. This height is taken as the level at which the water vapor pressure reaches its minimum and can be neglected. In meteorological data, the surface height corresponds to a pressure of 1000 hPa. As this might differ from the station's height, vertical interpolation is essential for calculating meteorological parameters (Chen et al., 2011). Moreover, since the position of the station is generally not the same as that of the grid points, a horizontal interpolation is required to calculate the results at the station (Rocken et al., 2001). In numerical data, the water vapor pressure is calculated from the specific humidity using Equation (2):
$$e = \frac{q\,P}{\varepsilon + (1 - \varepsilon)\,q} \qquad (2)$$
Here, e and q are the water vapor pressure and specific humidity, respectively, P is the atmospheric pressure at the level considered, and ε is taken as 0.622 (Wallace & Hobbs, 2006). Figure 1 illustrates the mean values of Tm for the 13 years (2007-2019) using ERA5 data for Iran. In radiosonde data, Equation (3) is used to calculate the water vapor pressure from the dew point temperature Td (Rózsa, 2014).
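As an illustration of the steps described in this section, the following minimal Python sketch converts specific humidity or dew point temperature to water vapor pressure and then evaluates Tm by trapezoidal integration over a vertical profile. It is not the processing code used in this study. The function names, the trapezoidal rule and the sample profile are assumptions made for illustration, and the dew point conversion uses a common Magnus-type approximation with assumed coefficients, which may differ from the form given by Rózsa (2014).

```python
import math

EPSILON = 0.622  # value of epsilon quoted in the text


def vapor_pressure_from_q(q, pressure_hpa):
    """Water vapor pressure (hPa) from specific humidity q (kg/kg) and pressure (hPa)."""
    return q * pressure_hpa / (EPSILON + (1.0 - EPSILON) * q)


def vapor_pressure_from_dewpoint(td_celsius):
    """Water vapor pressure (hPa) from dew point (deg C).

    Assumption: Magnus-type formula with commonly used coefficients;
    the study's Equation (3) may use a different form.
    """
    return 6.112 * math.exp(17.62 * td_celsius / (243.12 + td_celsius))


def weighted_mean_temperature(heights_m, temps_k, e_hpa):
    """Tm = int(e/T dh) / int(e/T^2 dh), evaluated with the trapezoidal rule.

    Levels must be ordered from the station height upwards.
    """
    numerator = 0.0
    denominator = 0.0
    for i in range(len(heights_m) - 1):
        dh = heights_m[i + 1] - heights_m[i]
        f_low = e_hpa[i] / temps_k[i]
        f_high = e_hpa[i + 1] / temps_k[i + 1]
        g_low = e_hpa[i] / temps_k[i] ** 2
        g_high = e_hpa[i + 1] / temps_k[i + 1] ** 2
        numerator += 0.5 * (f_low + f_high) * dh
        denominator += 0.5 * (g_low + g_high) * dh
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical profile starting at a station height of 1200 m.
    heights = [1200.0, 2000.0, 3000.0, 5000.0, 8000.0]
    temps = [295.0, 290.0, 283.0, 270.0, 250.0]
    q_values = [0.008, 0.006, 0.004, 0.0015, 0.0003]
    pressures = [880.0, 800.0, 700.0, 540.0, 360.0]
    e_values = [vapor_pressure_from_q(q, p) for q, p in zip(q_values, pressures)]
    print(f"Tm = {weighted_mean_temperature(heights, temps, e_values):.2f} K")
```

Because the integrands are weighted by water vapor pressure, the lowest levels dominate the result, which is why the vertical interpolation to the station height matters.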

4 | IWV CALCULATION FROM Tm AND ZWD
By definition, IWV is given by the integral in Equation (4):
$$IWV = \int_{h_0}^{h_{tp}} \rho_\nu \, dh \qquad (4)$$
Here, ρν is the water vapor density. In GNSS, Equation (5) is used to calculate IWV because the ZTD value can be obtained with a high accuracy of around ±7 mm (Pacione & Vespe, 2008):
$$IWV = \frac{10^{6}}{R_\nu \left( \dfrac{k_3}{T_m} + k_2 - \dfrac{R_d}{R_\nu}\,k_1 \right)}\; ZWD \qquad (5)$$
Here, Rd and Rν are the specific gas constants for dry air and water vapor, respectively, and k1, k2 and k3 are experimentally determined constant coefficients (Bevis et al., 1992). The Zenith Wet Delay (ZWD) is the non-hydrostatic zenith delay, calculated by subtracting the hydrostatic zenith delay from the zenith total delay. Given the millimetre precision attainable for the hydrostatic zenith delay with models such as the Saastamoinen model, it can be inferred that the accuracy of ZWD lies within the millimetre range (Böhm & Schuh, 2013; Davis et al., 1985; Saastamoinen, 1972). According to Equation (5) and the principles of error propagation, achieving 1% accuracy in IWV requires an equivalent 1% (2.74 K) accuracy in Tm (Bevis et al., 1994; Wang et al., 2005). Therefore, the accuracy of Tm plays an important role in the accuracy of IWV calculated from GNSS.
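The conversion chain described in this section (ZTD to ZWD to IWV) and the 1% sensitivity argument can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the refractivity constants are the values commonly attributed to Bevis et al. (1994), the gas constants are standard textbook values, and the hydrostatic delay uses a widely quoted form of the Saastamoinen model. The study may have used different constants.

```python
import math

# Assumed constants in SI units; the study may have used different values.
R_D = 287.05   # specific gas constant of dry air, J kg^-1 K^-1
R_V = 461.5    # specific gas constant of water vapor, J kg^-1 K^-1
K1 = 0.7760    # K Pa^-1 (77.60 K hPa^-1)
K2 = 0.704     # K Pa^-1 (70.4 K hPa^-1)
K3 = 3739.0    # K^2 Pa^-1 (3.739e5 K^2 hPa^-1)


def saastamoinen_zhd(pressure_hpa, lat_deg, height_m):
    """Zenith hydrostatic delay (m) from surface pressure, latitude and height."""
    factor = (1.0
              - 0.00266 * math.cos(2.0 * math.radians(lat_deg))
              - 0.00028 * height_m / 1000.0)
    return 0.0022768 * pressure_hpa / factor


def iwv_from_zwd(zwd_m, tm_k):
    """IWV (kg m^-2) from ZWD (m) and weighted mean temperature Tm (K)."""
    k2_prime = K2 - K1 * R_D / R_V
    return 1.0e6 * zwd_m / (R_V * (K3 / tm_k + k2_prime))


def iwv_from_ztd(ztd_m, pressure_hpa, lat_deg, height_m, tm_k):
    """Subtract the hydrostatic part from ZTD, then convert the wet part to IWV."""
    zwd_m = ztd_m - saastamoinen_zhd(pressure_hpa, lat_deg, height_m)
    return iwv_from_zwd(zwd_m, tm_k)


if __name__ == "__main__":
    tm = 274.0
    base = iwv_from_zwd(0.10, tm)
    shifted = iwv_from_zwd(0.10, tm * 1.01)
    print(f"IWV for ZWD = 0.10 m, Tm = {tm} K: {base:.2f} kg m^-2")
    print(f"IWV change for a 1% change in Tm: {100.0 * (shifted - base) / base:.2f}%")
```

With these assumed constants, a 1% change in Tm near 274 K changes IWV by close to 1%, in line with the error propagation argument above.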

5 | EVALUATION OF Tm RESULTS FROM FOUR DATASETS AND THE GPT3 MODEL WITH RADIOSONDE DATA
Following the previous section, Tm values derived from the four datasets were calculated at the positions of the 12 radiosonde stations and compared with Tm values from radiosonde data. Figure 2 shows the temporal variation of Tm for station OIII. Taking the values derived from radiosonde data as the reference, RMSE and mean absolute error (MAE) were obtained for the 13-year observation period for each dataset. Extending this analysis, Tm values from the GPT3 model were computed at the station coordinates and included in the comparison. RMSE and MAE are shown in Figures 3 and 4, respectively. Figure 3 shows that ERA5 0.125 has a narrow RMSE range of approximately 1-2 K at most stations (11 out of 12), a higher accuracy than the other datasets. ERA5 2.5 and ERAI follow closely, with accuracy in the range of 1.8-3 K. The NCEP dataset is slightly less accurate at most stations (11 out of 12), with values ranging from 2 to 3.2 K. The GPT3 model has a larger RMSE than the four datasets, about 3 K at all radiosonde stations.
The MAE values in Figure 4 show a similar trend. ERA5 0.125 has MAE values ranging from 0.88 to 1.7 K across all stations, whereas the other three datasets span the 1-2 K range. The MAE values of the GPT3 model exceed those of the datasets and lie between 2 and 3 K. Table 2 presents the maximum and minimum bias values relative to the radiosonde data for the four datasets and the GPT3 model. The range of these values for the four datasets, like that for the GPT3 model, spans around 5 to 19 K.
Figure 5 shows the RMSE values by month over the 13 years for the OIII station in Tehran. The RMSE values for the four datasets during months 7, 8 and 9 are slightly higher than in other months. This difference is likely due to warmer weather conditions and increased atmospheric water vapor content. The dependency is less pronounced for the GPT3 model. To explore the impact of height, Figure 6 shows the RMSE values as a function of station height for the datasets and the GPT3 model. No significant height dependence is observable in any of the datasets or the GPT3 model.
FIGURE 3 Tm root mean square error values for the four datasets and the GPT3 model (radiosonde results as reference). The blue line with rhombus dots for ERA5 (0.125), the red line with square dots for ERA5 (2.5), the grey line with triangle dots for ERAI, the yellow line with cross dots for NCEP and the green line with circular dots for the GPT3.
FIGURE 4 Tm mean absolute error values for the four datasets and the GPT3 model (radiosonde results as reference). The blue line with rhombus dots for ERA5 (0.125), the red line with square dots for ERA5 (2.5), the grey line with triangle dots for ERAI, the yellow line with cross dots for NCEP and the green line with circular dots for the GPT3.
FIGURE 5 Tm root mean square error values divided by month for the OIII station located in Tehran. The blue line with rhombus dots for ERA5 (0.125), the red line with square dots for ERA5 (2.5), the grey line with triangle dots for ERAI, the yellow line with cross dots for NCEP and the green line with circular dots for the GPT3.
TABLE 2 The statistical parameters of the bias of the Tm values obtained from the four datasets and GPT3 (Tm values in Kelvin from radiosonde data as a reference).
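The statistics used in this comparison (bias extremes, MAE, RMSE and RMSE by month) can be computed as in the following minimal Python sketch. The function names and the sample values are hypothetical and serve only to show the calculation. The sketch assumes that each model Tm value has already been matched to a radiosonde Tm value at the same epoch.

```python
import math
from collections import defaultdict


def error_stats(model, reference):
    """Bias, extreme differences, MAE and RMSE of model values against a reference."""
    diffs = [m - r for m, r in zip(model, reference)]
    n = len(diffs)
    return {
        "bias": sum(diffs) / n,
        "min": min(diffs),
        "max": max(diffs),
        "mae": sum(abs(d) for d in diffs) / n,
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
    }


def rmse_by_month(months, model, reference):
    """RMSE grouped by calendar month (1-12) of each epoch."""
    groups = defaultdict(list)
    for month, m, r in zip(months, model, reference):
        groups[month].append(m - r)
    return {
        month: math.sqrt(sum(d * d for d in ds) / len(ds))
        for month, ds in sorted(groups.items())
    }


if __name__ == "__main__":
    # Hypothetical Tm values (K): model against radiosonde reference.
    tm_model = [281.0, 279.0, 284.0]
    tm_radiosonde = [280.0, 280.0, 280.0]
    epochs_month = [1, 1, 7]
    print(error_stats(tm_model, tm_radiosonde))
    print(rmse_by_month(epochs_month, tm_model, tm_radiosonde))
```

RMSE penalizes large individual differences more strongly than MAE, so reporting both, as done here, helps separate a consistently offset dataset from one with occasional large errors.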

6 | EVALUATION OF IWV RESULTS FROM FOUR DATASETS AND THE GPT3 MODEL WITH RADIOSONDE DATA
In this section, the impact of data sources on IWV determination was investigated using ZTD observations from the Tehran IGS station (TEHN). To achieve this, the station's ZTD observations were converted to ZWD using the Saastamoinen model. These ZWD values were then converted to IWV using the Tm values derived from the four datasets and the Tm value from the GPT3 model. Figure 7 shows the temporal variation of IWV for station OIII. Taking the radiosonde data at the Tehran station (OIII) as the benchmark, the RMSE was computed for the four datasets and the GPT3 model. Table 3 lists the overall RMSE values and the values by season. As the table shows, ERA5 0.125 is the most accurate, with the smallest RMSE of around 0.07 kg m−2. The results for ERA5 2.5 and ERAI are similar, each with an RMSE of approximately 0.09 kg m−2. The NCEP dataset follows with an RMSE of around 0.1 kg m−2, which makes it marginally less accurate than the other three datasets. The GPT3 model has the highest RMSE, around 0.13 kg m−2. The seasonal breakdown shows that RMSE values rise across all datasets during summer and exceed those of the other seasons by about 50%. This summer increase is also reflected in the total RMSE. Figure 8 gives a schematic overview of the process used to calculate and compare IWV values across the four datasets and the GPT3 model.
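The seasonal breakdown of the IWV comparison can be illustrated with the short Python sketch below. It is only a sketch under stated assumptions: the function names and the sample values are hypothetical, and the assignment of months to seasons (December to February as winter, and so on) is an assumption, since the season boundaries used in the study are not given in the text.

```python
import math
from datetime import datetime

# Assumption: meteorological seasons; the study may define seasons differently.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def rmse(diffs):
    """Root mean square of a list of differences."""
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def seasonal_rmse(epochs, iwv_model, iwv_reference):
    """RMSE of IWV (kg m^-2) per season plus the overall ("annual") RMSE."""
    by_season = {}
    all_diffs = []
    for epoch, model, reference in zip(epochs, iwv_model, iwv_reference):
        diff = model - reference
        by_season.setdefault(SEASON_OF_MONTH[epoch.month], []).append(diff)
        all_diffs.append(diff)
    result = {season: rmse(diffs) for season, diffs in by_season.items()}
    result["annual"] = rmse(all_diffs)
    return result


if __name__ == "__main__":
    # Hypothetical GNSS-derived and radiosonde IWV values at matched epochs.
    epochs = [datetime(2019, 1, 15), datetime(2019, 7, 15), datetime(2019, 7, 16)]
    iwv_gnss = [10.1, 20.3, 19.7]
    iwv_radiosonde = [10.0, 20.0, 20.0]
    for season, value in seasonal_rmse(epochs, iwv_gnss, iwv_radiosonde).items():
        print(f"{season}: {value:.3f} kg m^-2")
```

Comparing each seasonal value with the "annual" entry gives the kind of relative summer increase discussed above.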

7 | CONCLUSION
This study contributes to meteorological research and operational forecasting by applying established methods to Iran and by examining how the choice of data source affects the results. Our findings clarify the accuracy and reliability of Tm and IWV calculations within Iran's distinct climatic landscape. Comparing Tm values across various datasets, including the GPT3 model, offers critical insights into their regional performance. Notably, ERA5 data with a 0.125° resolution demonstrate higher accuracy, indicating their reliability for precise Tm calculations. In the operational context, refining Tm calculations to account for regional climatic variations and exploring hybrid models using multiple datasets could enhance accuracy and operational robustness. Moreover, investigating the influence of variables such as terrain and topography on Tm and IWV calculations has the potential to advance our understanding of Iranian atmospheric dynamics.
The numerical results of this study support these conclusions. The accuracy of Tm results derived from ERA5 data with a 0.125° resolution surpasses that of the other datasets, falling within the range of 1-2 K. The ERA5 and ERAI datasets with a 2.5° resolution are comparable with each other in accuracy and more accurate than NCEP. In contrast, the GPT3 model is less accurate, at around 3 K. The RMSE results by month show that Tm accuracy is lower during the warm season (months 6, 7 and 8), with higher RMSE values. Furthermore, the RMSE results for Tm show minimal dependence on station height. The IWV results at the Tehran station show that the RMSE of the ERA5 dataset with a 0.125° resolution is lower than that of the other datasets, at around 0.07 kg m−2. The ERA5 2.5° and ERAI datasets have a similar RMSE of approximately 0.09 kg m−2. The GPT3 model is less accurate than the datasets. Dividing the results by season reveals a higher RMSE during summer, about 50% higher than the annual RMSE. These numerical results underscore the practical significance of our findings.
TABLE 1 List of radiosonde stations with geographical coordinates and number of epochs in Iran between 2007 and 2019.
FIGURE 1 Location of radiosonde stations in Iran and mean values of Tm over 2007 to the end of 2019.
FIGURE 6 Tm root mean square error values in terms of height for four datasets and the GPT3 model. No height dependence is observed in any of the datasets or the GPT3 model. The blue line with rhombus dots for ERA5 (0.125), the red line with square dots for ERA5 (2.5), the grey line with triangle dots for ERAI, the yellow line with cross dots for NCEP and the green line with circular dots for the GPT3.
FIGURE 7 Variation of integrated water vapor with time for OIII station.
FIGURE 8 Steps of calculating and comparing integrated water vapor values for four datasets and the GPT3 model.
TABLE 3 IWV RMSE of the four datasets and the GPT3 model in general and divided by season.