Evaluation of Five Reanalysis Products With Radiosonde Observations Over the Central Taklimakan Desert During Summer

To provide guidance for the use of reanalysis data in the Central Taklimakan Desert (CTD), five reanalysis products are evaluated based on the radiosonde data obtained from two field experiments during summer for the first time in the CTD, including the European Center for Medium‐Range Weather Forecasts (ECMWF) Reanalysis version 5 (ERA5), ECMWF Reanalysis‐Interim (ERA‐Interim), Japanese 55‐years Reanalysis (JRA55), Modern‐Era Retrospective Analysis for Research and Applications version 2 (MERRA2), and the National Centers for Environmental Prediction‐Department of Energy Reanalysis version 2 (NCEP2). The results show that reanalysis temperature (T), specific humidity (Q), geopotential height (GPH), and wind field (U and V components) are consistent with the radiosonde observations in terms of the vertical distribution. In general, ERA5 has the best performance in the CTD during the study period, followed closely by ERA‐Interim. However, NCEP2 produces the largest error. The errors of all the reanalysis data show significant diurnal variations, and the diurnal variations differ from each other. Moreover, the results indicate that the reanalysis datasets have the largest deviation at 850 hPa (near the ground), which means that in the desert region complex interactions may exist between the land surface and the atmosphere. Therefore, more attention should be paid to the description of complex interactions between land and atmosphere over the moving‐sand desert region in the numerical models.

by data assimilation technique and predictive model (Kalnay et al., 1996;Lin et al., 2014). In view of these inherent uncertainties, it is highly imperative to assess the reanalysis data before weather, climate, and environmental studies (Hodges et al., 2011;Hu et al., 2019;Lin et al., 2014).
Several previous studies have shown that reanalysis datasets have multiple degrees of errors for meteorological variables, especially in regions with harsh environmental conditions and insufficient observation data. For instance, based on the comparison of observations at weather stations and reanalysis data (MERRA, NCEP/NCAR-1, CFSR, ERA-40, ERA-Interim, and GLDAS), Wang and Zeng (2012) pointed out that CFSR has the best overall performance over the Tibetan Plateau, while NCEP/NCAR-1 ranks at the last place. By evaluating reanalysis temperature and precipitation amount of the National Centers for Environmental Prediction-National Center for Atmospheric Research (NCEP-NCAR) (R1) and NCEP-Department of Energy (NCEP-DOE) (R2) based on the surface weather observations, R1 and R2 are found to perform worst for precipitation amount than temperature over the Western Himalayas by Singh et al. (2017). Besides, they found that R1 and R2 perform better for temperature on stations at higher elevations than at lower elevations. Also, Yu et al. (2010) used both surface weather stations and radiosonde observations to verify the performance of the reanalysis products of ECMWF Reanalysis-40 years (ERA-40) and NCEP-NCAR in the Antarctic and found that both products capture the intraseasonal variability of temperature and pressure during winter over the Antarctic.
The quality of reanalysis data in the CTD given by the global model remains uncleardue to the less observational data available there. In this study, the radiosonde data obtained from two experimental field observations in the CTD is used to evaluate the performance of five widely used reanalysis products, including the HUANG ET AL.  , and elevation (contours) over the Taklimakan Desert and its surrounding areas. Operational radiosonde and surface stations are marked with black circles and asterisks, respectively. Circles with crosses denote sites where both radiosonde and surface observations are available. The black dot represents the radiosonde site (83.63°E, 39.04°N, 1,099.3 m) in the central Taklimakan Desert. The closest four grid points, which are used to interpolate to the radiosonde site by the bilinear interpolation method, surrounding the radiosonde site from ERA5, ERA-Interim, JRA55, MERRA2, and NCEP2, are denoted with respective boxes of red, green, blue, black, and purple lines.
European Center for Medium-Range Weather Forecasts (ECMWF) Reanalysis version 5 (ERA5), ECMWF Reanalysis-Interim (ERA-Interim), the Japanese 55-years Reanalysis (JRA55), the Modern-Era Retrospective Analysis for Research and Applications version 2 (MERRA2), and the National Centers for Environmental Prediction-Department of Energy Reanalysis version 2 (NCEP2). This study aims to lift the veil on the accuracy of the reanalysis data of CTD, and might provide a guidance to increase the accuracy of numerical model forecasts in this region where only sparse observations are available due to the complicated underlying boundary conditions. The remainder of this paper is organized as follows. Section 2 gives an overview of the observations and reanalysis datasets, as well as a brief description of the research method. Section 3 provides detailed results of verification, and the Taylor diagrams are applied to compare the reanalysis data in Section 4. A comparison of diurnal variations among the five reanalysis products is given in Section 5. Finally, summary and discussion are stated in Section 6.

Observations and Reanalysis Datasets
Tazhong station (83.63°E, 39.04°N, 1,099.3 m) marked with a black dot in Figure 1 is located in the CTD. The radiosonde data were collected at the Tazhong station during the periods from June 25 to July 3, 2015, and the entire July in 2016, using the global positioning system (GPS) based radiosonde. During the balloon ascending, a digital radiosonde equipped with a GPS and a receiver at the ground was set up to receive and record the data. The ground receiving system (CFL-GNSS-JS) was developed by Beijing Changfeng Microelectronics Technology Company, which has been widely used in China. Four times high-resolution vertical profiles of meteorological variables were obtained every day at 0000, 0600, 1200, and 1800 UTC. One of the advantages of the radiosonde measurement is the high vertical resolution, which is achieved by high frequency (1-s per data set) data acquisition during the balloon ascending. In other words, the vertical resolution of radiosonde data is ∼5-8 m (e.g., Guo et al., 2016;Zhang et al., 2018).
In this study, the instantaneous meteorological variables temperature (T), specific humidity (Q), geopotential height (GPH), zonal wind (U), and meridional wind (V) at the specific levels (i.e., 850,800,750,700,650,600,550,500,450,400,350,300,250,200,150, and 100 hPa) are selected to evaluate reanalysis products.
It should be noted that the radiosonde data used in this study was not assimilated into any of the reanalysis data. ERA5, ERA-Interim, JRA55, MERRA2, and NCEP2 are chosen to be compared with radiosonde data since they are new-generation reanalysis products. There are a variety of differences in these reanalysis data sets, in terms of temporal and spatial resolution, selected data assimilation schemes, physical parameterizations, numerical schemes, and so on (Decker et al., 2012). All the reanalysis data from 850 hPa to 100 hPa with interval of 50 hPa are evaluated, except for NCEP2 with 850,700,600,500,400,300,250,200,150, and 100 hPa levels available. Corresponding to the 4-times daily radiosonde data, the instantaneous values of the reanalysis datasets with intervals of 6 h are evaluated. The specifications of the reanalysis datasets used in this study are listed in Table 1.

Methods
In order to compare the reanalysis data on linear grids with the radiosonde data at the observational site, the reanalysis data are interpolated to the radiosonde site at the same time and same pressure levels, using the bilinear interpolation method. The closest four model grid points used to interpolate for all reanalysis are marked in Figure 1. It is worth noting that there is no obvious terrain in the CTD, while it is common that the height of sand-dunes are from several meters to tens of meters. Except the NCEP2 has a lower horizontal resolution, there are quite small differences between the observational site and the model terrain (Table 1). The statistical quantities error (D), mean error (ME), root mean square error (RMSE), correlation coefficient (R), and standard deviation (STD) are used to evaluate the performance of the chosen reanalysis datasets. An error of reanalysis data set (D i ) is defined as the difference between the reanalysis and the observation, which is given by F i and O i stand for the meteorological element of reanalysis products and observations at each time, respectively. ME, RMSE, R, and STD are calculated by the following formulas, which are where N (= 158) is the total number of the observations in this study, and F and O stand for the average value of meteorological element of the reanalysis products and observations respectively.

Overall Root Mean Square Errors (RMSEs) and Mean Errors (MEs)
The vertical profiles of the RMSE for each reanalysis product verifying against radiosonde observations are given in Figure 2. It is obvious that the largest RMSE of T occurs at 850 hPa level which is near the ground in the CTD. At this level, ERA5 has the smallest RMSE, followed by MERRA2. While NCEP2 shows the largest HUANG ET AL.  (Kanamitsu, 2001) Approximate longitude grid spacing is reported in degrees for models with regular Gaussian grids (Fn) and in kilometers for models with reduced Gaussian grids (Nn). Wavenumber truncations for models with Gaussian grids are shown in parentheses. Terrain refers to the topography height above the sea level, which is obtained from the four closest grid points surrounding the radiosonde site using the bilinear interpolation method.   Figure 3. The secondary maximum RMSE of Q appears at around 600 hPa with a temperature near 0°C, while the one of T occurs at 300 hPa level. The RMSEs peak of T at lower (upper) levels is related to the complicated interactions with land (tropopause). The RMSEs peak value of Q appears at the bottom with the same reason as for T, while the second RMSEs peak of Q may result from the complicated cloud microphysical processes near 0°C. The vertical distributions of GPH show the same results as T. For the vertical distributions of U/V wind field, it is different from the distribution of T, its peak value appears at three levels (850 hPa, 600-500 hPa and 300 hPa) respectively. More specifically, for U, the largest RMSE for ERA5, MERRA2, and HUANG ET AL.
10.1029/2021EA001707 5 of 14 JRA55 appear at 300 hPa not 850 hPa. Compared with others, ERA5 performs the best for U/V winds near the ground, and NCEP2 displays the largest RMSE.
Broadly speaking, the RMSEs of T, Q, GPH, U, and V decrease with increasing height, with the largest value of RMSE occurred near the ground, which means that in the desert region complicated interactions may exist between the land and the atmosphere, whereas the models might lack a full description of the mechanisms involving thermal and dynamical processes near the ground (Santanello et al., 2018). Except for Q, which has very small values at the levels above 150 hPa, RMSEs of other variables increase rapidly at those levels. Around 300 hPa, T, U, and V have their secondary largest RMSE. The large RMSEs at 300 hPa might be related to the interactions between the troposphere and the stratosphere (Smith & Kushner, 2012).
Regarding to the vertical distributions of RMSE of different reanalysis, ERA5 has the smallest RMSE, while NCEP2 has the largest one with an exception of Q. Thus, with an overview, ERA5 has the best performance ability over the CTD during the study period, followed closely by ERA-Interim. With respect to the HUANG ET AL.
10.1029/2021EA001707 6 of 14 correlation coefficient (Figure 3), T has the highest R among the variables. Except for NCEP2, T has high R with values over 0.90 below 600 hPa. For ERA5, R of T is larger than 0.90 except for the one at 500 hPa level. Compared with other meteorological variables, Q has the worst performance in terms of R, especially above 300 hPa levels. It is found out that the reanalysis products show good performances in T, while it has low qualities in Q. The results are in highly agreement with the results given by Bao and Zhang (2013) over the Tibetan Plateau. On the other hand, the values of R of GPH are lower than 0.50 below 700 hPa in all the products, implying complicated interactions between land and atmosphere occur in the lower levels near the ground. Interestingly, U wind shows the lowest R near the ground, while V shows the lowest one at 700 hPa. This might be related to the dominant easterly wind in the lower levels due to the topography.
In terms of MEs (Figure 4), an atmospheric variable can be overestimated or underestimated with irregular patterns among the reanalysis products. In general, ERA5 has the lowest ME, followed closely by ERA-Interim. At 850 hPa level, T is underestimated in JRA55 and NCEP2, while it is overestimated in ERA5, ERA-Interim, and MERRA2. It is worth noting that the T cold bias near the ground of NCEP2 is likely to HUANG ET AL.  be related to the large terrain difference between model and observation (Table 1) due to the coarse horizontal resolution of NCEP2. Q is overestimated by NCEP2, while underestimated by others at most levels. Besides, it should be emphasized that ERA5 has much less ME near the ground (850 hPa) compared to others. Concerning MEs of GPH, large errors occur at 850 hPa and 100 hPa in ERA5, JRA55, and MERRA2. Besides, GPH is overestimated at lower layers while underestimated at upper layers by ERA5, JRA55, and MERRA2. For the MEs of GPH in ERA-Interim, it is almost positive at all the levels except for 100 hPa. For NCEP2, it has positive MEs for GPH at each level, and its largest ME appears at 850 hPa. MEs of U(V) are underestimated at the upper level of 250 hPa in all the products except for V wind in MERRA2. In the lower levels below 500 hPa, the NCEP2 has negative MEs of U(V), while the other reanalysis products mostly have positive values.
There exist great differences among the reanalysis products. The great differences among the reanalysis products may be caused by several reasons (e.g., Chen et al., 2014;Fujiwara et al., 2017;Ma et al., 2008;, one of which is the different model resolution. ERA5 has the finest horizontal and vertical resolution, and thus gives the best capture capability. The result is consistent with Mass et al. (2002) that decreasing the grid spacing to ∼10 km or less normally produces more realistic mesoscale structures, with particular benefits for orographically and diurnally driven flows. Besides, Wang and Zeng (2012) pointed out that various numbers and types of observations assimilated in those reanalysis products is another reason for the difference. In addition, some of the differences are led by the data assimilation and numerical model systems. For instance, Tiedtke (1993) cumulus scheme is utilized in ERA-Interim, while an updated version of the Tiedtke scheme with some modifications is applied in ERA5 (Fujiwara et al., 2017). Figure 5 shows the comparisons of Taylor diagrams (Taylor, 2001), which derived from the R and STD of T, Q, GPH, U, and V. In general, ERA5 (green dots) is the closest to the "OBS" compared with others datasets, which means that ERA5 has the highest R and the smallest RMSE. In the contrast, NCEP2 lies the furthest from the "OBS," with relatively poor performance. Besides, ERA5 captures the variations (STD) better than other datasets, especially in the distribution of T. For instance, similar STDs as the observations can be viewed at 400 and 150 hPa.

Comparisons Among the Reanalysis Datasets
Regarding to the variables, all reanalysis models performed well in the T field and its variations (Figure 5a). However, obvious differences are visible in Q (Figure 5b). The reanalyzes show reasonable patterns in GPH, while all products show larger variations than the observations. The results are consistent with the vertical distribution of RMSE, R, and ME in Figures 2-4. As an alternative to observation data, ERA5 is most suitable compared to the other reanalysis products, followed closely by ERA-Interim. Moreover, the variable T has the highest credibility. Figure 6 shows the diurnal variation of the RMSEs for T, Q, GPH, U, and V in each of the reanalysis products. The RMSEs of all the meteorological elements show significant diurnal variation. Concretely speaking, ERA5 has the smallest diurnal variation, compared with other reanalysis products. Taking T as an example, the diurnal variation of ERA5 is inapparent, and the peaks of RMSEs of other reanalysis datasets appear at lower levels, in which ERA-Interim and NCEP2 have peaks of RMSEs at 0000 local standard time (LST, UTC + 6), and JRA55 and MERRA2 occurring at 1200 LST. Obviously, the RMSEs diurnal variation for Q appears below 400 hPa. However, a slight diurnal variation can be seen in upper levels due to the low RMSEs. Note again that the low RMSEs result from very small values of Q by taking R into account, not the good performance of RMSEs (Figure 3b). The peaks of Q occur at 0000 LST of ERA5, ERA-Interim, and MERRA2, while those of JRA55 and NCEP2 occur at 1800 LST and 1200 LST respectively. All reanalysis products have large RMSEs for GPH at 1200 LST. The errors may be related to the strong radiation in the middle of the day, which makes the atmospheric boundary layer in an unstable state and thus affects the results of the reanalysis (Stull, 1988). For U wind, large RMSEs occur at 0000 LST or 0600 LST, although the peaks appear at different levels. For instance, ERA5 has the largest RMSEs at 300 hPa, while NCEP2  has a peak at 850 hPa with a value of 6.75 m s −1 . As for V, except for ERA5, all the peaks of other reanalysis products occur at 0000 LST on 850 hPa.

Diurnal Variations of RMSEs and MEs
The diurnal variation of ME is shown in Figure 7. Note that the levels and time of the ME peaks are consistent with that of the RMSE peaks, indicating that the diurnal variation of RMSE is greatly dependent on the HUANG ET AL.
10.1029/2021EA001707 11 of 14 diurnal variation of ME. In view of the ME variations, irregular errors (overestimate or underestimate) of atmospheric variables are shown in the reanalysis products, indicating that the accuracy of reanalysis may be influenced by different factors. It is worth noting that the GPH of all the products are overestimated at 0000, 0600, and 1200 LST on almost all levels except for JRA55 and MERRA2.
Both RMSEs and MEs show significant diurnal variations in all variables, and the diurnal variation of RMSE is greatly dependent on the diurnal variation of ME. In summary, ERA5 has the smallest variation, while NCEP2 shows the most significant diurnal variation. Moreover, the large errors of GPH near the surface in the middle of the day (1200 LST) suggest that all the reanalysis models are insufficient to simulate the boundary layer processes with strong turbulence over CTD (Wang, Xu, et al., 2019). Therefore, more attention should be paid to investigate the characteristics in the boundary layer over CTD and to improve the boundary layer parameterization scheme of reanalysis models (Couvreux et al., 2011).

Conclusions and Discussions
The radiosonde observational datasets (including T, RH, Q, GPH, U, and V) from the two field experiments carried out at Tazhong station in CTD were utilized to evaluate the performance of five reanalysis products, including ERA5, ERA-Interim, JRA55, MERRA2, and NCEP2. By comparison, in terms of RMSE, the accuracy of ERA5 stands in the first place, followed by ERA-Interim, JRA55, MERRA2, while NCEP2 has the largest RMSE. The same results can also be detected from the comparisons in Taylor diagrams ( Figure 5). ERA5 (NCEP2) has the finest (coarsest) horizontal and vertical resolution, suggesting that the model resolution plays a significant influence on the quality of the reanalysis data.
Concerned with meteorological variables, the reanalysis products provide the highest credibility of the T field in CTD. However, all the reanalysis products show large errors (RMSE and ME) on the low level of 850 hPa (near the ground). The same pattern can be found in the vertical distributions of Q, GPH, and wind field, which means that in the desert region complex interactions may exist between the land surface and the atmosphere. To some extent, that also means the land-atmosphere processes were not well represented in these reanalysis model systems. In view of this, developing an appropriate representation approach by taking desert surface characteristics into account in future reanalysis works should be considered. Besides, the RMSEs of T, Q, GPH, U, and V decrease with the increasing height. The second-largest errors of T, U, and V occur at the higher levels of 300 hPa, indicating that the interactions between the troposphere and stratosphere have a strong influence on the accuracy of the reanalysis products. Particularly, the secondary maximum RMSE of Q occurs at the lower level at 600 hPa, which may result from the complicated cold cloud microphysical processes near 0°C. RMSEs and MEs show obvious diurnal variations in the meteorological variables. By comparison, ERA5 with the highest resolution shows the smallest variation, while NCEP2 has the most significant diurnal variation. The large error near the ground of GPH in the middle of the day (1200 LST) means that the reanalysis models lack sufficient capacity to describe the boundary layer processes with strong turbulence in CTD.
Finally, we would like to mention several limitations of this statistical study. The observations were obtained in summer (June and July), thus the results only indicate the performance of these reanalysis data during summer. We infer that the largest error near the ground is related to the complicated underlying surface of the desert and interactions between the ground and the atmosphere. Detailed comparisons among the land models used in the reanalysis systems might give some clues to explain the differences between the reanalysis datasets. However, it is a challenge to launch a comparison among the land models. The shortcoming should be kept in mind in the future comprehensive observations and further studies are required. Last but not the least, although there are large differences among the reanalysis products, it is troublesome to determine what is the main factor resulting in the differences because it could be due to model configuration (e.g., resolution), model framework (e.g., physical parameterizations), assimilation system, computational techniques, etc. Nevertheless, this study proves the performance of the most widely used reanalysis products in the context of an evaluation of the observations in summer of CTD. The results may be helpful for the reanalysis data application and might provide a guide for launching comprehensive field experiments for atmospheric sciences researches in this region.