Comparing the performance of high-resolution global precipitation products across topographic and climatic gradients of Central Asia

Accurate and reliable precipitation data with high spatial and temporal resolution are essential in studying climate variability, water resources management, and hydrological forecasting. A range of global precipitation data are available to this end, but how well these capture actual precipitation remains unknown, particularly for mountain regions where ground stations are sparse. We examined the performance of three global high-resolution precipitation products for capturing precipitation over Central Asia, a hotspot of climate change, where reliable precipitation data are particularly scarce. Specifically, we evaluated MSWEP, CHIRPS, and GSMAP against independent gauging stations for the period 1985-2015. Our results show that MSWEP and CHIRPS outperformed GSMAP for wetter periods (i.e., winter and spring) and wetter locations (150-600 mm·year⁻¹), lowlands, and mid-altitudes (0


1 | INTRODUCTION
The spatial and temporal variability of precipitation shapes hydrological cycles (Michaelides et al., 2009). Climate change alters these cycles through changes in precipitation frequency, intensity, and amount and by affecting evapotranspiration patterns (Trenberth, 2011; Tan et al., 2020). These changes in turn impact freshwater availability for agriculture, hydropower, and socioeconomic development (Tan et al., 2020). Most regions with rainfed or irrigated crops depend on precipitation totals during the peak months of plant growth to meet water demand. Accurate and reliable precipitation records with high spatial and temporal resolutions are therefore essential to study climate variability, the management of water resources, and hydrological forecasting (Sun et al., 2018).
Satellite precipitation sensors are currently the only instruments that can provide near real-time global coverage of precipitation. Estimates come from infrared sensors on geostationary satellites, which have a high sampling frequency, and from microwave sensors on polar-orbiting low Earth-orbit satellites, which have a lower temporal resolution (Huffman et al., 2007). Satellite-based estimates of precipitation are increasingly used to complement ground station observations, which are limited in areal coverage and density, particularly in inaccessible regions (e.g., mountainous areas), sparsely populated areas, and developing countries (Zambrano-Bigiarini et al., 2017; Rivera et al., 2018; Sun et al., 2018). The lack of station data in mountainous regions is worrisome because precipitation patterns in the mountains are crucial to assess changes in regional climate and in the cryosphere, which directly affect water availability in downstream regions (Unger-Shayesteh et al., 2013; Immerzeel et al., 2020; Viviroli et al., 2020).
Merging satellite data with gauge measurements from ground stations and reanalysis estimates can improve the accuracy of global precipitation datasets (Tan et al., 2020). Reanalysis-based estimates merge atmospheric measurements and climatic models encompassing physical and dynamical processes to produce consistent, accurate, and continuous meteorological data (Sun et al., 2018; Tan et al., 2020). However, varying availability of ground observations for calibrating the satellite algorithms and reanalysis estimates can compromise the quality of the merged global precipitation datasets (Hu et al., 2018; Sun et al., 2018; Zandler et al., 2019). Therefore, quantitative validation of these global precipitation datasets against independent ground observations is critical to determine the accuracy and uncertainty of the global products at local and regional scales because misestimations can arise from sampling, instrumental (e.g., sensor observations), and algorithmic errors (Nijssen, 2004; Ebert, 2007; Hu et al., 2018). Validation of global precipitation data is hence a keystone for better understanding the impact of climate change on regional hydrological cycles.
Station data for Central Asia are very scarce, especially at higher elevations and particularly since the dilapidation of much of the meteorological infrastructure following the collapse of the Soviet Union in 1991 and the independence of the Central Asian republics (Schiemann et al., 2008; Unger-Shayesteh et al., 2013). This is unfortunate because precipitation, glaciers, and snowmelt dominate the hydrological budget in the semiarid continental climate of Central Asia, where water fluxes in the mountainous areas play a crucial role in downstream hydrology and water availability (Schär et al., 2004; Mannig et al., 2013; Maussion et al., 2013). The region's economy and ecology heavily rely on water from the two main endorheic rivers, the Amu Darya and Syr Darya, which originate in the headwater catchments of the Pamir and Tien Shan mountains, respectively (Schär et al., 2004; Unger-Shayesteh et al., 2013). In addition, the region is a hotspot of climate change with warming rates of up to 0.3 °C per decade during the past half century (Teixeira et al., 2013; Reyer et al., 2017; Peng et al., 2019).
Precipitation data sourced from global precipitation products are paramount for data-scarce or ungauged regions, such as Central Asia. Previous studies have evaluated precipitation products for the region with varied and sometimes contrasting results (Table S1, Supporting Information). Several studies have suggested that the gauge-based products from the Global Precipitation Climatology Centre (GPCC), with spatial resolutions ranging from 0.25° to 1°, were the most reliable precipitation data for the region but underestimated precipitation in the mountains (Malsy et al., 2014; Hu et al., 2018). In the Amu Darya basin, however, the gauge-based Climate Prediction Center (CPC) (0.5°) dataset performed best (Salehie et al., 2021). In the Pamir mountains, the reanalysis product Modern-Era Retrospective analysis for Research and Applications (MERRA, 0.5°) stood out, although its performance deteriorated strongly for the period 1998-2012 due to the decline of station data availability (Zandler et al., 2019).
The abovementioned gauge-based and reanalysis products are not suitable for studies at regional and catchment scales in Central Asia because their spatial resolution is too coarse to capture precipitation gradients in the complex topography of the region (Hellwig et al., 2018; Henn et al., 2018). Here we evaluate global or near-global precipitation products with a spatial resolution higher than 12 × 12 km and for the period 1981-2015 for the Pamir and Tien Shan mountains and adjacent lowlands of Central Asia. We only selected gauge-corrected products with proven reliability at local and regional scales with a sufficiently long time series available to support analysing climatic trends and variability (see Table S1 and references therein). Finally, we only considered products that are still operational. These criteria resulted in the selection of the Climate Hazards Group InfraRed Precipitation with Station Data (CHIRPS version 2, 0.05°), the Multi-Source Weighted-Ensemble Precipitation (MSWEP, 0.1°) (Beck et al., 2017a), and the Global Satellite Mapping of Precipitation (GSMAP, 0.1°) (Ushio et al., 2009).
The gauge-corrected versions of GSMAP and MSWEP products were previously considered the best-performing high-resolution precipitation data for the region (Guo et al., 2015; Guo et al., 2017; Lu et al., 2021) and globally (Beck et al., 2017b). The gauge-calibrated CHIRPS product was ranked third in Central Asia when compared to six coarser precipitation products (Salehie et al., 2021). However, the accuracy of MSWEP and CHIRPS has to date not been assessed for Central Asia with data from meteorological stations that were not used for gauge correction of the investigated products.
Here we aim to (a) identify the strengths and limitations of the three global precipitation products at daily, monthly, seasonal, and annual timescales, (b) determine the effect of topography and climate regimes on the performance of the precipitation products, (c) quantify the accuracy of the products for different precipitation intensities, and (d) based on these evaluations, propose which precipitation product is most appropriate for subsequent studies. Our analysis hence facilitates informed decisions for assessing climate variability, hydrological and agricultural studies, and water management in this heterogeneous and data-scarce region.
FIGURE 1 Study area and location of the 30 precipitation stations used for validation. Bar plots represent the annual precipitation regimes (section 4.4) of the clusters shown in the map: cluster 1 (yellow) is characterized by winter and spring precipitation and long, dry summers and autumns; cluster 2 (blue) has winter and spring precipitation and a short, dry summer period; and summer precipitation dominates in cluster 3 (red) [Colour figure can be viewed at wileyonlinelibrary.com]

2 | STUDY AREA
The study area covers the Tien Shan and Pamir mountains, including the adjacent semi-arid lowlands of Kazakhstan, Kyrgyzstan, Tajikistan, Turkmenistan, and Uzbekistan (Figure 1). In the western and northern Tien Shan and Pamir mountains, most precipitation occurs during the winter and spring seasons (November-March) and falls primarily as snow (Barlow and Tippett, 2008; Sorg et al., 2012). In contrast, parts of central and eastern Tien Shan and eastern Pamir receive most precipitation during summer months (Apel et al., 2018). Precipitation amounts in the region range from 50 to 1,000 mm annually, primarily determined by orographic uplift and midlatitude westerly cyclones (Mariotti, 2007; Barlow and Tippett, 2008).
The extratropical westerlies, which vary on large scales and transport moisture from the Atlantic Ocean, the Mediterranean and Caspian Sea, and the Persian Gulf, are the major moisture source throughout the year in Central Asia; the central and eastern Tien Shan and southeastern Pamir are also affected by the Indian Monsoon in summer (Böhner, 2006; Meier et al., 2013). Moisture fluxes from the Arabian Sea and tropical Africa during warm El Niño-Southern Oscillation events cause higher precipitation in autumn and spring in southwestern Central Asia (Mariotti, 2007).

3 | DATA

3.1 | Precipitation products

3.1.1 | CHIRPS
CHIRPS provides daily blended gauge-satellite precipitation estimates covering most global land regions (50°N-50°S) with a resolution of 0.05° (about 5 km at the equator) from 1981 until present and with a low latency (updated roughly every 2 days, with a stable product released every 3 weeks). CHIRPS builds on precipitation estimates derived from observations of infrared cold cloud duration, in which cold and bright clouds are related to convection and therefore rain (Sun et al., 2018). CHIRPS incorporates station data from public data streams and private archives and uses reanalysis-based estimates of the Coupled Forecast System (CFS) to temporally disaggregate 5-day estimates to daily estimates and to fill in when thermal infrared observations are missing (Shukla et al., 2014; Funk et al., 2015). Calibration of CHIRPS involves three main components: (a) the Climate Hazards group Precipitation climatology (CHPclim); (b) the satellite-only Climate Hazards group Infrared Precipitation (CHIRP); and (c) the station-blending procedure. We downloaded the CHIRPS data from https://data.chc.ucsb.edu/products/CHIRPS-2.0/.

3.1.2 | GSMAP
The GSMAP is a multisatellite algorithm developed by the Japan Science and Technology Agency (Okamoto et al., 2005; Kubota et al., 2007). The algorithm follows three main steps: (a) retrieval of precipitation rate from passive microwave data (precipitation-sized particles such as ice content are detected through clouds), provided by the CPC using a Kalman filter approach (Ushio et al., 2009; Shige et al., 2013; Yamamoto et al., 2017); (b) propagation of the estimated precipitation rates using a backward- and forward-morphing technique (Joyce et al., 2004); and (c) refinement of precipitation data based on the relationship between the infrared brightness temperature and surface precipitation rates. GSMAP has a spatial resolution of 0.1° (about 11 km at the equator) and near-global coverage (60°N-60°S). It provides hourly averaged rainfall (mm·hr⁻¹). We used daily precipitation values (mm·day⁻¹) of the GSMAP_Gauge_NRT (near real-time with gauge-calibration using the NOAA CPC Unified Gauge-Based Analysis of Global Precipitation dataset, 0.5°) that has the longest record, starting in 2000 up to present day. We downloaded the GSMAP data from https://sharaku.eorc.jaxa.jp/GSMaP/.
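Where daily totals have to be derived from hourly mean rates rather than obtained as a ready-made daily product, the conversion is a simple accumulation. The following is a minimal sketch and not the processing used in this study; the function name `daily_total_mm`, the `min_valid_hours` threshold, and the rescaling of days with missing hours are illustrative assumptions.

```python
from typing import Optional, Sequence


def daily_total_mm(hourly_rates_mm_per_hr: Sequence[Optional[float]],
                   min_valid_hours: int = 24) -> Optional[float]:
    """Accumulate 24 hourly mean rates (mm/hr) into one daily total (mm/day).

    Missing hours are passed as None. If fewer than `min_valid_hours`
    values are present the day is returned as missing (None); otherwise
    the total of the valid hours is rescaled to a full 24-hr day
    (an assumption, not a documented GSMAP procedure).
    """
    if len(hourly_rates_mm_per_hr) != 24:
        raise ValueError("expected exactly 24 hourly values")
    valid = [r for r in hourly_rates_mm_per_hr if r is not None]
    if len(valid) < min_valid_hours:
        return None
    # Each hourly mean rate applies for 1 hr, so the sum is already in mm.
    return sum(valid) * 24.0 / len(valid)
```

With the default threshold, a day with any missing hour is discarded; lowering `min_valid_hours` trades completeness for a small extrapolation error.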

3.1.3 | MSWEP
The MSWEP precipitation dataset provides 3-hourly and daily temporal resolution at 0.1°-0.25° spatial resolutions from 1979 to near present on a global scale (Beck et al., 2017a; 2017b; Beck et al., 2019). It merges gauge observations, satellite, and reanalysis estimates based on timescale and location (Beck et al., 2019). The weight assigned to the gauge-based estimates is calculated from the gauge network density, and the weights assigned to the satellite- and reanalysis-based estimates are calculated from their comparative performance at surrounding gauges (Sun et al., 2018). We used the latest version of MSWEP (v2.8) with a spatial resolution of 0.1° (about 11 km at the equator). This dataset relies on the reanalysis ERA5, the Integrated Multi-satellitE Retrievals for GPM (IMERG) from the Global Precipitation Measurement satellite constellation, and the Gridded Satellite (GridSat) thermal infrared imagery, with GridSat only used prior to 2000. Unlike previous versions of MSWEP, this version does not correct underestimation over mountainous and snow-dominated regions in order to match rain gauge observations as closely as possible (Beck et al., 2021). We downloaded the MSWEP data from http://www.gloh2o.org/.

3.2 | Precipitation gauge data
We collected data for 30 stations within the five Central Asian countries for the period 1981-2015 (Figure 1), which were not used to correct the global precipitation products. The selection of these stations was based on the list of station names and locations used for the CHIRPS product and the main public gauge sources of MSWEP. We requested and collected the station data directly from local research and governmental institutions for validation; these data are not available in the open public archives that are used for gauge correction of the global datasets. For some stations, the calibration of the global precipitation products used only a share of the available time series (see Table S2); in these cases, we used only the remaining data for the validation. For GSMAP, we used only the data from the 27 stations that have data available for the period April 2000 through December 2015.

4.1 | Evaluation of precipitation products at different timescales
We evaluated all products at daily, monthly, seasonal, and annual timescales to understand their value for applications that require precipitation data of various temporal resolutions (e.g., hydrological forecasting, water resource management, and agricultural drought monitoring) (Tobin and Bennett, 2014; Funk et al., 2015). For the seasonal timescale, we used calendar seasons: December, January, February (DJF); March, April, May (MAM); June, July, August (JJA); and September, October, November (SON). We grouped stations by elevation bands, precipitation amount, and precipitation regime to evaluate the reliability of precipitation products in diverse environmental conditions and to determine the effect of topography and climate regimes on the performance of the products. Since different precipitation intensities challenge the accuracy of the precipitation estimates, we classified daily precipitation time series into dry spells (<1 mm·day⁻¹) and wet spells of various intensities (1-5, 5-20, 20-40, and >40

4.2 | Evaluation of precipitation products at different spatial scales
We performed a point-to-pixel analysis to compare the time series of precipitation gauge data to the corresponding pixel of each product (Thiemig et al., 2012; Zambrano-Bigiarini et al., 2017; Baez-Villanueva et al., 2018). To ensure a consistent comparison among the products, we upscaled CHIRPS to the coarser spatial resolution of MSWEP and GSMAP (i.e., 0.1°) using bilinear interpolation. To determine the effect of the upscaling, we performed the evaluation for both original and upscaled versions (hereafter termed CHIRPS upscaled).
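The point-to-pixel matching can be illustrated with a short sketch. It assumes a regular latitude-longitude grid with a known northern and western edge; the function names `pixel_index` and `pair_series`, the default grid origin, and the date-keyed dictionaries are hypothetical choices for illustration, not the workflow used in this study.

```python
import math
from typing import Dict, List, Tuple


def pixel_index(lat: float, lon: float, res_deg: float = 0.1,
                lat_north: float = 60.0,
                lon_west: float = -180.0) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains a station.

    Assumes row 0 starts at the northern edge `lat_north` and column 0
    at the western edge `lon_west` (hypothetical layout; the actual
    origin and orientation must be read from each product's metadata).
    """
    row = math.floor((lat_north - lat) / res_deg)
    col = math.floor((lon - lon_west) / res_deg)
    return row, col


def pair_series(gauge: Dict[str, float],
                pixel: Dict[str, float]) -> Tuple[List[float], List[float]]:
    """Keep only the dates present in both the gauge and the pixel series."""
    dates = sorted(set(gauge) & set(pixel))
    return [gauge[d] for d in dates], [pixel[d] for d in dates]
```

Pairing on common dates keeps gaps in either record from entering the comparison, which matters when only part of a station's time series is independent of the product's gauge correction.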

4.3 | Evaluation metrics
We evaluated the performance of the products for continuous precipitation time series and for discrete precipitation events. For precipitation time series, we used the modified Kling-Gupta efficiency (KGE') (Gupta et al., 2009; Kling et al., 2012) (Equation S1), a dimensionless metric that measures the ability of the precipitation products to reproduce temporal dynamics (correlation coefficient r) while preserving the volume (bias ratio β) and the distribution of precipitation (variability ratio γ). KGE', r, β, and γ values of 1 indicate a perfect agreement between the precipitation estimates from the product and the ground observations. KGE' values range from −∞ to 1. To determine the product accuracy, we used the mean absolute error (MAE) (Equation S2), which measures the average magnitude of the difference between the estimated and observed values (Ebert, 2007).
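As an illustration of these two metrics, the sketch below computes KGE' with its components and the MAE for paired series. Equations S1 and S2 are not reproduced in this text, so the code assumes the standard form of Kling et al. (2012), KGE' = 1 − sqrt((r − 1)² + (β − 1)² + (γ − 1)²), with β the ratio of means and γ the ratio of coefficients of variation; the function names and the use of population standard deviations are illustrative choices.

```python
import math
from typing import Sequence, Tuple


def kge_prime(sim: Sequence[float],
              obs: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (KGE', r, beta, gamma) for estimated vs. observed series."""
    n = len(obs)
    if len(sim) != n or n < 2:
        raise ValueError("series must have equal length >= 2")
    mu_s, mu_o = sum(sim) / n, sum(obs) / n
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    if 0.0 in (mu_s, mu_o, sd_s, sd_o):
        raise ValueError("KGE' is undefined for zero mean or zero variance")
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)                   # temporal dynamics
    beta = mu_s / mu_o                        # bias ratio (volume)
    gamma = (sd_s / mu_s) / (sd_o / mu_o)     # variability ratio
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return kge, r, beta, gamma


def mae(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error between estimated and observed values."""
    if len(sim) != len(obs) or not obs:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)
```

A product that doubles every observed value keeps r = 1 and γ = 1 but has β = 2, which gives KGE' = 0; this shows how a pure volume bias lowers the score even when the temporal dynamics are perfectly reproduced.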
We evaluated the ability of the tested precipitation products to reproduce observed dry and wet spells of various intensity groups (section 4.1) using a standard contingency table (Ebert, 2007) that summarizes the frequency of correct and false predictions. We used three categorical measures (Equation S3), namely the probability of detection (POD), the false alarm ratio (FAR), and the frequency bias (fBias), which quantify various aspects of performance: POD measures the fraction of observed events that were correctly identified ("hit rate"), FAR gives the fraction of estimated events that were not observed, and fBias is the ratio of the number of estimated events to the number of observed events (Ebert, 2007; Guo et al., 2017; Baez-Villanueva et al., 2018). Perfect scores are fBias = 1 (no bias), POD = 1 (detection of all events), and FAR = 0 (no events are incorrectly identified).
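The three categorical measures follow directly from the counts of hits, misses, and false alarms in the contingency table. The sketch below uses the standard definitions POD = H/(H + M), FAR = F/(H + F), and fBias = (H + F)/(H + M); Equation S3 is not reproduced in this text, and the function name `categorical_scores` and the half-open class bounds (lower ≤ p < upper) are assumptions for illustration.

```python
from typing import Sequence, Tuple


def categorical_scores(sim: Sequence[float], obs: Sequence[float],
                       lower: float,
                       upper: float = float("inf")) -> Tuple[float, float, float]:
    """Return (POD, FAR, fBias) for events with lower <= p < upper (mm/day)."""
    hits = misses = false_alarms = 0
    for s, o in zip(sim, obs):
        s_event = lower <= s < upper
        o_event = lower <= o < upper
        if s_event and o_event:
            hits += 1             # event estimated and observed
        elif o_event:
            misses += 1           # event observed but not estimated
        elif s_event:
            false_alarms += 1     # event estimated but not observed
    nan = float("nan")
    observed = hits + misses
    estimated = hits + false_alarms
    pod = hits / observed if observed else nan
    far = false_alarms / estimated if estimated else nan
    fbias = estimated / observed if observed else nan
    return pod, far, fbias
```

For example, `categorical_scores(sim, obs, 1, 5)` scores the 1-5 mm·day⁻¹ class, while `categorical_scores(sim, obs, 0, 1)` scores dry spells. An fBias of 1 can coexist with a POD below 1 when misses and false alarms cancel, which is why the three scores are read together.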

4.4 | Dominant precipitation regimes
To capture the heterogeneity of climatic conditions and precipitation seasonality, we determined the precipitation regime of each gauge and its corresponding grid location from the monthly sequence of Pardé coefficients (Pardé, 1933) (Equation S4), which are dimensionless and can be used for interregional comparisons of precipitation regimes. We used the k-means clustering algorithm (Lloyd, 1982), which minimizes the sum of squared distances between the gauging stations' values and the nearest cluster mean. In that way, we grouped the "shapes" of the seasonal precipitation regimes into clusters of similar shape (Weingartner et al., 2013). We selected the optimal number of clusters (k) graphically using the elbow method, which trades off the within-cluster sum of squared errors against the number of clusters (Thorndike, 1953; Zhang et al., 2016).
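The regime classification can be sketched in a few lines. Equation S4 is not reproduced in this text, so the code assumes the common definition of the Pardé coefficient as the long-term mean monthly precipitation divided by the mean of the 12 monthly means (the coefficients then sum to 12). The k-means routine is a plain Lloyd iteration with user-supplied starting centroids; both function names and the initialization are illustrative assumptions rather than the implementation used in this study.

```python
from typing import List, Sequence, Tuple


def parde_coefficients(monthly_means: Sequence[float]) -> List[float]:
    """Dimensionless regime shape: monthly mean / mean of monthly means."""
    if len(monthly_means) != 12:
        raise ValueError("expected 12 long-term monthly means")
    annual_mean = sum(monthly_means) / 12.0
    if annual_mean <= 0:
        raise ValueError("annual precipitation must be positive")
    return [p / annual_mean for p in monthly_means]


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]],
           init_centroids: Sequence[Sequence[float]],
           max_iter: int = 100) -> Tuple[List[int], List[List[float]], float]:
    """Lloyd's algorithm; returns (labels, centroids, within-cluster SSE)."""
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    centroids = [list(c) for c in init_centroids]
    k = len(centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda j, p=p: _sqdist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(_sqdist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse
```

For the elbow method, the routine would be run for k = 1, 2, 3, ... and the returned SSE plotted against k; the chosen k is where additional clusters stop reducing the SSE appreciably. Because k-means depends on its starting centroids, several initializations are usually compared.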

5.1 | Performance at different timescales
At the seasonal scale, all products performed worst in summer (Figure 2). The overall performance of GSMAP was lower compared to the other products at all timescales, except summer, and especially in winter (KGE' < 0). MSWEP, CHIRPS, and its upscaled version showed the best performance in winter. The second-best seasonal performance was spring for MSWEP and autumn for CHIRPS products. All products showed positive correlation coefficients (r) for all timescales (Figure S1). MSWEP best captured the temporal dynamics of precipitation in winter, followed by the two CHIRPS products. Moreover, MSWEP and both versions of CHIRPS performed similarly well in autumn and spring. In terms of bias values (β), CHIRPS and CHIRPS upscaled showed the best performance at all timescales, except for the summer season, when they slightly underestimated precipitation (Figure S2). GSMAP revealed higher overestimation in winter and underestimation in summer, whereas MSWEP overestimated precipitation in autumn and summer. Among all products, only GSMAP overestimated the variability (γ) of the observed precipitation, especially in winter (Figure S3), whereas the other products underestimated it at all timescales but particularly during the summer season.
FIGURE 2 KGE' between the precipitation products and precipitation gauge data for six different timescales. The vertical blue line indicates the optimum value for KGE'. From left to right and top to bottom: monthly, winter (December, January, February), spring (March, April, May), summer (June, July, August), and autumn (September, October, November).
CHIRPS and its upscaled version performed best at monthly timescales and performed similarly to MSWEP at the annual timescale. MSWEP showed the highest correlations but also the highest overestimation of precipitation at both scales. CHIRPS upscaled, followed by CHIRPS, performed best in terms of bias as well as in capturing the precipitation variability at monthly timescales, whereas MSWEP had a better performance in estimating the precipitation variability at the annual timescale.
Our results reveal distinct variations in MAE for different timescales (Figure 3). Both CHIRPS datasets had their lowest median MAE in autumn, GSMAP in summer, and MSWEP in spring and winter. The largest errors occurred for MSWEP in summer and for GSMAP in spring, winter, and autumn. CHIRPS products presented the lowest MAE at annual and monthly timescales.

5.2 | Spatial evaluation of the products' performance
The highest correlations for all products were found in the western part of the study area, where most of the precipitation occurs in winter and spring (Figure 4). GSMAP overestimated the variability of precipitation (γ) in the southern Pamir and western Tien Shan, while the other products, especially MSWEP, underestimated the variability in this area. All of the datasets overestimated precipitation (β) in the same region, with CHIRPS and its upscaled version performing slightly better. In the southeastern Pamir and western Tien Shan, the overall performance (measured with KGE') was poor for all products, but especially that of GSMAP. The precipitation products performed best in the stations located in the western Pamir and northern Tien Shan, with KGE' values of 0.92 for MSWEP, 0.87 for both CHIRPS products, and 0.83 for GSMAP. Overall, the MAE was lowest in the western region where MSWEP performed best, followed by GSMAP, CHIRPS upscaled, and CHIRPS. We found the highest MAE in the southwestern Tien Shan for all products, followed by the southern Pamir, especially for GSMAP and MSWEP.
To determine how topography and climate regime affected the products' performance, we grouped all stations by elevation bands, precipitation amount, and precipitation regime (Figure 5). All of the products, but especially GSMAP, had lower performance at high elevations (>3,000 m). MSWEP performed best in the lowlands (<1,000 m), while CHIRPS excelled at midaltitudes (2,000-3,000 m). The MSWEP and CHIRPS products had similar performance between 1,000 m and 2,000 m elevation.
FIGURE 3 Mean absolute error (MAE) in mm of various global precipitation datasets and precipitation gauge data for six different temporal scales. The vertical blue line indicates the optimum value for MAE.
Considering annual precipitation, MSWEP performed best for wetter locations (>300 mm·year⁻¹), while both CHIRPS products performed best for moderately wet locations (150-300 mm·year⁻¹) (Figure 5b). For drier locations (<150 mm·year⁻¹), all of the products failed to capture the precipitation dynamics. With respect to precipitation regimes, all of the products performed best in cluster 1, where most of the precipitation falls in winter and spring, with long dry summers and autumns (Figures 1 and 5c). MSWEP showed the highest median values, followed by CHIRPS. In cluster 2, with precipitation in winter and spring but short, dry summers, CHIRPS products performed better than the other products did. However, in cluster 3 (summer precipitation), all of the products performed poorly, and only MSWEP had positive median KGE' values. CHIRPS and its upscaled version were unable to capture the summer precipitation regime of most of the stations (Figure 6e,f), and MSWEP performed better in representing the region's climatology but overestimated the precipitation amounts.
FIGURE 4 KGE', its components (r, β, γ), and MAE derived from a monthly timescale. The colours for KGE', r, and MAE range from light yellow (very poor performance) to dark red (best performance). For β and γ, white colours represent their best performance, while underestimation is depicted in dark purple and overestimation is depicted in dark green. Grey colour identifies stations not available for product evaluation (i.e., GSMAP, data before 2000). The black outlines correspond to the countries (see Figure 1).
FIGURE 5 KGE' at a monthly timescale between the precipitation products and precipitation gauge data for different (a) elevation bands: 0-1,000 m, 1,000-2,000 m, 2,000-3,000 m, and 3,000-4,000 m; (b) mean annual precipitation (2000-2015); (c) precipitation regimes with corresponding clusters (cluster 1 has winter-spring precipitation, long dry summers; cluster 2 has winter-spring precipitation, short dry summers, and cluster 3 is characterized by summer precipitation). The vertical blue line indicates the optimum value for KGE'. N indicates the number of stations in each group.

5.3 | Evaluation of dry and wet spells
All of the precipitation products were able to accurately detect dry spells, with POD >0.6 (Figure 7). However, the ability to detect wet spells decreased with increasing precipitation intensity. MSWEP showed slightly better performance in terms of POD, except for the most intense precipitation class, for which CHIRPS upscaled performed better. The FAR values are consistent with the POD: all of the products identified dry spells very well, MSWEP performed slightly better although its performance decreased with precipitation intensity, and GSMAP performed better for moderate precipitation events. CHIRPS showed the closest agreement for all precipitation intensities in terms of fBias, with a slight overestimation (fBias >1) of light events and an underestimation (fBias <1) of heavy precipitation.

6 | DISCUSSION
We evaluated the performance of three precipitation products (CHIRPS, GSMAP, and MSWEP) with a spatial resolution higher than 12 km to capture local precipitation patterns over the heterogeneous topography and climate of Central Asia. To do so, we collected precipitation data from 30 independent gauging stations across the region. We accounted for elevation, precipitation regime, precipitation amount, intensity of wet spells, temporal dynamics, and different timescales. Overall, the products all performed best (a) at altitudes below 3,000 m; (b) in regions dominated by winter and spring precipitation; and (c) in wetter periods (i.e., winter and spring) and at locations with between 150 and 600 mm of precipitation per year. The products also accurately detected dry spells. We found key differences between the products. MSWEP was best at capturing precipitation dynamics, CHIRPS was best at representing the volume and distribution of precipitation over different timescales and locations, and GSMAP generally showed poorer performance. We also evaluated MSWEP v2.8 for the first time and found that, as compared to previous versions (the results of an earlier evaluation are presented in Figures S4 and S5), v2.8 improved the overall performance in the study region, especially for spring and winter, and did not overestimate precipitation as much. MSWEP and CHIRPS also captured precipitation dynamics well for the Tibetan Plateau (MSWEP v2) (Liu et al., 2019), Chile (MSWEP v1.1) (Zambrano-Bigiarini et al., 2017), western Africa (MSWEP v2.2) (Satgé et al., 2020), India (MSWEP v2.1) (Prakash, 2019), and the Bolivian Altiplano (MSWEP v2.1) (Satgé et al., 2019). Because CHIRPS is intended to support agricultural drought monitoring, its best performance was expected at around the wettest months for each location. This is supported by our results for Central Asia, where CHIRPS performed best for wetter periods and locations.
Similar to our results, GSMAP also performed comparatively poorly for the mountainous endorheic system of the Bolivian Altiplano (Satgé et al., 2019) and for western Africa (Satgé et al., 2020). The accuracy of GSMAP estimates may be affected by the lower number of stations in its CPC source data compared to MSWEP and CHIRPS (Satgé et al., 2020).
Despite MSWEP and CHIRPS having the best overall performance, we found some limitations. Both products performed worst in summer (overestimation of precipitation), during the driest period in areas where winter and spring precipitation dominate (clusters 1 and 2; Figure 1), and for stations in areas with precipitation below 150 mm·year⁻¹. Similar findings have been reported for CHIRPS in other drylands, such as northeast Brazil (Paredes-Trejo et al., 2017), Sub-Saharan Africa (Harrison et al., 2019), and Mainland China (Bai et al., 2018), as well as for MSWEP (v2.1) in northeast India (Prakash, 2019). The low performance in these areas arguably is due to the very low precipitation, in that a single incorrectly identified rainfall event could lead to 100% overestimation or underestimation (Zambrano-Bigiarini et al., 2017). Satellite-based precipitation estimates may be more suited to estimating convectional tropical rainfall patterns than the isolated, highly localized, and short-lived convective rainfall typical in semiarid to arid areas (Dinku et al., 2010; Thiemig et al., 2012; Beck et al., 2017b). Our findings support this claim. In dry regions, detecting precipitation is difficult because spaceborne sensors (e.g., microwave and infrared sensors) can miss the subcloud evaporation of raindrops or rainfall suppression by desert aerosols (e.g., mineral dust) and be affected by the land's surface properties, such as a hot background (e.g., upwelling microwave radiation) (Dinku et al., 2011; Beck et al., 2017b).

FIGURE 6 Interannual variation in precipitation and annual precipitation estimated by the precipitation products (dashed lines), as compared to the gauging stations (solid lines) for each cluster. Cluster 1 (winter/spring precipitation; long, dry summers); cluster 2 (winter/spring precipitation; short, dry summers), and cluster 3 (summer precipitation).
We found that the products overestimated precipitation at higher elevations (>3,000 m), possibly because the gauge network density in such areas is low (Harrison et al., 2019). In complex mountainous terrains, precipitation can be falsely detected due to long-lasting orographic clouds or by the contrast between the temperature and the emissivity of rough land surfaces of water and snow-covered areas, which satellite sensors can misinterpret as precipitation (Gebregiorgis and Hossain, 2013; Guo et al., 2015; Satgé et al., 2019). In addition, in global evaluations, the reanalyses were less accurate than the microwave- and infrared-based satellite datasets in the tropics, probably because of deficiencies in the subgrid convection parameterization schemes along with issues in the land surface parameterization, whereas they performed well in extratropical regions (Beck et al., 2017b). The coverage of the raw data sources, orographic correction, and interpolation techniques may compromise the accuracy of the precipitation products (Sun et al., 2018). Considering the high dependency of the global precipitation products on local gauge calibration, more efforts are needed to increase the accessibility of local observations in order to improve the products' quality and reliability for hydrological, agricultural, and climate studies.
All of the products performed worse in southeastern Pamir and Tien Shan, where precipitation peaks in summer. Generally, such poor performance among all of the products for summer precipitation can be related to challenges in capturing the orographic uplift of warm clouds, false detection of very cold high clouds as precipitating by infrared products, and microwave products missing warm precipitation from shallow clouds (Gebregiorgis and Hossain, 2013;Behrangi et al., 2014;Satgé et al., 2019). Moreover, for the reanalysis-based products (i.e., CHIRPS and MSWEP), poor summer performance might additionally have resulted from an unrealistic northward displacement of the monsoon cycle (Di Giuseppe et al., 2013) and from the fact that atmospheric models in mid-latitudes can more reliably predict winter precipitation associated with synoptic systems such as fronts than they can summer precipitation, which is more often associated with convective systems such as thunderstorms (Haiden et al., 2012;Zhu et al., 2014).
Although the examined precipitation products were able to detect dry spells accurately, their performance decreased for wet spells as precipitation intensity increased. This lower performance for higher intensities can be associated with local storm events with a spatial extent smaller than the satellites' spatial resolution (Thiemig et al., 2012). The weaker detection ability of CHIRPS might be related to its fixed threshold for detecting precipitation from cloud temperatures, which might not be appropriate for this region, as well as its dependency on the 0.25° TRMM training data, which contributes to the false detection of rainfall events when averaged over larger areas (Dinku et al., 2010;Toté et al., 2015;Paredes-Trejo et al., 2017;Dinku et al., 2018). The reported duplication and inconsistency in the gauge sources used to calibrate CHIRPS could be additional sources of uncertainty (Rivera et al., 2018).
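How detection skill can change with the intensity threshold may be sketched with two common categorical scores, the probability of detection (POD) and the false alarm ratio (FAR). This is a minimal illustration only: the daily values, the thresholds, and the function name `detection_scores` are assumptions introduced here, and the scores are not necessarily those computed in this study.

```python
def detection_scores(observed, estimated, threshold):
    """Return (POD, FAR) for events defined as precipitation >= threshold (mm/day)."""
    hits = misses = false_alarms = 0
    for obs, est in zip(observed, estimated):
        obs_event = obs >= threshold
        est_event = est >= threshold
        if obs_event and est_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif est_event:
            false_alarms += 1
    # Scores are undefined (NaN) when their denominator is zero.
    pod = hits / (hits + misses) if (hits + misses) else float("nan")
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else float("nan")
    return pod, far


# Hypothetical daily precipitation in mm/day (assumed values, not study data).
gauge = [0.0, 0.5, 3.0, 12.0, 25.0, 0.0, 8.0, 30.0]
product = [0.0, 1.2, 2.0, 6.0, 9.0, 0.4, 10.0, 22.0]

light_pod, light_far = detection_scores(gauge, product, threshold=1.0)
heavy_pod, heavy_far = detection_scores(gauge, product, threshold=10.0)

print(f"Threshold  1 mm/day: POD={light_pod:.2f}, FAR={light_far:.2f}")
print(f"Threshold 10 mm/day: POD={heavy_pod:.2f}, FAR={heavy_far:.2f}")
```

In this made-up series the product damps the heaviest events, so it captures every day with at least 1 mm but misses most days with at least 10 mm, mirroring the qualitative pattern described above.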
Finally, our results suggest that the selection of a precipitation product depends on the specific user needs or application and on the regional characteristics. For example, CHIRPS and MSWEP perform best during winter and spring, which makes them suitable to assess terrestrial water storage prior to the irrigation season. While CHIRPS provides daily precipitation amounts, MSWEP has a higher temporal resolution (3-hr resolution), making it more appropriate for subseasonal hydrological monitoring and forecasting (Beck et al., 2017b). Both products have a long temporal record, from 1979 (MSWEP) and 1981 (CHIRPS) to near the present, with a delay of several days (CHIRPS) or several hours (MSWEP). CHIRPS has a higher spatial resolution (0.05°) than MSWEP (0.1°) and is more suitable for smaller catchments at elevations below 3,000 m. The products' best performance was achieved in the western, central, and northern Tien Shan and Pamir mountains and in adjacent regions. Although we did not find considerable differences between the original and upscaled versions of CHIRPS, a lower performance of the upscaled version can arise from the resampling method used for upscaling; hence, we advise the use of CHIRPS in its native resolution.
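The dependence of an upscaled field on the resampling method can be illustrated with a small sketch. It compares block averaging with nearest-neighbour subsampling on a hypothetical fine grid; the grid values, the upscaling factor of 2, and both function names are assumptions made for illustration and do not describe the resampling actually applied to CHIRPS in this study.

```python
def upscale_block_mean(grid, factor):
    """Coarsen a 2-D grid by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("Grid dimensions must be divisible by the factor.")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


def upscale_nearest(grid, factor):
    """Coarsen a 2-D grid by keeping the top-left cell of each block."""
    return [row[::factor] for row in grid[::factor]]


# Hypothetical daily precipitation (mm) on a fine grid (assumed values).
fine = [
    [0.0, 0.0, 4.0, 8.0],
    [0.0, 12.0, 0.0, 4.0],
    [2.0, 2.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
]

mean_grid = upscale_block_mean(fine, factor=2)
nearest_grid = upscale_nearest(fine, factor=2)

print("Block mean:       ", mean_grid)
print("Nearest neighbour:", nearest_grid)
```

With these assumed values, the isolated 12 mm cell contributes to the block mean but is dropped entirely by nearest-neighbour subsampling, so the two coarse fields disagree wherever precipitation is patchy within a block.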

| CONCLUSION
We presented the first evaluation of three global high-resolution precipitation products over the heterogeneous topography and climate of Central Asia using independent station data. We quantified the products' ability to reproduce temporal dynamics while preserving the volume and distribution of precipitation, evaluated the products' accuracy, and assessed the products' ability to detect dry and wet spells of different intensities accurately. We found that CHIRPS and MSWEP were the most reliable global products for obtaining high-resolution precipitation estimates in Central Asia, especially for wet seasons. Nevertheless, our results highlight high spatial and temporal heterogeneity of the performance, which indicates that the final product for a local application must be selected with care, based on the guidelines provided above. This is particularly relevant for regions with low precipitation levels and in complex terrain where ground station data are sparse.