Precipitation homogenization and trends in the Usumacinta River Basin (Mexico‐Guatemala) over the period 1959–2018

The precipitation variability and trends were investigated in the Usumacinta River Basin (URB) for the period 1959–2018, based on imputed and homogenized data records from 60 meteorological stations in Mexico and Guatemala. The homogenization process played a crucial role in enhancing the quality of the original precipitation series, reducing regional inconsistencies and improving temporal and spatial coherence. The dataset reliably captured large‐scale climate variations, revealing three regions with similar precipitation variability and trends in the URB. Notably, maximum precipitation occurred at 636 m a.s.l., while minimum precipitation was at 1531 m a.s.l., indicating an orographic effect in the region. Extreme precipitation events were linked to El Niño–Southern Oscillation. Although the Mann–Kendall test showed statistically significant negative trends in only 18% of the stations, integration of Sen's slope analysis and 30‐year normals and dry year occurrences highlighted a progressive shift towards dryer conditions throughout the study period in the URB. These drier conditions could notably affect regions with higher precipitation, requiring special attention due to possible socioeconomic impacts associated with drought events. By identifying these vulnerable regions, policymakers and stakeholders can proactively plan and execute adaptive measures to mitigate the potential impacts of droughts on communities, ecosystems, and economic activities within the basin.


| INTRODUCTION
Precipitation is one of the most studied hydroclimatic variables because of its important social, economic and ecological implications.Variations in global or local rainfall (i.e., amount, intensity, frequency) can lead to extreme hydrological events (i.e., drought and floods) and affect water availability and management, and agricultural activities, particularly in developing countries (Kotz et al., 2022).Information related to these events is vital in the current scenario of climate change, especially since recent global climate models predict changes in local precipitation patterns, in particular, reduced precipitation and altered rainfall seasons in the Central America region, and a higher number per year of extreme precipitation events (IPCC, 2022).As a result, Central America is considered a region expected to suffer significant agricultural, ecological and hydrological impacts of climate change (Hannah et al., 2017;Hidalgo et al., 2013).
The Usumacinta River Basin (URB) in the southeast of Mexico and northern Guatemala is in the eastern part of the Grijalva-Usumacinta fluvial system, the second largest to flow into the Gulf of Mexico (after the Mississippi River) and the largest river basin in Mexico, accounting for 30% of the surface runoff of the country (CONAGUA, 2018).The URB belongs to one of the world's regions with the greatest biodiversity, as it is home to the Lacandon Jungle, which has the highest biodiversity in the Tropics (Soasores & Garcia-Garcia, 2017).The basin hosts $2 million inhabitants, belonging to one of the most marginalized populations of Mexico and Guatemala, with the prevalence of indigenous people, who live mainly from crop cultivation and livestock, with low production and high impact on the environment (March-Mifsut & Castro, 2010).
The average annual precipitation in the URB ranges from 1500 mmÁyear −1 in the Mexican lowlands (García, 1998) to 3500 mmÁyear −1 in the Guatemalan highlands (INSIVUMEH, 2016).In this region, precipitation is influenced by easterly waves (Serra et al., 2010), cyclonic activity (NOAA, 2022), polar fronts (Z arate-Hern andez, 2013) and the complex topography of the region, resulting in marked contrast in the annual precipitation cycle between the Caribbean and the Pacific slopes (Alfaro, 2002;Giannini et al., 2000;Maldonado et al., 2017;Taylor & Alfaro, 2021).Previous studies have analysed annual precipitation trends based on climate indices from meteorological station series over three decades in the Mexican part of the URB (Aguilar et al., 2005;De la Barreda et al., 2020;Montero-Martínez et al., 2018), and from reanalysis data in the URB region (Andrade-Vel azquez & Medrano-Pérez, 2020), and in Mexico (Murray-Tortarolo, 2021); however, most results differed between stations, and precipitation did not show general trends.Furthermore, no studies have examined the variability of rainfall in the URB, including meteorological stations in the Guatemalan portion of the basin, which, however, constitutes the largest area of the URB and experiences a high precipitation variability.This can be explained by the restricted spatial and temporal availability of data in northern Guatemala, which is constrained by the poor infrastructure of the network of meteorological stations due to the difficulty of access (e.g., absence of highways, low population, presence of a jungle; Fuentes, 2021).However, precipitation data from the upper basin is essential for understanding the hydrological processes, water availability and impacts of climate change in the entire river basin.
To calculate climate normals and trends, it is highly recommended to perform quality control and homogenization of the dataset and to ensure that the dataset is complete with no missing or erroneous values (WMO, 2017).Among the datasets that can be considered to study the URB, global satellite and reanalysis data have become increasingly available; however, many of these datasets are not always calibrated with rain gauge information for Mexico and Guatemala (Morales-Velazquez et al., 2021).Satellite datasets typically cover a relatively short-term record, mainly since the 1980s, while reanalysis datasets, generated through data assimilation techniques (Parker, 2016), provide broader temporal coverage, but may exhibit a greater bias in estimating precipitation in complex mountainous regions and tend to overestimate lighter precipitation events while underestimating heavier ones (e.g., Izadi et al., 2021;Jiang et al., 2021).Rain gauge time-series have longer periods of availability, and certain Mexican records have been available since the beginning of the twentieth century (SMN-CONAGUA, 2022).However, these data series are usually limited by gaps in the data and by dubious values, behaviours or trends owing to nonclimatic external factors, generated during the measurement process or data digitization or by changes in instrument, calibration or station location (Peterson et al., 1998;WMO, 2017).These alterations (inhomogeneities) could lead to erroneous interpretations (Guijarro, 2018).Among the various methods that aim to guarantee that the observed values correspond only to climatic processes, the neighbour-based homogenization algorithm "climatol" (Guijarro, 2019) has outstanding accessibility, flexibility to adapt to any climatic variable, its high tolerance of missing data and capability to deal with any temporal resolution (e.g., day, month or year).In the last few years, climatol has been widely applied to homogenize a variety of daily and monthly climate databases.In particular, it has shown its efficiency in studying air temperature, precipitation, and wind speed series worldwide (e.g., Chile, Meseguer-Ruiz et al., 2018;Australia, Azorin-Molina et al., 2019;Israel, Yosef et al., 2019;Ireland, Coll et al., 2020;Domonkos et al., 2020;Iran, Javanshiri et al., 2021;Canada, James et al., 2022) and is recommended as one of the top homogenization methods (Coll et al., 2020;Guijarro et al., 2023).
This study used a meteorological station dataset over the period 1959-2018 to construct a homogenized and complete precipitation dataset for the URB region, identify the main precipitation regions and the possible factors that explained the spatiotemporal precipitation variability, and conduct a trend analysis at station and precipitation region scale over the study period.It aims to improve upon previous research conducted in the region by introducing several novel aspects: (1) incorporated data from meteorological stations situated in both the Mexican and Guatemalan territories to capture a broader representation of the precipitation patterns and climatic conditions within the URB; (2) the use of one of the most widely used and recent software (climatol) for quality control, homogenization and imputation of missing data to obtain valuable insights into long-term precipitation trends  in the URB; (3) the identification of precipitation regions within the URB to find the main precipitation characteristics and examine long-term temporal trends at yearly scales, which have significant practical implications for sustainable planning and adaptation strategies, particularly for stakeholders and policymakers in the URB region.

| Study site
The URB extends from 14.90 N to 18.70 N and from 92.71 W to 89.13 W, and its 77,183 km 2 is shared by Mexico ($46%), Guatemala ($54%) and Belize (<1%).The water source is in the upper basin, in the northwestern highlands of Guatemala (Huehuetenango and El Quiche departments, with a maximum altitude of 3800 m a.s.l).The Usumacinta River runs off through the Mexican states of Chiapas, Tabasco and Campeche before reaching the Gulf of Mexico, with a river flow estimated at 1700 m 3 Ás −1 (Cotler-Ávalos, 2010).The Köppen climate classification was modified by García (1973) for Mexico, based on temperature and precipitation, and divides the basin into three altitude levels (lower, middle, and upper), for which the climatic types are warm-humid in the lower basin (Am2 (x'); Am(f)) and middle basin (Am), and semi-warm humid in the upper basin ((A)C(m)(f); (A)C(m)).Considering the latitude and altitude ranges of the URB, snow is not considered in the region.

| Database management
The boundaries of the URB used in this study were geographically defined by Solorza-G omez (2017) (bold black line in Figure 1).The daily precipitation database for URB was compiled with data from the Mexican states of Campeche, Tabasco, andChiapas (period 1942-2019) from the database of SMN-CONAGUA (2022), and Guatemala (period 1969-2018) from the National Weather Service (INSIVUMEH, 2021).Precipitation time series from 77 meteorological stations were found in the URB: 70 in Mexico and 7 in Guatemala.Daily data missing from the stations within Mexico were calculated over the respective networks' operating period from this study's database.Daily data missing over its operating period from Guatemalan stations were calculated by Fuentes (2021).As a first step, only time series with ≤20% of daily missing data and ≥10 years of continuous data were selected, that is, 54 in Mexico and 6 in Guatemala (Figure 1 and Table S1, Supporting Information).
For further analyses, monthly accumulated precipitation was computed from daily data.To compute valid monthly accumulates, the daily data had to fulfil two requirements set by the World Meteorological Organization (WMO, 2017): (1) there cannot be 10 days with missing data in a month and (2) there cannot be ≥4 consecutive days with missing data in a month.The neighbour-based homogenization algorithm R-package climatol version 3.1.2(Guijarro, 2019) was used for quality control of the data, time-series homogenization and imputation of missing data by comparing each time series with a reference series estimated from an average of nearby stations.

| Data availability and homogenization
Climatol requires a minimum of three (ideally five or more) data at every time step (here monthly).Data availability was lowest (3-14 data per month) between 1940 and 1959 and highest (36-51 data per month) between 1970 and 1990 and ranged from 14 to 51 during the period 1959-2018 (Figure 2).To minimize errors stemming from data gaps in the initial decades and enhance the accuracy of infilled data, the monthly accumulated precipitation data were limited to the period 1959-2018 and prepared using the approach outlined by Guijarro (2019).This period allowed the study of two 30-year periods (1959-1988 and 1989-2018) to calculate climate normals and compute consistent trends (WMO, 2017).
From the preliminary run of climatol, the following parameters were determined for quality control, homogenization, and imputation: (i) the minimum and maximum limits for the standardized anomalies that allowed the exclusion of outliers and (ii) the thresholds of the Standard Normal Homogeneity Test (SNHT; Alexandersson, 1986) that verified the homogeneity of the series, dividing the series at a break-point (i.e., sudden shifts in the means) when the SNHT statistics were more significant than the prescribed threshold, and filling missing data before and after a break-point.To prevent negative values, average ratio normalization was applied to the raw data to indicate that precipitation has a natural zero lower limit.Afterward, climatol was run in full mode with the previously defined thresholds, and complete monthly precipitation series were obtained for each climatological station.The station density was checked to validate the statistical homogeneity of the results (Gubler et al., 2017), and higher weights were assigned to the closest station series during the imputation process.Spearman correlation coefficients between each pair of precipitation series assessed both the strength and direction of the relationship between timeseries data before and after the homogenization process and allowed the identification of reference stations for homogenization (Aguilar et al., 2003).
The performance evaluation metrics used for evaluating the impact of the homogenization process on the data quality included: (1) the break-points detection by the homogenization process that indicates when occurred sudden shifts in the means of the station precipitation series (i.e., data irregularities in the series), leading to a break and correction of the series by climatol; (2) the comparison of monthly median accumulated precipitation for raw, imputed and homogenized data, calculating for all the available data in the homogenization process and indicating the monthly distribution of the original data compared to the reconstructed data; (3) the mean absolute difference (in mm) and the bias percentage (in %) for each station, indicating the deviation magnitude and orientation (i.e., under-or over-estimation) of the reconstructed data compared to the original data; and (4) the bias time-series percentage between the means of original and the reconstructed data throughout the study period.

| Precipitation regions and extreme precipitation events
The subsequent analyses were conducted using annual precipitation data calculated specifically for the period from May to October, which corresponds to the 6 months of the highest annual precipitation in the region (i.e., the rainy season; Figure 3), in order to focus on long-term series independently from the impact of seasonal variability.The Gap statistic (Tibshirani et al., 2001) was used to estimate the optimal number of meteorological station clusters, and a hierarchical cluster analysis identified major station groups that shared a similar temporal pattern of annual precipitation (Kolde, 2019).
In order to define dry and wet periods, the Standardized Precipitation Index (SPI; McKee et al., 1993) at a 6-month scale (May-October) was used to quantify the anomalies (Beguería & Vicente-Serrano, 2017) from normal precipitation conditions for each station and cluster.Annual values of 6-month SPI ≤ −1 were considered dry, and annual values of 6-month SPI ≥ 1 were considered wet, between values of 0.99 to −0.99, defined as close to normal (WMO, 2012) (Table S2).The Niño3.4 index (CPC, 2021) was calculated for the period from May to October of each year (using the period of the rainy season defined by SPI; Figure 3) to describe annual ENSO sea surface temperature anomalies in the central Equatorial F I G U R E 3 Annual cycle of the monthly mean accumulated precipitation for each station in the Usumacinta River Basin (1959Basin ( -2018)).Bold line is the median.Within the dotted lines is found the period of highest annual precipitation in the region (May-October) [Colour figure can be viewed at wileyonlinelibrary.com]Pacific Ocean (5 N-5 S, 170 E-120 W) and was compared to the occurrence of extremes precipitation events in the 6-month SPI series of each cluster.

| Trends analysis
As precipitation does not follow a normal distribution, the Mann-Kendall nonparametric test (Kendall, 1975;Mann, 1945) and Sen's slope estimator (Sen, 1968) were applied, at a 95% confidence level, to the stations and the cluster series, aiming to detect potential trends within the series and precipitation region.These methods are widely used in analysing hydrological data trends as they are less sensitive to extreme values or outliers (e.g., Aditya et al., 2021;Aswad et al., 2020;Gan & Kwong, 1992;Gocic & Trajkovic, 2013;Hirsch et al., 1982).In particular, Sen's slope analysis offers an estimation of the trend's magnitude (expressed in this study in mmÁyear −1 ), enabling the interpretation of both the direction and extent of the trend, even in cases where the Mann-Kendall test does not yield statistically significant results (e.g., Aditya et al., 2021;Aswad et al., 2020;Gocic & Trajkovic, 2013;Stefanidis & Stathis, 2018).
To complete the previous trend analyses, 30-year precipitation normals were calculated and compared for the periods 1959-1988 and 1989-2018.The 30-year normals were derived by averaging the monthly normals for each year of the precipitation series and then computing the mean of the annual normals over the respective 30-year intervals, following the guidelines outlined by the World Meteorological Organization (WMO, 2017).Climate normals have proven to be valuable tools for identifying and quantifying precipitation trends (e.g., Grigorieva & de Freitas, 2014;Kuya et al., 2022).Additionally, the occurrence of dry years based on 6-month SPI values below −1 (WMO, 2012) was compared between the two 30-year periods for each station and precipitation region, aiming to investigate potential changes in extreme drought events over time.The SPI has been previously used for drought monitoring and trend analysis (e.g., Guimarães Santos et al., 2019;Montero-Martínez et al., 2018;Naresh Kumar et al., 2009;Saada & Abu-Romman, 2017).

| Homogenization and reliability of the imputation
Most of the meteorological stations (50) showed no break-points (splits) through the homogenization process.A total of 13 splits were observed in the time series of 10 stations, with each station experiencing either one or two splits due to anomalous changes in their mean (Figure 4).From the 1980s to 2000s, there was a notable surge in the number of data splits per year, indicating a substantial increase in data inconsistencies.This rise in splits coincided with the progressive decline of data availability in the region (Figure 3).
Considering the entire study period, the mean distance to the closest stations with available data used in the imputation process was 63 km (46-122 km) and 93.33% of stations exhibited a mean value below 100 km (Figure S1), indicating sufficient data within a 100-km radius to perform the imputation and homogenization process.The median Spearman correlation coefficient obtained for time-series from stations within 100 km was 0.81 (0.64-0.92 at a 95% confidence interval) before homogenization and 0.87 (0.75-0.95 at a 95% confidence interval) for the homogenized data (Figure 5).The monthly medians of the raw, imputed and homogenized were compared to gain insights into the potential biases introduced by the homogenization and imputation procedures (Figure 6).In general, the imputed data demonstrated higher median monthly precipitation values (+10.2 mm; +4.5%) compared to the raw data.The difference was more pronounced during the rainy season from May to October (+13.5 mm; +4.8%) compared to the dry season from November to April (+7.0 mm; +4.2%).The homogenized data consistently exhibited higher values (+5.2 mm; +3.4%) than the raw data, both during the rainy season (+4.0 mm; +2.4%) and the dry season (+6.5 mm; +4.4%).However, the homogenized data still remained lower than the imputed data, suggesting that the homogenization process effectively adjusted the dataset and preserved the statistical distribution of the original data, as evidenced by the relationship between the raw and homogenized data, which displayed a high degree of significance with a strong determination coefficient (R 2 = 0.95; Figure 7).Among the Mexican stations, stations 7089 and 7006 exhibited noteworthy deviations from the regression line (Figure 7; see detailed discussion in section 4.1).
The absolute differences between the means of monthly precipitation data before and after homogenization were within a narrow range for a majority of the stations, as absolute difference values were below 5 mm for 68% of the stations, below 10 mm for 80% of the stations and below 15 mm for 94% of the stations (Figure 8).The evaluation of bias percentages demonstrated that the majority of stations exhibited minimal bias, with values that were within the range of −5% to +5% for 80% of the stations and between −10% and +10% for 97% of the stations.The comparison between the means of raw and homogenized data over the study period indicated a bias per year that ranged from −5.1% to 6% for the period 1959-1988 and from 5% to 21% during the second 30-year period (Figure 9).

| Data clustering and precipitation anomalies
The median annual precipitation in the URB was 1384 mmÁyear −1 over 1959-2018 (IQR: 1025-1879 mmÁyear −1 ) and the median altitude of the stations was 777 m a.s.l.(IQR: 122-1258 m a.s.l.) (Table 1).The data clustering was done on the homogenized data.The Gap statistic identified three main clusters for the study period that suggested the presence of three main precipitation regions within the URB.A composite series of annual precipitation with the average of all stations per cluster was computed (Ci: Cluster i, i = [1, 3]; Figure 10 and Table 1).The highest accumulated annual precipitation  1960, 1969, 1981, 1984, 1999, 2010, 2013 and 2017) (Figure 10).Significant (p < 0.05) positive Spearman's correlations were found between the 6-month SPI time series of the clusters (0.81 ≤ r ≤ 0.88) and significant ( p < 0.05) negative correlations were observed between the cluster and Niño3.4 index time series (−0.40 ≤ r ≤ −0.31) (Table 2).

| Trend analysis
The application of the Mann-Kendall test to the raw data revealed significant ( p ≤ 0.05) trends, both positive and negative, in eight stations across the basin (Figure 11).After homogenization of the station series, the Mann-Kendall test revealed significant (p ≤ 0.05) negative trends in 11 stations from the central-western region of the URB, which exhibited negative Sen's slopes values (−4 to −6 mmÁyear −1 ), a notable difference between the 30-year normals (−22% to −10%) and a higher number of dry years (+6 to +11 dry events) during the second period compared to the first.At the scale of the basin, the Sen's slope value was negative for 82% of the stations (median: −2.33 mmÁyear −1 ) and the difference between the 30-year normals was negative for 81% of the stations (−4.70%).The analysis of the 6-month SPI revealed an increase in the number of dry years during the second 30-year period, observed in 92% of the meteorological stations.More than half of them (52%) experienced an additional 4-11 dry years occurring during the second period.
At the scale of the precipitation regions, the Mann-Kendall test on homogenized data assessed negative values of a nonsignificantly (p ≥ 0.05) monotonic trend in the cluster time series of the URB (−0.09 to −0.10) (Table 1).Sen's slope estimated the magnitude of that trend, accounting for −3.89 mmÁyear −1 in C1, −2.10 mmÁyear −1 in C2, and −1.95 mmÁyear −1 in C3.The discrepancy in climate normals exhibited a negative trend across all three precipitation regions, with C1 displaying a higher disparity compared to the other clusters.Over the second period , a noticeable rise in dry years was observed, showing an increase ranging from +4 to +7.

| Reliability of the imputation and homogenization process
Studying temporal and spatial precipitation patterns at a regional scale requires long-term observations at sufficiently high temporal resolution and enough spatial coverage (Easterling, 2013).However, these conditions are challenging in the URB region.Indeed, data were scarce in Mexico before 1960 (3-14 data per month) and nonexistent in Guatemala before 1970 (Figure 2); this was due to the paucity of meteorological stations during earlier times, as was the case in most of the countries of Central America (e.g., Aguilar et al., 2005;INSIVUMEH, 2016).The declining data availability in Mexico, starting in the 1980s and accelerating after the 1990s, is likely due to budgetary constraints (CONAGUA, 2012) because many climatological stations in Mexico that were operative in the 1950s were progressively suspended (SMN-CONAGUA, 2022): of 5880 stations that were once operative in Mexico, only 3348 remain operational in the country (CONAGUA, 2018).Results suggested that the restructuring of the station network in Mexico may have compromised the quality of data from the 1980s to the 2000s, resulting in a noticeable rise in data inconsistencies and irregularities during this timeframe (Figure 4).The station INS-110104 (Flores) in Guatemala exhibited data inconsistencies in 2000 that can be ascribed to localized alterations introduced during the measurement or data digitization process, which may occur in the Guatemalan station network, as reported in Fuentes (2021).Abbreviation: m a.s.l., meters above sea level. a Comparing the two 30-year periods of 1959-1988 and 1989-2018.b Based on the number of dry years (defined as 6-month-SPI values >1) in 1989-2018 compared to 1959-1988.The imputation process, relying on a reference precipitation series derived from an average of neighbouring stations, was made possible over the study period due to low mean distances from the closest station with available data (46-122 km) (Figure S1) and the strong correlation observed among their time-series (Figures 5 and S1).The homogenization process improved the correlation coefficients among precipitation time-series, as indicated by an increase from a median value of 0.81-0.87(Figure 5) and a diminution of the bias percentage of monthly median accumulated precipitation between the raw and imputed data (Figure 6), demonstrating the efficacy of the homogenization process in enhancing the quality of the data compared to its pre-homogenization state.Overall, the monthly means of the two datasets exhibited a strong relationship (R 2 = 0.95; Figure 7), and the absolute differences and the bias between the means of nonhomogenized and homogenized series were low for the majority of the stations (Figure 8), indicating a high level of accuracy and agreement between the nonhomogenized and homogenized data.Throughout the study period, the bias time-series percentage between the raw and homogenized data remained consistently low (Figure 9), indicating the successful preservation of the temporal distribution of average precipitation despite the homogenization process.In the second 30-year period for the Mexican stations, bias values increased slightly (ranging from 5% to 21%) due to a higher correction of artificial precipitation values and trends through the homogenization process after the break-point detection (Figure 4).
Despite the presence of a unique station in the northern part of Guatemala (INS-110104), the closest stations that served as references for its missing data imputation were located within a 122 km radius (Figure S1) and the correlation coefficients between the station INS-110104 and other stations were relatively high (76-78, 95% confidence interval), indicating sufficient data for the infilling procedure and good correspondence with other station series.Moreover, station INS-110104 presented a minimal amount of missing data (2%) during its operational period (Fuentes, 2021) and its reconstructed data was reliable, as demonstrated by low absolute difference (7.6 mm) and bias percentage (−5.2%)values compared to the original data (Figures 7-9).Among the Mexican stations that were analysed, the two stations 7089 and 7006, stood out from the others for showing noticeable deviations from the regression line (Figure 7) and relatively high absolute difference values (7089: 44 mm; 7006: 83 mm; Figure 8), although the correlation coefficients were high before homogenization (7089: 0.86; 7006: 0.82), indicating a similar temporal variability with other stations.This was explained by the presence of significant shifts in precipitation series detected by the SNHT test, which were attributed to nonclimatic external errors, likely resulting from changes in the instrument or calibration.These alterations affected the recorded precipitation values after the breakpoints.For instance, station 7089 showed a transition from mean monthly values around 220-110 mm after the breakpoint.Similarly, station 7006 displayed anomalous high precipitation values during the period December 1986-August 1989 compared to nearby stations, leading to the detection of two breaks before and after this period.
The different used parameters have demonstrated the reliability of climatol in imputing and homogenizing the precipitation series of each station, successfully rectifying data inconsistencies as observed in stations 7006 and 7089.Following homogenization, the station series presented improved temporal and spatial coherence compared to the nonhomogenized series.

| Precipitation regions
The three clusters of meteorological stations, each corresponding to different precipitation regions, exhibited well-defined and distinct precipitation ranges.The C1 stations, situated at intermediate altitudes, exhibited the highest median precipitation, while the C3 stations, located at the highest altitudes, experienced the lowest median precipitation.Thus, the spatial distribution of clusters was primarily influenced by altitude, where the highest precipitation was observed at a mean elevation of 636 m a.s.l., represented by C1, and the minimum precipitation occurred at a mean elevation of 1531 m a.s.l, as shown by C3.The wide altitude range of the C2 stations (19-963 m a.s.l) reflected the influence of other factors, such as the source and direction of wind and humidity F I G U R E 1 1 Trend analysis over the station series (Mann-Kendall and Sen's slope) and the difference normals and the number of dry years between the two 30-year periods (1959-1988 and 1989-2018) for nonhomogenized and homogenized data in the Usumacinta River Basin [Colour figure can be viewed at wileyonlinelibrary.com] (windward/leeward mountainsides) and the distance to the ocean (Houghton, 1979).
The relationship between elevation and precipitation is the result of the orographic effect occurring in the Tropics, a combination of the cooling of moist air ascending along the mountains and the decreasing air moisture with altitude (Daly et al., 1994;Fern andez et al., 1996;Houghton, 1979), that produces an increase in precipitation until a particular elevation (belt of maximum precipitation) after which precipitation declines with higher elevation.Finding a belt of maximum precipitation is a frequent phenomenon in the mountains of the Tropics and subtropics (Hastenrath, 1967) and has been found around the world, such as India (Puvaneswaran & Smithson, 1991), Costa Rica (Chac on & Fernandez, 1985;Fern andez et al., 1996) and Morocco (Abahous et al., 2018).The belt of maximum precipitation of this study defined as 265-712 m a.s.l. and based on C1 stations (Table 1) was found at a lower altitude than other studies in Central America, such as on the Pacific escarpment of the Guatemala Highlands (900 m a.s.l.; Hastenrath, 1967) and on the Pacific and the Caribbean coasts ($1000 m a.s.l.; Fern andez et al., 1996) and in the Reventaz on River Basin (1700 m a.s.l.; Chac on & Fernandez, 1985) in Costa Rica.This level varies in Central American mountains; it depends on each region's topographic and meteorological conditions but is generally reported to occur at intermediate elevations.
Certain events of El Niño (Niño3.4index ≥0.5 C) and La Niña (Niño3.4index ≤0.5 C) were associated with changes in URB precipitation in the last decades (Figure 10; hatched bars for dry years and El Niño events; dotted bars for dry years and La Niña events).All the clusters showed mild to severe drought SPI categories (−1 ≤ 6-month SPI < −1.5) in the URB linked to the El Niño medium intensity event of 2002-2003 and linked to the very intense El Niño event of 2015.The strong La Niña events of 1999 and 2010 were associated with severely wet conditions (6-month SPI ≥ 1.5), and the moderate event of La Niña in 1984 was related to the moderate wet conditions (6-month SPI ≥ 1) in the URB.In particular, these events were also evident in the temporal bias (Figure 9), where they resulted in a notable increase in variability among the station series.Specifically, we observed a higher bias during La Niña events and a lower bias during El Niño events, compared to the original series.The significant negative correlation (−0.47 ≤ r ≤ −0.32; p < 0.05) found between the timeseries of the clusters and the Niño3.4index (Table 2) is consistent with results from previous studies in the URB region (Andrade-Vel azquez & Medrano-Pérez, 2020;INSIVUMEH, 2016) and in Central America (Alfaro, 2002;Maldonado et al., 2018), as a result of the impact of El Niño events on the moisture transport for Central American precipitation (Dur an-Quesada et al., 2017).In addition, evidence of a relationship between drought and El Niño events has been found in southern Mexico (Magaña et al., 2003;Salas-Flores et al., 2014;Salas-Flores & Jones, 2014).The possible further increases in the frequency and intensity of extreme El Niño events related to climate change (Cai et al., 2022;Wang et al., 2019) may lead to future drought conditions in the URB and associated socioeconomic consequences, although further research and consideration of various factors are required to better understand and project the future behaviour of ENSO under the influence of anthropogenic climate change (Cai et al., 2022).

| Trend analysis at station and precipitation region scale
Based on the Mann-Kendall test results, a small number of meteorological stations presented statistically significant trends over 1959-2018 in both non-homogenized (8 stations) and homogenized (11 stations) data, suggesting that the rest of the station data or the period considered in this study may be limited in capturing significant precipitation trends due to its inherent wide variability (WMO, 2017).This concurs with most previous findings for Central America (Alfaro-C ordoba et al., 2020;Dur an-Quesada et al., 2017;Hannah et al., 2017;Hidalgo et al., 2013Hidalgo et al., , 2017;;Maldonado et al., 2021), which have suggested that precipitation trends are not always homogeneous for Central America and depend on the database used in the assessment.However, the investigation of trends in this study was extended through the integration of results based on the reconstructed homogenized data and different trend methods, such as Sen's slope, climate normals, and the occurrence of dry years.
Before homogenization, the trend analysis using the raw data yielded a mixture of trends over the study period, with 55% of the stations exhibiting positive trends and 45% showing negative trends, and a low magnitude of change, as indicated by the Sen's slope values (−0.51 to 0.26 mmÁyear −1 ).The difference between the 30-year normals (−6.1 to +4.8%) and the change in dry years (−2 to +5) in 1959-1989 to 1989-2018 did not show a general pattern (Figure 10).However, after homogenization of the station series, the Mann-Kendall test showed a concentration of significant ( p ≤ 0.05) negative trends in the central-western region of the URB, which was identified as the area with the most pronounced drought condition within the URB.The remaining stations of the URB also exhibited a general negative trend; however, these trends did not reach the statistical significance threshold.The negative value of the Sen's slope confirmed declining precipitation in most of the meteorological stations, with high magnitudes of change (−4 to −6 mmÁyear −1 ) (Figure 10).The evidence indicating a trend towards drier conditions was reinforced by strong negative values in the difference of 30-year normals (−22 to −5%), indicating lower mean precipitation in 1989-2018 compared to 1959-1988, and an observable increase in the occurrence of dry years during the second 30-year period (+6 to +11) across the majority of stations.
Applying homogenized data for analysing precipitation trends enhanced spatial and temporal coherence compared to nonhomogenized data (Figures 5 and 11).When executed correctly and with meticulous control of parameters, the improved coherence within long-term trend data series is a consistent and expected result of the homogenization process (Aguilar et al., 2003), widely observed in studies employing climatol methodology on hydroclimatic datasets (e.g., Abahous et al., 2020;Javanshiri et al., 2021;Kessabi et al., 2022;Kuya et al., 2022;Meseguer-Ruiz et al., 2018;Yosef et al., 2019).The validation of the data density and reliability of the homogenization process within the URB (in section 4.1) confirmed the accuracy of trends derived from the homogenized dataset, and it also ensured that the heightened coherence within series directly resulted from effectively eliminating nonclimatic noise while preserving natural climatic data variability and information.
The comparison of results obtained before and after the homogenization process (Figure 11) through the Mann-Kendall test evidenced a transition from significant positive trends to significant negative trends in precipitation records from two of the meteorological stations, as well as a shift from significant to nonsignificant trends in the records of another five stations.Previous studies elsewhere that used homogenization of precipitation data have also reported such trend variations (e.g., Abahous et al., 2020;Kessabi et al., 2022;Kuya et al., 2022) that were attributed as a positive outcome of the homogenization process itself.Indeed, this process is designed to identify and rectify trend inconsistencies introduced by nonclimatic factors and to isolate the precipitation trend driven by true climatic factors (Aguilar et al., 2003).In this study, the absence of spatial coherence in precipitation trends before homogenization could have resulted from the sensor misalignment, station relocations, alterations in observational practices, or the progressive land use change around the station (Ribeiro et al., 2016).This lack of coherence was likely a consequence of the reconfiguration of the meteorological station network in Mexico during the period from the 1980s to the 2020s and resulted in the emergence of artificial trends superimposed upon the true climate trends in URB stations.
Across the scale of precipitation regions represented by the clusters, a noticeable transition towards drier precipitation conditions was evident throughout the study period (Figure 10 and Table 1).Stations experiencing the highest levels of precipitation (C1) were particularly affected, suggesting that the consequences of this shift may be more pronounced in this specific region and at intermediate altitudes.The observed rise in drought conditions over the past few decades aligns with previous studies that have documented warming trends in the Mexican part of the URB (Montero-Martínez et al., 2018) and in Central America across interdecadal scales (Alfaro-C ordoba et al., 2020;Dur an-Quesada et al., 2017;Hannah et al., 2017;Hidalgo et al., 2013Hidalgo et al., , 2017;;Maldonado et al., 2021).Such temperature trends can have substantial implications for the regional hydrological cycle and water resources, as the concurrent increase in evaporation and evapotranspiration may lead to shifts in atmospheric circulation patterns, altering the distribution and intensity of rainfall, and a significant reduction in water availability and runoff.Additionally, warmer temperatures can increase the capacity of the atmosphere to hold moisture, potentially leading to more intense rainfall events but also longer periods of drought.However, it is important to note that the relationship between warming trends and precipitation variability is complex, and further improvements in the study could be achieved by considering longer time-series data, for example, using paleo-precipitation records, or by incorporating reanalysis data calibrated with local rain gauge data.These additional sources of information, as demonstrated in previous studies (e.g., Hidalgo et al., 2017), can provide valuable insights for a more comprehensive understanding of precipitation trends.
The escalating aridity in the URB region holds significant importance for stakeholders, particularly because the socio-economic development of the area relies mainly on crop and livestock production.These crucial sectors are exceedingly susceptible to fluctuations in water availability and extreme weather events like droughts.Hence, it becomes imperative to take these changing drying conditions into thoughtful consideration.

| CONCLUSIONS
This study made a significant contribution by creating a complete and homogenized precipitation dataset for the Usumacinta River Basin, spanning 1959-2018 and including meteorological stations from Guatemala.The climatol imputation and homogenization process demonstrated high reliability through strong Spearman correlations and low absolute differences and bias percentages between raw and homogenized data.The homogenization process corrected abrupt shifts and artificial trends in 13 time series primarily observed between 1980 and 2000.The resulting temporal evolution showed improved spatial consistency, enhancing temporal and spatial coherence while preserving the overall distribution of raw data.The 60 meteorological stations were grouped into three clusters based on distinct annual accumulated precipitation ranges, mainly related to the orographic effect of altitude.Interannual variability in cluster time series was associated with El Niño Southern Oscillation.Although the Mann-Kendall test did not show statistical significance for most stations, combining it with Sen's slope analysis and comparing 30-year precipitation normals and dry year occurrences revealed a discernible decreasing trend in homogenized precipitation data.This trend was particularly evident in the cluster characterized by the highest precipitation levels in the Usumacinta River Basin (C1), suggesting a higher risk of intensifying droughts in this region, corresponding to the intermediate altitudes of Chiapas.Understanding how local and global factors (e.g., topography and ocean-atmosphere interactions) influence precipitation in the region is crucial.The Usumacinta River Basin is an exceptionally diverse region with high economic potential, and extreme events in the context of future climate change may affect key aspects of essential sectors such as health, agriculture, environmental science, and water and disaster risk management.

F
I G U R E 1 Meteorological stations (Δ) used for the analysis of precipitation patterns and trends in the Usumacinta River Basin (bold black line), in southern Mexico and northern Guatemala [Colour figure can be viewed at wileyonlinelibrary.com]F I G U R E 2 Monthly data availability per station in the meteorological stations of the Usumacinta River Basin (1959-2018) [Colour figure can be viewed at wileyonlinelibrary.com]

F
I G U R E 4 Number of splits per station (left) and per year (right) in precipitation series accounted during the homogenization process in meteorological stations from the Usumacinta River Basin (1959-2018) F I G U R E 5 Spearman correlation coefficient for time series between stations against their distance for the raw (left) and homogenized data (right) in the Usumacinta River Basin (1959-2018) [Colour figure can be viewed at wileyonlinelibrary.com] was in C1 (IQR: 1957-2688 mmÁyear −1 ; n = 15), which grouped stations in a restricted area of Chiapas (average distance between stations of 62 km) with a mean altitude range of 636 m a.s.l.Precipitation values were intermediate in C2 (IQR: 1160-1604 mmÁyear −1 ; n = 32) with a lower mean altitude (536 m a.s.l.).Precipitation was lowest in C3 (IQR: 874-1035 mmÁyear −1 ; n = 13) in the mountains of Chiapas and northwestern Guatemala, including the highest mean altitude (1531 m a.s.l.).The 6-month SPI analysis revealed similar values among the three clusters, with shared common dry years(6-month SPI ≤ −1;1994, 2004, 2015, 2016 and 2018)  and common wet years (6-month SPI ≥ 1;

F
I G U R E 6 Monthly median accumulated precipitation of raw, imputed and homogenized data calculated for all the available data in the Usumacinta RiverBasin (1959Basin ( -2018)   )    F I G U R E 7 Monthly means of accumulated precipitation for raw and homogenized data in the Mexican and Guatemalan meteorological stations from the Usumacinta River Basin The median Sen's slope, indicating the magnitude of the observed trends, was calculated as −0.12 mmÁyear −1 (−0.51 to 0.26 mmÁyear −1 ).When comparing the 30-year normals, the results revealed that 35 stations exhibited values of difference ranging from −10% to 7% in 1989-2018 compared to 1959-1988, while the remaining stations recorded values below −10%.Moreover, among all the stations, 32 stations displayed an increase of 1-4 dry years (6-month SPI values ≥1) during the second period.Fourteen stations showed no F I G U R E 8 Mean absolute difference (left) and bias percentages (right) between the raw and homogenized data at monthly scale in meteorological stations from the Usumacinta River Basin (1959-2018) [Colour figure can be viewed at wileyonlinelibrary.com]F I G U R E 9 The yearly mean bias percentage between the raw and homogenized data from the Usumacinta River Basin (1959-2018) [Colour figure can be viewed at wileyonlinelibrary.com] change, while another 14 experienced a reduction of one or two dry years from 1989 to 2018 when compared to the 1959-1988 period.
Annual precipitation, station altitude, and trend analysis in clusters grouping meteorological stations from the Usumacinta RiverBasin (1959Basin ( -2018)   ) T A B L E 1