The predictability of daily rainfall during rainy season over East Asia by a Bayesian nonhomogeneous hidden Markov model

Precipitation plays a significant role in human society and the environment, and how large-scale climatic features influence precipitation has attracted worldwide attention. The nonhomogeneous hidden Markov model (NHMM) is a typical method for downscaling large-scale climatic elements to the regional level for many hydrologic applications. The traditional NHMM, which relies on point estimates of parameters, risks failing to find a solution during parameter estimation; the Bayesian-NHMM estimates parameters with a Bayesian method and avoids this risk. In this study, the suitability of the Bayesian-NHMM in East Asia is evaluated. Two typical regions dominated by different climates (i.e., the Yangtze River basin and the Zhujiang River basin in China) were chosen to check model performance. Results show that: (1) the model divides rainy-season precipitation into several hidden states, whose variation corresponds to the variability of the monsoon and moisture flux transport; and (2) the model captures the seasonality and interannual variation of both precipitation amount and wet days during the rainy season well in both river basins. These results suggest that the model could improve the prediction of daily precipitation in East Asia, which in turn could help many regions with similar climatic conditions worldwide to monitor floods and droughts.


| INTRODUCTION
Rainfall is one of the most significant variables responding to climate change in the hydrological and meteorological cycles and has attracted attention worldwide (Chen et al., 2018; Lee et al., 2022; Xiao et al., 2017). Frequent floods and droughts caused by changes in the amplitude of rainfall have had devastating effects on society and the environment. For instance, floods have killed over 540,000 and injured over 360,000 people globally and have been cited as the most lethal disaster (Petrucci et al., 2019). The floods that occurred in the Yangtze River basin (YRB) in 1998 and in Pakistan in 2010 caused estimated economic losses of USD 26 million and USD 21 million, respectively (Hsu et al., 2015). As a consequence, quantitative rainfall forecasts are urgently required to increase the lead time of flood warnings for fast-responding river basins (Ni et al., 2020). An accurate rainfall forecast could also provide advance information for various water quality problems. Besides, simulating and modeling precipitation on a daily scale is extremely important in many other respects, such as crop modeling and water policy decisions (Hansen et al., 2006; Piani et al., 2010).
Rainfall amount is mainly driven by large-scale climatic fluctuations (Gu et al., 2017; Peatman et al., 2021). Consequently, statistical downscaling (SD) is often used in daily rainfall forecast models to downscale large-scale climate variables and evaluate their impacts on daily precipitation. Over the last 20 years, SD models have developed in many respects, such as predictors, working principles, and skill measures (Al-Mukhtar & Qasim, 2019; Baghanam et al., 2020; Maraun et al., 2010). Homogeneous and nonhomogeneous hidden Markov models (HMMs and NHMMs) are typical SD models that are frequently applied to predict daily precipitation (Cioffi et al., 2017; Lianyi et al., 2018; Pal et al., 2015; Wang et al., 2023). The advantage of HMMs and NHMMs lies in their efficiency in capturing distinct rainfall states from complex precipitation data to simulate multisite rainfall occurrence probability and amount.
In the HMM, multisite daily rainfall in a river basin is linked to a hidden rainfall state that follows a first-order Markov chain; the HMM thus translates a series of rainfall observations into a stochastic process. Transitions among rainfall states are controlled by a transition matrix that depends only on the current rainfall state. In the NHMM, the transition probabilities depend on both the current rainfall state and atmospheric variables (such as sea surface temperature), which allows the NHMM to represent the spatial relationships and temporal dependence between atmospheric variables and rainfall in a region. In the traditional NHMM, the Expectation-Maximization (EM) algorithm is used for parameter estimation. The EM algorithm relies on point estimates of parameters and may fail to yield solutions in many complex spatial situations (Cioffi et al., 2017).
Greene et al. (2015) introduced a new NHMM (the Bayesian-NHMM) to solve this problem. By providing uncertainty estimates for the model parameters, the Bayesian method allows an automatic assessment of the influence of each exogenous atmospheric variable, and it characterizes the uncertainty of parameter estimates better than the EM approach. In addition, the Bayesian-NHMM can not only downscale large-scale climate data to predict daily rainfall but also incorporate regional-scale variables to provide more local precipitation-related information.
In previous research, applications of the Bayesian-NHMM have mainly concentrated on South Asia. For instance, Holsclaw et al. (2016) and Holsclaw et al. (2017) used the Bayesian-NHMM to simulate daily rainfall characteristics in the Punjab and regions near the Himalayas in South Asia. They argued that the model captures the seasonality and interannual variability of precipitation at different rainfall stations well. South Asia has a predominantly humid climate that is mainly influenced by the South Asian summer monsoon (SASM). However, whether the Bayesian-NHMM is also suitable under other climatic conditions (such as regions affected by the East Asian summer monsoon [EASM]) remains unclear.
The EASM has both subtropical and tropical characteristics, whereas the SASM has only tropical properties; therefore, the influence of the EASM on precipitation is more complex than that of the SASM (Huang et al., 2017; Zhao et al., 2023). This difference between the two monsoon systems contributes to obvious differences in summertime rainfall features. Many countries and regions are affected by the EASM (e.g., China, Japan, and Korea), where floods and droughts occur frequently (Yin et al., 2022; Yin, Gentine, et al., 2023; Yin, Guo, et al., 2023). Investigating the suitability of the Bayesian-NHMM in regions influenced by the EASM is therefore of vital importance for disaster warning and water resources management. Accordingly, this study selected two typical regions as study areas: the YRB and the Zhujiang River basin (ZRB) in China, shown in Figure 1. The YRB is dominated by semi-humid and humid climates, and the ZRB has a humid climate; both are affected by the EASM and the Pacific Ocean.
While the Bayesian-NHMM has been used in various domains, the authors apply it to specific locations that have not previously been explored or documented. This novel application demonstrates the versatility of the Bayesian-NHMM, expands its potential applications, and provides insights and predictions specific to the studied locations. The authors adapt the Bayesian-NHMM framework to the specific characteristics and requirements of these two regions, which could provide useful information for many regions with similar climatic conditions worldwide to monitor floods and droughts. Such adaptation may involve incorporating additional factors, modifying the transition probabilities, and extending the model to higher-order Markov chains. These modifications enhance the model's ability to capture the dynamics and dependencies specific to each location, setting this work apart from previous applications of Markov models. In particular, the authors incorporate novel external factors (large-scale climate indices) into the model to enrich its predictive capabilities, providing a more comprehensive understanding of each location's dynamics and improving predictive accuracy. This constituted the motivation for this study.
In this paper, we apply the Bayesian-HMM to determine rainy-season rainfall patterns in the YRB and the ZRB. Large-scale climate indices are then chosen as inputs to the Bayesian-NHMM based on their correlation coefficients with rainy-season precipitation in the two river basins. The seasonality and interannual variability of daily precipitation and wet days are further evaluated. We also explore the possible relationship between the rainfall patterns determined by the model and the EASM to understand the signals of rainy-season precipitation. This study helps assess the suitability of the Bayesian-NHMM in regions with semi-humid and humid climates influenced by the EASM and contributes to understanding flood warning and the hydrological cycle in regions with similar climatic conditions worldwide.
The paper is organized as follows. The study area and data are described in Section 2, followed by the method in Section 3. Results evaluating the applicability of the model in the two river basins are presented in Section 4. The paper concludes with a discussion and summary of the findings in Section 5.
The EASM affects many countries in East Asia (e.g., China and Japan) and can contribute to flood and drought disasters. Motivated by the limited research on the applicability of the Bayesian-NHMM for predicting daily rainfall in regions influenced by the EASM, two typical regions (the YRB and ZRB, shown in Figure 1) are chosen in this study. The YRB is located at 90°33′-121°25′E, 24°30′-35°45′N; the Yangtze River, with a length of 6380 km, is the longest river in Asia and the third-longest river in the world. The YRB is dominated by semi-humid and humid climates. The ZRB (located at 102°14′-115°53′E, 21°31′-26°49′N), with a drainage area of 4.4 × 10^5 km², is the third-largest river basin in China. The ZRB is mainly affected by a humid climate, with annual precipitation of more than 800 mm.
Daily precipitation data from 1987 to 2016 at precipitation stations in the ZRB and YRB were selected for this study.The data were obtained from the China Meteorological Data Sharing Service System, and the data quality has been checked regularly.The locations of precipitation stations are shown in Figure 1.
Large-scale climate indices (such as the Indian Ocean Dipole [IOD], El Niño-Southern Oscillation [ENSO], and summer North Atlantic Oscillation [SNAO]) were selected to establish teleconnections to daily precipitation. In addition, National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) reanalysis data were used to analyze the underlying causes of rainfall patterns (Kalnay et al., 1996).

| Determination of rainy season
The rainy season is the period when floods occur most often, and rainy-season precipitation can provide some predictability for flood warnings. Because China is prone to floods, predicting rainy-season precipitation is important.
Previous research used threshold values of given indices (such as rainfall, monsoon, or temperature) to determine the rainy season, which is somewhat subjective (Goswami & Xavier, 2005; Reason et al., 2005). Therefore, the multi-scale moving t-test, an objective data-driven method, was applied in this study to detect the rainy season. Many researchers have used this method to detect the rainy season (e.g., Wang et al. (2016) and Cao et al. (2017)). The multi-scale moving t-test identifies a breakpoint between two subsamples of equal size $n$ before and after the breakpoint. The breakpoint statistic can be described as (Fraedrich et al., 1997):
$$t(n,i) = \frac{\bar{x}_{i2} - \bar{x}_{i1}}{\sqrt{\left(s_{i1}^{2} + s_{i2}^{2}\right)/n}} \tag{1}$$
where $\bar{x}_{i1}$ and $\bar{x}_{i2}$ are defined as
$$\bar{x}_{i1} = \frac{1}{n}\sum_{j=i-n}^{i-1} x_j \tag{2}$$
$$\bar{x}_{i2} = \frac{1}{n}\sum_{j=i+1}^{i+n} x_j \tag{3}$$
and $x_i$ is daily rainfall on Julian day $i$ within 1 year for one station. $\bar{x}_{i1}$ and $\bar{x}_{i2}$ are the average values of the subsamples before and after Julian day $i$, respectively, and $s_{i1}^{2}$ and $s_{i2}^{2}$ are the corresponding sample variances. The t-value $t(n,i)$ calculated above was normalized by the 0.01 test value $t_{0.01}$, as shown in Equation (4), which is equal to the result of the Mann-Kendall test at the 0.05 significance level:
$$t_r(n,i) = \frac{t(n,i)}{t_{0.01}} \tag{4}$$
where $t_r(n,i)$ can be taken as the threshold value to detect the onset and retreat of the rainy season. $t_r(n,i) > 1.0$ represents a significant increasing trend, and $t_r(n,i) < -1.0$ a significant decreasing trend. The onset (retreat) of the rainy season in this study was defined as the abrupt change point corresponding to the maximum (minimum) $t_r(n,i)$ value, where precipitation changes from a smaller (higher) to a higher (smaller) value.
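A minimal single-scale sketch of this breakpoint search is given below. It assumes the equal-size two-sample form of the t statistic, (mean after − mean before) / sqrt((s1² + s2²)/n), with day i itself excluded from both windows; the critical value `t_crit` must be supplied by the user (the 2.88 used in the example, roughly the two-sided 0.01 value for 18 degrees of freedom, is an illustrative assumption). A multi-scale version would repeat the search for several window lengths n. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def moving_t(x: Sequence[float], n: int) -> List[float]:
    """t(n, i) for every day i, comparing the n days before i with the n days after i.

    Days without a full window on both sides (or with zero variance) get NaN.
    """
    if n < 2:
        raise ValueError("window length n must be at least 2")
    out: List[float] = []
    for i in range(len(x)):
        if i - n < 0 or i + n + 1 > len(x):
            out.append(math.nan)
            continue
        before = x[i - n:i]          # days i-n .. i-1
        after = x[i + 1:i + n + 1]   # days i+1 .. i+n
        m1, m2 = sum(before) / n, sum(after) / n
        v1 = sum((v - m1) ** 2 for v in before) / (n - 1)  # unbiased sample variances
        v2 = sum((v - m2) ** 2 for v in after) / (n - 1)
        se = math.sqrt((v1 + v2) / n)
        out.append((m2 - m1) / se if se > 0 else math.nan)
    return out


def detect_rainy_season(x: Sequence[float], n: int,
                        t_crit: float) -> Tuple[int, int, List[float]]:
    """Return (onset, retreat, t_r) with t_r = t / t_crit; onset at max t_r, retreat at min t_r."""
    t_r = [t / t_crit for t in moving_t(x, n)]
    valid = [i for i, v in enumerate(t_r) if not math.isnan(v)]
    onset = max(valid, key=lambda i: t_r[i])
    retreat = min(valid, key=lambda i: t_r[i])
    return onset, retreat, t_r


if __name__ == "__main__":
    # Synthetic year: dry (0/1 mm), then wet (5/6 mm), then dry again.
    series = [0.0, 1.0] * 30 + [5.0, 6.0] * 40 + [0.0, 1.0] * 30
    onset, retreat, t_r = detect_rainy_season(series, n=10, t_crit=2.88)
    print(onset, retreat, round(t_r[onset], 2), round(t_r[retreat], 2))
```

In the synthetic series the wet spell occupies days 60 to 139 (0-based), so the detected onset and retreat fall at the window positions that straddle those two transitions.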
In this study, daily precipitation data from 1987 to 2016 at 121 (51) rainfall stations in the YRB (ZRB) were selected to investigate rainy-season precipitation. The average onset and retreat of the rainy season over 1987-2016 for the YRB (ZRB) are Julian days 109 (109) and 255 (248), respectively. These average rainy seasons of the two river basins were chosen as the study periods for predicting precipitation in this paper.

| The Bayesian homogeneous hidden Markov model
In a Bayesian homogeneous hidden Markov model (Bayesian-HMM), data are modeled based on several discrete hidden states undergoing Markovian transitions. In hydrological applications, the hidden states represent an unobserved weather variable that simultaneously affects multisite rainfall occurrence and amount in a region. In this paper, the Bayesian-HMM provides a probabilistic framework for the joint distribution of multisite rainy-season rainfall on a daily time scale in the YRB (and the ZRB). Figure 2 shows the Bayesian-HMM network. $R_{t,s}$ refers to rainy-season precipitation at station $s$ ($s = 1, 2, \ldots, S$) at time $t$ ($t = 1, 2, \ldots, T$). $Z_t \in \{1, \ldots, K\}$ is the hidden Markov state variable at time $t$, where $K$ is the total number of hidden states. $\theta_{z,s}$ represents the set of all parameters of the precipitation distribution specific to hidden state $z$ and station $s$.
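Two assumptions are made for the Bayesian-HMM. The first is that the hidden states undergo a first-order Markov process and the state-to-state transition probability matrix does not change in time:
$$P(Z_t \mid Z_{1:t-1}) = P(Z_t \mid Z_{t-1}), \qquad P(Z_t = j \mid Z_{t-1} = i) = q_{ij} \tag{5}$$
The second is that, given the state at time $t$, the observations at time $t$ are conditionally independent of the states and observations on the days up to time $t$:
$$P(R_t \mid Z_{1:t}, R_{1:t-1}) = P(R_t \mid Z_t) \tag{6}$$
where $R_t = (R_{t,1}, \ldots, R_{t,S})$ is the observed rainy-season rainfall at time $t$ for all stations in a region. Based on these two assumptions, the conditional likelihood of the Bayesian-HMM can be written as:
$$P(R_{1:T}, Z_{1:T} \mid \theta) = \prod_{t=1}^{T} P(Z_t \mid Z_{t-1})\, P(R_t \mid Z_t, \theta) \tag{7}$$
where $P(Z_1 \mid Z_0)$ is the initial state distribution.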
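The likelihood described above can be evaluated numerically for a given state sequence. The sketch below computes the complete-data log-likelihood of a homogeneous HMM, assuming, for illustration only, that each station's rainfall is reduced to a wet (1) / dry (0) indicator and that stations are conditionally independent given the state, each with a state- and station-specific wet probability. `hmm_complete_loglik` and `bernoulli_emission` are hypothetical helper names, not part of the original method.

```python
import math
from typing import Callable, Sequence

EmissionLogPdf = Callable[[Sequence[int], int], float]


def bernoulli_emission(p_wet: Sequence[Sequence[float]]) -> EmissionLogPdf:
    """log P(R_t | Z_t = z) for wet(1)/dry(0) indicators, stations independent given z."""
    def logpdf(r_t: Sequence[int], z: int) -> float:
        return sum(math.log(p if r else 1.0 - p) for r, p in zip(r_t, p_wet[z]))
    return logpdf


def hmm_complete_loglik(states: Sequence[int], obs: Sequence[Sequence[int]],
                        init: Sequence[float], trans: Sequence[Sequence[float]],
                        emission_logpdf: EmissionLogPdf) -> float:
    """log P(Z_1..T, R_1..T) = sum over t of [log P(Z_t | Z_{t-1}) + log P(R_t | Z_t)].

    init plays the role of P(Z_1 | Z_0); trans[i][j] = P(Z_t = j | Z_{t-1} = i)
    is the same for every t (homogeneous model).
    """
    ll = math.log(init[states[0]]) + emission_logpdf(obs[0], states[0])
    for t in range(1, len(states)):
        ll += math.log(trans[states[t - 1]][states[t]])
        ll += emission_logpdf(obs[t], states[t])
    return ll


if __name__ == "__main__":
    emis = bernoulli_emission([[0.1, 0.2], [0.7, 0.9]])  # 2 states x 2 stations
    print(hmm_complete_loglik([0, 1], [(0, 0), (1, 1)], [0.5, 0.5],
                              [[0.9, 0.1], [0.2, 0.8]], emis))
```

Summing this quantity over all possible state sequences (in practice with the forward algorithm) would give the marginal likelihood of the rainfall data.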

| Bayesian nonhomogeneous hidden Markov model
Compared with the Bayesian-HMM, the Bayesian-NHMM introduces additional observed variables into the transition matrix, allowing each transition to have its own logistic coefficients. Thus, the hidden states, and hence rainfall, depend on the observed input variables. In the Bayesian-NHMM, $Q_t$ (with components $q_{ijt}$) is a nonhomogeneous $K \times K$ transition matrix, calculated as:
$$q_{ijt} = P(Z_t = j \mid Z_{t-1} = i, X_t) = \frac{\exp\left(\varepsilon_{ij} + \rho_j^{\top} X_t\right)}{\sum_{k=1}^{K} \exp\left(\varepsilon_{ik} + \rho_k^{\top} X_t\right)} \tag{8}$$
where $X_t = (X_{t,1}, X_{t,2}, \ldots, X_{t,p})$ is a vector of $p$ exogenous large-scale climate variables at time $t$ related to rainy-season rainfall, $\rho_j$ is a $p$-dimensional coefficient vector related to $X_t$, and $\varepsilon_{ij}$ is the bias term in the exponential function.
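A minimal sketch of this multinomial-logistic transition matrix follows, assuming the softmax form in which the logit of moving from state i to state j is ε_ij + ρ_j·X_t (destination-specific predictor coefficients plus a transition-specific bias). The function name, argument layout, and example values are hypothetical. The maximum logit is subtracted before exponentiation only for numerical stability; this leaves the probabilities unchanged.

```python
import math
from typing import List, Sequence


def nhmm_transition_matrix(eps: Sequence[Sequence[float]],
                           rho: Sequence[Sequence[float]],
                           x_t: Sequence[float]) -> List[List[float]]:
    """Q_t with q_ijt = exp(eps[i][j] + rho[j].x_t) / sum_k exp(eps[i][k] + rho[k].x_t).

    eps: K x K biases; rho: K coefficient vectors of length p; x_t: p predictors on day t.
    """
    K = len(eps)
    # The predictor contribution depends only on the destination state j.
    lin = [sum(r * x for r, x in zip(rho[j], x_t)) for j in range(K)]
    Q: List[List[float]] = []
    for i in range(K):
        logits = [eps[i][j] + lin[j] for j in range(K)]
        m = max(logits)  # subtract the largest logit for numerical stability
        w = [math.exp(v - m) for v in logits]
        total = sum(w)
        Q.append([v / total for v in w])
    return Q


if __name__ == "__main__":
    eps = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical "persistence" biases
    rho = [[-0.5], [0.5]]           # one hypothetical predictor, e.g. a climate index
    for x in (-2.0, 0.0, 2.0):
        Q = nhmm_transition_matrix(eps, rho, [x])
        print(x, [[round(q, 3) for q in row] for row in Q])
```

With these example coefficients, a larger index value shifts probability mass toward transitions into state 2 from either state, while each row of Q_t still sums to one.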
$P(R_t \mid Z_t, \theta)$ is the other main component of the model and represents the state-dependent emission distributions. In general, these emission distributions are nonhomogeneous over time because $\theta$ depends on the station-level variables $W_{t,s}$. $W_{t,s}$ represents exogenous variables at time $t$ associated with individual rainfall stations, such as an annual linear trend and seasonal harmonics that capture monsoon seasonality. The emission distribution of $R_t$ was chosen to be a mixture of a delta function at zero, for dry days (days with no rainfall), and a mixture of two gamma distributions, for rainfall amounts on wet days (Kwon et al., 2009; Robertson et al., 2009).
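The zero-inflated emission described above can be sketched for a single station and state as follows. The parameter names (wet-day probability `p_wet`, mixture weight `w`, gamma shapes `a1`, `a2` and scales `s1`, `s2`) are assumptions for illustration; in the Bayesian-NHMM these parameters would additionally depend on the station-level covariates W_{t,s}, which this sketch omits.

```python
import math
import random


def gamma_logpdf(r: float, shape: float, scale: float) -> float:
    """Log density of a gamma distribution with the given shape and scale, for r > 0."""
    return ((shape - 1.0) * math.log(r) - r / scale
            - math.lgamma(shape) - shape * math.log(scale))


def emission_logpdf(r: float, p_wet: float, w: float,
                    a1: float, s1: float, a2: float, s2: float) -> float:
    """log P(r): point mass (1 - p_wet) at r = 0, otherwise p_wet times a two-gamma mixture."""
    if r == 0.0:
        return math.log(1.0 - p_wet)
    mix = (w * math.exp(gamma_logpdf(r, a1, s1))
           + (1.0 - w) * math.exp(gamma_logpdf(r, a2, s2)))
    return math.log(p_wet) + math.log(mix)


def sample_rain(rng: random.Random, p_wet: float, w: float,
                a1: float, s1: float, a2: float, s2: float) -> float:
    """Draw one daily rainfall amount from the same emission distribution."""
    if rng.random() >= p_wet:
        return 0.0  # dry day
    if rng.random() < w:
        return rng.gammavariate(a1, s1)  # gammavariate(alpha=shape, beta=scale)
    return rng.gammavariate(a2, s2)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_rain(rng, 0.6, 0.7, 0.8, 5.0, 2.0, 15.0) for _ in range(20000)]
    wet = [d for d in draws if d > 0]
    # Expected values: wet fraction 0.6, wet-day mean 0.7 * 4 + 0.3 * 30 = 11.8 mm.
    print(len(wet) / len(draws), sum(wet) / len(wet))
```

The second gamma component, with a larger scale, is what gives the wet-day distribution a heavier right tail than a single gamma would.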
Figure 3 shows the Bayesian-NHMM network. The observed values ($R_{t,s}$, $X_t$, and $W_{t,s}$) are shown in gray boxes. The unknown parameters ($\theta$ for the emission distributions and $\delta$ for the transition probabilities) and the hidden states $Z_t$ are shown as circles. $Q_t$, in double circles, is the set of transition matrices determined by the Markov property of the hidden states and the exogenous variables $X_t$. The network shows how the two types of exogenous variables ($X_t$ and $W_{t,s}$) influence the Bayesian-NHMM.
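The conditional likelihood of the data in the model can be written as:
$$P(R_{1:T}, Z_{1:T} \mid \theta, \delta, X_{1:T}, W) = \prod_{t=1}^{T} q_{Z_{t-1} Z_t t}\, P(R_t \mid Z_t, \theta, W_t) \tag{9}$$
where $W_t$ collects the station-level variables $W_{t,s}$ and $q_{Z_0 Z_1 1}$ denotes the initial state distribution.
FIGURE 2 The Bayesian-hidden Markov model network. Dependencies are denoted by arrows with time t going from left to right.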

| Bayesian estimation
Estimation starts with Bayes' theorem:
$$P(\theta \mid R) = \frac{P(R \mid \theta)\, P(\theta)}{P(R)} \propto P(R \mid \theta)\, P(\theta) \tag{10}$$
where $\theta$ is the set of all parameters in the model and $R$ is the precipitation sequence. $P(\theta)$ is the prior distribution, representing our knowledge of $\theta$ before observing $R$; $P(R \mid \theta)$ is the likelihood function; and $P(\theta \mid R)$ is the posterior density.
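For a one-parameter problem, Bayes' rule has a closed form, which makes it a convenient check for sampling methods. The hypothetical example below treats the wet-day probability π at one station as the only parameter, with a Beta(a, b) prior and k wet days out of n, so the posterior is Beta(a + k, b + n − k) with mean (a + k)/(a + b + n). A random-walk Metropolis sampler, one simple MCMC scheme and not the specific sampler described by Holsclaw et al. (2017), should approximately recover that mean.

```python
import math
import random
from typing import List


def log_posterior(pi: float, k: int, n: int, a: float, b: float) -> float:
    """Unnormalized log posterior: log likelihood (k wet of n days) + log Beta(a, b) prior."""
    if not 0.0 < pi < 1.0:
        return -math.inf
    return (a + k - 1.0) * math.log(pi) + (b + n - k - 1.0) * math.log(1.0 - pi)


def metropolis(k: int, n: int, a: float, b: float,
               n_iter: int = 20000, step: float = 0.05, seed: int = 1) -> List[float]:
    """Random-walk Metropolis draws from the posterior of the wet-day probability."""
    rng = random.Random(seed)
    pi = 0.5
    lp = log_posterior(pi, k, n, a, b)
    draws: List[float] = []
    for _ in range(n_iter):
        prop = pi + rng.gauss(0.0, step)  # symmetric proposal
        lp_prop = log_posterior(prop, k, n, a, b)
        # Accept with probability min(1, posterior ratio); P(R) cancels in the ratio.
        if lp_prop >= lp or rng.random() < math.exp(lp_prop - lp):
            pi, lp = prop, lp_prop
        draws.append(pi)
    return draws


if __name__ == "__main__":
    k, n, a, b = 30, 100, 1.0, 1.0
    kept = metropolis(k, n, a, b)[2000:]  # discard burn-in
    print(sum(kept) / len(kept), (a + k) / (a + b + n))
```

Only the unnormalized posterior P(R | θ)P(θ) is needed, which is why MCMC remains usable when the normalizing constant P(R) has no closed form.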
Equation (10) has analytical solutions only for simple problems; complex models have no closed-form solutions. Consequently, the Markov chain Monte Carlo (MCMC) method is applied to draw samples from the posterior densities of interest. Details of how MCMC is used in the Bayesian-NHMM are given by Holsclaw et al. (2017). The steps to construct the Bayesian-HMM and Bayesian-NHMMs are briefly described as follows. First, the Bayesian-HMM is used to identify the number of hidden states in the YRB (and ZRB). In this phase, the daily precipitation amount and precipitation occurrence probability of each rainfall state in the two river basins are calculated. The interannual variability of the hidden states is then presented using the Viterbi algorithm, which finds the state sequence that maximizes $P(R_{1:T}, Z_{1:T})$, that is, the most probable sequence of rainfall states (Forney, 1973; Rabiner & Juang, 1986). The detailed programming algorithm of the Viterbi algorithm is provided by Kirshner (2005).
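The Viterbi recursion can be sketched in log space as follows. This hypothetical helper assumes a homogeneous transition matrix with strictly positive entries and a user-supplied emission log-density; for the Bayesian-NHMM, the fixed matrix would be replaced by the day-specific matrix Q_t. It returns the most probable state sequence and its joint log-probability.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple


def viterbi(obs: Sequence[Any], init: Sequence[float],
            trans: Sequence[Sequence[float]],
            emission_logpdf: Callable[[Any, int], float]) -> Tuple[List[int], float]:
    """Most probable state path argmax_Z P(R_1..T, Z_1..T) for a homogeneous HMM."""
    K = len(init)
    delta = [math.log(init[k]) + emission_logpdf(obs[0], k) for k in range(K)]
    backptr: List[List[int]] = []
    for t in range(1, len(obs)):
        new_delta: List[float] = []
        ptr: List[int] = []
        for j in range(K):
            # Score of reaching state j on day t from each possible state i on day t-1.
            scores = [delta[i] + math.log(trans[i][j]) for i in range(K)]
            i_best = max(range(K), key=scores.__getitem__)
            ptr.append(i_best)
            new_delta.append(scores[i_best] + emission_logpdf(obs[t], j))
        backptr.append(ptr)
        delta = new_delta
    last = max(range(K), key=delta.__getitem__)
    path = [last]
    for ptr in reversed(backptr):  # trace the best predecessors back to day 1
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


if __name__ == "__main__":
    p_wet = [0.1, 0.9]  # hypothetical single-station wet probabilities for 2 states

    def emis(r: int, k: int) -> float:
        return math.log(p_wet[k] if r else 1.0 - p_wet[k])

    print(viterbi([0, 0, 0, 1, 1, 1], [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], emis))
```

Working with log-probabilities avoids numerical underflow over long daily records, where products of many probabilities quickly fall below floating-point range.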
Then, large-scale atmospheric variables associated with daily precipitation during the rainy season in the two regions are chosen as inputs to the Bayesian-NHMM. Bayesian-NHMMs with different combinations of candidate atmospheric predictors were evaluated in terms of seasonality and interannual variability. All models are evaluated by five-fold cross-validation: the 30-year record is divided into five consecutive 6-year periods, and in each fold, four periods (24 years) are used to fit the model parameters and the remaining 6-year period is used to validate the predictions. For every model, 1000 predictive runs are generated.
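The fold layout can be written down explicitly. The sketch below assumes that the 30 years (1987-2016) are split into five consecutive, non-overlapping 6-year blocks, with each block held out once for validation while the other 24 years are used for fitting; the function name is hypothetical.

```python
from typing import List, Tuple


def block_folds(first_year: int = 1987, last_year: int = 2016,
                block: int = 6) -> List[Tuple[List[int], List[int]]]:
    """Return (train_years, test_years) pairs for leave-one-block-out cross-validation."""
    years = list(range(first_year, last_year + 1))
    if len(years) % block:
        raise ValueError("record length must be a multiple of the block length")
    folds = []
    for start in range(0, len(years), block):
        test = years[start:start + block]               # held-out 6-year block
        train = years[:start] + years[start + block:]   # remaining 24 years
        folds.append((train, test))
    return folds


if __name__ == "__main__":
    for train, test in block_folds():
        print(f"validate {test[0]}-{test[-1]}, fit on {len(train)} years")
```

Keeping each validation block consecutive, rather than sampling years at random, preserves the multi-year structure that the interannual evaluation is meant to test.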

| Identification of the number of states and spatial rainfall dependence over the YRB and ZRB
To fit the Bayesian-HMM, the number of hidden states for each of the two river basins is selected first. The model was fitted with different numbers of rainfall states (K = 1, …, 12), and the K value minimizing the Bayesian information criterion (BIC) was chosen. Other researchers (such as Holsclaw et al., 2016) also experimented with the deviance information criterion (DIC) but found that DIC consistently preferred the models with the largest number of parameters because of non-normality in the parameter distributions. Consequently, BIC is often used to select the number of hidden states in the HMM and NHMM. BIC is defined as the negative log-likelihood of the data plus a penalty for model complexity (Schwarz, 1978); in its usual form, BIC = −2 ln L̂ + k ln N, where L̂ is the maximized likelihood, k is the number of free parameters, and N is the number of observations. For the YRB, the Bayesian-HMM has the minimum BIC value at K = 10. For the ZRB, the BIC value decreases sharply up to K = 4 and levels off for K > 4, so four states are selected for the Zhujiang model. The ZRB is smaller and thus more spatially homogeneous, so this smaller number of states is expected.
FIGURE 7 The seasonality of states averaged over 30 years and the interannual variability of states for all stations in the Yangtze River basin.
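A minimal sketch of BIC-based selection of K follows. The log-likelihoods, parameter counts, and number of observations in the example are hypothetical placeholders, not values from this study; how N is counted for multisite daily data (for example, days or station-days) is a modeling choice the sketch leaves to the user.

```python
import math
from typing import Dict, Tuple


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 ln L + k ln N (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def select_k(fits: Dict[int, Tuple[float, int]], n_obs: int) -> int:
    """Return the number of states K whose fitted model has the smallest BIC.

    fits maps K -> (maximized log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda k: bic(fits[k][0], fits[k][1], n_obs))


if __name__ == "__main__":
    hypothetical_fits = {1: (-1000.0, 5), 2: (-900.0, 12), 3: (-890.0, 21)}
    for k, (ll, p) in hypothetical_fits.items():
        print(k, round(bic(ll, p, 1000), 2))
    print("selected K =", select_k(hypothetical_fits, 1000))
```

In this toy example the third state improves the log-likelihood by only 10 units, which does not offset the penalty for its 9 extra parameters, so K = 2 is selected.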

| Spatial rainfall dependence over the YRB
Rainfall states represent the seasonality of station-scale precipitation and patterns of sub-seasonal weather changes. The 1000 predictive runs generated for the Bayesian-HMM give N = 1000 different samples of the hidden states $z_t^{(n)}$ ($n = 1, \ldots, N$). To generate the figures in this and the next subsection, the authors assign each day t to its most probable state across the N samples, as found by the Viterbi algorithm. Figure 5 shows the probability of precipitation for the 10 states over the YRB. State 1 has the smallest rainfall occurrence probability, which is lower than 0.2 for the Middle and Lower YRB and 0.2-0.4 for the Upper YRB. States 2-3 show patterns similar to State 1, with a higher probability of rainfall over the whole region (0.4-0.6 for the Upper YRB and 0.2-0.4 for the Middle and Lower YRB). In State 4, precipitation occurrence probability ranges from 0.2 to 0.4 in the Middle YRB and from 0.4 to 0.6 in other regions, whereas State 5 shows the opposite pattern. The probability of rainfall is higher in States 6-10 in different parts of the YRB. In general, the states are loosely ordered from the lowest to the highest precipitation occurrence probability.
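One simple way to collapse N sampled state sequences into a single daily state, and then to summarize per-state rainfall occurrence at each station, is sketched below. This is an assumption about the post-processing, not necessarily the authors' exact procedure: it takes the most frequent state on each day (ties broken toward the smaller state label) and computes the empirical fraction of days in each state on which a station exceeds a wet-day threshold, set here to 0 mm as a hypothetical default.

```python
from collections import Counter
from typing import List, Optional, Sequence


def modal_states(samples: Sequence[Sequence[int]]) -> List[int]:
    """samples[n][t] is the state of sample n on day t; return the most frequent state per day."""
    modes = []
    for t in range(len(samples[0])):
        counts = Counter(sample[t] for sample in samples)
        # Highest count wins; among ties, the smaller state label wins.
        modes.append(max(counts, key=lambda z: (counts[z], -z)))
    return modes


def occurrence_probability(rain: Sequence[Sequence[float]], states: Sequence[int],
                           k: int, wet_threshold: float = 0.0) -> Optional[List[float]]:
    """Fraction of days assigned to state k on which each station is wet (rain[t][s] > threshold)."""
    days = [t for t, z in enumerate(states) if z == k]
    if not days:
        return None  # state k is never assigned
    n_stations = len(rain[0])
    return [sum(rain[t][s] > wet_threshold for t in days) / len(days)
            for s in range(n_stations)]


if __name__ == "__main__":
    samples = [[0, 1, 1], [0, 1, 0], [1, 1, 0]]   # 3 hypothetical samples x 3 days
    rain = [[0.0, 5.0], [2.0, 3.0], [0.0, 0.0]]   # 3 days x 2 stations (mm)
    z = modal_states(samples)
    print(z, occurrence_probability(rain, z, 0), occurrence_probability(rain, z, 1))
```

Maps such as those in Figures 4 and 5 could then be drawn by evaluating these per-station summaries for each state.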
The most probable daily sequences of the 10 states are plotted in Figure 6. Generally, Figure 6 shows that the rainfall state sequences vary considerably at both seasonal (horizontally in Figure 6) and interannual (vertically in Figure 6) time scales, as also demonstrated in Figure 7.

| Spatial rainfall dependence over the ZRB
Figure 8 shows the mean daily rainfall amount for each of the four states for the ZRB. State 1 represents dry days during the rainy season, with a mean daily rainfall amount of less than 3 mm at all stations. State 2 represents wet days in the western part of the ZRB, with rainfall amounts ranging from 3 to 9 mm, while the other stations remain dry. In contrast, State 3 represents large (small) daily rainfall in the eastern (western) part of the ZRB. State 4 represents days with large rainfall amounts (9-27 mm) at all stations.
The occurrence probability of the four rainfall states over the ZRB is presented in Figure 9, which corresponds to Figure 8. State 1 shows the smallest precipitation occurrence probability (less than 0.2). The probability of rainfall occurrence is generally higher for State 2, especially in the western ZRB, where it ranges from 0.4 to 0.8. In contrast, State 3 shows no obvious change in rainfall occurrence probability in the western ZRB, with the eastern ZRB showing more probable precipitation. State 4 has the largest rainfall occurrence probability over the whole river basin (0.6-1.0).
Figures 10 and 11 show the estimated rainfall state sequence and its seasonality and interannual variability for the Zhujiang River basin, analogous to Figures 6 and 7. The sequence is less noisy than that for the YRB, and the average seasonality and interannual variability are much easier to discern. As shown in Figures 10 and 11, the occurrence frequency of States 1 and 3 decreased from the beginning of the rainy season to its middle phase and then increased. In contrast, States 2 and 4 occurred most frequently in the middle phase of the rainy season. At the interannual scale, there is no obvious trend in the occurrence frequency of the hidden states.
… lagged influences on precipitation, so correlations between preceding indices (lagging 1-12 months) and precipitation were all investigated. Predictors whose p-values are smaller than 0.1, taken at the lag with the strongest correlation among the 12 lagged periods, are chosen for the Bayesian-NHMM (shown in Tables 1 and 3).
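The lag screening can be sketched with a Spearman rank correlation computed for each candidate lag. In the hypothetical helper below, `index_by_lag[m]` holds one value of a climate index per year at a lead of m months, aligned with the yearly rainy-season totals, and the lag with the largest absolute r_s is kept. The p < 0.1 screen is not reproduced here because it needs a t or permutation distribution; `scipy.stats.spearmanr`, for instance, also returns a p-value.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else math.nan


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's r_s: the Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def best_lag(rain_totals: Sequence[float],
             index_by_lag: Dict[int, Sequence[float]]) -> Tuple[int, float]:
    """Return (lag, r_s) for the lag whose index has the strongest |r_s| with rainfall."""
    scores = {lag: spearman(idx, rain_totals) for lag, idx in index_by_lag.items()}
    valid = [lag for lag, r in scores.items() if not math.isnan(r)]
    lag = max(valid, key=lambda m: abs(scores[m]))
    return lag, scores[lag]


if __name__ == "__main__":
    rain = [410.0, 520.0, 380.0, 610.0, 450.0]  # hypothetical yearly totals (mm)
    idx = {1: [0.2, -0.1, 0.4, 0.0, 0.3], 6: [-0.5, 0.4, -0.9, 1.1, -0.2]}
    print(best_lag(rain, idx))
```

Ranking by absolute value keeps negatively correlated indices, such as SNAO for the ZRB, in contention.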
Four predictors (Niño 1-2, IOD, NAO, and Niño 3) were selected as inputs to the models for the YRB (Table 1). Niño 1-2 has the strongest correlation with rainy-season precipitation over the YRB ($r_s$ = 0.49) and is significant at the 0.01 significance level. The IOD at a 6-month lag relative to the rainy season also has a strong relationship with precipitation over the YRB ($r_s$ = 0.45) and is significant at the 0.05 significance level. The correlation coefficients of NAO and Niño 3 with rainfall are 0.37 and 0.36, respectively, both significant at the 0.1 significance level. As listed in Table 2, models with five different combinations of predictors were set up to compare the ability of the Bayesian-HMM and Bayesian-NHMMs to simulate rainfall. The models include the Bayesian-HMM without predictors and Bayesian-NHMMs with one (Niño 1-2), two (Niño 1-2 + IOD), three (Niño 1-2 + IOD + NAO), and four (Niño 1-2 + IOD + NAO + Niño 3) predictors.
As listed in Table 3, three large-scale indices (SNAO, IOD, and NAO) were chosen as inputs to the models for the ZRB. SNAO and NAO are both negatively correlated with rainy-season rainfall over the ZRB, significant at the 0.01 and 0.1 significance levels, respectively. SNAO also has the strongest relationship with precipitation ($r_s$ = −0.52). IOD is positively correlated with precipitation over the ZRB ($r_s$ = 0.37), significant at the 0.1 significance level. As listed in Table 4, models with four different combinations of predictors were set up: the Bayesian-HMM without predictors and Bayesian-NHMMs with one (SNAO), two (SNAO + IOD), and three (SNAO + IOD + NAO) predictors.

| Seasonality
The ability of the selected models to reproduce the seasonal cycle over the YRB and the ZRB during the rainy season is assessed over 30 years, with red lines showing the simulated average rainy-season precipitation. The range of precipitation variability across the 1000 simulations of each model is shown by the gray-shaded areas, which are the 50% and 95% confidence bands. The coefficient of variation of the root mean squared error (CVRMSE) of each model is calculated to compare model performance. The CVRMSE is defined as follows:
$$\mathrm{CVRMSE} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - O_i\right)^2}}{\bar{O}} \tag{11}$$
where $O_i$ and $S_i$ are the observed and mean simulated values at time step $i$, $n$ is the number of time steps, and $\bar{O}$ is the mean of the observed values.
Figure 12 shows that the Bayesian-HMM cannot reproduce the seasonal cycle of daily rainfall during the rainy season over the YRB; it also has the largest CVRMSE value (0.236) compared with the Bayesian-NHMMs. In contrast, the seasonality of rainy-season precipitation is well captured by the Bayesian-NHMMs with different combinations of predictors, with CVRMSE values ranging from 0.17 to 0.174. The Bayesian-NHMM with the predictors Niño 1-2 and IOD performs best in reproducing rainfall seasonality, with the smallest CVRMSE value (0.17).
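The CVRMSE used in these comparisons can be computed directly. The sketch below assumes the common definition, the root mean squared difference between the mean simulated and observed series divided by the mean of the observations; the function name and example values are hypothetical.

```python
import math
from typing import Sequence


def cvrmse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Root mean squared error of simulated vs. observed values, divided by the observed mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    n = len(observed)
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(observed, simulated)) / n)
    return rmse / (sum(observed) / n)


if __name__ == "__main__":
    obs = [2.0, 4.0, 6.0]  # hypothetical daily means (mm)
    sim = [3.0, 4.0, 5.0]  # hypothetical ensemble-mean simulation (mm)
    print(round(cvrmse(obs, sim), 4))
```

Normalizing by the observed mean makes the score dimensionless, so values can be compared between the two basins even though their rainfall magnitudes differ.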
The multi-year mean precipitation during the rainy season over the YRB has a single peak, with rainfall reaching its largest amount in the middle of the rainy season (Ding & Wang, 2008). The mean simulated rainfall of each Bayesian-NHMM is consistent with this characteristic, with most observed values within the 95% confidence band. However, the Bayesian-NHMMs underestimate extreme rainfall amounts during the rainy season over the YRB.
Figure 13 demonstrates the ability of the Bayesian-HMM and Bayesian-NHMMs to simulate the seasonal cycle of daily rainy-season rainfall over the ZRB. As for the YRB, the Bayesian-HMM fails to capture rainfall features over the ZRB, with a larger CVRMSE

| Interannual variability
In this section, the models' abilities to capture the interannual variability of precipitation amount and wet days in the two river basins are evaluated (Figures 14-17). In these figures, blue lines represent the observed interannual variation of rainy-season rainfall amount/wet days, and red lines show the average simulated rainy-season precipitation/wet days for each year. The range of precipitation/wet-day variability across the 1000 simulations of each model is shown by the gray-shaded areas, which represent the interquartile range.
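Ensemble bands of this kind can be computed per time step from the simulated runs. The sketch below uses Python's `statistics.quantiles` with the "inclusive" method (one of several quantile conventions, chosen here as an assumption) to return the 2.5th, 25th, 50th, 75th, and 97.5th percentiles; the data layout `runs[m][t]` and the example data are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Band = Tuple[float, float, float, float, float]


def ensemble_bands(runs: Sequence[Sequence[float]]) -> List[Band]:
    """For each time step, return (p2.5, p25, median, p75, p97.5) across the simulated runs."""
    bands: List[Band] = []
    for t in range(len(runs[0])):
        values = [run[t] for run in runs]
        cuts = statistics.quantiles(values, n=40, method="inclusive")  # 39 cut points at i/40
        # cuts[k] is the (k + 1)/40 quantile: 2.5 %, 25 %, 50 %, 75 %, 97.5 %.
        bands.append((cuts[0], cuts[9], cuts[19], cuts[29], cuts[38]))
    return bands


if __name__ == "__main__":
    rng = random.Random(42)
    # 1000 hypothetical runs of yearly totals for 5 years.
    runs = [[rng.gammavariate(2.0, 3.0) for _ in range(5)] for _ in range(1000)]
    for t, band in enumerate(ensemble_bands(runs)):
        print(t, [round(v, 2) for v in band])
```

The 25th-75th pair gives the interquartile range described here, and the outer pair gives a 95% band.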
Figure 14 shows observed and simulated precipitation during the rainy season at the interannual scale for the YRB. As shown in the figure, the Bayesian-HMM cannot capture the interannual variability of daily rainfall and has the largest CVRMSE value (0.093) among the models. The Bayesian-NHMMs with different combinations of predictors perform similarly in simulating interannual precipitation amounts, with CVRMSE values ranging from 0.084 to 0.088. The simulated rainfall amount and the variability of the confidence band reflect the interannual variation of observed rainy-season precipitation well. The one-predictor (Niño 1-2) Bayesian-NHMM, with the smallest CVRMSE value (0.084), performs better than models with more predictors. Most observed values are within the 95% confidence band.
Figure 15 presents the same type of plots as Figure 14, showing the interannual variability of rainy-season rainfall amounts for the Zhujiang River basin. At the interannual scale, the Bayesian-HMM has the largest CVRMSE value (0.144) and the worst performance in simulating rainfall among the selected models. In contrast, the Bayesian-NHMMs with different predictors all produce good results in simulating rainfall amounts at the interannual scale. Among these, the Bayesian-NHMM with three predictors (SNAO + IOD + NAO) fits the observations better than the models with one or two predictors, with the smallest CVRMSE value (0.135). The variability of the simulated values and the confidence bands reflects the variation of the observed rainfall amount. The ability of the models with one and two predictors to simulate interannual precipitation is not good enough, with CVRMSE values of 0.144 and 0.143, respectively.
Figure 16 shows the interannual variation of wet days for the Yangtze River basin. The Bayesian-HMM has the worst performance in simulating interannual wet days among the models, with a CVRMSE value of 0.07. In the Bayesian-HMM, the trend of simulated interannual wet days is unclear and does not correspond to the observed wet days. In contrast, the Bayesian-NHMMs with different predictors all reflect the variability of interannual wet days well, with CVRMSE values of approximately 0.06. Among the Bayesian-NHMMs, the model with two predictors (Niño 1-2 + IOD) performs best in simulating wet days.
Figure 17 shows the same type of plots as Figure 16, presenting the interannual variability of wet days during the rainy season for the Zhujiang River basin. The Bayesian-HMM without predictors again performs worst in simulating wet days at the interannual scale, with the largest CVRMSE value (0.091). The variation of simulated wet days does not reflect the variability of the observed values. Similarly, the Bayesian-NHMMs with one or two predictors also have weak skill in simulating wet days, with CVRMSE values of about 0.09. In contrast, the Bayesian-NHMM with three predictors (SNAO + IOD + NAO) performs better than the other models. Most observed extreme values are within the 95% confidence band, and the changes in simulated wet days reflect the observed values.

| SUMMARY AND DISCUSSION
The suitability of a new model (the Bayesian-NHMM) for simulating multisite precipitation under different climatic conditions in East Asia is examined in this paper. Two river basins dominated by different climates (the YRB and the ZRB) were chosen as study areas.
To gain insight into how the hidden rainfall patterns relate to weather states, composite atmospheric anomalies at 850-hPa geopotential height for the four hidden states over the ZRB, where the variation of hidden states is much clearer than in the YRB, are presented in Figure 18. Previous research examined different circulation patterns at 200, 500, and 850 hPa in East Asia and found that heavy rainfall is mainly related to wind anomalies at the 850-hPa level (e.g., Li et al. (2016)). As presented in Figure 18, State 1 shows the weakest monsoon and moisture flux near the ZRB. State 2 shows a circulation pattern similar to that of State 3, both with a stronger monsoon and moisture flux than State 1. State 4 represents the strongest monsoon. A stronger monsoon brings more moisture to a region and can contribute to more precipitation (Cao et al., 2019). The spatial patterns of the monsoon and moisture flux in Figure 18 correspond to the variability of the four hidden rainfall states over the ZRB (Figures 8 and 9), suggesting a physical linkage between the hidden states and atmospheric forcing factors.
The Bayesian-NHMM was then applied to simulate precipitation using different combinations of atmospheric predictors, and its performance was evaluated by its ability to capture the seasonality and interannual variability of precipitation. The results show that the Bayesian-NHMM captures this variability of rainfall over the YRB and ZRB better than the Bayesian-HMM, indicating that the Bayesian-NHMM performs well in simulating the seasonality and interannual variability of precipitation under different types of climate in East Asia. Extreme precipitation values are not well captured by the Bayesian-NHMMs, because the Bayesian-NHMM is not specifically designed for simulating extreme events and relies solely on the tail behavior of the gamma distribution to model extremes (Holsclaw et al., 2016). Consequently, it is not recommended as a replacement for extreme value theory models when extremes are of primary interest.

FIGURE 17 Comparison between interannual variability of observed (blue) and simulated (red) wet days averaged for all stations during the period of 1987-2016 over the Zhujiang River basin.
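Why extremes hinge on the gamma tail can be seen from the emission model of a single station in a single state. The sketch assumes a Bernoulli-gamma form (a wet-day probability times a gamma distribution of wet-day amounts), which is the usual setup in HMM rainfall models but is stated here as an assumption; the parameter values are hypothetical. Because dry days contribute zero rainfall, the probability of exceeding any positive threshold is the wet-day probability multiplied by the gamma survival function, so nothing else in the model controls the upper tail.

```python
from scipy.stats import gamma


def exceedance_probability(threshold_mm, p_wet, shape, scale):
    """P(daily rainfall > threshold) under an assumed Bernoulli-gamma emission.

    For threshold >= 0 only wet days can exceed it:
        P = p_wet * P(Gamma(shape, scale) > threshold)
    """
    if threshold_mm < 0:
        return 1.0  # every day (including dry days with 0 mm) exceeds a negative threshold
    return p_wet * gamma.sf(threshold_mm, a=shape, scale=scale)


# Hypothetical parameters for one station in a wet state
for x in (10.0, 50.0, 100.0):
    print(x, exceedance_probability(x, p_wet=0.6, shape=0.7, scale=15.0))
```

The gamma survival function decays roughly exponentially, which is lighter than the tails that extreme value models often fit to observed heavy rainfall; this is consistent with the section's caution about using the model for extremes.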

FIGURE 3 The Bayesian-NHMM network. Dependencies are denoted by arrows, with time t going from left to right.

FIGURE 4 Daily rainfall amounts for 10 selected states in the Yangtze River basin. The dots in the figure demonstrate not only the location of rainfall stations but also the magnitude of daily precipitation amount.

FIGURE 5 Rainfall occurrence probability for 10 selected states in the Yangtze River basin. The dots in the figure demonstrate not only the location of rainfall stations but also the magnitude of rainfall occurrence probability.

FIGURE 6 The Viterbi sequence of most likely states from 1987 to 2016 for the Yangtze River basin.

Figure 4 shows the average daily precipitation amount for each of the 10 states for the YRB. States are loosely ordered from light to heavy rain, with State 1 showing the smallest rainfall amount (0-3 mm) over the whole region. Compared with State 1, the mean daily precipitation amount for States 2-4 increases in different parts of the YRB (the Upper YRB, the northern part of the Middle YRB, and the Lower YRB, respectively), with increases ranging from 3 to 9 mm. State 5 represents a larger daily rainy-season precipitation amount (3-6 mm) at all stations compared with State 1. States 6-10 show increased precipitation amounts in different regions relative to State 5, with increases in daily precipitation ranging from 9 to 29 mm. Figure 5 shows the probability of precipitation for the 10 states over the YRB. State 1 has the smallest rainfall occurrence probability, lower than 0.2 for the Middle and Lower YRB and 0.2-0.4 for the Upper YRB. States 2-3 show patterns similar to State 1, with a higher probability of rainfall over the whole region (0.4-0.6 for …
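Maps such as Figures 4 and 5 summarize, for each state, a mean daily rainfall amount and a rain-occurrence probability at every station. One simple empirical way to produce such state-conditional summaries is to average the station records over the days assigned to each state by the Viterbi path. This is an assumption for illustration: the paper may instead plot fitted emission parameters, and the 1 mm wet-day threshold and the function name are hypothetical.

```python
import numpy as np


def state_rainfall_summary(rain, states, n_states, wet_threshold=1.0):
    """Per-state mean daily rainfall and rain-occurrence probability at each station.

    rain:          (n_days, n_stations) daily precipitation in mm
    states:        (n_days,) hidden-state index per day, e.g. a Viterbi path
    wet_threshold: hypothetical wet-day threshold in mm
    Returns (mean_amount, p_occurrence), each of shape (n_states, n_stations);
    states with no assigned days are left as NaN.
    """
    rain = np.asarray(rain, dtype=float)
    states = np.asarray(states)
    mean_amount = np.full((n_states, rain.shape[1]), np.nan)
    p_occurrence = np.full((n_states, rain.shape[1]), np.nan)
    for k in range(n_states):
        days = rain[states == k]  # rows (days) assigned to state k
        if days.size:
            mean_amount[k] = days.mean(axis=0)
            p_occurrence[k] = (days >= wet_threshold).mean(axis=0)
    return mean_amount, p_occurrence
```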

FIGURE 8 Daily rainfall amounts for four selected states in the Zhujiang River basin. The dots in the figure demonstrate not only the location of rainfall stations but also the magnitude of daily precipitation amount.
in detail.As shown in Figure 7a, State 1 occurred most frequently near the beginning end of the rainy season.The frequency of State 2, State 4, State 6, and State 7 has no obvious trend during the whole rainy season.In contrast, State 3, State 9, and State 10 showed most frequently from the middle to the end of rainy season, which means that the increase of rainfall amount occurring in the Middle YRB usually happened from June to August.Figure 7b presented the interannual variability of 10 states for the YRB and there is no obvious trend.

FIGURE 9 Rainfall occurrence probability for four selected states in the Zhujiang River basin. The dots in the figure demonstrate not only the location of rainfall stations but also the magnitude of rainfall occurrence probability.

4.2 | Daily precipitation calibration and validation over the YRB and ZRB

4.2.1 | Selection of the potential predictors for the Bayesian-NHMM

The first step in building a Bayesian-NHMM is to choose potential predictors. Many large-scale climate indices affect precipitation over the YRB and the ZRB, such as ENSO, the North Atlantic Oscillation (NAO), and the Indian Ocean Dipole (IOD). Xiao et al. (2015) investigated the effects of ENSO, NAO, IOD, and the Pacific Decadal Oscillation (PDO) on seasonal rainfall regimes over the YRB. They found that ENSO is the leading driver of seasonal rainfall variations and that the influences of ENSO, PDO, NAO, and IOD on the seasonal occurrence and intensity of rainfall events are complex. Gu et al. (2017) gave new insight into the relationships between large-scale climate indices (IOD, NAO, and PDO) and hydrological processes over the ZRB and argued that river flows in different regions of the ZRB are significantly related to different large-scale climate indices. Consequently, eight climate indices related to the Pacific, Atlantic, and Indian Oceans (i.e., Niño 1-2, Niño 3, Niño 3.4, Niño 4, IOD, PDO, NAO, and SNAO) are selected in this paper to analyze their correlations with rainy-season precipitation over the YRB and ZRB. Besides, large-scale indices have …

FIGURE 10 The Viterbi sequence of most likely states from 1987 to 2016 for the Zhujiang River basin.

FIGURE 11 The seasonality of states averaged over 30 years and the interannual variability of states for all stations in the Zhujiang River basin.

TABLE 1 Correlation between area rainfall and climate predictors for the YRB.

… Figures 12 and 13, respectively. The blue lines in the figures represent the seasonality of observed rainfall during the …

TABLE 3 Correlation between area rainfall and climate predictors for the ZRB.

FIGURE 12 Comparison between the seasonal cycle of observed (blue) and simulated (red) rainfall amount averaged for all stations in the period of 1987-2016 over the Yangtze River basin.

FIGURE 13 Comparison between the seasonal cycle of observed (blue) and simulated (red) rainfall amount averaged for all stations in the period of 1987-2016 over the Zhujiang River basin.

… value of 0.225 as compared to the Bayesian-NHMMs. The Bayesian-NHMMs perform better in simulating the seasonality of the rainy season over the ZRB (with CVRMSE values ranging from 0.161 to 0.165) and its extreme values than over the YRB.

FIGURE 14 Comparison between interannual variability of observed (blue) and simulated (red) rainfall amounts averaged for all stations from 1987 to 2016 over the Yangtze River basin.

FIGURE 15 Comparison between interannual variability of observed (blue) and simulated (red) rainfall amount averaged for all stations from 1987 to 2016 over the Zhujiang River basin.

Two river basins in China (the YRB and the ZRB) with different types of climate were chosen: the YRB is dominated by semi-humid and humid climates, whereas the ZRB has a humid climate. In comparison to the traditional NHMM, the new model allows exogenous variables to influence both the transition probabilities of the hidden rainfall states and the emission distribution. The Bayesian-HMM was applied first to determine the rainfall states over the two river basins. A 10-state model was selected for the YRB and a four-state model for the ZRB. The larger number of precipitation states in the YRB is consistent with its larger number of rainfall stations and its smaller spatial coherence, due to more complex climate conditions, compared with the ZRB. Holsclaw et al. (2016) applied the Bayesian-HMM to predict daily rainfall in the Upper YRB and chose six hidden states, whereas more rainfall states (K = 10) were selected in this paper. The difference between this study and that of Holsclaw et al. (2016) arises from two aspects: (1) a larger region (the whole YRB rather than the Upper YRB) was chosen in this paper, which has larger precipitation variability because of its more complex climate; and (2) the rainy season was chosen as the study period in this paper, whereas Holsclaw et al. (2016) selected summer rainfall.

The spatial rainfall dependence and temporal variability of the rainfall states over the two river basins were then examined. The spatial patterns and temporal variations of the hidden states over the YRB are extremely noisy, which is consistent with Holsclaw et al. (2016), who reported no trends in the occurrence frequencies of hidden states in the Upper YRB due to the large complexity of seasonal and interannual precipitation variability. In contrast, the spatial variability of the hidden rainfall states over the ZRB is much clearer, with states loosely representing regional north-south or east-west gradients in rainfall amount and rainfall occurrence probability. In terms of interannual variability, the extreme values of the occurrence frequency of hidden states are consistent with the occurrence of floods and droughts over the ZRB. Zhi-Yong et al. (2013) found that extreme floods occurred in the whole ZRB in June 1994, which corresponds to the high frequency of State 4 (the wettest state) and the low frequency of State 1 (the driest state) in 1994 shown in Figure 11. Similarly, drought disasters occurred in 2003 and 2004 over the ZRB, consistent with the higher occurrence frequency of State 1 and the lower occurrence frequency of State 4 in those years. The variability of hidden states could therefore provide important information on the occurrence of floods and droughts.

FIGURE 16 Comparison between interannual variability of observed (blue) and simulated (red) wet days averaged for all stations during the period of 1987-2016 over the Yangtze River basin.