Trends, Shifting, or Oscillations? Stochastic Modeling of Nonstationary Time Series for Future Water‐Related Risk Management

Hydrological time series often present nonstationarities such as trends, shifts, or oscillations due to anthropogenic effects and hydroclimatological variations, including global climate change. For water managers, it is crucial to recognize and define the nonstationarities in hydrological records. The nonstationarities must be appropriately modeled and stochastically simulated according to the characteristics of observed records to evaluate the adequacy of flood risk mitigation measures and future water resources management strategies. Therefore, in the current study, three approaches were suggested to address stochastically nonstationary behaviors, especially in the long‐term variability of hydrological variables: as an overall trend, shifting mean, or as a long‐term oscillation. To represent these options for hydrological variables, the autoregressive model with an overall trend, shifting mean level (SML), and empirical mode decomposition with nonstationary oscillation resampling (EMD‐NSOR) were employed in the hydrological series of the net basin supply in the Lake Champlain‐River Richelieu basin, where the International Joint Committee recently managed and significant flood damage from long consistent high flows occurred. The detailed results indicate that the EMD‐NSOR model can be an appropriate option by reproducing long‐term dependence statistics and generating manageable scenarios, while the SML model does not properly reproduce the observed long‐term dependence, that are critical to simulate sustainable flood events. The trend model produces too many risks for floods in the future but no risk for droughts. The overall results conclude that the nonstationarities in hydrological series should be carefully handled in stochastic simulation models to appropriately manage future water‐related risks.

the Niger River. The SML model was also adopted to stochastically model and simulate the Great Lakes system (Fagherazzi et al., 2007(Fagherazzi et al., , 2011. In addition, Kwon et al. (2007) employed wavelet analysis (Foufoula-Georgiou & Kumar, 1994;Labat, 2005) and autoregressive moving average (ARMA) models to simulate the selected signals of hydroclimatological variables. Even if its application was successful in simulating rainfall and temperature series, its wavelet components were not easy to select, and the ARMA model does not have feature the ability to reproduce oscillatory signals extracted from wavelet analysis. Also, Lee and Ouarda (2012) proposed a model that extracts signals from a time series with empirical mode decomposition (EMD) and simulates the combined significant signals with the nonstationary oscillation resampling technique (NSOR). The EMD results from a finite number of signals from the time series and the NSOR has the ability to reproduce the smoothed variability of the extracted signals. Further model development has been made to simulate time series with a multivariate framework (Lee & Ouarda, 2019).
Furthermore, deep learning models have been popularly employed in the literature, especially for the prediction of hydroclimatological variables by modeling nonlinear temporal relations in hydroclimatological variables (Kumar et al., 2019;Le et al., 2019;Qi et al., 2019). Among others, long short-term memory (LSTM), abbreviated as LSTM (Hochreiter & Schmidhuber, 1997), was employed as one of the most popular temporal deep learning models. Lee et al. (2020) recently employed the LSTM model to simulate hydroclimatological variables and proved that the LSTM model can be a good alternative to reproduce the long-term dependence structure of a historical time series since the LSTM model was originally invented to reproduce the long-term memory as presented in the name itself (i.e., LSTM). Likewise, there are a number of alternative models to illustrate the nonlinearities and nonstationarity embedded in hydroclimatological series (Khaliq et al., 2009;Myronidis et al., 2018;Sakiur Rahman et al., 2018;Zirulia et al., 2021).
Nonetheless, it is often not clear which type of nonstationarity is included in the target hydroclimatological time series, and its impact on water management is significant according to the definition of nonstationarity. Meanwhile, the record-breaking floods of 2011 in Lake Champlain (LC) in Vermont and New York, U.S. and the Richelieu River (RR) in the province of Quebec, Canada prompted the U.S. and Canadian governments to work on identifying how flood forecasting, preparedness and mitigation can be improved in the Lake Champlain and Richelieu River (LCRR) basin (Saad et al., 2016). The International Joint Commission (IJC) submitted the LCRR Plan of Study (PoS) to the Governments of Canada and the United States in 2013. The 2013 PoS presented the work required in the LCRR basin to analyze potential floodplain management solutions, to build a new flood forecasting system for the LCRR basin, and to suggest structural and non-structural flood mitigation and prevention measures. To perform the objectives suggested in the PoS, it is important to adequately generate synthetic net basin supply (NBS) series of the LCRR basin. These series are crucial to evaluate the adequacy of flood risk mitigation measures and management strategies under many potential hydrological scenarios that could occur in the future (Riboust & Brissette, 2015).
To simulate synthetic NBS series, an appropriate time series model must be selected according to the characteristics of the observed NBS series, including the stochastically nonstationary behavior in that the series presents oscillatory long-term variability (note that this has been discussed later). Furthermore, it has been requested that the simulated series reproduce the historical levels of LC and RR flows by the LCRR IJC technical work group.
Therefore, the objectives of the current study are (a) to apply feasible simulation models that can produce the long-term nonstationary variability of the hydrological series; (b) to compare the simulation models according to the reproduction of the water-related risks for water management; and (c) to provide all feasible future scenarios to test structural and non-structural flood mitigation and prevention measures according to the simulated scenarios.
The remainder of this paper is organized as follows. In Section 2, the mathematical background of the employed SML model and empirical mode decomposition with nonstationary oscillation resampling (EMD-NSOR) are reviewed. Section 3 presents study area and data description. Section 4 reports the results for the simulated data with the SML and EMD-NSOR models as well as the AR with trend. In Section 5, the comparison result of the tested three different models were discussed followed by future scenarios. Section 6 provides a summary and conclusions.

Mathematical Backgrounds
As discussed, the major objective of the current study is to build a stochastic simulation model that reflects the statistical characteristics of the NBS for the LCRR system shown in Figure 1 and to simulate a long-term series with the model. To build a relevant time series model, the NBS time series was observed, in Figure 2. Ouarda and Charron (2019) performed frequency analysis for the RR flood flows and Champlain level and they found that abrupt change points were detected in the NBS series. The overall variation in the NBS series can be viewed in following three different ways: as the overall linear trend (top panel), shifting mean process (middle panel), and long-term oscillation (bottom panel), as seen in Figure 2.
Modeling the overall variation is important in the current NBS simulation because this series should be used in testing mitigation plans for all possible hydrological events, such as the 2011 flood event in the LCRR basin (Saad et al., 2016) and water level declination of Great Lakes in 2014 (Watras et al., 2014). The overall trend, in the top panel of Figure 2, was modeled with a linear trend model and AR model for its residual. The shifting mean process in the middle panel of Figure 2 was modeled with the SML by Sveinsson et al. (2003). Additionally, the long-term oscillation feature in the bottom panel of Figure 2 was modeled with the EMD-NSOR model (Lee & Ouarda, 2012. In Figure 3, the overall procedure of the current study is presented. At first, the observed data was obtained and transformed into the normal domain. Then, the tested models as AR with Trend, SML, and EMD-NSOR were fitted to the observations followed by simulating the series for the current period. The model performance and comparison were made with the backtransformed simulation data. Finally, the future extension was made with the tested models. Map of the Lake Champlain and Richelieu River (LCRR) basin. Note that dark blue presents the whole area of the LCRR basin, light blue inside the dark blue presents Lake Champlain (LC), and the northward line from LC illustrates the Richelieu River. The map was provided by the International Joint Commission.

Autoregressive Model
A number of time series models have been tested in the literature to simulate time series of hydroclimatological variables Laux et al., 2011;Monbet & Marteau, 2004;Salas, 1993;Tao & Delleur, 1976). The traditional autoregressive (AR) model has been commonly used to consider random noise and time-dependent structures (Brockwell & Davis, 2003). For a hydroclimatological variable at time t (referring a year), Y t , the AR model of order p (AR(p)) can be explained as follows: where Y t is the time-dependent variable with zero mean, and ε t is the time-independent component as ε t ∼ N(0,σ ε 2 ), and is the parameter of the model for the jth lagged variable (i.e., Y t-j ). The target time series should have the condition that there is no long-term trend or cyclicity and the time series variable Y t is assumed to be normally distributed. For example, the AR(1) model can be simply written as follows: In other words, the AR model assumes stationarity so that the marginal and joint probability distribution of the variable Y t is consistent as follows:  Another feasible way to consider the time series of the annual NBS would be that the time series presents the overall trend, in Figure 2. It can be identified and modeled with a linear trend as follows: where 0 and 1 are parameters and "t" represents time (i.e., year). The residual can be modeled as a time series model such as AR(p) in Equation 1 (Hipel & McLeod, 1996). The model of AR(1) with Trend is referred to as ARwTr in the current study. Salas and Boes (1980) devised the SML to model an abrupt shift process, and Sveinsson et al. (2003) further improved this model and applied it to hydroclimatological variables. Note that the physical background of the SML is a sudden systematic shift from human effects or natural phenomena such as volcanic activities rather than random processes.

Shifting Mean Level Model
The basic formula of the SML (Salas & Boes, 1980) is presented for a sequence of a random variable Y t as follows.
= + where U t is a sequence of independent identically distributed variables with mean and variance 2 . Z t is a sequence with mean zero and variance 2 and is expressed as follows: where m i is a random variable with a zero mean and standard deviation of . This represents the various mean levels over the time expansion. = 1 + 2 + ⋯ + with 0 = 0 and N i is the timespan at a certain level, i, and a random variable with a geometric distribution and its parameter p as the following: Note that p is the probability that a mean level changes, and its reverse is the expected length that its mean level stays. Additionally, ( ]( ) is the indicator function, which takes the value 1 if ∈ ( ] ; otherwise, it takes the value 0. The sequences m i , N i , and U t are assumed to be mutually independent of each other. The correlation structure of the sequence Y t (Salas & Boes, 1980) is expressed as follows: where (ℎ) is the lagged correlation for the order h. This model has been tested, and its performance for reproducing long-term variability is well studied in the literature (Lee & Ouarda, 2012;Ouarda & Charron, 2019;Sveinsson et al., 2003).

EMD
EMD is invented to extract the different intrinsic modes of oscillations at separate frequencies in a time series. An intrinsic mode of oscillation is also called an intrinsic mode function (IMF) when it meets the following conditions: the number of extrema must be the same as the number of zero crossings or differ from it by no more than one, and the mean value of the two envelopes of the local maxima and minima must be zero. A time sequence can be decomposed into a finite number of IMFs through the shifting process in EMD. The sifting process is performed to obtain IMFs from a time series, y(t), where t = 1,…, N as follows: 1. Estimate all of the local extrema and link all local maxima (minima) with a smoothing technique to obtain the upper and lower envelopes with cubic spline (Press et al., 2002) in general (Huang & Wu, 2008). 2. Acquire the first component, h, by finding the difference between the time series and the local mean of the upper and lower envelopes, m, as = − . 3. Substitute y by h and repeat steps 1 and 2 until a certain criterion is met such that component (h) surely retains a sufficient physical sense of both amplitude and frequency modulations (Huang & Wu, 2008). 4. Assign the final h as the lth IMF, c l , and the residue is r l (i.e., r l = r l-1 − c l where r 0 = x). 5. Repeat steps 1-4 by handling the residue r l as the original data until the final residue (r n ) is a monotonic function, defined as c n+1 .
Finally, the original time sequence, x(t), is presented as the summation of the estimated IMFs c j, where j = 1, …, n + 1 (Huang & Wu, 2008) as Components with lower order have a higher frequency level and vice versa. Note that the lowest frequency level component is c n+1, while the highest frequency level component is c 1 .
Z. Wu and Huang (2004) and Z. H. Wu and Huang (2005) developed a statistical significance test for IMFs to examine whether an IMF obtained from EMD includes a true signal or just a noise component. This test compares the mean period and the spectral energy between the IMF of the original signal and that of the white noise. When the IMF energy of the observed data with a mean period is placed above the confidence level, it indicates that the corresponding IMF is considered to be statistically significant at the given level.

NSOR
The NSOR model (Lee & Ouarda, 2012 is based on two nonparametric techniques: K-nearest neighbor resampling and block bootstrapping. The stochastic simulation procedure of the NSOR model is briefly described in the following.

of 23
Assume a time series of a certain IMF component c(t) where t = 1, …, N. The superscripts H and G are employed to illustrate the historical and generated data, respectively.
1. Choose an initial value c G (0) from the N observations, supposing that each observation has an equal probability of being selected. 2. Simulate a block length, L B , from a discrete distribution (here, Poisson as in Lee & Ouarda, 2010). 3. Calculate the distances as: where 1 and 2 are the inverse variances of each component c and its change rates Δc, respectively. 4. Arrange the distances in ascending order and choose the first k values. Then, randomly choose one among the k values with the weighting probability of 5. Calculate the generated data with length L B with the following: where is the selected time index in step 4. 6. Repeat steps 2-5 until all the required data are simulated.

Overall Procedure of Stochastic Simulation With EMD-NSOR
The proposed EMD-NSOR model can be briefly described as follows: 1. Decompose the target time series y(t) into a finite number of IMFs with EMD. 2. Select the significant components from the significance test (Wu & Huang, 2004). 3. Apply the NSOR model to the selected significant IMF components and the stochastic simulation models to the residuals. 4. Simulate the selected IMFs and residuals with the NSOR model and the fitted stochastic simulation model, respectively. 5. Estimate the final simulation series by summing up the simulated components.

Performance Statistics
In order to compare the performances of the tested models in the current study, a number of statistics were estimated including key statistics and surplus-drought statistics. The key statistics applied are mean, standard deviation, skewness, lag-1 correlation, maximum and minimum as well as empirical cumulative distribution function (CDF) and kernel density estimate (Chen & Hsu, 2004;Lall et al., 1993) for probability density function (PDF). The drought statistics were also estimated as drought length and amount, surplus length and amount, and storage capacity as well as the Hurst Coefficient (Hurst, 1951). These drought statistics can be estimated with the sequence peak algorithm as the following.
With the water demand WD at time t, surplus and drought is defined with the deviation from the water demand as R t . In general, the mean value of the observation is applied for the water demand (i.e., WD = ) (Yevjevich, 1967). Therefore, The length of the continuous R t values that are greater than zero is surplus length and its summation is the surplus amount. Also, the length of the continuous values R t smaller than zero is drought length and its summation is the drought amount. Furthermore, the storage capacity can be estimated with the sequence peak algorithm (Loucks et al., 1981) as: 8 of 23 The storage capacity is the maximum value of S t . The maximum range (R n , unit: m 3 /s) is calculated as Also, the Hurst Coefficient (Hurst, 1951) is estimated by where S n is the standard deviation of the accumulated shortage for R n and N is the record length.
A trend test was also applied in the current study to detect the trend in the NBS time series. Among others, the Mann-Kendall test was applied (Kendall, 1955;Mann, 1945) in the current study due to its popularity. The Mann-Kendall test is a rank correlation test between the rank of the values and the ordered values as The standardized value of S k as is assumed to be normally distributed and tested with two-sided significance test (Yue & Pilon, 2004).

Study Area
The LCRR basin spans 23,900 square kilometers, with approximately 84% of the basin in northeastern New York and northwestern Vermont in the US and 16% in the province of Quebec in Canada, in Figure 1. The water supplies to the lake or river are commonly referred to as NBS and are estimated with both component-based and residual-based methods (Croley & Lee, 1993). Here, the component-based NBS series is used due to its accuracy and popularity in the literature (Fagherazzi et al., 2011;Ouarda & Charron, 2019).
During the spring of 2011, as mentioned, the LCRR basin experienced the devastating flood recorded over the past 100 years, with damages estimated at more than US$82 million. Diverse simulation series for all possible hydrologic scenarios are requested to check feasible mitigation plans for the LCRR system. Figure 2 presents the observed time series of the annual NBS (thick black line). It shows that the time series does not seem to have high autocorrelation, but the values are relatively lower during 1925-1970 compared to the later part of the series. Ouarda and Charron (2019) detected a significant change point approximately 1970 for this series, and it also presented a significant trend based on the modified Mann-Kendall trend test (Yue & Pilon, 2004).

Data Normalization
Standardization of datasets has been commonly employed for time series modeling to meet the normality condition of traditional time series models and many other machine learning algorithms by subtracting the mean and dividing by the standard deviation so that the result variable (Y t ) becomes standard normal with zero mean and unit variance.
In some cases, the standardization might not be appropriate when the original variable does not follow normally distributed data (e.g., skewed data). To prevent this case, the normal copula transformation can be applied (Jeong & Lee, 2015) as the following: where F z is the CDF of the Z variable, such as gamma, and −1 Φ is its inverse function of the standard normal distribution. From this transformation, the output variable has a standard normal distribution. Gamma distribution was applied for a marginal distribution of the original variable in the current study since the distribution has been popularly employed for hydroclimatological variables and well fitted to positively skewed nonnegative variables such as precipitation, flood, and drought data (Hanson & Vogel, 2008;Nadarajah, 2007Nadarajah, , 2009).
The Gamma CDF (F z ) can be fitted to the hydrological data (here, NBS) by estimating the parameter of the Gamma distribution. In principle, this Gamma-Gaussian transformation with the normal copula can be descripted 9 of 23 differently and it has been widely adopted in the hydroclimatological field. For example, quantile mapping for hydrological variables with the Gamma marginal distribution is equivalent to this Gamma-Gaussian copula transformation (Cannon et al., 2015;Maraun, 2013) and in developing standardized precipitation index (SPI) (McKee et al., 1993;Serinaldi et al., 2009), this transformation can be applied even though its approximation form was proposed in the original study of the SPI.
Finally, the back-transformation must be performed to have its original form as:

AR Model With Trend (ARwTr)
In the panel (a) of Figure 2, the overall trend was fitted after the trend was found to be significant with the Mann-Kendall test (a rejection of the null hypothesis with the significance level of 0.05, p = 0.0027). The residual is modeled with the AR(1) model since not much significant autocorrelation remains except for the lag-2 autocorrelation function (ACF) (data not shown). The same record length and 200 series were simulated, and one of the series is depicted in the top panel of Figure 4.
Normal copula standardization as in Equation 17 was not feasible for this model since its trend results in some infinite values are in the back-transformation. Instead, the typical standardization in Equation was applied. As a result, the skewness and extrema underestimate the observed values, as shown in Figure 5, while the other statistics as mean, standard deviation, lag-1 correlation are reproduced well. The drought and surplus statistics show some improvement in Figure 6 compared to the AR model presented in Figure S4 in the Supporting Information S1. However, it still presents some underestimation.
The simple AR(1) model has also been tested, and the result is shown in Text S1 in the Supporting Information S1 as well as Figures S1, S3, and S4 in the Supporting Information S1. The result indicates that no better performance can be found from the original AR(1) model than the ARwTr.

SML Model
As mentioned, the shifting mean level model assumes that the mean level persisted for a period and is changing randomly. Following these statistical characteristics, the model can reproduce the mean level shifting in the annual NBS, as shown in the middle panel of Figure 4. Note that the mean levels from the observation in the middle panel of Figure 4 were selected from the visual inspection. In contrast, the simulated mean levels with the standard normal distribution and the geometric distribution (Z t in Equation 7) were taken from the simulated data. The simulated data (see the red thick solid line with circles) show that its mean level is changing.
The basic statistics of the observation are reproduced fairly from the simulated data of the SML model in Figure 7, even though slight underestimation can be observed in the skewness and maxima. The drought and surplus statistics as well as the storage capacity of the observed data are reproduced fairly well with the SML model (see Figure 8) compared to the ARwTr and AR(1) models. However, the results still show underestimation for these statistics. Note that the lagged dependence structure of the SML model is exponentially decreased as shown in Equation 9 and the SML model reproduces only the short-term memory of the applied time series (Brockwell & Davis, 2003).

EMD-NSOR Model
To apply the EMD-NSOR model, the EMD procedure must be applied to the target time series so that the target time series are split into a few IMFs, as shown in Figure 9. Note that (a) the standardized data with the normal copula approach were applied to preserve the marginal distribution of the observed data; (b) long-term oscillation can be seen in the IMFs of c 3 , c 4 , and c 5 ; and (c) the last component (c 6 ) is not an IMF but the overall trend. Among the IMFs, one should select the significant IMFs. The significance test was applied to the IMFs, and the result is shown in Figure 10. Each asterisk represents each IMF, and an IMF farther away from the lines indicates that its corresponding IMF is distinguishable from random noise. The result indicates that c 5 and c 6 are the significant IMFs. However, Lee and Ouarda (2010) argued that this significance test has some drawbacks, such that no red noise signal (i.e., existing of the underlying time-dependent structure) is considered and thereby suggested a subjective judgment with inspecting several combinations visually and selecting the best combination that presents the best reproduction of the key and drought statistics as well as the best reproduction of the observed long-term oscillations. We tested several combinations in the present study and concluded that the summation of the IMFs of c 4 , c 5 , and c 6 should be used since c 5 and c 6 must be included based on the significance test, and c 4 must also be included since it contains relatively long-term oscillation and its modulation is close to the other selected IMFs. The other combinations did not lead to better performances in reproducing the long-term oscillation structure of the observed data.
In the bottom panel of Figure 4, the summed selected IMFs (thick black dotted line with cross markers) of the observed data are presented. The long-term variation in the observed time series is captured well. The selected     IMF was modeled with the NSOR model, while the summed residual IMF series was modeled with the AR(1) model to avoid losing any underlying time-dependent structure. The simulated series of the selected IMFs with the NSOR model is presented in the bottom panel of Figure 4, shown with the thin black dotted line. The plot shows that the NSOR model reproduces the long-term variability of the observed time series well. Figure 11 shows that the basic statistics are reproduced well, similar to the other models, since the normal copula approach was also applied in this model. As shown in Figure 12, the drought and surplus statistics are preserved best among all the tested models even if a slight underestimation can still be observed. The best reproduction of these statistics is from the reproducibility of the long-term oscillatory behavior in the observed data.

Comparison of Nonstationary Models
In the current study, a number of stochastic simulation models were applied to simulate the annual NBS for the LCRR system. To be a comparable stochastic simulation model, it must preserve the basic statistics and reproduce the long-term persistency.
The long-term variability of the observed time series cannot be reproduced with the traditional time series model (AR (1)) as well as with the trend (i.e., AR(1) with trend). The LSTM model was also not able to reproduce it and resulted in only minimal improvement. The SML model treats the long-term  The asterisks (*) farther away from the lines indicate that the corresponding intrinsic mode function (IMF) of the observed series is different from the corresponding IMF of random noise. Note that (1) c 5 and c 6 are significant, while c 1 , c 2, c 3, and c 4 can be considered insignificant signals random component (Wu & Huang, 2004) and (2) this significance test still does not consider a red noise signal as underlying short-term correlations, and further visual inspection and subjective selection must be included to find the proper selection of components (Lee & Ouarda, 2010 variability as shifts of the mean level and leads to substantial improvement. The EMD-NSOR model mimics the long-term variability with long-term oscillations and preserves the long-term variability best even though a slight underestimation can still be observed.
Since the SML and EMD-NSOR models are comparable and it is worthwhile to investigate the availability of the SML because the model was applied once to the NBS of the current system and to propose a comparable model, these two models were further tested. To do this work, further comparisons were made by focusing on these two models with the statistics of kernel density, CDF, spectral analysis, and Hurst coefficient.
In Figure 13, the CDF and kernel density for the simulated series and observed series are presented. Note that a kernel density is the PDF that is nonparametrically driven (Lall et al., 1993;Salas & Lee, 2010). The result indicates that both models reproduce the observed CDF and density well. In particular, the tail behavior of the observed CDF is well reproduced.
In the panel (d) of Figure 13, it seems that the simulated series with the EMD-NSOR model presents two types of distributions, with one having a median of 300 m 3 /s and the other having a median of approximately 400 m 3 /s, as shown in the two representative examples (see the thick black lines). This might result from the long-term persistent characteristics of the simulated series from the EMD-NSOR model. The simulated series can be either high or low for an extended period of time. As shown in Figure 4, the simulated series is higher during the 1960-1990 period and lower for the rest of the series and can hence be either high or low for a long time. The distribution of this series seems to be similar to the one in the left one among two representative distributions shown with the thick black lines in the panel (d) of Figure 13. Meanwhile, the series from the SML does not present this behavior. This lack of presentation might be because the estimated probability parameter "p" in Equation 8 for the annual NBS is 0.184. Therefore, the average number of sequences with the same mean is 5.438 (1/0.184) for the shifting mean process. The relatively high value of the probability parameter (i.e., p = 0.184) is due to the lower values of ACFs shown in Figure S2 in the Supporting Information S1. The parameters of the SML model are estimated with the first four or five ACFs and Equation 9. This indicates that the average length of a sequence with the same Figure 11. Boxplots of basic statistics of simulated series (93 years 200 series) with the empirical mode decomposition with nonstationary oscillation resampling model for the net basin supply (unit: m 3 /s) of Lake Champlain and Richelieu River as well as its observation (circle).
mean simulated from the SML model is not more than 6 years. It illustrates that the simulated series from the SML does not have as much as of a long-term persistency as the ones from the EMD-NSOR model.
In Figure 14, the estimated spectral density is presented along with the frequency (f = 1/T, where T represents time [here, years]). Note that a spectral density with lower frequency (f) indicates long-term persistency (i.e., larger T years). In detail, the EMD-NSOR model better preserves the spectral density for the lower frequency (f), representing long-term persistence even though the difference is not substantial.
We further estimated the Hurst coefficient and the maximum range (R n ) of the difference of a current NBS from the mean, as shown in Figure 15 (see its caption for details). The Hurst coefficient is the measure that Hurst (1951) suggested, for which a random and autocorrelation process has a value of 0.5, while most geophysical time series present a value of approximately 0.73. Here, the observed annual NBS has a Hurst coefficient value slightly larger than 0.8. It has been a challenging task to reproduce the observed Hurst coefficient because a number of stochastic simulation models with long-term dependence structures cannot reproduce the observed value. The median of the Hurst coefficient for the 200 simulated series from the SML model reaches up to 0.75, which indicates that the model can reproduce this statistic for most geophysical time series (approximately 0.73 as mentioned above). However, the observed annual NBS presents a higher Hurst coefficient value than the average, even if its ACFs are not very high. This high Hurst coefficient might result from the long-term persistency, as shown in the historical time series, and it is observed in the ACFs for the large lags between 40 and 44 (see Figure  S2 in the Supporting Information S1).
Meanwhile, the EMD-NSOR model reproduces this statistic well, in the top panel of Figure 15. The median of the Hurst coefficient of the simulated series from the EMD-NSOR model is approximately 0.8 and is very close to the observed value. This indicates that the modeling of the long-term variability as long-term oscillation (i.e., EMD-NSOR) might be appropriate for the annual NBS of the LCRR basin. Note that we also tested the LSTM deep learning model further. The description of the model and its result can be found in Text S2 in the Supporting Information S1. Not much critical improvement is seen from the result of the LSTM model, in Figures S6 and S7 in the Supporting Information S1.
The annual NBS for the LCRR shows a relatively lower correlation, especially in small lags, but it presents long-term persistency. The results conclude that the EMD-NSOR model can be a comparable alternative to stochastically simulate the series of the annual NBS for the LCRR even if the series has a very unusual dependent structure to reproduce. Note that all the additional statistics shown in this section were also estimated for the other models, such as AR(1), AR(1) with Trend, and LSTM. No model can outperform the EMD-NSOR model for these additional statistics (results not shown).

Future Strategies for Water-Related Risks
The provided results showed that the EMD-NSOR model provides reasonable simulation scenarios for handling water-related risks compared to the trend and SML models. In addition, a future extension with three tested simulation models was performed, and their implications for the management of water-related risks were studied. In Figure 16, each example of the extended time series up to 2,100 is illustrated for the ARwTr, SML, and EMD-NSOR at the top, middle, and bottom panels, respectively. Note that the initial condition was set as the NBS value of 2017 for the ARwTr and EMD-NSOR models, while no initial condition was employed for the SML due to its model characteristic.
The time series for the ARwTr at the top panel of Figure 16 illustrated the constant increase in the mean and resulted in an approximate mean of 450 m 3 /s for the future simulation, while it was approximately 400 m 3 /s during the observed period. This time series trend might lead to the overestimation of the flood risk and the underestimation of the drought risk. Significant physical evidence is requested to follow this simulation result. Key statistics and drought and surplus statistics for the simulated future series are presented in Figures 17 and 18, respectively. As expected, the mean (see the first box plot of the left top panel in Figure 17) is very high compared to the observed (see the circle) as well as the maximum and minimum shown in Figure 5 for the key statistics of the observed period. The drought length and amount are much lower than the observations shown in the first boxplots of the panels in Figure 18, while the surplus length and amount are much higher than the observations. The storage capacity is much higher than the observation. Note that the storage capacity is estimated by accumulating the difference between the current NBS ( ) and its demand (here, the observed mean ), as = −1 + ( − ) ). Therefore, the accumulated surplus should be relatively large in the increasing trend, especially for the future extension period. This scenario might lead to too high a risk for a surplus event (i.e., flood) and too low risk for drought.
The future scenario of the SML is randomly simulated without considering the initial condition of the last stage. Even if the mean level stays for a few sequences and varies afterward, the overall structure of the SML is stationary since the parameters of the SML are not varied. Therefore, the key and drought statistics are not much varied in the future extension, in Figures 17  and 18, compared to the observation period in Figures 7 and 8.
Meanwhile, the initial condition of the annual NBS for the EMD-NSOR model is relatively higher in the last 20 years, in the bottom panel of Figure 16. Therefore, the high phase continues up to 2,040, followed by the 30-year low phase, and then the 20-year high phase is followed again. This oscillation results in the future extension up to 2,100 with the EMD-NSOR model containing longer high phases than low phases. Subsequently, the mean of the annual NBS for the future extension (see Figure 17) might be larger than the observed period shown in Figure 11, and the minimum of the future period is larger than that of the observed period. As a result, the drought characteristics for the future extension with the EMD-NSOR model illustrate that the drought length and amount are biased to the lower values, in the last boxplots of the top panels of Figure 18, and the surplus length and amount are biased to the higher values. For the same reason, the storage capacity is higher than that observed. The magnitude of the bias difference from the observation for the EMD-NSOR model is smaller than that for the ARwTr model since the future extension with the EMD-NSOR model oscillates high and low phases rather than one direction increase such as the ARwTr. According to the model structures, the simulated series of the future extension have distinct characteristics. Unfortunately, it is very difficult to select a singular best model for a water manager as long as substantial physical evidence exists for a target water management system. Therefore, a water manager should prepare the system following the simulated future series and the economic budget allowed. Similar to the flood risk management system in the LCRR basin, surplus statistics are critical to apply. Even if the increas-  ing trend might result in the worst scenario from the point view of floods, it is often difficult for a water manager to accept the trend scenarios. Instead, the EMD-NSOR scenario can be a reasonable alternative that provides oscillatory scenarios and a substantial increase in flood risk.

Summary and Conclusions
The nonstationarities in hydroclimatological time series, such as trends, shifts, and oscillations from human development and climatic long-term variations, are critical to consider for water-related risk management. In the current study, how stochastics models can take nonstationarities into account by simulating different scenarios with different nonstationarities, such as trends, shifts, and oscillations, is illustrated. Furthermore, nonstationary Figure 16. Time series of the net basin supply (NBS) series (unit: m 3 /s) for the observed (solid black line) and the simulated series for the future extension (2018-2100 year) from three tested models as the trend, shifting mean, and summed selected IMFs. Note that the simulated example of the future extension is presented with the red thick solid line with a circle, while the trend, shifting mean, and oscillation are depicted with the black thick dotted line. stochastic simulation models were applied to the annual NBS of the LRCC basin to test feasible alternatives for flood mitigation plans and other actions for the LCRR basin.
Since the SML has already been applied in the Great Lakes study for stochastic simulation (Fagherazzi et al., 2011), the feasibility of the SML model and the suggestion of a comparable model for the NBS of the LCRR system were requested from the water managers of the LRCC. The annual NBS series for the LCRR system was investigated, and we noticed that the series seems to have long-term variability. The long-term variability was treated as an overall trend, shifting mean, and long-term oscillation, and appropriate models were selected to mimic the variability accordingly through the ARwTr, SML, and EMD-NSOR. Two additional models were also tested: the Figure 18. Boxplots of drought and surplus statistics for the future simulated series (2018-2100 yr) with three tested models (ARwTr, SML, and empirical mode decomposition with nonstationary oscillation resampling [EMD-NSOR]) for the net basin supply of Lake Champlain and Richelieu River as well as its observation (circle). AR model for traditional time series modeling and LSTM for a recent deep learning model with the capability of long-term memory.
The results show that the AR, ARwTr, and LSTM models do not adequately reproduce the observed drought and surplus statistics as well as storage capacity, while the SML and EMD-NSOR models reproduce these statistics. Meticulous comparisons were made with the statistics of CDF, density, spectral density, and Hurst coefficient for these two models. The detailed comparison results indicate that the EMD-NSOR model better preserves the SML model, especially for the long-term variability of the observed NBS. The poor performance of the SML model might be due to the weak ACF of the observed NBS, especially for short lags. In contrast, the EMD-NSOR model reproduces the long-term variability well, as illustrated with the Hurst coefficient.
In addition, the future extension of the annual NBS up to 2,100 was performed with the tested stochastic simulation models. The results showed that the ARwTr presents too much risk for floods and is difficult to adopt for flood risk management due to its limited budget. The SML scenarios of the future extension are not much different from the observed period. Instead, the EMD-NSOR model produces higher flood risk by providing reasonable and acceptable scenarios to apply.
Overall, the provided results indicate that the EMD-NSOR model can be a good alternative for stochastic simulation of the annual NBS for the LCRR system by treating the long-term variability as long-term oscillations. The synthetic series simulated from the EMD-NSOR model can be used to check the appropriateness for feasible mitigation action plans and preparedness for future floods in the LCRR basin. Further stochastic simulation models can be adopted by incorporating climate variables for future timeframes such as sea surface temperature or climate indices. The behavior of the simulated series is highly dependent on the incorporated climate variables not on the model itself.

Data Availability Statement
The simulation data were published in the Mendeley Data as Simulated NBS for LCRR and downloadable from https://data.mendeley.com/datasets/5n4vr53hbp. The net basin supply (NBS) observed data are available from the International Joint Commission (IJC, https://www.ijc.org/en) upon request. Software: MATLAB software of EMD was used in the link https://kr.mathworks.com/discovery/empirical-mode-decomposition.html. The shifting mean level model was performed with SAM2007, which is downloadable from http://www.sams.colostate.edu/. The MATLAB code (Gen_NSOR_sim) for the NSOR model is downloadable from the same website for the simulation data as https://data.mendeley.com/datasets/5n4vr53hbp.