Synthetic Simulation of Spatially‐Correlated Streamflows: Weighted‐Modified Fractional Gaussian Noise

Stochastic methods have been typically used for the design and operations of hydraulic infrastructure. They allow decision makers to evaluate existing or new infrastructure under different possible scenarios, giving them the flexibility and tools needed in decision making. In this paper, we present a novel stochastic streamflow simulation approach able to replicate both temporal and spatial dependencies from the original data in a multi‐site basin context. The proposed model is a multi‐site extension of the modified Fractional Gaussian Noise (mFGN) model which is well‐known to be efficient to maintain periodic correlation for several time lags, but presents shortcomings in preserving the spatial correlation. Our method, called Weighted‐mFGN (WmFGN), incorporates spatial dependency into streamflows simulated with mFGN by relying on the Cholesky decomposition of the spatial correlation matrix of the historical streamflow records. As the order in which the decomposition steps are performed (temporal then spatial, or vice‐versa) affects the performance in terms of preserving the temporal and spatial correlation, our method searches for an optimal convex combination of the resulting correlation matrices. The result is a Pareto‐curve that indicates the optimal weights of the convex combination depending on the importance given by the user to spatial and temporal correlations. The model is applied to a number of river basins in Chile, where the results show that the WmFGN approach maintains the qualities of the single‐site mFGN, while significantly improving spatial correlation.


• Weighted modified Fractional
Gaussian Noise model improves spatial correlations without significantly sacrificing temporal correlation • The method searches for an optimal convex combination of the spatial and temporal correlation matrices according on the user's priority • Our results in several Chilean basins demonstrate that Weighted-modified Fractional Gaussian Noise (WmFGN) represents a significant improvement over the mFGN in preserving correlations

Supporting Information:
Supporting Information may be found in the online version of this article.

Literature Review
Some of the most popular techniques in stochastic hydrology are based on bootstrapping (Efron & Tibshirani, 1994), which generates time-series from the random sampling with replacement of historical records that lost any autocorrelation specific to the original series.This strategy was followed by several variants such as the method of moving blocks Bootstrap (Srinivas & Srinivasan, 2005;Vogel & Shallcross, 1996) and nearest neighbor Bootstrap (Lall & Sharma, 1996).The former method only partially corrects the autocorrelation issues.
Bootstrapping-based methods can only generate data by resampling historical data, and thus are restrained by historical events, which is a drawback if one wants to simulate stochastic change conditions as projected in (IPCC, 2021).In parallel to bootstrapping-based methods, the family of autoregressive (AR) models arose as a first order Markovian model (Thomas & Fiering, 1962).These methods later evolved with multiple related works (Jettmar & Young, 1975;Matalas, 1967;Moreau & Pyatt, 1970;Young & Jettmar, 1976) to formalize the p-order AR (AR(p)) models (Box et al., 2015), and the autoregressive moving average model (ARMA).Autoregressive models adequately incorporate autocorrelation in the time-series, but they assume that the autocorrelation is constant in time.This is an important limitation for the simulation of shorter time step streamflows (e.g., less than one year), as autocorrelation does change over the year, due to seasonality.In view of the above, periodic autoregressive models (PAR(p)) have been proposed (Pagano, 1978;Parzen & Pagano, 1979;Salas et al., 1982), which are AR(p) models using sets of autocorrelations specific to each time period (e.g., weekly or monthly).However, even when the PAR(p) manages to circumvent the AR(p) models autocorrelation problem, doubts arise as to how long the period should be (e.g., monthly or seasonally), or which parameter estimation methodology should be used (Noakes et al., 1985).
More recently, Copula-based autoregressive models have been proposed for multi-site runoff synthetic generation (Chen et al., 2015;de Almeida Pereira & Veiga, 2019;Hao & Singh, 2013;Lee & Salas, 2011;Pereira et al., 2017;Reddy & Ganguli, 2012).A major strength of the Copula-based models is that they can incorporate different types of relationships between variables (e.g., spatial correlation), through an analytical distribution, which is an advantage over other methods.A monthly copulas model has been proposed in Xu et al. (2022) for flow forecasting that is highly capable of predicting future short and medium-term flows in non-stationary contexts.Note that flow forecasting is used for decision making, but some strategic decisions require long-term simulations, which are not addressed by flow forecasting methods.
Attempts to integrate both temporal and spatial correlations for synthetic runoff generation have been proposed with trivariate copulas by Chen et al. (2015), which simulates first a single streamflow (Lee & Salas, 2011), and then adds the multi-site correlation.Trivariate copulas are able to preserve cross-correlation between different tributaries at lag 0, and consistently replicate historical characteristics of the different sites such as mean, variance and autocorrelations in lags 1 and 2 (with larger but acceptable differences in the latter).Hao and Singh (2013)  10.1029/2023WR035371 3 of 22 use a maximum entropy copula to simulate multi-site streamflows, which requires iterative parameter estimation methods.However, as stated by the authors, this becomes complicated in a multi-site case.Consequently, copulas for multi-river simulation have not been widely used, in most cases they are applied to a single river (de Almeida Pereira & Veiga, 2019).Another application was developed by Pereira et al. (2017) through a two-stage model in which simulations for different sites (39 hydropower plants) are generated independently with a PAR(p) model.The spatial correlations are then incorporated in a second stage by means of vine-copulas as proposed in Erhardt et al. (2015).In de Almeida Pereira and Veiga (2019), the authors developed a multi-site flow simulator based on copula autoregressive (COPAR) model previously used in economics (Brechmann & Czado, 2015).The COPAR model has a periodic component and directly solve the temporal and spatial relationships of the different tributaries with a multi-dimensional copula.As in Chen et al. (2015), it ensures spatial correlations in the simulations close to the historical ones up to lag 2, and mean autocorrelations consistent with the historical ones up to lag 5 (except for some months).Copulas and other autoregressive based models have the drawback of not being sensitive enough to replicate high historical temporal correlation (Kirsch et al., 2013).
Other methodologies used in hydrology that deal with the simulation of temporal and spatial features simultaneously are introduced by Tsoukalas, Efstratiadis, and Makropoulos (2018) and Tsoukalas, Makropoulos, and Koutsoyiannis (2018).These authors design a new family of Nataf-based models which is an extension of Nataf's joint distribution models (Nataf, 1962) initially implemented to generate random vectors with arbitrary distributions in independent series but with cross-correlation.This process starts with the generation of random data from Gaussian copulas to then transform the marginal distribution with the inverse cumulative distribution function.The SMARTA (Symmetric Moving Average (neaRly) To Anything) model (Tsoukalas, Makropoulos, & Koutsoyiannis, 2018) expands the capabilities of a Symmetric Moving Average (SMA), from just Gaussian distribution to almost any distribution.The SMA models are able to replicate short-run and long-run time dependencies in univariate as well as multivariate context, but are unable to incorporate cyclostationary correlation structures (i.e., seasonality or periodicity in temporal correlation).A model which has several of the qualities of SMARTA and is able to capture the cyclostationary correlation structure is SPARTA (Stochastic Periodic AutoRegressive To Anything) (Tsoukalas, Efstratiadis, & Makropoulos, 2018).SPARTA just as SMARTA uses a Nataf-based model, but it starts with a PAR(p) model, instead of a SMA one.These gives SPARTA the capability of simulating cyclostationary correlation, but it also loses the capability of the SMARTA of simulating long-run time dependencies.
The long-run dependencies (LRD) are known as Hurst phenomenon (Koutsoyiannis, 2002), which is measured with the Hurst coefficient index (H).The higher the magnitude of the index, the higher the prevalence of significant autocorrelation at very high lags (e.g., 100 lags).A statistical model capable to capture and replicate the Hurst phenomenon, by explicitly using the Hurst coefficient, is the Fractional Gaussian Noise (FGN) method (Mandelbrot & Van Ness, 1968;Mandelbrot & Wallis, 1968, 1969).The FGN was originally proposed as a mathematical approach to emulate long-range dependencies seen in normally distributed data, which had immediate implications in hydrology.Not long after the development of the FGN, a multi-site extension of the method was proposed (Matalas & Wallis, 1971).Unfortunately it was concluded that FGN fails in simulations longer than 100 time periods (McLeod & Hipel, 1978).Although the FGN can properly simulate the frequency of extreme events (Mandelbrot & Wallis, 1968), the previously mentioned drawback was a dead end, until Kirsch et al. (2013) proposed the modified Fractional Gaussian Noise (mFGN), and solved the period barrier that hampered the use of FGN.Note that, similarly to FGN, the mFGN estimates parameters from a periodic time series and is based on preserving temporal correlation by summing expressions with increasing number of terms over time, but this modification of the FGN does not explicitly use the Hurst coefficient.The mFGN is able to generate univariate time series of infinite length, while replicating the cyclostationary correlation of it.With this model, Kirsch et al. (2013) demonstrated that one can use mFGN to simulate several years of streamflow while preserving its correlation structure.The streamflow generation with mFGN is performed by transforming the Gaussian generated data with a Log-Normal distribution.The method allows for additive changes in the mean and standard deviation of the time series, thereby making it a powerful and useful tool for climate variability and change studies.One advantage of the mFGN approach over autoregressive models is that it captures high levels of autocorrelation, cyclostationary correlation, as well as the Hurst phenomenon (Kirsch et al., 2013).
The mFGN proposed by Kirsch et al. (2013) performs well when replicating a single-site streamflow, but, when extended to a multi-site context, the method does not always present successful results (Herman et al., 2016).Herman et al. (2016) increase the likelihood of drought events by increasing the weight of low streamflows in the distribution successfully.Nonetheless, they try to extend the mFGN into a multi-site method by using a bootstrap resampling technique of historical data, which is able to preserve historical temporal correlations, but it presents some difficulties in preserving spatial correlation.Other attempts to develop a multi-site mFGN have been more successful, such as the "Kirsch-Nowak Streamflow Generator" (https://github.com/julianneq/Kirsch-Nowak_Streamflow_Generator),based on a monthly multi-site streamflow generation using Kirsch et al. (2013) mFGN, combined with a k-nearest neighbor method (k-NN) disaggregation into daily data, as proposed by Novak et al. (2010), after Lall and Sharma (1996) k-NN method.The "Kirsch-Nowak" method has been used in several studies (Giuliani et al., 2017;Quinn et al., 2017;Salazar et al., 2017).Note that the "Kirsch-Nowak" method, at a monthly scale, is the same as the original mFGN method in Kirsch et al. (2013), and consequently it does not fully resolve the spatial issue.The difficulties with the multi-site mFGN appear when spatial and temporal correlations are strong and present heterogeneity between locations and times of the year, as it happens in our case study.Therefore, there are still some gaps to be addressed regarding a multi-site mFGN.These gaps notwithstanding, in a single-site streamflow generation mFGN has shown to outperform autoregressive models such as AR(p) or PAR(p) (Kirsch et al., 2013), hence a proper extension of mFGN into a multi-site method would allow preserving its benefits in multi-site streamflow generation.
Successful multi-site stochastic methods have been developed in other areas of simulation.Bracken et al. (2016) proposes the use of a single site Bayesian hierarchical nonhomogeneous hidden Markov model method, which is then combined into a multi-site method by using an elliptical copula to simulate multi-site annual streamflow reconstructions.Another example was developed by Bracken et al. (2018), using a similar Bayesian hierarchical framework to model multivariate nonstationary frequency for hydrological variables.For that study, generalized extreme values were used for marginal distributions, in the simulation of peak snow water equivalent, peak inflows, and peak reservoir elevation for Taylor Park reservoir, Colorado, USA.For weather generation, Steinschneider et al. (2019) uses a nonhomogeneous Markov chain for the weather regime, then a block bootstrapping and a Gaussian copula for a multivariate, multi-site daily weather simulation, and finally it uses modules to impose a thermodynamic and dynamical climate change conditions.All of these are advances in multi-site stochastic method, but they do not possess some of the aforementioned qualities that the mFGN has in the case of single-site models (Kirsch et al., 2013), which still needs improvement as a multi-site method.
In this paper, we present a novel stochastic streamflow simulation approach able to replicate both temporal and spatial dependencies from the original data in a multi-site basin context.

Methodology
In this section we describe the Weighted-mFGN methodology we propose.Before that, we recall the FGN and mFGN methodologies proposed in the literature, upon which we build our approach.In what follows we shall assume that the monthly streamflow follows a log-normal distribution, which is a common assumption in the literature as streamflows do indeed tend to follow such a distribution in practice.Note that such an assumption is made only for the sake of simplifying the presentation; one can also use other distributions that allow for the normalization of streamflows in any of the methods discussed in the paper.Several examples of other distributions are presented in Svensson et al. (2017).The log-normal distribution was chosen for two main reasons: first, it is a two-parameter distribution, which makes it simpler and easier to use, and second, it is a strictly non-negative distribution, a desired quality when simulating streamflows.

Fractional Gaussian Noise
We start by describing the Fractional Gaussian Noise (FGN) method (Mandelbrot & Van Ness, 1968;Mandelbrot & Wallis, 1968, 1969).Note that the FGN described here is the version of the method used by Kirsch et al. (2013), and not the original method.The (Kirsch et al., 2013) approach estimates the parameters of the FGN from a time series, rather than using the Hurst coefficient as in the original FGN method.We have chosen to describe the FGN method as presented in Kirsch et al. (2013) since our WmFGN method builds upon the latter work.
Consider a matrix  Ŷ which is populated with N years of historic inflow data in such a way that the hydrological years are set as rows, and each month is a column (it is implied that months are treated as independent processes , where the first subscript indexes the years in the data, and the subscript j ∈ = 1, …, J stands for the jth month): As a first step, the matrix  Ŷ in Equation 1 is transformed to resemble a Normal distribution in each of its months, and then it is standardized.More specifically, let The next step is to generate a new matrix X of size N s × J with independent random samples from a Normal(0,1) distribution, where N s is the number of years to simulate.This matrix X is called the uncorrelated synthetic inflow matrix.To introduce the time dependencies of the original streamflow time-series, the matrix Σ ≔ Corr(Y) is computed, which is the square and symmetric correlation matrix of the original rearranged time-series containing the pairwise correlation coefficients between all the months.The Cholesky decomposition of Σ is then computed as: The Cholesky decomposition in Equation 3 is the key step of the FGN because with the resulting upper triangular matrix Q the uncorrelated synthetic inflow matrix can be adjusted to capture the historic monthly temporal correlations, that is, one computes The output matrix Z is of size N s × J.Note that Corr(Z) ≈ Corr(Y) as desired, thereby preserving the temporal correlation between months of each year, but not the correlations across years.Finally, Z is transformed back into the original space of streamflows by computing.

Modified Fractional Gaussian Noise (mFGN)
The FGN approach described above provides a clean and simple of way to incorporate temporal correlations.One deficiency of the method, however, is that only considers the correlations between months within the same year.
To overcome that issue (Kirsch et al., 2013), propose a modification to the method that overlaps 6-month periods.
More specifically, let Y be the matrix constructed in Equation 1.Then, a new matrix Y′ is built (see Figure 1a) so that the row corresponding to the ith year in Y′ contains the last six months of year i plus the first six months of year i + 1 in Y (note that Y′ is one row shorter than Y).That is, Y′ can be constructed by applying a linear operator   to Y as follows.Let

Now define the
Then we have that Let Q′ be the matrix corresponding to the Cholesky decomposition of Corr(Y′).Now, consider as before a matrix X of size N s × J with independent random samples from a Normal(0,1) distribution, where N s is one year more than the ones to be simulated, and the matrix Q corresponding to the Cholesky decomposition of Corr(Y).Then, a new matrix X′ of size N s − 1 × J is constructed by applying the linear operator   defined above to X, that is,   ′ ∶=  () , and one computes The final matrix Z of simulated values is then built by using the right-most columns of Z 1 and Z 2 , as indicated in Figure 1b.

Spatial Correlation Integration Into Temporally Correlated Data
The mFGN method described above yields excellent results in the sense that the corresponding simulated series preserve the temporal correlation of the data.The method, however, falls short of representing spatial correlations adequately.To circumvent this limitation, Kirsch et al. (2013) propose a modification that uses the same random "seed" when simulating correlated basins.The spatial approach proposed by Kirsch et al. (2013) consists in applying the mFGN as described in Section 3.2, but with a slight modification when building X.Instead of using random Normal(0,1) numbers to fill X, one bootstraps values from the historical data Y.That is, for each month/year one wants to simulate, a year is selected randomly from the historical data and the corresponding month of that year is used for X.The spatial correlation is then imposed by making sure the historical bootstrapped seed corresponds to the same month and year in each site.
To illustrate the idea, suppose we want to generate simulated values for the months of January, February and March in the year 2025 at two nearby sites, and that Y contains the monthly records of both sites from 1981 to 2020.Then, a random selection of years for X might choose 1992, 2005, and 1987 for January, February and March, respectively, and the corresponding Y values of those months/years are used for both sites.That is, by denoting by     the simulated streamflow for month j of year i at site k (and similarly for the historical data Y) we have: Note that each site will have its own set of simulated values X, but the values will be correlated because the historical data is spatially correlated.Nevertheless, the mFGN distorts the spatial correlation as reported by Herman et al. (2016).As explained by Herman et al. (2016), the cross-correlation or spatial correlation is downward biased because this is simulated in the first step by bootstrapping, which then adds the temporal correlation or autocorrelation in a second step by the Cholesky decomposition procedure at each site.This second step that adds the autocorrelation is the one that biases the spatial correlation downward.Because of the limitations of mFGN in preserving spatial correlation, we propose an alternative method, as we describe next.
To introduce spatial correlation to the independently simulated streamflows of each site, we shall consider a three-dimensional version of the normalized historical inflow data matrix Y defined in Equations 1 and 2 so that the third dimension corresponds to each site (see Figure 2).Denote the new structure as   , which has dimension N × J × K, where K is the total number of sites and denote by Y k the normalized historical inflow data matrix for site k.We then have that , where The main idea of our procedure is described as follows.First, we create matrices , where As a second step, we calculate the spatial correlation matrix Corr(U j ) and its upper triangular Cholesky decomposition matrix R j (of dimension K × K), that is, Next, we construct a three-dimensional matrix   similarly to   , but using the matrices Z k of simulated data constructed in Section 3.2 for each site k instead of the normalized data matrices Y k .As before, we define matrices The key step of our procedure is the calculation of the matrices The matrix   now contains our simulated data for all sites and all months, which takes into account both temporal and spatial correlations, in that order.We shall call this procedure mFGNS, which is illustrated in Figure 2.

Reverting the Order: Temporal Correlation Integration Into Spatially Correlated Data
The mFGNS procedure proposed in Section 3.3.1 makes clear that spatial correlation is incorporated into the simulated data after accounting for temporal correlation.One could, however, invert the order in which we apply  9, we can first construct matrices U 1 , …, U J as in Equation 10and their respective Cholesky decomposition matrices R j as in Equation 11.The next step is to generate a new matrix  X of size N s × K with independent random samples from a Normal(0,1) distribution, where N s is the one year more than the number of years to simulate.Now, by using the Cholesky decomposition matrix R j , the uncorrelated synthetic matrix  X can be adjusted to capture the spatial correlation for each month j, that is, one computes Note that  Ṽ (which has dimension N s × K) contains the spatially correlated simulated data for month j.We then construct the three-dimensional matrix Then, for each k = 1, …, K, we apply the mFGN procedure of Section 3.2 with  Z in place of X, thereby yielding a matrix  W which incorporates temporal correlation into the (spatially correlated) simulated data for site k.
Finally, we construct a three-dimensional matrix   as The matrix   now contains our simulated data for all sites and all months, which takes into account both spatial and temporal correlations, in that order.We shall call this procedure SmFGN, which is illustrated in Figure 3.

Combining the mFGNS and SmFGN Approaches
As discussed earlier, the procedures mFGNS and SmFGN described in the previous sections both aim at the same goal, which is to incorporate spatial correlation into the mFGN approach.The two procedures, however, lead to different simulated results, as the order in which the spatial and temporal correlations are considered does indeed matter.As we shall see in Section 5, the dimension that is considered first (spatial or temporal) is worse represented in the simulated data than the dimension that comes second.Our proposed approach WmFGN deals with this trade-off; other previous methods, such as mFGN, implicitly prioritize the temporal correlation at the expense of the spatial correlation.
It is natural then to consider a weighted average of the simulated data generated by the procedures mFGNS and SmFGN, a procedure we shall call Weighted mFGN (WmFGN for short).Note that for the WmFGN procedure to work, the random numbers X used in the mFGNS and SmFGN procedures must be the same.More specifically, we consider the three-dimensional matrices   and   defined respectively in Equations 14 and 18, and define, for α ∈ [0, 1], Our goal is to find the value of α such that the spatial and temporal correlations induced by  () are closest to the corresponding correlations of the historical data.With that in mind, we define the following error metrics: In the above equations, Y j denotes a slice of   across the month j, Y k denotes a slice of   across the site k, and similarly for  Ŵ () and  Ŵ () .The metrics defined in Equations 20-23 measure the correlation error in four different ways-spatial or temporal error, mean or maximum error.Suppose the decision maker is interested in minimizing both the spatial and temporal errors, but one of the dimensions is more important than the other.Such preference can be represented by a (user-defined) parameter λ ∈ [0, 1] such that the temporal error has weight λ whereas the spatial error has weight 1 − λ.We can then define two optimization problems to find the optimal α: (25) Note that, in either case, the optimal α* is a function of the user-defined parameter λ.Once α* is found, by using Equation 19 the simulated data is then defined as  ( * ) .
Remark: The metrics defined in Equations 20-23 can be interpreted in terms of vector norms on matrices (see, e.g., Horn and Johnson (2012)).To see that, given a square matrix A M×M , for p ≥ 1 define the ℓ p -norm As customary, the above definition can be extended to p = ∞ as follows: ‖‖∞ ∶= max Define now the following error metrics: In the next sections, we present a case study to illustrate the application of the WmFGN procedure described above, and compare the results with those obtained by using the approach of Kirsch et al. (2013).

Case Study
To test the WmFGN methodology, we simulate streamflows from the Bio-bio river basin depicted in Figure 4.The Bio-bio river basin, located in Southern Chile, presents an average annual precipitation of 1,330 mm, which leads to mean daily discharges of 960 m 3 /s (Grantham et al., 2013).The basin has a significant urban area, which includes the city of Concepción, a large percentage of forest plantations (∼20% of the land cover), and has been historically important for the country due to its hydroelectric production (Grantham et al., 2013).The Bio-bio river basin represents almost 40% of the hydroelectric potential of Chile, a country that is historically known for the importance of its hydroelectric sector (CNE, 2023).In addition, the Biobio region (i.e., formed by the Biobio basin, small coastal basins near Biobio and also used to include the northern Ñuble region, see Figure 4) supplies about 10% of Chile's urban drinking water consumption (Molinos-Senante & Donoso, 2021).
Given the importance of hydroelectricity for Chile and the Bio-bio river basin, we decided to perform our numerical analysis on the streamflow database of that basin used by the National Electric Coordinator (NEC) (CEN, 2021).The data includes weekly streamflows (i.e., considering four weeks per month) between the hydrological years 1960/61 and 2018/19-note that hydrological years start in April in this region, for the rivers of interest for the NEC (e.g., inflows of hydro-power plants).After filtering the NEC streamflow database by location, the weekly flows were aggregated into monthly time series.Then, the flows were filtered to identify those for which most of their months (i.e., at least 9 out of 12) had a log-normal distribution (as evaluated by the Kolmogorov-Smirnov test, using a significance value of 5%).Finally, nine sites remained for the Bio-bio river basin, which are those presented in Figure 4. Statistics for these rivers are presented in Table 1, which include location, annual streamflow mean and standard deviation.
The seasonal variation of the monthly mean and standard deviation of the streamflows, for the period 1960/61-2018/19, are presented in Figure 5 for three representative rivers (Abanico, El Toro, and Ralco).Seasonal variations for the remaining six rivers are given in Supporting Information S1 (Figures S1 and S2).As can be seen in these figures, most locations, regardless of the streamflows magnitudes, present a double peak in the Winter months (Jun-July) and in Spring (October to December).The first one is related to a pluvial peak, given that most of the precipitation falls during Winter, while the second peak is related to snow-melt.Hence, the Bio-bio river basins has a mixed nivo-pluvial flow.
Some recent challenges of the Bio-bio river basin have been related to both floods and droughts.Hence, developing proper hydrological modeling is of importance for the basin.The Bio-bio river basin suffered a 100-year flood during Winter of 2006 (Gironás et al., 2021).On the other hand Bio-bio is undergoing a "megadrought", which corresponds to an uninterrupted event of below-average precipitation years since 2010 (Boisier et al., 2016).The  megadrought is a phenomenon that has affected other basins as well (Barría et al., 2021), having an impact for more than a decade over Central-Southern Chile (Garreaud et al., 2020(Garreaud et al., , 2021)).Also, climate change projections over Central-Southern Chile indicate that precipitations should decrease in the future (Araya-Osses et al., 2020;Chadwick et al., 2018Chadwick et al., , 2023)).

Optimal Parameters From WmFGN
The performance of the proposed WmFGN method is measured in term of its capability of preserving the original temporal and spatial correlations in the observed data.The measurements of correlation errors of the simulated data are computed in Equations 20-23 for different values of α and plotted in Figures 6a and 6b.The performance is also compared against that of mFGN, which is not dependent on α and presents difficulties with replicating the spatial correlation of the observed data.The temporal and spatial correlation errors for the mean and maximum error metrics for mFGN are computed using similar expressions as in Equations 20-23, but with the matrix  () replaced with the matrix corresponding to the mFGN method with spatial correlation added via re-sampling, as discussed in Section 3.3.1.We shall denote the resulting correlation errors for mFGN by Figures 6c and 6d depict the Pareto frontier of spatial and temporal correlation errors.As expected, there is a trade-off between obtaining good performance in the spatial correlation and good performance in the temporal correlation, both in terms of the mean error (Figures 6a and 6c) and the maximum error (Figures 6b  and 6d).
For this specific problem, we see in Figures 6b and 6d that the WmFGN method, with a value of α around 0.5, shows considerable reduction of the maximum error in the spatial correlation compared to mFGN (which coincides with the spatial error of WmFGN with α = 1), with almost no increase in the temporal correlation error.On the other hand, we observe in Figures 6a and 6c that there is a range of values of α (between around 0.75 and 0.98) for which both spatial and temporal mean errors for WmFGN are smaller than the mFGN errors, that is, for that metric WmGFN is superior to mFGN in both spatial and temporal dimensions.It is clear that the best possible option would be to have a method that perfectly simulates both temporal and spatial correlation.The mFGN method, as shown in Figure 6, by default prioritizes the temporal correlation over the spatial correlation, which is undesirable if one wishes to model spatial correlation properly.Thus, although having to balance between these two types of correlations can be perceived as a drawback of WmFGN, it represents a significant improvement compared to mFGN.Also, as discussed above, there is a range of α values that allow decreasing the spatial correlation, without sacrificing temporal correlation.
Although the results are specific for the Bio-bio basin, they represent a clear illustration of how the WmFGN presents an improvement over mFGN in terms of preserving both spatial and temporal correlations, in addition to allowing for more flexibility.As we shall see shortly, results obtained with an extended case for a larger portion of Chile are qualitatively similar to those obtained for the Bio-Bio basin.
When finding the optimal weight α balancing mFGNS and SmFGN in Equation 19 that minimizes the weighted sum of spatial and temporal mean (resp.maximum) correlation errors with model Equation 24(resp.Equation 25) under different user-defined parameters λ, we obtain a curve as displayed in Figure 7a (resp.Figure 7b).As discussed earlier, the value of λ allows the user of WmFGN to prioritize between the spatial (i.e., λ = 0) or temporal (i.e., λ = 1) correlation.Not surprisingly, the optimal value of α coincides with λ at the extreme cases--after all, if the user is only concerned with spatial correlation (i.e., chooses λ = 0) then the best combination between mFGNS and SmFGN is really just using mFGNS which gives the highest priority to spatial correlation, and that corresponds to taking α = 0 in Equation 19.An analogous argument holds for the case where temporal correlation is preferred.
The optimal values of the objective functions 1 (Equation 24) and 2 (Equation 25) are presented in Figures 7c  and 7d, respectively, for different values of the user-defined parameter λ.For comparison, we also compute the values of Objectives 1 and 2 for the mFGN.This amounts to calculating Objective 2: where    avg ,    avg ,    max and    max are the correlation errors for mFGN, as defined earlier.Note that for objective function 1, WmFGN always yields lower values than mFGN (Figure 7c), whereas for objective function 2, WmFGN yields lower values than mFGN for most values of λ, except when λ is close to 1 in which case both WmFGN and mFGN coincide (Figure 7d).The advantage of using WmFGN over mFGN increases as the user gives higher importance of spatial correlation over temporal correlation, which is visually represented as the increasing gap between the objective functions in Figures 7c and 7d, as λ approaches zero.
Deciding on an appropriate value for λ will depend on the user's priority for correctly simulating spatial or temporal correlation, which will eventually define the associated value for α.As an example of a situation in which a decision maker might want to prioritize the correct modeling of the spatial correlation over the temporal one, consider a system of similar reservoirs that are located geographically close to each other, so that properly understanding the system operation requires incorporating the spatial interaction among them.On the other extreme, consider a reservoir system that comprises a single large reservoir plus other smaller ones.In that case, one would most likely prioritize the temporal correlation.Suppose now that the user has similar priorities for both correlations, and wants to decide which λ to use.Then, an option could be simply using λ = 0.5.Interestingly, as seen in Figure 7a, such a value corresponds to taking α = 0.33, that is, giving twice the weight to mFGNS relatively to SmFGN.Another option could be equating the temporal and spatial errors, which for the mean error criterion yields a value of α of 0.562 (Figure 6a), whereas for the maximum error criterion it yields a value of α = 0.578 (Figure 6b).These values of α correspond to taking λ = 0.574 and λ = 0.842 for the mean and maximum error criteria, respectively (Figures 7a and 7b).Note also that these values of λ are the maximizers of the WmFGN objective functions in Figures 7c and 7d, and represent a change in the concavity of the α-curves (Figures 7a  and 7b).
To further assess the performance of the WmFGN method with the optimal α parameter, we adopted the value of α of 0.562, which as explained equates the mean spatial and temporal errors for the case study as one can see from Figure 6a.With this α we simulated 10,000 years of streamflows.Then the flow duration curves (FDC) were built for each river and compared with those of the observed streamflows.Figure 8 below shows the flow duration curves for the Ralco river from April to March; the curves for the remaining rivers can be found in Supporting Information S1 (Figures S41 to S48).The FDC Figures show that the WmFGN is able to properly simulate the historical observed frequencies.Also, given that more years were simulated in the synthetic generation, more extreme flows appear in the high flows, which is expected from a parametric method, and an advantage of using parametric seeds in the WmFGN method over using historical seeds, as was proposed by Kirsch et al. (2013) in the multi-site mFGN.
Further testing was conducted by applying the WmFGN method in the entire region of Central-Southern Chile.Due to reasons of space, we comment on the results but do not include figures; those can be found in Supporting Information S1, and are numbered Sxx in the text that follows.This extended experiment case study (Figures S49 to S55 in Supporting Information S1) was carried out using 37 rivers from 21 communes (local Chilean territorial divisions, similar to the counties), covering ∼1,000 km of territory from Central to Southern Chile.The climates range from semi-arid with less than 300 mm of precipitation in Central Chile to humid climate with ∼2,000 mm in the South.Note also, that in this area, there is a mixture of nival, mixed nivo-pluvial and pluvial basins (i.e., with more nival behavior in the northern region).
The overall behavior of the mean and maximum spatial and temporal errors (Figure S50 in Supporting Information S1, similar to Figure 6, but for the extended experiment), is quite similar to the case study presented here.Some differences are that the mean error has a slight increase in the extended experiment, due to the higher number of rivers been simulated, while the maximum error has a strong deterioration.The WmFGN shows that when several rivers are considered, the maximum spatial error can be significantly reduced without sacrificing the spatial error (α of 0.55 or higher).Because of this behavior, the objective function 2 in the extended case study shows a constant decreasing when diminishing the λ parameter (Figure S51 in Supporting Information S1).The WmFGN approach is less affected than mFGN when simulating the extended experiment, so there is more to gain by using WmFGN with a larger number of rivers.

Illustration of the Behavior of the Correlations
One advantage of the WmFGN approach, compared to other methods proposed in the literature (including mFGN) is that it tailors the procedure according to the importance of temporal versus spatial correlation specified by the user, which in this case is accomplished by means of the parameter λ in Equations 24 and 25.Figures 9d-9h and 10d-10h illustrate that flexibility, displaying the correlations calculated from the simulated data generated by WmFGN using the mean error metric (Objective 1).The figures depict the temporal correlation of among months for a representative river (Ralco), and the spatial correlation among locations for a representative month (November), respectively, for different values of the parameter λ.Note that Ralco was chosen as representative because it is the river with the highest streamflow (Table 1), and November is one of the peak flow months in the basin (Figure 5).In addition to the actual correlations, Figures 9i-9m and 10i-10m show the correlation errors with respect to observed data.For the sake of comparison, Figures 9a-9c and 10a-10c show the correlations of observed data, the correlations calculated from data simulated for mFGN, and the associated correlation errors.
The figures demonstrate that WmFGN accomplishes what it proposes to do.For values of λ ≥ 0.75 (priority to temporal correlation), we see in Figure 9 that the temporal correlation errors are indeed small.These errors increase as λ decreases.In Figure 10 we see the opposite effect-the spatial errors are small for λ ≤ 0.25 (priority to spatial correlation), and increase as λ increases.The figures also corroborate the previous conclusions about the benefit of the WmFGN approach over mFGN for preserving both spatial and temporal correlations.The mFGN procedure-which by construction prioritizes temporal correlation-presents very low temporal correlation errors as shown in Figure 9c, at the expense of  high spatial correlation errors (see Figure 10c).This is in line with the results of previous studies (Herman et al., 2016).However, a comparison between the correlation errors for mFGN and for WmFGN with λ = 1 in Figures 9m and 10m (which is the comparable case where full priority is given to temporal correlation) shows that the temporal correlation errors for WmFGN are in fact smaller than those for mFGN, and the spatial correlation errors are similar.Moreover, by introducing flexibility via the λ parameter, the WmFGN approach allows the user to "sacrifice" some of the precision in the temporal correlation in order to increase the precision in the spatial correlation-a flexibility that is not present in mFGN.
The above discussion is based on the results corresponding to Objective 1 (mean error criterion).Similar conclusions can be obtained by examining Figures 11 and 12, which display the results corresponding to Objective 2 (maximum error criterion).Note also that the conclusions drawn above for the chosen representative river and month apply similarly to the other rivers and months considered in this paper as shown in Figures S3 to S40 in Supporting Information S1.
There is also an extended exercise that replicates Figures 9-12, but using 37 river basins across Central and Southern Chile.Considering so many river basins in the extended exercise, has a cost in the performance of WmFGN, which can be seen when comparing Figures 9i, 9j, 11i, and 11j, with Figures S52i, S52j, S54i, and S54j in Supporting Information S1, respectively, showing some degradation in the representation of the temporal correlation.This effect is not so clear for λ values of 0.5 or higher.On the other hand, WmFGN shows a great potential, being able to replicate the spatial correlation in a variety of basins, for such a territorial extension (Figures S53 and S55 in Supporting Information S1) obtaining a good representation of the spatial correlation.The same results were built for each month and river, but are not shown due to the large number of figures, nevertheless, the performance is similar to those of the selected figures.
Although the choice of error metric to be used (mean or maximum error, corresponding to Objective 1 and 2, respectively) is problem-specific and depends on the priorities of the user, there are some general recommendations.For example, when comparing the WmFGN temporal correlation errors in Figures 9i-9m and 11i-11m, we see that the latter are more sensitive to changes in the user-defined parameter λ-indeed, the errors are similar up to λ = 0.75, and then they change considerably for λ = 1.The correlation errors in Figure 9 change more smoothly and hence are not so sensitive to small changes in λ.Such behavior is also illustrated in Figures 7a and 7b, where we see a smoother curve for the case of the mean error metric.These properties, if desired, would favor the use of the mean error metric over the maximum one.Overall, we tend to recommend using the mean error as it is representative of the average performance.However, there are cases where the maximum error might be more important, for example, in a basin where one river is more important than the rest (i.e., it is larger and provides most of the water availability), one might want to optimize the maximum errors in that river, because having an error there would lead to a poor hydrological modeling of the entire basin.

Conclusions
Hydrology has used for several years the synthetic simulation of hydroclimatic variables in different problems.Several reasons make it attractive to extrapolate historical records, or to have the capability of analyzing the behavior of infrastructure under conditions different from the historical ones.When evaluating the design of new water infrastructure such as reservoirs or water facilities, stochastic methods have been used.These tools have also shown to be useful in the evaluation of the operation of current infrastructure.In addition, due to challenges as climate variability and change, it does not suffice to evaluate new and existing infrastructure under historical conditions.For this reason, the synthetic simulation of streamflows that are not only consistent with historic statistical properties, but also adjustable to future conditions is necessary.The stochastic models of the family of Fractional Gaussian Noise (FGN) have great potential for this.
The FGN approaches have shown to be capable of capturing long term memory in time series.Unfortunately, the original FGN procedure is not able to simulate infinite time series; that changed when the Modified FGN (mFGN) method was developed.The mFGN procedure is capable of simulating infinite time series that recreate the seasonal or periodic correlation structure, overcoming the major limitation of FGN.Also, mFGN has shown to replicate the temporal correlations of the data.However, mFGN is not well suited to represent the spatial correlation structure required to simulate several streamflows at the same time.
In this paper we have proposed a new method, called Weighted mFGN (WmFGN), that improves spatial correlations without significantly sacrificing the temporal correlations.Our numerical experiments for several basins in Chile demonstrate that the WmFGN procedure represents a significant improvement in preserving the spatial 10.1029/2023WR035371 20 of 22 correlation, when compared against mFGN.Moreover, since there is a trade-off in terms of representing the spatial and temporal correlations, the method allows the user to specify the importance of one type of correlation over the other, and tailors the method for that choice by optimizing over some internal parameters.To the best of our knowledge, no other method in the literature addresses this trade-off in a systematic way.
Regardless of the trade-off, in our experiments the WmFGN procedure outperforms mFGN, even when temporal correlation is prioritized.Moreover, the higher the priority of the spatial correlation specified by the user, the higher the benefit of using WmFGN over mFGN.
As discussed earlier, the proposed approach requires the user to specify the importance of temporal correlation over the spatial one, by means of a parameter λ such that the weight of temporal correlation is λ whereas the weight of spatial correlation is 1 − λ.In the absence of a preference, the user can give equal weights to both correlations (i.e., choose λ = 0.5).Note however that such a choice does not imply that the errors in both correlations (with regards to observed data) are the same; thus, another possible choice for the user is to impose that the errors in both correlations be equal, and let the method compute the corresponding value of λ automatically.
Finally, it is important to remember that our conclusions about the performance of WmFGN are based on the numerical experiments we have conducted.Although, we have tested the method in several basins in Chile, future studies should further test the WmFGN in other basins, and also with different climates.Moreover, different error metrics can be tested; for instance, the remark in Section 3.3 suggests that other vector norms on matrices (or, more generally, other matrix norms) can be used.As the scope of this study was to present WmFGN, we did not thoroughly test the method under climate change conditions.Future studies, should also consider adding climate change and testing the method in different experiments.This research was funded by Grants ANILLO ACT 192094 and FONDECYT de Iniciación 11220952.We want to give a special thanks to Constanza Zavala for her help in the literature review and in the extended experiment.We also want to thank the anonymous reviewers and the editors of the paper, who have written thorough reviews that helped us significantly improve the manuscript.
swapped data matrix S ≔ Y T. Define S 1 and S 2 as the left and right halves of S, that is,  = [ | ].

Figure 1 .
Figure 1.(a) Example process of how to retrieve Y' out of Y matrix (equivalent to the obtention of X' out of X).(b) Demonstration of how to build Z. Monthly scaled version of the mFGN process developed in Kirsch et al. (2013).

)
In the above equations, Y j denotes a slice of   across the month j, Y k denotes a slice of   across the site k, and similarly for  Ŵ () and  Ŵ () .We see that Equations 26 and 27 coincides with Equations 22 and 23 when p = ∞.Moreover, when p = 1, Equation 20 is equivalent to Equation 26 divided by JK 2 , whereas Equation 21 is equivalent to Equation 27 divided by KJ 2 .We use Equations 20-23 as they are more intuitive to formulate, but the interpretation as vector norms on matrices opens the possibility to measure the error with different values of p-for instance, p = 2 which corresponds to the well-known Frobenius norm.

Figure 5 .
Figure 5. Monthly mean (a, c, and e) and standard deviation (b, d, and f) of the streamflows of Abanico (a, and b), El Toro (c, and d), and Ralco (e, and f) rivers.

Figure 6 .
Figure 6.Mean (a) and maximum (b) spatial and temporal correlation errors as a function of α; Panels (c) and (d) depict spatial versus temporal error for the mean and maximum error metrics, respectively.

Figure 7 .
Figure 7. Optimal values of α as a function of λ for the mean error (a) and maximum error (b) criteria; (c) and (d) depict the optimal objective function values in Equations 24 and 25, respectively, as a function of λ.

Figure 8 .
Figure8.Flow duration curve (FDC) of the historical observed and those of 10,000 years generated with WmFGN.The values for WmFGN correspond to α = 0.562 for the Ralco river in the months of April to March (subfigures a through l), respectively.Note that the first month is April, as this is the start of the hydrological year.

Figure 9 .
Figure 9. Pairwise temporal correlations of the 12 months of the year (i.e., the first month of the hydrological year is April), for the Ralco river, for different time series: (a) observed, (b) mFGN, (c) absolute difference between observed and mFGN, (d-h) WmFGN with different λ values, subjected to objective function 1, and (i-m) absolute difference between the observed and WmFGN with different λ values.

Figure 10 .
Figure 10.Pairwise spatial correlations of the nine river sites (i.e., the sites use the numbering from Table 1), for the month of November, for different time series: (a) observed, (b) mFGN, (c) absolute difference between observed and mFGN, (d-h) WmFGN with different λ values, subjected to objective function 1, and (i-m) absolute difference between the observed and WmFGN with different λ values.

Figure 11 .
Figure 11.Pairwise temporal correlations of the 12 months of the year (i.e., the first month of the hydrological year is April), for the Ralco river, for different time series: (a) observed, (b) mFGN, (c) absolute difference between observed and mFGN, (d-h) WmFGN with different λ values, subjected to objective function 2, and (i-m) absolute difference between the observed and WmFGN with different λ values.

Figure 12 .
Figure 12.Pairwise spatial correlations of the nine river sites (i.e., the sites use the numbering from Table 1), for the month of November, for different time series: (a) observed, (b) mFGN, (c) absolute difference between observed and mFGN, (d-h) WmFGN with different λ values, subjected to objective function 2, and (i-m) absolute difference between the observed and WmFGN with different λ values.

Table 1
Locations of the Streamflows With Their Annual Mean, and Standard Deviation