Probabilistic Sensitivity Analysis With Dependent Variables: Covariance‐Based Decomposition of Hydrologic Models

Variance‐based analysis has emerged as method of choice for quantifying the sensitivity of the output, y, of a scalar‐valued square‐integrable function, f ∈ L2( χ $\boldsymbol{\upchi }$ ), to its d ≥ 1 input variables, x = {x1, …, xd}, with support x∈χ⊆Rd $\mathbf{x}\in \boldsymbol{\upchi }\subseteq {\mathbb{R}}^{d}$ . The prototype of this approach, Sobol's method is a generalization of the analysis of variance (ANOVA) to d > 2 independent input variables and decomposes y, as sum of elementary functions of zeroth‐, first‐, second‐, up to dth‐order. This independence assumption is mathematically convenient but may not be borne out of the causal or correlational relationships between the x's. This paper is concerned with variance‐based sensitivity analysis (SA) for correlated input variables, for example, multivariate dependencies in a posterior parameter distribution. We use high‐dimensional model representation (HDMR) of Li et al. (2010, https://doi.org/10.1021/jp9096919), Li and Rabitz (2012, https://doi.org/10.1007/s10910-011-9898-0) and replace Sobol's elementary functions with so‐called component functions with unknown expansion coefficients to disentangle the structural, correlative and total contribution of input factors. We contrast the default HDMR methodology with cubic B‐splines and sequential coefficient estimation against its successor, HDMRext of Li and Rabitz (2012, https://doi.org/10.1007/s10910-011-9898-0), which uses polynomial component functions with an extended orthonormalized basis. Benchmark experiments confirm that HDMR and HDMRext parse out the structural and correlative contributions of input factors to the model output and infer an optimal experimental design with parameter correlation. Our last study applies HDMRext to probabilistic SA of a watershed model. The multivariate posterior parameter distribution supports model emulation and yields sensitivity indices that pertain to measured discharge data.

where f 0 signifies the expected (mean) value, 0 = [ ( )] , of the model output and ϵ is the residual. This residual is assumed to follow a normal distribution, ∼  ( 0, 2 ) , with constant (unexplained) variance, 2 . The first-order functions, f i (x i ), characterize the individual effects of the input variables on the simulated model output.
The second-, f ij (x i , x j ), third-, f ijk (x i , x j , x k ), up to dth-order effects, f 12…d (x 1 , x 2 , …, x d ), characterize the contribution of groups of two, three, up to all input variables combined to y. These functions account for interactions between the input variables that contribute partially to the model output. with n 1 = d, main effects, n 2 = d(d − 1)/2, second-order effects, n 3 = d(d − 1)(d − 2)/6, third-order effects, and so forth. The total number of additive terms (summands) in the expression above equals 2 d .
Unknown to Sobol' (1993) at the time, the decomposition of Equation 2 was derived decades earlier by Hoeffding (1948) as a generalization of the analysis of variance (ANOVA) to d > 2 independent variables. Functional decomposition of the variance of the model output, y, in Equation 2 produces the following identity (Hoeffding, 1948;Sobol', 1990) If we divide all terms by Var[y], we yield Sobol' sensitivity indices pending the assumption that 2 ≪ Var[ ] . The S i 's are the individual (main) effects of the input variables on the simulated output and the sensitivity indices with more than one subscript are the interactions. As the order of the indices in the subscript of the interaction terms is inconsequential, we follow convention and sort the indices from low to high.
A useful related measure is the total-order or total-effect sensitivity index (Homma & Saltelli, 1996) and includes all Sobol' indices of Equation 4 with index i in the subscript. The total-order sensitivity index equals the fraction of variance that would remain on average if x i is left to vary over its prior range while all other input variables are fixed (Puy et al., 2021). Note that ∑ =1 T ( ) ≥ 1 as interaction terms, say, S ij (x i , x j ), count in both T and T . Most mathematical functions do not admit a closed-form solution for the Sobol' indices, and, therefore, we must resort to a numerical procedure to estimate the sensitivity indices of Equation 4. This includes the (quasi)-Monte Carlo approaches of Sobol' (2001) and Owen (2005), Fourier amplitude sensitivity testing (Cukier et al., 1973), random balance designs (Tarantola et al., 2006) and variogram analysis of response surfaces (Razavi & Gupta, 2016) and involve repeated evaluation of the model output f(x) for x ∈ . Note that the last three methods solve only for the main and/or total-order indices of the input factors.
Sobol's decomposition is unique only under orthogonality constraints. If u is a subset of selected dimensions of the input variables, u ⊆ {1, …, d}, then Equation 2 is synonymous to a d-way ANOVA-expansion of f(x) under one of two constraints (Chastaing et al., 2015)  The independence requirement of x 1 , …, x d is mathematically convenient but may not be borne out of the causal or correlational relationships between the input variables. Such association is not uncommon in hydrologic modeling and may be imposed by constraints on the input variables or by the fact that experimental data, ̃= {̃1, . . . ,̃} , and/or expert judgment are used (Mara et al., 2015). Indeed, correlation among x 1 , …, x d is almost guaranteed if the model output originates from the posterior probability density function, ( |̃) , of the input variables (Gomes et al., 2017;Pan et al., 2009;Song et al., 2015;Vrugt et al., 2008;H. Wang, Gong, et al., 2020). This induced structure in the ( , ( )) -relationship violates the expansion of Hoeffding (1948) or Sobol' (1993) in Equation 2 and demands an ANCOVA-based model decomposition. Several publications have discussed the importance of parameter correlation in hydrologic SA (Song et al., 2015;Yuan et al., 2015), but most applications of global SA in hydrologic modeling continue to rely on parameter independence (A. Wang & Solomatine, 2019;A. Wang, Pianosi, et al., 2020;Zelelew & Alfredsen, 2013). A few studies have considered parameter correlation in hydrologic SA but the dependent parameters are (a) either lumped together (Khare et al., 2013) or (b) assumed to follow a convenient multivariate normal distribution (Ahn, 1996;Manache & Melching, 2008;Pan et al., 2011;Zhu, 2012) or isoprobabilistic Nataf transformation (Do & Razavi, 2020). The so-obtained sensitivity indices do not unify with Sobol' indices in Equation 4 or in the case of the gSTAR-VARS method of Do and Razavi (2020) provide only total-order effects.
In this paper, we are concerned with variance-based SA of hydrologic models in the presence of correlated input variables, for example, multivariate dependencies among the parameters of a posterior distribution. We use high dimensional model representation or HDMR (Li & Rabitz, 2012; and replace the elementary functions in Sobol's decomposition of Equation 2 by so-called component functions with linear expansion coefficients as unknown parameters. If the component functions satisfy hierarchical orthogonality, this meta-modeling approach disentangles exactly the structural and correlative contributions of individual and groups of input variables to the model output. The resulting sensitivity indices are a generalization of Sobol' indices in Equation 4. The HDMR methodology has been used in earlier publications in the hydrologic literature to prioritize and/or rank input factors (Bennett & Fentie, 2017), quantify interaction terms of the input factors (Borgonovo et al., 2017) and assess the direction of change in model outputs (A. Wang, Pianosi, et al., 2020). But these studies are limited to independent input variables, do not necessarily satisfy hierarchical orthogonality of the component functions and/or use a first-order model decomposition only. We use first-, second-and third-order HDMR model decomposition and examine in detail how the structural, correlative and total sensitivity indices depend on soil and watershed wetness. Furthermore, we investigate the impact of parameter correlation on experimental design and demonstrate the use of the HDMR decomposition for model emulation.
The HDMR method arguably is one of the most advanced SA methods, but by no means the only variance-based approach that can handle correlation between the input factors. These approaches simplify ANCOVA-based function decomposition to a correlated and uncorrelated contribution (McKay, 1997;Xu, 2013;Xu & Gertner, 2008a, 2008b or main effects and total sensitivity indices (Kucherenko et al., 2012(Kucherenko et al., , 2017, made possible in part by application of Sobol's method to orthogonalized input variables derived from grouping (Jacques et al., 2006;Saltelli & Tarantola, 2002), Gram-Schmidt projection (Chastaing et al., 2015;Mara & Tarantola, 2012) and isoprobabilistic transformation (Mara et al., 2015;Tarantola & Mara, 2017). HDMR decomposes the variance-covariance structure of the input-output relationship into a structural and correlative contribution at first-, second-, and/or higher orders. This general-purpose method admits samples of the posterior distribution of the input variables for probabilistic SA of the ( , ( )) -relationship. The remainder of this paper is organized as follows. Section 2 summarizes the problem at hand. Section 3 presents rationales and theory of HDMR and discusses its default implementation and a variant with extended bases, HDMR ext . This is followed in Section 4 by application of both methods to four case studies of increasing complexity. The first two studies illustrate and benchmark the performance of HDMR and HDMR ext for dependent and independent input variables by application to a simple linear model and the well-known Ishigami function (Ishigami & Homma, 1990). The next two case studies demonstrate the power and usefulness of HDMR ext in GAO ET AL.
10.1029/2022WR032834 4 of 29 hydrologic modeling using SA of the soil water characteristic of van Genuchten (1980) and the rainfall-discharge relationship using hmodel . In these last two studies, we analyze traces of the structural and correlative sensitivity indices and examine in detail the performance of the HDMR ext emulator as a cheap surrogate of the forward model. Finally, Section 5 concludes this paper with a summary and discussion of our main findings. Specifically, we address the strengths and weaknesses of the HDMR and HDMR ext , and provide some suggestions for future developments.

Problem Statement
Let's consider a real-world system, , whose future state, behavior and regularities may be described with explanatory laws, for example, those most fundamental and celebrated in physics. Our model, Y ← f(x, Ω), scales up the physical laws to the domain of , simulates its spatiotemporal evolution and returns a matrix, Y, of simulated values of different quantities of interest. The d-vector, x = {x 1 , …, x d }, specifies the system and/or material properties of and the array, Ω, consists of all other variables (constant or not) believed to govern, by causality, the variables of Y using the evolution rules of nature expressed in mathematical form. Generally, f is not available in closed form but we assume that it can be evaluated at any point, = { 1, . . . , } ∈ ⊆ ℝ .
As most real-world systems have an intractable complexity, the dimensionality of the input space, (x, Ω), may grow very large to accurately portray the spatiotemporal evolution of . This large model complexity will cloud the relationship between model inputs and outputs. SA will help determine which input variables exert the largest control on the simulated output, y, and which variables (parameters) have a negligible impact, thus, is an essential ingredient of model building and quality assurance (Saltelli et al., 2020). This serves many different purposes among which model simplification, model development, model calibration and uncertainty analysis and/or reduction are the most important applications. This paper addresses a critical limitation of commonly used SA methods in hydrology and is concerned with global SA in the presence of correlated input variables. Such linear and/or nonlinear dependencies violate the variance summation used in Sobol's method and necessitate application of a mathematically much more rigorous but arguably more difficult ANCOVA-based model decomposition. Before doing so, we first clarify our mathematical notation. Without loss of generality, we focus in this paper on the vector of model parameters, x, and consider only a single output, y. Thus, we suppress the array Ω in our subsequent notation and write, = ( ) ∶ ℝ → ℝ 1 , for the scalar-valued form of the model with respect to x. The entries of the d-vector, x = {x 1 , …, x d }, are referred to as input variables, input factors or parameters in the remainder of this paper. After computation of the model output, y = f(x), the input variables are normalized to the unit d-dimensional cube, = { | 0 ≤ ≤ 1 for all = 1, . . . , } , prior to SA.
In this paper, we are concerned with a rigorous variance-based description of the structural and correlative contribution of hydrologic input variables to the model output and not with computational efficiency. This is inconsequential for our current case studies, but can pose a major limitation for high-dimensional and/or CPU-intensive hydrologic models. We will address the computational requirements of the HDMR methodology in passing.
Although we use Sobol's method as main reference, we reiterate that this paper is not about a comparison of variance-based methods but rather about (probabilistic) SA in the presence of dependent input variables.

High Dimensional Model Representation
In this section, we present a brief review of the main theory and ingredients of the HDMR methodology of  and describe its most up to date implementation with extended bases, so-called HDMR ext , from Li and Rabitz (2012).

Main Theory
The HDMR method is an extension of Sobol's method to correlated input variables. HDMR is rooted in statistical theory and uses variance/covariance-based decomposition of the model output to determine the structural and correlative contributions of each input variable separately and combinations thereof. Projector operator theory simplifies the learning of the input-output relationships to polynomial complexity.
HDMR builds on the model output decomposition in Equation 2 and replaces the elementary functions,  Kucherenko et al. (2011), component functions up to the third-order will usually suffice for describing the ( , ( )) -relationship of systems with not equally important variables (type A) and systems with equally important variables and dominant low order terms (type B). Most physical systems are of these two types (e.g. Falchi et al., 2018;Li & Rabitz, 2012;Rabitz & Aliş, 1999;Ratto et al., 2007;Shereena & Rao, 2019;H. Wang et al., 2017;Ziehn & Tomlin, 2008). The last class of functions with equally important variables and dominant interaction terms, type C identified by Kucherenko et al. (2011), may demand fourth and/or higher-order component functions. This may introduce problems with the well-posedness of the HDMR functional decomposition, unless a very large number of input-output samples is used. Note that the sum of the component functions at the right hand side of Equation 7 may serve as a cheap surrogate model (emulator) of the original system model, f(x). The emulator provides an approximation of f(x) − f 0 up to the residual, . In practice, we approximate 2 by the sample variance, 2 , of the residuals of the surrogate model. The change in subscript notation of the component functions suits variance decomposition of the simulated output where w(x) denotes the probability density function of the d-variate prior distribution of the input variables, x. If we substitute Equation 7 into Equation 8 we yield the following expression  Var which is equivalent to where equals the unit cube of the normalized input variables. Least squares determination of the different f u 's enforces orthogonality between the component functions and residuals, ∼  ( 0, 2 ) . As a result, the inner product of each f u and ϵ equals zero. Thus, the second term dissipates and Equation 10 simplifies to This results in the following formulation for the variance of the model output, y, in Equation 7 Var The variance of the model output equals the sum of the variances and covariances of the component functions.
we yield the structural, a , and correlative, b , sensitivity of each component function, f u , and, thus, input variable or combinations thereof. The total sensitivity index, S u , of the uth component function, f u , now simply equals the sum of its structural and correlative sensitivity indices to yield  = a + b .
the sum of the total sensitivity indices of the component functions should amount to one. When the input variables are independent, the correlative contribution of the component functions will be zero by definition and Equation 13 simplifies to Sobol' variance decomposition and the total sensitivity index, S u , of each component function will then amount to its structural sensitivity index, a , only. Thus, for independent input variables, HDMR reduces to Sobol's method. We are left with the choice of the component functions of Equation 7. Unfortunately, the model decomposition in Equation 7 does not admit a convenient closed-form solution for the mathematical form of the component functions (Khorashadi Zadeh et al., 2017;Li et al., 2001). We must therefore prescribe a suitable functional form of the component functions.  used uniform cubic B-splines for the first-, second-and third-order component functions, where m equals the number of so-called Bézier curves of each cubic B-spline, α, β, and γ are unknown expansion coefficients of the component functions and B r,3 (x i ), B s,3 (x j ), and B t,3 (x k ) are defined on the unit intervals of by recursion using the Cox-de Boor algorithm (Cox, 1972;De Boor, 1972). Thus, for m = 4, the number of control points of each cubic B-spline is equal to m + 3 = 7.
If we substitute Equations 18a-18c into Equation 2 we yield the following expression for the model output which has a total of d(m + 3) unknown α-coefficients (first order), d(d − 1)(m + 3) 2 /2 unknown β-coefficients (second order) and d(d − 1)(d − 2)(m + 3) 3 /6 unknown γ-coefficients (third order). The total number of expansion coefficients, l, of the HDMR functional decomposition in Equation 19 is now equal to 7 of 29 and articulates a linear, quadratic and cubic growth of the number of expansion coefficients of the first-, second-, and third-order component functions, respectively, with the number of input variables d and control points, m + 3, of the B-splines. To guarantee a well-posed inverse problem, the number of input-output samples N should exceed the number of expansion coefficients, l. In general, the larger the quotient of N by l, the more robust the estimates of α, β, and γ will be and the smaller the confidence intervals of the HDMR sensitivity indices. In our case studies, the ratio N/l is at least equal to seven but often considerably larger.
As the variance decomposition of Equation 7 is additive and the basis functions of Equations 18a-18c are valid we can resort to regularized least squares to simultaneously determine the expansion coefficients, α, β, and γ, of all n 123 component functions from a collection of N different (x, y)-data pairs. As the number of expansion coefficients grows rapidly with dimensionality, d will determine whether the first-order component function of x d is significant or not. We must reject the null hypothesis, " 0 ∶ ( ) is insignificant ," if the F-statistic exceeds the critical value, −1  ( | 1 − 0, − 1) , of the F-distribution at confidence level, p α ∈ (0, 1), and l 1 − l 0 , and N − l 1 degrees of freedom. One can use forward selection (starting from f 0 ) or backward elimination (starting from 0 + ∑ =1 ( ) ) to determine the significant first-order component functions. Then, a similar procedure can be used for the second-order component functions, with significant first-order component functions fixed and so forth. Figure S1 in Supporting Information S1 summarizes the different steps of the HDMR methodology of . Interested readers are referred to this publication for more information.
The model decomposition of Equation 2 considers all possible orders of the component functions up to f 12…d (x 1 , …, x d ). As the higher order component function usually contribute little to the simulated output, y, the HDMR functional decomposition in Equation 19 uses only first-, second-and third-order component functions. If deemed appropriate, we can discard the third-order component functions and work with a second-or possibly even first-order model decomposition instead. We can evaluate the overall performance of the HDMR decomposition with different order component functions to determine a suitable maximum order ∈ ℕ+ of the component functions. Unless stated differently, we use m = 2 Bézier curves for the cubic B-splines and implement forward selection of the HDMR component functions using p α = 0.99.

Extended Bases
Sequential determination of the component functions of each given order with backfitting in the HDMR method may be computationally appealing but has important side effects that become more and more debilitating with increasing dimensionality, d, and degree of correlation of the input variables. Under those circumstances, backfitting may suffer convergence problems in pursuit of the optimum values of the expansion coefficients and does not guarantee a minimum variance of the residuals. Then the total sensitivity indices, S u , of the n 123 component functions may not sum to one. A more profound problem with the HDMR implementation is that the cubic B-splines may not satisfy hierarchical orthogonality. This provokes further problems with the variance decomposition of the model output.
To guarantee an exact decomposition of the variance of the model output, y, Li and Rabitz (2012) introduced an alternate implementation of the HDMR methodology with extended bases. This method, abbreviated herein to HDMR ext , enforces hierarchical orthogonality between the component functions. Furthermore, the coefficients, α, β, and γ of the first-, second-, and third-order component functions are simultaneously determined using least squares regression. These modifications guarantee an exact description of the structural and correlative sensitivity indices of the input variables, Hierarchical orthogonality of the component functions is guaranteed if we satisfy a so-called relaxed vanishing condition (Hooker, 2007) where u is a subset of elements of the superset U = {1, …, d}, x u denote the dimensions u of the input vector and w u (x u ) signifies the probability density function (pdf) of x u . For a second-order component function, the vanishing condition of Equation 22 dictates that f ij (x i , x j ) should be orthogonal to its lower order component functions, f i (x i ) and f j (x j ). With independent input variables, the multivariate pdf w u (x u ) equals a |u|-variate uniform distribution with constant density for all ∈ | | , and Equation 22 reduces to Equation 2 of Sobol' (2001).
To satisfy the vanishing condition of Equation 22, Li and Rabitz (2012) resort to the family of orthogonal polynomial functions where the coefficients, a, b, and c, are determined by applying Gram-Schmidt orthonormalization to the respective basis, x 3 , x 2 , x and 1 of the polynomials. This projection operator constructs an orthonormal basis for the polynomial functions on the unit interval of x with respect to an arbitrary weighting function. The component functions of Equation 7 are now equal to the sum of linear multiples of the orthonormalized polynomial functions of degrees 1 to p to yield where the expansion coefficients, α, β, and γ, may be estimated from a large collection of (x, y)-data pairs using least squares regression. Unlike the cubic B-splines in HDMR, the component functions of HDMR ext are written as sum of linear multiples of orthonormalized polynomial functions of degrees 1 to p. This is what Li and Rabitz (2012) refers to as extended bases and is required to satisfy the vanishing condition of Equation 22. The use of extended bases has implications for our index notation of the coefficients. The symbol(s) in the parenthesis of each superscript of α, β and γ convey the component function. The symbols that are listed next in superscript correspond to the indices of the input vector.
Suppose that all component functions have the same polynomial degree, p. This is not a requirement of our MATLAB implementation of HDMR ext but simplifies the mathematical description. Then, if we substitute Equations 24a-24c into Equation 2 we yield the following expression for the model output GAO ET AL.
10.1029/2022WR032834 9 of 29 where δ signifies the maximum order of the HDMR ext decomposition. Thus, the complexity of the HDMR ext emulator is governed by the dimensionality, d, of the input space and the order δ and polynomial degree p of the component functions.

We can write Equation 25 as a vector inner product
of a 1 × l design vector, ϕ( ) ⊤ , with orthonormalized polynomial functions of Equation 23 (and products thereof) evaluated at their respective entries of x and arranged in order of their appearance in Equation 25 and l × 1 coefficient vector with the corresponding multiples, α, β, and γ, of each component function where the symbol ⊤ denotes transpose. For a collection of N input vectors, {x (1) , x (2) , …, x (N) }, and corresponding function values, {f(x (1) ), f(x (2) ), …, f(x (N) )}, we must repeat the vector inner product of Equation 27 a total of N times. This system of linear equations can be cast in matrix form GAO ET AL.

10.1029/2022WR032834
10 of 29 where the N × l design matrix, Φ, and N × 1 data vector, b, have the following entries Thus, the design matrix stores row-wise the design vectors of the collection of x's and the data vector equals the right-hand side of Equation 27 for each of these N input vectors. Li and Rabitz (2012) introduced a solution method coined Diffeomorphic modulation under observable response preserving homotopy, or D-MORPH regression, which satisfies hierarchical orthogonality of the component functions and simplifies the search for the preferred optimum coefficient vector, ̂d m , among the potentially infinite number of plausible solutions. We provide a brief step-by-step summary of D-MORPH regression and refer interested readers to  for a more exhaustive description. In short, the least squares values of the expansion coefficients may be computed as followŝ satisfying all four Moore-Penrose conditions (Golub & Van Loan, 1996;Penrose, 1955), whose redundant rows (the first d ⋅ p rows of first-order basis functions) are removed and d is the (l − d ⋅ p) × 1 vector Φ ⊤ b without the first d ⋅ p rows. Whether unique or not, the least squares solution of Equation 32 does not enforce hierarchical orthogonality of the first-, second-, and third-order component functions. Therefore,  introduce a square constrained matrix, B, which consists of the inner products of the orthonormalized polynomial basis functions of Equation 28 so as to satisfy the relaxed vanishing condition of Equation 22 The expansion coefficients, ̂d m , derived from D-MORPH regression are now written as a linear combination of their values, ̂l s , obtained from least-squares regression where U l−r and V l−r equal the last l − r columns of the l × l matrices U and V determined from singular value decomposition (SVD) of the product of a l × l projection matrix P=(I l − G) and the l × l constrained matrix B where I l signifies the l × l identity matrix and r denotes the number of nonzero singular values. Upon determination of the D-MORPH values of the expansion coefficients, ̂d m , of the component functions, we resort to Equations 14 and 15 for the structural, a , correlative, b , and total, S u , sensitivity indices of each input variable and combinations thereof.
Hierarchical orthogonality of the component functions imposes a strong constraint on the expansion coefficients. This does not, however, relax the requirements of a sufficiently large sample N of ( , ( )) -data pairs. To guarantee robust estimates of the sensitivity indices with relatively small uncertainty, the number of input-output samples N must be substantially larger than the number of expansion coefficients l. In our case studies, the ratio N/l exceeds 10. Pending a sufficient order δ of the model decomposition, the use of orthonormalized polynomial functions with extended bases in conjunction with D-MORPH regression guarantees an exact variance decomposition of the model output.
The default implementation of HDMR ext assumes the third-order approximation of Equation 25. If so desired, the user can specify a lower order expansion of the model output. For example, for δ = 2, the HDMR ext model decomposition would use first-and/or second-order component functions only. This will involve substantially fewer expansion coefficients, l, and is particularly attractive for high-dimensional input spaces, . Of course, care should always be exercised that the HDMR ext expansion of the model output is sufficiently accurate. Unless stated differently, we further assume that p = 3, thus, use the orthonormalized polynomial functions of first-, second-and third-degree of Equation 23. This amounts to the component functions of Equation 24.
Algorithm 1 and Figure S2 in Supporting Information S1 summarize the different steps of the HDMR ext method. The user must specify the maximum order, 1 ≤ ≤ 3; ∈ ℕ+ , of the HDMR ext component functions and supply a collection, These d-vectors of input variables may be drawn from an arbitrary joint probability distribution, p(x), specified on the prior domain χ. In the context of probabilistic SA the samples of X amount to realizations of the posterior distribution, ( |̃) . Prior to computation of the design matrix, Φ, the samples of X are mapped to the unit hypercube, .

Case Studies
We illustrate the HDMR and HDMR ext methods by application to variance-based SA of the parameters of four different models with increasing complexity. The first two studies serve as illustration of both methods and benchmark the performance of HDMR and HDMR ext against known sensitivity indices of the parameters. The last two studies present the application of variance/covariance-based SA to hydrologic modeling. As a reminder, in each study, the input variables are normalized/rescaled to the unit interval of the cubic B-splines and orthogonal polynomial functions prior to application of HDMR and HDMR ext . The structural and correlative sensitivity indices, a and b , are color coded in our figures using "apple green" and "blue" as mnemonic of the superscripts Algorithm 1. Main steps of HDMR ext 12 of 29 "a" and "b", respectively. Table S1 in Supporting Information S1 specifies the values of d and N for each case study along with our settings for the algorithmic variables of HDMR and HDMR ext . These tabulated values confirm that (a) the ratio N/l is at least equal to 7 and (b) the number of expansion coefficients of HDMR exceeds that of HDMR ext . As this paper has a methodological focus we chose rather common hydrologic case studies of variance-based SA. This should make it easier to digest the results. As these cases involve the use of relatively simple functions and models, computational efficiency is not a key consideration.

A Linear Function
We follow  and consider a simple additive function where the d = 4 input variables, x 1 , x 2 , x 3 , and x 4 are drawn jointly from a multivariate normal distribution, ∼ 4(μ, ) with unit mean μ = [1 1 1 1 ] ⊤ and 4 × 4 covariance matrix, Σ, and the noise term, ∼  ( 0, 2 ) with 2 ≪ Var[ ] . We evaluate the function for two different collections of five-thousand samples, X 1 and X 2 , that differ in their covariance matrix As the off-diagonal entries of Σ 1 are zero, the entries of x in the first collection of N samples, X 1 , are uncorrelated. The samples of collection X 2 , on the contrary, are dependent with Pearson correlation coefficient of 0.6 (x 1 and x 2 ), 0.2 (x 1 and x 3 ) and 0.2 (x 2 and x 3 ), respectively. We implement the bootstrap method (Efron & Tibshirani, 1994;Storlie et al., 2009) and execute HDMR and HDMR ext a total of M = 100 different times with each trial using a different selection of N = 3,000 samples (drawn without replacement) from the original collections, X 1 and X 2 , of five-thousand parameter vectors. The linear function in Equation 36 does not have interaction terms, thus, a maximum order, δ = 1, will suffice for HDMR and HDMR ext . Table 1 reports mean values of the structural, a , correlative, b , and total, S i , sensitivity indices of the four input variables (parameters) of the linear function in Equation 36 for the collection of independent input vectors, X 1 , using HDMR (left-side) and HDMR ext (right-side). The values between parenthesis document the 99% confidence intervals of the bootstrap trials. The results of HDMR and HDMR ext are almost indistinguishable, as is to be expected when the input variables are independent. The structural sensitivity indices confirm that all four input variables have an equal contribution to the model output, y. The correlative sensitivity indices of about zero testify to the independence of the input variables. The 99% confidence intervals of the sensitivity indices appear very small as a result of using a relatively large number of samples N in each bootstrap trial relative to the number of unknown coefficients. Thus, the component functions are very well defined in each separate bootstrap trial.
We now move on to the collection X 2 of dependent input variables and present in Table 2 mean values and associated 99% confidence intervals (between parenthesis) of the structural, correlative and total sensitivity indices of HDMR and HDMR ext derived from the bootstrap method using hundred different trials. The results of HDMR and HDMR ext are again in close agreement but the presence of correlation among the input variables has a noticeable impact on the sensitivity indices of HDMR and HDMR ext . The direct contributions of x 1 , x 2 , x 3 , and x 4 to the simulated output of the linear model has lowered somewhat with structural sensitivity indices of about 0.17. The correlative sensitivity indices of all input variables but x 4 have changed from zero to values of about 0.13 for x 1 and x 2 and 0.06 for x 3 . The magnitude of these values is in agreement with the covariance matrix, Σ 2 , of the samples. Indeed, if we take the sum of the covariances in each row then we observe a 2:1 ratio in the total covariance of x 1 (or x 2 ) and x 3 . This simple back of the envelope calculation confirms that the correlative contribution of x 1 and x 2 should be about twice that of x 3 . Per the covariance matrix, Σ 2 , the fourth input variable, x 4 , is independent of the other three inputs. Hence, its correlative sensitivity index should be zero. Note that Sobol's method would only provide estimates of the structural sensitivity.
GAO ET AL.

10.1029/2022WR032834
13 of 29 We conclude that HDMR and HDMR ext yield similar results for a simple additive function with dependent or independent input variables. The values of the structural, correlative and total sensitivity indices are in agreement with prior expectations. Given the nature of the linear function, in HDMR and HDMR ext we used a first-order approximation only of the simulated output. One can expand the emulator with second-and third-order component functions, but this would not change the results as these terms should/will not contribute to the model output. One reviewer alerted to the smaller confidence intervals of a , b and S i for the dependent input variables. Albeit in agreement with , this result may not be intuitive. The covariance matrix Σ 2 enlarges the variance of the model output, which, in turn reduces the uncertainty of the structural, correlative and total sensitivity indices.

The Ishigami Function
As our second case study, we consider the somewhat more complex Ishigami function. This function is a recurring test case for SA methods due to its non-linear and non-monotonic behavior and presence of interaction effects between the input variables. The Ishigami function is given by Ishigami and Homma (1990) = ( ) = sin( 1) + sin 2 ( 2) + 4 3 sin( 1), where x = {x 1 , x 2 , x 3 } ∈ [−π, π] 3 signify the input variables and a and b are constants set to 7 and 0.1, respectively (Marrel et al., 2009). The non-monotonic behavior of the Ishigami function warrants use of a larger number of knots and polynomial degree for the component functions of HDMR and HDMR ext , respectively. Based on convergence analysis (see Figure S3 in Supporting Information S1) of the sample variance 2 of the functional decomposition of Equations 19 and 25, we use m = 5 Bézier curves (HDMR) and a sixth-order polynomial p = 6 (HDMR ext ) instead (e.g., Table S1 in Supporting Information S1).
In our first experiment, we draw N = 8,000 realizations, X = {x (1) , x (2) , …, x (N) } from the trivariate uniform prior distribution, ∼ 3(−π, π) between −π and π using Latin hypercube sampling. If computational efficiency is of key importance, then one can use Sobol' (1967) sequences instead as method for quasi-random sampling (Kucherenko et al., 2011(Kucherenko et al., , 2015Zuniga et al., 2013). For each sample, we evaluate the Ishigami function and admit the collection of N = 8,000 different (x, y)-data pairs to HDMR and HDMR ext . We use δ = 2, thus, discard the third-order component functions in our function approximation with HDMR and HDMR ext .  Figure 1 presents bar charts of the (a) total sensitivity index, S i /S ij , of Equation 15 and (b) total-effect, T of Equation 5, of the three input variables of the Ishigami function derived from Sobol's method, HDMR and HDMR ext using the collection of N = 8,000 independent samples from the uniform prior distribution. The left bar chart also display the sensitivity index of x 1 and x 3 combined (in yellow). The total sensitivity indices and total effects of HDMR and HDMR ext are not only in excellent mutual agreement (as expected) but are also an excellent approximation to the analytic sensitivity estimates of the Ishigami function articulated by Sobol's method (Homma & Saltelli, 1996). The third input variable, x 3 , does not have a first-order contribution to the output of the Ishigami function. Hence, its total sensitivity index, S 3 , is zero in Figure 1a. This is also true for the total sensitivity indices, S 12 , S 23 , and S 123 , which do not contribute to the model output as is evident from the mathematical definition of the Ishigami function. The only interaction that counts originates from x 1 and x 3 . We do not tabulate the uncertainty estimates of the sensitivity indices. The 99% confidence intervals of the total sensitivity indices of HDMR and HDMR ext are very small and on the order of ±0.01 for bootstrapping with 4,000 input-output samples.
We now turn our attention to the collection X 2 of correlated input vectors. The bottom panel of Figure 1 presents the same indices as in the top panel for (modified) Sobol's method, HDMR and HDMR ext but using a collection of N = 8,000 samples of x with correlation coefficient of 0.5 between the values of x 1 and x 3 . The modified Sobol method of Kucherenko et al. (2012) is a generalization of the variance-based sensitivity indices of Sobol to dependent input variables. Notice again the excellent agreement between the three different methods. The correlation between x 1 and x 3 increases the first-order contribution of x 3 from zero to about 0.12. This in turn reduces the first-order effect of x 1 and the second-order effect of (x 1 , x 3 ). This changes the ranking of the total-effects of the three input variables from T 1 > T 2 > T 3 with independent inputs to T 2 > T 1 > T 3 with correlation among x 1 and x 3 . Thus, correlation between the input factors of a model can profoundly change the relative importance of each input variable. The 99% confidence intervals of the HDMR/HDMR ext sensitivity indices have only marginally increased compared to the case with independent variables to an average of about ±0.012 for S 1 , S 2 , S 3 , S 13 . This second study confirms the ability and robustness of HDMR and HDMR ext to decompose the variance of a simple nonlinear function with interaction between some of its input variables. From hereon, we focus our attention on more practical problems using examples from hydrologic modeling. In these studies we use relatively large values of the ratio N/l and, consequently, the 99% confidence intervals of the sensitivity indices will be small. As a result, we do not report the bootstrap uncertainty estimates. Furthermore, as our interest is in a rigorous variance-based decomposition of the input-output relationship at first-, second-and higher orders, we do not further consider the modified Sobol method of Kucherenko et al. (2012). In fact, in reference to this and the modified FAST method of Xu (2013), Song et al. (2015) even goes on to say (p. 751) that there "…have been very few successful applications in hydrological modeling."

Model Description
The third case study considers a common problem in vadose zone hydrology and involves the characterization of the hydraulic properties of variably saturated soils. This study serves to demonstrate the ability of HDMR and HDMR ext to guide experimental design and help determine measurement collection. We use the capillary pressure-saturation function of van Genuchten (1980) which describes the relationship between the volumetric soil moisture content, θ [L 3 L −3 ], and soil water pressure head, h [L], as follows where θ s and θ r [L 3 L −3 ] denote the saturated and residual moisture contents, respectively, α vg [L −1 ] is an estimate of the reciprocal of the air-entry value and n vg is a dimensionless scalar. The soil hydraulic parameters are considered input factors in our present analysis, thus, x = {θ r , θ s , α vg , n vg }. To investigate the relative importance of each hydraulic parameter to the model output, θ, we generate a vector of two-hundred logarithmically equally spaced points of the soil water pressure head between decades −10 6 and −10 −1 cm. Next, we create the N × d matrices, X 1 and X 2 , by drawing N = 5,000 parameter vectors from a multivariate normal distribution, ∼ 4(μ, ) , with mean, μ, d × 1 vector of standard deviations, σ, and d × d covariance matrices, Σ 1 = R 1 (σσ ⊤ ) and Σ 2 = R 2 (σσ ⊤ ), specified in Table  S2 in Supporting Information S1. These listed statistics are taken from a study by Gomes et al. (2017) on the stability of a variably-saturated slope in Rio de Janeiro, Brazil. Thus, the input factors of X 1 are independent, whereas correlation is present between the input variables of X 2 . We consider a third-order approximation of the model output, δ = 3, and execute the HDMR and HDMR ext methods for each discretized soil water pressure head, h. This produces two-hundred values of the structural, correlative and total sensitivity indices, a , b , and S i , respectively, for individual input variables and combinations thereof. We separately also compute the total-effect, T , of each input factor.
After a careful inspection of the results of both SA methods, we decided to discard the results of HDMR and focus exclusively on HDMR ext in this case study. The sensitivity indices derived from HDMR reveal a problem with the variance decomposition of Equation 39 in the face of correlated input variables (see Figure S4 in Supporting Information S1). The total sensitivity indices of the n 123 component functions of HDMR do not sum to one. This cannot be resolved by using a larger sample size N but is simply a consequence of backfitting and failure of HDMR to satisfy hierarchical orthogonality of the component functions. A larger number of control points (or Bézier curves, m) may sometimes help to relieve this problem but this is not a permanent remedy. The deviation of the sum of the total sensitivity indices from unity is small at volumetric moisture contents close to residual saturation (left-side) but increases substantially at some of the soil water pressure heads in the wet range of the WRF. This inexact variance/covariance decomposition will corrupt and/or bias the structural, correlative and total sensitivity indices of the input factors. The third-order function approximation in Equation 25 does not suffer this problem. The total sensitivity indices computed by HDMR ext consistently sum up to one for all soil water pressure heads. This is a testament to the hierarchical orthogonality of the component functions. The graphs of the structural sensitivity indices bear a close resemblance to the local sensitivities (solid black lines) of the hydraulic parameters. This is not an uncommon finding and corroborated by the theoretical treatise of Kucherenko and Song (2016) which links global derivative-based measures of sensitivity to Sobol' total-effect index T in Equation 5. For independent input variables, derivative-based sensitivities can offer a valuable CPU-efficient alternative to variance-based methods for computationally costly and/or highly-parameterized nonlinear dynamic models (Kiparissides et al., 2009). But we may not compare the values of the two sensitivity measures as they have different units and interpretations. The variance-based sensitivity indices of HDMR ext are normalized, thus, dimensionless, and characterize the mean sensitivity of the input factors in the neighborhood of their mean values. The first-order partial derivatives have units of cm or cm 3 cm −3 and quantify only the local sensitivity of the model output, θ, with respect to each input factor. These local sensitivities are intimately related to the Fisher information matrix and serve an important purpose in experimental design. For functions that are nonlinear in their parameters, the global variance-based indices of HDMR ext will not match perfectly with the local sensitivity estimates of θ r , θ s , α vg , and n vg unless we confine the sample of input vectors of HDMR ext to a small space immediately surrounding their mean values of Table S2 in Supporting Information S1.

Uncorrelated Input Variables
The moisture content of the soil exerts a large control on the structural sensitivity of the hydraulic parameters. The most informative (θ, h)-data pairs for θ r , θ s , α vg , and n vg are beyond the soil's wilting point ( ∈ [ −∞, −1.5 × 10 4 ] cm), close to saturation ( ∈ [−10, 0] cm), near the air-entry value ( ∈ [−300, −200] cm) and around the inflection point ( ∈ [−1, 200, −1, 000] cm) of the WRF, respectively. This confirms earlier findings of Vrugt et al. (2002) and Vrugt and ter Braak (2011). Note that the listed ranges of the soil water pressure head, h, are not particularly meaningful as they are soil dependent. The total-effect of θ r , θ s , α vg , and n vg is indistinguishable from the structural sensitivity of each hydraulic parameter. Thus, the second-and third-order component functions of the HDMR ext emulator must have a negligible contribution to the model output, θ. This is not uncommon for simple functions and/or uncorrelated input factors.

Correlated Input Variables
The use of independent input variables does not do justice to the strong correlations found among the hydraulic parameters in laboratory and field experiments. Based on our results for the Ishigami function, we can only posit that this correlation will impact the structural sensitivities of the WRF parameters, θ r , θ s , α vg , and n vg , and govern their total effects. This may have implications for experimental design and the precise location of the most informative (θ, h)-data points along the WRF. Figure 3 presents the structural sensitivity, a , correlative sensitivity, b , and total-effect, T , of (a) θ r , (b) θ s , (c) α vg , and (d) n vg as function of the soil water pressure head, h, using dashed apple green, dashed blue, and solid orange lines, respectively. We also display the higher-order effect, ∑S ij + ∑S ijk , using dash-dotted black lines. The correlation among the four soil hydraulic parameters manifests itself in non-zero values of the correlative sensitivity indices, b . As a result of this, the structural sensitivities a of θ s , θ r , α vg , and n vg do not equal their total-effects T . The correlation, however, does not impact the parameters' total-effects across the entire range of soil water pressure heads. Rather, the correlative contribution of the soil hydraulic parameters to the model output, θ, is limited to h ∈ [−10 6 , −10 2 ] cm. In this interval, the soil's volumetric water content decreases from near saturation to residual moisture content. Outside this range, the correlative contribution of θ r , θ s , α vg , and n vg to the model output is near zero. The magnitude of the correlative sensitivity indices is commensurate with the matrix of correlation coefficients, R 2 , of Table S2 in Supporting Information S1.
The negative values of the correlative sensitivity indices of α vg and n vg are debit to their negative total-effects in the range of h ∈ [−10 5 , −10 3 ] and h ∈ [−600, −70] cm, respectively. Negative T 's are not possible with Sobol's method and, thus, may seem odd at first sight. But covariances among the input variables can suppress variations in model output, hence decrease their total influence on the model output. This result is confirmed by a simple analytic example in Text S1 in Supporting Information S1.
The strong negative correlation, vg , vg = −0.86 , between α vg and n vg is not only responsible for the negative total-effects of α vg and n vg over the documented range of soil water pressure heads but, in turn, also increases the magnitude of the structural sensitivities of α vg and n vg . Nevertheless, the graphs of Figures 2 and 3 are in sufficient agreement not to alter much our earlier conclusions with respect to experimental design, where one should pay attention to only the sensitive domain of correlated parameters for the appropriate selection of informative θ(h) data pairs.
The secondary and third-order component functions again have a negligible contribution to the total effect of the hydraulic parameters. A first-order approximation suffices to accurately describe the θ(h) relationship of the WRF of van Genuchten (1980) in the presence of correlated and uncorrelated hydraulic parameters. This supports earlier conclusions of Rabitz and Aliş (1999) that most physical systems are adequately characterized by only low-order input-output relationships. Altogether, we conclude that the correlation between the soil hydraulic parameters of the WRF does not fundamentally change our earlier findings with independent input factors. Thus far we have focused our attention on the structural and correlative sensitivity indices and total-effect of the hydraulic parameters. But the model output decomposition of HDMR ext may serve another important role, and that is as an emulator of the input-output relationship. Figure S5 in Supporting Information S1 compares the third-order approximation of Equation 25 with expansion coefficients derived from D-MORPH regression against the WRF of van Genuchten (1980) using hydraulic parameter values of 12 USDA soils from Carsel and Parrish (1988).
The HDMR ext emulators provide an excellent description of the WRF's of the different soil types. The surrogate WRF's (solid lines) pass exactly through the one-hundred θ(h)-data pairs of each soil. This inspires confidence in the ability of HDMR ext to provide an accurate assessment of the structural and/or correlative sensitivity indices of the soil hydraulic parameters and combinations thereof. Results demonstrate that the function decomposition of Equation 25 may serve as surrogate of the WRF of Equation 39.

Model Description
As fourth and last case study, we turn our attention to hmodel, a parsimonious conceptual watershed model originally developed by . The hmodel transforms rainfall into runoff at the watershed outlet using an interception, unsaturated zone, fast and slow flow reservoir, respectively, which simulate interception, throughfall, evaporation, surface runoff, percolation, fast streamflow and baseflow (see Figure S6 in Supporting Information S1). In this study, we focus our attention on the parameters of the hmodel. Table S3 in Supporting Information S1 presents a description of each parameter and lists the corresponding symbol, unit, lower and upper bounds. These bounds define a d = 7-dimensional (hyper)cube, ℍ0 ⊆ ℝ , in which the input factors, x = {I max , R u,max , Q s,max , α e , α f , K f , K s }, are allowed to vary freely.
Many different studies have appeared in the hydrologic literature on the quantification and analysis of parameter sensitivity in conceptual watershed models (Borgonovo et al., 2017;Mockler et al., 2016;A. Wang, Pianosi, et al., 2020;A. Wang & Solomatine, 2019;Zelelew & Alfredsen, 2013). With a few exceptions, these published studies almost always use some form of uniform sampling within the hypercube, ℍ0 , to satisfy the requirements of the d-way ANOVA decomposition in Sobol's method and quantify the first-, and/or second-, third-and higher-order parameter sensitivities and/or interactions. We are concerned with the impact of parameter correlation on variance-based estimates of parameter sensitivity and how the structural, correlative and total sensitivity indices depend on the simulated discharge and state (e.g., wetness) of the watershed. As we will show next, the HDMR ext methodology admits the characterization of parameter sensitivity throughout the posterior parameter distribution.

Model Training and Description of Multivariate Parameter Distribution
We estimate the posterior distribution of the hmodel parameters by application to a 9-year record (1 October 1999-30 September 2008) of daily discharge measurements from the Leaf River near Collins, MS (USGS 02472000). This medium-sized watershed of about 1,990 km 2 exhibits a strong winter regime according to the functional classification of Brunner et al. (2020) and has been studied extensively in the hydrologic literature. Daily time series of streamflow, precipitation, and surface temperature are taken from the CAMELS data set (Newman et al., 2014). We impose a uniform prior distribution over the ranges specified in Table S3 in Supporting Information S1 and use the generalized likelihood function of  to infer the posterior distribution of the hmodel parameters using Markov chain Monte Carlo simulation with the DREAM (ZS) algorithm (Vrugt, 2016). The transition kernel of DREAM creates multiple different sequences (chains) of parameter vectors that are stationary and ergodic and have as a joint distribution the posterior target distribution (ter Braak, 2006;Vrugt, ter Braak, et al., 2009). Convergence of the joint chains to a stationary distribution is monitored using single chain and multi-chain diagnostics, among the univariate and multivariate scale-reduction factors of Gelman and Rubin (1992) and Brooks and Gelman (1998), respectively. Figure 4 presents a scatter plot matrix of the N = 10,000 posterior realizations of the hmodel parameters sampled with the DREAM (ZS) algorithm. The diagonal entries correspond to the parameters' marginal distributions, whereas the off-diagonal graphs present bivariate scatter plots of the different parameter pairs. The Maximum APosteriori (MAP) hmodel parameter values are separately indicated in each graph with a red cross. The dotted rectangles make up a closed hypercube, ℍ1 ⊆ ℍ0 , interior to the prior hypercube, ℍ0 . This posterior hypercube will be populated with independent samples so as to compare the sensitivity indices of the posterior realizations of the DREAM (ZS) algorithm. The hmodel parameters appear well defined by the measured daily discharge data. The histograms of the marginal posterior parameter distributions are generally well described by a Gaussian distribution with small dispersion relative to the prior parameter ranges. The two-dimensional dotty plots reveal the presence of correlation between the bivariate samples of the posterior distribution. The hmodel parameter pairs (α e , α f ) and (R u,max , Q s,max ) exhibit a particularly strong positive and negative correlation, respectively. This violates Sobol's method and necessitates the application of the variance/covariance-based model output decomposition of Equation 7.
The bivariate scatter plots confirm the presence of parameter correlation among the samples of the posterior distribution. What is less evident, however, is that the chain samples may also exhibit autocorrelation as the state of the Markov chain is duplicated each time a candidate point is rejected. Chain thinning hardly affects the results of our analysis and so did the use of only the unique posterior samples of the joint Markov chains. Thus, our findings are nearly invariant to the degree of autocorrelation and frequentness (=weight) of the posterior samples. Based on these grounds we use as input to probabilistic SA the DREAM (ZS) -sampled posterior realizations.  Figure 5 summarizes the results of HDMR ext and presents the evolution of the first-order structural, a , and correlative, b , sensitivity indices of (d) R u,max , (e) Q s,max , (f) α e , (g) α f , and (h) K f during the period of 1 October 2000-30 September 2001. To understand the temporal dynamics of the indices, the parameter panels are preceded by time series plots of (a) rainfall and MAP simulated discharge, and the state of the (b) unsaturated and (c) fast reservoirs, respectively. The other hmodel parameters and state variables are displayed in Figure S8 in Supporting Information S1. We present only the first-order sensitivity indices as the higher-order terms exert only a minor control on the model output, y. This finding is supported by the values in Table S4 in Supporting Information S1 of the first-and second-order structural, correlative and total posterior sensitivity indices of the hmodel parameters on dry, moist and wet days.

State/Time-Dependent Probabilistic Sensitivity Analysis
The results in Figure 5 highlight several important findings. In the first place, notice that the structural (green) and correlative (blue) sensitivity indices exhibit considerable variations during the WY 2001. The temporal dynamics are particularly large for the structural sensitivity indices, a , with sudden and substantial fluctuations from one discharge value to the next. The traces of the correlative sensitivity indices appear smoother and more Figure 5. Precipitation, simulated discharge, hmodel state variables and HDMR ext estimates of structural and correlative sensitivity indices of hmodel parameters as function of time, where R u and R f are unsaturated zone and fast reservoir storage, respectively, the subscripts for the sensitivity indices, {2, 3, 4, 6, 7}, follow the same order of the parameters as they are presented in Table S3 in Supporting Information S1. compressed. Second, the magnitude of the structural sensitivity indices exceeds the correlative sensitivity indices. This is generally the case as the total main effect, a + b , is strictly positive, at least in theory. In fact, the correlative sensitivity indices are near zero for large portions of the 1-year discharge record. This is particularly true for (e) Q s,max , (f) α e , (g) α f , and (h) K f . Third, the structural and correlative sensitivity indices not only differ in magnitude and temporal dynamics but also in sign. The structural indices are strictly positive, whereas the correlative contribution usually takes on negative values. Thus, the correlative indices can substantially decrease the total sensitivity indices, = a + b , of the input factors and, thus, main effects of the hmodel parameters. This is the reason so as to why the structural sensitivity indices of (e) Q s,max , (f) α e , and (g) α f exceed unity for certain parts of the simulated hydrograph. This guarantees an exact variance decomposition with sum of the total sensitivity indices, = a + b that always amounts to unity for each successive discharge data point. Fourthly, the temporal behavior of the structural and/or correlative main effects does not seem to relate in an obvious manner to the hyetograph, hmodel simulated flow levels and/or wetness of the watershed as expressed in the state of the unsaturated and/or fast reservoir respectively. Altogether, the results demonstrate that parameter sensitivity is not an invariant property but rather varies dynamically depending on the model input, dominant processes, parameter values and state of the watershed. This is the underlying premise of AMALGAM (Vrugt & Robinson, 2007;).
The temporal behavior of the structural and correlative main effects of the hmodel parameters do not seem to correlate well with the hyetograph, simulated flow level and the watershed state. To investigate this in more detail, please consider Figure 6 which presents bivariate scatter plots of the first-order total sensitivity indices of the hmodel parameters and the cumulative rainfall amounts of the five preceding days, and simulated state of the interception, unsaturated zone, fast and slow reservoirs, respectively. The total main effects demonstrate a complex dependency on the cumulative rainfall and/or state of the watershed (Garambois et al., 2013;Herman et al., 2013;Huang et al., 2021;Pianosi & Wagener, 2016). This dependency is poorly characterized by a linear regression function and follows instead a more parabolic (dc, de, fc, fe) or hyperbolic (aa, ad, ca, cd, gd) relationship. The large data scatter confirms, however, that even such nonlinear relationships do not suffice to capture the temporal dynamics of the main effects. As a result, we resort to a more qualitative interpretation of the dotty plots. In doing so, we shall refer back to Figure S6 in Supporting Information S1 and the ensuing caption which summarizes the process equations of hmodel. The product-moment correlation coefficients and regression functions display a positive correlation between the states of the interception, unsaturated zone and fast reservoirs and the magnitude of the main effects of (ab) I max , (bc) R u,max , and (fd) K f , respectively. This positive relationship is induced by the hmodel Equations which confirm that the impact of the parameters on the simulated discharge grows with the state of their respective reservoirs. The opposite is true for Q s,max and α e , hence why we see a negative correlation in graph (cc) between the magnitude of the main effect of Q s,max and the state of the unsaturated zone reservoir. Furthermore, as the slow reservoir exerts only a minor control on the simulated discharge, the main effect of its recession constant, K s , is close to zero during the 1-year discharge record in the bottom-right panel. Further research should explore the usefulness of the HDMR/HDMR ext derived probabilistic sensitivity indices for improving hydrologic process representation.

Comparison of Prior and Posterior Sensitivity Indices With (In)dependent Samples
We investigate the impact of hmodel parameter correlation on the sensitivity indices derived from variance-based analysis of the simulated rainfall-discharge relationships. Figure 7 presents traces of the main-effects of each hmodel parameter derived from uniform (=independent) sampling of the prior, ℍ0 , and posterior, ℍ1 , hypercubes and the DREAM (ZS) collection of dependent samples of the posterior distribution, ( |̃) (dash-dot green). In Figure S7 in Supporting Information S1 we display the corresponding ranges of the simulated discharge records. We use bullet points to discuss our main findings.
1. The first-order total sensitivity indices of the hmodel parameters derived from the dependent samples of the DREAM (ZS) algorithm are generally in good agreement with the main effects of the independent samples of the posterior hypercube, ℍ1 . For most hmodel parameters, I max , Q s,max , α f , K f , and K s , the traces are markedly similar and typically only show differences in the magnitude of the sensitivity indices. This cannot be said for R u,max and α e whose traces differ substantially, particularly during the second-half of the water year. This is a result of the relative strong correlation between (R u,max , Q s,max ) and (α e , α f ) so evidently visible in the scatter plots of Figure 4. Apparently, this correlation does not affect much the traces of Q s,max and α f as it does R u,max and α e . GAO ET AL. and (f) K f are much smaller than their posterior counterparts. In fact, their prior main effects are so small that one could classify these six parameters as insensitive. An opposite trend between the prior and posterior main-effect is found for the recession constant, K s , of the slow reservoir. This parameter exerts a large and overarching control on the variance of the simulated discharges in the prior distribution but has a negligible impact on the discharge records of the posterior hypercube, ℍ1 , with its main-effect, S 7 , reduced to almost zero. Thus, the Sobol' estimates of the main effects demonstrate a large change in parameter sensitivity when transitioning between the prior and posterior distribution. In this process, the contribution to the model output increases for all parameters but the recession constant K s . These dynamic changes in parameter sensitivity will help guide search and optimization methods in pursuit of the optimum solution and/or posterior parameter distribution and determine the order of iterative refinement of the parameters. This impact of scale in SA is explored more explicitly by the VARS method of Razavi et al. (2019), Razavi and Gupta (2016) but obscures and/or clouds the ranking of input factor importance (Puy et al., 2021). 4. The total first-order sensitivity indices of R u,max , α e , and α f across the posterior distribution ( |̃) take on negative values on certain days of the water year. We have witnessed this before in the third case study and eluded to a simple analytic example in Text S1 in Supporting Information S1. Apparently, the correlative contributions of R u,max , α e , and α f to the simulated discharge are negative and larger in magnitude than their Figure 7. First-order total sensitivity indices, S i , of (a) I max , (b) R u,max , (c) Q s,max , (d) α e , (e) α f , (g) K f , and (h) K s , for the 365-day discharge record using N = 10,000 independent samples of ℍ0 (dotted purple) and ℍ1 (solid orange) and ten-thousand posterior realizations of the DREAM (ZS) algorithm (dash-dotted green).
GAO ET AL.

10.1029/2022WR032834
24 of 29 respective structural contributions. Thus, on certain days these three parameters suppress the model output variance. We will revisit this finding in Section 5.

Surrogate Model Evaluation
To verify the accuracy of the HDMR ext model output decomposition in Equation 25, for emulation of the rainfall-discharge relationship, we apply Algorithm 1 to the simulated discharge records of the N = 10,000 independent samples of the posterior hypercube, ℍ1 . Figure 8a presents a frequency distribution of the sample variances, 2 = 1 − ∑ =1 2 , of the N residuals of the daily surrogate models. A second-order decomposition was deemed sufficient to describe the data. As a result, δ = 2, and the number of unknown expansion coefficients estimated with D-MORPH regression equals l = 336. The residual variances appear rather small. The median value of 2 of about 2.1 × 10 −7 mm 2 /d 2 is substantially smaller than the average within-day variance of 2.2 × 10 −3 mm 2 /d 2 of the N discharges of the training samples. Thus, the variance decomposition should be close to exact. This inspires confidence in the sensitivity indices and surrogate model of HDMR ext .
Next, we evaluate the accuracy of the surrogate models of Equation 25 for samples of x that have not been used for D-MORPH coefficient estimation. Figure 8b displays a histogram of the variance of the surrogate model residuals computed for the ten-thousand posterior realizations of the DREAM (ZS) algorithm. The use of dependent samples increases the median value of 2 to about 1.1 × 10 −4 mm 2 /d 2 , about 20 times smaller than the average within-day variance of the discharges. Thus, within the hypercube, ℍ1 , of the posterior parameter distribution, the second-order expansion of HDMR ext accurately describes the rainfall-discharge relationship of the hmodel. Further research is warranted to determine the performance of the surrogate model over the domain, χ, of the prior hypercube, ℍ0 .

Discussion and Conclusions
This paper has focused attention on probabilistic and/or variance-based SA with correlated input variables. We used the methodology of , called high dimensional model representation or HDMR, which is a generalization of ANOVA to dependent input factors and replaces the elementary functions of Sobol' by component functions with linear expansion coefficients. Under strict regularity conditions, the HDMR superposition of linear multiples of component functions will exactly decompose the variance of the model output and parse out the structural and correlative contributions of the input variables at first-, second-, third-, and higher-orders. We considered two different implementations of HDMR with variance/covariance decomposition of the model output up to the third-order. The default implementation of  uses cubic B-spline component functions and determines the expansion coefficients of each order sequentially through backfitting. A more powerful variant, HDMR ext of Li and Rabitz (2012), implements orthonormalized polynomial component functions with extended bases and simultaneously infers the expansion coefficients of all orders using D-MORPH regression. This latter method guarantees hierarchical orthogonality of the component functions. The bootstrap method is implemented in both HDMR implementations to provide confidence intervals of the structural and correlative sensitivity indices at first, second-and third-order.
The power and usefulness of covariance-based SA was demonstrated by application of HDMR and HDMR ext to a simple linear model and the well-known Ishigami function. These two studies confirmed that both methods can accurately decompose the variance of the model output and back out the structural and correlative contributions at first-, second-, and/or higher orders of the input factors. For independent input variables, the sensitivity indices of HDMR and HDMR ext matched exactly their counterparts derived from Sobol'. For correlated input variables, the sum of the structural and correlative indices were in excellent agreement with the total sensitivity indices of modified Sobol' (Kucherenko et al., 2012).
The third case study considered application of HDMR to covariance-based SA of the water retention, θ(h), function of van Genuchten (1980). As the B-spline component functions in the default HDMR implementation did not guarantee an exact variance decomposition (as anticipated), we focused our attention on HDMR ext . The total effects of θ s , θ r , α vg , and n vg , appeared relatively unaffected by correlation and were in close agreement with the partial derivatives, ∂θ/∂θ s , ∂θ/∂θ r , ∂θ/∂α vg , and ∂θ/∂n vg of the van Genuchten (1980) soil water characteristic. Our results confirmed earlier findings of Vrugt et al. (2002) and Vrugt and ter Braak (2011) with respect to optimal experimental design of the hanging-column method.
Our fourth and last study considered probabilistic SA of the seven-parameter hmodel using the rainfall-discharge transformation of the Leaf River watershed in Mississippi. Posterior realizations sampled with the DREAM (ZS) algorithm served as input to HDMR ext to quantify sensitivity across the posterior distribution. The structural and correlative main effects of the hmodel parameters varied considerably during the water year in a manner that did not obviously relate to simulated flow level, catchment wetness, rainfall and potential evaporation. The total main effects of the posterior parameter distribution ( |̃) compared generally well with their Sobol' counterparts obtained from independent samples of the posterior hypercube, ℍ1 . The main effects of two hmodel parameters changed considerably as a result of correlation. This affects the ranking of the main effects, and, thus, assessment of parameter importance.
The results of probabilistic SA of the hmodel appeared unaffected by the degree of autocorrelation among the posterior realizations and their frequency of appearance in the sampled Markov chains. Duplicate chain samples do not carry new information about the input-output relationship for the expansion coefficients, but do increase the weight that is attached to these samples in D-MORPH regression. Our results confirmed (not shown) that subsampling (thinning) of the Markov chains and/or the use of unique posterior samples only, did not change much the results of HDMR ext . Thus, our covariance-based estimates of probabilistic sensitivity across the posterior hmodel parameter distribution appeared invariant to the exact distribution of the samples within this high-probability density region of the parameter space. If deemed necessary, one can fit a Gaussian mixture distribution to the posterior realizations using the methods presented in Volpi et al. (2017) and draw a large collection of samples from this distribution as input to probabilistic SA.
With only mild correlation among the hmodel parameters, the fourth and last case study did not do justice to the HDMR ext methodology. When multivariate parameter dependencies are weak, the independence assumption of Sobol' will generally suffice to approximate the parameters' main effects across the posterior distribution. But then the posterior hypercube ℍ1 must be delineated first from the samples of the posterior distribution. This two-step procedure will break down when multivariate parameter dependencies are strong and we must resort to covariance-based model decomposition with HDMR ext to obtain robust variance-based estimates of the structural a correlative b and total S i contributions of each input factor to the model output.
One can envision many other case studies for which an exact variance-covariance decomposition at first-, secondand higher-orders with HDMR ext is a must for deriving sensical sensitivity indices. One can think of applications wherein input variables must satisfy a certain hierarchy, spatial arrangement or organization so as to not to violate system physics and/or modeling assumptions. For example, in surface hydrology one may wish to evaluate model sensitivity to spatiotemporal rainfall structure. In soil hydrology, one may want to determine the sensitivity of soil evaporation and/or root water uptake to surface temperature, humidity, wind-velocity and solar radiation. These meteorological variables are intimately related (Cascone et al., 2019) and treating them as independent would violate process physics and yield rather meaningless estimates of sensitivity. In all these cases, HDMR ext can provide assistance in screening, model development and input prioritization. Furthermore, the HDMR ext functional decomposition into linear multiples of component functions has application beyond (co)variance-based SA. This decomposition can serve as surrogate model in two-stage MCMC simulation to accelerate posterior exploration (Laloy et al., 2013;Paun & Husmeier, 2021). The total main effects of the parameters can also serve as guiding principle to continuously adapt search subspace in DREAM Suite (Vrugt, 2016) in pursuit of the posterior parameter distribution.
While HDMR ext is a powerful addition to the hydrologists' arsenal of SA methods, this method can pose considerable computational challenges. The number of expansion coefficients l and, thus, number of columns of the design matrix Φ in Equation 32 increases polynomially with the number of input variables d, polynomial degree p of the component functions and order δ of the functional decomposition in Equation 25. This complicates the application of HDMR ext to parameter-rich models. For example, if d = 10, p = 3, and δ = 2, we have l = 705 unknown expansion coefficients. For a third-order decomposition δ = 3 this value increases to l = 8,265. The rapid growth of the number of expansion coefficients with d, p, and/or δ can pose problems with the storage of the design matrix Φ in random-access memory (RAM). Suppose, for example, we use a ratio N/l = 5, then a stand-alone PC with 16 gigabytes of RAM can handle a maximum of l = 20,000 expansion coefficients before depleting the memory. If we 26 of 29 assume default settings of p = 3, then a second-order model decomposition would reach this number of expansion coefficients for d = 51. For δ = 3 the maximum number of input variables is limited only to d = 13. Thus, there are significant advantages to keeping the order δ of the HDMR model decomposition and polynomial degree p of the component functions as low as possible. Fortunately, most physics-based models are of type A or B according to the functional classification of Kucherenko et al., 2011 and, thus, support a second-order model decomposition (Falchi et al., 2018;Rabitz & Aliş, 1999;Ratto et al., 2007;Shereena & Rao, 2019). If the input-output relationship warrants use of a higher-order decomposition (e.g., Huang et al., 2021) then distributed-memory and/ or multi-core computer system is required to implement ANCOVA-based model decomposition with HDMR ext .
As closing remark we comment on the negative values of the total-and main effect sensitivity indices, T and S i , respectively, in case studies three and four. This non-intuitive finding is not a numerical artifact but the result of applying the total law of covariance as is demonstrated in Text S1 and Table S5 in Supporting Information S1 with a simple analytic example. This study confirms that a parameter's negative correlative contribution can exceed in magnitude its structural contribution. Thus, some input variables can suppress the model output variance by promoting, for example, negative feed backs in the model and driving the output to an equilibrium.