Generalized information reuse for optimization under uncertainty with non-sample average estimators

In optimization under uncertainty for engineering design, the behavior of the system outputs due to uncertain inputs needs to be quantified at each optimization iteration, but this can be computationally expensive. Multifidelity techniques can significantly reduce the computational cost of Monte Carlo sampling methods for quantifying the effect of uncertain inputs, but existing multifidelity techniques in this context apply only to Monte Carlo estimators that can be expressed as a sample average, such as estimators of statistical moments. Information reuse is a particular multifidelity method that treats previous optimization iterations as lower fidelity models. This work generalizes information reuse to be applicable to quantities whose estimators are not sample averages. The extension makes use of bootstrapping to estimate the error of estimators and the covariance between estimators at different fidelities. Specifically, the horsetail matching metric and quantile function are considered as quantities whose estimators are not sample averages. In an optimization under uncertainty for an acoustic horn design problem, generalized information reuse demonstrated computational savings of over 60% compared with regular Monte Carlo sampling.


INTRODUCTION
Optimization techniques are becoming increasingly integrated within the engineering design process when computational models of the system are available. Often, various inputs to models of the system are uncertain in reality, 1 and it is important to account for their influence on the performance of the system during the design process. The field of optimization under uncertainty (OUU) has therefore developed various optimization formulations for engineering design that can improve the behavior of a system subject to uncertain inputs. [2][3][4][5][6] These formulations involve defining quantities that describe the behavior of the system under uncertainty, which are then used as the objective or as constraints in an optimization.

Nomenclature: x, vector of design variables; n_x, the number of design variables; ω, underlying random outcome; U(ω), random vector of uncertain input parameters; u, realization of input parameters; n_u, the number of uncertain input parameters; n_c, the number of constraints; y, general system model output; Y(ω), random variable representing a system output; n, number of sampled values; y_i, the ith sampled value of Y(ω); Y^(s), the sth-order statistic out of n sampled values of Y(ω); d, quantity used in an optimization formulation; d̂, naive Monte Carlo estimator of d from sampled values of Y(ω); ŝ, estimator of d used in the optimization; N_boot, number of times resampling occurs in bootstrapping; d̂_b, the value of d̂ evaluated using the bth set of resampled values of Y(ω); V̄, bootstrapped estimate of the variance of an estimator; Σ̄, bootstrapped estimate of covariance between two estimators; F_Y(y), cumulative distribution function of Y(ω); x, as a subscript/superscript: of a given design point; A, as a subscript/superscript: of the current design point in an optimization; C, as a subscript/superscript: of the design point selected as the control point; α, minimum acceptable probability of constraint satisfaction.
For practical engineering systems, the system model can be computationally expensive to evaluate, and so optimizing such systems represents a significant computational burden. Surrogate-based and multifidelity formulations have been developed to address this issue. [7][8][9][10][11][12][13][14] Multifidelity approaches leverage cheap surrogate models to reduce the computational cost, while retaining occasional recourse to the high-fidelity model to ensure accuracy.
In OUU, the effects of uncertain inputs on the system must be propagated at every optimization iteration. Various methods have been developed to achieve this propagation at low computational expense, including (but not limited to) quadrature-based integration, 15 stochastic expansions, 9,[15][16][17] and surrogate models. 18,19 While effective in many scenarios, especially when the problem is smooth, these methods suffer from the "curse of dimensionality" whereby their cost increases exponentially with the number of uncertain inputs.
Monte Carlo (MC) sampling, on the other hand, is a method whose convergence rate is independent of both the dimensionality and the smoothness of the problem, 20,21 making it attractive for problems with a large number of uncertainties or with particularly nonsmooth behavior. This convergence rate is slow, however, and some problems may simply be too computationally expensive for MC sampling: if only a few hundred expensive high-fidelity model evaluations can be afforded within an optimization, then MC sampling is not recommended, since a designer would have little confidence in the accuracy of the results.
Multifidelity approaches can offer a way of reducing the computational expense of MC sampling, 22 allowing it to be useful when the cost would otherwise be marginally too high, or allowing it to be better integrated within an engineer's workflow for practical problems. In the context of OUU with MC sampling, a multifidelity formulation based on control variates has been proposed to reduce the number of system model evaluations required to propagate the effect of the uncertainties. 23 The formulation in that work applies to general low fidelity models, but of particular benefit in the OUU setting is the use of information from previous optimization iterations, termed "information reuse" (IR) in the work of Ng and Willcox. 23 However, this control variate-based approach, as presented in the work of the aforementioned authors, 23 is limited to formulations of the OUU problem where estimators of the quantities being optimized or constrained must be expressed as a sample average (eg, estimators of the first two statistical moments, ie, mean and variance).
There exists a rich variety of quantities that can be optimized or constrained in OUU for engineering design. 3,4,[24][25][26][27] Estimators of many of these quantities cannot be represented as a sample average, and so existing multifidelity approaches for OUU cannot be used to accelerate such estimators. This paper extends the IR approach of the work of Ng and Willcox 23 to estimators that cannot be expressed as a sample average. Consequently, IR becomes applicable to a broader range of OUU formulations. Our "generalized information reuse" (gIR) approach makes use of the ideas behind bootstrapping 28 to evaluate both the error of estimators and the covariance between estimators using different fidelity models. Section 2 outlines the general formulation of an OUU for engineering design and gives examples of quantities optimized/constrained in an OUU. Section 3 presents the gIR approach. Section 4 tests the approach on an algebraic test problem and applies it to the design of an acoustic horn under uncertainty. Section 5 concludes this paper.

FORMULATIONS OF OUU
This section discusses a general formulation of the OUU problem and the use of MC sampling in solving it. We then present several specific quantities that can be used as objective or constraint functions in OUU, including statistical moments, the probability of failure, the quantile/value at risk, and the horsetail matching metric.

General formulation of the optimization problem
Models of practical engineering systems often produce various outputs of interest. These outputs enter into the design problem through the objective or constraints. We denote a general output by y(x, u), which is a function of controllable design variables x and input parameters u. We treat u as uncertain and use a probabilistic representation such that u is a realization of a random vector U(ω), where ω denotes the underlying random event with sample space Ω such that ω ∈ Ω. Thus, at a given design x, an output is a random variable Y_x(ω) defined such that Y_x(ω) = y(x, U(ω)).
An OUU requires an objective, denoted d_0(x), and constraints, denoted d_j(x), j = 1, …, n_c, to be defined that characterize the behavior of Y_x as a function of the design x. For example, the expected value of Y_x could be minimized subject to a constraint on the variance of Y_x. In general, the formulation of a (single-objective) OUU problem can be written as

    min_x  d_0(x)   subject to   d_j(x) ≤ 0,   j = 1, …, n_c,

where each of d_j(x), j = 0, 1, …, n_c is an appropriately defined function. Examples of functions to use in this OUU formulation are given in Sections 2.3 to 2.6. In the following, we drop the subscript notation to refer to a general d(x) that could be any of d_j(x), j = 0, 1, …, n_c.

Monte Carlo sampling
To implement an OUU, an estimate of d(x) must be obtained at each design point x visited by the optimization algorithm. Such an estimate is usually obtained by evaluating the output y(x, u) at particular values of the input parameters. In many practical design problems, evaluating y(x, u) involves using black-box simulations that model the system being optimized, and so a nonintrusive approach is required. Monte Carlo sampling is a nonintrusive approach, which uses an estimator, denoted by d̂, to obtain an estimate of d(x). 20,21 Solving the OUU problem using MC sampling gives a nested formulation, where, in the outer loop, an optimizer determines the sequence of design points to visit, and, in the inner loop, MC sampling obtains estimates of d_j(x), j = 0, 1, …, n_c. Many MC estimators are sample averages in that they can be expressed in the form

    d̂ = (1/n) Σ_{i=1}^{n} Z_i,   (2)

where Z_i, i = 1, 2, …, n are n independent and identically distributed (i.i.d.) random variables that depend on Y_x. A sample average estimator is an unbiased estimator of the mean of Z_i, and an analytical form of the estimator variance can be obtained as follows:

    Var[d̂] = σ_Z² / n,   (3)

where σ_Z² is the variance of Z_i. Therefore, the standard deviation, and hence root mean squared error, of a sample average estimator is proportional to n^{−1/2}. 20,21 However, in general, MC estimators are of the form

    d̂ = f(Y_x^1, Y_x^2, …, Y_x^n),

where f is a function that cannot necessarily be expressed as a sample average, in which case we cannot use Equation (3) to derive their variance. Sections 2.3 to 2.6 discuss several key choices of d(x) that can be used in OUU for engineering design and give their MC estimators. Whether they are used as objectives or constraints in the OUU formulation is left up to a designer setting up the problem.
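As a concrete illustration of Equations (2) and (3), the following sketch (our own, not from the paper; the function name and the toy integrand are assumptions) evaluates a sample-average estimator and its analytical variance estimate:

```python
import numpy as np

def sample_average_estimate(z):
    """Sample-average MC estimator (Eq. 2) and its analytical
    variance estimate (Eq. 3): Var[d_hat] ~ s_Z^2 / n."""
    z = np.asarray(z, dtype=float)
    n = z.size
    d_hat = z.mean()                 # (1/n) * sum of Z_i
    var_hat = z.var(ddof=1) / n      # sample variance of Z_i over n
    return d_hat, var_hat

# Illustrative use: estimate the mean of Y = u^2 with u ~ Uniform(-1, 1);
# the true mean is 1/3, and the standard error shrinks as n^{-1/2}.
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=100_000)
d_hat, var_hat = sample_average_estimate(u**2)
```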

Statistical moments
A common quantity used as the objective function in OUU is the mean value, μ = E[Y_x]. The variance, σ² = Var[Y_x], and standard deviation, σ, are also often used. The mean and variance (or standard deviation) are used as the objective function(s) in traditional robust design optimization, where they can be optimized separately in a multiobjective formulation 3,29-33 but are often combined into a single objective using a weighted sum. 3,17,29,34,35 Constraints are also often formulated using statistical moments, eg, requiring that the mean value is α standard deviations away from failure. In this case, the constraint in the OUU formulation is d = μ + ασ ≤ 0. [36][37][38] In the case where Y_x is distributed normally, this gives a constraint on the probability of failure. Note that this is the approach applied to the constraints in the original IR formulation. 23,39 The estimator of the mean of Y_x is given by

    μ̂ = (1/n) Σ_{i=1}^{n} Y_x^i,

where Y_x^i are n i.i.d. random variables with the distribution of Y_x. Thus, we can set Z_i = Y_x^i to express this estimator in the form of Equation (2). The estimator of the variance of Y_x is given by

    σ̂² = (1/(n−1)) Σ_{i=1}^{n} (Y_x^i − μ̂)²,

and we can set Z_i = (n/(n−1))(Y_x^i − μ̂)² to express this estimator in the form of Equation (2). Since both of these estimators are sample averages (they can be expressed in the form of Equation (2)), we can use Equation (3) to estimate their variance.

Probability of failure
Another common formulation of OUU uses the probability of failure

    p = P[Y_x > 0] = E[ 1_{(0,∞)}(Y_x) ],

where, here, failure is indicated by Y_x > 0. This failure probability is usually used as a constraint in OUU formulations such that p − (1 − α) ≤ 0, where α is the minimum acceptable probability of success, so (1 − α) is the maximum acceptable probability of failure. The sample average estimator of p is given by Equation (2) using Z_i = 1_{(0,∞)}(Y_x^i), where 1_S(.) is the indicator function that is equal to 1 when its argument is in the set S and 0 otherwise.
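A short sketch of this estimator (our own illustration; the `failure_probability` helper and the synthetic Gaussian output are assumptions) shows that p̂ is just a sample average of indicator values, so Equation (3) applies directly:

```python
import numpy as np

def failure_probability(y):
    """Sample-average estimator of P[Y > 0]: the mean of the
    indicator Z_i = 1_{(0,inf)}(y_i), with variance s_Z^2 / n."""
    z = (np.asarray(y, dtype=float) > 0).astype(float)
    n = z.size
    return z.mean(), z.var(ddof=1) / n

# Synthetic output Y ~ N(-1, 1), for which P[Y > 0] = 1 - Phi(1) ~ 0.1587.
rng = np.random.default_rng(1)
y = rng.normal(loc=-1.0, scale=1.0, size=200_000)
p_hat, var_hat = failure_probability(y)
```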

Quantile/value at risk
An alternative to constraining the probability of failure is to constrain the quantile function (the inverse of the CDF)

    d = F_{Y_x}^{−1}(α),

where F_{Y_x}^{−1}(α) is the quantile function of Y_x evaluated at α. The quantile has previously seen much use in OUU formulations for operational research and financial applications [40][41][42] (where it is often known as the value at risk), and, recently, it is being considered more in engineering design applications. 26,27 The following estimator of the quantile function is used:

    q̂ = Y_x^(⌊nα⌋),

where ⌊.⌋ indicates rounding down to the nearest integer and Y_x^(k) is the kth-order statistic of a set of n realizations of Y_x (which is defined as the kth smallest of the realized values). This estimator cannot be expressed as a sample average, and so an analytical form of the variance cannot be derived from Equation (3).
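The order-statistic estimator above can be sketched as follows (our own illustration; the function name and the guard for the k = 0 edge case are assumptions, while the floor-based index follows the description in the text):

```python
import numpy as np

def quantile_estimate(y, alpha):
    """Order-statistic estimator of the alpha-quantile:
    q_hat = Y^(floor(n * alpha)), the k-th smallest sampled value.
    Not a sample average, so Eq. (3) gives no variance formula."""
    y_sorted = np.sort(np.asarray(y, dtype=float))
    k = int(np.floor(len(y_sorted) * alpha))
    k = max(k, 1)                  # guard against k = 0 for tiny alpha
    return y_sorted[k - 1]         # k-th order statistic (1-indexed)

# For Y ~ Uniform(0, 1) the true 90% quantile is 0.9.
rng = np.random.default_rng(2)
y = rng.uniform(0.0, 1.0, size=100_000)
q90 = quantile_estimate(y, 0.9)
```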

Horsetail matching metric
Horsetail matching is a formulation of OUU that minimizes the difference between the inverse CDF of Y_x and a target, 44 where this difference is given by

    d_hm = [ ∫_0^1 ( F_{Y_x}^{−1}(h) − t(h) )² dh ]^{1/2},

where F_{Y_x}^{−1}(h) is the quantile function of Y_x and t(h) is the target. Note that this target does not necessarily have to be the inverse of a valid CDF; it can be any function of h. 44 In the work of Cook and Jarrett, 44 the horsetail matching metric was evaluated in differentiable form using kernels, but, here, since only the value of d_hm (not the gradient) is required, the metric is evaluated by directly integrating the empirical CDF. This gives the following estimator:

    d̂_hm = [ Σ_{k=1}^{n} ∫_{(k−1)/n}^{k/n} ( Y_x^(k) − t(h) )² dh ]^{1/2},

where Y_x^(k) is the kth-order statistic of a set of n realizations of Y_x. Again, this is not a sample average, so an analytical expression for the error of this estimator cannot be derived from Equation (3).
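A sketch of evaluating this metric from the empirical CDF is below. It is our own illustration: the empirical inverse CDF is the step function equal to Y^(k) on ((k−1)/n, k/n], and for simplicity the target on each interval is approximated with the midpoint rule rather than integrated exactly:

```python
import numpy as np

def horsetail_metric(y, target):
    """Estimate the horsetail matching metric by integrating the
    squared mismatch between the empirical inverse CDF (a step
    function equal to Y^(k) on ((k-1)/n, k/n]) and the target t(h),
    evaluating the target at interval midpoints (midpoint rule)."""
    y_sorted = np.sort(np.asarray(y, dtype=float))
    n = y_sorted.size
    h_mid = (np.arange(n) + 0.5) / n          # midpoint of each interval
    return np.sqrt(np.mean((y_sorted - target(h_mid)) ** 2))

# For Y ~ Uniform(0, 1) the inverse CDF is h itself, so the target
# t(h) = h should give a metric near zero.
rng = np.random.default_rng(3)
y = rng.uniform(0.0, 1.0, size=50_000)
d_hm = horsetail_metric(y, lambda h: h)
```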

GENERALIZED INFORMATION REUSE
At the current design point selected by the optimizer outer loop, denoted x_A, an estimator of d(x_A) must be evaluated; d could be (but is not limited to) any of the quantities discussed in Section 2. To evaluate an estimator using regular MC, we draw n i.i.d. sampled values u_i, i = 1, 2, …, n from the distribution of U(ω) and then evaluate the system model at each of these values to obtain n i.i.d. sampled values of Y_A(ω), which is defined to be equal to y(x_A, U(ω)). Each sampled value y_A^i = y(x_A, u_i), i = 1, …, n is then used as a realization of each of the random variables Y_x^i in the estimator definition (such as those given in Section 2) to evaluate the estimator and obtain an estimate of d.
If the MSE of an estimator is too high such that the optimizer is not given accurate information about the objective and constraints, the optimizer will not be able to effectively select new design points to improve the design and meet the constraints. Therefore, a tolerance is set on the MSE of the estimator evaluated by the MC inner loop, which must be met at each design point. The estimator that meets this acceptable error at the current design point is denoted byŝ A . A regular MC estimator may require a large value of n, and hence require many expensive system model evaluations to achieve this acceptable error. In this section, we present and analyze the gIR estimator that can reduce this computational cost.

The IR estimator
Information reuse is a variance reduction method for obtaining ŝ_A, which is based on the control variate approach. 20,45 The control variate approach uses an auxiliary random variable to make a correction to a regular MC estimator. For IR, this auxiliary random variable is the system output at the closest point in design space (in terms of smallest Euclidean distance) previously visited during an optimization. We denote this closest point as the control point x_C and let Y_C(ω) = y(x_C, U(ω)) be the auxiliary random variable.
Given a set of i.i.d. sampled values u_i, i = 1, …, n drawn from the distribution of U(ω), we evaluate the system model using these sampled values at the current design point and the control point to obtain

    y_A^i = y(x_A, u_i)   and   y_C^i = y(x_C, u_i),   i = 1, …, n,

respectively. Then, we use y_A^i and y_C^i, i = 1, …, n, to evaluate d̂_A and d̂_C, which are regular MC estimators of d(x_A) and d(x_C), respectively. The classical control variate estimator is then given by

    ŝ_A = d̂_A + γ ( d_C − d̂_C ),

where d_C is the true value of d(x_C) and γ is the correction factor. In practice, the true value of d(x_C) is not known, and the estimator with acceptable error at the control point, ŝ_C, is used instead. The IR estimator is thus given by

    ŝ_A = d̂_A + γ ( ŝ_C − d̂_C ).
The sampled values u_i, i = 1, …, n used to evaluate d̂_A and d̂_C are drawn independently for each new design point visited. This ensures ŝ_C is uncorrelated with either d̂_A or d̂_C, and thus, the variance of the IR estimator is given by

    V[ŝ_A] = V[d̂_A] + γ² ( V[ŝ_C] + V[d̂_C] ) − 2γ Σ[d̂_A, d̂_C],   (15)

where V[.] denotes variance and Σ[., .] denotes covariance. The optimal choice of γ minimizes the variance of the IR estimator,

    γ* = Σ[d̂_A, d̂_C] / ( V[ŝ_C] + V[d̂_C] ),   (16)

and using this value of γ gives the following variance of the IR estimator:

    V[ŝ_A] = V[d̂_A] − Σ[d̂_A, d̂_C]² / ( V[ŝ_C] + V[d̂_C] ),   (17)

which is derived from Equation (15), and so V[ŝ_A] ≥ 0.
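These formulas can be collected into a small helper (our own sketch; the function and argument names are assumptions) that returns the IR estimate, its variance, and the estimated optimal correction factor:

```python
def ir_estimate(d_hat_A, d_hat_C, s_hat_C, var_A, var_C, var_sC, cov_AC):
    """Information-reuse estimator s_A = d_A + gamma * (s_C - d_C),
    with the variance-minimizing gamma* = cov / (V[s_C] + V[d_C])
    and resulting variance V[s_A] = V[d_A] - cov^2 / (V[s_C] + V[d_C])."""
    gamma = cov_AC / (var_sC + var_C)
    s_hat_A = d_hat_A + gamma * (s_hat_C - d_hat_C)
    var_sA = var_A - cov_AC**2 / (var_sC + var_C)
    return s_hat_A, var_sA, gamma

# Hypothetical values at a pair of nearby design points; a strong
# estimator covariance gives a large reduction from var_A = 4e-3.
s_A, v_sA, g = ir_estimate(1.00, 0.95, 0.96,
                           var_A=4e-3, var_C=4e-3,
                           var_sC=1e-3, cov_AC=3.5e-3)
```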
If the covariance between the MC estimators at the current design point and the control design point is high, then the variance of the estimator ŝ_A can be significantly reduced compared with the regular MC estimator d̂_A. Within an optimization, the design points visited at consecutive iterations typically get closer together as the optimization progresses; therefore, the covariance between MC estimators at these design points is expected to increase the closer together they are, and it is this phenomenon that IR takes advantage of. Indeed, in the work of Ng and Willcox, 23 it was shown that, in one dimension, the correlation between y(x, U(ω)) and y(x + Δx, U(ω)) is quadratic in Δx for |Δx| ≪ 1 and approaches 1 as Δx approaches 0.
To implement IR, estimates for the variance and covariance terms in Equation (17) must be obtained. It was illustrated in the work of Ng and Willcox 23 that the variance reduction effect of IR is robust with respect to inaccuracies in estimates of these variance and covariance terms. Therefore, since we do not have access to exact values of these terms, we can use estimates obtained from the values of y_A^i and y_C^i, i = 1, 2, …, n and still obtain a significant reduction in the variance of ŝ_A. Such estimates in the original formulation of IR 23,39 rely on analytical expressions for the variance and covariance of sample average MC estimators similar to Equation (3). Therefore, of the quantities from Section 2, this original formulation is only valid for μ̂, σ̂², and p̂.
Our generalization of IR can be applied to a general d whose MC estimator cannot necessarily be expressed as a sample average. Examples of such estimators include d̂_hm and q̂. Our approach uses the idea of resampling from an existing set of sampled values, which is the backbone of bootstrapping. 28,46 Bootstrapping is widely used to estimate standard errors of estimators for which analytical expressions based on sampled values are not available. The formulation presented here uses this idea to estimate not only the error of an estimator but also the covariance between estimators at two separate design points.

Bootstrapped estimates of variance and covariance
Given a set of n i.i.d. sampled values u_i, i = 1, …, n from the distribution of U(ω), consider the empirical distribution consisting of a probability density function made up of delta functions at each sampled value u_i. This empirical distribution approximates the true distribution of U(ω). Thus, resampling with replacement from this empirical distribution, by randomly drawing an index from i = 1, …, n and using u_i as the resulting sample realization, obtains sampled values that are from an approximation to the distribution of U(ω). The idea behind bootstrapping is to use this resampling to obtain a new set of sampled values of Y_A(ω) and Y_C(ω) from which an estimator is evaluated, giving a realization of the estimator from an approximation to the estimator's distribution. 28,46 In order to use bootstrapping to estimate the terms in Equation (17), first, a set of n i.i.d. sampled values u_i, i = 1, …, n is drawn from the distribution of U(ω), and the system model at both x_A and x_C is evaluated to obtain n sampled values of both Y_A(ω) and Y_C(ω), denoted y_A^i = y(x_A, u_i) and y_C^i = y(x_C, u_i), respectively, for i = 1, …, n. Next, resampling with replacement occurs n times from u_i, i = 1, …, n to obtain a new set of values u′_j, j = 1, …, n, and the corresponding values of y(x_A, u′_j) and y(x_C, u′_j) (which have already been evaluated) are used as new sets of sampled values of Y_A(ω) and Y_C(ω), respectively. This resampling is repeated N_boot times, and each set of resampled values is used to evaluate d̂_A and d̂_C, giving the values d̂_A^b and d̂_C^b, b = 1, …, N_boot. These are sampled values from approximations to the true distributions of d̂_A and d̂_C.
The bootstrapped estimate of the variance of an MC estimator d̂, denoted V̄[d̂], is obtained from the sample variance of these N_boot values d̂_b, b = 1, …, N_boot. Similarly, since the resampled values of Y_A and Y_C are the system model evaluated at the same values of u, the bootstrapped estimate of the covariance between the estimators at design points x_A and x_C, denoted Σ̄, can be obtained from the sample covariance of the pairs (d̂_A^b, d̂_C^b). The V̄ and Σ̄ notation highlights the fact that these are sample average estimators from the finite number of sampled values d̂_b, and so are themselves random variables. This is in contrast to the estimate of variance for a sample average estimator, which is analytically derived and so is fixed for a given set of sampled values. The estimators V̄ and Σ̄ thus have a variance that depends on N_boot, and this is discussed further in Section 3.5.
Note that the covariance Σ[d̂_A, d̂_C] is different from the covariance between y(x_A, U(ω)) and y(x_C, U(ω)), which could be estimated analytically from the sampled values y_A^i and y_C^i, i = 1, …, n. This bootstrapping procedure is illustrated in Figure 1. The bootstrapped estimates of the variance and covariance terms are substituted into Equation (16) to obtain an estimated optimal γ, which we denote as γ̄*. This γ̄* is subsequently used to obtain an estimated variance V̄[ŝ_A] given by Equation (17).
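The resampling scheme can be sketched as follows (our own illustration; the helper name, the quantile estimator used as an example, and the synthetic "two nearby designs" are assumptions, not from the paper). The key detail is that the same resampled indices are used for both fidelities, which is what makes the sample covariance of the replicate pairs meaningful:

```python
import numpy as np

def bootstrap_var_cov(y_A, y_C, estimator, n_boot=1000, rng=None):
    """Bootstrap estimates of Var[d_A], Var[d_C], and Cov[d_A, d_C]:
    resample the SAME indices for both fidelities (both outputs were
    evaluated at the same u_i), evaluate the estimator on each
    resample, and take the sample variance/covariance over replicates."""
    rng = rng or np.random.default_rng()
    y_A = np.asarray(y_A, dtype=float)
    y_C = np.asarray(y_C, dtype=float)
    n = y_A.size
    d_A = np.empty(n_boot)
    d_C = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample with replacement
        d_A[b] = estimator(y_A[idx])
        d_C[b] = estimator(y_C[idx])
    cov = np.cov(d_A, d_C, ddof=1)         # 2x2 sample covariance matrix
    return cov[0, 0], cov[1, 1], cov[0, 1]

# Example: bootstrap the 90% quantile estimator at two nearby "designs".
rng = np.random.default_rng(4)
u = rng.uniform(-1.0, 1.0, size=2000)
y_A, y_C = u**2, u**2 + 0.05 * u           # highly correlated outputs
q90 = lambda y: np.sort(y)[int(0.9 * len(y)) - 1]
vA, vC, cAC = bootstrap_var_cov(y_A, y_C, q90, n_boot=500, rng=rng)
```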
To use this estimation procedure within an optimization loop, a required acceptable variance for the estimator ŝ_A is specified (which, for unbiased estimators, corresponds to an acceptable MSE), and y(x_A, u) and y(x_C, u) are evaluated at new realizations of U(ω) until the estimator variance is below this value. If the covariance between estimators at a particular design point and the control point is not sufficiently high, the IR estimator may require more computational effort than regular MC, since it requires evaluation of the system model y at both design points. In this case, recourse to regular MC is required for the current design point; however, to determine whether IR or regular MC requires more evaluations, a way of predicting how many sampled values each approach would use is needed.

Predicting number of samples required
Since a general quantity d with an unknown MC estimatord is being considered, an assumption must be made about how the terms in Equation (17) vary as a function of n.
This paper specifically considers the quantile and the horsetail matching metric, and for both of these, it is known that, asymptotically, the variance of their MC estimators is proportional to n^{−1}. Therefore, a convergence function proportional to n^{−1} is fit through each term in Equation (17). For example, for a variance term, V̄ = v(n) = V̄_0 × (n/n_0)^{−1}, where n_0 is the current number of sampled values and V̄_0 is the current estimate of the variance. In general, the same could be done for any function v(n) for estimators with different convergence behavior. This is done for the variance of d̂_A, the variance of d̂_C, and the covariance between d̂_A and d̂_C, giving the predicted functions of n: v_A(n), v_C(n), and v_{A,C}(n), respectively. Substituting these into each of the terms in Equation (17), with V[ŝ_A] set to the required value V_req, gives an equation in n,

    V_req = v_A(n) − v_{A,C}(n)² / ( V̄[ŝ_C] + v_C(n) ),

whose roots give the predicted value of n, ie, the number of sampled values of Y_A(ω) and Y_C(ω) required to reach the acceptable variance. If there are multiple solutions to the equation, the smallest is chosen, and then the solution is rounded down to the nearest integer to get the predicted value of n.
Clearly, the further each of the terms in Equation (17) is into its asymptotic convergence regime, the more accurate this prediction will be. This prediction is used at the start of each optimization iteration with an initial n_init evaluations of y at both x_A and x_C, and thus n_init should be chosen to be sufficient to give a reasonable prediction. This approach is outlined in Algorithm 2.
For the regular MC estimator, this prediction is applied to a single term, ie, n_pred,MC = n_0 × V̄[d̂_A]_0 / V_req. Once it has been decided whether to revert to regular MC or continue with IR at a given design point, y(x_A, u) and y(x_C, u) are evaluated at additional sampled values of U(ω) until V̄[ŝ_A] is below the required level.
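Under the n^{−1} convergence assumption, substituting each fitted term into Equation (17) with V[ŝ_A] = V_req and writing m = n_0/n yields a quadratic in m that can be solved directly. A sketch of this prediction step (our own; the function and argument names are assumptions) is:

```python
import numpy as np

def predict_n(n0, vA0, vC0, covAC0, var_sC, v_req):
    """Predict the n at which the IR estimator variance hits v_req,
    assuming each bootstrapped term decays as (n/n0)^{-1}:
        v_req = vA0*m - (covAC0*m)**2 / (var_sC + vC0*m),  m = n0/n.
    Clearing the denominator gives the quadratic
        (vA0*vC0 - covAC0**2)*m**2
        + (vA0*var_sC - v_req*vC0)*m - v_req*var_sC = 0;
    the largest admissible ratio m corresponds to the smallest n,
    and the result is rounded down to an integer."""
    a = vA0 * vC0 - covAC0**2
    b = vA0 * var_sC - v_req * vC0
    c = -v_req * var_sC
    ms = [r.real for r in np.roots([a, b, c])
          if abs(r.imag) < 1e-12 and r.real > 0]
    if not ms:
        return None                # no admissible prediction
    return int(n0 / max(ms))

# Hypothetical bootstrapped values at n0 = 100 samples; for comparison,
# regular MC would predict n0 * vA0 / v_req = 1000 samples.
n_pred = predict_n(n0=100, vA0=1e-2, vC0=1e-2, covAC0=8e-3,
                   var_sC=1e-3, v_req=1e-3)
```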
When using bootstrapping, estimating the variance terms is no longer of negligible computational cost, especially if evaluating d̂ requires sorting of the sampled values, since this must be done N_boot times. Therefore, if only a small number of sampled values were added at a time, d̂ would be evaluated a large number of times and the computational cost could become nonnegligible. In practical problems, the cost of obtaining a sampled value of Y_x by evaluating the system model might far outweigh the cost of the bootstrapping procedure, making this a nonissue; however, if the number of times bootstrapping is performed should be limited, Algorithm 2 can also be used to determine the number of additional realizations of Y_x to obtain.
When a small number of initial sampled values, n_init, is used, the prediction given by Algorithm 2 is likely to be inaccurate. Therefore, to avoid overestimating the value of n needed to give an acceptable variance, the number of sampled values of U drawn at a time, at which the system model is evaluated, is limited to be a constant (denoted κ here) times the current value of n. Additionally, to avoid iterating by a small amount many times close to the required number, sampled values are obtained to give a total of (1 + ε)n_pred, where ε > 0 is a small value. This paper uses κ = 10 and ε = 0.1, but if evaluating the system model is far more expensive than bootstrapping, then ε = 0 and a smaller κ can be used.
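These batching rules can be sketched as follows (our own helper and naming; the defaults of 10 and 0.1 are the values used in this paper, while taking the maximum with the current n is an assumption added so the target count never shrinks):

```python
def next_batch_size(n_current, n_pred, growth=10, eps=0.1):
    """Total number of samples to target next: aim for (1 + eps) * n_pred
    to avoid many small top-ups near the required count, but never grow
    by more than `growth` times the current n while the prediction from
    a small sample set is still unreliable."""
    target = int((1 + eps) * n_pred)
    return max(n_current, min(target, growth * n_current))
```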

Bias
If the estimator d̂ is unbiased, then the IR estimator is unbiased. 23 However, for a general quantity d with MC estimator d̂, taking the expectation of the IR estimator ŝ_A = d̂_A + γ(ŝ_C − d̂_C) shows that the bias of the IR estimator is given by

    Bias[ŝ_A] = E[ŝ_A] − d(x_A) = Bias[d̂_A] + γ ( Bias[ŝ_C] − Bias[d̂_C] ).

Variance of bootstrapped estimators
The bootstrapped estimates V̄ and Σ̄ are random variables that depend on the resampled values d̂_b. If N_boot is large enough, their variance will be negligible, but since a finite number of sampled values must be used in reality, the variance of the bootstrapped estimate of the IR estimator variance in Equation (15) is nonzero. Treating the bootstrapped terms as independent and γ as fixed, it is given by

    Var[ V̄[ŝ_A] ] = Var[V_A] + γ⁴ ( Var[V_SC] + Var[V_C] ) + 4γ² Var[Σ̄],   (21)

where, for clarity, V_A = V̄[d̂_A], V_C = V̄[d̂_C], V_SC = V̄[ŝ_C], and Σ̄ = Σ̄[d̂_A, d̂_C]. Since V̄ and Σ̄ are sample average estimators, their variance is approximated by (1/N_boot)σ̂², where σ̂² is the sample variance of the values being averaged. With a reasonable value of N_boot, this variance will be small; however, if γ is large, then the γ⁴ Var[V_SC] term in Equation (21) is potentially nonnegligible. If the resulting value of Var[V̄[ŝ_A]] is then used as Var[V_SC] in a subsequent iteration, it could be magnified further, leading to a continuously growing inaccuracy in the estimated variance of the IR estimator and so giving meaningless results.
In IR, the optimal choice for γ is rarely greater than 1 and is often ≃ 1, so this is treated as a nonissue in this paper's implementation. Furthermore, the main application of this work is to problems where evaluating the system model y is far more expensive than the bootstrapping procedure, so a large N_boot can be used to keep the variance of the bootstrapped estimators low. However, if desired, this variance can be tracked during the optimization using Equation (21), and if the inaccuracy is deemed too great, a regular MC estimator can be used instead to break the chain of dependence.

Overall algorithm
The overall algorithm for performing an optimization using the proposed gIR method is given in Algorithm 3.

NUMERICAL EXPERIMENTS
In this section, experiments are performed on an algebraic test problem and a physical acoustic horn design problem.

Algebraic test problem
The algebraic test problem uses two design variables, x_1 and x_2, and n_u uncertain variables, u_1, …, u_{n_u}. There are two outputs, y_0(x, u) and y_1(x, u), given by Equation (22). The bounds on both design variables are [−1, 1], and the uncertain inputs are all uniformly distributed over [−1, 1]. Due to the stochastic nature of using MC sampling to evaluate objectives and constraints, an optimization algorithm that is resistant to a small amount of noise is desirable. Derivative-free methods sample over relatively large regions of design space, which can smooth out the effect of the sampling noise, and so are suitable for sampling-based OUU. The derivative-free optimizer Constrained Optimization By Linear Approximation (COBYLA) 48 is used here, run until the magnitude of the change in design vectors is less than 10^{−2}; it is implemented using the NLopt toolbox. 49

Validation
First, in order to validate the gIR method, the variance of the gIR estimator predicted by Equation (15) and Algorithm 1 is compared with an empirically generated variance (both using N_boot = 5000). To achieve this, using n_u = 2, a sequence of design points is taken from an optimization and fixed; then, the IR estimator and its variance are obtained at each point using the method outlined in Section 3, as they would be in an optimization but with a fixed number of sampled values (n = 1000). This is repeated 5000 times; the average of the gIR variances gives the theoretical bootstrapped curve in Figure 2, and the variance of the estimators gives the empirical curve in Figure 2. This validation process is performed when estimating both the horsetail matching metric with t(h) = −h⁴ and the 90% quantile of the random variable defined by y_0 from Equation (22); both of these curves show good agreement.

Optimization acceleration
Next, using n_u = 10, the horsetail matching metric for y_0 is minimized subject to the 90% quantile of y_1 being less than zero (indicating a 90% probability of constraint satisfaction). This gives the following optimization problem:

    min_x  [ ∫_0^1 ( F_0^{−1}(h) − t(h) )² dh ]^{1/2}   subject to   F_1^{−1}(0.9) ≤ 0,

where F_0^{−1} and F_1^{−1} are the inverse CDFs (quantile functions) for the random variables given by y_0(x, U(ω)) and y_1(x, U(ω)), respectively. These choices of objective and constraint are propagated as discussed in Sections 2.5 and 2.6, ie, neither of their estimators can be expressed as a sample average. For this problem, we use the risk-averse target t(h) = −(4h⁶) in the horsetail matching metric. This target is risk averse because it emphasizes minimizing values of the quantile function F_0^{−1}(h) for values of h close to 1 more than for values of h close to 0, hence preferring designs with better worst-case behavior. We refer the reader to the works of Cook and Jarrett 44 and Cook et al 50 for further discussion of horsetail matching targets. We use a required variance of 1 × 10^{−3} for the objective function and 4 × 10^{−3} for the constraint. In preliminary experiments, using these values in a regular MC estimator of the objective required a similar computational cost to a regular MC estimator of the constraint, facilitating ease of illustration in Figure 3. The influence of this choice of required variance is discussed in Section 4.1.3. The optimization is run from an initial design point of x_0 = [−1, −1]; additionally, n_init = 50, κ = 10, and ε = 0.1.
We run optimizations using only regular MC and using gIR. Figure 3 gives the convergence, in terms of computational cost (measured in evaluations of y(x, u)), of the objective function (the horsetail matching metric) and the constraint (quantile function), along with the computational cost of each optimization iteration. The optimization using gIR achieves significant computational savings compared with regular MC in both objective function and constraint evaluations.
Additionally, we run optimizations where the input parameters are distributed according to a gamma distribution (instead of being uniformly distributed) with shape parameter 2 and scale parameter 0.5, such that u_i is the realization of a random variable given by 1 − U_i( ), where U_i ∼ Γ(2, 0.5), for i = 1, … , n_u. This random variable is skewed and has support (−∞, 1]. The results of optimizations with these input uncertainties are given in Figure 4. On Figure 4, gIR performs similarly to Figure 3 (when the uncertainties were uniformly distributed). This is because the distributions of Y_A( ) and Y_C( ), and hence the correlation between d̂_A and d̂_C, depend on the nonlinearity and variance of the system model rather than on the particular input distribution. Since the nonlinear test function given by Equation (22) is being used here, changing the distribution of U( ) does not significantly affect the ability of gIR to improve upon regular MC.
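The mechanism behind this insensitivity can be illustrated numerically: when the same input realizations are used at two nearby design points, the two quantile estimators are strongly correlated regardless of the input distribution. The model below is a hypothetical nonlinear stand-in for Equation (22), and the design-point values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def y_model(x, u):
    # Hypothetical nonlinear test function (a stand-in for Equation (22)).
    return np.sin(3 * x) * u + 0.5 * x * u ** 2

def estimator_correlation(draw_u, x_A=0.52, x_C=0.50, n=500, n_rep=1000):
    """Correlation between 90% quantile estimators at two nearby designs,
    using the same input samples at both points (common random numbers)."""
    est_A = np.empty(n_rep)
    est_C = np.empty(n_rep)
    for r in range(n_rep):
        u = draw_u(n)                     # same inputs used at both designs
        est_A[r] = np.quantile(y_model(x_A, u), 0.9)
        est_C[r] = np.quantile(y_model(x_C, u), 0.9)
    return np.corrcoef(est_A, est_C)[0, 1]

rho_uniform = estimator_correlation(lambda n: rng.uniform(-1.0, 1.0, n))
rho_gamma = estimator_correlation(lambda n: 1.0 - rng.gamma(2.0, 0.5, n))
```

Both correlations are close to one because the two designs are close together; swapping the uniform inputs for the skewed gamma inputs barely changes this, mirroring the comparison between Figures 3 and 4.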
The COBYLA algorithm is not guaranteed to converge to the true optimum, and due to the stochastic nature of the problem, it is not expected that the same design is obtained each time an optimization is performed. To investigate this, we run the optimizations an additional 13 times from random initial design points (with uniformly distributed input uncertainties) and plot the convergence of the objective function on Figure 5, along with the optimal design point obtained from each run. From Figure 5, we can see that the benefit of gIR is not specific to a particular initial design point. Additionally, the scatter in the optimum design point obtained by the different runs highlights the noisy nature of using MC sampling for OUU and that the required variance must be low enough for the results of the OUU to be meaningful.

Influence of required variance
To investigate the impact of the chosen required variance values, we run an optimization using values an order of magnitude lower than in Section 4.1.2 (therefore, 1 × 10^{-4} for the objective and 4 × 10^{-4} for the constraint); the results are given in Figure 6. The overall cost of the optimization is roughly an order of magnitude higher than the previous optimizations (note the scale on the y-axis) since, as discussed in Section 2.2, the variance of the MC estimators considered here decreases in proportion to n^{-1}. Information reuse still offers significant speedup over regular MC, indicating that, while the choice of required variance influences the overall cost of the OUU, gIR is still able to reduce this cost.
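The n^{-1} scaling quoted above can be checked numerically even for a non-sample-average estimator: bootstrapping the variance of the 90% quantile at two sample sizes an order of magnitude apart should give variances roughly an order of magnitude apart. The distribution and sample sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
q90 = lambda s: np.quantile(s, 0.9)

def boot_var(y, n_boot=2000, rng=rng):
    """Bootstrapped variance of the 90% quantile estimator for sample y."""
    return np.var([q90(rng.choice(y, size=len(y), replace=True))
                   for _ in range(n_boot)], ddof=1)

v_small = boot_var(rng.normal(size=200))
v_large = boot_var(rng.normal(size=2000))  # ~10x the samples -> ~10x less variance
```

This is why tightening the required variance by an order of magnitude inflates the cost of the optimization by roughly an order of magnitude, as seen on Figure 6.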
Additionally, we can see that, compared with Figure 3, more optimization iterations are needed before gIR is able to achieve the required variance using only n_init sampled values. This is because achieving a lower required variance with only n_init sampled values requires a larger correlation between estimators at the design point and control point; the design points therefore need to be closer together, which occurs later in the optimization run.
In some cases, it is possible for gIR to be more expensive than regular MC: if n_init is chosen too high (relative to the required variances), such that regular MC can achieve the required variance in fewer than n_init sampled values, gIR wastes the n_init evaluations of the system model at the control point. For example, Figure 7 gives the results of an optimization where the required variances are set an order of magnitude higher than in Section 4.1.2 (therefore, 1 × 10^{-2} for the objective and 4 × 10^{-2} for the constraint) and a value of n_init = 50 is used.
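To make the trade-off concrete, the accuracy-driven sampling loop assumed throughout (here, the regular MC version) can be sketched as follows: grow the sample in increments until the bootstrapped variance of the estimator meets the required value. The names, increments, and tolerances below are illustrative rather than the paper's.

```python
import numpy as np

rng = np.random.default_rng(3)

def mc_cost(sample_fn, estimator, v_req, n_init=50, n_step=50, n_boot=200,
            n_max=5000, rng=rng):
    """Number of model evaluations regular MC needs to reach a required
    variance: grow the sample until the bootstrapped variance <= v_req."""
    y = sample_fn(n_init)
    while True:
        boots = [estimator(rng.choice(y, size=len(y), replace=True))
                 for _ in range(n_boot)]
        if np.var(boots, ddof=1) <= v_req or len(y) >= n_max:
            return len(y)
        y = np.concatenate([y, sample_fn(n_step)])

sample = lambda n: rng.normal(size=n)
q90 = lambda s: np.quantile(s, 0.9)
n_loose = mc_cost(sample, q90, v_req=1e-2)  # loose tolerance: met quickly
n_tight = mc_cost(sample, q90, v_req=1e-3)  # tight tolerance needs many more
```

Whenever the loose tolerance is met within the first n_init sampled values, paying a further n_init evaluations at a control point (as gIR must) can only add cost, which is the behavior observed on Figure 7.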
For the majority of design points, the n_init sampled values were enough to achieve the required variance using regular MC, and so gIR wasted the additional n_init system model evaluations at the control point. In this case, the required variance is so high that the results of the OUU are unlikely to be meaningful. In design scenarios where a designer has a reasonable idea of the computational cost needed to reach their required variance, they should be able to choose a sensible value of n_init.
If only a very small number of model evaluations can be afforded, such that evaluating n_init sampled values at both the current design point and the control point is too expensive, then the current implementation will perform poorly. However, such computational expense implies that using MC sampling to propagate the uncertainties is perhaps not the best choice in the first place.

Application to acoustic horn design problem
This problem uses a simulation of a two-dimensional acoustic horn governed by the nondimensional Helmholtz equation. An incoming wave enters the horn through the inlet and exits the outlet into the exterior domain with a truncated absorbing boundary.23,51 The flow field is solved using a reduced basis method obtained by projecting the system of equations of a high-fidelity finite element discretization onto a reduced subspace52; here, 100 reduced basis functions are used.
This system's single output of interest, given by y(x, u), is the reflection coefficient, which is a measure of the horn's efficiency. The geometry of the horn is parametrized by the widths at six equally spaced axial positions from root to tip. The design variables give the nominal value of these widths x_k,nom, and then uncertainty is introduced representing manufacturing tolerances such that the actual value x_k is given by a random variable centered on x_k,nom. Additionally, the wavenumber of the incoming wave and the acoustic impedance of the upper and lower horn walls are uncertain input parameters. The bounds for the design variables are given in Table 1, and the uncertainty distributions are given in Table 2. As an illustration of the problem, the magnitudes of the complex acoustic pressure field for the horn geometries given by the lower and upper bounds on the design variables are shown in Figure 8; the reflection coefficient, at nominal values of the uncertain inputs (the mean of their distribution), is also given.
While derivative-free optimizers such as COBYLA are resistant to a small amount of noise, they are local optimizers that will converge to a local minimum, and the number of iterations required for convergence is sensitive to the initial design point. Therefore, in order to fairly compare multiple optimizations on this design problem, and so more thoroughly compare IR with regular MC, a global optimizer is used. Here, the evolutionary strategy optimizer from the ecspy python toolbox (https://pypi.python.org/pypi/ecspy) is used, run with a population size of 25 until a total of 10^6 acoustic horn model evaluations have been sampled. This represents a design scenario in which a fixed computational budget is permitted and enables a consistent comparison between regular MC and IR optimizations over multiple optimization runs.

First, for reference, a classical approach to robust optimization is used: a weighted sum of the first two statistical moments of Y_x( ) = y(x, U( )) is minimized using regular IR and compared with regular MC. Then, the horsetail matching metric under a risk-averse target is minimized using gIR. In both cases, n_init = 100 is used, with the remaining algorithm parameters set to 0.1 and 10.
The weighted sum of moments formulation solves the following optimization problem: minimize over x the function d_ws(x), a weighted combination of E[Y_x( )] and the standard deviation √V[Y_x( )]. This function d_ws is a combination of two quantities whose estimators are sample averages, and so the variance of its estimator can be estimated from a first-order Taylor expansion of d_ws about the estimators of E[Y_x( )] and V[Y_x( )], both of which can be evaluated using regular IR.23,39 A required variance similar to that used to demonstrate the original IR formulation in the work of Ng and Willcox23 is used here, ie, a value of V_req = 2 × 10^{-6}. The optimization is run from 10 random starting populations, such that one optimization using regular IR (labeled IR) and one using regular MC are performed from each starting point. Figure 9A gives the convergence of all the IR and MC optimizations. Figure 9B gives, for a single optimization run, the number of sampled values needed to reach the required variance for the first 100 design candidates in the optimization for both techniques, along with the correlation coefficient between sampled values at the design point and control point for the IR run.
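The first-order Taylor (delta-method) variance estimate mentioned above can be sketched as below. The weights alpha and beta are illustrative (the paper's weights are not reproduced here), and the covariance of the mean and variance estimators is approximated from the sample covariance of the per-sample contributions.

```python
import numpy as np

rng = np.random.default_rng(4)

def weighted_sum_variance(y, alpha=0.5, beta=0.5):
    """Delta-method variance of d_ws = alpha*mean + beta*std (sketch with
    illustrative weights). Var(d_ws_hat) ~ grad^T Cov(mean_hat, var_hat) grad,
    with the gradient of f(m, v) = alpha*m + beta*sqrt(v)."""
    n = len(y)
    m, v = y.mean(), y.var(ddof=1)
    # Covariance of (mean_hat, var_hat) from per-sample contributions.
    contrib = np.column_stack([y, (y - m) ** 2])
    cov = np.cov(contrib, rowvar=False) / n
    grad = np.array([alpha, beta / (2.0 * np.sqrt(v))])
    return grad @ cov @ grad

y = rng.normal(loc=1.0, size=2000)
var_dws = weighted_sum_variance(y)
```

Because d_ws is built from sample-average estimators, this closed-form propagation is available; for the horsetail matching metric and quantile function, no such expansion applies, which is why the bootstrap-based gIR machinery is needed instead.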
The results on Figure 9 demonstrate how regular IR leads to computational savings compared with regular MC. On average, the IR optimizations achieved in ≃ 2.5 × 10^5 evaluations the objective function value that the regular MC optimizations took 10^6 evaluations to obtain. Additionally, it can be seen that, for the IR optimization, the computational cost is smallest when the correlation between sampled values at the current design point and control point is close to one; the spikes on Figure 9B where the computational cost of IR increases correspond to iterations where the correlation drops significantly below 1.
Next, gIR is used to optimize the horsetail matching metric, giving the following optimization problem: minimize over x the horsetail matching metric of the reflection coefficient, computed from F^{-1}, the inverse CDF (quantile function) of the random variable Y( ) = y(x, U( )) at a given design. The risk-averse target t(h) = −0.05h^4 and a required variance of 4 × 10^{-7} are used. Again, n_init = 100, with the remaining algorithm parameters set to 0.1 and 10. Figure 10A plots the convergence for both gIR and MC optimizations. Figure 10B gives the number of evaluations needed to reach the required variance for the first 100 design points evaluated, along with the correlation between sampled values at the current design point and the control point for a single optimization run. Comparing Figures 9 and 10, it appears that similar computational gains are achieved by the gIR method with a non-sample-average estimator as by regular IR with a sample average estimator. Once again, the gIR optimizations achieved in ≃ 2.5 × 10^5 evaluations the objective function value that the regular MC optimizations took 10^6 evaluations to obtain.
It can also be observed that, in both cases, there is no significant benefit of IR and gIR over regular MC until after ≃ 100 × 10^3 evaluations. This is because the evolutionary strategy optimizer deals with populations of solutions, and so the first population is likely to be spread out in design space with little correlation between design points. This is also observable on Figures 9B and 10B, where for the first ≃ 25 designs evaluated (corresponding to the population size), the gIR and MC optimization runs required a similar number of evaluations.
To compare the designs obtained from these optimizations, an empirical CDF is obtained using 1000 sampled values of Y_A( ) for the best design from all 10 runs in both the weighted sum and horsetail matching cases; these are plotted on Figure 11B (along with the target used in the horsetail matching formulation). Additionally, for reference, an empirical CDF is obtained for the designs at the upper and lower design variable bounds along with a random point in design space, and these are plotted on Figure 11A; these CDFs illustrate the magnitude of the variation of the reflection coefficient induced by the uncertain input parameters, along with the variation in the shape of the CDF over design space.

Figure 11 demonstrates that using the weighted sum of moments for this problem has given rise to a design that is stochastically dominated by the design obtained using the horsetail matching metric, ie, the CDFs of these two designs do not cross. This highlights part of the motivation for developing gIR: certain quantities whose estimators are not sample averages can offer more powerful formulations for design under uncertainty, but one does not wish to pay a computational penalty for using them over more traditional quantities with sample average estimators. While it is difficult to quantitatively compare the gIR approach optimizing the horsetail matching metric to the regular IR approach optimizing a weighted sum of moments, since they use different objectives and different values for the required variances, the similarity between the advantage of gIR over regular MC on Figure 10 and the advantage of regular IR over regular MC on Figure 9 suggests that this goal has been achieved.
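The non-crossing-CDFs criterion used in this comparison is straightforward to check from two sets of sampled values; a minimal sketch follows, with illustrative sample arrays standing in for the two optimized designs (lower reflection coefficients are better).

```python
import numpy as np

def dominates(a, b):
    """True if the empirical CDF of `a` lies on or above that of `b` at
    every value (first-order stochastic dominance when lower outputs are
    preferred), ie, the two empirical CDFs do not cross."""
    grid = np.sort(np.concatenate([a, b]))
    F_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    F_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return bool(np.all(F_a >= F_b))

# Illustrative samples standing in for the two optimized designs:
# `hm_design` is shifted toward lower reflection coefficients everywhere.
hm_design = np.array([0.10, 0.15, 0.22, 0.30])
ws_design = np.array([0.14, 0.20, 0.27, 0.36])
hm_better = dominates(hm_design, ws_design)
```

If neither `dominates(a, b)` nor `dominates(b, a)` holds, the CDFs cross and neither design stochastically dominates the other.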

CONCLUSIONS
Using MC sampling within OUU is appropriate when conditions favorable for other, more efficient uncertainty propagation methods are not present (eg, when the number of uncertainties is too high or the problem is not sufficiently smooth). Optimization under uncertainty using MC sampling requires a relatively large number of system evaluations, so it may be infeasible for computationally expensive applications; however, multifidelity methods for MC sampling can reduce the computational cost enough for it to become feasible in many cases.
Information reuse is a multifidelity method for MC sampling within OUU that treats neighboring design points in an optimization as lower fidelity models, meaning it is a multifidelity method that can be applied to problems where only a single fidelity model is available. This paper has proposed a generalized information reuse method that extends to optimization formulations using quantities whose estimators cannot be expressed as a sample average.
On an algebraic test problem, the gIR method accurately predicts the variance of two non-sample-average estimators and, when used in a derivative-free local optimizer, gives significant computational savings over regular MC sampling. On a physical acoustic horn design problem, when used in an evolutionary strategy global optimizer, optimizations of the horsetail matching metric using gIR provide similar computational savings over regular MC to optimizations of a weighted sum of mean and standard deviation using regular IR. Additionally, the optimal design from the horsetail matching metric stochastically dominates the optimal design from the weighted sum of moments, highlighting the importance of being able to use quantities without sample average estimators in optimization.